From: Glenn Adams <glenn@stonehand.com>
Date: Wed, 21 Dec 94 23:57:47 -0500
Subject: [www-mling,00137] Unicode Web Home Page
Message-Id: <9412220457.AA00784@trubetzkoy.stonehand.com>



I just noticed this mailing list and see that a reference was
made earlier to a page on the Unicode web database that I was
(am still) constructing.  The official Unicode Web home page is now
available at

  http://unicode.org/

Although I haven't read through the entire archive of this list
on the topic of Unicode, I'd like to make a couple of comments
on what I've seen so far:

1. Unicode makes for a very good internal processing code in
environments that may need to deal with multiple external (existing)
character sets and coding systems.  As such, choosing Unicode as
an internal code for your application puts you in a good position
when you begin to make the transition to multiple locales or multi-
lingual texts.

2. As is quite obvious, Unicode isn't going to take over immediately;
however, a lot of inertia is building up behind it, particularly since
there are no viable alternatives for a standard, universal character
set (most people don't consider ISO 2022 text a viable alternative).

3. It does make a lot of sense to implement Unicode in well-defined
steps.  Few people (if any) are going to implement a Unicode system
that supports Japanese, Arabic, and Sanskrit on day one.  Furthermore,
it is almost trivial to use Unicode instead of ISO 8859-1 (as an
internal processing code), or, for that matter, instead of SJIS
(although the latter case requires translating codes, with the
accompanying performance hit, during import/export/display, etc.).
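
A minimal Python sketch of points 1 and 3 above, assuming Python's
built-in codecs stand in for an application's own conversion tables.
The helper names import_text and export_text are hypothetical.  It
illustrates why ISO 8859-1 is the trivial case (each byte maps onto
the Unicode code point of the same value) while SJIS needs a real
translation step at the import/export boundary:

```python
def import_text(raw, charset):
    """Convert bytes in an external charset to internal Unicode text."""
    return raw.decode(charset)


def export_text(text, charset):
    """Convert internal Unicode text back to an external charset."""
    return text.encode(charset)


# ISO 8859-1: every byte value equals the Unicode code point it maps to.
latin1_raw = b"caf\xe9"
latin1_text = import_text(latin1_raw, "iso-8859-1")
same_values = [ord(ch) for ch in latin1_text] == list(latin1_raw)

# SJIS: two-byte sequences must be translated through a table.
sample = "\u65e5\u672c"  # two kanji, written as escapes for portability
sjis_raw = export_text(sample, "shift_jis")
sjis_text = import_text(sjis_raw, "shift_jis")
```

Once imported, both strings are handled identically by the rest of
the program; only the boundary functions know about the external
encodings.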

OK, with that, I'll finish up.  I'm sure that the current interest
in Unicode will help further the "Worldwide" aspects of the Web as
it grows.

Happy Holidays!

Glenn Adams
Technical Director, Unicode Consortium

Following is the formal announcement made yesterday about the Unicode
Web Page.  (My apologies if you are receiving a duplicate of the following).

----------------------------------

The Unicode Worldwide Web home page is now available for
general use.  Although its construction is ongoing, it now
contains a variety of useful data on the Unicode Standard
and the Unicode Consortium.

You may access it at the following URL:

  http://unicode.org/

The above URL is currently redirected to

  http://www.stonehand.com/unicode.html

where the physical Unicode Web database presently resides.  You should
use the first of the above URLs since the physical location of this
database may change over time.  However, if for some reason the
former cannot be accessed, you may also try the latter.

If you would like to contribute information to the Unicode home
page, or if you have any comments on it, please send mail to
<webmaster@unicode.org>.

The HTML data which comprises the Unicode Web database has been
fully validated against the HTML 2.0 DTD. In only one case have
non-standard extensions been used: that is with the two images of
the cover of the Unicode Standard, Version 1.0, Volumes 1 and 2
which appear on the "About The Unicode Standard" page.  In this
case, the proposed attributes BORDER, WIDTH, and HEIGHT have
been used in order to make use of capabilities provided by the
Netscape Navigator(tm) of Mosaic Communications Corporation.

For those who aren't aware of it, The Unicode Consortium is primarily
a volunteer organization. The effort to create this Web database
has been made on that basis. As a result, not every piece of useful
information about Unicode is currently present in this database.
However, I do anticipate that additions will be made gradually as
I find more time (and as others step forward to contribute). In order
to help you find new information, a "What's New?" topic is available
on the home page which will point you to recently changed or new
information.

I hope you find this information useful, and I'd like to thank all
of the individuals who contributed their time to review it during
its preliminary stages of construction.

Glenn Adams
----------------------------------


From: Jean-Francois Groff <jfg@infodesign.ch>
Date: Wed, 21 Dec 94 10:12:30 +0100
Subject: [www-mling,00136] Re: language tags in www
Message-Id: <9412210912.AA03494@infodesign.ch>


Says "Jim A. Fetters" <fetters@enuxsa.eas.asu.edu>:

> In addition to the problem of character sets, the HTTPd
> server should recognize the capabilities of the browser.
> If the browser can display Japanese, the server should
> then provide the information in Japanese.  There is a
> problem with a single HTML document in several languages.
> Currently, there is no way to provide the information in
> various languages other than to provide links to
> separate language pages: Japanese, Korean, etc.  There
> needs to be a way to specify one HTML document where the
> server sends the appropriate languages to the browser
> based on the user's/browser's capabilities or preferences.
>
> We should discuss standards for developing:
>
> 1. [...]
>
> 2. multi-lingual document serving: a single HTML document
> which responds to the browser's capabilities/preferences

  In the HTTP/1.0 request syntax, there is already an "Accept-Language:"  
header which was introduced precisely for this purpose. Implementation in  
browsers has been largely neglected, although it's very little effort to  
support it, and add a corresponding user preference in the interface. See  
http://info.cern.ch/hypertext/WWW/Protocols/HTTP1.0/HTTP1.0-ID_24.html for  
full details.
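
  A rough Python sketch of how a server might act on this header.
The function names and file names are hypothetical, and the "q="
quality-value syntax is an assumption borrowed from the general HTTP
Accept-header style; check the specification above for the exact
grammar.

```python
def parse_accept_language(header):
    """Return language tags from the header, most preferred first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        quality = 1.0  # assumed default when no q= parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_variant(header, available, default):
    """Pick the stored variant that best matches the header."""
    for tag in parse_accept_language(header):
        if tag in available:
            return available[tag]
        primary = tag.split("-")[0]  # e.g. "ja-jp" falls back to "ja"
        if primary in available:
            return available[primary]
    return default


variants = {"en": "index.en.html", "ja": "index.ja.html"}
chosen = choose_variant("ja, en;q=0.5", variants, "index.en.html")
```

  The browser side is the smaller half: send the user's preference
list in the header on every request.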
  

  Browser writers, take note :-)

    Jean-Francois Groff <jfg@infodesign.ch>   (NeXT-Mail preferred)
    Founder, InfoDesign                        Tel: +41-22-785.4132
    Professional World-Wide Web Services       Fax: +41-22-785.4133
    Mail: 38 chemin Grand Puits, CH-1217 Geneva-Meyrin, Switzerland


From: bobj@mcom.com (Bob Jung)
Date: Wed, 21 Dec 1994 11:38:00 -0800
Subject: [www-mling,00135] Re:  RE:  language tags in www
Message-Id: <199412211938.LAA27775@neon.mcom.com>


Ken Itakura did a great job compiling the list of mling related URLs.  Thanks!

In case anyone else had trouble accessing:
  <ftp://isi.edu/internet-drafts/draft-ietf-mailext-lang-tag-01.txt>

you can also access it at this host:
  <ftp://ds.internic.net/internet-drafts/draft-ietf-mailext-lang-tag-01.txt>

Ken>Frankly speaking, I don't know how we should start talking about this.
Ken>But I think it's an idea that putting my questions here as the starting
Ken>point.

Great starting point.  We're not at this stage yet, but I'd like this
discussion to eventually be summarized in a proposal.  As Jim A. Fetters
writes:

Jim>It's not beyond the capabilities of this group to develop such standards.
Jim>If we do not act to decide our own fate, then it will be decided FOR us
Jim>by others.

In reply to your questions, here are
my 2 cents (or 2 yen or whatever you use in your locale!):

Ken>1.      Can Unicode be the single code set for the WWW document?

I agree with Jim A. Fetters:
Jim>I don't think UNICODE is the best solution, rather I think
Jim>that the best solution is an open system which can incorporate
Jim>many character-sets which the tagging allows for.

As I wrote earlier, we need the data to identify its charset encoding.
(I purposely did not use the word "tag" because of its special meaning
in some contexts.)  As long as we can make that happen, we should not
care what the "charset" the data is encoded in.

For single "charset" data (which is basically all we have today), using
the MIME charset header may be sufficient.  At least my browser will know
not to render Latin1 data as SJIS.  Most of the popularly used charsets
are already registered MIME charset types.  But as Ken wrote:

Ken>the default code set can be specified by the MIME header. It's the
Ken>first step but it's not enough for multi-lingual document.
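
To make the single-charset case concrete, here is a small Python
sketch that reads the charset parameter from a MIME Content-Type
value and decodes the body accordingly.  It uses the standard
library's email.message.Message; the function names are
hypothetical, and a real browser would do this in its own parser.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(raw, content_type):
    """Decode raw bytes using the charset named in the MIME header."""
    charset = charset_from_content_type(content_type)
    if charset is None:
        raise ValueError("document does not identify its charset")
    return raw.decode(charset)


labelled = charset_from_content_type("text/html; charset=Shift_JIS")
text = decode_body(b"caf\xe9", "text/plain; charset=ISO-8859-1")
```

With the label present, Latin1 bytes are never misread as SJIS.  As
Ken notes, this covers only one charset per document.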

We might want to recommend that all browsers support Unicode, so that
content providers can be guaranteed that Unicode files will
be interpreted correctly.  But there remains the problem of guaranteeing
fonts are available for correct rendering.

Finally the pragmatic answer to 1.:  The horse is already out of the barn.
We already have WWW docs in ASCII, Latin1, JIS, SJIS, etc.  We will have
to support these encodings.

If we take the approach that Jim, I and others are suggesting, then
Unicode data becomes a subset of the supported encodings.

If we agree on that, then I would suggest the follow-on question:

1a.  What should the default charset be for files with unidentified charset?

Rather than hardcoding it to Unicode, I think this should be a user settable
preference.  In Japan, the most likely default might be JIS.  In France,
it might be Latin1.  Someday when NT is out there, some may want Unicode.
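
A sketch of the preference idea in Python, assuming the browser
keeps a user-settable default charset name.  resolve_charset and its
parameters are hypothetical names; codecs.lookup is the standard
library call that checks whether a charset name is supported.

```python
import codecs


def resolve_charset(declared, user_default="iso-8859-1"):
    """Use the declared charset if supported, else the user's default."""
    for candidate in (declared, user_default):
        if not candidate:
            continue
        try:
            return codecs.lookup(candidate).name
        except LookupError:
            continue  # unknown name, try the next candidate
    raise LookupError("neither declared nor default charset is usable")


# A user in Japan might set the default to JIS; one in France, Latin1.
japan = resolve_charset(None, user_default="iso-2022-jp")
france = resolve_charset(None, user_default="iso-8859-1")
```

A declared charset always wins; the preference only applies to files
that do not identify their encoding.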

Ken>2.      Is UTF better than canonical Unicode for WWW document?

"better" for what?  We need to clarify this.

From the existing technology perspective, it is going to be easy to
make the current browsers' parser & layout engines handle UTF.  Handling
canonical Unicode will take some work.

There are potential byte-ordering issues with canonical Unicode (but not with
UTF).  But there may be simple conventions that can resolve this.
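
The byte-ordering point can be illustrated in Python.  Two
assumptions: the "utf-16" codecs stand in for 16-bit canonical
Unicode, and "UTF" is taken to mean the byte-oriented form now
called UTF-8.  The byte-order mark shown at the end is one possible
convention, not necessarily the one meant above.

```python
import codecs

text = "A\u65e5"  # one ASCII letter and one kanji

big_endian = text.encode("utf-16-be")
little_endian = text.encode("utf-16-le")
byte_oriented = text.encode("utf-8")

# The same 16-bit data read with the wrong byte order gives wrong text.
misread = big_endian.decode("utf-16-le")

# One convention: prefix a byte-order mark so the reader can tell.
marked = codecs.BOM_UTF16_BE + big_endian
recovered = marked.decode("utf-16")
```

The byte-oriented form has a single serialization, so no such
convention is needed, and ASCII characters keep their one-byte
values.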

From the content provider perspective, what's "better"?  If many people
are using NT or other canonical Unicode-capable systems, will it be easy
for them to create and convert documents to UTF?

My gut tells me, we will have to handle both.  To a large degree it's
the content (and future content) that will be driving our design not vice
versa.  After all, it's only software. :-)

Pragmatically, browser providers (e.g., Netscape) may choose a phased
approach to support UTF first and add canonical Unicode later.

Ken>3.      Is ISO2022  suitable/enough for multi-lingual WWW document?

My current thinking is that this should be provided in the short term.
Because of the existing Japanese data in ISO2022-JP encoding, we need
to support that.  And there's not much more work to extend it to support
many of the additionally registered "charsets" (i.e., ISO8859-x).

Supporting ISO2022 does not require additional HTML tags and thus will
not introduce any HTML incompatibility.  It shouldn't conflict with
new proposals for HTML encoding/charset tags.
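
A short Python illustration of why no extra tags are needed, using
the standard "iso2022_jp" codec as a stand-in for a browser's own
decoder.  The charset switches travel in-band as escape sequences,
and every byte stays in the 7-bit range.

```python
ESC = b"\x1b"
TO_JIS_X_0208 = ESC + b"$B"  # switch to the two-byte kanji set
TO_ASCII = ESC + b"(B"       # switch back to ASCII

text = "A\u65e5\u672cB"  # ASCII, two kanji, ASCII
encoded = text.encode("iso2022_jp")

uses_escapes = TO_JIS_X_0208 in encoded and TO_ASCII in encoded
seven_bit_only = max(encoded) < 0x80
decoded = encoded.decode("iso2022_jp")
```

Markup around such text is plain ASCII, so an HTML parser only needs
to track the current shift state while scanning.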

Is it a long term solution?  I have not formed my opinion yet.  The
escape sequences are kinda ugly and Unicode seems more elegant.  On
the other hand, some folks in Japan have reservations about Unicode.

Ken>4.      Is 'bi-directional' independent from the code set?

Ken>5.      Is it acceptable to convert the (many) existing pages
Ken>        to conform to a new standard?

We should strive to support existing data.  But we should also provide
guidelines for new data ASAP.

Ken>6.      Should every browser support every code set in standard?

I don't think so, but... (see reply to 7)

Ken>7.      What should the browser do if it cannot display the character
Ken>        in the document correctly?

... it should degrade elegantly and at least warn the user if it cannot
support the code set or have the appropriate font for the current doc.
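
One way to read "degrade elegantly", sketched in Python.  The
function name and the list of problem messages are hypothetical, and
font availability is not modelled here.  Undecodable input becomes
U+FFFD replacement characters, and the caller gets something to show
the user as a warning.

```python
def decode_for_display(raw, charset):
    """Return (text, problems); never raise on bad or unknown input."""
    problems = []
    try:
        return raw.decode(charset), problems
    except LookupError:
        # The code set itself is not supported: show what ASCII we can.
        problems.append("unsupported charset: %s" % charset)
        return raw.decode("ascii", errors="replace"), problems
    except UnicodeDecodeError as err:
        # Supported code set, but some bytes are not valid in it.
        problems.append("undecodable byte at offset %d" % err.start)
        return raw.decode(charset, errors="replace"), problems


text, problems = decode_for_display(b"caf\xe9", "utf-8")
```

The document still renders, and the non-empty problems list is the
cue to warn the user.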

Ken>8.      Should the code conversion be done by the server?

It should not be required to.  Philosophically, I feel the server's job
is to send the bytes over to the browser client for the requested data.
It's the browser client's job to display it.

But, I think it would be nice if the browser could ***hint*** to the
server what it would like.

For example, if I'm on a Mac or Windows system, I'd like to be able
to hint to the server "send me SJIS".  If the server has a SJIS version
or is able to do the conversion, then great.  Otherwise, the server
would go ahead and send the JIS version.  Regardless of the hint,
the browser client would always look at what charset is being
returned and be ready to convert if necessary.

HTTP does have the "Accept-Language" request header, which might be
used for this hinting.
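
The hinting scheme can be sketched in Python as follows.  serve and
receive are hypothetical names, how the hint travels in the request
is left out, and charset names are assumed to be spelled the same
way on both sides.

```python
def serve(stored, hint=None, can_convert=True):
    """Server side.  stored maps charset name -> document bytes.

    Returns (charset, body).  The hint is honoured when possible,
    otherwise the first stored version is sent unchanged.
    """
    if hint in stored:
        return hint, stored[hint]
    if hint and can_convert:
        for charset, body in stored.items():
            try:
                return hint, body.decode(charset).encode(hint)
            except (LookupError, UnicodeError):
                continue  # cannot convert from this version
    charset = next(iter(stored))
    return charset, stored[charset]


def receive(charset, body):
    """Client side: always check the returned charset and convert."""
    return body.decode(charset)


document = "\u65e5\u672c"
stored = {"iso2022_jp": document.encode("iso2022_jp")}

converted = serve(stored, hint="shift_jis")
fallback = serve(stored, hint="shift_jis", can_convert=False)
```

In both cases receive recovers the same text, which is the point:
the hint is an optimization, never a requirement on the server.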

Ken>9.      The format of the code set tag.

No suggestion yet, but we should not consider code set in isolation.
Code sets are often closely related (Whether or not they should be!)
to other string stylistic characteristics (e.g., font, bold, underline).

We also need to study all the docs that Ken summarized to figure out the
good, the bad and the ugly of the other mling efforts!

Ken>Ken

Regards,
Bob

Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041


From: Ajay Shekhawat <ajay@cedar.buffalo.edu>
Date: Wed, 21 Dec 1994 12:13:20 -0500 (EST)
Subject: [www-mling,00134] Re: Unsubscribe
Message-Id: <199412211713.MAA07573@albali.cs.buffalo.edu>


I am seeing more and more of these UNSUBSCRIBE messages, so let me
further waste the bandwidth in the hope that we won't see any more of
these messages. Here goes:

If you want to UNSUBSCRIBE from this mailing list, DO NOT send mail
to "www-mling@square.ntt.jp", and DO NOT "reply" to any message that
comes along. Send mail to www-mling-request@square.ntt.jp instead.

Thank you, and I wish you a very Happy New Year.

Ajay
--
ajay@cs.Buffalo.EDU


From: "Chansup Byun" <byun@nas.nasa.gov>
Date: Wed, 21 Dec 1994 08:46:56 -0800
Subject: [none given]
Message-Id: <9412210846.ZM8400@win50.nas.nasa.gov>


Please unsubscribe me. Thank you.

Chansup Byun

-- 
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
* Chansup Byun, Ph.D.                   * Aplied Computational Aerodynamics * 
* MCAT Institute                        * byun@nas.nasa.gov                 *
* M/S 258-2, NASA Ames Research Center  * Phone: (415) 604-4526             *
* Moffett Field, CA  94035              * Fax  : (415) 604-4377             *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *