From: yergeau@alis.ca
Date: Tue, 27 Dec 94 14:14:21 EST
Subject: [www-mling,00142] Re: language tags in www
Message-Id: <9411277885.AA788560393@smtplink.alis.ca>


I just found this list, and such interesting things seem to be happening.  
Perhaps I can chime in with my 2 cents.

 bobj@mcom.com (Bob Jung) writes:
 >
 > [...deletia...]
 >
 >Ken>3.      Is ISO2022  suitable/enough for multi-lingual WWW document?
 >
 >My current thinking is that this should be provided in the short 
 >term. Because of the existing Japanese data in ISO2022-JP encoding, 
 >we need to support that.  And there's not much more work to extend it 
 >to support many of the additionally registered "charsets" (i.e. 
 >ISO8859-x).

...as well as, for that matter, UTF-7.
 
 >Supporting ISO2022 does not require additional HTML tags and thus will 
 >not introduce any HTML incompatibility.  It shouldn't conflict with
 >new proposals for HTML encoding/charset tags.

I may be wrong, but doesn't that suppose that the browsers have to *assume* 
ISO-2022?  Is this acceptable?

There's also the issue of robustness.  Being modal, ISO-2022 text is vulnerable 
to the loss or corruption of just a tiny fraction of the bytes: damage one 
escape sequence and everything up to the next one is misread.
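
To make the robustness point concrete, here is a minimal sketch (in Python, 
using its standard "iso2022_jp" codec; the sample string is my own 
assumption).  It drops a single byte, the ESC that starts the switch into 
the Japanese character set, and decodes what is left:

```python
# Sketch: effect of losing one byte in modal ISO-2022-JP text.
# The sample string is hypothetical; any text starting in JIS X 0208 works.
text = "日本語 text"

encoded = text.encode("iso2022_jp")

# Drop the first byte: the ESC that begins the switch into JIS X 0208.
damaged = encoded[1:]

# Without the ESC, the decoder stays in ASCII mode, so the two-byte
# Japanese codes are read as unrelated ASCII characters.
garbled = damaged.decode("iso2022_jp", errors="replace")

print(repr(text))
print(repr(garbled))
```

One byte out of the whole stream is lost, yet every Japanese character in 
the run is misread, not just one.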
 
 >Ken>4.      Is 'bi-directional' independent from the code set?

Yes!
 
 >Ken>5.      Is it acceptable that converting existing (many) pages 
 >Ken>        to conform new standard?

Reality check:  it would not happen, IMHO, so it would be useless to 
request it.  For the most part, we have to live with the 'installed 
base'.
 
 >Ken>6.      Should every browser support every code set in standard?

All those in RFC 1700?  Tall order!

 >Ken>7.      What should the browser do if it cannot display the character 
 >Ken>        in the document correctly?
 >
 >... it should degrade elegantly and at least warn the user if it cannot 
 >support the code set or have the appropriate font for the current doc.

A nice, sophisticated browser could offer fallback strategies, like 
transliteration (e.g. Cyrillic -> Latin) or transcription (e.g. Chinese -> 
Pinyin).
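
As a rough illustration of such a fallback, here is a sketch in Python.  
The mapping table is a simplified, hypothetical romanization (not any 
particular standard), and the function names are my own:

```python
# Sketch: transliterate Cyrillic to Latin when the display charset
# cannot represent the text.  The table is a simplified, hypothetical
# romanization, not a standard one.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ы": "y",
    "э": "e", "ю": "yu", "я": "ya", "ь": "'", "ъ": '"',
}


def transliterate(text):
    """Replace each known Cyrillic letter; leave everything else as is."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


def text_for_display(text, charset):
    """Return text unchanged if charset can show it, else transliterate."""
    try:
        text.encode(charset)
        return text
    except UnicodeEncodeError:
        return transliterate(text)


print(text_for_display("Москва", "latin-1"))  # Moskva
```

A real browser would presumably also warn the user that what is shown is a 
fallback rendering and not the original text.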

 >Ken>8.      Should the code conversion is done by the server?
 >
 >It should not be required to.

Right.  Requiring it would put every existing server out of business.

 >Philosophically, I feel the server's job
 >is to send the bytes over to the browser client for the requested data. 
 >It's the browser client's job to display it.
 >
 >But, I think it would be nice if the browser could ***hint*** to the 
 >server what it would like.
 
And it would be even nicer if the client knew it had a better than even 
chance of being granted a request for some lingua franca, like Unicode.  
If servers generally support that, clients do not have to be fat, 
support-every-encoding-in-the-world super-apps.

 >HTTP does have the "Accept-Language" request header, that might be 
 >used for this hinting.

That, IMHO, would be a big mistake.  Language is a separate issue from 
charset (though the two are related, of course).  I can write French in 
ISO-8859-1, CP437, CP850, CP863, the Macintosh charset and even, with 
some compromise, in ASCII or EBCDIC.  Likewise, ISO-8859-1 can be 
used to represent English, French, Spanish, Italian, Icelandic, etc.
If I use the Accept-Language header to ask for ISO-8859-1 from a 
Finnish server, do I get the Finnish version of the document, which I 
can't read?  If I ask for English, do I get EBCDIC?  I think we need 
another header.
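
Here is a sketch of what I mean, in Python.  Everything in it is an 
assumption made for illustration: the header name "Accept-Charset", the 
variant table, and the matching rules (no preference weights, first match 
wins).  The point is only that language and charset are matched 
independently:

```python
# Sketch: negotiate language and charset with two separate headers.
# The "Accept-Charset" name, the variants and the matching rules are
# assumptions for illustration only.
VARIANTS = {
    ("fi", "iso-8859-1"): "doc.fi.latin1",
    ("en", "iso-8859-1"): "doc.en.latin1",
    ("en", "ebcdic-cp-us"): "doc.en.ebcdic",
}


def parse_list(value):
    """Split a comma-separated header value into lowercase tokens."""
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def choose_variant(headers, variants=VARIANTS):
    """Return the first variant satisfying both headers, or None."""
    languages = parse_list(headers.get("Accept-Language", ""))
    charsets = parse_list(headers.get("Accept-Charset", ""))
    for (lang, charset), name in variants.items():
        lang_ok = not languages or lang in languages
        charset_ok = not charsets or charset in charsets
        if lang_ok and charset_ok:
            return name
    return None


# Charset alone leaves the language open: the Finnish version comes back.
print(choose_variant({"Accept-Charset": "iso-8859-1"}))
# Both headers together pin down the readable variant.
print(choose_variant({"Accept-Language": "en",
                      "Accept-Charset": "iso-8859-1"}))
```

With a single header doing double duty, the server has no way to tell the 
two requests above apart.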


-- 
Francois Yergeau  --  <yergeau@alis.ca>
Alis Technologies Inc., Montreal
+1-514-738-9171