From: yergeau@alis.ca
Date: Tue, 27 Dec 94 14:14:21 EST
Subject: [www-mling,00142] Re: language tags in www
Message-Id: <9411277885.AA788560393@smtplink.alis.ca>

I just found this list, and such interesting things seem to be happening. Perhaps I can chime in with my 2 cents.

bobj@mcom.com (Bob Jung) writes:
>
> [...deletia...]
>
>Ken>3. Is ISO2022 suitable/enough for multi-lingual WWW document?
>
>My current thinking is that this should be provided in the short
>term. Because of the existing Japanese data in ISO2022-JP encoding,
>we need
>to support that. And there's not much more work to extend it to support
>many of the additionally registered "charsets" (i.e.. ISO8859-x).

...as well as, for that matter, UTF-7.

>Supporting ISO2022 does not require additional HTML tags and thus will
>to introduce any HTML incompatibility. It shouldn't conflict with
>new proposals for HTML encoding/charset tags.

I may be wrong, but doesn't that suppose that the browsers have to *assume* ISO-2022? Is this acceptable?

There's also the issue of robustness. Being modal, ISO-2022 text is vulnerable to the loss or corruption of just a tiny fraction of the bytes: once a mode-switching escape sequence is damaged, the bytes that follow are read in the wrong mode.

>Ken>4. Is 'bi-directional' independent from the code set?

Yes!

>Ken>5. Is it acceptable that converting existing (many) pages
>Ken> to conform new standard?

Reality check: it would not happen, IMHO, so it would be useless to request it. For the most part, we have to live with the 'installed base'.

>Ken>6. Should every browser support every code set in standard?

All those in RFC 1700? Tall order!

>Ken>7. What should the browser do if it cannot display the character
>Ken> in the document correctly?
>
>... it should degrade elegantly and at least warn the user if it cannot
>support the code set or have the appropriate font for the current doc.

A nice, sophisticated browser could offer fallback strategies, like transliteration (e.g. Cyrillic -> Latin) or transcription (e.g. Chinese -> Pinyin).

>Ken>8. Should the code conversion is done by the server?
>
>It should not be required to.

Right. Requiring it would put every existing server out of business.

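To make the robustness point about ISO-2022 above a bit more concrete, here is a minimal Python sketch. It assumes Python's standard `iso2022_jp` codec; the sample text and the `drop_byte` helper are made up for illustration. It removes a single byte (the leading ESC) from an ISO-2022-JP stream and decodes what is left.

```python
# Illustrative only: the sample text and the drop_byte helper are
# assumptions, not something taken from the discussion itself.

def drop_byte(data: bytes, index: int) -> bytes:
    """Simulate transmission loss by removing the byte at `index`."""
    return data[:index] + data[index + 1:]

text = "日本語"
encoded = text.encode("iso2022_jp")

# The kanji run is bracketed by escape sequences: ESC $ B switches to
# JIS X 0208, and ESC ( B switches back to ASCII at the end.
print(encoded)

intact_text = encoded.decode("iso2022_jp")

# Lose just the leading ESC byte. The decoder never enters two-byte
# mode, so the remaining bytes are no longer read as kanji.
damaged = drop_byte(encoded, 0)
damaged_text = damaged.decode("iso2022_jp", errors="replace")

print(repr(intact_text))
print(repr(damaged_text))
```

One lost byte out of `len(encoded)` is enough to lose every character of the run, which is the sense in which a modal encoding is fragile.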
>Philosophically, I feel the server's job
>is to send the bytes over to the browser client for the requested data.
>It's the browser client's job to display it.
>
>But, I think it would be nice if the browser could ***hint*** to the
>server what it would like.

And it would be even nicer if the client knew it had a better than even chance of being granted a request for some lingua franca, like Unicode. If servers generally support that, clients do not have to be fat, support-every-encoding-in-the-world super-apps.

>HTTP does have the "Accept-Language" request header, that might be
>used for this hinting.

That, IMHO, would be a big mistake. Language is separate (but related, of course) from charset issues. I can write French in ISO-8859-1, CP437, CP850, CP863, the Macintosh charset and even, with some compromise, in ASCII or EBCDIC. Likewise, ISO-8859-1 can be used to represent English, French, Spanish, Italian, Icelandic, etc. If I use the Accept-Language header to ask for ISO-8859-1 from a Finnish server, do I get the Finnish version of the document, which I can't read? If I ask for English, do I get EBCDIC? I think we need another header.

--
Francois Yergeau -- <yergeau@alis.ca>
Alis Technologies Inc., Montreal
+1-514-738-9171
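
P.S. A rough Python sketch of the language-versus-charset point, for anyone who wants to try it. It assumes Python's standard codecs (`latin_1`, `cp437`, `cp850`, `cp863`, `mac_roman`, `cp037` as one EBCDIC variant); the sample phrases and variable names are made up for illustration. It shows one French phrase surviving several charsets with different bytes, and one charset carrying several languages.

```python
import unicodedata

# Illustrative only: the phrases and names below are assumptions.
french = "Où est le café ?"

# One language, many charsets: each codec round-trips the phrase,
# but the bytes on the wire differ.
charsets = ["latin_1", "cp437", "cp850", "cp863", "mac_roman"]
encoded = {name: french.encode(name) for name in charsets}
for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")

# "With some compromise" in ASCII: drop the accents.
decomposed = unicodedata.normalize("NFD", french)
ascii_fallback = decomposed.encode("ascii", errors="ignore").decode("ascii")
print(ascii_fallback)

# One charset, many languages: ISO-8859-1 covers all of these.
samples = {
    "English": "Where is the cafe?",
    "French": french,
    "Spanish": "El niño",
    "Italian": "Perché no?",
    "Icelandic": "Þetta er íslenska",
}
latin1 = {lang: s.encode("latin_1") for lang, s in samples.items()}

# And English itself says nothing about the bytes: ASCII vs. EBCDIC.
hello_ascii = "Hello".encode("ascii")
hello_ebcdic = "Hello".encode("cp037")
```

Since neither value determines the other, a request would need to name the language and the charset separately.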