From: Larry Masinter <masinter@parc.xerox.com>
Date: Thu, 19 Jan 1995 17:45:42 PST
Subject: [www-mling,00187] Re: Language hints in UNICODE private use area
Message-Id: <95Jan19.174551pst.2760@golden.parc.xerox.com>
> In the context of HTML, the most sensible solution to the language tagging
> problem is to add an architectural form with inheritance semantics such as
> is adopted by TEI. Why reinvent the wheel?
Those who want to see this done could submit a concrete proposal to the
HTML working group for inclusion in HTML 3.0.
From: pandries@alis.ca
Date: Thu, 19 Jan 95 19:23:28 EST
Subject: [www-mling,00186] Re[2]: Language hints in UNICODE private use area
Message-Id: <9500197905.AA790564995@smtplink.alis.ca>
>This discussion originated from my *real* proposal which is to have
>Unicode be the core character set that every browser should
>understand. This other issue is of less import, but without
>something, the Japanese will be reluctant to accept Unicode. Having
>Unicode be the common character set does *not* mean that iso8859-1
>could not be used. The Accept-Charset: parameter (which should
>appear in HTTP 1.1) and the charset= parameter on the text/html MIME
>type will allow character set negotiation.
The need for language tags in HTML goes beyond the simple Unicode CJK
rendering problem. I don't deny that it would be useful to have
presentational hints in Unicode to accelerate its acceptance in Japan.
However, this is a different problem from the one we are trying to
solve here. We need language tags that are independent of the charset
(and thus outside Unicode) to do any kind of useful processing on
multilingual data other than CJK representation (spelling,
hyphenation, indexing, translation...). These tags should be available
not only in new Unicode documents but also in all the other documents
still coded in ISO-8859-1.
>We are not discussing the interpretation of Unicode characters, but
>rather the transfer encoding of text/html and other textual data
>sent via http.
And therefore the definition of language tags should not be restricted
to Unicode.
Patrick Andries
Alis Technologies
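The charset negotiation mentioned in the quoted message (an Accept-Charset request header plus the charset= parameter on the response) can be sketched on the server side as follows. This is a minimal illustration, not any server's actual code: the q-value syntax is borrowed from the later HTTP/1.1 specification, and the function names parse_accept_charset and choose_charset, like the sample charset labels, are hypothetical. As a simplification, charsets the client does not list are treated as unacceptable unless "*" appears.

```python
from __future__ import annotations


def parse_accept_charset(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Charset value into (charset, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # no q parameter means full preference
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((name, q))
    return prefs


def choose_charset(header: str, available: list[str]) -> str | None:
    """Pick the available charset the client rates highest (q > 0).

    Simplification (assumption): unlisted charsets get q = 0 unless "*"
    is present, in which case they get the q given for "*".
    """
    prefs = dict(parse_accept_charset(header))
    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best
```

With "unicode-1-1;q=0.8, iso-8859-1" and both charsets available, the sketch picks ISO-8859-1, since the client rates it higher; a server would then label its response with the matching charset= parameter.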
From: Glenn Adams <glenn@stonehand.com>
Date: Thu, 19 Jan 95 15:31:19 -0500
Subject: [www-mling,00185] Re: Language hints in UNICODE private use area
Message-Id: <9501192031.AA08644@trubetzkoy.stonehand.com>
Thanks for restating the Unicode position on language tags. Your
statement is on the mark.
> Date: Fri, 20 Jan 95 05:01:34 +0900
> From: pandries@alis.ca
> 4) I have the impression that this may not be the proper forum
> (html-wg, http-wg) to discuss changes of interpretation of Unicode
> characters or codes. I am not convinced that these changes will easily
> be accepted by the Unicode consortium. It might be much easier to
> create an html tag for this purpose.
Although I haven't seen the proposed changes, I can state authoritatively
that the Unicode Technical Committee *will not* agree to the use of
standardized Unicode code points for the purpose of denoting language.
If you wish to use the private use codes for this purpose, then that is
up to you. But keep in mind that private use codings are, by definition,
internal codes not available for interchange except among private parties.
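To illustrate why private use codes are unsuitable for interchange, here is a minimal sketch in which two parties privately agree that two Private Use Area code points (in the range U+E000 to U+F8FF) mark Japanese and Chinese text. The mapping PRIVATE_AGREEMENT and the function names are hypothetical. A receiver without the agreement can tell only that a character is private use (Unicode general category "Co") and must treat it as opaque.

```python
from __future__ import annotations

import unicodedata

# Hypothetical private agreement between two parties: these Private Use
# Area code points mark language hints. No standard assigns them meaning.
PRIVATE_AGREEMENT = {"\ue000": "ja", "\ue001": "zh"}


def is_private_use(ch: str) -> bool:
    """True if ch is a private-use code point (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def interpret(text: str, agreement: dict[str, str] | None = None) -> list[str]:
    """Return the language hints a receiver can recover from text.

    With the agreement, each private-use marker maps to a language code;
    without it, the receiver only knows the character is private use and
    records it as opaque ("?").
    """
    hints = []
    for ch in text:
        if is_private_use(ch):
            if agreement is not None and ch in agreement:
                hints.append(agreement[ch])
            else:
                hints.append("?")
    return hints


def strip_private_use(text: str) -> str:
    """Drop private-use code points before public interchange."""
    return "".join(ch for ch in text if not is_private_use(ch))
```

The same string yields ["ja", "zh"] for a party holding the agreement but only ["?", "?"] for anyone else, which is the interchange problem described above.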
In the context of HTML, the most sensible solution to the language tagging
problem is to add an architectural form with inheritance semantics such as
is adopted by TEI. Why reinvent the wheel?
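A minimal sketch of what inheritance semantics means for language tagging: each element takes its language from its own attribute, or else from its nearest ancestor that has one. The attribute name lang, the default "und" (the ISO 639-2 code for "undetermined"), and the function name effective_languages are assumptions for illustration; this is not TEI's actual definition.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def effective_languages(root: ET.Element, default: str = "und") -> list[tuple[str, str]]:
    """Pair each element's tag with its effective language, in document order.

    An element uses its own 'lang' attribute (hypothetical name) if present,
    otherwise the value inherited from its nearest ancestor that has one.
    """
    result = []

    def walk(elem: ET.Element, inherited: str) -> None:
        lang = elem.get("lang", inherited)  # own value overrides inherited
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)  # children inherit this element's language

    walk(root, default)
    return result
```

For <text lang="en"><p>Hello</p><p lang="fr">Bonjour <q>salut</q></p></text>, the first p inherits "en", while the nested q inherits "fr" from its parent without carrying an attribute of its own.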
Regards,
Glenn Adams
From: Glenn Adams <glenn@stonehand.com>
Date: Thu, 19 Jan 95 15:59:08 -0500
Subject: [www-mling,00184] Re: Language hints in UNICODE private use area
Message-Id: <9501192059.AA08656@trubetzkoy.stonehand.com>
By the way, someone recently referred to a document by Masataka Ohta
proposing an extension of Unicode called ICODE, and to a file describing
this proposal that I put on my FTP site.
I should say that I strongly disagree with the approach advocated
in this proposal and that this view is shared by pretty much all Unicode
implementers. Furthermore, this proposal indicates a lack of understanding
regarding the issues of bidirectional text processing.
My putting that proposal on my FTP archive does not mean I agree with
or advocate the proposal. In fact, my original intention in putting it there
was to write a contra-ICODE document detailing why it was misconceived, etc.
Unfortunately I haven't found the time to do that yet.
Regards,
Glenn Adams
From: pandries@alis.ca
Date: Thu, 19 Jan 95 14:59:54 EST
Subject: [www-mling,00183] Language hints in UNICODE private use area
Message-Id: <9500197905.AA790549215@smtplink.alis.ca>
David Goldsmith recently commented:
>Section 3.2.4
>This is probably the section I have the most problem with. Unicode
>specifically was designed with the idea that attributes such as
>language, fonts, etc. would be encoded out-of-band, via high level tags
>or even out of the character stream entirely.
I wholeheartedly agree with David's point. I have great reservations
about the idea of using the UNICODE private use area for encoding
language hints. I have four main reasons:
1) As David mentioned, Unicode was explicitly designed not to address
this issue.
2) How do you generalise this idea to encodings where there are no
bytes left for language hinting? I can write French, Dutch, English
and German, for instance, using ISO-8859-1: do I have to use Unicode
even in a purely European setting just so that I can tag texts? And
what about the fact that most of the text available today is in
ISO Latin-1?
3) It is easy to add, in an upward-compatible fashion, a tag called,
for example, <lang=...>. Browsers that do not understand the tag will
simply ignore it.
4) I have the impression that this may not be the proper forum
(html-wg, http-wg) to discuss changes of interpretation of Unicode
characters or codes. I am not convinced that these changes will easily
be accepted by the Unicode consortium. It might be much easier to
create an html tag for this purpose.
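Point 3 can be illustrated with a minimal sketch of two renderers for the hypothetical <lang=...> tag. The exact syntax was never specified here, so <lang=xx>...</lang>, the regular expression TAG, and the function names are assumptions. The older renderer ignores tags it does not understand and keeps their text; the newer one tracks the current language with a stack, so nested tags inherit and restore correctly. Because the tags are plain ASCII, the same markup works in ISO-8859-1 documents, which bears on point 2.

```python
from __future__ import annotations

import re

# Matches the hypothetical <lang=xx> and </lang> tags, plus any other tag.
# Group 1: "/" for a closing tag; group 2: tag name; group 3: value after "=".
TAG = re.compile(r"<(/?)([a-zA-Z]+)(?:=([^>\s]+))?[^>]*>")


def render_old(html: str) -> str:
    """Model a browser that ignores tags it does not understand.

    Every tag is dropped and the text between tags is kept.
    """
    return TAG.sub("", html)


def render_new(html: str, default: str = "und") -> list[tuple[str, str]]:
    """Model a browser that honours <lang=...>: return (language, text) runs."""
    stack = [default]  # current language is always stack[-1]
    runs = []
    pos = 0
    for m in TAG.finditer(html):
        if m.start() > pos:
            runs.append((stack[-1], html[pos:m.start()]))
        closing, name, value = m.group(1), m.group(2).lower(), m.group(3)
        if name == "lang":
            if closing and len(stack) > 1:
                stack.pop()  # restore the enclosing language
            elif not closing and value:
                stack.append(value)
        pos = m.end()
    if pos < len(html):
        runs.append((stack[-1], html[pos:]))
    return runs
```

For "Hello <lang=fr>Bonjour <b>ami</b></lang> again", the old renderer yields the plain text "Hello Bonjour ami again", while the new one marks "Bonjour " and "ami" as fr and the surrounding text as und.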
Patrick Andries
Alis Technologies Inc.
1+514+738 91 71