From: Larry Masinter <masinter@parc.xerox.com>
Date: Thu, 19 Jan 1995 17:45:42 PST
Subject: [www-mling,00187] Re:  Language hints in UNICODE private use area
Message-Id: <95Jan19.174551pst.2760@golden.parc.xerox.com>


> In the context of HTML, the most sensible solution to the language tagging
> problem is to add an architectural form with inheritance semantics such as
> is adopted by TEI.  Why reinvent the wheel?

Those who want to see this done could make a concrete proposal for
inclusion in HTML 3.0 to the html working group.


From: pandries@alis.ca
Date: Thu, 19 Jan 95 19:23:28 EST
Subject: [www-mling,00186] Re[2]: Language hints in UNICODE private use area
Message-Id: <9500197905.AA790564995@smtplink.alis.ca>




     
     
>This discussion originated from my *real* proposal which is to have 
>Unicode be the core character set that every browser should 
>understand. This other issue is of less import, but without 
>something, the Japanese will be reluctant to accept Unicode. Having 
>Unicode be the common character set does *not* mean that iso8859-1 
>could not be used. The Accept-Charset: parameter (which should 
>appear in http 1.1) and the charset= parameter on the text/html mime 
>type will provide ways of allowing character set negotiation.
     
     The need for language tags in html exceeds the simple unicode CJK 
     rendering problem. I don't deny that it would be useful to have 
     presentational hints in Unicode to accelerate its acceptance in Japan. 
     However this is a different problem than the one we are trying to 
     solve here. We need language tags that are independent of the charset 
     (thus outside Unicode) to do any kind of useful processing on the 
     multilingual data besides the CJK representation (spelling, 
     hyphenating, indexing, translating...). These tags should not only be 
     available in the new Unicode documents but also in all those other 
     documents still coded in ISO-8859-1. 
     
>We are not discussing the interpretation of Unicode characters, but 
>rather the transfer encoding of text/html and other textual data 
>sent via http.
     
     And therefore the definition of language tags should not be restricted 
     to Unicode.
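     
     A minimal sketch of the point, assuming a hypothetical lang 
     attribute on <p> (not an agreed HTML syntax): the same tagged 
     markup yields the same language runs whether the bytes arrive as 
     ISO-8859-1 or as a Unicode encoding. UTF-8 stands in for the 
     Unicode transfer encoding here purely for illustration.

```python
import re

# Hypothetical markup: a two-letter language code carried in a "lang" attribute.
LANG_RUN = re.compile(r'<p lang="([a-z]{2})">(.*?)</p>', re.S)

def tagged_runs(raw: bytes, charset: str):
    """Decode the bytes with the declared charset, then read the language tags.

    The tags live in the markup, so nothing here depends on which
    character encoding carried the document.
    """
    text = raw.decode(charset)
    return LANG_RUN.findall(text)

# "deja vu" and "Grosse" with their accents, written as escapes.
markup = '<p lang="fr">d\u00e9j\u00e0 vu</p><p lang="de">Gr\u00f6\u00dfe</p>'

from_latin1 = tagged_runs(markup.encode("iso-8859-1"), "iso-8859-1")
from_utf8 = tagged_runs(markup.encode("utf-8"), "utf-8")

# Both encodings give the same (language, text) pairs.
print(from_latin1 == from_utf8)
```

     A spelling checker or hyphenator could then be chosen per run from 
     the language code alone.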
     
  Patrick Andries
  Alis Technologies


From: Glenn Adams <glenn@stonehand.com>
Date: Thu, 19 Jan 95 15:31:19 -0500
Subject: [www-mling,00185] Re:  Language hints in UNICODE private use area
Message-Id: <9501192031.AA08644@trubetzkoy.stonehand.com>

Thanks for restating the Unicode position re: language tags.  Your
statement is on the mark.

  Date: Fri, 20 Jan 95 05:01:34 +0900
  From: pandries@alis.ca

     4) I have the impression that this may not be the proper forum
     (html-wg, http-wg) to discuss changes of interpretation of Unicode
     characters or codes. I am not convinced that these changes will easily
     be accepted by the Unicode consortium. It might be much easier to
     create an html tag for this purpose.

Although I haven't seen the proposed changes, I can state authoritatively
that the Unicode Technical Committee *will not* agree to the use of
standardized Unicode code points for the purpose of denoting language.
If you wish to use the private use codes for this purpose, then that is
up to you.  But keep in mind that private use codings are, by definition,
internal codes not available for interchange except among private parties.

In the context of HTML, the most sensible solution to the language tagging
problem is to add an architectural form with inheritance semantics such as
is adopted by TEI.  Why reinvent the wheel?
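
As a rough sketch of what inheritance semantics could look like (the
attribute name lang and the default value are assumptions made for
illustration, not a proposed syntax): an element that carries no language
attribute of its own takes the language of its nearest tagged ancestor.

```python
import xml.etree.ElementTree as ET

def effective_languages(root, default="en"):
    """Return (tag, text, language) for every element that has leading text.

    An element without its own "lang" attribute inherits the value of its
    nearest ancestor; the root falls back to `default`. Tail text (text that
    follows a child's end tag) is ignored to keep the sketch short.
    """
    rows = []

    def walk(elem, inherited):
        lang = elem.get("lang", inherited)   # own value, else inherited
        text = (elem.text or "").strip()
        if text:
            rows.append((elem.tag, text, lang))
        for child in elem:
            walk(child, lang)                # children inherit our language

    walk(root, default)
    return rows

doc = ET.fromstring(
    '<body lang="fr">'
    '<p>Bonjour <em lang="en">hello</em></p>'
    '<p lang="de">Guten Tag</p>'
    '</body>'
)

for row in effective_languages(doc):
    print(row)
```

Only the root and the exceptions need a tag; the first paragraph is French
without saying so itself.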

Regards,
Glenn Adams


From: Glenn Adams <glenn@stonehand.com>
Date: Thu, 19 Jan 95 15:59:08 -0500
Subject: [www-mling,00184] Re:  Language hints in UNICODE private use area
Message-Id: <9501192059.AA08656@trubetzkoy.stonehand.com>



By the way, someone recently referred to a document by Masataka Ohta
proposing an extension of Unicode called ICODE, and referred to a file which
I put on my FTP site describing this proposal.

I should say that I strongly disagree with the approach advocated
in this proposal and that this view is shared by pretty much all Unicode
implementers.  Furthermore, this proposal indicates a lack of understanding
regarding the issues of bidirectional text processing.

My putting that proposal on my FTP archive does not mean I agree with
or advocate the proposal.  In fact, my original intention in putting it there
was to write a contra-ICODE document detailing why it was misconceived, etc.
Unfortunately I haven't found the time to do that yet.

Regards,
Glenn Adams


From: pandries@alis.ca
Date: Thu, 19 Jan 95 14:59:54 EST
Subject: [www-mling,00183] Language hints in UNICODE private use area
Message-Id: <9500197905.AA790549215@smtplink.alis.ca>


     
     David Goldsmith recently commented :
     
>Section 3.2.4
>This is probably the section I have the most problem with. Unicode 
>specifically was designed with the idea that attributes such as 
>language, fonts, etc. would be encoded out-of-band, via high level tags
>or even out of the character stream entirely.
     
     I wholeheartedly agree with David's thought. I have great 
     reservations about the idea of using the UNICODE private use area for 
     encoding language hints. I have basically four reasons :
     
     1) As David mentioned, Unicode was explicitly designed not to address 
     this issue.
     
     2) How do you generalise this idea to encodings where there are no 
     bytes left for language hinting ? I can write French, Dutch, English 
     and German, for instance, using ISO-8859-1 : do I have to use Unicode 
     even in a purely European setting so that I can tag texts ? What about 
     the fact that today the text base available is mainly in ISO-latin-1 ?
     
     3) It is easy to add, in upward compatible fashion, a tag called, for 
     example, <lang=...>. Browsers that do not understand the tag will 
     simply ignore it.
     
     4) I have the impression that this may not be the proper forum 
     (html-wg, http-wg) to discuss changes of interpretation of Unicode 
     characters or codes. I am not convinced that these changes will easily 
     be accepted by the Unicode consortium. It might be much easier to 
     create an html tag for this purpose. 
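     
     Point 3 can be sketched as follows. The element form 
     <lang code="..."> and both renderer classes are hypothetical, chosen 
     only so that a stock parser accepts the markup; the idea is that a 
     browser which does not know the tag still shows the text, while one 
     that does know it recovers the language of each run.

```python
from html.parser import HTMLParser

class LegacyRenderer(HTMLParser):
    """Stands in for a browser that has never heard of <lang>."""

    def __init__(self):
        super().__init__()
        self.out = []

    # No handle_starttag/handle_endtag overrides: every tag is skipped,
    # known or not, and only the character data is kept.
    def handle_data(self, data):
        self.out.append(data)

class LangAwareRenderer(HTMLParser):
    """Stands in for a browser that understands the hypothetical tag."""

    def __init__(self):
        super().__init__()
        self.langs = [None]   # stack of languages; None means "not stated"
        self.runs = []        # (language, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "lang":
            self.langs.append(dict(attrs).get("code"))

    def handle_endtag(self, tag):
        if tag == "lang" and len(self.langs) > 1:
            self.langs.pop()

    def handle_data(self, data):
        self.runs.append((self.langs[-1], data))

sample = '<p>Hello, <lang code="fr">bonjour</lang>!</p>'

legacy = LegacyRenderer()
legacy.feed(sample)
legacy.close()
print("".join(legacy.out))

aware = LangAwareRenderer()
aware.feed(sample)
aware.close()
print(aware.runs)
```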
     
     
     
     
     Patrick Andries
     Alis Technologies Inc.
     1+514+738 91 71