From: sznb67@sun32.hqs.mid.gmeds.com (Lin Qun 986-2454)
Date: Wed, 21 Dec 1994 08:45:56 +0500
Subject: [www-mling,00132] Re: The future of Multilingual WWW
Message-Id: <199412211347.AA06724@gmlink.gmeds.com>
Please sign me off! Thank you!
Qun Lin
From: Axel Belinfante <Axel.Belinfante@cs.utwente.nl>
Date: Wed, 21 Dec 1994 13:02:25 +0100
Subject: [www-mling,00131] Re: language tags in www
Message-Id: <9412211202.AA22528@utis179.cs.utwente.nl>
Jim A. Fetters" <fetters@enuxsa.eas.asu.edu> wrote:
> I don't think UNICODE is the best solution; rather, I think
> that the best solution is an open system which can incorporate
> many character-sets, which the tagging allows for.
Agreed (note: I know very little in this field, but openness sounds good :-)
(and I do think Unicode may be one of the 'more useful' character sets used
in such an open system)
> In addition to the problem of character sets, the HTTPd
> server should recognize the capabilities of the browser.
> If the browser can display Japanese, the server should then provide
> the information in Japanese. There is a problem with a single
> HTML document in several languages. Currently, there is no way
> to provide the information in various languages other than to
> provide links to separate language pages: Japanese, Korean, etc.
> There needs to be a way to specify one HTML document where the
> server sends the appropriate languages to the browser based on the
> user's/browser's capabilities or preferences.
Have a look at 'Accept-Language' in
- The internet draft of HTTP specification
<ftp://ds.internic.net/internet-drafts/draft-fielding-http-spec-00.txt>
In the 'unsupported' directory of the 3.0o beta release of the Plexus http
server there is some code that implements (parts of?) it.
This has also been implemented in libwww 2.17 and CERN httpd 3.0.
In these implementations, it is possible to have the browser ask for a
document that has a '.multi' suffix, which triggers the server to look at the
Accept:
Accept-Language:
Accept-Encoding:
lines in the header of the request to return an 'appropriate' document.
[Disclaimer: I have no experience with this, although I would like to
experiment with it - I do have a document in a couple of languages]
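For illustration, the selection such a server might do could look roughly like
the Python sketch below; the variant naming scheme (doc.html.en / doc.html.ja)
and the function name are only my assumptions, not the actual Plexus/CERN code:

    import os

    def pick_variant(base_path, accept_language, default="en"):
        # accept_language is the raw header value, e.g. "ja, en; q=0.5";
        # take the language tags in the order the browser listed them
        tags = [part.split(";")[0].strip().lower()
                for part in accept_language.split(",") if part.strip()]
        for tag in tags + [default]:
            candidate = "%s.%s" % (base_path, tag)   # e.g. doc.html.ja
            if os.path.exists(candidate):
                return candidate
        return None   # no variant found: send an error or a language menu

    # a request for /doc.multi with "Accept-Language: ja, en" would then
    # get the contents of doc.html.ja, if that file exists on the server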
> We should discuss standards for developing:
[...]
> 2. multi-lingual document serving: a single HTML document which
> responds to the browser's capabilities/preferences
I don't have a pointer ready, but the 'multi-format' work is definitely
something to look into.
PS: in another message [www-mling,00126] I mentioned 9term and sam.
They do display Unicode, but it is not completely clear to me how one can
easily _enter_ 'funny' characters when using them.
Axel.
<Axel.Belinfante@cs.utwente.nl> tel. +31 53 893774 fax. +31 53 333815
University of Twente, Dept. of C.S., Tele-Informatics & Open Systems Group
P.O. Box 217 NL-7500 AE Enschede The Netherlands
"ili ne sciis ke estas neebla do ili simple faris" -- Loesje
From: CHEN YiLong <cyl@ifcss.org>
Date: Wed, 21 Dec 1994 05:53:55 -0400 (EST)
Subject: [www-mling,00130] news/gopher+ i18n support?
Message-Id: <Pine.PCW.3.91.941221054908.10223B-100000@PPP-71-17.BU.EDU>
I was wondering whether WWW browsers should support the gopher+ protocol,
especially the internationalisation features of gopher 2.1.1,
where different language pages of the same item can be selected.
Also, should WWW browsers post/read news with support for the MIME charset
parameter and Content-Transfer-Encoding? I guess this also goes
for the mailing capabilities in WWW browsers.
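For what it's worth, here is a minimal sketch of what a charset-tagged article
or mail body would carry, using Python's email library purely for illustration;
the charset value is just an example:

    from email.mime.text import MIMEText

    # "utf-8" could equally be iso-2022-jp, euc-kr, koi8-r, ...
    msg = MIMEText(u"body with non-ASCII text: caf\u00e9 \u65e5\u672c\u8a9e",
                   "plain", "utf-8")
    msg["Subject"] = "charset example"
    print(msg.as_string())
    # the generated headers are the two things a reader would have to honour:
    #   Content-Type: text/plain; charset="utf-8"
    #   Content-Transfer-Encoding: base64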
Nelson
From: Ken Itakura U3/ISE-Japan 8-694-6422 DECpark 4F 21-Dec-1994 1902 <itakura@jrdv04.enet.dec-j.co.jp>
Date: Wed, 21 Dec 94 19:12:45 +0900
Subject: [www-mling,00129] RE: language tags in www
Message-Id: <9412211012.AA24741@jrdmax.jrd.dec.com>
Hi,
I went through some Internet drafts and the archives of some other mailing lists.
- The documents of SGML TEI (Text Encoding Initiative) (Thanks to Bob)
<ftp://ftp.ifi.uio.no/pub/SGML/TEI>
- Editor and Terminal emulator for Unicode (Thanks to Axel)
<http://cooper.edu:9000/~rp/plan9/plan9-info.html>
- a program that translates character sets (Thanks to Axel)
<ftp://plan9.att.com/plan9/unixsrc/tcs.shar.Z>
- The internet draft of language tag specs (Thanks to Nelson)
<ftp://isi.edu/internet-drafts/draft-ietf-mailext-lang-tag-01.txt>
- The internet draft of HTML specification (See 2.16)
<ftp://ds.internic.net/internet-drafts/draft-ietf-html-spec-00.txt>
- The internet draft of HTTP specification
<ftp://ds.internic.net/internet-drafts/draft-fielding-http-spec-00.txt>
- The internet draft of Multilingual Text Encoding, ISO2022-INT, specification
<ftp://ds.internic.net/internet-drafts/draft-ohta-text-encoding-01.txt>
- HTML+ specification
<http://info.cern.ch/hypertext/WWW/MarkUp/HTMLPlus/htmlplus_1.html>
<http://info.cern.ch/hypertext/WWW/MarkUp/HTMLPlus/htmlplus_13.html>
- archive of www-talk mailing list (Find 'multilingual', 'Unicode' ...)
<http://gummo.stanford.edu/html/hypermail/www-talk-1994q3/index.html>
<http://gummo.stanford.edu/html/hypermail/www-talk-1994q4/subject.html>
- archive of www-html mailing list (Find 'multi' in www-html-9406)
<http://www0.cern.ch/hypertext/WWW/Archive/www-html>
There are many threads related to multilinguality out there.
One is of course about the code set issue that is argued (or that I want to
argue) here, which can also be found in the www-html archive.
The others are about using Unicode, bi-directional text in Hebrew, multi-part
documents, and language attributes (including ancient languages).
I think multi-part and the language attribute can be kept separate from the
argument about the code set, because multi-part is apparently at a different
level, and the language attribute can be handled independently of the code
set.
So I'd like to focus the argument here on 'code set', 'Unicode' and
'bi-directional'. (Can 'bi-directional' also be separated out?)
The only achievement so far is the description of 'character set' in the
HTML+ specification. The description is not entirely clear to me, but I
suppose it says that the default code set can be specified via the MIME
header. It's a first step, but it is not enough for multi-lingual documents.
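For illustration, that first step would amount to something like the sketch
below (Python; the ISO-2022-JP value and the helper name are only examples,
not anything taken from the HTML+ spec):

    def html_response(body_bytes, charset="ISO-2022-JP"):
        # one document, one code set: the MIME header tells the browser
        # how to decode the bytes that follow
        headers = "Content-Type: text/html; charset=%s\r\n" % charset
        headers += "Content-Length: %d\r\n\r\n" % len(body_bytes)
        return headers.encode("ascii") + body_bytes

    page = u"<title>\u4f8b</title>".encode("iso-2022-jp")
    print(html_response(page))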
One problem is that the SGML specification (maybe) does not allow multiple
code sets in a document. (Right?) If so, we would have to propose changing
the SGML spec, too.
Frankly speaking, I don't know how we should start talking about this, but I
think putting my questions here as a starting point is one idea.
1. Can Unicode be the single code set for WWW documents?
2. Is UTF better than canonical Unicode for WWW documents?
3. Is ISO2022 suitable/sufficient for multi-lingual WWW documents?
4. Is 'bi-directional' independent of the code set?
5. Is it acceptable to convert the (many) existing pages
   to conform to a new standard?
6. Should every browser support every code set in the standard?
7. What should the browser do if it cannot display the characters
   in the document correctly?
8. Should the code conversion be done by the server?
9. What should the format of the code set tag be?
Ken
From: "Jim A. Fetters" <fetters@enuxsa.eas.asu.edu>
Date: Wed, 21 Dec 1994 01:13:42 -0700 (MST)
Subject: [www-mling,00128] Re: language tags in www
Message-Id: <Pine.SOL.3.90.941221004144.21031B-100000@enuxsa.eas.asu.edu>
Language tags would provide a <good> solution </good>
I don't think UNICODE is the best solution; rather, I think
that the best solution is an open system which can incorporate
many character-sets, which the tagging allows for.
In addition to the problem of character sets, the HTTPd
server should recognize the capabilities of the browser.
If the browser can display Japanese, the server should then provide
the information in Japanese. There is a problem with a single
HTML document in several languages. Currently, there is no way
to provide the information in various languages other than to
provide links to separate language pages: Japanese, Korean, etc.
There needs to be a way to specify one HTML document where the
server sends the appropriate languages to the browser based on the
user's/browser's capabilities or preferences.
We should discuss standards for developing:
1. character set tagging: with emphasis on an open system which can
utilize UNICODE, JIS, EUC, etc. and any future coding systems.
2. multi-lingual document serving: a single HTML document which
responds to the browser's capabilities/preferences
It's not beyond the capabilities of this group to develop such standards.
If we do not act to decide our own fate, then it will be decided FOR us
by others.
Re: netscape
I agree with Bob, and Netscape should be left to develop its products
as it sees fit. Our questions about multi-lingual support on the WWW
should not be aimed at Netscape. Netscape is not the enemy, nor is it
an ultimate solution to our problems of multi-lingual exchange.
The central idea of this debate is that if we (the multi-lingual
community) do not develop standards for the WWW, we might be shut out
of future mark-up language specifications and left out of the loop.
It's time to stop being spectators and start becoming participants.
Don't wait for companies to give you what you want; the support needs
to come from the users. And the users are us.
Jim
From: Nelson Chin <butta1@bu.edu>
Date: Tue, 20 Dec 1994 19:10:34 -0500 (EST)
Subject: [www-mling,00127] language tags in www
Message-Id: <Pine.SOL.3.91.941220190856.3442A-100000@csa>
Will the language tag specs in this Internet draft work in the WWW?
ftp://isi.edu/internet-drafts/draft-ietf-mailext-lang-tag-01.txt
Nelson
From: Axel Belinfante <Axel.Belinfante@cs.utwente.nl>
Date: Tue, 20 Dec 1994 16:12:14 +0100
Subject: [www-mling,00126] Re: The future of Multilingual WWW
Message-Id: <9412201512.AA13053@utis179.cs.utwente.nl>
Just some ideas that seem to be related to this thread
(apologies if they have been shot down before...)
One of the 'problems' mentioned wrt Unicode was that there exists little
of it on the web, and that one needs (fonts) to be able to edit/generate it.
Another was the existence of 'native' non-Unicode documents on the web,
which should be handled even when the Web gets 'unicode-ized'.
Once we have a browser that can handle Unicode, wouldn't it be possible
to have the _server_ do the conversion of the native formats to Unicode?
(and thus have Unicode as the 'main' format between server and browser)
(and, for what it's worth, I also have the idea that it would be (much?)
better/easier to use UTF-8 than 'canonical Unicode')
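A small illustration of both points (the server-side conversion above and the
UTF-8 preference), assuming EUC-JP as one example of a 'native' format:

    native = u"<p>\u65e5\u672c\u8a9e</p>".encode("euc-jp")  # a 'native' page on the server
    text = native.decode("euc-jp")                # the conversion the server would do

    print(text.encode("utf-8"))      # the '<p>' and '</p>' markup stays plain ASCII bytes
    print(text.encode("utf-16-be"))  # every character, ASCII included, becomes 2 bytes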
A final note: someone mentioned the existence of a Unix Unicode xterm.
People might want to look at the editor sam and the terminal emulator 9term,
which are Unix implementations of Plan 9 programs that support Unicode.
(see: <URL:http://cooper.edu:9000/~rp/plan9/plan9-info.html>)
Another interesting thing might be 'tcs' - a program that translates between
character sets, including unicode. It is available at
<URL:ftp://plan9.att.com/plan9/unixsrc/tcs.shar.Z>
Axel.
<Axel.Belinfante@cs.utwente.nl> tel. +31 53 893774 fax. +31 53 333815
University of Twente, Dept. of C.S., Tele-Informatics & Open Systems Group
P.O. Box 217 NL-7500 AE Enschede The Netherlands
"ili ne sciis ke estas neebla do ili simple faris" -- Loesje