From: Ken Itakura U3/ISE-Japan 8-694-6422 DECpark 4F  16-Dec-1994 1926 <itakura@jrdv04.enet.dec-j.co.jp>
Date: Fri, 16 Dec 94 19:29:42 +0900
Subject: [www-mling,00116] The future of Multilingual WWW
Message-Id: <9412161029.AA16637@jrdmax.jrd.dec.com>


RE:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)

- Comment on Dave's post

> I agree in general with this approach.  I think it is smart
> to have the browser understand a specific codeset.  At that
> point, however, couldn't the browser convert the old codeset
> to the UNICODE codeset?  I believe that this is common for
> applications that work between Windows/DOS & UNIX -- normally
> they store their information in one codeset (SJIS for ex.) but
> will take input in EUC if the user is running in UNIX.
> 
> In a similar way, I think it would be great if a browser could
> use UNICODE as its default codeset and thus push the standard
> forward. (to the extent that I understand it -- I think UNICODE is
> a good thing)
> 
> However, I think there is always the problems of fonts, right?
> Or are there font sets available that match up with the UNICODE
> character set?

Right, the problem is fonts. We need a Unicode editor to create Unicode pages,
and the editor needs fonts to display them. "Unipad", the Notepad-like editor
on Windows NT, can create a Unicode file that includes Latin-1 and Japanese
Kanji characters, but I'm not sure it can create every Unicode character.
I agree that Unicode can be the default codeset for the browser,
but it's still 'one of the codesets', even if it's the default. Because, yes,
we, you and I and the members of this ML, might willingly convert the old
pages to Unicode (for world peace), but I don't think everyone will.


- Comment on Bob's post

> Netscape does not want to become some corporate entity coercing others
> to follow our technical direction just because we say so.  We do have
> great interest in solving I18N and multilingual problems and will gladly
> take an active part in forging the solutions.
> 
> Netscape plans to get a product to market quickly that can better handle
> existing data (especially for Japan).  If there is an elegant solution
> for Unicode in that timeframe, great.

I love you and your stance, considering that your position lets you lead
the world anywhere. Currently Netscape is the most popular browser in the
Windows world in Japan. I also use it with the patch which converts JIS and EUC
to SJIS. I'm looking forward to seeing the next release, which can display
Japanese without any patch.

> I think there is little argument that browser must accommodate the
> existing data on the web.

There is always little argument about this kind of thing, because everyone
thinks it is a matter of course. If you want to hear the voice of users, I say,
I insist, "The browser must accommodate the existing data on the web!!!".

> However, since there is very little multilingual web data out there
> today, we (www-mling & others) have a window of opportunity to specify
> how that data should be formatted.  Let's get to work.

Good. Why don't you create an RFC here with the members of this ML?

> >2. implement for Unicode
> 
> Using Canonical Unicode or some UTF flavor?
> We could support UTF without breaking any of the current browser.
> But if we want to use Canonical Unicode, we could confuse browser that
> scan for the ASCII '<', '>', and '&' metacharacters.  Also, there may
> be some byte ordering issues.  (UTF does not have byte ordering issues.)

This is just a story for the future, and my opinion is just my personal preference.
Currently I think it should be some UTF, not Canonical Unicode,
because UTF is handy for the implementors (of both editors and browsers) and for
the network environment. But since UTF tends to be larger than Canonical
Unicode for text that includes Kanji characters, Japanese users might not like
UTF. This is a very delicate issue. We must talk carefully about this.
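
A rough sketch of the size trade-off described above. It assumes "UTF" is
read as UTF-8 and "Canonical Unicode" as plain 16-bit code units; the thread
does not say which UTF flavor is meant, and the sample strings are made up
for illustration.

```python
# Compare encoded sizes of the same text in UTF-8 and in 16-bit Unicode.
# Assumption: "UTF" is read as UTF-8 and "Canonical Unicode" as fixed
# 16-bit code units (UTF-16 big-endian, no byte-order mark).

def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }

japanese = "私は日本人です"   # 7 characters, Kanji and Hiragana
english = "I am Japanese"    # 13 ASCII characters

print(encoded_sizes(japanese))  # {'utf-8': 21, 'utf-16': 14}
print(encoded_sizes(english))   # {'utf-8': 13, 'utf-16': 26}
```

Under these assumptions Japanese text grows by half in UTF-8 relative to
16-bit Unicode, while ASCII text shrinks by half.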

> >3. implement to analyze charset tags.
> 
> I believe the correct way is for the data must "tag" itself.  One way
> would be to add some new HTML tags as Itakura-san suggests.
> 
> Another way may be to just use the ISO2022 escape sequences for existing
> encodings. This is how multilingual XMosaic says it is doing it.
> 
> The really nice attribute of this technique is that it is compatible
> with the state of the browser world today.  Even if we decide to add
> HTML tags later, this would not conflict.  This seems like a good
> short-term solution and doesn't prohibit us from coming up with
> something better in the long-term.  Also this could be done between
> numbers 1 & 2 above.

Agreed.  3 (tag) can be done before 2 (Unicode). Personally I'd like to focus
on this issue, I mean the "tag for the codeset". I want to hear opinions from
the users of various codesets, even if it's just agree or disagree.  Anyone?

About using ISO2022 escape sequences, I think ISO2022 is just
'one of the codesets', again. The browser must support it, but it's not enough.

Ken Itakura




From: Ken Itakura U3/ISE-Japan 8-694-6422 DECpark 4F  16-Dec-1994 1923 <itakura@jrdv04.enet.dec-j.co.jp>
Date: Fri, 16 Dec 94 19:25:05 +0900
Subject: [www-mling,00115] RE: Space Character and Linebreaking in Japanese
Message-Id: <9412161025.AA16581@jrdmax.jrd.dec.com>


All of Nishijima-san's requests are the same as mine.
   - Don't add a space character if it's not there in the source.
   - Allow wrapping at every character boundary in Japanese text.

I'd like to be more specific about the latter one.
  - Cannot wrap between the first byte and second byte of a character, of course.
  - Can always wrap between a one-byte character and a multi-byte character.
  - Can wrap only at a space character within a string of one-byte
    characters in the middle of multi-byte text.

The requirements above are indispensable. If a browser doesn't support
them, its shipments in Japan will be halved.
Inserting artificial newlines is not acceptable, because the creator
of the page cannot know the display width of the browser,
and moreover, if the writer inserts artificial newlines, the page will
look ugly in a well-implemented browser.


> > 2. text lines can be broken at any character, except several
> >    rules.
> 
> The exceptions he refers to is probably "kinsoku shori" which are rules
> that some characters should not be the first character on a line and
> another handful of characters should not be the last character on a line.
> 
> Netscape is looking to add this support in the future as well.

I suppose "Kinsoku" is not such a strong requirement. But if you implement
the indispensable requirements I mentioned above, "Kinsoku shori" is a
very small extension of them. If you would like to implement it, I will
willingly help you.

FYI. 
   The characters that should not be the first character on a line are...
       - Punctuation marks
       - Small Hiragana and small Katakana
          (this does not mean half-width kana)
       - Closing parentheses, including the Japanese "KAGI Kakko"
   The characters that should not be the last character on a line are...
       - Opening parentheses, including the Japanese "KAGI Kakko"
See? There are very few.
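
The wrapping rules and the Kinsoku lists above can be sketched roughly as
follows. This is only an illustration: it works on Unicode characters rather
than bytes, counts every character as one column, and the two character sets
are small hypothetical subsets, not complete Kinsoku tables.

```python
# Greedy line wrapping with simple "kinsoku shori".
# Illustrative subsets only; a real browser would need fuller tables.
NO_LINE_START = set("、。，．！？）」』ぁぃぅぇぉっゃゅょァィゥェォッャュョ")
NO_LINE_END = set("（「『")

def is_one_byte(ch):
    """Treat ASCII as the 'one byte' characters of the thread."""
    return ord(ch) < 0x80

def can_break(text, i):
    """True if a line break is allowed between text[i-1] and text[i]."""
    before, after = text[i - 1], text[i]
    # Inside a run of one-byte characters, break only after a space.
    if is_one_byte(before) and is_one_byte(after) and before != " ":
        return False
    # Kinsoku: some characters may not start or end a line.
    if after in NO_LINE_START or before in NO_LINE_END:
        return False
    return True

def wrap(text, width):
    """Split text into lines of at most `width` characters."""
    lines, start = [], 0
    while len(text) - start > width:
        end = start + width
        while end > start and not can_break(text, end):
            end -= 1              # back up to the nearest legal break
        if end == start:
            end = start + width   # no legal break found: force one
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines

print(wrap("日本人です。はい", 5))  # ['日本人で', 'す。はい']
```

In the example the first line stops one character early so that the full
stop does not begin the second line.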

> One person I talked to, suggested that some Japanese software only breaks
> on word boundaries.  This would require context analysis (yikes) and I'm
> not sure if this is a real requirement.  Anyone have any more data on this?

I don't think this is a real requirement. Even in the world of physical publishing,
this kind of treatment is hardly seen. And I believe such treatment is for
vertical writing, because this treatment makes the right edge of the
paragraph ragged, which we find ugly. You might think you can align the right edge
by adjusting character spacing, but that breaks vertical alignment in the middle of
the line, which we find ugly, too. Generally speaking, Japanese readers like a format
in which every character sits on a fixed grid. I believe my preference is the majority
view in Japan, but of course some people will disagree with me. Anyway, you need not
implement this treatment unless it's very easy for you.

Ken Itakura


From: NISHIJIMA Takanori / =?ISO-2022-JP?B?GyRCQD5FZzknRkEbKEI=?= <racsho@pure.cpdc.canon.co.jp>
Date: Fri, 16 Dec 1994 15:10:58 +0900
Subject: [www-mling,00114] Re: Space Character and Linebreaking in
Message-Id: <199412160610.PAA11186@slave1.pure.cpdc.canon.co.jp>


In article <199412160301.TAA19953@neon.mcom.com>, bobj@mcom.com (Bob Jung) writes:

 > In our next Netscape release we plan to support what Nishijima-san suggests:

I'm looking forward to the next release.

   > 2. text lines can be broken at any character, except several
   >    rules.

 > The exceptions he refers to is probably "kinsoku shori" which are rules
 > that some characters should not be the first character on a line and
 > another handful of characters should not be the last character on a line.

Yes.

 > One person I talked to, suggested that some Japanese software only breaks
 > on word boundaries.  This would require context analysis (yikes) and I'm
 > not sure if this is a real requirement.  Anyone have any more data on this?

Such software looks for the boundaries among Kanji, Hiragana and
Katakana, and treats
  + Hiragana string
  + Kanji string optionally followed by Hiragana string,
  + Katakana string(and optionally ...)
as a word, without any analysis. :-)
This is mere ``pseudo'' word splitting, but looks almost OK.

# More precisely, these strings look like "Bunsetsu" (``phrase'' or
# ``sentence fragment'') rather than "Tango" (``word'').

Applying this to my silly sample text, we get these words:

   K       h   K  K   K   h     K     K  h
  (watashi/wa)(ni/hon/jin/de/,)(sashi/mi/to)
  (tem)(pu/ra/wo)(ta/be/ru/no/ga)(su/ki/de/su/.)
   K    k  k  h   K  h  h  h  h   K  h  h  h

   K: Kanji character
   k: Katakana character
   h: Hiragana character
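
A minimal sketch of this pseudo word splitting, assuming Unicode text and
classifying characters by code point range. The function names and the rule
that punctuation stays attached to the preceding chunk are assumptions made
to reproduce the grouping shown above.

```python
# Pseudo word splitting by script class, with no dictionary lookup.

def script_class(ch):
    """Return 'K' (Kanji), 'k' (Katakana), 'h' (Hiragana) or 'o' (other)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "h"
    if 0x30A0 <= code <= 0x30FF:
        return "k"
    if 0x4E00 <= code <= 0x9FFF:
        return "K"
    return "o"

def pseudo_words(text):
    """Split text into Kanji/Katakana runs with their trailing Hiragana."""
    chunks = []
    prev = None
    for ch in text:
        cls = script_class(ch)
        starts_chunk = (
            prev is None
            or (cls in ("K", "k") and cls != prev)  # new Kanji or Katakana run
            or (cls == "h" and prev == "o")         # Hiragana after punctuation
        )
        if starts_chunk:
            chunks.append(ch)
        else:
            chunks[-1] += ch
        prev = cls
    return chunks

sample = "私は日本人で、刺身と天プラを食べるのが好きです。"
print(pseudo_words(sample))
# ['私は', '日本人で、', '刺身と', '天', 'プラを', '食べるのが', '好きです。']
```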

--
NISHIJIMA Takanori      Canon Inc.
E-mail: racsho@cpdc.canon.co.jp


From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 19:31:12 -0800
Subject: [www-mling,00113] Re:  RE:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
Message-Id: <199412160331.TAA21031@neon.mcom.com>


>Hi,
>
>I support Bob and Dan.
>We cannot ignore the 'de facto', which is JIS, Shift JIS and EUC in Japan, and
>we cannot ignore Unicode, neither. I think the situation won't change in
>future. Even if the Unicode get popularity, we cannot ignore 'de facto'.
>The Unicode might an answer of I18n, and we could choose it for the single
>codeset in the world of WWW. But unfortunately, it's too late, since we have
>many pages written in non Unicode. We have to seek the way to live with both
>Unicode and 'de facto' codeset, here again.
>I think the answer to this situation is that HTML has the tag to specify the
>codeset, like <charset ="ISO2022-JP">(...</charset>). Unicode can be one of
>the selection of this. Then the creators of the browser can decide the
>priority of its supporting codeset by the market requirement.
>The migration path will be...
>1. implement for de facto standard codeset.

Agreed.

>2. implement for Unicode

Using Canonical Unicode or some UTF flavor?
We could support UTF without breaking any of the current browsers.
But if we want to use Canonical Unicode, we could confuse browsers that
scan for the ASCII '<', '>', and '&' metacharacters.  Also, there may
be some byte ordering issues.  (UTF does not have byte ordering issues.)
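
To make the two concerns concrete, here is a small sketch. It assumes
"Canonical Unicode" means raw 16-bit code units and "UTF" means UTF-8;
meta_bytes() is a hypothetical stand-in for a byte-oriented scanner.

```python
# A byte-oriented scanner looking for ASCII '<', '>', '&' can misfire on
# 16-bit data, while UTF-8 multibyte sequences only use bytes >= 0x80.

META = {0x3C, 0x3E, 0x26}  # '<', '>', '&'

def meta_bytes(data):
    """Return the offsets of bytes that equal an ASCII HTML metacharacter."""
    return [i for i, b in enumerate(data) if b in META]

ch = "\u263C"  # WHITE SUN WITH RAYS: bytes 0x26 and 0x3C in 16-bit form

print(meta_bytes(ch.encode("utf-16-be")))  # [0, 1]: looks like '&' and '<'
print(meta_bytes(ch.encode("utf-8")))      # []: no false hits

# Byte order: the same character serializes differently per platform.
print(ch.encode("utf-16-be").hex())  # 263c
print(ch.encode("utf-16-le").hex())  # 3c26
```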

>3. implement to analyze charset tags.

I believe the correct way is for the data to "tag" itself.  One way
would be to add some new HTML tags as Itakura-san suggests.

Another way may be to just use the ISO2022 escape sequences for existing
encodings. This is how multilingual XMosaic says it is doing it.

The really nice attribute of this technique is that it is compatible
with the state of the browser world today.  Even if we decide to add
HTML tags later, this would not conflict.  This seems like a good
short-term solution and doesn't prohibit us from coming up with
something better in the long-term.  Also this could be done between
numbers 1 & 2 above.

>Even after the charset tags are defined, we need the UI to specify the
>preferred codeset for old documents.

I agree.

>Comment?
>
>Ken Itakura

Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041




From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 19:01:26 -0800
Subject: [www-mling,00112] Re:  Space Character and Linebreaking in Japanese
Message-Id: <199412160301.TAA19953@neon.mcom.com>


Unfortunately, most browsers today are English-biased and look for "white
space" for line breaking.  Without white space (space, tab, new-line), the
browsers will not wrap to the next line and you could end up with a
long horizontal scroll area.  So for the interim, I think people
developing Japanese HTML content are adding artificial newlines to their
text to accommodate these English-biased browsers.

In our next Netscape release we plan to support what Nishijima-san suggests:

> 2. text lines can be broken at any character, except several
>    rules.

The exceptions he refers to is probably "kinsoku shori" which are rules
that some characters should not be the first character on a line and
another handful of characters should not be the last character on a line.

Netscape is looking to add this support in the future as well.

One person I talked to, suggested that some Japanese software only breaks
on word boundaries.  This would require context analysis (yikes) and I'm
not sure if this is a real requirement.  Anyone have any more data on this?

I don't know about Chinese.  Will someone please provide www-mling with
the rules for Chinese layout?

-bob

>In Japanese (and Chinese, I think),
>
> 1. usually words in the text aren't separated by spaces, and
> 2. text lines can be broken at any character, except several
>    rules.
>
>Therefore, the following sentence
>
>  ``Watashi wa Nihon-jin de, Sashimi to
>      Tempura wo taberunoga suki desu.''
>
>  (It means ``I am Japanese, and I like eating Sashimi and Tempura.'' :-)
>
>can be formatted as follows, using Kanji and Hiragana.
>
>  ``(watashi)(wa)(ni)(hon)(jin)(de)(,)
>    (sashi)(mi)(to)(tem)(pu)(ra)(wo)(ta)
>    (be)(ru)(no)(ga)(su)(ki)(de)(su)(.)'' ---<*>
>
>   (above strings in a pair of parentheses means one Kanji or Hiragana
>    character.)
>
>However, current all existing WWW browsers always add one space at the
>end of source text lines, so the output for the HTML source <*> is:
>
>  ``(watashi)(wa)(ni)(hon)(jin)(de)(,) (sashi)(mi)(to)(tem)
>    (pu)(ra)(wo)(ta) (be)(ru)(no)(ga)(su)(ki)(de)(su)(.)''
>
>Two space characters are inserted.  This text is very ugly for most
>Japanese.
>My request is:
>
>  + don't add space at the end of lines automatically,
>    in Kanji/Hiragana/Katakana context.
>
>Please consider this on developing Japanese-conscious WWW browser.
>Thanks in advance.
>
>--
>NISHIJIMA Takanori      Canon Inc.
>E-mail: racsho@cpdc.canon.co.jp

Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041




From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 18:36:27 -0800
Subject: [www-mling,00111] Re: Specifications for JIS and S-JIS
Message-Id: <199412160236.SAA19238@neon.mcom.com>


>On Fri, 16 Dec 1994, Bob Jung wrote:
>
>   Please let me get out the world. I am tired of ....  I need to sleep....
>
>   Thank you so much....
>
>
>
>--------------------------------------------------------------
>MaoHua Wang                   jhuang@unix1.sncc.lsu.edu
>GO!GO!!GO!!! TIGER            Louisiana State University
>--------------------------------------------------------------
MaoHua,

I don't know what happened to your copy of my email.  I did not send any
message like the above.  Below is what I really sent.

Regards,
Bob
=============================================================================

Date: Fri, 16 Dec 94 11:10:26 +0900
Date: Thu, 15 Dec 1994 18:10:34 -0800
X-Sender: bobj@pop.mcom.com
Mime-Version: 1.0
To: tomd@mactom.pls.com (Tom Donaldson)
From: bobj@mcom.com (Bob Jung)
Subject: [www-mling,00108] Re:  Specifications for JIS and S-JIS
Cc: www-mling@square.ntt.jp
Reply-To: www-mling@square.ntt.jp
Ml-Name: www-mling
Sender: takada@square.ntt.jp
Errors-To: www-mling-request@square.ntt.jp
Mail-Count: 00108
X-UIDL: 787545003.000

>Can someone point me to ftp sites with specifications for JIS and
>S-JIS?
>
>I'd also like to find sample text in each character set, if anyone
>knows of any.
>
>Thanks,
>Tom
>tomd@pls.com
Hi Tom,

It's not the specs, but there's an excellent book that has most of the info
you'll need.  You can get it at Computer Literacy type stores.  Or inquire
on-line to O'Reilly at order.ora.com.

        Lunde, Ken, Understanding Japanese Information Processing,
        O'Reilly & Associates, Inc., 1993.
        ISBN: 1-56592-043-0

Addenda and such are on-line: ftp://ftp.ora.com/pub/nutshell/ujip/

-Bob


Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041





From: MaoHua Wang <jhuang@unix1.sncc.lsu.edu>
Date: Thu, 15 Dec 1994 20:21:53 -0600 (CST)
Subject: [www-mling,00110] Re: Specifications for JIS and S-JIS
Message-Id: <Pine.A32.3.91.941215201635.26602B-100000@unix1.sncc.lsu.edu>


On Fri, 16 Dec 1994, Bob Jung wrote:

   Please let me get out the world. I am tired of ....  I need to sleep....

   Thank you so much....



--------------------------------------------------------------
MaoHua Wang                   jhuang@unix1.sncc.lsu.edu
GO!GO!!GO!!! TIGER            Louisiana State University
--------------------------------------------------------------






From: NISHIJIMA Takanori / =?ISO-2022-JP?B?GyRCQD5FZzknRkEbKEI=?= <racsho@pure.cpdc.canon.co.jp>
Date: Fri, 16 Dec 1994 11:13:47 +0900
Subject: [www-mling,00109] Space Character and Linebreaking in Japanese
Message-Id: <199412160213.LAA10209@slave1.pure.cpdc.canon.co.jp>


Hi everyone!

I have a request for upcoming multilingual WWW browsers.

In Japanese (and Chinese, I think),

 1. usually words in the text aren't separated by spaces, and
 2. text lines can be broken at any character, except several
    rules.

Therefore, the following sentence

  ``Watashi wa Nihon-jin de, Sashimi to
      Tempura wo taberunoga suki desu.''

  (It means ``I am Japanese, and I like eating Sashimi and Tempura.'' :-)

can be formatted as follows, using Kanji and Hiragana.

  ``(watashi)(wa)(ni)(hon)(jin)(de)(,)
    (sashi)(mi)(to)(tem)(pu)(ra)(wo)(ta)
    (be)(ru)(no)(ga)(su)(ki)(de)(su)(.)'' ---<*>

   (each string in a pair of parentheses above represents one Kanji or
    Hiragana character.)

However, all existing WWW browsers currently add one space at the
end of each source text line, so the output for the HTML source <*> is:

  ``(watashi)(wa)(ni)(hon)(jin)(de)(,) (sashi)(mi)(to)(tem)
    (pu)(ra)(wo)(ta) (be)(ru)(no)(ga)(su)(ki)(de)(su)(.)''

Two space characters are inserted.  This text is very ugly for most
Japanese.
My request is:

  + don't add space at the end of lines automatically,
    in Kanji/Hiragana/Katakana context.
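
One possible reading of this request, sketched in Python: when source lines
are joined, a space is added only if the characters on either side of the
line break are not both Japanese. The code point ranges and function names
are assumptions for illustration, not a description of any existing browser.

```python
# Join HTML source lines without inserting a space inside Japanese text.

def is_japanese(ch):
    """CJK punctuation, Hiragana, Katakana, or Kanji (rough ranges)."""
    code = ord(ch)
    return 0x3000 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF

def join_lines(lines):
    """Join lines, adding a space only outside Japanese context."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if out and not (is_japanese(out[-1]) and is_japanese(line[0])):
            out += " "
        out += line
    return out

source = ["私は日本人で、", "刺身と天ぷらを食", "べるのが好きです。"]
print(join_lines(source))                  # 私は日本人で、刺身と天ぷらを食べるのが好きです。
print(join_lines(["I like", "sashimi."]))  # I like sashimi.
```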

Please consider this when developing Japanese-conscious WWW browsers.
Thanks in advance.

--
NISHIJIMA Takanori	Canon Inc.
E-mail: racsho@cpdc.canon.co.jp


From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 18:10:34 -0800
Subject: [www-mling,00108] Re:  Specifications for JIS and S-JIS
Message-Id: <199412160210.SAA18283@neon.mcom.com>


>Can someone point me to ftp sites with specifications for JIS and
>S-JIS?
>
>I'd also like to find sample text in each character set, if anyone
>knows of any.
>
>Thanks,
>Tom
>tomd@pls.com
Hi Tom,

It's not the specs, but there's an excellent book that has most of the info
you'll need.  You can get it at Computer Literacy type stores.  Or inquire
on-line to O'Reilly at order.ora.com.

        Lunde, Ken, Understanding Japanese Information Processing,
        O'Reilly & Associates, Inc., 1993.
        ISBN: 1-56592-043-0

Addenda and such are on-line: ftp://ftp.ora.com/pub/nutshell/ujip/

-Bob


Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041




From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 16:40:37 -0800
Subject: [www-mling,00107] Re:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
Message-Id: <199412160040.QAA13311@neon.mcom.com>


>Ahh... I saw this mail later.  I agree with Dan here.  Implementing
>Unicode is probably best taken in steps as he suggests.
>
>I also agree that Netscape is in a position to strongly impact
>the market -- look at what Mosaic did.  Therefore, I too urge
>Bob to consider his move into Unicode carefully.

EXACTLY the point that I've been trying to make.  Netscape does not
want to invent something on its own to support multilingual that's
going to be incompatible or cause grief for others.

Netscape does not want to become some corporate entity coercing others
to follow our technical direction just because we say so.  We do have
great interest in solving I18N and multilingual problems and will gladly
take an active part in forging the solutions.

Netscape plans to get a product to market quickly that can better handle
existing data (especially for Japan).  If there is an elegant solution
for Unicode in that timeframe, great.

www-mling is one of the forums where we all can share our ideas and
concerns and come up with a plan for multilingual support.
That's why I've joined this mailing list...

Let's get back to discussing multilingual issues!

I think there is little argument that browsers must accommodate the
existing data on the web.

However, since there is very little multilingual web data out there
today, we (www-mling & others) have a window of opportunity to specify
how that data should be formatted.  Let's get to work.

Regards,
Bob

Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041




From: bobj@mcom.com (Bob Jung)
Date: Thu, 15 Dec 1994 15:07:33 -0800
Subject: [www-mling,00106] Re:   SJIS & HTML
Message-Id: <199412152307.PAA06768@neon.mcom.com>


Whoops.  I feel so foolish.  Thanks Tsuchiya-san for the correction!
I retract my question.  SJIS has no HTML metacharacter conflicts and should
render fine.

-bob

>Ah.. I found there is some misunderstanding.
>
>Bob Jung writes:
> >         ASCII '<' == 0x60     conflicts w/SJIS 2nd byte (0x40-0x7E, 0x80-0xFC)
> >         ASCII '>' == 0x62     conflicts w/SJIS 2nd byte (0x40-0x7E, 0x80-0xFC)
> >         ASCII '&' == 0x38 does not conflict w/SJIS
>
>As for the ASCII code, `<', `>', `&' are represented by 60,62,38 in
>*decimal* notations.  In hexadecimal, these should be 0x3C, 0x3E, 0x26.
>
>I noticed from Itakura's comments.
>
>---------------------------------------------------------------------------
>TSUCHIYA Satoshi     | tsuchiya@sysrap.cs.fujitsu.co.jp | NIFTY-ID:GDC02435

Bob Jung                        +1 415 254-1900 x2788   fax +1 415 254-2601
Netscape Communications Corp.   650 Castro #500         Mtn View, CA 94041




From: "Michael Schoolnik" <pp001066@interramp.com>
Date: Thu, 15 Dec 1994 17:46:09 -0500
Subject: [www-mling,00105] unsubscribe
Message-Id: <9412151746.AA09535@Michael Schoolnik>



--part_AB16316000003C8A00000001
Content-Type: Text/Plain; charset=US-ASCII
Content-Disposition: Inline

Please unsubscribe me.

--part_AB16316000003C8A00000001
Content-Type: Text/Plain; charset=US-ASCII
Content-Disposition: Inline

Michael Schoolnik <pp001066@interramp.com>
--part_AB16316000003C8A00000001--



From: dqi@MIT.EDU
Date: Thu, 15 Dec 1994 16:03:11 EST
Subject: [www-mling,00104] Re:  Message not deliverable 
Message-Id: <9412152103.AA14121@m11-113-12.MIT.EDU>




Please remove me from the mailing list. thanks!


From: cmr@koibito.iisc.com
Date: Thu, 15 Dec 94 15:57:07 -0500
Subject: [www-mling,00103] Re: Netscape & Unicode (was: Beware of the  
Message-Id: <9412152057.AA06184@koibito.iisc.com>



>Just one comment from Unix point of view.
>Most Unix users are likely to generate UTF8 file instead of Unicode file.
>Even though conversion from UTF8 to Unicode is easy, it should be
>important to support native UTF8 files.


Allow me to differ on that bit about "Most Unix users". I have a DEC Alpha
running DEC OSF/1 3.0 with the Japanese language variant, and it
supports:

EUC, Super DEC kanji (with JIS 0212), SJIS, DEC kanji (almost EUC), and
ISO-2022.  A lot of the built-in utilities do not like SJIS.

Charlie

*****************************************************************************
*  Charles Richmond 	Integrated International Systems Corporation        *
*  cmr@koibito.iisc.com	cmr@world.std.com				    *
*  Specializing in UNIX, X, Image Processing, and Communications.           *
*  One Longfellow Place Suite 3309 , Boston , Ma. USA 02114-2431            *
*  (617) 367 3151	FAX (617) 723 6861                                  *
*****************************************************************************


From: Ik.Kim@Eng.Sun.COM (Ik Kim)
Date: Thu, 15 Dec 1994 11:38:31 -0800
Subject: [www-mling,00102] Re:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
Message-Id: <9412151938.AA17992@mighty.Eng.Sun.COM>



  Dan writes:
  >
  >I can see that your heart is in the right place (and your experience at Apple
  >can only help; Apple is IMHO one of the best at writing international
  >software).
  >But I think I got the right impression about your direction.  It seems to me
  >that at least partial Unicode compatibility should be in the first release of 
  >your multilingual browser because
  >1. it is easy to implement; all you need is a translation table to the local character
  >   set.  The table will be about 64K in size, and doesn't need to be loaded 
  >   into memory unless it is used.  I believe Windows 95 will include this table in
  >   the operating system.
  >2. The impact of Netscape's browsers is incredible.  If your first browser doesn't
  >   support Unicode, nobody will bother to put any Unicode data on the Web.
  >
  >One needn't implement *all*, or even much, of Unicode for it to be useful.  
  >Starting off by supporting the part that corresponds to the local character 
  >set seems like a sensible and easy way to start, as it would be a critical first
  >step towards usability of Unicode on the net.  Who would put Unicode content
  >up if there were no browsers?  Nobody.  Who would write a commercial browser 
  >that supported Unicode fully if there were no Unicode content on the net?  Nobody.
  >The way to break the deadlock is to take a very small incremental step towards
  >Unicode support in browsers, while us Web content providers take small steps towards
  >providing Unicode content on the Web.
  >
  >My two bits, anyway.
  >- Dan
  >
  

Just one comment from a Unix point of view.
Most Unix users are likely to generate UTF8 files instead of Unicode files.
Even though conversion from UTF8 to Unicode is easy, it is
important to support native UTF8 files.
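
As a rough illustration of how small that conversion is, here is a sketch of
a decoder for 1- to 3-byte UTF8 sequences, which cover the 16-bit range. It
is a simplified example with a hypothetical function name; it does not
validate continuation bytes or reject overlong forms.

```python
# Decode UTF-8 bytes into 16-bit Unicode code points (BMP only).

def utf8_to_code_points(data):
    """Return the code points encoded in `data` (1- to 3-byte forms only)."""
    points = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # 0xxxxxxx
            points.append(b)
            i += 1
        elif b >> 5 == 0b110:              # 110xxxxx 10xxxxxx
            points.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif b >> 4 == 0b1110:             # 1110xxxx 10xxxxxx 10xxxxxx
            points.append(((b & 0x0F) << 12)
                          | ((data[i + 1] & 0x3F) << 6)
                          | (data[i + 2] & 0x3F))
            i += 3
        else:
            raise ValueError("unsupported lead byte 0x%02X" % b)
    return points

print([hex(p) for p in utf8_to_code_points("aé日".encode("utf-8"))])
# ['0x61', '0xe9', '0x65e5']
```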

-- ikkim@eng.sun.com (Ik Kim 'Ike')


From: saty@skuld.ossi.com (Mr. Tsuchiya)
Date: Thu, 15 Dec 94 11:15:25 PST
Subject: [www-mling,00101]  SJIS & HTML
Message-Id: <9412151915.AA03887@skuld.ossi.com.ossi.com>


Ah.. I found there is some misunderstanding.

Bob Jung writes:
 >         ASCII '<' == 0x60     conflicts w/SJIS 2nd byte (0x40-0x7E, 0x80-0xFC)
 >         ASCII '>' == 0x62     conflicts w/SJIS 2nd byte (0x40-0x7E, 0x80-0xFC)
 >         ASCII '&' == 0x38 does not conflict w/SJIS

As for the ASCII code, `<', `>', `&' are represented by 60,62,38 in
*decimal* notations.  In hexadecimal, these should be 0x3C, 0x3E, 0x26.

I noticed from Itakura's comments.

---------------------------------------------------------------------------
TSUCHIYA Satoshi     | tsuchiya@sysrap.cs.fujitsu.co.jp | NIFTY-ID:GDC02435 



From: Jan Hardenbergh <jch@nell.oki.com>
Date: Thu, 15 Dec 94 13:13:00 E
Subject: [www-mling,00100] Unicode Web Page
Message-Id: <2EF08959@nell>



> for more info on unicode check out URL:

> http://www.stonehand.com/unicode.standard.html

That should be http://www.stonehand.com/unicode/standard.html

And is WORTHWHILE! Check it out.


From: tomd@mactom.pls.com (Tom Donaldson)
Date: Thu, 15 Dec 1994 13:15:41 -0500
Subject: [www-mling,00099] Specifications for JIS and S-JIS
Message-Id: <9412151815.AA07766@mactom>


Can someone point me to ftp sites with specifications for JIS and
S-JIS?  

I'd also like to find sample text in each character set, if anyone
knows of any.

Thanks,
Tom
tomd@pls.com


From: saty@skuld.ossi.com (Mr. Tsuchiya)
Date: Thu, 15 Dec 94 09:54:41 PST
Subject: [www-mling,00098]  RE:   SJIS & HTML
Message-Id: <9412151754.AA03811@skuld.ossi.com.ossi.com>


Ken Itakura U3/ISE-Japan 8-694-6422 DECpark 4F 15-Dec writes:
 > I think SJIS doesn't have the character that conflict with '<' nor '>'.
 > The second byte of SJIS doesn't use all of ASCII, it just uses some of ASCII.

Yes.
The second byte range of SJIS code is 40-7E and 80-FC (in hexadecimal).
Fortunately there is no conflict for `<'(3C) and `>'(3E).
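
The ranges above are easy to check mechanically. A minimal sketch, with a
hypothetical helper name:

```python
# Check HTML metacharacters against the SJIS second-byte ranges
# 0x40-0x7E and 0x80-0xFC given above.

def is_sjis_second_byte(b):
    """True if byte value b can appear as the second byte of an SJIS character."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

for ch in "<>&":
    print(ch, hex(ord(ch)), is_sjis_second_byte(ord(ch)))
# < 0x3c False
# > 0x3e False
# & 0x26 False

# The decimal values 60 and 62, misread as hexadecimal, do fall in range,
# which is how the earlier false alarm arose.
print(is_sjis_second_byte(0x60), is_sjis_second_byte(0x62))  # True True
```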

---------------------------------------------------------------------------
TSUCHIYA Satoshi     | tsuchiya@sysrap.cs.fujitsu.co.jp | NIFTY-ID:GDC02435 



From: Dave.Hofert@Eng.Sun.COM (Dave Hofert)
Date: Thu, 15 Dec 1994 09:34:48 -0800
Subject: [www-mling,00097] Re:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
Message-Id: <9412151734.AA28954@baltimore.Eng.Sun.COM>



Ahh... I saw this mail later.  I agree with Dan here.  Implementing
Unicode is probably best taken in steps as he suggests.

I also agree that Netscape is in a position to strongly impact
the market -- look at what Mosaic did.  Therefore, I too urge
Bob to consider his move into Unicode carefully.

Thanks -

	Dave

+ From: dank@knowledge.adventure.com
+ Real-Date: Wed, 14 Dec 94 20:51 PST
+ To: www-mling@square.ntt.jp
+ Subject: [www-mling,00091] Re:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
+ 
+ Hi Bob!
+ 
+ >Somehow Dan got the wrong impression about the direction we at Netscape
+ >are pursuing.  Supporting Unicode is definitely a direction for our
+ >product.
+ >However, today there is not much data on the web in Unicode, so we
+ >are not planning to ***hastily rush*** any solution to the market.
+ >On the otherhand, we are being careful that products that we release
+ >NOW will not prohibit us from supporting Unicode and other future
+ >multilingual solutions.
+ >We are very interested in working with the rest of the I18N community
+ >to come up with the "correct" extensions to the standards.
+ 
+ I can see that your heart is in the right place (and your experience at Apple
+ can only help; Apple is IMHO one of the best at writing international
+ software).
+ But I think I got the right impression about your direction.  It seems to me
+ that at least partial Unicode compatibility should be in the first release of 
+ your multilingual browser because
+ 1. it is easy to implement; all you need is a translation table to the local character
+    set.  The table will be about 64K in size, and doesn't need to be loaded 
+    into memory unless it is used.  I believe Windows 95 will include this table in
+    the operating system.
+ 2. The impact of Netscape's browsers is incredible.  If your first browser doesn't
+    support Unicode, nobody will bother to put any Unicode data on the Web.
+ 
+ One needn't implement *all*, or even much, of Unicode for it to be useful.  
+ Starting off by supporting the part that corresponds to the local character 
+ set seems like a sensible and easy way to start, as it would be a critical first
+ step towards usability of Unicode on the net.  Who would put Unicode content
+ up if there were no browsers?  Nobody.  Who would write a commercial browser 
+ that supported Unicode fully if there were no Unicode content on the net?  Nobody.
+ The way to break the deadlock is to take a very small incremental step towards
+ Unicode support in browsers, while us Web content providers take small steps towards
+ providing Unicode content on the Web.
+ 
+ My two bits, anyway.
+ - Dan
+ 
+ 


From: Dave.Hofert@Eng.Sun.COM (Dave Hofert)
Date: Thu, 15 Dec 1994 09:29:52 -0800
Subject: [www-mling,00096] Re:  RE:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
Message-Id: <9412151729.AA28943@baltimore.Eng.Sun.COM>



+ From: Ken Itakura U3/ISE-Japan 8-694-6422 DECpark 4F  15-Dec-1994 1509 <itakura@jrdv04.enet.dec-j.co.jp>
+ To: www-mling@square.ntt.jp
+ Cc: itakura@jrdv04.enet.dec-j.co.jp
+ Subject: [www-mling,00094] RE:  Netscape & Unicode (was:  Beware of the bureaucrats: the future of Multilingual WWW?)
+ 
+ Hi,
+ 
+ I support Bob and Dan.
+ We cannot ignore the 'de facto', which is JIS, Shift JIS and EUC in Japan, and 
+ we cannot ignore Unicode, neither. I think the situation won't change in 
+ future. Even if the Unicode get popularity, we cannot ignore 'de facto'.
+ The Unicode might an answer of I18n, and we could choose it for the single 
+ codeset in the world of WWW. But unfortunately, it's too late, since we have 
+ many pages written in non Unicode. We have to seek the way to live with both
+ Unicode and 'de facto' codeset, here again.
+ I think the answer to this situation is that HTML has the tag to specify the 
+ codeset, like <charset ="ISO2022-JP">(...</charset>). Unicode can be one of 
+ the selection of this. Then the creators of the browser can decide the 
+ priority of its supporting codeset by the market requirement.
+ The migration path will be...
+ 1. implement for de facto standard codeset.
+ 2. implement for Unicode
+ 3. implement to analyze charset tags.
+ Even after the charset tags are defined, we need the UI to specify the 
+ preferred codeset for old documents.
+ 
+ Comment?

I agree in general with this approach.  I think it is smart
to have the browser understand a specific codeset.  At that
point, however, couldn't the browser convert the old codeset
to the UNICODE codeset?  I believe that this is common for
applications that work between Windows/DOS & UNIX -- normally
they store their information in one codeset (SJIS for ex.) but
will take input in EUC if the user is running in UNIX.

In a similar way, I think it would be great if a browser could
use UNICODE as its default codeset and thus push the standard
forward. (to the extent that I understand it -- I think UNICODE is
a good thing)

However, I think there is always the problems of fonts, right?
Or are there font sets available that match up with the UNICODE
character set?

+ Ken Itakura

	Dave Hofert	SunSoft Information Technology Engineering