Japanese Encoding Methods

First of all, documents in Japanese need at least two character sets: ASCII and JIS X 0208. The latter is a 2-byte character set that includes Kanji, Hiragana, Katakana, and various other symbols and characters.

We use two Japanese encoding methods on this server: EUC-JP (Extended Unix Code) and ISO-2022-JP.

EUC-JP

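EUC-JP is an ISO-2022-compliant 8-bit encoding in which ASCII is initially designated to G0 and JIS X 0208-1983 (or JIS X 0208-1990) to G1, without explicit announcement. Because G1 is invoked into the upper half of the code table, each JIS X 0208 character is stored as two bytes with the high bit set. G2 and G3 are never used here. A sample file encoded in EUC-JP is here.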
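As a minimal sketch of that byte layout, the hypothetical function split_eucjp below walks an EUC-JP byte string: bytes below 0x80 are ASCII from G0, and pairs of bytes in 0xA1 to 0xFE are JIS X 0208 characters from G1, whose 7-bit JIS code is recovered by clearing the high bit of each byte. It assumes, as on this server, that the single shifts for G2 (0x8E) and G3 (0x8F) never occur, and rejects them rather than decoding them. Python's built-in euc_jp codec is used only to produce sample input.

```python
from typing import List, Tuple


def split_eucjp(data: bytes) -> List[Tuple[str, bytes]]:
    """Split EUC-JP bytes into ("ascii", byte) and ("jisx0208", JIS code) units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # G0 (ASCII): one byte, high bit clear
            units.append(("ascii", bytes([b])))
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            # G1 (JIS X 0208): two bytes, high bit set; clearing it gives the 7-bit JIS code
            units.append(("jisx0208", bytes([b & 0x7F, data[i + 1] & 0x7F])))
            i += 2
        else:
            # Includes 0x8E (SS2 -> G2) and 0x8F (SS3 -> G3), assumed unused here
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
    return units
```

For example, split_eucjp("Aあ".encode("euc_jp")) gives [("ascii", b"A"), ("jisx0208", b'$"')]: "あ" is EUC-JP 0xA4A2, which is JIS X 0208 code 0x2422.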

ISO-2022-JP

ISO-2022-JP, which is registered as a MIME charset name, is widely used in Japanese Internet communities for electronic mail and network news messages. It is an ISO-2022-compliant 7-bit encoding that uses only the G0 code set. ASCII is initially designated to G0. To switch character sets, designate the desired set to G0 with an escape sequence, for example:
	ESC ( B    ASCII
	ESC ( J    JIS X 0201-1976 ("Roman" set)
	ESC $ @    JIS X 0208-1978
	ESC $ B    JIS X 0208-1983
A sample file is here. For more details about ISO-2022-JP, see RFC 1468.
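The escape sequences listed above can be exercised with a short converter. This is a sketch under stated assumptions, not a complete decoder: the hypothetical function iso2022jp_to_eucjp tracks which set is currently designated to G0 and rewrites ISO-2022-JP into EUC-JP by setting the high bit of both bytes of each JIS X 0208 character. It treats JIS X 0208-1978 and -1983 as identical and copies JIS X 0201 Roman bytes through as ASCII; both are simplifications, since a few code points differ between those sets.

```python
# G0 designations from the list above, mapped to the set they select.
# Simplifications: 1978 and 1983 JIS X 0208 are treated alike, and
# JIS X 0201 Roman is copied through as if it were ASCII.
DESIGNATIONS = {
    b"\x1b(B": "ascii",     # ESC ( B  ASCII
    b"\x1b(J": "roman",     # ESC ( J  JIS X 0201-1976 Roman
    b"\x1b$@": "jisx0208",  # ESC $ @  JIS X 0208-1978
    b"\x1b$B": "jisx0208",  # ESC $ B  JIS X 0208-1983
}


def iso2022jp_to_eucjp(data: bytes) -> bytes:
    """Convert ISO-2022-JP bytes to EUC-JP by tracking the set designated to G0."""
    out = bytearray()
    g0 = "ascii"  # ASCII is initially designated to G0
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC starts a 3-byte designation
            seq = bytes(data[i:i + 3])
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence {seq!r} at offset {i}")
            g0 = DESIGNATIONS[seq]
            i += 3
        elif g0 == "jisx0208":
            if i + 1 >= len(data):
                raise ValueError("truncated 2-byte character")
            # Same JIS code, moved from GL (0x21-0x7E) to GR (0xA1-0xFE)
            out += bytes([data[i] | 0x80, data[i + 1] | 0x80])
            i += 2
        else:
            # ASCII or JIS-Roman: copy the byte unchanged
            out.append(data[i])
            i += 1
    return bytes(out)
```

As a usage check, converting "Hello, 日本語".encode("iso2022_jp") with this function should give the same bytes as "Hello, 日本語".encode("euc_jp"), because both encodings carry the same JIS X 0208 codes and differ only in the high bit and the escape sequences.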

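Although I think ISO-2022-JP is better than EUC-JP, ISO-2022-JP causes some problems in HTML. For example, both bytes of a JIS X 0208 character lie in the printable ASCII range (0x21 to 0x7E), so they can coincide with "<", ">" or "&"; a browser or HTML tool that ignores the escape sequences may then mistake them for markup.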
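To make this HTML problem concrete, here is a small check built on Python's iso2022_jp codec. The hypothetical function markup_lookalikes reports which non-ASCII characters of a string encode, in ISO-2022-JP, to bytes equal to the ASCII codes of "<", ">" or "&". It assumes each such character is emitted as ESC $ B, two bytes, ESC ( B, which is how that codec encodes a single JIS X 0208 character.

```python
from typing import List

ESC_JISX0208 = b"\x1b$B"  # ESC $ B: designate JIS X 0208-1983 to G0
ESC_ASCII = b"\x1b(B"     # ESC ( B: designate ASCII to G0


def markup_lookalikes(text: str, specials: bytes = b"<>&") -> List[str]:
    """Return non-ASCII characters whose ISO-2022-JP bytes include a byte in specials."""
    hits = []
    for ch in text:
        if ch.isascii():
            continue  # an ASCII "<" really is "<", so it is not a look-alike
        # Raises UnicodeEncodeError for characters ISO-2022-JP cannot represent
        raw = ch.encode("iso2022_jp")
        if raw.startswith(ESC_JISX0208) and raw.endswith(ESC_ASCII):
            body = raw[len(ESC_JISX0208):-len(ESC_ASCII)]  # the two JIS bytes
            if any(b in specials for b in body):
                hits.append(ch)
    return hits
```

For instance, markup_lookalikes("あうぜ") returns ["う", "ぜ"]: "う" is JIS X 0208 code 0x2426 and "ぜ" is 0x243C, and 0x26 and 0x3C are the ASCII codes of "&" and "<". A parser scanning the raw bytes would see an extra "&" and "<" that are not in the text.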

Shift-JIS

There is another encoding scheme for Japanese called Shift-JIS (also called MS-Kanji code). Unfortunately, Shift-JIS is widely used on MS-DOS, Windows, and Macintosh, but I think Shift-JIS is rubbish and should not be used anymore!!! We never use this encoding on this server except in this example.
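For comparison with the two encodings above, here is a minimal sketch of the usual arithmetic that maps a JIS X 0208 code to Shift-JIS; the function name jis_to_sjis is hypothetical. Two JIS rows share one Shift-JIS lead byte, and lead bytes jump from 0x9F to 0xE0 because 0xA1 to 0xDF are single-byte half-width katakana. Unlike EUC-JP, the trail byte can fall in the ASCII range: "ソ" (JIS 0x253D) becomes 0x83 0x5C, and 0x5C is the ASCII backslash, which is one frequently cited source of trouble with this encoding.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Map a JIS X 0208 code (two bytes, each 0x21-0x7E) to Shift-JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code")
    # Two JIS rows per lead byte: 0x81-0x9F, then 0xE0-0xEF (skipping 0xA0-0xDF)
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2:
        # Odd JIS row: trail byte 0x40-0x9E, skipping 0x7F (DEL)
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even JIS row: trail byte 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])
```

For example, jis_to_sjis(0x24, 0x22) gives b"\x82\xa0" ("あ"), and jis_to_sjis(0x25, 0x3D) gives b"\x83\x5c" ("ソ").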


________________________________________________________________________

TAKADA Toshihiro