ISO-2022-JP patch for WWWLibrary

What is the problem?

When HTML is used with multiple character sets, the special characters <, > and & (the octets with numeric values 60, 62 and 38 respectively) become ambiguous. In a multi-byte character set, octet 60 (for example) may appear at any position in the byte sequence of any of several characters, so the octet value alone does not say whether it is markup.

To avoid this ambiguity, the prevailing character set (for example, JIS X 0208-1983) is switched back to ASCII before any HTML markup signal, and only those characters that would be interpreted as special characters in plain text should be interpreted as markup signals in HTML.

Therefore, octets 60, 62 and 38 occurring in any character set other than ASCII should not be interpreted as HTML markup signals. The patch for WWWLibrary described below does this.
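The rule above can be sketched in C as a small scanner. This is not the patch itself, only an illustration under stated assumptions: the names count_markup_octets, charset_state, CS_ASCII and CS_NOT_ASCII are hypothetical, and the escape sequences handled are the four that ISO-2022-JP defines (ESC ( B, ESC ( J, ESC $ @ and ESC $ B). Following the text, anything other than ASCII is treated as non-markup.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not taken from WWWLibrary. */
enum charset_state { CS_ASCII, CS_NOT_ASCII };

#define ESC 0x1B

/*
 * Count the octets in buf[0..len) that should be treated as HTML markup
 * signals (<, > or &). Octets 60, 62 and 38 are counted only while the
 * prevailing character set is ASCII.
 */
size_t count_markup_octets(const unsigned char *buf, size_t len)
{
    enum charset_state state = CS_ASCII;  /* ISO-2022-JP text starts in ASCII */
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        /* A three-octet escape sequence designates the character set. */
        if (buf[i] == ESC && len - i >= 3) {
            unsigned char a = buf[i + 1];
            unsigned char b = buf[i + 2];

            if (a == '(' && b == 'B') {            /* ESC ( B : ASCII */
                state = CS_ASCII;
                i += 3;
                continue;
            }
            if ((a == '(' && b == 'J') ||          /* ESC ( J : JIS X 0201 Roman */
                (a == '$' && (b == '@' || b == 'B'))) {
                /* ESC $ @ : JIS X 0208-1978, ESC $ B : JIS X 0208-1983 */
                state = CS_NOT_ASCII;
                i += 3;
                continue;
            }
        }

        /* Outside ASCII, these octets are character data, not markup. */
        if (state == CS_ASCII &&
            (buf[i] == '<' || buf[i] == '>' || buf[i] == '&'))
            count++;
        i++;
    }
    return count;
}
```

For the input <p> ESC $ B << ESC ( B </p>, where the two octets of value 60 in the middle form one JIS X 0208 character, this counts four markup octets. A scan that ignored the escape sequences would count six.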

Patch


The following patch has already been incorporated into the CERN distribution. You do not need to apply it, but you still need to compile with -DISO_2022_JP.

I made an unofficial (quick and dirty :-) patch for WWWLibrary_2.11 from the WWW project at CERN. It also applies to 2.09a, with a few offsets.

Apply this patch and compile with -DISO_2022_JP; you can then handle ISO-2022-JP encoded HTML documents correctly.
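The -D option of a C compiler defines a preprocessor macro, so -DISO_2022_JP brings any code guarded by #ifdef ISO_2022_JP into the build. As a rough sketch of how such a flag can gate behaviour (the function is_markup_octet and its in_ascii parameter are hypothetical, and this does not claim to reproduce the patch):

```c
/* Hypothetical helper for illustration; not taken from WWWLibrary. */
int is_markup_octet(int c, int in_ascii)
{
    int special = (c == '<' || c == '>' || c == '&');

#ifdef ISO_2022_JP
    /* Built with -DISO_2022_JP: markup only while ASCII is in effect. */
    return special && in_ascii;
#else
    /* Built without the flag: the character set state is ignored. */
    (void)in_ascii;
    return special;
#endif
}
```

Without the flag, the function treats every octet 60, 62 or 38 as markup, which is the behaviour the problem statement describes as wrong for ISO-2022-JP documents.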

(Browsers)

________________________________________________________________________

TAKADA Toshihiro