From: vjs@helm.infores.com (Valerie Schmitt)
Date: Fri, 13 Jan 1995 17:07:30 -0500
Subject: [www-mling,00182] Re:  MULE problems
Message-Id: <199501132208.HAA29256@mail.core.ntt.jp>


At 06:10  95/01/14 +0900, www-mling@square.ntt.jp wrote:
>MULE is currently the only editor/viewer I can find for Solaris 2.3, but
>I am unable to get it to work.  I'd appreciate any ideas ...

Tom, (and anyone else interested)

There's a MULE mailing list run by Kenichi Handa (handa@etlken.etl.go.jp).

Send subscription requests to: mule-request@etl.go.jp
 (subscribe mule@etl.go.jp [yourname])

The list itself is: mule@etl.go.jp

Good luck!
Val
________________________________________________________________________
Valerie J. Schmitt vjs@infores.com  IRI Software Waltham MA 617-890-1100
Go Sharks! Go Isles!  Infrequently at The Tank: 213/1/10-13 or 214/1/5-8
"I believe this is heaven to no one else but me ..."   --Sarah McLachlan



From: tomd@mactom.pls.com (Tom Donaldson)
Date: Fri, 13 Jan 1995 16:09:46 -0500
Subject: [www-mling,00181] MULE problems
Message-Id: <9501132109.AA17721@mactom>


MULE is currently the only editor/viewer I can find for Solaris 2.3, but
I am unable to get it to work.  I'd appreciate any ideas ...

When I try to edit a file, either by passing it on the command line or via
the find-file command, I get this error message in the mini-buffer after
a couple of seconds:

    Variable binding depth exceeds max-specpdl-size

max-specpdl-size evaluates to 600.

I setq max-specpdl-size to 2400.  Now when I find-file, I get a longer
pause, some garbage collection messages, then another error message:

    Lisp nesting exceeds max-lisp-eval-depth

This appears to be some sort of infinite recursion.

max-lisp-eval-depth evaluates to 300.

I set max-lisp-eval-depth to 1200.  I get many garbage collection messages,
then this error message again:

    Variable binding depth exceeds max-specpdl-size

The problem occurs with Mule when built on both of these platforms:

    Sun SPARC 20 with Solaris 2.3
    HP/715 with HP/UX 9.x

Any ideas, anyone?
Thanks,
Tom
  # Tom Donaldson                2400 Research Blvd., Suite 350      #
  # Senior Software Developer    Rockville, MD 20850, USA            #
  # Personal Library Software    (301) 208-1222, FAX: (301) 963-9738 #
  # e-mail: tomd@pls.com         http://www.pls.com                   #


From: yergeau@alis.ca
Date: Fri, 13 Jan 95 14:53:40 EST
Subject: [www-mling,00180] Re: charset parameter
Message-Id: <9500137900.AA790030464@smtplink.alis.ca>


Jacob Sparre Andersen <sparre@nbi.dk> writes:
>Francois Yergeau <yergeau@alis.ca> wrote:
>| Sure, if all the languages in a document can be represented in Latin 1,
>| you have no problem, but what if this is not the case?
> ^^^^^^^^^^
>Then I consider it a problem with the character sets. :-)

Sure :-)  Unicode nicely solves most charset and language problems, but we're 
not there yet.

>About character sets:
>What I meant to say, was that it can happen that just mapping tags 
>between different character sets by using characters with the same 
>encoding might be impractical (look strange) or even impossible in some 
>cases.

Agreed.

>When all the tags are defined in a common subset of all the used character 
>sets this is not a problem.

This is pretty good already, but not enough for some important charsets.  
Somehow, Mosaic-L10n manages to parse HTML documents coded in ISO-2022-JP, 
which does not contain ASCII as a subset, so it can be done one way or another.

Then either SGML is more flexible than it looks, or those ISO-2022-JP HTML docs 
are not SGML conformant (and they will soon be "illegal", as per the HTML 
draft), or the developers of Mosaic-L10n were very clever :-)
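
For what it's worth, here's a quick sketch in Python (illustrative only; I 
don't know what Mosaic-L10n actually does) of why byte-level tag scanning can 
still work: the markup bytes stay in the ASCII range, and the escape sequences 
only bracket the Japanese text in between.

    # Hypothetical illustration using Python's standard iso-2022-jp codec.
    fragment = "<TITLE>\u65e5\u672c\u8a9e</TITLE>"   # "<TITLE>Japanese</TITLE>"
    encoded = fragment.encode("iso-2022-jp")

    # The tag bytes are plain ASCII; only the text between the ESC $ B and
    # ESC ( B escape sequences is in JIS X 0208.
    print(encoded)   # roughly: b'<TITLE>\x1b$BF|K\\8l\x1b(B</TITLE>'

    # A scanner that treats the stream as ASCII until it has settled on a
    # charset can therefore still locate the tags.
    print(encoded.decode("iso-2022-jp"))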

-- 
Francois Yergeau  <yergeau@alis.ca>
Alis Technologies Inc.
+1 514 738-9171



From: yergeau@alis.ca
Date: Fri, 13 Jan 95 13:55:13 EST
Subject: [www-mling,00179] Re: charset parameter (long)
Message-Id: <9500137900.AA790026888@smtplink.alis.ca>


Bruce Kahn <Bruce_Kahn@iris.com> writes:
>Francois Yergeau wrote:
>>>One idea, is for a HTML <charset> tag that would take precedence over the
>>>MIME header:
>>
>>I like this, but will it fly?  What about multi-charset documents?
>
>  Not very well if the text until <charset=xxx> is in one charset and the 
>rest is in another.  In order for a browser to grok the charset entry it must 
>be able to parse to it.

That's very clear, but it is not a problem for any of the numerous charsets that 
have ASCII as a subset (like Latin-1).  The parser sees only ASCII until it 
reaches <charset=xxx> and then knows how to display the document (in Cyrillic 
instead of Latin-1, say).  No need to switch charsets here; just continue with 
what you started with.

I think this would also work with ISO-2022-JP and the like;  ASCII is not 
really a subset there, but the document starts in ASCII until told otherwise.  
Perhaps some input from Japan would be helpful here.
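
Here is a rough sketch (Python, purely illustrative; the <charset=xxx> tag is 
only a proposal and the function below is hypothetical) of how a browser could 
sniff the tag while treating the bytes as ASCII, then decode the whole document 
with whatever charset it finds:

    import re

    def sniff_charset(raw, default="iso-8859-1"):
        """Look for the proposed <charset=xxx> tag in the raw bytes,
        scanning them as if they were ASCII; fall back to a default."""
        m = re.search(b"<charset=([-A-Za-z0-9_]+)>", raw, re.IGNORECASE)
        return m.group(1).decode("ascii") if m else default

    raw = b"<html><charset=koi8-r><title>...</title>...</html>"
    text = raw.decode(sniff_charset(raw))

The point is simply that the lookup itself never needs anything outside ASCII.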

>  Given that, this scheme would require the authors to either write in two 
>different character sets (one for the page and one for <charset>) or we would 
>have to hack the scheme to be something too gross to consider.

It seems to me that one can go a long way without involving two charsets.  
Much further than with Latin-1 only: the whole ISO-8859 series, plus possibly 
ISO-2022 and, come to think of it, UNICODE-1-1-UTF-7.

Why then is the HTML draft so restrictive?  Larry Masinter's proposed changes, 
while a nice opening, still restrict HTML to Latin-1 (section 2.16), which 
doesn't make for a very *World* Wide Web.

>Also, do we really want to get into the business of multi-charsets w/in 1 
>document??

Emphatically yes!

>I hope not otherwise all the discussion on a header line with the desired 
>charset for negotiating on a perfered format is for 
>nothing.  (I ask for a document in EUC but it has JIS or SJIS intermixed; how 
>could I grok those parts?)

First thing, the different charsets have to be identifiable, and that means 
tagging.

>  I think providing the character set information is better left to 
>negotiation between the browser and client (as discussed so far).

It has also been pointed out that the server needs to know what to tell the 
client, and Bob Jung proposed the <charset> tag just to help with that.  No 
need for parallel trees or other, grosser schemes if your documents identify 
themselves.

>I like Dans suggestion about having the preference rating but Im not sure 
>how useful it would be over say sending multiple accept-charsets in order of 
>preferece (ie: 1st is the most prefered, the last is least prefered).

HTTP/1.0 specifies that the ordering of header fields is not 
significant; I suppose that could be changed, though.  However, this 
simpler scheme does not easily allow combining charset priorities 
with Accept: and Accept-Language priorities.
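
To make the comparison concrete, here is a small sketch (Python, illustrative 
only; the ;q= rating syntax is just a guess at what a preference rating could 
look like) that orders charsets by an explicit rating rather than by their 
position in the header:

    def parse_accept_charset(header):
        """Parse a hypothetical preference-rated Accept-Charset value,
        e.g. "iso-8859-5;q=0.8, iso-2022-jp;q=0.5, iso-8859-1".
        The ;q= syntax is assumed here purely for illustration."""
        prefs = []
        for item in header.split(","):
            parts = item.strip().split(";")
            charset, q = parts[0].strip(), 1.0
            for p in parts[1:]:
                p = p.strip()
                if p.lower().startswith("q="):
                    q = float(p[2:])
            prefs.append((charset, q))
        # Order by the explicit rating, not by position in the header,
        # so the ordering of header fields need not be significant.
        return sorted(prefs, key=lambda cq: cq[1], reverse=True)

    print(parse_accept_charset("iso-8859-5;q=0.8, iso-2022-jp;q=0.5, iso-8859-1"))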

-- 
Francois Yergeau  <yergeau@alis.ca>
Alis Technologies Inc.
+1 514 738-9171



From: pandries@alis.ca
Date: Fri, 13 Jan 95 13:23:45 EST
Subject: [www-mling,00178] Re : proposed changes to charset parameter
Message-Id: <9500137900.AA790025424@smtplink.alis.ca>


Masinter@parc.xerox.com recently wrote : 

>To address the comments made at the meeting on character sets, I
>started with the text version of the HTML draft, edited it, and am
>sending proposed changes as diffs.

(...)
>================================================================
>diff -5c html-orig.txt html-revised.txt
>*** html-orig.txt Tue Jan 10 03:48:34 1995
>--- html-revised.txt Tue Jan 10 03:49:29 1995
>***************
(...)
In section 1.1.8 Character Data in HTML

>          Independent of the character encoding used,
>          HTML also allows references to any of the ISO Latin-1
>           alphabet, using the names in the table ISO Latin-1
>          Character Representations, which is derived from ISO
>           Standard 8879:1986//ENTITIES Added Latin 1//EN. For
>          details, see 2.17.2.

I was just wondering whether this sentence would not be clearer if we reserved 
the word "encoding" for the transport encoding (à la MIME) and simply said 
"character set" here, as is made explicit in the name of the parameter that 
specifies the "character encoding used".

>     2.16 Character Data
>  
>       Level 0
>  
!       The characters between HTML tags represent text. A HTML document
!       (including tags and text) is encoded using the coded character
!       set specified by the "charset" parameter of the "text/html"
!       media type.  For levels defined in this specification, the
!       "charset" parameter is restricted to "US-ASCII" or "ISO-8859-1".
!       ISO-8859-1 encodes a set of characters known as Latin Alphabet
!       No. 1, or simply Latin-1.  Latin-1 includes characters from most
!       Western European languages, as well as a number of control
!       characters.  Latin-1 also includes a non-breaking space, a soft
!       hyphen indicator, 93 graphical characters, 8 unassigned
!       characters, and 25 control characters.
! 
Stop me if I read this wrongly: you mean to say that we could not legally have 
Arabic, Cyrillic, Japanese, Korean or Chinese in any Web documents. I take it 
that the mention "for levels defined" means all valid levels? Even though we 
here in Canada would not suffer (too much) from this restriction, I wonder what 
native speakers from Russia, Japan or Korea would think. I believe, as I 
understand it, that this restriction is unacceptable ...


     Patrick Andries
     Alis Technologies Inc - open to all cultures
     1+514+738-9171
     e-mail : pandries@alis.ca



From: Jacob Sparre Andersen	<sparre@connect.nbi.dk>
Date: Fri, 13 Jan 95 17:02:26 MET
Subject: [www-mling,00177] Re: charset parameter
Message-Id: <199501131601.BAA27976@mail.core.ntt.jp>


Francois Yergeau <yergeau@alis.ca> wrote:
 __________
| Jacob Sparre Andersen <sparre@nbi.dk> writes:
 ^^^^^^^^^^
|^^^^^^^^^^
| Sure, if all the languages in a document can be represented in Latin 1,
| you have no problem, but what if this is not the case?
 ^^^^^^^^^^
Then I consider it a problem with the character sets. :-)
    
About character sets:
|^^^^^^^^^^
| >This could be handled
| >by filling insignificant bits on the characters with too low number of
| >bits in the representation, but you would still need to write the DTD
| >in each character set you use it with.
|
| Sorry, but now I'm not following you any more.  Care to explain, please?
 ^^^^^^^^^^
What I meant to say, was that it can happen that just mapping tags between 
different character sets by using characters with the same encoding might 
be impractical (look strange) or even impossible in some cases.
      
When all the tags are defined in a common subset of all the used character 
sets this is not a problem.
       
Regards,
               Jacob Sparre Andersen
--
URL's: "mailto:sparre@nbi.dk", "http://meyer.fys.ku.dk/~sparre", 
       "mailto:sparre+@pitt.edu" & "http://www.pitt.edu/~sparre".
--
"We need a plan to diverge from", Fesser