A challenge to the ASCII proponents.

Bengt Richter bokr at oz.net
Fri Jul 18 20:35:42 CEST 2003


On 18 Jul 2003 09:30:32 +0200, Hallvard B Furuseth <h.b.furuseth(nospam)@usit.uio(nospam).no> wrote:

>Oren Tirosh wrote:
>>On Fri, Jul 18, 2003 at 03:11:56AM +0000, Bengt Richter wrote:
>
>>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-7">
>
>Needs to be charset=utf-8.  iso-8859-7 has no character number 947.
You're right. I think the iso-8859-7 just served as a font hint, in effect.
You can't leave out the <META ... line on my browser (NS4.5, English font defaults),
but I would guess one could with Greek defaults. I would also guess everything is
converted to Windows wchars internally either way, according to some best-guess rules
if things aren't consistent.

If we add a space and the character &#1046; after the Greek, the difference
will show up: with utf-8 you get the Cyrillic, and with iso-8859-7 you get a question mark
(or at least you do with NS4.5). Just tried IE5 -- it seems to fake it either way, but it screws up
the presentation with a change in font weight after two characters, and then spaces between chars.
Don't know what that's about. I don't use IE5, (comma required ;-) normally.
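
For anyone who wants to repeat the experiment, here is a minimal sketch that builds the
two test pages. The names TEMPLATE, make_page and pages are made up for illustration, and
the heading holds only the first Greek reference plus the Cyrillic one, since the rest of
the original heading is elided above. The pages differ only in the declared charset; the
bytes are pure ASCII either way.

```python
# Two otherwise identical test pages that differ only in the declared charset.
# Every non-ASCII character is written as a numeric character reference,
# so the page itself is plain ASCII.
TEMPLATE = (
    '<html><head>\n'
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=%s">\n'
    '</head><body>\n'
    '<h1>&#947; &#1046;</h1>\n'
    '</body></html>\n'
)


def make_page(charset):
    """Return the test page as bytes for the given declared charset."""
    return (TEMPLATE % charset).encode("ascii")


pages = {cs: make_page(cs) for cs in ("utf-8", "iso-8859-7")}

# U+0416 (decimal 1046) is in UCS but has no slot in iso-8859-7,
# so a browser that looks the number up in the declared charset has nothing to show.
try:
    chr(1046).encode("iso-8859-7")
    cyrillic_in_8859_7 = True
except UnicodeEncodeError:
    cyrillic_in_8859_7 = False

print(cyrillic_in_8859_7)
```

Writing pages["utf-8"] and pages["iso-8859-7"] to two files and loading each in the
browser under test should show whether it treats &#1046; differently depending on the
charset line.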
>
>>> <h1>&#947;(...)
>
>> Actually, you don't need the "CHARSET=iso-8859-7". It would be
>> required if you used the bytes 227, 223, 227, 237, 249, 243, 234, 249
>> to represent the characters. With numeric character references you can
>> embed any character from the UCS repertoire regardless of the charset
>> used.
>
>&#<num>; seems to mean character number NUM in the current character
>set, not in UCS.  At least on NS 4.79.
That seems to be confirmed by the Cyrillic experiment above, so now for at least
NS4.5 and NS4.79.
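
To keep the two kinds of numbers apart, here is a small Python 3 sketch. It assumes the
byte values Oren quoted are meant as iso-8859-7, and it uses html.unescape from the
standard library as a stand-in for a parser that follows HTML 4, which defines &#NNN; in
terms of the UCS code point whatever the charset line says. So what NS4.x shows looks like
browser behavior rather than what the spec asks for. The variable names are just for
illustration.

```python
import html

# Byte values quoted above; reading them as iso-8859-7 is the assumption here.
raw = bytes([227, 223, 227, 237, 249, 243, 234, 249])
text = raw.decode("iso-8859-7")

# The UCS code point of each character. The first one is 947, not 227.
code_points = [ord(ch) for ch in text]

# The same text written as numeric character references: pure ASCII.
refs = "".join("&#%d;" % cp for cp in code_points)

# A parser that follows the spec maps the references back to the same text.
round_trip = html.unescape(refs)

print(list(raw))
print(code_points)
print(refs)
print(round_trip == text)
```

The byte 227 and the code point 947 name the same character; only the raw bytes need
the charset declaration to be interpreted correctly.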

Regards,
Bengt Richter



