[I18n-sig] Re: [XML-SIG] XML and Unicode

Walter Doerwald walter@livinglogic.de
Wed, 23 May 2001 11:35:32 +0200


On 22.05.01 at 19:33 Mark Nottingham wrote:

> OK, so I'm not getting something then. The attached test script (and
> data file) is the problem pared down - if u'string' is a neutral
> encoding, and .encode('utf-8') generates a utf-8 encoded string of
> that encoding, then the utf-8.html output file should display
> correctly; however, it doesn't, while the latin-1 output does
> (because the input is latin-1).

>>> open("ISO-8859-1.xml","rb").read()
'<?xml version="1.0" encoding="ISO-8859-1" ?>\r\n<content>Net 21 \x96 The Survivors</content>\r\n\r\n'

The character contained in your test XML file seems to be \x96. In
ISO-8859-1 and in Unicode (U+0096) this is a C1 control character;
only Windows (code page 1252) uses that byte for an en dash.
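
To make the difference concrete, here is a minimal sketch that decodes
the same bytes under both interpretations. It is written for Python 3,
which is an assumption (the session above is Python 2), and the
variable names are only illustrative:

```python
import unicodedata

raw = b"Net 21 \x96 The Survivors"

# ISO-8859-1 maps each byte to the code point with the same number,
# so 0x96 becomes U+0096, a C1 control character.
as_latin1 = raw.decode("iso-8859-1")

# Windows code page 1252 assigns 0x96 to the en dash, U+2013.
as_cp1252 = raw.decode("cp1252")

# The byte in question sits at index 7 ("Net 21 " is seven characters).
latin1_char = as_latin1[7]
cp1252_char = as_cp1252[7]

print(hex(ord(latin1_char)), unicodedata.category(latin1_char))  # 0x96 Cc
print(hex(ord(cp1252_char)), unicodedata.name(cp1252_char))      # 0x2013 EN DASH
```

So a file that declares ISO-8859-1 but was saved on Windows with an
en dash ends up carrying a control character as far as an XML parser
is concerned.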

If you want a "real" en dash, you should use the Unicode character
U+2013 (EN DASH): "Net 21 &#8211; The Survivors".

But then encoding the output with latin-1 will no longer work,
because latin-1 has no representation for U+2013.
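
A short sketch of that behaviour, again assuming Python 3. The
xmlcharrefreplace fallback at the end is one possible workaround and
is not something the original test script is known to use:

```python
text = "Net 21 \u2013 The Survivors"

# UTF-8 can represent every Unicode character; the en dash becomes
# the three bytes E2 80 93.
utf8_bytes = text.encode("utf-8")

# latin-1 only covers U+0000..U+00FF, so U+2013 cannot be encoded.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Possible workaround for XML/HTML output: replace unencodable
# characters with numeric character references.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(utf8_bytes)
print(latin1_ok)
print(escaped)
```

With the character reference in place, the output stays valid latin-1
and still shows an en dash once a browser or XML parser resolves
&#8211;.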

> [...]

BTW, you might want to try several variants of the name of the
output encoding, because although Python's encode method recognises
the name, your web browser might not.
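
As a rough illustration (Python 3 assumed; the list of spellings is
only an example), codecs.lookup shows that Python resolves many
spellings to the same codec. Whether a given browser accepts each
spelling is a separate question that this sketch does not test:

```python
import codecs

# Several spellings that Python accepts for the same encoding.
aliases = ["latin-1", "latin1", "iso-8859-1", "iso8859-1", "l1"]

# codecs.lookup() returns a CodecInfo whose .name attribute is
# Python's own canonical name for the codec.
canonical = {alias: codecs.lookup(alias).name for alias in aliases}

for alias, name in canonical.items():
    print(alias, "->", name)
```

In the encoding declaration that ends up in the output file, the
registered spelling such as "ISO-8859-1" or "UTF-8" is probably the
safest choice.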

Bye,
   Walter Dörwald

-- 
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de