[I18n-sig] Re: [XML-SIG] XML and Unicode
Walter Doerwald
walter@livinglogic.de
Wed, 23 May 2001 11:35:32 +0200
On 22.05.01 at 19:33 Mark Nottingham wrote:
> OK, so I'm not getting something then. The attached test script (and
> data file) is the problem pared down - if u'string' is a neutral
> encoding, and .encode('utf-8') generates a utf-8 encoded string of
> that encoding, then the utf-8.html output file should display
> correctly; however, it doesn't, while the latin-1 output does
> (because the input is latin-1).
>>> open("ISO-8859-1.xml","rb").read()
'<?xml version="1.0" encoding="ISO-8859-1" ?>\r\n<content>Net 21 \x96 The Survivors</content>\r\n\r\n'
The character in your test XML file seems to be \x96. In ISO-8859-1
(and in Unicode, as U+0096) that is a control character, but Windows
(code page 1252) uses that byte for an en dash.
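To see the difference, here is a small sketch that decodes the same byte both ways. It assumes a current Python 3 interpreter (not the Python of this thread), and the byte string is just the content of the test file's element, typed in by hand:

```python
import unicodedata

# The element content from the test file, as raw bytes.
raw = b"Net 21 \x96 The Survivors"

# Decoded as declared in the XML header: byte 0x96 becomes U+0096.
as_latin1 = raw.decode("iso-8859-1")
# Decoded as Windows code page 1252: byte 0x96 becomes U+2013.
as_cp1252 = raw.decode("cp1252")

# Index 7 is the position of the disputed character in both strings.
print(hex(ord(as_latin1[7])), unicodedata.category(as_latin1[7]))
print(hex(ord(as_cp1252[7])), unicodedata.name(as_cp1252[7]))
```

The first line reports general category Cc (a control character); the second reports an en dash. So a file produced on Windows but labelled ISO-8859-1 is decoded "correctly" into the wrong character.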
If you want a "real" en dash, you should use the Unicode character
U+2013: "Net 21 – The Survivors".
But then encoding the output with latin-1 will no longer work,
because latin-1 has no en dash.
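A minimal sketch of what happens with U+2013, again assuming Python 3. The xmlcharrefreplace fallback at the end is my own suggestion for HTML output, not something from the test script:

```python
text = "Net 21 \u2013 The Survivors"

# UTF-8 can represent every Unicode character, so this always works.
utf8_bytes = text.encode("utf-8")

# Latin-1 only covers U+0000..U+00FF, so U+2013 cannot be encoded.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# One possible fallback for HTML/XML output: emit a numeric character
# reference for anything the target encoding cannot represent.
latin1_with_ref = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(utf8_bytes)
print(latin1_ok)
print(latin1_with_ref)
```

With the fallback, the latin-1 output contains &#8211; in place of the dash, which a browser should render as an en dash regardless of the page encoding.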
> [...]
BTW, you might want to try several variants of the name of the
output encoding, because although Python's encode method recognises
the name, your web browser might not.
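As an illustration of how forgiving Python is with encoding names, the sketch below (Python 3 assumed; the list of spellings is only an example) resolves several variants through codecs.lookup:

```python
import codecs

# Several spellings that Python accepts for the same codec.
variants = ["latin-1", "latin1", "iso-8859-1", "ISO-8859-1", "L1"]

# codecs.lookup() normalises each spelling to one canonical codec name.
canonical = {codecs.lookup(name).name for name in variants}

# All spellings produce identical bytes for the same text.
encoded = {"D\u00f6rwald".encode(name) for name in variants}

print(canonical)
print(encoded)
```

A browser does no such normalisation on Python's behalf, so in the charset declaration of the generated HTML it is probably safest to use the registered spelling, e.g. ISO-8859-1 or UTF-8.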
Bye,
Walter Dörwald
--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany · www.livinglogic.de