[XML-SIG] Re: XML Unicode and UTF-8

Neil Youngman neil.youngman at youngman.org.uk
Sat Aug 7 21:11:08 CEST 2004


On Saturday 07 Aug 2004 3:42 pm, Fredrik Lundh wrote:
> Neil Youngman wrote:
> > Yes, but it's being written out through a UTF-8 codec to a file which
> > specifies 'charset="utf-8"'. AIUI the python UTF-8 codec can detect that
> > it's got a unicode string and convert it to utf-8 with no further
> > programmer intervention.
>
> Python's UTF-8 codec takes a Unicode object, and generates an 8-bit string
> object.  If you attempt to "encode" an 8-bit string object, it is converted
> to a Unicode object first.  This conversion only works if the 8-bit string
> contains ASCII characters only.
>
> There's no such thing as an 8-bit Unicode string.

I never said there was. The string comes from decode, which I believe returns 
a Unicode string. AIUI the Python type system preserves that information 
until it reaches the codec, which therefore treats it correctly. My use of 
the phrase "the python UTF-8 codec can detect that it's got a unicode string" 
might have been a poor choice, but I don't think I'm disagreeing with you.
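
For what it's worth, here is a minimal sketch of the round trip I have in mind. 
It is written in Python 3 syntax, where the 8-bit type is called bytes and the 
Unicode type is called str, so the implicit ASCII conversion Fredrik describes 
has to be spelled out explicitly. The sample bytes and the ISO-8859-1 source 
encoding are assumptions for illustration, not taken from my actual code.

```python
# Hypothetical 8-bit input: "cafe" with an e-acute, encoded as ISO-8859-1.
raw = b"caf\xe9"

# decode returns a Unicode string, so the type records that this is text.
text = raw.decode("iso-8859-1")

# The UTF-8 codec takes the Unicode string and produces an 8-bit string.
utf8 = text.encode("utf-8")

# Encoding an 8-bit string directly implied an ASCII decode first.
# Spelled out explicitly, that step fails on any non-ASCII byte.
try:
    raw.decode("ascii").encode("utf-8")
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False
```

The last step is the case Fredrik describes: it only works when the 8-bit 
string contains nothing but ASCII.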

Neil Youngman


