Unicode perplex

John Roth newsgroups at jhrothjr.com
Mon Jun 21 23:57:42 CEST 2004

"Irmen de Jong" <irmen at -nospam-remove-this-xs4all.nl> wrote in message
news:40d74e5d$0$568$e4fe514c at news.xs4all.nl...
> John Roth wrote:
> > Remember that the trick
> > is that it's still going to have the *same* stream of
> > bytes (at least if the Unicode string is implemented
> > in UTF-8.)
> Which it isn't.
> AFAIK Python's storage format for Unicode strings is
> some form of 2-byte representation, it certainly isn't
> UTF-8.
> So if you want to turn your string into a Python Unicode
> object, you really have to push it through the UTF-8 codec...

I see. I'm really very much a novice at Unicode and all
the codec stuff. If I understand you, I need to get the
UTF-8 codec and use its decode function to turn the bytes
into a Unicode string, and then use the encode function to
turn it back into a standard 8-bit string so I can write
it out (or send it down the pipe or socket...)
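
As a rough sketch of that round trip (assuming modern Python 3
spelling, where the 8-bit string is a bytes object and the Unicode
string is str; the sample bytes and the variable names are made up
for illustration):

```python
import codecs

# Hypothetical input: UTF-8 bytes as they might arrive from a file,
# pipe, or socket. The two bytes \xc3\xa9 encode one character (e-acute).
raw = b"caf\xc3\xa9"

# Decode: 8-bit UTF-8 bytes -> Unicode string.
text = raw.decode("utf-8")

# ... work with the Unicode string here ...

# Encode: Unicode string -> 8-bit UTF-8 bytes, ready to write out.
out = text.encode("utf-8")

# The same decode step, fetching the codec explicitly by name.
# The codec's decode function returns (string, number of bytes consumed).
utf8 = codecs.lookup("utf-8")
text2, consumed = utf8.decode(raw)

print(len(raw), len(text), out == raw)
```

The byte count and the character count differ, which is the point
Irmen makes above: the Unicode object is not just the same stream
of bytes with a different label on it.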

Thanks. Now that you point it out, it does look kind
of obvious - the second time.

John Roth
> --Irmen
