Using Unicode scripts

Martin v. Löwis martin at v.loewis.de
Fri Jul 18 11:21:27 EDT 2003


"yzzzzz" <yzzzzz at netcourrier.com> writes:

> This means that Python doesn't take into account the specified encoding
> (Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
> file, regardless of the encoding used. Is this normal? 

It is. The "standard" string type of Python is a byte string, which is
not suitable to represent characters in general (although it works
fine for ASCII).
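
A minimal sketch of why bytes and characters diverge outside ASCII. It is written for Python 3, where `bytes` plays the role of the byte string described above and `str` is the Unicode type; that mapping, and the variable names, are assumptions made for illustration and are not part of the original question.

```python
# Python 3 sketch: `bytes` stands in for the byte string discussed above.
text = "\u00e9"  # one character: LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = text.encode("latin-1")  # one byte per character here
utf8_bytes = text.encode("utf-8")      # two bytes for the same character

# The same single character occupies a different number of bytes
# depending on the encoding, so byte length is not character length.
print(len(text), len(latin1_bytes), len(utf8_bytes))

# For pure ASCII text the two views coincide, which is why byte strings
# "work fine" there.
ascii_bytes = "abc".encode("ascii")
print(len("abc"), len(ascii_bytes))
```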

As the state of a string object consists just of its bytes, no
knowledge of the original source encoding can be preserved. The
source encoding is not preserved for Unicode strings either, but that
is not a problem, since Unicode literals are converted to Unicode
while the source is parsed.
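
The conversion during parsing can be sketched roughly as follows, again in Python 3, where every string literal is a Unicode string. The helper `parse_literal` and the two tiny source files are hypothetical, invented for illustration; each source carries its own encoding declaration, and the same character is stored with different bytes in each.

```python
def parse_literal(source_bytes):
    """Execute a tiny module given as raw bytes and return its variable `s`."""
    namespace = {}
    # The parser reads the coding declaration on the first line and decodes
    # the rest of the source accordingly before building string objects.
    exec(source_bytes, namespace)
    return namespace["s"]

# The same one-character literal, saved in two different encodings.
latin1_source = b"# -*- coding: latin-1 -*-\ns = '\xe9'\n"
utf8_source = b"# -*- coding: utf-8 -*-\ns = '\xc3\xa9'\n"

from_latin1 = parse_literal(latin1_source)
from_utf8 = parse_literal(utf8_source)

# Both results are the same Unicode string; neither remembers which
# encoding its source file used.
print(from_latin1 == from_utf8)
```

Once parsed, the two strings are indistinguishable, which is why losing the source encoding does no harm in the Unicode case.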

Regards,
Martin




