[Python-Dev] unifying str and unicode
skip@pobox.com
Mon Oct 3 23:45:44 CEST 2005
Antoine> If an stdlib function returns an 8-bit string containing
Antoine> non-ascii data, then this string used in unicode context incurs
Antoine> an implicit conversion, which fails.
Such strings should be converted to Unicode at the point where they enter
the application. That's likely the only place where you have a good chance
of knowing the data encoding. Files generally have no encoding information
associated with them. Some databases don't handle Unicode transparently.
If you hang onto the input from such sources as plain strings until you need
them as Unicode, you will almost certainly not know how the string was
encoded. The state of the outside Unicode world being as miserable as it is
(think web input forms), you often don't know the encoding at the interface
and have to guess anyway. Even so, confining that guesswork to the
interface is better than trying to recover the encoding somewhere further
downstream.
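
As a rough illustration of decoding at the boundary, here is a minimal
sketch in modern Python (3.x, where bytes and str are distinct types). The
helper name to_text, its parameters, and the fallback list of encodings are
all hypothetical choices for this example, not an existing stdlib API. It
tries a declared encoding if the interface supplied one, then falls back to
guesses, and reports which encoding it ended up using.

```python
def to_text(raw, declared=None, guesses=("utf-8", "latin-1")):
    """Decode bytes as they enter the application.

    Returns a (text, encoding_used) pair. All guesswork about the
    encoding lives here, so downstream code only ever sees text.
    """
    # Try the encoding the interface declared (if any), then the guesses.
    candidates = ([declared] if declared else []) + list(guesses)
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a codec name Python does not know.
            continue
    # Last resort: keep the ASCII part and mark everything else as unknown.
    return raw.decode("ascii", errors="replace"), "ascii"


# Hypothetical usage at an input boundary:
text, used = to_text(b"caf\xe9")      # not valid UTF-8, falls back to latin-1
print(repr(text), used)
```

Note that latin-1 accepts any byte sequence, so putting it last in the
guesses means decoding always succeeds, though not necessarily correctly.
That is the guesswork, and it happens in exactly one place.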
Skip