[Python-Dev] unifying str and unicode

skip@pobox.com skip at pobox.com
Mon Oct 3 23:45:44 CEST 2005


    Antoine> If an stdlib function returns an 8-bit string containing
    Antoine> non-ascii data, then this string used in unicode context incurs
    Antoine> an implicit conversion, which fails. 

Such strings should be converted to Unicode at the point where they enter
the application.  That's likely the only place where you have a good chance
of knowing the data encoding.  Files generally have no encoding information
associated with them.  Some databases don't handle Unicode transparently.
If you hang onto the input from such sources as plain strings until you need
them as Unicode, you will almost certainly no longer know how the strings
were encoded.  The state of the outside Unicode world being as miserable as
it is (think web input forms), you often don't know the encoding even at the
interface and have to guess anyway.  Even so, isolating that guesswork at the
interface is better than trying to recover somewhere further downstream.
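
To make the idea concrete, here is a minimal sketch of decoding at the
boundary.  It is written for present-day Python 3, where bytes plays the role
of the 8-bit string and str the role of unicode.  The helper name to_text,
its parameters, and the particular fallback encodings are assumptions made up
for illustration, not anything from the stdlib:

```python
# Hypothetical helper: decode raw bytes exactly once, where they enter
# the application, and remember which encoding was used.
def to_text(raw, declared=None, guesses=("utf-8",), last_resort="latin-1"):
    """Return (text, encoding_used) for bytes read from the outside world."""
    # Prefer whatever the interface told us (HTTP header, DB setting, ...),
    # then fall back to a short list of guesses.
    candidates = ([declared] if declared else []) + list(guesses)
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a codec name Python does not know: try the next.
            continue
    # latin-1 maps every byte to some code point, so this cannot fail,
    # but the result is only a guess and may be the wrong text.
    return raw.decode(last_resort), last_resort


# Everything past this point in the program would deal in text only.
text, used = to_text(b"caf\xc3\xa9")              # no declared encoding
legacy, used_legacy = to_text(b"caf\xe9")         # not valid UTF-8
euro, used_euro = to_text(b"\x80", declared="cp1252")
```

The guesswork still happens, but it happens in one place, and the encoding
that was actually used is returned so it can be logged or questioned later.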

Skip