[Python-Dev] unifying str and unicode
Antoine Pitrou
solipsis at pitrou.net
Mon Oct 3 22:38:19 CEST 2005
> > If that's how things were designed, then Python's entire standard
> > library (not to mention third-party libraries) is not "unicode safe" -
> > to quote your own words - since many functions may return 8-bit strings
> > containing non-ascii characters.
>
> huh? first you talk about functions that convert unicode strings to 8-bit
> strings, now you talk about functions that return raw 8-bit strings?
Are you deliberately missing the argument?
And can't you understand that conversions are problematic in both
directions (str -> unicode /and/ unicode -> str)?
If a stdlib function returns an 8-bit string containing non-ascii data,
then using this string in a unicode context triggers an implicit
conversion, which fails. How's that for "unicode safety" of stdlib
functions? Will you argue that this causes no difficulty for anyone?
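To make the failure concrete, here is a minimal sketch. In the Python discussed in this thread, mixing an 8-bit string with a unicode string decodes the 8-bit string with the default ASCII codec behind your back. The sketch spells out that decode explicitly so that it also runs on later Python versions. The helper name implicit_to_unicode and the sample data are made up for illustration.

```python
# Sketch only: mimics the implicit str -> unicode coercion by making the
# ASCII decode explicit. implicit_to_unicode is a hypothetical name.

def implicit_to_unicode(raw):
    """Decode 8-bit data the way the ascii-by-default policy would."""
    return raw.decode("ascii")


ascii_only = b"cafe"        # pure ASCII: the conversion succeeds
non_ascii = b"caf\xe9"      # one byte > 0x7F (e.g. Latin-1 data)

ok = implicit_to_unicode(ascii_only)

try:
    implicit_to_unicode(non_ascii)
    failed = False
except UnicodeDecodeError:
    # This is the error a caller hits when such a string is used in a
    # unicode context.
    failed = True
```

Nothing about the second string warns the caller in advance; the error only shows up at the point where the string meets unicode data.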
> all this in response to a post that argues that it's in fact a good idea to
> use plain strings to hold textual data that happens to contain ASCII only,
You apparently didn't read my answer to that, which is:
you can never be sure that a variable containing something which
is /semantically/ textual (*) will never contain anything other than
ASCII text. For example raw_input() won't tell you that its 8-bit string
result contains some chars > 0x7F. Same for many other library
functions. How do you cope with (more or less occasional) non-ascii data
coming in as 8-bit strings?
(*) that is, contains some natural language
Either you carefully plan for non-ascii text coming into your application
(including workarounds against Python's ascii-by-default conversion
policy), or you deliberately cripple your application by deciding that
non-ASCII text is forbidden in (some or all) places. Choose the latter
and you'll be hostile to users.
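As a rough illustration of what "planning for it" can look like, one option is to decode incoming 8-bit data once, at the boundary, with an explicit encoding. This is only a sketch under assumptions: the helper decode_input is hypothetical, UTF-8 is an assumed input encoding (the real one depends on where the data comes from), and replacing undecodable bytes is just one possible policy.

```python
# Sketch only: decode at the input boundary instead of relying on the
# implicit ASCII conversion. decode_input and the UTF-8 default are
# assumptions made for this example.

def decode_input(raw, encoding="utf-8"):
    """Turn incoming 8-bit data into text once, with an explicit encoding."""
    if isinstance(raw, bytes):
        # "replace" substitutes U+FFFD for undecodable bytes rather than
        # raising; a stricter application might prefer to raise.
        return raw.decode(encoding, "replace")
    return raw  # already text, nothing to do


name = decode_input(b"Andr\xc3\xa9")   # UTF-8 bytes for "Andre" with e-acute
greeting = u"Hello, " + name           # both operands are text already

bad = decode_input(b"Andr\xe9")        # Latin-1 byte, not valid UTF-8
```

The cost is that every place where 8-bit data enters the program has to go through such a helper, which is exactly the kind of careful planning meant above.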
And this thread began with a poster who found the way implicit
conversions happen in Python difficult. So it's very funny that you deny
the existence of a problem for certain developers.
Antoine.