[Python-Dev] unifying str and unicode
James Y Knight
foom at fuhm.net
Tue Oct 4 05:44:13 CEST 2005
On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>>>> If I have a unicode string containing legal characters greater than
>>>> 0x7F, and I pass it to a function which converts it to str, the
>>>> conversion fails.
>>> so? if it does that, it's not unicode safe. what does that have to do
>>> with my argument (which is that you can safely mix ascii strings and
>>> unicode strings, because that's how things were designed).
>> If that's how things were designed, then Python's entire standard
>> library (not to mention third-party libraries) is not "unicode safe" -
>> to quote your own words - since many functions may return 8-bit
>> strings containing non-ascii characters.
> huh? first you talk about functions that convert unicode strings to 8-bit
> strings, now you talk about functions that return raw 8-bit strings? and
> all this in response to a post that argues that it's in fact a good idea to
> use plain strings to hold textual data that happens to contain ASCII only,
> because 1) it works, by design, and 2) it's almost always more efficient.
> if you don't know what your own argument is, you cannot expect anyone
> to understand it.
Your point would be much easier to stomach if the "str" type could
*only* hold 7-bit ASCII. Perhaps that can be done when Python gets an
actual bytes type in 3.0. There indeed are a multitude of uses for
the efficient storage/processing of ASCII-only data. However,
currently, there are problems because it's so easy to screw yourself
without noticing when mixing unicode and str objects. If, on the
other hand, you have a 7-bit ASCII string type and a 16/32-bit
unicode string type, both can be used interchangeably and there is no
possibility for any en/de-coding issues. And
asciiOnlyStringType.encode('utf-8') can become _ultra_ efficient, as
a bonus. :)
Seems win-win to me.
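
To make the idea concrete, here is a rough sketch in Python 3, where
bytes plays the role of the 8-bit type and str is the unicode type. The
names AsciiStr and mix_like_python2 are hypothetical and exist only for
this illustration; nothing like them is in the standard library. The
helper imitates the implicit ASCII decode that happens today when a str
meets a unicode object, and the class shows how a strictly 7-bit type
would sidestep that failure.

```python
def mix_like_python2(raw: bytes, text: str) -> str:
    """Imitate the implicit ASCII decode applied when an 8-bit string
    is combined with unicode text (hypothetical helper)."""
    # Fails with UnicodeDecodeError as soon as raw holds a byte > 0x7F.
    return raw.decode("ascii") + text


class AsciiStr:
    """Hypothetical string type that can only hold 7-bit ASCII."""

    def __init__(self, data: bytes):
        # Reject non-ASCII data up front, so later mixing cannot fail.
        if any(byte > 0x7F for byte in data):
            raise ValueError("AsciiStr may only hold 7-bit ASCII")
        self._data = data

    def encode(self, encoding: str = "utf-8") -> bytes:
        # ASCII is a byte-for-byte subset of UTF-8, so the stored bytes
        # can be handed back without any transcoding.
        if encoding.lower().replace("_", "-") in ("utf-8", "ascii"):
            return self._data
        return self._data.decode("ascii").encode(encoding)

    def __add__(self, other):
        if isinstance(other, AsciiStr):
            return AsciiStr(self._data + other._data)
        if isinstance(other, str):
            # Decoding validated ASCII can never raise.
            return self._data.decode("ascii") + other
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return other + self._data.decode("ascii")
        return NotImplemented


if __name__ == "__main__":
    name = AsciiStr(b"cafe")
    print(name + " \u00e9")       # mixes with unicode text without error
    print(name.encode("utf-8"))   # same bytes back, no conversion work
    try:
        mix_like_python2(b"caf\xc3\xa9", "!")
    except UnicodeDecodeError as exc:
        print("implicit decode failed:", exc.reason)
```

The point of validating in the constructor is that the failure moves to
the place where the bad data enters, instead of surfacing much later
when the string happens to be combined with unicode text.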