[Python-Dev] unifying str and unicode

Tue Oct 4 05:44:13 CEST 2005

On Oct 3, 2005, at 3:47 PM, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
>
>>>> If I have an unicode string containing legal characters greater  
>>>> than
>>>> 0x7F, and I pass it to a function which converts it to str, the
>>>> conversion fails.
>>>>
>>>
>>> so?  if it does that, it's not unicode safe.
>>>
>> [...]
>>
>>> what's that has to do with
>>> my argument (which is that you can safely mix ascii strings and  
>>> unicode
>>> strings, because that's how things were designed).
>>>
>>
>> If that's how things were designed, then Python's entire standard
>> brary (not to mention third-party libraries) is not "unicode safe" -
>> to quote your own words - since many functions may return 8-bit  
>> strings
>> containing non-ascii characters.
>>
>
> huh?  first you talk about functions that convert unicode strings  
> to 8-bit
> strings, now you talk about functions that return raw 8-bit  
> strings?  and
> all this in response to a post that argues that it's in fact a good  
> idea to
> use plain strings to hold textual data that happens to contain  
> ASCII only,
> because 1) it works, by design, and 2) it's almost always more  
> efficient.
>
> if you don't know what your own argument is, you cannot expect anyone
> to understand it.

Your point would be much easier to stomach if the "str" type could  
*only* hold 7-bit ASCII. Perhaps that can be done when Python gets an  
actual bytes type in 3.0. There indeed are a multitude of uses for  
the efficient storage/processing of ASCII-only data. However,  
currently, there are problems because it's so easy to screw yourself  
without noticing when mixing unicode and str objects. If, on the  
other hand, you have a 7bit ascii string type, and a 16/32-bit  
unicode string type, both can be used interchangeably and there is no  
possibility for any en/de-coding issues. And  
asciiOnlyStringType.encode('utf-8') can become _ultra_ efficient, as  
a bonus. :)

Seems win-win to me.

James