[Python-Dev] str object going in Py3K

James Y Knight foom at fuhm.net
Wed Feb 15 17:48:18 CET 2006


On Feb 15, 2006, at 7:19 AM, Fuzzyman wrote:
> [snip..]
>
> I personally like the move towards all-Unicode strings: basically,
> any text whose encoding you don't know is 'random binary data'.
> This works fine, so long as you are in control of the text source.
> *However*, it leaves the following problem:
>
> The current situation (treating byte sequences as text and assuming
> they are text in an ASCII-superset encoding) *works* (albeit
> with many breakages), simply because this assumption is usually
> correct.
>
> Forcing the programmer to be aware of encodings also pushes the
> same requirement onto the user (who is often the source of the text
> in question).
>
> Currently you can read a text file and process it, making sure
> that any changes/requirements only use ASCII characters. It
> therefore doesn't matter which 8-bit ASCII-superset encoding is used
> in the original. If you force the programmer to specify the
> encoding in order to read the file, they would have to pass that
> requirement on to their user. Their user is even less likely to be
> encoding-aware than the programmer.

Or the programmer can just use "iso-8859-1" and call it done. That  
will get you the same "I don't care" behavior as now.
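
A minimal sketch of that approach, assuming Python 3 semantics (separate
bytes and str types). The helper name replace_ascii_only and the sample
data are hypothetical, for illustration only. It relies on the fact that
iso-8859-1 maps each of the 256 byte values to exactly one code point, so
decoding never fails and re-encoding gives back the original bytes:

```python
def replace_ascii_only(raw: bytes, old: str, new: str) -> bytes:
    """Edit text of unknown encoding, touching only ASCII substrings.

    Hypothetical helper: assumes the real encoding is an ASCII superset
    and that `old` and `new` contain only ASCII characters.
    """
    # Every byte 0x00-0xFF decodes to the code point with the same value,
    # so this step cannot raise, whatever the real encoding is.
    text = raw.decode("iso-8859-1")
    text = text.replace(old, new)
    # Encoding reverses the mapping, so untouched bytes come back unchanged.
    return text.encode("iso-8859-1")


# Sample input that is really UTF-8; the code never needs to know that.
utf8_input = "name = caf\u00e9\n".encode("utf-8")
output = replace_ascii_only(utf8_input, "name", "title")

# All 256 byte values survive a decode/encode round trip.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("iso-8859-1").encode("iso-8859-1")
```

The non-ASCII bytes pass through untouched, but the intermediate string is
only meaningful in its ASCII parts. The trick does not help with encodings
that are not ASCII supersets (UTF-16, for example), and a `new` value with
characters above U+00FF would make the final encode step raise an error.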

James

