[Python-Dev] str object going in Py3K
James Y Knight
foom at fuhm.net
Wed Feb 15 17:48:18 CET 2006
On Feb 15, 2006, at 7:19 AM, Fuzzyman wrote:
> [snip..]
>
> I personally like the move towards all unicode strings, basically
> any text where you don't know the encoding used is 'random binary
> data'. This works fine, so long as you are in control of the text
> source. *However*, it leaves the following problem:
>
> The current situation (treating byte-sequences as text and assuming
> they are an ascii-superset encoded text-string) *works* (albeit
> with many breakages), simply because this assumption is usually
> correct.
>
> Forcing the programmer to be aware of encodings also pushes the
> same requirement onto the user (who is often the source of the text
> in question).
>
> Currently you can read a text file and process it - making sure
> that any changes/requirements only use ascii characters. It
> therefore doesn't matter what 8 bit ascii-superset encoding is used
> in the original. If you force the programmer to specify the
> encoding in order to read the file, they would have to pass that
> requirement onto their user. Their user is even less likely to be
> encoding aware than the programmer.
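Or the programmer can just use "iso-8859-1" and call it done. Since
iso-8859-1 maps every one of the 256 byte values to a character,
decoding never fails and encoding back gives you the original bytes.
That will get you the same "I don't care" behavior as now.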
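A minimal sketch of the iso-8859-1 approach, assuming Python 3; the
function name ascii_only_edit and the "colour" -> "color" edit are
hypothetical illustrations, not from this thread. It decodes unknown bytes
as iso-8859-1, makes an ASCII-only change, and re-encodes with the same
codec. The non-ASCII bytes pass through unchanged whether the input was
really UTF-8 or cp1252:

```python
# Sketch only (Python 3 assumed). ascii_only_edit and the sample strings
# are hypothetical examples.

def ascii_only_edit(raw: bytes) -> bytes:
    # iso-8859-1 maps byte N to code point N for all 256 values,
    # so this decode can never raise UnicodeDecodeError.
    text = raw.decode("iso-8859-1")
    # The search and replacement strings are pure ASCII, so bytes that
    # belong to non-ASCII characters in the real encoding are untouched.
    text = text.replace("colour", "color")
    # Encoding with the same codec restores the original byte values.
    return text.encode("iso-8859-1")

# The same edit works without knowing whether the source was UTF-8 or cp1252.
utf8_src = "Café colour".encode("utf-8")
cp1252_src = "Café colour".encode("cp1252")
assert ascii_only_edit(utf8_src).decode("utf-8") == "Café color"
assert ascii_only_edit(cp1252_src).decode("cp1252") == "Café color"

# Every possible byte value survives the decode/encode round trip.
all_bytes = bytes(range(256))
assert all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes
```

For files, open(path, encoding="iso-8859-1") behaves the same way. The trick
is safe for 8-bit ASCII supersets and for UTF-8. It is not safe for
encodings such as Shift_JIS, where bytes in the ASCII range can occur inside
a multibyte character.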
James