[Python-Dev] str object going in Py3K
fuzzyman at voidspace.org.uk
Wed Feb 15 13:19:02 CET 2006
Adam Olsen wrote:
> On 2/14/06, Just van Rossum <just at letterror.com> wrote:
>> +1 for two functions.
>> My choice would be open() for binary and opentext() for text. I don't
>> find that backwards at all: the text function is going to be more
>> different from the current open() function than the binary function
>> would be, since in many ways the str type is closer to bytes than to
>> unicode.
>> Maybe it's even better to use opentext() AND openbinary(), and deprecate
>> plain open(). We could even introduce them at the same time as bytes()
>> (and leave the open() deprecation for 3.0).
> Thus providing us with a transition period, even with warnings on use
> of the old function.
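The opentext()/openbinary() split proposed above can be sketched on top of modern Python's built-in open(). Both names come from the proposal and are hypothetical, not real functions. Making `encoding` a required keyword for opentext() is an assumption, chosen to match the scenario in which the programmer is forced to name the encoding.

```python
import os
import tempfile


def opentext(path, mode="r", *, encoding, **kwargs):
    """Hypothetical text opener: yields str, and the caller must name an encoding."""
    if "b" in mode:
        raise ValueError("opentext() does not accept binary modes")
    return open(path, mode, encoding=encoding, **kwargs)


def openbinary(path, mode="rb", **kwargs):
    """Hypothetical binary opener: always yields bytes and never decodes."""
    if "t" in mode:
        raise ValueError("openbinary() does not accept text modes")
    if "b" not in mode:
        mode += "b"  # e.g. "w" -> "wb", "r+" -> "r+b"
    return open(path, mode, **kwargs)


# Usage: write a Latin-1 file as text, then read it back both ways.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with opentext(path, "w", encoding="latin-1") as f:
        f.write("caf\u00e9")
    with openbinary(path) as f:
        raw = f.read()   # undecoded bytes
    with opentext(path, encoding="latin-1") as f:
        text = f.read()  # decoded str
```

Keeping the two functions separate means the return type (str or bytes) is fixed by which function you call, rather than by a mode flag.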
I personally like the move towards all-unicode strings: basically, any
text whose encoding you don't know is 'random binary data'.
This works fine, so long as you are in control of the text source.
*However*, it leaves the following problem:
The current situation (treating byte sequences as text and assuming they
are text strings in some ASCII-superset encoding) *works* (albeit with many
breakages), simply because this assumption is usually correct.
Forcing the programmer to be aware of encodings also pushes the same
requirement onto the user (who is often the source of the text in question).
Currently you can read a text file and process it, making sure that any
changes/requirements only use ASCII characters. It therefore doesn't
matter which 8-bit ASCII-superset encoding the original uses. If
you force the programmer to specify the encoding in order to read the
file, they would have to pass that requirement on to their user. Their
user is even less likely to be encoding-aware than the programmer.
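The claim that ASCII-only edits work regardless of the ASCII-superset encoding can be checked with a small sketch. The sample text, the three encodings, and the `process()` helper are all hypothetical, chosen for illustration. UTF-16 is included as a contrast, because it is not an ASCII superset.

```python
# Hypothetical sample text; three users save it with three different
# ASCII-superset encodings (an assumption for illustration).
ORIGINAL = "Name: Ren\u00e9e\nCity: K\u00f6ln\n"
ASCII_SUPERSETS = ["latin-1", "cp1252", "utf-8"]


def process(raw: bytes) -> bytes:
    """Hypothetical processing step whose edits use only ASCII bytes."""
    return raw.replace(b"City:", b"Town:")


# The program never learns which encoding was used: it edits the bytes
# blindly, and we decode afterwards only to check the result.
results = {}
for enc in ASCII_SUPERSETS:
    edited = process(ORIGINAL.encode(enc))
    results[enc] = edited.decode(enc)

# UTF-16 is not an ASCII superset, so the ASCII pattern b"City:" never
# matches its bytes and the edit silently does nothing.
utf16_result = process(ORIGINAL.encode("utf-16")).decode("utf-16")
```

In each ASCII-superset case the non-ASCII bytes pass through untouched. For UTF-8 this holds because every byte of a multibyte sequence is 0x80 or above, so it can never be mistaken for an ASCII character. That is why the "usually correct" assumption holds until a file in a non-superset encoding such as UTF-16 turns up.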
What this means is that for simple programs, where the programmer
doesn't want to worry about encoding or can't force the user to
be aware of it, they will read in the file as bytes. Modules will quickly and
inevitably be created implementing all the 'string methods' for bytes.
New programmers will gravitate to these and the old mess will continue,
but as a more awkward hybrid than before. (String manipulation of
byte sequences will no longer be a core part of the language, and so will be
harder to use.)
Not sure what we can do to obviate this, of course... but is this change
actually going to improve the situation or make it worse?
All the best,