<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Adam Olsen wrote:
<blockquote
cite="midaac2c7cb0602142014t553463c5g3f98922700aa734b@mail.gmail.com"
type="cite">
<pre wrap="">On 2/14/06, Just van Rossum <a class="moz-txt-link-rfc2396E" href="mailto:just@letterror.com">&lt;just@letterror.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">+1 for two functions.
My choice would be open() for binary and opentext() for text. I don't
find that backwards at all: the text function is going to be more
different from the current open() function than the binary function
would be since in many ways the str type is closer to bytes than to
unicode.
Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).
</pre>
</blockquote>
<pre wrap=""><!---->
Thus providing us with a transition period, even with warnings on use
of the old function.
</pre>
</blockquote>
[snip..]<br>
<br>
I personally like the move towards all-unicode strings: basically, any
text whose encoding you don't know is 'random binary data'.
This works fine, so long as you are in control of the text source.
*However*, it leaves the following problem:<br>
<br>
The current situation (treating byte-sequences as text and assuming
they are an ascii-superset encoded text-string) *works* (albeit with
many breakages), simply because this assumption is usually correct.<br>
<br>
Forcing the programmer to be aware of encodings also pushes the same
requirement onto the user (who is often the source of the text in
question).<br>
<br>
Currently you can read a text file and process it, provided that any
changes you make use only ASCII characters. It then doesn't matter
which 8-bit ASCII-superset encoding is used in the
original. If you force the programmer to specify the encoding in order
to read the file, they would have to pass that requirement onto their
user. Their user is even less likely to be encoding aware than the
programmer.<br>
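<br>
A minimal sketch of that point, assuming Python 3 bytes objects (the
helper name replace_ascii and the sample text are hypothetical, not
from this thread): an ASCII-only edit gives the same result whichever
ASCII-superset encoding the file happens to use.<br>
<pre wrap="">
```python
# Sketch only: ASCII-only edits on raw bytes, without knowing the encoding.
# replace_ascii and the sample text are hypothetical illustrations.

def replace_ascii(data, old, new):
    """Replace an ASCII-only substring in data, leaving other bytes untouched."""
    # encode("ascii") raises UnicodeEncodeError if old or new is not pure ASCII
    return data.replace(old.encode("ascii"), new.encode("ascii"))

# The same text stored in two different 8-bit ASCII-superset encodings.
text = "name = caf\u00e9"
results = {}
for encoding in ("latin-1", "cp437"):
    raw = text.encode(encoding)          # stands in for bytes read from a file
    edited = replace_ascii(raw, "name", "label")
    # Decoding here is only to check the result; the edit itself never decoded.
    results[encoding] = edited.decode(encoding)
```
</pre>
The edit never decodes the data, so the non-ASCII bytes pass through
unchanged in both cases.<br>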
<br>
What this means is that for simple programs where the programmer
doesn't want to have to worry about encoding, or can't force the user
to be aware, they will read in the file as bytes. Modules will quickly
and inevitably be created implementing all the 'string methods' for
bytes. New programmers will gravitate to these and the old mess will
continue, but with a more awkward hybrid than before. (String
manipulations of byte sequences will no longer be a core part of the
language - and so be harder to use.)<br>
<br>
Not sure what we can do to obviate this of course... but is this change
actually going to improve the situation or make it worse?<br>
<br>
All the best,<br>
<br>
Michael Foord<br>
</body>
</html>