[RELEASED] Python 3.1 final

Nobody nobody at nowhere.com
Mon Jun 29 07:02:20 EDT 2009


On Sun, 28 Jun 2009 14:36:37 +0200, Martin v. Löwis wrote:

>> That's a significant improvement. It still decodes os.environ and sys.argv
>> before you have a chance to call sys.setfilesystemencoding(), but it
>> appears to be recoverable (with some effort; I can't find any way to re-do
>> the encoding without manually replacing the surrogates).
> 
> See PEP 383.

Okay, that's useful, except that it may have some bugs:

>>> r = "\udce4\udceb\udcef\udcf6\udcfc".encode("iso-8859-1", "surrogateescape")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/bytesobject.c:3182: bad argument to internal function

Trying a few random test cases suggests that the ratio of valid to invalid
bytes has an effect: strings which consist mostly of invalid bytes trigger
the error, while those which are mostly valid don't.
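
For reference, this is the round trip I'd expect to work once the bug is
fixed. It's only a minimal sketch, assuming an interpreter where the encode
step doesn't raise; the sample bytes are the ones from the failing call
above, and the variable names are purely illustrative:

```python
# Bytes that are not valid ASCII; on input each becomes a lone surrogate.
raw = b"\xe4\xeb\xef\xf6\xfc"

# Decoding with "surrogateescape" maps each undecodable byte 0xNN to U+DCNN.
decoded = raw.decode("ascii", "surrogateescape")

# Encoding with the same handler turns the surrogates back into the bytes.
recovered = decoded.encode("ascii", "surrogateescape")

# With the original bytes in hand, decode again using the intended encoding.
text = recovered.decode("iso-8859-1")
```

The point is that no manual surrogate replacement should be needed: the
same error handler is used in both directions.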

The error corresponds to _PyBytes_Resize(), which has the following
words of caution in a preceding comment:

/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

Assuming that this gets fixed, it should make most of the problems with
3.0 solvable. OTOH, it wouldn't have killed them to add, e.g.,
sys.argv_bytes and os.environ_bytes.
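
In the meantime, something like the following could stand in for them. This
is a rough sketch only: to_bytes(), argv_bytes() and environ_bytes() are
names I've made up, not anything in the standard library, and it assumes
that the interpreter decoded both sys.argv and os.environ using the
filesystem encoding with the surrogateescape handler:

```python
import os
import sys


def to_bytes(s, encoding=None):
    """Undo a surrogateescape decode (hypothetical helper)."""
    enc = encoding or sys.getfilesystemencoding()
    return s.encode(enc, "surrogateescape")


def argv_bytes():
    """Return the command-line arguments as bytes (hypothetical helper)."""
    return [to_bytes(arg) for arg in sys.argv]


def environ_bytes():
    """Return a bytes-keyed snapshot of the environment (hypothetical helper)."""
    return {to_bytes(k): to_bytes(v) for k, v in os.environ.items()}
```

Note that environ_bytes() returns a copy, so changes to it don't propagate
back to the real environment.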

>> However, sys.std{in,out,err} are still created as text streams, and AFAICT
>> there's nothing you can do about this from within your code.
> 
> That's intentional, and not going to change. You can access the
> underlying byte streams if you want to, as you could already in 3.0.

Okay, I've since been pointed to the relevant information (I was looking
under "File Objects"; I didn't think to look at "sys").
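
For anyone else who looked in the wrong place: the byte stream sits behind
the "buffer" attribute of the text stream. A minimal sketch, where
write_raw() is a made-up helper and the in-memory "demo" stream merely
stands in for sys.stdout so that the result can be inspected:

```python
import io
import sys


def write_raw(data, stream=None):
    """Write bytes to the binary layer beneath a text stream (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    stream.flush()             # push out any text already buffered, to keep ordering
    stream.buffer.write(data)  # bypass the text layer and its encoding
    stream.buffer.flush()


# Stand-in for sys.stdout: a text wrapper around an in-memory byte buffer.
demo = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
demo.write("name: ")
write_raw(b"\xe4\xeb\xef\n", demo)
```

This assumes the stream actually has a "buffer" attribute, which won't be
the case if sys.stdout has been replaced by something like io.StringIO.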



