From greg at  Fri May  1 00:44:12 2009
From: greg at (Gregory P. Smith)
Date: Thu, 30 Apr 2009 15:44:12 -0700
Subject: [Python-Dev] Proposed: a new function-based C API for declaring
	Python types
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Apr 28, 2009 at 8:03 PM, Larry Hastings <larry at> wrote:

> I've written a patch against py3k trunk creating a new function-based
> API for creating  extension types in C.  This allows PyTypeObject to
> become a (mostly) private structure.
> Here's how you create an extension type using the current API.
>  * First, find some code that already has a working type declaration.
>   Copy and paste their fifty-line PyTypeObject declaration, then
>   hack it up until it looks like what you need.
>  * Next--hey!  There *is* no next, you're done.  You can immediately
>   create an object using your type and pass it into the Python
>   interpreter and it would work fine.  You are encouraged to call
>   PyType_Ready(), but this isn't required and it's often skipped.
> This approach causes two problems.
>  1) The Python interpreter *must support* and *cannot change*
>    the PyTypeObject structure, forever.  Any meaningful change to
>    the structure will break every extension.   This has many
>    consequences:
>      a) Fields that are no longer used must be left in place,
>         forever, as ignored placeholders if need be.  Py3k cleaned
>         up a lot of these, but it's already picked up a new one
>         ("tp_compare" is now "tp_reserved").
>      b) Internal implementation details of the type system must
>         be public.
>      c) The interpreter can't even use a different structure
>         internally, because extensions are free to pass in objects
>         using PyTypeObjects the interpreter has never seen before.
>  2) As a programming interface this lacks a certain gentility.  It
>    clearly *works*, but it requires programmers to copy and paste
>    with a large structure mostly containing NULLs, which they must
>    pick carefully through to change just a few fields.
> My patch creates a new function-based extension type definition API.
> You create a type by calling PyType_New(), then call various accessor
> functions on the type (PyType_SetString and the like), and when your
> type has been completely populated you must call PyType_Activate()
> to enable it for use.
> With this API available, extension authors no longer need to directly
> see the innards of the PyTypeObject structure.  Well, most of the
> fields anyway.  There are a few shortcut macros in CPython that need
> to continue working for performance reasons, so the "tp_flags" and
> "tp_dealloc" fields need to remain publicly visible.
> One feature worth mentioning is that the API is type-safe.  Many such
> APIs would have had one generic "PyType_SetPointer", taking an
> identifier for the field and a void * for its value, but this would
> have lost type safety.  Another approach would have been to have one
> accessor per field ("PyType_SetAddFunction"), but this would have
> exploded the number of functions in the API.  My API splits the
> difference: each distinct *type* has its own set of accessors
> ("PyType_GetSSizeT") which takes an identifier specifying which
> field you wish to get or set.
> The major change resulting from this API: all PyTypeObjects must now
> be *pointers* rather than static instances.  For example, the external
> declaration of PyType_Type itself changes from this:
>   PyAPI_DATA(PyTypeObject) PyType_Type;
> to this:
>   PyAPI_DATA(PyTypeObject *) PyType_Type;
> This gives rise to the first headache caused by the API: type casts
> on type objects.  It took me a day and a half to realize that this,
> from Modules/_weakref.c:
>       PyModule_AddObject(m, "ref",
>                          (PyObject *) &_PyWeakref_RefType);
> really needed to be this:
>       PyModule_AddObject(m, "ref",
>                          (PyObject *) _PyWeakref_RefType);
> Hopefully I've already found most of these in CPython itself, but
> this sort of code surely lurks in extensions yet to be touched.
> (Pro-tip: if you're working with this patch, and you see a crash,
> and gdb shows you something like this at the top of the stack:
>   #0  0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
>                  at Modules/gcmodule.c:323
>   323             if (PyObject_IS_GC(op)) {
> your problem is an errant &, likely on a type object you're passing
> in to the interpreter.  Think--what did you touch recently?  Or debug
> it by salting your code with calls to collect(NUM_GENERATIONS-1).)
> Another irksome side-effect of the API: because of "tp_flags" and
> "tp_dealloc", I now have two declarations of PyTypeObject.  There's
> the externally-visible one in Include/object.h, which lets external
> parties see "tp_dealloc" and "tp_flags".  Then there's the internal
> one in Objects/typeprivate.h which is the real structure.  Since
> declaring a type twice is a no-no, the external one is gated on
>   #ifndef PY_TYPEPRIVATE
> If you're a normal Python extension programmer, you'd include Python.h
> as normal:
>   #include "Python.h"
> Python implementation files that need to see the real PyTypeObject
> structure now look like this:
>   #define PY_TYPEPRIVATE
>   #include "Python.h"
>   #include "../Objects/typeprivate.h"
> Also, since the structure of PyTypeObject hasn't yet changed, there
> are a bunch of fields in PyTypeObject that are externally visible that
> I don't want to be visible.  To ensure no one was using them, I renamed
> them to "mysterious_object_0" and "mysterious_object_1" and the like.
> Before this patch gets accepted, I want to reorder the fields in
> PyTypeObject (which we can! because it's private!) so that these public
> fields are at the top of both the external and internal structures.
> Python internally declares a great many types, and I haven't attempted
> to convert them all.  Instead there's a conversion header file that
> does most of the work for you.  Here's how one would apply it to an
> existing type.
> 1. Where your file currently has this:
>   #include "Python.h"
>  change it to this:
>   #define PY_TYPEPRIVATE
>   #include "Python.h"
>   #include "pytypeconvert.h"
> 2. Whenever you declare a type, change it from this:
>   static PyTypeObject YourExtension_Type = {
>  to this:
>   static PyTypeObject *YourExtension_Type;
>   static PyTypeObject _YourExtension_Type = {
>  Use NULL for your metaclass.  For example, change this:
>   PyObject_HEAD_INIT(&PyType_Type),
>  to this:
>   PyObject_HEAD_INIT(NULL),
>  Also use NULL for your baseclass.  For example, change this:
>   &PyDict_Type, /* tp_base */
>  to this:
>   NULL, /* tp_base */
> 3. In your module's init function, add this:
>   CONVERT_TYPE(YourExtension_Type,
>       metaclass, baseclass, "description of type");
>  "metaclass" and "baseclass" should be the metaclass and baseclass
>  for your type, the ones you just set to NULL in step 2.  If you
>  had NULL for the baseclass before, use NULL here too.
> 4. If you have any static object declarations, set their ob_type to
>  NULL in the static declaration, then set it explicitly in your
>  init function.  If your object uses a locally-defined type,
>  be sure to do this *after* the CONVERT_TYPE line for that type.
>  (See _Py_EllipsisObject for an example.)
> 5. Anywhere you're using existing Python type declarations
>  you must remove the & from the front.
> The conversion header file *also* redefines PyTypeObject.  But this
> time it redefines it to the existing definition, and that definition
> will stay the same forever.  That's the whole point: if you have an
> existing Python 3.0 extension, it won't have to change if we change
> the internal definition of PyTypeObject.
> (Why bother with this conversion process, with few py3k extensions
> in the wild?  This patch was started quite a while ago, when it
> seemed plausible the API would get backported to 2.x.  Now I'm not
> so sure that will happen.)
> I've uploaded a patch to the tracker:
> It applies cleanly to py3k/trunk (r72081).  But the code is awfully
> grubby.
> * I haven't dealt with any types I can't build, and I can't build
>  a lot of the extensions.  I'm using Linux, and I don't have the
>  dev headers for many libraries on my laptop, and I haven't touched
>  Windows or Mac stuff.
> * I created some new build warnings which should obviously be fixed.
> * With the patch installed, py3k trunk builds and installs.  It does
>  *not* pass the regression test suite.  (It crashes.)  I don't think
>  this'll be too bad, it's just taken me this long to get it as far
>  as I have.
> * There are some internal scaffolds and hacks that should be purged
>  by the final patch.
> * There's no documentation.  If you'd like to see how you'd use the
>  new API, currently the best way to learn is to read
>  Include/pytypeconvert.h.
> * I don't like the PY_TYPEPRIVATE hack.  I only used it 'cause it
>  sucks less than the other approaches I've thought of.  I welcome
>  your suggestions.
>  The second-best approach I've come up with: make PyTypeObject
>  genuinely private, and declare a different structure containing just
>  the head of PyTypeObject.   Let's call it PyTypeObjectHead.  Then,
>  for the convenience macros that use "dealloc" and "flags", cast the
>  object to PyTypeObjectHead before dereferencing.  This abandons type
>  safety, and given my longing for type safety while developing this
>  patch I'd prefer to not make loss of type safety an official API.
> My understanding is that the feature-freeze for Python 3.1 is in a
> little over a week.  Given the current stability level and untestedness
> of the patch, and the lateness of the hour... is there any chance this
> would be accepted into Python 3.1?  If so, I'll need to act fast.  If
> not, I might as well relax, huh.
> My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
> for some recent encouragement.  (And another person who I discussed it
> with so long ago I forgot who it was... maybe Fredrik Lundh?)
> /larry/


I haven't looked at your code so I can't comment on the API itself... But
awesome.  I like the general idea.  Exposing structures has hampered us for
quite a while with forwards API compatibility.

I predict not enough people are available to drive this to adoption and use
for Python 3.1 given the time frame (the beta feature freeze happens this
Saturday I believe?) but we should make this happen for 3.2 and get it
stable and into trunk soon after the release-31maint branch is created.

What's needed?  Perhaps a PEP describing a lot of what you started to write
up in this email: the new extension module API with sections on the upgrade
path and backwards compatibility story.

Extension modules are often maintained such that they work on all versions
of Python from 2.3 or 2.4 on up to 3.x.  We should provide a decent way to
do that.  Could some of these API functions be provided as a rarely changing
add on .c/.h file for extension module authors to bundle as part of their
extension modules for use with older versions of python to avoid big #ifdefs
around structure definitions vs initialization API calls?


From skippy.hammond at  Fri May  1 02:20:52 2009
From: skippy.hammond at (Mark Hammond)
Date: Fri, 01 May 2009 10:20:52 +1000
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Larry Hastings wrote:
> Counting the votes for :
>    +1 from Mark Hammond (via private mail)
>    +1 from Paul Moore (via the tracker)
>    +1 from Tim Golden (in Python-ideas, though what he literally said
>    was "I'm up for it")
>    +1 from Michael Foord
>    +1 from Eric Smith
> There have been no other votes.
> Is that enough consensus for it to go in?  If so, are there any core 
> developers who could help me get it in before the 3.1 feature freeze?  
> The patch should be in good shape; it has unit tests and updated 
> documentation.

I've taken the liberty of explicitly CCing Martin just in case he missed 
the thread with all the noise regarding PEP383.

If there are no objections from Martin or anyone else here, please feel 
free to assign it to me (and mail if I haven't taken action by the day 
before the beta freeze...)



From steve at  Fri May  1 04:40:14 2009
From: steve at (Steven D'Aprano)
Date: Fri, 1 May 2009 12:40:14 +1000
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote:

> You can get the same error on Linux:
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
> >>> f=open(chr(255),'w')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'

Works for me under Fedora using ext3 as the file system.

$ python2.6
Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13)
[GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open(chr(255),'w')
>>> f.close()
>>> import os
>>> os.remove(chr(255))

Given that chr(255) is a valid filename on my file system, I would 
consider it a bug if Python couldn't deal with a file with that name.

Steven D'Aprano

From ronaldoussoren at  Fri May  1 07:41:16 2009
From: ronaldoussoren at (Ronald Oussoren)
Date: Fri, 01 May 2009 07:41:16 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
In-Reply-To: <>
References: <>
Message-ID: <>

On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote:

>>>>>> Ronald Oussoren <ronaldoussoren at> (RO) wrote:
>> RO> For what it's worth, the OSX API's seem to behave as follows:
>> RO> * If you create a file with a non-UTF8 name on a HFS+ filesystem the
>> RO> system automatically encodes the name.
>> RO> That is,  open(chr(255), 'w') will silently create a file named '%FF'
>> RO> instead of the name you'd expect on a unix system.
> Not for me (I am using Python 2.6.2).
>>>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'

That's odd. Which version of OSX do you use?

ronald at Rivendell-2[0]$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.5.6
BuildVersion:	9G55

ronald at Rivendell-2[0]$ /usr/bin/python
Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.listdir('.')
 >>> open(chr(255), 'w').write('x')
 >>> os.listdir('.')

And likewise with python 2.6.1+ (after cleaning the directory):

ronald at Rivendell-2[0]$ python2.6
Python 2.6.1+ (release26-maint:70603, Mar 26 2009, 08:38:03)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.listdir('.')
 >>> open(chr(255), 'w').write('x')
 >>> os.listdir('.')

> I once got a tar file from a Linux system which contained a file  
> with a
> non-ASCII, ISO-8859-1 encoded filename. The tar file refused to be
> unpacked on a HFS+ filesystem.
> -- 
> Piet van Oostrum <piet at>
> URL: [PGP 8DAE142BE17999C4]
> Private email: piet at


From zookog at  Fri May  1 07:44:36 2009
From: zookog at (Zooko O'Whielacronx)
Date: Thu, 30 Apr 2009 23:44:36 -0600
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>


My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
binary names from the filesystem and store them so that I can regenerate
the same byte string later, but it also requires that I *know* whether
what I got was a valid string in the expected encoding (which might be
utf-8) or whether it was not and I need to fall back to storing the
bytes.  So far, it looks like PEP 383 doesn't provide both of these
requirements, so I am going to have to continue working-around the
Python API even after PEP 383.  In fact, it might actually increase the
amount of working-around that I have to do.

If I understand correctly, .decode(encoding, 'strict') will not be
changed by PEP 383.  A new error handler is added, so .decode('utf-8',
'python-escape') performs the utf-8b decoding.  Am I right so far?
Therefore if I have a string of bytes, I can attempt to decode it with
'strict', and if that fails I can set the flag showing that it was not a
valid byte string in the expected encoding, and then I can invoke
.decode('utf-8', 'python-escape') on it.  So far, so good.

(Note that I never want to do .decode(expected_encoding,
'python-escape') -- if it wasn't a valid bytestring in the
expected_encoding, then I want to decode it with utf-8b, regardless of
what the expected encoding was.)

Anyway, I can use it like this:

import sys

class FName:
    def __init__(self, name, failed_decode=False):
        self.name = name
        self.failed_decode = failed_decode

def fs_to_unicode(bytes):
    try:
        return FName(bytes.decode(sys.getfilesystemencoding(), 'strict'))
    except UnicodeDecodeError:
        return FName(bytes.decode('utf-8', 'python-escape'), failed_decode=True)
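
For readers on a released Python 3, here is a minimal runnable sketch of the same strict-then-escape idea. It assumes the handler name 'surrogateescape', which is what the handler was called when PEP 383 shipped in Python 3.1, rather than the draft name 'python-escape' used in this thread. The "encoding" parameter is a hypothetical addition so the sketch can be exercised independently of the local filesystem encoding.

```python
import sys


class FName:
    """A decoded filename plus a flag recording whether strict decoding failed."""

    def __init__(self, name, failed_decode=False):
        self.name = name
        self.failed_decode = failed_decode


def fs_to_unicode(raw, encoding=None):
    # 'encoding' is a hypothetical parameter, not part of the original code;
    # it defaults to the filesystem encoding as in the original function.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        # First attempt: the name must be valid in the expected encoding.
        return FName(raw.decode(encoding, 'strict'))
    except UnicodeDecodeError:
        # Fallback: decode as utf-8, smuggling undecodable bytes through
        # as lone surrogates, and remember that strict decoding failed.
        return FName(raw.decode('utf-8', 'surrogateescape'), failed_decode=True)
```

Calling fs_to_unicode(b'\xff', 'utf-8') should give an FName with failed_decode set, while a valid utf-8 name comes back with the flag left False.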

And what about unicode-oriented APIs such as os.listdir()?  Uh-oh, the
PEP says that on systems with locale 'utf-8', it will automatically be
changed to 'utf-8b'.  This means I can't reliably find out whether the
entries in the directory *were* named with valid encodings in utf-8?
That's not acceptable for my use case.  I would have to refrain from
using the unicode-oriented os.listdir() on POSIX, and instead do
something like this:

if platform.system() in ('Windows', 'Darwin'):
    def listdir(d):
        return [FName(n) for n in os.listdir(d)]
elif platform.system() in ('Linux', 'SunOS'):
    def listdir(d):
        bytesd = d.encode(sys.getfilesystemencoding())
        return [fs_to_unicode(n) for n in os.listdir(bytesd)]
else:
    raise NotImplementedError("Please classify platform.system() == %s \
as either unicode-safe or unicode-unsafe." % platform.system())

In fact, if 'utf-8' gets automatically converted to 'utf-8b' when
*decoding* as well as encoding, then I would have to change my
fs_to_unicode() function to check for that and make sure to use strict
utf-8 in the first attempt:

def fs_to_unicode(bytes):
    fse = sys.getfilesystemencoding()
    if fse == 'utf-8b':
        fse = 'utf-8'
    try:
        return FName(bytes.decode(fse, 'strict'))
    except UnicodeDecodeError:
        return FName(bytes.decode('utf-8', 'python-escape'),
                     failed_decode=True)
Would it be possible for Python unicode objects to have a flag
indicating whether the 'python-escape' error handler was present?  That
would serve the same purpose as my "failed_decode" flag above, and would
basically allow me to use the Python APIs directly and make all this
work-around code disappear.

Failing that, I can't see any way to use the os.listdir() in its
unicode-oriented mode to satisfy Tahoe's requirements.

If you take the above code and then add the fact that you want to use
the failed_decode flag when *encoding* the d argument to os.listdir(),
then you get this code: [2].

Oh, I just realized that I *could* use the PEP 383 os.listdir(), like
this:

def listdir(d):
    fse = sys.getfilesystemencoding()
    if fse == 'utf-8b':
        fse = 'utf-8'
    ns = []
    for fn in os.listdir(d):
        bytes = fn.encode(fse, 'python-escape')
        try:
            ns.append(FName(bytes.decode(fse, 'strict')))
        except UnicodeDecodeError:
            ns.append(FName(bytes.decode('utf-8', 'python-escape'),
                      failed_decode=True))
    return ns

(And I guess I could define listdir() like this only on the
non-unicode-safe platforms, as above.)

However, that strikes me as even more horrible than the previous
"listdir()" work-around, in part because it means decoding, re-encoding,
and re-decoding every name, so I think I would stick with the previous
version.

Oh, one more note: for Tahoe's purposes you can, in all of the code
above, replace ".decode('utf-8', 'python-replace')" with
".decode('windows-1252')" and it works just as well.  While UTF-8b seems
like a really cool hack, and it would produce more legible results if
utf-8-encoded strings were partially corrupted, I guess I should just
use 'windows-1252' which is already implemented in Python 2 (as well as
in all other software in the world).

I guess this means that PEP 383, which I have approved of and liked so
far in this discussion, would actually not help Tahoe at all and would
in fact harm Tahoe -- I would have to remember to detect and work-around
the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python
3.

If anyone else has a concrete, real use case which would be helped by
PEP 383, I would like to hear about it.  Perhaps Tahoe can learn
something from it.

Oh, if this PEP could be extended to add a flag to each unicode object
indicating whether it was created with the python-escape handler or not,
then it would be useful to me.




From martin at  Fri May  1 08:25:34 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 08:25:34 +0200
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <>
References: <>
	<>	<>
	<> <>
Message-ID: <>

> I've taken the liberty of explicitly CCing Martin just in case he missed
> the thread with all the noise regarding PEP383.
> If there are no objections from Martin

It's fine with me - I just won't have time to look into the details of
that change.


From fuzzyman at  Fri May  1 11:06:08 2009
From: fuzzyman at (Michael Foord)
Date: Fri, 01 May 2009 10:06:08 +0100
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <>
	<>	<>	<>	<>
Message-ID: <>

Zooko O'Whielacronx wrote:
> [snip...]
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present?  That
> would serve the same purpose as my "failed_decode" flag above, and would
> basically allow me to use the Python APIs directly and make all this
> work-around code disappear.
> Failing that, I can't see any way to use the os.listdir() in its
> unicode-oriented mode to satisfy Tahoe's requirements.
> If you take the above code and then add the fact that you want to use
> the failed_decode flag when *encoding* the d argument to os.listdir(),
> then you get this code: [2].
> Oh, I just realized that I *could* use the PEP 383 os.listdir(), like
> this:
> def listdir(d):
>     fse = sys.getfilesystemencoding()
>     if fse == 'utf-8b':
>         fse = 'utf-8'
>     ns = []
>     for fn in os.listdir(d):
>         bytes = fn.encode(fse, 'python-escape')
>         try:
>             ns.append(FName(bytes.decode(fse, 'strict')))
>         except UnicodeDecodeError:
>             ns.append(FName(bytes.decode('utf-8', 'python-escape'),
>                       failed_decode=True))
>     return ns
> (And I guess I could define listdir() like this only on the
> non-unicode-safe platforms, as above.)
> However, that strikes me as even more horrible than the previous
> "listdir()" work-around, in part because it means decoding, re-encoding,
> and re-decoding every name, so I think I would stick with the previous
> version.

The current unicode mode would skip the filenames you are interested in 
(those that fail to decode correctly) - so you would have been forced to 
use the bytes mode. If you need access to the original bytes then you 
should continue to do this. PEP-383 is entirely neutral for your use 
case as far as I can see.
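
A minimal sketch of the bytes mode referred to above, assuming Python 3, where passing a bytes path to os.listdir() returns the names as undecoded bytes. The helper name listdir_bytes is hypothetical, and the temporary directory exists only to make the example self-contained.

```python
import os
import tempfile


def listdir_bytes(directory):
    """List a directory in bytes mode so names come back undecoded."""
    # os.fsencode turns a str path into bytes using the filesystem encoding;
    # a bytes argument makes os.listdir return bytes names.
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a plain ASCII name, then list it in bytes mode.
    open(os.path.join(tmp, 'example.txt'), 'w').close()
    names = listdir_bytes(tmp)
```

Each entry in names is a bytes object, so the caller decides how (or whether) to decode it.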


> Oh, one more note: for Tahoe's purposes you can, in all of the code
> above, replace ".decode('utf-8', 'python-replace')" with
> ".decode('windows-1252')" and it works just as well.  While UTF-8b seems
> like a really cool hack, and it would produce more legible results if
> utf-8-encoded strings were partially corrupted, I guess I should just
> use 'windows-1252' which is already implemented in Python 2 (as well as
> in all other software in the world).
> I guess this means that PEP 383, which I have approved of and liked so
> far in this discussion, would actually not help Tahoe at all and would
> in fact harm Tahoe -- I would have to remember to detect and work-around
> the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python
> 3.
> If anyone else has a concrete, real use case which would be helped by
> PEP 383, I would like to hear about it.  Perhaps Tahoe can learn
> something from it.
> Oh, if this PEP could be extended to add a flag to each unicode object
> indicating whether it was created with the python-escape handler or not,
> then it would be useful to me.
> Regards,
> Zooko
> [1]
> [2]


From rdmurray at  Fri May  1 13:13:24 2009
From: rdmurray at (R. David Murray)
Date: Fri, 1 May 2009 07:13:24 -0400 (EDT)
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Thu, 30 Apr 2009 at 23:44, Zooko O'Whielacronx wrote:
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present?  That

Unless I'm misunderstanding something, couldn't you implement what you
need by looking in a given string for the half surrogates?  If you find
one, python-escape modified the string; if you don't, it didn't.

What does Tahoe do on Windows when it gets a filename that is not valid
Unicode?  You might not even have to conditionalize the above code
on platform (ie: instead you have a generalized is_valid_unicode test
function that you always use).
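
The surrogate check suggested above might look like the following sketch. It assumes Python 3 and the 'surrogateescape' handler (the name PEP 383 shipped with), which maps each undecodable byte 0x80..0xFF to a lone surrogate in the range U+DC80..U+DCFF. The function name was_escaped is hypothetical.

```python
def was_escaped(name):
    """Return True if name contains a lone surrogate in U+DC80..U+DCFF."""
    # These code points are where the escape handler puts undecodable bytes.
    return any('\udc80' <= ch <= '\udcff' for ch in name)


escaped = b'ab\xff'.decode('utf-8', 'surrogateescape')
clean = b'caf\xc3\xa9'.decode('utf-8', 'surrogateescape')
```

One caveat: a string can contain such surrogates for other reasons, so this detects the code points, not how the string was produced.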


From martin at  Fri May  1 17:16:16 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 17:16:16 +0200
Subject: [Python-Dev] Deferring PEP 382
Message-ID: <>

During Guido's review, we discovered that PEP 382 doesn't
deal with PEP 302 loaders; I believe that it should, though.

Rather than coming up with an ad-hoc design, I propose to
defer the PEP to Python 3.2 - unless somebody can propose
a straight-forward design with not too many new interfaces.

FWIW, my own approach would be to add two new interfaces to
1. extend the package path according to .pth files available
   to the loader (alternatively, provide the contents of the
   .pth files of the package in question)
2. search for and execute a package initialization module.


From stephen at  Fri May  1 17:36:39 2009
From: stephen at (Stephen J. Turnbull)
Date: Sat, 02 May 2009 00:36:39 +0900
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System
	Character	Interfaces
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

James Y Knight writes:

 > in python. It seems like the most common reason why people want to use  
 > SJIS is to make old pre-unicode apps work right in WINE -- in which  
 > case it doesn't actually affect unix python at all.

Mounting external drives, especially USB memory sticks which tend to
be FAT-initialized by the manufacturers, is another common case.

But I don't understand why PEP 383 needs to care at all.

From zookog at  Fri May  1 17:31:01 2009
From: zookog at (Zooko O'Whielacronx)
Date: Fri, 1 May 2009 09:31:01 -0600
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>

Following-up to my own post to correct a major error:

On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx <zookog at> wrote:
> Folks:
> My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
> binary names from the filesystem and store them so that I can regenerate
> the same byte string later, but it also requires that I *know* whether
> what I got was a valid string in the expected encoding (which might be
> utf-8) or whether it was not and I need to fall back to storing the
> bytes.

Okay, I am wrong about this.  Having a flag to remember whether I had to
fall back to the utf-8b trick is one method to implement my requirement,
but my actual requirement is this:

Requirement: either the unicode string or the bytes are faithfully
transmitted from one system to another.

That is: if you read a filename from the filesystem, and transmit that
filename to another system and use it, then there are two cases:

Requirement 1: the byte string was valid in the encoding of source
system, in which case the unicode name is faithfully transmitted
(i.e. the bytes that finally land on the target system are the result of

Requirement 2: the byte string was not valid in the encoding of source
system, in which case the bytes are faithfully transmitted (i.e. the
bytes that finally land on the target system are the same as the bytes
that originated in the source system).

Now I finally understand how fiendishly clever MvL's PEP 383
generalization of Markus Kuhn's utf-8b trick is!  The only thing
necessary to achieve both of those requirements above is that the
'python-escape' error handler is used on the target system .encode() as
well as on the source system .decode()!
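
A small sketch of the two requirements, assuming Python 3 and the handler name 'surrogateescape' that PEP 383 eventually shipped with (this thread uses the draft name 'python-escape'). The function name transmit and the sample values are hypothetical.

```python
def transmit(src_bytes, src_encoding, dst_encoding):
    """Decode on the source system, encode on the target, escaping both ways."""
    name = src_bytes.decode(src_encoding, 'surrogateescape')
    return name.encode(dst_encoding, 'surrogateescape')


# Requirement 1: the name is valid in the source encoding, so the text is
# preserved and re-encoded for the target system.
valid = '\xe9'.encode('latin-1')
out_valid = transmit(valid, 'latin-1', 'utf-8')

# Requirement 2: the name is not valid utf-8, so the undecodable byte is
# escaped on decode and restored on encode; the bytes arrive unchanged.
invalid = b'abc\xff'
out_invalid = transmit(invalid, 'utf-8', 'utf-8')
```

Here out_valid is the utf-8 encoding of the same character, and out_invalid equals the original bytes.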

Well, I'm going to have to let this sink in and maybe write some code to
see if I really understand it.

But if this is right, then I can do away with some of the mechanism that
I've built up, and instead:

Backport PEP 383 to Python 2.

And, document the PEP 383 trick in some generic, widely respected format
such as an Internet Draft so that I can explain to other users of the
Tahoe data (many of whom use other languages than Python) what they have
to do if they find invalid utf-8 in the data.  Oh good, I just realized
that Tahoe emits only utf-8, so all I have to do is point them to the
utf-8b documents (such as they are) and explain that to read filenames
produced by Tahoe they have to implement utf-8b.  That's really good
that they don't have to implement MvL's generalization of that trick to
other encodings, since utf-8b is already understood by some folks.

Okay, I find it surprisingly easy to make subtle errors in this encoding
stuff, so please let me know if you spot one.  Is it true that
srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
'python-escape') will always produce srcbytes?  That is my Requirement
2.



From google at  Fri May  1 17:33:47 2009
From: google at (MRAB)
Date: Fri, 01 May 2009 16:33:47 +0100
Subject: [Python-Dev] Oddity PEP 0 key
Message-ID: <>

I've just noticed an oddity in the key in PEP 0. Most letters are used
more than once. Wouldn't it be clearer if different letters were used
for "Accepted" and "Active" instead of them both being 'A', for example?

-> A - Accepted proposal
-> R - Rejected proposal
    W - Withdrawn proposal
-> D - Deferred proposal
    F - Final proposal
-> A - Active proposal
-> D - Draft proposal
-> R - Replaced proposal

From google at  Fri May  1 17:52:50 2009
From: google at (MRAB)
Date: Fri, 01 May 2009 16:52:50 +0100
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <>
	<>	<>	<>	<>	<>
Message-ID: <>

Zooko O'Whielacronx wrote:
> Following-up to my own post to correct a major error:
> On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx <zookog at> wrote:
>> Folks:
>> My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
>> binary names from the filesystem and store them so that I can regenerate
>> the same byte string later, but it also requires that I *know* whether
>> what I got was a valid string in the expected encoding (which might be
>> utf-8) or whether it was not and I need to fall back to storing the
>> bytes.
> Okay, I am wrong about this.  Having a flag to remember whether I had to
> fall back to the utf-8b trick is one method to implement my requirement,
> but my actual requirement is this:
> Requirement: either the unicode string or the bytes are faithfully
> transmitted from one system to another.
> That is: if you read a filename from the filesystem, and transmit that
> filename to another system and use it, then there are two cases:
> Requirement 1: the byte string was valid in the encoding of source
> system, in which case the unicode name is faithfully transmitted
> (i.e. the bytes that finally land on the target system are the result of
> sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding)).
> Requirement 2: the byte string was not valid in the encoding of source
> system, in which case the bytes are faithfully transmitted (i.e. the
> bytes that finally land on the target system are the same as the bytes
> that originated in the source system).
> Now I finally understand how fiendishly clever MvL's PEP 383
> generalization of Markus Kuhn's utf-8b trick is!  The only thing
> necessary to achieve both of those requirements above is that the
> 'python-escape' error handler is used on the target system .encode() as
> well as on the source system .decode()!
> Well, I'm going to have to let this sink in and maybe write some code to
> see if I really understand it.
> But if this is right, then I can do away with some of the mechanism that
> I've built up, and instead:
> Backport PEP 383 to Python 2.
> And, document the PEP 383 trick in some generic, widely respected format
> such as an Internet Draft so that I can explain to other users of the
> Tahoe data (many of whom use other languages than Python) what they have
> to do if they find invalid utf-8 in the data.  Oh good, I just realized
> that Tahoe emits only utf-8, so all I have to do is point them to the
> utf-8b documents (such as they are) and explain that to read filenames
> produced by Tahoe they have to implement utf-8b.  That's really good
> that they don't have to implement MvL's generalization of that trick to
> other encodings, since utf-8b is already understood by some folks.
> Okay, I find it surprisingly easy to make subtle errors in this encoding
> stuff, so please let me know if you spot one.  Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ?  That is my Requirement
> 2.
No, but srcbytes.encode('utf-8', 'python-escape').decode('utf-8',
'python-escape') == srcbytes. The encodings on both ends need to be the
same.

For example:

 >>> b'\x80'.decode('windows-1252')
u'\u20ac'
 >>> u'\u20ac'.encode('utf-8')
'\xe2\x82\xac'


 >>> b'\x80'.decode('utf-8')

Traceback (most recent call last):
   File "<pyshell#7>", line 1, in <module>
   File "C:\Python26\lib\encodings\", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: 
unexpected code byte

But under this PEP:

 >>> b'\x80'.decode('utf-8', 'python-escape')
 >>> u'\udc80'.encode('utf-8', 'python-escape')
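
The same example can be sketched for Python 3, assuming the handler name
'surrogateescape' under which PEP 383 was eventually released (the
'python-escape' spelling above is the draft name). The variable names
are only illustrative.

```python
raw = b'\x80'

# Different codecs on the two ends: the byte is reinterpreted as a
# character (the euro sign) and comes out as three different bytes.
euro = raw.decode('windows-1252')
reencoded = euro.encode('utf-8')

# Strict UTF-8 decoding rejects the lone byte outright.
try:
    raw.decode('utf-8')
except UnicodeDecodeError:
    strict_failed = True
else:
    strict_failed = False

# Same codec and same handler on both ends: the byte is smuggled through
# as the lone surrogate U+DC80 and restored on the way out.
escaped = raw.decode('utf-8', 'surrogateescape')
roundtrip = escaped.encode('utf-8', 'surrogateescape')
```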

From status at  Fri May  1 18:07:30 2009
From: status at (Python tracker)
Date: Fri,  1 May 2009 18:07:30 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (04/24/09 - 05/01/09)
Python tracker at

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2190 open (+34) / 15527 closed (+29) / 17717 total (+63)

Open issues with patches:   861

Average duration of open issues: 645 days.
Median duration of open issues: 394 days.

Open Issues Breakdown
   open  2156 (+33)
pending    33 ( +1)

Issues Created Or Reopened (63)

os.path.walk fails to descend into a directory whose name ends w 04/24/09
CLOSED    created  linuxelf                      

readline update                                                  04/24/09    created  jrevans1                      

The word "error" used instead of "failure"                       04/25/09
CLOSED    created  kurtmckee                     

Deprecate PyOS_ascii_formatd                                     04/25/09
CLOSED    created  eric.smith                    

Clean up float parsing code for nans and infs                    04/25/09
CLOSED    created  marketdickinson               

support.EnvironmentVarGuard broken                               04/25/09
CLOSED    created  doerwalter                    

Test issue                                                       04/25/09
CLOSED    created  ajaksu2                       

RegOpenKeyEx key failed on Vista 64Bit with return 2             04/25/09    created  makursi                       

"Thread State and the Global Interpreter Lock" section of the do 04/25/09    created  exarkun                       

add py3k warnings to commands                                    04/25/09
CLOSED    created  dsm001                        

Move test outside of urlparse module                             04/25/09    created  Merwok                        

Possible normalization error in urlparse.urlunparse              04/25/09    created  Merwok                        

internal error on write while reading                            04/25/09    created  dsm001                        

rlcompleter should be enabled automatically                      04/25/09    created  cben                          

Deprecate obsolete functions in unittest                         04/25/09    created  michael.foord                 

IDLE/Win Installer: drop -n switch for 2.7/3.1; install 3.1 as i 04/26/09    created  kbk                           

Minor unittest doc patch                                         04/26/09
CLOSED    created  michael.foord                 
       patch, patch, easy, needs review                                        

Idle 3.01 - invalid syntec error                                 04/26/09
CLOSED    created  r2d2floyd                     

Full example for emulating a container type                      04/27/09
CLOSED    created  yaneurabeya                   

Add a stream parameter to gc.set_debug                           04/27/09    created  nicdumz                       

can't use "glog" to find the path with square bracket            04/27/09
CLOSED    created  winterTTr                     

mimetypes.guess_type() hits recursion limit                      04/27/09
CLOSED    created  djc                           

logging module's __all__ attribute not in sync with documentatio 04/27/09
CLOSED    created  flub                          

Perhaps exponential performance of sum(listoflists, [])          04/27/09
CLOSED    created  sjohn                         

Minor typo in traceback example                                  04/27/09
CLOSED    created  nielsdevos                    

Return namedtuples from tokenize token generator                 04/27/09
CLOSED    created  mallyvai                      
       needs review                                                            

Make complex repr and str more like float repr and str           04/27/09    created  marketdickinson               

Remove implicit '%f' -> '%g' switch from float formatting.       04/27/09
CLOSED    created  marketdickinson               

TextIOWrapper: bad error reporting when write() is forbidden     04/27/09
CLOSED    created  pitrou                        

test_urllib fails on windows                                     04/28/09    created  ocean-city                    

multiprocessing 'using a remote manager' example errors and poss 04/28/09    created  r.david.murray                

bz2.BZ2File should accept other file-like objects.               04/28/09    created  MizardX                       

format(1234.5, '.4') gives misleading result                     04/28/09    created  marketdickinson               

mathmodule.c fails to compile due to missing math_log1p() functi 04/28/09
CLOSED    created  alanh                         

cPickle defect with tuples and different from pickle output      04/28/09    created  jelle                         

No way to create an abstract classmethod                         04/28/09    created  della                         

mimetypes.MAGIC_FUNCTION initialization not thread-safe in Pytho 04/28/09
CLOSED    created  apoirier                      

100th character truncation in 2.4                     04/28/09
CLOSED    created  neville.bagnall               

subprocess.DEVNULL                                               04/28/09    created  MrJean1                       

email.header.Header allow to embed raw newlines into a message   04/28/09    created  jwilk                         

New C API for declaring Python types                             04/29/09    created  larry                         

Minidom: parsestring() error                                     04/29/09
CLOSED    created  naf305                        

distutils.tests.test_config_cmd is locale-sensitive              04/29/09
CLOSED    created  georg.brandl                  

test_distutils failing on OpenSUSE 10.3, Py3k                    04/29/09    created  ShuaibKhan                    

__repr__ returning unicode doesn't work when called implicitly   04/29/09    created  liori                         

Add a function for updating URL query parameters                 04/29/09    created  mrts                          

Regular Expression instances                                     04/29/09
CLOSED    created  ecasbas                       

multiprocessing - example "pool of http servers " fails on windo 04/29/09    created  ghum                          

Remove unneeded "context" pointer from getters and setters       04/29/09    created  larry                         

Remove extraneous backwards-compatibility attributes from some m 04/29/09    created  larry                         

__repr__ is ignored when formatting exceptions                   04/29/09
CLOSED    created  ellisj                        

detach() implementation                                          04/29/09    created  benjamin.peterson             

pydoc to return error status code                                04/30/09    created  mixmastamyk                   

uuid.uuid1() is too slow                                         04/30/09    created  wangchun                      

curses/ global name '_os' is not defined             04/30/09
CLOSED    created  andrix                        

mmap.write_byte out of bounds - no error, position gets screwed  04/30/09    created  bmearns                       

mmap ehancement - resize with sequence notation                  04/30/09    created  bmearns                       

Extra comma in enum - fails on AIX                               04/30/09
CLOSED    created  srid                          

Subclassing property doesn't preserve the auto __doc__ behavior  04/30/09    created  gsakkis                       

strange list.sort() behavior on import, del and inport again     05/01/09
CLOSED    created  dstemmer                      

strange list.sort() behavior on import, del and inport again     05/01/09
CLOSED    created  dstemmer                      

Add support to pydoc to output .rst restructured text            05/01/09    created  gregory.p.smith               

Lookup of localised language name by ISO 639 language code and r 05/01/09    created  pander                        

Issues Now Closed (104)

pyvm module patch                                                 515 days    benjamin.peterson             

Bad OOB data management when using asyncore with select.poll()    514 days    georg.brandl                  

str.format() wrongly formats complex() numbers (Py30a2)           505 days    eric.smith                    

sqlite3 docs should mention utf8 requirement                      434 days    georg.brandl                  
       patch, easy                                                             

aifc cannot handle unrecognised chunk type "CHAN"                 419 days    r.david.murray                

float compared to decimal is silently incorrect.                   34 days    jdunck                        

3.0 pickle docs -- what about old-style classes?                  385 days    georg.brandl                  

PyString_FromStringAndSize() to be considered unsafe              384 days    iankko                        

Python does not accept unicode keywords                           375 days    ajaksu2                       

ctypes defines global symbols                                     316 days    theller                       

Wish: disable tests in unittest                                   304 days    benjamin.peterson             

various doc typos                                                 291 days    georg.brandl                  

file.readline: bad exception recovery                             260 days    benjamin.peterson             
       patch, easy                                                             

Tuple comparison masking exception                                226 days    rhettinger                    

idle should be installed as idle3.0                               220 days    ajaksu2                       

smtplib cannot sendmail over TLS                                  217 days    ajaksu2                       
       patch, easy                                                             

Python 2.6 Doc/tools folder bigger than in 2.6rc2                 205 days    georg.brandl                  

C/API documentation: request for documentation of change to Py_s  196 days    asmodai                       

Email example should use SMTP.quit() rather than SMTP.close()     181 days    asmodai                       

ctypes could include data type limits                             145 days    theller                       

Need to rework the dbm lib/include selection process              144 days    doko                          
       patch, needs review                                                     

Idle for Python 3.0 is default even without doing make fullinsta  129 days    ajaksu2                       

failure in test_httpservers                                       101 days    tarek                         

Incorrect title case                                               98 days    loewis                        

Specifying common controls DLL in manifest                         97 days    robind                        

ctypes unwilling to allow pickling wide character                  90 days    theller                       

Inadequate documentation of the built-in function open             91 days    georg.brandl                  

IDLE improve Subprocess Startup Error message                      91 days    ajaksu2                       

Avoid redundant call to FormatError()                              88 days    theller                       

indentation in IDLE 2.6 different from IDLE 2.5, 2.4 or vim        82 days    kbk                           
       patch, 26backport                                                       

wrong paths for ctypes cleanup                                     78 days    theller                       

setting __class__ in __del__ is bad. mmkay. negative ref count!    67 days    benjamin.peterson             

email/ cannot work                                    67 days    ajaksu2                       

ctypes configuration fails on mips-linux (and probably Irix)       41 days    theller                       

test_math.testFsum failure on release30-maint                      26 days    marketdickinson               

file "<stdin>" on disk creates garbage output in stack trace       26 days    ajaksu2                       

shutils test fails on ZFS (on FUSE, on Linux)                      27 days    benjamin.peterson             

inspect.findsource() should look only for sources                  13 days    ajaksu2                       

idle pydoc et al removed from 3.1 without versioned replacements   11 days    kbk                           

IDLE cannot find windows chm file                                   8 days    kbk                           
       patch, 26backport                                                       

Rationalize isdigit / isalpha / tolower / ... uses throughout Py    8 days    eric.smith                    

test_distutils fails - sysconfig._config_vars is None               3 days    tarek                         

Fix five small bugs in the bininstall and altbininstall pseudota    3 days    benjamin.peterson             

Documentation: mention 'close' and iteration for tarfile.TarFile    2 days    georg.brandl                  

new unittest function listed as assertIsNotNot() instead of asse    2 days    michael.foord                 

Invalid behavior of unicode.lower                                   1 days    loewis                        

heapq item comparison problematic with sched's events               0 days    rhettinger                    

os.path.walk fails to descend into a directory whose name ends w    0 days    potten                        

The word "error" used instead of "failure"                          0 days    georg.brandl                  

Deprecate PyOS_ascii_formatd                                        2 days    eric.smith                    

Clean up float parsing code for nans and infs                       2 days    marketdickinson               

support.EnvironmentVarGuard broken                                  0 days    doerwalter                    

Test issue                                                          0 days    marketdickinson               

add py3k warnings to commands                                       0 days    georg.brandl                  

Minor unittest doc patch                                            1 days    georg.brandl                  
       patch, patch, easy, needs review                                        

Idle 3.01 - invalid syntec error                                    0 days    doerwalter                    

Full example for emulating a container type                         2 days    rhettinger                    

can't use "glog" to find the path with square bracket               0 days    amaury.forgeotdarc            

mimetypes.guess_type() hits recursion limit                         1 days    pitrou                        

logging module's __all__ attribute not in sync with documentatio    0 days    vsajip                        

Perhaps exponential performance of sum(listoflists, [])             0 days    pitrou                        

Minor typo in traceback example                                     0 days    georg.brandl                  

Return namedtuples from tokenize token generator                    1 days    rhettinger                    
       needs review                                                            

Remove implicit '%f' -> '%g' switch from float formatting.          4 days    marketdickinson               

TextIOWrapper: bad error reporting when write() is forbidden        0 days    benjamin.peterson             

mathmodule.c fails to compile due to missing math_log1p() functi    0 days    marketdickinson               

mimetypes.MAGIC_FUNCTION initialization not thread-safe in Pytho    0 days    pitrou                        

100th character truncation in 2.4                        0 days    neville.bagnall               

Minidom: parsestring() error                                        0 days    georg.brandl                  

distutils.tests.test_config_cmd is locale-sensitive                 0 days    tarek                         

Regular Expression instances                                        0 days    georg.brandl                  

__repr__ is ignored when formatting exceptions                      0 days    benjamin.peterson             

curses/ global name '_os' is not defined                0 days    amaury.forgeotdarc            

Extra comma in enum - fails on AIX                                  1 days    srid                          

strange list.sort() behavior on import, del and inport again        0 days    loewis                        

strange list.sort() behavior on import, del and inport again        0 days    loewis                        

Fix for bugs relating to ntpath.expanduser()                      210 days  gjb1002                       

urllib2 http auth                                                1689 days gregory.p.smith               

endianness detection fails on IRIX 5.3                           1617 days ajaksu2                       

proposed patch for tls wrapped ssl support added to smtplib      1417 days ajaksu2                       

MSI installer does not pass values as SecureProperty from UI     1311 days ajaksu2                       

Integer bit operations performance improvement.                  1073 days marketdickinson               

test_float segfaults with SIGFPE on FreeBSD 6.0 / Alpha          1066 days marketdickinson               

Use dynload_shlib on newer HP-UX versions                        1026 days ajaksu2                       

Allowing multiple instances of IDLE with sub-processes           1004 days kbk                           

Tracing and profiling functions can cause hangs in threads        999 days ajaksu2                       

Tru64 make install failure                                        954 days ajaksu2                       

Install on WinXP always goes to C:\                               943 days ajaksu2                       

Modules/readline.c fails to compile on AIX 4.2                    891 days ajaksu2                       

Would you mind renaming object.h to pyobject.h?                   844 days ajaksu2                       

Python 2.5 gets curses.h warning on HPUX                          824 days ajaksu2                       

proxy_bypass in urllib handling of <local> macro                  821 days orsenthil                     
       patch, easy                                                             

HP-UX: compiler warnings: alignment                               815 days ajaksu2                       

Python package support not properly documented                    715 days georg.brandl                  

Document effects of PY_SSIZE_T_CLEAN on argument parsing          693 days loewis                        

Solaris 64 bit LD_LIBRARY_PATH_64 needs to be set                 687 days ajaksu2                       

Modules/ld_so_aix needs to strip path off of whichcc call         687 days ajaksu2                       

zlib configure behaves differently than main configure            687 days ajaksu2                       

HP shared object option                                           687 days ajaksu2                       

HP automatic build of zlib                                        687 days ajaksu2                       

HP 64 bit does not run                                            687 days ajaksu2                       

AIX shared object build of python 2.5 does not work               687 days ajaksu2                       

Fast path for unicodedata.normalize()                             688 days pitrou                        

Python - Operation time out problem                               628 days ajaksu2                       

Top Issues Most Discussed (10)

 28 str.format() wrongly formats complex() numbers (Py30a2)          505 days

 12 support.EnvironmentVarGuard broken                                 0 days

 10 mathmodule.c fails to compile due to missing math_log1p() funct    0 days

 10 format(1234.5, '.4') gives misleading result                       3 days

  9 Invalid behavior of unicode.lower                                  1 days

  9 IDLE cannot find windows chm file                                  8 days

  8 failure in test_httpservers                                      101 days

  7 mimetypes.guess_type() hits recursion limit                        1 days

  7 C/API documentation: request for documentation of change to Py_  196 days

  6 detach() implementation                                            2 days

From chris at  Fri May  1 18:26:29 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 17:26:29 +0100
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> """
> If the package really requires adding one or more directories on sys.path (e.g.
> because it has not yet been structured to support dotted-name import), a "path
> configuration file" named package.pth can be placed in either the site-python or
> site-packages directory.
> ...
> A typical installation should have no or very few .pth files or something is
> wrong, and if you need to play with the search order, something is very wrong.
> """

I'll say! I think .pth files are absolute evil and I wish they could 
just be banned.

+1 on anything that makes them closer to going away or reduces the 
possibility of yet another similar feature from hurting the 
comprehensibility of a python setup.


Simplistix - Content Management, Zope & Python Consulting

From chris at  Fri May  1 18:30:16 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 17:30:16 +0100
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> The much more common use case is that of wanting to have a base package
> installation with optional add-ons that live in the same logical
> package namespace.
> The PEP provides a way to solve this use case by giving both developers
> and users a standard at hand which they can follow without having to
> rely on some non-standard helpers and across Python implementations.
> My proposal tries to solve this without adding yet another .pth
> file like mechanism - hopefully in the spirit of the original Python
> package idea.

Okay, I need to issue a plea for a little help.

I think I kinda get what this PEP is about now, and as someone who wants 
to ship a base package with several add-ons that live in the same 
logical package namespace, I'm very interested.

However, despite trying to follow this thread *and* having tried to read 
the PEP a couple of times, I still don't know how I'd go about doing this.

I did give some examples from what I'd be looking to do much earlier.

I'll ask again in the vague hope of you or someone else explaining 
things to me like I'm a 5 year old - something I'm mentally equipped to 
be well ;-)

In either of the proposals on the table, what code would I write and 
where to have a base package with a set of add-on packages?

Simple examples would be greatly appreciated, and might bring things 
into focus for some of the less mentally able bystanders - like myself!
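
For comparison, here is a rough sketch of a base package plus one add-on
using pkgutil.extend_path, which is already in the standard library.
This is not the PEP 382 mechanism and not either proposal on the table.
The package name demo_ns, the directory names and the module contents
are all invented for the example, and the two "distributions" are just
temporary directories put on sys.path.

```python
import importlib
import os
import sys
import tempfile

# __init__.py of the base package: real code plus the extend_path call
# that lets other sys.path entries contribute modules to the package.
BASE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
    "BASE_VERSION = '1.0'\n"
)
# Stub shipped by the add-on; it is never executed here because the base
# package comes first on sys.path.
ADDON_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def write(path, text):
    """Create parent directories as needed and write a small text file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


root = tempfile.mkdtemp()
base_dist = os.path.join(root, "base_dist")    # hypothetical base install
addon_dist = os.path.join(root, "addon_dist")  # hypothetical add-on install

write(os.path.join(base_dist, "demo_ns", "__init__.py"), BASE_INIT)
write(os.path.join(base_dist, "demo_ns", "core.py"), "NAME = 'core'\n")
write(os.path.join(addon_dist, "demo_ns", "__init__.py"), ADDON_INIT)
write(os.path.join(addon_dist, "demo_ns", "addon.py"), "NAME = 'addon'\n")

# Base first, add-on second, so the base package's __init__.py is the
# one that actually runs.
sys.path[:0] = [base_dist, addon_dist]
importlib.invalidate_caches()

base = importlib.import_module("demo_ns")
core = importlib.import_module("demo_ns.core")
addon = importlib.import_module("demo_ns.addon")
```

Both demo_ns.core and demo_ns.addon import, even though they live in
different directories, and demo_ns.BASE_VERSION comes from the base
package. The sketch only works because the two portions are installed
into separate directories; if both went into the same site-packages
directory, their __init__.py files would collide.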



Simplistix - Content Management, Zope & Python Consulting

From chris at  Fri May  1 18:32:14 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 17:32:14 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

P.J. Eby wrote:
> At 06:15 PM 4/15/2009 +0200, M.-A. Lemburg wrote:
>> The much more common use case is that of wanting to have a base package
>> installation with optional add-ons that live in the same logical
>> package namespace.
> Please see the large number of Zope and PEAK distributions on PyPI as 
> minimal examples that disprove this being the common use case.  

If you mean "the common use case as opposed to having code in the of the namespace package", I think you'll find that's 
because people (especially me!) don't know how to do this, not because 
we don't want to!

Chris - who would actually like to know how to do this, with or without 
the PEP, and how to indicate interdependencies in situations like this 
to setuptools...

Simplistix - Content Management, Zope & Python Consulting

From chris at  Fri May  1 18:35:43 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 17:35:43 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

P.J. Eby wrote:
> It's unclear, however, who is using base packages besides mx.* and ll.*, 
> although I'd guess from the PyPI listings that perhaps Django is.  (It 
> seems that "base" packages are more likely to use a 'base-extension' 
> naming pattern, vs. the 'namespace.project' pattern used by "pure" 
> packages.)

I'll stress it again in case you missed it the first time: I think the 
main reason people use "pure namespace" versus "base namespace" packages 
is because hardly anyone knows how to do the latter, not because there is 
no desire to do so!

I, for one, have been trying to figure out how to do "base namespace" 
packages for years...


Simplistix - Content Management, Zope & Python Consulting

From martin at  Fri May  1 18:38:46 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 18:38:46 +0200
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <>
	<>	<>	<>	<>	<>
Message-ID: <>

> Okay, I am wrong about this.  Having a flag to remember whether I had to
> fall back to the utf-8b trick is one method to implement my requirement,
> but my actual requirement is this:
> Requirement: either the unicode string or the bytes are faithfully
> transmitted from one system to another.

I don't understand this requirement very well, in particular not
the "faithfully" part.

> That is: if you read a filename from the filesystem, and transmit that
> filename to another system and use it, then there are two cases:

What do you mean by "use it"? Things like opening files? How does
that work? In general, a file name valid on one system is invalid
on a different system - or, at least, refers to a different file
over there. This is independent of encodings.

> Requirement 1: the byte string was valid in the encoding of source
> system, in which case the unicode name is faithfully transmitted
> (i.e. the bytes that finally land on the target system are the result of
> sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding)).

In all your descriptions, I'm puzzled as to where exactly you get
the source bytes from. If you use the PEP 383 interfaces, you will
start with character strings, not byte strings, always.

> Okay, I find it surprisingly easy to make subtle errors in this encoding
> stuff, so please let me know if you spot one.  Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ? 

I think you mixed up bytes and unicode here: if srcbytes is indeed
a bytes object, then you can't apply .encode to it.
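
To make the mix-up concrete, a small Python 3 sketch, assuming the
released handler name 'surrogateescape' in place of the draft name
'python-escape'. The sample value is made up. Bytes are decoded first
and the resulting string is encoded afterwards, not the other way round.

```python
srcbytes = b'caf\xe9'   # Latin-1 bytes, not valid UTF-8

# A bytes object has no .encode() method in Python 3.
has_encode = hasattr(srcbytes, 'encode')

# The round trip that does hold: bytes -> str -> bytes, with the same
# codec and the same error handler in both directions.
name = srcbytes.decode('utf-8', 'surrogateescape')
restored = name.encode('utf-8', 'surrogateescape')
```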


From martin at  Fri May  1 18:41:03 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 18:41:03 +0200
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

> In either of the proposals on the table, what code would I write and
> where to have a base package with a set of add-on packages?

I don't quite understand the question. Why would you want to write code
(except for the code that actually is in the packages)?

PEP 382 is completely declarative - no need to write code.


From chris at  Fri May  1 18:58:18 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 17:58:18 +0100
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>> In either of the proposals on the table, what code would I write and
>> where to have a base package with a set of add-on packages?
> I don't quite understand the question. Why would you want to write code
> (except for the code that actually is in the packages)?
> PEP 382 is completely declarative - no need to write code.

"code" is anything I need to write to make this work...

So, what do I need to do?


Simplistix - Content Management, Zope & Python Consulting

From chris at  Fri May  1 19:14:12 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 18:14:12 +0100
Subject: [Python-Dev] headers api for email package
In-Reply-To: <>
References: <>
Message-ID: <>

>>> Where you just want "a damned valid email and stop making my life 
>>> hard!":
>>> Message['Subject']='Some text'
>> Yes.  In which case I propose we guess the encoding as 1) ascii, 2) 
>> utf-8, 3) wtf?

Well, we're talking about Python 3 here right? In which case the above 
involves only unicode, so why do we need to guess anything? Just use 
utf-8 and be done with it...

> However, it's not supposed to be used by mail composers, who are
> expected to know the encoding.  It's for mail gateways that are
> transforming something and don't know the encoding.  I'm not
> sure what this means for the email module, which certainly
> will be used in mail gateways... maybe it's the responsibility
> of the application code to explicitly say 'unknown encoding'?

Indeed, surely this happens when you have bytes and need to do something 
with it? That's not what my example above is about...

>>> Where you care about what encoding is used:
>>> Message['Subject']=Header('Some text',encoding='utf-8')
>> Yes.'s covered by this.

>>> If you have bytes, for whatever reason:
>>> Message['Subject']=b'some bytes'.decode('utf-8')
>>> ...because only you know what encoding those bytes use!
>> So you're saying that __setitem__() should not accept raw bytes?

Indeed :-)
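
A minimal sketch of the "str only, decode your own bytes" position, using the EmailMessage class from today's standard library rather than the API being designed in this thread. The subject text and the Latin-1 source bytes are assumptions chosen for illustration.

```python
from email.message import EmailMessage

# Plain str in: no encoding has to be guessed by the library.
msg = EmailMessage()
msg["Subject"] = "Grüße aus Köln"

# If the caller has bytes, the caller decodes them, because only the
# caller knows which charset they are in (Latin-1 is assumed here).
raw_subject = "Grüße".encode("latin-1")
other = EmailMessage()
other["Subject"] = raw_subject.decode("latin-1")

# Serialising produces ASCII-only wire format with RFC 2047 encoded-words.
wire = msg.as_bytes()
```

With this split the header object never has to tell "raw" bytes from "encoded" bytes, which is the question raised above.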


Simplistix - Content Management, Zope & Python Consulting

From chris at  Fri May  1 19:18:35 2009
From: chris at (Chris Withers)
Date: Fri, 01 May 2009 18:18:35 +0100
Subject: [Python-Dev] [Email-SIG]  headers api for email package
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
>  > > str(message['Subject'])
>  > 
>  > Yes for unstructured headers like Subject.  For structured headers...  
>  > hmm.
> Well, suppose we get really radical here.  *People* see email as
> (rich-)text.  So ... message['Subject'] returns an object, partly to
> be consistent with more complex headers' APIs, but partly to remind us
> that nothing in email is as simple as it seems.  Now,
> str(message['Subject']) is really for presentation to the user, right?
> OK, so let's make it a presentation function!  Decode the MIME-words,
> optionally unfold folded lines, optionally compress spaces, etc.  This
> by default returns the subject field as a single, possibly quite long,
> line.  Then a higher-level API can rewrap it, add fonts etc, for fancy
> presentation.  This also suggests that we don't want the field tag (ie,
> "Subject") to be part of this value.
> Of course a *really* smart higher-level API would access structured
> headers based on their structure, not on the one-size-fits-all str()
> conversion.

All sounds good to me.

> Then MTAs see email as a string of octets.  So guess what:
>  > > bytes(message['Subject'])
> gives wire format.  Yow!  I think I'm just joking.  Right?

Why? That also sounds fine to me and "feels right"...
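
The presentation-versus-wire split can be approximated with the existing email.header.Header class. This is only a sketch: bytes(message['Subject']) is the idea floated above, not something shown here, and the stdlib constructor takes a `charset` keyword rather than the `encoding` keyword used in the thread's examples.

```python
from email.header import Header

subject = Header("Grüße", charset="utf-8")

# str() gives the human-readable text, suitable for presentation.
presentation = str(subject)

# encode() gives the RFC 2047 encoded-word form that goes on the wire.
wire = subject.encode()
```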

>  > > Where you just want "a damned valid email and stop making my life  
>  > > hard!":
> -1  I mean, yeah, Brother, I feel your pain but it just isn't that
> easy.  If that were feasible, it would be *criminal* to have a
> .set_header() method at all!  In fact,

Don't agree...

>  > > Message['Subject']='Some text'
> is going to (a) need to take *only* unicodes, or (b) raise Exceptions
> at the slightest provocation when handed bytes.

It should only take unicodes and bitch profusely about anything else.

> And things only get worse if you try to provide this interface for say
> "From" (let alone "Content-Type").  Is it really worth doing the
> mapping interface if it's only usable with free-form headers (ie, only
> Subject among the commonly used headers)?

Sure, for other headers it might *not* accept unicodes...

> How do you distinguish "raw" bytes from "encoded bytes"?
> __setitem__() shouldn't accept bytes at all. 

Right on :-)


Simplistix - Content Management, Zope & Python Consulting

From martin at  Fri May  1 19:38:12 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 19:38:12 +0200
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

>>> In either of the proposals on the table, what code would I write and
>>> where to have a base package with a set of add-on packages?
>> I don't quite understand the question. Why would you want to write code
>> (except for the code that actually is in the packages)?
>> PEP 382 is completely declarative - no need to write code.
> "code" is anything I need to write to make this work...
> So, what do I need to do?

Ok, so create three tar files:

1. base.tar, containing


2. addon1.tar, containing

   simplistix/addon1.pth (containing a single "*")

3. addon2.tar, containing


Unpack each of them anywhere on sys.path, in any order.
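
PEP 382's marker files were a proposal, so the sketch below does not use them. It shows the same "unpack anywhere on sys.path, in any order" idea with the implicit namespace packages that Python 3.3 and later provide (PEP 420), which is a different mechanism. The package name, module names and temporary directories are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "nsdemo_pkg"  # hypothetical package name


def make_portion(module_name: str, value: str) -> str:
    """Create <tmpdir>/<PACKAGE>/<module_name>.py with no __init__.py."""
    root = Path(tempfile.mkdtemp())
    pkg_dir = root / PACKAGE
    pkg_dir.mkdir()
    (pkg_dir / f"{module_name}.py").write_text(f"NAME = {value!r}\n")
    return str(root)


# Two portions live in unrelated directories, added in arbitrary order.
sys.path.extend([make_portion("addon2", "second"),
                 make_portion("addon1", "first")])
importlib.invalidate_caches()

addon1 = importlib.import_module(f"{PACKAGE}.addon1")
addon2 = importlib.import_module(f"{PACKAGE}.addon2")
portion_count = len(sys.modules[PACKAGE].__path__)
```

The temporary directories are left in place; a real test would remove them and restore sys.path.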


From martin at  Fri May  1 19:41:39 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 19:41:39 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

>> It's unclear, however, who is using base packages besides mx.* and
>> ll.*, although I'd guess from the PyPI listings that perhaps Django
>> is.  (It seems that "base" packages are more likely to use a
>> 'base-extension' naming pattern, vs. the 'namespace.project' pattern
>> used by "pure" packages.)
> I'll stress it again in case you missed it the first time: I think the
> main reason people use "pure namespace" versus "base namespace" packages
> is because hardly anyone knows how to do the latter, not because there is
> no desire to do so!
> I, for one, have been trying to figure out how to do "base namespace"
> packages for years...

You mean, without PEP 382?

That won't be possible, unless you can coordinate all addon packages.
Base packages are a feature solely of PEP 382.


From pje at  Fri May  1 20:49:40 2009
From: pje at (P.J. Eby)
Date: Fri, 01 May 2009 14:49:40 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
Message-ID: <>

At 05:35 PM 5/1/2009 +0100, Chris Withers wrote:
>P.J. Eby wrote:
>>It's unclear, however, who is using base packages besides mx.* and 
>>ll.*, although I'd guess from the PyPI listings that perhaps Django 
>>is.  (It seems that "base" packages are more likely to use a 
>>'base-extension' naming pattern, vs. the 'namespace.project' 
>>pattern used by "pure" packages.)
>I'll stress it again in case you missed it the first time: I think 
>the main reason people use "pure namespace" versus "base namespace" 
>packages is because hardly anyone knows how to do the latter, not 
>because there is no desire to do so!

I didn't say there's *no* desire, however IIRC the only person who 
*ever* asked on distutils-sig how to do a base package with 
setuptools was the author of the ll.* packages.  And in the case of 
at least the zope.* peak.* and osaf.* namespace packages it was 
specifically *not* the intention to have a base __init__.

From pje at  Fri May  1 20:51:20 2009
From: pje at (P.J. Eby)
Date: Fri, 01 May 2009 14:51:20 -0400
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 07:41 PM 5/1/2009 +0200, Martin v. Löwis wrote:
> >> It's unclear, however, who is using base packages besides mx.* and
> >> ll.*, although I'd guess from the PyPI listings that perhaps Django
> >> is.  (It seems that "base" packages are more likely to use a
> >> 'base-extension' naming pattern, vs. the 'namespace.project' pattern
> >> used by "pure" packages.)
> >
> > I'll stress it again in case you missed it the first time: I think the
> > main reason people use "pure namespace" versus "base namespace" packages
> > is because hardly anyone knows how to do the latter, not because there is
> > no desire to do so!
> >
> > I, for one, have been trying to figure out how to do "base namespace"
> > packages for years...
>You mean, without PEP 382?
>That won't be possible, unless you can coordinate all addon packages.
>Base packages are a feature solely of PEP 382.

Actually, if you are using only the distutils, you can do this by 
listing only modules in the addon projects; this is how the ll.* 
tools are doing it.  That only works if the packages are all being 
installed in the same directory, though, not as eggs.

From martin at  Fri May  1 20:58:28 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 01 May 2009 20:58:28 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> Actually, if you are using only the distutils, you can do this by
> listing only modules in the addon projects; this is how the ll.* tools
> are doing it.  That only works if the packages are all being installed
> in the same directory, though, not as eggs.

Right: if all portions install into the same directory, you can have
base packages already.


From benjamin at  Fri May  1 21:32:18 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 1 May 2009 14:32:18 -0500
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/1 MRAB <google at>:
> I've just noticed an oddity in the key in PEP 0. Most letters are used
> more than once. Wouldn't it be clearer if different letters were used
> for "Accepted" and "Active" instead of them both being 'A', for example?
> -> A - Accepted proposal
> -> R - Rejected proposal
>   W - Withdrawn proposal
> -> D - Deferred proposal
>   F - Final proposal
> -> A - Active proposal
> -> D - Draft proposal
> -> R - Replaced proposal

Yes, that makes more sense. Would you like to submit a patch against
the PEP 0 generator? (It's in peps/pep0)


From tjreedy at  Fri May  1 22:21:36 2009
From: tjreedy at (Terry Reedy)
Date: Fri, 01 May 2009 16:21:36 -0400
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <>
	<>	<>	<>	<>	<>
Message-ID: <gtflkh$n68$>

Zooko O'Whielacronx wrote:
> Following-up to my own post to correct a major error:

> Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes ?  That is my Requirement

If you start with bytes, decode with utf-8b to unicode (possibly 
'invalid'), and encode the result back to bytes with utf-8b, you should 
get the original bytes, regardless of what they were.  That is the point 
of PEP 383 -- to reliably roundtrip file 'names' that start as bytes and 
must end as the same bytes but which may not otherwise have a unicode 

If you start with invalid unicode text, encode to bytes with utf-8b, and 
decode back to unicode, you might instead get a different and valid 
unicode text.  An example was given in the discussion.  I believe this 
would be hard to avoid.  In any case, it does not matter for the use 
case of starting with bytes that one wants to temporarily but surely 
work with as text.
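
Both directions can be sketched in a few lines, assuming the "utf-8b" codec of this thread corresponds to UTF-8 with the 'surrogateescape' error handler of released Python versions. The second string is a made-up example of "invalid" text, not the one given in the discussion.

```python
HANDLER = "surrogateescape"

# Direction 1: bytes -> str -> bytes is lossless for every byte value.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("utf-8", HANDLER)
bytes_round_trip = as_text.encode("utf-8", HANDLER)

# Direction 2: starting from text made of lone surrogates, the bytes
# produced happen to form valid UTF-8, so decoding gives different text.
odd_text = "\udce4\udcbd\udca0"
as_bytes = odd_text.encode("utf-8", HANDLER)
text_round_trip = as_bytes.decode("utf-8", HANDLER)
```

Here the three escaped bytes E4 BD A0 are the UTF-8 encoding of U+4F60, so the second round trip returns one valid character instead of the three surrogates.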

Terry Jan Reedy

From cs at  Fri May  1 23:39:28 2009
From: cs at (Cameron Simpson)
Date: Sat, 2 May 2009 07:39:28 +1000
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
Message-ID: <>

On 01May2009 18:38, Martin v. Löwis <martin at> wrote:
| > Okay, I am wrong about this.  Having a flag to remember whether I had to
| > fall back to the utf-8b trick is one method to implement my requirement,
| > but my actual requirement is this:
| > 
| > Requirement: either the unicode string or the bytes are faithfully
| > transmitted from one system to another.
| I don't understand this requirement very well, in particular not
| the "faithfully" part.
| > That is: if you read a filename from the filesystem, and transmit that
| > filename to another system and use it, then there are two cases:
| What do you mean by "use it"? Things like opening files? How does
| that work? In general, a file name valid on one system is invalid
| on a different system - or, at least, refers to a different file
| over there. This is independent of encodings.

I think he's doing a file transfer of some kind and needs to preserve
the names. Or I would guess the two systems are not both UNIX or there
is some subtlety not yet mentioned, or he'd just use tar or some other
byte-level UNIX tool.

| > Requirement 1: the byte string was valid in the encoding of source
| > system, in which case the unicode name is faithfully transmitted
| > (i.e. the bytes that finally land on the target system are the result of
| > sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding).
| In all your descriptions, I'm puzzled as to where exactly you get
| the source bytes from. If you use the PEP 383 interfaces, you will
| start with character strings, not byte strings, always.

But if both systems do present POSIX layers, it's bytes underneath and
the system tools will natively use bytes. He wants to ensure that he can
read using python, using listdir, and elsewhere, when he is writing using
python, preserve the bytes layer. I think.

In fact it sounds like he may be translating valid unicode and carefully not
altering byte names that don't decode. That in turn implies that the codec
may be different on the two systems.

| > Okay, I find it surprisingly easy to make subtle errors in this encoding
| > stuff, so please let me know if you spot one.  Is it true that
| > srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
| > 'python-escape') will always produce srcbytes ? 
| I think you mixed up bytes and unicode here: if srcbytes is indeed
| a bytes object, then you can't apply .encode to it.

I think he has encode/decode swapped (I did too back in the uber-thread;
if your mapping is one-to-one the distinction is almost arbitrary).

However, his assertion/hope is true only if srcencoding == 'utf-8'.
The PEP itself says that it works if the decode and encode use the same
Cameron Simpson <cs at> DoD#743

"How do you know I'm Mad?" asked Alice.
"You must be," said the Cat, "or you wouldn't have come here."

From google at  Fri May  1 23:52:02 2009
From: google at (MRAB)
Date: Fri, 01 May 2009 22:52:02 +0100
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> 2009/5/1 MRAB <google at>:
>> I've just noticed an oddity in the key in PEP 0. Most letters are used
>> more than once. Wouldn't it be clearer if different letters were used
>> for "Accepted" and "Active" instead of them both being 'A', for example?
>> -> A - Accepted proposal
>> -> R - Rejected proposal
>>   W - Withdrawn proposal
>> -> D - Deferred proposal
>>   F - Final proposal
>> -> A - Active proposal
>> -> D - Draft proposal
>> -> R - Replaced proposal
> Yes, that makes more sense. Would you like to submit a patch against
> the PEP 0 generator? (It's in peps/pep0)
I'm still trying to think which letters to use!

From fuzzyman at  Fri May  1 23:55:16 2009
From: fuzzyman at (Michael Foord)
Date: Fri, 01 May 2009 22:55:16 +0100
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>	<>
Message-ID: <>

MRAB wrote:
> Benjamin Peterson wrote:
>> 2009/5/1 MRAB <google at>:
>>> I've just noticed an oddity in the key in PEP 0. Most letters are used
>>> more than once. Wouldn't it be clearer if different letters were used
>>> for "Accepted" and "Active" instead of them both being 'A', for 
>>> example?
>>> -> A - Accepted proposal
>>> -> R - Rejected proposal
>>>   W - Withdrawn proposal
>>> -> D - Deferred proposal
>>>   F - Final proposal
>>> -> A - Active proposal
>>> -> D - Draft proposal
>>> -> R - Replaced proposal
>> Yes, that makes more sense. Would you like to submit a patch against
>> the PEP 0 generator? (It's in peps/pep0)
> I'm still trying to think which letters to use!

P for Proposal (to replace Active Proposal)? Every active PEP is a 
proposal...



From barry at  Fri May  1 23:59:49 2009
From: barry at (Barry Warsaw)
Date: Fri, 1 May 2009 17:59:49 -0400
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>	<>
Message-ID: <>

On May 1, 2009, at 5:55 PM, Michael Foord wrote:

> P for Proposal (to replace Active Proposal)? Every active PEP is a  
> proposal...


Maybe even s/Active/Proposed/g ?



From google at  Sat May  2 00:24:32 2009
From: google at (MRAB)
Date: Fri, 01 May 2009 23:24:32 +0100
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Michael Foord wrote:
> MRAB wrote:
>> Benjamin Peterson wrote:
>>> 2009/5/1 MRAB <google at>:
>>>> I've just noticed an oddity in the key in PEP 0. Most letters are used
>>>> more than once. Wouldn't it be clearer if different letters were used
>>>> for "Accepted" and "Active" instead of them both being 'A', for 
>>>> example?
>>>> -> A - Accepted proposal
>>>> -> R - Rejected proposal
>>>>   W - Withdrawn proposal
>>>> -> D - Deferred proposal
>>>>   F - Final proposal
>>>> -> A - Active proposal
>>>> -> D - Draft proposal
>>>> -> R - Replaced proposal
>>> Yes, that makes more sense. Would you like to submit a patch against
>>> the PEP 0 generator? (It's in peps/pep0)
>> I'm still trying to think which letters to use!
> P for Proposal (to replace Active Proposal)? Every active PEP is a 
> proposal...
The full list is:

S - Standards Track PEP
I - Informational PEP
P - Process PEP

A - Accepted proposal
R - Rejected proposal
W - Withdrawn proposal
D - Deferred proposal
F - Final proposal
A - Active proposal
D - Draft proposal
R - Replaced proposal

using one letter from each set.

 From looking more closely at the code:

Only 'Informational' or 'Process' PEPs can be 'Active'.

'Draft' and 'Active' are shown as a single space instead of 'D' or 'A'.


S - Standards Track PEP
I - Informational PEP
P - Process PEP

A - Accepted proposal
R - Rejected proposal
W - Withdrawn proposal
D - Deferred proposal
F - Final proposal
[A - Active proposal # blank, so can be omitted from key]
[D - Draft proposal  # blank, so can be omitted from key]
R - Replaced proposal

leaving just 'Rejected' and 'Replaced' to be disambiguated.
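
The same bookkeeping can be done mechanically. This is a sketch of the reasoning only, not the code of the PEP 0 generator; the list and helper names are made up, and the letter is assumed to be the first character of the status name.

```python
from collections import defaultdict

STATUSES = ["Accepted", "Rejected", "Withdrawn", "Deferred",
            "Final", "Active", "Draft", "Replaced"]
SHOWN_AS_BLANK = {"Active", "Draft"}  # displayed as a single space


def key_letter(status: str) -> str:
    """Return the character shown in the index for a status."""
    return " " if status in SHOWN_AS_BLANK else status[0]


# Group the visible statuses by the letter they are displayed as.
by_letter = defaultdict(list)
for status in STATUSES:
    letter = key_letter(status)
    if letter != " ":
        by_letter[letter].append(status)

# Keep only the letters that stand for more than one status.
ambiguous = {k: v for k, v in by_letter.items() if len(v) > 1}
```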

From eric at  Sat May  2 00:55:04 2009
From: eric at (Eric Smith)
Date: Fri, 01 May 2009 18:55:04 -0400
Subject: [Python-Dev] svn down?
Message-ID: <>

When checking in, I get:

Transmitting file data .svn: Commit failed (details follow):
svn: Can't create directory 
'/data/repos/projects/db/transactions/72186-1.txn': Read-only file system

With 'svn up', I get:

svn: Can't find a temporary directory: Internal error

From benjamin at  Sat May  2 01:12:23 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 1 May 2009 18:12:23 -0500
Subject: [Python-Dev] svn down?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/1 Eric Smith <eric at>:
> When checking in, I get:
> Transmitting file data .svn: Commit failed (details follow):
> svn: Can't create directory
> '/data/repos/projects/db/transactions/72186-1.txn': Read-only file system
> With 'svn up', I get:
> svn: Can't find a temporary directory: Internal error

I get that, too. In addition, I can't ssh to dinsdale.


From benjamin at  Sat May  2 03:27:48 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 1 May 2009 20:27:48 -0500
Subject: [Python-Dev] yield from?
Message-ID: <>

What's the status of yield from? There's still a small window open for
a patch to be checked into 3.1's branch. I haven't been following the
python-ideas threads, so I'm not sure if it's ready yet.


From zookog at  Sat May  2 03:42:47 2009
From: zookog at (Zooko O'Whielacronx)
Date: Fri, 1 May 2009 19:42:47 -0600
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>


Being new to the use of gmail, I accidentally sent the following only
to MvL and not to the list.  He promptly replied with a helpful
counterexample showing that my design can suffer collisions.  :-)



On Fri, May 1, 2009 at 10:38 AM, "Martin v. Löwis" <martin at> wrote:
>> Requirement: either the unicode string or the bytes are faithfully
>> transmitted from one system to another.
> I don't understand this requirement very well, in particular not
> the "faithfully" part.
>> That is: if you read a filename from the filesystem, and transmit that
>> filename to another system and use it, then there are two cases:
> What do you mean by "use it"? Things like opening files? How does
> that work? In general, a file name valid on one system is invalid
> on a different system - or, at least, refers to a different file
> over there. This is independent of encodings.

Tahoe is a backup and filesharing program, so you might, for example,
execute "tahoe cp -r Motörhead tahoe:" to copy all the contents of
your "Motörhead" directory to your Tahoe filesystem.  Later you or a
friend might execute "tahoe cp -r tahoe:Motörhead ." to copy
everything from that directory within your Tahoe filesystem to your
local filesystem.  So in this case the flow of information is
local_system_1 -> Tahoe -> local_system_2.

The Requirement 1 is that for each filename encountered which is a
valid encoding in local_system_1, then the resulting (unicode) name is
transmitted through the Tahoe filesystem and then written out into
local_system_2 in the expected way (i.e. just by using the Python
unicode APIs and passing the unicode object to them).

Requirement 2 is that for each filename encountered which is not a
valid encoding in local_system_1, then the original bytes are
transmitted through the Tahoe filesystem and then, if the target
system is a byte-oriented system such as Linux, the original bytes are
written into the target filesystem.  (If the target is not Linux then
mojibake! but we don't have to go into that now.)
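
The two requirements can be sketched as a pair of helpers. This is not Tahoe's code: the function names, the "unicode"/"bytes" tags and the sample filenames are all hypothetical, and (as noted at the top of this message) the underlying design can suffer collisions.

```python
from typing import Tuple, Union

Name = Tuple[str, Union[str, bytes]]


def classify_name(raw: bytes, fs_encoding: str) -> Name:
    """Read side: keep text when the bytes decode, else keep the bytes."""
    try:
        return ("unicode", raw.decode(fs_encoding))   # Requirement 1
    except UnicodeDecodeError:
        return ("bytes", raw)                          # Requirement 2


def write_name(name: Name, target_encoding: str) -> bytes:
    """Write side on a byte-oriented target such as Linux."""
    kind, value = name
    if kind == "unicode":
        return value.encode(target_encoding)
    return value


valid = classify_name("Motörhead".encode("utf-8"), "utf-8")
invalid = classify_name(b"Mot\xf6rhead", "utf-8")
```

A valid name is re-encoded for the target system, while an undecodable one is passed through byte for byte.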

Does that make sense?

> In all your descriptions, I'm puzzled as to where exactly you get
> the source bytes from. If you use the PEP 383 interfaces, you will
> start with character strings, not byte strings, always.

On Mac and Windows, we use the Python unicode APIs e.g.
os.listdir(u"Motörhead").  On Linux and Solaris, we use the Python
bytestring APIs e.g.

>> Okay, I find it surprisingly easy to make subtle errors in this encoding
>> stuff, so please let me know if you spot one.  Is it true that
>> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
>> 'python-escape') will always produce srcbytes ?
> I think you mixed up bytes and unicode here: if srcbytes is indeed
> a bytes object, then you can't apply .encode to it.

Yep, I reversed the order of encode() and decode().  However, my whole
statement was utterly wrong and shows that I still didn't fully get it
yet.  I have flip-flopped again and currently think that PEP 383 is
useless for this use case and that my original plan [1] is still the
way to go.  Please let me know if you spot a flaw in my plan or a
ridiculousity in my requirements, or if you see a way that PEP 383 can
help me.

Thank you very much.




From guido at  Sat May  2 04:10:47 2009
From: guido at (Guido van Rossum)
Date: Fri, 1 May 2009 19:10:47 -0700
Subject: [Python-Dev] yield from?
In-Reply-To: <>
References: <>
Message-ID: <>

Alas, I haven't been following it either recently. Too bad, really,
because before I left (now three weeks ago) it was already pretty
close. We could perhaps even check in Greg's patch (which I tried and
looked like a solid implementation of his proposal at the time) and
finagle it for b2. One problem though is that Greg's code is based on
2.6...

On Fri, May 1, 2009 at 6:27 PM, Benjamin Peterson <benjamin at> wrote:
> What's the status of yield from? There's still a small window open for
> a patch to be checked into 3.1's branch. I haven't been following the
> python-ideas threads, so I'm not sure if it's ready yet.

--Guido van Rossum (home page:

From foom at  Sat May  2 04:12:15 2009
From: foom at (James Y Knight)
Date: Fri, 1 May 2009 22:12:15 -0400
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>

On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote:
> Yep, I reversed the order of encode() and decode().  However, my whole
> statement was utterly wrong and shows that I still didn't fully get it
> yet.  I have flip-flopped again and currently think that PEP 383 is
> useless for this use case and that my original plan [1] is still the
> way to go.  Please let me know if you spot a flaw in my plan or a
> ridiculousity in my requirements, or if you see a way that PEP 383 can
> help me.

If I were designing a new system such as this, I'd probably just go  
for utf8b *always*. That is, set the filesystem encoding to utf-8b.  
The end. All files always keep the same bytes transferring between  
unix systems. Thus, for the 99% of the world that uses either windows  
or a utf-8 locale, they get useful filenames inside tahoe. The other  
1% of the world that uses something like latin-1, EUC_JP, etc. on  
their local system sees mojibake filenames in tahoe, but will see the  
same filename that they put in when they take it back out.

Gnome already uses only utf-8 for filename displays for a few years  
now, for example, so this isn't exactly an unheard-of position to  

But if you don't do that, then, I still don't see what purpose your  
requirements serve. If I have two systems: one with a UTF-8 locale,  
and one with a Latin-1 locale, why should transmitting filenames from  
system 1 to system 2 through tahoe preserve the raw bytes, but doing  
the reverse *not* preserve the raw bytes? (all byte-sequences are  
valid in latin-1, remember, so they'll all decode into unicode without  
error, and then be reencoded in utf-8...). This seems rather a useless  
behavior to me.
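
The asymmetry is easy to see with the standard codecs. The sketch assumes nothing beyond Python's built-in 'latin-1' and 'utf-8' codecs; the sample filename bytes are made up.

```python
# On a Latin-1 system every byte sequence is a "valid" filename...
every_byte = bytes(range(256))
as_text = every_byte.decode("latin-1")      # never raises

# ...so the name is sent as text and re-encoded on a UTF-8 target,
# which changes the raw bytes for everything above 0x7F.
reencoded = as_text.encode("utf-8")

# On a UTF-8 system the same kind of bytes can fail to decode, which is
# the only case where the raw bytes would be passed through untouched.
try:
    b"Mot\xf6rhead".decode("utf-8")
    utf8_accepts = True
except UnicodeDecodeError:
    utf8_accepts = False
```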


From alexander.belopolsky at  Sat May  2 04:46:00 2009
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Fri, 1 May 2009 22:46:00 -0400
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

> leaving just 'Rejected' and 'Replaced' to be disambiguated.

'X' or 'Z' for "Rejected"?  Looks like a perfect start for a bikeshed
discussion. :-)

From stephen at  Sat May  2 07:34:15 2009
From: stephen at (Stephen J. Turnbull)
Date: Sat, 02 May 2009 14:34:15 +0900
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

Barry Warsaw writes:
 > On May 1, 2009, at 5:55 PM, Michael Foord wrote:
 > > P for Proposal (to replace Active Proposal)? Every active PEP is a  
 > > proposal...
 > +1
 > Maybe even s/Active/Proposed/g ?

Shouldn't that be

    s/Active/Proposed/<g>

<duck />

From stephen at  Sat May  2 07:49:34 2009
From: stephen at (Stephen J. Turnbull)
Date: Sat, 02 May 2009 14:49:34 +0900
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

Alexander Belopolsky writes:
 > ..
 > > leaving just 'Rejected' and 'Replaced' to be disambiguated.
 > 'X' or 'Z' for "Rejected"?  Looks like a perfect start for a bikeshed
 > discussion. :-)

The Japanese contingent suggests O (UPPERCASE LATIN LETTER O) for
accepted and X for rejected.  (Actually these should be U+25EF and
U+00D7, respectively.)

From arfrever.fta at  Sat May  2 12:34:05 2009
From: arfrever.fta at (Arfrever Frehtes Taifersar Arahesis)
Date: Sat, 2 May 2009 12:34:05 +0200
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

2009-05-02 07:34:15 Stephen J. Turnbull wrote:
> Barry Warsaw writes:
>  > On May 1, 2009, at 5:55 PM, Michael Foord wrote:
>  > 
>  > > P for Proposal (to replace Active Proposal)? Every active PEP is a  
>  > > proposal...
>  > 
>  > +1
>  > 
>  > Maybe even s/Active/Proposed/g ?
> Shouldn't that be
>     s/Active/Proposed/<g>

From `info sed 'sed Programs' 'The "s" Command'`:

> The `s' Command
> ===============
>    The syntax of the `s' (as in substitute) command is
> `s/REGEXP/REPLACEMENT/FLAGS'.  The `/' characters may be uniformly
> replaced by any other single character within any given `s' command.
> The `/' character (or whatever other character is used in its stead)
> can appear in the REGEXP or REPLACEMENT only if it is preceded by a `\'
> character.
> ...
>    The `s' command can be followed by zero or more of the following FLAGS:
> `g'
>      Apply the replacement to _all_ matches to the REGEXP, not just the
>      first.
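
For readers more at home in Python than sed, a rough analogue of the flag: re.sub replaces every match by default (like `g`), and count=1 limits it to the first match (like no flag). The sample text is made up.

```python
import re

key = "A - Active proposal; only Informational or Process PEPs can be Active"

# Like s/Active/Proposed/ : only the first match is replaced.
first_only = re.sub("Active", "Proposed", key, count=1)

# Like s/Active/Proposed/g : all matches are replaced.
all_matches = re.sub("Active", "Proposed", key)
```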

Arfrever Frehtes Taifersar Arahesis

From aahz at  Sat May  2 14:34:04 2009
From: aahz at (Aahz)
Date: Sat, 2 May 2009 05:34:04 -0700
Subject: [Python-Dev] FWD: svn down?
Message-ID: <>

----- Forwarded message from "\"Martin v. Löwis\"" <martin at> -----

> Date: Sat, 02 May 2009 08:18:56 +0200
> From: "\"Martin v. Löwis\"" <martin at>
> To: Aahz <aahz at>
> CC: pydotorg at
> Subject: Re: [Pydotorg] FWD: [Python-Dev] svn down?
>> Benjamin Peterson reports being unable to ssh to dinsdale
> I have rebooted the machine; it seems now to be working again.
> Regards,
> Martin

----- End forwarded message -----

Aahz (aahz at           <*>

"Typing is cheap.  Thinking is expensive."  --Roy Smith

From google at  Sat May  2 16:12:07 2009
From: google at (MRAB)
Date: Sat, 02 May 2009 15:12:07 +0100
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>	
Message-ID: <>

Alexander Belopolsky wrote:
> ..
>> leaving just 'Rejected' and 'Replaced' to be disambiguated.
> 'X' or 'Z' for "Rejected"?  Looks like a perfect start for a bikeshed
> discussion. :-)
Are there Unicode codepoints for smilies? I'm thinking of :-) for
'Accepted' and :-( for 'Rejected'. :-)

From ajaksu at  Sat May  2 17:11:49 2009
From: ajaksu at (Daniel Diniz)
Date: Sat, 2 May 2009 12:11:49 -0300
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <>
Message-ID: <>

MRAB wrote:
> Are there Unicode codepoints for smilies? I'm thinking of :-) for
> 'Accepted' and :-( for 'Rejected'. :-)

Yes there are, but we'd need to set the font size to 'humongous' to
see the smilies: ☹ ☺.

In py3k: print(chr(0x2639), chr(0x263a))
In trunk: print(unichr(0x2639), unichr(0x263a))
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smilies.png
Type: image/png
Size: 3574 bytes
Desc: not available
URL: <>

From ijmorlan at  Sat May  2 17:04:22 2009
From: ijmorlan at (Isaac Morland)
Date: Sat, 2 May 2009 11:04:22 -0400 (EDT)
Subject: [Python-Dev] Oddity PEP 0 key
In-Reply-To: <>
References: <> 
Message-ID: <>

On Sat, 2 May 2009, MRAB wrote:

> Alexander Belopolsky wrote:
>> ..
>>> leaving just 'Rejected' and 'Replaced' to be disambiguated.
>> 'X' or 'Z' for "Rejected"?  Looks like a perfect start for a bikeshed
>> discussion. :-)
> Are there Unicode codepoints for smilies? I'm thinking of :-) for
> 'Accepted' and :-( for 'Rejected'. :-)


Also, U+2694 CROSSED SWORDS for "vehement discussion on mailing list", 
U+2696 SCALES for "BDFL is considering", and U+267B BLACK UNIVERSAL 
RECYCLING SYMBOL for "proposal previously rejected is being re-proposed 
due to changed circumstances".

For code don't forget great math operator symbols like U+2264 
LESS-THAN OR EQUAL TO and U+222A UNION.  But I doubt if anybody would want 
to bake in an absolute requirement for Unicode support in order to be able 
to read or write Python code.
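
Code points like these can be checked against the Unicode database shipped with Python. A small sketch using the standard unicodedata module; the list holds the smiley code points printed earlier in the thread plus four of the symbols named above.

```python
import unicodedata

CODE_POINTS = [0x2639, 0x263A, 0x2694, 0x2696, 0x2264, 0x222A]

# Map "U+XXXX" labels to the official character names.
names = {f"U+{cp:04X}": unicodedata.name(chr(cp)) for cp in CODE_POINTS}

for label, name in names.items():
    print(label, name)
```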

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

From benjamin at  Sat May  2 20:41:51 2009
From: benjamin at (Benjamin Peterson)
Date: Sat, 2 May 2009 13:41:51 -0500
Subject: [Python-Dev] yield from?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/1 Guido van Rossum <guido at>:
> Alas, I haven't been following it either recently. Too bad, really,
> because before I left (now three weeks ago) it was already pretty
> close. We could perhaps even check in Greg's patch (which I tried and
> looked like a solid implementation of his proposal at the time) and
> finagle it for b2. One problem though is that Greg's code is based on
> 2.6...

I don't believe the compiler has changed between 2.6 and the trunk, so
a patch against the trunk would probably not be too hard. I volunteer
to review it if it is produced.


From g.brandl at  Sat May  2 21:01:28 2009
From: g.brandl at (Georg Brandl)
Date: Sat, 02 May 2009 21:01:28 +0200
Subject: [Python-Dev] multi-with statement
Message-ID: <gti5aj$ao$>


this is just a short notice that Mattias Brändström and I have finished a
patch to implement the previously discussed and mostly warmly welcomed
extension to with's syntax, allowing

   with A() as a, B() as b:

to be written instead of

   with A() as a:
       with B() as b:

This syntax was chosen (over "with A(), B() as a, b:") because it has more
syntactical similarity to the written-out version.  Also, our current uses
of "as" all have only one expression on the right.

The patch implements it as a simple AST transformation, which guarantees
semantic equivalence.  It is at <>.
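
The claimed equivalence can be checked on any interpreter that accepts the combined form (it eventually shipped in Python 2.7 and 3.1). This is a minimal sketch; `tracked` is a hypothetical context manager that only records the order of events.

```python
from contextlib import contextmanager


@contextmanager
def tracked(name, log):
    """Hypothetical context manager that records when it is entered and exited."""
    log.append("enter " + name)
    try:
        yield name
    finally:
        log.append("exit " + name)


def run_combined():
    """Use the single-statement form with two 'as' clauses."""
    log = []
    with tracked("A", log) as a, tracked("B", log) as b:
        log.append("body %s %s" % (a, b))
    return log


def run_nested():
    """Use the written-out form with two nested statements."""
    log = []
    with tracked("A", log) as a:
        with tracked("B", log) as b:
            log.append("body %s %s" % (a, b))
    return log


print(run_combined() == run_nested())
```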

If there is no strong opposition, I will commit it and port it to py3k
before 3.1 enters beta stage.


Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From fredrik.johansson at  Sat May  2 21:26:14 2009
From: fredrik.johansson at (Fredrik Johansson)
Date: Sat, 2 May 2009 21:26:14 +0200
Subject: [Python-Dev] multi-with statement
In-Reply-To: <gti5aj$ao$>
References: <gti5aj$ao$>
Message-ID: <>

On Sat, May 2, 2009 at 9:01 PM, Georg Brandl <g.brandl at> wrote:
> Hi,
> this is just a short notice that Mattias Brändström and I have finished a
> patch to implement the previously discussed and mostly warmly welcomed
> extension to with's syntax, allowing
>   with A() as a, B() as b:
> to be written instead of
>   with A() as a:
>       with B() as b:
> This syntax was chosen (over "with A(), B() as a, b:") because it has more
> syntactical similarity to the written-out version.  Also, our current uses
> of "as" all have only one expression on the right.
> The patch implements it as a simple AST transformation, which guarantees
> semantic equivalence.  It is at <>.
> If there is no strong opposition, I will commit it and port it to py3k
> before 3.1 enters beta stage.
> cheers,
> Georg

I was hoping for the other syntax in order to be able to create a
nested context in advance as a simple tuple:

with A, B:
    pass

context = A, B
with context:
    pass

(I.e. a tuple, or perhaps any iterable, would be a valid context manager.)

With the syntax in the patch, I will still have to implement a custom
nesting context manager to do this, which sort of defeats the purpose.


From aleaxit at  Sat May  2 21:44:06 2009
From: aleaxit at (Alex Martelli)
Date: Sat, 2 May 2009 12:44:06 -0700
Subject: [Python-Dev] multi-with statement
In-Reply-To: <>
References: <gti5aj$ao$>
Message-ID: <>

FWIW, I prefer Fredrik's wish too.

On Sat, May 2, 2009 at 12:26 PM, Fredrik Johansson <
fredrik.johansson at> wrote:

> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl <g.brandl at> wrote:
> > Hi,
> >
> > this is just a short notice that Mattias Brändström and I have finished a
> > patch to implement the previously discussed and mostly warmly welcomed
> > extension to with's syntax, allowing
> >
> >   with A() as a, B() as b:
> >
> > to be written instead of
> >
> >   with A() as a:
> >       with B() as b:
> >
> > This syntax was chosen (over "with A(), B() as a, b:") because it has
> more
> > syntactical similarity to the written-out version.  Also, our current
> uses
> > of "as" all have only one expression on the right.
> >
> > The patch implements it as a simple AST transformation, which guarantees
> > semantic equivalence.  It is at <>.
> >
> > If there is no strong opposition, I will commit it and port it to py3k
> > before 3.1 enters beta stage.
> >
> > cheers,
> > Georg
> I was hoping for the other syntax in order to be able to create a
> nested context in advance as a simple tuple:
> with A, B:
>    pass
> context = A, B
> with context:
>    pass
> (I.e. a tuple, or perhaps any iterable, would be a valid context manager.)
> With the syntax in the patch, I will still have to implement a custom
> nesting context manager to do this, which sort of defeats the purpose.
> Fredrik

From solipsis at  Sat May  2 21:45:47 2009
From: solipsis at (Antoine Pitrou)
Date: Sat, 2 May 2009 19:45:47 +0000 (UTC)
Subject: [Python-Dev] CVE-2008-5983 "untrusted python modules search path"
Message-ID: <>


I don't think this has already been posted to the list; apologies if it has.

Some Linux tools and vendors have been hit by an alleged "security hole" where
an embedded Python interpreter will prepend the current working directory to
sys.path as soon as PySys_SetArgv() is called by the embedding application. This
means, for example, that a Python file in the working directory can break
plugins or extensions written for that application if the Python file happens to
shadow another module.

Regardless of whether this is a security hole or not, it certainly can make
things disturbingly surprising when the situation arises. In the bug report
(, I suggested we add a new function
PySys_SetArgvEx() which would take an additional parameter telling whether to
touch sys.path or not (in the same spirit as Py_InitializeEx() providing a more
flexible API than Py_Initialize()).
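
The shadowing effect itself can be illustrated from pure Python, without touching the C API under discussion. The sketch below does not call PySys_SetArgv(); it only shows what happens to module resolution once a directory containing a same-named file sits first on the search path. The helper names and the choice of `colorsys` as the victim module are assumptions made for the demonstration.

```python
import importlib.machinery
import os
import sys
import tempfile


def resolve(module_name, search_path):
    """Return the file a module name would be loaded from for a given search path."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    return None if spec is None else spec.origin


def demo_shadowing(module_name="colorsys"):
    """Show how a directory placed first on the path shadows a stdlib module."""
    normal = resolve(module_name, sys.path)
    with tempfile.TemporaryDirectory() as workdir:
        # A stray file in the "working directory" named like a real module.
        with open(os.path.join(workdir, module_name + ".py"), "w") as handle:
            handle.write("# impostor\n")
        shadowed = resolve(module_name, [workdir] + sys.path)
        from_workdir = os.path.samefile(os.path.dirname(shadowed), workdir)
    return normal, shadowed, from_workdir


normal, shadowed, from_workdir = demo_shadowing()
print(normal)
print(shadowed, from_workdir)
```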

On the other hand, I don't think we can change the default behaviour of
PySys_SetArgv(), since there are probably tools and applications relying on it
(the obvious use case which comes to my mind is a third-party interactive

Any opinions?



From g.brandl at  Sat May  2 22:12:10 2009
From: g.brandl at (Georg Brandl)
Date: Sat, 02 May 2009 22:12:10 +0200
Subject: [Python-Dev] multi-with statement
In-Reply-To: <>
References: <gti5aj$ao$>
Message-ID: <gti9f5$a45$>

Fredrik Johansson schrieb:
> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl <g.brandl at> wrote:
>> Hi,
>> this is just a short notice that Mattias Brändström and I have finished a
>> patch to implement the previously discussed and mostly warmly welcomed
>> extension to with's syntax, allowing
>>   with A() as a, B() as b:
>> to be written instead of
>>   with A() as a:
>>       with B() as b:

> I was hoping for the other syntax in order to be able to create a
> nested context in advance as a simple tuple:
> with A, B:
>     pass
> context = A, B
> with context:
>     pass
> (I.e. a tuple, or perhaps any iterable, would be a valid context manager.)

I see; you want to construct your context manager programmatically and pass
it to "with" without knowing what is in there.

While this would be possible, we have to be aware that with this we would
effectively change the context manager protocol, rather like the iterator
protocol's __getitem__ alternate realization.  This muddies the definition
of a context manager.
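
For readers unfamiliar with the comparison: the iterator protocol has a fallback in which an object with no __iter__ method is still iterable if it defines __getitem__ and raises IndexError at the end. A minimal sketch, with `Squares` as a made-up example class:

```python
class Squares:
    """Iterable only through the __getitem__ fallback: no __iter__ is defined."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised.
        if index >= self.count:
            raise IndexError(index)
        return index * index


print(list(Squares(4)))
```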

(The interesting thing is that you could already implement *that* version
without any new syntactic support, by giving tuples an __enter__/__exit__
method pair.)

> With the syntax in the patch, I will still have to implement a custom
> nesting context manager to do this, which sort of defeats the purpose.

Not really.  Having an unknown number of stacked context managers is not
the purpose -- for that, I'd still say a custom nesting context manager
is better, because it is also more explicit when created not at the "with"
site.  (You could even write it as a tuple subclass, if you like the tuple


Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From rdmurray at  Sun May  3 00:33:15 2009
From: rdmurray at (R. David Murray)
Date: Sat, 2 May 2009 18:33:15 -0400 (EDT)
Subject: [Python-Dev] multi-with statement
In-Reply-To: <gti9f5$a45$>
References: <gti5aj$ao$>
Message-ID: <>

On Sat, 2 May 2009 at 22:12, Georg Brandl wrote:
> I see; you want to construct your context manager programmatically and pass
> it to "with" without knowing what is in there.
> While this would be possible, we have to be aware that with this we would
> effectively change the context manager protocol, rather like the iterator
> protocol's __getitem__ alternate realization.  This muddies the definition
> of a context manager.
> (The interesting thing is that you could already implement *that* version
> without any new syntactic support, by giving tuples an __enter__/__exit__
> method pair.)
>> With the syntax in the patch, I will still have to implement a custom
>> nesting context manager to do this, which sort of defeats the purpose.
> Not really.  Having an unknown number of stacked context managers is not
> the purpose -- for that, I'd still say a custom nesting context manager
> is better, because it is also more explicit when created not at the "with"
> site.  (You could even write it as a tuple subclass, if you like the tuple
> interface.)

As I understand it, the primary problem the patch Georg is talking
about solves is the fact that currently if you pass multiple contexts
to contextlib.nested, and one of the later items in the argument list
throws an error, the earlier context manager(s) do not get cleaned up
properly.  This patch solves that problem very neatly.
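
The difference can be reproduced with a small experiment. The sketch below assumes a hypothetical `Resource` that is acquired in its constructor (as open() does), and uses a stand-in `nested` helper built on ExitStack, since contextlib.nested no longer exists in current Python 3; it is not the historical implementation.

```python
from contextlib import ExitStack, contextmanager


class Resource:
    """Hypothetical resource acquired in __init__ and released in __exit__."""

    live = set()

    def __init__(self, name, fail=False):
        if fail:
            raise RuntimeError("cannot acquire " + name)
        self.name = name
        Resource.live.add(name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        Resource.live.discard(self.name)
        return False


@contextmanager
def nested(*managers):
    """Stand-in for a nested()-style helper that takes ready-made managers."""
    with ExitStack() as stack:
        yield tuple(stack.enter_context(manager) for manager in managers)


def leak_with_helper():
    Resource.live.clear()
    try:
        # Both arguments are evaluated before nested() runs, so "a" is
        # acquired and then abandoned when constructing "b" fails.
        with nested(Resource("a"), Resource("b", fail=True)):
            pass
    except RuntimeError:
        pass
    return set(Resource.live)


def no_leak_with_statement():
    Resource.live.clear()
    try:
        # "a" has already been entered when "b" fails, so its __exit__ runs.
        with Resource("a") as a, Resource("b", fail=True) as b:
            pass
    except RuntimeError:
        pass
    return set(Resource.live)


print(leak_with_helper(), no_leak_with_statement())
```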

I'm +1 on the patch, including preferring the syntax over the alternative.

Georg, maybe you should post the link to the python-ideas discussion?


From ben+python at  Sun May  3 01:54:38 2009
From: ben+python at (Ben Finney)
Date: Sun, 03 May 2009 09:54:38 +1000
Subject: [Python-Dev] Oddity PEP 0 key
References: <>
Message-ID: <>

Arfrever Frehtes Taifersar Arahesis <arfrever.fta at> writes:

> 2009-05-02 07:34:15 Stephen J. Turnbull wrote:
> > Barry Warsaw writes:
> >  > Maybe even s/Active/Proposed/g ?
> > 
> > Shouldn't that be
> > 
> >     s/Active/Proposed/<g>
> No. 
> From `info sed 'sed Programs' 'The "s" Command'`:

Stephen was, I suspect, feeling a little frisky when he wrote that, and
attempted a joke (the shortcut “<g>” is often used in this forum for
“insert a silly grin here”).

Knowing him, I grade the joke “4 out of 10, could do better”.

 \      “Think for yourselves and let others enjoy the privilege to do |
  `\                          so too.” –Voltaire, _Essay On Tolerance_ |
_o__)                                                                  |
Ben Finney

From zookog at  Sun May  3 06:33:54 2009
From: zookog at (Zooko O'Whielacronx)
Date: Sat, 2 May 2009 22:33:54 -0600
Subject: [Python-Dev] PEP 383 and GUI libraries
In-Reply-To: <>
References: <> <>
Message-ID: <>

[cross-posting to python-dev and tahoe-dev]

On Fri, May 1, 2009 at 8:12 PM, James Y Knight <foom at> wrote:
> If I were designing a new system such as this, I'd probably just go for
> utf8b *always*.

Ah, this would be a very tempting possibility -- abandon all unix
users who are slow to embrace our utf-8b future!

However, it is moot because Tahoe is not a new system. It is currently
at v1.4.1, has a strong policy of backwards-compatibility, and already
has lots of data, lots of users, and programmers building on top of
it. It currently uses utf-8 for its internal storage (note: nothing to
do with reading or writing files from external sources -- only for
storing filenames in the decentralized storage system which is
accessed by Tahoe clients), and we can't start putting non-utf-8-valid
sequences in the "filename" slot because other Tahoe clients would
then get a UnicodeDecodeError exception when trying to read those

We *could* create a new metadata entry to hold things other than
utf-8. Current Tahoe clients would never look at that entry (the
metadata is a JSON-serialized dictionary, so we can add a new key name
into it without disturbing the existing clients), but future Tahoe
clients could look for that new key. That is where it is possible that
future versions of Tahoe might be able to benefit from utf-8b or PEP
383, although what PEP 383 offers for this use case remains unclear to

> But if you don't do that, then, I still don't see what purpose your
> requirements serve. If I have two systems: one with a UTF-8 locale, and one
> with a Latin-1 locale, why should transmitting filenames from system 1 to
> system 2 through tahoe preserve the raw bytes, but doing the reverse *not*
> preserve the raw bytes? (all byte-sequences are valid in latin-1, remember,
> so they'll all decode into unicode without error, and then be reencoded in
> utf-8...). This seems rather a useless behavior to me.

I see I'm not explaining the Tahoe requirements clearly. It's probably
that I'm not understanding them clearly myself. Hopefully the
following will help.

There are two different things stored in Tahoe for each directory
entry: the filename and the metadata.

Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system
and then you inspect the files in the Tahoe filesystem, such as by
examining the web interface [1] or by running "tahoe ls", either of
which you could do either from the same machine where you ran "tahoe
cp" or from a different machine (which could be using any operating
system). We have the following requirements about what ends up in your
Tahoe directory after that cp -r.

Requirement 1 (unicode):  Each filename that you see needs to be valid
unicode (it is stored internally in utf-8). This eliminates utf-8b and
PEP 383 from being directly applicable to the filename part, although
perhaps they could be useful for the metadata part (about which more

Requirement 2 (faithful if unicode):  For each filename (byte string)
in your myfiles directory, if that bytestring is the valid encoding of
some string in your stated locale, then the resulting filename in
Tahoe is that (unicode) string. Nobody ever doesn't want this, right?
Well, maybe some people don't want this sometimes, because it could be
that the locale was wrong for this byte string and the resulting
successfully-decoded unicode name is gibberish. This is especially
acute if the locale is an 8-bit encoding such as latin-1 or
windows-1252. However, what's the alternative?  Guessing that their
locale shouldn't be set to latin-1 and instead decoding their bytes
some other way?  It seems like we're not going to do better than
requirement 2 (faithful if unicode).

Requirement 3 (no file left behind):  For each filename (byte string)
in your myfiles directory, whether or not that byte string is the
valid encoding of anything in your stated locale, then that file will
be added into the Tahoe filesystem under *some* name (a good candidate
would be mojibake, e.g. decode the bytes with latin-1, but that is not
the only possibility). I have heard some developers say that they
don't want to support this requirement and would rather tell the users
to fix their filenames before they can back up or share those files
through Tahoe. On the other hand, users have said that they require
this and they are not going to go mucking about with all their
filenames just so that they can use my backup and filesharing tool.

Now already we can say that these three requirements mean that there
can be collisions -- for example a directory could have two entries,
one of which is not a valid encoding in the locale, and whatever
unicode string we invent to name it with in order to satisfy
requirements 3 (no file left behind) and 1 (unicode) might happen to
be the same as the (correctly-encoded) name of the other file.
Therefore these three requirements imply that we have to detect such
collisions and deal with them somehow. (Thanks to Martin v. Löwis for
reminding me of this.)

Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
"round-tripping"): Suppose you have a directory with some files with
Japanese names, encoded using shift-jis, and some files with Russian
names, encoded using koi8-r. Suppose your locale is set to shift-jis,
and then you do "tahoe cp -r myfiles/ tahoe:". Then suppose you or
someone else does "tahoe cp -r tahoe: copy_of_myfiles/". The
"round-tripping" feature is that the files with Russian names that did
not accidentally decode cleanly with shift-jis still have the same
bytes in their names as they did in the original myfiles directory.

As I write this, I am becoming skeptical of this (faithful bytes if
not unicode, a.k.a. "round-tripping"), thanks in part to criticism
from James Knight, MvL, Thomas Breuel, and others. One reason to be
skeptical is that about a third of the Russian files will happen to
decode cleanly as shift-jis anyway, and will therefore come out as
something entirely different if the target filesystem's encoding is
something other than shift-jis. But an even worse problem -- the
show-stopper for me -- is that I don't want what Tahoe shows when you
do "tahoe ls" or view it in a web browser to differ from what it
writes out when you do "tahoe cp -r tahoe: newfiles/". So I'm ready to
reject this one.

Now about the "metadata" part which is separate from the filename
itself. I have another requirement:

Requirement 5 (no loss of information):  I don't want Tahoe to destroy
information -- every transformation should be (in principle)
reversible by some future computer-augmented archaeologist. For
example, if a bytestring decodes cleanly with the locale's suggested
encoding, and we use the resulting unicode as the filename, then we
also store the original byte string in the metadata since we don't
know if the locale's suggested encoding was good. This allows the
later invention of a tool which shows the user what the filename would
have been with other encodings and let the user choose one that makes
sense. It is important to note that this does not impose any
requirement on the *filename* itself -- all such information can be
stored in the metadata.
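
The kind of tool described here is easy to prototype once the original bytes are kept. A minimal sketch, assuming a hypothetical helper `candidate_names` and an arbitrary list of encodings to try:

```python
def candidate_names(original_bytes, encodings):
    """Return {encoding: decoded name} for every encoding that decodes cleanly."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = original_bytes.decode(encoding, "strict")
        except UnicodeDecodeError:
            pass  # Not a valid byte sequence in this encoding; skip it.
    return results


raw = "тест".encode("koi8-r")
for encoding, name in candidate_names(raw, ["utf-8", "koi8-r", "latin-1"]).items():
    print(encoding, repr(name))
```

An 8-bit encoding such as latin-1 always produces a candidate, which is exactly why a clean decode alone says little about whether the guess was right.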

Okay, in light of the above four requirements and the rejection of #4,
I hereby propose to change from the previous Tahoe design [2] to the

To copy an entry from a local filesystem into Tahoe:

1. On Windows or Mac read the filename with the unicode APIs.
Normalize the string with filename = unicodedata.normalize('NFC',
filename). Leave the "original_bytes" key and the "failed_decode" flag
out of the metadata.

2. On Linux or Solaris read the filename with the string APIs, and
store the result in the "original_bytes" part of the metadata. Call
sys.getfilesystemencoding() to get an alleged_encoding. Then, call
bytes.decode(alleged_encoding, 'strict') to try to get a unicode

2.a. If this decoding succeeds then normalize the unicode filename
with filename = unicodedata.normalize('NFC', filename), store the
resulting filename and leave the "failed_decode" flag out of the

2.b. If this decoding fails, then we decode it again with
bytes.decode('latin-1', 'strict'). Do not normalize it. Store the
resulting unicode object into the "filename" part, set the
"failed_decode" flag to True. This is mojibake!

3. (handling collisions)  In either case 2.a or 2.b the resulting
unicode string may already be present in the directory. If so, check
the failed_decode flags on the current entry and the new entry. If
they are both set or both unset then the new entry overwrites the old
entry -- they had the same name. If the failed_decode flags differ
then this is a case of collision -- the old entry and the new entry
had (as far as we are concerned) different names that accidentally
generated the same unicode. Alter the new entry's name, for example by
appending "~1" and then trying again and incrementing the number until
it doesn't match any extant entry.
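
Steps 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not Tahoe code: the directory is modelled as a plain dict from name to metadata, and `decode_filename` and `add_entry` are hypothetical helper names.

```python
import unicodedata


def decode_filename(raw, alleged_encoding):
    """Steps 2, 2.a and 2.b: return (unicode name, metadata) for a byte filename."""
    metadata = {"original_bytes": raw}
    try:
        name = raw.decode(alleged_encoding, "strict")
    except UnicodeDecodeError:
        # 2.b: fall back to latin-1, which accepts every byte; do not normalize.
        name = raw.decode("latin-1", "strict")
        metadata["failed_decode"] = True
    else:
        # 2.a: clean decode, so normalize and leave failed_decode out.
        name = unicodedata.normalize("NFC", name)
    return name, metadata


def add_entry(directory, name, metadata):
    """Step 3: store the entry, renaming it when the failed_decode flags differ."""
    if name in directory:
        old_flag = directory[name].get("failed_decode", False)
        new_flag = metadata.get("failed_decode", False)
        if old_flag != new_flag:
            # Collision between different names: append ~1, ~2, ... until free.
            counter = 1
            while "%s~%d" % (name, counter) in directory:
                counter += 1
            name = "%s~%d" % (name, counter)
    directory[name] = metadata  # Same flags: the new entry overwrites the old.
    return name


directory = {}
good_name, good_meta = decode_filename("é".encode("utf-8"), "utf-8")
bad_name, bad_meta = decode_filename(b"\xe9", "utf-8")
print(add_entry(directory, good_name, good_meta))
print(add_entry(directory, bad_name, bad_meta))
```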

To copy an entry from Tahoe into a local filesystem:

Always use the Python unicode API. The original_bytes field and the
failed_decode field in the metadata are not consulted.

Now a question for python-dev people: could utf-8b or PEP 383 be
useful for requirements like the four requirements listed above?  If
not, what requirements does PEP 383 help with?  I'm sure that it can
help with the use case of "I'm doing os.listdir() and then I'm going
to turn around and use the resulting unicode objects on the same local
filesystem in the same Python process". I'm not sure that it can help
if you are going to store the results of your os.listdir()
persistently or if you are going to transmit them over a network.
Indeed, using the results that way could lead to unpleasant surprises.
Does that sound right to you?  Perhaps this could be documented
somehow to help other programmers along the way.
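
One concrete form of the surprise: a string produced by the PEP 383 handler is not encodable as strict UTF-8, so it cannot simply be written to a UTF-8 store or sent over a UTF-8 protocol. The sketch assumes the handler name "surrogateescape", which is the name it finally shipped under in Python 3.1 (the thread still calls it "python-escape" or "utf8b"); `is_storable_utf8` is a hypothetical helper.

```python
def is_storable_utf8(name):
    """True if the string can be encoded as strict UTF-8 (no lone surrogates)."""
    try:
        name.encode("utf-8", "strict")
    except UnicodeEncodeError:
        return False
    return True


raw = b"caf\xe9"  # Latin-1 bytes, invalid as UTF-8
escaped = raw.decode("utf-8", "surrogateescape")
print(repr(escaped), is_storable_utf8(escaped))
```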

Thanks very much for your help, everyone.




From greg.ewing at  Sun May  3 09:47:17 2009
From: greg.ewing at (Greg Ewing)
Date: Sun, 03 May 2009 19:47:17 +1200
Subject: [Python-Dev] yield from?
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> What's the status of yield from? There's still a small window open for
> a patch to be checked into 3.1's branch. I haven't been following the
> python-ideas threads, so I'm not sure if it's ready yet.

The PEP itself seems to have settled down, and is
awaiting a verdict from Guido.

The prototype implementation doesn't quite match
the PEP in some of the fine details yet. Also
it's for 2.6 rather than 3.x; someone with more
knowledge of 3.x internals would be better placed
than me to convert it.


From martin at  Sun May  3 10:17:04 2009
From: martin at ("Martin v. Löwis")
Date: Sun, 03 May 2009 10:17:04 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
Message-ID: <>

With issue 3672 resolved, it is now unnecessary to introduce
an utf-8b codec, since the utf-8 codec will properly report errors
for all byte sequences invalid in UTF-8, including lone surrogates.
Therefore, utf-8b can be implemented solely through the error handler.

Glenn Linderman suggested that the name "python-escape" is not very
descriptive, so I've changed the name to "utf8b".

I've updated the PEP accordingly.
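
For reference, the behaviour of the handler can be tried out as below. The sketch assumes the name "surrogateescape", under which the handler was eventually released in Python 3.1 rather than "utf8b"; `roundtrip` is a hypothetical helper.

```python
def roundtrip(raw, encoding="utf-8"):
    """Decode bytes with the PEP 383 handler and encode them back unchanged."""
    text = raw.decode(encoding, "surrogateescape")
    return text, text.encode(encoding, "surrogateescape")


raw = "файл".encode("koi8-r")  # not valid UTF-8
text, back = roundtrip(raw)
print(ascii(text), back == raw)
```

Each undecodable byte 0xXX becomes the lone surrogate U+DCXX, and encoding with the same handler turns it back into the original byte.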


From stephen at  Sun May  3 11:32:38 2009
From: stephen at (Stephen J. Turnbull)
Date: Sun, 03 May 2009 18:32:38 +0900
Subject: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries]
In-Reply-To: <>
References: <> <>
Message-ID: <>

Zooko O'Whielacronx writes:

 > However, it is moot because Tahoe is not a new system. It is currently
 > at v1.4.1, has a strong policy of backwards-compatibility, and already
 > has lots of data, lots of users, and programmers building on top of
 > it.


Question: is there a way to negotiate versions, or better yet, features?

 > I see I'm not explaining the Tahoe requirements clearly. It's probably
 > that I'm not understanding them clearly myself.

Well, it's a high-dimensional problem.  Keeping track of all the
variables is hard.  That's why something like PEP 383 can be important
to you even though it's only a partial solution; it eliminates one

 > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux system
 > and then you inspect the files in the Tahoe filesystem, such as by
 > examining the web interface [1] or by running "tahoe ls", either of
 > which you could do either from the same machine where you ran "tahoe
 > cp" or from a different machine (which could be using any operating
 > system). We have the following requirements about what ends up in your
 > Tahoe directory after that cp -r.

Whoa! Slow down!  Where's "my" "Tahoe directory"?  Do you mean the
directory listing?  A copy to whatever system I'm on?  The bytes that
the Tahoe host has just loaded into a network card buffer to tell me
about it?  The bytes on disk at the Tahoe host?  You'll find it a lot
easier to explain things if you adopt a precise, consistent terminology.

 > Requirement 1 (unicode):  Each filename that you see needs to be valid
 > unicode

What does "see" mean?  In directory listings?  Under what
circumstances, if any, can what I see be different from what I get?

 > Requirement 2 (faithful if unicode):  For each filename (byte string)
 > in your myfiles directory,

My local myfiles directory, or my Tahoe myfiles directory?

 > if that bytestring is the valid encoding of some string in your
 > stated locale,

Who stated the locale?  How?  Are you referring to what
getfilesystemencoding returns?  This is a "(unicode) string", right?

 > then the resulting filename in Tahoe is that (unicode)
 > string. Nobody ever doesn't want this, right?  Well, maybe some
 > people don't want this sometimes, [...]. However, what's the
 > alternative?  Guessing that their locale shouldn't be set to
 > latin-1 and instead decoding their bytes some other way?

Sure.  Emacsen do that, you know.  Of course it's hard to guess
something else if ISO-8859/1 is the preferred encoding, but it does
happen.  This probably cannot be done accurately enough for Tahoe,

 > It seems like we're not going to do better than
 > requirement 2 (faithful if unicode).
 > Requirement 3 (no file left behind):  For each filename (byte string)
 > in your myfiles directory, whether or not that byte string is the
 > valid encoding of anything in your stated locale, then that file will
 > be added into the Tahoe filesystem under *some* name (a good candidate
 > would be mojibake, e.g. decode the bytes with latin-1, but that is not
 > the only possibility).

That's not even a possibility, actually.  Technically, Latin-1 has a
"hole" from U+0080 to U+009F.  You need to add the C1 controls to fill
in that gap.  (I don't think it actually matters in practice,
everybody seems to implement ISO-8859/1 as though it contained the
control characters ... except when detecting encodings ... but it pays
to be precise in these things ....)
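
Python's own codec is one of the implementations that fills the gap: its "latin-1" codec maps every byte value to the code point with the same number, C1 controls included. A quick check, with `c1_range` as a hypothetical helper:

```python
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")


def c1_range(text):
    """Return the characters that fall in the C1 control range U+0080..U+009F."""
    return [ch for ch in text if 0x80 <= ord(ch) <= 0x9F]


print(len(decoded), len(c1_range(decoded)))
```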

 > Now already we can say that these three requirements mean that there
 > can be collisions -- for example a directory could have two entries,
 > one of which is not a valid encoding in the locale, and whatever
 > unicode string we invent to name it with in order to satisfy
 > requirements 3 (no file left behind) and 1 (unicode) might happen to
 > be the same as the (correctly-encoded) name of the other file.

This is false with rather high probability, but you need some extra
structure to deal with it.  First, claim the Unicode private planes
for Tahoe.  Then allocate characters from the private planes on demand
as encountered, *including* such characters encountered in external
file names to be stored in Tahoe *and* the surrogates used by PEP
383.  "Display names" using these private characters would be valid
Unicode, but not very useful.  However, an algorithmically generated
font (like the 4-hex-digit-square used to give a glyph to unknown code
points in the BMP) could be used by those who care.

Also store mappings from (system encoding, UTF-8b representation) to
private char and back.  For simplicity, that could be global on your
server (IIRC, there are at least two private planes up there, so you'd
need to run into almost 128Ki *unique* such characters to run out).

I guess you'd be subject to a DOS attack where somebody decided to map
all of 80000-odd CNS characters into private space, and then write
80000 files, each with a different 1-character name ....

Note that Martin does *not* do this in PEP 383 because PEP 383 only
cares about the semantics that a filename read from a directory can be
used to access the file associated with it in that directory.  For
that, a private, non-Unicode encoding is perfectly acceptable.  But
you want valid Unicode.  This scheme gives it to you.

The registry of characters is somewhat unpleasant, but it does allow
you to detect filenames that are the same reliably.
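
A very rough sketch of such a registry follows. It covers only the part that replaces undecodable bytes (obtained through the PEP 383 handler, assumed here under its released name "surrogateescape") with characters from Supplementary Private Use Area-A; it does not remap private-use characters already present in incoming names, and it assumes a stateless encoding such as UTF-8. `PrivateRegistry` and its methods are hypothetical names.

```python
class PrivateRegistry:
    """Hypothetical registry mapping (encoding, undecodable byte) to private-use characters."""

    FIRST = 0xF0000  # start of Supplementary Private Use Area-A
    LAST = 0xFFFFD   # last assignable code point in that plane

    def __init__(self):
        self._to_char = {}
        self._from_char = {}

    def _allocate(self, key):
        """Return the private character for key, allocating one on first use."""
        if key not in self._to_char:
            code = self.FIRST + len(self._to_char)
            if code > self.LAST:
                raise OverflowError("private-use plane exhausted")
            self._to_char[key] = chr(code)
            self._from_char[chr(code)] = key
        return self._to_char[key]

    def decode(self, raw, encoding):
        """Decode bytes to valid Unicode, replacing undecodable bytes with private chars."""
        text = raw.decode(encoding, "surrogateescape")
        return "".join(
            self._allocate((encoding, ord(ch) - 0xDC00))
            if 0xDC80 <= ord(ch) <= 0xDCFF else ch
            for ch in text
        )

    def encode(self, text, encoding):
        """Reverse decode(): registered private chars become their original bytes."""
        out = bytearray()
        for ch in text:
            if ch in self._from_char:
                out.append(self._from_char[ch][1])
            else:
                out.extend(ch.encode(encoding, "strict"))
        return bytes(out)


registry = PrivateRegistry()
name = registry.decode(b"caf\xe9", "utf-8")
print(ascii(name), registry.encode(name, "utf-8"))
```

Because the mapping is stable, two byte strings that differ only in an undecodable byte get different display names, and equal ones get equal names.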

 > Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
 > "round-tripping"):

PEP 383 gives you this, but you must store the encoding used for each
such file name.

 > One reason to be skeptical is that about a third of the Russian
 > files will happen to decode cleanly as shift-jis anyway, and will
 > therefore come out as something entirely different if the target
 > filesystem's encoding is something other than shift-jis.

The only way to handle this is to store the encoding used to convert
to Unicode as part of *every* file's metadata.  This could be also
used in Tahoe to warn the user that the current system encoding does
not match the alleged_encoding used to make the backup.  Some users
might prefer to use the alleged_encoding on restore.

 > But an even worse problem -- the show-stopper for me -- is that I
 > don't want what Tahoe shows when you do "tahoe ls" or view it in a
 > web browser to differ from what it writes out when you do "tahoe cp
 > -r tahoe: newfiles/".

But as a requirement, that's incoherent.  What you are "seeing" is
Unicode, what it will write out is bytes.  That means that if multiple
locales are in use on both the backup and restore systems, and the
nominal system encodings are different, people whose personal default
locales are not the same as the system's will see what they expect on
the backup system (using system ls), mojibake on Tahoe (using tahoe
ls), and *different* mojibake on the restore system (system ls,

Note that "use Tahoe, not system, ls" doesn't help at all (unless the
weirdo has learned to read mojibake, which actually does happen, but
it's not worth betting on).

How likely is that?  Hate to tell you this: if you need the "unknown
bytes" scheme at all, this scenario is *extremely* likely.  How do you
think that KOI8-R got into a directory on a Shift-JIS system in the
first place?  Yup, a Russian visiting professor in Tokyo who set his
personal locale to ru_RU.KOI8-R wrote it there.  And he's very likely
to have the same personal locale on a very up-to-date system with a
UTF-8 system encoding when he gets back to Moscow.  Bingo! it's
mojibake all the way to Moscow.

 > Now about the "metadata" part which is separate from the filename
 > itself. I have another requirement:
 > Requirement 5 (no loss of information):  I don't want Tahoe to destroy
 > information -- every transformation should be (in principle)
 > reversible by some future computer-augmented archaeologist. For
 > example, if a bytestring decodes cleanly with the locale's suggested
 > encoding, and we use the resulting unicode as the filename, then we
 > also store the original byte string in the metadata since we don't
 > know if the locale's suggested encoding was good.

UTF-8b would be just as good for storing the original bytestring, as
long as you keep the original encoding.  It's actually probably
preferable if PEP 383 can be assumed to be implemented in the versions
of Python you use.

 > This allows the later invention of a tool

It will be called "Emacs", by the way.<wink>

 > which shows the user what the filename would
 > have been with other encodings and let the user choose one that makes
 > sense.

 > To copy an entry from a local filesystem into Tahoe:
 > 1. On Windows or Mac read the filename with the unicode APIs.
 > Normalize the string with filename = unicodedata.normalize('NFC',
 > filename). Leave the "original_bytes" key and the "failed_decode" flag
 > out of the metadata.

NFD is probably better for fuzzy matching and display on legacy

 > 2. On Linux or Solaris read the filename with the string APIs, and
 > store the result in the "original_bytes" part of the metadata. Call
 > sys.getfilesystemencoding() to get an alleged_encoding. Then, call
 > bytes.decode(alleged_encoding, 'strict') to try to get a unicode
 > object.
 > 2.a. If this decoding succeeds then normalize the unicode filename
 > with filename = unicodedata.normalize('NFC', filename), store the
 > resulting filename and leave the "failed_decode" flag out of the
 > metadata.

Per the koi8-lucky example, you don't know if it succeeded for the
right reason or the wrong reason.  You really should store the
alleged_encoding used in the metadata, always.
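
A minimal sketch of why a clean decode proves little, assuming a name
written under KOI8-R on a system whose alleged encoding is cp1251.  Both
encodings and all variable names are chosen here purely for
illustration:

```python
# A Cyrillic name as a user with a KOI8-R personal locale would write it.
name = "привет"
raw = name.encode("koi8-r")

# Decoding with the wrong (alleged) encoding raises no error at all...
alleged_encoding = "cp1251"
decoded = raw.decode(alleged_encoding, "strict")

# ...but the result is mojibake, not the original name.
is_mojibake = decoded != name

# Because the alleged encoding was recorded, the original bytes can be
# rebuilt later and reinterpreted with a better guess.
recovered = decoded.encode(alleged_encoding).decode("koi8-r")
```

Without the recorded alleged_encoding, the last step would require
guessing which wrong codec produced the stored name.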

Note that you should *also* store the failed_decode flag, because the
presence of multiple failed_decode flags is a very strong indication that
some of the users had default encoding != system encoding.  If you use
the scheme I propose above, of course you have the same information
by scanning the file name for Tahoe-only private use characters, but
that would be relatively expensive.

 > 2.b. If this decoding fails, then we decode it again with
 > bytes.decode('latin-1', 'strict'). Do not normalize it. Store the
 > resulting unicode object into the "filename" part, set the
 > "failed_decode" flag to True. This is mojibake!

Not necessarily.  Most ISO-8859/X names will fail to decode if the
alleged_encoding is UTF-8, for example, but many (even for X != 1)
will be correctly readable because of the policy of trying to share
code points across Latin-X encodings.  Certainly ISO-8859/1 (and
much ISO-8859/15) will be correct.
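
Steps 2, 2.a and 2.b, plus the suggestion to always record the alleged
encoding, can be sketched roughly as follows.  The function name and
the dict layout are assumptions made for this example, not Tahoe's
actual code:

```python
import unicodedata


def import_filename(raw, alleged_encoding):
    """Build a hypothetical metadata entry for one POSIX filename."""
    entry = {
        "original_bytes": raw,
        "alleged_encoding": alleged_encoding,  # stored always
    }
    try:
        name = raw.decode(alleged_encoding, "strict")
    except UnicodeDecodeError:
        # Step 2.b: latin-1 maps every byte to a code point, so this
        # cannot fail; the result is not normalized.
        entry["filename"] = raw.decode("latin-1", "strict")
        entry["failed_decode"] = True
    else:
        # Step 2.a: the decode succeeded, so normalize to NFC.
        entry["filename"] = unicodedata.normalize("NFC", name)
        entry["failed_decode"] = False
    return entry


# An ISO-8859-1 name on a system that claims UTF-8: the fallback is
# taken, yet the resulting name is perfectly readable.
latin1_entry = import_filename(b"caf\xe9", "utf-8")

# A valid UTF-8 name in decomposed form is normalized to NFC.
utf8_entry = import_filename(b"cafe\xcc\x81", "utf-8")
```

The first call shows the point made above: failed_decode is set, but
the stored name is not mojibake.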

 > 3. (handling collisions)  In either case 2.a or 2.b the resulting
 > unicode string may already be present in the directory. If so, check
 > the failed_decode flags on the current entry and the new entry. If
 > they are both set or both unset then the new entry overwrites the old
 > entry -- they had the same name.

If both are set, you're OK, because you are forcing ISO-8859/1.  If
both are unset, however, you don't know for sure because
alleged_encoding is not necessarily a constant.
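
A small illustration of the "both unset" case, assuming two users whose
alleged encodings differ (the byte strings are made up for the
example):

```python
# Two different on-disk names, written under two different locales.
latin1_bytes = b"\xe9"        # stored by a user with a Latin-1 locale
utf8_bytes = b"\xc3\xa9"      # stored by a user with a UTF-8 locale

# Each decodes cleanly under its own alleged encoding, so neither entry
# gets the failed_decode flag.
name_a = latin1_bytes.decode("latin-1", "strict")
name_b = utf8_bytes.decode("utf-8", "strict")

same_name = name_a == name_b              # the unicode names collide
same_bytes = latin1_bytes == utf8_bytes   # the original bytes do not
```

Whether the two entries really are "the same name" therefore depends on
information that the flag alone does not carry.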

 > To copy an entry from Tahoe into a local filesystem:
 > Always use the Python unicode API. The original_bytes field and the
 > failed_decode field in the metadata are not consulted.
 > Now a question for python-dev people: could utf-8b or PEP 383 be
 > useful for requirements like the four requirements listed above?  If
 > not, what requirements does PEP 383 help with?

By giving you a standard, invertible way to represent anything that
the OS can throw at you, it helps with all of them.
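
A sketch of that invertibility.  It uses the error handler under the
name it has in released Python 3, "surrogateescape" (this thread still
calls it utf8b); the sample bytes are made up:

```python
# A filename whose bytes are not valid UTF-8 (a lone 0xE9).
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# A strict encoder refuses the escaped name, which is the concern when
# such names are stored or sent over a network as plain UTF-8.
try:
    name.encode("utf-8", "strict")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```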

 > I'm not sure that it can help if you are going to store the results
 > of your os.listdir() persistently or if you are going to transmit
 > them over a network.  Indeed, using the results that way could lead
 > to unpleasant surprises.

No more than any other system for giving a canonical Unicode spelling
to the results of an OS call.

From l.mastrodomenico at  Sun May  3 15:29:27 2009
From: l.mastrodomenico at (Lino Mastrodomenico)
Date: Sun, 3 May 2009 15:29:27 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/3 "Martin v. Löwis" <martin at>:
> With issue 3672 resolved, it is now unnecessary to introduce
> an utf-8b codec, since the utf-8 codec will properly report errors
> for all byte sequences invalid in UTF-8, including lone surrogates.
> Therefore, utf-8b can be implemented solely through the error handler.

That's even nicer. One minor detail though, in the sentence:

    "non-decodable bytes >128 will be represented as lone half surrogate"

">" should be ">=".

Lino Mastrodomenico

From solipsis at  Sun May  3 15:43:06 2009
From: solipsis at (Antoine Pitrou)
Date: Sun, 3 May 2009 13:43:06 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> Glenn Linderman suggested that the name "python-escape" is not very
> descriptive, so I've changed the name to "utf8b".

If the error handler is supposed to be used for codecs other than utf-8,
perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Also, if utf8-b is not provided as a codec, will there be an easy way for user
code to use the same encoding as the IO layer does? (e.g.
os.fsdecode/os.fsencode)?

From ncoghlan at  Sun May  3 17:09:47 2009
From: ncoghlan at (Nick Coghlan)
Date: Mon, 4 May 2009 01:09:47 +1000
Subject: [Python-Dev] multi-with statement
In-Reply-To: <gti9f5$a45$>
References: <gti5aj$ao$>
Message-ID: <>

(I still don't really have net access back after moving house - just  
chiming in briefly via my mobile)

Anyway, I think there is one very good reason for NOT defining a multi- 
with statement in terms of an existing tuple: it gains us nothing  
except speed over contextlib.nested. The whole point of the new  
syntactic support is to execute each expression inside the context of  
the preceding managers. That requirement precludes the idea of using  
an intermediate tuple, since every expression would have to be  
evaluated before the tuple could be created.
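
The ordering requirement can be observed directly in a Python version
that has the multi-with syntax.  This is only an illustrative sketch;
the helper names managed and make are invented for it:

```python
from contextlib import contextmanager

events = []


@contextmanager
def managed(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


def make(name):
    # Records the moment the context expression itself is evaluated.
    events.append("create " + name)
    return managed(name)


# Multi-with: the second expression is evaluated only after the first
# manager has been entered.
with make("a") as a, make("b") as b:
    events.append("body")
statement_order = list(events)

# Building a tuple first evaluates every expression before any manager
# could be entered.
del events[:]
pair = (make("c"), make("d"))
tuple_order = list(events)
```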

I'm still not 100% convinced the saving in indentation levels due to  
this change would be worth the increase in complexity and ambiguity  

Nick Coghlan, Brisbane, Australia

On 03/05/2009, at 6:12 AM, Georg Brandl <g.brandl at> wrote:

> Fredrik Johansson schrieb:
>> On Sat, May 2, 2009 at 9:01 PM, Georg Brandl <g.brandl at>  
>> wrote:
>>> Hi,
>>> this is just a short notice that Mattias Brändström and I have
>>> finished a
>>> patch to implement the previously discussed and mostly warmly  
>>> welcomed
>>> extension to with's syntax, allowing
>>>  with A() as a, B() as b:
>>> to be written instead of
>>>  with A() as a:
>>>      with B() as b:
>> I was hoping for the other syntax in order to be able to create a
>> nested context in advance as a simple tuple:
>> with A, B:
>>    pass
>> context = A, B
>> with context:
>>    pass
>> (I.e. a tuple, or perhaps any iterable, would be a valid context  
>> manager.)
> I see; you want to construct your context manager programmatically  
> and pass
> it to "with" without knowing what is in there.
> While this would be possible, we have to be aware that with this we  
> would
> effectively change the context manager protocol, rather like the  
> iterator
> protocol's __getitem__ alternate realization.  This muddies the  
> definition
> of a context manager.
> (The interesting thing is that you could already implement *that*  
> version
> without any new syntactic support, by giving tuples an __enter__/ 
> __exit__
> method pair.)
>> With the syntax in the patch, I will still have to implement a custom
>> nesting context manager to do this, which sort of defeats the  
>> purpose.
> Not really.  Having an unknown number of stacked context managers is  
> not
> the purpose -- for that, I'd still say a custom nesting context  
> manager
> is better, because it is also more explicit when created not at the  
> "with"
> site.  (You could even write it as a tuple subclass, if you like the  
> tuple
> interface.)
> Georg
> -- 
> Thus spake the Lord: Thou shalt indent with four spaces. No more, no  
> less.
> Four shall be the number of spaces thou shalt indent, and the number  
> of thy
> indenting shall be four. Eight shalt thou not indent, nor either  
> indent thou
> two, excepting that thou then proceed to four. Tabs are right out.
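
The explicit nesting manager written as a tuple subclass, as suggested
in the quoted text, might look roughly like this.  The class name
NestedContext is hypothetical, and contextlib.ExitStack is a later
standard-library addition that did not exist when this thread was
written:

```python
from contextlib import ExitStack, contextmanager


class NestedContext(tuple):
    """Hypothetical tuple subclass that enters its items in order."""

    def __enter__(self):
        with ExitStack() as stack:
            values = tuple(stack.enter_context(cm) for cm in self)
            # Entering succeeded for every item: keep the callbacks
            # alive until __exit__ instead of unwinding them now.
            self._exit_stack = stack.pop_all()
        return values

    def __exit__(self, exc_type, exc, tb):
        # Unwinds in reverse order, like nested with statements.
        return self._exit_stack.__exit__(exc_type, exc, tb)


log = []


@contextmanager
def managed(name):
    log.append("enter " + name)
    try:
        yield name
    finally:
        log.append("exit " + name)


context = NestedContext((managed("a"), managed("b")))
with context as (a, b):
    log.append("body")
```

Note that both managers are created before either is entered, which is
exactly the limitation discussed at the top of this message.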

From murman at  Sun May  3 17:35:16 2009
From: murman at (Michael Urman)
Date: Sun, 3 May 2009 10:35:16 -0500
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 3, 2009 at 08:43, Antoine Pitrou <solipsis at> wrote:
> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does? (e.g.
> os.fsdecode/os.fsencode)?

I like the idea of fsencode/fsdecode functions, but we need to be
careful deciding what they accept and produce on Windows. I'd expect
them to be identity functions, but then the difference in platform
behavior suggests perhaps they should be in os.path.

Unicode to Unicode on Windows would further mean fsencode wouldn't be
useful for sending filenames over sockets, and "utf8" will be prone to
exceptions on the very names we're trying to support right now. Is
there an advantage to not providing the "utf8b" behavior as a
registered codec?

Michael Urman

From martin at  Sun May  3 19:32:47 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 03 May 2009 19:32:47 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

> That's even nicer. One minor detail though, in the sentence:
>     "non-decodable bytes >128 will be represented as lone half surrogate"
> ">" should be ">=".

Thanks, fixed.


From martin at  Sun May  3 19:39:41 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 03 May 2009 19:39:41 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

> If the error handler is supposed to be used for codecs other than utf-8,
> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?

Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
it's an algorithm based on 16-bit or 32-bit code points.

> Also, if utf8-b is not provided as a codec, will there be an easy way for user
> code to use the same encoding as the IO layer does? 

s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in
fact, that's exactly what the IO layer does).
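
In a released Python 3 the same idea can be sketched as below.  Two
assumptions: the handler is spelled "surrogateescape" there, and the
system is POSIX-like (Windows treats filenames differently).  The
sample name is invented:

```python
import sys

fs_encoding = sys.getfilesystemencoding()

# A byte string as a POSIX directory listing might return it; 0xFF is
# never valid in UTF-8 or ASCII.
raw_name = b"report-\xff.txt"

# bytes -> str, escaping whatever the filesystem encoding cannot decode
text_name = raw_name.decode(fs_encoding, "surrogateescape")

# str -> bytes, undoing the escape
round_tripped = text_name.encode(fs_encoding, "surrogateescape")
```

Later Python versions also provide os.fsencode() and os.fsdecode(),
the helpers asked about earlier in this thread.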


From greg at  Sun May  3 21:20:07 2009
From: greg at (Gregory P. Smith)
Date: Sun, 3 May 2009 12:20:07 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 3, 2009 at 10:39 AM, "Martin v. Löwis" <martin at> wrote:

> > If the error handler is supposed to be used for codecs other than utf-8,
> > perhaps it should renamed something more generic, e.g.
> "surrogate-escape"?
> Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> it's an algorithm based on 16-bit or 32-bit code points.

To me that lack of relationship with utf8 suggests that it should not be
called utf8b...  But I don't have any good suggestions.

> > Also, if utf8-b is not provided as a codec, will there be an easy way for
> user
> > code to use the same encoding as the IO layer does?
> s.encode(sys.getfilesystemencoding(), "utf8b") will do just that (in
> fact, that's exactly what the IO layer does).
> Regards,
> Martin

From martin at  Sun May  3 22:27:59 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 03 May 2009 22:27:59 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	
Message-ID: <>

>     > If the error handler is supposed to be used for codecs other than
>     utf-8,
>     > perhaps it should renamed something more generic, e.g.
>     "surrogate-escape"?
>     Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
>     it's an algorithm based on 16-bit or 32-bit code points.
> To me that lack of relationship with utf8 suggests that it should not be
> called utf8b

Perhaps. However, giving it that name was Markus Kuhn's choice - and
while it may be confusing, it's (IMO) useful to be consistent with this


From greg at  Sun May  3 23:11:51 2009
From: greg at (Gregory P. Smith)
Date: Sun, 3 May 2009 14:11:51 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 3, 2009 at 1:27 PM, "Martin v. Löwis" <martin at> wrote:

> >     > If the error handler is supposed to be used for codecs other than
> >     utf-8,
> >     > perhaps it should renamed something more generic, e.g.
> >     "surrogate-escape"?
> >
> >     Perhaps. However, utf-8b doesn't really have to do anything with
> utf-8 -
> >     it's an algorithm based on 16-bit or 32-bit code points.
> >
> >
> > To me that lack of relationship with utf8 suggests that it should not be
> > called utf8b
> Perhaps. However, giving it that name was Markus Kuhn's choice - and
> while it may be confusing, it's (IMO) useful to be consistent with this
> background.
> Regards,
> Martin
Ah, right.  My original searches for utf8b didn't turn up much but searching
on his name turns some up.  Good choice of name then.

From benjamin at  Mon May  4 00:50:29 2009
From: benjamin at (Benjamin Peterson)
Date: Sun, 3 May 2009 17:50:29 -0500
Subject: [Python-Dev] yield from?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/3 Greg Ewing <greg.ewing at>:
> Benjamin Peterson wrote:
>> What's the status of yield from? There's still a small window open for
>> a patch to be checked into 3.1's branch. I haven't been following the
>> python-ideas threads, so I'm not sure if it's ready yet.
> The PEP itself seems to have settled down, and is
> awaiting a verdict from Guido.

Guido is now on vacation until the 18th, so I think this will have to
be deferred until 2.7/3.2.

> The prototype implementation doesn't quite match
> the PEP in some of the fine details yet. Also
> it's for 2.6 rather than 3.x; someone with more
> knowledge of 3.x internals would be better placed
> than me to convert it.


From jimjjewett at  Mon May  4 06:36:05 2009
From: jimjjewett at (Jim Jewett)
Date: Mon, 4 May 2009 00:36:05 -0400
Subject: [Python-Dev] PEP 383 and GUI libraries
Message-ID: <>

(sent only to python-dev, as I am not a subscriber of tahoe-dev)

Zooko wrote:

> [Tahoe] currently uses utf-8 for its internal storage (note: nothing to
> do with reading or writing files from external sources -- only for
> storing filenames in the decentralized storage system which is
> accessed by Tahoe clients), and we can't start putting non-utf-8-valid
> sequences in the "filename" slot because other Tahoe clients would
> then get a UnicodeDecodeError exception when trying to read those
> directories.

So what do you do when someone has an existing file whose name is
supposed to be in utf-8, but whose actual bytes are not valid utf-8?

If you have somehow solved that problem, then you're already done --
the PEP's encoding is a no-op on anything that isn't already invalid

If you have not solved that problem, then those clients will already
be getting a UnicodeDecodeError; all the PEP does is make it at least
possible for them to recover.


> Requirement 1 (unicode):  Each filename that you see needs to be valid
> unicode (it is stored internally in utf-8).

(repeating) What does Tahoe do if this is violated?  Do you throw an
exception right there and not let them copy the file to tahoe?  If so,
then that same error correction means that utf8b will never differ
from utf-8, and you have nothing to worry about.
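
A short check of the "no-op on valid input" claim, written against the
handler's released name "surrogateescape"; the sample names are
invented:

```python
# A name that is valid UTF-8 decodes identically with or without the
# escaping error handler.
valid = "naïve résumé.txt".encode("utf-8")
strict_name = valid.decode("utf-8", "strict")
escaped_name = valid.decode("utf-8", "surrogateescape")

# Distinct invalid bytes map to distinct lone surrogates, so two broken
# names cannot collapse into one.
invalid_a = b"\xe9".decode("utf-8", "surrogateescape")
invalid_b = b"\xe8".decode("utf-8", "surrogateescape")
```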

> Requirement 2 (faithful if unicode):

Doesn't the PEP meet this?

> Requirement 3 (no file left behind):

Doesn't the PEP also meet this?  I thought the concern was just that
the name used would not be valid unicode, unless the original name was
itself valid unicode.

> Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
> "round-tripping"):

Doesn't the PEP also support this?  (Only) the invalid bytes get
escaped and therefore must be unescaped, but the escapement is

> 3. (handling collisions)  In either case 2.a or 2.b the resulting
> unicode string may already be present in the directory.

This collision is what the use of half-surrogates (as the escape
characters) avoids.  Such collisions can't be present unless the data
was invalid unicode, in which case it was the result of an escapement
(unless something other than python is creating new invalid


From larry at  Mon May  4 11:10:51 2009
From: larry at (Larry Hastings)
Date: Mon, 04 May 2009 02:10:51 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
Message-ID: <>

I should have brought this up to python-dev before--sorry for being so 
slow.  It's been in the tracker for a couple of days:

The idea: PyGetSetDef has this "void *closure" field that acts like a 
context pointer.  You stick it in the PyGetSetDef, and it gets passed 
back to you when your getter or setter is called.  It's a reasonable API 
design, but in practice you almost never need it.  Meanwhile, it 
clutters up CPython, particularly typeobject.c; there are all these 
function calls that end with ", NULL);", just to satisfy the 
getter/setter prototype internally.

Most of the time, the "closure" parameter is not only unused, it is 
skipped.  PyGetSetDef definitions generally skip it, and often getter 
and setter implementations omit it.  The "closure" was only actually 
*used* once in CPython, a silly use in Objects/longobject.c where it was 
abused as an integer value.  And yes, I said "was": inspired by this 
discussion, Mark Dickinson removed this use in r72202 (trunk) and r72203 
(py3k).  So the "closure" field is now 100% unused in the python and 
py3k trunks.

Mr. Dickinson also located an extension using the "closure" pointer, 
pyephem, which... *also* uses it to store an integer.  Indeed, I have 
yet to see a use where someone stores a pointer in "closure".

Anyone who needed functionality like this could roll it themselves with 
stub functions:

    PyObject *my_getter_with_context(PyObject *self, void *context) {
      /* ... */
    }

    PyObject *my_getter_A(PyObject *self) {
      return my_getter_with_context(self, "A");
    }

    PyObject *my_getter_B(PyObject *self) {
      return my_getter_with_context(self, "B");
    }

    /* etc. */

(Although it'd make my example more realistic if "context" were an int!)

So: you don't need it, it clutters up our code (particularly 
typeobject.c), and it adds overhead.  The only good reason to keep it is 
backwards compatibility, which I admit is a fine reason.

Whaddya think?  To be honest I'd be surprised if you guys went for 
this.  But I thought it was worth suggesting.


From eric at  Mon May  4 13:37:33 2009
From: eric at (Eric Smith)
Date: Mon, 04 May 2009 07:37:33 -0400
Subject: [Python-Dev] Changing float.__format__
Message-ID: <>

In issue 5920, Mark Dickinson raises an issue having to do with 
float.__format__ and how it handles the default format presentation type 
(that is, none of 'f', 'g', or 'e') versus how str() works on floats:

I agree with him that the current behavior is confusing and should be 
changed. I'm going to make this change, unless anyone objects. Please 
comment on the issue itself if you have any feedback.
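
For reference, a sketch of the two code paths being compared.  The
results in the comments are what a current Python 3 interpreter
produces and may not match the trunk behaviour under discussion in
this message:

```python
x = 1.0

# No presentation type given: float.__format__ with an empty spec.
default_format = format(x, "")

# What str() produces for the same value.
as_str = str(x)

# An explicit 'g' presentation type drops the trailing ".0".
general_format = format(x, "g")
```

In current releases the first two agree ("1.0"), while "g" gives "1".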


From dickinsm at  Mon May  4 14:13:25 2009
From: dickinsm at (Mark Dickinson)
Date: Mon, 4 May 2009 13:13:25 +0100
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 10:10 AM, Larry Hastings <larry at> wrote:

> So: you don't need it, it clutters up our code (particularly typeobject.c),
> and it adds overhead.  The only good reason to keep it is backwards
> compatibility, which I admit is a fine reason.

Presumably whoever added the context field had a reason for doing so.
Does anyone remember what the intended use was?

Trawling through the history, all I could find was this comment,
attached to revision 23270: [Modified Thu Sep 20 21:45:26 2001
UTC (7 years, 7 months ago) by gvanrossum]

Add optional docstrings to getset descriptors.  Fortunately, there's
no backwards compatibility to worry about, so I just pushed the
'closure' struct member to the back -- it's never used in the current
code base (I may eliminate it, but that's more work because the getter
and setter signatures would have to change.)

Still, binary compatibility seems like a fairly strong reason not to
remove the closure field.


From gregor.lingl at  Mon May  4 16:33:58 2009
From: gregor.lingl at (Gregor Lingl)
Date: Mon, 04 May 2009 16:33:58 +0200
Subject: [Python-Dev] update for 3.1
Message-ID: <>


Encouraged by a conversation with Martin at PyCon 2009
I've prepared a version 1.1b of the turtle module and I'd like to
get some advice or assistance to get it into the beta as explained
below. Thus I'd appreciate very much if also the release manager
would take notice of this posting.

python 2.0 had the version 1.0 and for now I'll give a terse
summary of the changes I did:

1. a few bugfixes, with 1 - 5 lines of code changed for each;
    these concern bugs that prevented turtle from running correctly

2. I've added four methods to the class TurtleScreenBase:
    _onkeypress(fun, key)  (supplementing _onkeyrelease)
    mainloop()  (which is now a Screen-method and a function)
    textinput(title, prompt)
    numinput(title, prompt, default, minval, maxval)
           the latter two remedy the complete lack of input methods
    _onkey, an internal method name is changed to _onkeyrelease

3. I've added one method to the class TurtleScreen:
    onkeypress(fun, key=None) 
        implemented in analogy to the already present onkey()
        which got onkeyrelease as an alias.

4. I've changed several portions of the code that affect
   the representation of the turtleshape thus making it
   more compact (by removing some duplicated code) and more
   powerful, i. e. by adding the possibility to apply
   shearings to turtleshapes (in addition to the already present
   scaling and rotating transformations). Thus now the full
   range of (non singular) linear transformations is available.

   New methods in class RawTurtle:
    shearfactor(shear=None)    set or get the shearfactor
    shapetransform(t11, t12, t21, t22)
                    set or get the shape transform directly
    get_shapepoly() return the polygon of the current shape

   I've enhanced the functionality of tiltangle(angle=None)
   to contain also that of settiltangle and I propose to
   declare settiltangle as deprecated.
5. I've removed a lot of codelines that were commented out
   during the process of transferring the module from 2.6
   to 3.0

6. I've implemented the bugfix for
   according to my proposal there and I strongly
   recommend this change again, as the bug described is very
   annoying, the fix is easy and no one proposed a better

7. I've tested the present version 1.1 extensively. It runs
   all the demo scripts without problems and many others
   too (some of them significantly better than version 1.0).
   I'd like to add two additional scripts to the demo
   directory, one of them using new features so it only runs
   with this new version.

I've *not* touched the issue of the Screen singleton, so that
remains unchanged as it was as a result of Martin's patch.

Thus, as a summary, this update does some bugfixes and eliminates
three deficiencies of the module: (1) accept keypress event,
(2) provide user input functions and (3) complement scaling
and rotating of turtleshapes by shearing, thus providing
the full range of linear transforms.
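
To make "the full range of linear transforms" concrete, here is a
plain-Python sketch of applying a 2x2 matrix (t11, t12, t21, t22) to a
shape polygon.  It is not the turtle module's implementation; the
helper names and the shear convention (x shifted in proportion to y)
are assumptions for illustration:

```python
def apply_linear(points, t11, t12, t21, t22):
    """Apply the matrix [[t11, t12], [t21, t22]] to each (x, y)."""
    return [(t11 * x + t12 * y, t21 * x + t22 * y) for x, y in points]


def shear_x(points, factor):
    """Shear along x: a special case with matrix [[1, factor], [0, 1]]."""
    return apply_linear(points, 1.0, factor, 0.0, 1.0)


def determinant(t11, t12, t21, t22):
    """Non-zero exactly when the transform is non-singular."""
    return t11 * t22 - t12 * t21


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sheared = shear_x(square, 0.5)
shear_det = determinant(1.0, 0.5, 0.0, 1.0)
```

Scaling and rotation are other choices of the same four numbers, which
is why adding shearing completes the set of non-singular linear maps.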


(1) Submit the new version as a single file
(2) submit a unified diff containing all the changes
(3) Divide the changes into several chunks of
    related changes and submit the corresponding diffs separately
    That would pose the problem that there are lines
    in the code that are affected by several changes,
    e. g. those lines that define __all__
    And also: does the order of applying the patches matter?
    How do I have to account for this?
(4) Some other approach?

I'd appreciate to discuss open issues as needed and I'm
prepared to give more elaborate explanations and rationales
as wanted or as needed.

Docs for the changes are (to a large extent) contained in the
docstrings and I'm going to update the Documentation of the
turtle module (on the basis of these docstrings) now.

Thanks in advance for your support





From phd at  Mon May  4 17:07:49 2009
From: phd at (Oleg Broytmann)
Date: Mon, 4 May 2009 19:07:49 +0400
Subject: [Python-Dev] PyPI copyright
Message-ID: <>

"Copyright © 1990-2007, Python Software Foundation"


     Oleg Broytmann              phd at
           Programmers don't die, they just GOSUB without RETURN.

From mail at  Mon May  4 17:28:54 2009
From: mail at (Christian Schubert)
Date: Mon, 4 May 2009 17:28:54 +0200
Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python
Message-ID: <>


Python ships with a profiler module which, unfortunately, is almost useless in a multi-threaded environment. *

I've created an alternative profiler module which queries per-thread CPU usage via netlink/taskstats, which limits the applicability to Linux (which shouldn't be much of an issue, profiling is usually not done by end users). It implements two modes: a "sampling" (does CPU time accounting based on stack frames 100 times per second, by default) and a "deterministic" profiler (does CPU time accounting on each function call/return, based on sys.profiler interface). The deterministic profiler is currently implemented in pure python (except for taskstats interface) and much slower than the sampling profiler.
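
For readers unfamiliar with the deterministic approach, a minimal
sketch of hooking call events per thread with the standard
threading.setprofile() follows.  It only counts calls; it does no CPU
time accounting via taskstats, and it is not the code of the module
described here.  All function names are made up for the example:

```python
import collections
import threading


def count_calls_per_thread(target, thread_name="worker"):
    """Run target in a new thread and count Python-level calls."""
    counts = collections.Counter()
    lock = threading.Lock()

    def profile_hook(frame, event, arg):
        # "call" fires on entry to every Python function in the thread.
        if event == "call":
            key = (threading.current_thread().name, frame.f_code.co_name)
            with lock:
                counts[key] += 1

    # The hook is installed in every thread started from now on.
    threading.setprofile(profile_hook)
    try:
        worker = threading.Thread(target=target, name=thread_name)
        worker.start()
        worker.join()
    finally:
        threading.setprofile(None)
    return counts


def helper():
    return sum(range(100))


def work():
    for _ in range(3):
        helper()


counts = count_calls_per_thread(work)
```

A sampling profiler would instead inspect each thread's stack at fixed
intervals rather than intercepting every call.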

Usage (don't forget to run make to build the C module):
>> from Profiler import *
>> def f(): do_something()
>> sampling_profiler(f)
>> deterministic_profiler(f)

Output is currently in the form of annotated source code (, in the same directory where resides). Before the *_profiler function returns, it iterates over all code objects it encountered and annotates the source files with 2 columns in front:
- 1st column: real time
- 2nd column: CPU time

numbers are log2(time_in_ns), colors are green-to-yellow for below-average and yellow-to-red for above-average metrics (relative to the average metric for all lines of the code object with a metric > 0).

Is there common need for such a module? 

Is it possible to have this included in the standard CPython distribution?

Which functional changes (besides a modification of the annotation output which shouldn't spread its result all over the FS) would be required to get this included?

Which non-functional changes would be required to get this included?

Please direct traffic regarding this subject to pyprof-devel at (no I'm not subscribed to python-dev).

SF project page:

git repository:



*) to be more exact there are at least three profiler modules: profile, cProfile, and hotshot, while I did only try (and failed) to use profile in a multi-threaded environment (by manually setting threading.profile to the profiling function), glancing at the source, I'm pretty sure that cProfile behaves similarly; I didn't test the hotshot module, but it does some other trade-offs (space-for-time), so I think that "pyprof" still adds some value

From aahz at  Mon May  4 17:56:04 2009
From: aahz at (Aahz)
Date: Mon, 4 May 2009 08:56:04 -0700
Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 04, 2009, Christian Schubert wrote:
> Python ships with a profiler module which, unfortunately, is almost
> useless in a multi-threaded environment. *
> I've created an alternative profiler module which queries per-thread
> CPU usage via netlink/taskstats, which limits the applicability to
> Linux (which shouldn't be much of an issue, profiling is usually
> not done by end users). It implements two modes: a "sampling" (does
> CPU time accounting based on stack frames 100 times per second, by
> default) and a "deterministic" profiler (does CPU time accounting
> on each function call/return, based on sys.profiler interface). The
> deterministic profiler is currently implemented in pure python (except
> for taskstats interface) and much slower than the sampling profiler.

If you want to discuss this, please subscribe to python-ideas and repost
your message.  Generally speaking, in order to include modules like this,
they need to prove themselves over time and may require PEP approval.  If
you choose to move the discussion to python-ideas, it would help if you
mention known uses of your module.
Aahz (aahz at           <*>

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From fumanchu at  Mon May  4 18:15:24 2009
From: fumanchu at (Robert Brewer)
Date: Mon, 4 May 2009 09:15:24 -0700
Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python
In-Reply-To: <>
References: <>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6418B3B3@ex10.hostedexchange.local>

Christian Schubert wrote:
 > I've created an alternative profiler module which queries per-thread
 > CPU usage via netlink/taskstats, which limits the applicability to
 > Linux (which shouldn't be much of an issue, profiling is usually not
 > done by end users).

One of the uses for a profiling module is to compare runs on various 
platforms. And please, stop perpetuating the myth that only end-users 
use anything but Linux.

Robert Brewer
fumanchu at

From janssen at  Mon May  4 18:19:26 2009
From: janssen at (Bill Janssen)
Date: Mon, 4 May 2009 09:19:26 PDT
Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python
In-Reply-To: <>
References: <>
Message-ID: <>

Hi, Christian.

Christian Schubert <mail at> wrote:

> I've created an alternative profiler module which queries per-thread
> CPU usage via netlink/taskstats, which limits the applicability to
> Linux (which shouldn't be much of an issue, profiling is usually not
> done by end users).

A surprisingly large # of developers are running on OS X these days,
though.  I suggest making it work there, too.


From larry at  Mon May  4 19:08:12 2009
From: larry at (Larry Hastings)
Date: Mon, 04 May 2009 10:08:12 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Dickinson wrote:
> Still, binary compatibility seems like a fairly strong reason not to
> remove the closure field.

My understanding is that a) 2.x extension modules are not binary 
compatible with 3.x, and b) there are essentially no 3.x extension 
modules in the field.  Is that accurate?  If we don't have an installed 
base (yet) to worry about, now's the time to make this change.


From amauryfa at  Mon May  4 19:17:15 2009
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Mon, 4 May 2009 19:17:15 +0200
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>


Larry Hastings wrote:
> Mark Dickinson wrote:
>> Still, binary compatibility seems like a fairly strong reason not to
>> remove the closure field.
> My understanding is that there a) 2.x extension modules are not binary
> compatible with 3.x, and b) there are essentially no 3.x extension modules
> in the field.  Is that accurate?  If we don't have an installed base (yet)
> to worry about, now's the time to make this change.

cx_Oracle at least uses this closure field, and has already been ported to 3.x:

Amaury Forgeot d'Arc

From larry at  Mon May  4 21:04:55 2009
From: larry at (Larry Hastings)
Date: Mon, 04 May 2009 12:04:55 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>	
Message-ID: <>

Amaury Forgeot d'Arc wrote:
> Larry Hastings wrote:
>> My understanding is that a) 2.x extension modules are not binary
>> compatible with 3.x, and b) there are essentially no 3.x extension modules
>> in the field.  Is that accurate?  If we don't have an installed base (yet)
>> to worry about, now's the time to make this change.
> cx_Oracle at least uses this closure field, and has already been ported to 3.x:

And they're using it as a pointer, too!  Nice to see it not abused for once.

If it helps, I volunteer to port cx_Oracle to the new PyGetSetDef if my 
patch is accepted.  The resulting code would be backwards-compatible 
with Python 3.0, so it could be incorporated immediately.  Given the 
lack of interest in the proposal so far, this is an easy vow to make!


From daniel at  Mon May  4 21:11:06 2009
From: daniel at (Daniel Stutzbach)
Date: Mon, 4 May 2009 14:11:06 -0500
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 4:10 AM, Larry Hastings <larry at> wrote:

> So: you don't need it, it clutters up our code (particularly typeobject.c),
> and it adds overhead.  The only good reason to keep it is backwards
> compatibility, which I admit is a fine reason.

If you make the change, will 3rd party code that relies on it fail in
unexpected ways, or will they just get a compile error?

Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <>

From p.f.moore at  Mon May  4 21:52:02 2009
From: p.f.moore at (Paul Moore)
Date: Mon, 4 May 2009 20:52:02 +0100
Subject: [Python-Dev] RFC: Threading-Aware Profiler for Python
In-Reply-To: <>
References: <> <>
Message-ID: <>

2009/5/4 Bill Janssen <janssen at>:
> Hi, Christian.
> Christian Schubert <mail at> wrote:
>> I've created an alternative profiler module which queries per-thread
>> CPU usage via netlink/taskstats, which limits the applicability to
>> Linux (which shouldn't be much of an issue, profiling is usually not
>> done by end users).
> A surprisingly large # of developers are running on OS X these days,
> though.  I suggest making it work there, too.

And Windows. I doubt that the various Windows-specific modules
available were developed on Linux. And I wouldn't assume that all of
the platform-neutral modules are developed on Linux, or even that the
developers have access to Linux. (I know I don't, short of building a
brand new virtual machine...)


From dickinsm at  Mon May  4 22:00:23 2009
From: dickinsm at (Mark Dickinson)
Date: Mon, 4 May 2009 21:00:23 +0100
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 8:11 PM, Daniel Stutzbach
<daniel at> wrote:
> If you make the change, will 3rd party code that relies on it fail in
> unexpected ways, or will they just get a compile error?

I *think* that third party code that's recompiled for 3.1 and that
doesn't use the closure field will either just work, or will produce an
easily-fixed compile error.  Larry, does this sound right?

But I guess the bigger issue is that extensions already compiled against 3.0
that use PyGetSetDef (even if they don't make use of the closure field)
won't work with 3.1 without a recompile:  they'll segfault, or otherwise behave
unpredictably.

If that's not considered a problem, then surely we ought to be getting rid of
tp_reserved?

From daniel at  Mon May  4 22:07:50 2009
From: daniel at (Daniel Stutzbach)
Date: Mon, 4 May 2009 15:07:50 -0500
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 3:00 PM, Mark Dickinson <dickinsm at> wrote:

> But I guess the bigger issue is that extensions already compiled against
> 3.0
> that use PyGetSetDef (even if they don't make use of the closure field)
> won't work with 3.1 without a recompile:  they'll segfault, or otherwise
> behave
> unpredictably.

I was under the impression that binary compatibility was only guaranteed
within a minor revision (e.g., 2.6.1 must run code compiled for 2.6.0, but
2.7.0 doesn't have to).  I've been wrong before, though.

Certainly the C extension module I maintain is sprinkled with #ifdef's so it
will compile under 2.5, 2.6, and 3.0. ;-)

Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <>

From solipsis at  Mon May  4 22:15:21 2009
From: solipsis at (Antoine Pitrou)
Date: Mon, 4 May 2009 20:15:21 +0000 (UTC)
Subject: [Python-Dev]
References: <>
Message-ID: <>

Mark Dickinson <dickinsm <at>> writes:
> I *think* that third party code that's recompiled for 3.1 and that
> doesn't use the closure field will either just work, or will produce an
> easily-fixed compile error.  Larry, does this sound right?

This doesn't sound right. The functions in the third party code will get
compiled with the wrong signature, so they can crash (or behave unexpectedly)
when called by Python.

From dickinsm at  Mon May  4 22:18:20 2009
From: dickinsm at (Mark Dickinson)
Date: Mon, 4 May 2009 21:18:20 +0100
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 9:15 PM, Antoine Pitrou <solipsis at> wrote:
> Mark Dickinson <dickinsm <at>> writes:
>> I *think* that third party code that's recompiled for 3.1 and that
>> doesn't use the closure field will either just work, or will produce an
>> easily-fixed compile error.  Larry, does this sound right?
> This doesn't sound right. The functions in the third party code will get
> compiled with the wrong signature, so they can crash (or behave unexpectedly)
> when called by Python.

Yes, of course the signature of the getters and setters changes.  Please
ignore me. :-)


From larry at  Mon May  4 22:29:19 2009
From: larry at (Larry Hastings)
Date: Mon, 04 May 2009 13:29:19 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>	
Message-ID: <>

Mark Dickinson wrote:
> I *think* that third party code that's recompiled for 3.1 and that
> doesn't use the closure field will either just work, or will produce an
> easily-fixed compile error.  Larry, does this sound right?


> But I guess the bigger issue is that extensions already compiled against 3.0
> that use PyGetSetDef (even if they don't make use of the closure field)
> won't work with 3.1 without a recompile:  they'll segfault, or otherwise behave
> unpredictably.

Well, I think they'd work if they didn't use the closure and they had 
only one entry in their array of PyGetSetDefs.  But more than one, and 
yes it would behave unpredictably.  Probably segfault.
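
To make the array-stride problem concrete, here is a small ctypes sketch
(in Python rather than C, purely for illustration).  The two structures
are simplified stand-ins for the PyGetSetDef layout with and without the
closure member; the class names and the fake pointer values are invented
for this example, and every field is treated as an opaque pointer.

```python
import ctypes

class OldGetSetDef(ctypes.Structure):
    # Simplified stand-in for the 3.0 layout: five pointer-sized members.
    _fields_ = [(name, ctypes.c_void_p)
                for name in ("name", "get", "set", "doc", "closure")]

class NewGetSetDef(ctypes.Structure):
    # Hypothetical layout with the closure member removed.
    _fields_ = [(name, ctypes.c_void_p)
                for name in ("name", "get", "set", "doc")]

# A table of two entries as an extension built against the old layout
# would lay it out in memory.  The integers are fake pointer values.
old_table = (OldGetSetDef * 2)(
    OldGetSetDef(0x10, 0x11, 0x12, 0x13, 0x14),
    OldGetSetDef(0x20, 0x21, 0x22, 0x23, 0x24),
)

# Reinterpret the same memory using the shorter structure, which is what
# an interpreter expecting the new layout would effectively do.
new_view = (NewGetSetDef * 2).from_buffer(old_table)

first_name = new_view[0].name    # still lines up with old entry 0
second_name = new_view[1].name   # actually old entry 0's closure slot
second_get = new_view[1].get     # actually old entry 1's name slot
```

The first entry reads correctly because both layouts start with the same
four members; from the second entry on, every field is shifted by one
slot.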

> If that's not considered a problem, then surely we ought to be getting rid of
> tp_reserved?

In principle they are equivalent, but in practice removing tp_reserved 
is a much bigger change.  Removing the closure field would result in 
obvious compile errors, and plenty of folks wouldn't even experience 
those.  Removing tp_reserved would affect everybody, with inscrutable 
compiler errors.

Personally I'd be up for removing tp_reserved.  But I lack the caution 
regarding backwards compatibility that has served Python so well, so 
you're ill-advised to listen to me.

Daniel Stutzbach wrote:
> I was under the impression that binary compatibility was only 
> guaranteed within a minor revision (e.g., 2.6.1 must run code compiled 
> for 2.6.0, but 2.7.0 doesn't have to).  I've been wrong before, though.

My understanding is that that's the explicit guarantee.  However Python 
has been well-served by being much more cautious than that, a policy 
with which I cannot find fault.

> Certainly the C extension module I maintain is sprinkled with #ifdef's 
> so it will compile under 2.5, 2.6, and 3.0. ;-)

Happily this is one change where you could maintain backwards 
compatibility without #ifdefs.  If you use the closure field, change 
your code to use stub functions and pass the closure data in yourself.


From greg at  Tue May  5 00:42:15 2009
From: greg at (Gregory P. Smith)
Date: Mon, 4 May 2009 15:42:15 -0700
Subject: [Python-Dev] update for 3.1
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 4, 2009 at 7:33 AM, Gregor Lingl <gregor.lingl at> wrote:

> Hi,
> Encouraged by a conversation with Martin at PyCon 2009
> I've prepared a version 1.1b of the turtle module and I'd like to
> get some advice or assistance to get it into the beta as explained
> below. Thus I'd appreciate very much if also the release manager
> would take notice of this posting.
> python 2.0 had the version 1.0 and for now I'll give a terse
> summary of the changes I did:
> 1. a few bugfixes, with 1 - 5 lines of code changed for each;
>   these concern bugs that prevented turtle to run correctly
> 2. I've added four methods to the class TurtleScreenBase:
>   _onkeypress(fun, key)  (supplementing _onkeyrelease)
>   mainloop()  (which is now a Screen-method and a function)
>   textinput(title, prompt)
>   numinput(title, prompt, default, minval, maxval)
>          the latter two remedy the complete lack of input methods
>   _onkey, an internal method name is changed to _onkeyrelease
> 3. I've added one method to the class TurtleScreen:
>   onkeypress(fun, key=None)       implemented in analogy to the already
> present onkey()
>       which got onkeyrelease as an alias.
> 4. I've changed several portions of the code that affect
>  the representation of the turtleshape thus making it
>  more compact (by removing some duplicated code) and more
>  powerful, i. e. by adding the possibility to apply
>  shearings to turtleshapes (in addition to the already present
>  scaling and rotating transformations). Thus now the full
>  range of (non singular) linear transformations is available.
>  New methods in class RawTurtle:
>   shearfactor(shear=None)    set or get the shearfactor
>   shapetransform(t11, t12, t21, t22)
>                   set or get the shape transform directly
>   get_shapepoly() return the polygon of the current shape
>  I've enhanced the functionality of tiltangle(angle=None)
>  to contain also that of settiltangle and I propose to
>  declare settiltangle as deprecated.
>  5. I've removed a lot of codelines that were commented out
>  during the process of transferring the module from 2.6
>  to 3.0
> 6. I've implemented the bugfix for
>  according to my proposal there and I strongly
>  recommend this change again, as the bug described is very
>  annoying, the fix is easy and no one proposed a better
>  solution.
> 7. I've tested the present version 1.1 extensively. It runs
>  all the demo scripts without problems and many others
>  too (some of them significantly better than version 1.0).
>  I'd like to add two additional scripts to the demo
>  directory, one of them using new features so it only runs
>  with this new version.
> I've *not* touched the issue of the Screen singleton, so that
> remains unchanged as it was as a result of Martin's patch.
> Thus, as a summary, this update does some bugfixes and eliminates
> three deficiencies of the module: (1) accept keypress event,
> (2) provide user input functions and (3) complement scaling
> and rotating of turtleshapes by shearing, thus providing
> the full range of linear transforms.
> (1) Submit the new version as a single file
> (2) submit a unified diff containing all the changes
> (3) Divide the changes into several chunks of
>   related changes and submit the according diffs separately
>   That would pose the problems, that there are lines
>   in the code that are affected by several changes,
>   e. g. those lines that define __all__
>   And also: does the order of applying the patches matter?
>   How do I have to account for this?
> (4) Some other approach?

I'm happy with option #1.  If you find it reasonable to break things into
multiple changes, feel free to do it, but at this point the turtle module
hasn't had much love in ages so a large update in one commit is not a
problem IMHO.

> I'd appreciate to discuss open issues as needed and I'm
> prepared to give more elaborate explanations and rationales
> as wanted or as needed.
> Docs for the changes are (to a large extent) contained in the
> docstrings and I'm going to update the Documentation of the
> turtle module (on the basis of these docstrings) now.
> Thanks in advance for your support
> Gregor

From greg.ewing at  Tue May  5 02:27:36 2009
From: greg.ewing at (Greg Ewing)
Date: Tue, 05 May 2009 12:27:36 +1200
Subject: [Python-Dev] Building types programmatically (was: drop unnecessary
 "context" pointer from PyGetSetDef)
In-Reply-To: <>
References: <>
Message-ID: <>

Larry Hastings wrote:
> Removing tp_reserved would affect everybody, with inscrutable 
> compiler errors.

This would have to be considered in conjunction with the
proposed programmatic type-building API, I think.

I'd like to see a migration towards something like that,
BTW. Recently I had occasion to do some work on a Ruby
extension module, and I was struck by how much more
pleasant it was to be able to create a class and add a
few functions to it using calls, rather than having to
wrestle with a huge static struct declaration. While
I like the Python language better than Ruby, I think
Ruby's extension API is ahead in this particular area.


From zookog at  Tue May  5 05:36:50 2009
From: zookog at (Zooko O'Whielacronx)
Date: Mon, 4 May 2009 21:36:50 -0600
Subject: [Python-Dev] PEP 383 and Tahoe [was: GUI libraries]
In-Reply-To: <>
References: <>
Message-ID: <>

Thank you for sharing your extensive knowledge of these issues, SJT.

On Sun, May 3, 2009 at 3:32 AM, Stephen J. Turnbull <stephen at> wrote:
> Zooko O'Whielacronx writes:
>  > However, it is moot because Tahoe is not a new system. It is
>  > currently at v1.4.1, has a strong policy of backwards-
>  > compatibility, and already has lots of data, lots of users, and
>  > programmers building on top of it.
> Cool!

Thanks!  Actually yes it is extremely cool that it really does this
encryption, erasure-encoding, capability-based access control, and
decentralized topology all in a fully functional, stable system.  If
you're interested in such stuff then you should definitely check it

> Question: is there a way to negotiate versions, or better yet,
> features?

For the peer-to-peer protocol there is, but the persistent storage is
an inherently one-way communication.  A Tahoe client writes down
information, and at a later point a Tahoe client, possibly of a
different version, reads it.  There is no way for the original writer
to ask what versions or features the readers may eventually have.
But, the writer can write down optional information which will be
invisible to readers that don't know to look for it, by adding it
into the "metadata" dictionary.  For example:
renders the directory contents into json and results in this:

  "r\u00e9sum\u00e9.html": [
     "mutable": false,
     "metadata": {
      "ctime": 1241365319.0695441,
      "mtime": 1241365319.0695441
     "ro_uri": "URI:CHK:no2l46woyeri6xmhcrhhomgr5a:5p7cxw7ofacblmctmjtgmhi6jq7g5wf77tx6befn2rjsfpedzkia:3:10:8328",
     "size": 8328

A new version of Tahoe writing entries like this is constrained to
making the primary key (the filename) be a valid unicode string (if it
wants older Tahoe clients to be able to read the directory at all).
However, it is not constrained about what new keys it may add to the
"metadata" dict, which is where we propose to add the "failed_decode"
flag and the "original_bytes".
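
A rough sketch of that forward-compatibility idea, in Python.  The entry
layout here is simplified and is not Tahoe's real format; the helper
names are made up, and base64 is only an assumption about how the raw
bytes might be carried, since JSON cannot hold them directly.

```python
import base64
import json

def make_entry(size, ctime, mtime, original_bytes=None):
    """Build a simplified directory entry (hypothetical helper)."""
    metadata = {"ctime": ctime, "mtime": mtime}
    if original_bytes is not None:
        # New, optional keys: readers that don't know them skip them.
        metadata["failed_decode"] = True
        metadata["original_bytes"] = base64.b64encode(
            original_bytes).decode("ascii")
    return {"mutable": False, "metadata": metadata, "size": size}

def old_reader(entry):
    """An older reader that only knows about ctime and mtime."""
    meta = entry["metadata"]
    return meta["ctime"], meta["mtime"]

name = "r\ufffdsum\ufffd.html"   # valid unicode key, as required
directory = {
    name: make_entry(8328, 1241365319.0695441, 1241365319.0695441,
                     original_bytes=b"r\xe9sum\xe9.html"),
}

# Round-trip through JSON, as if written by one client, read by another.
entry = json.loads(json.dumps(directory))[name]
times_seen_by_old_reader = old_reader(entry)
recovered = base64.b64decode(entry["metadata"]["original_bytes"])
```

The older reader never touches the extra keys, while a newer one can
recover the bytes exactly.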

> Well, it's a high-dimensional problem.  Keeping track of all the
> variables is hard.

Well put.

>  That's why something like PEP 383 can be important
> to you even though it's only a partial solution; it eliminates one
> variable.

Would that it were so!  The possibility that PEP 383 could help me or
others like me is why I am trying so hard to explain what kind of help
I need.  :-)

>  > Suppose you have run "tahoe cp -r myfiles/ tahoe:" on a Linux
>  > system and then you inspect the files in the Tahoe filesystem,
>  > such as by examining the web interface [1] or by running
>  > "tahoe ls", either of which you could do either from the same
>  > machine where you ran "tahoe cp" or from a different machine
>  > (which could be using any operating system). We have the
>  > following requirements about what ends up in your Tahoe directory
>  > after that cp -r.
> Whoa! Slow down!  Where's "my" "Tahoe directory"?  Do you mean the
> directory listing?  A copy to whatever system I'm on?  The bytes that
> the Tahoe host has just loaded into a network card buffer to tell me
> about it?  The bytes on disk at the Tahoe host?  You'll find it a lot
> easier to explain things if you adopt a precise, consistent
> terminology.

Okay here's some more detail.

There exists a Tahoe directory, the bytes of which are encrypted,
erasure-coded, and spread out over multiple Tahoe servers.  (To the
servers it is utterly opaque, since it is encrypted with a symmetric
encryption key that they don't have.)  A Tahoe client has the
decryption key and it recovers the cleartext bytes.  (Note: the
internal storage format is not the json encoding shown above -- it is
a custom format -- the json format above is what is produced to be
exported through the API, and it serves as a useful example for e-mail
discussions.)  Then for each bytestring childname in the directory it
decodes it with utf-8 to get the unicode childname.

Does that all make sense?

>  > Requirement 1 (unicode):  Each filename that you see needs to be valid
>  > unicode
> What does "see" mean?  In directory listings?

Yes, either with "tahoe ls", with a FUSE plugin, or with the web UI.
Remove the trailing "?t=json" from the URL above to see an example.

>  Under what
> circumstances, if any, can what I see be different from what I get?

This is a good question!  In the previous iteration of the Tahoe design,
you could sometimes get something from "tahoe cp" which is different
from what you saw with "tahoe ls".  In the current design, this is no
longer the case, because we abandon the requirement to have
"round-trip fidelity of bytes".

>  > Requirement 2 (faithful if unicode):  For each filename (byte
>  > string) in your myfiles directory,
> My local myfiles directory, or my Tahoe myfiles directory?

The local one.

>  > if that bytestring is the valid encoding of some string in your
>  > stated locale,
> Who stated the locale?  How?  Are you referring to what
> getfilesystemencoding returns?  This is a "(unicode) string", right?

Yes, and yes.

>  > Requirement 3 (no file left behind):  For each filename (byte
>  > string) in your myfiles directory, whether or not that byte
>  > string is the valid encoding of anything in your stated locale,
>  > then that file will be added into the Tahoe filesystem under
>  > *some* name (a good candidate would be mojibake, e.g. decode the
>  > bytes with latin-1, but that is not the only possibility).
> That's not even a possibility, actually.  Technically, Latin-1 has a
> "hole" from U+0080 to U+009F.  You need to add the C1 controls to fill
> in that gap.  (I don't think it actually matters in practice,
> everybody seems to implement ISO-8859/1 as though it contained the
> control characters ... except when detecting encodings ... but it pays
> to be precise in these things ....)

Perhaps windows-1252 would be a better codec for this purpose?
However it would be clearer for the purposes of this discussion, and
also perhaps for actual users of Tahoe, if instead of decoding with
windows-1252 in order to get a mojibake name, Tahoe would simply
generate a name like "badly_encoded_filename_#1".  Let's run with
that.  For clarity, assume that the arbitrary unicode filename that
Tahoe comes up with is "badly_encoded_filename_#1".  This doesn't
change anything in this story.  In particular it doesn't change the
fact that there might already be an entry in the directory which is
named "badly_encoded_filename_#1" even though it was *not* a badly
encoded filename, but a correctly encoded one.
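
One way to cope with such a clash, sketched in Python under the
assumption that the number at the end is simply bumped until the name is
free.  The function name and the set of existing names are illustrative
only.

```python
def placeholder_name(existing, prefix="badly_encoded_filename_#"):
    """Return the first placeholder name not already in the directory."""
    n = 1
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"

# A directory that already holds a correctly encoded file which happens
# to carry the placeholder name.
existing_names = {"notes.txt", "badly_encoded_filename_#1"}
chosen = placeholder_name(existing_names)
```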

>  > Now already we can say that these three requirements mean that
>  > there can be collisions -- for example a directory could have two
>  > entries, one of which is not a valid encoding in the locale, and
>  > whatever unicode string we invent to name it with in order to
>  > satisfy requirements 3 (no file left behind) and 1 (unicode)
>  > might happen to be the same as the (correctly-encoded) name of
>  > the other file.
> This is false with rather high probability, but you need some extra
> structure to deal with it.  First, claim the Unicode private planes
> for Tahoe.
[snip long and intriguing instructions to perform unicode magic that
I don't understand]

Wait, wait.  What good would this do?  The current plan is that if the
filenames collide we increment the number at the end "#$NUMBER", if we
are just naming them "badly_encoded_filename_#1", or that we append
"~1" if we are naming them by mojibake.  And the current plan is that
the original bytes are saved in the metadata for future cyborg
archaeologists.  How would this complex unicode magic that I don't
understand improve the current plan?  Would it provide filenames that
are more meaningful or useful to the users than the
"badly_encoded_filename_#1" or the mojibake?

> The registry of characters is somewhat unpleasant, but it does allow
> you to detect filenames that are the same reliably.

There is no server, so to implement such a registry we would probably
have to include a copy of the registry inside each (encrypted,
erasure-encoded) directory.

>  > Possible Requirement 4 (faithful bytes if not unicode, a.k.a.
>  > "round-tripping"):
> PEP 383 gives you this, but you must store the encoding used for each
> such file name.

Well, at this point this has become an anti-requirement because it
causes the filename as displayed when examining the directory to be
different from the filename that results when cp'ing the directory.
Also I don't see why PEP 383's implementation of this would be better
than the previous iteration of the design in which this was
accomplished by simply storing the original bytes and then writing
them back out again on demand, or the design before that in which this
was accomplished by mojibake'ing the bytes (by decoding them with
windows-1252) and setting a flag indicating that this has been done.
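
A quick check of the two candidate codecs for the mojibake approach,
using Python's bundled codecs.  Other implementations of windows-1252
may treat the unassigned bytes differently, so this only describes
CPython's behaviour.

```python
all_bytes = bytes(range(256))

# Python's latin-1 codec maps every byte to the code point with the same
# number, so decoding never fails and always round-trips.
latin1_text = all_bytes.decode("latin-1")
latin1_round_trip = latin1_text.encode("latin-1") == all_bytes

# Collect the bytes that Python's cp1252 codec refuses to decode.
cp1252_undefined = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        cp1252_undefined.append(value)
```

With Python's codecs, a handful of byte values (0x81 among them) have no
cp1252 mapping, so a strict cp1252 decode cannot accept every possible
filename, whereas latin-1 can.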

I think I understand now that PEP 383 is better for the case that you
can't store extra metadata (such as our failed_decode flag or our
original_bytes), but you can ensure that the encoding that will be
used later matches the one that was used for decoding now.  Neither of
these two criteria apply to Tahoe, and I suspect that neither of them
apply to most uses other than the entirely local and non-persistent
"for x in os.listdir(): open(x)".

>  > But an even worse problem -- the show-stopper for me -- is that I
>  > don't want what Tahoe shows when you do "tahoe ls" or view it in a
>  > web browser to differ from what it writes out when you do
>  > "tahoe cp -r tahoe: newfiles/".
> But as a requirement, that's incoherent.  What you are "seeing" is
> Unicode, what it will write out is bytes.

In the new plan, we write the unicode filename out using Python's
unicode filesystem APIs, so Python will attempt to encode it into the
appropriate filesystem encoding (raising UnicodeEncodeError if it
won't fit).

>   That means that if multiple
> locales are in use on both the backup and restore systems, and the
> nominal system encodings are different, people whose personal default
> locales are not the same as the system's will see what they expect on
> the backup system (using system ls), mojibake on Tahoe (using tahoe
> ls), and *different* mojibake on the restore system (system ls,
> again).

Let's see...  Tahoe is a user-space program and lets Python determine
what the appropriate "sys.getfilesystemencoding()" is based on what
the user's locale was at Python startup.  So I don't think what you
wrote above is correct.  I think that in the first transition, from
source system to Tahoe, that either the name will be correctly
transcoded (i.e., it looks the same to the user as long as the locale
they are using to "look" at it, e.g. with "ls" or Nautilus or whatever
is the same as the locale that was set when their Python process
started up), or else it will be undecodable under their current locale
and instead will be replaced with either mojibake or
"badly_encoded_filename_#1".  Hm, here is a good argument in favor of
using mojibake to generate the arbitrary unicode name instead of
naming it "badly_encoded_filename_#1": because that's probably what ls
and Nautilus will show!  Let me try that...  Oh, cool, Nautilus and
GNU ls both replace invalid chars with U+FFFD (like the 'replace'
error handler does in Python's decode()) and append " (invalid
encoding)" to the end.  That sounds like an even better way to handle
it than either mojibake or "badly_encoded_filename_#1", and it also
means that it will look the same in Tahoe as it does in GNU ls and
Nautilus.  Excellent.
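
A minimal sketch of that display rule in Python.  The function name is
made up, and returning a flag alongside the name is just one possible
way to remember that the decode failed.

```python
def display_name(raw, encoding):
    """Decode a filename; fall back to U+FFFD replacement plus a suffix."""
    try:
        return raw.decode(encoding, "strict"), False
    except UnicodeDecodeError:
        # Same idea as what GNU ls and Nautilus appear to show.
        return raw.decode(encoding, "replace") + " (invalid encoding)", True

good = display_name(b"r\xc3\xa9sum\xc3\xa9.html", "utf-8")
bad = display_name(b"r\xe9sum\xe9.html", "utf-8")
```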

On the next transition, from Tahoe to system, Tahoe uses the Python
unicode API, which will attempt to encode the unicode filename into
the local filesystem encoding and raise UnicodeEncodeError if it

>  > Requirement 5 (no loss of information):  I don't want Tahoe to
>  > destroy information -- every transformation should be (in
>  > principle) reversible by some future computer-augmented
>  > archaeologist.
> UTF-8b would be just as good for storing the original bytestring, as
> long as you keep the original encoding.  It's actually probably
> preferable if PEP 383 can be assumed to be implemented in the
> versions of Python you use.

It isn't -- Tahoe doesn't run on Python 3.  Also Tahoe is increasingly
interoperating with tools written in completely different languages.
It is much easier for me to tell all of those programmers (in my
documentation) that in the filename slot is the (normal, valid,
standard) unicode, and in the metadata slot there are the bytes than
to tell them about utf-8b (which is not even implemented in their
tools: JavaScript, JSON, C#, C, and Ruby).  I imagine that it would be
a deal-killer for many or most of them if I said they couldn't use
Tahoe reliably without first implementing utf-8b for their toolsets.

>  > 1. On Windows or Mac read the filename with the unicode APIs.
>  > Normalize the string with filename = unicodedata.normalize('NFC',
> NFD is probably better for fuzzy matching and display on legacy
> terminals.

I don't know anything about them, other than that Macintosh uses NFD
and everything else uses NFC.  Should I specify NFD?  What are these
"legacy terminals" of which you speak?  Will NFD make it look better
when I cat it to my vt102?  (Just kidding -- I don't have one.)
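
For reference, the difference between the two forms is easy to see with
the unicodedata module.  The sample string is only an illustration.

```python
import unicodedata

composed = "r\u00e9sum\u00e9"            # e-acute as one code point
nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", nfd)

# NFD splits each accented letter into a base letter plus a combining
# accent, so the two forms compare unequal unless normalized first.
lengths = (len(composed), len(nfd))
```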

> Per the koi8-lucky example, you don't know if it succeeded for the
> right reason or the wrong reason.  You really should store the
> alleged_encoding used in the metadata, always.

Right -- got it.

>  > 2.b. If this decoding fails, then we decode it again with
>  > bytes.decode('latin-1', 'strict'). Do not normalize it. Store the
>  > resulting unicode object into the "filename" part, set the
>  > "failed_decode" flag to True. This is mojibake!
> Not necessarily.  Most ISO-8859/X names will fail to decode if the
> alleged_encoding is UTF-8, for example, but many (even for X != 1)
> will be correctly readable because of the policy of trying to share
> code points across Latin-X encodings.  Certainly ISO-8859/1 (and
> much ISO-8859/15) will be correct.

Ah.  What is the Japanese word for "word with some characters right
and other characters mojibake!"?  :-)

>  > Now a question for python-dev people: could utf-8b or PEP 383 be
>  > useful for requirements like the four requirements listed above?  If
>  > not, what requirements does PEP 383 help with?
> By giving you a standard, invertible way to represent anything that
> the OS can throw at you, it helps with all of them.

So, it is invertible only if you can assume that the same encoding
will be used on the second leg of the trip, right?  Which you can do
by writing down what encoding was used on this leg of the trip and
forcing it to use the same encoding on the other leg.  Except that we
can't force that to happen on Windows at all as far as I understand,
which is a show-stopper right there.  But even if we could, this would
require us to write down a bit of information and transmit it to the
other side and use it to do the encoding.  And if we are going to do
that, why don't we just transmit the original bytes?  Okay, maybe
because that would roughly double the amount of data we have to
transmit, and maybe we are stingy.  But if we are stingy we could
instead transmit a single added bit to indicate whether the name is
normal or mojibake, and then use windows-1252 to stuff the bytes into
the name.  One of those options has the advantage of simplicity to the
programmer ("There is the unicode, and there are the bytes."), and the
other has the advantage of good compression.  Both of them have the
advantage that nobody involved has to understand and possibly
implement a non-standard unicode hack.

I'm trying not to be too pushy about this (heaven knows I've been
completely wrong about things a dozen times in a row so far in this
design process), but as far as I can understand it, PEP 383 can be
used only when you can force the same encoding on both sides (the PEP
says that encoding "only 'works' if the data get converted back to
bytes with the python-escape error handler also").  That happens
naturally when both sides are in the same Python process, so PEP 383
naturally looks good in that context.  However, if the filenames are
going to be stored persistently or transmitted over a network, then it
seems simpler, easier, and more portable to use some other method than
PEP 383 to handle badly encoded names.
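
A small Python 3 illustration of that dependence.  The error handler
discussed here as "python-escape" is spelled "surrogateescape" in
released versions of Python 3; the sample bytes are invented.

```python
raw = b"caf\xc3\xa9_\xff"      # valid UTF-8 plus one stray byte

text = raw.decode("utf-8", "surrogateescape")

# Same codec and handler on the way back: the original bytes return.
same_codec = text.encode("utf-8", "surrogateescape")

# A different codec on the second leg: the result no longer matches.
other_codec = text.encode("latin-1", "surrogateescape")

# Without the handler, the escaped byte cannot be encoded at all.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```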

>  > I'm not sure that it can help if you are going to store the results
>  > of your os.listdir() persistently or if you are going to transmit
>  > them over a network.  Indeed, using the results that way could lead
>  > to unpleasant surprises.
> No more than any other system for giving a canonical Unicode spelling
> to the results of an OS call.

I think PEP 383 yields more surprises than the alternative of decoding
with error handler 'replace' and then including the original bytes
along with the unicode.  During the course of this process I have also
considered using two other mechanisms instead of decoding with error
handler 'replace' -- mojibake using windows-1252 or a simple
placeholder like "badly_encoded_filename_#1".  Any of these three seem
to be less surprising and similarly functional to PEP 383.  I have to
admit that they are not as elegant.  Utf-8b is a really neat hack, and
MvL's generalization of it to all unicode encodings is, too.

I'm still being surprised by it after trying to understand it for many
days now.  For example, what happens if you decode a filename with PEP
383, store that filename somewhere, and then later try to write a file
under that name on Windows?  If it only 'works' if the data get
converted back to bytes with the python-escape error handler, then can
you use the python-escape error handler when trying to, say, create a
new file on Windows?



From jmillikin at  Tue May  5 07:19:36 2009
From: jmillikin at (John Millikin)
Date: Mon, 4 May 2009 22:19:36 -0700
Subject: [Python-Dev] Undocumented change / bug in Python3's PyMapping_Check
Message-ID: <>

In Python 2, PyMapping_Check will return 0 for list objects. In Python
3, it returns 1. Obviously, this makes it rather difficult to
differentiate between mappings and other sized iterables. In addition,
it differs from the behavior of the ``collections.Mapping`` ABC --
isinstance([], collections.Mapping) returns False.

I believe the new behavior is erroneous, but would like to confirm
that before filing a bug.

The behavior can be seen from a C extension, or if you're lazy, using ctypes:

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.CDLL('').PyMapping_Check(ctypes.py_object([]))
0

Python 3.0.1+ (r301:69556, Apr 15 2009, 15:59:22)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> ctypes.CDLL('').PyMapping_Check(ctypes.py_object([]))
1
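
The same comparison can be written as a small script.  This is a sketch
that assumes CPython with ctypes available; it goes through
ctypes.pythonapi rather than loading a library by name, and it uses
collections.abc.Mapping, which is where the Mapping ABC lives in current
Python 3.  The helper name c_mapping_check is made up for the example.

```python
import collections.abc
import ctypes


def c_mapping_check(obj):
    """Call the C-level PyMapping_Check on obj and return its int result."""
    check = ctypes.pythonapi.PyMapping_Check
    check.argtypes = [ctypes.py_object]
    check.restype = ctypes.c_int
    return check(obj)


# What the C API reports for a list and for a dict.
c_says_list = c_mapping_check([])
c_says_dict = c_mapping_check({})

# What the ABC reports for the same list.
abc_says_list = isinstance([], collections.abc.Mapping)
```

Comparing c_says_list with abc_says_list shows whether the C check and
the ABC agree on the interpreter being run.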

From larry at  Tue May  5 09:24:38 2009
From: larry at (Larry Hastings)
Date: Tue, 05 May 2009 00:24:38 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Mark Dickinson wrote:
>> This doesn't sound right. The functions in the third party code will get
>> compiled with the wrong signature, so they can crash (or behave unexpectedly)
>> when called by Python.
> Yes, of course the signature of the getters and setters changes.  Please
> ignore me. :-)

If they don't use the closure field, then either they won't compile due 
to type mismatches or they'll work fine.  There's a lot of code in 
CPython that didn't need to be changed for my remove-closure patch; the 
functions didn't bother taking the "void * closure" that they were going 
to ignore anyway, and then they cast the function pointer in the 
PyGetSetDef to make the compiler shut up.  Worked fine.  And, in nearly 
all cases, the static PyGetSetDefs omit the closure member, which means 
C initializes them with a 0.


From mal at  Tue May  5 10:40:51 2009
From: mal at (M.-A. Lemburg)
Date: Tue, 05 May 2009 10:40:51 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>
Message-ID: <>

On 2009-05-03 19:39, Martin v. Löwis wrote:
>> If the error handler is supposed to be used for codecs other than utf-8,
>> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
> Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
> it's an algorithm based on 16-bit or 32-bit code points.

If the error handler doesn't have anything to do with UTF-8, then why
do you use "utf8" in the name?

Please use a more descriptive name for the handler which does not cause
confusion with an existing codec.

Marc-Andre Lemburg


From tjreedy at  Tue May  5 10:57:03 2009
From: tjreedy at (Terry Reedy)
Date: Tue, 05 May 2009 04:57:03 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <gtov0t$5fh$>

M.-A. Lemburg wrote:
> On 2009-05-03 19:39, Martin v. Löwis wrote:
>>> If the error handler is supposed to be used for codecs other than utf-8,
>>> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
>> Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
>> it's an algorithm based on 16-bit or 32-bit code points.
> If the error handler doesn't have anything to do with UTF-8, then why
> do you use "utf8" in the name?
> Please use a more descriptive name for the handler which does not cause
> confusion with an existing codec.

Having already been confused, I agree.

From eric at  Tue May  5 11:13:58 2009
From: eric at (Eric Smith)
Date: Tue, 05 May 2009 05:13:58 -0400
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Mark Hammond wrote:
>> Is that enough consensus for it to go in?  If so, are there any core 
>> developers who could help me get it in before the 3.1 feature freeze?  
>> The patch should be in good shape; it has unit tests and updated 
>> documentation.
> I've taken the liberty of explicitly CCing Martin just in case he missed 
> the thread with all the noise regarding PEP383.
> If there are no objections from Martin or anyone else here, please feel 
> free to assign it to me (and mail if I haven't taken action by the day 
> before the beta freeze...)

Mark: I've reviewed this and it looks okay to me. It passes all the 
tests on Windows and Linux. But if you could take a look at it before 
the release tomorrow, I'd appreciate it.

I feel good enough about it to check it in if no one else gets to it.


From supreet.sethi at  Tue May  5 12:41:22 2009
From: supreet.sethi at (s|s)
Date: Tue, 5 May 2009 16:11:22 +0530
Subject: [Python-Dev] using help function in Py3k
Message-ID: <>


I ran Python 3.0 for the first time. I used the help() function and typed
"modules hash". It raises an error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line 427,
in __call__
    return*args, **kwds)
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line
1675, in __call__
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line
1693, in interact
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line 1711, in help
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line
1799, in listmodules
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line
1913, in apropos
    ModuleScanner().run(callback, key, onerror=onerror)
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line 1875, in run
    source = loader.get_source(modname)
  File "/home/ss/eproj/xapian/INST/lib/python3.0/", line
293, in get_source
    self.source =
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line 1720, in read
    decoder = self._decoder or self._get_decoder()
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line 1506,
in _get_decoder
    make_decoder = codecs.getincrementaldecoder(self._encoding)
  File "/home/ss/eproj/xapian/INST//lib/python3.0/", line
960, in getincrementaldecoder
    decoder = lookup(encoding).incrementaldecoder
LookupError: unknown encoding: uft-8

The cause of the error is that the test/ directory, which contains tests
for the Python parser, is installed in the Lib directory. I propose that
these files be installed by default in some other directory,
preferably in the /share or /share/doc part of the tree.



From aahz at  Tue May  5 13:47:18 2009
From: aahz at (Aahz)
Date: Tue, 5 May 2009 04:47:18 -0700
Subject: [Python-Dev] using help function in Py3k
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 05, 2009, s|s wrote:
> I ran Python 3.0 for the first time. I used the help() function and typed
> "modules hash". It raises an error.

Please file a report on
Aahz (aahz at           <*>

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From stephen at  Tue May  5 15:09:25 2009
From: stephen at (Stephen J. Turnbull)
Date: Tue, 05 May 2009 22:09:25 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

M.-A. Lemburg writes:
 > On 2009-05-03 19:39, Martin v. Löwis wrote:
 > >> If the error handler is supposed to be used for codecs other than utf-8,
 > >> perhaps it should be renamed something more generic, e.g. "surrogate-escape"?
 > > 
 > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
 > > it's an algorithm based on 16-bit or 32-bit code points.

I don't understand this phrasing.  The algorithm is only applicable to
ASCII-compatible octet streams.  It results in code points by a simple
displacement of octet -> octet + 0xDC00.  It cannot be used on (say)
UTF-32 to deal with embedded surrogates.

Certainly, the computation requires (at least) 16 bit numbers, but the
input must be restricted to a stream of 8-bit code points, while the
output is 16- or 32-bit code points.
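
The displacement described above can be written out directly.  This is a
sketch of the arithmetic only, not the codec implementation; the
function names are made up, and the restriction to octets 0x80..0xFF is
an assumption that matches the later statement in this thread that
bytes below 128 are not converted.  The last line assumes a Python
release where the handler is available as 'surrogateescape'.

```python
def escape_octet(octet):
    """Map an undecodable octet to a lone low surrogate (octet + 0xDC00)."""
    if not 0x80 <= octet <= 0xFF:
        raise ValueError("only high-bit-set octets are escaped")
    return chr(0xDC00 + octet)


def unescape_char(ch):
    """Map a lone low surrogate in U+DC80..U+DCFF back to its octet."""
    code_point = ord(ch)
    if not 0xDC80 <= code_point <= 0xDCFF:
        raise ValueError("not an escaped octet")
    return code_point - 0xDC00


escaped = escape_octet(0xFF)
restored = unescape_char(escaped)

# The built-in handler produces the same code point for the same octet.
builtin = b"\xff".decode("ascii", errors="surrogateescape")
```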

 > Please use a more descriptive name [than "utf-8b"] for the handler
 > which does not cause confusion with an existing codec.

But please don't use "surrogate-escape" or (as in the current PEP)
"python-escape"; it's not an escaping (quotation) mechanism.
"surrogate-replace", "surrogate-substitute", or "surrogate-translate"
would be better names.

From daniel at  Tue May  5 15:43:57 2009
From: daniel at (Daniel Stutzbach)
Date: Tue, 5 May 2009 08:43:57 -0500
Subject: [Python-Dev] using help function in Py3k
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 5, 2009 at 5:41 AM, s|s <supreet.sethi at> wrote:

> LookupError: unknown encoding: uft-8


Looks like a variation of Issue 4540 <> (or
a duplicate?  I can't tell)

Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <>

From eric at  Tue May  5 16:08:34 2009
From: eric at (Eric Smith)
Date: Tue, 05 May 2009 10:08:34 -0400
Subject: [Python-Dev] [Fwd: [Python-checkins] r72331
	-	python/branches/py3k/Modules/posixmodule.c]
Message-ID: <>

Modules/posixmodule.c now compiles for me, but I get a Bus Error in 
test_lchflags when running test_posixmodule on Mac OS X 10.5. I'll open 
a release blocker bug on this.

-------- Original Message --------
Subject: [Python-checkins] r72331 - 
Date: Tue,  5 May 2009 15:07:31 +0200 (CEST)
From: eric.smith <python-checkins at>
To: python-checkins at

Author: eric.smith
Date: Tue May  5 15:07:30 2009
New Revision: 72331

Added missing semicolon.


Modified: python/branches/py3k/Modules/posixmodule.c
--- python/branches/py3k/Modules/posixmodule.c	(original)
+++ python/branches/py3k/Modules/posixmodule.c	Tue May  5 15:07:30 2009
@@ -1928,7 +1928,7 @@
  	if (!PyArg_ParseTuple(args, "O&i:lchmod", PyUnicode_FSConverter,
  	                      &opath, &i))
  		return NULL;
-	path = bytes2str(opath, 1)
+	path = bytes2str(opath, 1);
  	res = lchmod(path, i);

From stephen at  Tue May  5 16:57:36 2009
From: stephen at (Stephen J. Turnbull)
Date: Tue, 05 May 2009 23:57:36 +0900
Subject: [Python-Dev]  PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:

 > I've updated the PEP accordingly.

I have three substantive comments.  First, although consequences for
Python 3 byte interfaces (ie, "none") are explicitly stated, as far as
I can see this PEP could apply to Python 2 as well.  I don't think
it's intended that way.  Either way, I think you should clarify that
point.

Second, I suggest "surrogate-replace" as the name of the error handler
rather than "utf8b".  (Elsewhere I've suggested others, but I think
this is the best of the bunch.)

Third, it is not clear to me why non-decodable ASCII should be an
error.  There are plenty of low surrogates for the purpose.  Is there
another technical reason?  Stupid or not, Shift-JIS- and Big5-encoded
file systems are quite common in Asia still (including non-rewritable
media).  I think surrogate-replacement of ASCII should at least be an
option.

I don't think "people shouldn't be using non-ASCII-compatible
encodings for locale encodings" is a sufficient rationale for a hard
error here.  I mean, of course they *should* be using UTF-8.  Maybe
Python 3.1 should just go ahead and error on any other encoding on
POSIX platforms? <wink>

I have a number of nitpicking comments and technical clarifications on
the PEP.  Rationale is in footnotes.  There were also a few typos I
noticed.

1.  There is no such thing as a "half-surrogate" in Unicode.  "Lone
    surrogate" is clear enough.  Or for somewhat fancier English,
    "isolated surrogate" or "non-syntactic surrogate".  To emphasize
    that Python codecs will only produce them in contexts where a
    Unicode character or high surrogate (for UTF-16 Python) is
    syntactically required, "isolated low surrogate" or "isolated
    trailing surrogate" might be good.[1]

2.  The specification should state, and the discussion emphasize, that
    strings which were produced by surrogate replacement *must not* be
    used in data interchange with systems that do not specifically
    accept such strings, and that this is the responsibility of the
    application.[2]

    Rather than saying that "dealing with such conflicts is out of
    scope of this PEP", I would say

    """Dealing with such conflicts is the responsibility of the
    application.  Since this PEP's mechanism produces valid Unicode
    where possible, and produces *invalid* code points only via the
    error handler, one strategy is for the application to validate all
    other sources of strings as Unicode conforming.  There may be
    other useful application-specific strategies, as well."""

3.  In the discussion, the transition from the example of alternative
    use of 'python-escape' to discussion of the error handler
    interface extension is a bit abrupt.  I suggest rewriting as:

    """The extension to the encode error handler interface proposed by
    this PEP is necessary to implement the 'utf8b' error handler,
    because there are required byte sequences which cannot be
    generated from replacement Unicode.  However, the encode error
    handler interface presently requires replacement Unicode to be
    provided in lieu of the non-encodable Unicode from the source
    string.  Then it promptly encodes that replacement Unicode.  In
    some error handlers, such as the 'utf8b' proposed here, it is also
    simpler and more efficient for the error handler to provide a
    pre-encoded replacement byte string, rather than forcing it to
    calculate Unicode from which the encoder would create the
    desired bytes."""
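
The interface extension in item 3 (an encode error handler that returns
pre-encoded bytes instead of replacement Unicode) can be sketched with
codecs.register_error.  This assumes a Python release in which encode
error handlers may return bytes, as this PEP proposes; the handler name
'demo-bytes-escape' and the function name are made up for the example,
and this is not the implementation of the real handler.

```python
import codecs


def bytes_escape(exc):
    """Encode error handler that returns raw bytes for escaped octets."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        code_point = ord(ch)
        if 0xDC80 <= code_point <= 0xDCFF:
            # Undo the octet + 0xDC00 displacement.
            out.append(code_point - 0xDC00)
        else:
            # Anything else is a genuine encoding error.
            raise exc
    # Return bytes directly; the encoder copies them to the output.
    return bytes(out), exc.end


codecs.register_error("demo-bytes-escape", bytes_escape)

name = "caf\udce9.txt"
encoded = name.encode("utf-8", errors="demo-bytes-escape")
```

No replacement Unicode string could make the UTF-8 encoder emit the
single byte 0xE9, which is why the handler has to hand back bytes.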

Typos (line references are to pep-0383.txt svn r72332):

l.  86: "Byte-orientied" -> "Byte-oriented"
l.  98, 118, 124, 127, 132, 136: "python-escape" -> "utf8b"
l. 130: "provide" -> "provided"
l. 134: "calculating" -> "calculate"

Footnotes: 
[1] Unicode 5.0 uses the terms "high-half" and "low-half" at least
    once, in section 16.6, but the context is such that I take it to
    refer to "half of the surrogate area".  Section 3.8 doesn't use
    these, instead noting that "leading" and "trailing" are sometimes
    used instead of "high" and "low".  Better to avoid the word "half"
    in PEP 383, I think.

[2] Since this error handler is going to be the default for POSIX I/O,
    of course people are going to mostly ignore that restriction.  The
    point is, passing such strings to systems that don't expect them
    is a bug, and the PEP should make it clear that it's the app's
    bug, not the other system's.  On the other hand, using those
    strings in a context of consenting adults (and I do mean
    double-opt-in here) is perfectly acceptable.  I'm specifically
    thinking of use in the Tahoe protocol discussed by Zooko
    O'Whielacronx; it may not be usable there for backward
    compatibility reasons, but "Unicode conformance" is not an issue
    in principle.

    This does imply that programs that take advantage of the error
    handler specified in this PEP are on their own if they accept data
    from any sources that are not known to be Unicode-conforming.
    OTOH, as far as I can see if other sources are known to be Unicode
    conformant, it's reasonably (but not perfectly) safe to combine
    them with strings from this PEP (and of course use either 'utf8b'
    or 'strict', as appropriate, when passing data out of Python).

From zookog at  Tue May  5 17:18:29 2009
From: zookog at (Zooko O'Whielacronx)
Date: Tue, 5 May 2009 09:18:29 -0600
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 5, 2009 at 8:57 AM, Stephen J. Turnbull <stephen at> wrote:
> 2.  The specification should state, and the discussion emphasize, that
>     strings which were produced by surrogate replacement *must not* be
>     used in data interchange with systems that do not specifically
>     accept such strings, and that this is the responsibility of the
>     application.[2]

That sounds like a useful statement to make.  How would an application
make sure that they were producing only valid unicode?  How about adding
an option to os.listdir() named "errors" with default value 'utf8b'
(or 'surrogate-replace', or whatever the name is)?  Then applications
which need to produce only valid unicode strings could pass
errors=strict, errors=ignore, or errors=replace?  (If anyone really
wants behavior like Python 3.0 then we could perhaps also add a new
one just for os.listdir() named errors=skipfilename.)

My most recent plan for Tahoe, as of the letter that I sent last
night, is to emulate the behavior of Nautilus and GNU ls by using the
'replace' error handler and (emulating Nautilus) to append " (invalid
encoding)" to the end of the string.  (screenshot: )

So if I could ask os.listdir to return filenames with U+FFFD in place
of undecodable characters, then I could subsequently do something

for f in os.listdir(d, errors='replace'):
    if u"\ufffd" in f:
        f += " (invalid encoding)"

(On top of that I would have to check for collisions, but that's out of scope.)
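
Since os.listdir() has no "errors" argument, the idea above can be
approximated by listing the directory as bytes and decoding each name
with 'replace'.  This is a sketch under that assumption; the function
names are made up, and keeping the raw bytes next to the display string
is an addition so that the file can still be opened afterwards.

```python
import os
import sys


def display_name(raw, encoding):
    """Decode a byte file name for display, flagging undecodable names."""
    text = raw.decode(encoding, errors="replace")
    if "\ufffd" in text:
        text += " (invalid encoding)"
    return text


def listdir_for_display(path):
    """Return (display_name, raw_bytes) pairs for the entries in path."""
    encoding = sys.getfilesystemencoding()
    # Passing bytes to os.listdir() makes it return byte file names.
    raw_names = os.listdir(os.fsencode(path))
    return [(display_name(raw, encoding), raw) for raw in raw_names]


entries = listdir_for_display(".")
```

Collision handling between display names is left out here, as it is in
the message above.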



From google at  Tue May  5 17:25:46 2009
From: google at (MRAB)
Date: Tue, 05 May 2009 16:25:46 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>  > I've updated the PEP accordingly.
> I have three substantive comments.  First, although consequences for
> Python 3 byte interfaces (ie, "none") are explicitly stated, as far as
> I can see this PEP could apply to Python 2 as well.  I don't think
> it's intended that way.  Either way, I think you should clarify that
> point.
> Second, I suggest "surrogate-replace" as the name of the error handler
> rather than "utf8b".  (Elsewhere I've suggested others, but I think
> this is the best of the bunch.)

> Third, it is not clear to me why non-decodable ASCII should be an
> error.  There are plenty of low surrogates for the purpose.  Is there
> another technical reason?  Stupid or not, Shift-JIS- and Big5-encoded
> file systems are quite common in Asia still (including non-rewritable
> media).  I think surrogate-replacement of ASCII should at least be an
> option.
> I don't think "people shouldn't be using non-ASCII-compatible
> encodings for locale encodings" is a sufficient rationale for a hard
> error here.  I mean, of course they *should* be using UTF-8.  Maybe
> Python 3.1 should just go ahead and error on any other encoding on
> POSIX platforms? <wink>
I don't see why the error handler couldn't in principle be used with
encodings other than UTF-8, although in that case all of the low
surrogates should be open to use.

> I have a number of nitpicking comments and technical clarifications on
> the PEP.  Rationale is in footnotes.  There were also a few typos I
> noticed.
> 1.  There is no such thing as a "half-surrogate" in Unicode.  "Lone
>     surrogate" is clear enough.  Or for somewhat fancier English,
>     "isolated surrogate" or "non-syntactic surrogate".  To emphasize
>     that Python codecs will only produce them in contexts where a
>     Unicode character or high surrogate (for UTF-16 Python) is
>     syntactically required, "isolated low surrogate" or "isolated
>     trailing surrogate" might be good.[1]
> 2.  The specification should state, and the discussion emphasize, that
>     strings which were produced by surrogate replacement *must not* be
>     used in data interchange with systems that do not specifically
>     accept such strings, and that this is the responsibility of the
>     application.[2]
>     Rather than saying that "dealing with such conflicts is out of
>     scope of this PEP", I would say
>     """Dealing with such conflicts is the responsibility of the
>     application.  Since this PEP's mechanism produces valid Unicode
>     where possible, and produces *invalid* code points only via the
>     error handler, one strategy is for the application to validate all
>     other sources of strings as Unicode conforming.  There may be
>     other useful application-specific strategies, as well."""
> 3.  In the discussion, the transition from the example of alternative
>     use of 'python-escape' to discussion of the error handler
>     interface extension is a bit abrupt.  I suggest rewriting as:
>     """The extension to the encode error handler interface proposed by
>     this PEP is necessary to implement the 'utf8b' error handler,
>     because there are required byte sequences which cannot be
>     generated from replacement Unicode.  However, the encode error
>     handler interface presently requires replacement Unicode to be
>     provided in lieu of the non-encodable Unicode from the source
>     string.  Then it promptly encodes that replacement Unicode.  In
>     some error handlers, such as the 'utf8b' proposed here, it is also
>     simpler and more efficient for the error handler to provide a
>     pre-encoded replacement byte string, rather than forcing it to
>     calculating Unicode from which the encoder would create the
>     desired bytes."""
> Typos (line references are to pep-0383.txt svn r72332):
> l.  86: "Byte-orientied" -> "Byte-oriented"
> l.  98, 118, 124, 127, 132, 136: "python-escape" -> "utf8b"
> l. 130: "provide" -> "provided"
> l. 134: "calculating" -> "calculate"
> Footnotes: 
> [1] Unicode 5.0 uses the terms "high-half" and "low-half" at least
>     once, in section 16.6, but the context is such that I take it to
>     refer to "half of the surrogate area".  Section 3.8 doesn't use
>     these, instead noting that "leading" and "trailing" are sometimes
>     used instead of "high" and "low".  Better to avoid the word "half"
>     in PEP 383, I think.
"Leading" and "trailing" simply state the order, not the set ("high" or
"low"), so are not good terms to use.

> [2] Since this error handler is going to be the default for POSIX I/O,
>     of course people are going to mostly ignore that restriction.  The
>     point is, passing such strings to systems that don't expect them
>     is a bug, and the PEP should make it clear that it's the app's
>     bug, not the other system's.  On the other hand, using those
>     strings in a context of consenting adults (and I do mean
>     double-opt-in here) is perfectly acceptable.  I'm specifically
>     thinking of use in the Tahoe protocol discussed by Zooko
>     O'Whielacronx; it may not be usable there for backward
>     compatibility reasons, but "Unicode conformance" is not an issue
>     in principle.
>     This does imply that programs that take advantage of the error
>     handler specified in this PEP are on their own if they accept data
>     from any sources that are not known to be Unicode-conforming.
>     OTOH, as far as I can see if other sources are known to be Unicode
>     conformant, it's reasonably (but not perfectly) safe to combine
>     them with strings from this PEP (and of course use either 'utf8b'
>     or 'strict', as appropriate, when passing data out of Python).
Should there be a function or method to check for conformance and
lone surrogates?

From stephen at  Tue May  5 18:32:03 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 01:32:03 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

Zooko O'Whielacronx writes:

 > How would an application make sure that they were producing only
 > valid unicode?

That's very difficult.  There are a couple of sources that I can think
of, in Python: C modules, chr(), \u literals, and now codecs with the
'utf8b'.  There may be others.  You'd need to review your own code for
all of them very carefully, and you'd have to validate all strings
returned by non-validating APIs (which is all of them in Python now,
although many of them can probably be trusted, such as codecs not
using the 'utf8b' error handler).

 > How about adding an option to os.listdir() named "errors" with default
 > value 'utf8b'

Seems reasonable to me, but Martin's probably thought more carefully
about it.  I don't think it's applicable to your use case, though,
because you want to be able to *access* those files as well as display
the names to the users, right?  You won't be able to access those
files if you receive the names already munged by the error handler.

From stephen at  Tue May  5 19:31:28 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 02:31:28 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

MRAB writes:

 > > I don't think "people shouldn't be using non-ASCII-compatible
 > > encodings for locale encodings" is a sufficient rationale for a hard
 > > error here.  I mean, of course they *should* be using UTF-8.  Maybe
 > > Python 3.1 should just go ahead and error on any other encoding on
 > > POSIX platforms? <wink>
 > > 
 > I don't see why the error handler couldn't in principle be used with
 > encodings other than UTF-8, although in that case all of the low
 > surrogates should be open to use.

I should have been more clear here, I guess.  The error handler *can*,
and in the PEP *will be* by default, used with all "sane" locale
encodings on POSIX.

    It occurs to me that the PEP maybe should say that it is an error
    to have your POSIX locale set to UTF-16 or something like that.

What "sane" means in this context is

1.  ASCII NUL is the bytearray terminator, and can't be used as a byte
    in a file name.  This rules out UTF-16, UTF-32, and widechar EUC
    encodings, as well as some very rare ones.

2.  An ASCII character always translates to the Unicode character with
    the same code (ie, "to itself").  It is not a part of other
    sequences (control sequences, or a trailing byte).  This rules out
    EBCDIC, ISO-2022-*, Shift JIS, and Big5, among the encodings I'm
    familiar with.  EBCDIC because only by accident will an EBCDIC
    character map to the same ASCII character with the same code.  The
    ISO-2022-* encodings are out because ASCII characters are used in
    escape sequences.  Shift JIS and Big5 because in those encodings,
    a high-bit-set octet signals the start of a multibyte sequence,
    and some of the trailing bytes may be in the ASCII range.
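
Both criteria can be checked against Python's own codecs.  This is a
small sketch; the helper name is made up, and KATAKANA LETTER SO is
chosen because its Shift JIS encoding is a well-known case of a
trailing byte in the ASCII range.

```python
def encoded_bytes_in_ascii_range(ch, encoding):
    """Return the bytes below 0x80 that appear in the encoding of ch."""
    return [b for b in ch.encode(encoding) if b < 0x80]


katakana_so = "\u30bd"  # KATAKANA LETTER SO, a non-ASCII character

# Criterion 2: a non-ASCII character must not use ASCII-range bytes.
in_shift_jis = encoded_bytes_in_ascii_range(katakana_so, "shift_jis")
in_euc_jp = encoded_bytes_in_ascii_range(katakana_so, "euc_jp")
in_utf_8 = encoded_bytes_in_ascii_range(katakana_so, "utf-8")

# Criterion 1: NUL bytes must not appear inside an encoded name.
utf16_has_nul = 0 in "a".encode("utf-16-le")
```

Here in_shift_jis contains 0x5C (the ASCII backslash), while the packed
EUC and UTF-8 encodings of the same character use only high-bit-set
octets.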

What's left?  Well, UTF-8, all of the ISO-8859 sets, several national
standards (such as the KOI8 family for Cyrillic), IBM and Microsoft
"code pages", and the "packed" EUC encodings used for Japanese,
Chinese, and Korean.  These all have the characteristic that ASCII is
ASCII, and all non-ASCII characters are encoded using only
high-bit-set octets.  In fact, in practice, on Unix these are
invariably what you encounter.

So what's the problem?  Backward compatibility for Microsoft OSes,
which not only used to use MBCS national character sets, but
"cleverly" packed more characters into the encoding by using ASCII as
trailing bytes.  Ie, the aforementioned "insane" Shift JIS (which is
mandated by the leading Japanese cellphone service provider even
today) and Big5 (the leading encoding for Chinese until very
recently).  These are very commonly found on archival media, and even
on USB keys and so on which tend to be FAT-formatted.  This doesn't
prevent usage of the Unicode APIs, but up to Windows 2000 most
Japanese vendors' OEM version of Windows used FAT format and Shift JIS
as the file system encoding, and I know of Japanese offices where
Windows 98 systems were in use as recently as early 2007.

It's the removable media which are the problem, because on Windows you
just use the Unicode APIs.  But they're not available on Unix, so you
need the byte-oriented APIs.

Is this a real problem?  I don't know, I don't do Windows, I don't do
computing with my cellphone, and I don't need to get Japanese (that
might be mixed with Russian ones!!) filenames off of ancient media or
CIFS fileshares using Shift JIS.  I guess it's possible that
cellphones do everything *except* add filenames to directories in
Shift JIS, but the filenames are in UTF-16.

OTOH, it seems to me that an *optional* extension to handling error on
ASCII is technically feasible and would be nearly trivial to add to
the PEP.  The biggest cost would be adding the error argument to
various functions (as Zooko requested) so that
surrogate-replace-extended could be specified if needed.

 > > Footnotes: 
 > > [1] Unicode 5.0 uses the terms "high-half" and "low-half" at least
 > >     once, in section 16.6, but the context is such that I take it to
 > >     refer to "half of the surrogate area".  Section 3.8 doesn't use
 > >     these, instead noting that "leading" and "trailing" are sometimes
 > >     used instead of "high" and "low".  Better to avoid the word "half"
 > >     in PEP 383, I think.
 > > 
 > "Leading" and "trailing" simply state the order, not the set ("high" or
 > "low"), so are not good terms to use.

But it's the order that's important.  If you've just finished reading
a character, and encounter a trailing surrogate, then it was produced
by the 'utf8b' error handler; nothing else in a Python codec can do
that.  If you've just finished reading a character, are in a UTF-16
Python, and encounter a leading surrogate, then you immediately gobble
the following code, which must be a trailing surrogate, and combine
them to produce a character.  The remaining case is that you encounter
a valid character.  Anything else is an error, and (assuming no bugs),
no Python codec will produce anything else.
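
The three cases above can be sketched as a scanner over a list of
16-bit code units.  This is an illustration only: current Python
strings are not stored as UTF-16, so the input here is a plain list of
integers, and the function name and the "char" / "escaped-byte" labels
are made up for the example.

```python
def classify_units(units):
    """Classify 16-bit code units as characters or escaped bytes."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # Leading surrogate: the next unit must be a trailing one.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("leading surrogate without a trailing one")
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            result.append(("char", 0x10000 + (high << 10) + low))
            i += 2
        elif 0xDC80 <= unit <= 0xDCFF:
            # Isolated trailing surrogate: produced by the error handler.
            result.append(("escaped-byte", unit - 0xDC00))
            i += 1
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("isolated trailing surrogate outside DC80..DCFF")
        else:
            result.append(("char", unit))
            i += 1
    return result


classified = classify_units([0x61, 0xDCE9, 0xD83D, 0xDE00])
```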

 > >     This does imply that programs that take advantage of the error
 > >     handler specified in this PEP are on their own if they accept data
 > >     from any sources that are not known to be Unicode-conforming.
 > >     OTOH, as far as I can see if other sources are known to be Unicode
 > >     conformant, it's reasonably (but not perfectly) safe to combine
 > >     them with strings from this PEP (and of course use either 'utf8b'
 > >     or 'strict', as appropriate, when passing data out of Python).
 > > 
 > Should there be a function or method to check for conformance and
 > lone surrogates?

string.encode('utf-8', errors='strict') will do for now.
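
Wrapped as a helper, that check might look like the following.  This is
a sketch that assumes a Python 3 release in which the strict UTF-8
encoder rejects lone surrogates; the function name is made up.

```python
def is_interchange_safe(text):
    """Return True if text encodes to UTF-8 under the 'strict' handler."""
    try:
        text.encode("utf-8", errors="strict")
    except UnicodeEncodeError:
        # Lone surrogates, including PEP 383 escapes, end up here.
        return False
    return True


plain_ok = is_interchange_safe("caf\u00e9.txt")
escaped_ok = is_interchange_safe("caf\udce9.txt")
```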

From google at  Tue May  5 19:45:45 2009
From: google at (MRAB)
Date: Tue, 05 May 2009 18:45:45 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> MRAB writes:
>  > > I don't think "people shouldn't be using non-ASCII-compatible
>  > > encodings for locale encodings" is a sufficient rationale for a hard
>  > > error here.  I mean, of course they *should* be using UTF-8.  Maybe
>  > > Python 3.1 should just go ahead and error on any other encoding on
>  > > POSIX platforms? <wink>
>  > > 
>  > I don't see why the error handler couldn't in principle be used with
>  > encodings other than UTF-8, although in that case all of the low
>  > surrogates should be open to use.
> I should have been more clear here, I guess.  The error handler *can*,
> and in the PEP *will be* by default, used with all "sane" locale
> encodings on POSIX.
>     It occurs to me that the PEP maybe should say that it is an error
>     to have your POSIX locale set to UTF-16 or something like that.
> What "sane" means in this context is
> 1.  ASCII NUL is the bytearray terminator, and can't be used as a byte
>     in a file name.  This rules out UTF-16, UTF-32, and widechar EUC
>     encodings, as well as some very rare ones.
It might be slightly OT, but sometimes strict UTF-8 encoding is violated
by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as
a terminator. I think I read that Microsoft sometimes does this.

From stephen at  Tue May  5 20:09:54 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 03:09:54 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

MRAB writes:

 > [snip]
 > It might be slightly OT, but sometimes strict UTF-8 encoding is violated
 > by encoding U+0000 using 2 bytes (0xC0 0x80) so that 0x00 can be used as
 > a terminator. I think I read that Microsoft sometimes does this.

Nice hack! as long as you don't let it escape.  But if 'strict' errors
on this, then PEP 383 'utf8b' will do the right thing, I think.
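
A small sketch of that behaviour, assuming a released Python 3 where the handler called 'utf8b' in this thread is available under the name 'surrogateescape':

```python
# b'\xc0\x80' is the overlong two-byte form of U+0000 described above.
overlong_nul = b'a\xc0\x80b'

# The strict decoder treats the overlong form as invalid UTF-8.
try:
    overlong_nul.decode('utf-8', 'strict')
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# The PEP 383 handler escapes each offending byte as a lone low surrogate
# and restores the original bytes when encoding again.
escaped = overlong_nul.decode('utf-8', 'surrogateescape')
round_trip = escaped.encode('utf-8', 'surrogateescape')
```

Both offending bytes are carried through as U+DCC0 and U+DC80, so the byte string survives a decode/encode round trip unchanged.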

From l.mastrodomenico at  Tue May  5 20:16:03 2009
From: l.mastrodomenico at (Lino Mastrodomenico)
Date: Tue, 5 May 2009 20:16:03 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/5 Stephen J. Turnbull <stephen at>:
> Third, it is not clear to me why non-decodable ASCII should be an
> error.

The PEP originally allowed the conversion to U+DCxx of bytes below 128
that cannot be decoded by the encoding used, but this creates
potential security problems.

See: <>

Lino Mastrodomenico

From martin at  Tue May  5 22:46:26 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 05 May 2009 22:46:26 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

>  > > Perhaps. However, utf-8b doesn't really have to do anything with utf-8 -
>  > > it's an algorithm based on 16-bit or 32-bit code points.
> I don't understand this phrasing.  The algorithm is only applicable to
> ASCII-compatible octet streams.  It results in code points by a simple
> displacement of octet -> octet + 0xDC00.  It cannot be used on (say)
> UTF-32 to deal with embedded surrogates.
> Certainly, the computation requires (at least) 16 bit numbers, but the
> input must be restricted to a stream of 8-bit code points, while the
> output is 16- or 32-bit code points.

Right - the algorithm maps between bytes and 16/32-bit code units.
It works, in particular, for UTF-8, and was originally proposed to apply
to UTF-8 - but it can work in any other place that converts bytes to
16/32-bit code units as well.
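
The displacement can be sketched in a few lines. The helper names below are hypothetical; the comparison uses the ASCII codec with the handler's released name, 'surrogateescape', to show the mapping is not tied to UTF-8:

```python
def escape_undecodable(octet):
    """Map one undecodable byte (0x80..0xFF) to a lone low surrogate."""
    if not 0x80 <= octet <= 0xFF:
        raise ValueError('only non-ASCII bytes are escaped')
    return chr(0xDC00 + octet)


def unescape(char):
    """Inverse mapping: U+DC80..U+DCFF back to the original byte value."""
    code = ord(char)
    if not 0xDC80 <= code <= 0xDCFF:
        raise ValueError('not an escaped byte')
    return code - 0xDC00


manual = 'caf' + escape_undecodable(0xE9)
builtin = b'caf\xe9'.decode('ascii', 'surrogateescape')
```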


From martin at  Tue May  5 23:01:49 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 05 May 2009 23:01:49 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

> I have three substantive comments.  First, although consequences for
> Python 3 byte interfaces (ie, "none") are explicitly stated, as far as
> I can see this PEP could apply to Python 2 as well.  I don't think
> it's intended that way.  Either way, I think you should clarify that
> point.

Done: the Python-Version header already clarifies that point.

> Second, I suggest "surrogate-replace" as the name of the error handler
> rather than "utf8b".

I think this is bike-shedding.

> Third, it is not clear to me why non-decodable ASCII should be an
> error.  There are plenty of low surrogates for the purpose.  Is there
> another technical reason?  Stupid or not, Shift-JIS- and Big5-encoded
> file systems are quite common in Asia still (including non-rewritable
> media).  I think surrogate-replacement of ASCII should at least be an
> option.

It's a security risk. If U+DCXX would map to \xXX, then somebody could
embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
sanitized, nobody would expect that this will actually access ../
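
As a sketch of the restriction, assuming a released Python 3 where the handler is named 'surrogateescape': escaped high bytes encode back to bytes, while surrogates that would map into the ASCII range are refused. The helper try_encode is made up for illustration:

```python
def try_encode(text):
    """Encode with the PEP 383 handler; return None if it refuses."""
    try:
        return text.encode('utf-8', 'surrogateescape')
    except UnicodeEncodeError:
        return None


high_byte = try_encode('\udcff')                # escaped form of byte 0xFF
ascii_range = try_encode('\udc2e\udc2e\udc2f')  # would spell "../" if mapped
```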

> 1.  There is no such thing as a "half-surrogate" in Unicode.  "Lone
>     surrogate" is clear enough.  Or for somewhat fancier English,
>     "isolated surrogate" or "non-syntactic surrogate".  To emphasize
>     that Python codecs will only produce them in contexts where a
>     Unicode character or high surrogate (for UTF-16 Python) is
>     syntactically required, "isolated low surrogate" or "isolated
>     trailing surrogate" might be good.[1]

Fixed. I removed the word "half" everywhere. It really doesn't mean
anything to me (it could have been called sunnygate instead, making
no difference).

I tried to understand "surrogate", and it was explained to me that
"surrogate" is something that stands for something - but then I
would argue that the two subsequent codes form a surrogate - they
stand for something else. The individual surrogate code (in Unicode
terminology) doesn't stand for anything. So don't you agree that
it is the Unicode terminology that is in error, not the PEP?

> 2.  The specification should state, and the discussion emphasize, that
>     strings which were produced by surrogate replacement *must not* be
>     used in data interchange with systems that do not specifically
>     accept such strings, and that this is the responsibility of the
>     application.[2]

No. The specification puts no requirements on applications whatsoever.
So if you propose to use MUST NOT in the RFC 2119 sense, I strongly
disagree.

Applications that desire mojibake are free to produce it; we are
consenting adults; and all that.

> 3.  In the discussion, the transition from the example of alternative
>     use of 'python-escape' to discussion of the error handler
>     interface extension is a bit abrupt.  I suggest rewriting as:
>     """The extension to the encode error handler interface proposed by
>     this PEP is necessary to implement the 'utf8b' error handler,
>     because there are required byte sequences which cannot be
>     generated from replacement Unicode.  However, the encode error
>     handler interface presently requires replacement Unicode to be
>     provided in lieu of the non-encodable Unicode from the source
>     string.  Then it promptly encodes that replacement Unicode.  In
>     some error handlers, such as the 'utf8b' proposed here, it is also
>     simpler and more efficient for the error handler to provide a
>     pre-encoded replacement byte string, rather than forcing it to
>     calculate Unicode from which the encoder would create the
>     desired bytes."""

Unfortunately, I failed to understand where you want this text to
go. What paragraphs should I remove, or (if none), after which
paragraph should I insert this text?


From martin at  Tue May  5 23:44:25 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 05 May 2009 23:44:25 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

>     It occurs to me that the PEP maybe should say that it is an error
>     to have your POSIX locale set to UTF-16 or something like that.

No. It is *impossible* to have UTF-16 as the locale character set,
not an error. Your statement is like saying "it is an error to
breathe in the vacuum".

In any case, the discussion says

# Encodings that are not compatible with ASCII are not supported by
# this specification; bytes in the ASCII range that fail to decode
# will cause an exception. It is widely agreed that such encodings
# should not be used as locale charsets.


From mal at  Wed May  6 02:26:31 2009
From: mal at (M.-A. Lemburg)
Date: Wed, 06 May 2009 02:26:31 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Martin v. Löwis wrote:
>> I have three substantive comments.  First, although consequences for
>> Python 3 byte interfaces (ie, "none") are explicitly stated, as far as
>> I can see this PEP could apply to Python 2 as well.  I don't think
>> it's intended that way.  Either way, I think you should clarify that
>> point.
> Done: the Python-Version header already clarifies that point.
>> Second, I suggest "surrogate-replace" as the name of the error handler
>> rather than "utf8b".
> I think this is bike-shedding.

The name "utf8b" suggested in the PEP is not in line with the codec
design and causes confusion with an existing codec of a similar name.

Error handlers and codecs are two different things, so the namespaces
need to be clearly separate.

Please change the name of the error handler to a different name that
does not resemble or cause confusion with a codec name and fits the
scheme of error handler names we already have in place in Python for
replacing error handlers, i.e. "XYZreplace".

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, May 06 2009)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...
2009-06-29: EuroPython 2009, Birmingham, UK                53 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! :::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From stephen at  Wed May  6 07:10:41 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 14:10:41 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:
 > >     It occurs to me that the PEP maybe should say that it is an error
 > >     to have your POSIX locale set to UTF-16 or something like that.
 > No. It is *impossible* to have UTF-16 as the locale character set,
 > not an error. Your statement is like saying "it is an error to
 > breathe in the vacuum".

I realize this is not useful, so maybe you don't need to mention it.
However, it certainly is possible to set LANG with an absurd, or
merely dangerous, encoding.

 > In any case, the discussion says
 > # Encodings that are not compatible with ASCII are not supported by
 > # this specification; bytes in the ASCII range that fail to decode
 > # will cause an exception. It is widely agreed that such encodings
 > # should not be used as locale charsets.

Which is your excuse for not supporting Shift JIS fully.  It doesn't
stop people from setting LC_ALL=ja_JP.shift_jis, or using Shift JIS as
the default encoding for certain media.

From stephen at  Wed May  6 07:35:30 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 14:35:30 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

Lino Mastrodomenico writes:
 > 2009/5/5 Stephen J. Turnbull <stephen at>:
 > > Third, it is not clear to me why non-decodable ASCII should be an
 > > error.
 > The PEP originally allowed the conversion to U+DCxx of bytes below 128
 > that cannot be decoded by the encoding used, but this creates
 > potential security problems.
 > See: <>

Yeah, yeah, this is the same old same old from PEP 3131.  Anything
that handles the various attacks based on ASCII-alike characters
should at least rule out invalid Unicode, too!

And where is this U+DC2F supposed to be coming from, anyway?  The
user's *local* environment or the user's *local* filesystem!  Codecs
not using 'utf8b' can't produce it, so the only other cases are chr()
and \u literals in the *local* process, or an already broken module in
your code.  I really can't imagine that any sane programmer these days
would be using 'utf8b' on bytes received from the Internet!

Of course I can't prove that there's no vector for an exploit here (in
fact, I'm sure there is one with sufficiently careless handling of
input), but I think "consenting adults" covers the Shift JIS use case.
Make it an option, but it should be explicitly part of the PEP.

From stephen at  Wed May  6 08:06:07 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 15:06:07 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:

 > Done: the Python-Version header already clarifies that point.

Ah, OK.  I wish my day job required reading more PEPs so I'd be more
familiar with these formalities. :-)

 > > Second, I suggest "surrogate-replace" as the name of the error handler
 > > rather than "utf8b".
 > I think this is bike-shedding.

I don't personally care (I already was aware of UTF-8B), but there are
plenty of others who do.  I think that's a good name to make
Marc-Andre and Terry happier.  You have to fix the existing uses of
the obsolete "python-escape", anyway.

 > It's a security risk. If U+DCXX would map to \xXX, then somebody could
 > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
 > sanitized, nobody would expect that this will actually access ../

The odds that anybody will actually take notice of U+002E U+002E
U+002F in a string are sufficiently small that any number of exploits
have already been based on it.  I agree that there is some additional
risk from this if people make the check for "../" before they prepend
"\udc2e\udc2e\udc2f", but I think that risk is very small compared to
the pain of having a error handler whose raison d'etre is to not raise
exceptions go ahead and raise them anyway.

See also my reply to Lino Mastrodomenico.  Again, an option is good
enough for my purposes as long as interfaces for os.listdir() and the
like support setting the error handler (cf. Zooko's proposal), but I
think the option should be available.

 > I tried to understand "surrogate", and it was explained to me that
 > "surrogate" is something that stands for something - but then I
 > would argue that the two subsequent codes form a surrogate - they
 > stand for something else. The individual surrogate code (in Unicode
 > terminology) doesn't stand for anything. So don't you agree that
 > it is the Unicode terminology that is in error, not the PEP?

Plausibly so.  Keep making comments like that and nobody will ever let
you off the hook for being a non-native speaker!

However, "surrogate" in English is typically used in situations that
are too complex to be covered by simply "substitution."  I've always
read "surrogate" as "alternative form of encoding", and "surrogate
code point" as "code point in that alternative form of encoding".
Where it's an alternative to code-point-is-scalar-value.  I think
probably the authors of the terminology just made the best of a bad
situation, I can't think of a better single word for this.

 > No. The specification puts no requirements on applications whatsoever.
 > So if you propose to use MUST NOT in the RFC 2119 sense, I strongly
 > disagree.

I do propose that.

But you're writing the PEP, so this battle will have to be deferred.
Eventually Python will have to take a stand on Unicode conformance,
but it's not urgent yet.

 > > 3.  In the discussion, the transition from the example of alternative
 > >     use of 'python-escape' to discussion of the error handler
 > >     interface extension is a bit abrupt.  I suggest rewriting as:
 > > 
 > >     """The extension to the encode error handler interface proposed by
 > >     this PEP is necessary to implement the 'utf8b' error handler,
 > >     because there are required byte sequences which cannot be
 > >     generated from replacement Unicode.  However, the encode error
 > >     handler interface presently requires replacement Unicode to be
 > >     provided in lieu of the non-encodable Unicode from the source
 > >     string.  Then it promptly encodes that replacement Unicode.  In
 > >     some error handlers, such as the 'utf8b' proposed here, it is also
 > >     simpler and more efficient for the error handler to provide a
 > >     pre-encoded replacement byte string, rather than forcing it to
 > >     calculate Unicode from which the encoder would create the
 > >     desired bytes."""
 > Unfortunately, I failed to understand where you want this text to
 > go. What paragraphs should I remove, or (if none), after which
 > paragraph should I insert this text?

Sorry!  I suggest substituting the paragraph above for the paragraph
which begins "The encode error handler interface presently requires..."
at line 129.
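
A minimal sketch of the extended interface, assuming Python 3, where a handler registered through codecs.register_error may return bytes as its replacement. The handler name 'demo-bytes-escape' and the function below are hypothetical and only mimic what the proposed handler does on encoding:

```python
import codecs


def demo_bytes_escape(exc):
    """Encode-side handler that returns pre-encoded bytes, not Unicode."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append(code - 0xDC00)   # recover the original byte
        else:
            raise exc                   # anything else stays an error
    # A bytes replacement is copied to the output without being re-encoded.
    return bytes(out), exc.end


codecs.register_error('demo-bytes-escape', demo_bytes_escape)

encoded = 'caf\udce9'.encode('ascii', 'demo-bytes-escape')
```

Byte 0xE9 has no Unicode replacement that the ASCII encoder could turn back into 0xE9, which is why the handler has to hand over bytes directly.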

I think I forgot to do this before:  "I hereby dedicate all text
I suggest for inclusion in the PEP to the public domain."

From martin at  Wed May  6 09:31:00 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 09:31:00 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

> The name "utf8b" suggested in the PEP is not in line with the codec
> design

Where is that design documented, and how exactly does the name violate
the design (chapter and verse, please)?

> Error handlers and codecs are two different things, so the namespaces
> need to be clearly separate.

They *are* separate namespaces; that's guaranteed by the implementation.


From martin at  Wed May  6 09:36:01 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 09:36:01 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>  > >     It occurs to me that the PEP maybe should say that it is an error
>  > >     to have your POSIX locale set to UTF-16 or something like that.
>  > 
>  > No. It is *impossible* to have UTF-16 as the locale character set,
>  > not an error. Your statement is like saying "it is an error to
>  > breathe in the vacuum".
> I realize this is not useful, so maybe you don't need to mention it.
> However, it certainly is possible to set LANG with an absurd, or
> merely dangerous, encoding.

How so? The C library will filter it out.

>  > In any case, the discussion says
>  > 
>  > # Encodings that are not compatible with ASCII are not supported by
>  > # this specification; bytes in the ASCII range that fail to decode
>  > # will cause an exception. It is widely agreed that such encodings
>  > # should not be used as locale charsets.
> Which is your excuse for not supporting Shift JIS fully.  It doesn't
> stop people from setting LC_ALL=ja_JP.shift_jis, 

Well, it *does* stop them from doing so if their systems don't support
the locale setting.

In any case, if they do this, PEP 383 will not support them.

> or using Shift JIS as the default encoding for certain media.

I fail to see how this could ever matter. If, by "media", you mean
things like removable disks, and the file name encoding used on them,
it's fairly irrelevant for the PEP, since Python won't start using
Shift JIS as its file system encoding just because that's the encoding
used on the disk.


From martin at  Wed May  6 09:53:33 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 09:53:33 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

>  > > Second, I suggest "surrogate-replace" as the name of the error handler
>  > > rather than "utf8b".
>  > 
>  > I think this is bike-shedding.
> I don't personally care (I already was aware of UTF-8B), but there are
> plenty of others who do. 

I think it is a fairly bad name, because it is easy to confuse it with
the "surrogates" error handler (unless you suggest to rename that also).

> You have to fix the existing uses of
> the obsolete "python-escape", anyway.

Indeed - but only in the PEP. In the implementation, it's already utf8b
throughout. Now it is also in the PEP; thanks for pointing that out.

>  > It's a security risk. If U+DCXX would map to \xXX, then somebody could
>  > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
>  > sanitized, nobody would expect that this will actually access ../
> The odds that anybody will actually take notice of U+002E U+002E
> U+002F in a string are sufficiently small that any number of exploits
> have already been based on it.  I agree that there is some additional
> risk from this if people make the check for "../" before they prepend
> "\udc2e\udc2e\udc2f", but I think that risk is very small compared to
> the pain of having a error handler whose raison d'etre is to not raise
> exceptions go ahead and raise them anyway.

The problem is that functions like normpath will recognize ../, and
that applications rely on them for file name sanitation. If they could
be tricked into writing outside of their target folders, this would
be a huge security risk.
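
A sketch of the scenario, using a deliberately unsafe and hypothetical mapping (naive_unescape) that turns every U+DCxx into byte xx, which is what the PEP avoids for the ASCII range:

```python
import posixpath


def naive_unescape(text):
    """Hypothetical unsafe mapping: U+DCxx -> byte xx for *every* xx."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if 0xDC00 <= code <= 0xDCFF:
            out.append(code - 0xDC00)
        else:
            out.extend(ch.encode('utf-8'))
    return bytes(out)


name = '/srv/uploads/\udc2e\udc2e\udc2fsecret'
sanitized = posixpath.normpath(name)     # finds no ".." component to remove
on_disk = naive_unescape(sanitized)      # bytes that would reach the OS
resolved = posixpath.normpath(on_disk)   # where those bytes really point
```

The string passes through normpath untouched, yet the bytes produced by the naive mapping contain a real "../" and resolve outside the upload directory.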

OTOH, I don't care about breaking applications on misconfigured systems.
People using SJIS as their locale encodings have bigger problems
than Python raising exceptions.

> See also my reply to Lino Mastrodomenico.


> But you're writing the PEP, so this battle will have to be deferred.
> Eventually Python will have to take a stand on Unicode conformance,
> but it's not urgent yet.

I think it's always applications that are conforming or not, rather
than libraries. Libraries should allow writing conforming applications.
They may refuse to write certain non-conforming applications (although
users then replace the library with one that does allow them to do
what they want). Libraries can never enforce that applications conform
to some standard.

> Sorry!  I suggest substituting the paragraph above for the paragraph
> which begins "The encode error handler interface presently requires..."
> at line 129.

Ah, ok. This was Glen Linderman's text before - now it's yours :-)

> I think I forgot to do this before:  "I hereby dedicate all text
> I suggest for inclusion in the PEP to the public domain."



From martin at  Wed May  6 10:03:47 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 10:03:47 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

> Yeah, yeah, this is the same old same old from PEP 3131.  Anything
> that handles the various attacks based on ASCII-alike characters
> should at least rule out invalid Unicode, too!
> And where is this U+DC2F supposed to be coming from, anyway?  The
> user's *local* environment or the user's *local* filesystem! 

Why is that not a threat? Suppose you have a setuid application, and
you pass some string on the command line that decodes to /../. Then
the setuid application will be tricked into modifying files it didn't
mean to modify.

Likewise, it might come from a relational database. Use a relational
database that supports unicode code units, or lone surrogates through
utf-8, and fill in some bogus data. Then have the Python application
(running as root) read it.

> Of course I can't prove that there's no vector for an exploit here (in
> fact, I'm sure there is one with sufficiently careless handling of
> input), but I think "consenting adults" covers the Shift JIS use case.
> Make it an option, but it should be explicitly part of the PEP.

Nothing is lost at the moment. If users complain, we can still think
of ways to enhance the experience.

In any case, Python 3.1b1 may get released today, so it's way too late
for new features in the PEP. They can wait for Python 3.2.


From ziade.tarek at  Wed May  6 11:01:14 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 6 May 2009 11:01:14 +0200
Subject: [Python-Dev] Help on issue 5941
Message-ID: <>


I need some help on

The bug is quite simple: the Distutils unixcompiler used to set the
archiver command to "ar -rc".

For quite a while now, this behavior has changed in order to be able
to customize the compiler behavior from
the environment. That introduced a regression because the mechanism in
Distutils that looks for the
AR variable in the environment also looks into the Makefile of Python.
(in the Makefile, then in os.environ)

And as a matter of fact, AR is set to "ar" in there, so the -cr option
is not set anymore.

So my question is: should I make a change to the Makefile by adding,
for example, a variable called AR_OPTIONS,
and then build the ar command with AR + AR_OPTIONS?


Or does that not make sense, and should I just change the behavior so it
doesn't look for AR in the Makefile (just in os.environ)?
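
A rough sketch of the first option, assuming a lookup order of environment first and Makefile second, with "-cr" as the fallback. The function archiver_command is hypothetical and is not the Distutils API; AR_OPTIONS is the variable name proposed above:

```python
import shlex


def archiver_command(environ, makefile_vars):
    """Build the archiver argv; the environment wins over the Makefile."""
    def lookup(name, default):
        return environ.get(name) or makefile_vars.get(name) or default

    ar = lookup('AR', 'ar')
    options = lookup('AR_OPTIONS', '-cr')   # restores the old default flags
    return shlex.split(ar) + shlex.split(options)


# The Makefile only defines AR, so the default options are still appended.
default_cmd = archiver_command({}, {'AR': 'ar'})
# Both values can be overridden from the environment.
custom_cmd = archiver_command({'AR': 'gar', 'AR_OPTIONS': '-r'}, {'AR': 'ar'})
```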


Tarek Ziadé

From solipsis at  Wed May  6 11:17:43 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 09:17:43 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>	<>	<>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> > I don't personally care (I already was aware of UTF-8B), but there are
> > plenty of others who do. 
> I think it is a fairly bad name, because it is easy to confuse it with
> the "surrogates" error handler (unless you suggest to rename that also).

I didn't bother to say it at the time, but I think "surrogates" is a pretty bad
name. It should be more indicative of what it does, e.g. "surrogates-pass", or

> >  > It's a security risk. If U+DCXX would map to \xXX, then somebody could
> >  > embed U+DC2E U+DC2E U+DC2F into a character string; even if this gets
> >  > sanitized, nobody would expect that this will actually access ../

Agreed this is an annoying security breach. The whole point of the PEP is that
application developers do not have to care about filename encoding issues,
which is defeated if they have to check for strange (illegal) combinations of

By the way, what are the ASCII characters that are not supported by Shift-JIS?
Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
backslash and the tilde).



From stephen at  Wed May  6 11:39:02 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 18:39:02 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:

 > I fail to see how this could ever matter. If, by "media", you mean
 > things like removable disks, and the file name encoding used on them,
 > it's fairly irrelevant for the PEP, since Python won't start using
 > Shift JIS as its file system encoding just because that's the encoding
 > used on the disk.

I'm sorry for the lack of clarity of my posts, but somehow you're
completely missing the point.  The point is precisely that Python
*won't* use Shift JIS as the file system encoding (if it did there
would be no problem with reading Shift JIS), but the people who
created the media *did*.

Now, with Python's file system encoding == UTF-8 or any packed EUC,
and more than a handful of Shift JIS or Big5 characters in file names,
one is *almost certain* to encounter ASCII as the second byte of a
multibyte sequence.  PEP 383 can't handle this, but it is sure to be
the most common use case for PEP 383 in East Asia.
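
For reference, a sketch of what a released Python 3 does with such a name when the file system encoding is UTF-8, assuming the handler's released name 'surrogateescape'. The two bytes are the Shift JIS example quoted elsewhere in this thread:

```python
# Lead byte 0x8A followed by the ASCII byte 0x7C ("|").
raw_name = b'\x8a|'

# Strict UTF-8 decoding fails on the lead byte.
try:
    raw_name.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

decoded = raw_name.decode('utf-8', 'surrogateescape')
restored = decoded.encode('utf-8', 'surrogateescape')
```

Only the lead byte is escaped (as U+DC8A). The trailing byte arrives in the string as an ordinary "|", so the original bytes can be recovered, but the string does not look like the intended character.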

From mal at  Wed May  6 11:53:12 2009
From: mal at (M.-A. Lemburg)
Date: Wed, 06 May 2009 11:53:12 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>> The name "utf8b" suggested in the PEP is not in line with the codec
>> design
> Where is that design documented, and how exactly does the name violate
> the design (chapter and verse, please)?

Martin, I designed the whole Python codec machinery, so even if
this is not explicitly written down somewhere, you can take my
word for it.

I don't want users to be confused by such an error handler
name, so please change it !

Here's a list of the currently available error handlers (taken from

        The .encode()/.decode() methods may use different error
        handling schemes by providing the errors argument. These
        string values are predefined:

         'strict' - raise a ValueError error (or a subclass)
         'ignore' - ignore the character and continue with the next
         'replace' - replace with a suitable replacement character;
                    Python will use the official U+FFFD REPLACEMENT
                    CHARACTER for the builtin Unicode codecs on
                    decoding and '?' on encoding.
         'xmlcharrefreplace' - Replace with the appropriate XML
                               character reference (only for encoding).
         'backslashreplace'  - Replace with backslashed escape sequences
                               (only for encoding).

        The set of allowed values can be extended via register_error.
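
A short sketch of the predefined handlers listed above, run on one unencodable character and one undecodable byte (the sample strings are arbitrary):

```python
text = 'caf\u00e9'   # U+00E9 cannot be encoded as ASCII

# One result per encode-side handler from the list above.
results = {
    name: text.encode('ascii', name)
    for name in ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')
}

try:
    text.encode('ascii', 'strict')
    strict_raised = False
except ValueError:          # UnicodeEncodeError is a ValueError subclass
    strict_raised = True

# On decoding, 'replace' substitutes U+FFFD for the invalid byte.
decoded = b'caf\xff'.decode('utf-8', 'replace')
```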

>> Error handlers and codecs are two different things, so the namespaces
>> need to be clearly separate.
> They *are* separate namespaces; that's guaranteed by the implementation.

In the implementation, yes, but not in the head of a typical user:
the 'utf8b' looks more like a codec name than an error handler

I want to avoid any such confusion with Python codecs and don't
understand why you are making a problem out of this.

Marc-Andre Lemburg

From google at  Wed May  6 12:08:45 2009
From: google at (MRAB)
Date: Wed, 06 May 2009 11:08:45 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Martin v. Löwis wrote:
>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly does the name violate
>> the design (chapter and verse, please)?
> Martin, I designed the whole Python codec machinery, so even if
> this is not explicitly written down somewhere, you can take my
> word for it.
> I don't want users to be confused by such an error handler
> name, so please change it !
> Here's a list of the currently available error handlers (taken from
>         The .encode()/.decode() methods may use different error
>         handling schemes by providing the errors argument. These
>         string values are predefined:
>          'strict' - raise a ValueError error (or a subclass)
>          'ignore' - ignore the character and continue with the next
>          'replace' - replace with a suitable replacement character;
>                     Python will use the official U+FFFD REPLACEMENT
>                     CHARACTER for the builtin Unicode codecs on
>                     decoding and '?' on encoding.
>          'xmlcharrefreplace' - Replace with the appropriate XML
>                                character reference (only for encoding).
>          'backslashreplace'  - Replace with backslashed escape sequences
>                                (only for encoding).
>         The set of allowed values can be extended via register_error.
>>> Error handlers and codecs are two different things, so the namespaces
>>> need to be clearly separate.
>> They *are* separate namespaces; that's guaranteed by the implementation.
> In the implementation, yes, but not in the head of a typical user:
> the 'utf8b' looks more like a codec name than an error handler
> name.
Judging by the existing names, I think that 'surrogate' would be
reasonable. It already contains the meaning of substitute, it's not too
long, and the codes which act as replacements are already called

> I want to avoid any such confusion with Python codecs and don't
> understand why you are making a problem out of this.

From solipsis at  Wed May  6 12:11:56 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 10:11:56 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

MRAB <google <at>> writes:
> Judging by the existing names, I think that 'surrogate' would be
> reasonable. It already contains the meaning of substitute,

Only if you are a native English-speaker I suppose... For me it's just a
technical term denoting a certain class of unicode code points (I'm not sure of
the latter terminology ;-)).



From l.mastrodomenico at  Wed May  6 12:22:50 2009
From: l.mastrodomenico at (Lino Mastrodomenico)
Date: Wed, 6 May 2009 12:22:50 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/6 Antoine Pitrou <solipsis at>:
> By the way, what are the ASCII characters that are not supported by Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).

The biggest problem with Shift-JIS is that a perfectly valid unicode
character above 127 can be encoded to a byte sequence that includes
bytes in range(128).

E.g. the character 掛 (a.k.a. '\u639b') when encoded with Shift-JIS
becomes the two-byte sequence b'\x8a|'. Notice that the second byte
is 124, which on POSIX is usually interpreted as the pipe character
and can have security implications.
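
The example above can be reproduced with a short sketch, assuming Python 3 and its built-in shift_jis codec; the variable names are illustrative only.

```python
# The character from the message above, written as an escape so the
# example does not depend on the encoding of this text.
ch = "\u639b"

sjis = ch.encode("shift_jis")
utf8 = ch.encode("utf-8")

# In Shift-JIS the trailing byte falls in the ASCII range (it is the pipe).
sjis_trail_is_ascii = sjis[1] < 128
sjis_trail_is_pipe = sjis[1] == ord("|")

# In UTF-8 every byte of a non-ASCII character is >= 128, so no byte of
# the encoded form can be mistaken for an ASCII character.
utf8_has_ascii_byte = any(b < 128 for b in utf8)

print(sjis, sjis_trail_is_ascii, sjis_trail_is_pipe, utf8_has_ascii_byte)
```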

It's a known problem with Shift-JIS and was fixed in UTF-8.

Lino Mastrodomenico

From regebro at  Wed May  6 12:28:22 2009
From: regebro at (Lennart Regebro)
Date: Wed, 6 May 2009 12:28:22 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, May 6, 2009 at 09:31, "Martin v. Löwis" <martin at> wrote:
> They *are* separate namespaces; that's guaranteed by the implementation.

Yes. But utf8b *sounds like* an encoding. When it isn't. I sure
thought it was when it was first mentioned. I agree that it would be
better to find another name.


Is it only usable with utf8 as an encoding?
Lennart Regebro: Python, Zope, Plone, Grok
+33 661 58 14 64

From stephen at  Wed May  6 13:39:18 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 20:39:18 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

Lino Mastrodomenico writes:

 > It's a known problem with Shift-JIS and was fixed in UTF-8.

It was fixed in EUC before Shift-JIS was invented by Microsoft or Big5
was invented by the Taiwanese clone makers.  Guido's not the only
language designer with a time machine....

From stephen at  Wed May  6 15:33:17 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 06 May 2009 22:33:17 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:

 > > Yeah, yeah, this is the same old same old from PEP 3131.  Anything
 > > that handles the various attacks based on ASCII-alike characters
 > > should at least rule out invalid Unicode, too!
 > > 
 > > And where is this U+DC2F supposed to be coming from, anyway?  The
 > > user's *local* environment or the user's *local* filesystem! 
 > Why is that not a threat? Suppose you have a setuid application, and
 > you pass some string on the command line that decodes to /../. Then
 > the setuid application will be tricked into modifying files it didn't
 > mean to modify.

Of course this is a threat, assuming that the application takes no
precautions.  But first, it should be stopped by any of several
standard precautions.  For example, applying os.path.realpath (come to
think of it, PEP 383 should say something about realpath, shouldn't
it?) and os.path.normpath (PEP 383 should definitely say something
about this function; maybe PEP 3131 should, too) before checking
access restrictions.  If you're not running your paths through those,
you're already vulnerable to symlink attacks, and maybe other forms of

Second, it's a threat already enabled by your restricted version of
PEP 383.  Access control applies to subdirectories as well as to
parent directories.  Since you can insert arbitrary non-ASCII bytes
into the path using the current definition of 'utf8b', name-based
access restrictions can be bypassed in exactly the same way for any
directory whose name is not 100.00% ASCII, and the setuid application
will be tricked into modifying files it didn't mean to modify.
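
The "standard precaution" of resolving a path before checking access restrictions could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name is_inside and the policy of "everything must stay below one base directory" are hypothetical and not taken from PEP 383; os.path.realpath, os.path.join and os.path.commonpath are the standard library functions.

```python
import os


def is_inside(base_dir, user_path):
    """Return True if user_path, resolved relative to base_dir, stays below it.

    realpath() collapses '..' components and follows symlinks, so the
    comparison is made on the path the OS would actually open, not on the
    string the caller supplied.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath() compares whole path components, so '/srv/data2' is not
    # treated as being inside '/srv/data'.
    return os.path.commonpath([base, target]) == base
```

A caller would run this check on the decoded name and refuse the request when it returns False. The result is only valid at the moment of the check, so a race against a concurrent rename or symlink change is still possible.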

Also, on Mac OS X, system directories, including directories
containing system libraries, frameworks, and executables, may be
accessible via locale-specific names (I don't have a Japanese-
localized Mac at hand to check, but I'm pretty sure in my old Mac the
Japanese names appeared in ls in, which means it may be
possible to access system directories containing libraries,
frameworks, and executables this way).  Those can be spoofed in
exactly the same way.

 > Nothing is lost at the moment.

Nothing is lost compared to 'strict', true, but under the PEP as it is
a large fraction of Shift JIS and Big5 filenames cannot be read under
ASCII-compatible file system encodings using 'utf8b'.  Yet it is those
users who are placed at risk by PEP 383.

 > In any case, Python 3.1b1 may get released today, so it's way too late
 > for new features in the PEP. They can wait for Python 3.2.

You have convinced me that the PEP should wait as well.

In its current form it is incomplete and dangerous.

From solipsis at  Wed May  6 15:40:16 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 13:40:16 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>
Message-ID: <>

Stephen J. Turnbull <stephen <at>> writes:
> Nothing is lost compared to 'strict', true, but under the PEP as it is
> a large fraction of Shift JIS and Big5 filenames cannot be read under
> ASCII-compatible file system encodings using 'utf8b'.

You should really be more specific. I'm not sure about others, but I don't
understand what filenames you are talking about.

From rdmurray at  Wed May  6 15:55:16 2009
From: rdmurray at (R. David Murray)
Date: Wed, 6 May 2009 09:55:16 -0400 (EDT)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 6 May 2009 at 13:40, Antoine Pitrou wrote:
> Stephen J. Turnbull <stephen <at>> writes:
>> Nothing is lost compared to 'strict', true, but under the PEP as it is
>> a large fraction of Shift JIS and Big5 filenames cannot be read under
>> ASCII-compatible file system encodings using 'utf8b'.
> You should really be more specific. I'm not sure about others, but I don't
> understand what filenames you are talking about.

Seems to me that the best thing to do would be to file a bug report with
test cases that demonstrate the problems when run against the current
py3k trunk.

Especially the security issues you cite (which I don't understand).


From zooko at  Wed May  6 15:48:57 2009
From: zooko at (Zooko Wilcox-O'Hearn)
Date: Wed, 6 May 2009 07:48:57 -0600
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On May 6, 2009, at 7:33 AM, Stephen J. Turnbull wrote:

> You have convinced me that the PEP should wait as well.
> In its current form it is incomplete and dangerous.

+1 on delaying PEP 383

I think PEP 383 is a good idea in principle, but I'm still struggling  
to understand it myself, and it seems to offer new hazards for the  
unwary programmer.

On the other hand, maybe the wary programmers are waiting for Python  
3.2 anyway <wink>.

On the gripping hand, if PEP 383 is released in Python 3.1, will that  
obligate python-dev to support it indefinitely, at least in backwards- 
compatibility mode?  I'm not thinking of API compatibility as much as  
data compatibility -- someone used Python 3.1 to write down some  
filenames, and now a few years later they are trying to use the  
latest and greatest Python release to read those filenames...



From foom at  Wed May  6 16:41:53 2009
From: foom at (James Y Knight)
Date: Wed, 6 May 2009 10:41:53 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote:
> Now, with Python's file system encoding == UTF-8 or any packed EUC,
> and more than a handful of Shift JIS or Big5 characters in file names,
> one is *almost certain* to encounter ASCII as the second byte of a
> multibyte sequence.  PEP 383 can't handle this

Hm, I haven't tried the implementation, but I thought that what would  
happen is:
'\x85a'.decode('utf-8', 'utf8b/surrogate-replace/whateveritscalled')
    -> u'\uDC85a'

If that indeed doesn't happen, that's certainly a defect and should be  

> , but it is sure to be
> the most common use case for PEP 383 in East Asia.
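
The behaviour described above can be tried directly. This sketch assumes a Python 3 release in which the PEP 383 handler is available under the name 'surrogateescape'; that is the name later releases ship, not one settled in this thread. Note the bytes literal, which Python 3 requires where the Python 2 style example above used a plain string.

```python
# An undecodable byte followed by an ASCII byte, as in the example above.
raw = b"\x85a"
text = raw.decode("utf-8", "surrogateescape")

# The Shift-JIS bytes from earlier in the thread, read as if they were UTF-8:
# the lead byte is escaped, the ASCII trail byte is decoded normally.
sjis_raw = b"\x8a|"
sjis_text = sjis_raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_trip_ok = (
    text.encode("utf-8", "surrogateescape") == raw
    and sjis_text.encode("utf-8", "surrogateescape") == sjis_raw
)

print(ascii(text), ascii(sjis_text), round_trip_ok)
```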



From ncoghlan at  Wed May  6 16:59:30 2009
From: ncoghlan at (Nick Coghlan)
Date: Thu, 07 May 2009 00:59:30 +1000
Subject: [Python-Dev] Undocumented change / bug in Python3's
In-Reply-To: <>
References: <>
Message-ID: <>

John Millikin wrote:
> In Python 2, PyMapping_Check will return 0 for list objects. In Python
> 3, it returns 1. Obviously, this makes it rather difficult to
> differentiate between mappings and other sized iterables. In addition,
> it differs from the behavior of the ``collections.Mapping`` ABC --
> isinstance([], collections.Mapping) returns False.
> I believe the new behavior is erroneous, but would like to confirm
> that before filing a bug.

It's not a bug.

PyMapping_Check just tells you if a type has an entry in the
tp_as_mapping->mp_subscript slot. In 2.x, it used to have an additional
condition that the tp_as_sequence->sq_slice slot be empty, but that has
gone away in Py3k because the sq_slice slot has been removed.

Even in 2.x that test wasn't a reliable way of telling if something was
a mapping or a sequence - it happened to get it right for lists and
tuples (since they define __getslice__ and __setslice__), but this is
not the case for new-style user defined sequences:

>>> from operator import isMappingType
>>> class MySeq(object):
...   def __getitem__(self, idx):
...     # Is this a mapping or an unsliceable sequence?
...     return idx*2
>>> isMappingType(MySeq())

Use the new collections module ABCs to check for sequences and
mappings. That's what they're for, and they will give you a much more
reliable answer than the C level checks (which are really just an
implementation detail).
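
A minimal sketch of the ABC-based check, assuming a current Python 3 where the ABCs live in collections.abc (the message above refers to them as collections.Mapping); the classify helper is hypothetical.

```python
from collections.abc import Mapping, Sequence


class MySeq:
    """Defines only __getitem__, like the class in the example above."""

    def __getitem__(self, idx):
        return idx * 2


def classify(obj):
    """Classify an object by the ABCs it is registered with or inherits from."""
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, Sequence):
        return "sequence"
    return "neither"


print(classify({}), classify([]), classify(MySeq()))
```

A class that only defines __getitem__ is reported as neither, which is the honest answer: it has to inherit from or register with an ABC to declare what it is.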


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Wed May  6 18:54:37 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 16:54:37 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>
Message-ID: <>

Zooko Wilcox-O'Hearn <zooko <at>> writes:
> I'm not thinking of API compatibility as much as  
> data compatibility -- someone used Python 3.1 to write down some  
> filenames, and now a few years later they are trying to use the  
> latest and greatest Python release to read those filenames...

Well, if the filenames are generated by Python (as opposed to read from an
existing directory on disk), they should be regular unicode objects without any
lone surrogates, so I don't see the compatibility problem.



From v+python at  Wed May  6 19:05:01 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 10:05:01 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

On approximately 5/6/2009 6:33 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> "Martin v. Löwis" writes:
>  > In any case, Python 3.1b1 may get released today, so it's way too late
>  > for new features in the PEP. They can wait for Python 3.2.
> You have convinced me that the PEP should wait as well.
> In its current form it is incomplete and dangerous.

I see nothing in this thread that suggests that the PEP is dangerous in 
its current form.

While I (still) think that more readable transcodings could have been 
used, and while I had difficulty fully understanding the PEP at first, 
now that I think I do understand the PEP, and it has been somewhat 
clarified and amended, I cannot see how it could be dangerous.  A 
specific case of danger should be included with such a statement.

Regarding incomplete, I agree it won't brush my teeth for me, but I 
think it does solve the problem it sets out to solve.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at  Wed May  6 19:08:22 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 10:08:22 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

On approximately 5/6/2009 3:08 AM, came the following characters from 
the keyboard of MRAB:
> M.-A. Lemburg wrote:
>> Martin v. Löwis wrote:

> Judging by the existing names, I think that 'surrogate' would be
> reasonable. It already contains the meaning of substitute, it's not too
> long, and the codes which act as replacements are already called
> surrogates.
>> I want to avoid any such confusion with Python codecs and don't
>> understand why you are making a problem out of this.

+1 for "surrogate" as the name for the error handler.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From v+python at  Wed May  6 19:11:15 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 10:11:15 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On approximately 5/6/2009 12:53 AM, came the following characters from 
the keyboard of Martin v. Löwis:

>> Sorry!  I suggest substituting the paragraph above for the paragraph
>> which begins "The encode error handler interface presently requires..."
>> at line 129.
> Ah, ok. This was Glenn Linderman's text before - now it's yours :-)

Which is fine by me.  Stephen's is more explanatory than mine, but says 
the same thing.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From tjreedy at  Wed May  6 21:13:55 2009
From: tjreedy at (Terry Reedy)
Date: Wed, 06 May 2009 15:13:55 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <gtsnhj$sgv$>

Glenn Linderman wrote:
> On approximately 5/6/2009 3:08 AM, came the following characters from 
> the keyboard of MRAB:
>> M.-A. Lemburg wrote:
>>> Martin v. Löwis wrote:
>> Judging by the existing names, I think that 'surrogate' would be
>> reasonable. It already contains the meaning of substitute, it's not too
>> long, and the codes which act as replacements are already called
>> surrogates.
>>> I want to avoid any such confusion with Python codecs and don't
>>> understand why you are making a problem out of this.
> +1 for "surrogate" as the name for the error handler.
+1 from me also

From zooko at  Wed May  6 21:18:03 2009
From: zooko at (Zooko Wilcox-O'Hearn)
Date: Wed, 6 May 2009 13:18:03 -0600
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:

> Zooko Wilcox-O'Hearn <zooko <at>> writes:
>> I'm not thinking of API compatibility as much as data  
>> compatibility -- someone used Python 3.1 to write down some  
>> filenames, and now a few years later they are trying to use the  
>> latest and greatest Python release to read those filenames...
> Well, if the filenames are generated by Python (as opposed to read  
> from an existing directory on disk), they should be regular unicode  
> objects without any lone surrogates, so I don't see the  
> compatibility problem.

I meant that the application reads filenames from an existing  
directory on disk, saves those filenames, and then later, using a  
future version of Python, wants to read them and use them.

I'm not saying that I know this would be a problem.  I'm saying that  
I personally can't tell whether it would be a problem or not, and the  
extensive discussions so far have not convinced me that there is  
anyone who both understands PEP 383 and considers this use case.

Many people who apparently understand encoding issues well have said  
something to the effect that there is no problem, but those people  
haven't yet managed to get through my thick skull how I would use PEP  
383 safely for this sort of use case -- the one where data generated  
by os.listdir() travels forward in time or the one where that data  
travels sideways to other systems, including Windows or other systems  
that validate incoming unicode.

That's why I am a bit uncomfortable about PEP 383 being quickly  
implemented and deployed in Python 3.1.

By the way, much of the detailed discussion about what Tahoe requires  
and how that may or may not benefit from PEP 383 has now moved to the  
tahoe-dev mailing list: 
tahoe-dev .



From v+python at  Wed May  6 22:17:05 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 13:17:05 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On approximately 5/6/2009 12:18 PM, came the following characters from 
the keyboard of Zooko Wilcox-O'Hearn:
> On May 6, 2009, at 10:54 AM, Antoine Pitrou wrote:
>> Zooko Wilcox-O'Hearn <zooko <at>> writes:
>>> I'm not thinking of API compatibility as much as data compatibility 
>>> -- someone used Python 3.1 to write down some filenames, and now a 
>>> few years later they are trying to use the latest and greatest Python 
>>> release to read those filenames...
>> Well, if the filenames are generated by Python (as opposed to read 
>> from an existing directory on disk), they should be regular unicode 
>> objects without any lone surrogates, so I don't see the compatibility 
>> problem.
> I meant that the application reads filenames from an existing directory 
> on disk, saves those filenames, and then later, using a future version 
> of Python, wants to read them and use them.

Regarding future versions of Python.  In the worst case, even if 
Python's default behavior changes, the transcoding done by PEP 383 can 
be done in other software too... it is a straightforward, fully 
specified, 1-to-1, reversible transcoding process, affecting and 
generating only invalid byte encodings on one side, and invalid Unicode 
sequences on the other.

So if Python's default behavior should change, the transcoding 
implemented by PEP 383 could be easily reimplemented to enable a future 
version of a Python application to manipulate the transcoded, saved, 

By easily, I mean that I could code it in a couple hours, max.
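
As a rough illustration of how small such a reimplementation can be, here is a sketch built on the codecs.register_error hook. It is not the PEP's own implementation: the handler name 'demo-escape' and the function name are made up, and it assumes a Python 3 in which an encoding error handler may return bytes.

```python
import codecs


def escape_undecodable(exc):
    """Map undecodable bytes 0x80..0xFF to U+DC80..U+DCFF and back."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if any(b < 0x80 for b in bad):
            raise exc  # never escape ASCII bytes
        return "".join(chr(0xDC00 + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            cp = ord(ch)
            if not 0xDC80 <= cp <= 0xDCFF:
                raise exc  # a genuinely unencodable character
            out.append(cp - 0xDC00)
        return bytes(out), exc.end
    raise exc


# Register under an illustrative name; real code would pick its own.
codecs.register_error("demo-escape", escape_undecodable)

raw = b"caf\xe9 \x85a"
text = raw.decode("utf-8", "demo-escape")
restored = text.encode("utf-8", "demo-escape")
print(ascii(text), restored == raw)
```

The same mapping is easy to express in other languages, since it needs only the rule "byte b >= 0x80 becomes code point 0xDC00 + b" and its inverse.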

> I'm not saying that I know this would be a problem.  I'm saying that I 
> personally can't tell whether it would be a problem or not, and the 
> extensive discussions so far have not convinced me that there is anyone 
> who both understands PEP 383 and considers this use case.

Does the above help?

> Many people who apparently understand encoding issues well have said 
> something to the effect that there is no problem, but those people 
> haven't yet managed to get through my thick skull how I would use PEP 
> 383 safely for this sort of use case -- the one where data generated by 
> os.listdir() travels forward in time or the one where that data travels 
> sideways to other systems, including Windows or other systems that 
> validate incoming unicode.

Regarding data traveling sideways, some comments:

1) PEP 383's effect could be recoded in other languages as easily as it 
is in Python (or the C in which Python is implemented).  So that could be 
a solution.

2) You mention "Windows" and "other systems that validate incoming 
unicode" in the same phrase, as if you think that "Windows" qualifies as 
one of the "other systems that validate incoming unicode", but it does not (at 
least not universally).

> That's why I am a bit uncomfortable about PEP 383 being quickly 
> implemented and deployed in Python 3.1.

Does the above help?

> By the way, much of the detailed discussion about what Tahoe requires 
> and how that may or may not benefit from PEP 383 has now moved to the 
> tahoe-dev mailing list: 
> .

I have no background with Tahoe, nor particular interest, although it 
sounds like a useful project... so I won't be joining that list.  I have 
no idea if there is an installed base of existing Tahoe file systems, my 
suggestions below assume that there is not, and that you are presently 
inventing them.  Therefore, I provide no migration path, although I 
could invent one, but it would take longer to describe.

However, since I'm responding here, and have read what you have posted 
here, it seems like the following could be true.

Assumptions from your emails:

A) Tahoe wants to provide a UTF-8 file name system
B) Tahoe wants to interface to POSIX systems that use (and do not 
validate) byte interfaces.
C) Tahoe wants to interface to non-POSIX systems that use 16-bit file 
name interfaces, with no validation.
D) Tahoe wants to interface to non-POSIX systems that use 16-bit file 
name interfaces, with validation.

Uncertainties: I'm not clear on what your goals are for Tahoe filenames. 
  There seem to be 2 possibilities:

1) you want to reject attempts to use non-validating Unicode, be it from 
a 16-bit interface, or a bytes interface.
2) you don't want to reject non-validating Unicode, but you want to 
convert it to valid Unicode for (D) systems.

3) Orthogonally, you might want to store only Valid Unicode in the 
names, or you might not care, if you can meet the other goals.


If you want to support (D), and (2), then you must transform names at 
some point, using some scheme, because not all names supplied by (B) 
systems will be acceptable to (D) systems.  You can choose to do this 
transformation when a (B) system provides an invalid (per Unicode) name, 
or you can choose to do the transformation when a (D) system accesses a 
file with an invalid (per Unicode) name.

If the (B) and (D) systems talk to each other outside of Tahoe, they 
will have to do similar transformations, or, if they both access the 
same Tahoe system, they will have to do the identical transformation, to 
be sure that they can access the same file.

All transcoding schemes have the possibility of data puns between 
non-transcoded names and transcoded names.  In order to successfully and 
properly manipulate a name, you must know whether or not it has been 
transcoded, and how.

PEP 383 limits its transcoding to names that are invalid (per Unicode). 
   Names that cannot be properly decoded to Unicode are decoded to 
invalid Unicode.  Names that are invalid Unicode are encoded to invalid 
byte sequences (per the encoding scheme specified).

For PEP 383 and Python, transcoded names can be distinguished by 
checking for the existence of lone surrogates in the str form of the 
filename, or by attempting to do a strict decoding of the bytes form of 
the filename, depending on what you have (generally, the former).
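
Both checks described above fit in a few lines. This is a sketch only; the helper names are hypothetical, and the range tested is the U+DC80..U+DCFF block that the PEP's escaping produces.

```python
def has_escaped_bytes(name):
    """True if the str form contains lone surrogates from the escaping."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


def is_strictly_decodable(raw, encoding="utf-8"):
    """True if the bytes form decodes without any error handler."""
    try:
        raw.decode(encoding)  # errors='strict' is the default
    except UnicodeDecodeError:
        return False
    return True


print(has_escaped_bytes("report.txt"), has_escaped_bytes("\udc85a"))
print(is_strictly_decodable(b"report.txt"), is_strictly_decodable(b"\x85a"))
```

A program that has to hand names to a system that validates Unicode can use the first check to decide which names need separate treatment before they leave Python.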

For PEP 383 and Python, the names will round trip from the POSIX bytes 
interfaces to the program, and back to POSIX bytes interfaces, as long 
as only Python wrappers of system functions are used, and the filesystem 
encoding is not changed between calls (or is restored).  Passing them to 
3rd party libraries or other systems requires extra work, if there is a 
desire to manipulate files with names that are not decodeable to Unicode 
by the standard decoding algorithm for that encoding.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at  Wed May  6 22:40:13 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:40:13 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>> design
>> Where is that design documented, and how exactly violates the name
>> the design (chapter and verse, please).
> Martin, I designed the whole Python codec machinery

Not true. PEP 293 was written and designed by Walter Dörwald.

> so even if
> this is not explicitly written down somewhere, you can take my
> word for it.

If the design was specified in writing somewhere, I would probably
challenge it as obsolete. If it isn't described anywhere, I'll have
to ignore it.

> I want to avoid any such confusion with Python codecs and don't
> understand why you are making a problem out of this.

Because utf8b (or, perhaps "UTF-8b") is the official name for this


From martin at  Wed May  6 22:34:53 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:34:53 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

> I'm sorry for the lack of clarity of my posts, but somehow you're
> completely missing the point.  The point is precisely that Python
> *won't* use Shift JIS as the file system encoding (if it did there
> would be no problem with reading Shift JIS), but the people who
> created the media *did*.
> Now, with Python's file system encoding == UTF-8 or any packed EUC,
> and more than a handful of Shift JIS or Big5 characters in file names,
> one is *almost certain* to encounter ASCII as the second byte of a
> multibyte sequence.  PEP 383 can't handle this

Not true. PEP 383 handles this very example just fine, with no problems
that I can see. Can you propose a specific example that you think might
cause problems? By "specific", I mean: what file names (exact bytes,
please), what locale charset, what API calls.


From martin at  Wed May  6 22:41:11 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:41:11 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

> Judging by the existing names, I think that 'surrogate' would be
> reasonable

MAL's list of existing names is incomplete. "surrogates" is already
an existing name, also, and it means something different (similar,
but different).


From martin at  Wed May  6 22:42:03 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:42:03 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gtsnhj$sgv$>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Terry Reedy wrote:
> Glenn Linderman wrote:
>> On approximately 5/6/2009 3:08 AM, came the following characters from
>> the keyboard of MRAB:
>>> M.-A. Lemburg wrote:
>>>> Martin v. Löwis wrote:
>>> Judging by the existing names, I think that 'surrogate' would be
>>> reasonable. It already contains the meaning of substitute, it's not too
>>> long, and the codes which act as replacements are already called
>>> surrogates.
>>>> I want to avoid any such confusion with Python codecs and don't
>>>> understand why you are making a problem out of this.
>> +1 for "surrogate" as the name for the error handler.
> +1 from me also

Despite there being also an error handler called "surrogates".

Are you serious?


From martin at  Wed May  6 22:44:09 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:44:09 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>
	<>	<>
Message-ID: <>

> Is it only usable with utf8 as an encoding?

No, it applies to any codec which potentially cannot decode
all bytes >127.
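
A short sketch of the handler used with two different codecs, assuming a Python 3 release where it is available as 'surrogateescape' (the name later releases use, not one agreed in this thread):

```python
raw = b"caf\xe9"  # Latin-1 bytes, undecodable as both ASCII and UTF-8

# The same handler works with either codec and yields the same escaped text.
as_ascii = raw.decode("ascii", "surrogateescape")
as_utf8 = raw.decode("utf-8", "surrogateescape")

# Encoding with the handler restores the original bytes in both cases.
back_ascii = as_ascii.encode("ascii", "surrogateescape")
back_utf8 = as_utf8.encode("utf-8", "surrogateescape")

print(ascii(as_ascii), ascii(as_utf8), back_ascii == raw, back_utf8 == raw)
```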


From solipsis at  Wed May  6 22:48:15 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 20:48:15 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>	<>	<>	<>	<>	<>	<>	<>
	<gtsnhj$sgv$> <>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> Despite there being also an error handler called "surrogates".

People, perhaps we could end all the bikeshedding and call one of those handlers
"surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
to what they actually /do/?



From martin at  Wed May  6 22:48:34 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 06 May 2009 22:48:34 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

> But first, it should be stopped by any of several
> standard precautions.  For example, applying os.path.realpath (come to
> think of it, PEP 383 should say something about realpath, shouldn't
> it?)

Why do you think so? I think the existing documentation of realpath
is correct and complete.

> and os.path.normpath (PEP 383 should definitely say something
> about this function

Precisely what?

> maybe PEP 3131 should, too)

How can this be of relevance?

>  > Nothing is lost at the moment.
> Nothing is lost compared to 'strict', true, but under the PEP as it is
> a large fraction of Shift JIS and Big5 filenames cannot be read under
> ASCII-compatible file system encodings using 'utf8b'.  Yet it is those
> users who are placed at risk by PEP 383.

I think this statement is incorrect. Those filenames *can* be read just


From martin at  Wed May  6 22:56:34 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 06 May 2009 22:56:34 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>
Message-ID: <>

Antoine Pitrou wrote:
> Martin v. Löwis <martin <at>> writes:
>> Despite there being also an error handler called "surrogates".
> People, perhaps we could end all the bikeshedding and call one of those handlers
> "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
> to what they actually /do/?

The problem with these bike-shedding discussions is that you cannot stop
them with a proposal. People will counter-propose.

I would be willing to accept a ruling from someone who a) is a native
speaker of English, and b) has demonstrated to fully understand what
these do, and c) has understood why I insist on calling it utf8b.


From tjreedy at  Wed May  6 23:47:05 2009
From: tjreedy at (Terry Reedy)
Date: Wed, 06 May 2009 17:47:05 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>
Message-ID: <gtt0gp$sos$>

Martin v. Löwis wrote:

>>> +1 for "surrogate" as the name for the error handler.
>> +1 from me also
> Despite there being also an error handler called "surrogates".

Given that additional information which MAL apparently omitted, I would 

> Are you serious?

Are you? ;-?  You are the one naming a codec-agnostic error handler (if 
I understand correctly, and correct me if I do not) after a particular 
codec, and denying that that could cause confusion.  See other message.

Terry Jan Reedy

From p.f.moore at  Thu May  7 00:01:23 2009
From: p.f.moore at (Paul Moore)
Date: Wed, 6 May 2009 23:01:23 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
	<> <gtsnhj$sgv$>
Message-ID: <>

2009/5/6 Antoine Pitrou <solipsis at>:
> Martin v. Löwis <martin <at>> writes:
>> Despite there being also an error handler called "surrogates".
> People, perhaps we could end all the bikeshedding and call one of those handlers
> "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
> to what they actually /do/?

We could also stop the bikeshedding by sticking with the name utf8b.
Martin's comment that it is the official name for this algorithm seems
compelling to me (even if it is confusing because of its similarity
with utf-8).


From tjreedy at  Thu May  7 00:03:57 2009
From: tjreedy at (Terry Reedy)
Date: Wed, 06 May 2009 18:03:57 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <gtt1gd$viv$>

Martin v. Löwis wrote:

> Because utf8b (or, perhaps "UTF-8b") is the official name for this
> algorithm:

Thank you for the link.  It starts:
"This directory contains a C implementation of a UTF-8b codec.
A Python codec based on it is provided as well."

'UTF-8b' consists, obviously, of 'UTF-8' plus 'b', with the 'b' signifying 
a variation of or addition to UTF-8.  The 'b', and only the 'b', refers 
to the innovative error-handler that was added to the existing 'UTF-8' 
codec/algorithm.  The name of the combined whole is not the name of the 

If you were incorporating the Python-wrapped utf-8b *codec* as a codec, 
which is what I once thought *because you used that name*, then calling 
it 'utf-8b' would be fine.  But you apparently instead proposed and 
implemented an *error-handler*, which seems to me to be something else, 
and which will not be specific to utf-8 but usable with any codec. 
Hence some of us think it should have a different name.

I gather that you lifted the error-handler part of the algorithm and 
propose to use it with *any* ascii-respecting codec.  I could claim that 
the 'official name' of that part is 'b', but I think we can find a 
better name.

Terry Jan Reedy

From tjreedy at  Thu May  7 00:33:11 2009
From: tjreedy at (Terry Reedy)
Date: Wed, 06 May 2009 18:33:11 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>
Message-ID: <gtt377$4mj$>

Martin v. Löwis wrote:
> Antoine Pitrou wrote:
>> Martin v. Löwis <martin <at>> writes:
>>> Despite there being also an error handler called "surrogates".
>> People, perhaps we could end all the bikeshedding and call one of those handlers
>> "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
>> to what they actually /do/?
> The problem with these bike-shedding discussions is that you cannot stop
> them with a proposal. People will counter-propose.
> I would be willing to accept a ruling from someone who a) is a native
> speaker of English, and b) has demonstrated to fully understand what
> these do, and c) has understood why I insist on calling it utf8b.

I qualify with a). I believe I understand c) but, as explained in my 
other post, I do not think your reason applies.  In fact, I think 
concern for naming rights might suggest that you *not* reuse the name 
for something different.  I would have to learn more about the existing 
'surrogates' handler to judge Antoine's suggestion 'surrogates-pass'. 
'Surrogates-escape' is pretty good for the new handler since, to my 
understanding, it 'escapes' 'bad bytes' by prefixing them with bits that 
push them to the surrogates plane.

I have been supportive of the idea and, as well as I understood them, 
the particulars of your proposal, from the beginning.  Reusing the name 
of a codec as the name of an error-handler confused me and I believe it 
will confuse others, even though, but also because, the error handler 
was extracted and generalized from the codec.

Terry Jan Reedy

From martin at  Thu May  7 00:59:18 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 00:59:18 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gtt0gp$sos$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>
Message-ID: <>

>> Are you serious?
> Are you? ;-?  You are the one naming a codec-agnostic error handler (if
> I understand correctly, and correct me if I do not) after a particular
> codec, and denying that that could cause confusion.  See other message.

I can only repeat what I said before: I call it utf8b because that's
the established name for the algorithm it implements.

That algorithm was originally designed with UTF-8 in mind (and only
meant to be applied for UTF-8), however, it remains the same algorithm
even though PEP 383 widens its application.


From google at  Thu May  7 01:06:24 2009
From: google at (MRAB)
Date: Thu, 07 May 2009 00:06:24 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>
Message-ID: <>

Antoine Pitrou wrote:
> Martin v. Löwis <martin <at>> writes:
>> Despite there being also an error handler called "surrogates".
> People, perhaps we could end all the bikeshedding and call one of those handlers
> "surrogates-pass" and the other "surrogates-escape", which sounds quite faithful
> to what they actually /do/?
After having read about the existing error handler called "surrogates"
and having thought about it, I've decided that calling one just
"surrogates" isn't very helpful to the user; it has something to do with
surrogates, but what?

So +1 for Antoine's suggestion from me.

From martin at  Thu May  7 01:16:18 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 May 2009 01:16:18 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gtt377$4mj$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>
Message-ID: <>

> I qualify with a). I believe I understand c) but, as explained in my
> other post, I do not think your reason applies.  In fact, I think
> concern for naming rights might suggest that you *not* reuse the name
> for something different.  I would have to learn more about the existing
> 'surrogates' handler to judge Antoine's suggestion 'surrogates-pass'.
> 'Surrogates-escape' is pretty good for the new handler since, to my
> understanding, it 'escapes' 'bad bytes' by prefixing them with bits that
> push them to the surrogates plane.

See issue 3672. In essence, in python 2.5:

py> u"\ud800".encode("utf-8")
'\xed\xa0\x80'
py> '\xed\xa0\x80'.decode("utf-8")
u'\ud800'

In 3.1,

py> "\ud800".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed
py> "\ud800".encode("utf-8","surrogates")
b'\xed\xa0\x80'
py> b'\xed\xa0\x80'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2:
illegal encoding
py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
'\ud800'
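
For readers repeating this on a later interpreter: as far as I know, the
handler spelled "surrogates" in the 3.1 beta session above is available in
released Python 3 under the name "surrogatepass". A minimal sketch under
that assumption (the variable names are only for illustration):

```python
# Assumes Python 3 with the "surrogatepass" error handler, i.e. the
# released spelling of the handler called "surrogates" in the session above.
lone = "\ud800"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With the handler, the surrogate is written in its three-byte form.
raw = lone.encode("utf-8", "surrogatepass")

# The same handler lets the decoder accept those bytes again.
back = raw.decode("utf-8", "surrogatepass")
```

The handler only changes what the UTF-8 codec does with surrogate code
points; all other input is handled as under strict errors.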


From solipsis at  Thu May  7 01:27:00 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 6 May 2009 23:27:00 +0000 (UTC)
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>
	<gtt377$4mj$> <>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
> '\ud800'

The point is, "surrogates" does not mean anything intuitive for an /error
handler/. You seem to be the only one who finds this name explicit enough,
perhaps because you chose it.
Most other handlers' names have verbs in them ("ignore", "replace",
"xmlcharrefreplace", etc.).



From skippy.hammond at  Thu May  7 01:38:47 2009
From: skippy.hammond at (Mark Hammond)
Date: Thu, 07 May 2009 09:38:47 +1000
Subject: [Python-Dev] Proposed: add support for UNC paths to all
 functions in ntpath
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Eric Smith wrote:
> Mark: I've reviewed this and it looks okay to me.

Thanks Eric - I've now applied that patch.  As you mentioned in a 
followup to the bug:

| Thanks for looking at this, Mark. If we could only assign issues to
| Python 3.2 and 3.3 to change the pending deprecation warning to a real
| one, and to remove the function entirely, we'd be all set! I'm always
| worried we'll forget these things.

(for reference; the patch introduces a PendingDeprecationWarning for 

The bug tracker doesn't have these future versions available yet - is 
there some other way these things should be tracked?  I fear simply 
opening a new bug without a reasonable 'trigger' will linger way beyond 
the next few versions...



From murman at  Thu May  7 03:05:42 2009
From: murman at (Michael Urman)
Date: Wed, 6 May 2009 20:05:42 -0500
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
	<gtsnhj$sgv$> <>
Message-ID: <>

On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at> wrote:
> Despite there being also an error handler called "surrogates".

Not that I have to be, but I'm not sold on the previous UTF-8 codec
behavior becoming an error handler of the name "surrogates" for two
reasons (I do respect the obvious PBP argument for the implementation,
and have no better name - "lenient"?).

First, unless there's a way to stack error handlers, there's no way to
access the old behavior combined with the "replace" handler. Second,
errors="surrogates" reads like surrogates should be an error, not an
additionally allowed pattern. Neither of these are deal breakers or
hard to learn, but they are non-obvious. I think the utf8b behavior
makes a lot more sense with the name "surrogates", through the
mnemonic that errors become surrogates.

The stacking argument also applies to the new utf8b behavior on encode
(only, as it handles all errors on decode). This may be a YAGNI, but
for a non-UTF-8 encode, it may be useful to allow "xmlcharrefreplace"
handling for unavailable non-surrogate-escaped characters. But without
stacking that's unmaintainable, as we clearly don't want ${codec}b for
all current codecs.

I'd be perfectly happy with utf8b or UTF-8b, as either a codec or an
error handler (do we want both? YAGNI?). So what if it smells a little
inaccurate as a handler when used with codecs other than UTF-8, no big
deal. I could also see something like errors="roundtrip" which
explains the intention of the handler rather than the algorithm, but
is awkward on encode when it encounters unavailable Unicode

Michael Urman

From mal at  Thu May  7 03:06:05 2009
From: mal at (M.-A. Lemburg)
Date: Thu, 07 May 2009 03:06:05 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>>>> The name "utf8b" suggested in the PEP is not in line with the codec
>>>> design
>>> Where is that design documented, and how exactly does the name violate
>>> the design (chapter and verse, please).
>> Martin, I designed the whole Python codec machinery
> Not true. PEP 293 was written and designed by Walter Dörwald.

Walter added the generic error handler callback mechanism and
we both worked on their design.

I designed and wrote the codec implementation back in 2000,
which included the whole idea of having codec error handlers in the
first place.

The original implementation only allowed per-codec
error handlers. Walter extended this to build general-purpose
handlers that could be used by many codecs. His original
motivation was to be able to do XML character reference

If you don't believe me, go look this up in the repository, the
mailing list archives and the trackers.

>> so even if
>> this is not explicitly written down somewhere, you can take my
>> word for it.
> If the design was specified in writing somewhere, I would probably
> challenge it as obsolete. If it isn't described anywhere, I'll have
> to ignore it.

Ah, lovely attitude.

>> I want to avoid any such confusion with Python codecs and don't
>> understand why you are making a problem out of this.
> Because utf8b (or, perhaps "UTF-8b") is the official name for this
> algorithm:

That's a codec implementing the escaping idea proposed by Markus
Kuhn, not an official reference. AFAIK, the term "UTF-8B" originated
from a "UTF-8 + binary" codec written for iconv:

If it were the official name of an escape algorithm, as you are
suggesting, the inventor Markus Kuhn would probably have chosen
it, but he hasn't... the only reference to it is an email where it
is described as option D for ways of dealing with malformed
UTF-8 data in a decoder:

Note that this escape method is not applicable for data that
you decode from UTF-8 and then e.g. encode as Latin-1. It only
works as general purpose method if you are decoding and encoding
using the same codec, since it is specifically designed to
assure round-trip safety.
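
The round-trip versus transcoding distinction can be sketched as follows.
This is only an illustration, assuming Python 3 and the handler name
"surrogateescape" (the spelling the escaping scheme has in released
versions, not a name used in this thread so far); the sample bytes are
made up.

```python
# Assumes Python 3 and the "surrogateescape" error handler.
data = b"caf\xe9"          # Latin-1 bytes; not valid UTF-8

# Decoding as UTF-8 escapes the undecodable byte 0xE9 as U+DCE9.
text = data.decode("utf-8", "surrogateescape")

# Same codec in both directions: the original bytes come back unchanged.
same_codec = text.encode("utf-8", "surrogateescape")

# Different codec with strict errors: the lone surrogate cannot be encoded.
try:
    text.encode("latin-1")
    transcode_ok = True
except UnicodeEncodeError:
    transcode_ok = False
```

The escaped code point carries a raw byte, not a character, so it has no
meaning to a second codec.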

Martin, please stop being silly and just change the name.

Or drop the idea of using an error handler altogether and just let
people use the utf-8b codec you referenced above to solve their
problems wherever and if needed.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, May 07 2009)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...
2009-06-29: EuroPython 2009, Birmingham, UK                52 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! :::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From benjamin at  Thu May  7 03:14:06 2009
From: benjamin at (Benjamin Peterson)
Date: Wed, 6 May 2009 20:14:06 -0500
Subject: [Python-Dev] test - please ignore
Message-ID: <>

Some of my messages appear not to have gotten through.

From benjamin at  Thu May  7 03:32:47 2009
From: benjamin at (Benjamin Peterson)
Date: Wed, 6 May 2009 20:32:47 -0500
Subject: [Python-Dev] [RELEASED] Python 3.1 beta 1
Message-ID: <>

On behalf of the Python development team, I'm thrilled to announce the first and
only beta release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of features and changes
Python 3.0 introduced.  For example, the new I/O system has been rewritten in C
for speed.  File system APIs that use unicode strings now handle paths with
undecodable bytes in them. [1] Other features include an ordered dictionary
implementation and support for ttk Tile in Tkinter.  For a more extensive list
of changes in 3.1, see or
Misc/NEWS in the Python distribution.

Please note that this is a beta release, and as such is not suitable for
production environments.  We continue to strive for a high degree of quality,
but there are still some known problems and the feature sets have not been
finalized.  This beta is being released to solicit feedback and hopefully
discover bugs, as well as allowing you to determine how changes in 3.1 might
impact you.  If you find things broken or incorrect, please submit a bug report

For more information and downloadable distributions, see the Python 3.1 website:

See PEP 375 for release schedule details:

-- Benjamin

Benjamin Peterson
benjamin at
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From stephen at  Thu May  7 04:35:52 2009
From: stephen at (Stephen J. Turnbull)
Date: Thu, 07 May 2009 11:35:52 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" writes:

 > > Now, with Python's file system encoding == UTF-8 or any packed EUC,
 > > and more than a handful of Shift JIS or Big5 characters in file names,
 > > one is *almost certain* to encounter ASCII as the second byte of a
 > > multibyte sequence.  PEP 383 can't handle this

Ah, I see.  Of course, the algorithm not only has to handle the ASCII
octet which is erroneous because it can't be a trailing byte, but
*also the leading byte that signalled to expect a trailing byte >127*.
So the algorithm backs up to the character boundary (which is
well-defined for all the "sane" encodings), encodes the high byte(s) in
the character with lone surrogates, and encodes the ASCII as itself
(promoted to a Unicode code point).

Sorry, you're right, I was just confused.  I withdraw the objection as
completely mistaken, and apologize for not thinking more carefully in
the first place.

From tjreedy at  Thu May  7 05:48:38 2009
From: tjreedy at (Terry Reedy)
Date: Wed, 06 May 2009 23:48:38 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>
Message-ID: <gttlmn$aua$>

Martin v. Löwis wrote:
>>> Are you serious?
>> Are you? ;-?  You are the one naming a codec-agnostic error handler (if
>> I understand correctly, and correct me if I do not) after a particular
>> codec, and denying that that could cause confusion.  See other message.
> I can only repeat what I said before: I call it

What, specifically, is 'it'?

> utf8b because that's
> the established name for the algorithm

Which algorithm?

> it implements.

Again, what is 'it'?

As *I* read the sentence above, it is not true.

I went to the site you referred to as the source of your reasoning and 

The algorithm called utf-8b *IS* utf-8 with the addition or replacement 
(of an error return) of essentially one line in each direction:

# encode
if 0xDC00 <= codepoint <= 0xDCFF:
     byte = codepoint - 0xDC00 #encode

Note: for security concerns, you are increasing the lower limit to 
0xDC80. The comment at the top of utf_8b.c suggests that that is 
what it should be and should have been in the file, with the other half 
of that surrogate area an error along with the other surrogate area.

# decode
if (0x80 <= byte <= 0xFF) and utf_8_invalid(byte):
     codepoint = byte + 0xDC00 # decode
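
The two one-line mappings above can be written as a runnable sketch. The
helper names are hypothetical, and the range 0xDC80..0xDCFF assumes the
tightened lower limit mentioned in the note; this is not the code from
utf_8b.c.

```python
# Hypothetical helper names; the range 0xDC80..0xDCFF follows the
# tightened lower limit (0xDC80) mentioned in the note above.
def escape_byte(byte):
    """Map an undecodable byte (0x80..0xFF) to a lone low surrogate."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF are escaped")
    return 0xDC00 + byte

def unescape_codepoint(codepoint):
    """Map an escape code point (0xDC80..0xDCFF) back to its byte."""
    if not 0xDC80 <= codepoint <= 0xDCFF:
        raise ValueError("not an escaped byte")
    return codepoint - 0xDC00

# Every escapable byte survives the trip through the surrogate range.
round_trip = [unescape_codepoint(escape_byte(b)) for b in range(0x80, 0x100)]
```

Nothing in these two functions depends on UTF-8 itself, which is the
point being argued here about the handler's name.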

> That algorithm was originally designed with UTF-8 in mind (and only
> meant to be applied for UTF-8), however, it remains the same algorithm
> even though PEP 383 widens its application.

The error handler designed with utf-8 in mind has no name in the encode 
direction and is called "utf_8b_decoder_invalid_bytes" in the decode 
direction.  By your reasoning, *that* should be its name in Python.  The 
encoding error handler would then be named analogously 
"utf_8b_encoder_invalid_codepoints".  Even these, to me, would be better 
than confusingly giving them the same name as the codec.

Terry Jan Reedy

From v+python at  Thu May  7 06:16:02 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 21:16:02 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On approximately 5/6/2009 6:06 PM, came the following characters from 
the keyboard of M.-A. Lemburg:

> Martin, please stop being silly and just change the name.

Yes, please.  If indeed Marc-Andre invented the codec business as he 
claims, he would be an appropriate person to give a fiat name to the 
error handler.

> Or drop the idea of using an error handler altogether and just let
> people use the utf-8b codec you referenced above to solve their
> problems wherever and if needed.

The design as an error handler is clever in leveraging the same error 
handler for multiple codecs, which cannot be done by using utf-8b alone, 
if I understand correctly.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at  Thu May  7 07:43:30 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 May 2009 07:43:30 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	
	<> <>	
	<> <>	
	<>	 <gtsnhj$sgv$>
Message-ID: <>

Michael Urman wrote:
> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at> wrote:
>> Despite there being also an error handler called "surrogates".
> Not that I have to be, but I'm not sold on the previous UTF-8 codec
> behavior becoming an error handler of the name "surrogates" for two
> reasons (I do respect the obvious PBP argument for the implementation,
> and have no better name - "lenient"?).


> First, unless there's a way to stack error handlers, there's no way to
> access the old behavior combined with the "replace" handler.

Well, there is a way to stack error handlers, although it's not pretty:

import codecs

_surrogates = codecs.lookup_error("surrogates")
_replace = codecs.lookup_error("replace")
def surrogates_then_replace(exc):
    # Try the "surrogates" handler first; fall back to "replace".
    try:
        return _surrogates(exc)
    except UnicodeError:
        return _replace(exc)
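
To use a stacked handler like this by name, it has to be registered. A
minimal sketch, assuming a released Python 3 where the PEP 383 handler is
spelled "surrogateescape"; the name "escape-then-replace" is made up for
this example.

```python
import codecs

# Uses handler names from released Python 3 ("surrogateescape",
# "replace"); "escape-then-replace" is a made-up name for this sketch.
_escape = codecs.lookup_error("surrogateescape")
_replace = codecs.lookup_error("replace")

def escape_then_replace(exc):
    """Try the first handler; fall back to the second if it gives up."""
    try:
        return _escape(exc)
    except UnicodeError:
        return _replace(exc)

codecs.register_error("escape-then-replace", escape_then_replace)

# U+DC80 is an escaped byte; U+20AC is simply unavailable in ASCII.
encoded = "a\udc80b\u20ac".encode("ascii", "escape-then-replace")
```

The fallback applies to a whole run of unencodable characters at once, so
an escaped byte directly next to an unavailable character would be
replaced along with it.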

> The stacking argument also applies to the new utf8b behavior on encode
> (only, as it handles all errors on decode). This may be a YAGNI

Indeed - in particular, as, in the primary application of this error
handler (i.e. file IO operations), there is no way of specifying
an additional error handler anyway.


From martin at  Thu May  7 07:53:07 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 07:53:07 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gttlmn$aua$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>
Message-ID: <>

> The error handler designed with utf-8 in mind has no name in the encode
> direction and is called "utf_8b_decoder_invalid_bytes" in the decode
> direction.  By your reasoning, *that* should be its name in Python.  The
> encoding error handler would then be named analogously
> "utf_8b_encoder_invalid_codepoints".  Even these, to me, would be better
> than confusingly giving them the same name as the codec.

So are you proposing that I should rename the PEP 383 handler
to "utf_8b_encoder_invalid_codepoints"?


From martin at  Thu May  7 08:10:16 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 May 2009 08:10:16 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

> By the way, what are the ASCII characters that are not supported by Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).

The problem with this encoding is that bytes below 128 appear as second
bytes of a two-byte encoding:

py> "\x81@".decode("shift-jis")
py> "\x81A".decode("shift-jis")

So on decoding, it may be the second byte (i.e. the ASCII byte) that
causes a problem:

py> "\x81/".decode("shift-jis")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position
0-1: illegal multibyte sequence

For the shift-jis codec, that's actually not a problem, though:

py> b"\x81/".decode("shift-jis","utf8b")
'\udc81/'

so the utf8b error handler will escape the first of the two bytes,
and then pass the second byte to the codec again, which then decodes


From martin at  Thu May  7 08:16:11 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 08:16:11 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>
	<> <>
Message-ID: <>

>> So are you proposing that I should rename the PEP 383 handler
>> to "utf_8b_encoder_invalid_codepoints"?
> No, he's saying that your algorithm for choosing the PEP 383 handler
> should have come up with that name, rather than utf8b.  But since PEP
> 383 applies to other codecs besides UTF-8, it should have a different
> name.  And one that is less cumbersome than
> "utf_8b_encoder_invalid_codepoints"

I'm still at a loss what name to give it, though. I understand that
I have to rename both error handlers, but I'm uncertain what I should
rename them to. So proposals that rename only one of them aren't
that helpful. It would be helpful if people would indicate support
for Antoine's proposal.


From v+python at  Thu May  7 08:00:36 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 23:00:36 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>
Message-ID: <>

On approximately 5/6/2009 10:53 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>> The error handler designed with utf-8 in mind has no name in the encode
>> direction and is called "utf_8b_decoder_invalid_bytes" in the decode
>> direction.  By your reasoning, *that* should be its name in Python.  The
>> encoding error handler would then be named analogously
>> "utf_8b_encoder_invalid_codepoints".  Even these, to me, would be better
>> than confusingly giving them the same name as the codec.
> So are you proposing that I should rename the PEP 383 handler
> to "utf_8b_encoder_invalid_codepoints"?

No, he's saying that your algorithm for choosing the PEP 383 handler 
should have come up with that name, rather than utf8b.  But since PEP 
383 applies to other codecs besides UTF-8, it should have a different 
name.  And one that is less cumbersome than 

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at  Thu May  7 08:37:36 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 08:37:36 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>
	<> <>
	<> <>
Message-ID: <>

> Wouldn't renaming the existing "surrogates" handler be an incompatible
> change, and thus inappropriate?

No - it's new in Python 3.1.

So what do you think about Antoine's proposal?


From v+python at  Thu May  7 08:32:48 2009
From: v+python at (Glenn Linderman)
Date: Wed, 06 May 2009 23:32:48 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>
	<> <>
Message-ID: <>

On approximately 5/6/2009 11:16 PM, came the following characters from 
the keyboard of Martin v. Löwis:
>>> So are you proposing that I should rename the PEP 383 handler
>>> to "utf_8b_encoder_invalid_codepoints"?
>> No, he's saying that your algorithm for choosing the PEP 383 handler
>> should have come up with that name, rather than utf8b.  But since PEP
>> 383 applies to other codecs besides UTF-8, it should have a different
>> name.  And one that is less cumbersome than
>> "utf_8b_encoder_invalid_codepoints"
> I'm still at a loss what name to give it, though. I understand that
> I have to rename both error handlers, but I'm uncertain what I should
> rename them to. So proposals that rename only one of them aren't
> that helpful. It would be helpful if people would indicate support
> for Antoine's proposal.

Wouldn't renaming the existing "surrogates" handler be an incompatible 
change, and thus inappropriate?  I assume that is the second handler you 
are referring to?


That would be very descriptive of the decode case for PEP 383, but very 
long.  One problem with the word "surrogates" is that anything you add 
to it makes it too long.


This is short, but meaningless as is -- however, adding the 
understanding via documentation that "ls" means "lone surrogates" would 
make it meaningful, and mnemonic.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From mal at  Thu May  7 11:21:28 2009
From: mal at (M.-A. Lemburg)
Date: Thu, 07 May 2009 11:21:28 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<gtt377$4mj$>
Message-ID: <>

Antoine Pitrou wrote:
> Martin v. Löwis <martin <at>> writes:
>> py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
>> '\ud800'
> The point is, "surrogates" does not mean anything intuitive for an /error
> handler/. You seem to be the only one who finds this name explicit enough,
> perhaps because you chose it.
> Most other handlers' names have verbs in them ("ignore", "replace",
> "xmlcharrefreplace", etc.).


The purpose of an error handler name is to indicate to the user
what it does, hence the use of verbs.

Walter started with "xmlcharrefreplace", ie. no space names, so
"surrogatereplace" would be the logically correct name for the
"replace with lone surrogates" scheme invented by Markus Kuhn.

The error handler for undoing this operation (ie. when converting
a Unicode string to some other encoding) should probably use the
same name based on symmetry and the fact that the escaping
scheme is meant to be used for enabling round-trip safety.

BTW: It would also be appropriate to reference Markus Kuhn in the PEP
as the inventor of the escaping scheme.

Even if only to give the reader an idea of how that scheme works and
why (the PEP on currently doesn't explain this).

It should also explain that the scheme is meant to assure round-trip
safety and doesn't necessarily work when using transcoding, ie.
reading using one encoding, writing using another.

Marc-Andre Lemburg


From cournape at  Thu May  7 11:50:18 2009
From: cournape at (David Cournapeau)
Date: Thu, 7 May 2009 18:50:18 +0900
Subject: [Python-Dev] Help on issue 5941
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 6, 2009 at 6:01 PM, Tarek Ziadé <ziade.tarek at> wrote:
> Hello,
> I need some help on
> The bug is quite simple: the Distutils unixcompiler used to set the
> archiver command to "ar -rc".
> For quite a while now, this behavior has changed in order to be able
> to customize the compiler behavior from
> the environment. That introduced a regression because the mechanism in
> Distutils that looks for the
> AR variable in the environment also looks into the Makefile of Python.
> (in the Makefile, then in os.environ)
> And as a matter of fact, AR is set to "ar" in there, so the -cr option
> is not set anymore.
> So my question is : should I make a change into the Makefile by adding
> for example a variable called AR_OPTIONS
> then build the ar command with AR + AR_OPTIONS

I think for consistency, it could be named ARFLAGS (this is the name
usually taken for configure scripts), and both should be overridable
like the other variables in distutils.sysconfig.customize_compiler. Those
flags should be used in Makefile.pre as well, instead of the hardcoded
cr as currently used.

Here is what I would try:
 - check for AR (already done in the configure script AFAICT)
 - if ARFLAGS is defined in the environment, use those, otherwise set
 - use ARFLAGS in the makefile

Then, in the customize_compiler function, set archiver to $AR +
$ARFLAGS. IOW, just copying the logic used for e.g. ldshared,
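
A rough sketch of that lookup logic, for illustration only. The function
name and the fallback values "ar" and "rc" are assumptions (the latter
taken from the "ar -rc" command quoted above); this is not the actual
distutils code.

```python
import os

# Hypothetical helper, not the actual distutils code. The fallback
# values "ar" and "rc" are assumptions taken from the "ar -rc" command
# quoted earlier in this thread.
def archiver_command(makefile_vars, environ=None):
    """Build the archiver argv: environment wins over Makefile values."""
    if environ is None:
        environ = os.environ
    ar = environ.get("AR") or makefile_vars.get("AR") or "ar"
    arflags = environ.get("ARFLAGS") or makefile_vars.get("ARFLAGS") or "rc"
    return [ar] + arflags.split()
```

With AR set to "ar" in the Makefile and no ARFLAGS anywhere, this yields
["ar", "rc"], which is the behaviour the regression lost.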

I can prepare a patch if you want,



From ziade.tarek at  Thu May  7 12:07:01 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 7 May 2009 12:07:01 +0200
Subject: [Python-Dev] Help on issue 5941
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 11:50 AM, David Cournapeau <cournape at> wrote:
> Then, in the customize_compiler function, set archiver to $AR +
> $ARFLAGS. IOW, just copying the logic used for e.g. ldshared,
> I can prepare a patch if you want,

I am ok on Distutils side, but I wouldn't mind some help on the
makefile/configure side
Even if I could mimic what's in there, I am not confident enough yet.

Please do so, by attaching your patch in the issue,



Tarek Ziadé |

From ziade.tarek at  Thu May  7 13:49:36 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 7 May 2009 13:49:36 +0200
Subject: [Python-Dev] Help on issue 5941
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 1:37 PM, David Cournapeau <cournape at> wrote:
> On Thu, May 7, 2009 at 7:07 PM, Tarek Ziadé <ziade.tarek at> wrote:
>> On Thu, May 7, 2009 at 11:50 AM, David Cournapeau <cournape at> wrote:
>>> Then, in the customize_compiler function, set archiver to $AR +
>>> $ARFLAGS. IOW, just copying the logic used for e.g. ldshared,
>>> I can prepare a patch if you want,
>> I am ok on Distutils side, but I wouldn't mind some help on the
>> makefile/configure side
> Ok, I ended up making a patch for everything. I tested it on Linux,
> where it fixed the issue while keeping the customization (both AR and
> ARFLAGS can be customized through environment variables).
> numpy now builds under python 2.7,
> cheers,
> David

ok thanks David, I'll complete your patch with the test I have written
for this issue and commit it so it's included in 2.7/3.1.

Note that from the beginning, the unixcompiler class options have
never been used if the option has been customized
in distutils.sysconfig and is present in the Makefile, so we need to
clean up this behavior as well at some point, and document
the customization features.

By the way, do you happen to have a buildbot or something that builds numpy ?
If not it'll be very interesting:  I wouldn't mind having one numpy
track running on the Python trunk and receiving
mails if something is broken.

Tarek Ziadé

From cournape at  Thu May  7 14:11:46 2009
From: cournape at (David Cournapeau)
Date: Thu, 7 May 2009 21:11:46 +0900
Subject: [Python-Dev] Help on issue 5941
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 8:49 PM, Tarek Ziadé <ziade.tarek at> wrote:

> Notice that from the beginning, the unixcompiler class options are
> never used if the option has been customized
> in distutils.sysconfig and present in the Makefile, so we need to
> clean this behavior as well at some point, and document
> the customization features.

Indeed, I have never bothered much with this part, though. Flags
customization with distutils is too awkward to be useful in general
for something like numpy IMHO, I just use scons instead when I need
fine grained control.

> By the way, do you happen to have a buildbot or something that builds numpy ?

We have a buildbot:

But I don't know if it's easy to set up so that both python and
numpy are built from source.

> If not it'll be very interesting:  I wouldn't mind having one numpy
> track running on the Python trunk and receiving
> mails if something is broken.

Well, I would not mind either :)


From ziade.tarek at  Thu May  7 14:25:01 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 7 May 2009 14:25:01 +0200
Subject: [Python-Dev] Help on issue 5941
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 2:11 PM, David Cournapeau <cournape at> wrote:
> But I don't know if that's easy to set up such as both python and
> numpy are built from sources.

I don't know about the numpy part, but the PyBots project code could
be a source of inspiration for the Python part

From benjamin at  Thu May  7 01:01:25 2009
From: benjamin at (Benjamin Peterson)
Date: Wed, 6 May 2009 18:01:25 -0500
Subject: [Python-Dev] [RELEASED] Python 3.1 beta 1
Message-ID: <>

On behalf of the Python development team, I'm thrilled to announce the first and
only beta release of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of features and changes
Python 3.0 introduced.  For example, the new I/O system has been rewritten in C
for speed.  File system APIs that use unicode strings now handle paths with
undecodable bytes in them. [1] Other features include an ordered dictionary
implementation and support for ttk Tile in Tkinter.  For a more extensive list
of changes in 3.1, see or
Misc/NEWS in the Python distribution.

Please note that this is a beta release, and as such is not suitable for
production environments.  We continue to strive for a high degree of quality,
but there are still some known problems and the feature sets have not been
finalized.  This beta is being released to solicit feedback and hopefully
discover bugs, as well as allowing you to determine how changes in 3.1 might
impact you.  If you find things broken or incorrect, please submit a bug report

For more information and downloadable distributions, see the Python 3.1 website:

See PEP 375 for release schedule details:

-- Benjamin

Benjamin Peterson
benjamin at
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From walter at  Thu May  7 15:20:07 2009
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 07 May 2009 15:20:07 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<gtt377$4mj$>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> Antoine Pitrou wrote:
>> Martin v. Löwis <martin <at>> writes:
>>> py> b'\xed\xa0\x80'.decode("utf-8","surrogates")
>>> '\ud800'
>> The point is, "surrogates" does not mean anything intuitive for an /error
>> handler/. You seem to be the only one who finds this name explicit enough,
>> perhaps because you chose it.
>> Most other handlers' names have verbs in them ("ignore", "replace",
>> "xmlcharrefreplace", etc.).
> Correct.
> The purpose of an error handler name is to indicate to the user
> what it does, hence the use of verbs.
> Walter started with "xmlcharrefreplace", ie. no space names, so
> "surrogatereplace" would be the logically correct name for the
> "replace with lone surrogates" scheme invented by Markus Kuhn.

"surrogatepass" (for the "don't complain about lone half surrogates"
handler) and "surrogatereplace" sound OK to me. However the other
"...replace" handlers are destructive (i.e. when such a "...replace"
handler is used for encoding, decoding will not produce the original
unicode string). The purpose of the PEP 383 error handler however is to
be roundtrip safe, so maybe we should choose a slightly different name?
How about "surrogateescape"?

> The error handler for undoing this operation (ie. when converting
> a Unicode string to some other encoding) should probably use the
> same name based on symmetry and the fact that the escaping
> scheme is meant to be used for enabling round-trip safety.

We have only one error handler registry, but we *can* have one error
handler for both directions (encoding and decoding) as the error handler
can simply check whether it got passed a UnicodeEncodeError or
UnicodeDecodeError object.
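
A minimal sketch of that idea, for illustration only: the handler name
"roundtrip_demo" is hypothetical, and this is not the PEP 383
implementation. One registered callable inspects the exception type to
decide which direction it is being used in.

```python
import codecs


def roundtrip_demo(exc):
    """Hypothetical handler that serves both decoding and encoding."""
    if isinstance(exc, UnicodeDecodeError):
        # Decoding: map each undecodable byte to a lone low surrogate.
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(0xDC00 + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Encoding: turn such surrogates back into the original bytes.
        bad = exc.object[exc.start:exc.end]
        if not all(0xDC80 <= ord(ch) <= 0xDCFF for ch in bad):
            raise exc  # not something this handler produced
        return bytes(ord(ch) - 0xDC00 for ch in bad), exc.end
    raise exc


codecs.register_error("roundtrip_demo", roundtrip_demo)

decoded = b'a\xffb'.decode("ascii", "roundtrip_demo")
encoded = decoded.encode("ascii", "roundtrip_demo")
```

Here the same name is passed to both decode() and encode(), and the bytes
survive the round trip.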

> BTW: It would also be appropriate to reference Markus Kuhn in the PEP
> as the inventor of the escaping scheme.
> Even if only to give the reader an idea of how that scheme works and
> why (the PEP on currently doesn't explain this).
> It should also explain that the scheme is meant to assure round-trip
> safety and doesn't necessarily work when using transcoding, ie.
> reading using one encoding, writing using another.


From google at  Thu May  7 15:47:13 2009
From: google at (MRAB)
Date: Thu, 07 May 2009 14:47:13 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>
	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>> Wouldn't renaming the existing "surrogates" handler be an incompatible
>> change, and thus inappropriate?
> No - it's new in Python 3.1.
> So what do you think about Antoine's proposal?

Although it looks like it would be without the '-' for consistency with
existing error handlers.

From murman at  Thu May  7 16:18:31 2009
From: murman at (Michael Urman)
Date: Thu, 7 May 2009 09:18:31 -0500
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
	<gtsnhj$sgv$> <>
Message-ID: <>

On Thu, May 7, 2009 at 00:43, "Martin v. Löwis" <martin at> wrote:
> Michael Urman wrote:
>> On Wed, May 6, 2009 at 15:42, "Martin v. Löwis" <martin at> wrote:
>>> Despite there being also an error handler called "surrogates".
>> Not that I have to be, but I'm not sold on the previous UTF-8 codec
>> behavior becoming an error handler of the name "surrogates" for two
>> reasons (I do respect the obvious PBP argument for the implementation,
>> and have no better name - "lenient"?).
> PBP?

Practicality beats purity. From a purity standpoint, the legacy
invalid utf-8 seems more like an encoding than an error handler to me.
From a practicality standpoint, it's presumably much more convenient
to implement it on top of the new valid UTF-8 codec's behavior. And
then any error handler needs a name.

> Well, there is a way to stack error handlers, although it's not pretty:
> [...]
> codecs.register_error("surrogates_then_replace",
>                      surrogates_then_replace)

That mitigates my arguments significantly, although I'd rather see
something like errors=('surrogates', 'replace') chain the handlers
without additional registrations. But that's a different PEP or
arbitrary change. :)

>> The stacking argument also applies to the new utf8b behavior on encode
>> (only, as it handles all errors on decode). This may be a YAGNI
> Indeed - in particular, as, in the primary application of this error
> handler (i.e. file IO operations), there is no way of specifying
> an addition error handler anyway.

Would it be useful to allow setting this somewhere? It'd be analogous
to setfsencoding, perhaps a setfsencodingerrors. It's not hard to
imagine an application working on Windows where all Unicode characters
are valid, and constructing backup filenames by adding some arbitrary
character, or receiving them from a user who doesn't understand
encodings. When this application is taken to a non-Unicode filesystem,
without the ability to say "I really want a valid filename: so
replace", that could get messy. But it may still be a YAGNI, or a
"don't do that."

Michael Urman

From murman at  Thu May  7 16:31:11 2009
From: murman at (Michael Urman)
Date: Thu, 7 May 2009 09:31:11 -0500
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>
	<gtsnhj$sgv$> <>
	<gtt0gp$sos$> <>
	<gttlmn$aua$> <>
	<> <>
Message-ID: <>

On Thu, May 7, 2009 at 01:16, "Martin v. Löwis" <martin at> wrote:
> I'm still at a loss what name to give it, though. I understand that
> I have to rename both error handlers, but I'm uncertain what I should
> rename them to. So proposals that rename only one of them aren't
> that helpful. It would be helpful if people would indicate support
> for Antoine's proposal.

Part of the problem is they both allow byte sequences to decode to
invalid Unicode strings, and in particular they both affect the same
byte subsequences, and that brought us to the crossroads where we
wanted to name both of them "surrogates". So I'll offer a few more
colors, and try to get out of the way of choosing between them or the
other proposed ones. :)

I haven't come up with anything I like better than errors="lenient"
for the old utf8 behavior handler; would errors="nonvalidating" be
correct? It still seems to me that a new codec, perhaps
"utf8-lenient", reads better.

For the utf8b error handler, I could see any of errors="roundtrip",
errors="roundtripreplace", errors="tosurrogate",
errors="surrogatereplace", errors="surrogateescape",
errors="binaryreplace", errors="binaryescape". This includes Antoine's
proposal (sans hyphen).

Michael Urman

From walter at  Thu May  7 16:33:21 2009
From: walter at (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=)
Date: Thu, 07 May 2009 16:33:21 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
	<>	<gtsnhj$sgv$>
	<>	<>	<>
Message-ID: <>

Michael Urman wrote:

> [...]
>> Well, there is a way to stack error handlers, although it's not pretty:
>> [...]
>> codecs.register_error("surrogates_then_replace",
>>                      surrogates_then_replace)
> That mitigates my arguments significantly, although I'd rather see
> something like errors=('surrogates', 'replace') chain the handlers
> without additional registrations. But that's a different PEP or
> arbitrary change. :)

The first version of PEP 293 changed the errors argument to be a string
or callable. This would have simplified handler stacking somewhat
(because you don't have to register or lookup handlers) but it had the
disadvantage that many "char *" arguments in the C API would have had to
be changed to "PyObject *". Changing the errors argument to a list of
strings would have the same problem.


From google at  Thu May  7 17:08:49 2009
From: google at (MRAB)
Date: Thu, 07 May 2009 16:08:49 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<>
Message-ID: <>

Walter Dörwald wrote:
> Michael Urman wrote:
>> [...]
>>> Well, there is a way to stack error handlers, although it's not pretty:
>>> [...]
>>> codecs.register_error("surrogates_then_replace",
>>>                      surrogates_then_replace)
>> That mitigates my arguments significantly, although I'd rather see
>> something like errors=('surrogates', 'replace') chain the handlers
>> without additional registrations. But that's a different PEP or
>> arbitrary change. :)
> The first version of PEP 293 changed the errors argument to be a string
> or callable. This would have simplified handler stacking somewhat
> (because you don't have to register or lookup handlers) but it had the
> disadvantage that many "char *" arguments in the C API would have had to
> changed to "PyObject *". Changing the errors argument to a list of
> strings would have the same problem.
A comma-separated or space-separated string, e.g. 'surrogates replace' or
'surrogates,replace'? It could be treated as handler stacking.

From martin at  Thu May  7 19:21:58 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 May 2009 19:21:58 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>	
	<> <>	
	<>	 <gtsnhj$sgv$>
Message-ID: <>

>> Well, there is a way to stack error handlers, although it's not pretty:
>> [...]
>> codecs.register_error("surrogates_then_replace",
>>                      surrogates_then_replace)
> That mitigates my arguments significantly, although I'd rather see
> something like errors=('surrogates', 'replace') chain the handlers
> without additional registrations. But that's a different PEP or
> arbitrary change. :)

I think you can provide something like

errors=combine_errors('surrogates', 'replace')

as a library function, and it doesn't have to be part of the
standard library.
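
A minimal sketch of such a helper, under stated assumptions: the body of
combine_errors below, and the "_then_" naming of the combined handler, are
invented for illustration. It tries each named handler in turn and falls
through when one re-raises the error. The usage line assumes the
"surrogateescape" name that this thread later settles on.

```python
import codecs


def combine_errors(*names):
    """Register a handler that tries each named handler in order.

    Returns the name of the combined handler, so the result can be
    passed wherever an errors= string is expected.
    """
    if not names:
        raise ValueError("at least one handler name is required")
    handlers = [codecs.lookup_error(name) for name in names]
    combined_name = "_then_".join(names)

    def combined(exc):
        for handler in handlers[:-1]:
            try:
                return handler(exc)
            except UnicodeError:
                continue  # this handler declined; try the next one
        return handlers[-1](exc)

    codecs.register_error(combined_name, combined)
    return combined_name


errors = combine_errors("surrogateescape", "replace")
result = "a\udcffb\u20ac".encode("ascii", errors)
```

Because the helper returns a plain string, no "char *" argument in the C
API has to change.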

>>> The stacking argument also applies to the new utf8b behavior on encode
>>> (only, as it handles all errors on decode). This may be a YAGNI
>> Indeed - in particular, as, in the primary application of this error
>> handler (i.e. file IO operations), there is no way of specifying
>> an addition error handler anyway.
> Would it be useful to allow setting this somewhere?

I'm deliberately not proposing this as part of the PEP. First, it
has enough features already, and is approved as-is; plus YAGNI.


From martin at  Thu May  7 19:23:57 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 07 May 2009 19:23:57 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>	
	<gtsnhj$sgv$> <>	
	<gtt0gp$sos$> <>	
	<gttlmn$aua$> <>	
	<> <>
Message-ID: <>

> I haven't come up with anything I like better than errors="lenient"
> for the old utf8 behavior handler; would errors="nonvalidating" be
> correct?

I think either is fairly unspecific.

> For the utf8b error handler, I could see any of errors="roundtrip",
> errors="roundtripreplace", errors="tosurrogate",
> errors="surrogatereplace", errors="surrogateescape",
> errors="binaryreplace", errors="binaryescape". This includes Antoine's
> proposal (sans hyphen).

Giving multiple choices does not exactly make this proposal readily
implementable :-)


From martin at  Thu May  7 19:27:07 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 19:27:07 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<gtt377$4mj$>	<>	<>
Message-ID: <>

> The error handler for undoing this operation (ie. when converting
> a Unicode string to some other encoding) should probably use the
> same name based on symmetry and the fact that the escaping
> scheme is meant to be used for enabling round-trip safety.

Could you please familiarize yourself with the implementation
before commenting further?


From stephen at  Thu May  7 20:20:59 2009
From: stephen at (Stephen J. Turnbull)
Date: Fri, 08 May 2009 03:20:59 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <gtsnhj$sgv$>
	<> <gtt377$4mj$>
	<> <>
Message-ID: <>

Walter Dörwald writes:

 > "surrogatepass" (for the "don't complain about lone half surrogates"
 > handler) and "surrogatereplace" sound OK to me. However the other
 > "...replace" handlers are destructive (i.e. when such a "...replace"
 > handler is used for encoding, decoding will not produce the original
 > unicode string).

That doesn't bother me in the slightest.  "Replace" does not connote
"destructive" or "non-destructive" to me; it connotes "substitution".
The fact that other error handlers happen to be destructive doesn't
affect that at all for me.  YMMV.

 > The purpose of the PEP 383 error handler however is to be roundtrip
 > safe, so maybe we should choose a slightly different name?  How
 > about "surrogateescape"?

To me, "escape" has a strong connotation of a multicharacter
representation of a single character, and that's not true here.

How about "surrogatetranslate"?  I still prefer "surrogatereplace", as
it's slightly easier for me to type.

From ndbecker2 at  Thu May  7 20:42:29 2009
From: ndbecker2 at (Neal Becker)
Date: Thu, 07 May 2009 14:42:29 -0400
Subject: [Python-Dev] typo in Format Specification Mini-Language?
Message-ID: <gtva2l$acv$>

"format_spec ::=  [[fill]align][sign][#][0][width][.precision][type]"
"The precision is ignored for integer values."

In [36]: '%3x' % 10
Out[36]: '  a'

In [37]: '%.3x' % 10
Out[37]: '00a'

Apparently, precision is _not_ ignored? 

From tjreedy at  Thu May  7 20:57:56 2009
From: tjreedy at (Terry Reedy)
Date: Thu, 07 May 2009 14:57:56 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>
	<> <>
Message-ID: <gtvavk$e9v$>

Martin v. Löwis wrote:
>>> So are you proposing that I should rename the PEP 383 handler
>>> to "utf_8b_encoder_invalid_codepoints"?
>> No, he's saying that your algorithm for choosing the PEP 383 handler
>> should have come up with that name, rather than utf8b.  But since PEP
>> 383 applies to other codecs besides UTF-8, it should have a different
>> name.  And one that is less cumbersome than
>> "utf_8b_encoder_invalid_codepoints"

Correct.  Thank you Glenn.
> I'm still at a loss what name to give it, though. I understand that
> I have to rename both error handlers, but I'm uncertain what I should
> rename them to. So proposals that rename only one of them aren't
> that helpful. It would be helpful if people would indicate support
> for Antoine's proposal.

Given your explanation of what the new 'surrogates' handler does (pass 
rather than reject erroneous surrogates), I think 'surrogates_pass' is 
fine.  Thus, I consider that and 'surrogates_excape' the best proposal 
so far and suggest that you make this pair the current status 
quo to be argued against and improved ... or not.


From eric at  Thu May  7 21:25:50 2009
From: eric at (Eric Smith)
Date: Thu, 07 May 2009 15:25:50 -0400
Subject: [Python-Dev] typo in Format Specification
In-Reply-To: <gtva2l$acv$>
References: <gtva2l$acv$>
Message-ID: <>

Neal Becker wrote:
> "format_spec ::=  [[fill]align][sign][#][0][width][.precision][type]"
> "The precision is ignored for integer values."
> In [36]: '%3x' % 10
> Out[36]: '  a'
> In [37]: '%.3x' % 10
> Out[37]: '00a'
> Apparently, precision is _not_ ignored? 

That section is talking about this:

 >>> format(10, '.3x')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: Precision not allowed in integer format specifier

From eric at  Thu May  7 21:27:27 2009
From: eric at (Eric Smith)
Date: Thu, 07 May 2009 15:27:27 -0400
Subject: [Python-Dev] typo in Format
	Specification	Mini-Language?
In-Reply-To: <>
References: <gtva2l$acv$> <>
Message-ID: <>

Eric Smith wrote:
> Neal Becker wrote:
>> "format_spec ::=  [[fill]align][sign][#][0][width][.precision][type]"
>> "The precision is ignored for integer values."
>> In [36]: '%3x' % 10
>> Out[36]: '  a'
>> In [37]: '%.3x' % 10
>> Out[37]: '00a'
>> Apparently, precision is _not_ ignored? 
> That section is talking about this:
>  >>> format(10, '.3x')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Precision not allowed in integer format specifier

So I guess it shouldn't say "is ignored", it should be "is not allowed".

From tjreedy at  Thu May  7 21:35:11 2009
From: tjreedy at (Terry Reedy)
Date: Thu, 07 May 2009 15:35:11 -0400
Subject: [Python-Dev] typo in Format Specification
In-Reply-To: <gtva2l$acv$>
References: <gtva2l$acv$>
Message-ID: <gtvd5f$mfg$>

Neal Becker wrote:
> "format_spec ::=  [[fill]align][sign][#][0][width][.precision][type]"
> "The precision is ignored for integer values."
> In [36]: '%3x' % 10
> Out[36]: '  a'
> In [37]: '%.3x' % 10
> Out[37]: '00a'
> Apparently, precision is _not_ ignored? 

Apparent typo reports should go to the tracker, along with version 
information.  In this case, the Format Specification Mini-Language is 
for the new str.format() and format() facilities, not for % formatting, 
which is described in Old String Formatting Operations.  Ironically, your 
report does point to a doc problem: precision is actually not allowed 
for integer types.

 >>> format(10, '3x')
'  a'
 >>> format(10, '.3x')
Traceback (most recent call last):
   File "<pyshell#2>", line 1, in <module>
     format(10, '.3x')
ValueError: Precision not allowed in integer format specifier

 >>> '{0:3x}'.format(10)
'  a'
 >>> '{0:.3x}'.format(10)
Traceback (most recent call last):
   File "<pyshell#4>", line 1, in <module>
ValueError: Precision not allowed in integer format specifier
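
The difference between the two formatting systems can be checked with a
small sketch like the following. The helper name accepts_precision is made
up for this example; it only reports whether format() raises ValueError for
a given spec.

```python
def accepts_precision(value, spec):
    """Return True if format() accepts spec for value, False on ValueError."""
    try:
        format(value, spec)
    except ValueError:
        return False
    return True


# Old-style % formatting: precision pads integers with zeros.
old_style = '%.3x' % 10        # '00a'
# New-style formatting: a width is fine for integers...
new_width = format(10, '3x')   # '  a'
# ...and a precision is fine for floats.
float_ok = accepts_precision(1.5, '.3f')
# For integers, the interpreter shown in this thread raises ValueError.
int_ok = accepts_precision(10, '.3x')
```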

Terry Jan Reedy

From martin at  Thu May  7 21:39:12 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 07 May 2009 21:39:12 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gtvavk$e9v$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>	<>
	<> <gtvavk$e9v$>
Message-ID: <>

> Given your explanation of what the new 'surrogates' handler does (pass
> rather than reject erroneous surrogates), I think 'surrogates_pass' is
> fine.  Thus, I considoer that and 'surrogates_excape' the best proposal
> the best so far and suggest that you make this pair the current status
> quo to be argued against and improved ... or not.

That's exactly what I want to avoid: more bike-shedding. If this is now
changed, it cannot possibly be argued against and improved - it would
be final, end of discussion (please!!!).

So I'm happy to make it "surrogatepass" and "surrogateescape" as
proposed by Walter. I'm sure you didn't really mean the spelling of
"excape" to be taken literally - whether or not you meant the plural
and the underscore literally, I cannot tell. Stephen Turnbull approved
singular, so that's good enough for me.
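
For illustration, assuming the handlers are registered under exactly these
names (as they are in released Python 3.1 and later), the two behave as
follows. The sample byte strings are chosen for this example only.

```python
# "surrogatepass": lone surrogates are allowed through the UTF-8 codec.
raw = b'\xed\xa0\x80'
text = raw.decode("utf-8", "surrogatepass")
assert text == '\ud800'
assert text.encode("utf-8", "surrogatepass") == raw

# "surrogateescape": undecodable bytes become lone low surrogates
# and are restored on encoding, so the round trip is lossless.
name = b'caf\xe9'  # Latin-1 bytes, invalid as UTF-8
escaped = name.decode("utf-8", "surrogateescape")
assert escaped == 'caf\udce9'
assert escaped.encode("utf-8", "surrogateescape") == name
```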


From greg at  Thu May  7 22:26:08 2009
From: greg at (Gregory P. Smith)
Date: Thu, 7 May 2009 13:26:08 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <> <>
	<gtt0gp$sos$> <>
	<gttlmn$aua$> <>
	<> <>
	<gtvavk$e9v$> <>
Message-ID: <>

On Thu, May 7, 2009 at 12:39 PM, "Martin v. Löwis" <martin at> wrote:
>> Given your explanation of what the new 'surrogates' handler does (pass
>> rather than reject erroneous surrogates), I think 'surrogates_pass' is
>> fine.  Thus, I considoer that and 'surrogates_excape' the best proposal
>> the best so far and suggest that you make this pair the current status
>> quo to be argued against and improved ... or not.
> That's exactly what I want to avoid: more bike-shedding. If this is now
> changed, it cannot be possibly be argued against and improved - it would
> be final, end of discussion (please!!!).
> So I'm happy to make it "surrogatepass" and "surrogateescape" as
> proposed by Walter. I'm sure you didn't really mean the spelling of
> "excape" to be taken literally - whether or not you meant the plural
> and the underscore literally, I cannot tell. Stephen Turnbull approved
> singular, so that's good enough for me.

singular is good.

+1 on these names.

From eric at  Thu May  7 23:36:08 2009
From: eric at (Eric Smith)
Date: Thu, 07 May 2009 17:36:08 -0400
Subject: [Python-Dev] py3k build broken
Message-ID: <>


With your ARFLAGS change, I now get the following error on a 32 bit 
Fedora 6 box. I've done "make distclean" and "./configure":

$ make
gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE  -I./Modules/_io -c 
./Modules/_io/textio.c -o Modules/textio.o
gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE  -I./Modules/_io -c 
./Modules/_io/stringio.c -o Modules/stringio.o
gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE  -c ./Modules/zipimport.c -o 
./Modules/zipimport.c: In function 'get_module_code':
./Modules/zipimport.c:1132: warning: format '%c' expects type 'int', but 
argument 3 has type 'long int'
gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE  -c ./Modules/symtablemodule.c 
-o Modules/symtablemodule.o
gcc -pthread -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE  -c ./Modules/xxsubtype.c -o 
gcc -pthread -c -fno-strict-aliasing -g -Wall -Wstrict-prototypes  -I. 
-IInclude -I./Include   -DPy_BUILD_CORE -DSVNVERSION=\"`LC_ALL=C 
svnversion .`\" -o Modules/getbuildinfo.o ./Modules/getbuildinfo.c
rm -f libpython3.1.a
ar @ARFLAGS@ libpython3.1.a Modules/getbuildinfo.o
ar: illegal option -- @
Usage: ar [emulation options] [-]{dmpqrstx}[abcfilNoPsSuvV] 
[member-name] [count] archive-file file...
        ar -M [<mri-script]
   d            - delete file(s) from the archive
   m[ab]        - move file(s) in the archive
   p            - print file(s) found in the archive
   q[f]         - quick append file(s) to the archive
   r[ab][f][u]  - replace existing or insert new file(s) into the archive
   t            - display contents of archive
   x[o]         - extract file(s) from the archive
  command specific modifiers:
   [a]          - put file(s) after [member-name]
   [b]          - put file(s) before [member-name] (same as [i])
   [N]          - use instance [count] of name
   [f]          - truncate inserted file names
   [P]          - use full path names when matching
   [o]          - preserve original dates
   [u]          - only replace files that are newer than current archive 
  generic modifiers:
   [c]          - do not warn if the library had to be created
   [s]          - create an archive index (cf. ranlib)
   [S]          - do not build a symbol table
   [v]          - be verbose
   [V]          - display the version number
   @<file>      - read options from <file>
  emulation options:
   No emulation specific options
ar: supported targets: elf32-i386 a.out-i386-linux efi-app-ia32 
elf32-little elf32-big srec symbolsrec tekhex binary ihex trad-core
make: *** [libpython3.1.a] Error 1

From ziade.tarek at  Thu May  7 23:46:12 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 7 May 2009 23:46:12 +0200
Subject: [Python-Dev] py3k build broken
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 11:36 PM, Eric Smith <eric at> wrote:
> Tarek:
> With you ARFLAGS change, I now get the following error on a 32 bit Fedora 6
> box. I've done "make distclean" and "./configure":

Sorry, yes, I am on it now. The generated Makefile is broken; until then
you can change it by hand:

<<<  line 71
ARFLAGS=       cr
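
The "ar @ARFLAGS@" line in the error log shows a placeholder that was never
substituted. A quick way to spot such leftovers in a generated Makefile is
sketched below; the function name and the regular expression are
assumptions made for this example, not part of the Python build.

```python
import re

# Autoconf replaces @NAME@ tokens when it generates the Makefile;
# any that remain were not substituted.
_PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def unsubstituted_placeholders(makefile_text):
    """Return (line_number, name) pairs for leftover @NAME@ tokens."""
    found = []
    for lineno, line in enumerate(makefile_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore comment lines
        for match in _PLACEHOLDER.finditer(line):
            found.append((lineno, match.group(1)))
    return found
```

Running it over the text of the generated Makefile would list each
offending line number and variable name.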

Tarek Ziadé

From tjreedy at  Thu May  7 23:49:40 2009
From: tjreedy at (Terry Reedy)
Date: Thu, 07 May 2009 17:49:40 -0400
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>	<>	<>
	<gtvavk$e9v$> <>
Message-ID: <gtvl1k$jm9$>

Martin v. Löwis wrote:
>> Given your explanation of what the new 'surrogates' handler does (pass
>> rather than reject erroneous surrogates), I think 'surrogates_pass' is
>> fine.  Thus, I considoer that and 'surrogates_excape' the best proposal
>> the best so far and suggest that you make this pair the current status
>> quo to be argued against and improved ... or not.
> That's exactly what I want to avoid: more bike-shedding. If this is now
> changed, it cannot be possibly be argued against and improved - it would
> be final, end of discussion (please!!!).
> So I'm happy to make it "surrogatepass" and "surrogateescape" as
> proposed by Walter. I'm sure you didn't really mean the spelling of
> "excape" to be taken literally - whether or not you meant the plural
> and the underscore literally, I cannot tell. Stephen Turnbull approved
> singular, so that's good enough for me.

Those minor tweaks for consistency with existing names are what I meant 
by 'improve' (with good arguments) and I approve of them also. +1 on 
stopping here.

From eric at  Thu May  7 23:51:32 2009
From: eric at (Eric Smith)
Date: Thu, 07 May 2009 17:51:32 -0400
Subject: [Python-Dev] py3k build broken
In-Reply-To: <>
References: <>
Message-ID: <>

Tarek Ziadé wrote:
> On Thu, May 7, 2009 at 11:36 PM, Eric Smith <eric at> wrote:
>> With you ARFLAGS change, I now get the following error on a 32 bit Fedora 6
>> box. I've done "make distclean" and "./configure":
> Sorry yes, I am on it now, the produced Makefile is broken, until then
> you can change it

No problem. I'll wait.

From tjreedy at  Thu May  7 23:51:10 2009
From: tjreedy at (Terry Reedy)
Date: Thu, 07 May 2009 17:51:10 -0400
Subject: [Python-Dev] typo in Format Specification
In-Reply-To: <>
References: <gtva2l$acv$> <>
Message-ID: <gtvl4d$jm9$>

Eric Smith wrote:
> Eric Smith wrote:
>> Neal Becker wrote:
>>> "format_spec ::=  [[fill]align][sign][#][0][width][.precision][type]"
>>> "The precision is ignored for integer values."
>>> In [36]: '%3x' % 10
>>> Out[36]: '  a'
>>> In [37]: '%.3x' % 10
>>> Out[37]: '00a'
>>> Apparently, precision is _not_ ignored? 
>> That section is talking about this:
>>  >>> format(10, '.3x')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> ValueError: Precision not allowed in integer format specifier
> So I guess it shouldn't say "is ignored", it should be "is not allowed".

My exact suggestion in

From ziade.tarek at  Fri May  8 00:23:10 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 8 May 2009 00:23:10 +0200
Subject: [Python-Dev] py3k build broken
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 7, 2009 at 11:51 PM, Eric Smith <eric at> wrote:
> Tarek Ziadé wrote:
>> On Thu, May 7, 2009 at 11:36 PM, Eric Smith <eric at> wrote:
>>> With you ARFLAGS change, I now get the following error on a 32 bit Fedora
>>> 6
>>> box. I've done "make distclean" and "./configure":
>> Sorry yes, I am on it now, the produced Makefile is broken, until then
>> you can change it
> ...
> No problem. I'll wait.

I have fixed configure by running autoconf; everything should be fine now.

Sorry for the inconvenience.


From google at  Fri May  8 00:27:08 2009
From: google at (MRAB)
Date: Thu, 07 May 2009 23:27:08 +0100
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <gtvl1k$jm9$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>	<>	<>	<gtvavk$e9v$>
	<> <gtvl1k$jm9$>
Message-ID: <>

Terry Reedy wrote:
> Martin v. Löwis wrote:
>>> Given your explanation of what the new 'surrogates' handler does (pass
>>> rather than reject erroneous surrogates), I think 'surrogates_pass' is
>>> fine.  Thus, I consider that and 'surrogates_excape' the best proposal
>>> so far and suggest that you make this pair the current status
>>> quo to be argued against and improved ... or not.
>> That's exactly what I want to avoid: more bike-shedding. If this is now
>> changed, it cannot possibly be argued against and improved - it would
>> be final, end of discussion (please!!!).
>> So I'm happy to make it "surrogatepass" and "surrogateescape" as
>> proposed by Walter. I'm sure you didn't really mean the spelling of
>> "excape" to be taken literally - whether or not you meant the plural
>> and the underscore literally, I cannot tell. Stephen Turnbull approved
>> singular, so that's good enough for me.
> Those minor tweaks for consistency with existing names are what I meant 
> by 'improve' (with good arguments) and I approve of them also. +1 on 
> stopping here.
We argue because we care. :-)

From eric at  Fri May  8 00:49:21 2009
From: eric at (Eric Smith)
Date: Thu, 07 May 2009 18:49:21 -0400
Subject: [Python-Dev] py3k build broken
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Tarek Ziadé wrote:
> I have fixed configure by running autoconf, everything should be fine now

And indeed, it's working fine now, thanks.

> Sorry for the inconvenience.

Not a problem. Anyone who volunteers for autoconf work gets a free pass 
from me.


From mal at  Fri May  8 00:50:21 2009
From: mal at (M.-A. Lemburg)
Date: Fri, 08 May 2009 00:50:21 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<gtt377$4mj$>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
>> The error handler for undoing this operation (ie. when converting
>> a Unicode string to some other encoding) should probably use the
>> same name based on symmetry and the fact that the escaping
>> scheme is meant to be used for enabling round-trip safety.
> Could you please familiarize yourself with the implementation
> before commenting further?

I did and it already uses the same (wrong) name for both
encoding and decoding handlers which is good.

The reason for my above comment was that the thread mentions
two different names for the handler depending on the direction,
e.g. "surrogatereplace" and "surrogatepass".

I guess that "surrogatepass" was just an attempt to find a new
name for the "surrogates" error handler (which also doesn't
match the naming scheme) and that got me confused.

I'd use "allowlonesurrogates" as name for the "surrogates" error
handler and "lonesurrogatereplace" for the "utf8b" one.

Marc-Andre Lemburg


From brett at  Fri May  8 01:29:51 2009
From: brett at (Brett Cannon)
Date: Thu, 7 May 2009 16:29:51 -0700
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
Message-ID: <>

[my python-dev sabbatical is still in effect, so make sure I am at least
cc'ed on any replies to this email]

I cannot be the only person who has a need to run tests conditionally based
on whether the file system is case-sensitive or not, so I feel like I am
re-inventing the wheel for issue 5442 to handle OS X with a case-sensitive
filesystem. Is there a boolean somewhere that I can simply check or get to
know whether the filesystem is case-sensitive?


From robert.kern at  Fri May  8 01:39:41 2009
From: robert.kern at (Robert Kern)
Date: Thu, 07 May 2009 18:39:41 -0500
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2009-05-07 18:29, Brett Cannon wrote:
> [my python-dev sabbatical is still in effect, so make sure I am at least
> cc'ed on any replies to this email]
> I cannot be the only person who has a need to run tests conditionally
> based on whether the file system is case-sensitive or not, so I feel
> like I am re-inventing the wheel for issue 5442 to handle OS X with a
> case-sensitive filesystem. Is there a boolean somewhere that I can
> simply check or get to know whether the filesystem is case-sensitive?

Since one may have more than one filesystem side-by-side, this can't just be 
a system-wide boolean somewhere. One would have to query the target directory 
for this information. I am not aware of the existence of code that does such a 
query, though.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From solipsis at  Fri May  8 01:48:29 2009
From: solipsis at (Antoine Pitrou)
Date: Thu, 7 May 2009 23:48:29 +0000 (UTC)
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
References: <>
Message-ID: <>

Robert Kern <robert.kern <at>> writes:
> Since one may have more than one filesystem side-by-side, this can't just be
> a system-wide boolean somewhere. One would have to query the target directory 
> for this information. I am not aware of the existence of code that does such a
> query, though.

Or you can just be practical and test for it. Create a file "foobar" and see if
you can open "FOOBAR" in read mode...
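
A minimal sketch of that probe, in case it helps. The function name `is_case_sensitive` and the use of a scratch directory are my own choices, not anything from the issue; the check is per directory, since different mount points can behave differently.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if files under `directory` are matched case-sensitively."""
    # Work in a throwaway subdirectory so the probe cannot clash with real files.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        lower = os.path.join(scratch, "foobar")
        upper = os.path.join(scratch, "FOOBAR")
        open(lower, "w").close()
        try:
            # On a case-insensitive filesystem this opens the file created above.
            open(upper).close()
        except FileNotFoundError:
            return True
        return False


print(is_case_sensitive(tempfile.gettempdir()))
```

A test could then be skipped or run depending on the result for the directory it actually writes to.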



From ziade.tarek at  Fri May  8 02:36:51 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 8 May 2009 02:36:51 +0200
Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib
Message-ID: <>


I am trying to refactor distutils.log in order to use logging but I
have been bugged by the fact that uses
distutils.util.get_platform() in "addbuilddir".
The problem is the order of imports at initialization time : importing
"logging" into distutils will make the initialization/build fail
because will break when
trying to import "logging", then "time".

So why looks into distutils ?  because distutils has a few
functions to get some info about the platform and about the Makefile
and some
other header files like pyconfig.h etc.

But I don't think it's the best place for this, and I have a proposal :

let's create a dedicated "sysconfig" module in the standard library
that will provide all the (refactored) functions located in
distutils.sysconfig (but not customize_compiler)
and distutils.util.get_platform.

This module can be used by, by distutils, and others, and will
focus on this role.
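
To make the idea concrete, here is a rough sketch of how callers might use such a module. It assumes a Python whose standard library ships a top-level `sysconfig` module exposing `get_platform()` and `get_config_vars()`; those names mirror the existing distutils functions, and the `"EXT_SUFFIX"` key is just an illustrative lookup that may be absent on some builds.

```python
import sysconfig

# Platform identifier, the role played by distutils.util.get_platform().
platform_tag = sysconfig.get_platform()

# Build-time configuration (Makefile / pyconfig.h values) as a dictionary.
config_vars = sysconfig.get_config_vars()

print(platform_tag)
print(config_vars.get("EXT_SUFFIX"))  # may be None on some builds
```

The point is that neither call needs to import anything from distutils.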


Tarek Ziadé

From andrew at  Fri May  8 02:24:05 2009
From: andrew at (Andrew Bennetts)
Date: Fri, 8 May 2009 10:24:05 +1000
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Robert Kern <robert.kern <at>> writes:
> > 
> > Since one may have more than one filesystem side-by-side, this can't just be
> > a system-wide boolean somewhere. One would have to query the target directory 
> > for this information. I am not aware of the existence of code that does such a
> > query, though.
> Or you can just be practical and test for it. Create a file "foobar" and see if
> you can open "FOOBAR" in read mode...

Agreed.  That is how Bazaar's test suite detects this, and it works well.


From v+python at  Fri May  8 02:33:02 2009
From: v+python at (Glenn Linderman)
Date: Thu, 07 May 2009 17:33:02 -0700
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<gtt0gp$sos$>	<>	<gttlmn$aua$>	<>	<>	<>	<gtvavk$e9v$>	<>
	<gtvl1k$jm9$> <>
Message-ID: <>

On approximately 5/7/2009 3:27 PM, came the following characters from 
the keyboard of MRAB:
> Terry Reedy wrote:
>> Martin v. Löwis wrote:

>>> So I'm happy to make it "surrogatepass" and "surrogateescape" as

These seem adequate.  It is not what I would choose or suggest, but it 
is adequate, and it is unlikely you can delight everyone with your 
choice of names, or even someone else's choice of names.  These at least 
have a logical justification for their meaning, and can be documented 

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From aahz at  Fri May  8 03:22:35 2009
From: aahz at (Aahz)
Date: Thu, 7 May 2009 18:22:35 -0700
Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 08, 2009, Tarek Ziadé wrote:
> This module can be used by, by distutils, and others, and will
> focus on this role.

This should get kicked around on python-ideas; I don't think it will
require a full-blown PEP unless there's disagreement about what it should
Aahz (aahz at           <*>

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From john.arbash.meinel at  Fri May  8 03:56:02 2009
From: john.arbash.meinel at (John Arbash Meinel)
Date: Thu, 07 May 2009 20:56:02 -0500
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Andrew Bennetts wrote:
> Antoine Pitrou wrote:
>> Robert Kern <robert.kern <at>> writes:
>>> Since one may have more than one filesystem side-by-side, this can't just be
>>> a system-wide boolean somewhere. One would have to query the target directory 
>>> for this information. I am not aware of the existence of code that does such a
>>> query, though.
>> Or you can just be practical and test for it. Create a file "foobar" and see if
>> you can open "FOOBAR" in read mode...
> Agreed.  That is how Bazaar's test suite detects this, and it works well.
> -Andrew.

Actually, I believe we do:

open('format', 'wb').close()
except IOError, e:
  if e.errno == errno.ENOENT:

I don't know that it really matters, just wanted to indicate we use
'lstat' rather than 'open()' to check. I could be wrong about the test
suite, but I know that is what we do for 'live' files. (We always create
a format file, so we know it is there to 'stat' it via a different name.)
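
Spelled out as a self-contained sketch, the lstat variant might look like the following. This is not Bazaar's actual code: the function name, the scratch directory, and the mixed-case spelling `"FoRmAt"` are assumptions made for illustration.

```python
import errno
import os
import tempfile


def case_sensitive_via_lstat(directory):
    """Create 'format', then stat it under a different-case name."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        open(os.path.join(scratch, "format"), "wb").close()
        try:
            # lstat never opens the file; it only asks whether the name resolves.
            os.lstat(os.path.join(scratch, "FoRmAt"))
        except OSError as e:
            if e.errno == errno.ENOENT:
                return True  # the other spelling does not exist: case-sensitive
            raise  # anything else is a real error, not an answer
        return False


print(case_sensitive_via_lstat(tempfile.gettempdir()))
```

Using lstat avoids opening a file handle and does not follow symlinks, which is presumably why it is preferred for 'live' files.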


From cournape at  Fri May  8 05:25:53 2009
From: cournape at (David Cournapeau)
Date: Fri, 8 May 2009 12:25:53 +0900
Subject: [Python-Dev] Adding a "sysconfig" module in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 8, 2009 at 9:36 AM, Tarek Ziadé <ziade.tarek at> wrote:
> Hello,
> I am trying to refactor distutils.log in order to use logging but I
> have been bugged by the fact that uses
> distutils.util.get_platform() in "addbuilddir".
> The problem is the order of imports at initialization time : importing
> "logging" into distutils will make the initialization/build fail
> because will break when
> trying to import "logging", then "time".
> Anyways,
> So why looks into distutils ?  because distutils has a few
> functions to get some info about the platform and about the Makefile
> and some
> other header files like pyconfig.h etc.
> But I don't think it's the best place for this, and I have a proposal :
> let's create a dedicated "sysconfig" module in the standard library
> that will provide all the (refactored) functions located in
> distutils.sysconfig (but not customize_compiler)
> and distutils.util.get_platform.

If we are talking about putting this into the stdlib proper, I would
suggest thinking about putting information for every platform in
sysconfig, instead of just Unix. I understand it is not an easy
problem (because windows builds are totally different than every other
platform), but it would really help for interoperability with other
build tools. If sysconfig is to become independent of distutils, it
should be cross platform and not unix specific.



From turnbull at  Fri May  8 09:04:34 2009
From: turnbull at (Stephen J. Turnbull)
Date: Fri, 08 May 2009 16:04:34 +0900
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <gtsnhj$sgv$>
	<> <gtt377$4mj$>
	<> <>
Message-ID: <>

M.-A. Lemburg writes:

 > I'd use "allowlonesurrogates" as name for the "surrogates" error
 > handler and "lonesurrogatereplace" for the "utf8b" one.


From cournape at  Fri May  8 10:31:33 2009
From: cournape at (David Cournapeau)
Date: Fri, 8 May 2009 17:31:33 +0900
Subject: [Python-Dev] py3k build broken
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 8, 2009 at 7:23 AM, Tarek Ziadé <ziade.tarek at> wrote:

> I have fixed configure by running autoconf, everything should be fine now
> Sorry for the inconvenience.

I am the one responsible for this - I did not realize that the
generated configure/Makefile were also in the trunk, and my patch did
not include the generated files. My apologies,



From walter at  Fri May  8 10:34:19 2009
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 08 May 2009 10:34:19 +0200
Subject: [Python-Dev] PEP 383 update: utf8b is now the error handler
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<gtsnhj$sgv$>	<>	<>	<>	<gtt377$4mj$>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Walter Dörwald writes:
>  > "surrogatepass" (for the "don't complain about lone half surrogates"
>  > handler) and "surrogatereplace" sound OK to me. However the other
>  > "...replace" handlers are destructive (i.e. when such a "...replace"
>  > handler is used for encoding, decoding will not produce the original
>  > unicode string).
> That doesn't bother me in the slightest.  "Replace" does not connote
> "destructive" or "non-destructive" to me; it connotes "substitution".
> The fact that other error handlers happen to be destructive doesn't
> affect that at all for me.  YMMV.
>  > The purpose of the PEP 383 error handler however is to be roundtrip
>  > safe, so maybe we should choose a slightly different name?  How
>  > about "surrogateescape"?
> To me, "escape" has a strong connotation of a multicharacter
> representation of a single character, and that's not true here.
> How about "surrogatetranslate"?  I still prefer "surrogatereplace", as
> it's slightly easier for me to type.

I like "surrogatetranslate" better than "surrogateescape" better than

But I'll stop bikeshedding now and let Martin decide.
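
To illustrate the roundtrip point, here is a short sketch. It assumes the handlers are registered under the names "surrogateescape" and "surrogatepass" (the pair Martin said he was happy with), and it uses the existing "replace" handler as the destructive comparison.

```python
raw = b"abc\xff"  # 0xff can never appear in valid UTF-8

# Roundtrip-safe: the undecodable byte becomes a lone low surrogate...
escaped = raw.decode("utf-8", errors="surrogateescape")
# ...and encoding with the same handler restores the original byte.
round_tripped = escaped.encode("utf-8", errors="surrogateescape")

# Destructive: "replace" substitutes U+FFFD, so the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")
lossy = replaced.encode("utf-8")

# "surrogatepass" lets a lone surrogate through the UTF-8 codec unchanged.
lone = "\ud800"
passed = lone.encode("utf-8", errors="surrogatepass")

print(round_tripped == raw, lossy == raw, passed)
```

Whatever the handler ends up being called, this is the behavioural difference from the other "...replace" handlers that the name should hint at.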


From kristjan at  Fri May  8 11:47:22 2009
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 8 May 2009 09:47:22 +0000
Subject: [Python-Dev] feature request 5804
Message-ID: <>

Hello there.  I have submitted the following patch:
Add an 'offset' argument to zlib.decompress

I'd be interested in getting some more feedback on it.
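
For context, a sketch of the situation the patch targets: compressed data that starts partway into a larger buffer. The `offset` argument is the proposal and is not used here; the snippet shows one way to get the same effect today without copying, by slicing a memoryview. The header bytes and payload are made up for the example.

```python
import zlib

header = b"HDR0"                      # hypothetical fixed-size header
payload = b"hello world" * 10
buf = header + zlib.compress(payload)

offset = len(header)
# A memoryview slice refers to the same memory, so no copy of the tail is made.
result = zlib.decompress(memoryview(buf)[offset:])

print(result == payload)
```

A plain `buf[offset:]` slice also works but copies everything after the header, which is the cost an offset argument would avoid.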


From status at  Fri May  8 18:07:06 2009
From: status at (Python tracker)
Date: Fri,  8 May 2009 18:07:06 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (05/01/09 - 05/08/09)
Python tracker at

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2188 open (+45) / 15604 closed (+30) / 17792 total (+75)

Open issues with patches:   848

Average duration of open issues: 646 days.
Median duration of open issues: 396 days.

Open Issues Breakdown
   open  2153 (+45)
pending    34 ( +0)

Issues Created Or Reopened (75)

socketmodule.c  on HPUX ia64 without _XOPEN_SOURCE_EXTENDED comp 05/01/09    created  ntai                          

timeit documentation                                             05/01/09    created  hrfeels                       

No library reference tree in chm help file                       05/01/09
CLOSED    created  suraj                         

Hang in Popen.wait() when another process has been created       05/01/09    created  farialima                     

test_capi crashes when called more than once                     05/01/09
CLOSED    created  pitrou                        

Ensure RUNPATH is added to extension modules with RPATH if GNU l 05/01/09    created  flub                          

missing meta-info in documentation pdf                           05/02/09    created  ZeD                           

Stricter codec names                                             05/02/09    created  ezio.melotti                  

strftime fails in non UTF-8 locale                               05/02/09    created  barry-scott                   

strftime docs do not explain locale affect on result string      05/02/09    created  barry-scott                   

strptime fails in non-UTF locale                                 05/02/09    created  pitrou                        

Risk of confusion in multiprocessing module - daemonic processes 05/02/09    created  pakal                         

repr of time.struct_time type does not eval                      05/02/09    created  jwm                           

I need to import the module in the same thread                   05/02/09    created  tyoc                          

Segfault in typeobject.c                                         05/02/09
CLOSED    created  gbritton                      

kqueue for more than one event is broken.                        05/02/09    created  Erik Gorset                   

built-in compile() should take encoding option.                  05/03/09    created  naoki                         

import deadlocks when using fork                                 05/03/09    created  abaron                        

On Windows os.listdir('') -> cwd and os.listdir(u'') -> C:\      05/03/09
CLOSED    created  ezio.melotti                  

Add PyOS_string_to_double function to C API                      05/03/09
CLOSED    created  marketdickinson               

PEP 383 implementation                                           05/03/09
CLOSED    created  loewis                        

Wrong function referenced in documentation of socket.inet_aton   05/03/09
CLOSED    created  phihag                        

Reference platform-independent alternative in socket.inet_ntop d 05/03/09
CLOSED    created  phihag                        

test_parser crashes when run after some other tests              05/04/09    created  pitrou                        

pygettext documentation                                          05/04/09
CLOSED    created  efrerich                      

Confusing float formatting for empty presentation type.          05/04/09
CLOSED    created  marketdickinson               

PEP 362 can be marked as finished?                               05/04/09
CLOSED    created  stutzbach                     

Multi-with patch                                                 05/04/09    created  georg.brandl                  
       patch                                                                   

update: 1.0 --> 1.1                                              05/04/09
CLOSED    created  gregorlingl                   

When setting complete PYTHONPATH on Python 3.x, paths in the PYT 05/04/09    created  fabioz                        

Odd formatting differences of keywords in reference              05/04/09
CLOSED    created  MLModel                       

bdist_msi - add support for minimum Python version for pure Pyth 05/04/09    created  atuining                      

Typo in library on xmlrpc                                        05/04/09
CLOSED    created           

Missing space after period in xmlrpc library documentation       05/04/09
CLOSED    created           

warnings in unicodeobject.c                                      05/04/09
CLOSED    created  pitrou                        

Transient error in multiprocessing                               05/04/09    created  pitrou                        

Python runtime name hardcoded in wsgiref.simple_server           05/04/09    created  thijs                         

_json: _convertPyInt_AsSsize_t() never raise any error           05/05/09
CLOSED    created  haypo                         

fix gcc -Wextra warnings (compare signed/unsigned)               05/05/09    created  haypo                         

fix gcc warnings: explicit type conversion for uid/gid in posix  05/05/09    created  haypo                         

Better documentation of use of BROWSER environment variable      05/05/09    created  Eddie E                       

Add MSI suport for uninstalling individual versions              05/05/09    created  bethard                       

Problems with dbm documentation                                  05/05/09    created  MLModel                       

Noddy examples haven't been updated to match PEP 3123            05/05/09
CLOSED    created  larry                         

Ensure that PyCapsule_GetPointer calls in ctypes handle errors a 05/05/09    created  larry                         

Wrong type check in check_library_list                           05/05/09
CLOSED    created  cdavid                        

customize_compiler broken                                        05/05/09
CLOSED    created  cdavid                        

Ambiguity in flag documentation                         05/05/09    created  MLModel                       

Bus error in test_posix on Mac OS                                05/05/09
CLOSED    created  eric.smith                    

test_os failure on OS X, probably related to PEP 383             05/05/09
CLOSED    created  marketdickinson               

PyMapping_Check returns 1 for lists                              05/05/09    created  jmillikin                     

Fix spelling error in Capsule docs                               05/05/09
CLOSED    created  larry                         

Deprecate CObject                                                05/05/09
CLOSED    created  larry                         

setlocale regression                                             05/06/09
CLOSED    created  Kerfred                       

IMAP4_SSL spin because of SSLSocket.suppress_ragged_eofs         05/06/09    created  kevinwatters                  

zimport doesn't work with zipfile containing comments            05/06/09    created  dsamersoff                    

email.message : get_payload args's documentation is confusing    05/06/09    created  trolldbois                    

AttributeError exception in urllib.urlopen                       05/07/09
CLOSED    created  sprigogin                     

Add to "whats new": range(n) != range(n)                         05/07/09    created  MLModel                       

PyFrame_GetLineNumber                                            05/07/09    created  jyasskin                      
       patch, needs review                                                     

aifc: close() does not close the underlying file                 05/07/09
CLOSED    reopened amaury.forgeotdarc            

test_distutils fails for Python 3.1b1 on MacOS X                 05/07/09    created  MrJean1                       

Possible mistake regarding writeback in documentation of shelve. 05/07/09    created  MLModel                       

Typo in documentation of shelve.sync                             05/07/09
CLOSED    created  MLModel                       

PyCode_NewEmpty                                                  05/07/09    created  jyasskin                      
       patch, needs review                                                     

Windows Installer Error 1722 when opting for compilation at inst 05/07/09    created  keldonin                      

Missing labelside option for Tix option menu (fix included)      05/07/09    created  caryr                         

Ambiguity about the semantics of sys.exit() and os._exit() in mu 05/07/09    created  pakal                         

Doc error: integer precision in formats                          05/07/09
CLOSED    created  tjreedy                       

WeakSet cmp methods                                              05/08/09    created  schuppenies                   

Format Specs: doc 's' and implicit conversions                   05/08/09    created  tjreedy                       

unnecessary hardlink                                             05/08/09    created  exe                           

PyList_GetSlice does not indicate negative ranges dont work as i 05/08/09    created  ideasman42                    

Generator expression bug?                                        05/08/09
CLOSED    created  svenrahmann                   

setup build with Platform SDK, finding vcvarsall.bat             05/08/09    created  MarcMarc                      

Issues Now Closed (80)

str.format() wrongly formats complex() numbers (Py30a2)           510 days    marketdickinson               

shutil.copyfile blocks indefinitely on named pipes                337 days    pitrou                        

FD leak in urllib2                                                328 days    gregory.p.smith               

IDLE opens window too low on Windows                              303 days    gpolo                         

Option to not-exit on test                                        290 days    michael.foord                 

Ill-formed surrogates not treated as errors during encoding/deco  251 days    benjamin.peterson             

unicode-internal encoder reports wrong length                     249 days    haypo                         

Add Google's to the stdlib                              220 days    gregory.p.smith               

merge json library with latest simplejson 2.0.x                    46 days    benjamin.peterson             

ctypes fails to build on mipsel-linux-gnu (detects mips instead   172 days    theller                       

[PATCH] Better stacklevel for GzipFile.filename DeprecationWarni  170 days    pjenvey                       

UTF7 encoding of slash (character 47) is incorrect                160 days    haypo                         

UTF7 decoding is far too strict                                   160 days    pitrou                        
       patch, needs review                                                     

Patch for better thread support in hashlib                        128 days    gregory.p.smith               

Curses Unicode Support                                            129 days    asmodai                       

find_library can return directories instead of files              118 days    theller                       

unpickling does not intern attribute names                         95 days    pitrou                        

Invalid UTF-8 ("%s") length in PyUnicode_FromFormatV()             94 days    haypo                         

pdb feature request: Ability to skip standard lib modules and ot   91 days    georg.brandl                  

bdist_msi generates version number for pure Python packages        75 days    bethard                       

Multicast example is outdated and ugly                    66 days    gregory.p.smith               

msvcrt bytes cleanup                                               59 days    benjamin.peterson             

Create alternative CObject API that is safe and clean              35 days    benjamin.peterson             

test__locale fails with RADIXCHAR on Windows                       34 days    benjamin.peterson             

cleanUp stack for unittest                                         29 days    yaneurabeya                   

test_zipfile fails under Windows                                   30 days    pitrou                        

os.getpwent returns unsigned 32bit value, os.setuid refuses it     28 days    gregory.p.smith               
       64bit                                                                   

still tries to copy non-existent test/README                       28 days    loewis                        

2.6.2c1 fails to pass test_cmath on Solaris10                      26 days    marketdickinson               

ld_so_aix does exit successfully even in case of failure           22 days    pitrou                        

Support telling TestResult objects a test run has finished         23 days    michael.foord                 

Change ntpath functions to implicitly support UNC paths            16 days    eric.smith                    
       patch, needs review                                                     

IDLE/Win Installer: drop -n switch for 2.7/3.1; install 3.1 as i   11 days    benjamin.peterson             

Full example for emulating a container type                         6 days    yaneurabeya                   

Make complex repr and str more like float repr and str              5 days    marketdickinson               

test_urllib fails on windows                                        8 days    orsenthil                     

Remove extraneous backwards-compatibility attributes from some m    5 days    benjamin.peterson             

detach() implementation                                             2 days    benjamin.peterson             

mmap.write_byte out of bounds - no error, position gets screwed     6 days    bmearns                       

Extra comma in enum - fails on AIX                                  1 days    georg.brandl                  

Subclassing property doesn't preserve the auto __doc__ behavior     4 days    r.david.murray                
       patch, needs review                                                     

Add support to pydoc to output .rst restructured text               0 days    georg.brandl                  

No library reference tree in chm help file                          0 days    georg.brandl                  

test_capi crashes when called more than once                        4 days    benjamin.peterson             

Segfault in typeobject.c                                            5 days    amaury.forgeotdarc            

On Windows os.listdir('') -> cwd and os.listdir(u'') -> C:\         1 days    ezio.melotti                  

Add PyOS_string_to_double function to C API                         0 days    marketdickinson               

PEP 383 implementation                                              1 days    loewis                        

Wrong function referenced in documentation of socket.inet_aton      1 days    georg.brandl                  

Reference platform-independent alternative in socket.inet_ntop d    1 days    georg.brandl                  

pygettext documentation                                             1 days    georg.brandl                  

Confusing float formatting for empty presentation type.             1 days    eric.smith                    

PEP 362 can be marked as finished?                                  0 days    georg.brandl                  

update: 1.0 --> 1.1                                                 1 days    georg.brandl                  

Odd formatting differences of keywords in reference                 0 days    georg.brandl                  

Typo in library on xmlrpc                                           0 days    georg.brandl                  

Missing space after period in xmlrpc library documentation          0 days    georg.brandl                  

warnings in unicodeobject.c                                         1 days    georg.brandl                  

_json: _convertPyInt_AsSsize_t() never raise any error              0 days    georg.brandl                  

Noddy examples haven't been updated to match PEP 3123               0 days    larry                         

Wrong type check in check_library_list                              1 days    tarek                         

customize_compiler broken                                           3 days    tarek                         

Bus error in test_posix on Mac OS                                   0 days    loewis                        

test_os failure on OS X, probably related to PEP 383                0 days    marketdickinson               

Fix spelling error in Capsule docs                                  0 days    georg.brandl                  

Deprecate CObject                                                   0 days    georg.brandl                  

setlocale regression                                                0 days    georg.brandl                  

AttributeError exception in urllib.urlopen                          1 days    amaury.forgeotdarc            

aifc: close() does not close the underlying file                    1 days    georg.brandl                  

Typo in documentation of shelve.sync                                1 days    MLModel                       

Doc error: integer precision in formats                             1 days    eric.smith                    

Generator expression bug?                                           0 days    r.david.murray                

urllib doesn't correct server returned urls                      1875 days  orsenthil                     
       patch

strips directory info from files                                 1629 days georg.brandl

http_error_302() crashes with 'HTTP/1.1 400 Bad Request          1528 days orsenthil                     

PEP 349: allow str() to return unicode                           1350 days haypo                         

linecache module returns wrong results                           1313 days georg.brandl                  

locale.getpreferredencoding() dies when setlocale fails          1158 days asmodai                       

mailbox.Maildir re-reads directory too often                      881 days akuchling                     

linecache package handling                                        659 days georg.brandl                  

Top Issues Most Discussed (10)

 20 CVE-2008-5983 python: untrusted python modules search path        24 days

 19 test_asynchat fails on Mac OSX                                    18 days

 18 locale.getpreferredencoding() dies when setlocale fails         1158 days

 17 Change ntpath functions to implicitly support UNC paths           16 days

 13 bdist_msi generates version number for pure Python packages       75 days

 10 update: 1.0 --> 1.1                                      1 days

  9 test_os failure on OS X, probably related to PEP 383               0 days

  9 customize_compiler broken                                          3 days

  9 Add Google's to the stdlib                             220 days

  9 Ill-formed surrogates not treated as errors during encoding/dec  251 days

From brett at  Fri May  8 18:52:40 2009
From: brett at (Brett Cannon)
Date: Fri, 8 May 2009 09:52:40 -0700
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <> 
	<> <> 
Message-ID: <>

On Thu, May 7, 2009 at 18:56, John Arbash Meinel <
john.arbash.meinel at> wrote:

> Andrew Bennetts wrote:
> > Antoine Pitrou wrote:
> >> Robert Kern <robert.kern <at>> writes:
> >>> Since one may have more than one filesystem side-by-side, this can't
> >>> just be a system-wide boolean somewhere. One would have to query the
> >>> target directory for this information. I am not aware of the existence
> >>> of code that does such a query, though.
> >> Or you can just be practical and test for it. Create a file "foobar" and
> >> see if you can open "FOOBAR" in read mode...
> >
> > Agreed.  That is how Bazaar's test suite detects this, and it works well.
> >
> > -Andrew.
> Actually, I believe we do:
> open('format', 'wb').close()
> try:
>  os.lstat('FoRmAt')
> except IOError, e:
>  if e.errno == errno.ENOENT:
>   ...
> I don't know that it really matters, just wanted to indicate we use
> 'lstat' rather than 'open()' to check. I could be wrong about the test
> suite, but I know that is what we do for 'live' files. (We always create
> a format file, so we know it is there to 'stat' it via a different name.)

Thanks for the help to everyone. I ended up simply taking __file__, making
it all uppercase (or lowercase if it is already uppercase) and then doing
os.path.exists() on the modified name. Seems to work.
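
A minimal sketch of the check described above, assuming the path passed in
exists and its file name contains at least one letter. The helper names
flip_case and looks_case_insensitive are hypothetical, and flipping only the
final path component (rather than the whole of __file__) is an assumption
made so that the parent directory is still found on a case-sensitive
filesystem.

```python
import os


def flip_case(name):
    """Return name upper-cased, or lower-cased if it is already all upper."""
    flipped = name.upper()
    if flipped == name:
        flipped = name.lower()
    return flipped


def looks_case_insensitive(path):
    """Guess case-insensitivity by checking a case-flipped spelling of path.

    Only the last path component is flipped. A True result can also mean
    that a second, distinct file with the flipped name happens to exist.
    """
    directory, name = os.path.split(path)
    flipped = flip_case(name)
    if flipped == name:
        raise ValueError("file name has no letters to flip: %r" % name)
    return os.path.exists(os.path.join(directory, flipped))
```

Typical use would be looks_case_insensitive(__file__); the answer only
describes the filesystem that holds that particular file.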


From google at  Fri May  8 19:01:55 2009
From: google at (MRAB)
Date: Fri, 08 May 2009 18:01:55 +0100
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Brett Cannon wrote:
> On Thu, May 7, 2009 at 18:56, John Arbash Meinel 
> <john.arbash.meinel at> wrote:
>     Andrew Bennetts wrote:
>      > Antoine Pitrou wrote:
>      >> Robert Kern <robert.kern <at>> writes:
>      >>> Since one may have more than one filesystem side-by-side, this
>      >>> can't just be a system-wide boolean somewhere. One would have to
>      >>> query the target directory for this information. I am not aware
>      >>> of the existence of code that does such a query, though.
>      >> Or you can just be practical and test for it. Create a file
>      >> "foobar" and see if you can open "FOOBAR" in read mode...
>      >
>      > Agreed.  That is how Bazaar's test suite detects this, and it
>      > works well.
>      >
>      > -Andrew.
>     Actually, I believe we do:
>     open('format', 'wb').close()
>     try:
>      os.lstat('FoRmAt')
>     except IOError, e:
>      if e.errno == errno.ENOENT:
>       ...
>     I don't know that it really matters, just wanted to indicate we use
>     'lstat' rather than 'open()' to check. I could be wrong about the test
>     suite, but I know that is what we do for 'live' files. (We always create
>     a format file, so we know it is there to 'stat' it via a different
>     name.)
> Thanks for the help to everyone. I ended up simply taking __file__, 
> making it all uppercase (or lowercase if it is already uppercase) and 
> then doing os.path.exists() on the modified name. Seems to work.
Alternatively, use swapcase() and then os.path.exists().

From phd at  Fri May  8 19:17:15 2009
From: phd at (Oleg Broytmann)
Date: Fri, 8 May 2009 21:17:15 +0400
Subject: [Python-Dev] Easy way to detect filesystem case-sensitivity?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 08, 2009 at 09:52:40AM -0700, Brett Cannon wrote:
> Thanks for the help to everyone. I ended up simply taking __file__, making
> it all uppercase (or lowercase if it is already uppercase) and then doing
> os.path.exists() on the modified name. Seems to work.

   What if __file__ is on a different filesystem with different rules
(consider NFS, SMB/CIFS, etc.)?
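
One way to address this concern is to probe the directory of interest
directly, combining the lstat() and swapcase() ideas from earlier in the
thread. The sketch below is only an illustration: the function name
directory_is_case_insensitive is hypothetical, and it assumes the caller may
create and delete a scratch file in the target directory.

```python
import os
import tempfile


def directory_is_case_insensitive(directory):
    """Probe one directory by creating a scratch file and stat-ing it
    under a case-swapped name."""
    # The prefix contains letters, so swapcase() always changes the name.
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, name = os.path.split(path)
        try:
            os.lstat(os.path.join(head, name.swapcase()))
        except FileNotFoundError:
            # The swapped spelling is a different name: case-sensitive.
            return False
        return True
    finally:
        os.remove(path)
```

Because the probe runs inside the directory being asked about, the result
reflects that mount point rather than wherever the calling module lives.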

     Oleg Broytmann              phd at
           Programmers don't die, they just GOSUB without RETURN.

From casey at  Fri May  8 19:19:24 2009
From: casey at (Casey Duncan)
Date: Fri, 8 May 2009 11:19:24 -0600
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

On May 4, 2009, at 3:10 AM, Larry Hastings wrote:

> I should have brought this up to python-dev before--sorry for being  
> so slow.  It's already in the tracker for a couple of days:
> The idea: PyGetSetDef has this "void *closure" field that acts like  
> a context pointer.  You stick it in the PyGetSetDef, and it gets  
> passed back to you when your getter or setter is called.  It's a  
> reasonable API design, but in practice you almost never need it.   
> Meanwhile, it clutters up CPython, particularly typeobject.c; there  
> are all these function calls that end with ", NULL);", just to  
> satisfy the getter/setter prototype internally.

I think this is an important feature, which allows you to define  
generic, reusable getter and setter functions and pass static metadata  
to them at runtime. Admittedly I have never needed the full pointer,  
my typical usage is to pass in an offset.

I think this should only be removed if a suitable mechanism replaces  
it, if not it will require some needless duplication of code in  
extensions that use it (in particular my own) 8^)


From benjamin at  Fri May  8 20:09:56 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 8 May 2009 13:09:56 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
Message-ID: <>

A while ago, Guido declared that all special method lookups on
new-style classes bypass __getattr__ and __getattribute__. This is almost
completely consistent now, and I've been working on patching up a few
incorrect cases. I've now hit __enter__ and __exit__. The compiler
generates LOAD_ATTR instructions for these, so it uses the normal
lookup. The only way I can see to fix this is to add a new opcode which
uses _PyObject_LookupSpecial, but I don't think we really care this
much. Opinions?
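
To illustrate the rule being discussed, here is a small sketch using
__len__, one of the cases that already behaves consistently. The Proxy class
is hypothetical; it supplies __len__ only through __getattr__, so explicit
attribute access finds it while the implicit lookup done by len() does not.

```python
class Proxy:
    """Hypothetical class that answers missing attributes dynamically."""

    def __getattr__(self, name):
        if name == "__len__":
            return lambda: 3
        raise AttributeError(name)


p = Proxy()

# Explicit attribute access goes through __getattr__ and succeeds.
explicit = p.__len__()

# The implicit lookup made by len() inspects type(p) only, so the
# __getattr__ hook is never consulted and the call fails.
try:
    len(p)
    implicit_ok = True
except TypeError:
    implicit_ok = False
```

The question in this message is whether __enter__ and __exit__ should be
made to fail in the same way as len(p) does here.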


From larry at  Fri May  8 21:43:06 2009
From: larry at (Larry Hastings)
Date: Fri, 08 May 2009 12:43:06 -0700
Subject: [Python-Dev] Proposed: drop unnecessary "context" pointer from
In-Reply-To: <>
References: <>
Message-ID: <>

Casey Duncan wrote:
> I think this is an important feature, which allows you to define 
> generic, reusable getter and setter functions and pass static metadata 
> to them at runtime. Admittedly I have never needed the full pointer, 
> my typical usage is to pass in an offset.
> I think this should only be removed if a suitable mechanism replaces 
> it, if not it will require some needless duplication of code in 
> extensions that use it (in particular my own) 8^)

I disagree; I think it is a minor convenience feature, and one which 
encourages a lack of type safety.

A suitable replacement mechanism already exists in C:

    static PyObject *generic_getter(PyObject *o, int context) {
        /* your generic code goes here */
    }

    /* One thin wrapper per context value, using the proposed
       closure-free getter signature. */
    static PyObject *getter_with_context_1(PyObject *o) { return generic_getter(o, 1); }
    static PyObject *getter_with_context_2(PyObject *o) { return generic_getter(o, 2); }
    static PyObject *getter_with_context_3(PyObject *o) { return generic_getter(o, 3); }

You would then use "getter_with_context_1" &c in your PyGetSetDef.  With 
a clever optimizing compiler this should result in no detectable 
slowdown or code bloat.

However, you will be happy to learn there wasn't much support for this 
change, so it didn't make it into Python 3.1.


From tjreedy at  Sat May  9 00:41:14 2009
From: tjreedy at (Terry Reedy)
Date: Fri, 08 May 2009 18:41:14 -0400
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <gu2ce9$ft4$>

Benjamin Peterson wrote:
> A while ago, Guido declared that all special method lookups on
> new-style classes bypass __getattr__ and __getattribute__. This is almost
> completely consistent now, and I've been working on patching up a few
> incorrect cases. I've now hit __enter__ and __exit__. The compiler
> generates LOAD_ATTR instructions for these, so it uses the normal
> lookup. The only way I can see to fix this is to add a new opcode which
> uses _PyObject_LookupSpecial, but I don't think we really care this
> much. Opinions?

1. More consistent attribute lookup is, to me, a feature of 3.x and I 
appreciate you working on this.
2. I am puzzled why those two methods should be extra special, but don't 
know enough to say more.
3. If there are only those two or a couple of other exceptions, I'd like 
them listed in the 'Special method lookup' ref doc section.


From benjamin at  Sat May  9 00:54:23 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 8 May 2009 17:54:23 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <gu2ce9$ft4$>
References: <>
Message-ID: <>

2009/5/8 Terry Reedy <tjreedy at>:
> 2. I am puzzled why those two methods should be extra special, but don't
> know enough to say more.

They're not supposed to be special, which is the reason for this
message. :) Currently the interpreter will call __getattr__ when
looking them up. This is not the way it should be.


From daniel at  Sat May  9 01:10:53 2009
From: daniel at (Daniel Stutzbach)
Date: Fri, 8 May 2009 18:10:53 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 8, 2009 at 1:09 PM, Benjamin Peterson <benjamin at> wrote:

> I've now hit __enter__ and __exit__. The compiler
> generates LOAD_ATTR instructions for these, so it uses the normal
> lookup. The only way I can see to fix this is to add a new opcode which
> uses _PyObject_LookupSpecial, but I don't think we really care this
> much. Opinions?

Why does this problem arise only with __enter__ and __exit__?

Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <>

From benjamin at  Sat May  9 01:14:12 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 8 May 2009 18:14:12 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/8 Daniel Stutzbach <daniel at>:
> On Fri, May 8, 2009 at 1:09 PM, Benjamin Peterson <benjamin at>
> wrote:
>> I've now hit __enter__ and __exit__. The compiler
>> generates LOAD_ATTR instructions for these, so it uses the normal
>> lookup. The only way I can see to fix this is to add a new opcode which
>> uses _PyObject_LookupSpecial, but I don't think we really care this
>> much. Opinions?
> Why does this problem arise only with __enter__ and __exit__?

Normally special methods use slots of the PyTypeObject struct.
typeobject.c looks up all those methods on Python classes correctly.
In the case of __enter__ and __exit__, the compiler generates bytecode
to look them up, and that bytecode uses PyObject_GetAttr.
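
The bytecode the compiler emits for a with statement can be inspected with
the dis module. The sketch below just collects the opcode names for a tiny
hypothetical function, use_context. The opcodes differ between CPython
versions, so no expected output is shown here.

```python
import dis


def use_context(cm):
    with cm:
        pass


# Collect the opcode names the compiler emitted for the function above.
opnames = [ins.opname for ins in dis.get_instructions(use_context)]
print(opnames)
```

On an interpreter that behaves as described in this message, the list would
include LOAD_ATTR entries for the two context manager methods.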


From daniel at  Sat May  9 02:36:44 2009
From: daniel at (Daniel Stutzbach)
Date: Fri, 8 May 2009 19:36:44 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson <benjamin at> wrote:

> Normally special methods use slots of the PyTypeObject struct.
> typeobject.c looks up all those methods on Python classes correctly.
> In the case of __enter__ and __exit__, the compiler generates bytecode
> to look them up, and that bytecode uses PyObject_GetAttr.

Would this problem apply to all special methods that don't use a slot in
PyTypeObject, then?  I know of several other examples:

__reduce__
__setstate__
__reversed__
__length_hint__
__sizeof__

(unless I misunderstand the definition of "special methods", which is
possible)

Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <>

From benjamin at  Sat May  9 02:37:45 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 8 May 2009 19:37:45 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/8 Daniel Stutzbach <daniel at>:
> On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson <benjamin at>
> wrote:
>> Normally special methods use slots of the PyTypeObject struct.
>> typeobject.c looks up all those methods on Python classes correctly.
>> In the case of __enter__ and __exit__, the compiler generates bytecode
>> to look them up, and that bytecode uses PyObject_GetAttr.
> Would this problem apply to all special methods that don't use a slot in
> PyTypeObject, then?  I know of several other examples:

Yes. I didn't think of those.

> __reduce__
> __setstate__
> __reversed__
> __length_hint__
> __sizeof__
> (unless I misunderstand the definition of "special methods", which is
> possible)


From tjreedy at  Sat May  9 02:56:25 2009
From: tjreedy at (Terry Reedy)
Date: Fri, 08 May 2009 20:56:25 -0400
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>	<gu2ce9$ft4$>
Message-ID: <gu2kbo$v1n$>

Benjamin Peterson wrote:
> 2009/5/8 Terry Reedy <tjreedy at>:
>> 2. I am puzzled why those two methods should be extra special, but don't
>> know enough to say more.
> They're not supposed to be special, which is the reason for this
> message. :) Currently the interpreter will call __getattr__ when
> looking them up. This is not the way it should be.

I was trying to ask the same question as Daniel did more clearly, and 
which you answered: they are special special methods because they are 
not in the PyTypeObject struct like the other special (name) methods. 
And that, I presume, is because they are specific to context manager 
objects, while all other 'special' methods (that I notice in 'Special 
method names') are more general in being applicable to multiple types.

Since built-in functions are compiled to load_global, call_function and 
operations to various special op codes, I could imagine that .__enter__ 
and .__exit__ are currently the only implicitly invoked special names 
that explicitly appear in code objects. I can see why you ask before 
burning an opcode (with parameter) to avoid that.

There are two issues: 1) bypass instance lookup; 2) bypass 
.__getattribute__() calling.  I presume you have or can do at least the 
first with a custom .__getattribute__ method.

Terry Jan Reedy

From tjreedy at  Sat May  9 03:47:32 2009
From: tjreedy at (Terry Reedy)
Date: Fri, 08 May 2009 21:47:32 -0400
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <gu2nbj$4kb$>

Benjamin Peterson wrote:
> 2009/5/8 Daniel Stutzbach <daniel at>:
>> On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson <benjamin at>
>> wrote:
>>> Normally special methods use slots of the PyTypeObject struct.
>>> typeobject.c looks up all those methods on Python classes correctly.
>>> In the case of __enter__ and __exit__, the compiler generates bytecode
>>> to look them up, and that bytecode uses PyObject_GetAttr.
>> Would this problem apply to all special methods that don't use a slot in
>> PyTypeObject, then?  I know of several other examples:
> Yes. I didn't think of those.
>> __reduce__
>> __setstate__
>> __reversed__
>> __length_hint__
>> __sizeof__
>> (unless I misunderstand the definition of "special methods", which is
>> possible)

__reversed__, at least, is called by the reversed() builtin, so there is 
no LOAD_ATTR k (__reversed__) byte code.  So for that, the problem is 
reduced to accessing type(it).__reversed__ without going thru 
type(it).__getattribute__.  I would think that a function that did that 
would work for the others on the list (all 4?) that also have no 
LOAD_ATTR bytecode.  Would a modified version of object.__getattribute__ 
work?


From benjamin at  Sat May  9 03:52:24 2009
From: benjamin at (Benjamin Peterson)
Date: Fri, 8 May 2009 20:52:24 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <gu2nbj$4kb$>
References: <>
Message-ID: <>

2009/5/8 Terry Reedy <tjreedy at>:
> Benjamin Peterson wrote:
>> 2009/5/8 Daniel Stutzbach <daniel at>:
>>> On Fri, May 8, 2009 at 6:14 PM, Benjamin Peterson <benjamin at>
>>> wrote:
>>>> Normally special methods use slots of the PyTypeObject struct.
>>>> typeobject.c looks up all those methods on Python classes correctly.
>>>> In the case of __enter__ and __exit__, the compiler generates bytecode
>>>> to look them up, and that bytecode uses PyObject_GetAttr.
>>> Would this problem apply to all special methods that don't use a slot in
>>> PyTypeObject, then?  I know of several other examples:
>> Yes. I didn't think of those.
>>> __reduce__
>>> __setstate__
>>> __reversed__
>>> __length_hint__
>>> __sizeof__
>>> (unless I misunderstand the definition of "special methods", which is
>>> possible)
> __reversed__, at least, is called by the reversed() builtin, so there is no
> LOAD_ATTR k (__reversed__) byte code.  So for that, the problem is reduced
> to accessing type(it).__reversed__ without going thru
> type(it).__getattribute__.  I would think that a function that did that
> would work for the others on the list (all 4?) that also have no LOAD_ATTR
> bytecode.  Would a modified version of object.__getattribute__ work?

No, it's easier to just use _PyObject_LookupSpecial there.
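
For readers following along in Python rather than C, the kind of lookup
being discussed can be approximated as below. This is only a sketch of the
idea (walk the type's MRO, skip the instance dict and the attribute hooks,
then bind the result); it is not the implementation of
_PyObject_LookupSpecial, and lookup_special and Seq are hypothetical names.

```python
def lookup_special(obj, name):
    """Find name on type(obj) via the MRO, ignoring the instance dict and
    any __getattr__/__getattribute__ hooks. Returns None if absent."""
    for klass in type(obj).__mro__:
        namespace = vars(klass)
        if name in namespace:
            attr = namespace[name]
            binder = getattr(type(attr), "__get__", None)
            if binder is None:
                return attr  # not a descriptor; use it as-is
            return binder(attr, obj, type(obj))
    return None


class Seq:
    def __init__(self):
        # Instance attribute: must NOT be picked up by the lookup.
        self.__reversed__ = lambda: iter("instance")

    def __getattr__(self, name):
        raise AssertionError("__getattr__ should not be consulted")

    def __reversed__(self):
        return iter([3, 2, 1])


found = lookup_special(Seq(), "__reversed__")
missing = lookup_special(Seq(), "__length_hint__")
```

Here found is the method defined on the class, bound to the instance, and
missing is None because neither Seq nor object defines __length_hint__.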


From tjreedy at  Sat May  9 08:26:58 2009
From: tjreedy at (Terry Reedy)
Date: Sat, 09 May 2009 02:26:58 -0400
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<gu2nbj$4kb$>
Message-ID: <gu37nf$u2u$>

Benjamin Peterson wrote:

>>>> __reduce__
>>>> __setstate__
>>>> __reversed__
>>>> __length_hint__
>>>> __sizeof__

> No, it's easier to just use _PyObject_LookupSpecial there.

Does that mean that the above 5 'work correctly' (or can easily be made 
to do so)?  Leaving just __enter__ and __exit__ as problems?

From chris at  Sat May  9 11:02:21 2009
From: chris at (Chris Withers)
Date: Sat, 09 May 2009 10:02:21 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
Message-ID: <>

P.J. Eby wrote:
> I didn't say there's *no* desire, however IIRC the only person who 
> *ever* asked on distutils-sig how to do a base package with setuptools 
> was the author of the ll.* packages. 

I've asked before ;-)


Simplistix - Content Management, Zope & Python Consulting

From chris at  Sat May  9 11:03:53 2009
From: chris at (Chris Withers)
Date: Sat, 09 May 2009 10:03:53 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>> I, for one, have been trying to figure out how to do "base namespace"
>> packages for years...
> You mean, without PEP 382?
> That won't be possible, unless you can coordinate all addon packages.
> Base packages are a feature solely of PEP 382.

Marc-Andre has achieved this, I think, without the PEP, but I never 
really understood how :-S


Simplistix - Content Management, Zope & Python Consulting

From chris at  Sat May  9 11:06:52 2009
From: chris at (Chris Withers)
Date: Sat, 09 May 2009 10:06:52 +0100
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Ok, so create three tar files:
> 1. base.tar, containing
>    simplistix/
>    simplistix/

So this can have code in it? And base.tar can have other 
modules and subpackages in it?
What happens if the base and an addon both define a package called 
simplistix.somepackage?

> 2. addon1.tar, containing
>    simplistix/addon1.pth (containing a single "*")

What does that * mean? I thought .pth files just had python in them?

> Unpack each of them anywhere on sys.path, in any order.

How would this work if base, addon1 and addon2 were eggs managed by 
buildout or setuptools?



Simplistix - Content Management, Zope & Python Consulting

From martin at  Sat May  9 11:27:22 2009
From: martin at ("Martin v. Löwis")
Date: Sat, 09 May 2009 11:27:22 +0200
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
Message-ID: <>

>> Ok, so create three tar files:
>> 1. base.tar, containing
>>    simplistix/
>>    simplistix/
> So this can have code in it? 

That's the point, yes.

> And base.tar can have other modules and subpackages in it?

Certainly, yes.

> What happens if the base and an addon both define a package called
> simplistix.somepackage?

Depends on whether simplistix.somepackage is a namespace package
(it should). If so, they get merged just as any other namespace
package.

>> 2. addon1.tar, containing
>>    simplistix/addon1.pth (containing a single "*")
> What does that * mean?

See PEP 382 (search for "*").

> I thought .pth files just had python in them?

Not at all - they never did. They have paths in them.
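
As a concrete illustration of the existing (pre-PEP 382) behaviour, the
sketch below lets the site module process a .pth file in a scratch
directory. The directory and file names are made up for the example. Note
that site also executes any .pth line that starts with "import", which is
presumably the kind of Python content the question refers to; the "*" entry
proposed by PEP 382 is not part of this sketch.

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as site_dir:
    extra = os.path.join(site_dir, "extra")
    os.mkdir(extra)

    # A .pth file: one path per line, relative to the directory holding it.
    # Entries that do not exist on disk are skipped by site.
    with open(os.path.join(site_dir, "demo.pth"), "w") as f:
        f.write("extra\n")

    saved_path = list(sys.path)
    try:
        site.addsitedir(site_dir)
        added = [p for p in sys.path if p not in saved_path]
    finally:
        # Undo the change so the demonstration leaves sys.path untouched.
        sys.path[:] = saved_path
```

After the call, added holds the scratch directory itself plus the "extra"
subdirectory named in demo.pth.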

>> Unpack each of them anywhere on sys.path, in any order.
> How would this work if base, addon1 and addon2 were eggs managed by
> buildout or setuptools?

What is a managed egg (i.e. what kind of management does buildout
or setuptools apply to it)?


From asmodai at  Sat May  9 13:24:55 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Sat, 9 May 2009 13:24:55 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

-On [20090501 20:59], "Martin v. Löwis" (martin at wrote:
>Right: if all portions install into the same directory, you can have
>base packages already.

Speaking as a user of packages, this use case is one I hardly ever encounter
with the Python software/modules/packages I use. The only ones that spring
to mind are the mx.* and ll.* packages. The rest simply create their own
namespace as <package>.*, but there's nothing that uses that same namespace
and installs separately from the base package that I know of.

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
Knowledge was inherent in all things. The world was a library...

From martin at  Sat May  9 13:40:48 2009
From: martin at ("Martin v. Löwis")
Date: Sat, 09 May 2009 13:40:48 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
	<>	<>	<>
Message-ID: <>

>> Right: if all portions install into the same directory, you can have
>> base packages already.
> Speaking as a user of packages, this use case is one I hardly ever encounter
> with the Python software/modules/packages I use. The only ones that spring
> to mind are the mx.* and ll.* packages. The rest simply create their own
> namespace as <package>.*, but there's nothing that uses that same namespace
> and installs separately from the base package that I know of.

There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*,
plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*,
xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, 	tw.*,
themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc.


From asmodai at  Sat May  9 13:50:37 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Sat, 9 May 2009 13:50:37 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

-On [20090509 13:40], "Martin v. Löwis" (martin at wrote:
>There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*,
>plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*,
>xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, 	tw.*,
>themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc.

Can it fairly be said, though, that the majority of those you just named are
related to Zope?

That would explain why I won't know of them as I avoid Zope like the plague.

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
Hope is a letter that never arrives, delivered by the postman of my

From zookog at  Sat May  9 15:49:13 2009
From: zookog at (Zooko O'Whielacronx)
Date: Sat, 9 May 2009 07:49:13 -0600
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

.pth files are why I can't easily use GNU stow with easy_install.
If installing a Python package involved writing new files into the
filesystem, but did not require reading, updating, and re-writing any
extant files such as .pth files, then GNU stow would Just Work with
easy_install the way it Just Works with most things.



From chris at  Sat May  9 16:07:01 2009
From: chris at (Chris Withers)
Date: Sat, 09 May 2009 15:07:01 +0100
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<>	<>	<>	<>	<>
Message-ID: <>

Jeroen Ruigrok van der Werven wrote:
> -On [20090509 13:40], "Martin v. Löwis" (martin at wrote:
>> There are a few others, though: zope.*, repoze.*, redturtle.*, iw.*,
>> plone.*, pycopia.*, p4a.*, plonehrm.*, plonetheme.*, pbp.*, lovely.*,
>> xm.*, paste.*, Products.*, buildout.*, five.*, silva.*, tl.*, 	tw.*,
>> themerubber.*, themetweaker.*, zc.*, z3c.*, zgeo.*, z3ext.*, etc.
> Can be fairly said, though, that the majority of those you just named are
> related to Zope?

They're also all pure namespace packages rather than base + addons, 
which is what we've been discussing...
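
For reference, a pure namespace package can already be built with the
standard library's pkgutil.extend_path, provided every portion ships the
same __init__.py. The sketch below is an illustration under that assumption:
it builds two portions of a made-up package, demo_ns_pkg, in separate
scratch directories and imports a module from each. It does not show the
base + addons arrangement, and it is not a claim about how any of the
projects listed above declare their namespaces.

```python
import importlib
import os
import sys
import tempfile

# Identical __init__.py shipped by every portion of the namespace.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create root/package/__init__.py and root/package/<module>.py."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first")
    second = os.path.join(tmp, "second")
    make_portion(first, "demo_ns_pkg", "alpha", "VALUE = 'alpha'\n")
    make_portion(second, "demo_ns_pkg", "beta", "VALUE = 'beta'\n")

    saved_path = list(sys.path)
    sys.path[:0] = [first, second]
    try:
        importlib.invalidate_caches()
        alpha = importlib.import_module("demo_ns_pkg.alpha")
        beta = importlib.import_module("demo_ns_pkg.beta")
        values = (alpha.VALUE, beta.VALUE)
    finally:
        sys.path[:] = saved_path
```

Both imports succeed even though the two modules live under different
sys.path entries, because extend_path adds every matching directory to the
package's __path__.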

> That would explain why I won't know of them as I avoid Zope like the plague.

More fool you...


Simplistix - Content Management, Zope & Python Consulting

From chris at  Sat May  9 16:10:23 2009
From: chris at (Chris Withers)
Date: Sat, 09 May 2009 15:10:23 +0100
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>> So this can have code in it? 
> That's the point, yes.
>> And base.tar can have other modules and subpackages in it?
> Certainly, yes.

Great, when is the PEP due to land in 2.x? ;-)

>> What happens if the base and an addon both define a package called
>> simplistix.somepackage?
> Depends on whether simplistix.somepackage is a namespace package
> (it should). If so, they get merged just as any other namespace
> package.

Sorry, I was looking at potential bug cases here. What happens if it's 
not a namespace package?

> See PEP 382 (search for "*").
>> I thought .pth files just had python in them?
> Not at all - they never did. They have paths in them.

I've certainly seen them with python in, and that's what I hate about 

>>> Unpack each of them anywhere on sys.path, in any order.
>> How would this work if base, addon1 and addon2 were eggs managed by
>> buildout or setuptools?
> What is a managed egg (i.e. what kind of management does buildout
> or setuptools apply to it)?

Sorry, bad wording on my part... I guess I meant more how would 
buildout/setuptools go about installing/uninstalling/etc packages 
that conform to PEP 382? Would setuptools/buildout need modification or 
would the changes take effect lower down in the stack?



Simplistix - Content Management, Zope & Python Consulting

From asmodai at  Sat May  9 16:14:34 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Sat, 9 May 2009 16:14:34 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

-On [20090509 16:07], Chris Withers (chris at wrote:
>They're also all pure namespace packages rather than base + addons, 
>which is what we've been discussing...

But from Martin's email I understood it more as being base packages. Unless
I misunderstood, of course.

If correct, which is it?

>More fool you...

Maybe, used/worked with it and don't care for it one iota. But that's a
whole different discussion.

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
Naritai jibun wo surikaetemo egao wa itsudemo suteki desuka...

From martin at  Sat May  9 16:18:44 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 09 May 2009 16:18:44 +0200
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>	
	<> <>
Message-ID: <>

Zooko O'Whielacronx wrote:
> .pth files are why I can't easily use GNU stow with easy_install.
> If installing a Python package involved writing new files into the
> filesystem, but did not require reading, updating, and re-writing any
> extant files such as .pth files, then GNU stow would Just Work with
> easy_install the way it Just Works with most things.

Please understand that this is the fault of easy_install, not of .pth
files. There is no technical need for easy_install to rewrite .pth
files on installation. It could just as well have created new .pth
files, rather than modifying existing ones.
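
A minimal sketch of the "one new .pth file per package" idea, using only
the standard library's site.addsitedir(). The package names and the
directory layout are assumptions made up for the illustration; this is
not how easy_install lays out its files.

```python
import os
import site
import sys
import tempfile


def demo_separate_pth_files():
    """Create one .pth file per (hypothetical) package and process them."""
    site_dir = tempfile.mkdtemp()
    added = []
    for name in ("pkg_a", "pkg_b"):  # hypothetical package names
        # Each package gets its own directory inside the site directory.
        pkg_dir = os.path.join(site_dir, name + "-1.0")
        os.mkdir(pkg_dir)
        # One .pth file per package: installing pkg_b never rewrites pkg_a.pth.
        with open(os.path.join(site_dir, name + ".pth"), "w") as f:
            f.write(pkg_dir + "\n")
        added.append(pkg_dir)
    # addsitedir() adds site_dir itself and reads every *.pth file in it,
    # appending each listed path that exists to sys.path.
    site.addsitedir(site_dir)
    return site_dir, added


site_dir, added = demo_separate_pth_files()
print([p in sys.path for p in added])
```

Because every package only ever creates a file of its own, installation
never needs to read, update, and re-write a file that already exists.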

If you always use --single-version-externally-managed with easy_install,
it will stop editing .pth files on installation.


From martin at  Sat May  9 16:32:39 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 09 May 2009 16:32:39 +0200
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
	<> <>
Message-ID: <>

Chris Withers wrote:
> Martin v. Löwis wrote:
>>> So this can have code in it? 
>> That's the point, yes.
>>> And base.tar can have other modules and subpackages in it?
>> Certainly, yes.
> Great, when is the PEP due to land in 2.x? ;-)

Most likely, never - it probably will be implemented only after
the last feature release of 2.x was made.

>>> What happens if the base and an addon both define a package called
>>> simplistix.somepackage?
>> Depends on whether simplistix.somepackage is a namespace package
>> (it should). If so, they get merged just as any other namespace
>> package.
> Sorry, I was looking at potential bug cases here. What happens if it's
> not a namespace package?

Then it will be imported as a regular child package.

>>>> Unpack each of them anywhere on sys.path, in any order.
>>> How would this work if base, addon1 and addon2 were eggs managed by
>>> buildout or setuptools?
>> What is a managed egg (i.e. what kind of management does buildout
>> or setuptools apply to it)?
> Sorry, bad wording on my part... I guess I meant more how would
> buildout/setuptools go about installing/uninstalling/etc packages
> that conform to PEP 382? Would setuptools/buildout need modification or
> would the changes take effect lower down in the stack?

Unfortunately, I don't know precisely what they do, so I don't know
whether any of it needs modification.

All I can say is that if they want to install namespace packages
using the mechanism of PEP 382, they will have to produce the file
layout specified in the PEP.

For distutils (which is the only library in that area that I do know),
I think just installing any .pth files inside a package would be


From martin at  Sat May  9 16:34:28 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 09 May 2009 16:34:28 +0200
Subject: [Python-Dev] PEP 382: Namespace Packages
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Jeroen Ruigrok van der Werven wrote:
> -On [20090509 16:07], Chris Withers (chris at wrote:
>> They're also all pure namespace packages rather than base + addons, 
>> which is what we've been discussing...
> But from Martin's email I understood it more as being base packages. Unless
> I misunderstood, of course.
> If correct, which is it?

The list I gave you was a list of distributions that include namespace
packages (using the setuptools mechanism). I don't think that any of
them has the notion of a base package, as the setuptools mechanism
doesn't support base packages.


From pje at  Sat May  9 16:41:02 2009
From: pje at (P.J. Eby)
Date: Sat, 09 May 2009 10:41:02 -0400
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 04:18 PM 5/9/2009 +0200, Martin v. Löwis wrote:
>Zooko O'Whielacronx wrote:
> > .pth files are why I can't easily use GNU stow with easy_install.
> > If installing a Python package involved writing new files into the
> > filesystem, but did not require reading, updating, and re-writing any
> > extant files such as .pth files, then GNU stow would Just Work with
> > easy_install the way it Just Works with most things.
>Please understand that this is the fault of easy_install, not of .pth
>files. There is no technical need for easy_install to rewrite .pth
>files on installation. It could just as well have created new .pth
>files, rather than modifying existing ones.
>If you always use --single-version-externally-managed with easy_install,
>it will stop editing .pth files on installation.

It's --multi-version (-m) that does 
that.  --single-version-externally-managed is a " install" option.

Both have the effect of not editing .pth files, but they do so in 
different ways.  The " install" option causes it to install 
in a distutils-compatible layout, whereas --multi-version simply 
drops .egg files or directories in the target location and leaves it 
to the user (or the generated script wrappers) to add them to sys.path.

From martin at  Sat May  9 16:42:01 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 09 May 2009 16:42:01 +0200
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

>> If you always use --single-version-externally-managed with easy_install,
>> it will stop editing .pth files on installation.
> It's --multi-version (-m) that does that. 
> --single-version-externally-managed is a " install" option.
> Both have the effect of not editing .pth files, but they do so in
> different ways.  The " install" option causes it to install in a
> distutils-compatible layout, whereas --multi-version simply drops .egg
> files or directories in the target location and leaves it to the user
> (or the generated script wrappers) to add them to sys.path.

Ah, ok. Is there also an easy_install invocation that unpacks the zip
file into some location of sys.path (which then wouldn't require
editing sys.path)?


From pje at  Sat May  9 17:39:52 2009
From: pje at (P.J. Eby)
Date: Sat, 09 May 2009 11:39:52 -0400
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 04:42 PM 5/9/2009 +0200, Martin v. Löwis wrote:
> >> If you always use --single-version-externally-managed with easy_install,
> >> it will stop editing .pth files on installation.
> >
> > It's --multi-version (-m) that does that.
> > --single-version-externally-managed is a " install" option.
> >
> > Both have the effect of not editing .pth files, but they do so in
> > different ways.  The " install" option causes it to install in a
> > distutils-compatible layout, whereas --multi-version simply drops .egg
> > files or directories in the target location and leaves it to the user
> > (or the generated script wrappers) to add them to sys.path.
>Ah, ok. Is there also an easy_install invocation that unpacks the zip
>file into some location of sys.path (which then wouldn't require
>editing sys.path)?

Not as yet.  I'm sort of waiting to see what comes out of PEP 376 
discussions re: an installation manifest...  but then, if I actually 
had time to work on it right now, I'd probably just implement something.

Currently, you can use pip to do that, though, as long as the 
packages you want are in source form.  pip doesn't unzip eggs as yet.

It would be really straightforward, though, for someone to implement 
an easy_install variant that does this.  Just invoke "easy_install 
-Zmaxd /some/tmpdir packagelist" to get a full set of unpacked .egg 
directories in /some/tmpdir, and then move the contents of the 
resulting .egg subdirs to the target location, renaming EGG-INFO 
subdirs to projectname-version.egg-info subdirs.

(Of course, this ignores the issue of uninstalling previous versions, 
or overwriting of conflicting files in the target -- does pip handle these?)
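
The "move and rename" step described above could be sketched roughly as
follows. The function name flatten_eggs is hypothetical, and the sketch
assumes unpacked egg directories are named
project-version-pyX.Y[-platform].egg. It refuses to overwrite existing
files and does nothing about uninstalling previous versions.

```python
import os
import shutil


def flatten_eggs(tmpdir, target):
    """Move the contents of each unpacked .egg directory into target."""
    for entry in sorted(os.listdir(tmpdir)):
        egg_dir = os.path.join(tmpdir, entry)
        if not (entry.endswith(".egg") and os.path.isdir(egg_dir)):
            continue
        # Assumed name layout: project-version-pyX.Y[-platform].egg
        project, version = entry[:-len(".egg")].split("-")[:2]
        for item in os.listdir(egg_dir):
            dest_name = item
            if item == "EGG-INFO":
                # Rename the metadata directory as described in the message.
                dest_name = "%s-%s.egg-info" % (project, version)
            dest = os.path.join(target, dest_name)
            if os.path.exists(dest):
                # Conflicting files are not resolved here, only detected.
                raise RuntimeError("refusing to overwrite %s" % dest)
            shutil.move(os.path.join(egg_dir, item), dest)
```

Usage would be something like flatten_eggs("/some/tmpdir", site_packages)
after the easy_install run, where site_packages is whatever target
directory is wanted.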

From benjamin at  Sat May  9 17:52:11 2009
From: benjamin at (Benjamin Peterson)
Date: Sat, 9 May 2009 10:52:11 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <gu37nf$u2u$>
References: <>
Message-ID: <>

2009/5/9 Terry Reedy <tjreedy at>:
> Benjamin Peterson wrote:
>>>>> __reduce__
>>>>> __setstate__
>>>>> __reversed__
>>>>> __length_hint__
>>>>> __sizeof__
>> No, it's easier to just use _PyObject_LookupSpecial there.
> Does that mean that the above 5 'work correctly' (or can easily be made to
> do so)?  Leaving just __entry__ and __exit__ as problems?

Yes, __enter__ and __exit__ are the tricky ones.


From p.f.moore at  Sat May  9 18:03:20 2009
From: p.f.moore at (Paul Moore)
Date: Sat, 9 May 2009 17:03:20 +0100
Subject: [Python-Dev] PEP 382: little help for stupid people?
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

2009/5/9 Chris Withers <chris at>:
> Martin v. Löwis wrote:
>>> I thought .pth files just had python in them?
>> Not at all - they never did. They have paths in them.
> I've certainly seen them with python in, and that's what I hate about
> them...

AIUI, there was a small special case that lines starting with "import"
are executed (see the source of for details). This exception
has been exploited (some would say "abused", but I'm trying to be
unbiased here) by setuptools, at least, to do path manipulations and

PEP 382 does not provide the import exception: "Unlike .pth files on
the top level, lines starting with "import" are not supported in
per-package .pth files". It's not clear to me what impact this would
have on setuptools (probably none, as top-level .pth files aren't


From g.brandl at  Sat May  9 19:16:55 2009
From: g.brandl at (Georg Brandl)
Date: Sat, 09 May 2009 19:16:55 +0200
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <gu4dqt$mjj$>

Benjamin Peterson schrieb:
> A while ago, Guido declared that all special method lookups on
> new-style classes bypass __getattr__ and __getattribute__. This is almost
> completely consistent now, and I've been working on patching up a few
> incorrect cases. I've now hit __enter__ and __exit__. The compiler
> generates LOAD_ATTR instructions for these, so it uses the normal
> lookup. The only way I can see to fix this is add a new opcode which
> uses _PyObject_LookupSpecial, but I don't think we really care this
> much. Opinions?

It's easier to introduce a separate opcode like SETUP_WITH; the compilation
of a with statement produces quite a lot of bytecode which could be made
more efficient that way.
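
A small sketch of what type-based lookup means for the with statement.
It assumes an interpreter in which "with" looks up __enter__ and
__exit__ on the type (as far as I know, the case for current CPython 3
releases); under the LOAD_ATTR behaviour described in the quoted message,
the second case would have succeeded instead. The class names are
invented for the example.

```python
class Resource:
    """Defines the context-manager protocol on the class itself."""

    def __enter__(self):
        return "entered"

    def __exit__(self, exc_type, exc, tb):
        return False


class Plain:
    """Has no __enter__/__exit__ on the class."""


def try_with(obj):
    """Run a with statement on obj and report what happened."""
    try:
        with obj as value:
            return value
    except (AttributeError, TypeError) as e:
        # The exact exception type differs between CPython versions.
        return "failed: " + type(e).__name__


p = Plain()
# Attach the methods to the instance only, not to its type.
p.__enter__ = lambda: "instance enter"
p.__exit__ = lambda *args: False

result_class = try_with(Resource())
result_instance = try_with(p)
print(result_class, "|", result_instance)
```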


Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From greg.ewing at  Sun May 10 03:10:53 2009
From: greg.ewing at (Greg Ewing)
Date: Sun, 10 May 2009 13:10:53 +1200
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

Are we solving an actual problem by changing the
behaviour here, or is it just a case of foolish

Seems to me that trying to pin down exactly what
constitutes a "special method" is a fool's errand,
especially if you want it to include __enter__ and
__exit__ but not __reduce__, etc.


From benjamin at  Sun May 10 03:25:28 2009
From: benjamin at (Benjamin Peterson)
Date: Sat, 9 May 2009 20:25:28 -0500
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/9 Greg Ewing <greg.ewing at>:
> Are we solving an actual problem by changing the
> behaviour here, or is it just a case of foolish
> consistency?

"No implementation detail is obscure enough."

For example, Maciek Fijalkowski of PyPy told me that he cares about
this because someone is bound to eventually rely on it, and PyPy will
have to follow CPython.

> Seems to me that trying to pin down exactly what
> constitutes a "special method" is a fool's errand,
> especially if you want it to include __enter__ and
> __exit__ but not __reduce__, etc.

IMO, if it's a callable that begins with __ and ends with __, it's a
special method.


From zooko at  Sun May 10 17:41:33 2009
From: zooko at (Zooko Wilcox-O'Hearn)
Date: Sun, 10 May 2009 09:41:33 -0600
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On May 9, 2009, at 9:39 AM, P.J. Eby wrote:

> It would be really straightforward, though, for someone to  
> implement an easy_install variant that does this.  Just invoke  
> "easy_install -Zmaxd /some/tmpdir packagelist" to get a full set of  
> unpacked .egg directories in /some/tmpdir, and then move the  
> contents of the resulting .egg subdirs to the target location,  
> renaming EGG-INFO subdirs to projectname-version.egg-info subdirs.

Except for the renaming part, this is exactly what GNU stow does.

> (Of course, this ignores the issue of uninstalling previous  
> versions, or overwriting of conflicting files in the target -- does  
> pip handle these?)

GNU stow does handle these issues.



From martin at  Sun May 10 19:18:16 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 10 May 2009 19:18:16 +0200
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

> GNU stow does handle these issues.

If GNU stow solves all your problems, why do you want to
use easy_install in the first place?


From zooko at  Sun May 10 20:04:57 2009
From: zooko at (Zooko Wilcox-O'Hearn)
Date: Sun, 10 May 2009 12:04:57 -0600
Subject: [Python-Dev] how GNU stow is complementary rather than alternative
	to distutils
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

On May 10, 2009, at 11:18 AM, Martin v. Löwis wrote:

> If GNU stow solves all your problems, why do you want to use  
> easy_install in the first place?

That's a good question.  The answer is that there are two separate  
jobs: building executables and putting them in a directory structure  
of the appropriate shape for your system is one job, and installing  
or uninstalling that tree into your system is another.  GNU stow does  
only the latter.

The input to GNU stow is a set of executables, library files, etc.,  
in a directory tree that is of the right shape for your system.  For  
example, if you are on a Linux system, then your scripts all need to
be in $prefix/bin/, your shared libs should be in $prefix/lib, your
Python packages ought to be in $prefix/lib/python$x.$y/site-packages/,
etc.  GNU stow is blissfully ignorant about all issues of building
binaries, and choosing where to place files, etc. -- that's the job of
the build system of the package, e.g. the
"./configure --prefix=foo && make && make install" for most C packages,
or the "python ./ install --prefix=foo" for Python packages using
distutils (footnote 1).

Once GNU stow has the well-shaped directory which is the output of  
the build process, then it follows a very dumb, completely reversible  
(uninstallable) process of symlinking those files into the system  
directory structure.
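
The "very dumb, completely reversible" process can be sketched in a few
lines of Python. This is only an illustration of the idea, not GNU
stow's actual algorithm (stow can also symlink whole directories, which
this sketch does not do); the names stow, unstow and StowConflict are
made up here. It also shows the refusal on conflicting files that comes
up later in this message.

```python
import os


class StowConflict(Exception):
    """Raised when a target path is already claimed by something else."""


def stow(package_dir, target_dir):
    """Symlink every file under package_dir into target_dir."""
    # First pass: plan all links and refuse before touching anything.
    planned = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        for name in files:
            src = os.path.join(root, name)
            dest = os.path.normpath(os.path.join(target_dir, rel, name))
            if os.path.lexists(dest):
                raise StowConflict(dest)
            planned.append((src, dest))
    # Second pass: create parent directories and the symlinks themselves.
    for src, dest in planned:
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.symlink(os.path.abspath(src), dest)
    return [dest for _src, dest in planned]


def unstow(links):
    """Reverse stow() by removing exactly the links it created."""
    for dest in links:
        if os.path.islink(dest):
            os.unlink(dest)
```

Because installation only ever adds symlinks, uninstalling is just
deleting them again; nothing in the target tree is edited in place.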

It is a beautiful, elegant hack because it is sooo dumb.  It is also  
very nice to use the same tool to manage packages written in any  
programming language, provided only that they can build a directory  
tree of the right shape and content.

However, there are lots of things that it doesn't do, such as  
automatically acquiring and building dependencies, or producing  
executables for the target platform for each of your console  
scripts.  Not to mention creating a directory named
"$prefix/lib/python$x.$y/site-packages" and cp'ing your Python files
into it.  That's why you still need a build system even if you use GNU
stow for an install-and-uninstall system.

The thing that prevents this from working with setuptools is that
setuptools creates a file named easy_install.pth during the "python
./ install --prefix=foo".  If you build two different Python
packages this way, they will each create an easy_install.pth file,
and then when you ask GNU stow to link the two resulting packages  
into your system, it will say "You are asking me to install two  
different packages which both claim that they need to write a file  
named '/usr/local/lib/python2.5/site-packages/easy_install.pth'.  I'm  
too dumb to deal with this conflict, so I give up.".  If I understand  
correctly, your (MvL's) suggestion that easy_install create a .pth  
file named "easy_install-$PACKAGE-$VERSION.pth" instead of  
"easy_install.pth" would indeed make it work with GNU stow.



footnote 1: Aside from the .pth file issue, the other reason that  
setuptools doesn't work for this use while distutils does is that  
setuptools tries too hard to save you from making a mistake: maybe you  
don't know what you are doing if you ask it to install into a  
previously non-existent prefix dir "foo".  This one is easier to fix: # "be more like distutils  
with regard to --prefix=" .

From martin at  Sun May 10 20:21:48 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 10 May 2009 20:21:48 +0200
Subject: [Python-Dev] how GNU stow is complementary rather than
	alternative to distutils
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

Zooko Wilcox-O'Hearn wrote:
> On May 10, 2009, at 11:18 AM, Martin v. Löwis wrote:
>> If GNU stow solves all your problems, why do you want to use
>> easy_install in the first place?
> That's a good question.  The answer is that there are two separate jobs:
> building executables and putting them in a directory structure of the
> appropriate shape for your system is one job, and installing or
> uninstalling that tree into your system is another.  GNU stow does only
> the latter.

And so does easy_install - its job is *not* to build the executables
and to put them in a directory structure. Instead, it's
distutils/setuptools which has this job.

The primary purpose of easy_install is to download the files from PyPI

> The thing that prevents this from working with setuptools is that
> setuptools creates a file named easy_install.pth

It will stop doing that if you ask nicely. That's why I recommended
earlier that you do ask it not to edit .pth files.

> If I understand correctly,
> your (MvL's) suggestion that easy_install create a .pth file named
> "easy_install-$PACKAGE-$VERSION.pth" instead of "easy_install.pth" would
> indeed make it work with GNU stow.

My recommendation is that you use the already existing flag to install that stops it from editing .pth files.


From zookog at  Sun May 10 20:21:57 2009
From: zookog at (Zooko O'Whielacronx)
Date: Sun, 10 May 2009 12:21:57 -0600
Subject: [Python-Dev] how GNU stow is complementary rather than
	alternative to distutils
In-Reply-To: <>
References: <> <>
Message-ID: <>

following-up to my own post to mention one very important reason why
anyone cares:

On Sun, May 10, 2009 at 12:04 PM, Zooko Wilcox-O'Hearn <zooko at> wrote:

> It is a beautiful, elegant hack because it is sooo dumb.  It is also very
> nice to use the same tool to manage packages written in any programming
> language, provided only that they can build a directory tree of the right
> shape and content.

And, you are not relying on the author of the package that you are
installing to avoid accidentally or maliciously screwing up your
system.  You're not even relying on the authors of the *build system*
(e.g. the authors of distutils or easy_install).  You are relying
*only* on GNU stow to avoid accidentally or maliciously screwing up
your system, and GNU stow is very dumb, so it is easy to understand
what it is going to do and why that isn't going to irreversibly screw
up your system.

That is: you don't run the "build yourself and install into $prefix"
step as root.  This is an important consideration for a lot of people,
who absolutely refuse on principle to ever run "sudo python
./" on a system that they care about unless they wrote the
"" script themselves.  (Likewise they refuse to run "sudo make
install" on packages written in C.)



From pje at  Sun May 10 20:48:46 2009
From: pje at (P.J. Eby)
Date: Sun, 10 May 2009 14:48:46 -0400
Subject: [Python-Dev] how GNU stow is complementary rather than
 alternative to distutils
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 12:04 PM 5/10/2009 -0600, Zooko Wilcox-O'Hearn wrote:
>The thing that prevents this from working with setuptools is that
>setuptools creates a file named easy_install.pth during the "python 
>./ install --prefix=foo" if you build two different Python
>packages this way, they will each create an easy_install.pth file,
>and then when you ask GNU stow to link the two resulting packages
>into your system, it will say "You are asking me to install two
>different packages which both claim that they need to write a file
>named '/usr/local/lib/python2.5/site-packages/easy_install.pth'.

Adding --record and --single-version-externally-managed to that 
command line will prevent the .pth file from being used or needed, 
although I believe you already know this.

(What that mode won't do is install dependencies automatically.) 

From ncoghlan at  Sun May 10 23:51:32 2009
From: ncoghlan at (Nick Coghlan)
Date: Mon, 11 May 2009 07:51:32 +1000
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> A while ago, Guido declared that all special method lookups on
> new-style classes bypass __getattr__ and __getattribute__. This is almost
> completely consistent now, and I've been working on patching up a few
> incorrect cases. I've now hit __enter__ and __exit__. The compiler
> generates LOAD_ATTR instructions for these, so it uses the normal
> lookup. The only way I can see to fix this is add a new opcode which
> uses _PyObject_LookupSpecial, but I don't think we really care this
> much. Opinions?

As Georg pointed out, the expectation was that we would eventually add a
SETUP_WITH opcode that used the special method lookup (and hopefully
speed with statements up to a point where they're competitive with
writing out the associated try statement directly). The current code is
the way it is because there is no "LOAD_SPECIAL" opcode and adding type
dereferencing logic to the expansion would have been difficult without a
custom opcode.

For other special methods that are looked up from Python code, the
closest we can ever get is to bypass the instance (i.e. using
"type(obj).__method__(obj, *args)") to avoid metaclass confusion. The
type slots are even *more* special than that because they bypass
__getattribute__ and __getattr__ even on the metaclass for speed reasons.

There's a reason the docs already say that for a guaranteed override you
*must* actually define the special method on the class rather than
merely making it accessible via __getattr__ or even __getattribute__.
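
A short illustration of the point about defining the special method on
the class, using __len__ as the example. The class names are invented
for the sketch, and the behaviour shown is that of implicit special
method lookup through a type slot in CPython.

```python
class ViaGetattr:
    """Tries to supply __len__ dynamically; implicit lookup ignores this."""

    def __getattr__(self, name):
        if name == "__len__":
            return lambda: 3
        raise AttributeError(name)


class OnClass:
    """Defines __len__ on the class, which is what len() consults."""

    def __len__(self):
        return 3


def safe_len(obj):
    """Return len(obj), or None if the object has no usable __len__."""
    try:
        return len(obj)
    except TypeError:
        return None


# Ordinary attribute access goes through __getattr__ and finds the lambda.
explicit = ViaGetattr().__len__()
# len() looks on the type, bypassing __getattr__, and finds nothing.
implicit = safe_len(ViaGetattr())
on_class = safe_len(OnClass())
# The closest equivalent that can be written in Python code:
obj = OnClass()
by_type = type(obj).__len__(obj)
print(explicit, implicit, on_class, by_type)
```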

The PyPy guys are right to think that some developer somewhere is going
to rely on these implementation details in CPython at some point.
However lots of developers rely on CPython ref counting as well, no
matter how many times they're told not to do that if they want to
support alternative interpreters.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From fuzzyman at  Mon May 11 00:20:01 2009
From: fuzzyman at (Michael Foord)
Date: Sun, 10 May 2009 23:20:01 +0100
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> Benjamin Peterson wrote:
>> A while ago, Guido declared that all special method lookups on
>> new-style classes bypass __getattr__ and __getattribute__. This is almost
>> completely consistent now, and I've been working on patching up a few
>> incorrect cases. I've now hit __enter__ and __exit__. The compiler
>> generates LOAD_ATTR instructions for these, so it uses the normal
>> lookup. The only way I can see to fix this is add a new opcode which
>> uses _PyObject_LookupSpecial, but I don't think we really care this
>> much. Opinions?
> As Georg pointed out, the expectation was that we would eventually add a
> SETUP_WITH opcode that used the special method lookup (and hopefully
> speed with statements up to a point where they're competitive with
> writing out the associated try statement directly). The current code is
> the way it is because there is no "LOAD_SPECIAL" opcode and adding type
> dereferencing logic to the expansion would have been difficult without a
> custom opcode.
> For other special methods that are looked up from Python code, the
> closest we can ever get is to bypass the instance (i.e. using
> "type(obj).__method__(obj, *args)") to avoid metaclass confusion. The
> type slots are even *more* special than that because they bypass
> __getattribute__ and __getattr__ even on the metaclass for speed reasons.
> There's a reason the docs already say that for a guaranteed override you
> *must* actually define the special method on the class rather than
> merely making it accessible via __getattr__ or even __getattribute__.
> The PyPy guys are right to think that some developer somewhere is going
> to rely on these implementation details in CPython at some point.
> However lots of developers rely on CPython ref counting as well, no
> matter how many times they're told not to do that if they want to
> support alternative interpreters.

It's actually very annoying for things like writing Mock or proxy 
objects when this behaviour is inconsistent (sorry should have spoken up
earlier).

The Python interpreter bases some of its decisions on whether these 
methods exist at all - and when you have objects that provide methods 
through __getattr__ then you can accidentally get screwed if magic 
method lookup returns an object unexpectedly when it should have raised 
an AttributeError.

Of course for proxy objects it might be more convenient if *all* 
attribute access did go through __getattr__ - but with that not the case 
it is much better for it to be consistent rather than have to put in 
specific workaround code.

All the best,


> Cheers,
> Nick.


From google at  Mon May 11 00:50:40 2009
From: google at (MRAB)
Date: Sun, 10 May 2009 23:50:40 +0100
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Michael Foord wrote:
> Nick Coghlan wrote:
>> Benjamin Peterson wrote:
>>> A while ago, Guido declared that all special method lookups on
>>> new-style classes bypass __getattr__ and __getattribute__. This is almost
>>> completely consistent now, and I've been working on patching up a few
>>> incorrect cases. I've now hit __enter__ and __exit__. The compiler
>>> generates LOAD_ATTR instructions for these, so it uses the normal
>>> lookup. The only way I can see to fix this is add a new opcode which
>>> uses _PyObject_LookupSpecial, but I don't think we really care this
>>> much. Opinions?
>> As Georg pointed out, the expectation was that we would eventually add a
>> SETUP_WITH opcode that used the special method lookup (and hopefully
>> speed with statements up to a point where they're competitive with
>> writing out the associated try statement directly). The current code is
>> the way it is because there is no "LOAD_SPECIAL" opcode and adding type
>> dereferencing logic to the expansion would have been difficult without a
>> custom opcode.
>> For other special methods that are looked up from Python code, the
>> closest we can ever get is to bypass the instance (i.e. using
>> "type(obj).__method__(obj, *args)") to avoid metaclass confusion. The
>> type slots are even *more* special than that because they bypass
>> __getattribute__ and __getattr__ even on the metaclass for speed reasons.
>> There's a reason the docs already say that for a guaranteed override you
>> *must* actually define the special method on the class rather than
>> merely making it accessible via __getattr__ or even __getattribute__.
>> The PyPy guys are right to think that some developer somewhere is going
>> to rely on these implementation details in CPython at some point.
>> However lots of developers rely on CPython ref counting as well, no
>> matter how many times they're told not to do that if they want to
>> support alternative interpreters.
> It's actually very annoying for things like writing Mock or proxy 
> objects when this behaviour is inconsistent (sorry should have spoken up 
> earlier).
> The Python interpreter bases some of its decisions on whether these 
> methods exist at all - and when you have objects that provide methods 
> through __getattr__ then you can accidentally get screwed if magic 
> method lookup returns an object unexpectedly when it should have raised 
> an AttributeError.
> Of course for proxy objects it might be more convenient if *all* 
> attribute access did go through __getattr__ - but with that not the case 
> it is much better for it to be consistent rather than have to put in 
> specific workaround code.
Suggestion: have something like "from __future__" but affecting
compile-time behaviour (like pragmas in some other languages), such as
causing Python to generate bytecodes which perform all attribute access
through __getattr__.

From david.lyon at  Mon May 11 03:32:11 2009
From: david.lyon at (David Lyon)
Date: Sun, 10 May 2009 21:32:11 -0400
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, 10 May 2009 09:41:33 -0600, Zooko Wilcox-O'Hearn <zooko at>
>> (Of course, this ignores the issue of uninstalling previous
>> versions, or overwriting of conflicting files in the target -- does
>> pip handle these?)
> GNU stow does handle these issues.

I'm not sure GNU stow will handle the .PTH when deinstalling packages.

In easy_install.PTH there will be a list of all the packages installed.

This list really needs to be edited once a package is removed.

The .PTH files are a really good part of python. Definitely nothing
evil about them.


From giuott at  Mon May 11 14:26:49 2009
From: giuott at (Giuseppe Ottaviano)
Date: Mon, 11 May 2009 14:26:49 +0200
Subject: [Python-Dev] how GNU stow is complementary rather than
	alternative to distutils
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

Talking of stow, I take advantage of this thread to do some shameless  
advertising :)
Recently I uploaded to PyPI a software of mine, BPT [1], which does  
the same symlinking trick of stow, but it is written in Python (and  
with a simple api) and, more importantly, it allows with another trick  
the relocation of the installation directory (it creates a semi- 
isolated environment, similar to virtualenv).
I find it very convenient when I have to switch between several  
versions of the same packages (for example during development), or I  
have to deploy on the same machine software that needs different  
versions of the dependencies.

I am planning to write an integration layer with buildout and  
easy_install. It should be very easy, since BPT can handle directly  
tarballs (and directories, in trunk) which contain a


P.S. I was not aware of stow, I'll add it to the references and see if  
there are any features that I can steal

From aahz at  Mon May 11 14:46:44 2009
From: aahz at (Aahz)
Date: Mon, 11 May 2009 05:46:44 -0700
Subject: [Python-Dev] Switchover:
Message-ID: <>

On Monday 2009-05-11, will be switched to another machine
starting roughly at 14:00 UTC.  This should be invisible (expected
downtime is less than ten minutes).
Aahz (aahz at           <*>

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From fumanchu at  Mon May 11 18:53:51 2009
From: fumanchu at (Robert Brewer)
Date: Mon, 11 May 2009 09:53:51 -0700
Subject: [Python-Dev] py3k, cgi, email, and form-data
Message-ID: <F1962646D3B64642B7C9A06068EE1E6418B3DA@ex10.hostedexchange.local>

There's a major change in functionality in the cgi module between Python
2 and Python 3 which I've just run across: the behavior of
FieldStorage.read_multi, specifically when an HTTP app accepts a file
upload within a multipart/form-data payload.

In Python 2, each part would be read in sequence within its own
FieldStorage instance. This allowed file uploads to be shunted to a
TemporaryFile (via make_file) as needed:

    klass = self.FieldStorageClass or self.__class__
    part = klass(self.fp, {}, ib,
                 environ, keep_blank_values, strict_parsing)
    # Throw first part away
    while not part.done:
        headers = rfc822.Message(self.fp)
        part = klass(self.fp, headers, ib,
                     environ, keep_blank_values, strict_parsing)
        self.list.append(part)

In Python 3 (svn revision 72466), the whole request body is read into
memory first via self.fp.read(), and then broken into separate parts in a
second step:

    klass = self.FieldStorageClass or self.__class__
    parser = email.parser.FeedParser()
    # Create bogus content-type header for proper multipart parsing
    parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
    parser.feed(self.fp.read())
    full_msg = parser.close()
    # Get subparts
    msgs = full_msg.get_payload()
    for msg in msgs:
        fp = StringIO(msg.get_payload())
        part = klass(fp, msg, ib, environ, keep_blank_values,
                     strict_parsing)
        self.list.append(part)

This makes the cgi module in Python 3 somewhat crippled for handling
multipart/form-data file uploads of any significant size (and since
the client is the one determining the size, opens a server up to an
unexpected Denial of Service vector).

I *think* the FeedParser is designed to accept incremental writes,
but I haven't yet found a way to do any kind of incremental reads
from it in order to shunt the data out to a tempfile again.
I'm secretly hoping Barry has a one-liner fix for this. ;)
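
For illustration only (a sketch, not a patch to cgi.py): the bounded-memory behaviour being asked for amounts to copying the incoming stream to a temporary file in fixed-size chunks instead of calling read() once. The names spool_stream, CHUNK_SIZE and the limit parameter are assumptions made for this example; tempfile.SpooledTemporaryFile is the standard-library class.

```python
import tempfile

CHUNK_SIZE = 8192  # assumed chunk size, chosen only for the example


def spool_stream(fp, max_in_memory=64 * 1024, limit=None):
    """Copy a binary stream to a spooled temp file, one chunk at a time.

    At most CHUNK_SIZE bytes are read per call, and the spooled file
    rolls over to disk once it grows past max_in_memory bytes.
    """
    out = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    total = 0
    while True:
        chunk = fp.read(CHUNK_SIZE)
        if not chunk:
            break
        total += len(chunk)
        # Refuse oversized bodies instead of buffering whatever arrives.
        if limit is not None and total > limit:
            out.close()
            raise ValueError("body exceeds %d bytes" % limit)
        out.write(chunk)
    out.seek(0)
    return out
```

The limit check matters because the client, not the server, decides how much data arrives.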

Robert Brewer
fumanchu at

From pje at  Mon May 11 18:35:58 2009
From: pje at (P.J. Eby)
Date: Mon, 11 May 2009 12:35:58 -0400
Subject: [Python-Dev] .pth files are evil
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 04:42 PM 5/9/2009 +0200, Martin v. Löwis wrote:
> >> If you always use --single-version-externally-managed with easy_install,
> >> it will stop editing .pth files on installation.
> >
> > It's --multi-version (-m) that does that.
> > --single-version-externally-managed is a "setup.py install" option.
> >
> > Both have the effect of not editing .pth files, but they do so in
> > different ways.  The "setup.py install" option causes it to install in a
> > distutils-compatible layout, whereas --multi-version simply drops .egg
> > files or directories in the target location and leaves it to the user
> > (or the generated script wrappers) to add them to sys.path.
>Ah, ok. Is there also an easy_install invocation that unpacks the zip
>file into some location of sys.path (which then wouldn't require
>editing sys.path)?

No; you'd have to use the -e option to easy_install to download and
extract a source version of the package; then run that package's
setup.py, e.g.:

    easy_install -eb /some/tmpdir SomeProject
    cd /some/tmpdir/someproject  # subdir is always lowercased/normalized
    python setup.py install --single-version-externally-managed --record=...

I suspect that this is basically what pip is doing under the hood, as 
that would explain why it doesn't support .egg files.

I previously posted code to the distutils-sig that was an .egg 
unpacker with appropriate renaming, though.  It was untested, and 
assumes you already checked for collisions in the target directory, 
and that you're handling any uninstall manifest yourself.  It could 
probably be modified to take a filter function, though, something like:

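import os
from setuptools.archive_util import unpack_archive

def flatten_egg(egg_filename, extract_dir, filter=lambda s,d: d):
    # e.g. Foo-1.0.egg -> Foo-1.0.egg-info
    eggbase = os.path.basename(egg_filename)+'-info'
    def file_filter(src, dst):
        # Rename the egg's EGG-INFO/ directory to <eggname>-info/
        if src.startswith('EGG-INFO/'):
            src = eggbase+src[8:]
            dst = os.path.join(extract_dir, *src.split('/'))
        # The caller's filter returns the final path, or None to skip
        return filter(src, dst)
    return unpack_archive(egg_filename, extract_dir, file_filter)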

Then you could pass in a None-returning filter function to check and 
accumulate collisions and generate a manifest.  A second run with the 
default filter would do the unpacking.

(This function should work with either .egg files or .egg directories 
as input, btw, since unpack_archive treats a directory input as if it 
were an archive.)
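
As a rough sketch of the "None-returning filter" pass described above: the filter below records every target path, notes the ones that already exist, and returns None so that nothing is extracted. The names make_scan_filter, manifest and collisions are hypothetical and exist only for this example.

```python
import os


def make_scan_filter(manifest, collisions):
    """Return a filter that records targets and extracts nothing.

    manifest collects every destination path that would be written;
    collisions collects those that already exist on disk.
    """
    def scan(src, dst):
        manifest.append(dst)
        if os.path.lexists(dst):
            collisions.append(dst)
        # None means "skip this entry", so this pass never writes files.
        return None
    return scan
```

A first call such as flatten_egg(egg, target, make_scan_filter(manifest, collisions)) would then only scan; if collisions stays empty, a second call with the default filter does the unpacking, and manifest can be saved for a later uninstall.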

Anyway, if you used "easy_install -mxd /some/tmpdir [specs]" to get 
your target eggs found/built, you could then run this flattening 
function (with appropriate filter functions) over the *.egg contents 
of /some/tmpdir to do the actual installation.

(The reason for using -mxd instead of -Zmaxd or -zmaxd is that we 
don't care whether the eggs are zipped or not, and we leave out the 
-a so that dependencies already present on sys.path aren't copied or 
re-downloaded to the target; only dependencies we don't already have 
will get dropped in /some/tmpdir.)

Of course, the devil of this is in the details; to handle conflicts 
and uninstalls properly you would need to know what namespace 
packages were in the eggs you are installing.  But if you don't care 
about blindly overwriting things (as the distutils does not), then 
it's actually pretty easy to make such an unpacker.

I mainly haven't made one myself because I *do* care about things 
being blindly overwritten.

From asmodai at  Mon May 11 19:29:55 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Mon, 11 May 2009 19:29:55 +0200
Subject: [Python-Dev] Switchover:
In-Reply-To: <>
References: <>
Message-ID: <>

-On [20090511 14:47], Aahz (aahz at wrote:
>On Monday 2009-05-11, will be switched to another machine
>starting roughly at 14:00 UTC.  This should be invisible (expected
>downtime is less than ten minutes).

The headers for the python checkins mails are apparently different now. So
people might want to adjust any filtering.

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
The reverse side also has a reverse side...

From cesare.dimauro at  Mon May 11 20:00:16 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Mon, 11 May 2009 20:00:16 +0200 (CEST)
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At the last PyCon3 in Italy I presented a new Python implementation,
which you'll find at

WPython is a re-implementation of (some parts of) Python, which drops
support for bytecode in favour of a wordcode-based model (where a word
is 16 bits wide).

It also implements a hybrid stack-register virtual machine, and adds a
lot of other optimizations.

The slides are available in the download area. They explain the concept of
wordcode and show how some of the optimizations work, comparing them with
the current Python (2.6.1).

Unfortunately I haven't had time to run extensive benchmarks with real code,
so I've included some that I made with PyStone, PyBench, and a couple of
simple recursive function calls (Fibonacci and Factorial).

This is the first release, and another two are scheduled; the first one will
make it possible to select (almost) any optimization to be compiled in (so
fine-grained tests will be possible).

The second will be a rewrite of the constant folding code (specifically
for tuples, lists and dicts), removing a current "hack" to the Python type
system to make them "hashable" for the constants dictionary used by

Then I'll start writing some documentation that will explain what parts of
code are related to a specific optimization, so that it'll be easier to
create patches for other Python implementations, if needed.

You'll find a bit more information in the "README FIRST!" file in
the project's repository.

I made a lot of changes to the Python 2.6.1 source, so feel free to ask
me for any information about them.


From google at  Mon May 11 20:28:20 2009
From: google at (MRAB)
Date: Mon, 11 May 2009 19:28:20 +0100
Subject: [Python-Dev] py3k, cgi, email, and form-data
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6418B3DA@ex10.hostedexchange.local>
References: <F1962646D3B64642B7C9A06068EE1E6418B3DA@ex10.hostedexchange.local>
Message-ID: <>

Robert Brewer wrote:
> There's a major change in functionality in the cgi module between Python
> 2 and Python 3 which I've just run across: the behavior of
> FieldStorage.read_multi, specifically when an HTTP app accepts a file
> upload within a multipart/form-data payload.
> In Python 2, each part would be read in sequence within its own
> FieldStorage instance. This allowed file uploads to be shunted to a
> TemporaryFile (via make_file) as needed:
>     klass = self.FieldStorageClass or self.__class__
>     part = klass(self.fp, {}, ib,
>                  environ, keep_blank_values, strict_parsing)
>     # Throw first part away
>     while not part.done:
>         headers = rfc822.Message(self.fp)
>         part = klass(self.fp, headers, ib,
>                      environ, keep_blank_values, strict_parsing)
>         self.list.append(part)
> In Python 3 (svn revision 72466), the whole request body is read into
> memory first via, and then broken into separate parts in a
> second step:
>     klass = self.FieldStorageClass or self.__class__
>     parser = email.parser.FeedParser()
>     # Create bogus content-type header for proper multipart parsing
>     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
>     parser.feed(
>     full_msg = parser.close()
>     # Get subparts
>     msgs = full_msg.get_payload()
>     for msg in msgs:
>         fp = StringIO(msg.get_payload())
>         part = klass(fp, msg, ib, environ, keep_blank_values,
>                      strict_parsing)
>         self.list.append(part)
> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, opens a server up for an
> unexpected Denial of Service vector).
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)
I think what it needs is for the email.parser.FeedParser class to have
a feed_from_file() method, supported by the class BufferedSubFile.

The BufferedSubFile class keeps an internal list of lines. Perhaps it
could also have a list of files, so that when the list of lines becomes
empty it can continue by reading lines from the files instead, dropping
a file from the list when it reaches the end, something like this:

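class BufferedSubFile(object):
    def __init__(self):
        # The last partial line pushed into this object.
        self._partial = ''
        # The list of full, pushed lines, in reverse order
        self._lines = []
        # The list of files.
        self._files = []
        # Whether the file has been closed (as in the existing class).
        self._closed = False

    def readline(self):
        # Once the pushed lines run out, refill from the queued files.
        while not self._lines and self._files:
            data = self._files[0].read(MAX_DATA_SIZE)
            if data:
                self.push(data)
            else:
                # This file is exhausted; drop it and try the next one.
                del self._files[0]
        if not self._lines:
            if self._closed:
                return ''
            return NeedMoreData
        # The rest of the existing readline() continues from here.

    def push_file(self, data_file):
        """Push some new data from a file into this object."""
        self._files.append(data_file)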


and then:

class FeedParser:
    def feed(self, data):
        """Push more data into the parser."""
        self._input.push(data)
        self._call_parse()

    def feed_from_file(self, data_file):
        """Push more data from a file into the parser."""
        self._input.push_file(data_file)
        self._call_parse()

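To make the idea concrete, here is a small self-contained model of a line buffer that refills itself from queued files. It is only a sketch: FileBackedLineBuffer is a hypothetical name, MAX_DATA_SIZE is an assumed value, NeedMoreData is a local stand-in for the sentinel in the email package, and unlike the real BufferedSubFile it keeps lines oldest-first and only treats '\n' as a line ending.

```python
from collections import deque

MAX_DATA_SIZE = 8192      # assumed chunk size; the message does not define it
NeedMoreData = object()   # stand-in for the email package's sentinel


class FileBackedLineBuffer:
    """Hypothetical, simplified line buffer with a queue of files."""

    def __init__(self):
        self._partial = ''      # trailing text with no newline yet
        self._lines = deque()   # complete lines, oldest first
        self._files = deque()   # files still to be drained
        self._closed = False

    def push(self, data):
        """Split pushed text into complete lines, keeping any remainder."""
        pieces = (self._partial + data).split('\n')
        self._partial = pieces.pop()
        self._lines.extend(piece + '\n' for piece in pieces)

    def push_file(self, data_file):
        """Queue a file; it is read lazily, one chunk at a time."""
        self._files.append(data_file)

    def close(self):
        """Flush the last partial line and mark the buffer as finished."""
        if self._partial:
            self._lines.append(self._partial)
            self._partial = ''
        self._closed = True

    def readline(self):
        # Refill from the first queued file only when no lines are left.
        while not self._lines and self._files:
            data = self._files[0].read(MAX_DATA_SIZE)
            if data:
                self.push(data)
            else:
                self._files.popleft()
        if not self._lines:
            return '' if self._closed else NeedMoreData
        return self._lines.popleft()
```

Because each refill reads at most MAX_DATA_SIZE characters, memory use depends on the chunk size and the longest line rather than on the size of the upload.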

From solipsis at  Mon May 11 22:27:54 2009
From: solipsis at (Antoine Pitrou)
Date: Mon, 11 May 2009 20:27:54 +0000 (UTC)
Subject: [Python-Dev] A wordcode-based Python
References: <>
	<> <>
Message-ID: <>


> WPython is a re-implementation of (some parts of) Python, which drops
> support for bytecode in favour of a wordcode-based model (where a word
> is 16 bits wide).

This is great!
Have you planned to port it to the py3k branch? Or, at least, to trunk?
Some opcode and VM optimizations have gone in after 2.6 was released, although
nothing as invasive as you did.

About the CISC-y instructions, have you tried merging the fast and const arrays
in frame objects? That way, you need less opcode space (since e.g.
BINARY_ADD_FAST_FAST will cater for constants as well as local variables).



From collinw at  Mon May 11 23:14:44 2009
From: collinw at (Collin Winter)
Date: Mon, 11 May 2009 14:14:44 -0700
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <> <>
Message-ID: <>

Hi Cesare,

On Mon, May 11, 2009 at 11:00 AM, Cesare Di Mauro
<cesare.dimauro at> wrote:
> At the last PyCon3 at Italy I've presented a new Python implementation,
> which you'll find at

Good to see some more attention on Python performance! There's quite a
bit going on in your changes; do you have an
optimization-by-optimization breakdown, to give an idea about how much
performance each optimization gives?

Looking over the slides, I see that you still need to implement
functionality to make test_trace pass, for example; do you have a
notion of how much performance it will cost to implement the rest of
Python's semantics in these areas?

Also, I checked out wpython at head to run Unladen Swallow's
benchmarks against it, but it refuses to compile with either gcc 4.0.1
or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build
failures off-list, if you're interested.

Collin Winter

From martin at  Mon May 11 23:26:16 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 11 May 2009 23:26:16 +0200
Subject: [Python-Dev] albatross backup
Message-ID: <>

Hi Sean,

Can you please set up backup for albatross?

I gave sudo permissions to the "jafo" user, which has
the key jafo at authorized.

I think the policy now is that root logins to albatross
are not allowed. So what might work is this:

Create an rsyncbackup user, and give it sudo permission
to run rsync (any command line arguments). Put your backup
pubkey into rsyncbackup's authorized_keys.

Could that actually work?

albatross admins: would that be an acceptable setup?

As for volumes to backup: I think /srv needs regular backup.
Not sure about any of the others (and neither sure what your
current strategy is wrt. volumes on the other machines).
Compared to /srv, everything else is peanuts, anyway.


P.S. I have removed ~root/.ssh/authorized_keys. It only
contained my key, and root logins are disallowed, anyway.

P.P.S. You can stop doing regular backups to bag. I think we
should keep the machine on for a little while, then turn
it off and keep it around for a further while, and then return
it to XS4ALL, making a complete dump before returning it.

From martin at  Tue May 12 00:13:16 2009
From: martin at ("Martin v. Löwis")
Date: Tue, 12 May 2009 00:13:16 +0200
Subject: [Python-Dev] albatross backup
In-Reply-To: <>
References: <>
Message-ID: <>

[please ignore this message - I sent it to the wrong mailing list]


From skip at  Tue May 12 05:18:25 2009
From: skip at (skip at
Date: Mon, 11 May 2009 22:18:25 -0500
Subject: [Python-Dev] albatross backup
In-Reply-To: <>
References: <>
Message-ID: <>

    Martin> As for volumes to backup: I think /srv needs regular backup.
    Martin> Not sure about any of the others ....

Backup of /usr/local/spambayes-corpus would be very helpful.


From supreet.sethi at  Tue May 12 08:27:25 2009
From: supreet.sethi at (s|s)
Date: Tue, 12 May 2009 11:57:25 +0530
Subject: [Python-Dev] using help function in Py3k
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 5, 2009 at 7:13 PM, Daniel Stutzbach
<daniel at> wrote:
> On Tue, May 5, 2009 at 5:41 AM, s|s <supreet.sethi at> wrote:
>> LookupError: unknown encoding: uft-8
> uft-8?
> Looks like a variation of Issue 4540 (or a duplicate?? I can't tell)

Yes. It is the same issue. I don't think pydoc should be modified. In
my humble opinion tests should exist in /usr/share or /usr/share/doc.

> --
> Daniel Stutzbach, Ph.D.
> President, Stutzbach Enterprises, LLC


From cesare.dimauro at  Tue May 12 08:42:19 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Tue, 12 May 2009 08:42:19 +0200 (CEST)
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, May 11, 2009 10:27PM, Antoine Pitrou wrote:

Hi Antoine

> Hi,
>> WPython is a re-implementation of (some parts of) Python, which drops
>> support for bytecode in favour of a wordcode-based model (where a word
>> is 16 bits wide).
> This is great!
> Have you planned to port it to the py3k branch? Or, at least, to trunk?

It was my idea too, but first I need to take a deep look at what parts
of code are changed from 2.6 to 3.0.
That's because I don't know how much work is required for this
"forward" port.

> Some opcode and VM optimizations have gone in after 2.6 was released,
> although nothing as invasive as you did.

:-D Interesting.

> About the CISC-y instructions, have you tried merging the fast and const
> arrays in frame objects? That way, you need less opcode space (since e.g.
> BINARY_ADD_FAST_FAST will cater for constants as well as local variables).
> Regards
> Antoine.

It's an excellent idea, that needs exploration.

Running my stats tools against all .py files found in Lib and Tools
folders, I discovered that the maximum index used for fast/locals
is 79, and 1853 for constants.

So if I find a way to easily map locals first and constants following
in the same array, your great idea can be implemented saving
A LOT of opcodes and reducing ceval.c source code.
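
A tiny sketch of the "locals first, constants following" mapping, purely as an illustration: merged_slot and load are hypothetical helpers, not wpython or CPython code, and the layout is an assumption.

```python
def merged_slot(kind, index, nlocals):
    """Map a ('fast' | 'const', index) operand to one merged array slot.

    Locals occupy slots 0 .. nlocals-1; constants follow directly after.
    """
    if index < 0:
        raise IndexError("negative operand index")
    if kind == "fast":
        if index >= nlocals:
            raise IndexError("local index out of range")
        return index
    if kind == "const":
        return nlocals + index
    raise ValueError("unknown operand kind: %r" % (kind,))


def load(slots, kind, index, nlocals):
    """Fetch an operand from the merged array, whatever its kind."""
    return slots[merged_slot(kind, index, nlocals)]
```

With the figures above (highest local index 79, highest constant index 1853), the largest merged index would be 80 + 1853 = 1933 if both maxima occurred in the same code object.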

I'll work on that after the two releases that I planned.

Thanks for your precious suggestions!


From cesare.dimauro at  Tue May 12 08:54:01 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Tue, 12 May 2009 08:54:01 +0200 (CEST)
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <> <> 
Message-ID: <>

Hi Collin

On Mon, May 11, 2009 11:14PM, Collin Winter wrote:
> Hi Cesare,
> On Mon, May 11, 2009 at 11:00 AM, Cesare Di Mauro
> <cesare.dimauro at> wrote:
>> At the last PyCon3 at Italy I've presented a new Python implementation,
>> which you'll find at
> Good to see some more attention on Python performance! There's quite a
> bit going on in your changes; do you have an
> optimization-by-optimization breakdown, to give an idea about how much
> performance each optimization gives?

I've planned it for the next release, which will come maybe next week.

I'll introduce some #defines and #ifs in the code, so that
only specific optimizations will be enabled.

> Looking over the slides, I see that you still need to implement
> functionality to make test_trace pass, for example; do you have a
> notion of how much performance it will cost to implement the rest of
> Python's semantics in these areas?

Very little. That's because there are only two tests in test_trace that
don't pass.

I think the reason lies in the changes that I made to the loops.
With my code SETUP_LOOP and POP_BLOCK are completely
removed, so the code in settrace will fail to recognize the loop and
the virtual machine crashes.

I'll fix it in the second release that I have planned.

> Also, I checked out wpython at head to run Unladen Swallow's
> benchmarks against it, but it refuses to compile with either gcc 4.0.1
> or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build
> failures off-list, if you're interested.
> Thanks,
> Collin Winter

I'm very interested, thanks. That's because I worked only on Windows
machines, so I definitely need to test and fix it to let it run on any other


From solipsis at  Tue May 12 13:40:29 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 12 May 2009 11:40:29 +0000 (UTC)
Subject: [Python-Dev] A wordcode-based Python
References: <>
	<> <>
Message-ID: <>

Hi Cesare,

Cesare Di Mauro <cesare.dimauro <at>> writes:
> It was my idea too, but first I need to take a deep look at what parts
> of code are changed from 2.6 to 3.0.
> That's because I don't know how much work is required for this
> "forward" port.

If you have some questions or need some help, send me a message.



From cesare.dimauro at  Tue May 12 13:45:47 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Tue, 12 May 2009 13:45:47 +0200 (CEST)
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, May 12, 2009 01:40PM, Antoine Pitrou wrote:
> Hi Cesare,
> Cesare Di Mauro <cesare.dimauro <at>> writes:
>> It was my idea too, but first I need to take a deep look at what parts
>> of code are changed from 2.6 to 3.0.
>> That's because I don't know how much work is required for this
>> "forward" port.
> If you have some questions or need some help, send me a message.
> Regards
> Antoine.

OK, thanks. :)

Another note. Fredrik Johansson pointed out to me just a few minutes ago
that I compiled my sources without PGO optimizations enabled.

That's because I used Visual Studio Express Edition.

So a further gain in performance can be obtained. :)


From collinw at  Tue May 12 17:27:11 2009
From: collinw at (Collin Winter)
Date: Tue, 12 May 2009 08:27:11 -0700
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Tue, May 12, 2009 at 4:45 AM, Cesare Di Mauro
<cesare.dimauro at> wrote:
> Another note. Fredrik Johansson let me note just few minutes ago that I've
> compiled my sources without PGO optimizations enabled.
> That's because I used Visual Studio Express Edition.
> So another gain in performances can be obtained. :)

FWIW, Unladen Swallow experimented with gcc 4.4's FDO and got an
additional 10-30% (depending on the benchmark). The training load is
important, though: some training sets offered better performance than
others. I'd be interested in how MSVC's PGO compares to gcc's FDO in
terms of overall effectiveness. The results for gcc FDO with our
2009Q1 release are at the bottom of

Collin Winter

From cesare.dimauro at  Tue May 12 18:41:45 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Tue, 12 May 2009 18:41:45 +0200 (CEST)
Subject: [Python-Dev] A wordcode-based Python
In-Reply-To: <>
References: <> <> 
Message-ID: <>

On Tue, May 12, 2009 05:27 PM, Collin Winter wrote:
> On Tue, May 12, 2009 at 4:45 AM, Cesare Di Mauro
> <cesare.dimauro at> wrote:
>> Another note. Fredrik Johansson let me note just few minutes ago that
>> I've compiled my sources without PGO optimizations enabled.
>> That's because I used Visual Studio Express Edition.
>> So another gain in performances can be obtained. :)
> FWIW, Unladen Swallow experimented with gcc 4.4's FDO and got an
> additional 10-30% (depending on the benchmark). The training load is
> important, though: some training sets offered better performance than
> others. I'd be interested in how MSVC's PGO compares to gcc's FDO in
> terms of overall effectiveness. The results for gcc FDO with our
> 2009Q1 release are at the bottom of
> Collin Winter

Unfortunately I can't test PGO, since I use the Express Editions of VS.
Maybe Martin or other maintainers of the Windows versions can help here.

However it'll be difficult to find a good enough profile for the binaries
distributed for the official Python. FDO gives quite different results
depending on the profile selected.


From asmodai at  Tue May 12 18:43:55 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Tue, 12 May 2009 18:43:55 +0200
Subject: [Python-Dev] Switchover:
In-Reply-To: <>
References: <>
Message-ID: <>

-On [20090512 18:41], Barry Warsaw (barry at wrote:
>Somehow, personalization got turned off for python-checkins.  This  
>disables VERPing of the headers.  I've turned it back on, so please  
>let me know if that fixes the issue.  This did not appear to happen  
>site-wide, just for python-checkins AFAICT.

Yes, the current batches are arriving with personalization again. I don't
mind either way, just thought a heads up was warranted. ;)

Thanks Barry,

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
The Idea does not replace the work...

From barry at  Tue May 12 18:41:19 2009
From: barry at (Barry Warsaw)
Date: Tue, 12 May 2009 12:41:19 -0400
Subject: [Python-Dev] Switchover:
In-Reply-To: <>
References: <>
Message-ID: <>

On May 11, 2009, at 1:29 PM, Jeroen Ruigrok van der Werven wrote:

> -On [20090511 14:47], Aahz (aahz at wrote:
>> On Monday 2009-05-11, will be switched to another machine
>> starting roughly at 14:00 UTC.  This should be invisible (expected
>> downtime is less than ten minutes).
> The headers for the python checkins mails are apparently different now. So
> people might want to adjust any filtering.

Somehow, personalization got turned off for python-checkins.  This  
disables VERPing of the headers.  I've turned it back on, so please  
let me know if that fixes the issue.  This did not appear to happen  
site-wide, just for python-checkins AFAICT.



From jmatejek at  Tue May 12 20:42:52 2009
From: jmatejek at (Jan Matějek)
Date: Tue, 12 May 2009 20:42:52 +0200
Subject: [Python-Dev] CVE-2008-5983 "untrusted python modules search
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou napsal(a):
> Hello,
> I don't think it has already been posted to the list, apologies if it has.
> Some Linux tools and vendors have been hit by an alleged "security hole" where
> an embedded Python interpreter will prepend the current working directory to
> sys.path as soon as PySys_SetArgv() is called by the embedding application. This
> means, for example, that a Python file in the working directory can break
> plugins or extensions written for that application if the Python file happens to
> shadow another module.
> Regardless of whether this is a security hole or not, it certainly can make
> things disturbingly surprising when the situation arises. In the bug report
> (, I suggested we add a new function
> PySys_SetArgvEx() which would take an additional parameter telling whether to
> touch sys.path or not (in the same spirit as Py_InitializeEx() providing a more
> flexible API than Py_Initialize()).
> On the other hand, I don't think we can change the default behaviour of
> PySys_SetArgv(), since there are probably tools and applications relying on it
> (the obvious use case which comes to my mind is a third-party interactive
> interpreter).
> Any opinions?

yes! Actually, i wanted to propose and implement something like this
back when this vulnerability appeared, but i never got to it.

I'd propose to create a whole new function, called, say,
PySys_FillArgv() (no, i don't think that's a very good name) that would
-only- fill sys.argv and not touch sys.path. In addition to that, there
would be a function like PySys_SetScriptPath() that would not fill
sys.argv, but prepend the script's directory to sys.path.
Then i'd reimplement PySys_SetArgv as { PySys_FillArgv();
PySys_SetScriptPath(); }
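
The proposed split can be modelled in a few lines of Python. This is only an illustration of the decomposition, not the C implementation: fill_argv, set_script_path and set_argv are hypothetical stand-ins for the proposed PySys_FillArgv(), PySys_SetScriptPath() and the existing PySys_SetArgv(), operating on plain lists, and the path handling is deliberately simplified.

```python
import os


def fill_argv(sys_argv, argv):
    """Model of the proposed PySys_FillArgv(): set argv, nothing else."""
    sys_argv[:] = argv


def set_script_path(sys_path, argv):
    """Model of the proposed PySys_SetScriptPath(): prepend the script dir."""
    script = argv[0] if argv else ''
    # An empty entry (no script, or a script named without a directory)
    # stands for the current working directory.
    sys_path.insert(0, os.path.dirname(script))


def set_argv(sys_argv, sys_path, argv):
    """Model of today's combined behaviour, built from the two pieces."""
    fill_argv(sys_argv, argv)
    set_script_path(sys_path, argv)
```

An embedding application would call only the fill_argv() equivalent, so nothing in the working directory could shadow its modules, while an interactive interpreter would keep calling the combined function.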

And as a final killing step, i would never ever mention PySys_SetArgv
anywhere but in its own documentation ;e) And especially not in the
first page of "Embedding Python".

My rationale is that the only application deliberately using
PySys_SetArgv the way it's written is a Python interpreter. For that,
it's desirable to have '.' in sys.path _when no script is being
executed_. For *all other applications*, this makes no sense ;e)


> Regards
> Antoine.


From solipsis at  Wed May 13 00:06:12 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 12 May 2009 22:06:12 +0000 (UTC)
Subject: [Python-Dev] Shorter release schedule?
Message-ID: <>


Just food for thought here, but seeing how 3.1 is going to be a real featureful
schedule despite being released shortly after 3.0, wouldn't it make sense to
tighten future release planning a little? I was thinking something like doing a
major release every 12 months (rather than 18 to 24 months as has been
heuristically the case lately). This could also imply switching to some kind of
loosely time-based release system.

If I'm wildly off-base, you can either flame me, ignore me, or assign me
annoying release blockers involving memoryviews and weird character encodings :-)



From google at  Wed May 13 00:25:26 2009
From: google at (MRAB)
Date: Tue, 12 May 2009 23:25:26 +0100
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Hello,
> Just food for thought here, but seeing how 3.1 is going to be a real featureful
> schedule despite being released shortly after 3.0, wouldn't it make sense to
> tighten future release planning a little? I was thinking something like doing a
> major release every 12 months (rather than 18 to 24 months as has been
> heuristically the case lately). This could also imply switching to some kind of
> loosely time-based release system.
> If I'm wildly off-base, you can either flame me, ignore me, or assign me
> annoying release blockers involving memoryviews and weird character encodings :-)
Next you'll be saying that they should be named after years. Python
2010, anyone? :-)

I think that releases should depend on whether there are enough changes
for one.

From solipsis at  Wed May 13 00:29:23 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 12 May 2009 22:29:23 +0000 (UTC)
Subject: [Python-Dev] Shorter release schedule?
References: <>
Message-ID: <>

MRAB <google <at>> writes:
> Next you'll be saying that they should be named after years. Python
> 2010, anyone? 

After py3k, that would be a regression ;)



From barry at  Wed May 13 00:29:43 2009
From: barry at (Barry Warsaw)
Date: Tue, 12 May 2009 18:29:43 -0400
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

On May 12, 2009, at 6:06 PM, Antoine Pitrou wrote:

> Just food for thought here, but seeing how 3.1 is going to be a real  
> featureful
> schedule despite being released shortly after 3.0, wouldn't it make  
> sense to
> tighten future release planning a little? I was thinking something  
> like doing a
> major release every 12 months (rather than 18 to 24 months as has been
> heuristically the case lately). This could also imply switching to  
> some kind of
> loosely time-based release system.
> If I'm wildly off-base, you can either flame me, ignore me, or  
> assign me
> annoying release blockers involving memoryviews and weird character  
> encodings :-)

I've been in favor of that for a while now.  With the move to a DVCS  
(how's that coming along?) I think we can have more solid, releasable  
trunks for longer in the cycle.  Then, we'd have feature branches  
which wouldn't land in trunk until they too are solid and complete  
(with docs and tests).  If a particular feature doesn't make it, it'll  
just wait until the next release, which would be only 12 months off  
instead of almost 2 years off.



From collinw at  Wed May 13 00:35:31 2009
From: collinw at (Collin Winter)
Date: Tue, 12 May 2009 15:35:31 -0700
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 12, 2009 at 3:06 PM, Antoine Pitrou <solipsis at> wrote:
> Hello,
> Just food for thought here, but seeing how 3.1 is going to be a real featureful
> schedule despite being released shortly after 3.0, wouldn't it make sense to
> tighten future release planning a little? I was thinking something like doing a
> major release every 12 months (rather than 18 to 24 months as has been
> heuristically the case lately). This could also imply switching to some kind of
> loosely time-based release system.

I'd be in favor of a shorter, 12-month release cycle. I think the
limiting resource would be the time and energy of the release managers
and the package builders for Windows, etc. Provided it's not a tax on
the release staff, I think shorter release cycles would be a benefit
to the community. My own experience with time-based releases at work
is that it greatly helps focus energy and attention, knowing that you
can't simply delay the release if you slack off on your features/bugs.


From martin at  Wed May 13 05:26:08 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 13 May 2009 05:26:08 +0200
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

> Just food for thought here, but seeing how 3.1 is going to be a real featureful
> schedule despite being released shortly after 3.0, wouldn't it make sense to
> tighten future release planning a little?

Do you have any specific releases in mind that you would like to apply
such a tightened schedule to?

> I was thinking something like doing a
> major release every 12 months (rather than 18 to 24 months as has been
> heuristically the case lately). 

Such a schedule was initially used for the first 2.x releases. We then
switched to 18 months because of user complaints: if releases come too
frequently, the users are confused as to what release they should be
using. Even 24 months is too frequent for some: some people are only
starting to move to 2.5 right now - when we have stopped maintaining
it already.

One question is what would happen to the old releases: would we still
maintain them? If so, how many of them? For how long?


From fumanchu at  Wed May 13 05:43:21 2009
From: fumanchu at (Robert Brewer)
Date: Tue, 12 May 2009 20:43:21 -0700
Subject: [Python-Dev] [Web-SIG] py3k, cgi, email, and form-data
In-Reply-To: <>
References: <AcnSWQ/GR3W2RBf3RAKfzKnEHXpWuQ==>
Message-ID: <F1962646D3B64642B7C9A06068EE1E64085736FB@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> 2009/5/12 Robert Brewer <fumanchu at>:
> > There's a major change in functionality in the cgi module between Python
> > 2 and Python 3 which I've just run across: the behavior of
> > FieldStorage.read_multi, specifically when an HTTP app accepts a file
> > upload within a multipart/form-data payload.
> >
> > In Python 2, each part would be read in sequence within its own
> > FieldStorage instance. This allowed file uploads to be shunted to a
> > TemporaryFile (via make_file) as needed:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     part = klass(self.fp, {}, ib,
> >                  environ, keep_blank_values, strict_parsing)
> >     # Throw first part away
> >     while not part.done:
> >         headers = rfc822.Message(self.fp)
> >         part = klass(self.fp, headers, ib,
> >                      environ, keep_blank_values, strict_parsing)
> >         self.list.append(part)
> >
> > In Python 3 (svn revision 72466), the whole request body is read into
> > memory first via, and then broken into separate parts in a
> > second step:
> >
> >     klass = self.FieldStorageClass or self.__class__
> >     parser = email.parser.FeedParser()
> >     # Create bogus content-type header for proper multipart parsing
> >     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
> >     parser.feed(
> >     full_msg = parser.close()
> >     # Get subparts
> >     msgs = full_msg.get_payload()
> >     for msg in msgs:
> >         fp = StringIO(msg.get_payload())
> >         part = klass(fp, msg, ib, environ, keep_blank_values,
> >                      strict_parsing)
> >         self.list.append(part)
> >
> > This makes the cgi module in Python 3 somewhat crippled for handling
> > multipart/form-data file uploads of any significant size (and since
> > the client is the one determining the size, opens a server up for an
> > unexpected Denial of Service vector).
> >
> > I *think* the FeedParser is designed to accept incremental writes,
> > but I haven't yet found a way to do any kind of incremental reads
> > from it in order to shunt the out to a tempfile again.
> > I'm secretly hoping Barry has a one-liner fix for this. ;)
> FWIW, Werkzeug gave up on 'cgi' module for form parsing and implements
> its own.
> Not sure whether this issue in Python 3.0 was one of the reasons or
> not. I know one of the reasons was because cgi.FieldStorage is not
> WSGI 1.0 compliant. One of the main reasons that no one actually
> adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't
> been addressed by a proper amendment to WSGI 1.0 specification or a
> new WSGI 1.1 specification to allow a hint to readline().
> The Werkzeug form processing module is properly WSGI 1.0 compliant,
> meaning that Werkzeug is possibly the only major WSGI framework to be
> WSGI compliant.

FWIW, I just added a replacement for the cgi module to CherryPy over the weekend for the same reasons. It's in the python3 branch but will get backported to CherryPy 3.2 for Python 2.x.

Robert Brewer
fumanchu at

From greg.ewing at  Wed May 13 03:54:23 2009
From: greg.ewing at (Greg Ewing)
Date: Wed, 13 May 2009 13:54:23 +1200
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

MRAB wrote:

> Next you'll be saying that they should be named after years. Python
> 2010, anyone? :-)

To keep people on their toes, we should switch to a
completely random new naming scheme with every release,
like Microsoft has been doing with Windows.


From graham.dumpleton at  Wed May 13 04:33:02 2009
From: graham.dumpleton at (Graham Dumpleton)
Date: Wed, 13 May 2009 12:33:02 +1000
Subject: [Python-Dev] [Web-SIG] py3k, cgi, email, and form-data
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6418B3DA@ex10.hostedexchange.local>
References: <AcnSWQ/GR3W2RBf3RAKfzKnEHXpWuQ==>
Message-ID: <>

2009/5/12 Robert Brewer <fumanchu at>:
> There's a major change in functionality in the cgi module between Python
> 2 and Python 3 which I've just run across: the behavior of
> FieldStorage.read_multi, specifically when an HTTP app accepts a file
> upload within a multipart/form-data payload.
> In Python 2, each part would be read in sequence within its own
> FieldStorage instance. This allowed file uploads to be shunted to a
> TemporaryFile (via make_file) as needed:
>     klass = self.FieldStorageClass or self.__class__
>     part = klass(self.fp, {}, ib,
>                  environ, keep_blank_values, strict_parsing)
>     # Throw first part away
>     while not part.done:
>         headers = rfc822.Message(self.fp)
>         part = klass(self.fp, headers, ib,
>                      environ, keep_blank_values, strict_parsing)
>         self.list.append(part)
> In Python 3 (svn revision 72466), the whole request body is read into
> memory first via, and then broken into separate parts in a
> second step:
>     klass = self.FieldStorageClass or self.__class__
>     parser = email.parser.FeedParser()
>     # Create bogus content-type header for proper multipart parsing
>     parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
>     parser.feed(
>     full_msg = parser.close()
>     # Get subparts
>     msgs = full_msg.get_payload()
>     for msg in msgs:
>         fp = StringIO(msg.get_payload())
>         part = klass(fp, msg, ib, environ, keep_blank_values,
>                      strict_parsing)
>         self.list.append(part)
> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, opens a server up for an
> unexpected Denial of Service vector).
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)

FWIW, Werkzeug gave up on 'cgi' module for form parsing and implements its own.

Not sure whether this issue in Python 3.0 was one of the reasons or
not. I know one of the reasons was because cgi.FieldStorage is not
WSGI 1.0 compliant. One of the main reasons that no one actually
adheres to WSGI 1.0 is because of the 'cgi' module. This still hasn't
been addressed by a proper amendment to WSGI 1.0 specification or a
new WSGI 1.1 specification to allow a hint to readline().

The Werkzeug form processing module is properly WSGI 1.0 compliant,
meaning that Werkzeug is possibly the only major WSGI framework to be
WSGI compliant.
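
To make the memory concern in this thread concrete, here is a minimal sketch of reading a request body in bounded chunks and spooling it to disk once it grows large. It is not the code from the cgi module, Werkzeug, or CherryPy; the helper name spool_to_tempfile and its parameters (chunk_size, max_in_memory, max_total) are hypothetical, and it deliberately ignores multipart boundary parsing.

```python
import io
import tempfile


def spool_to_tempfile(stream, chunk_size=64 * 1024,
                      max_in_memory=1024 * 1024, max_total=None):
    """Copy a binary stream into a spooled temp file, one chunk at a time.

    Hypothetical helper for illustration only. At most chunk_size bytes
    are read per call, and the spool rolls over to a real file on disk
    once more than max_in_memory bytes have been written.
    """
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Optional cap, since the client decides how much it sends.
        if max_total is not None and total > max_total:
            spool.close()
            raise ValueError("upload exceeds %d bytes" % max_total)
        spool.write(chunk)
    spool.seek(0)
    return spool


# Usage: a 200000-byte body is copied 8192 bytes at a time.
body = io.BytesIO(b"x" * 200000)
upload = spool_to_tempfile(body, chunk_size=8192, max_in_memory=50000)
print(len(upload.read()))
```

The point of the sketch is only that peak memory is bounded by chunk_size plus max_in_memory rather than by the size of the upload, which is the property the Python 2 code path had and the FeedParser-based path appears to lose.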


From stephen at  Wed May 13 06:12:27 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 13 May 2009 13:12:27 +0900
Subject: [Python-Dev]  Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > Just food for thought here, but seeing how 3.1 is going to be a
 > real featureful schedule despite being released shortly after 3.0,
 > wouldn't it make sense to tighten future release planning a little?

With all due respect, it's easy and natural to have a short,
featureful release schedule immediately after a major release (or
should I say "complete rewrite"?)  The discussion should focus on what
happens as people become relatively satisfied with the core, and
development shifts to optimizations, (smallish) bug fixes, and
features that appeal to specialized audiences.  That is when both the
costs and the benefits of tighter and/or time-based releases appear.

 > I was thinking something like doing a major release every 12 months
 > (rather than 18 to 24 months as has been heuristically the case
 > lately). This could also imply switching to some kind of loosely
 > time-based release system.

I don't wish to express an opinion on either of these, as I'm not even
in a position to help with release blockers.  But I do hope discussion
will focus on the implications for Python 3.7, not Python 3.3.

From tleeuwenburg at  Wed May 13 06:44:52 2009
From: tleeuwenburg at (Tennessee Leeuwenburg)
Date: Wed, 13 May 2009 14:44:52 +1000
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 13, 2009 at 1:26 PM, "Martin v. Löwis" <martin at> wrote:

> > Just food for thought here, but seeing how 3.1 is going to be a real featureful
> > schedule despite being released shortly after 3.0, wouldn't it make sense to
> > tighten future release planning a little?
> Do you have any specific releases in mind that you would like to apply
> such a tightened schedule to?
> > I was thinking something like doing a
> > major release every 12 months (rather than 18 to 24 months as has been
> > heuristically the case lately).

If I can just respond with a bit of feedback from my workplace, I'd say that
slower is better. I'm grimacing as I write that :) because I personally love
to be able to take advantage of the new capabilities in each release.

Can I ask if there's any sense in pursuing a release schedule which is slow
for whatever might be deemed the "most core modules" but faster for "less
core modules"?

This is really a response to my workplace environment. The pro of that is
that it's a real example, but the con is that it may not be best practise :)

Something else which would definitely be useful for me personally would be a
kind of update egg which I could apply to, say, Python 3.0 to bring it up to
3.1 capabilities. Something that already happens now at work reasonably
often is that on my PC I have Python 2.4, 2.5, 2.6 and 3.0 installed. I tend
to develop under 2.6 from preference. However, server X only has 2.4
installed or worse, 2.3 which I don't even have. Recently I was bitten by
this as my code was relying heavily on some functionality in datetime which
had changed. I was faced with having to do some re-architecting that I
really didn't want to do.

Now, I don't know of course (I found another way around the issue), but
suppose the changes to Python I needed were relatively cosmetic, i.e. the
kind of thing I could maybe install into a virtualenv wrapper, then it would
have been quite easy for me to run my scripts written for Python 2.6.
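
As an illustration of the "relatively cosmetic" case, here is a small sketch of a compatibility shim that a script could carry with it instead of requiring a newer interpreter. The message does not say which datetime functionality was involved, so the choice of datetime.datetime.strptime (added in Python 2.5) is an assumption, and the function name parse_datetime is hypothetical.

```python
import datetime
import time


def parse_datetime(text, fmt):
    """Parse text into a datetime, with a fallback for older interpreters.

    Hypothetical shim: uses datetime.datetime.strptime where it exists
    and otherwise builds the datetime from time.strptime.
    """
    strptime = getattr(datetime.datetime, "strptime", None)
    if strptime is not None:
        return strptime(text, fmt)
    # time.strptime returns a struct_time; its first six fields are
    # year, month, day, hour, minute, second.
    return datetime.datetime(*time.strptime(text, fmt)[:6])


print(parse_datetime("2009-05-13 14:44:52", "%Y-%m-%d %H:%M:%S"))
```

A shim like this only works for features that can be expressed in pure Python on the older version; anything needing interpreter or compiler changes cannot be backported this way.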

To get to the point, I wonder if it would be possible to release new
versions alongside a patch or egg which someone with only user-level
privileges could use on a server to avoid being held back by a slower server
update cycle. A more frequent update cycle would then be easier to deal
with. More features would get out into use more quickly, while the pressures
of the lowest-common-denominator would be eased.

Just some thoughts...

From stephen at  Wed May 13 08:10:48 2009
From: stephen at (Stephen J. Turnbull)
Date: Wed, 13 May 2009 15:10:48 +0900
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

Tennessee Leeuwenburg writes:

 > Can I ask if there's any sense in pursuing a release schedule which
 > is slow for whatever might be deemed the "most core modules" but
 > faster for "less core modules"?

I think you need to be more specific about how many levels of "fast"
there should be, and why some modules might be deemed more or less

For example, this is part of why bsddb (sp?) was removed from the
stdlib, because its cycle is different from the core (it's heavily
torqued by whatever upstream chooses to throw at it, so it has been
the devil to test).  If you're not familiar with the history, you
might try searching the list for "bsddb 'Jesus Cea' stdlib" which
should bring up relevant threads.  (Make sure you spell the package
name right, sorry if I got it wrong!)

In short, the answer is "the stuff on a different cycle is already on

 > Something else which would definitely be useful for me personally
 > would be a kind of update egg which I could apply to, say, Python
 > 3.0 to bring it up to 3.1 capabilities.

But this would have to be considered on a per-feature basis.  If
that's possible for an individual feature (ie, doesn't involve changes
to the interpreter or compiler), almost surely the feature "did hard
time" in PyPI.  So you can probably get some version there.

OTOH, such an egg would have to contain only a subset of features.  If
there are interdependencies between that subset and those that can't
be included, in some sense you will be creating a completely new and
*untested* version of Python.  So I think that most server admins
would really want you to restrict to features you actually need, and
therefore the "best" update-egg will be very application-specific.

With the new features being proposed for distutils, I suppose you (or
anybody who feels like doing so) could create a "namespace package"
for such updates, pulling in the relevant modules from PyPI.  Do you
think that could work for you?  (See the PEP 382 threads for more
info; I think current discussion has moved to distutils-SIG).

From dirkjan at  Wed May 13 09:39:17 2009
From: dirkjan at (Dirkjan Ochtman)
Date: Wed, 13 May 2009 09:39:17 +0200
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 13, 2009 at 12:29 AM, Barry Warsaw <barry at> wrote:
> I've been in favor of that for a while now.  With the move to a DVCS (how's
> that coming along?)

I've been rewriting PEP 374 about the Mercurial migration. Will post
here once it's ready for review.



From larry.bugbee at  Wed May 13 10:01:33 2009
From: larry.bugbee at (Bugbee, Larry)
Date: Wed, 13 May 2009 01:01:33 -0700
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

From the perspective of this application developer and prototyper...

In general, releases should be more frequent when the language is less
mature and perhaps lacking.  With maturity one seeks stability and less
frequency.  Python is, for the most part, a mature language.  

I submit the issue is less a question of frequency, but more a question
of the number and value of each of the new features.

Too many new features added to a mature language begs the question of
simplicity vs complexity.  One of Python's original goals, if I recall
correctly, was to keep life simple, to have executable pseudocode, be
easy to learn and re-learn, and be able to quickly read and grok your
code 6-12 months later.  Ease of maintenance is a huge advantage of
Python.  From an application developer's perspective, too many confusing
features and the language becomes more and more like C++ and APL.  

I submit Python is now at the point where new features must not be added
just because they are cool, but because they indeed add significant
value *without* compromising simplicity and the suite of "easy to"
benefits.  The alternative is to rethink the long-term goals for the
language.  That could have large unintended consequences.


From henning.vonbargen at  Wed May 13 10:34:55 2009
From: henning.vonbargen at (henning.vonbargen at
Date: Wed, 13 May 2009 10:34:55 +0200 (CEST)
Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with thread
Message-ID: <>

How to build Python 2.6.2 on HP-UX Itanium with thread support?
Note: I know that the first address to post this question is comp.lang.python, but
I posted this question a week ago on comp.lang.python
and unfortunately, I didn't receive any answers.

According to Patch 1225212, 
at least Peter Kropf was able to get Python running with threading support 
on this platform, though AFAIK he was not using GCC.

But I guess it should be possible with GCC as well.

Is anyone able to confirm that Python (built with GCC)
does or does not work with multi-threading on HP-UX Itanium?

Is HP-UX Itanium a supported platform at all?
BTW: A search for "supported platforms" at does not help!

And if it does work, which steps need to be taken to build it,
e.g. other libraries/packages, environment variables, 
configure options, manual modifications?


From solipsis at  Wed May 13 10:39:03 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 13 May 2009 08:39:03 +0000 (UTC)
Subject: [Python-Dev] Shorter release schedule?
References: <>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> Such a schedule was initially used for the first 2.x releases. We then
> switched to 18 months because of user complaints: if releases come too
> frequently, the users are confused as to what release they should be
> using. Even 24 months is too frequent for some: some people are only
> starting to move to 2.5 right now - when we have stopped maintaining
> it already.

Obviously, there are some users who value stability over everything else. While
new language features are never critical and can easily be circumvented if you
want your code to run on old Python versions, stdlib improvements can be more
important for the average user. So perhaps the answer is the split that Brett
proposed between core language and stdlib.

> One question is what would happen to the old releases: would we still
> maintain them? If so, how many of them? For how long?

Yes, I realized that's one of the problems with this proposal. If we had to
maintain more than one stable branch, it would become a burden.

From eric at  Wed May 13 10:57:29 2009
From: eric at (Eric Smith)
Date: Wed, 13 May 2009 04:57:29 -0400
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Antoine Pitrou wrote:
> Yes, I realized that's one of the problems with this proposal. If we had to
> maintain more than one stable branch, it would become a burden.

Agreed. And since we have 2.x and 3.x now, we already have that problem. 
I'd like to see any acceleration of release schedules (if it even happens) 
come after 2.x is retired.

From ncoghlan at  Wed May 13 14:31:17 2009
From: ncoghlan at (Nick Coghlan)
Date: Wed, 13 May 2009 22:31:17 +1000
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Hello,
> Just food for thought here, but seeing how 3.1 is going to be a real featureful
> schedule despite being released shortly after 3.0, wouldn't it make sense to
> tighten future release planning a little? I was thinking something like doing a
> major release every 12 months (rather than 18 to 24 months as has been
> heuristically the case lately). This could also imply switching to some kind of
> loosely time-based release system.
> If I'm wildly off-base, you can either flame me, ignore me, or assign me
> annoying release blockers involving memoryviews and weird character encodings :-)

I don't think a shorter release cycle makes sense for a programming
language. It's already the case that even with 18+ month release cycles
some end users will leapfrog releases (e.g. 2.2 -> 2.4 -> 2.6) for their
environments (speaking from experience there, although the 2.6 part is
mere wishful thinking at this stage). It also seems to take 6-12 months
for the complaints about Windows binary compatibility to die down after
each release (although that appears to be less of an issue since MS
released Visual Studio Express).

That said, the 3.1 to 3.2 spacing will probably be shorter than normal
(i.e. around 12 months), simply because 3.1 is an "extra" release to
iron out some of the major issues with 3.0. This will give 'normal' 18
month spacing for the 2.6 -> 2.7 gap.

The other big factor to consider here is the duration of bug fix support
for releases. With our policy of "current release and previous release
are supported with bug fixes" and the 18-24 month release cycle, that
means each release typically receives bug fix updates for 3-4 years.
That's a reasonable period of time (and gives plenty of time to shake
out even fairly thorny issues).

If we were to switch to yearly releases, then either the support policy
would have to change to at least "current release and the previous two
releases" or we'd have to accept the fact that the support period for
each release would now be no more than 2 years. Since 2 years strikes me
as an unacceptably short period for maintenance, shorter release cycles
would then lead directly to having to maintain more parallel branches
(which doesn't strike me as a good use of developer effort).

Standardising a time frame for major releases is a fine idea, but I
don't think shortening that time frame to 12 months would be wise.
Settling on 18 months would probably work though - those that crave
stability can then use every alternate version and only upgrade every 3
years, as each branch would be maintained with general bug fixes for at
least 3 years and security fixes for a further 3 years after that. I
think 24 months would lead to too slow an overall development tempo
though - the year-and-a-half approach feels to me like it would strike a
better balance.
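
The arithmetic behind the support-period argument above can be sketched as follows, assuming a strictly regular release cadence (which real releases only approximate). A release stops receiving bug fixes once enough newer releases exist to push it out of the supported set, so the window is roughly the cycle length times the number of supported releases. The function name bugfix_window_months is hypothetical.

```python
def bugfix_window_months(cycle_months, supported_releases=2):
    """Months from a release until it drops out of bug-fix support.

    Assumes releases arrive exactly every cycle_months and that the
    newest supported_releases releases receive bug fixes.
    """
    return cycle_months * supported_releases


# (cycle length in months, number of releases kept under bug-fix support)
scenarios = [(18, 2), (24, 2), (12, 2), (12, 3)]
for cycle, supported in scenarios:
    months = bugfix_window_months(cycle, supported)
    print("%2d-month cycle, %d supported releases: %d months (%.1f years)"
          % (cycle, supported, months, months / 12.0))
```

Under these assumptions the current policy gives 36 to 48 months, a 12-month cycle with the same policy gives 24 months, and keeping a 36-month window on a 12-month cycle requires supporting three releases at once.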


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From janssen at  Wed May 13 19:08:41 2009
From: janssen at (Bill Janssen)
Date: Wed, 13 May 2009 10:08:41 PDT
Subject: [Python-Dev] Shorter release schedule?
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:

> Settling on 18 months would probably work though - those that crave
> stability can then use every alternate version and only upgrade every 3
> years

I wonder about that.  Lots of people are forced to upgrade by new
language features: decorators, list comprehensions, set literals, etc.,
that are required by external libraries that they use.  One of the huge
strengths of Python is the external library community.  Interesting
tension there...


From hagenf at CoLi.Uni-SB.DE  Wed May 13 18:07:50 2009
From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=)
Date: Wed, 13 May 2009 18:07:50 +0200
Subject: [Python-Dev] Should collections.Counter check for int?
Message-ID: <>

I just noticed that while the docs say that "Counts are allowed to be
any integer value including zero or negative counts",
collections.Counter doesn't perform any check on the types of count
values. Instead, non-numerical values will lead to strange behaviour or
exceptions later on:

>>> c = collections.Counter({'a':'3', 'b':'20', 'c':'100'})
>>> c.most_common(2)
[('a', '3'), ('b', '20')]
>>> c+c
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/hagenf/lib/python3.1/", line 467, in __add__
    if newcount > 0:
TypeError: unorderable types: str() > int()

I'd prefer Counter to refuse non-numerical values right away as the
present behaviour may hide bugs (e.g. a forgotten string->int
conversion). Any opinions? (And what about negative values or floats?)

- Hagen
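
For illustration, the kind of up-front check being asked for could look like the sketch below. It is not a proposed patch to collections.Counter; the helper name checked_counter is hypothetical, and it validates the mapping before construction because overriding __setitem__ in a subclass would not reliably intercept values passed to the constructor. Negative integers are accepted, in line with the documentation quoted above; floats are rejected here, which is an assumption about the answer to the open question.

```python
import collections
import numbers


def checked_counter(mapping):
    """Build a Counter from a mapping, rejecting non-integer counts.

    Hypothetical helper: raises TypeError at construction time instead
    of letting a bad value surface later in arithmetic or comparisons.
    """
    for key, count in mapping.items():
        if not isinstance(count, numbers.Integral):
            raise TypeError("count for %r must be an integer, got %s"
                            % (key, type(count).__name__))
    return collections.Counter(mapping)


c = checked_counter({'a': 3, 'b': 20, 'c': 100})
print(c.most_common(2))

try:
    checked_counter({'a': '3', 'b': '20', 'c': '100'})
except TypeError as exc:
    print("rejected:", exc)
```

With integer counts, most_common(2) orders by numeric value, whereas the string counts in the session above were returned in an order that hid the forgotten conversion.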

From martin at  Wed May 13 21:35:21 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 13 May 2009 21:35:21 +0200
Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with
 thread	support?
In-Reply-To: <>
References: <>
Message-ID: <>

> How to build Python 2.6.2 on HP-UX Itanium with thread support?
> Note: I know that the first address to post this question is comp.lang.python, but
> I posted this question a week ago on comp.lang.python
> and unfortunately, I didn't receive any answers.

That isn't sufficient reason to post to python-dev, though.

> Is HP-UX Itanium a supported platform at all?

Python does not have a single supported platform (*), so: no.

(*) in the sense that anybody is providing "support" for it, ie.
guarantees help in case somebody has problems. (**)

HP-UX is not a platform that any of the regular Python contributors
uses or tests on regularly. Python contributors mostly use
Linux, Windows, and OS X; some also use Solaris and *BSD.

> And if it does work, which steps need to be taken to build it,
> e.g. other libraries/packages, environment variables, 
> configure options, manual modifications?

This really is out of scope for python-dev.

In scope would be a proposal to apply a certain patch that you had
to write to make Python work on HP-UX, and discussion of whether this patch
is the appropriate solution.


(**) There is, of course, ActiveState, which does provide binaries,
including for HP-UX, so I suppose they support it - at least if you
buy commercial support.

From ajaksu at  Wed May 13 21:29:07 2009
From: ajaksu at (Daniel Diniz)
Date: Wed, 13 May 2009 16:29:07 -0300
Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with
	thread support?
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Henning,

henning.vonbargen wrote:
> How to build Python 2.6.2 on HP-UX Itanium with thread support?
[snip bit about python-list]

I can't give you directions, but if you can describe your issues I
might be able to help.

I'll respond in python-list, as I think this is OT for python-dev.

> Is HP-UX Itanium a supported platform at all?
> BTW: A search for "supported platforms" at does not help!

Now, this looks like python-dev material. PEP 11[0], the information
in README[1]  and the notes in the downloads pages[2] could be
improved and updated. If someone has time to invest in this, a
compatibility matrix would be very nice to have.


[2] and

From martin at  Wed May 13 21:52:30 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 13 May 2009 21:52:30 +0200
Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with
 thread support?
In-Reply-To: <>
References: <>
Message-ID: <>

> Now, this looks like python-dev material. PEP 11[0], the information
> in README[1]  and the notes in the downloads pages[2] could be
> improved and updated. If someone has time to invest in this, a
> compatibility matrix would be very nice to have.

I don't think HP-UX is ready for PEP 11 yet. It may not work, but that's
a bug that could be fixed if users would actually contribute fixes.
Likewise, changes to README could be accepted if users contribute them.

I'm not sure /download/source is really that useful - perhaps it would
be best to remove it. As for /download/other - contributions are welcome.


From dripton at  Thu May 14 00:22:39 2009
From: dripton at (David Ripton)
Date: Wed, 13 May 2009 15:22:39 -0700
Subject: [Python-Dev] How to build Python 2.6.2 on HP-UX Itanium with
	thread support?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2009.05.13 10:34:55 +0200, henning.vonbargen at wrote:
> How to build Python 2.6.2 on HP-UX Itanium with thread support?
> Note: I know that the first address to post this question is comp.lang.python, but
> I posted this question a week ago on comp.lang.python
> and unfortunately, I didn't receive any answers.
> According to Patch 1225212, 
> at least Peter Kropf was able to get Python running with threading support 
> on this platform, though AFAIK he was not using GCC.
> But I guess it should be possible with GCC as well.
> Is anyone able to confirm that Python (built with GCC)
> does or does not work with multi-threading on HP-UX Itanium?

The good news:

I did get Python 2.4.x working on HP-UX Itanium, with threading.  The
compiler was gcc 4.0.x.  (I also tried building Python with aCC, but
failed.)  I remember building both 32-bit and 64-bit versions.

I don't remember it being that hard.  Used the source for the package at as a starting point, since it had a lot of good
porting tweaks, but it needed some further tweaking.  (The main one I
remember is that the shared library extension for Itanium should be
.so, not .sl.  There were also a bunch of paths that required appending 32
or 64.)

We used that build of Python in production, for very heavily
multithreaded code, on multi-CPU boxes.  Worked fine.  AFAIK they're
still using it.

I'm not sure why the binary available at has
threading disabled.  I suspect that some older version of HP/UX had
pthread bugs that got fixed somewhere along the line.

The bad news:

I did this about 3.5 years ago, and I don't work there anymore, so I
don't have access to that HP-UX hardware anymore, or to the notes I made
when I was doing the port.  So I can give you encouragement but not
step-by-step instructions.  Sorry.

David Ripton    dripton at

From aahz at  Thu May 14 04:22:35 2009
From: aahz at (Aahz)
Date: Wed, 13 May 2009 19:22:35 -0700
Subject: [Python-Dev] Should collections.Counter check for int?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 13, 2009, Hagen Fürstenau wrote:
> I'd prefer Counter to refuse non-numerical values right away as the
> present behaviour may hide bugs (e.g. a forgotten string->int
> conversion). Any opinions? (And what about negative values or floats?)
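
To illustrate the concern, here is a minimal sketch (not part of the
original report).  Counter stores whatever value it is handed, so a
forgotten conversion only fails later, on the first arithmetic.  The
strict_counter helper is hypothetical, just one way an up-front check
could look.

```python
from collections import Counter


def strict_counter(mapping):
    """Hypothetical helper: build a Counter but reject non-int counts up front."""
    for key, value in mapping.items():
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("count for %r must be an int, got %r" % (key, value))
    return Counter(mapping)


# A forgotten str->int conversion is accepted silently...
lenient = Counter({"spam": "3"})

# ...and only fails later, when the count takes part in arithmetic.
late_error = None
try:
    lenient.update({"spam": 1})
except TypeError as exc:
    late_error = exc
```

Whether negative values or floats should also be refused is a separate
policy choice; the sketch above only accepts plain ints.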

Please file a report on so that there's a record of this
issue.
Aahz (aahz at           <*>

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From cesare.dimauro at  Thu May 14 08:27:10 2009
From: cesare.dimauro at (Cesare Di Mauro)
Date: Thu, 14 May 2009 08:27:10 +0200 (CEST)
Subject: [Python-Dev] special method lookup: how much do we care?
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 10, 2009 11:51PM, Nick Coghlan wrote:
> However lots of developers rely on CPython ref counting as well, no
> matter how many times they're told not to do that if they want to
> support alternative interpreters.
> Cheers,
> Nick.


# Wrapper around platform socket objects. This implements
# a platform-independent dup() functionality. The
# implementation currently relies on reference counting
# to close the underlying socket object.
class _socketobject(object):

You don't know how much time I've spent trying to understand why hanged indefinitely when I was experimenting with new
opcodes in my VM.


From rdmurray at  Thu May 14 19:30:13 2009
From: rdmurray at (R. David Murray)
Date: Thu, 14 May 2009 13:30:13 -0400 (EDT)
Subject: [Python-Dev] python -m test.regrtest should pass on an installed
 python
Message-ID: <>

For various reasons I happened to run 'python -m test.regrtest' on my
Gentoo installed Python.  For 2.5.4 only test_tarfile failed (it tries
to write into the read-only installed test directory).  On 2.6.2
test_tarfile passes, but other test suites, including test_distutils,
do not.

So this posting is a general reminder that the tests should not make
assumptions about the writability of the test directory (or, for that
matter, of the CWD).

When I get time I'll file bugs on the particular failures I'm seeing,
after I do an install from checkout.


From ziade.tarek at  Fri May 15 00:21:43 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 15 May 2009 00:21:43 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
Message-ID: <>


I'm proposing this PEP, which has been discussed in Distutils-SIG, for
inclusion in Python 2.7 and 3.2

Please comment !

Tarek Ziadé |

From pje at  Fri May 15 07:00:55 2009
From: pje at (P.J. Eby)
Date: Fri, 15 May 2009 01:00:55 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
Message-ID: <>

At 12:21 AM 5/15/2009 +0200, Tarek Ziadé wrote:
>I'm proposing this PEP, which has been discussed in Distutils-SIG, for
>inclusion in Python 2.7 and 3.2
>Please comment !

I'd like to reiterate my suggestion that the uninstall record include 
size and checksum information, ala PEP 262's "FILES" section.  This 
would allow the uninstall function to validate whether a file has 
been modified, and thus prevent uninstalling a locally-modified file, 
or a file installed in some other way.
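
A minimal sketch of such a check, assuming the record stores a size and
an MD5 hex digest per file.  The function names are hypothetical and not
taken from the PEP.

```python
import hashlib
import os


def file_record(path):
    """Return (size, md5 hex digest) for path, as an installer might record it."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def safe_to_remove(path, recorded_size, recorded_md5):
    """True only if the file still matches what was recorded at install time."""
    return os.path.isfile(path) and file_record(path) == (recorded_size, recorded_md5)
```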

It may also be that providing an uninstall API that simply yields 
files to be uninstalled, with data about their existence/modification 
status, would be more useful than a blind uninstall operation with a 
filter function.
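
Roughly, such an API could be a generator along these lines.  This is
only a sketch: Entry and the status strings are made up for
illustration, and only the size is compared to keep it short.

```python
import os
from collections import namedtuple

# Hypothetical record entry: a path plus the size noted at install time.
Entry = namedtuple("Entry", "path size")


def iter_uninstall_candidates(entries):
    """Yield (path, status) pairs; the caller decides what to delete.

    status is one of "missing", "modified" or "unchanged".
    """
    for entry in entries:
        if not os.path.exists(entry.path):
            yield entry.path, "missing"
        elif os.path.getsize(entry.path) != entry.size:
            yield entry.path, "modified"
        else:
            yield entry.path, "unchanged"
```

A caller would then remove only the paths reported as "unchanged" and
report the rest, instead of passing a filter into a blind uninstall.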

Also, the PEP doesn't document what happens if a single file was 
installed by more than one package.  Ideally, a file with identical 
size/checksum that belongs to more than one project should be 
silently left alone, and a file installed by more than one project 
with *different* size/checksum should be warned about and left alone.
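
The policy suggested above can be sketched as a small decision
function.  The name, the shape of the records argument and the return
values are assumptions for illustration only.

```python
def shared_file_action(records):
    """Decide what to do with one absolute path on disk.

    records maps project name -> (size, checksum) as recorded by every
    project that claims this path.  Returns "remove", "keep" or
    "warn-and-keep".
    """
    if len(records) <= 1:
        return "remove"             # only the project being uninstalled owns it
    if len(set(records.values())) == 1:
        return "keep"               # identical file shared by several projects
    return "warn-and-keep"          # the owners disagree about the contents
```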

Next, the doc for the metadata API functions seems quite 
sparse.  ISTR that I've previously commented on such issues as case- 
and punctuation-insensitivity of project names, and '/' separation in 
egg_info subpaths, but these don't seem to have been incorporated 
into the current version of the PEP.

These are important considerations in general, btw, because project 
name and version canonicalization and escaping are an important part 
of both generating and parsing .egg-info filenames.  At minimum, the
relevant setuptools docs that define these standards should be cited.
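
As a rough sketch of the kind of canonicalization meant here, modelled
on how setuptools is understood to escape names.  The exact rules
(including the regular expression) should be treated as an assumption
and checked against the setuptools docs; comparison_key and
egg_info_basename are hypothetical helpers.

```python
import re


def safe_name(name):
    """Collapse each run of characters other than letters, digits and '.' to '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def comparison_key(name):
    """Case- and punctuation-insensitive key for comparing project names."""
    return safe_name(name).lower()


def egg_info_basename(name, version):
    """Hypothetical helper: build 'name-version.egg-info', escaping '-' as '_'."""
    escaped_name = safe_name(name).replace("-", "_")
    escaped_version = safe_name(version.replace(" ", ".")).replace("-", "_")
    return "%s-%s.egg-info" % (escaped_name, escaped_version)
```

Escaping '-' inside the name and version matters because '-' is also
the separator between them, so a parser has to be able to split the
filename unambiguously.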

Finally, the "Definitions" section also claims that a project 
installs one or more packages, but a project may not contain *any* 
packages; it may have a standalone module, or just a script, data, or metadata.

From asmodai at  Fri May 15 08:32:20 2009
From: asmodai at (Jeroen Ruigrok van der Werven)
Date: Fri, 15 May 2009 08:32:20 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

-On [20090515 06:59], P.J. Eby (pje at wrote:
>I'd like to reiterate my suggestion that the uninstall record include 
>size and checksum information, ala PEP 262's "FILES" section.  This 
>would allow the uninstall function to validate whether a file has 
>been modified, and thus prevent uninstalling a locally-modified file, 
>or a file installed in some other way.

Agreed. Within FreeBSD's ports the installed package registration gets an MD5
hash per file recorded. Size is less interesting though, since essentially
this information is encapsulated within the hash. Remove one byte from the
file and your hash is already different. And the chance of a collision for
this kind of registration is small enough not to need the size
information.
And if you're worried about the MD5 collision space, which for this use case
ought to be large enough, you could always settle for SHA1.

Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
GPG: 2EAC625B
What's one man's poison, is another's meat or drink...

From ziade.tarek at  Fri May 15 08:32:29 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 15 May 2009 08:32:29 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/15 P.J. Eby <pje at>:
> At 12:21 AM 5/15/2009 +0200, Tarek Ziadé wrote:
>> Hello
>> I'm proposing this PEP, which has been discussed in Distutils-SIG, for
>> inclusion in Python 2.7 and 3.2
>> Please comment !
> I'd like to reiterate my suggestion that the uninstall record include size
> and checksum information, ala PEP 262's "FILES" section.  This would allow
> the uninstall function to validate whether a file has been modified, and
> thus prevent uninstalling a locally-modified file, or a file installed in
> some other way.

good point, I'll re-work that part

> It may also be that providing an uninstall API that simply yields files to
> be uninstalled, with data about their existence/modification status, would
> be more useful than a blind uninstall operation with a filter function.

Sure we could have it in that shape, I'll work on this as well.

> Also, the PEP doesn't document what happens if a single file was installed
> by more than one package.

It does:

"...as long as they are not mentioned in another RECORD file..."

>  Ideally, a file with identical size/checksum that
> belongs to more than one project should be silently left alone, and a file
> installed by more than one project with *different* size/checksum should be
> warned about and left alone.

I think the path is the info that should be looked at. And a warning
could be raised, like you said, if a file was manually modified.  But I
don't think you want to leave alone a file with identical size/checksum
that belongs to more than one project when it's not the same absolute
path.

Here's an example why: if two different packages include the
"" module
(from the FeedParser project) for convenience, and if you remove one package,
you *do* want to remove its "" module even if it exists in the other

So it's rather changing the PEP text like this:

"...as long as they are not mentioned in another RECORD file, with the
same size/checksum..."

> Next, the doc for the metadata API functions seems quite sparse.  ISTR that
> I've previously commented on such issues as case- and
> punctuation-insensitivity of project names, and '/' separation in egg_info
> subpaths, but these don't seem to have been incorporated into the current
> version of the PEP.
> These are important considerations in general, btw, because project name and
> version canonicalization and escaping are an important part of both
> generating and parsing .egg-info filenames.  At minimum, the relevant
> setuptools docs that define these standards should be cited.

I'll add more info on that part accordingly then,

> Finally, the "Definitions" section also claims that a project installs one
> or more packages, but a project may not contain *any* packages; it may have
> a standalone module, or just a script, data, or metadata.


Thanks for the feedback

Tarek Ziadé |

From dirkjan at  Fri May 15 09:50:13 2009
From: dirkjan at (Dirkjan Ochtman)
Date: Fri, 15 May 2009 09:50:13 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven
<asmodai at> wrote:
> Agreed. Within FreeBSD's ports the installed package registration gets a MD5
> hash per file recorded. Size is less interesting though, since essentially
> this information is encapsulated within the hash. Remove one byte from the
> file and your hash is already different. And the case of a collision for
> this kind of registration is sufficiently small to need the size
> information.

Size is nice because it's much cheaper to check. I don't know if mass
uninstalls will be so common that this is actually something we have
to worry about, though.



From ncoghlan at  Fri May 15 12:34:35 2009
From: ncoghlan at (Nick Coghlan)
Date: Fri, 15 May 2009 20:34:35 +1000
Subject: [Python-Dev] python -m test.regrtest should pass on an
 installed python
In-Reply-To: <>
References: <>
Message-ID: <>

R. David Murray wrote:
> So this posting is a general reminder that the tests should not make
> assumptions about the writability of the test directory (or, for that
> matter, of the CWD).

Indeed - the tempfile module is very helpful in that regard.
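
For example, a test can keep all of its scratch files in a private
temporary directory.  This is a minimal sketch: the class, the file
name and run_demo are made up for illustration.

```python
import os
import shutil
import tempfile
import unittest


class ScratchDirTest(unittest.TestCase):
    """Write test artifacts to a private temp dir, never the test dir or CWD."""

    def setUp(self):
        self.scratch = tempfile.mkdtemp()
        # Remove the directory again even if the test fails.
        self.addCleanup(shutil.rmtree, self.scratch)

    def test_round_trip(self):
        path = os.path.join(self.scratch, "data.txt")
        with open(path, "w") as handle:
            handle.write("payload")
        with open(path) as handle:
            self.assertEqual(handle.read(), "payload")


def run_demo():
    """Run the test case in-process and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchDirTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```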


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From status at  Fri May 15 18:07:15 2009
From: status at (Python tracker)
Date: Fri, 15 May 2009 18:07:15 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (05/08/09 - 05/15/09)
Python tracker at

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2194 open (+34) / 15658 closed (+26) / 17852 total (+60)

Open issues with patches:   855

Average duration of open issues: 647 days.
Median duration of open issues: 398 days.

Open Issues Breakdown
   open  2165 (+34)
pending    28 ( +0)

Issues Created Or Reopened (61)

Generator expression bug?                                        05/08/09
CLOSED    reopened svenrahmann                   

sys.exc_info leaks into a generator                              05/08/09    created  jyasskin                      

logging.Handler.handlerError() may raise IOError in 05/08/09
CLOSED    created  ryles                         

Failing on Redhat 4.1.2-44                        05/08/09    created  dmauldin                      

re-usable generators / generator expressions should return itera 05/08/09
CLOSED    created  svenrahmann                   

unicode decode error due to improperly entered text "Martin v. L 05/08/09
CLOSED    created  srid                          

csv unix file format ('\n' line terminator)                      05/08/09    created  jtalbot                       

test_os fails if run after test_distutils                        05/09/09
CLOSED    created  tarek                         

distutils build_ext.get_outputs returns wrong result (patch)     05/12/09    reopened ajaksu2                       

cProfile and profile don't work with pygtk/pyqt and sys.exit(0)  05/09/09    created  akkana                        

strptime() gives inconsistent exceptions                         05/09/09    created  ryles                         

Add bug tracker tasks to PEP 101                                 05/09/09    created  ajaksu2                       

float.fromhex bugs                                               05/09/09
CLOSED    created  marketdickinson               

classmethod, staticmethod: expose wrapped function               05/09/09    created  gsakkis                       

no more in _xmlplus/utils                                        05/09/09
CLOSED    created  schmirrwurst                  

distutils.command.build_ext.check_extensions_list broken checkin 05/10/09
CLOSED    created  tarek                         

Implement os.path.samefile and os.path.sameopenfile on Windows   05/10/09    created  sandberg                      

Avoid reversed() in Random.shuffle()                             05/10/09
CLOSED    created  haypo                         

Broken link to "Curses Programming with Python"                  05/10/09
CLOSED    created  ralph.corderoy                

Delete PyOS_ascii_formatd, PyOS_ascii_strtod, and PyOS_ascii_ato 05/10/09    created  eric.smith                    

unittest.TestLoader.loadTestsFromNames should accept module / cl 05/10/09
CLOSED    created  michael.foord                 

Memory leak in os.rename() and other functions                   05/10/09
CLOSED    created  pitrou                        

Add non-command help topics to help completion of cmd.Cmd        05/10/09    created  flub                          

spurious space after opening parenthesis when auto-completing    05/10/09    created  pitrou                        

python produces zombie in                        05/11/09    created  dontbugme                     

help(marshal) just gives an outline; no help text provided.      05/11/09
CLOSED    created  orsenthil                     

unittest command line behaviour                                  05/11/09
CLOSED    created  michael.foord                 
       patch, patch, easy                                                      

abstract class instantiable when subclassing dict                05/11/09    created  thet                          

strftime is broken                                               05/11/09
CLOSED    created  jonathan.cervidae             

Add __bool__ to threading.Event and multiprocessing.Event        05/11/09    created  flub                          

compile error on HP-UX 11.22 ia64 - 'mbstate_t' is used as a typ 05/11/09    created  srid                          

compile error - PyNumber_InPlaceOr(newfree, allfree) < 0         05/11/09
CLOSED    created  srid                          

Test discovery for unittest                                      05/11/09    created  michael.foord                 
       patch, patch, needs review                                              

test_urllib2_localnet DigestAuthHandler leaks nonces             05/11/09    created  r.david.murray                

ZipFile.writestr "compression_type" argument                     05/12/09    created  ronaldoussoren                

ZipFile.writestr "compression_type" argument                     05/12/09
CLOSED    created  ronaldoussoren                

Bug in socket example                                            05/12/09    created  kiilerix                      

ffi.c compile failures on AIX 5.3 with xlc                       05/12/09    created  elyeshel                      

distutils tricks you into thinking you can build extensions with 05/12/09    created  exarkun

Idle should be installed as `idle3.1` and not `idle3`            05/13/09    created  srid                          

optparse docs say 'default' keyword is deprecated but uses it in 05/13/09    created  mallyvai                      

unable to retrieve latin-1 encoded data from sqlite3             05/13/09
CLOSED    created  izarf                         

python doesn't build if prefix contains non-ascii characters     05/13/09    created  zegreek                       

enhance getargs O& to accept cleanup function                    05/13/09    created  ocean-city                    

json slower than simplejson                                      05/13/09
CLOSED    created  theller                       

No shell prompt when a graphics window that was started from IDL 05/13/09    created  chessweb                      

Scrollbar in Idle os x 10.5                                      05/13/09    created  an is                         

Use shipped zlib if the system version is bad                    05/14/09
CLOSED    created  ajaksu2                       
       patch, patch                                                            

Dict fails to notice addition and deletion of keys during iterat 05/14/09
CLOSED    created  stevenjd                      

Fix the output word from "ok" to "OK"  when a testcase passes    05/14/09    created  Retro                         

Minor typos in ctypes docs                                       05/14/09
CLOSED    created  lehmannro                     

Create a datetime.timedelta.totalseconds property                05/14/09
CLOSED    created  mw44118                       

itertools.grouper                                                05/14/09
CLOSED    created  lieryan                       

test_distutils leaves a 'foo' file behind in the cwd             05/14/09
CLOSED    created  r.david.murray                

Search does not intelligently handle module.function queries on  05/14/09    created           

regrtest says refleaks are "ok"                                  05/14/09
CLOSED    created  collinwinter                  

documentation of xml.dom.minidom.parse signature is wrong        05/14/09    created  phihag                        

test_(zipfile|zipimport|gzip|distutils) fail if zlib is not avai 05/15/09    created  ezio.melotti                  

test_xmlrpc_net fails when the ISP returns "302 Found"           05/15/09    created  ezio.melotti                  

Interpreter crashes when chaining an infinite number of exceptio 05/15/09
CLOSED    created  yury                          

FAIL: test_longdouble (ctypes.test.test_callbacks.Callbacks) [SP 05/15/09    created  illumino                      

Issues Now Closed (58)

csv input converts \r\n to \n but csv output does not when a fie  531 days    ajaksu2                       

imaplib is not IPv6-capable                                       514 days    pitrou                        

nntplib is not IPv6-capable                                       512 days    dmorr                         

Cosmetic patch to supress compiler warning                        475 days    ocean-city                    

isinstance(anything, MetaclassThatDefinesInstancecheck) raises i  318 days    ajaksu2                       

Check implementation of new buffer interface for PyString in 2.6  413 days    pitrou                        

64 bit python  memory leak usage                                  388 days    pitrou                        

Py3k fails to parse a file with an iso-8859-1 string              384 days    benjamin.peterson             

update Lib/test/README                                            353 days    pitrou                        

arguments and default path not set in and sitecustomize.  351 days    haridsv                       

Python 2.6rc2: Tix ComboBox error                                 239 days    loewis                        

inspect.findsource() returns binary data for shared library modu  220 days    r.david.murray                

help("modules ftp") fails due to test modules                     208 days    ajaksu2                       

Duplicate UTF-16 BOM if a file is open in append mode             115 days    pitrou                        

backport distutils 3.x changes into 2.7 when appliabl              95 days    tarek                         

StringIO can duplicate newlines in universal newlines mode         87 days    alexandre.vassalotti          

test_importlib fails on Mac OSX 10.5.6 w/ case-sensitive file sy   37 days    brett.cannon                  

TextIOWrapper fails with SystemError when reading HTTPResponse     44 days    benjamin.peterson             

internal error on write while reading                              19 days    benjamin.peterson             

Ensure RUNPATH is added to extension modules with RPATH if GNU l    8 days    tarek                         

I need to import the module in the same thread                      7 days    amaury.forgeotdarc            

import deadlocks when using fork                                   11 days    benjamin.peterson             

test_parser crashes when run after some other tests                11 days    pitrou                        

fix gcc -Wextra warnings (compare signed/unsigned)                  4 days    marketdickinson               

Add to "whats new": range(n) != range(n)                            8 days    MLModel                       

Possible mistake regarding writeback in documentation of shelve     4 days    r.david.murray

unnecessary hardlink                                                1 days    orsenthil                     

Generator expression bug?                                           0 days    tjreedy                       

logging.Handler.handlerError() may raise IOError in    1 days    vsajip                        

re-usable generators / generator expressions should return itera    1 days    r.david.murray                

unicode decode error due to improperly entered text "Martin v. L    0 days    loewis                        

test_os fails if run after test_distutils                           0 days    tarek                         

float.fromhex bugs                                                  2 days    marketdickinson               
       patch

no more in _xmlplus/utils                                           0 days    loewis

distutils.command.build_ext.check_extensions_list broken checkin    0 days    tarek                         

Avoid reversed() in Random.shuffle()                                1 days    rhettinger                    

Broken link to "Curses Programming with Python"                     1 days    ralph.corderoy                

unittest.TestLoader.loadTestsFromNames should accept module / cl    0 days    michael.foord                 

Memory leak in os.rename() and other functions                      3 days    pitrou                        

help(marshal) just gives an outline; no help text provided.         2 days    r.david.murray                

unittest command line behaviour                                     1 days    michael.foord                 
       patch, patch, easy                                                      

strftime is broken                                                  0 days    jonathan.cervidae             

compile error - PyNumber_InPlaceOr(newfree, allfree) < 0            1 days    benjamin.peterson             

ZipFile.writestr "compression_type" argument                        0 days    ronaldoussoren                

unable to retrieve latin-1 encoded data from sqlite3                0 days    loewis                        

json slower than simplejson                                         0 days    pitrou                        

Use shipped zlib if the system version is bad                       0 days    ajaksu2                       
       patch, patch                                                            

Dict fails to notice addition and deletion of keys during iterat    0 days    benjamin.peterson             

Minor typos in ctypes docs                                          1 days    georg.brandl                  

Create a datetime.timedelta.totalseconds property                   0 days    pitrou                        

itertools.grouper                                                   0 days    cvrebert                      

test_distutils leaves a 'foo' file behind in the cwd                0 days    tarek                         

regrtest says refleaks are "ok"                                     0 days    collinwinter                  

Interpreter crashes when chaining an infinite number of exceptio    0 days    pitrou                        

Solaris term.h needs curses.h                                    2023 days  ajaksu2                       

test_subprocess fails on cygwin                                   996 days ajaksu2                       

os.popen with os.close gives error message                        945 days ajaksu2                       

Problem linking to readline lib on x86(64) Solaris                797 days ajaksu2                       

Top Issues Most Discussed (10)

 14 WeakSet cmp methods                                                8 days

 12 distutils build_ext.get_outputs returns wrong result (patch)       3 days

 11 xml.dom.minidom does not escape CR, LF and TAB characters withi   31 days

 10 Add to "whats new": range(n) != range(n)                           8 days

 10 Do not assume signed integer overflow behavior                   519 days

 10 Enhance file.readlines by making line separator selectable       443 days

  8 test_asynchat fails on Mac OSX                                    25 days

  7 Add and os.symlink() and os.path.islink() support for  942 days

  6 test_importlib fails on Mac OSX 10.5.6 w/ case-sensitive file s   37 days

  5 ssl makefile never closes socket                                  92 days

From python.leojay at  Fri May 15 18:18:42 2009
From: python.leojay at (Leo Jay)
Date: Sat, 16 May 2009 00:18:42 +0800
Subject: [Python-Dev] doc error in 2.6.2
Message-ID: <>

There is a syntax error in the client side code of
"SocketServer.UDPServer Example" in
import socket
import sys

HOST, PORT = "localhost"
data = " ".join(sys.argv[1:])

Obviously, it should be:
HOST, PORT = "localhost", 9999
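
For completeness, a self-contained sketch of the corrected client
talking to a throwaway server.  Assumptions: it is written for
Python 3, where the module is named socketserver rather than
SocketServer; the handler and udp_round_trip are made up for
illustration; and it binds to port 0 instead of 9999 so that it cannot
clash with a port already in use.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.BaseRequestHandler):
    """For UDP servers, self.request is a (data, socket) pair."""

    def handle(self):
        data, sock = self.request
        sock.sendto(data.upper(), self.client_address)


def udp_round_trip(message, host="127.0.0.1"):
    """Start a one-shot server, send one datagram, return the reply."""
    # Port 0 lets the OS pick a free port; the docs example hard-codes 9999.
    with socketserver.UDPServer((host, 0), UpperCaseHandler) as server:
        HOST, PORT = server.server_address[:2]   # two names need two values
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(5)
            sock.sendto(message, (HOST, PORT))
            reply = sock.recv(1024)
        worker.join()
    return reply
```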

Leo Jay

From aahz at  Fri May 15 19:16:27 2009
From: aahz at (Aahz)
Date: Fri, 15 May 2009 10:16:27 -0700
Subject: [Python-Dev] doc error in 2.6.2
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 16, 2009, Leo Jay wrote:
> There is a syntax error in the client side code of
> "SocketServer.UDPServer Example" in

Please follow the directions in
to report this on -- that ensures that it won't get
Aahz (aahz at           <*>

"In 1968 it took the computing power of 2 C-64's to fly a rocket to the moon.
Now, in 1998 it takes the Power of a Pentium 200 to run Microsoft Windows 98.
Something must have gone wrong."  --/bin/fortune

From pje at  Fri May 15 19:52:34 2009
From: pje at (P.J. Eby)
Date: Fri, 15 May 2009 13:52:34 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote:
>Agreed. Within FreeBSD's ports the installed package registration 
>gets a MD5 hash per file recorded. Size is less interesting though, 
>since essentially this information is encapsulated within the hash. 
>Remove one byte from the file and your hash is already different.

Which also means that in that case you can skip computing the 
MD5.  The size allows you to easily notice an overwrite/corruption 
without further processing. 
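
In code, the short-circuit looks roughly like this.  It is a sketch
only; matches_record is a hypothetical name and MD5 is assumed as the
recorded checksum.

```python
import hashlib
import os


def matches_record(path, recorded_size, recorded_md5):
    """Compare size first; read and hash the file only when the sizes agree."""
    if os.path.getsize(path) != recorded_size:
        return False                      # cheap stat() call, no file read
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest() == recorded_md5
```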

From pje at  Fri May 15 19:56:36 2009
From: pje at (P.J. Eby)
Date: Fri, 15 May 2009 13:56:36 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

At 08:32 AM 5/15/2009 +0200, Tarek Ziadé wrote:
>2009/5/15 P.J. Eby <pje at>:
> >  Ideally, a file with identical size/checksum that
> > belongs to more than one project should be silently left alone, and a file
> > installed by more than one project with *different* size/checksum should be
> > warned about and left alone.
>I think the path is the info that should be looked at.

By "a file that belongs to more than one project" I meant a single 
file on *disk* (i.e., one absolute path).

>But I don't think you want to leave alone a file with identical 
>size/checksum that belongs to more than one project when it's not 
>the same absolute path.

That wouldn't be "a file" then, would it?  ;-)

>Here's an example why : if two different packages includes the
>"" module
>(from the FeedParser project) for conveniency, and if you remove one package,
>you *do* want to remove its "" module even if it exists 
>in the other

Right, that would be *two files*, though, not one file.

From tonynelson at  Sat May 16 00:03:14 2009
From: tonynelson at (Tony Nelson)
Date: Fri, 15 May 2009 18:03:14 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <p04330100c6339585929a@[]>

At 13:52 -0400 05/15/2009, P.J. Eby wrote:
>At 08:32 AM 5/15/2009 +0200, Jeroen Ruigrok van der Werven wrote:
>>Agreed. Within FreeBSD's ports the installed package registration
>>gets a MD5 hash per file recorded. Size is less interesting though,
>>since essentially this information is encapsulated within the hash.
>>Remove one byte from the file and your hash is already different.
>Which also means that in that case you can skip computing the
>MD5.  The size allows you to easily notice an overwrite/corruption
>without further processing.

In most cases the files will actually match, so the sizes and dates will be
the same and the checksum must be computed to verify the match.

RPM does this when asked to Verify a package.  It is faster than Removing a
package, and Verifying all installed packages takes a reasonable amount of
time.  I don't think Python would be any worse at verifying its own
packages, and it would normally have less data to verify, so it should be
fast enough.
TonyN.:'                       <mailto:tonynelson at>
      '                              <>

From hfuerstenau at  Sat May 16 11:58:16 2009
From: hfuerstenau at (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=)
Date: Sat, 16 May 2009 11:58:16 +0200
Subject: [Python-Dev] Should collections.Counter check for int?
In-Reply-To: <>
References: <>
Message-ID: <>

>> I'd prefer Counter to refuse non-numerical values right away as the
>> present behaviour may hide bugs (e.g. a forgotten string->int
>> conversion). Any opinions? (And what about negative values or floats?)
> Please file a report on so that there's a record of this
> issue.


- Hagen

From chris at  Sat May 16 17:00:39 2009
From: chris at (Chris Withers)
Date: Sat, 16 May 2009 16:00:39 +0100
Subject: [Python-Dev] .pth files should never contain python
In-Reply-To: <>
References: <>	
	<> <>	
	<> <>	
	<> <>	
	<> <>
Message-ID: <>

Paul Moore wrote:
> 2009/5/9 Chris Withers <chris at>:
>> Martin v. Löwis wrote:
>>>> I thought .pth files just had python in them?
>>> Not at all - they never did. They have paths in them.
>> I've certainly seen them with python in, and that's what I hate about
>> them...
> AIUI, there was a small special case that lines starting with "import"
> are executed (see the source of for details). This exception
> has been exploited (some would say "abused", but I'm trying to be
> unbiased here) by setuptools, at least, to do path manipulations and
> such.

Abused is definitely the right word. I suppose it's too late to correct
this bug?

How about for Python 3?
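
As a rough illustration of the special case Paul describes, the sketch below sorts the lines of a .pth file into plain path entries and lines that would be executed. The function `classify_pth_lines` is hypothetical and deliberately executes nothing; the real handling also resolves paths relative to the site directory and checks that they exist, which is omitted here.

```python
def classify_pth_lines(text):
    """Split .pth content into (paths, import_lines) without executing anything.

    Hypothetical helper for illustration; not part of the standard library.
    """
    paths, import_lines = [], []
    for line in text.splitlines():
        line = line.rstrip()
        # Blank lines and comments are ignored.
        if not line or line.startswith("#"):
            continue
        if line.startswith(("import ", "import\t")):
            # This is the special case: such a line would be executed as code.
            import_lines.append(line)
        else:
            # Everything else is treated as a path entry.
            paths.append(line)
    return paths, import_lines
```

Because an "import" line can be followed by arbitrary statements after a semicolon, the special case is enough to run any code at interpreter startup.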



From ziade.tarek at  Sat May 16 18:06:25 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sat, 16 May 2009 18:06:25 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

Ok I've changed the PEP with all the points you mentioned, if you want
to take a look.

2009/5/15 P.J. Eby <pje at>:
> Next, the doc for the metadata API functions seems quite sparse.  ISTR that
> I've previously commented on such issues as case- and
> punctuation-insensitivity of project names, and '/' separation in egg_info
> subpaths, but these don't seem to have been incorporated into the current
> version of the PEP.
> These are important considerations in general, btw, because project name and
> version canonicalization and escaping are an important part of both
> generating and parsing .egg-info filenames.  At minimum, the relevant
> setuptools docs that define these standards should be cited.

I need to track down your comments on this part; I must have missed
them. That's the last part I haven't worked out yet in the current PEP
revision.


Tarek Ziadé |

From ziade.tarek at  Sat May 16 18:39:41 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sat, 16 May 2009 18:39:41 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

Yes, I don't think it's relevant to optimize install/uninstall code in Python.

In the whole PEP 376 proposal, the only part that will need care will
be the code that browses sys.path.

On Fri, May 15, 2009 at 9:50 AM, Dirkjan Ochtman <dirkjan at> wrote:
> On Fri, May 15, 2009 at 8:32 AM, Jeroen Ruigrok van der Werven
> <asmodai at> wrote:
>> Agreed. Within FreeBSD's ports the installed package registration gets a MD5
>> hash per file recorded. Size is less interesting though, since essentially
>> this information is encapsulated within the hash. Remove one byte from the
>> file and your hash is already different. And the chance of a collision for
>> this kind of registration is too small to need the size
>> information.
> Size is nice because it's much cheaper to check. I don't know if mass
> uninstalls will be so common that this is actually something we have
> to worry about, though.
> Cheers,
> Dirkjan

Tarek Ziadé |

From pje at  Sat May 16 18:55:44 2009
From: pje at (P.J. Eby)
Date: Sat, 16 May 2009 12:55:44 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

At 06:06 PM 5/16/2009 +0200, Tarek Ziadé wrote:
>Ok I've changed the PEP with all the points you mentioned, if you want
>to take a look.

Some notes:

1. Why ';' separation, instead of tabs as in PEP 262?  Aren't 
semicolons a valid character in filenames?

2. "if the installed file is located in a directory in site-packages" 
should refer not to site-packages but to the directory containing the 
.egg-info directory.

3. get_egg_info_file needs to be specified as using '/'-separated 
paths and converting to OS paths if appropriate.  There's also the 
problem that the mode it opens the file in (binary or text) is unspecified.

4. There should probably be a way to iterate over the projects in a 
directory, since it's otherwise impossible for an installation tool 
to find out what project(s) "own" a file that conflicts with 
something being installed.  Alternatively, reshaping the file API to 
allow querying by path as well as by project might work.

5. If any cache mechanisms are to be used by the API, the API *must* 
make it possible to bypass or explicitly manage that cache, as 
otherwise installation tools and tools that manipulate sys.path at 
runtime may end up using incorrect data.

6. get_files() doesn't document whether the yielded paths are 
absolute or relative, local or cross-platform, etc.
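
Note 3 above asks for '/'-separated paths that are converted to OS paths where appropriate. A minimal sketch of that conversion follows; `to_os_path` is a hypothetical name, not part of the proposed API.

```python
import os


def to_os_path(base_dir, relative_path):
    """Convert a '/'-separated relative path into a path for the current OS.

    Hypothetical helper; the name and signature are illustrative only.
    """
    # Split on the portable separator and drop empty components.
    parts = [part for part in relative_path.split("/") if part]
    # os.path.join applies the separator of the platform we are running on.
    return os.path.join(base_dir, *parts)
```

Storing the portable form and converting only at the point of use keeps the recorded data identical across platforms.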

>I need to track down your comments on this part; I must have missed
>them. That's the last part I haven't worked out yet in the current PEP
>revision.

Well, if you can't find them, the EggFormats doc explains how these 
file/dir structures are currently laid out by setuptools, 
easy_install, pip, etc., and the PEP should probably reference that.

Technically, this PEP doesn't so much propose a change to the 
EggFormats standard, as simply add a RECORD file to it, and propose 
stdlib support for reading and writing it.  So, the PEP really should 
reference (i.e. link to) the existing standard.  The EggFormats doc 
in turn cites pkg_resources doc for lower-level format issues, such 
as name and version normalization, filename escaping, file parsing, etc.

This PEP should also probably be framed as a replacement for PEP 262, 
proposing to extend the de-facto standard for an installation 
database with uninstall support, and blessing selected portions of 
the de facto standard as an official standard.  (Since that's pretty 
much exactly what it is.)

From v+python at  Sat May 16 20:17:10 2009
From: v+python at (Glenn Linderman)
Date: Sat, 16 May 2009 11:17:10 -0700
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

On approximately 5/16/2009 9:55 AM, came the following characters from 
the keyboard of P.J. Eby:
> At 06:06 PM 5/16/2009 +0200, Tarek Ziadé wrote:
>> Ok I've changed the PEP with all the points you mentioned, if you want
>> to take a look.
> Some notes:
> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons 
> a valid character in filenames?

Why tabs?  Aren't tabs a valid character in filenames?
(hint: Both are valid in POSIX filenames, neither are valid in Windows
filenames)

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From pje at  Sat May 16 20:58:35 2009
From: pje at (P.J. Eby)
Date: Sat, 16 May 2009 14:58:35 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote:
>On approximately 5/16/2009 9:55 AM, came the following characters 
>from the keyboard of P.J. Eby:
>>At 06:06 PM 5/16/2009 +0200, Tarek Ziadé wrote:
>>>Ok I've changed the PEP with all the points you mentioned, if you want
>>>to take a look.
>>Some notes:
>>1. Why ';' separation, instead of tabs as in PEP 262?  Aren't 
>>semicolons a valid character in filenames?
>Why tabs?  Aren't tabs a valid character in filenames?
>(hint: Both are valid in POSIX filenames, neither are valid in 
>Windows filenames)

";" *is* valid in Windows filenames, actually.  Tabs aren't.

From v+python at  Sat May 16 21:12:15 2009
From: v+python at (Glenn Linderman)
Date: Sat, 16 May 2009 12:12:15 -0700
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

On approximately 5/16/2009 11:58 AM, came the following characters from 
the keyboard of P.J. Eby:
> At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote:
>> On approximately 5/16/2009 9:55 AM, came the following characters from 
>> the keyboard of P.J. Eby:
>>> At 06:06 PM 5/16/2009 +0200, Tarek Ziadé wrote:
>>>> Ok I've changed the PEP with all the points you mentioned, if you want
>>>> to take a look.
>>> Some notes:
>>> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't 
>>> semicolons a valid character in filenames?
>> Why tabs?  Aren't tabs a valid character in filenames?
>> (hint: Both are valid in POSIX filenames, neither are valid in Windows 
>> filenames)
> ";" *is* valid in Windows filenames, actually.  Tabs aren't.

Oops.  Guess I got that crossed with valid email address characters...

But I should probably have stated my point: since no characters are
illegal in file names on every platform, except "/" and NULL, some
mention should be made that splitting the line on ; (or TAB) isn't
necessarily the correct parsing technique. Rather, the line should be
parsed from the right end, and the remainder used as the filename, as
the numbers at the end would not have ; or TAB as legal characters
within them.  Or else some escaping mechanism needs to be defined.  Or
else ; or TAB will be illegal in names used in the RECORD (which would
be limiting, although not significantly so, in my opinion, but others
may have other opinions).
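
Parsing from the right end can be sketched in a few lines. The example assumes, purely for illustration, that each RECORD line has the form `path;md5;size`; that layout and the name `parse_record_line` are assumptions, not quoted from the PEP.

```python
def parse_record_line(line, sep=";"):
    """Parse one RECORD line from the right, so the path may contain `sep`.

    Assumes a hypothetical 'path<sep>md5<sep>size' layout.
    """
    # rsplit with maxsplit=2 peels off the two trailing fields; whatever
    # is left, separators included, is the file name.
    path, md5, size = line.rstrip("\n").rsplit(sep, 2)
    return path, md5, int(size)
```

This works because the trailing fields are a hex digest and an integer, neither of which can contain the separator, so only the path is ambiguous and it is taken as the remainder.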

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From shigin at  Sat May 16 21:36:04 2009
From: shigin at (Alexander Shigin)
Date: Sat, 16 May 2009 23:36:04 +0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <1242502564.4478.3.camel@jenner>

On Sat, 16/05/2009 at 14:58 -0400, P.J. Eby wrote:
> ";" *is* valid in Windows filenames, actually.  Tabs aren't.

I was sure ';' is the separator for PATH on Windows. Am I missing something? If
I remember right, os.path.pathsep is ';' under Windows.

From martin at  Sat May 16 22:08:25 2009
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 16 May 2009 22:08:25 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <1242502564.4478.3.camel@jenner>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Alexander Shigin wrote:
> On Sat, 16/05/2009 at 14:58 -0400, P.J. Eby wrote:
>> ";" *is* valid in Windows filenames, actually.  Tabs aren't.
> I was sure ';' is the separator for PATH on Windows. Am I missing something?

Yes, this:


From v+python at  Sat May 16 22:26:18 2009
From: v+python at (Glenn Linderman)
Date: Sat, 16 May 2009 13:26:18 -0700
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<1242502564.4478.3.camel@jenner>
Message-ID: <>

On approximately 5/16/2009 1:08 PM, came the following characters from 
the keyboard of Martin v. Löwis:
> Alexander Shigin wrote:
>> On Sat, 16/05/2009 at 14:58 -0400, P.J. Eby wrote:
>>> ";" *is* valid in Windows filenames, actually.  Tabs aren't.
>> I was sure ';' is the separator for PATH on Windows. Am I missing something?
> Yes, this:

Well, maybe he was missing that, or maybe he was missing that each entry 
in the Windows PATH is allowed to be quoted, so that ; characters inside 
quotes are part of path names, and ; characters outside of quotes are
separators.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From google at  Sun May 17 00:15:46 2009
From: google at (MRAB)
Date: Sat, 16 May 2009 23:15:46 +0100
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Glenn Linderman wrote:
> On approximately 5/16/2009 11:58 AM, came the following characters from 
> the keyboard of P.J. Eby:
>> At 11:17 AM 5/16/2009 -0700, Glenn Linderman wrote:
>>> On approximately 5/16/2009 9:55 AM, came the following characters 
>>> from the keyboard of P.J. Eby:
>>>> At 06:06 PM 5/16/2009 +0200, Tarek Ziadé wrote:
>>>>> Ok I've changed the PEP with all the points you mentioned, if you want
>>>>> to take a look.
>>>> Some notes:
>>>> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't 
>>>> semicolons a valid character in filenames?
>>> Why tabs?  Aren't tabs a valid character in filenames?
>>> (hint: Both are valid in POSIX filenames, neither are valid in 
>>> Windows filenames)
>> ";" *is* valid in Windows filenames, actually.  Tabs aren't.
> Oops.  Guess I got that crossed with valid email address characters...
> But I should probably have stated my point: since no characters are
> illegal in file names on every platform, except "/" and NULL, some
> mention should be made that splitting the line on ; (or TAB) isn't
> necessarily the correct parsing technique. Rather, the line should be
> parsed from the right end, and the remainder used as the filename, as
> the numbers at the end would not have ; or TAB as legal characters
> within them.  Or else some escaping mechanism needs to be defined.  Or
> else ; or TAB will be illegal in names used in the RECORD (which would
> be limiting, although not significantly so, in my opinion, but others
> may have other opinions).
FYI, on RISC OS '/' is a valid filename character and '.' is used as the
directory separator.

I'd probably say that TAB is a reasonable character to use, even though
it's OK in POSIX; after all, should anyone really be using a control
character in a filename?

From solipsis at  Sun May 17 00:29:58 2009
From: solipsis at (Antoine Pitrou)
Date: Sat, 16 May 2009 22:29:58 +0000 (UTC)
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

MRAB <google <at>> writes:
> I'd probably say that TAB is a reasonable character to use, even though
> it's OK in POSIX; after all, should anyone really be using a control
> character in a filename?

Even newline characters are valid characters in a filename.
Why not go for the safe choice of encoding all filenames using e.g. 
(which has the advantage that usual filenames will stay perfectly readable)

From shigin at  Sun May 17 06:39:48 2009
From: shigin at (Alexander Shigin)
Date: Sun, 17 May 2009 08:39:48 +0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
	<1242502564.4478.3.camel@jenner> <>
Message-ID: <1242535188.4478.8.camel@jenner>

On Sat, 16/05/2009 at 13:26 -0700, Glenn Linderman wrote:
> On approximately 5/16/2009 1:08 PM, came the following characters from 
> the keyboard of Martin v. Löwis:
> > Yes, this:
> > 
> >
> Well, maybe he was missing that, or maybe he was missing that each entry 
> in the Windows PATH is allowed to be quoted, so that ; characters inside 
> quotes are part of path names, and ; characters outside of quotes are 
> separators.

Yep, I hadn't thought about that. The MSDN entry makes clear that ';' is
valid in file names.
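
A rough sketch of the quoting rule Glenn describes is shown below. It is a simplified model for illustration only: `split_windows_path` is a hypothetical function, and it is an assumption that this matches how Windows itself parses PATH in every case.

```python
def split_windows_path(value):
    """Split a PATH-style string on ';' while keeping quoted entries intact.

    Simplified, hypothetical model; not a reimplementation of Windows behaviour.
    """
    entries, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            # Quotes toggle quoting and are not part of the entry itself.
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            # An unquoted semicolon ends the current entry.
            entries.append("".join(current))
            current = []
        else:
            current.append(ch)
    entries.append("".join(current))
    # Drop empty entries produced by leading, trailing, or doubled separators.
    return [entry for entry in entries if entry]
```

A plain `value.split(";")` would cut a quoted directory name containing ';' in two, which is why the separator alone does not settle whether ';' may appear in a file name.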

From shigin at  Sun May 17 06:52:39 2009
From: shigin at (Alexander Shigin)
Date: Sun, 17 May 2009 08:52:39 +0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <1242535959.4478.12.camel@jenner>

On Sat, 16/05/2009 at 23:15 +0100, MRAB wrote:
> FYI, on RISC OS '/' is a valid filename character and '.' is used as
> the directory separator.
> I'd probably say that TAB is a reasonable character to use, even
> though it's OK in POSIX; after all, should anyone really be using a
> control character in a filename? 

The '\0' char is invalid on both Windows and POSIX. I don't know if it is
valid on RISC OS.

From martin at  Sun May 17 07:03:14 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 17 May 2009 07:03:14 +0200
Subject: [Python-Dev] Cleanup for O&
Message-ID: <>

Issue 6012 proposes to add cleanup support for O& converters;
a first client for this would be PyUnicode_FSConverter. Using
cleanup is always necessary if the conversion function allocates
memory, and a later argument converter fails. The memory allocated
must then be released.

There are three options currently to provide such a function:
1. Make a code O&& with two function pointers. I find that
   too tedious to use.
2. Introduce a new code O$, that takes a O&-style function which,
   in addition, can also be called with a NULL PyObject*, meaning
   that it should cleanup.
3. Extend O& so that its function pointers also support the cleanup
   mode (NULL first argument). Conversion functions that need cleanup
   would have to return a special constant rather than the usual value
   of 1.

In addition, there is also the approach introduced in issue 5990:
4. Users of a conversion function that requires cleanup need to
   initialize the output pointer to NULL, and then release memory
   explicitly when the argument conversion fails.

Which of these do you like best?


From ziade.tarek at  Sun May 17 14:55:45 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sun, 17 May 2009 14:55:45 +0200
Subject: [Python-Dev] LZW support in tarfile ?
Message-ID: <>


I want to remove the usage of the "tar" command in Distutils in favor
of the "tarfile" module.

But, there's an option in Distutils.make_archive to create a tarball
using the "compress" [1] program rather than gzip or bzip2.
Using tar -Z, it will pipe it to the compress program if present. This
program implements the LZW algorithm [2].

LZW used to be patented, but the patent seems to have expired in
every country now [3].

On the Distutils side I can work things out so that the tar archive
created can be piped to an arbitrary compression program when it is
not compressed using bzip2 or gzip.

But I was wondering whether we should add LZW support to tarfile,
besides gzip and bzip2?
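
For context, this is roughly how the tarfile module already covers the gzip and bzip2 cases without calling the external "tar" command. `make_tarball` is a hypothetical helper written for this illustration and is not Distutils code.

```python
import io
import tarfile


def make_tarball(files, compression="gz"):
    """Build a tar archive in memory from a {name: bytes} mapping.

    `compression` is "gz", "bz2", or "" for an uncompressed archive.
    Hypothetical helper for illustration only.
    """
    buffer = io.BytesIO()
    # The mode string selects the compressor, e.g. "w:gz" or "w:bz2".
    with tarfile.open(fileobj=buffer, mode="w:" + compression) as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

There is no corresponding mode for the LZW-based "compress" format, which is why that case would need either an external program or new support in tarfile.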

Although this compression standard doesn't seem very used these days,



Tarek Ziadé |

From google at  Sun May 17 15:04:06 2009
From: google at (MRAB)
Date: Sun, 17 May 2009 14:04:06 +0100
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <1242535959.4478.12.camel@jenner>
References: <>	
Message-ID: <>

Alexander Shigin wrote:
> On Sat, 16/05/2009 at 23:15 +0100, MRAB wrote:
>> FYI, on RISC OS '/' is a valid filename character and '.' is used as
>> the directory separator.
>> I'd probably say that TAB is a reasonable character to use, even
>> though it's OK in POSIX; after all, should anyone really be using a
>> control character in a filename? 
> The '\0' char is invalid on both Windows and POSIX. I don't know if it is
> valid on RISC OS.
'\0' isn't a valid filename character on RISC OS.

From solipsis at  Sun May 17 15:19:44 2009
From: solipsis at (Antoine Pitrou)
Date: Sun, 17 May 2009 13:19:44 +0000 (UTC)
Subject: [Python-Dev] LZW support in tarfile ?
References: <>
Message-ID: <>

Tarek Ziadé <ziade.tarek <at>> writes:
> But I was wondering whether we should add LZW support to tarfile,
> besides gzip and bzip2 ?
> Although this compression standard doesn't seem very used these days,

It would be more useful to add LZMA / xz support.
I don't think compress is used anymore, except perhaps on old legacy systems.
On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no
.Z file.



From fuzzyman at  Sun May 17 15:23:03 2009
From: fuzzyman at (Michael Foord)
Date: Sun, 17 May 2009 14:23:03 +0100
Subject: [Python-Dev] LZW support in tarfile ?
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Tarek Ziadé <ziade.tarek <at>> writes:
>> But I was wondering whether we should add LZW support to tarfile,
>> besides gzip and bzip2 ?
>> Although this compression standard doesn't seem very used these days,
> It would be more useful to add LZMA / xz support.
> I don't think compress is used anymore, except perhaps on old legacy systems.
> On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no
> .Z file.

I've seen the occasional .Z file in recent years, but never that I 
recall for a Python package.

As plugging in external compression tools is less likely to work
cross-platform, wouldn't it be both easier and better to deprecate (and
not replace) the compress support?

If there is a huge outcry, adding LZW support to tarfile can be reconsidered.

Michael Foord

> Regards
> Antoine.


From martin at  Sun May 17 17:00:18 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 17 May 2009 17:00:18 +0200
Subject: [Python-Dev] LZW support in tarfile ?
In-Reply-To: <>
References: <>
Message-ID: <>

> But, there's an option in Distutils.make_archive to create a tarball
> using the "compress" [1] program rather than gzip or bzip2.
> Using tar -Z, it will pipe it to the compress program if present. This
> program implements the LZW algorithm [2].

As everybody else says: it might be best to just remove that option.
For compatibility, perhaps deprecate it in 2.7 and 3.1, and remove it
in 3.2.


From piet at  Sun May 17 21:47:16 2009
From: piet at (Piet van Oostrum)
Date: Sun, 17 May 2009 21:47:16 +0200
Subject: [Python-Dev] PEP 383: Non-decodable Bytes in System Character
In-Reply-To: <> (Ned Deily's message of
	"Thu\, 30 Apr 2009 12\:54\:50 -0700")
References: <>
	<> <>
Message-ID: <>

>>>>> Ned Deily <nad at> (ND) wrote:

>ND> In article <m2ocueq6mm.fsf at>, Piet van Oostrum <piet at> 
>ND> wrote:
>>> >>>>> Ronald Oussoren <ronaldoussoren at> (RO) wrote:
>>> >RO> For what it's worth, the OSX API's seem to behave as follows:
>>> >RO> * If you create a file with a non-UTF8 name on an HFS+ filesystem the
>>> >RO> system automatically encodes the name.
>>> >RO> That is,  open(chr(255), 'w') will silently create a file named '%FF'
>>> >RO> instead of the name you'd expect on a unix system.
>>> Not for me (I am using Python 2.6.2).
>>> >>> f = open(chr(255), 'w')
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
>>> >>> 

>ND> What version of OSX are you using?  On Tiger 10.4.11 I see the failure 
>ND> you see but on Leopard 10.5.6 the behavior Ronald reports.

Yes, I am using Tiger (10.4.11). Interesting that it has changed on Leopard.
Piet van Oostrum <piet at>
URL: [PGP 8DAE142BE17999C4]
Private email: piet at

From martin at  Sun May 17 22:54:32 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 17 May 2009 22:54:32 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
Message-ID: <>

Thomas Wouters reminded me of a long-standing idea; I finally
found the time to write it down.

Please comment!


PEP: 384
Title: Defining a Stable ABI
Version: $Revision: 72754 $
Last-Modified: $Date: 2009-05-17 21:14:52 +0200 (So, 17. Mai 2009) $
Author: Martin v. Löwis <martin at>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 17-May-2009
Python-Version: 3.2


Currently, each feature release introduces a new name for the
Python DLL on Windows, and may cause incompatibilities for extension
modules on Unix. This PEP proposes to define a stable set of API
functions which are guaranteed to be available for the lifetime
of Python 3, and which will also remain binary-compatible across
versions. Extension modules and applications embedding Python
can work with different feature releases as long as they restrict
themselves to this stable ABI.


The primary source of ABI incompatibility is changes to the layout
of in-memory structures. For example, the way in which string interning
works, or the data type used to represent the size of an object, have
changed during the life of Python 2.x. As a consequence, extension
modules making direct access to fields of strings, lists, or tuples,
would break if their code is loaded into a newer version of the
interpreter without recompilation: offsets of other fields may have
changed, making the extension modules access the wrong data.

In some cases, the incompatibilities only affect internal objects of
the interpreter, such as frame or code objects. For example, the way
line numbers are represented has changed in the 2.x lifetime, as has
the way in which local variables are stored (due to the introduction
of closures). Even though most applications probably never used these
objects, changing them has required changing PYTHON_API_VERSION.

On Linux, changes to the ABI are often not much of a problem: the
system will provide a default Python installation, and many extension
modules are already provided pre-compiled for that version. If additional
modules are needed, or additional Python versions, users can typically
compile them themselves on the system, resulting in modules that use
the right ABI.

On Windows, multiple simultaneous installations of different Python
versions are common, and extension modules are compiled by their
authors, not by end users. To reduce the risk of ABI incompatibilities,
Python currently introduces a new DLL name pythonXY.dll for each
feature release, whether or not ABI incompatibilities actually exist.

With this PEP, it will be possible to reduce the dependency of binary
extension modules on a specific Python feature release, and applications
embedding Python can be made to work with different releases.


The ABI specification falls into two parts: an API specification,
specifying what function (groups) are available for use with the
ABI, and a linkage specification specifying what libraries to link
with. The actual ABI (layout of structures in memory, function
calling conventions) is not specified, but implied by the
compiler. A specific ABI is recommended for
selected platforms.

During evolution of Python, new ABI functions will be added.
Applications using them will then have a requirement on a minimum
version of Python; this PEP provides no mechanism for such
applications to fall back when the Python library is too old.


Applications and extension modules that want to use this ABI
are collectively referred to as "applications" from here on.

Header Files and Preprocessor Definitions

Applications shall only include the header file Python.h (before
including any system headers), or, optionally, include pyconfig.h, and
then Python.h.

During the compilation of applications, the preprocessor macro
Py_LIMITED_API must be defined. Doing so will hide all definitions
that are not part of the ABI.


Only the following structures and structure fields are accessible to
applications:

- PyObject (ob_refcnt, ob_type)
- PyVarObject (ob_base, ob_size)
- Py_buffer (buf, obj, len, itemsize, readonly, ndim, shape,
  strides, suboffsets, smalltable, internal)
- PyMethodDef (ml_name, ml_meth, ml_flags, ml_doc)
- PyMemberDef (name, type, offset, flags, doc)
- PyGetSetDef (name, get, set, doc, closure)

The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE)
are also available to applications.

The following types are available, but opaque (i.e. incomplete):

- PyThreadState
- PyInterpreterState

Type Objects

The structure of type objects is not available to applications;
declaration of "static" type objects is not possible anymore
(for applications using this ABI).
Instead, type objects get created dynamically. To allow an
easy creation of types (in particular, to be able to fill out
function pointers easily), the following structures and functions
are available::

  typedef struct{
    int slot;    /* slot id, see below */
    void *pfunc; /* function pointer */
  } PyType_Slot;

  typedef struct{
    const char* name;
    const char* doc;
    int basicsize;
    int itemsize;
    int flags;
    PyType_Slot *slots; /* terminated by slot==0. */
  } PyType_Spec;

  PyObject* PyType_FromSpec(PyType_Spec*);

To specify a slot, a unique slot id must be provided. New Python
versions may introduce new slot ids, but slot ids will never be
recycled. Slots may get deprecated, but continue to be supported
throughout Python 3.x.

The slot ids are named like the field names of the structures that
hold the pointers in Python 3.1, with an added ``Py_`` prefix (i.e.
Py_tp_dealloc instead of just tp_dealloc):

- tp_dealloc, tp_print, tp_getattr, tp_setattr, tp_repr,
  tp_hash, tp_call, tp_str, tp_getattro, tp_setattro,
  tp_doc, tp_traverse, tp_clear, tp_richcompare, tp_iter,
  tp_iternext, tp_methods, tp_base, tp_descr_get, tp_descr_set,
  tp_init, tp_alloc, tp_new, tp_is_gc, tp_bases, tp_del
- nb_add nb_subtract nb_multiply nb_remainder nb_divmod nb_power
  nb_negative nb_positive nb_absolute nb_bool nb_invert nb_lshift
  nb_rshift nb_and nb_xor nb_or nb_int nb_float nb_inplace_add
  nb_inplace_subtract nb_inplace_multiply nb_inplace_remainder
  nb_inplace_power nb_inplace_lshift nb_inplace_rshift nb_inplace_and
  nb_inplace_xor nb_inplace_or nb_floor_divide nb_true_divide
  nb_inplace_floor_divide nb_inplace_true_divide nb_index
- sq_length sq_concat sq_repeat sq_item sq_ass_item was_sq_ass_slice
  sq_contains sq_inplace_concat sq_inplace_repeat
- mp_length mp_subscript mp_ass_subscript
- bf_getbuffer bf_releasebuffer

XXX Not supported yet: tp_weaklistoffset, tp_dictoffset

The following fields cannot be set during type definition:
- tp_dict tp_mro tp_cache tp_subclasses tp_weaklist

Functions and function-like Macros

All functions starting with _Py are not available to applications.
Also, all functions that expect parameter types that are unavailable
to applications are excluded from the ABI, such as PyAST_FromNode
(which expects a ``node*``).

All other functions are available, unless excluded below.

Function-like macros (in particular, field access macros) remain
available to applications, but get replaced by function calls
(unless their definition only refers to features of the ABI, such
as the various _Check macros)

ABI function declarations will not change their parameters or return
types. If a change to the signature becomes necessary, a new function
will be introduced. If the new function is source-compatible (e.g. if
just the return type changes), an alias macro may get added to
redirect calls to the new function when the application is
recompiled.

If continued provision of the old function is not possible, it may get
deprecated, then removed, in accordance with PEP 7, causing
applications that use that function to break.

Excluded Functions

Functions declared in the following header files are not part
of the ABI:
- cellobject.h
- classobject.h
- code.h
- frameobject.h
- funcobject.h
- genobject.h
- pyarena.h
- pydebug.h
- symtable.h
- token.h
- traceback.h

Global Variables

Global variables representing types and exceptions are available
to applications.
XXX provide a complete list.

XXX should restrict list of globals to truly "builtin" stuff,
excluding everything that can also be looked up through imports.

XXX may specify access to predefined types and exceptions through
the interpreter state, with appropriate Get macros.

Other Macros

All macros defining symbolic constants are available to applications;
the numeric values will not change.

In addition, the following macros are available:



On Windows, applications shall link with python3.dll; an import
library python3.lib will be available. This DLL will redirect all of
its API functions through /export linker options to the full
interpreter DLL, i.e. python3y.dll.

XXX is it possible to redirect global variables in the same way?
If not, python3.dll would have to copy them, and we should verify
that all available global variables are read-only.

On Unix systems, the ABI is typically provided by the python
executable itself. PyModule_Create is changed to pass ``3`` as the API
version if the extension module was compiled with Py_LIMITED_API; the
version check for the API version will accept either 3 or the current
PYTHON_API_VERSION as conforming. If Python is compiled as a shared
library, it is installed as both, and;
applications conforming to this PEP should then link to the former.

XXX is it possible to make the soname, and still
have some applications link to

Implementation Strategy

This PEP will be implemented in a branch, allowing users to check
whether their modules conform to the ABI. To simplify this testing, an
additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
type object layout, to let users postpone rewriting all types. When
this branch is merged into the 3.2 code base, this macro will
be removed.


This document has been placed in the public domain.

From dirkjan at  Sun May 17 23:47:07 2009
From: dirkjan at (Dirkjan Ochtman)
Date: Sun, 17 May 2009 23:47:07 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 17, 2009 at 10:54 PM, "Martin v. Löwis" <martin at> wrote:
> Excluded Functions
> ------------------
> Functions declared in the following header files are not part
> of the ABI:
> - cellobject.h
> - classobject.h
> - code.h
> - frameobject.h
> - funcobject.h
> - genobject.h
> - pyarena.h
> - pydebug.h
> - symtable.h
> - token.h
> - traceback.h

What kind of effect does this have on optimization efforts, for
example all the stuff done by Antoine Pitrou over the last few months,
and the first few results from unladen? Will it mean we won't get to
the good optimizations until 4.0? Or does it just mean unladen swallow
takes longer to come back to trunk (until 4.0) and every extension
author who wants to be compatible with it will basically have the same
burden as now?



From martin at  Mon May 18 00:07:59 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 18 May 2009 00:07:59 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

>> Functions declared in the following header files are not part
>> of the ABI:
>> - cellobject.h
>> - classobject.h
>> - code.h
>> - frameobject.h
>> - funcobject.h
>> - genobject.h
>> - pyarena.h
>> - pydebug.h
>> - symtable.h
>> - token.h
>> - traceback.h
> What kind of effect does this have on optimization efforts, for
> example all the stuff done by Antoine Pitrou over the last few months,
> and the first few results from unladen? 

I fail to see the relationship, so: no effect that I can see.

Why do you think that optimization efforts could be related to
the PEP 384 proposal?


From g.brandl at  Mon May 18 00:17:31 2009
From: g.brandl at (Georg Brandl)
Date: Mon, 18 May 2009 00:17:31 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <guq2et$kgr$>

Martin v. Löwis wrote:

> Header Files and Preprocessor Definitions
> -----------------------------------------
> Applications shall only include the header file Python.h (before
> including any system headers), or, optionally, include pyconfig.h, and
> then Python.h.

What about structmember.h?  It's not yet included with Python.h AFAICS.


Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From dirkjan at  Mon May 18 00:34:52 2009
From: dirkjan at (Dirkjan Ochtman)
Date: Mon, 18 May 2009 00:34:52 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 18, 2009 at 12:07 AM, "Martin v. Löwis" <martin at> wrote:
> I fail to see the relationship, so: no effect that I can see.
> Why do you think that optimization efforts could be related to
> the PEP 384 proposal?

It would seem to me that optimizations are likely to require data
structure changes, for exactly the kind of core data structures that
you're talking about locking down. But that's just a high-level view,
I might be wrong.



From martin at  Mon May 18 00:35:06 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 18 May 2009 00:35:06 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <guq2et$kgr$>
References: <> <guq2et$kgr$>
Message-ID: <>

>> Header Files and Preprocessor Definitions
>> -----------------------------------------
>> Applications shall only include the header file Python.h (before
>> including any system headers), or, optionally, include pyconfig.h, and
>> then Python.h.
> What about structmember.h?  It's not yet included with Python.h AFAICS.

Right - I think it should be, though. Is there a reason why it's not?

The only reason I can see is that it isn't completely namespace-safe,
e.g. it defines a constant READONLY. Not sure whether the T_ constants
would need to be changed as well.

So if that's the rationale, I would propose to make it namespace-safe
under a different file name, and add alias #defines in structmember.h
for compatibility.

I also think this should happen independent of PEP 384.

See also issue 2897 - perhaps we can even fix it for 3.1.


From solipsis at  Mon May 18 00:43:30 2009
From: solipsis at (Antoine Pitrou)
Date: Sun, 17 May 2009 22:43:30 +0000 (UTC)
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
References: <>
Message-ID: <>

Dirkjan Ochtman <dirkjan <at>> writes:
> It would seem to me that optimizations are likely to require data
> structure changes, for exactly the kind of core data structures that
> you're talking about locking down. But that's just a high-level view,
> I might be wrong.

Unless I'm misunderstanding something, Martin doesn't advocate locking data
structures down (except a couple of outliers such as Py_buffer). An
ABI-compliant application mustn't tinker directly with Python's data structures,
but use the ABI functions.



From dirkjan at  Mon May 18 00:46:21 2009
From: dirkjan at (Dirkjan Ochtman)
Date: Mon, 18 May 2009 00:46:21 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 18, 2009 at 12:43 AM, Antoine Pitrou <solipsis at> wrote:
> Unless I'm misunderstanding something, Martin doesn't advocate locking data
> structures down (except a couple of outliers such as Py_buffer). An
> ABI-compliant application mustn't tinker directly with Python's data structures,
> but use the ABI functions.

Right. Sorry about the noise, then.



From martin at  Mon May 18 00:53:00 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 18 May 2009 00:53:00 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	
Message-ID: <>

Dirkjan Ochtman wrote:
> On Mon, May 18, 2009 at 12:07 AM, "Martin v. Löwis" <martin at> wrote:
>> I fail to see the relationship, so: no effect that I can see.
>> Why do you think that optimization efforts could be related to
>> the PEP 384 proposal?
> It would seem to me that optimizations are likely to require data
> structure changes, for exactly the kind of core data structures that
> you're talking about locking down. But that's just a high-level view,
> I might be wrong.

Ah. It's exactly the opposite: The purpose of the PEP is not to lock
the data structures down, but to allow more flexible evolution of
them - by completely hiding them from extension modules.

Currently, any data structure change must be weighed for its impact
on binary compatibility. With the PEP, changing structures can
be done fairly freely - with the exception of the very few structures
that do get locked down. In particular, the list of header files
that you quoted precisely contains the structures that can be
modified with no impact on the ABI.

I'm not aware that any of the structures that I propose to lock
would be relevant for optimization - but I might be wrong. If so,
I'd like to know, and it would be possible to add accessor functions
in cases where extension modules might still legitimately want to
access certain fields.

Certain changes to the VM would definitely be binary-incompatible,
such as removal of reference counting. However, such a change would
probably have a much wider effect, breaking not just binary
compatibility, but also source compatibility. It would be justified
to call a Python release that makes such a change 4.0.


From martin at  Mon May 18 01:04:21 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 18 May 2009 01:04:21 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Dino Viehland wrote:
> Dirkjan Ochtman wrote:
>> It would seem to me that optimizations are likely to require data
>> structure changes, for exactly the kind of core data structures that
>> you're talking about locking down. But that's just a high-level view,
>> I might be wrong.
> In particular I would guess that ref counting is the biggest issue here.
> I would think not directly exposing the field and having inc/dec ref
> Functions (real methods, not macros) for it would give a lot more
> ability to change the API in the future.

In the context of optimization, I'm skeptical that introducing functions
for the reference counting would be useful. Making the INCREF/DECREF
macros functions just in case the reference counting goes away is IMO
an unacceptable performance cost.

Instead, such a change should go through the regular deprecation
procedure and/or cause the release of Python 4.0.

> It also might make it easier for alternate implementations to support
> the same API so some modules could work cross implementation - but I
> suspect that's a non-goal of this PEP :).

Indeed :-) I'm also skeptical that this would actually allow
cross-implementation modules to happen. The list of functions that
an alternate implementation would have to provide is fairly long.

The memory management APIs in particular also assume a certain layout
of Python objects in general, namely that they start with a header
whose size is a compile-time constant. Again, making this more flexible
"just in case" would also impact performance, and probably fairly badly.

> Other fields directly accessed (via macros or otherwise) might have similar
> problems but they don't seem as core as ref counting.

Access to the type object reference is probably similar. All the other
structs are used "directly" in C code, with no accessor macros.


From dinov at  Mon May 18 00:48:05 2009
From: dinov at (Dino Viehland)
Date: Sun, 17 May 2009 22:48:05 +0000
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

Dirkjan Ochtman wrote:
> It would seem to me that optimizations are likely to require data
> structure changes, for exactly the kind of core data structures that
> you're talking about locking down. But that's just a high-level view,
> I might be wrong.

In particular I would guess that ref counting is the biggest issue here.
I would think not directly exposing the field and having inc/dec ref
Functions (real methods, not macros) for it would give a lot more
ability to change the API in the future.

It also might make it easier for alternate implementations to support
the same API so some modules could work cross implementation - but I
suspect that's a non-goal of this PEP :).

Other fields directly accessed (via macros or otherwise) might have similar
problems but they don't seem as core as ref counting.

From fuzzyman at  Mon May 18 01:53:12 2009
From: fuzzyman at (Michael Foord)
Date: Mon, 18 May 2009 00:53:12 +0100
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> Dino Viehland wrote:
>> Dirkjan Ochtman wrote:
>>> It would seem to me that optimizations are likely to require data
>>> structure changes, for exactly the kind of core data structures that
>>> you're talking about locking down. But that's just a high-level view,
>>> I might be wrong.
>> In particular I would guess that ref counting is the biggest issue here.
>> I would think not directly exposing the field and having inc/dec ref
>> Functions (real methods, not macros) for it would give a lot more
>> ability to change the API in the future.
> In the context of optimization, I'm skeptical that introducing functions
> for the reference counting would be useful. Making the INCREF/DECREF
> macros functions just in case the reference counting goes away is IMO
> an unacceptable performance cost.
> Instead, such a change should go through the regular deprecation
> procedure and/or cause the release of Python 4.0.
>> It also might make it easier for alternate implementations to support
>> the same API so some modules could work cross implementation - but I
>> suspect that's a non-goal of this PEP :).
> Indeed :-) I'm also skeptical that this would actually allow
> cross-implementation modules to happen. The list of functions that
> an alternate implementation would have to provide is fairly long.

Just in case you're unaware of it; the company I work for has an open 
source project called Ironclad. This *is* a reimplementation of the 
Python C API and gives us binary compatibility with [some subset of] 
Python C extensions for use from IronPython.

It's an ambitious project but it is now at the stage where 1000s of the 
Numpy and Scipy tests pass when run from IronPython. I don't think this 
PEP impacts the project, but it is not completely unfeasible for the 
alternative implementations to do this.

In particular we have had to address the issue of the GIL and extensions
(IronPython has no GIL) and reference counting (which IronPython also
doesn't use).

Michael Foord


From foom at  Mon May 18 01:35:59 2009
From: foom at (James Y Knight)
Date: Sun, 17 May 2009 19:35:59 -0400
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On May 17, 2009, at 4:54 PM, Martin v. Löwis wrote:
> Currently, each feature release introduces a new name for the
> Python DLL on Windows, and may cause incompatibilities for extension
> modules on Unix. This PEP proposes to define a stable set of API
> functions which are guaranteed to be available for the lifetime
> of Python 3, and which will also remain binary-compatible across
> versions. Extension modules and applications embedding Python
> can work with different feature releases as long as they restrict
> themselves to this stable ABI.

It seems like a good ideal to strive for.

But I think this is too strong a promise. IMO it would be better to  
say that ABI compatibility across releases is a goal. If someone does  
make a change that breaks the ABI, I'd expect whoever is proposing it
to put forth a fairly strong argument towards why it's a worthwhile  
change. But it should be possible and allowed, given the right  
circumstances. Because I think it's pretty much inevitable that it  
*will* need to happen, sometime.

(of course there will need to be ABI tests, so that any potential ABI  
breakages are known about when they occur)

Python is much more defined by its source language than its C  
extension API, so tying the python major version number to the C ABI  
might not be the best idea from a "marketing" standpoint. (I can see  
it now..."Python 4.0 major new features: we changed the C method  
definition struct layout incompatibly" :)


From martin at  Mon May 18 08:00:57 2009
From: martin at ("Martin v. Löwis")
Date: Mon, 18 May 2009 08:00:57 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <>

>>> It also might make it easier for alternate implementations to support
>>> the same API so some modules could work cross implementation - but I
>>> suspect that's a non-goal of this PEP :).
>> Indeed :-) I'm also skeptical that this would actually allow
>> cross-implementation modules to happen. The list of functions that
>> an alternate implementation would have to provide is fairly long.
> Just in case you're unaware of it; the company I work for has an open
> source project called Ironclad.

I was unaware indeed; thanks for pointing this out.

IIUC, it's not just an API emulation, but also an ABI emulation.

> In particular we have had to address the issue of the GIL and extensions
> (IronPython has no GIL) and reference counting (which IronPython also
> doesn't use).

I think this somewhat strengthens the point I was trying to make: An
alternate implementation that tries to be API compatible has to consider
so many things that it is questionable whether making Py_INCREF/DECREF
functions would be any simplification.

So I just ask:
a) Would it help IronClad if it could restrict itself to PEP 384
   compatible modules?
b) Would further restrictions in the PEP help that cause?


From nick at  Mon May 18 10:06:17 2009
From: nick at (Nick Craig-Wood)
Date: Mon, 18 May 2009 09:06:17 +0100
Subject: [Python-Dev] LZW support in tarfile ?
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Foord <fuzzyman at> wrote:
>  Antoine Pitrou wrote:
> > Tarek Ziadé <ziade.tarek <at>> writes:
> >   
> >> But I was wondering if we should we add a LZW support in tarinfo,
> >> besides gzip and bzip2 ?
> >>
> >> Although this compression standard doesn't seem very used these days,
> >>     
> >
> > It would be more useful to add LZMA / xz support.
> > I don't think compress is used anymore, except perhaps on old legacy systems.
> > On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no
> > .Z file.
>  I've seen the occasional .Z file in recent years, but never that I 
>  recall for a Python package.

On my unix filesystem (which has files stretching back over 20 years)
I find only two .Z files, one dated 1989 and one 2002.  I think you
can safely say that compress is gone!

The worst you are doing by removing compress support is getting the
user of some ancient platform to download one of the binaries here
first.

>  As plugging in external compression tools is less likely to work 
>  cross-platform wouldn't it be both easier and better to deprecate (and 
>  not replace) the compress support.

Agreed.


Nick Craig-Wood <nick at> --

From ziade.tarek at  Mon May 18 10:27:58 2009
From: ziade.tarek at (Tarek Ziadé)
Date: Mon, 18 May 2009 10:27:58 +0200
Subject: [Python-Dev] LZW support in tarfile ?
In-Reply-To: <>
References: <>
Message-ID: <>

OK, thanks for all the feedback; I'll remove compress support.


On Mon, May 18, 2009 at 10:06 AM, Nick Craig-Wood <nick at> wrote:
> Michael Foord <fuzzyman at> wrote:
>>  Antoine Pitrou wrote:
>> > Tarek Ziadé <ziade.tarek <at>> writes:
>> >
>> >> But I was wondering if we should we add a LZW support in tarinfo,
>> >> besides gzip and bzip2 ?
>> >>
>> >> Although this compression standard doesn't seem very used these days,
>> >>
>> >
>> > It would be more useful to add LZMA / xz support.
>> > I don't think compress is used anymore, except perhaps on old legacy systems.
>> > On my Linux system, I have lots of .gz, .bz2 and .lzma files, but absolutely no
>> > .Z file.
>>  I've seen the occasional .Z file in recent years, but never that I
>>  recall for a Python package.
> On my unix filesystem (which has files stretching back over 20 years)
> I find only two .Z files, one dated 1989 and one 2002.  I think you
> can safely say that compress is gone!
> The worst you are doing by removing compress support is getting the
> user of some ancient platform to download one of the binaries here
> first.
>
>>  As plugging in external compression tools is less likely to work
>>  cross-platform wouldn't it be both easier and better to deprecate (and
>>  not replace) the compress support.
> Agreed.
> --
> Nick Craig-Wood <nick at> --

Tarek Ziadé |

From fuzzyman at  Mon May 18 13:17:37 2009
From: fuzzyman at (Michael Foord)
Date: Mon, 18 May 2009 12:17:37 +0100
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>>>> It also might make it easier for alternate implementations to support
>>>> the same API so some modules could work cross implementation - but I
>>>> suspect that's a non-goal of this PEP :).
>>> Indeed :-) I'm also skeptical that this would actually allow
>>> cross-implementation modules to happen. The list of functions that
>>> an alternate implementation would have to provide is fairly long.
>> Just in case you're unaware of it; the company I work for has an open
>> source project called Ironclad.
> I was unaware indeed; thanks for pointing this out.
> IIUC, it's not just an API emulation, but also an ABI emulation.


>> In particular we have had to address the issue of the GIL and extensions
>> (IronPython has no GIL) and reference counting (which IronPython also
>> doesn't use).
> I think this somewhat strengthens the point I was trying to make: An
> alternate implementation that tries to be API compatible has to consider
> so many things that it is questionable whether making Py_INCREF/DECREF
> functions would be any simplification.

It would actually have been helpful for us, but I understand that it 
would be a big performance hit. The Ironclad garbage collection 
mechanism is described here:

We artificially inflate the refcount of all objects that Ironclad 
creates to 2 and hold a reference to them on the .NET side to make them 
ineligible for garbage collection.

Because we can't always know when objects have been decreffed back down 
to 1, there are some circumstances when we have to scan all the objects 
we are holding onto. If their refcount is only 1 then we no longer need 
to hold a reference to them. When nothing is using them on the IronPython
side either normal .NET garbage collection kicks in and the IronPython 
proxy object has a destructor that calls back into Ironclad and uses the 
CPython dealloc method.
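
The scanning step described above can be modelled with a small Python sketch. It is not Ironclad's code: the class and method names are hypothetical, and the C-level reference count is simulated with a plain integer attribute purely to show the "inflate to 2, release at 1" idea.

```python
class NativeObject:
    """Stand-in for a C-level object whose reference count is visible."""

    def __init__(self, name):
        self.name = name
        # One reference for the extension code plus one artificial
        # reference kept by the bridge (the inflation described above).
        self.refcount = 2


class Bridge:
    """Hypothetical managed-side bookkeeping for native objects."""

    def __init__(self):
        self.held = {}  # strong references that block garbage collection

    def create(self, name):
        obj = NativeObject(name)
        self.held[name] = obj
        return obj

    def scan(self):
        """Drop objects whose only remaining reference is the bridge's own."""
        released = [name for name, obj in self.held.items() if obj.refcount == 1]
        for name in released:
            del self.held[name]
        return released
```

After a simulated decref (`obj.refcount -= 1`), the next `scan()` stops holding the object, which corresponds to the point where normal garbage collection could take over.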

> So I just ask:
> a) Would it help IronClad if it could restrict itself to PEP 384
>    compatible modules?
> b) Would further restrictions in the PEP help that cause?

I've forwarded these questions to the lead developer of Ironclad 
(William Reade) along with a link to the PEP. He isn't on Python-dev so 
I may have to be a proxy for him in discussion. His initial response was 
"looks pretty sweet".


> Regards,
> Martin


From william at  Tue May 19 11:09:58 2009
From: william at (William Reade)
Date: Tue, 19 May 2009 10:09:58 +0100
Subject: [Python-Dev] [Fwd: Re:  PEP 384: Defining a Stable ABI]
In-Reply-To: <>
References: <>
Message-ID: <>

My perspective is as follows:

1) If PEP-384 had always been in place, my life would now be a lot easier.

2) Since it hasn't always been in place, its introduction won't help me 
in the short term: there are an awful lot of extension modules that use 
excluded functions (for example, all(?) PyCxx modules use PyCode_New and 
PyFrame_New to get nicer tracebacks), and I'll still have to handle all 
these cases until everyone is up-to-date with whatever version of Python 
this gets accepted into.

3) Regardless, this PEP makes me very happy, because I can now look 
forward to the glorious day when all extension modules are 
384-compatible (and even *some* modules becoming compatible will make me 
pretty happy).

However, I'm not sure exactly how we can get there from here; I suspect 
that certain features of certain extensions already depend critically 
upon implementation details which will become hidden. The most extreme 
illustrative example I know is from NumPy (in scalarmathmodule.c), and 
looks like this:

            PyInt_Type.tp_as_number = PyLongArrType_Type.tp_as_number;
            PyInt_Type.tp_compare = PyLongArrType_Type.tp_compare;
            PyInt_Type.tp_richcompare = PyLongArrType_Type.tp_richcompare;

...and I fear that many many similar (if perhaps less frightening) 
dependencies exist elsewhere.

Regardless, in answer to the two specific questions you ask:

a) We don't really have that option. However, I would have a much higher 
degree of confidence in running PEP-384-compatible modules under 
Ironclad than I do with current modules, simply because I would no 
longer need to worry about (say) edge cases in which extension writers 
suddenly try to directly access op->ob_type->tp_as_number->nb_power.

b) I can't think of any more useful restrictions. The PEP would solve my 
biggest current worry, which is that my current implementation allows 
managed/unmanaged lists to fall out of sync in certain circumstances 
(but if every list mutation happened via an API call, it wouldn't be an
issue).

Best Regards

Michael Foord wrote:
> The questions from Martin v. Lowis are in the email below.
> The PEP under discussion is:
> I can proxy any replies you want to send, or you can join Python-dev.
> All the best,
> Michael
> -------- Original Message --------
> Subject: 	Re: [Python-Dev] PEP 384: Defining a Stable ABI
> Date: 	Mon, 18 May 2009 08:00:57 +0200
> From: 	"Martin v. Löwis" <martin at>
> To: 	Michael Foord <fuzzyman at>
> CC: 	Dino Viehland <dinov at>, Python-Dev 
> <python-dev at>, Unladen Swallow 
> <unladen-swallow at>, Python List <python-list at>
> References: 	<4A107988.3020202 at> 
> <ea2499da0905171447n5a353b61w1c98a91bf3617e5c at> 
> <4A108ABF.9060909 at> 
> <ea2499da0905171534vca080dci473d2ee9f9915668 at> 
> <1A472770E042064698CB5ADC83A12ACD016E82D6 at> 
> <4A1097F5.2050009 at> <4A10A368.1060406 at>
> >>> It also might make it easier for alternate implementations to support
> >>> the same API so some modules could work cross implementation - but I
> >>> suspect that's a non-goal of this PEP :).
> >>>     
> >>
> >> Indeed :-) I'm also skeptical that this would actually allow
> >> cross-implementation modules to happen. The list of functions that
> >> an alternate implementation would have to provide is fairly long.
> >>
> >>   
> > 
> > Just in case you're unaware of it; the company I work for has an open
> > source project called Ironclad.
> I was unaware indeed; thanks for pointing this out.
> IIUC, it's not just an API emulation, but also an ABI emulation.
> > In particular we have had to address the issue of the GIL and extensions
> > (IronPython has no GIL) and reference counting (which IronPython also
> > doesn't use).
> I think this somewhat strengthens the point I was trying to make: An
> alternate implementation that tries to be API compatible has to consider
> so many things that it is questionable whether making Py_INCREF/DECREF
> functions would be any simplification.
> So I just ask:
> a) Would it help IronClad if it could restrict itself to PEP 384
>    compatible modules?
> b) Would further restrictions in the PEP help that cause?
> Regards,
> Martin
> -- 

From ronaldoussoren at  Tue May 19 14:59:31 2009
From: ronaldoussoren at (Ronald Oussoren)
Date: Tue, 19 May 2009 14:59:31 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 17 May, 2009, at 15:04, MRAB wrote:

> Alexander Shigin wrote:
>> On Sat, 16/05/2009 at 23:15 +0100, MRAB wrote:
>>> FYI, on RISC OS '/' is a valid filename character and '.' is used as
>>> the directory separator.
>>> I'd probably say that TAB is a reasonable character to use, even
>>> though it's OK in POSIX; after all, should anyone really be using a
>>> control character in a filename?
>> The '\0' char is invalid in both Windows and POSIX. I don't know if it is
>> valid on RISC OS.
> '\0' isn't a valid filename character on RISC OS.

Wouldn't it be possible to use a CSV file for this? That way we  
wouldn't have to invent yet another escaping mechanism and there's  
already good support for reading and writing CSV files in the
standard library.
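
A minimal sketch of that idea, using only the standard library's csv module. The record layout (path, checksum, size) and the sample file names are assumptions made for illustration, not the format PEP 376 specifies.

```python
import csv
import io


def round_trip(rows, delimiter=","):
    """Write rows as CSV text, then parse that text back into tuples."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    text = buffer.getvalue()
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return text, [tuple(row) for row in reader]


# Hypothetical entries whose paths contain every separator discussed so far.
ROWS = [
    ("pkg/plain.py", "abc123", "120"),
    ("pkg/semi;colon.py", "def456", "64"),
    ("pkg/com,ma.py", "0a1b2c", "7"),
    ("pkg/tab\there.py", "ffee00", "3"),
]
```

Calling `round_trip(ROWS)` or `round_trip(ROWS, delimiter="\t")` gives back the original tuples, because the writer quotes any field containing the delimiter and the reader undoes that quoting.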



From solipsis at  Tue May 19 16:03:12 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 19 May 2009 14:03:12 +0000 (UTC)
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
References: <>
Message-ID: <>

Ronald Oussoren <ronaldoussoren <at>> writes:
> Wouldn't it be possible to use a CSV file for this? That way we  
> wouldn't have to invent yet another escaping mechanism and there's  
> already good support for reading and writing CSV files in the
> standard library.

+1

We can even customize the delimiter if you want to make it more readable (or if
there's a shortage of bikeshed material ;-)).



From ziade.tarek at  Tue May 19 16:04:21 2009
From: ziade.tarek at (Tarek Ziadé)
Date: Tue, 19 May 2009 16:04:21 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, May 16, 2009 at 6:55 PM, P.J. Eby <pje at> wrote:
> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons a
> valid character in filenames?

I am changing this into a <tab> for now.

What about Antoine's idea about doing a quote() on the names?

From my point of view <tabs> seem simpler to deal with, if 3rd-party
tools want to work on these files without using pkgutil or Python.

> 4. There should probably be a way to iterate over the projects in a
> directory, since it's otherwise impossible for an installation tool to find
> out what project(s) "own" a file that conflicts with something being
> installed.  Alternatively, reshaping the file API to allow querying by path
> as well as by project might work.

I am adding a "get_projects" api:

  get_projects() -> iterator

  Provides an iterator that will return (name, path) tuples, where `name`
  is the name of a registered project and `path` the path to its `egg-info`
  directory.

But for the use case you are mentioning, what about an explicit API:

  get_owners(paths) -> sequence of project names

  Returns a sequence of tuples. For each path in the "paths" list, a
  tuple of project names is returned.
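
To make the two signatures concrete, here is a rough sketch over an in-memory registry. The registry layout, the sample paths, and the `registry` parameter are hypothetical; the real API would read the `egg-info` directories from disk.

```python
# Hypothetical registry: project name -> (egg-info path, installed files).
_REGISTRY = {
    "foo": ("/site-packages/foo.egg-info",
            ["/site-packages/foo.py", "/site-packages/shared.py"]),
    "bar": ("/site-packages/bar.egg-info",
            ["/site-packages/bar.py", "/site-packages/shared.py"]),
}


def get_projects(registry=_REGISTRY):
    """Yield (name, path) tuples for every registered project."""
    for name, (egg_info, _files) in sorted(registry.items()):
        yield name, egg_info


def get_owners(paths, registry=_REGISTRY):
    """Return one tuple of owning project names per entry in `paths`."""
    owners = []
    for path in paths:
        owners.append(tuple(
            name
            for name, (_egg_info, files) in sorted(registry.items())
            if path in files
        ))
    return owners
```

With this shape, an installer could call `get_owners([candidate_path])` before writing a file and treat a non-empty tuple as a conflict.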

> 5. If any cache mechanisms are to be used by the API, the API *must* make it
> possible to bypass or explicitly manage that cache, as otherwise
> installation tools and tools that manipulate sys.path at runtime may end up
> using incorrect data.

work in progress - (I am afraid I have to write an advanced prototype
to be able to know exactly how the cache might work, and so, what API
we should have)

> 6. get_files() doesn't document whether the yielded paths are absolute or
> relative, local or cross-platform, etc.

I am fixing this as well

>> I need to find back your comments for this part, I must have missed
>> them. That's the last part I didn't work out yet on the current PEP
>> revision.
> Well, if you can't find them, the EggFormats doc explains how these file/dir
> structures are currently laid out by setuptools, easy_install, pip, etc.,
> and the PEP should probably reference that.

work in progress

Tarek Ziadé |

From ziade.tarek at  Tue May 19 16:12:05 2009
From: ziade.tarek at (Tarek Ziadé)
Date: Tue, 19 May 2009 16:12:05 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, May 19, 2009 at 4:03 PM, Antoine Pitrou <solipsis at> wrote:
> Ronald Oussoren <ronaldoussoren <at>> writes:
>> Wouldn't it be possible to use a CSV file for this? That way we
>> wouldn't have to invent yet another escaping mechanism and there's
>> already good support for reading and writing CSV files in the
>> standard library.
> +1
> We can even customize the delimiter if you want to make it more readable (or if
> there's a shortage of bikeshed material ;-)).


and the default csv delimiter ","  makes it perfectly readable

From google at  Tue May 19 16:21:25 2009
From: google at (MRAB)
Date: Tue, 19 May 2009 15:21:25 +0100
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Tarek Ziadé wrote:
> On Sat, May 16, 2009 at 6:55 PM, P.J. Eby <pje at> wrote:
>> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons a
>> valid character in filenames?
> I am changing this into a <tab> for now.
> What about Antoine's idea about doing a quote() on the names?
> From my point of view <tabs> seem simpler to deal with, if 3rd-party
> tools want to work on these files without using pkgutil or Python.
>> 4. There should probably be a way to iterate over the projects in a
>> directory, since it's otherwise impossible for an installation tool to find
>> out what project(s) "own" a file that conflicts with something being
>> installed.  Alternatively, reshaping the file API to allow querying by path
>> as well as by project might work.
> I am adding a "get_projects" api:
>   get_projects() -> iterator
>   Provides an iterator that will return (name, path) tuples, where `name`
>   is the name of a registered project and `path` the path to its `egg-info`
>   directory.
> But for the use case you are mentioning, what about an explicit API:
>   get_owners(paths) -> sequence of project names
>   returns a sequence of tuple. For each path in the "paths" list, a
> tuple of project names
>   is returned
>> 5. If any cache mechanisms are to be used by the API, the API *must* make it
>> possible to bypass or explicitly manage that cache, as otherwise
>> installation tools and tools that manipulate sys.path at runtime may end up
>> using incorrect data.
> work in progress - (I am afraid I have to write an advanced prototype
> to be able to know
> exactly how the cache might work, and so, what API we should have)
>> 6. get_files() doesn't document whether the yielded paths are absolute or
>> relative, local or cross-platform, etc.
> I am fixing this as well
>>> I need to find back your comments for this part, I must have missed
>>> them. That's
>>> the last part I didn't work out yet on the current PEP revision.
>> Well, if you can't find them, the EggFormats doc explains how these file/dir
>> structures are currently laid out by setuptools, easy_install, pip, etc.,
>> and the PEP should probably reference that.
> work in progress
Is it Pythonic for the methods to start with "get_", or should they be
projects(), owners(), etc?

From p.f.moore at  Tue May 19 21:33:50 2009
From: p.f.moore at (Paul Moore)
Date: Tue, 19 May 2009 20:33:50 +0100
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/19 Tarek Ziadé <ziade.tarek at>:
> On Sat, May 16, 2009 at 6:55 PM, P.J. Eby <pje at> wrote:
>> 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons a
>> valid character in filenames?
> I am changing this into a <tab>. for now.

I'm not following this thread at all, but can I put a strong vote
*against* tabs in, please. You're just asking for bug reports from
people who edit the file and expand tabs to spaces (either
deliberately, or via an automatic editor setting they forgot about)
and then can't see why a file that looks the same works differently.

OK, so it's not meant to be a human editable file, but that won't stop
some people :-)
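
The failure mode described above can be shown in a few lines of Python.
This is only a sketch: the record line below is a made-up example, not
the format from the PEP.

```python
# A made-up, tab-separated record line: path, checksum, size.
original = "pkg/mod.py\tabc123\t120"

# What an editor that expands tabs to spaces would save instead.
edited = original.expandtabs(8)

# A parser that splits on tabs sees three fields in the original...
original_fields = original.split("\t")

# ...but only one field once the tabs are gone, although both lines
# look alike on screen.
edited_fields = edited.split("\t")

print(len(original_fields), len(edited_fields))
```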


From pje at  Tue May 19 22:36:40 2009
From: pje at (P.J. Eby)
Date: Tue, 19 May 2009 16:36:40 -0400
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

At 04:04 PM 5/19/2009 +0200, Tarek Ziadé wrote:
>On Sat, May 16, 2009 at 6:55 PM, P.J. Eby <pje at> wrote:
> >
> > 1. Why ';' separation, instead of tabs as in PEP 262?  Aren't semicolons a
> > valid character in filenames?
>I am changing this into a <tab>. for now.
>What about Antoine's idea about doing a quote() on the names ?

I like the CSV idea better, since the csv module is available in 2.3 
and up.  We should just pick a dialect with unambiguous quoting rules.
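
A minimal sketch of what "unambiguous quoting rules" could look like,
written against the modern Python 3 csv module rather than the 2.x one.
The column layout (path, checksum, size) and the helper names are
assumptions for illustration only; the PEP had not settled on a format.

```python
import csv
import io

# Hypothetical rows; the second path contains ';', ',' and a tab on purpose.
rows = [
    ("pkg/__init__.py", "abc123", "120"),
    ("pkg/odd;name,with\ttab.py", "def456", "42"),
]


def dump_records(rows):
    """Serialize rows as CSV, quoting every field so separators in names are safe."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def load_records(text):
    """Parse the CSV text back into a list of tuples."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    return [tuple(row) for row in reader]


text = dump_records(rows)
round_tripped = load_records(text)
```

Quoting every field keeps the reader and writer symmetric, at the cost
of a slightly noisier file.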

> From my point of view <tabs> seems more simple to deal with, if 3rd-party
>tools want to work on these files without using pkgutil or Python.

True, but then CSV files are still pretty common.

One other possibility that might work is using a vertical bar as a separator.

My preference rank at the moment is probably tabs, CSV, or vertical 
bar.  But I don't really care all that much, so let the people who care decide.

Personally, though, I don't see much point to cross-language 
manipulation of the file.  System packaging tools have their own way 
of keeping track of this stuff.  So unless somebody's using it to 
*build* system packages (e.g. making an RPM builder), they don't need this.

Now, about the APIs...

> > 4. There should probably be a way to iterate over the projects in a
> > directory, since it's otherwise impossible for an installation tool to find
> > out what project(s) "own" a file that conflicts with something being
> > installed.  Alternatively, reshaping the file API to allow querying by path
> > as well as by project might work.
>I am adding a "get_projects" api:
>   get_projects() -> iterator
>   Provides an iterator that will return (name, path) tuples, where `name`
>   is the name of a registered project and `path` the path to its `egg-info`
>   directory.
>But for the use case you are mentioning, what about an explicit API:
>   get_owners(paths) -> sequence of project names
>   returns a sequence of tuple. For each path in the "paths" list, a
>tuple of project names
>   is returned
> >
> > 5. If any cache mechanisms are to be used by the API, the API 
> *must* make it
> > possible to bypass or explicitly manage that cache, as otherwise
> > installation tools and tools that manipulate sys.path at runtime may end up
> > using incorrect data.
>work in progress - (I am afraid I have to write an advanced prototype
>to be able to know
>exactly how the cache might work, and so, what API we should have)

I think it would be simpler to have explicit object types 
representing things like a directory, a collection of directories, 
and individual projects, and these object types should be part of the API.

Any function-oriented API should just be exposed as the methods of a 
default singleton.  Other Python modules follow this pattern -- and 
it's what I copied for the pkg_resources design.  It gives a nice 
tradeoff between keeping the simple things simple, and complex things 
possible, as well as keeping mechanism and policy separate.

Right now, the API design you're trying to do is being burdened by 
using strings and tuples to represent things that could just as 
easily be objects with their own methods, instead of things you have 
to pass back into other APIs.  This also makes caching more complex, 
because you can't just have one main object with stuff hanging off; 
you've got to have a bunch of dictionaries, tuples, lists, sets, etc.
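
A rough sketch of the shape described above, in Python. The class and
function names here (Project, Directory, add_project) are hypothetical
and not taken from the PEP or from pkg_resources; the stdlib random
module is one example of functions being bound methods of a hidden
default instance.

```python
class Project:
    """One installed project and the files it owns."""

    def __init__(self, name, files):
        self.name = name
        self.files = frozenset(files)


class Directory:
    """A directory holding installed projects; a natural home for any cache."""

    def __init__(self, path):
        self.path = path
        self._projects = {}

    def add(self, project):
        self._projects[project.name] = project

    def projects(self):
        """Iterate over the Project objects in this directory."""
        return iter(self._projects.values())

    def owners(self, filename):
        """Return the names of all projects that own `filename`."""
        return sorted(p.name for p in self.projects() if filename in p.files)


# Default singleton: the function-style API is just its bound methods.
_default = Directory("site-packages")
add_project = _default.add
get_projects = _default.projects
get_owners = _default.owners

add_project(Project("demo", ["demo/__init__.py", "shared.txt"]))
add_project(Project("other", ["other.py", "shared.txt"]))
```

Tools that need a different policy can create their own Directory
instead of going through the module-level functions.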

From fuzzyman at  Wed May 20 00:48:42 2009
From: fuzzyman at (Michael Foord)
Date: Tue, 19 May 2009 23:48:42 +0100
Subject: [Python-Dev] IronPython specific code in inspect module
Message-ID: <>

Hello all,

The inspect module functions (inspect.getargspec etc.) work fine for Python 
functions and classes in IronPython, but they don't work on .NET types, 
which don't have the Python function attributes like im_func etc.

I have IronPython-specific versions of several of these functions, which 
use .NET reflection and which inspect could fall back to if sys.platform == 
'cli'. Would it be ok for me to add these to the inspect module? 
Obviously the tests would only run on IronPython... The behaviour for 
CPython would be unaffected.

All the best,

Michael Foord


From benjamin at  Wed May 20 03:26:47 2009
From: benjamin at (Benjamin Peterson)
Date: Tue, 19 May 2009 20:26:47 -0500
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/19 Michael Foord <fuzzyman at>:
> I have IronPython specific versions of several of these functions which use
> .NET reflection and inspect could fallback to if sys.platform == 'cli'.
> Would it be ok for me to add these to the inspect module? Obviously the
> tests would only run on IronPython... The behaviour for CPython would be
> unaffected.

I wish we had more of a policy about this. There seems to be a long
tradition of special casing other implementations in the stdlib. For
example, see and tests/ for remnants of Jython
compatibility. However, I suspect this code has languished without
core-developers using the trunk stdlib with Jython. I suppose this is
a good reason why we are going to split the stdlib out of the main
repo.

However that still leaves the question of how to handle putting code
like this in. Should we ask that all code be
implementation-independent as much as possible from the original
authors? Do all changes against the stdlib have to be run against
several implementations? Should we sprinkle if switches all over the
codebase for different implementations, or should new support files be
added?


From fijall at  Wed May 20 04:09:03 2009
From: fijall at (Maciej Fijalkowski)
Date: Tue, 19 May 2009 20:09:03 -0600
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 19, 2009 at 7:26 PM, Benjamin Peterson <benjamin at> wrote:
> 2009/5/19 Michael Foord <fuzzyman at>:
>> I have IronPython specific versions of several of these functions which use
>> .NET reflection and inspect could fallback to if sys.platform == 'cli'.
>> Would it be ok for me to add these to the inspect module? Obviously the
>> tests would only run on IronPython... The behaviour for CPython would be
>> unaffected.
> I wish we had more of a policy about this. There seems to be a long
> tradition of special casing other implementations in the stdlib. For
> example, see and tests/ for remnants of Jython
> compatibility. However, I suspect this code has languished without
> core-developers using the trunk stdlib with Jython. I suppose this is
> a good reason why we are going to split the stdlib out of the main
> repo.
> However that still leaves the question of how to handle putting code
> like this in. Should we ask that all code be
> implementation-independent as much as possible from the original
> authors? Do all changes against the stdlib have to be run against
> several implementations? Should we sprinkle if switches all over the
> codebase for different implementations, or should new support files be
> added?

From my observation (mostly of Jython), such changes easily get out of
sync. The net result is that you have one outdated version in the stdlib,
and the other implementation, like IronPython, is maintaining its own
anyway. IMO it's easy enough to maintain clearly implementation-specific
parts outside of CPython's stdlib.

What I would rather like to see is that the stdlib does not contain
implementation-specific parts, even for CPython, and that CPython maintains
its own things outside of the stdlib. This would be in line with what we
discussed at PyCon, I think; please correct me if I'm wrong.


From dstanek at  Wed May 20 04:21:27 2009
From: dstanek at (David Stanek)
Date: Tue, 19 May 2009 22:21:27 -0400
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 19, 2009 at 9:26 PM, Benjamin Peterson <benjamin at> wrote:
> 2009/5/19 Michael Foord <fuzzyman at>:
>> I have IronPython specific versions of several of these functions which use
>> .NET reflection and inspect could fallback to if sys.platform == 'cli'.
>> Would it be ok for me to add these to the inspect module? Obviously the
>> tests would only run on IronPython... The behaviour for CPython would be
>> unaffected.
> I wish we had more of a policy about this. There seems to be a long
> tradition of special casing other implementations in the stdlib. For
> example, see and tests/ for remnants of Jython
> compatibility. However, I suspect this code has languished without
> core-developers using the trunk stdlib with Jython. I suppose this is
> a good reason why we are going to split the stdlib out of the main
> repo.
> However that still leaves the question of how to handle putting code
> like this in. Should we ask that all code be
> implementation-independent as much as possible from the original
> authors? Do all changes against the stdlib have to be run against
> several implementations? Should we sprinkle if switches all over the
> codebase for different implementations, or should new support files be
> added?

It seems that using a technique similar to dependency injection could
provide some value. DI allows implementations conforming to some
interface to be injected into a running application without the messy
construction logic. The simple construction-by-hand pattern is to
create the dependencies and pass them into the dependent objects.
Frameworks build on top of this to allow the dependencies to be wired
together without having any construction logic in code, like switch
statements, to do the wiring.

I think a similar pattern could be used in the standard library. When
the interpreter goes through its normal bootstrapping process it can
just execute a module provided by the vendor that specifies the
platform specific implementations. Some defaults can be provided since
Python already has a bunch of platform specific implementations.

An oversimplified design to make this happen may look like:
 1. Create a simple configuration that allows a mapping of interfaces
to implementations. This is where the vendor would say when using
inspect you really should be using cli.inspect.
 2. Add executing this new configuration to the bootstrapping process.
 3. Add generic hooks into the library where needed to load the
dependency instead of platform specific if statements.
 4. Rip out the platform specific code that is hidden in the if
statements and use that as the basis for the sane injected defaults.
 5. Document the interfaces for each component that can be changed by
the vendor.
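
The steps above could be sketched roughly as follows in Python. This is
an illustration under assumptions, not a proposed API: register() and
provider() are hypothetical names, and posixpath/ntpath stand in for a
default and a vendor implementation because both ship with every
CPython install.

```python
import importlib

# Step 1: a mapping of interface names to implementing module names.
_implementations = {}


def register(interface, module_name):
    """Called from the vendor's configuration module during startup (step 2)."""
    _implementations[interface] = module_name


def provider(interface, default):
    """Step 3: a generic hook that replaces a platform-specific `if` statement."""
    return importlib.import_module(_implementations.get(interface, default))


# With nothing registered, the injected default is used (step 4).
default_sep = provider("path", default="posixpath").sep

# A vendor configuration swaps the implementation without touching callers.
register("path", "ntpath")
vendor_sep = provider("path", default="posixpath").sep
```

In the IronPython case from this thread, the vendor module would
presumably register something like cli.inspect for the inspect
interface.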


From benjamin at  Wed May 20 04:26:55 2009
From: benjamin at (Benjamin Peterson)
Date: Tue, 19 May 2009 21:26:55 -0500
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/19 Maciej Fijalkowski <fijall at>:
> From my observation (mostly according to jython), such changes easily get out of
> sync. The net result is that you have one, outdated, version in stdlib
> and the other implementation, like IronPython, is maintaining its own
> anyway. IMO it's easy enough
> to maintain clearly implementation-specific parts out of cpython's stdlib.

Hopefully, it will be easier to visualize how this might work once the
plan for hg migration is finalized.
> What I would rather like to see is that stdlib does not contain impl
> specific parts,
> even for cpython and cpython maintains its own things outside of stdlib. This
> would be in line with what we discussed at pycon I think, please correct me if
> I'm wrong.

I was not present, but that's my impression, too.


From dinov at  Wed May 20 04:36:13 2009
From: dinov at (Dino Viehland)
Date: Wed, 20 May 2009 02:36:13 +0000
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Foord wrote:
> I have IronPython specific versions of several of these functions which
> use .NET reflection and inspect could fallback to if sys.platform ==
> 'cli'. Would it be ok for me to add these to the inspect module?
> Obviously the tests would only run on IronPython... The behaviour for
> CPython would be unaffected.

What about instead defining __argspec__ for built-in functions/method
objects and allowing all the implementations to implement it?  We could
all agree to return:

[
        (return_type, (arg_types,...)),
        (return_type, (arg_types,...)),
]

Then inspect can check for that attribute and support introspection on
built-ins.  This would be an easy feature for us to implement and it
may also be for Jython as well given that we both get the power of our
platforms' reflection capabilities.  Any platform that implements it
lights up w/o new platform specific code. And maybe this needs to go
to python-ideas now :)
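
A small Python sketch of how a consumer might read such an attribute.
Note that __argspec__ is only the idea floated in this message, not an
existing attribute, and describe_signature and FakeBuiltin are
hypothetical names used for illustration.

```python
def describe_signature(obj):
    """Return the overload list from a hypothetical __argspec__, or None."""
    spec = getattr(obj, "__argspec__", None)
    if spec is None:
        return None
    # Normalize to a list of (return_type, (arg_types, ...)) pairs.
    return [(ret, tuple(args)) for ret, args in spec]


class FakeBuiltin:
    """Stand-in for a built-in with two overloads."""

    __argspec__ = [
        (int, (str,)),
        (int, (str, int)),
    ]

    def __call__(self, *args):
        return 0


overloads = describe_signature(FakeBuiltin())
missing = describe_signature(len)  # real built-ins define no such attribute
```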

From ajaksu at  Wed May 20 05:18:34 2009
From: ajaksu at (Daniel Diniz)
Date: Wed, 20 May 2009 00:18:34 -0300
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

Dino Viehland wrote:
> What about instead defining __argspec__ for built-in functions/method
> objects and allowing all the implementations to implement it?  We could
> all agree to return:
> [
>        (return_type, (arg_types,...)),
>        (return_type, (arg_types,...)),
> ]
> Then inspect can check for that attribute and support introspection on
> built-ins.  This would be an easy feature for us to implement and it
> may also be for Jython as well given that we both get the power of our
> platforms reflection capabilities.  Any platform that implements it
> lights up w/o new platform specific code. And maybe this needs to go
> to python-ideas now :)

Curiously, inspect limitations on CPython (can't inspect
functools.partial, has issues with some descriptors and decorators)
got us chatting about PEP 362: Function Signature Object[0] on
#python-dev today.

PEP 362 was also brought up in a recent thread where the executive
summary was 'it just needs someone to guide it through the last
steps'[1], and it would make this kind of introspection nice and

It makes even more sense now we have PEP 3107: Function Annotations[3] in place.



From chrispl78 at  Wed May 20 09:31:00 2009
From: chrispl78 at (Chris Plasun)
Date: Wed, 20 May 2009 00:31:00 -0700
Subject: [Python-Dev] Python on PowerPC?
Message-ID: <>


I'm to develop console apps on a Linux embedded PowerPC board (Freescale
MPC8313).

Is there a Python release for the PowerPC platform?

Chris Plasun

From ziade.tarek at  Wed May 20 11:48:59 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 20 May 2009 11:48:59 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 19, 2009 at 10:36 PM, P.J. Eby <pje at> wrote:
> Now, about the APIs...
> I think it would be simpler to have explicit object types representing
> things like a directory, a collection of directories, and individual
> projects, and these object types should be part of the API.
> Any function-oriented API should just be exposed as the methods of a default
> singleton.  Other Python modules follow this pattern -- and it's what I
> copied for the pkg_resources design.  It gives a nice tradeoff between
> keeping the simple things simple, and complex things possible, as well as
> keeping mechanism and policy separate.
> Right now, the API design you're trying to do is being burdened by using
> strings and tuples to represent things that could just as easily be objects
> with their own methods, instead of things you have to pass back into other
> APIs.  This also makes caching more complex, because you can't just have one
> main object with stuff hanging off; you've got to have a bunch of
> dictionaries, tuples, lists, sets, etc.

I don't know how other people work on building APIs in PEPs, but at
this stage I am unable to work them out on paper without having a
prototype to try things out.

So I guess I'll start this prototype on bitbucket and come back with
it for feedback in Distutils-SIG, for a new PEP 376 round.


Tarek Ziadé

From doug.hellmann at  Wed May 20 13:13:44 2009
From: doug.hellmann at (Doug Hellmann)
Date: Wed, 20 May 2009 07:13:44 -0400
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

On May 19, 2009, at 10:21 PM, David Stanek wrote:

> On Tue, May 19, 2009 at 9:26 PM, Benjamin Peterson <benjamin at> wrote:
>> 2009/5/19 Michael Foord <fuzzyman at>:
>>> I have IronPython specific versions of several of these functions which use
>>> .NET reflection and inspect could fallback to if sys.platform == 'cli'.
>>> Would it be ok for me to add these to the inspect module? Obviously the
>>> tests would only run on IronPython... The behaviour for CPython would be
>>> unaffected.


>> However that still leaves the question of how to handle putting code
>> like this in. Should we ask that all code be
>> implementation-independent as much as possible from the original
>> authors? Do all changes against the stdlib have to be run against
>> several implementations? Should we sprinkle if switches all over the
>> codebase for different implementations, or should new support files be
>> added?
> It seems that using a technique similar to dependency injection could
> provide some value. DI allows implementations conforming to some
> interface to be injected into a running application without the messy
> construction logic. The simple construction-by-hand pattern is to
> create the dependencies and pass them into the dependent objects.
> Frameworks build on top of this to allow the dependencies to be wired
> together without having any construction logic in code, like switch
> statements, to do the wiring.
> I think a similar pattern could be used in the standard library. When
> the interpreter goes through its normal bootstrapping process it can
> just execute a module provided by the vendor that specifies the
> platform specific implementations. Some defaults can be provided since
> Python already has a bunch of platform specific implementations.
> An oversimplified design to make this happen may look like:
> 1. Create a simple configuration that allows a mapping of interfaces
> to implementations. This is where the vendor would say when using
> inspect you really should be using cli.inspect.

That sounds like a plugin and the "strategy" pattern.  Tarek is doing  
some work on providing a standard plugin mechanism as part of the work  
he's doing on distutils, isn't he?

> 2. Add executing this new configuration to the bootstrapping process.

Maybe I misunderstand, but wouldn't it make more sense to initialize  
the platform-specific parts of a module when it is imported rather  
than bring in everything at startup?

Are we only worried about interpreter-implementation-level  
dependencies, or should there be a way for all platform-specific  
features to be treated in the same way?   There are quite a few checks  
for Windows that could be moved into the platform-specific modules if  
there was an easy/standard way to do it.


> 3. Add generic hooks into the library where needed to load the
> dependency instead of platform specific if statements.
> 4. Rip out the platform specific code that is hidden in the if
> statements and use that as the basis for the sane injected defaults.
> 5. Document the interfaces for each component that can be changed by
> the vendor.

From doug.hellmann at  Wed May 20 13:14:53 2009
From: doug.hellmann at (Doug Hellmann)
Date: Wed, 20 May 2009 07:14:53 -0400
Subject: [Python-Dev] Python on PowerPC?
In-Reply-To: <>
References: <>
Message-ID: <>

On May 20, 2009, at 3:31 AM, Chris Plasun wrote:

> Hi,
> I'm to develop console apps on a Linux embedded PowerPC board  
> (Freescale MPC8313).
> Is there a Python release for the PowerPC platform?

We used to run a version of the interpreter on PPC for a  
microcontroller board we had, but we built it ourselves.  Have you  
tried building from source?  It might Just Work.


From eckhardt at  Wed May 20 13:17:10 2009
From: eckhardt at (Ulrich Eckhardt)
Date: Wed, 20 May 2009 13:17:10 +0200
Subject: [Python-Dev] Python on PowerPC?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wednesday 20 May 2009, Chris Plasun wrote:
> I'm to develop console apps on a Linux embedded PowerPC board (Freescale
> MPC8313).
> Is there a Python release for the PowerPC platform?

This has pretty little to do with the development of the Python language 
itself, so it is rather off topic here.

That said, Linux systems are barely thinkable without Python, even when 
running on PPC, so yes, Python runs on PPC, too, and is included in probably 
every Linux distro, e.g. Debian.



From dstanek at  Wed May 20 13:54:56 2009
From: dstanek at (David Stanek)
Date: Wed, 20 May 2009 07:54:56 -0400
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 20, 2009 at 7:13 AM, Doug Hellmann <doug.hellmann at> wrote:
> On May 19, 2009, at 10:21 PM, David Stanek wrote:
>> It seems that using a technique similar to dependency injection could
>> provide some value. DI allows implementations conforming to some
>> interface to be injected into a running application without the messy
>> construction logic. The simple construction-by-hand pattern is to
>> create the dependencies and pass them into the dependent objects.
>> Frameworks build on top of this to allow the dependencies to be wired
>> together without having any construction logic in code, like switch
>> statements, to do the wiring.
>> I think a similar pattern could be used in the standard library. When
>> the interpreter goes through its normal bootstrapping process it can
>> just execute a module provided by the vendor that specifies the
>> platform specific implementations. Some defaults can be provided since
>> Python already has a bunch of platform specific implementations.
>> An oversimplified design to make this happen may look like:
>> 1. Create a simple configuration that allows a mapping of interfaces
>> to implementations. This is where the vendor would say when using
>> inspect you really should be using cli.inspect.
> That sounds like a plugin and the "strategy" pattern.  Tarek is doing some
> work on providing a standard plugin mechanism as part of the work he's doing
> on distutils, isn't he?

Basically yes. What I proposed is more like a service locator with a
pinch of DI. Where can I learn more about what Tarek is working on? Is
there a branch somewhere?

>> 2. Add executing this new configuration to the bootstrapping process.
> Maybe I misunderstand, but wouldn't it make more sense to initialize the
> platform-specific parts of a module when it is imported rather than bring in
> everything at startup?

By executing I mean figure out the mappings and necessarily create
them. This enables errors to happen early if the dependencies are not
met. This is really useful if the technique is used for more than just
the platform specific code.

> Are we only worried about interpreter-implementation-level dependencies, or
> should there be a way for all platform-specific features to be treated in
> the same way?  There are quite a few checks for Windows that could be moved
> into the platform-specific modules if there was an easy/standard way to do
> it.


From sven.schrader at  Wed May 20 14:31:18 2009
From: sven.schrader at (Sven Schrader)
Date: Wed, 20 May 2009 14:31:18 +0200
Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2
Message-ID: <>


since our python installation is located on a symlink'ed directory,
our variables "sys.exec_prefix" and "sys.executable" can have different
paths. Therefore, the respective test in fails (line 202) and a wrong
library directory is obtained.

To fix this issue, I have attached a patch that uses "os.path.samefile"
to check whether two paths refer to the same file, however they are spelled.
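
The difference between comparing path strings and comparing files can
be sketched as below. This is not the attached patch; same_location and
demo are hypothetical names, the example is written for current Python
3, and it assumes a platform and user that are allowed to create
symlinks.

```python
import os
import tempfile


def same_location(a, b):
    """True if both paths name the same file or directory on disk."""
    return os.path.samefile(a, b)


def demo():
    """Compare a real directory with a symlink that points to it."""
    base = tempfile.mkdtemp()
    real = os.path.join(base, "real")
    link = os.path.join(base, "link")
    os.mkdir(real)
    os.symlink(real, link)  # may fail where symlinks are not permitted
    try:
        # String comparison sees two different paths; samefile does not.
        return real == link, same_location(real, link)
    finally:
        os.remove(link)
        os.rmdir(real)
        os.rmdir(base)


string_equal, file_equal = demo()
```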


Sven Schrader

ps: please CC answers to me, I'm not on the list :-)
pps: I hope the attachment isn't inline...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: python-2.5.2-build_ext-pathcompare.patch
Type: text/x-patch
Size: 1570 bytes
Desc: not available
URL: <>

From seb.binet at  Wed May 20 15:33:43 2009
From: seb.binet at (Sebastien Binet)
Date: Wed, 20 May 2009 15:33:43 +0200
Subject: [Python-Dev] IronPython specific code in inspect module
In-Reply-To: <>
References: <>
Message-ID: <>

On Wednesday 20 May 2009 13:54:56 David Stanek wrote:
> On Wed, May 20, 2009 at 7:13 AM, Doug Hellmann <doug.hellmann at> wrote:
> > On May 19, 2009, at 10:21 PM, David Stanek wrote:
> >> It seems that using a technique similar to dependency injection could
> >> provide some value. DI allows implementations conforming to some
> >> interface to be injected into a running application without the messy
> >> construction logic. The simple construction-by-hand pattern is to
> >> create the dependencies and pass them into the dependent objects.
> >> Frameworks build on top of this to allow the dependencies to be wired
> >> together without having any construction logic in code, like switch
> >> statements, to do the wiring.
> >>
> >> I think a similar pattern could be used in the standard library. When
> >> the interpreter goes through its normal bootstrapping process it can
> >> just execute a module provided by the vendor that specifies the
> >> platform specific implementations. Some defaults can be provided since
> >> Python already has a bunch of platform specific implementations.
> >>
> >> An oversimplified design to make this happen may look like:
> >> 1. Create a simple configuration that allows a mapping of interfaces
> >> to implementations. This is where the vendor would say when using
> >> inspect you really should be using cli.inspect.
> >
> > That sounds like a plugin and the "strategy" pattern.  Tarek is doing
> > some work on providing a standard plugin mechanism as part of the work
> > he's doing on distutils, isn't he?
> Basically yes. What I proposed is more like a service locator with a
> pinch of DI. Where can I learn more about what Tarek is working on? Is
> there a branch somewhere?

it is here:
and there:

# Dr. Sebastien Binet
# Laboratoire de l'Accelerateur Lineaire
# Universite Paris-Sud XI
# Batiment 200
# 91898 Orsay

From jyasskin at  Wed May 20 17:33:26 2009
From: jyasskin at (Jeffrey Yasskin)
Date: Wed, 20 May 2009 08:33:26 -0700
Subject: [Python-Dev] Documenting lnotab
Message-ID: <>

Hi all.

I've got a patch to add some documentation for lnotab and its use in
tracing at I think it's correct, but
it's complicated so I'm looking for someone who was around when it was
designed to check. I'm also proposing a change to the semantics of
PyCode_CheckLineNumber and want to know whether I should consider it

Thanks to anyone who takes a look!

From chrispl78 at  Wed May 20 17:47:50 2009
From: chrispl78 at (Chris Plasun)
Date: Wed, 20 May 2009 08:47:50 -0700
Subject: [Python-Dev] Python on PowerPC?
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks for your reply.

Ulrich Eckhardt wrote:
> On Wednesday 20 May 2009, Chris Plasun wrote:
>> I'm to develop console apps on a Linux embedded PowerPC board (Freescale
>> MPC8313).
>> Is there a Python release for the PowerPC platform?
> This has pretty little to do with the development of the Python language 
> itself, so it is rather off topic here.

This group appeared to be relevant.

> That said, Linux systems are barely thinkable without Python, even when 
> running on PPC, so yes, Python runs on PPC, too, and is included in probably 
> every Linux distro, e.g. Debian.

hmmm, hopefully I can find something to run in an embedded box.


From jyasskin at  Wed May 20 18:40:42 2009
From: jyasskin at (Jeffrey Yasskin)
Date: Wed, 20 May 2009 09:40:42 -0700
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

A couple thoughts:

I'm with the people who think the refcount should be accessed through
functions by apps that want ABI compatibility. In particular,
GIL-removal efforts are guaranteed to change how the refcount is
modified, but there's a good chance they wouldn't have to change the
API. (We have some ideas for how to maintain source compatibility in
the absence of a GIL:
Over an 8-year lifetime for Python 3, Moore's law predicts that
desktop systems will have up to 64 cores, at which point even the
simplest GIL-removal strategy of making refcounts atomic will be a
win, despite the 2x performance loss for a single thread. I wouldn't
want an ABI to rule that out.

I do think the refcounting macros should remain present in the API
(not ABI) for apps that only need source compatibility and want the
extra speed.

I wonder if it makes sense to specify an API compatibility mode in
this PEP too.

"Py_LIMITED_API" may not be the right macro name--it didn't imply
anything about an ABI when I first saw it. Might it make sense to use
Py_ABI_COMPATIBILITY=### instead? (Where ### could be an ISO date like
20090520.) That would put "ABI" in the macro name and make it easier
to define new versions later if necessary. (New versions would help
people compile against a new version of Python and be confident they
had something that would run against old versions.) If we never define
a new version, defining it to a number instead of just anything
doesn't really hurt.

It's probably worth pointing out in the PEP that the fact that
PyVarObject.ob_size is part of the ABI means that PyObject cannot
change size, even by adding fields at the end.

Right now, the globals representing types are defined like
"PyAPI_DATA(PyTypeObject) PyList_Type;". To allow the core to use the
new type creation functions, it might be useful to make the ABI type
objects PyTypeObject* constants instead.

In general, this looks really good. Thanks!


On Sun, May 17, 2009 at 1:54 PM, "Martin v. Löwis" <martin at> wrote:
> Thomas Wouters reminded me of a long-standing idea; I finally
> found the time to write it down.
> Please comment!
> Regards,
> Martin
> PEP: 384
> Title: Defining a Stable ABI
> Version: $Revision: 72754 $
> Last-Modified: $Date: 2009-05-17 21:14:52 +0200 (So, 17. Mai 2009) $
> Author: Martin v. Löwis <martin at>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 17-May-2009
> Python-Version: 3.2
> Post-History:
> Abstract
> ========
> Currently, each feature release introduces a new name for the
> Python DLL on Windows, and may cause incompatibilities for extension
> modules on Unix. This PEP proposes to define a stable set of API
> functions which are guaranteed to be available for the lifetime
> of Python 3, and which will also remain binary-compatible across
> versions. Extension modules and applications embedding Python
> can work with different feature releases as long as they restrict
> themselves to this stable ABI.
> Rationale
> =========
> The primary source of ABI incompatibility are changes to the lay-out
> of in-memory structures. For example, the way in which string interning
> works, or the data type used to represent the size of an object, have
> changed during the life of Python 2.x. As a consequence, extension
> modules making direct access to fields of strings, lists, or tuples,
> would break if their code is loaded into a newer version of the
> interpreter without recompilation: offsets of other fields may have
> changed, making the extension modules access the wrong data.
> In some cases, the incompatibilities only affect internal objects of
> the interpreter, such as frame or code objects. For example, the way
> line numbers are represented has changed in the 2.x lifetime, as has
> the way in which local variables are stored (due to the introduction
> of closures). Even though most applications probably never used these
> objects, changing them has required changing PYTHON_API_VERSION.
> On Linux, changes to the ABI are often not much of a problem: the
> system will provide a default Python installation, and many extension
> modules are already provided pre-compiled for that version. If additional
> modules are needed, or additional Python versions, users can typically
> compile them themselves on the system, resulting in modules that use
> the right ABI.
> On Windows, multiple simultaneous installations of different Python
> versions are common, and extension modules are compiled by their
> authors, not by end users. To reduce the risk of ABI incompatibilities,
> Python currently introduces a new DLL name pythonXY.dll for each
> feature release, whether or not ABI incompatibilities actually exist.
> With this PEP, it will be possible to reduce the dependency of binary
> extension modules on a specific Python feature release, and applications
> embedding Python can be made to work with different releases.
> Specification
> =============
> The ABI specification falls into two parts: an API specification,
> specifying what function (groups) are available for use with the
> ABI, and a linkage specification specifying what libraries to link
> with. The actual ABI (layout of structures in memory, function
> calling conventions) is not specified, but implied by the
> compiler. As a recommendation, a specific ABI is recommended for
> selected platforms.
> During evolution of Python, new ABI functions will be added.
> Applications using them will then have a requirement on a minimum
> version of Python; this PEP provides no mechanism for such
> applications to fall back when the Python library is too old.
> Terminology
> -----------
> Applications and extension modules that want to use this ABI
> are collectively referred to as "applications" from here on.
> Header Files and Preprocessor Definitions
> -----------------------------------------
> Applications shall only include the header file Python.h (before
> including any system headers), or, optionally, include pyconfig.h, and
> then Python.h.
> During the compilation of applications, the preprocessor macro
> Py_LIMITED_API must be defined. Doing so will hide all definitions
> that are not part of the ABI.
> Structures
> ----------
> Only the following structures and structure fields are accessible to
> applications:
> - PyObject (ob_refcnt, ob_type)
> - PyVarObject (ob_base, ob_size)
> - Py_buffer (buf, obj, len, itemsize, readonly, ndim, shape,
>   strides, suboffsets, smalltable, internal)
> - PyMethodDef (ml_name, ml_meth, ml_flags, ml_doc)
> - PyMemberDef (name, type, offset, flags, doc)
> - PyGetSetDef (name, get, set, doc, closure)
> The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE)
> are also available to applications.
> The following types are available, but opaque (i.e. incomplete):
> - PyThreadState
> - PyInterpreterState
> Type Objects
> ------------
> The structure of type objects is not available to applications;
> declaration of "static" type objects is not possible anymore
> (for applications using this ABI).
> Instead, type objects get created dynamically. To allow an
> easy creation of types (in particular, to be able to fill out
> function pointers easily), the following structures and functions
> are available::
>  typedef struct{
>    int slot;    /* slot id, see below */
>    void *pfunc; /* function pointer */
>  } PyType_Slot;
>  typedef struct{
>    const char* name;
>    const char* doc;
>    int basicsize;
>    int itemsize;
>    int flags;
>    PyType_Slot *slots; /* terminated by slot==0. */
>  } PyType_Spec;
>  PyObject* PyType_FromSpec(PyType_Spec*);
> To specify a slot, a unique slot id must be provided. New Python
> versions may introduce new slot ids, but slot ids will never be
> recycled. Slots may get deprecated, but continue to be supported
> throughout Python 3.x.
> The slot ids are named like the field names of the structures that
> hold the pointers in Python 3.1, with an added ``Py_`` prefix (i.e.
> Py_tp_dealloc instead of just tp_dealloc):
> - tp_dealloc, tp_print, tp_getattr, tp_setattr, tp_repr,
>  tp_hash, tp_call, tp_str, tp_getattro, tp_setattro,
>  tp_doc, tp_traverse, tp_clear, tp_richcompare, tp_iter,
>  tp_iternext, tp_methods, tp_base, tp_descr_get, tp_descr_set,
>  tp_init, tp_alloc, tp_new, tp_is_gc, tp_bases, tp_del
> - nb_add nb_subtract nb_multiply nb_remainder nb_divmod nb_power
>  nb_negative nb_positive nb_absolute nb_bool nb_invert nb_lshift
>  nb_rshift nb_and nb_xor nb_or nb_int nb_float nb_inplace_add
>  nb_inplace_subtract nb_inplace_multiply nb_inplace_remainder
>  nb_inplace_power nb_inplace_lshift nb_inplace_rshift nb_inplace_and
>  nb_inplace_xor nb_inplace_or nb_floor_divide nb_true_divide
>  nb_inplace_floor_divide nb_inplace_true_divide nb_index
> - sq_length sq_concat sq_repeat sq_item sq_ass_item was_sq_ass_slice
>  sq_contains sq_inplace_concat sq_inplace_repeat
> - mp_length mp_subscript mp_ass_subscript
> - bf_getbuffer bf_releasebuffer
> XXX Not supported yet: tp_weaklistoffset, tp_dictoffset
> The following fields cannot be set during type definition:
> - tp_dict tp_mro tp_cache tp_subclasses tp_weaklist
> Functions and function-like Macros
> ----------------------------------
> All functions starting with _Py are not available to applications.
> Also, all functions that expect parameter types that are unavailable
> to applications are excluded from the ABI, such as PyAST_FromNode
> (which expects a ``node*``).
> All other functions are available, unless excluded below.
> Function-like macros (in particular, field access macros) remain
> available to applications, but get replaced by function calls
> (unless their definition only refers to features of the ABI, such
> as the various _Check macros)
> ABI function declarations will not change their parameters or return
> types. If a change to the signature becomes necessary, a new function
> will be introduced. If the new function is source-compatible (e.g. if
> just the return type changes), an alias macro may get added to
> redirect calls to the new function when the application is
> recompiled.
> If continued provision of the old function is not possible, it may get
> deprecated, then removed, in accordance with PEP 7, causing
> applications that use that function to break.
> Excluded Functions
> ------------------
> Functions declared in the following header files are not part
> of the ABI:
> - cellobject.h
> - classobject.h
> - code.h
> - frameobject.h
> - funcobject.h
> - genobject.h
> - pyarena.h
> - pydebug.h
> - symtable.h
> - token.h
> - traceback.h
> Global Variables
> ----------------
> Global variables representing types and exceptions are available
> to applications.
> XXX provide a complete list.
> XXX should restrict list of globals to truly "builtin" stuff,
> excluding everything that can also be looked up through imports.
> XXX may specify access to predefined types and exceptions through
> the interpreter state, with appropriate Get macros.
> Other Macros
> ------------
> All macros defining symbolic constants are available to applications;
> the numeric values will not change.
> In addition, the following macros are available:
> Linkage
> -------
> On Windows, applications shall link with python3.dll; an import
> library python3.lib will be available. This DLL will redirect all of
> its API functions through /export linker options to the full
> interpreter DLL, i.e. python3y.dll.
> XXX is it possible to redirect global variables in the same way?
> If not, python3.dll would have to copy them, and we should verify
> that all available global variables are read-only.
> On Unix systems, the ABI is typically provided by the python
> executable itself. PyModule_Create is changed to pass ``3`` as the API
> version if the extension module was compiled with Py_LIMITED_API; the
> version check for the API version will accept either 3 or the current
> PYTHON_API_VERSION as conforming. If Python is compiled as a shared
> library, it is installed as both, and;
> applications conforming to this PEP should then link to the former.
> XXX is it possible to make the soname, and still
> have some applications link to
> Implementation Strategy
> =======================
> This PEP will be implemented in a branch, allowing users to check
> whether their modules conform to the ABI. To simplify this testing, an
> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
> type object layout, to let users postpone rewriting all types. When
> this branch is merged into the 3.2 code base, this macro will
> be removed.
> Copyright
> =========
> This document has been placed in the public domain.

From jyasskin at  Wed May 20 18:49:34 2009
From: jyasskin at (Jeffrey Yasskin)
Date: Wed, 20 May 2009 09:49:34 -0700
Subject: [Python-Dev] [Fwd: Re: PEP 384: Defining a Stable ABI]
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 19, 2009 at 2:09 AM, William Reade
<william at> wrote:
> (for example, all(?) PyCxx modules use PyCode_New and
> PyFrame_New to get nicer tracebacks)

Specifically for this, I think it'd be nice to expose a function to do
this directly. I recently added PyCode_NewEmpty
( to go part of the
way here. I didn't go farther because I didn't have a big enough
picture. If most uses of PyFrame_New are really just to call into
Python with a nice traceback, I think it'd be a good idea to add such
a function to ceval.h next to PyEval_Call*(). We can only credibly
tell people to use only the ABI functions when we have an ABI
replacement for the (sane uses of) non-ABI calls.

From theller at  Wed May 20 18:52:46 2009
From: theller at (Thomas Heller)
Date: Wed, 20 May 2009 18:52:46 +0200
Subject: [Python-Dev] Python on PowerPC?
In-Reply-To: <>
References: <>	<>
Message-ID: <gv1cgv$38p$>

Chris Plasun schrieb:
> Thanks for your reply.
> Ulrich Eckhardt wrote:
>> On Wednesday 20 May 2009, Chris Plasun wrote:
>>> I'm to develop console apps on a Linux embedded PowerPC board (Freescale
>>> MPC8313).
>>> Is there a Python release for the PowerPC platform?
>> This has pretty little to do with the development of the Python language 
>> itself, so it is rather off topic here.
> This group appeared to be relevant.
>> That said, Linux systems are barely thinkable without Python, even when 
>> running on PPC, so yes, Python runs on PPC, too, and is included in probably 
>> every Linux distro, e.g. Debian.
> hmmm, hopefully I can find something to run in an embedded box.

If you need to cross-compile, I have a build script and working patches
to cross-build Python 2.6.2 for an ARM embedded system.  Contact me by private mail
if you want them.


From solipsis at  Wed May 20 19:14:46 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 20 May 2009 17:14:46 +0000 (UTC)
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
References: <>
Message-ID: <>

Jeffrey Yasskin <jyasskin <at>> writes:
> Over an 8-year lifetime for Python 3, Moore's law predicts that
> desktop systems will have up to 64 cores, at which point even the
> simplest GIL-removal strategy of making refcounts atomic will be a
> win, despite the 2x performance loss for a single thread.

That's only if you think all workloads parallelize easily (and with little work
from the average programmer), which sounds a bit presumptuous. When you have a
GUI application and the perceived performance is driven by UI responsivity,
spawning dozens of threads can do little to improve the picture ("GUI application"
here can mean a feature-rich Web application, too).
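
The trade-off in the quoted claim can be put into numbers with Amdahl's
law. What follows is only a rough sketch: the 2x single-thread penalty and
the 64 cores are the figures from the quoted message, while the function
names and the idea of a single "parallel fraction" per workload are
assumptions made for illustration.

```python
def speedup(parallel_fraction, cores, single_thread_penalty=2.0):
    """Amdahl's-law speedup relative to the current single-threaded speed.

    single_thread_penalty is the assumed slowdown caused by atomic
    refcounts (the "2x" figure from the quoted message).
    """
    serial_fraction = 1.0 - parallel_fraction
    scaled_time = serial_fraction + parallel_fraction / cores
    return 1.0 / (single_thread_penalty * scaled_time)


def break_even_fraction(cores, single_thread_penalty=2.0):
    """Parallel fraction at which the speedup is exactly 1.0."""
    return (1.0 - 1.0 / single_thread_penalty) / (1.0 - 1.0 / cores)
```

Under these assumptions a workload has to be a little more than half
parallelizable before 64 cores make up for the penalty, and a purely serial
workload simply runs at half speed.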

As for desktop systems having 64 cores, that's unless the available die space
gets used for something else instead, e.g. an integrated GPU. Or unless the
desktop dies in favor of something else (e.g. laptops with small tightly
integrated chips). The former is already in AMD's and Intel's plans. The latter
could be happening right now.

And we're not even talking about embedded platforms, or virtual machines where a
64-core server is partitioned into 64 "single-core" systems.

(and then there's the whole threading vs processing debate ;-))

Finally, removing the GIL means you have to make all base types (especially
containers) thread-safe without sacrificing their performance. I don't think
it's just about reference-counting.

That said, the Py_IncRef() and Py_DecRef() functions already exist. Perhaps they
should be advertised a bit more in the documentation. The day a hypothetical
Python implementation gets rid of reference-counting while remaining binary
compatible with the rest of the API (which rules out PyPy), and gets much faster
in the process, I think people will happily suffer a small recompile.
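
Because Py_IncRef() and Py_DecRef() are real exported functions rather than
macros, they can be reached from outside a compiled extension. The sketch
below calls them through ctypes; it assumes CPython, and the helper name
refcount_delta_after_incref is hypothetical, chosen only for this
illustration.

```python
import ctypes
import sys

# Py_IncRef / Py_DecRef are exported symbols of the CPython runtime,
# unlike the Py_INCREF / Py_DECREF macros, which exist only at compile time.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None

_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def refcount_delta_after_incref(obj):
    """Return how much one Py_IncRef call changes obj's reference count."""
    before = sys.getrefcount(obj)
    _incref(obj)
    after = sys.getrefcount(obj)
    _decref(obj)  # undo the extra reference so the object is not leaked
    return after - before
```

An extension that sticks to these functions pays a call per refcount
change, but never has the refcount layout compiled into its own binary.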



From jyasskin at  Wed May 20 19:26:35 2009
From: jyasskin at (Jeffrey Yasskin)
Date: Wed, 20 May 2009 10:26:35 -0700
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 20, 2009 at 10:14 AM, Antoine Pitrou <solipsis at> wrote:
> Jeffrey Yasskin <jyasskin <at>> writes:
>> Over an 8-year lifetime for Python 3, Moore's law predicts that
>> desktop systems will have up to 64 cores, at which point even the
>> simplest GIL-removal strategy of making refcounts atomic will be a
>> win, despite the 2x performance loss for a single thread.
> That's only if you think all workloads parallelize easily (and with little work
> from the average programmer), which sounds a bit presumptuous. When you have a
> GUI application and the perceived performance is driven by UI responsivity,
> spawning dozens of threads can do little to improve the picture ("GUI application"
> here can mean a feature-rich Web application, too).
> As for desktop systems having 64 cores, that's unless the available die space
> gets used for something else instead, e.g. an integrated GPU. Or unless the
> desktop dies in favor of something else (e.g. laptops with small tightly
> integrated chips). The former is already in AMD's and Intel's plans. The latter
> could be happening right now.
> And we're not even talking about embedded platforms, or virtual machines where a
> 64-core server is partitioned into 64 "single-core" systems.
> (and then there's the whole threading vs processing debate ;-))
> Finally, removing the GIL means you have to make all base types (especially
> containers) thread-safe without sacrificing their performance. I don't think
> it's just about reference-counting.
> That said, the Py_IncRef() and Py_DecRef() functions already exist. Perhaps they
> should be advertised a bit more in the documentation. The day a hypothetical
> Python implementation gets rid of reference-counting while remaining binary
> compatible with the rest of the API (which rules out PyPy), and gets much faster
> in the process, I think people will happily suffer a small recompile.

Sorry, I didn't mean to get into a GIL debate. All I'm saying is that
I don't think changing the definition of Py_INCREF and Py_DECREF
justifies going to Python 4.0, so I don't think their definitions
should be part of the ABI. If that's not what the ABI means, that's ok
too.

From solipsis at  Wed May 20 19:34:42 2009
From: solipsis at (Antoine Pitrou)
Date: Wed, 20 May 2009 17:34:42 +0000 (UTC)
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
References: <>
Message-ID: <>

Jeffrey Yasskin <jyasskin <at>> writes:
> Sorry, I didn't mean to get into a GIL debate. All I'm saying is that
> I don't think changing the definition of Py_INCREF and Py_DECREF
> justifies going to Python 4.0, so I don't think their definitions
> should be part of the ABI. If that's not what the ABI means, that's ok
> too.

Consider, though, that if Py_INCREF and Py_DECREF are not part of the ABI,
enabling the ABI-specific preprocessor symbol will hide them, which might (or
might not!) annoy a lot of extension writers.

(I don't know if there are extensions out there having reference count
increments and decrements in their critical paths)



From jyasskin at  Wed May 20 19:41:37 2009
From: jyasskin at (Jeffrey Yasskin)
Date: Wed, 20 May 2009 10:41:37 -0700
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 20, 2009 at 10:34 AM, Antoine Pitrou <solipsis at> wrote:
> Jeffrey Yasskin <jyasskin <at>> writes:
>> Sorry, I didn't mean to get into a GIL debate. All I'm saying is that
>> I don't think changing the definition of Py_INCREF and Py_DECREF
>> justifies going to Python 4.0, so I don't think their definitions
>> should be part of the ABI. If that's not what the ABI means, that's ok
>> too.
> Consider, though, that if Py_INCREF and Py_DECREF are not part of the ABI,
> enabling the ABI-specific preprocessor symbol will hide them, which might (or
> might not!) annoy a lot of extension writers.

Yes, that's my intention. (Well, not the annoying part, but making
them use Py_IncRef instead for ABI compatibility is, I think, a good
thing.) If they don't want ABI compatibility, they shouldn't ask for
it. Giving them something else useful to ask for is why I mentioned an
API compatibility mode.

To decrease the annoyance of having to change source code, we could
have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode.

From aahz at  Wed May 20 21:34:22 2009
From: aahz at (Aahz)
Date: Wed, 20 May 2009 12:34:22 -0700
Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 20, 2009, Sven Schrader wrote:
> since our python installation is located on a symlink'ed directory,
> our variables "sys.exec_prefix" and "sys.executable" can have
> different paths. Therefore, the respective test in fails
> (line 202) and a wrong library directory is obtained.
> To fix this issue, I have attached a patch that uses
> "os.path.samefile" instead, to see whether two files are identical
> irrespective of their paths.

Please post this patch to so it can be tracked.

Note that Python 2.5 is now accepting only security patches, so please
check whether 2.6 and trunk need it.
Aahz (aahz at           <*>

"A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines."  --Ralph Waldo Emerson
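
The symlink problem described in the quoted report can be reproduced in a
few lines. This is only a sketch of the general technique, not the attached
patch: the directory names and the helpers same_location and demo are
hypothetical, and os.symlink is assumed to be permitted (true on typical
POSIX systems).

```python
import os
import tempfile


def same_location(path_a, path_b):
    """Return True if both paths refer to the same file or directory."""
    return os.path.samefile(path_a, path_b)


def demo():
    """Compare a directory with a symlink to it, by string and by identity."""
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real_prefix")      # stands in for exec_prefix
        link = os.path.join(tmp, "linked_prefix")    # symlinked install path
        os.mkdir(real)
        os.symlink(real, link)
        string_equal = (real == link)
        identity_equal = same_location(real, link)
        return string_equal, identity_equal
```

The string comparison treats the two paths as different, while
os.path.samefile follows the symlink and reports them as the same
directory.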

From ncoghlan at  Wed May 20 22:07:08 2009
From: ncoghlan at (Nick Coghlan)
Date: Thu, 21 May 2009 06:07:08 +1000
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Jeffrey Yasskin wrote:
> Yes, that's my intention. (Well, not the annoying part, but making
> them use Py_IncRef instead for ABI compatibility is, I think, a good
> thing.) If they don't want ABI compatibility, they shouldn't ask for
> it. Giving them something else useful to ask for is why I mentioned an
> API compatibility mode.
> To decrease the annoyance of having to change source code, we could
> have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode.

Forcing developers to choose between the speed of the INCREF/DECREF
macros and the proposed ABI compatibility mode for the benefit of an as
yet hypothetical GIL-less CPython API implementation seems more like a
way to kill adoption of the ABI compatibility mode rather than a way to
encourage the use of the IncRef/Decref functions.

The idea of allowing an extension to explicitly version the stable ABI
they're using with a macro like Py_ABI_VERSION is a good one though. I'd
suggest using the Python version in hex (e.g. 0x020700 and 0x030200)
rather than an ISO date though. That way an extension developer that
wanted to ensure their code worked with a particular Python version and
later could just define the right Py_ABI_VERSION rather than have to
specifically compile against that earliest version.
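
A minimal sketch of how such a hex-encoded version could be built and
compared, using the three-byte form from the examples above (0x020700,
0x030200). The names abi_version and runtime_supports are hypothetical.
Note that the existing sys.hexversion and PY_VERSION_HEX values use four
bytes, with the release level and serial in the lowest one, so they are not
directly comparable with this shorter form.

```python
import sys


def abi_version(major, minor, micro=0):
    """Pack a version into the three-byte hex form, e.g. 3.2 -> 0x030200."""
    return (major << 16) | (minor << 8) | micro


def runtime_supports(requested):
    """True if the running interpreter is at least the requested version."""
    info = sys.version_info
    return abi_version(info.major, info.minor, info.micro) >= requested
```

Plain integer comparison is what makes "this version and later" cheap to
check, which is the advantage over an ISO date that has to be mapped back
to a release.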


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Wed May 20 22:10:48 2009
From: ncoghlan at (Nick Coghlan)
Date: Thu, 21 May 2009 06:10:48 +1000
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:
> Functions and function-like Macros
> ----------------------------------
> All functions starting with _Py are not available to applications.
> Also, all functions that expect parameter types that are unavailable
> to applications are excluded from the ABI, such as PyAST_FromNode
> (which expects a ``node*``).
> All other functions are available, unless excluded below.
> Function-like macros (in particular, field access macros) remain
> available to applications, but get replaced by function calls
> (unless their definition only refers to features of the ABI, such
> as the various _Check macros)
> ABI function declarations will not change their parameters or return
> types. If a change to the signature becomes necessary, a new function
> will be introduced. If the new function is source-compatible (e.g. if
> just the return type changes), an alias macro may get added to
> redirect calls to the new function when the application is
> recompiled.
> If continued provision of the old function is not possible, it may get
> deprecated, then removed, in accordance with PEP 7, causing
> applications that use that function to break.

Something I haven't seen explicitly mentioned as yet (in the PEP or the
python-dev list discussion) are the memory management APIs and the FILE*
APIs which can cause the MSVCRT versioning issues on Windows.

Those would either need to be excluded from the stable ABI or else
changed to use opaque pointers.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ziade.tarek at  Wed May 20 22:58:50 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 20 May 2009 22:58:50 +0200
Subject: [Python-Dev] distutils.build_ext path comparison - python 2.5.2
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Sven

can you add an issue with your patch in

Thanks in advance

On Wed, May 20, 2009 at 2:31 PM, Sven Schrader <sven.schrader at> wrote:
> Hi,
> since our python installation is located on a symlink'ed directory,
> our variables "sys.exec_prefix" and "sys.executable" can have different
> paths. Therefore, the respective test in fails (line 202)
> and a wrong
> library directory is obtained.
> To fix this issue, I have attached a patch that uses "os.path.samefile"
> instead,
> to see whether two files are identical irrespective of their paths.
> Greetings
> Sven Schrader
> ps: please CC answers to me, I'm not on the list :-)
> pps: I hope the attachment isn't inline...

Tarek Ziadé |

From skip at  Wed May 20 22:59:32 2009
From: skip at (skip at
Date: Wed, 20 May 2009 15:59:32 -0500
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

    Nick> Jeffrey Yasskin wrote:

    >> To decrease the annoyance of having to change source code, we could
    >> have Py_INCREF(x) expand to Py_IncRef(x) in ABI-compatibility mode.

    Nick> Forcing developers to choose between the speed of the
    Nick> INCREF/DECREF macros and the proposed ABI compatibility mode for
    Nick> the benefit of an as yet hypothetical GIL-less CPython API
    Nick> implementation seems more like a way to kill adoption of the ABI
    Nick> compatibility mode rather than a way to encourage the use of the
    Nick> IncRef/Decref functions.

I suspect it's not really germane to this discussion but if the
incref/decref functions were defined as inline would that effectively be
like using the macro versions vis a vis ABI compatibility?


From benjamin at  Wed May 20 23:01:23 2009
From: benjamin at (Benjamin Peterson)
Date: Wed, 20 May 2009 16:01:23 -0500
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/20  <skip at>:

> I suspect it's not really germane to this discussion but if the
> incref/decref functions were defined as inline would that effectively be
> like using the macro versions vis a vis ABI compatibility?

The code would be inlined into applications defeating the point of
being able to change the implementation. :)


From stephen at  Thu May 21 02:40:56 2009
From: stephen at (Stephen J. Turnbull)
Date: Thu, 21 May 2009 09:40:56 +0900
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson writes:
 > 2009/5/20  <skip at>:
 > > I suspect it's not really germane to this discussion but if the
 > > incref/decref functions were defined as inline would that effectively be
 > > like using the macro versions vis a vis ABI compatibility?
 > The code would be inlined into applications defeating the point of
 > being able to change the implementation. :)

Hang on, are you sure Skip isn't on to something?  If the A*P*Is are
defined in such a way that by making them *function calls* they preserve
A*B*I compatibility, while making them inline gives performance, then
the user (in this case, I really mean the vendor of an application
that contains C modules, I guess) can choose which route to go, right?

I suppose that Python itself could be built with inlined code
internally, but also provide the ABI (at a cost in size, of course).

I don't know if this complexity is manageable or worth trying to
manage, but isn't it conceivable that it could work?

I guess that's for the advocates of extending the promise of ABI
compatibility to these APIs to show, though.  I don't need it myself.

From foom at  Thu May 21 02:48:01 2009
From: foom at (James Y Knight)
Date: Wed, 20 May 2009 20:48:01 -0400
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

On May 20, 2009, at 4:07 PM, Nick Coghlan wrote:
> Forcing developers to choose between the speed of the INCREF/DECREF
> macros and the proposed ABI compatibility mode for the benefit of an  
> as
> yet hypothetical GIL-less CPython API implementation seems more like a
> way to kill adoption of the ABI compatibility mode rather than a way  
> to
> encourage the use of the IncRef/Decref functions.

Indeed, and if the promise of "no-ABI-breakages-till-4.0" is removed,  
this would be a non-issue. Keep Py_INCREF macros in the current ABI,  
and then break the ABI when someone wants to remove the GIL someday.  
That's certainly going to be a big enough change to justify changing  
the ABI.


From benjamin at  Thu May 21 03:48:53 2009
From: benjamin at (Benjamin Peterson)
Date: Wed, 20 May 2009 20:48:53 -0500
Subject: [Python-Dev] [unladen-swallow] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/20 Stephen J. Turnbull <stephen at>:
> Benjamin Peterson writes:
>  > 2009/5/20  <skip at>:
>  >
>  > > I suspect it's not really germane to this discussion but if the
>  > > incref/decref functions were defined as inline would that effectively be
>  > > like using the macro versions vis a vis ABI compatibility?
>  >
>  > The code would be inlined into applications defeating the point of
>  > being able to change the implementation. :)
> Hang on, are you sure Skip isn't on to something?  If the A*P*Is are
> defined in such a way that by making them *function calls* they preserve
> A*B*I compatibility, while making them inline gives performance, then
> the user (in this case, I really mean the vendor of an application
> that contains C modules, I guess) can choose which route to go, right?

In that case, they might as well be macros because changing would
require recompiling.


From william at  Fri May 22 12:33:02 2009
From: william at (William Reade)
Date: Fri, 22 May 2009 11:33:02 +0100
Subject: [Python-Dev] [Fwd: Re:  PEP 384: Defining a Stable ABI]
In-Reply-To: <>
References: <>
Message-ID: <>

William Reade wrote:
> 2) Since it hasn't always been in place, its introduction won't help 
> me in the short term: there are an awful lot of extension modules that 
> use excluded functions (for example, all(?) PyCxx modules use 
> PyCode_New and PyFrame_New to get nicer tracebacks), and I'll still 
> have to handle all these cases until everyone is up-to-date with 
> whatever version of Python this gets accepted into.
It seems that where I should have said Pyrex, I actually said PyCxx. 
Sorry for the confusion. Thanks to Barry Scott for pointing it out.

From status at  Fri May 22 18:06:56 2009
From: status at (Python tracker)
Date: Fri, 22 May 2009 18:06:56 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (05/15/09 - 05/22/09)
Python tracker at

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2195 open (+35) / 15716 closed (+24) / 17911 total (+59)

Open issues with patches:   863

Average duration of open issues: 650 days.
Median duration of open issues: 400 days.

Open Issues Breakdown
   open  2168 (+35)
pending    27 ( +0)

Issues Created Or Reopened (64)

bsddb memory leak on ubuntu                                      05/18/09
CLOSED    reopened ajaksu2                       

idle pydoc et al removed from 3.1 without versioned replacements 05/22/09    reopened nad                           

Dict fails to notice addition and deletion of keys during iterat 05/16/09
CLOSED    reopened stevenjd                      

documentation of xml.dom.minidom.parse signature is wrong        05/16/09
CLOSED    reopened phihag                        

Interpreter crashes when chaining an infinite number of exceptio 05/15/09    reopened amaury.forgeotdarc            

io.BufferedWriter C module missing _write_lock                   05/15/09
CLOSED    created  jroesslein                    

BaseServer.shutdown documentation is incomplete                  05/15/09    created  gagenellina                   

Fix refleaks in test_urllib2_localnet                            05/16/09
CLOSED    created  collinwinter                  

LOOKUP_METHOD and CALL_METHOD optimization                       05/16/09    created  benjamin.peterson             

Fix object.__reversed__ doc                                      05/16/09
CLOSED    created  tjreedy                       

test_poplib Bus error with gcc-4.4 on OS X                       05/16/09
CLOSED    created  marketdickinson               

Clean up                                       05/16/09    created  phihag                        

MutableSequence.__iadd__ should return self                      05/16/09
CLOSED    created  amarzal                       

Should collections.Counter check for int?                        05/16/09
CLOSED    created  hagen                         

cygwin compilers should not check compiler versions              05/16/09    created  cdavid                        

bdist_msi does not deal with pre-release version                 05/16/09    created  cdavid                        

change sdist and register command so they use check              05/16/09
CLOSED    created  tarek                         

Document and slightly simplify lnotab tracing                    05/16/09    created  jyasskin                      
       patch, needs review                                                     

HTMLParseError derivation                                        05/16/09
CLOSED    created  bayerf                        

Exception message in int() when trying to convert a complex      05/17/09
CLOSED    created  aletornw                      

Fix dbm interfaces                                               05/17/09    created  georg.brandl                  

fails on VC6(Windows)                                            05/17/09
CLOSED    created  ocean-city                    

"install" target in python 3.x makefile should be "fullinstall"  05/17/09    created  ronaldoussoren                

make distutils use the tarinfo command                           05/17/09    created  tarek                         

str.strip() and " behaviour expected?                            05/17/09
CLOSED    created  sholvar                       

zipfile: Extracting a directory that already exists generates an 05/18/09    created  joe.amenta                    

smtplib docs should link to email module examples                05/18/09
CLOSED    created  guettli                       

for-loop doesn't work with -c                                    05/18/09
CLOSED    created  exe                           

distutils error on windows                                       05/18/09
CLOSED    created  ocean-city                    

tarfile normalizes arcname                                       05/18/09    created  mkv                           

References to "pysqlite" in documentation of sqlite3 should be c 05/18/09
CLOSED    created  MLModel                       

socket.setdefaulttimeout affecting multiprocessing Manager       05/18/09    created  ryles                         

sqlite3 error classes should be documented                       05/18/09    created  MLModel                       

Add cp65001 to encodings/                              05/19/09    created  tzot                          

uuid.uuid4 cause segfault in emesene                             05/19/09    created  acevery                       

PYTHONHOME should be more flexible (and controllable by --libdir 05/19/09    created  soundmurderer                 

time.clock(): overflow in programs that run for very long        05/19/09
CLOSED    created  tom65536                      

build_ext fails to build in the right directory using the packag 05/19/09
CLOSED    created  tarek                         

pydoc_data package is not installed                              05/19/09
CLOSED    created  ronaldoussoren                
       patch, 26backport                                                       

Add "daemon" argument to threading.Thread constructor            05/19/09    created  tebeka                        
       patch, easy                                                             

failed assert when including extension modules                   05/19/09    created  tim.golden                    

POP_MARK was not in pickle protocol 0                            05/20/09
CLOSED    created  collinwinter                  
       patch, easy                                                             

make error                                                       05/20/09    created  gast                          

support read/write c_ulonglong type bitfield structures          05/20/09    created  higstar                       

casting error from ctypes array to structure                     05/20/09    created  higstar                       

Pyhon 2.6 makes .pyc/.pyo bytecode files executable              05/20/09    created  phd                           

no longer possible to hash arrays                                05/20/09    created  exarkun                       

unittest.TestCase._result is very likely to collide (and break)  05/20/09
CLOSED    created  exarkun                       

threading.Timer and gtk.main are not compatible                  05/20/09    created  eric                          

.pyc files created readonly if .py file is readonly, python won' 05/20/09    created  pdsimanyi                     

Patch for IDLE/OS X to work with Tk-Cocoa                        05/20/09    created  wordtech                      

Missing title for                                05/20/09    created  wordtech                      

Unicode issue with tempfile on Windows                           05/21/09    created  daniel.ugra                   

doesn't work                                                     05/21/09    created  mzalokar                      

SyntaxError in xmlrpc.client examples                            05/21/09    created  thijs                         

Itertools objects are missing "send"                             05/21/09
CLOSED    created  tebeka                        

str.format_from_mapping()                                        05/21/09    created  rhettinger                    

os.path.sameopenfile reports that standard streams are the same  05/22/09
CLOSED    reopened ryles                         

Reference counting bug in setrlimit                              05/22/09    created  billm                         

documentation of zip function is error                           05/22/09
CLOSED    created  bones7456                     

Logging in BaseHTTPServer.BaseHTTPRequestHandler causes lag      05/22/09    created  aerodonkey                    

Correct minor typos in doanddont.rst and urllib2.rst howto docum 05/22/09
CLOSED    created  vshenoy                       

distutils.sysconfig.get_python_lib gives surprising result when  05/22/09    created  vsajip                        

Python3.0.1.1 is not available when system locale is zh_TW.eucTW 05/22/09    created  leeon                         

Issues Now Closed (65)

weakref copy module interaction                                   456 days    pitrou                        

os.listdir doc should mention that Unicode decoding can fail      366 days    georg.brandl                  

sys.stdin.fileno() gives attribute error in IDLE                  354 days    kbk                           

Problem with invalidly-encoded command-line arguments (Unix)      351 days    benjamin.peterson             

incorrect comments for PyObject_ReleaseBuffer                     315 days    pitrou                        

Py_WIN_WIDE_FILENAMES removal                                     282 days    ocean-city                    

bsddb memory leak on ubuntu                                         4 days    jcea                          

remove not decodable environment variables                        214 days    loewis                        

3 tutorial documentation errors                                   210 days    georg.brandl                  

Running Python 2.6 GUI on Windows Vista                           202 days    georg.brandl                  

Missing  make altframeworkinstall for Mac OS X                    165 days    ronaldoussoren                

2.6.1 breaks many applications that embed Python on Windows       163 days    chrisyco                      
       patch, needs review                                                     

Setting font from preference dialog in IDLE on OS X broken         97 days    kbk                           

OS X Installer: add options to specify universal build type and    93 days    ronaldoussoren                

Scanner class in re module undocumented                            87 days    rhettinger                    

add a new command called "check" into Distutils                    37 days    tarek                         

OS X Installer: new make of documentation installs at wrong loca   34 days    ronaldoussoren                

len(reversed(                                                      28 days    marketdickinson               

float('1e500') -> inf, complex('1e500') -> ValueError              26 days    marketdickinson               
       patch, easy                                                             

Better documentation of use of BROWSER environment variable        12 days    georg.brandl                  

Problems with dbm documentation                                    12 days    georg.brandl                  

Ambiguity in flag documentation                           14 days    georg.brandl                  

email.message : get_payload args's documentation is confusing      10 days    georg.brandl                  

test_distutils fails for Python 3.1b1 on MacOS X                    9 days    nad                           

WeakSet cmp methods                                                11 days    pitrou                        
       patch, needs review                                                     

Add bug tracker tasks to PEP 101                                    8 days    georg.brandl                  

Broken link to "Curses Programming with Python"                     6 days    georg.brandl                  

Add __bool__ to threading.Event and multiprocessing.Event           4 days    benjamin.peterson             

test_urllib2_localnet DigestAuthHandler leaks nonces                7 days    collinwinter                  

optparse docs say 'default' keyword is deprecated but uses it in    3 days    georg.brandl                  

Dict fails to notice addition and deletion of keys during iterat    1 days    georg.brandl                  

Fix the output word from "ok" to "OK"  when a testcase passes       3 days    benjamin.peterson             

test_distutils leaves a 'foo' file behind in the cwd                5 days    r.david.murray                

Search does not intelligently handle module.function queries on     3 days    georg.brandl                  

documentation of xml.dom.minidom.parse signature is wrong           0 days    georg.brandl                  

io.BufferedWriter C module missing _write_lock                      0 days    pitrou                        

Fix refleaks in test_urllib2_localnet                               3 days    collinwinter                  

Fix object.__reversed__ doc                                         0 days    georg.brandl                  

test_poplib Bus error with gcc-4.4 on OS X                          0 days    marketdickinson               

MutableSequence.__iadd__ should return self                         2 days    rhettinger                    

Should collections.Counter check for int?                           1 days    rhettinger                    

change sdist and register command so they use check                 0 days    tarek                         

HTMLParseError derivation                                           0 days    benjamin.peterson             

Exception message in int() when trying to convert a complex         0 days    marketdickinson               

fails on VC6(Windows)                                               1 days    tarek                         

str.strip() and " behaviour expected?                               0 days    loewis                        

smtplib docs should link to email module examples                   2 days    georg.brandl                  

for-loop doesn't work with -c                                       0 days    r.david.murray                

distutils error on windows                                          0 days    loewis                        

References to "pysqlite" in documentation of sqlite3 should be c    2 days    georg.brandl                  

time.clock(): overflow in programs that run for very long           0 days    tom65536                      

build_ext fails to build in the right directory using the packag    0 days    tarek                         

pydoc_data package is not installed                                 0 days    georg.brandl                  
       patch, 26backport                                                       

POP_MARK was not in pickle protocol 0                               1 days    collinwinter                  
       patch, easy                                                             

unittest.TestCase._result is very likely to collide (and break)     1 days    michael.foord                 

Itertools objects are missing "send"                                0 days    rhettinger                    

os.path.sameopenfile reports that standard streams are the same     0 days    ryles                         

documentation of zip function is error                              0 days    georg.brandl                  

Correct minor typos in doanddont.rst and urllib2.rst howto docum    0 days    georg.brandl                  

http libraries throw errors internally in BitTorrent             1884 days  rhettinger                    

Documentation for Descriptors in the main docs                   1809 days  rhettinger                    

pdb unable to jump to first statement                             785 days jyasskin                      
       patch, needs review                                                     

Failure to build on AIX 5.3                                       773 days ajaksu2                       

syslog syscall support for SysLogLogger                           749 days dandrzejewski                 

help() can't find right source file                               702 days pitrou                        
       patch, easy                                                             

Top Issues Most Discussed (10)

 11 fails on VC6(Windows)                            1 days

  8 Interpreter crashes when chaining an infinite number of excepti    7 days

  8 Dict fails to notice addition and deletion of keys during itera    1 days

  7 test_distutils fails for Python 3.1b1 on MacOS X                   9 days

  7 urllib/urllib2: HTTPS over (Squid) Proxy fails                  1203 days

  6 Enhanced cPython profiler with high-resolution timer             436 days

  5 distutils error on windows                                         0 days

  5 Fix dbm interfaces                                                 5 days

  5 Fix refleaks in test_urllib2_localnet                              3 days

  5 Embedding into a shared library fails                            177 days

From ziade.tarek at  Fri May 22 18:27:01 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 22 May 2009 18:27:01 +0200
Subject: [Python-Dev] PEP 376 : Changing the .egg-info structure
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, May 20, 2009 at 11:48 AM, Tarek Ziadé <ziade.tarek at> wrote:
> So I guess I'll start this prototype in bitbucket and come back with it for feedback
> in Distutils-SIG, for a new PEP 376 round.

Ok so FYI, I moved the discussion here:


From jimjjewett at  Fri May 22 18:46:57 2009
From: jimjjewett at (Jim Jewett)
Date: Fri, 22 May 2009 12:46:57 -0400
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
Message-ID: <>

Martin v. Löwis wrote:

>  - PyGetSetDef (name, get, set, doc, closure)

Is it fully decided that the generally-unused closure parameter will
stay until python 4?

> The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE)
> are also available to applications.

There have been several experiments in memory management, ranging from
not bothering to change the refcount on permanent objects like None,
to proxying objects across multiple threads or processes.  I also
believe (but don't remember for sure) that some of the proposed
Unicode (or String?) optimizations changed the memory layout a bit.
So far, these have all been complicated (or slow) enough that they
didn't get integrated, but if it ever happens ... I don't think it
would justify python 4.0

> New Python
> versions may introduce new slot ids, but slot ids will never be
> recycled. Slots may get deprecated, but continue to be supported
> throughout Python 3.x.

Weren't there already a few ready for deprecation?  Do you really want
to commit to them forever?  Even if you aren't willing to settle for
less than "3.x from now on", it might make sense to at least start
with 3.2, rather than 3.0.
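For readers unfamiliar with the scheme under discussion, here is a rough sketch in C of what "never recycled" slot ids imply. The enum names and numeric values are invented for illustration and are not the ids proposed in the PEP.

```c
/* Hypothetical slot ids; the names and values are illustrative only. */
enum demo_slot_id {
    DEMO_SLOT_END     = 0,  /* table terminator */
    DEMO_SLOT_DEALLOC = 1,
    DEMO_SLOT_COMPARE = 2,  /* deprecated: still accepted, never reassigned */
    DEMO_SLOT_REPR    = 3,
    DEMO_SLOT_HASH    = 4   /* added by a later release: takes the next free id */
};

#define DEMO_SLOT_MAX DEMO_SLOT_HASH

/* Return 1 if the runtime recognises the id, deprecated or not. */
static int demo_slot_is_supported(int id)
{
    return id > DEMO_SLOT_END && id <= DEMO_SLOT_MAX;
}

/* Return 1 if the id is accepted only for backwards compatibility. */
static int demo_slot_is_deprecated(int id)
{
    return id == DEMO_SLOT_COMPARE;
}
```

The cost raised above is visible in the sketch: a deprecated id such as DEMO_SLOT_COMPARE keeps its number and must stay supported for as long as the compatibility promise lasts.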


From solipsis at  Fri May 22 19:00:00 2009
From: solipsis at (Antoine Pitrou)
Date: Fri, 22 May 2009 17:00:00 +0000 (UTC)
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
References: <>
Message-ID: <>

Jim Jewett <jimjjewett <at>> writes:
> > The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE)
> > are also available to applications.
> There have been several experiments in memory management, ranging from
> not bothering to change the refcount on permanent objects like None,
> to proxying objects across multiple threads or processes.

These experiments don't seem to have been very successful, have they? Besides,
Py_TYPE is a fundamental property of every PyObject.

On the other hand, I think Py_SIZE should be discouraged in favour of the
type-specific variants (PyString_GET_SIZE, etc.), since some types have their
own way of (ab)using the size field.

> I also
> believe (but don't remember for sure) that some of the proposed
> Unicode (or String?) optimizations changed the memory layout a bit.

The one Unicode optimization I know of, in, is
suspended because of Marc-Andre's opposition. In any case, it doesn't touch the
fundamental PyObject layout.



From martin at  Fri May 22 21:47:33 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 22 May 2009 21:47:33 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
Message-ID: <>

> Something I haven't seen explicitly mentioned as yet (in the PEP or the
> python-dev list discussion) are the memory management APIs and the FILE*
> APIs which can cause the MSVCRT versioning issues on Windows.
> Those would either need to be excluded from the stable ABI or else
> changed to use opaque pointers.

Good point. As a separate issue, I would actually like to deprecate,
then remove these APIs. I had originally hoped that this would happen
for 3.0 already, alas, nobody worked on it.

In any case, I have removed them from the ABI now.

I haven't thought about the Windows CRT issue yet. I can see that there
would be still problems even without that, e.g. when you do setlocale
in Python, it might not affect the extension module, etc. How would you
propose to deal with that? One approach would be to fix the CRT version for
Windows, for the lifetime of 3.x. Another approach could be to document
the known restrictions, and otherwise declare "use at your own risk".
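As a sketch of the "opaque pointers" alternative mentioned in the quoted text, the C fragment below hides the FILE* behind a handle so that every stdio and allocator call is made by one side only. All demo_* names are hypothetical. In a real library the struct definition would sit in a private source file rather than next to the declarations.

```c
#include <stdio.h>
#include <stdlib.h>

/* Public view: callers see only an incomplete type. */
typedef struct demo_file demo_file;

/* Private view: only the library touches the FILE*, so only the
   library's C runtime ever allocates, uses, or frees it. */
struct demo_file {
    FILE *fp;
};

/* Open a temporary file; returns NULL on failure. */
demo_file *demo_file_tmp(void)
{
    demo_file *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->fp = tmpfile();
    if (f->fp == NULL) {
        free(f);
        return NULL;
    }
    return f;
}

/* Write a string; returns 0 on success, -1 on error. */
int demo_file_puts(demo_file *f, const char *s)
{
    return fputs(s, f->fp) < 0 ? -1 : 0;
}

/* Close and free on the same side that opened and allocated. */
void demo_file_close(demo_file *f)
{
    fclose(f->fp);
    free(f);
}
```

This addresses only the FILE* and allocator mismatch. It does nothing for process-wide state such as the setlocale case raised above.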


From dalcinl at  Sat May 23 02:50:33 2009
From: dalcinl at (Lisandro Dalcin)
Date: Fri, 22 May 2009 21:50:33 -0300
Subject: [Python-Dev] PEP 384: a request for PyType_Slot
Message-ID: <>

Martin, a small request.

Any chance you consider defining PyType_Slot like below?

typedef struct{
  int slot;    /* slot id, see below */
  void *pdata; /* data pointer */
  void (*pfunc)(void); /* function pointer */
} PyType_Slot;

Or perhaps some other way? Just to avoid compilers complaining about the
illegal conversion between pointers to data and pointers to
functions... It would be really annoying to be forced to do
type-punning using a union in order to get "correct" C code...
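A minimal sketch of how a slot table with separate data and function members might be used follows. The demo_* names, the slot ids, and the lookup helper are assumptions made for illustration and are not part of the PEP. The sketch relies only on the guarantee that a function pointer may be converted to another function pointer type and back.

```c
#include <stddef.h>

/* Generic function pointer; cast back to the real signature before calling. */
typedef void (*demo_funcptr)(void);

typedef struct {
    int slot;            /* slot id; 0 terminates the table */
    void *pdata;         /* data pointer, or NULL */
    demo_funcptr pfunc;  /* function pointer, or NULL */
} demo_type_slot;

enum { DEMO_SLOT_DOC = 1, DEMO_SLOT_LEN = 2 };  /* hypothetical ids */

typedef int (*demo_lenfunc)(void *);

static int demo_len(void *self)
{
    (void)self;
    return 3;
}

static char demo_doc[] = "demo type";

/* Data goes in pdata, code goes in pfunc: no data/function conversion. */
static demo_type_slot demo_slots[] = {
    { DEMO_SLOT_DOC, demo_doc, NULL },
    { DEMO_SLOT_LEN, NULL, (demo_funcptr)demo_len },
    { 0, NULL, NULL }
};

/* Return the function stored under the given id, or NULL if absent. */
static demo_funcptr demo_find_func(const demo_type_slot *slots, int id)
{
    for (; slots->slot != 0; slots++) {
        if (slots->slot == id)
            return slots->pfunc;
    }
    return NULL;
}
```

A caller would retrieve the entry with demo_find_func(demo_slots, DEMO_SLOT_LEN) and cast the result to demo_lenfunc before invoking it.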

Lisandro Dalcín
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From aahz at  Sat May 23 18:55:52 2009
From: aahz at (Aahz)
Date: Sat, 23 May 2009 09:55:52 -0700
Subject: [Python-Dev] FWD: FTP URLs for Python source
Message-ID: <>

Yes, this is ancient, I've been putting off dealing with it because I
couldn't figure out who should handle it.  At this point, I think that if
anyone does it should be the release team, therefore I'm forwarding to
python-dev.  Feel free to tell me I made the wrong choice.  ;-)

----- Forwarded message from "Douglas W. Goodall" <douglas_goodall at> -----

> From: "Douglas W. Goodall" <douglas_goodall at>
> To: webmaster at
> Subject: made too hard...
> Date: Mon, 16 Feb 2009 05:57:15 -0800
> Dear Sir,
> I am not sure why, but you have made it harder than it has to be to
> fetch the python source for installation on a unix system such as  
> OpenBSD.
> I had to use the command line ftp client and it took a lot of time to  
> discover the real
> URL of the download file.
> Here is what ended up working.
> ftp http://www.e you made it this hard on purpose. Yes, it is easy if  
> you
> are using a web browser, but if you are on a unix system without X
> it is a pain to get it when you don't know how.
> You might want to add the ftp URL to the web page for people like me.
> Respectfully,
> Doug
> ---
> Douglas W. Goodall
> 425 San Juanico Street
> Santa Maria, CA  93455
> (805) 598-9099
> I call on each of us to pray for our president.
> He is who we have for the next four years,
> and we need him to be successful for all of
> us. God Bless America, and the President.

----- End forwarded message -----

Aahz (aahz at           <*>

"A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines."  --Ralph Waldo Emerson

From martin at  Sat May 23 22:44:53 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 23 May 2009 22:44:53 +0200
Subject: [Python-Dev] FWD: FTP URLs for Python source
In-Reply-To: <>
References: <>
Message-ID: <>

Aahz wrote:
> Yes, this is ancient, I've been putting off dealing with it because I
> couldn't figure out who should handle it.  At this point, I think that if
> anyone does it should be the release team, therefore I'm forwarding to
> python-dev.  Feel free to tell me I made the wrong choice.  ;-)

I don't think it needs any action, except perhaps a half-polite response
that we don't intend to change anything.

a) if you are really sitting on the console of an OpenBSD system with
   no X installed, use lynx, or any other text browser:
   scroll down to "Source distribution", hit Enter
b) alternatively, and even better: don't build Python from source at
   all. Instead, use pkg_add to install the Python version that you
   want, downloadable from<rel>/packages/<arch>/python-<ver>.tgz
c) OTOH, if you had only connected to the OpenBSD system remotely
   (e.g. through ssh), just use your local web browser, to either
   * determine the full source download URL of the Python release
     you want to build, then wget on the target system, or
   * if your target system doesn't have wget, download it locally,
     then scp/rcp/ftp it to the target system.

We cannot add an FTP URL to the download page, because we don't
run an ftp server anymore, and don't plan to.

[I don't quite get the "Here is what ended up working" part. What
is http://www.e?]


From hasan.diwan at  Sun May 24 02:48:02 2009
From: hasan.diwan at (Hasan Diwan)
Date: Sat, 23 May 2009 17:48:02 -0700
Subject: [Python-Dev] FWD: FTP URLs for Python source
In-Reply-To: <>
References: <> <>
Message-ID: <>

> Aahz wrote:
>> Yes, this is ancient, I've been putting off dealing with it because I
>> couldn't figure out who should handle it.  At this point, I think that if
>> anyone does it should be the release team, therefore I'm forwarding to
>> python-dev.  Feel free to tell me I made the wrong choice.  ;-)

Regarding OpenBSD, what's the problem with just using the port -- the
2.6 version seems to work fine.

From aahz at  Sun May 24 10:54:51 2009
From: aahz at (Aahz)
Date: Sun, 24 May 2009 01:54:51 -0700
Subject: [Python-Dev] FWD: FTP URLs for Python source
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Sat, May 23, 2009, "Martin v. Löwis" wrote:
> We cannot add an FTP URL to the download page, because we don't
> run an ftp server anymore, and don't plan to.

That's the critical bit.  At this point, I don't think anything else
needs doing.
Aahz (aahz at           <*>


From andymac at  Sun May 24 12:34:07 2009
From: andymac at (Andrew MacIntyre)
Date: Sun, 24 May 2009 20:34:07 +1000
Subject: [Python-Dev] FWD: FTP URLs for Python source
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis wrote:

>    * if your target system doesn't have wget, download it locally,
>      then scp/rcp/ftp it to the target system.

All of [Free|Net|Open|Dragonfly]BSD have ftp clients that can also 
retrieve HTTP URLs, though I guess many wouldn't think of that...

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at  (pref) | Snail: PO Box 370
        andymac at             (alt) |        Belconnen ACT 2616
Web:               |        Australia

From charles.r.mccreary at  Sun May 24 15:20:56 2009
From: charles.r.mccreary at (Charles McCreary)
Date: Sun, 24 May 2009 08:20:56 -0500
Subject: [Python-Dev] Introducing GSOC student James Pruitt
Message-ID: <>

I am a mentor for a GSOC 2009 student working on a PSF project. His project
abstract is "Handling of subprocess async io issues, testing and
reimplementing the commands module in terms of subprocess." He has started a
blog,, in which he is providing general
information on his GSOC project. In the next few days, he will start a
project on google code so that interested parties can help guide his work. I
urge anyone interested in the subprocess module to interact with Mr. Pruitt
and provide feedback/suggestions/encouragement.

Charles R. McCreary

From benjamin at  Mon May 25 14:51:01 2009
From: benjamin at (Benjamin Peterson)
Date: Mon, 25 May 2009 07:51:01 -0500
Subject: [Python-Dev] python-checkins is down
Message-ID: <>

I haven't gotten emails for any of the commits I've done in the last
12 hours or so.


From aahz at  Mon May 25 15:44:37 2009
From: aahz at (Aahz)
Date: Mon, 25 May 2009 06:44:37 -0700
Subject: [Python-Dev] python-checkins is down
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, May 25, 2009, Benjamin Peterson wrote:
> I haven't gotten emails for any of the commits I've done in the last
> 12 hours or so.

Forwarded to postmaster at -- if there's a problem with the
checkins process itself, that won't help.  Have you verified that the
commits are landing?  (I.e. is svn working properly?)  Also, if you
could double-check the python-checkins archives to see whether it's just
you not getting the messages, that would help.
Aahz (aahz at           <*>


From solipsis at  Mon May 25 15:49:50 2009
From: solipsis at (Antoine Pitrou)
Date: Mon, 25 May 2009 13:49:50 +0000 (UTC)
Subject: [Python-Dev] python-checkins is down
References: <>
Message-ID: <>

Aahz <aahz <at>> writes:
> Forwarded to postmaster <at> -- if there's a problem with the
> checkins process itself, that won't help.  Have you verified that the
> commits are landing?  (I.e. is svn working properly?)

Yes, it is.

>  Also, if you
> could double-check the python-checkins archives to see whether it's just
> you not getting the messages, that would help.

The messages aren't in the archives either.



From mal at  Mon May 25 19:41:54 2009
From: mal at (M.-A. Lemburg)
Date: Mon, 25 May 2009 19:41:54 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:
> Thomas Wouters reminded me of a long-standing idea; I finally
> found the time to write it down.
> Please comment!
> ...

Up until this PEP proposal, we had a very simple scheme for
the Python C-API: all documented functions and variables with
a "Py" prefix were part of the C-API, everything else was not
and could change between releases (in particular the private
"_Py" prefix APIs).

Changing the published APIs was considered a bad thing in the
2.x development process and generally required a good reason
to get supported.

Changing private functions or ones that were not documented
was generally never a big problem.

Now, with the PEP, I have a feeling that the Python C-API
will in effect be limited to what's in the PEP's idea of
a usable ABI and open up the non-included public C-APIs
to the same rate of change as the private APIs.

If that's the case, the PEP should be discussed on the C-API
list first, in order to identify a complete list of APIs that
is supposed to define the Python C-API. Ideally, all other
APIs would then need to be made private. However, I doubt that
this is possible before switching to Python 4.0.

Then again, I'm not sure whether that's what you're aiming for...

An optional cross-version ABI would certainly be a good thing.

Limiting the Python C-API would be counterproductive.

> During the compilation of applications, the preprocessor macro
> Py_LIMITED_API must be defined. Doing so will hide all definitions
> that are not part of the ABI.

So extensions wanting to use the full Python C-API as documented
in the C-API docs will still be able to do this, right ?

> Type Objects
> ------------
> The structure of type objects is not available to applications;
> declaration of "static" type objects is not possible anymore
> (for applications using this ABI).

Hmm, that's going to create big problems for extensions that
want to expose a C-API for their types: Type checks are normally
done by pointer comparison using those static type objects.

> Functions and function-like Macros
> ----------------------------------
> Function-like macros (in particular, field access macros) remain
> available to applications, but get replaced by function calls
> (unless their definition only refers to features of the ABI, such
> as the various _Check macros)

Including Py_INCREF()/Py_DECREF() ?

> Excluded Functions
> ------------------
> Functions declared in the following header files are not part
> of the ABI:
> - cellobject.h
> - classobject.h
> - code.h
> - frameobject.h
> - funcobject.h
> - genobject.h
> - pyarena.h
> - pydebug.h
> - symtable.h
> - token.h
> - traceback.h

I don't think that's feasible: you basically remove all introspection
functions that way.

This will need a more fine-grained approach.

> Linkage
> -------
> On Windows, applications shall link with python3.dll;

You mean: extensions that were compiled with Py_LIMITED_API, right ?

> an import
> library python3.lib will be available. This DLL will redirect all of
> its API functions through /export linker options to the full
> interpreter DLL, i.e. python3y.dll.

What if you mix extensions that use the full C-API with ones
that restrict themselves to the limited version ?

Would creating a Python object in a full-API extension and
free'ing it in a limited-API extension cause problems ?

> Implementation Strategy
> =======================
> This PEP will be implemented in a branch, allowing users to check
> whether their modules conform to the ABI. To simplify this testing, an
> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
> type object layout, to let users postpone rewriting all types. When
> this branch is merged into the 3.2 code base, this macro will
> be removed.

Now I'm confused again: this sounds a lot like you do want all extension
writers to only use the limited API.

[And in another post]
>> Something I haven't seen explicitly mentioned as yet (in the PEP or the
>> python-dev list discussion) are the memory management APIs and the FILE*
>> APIs which can cause the MSVCRT versioning issues on Windows.
>>
>> Those would either need to be excluded from the stable ABI or else
>> changed to use opaque pointers.
> Good point. As a separate issue, I would actually like to deprecate,
> then remove these APIs. I had originally hoped that this would happen
> for 3.0 already, alas, nobody worked on it.
> In any case, I have removed them from the ABI now.

How do you expect Python extensions to allocate memory and objects
in a platform independent way without those APIs ?

And as an aside: Which API families are you referring to ? PyMem_Malloc,
PyObject_Malloc, or PyObject_New ?

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, May 25 2009)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...
2009-06-29: EuroPython 2009, Birmingham, UK                34 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! :::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From ncoghlan at  Mon May 25 23:04:58 2009
From: ncoghlan at (Nick Coghlan)
Date: Tue, 26 May 2009 07:04:58 +1000
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Now, with the PEP, I have a feeling that the Python C-API
> will in effect be limited to what's in the PEP's idea of
> a usable ABI and open up the non-included public C-APIs
> to the same rate of change as the private APIs.

Not really - before this PEP it was already fairly easy to write an
extension that was source-level compatible with multiple versions of
Python (depending on exactly what you wanted to do, of course).

However, it is essentially impossible to make an extension that is
binary level compatible with multiple versions.

With the defined stable ABI in place, each extension module author will
be able to make a choice:
- choose binary compatibility by limiting themselves to the stable ABI
and be able to provide a single binary that will still work with later
versions of Py3k
- stick with source compatibility and continue to provide new binaries
for each version of Python

> An optional cross-version ABI would certainly be a good thing.
> Limiting the Python C-API would be counterproductive.

I don't think anyone would disagree with that. A discussion on C-API sig
would certainly be a good idea.

>> During the compilation of applications, the preprocessor macro
>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>> that are not part of the ABI.
> So extensions wanting to use the full Python C-API as documented
> in the C-API docs will still be able to do this, right ?

Yep - they just wouldn't define the new macro.

>> Type Objects
>> ------------
>> The structure of type objects is not available to applications;
>> declaration of "static" type objects is not possible anymore
>> (for applications using this ABI).
> Hmm, that's going to create big problems for extensions that
> want to expose a C-API for their types: Type checks are normally
> done by pointer comparison using those static type objects.

They would just have to expose "MyExtensionPrefix_MyType_Check" and
"MyExtensionPrefix_MyType_CheckExact" functions the same way that types
in the C API do.

>> Functions and function-like Macros
>> ----------------------------------
>> Function-like macros (in particular, field access macros) remain
>> available to applications, but get replaced by function calls
>> (unless their definition only refers to features of the ABI, such
>> as the various _Check macros)
> Including Py_INCREF()/Py_DECREF() ?

I believe so - MvL deliberately left the fields that the ref counting
relies on as part of the ABI.

>> Excluded Functions
>> ------------------
>> Functions declared in the following header files are not part
>> of the ABI:
>> - cellobject.h
>> - classobject.h
>> - code.h
>> - frameobject.h
>> - funcobject.h
>> - genobject.h
>> - pyarena.h
>> - pydebug.h
>> - symtable.h
>> - token.h
>> - traceback.h
> I don't think that's feasible: you basically remove all introspection
> functions that way.
> This will need a more fine-grained approach.

I don't think it is reasonable to expect the introspection interfaces to
remain stable at a binary level across versions.

Having "I want deep introspection support from C" and "I want to use a
single binary for multiple Python versions" be mutually exclusive
choices sounds like a perfectly sensible position to me.

Also, keep in mind that even an extension module that restricts itself
to Py_LIMITED_API would still be able to call in to the Python
equivalents via PyObject_Call and friends (e.g. by importing and using
the inspect and traceback modules).
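
As a rough illustration of that Python-level route, here is a small Python
sketch of what a limited-API extension could ask the traceback module to do
once it has imported it. The helper name format_exception_text is
hypothetical, and the C-side plumbing (importing the module and calling into
it via PyObject_Call and friends) is not shown.

```python
import traceback


def format_exception_text(exc):
    """Render an exception and its traceback as text (hypothetical helper).

    Only the public traceback module is used here, i.e. the Python-level
    counterpart of the C functions declared in traceback.h.
    """
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return "".join(lines)


try:
    1 / 0
except ZeroDivisionError as err:
    text = format_exception_text(err)

print(text)
```

The point of the sketch is only that nothing in it depends on the layout of
traceback or frame objects, which is what makes it compatible with a stable
ABI.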

> What if you mix extensions that use the full C-API with ones
> that restrict themselves to the limited version ?
> Would creating a Python object in a full-API extension and
> free'ing it in a limited-API extension cause problems ?

Possibly, if you end up mixing C runtimes in the process. Specifically:
1. Python linked with MSVCRT X
2. Full extension module linked with MSVCRT Y
3. Limited extension module linked with MSVCRT Z

The PyMem/PyObject APIs in the limited extension module will use the
heap in MSVCRT X, since they will be redirected through the Python
stable ABI as function calls. However, if the full extension module uses
the macro forms and links with the wrong MSVCRT version, then you have
the usual opportunities for conflicts between the two C runtimes.

This isn't a problem created by defining a stable ABI though - it's the
main reason mixing C runtimes is a bad idea. (The two others we have
noted so far being IO issues, especially attempting to share FILE*
instances and the fact that changing the locale will only affect
whichever runtime the extension module linked against).

>> Good point. As a separate issue, I would actually like to deprecate,
>> then remove these APIs. I had originally hoped that this would happen
>> for 3.0 already, alas, nobody worked on it.
>> In any case, I have removed them from the ABI now.
> How do you expect Python extensions to allocate memory and objects
> in a platform independent way without those APIs ?
> And as an aside: Which API families are you referring to ? PyMem_Malloc,
> PyObject_Malloc, or PyObject_New ?

The ones with a FILE* parameter in the signature. There's no problem
with the PyMem/PyObject functions since those will be redirected to
consistently use the version of the C runtime that Python was originally
linked against (their macro counterparts are obviously off limits for
the stable ABI).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From aahz at  Mon May 25 23:54:16 2009
From: aahz at (Aahz)
Date: Mon, 25 May 2009 14:54:16 -0700
Subject: [Python-Dev] FWD:  python-checkins is down
Message-ID: <>

----- Forwarded message from Ralf Hildebrandt <Ralf.Hildebrandt at> -----

> Date: Mon, 25 May 2009 21:59:32 +0200
> From: Ralf Hildebrandt <Ralf.Hildebrandt at>
> To: Patrick Ben Koetter <patrick at>
> Cc: Aahz <aahz at>, postmaster at
> Subject: Re: FWD: Re: [Python-Dev] python-checkins is down
> * Patrick Ben Koetter <patrick at>:
>> This just hit python-checkins at
>> May 25 20:50:33 albatross postfix/local[12976]: A029ED5FF: to=<python-checkins at>, orig_to=<python-checkins at>, relay=local, delay=0.17, delays=0.09/0/0/0.08, dsn=2.0.0, status=sent (delivered to command: /usr/local/mailman/mail/mailman post python-checkins)
>> Looks like the list itself is online and can be reached.
>> I didn't read the whole thread (deleted part of it already).
>> If that isn't the problem, what should I look for then?
> I let all the mails through and set the senders to the "may send
> although they're not members"
> -- 
> Ralf Hildebrandt
>   Geschäftsbereich IT | Abteilung Netzwerk
>   Charité - Universitätsmedizin Berlin
>   Campus Benjamin Franklin
>   Hindenburgdamm 30 | D-12200 Berlin
>   Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
>   Ralf.Hildebrandt at |

----- End forwarded message -----

Aahz (aahz at           <*>

"A foolish consistency is the hobgoblin of little minds, adored by little
statesmen and philosophers and divines."  --Ralph Waldo Emerson

From google at  Tue May 26 01:50:58 2009
From: google at (MRAB)
Date: Tue, 26 May 2009 00:50:58 +0100
Subject: [Python-Dev] Arguments of MatchObject in re module
Message-ID: <>

I've just noticed an oddity of the re module while looking at the
sources. I'll illustrate it below:

 >>> import re
 >>> p = re.compile("foo")
 >>> help(p.match)
Help on built-in function match:

     match(string[, pos[, endpos]]) --> match object or None.
     Matches zero or more characters at the beginning of the string

 >>> p.match(string="foo")

Traceback (most recent call last):
   File "<pyshell#8>", line 1, in <module>
TypeError: Required argument 'pattern' (pos 1) not found

The name of the first argument should be "string", yet it's "pattern".
Does anyone know if it's anything other than a mistake? Should it be
fixed in the next version of the re module, or are we just stuck with it
(and should just change the docstring to match)?
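
Until that is settled, a defensive way to call match() is to pass the
subject string positionally and to probe for the keyword form instead of
assuming it. This is only a sketch; whether the keyword call succeeds
depends on the interpreter version in use, so no particular outcome is
assumed for it.

```python
import re

p = re.compile("foo")

# Passing the subject string positionally does not depend on the
# keyword name the C implementation happens to use.
m = p.match("foo")
print(m.group(0))

# Probe for the keyword form rather than assume it is accepted.
try:
    keyword_ok = p.match(string="foo") is not None
except TypeError:
    keyword_ok = False

print("keyword 'string' accepted:", keyword_ok)
```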

From martin at  Tue May 26 08:59:51 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 May 2009 08:59:51 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
Message-ID: <>

> Now, with the PEP, I have a feeling that the Python C-API
> will in effect be limited to what's in the PEP's idea of
> a usable ABI and open up the non-included public C-APIs
> to the same rate of change as the private APIs.

That's certainly not the plan. Instead, the plan is to have
a stable ABI. The policy on the API isn't affected, except
for restricting changes to the API that would break the ABI.

>> During the compilation of applications, the preprocessor macro
>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>> that are not part of the ABI.
> So extensions wanting to use the full Python C-API as documented
> in the C-API docs will still be able to do this, right ?

Correct. They would link to the version-specific DLL on Windows.

>> The structure of type objects is not available to applications;
>> declaration of "static" type objects is not possible anymore
>> (for applications using this ABI).
> Hmm, that's going to create big problems for extensions that
> want to expose a C-API for their types: Type checks are normally
> done by pointer comparison using those static type objects.

I don't see the problem. During module initialization, you
create the type object and store it in a global variable, and
then both clients and the module compare against the stored

>> Function-like macros (in particular, field access macros) remain
>> available to applications, but get replaced by function calls
>> (unless their definition only refers to features of the ABI, such
>> as the various _Check macros)
> Including Py_INCREF()/Py_DECREF() ?

Yes, although some people are requesting that these become functions.

>> Excluded Functions
>> ------------------
>> Functions declared in the following header files are not part
>> of the ABI:
>> - cellobject.h
>> - classobject.h
>> - code.h
>> - frameobject.h
>> - funcobject.h
>> - genobject.h
>> - pyarena.h
>> - pydebug.h
>> - symtable.h
>> - token.h
>> - traceback.h
> I don't think that's feasible: you basically remove all introspection
> functions that way.
> This will need a more fine-grained approach.

What specifically is it that you want to do in a module that you
couldn't do anymore?

>> On Windows, applications shall link with python3.dll;
> You mean: extensions that were compiled with Py_LIMITED_API, right ?

Correct, see "Terminology" in the PEP.

>> an import
>> library python3.lib will be available. This DLL will redirect all of
>> its API functions through /export linker options to the full
>> interpreter DLL, i.e. python3y.dll.
> What if you mix extensions that use the full C-API with ones
> that restrict themselves to the limited version ?

Some link against python3.dll, others against python32.dll (say).

> Would creating a Python object in a full-API extension and
> free'ing it in a limited-API extension cause problems ?

No problem that I can see.

>> This PEP will be implemented in a branch, allowing users to check
>> whether their modules conform to the ABI. To simplify this testing, an
>> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
>> type object layout, to let users postpone rewriting all types. When
>> this branch is merged into the 3.2 code base, this macro will
>> be removed.
> Now I'm confused again: this sounds a lot like you do want all extension
> writers to only use the limited API.

I certainly want to support as many modules as reasonable with the PEP.
Whether or not developers then choose to build version-independent
binaries is certainly outside the scope of the PEP - it only specifies
action items for Python, not for application authors.

>>> Something I haven't seen explicitly mentioned as yet (in the PEP or the
>>> python-dev list discussion) are the memory management APIs and the FILE*
>>> APIs which can cause the MSVCRT versioning issues on Windows.
>>> Those would either need to be excluded from the stable ABI or else
>>> changed to use opaque pointers.
>> Good point. As a separate issue, I would actually like to deprecate,
>> then remove these APIs. I had originally hoped that this would happen
>> for 3.0 already, alas, nobody worked on it.
>> In any case, I have removed them from the ABI now.
> How do you expect Python extensions to allocate memory and objects
> in a platform independent way without those APIs ?

I have only removed functions from the ABI that have FILE* in their
signatures.

> And as an aside: Which API families are you referring to ? PyMem_Malloc,
> PyObject_Malloc, or PyObject_New ?

Neither. PyRun_AnyFileFlags and friends.


From mal at  Tue May 26 18:28:59 2009
From: mal at (M.-A. Lemburg)
Date: Tue, 26 May 2009 18:28:59 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
Message-ID: <>

Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> Now, with the PEP, I have a feeling that the Python C-API
>> will in effect be limited to what's in the PEP's idea of
>> a usable ABI and open up the non-included public C-APIs
>> to the same rate of change as the private APIs.
> Not really - before this PEP it was already fairly easy to write an
> extension that was source-level compatible with multiple versions of
> Python (depending on exactly what you wanted to do, of course).

Right and I hope that things stay that way.

> However, it is essentially impossible to make an extension that is
> binary level compatible with multiple versions.

On Windows, yes. On Unix, this often worked, even though it wasn't
always safe to do.

In practice it's usually better to recompile extensions for every
single release.

> With the defined stable ABI in place, each extension module author will
> be able to make a choice:
> - choose binary compatibility by limiting themselves to the stable ABI
> and be able to provide a single binary that will still work with later
> versions of Py3k
> - stick with source compatibility and continue to provide new binaries
> for each version of Python

Great !

>> An optional cross-version ABI would certainly be a good thing.
>> Limiting the Python C-API would be counterproductive.
> I don't think anyone would disagree with that. A discussion on C-API sig
> would certainly be a good idea.
>>> During the compilation of applications, the preprocessor macro
>>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>>> that are not part of the ABI.
>> So extensions wanting to use the full Python C-API as documented
>> in the C-API docs will still be able to do this, right ?
> Yep - they just wouldn't define the new macro.

Good !

>>> Type Objects
>>> ------------
>>> The structure of type objects is not available to applications;
>>> declaration of "static" type objects is not possible anymore
>>> (for applications using this ABI).
>> Hmm, that's going to create big problems for extensions that
>> want to expose a C-API for their types: Type checks are normally
>> done by pointer comparison using those static type objects.
> They would just have to expose "MyExtensionPrefix_MyType_Check" and
> "MyExtensionPrefix_MyType_CheckExact" functions the same way that types
> in the C API do.

Hmm, that's a function call per type check and will slow things
down a lot, esp. when working with APIs that deal a lot with
these objects.

The typical way to implement these type checks is via a simple
pointer comparison (falling back to a function for sub-types).
That's cheap and fast.

>>> Functions and function-like Macros
>>> ----------------------------------
>>> Function-like macros (in particular, field access macros) remain
>>> available to applications, but get replaced by function calls
>>> (unless their definition only refers to features of the ABI, such
>>> as the various _Check macros)
>> Including Py_INCREF()/Py_DECREF() ?
> I believe so - MvL deliberately left the fields that the ref counting
> relies on as part of the ABI.

Hmm, another slow-down. This one has even more impact if you're
writing extensions that have to deal with lots of objects.

>>> Excluded Functions
>>> ------------------
>>> Functions declared in the following header files are not part
>>> of the ABI:
>>> - cellobject.h
>>> - classobject.h
>>> - code.h
>>> - frameobject.h
>>> - funcobject.h
>>> - genobject.h
>>> - pyarena.h
>>> - pydebug.h
>>> - symtable.h
>>> - token.h
>>> - traceback.h
>> I don't think that's feasible: you basically remove all introspection
>> functions that way.
>> This will need a more fine-grained approach.
> I don't think it is reasonable to expect the introspection interfaces to
> remain stable at a binary level across versions.
> Having "I want deep introspection support from C" and "I want to use a
> single binary for multiple Python versions" be mutually exclusive
> choices sounds like a perfectly sensible position to me.
> Also, keep in mind that even an extension module that restricts itself
> to Py_LIMITED_API would still be able to call in to the Python
> equivalents via PyObject_Call and friends (e.g. by importing and using
> the inspect and traceback modules).

Sure, but they'd also want to print tracebacks or raise fatal
errors if necessary.

>> What if you mix extensions that use the full C-API with ones
>> that restrict themselves to the limited version ?
>> Would creating a Python object in a full-API extension and
>> free'ing it in a limited-API extension cause problems ?
> Possibly, if you end up mixing C runtimes in the process. Specifically:
> 1. Python linked with MSVCRT X
> 2. Full extension module linked with MSVCRT Y
> 3. Limited extension module linked with MSVCRT Z
> The PyMem/PyObject APIs in the limited extension module will use the
> heap in MSVCRT X, since they will be redirected through the Python
> stable ABI as function calls. However, if the full extension module uses
> the macro forms and links with the wrong MSVCRT version, then you have
> the usual opportunities for conflicts between the two C runtimes.
> This isn't a problem created by defining a stable ABI though - it's the
> main reason mixing C runtimes is a bad idea. (The two others we have
> noted so far being IO issues, especially attempting to share FILE*
> instances and the fact that changing the locale will only affect
> whichever runtime the extension module linked against).

Of course, but the stable ABI encourages mixing extensions
regardless of what runtime they were compiled with.

This is not much of an issue if the C runtime DLL doesn't change
between releases, but it becomes a problem when it does, e.g.
due to an upgrade to a new MSVC++ compiler version or in case
the extension was downloaded pre-compiled from pypi or some
other site.

I think the module import API should check for possible
incompatibilities here and issue a warning (much like it does
now for differences in the Python API version).

>>> Good point. As a separate issue, I would actually like to deprecate,
>>> then remove these APIs. I had originally hoped that this would happen
>>> for 3.0 already, alas, nobody worked on it.
>>> In any case, I have removed them from the ABI now.
>> How do you expect Python extensions to allocate memory and objects
>> in a platform independent way without those APIs ?
>> And as an aside: Which API families are you referring to ? PyMem_Malloc,
>> PyObject_Malloc, or PyObject_New ?
> The ones with a FILE* parameter in the signature. There's no problem
> with the PyMem/PyObject functions since those will be redirected to
> consistently use the version of the C runtime that Python was originally
> linked against (their macro counterparts are obviously off limits for
> the stable ABI).

Ah, ok.

Marc-Andre Lemburg


From mal at  Tue May 26 18:42:37 2009
From: mal at (M.-A. Lemburg)
Date: Tue, 26 May 2009 18:42:37 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis wrote:
>> Now, with the PEP, I have a feeling that the Python C-API
>> will in effect be limited to what's in the PEP's idea of
>> a usable ABI and open up the non-included public C-APIs
>> to the same rate of change as the private APIs.
> That's certainly not the plan. Instead, the plan is to have
> a stable ABI. The policy on the API isn't affected, except
> for restricting changes to the API that would break the ABI.

Thanks for clarifying this.

>>> During the compilation of applications, the preprocessor macro
>>> Py_LIMITED_API must be defined. Doing so will hide all definitions
>>> that are not part of the ABI.
>> So extensions wanting to use the full Python C-API as documented
>> in the C-API docs will still be able to do this, right ?
> Correct. They would link to the version-specific DLL on Windows.


>>> The structure of type objects is not available to applications;
>>> declaration of "static" type objects is not possible anymore
>>> (for applications using this ABI).
>> Hmm, that's going to create big problems for extensions that
>> want to expose a C-API for their types: Type checks are normally
>> done by pointer comparison using those static type objects.
> I don't see the problem. During module initialization, you
> create the type object and store it in a global variable, and
> then both clients and the module compare against the stored
> pointer.

Ah, good point !

>>> Function-like macros (in particular, field access macros) remain
>>> available to applications, but get replaced by function calls
>>> (unless their definition only refers to features of the ABI, such
>>> as the various _Check macros)
>> Including Py_INCREF()/Py_DECREF() ?
> Yes, although some people are requesting that these become functions.

I'd opt against that, simply because it creates a lot of overhead
due to the function call and issues with cache locality.

>>> Excluded Functions
>>> ------------------
>>> Functions declared in the following header files are not part
>>> of the ABI:
>>> - cellobject.h
>>> - classobject.h
>>> - code.h
>>> - frameobject.h
>>> - funcobject.h
>>> - genobject.h
>>> - pyarena.h
>>> - pydebug.h
>>> - symtable.h
>>> - token.h
>>> - traceback.h
>> I don't think that's feasible: you basically remove all introspection
>> functions that way.
>> This will need a more fine-grained approach.
> What specifically is it that you want to do in a module that you
> couldn't do anymore?

See my reply to Nick: some of the functions are needed even
if you don't want to do introspection, such as Py_FatalError()
or PyTraceBack_Print().

BTW: Given the headline, I take it that the various type checking
macros in these headers will still be available, right ?

>>> On Windows, applications shall link with python3.dll;
>> You mean: extensions that were compiled with Py_LIMITED_API, right ?
> Correct, see "Terminology" in the PEP.

Good, thanks.

>>> an import
>>> library python3.lib will be available. This DLL will redirect all of
>>> its API functions through /export linker options to the full
>>> interpreter DLL, i.e. python3y.dll.
>> What if you mix extensions that use the full C-API with ones
>> that restrict themselves to the limited version ?
> Some link against python3.dll, others against python32.dll (say).
>> Would creating a Python object in a full-API extension and
>> free'ing it in a limited-API extension cause problems ?
> No problem that I can see.

Can we be sure that the MSVCRT used by python35.dll stays compatible
with the one used by say python32.dll ? What if the CRT memory
management changes between MSVCRT versions ?

Another aspect to consider:

How will this work in the light of having multiple copies of
Python installed on a Windows machine ?

The implementation section suggests that python3.dll would always
redirect to the python3x.dll for which it was installed, ie. if
I have Python 3.5 installed, but then need to run some app with
Python 3.2, the installed python3.dll would then point back to the

Now, if I start a Python 3.5 application which uses a limited
API extension, this would try to load python32.dll into the
Python 3.5 process. AFAIK, that's not possible due to the
naming conflicts.

>>> This PEP will be implemented in a branch, allowing users to check
>>> whether their modules conform to the ABI. To simplify this testing, an
>>> additional macro Py_LIMITED_API_WITH_TYPES will expose the existing
>>> type object layout, to let users postpone rewriting all types. When
>>> this branch is merged into the 3.2 code base, this macro will
>>> be removed.
>> Now I'm confused again: this sounds a lot like you do want all extension
>> writers to only use the limited API.
> I certainly want to support as many modules as reasonable with the PEP.
> Whether or not developers then choose to build version-independent
> binaries is certainly outside the scope of the PEP - it only specifies
> action items for Python, not for application authors.

Thanks for the clarification.

>>>> Something I haven't seen explicitly mentioned as yet (in the PEP or the
>>>> python-dev list discussion) are the memory management APIs and the FILE*
>>>> APIs which can cause the MSVCRT versioning issues on Windows.
>>>> Those would either need to be excluded from the stable ABI or else
>>>> changed to use opaque pointers.
>>> Good point. As a separate issue, I would actually like to deprecate,
>>> then remove these APIs. I had originally hoped that this would happen
>>> for 3.0 already, alas, nobody worked on it.
>>> In any case, I have removed them from the ABI now.
>> How do you expect Python extensions to allocate memory and objects
>> in a platform independent way without those APIs ?
> I have only removed functions from the ABI that have FILE* in their
> signatures.
>> And as an aside: Which API families are you referring to ? PyMem_Malloc,
>> PyObject_Malloc, or PyObject_New ?
> Neither. PyRun_AnyFileFlags and friends.


Marc-Andre Lemburg


From martin at  Tue May 26 20:31:16 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 May 2009 20:31:16 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

>>>> The structure of type objects is not available to applications;
>>>> declaration of "static" type objects is not possible anymore
>>>> (for applications using this ABI).
>>> Hmm, that's going to create big problems for extensions that
>>> want to expose a C-API for their types: Type checks are normally
>>> done by pointer comparison using those static type objects.
>> They would just have to expose "MyExtensionPrefix_MyType_Check" and
>> "MyExtensionPrefix_MyType_CheckExact" functions the same way that types
>> in the C API do.
> Hmm, that's a function call per type check and will slow things
> down a lot, esp. when working with APIs that deal a lot with
> these objects.

See my other response. You can continue to provide _Check
macros; knowledge of the structure of types is not necessary to
perform such checks.

> The typical way to implement these type checks is via a simple
> pointer comparison (falling back to a function for sub-types).
> That's cheap and fast.

And will continue to be available to ABI-compliant extensions.

>>> Including Py_INCREF()/Py_DECREF() ?
>> I believe so - MvL deliberately left the fields that the ref counting
>> relies on as part of the ABI.
> Hmm, another slow-down.

??? Why is "no change" a slow-down?

> This is not much of an issue if the C runtime DLL doesn't change
> between releases, but it becomes a problem when they do e.g.
> due to an upgrade to a new MSVC++ compiler version or in case
> the extension was downloaded pre-compiled from pypi or some
> other site.

What problem specifically may occur?


From martin at  Tue May 26 20:54:35 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 May 2009 20:54:35 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

>>>> Functions declared in the following header files are not part
>>>> of the ABI:
>>>> - cellobject.h
>>>> - classobject.h
>>>> - code.h
>>>> - frameobject.h
>>>> - funcobject.h
>>>> - genobject.h
>>>> - pyarena.h
>>>> - pydebug.h
>>>> - symtable.h
>>>> - token.h
>>>> - traceback.h
>>> I don't think that's feasible: you basically remove all introspection
>>> functions that way.
>>> This will need a more fine-grained approach.
>> What specifically is it that you want to do in a module that you
>> couldn't do anymore?
> See my reply to Nick: some of the functions are needed even
> if you don't want to do introspection, such as Py_FatalError()

Ok. I don't know what Py_FatalError is doing in pydebug.h, so I
now propose to move it to pyerrors.h.

> or PyTraceBack_Print().

Ok; I have removed traceback.h from the list. By the other rules
of the PEP, the only function that becomes available then is

> BTW: Given the headline, I take it that the various type checking
> macros in these header will still be available, right ?

Which headers? The one on the list above? No; my idea would
be to completely hide them as-is.

All other type-checking macros will remain available, and
will remain being macros.

>>> Would creating a Python object in a full-API extension and
>>> free'ing it in a limited-API extension cause problems ?
>> No problem that I can see.
> Can we be sure that the MSVCRT used by python35.dll stays compatible
> to the one used by say python32.dll ? What if the CRT memory
> management changes between MSVCRT versions ?

It doesn't matter. For Python "things", the extension module will
use the pymem.h functions, which get routed through pythonxy.dll
to the CRT that Python was built with.

If the extension uses regular malloc(), it should also invoke
regular free() on the pointer. There is no API where Python
calls malloc directly and the extension calls free, or vice versa.

> How will this work in the light of having multiple copies of
> Python installed on a Windows machine ?

Interesting question. One solution could be to use SxS, which
would allow multiple concurrent installations of python3.dll,
although we would need to make sure it always binds to the
"right" one in each context.

Another solution could be to keep the various copies of python3.dll
in their respective PYTHONHOMEs, and leave it to python.exe or the
app to load the right one; any subsequent extension modules should
then pick up the one that was already loaded.

> The implementation section suggests that python3.dll would always
> redirect to the python3x.dll for which it was installed, ie. if
> I have Python 3.5 installed, but then need to run some app with
> Python 3.2, the installed python3.dll would then point back to the
> python32.dll.

That depends on where they get installed. If they all go into system32,
only the most recent one would be available, which is probably not

> Now, if I start a Python 3.5 application which uses a limited
> API extension, this would try to load python32.dll into the
> Python 3.5 process. AFAIK, that's not possible due to the
> naming conflicts.

I don't see this problem. As long as we manage to install multiple
versions of python3.dll on the system somehow, different processes
could certainly load different such DLLs, and the same extension
module would always use the right one.


From phillip.sitbon+python-dev at  Tue May 26 21:48:49 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Tue, 26 May 2009 12:48:49 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
Message-ID: <>

Hi everyone,

I'm new to the list but I've been embedding Python and working very
closely with the core sources for many years now. I discovered Python
a long time ago when I needed to embed a scripting language and found
the PHP sources... unreadable ;)

Anyway, I'd like to ask something that may have been asked already, so
I apologize if this has been covered.

Instead of removing the GIL, has anyone thought of making it more
lightweight? The current situation for Windows is that the
single-thread case is decently fast (via interlocked operations), but
it drops to using an event object in the case of contention. (see

Now, I don't have any specific evidence aside from my experience in
Windows multithreaded programming, but event objects are often
considered the slowest synchronization mechanism available. So, what
are the alternatives? Mutexes or critical sections. Semaphores too, if
you want to get fancy, but I digress.

Because mutexes have the capability of inter-process locking, which we
don't need, critical sections fit the bill as a lightweight locking
mechanism. They work in a way similar to how the Python GIL is
handled: first, attempt an interlocked operation, and if another
thread owns the lock, wait on a kernel object. They are known to be
extremely fast.

There are some catches with using a critical section instead of the
current method:

1. It is recursive, while the current GIL setup is not. Would it break
Python to support (or deal with) recursive behavior at the GIL level?
Note that we can still disallow recursion and fail because we know if
the current thread is the lock owner, but the return from the lock
function is usually only checked when the wait parameter is zero
(meaning "don't block, just try to acquire"). The biggest problem I
see here is how mixing the PyGILState_* API with multiple interpreters
will behave: when PyGILState_Ensure() is called while the GIL is held
for a thread state under an interpreter other than the main
interpreter, it tries to re-lock the GIL. This would normally cause a
deadlock, but the best we could do with a critical section is have the
call fail and/or increase a recursion counter. If maintaining behavior
is absolutely necessary, I guess it would be pretty easy to force a
deadlock. Personally, I would prefer a Py_FatalError or something like
that.

2. Backwards incompatibility: TryEnterCriticalSection isn't available
pre-NT4, so Windows 95 support is broken. Microsoft doesn't support or
even mention it in the list of supporting OSes for their API functions
anymore, so... non-issue? Some of the data structure is available to
us, so I bet it would be easy to implement the function manually.

3. ?? - I'm sure there are other issues that deserve a look.
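
The recursion difference in point 1 can be seen from Python itself. This is only a sketch: threading.Lock and threading.RLock are stand-ins assumed here for the current non-recursive lock and for a recursive critical section, not the Windows primitives the patch touches, and try_reacquire is a hypothetical helper name.

```python
import threading

def try_reacquire(lock):
    """Acquire `lock`, then try a second, non-blocking acquire in the same thread.

    Returns True if the nested acquire succeeded (recursive behaviour).
    """
    lock.acquire()
    try:
        # Wait flag of zero: "don't block, just try to acquire".
        got_it_again = lock.acquire(False)
        if got_it_again:
            lock.release()  # undo the nested acquire
        return got_it_again
    finally:
        lock.release()

non_recursive = try_reacquire(threading.Lock())   # stand-in for the current lock
recursive = try_reacquire(threading.RLock())      # stand-in for a critical section
print(non_recursive, recursive)
```

With the wait flag at zero the caller can tell the two apart from the return value. With a blocking acquire, the plain lock would simply hang on the nested call.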

I've given this a shot already while doing some concurrency testing
with my ISAPI extension (PyISAPIe). First of all, nothing looks broken
yet. I'm using my modified python26.dll to run all of my Python code
and trying to find anywhere it could possibly break. For multiple
concurrent requests against a single multithreaded ISAPI handler
process, I see a statistically significant speed increase depending on
how much Python code is executed. With more Python code executed (e.g.
a Django page), the speedup was about 2x. I haven't tested with varied
values for _Py_CheckInterval aside from finding a sweet spot for my
specific purposes, but using 100 (the default) would likely make the
performance difference more noticeable. A spin mutex also does well,
but the results vary a lot more.

Just as a disclaimer, my tests were nowhere near scientific, but if
anyone needs convincing I can come up with some actual measurements. I
think at this point most of you are wondering more about what it would
break.

Hopefully I haven't wasted anyone's time - I just wanted to share what
I see as a possibly substantial improvement to Python's core. Let me
know if you're interested in a patch to use for your own testing.



From solipsis at  Tue May 26 21:57:53 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 26 May 2009 19:57:53 +0000 (UTC)
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
References: <>
Message-ID: <>


> Hopefully I haven't wasted anyone's time - I just wanted to share what
> I see as a possibly substantial improvement to Python's core. let me
> know if you're interested in a patch to use for your own testing.

You should definitely open a bug entry in There, post
your patch, some explanations and preferably a quick way (e.g. a simple script)
 of reproducing the speedups (without having to install a third-party library or
extension, that is).
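
A script of the kind asked for here might look like the sketch below. It is not the poster's test: burn and timed_run are hypothetical names, and the workload (a pure-Python busy loop in several threads) is only an assumption about what would keep the GIL contended without needing any third-party extension.

```python
import threading
import time

def burn(n):
    """Pure-Python busy loop; it holds the GIL except at periodic switch points."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_run(num_threads, iterations):
    """Run `burn` in `num_threads` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=burn, args=(iterations,))
               for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == "__main__":
    # Compare a stock build against a patched build with the same script.
    for n in (1, 2, 4):
        print("%d thread(s): %.3f s" % (n, timed_run(n, 200000)))
```

Running the same file under both interpreter builds and comparing the timings for two or more threads would give a first, rough impression of the contended case.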



From v+python at  Tue May 26 22:01:41 2009
From: v+python at (Glenn Linderman)
Date: Tue, 26 May 2009 13:01:41 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On approximately 5/26/2009 12:48 PM, came the following characters from 
the keyboard of Phillip Sitbon:
> Hi everyone,
> I'm new to the list but I've been embedding Python and working very
> closely with the core sources for many years now. I discovered Python
> a long time ago when I needed to embed a scripting language and found
> the PHP sources... unreadable ;)


> I've given this a shot already while doing some concurrency testing
> with my ISAPI extension (PyISAPIe). First of all, nothing looks broken
> yet. I'm using my modified python26.dll to run all of my Python code
> and trying to find anywhere it could possibly break. For multiple
> concurrent requests against a single multithreaded ISAPI handler
> process, I see a statistically significant speed increase depending on
> how much Python code is executed. With more Python code executed (e.g.
> a Django page), the speedup was about 2x. I haven't tested with varied
> values for _Py_CheckInterval aside from finding a sweet spot for my
> specific purposes, but using 100 (the default) would likely make the
> performance difference more noticeable. A spin mutex also does well,
> but the results vary a lot more.
> Just as a disclaimer, my tests were nowhere near scientific, but if
> anyone needs convincing I can come up with some actual measurements. I
> think at this point most of you are wondering more about what it would
> break.
> Hopefully I haven't wasted anyone's time - I just wanted to share what
> I see as a possibly substantial improvement to Python's core. let me
> know if you're interested in a patch to use for your own testing.

I wonder if the patch could be structured as a conditional compilation?
You know how many different spots are touched, and how many lines per
spot.

If it could be, then theoretically it could be released and people could
do lots of comparative stress testing with different workloads.

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From martin at  Tue May 26 22:07:23 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 26 May 2009 22:07:23 +0200
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> 3. ?? - I'm sure there are other issues that deserve a look.

What about fairness? I don't know off-hand whether the GIL is
fair, or whether critical sections are fair, but it needs to be
considered.


From phillip.sitbon+python-dev at  Tue May 26 23:00:10 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Tue, 26 May 2009 14:00:10 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> You should definitely open a bug entry in There, post
> your patch, some explanations and preferably a quick way (e.g. a simple script)
>  of reproducing the speedups (without having to install a third-party library or
> extension, that is).

I'll get started on that. I'm assuming I should generate a patch from
the trunk (2.7)? The file doesn't look different, but I want to make
sure I get it from the right place.

> I wonder if the patch could be structured as a conditional compilation?  You
> know how many different spots are touched, and how many lines per spot.
> If it could be, then theoretically it could be released and people could  do
> lots of comparative stress testing with different workloads.

That would be easy to do, because I am just replacing the
*NonRecursiveMutex functions.

> What about fairness? I don't know off-hand whether the GIL is
> fair, or whether critical sections are fair, but it needs to be
> considered.

If you define fairness in this context as not starving other threads
while consuming resources, that is built into the interpreter via
sys.setcheckinterval() and also anywhere the GIL is released for I/O.
What might be interesting is to see if releasing a critical section
and immediately re-acquiring it every _Py_CheckInterval bytecode
operations behaves in a similar manner (see ceval.c, line 869). My
best guess right now is that it will behave as expected when not using
the spin-based critical section. AFAIK, the kernel processes waiters
in a FIFO manner without regard to priority. Because a guarantee of
mutual exclusion is absolutely necessary, it's up to applications to
provide fairness. Python does a decent job of this.

- Phillip

From tlesher at  Tue May 26 23:03:44 2009
From: tlesher at (Tim Lesher)
Date: Tue, 26 May 2009 17:03:44 -0400
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 26, 2009 at 16:07, "Martin v. Löwis" <martin at> wrote:
>> 3. ?? - I'm sure there are other issues that deserve a look.
> What about fairness? I don't know off-hand whether the GIL is
> fair, or whether critical sections are fair, but it needs to be
> considered.

FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't
guarantee a defined order of wakeup among threads of equal priority.

Tim Lesher <tlesher at>

From solipsis at  Tue May 26 23:09:30 2009
From: solipsis at (Antoine Pitrou)
Date: Tue, 26 May 2009 21:09:30 +0000 (UTC)
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
References: <>
Message-ID: <>

Martin v. Löwis <martin <at>> writes:
> What about fairness? I don't know off-hand whether the GIL is
> fair,

According to a past discussion on this list, the current implementation isn't:
(at least on the poster's system)



From phillip.sitbon+python-dev at  Tue May 26 23:45:57 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Tue, 26 May 2009 14:45:57 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> FWIW, Win32 CriticalSections are guaranteed to be fair, but they don't
> guarantee a defined order of wakeup among threads of equal priority.

Indeed, I should have quoted the MSDN docs:

"The threads of a single process can use a critical section object for
mutual-exclusion synchronization. There is no guarantee about the
order in which threads will obtain ownership of the critical section,
however, the system will be fair to all threads."

I read somewhere else that the FIFO order is present, but obviously we
shouldn't expect that if it's not documented as such.

> According to a past discussion on this list, the current implementation isn't:
> (at least on the poster's system)

I believe he's only talking about Linux. Apples & oranges when it
comes to stuff like this, although it still justifies looking into
what happens every _Py_CheckInterval on Windows.

- Phillip

From martin at  Wed May 27 01:24:02 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 27 May 2009 01:24:02 +0200
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>	<>
Message-ID: <>

> If you define fairness in this context as not starving other threads
> while consuming resources, that is built into the interpreter via
> sys.setcheckinterval() and also anywhere the GIL is released for I/O.
> What might be interesting is to see if releasing a critical section
> and immediately re-acquiring it every _Py_CheckInterval bytecode
> operations behaves in a similar manner (see ceval.c, line 869). My
> best guess right now is that it will behave as expected when not using
> the spin-based critical section. AFAIK, the kernel processes waiters
> in a FIFO manner without regard to priority. Because a guarantee of
> mutual exclusion is absolutely necessary, it's up to applications to
> provide fairness. Python does a decent job of this.

No: fairness in mutex synchronization means that every waiter for the
mutex will eventually acquire it; it won't happen that one thread
starves waiting for the mutex. This is something that the mutex needs to
provide, not the application.
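
To make the property concrete, here is a sketch of a lock that provides it by construction. TicketLock is a hypothetical class written for illustration; it is not how the GIL or a Windows critical section is implemented. It also enforces strict first-come, first-served order, which is stronger than the "every waiter eventually acquires it" definition above.

```python
import threading

class TicketLock(object):
    """Hypothetical FIFO lock: waiters are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            while ticket != self._now_serving:
                self._cond.wait()
        return ticket

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

lock = TicketLock()
served = []

def worker():
    ticket = lock.acquire()
    served.append(ticket)   # runs while holding the lock
    lock.release()

lock.acquire()              # main thread takes ticket 0
workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
lock.release()              # let the queue drain
for w in workers:
    w.join()
print(served)               # [1, 2, 3, 4]
```

No waiter can starve here because a ticket, once taken, is always reached. The application code (worker) does nothing to ensure that; the guarantee lives entirely in the lock.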


From martin at  Wed May 27 01:36:54 2009
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 27 May 2009 01:36:54 +0200
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>	<>
Message-ID: <>

>> According to a past discussion on this list, the current implementation isn't:
>> (at least on the poster's system)
> I believe he's only talking about Linux. Apples & oranges when it
> comes to stuff like this

Please trust Antoine that it's relevant: if the current implementation
isn't fair on Linux, there is no need for the new implementation to be
fair on Windows.


From phillip.sitbon+python-dev at  Wed May 27 02:42:39 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Tue, 26 May 2009 17:42:39 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> No: fairness in mutex synchronization means that every waiter for the
> mutex will eventually acquire it; it won't happen that one thread
> starves waiting for the mutex. This is something that the mutex needs to
> provide, not the application.

Right, I guess I was thinking of it in terms of needing to release the
mutex at some point in order for it to be later acquired.

> Please trust Antoine that it's relevant: if the current implementation
> isn't fair on Linux, there is no need for the new implementation to be
> fair on Windows.

Fair enough.


While setting up my patch, I'm noticing something that could be
potentially bad for this idea that I overlooked until just now. I'm
going to hold off on submitting a ticket unless others suggest it's a
better idea to keep this discussion going there.

The thread module's lock object uses the same code used to lock and
unlock the GIL. By replacing the current locking mechanism with a
critical section, it'd be breaking the expected functionality of the
lock object, specifically two cases:

1. Blocking recursion: Critical sections don't block on recursion, no
way to enforce that
2. Releasing: Currently any thread can release a lock, but only the
owner can release a critical section

Of course blocking recursion is only meaningful with the current
behavior of #2, otherwise it's an unrecoverable deadlock.
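
Case 2 can be sketched with the standard threading module. As an assumption for illustration, threading.Lock plays the role of the current lock object and threading.RLock plays the role of an owner-checked critical section; release_from_other_thread is a hypothetical helper.

```python
import threading

def release_from_other_thread(lock):
    """Acquire `lock` here, then try to release it from a second thread.

    Returns None on success, or the exception raised in the other thread.
    """
    result = []

    def releaser():
        try:
            lock.release()
            result.append(None)
        except RuntimeError as exc:
            result.append(exc)

    lock.acquire()
    t = threading.Thread(target=releaser)
    t.start()
    t.join()
    if result[0] is not None:
        lock.release()  # the foreign release failed, so the owner cleans up
    return result[0]

plain = release_from_other_thread(threading.Lock())
owned = release_from_other_thread(threading.RLock())
print(plain, type(owned).__name__)
```

The plain lock accepts the release from a thread that never acquired it, while the owner-checked lock refuses with RuntimeError. Code that depends on the first behavior would break if the lock object were swapped for an owner-checked primitive.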

There are a few solutions to this. The first would be to implement
only the GIL as a critical section. The problem then is the need to
change all of the core code that does not use
PyEval_Acquire/ReleaseLock (there is some, right?), which is the best
place to use something other than the thread module's locking
mechanism on the GIL. This is doable with some effort, but clearly not
an option if there is any possibility that extensions are using
something other than the PyThreadState_*, PyGILState_* and PyEval_*
APIs to manipulate the GIL (are there others?). After any of this, of
course, I wonder what kind of crazy things might be expected of the
GIL externally that requires its behavior to remain as it is.

The second solution would be to use semaphores. I can't say yet if it
would be worth it performance-wise so I will refrain from conjecture
for the moment.

I like the first solution above... I don't know why non-recursion
would be necessary for the GIL; clearly it would be a little more
involved, but if I can demonstrate the performance gain maybe it's
worth my time.

- Phillip

From kristjan at  Wed May 27 11:23:00 2009
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 27 May 2009 09:23:00 +0000
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

I've often thought of this.
The problem is that the GIL uses the regular python "lock" which has to be non-recursive, since it is used for synchronization operations other than mutual exclusion, e.g. one thread going to sleep, and another waking it up.
Now, we could easily create another class of locks, a python "mutex"  or a "critical section" even, which is allowed (but not required) to be recursive.  On other platforms, this could fall back to being the good old lock.  Requiring it to be recursive would mean that we would need implementations for all platforms.  Which is possible, I suppose, building on the old python lock...

For the GIL, we would then use a python "mutex" or "critical section" whichever you prefer.

Note that for the GIL, if you use a CriticalSection object, you should initialize its "spincount" to zero, because the GIL is almost always in contention.  That is, if you don't get the GIL right away, you won't for a while.
I don't know what kernel primitive the Critical Section  uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing.
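
The "one thread going to sleep, and another waking it up" use of a lock can be sketched as follows. OneShotSignal is a hypothetical class for illustration only; it relies on exactly the two properties under discussion: acquiring a held lock blocks, and any thread may release it.

```python
import threading

class OneShotSignal(object):
    """Hypothetical one-shot wake-up built on a plain non-recursive lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lock.acquire()        # start "unsignalled": the lock is held

    def wait(self):
        self._lock.acquire()        # sleeps until some thread calls wake()
        self._lock.release()        # let any later waiter through as well

    def wake(self):
        self._lock.release()        # any thread may release a plain lock

signal = OneShotSignal()
events = []

def sleeper():
    signal.wait()
    events.append("woken")

t = threading.Thread(target=sleeper)
t.start()
events.append("waking")
signal.wake()
t.join()
print(events)                       # ['waking', 'woken']
```

If the underlying lock were recursive and the thread that created the signal also called wait(), the call would return immediately instead of sleeping, which is why this kind of use needs the non-recursive lock.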



From ncoghlan at  Wed May 27 13:17:55 2009
From: ncoghlan at (Nick Coghlan)
Date: Wed, 27 May 2009 21:17:55 +1000
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>
	<>	<>
	<> <>
Message-ID: <>

>>>>> Function-like macros (in particular, field access macros) remain
>>>>> available to applications, but get replaced by function calls
>>>>> (unless their definition only refers to features of the ABI, such
>>>>> as the various _Check macros)
>>>> Including Py_INCREF()/Py_DECREF() ?
>>> I believe so - MvL deliberately left the fields that the ref counting
>>> relies on as part of the ABI.
>> Hmm, another slow-down.
> ??? Why is "no change" a slow-down?

That was just a miscommunication - I misunderstood the sense in which
MAL was using "Including". He was referring to the first part of the
paragraph from the PEP (most macros become functions), but I answered
assuming he was referring to the part in parentheses (some macros get to
stay).

So to be perfectly clear: the Py_INCREF/Py_DECREF macros are available
as part of the stable ABI because they qualify for the PEP's "definition
only refers to features of the ABI" exception.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Wed May 27 13:24:02 2009
From: ncoghlan at (Nick Coghlan)
Date: Wed, 27 May 2009 21:24:02 +1000
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> No: fairness in mutex synchronization means that every waiter for the
> mutex will eventually acquire it; it won't happen that one thread
> starves waiting for the mutex. This is something that the mutex needs to
> provide, not the application.

CriticalSections are first come first served on Windows, just like a
regular mutex.  As Phillip already noted, their main limitation is that
they don't work cross-process (of course, that's also where they get
their extra speed).

Since we don't need the cross-process feature and we don't support Win
9x any more, this is certainly an idea worth looking at.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mal at  Wed May 27 14:05:06 2009
From: mal at (M.-A. Lemburg)
Date: Wed, 27 May 2009 14:05:06 +0200
Subject: [Python-Dev] PEP 384: Defining a Stable ABI
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

Nick Coghlan wrote:
> [PEP]
>>>>>> Function-like macros (in particular, field access macros) remain
>>>>>> available to applications, but get replaced by function calls
>>>>>> (unless their definition only refers to features of the ABI, such
>>>>>> as the various _Check macros)
> [MAL]
>>>>> Including Py_INCREF()/Py_DECREF() ?
> [Nick]
>>>> I believe so - MvL deliberately left the fields that the ref counting
>>>> relies on as part of the ABI.
> [MAL]
>>> Hmm, another slow-down.
> [MvL]
>> ??? Why is "no change" a slow-down?
> That was just a miscommunication - I misunderstood the sense in which
> MAL was using "Including". He was referring to the first part of the
> paragraph from the PEP (most macros become functions), but I answered
> assuming he was referring to the part in parentheses (some macros get to
> stay).
> So to be perfectly clear: the Py_INCREF/Py_DECREF macros are available
> as part of the stable ABI because they qualify for the PEP's "definition
> only refers to features of the ABI" exception.

Sorry for the confusion.

The exclusion clause in the PEP should probably be replaced by
an explicit list of macros which are made available.

It is not necessarily obvious that a macro only uses features
made available through the ABI without actually digging through
the headers. In the case of Py_INCREF()/Py_DECREF() the
macros do use private macros which the ABI omits.

Marc-Andre Lemburg


From aahz at  Wed May 27 14:39:55 2009
From: aahz at (Aahz)
Date: Wed, 27 May 2009 05:39:55 -0700
Subject: [Python-Dev] Arguments of MatchObject in re module
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, May 26, 2009, MRAB wrote:
> >>> p = re.compile("foo")
> >>> help(p.match)
> Help on built-in function match:
> match(...)
>     match(string[, pos[, endpos]]) --> match object or None.
>     Matches zero or more characters at the beginning of the string
> >>> p.match(string="foo")
> Traceback (most recent call last):
>   File "<pyshell#8>", line 1, in <module>
>     p.match(string="foo")
> TypeError: Required argument 'pattern' (pos 1) not found
> The name of the first argument should be "string", yet it's "pattern".
> Does anyone know if it's anything other than a mistake? Should it be
> fixed in the next version of the re module, or are we just stuck with it
> (and should just change the docstring to match)?

Please file a report on so this doesn't get lost.
Attaching a suggested patch for _sre.c would be most welcome.
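
For anyone who wants to check an interpreter quickly, the report can be condensed into the sketch below. It makes no claim about which result a given version produces: the keyword call is wrapped so that either outcome is recorded rather than assumed.

```python
import re

p = re.compile("foo")

# Positional call: works regardless of what the keyword is named.
m = p.match("foo")

# Keyword call: succeeds only if the implementation names the argument "string".
try:
    keyword_ok = p.match(string="foo") is not None
except TypeError:
    keyword_ok = False   # the behaviour reported in the quoted session

print(m.group(0), keyword_ok)
```

Passing the subject string positionally sidesteps the naming question entirely, so existing code written that way is unaffected whichever way the mismatch is resolved.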
Aahz (aahz at           <*>

"In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles:  boring syntax, unsurprising semantics,
few automatic coercions, etc etc.  But that's one of the things I like
about it."  --Tim Peters on Python, 16 Sep 1993

From curt at  Wed May 27 14:59:48 2009
From: curt at (Curt Hagenlocher)
Date: Wed, 27 May 2009 05:59:48 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, May 27, 2009 at 4:24 AM, Nick Coghlan <ncoghlan at> wrote:
> CriticalSections are first come first served on Windows, just like a
> regular mutex.

"Starting with Windows Server 2003 with Service Pack 1 (SP1), threads
waiting on a critical section do not acquire the critical section on a
first-come, first-serve basis."

Windows critical sections use events for kernel-level synchronization.
The user-mode code basically consists of an interlocked instruction
inside the spin loop. When the likelihood of contention is low, a
critical section should be a big win because it won't need to switch
into the kernel. I suspect that contention will be frequent for the
GIL.

A good description of pre-Vista Windows critical sections can be found here:

Curt Hagenlocher
curt at

From phillip.sitbon+python-dev at  Thu May 28 00:22:52 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Wed, 27 May 2009 15:22:52 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

Heads up to those who were following, I did my best to clearly outline
the situation and direction in the tracker.

It includes a patch that will break the expected behavior of the
thread lock object but make it possible to test GIL performance.

> Note that for the GIL, if you use a CriticalSection object, you should initialize its "spincount" to zero, because the GIL is almost always in contention. That is, if you don't get the GIL right away, you won't for a while.

If I'm not mistaken, calling InitializeCriticalSection rather than
InitializeCriticalSectionAndSpinCount (gotta love those long function
names) sets the spin count to zero. I could tell when the spin count
wasn't zero as far as performance is concerned - spinning is too much
of a gamble in most contention situations.

> I don't know what kernel primitive the Critical Section  uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing.

Judging from the increase in speed and CPU utilization I've seen, I
don't believe this is the case. My guess is that it's something
similar to a futex.

- Phillip

From at  Thu May 28 02:02:57 2009
From: at (Brian de Alwis)
Date: Wed, 27 May 2009 18:02:57 -0600
Subject: [Python-Dev] Survey on DVCS usage and experience
Message-ID: <>

Hello everybody.  I'm Brett's former lab-mate, and am part of a
team conducting a survey to understand the perceived benefits and
challenges of using a decentralized or distributed version control
system (DVCS) in software development.

With Python having recently chosen to switch to Mercurial, I hoped
that any developers who've used a DVCS (and who are over 18 years
old) might like to participate in our survey and share your
experiences.  (We followed your extensive discussions on the switch
with great interest.)  Details on participating are below.  Thanks
for your time!


An increasing number of software projects have or are considering
switching their code repositories to a decentralized or distributed
VCS (DVCS).  There are many such DVCS tools, including git, bzr,
mercurial, monotone, or bitkeeper.  We are conducting a survey to
assess the perceived benefits and challenges of using a DVCS.  We
would ask any individuals who use or are comfortable using a
DVCS for managing the artifacts for a project to please consider
completing the survey.  The survey has several open-ended questions,
and may take up to 20 minutes to complete.

The data collected from this study will be used in articles for
publication in journals and conference proceedings.  The results
of this study will provide additional knowledge and guidance for
projects considering moving to using a DVCS.

This is an anonymous survey.  Any personal information divulged
in answering a question will be kept strictly confidential.

The survey is at:

Please feel free to redistribute this to other interested groups.

If you would like more detail about the survey, or information not
included here, please contact us.

    Brian de Alwis
    Department of Computer Science
    University of Saskatchewan at

This research has the ethical approval of the Research Ethics Office
at the University of Saskatchewan.  If you have any concerns about your
treatment or rights as a research subject, please contact the office
at 306-966-2084.


-- Brian de Alwis | HCI Lab | University of Saskatchewan
On bike helmets: "If you think your hair is more important than your  
brain, you're probably right."  (B. J. Wawrykow)

From kristjan at  Thu May 28 11:00:22 2009
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Thu, 28 May 2009 09:00:22 +0000
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

You are right, a small experiment confirmed that it is set to 0 (see SetCriticalSectionSpinCount())
I had assumed that a small non-zero value might be chosen on multiprocessor machines.

Do you think that the problem lies with the use of the "event" object as such?  Have you tried using a "semaphore" or "mutex" instead?  Or do you think that all of the synchronization primitives that rely on the WaitForMultipleObjects() api are subject to the same issue?



-----Original Message-----
From: at [ at] On Behalf Of Phillip Sitbon
Sent: 27. maí 2009 22:23
To: python-dev
Subject: Re: [Python-Dev] Making the GIL faster & lighter on Windows

If I'm not mistaken, calling InitializeCriticalSection rather than
InitializeCriticalSectionAndSpinCount (gotta love those long function
names) sets the spin count to zero. I could tell when the spin count
wasn't zero as far as performance is concerned - spinning is too much
of a gamble in most contention situations.

> I don't know what kernel primitive the Critical Section  uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing.

Judging from the increase in speed and CPU utilization I've seen, I
don't believe this is the case. My guess is that it's something
similar to a futex.

From jeremy at  Thu May 28 15:06:03 2009
From: jeremy at (Jeremy Hylton)
Date: Thu, 28 May 2009 09:06:03 -0400
Subject: [Python-Dev] question about docstring formatting
Message-ID: <>

A question came up at work about docstring formatting.  It relates to
the description of the summary line in PEP 257.
"""Multi-line docstrings consist of a summary line just like a
one-line docstring, followed by a blank line, followed by a more
elaborate description. The summary line may be used by automatic
indexing tools; it is important that it fits on one line and is
separated from the rest of the docstring by a blank line. The summary
line may be on the same line as the opening quotes or on the next
line. The entire docstring is indented the same as the quotes at its
first line (see example below)."""

It says that the summary line may be used by automatic indexing tools,
but is there any evidence that such a tool actually exists?  Or was
there once upon a time?  If there are no such tools, do we still think
that it is important that it fits on one line?


From glyph at  Thu May 28 15:45:30 2009
From: glyph at (glyph at
Date: Thu, 28 May 2009 13:45:30 -0000
Subject: [Python-Dev] question about docstring formatting
In-Reply-To: <>
References: <>
Message-ID: <>

On 01:06 pm, jeremy at wrote:
>It says that the summary line may be used by automatic indexing tools,
>but is there any evidence that such a tool actually exists?  Or was
>there once upon a time?  If there are no such tools, do we still think
>that it is important that it fits on one line?

For what it's worth, appears to do this, 
as you can see from the numerous truncated sentences on 

I suspect a more reasonable approach for automatic documentation 
generators would be to try to identify the first complete sentence, 
rather than the first line... but, this is at least an accurate 
description of the status quo for some tools :).
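
As a rough sketch of the difference between the two approaches (the
helper names first_line and first_sentence are hypothetical, and the
sentence detection is a deliberately naive regular expression):

```python
import inspect
import re


def first_line(obj):
    """Return the first line of obj's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""


def first_sentence(obj):
    """Return the first paragraph's text up to the first '.', '!' or '?'."""
    doc = inspect.getdoc(obj) or ""
    # Collapse the first paragraph onto one line before searching it.
    first_paragraph = " ".join(doc.split("\n\n", 1)[0].split())
    match = re.match(r"(.+?[.!?])(\s|$)", first_paragraph)
    return match.group(1) if match else first_paragraph


def example():
    """Return the answer to a question that takes more than one
    line to state.

    The longer description follows the blank line.
    """


print(first_line(example))      # stops at the line break, mid-sentence
print(first_sentence(example))  # keeps going to the period
```

A tool that takes the first line produces the truncated form; one that
looks for the first sentence recovers the whole summary even when it
wraps.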

From goodger at  Thu May 28 15:29:25 2009
From: goodger at (David Goodger)
Date: Thu, 28 May 2009 09:29:25 -0400
Subject: [Python-Dev] question about docstring formatting
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 28, 2009 at 09:06, Jeremy Hylton <jeremy at> wrote:
> A question came up at work about docstring formatting.  It relates to
> the description of the summary line in PEP 257.
> """Multi-line docstrings consist of a summary line just like a
> one-line docstring, followed by a blank line, followed by a more
> elaborate description. The summary line may be used by automatic
> indexing tools; it is important that it fits on one line and is
> separated from the rest of the docstring by a blank line. The summary
> line may be on the same line as the opening quotes or on the next
> line. The entire docstring is indented the same as the quotes at its
> first line (see example below)."""
> It says that the summary line may be used by automatic indexing tools,
> but is there any evidence that such a tool actually exists?  Or was
> there once upon a time?  If there are no such tools, do we still think
> that it is important that it fits on one line?

There are several auto-documentation tools out there, like Sphinx and
epydoc, and the stdlib's pydoc. Historically there were other tools,
like HappyDoc and Pythondoc. I'm not up on these or other tools, so I
don't know if or how that part of PEP 257 applies.

The point of the one-line summary was to allow for tooltips and
compact tables of contents.

Even if there were no supporting tools, I think it is useful to
express the intent of a class/method/function in a single line. The
process of distilling the description down can, in itself, be
illuminating. To imitate the Zen: if the code can't be described in a
short sentence, it may be too complicated.

I'm not saying that this should be enforced in any way. It's just a
guideline. If a tool needs a short summary and the docstring doesn't
have a one-liner, I'd expect the tool just to take the first line and
add ellipsis ("...").
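
A minimal sketch of that fallback, assuming a hypothetical helper named
short_summary and an arbitrary width limit of 60 characters:

```python
def short_summary(docstring, width=60):
    """Return a one-line summary, adding '...' when there is no tidy one-liner."""
    lines = docstring.strip().splitlines()
    if not lines:
        return ""
    first = lines[0].strip()
    # PEP 257 layout: a single line, or a summary followed by a blank line.
    has_one_liner = len(lines) == 1 or not lines[1].strip()
    if has_one_liner and len(first) <= width:
        return first
    # Otherwise take what fits of the first line and mark it as cut off.
    return first[: width - 3].rstrip() + "..."


print(short_summary("Do the thing.\n\nDetails follow."))
print(short_summary("Do the thing that spills\nonto a second line."))
```

The first call returns the summary unchanged; the second returns the
first line with an ellipsis appended.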

David Goodger <>

From phd at  Thu May 28 15:11:55 2009
From: phd at (Oleg Broytmann)
Date: Thu, 28 May 2009 17:11:55 +0400
Subject: [Python-Dev] question about docstring formatting
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, May 28, 2009 at 09:06:03AM -0400, Jeremy Hylton wrote:
> It says that the summary line may be used by automatic indexing tools,
> but is there any evidence that such a tool actually exists?

   epydoc, for one.

     Oleg Broytmann              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ziade.tarek at  Thu May 28 16:19:41 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 28 May 2009 16:19:41 +0200
Subject: [Python-Dev] [buildbot] some build slaves in bad shape
Message-ID: <>


I've noticed some problems since this morning with the trunk and 3.x
stable buildbots:

- x86 XP-4 (trunk and 3x) is throwing a "no space left on device"
error when it compiles the sqlite module in its temp dir

- amd64 gentoo 3.x  and ia64 Ubuntu 3.x buildbot versions seem to be
too old to run, they should be upgraded

- ppc Debian unstable trunk keeps on failing to connect to


Tarek Ziadé |

From rrr at  Thu May 28 18:12:52 2009
From: rrr at (Ron Adam)
Date: Thu, 28 May 2009 11:12:52 -0500
Subject: [Python-Dev] question about docstring formatting
In-Reply-To: <>
References: <>
Message-ID: <>

Jeremy Hylton wrote:
> A question came up at work about docstring formatting.  It relates to
> the description of the summary line in PEP 257.
> """Multi-line docstrings consist of a summary line just like a
> one-line docstring, followed by a blank line, followed by a more
> elaborate description. The summary line may be used by automatic
> indexing tools; it is important that it fits on one line and is
> separated from the rest of the docstring by a blank line. The summary
> line may be on the same line as the opening quotes or on the next
> line. The entire docstring is indented the same as the quotes at its
> first line (see example below)."""
> It says that the summary line may be used by automatic indexing tools,
> but is there any evidence that such a tool actually exists?  Or was
> there once upon a time?  If there are no such tools, do we still think
> that it is important that it fits on one line?
> Jeremy

Python's own built-in help utility, pydoc, uses it.

At the help prompt in the python console window, type "modules searchkey" 
to get a list of modules that contain the searchkey in their one-line summary.

Running pydoc with the -g option opens a tkinter search window that
searches the summary lines.  Selecting from that list then opens the
browser to that item.
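
For illustration, pydoc's own splitdoc() helper shows how much the
blank line matters. The two docstrings below are made-up examples, and
the behaviour shown is what I understand from the current pydoc source
rather than a documented guarantee:

```python
import pydoc

# Hypothetical docstrings: one follows the PEP 257 layout, one does not.
well_formed = "Frobnicate the widget.\n\nLonger explanation goes here."
run_on = "Frobnicate the widget and then\nkeep talking on the next line."

# splitdoc() returns a (synopsis, rest) pair.
print(pydoc.splitdoc(well_formed))
print(pydoc.splitdoc(run_on))
```

With the blank line in place the first element is the summary; without
it the synopsis comes back empty and the whole text is treated as the
body.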


From phillip.sitbon+python-dev at  Thu May 28 18:11:17 2009
From: phillip.sitbon+python-dev at (Phillip Sitbon)
Date: Thu, 28 May 2009 09:11:17 -0700
Subject: [Python-Dev] Making the GIL faster & lighter on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

The testing patch I submitted to the tracker includes a semaphore as
well, and I did take some time to try it out. It seems that it's no
better than the event object, either for a single thread or scaled to
many threads... so this does appear to indicate that the WaitForXX
functions are costly (which is expected) and scale terribly (which is
unfortunate). I had always believed event objects to be "slower" but
I'm not seeing a difference here compared to semaphores. My guess is
that these results could be very different if I were to test on, say,
Windows 2000 instead of Vista.

- Phillip

2009/5/28 Kristján Valur Jónsson <kristjan at>:
> You are right, a small experiment confirmed that it is set to 0 (see SetCriticalSectionSpinCount())
> I had assumed that a small non-zero value might be chosen on multiprocessor machines.
> Do you think that the problem lies with the use of the "event" object as such?  Have you tried using a "semaphore" or "mutex" instead?  Or do you think that all of the synchronization primitives that rely on the WaitForMultipleObjects() api are subject to the same issue?
> Cheers,
> Kristján
> -----Original Message-----
> From: at [ at] On Behalf Of Phillip Sitbon
> Sent: 27. maí 2009 22:23
> To: python-dev
> Subject: Re: [Python-Dev] Making the GIL faster & lighter on Windows
> If I'm not mistaken, calling InitializeCriticalSection rather than
> InitializeCriticalSectionAndSpinCount (gotta love those long function
> names) sets the spin count to zero. I could tell when the spin count
> wasn't zero as far as performance is concerned - spinning is too much
> of a gamble in most contention situations.
>> I don't know what kernel primitive the Critical Section uses, but if it uses an Event object or something similar, we are in the same soup, so to say, because the CriticalSection's spinlocking feature buys us nothing.
> Judging from the increase in speed and CPU utilization I've seen, I
> don't believe this is the case. My guess is that it's something
> similar to a futex.

From at  Thu May 28 23:12:33 2009
From: at (David Bolen)
Date: Thu, 28 May 2009 17:12:33 -0400
Subject: [Python-Dev] [buildbot] some build slaves in bad shape
References: <>
Message-ID: <>

Tarek Ziadé <ziade.tarek at> writes:

> - x86 XP-4 (trunk and 3x) is throwing a "no space left on device"
> error when it compiles the sqlite module in its temp dir

Ooops, that's mine.  Geez - it's a VM, but has a 10GB C: drive, and
the actual build slave has its working directory on a separate virtual
drive.  Wonder what the heck has filled up the system drive.  I'm
working on it now though.

-- David

From eric at  Fri May 29 00:39:11 2009
From: eric at (Eric Smith)
Date: Thu, 28 May 2009 18:39:11 -0400
Subject: [Python-Dev] [Python-checkins] r72995 - in
 python/branches/py3k:	Doc/library/contextlib.rst
 Doc/whatsnew/3.1.rst	Lib/ Lib/test/
In-Reply-To: <>
References: <>
Message-ID: <>

raymond.hettinger wrote:
> Author: raymond.hettinger
> Date: Fri May 29 00:20:03 2009
> New Revision: 72995
> Log:
> Deprecate contextlib.nested().  The with-statement now provides this functionality directly.
> Modified:
>    python/branches/py3k/Doc/library/contextlib.rst
>    python/branches/py3k/Doc/whatsnew/3.1.rst
>    python/branches/py3k/Lib/
>    python/branches/py3k/Lib/test/
>    python/branches/py3k/Misc/NEWS

Shouldn't the test cases exist as long as contextlib.nested still 
exists? We want to make sure it works, after all. I think they should be 
removed only when .nested is itself deleted.
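
For reference, a small sketch of the with-statement form that the log
message refers to. The tracked() context manager is a hypothetical
helper used only to make the enter/exit order visible:

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    """Hypothetical context manager that records enter/exit order."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# One with-statement, two managers: the form that replaces nested(a, b).
with tracked("a") as a, tracked("b") as b:
    events.append("body %s %s" % (a, b))

print(events)
# ['enter a', 'enter b', 'body a b', 'exit b', 'exit a']
```

The managers are entered left to right and exited in reverse order, just
as if the two with-statements had been nested by hand.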


From at  Fri May 29 00:39:02 2009
From: at (David Bolen)
Date: Thu, 28 May 2009 18:39:02 -0400
Subject: [Python-Dev] [buildbot] some build slaves in bad shape
References: <>
Message-ID: <>

David Bolen < at> writes:

> Ooops, that's mine.  Geez - it's a VM, but has a 10GB C: drive, and
> the actual build slave has its working directory on a separate virtual
> drive.  Wonder what the heck has filled up the system drive.  I'm
> working on it now though.

Well, looks like it was 5+GB of temporary files of some sort.  It's
cleaned up now and back online.

-- David

From ziade.tarek at  Fri May 29 00:49:55 2009
From: ziade.tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 29 May 2009 00:49:55 +0200
Subject: [Python-Dev] [buildbot] some build slaves in bad shape
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, May 29, 2009 at 12:39 AM, David Bolen < at> wrote:
> David Bolen < at> writes:
>> Ooops, that's mine.  Geez - it's a VM, but has a 10GB C: drive, and
>> the actual build slave has its working directory on a separate virtual
>> drive.  Wonder what the heck has filled up the system drive.  I'm
>> working on it now though.
> Well, looks like it was 5+GB of temporary files of some sort.  It's
> cleaned up now and back online.

Thanks that's great

From dave at  Fri May 29 03:22:45 2009
From: dave at (David Abrahams)
Date: Thu, 28 May 2009 21:22:45 -0400
Subject: [Python-Dev] Possibility of binary configuration mismatch
Message-ID: <>

Hi All,

I'm not sure there's anything you can do about this, but I thought I
should alert the Python devs that it can happen... describes a
situation where my macports-installed python25 had a pyOpenSSL egg
installed in it by something other than macports (possibly by
easy_install-2.5?) that was not compatible with the Python build.  My
hunch is that the pyOpenSSL had binaries compiled against a UCS4 Python,
but I don't know for sure.  Whatever did the installation of the bad egg
was almost certainly being executed by the macports python25 because
macports is installed in /opt/local, and nothing is likely to have
installed it under that prefix by chance.  In other words, this egg
probably couldn't have been left over from some non-macports python
installation.  In fact, I haven't had any other version of Python2.5
installed on this machine.  Very odd.
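
As an aside, one way to check which kind of build a given interpreter
is: sys.maxunicode is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a
wide (UCS4) build. The unicode_build() helper below is a hypothetical
name for this sketch, not an existing API:

```python
import sys


def unicode_build(maxunicode=None):
    """Classify an interpreter as a 'UCS2' (narrow) or 'UCS4' (wide) build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


print(unicode_build())
```

Running this under both the interpreter that built the egg and the one
that loads it would show whether the two disagree.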

I wonder if it makes sense to enhance the extension module system to
record this kind of information so the problem can be diagnosed by the
system?

Dave Abrahams
BoostPro Computing

From ben+python at  Fri May 29 04:41:04 2009
From: ben+python at (Ben Finney)
Date: Fri, 29 May 2009 12:41:04 +1000
Subject: [Python-Dev] question about docstring formatting
References: <>
Message-ID: <>

David Goodger <goodger at> writes:

> Even if there were no supporting tools, I think it is useful to
> express the intent of a class/method/function in a single line. The
> process of distilling the description down can, in itself, be
> illuminating. To imitate the Zen: if the code can't be described in a
> short sentence, it may be too complicated.

Absolutely. If you can't describe what the (function, class, module)
does succinctly in a single line, how on earth are you going to choose
an appropriate short-but-descriptive name for it?

This constraint is well worth keeping, for exactly the reasons David
says above.

> I'm not saying that this should be enforced in any way. It's just a
> guideline. If a tool needs a short summary and the docstring doesn't
> have a one-liner, I'd expect the tool just to take the first line and
> add ellipsis ("...").

Which in itself would be annoying enough to apply social pressure from
others to get the synopsis into a single line -- so again, I approve :-)

 \     "Men never do evil so completely and cheerfully as when they do |
  `\       it from religious conviction." --Blaise Pascal (1623-1662), |
_o__)                                                   Pensées, #894. |
Ben Finney

From orsenthil at  Fri May 29 05:35:08 2009
From: orsenthil at (Senthil Kumaran)
Date: Fri, 29 May 2009 09:05:08 +0530
Subject: [Python-Dev] Survey on DVCS usage and experience
Message-ID: <20090529033508.GA4463@ubuntu.ubuntu-domain>

On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote:

> With Python having recently chosen to switch to Mercurial, I hoped
> that any developers who've used a DVCS (and who are over 18 years
> old) might like to participate in our survey and share your

Just curious. Why the age restriction?  You might miss out on a few
key developers...


From ideasman42 at  Fri May 29 06:07:02 2009
From: ideasman42 at (Campbell Barton)
Date: Thu, 28 May 2009 21:07:02 -0700
Subject: [Python-Dev] C/Python API Index removed?
Message-ID: <>

This page used to give an index of the C/Python API functions too

But a week or so ago I noticed all these functions are now missing (I
remember they existed in 2.6.1 docs)
Was this intentional?

Quite a while ago (around 2.5), the C/API docs had their own index, which
I personally prefer.
This page is called an index but I'm looking for a page like which includes all C/API
function names.

Is this the right place to mail such problems?

- Campbell

From ideasman42 at  Fri May 29 07:05:51 2009
From: ideasman42 at (Campbell Barton)
Date: Thu, 28 May 2009 22:05:51 -0700
Subject: [Python-Dev] Warnings when no file exists.
Message-ID: <>

Hi, there has been a problem in blender3d for 6 or so years that has
eluded me; I decided to look into it today.
- Whenever a script raises a warning, Python prints out binary
garbage in the console. Some users complain that when they run Python games
in blender they get beeps coming from the PC speaker.

It turns out that _warnings.c's setup_context() is taking the first
value of argv (line 534 in 2.6.2), which in our case is the blender
then some part of the binary is printed to the console.

Apart from the beeps and not being helpful, this can also mess up the
console's state, as "cat /dev/random" might.

But the real problem is that warnings expect a file to exist. In
blender we have our own internal texts that don't have a corresponding
file on disk, so setting __file__ in the global dict will just point
to a location that doesn't exist.
It surprises me that warnings do this, since exceptions work as
expected, printing useful stack traces from our built-in texts.

In case this helps, the scripts are converted into a buffer and run like this...
 text->compiled = Py_CompileString( buf, text->, Py_file_input );
 PyEval_EvalCode( text->compiled, globaldict, globaldict );

Does anyone know of a workaround for this? I'm sure there are other
cases where you may want to run compiled code that isn't related to a
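
For what it's worth, here is a Python-level sketch of the same situation:
code compiled under a pseudo filename, run with its own globals, with
warnings captured rather than printed. TEXT_NAME, TEXT_SOURCE and
run_text are hypothetical names, and whether this maps onto the
embedding code above is an assumption, not something I have tested in
blender.

```python
import warnings

# Hypothetical stand-ins for an embedded text buffer and its display name.
TEXT_NAME = "<text: demo_script>"
TEXT_SOURCE = "import warnings\nwarnings.warn('something looks off')\n"


def run_text(source, name):
    """Compile and run source under a pseudo filename, returning its warnings."""
    code = compile(source, name, "exec")
    # As I read setup_context(), a "__main__" module with no __file__ falls
    # back to sys.argv[0]; naming the module and setting __file__ avoids that.
    script_globals = {"__name__": "embedded_text", "__file__": name}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        exec(code, script_globals)
    return [(w.filename, w.lineno, str(w.message)) for w in caught]


for filename, lineno, message in run_text(TEXT_SOURCE, TEXT_NAME):
    print("%s:%d: %s" % (filename, lineno, message))
```

Because the warnings are recorded instead of shown, nothing tries to
open the pseudo file, and the host application can report them however
it likes.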

- Campbell

From ben+python at  Fri May 29 07:30:45 2009
From: ben+python at (Ben Finney)
Date: Fri, 29 May 2009 15:30:45 +1000
Subject: [Python-Dev] Survey on DVCS usage and experience
References: <20090529033508.GA4463@ubuntu.ubuntu-domain>
Message-ID: <>

Senthil Kumaran <orsenthil at> writes:

> On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote:
> > With Python having recently chosen to switch to Mercurial, I hoped
> > that any developers who've used a DVCS (and who are over 18 years
> > old) might like to participate in our survey and share your
> Just curious. Why the age restriction?  You might miss out on a few
> key developers...

I would guess because they need adult consent in order to legally use
the survey results as evidence in whatever psychological/sociological
study they perform.

 \     "No matter how far down the wrong road you've gone, turn back." |
  `\                                                 --Turkish proverb |
_o__)                                                                  |
Ben Finney

From at  Fri May 29 09:13:34 2009
From: at (Brian de Alwis)
Date: Fri, 29 May 2009 01:13:34 -0600
Subject: [Python-Dev] Survey on DVCS usage and experience
In-Reply-To: <20090529033508.GA4463@ubuntu.ubuntu-domain>
References: <20090529033508.GA4463@ubuntu.ubuntu-domain>
Message-ID: <>

On 28-May-09, at 9:35 PM, Senthil Kumaran <orsenthil at> wrote:
> On Wed, May 27, 2009 at 06:02:57PM -0600, Brian de Alwis wrote:
>> With Python having recently chosen to switch to Mercurial, I hoped
>> that any developers who've used a DVCS (and who are over 18 years
>> old) might like to participate in our survey and share your
> Just curious. Why the age restriction?  You might miss out on a few
> key developers...

It's a restriction required to obtain approval from our research  
ethics board -- people under 18 are considered to be minors in Canada  
and thus require the consent of their guardian to participate. Trying  
to obtain such permission for an anonymous survey is a bit difficult!   
Although we could work around this guardian-consent issue in theory,  
doing so would require jumping through several additional hoops in the  
ethics process and would take significantly more time.


From p.f.moore at  Fri May 29 12:57:31 2009
From: p.f.moore at (Paul Moore)
Date: Fri, 29 May 2009 11:57:31 +0100
Subject: [Python-Dev] Possibility of binary configuration mismatch
In-Reply-To: <>
References: <>
Message-ID: <>

2009/5/29 David Abrahams <dave at>:
> describes a
> situation where my macports-installed python25 had a pyOpenSSL egg
> installed in it by something other than macports (possibly by
> easy_install-2.5?) that was not compatible with the Python build.  My
> hunch is that the pyOpenSSL had binaries compiled against a UCS4 Python,
> but I don't know for sure.  Whatever did the installation of the bad egg
> was almost certainly being executed by the macports python25 because
> macports is installed in /opt/local, and nothing is likely to have
> installed it under that prefix by chance.  In other words, this egg
> probably couldn't have been left over from some non-macports python
> installation.  In fact, I haven't had any other version of Python2.5
> installed on this machine.  Very odd.
> I wonder if it makes sense to enhance the extension module system to
> record this kind of information so the problem can be diagnosed by the
> system?

I have a feeling that this has been discussed before, in the context
of easy_install/setuptools' approach to encoding the build details for
a binary package in the filename, not covering UCS4 vs UCS2. You may
find it useful to search on the distutils-sig archives for further


From solipsis at  Fri May 29 13:14:28 2009
From: solipsis at (Antoine Pitrou)
Date: Fri, 29 May 2009 11:14:28 +0000 (UTC)
Subject: [Python-Dev] Survey on DVCS usage and experience
References: <20090529033508.GA4463@ubuntu.ubuntu-domain>
Message-ID: <>

Brian de Alwis < <at>> writes:
> It's a restriction required to obtain approval from our research  
> ethics board -- people under 18 are considered to be minors in Canada  
> and thus require the consent of their guardian to participate. Trying  
> to obtain such permission for an anonymous survey is a bit difficult!   

But since your survey is anonymous, you can't be sure all the responders are
over 18. Actually, they might even not be human beings!
(hint: I'm not)



From status at  Fri May 29 18:08:04 2009
From: status at (Python tracker)
Date: Fri, 29 May 2009 18:08:04 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (05/22/09 - 05/29/09)
Python tracker at

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.

 2201 open (+36) / 15764 closed (+18) / 17965 total (+54)

Open issues with patches:   866

Average duration of open issues: 652 days.
Median duration of open issues: 400 days.

Open Issues Breakdown
   open  2175 (+36)
pending    26 ( +0)

Issues Created Or Reopened (55)

improved allocation of PyUnicode objects                         05/24/09          reopened pitrou

str.format raises SystemError                                    05/22/09  CLOSED  created  eggy

zipfile DeprecationWarning Python 2.6.2                          05/22/09          created  ivb

Curses segfaulting in FreeBSD/amd64                              05/23/09          created  themoken

Changed Shortcuts don't show up in menu                          05/23/09          created  jamesie

Ambiguous locale.strxfrm                                         05/23/09  CLOSED  created  tuves

Python fails to build with Subversion 1.7                        05/23/09  CLOSED  created  Arfrever

os.curdir as the default argument for os.listdir                 05/23/09          created  tarek

SimpleXMLRPCServer not suitable for HTTP/1.1 keep-alive          05/24/09          created  krisvale
       patch, patch, easy, needs review

Encoded surrogate characters on command line not escaped in sys. 05/24/09          created  baikie

xml.dom.minidom incorrectly claims DOM Level 3 conformance       05/24/09          created  phihag

HTTP/1.1 with keep-alive support for xmlrpclib.ServerProxy       05/24/09          created  krisvale
       patch, patch, needs review

Expanding arrays inside other arrays                             05/24/09          created  marek_sp

SETUP_WITH                                                       05/24/09  CLOSED  created  benjamin.peterson

When the package has non-ascii path and .pyc file, we cannot imp 05/25/09          created  Suzumizaki

Static library (libpythonX.Y.a) installed in incorrect location  05/25/09          created  Arfrever

OSX framework builds fail after r72861 move of _locale into core 05/25/09  CLOSED  created  nad

json.dumps doesn't respect OrderedDict's iteration order         05/25/09          created  wangchun

read_until                                                       05/25/09          created  ps

Subprocess.Popen output fails on Windows                         05/26/09  CLOSED  created  ac.james

unicode(exception) behaves differently on Py2.6 when len(excepti 05/26/09          created  ezio.melotti

IDLE rendering issue with oriental characters on OSX             05/26/09          created  ronaldoussoren

IDLE has two "Preferences..." menu's on OSX                      05/26/09  CLOSED  created  ronaldoussoren

Impossible to change preferences in IDLE                         05/26/09  CLOSED  created  ronaldoussoren

scheduler.cancel does not raise RuntimeError                     05/26/09  CLOSED  created  fidlej

Dupicate instances of classes in list                            05/26/09  CLOSED  reopened mbaynham

distutils build_ext path comparison only based on strings        05/26/09          created  sleipnir

Header and doc related to PyNumber_Divide and PyNumber_InPlaceDi 05/26/09  CLOSED  created  bhy

frame.f_locals keeps references to things for too long           05/26/09          created  exarkun

Fix O(n**2) performance problem in socket._fileobject            05/26/09          created  krisvale
       patch, patch, easy, needs review

urllib.parse.quote_plus ignores optional arguments               05/26/09  CLOSED  created  mgiuca

Confusing DeprecationWarning                                     05/26/09          created  alejolp

zipfile.ZipFile's extractall works inproperly under Windows      05/27/09  CLOSED  created  aerodonkey

help('modules ') causes IndexError.                              05/27/09  CLOSED  created  July
       patch, easy

OSError: [Errno 10] No child processes                           05/27/09          created  yonas

tarfile: opening an empty tar file fails                         05/27/09          created  evanj

Tkinter should support the OS X zoom button                      05/27/09          created  culler

2to3 mishandles "from module_name import" when module_name inclu 05/27/09  CLOSED  created  MLModel

Python 3 pdb: shows internal code, breakpoints don't work        05/27/09          created  ericp

Unexpected universal newline behavior (newline duplication) in W 05/27/09          created  jaraco

Consequences of using Py_TPFLAGS_HAVE_GC are incompletely explai 05/27/09          created  exarkun

2to3 does not convert imports of the form 'import sub.mod' to re 05/27/09  CLOSED  reopened MLModel

There ought to be a way for extension types to associate documen 05/27/09          created  exarkun

test_modulefinder leaks when run after test_distutils            05/27/09  CLOSED  created  pitrou

Implement the GIL with critical sections in Windows              05/27/09          created  sitbon

LOAD_CONST followed by LOAD_ATTR can be optimized to just be a L 05/27/09          created  alex

2to3 tests fail on Windows due to line endings                   05/28/09  CLOSED  created  abbeyj

subprocess seems to use local 8-bit encoding and gives no choice 05/28/09          created  mark

Make logging configuration files easier to use                   05/28/09          created  gjb1002

Pickle migration: Should pickle map "copy_reg" to "copyreg"?     05/28/09          created  mkiever

'./configure; make install' fails in step if .pydistuti          05/28/09          created  r.david.murray

Typo in email.base64mime                                         05/28/09          created  ocean-city

configure error: shadow.h: present but cannot be compiled        05/29/09          created  Sashi

missing first argument on subprocess.Popen w/ executable         05/29/09          created  lieryan

Distutils doesn't remove .pyc files                              05/29/09          created  purpleidea

Issues Now Closed (53)

Test issue                                                        635 days    dtuser2                       

Return from fork() is pid_t, not int                              478 days    pitrou                        

pkg-config support                                                280 days    pitrou                        
       patch, needs review                                                     

test_fileio fails on OpenBSD 4.4                                  249 days    pitrou                        

ignored exceptions in generators (regression?)                    236 days    doughellmann                  

smtplib SMTP_SSL._get_socket doesn't return a value               228 days    r.david.murray                

Py_Object_HEAD_INIT in Py3k                                       183 days    georg.brandl                  

Issue with RotatingFileHandler logging handler on Windows         147 days    rcronk                        

pwd, spwd, grp functions vulnerable to denial of service          143 days    loewis                        

time.ctime docs refer to "time tuple" for default                 116 days    georg.brandl                  

IDLE to support                                       114 days    rhettinger                    

smtplib is broken in Python3                                      103 days    r.david.murray                
       patch, easy                                                             

StringIO can duplicate newlines in universal newlines mode        102 days    jaraco                        

OS X installer: fix makefile target changed for 3.x               101 days    ronaldoussoren                

sys.exc_info()[1] - different handling from str() and unicode()   102 days    georg.brandl                  

OS X Installer: by default install versioned-only links in /usr/   55 days    ronaldoussoren                

Speed up pickling of dicts in cPickle                              53 days    pitrou                        
       patch, needs review                                                     

idle pydoc et al removed from 3.1 without versioned replacements    3 days    nad                           

add file name to py3k IO objects repr()                            38 days    pitrou                        

pickle/cPickle of recursive tuples create pickles that cPickle c   37 days    collinwinter                  
       patch, easy, 26backport                                                 

there is en exception om Create User page                          33 days    georg.brandl                  

cPickle defect with tuples and different from pickle output        27 days    collinwinter                  

classmethod, staticmethod: expose wrapped function                 19 days    rhettinger                    

enhance getargs O& to accept cleanup function                      16 days    loewis                        

test_distutils leaves a 'foo' file behind in the cwd               11 days    rpetrov                       

"install" target in python 3.x makefile should be "fullinstall"     6 days    benjamin.peterson             

make distutils use the tarinfo command                             11 days    tarek                         

zipfile: Extracting a directory that already exists generates an    7 days    loewis                        

PYTHONHOME should be more flexible (and controllable by	--libdir    5 days    loewis                        
                                                                        failed assert when including extension modules         5 days    loewis                        

threading.Timer and gtk.main are not compatible                     7 days    amaury.forgeotdarc            
                                                                        doesn't work                                              1 days    georg.brandl                  

SyntaxError in xmlrpc.client examples                               1 days    georg.brandl                  

str.format raises SystemError                                       1 days    eric.smith                    

Ambiguous locale.strxfrm                                            0 days    loewis                        

Python fails to build with Subversion 1.7                           1 days    Arfrever                      

SETUP_WITH                                                          1 days    benjamin.peterson             

OSX framework builds fail after r72861 move of _locale into core    0 days    benjamin.peterson             

Subprocess.Popen output fails on Windows                            1 days    ac.james                      

IDLE has two "Preferences..." menu's on OSX                         0 days    ronaldoussoren                

Impossible to change preferences in IDLE                            0 days    nad                           

scheduler.cancel does not raise RuntimeError                        0 days    georg.brandl                  

Dupicate instances of classes in list                               0 days    mbaynham                      

Header and doc related to PyNumber_Divide and PyNumber_InPlaceDi    0 days    georg.brandl                  

urllib.parse.quote_plus ignores optional arguments                  0 days    georg.brandl                  

zipfile.ZipFile's extractall works inproperly under Windows         0 days    ocean-city                    

help('modules ') causes IndexError.                                 1 days    r.david.murray                
       patch, easy                                                             

2to3 mishandles "from module_name import" when module_name inclu    0 days    r.david.murray                

2to3 does not convert imports of the form 'import sub.mod' to re    0 days    benjamin.peterson             

test_modulefinder leaks when run after test_distutils               2 days    ocean-city                    

2to3 tests fail on Windows due to line endings                      1 days    benjamin.peterson             

os.listdir on empty strings. Inconsistent behaviour.             2063 days  benjamin.peterson             
       patch, needs review                                                     

Make fcntl work properly on AMD64                                1332 days pitrou                        

Top Issues Most Discussed (10)

 13 Dupicate instances of classes in list                              0 days

 10 improved allocation of PyUnicode objects                           5 days

  9 .pyc files created readonly if .py file is readonly, python won    9 days

  9 Python 2.6 makes .pyc/.pyo bytecode files executable               9 days

  8 LOAD_CONST followed by LOAD_ATTR can be optimized to just be a     2 days

  8 test_modulefinder leaks when run after test_distutils              2 days

  8 OSError: [Errno 10] No child processes                             2 days

  7 Impossible to change preferences in IDLE                           0 days

  6 zipfile.ZipFile's extractall works inproperly under Windows        0 days

  6 SETUP_WITH                                                         1 days

From dinov at  Sat May 30 02:08:46 2009
From: dinov at (Dino Viehland)
Date: Sat, 30 May 2009 00:08:46 +0000
Subject: [Python-Dev] Indentation oddness...
Message-ID: <>

Consider the code:

code = "def  Foo():\n\n    pass\n\n  "

This code is malformed in that the final indentation (2 spaces) does not agree with the previous indentation of the pass statement (4 spaces).  Or maybe it's just fine if you take the "blank lines should be ignored" statement from the docs to be true.  So let's look at the different ways I can consume this code.

If I use compile to compile this:

compile(code, 'foo', 'single')

I get an IndentationError: unindent does not match any outer indentation level

But if I put this in a file:

f = file('', 'w')
f.write(code)
f.close()
import indenttest

It imports just fine.

If I run it through the tokenize module it also tokenizes just fine:

>>> import tokenize
>>> from cStringIO import StringIO
>>> tokenize.tokenize(StringIO(code).readline)
1,0-1,3:        NAME    'def'
1,5-1,8:        NAME    'Foo'
1,8-1,9:        OP      '('
1,9-1,10:       OP      ')'
1,10-1,11:      OP      ':'
1,11-1,12:      NEWLINE '\n'
2,0-2,1:        NL      '\n'
3,0-3,4:        INDENT  '    '
3,4-3,8:        NAME    'pass'
3,8-3,9:        NEWLINE '\n'
4,0-4,1:        NL      '\n'
5,0-5,0:        DEDENT  ''
5,0-5,0:        ENDMARKER       ''

And if it fails anywhere it would seem tokenization is where it should fail - especially given that seems to report this error on other occasions.

And stranger still if I add a new line then it will even compile fine:

compile(code + '\n', 'foo', 'single')

Which seems strange because in either case all of the trailing lines are blank lines and as such should basically be ignored according to the documentation.

Is there some strange reason why compile rejects what everything else agrees is perfectly valid code?
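
The comparison above can be rerun as one small script. This is only a sketch written against Python 3, where tokenize.generate_tokens and io.StringIO stand in for the Python 2 calls shown in the message; the helper names compiles, tokenizes and survey are made up for illustration. It deliberately reports what each consumer does rather than assuming an outcome, since the result for the unpadded string may differ between interpreter versions.

```python
import io
import tokenize

# The string from the message above: the last line is two spaces, no newline.
CODE = "def  Foo():\n\n    pass\n\n  "


def compiles(source, mode):
    """Return True if compile() accepts source in the given mode."""
    try:
        compile(source, "foo", mode)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


def tokenizes(source):
    """Return True if the tokenize module accepts source."""
    try:
        list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError):
        return False
    return True


def survey(source):
    """Collect the verdict of each way of consuming the source."""
    return {
        "single": compiles(source, "single"),
        "exec": compiles(source, "exec"),
        "exec + newline": compiles(source + "\n", "exec"),
        "tokenize": tokenizes(source),
    }


if __name__ == "__main__":
    for name, ok in survey(CODE).items():
        print(name, "accepted" if ok else "rejected")
```

Any row that disagrees with the others on a given interpreter is the inconsistency being asked about.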

From robert.kern at  Sat May 30 02:26:22 2009
From: robert.kern at (Robert Kern)
Date: Fri, 29 May 2009 19:26:22 -0500
Subject: [Python-Dev] Indentation oddness...
In-Reply-To: <>
References: <>
Message-ID: <gvpufg$77n$>

On 2009-05-29 19:08, Dino Viehland wrote:
> Consider the code:
> code = "def  Foo():\n\n    pass\n\n  "
> This code is malformed in that the final indentation (2 spaces) does not agree with the previous indentation of the pass statement (4 spaces).  Or maybe it's just fine if you take the blank lines should be ignored statement from the docs to be true.  So let's look at different ways I can consume this code.
> If I use compile to compile this:
> compile(code, 'foo', 'single')
> I get an IndentationError: unindent does not match any outer indentation level
> But if I put this in a file:
> f= file('', 'w')
> f.write(code)
> f.close()
> import indenttest
> It imports just fine.

The 'single' mode, which is used for the REPL, is a bit different than 'exec', 
which is used for modules. This difference lets you insert "blank" lines of 
whitespace into a function definition without exiting the definition. Ending 
with a truly empty line does not cause the IndentationError, so the REPL can 
successfully compile the code, signaling that the user has finished typing the
function.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From dinov at  Sat May 30 02:52:33 2009
From: dinov at (Dino Viehland)
Date: Sat, 30 May 2009 00:52:33 +0000
Subject: [Python-Dev] Indentation oddness...
In-Reply-To: <gvpufg$77n$>
References: <>
Message-ID: <>

> The 'single' mode, which is used for the REPL, is a bit different than
> 'exec',
> which is used for modules. This difference lets you insert "blank"
> lines of
> whitespace into a function definition without exiting the definition.
> Ending
> with a truly empty line does not cause the IndentationError, so the
> REPL can
> successfully compile the code, signaling that the user has finished
> typing the
> function.

Sorry, I probably should have mentioned this but it repros w/
compile(..., "exec") as well:

>>> code = "def  Foo():\n\n    pass\n\n  "
>>> compile(code, 'foo', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "foo", line 5

IndentationError: unindent does not match any outer indentation level

It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under
single and exec.

From guido at  Sat May 30 04:19:34 2009
From: guido at (Guido van Rossum)
Date: Fri, 29 May 2009 19:19:34 -0700
Subject: [Python-Dev] Indentation oddness...
In-Reply-To: <>
References: <>
Message-ID: <>

I usually append some extra newlines before passing a string to
compile(). That's the usual work-around. There's probably a subtle bug
in the tokenizer when reading from a string -- if you find it, please
upload a patch to the tracker!
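
A minimal sketch of that work-around, assuming Python 3 and a made-up helper name (compile_padded is not a standard function): pad the source with newlines so the final whitespace-only line is terminated and treated as an ordinary blank line.

```python
def compile_padded(source, filename="<string>", mode="exec"):
    """Compile source after appending extra newlines (hypothetical helper).

    The padding terminates any trailing whitespace-only line, so the
    tokenizer sees it as a blank line rather than a partial indent.
    """
    return compile(source + "\n\n", filename, mode)


if __name__ == "__main__":
    code = "def  Foo():\n\n    pass\n\n  "
    namespace = {}
    exec(compile_padded(code, "foo"), namespace)
    print(namespace["Foo"])
```

This only sidesteps the problem for callers who control the string; it does not address why the unpadded string is rejected.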


On Fri, May 29, 2009 at 5:52 PM, Dino Viehland <dinov at> wrote:
>> The 'single' mode, which is used for the REPL, is a bit different than
>> 'exec',
>> which is used for modules. This difference lets you insert "blank"
>> lines of
>> whitespace into a function definition without exiting the definition.
>> Ending
>> with a truly empty line does not cause the IndentationError, so the
>> REPL can
>> successfully compile the code, signaling that the user has finished
>> typing the
>> function.
> Sorry, I probably should have mentioned this but it repros w/
> compile(..., "exec") as well:
>>>> code = "def  Foo():\n\n    pass\n\n  "
>>>> compile(code, 'foo', 'exec')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "foo", line 5
> IndentationError: unindent does not match any outer indentation level
> It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under
> single and exec.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

--Guido van Rossum (home page:

From dinov at  Sat May 30 17:35:44 2009
From: dinov at (Dino Viehland)
Date: Sat, 30 May 2009 15:35:44 +0000
Subject: [Python-Dev] Indentation oddness...
In-Reply-To: <>
References: <>
Message-ID: <>

Unfortunately my problem is the opposite one - trying to emulate what
compile does for IronPython rather than just trying to make some code
compile.  So adding newlines doesn't help me.

But this case isn't really that important - it was just a wacky corner
case I ran into while trying to get other behavior right.  I think I can
safely ignore this one especially if it's just a bug.

> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf Of
> Guido van Rossum
> Sent: Friday, May 29, 2009 7:20 PM
> To: Dino Viehland
> Cc: Robert Kern; python-dev at
> Subject: Re: [Python-Dev] Indentation oddness...
> I usually append some extra newlines before passing a string to
> compile(). That's the usual work-around. There's probably a subtle bug
> in the tokenizer when reading from a string -- if you find it, please
> upload a patch to the tracker!
> --Guido
> On Fri, May 29, 2009 at 5:52 PM, Dino Viehland <dinov at>
> wrote:
> >> The 'single' mode, which is used for the REPL, is a bit different
> than
> >> 'exec',
> >> which is used for modules. This difference lets you insert "blank"
> >> lines of
> >> whitespace into a function definition without exiting the definition.
> >> Ending
> >> with a truly empty line does not cause the IndentationError, so the
> >> REPL can
> >> successfully compile the code, signaling that the user has finished
> >> typing the
> >> function.
> >
> > Sorry, I probably should have mentioned this but it repros w/
> > compile(..., "exec") as well:
> >
> >>>> code = "def  Foo():\n\n    pass\n\n  "
> >>>> compile(code, 'foo', 'exec')
> > Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >  File "foo", line 5
> >
> > IndentationError: unindent does not match any outer indentation level
> >
> > It also repros when passing in PyCF_DONT_IMPLY_DEDENT for flags under
> > single and exec.
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at
> >
> > Unsubscribe:
> dev/
> >
> --
> --Guido van Rossum (home page:

From benjamin at  Sat May 30 20:04:35 2009
From: benjamin at (Benjamin Peterson)
Date: Sat, 30 May 2009 13:04:35 -0500
Subject: [Python-Dev] [RELEASED] Python 3.1 Release Candidate 1
Message-ID: <>

On behalf of the Python development team, I'm happy to announce the first
release candidate of Python 3.1.

Python 3.1 focuses on the stabilization and optimization of the features and
changes that Python 3.0 introduced.  For example, the new I/O system has been
rewritten in C for speed.  File system APIs that use unicode strings now handle
paths with undecodable bytes in them. Other features include an ordered
dictionary implementation, a condensed syntax for nested with statements, and
support for ttk Tile in Tkinter.  For a more extensive list of changes in 3.1,
see or Misc/NEWS in the Python

This is a release candidate, and as such, we do not recommend use in production
environments.  However, please take this opportunity to test the release with
your libraries or applications.  This will hopefully discover bugs before the
final release and allow you to determine how changes in 3.1 might impact you.
If you find things broken or incorrect, please submit a bug report at

For more information and downloadable distributions, see the Python 3.1 website:

See PEP 375 for release schedule details:

-- Benjamin

Benjamin Peterson
benjamin at
Release Manager
(on behalf of the entire python-dev team and 3.1's contributors)

From carmstr3 at  Sat May 30 20:02:04 2009
From: carmstr3 at (carmstr3 at
Date: Sat, 30 May 2009 13:02:04 -0500 (CDT)
Subject: [Python-Dev] looking for some people to talk with about Python
Message-ID: <>

My name is Chandler Armstrong and I'm investigating environments of collaboration.  I'm a PhD candidate at the University of Illinois, Urbana-Champaign, specializing in internet research and science & technology studies.  I'm generally interested in development methods overall, and specifically interested in both artificial language construction and evolution, and collaboration in open-source models.  I would like to talk to some members of the Python development community about what kinds of activities they do within it.  If anybody is interested in this please email me at carmstr3 at  I will send you a document that describes the research and interview in more detail.  I'd like to do a voice interview over Skype or by phone, but I can accommodate an online chat or even email.
I have some current research on this specific mailing list which is more quantitative in nature.   I downloaded the entire mailing list from the archives.  Next I looked through all the python-dev summaries and used links provided to referenced threads to indicate that a particular message or thread was meaningful in development.  I characterized the mailing list as threads, and each instance with about 30 attributes (things like the number of posts, the depth of the tree, a measure of 'branchyness' of the thread, the standard deviation of post counts across posters, the hour/day/month of the thread, etc).  Using these attributes I attempted to classify, using logistic regression, the threads that were indicated as meaningful in the python-dev summaries.  There are some significant results.  If anyone is interested I can send you my results, or even post them here to the list.  I'll be presenting my results at the Classification Society Conference at St. Louis in June.  The work is unpublished at the moment but I hope to find a journal for it this summer.
I used entirely Python for all that quantitative work: downloading the mailing list and going through all the summaries, opening the links and matching the referenced message to the correct one in my downloaded database, and cleaning and transforming data.  It was a ton of fun.  I hope to develop more scripts for other sorts of automated analysis.
At any rate, please contact me if you'd like to contribute to my current tack of investigation.  I would ultimately like to interview as many people as are willing to talk with me.  I need to do about two in the next couple of weeks, and I would get with other volunteers in the weeks after that.
Chandler Armstrong
carmstr3 at

From nnorwitz at  Sat May 30 20:54:09 2009
From: nnorwitz at (Neal Norwitz)
Date: Sat, 30 May 2009 11:54:09 -0700
Subject: [Python-Dev] cleanup before 3.1 is released
Message-ID: <>

Has anyone run valgrind/purify and pychecker/pylint on the 3.1 code
recently?  Both sets of tools should be used before the final release
so we can fix any obvious problems.


From g.brandl at  Sat May 30 22:43:19 2009
From: g.brandl at (Georg Brandl)
Date: Sat, 30 May 2009 22:43:19 +0200
Subject: [Python-Dev] cleanup before 3.1 is released
In-Reply-To: <>
References: <>
Message-ID: <gvs5qp$ugd$>

Neal Norwitz schrieb:
> Has anyone run valgrind/purify and pychecker/pylint on the 3.1 code
> recently?  Both sets of tools should be used before the final release
> so we can fix any obvious problems.

Do pychecker/pylint work on 3.x code?


Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From wojtek.gminick.walczak at  Sun May 31 01:35:38 2009
From: wojtek.gminick.walczak at (Wojciech Walczak)
Date: Sun, 31 May 2009 01:35:38 +0200
Subject: [Python-Dev] [Sphinx] GSoC project announcement
Message-ID: <>

Hi, guys,

just a short introduction of one of this year's GSoC PSF projects:

I am implementing support for per-paragraph comments and a user/developer
interface for submitting/committing fixes in Sphinx[1].

In case you are interested in adding your 2 cents (or more) by commenting
on my application[2] or proposing some enhancements - feel free to do so
on sphinx-dev[3]. Or take a look at my blog to keep up to date[4].

[1] -
[2] -
[3] -
[4] -

Best regards,
Wojtek Walczak

From greg.ewing at  Sun May 31 03:21:15 2009
From: greg.ewing at (Greg Ewing)
Date: Sun, 31 May 2009 13:21:15 +1200
Subject: [Python-Dev] Survey on DVCS usage and experience
In-Reply-To: <>
References: <20090529033508.GA4463@ubuntu.ubuntu-domain>
Message-ID: <>

Antoine Pitrou wrote:
> you can't be sure all the responders are
> over 18. Actually, they might even not be human beings!
> (hint: I'm not)

Not over 18, or not a human being?


From greg.ewing at  Sun May 31 04:02:41 2009
From: greg.ewing at (Greg Ewing)
Date: Sun, 31 May 2009 14:02:41 +1200
Subject: [Python-Dev] Indentation oddness...
In-Reply-To: <gvpufg$77n$>
References: <>
Message-ID: <>

Robert Kern wrote:

> The 'single' mode, which is used for the REPL, is a bit different than 
> 'exec', which is used for modules. This difference lets you insert 
> "blank" lines of whitespace into a function definition without exiting 
> the definition.

All that means is that the REPL needs to keep reading
lines until it gets a completely blank one. I don't
see why the compiler has to treat the source any
differently once the REPL has decided how much text
to feed it.
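
A rough illustration of that split, assuming Python 3 and the standard codeop module: the loop below decides how much text to hand over, and codeop.compile_command only answers whether the text so far forms a complete statement. The function name feed_lines is made up for this sketch.

```python
import codeop


def feed_lines(lines):
    """Feed lines one at a time, as a REPL would, until a statement completes.

    Returns (code_object_or_None, number_of_lines_consumed).
    """
    buffer = []
    for line in lines:
        buffer.append(line)
        source = "\n".join(buffer)
        # compile_command returns None while the input is still incomplete.
        code_obj = codeop.compile_command(source, "<input>", "single")
        if code_obj is not None:
            return code_obj, len(buffer)
    return None, len(buffer)


if __name__ == "__main__":
    code_obj, consumed = feed_lines(["def Foo():", "    pass", ""])
    print("complete after", consumed, "lines")
    namespace = {}
    exec(code_obj, namespace)
    print(namespace["Foo"])
```

Here the function definition is only considered finished once the empty line arrives, which is the decision Greg attributes to the REPL rather than to the compiler.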
