From guido at  Sun Jan  1 00:56:00 2012
From: guido at (Guido van Rossum)
Date: Sat, 31 Dec 2011 16:56:00 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdo2u1$k0u$>
References: <>
Message-ID: <>

ISTM the only reasonable thing is to have a random seed picked very early
in the process, to be used to change the hash() function of
str/bytes/unicode (in a way that they are still compatible with each other).

The seed should be unique per process except it should survive fork() (but
not exec()). I'm not worried about unrelated processes needing to have the
same hash(), but I'm not against offering an env variable or command line
flag to force the seed.
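
A minimal sketch of the seed selection described above, not an actual patch. The environment variable name HASH_SEED_OVERRIDE and the function pick_hash_seed() are hypothetical names chosen only for illustration, and a 32-bit seed is an assumption.

```python
import os
import struct

# Hypothetical variable name, chosen only for this sketch.
SEED_ENV_VAR = "HASH_SEED_OVERRIDE"


def pick_hash_seed(environ=None):
    """Return a 32-bit seed: forced via the environment, else random."""
    if environ is None:
        environ = os.environ
    forced = environ.get(SEED_ENV_VAR)
    if forced is not None:
        # A forced seed makes hash() reproducible across processes.
        return int(forced) & 0xFFFFFFFF
    # os.urandom() draws from the operating system's entropy source.
    return struct.unpack("<I", os.urandom(4))[0]


# Picked once at startup. A fork()ed child inherits this value along with
# the rest of the process memory; a freshly exec()ed interpreter would run
# this line again and get a new seed.
HASH_SEED = pick_hash_seed()
```

Forcing the seed trades away the protection for reproducibility, which is the tension noted in the next paragraph.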

I'm not too concerned about a 3rd party being able to guess the random seed
-- this would require much more effort on their part, since they would have
to generate a new set of colliding keys each time they think they have
guessed the hash (as long as they can't force the seed -- this actually
argues slightly *against* offering a way to force the seed, except that we
have strong backwards compatibility requirements).

We need to fix this as far back as Python 2.6, and it would be nice if a
source patch was available that works on Python 2.5 -- personally I do have
a need for a 2.5 fix and if nobody creates one I will probably end up
backporting the fix from 2.6 to 2.5.

Is there a tracker issue yet? The discussion should probably move there.

PS. I would propose a specific fix but I can't seem to build a working
CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this
error late in the build:

./python.exe -SE -m sysconfig --generate-posix-vars
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/Users/guido/cpython/Lib/", line 60, in <module>
make: *** [Lib/] Abort trap

--Guido van Rossum (
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

From guido at  Sun Jan  1 01:11:12 2012
From: guido at (Guido van Rossum)
Date: Sat, 31 Dec 2011 17:11:12 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum <guido at> wrote:

> PS. I would propose a specific fix but I can't seem to build a working
> CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this
> error late in the build:
> ./python.exe -SE -m sysconfig --generate-posix-vars
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> Traceback (most recent call last):
>   File "/Users/guido/cpython/Lib/", line 60, in <module>
> make: *** [Lib/] Abort trap

FWIW I managed to build Python 2.6, and a trivial mutation of the
string/unicode hash function (add 1 initially) made only three tests fail;
test_symtable and test_json both have a dependency on dictionary order,
test_ctypes I can't quite figure out what's going on.
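
Tests that depend on dictionary order can usually be rewritten to compare values instead of text. The following is only an illustrative sketch with made-up data, not taken from the failing tests themselves.

```python
import json


def roundtrip(obj):
    """Serialize to JSON and parse the result back."""
    return json.loads(json.dumps(obj))


data = {"spam": 1, "eggs": 2, "ham": 3}

# Fragile: the serialized text follows the dict's iteration order, so an
# exact string comparison breaks as soon as that order changes.
#   assert json.dumps(data) == '{"eggs": 2, "ham": 3, "spam": 1}'

# Robust: compare parsed values, which ignores key order entirely.
assert roundtrip(data) == data

# Robust: request a canonical key order before comparing text.
canonical = json.dumps(data, sort_keys=True)
```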

Oh, and an unrelated failure in test_sqlite:

  File "/Users/guido/pythons/p26/Lib/sqlite3/test/", line 355, in
    self.failUnlessEqual(ts.year, now.year)
AssertionError: 2012 != 2011

I betcha that's because it's still 2011 here in Texas but already 2012 in
UTC-land. Happy New Year everyone! :-)

--Guido van Rossum (

From paul at  Sun Jan  1 04:29:59 2012
From: paul at (Paul McMillan)
Date: Sat, 31 Dec 2011 19:29:59 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> I'm not too concerned about a 3rd party being able to guess the random seed
> -- this would require much more effort on their part, since they would have
> to generate a new set of colliding keys each time they think they have
> guessed the hash

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.
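
A toy illustration of the offline search, under loud assumptions: the hash below is a simplified stand-in rather than CPython's, the seed is only 16 bits wide so the loop finishes quickly, and the "leak" is modelled as the keys sorted by their full seeded hash value, which is simpler than real dict ordering. All names are hypothetical.

```python
MASK = 0xFFFFFFFF


def toy_hash(s, seed):
    """Simplified multiplicative hash with the seed as initial state."""
    x = seed
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    return x


def leaked_order(keys, seed):
    """Order in which a server might echo keys back (e.g. in a JSON reply)."""
    return sorted(keys, key=lambda k: toy_hash(k, seed))


def brute_force(keys, observed, seed_bits=16):
    """Try every seed and keep those that reproduce the observed order."""
    return [guess for guess in range(1 << seed_bits)
            if leaked_order(keys, guess) == observed]


KEYS = ["a", "b", "id", "name", "email", "token", "spam", "eggs"]
SECRET_SEED = 0xBEEF                      # unknown to the attacker
observed = leaked_order(KEYS, SECRET_SEED)
candidates = brute_force(KEYS, observed)  # runs without contacting the server
```

Each additional leaked ordering can be used to filter the candidate list further, so no request to the server is needed while searching.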

The goal isn't perfection, but we need to do better than a simple
salt. I propose we modify the string hash function like this:

This code is based on PyPy's implementation, but the concept is
universal. Rather than choosing a single short random seed per
process, we generate a much larger random seed (r). As we hash, we
deterministically choose a portion of that seed and incorporate it
into the hash process. This modification is a minimally intrusive
change to the existing hash function, and so should not introduce
unexpected side effects which might come from switching to a different
class of hash functions.

I've worked through this code with Alex Gaynor, Antoine Pitrou, and
Victor Stinner, and have asked several mathematicians and security
experts to review the concept. The reviewers who have gotten back to
me thus far have agreed that if the initial random seed is not flawed,
this should not overly change the properties of the hash function, but
should make it quite difficult for an attacker to deduce the necessary
information to predictably cause hash collisions. This function is not
designed to protect against timing attacks, but should be nontrivial
to reverse even with access to timing data.

Empirical testing shows that this unoptimized python implementation
produces ~10% slowdown in the hashing of ~20 character strings. This
is probably an acceptable trade off, and actually provides better
performance in the case of short strings than a high-entropy
fixed-length seed prefix.


From martin at  Sun Jan  1 04:36:37 2012
From: martin at (martin at
Date: Sun, 01 Jan 2012 04:36:37 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> (Well, technically, you could use trees or some other O log n data
> structure as a fallback once you have too many collisions, for some value
> of "too many".  Seems a bit wasteful for the purpose, though.)

I don't think that would be wasteful. You wouldn't just use the tree for
the case of too many collisions, but for any collision. You might special-case
the case of a single key, i.e. start using the tree only if there is a
collision.

The issue is not the effort, but the need to support ordering if you want
to use trees. So you might restrict this to dicts that have only str keys
(which in practice should never have any collision, unless it's a deliberate
attack).

I'd use the tagged-pointer trick to determine whether a key is an object
pointer (tag 0) or an AVL tree (tag 1). So in the common case of interned
strings, the comparison for pointer equality (which is the normal case
if the keys are interned) will succeed quickly; if pointer comparison fails,
check the tag bit.
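
A rough sketch of the slot layout described above, with several substitutions that are assumptions of this example: a sorted Python list stands in for the AVL tree, an isinstance() check stands in for the pointer tag bit, and the class name is hypothetical. Only str keys are accepted, since the keys must be orderable.

```python
import bisect


class CollisionTolerantTable:
    """Toy fixed-size table; colliding str keys share a sorted bucket."""

    def __init__(self, size=8, hash_func=hash):
        self._slots = [None] * size
        self._hash = hash_func

    def _index(self, key):
        return self._hash(key) % len(self._slots)

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("this sketch only supports str keys")
        i = self._index(key)
        slot = self._slots[i]
        if slot is None or (isinstance(slot, tuple) and slot[0] == key):
            self._slots[i] = (key, value)      # "tag 0": a single entry
            return
        if isinstance(slot, tuple):
            slot = self._slots[i] = [slot]     # "tag 1": promote on collision
        pos = bisect.bisect_left(slot, (key,))
        if pos < len(slot) and slot[pos][0] == key:
            slot[pos] = (key, value)
        else:
            # list.insert is O(n); a balanced tree would make this O(log n).
            slot.insert(pos, (key, value))

    def __getitem__(self, key):
        slot = self._slots[self._index(key)]
        if isinstance(slot, tuple):
            if slot[0] == key:
                return slot[1]
        elif slot is not None:
            pos = bisect.bisect_left(slot, (key,))   # O(log n) lookup
            if pos < len(slot) and slot[pos][0] == key:
                return slot[pos][1]
        raise KeyError(key)
```

Even when every key lands in the same slot, lookups stay logarithmic in the bucket size instead of degrading to a linear scan.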


From solipsis at  Sun Jan  1 05:11:03 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 1 Jan 2012 05:11:03 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
Message-ID: <>

On Sat, 31 Dec 2011 16:56:00 -0700
Guido van Rossum <guido at> wrote:
> ISTM the only reasonable thing is to have a random seed picked very early
> in the process, to be used to change the hash() function of
> str/bytes/unicode (in a way that they are still compatible with each other).

Do str and bytes still have to be compatible with each other in 3.x?

Merry hashes, weakrefs and thread-local memoryviews to everyone!



From guido at  Sun Jan  1 05:22:47 2012
From: guido at (Guido van Rossum)
Date: Sat, 31 Dec 2011 21:22:47 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou <solipsis at> wrote:

> On Sat, 31 Dec 2011 16:56:00 -0700
> Guido van Rossum <guido at> wrote:
> > ISTM the only reasonable thing is to have a random seed picked very early
> > in the process, to be used to change the hash() function of
> > str/bytes/unicode (in a way that they are still compatible with each
> other).
> Do str and bytes still have to be compatible with each other in 3.x?

Hm, you're right, that's no longer a concern. (Though ATM the hashes still
*are* compatible.)

> Merry hashes, weakrefs and thread-local memoryviews to everyone!


--Guido van Rossum (

From guido at  Sun Jan  1 05:31:50 2012
From: guido at (Guido van Rossum)
Date: Sat, 31 Dec 2011 21:31:50 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan <paul at> wrote:

> > I'm not too concerned about a 3rd party being able to guess the random
> seed
> > -- this would require much more effort on their part, since they would
> have
> > to generate a new set of colliding keys each time they think they have
> > guessed the hash
> This is incorrect. Once an attacker has guessed the random seed, any
> operation which reveals the ordering of hashed objects can be used to
> verify the answer. JSON responses would be ideal. In fact, an attacker
> can do a brute-force attack of the random seed offline. Once they have
> the seed, generating collisions is a fast process.

Still, it would represent an effort for the attacker of a much greater
magnitude than the current attack. It's all a trade-off -- at some point
it'll just be easier for the attacker to use some other vulnerability. Also
the class of vulnerable servers would be greatly reduced.

> The goal isn't perfection, but we need to do better than a simple
> salt.


> I propose we modify the string hash function like this:
> This code is based on PyPy's implementation, but the concept is
> universal. Rather than choosing a single short random seed per
> process, we generate a much larger random seed (r). As we hash, we
> deterministically choose a portion of that seed and incorporate it
> into the hash process. This modification is a minimally intrusive
> change to the existing hash function, and so should not introduce
> unexpected side effects which might come from switching to a different
> class of hash functions.

I'm not sure I understand this. What's the worry about "a different class
of hash functions"? (It may be clear that I do not have a deep mathematical
understanding of hash functions.)

> I've worked through this code with Alex Gaynor, Antoine Pitrou, and
> Victor Stinner, and have asked several mathematicians and security
> experts to review the concept. The reviewers who have gotten back to
> me thus far have agreed that if the initial random seed is not flawed,

I forget -- what do we do on systems without urandom()? (E.g. Windows?)

> this should not overly change the properties of the hash function, but
> should make it quite difficult for an attacker to deduce the necessary
> information to predictably cause hash collisions. This function is not
> designed to protect against timing attacks, but should be nontrivial
> to reverse even with access to timing data.

Let's worry about timing attacks another time okay?

> Empirical testing shows that this unoptimized python implementation
> produces ~10% slowdown in the hashing of ~20 character strings. This
> is probably an acceptable trade off, and actually provides better
> performance in the case of short strings than a high-entropy
> fixed-length seed prefix.

Hm. I'm not sure I like the idea of extra arithmetic for every character
being hashed. But I like the idea of a bigger random seed from which we
deterministically pick some part. How about just initializing x to some
subsequence of the seed determined by e.g. the length of the hashed string
plus a few characters from it?

--Guido van Rossum (

From paul at  Sun Jan  1 06:57:09 2012
From: paul at (Paul McMillan)
Date: Sat, 31 Dec 2011 21:57:09 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> Still, it would represent an effort for the attacker of a much greater
> magnitude than the current attack. It's all a trade-off -- at some point
> it'll just be easier for the attacker to use some other vulnerability. Also
> the class of vulnerable servers would be greatly reduced.

I agree that doing anything is better than doing nothing. If we use
the earlier suggestion and prepend everything with a fixed-length
seed, we need quite a bit of entropy (and so a fairly long string) to
make a lasting difference.

> I'm not sure I understand this. What's the worry about "a different class of
> hash functions"? (It may be clear that I do not have a deep mathematical
> understanding of hash functions.)

This was mostly in reference to earlier suggestions of switching to
cityhash, or using btrees, or other more invasive changes. Python 2.X
is pretty stable and making large changes like that to the codebase
can have unpredictable effects. We know that the current hash function
works well (except for this specific problem), so it seems like the
best fix will be as minimal a modification as possible, to avoid
introducing bugs.

> I forget -- what do we do on systems without urandom()? (E.g. Windows?)
Windows has CryptGenRandom which is approximately equivalent.

> Let's worry about timing attacks another time okay?
Agreed. As long as there isn't a gaping hole, I'm fine with that.

> Hm. I'm not sure I like the idea of extra arithmetic for every character
> being hashed.

From a performance standpoint, this may still be better than adding 8
or 10 characters to every single hash operation, since most hashes are
over short strings. It is important that this function touches every
character - if it only interacts with a subset of them, an attacker
can fix that subset and vary the rest.

> But I like the idea of a bigger random seed from which we
> deterministically pick some part.

Yeah. This makes it much harder to attack, since it very solidly
places the attacker outside the realm of "just brute force the key".

> How about just initializing x to some
> subsequence of the seed determined by e.g. the length of the hashed string
> plus a few characters from it?

We did consider this, and if performance is absolutely the prime
directive, this (or a variant) may be the best option. Unfortunately,
the collision generator doesn't necessarily vary the length of the
string. Additionally, if we don't vary based on all the letters in the
string, an attacker can fix the characters that we do use and generate
colliding strings around them.

Another option to consider would be to apply this change to some but
not all of the rounds. If we added the seed lookup xor operation for
only the first and last 5 values of x, we would still retain much of
the benefit without adding much computational overhead for very long
strings.

We could also consider a less computationally expensive operation than
the modulo for calculating the lookup index, like simply truncating to
the correct number of bits.
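
To illustrate the truncation idea: when the seed table size is a power of two, masking off the low bits selects the same index as the modulo. The table size of 2**8 entries is an assumption of this sketch.

```python
TABLE_BITS = 8                 # assumed: a seed table with 2**8 entries
TABLE_SIZE = 1 << TABLE_BITS
MASK = TABLE_SIZE - 1


def index_by_modulo(x):
    """Lookup index computed with a division-based modulo."""
    return x % TABLE_SIZE


def index_by_mask(x):
    """Same index, computed by keeping only the low TABLE_BITS bits."""
    return x & MASK
```

In Python the two agree for negative integers as well. In C, `%` applied to a negative signed value can give a negative result, so the masked form is also the safer one to use as an array index there.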


From guido at  Sun Jan  1 16:09:54 2012
From: guido at (Guido van Rossum)
Date: Sun, 1 Jan 2012 08:09:54 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan <paul at> wrote:

> > Still, it would represent an effort for the attacker of a much greater
> > magnitude than the current attack. It's all a trade-off -- at some point
> > it'll just be easier for the attacker to use some other vulnerability.
> Also
> > the class of vulnerable servers would be greatly reduced.
> I agree that doing anything is better than doing nothing. If we use
> the earlier suggestion and prepend everything with a fixed-length
> seed, we need quite a bit of entropy (and so a fairly long string) to
> make a lasting difference.

Ah, but the effect of that long string is summarized in a single (32- or
64-bit) integer.

>  > I'm not sure I understand this. What's the worry about "a different
> class of
> > hash functions"? (It may be clear that I do not have a deep mathematical
> > understanding of hash functions.)
> This was mostly in reference to earlier suggestions of switching to
> cityhash, or using btrees, or other more invasive changes. Python 2.X
> is pretty stable and making large changes like that to the codebase
> can have unpredictable effects.


> We know that the current hash function
> works well (except for this specific problem), so it seems like the
> best fix will be as minimal a modification as possible, to avoid
> introducing bugs.


> > I forget -- what do we do on systems without urandom()? (E.g. Windows?)
> Windows has CryptGenRandom which is approximately equivalent.
> > Let's worry about timing attacks another time okay?
> Agreed. As long as there isn't a gaping hole, I'm fine with that.
> > Hm. I'm not sure I like the idea of extra arithmetic for every character
> > being hashed.
> From a performance standpoint, this may still be better than adding 8
> or 10 characters to every single hash operation, since most hashes are
> over short strings.

But how about precomputing the intermediate value (x)? The hash is (mostly)
doing x = f(x, c) for each c in the input.
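
A small sketch of why precomputing works, using a simplified iteration x = f(x, c) that leaves out the initial shift and the final length mix of the real function. The prefix bytes and all names are placeholders for this example.

```python
MASK = 0xFFFFFFFF
MULTIPLIER = 1000003


def step(x, c):
    """One round of the simplified hash: x = f(x, c)."""
    return ((MULTIPLIER * x) ^ c) & MASK


def run(x, data):
    """Feed every byte of data through step(), starting from state x."""
    for c in data:
        x = step(x, c)
    return x


# Stand-in for a random per-process prefix (fixed here for repeatability).
SEED_PREFIX = b"\x13\x37\xc0\xde\x00\x11\x22\x33"

# Computed once per process: the state after hashing the prefix.
PRECOMPUTED = run(0, SEED_PREFIX)


def hash_with_prefix(data):
    """Hash prefix + data from scratch: extra rounds on every call."""
    return run(0, SEED_PREFIX + data)


def hash_from_state(data):
    """Start from the precomputed state: no extra per-call rounds."""
    return run(PRECOMPUTED, data)
```

Both functions return the same value, so the whole prefix, however long, collapses into one integer of state.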

> It is important that this function touches every
> character - if it only interacts with a subset of them, an attacker
> can fix that subset and vary the rest.

I sort of see your point, but I still think that if we could add as little
per-character overhead as possible it would be best -- sometimes people
*do* hash very long strings.

> > But I like the idea of a bigger random seed from which we
> > deterministically pick some part.
> Yeah. This makes it much harder to attack, since it very solidly
> places the attacker outside the realm of "just brute force the key".
> > How about just initializing x to some
> > subsequence of the seed determined by e.g. the length of the hashed
> string
> > plus a few characters from it?
> We did consider this, and if performance is absolutely the prime
> directive, this (or a variant) may be the best option. Unfortunately,
> the collision generator doesn't necessarily vary the length of the
> string. Additionally, if we don't vary based on all the letters in the
> string, an attacker can fix the characters that we do use and generate
> colliding strings around them.

Still, much more work for the attacker.

> Another option to consider would be to apply this change to some but
> not all of the rounds. If we added the seed lookup xor operation for
> only the first and last 5 values of x, we would still retain much of
> the benefit without adding much computational overhead for very long
> strings.

I like that.

> We could also consider a less computationally expensive operation than
> the modulo for calculating the lookup index, like simply truncating to
> the correct number of bits.

Sure. Thanks for thinking about all the details here!!

--Guido van Rossum (

From guido at  Sun Jan  1 16:13:11 2012
From: guido at (Guido van Rossum)
Date: Sun, 1 Jan 2012 08:13:11 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

Different concern. What if someone were to have code implementing an
external, persistent hash table, using Python's hash() function? They might
have a way to rehash everything when a new version of Python comes along,
but they would not be happy if hash() is different in each process. I
somehow vaguely remember possibly having seen such code, or something else
where a bit of random data was needed and hash() was used since it's so
easily available.
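
For code of that kind, one way to stay independent of hash() is to derive bucket numbers from a fixed digest. This is a different technique from anything proposed in this thread, shown only as a sketch; the function name and the choice of SHA-256 are assumptions.

```python
import hashlib


def stable_bucket(key, bucket_count):
    """Map a text key to a bucket number that is the same in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % bucket_count
```

The result depends only on the key and the bucket count, so it survives interpreter restarts and upgrades, at the cost of being slower than the built-in hash().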

--Guido van Rossum (

From guido at  Sun Jan  1 16:13:44 2012
From: guido at (Guido van Rossum)
Date: Sun, 1 Jan 2012 08:13:44 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

PS. Is the collision-generator used in the attack code open source?

--Guido van Rossum (

From lists at  Sun Jan  1 16:27:51 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 16:27:51 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01.01.2012 16:13, Guido van Rossum wrote:
> Different concern. What if someone were to have code implementing an
> external, persistent hash table, using Python's hash() function? They
> might have a way to rehash everything when a new version of Python comes
> along, but they would not be happy if hash() is different in each
> process. I somehow vaguely remember possibly having seen such code, or
> something else where a bit of random data was needed and hash() was used
> since it's so easily available.

I had the same concern as you and was worried that projects like ZODB
might require a stable hash function. Fred already stated that ZODB
doesn't use the hash in its btree structures.

Possible solutions:

 * make it possible to provide the seed as an env var

 * disable randomizing as default setting or at least add an option to
disable randomization

IMHO the issue needs a PEP that explains the issue, shows all possible
solutions and describes how we have solved the issue. I'm willing to
start a PEP. Who likes to be the co-author?


From lists at  Sun Jan  1 16:30:26 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 16:30:26 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdo2u1$k0u$>
References: <>
Message-ID: <>

On 31.12.2011 23:38, Terry Reedy wrote:
> On 12/31/2011 4:43 PM, PJ Eby wrote:
>> Here's an idea.  Suppose we add a sys.hash_seed or some such, that's
>> settable to an int, and defaults to whatever we're using now.  Then
>> programs that want a fix can just set it to a random number,
> I do not think we can allow that to change once there are hashed 
> dictionaries existing.

Me, too. Armin suggested using an env var for the random seed.

From lists at  Sun Jan  1 16:48:32 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 16:48:32 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01.01.2012 00:56, Guido van Rossum wrote:
> ISTM the only reasonable thing is to have a random seed picked very
> early in the process, to be used to change the hash() function of
> str/bytes/unicode (in a way that they are still compatible with each other).
> The seed should be unique per process except it should survive fork()
> (but not exec()). I'm not worried about unrelated processes needing to
> have the same hash(), but I'm not against offering an env variable or
> command line flag to force the seed.

I've created a clone at as a
testbed. The code creates the seed very early in PyInitializeEx(). The
method isn't called on fork() but on exec().

> I'm not too concerned about a 3rd party being able to guess the random
> seed -- this would require much more effort on their part, since they
> would have to generate a new set of colliding keys each time they think
> they have guessed the hash (as long as they can't force the seed -- this
> actually argues slightly *against* offering a way to force the seed,
> except that we have strong backwards compatibility requirements).

The speakers claim and have shown that it's too easy to pre-calculate
collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
might be a good idea to change the hashing algorithm, too. Paul has
listed some new algorithms. Ruby 1.9 is using FNV, which promises to be
fast with a good dispersion pattern. A hashing algorithm without a
meet-in-the-middle vulnerability would reduce the pressure on a good and
secure seed, too.

> We need to fix this as far back as Python 2.6, and it would be nice if a
> source patch was available that works on Python 2.5 -- personally I do
> have a need for a 2.5 fix and if nobody creates one I will probably end
> up backporting the fix from 2.6 to 2.5.

+1


Should the randomization be disabled on 2.5 to 3.2 by default to reduce
backward compatibility issues?


From lists at  Sun Jan  1 16:56:19 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 16:56:19 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01.01.2012 05:11, Antoine Pitrou wrote:
> On Sat, 31 Dec 2011 16:56:00 -0700
> Guido van Rossum <guido at> wrote:
>> ISTM the only reasonable thing is to have a random seed picked very early
>> in the process, to be used to change the hash() function of
>> str/bytes/unicode (in a way that they are still compatible with each other).
> Do str and bytes still have to be compatible with each other in 3.x?

py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking
about this invariant?


From solipsis at  Sun Jan  1 17:09:23 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 1 Jan 2012 17:09:23 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
Message-ID: <>

On Sun, 01 Jan 2012 16:48:32 +0100
Christian Heimes <lists at> wrote:
> The speakers claim and have shown that it's too easy to pre-calculate
> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
> might be a good idea to change the hashing algorithm, too. Paul has
> listed some new algorithms. Ruby 1.9 is using FNV
> which promises to be fast with a
> good dispersion pattern.

We already seem to be using a FNV-alike, is it just a matter of
changing the parameters?

> A hashing algorithm without a
> meet-in-the-middle vulnerability would reduce the pressure on a good and
> secure seed, too.
> > We need to fix this as far back as Python 2.6, and it would be nice if a
> > source patch was available that works on Python 2.5 -- personally I do
> > have a need for a 2.5 fix and if nobody creates one I will probably end
> > up backporting the fix from 2.6 to 2.5.
> +1
> Should the randomization be disabled on 2.5 to 3.2 by default to reduce
> backward compatibility issues?

Isn't 2.5 already EOL'ed?
As for 3.2, I'd say no. I don't know about 2.6 and 2.7.



From solipsis at  Sun Jan  1 17:10:03 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 1 Jan 2012 17:10:03 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, 01 Jan 2012 16:56:19 +0100
Christian Heimes <lists at> wrote:
> On 01.01.2012 05:11, Antoine Pitrou wrote:
> > On Sat, 31 Dec 2011 16:56:00 -0700
> > Guido van Rossum <guido at> wrote:
> >> ISTM the only reasonable thing is to have a random seed picked very early
> >> in the process, to be used to change the hash() function of
> >> str/bytes/unicode (in a way that they are still compatible with each other).
> > 
> > Do str and bytes still have to be compatible with each other in 3.x?
> py3k has tests for hash("ascii") == hash(b"ascii"). Are you talking
> about this invariant?

Yes. It doesn't seem to have any point anymore.



From lists at  Sun Jan  1 17:20:34 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 17:20:34 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01.01.2012 06:57, Paul McMillan wrote:
> I agree that doing anything is better than doing nothing. If we use
> the earlier suggestion and prepend everything with a fixed-length
> seed, we need quite a bit of entropy (and so a fairly long string) to
> make a lasting difference.

Your code at reads about 2
MB (2**21 - 1) data from urandom. I'm worried that this is going to
exhaust the OS's random pool and suck it dry. We shouldn't forget that
Python is used for long running processes as well as short scripts. Your
suggestion also increases the process size by 2 MB which is going to be
an issue for mobile and embedded platforms.

How about this:

import os

# bytearray() yields integers on both Python 2 and Python 3.
r = list(bytearray(os.urandom(256)))
rs = bytearray(os.urandom(4)) # or 8 ?
seed = rs[-1] + (rs[-2] << 8) + (rs[-3] << 16) + (rs[-4] << 24)

def _hash_string(s):
    """The algorithm behind compute_hash() for a string or a unicode."""
    from pypy.rlib.rarithmetic import intmask
    length = len(s)
    if length == 0:
        return -1
    x = intmask(seed + (ord(s[0]) << 7))
    i = 0
    while i < length:
        o = ord(s[i])
        # Mix in one entry of the random table r for every character.
        x = intmask((1000003*x) ^ o ^ r[o % 0xff])
        i += 1
    x ^= length
    return intmask(x)

This combines a random seed for the hash with your suggestion.

We also need to special-case short strings. The above routine hands the
seed over to an attacker who is able to retrieve lots of single-character
hashes. The randomization shouldn't be used if we can prove
that it's not possible to create hash collisions for strings shorter
than X. For example, 64-bit FNV-1 has no collisions for 8 chars or fewer,
and 32-bit FNV has no collisions for 4 chars or fewer.



From lists at  Sun Jan  1 17:34:31 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 17:34:31 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 01.01.2012 17:09, Antoine Pitrou wrote:
> On Sun, 01 Jan 2012 16:48:32 +0100
> Christian Heimes <lists at> wrote:
>> The speakers claim and have shown that it's too easy to pre-calculate
>> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
>> might be a good idea to change the hashing algorithm, too. Paul has
>> listed some new algorithms. Ruby 1.9 is using FNV
>> which promises to be fast with a
>> good dispersion pattern.
> We already seem to be using a FNV-alike, is it just a matter of
> changing the parameters?

No, we are using something similar to DJBX33X. FNV is a completely
different type of hash algorithm.

From solipsis at  Sun Jan  1 17:54:04 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 01 Jan 2012 17:54:04 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <1325436844.3472.6.camel@localhost.localdomain>

On Sunday 01 January 2012 at 17:34 +0100, Christian Heimes wrote:
> On 01.01.2012 17:09, Antoine Pitrou wrote:
> > On Sun, 01 Jan 2012 16:48:32 +0100
> > Christian Heimes <lists at> wrote:
> >> The speakers claim and have shown that it's too easy to pre-calculate
> >> collisions with hashing algorithms similar to DJBX33X / DJBX33A. It
> >> might be a good idea to change the hashing algorithm, too. Paul has
> >> listed some new algorithms. Ruby 1.9 is using FNV
> >> which promises to be fast with a
> >> good dispersion pattern.
> > 
> > We already seem to be using a FNV-alike, is it just a matter of
> > changing the parameters?
> No, we are using something similar to DJBX33X. FNV is a completely
> different type of hash algorithm.

I don't understand. FNV-1 multiplies the current running result with a
prime and then xors it with the following byte. This is also what we do.
(I'm assuming 1000003 is prime)

I see two differences:
- FNV starts with a non-zero constant offset basis
- FNV uses a different prime than ours

(as a side note, FNV operates on bytes, but for unicode we must operate
on code points in [0, 1114111]: although arguably the common case is
hashing ASCII substrings (protocol tokens etc.))
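
For comparison, here is 32-bit FNV-1 next to a simplified rendition of the string hash under discussion. The second function is an approximation for illustration only: it works on bytes, uses unsigned 32-bit arithmetic, and leaves out the special-casing of -1 in the real code.

```python
MASK32 = 0xFFFFFFFF


def fnv1_32(data):
    """32-bit FNV-1: multiply by the FNV prime, then XOR in each byte."""
    h = 2166136261                       # FNV-1 32-bit offset basis
    for byte in data:
        h = (h * 16777619) & MASK32      # FNV 32-bit prime
        h ^= byte
    return h


def cpython_like_32(data):
    """Simplified, unsigned 32-bit version of the current string hash."""
    if not data:
        return 0
    x = (data[0] << 7) & MASK32          # start value taken from the input
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK32
    return x ^ len(data)                 # final mix with the length
```

The inner loops have the same multiply-then-XOR shape. The visible differences are the starting value, the multiplier, and the final XOR with the length.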



From lists at  Sun Jan  1 18:28:19 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 18:28:19 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <1325436844.3472.6.camel@localhost.localdomain>
References: <>
	<> <>
Message-ID: <>

On 01.01.2012 17:54, Antoine Pitrou wrote:
> I don't understand. FNV-1 multiplies the current running result with a
> prime and then xors it with the following byte. This is also what we do.
> (I'm assuming 1000003 is prime)

There must be a major difference somewhere inside the algorithm. The
talk at the CCC conference in Berlin mentions that Ruby 1.9 is not
vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C
code of FNV is more complex than our code, too.


From lists at  Sun Jan  1 18:32:12 2012
From: lists at (Christian Heimes)
Date: Sun, 01 Jan 2012 18:32:12 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01.01.2012 01:11, Guido van Rossum wrote:
> FWIW I managed to build Python 2.6, and a trivial mutation of the
> string/unicode hash function (add 1 initially) made only three tests
> fail; test_symtable and test_json both have a dependency on dictionary
> order, test_ctypes I can't quite figure out what's going on.

In my fork, these tests are failing:

test_dbm test_dis test_gdb test_inspect test_packaging test_set
test_symtable test_urllib test_userdict test_collections

From victor.stinner at  Sun Jan  1 18:32:51 2012
From: victor.stinner at (Victor Stinner)
Date: Sun, 01 Jan 2012 18:32:51 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 01/01/2012 04:29, Paul McMillan wrote:
> This is incorrect. Once an attacker has guessed the random seed, any
> operation which reveals the ordering of hashed objects can be used to
> verify the answer. JSON responses would be ideal. In fact, an attacker
> can do a brute-force attack of the random seed offline. Once they have
> the seed, generating collisions is a fast process.

If we want to protect a website against this attack for example, we must 
suppose that the attacker can inject arbitrary data and can get 
(indirectly) the result of hash(str) (e.g. with the representation of a 
dict in a traceback, with a JSON output, etc.).

> The goal isn't perfection, but we need to do better than a simple
> salt.

I disagree. I don't want to break backward compatibility and have a 
hash() function different for each process, if the change is not an 
effective protection against the "hash vulnerability".

It's really hard to write a good (secure) hash function: see for example
the recent NIST competition (started in 2008, will end this year). Even
good security researchers are unable to write a strong and fast hash
function. It's easy to add a weakness in the function if you don't have
a good background in cryptography. The NIST competition gives 4 years to
analyze new hash functions. We should not rush to add a quick "hack" if
it doesn't correctly solve the problem (protect against a collision
attack and preimage attack).

Runtime performance does matter. I'm not completely sure that Python
itself is the best place to add a countermeasure against this
vulnerability. I don't want to slow down numpy for a web vulnerability.
Because there are different use cases, a better compromise is maybe to
add a runtime option to use a secure hash function, and keep the unsafe
but fast hash function by default.

> I propose we modify the string hash function like this:

Always allocating 2**21 bytes just to work around one specific kind of
attack is not acceptable. I suppose that the maximum acceptable is 4096
bytes (or better, 256 bytes).

Cryptographic hash functions don't need random data, why would Python
need 2 MB (!) for its hash function?


From tjreedy at  Sun Jan  1 19:45:01 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 01 Jan 2012 13:45:01 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <jdq9je$bnl$>

On 1/1/2012 10:13 AM, Guido van Rossum wrote:
> PS. Is the collision-generator used in the attack code open source?

As I posted before, Alexander Klink and Julian Wälde gave their project
email as hashDoS at Since they indicated disappointment in not 
hearing from Python, I presume they would welcome engagement.

Terry Jan Reedy

From tjreedy at  Sun Jan  1 19:46:51 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 01 Jan 2012 13:46:51 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <jdq9mr$bnl$>

On 1/1/2012 12:28 PM, Christian Heimes wrote:
> On 01.01.2012 17:54, Antoine Pitrou wrote:
>> I don't understand. FNV-1 multiplies the current running result with a
>> prime and then xors it with the following byte. This is also what we do.
>> (I'm assuming 1000003 is prime)
> There must be a major difference somewhere inside the algorithm. The
> talk at the CCC conference in Berlin mentions that Ruby 1.9 is not
> vulnerable to meet-in-the-middle attacks and Ruby 1.9 uses FNV. The C
> code of FNV is more complex than our code, too.

I understood Alexander Klink and Julian Wälde, hashDoS at, as
saying that they consider that using a random non-zero start value is 
sufficient to make the hash non-vulnerable.

Terry Jan Reedy

From jimjjewett at  Mon Jan  2 00:28:02 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 18:28:02 -0500
Subject: [Python-Dev]  Hash collision security issue (now public)
Message-ID: <>

Steven D'Aprano (in

> By compile-time, do you mean when the byte-code is compiled, i.e. just
> before runtime, rather than a switch when compiling the Python executable from
> source?

No.  I really mean when the C code is initially compiled to produce a
Python executable.

The only reason we're worrying about this is that an adversary may
force worst-case performance.  If the python instance isn't a server,
or at least isn't exposed to untrusted clients, then even a single
extra "if" test is unjustified overhead.  Adding overhead to every
string hash or every dict lookup is bad.

That said, adding some overhead (only) to dict lookups *that already
hit half a dozen consecutive collisions* probably is reasonable,
because that won't happen very often with normal data.  (6 collisions
can't happen at all unless there are already at least 6 entries, so
small dicts are safe; with at least 1/3 of the slots empty, it should
happen only 1/729 for worst-size larger dicts.)
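
As a rough check on the estimate above, the following sketch computes the chance of a run of consecutive collisions under an idealized model. The model is an assumption, not something stated in the thread: each probe is treated as landing independently on an occupied slot with probability equal to the load factor. With at most 2/3 of the slots occupied, six occupied probes in a row come out at (2/3)^6 = 64/729, roughly 8.8%. The 1/729 figure is (1/3)^6, which is what the same model gives for a table that is only one-third full.

```python
from fractions import Fraction


def consecutive_collision_bound(load: Fraction, probes: int) -> Fraction:
    """Chance that `probes` independent probes all hit occupied slots."""
    return load ** probes


two_thirds_full = consecutive_collision_bound(Fraction(2, 3), 6)
one_third_full = consecutive_collision_bound(Fraction(1, 3), 6)

print(two_thirds_full, float(two_thirds_full))   # 64/729, about 0.0878
print(one_third_full)                            # 1/729
```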


From paul at  Mon Jan  2 00:43:52 2012
From: paul at (Paul McMillan)
Date: Sun, 1 Jan 2012 15:43:52 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> But how about precomputing the intermediate value (x)? The hash is (mostly)
> doing x = f(x, c) for each c in the input.

That's a fair point. If we go down that avenue, I think simply
choosing a random fixed starting value for x is the correct choice,
rather than computing an intermediate value.
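
A minimal sketch of what "a random fixed starting value for x" could look like, assuming the existing multiply-then-xor loop is kept unchanged and only the start value is mixed with a per-process seed. The function name and the way the seed is combined are illustrative assumptions, not a proposed patch.

```python
MASK64 = (1 << 64) - 1


def seeded_string_hash(data: bytes, seed: int) -> int:
    """Existing-style string hash whose start value is xored with a seed."""
    if not data:
        return 0                          # avoid handing back the raw seed
    x = (seed & MASK64) ^ (data[0] << 7)
    for byte in data:
        x = ((1000003 * x) & MASK64) ^ byte
    return x ^ len(data)


print(seeded_string_hash(b"a", 0))        # 12416037344, same as the unseeded hash
print(seeded_string_hash(b"a", 1) != seeded_string_hash(b"a", 0))   # True
```

The per-character cost is identical to the unseeded loop, which is the property being asked for in the quoted text.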

> I sort of see your point, but I still think that if we could add as little
> per-character overhead as possible it would be best -- sometimes people *do*
> hash very long strings.

Yep, agreed. My original proposal did not adequately address this.

>> Another option to consider would be to apply this change to some but
>> not all of the rounds. If we added the seed lookup xor operation for
>> only the first and last 5 values of x, we would still retain much of
>> the benefit without adding much computational overhead for very long
>> strings.
> I like that.

I believe this is a reasonable solution. An attacker could still
manipulate the internal state of long strings, but the additional
information at both ends should make that difficult to exploit. I'll
talk it over with the reviewers.

>> We could also consider a less computationally expensive operation than
>> the modulo for calculating the lookup index, like simply truncating to
>> the correct number of bits.
> Sure. Thanks for thinking about all the details here!!

Again, I'll talk to the reviewers (and run the randomness test
battery) to double-check that this doesn't affect the distribution
in some unexpected way, but I think it should be fine.

> PS. Is the collision-generator used in the attack code open source?

Not in working form, and they've turned down requests for it from
other projects that want to check their work. If it's imperative that
we have one, I can write one, but I'd rather not spend the effort if
we don't need it.


From paul at  Mon Jan  2 00:49:14 2012
From: paul at (Paul McMillan)
Date: Sun, 1 Jan 2012 15:49:14 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> Different concern. What if someone were to have code implementing an
> external, persistent hash table, using Python's hash() function? They might
> have a way to rehash everything when a new version of Python comes along,
> but they would not be happy if hash() is different in each process. I
> somehow vaguely remember possibly having seen such code, or something else
> where a bit of random data was needed and hash() was used since it's so
> easily available.

I agree that there are use cases for allowing users to choose the
random seed, in much the same way it's helpful to be able to set it
for the random number generator. This should probably be something
that can be passed in at runtime. This feature would also be useful
for users who want to synchronize the hashes of multiple independent
processes, for whatever reason. For the general case though,
randomization should be on by default.
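
A sketch of the seed selection described above: random by default, with an override that can be passed in at runtime. The environment variable name EXAMPLE_HASH_SEED is purely hypothetical, chosen for this illustration; nothing in the thread fixes a name or a mechanism.

```python
import os

MASK64 = (1 << 64) - 1
SEED_ENV_VAR = "EXAMPLE_HASH_SEED"        # hypothetical name, for illustration only


def pick_hash_seed(environ=None) -> int:
    """Return a forced seed if one is configured, else a fresh random one."""
    if environ is None:
        environ = os.environ
    forced = environ.get(SEED_ENV_VAR)
    if forced is not None:
        return int(forced) & MASK64       # reproducible hashes across processes
    return int.from_bytes(os.urandom(8), "little")


print(pick_hash_seed({SEED_ENV_VAR: "42"}))   # 42
```

Two processes started with the same forced value would agree on every hash, which covers the persistent-hash-table and synchronization cases mentioned above.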


From jimjjewett at  Mon Jan  2 01:02:44 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 19:02:44 -0500
Subject: [Python-Dev]  Hash collision security issue (now public)
Message-ID: <>

Paul McMillan in

> Guido van Rossum wrote:
>> Hm. I'm not sure I like the idea of extra arithmetic for every character
>> being hashed.

> the collision generator doesn't necessarily vary the length of the
> string. Additionally, if we don't vary based on all the letters in the
> string, an attacker can fix the characters that we do use and generate
> colliding strings around them.

If the new hash algorithm doesn't kick in before, say, 32 characters,
then most currently hashed strings will not be affected.  And if the
attacker has to add 32 characters to every key, it reduces the "this
can be done with only N bytes uploaded" risk.  (The same logic
would apply to even longer prefixes, except that an attacker might
more easily find short-enough strings that collide.)

> We could also consider a less computationally expensive operation
> than the modulo for calculating the lookup index, like simply truncating
> to the correct number of bits.

Given that the modulo is always 2^N, how is that different?
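
A quick check of that equivalence, assuming a table of 2048 entries (2^11): reducing modulo a power of two and masking off the low bits select the same index. In Python this holds for negative values as well; in C the two agree for unsigned operands, whereas a signed `%` can return a negative remainder.

```python
def index_by_modulo(x: int, bits: int) -> int:
    """Table index via modulo by 2**bits."""
    return x % (1 << bits)


def index_by_mask(x: int, bits: int) -> int:
    """Table index via truncation to the low `bits` bits."""
    return x & ((1 << bits) - 1)


same = all(index_by_modulo(x, 11) == index_by_mask(x, 11)
           for x in range(-5000, 5000))
print(same)                               # True
```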


From jimjjewett at  Mon Jan  2 01:21:11 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 19:21:11 -0500
Subject: [Python-Dev]  Hash collision security issue (now public)
Message-ID: <>

Victor Stinner wrote in

> If we want to protect a website against this attack for example, we must
> suppose that the attacker can inject arbitrary data and can get
> (indirectly) the result of hash(str) (e.g. with the representation of a
> dict in a traceback, with a JSON output, etc.).

(1)  Is it common to hash non-string input?  Because generating integers
that collide for certain dict sizes is pretty easy...

(2)  Would it make sense for traceback printing to sort dict keys?  (Any site
worried about this issue should already be hiding tracebacks from untrusted
clients, but the cost of this extra protection may be pretty small, given that
tracebacks shouldn't be printed all that often in the first place.)

(3)  Should the docs for json.encoder.JSONEncoder suggest sort_keys=True?
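
For point (3), a short example of what sort_keys=True does with the standard json module: keys are emitted in sorted order, so the output no longer reflects the dict's internal ordering. The payload is made up for illustration.

```python
import json

payload = {"beta": 1, "alpha": 2, "gamma": 3}

# Module-level helper and the encoder class accept the same option.
encoded = json.dumps(payload, sort_keys=True)
encoded_via_class = json.JSONEncoder(sort_keys=True).encode(payload)

print(encoded)                            # {"alpha": 2, "beta": 1, "gamma": 3}
```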


From jimjjewett at  Mon Jan  2 01:37:26 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 19:37:26 -0500
Subject: [Python-Dev]
Message-ID: <>

P. J. Eby wrote:

> On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull <stephen at> wrote:

>> While the dictionary probe has to start with a hash for backward
>> compatibility reasons, is there a reason the overflow strategy for
>> insertion has to be buckets containing lists?  How about
>> double-hashing, etc?

> This won't help, because the keys still have the same hash value. ANYTHING
> you do to them after they're generated will result in them still colliding.

> The *only* thing that works is to change the hash function in such a way
> that the strings end up with different hashes in the first place.
> Otherwise, you'll still end up with (deliberate) collisions.

Well, there is nothing wrong with switching to a different hash function
after N collisions, rather than "in the first place".  The perturbation
effectively does that by shoving the high-order bits through the part of
the hash that survives the mask.

> (Well, technically, you could use trees or some other O log n data
> structure as a fallback once you have too many collisions, for some value
> of "too many".  Seems a bit wasteful for the purpose, though.)

Your WSGI specification < > requires using a real dictionary for
compatibility; storing some of the values outside the values array would
violate that.  Do you consider that obsolete?


From lists at  Mon Jan  2 02:04:38 2012
From: lists at (Christian Heimes)
Date: Mon, 02 Jan 2012 02:04:38 +0100
Subject: [Python-Dev]
In-Reply-To: <>
References: <>
Message-ID: <>

On 02.01.2012 01:37, Jim Jewett wrote:
> Well, there is nothing wrong with switching to a different hash function after N
> collisions, rather than "in the first place".  The perturbation
> effectively does by
> shoving the high-order bits through the part of the hash that survives the mask.

Except that it won't work, or it will slow down every lookup of missing
keys? It's absolutely crucial that the lookup time is kept as fast as
possible.

You can't just change the hash algorithm in the middle of the work
without a speed impact on lookups. The size of the dict can shrink or
grow over time. This results in a different number of collisions for
the same string. Cuckoo hashing doesn't
sound feasible for us because it slows down lookups and requires an
ABI-incompatible change for more hash slots on str/bytes/unicode instances.


PS: Something is wrong with your email client. Every one of your replies
starts a new thread for me.

From solipsis at  Mon Jan  2 02:19:37 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 2 Jan 2012 02:19:37 +0100
Subject: [Python-Dev]
References: <>
Message-ID: <>

On Mon, 02 Jan 2012 02:04:38 +0100
Christian Heimes <lists at> wrote:
> PS: Something is wrong with your email client. Every of your replies
> starts a new thread for me.

Same here.



From jimjjewett at  Mon Jan  2 02:23:16 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 20:23:16 -0500
Subject: [Python-Dev]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 1, 2012 at 8:04 PM, Christian Heimes <lists at> wrote:
> On 02.01.2012 01:37, Jim Jewett wrote:
>> Well, there is nothing wrong with switching to a different hash function after N
>> collisions, rather than "in the first place".  The perturbation
>> effectively does by
>> shoving the high-order bits through the part of the hash that survives the mask.

> Except that it won't work or slow down every lookup of missing keys?
> It's absolutely crucial that the lookup time is kept as fast as possible.

It will only slow down missing keys that themselves hit more than N collisions.

Or were you assuming that I meant to switch the whole table, rather
than just that one key?  I agree that wouldn't work.

> You can't just change the hash algorithm in the middle of the work
> without a speed impact on lookups.

Right -- but there is nothing wrong with modifying the lookdict (and
insert_clean) functions to do something different after the Nth
collision than they did after the N-1th.


From pje at  Mon Jan  2 04:00:33 2012
From: pje at (PJ Eby)
Date: Sun, 1 Jan 2012 22:00:33 -0500
Subject: [Python-Dev]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett <jimjjewett at> wrote:

> Well, there is nothing wrong with switching to a different hash function
> after N
> collisions, rather than "in the first place".  The perturbation
> effectively does by
> shoving the high-order bits through the part of the hash that survives the
> mask.

Since these are true hash collisions, they will all have the same high
order bits.  So, the usefulness of the perturbation is limited mainly to
the common case where true collisions are rare.
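
To illustrate why the perturbation cannot separate true collisions, here is a sketch of a perturbed probe sequence. It is modelled on the recurrence used by the dict lookup (next index from 5*i + perturb + 1, perturb shifted right by 5 bits each step) and assumes non-negative hash values; the function name is made up and the code is not the C implementation.

```python
PERTURB_SHIFT = 5


def probe_sequence(full_hash: int, mask: int, count: int) -> list:
    """Slots visited for a key with the given full hash, in probe order."""
    i = full_hash & mask
    perturb = full_hash
    slots = [i]
    for _ in range(count - 1):
        i = 5 * i + perturb + 1
        slots.append(i & mask)
        perturb >>= PERTURB_SHIFT         # feed higher-order bits into later probes
    return slots


# Same low bits, different high bits: the sequences diverge on the third probe.
print(probe_sequence(0, 7, 4))            # [0, 1, 6, 7]
print(probe_sequence(32, 7, 4))           # [0, 1, 7, 4]
```

The sequence is a pure function of the full hash, so keys whose full hashes are equal visit exactly the same slots no matter how many bits are perturbed in.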

> (Well, technically, you could use trees or some other O log n data
> > structure as a fallback once you have too many collisions, for some value
> > of "too many".  Seems a bit wasteful for the purpose, though.)
> Your WSGI specification < >
> requires
> using a real dictionary for compatibility; storing some of the values
> outside the
> values array would violate that.

When I said "use some other data structure", I was referring to the
internal implementation of the dict type, not to user code.  The only
user-visible difference (even at C API level) would be the order of keys()
et al.  (In any case, I still assume this is too costly an implementation
change compared to changing the hash function or seeding it.)

From jimjjewett at  Mon Jan  2 04:28:13 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 1 Jan 2012 22:28:13 -0500
Subject: [Python-Dev]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 1, 2012 at 10:00 PM, PJ Eby <pje at> wrote:
> On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett <jimjjewett at> wrote:

>> Well, there is nothing wrong with switching to a different hash function
>> after N
>> collisions, rather than "in the first place".  The perturbation
>> effectively does by
>> shoving the high-order bits through the part of the hash that survives the
>> mask.

> Since these are true hash collisions, they will all have the same high order
> bits.  So, the usefulness of the perturbation is limited mainly to the
> common case where true collisions are rare.

That is only because the perturb is based solely on the hash.
Switching to an entirely new hash after the 5th collision (for a given
lookup) would resolve that (after the 5th collision); the question is
whether or not the cost is worthwhile.
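
A toy sketch of that idea, under stated assumptions: the first five probes follow the usual perturbed sequence driven by the primary hash, and from the sixth probe on the perturbation is re-seeded from a second, independent hash of the key. zlib.crc32 is only a stand-in for "an entirely new hash", and the function name and parameters are hypothetical. This is not how lookdict works today.

```python
import zlib

PERTURB_SHIFT = 5


def probe_slots(primary_hash: int, key: bytes, mask: int, count: int,
                switch_after: int = 5) -> list:
    """Probe order that switches to a second hash after `switch_after` probes."""
    i = primary_hash & mask
    perturb = primary_hash
    slots = [i]
    for n in range(1, count):
        if n == switch_after:
            perturb = zlib.crc32(key)     # stand-in secondary hash of the key itself
        i = 5 * i + perturb + 1
        slots.append(i & mask)
        perturb >>= PERTURB_SHIFT
    return slots


# Two different keys forced to the same primary hash value.
a = probe_slots(0, b"a", 7, 8)
b = probe_slots(0, b"b", 7, 8)
print(a[:5] == b[:5], a[5] != b[5])       # True True
```

Only lookups that have already collided five times pay for the second hash; whether that cost is worthwhile is the open question raised above.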

>> > (Well, technically, you could use trees or some other O log n data
>> > structure as a fallback once you have too many collisions, for some
>> > value
>> > of "too many".  Seems a bit wasteful for the purpose, though.)
>> Your WSGI specification < >
>> requires
>> using a real dictionary for compatibility; storing some of the values
>> outside the
>> values array would violate that.

> When I said "use some other data structure", I was referring to the internal
> implementation of the dict type, not to user code.  The only user-visible
> difference (even at C API level) would be the order of keys() et al.

Given the wording requiring a real dictionary, I would have assumed
that it was OK (if perhaps not sensible) to do pointer arithmetic and
access the keys/values/hashes directly.  (Though if the breakage was
between python versions, I would feel guilty about griping too
loudly.)


From ncoghlan at  Mon Jan  2 05:44:49 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 2 Jan 2012 14:44:49 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
Message-ID: <>

I've been having an occasional argument with Benjamin regarding braces
in 4-line if statements:

  if (cond)
    statement;
  else
    statement;

vs.

  if (cond) {
    statement;
  } else {
    statement;
  }

He keeps leaving them out, I occasionally tell him they should always
be included (most recently this came up when we gave conflicting
advice to a patch contributor). He says what he's doing is OK, because
he doesn't consider the example in PEP 7 as explicitly disallowing it.
I think it's a recipe for future maintenance hassles when someone adds
a second statement to one of the clauses but doesn't add the braces.
(The only time I consider it reasonable to leave out the braces is for
one liner if statements, where there's no else clause at all)

Since Benjamin doesn't accept the current brace example in PEP 7 as
normative for the case above, I'm bringing it up here to seek
clarification.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ben+python at  Mon Jan  2 06:25:51 2012
From: ben+python at (Ben Finney)
Date: Mon, 02 Jan 2012 16:25:51 +1100
Subject: [Python-Dev] PEP 7 clarification request: braces
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at> writes:

> He keeps leaving [braces] out [when the block is a single statement],
> I occasionally tell him they should always be included (most recently
> this came up when we gave conflicting advice to a patch contributor).

As someone who has maintained his fair share of C code, I am firmly on
the side of unconditionally (!) enclosing C statement blocks in braces
regardless of how many statements they contain.

> He says what he's doing is OK, because he doesn't consider the example
> in PEP 7 as explicitly disallowing it

I wonder if he has a stronger argument in favour of his position,
because "it's not forbidden" doesn't imply "it's okay".

> I think it's a recipe for future maintenance hassles when someone adds
> a second statement to one of the clauses but doesn't add the braces.

Agreed, it's an issue of code maintainability. Which is enough of a
problem in C code that a low-cost improvement like this should always be

But, as someone who carries no water in the Python developer community,
my opinion has no more force than the arguments, and I can't impose it
on anyone. Take it for what it's worth.

 \     "God was invented to explain mystery. God is always invented to |
  `\    explain those things that you do not understand." --Richard P. |
_o__)                                                    Feynman, 1988 |
Ben Finney

From scott+python-dev at  Mon Jan  2 06:04:15 2012
From: scott+python-dev at (Scott Dial)
Date: Mon, 02 Jan 2012 00:04:15 -0500
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/1/2012 11:44 PM, Nick Coghlan wrote:
> I think it's a recipe for future maintenance hassles when someone adds
> a second statement to one of the clauses but doesn't add the braces.
> (The only time I consider it reasonable to leave out the braces is for
> one liner if statements, where there's no else clause at all)

Could you explain how these two cases differ with regard to maintenance?

In either case, there are superfluous edits required if the original
author had used braces *always*. Putting a brace on one-liners adds only
a single line to the code -- just like in the if/else case. So, your
argument seems conflicted. Surely, you would think this is a simpler
edit to make and diff to see in a patch file:

 if(cond) {
+  stmt2;


+if(cond) {
+  stmt2;

Also, the superfluous edits will wrongly attribute the blame for the
conditional to the wrong author.

Scott Dial
scott at

From paul at  Mon Jan  2 06:55:52 2012
From: paul at (Paul McMillan)
Date: Sun, 1 Jan 2012 21:55:52 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

I fixed a couple things in my proposed algorithm:

I had a typo, and used 21 instead of 12 for the size multiplier. We
definitely don't need 2MB random data.

The initialization of r was broken. Now it is an array of ints, so
there's no conversion when it's used. I've adjusted it so there's 8k
of random data, broken into 2048 ints.

I added a length-based seed to the initial value of x. This prevents
single characters from being used to enumerate raw values from r.
This is similar to the change proposed by Christian Heimes.

Most importantly, I moved the xor with r[x % len_r] down a line.
Before, it wasn't being applied to the last character.
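
The proposed code itself is not reproduced in this part of the thread, so the following is only a loose sketch of the scheme as described in this message: a table r of 2048 random 32-bit ints (8 KB), a length-based lookup for the initial value of x, and an xor with r[x % len_r] applied after every character, including the last. All names, the 64-bit width and the handling of the empty string are assumptions.

```python
import random

MASK64 = (1 << 64) - 1
LEN_R = 2048                              # 2048 ints * 4 bytes = 8 KB of random data


def make_table(rng) -> list:
    """Build the per-process random table; `rng` supplies the randomness."""
    return [rng.getrandbits(32) for _ in range(LEN_R)]


def table_hash(data: bytes, r: list) -> int:
    """Multiply-then-xor hash with table lookups mixed into every round."""
    if not data:
        return 0                          # assumption: never expose a raw table entry
    x = r[len(data) % LEN_R]              # length-based seed: first lookup
    for byte in data:
        x = ((1000003 * x) & MASK64) ^ byte
        x ^= r[x % LEN_R]                 # applied after every character
    return x ^ len(data)


# A seeded generator keeps the demo repeatable; a real table would need
# an unpredictable source such as random.SystemRandom().
r = make_table(random.Random(1))
print(table_hash(b"a", r) == table_hash(b"a", r))   # True
```

Even a one-character string goes through two lookups here (the length-based one and the per-character one), matching the "at least 2 lookups" property claimed below.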

> Christian Heimes said:
> We also need to special case short strings. The above routine hands over
> the seed to attackers, if he is able to retrieve lots of single
> character hashes.

The updated code always includes at least 2 lookups from r, which I
believe solves the single-character enumeration problem. If we
special-case part of our hash function for short strings, we may get
suboptimal collisions between the two types of hashes.

I think Ruby uses FNV-1 with a salt, making it less vulnerable to
this. FNV is otherwise similar to our existing hash function.

For the record, cryptographically strong hash functions are in the
neighborhood of 400% slower than our existing hash function.

> Terry Reedy said:
> I understood Alexander Klink and Julian Wälde, hashDoS at, as saying
> that they consider that using a random non-zero start value is sufficient to
> make the hash non-vulnerable.

I've been talking to them. They're happy to look at our proposed
changes. They indicate that a non-zero start value is sufficient to
prevent the attack, but D. J. Bernstein disagrees with them. He also
has indicated a willingness to look at our solution.


From ncoghlan at  Mon Jan  2 07:02:59 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 2 Jan 2012 16:02:59 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial
<scott+python-dev at> wrote:
> On 1/1/2012 11:44 PM, Nick Coghlan wrote:
>> I think it's a recipe for future maintenance hassles when someone adds
>> a second statement to one of the clauses but doesn't add the braces.
>> (The only time I consider it reasonable to leave out the braces is for
>> one liner if statements, where there's no else clause at all)
> Could you explain how these two cases differ with regard to maintenance?

Sure: always including K&R style braces for compound statements (even
when they aren't technically necessary) means that indentation ==
control flow, just like Python. Indent your additions correctly, and
the reader and compiler will agree on what they mean:

if (cond) {
    statement;
} else {
    statement;
    addition;  /* Reader and compiler agree this is part of the else clause */
}

if (cond)
    statement;
else
    statement;
    addition;  /* Uh-oh, should have added braces */

I've been trying to convince Benjamin that there's a reason "always
include the braces" is accepted wisdom amongst many veteran C
programmers (with some allowing an exception for one-liners), but he
isn't believing me, and I'm not going to go through and edit every
single one of his commits to add them.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From pje at  Mon Jan  2 07:16:42 2012
From: pje at (PJ Eby)
Date: Mon, 2 Jan 2012 01:16:42 -0500
Subject: [Python-Dev] That depends on what the meaning of "is" is (was Re:
Message-ID: <>

On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett <jimjjewett at> wrote:

> Given the wording requiring a real dictionary, I would have assumed
> that it was OK (if perhaps not sensible) to do pointer arithmetic and
> access the keys/values/hashes directly.  (Though if the breakage was
> between python versions, I would feel guilty about griping too
> loudly.)

If you're going to be a language lawyer about it, I would simply point out
that all the spec requires is that "type(env) is dict" -- it says nothing
about how Python defines "type" or "is" or "dict".  So, you're on your own
with that one. ;-)

From ron3200 at  Mon Jan  2 07:22:05 2012
From: ron3200 at (Ron Adam)
Date: Mon, 02 Jan 2012 00:22:05 -0600
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <1325485325.20247.37.camel@Gutsy>

On Mon, 2012-01-02 at 14:44 +1000, Nick Coghlan wrote:
> I've been having an occasional argument with Benjamin regarding braces
> in 4-line if statements:
>   if (cond)
>     statement;
>   else
>     statement;
> vs.
>   if (cond) {
>     statement;
>   } else {
>     statement;
>   }
> He keeps leaving them out, I occasionally tell him they should always
> be included (most recently this came up when we gave conflicting
> advice to a patch contributor). He says what he's doing is OK, because
> he doesn't consider the example in PEP 7 as explicitly disallowing it,

> I think it's a recipe for future maintenance hassles when someone adds
> a second statement to one of the clauses but doesn't add the braces.

I've had to correct myself on this one a few times so I will have to
agree it's a good practice.

> (The only time I consider it reasonable to leave out the braces is for
> one liner if statements, where there's no else clause at all)

The problem is only when an additional statement is added to the last
block, not the preceding ones, as the compiler will complain about
those.  So I don't know how the 4 line example without braces is any
worse than a 2 line if without braces.

I think my preference is, if any block in a multi-block expression needs
braces, then the other blocks should have it.  (Including the last block
even if it's a single line.)

The next level up would be to require them on all blocks, even two line
if expressions, but I'm not sure that is really needed.  At some point
the extra noise of the braces makes things harder to read rather than
easier, and what you gain in preventing one type of error may increase
chances of another type of error not being noticed.


> Since Benjamin doesn't accept the current brace example in PEP 7 as
> normative for the case above, I'm bringing it up here to seek
> clarification.

From ncoghlan at  Mon Jan  2 09:31:00 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 2 Jan 2012 18:31:00 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <1325485325.20247.37.camel@Gutsy>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 4:22 PM, Ron Adam <ron3200 at> wrote:
> The problem is only when an additional statement is added to the last
> block, not the preceding ones, as the compiler will complain about
> those.  So I don't know how the 4 line example without braces is any
> worse than a 2 line if without braces.

Even when the compiler picks it up, it's still a wasted edit-compile
cycle. More importantly though, this approach makes the rules too
complicated. "Always use braces" is simple and easy, and the only cost
is the extra line of vertical whitespace for the closing brace.

(I personally don't like even the exception made for single clause if
statements, but that's already too prevalent in the code base to do
anything about. Hence the 4-line example in my original post.)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From raymond.hettinger at  Mon Jan  2 09:47:14 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 2 Jan 2012 00:47:14 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 1, 2012, at 8:44 PM, Nick Coghlan wrote:

> I've been having an occasional argument with Benjamin regarding braces
> in 4-line if statements:
>  if (cond)
>    statement;
>  else
>    statement;
> vs.
>  if (cond) {
>    statement;
>  } else {
>    statement;
>  }

Really?  Do we need to have a brace war?
People have different preferences.
The standard library includes some of both styles
depending on what the maintainer thought was cleanest to their eyes in a given context.


From ncoghlan at  Mon Jan  2 11:15:25 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 2 Jan 2012 20:15:25 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 6:47 PM, Raymond Hettinger
<raymond.hettinger at> wrote:
> Really?  Do we need to have a brace war?
> People have different preferences.
> The standard library includes some of both styles
> depending on what the maintainer thought was cleanest to their eyes in a given context.

If the answer is "either form is OK", I'm actually fine with that (and
will update PEP 7 accordingly). However, I have long read PEP 7 as
*requiring* the braces, and until noticing their absence in some of
Benjamin's checkins and the recent conflicting advice we gave when
reviewing the same patch, I had never encountered their absence in the
CPython code base outside the one-liner/two-liner case*.

Since I *do* feel strongly that leaving them out is a mistake that
encourages future defects, and read PEP 7 as agreeing with that (aside
from the general "follow conventions in surrounding code" escape
clause), I figured it was better to bring it up explicitly and clarify
PEP 7 accordingly (since what is currently there is clearly ambiguous
enough for two current committers to have diametrically opposed views
on what it says).


* That is, constructs like:

  if (error_condition) return -1;

  if (error_condition)
    return -1;

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Mon Jan  2 11:25:16 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 02 Jan 2012 05:25:16 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <jds0md$6q7$>

On 1/2/2012 12:55 AM, Paul McMillan wrote:

>> Terry Reedy said:
>> I understood Alexander Klink and Julian Wälde, hashDoS at, as saying
>> that they consider that using a random non-zero start value is sufficient to
>> make the hash non-vulnerable.
> I've been talking to them. They're happy to look at our proposed
> changes. They indicate that a non-zero start value is sufficient to
> prevent the attack, but D. J. Bernstein disagrees with them. He also
> has indicated a willingness to look at our solution.

Great. My main concern currently is that there should be no noticeable 
slowdown for 64 bit builds which are apparently not vulnerable and which 
therefore would get no benefit.
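
For readers following the "random non-zero start value" idea, here is a
minimal sketch of the classic multiply-and-xor string hash with an extra
start value mixed into the initial state. The function name, the `start`
parameter and the way it is xor-ed in are assumptions made for
illustration; this is not any of the patches under discussion. With
start=0 it is meant to match the unseeded algorithm, expressed as an
unsigned integer.

```python
def legacy_string_hash(data: bytes, start: int = 0, bits: int = 64) -> int:
    """Unsigned multiply-and-xor string hash in the style of CPython 2.x.

    `start` is a hypothetical per-process random value; start=0 gives the
    unseeded behaviour.
    """
    mask = (1 << bits) - 1
    if not data:
        return 0
    # Initial state: first byte shifted left by 7, optionally randomized.
    x = (start ^ (data[0] << 7)) & mask
    for byte in data:
        x = ((1000003 * x) ^ byte) & mask
    x ^= len(data)
    return x


unseeded = legacy_string_hash(b"a")
seeded = legacy_string_hash(b"a", start=0x9E3779B9)
print(unseeded, seeded)
```

Whether a start value alone is enough is exactly what is disputed above;
the sketch only shows where such a value would enter the computation.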

Terry Jan Reedy

From solipsis at  Mon Jan  2 13:01:05 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 2 Jan 2012 13:01:05 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
Message-ID: <>

On Sun, 1 Jan 2012 21:55:52 -0800
Paul McMillan <paul at> wrote:
> This is similar to the change proposed by Christian Heimes.
> Most importantly, I moved the xor with r[x % len_r] down a line.
> Before, it wasn't being applied to the last character.

Shouldn't it be r[i % len(r)] instead?
(refer to yesterday's #python-dev discussion)

> I think Ruby uses FNV-1 with a salt, making it less vulnerable to
> this. FNV is otherwise similar to our existing hash function.

Again, we could re-use FNV-1's primes, since they claim they have
better dispersion properties than the average prime.
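
As a point of reference, a minimal sketch of 32-bit FNV-1 with a salt
folded into the starting value. The offset basis and prime are the
published FNV constants; the `salt` parameter and the choice to xor it
into the offset basis are assumptions for illustration, not a
description of what Ruby or any proposed patch does.

```python
FNV32_OFFSET_BASIS = 2166136261
FNV32_PRIME = 16777619


def fnv1_32(data: bytes, salt: int = 0) -> int:
    """32-bit FNV-1 (multiply, then xor) with a hypothetical salt."""
    h = (FNV32_OFFSET_BASIS ^ salt) & 0xFFFFFFFF
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


print(hex(fnv1_32(b"a")))              # unsalted
print(hex(fnv1_32(b"a", salt=0x1234)))  # same input, different salt
```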



From solipsis at  Mon Jan  2 13:05:28 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 2 Jan 2012 13:05:28 +0100
Subject: [Python-Dev] PEP 7 clarification request: braces
References: <>
Message-ID: <>

On Mon, 2 Jan 2012 14:44:49 +1000
Nick Coghlan <ncoghlan at> wrote:
> I've been having an occasional argument with Benjamin regarding braces
> in 4-line if statements:
>   if (cond)
>     statement;
>   else
>     statement;
> vs.
>   if (cond) {
>     statement;
>   } else {
>     statement;
>   }

Good, I was afraid python-dev was getting a bit futile with all these
security concerns about hash functions.

I don't like having the else on the same line as the closing brace,
and prefer:

   if (cond) {
       statement;
   }
   else {
       statement;
   }

That said, I agree with Benjamin: the shorter form is visually lighter
and should not be frowned upon.


Not-frowning Antoine.

From petri at  Mon Jan  2 14:24:28 2012
From: petri at (Petri Lehtinen)
Date: Mon, 2 Jan 2012 15:24:28 +0200
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <20120102132428.GR24315@p16>

Antoine Pitrou wrote:
> I don't like having the else on the same line as the closing brace,
> and prefer:
>    if (cond) {
>        statement;
>    }
>    else {
>        statement;
>    }

And this is how it's written in PEP-7. It seems to me that PEP-7
doesn't require braces. But it explicitly forbids

    if (cond) {
        ...
    } else {
        ...
    }

by saying "braces as shown", and then showing them like this:

    if (mro != NULL) {
        ...
    }
    else {
        ...
    }

> That said, I agree with Benjamin: the shorter form is visually lighter
> and should not be frowned upon.

Me too.

From ned at  Mon Jan  2 15:10:40 2012
From: ned at (Ned Batchelder)
Date: Mon, 02 Jan 2012 09:10:40 -0500
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/1/2012 11:44 PM, Nick Coghlan wrote:
> I've been having an occasional argument with Benjamin regarding braces
> in 4-line if statements:
>    if (cond)
>      statement;
>    else
>      statement;
> vs.
>    if (cond) {
>      statement;
>    } else {
>      statement;
>    }
> He keeps leaving them out, I occasionally tell him they should always
> be included (most recently this came up when we gave conflicting
> advice to a patch contributor). He says what he's doing is OK, because
> he doesn't consider the example in PEP 7 as explicitly disallowing it,
> I think it's a recipe for future maintenance hassles when someone adds
> a second statement to one of the clauses but doesn't add the braces.
> (The only time I consider it reasonable to leave out the braces is for
> one liner if statements, where there's no else clause at all)
> Since Benjamin doesn't accept the current brace example in PEP 7 as
> normative for the case above, I'm bringing it up here to seek
> clarification.
I've always valued readability and consistency above brevity, and Python 
does too.  Using braces only *sometimes* in C is a recipe for confusion, and
only adds to the cognitive load in reading the code.  The examples 
elsewhere in this thread of mistakes and noisy diffs due to leaving out 
the braces are plenty of reason for me to always include braces.

The current code uses a mixture of styles, but that doesn't mean we need 
to allow any style in the future.  I'm in favor of PEP 7 being amended 
to either require or strongly favor the braces-always style.  Note: 
while we're reading the tea-leaves in PEP 7, it has an example of a 
single-line if clause with no braces.

Some people favor the braces-sometimes style because it leads to 
"lighter" code.  I think that's a misguided optimization.  Consistency 
is better than reducing the line count.

> Cheers,
> Nick.

From stephen at  Mon Jan  2 15:32:19 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 02 Jan 2012 23:32:19 +0900
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

Christian Heimes writes:
 > Am 31.12.2011 13:03, schrieb Stephen J. Turnbull:
 > > I don't know the implementation issues well enough to claim it is a
 > > solution, but this hasn't been mentioned before AFAICS:
 > > 
 > > While the dictionary probe has to start with a hash for backward
 > > compatibility reasons, is there a reason the overflow strategy for
 > > insertion has to be buckets containing lists?  How about
 > > double-hashing, etc?
 > Python's dict implementation doesn't use bucket but open addressing (aka
 > closed hashed table). The algorithm for conflict resolution doesn't use
 > double hashing. Instead it takes the original and (in most cases) cached
 > hash and perturbs the hash with a series of add, multiply and bit shift ops.
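
The probing scheme described in the quoted text can be sketched as
follows. This is a simplified model of the recurrence used in
Objects/dictobject.c, not the real implementation; the function name is
made up for illustration, and hash values are assumed to be
non-negative. It shows that the whole probe sequence is a function of
the hash alone, so keys with identical hashes walk the same slots.

```python
from typing import List

PERTURB_SHIFT = 5


def probe_sequence(hash_value: int, table_size: int, count: int) -> List[int]:
    """Return the first `count` slot indices probed for `hash_value`.

    `table_size` must be a power of two.
    """
    mask = table_size - 1
    perturb = hash_value
    i = hash_value & mask
    slots = [i]
    while len(slots) < count:
        # Same as i = (i << 2) + i + perturb + 1, reduced to the table size.
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
        slots.append(i)
    return slots


# Equal hashes always collide on every probe; hashes that differ only in
# their upper bits start in the same slot but eventually diverge.
print(probe_sequence(0, 8, 5))
print(probe_sequence(32, 8, 5))
```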

In an attack, this is still O(collisions) per probe (as any scheme
where the address of the nth collision is a function of only the
hash), where double hashing should be "roughly" O(1) (with double the

But that evidently imposes too large a performance burden on non-evil
users, so it's not worth thinking about whether "roughly O(1)" is
close enough to O(1) to deter or exhaust attackers.  I withdraw the

From solipsis at  Mon Jan  2 15:41:44 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 2 Jan 2012 15:41:44 +0100
Subject: [Python-Dev] Code reviews
References: <>
Message-ID: <>

On Mon, 2 Jan 2012 14:44:49 +1000
Nick Coghlan <ncoghlan at> wrote:
> He keeps leaving them out, I occasionally tell him they should always
> be included (most recently this came up when we gave conflicting
> advice to a patch contributor).

Oh, by the way, this is also why I avoid arguing too much about style
in code reviews. There are two bad things which can happen:

- your advice conflicts with advice given by another reviewer (perhaps
  on another issue)
- the contributor feels drowned under tiresome requests for style
  fixes ("please indent continuation lines this way")

Both are potentially demotivating. A contributor can have his/her own
style if it doesn't adversely affect code quality.



From benjamin at  Mon Jan  2 15:54:02 2012
From: benjamin at (Benjamin Peterson)
Date: Mon, 2 Jan 2012 08:54:02 -0600
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/1 Nick Coghlan <ncoghlan at>:
> I've been having an occasional argument with Benjamin regarding braces
> in 4-line if statements:

Python's C code has been dropping braces long before I ever arrived.
See this beautiful example in dictobject.c, for example:

    if (numfree < PyDict_MAXFREELIST && Py_TYPE(mp) == &PyDict_Type)
        free_list[numfree++] = mp;
    else
        Py_TYPE(mp)->tp_free((PyObject *)mp);

There's even things like this:

    if (ep->me_key == dummy)
        freeslot = ep;
    else {
        if (ep->me_hash == hash && unicode_eq(ep->me_key, key))
            return ep;
        freeslot = NULL;
    }

where I would normally put braces on both statements.

I think claims that it is unmaintainable are exaggerated. (If someone
could cite an example of a bug caused by omitting braces, I'd be interested to
see it.) If I start editing one of the bodies, emacs will dedent, so
that I know I'm back to the containing block. By virtue of being 5
lines long, it's a very easy case to see and fix as you edit it.

I think it's fine Nick raised this. PEP 7 is not very explicit about
braces at all.


From solipsis at  Mon Jan  2 16:04:58 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 2 Jan 2012 16:04:58 +0100
Subject: [Python-Dev] cpython: fix some possible refleaks from
 PyUnicode_READY error conditions
References: <>
Message-ID: <>

On Mon, 02 Jan 2012 16:00:50 +0100
benjamin.peterson <python-checkins at> wrote:
> changeset:   74236:d5cda62d0f8c
> user:        Benjamin Peterson <benjamin at>
> date:        Mon Jan 02 09:00:30 2012 -0600
> summary:
>   fix some possible refleaks from PyUnicode_READY error conditions
> files:
>   Objects/unicodeobject.c |  80 ++++++++++++++++++++--------
>   1 files changed, 56 insertions(+), 24 deletions(-)
> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
> --- a/Objects/unicodeobject.c
> +++ b/Objects/unicodeobject.c
> @@ -9132,10 +9132,15 @@
>      Py_ssize_t len1, len2;
>      str_obj = PyUnicode_FromObject(str);
> -    if (!str_obj || PyUnicode_READY(str_obj) == -1)
> +    if (!str_obj)
>          return -1;
>      sub_obj = PyUnicode_FromObject(substr);
> -    if (!sub_obj || PyUnicode_READY(sub_obj) == -1) {
> +    if (!sub_obj) {
> +        Py_DECREF(str_obj);
> +        return -1;
> +    }
> +    if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) {

Shouldn't the first one be PyUnicode_READY(sub_obj) ?

From benjamin at  Mon Jan  2 16:07:54 2012
From: benjamin at (Benjamin Peterson)
Date: Mon, 2 Jan 2012 09:07:54 -0600
Subject: [Python-Dev] cpython: fix some possible refleaks from
 PyUnicode_READY error conditions
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/2 Antoine Pitrou <solipsis at>:
> On Mon, 02 Jan 2012 16:00:50 +0100
> benjamin.peterson <python-checkins at> wrote:
>> changeset:   74236:d5cda62d0f8c
>> user:        Benjamin Peterson <benjamin at>
>> date:        Mon Jan 02 09:00:30 2012 -0600
>> summary:
>>   fix some possible refleaks from PyUnicode_READY error conditions
>> files:
>>   Objects/unicodeobject.c |  80 ++++++++++++++++++++--------
>>   1 files changed, 56 insertions(+), 24 deletions(-)
>> diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
>> --- a/Objects/unicodeobject.c
>> +++ b/Objects/unicodeobject.c
>> @@ -9132,10 +9132,15 @@
>>      Py_ssize_t len1, len2;
>>      str_obj = PyUnicode_FromObject(str);
>> -    if (!str_obj || PyUnicode_READY(str_obj) == -1)
>> +    if (!str_obj)
>>          return -1;
>>      sub_obj = PyUnicode_FromObject(substr);
>> -    if (!sub_obj || PyUnicode_READY(sub_obj) == -1) {
>> +    if (!sub_obj) {
>> +        Py_DECREF(str_obj);
>> +        return -1;
>> +    }
>> +    if (PyUnicode_READY(substr) == -1 || PyUnicode_READY(str_obj) == -1) {
> Shouldn't the first one be PyUnicode_READY(sub_obj) ?



From lists at  Mon Jan  2 16:18:41 2012
From: lists at (Christian Heimes)
Date: Mon, 02 Jan 2012 16:18:41 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

Am 02.01.2012 06:55, schrieb Paul McMillan:
> I think Ruby uses FNV-1 with a salt, making it less vulnerable to
> this. FNV is otherwise similar to our existing hash function.
> For the record, cryptographically strong hash functions are in the
> neighborhood of 400% slower than our existing hash function.

I've pushed a new patch

The changeset adds the murmur3 hash algorithm with some minor changes,
for example more random seeds. At first I was worried that murmur might
be slower than our old hash algorithm. But in fact it seems to be faster!
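
For orientation, a pure-Python sketch of the reference 32-bit murmur3
function with a caller-supplied seed. It follows the published
algorithm as I understand it and does not include the "minor changes"
mentioned above; it is far slower than a C implementation and is only
meant to show the structure (block mixing, tail handling, final
avalanche).

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Reference-style MurmurHash3 (x86, 32-bit) of `data`."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    nblocks = len(data) // 4
    for b in range(nblocks):
        # Body: consume the input four little-endian bytes at a time.
        k = int.from_bytes(data[4 * b:4 * b + 4], "little")
        h ^= mix_k(k)
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    tail = data[4 * nblocks:]
    if tail:
        # Tail: the remaining 1 to 3 bytes, also little-endian.
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


print(hex(murmur3_32(b"python-dev", seed=0)))
print(hex(murmur3_32(b"python-dev", seed=1)))
```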

Pybench 10 rounds on my Core2 Duo 2.60:

  py3k: 3.230 sec
  randomhash: 3.182 sec


From ajm at  Mon Jan  2 16:34:00 2012
From: ajm at (Anders J. Munch)
Date: Mon, 2 Jan 2012 16:34:00 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdq9mr$bnl$>
References: <><><><><jdo2u1$k0u$><><>
Message-ID: <>

> On 1/1/2012 12:28 PM, Christian Heimes wrote:
> I understood Alexander Klink and Julian Wälde, hashDoS at, as
> saying that they consider that using a random non-zero start value is
> sufficient to make the hash non-vulnerable.

Sufficient against their current attack.  But will it last?  For a
long-running server, there must be plenty of ways information can leak
that will help guessing that start value.

The alternative, to provide a dict-like datastructure for use with
untrusted input, deserves consideration.  Perhaps something simpler
than a balanced tree would do?  How about a dict-like class that is
built on a lazily sorted list?  Insertions basically just do
list.append and set a dirty-flag, and lookups use bisect - sorting
first if the dirty-flag is set.  It wouldn't be complete dict
replacement by any means, mixing insertions and lookups would have
terrible performance, but for something like POST parameters it should
be good enough.
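
A minimal sketch of the idea, assuming keys are mutually comparable
(e.g. all str). The class name and its tiny interface are invented for
illustration; a real replacement would need deletion, iteration and so
on. Insertion appends and sets a dirty flag; the first lookup after a
batch of insertions sorts once and then uses bisect.

```python
import bisect


class LazySortedDict:
    """Dict-like mapping backed by a lazily sorted list (no hashing)."""

    def __init__(self):
        self._items = []    # (key, value) pairs, possibly unsorted
        self._keys = []     # sorted keys, valid only when not dirty
        self._dirty = False

    def __setitem__(self, key, value):
        self._items.append((key, value))
        self._dirty = True

    def _clean(self):
        if not self._dirty:
            return
        # list.sort is stable, so for duplicate keys the latest
        # assignment ends up last and wins in the pass below.
        self._items.sort(key=lambda pair: pair[0])
        deduped = []
        for key, value in self._items:
            if deduped and deduped[-1][0] == key:
                deduped[-1] = (key, value)
            else:
                deduped.append((key, value))
        self._items = deduped
        self._keys = [key for key, _ in deduped]
        self._dirty = False

    def __getitem__(self, key):
        self._clean()
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._items[i][1]
        raise KeyError(key)

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def __len__(self):
        self._clean()
        return len(self._items)


params = LazySortedDict()
params["b"] = 1
params["a"] = 2
params["b"] = 3
print(params["b"], len(params))
```

Filling it with n keys and then reading costs one O(n log n) sort plus
O(log n) comparisons per lookup, whatever the keys hash to, which is the
property that matters for untrusted input.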

I half expected to find something like that on activestate recipes
already, but couldn't find any.

regards, Anders

From lists at  Mon Jan  2 16:47:43 2012
From: lists at (Christian Heimes)
Date: Mon, 02 Jan 2012 16:47:43 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdq9je$bnl$>
References: <>
Message-ID: <>

Am 01.01.2012 19:45, schrieb Terry Reedy:
> On 1/1/2012 10:13 AM, Guido van Rossum wrote:
>> PS. Is the collision-generator used in the attack code open source?
> As I posted before, Alexander Klink and Julian Wälde gave their project 
> email as hashDoS at Since they indicated disappointment in not 
> hearing from Python, I presume they would welcome engagement.

Somebody should contact Alexander and Julian to let them know, that we
are working on the matter. It should be somebody "official" for the
initial contact, too. I've included Guido (BDFL), Barry (their initial
security contact) and MvL (most prominent German core dev) in CC, as
they are the logical choice for me.

I'm willing to have a phone call with them once the contact has been
established. IMHO it's slightly easier to talk in native tongue --
Alexander and Julian are German, too.


From g.brandl at  Mon Jan  2 18:35:48 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 02 Jan 2012 18:35:48 +0100
Subject: [Python-Dev] Code reviews
In-Reply-To: <>
References: <>
Message-ID: <jdsptk$jg2$>

On 01/02/2012 03:41 PM, Antoine Pitrou wrote:
> On Mon, 2 Jan 2012 14:44:49 +1000
> Nick Coghlan <ncoghlan at> wrote:
>> He keeps leaving them out, I occasionally tell him they should always
>> be included (most recently this came up when we gave conflicting
>> advice to a patch contributor).
> Oh, by the way, this is also why I avoid arguing too much about style
> in code reviews. There are two bad things which can happen:
> - your advice conflicts with advice given by another reviewer (perhaps
>   on another issue)
> - the contributor feels drowned under tiresome requests for style
>   fixes ("please indent continuation lines this way")
> Both are potentially demotivating. A contributor can have his/her own
> style if it doesn't adversely affect code quality.

Exactly. Especially for reviews of patches from non-core people, we
should exercise a lot of restraint: as the committers, I think we can be
expected to bite the sour bullet and apply our uniform style (such as
it is).

It is tiresome, if not downright disappointing, to get reviews that
are basically "nothing wrong, but please submit again with one more
empty line between the classes", and definitely not the way to
attract more contributors.


From g.brandl at  Mon Jan  2 18:38:41 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 02 Jan 2012 18:38:41 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<jdq9je$bnl$> <>
Message-ID: <jdsq2v$jg2$>

On 01/02/2012 04:47 PM, Christian Heimes wrote:
> Am 01.01.2012 19:45, schrieb Terry Reedy:
>> On 1/1/2012 10:13 AM, Guido van Rossum wrote:
>>> PS. Is the collision-generator used in the attack code open source?
>> As I posted before, Alexander Klink and Julian Wälde gave their project 
>> email as hashDoS at Since they indicated disappointment in not 
>> hearing from Python, I presume they would welcome engagement.
> Somebody should contact Alexander and Julian to let them know, that we
> are working on the matter. It should be somebody "official" for the
> initial contact, too. I've included Guido (BDFL), Barry (their initial
> security contact) and MvL (most prominent German core dev) in CC, as
> they are the logical choice for me.
> I'm willing to have a phone call with them once the contact has been
> established. IMHO it's slightly easier to talk in native tongue --
> Alexander and Julian are German, too.

I wouldn't expect too much -- they seem rather keen on cheap laughs:!/bk3n/status/152068096448921600/photo/1/large


From guido at  Mon Jan  2 19:29:29 2012
From: guido at (Guido van Rossum)
Date: Mon, 2 Jan 2012 10:29:29 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<jdq9je$bnl$> <>
Message-ID: <>

On Mon, Jan 2, 2012 at 7:47 AM, Christian Heimes <lists at> wrote:

> Am 01.01.2012 19:45, schrieb Terry Reedy:
> > On 1/1/2012 10:13 AM, Guido van Rossum wrote:
> >> PS. Is the collision-generator used in the attack code open source?
> >
> > As I posted before, Alexander Klink and Julian Wälde gave their project
> > email as hashDoS at Since they indicated disappointment in not
> > hearing from Python, I presume they would welcome engagement.
> Somebody should contact Alexander and Julian to let them know, that we
> are working on the matter. It should be somebody "official" for the
> initial contact, too. I've included Guido (BDFL), Barry (their initial
> security contact) and MvL (most prominent German core dev) in CC, as
> they are the logical choice for me.
> I'm willing to have a phone call with them once the contact has been
> established. IMHO it's slightly easier to talk in native tongue --
> Alexander and Julian are German, too.

I'm not sure I see the point -- just give them a link to the python-dev

--Guido van Rossum (

From francismb at  Mon Jan  2 19:26:13 2012
From: francismb at (francis)
Date: Mon, 02 Jan 2012 19:26:13 +0100
Subject: [Python-Dev] Code reviews
In-Reply-To: <jdsptk$jg2$>
References: <>	<>
Message-ID: <>

On 01/02/2012 06:35 PM, Georg Brandl wrote:
> On 01/02/2012 03:41 PM, Antoine Pitrou wrote:
>> On Mon, 2 Jan 2012 14:44:49 +1000
>> Nick Coghlan<ncoghlan at>  wrote:
>>> He keeps leaving them out, I occasionally tell him they should always
>>> be included (most recently this came up when we gave conflicting
>>> advice to a patch contributor).
>> Oh, by the way, this is also why I avoid arguing too much about style
>> in code reviews. There are two bad things which can happen:
>> - your advice conflicts with advice given by another reviewer (perhaps
>>    on another issue)
>> - the contributor feels drowned under tiresome requests for style
>>    fixes ("please indent continuation lines this way")
>> Both are potentially demotivating. A contributor can have his/her own
>> style if it doesn't adversely affect code quality.
> Exactly. Especially for reviews of patches from non-core people, we
> should exercise a lot of restraint: as the committers, I think we can be
> expected to bite the sour bullet and apply our uniform style (such as
> it is).
> It is tiresome, if not downright disappointing, to get reviews that
> are basically "nothing wrong, but please submit again with one more
> empty line between the classes", and definitely not the way to
> attract more contributors.
Hi to all members of this list,
I'm not a Python-Dev (only some very small patches over the core-mentorship
list; just my 2 cents here).

I would try to defuse these conflicts with a script that does the
reformatting itself. If that reformatting were part of the process itself,
do you think it would still be an issue?

PS: I know that there's a pep8 checker, so it could be transformed into a
reformatter, but I don't know if there's a pep7 checker (reformatter).

Best regards!


From brian at  Mon Jan  2 19:41:07 2012
From: brian at (Brian Curtin)
Date: Mon, 2 Jan 2012 12:41:07 -0600
Subject: [Python-Dev] Code reviews
In-Reply-To: <>
References: <>
	<> <jdsptk$jg2$>
Message-ID: <>

On Mon, Jan 2, 2012 at 12:26, francis <francismb at> wrote:
> On 01/02/2012 06:35 PM, Georg Brandl wrote:
>> On 01/02/2012 03:41 PM, Antoine Pitrou wrote:
>>> On Mon, 2 Jan 2012 14:44:49 +1000
>>> Nick Coghlan<ncoghlan at>  wrote:
>>>> He keeps leaving them out, I occasionally tell him they should always
>>>> be included (most recently this came up when we gave conflicting
>>>> advice to a patch contributor).
>>> Oh, by the way, this is also why I avoid arguing too much about style
>>> in code reviews. There are two bad things which can happen:
>>> - your advice conflicts with advice given by another reviewer (perhaps
>>>   on another issue)
>>> - the contributor feels drowned under tiresome requests for style
>>>   fixes ("please indent continuation lines this way")
>>> Both are potentially demotivating. A contributor can have his/her own
>>> style if it doesn't adversely affect code quality.
>> Exactly. Especially for reviews of patches from non-core people, we
>> should exercise a lot of restraint: as the committers, I think we can be
>> expected to bite the sour bullet and apply our uniform style (such as
>> it is).
>> It is tiresome, if not downright disappointing, to get reviews that
>> are basically "nothing wrong, but please submit again with one more
>> empty line between the classes", and definitely not the way to
>> attract more contributors.
> Hi to all member of this list,
> I'm not a Python-Dev (only some very small patches over core-mentorship
> list.
> Just my 2cents here).
> I would try to relax this conflicts with a script that does the reformatting
> itself. If
> that reformatting where part of the process itself do you thing that that
> would
> be an issue anymore?

I don't think this is a problem to the point that it needs to be fixed
via automation. The code I write is the code I build and test, so I'd
rather not have some script that goes in and modifies it to some
accepted format, then have to go through the build/test dance again.

From snaury at  Mon Jan  2 19:53:09 2012
From: snaury at (Alexey Borzenkov)
Date: Mon, 2 Jan 2012 22:53:09 +0400
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes <lists at> wrote:
> Am 02.01.2012 06:55, schrieb Paul McMillan:
>> I think Ruby uses FNV-1 with a salt, making it less vulnerable to
>> this. FNV is otherwise similar to our existing hash function.
>> For the record, cryptographically strong hash functions are in the
>> neighborhood of 400% slower than our existing hash function.
> I've pushed a new patch

It seems for 32-bit version you are using pid for the two constants.
Also, it's unclear why you even need to use a random constant for the
final pass, you already use random constant as an initial h1, and it
should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3,
k4 should be initialized to zero, these are key data, they don't need
to be mixed with anything.

Also, I'm not sure how portable is the always_inline attribute, is it
supported on all compilers and all platforms?

From snaury at  Mon Jan  2 19:57:27 2012
From: snaury at (Alexey Borzenkov)
Date: Mon, 2 Jan 2012 22:57:27 +0400
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 10:53 PM, Alexey Borzenkov <snaury at> wrote:
> On Mon, Jan 2, 2012 at 7:18 PM, Christian Heimes <lists at> wrote:
>> Am 02.01.2012 06:55, schrieb Paul McMillan:
>>> I think Ruby uses FNV-1 with a salt, making it less vulnerable to
>>> this. FNV is otherwise similar to our existing hash function.
>>> For the record, cryptographically strong hash functions are in the
>>> neighborhood of 400% slower than our existing hash function.
>> I've pushed a new patch
> It seems for 32-bit version you are using pid for the two constants.
> Also, it's unclear why you even need to use a random constant for the
> final pass, you already use random constant as an initial h1, and it
> should be enough, no need to use for k1. Same for 128-bit: k1, k2, k3,
> k4 should be initialized to zero, these are key data, they don't need
> to be mixed with anything.

Sorry, sent too soon. What I mean is that you're initializing a pretty
big array of values when you only need a 32-bit value. Pid, in my
opinion might be too predictable, it would be a lot better to simply
hash pid and gettimeofday bytes to produce this single 32-bit value
and use it for h1, h2, h3 and h4 in both 32-bit and 128-bit versions.
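
What I have in mind, roughly, as a sketch. The helper names are made
up, and SHA-256 is just a convenient stand-in for "some hash of the pid
and time bytes"; nothing here is taken from the patch.

```python
import hashlib
import os
import struct
import time


def seed_from(pid: int, timestamp: float) -> int:
    """Fold a pid and a timestamp into a single 32-bit value."""
    material = struct.pack("<Qd", pid, timestamp)
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:4], "little")


def derive_seed32() -> int:
    """Seed for the current process, computed once at startup."""
    return seed_from(os.getpid(), time.time())


seed = derive_seed32()
print(hex(seed))
```

Where the platform offers it, reading four bytes from os.urandom() would
give a value that does not depend on guessable inputs at all.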

> Also, I'm not sure how portable is the always_inline attribute, is it
> supported on all compilers and all platforms?

From tseaver at  Mon Jan  2 21:25:00 2012
From: tseaver at (Tres Seaver)
Date: Mon, 02 Jan 2012 15:25:00 -0500
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <jdt3qr$lql$>


On 01/02/2012 01:02 AM, Nick Coghlan wrote:
> On Mon, Jan 2, 2012 at 3:04 PM, Scott Dial 
> <scott+python-dev at> wrote:
>> On 1/1/2012 11:44 PM, Nick Coghlan wrote:
>>> I think it's a recipe for future maintenance hassles when someone
>>> adds a second statement to one of the clauses but doesn't add the
>>> braces. (The only time I consider it reasonable to leave out the
>>> braces is for one liner if statements, where there's no else
>>> clause at all)
>> Could you explain how these two cases differ with regard to
>> maintenance?
> Sure: always including K&R style braces for compound statements (even 
> when they aren't technically necessary) means that indentation == 
> control flow, just like Python. Indent your additions correctly, and 
> the reader and compiler will agree on what they mean:
>   if (cond) {
>     statement;
>   } else {
>     statement;
>     addition;  /* Reader and compiler agree this is part of the else clause */
>   }
>
>   if (cond)
>     statement;
>   else
>     statement;
>     addition;  /* Uh-oh, should have added braces */
> I've been trying to convince Benjamin that there's a reason "always 
> include the braces" is accepted wisdom amongst many veteran C 
> programmers (with some allowing an exception for one-liners), but he 
> isn't believing me, and I'm not going to go through and edit every 
> single one of his commits to add them.

FWIW, +1 to mandating braces-always (even for one liners):  the future
maintenance burden isn't worth the trouble of the exception.  In the days
when I did C / C++ / Java coding as my main gig, braceless code was
routinely a bug magnet *for the team*.

-- 
Tres Seaver          +1 540-429-0999          tseaver at
Palladion Software   "Excellence by Design"


From benjamin at  Mon Jan  2 21:31:57 2012
From: benjamin at (Benjamin Peterson)
Date: Mon, 2 Jan 2012 14:31:57 -0600
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/1 Nick Coghlan <ncoghlan at>:
>  if (cond) {
>    statement;
>  } else {
>    statement;
>  }

I might add that assuming you have braces, PEP 7 would want you to format it as

if (cond) {
    statement;
}
else {
    statement;
}


From julien at  Mon Jan  2 22:02:28 2012
From: julien at (julien tayon)
Date: Mon, 2 Jan 2012 22:02:28 +0100
Subject: [Python-Dev] Code reviews
In-Reply-To: <>
References: <>
	<> <jdsptk$jg2$>
Message-ID: <>

Like indent ?

> I don't think this is a problem to the point that it needs to be fixed
> via automation. The code I write is the code I build and test, so I'd
> rather not have some script that goes in and modifies it to some
> accepted format, then have to go through the build/test dance again.

Well, it gets in the way of committing, since it adds insignificant
changes and therefore bloats the diffs.
But as far as I can tell from having used it a long time ago, it did not
break anything; it was pretty reliable.

my 2c * 0.1

From jimjjewett at  Mon Jan  2 22:07:59 2012
From: jimjjewett at (Jim Jewett)
Date: Mon, 2 Jan 2012 16:07:59 -0500
Subject: [Python-Dev] That depends on what the meaning of "is" is (was
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby <pje at> wrote:
> On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett <jimjjewett at> wrote:
>> Given the wording requiring a real dictionary, I would have assumed
>> that it was OK (if perhaps not sensible) to do pointer arithmetic and
>> access the keys/values/hashes directly.  (Though if the breakage was
>> between python versions, I would feel guilty about griping too
>> loudly.)

> If you're going to be a language lawyer about it, I would simply point out
> that all the spec requires is that "type(env) is dict" -- it says nothing
> about how Python defines "type" or "is" or "dict".  So, you're on your own
> with that one. ;-)

But the public header file < >
defines the typedef structs for PyDictEntry and _dictobject.

What is the purpose of requiring a "real dict" without also
promising what the header file promises?


From larry at  Mon Jan  2 22:50:32 2012
From: larry at (Larry Hastings)
Date: Mon, 02 Jan 2012 13:50:32 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 01/02/2012 12:47 AM, Raymond Hettinger wrote:
> Really?  Do we need to have a brace war?
> People have different preferences.
> The standard library includes some of both styles
> depending on what the maintainer thought was cleanest to their eyes in a given context.

I'm with Raymond.  Code should be readable, and code reviews are the 
best way to achieve that--not endlessly specific formatting rules.

Have there been bugs in CPython that the proposed new PEP 7 rule would 
have prevented?


From guido at  Mon Jan  2 23:08:17 2012
From: guido at (Guido van Rossum)
Date: Mon, 2 Jan 2012 14:08:17 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 1:50 PM, Larry Hastings <larry at> wrote:

> On 01/02/2012 12:47 AM, Raymond Hettinger wrote:
>> Really?  Do we need to have a brace war?
>> People have different preferences.
>> The standard library includes some of both styles
>> depending on what the maintainer thought was cleanest to their eyes in a
>> given context.
> I'm with Raymond.  Code should be readable, and code reviews are the best
> way to achieve that--not endlessly specific formatting rules.
> Have there been bugs in CPython that the proposed new PEP 7 rule would
> have prevented?

The irony is that style guides exist to *avoid* debates like this. Yes, the
choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the
rules. But still, once a style rule has been set, the idea is to stop
debating and just code.

--Guido van Rossum (

From timothy.c.delaney at  Mon Jan  2 23:09:49 2012
From: timothy.c.delaney at (Tim Delaney)
Date: Tue, 3 Jan 2012 09:09:49 +1100
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 3 January 2012 08:50, Larry Hastings <larry at> wrote:

> On 01/02/2012 12:47 AM, Raymond Hettinger wrote:
>> Really?  Do we need to have a brace war?
>> People have different preferences.
>> The standard library includes some of both styles
>> depending on what the maintainer thought was cleanest to their eyes in a
>> given context.
> I'm with Raymond.  Code should be readable, and code reviews are the best
> way to achieve that--not endlessly specific formatting rules.
> Have there been bugs in CPython that the proposed new PEP 7 rule would
> have prevented?

I've found that until someone has experienced multiple nasty bugs caused by
not always using braces, it's nearly impossible to convince them of why you
should. Afterwards it's impossible to convince them (me) that you shouldn't
always use braces.

I'd also point out that if you're expecting braces, not having them can
make the code less readable. A consistent format tends to make for more
readable code.


Tim Delaney

From francisco.martin at  Mon Jan  2 23:13:27 2012
From: francisco.martin at (Francisco Martin Brugue)
Date: Mon, 02 Jan 2012 23:13:27 +0100
Subject: [Python-Dev] Code reviews
In-Reply-To: <>
References: <>	<>
	<jdsptk$jg2$>	<>	<>
Message-ID: <>

On 01/02/2012 10:02 PM, julien tayon wrote:
> @francis
> Like indent ?
Thank you, I wasn't aware of this one !

From raymond.hettinger at  Mon Jan  2 23:32:14 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 2 Jan 2012 14:32:14 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 2, 2012, at 12:31 PM, Benjamin Peterson wrote:

> I might add that assuming you have braces, PEP 7 would want you to format it as
> if (cond) {
>    statement;
> }
> else {
>    more_stuff;
> }

Running  ``grep -B1 else Objects/*c`` shows that we've happily lived with a mixture of styles for a very long time.
ISTM, our committers have had good instincts about when braces add clarity and when they add clutter.
If Nick pushes through an always-use-braces mandate, A LOT of code will need to be changed.



From raymond.hettinger at  Mon Jan  2 23:55:59 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 2 Jan 2012 14:55:59 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote:

> I'd also point out that if you're expecting braces, not having them can make the code less readable. 

If a programmer's mind explodes when they look at the simple and beautiful
examples in K&R's The C Programming Language, then they've got problems
that can't be solved by braces ;-)


From ned at  Tue Jan  3 00:08:15 2012
From: ned at (Ned Batchelder)
Date: Mon, 02 Jan 2012 18:08:15 -0500
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/2/2012 5:32 PM, Raymond Hettinger wrote:
> Running  ``grep -B1 else Objects/*c`` shows that we've happily lived 
> with a mixture of styles for a very long time.
> ISTM, our committers have had good instincts about when braces add 
> clarity and when they add clutter.
> If Nick pushes through an always-use-braces mandate, A LOT of code 
> will need to be changed.
I'm sure we can agree that 1) Nick isn't "pushing through" anything, 
this is a discussion about what to do, and 2) even if we agree to change 
PEP 7, no one would advocate having to go through all the C code to 
change it to a newly-agreed style.

> Raymond
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

From pje at  Tue Jan  3 01:16:15 2012
From: pje at (PJ Eby)
Date: Mon, 2 Jan 2012 19:16:15 -0500
Subject: [Python-Dev] That depends on what the meaning of "is" is (was
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett <jimjjewett at> wrote:

> On Mon, Jan 2, 2012 at 1:16 AM, PJ Eby <pje at> wrote:
> > On Sun, Jan 1, 2012 at 10:28 PM, Jim Jewett <jimjjewett at>
> wrote:
> >>
> >> Given the wording requiring a real dictionary, I would have assumed
> >> that it was OK (if perhaps not sensible) to do pointer arithmetic and
> >> access the keys/values/hashes directly.  (Though if the breakage was
> >> between python versions, I would feel guilty about griping too
> >> loudly.)
> > If you're going to be a language lawyer about it, I would simply point
> out
> > that all the spec requires is that "type(env) is dict" -- it says nothing
> > about how Python defines "type" or "is" or "dict".  So, you're on your
> own
> > with that one. ;-)
> But the public header file <
> >
> defines the typedef structs for PyDictEntry and _dictobject.
> What is the purpose of the requiring a "real dict" without also
> promising what the header file promises?
Er, just because it's in the .h doesn't mean it's in the public API.  But
in any event, if you're actually serious about this, I'd just point out that:

1. The struct layout doesn't guarantee anything about insertion or lookup
algorithms,
2. If the data structure were changed, the header file would obviously
change as well, and
3. ISTM that Python does not even promise inter-version ABI compatibility
for internals like the dict object layout.

Are you seriously writing code that relies on the C structure layout of
dicts?  Because really, that was SO not the point of the dict type
requirement.  It was so that you could use Python's low-level *API* calls,
not muck about with the data structure directly.  I'm occasionally
considered notorious for abusing Python internals, but even I have to draw
the line somewhere.  ;-)

From ncoghlan at  Tue Jan  3 01:22:28 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 3 Jan 2012 10:22:28 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 3, 2012 at 12:54 AM, Benjamin Peterson <benjamin at> wrote:
> I think it's fine Nick raised this. PEP 7 is not very explicit about
> braces at all.

I actually discovered in this thread that I've been misreading PEP 7
for going on 7 years now - I thought the brace usage example *did* use
"} else {" (mainly because I write my if statements that way, and
nobody had ever pointed out to me that the C style guide actually says
otherwise).

So I'm happy enough with leaving PEP 7 alone and letting the stylistic
inconsistencies stand (even going forward). I agree in these days of
auto-indenting editors and automated test suites, the maintenance
benefits of always requiring the braces are significantly less than
they used to be.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Tue Jan  3 01:27:11 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 3 Jan 2012 10:27:11 +1000
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger
<raymond.hettinger at> wrote:
> Running  ``grep -B1 else Objects/*c`` shows that we've happily lived with a
> mixture of styles for a very long time.
> ISTM, our committers have had good instincts about when braces add clarity
> and when they add clutter.
> If Nick pushes through an always-use-braces mandate, A LOT of code will need
> to be changed.

Nah, I was asking a genuine question, not pushing anything in
particular. I *thought* the code base was more consistent than it is,
but it turns out that was an error of perception on my part, rather
than an objective fact.

With my perception of the status quo corrected, I can stop worrying
about preserving a non-existent consistency.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From raymond.hettinger at  Tue Jan  3 01:47:48 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 2 Jan 2012 16:47:48 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 2, 2012, at 4:27 PM, Nick Coghlan wrote:

> With my perception of the status quo corrected, I can stop worrying
> about preserving a non-existent consistency.




From timothy.c.delaney at  Tue Jan  3 01:53:06 2012
From: timothy.c.delaney at (Tim Delaney)
Date: Tue, 3 Jan 2012 11:53:06 +1100
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On 3 January 2012 09:55, Raymond Hettinger <raymond.hettinger at> wrote:

> On Jan 2, 2012, at 2:09 PM, Tim Delaney wrote:
> I'd also point out that if you're expecting braces, not having them can
> make the code less readable.
> If a programmer's mind explodes when they look at the simple and beautiful
> examples in K&R's The C Programming Language, then they've got problems
> that can't be solved by braces ;-)

Now that's just hyperbole ;)

If you've got a mix of braces and non-braces in a chunk of code, it's very
easy for the mind to skip over the non-brace blocks as not being blocks. I
know it's not something I'm likely to mess up when reading the code
in-depth, but if I'm skimming over trying to understand the gist of the
code or looking for what should be an obvious bug, a block that's not
brace-delimited is more likely to be missed than one that is (when amongst
other blocks that are).

If we had the option of "just use indentation" in C I'd advocate that.
Failing that, I find that consistent usage of braces is preferable.


Tim Delaney

From guido at  Tue Jan  3 01:54:42 2012
From: guido at (Guido van Rossum)
Date: Mon, 2 Jan 2012 16:54:42 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 4:27 PM, Nick Coghlan <ncoghlan at> wrote:

> On Tue, Jan 3, 2012 at 8:32 AM, Raymond Hettinger
> <raymond.hettinger at> wrote:
> > Running  ``grep -B1 else Objects/*c`` shows that we've happily lived
> with a
> > mixture of styles for a very long time.
> > ISTM, our committers have had good instincts about when braces add
> clarity
> > and when they add clutter.
> > If Nick pushes through an always-use-braces mandate, A LOT of code will
> need
> > to be changed.
> Nah, I was asking a genuine question, not pushing anything in
> particular. I *thought* the code base was more consistent than it is,
> but it turns out that was an error of perception on my part, rather
> than an objective fact.
> With my perception of the status quo corrected, I can stop worrying
> about preserving a non-existent consistency.

Amen. And, as the (nominal) author of the PEP, the PEP didn't mean to state
an opinion on whether braces are mandatory. It only meant to state how they
should be placed when they are there. It's true that there are other
readings possible, but that's what I meant. If someone wants to change the
wording to clarify this, go right ahead.

--Guido van Rossum (

From barry at  Tue Jan  3 04:20:09 2012
From: barry at (Barry Warsaw)
Date: Mon, 2 Jan 2012 22:20:09 -0500
Subject: [Python-Dev] Code reviews
In-Reply-To: <jdsptk$jg2$>
References: <>
	<> <jdsptk$jg2$>
Message-ID: <>

On Jan 02, 2012, at 06:35 PM, Georg Brandl wrote:

>Exactly. Especially for reviews of patches from non-core people, we
>should exercise a lot of restraint: as the committers, I think we can be
>expected to bite the sour bullet and apply our uniform style (such as
>it is).
>It is tiresome, if not downright disappointing, to get reviews that
>are basically "nothing wrong, but please submit again with one more
>empty line between the classes", and definitely not the way to
>attract more contributors.

I think it's fine in a code review to point out where the submission misses
the important consistency points, but not to hold up merging the changes
because of that.  You want to educate and motivate so that the next submission
comes closer to our standards.  The core dev who commits the change can clean
up style issues.


P.S. +1 for the change to PEP 7.

From barry at  Tue Jan  3 04:25:55 2012
From: barry at (Barry Warsaw)
Date: Mon, 2 Jan 2012 22:25:55 -0500
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 02, 2012, at 02:08 PM, Guido van Rossum wrote:

>The irony is that style guides exist to *avoid* debates like this. Yes, the
>choices are arbitrary. Yes, tastes differ. Yes, there are exceptions to the
>rules. But still, once a style rule has been set, the idea is to stop
>debating and just code.


The other reason why style guides exist is to give contributors some sense of
what they should shoot for.  I've worked on existing code bases where there's
so little consistency I can't tell what the author's preferences are even if I
wanted to adhere to them.


From barry at  Tue Jan  3 04:36:01 2012
From: barry at (Barry Warsaw)
Date: Mon, 2 Jan 2012 22:36:01 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jdsq2v$jg2$>
References: <>
	<jdq9je$bnl$> <>
Message-ID: <>

On Jan 02, 2012, at 06:38 PM, Georg Brandl wrote:

>I wouldn't expect too much -- they seem rather keen on cheap laughs:

Heh, so yeah, it won't be me contacting them.


From martin at  Tue Jan  3 09:44:03 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 03 Jan 2012 09:44:03 +0100
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

> He keeps leaving them out, I occasionally tell him they should always
> be included (most recently this came up when we gave conflicting
> advice to a patch contributor). He says what he's doing is OK, because
> he doesn't consider the example in PEP 7 as explicitly disallowing it,
> I think it's a recipe for future maintenance hassles when someone adds
> a second statement to one of the clauses but doesn't add the braces.
> (The only time I consider it reasonable to leave out the braces is for
> one liner if statements, where there's no else clause at all)

While this appears to be settled, I'd like to add that I sided with
Benjamin here all along.

With Python, I accepted a style of "minimal punctuation". Examples
of extra punctuation are:
- parens around expression in Python's if (and while):

    if (x < 10):
      foo ()

- parens around return expression (C and Python)

    return(*p);

- braces around single-statement blocks in C

In all these cases, punctuation can be left out without changing
the meaning of the program.

I personally think that a policy requiring braces would be (mildly)
harmful, as it decreases readability of the code. When I read code,
I read every character: not just the identifiers, but also every
punctuation character. If there is extra punctuation, I stop and wonder
what the motivation for the punctuation is - is there any hidden
meaning that required the author to put the punctuation?

There is a single case where I can accept extra punctuation in C:
to make the operator precedence explicit. Many people (including
myself) don't know how

   a | b << *c * *d

would group, so I readily accept extra parens as a clarification.
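
For readers who, like the author, would rather not guess: under the standard C precedence rules unary `*` binds tightest, then multiplication, then `<<`, then `|`. The sketch below spells that out; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* The expression exactly as written in the message. */
unsigned terse(unsigned a, unsigned b, const unsigned *c, const unsigned *d)
{
    return a | b << *c * *d;
}

/* The same expression with the grouping made explicit:
   dereference, then multiply, then shift, then bitwise OR. */
unsigned explicit_grouping(unsigned a, unsigned b,
                           const unsigned *c, const unsigned *d)
{
    assert(c != 0 && d != 0);
    return a | (b << ((*c) * (*d)));
}

/* A plausible misreading (shift by *c first, multiply last),
   which yields a different value. */
unsigned misread(unsigned a, unsigned b, const unsigned *c, const unsigned *d)
{
    return (a | (b << *c)) * *d;
}
```

With a = 1, b = 1, *c = 2 and *d = 3, the first two functions both return 65 (1 | 1 << 6), while the misreading returns 15.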

Wrt. braces, I don't share the concern that there is a risk of
somebody being confused when adding a second statement to a braceless
block. An actual risk is stuff like

   if (cond)
     MACRO(argument);

when MACRO expands to multiple statements. However, we should
accept that this is a bug in MACRO (which should have used the
do-while(0)-idiom), not in the application of the macro.
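
A minimal sketch of the macro hazard described above, assuming two made-up macros (BAD_RESET and GOOD_RESET are hypothetical names, not CPython macros). The first expands to two statements, so only the first one is guarded by a braceless `if`; the second wraps its body in the do-while(0) idiom and behaves as a single statement. Some compilers warn about the first form, which is rather the point.

```c
#include <assert.h>

/* Hypothetical macros for illustration only. */
#define BAD_RESET(a, b)   (a) = 0; (b) = 0
#define GOOD_RESET(a, b)  do { (a) = 0; (b) = 0; } while (0)

/* Returns the final value of b after a braceless if using BAD_RESET. */
int bad_reset_b(int cond)
{
    int a = 1, b = 2;
    if (cond)
        BAD_RESET(a, b);    /* expands to: if (cond) a = 0; b = 0; */
    assert(a == (cond ? 0 : 1));
    return b;               /* always 0: b = 0 escaped the if */
}

/* Returns the final value of b after a braceless if using GOOD_RESET. */
int good_reset_b(int cond)
{
    int a = 1, b = 2;
    if (cond)
        GOOD_RESET(a, b);   /* do { ... } while (0) is one statement */
    assert(a == (cond ? 0 : 1));
    return b;               /* 0 if cond is true, otherwise still 2 */
}
```

Calling both with a false condition shows the difference: bad_reset_b(0) returns 0 even though the condition failed, while good_reset_b(0) returns 2.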


From lists at  Tue Jan  3 14:18:34 2012
From: lists at (Christian Heimes)
Date: Tue, 03 Jan 2012 14:18:34 +0100
Subject: [Python-Dev] RNG in the core
Message-ID: <>


all proposed fixes for a randomized hashing function rise and fall with
a good random number generator to feed the random seed. The seed must be
created very early in the startup phase of the interpreter, preferably
before the basic types are initialized. CPython already has multiple
sources for random data (win32_urandom in Modules/posixmodule.c, urandom
in Lib/, Mersenne twister in Modules/_randommodule.c). However we
can't use them because they are wrapped inside Python modules which
require infrastructure like initialized base types.

I propose an addition to the current Python C API:

int PyOS_URandom(char *buf, Py_ssize_t len)

Read "len" chars from the OS's RNG into the pre-allocated buffer "buf".
The RNG should be suitable for cryptography. In case of an error the
function returns -1 and sets an exception, otherwise it returns 0.
On Windows I can re-use most of the code of win32_urandom(). For POSIX I
have to implement os.urandom() in C in order to read data from
/dev/urandom. That's simple and straightforward.
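
As a rough idea of what the POSIX side could look like, here is a sketch that reads bytes from /dev/urandom with plain system calls. It is not the proposed PyOS_URandom(): the name read_urandom is hypothetical, it takes a size_t rather than Py_ssize_t, and it sets no Python exception because it does not touch the interpreter at all.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with len bytes from /dev/urandom.
   Returns 0 on success and -1 on any failure. */
int read_urandom(unsigned char *buf, size_t len)
{
    size_t done = 0;
    int fd;

    assert(buf != NULL);
    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;                  /* device missing or not readable */
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n <= 0) {
            close(fd);
            return -1;              /* read error or unexpected EOF */
        }
        done += (size_t)n;          /* short reads are allowed; keep going */
    }
    close(fd);
    return 0;
}
```

The loop matters because read() may return fewer bytes than requested; a caller that needs a fallback can treat the -1 return as "no OS RNG available".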

Since some platforms may not have /dev/urandom, we need a PRNG in the
core, too. I therefore propose to move the Mersenne twister from
randommodule.c into the core, too.

typedef struct {
    unsigned long state[N];
    int index;
} _Py_MT_RandomState;

unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); // genrand_int32()
double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random()
void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed); // init_genrand()
void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long init_key[],
                              unsigned long key_length); // init_by_array()

I suggest Python/random.c as source file and Python/pyrandom.h as header
file. Comments?


From anacrolix at  Tue Jan  3 15:46:51 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 4 Jan 2012 01:46:51 +1100
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

FWIW I'm against forcing braces to be used. Readability is the highest
concern, and this should be at the discretion of the contributor. A
code formatting tool or compiler extension is the only proper way to
handle this, and neither is in use or available.

On Tue, Jan 3, 2012 at 7:44 PM, "Martin v. Löwis" <martin at> wrote:
>> He keeps leaving them out, I occasionally tell him they should always
>> be included (most recently this came up when we gave conflicting
>> advice to a patch contributor). He says what he's doing is OK, because
>> he doesn't consider the example in PEP 7 as explicitly disallowing it,
>> I think it's a recipe for future maintenance hassles when someone adds
>> a second statement to one of the clauses but doesn't add the braces.
>> (The only time I consider it reasonable to leave out the braces is for
>> one liner if statements, where there's no else clause at all)
> While this appears to be settled, I'd like to add that I sided with
> Benjamin here all along.
> With Python, I accepted a style of "minimal punctuation". Examples
> of extra punctuation are:
> - parens around expression in Python's if (and while):
>    if (x < 10):
>      foo ()
> - parens around return expression (C and Python)
>    return(*p);
> - braces around single-statement blocks in C
> In all these cases, punctuation can be left out without changing
> the meaning of the program.
> I personally think that a policy requiring braces would be (mildly)
> harmful, as it decreases readability of the code. When I read code,
> I read every character: not just the identifiers, but also every
> punctuation character. If there is extra punctuation, I stop and wonder
> what the motivation for the punctuation is - is there any hidden
> meaning that required the author to put the punctuation?
> There is a single case where I can accept extra punctuation in C:
> to make the operator precedence explicit. Many people (including
> myself) don't know how
>   a | b << *c * *d
> would group, so I readily accept extra parens as a clarification.
> Wrt. braces, I don't share the concern that there is a risk of
> somebody being confused when adding a second statement to a braceless
> block. An actual risk is stuff like
>   if (cond)
>     MACRO(argument);
> when MACRO expands to multiple statements. However, we should
> accept that this is a bug in MACRO (which should have used the
> do-while(0)-idiom), not in the application of the macro.
> Regards,
> Martin


From stephen at  Tue Jan  3 16:46:22 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 04 Jan 2012 00:46:22 +0900
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

Matt Joiner writes:

 > Readability is the highest concern, and this should be at the
 > discretion of the contributor.

That's quite backwards.  "Readability" is community property, and has
as much, if not more, to do with common convention as with some
absolute metric.  The "contributor's discretion" must yield.

That doesn't mean the contributor has to do all the work; as several
people have pointed out, it makes a lot of sense for experienced
reviewers to make such trivial changes themselves before committing,
especially for new contributors.

From matthieu.brucher at  Tue Jan  3 18:23:08 2012
From: matthieu.brucher at (Matthieu Brucher)
Date: Tue, 3 Jan 2012 18:23:08 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
Message-ID: <>


I'm not a core Python developer, but it may be interesting to use a real
Crush resistant RNG, as one from Random123 (a parallel random generator
that is Crush resistant, contrary to the Mersenne Twister, and without a
state).


Matthieu Brucher

2012/1/3 Christian Heimes <lists at>

> Hello,
> all proposed fixes for a randomized hashing function raise and fall with
> a good random number generator to feed the random seed. The seed must be
> created very early in the startup phase of the interpreter, preferable
> before the basic types are initialized. CPython already have multiple
> sources for random data (win32_urandom in Modules/posixmodule.c, urandom
> in Lib/, Mersenne twister in Modules/_randommodule.c). However we
> can't use them because they are wrapped inside Python modules which
> require infrastructure like initialized base types.
> I propose an addition to the current Python C API:
> int PyOS_URandom(char *buf, Py_ssize_t len)
> Read "len" chars from the OS's RNG into the pre-allocated buffer "buf".
> The RNG should be suitable for cryptography. In case of an error the
> function returns -1 and sets an exception, otherwise it returns 0.
> On Windows I can re-use most of the code of win32_urandom(). For POSIX I
> have to implement os.urandom() in C in order to read data from
> /dev/urandom. That's simple and straight forward.
> Since some platforms may not have /dev/urandom, we need a PRNG in the
> core, too. I therefore propose to move the Mersenne twister from
> randommodule.c into the core, too.
> typedef struct {
>    unsigned long state[N];
>    int index;
> } _Py_MT_RandomState;
> unsigned long _Py_MT_GenRand_Int32(_Py_MT_RandomState *state); //
> genrand_int32()
> double _Py_MT_GenRand_Res53(_Py_MT_RandomState *state); // random_random()
> void _Py_MT_GenRand_Init(_Py_MT_RandomState *state, unsigned long seed);
> // init_genrand()
> void _Py_MT_GenRand_InitArray(_Py_MT_RandomState *state, unsigned long
> init_key[], unsigned long key_length); // init_by_array
> I suggest Python/random.c as source file and Python/pyrandom.h as header
> file. Comments?
> Christian

Information System Engineer, Ph.D.

From solipsis at  Tue Jan  3 18:46:05 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 3 Jan 2012 18:46:05 +0100
Subject: [Python-Dev] RNG in the core
References: <>
Message-ID: <>

On Tue, 03 Jan 2012 14:18:34 +0100
Christian Heimes <lists at> wrote:
> I suggest Python/random.c as source file and Python/pyrandom.h as header
> file. Comments?

Looks good in principle. The API names for MT are a bit ugly.

> The RNG should be suitable for cryptography.

Sounds like too strong a requirement. For cryptography, we have the ssl
module (and third-party libraries).
(also, "suitable for cryptography" is somewhat vague; for example, the
Linux man pages insist that /dev/urandom is ok for session keys
but /dev/random is needed for long-lived private keys)



From lists at  Tue Jan  3 18:50:44 2012
From: lists at (Christian Heimes)
Date: Tue, 03 Jan 2012 18:50:44 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
Message-ID: <>

Am 03.01.2012 18:23, schrieb Matthieu Brucher:
> Hi,
> I'm not a core Python developer, but it may be interesting to use a real
> Crush resistant RNG, as one from Random123 (a parallel random generator
> that is Crush resistant, contrary to the Mersenne Twister, and without a
> state).

Hello Matthieu,

thanks for your input!

The core RNG is going to be part of the randomized hashing function
patch. The patch will be applied to all Python versions from 2.6 to 3.3.
Some people may want to apply it to 2.4 and 2.5, too. As the patch is
going to affect six to eight Python versions, it should introduce as
little new code as possible. Any new code might be a source of new bugs. The
Mersenne Twister code is mature and works sufficiently as backup.

Any new RNG should go through a PEP process, too. You are welcome to
write a PEP and implement an additional RNG for the random module. New
developers and new ideas are well received.


From ethan at  Tue Jan  3 19:42:43 2012
From: ethan at (Ethan Furman)
Date: Tue, 03 Jan 2012 10:42:43 -0800
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Matt Joiner writes:
>  > Readability is the highest concern, and this should be at the
>  > discretion of the contributor.
> That's quite backwards.  "Readability" is community property, and has
> as much, if not more, to do with common convention as with some
> absolute metric.  The "contributor's discretion" must yield.

Readability also includes more than just the source code; as has already 
been stated:

  if(cond) {
     stmt1;
+    stmt2;
  }

vs.

-if(cond)
-  stmt1;
+if(cond) {
+  stmt1;
+  stmt2;
+}

I find the diff version that already had braces in place much more readable.


From barry at  Tue Jan  3 20:38:38 2012
From: barry at (Barry Warsaw)
Date: Tue, 3 Jan 2012 14:38:38 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Dec 31, 2011, at 04:56 PM, Guido van Rossum wrote:

>Is there a tracker issue yet? The discussion should probably move there.

I think the answer to this was "no"... until now.

Proposed patches should be linked to this issue now.  Please nosy yourself if
you want to follow the progress.


From jimjjewett at  Tue Jan  3 20:55:32 2012
From: jimjjewett at (Jim Jewett)
Date: Tue, 3 Jan 2012 14:55:32 -0500
Subject: [Python-Dev] That depends on what the meaning of "is" is (was
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 2, 2012 at 7:16 PM, PJ Eby <pje at> wrote:
> On Mon, Jan 2, 2012 at 4:07 PM, Jim Jewett <jimjjewett at> wrote:

>> But the public header file <
>> >
>> defines the typedef structs for PyDictEntry and _dictobject.

>> What is the purpose of the requiring a "real dict" without also
>> promising what the header file promises?

> Er, just because it's in the .h doesn't mean it's in the public API.  But in
> any event, if you're actually serious about this, I'd just point out that:

> 1. The struct layout doesn't guarantee anything about insertion or lookup
> algorithms,

My concern was about your suggestion of changing the data structure to
accommodate some other algorithm -- particularly if it meant that  the
data would no longer be stored entirely in an array of PyDictEntry.

That shouldn't be done lightly even between major versions, and
certainly should not be done in a bugfix (or security-only) release.

> Are you seriously writing code that relies on the C structure layout of
> dicts?

The first page of search results for PyDictEntry suggested that others
are.  (The code I found did seem to be for getting data from a python
dict into some other language, rather than for wsgi.)

>  Because really, that was SO not the point of the dict type
> requirement.  It was so that you could use Python's low-level *API* calls,
> not muck about with the data structure directly.

Would it be too late to clarify that in the PEP itself?


From steve at  Tue Jan  3 21:29:10 2012
From: steve at (Steven D'Aprano)
Date: Wed, 04 Jan 2012 07:29:10 +1100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
Message-ID: <>

Christian Heimes wrote:
> I propose an addition to the current Python C API:
> int PyOS_URandom(char *buf, Py_ssize_t len)
> Read "len" chars from the OS's RNG into the pre-allocated buffer "buf".
> The RNG should be suitable for cryptography.

> Since some platforms may not have /dev/urandom, we need a PRNG in the
> core, too. I therefore propose to move the Mersenne twister from
> randommodule.c into the core, too.

Mersenne twister is not suitable for cryptography.


From matthieu.brucher at  Tue Jan  3 22:00:43 2012
From: matthieu.brucher at (Matthieu Brucher)
Date: Tue, 3 Jan 2012 22:00:43 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
Message-ID: <>

> The core RNG is going to be part of the randomized hashing function
> patch. The patch will be applied to all Python versions from 2.6 to 3.3.
> Some people may want to apply it to 2.4 and 2.5, too. As the patch is
> going to affect six to eight Python versions, it should introduce as
> little new code as possible. Any new code might be a source of new bugs. The
> Mersenne Twister code is mature and works sufficiently as backup.
> Any new RNG should go through a PEP process, too. You are welcome to
> write a PEP and implement an additional RNG for the random module. New
> developers and new ideas are well received.

Good point.
In fact, these RNGs are 100% based on the hash functions provided for
instance by OpenSSL. But I think this library is not a dependency so my
proposal still has the same impact.
The Random123 library is a reimplementation of some cryptographic functions
with two arguments, the key and the counter, and that's it. So if there is
somewhere in the Python C code such cryptographic function, it can be
reused to create Crush-resistant random numbers with no new code line.


Information System Engineer, Ph.D.

From victor.stinner at  Tue Jan  3 22:17:06 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 3 Jan 2012 22:17:06 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
Message-ID: <>

A randomized hash doesn't need a cryptographic RNG (which is slow and
needs a lot of new code), and the new hash function should maybe not be
cryptographic. We need to make the DoS more expensive for the
attacker, but we don't need to add "too much security" for that.

Mersenne Twister is useless here: it is only needed when you need a
fast RNG to generate megabytes of random data, whereas we
will not need more than 4 KB. The OS RNG is just fine (fast enough and
not blocking).

So we can use the Windows CryptGenRandom API (which is already implemented
in Python, win32_urandom) and /dev/urandom on UNIX/BSD. /dev/urandom
never blocks. We also need a fallback if /dev/urandom is not available.
Because this case should not occur on a modern OS, the fallback can be a
weak function, like something combining getpid(), gettimeofday(),
the address of the stack, etc. To generate 4 KB from a few words, we can use
a very simple LCG (x(n+1) = (x(n) * a + c) mod k).
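
A minimal sketch of that fallback, for illustration only. The function
names (weak_seed, lcg_bytes, hash_secret) are hypothetical, and the
constants a and c are the widely published "Numerical Recipes" LCG
values, not something proposed in this thread. The id() of a fresh
object stands in for "address of the stack", which pure Python cannot
read directly.

```python
import os
import time


def weak_seed():
    """Mix a few cheap, process-specific values into one 32-bit word."""
    now = time.time()
    micros = int(now * 1000000)
    # Assumption: pid, time and an object address are "good enough" here.
    return (os.getpid() ^ micros ^ id(now)) & 0xFFFFFFFF


def lcg_bytes(seed, count, a=1664525, c=1013904223, k=2**32):
    """Expand a seed into `count` bytes with x(n+1) = (x(n) * a + c) mod k."""
    out = bytearray()
    x = seed % k
    while len(out) < count:
        x = (x * a + c) % k
        out.append(x >> 24)  # keep the high byte; low bits of an LCG are weak
    return bytes(out)


def hash_secret(count=4096):
    """Prefer the OS RNG; use the weak LCG only if no OS source exists."""
    try:
        return os.urandom(count)
    except NotImplementedError:
        return lcg_bytes(weak_seed(), count)
```

The LCG output is fully determined by the seed, so this only raises the
cost of an attack; it is not a substitute for the OS RNG.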

From solipsis at  Tue Jan  3 22:20:53 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 3 Jan 2012 22:20:53 +0100
Subject: [Python-Dev] RNG in the core
References: <>
Message-ID: <>

On Tue, 3 Jan 2012 22:17:06 +0100
Victor Stinner <victor.stinner at> wrote:
> A randomized hash doesn't need a cryptographic RNG (which is slow and
> needs a lot of new code), and the new hash function should maybe not be
> cryptographic. We need to make the DoS more expensive for the
> attacker, but we don't need to add "too much security" for that.


> Mersenne Twister is useless here: it is only needed when you need a
> fast RNG to generate megabytes of random data, whereas we
> will not need more than 4 KB. The OS RNG is just fine (fast enough and
> not blocking).

Have you read the following sentence:

"Since some platforms may not have /dev/urandom, we need a PRNG in the
core, too. I therefore propose to move the Mersenne twister from
randommodule.c into the core, too."



From janssen at  Tue Jan  3 23:02:19 2012
From: janssen at (Bill Janssen)
Date: Tue, 3 Jan 2012 14:02:19 PST
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Christian Heimes <lists at> wrote:

> Am 29.12.2011 12:13, schrieb Mark Shannon:
> > The attack relies on being able to predict the hash value for a given
> > string. Randomising the string hash function is quite straightforward.
> > There is no need to change the dictionary code.
> > 
> > A possible (*untested*) patch is attached. I'll leave it for those more 
> > familiar with unicodeobject.c to do properly.
> I'm worried that hash randomization of str is going to break 3rd party
> software that rely on a stable hash across multiple Python instances.
> Persistence layers like ZODB and cross interpreter communication
> channels used by multiprocessing may (!) rely on the fact that the hash
> of a string is fixed.

Software that depends on an undefined hash function for synchronization
and persistence deserves to break, IMO.  There are plenty of
well-defined hash functions available for this purpose.
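
For illustration, one way to get a value that is stable across
processes and Python versions is to derive it from a hashlib digest
rather than from hash(). This is only a sketch; the helper name
stable_hash and the choice of SHA-256 truncated to 64 bits are
assumptions, not something taken from ZODB or multiprocessing.

```python
import hashlib


def stable_hash(text):
    """Return a 64-bit integer that is the same in every process."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Assumption: the first 8 bytes are enough for the caller's purposes.
    return int.from_bytes(digest[:8], "big")
```

Unlike hash(), the result depends only on the bytes of the input, so it
can be written to disk or sent to another interpreter safely.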


From martin at  Tue Jan  3 23:21:30 2012
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 03 Jan 2012 23:21:30 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>	<>
Message-ID: <>

> Have you read the following sentence:
> "Since some platforms may not have /dev/urandom, we need a PRNG in the
> core, too. I therefore propose to move the Mersenne twister from
> randommodule.c into the core, too."

I disagree. We don't need a PRNG on platforms without /dev/urandom or
any other native RNG.
Initializing the string-hash seed to 0 is perfectly fine on those
platforms; we can do slightly better by using, say, the current
time (in ms or µs if available) and the current pid (if available).

People concerned with the security on those systems either need to
switch to a different system, or provide a patch to access the
platform's native random number generator.


From ben+python at  Tue Jan  3 23:30:24 2012
From: ben+python at (Ben Finney)
Date: Wed, 04 Jan 2012 09:30:24 +1100
Subject: [Python-Dev] PEP 7 clarification request: braces
References: <>
Message-ID: <>

"Stephen J. Turnbull" <stephen at> writes:

> Matt Joiner writes:
>  > Readability is the highest concern, and this should be at the
>  > discretion of the contributor.
> That's quite backwards.  "Readability" is community property, and has
> as much, if not more, to do with common convention as with some
> absolute metric.  The "contributor's discretion" must yield.


 \          "Those who write software only for pay should go hurt some |
  `\               other field." -- Erik Naggum, in _gnu.misc.discuss_ |
_o__)                                                                  |
Ben Finney

From martin at  Wed Jan  4 00:11:50 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 04 Jan 2012 00:11:50 +0100
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

> Readability also includes more than just the source code; as has already
> been stated:
>  if(cond) {
>    stmt1;
> +  stmt2;
>  }
> vs.
> -if(cond)
> +if(cond) {
>    stmt1;
> +  stmt2;
> +}
> I find the diff version that already had braces in place much more
> readable.

Is it really *much* more readable? I have no difficulty reading either
(although I would have preferred a space after the if; this worries me
more than the double if line).


From benjamin at  Wed Jan  4 00:17:56 2012
From: benjamin at (Benjamin Peterson)
Date: Tue, 3 Jan 2012 23:17:56 +0000 (UTC)
Subject: [Python-Dev] PEP 7 clarification request: braces
References: <>	<>	<>
Message-ID: <>

Ethan Furman <ethan <at>> writes:
> Readability also includes more than just the source code; as has already 
> been stated:
>   if(cond) {
>     stmt1;
> +  stmt2;
>   }
> vs.
> -if(cond)
> +if(cond) {
>     stmt1;
> +  stmt2;
> +}
> I find the diff version that already had braces in place much more readable.

There are much larger problems facing diff readability. On your basis, we might
as well decree that code should never be arranged or reindented.


From mwm at  Wed Jan  4 01:40:36 2012
From: mwm at (Mike Meyer)
Date: Tue, 3 Jan 2012 16:40:36 -0800
Subject: [Python-Dev] Proposed PEP on concurrent programming support
Message-ID: <20120103164036.681beeae@mikmeyer-vm-fedora>

Title: Interpreter support for concurrent programming
Version: $Revision$
Last-Modified: $Date$
Author: Mike Meyer <mike at>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 11-Nov-2011


The purpose of this PEP is to explore strategies for making concurrent
programming in Python easier by allowing the interpreter to detect and
notify the user about possible bugs in concurrent access. The reason
for doing so is that "Errors should never pass silently".

Such bugs are caused by allowing objects to be accessed simultaneously
from another thread of execution while they are being modified.
Currently, Python systems provide no support for detecting such bugs, falling
back on the underlying platform facilities and some tools built on top
of those.  While these tools allow prevention of such modification if
the programmer is aware of the need for them, there are no facilities
to detect that such a need might exist and warn the programmer of it.

The goal is not to prevent such bugs, as that depends on the
programmer getting the logic of the interactions correct, which the
interpreter can't judge.  Nor is the goal to warn the programmer about
any such modifications - the goal is to catch standard idioms making
unsafe modifications.  If the programmer starts tinkering with
Python's internals, it's assumed they are aware of these issues.


Concurrency bugs are among the hardest bugs to locate and fix.  They
result in corrupt data being generated or used in a computation.  Like
most such bugs, the corruption may not become evident until much later
and far away in the program.  Minor changes in the code can cause the
bugs to fail to manifest.  They may even fail to manifest from run to
run, depending on external factors beyond the control of the

Therefore any help in locating and dealing with such bugs is valuable.
If the interpreter is to provide such help, it must be aware of when
things are safe to modify and when they are not. This means it will
almost certainly cause incompatible changes in Python, and may impose
costs so high for non-concurrent operations as to make it untenable.
As such, the final options discussed are destined for Python version 4
or later, and may never be implemented in any mainstream
implementation of Python.


The word "thread" is used throughout to mean "concurrent thread of
execution".  Nominally, this means a platform thread.  However, it is
intended to include any threading mechanism that allows the
interpreter to change threads between or in the middle of a statement
without the programmer specifically allowing this to happen.

Similarly, the word "interpreter" means any system that processes and
executes Python language files.  While this normally means CPython,
the changes discussed here should be amenable to other


Locking object

The idea is that the interpreter should indicate an error anytime an
unlocked object is mutated.  For mutable types, this would mean
changing the value of the type. For Python class instances, this would
mean changing the binding of an attribute.  Mutating an object bound
to such an attribute isn't a change in the object the attribute
belongs to, and so wouldn't indicate an error unless the object bound
to the attribute was unlocked.

Locking by name

It's also been suggested that locking "names" would be useful.  That
is, to prevent a specific attribute of an object from being rebound,
or a key/index entry in a mapping object. This provides a finer
grained locking than just locking the object, as you could lock a
specific attribute or set of attributes of an object, without locking
all of them.

Unfortunately, this isn't sufficient: a set may need to be locked to
prevent deletions for some period, or a dictionary to prevent adding a
key, or a list to prevent changing a slice, etc.

So some other locking mechanism is required.  If that needs to specify
objects, some way of distinguishing between locking a name and locking
the object bound to the name needs to be invented, or there needs to
be two different locking mechanisms.  It's not clear that the finer
grained locking is worth adding yet another language mechanism.


Explicit locking

These alternatives require that the programmer explicitly name
anything that is going to be changed to lock it before changing it.
This lets the interpreter get involved, but makes a number of errors
possible based on the order in which locks are applied.

Platform locks

The current tool set uses platform locks via a C extension.  The
problem with these is that the interpreter has no knowledge of them,
and so can't do anything about detecting the mutation of unlocked

A ``locking`` keyword

Adding a statement to tell the interpreter to lock objects for the
attached suite would let the interpreter know which objects are
locked.  To help prevent deadlocks, such a keyword needs to imply an
order for locking objects, such that any locking statements that lock
the same two objects will lock them in the same order during a
single execution of the program.  This can be achieved by sorting
objects by the ``id`` of the object, since the requirements for ``id``
are sufficient for this.
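
The ordering rule can be approximated today with a context manager.
This is only a sketch under stated assumptions: ``Lockable`` and
``locking`` are hypothetical names, each object carries its own
``threading.Lock``, and the interpreter does nothing to detect
mutation of unlocked objects.

```python
import threading
from contextlib import ExitStack, contextmanager


class Lockable:
    """Hypothetical object that carries its own lock."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


@contextmanager
def locking(*objects):
    """Acquire every object's lock in ascending id() order."""
    # Drop duplicates so the same non-reentrant lock is never taken twice.
    unique = {id(obj): obj for obj in objects}
    with ExitStack() as stack:
        for key in sorted(unique):
            stack.enter_context(unique[key].lock)
        yield  # all locks are released in reverse order on exit
```

With this, ``with locking(b, a):`` and ``with locking(a, b):`` take the
two locks in the same order, so two threads running them cannot
deadlock on each other.  Nesting such blocks reintroduces the problem.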

While the locking order requirement is sufficient to prevent deadlocks
from non-nested locking statements, it's not sufficient if locking
statements are allowed to nest.  So either nested locking statements
need to be disallowed, or the outer statement must lock everything
that's going to need to be locked.

Either requirement is sufficiently onerous that alternatives need to
be considered.

Implicit locking

In this alternative, the interpreter uses one or more heuristics to
decide when things should need locking.

Software Transactional Memory (STM)

STM is a relatively new technology being experimented with in newer
languages, and in a number of 3rd party libraries (both Peak [#Peak]_
and Kamaelia [#Kamaelia]_ provide STM facilities).  A suite is marked
as a `transaction`, and then when an unlocked object is modified,
instead of indicating an error, a locked copy of it is created to be
used through the rest of the transaction. If any of the originals are
modified during the execution of the suite, the suite is rerun from
the beginning. If it completes, the locked copies are copied back to
the originals in an atomic manner.

This causes the changes seen by any threads not running the
transaction to be atomic. If two threads are updating the same object
in transactions, the one that finishes second will be restarted with
values set by the one that finished first.

The advantage of an STM is that the programmer doesn't have to worry
about what is locked, and there's no overhead for using locked objects
(after locking them, of course).

The disadvantage is that any code in a transaction must be safe to run
multiple times.  This forbids any kind of I/O.
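
A toy model of the rerun-on-conflict behaviour, as a sketch only.  The
names ``Ref`` and ``atomically`` are hypothetical, the copies are made
eagerly at the start of the transaction rather than on first
modification, and a single global commit lock stands in for whatever
an interpreter would really do.

```python
import copy
import threading


class Ref:
    """Hypothetical shared cell with a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()


def atomically(func, *refs):
    """Run func on private copies of refs; rerun if an original changed."""
    while True:
        with _commit_lock:
            snapshot = [(r.version, copy.copy(r.value)) for r in refs]
        # func sees only the copies and returns one new value per ref.
        results = func(*(value for _, value in snapshot))
        with _commit_lock:
            if all(r.version == seen for r, (seen, _) in zip(refs, snapshot)):
                for r, new_value in zip(refs, results):
                    r.value = new_value
                    r.version += 1
                return
        # Another transaction committed first: discard the copies and retry.
```

Because ``func`` may run more than once, it has to be free of side
effects outside the copies it is handed.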

Compiler support

Since the point is to get the interpreter involved, we might as well
let it be involved in figuring out which things are safe and don't
need to be locked.  This could potentially eliminate a lot of locking.

Each object - whether a Python class instance or builtin type - is
created with no way to access it until it is bound.  So it is
inherently safe to modify.  Being bound to a local (or nonlocal?)
variable doesn't change this.

Being bound to a global, class or instance variable or stored in a
container does change this, as the object may now be accessed from
other threads via the module or container.

Since this analysis is being done at compile time, being passed to
another function - including methods of the object - makes it unsafe.
Likewise, yielding an object makes it unsafe for future use.
Returning it doesn't change anything, since our execution is over and
we lose access to the object.  Unfortunately, objects returned from
functions must be treated as unsafe.

Interpreter support

If we instead track whether or not objects require locking in the
interpreter, then we can improve the analysis.  The only thing that
definitely makes an object unsafe is binding to a global variable or a
variable known to be unsafe.

Passing objects to C routines exposes them to concurrent modification,
since there's no way to know what will happen inside the C routine.
Adding some way of marking C routines - or possibly the objects passed
to them - as not exposing things to concurrent modification would help
with this, allowing C modules to be called without requiring locking
everything passed to them.

Binding to class and instance variables, or adding them to a
container, is an interesting issue.  If the object in question is
safe, then anything added to it is also safe.  However, this would
mean that when an object is flagged as unsafe, all objects accessible
through it would also have to be flagged as unsafe.

This type of tracking also means that objects effectively have three
states: locked, unlocked, and safe.  Both locked and safe objects can
safely be modified without a problem. Locking and unlocking safe
objects is a nop.

Interpreter threading

One alternative is replacing the current threading tools - which are
wrappers around the OS-provided threading - with threading support in
the interpreter.  This would allow the interpreter to control whether
or not objects are shared between threads, which isn't possible today.
The full implications of this approach have yet to be worked out.

Mixed solutions

Most likely, any real implementation would use a number of the
techniques above, since all of them have serious shortcomings.  For
instance, combining STM with explicit locking would allow explicit
locking when IO was required, but complex multi-object changes could
be handled by STM, thus avoiding the nested locking issues.

Likewise, interpreter or compiler support could be mixed with most
other solutions to relax the requirement of locking for part of the
objects used in a program.

The implications of mixing these things together also need to be
explored more thoroughly.

Change proposal

This is a 'strawman' proposal to provide a starting point for

The proposal is to add STM support to the Python interpreter. A new
suite type - the ``transaction`` - will be added to the language. The
suite will have the semantics discussed above: modifying an object in
the suite will trigger creation of a thread-local shallow copy to be
used in the transaction. Further modifications of the original will
cause all existing copies to be discarded and the transaction to be
restarted. At the end of the transaction, the originals of all the
copies are locked and then updated to the state of the copy.

Further work

Requiring further investigation:

- The interpreter providing its own threading.

- How various solutions interact when mixed.

There are also a couple of tools that might be useful to build, or at
least investigate building:

- A static concurrency safety analyzer that handles the AST of a
  function to determine which variables are safe.

- A dynamic concurrency safety analyzer, similar to coverage [#coverage]_.

Implementation Notes

Not significantly impacting the performance of single-threaded code
must be of paramount importance to any implementation.

One implementation technique arose that could help with this.  Instead
of keeping track of the object's state and having methods check that
state and modify their behavior based on it, change the methods as the
object changes state. So in safe or locked mode, the object's methods
could freely modify the object without having to check its mode.  In
unlocked mode, an attempt to do so would raise an error or
warning. Unfortunately, this doesn't work if some global or thread
state must be checked instead of just object-local state.
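
One way to picture this technique in today's Python is to swap an
object's class when its state changes, so the fast path carries no
check at all.  This is a sketch only; ``LockedList``, ``UnlockedList``,
``lock`` and ``unlock`` are hypothetical names, and only ``append`` is
guarded to keep the example short.

```python
class LockedList(list):
    """State in which the plain list methods run with no extra check."""


class UnlockedList(list):
    """State in which a mutation is reported as an error."""

    def append(self, item):
        raise RuntimeError("append on an unlocked object")


def lock(obj):
    # Rebinding __class__ changes which methods are found, not the data.
    obj.__class__ = LockedList


def unlock(obj):
    obj.__class__ = UnlockedList
```

After ``lock(items)``, ``items.append(x)`` resolves straight to
``list.append``; after ``unlock(items)``, the same call raises.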


.. [#Peak] "Peak, the Python Enterprise Application Kit",

.. [#Kamaelia] "Kamaelia - Concurrency made useful, fun",

.. [#coverage] "Code coverage measurement for Python",


This document has been placed in the public domain.

   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8

From tjreedy at  Wed Jan  4 01:41:53 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 03 Jan 2012 19:41:53 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <je078l$jmb$>

On 1/3/2012 5:02 PM, Bill Janssen wrote:

> Software that depends on an undefined hash function for synchronization
> and persistence deserves to break, IMO.  There are plenty of
> well-defined hash functions available for this purpose.

The doc for id() now says "This is an integer which is guaranteed to be 
unique and constant for this object during its lifetime." Since the 
default 3.2.2 hash for my win7 64bit CPython is id-address // 16, it can 
have no longer-lived guarantee. I suggest that hash() doc say something 

Terry Jan Reedy

From solipsis at  Wed Jan  4 02:34:03 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 4 Jan 2012 02:34:03 +0100
Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function
References: <>
Message-ID: <>

> +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
> +                        Py_ssize_t length, Py_UCS4 fill_char)
> +
> +   Fill a string with a character: write *fill_char* into
> +   ``unicode[start:start+length]``.
> +
> +   Fail if *fill_char* is bigger than the string maximum character, or if the
> +   string has more than 1 reference.
> +
> +   Return the number of written character, or return ``-1`` and raise an
> +   exception on error.

The return type should then be Py_ssize_t, not int.



From ncoghlan at  Wed Jan  4 02:42:20 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 4 Jan 2012 11:42:20 +1000
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Jan 4, 2012 at 8:21 AM, "Martin v. Löwis" <martin at> wrote:
>> Have you read the following sentence:
>> "Since some platforms may not have /dev/urandom, we need a PRNG in the
>> core, too. I therefore propose to move the Mersenne twister from
>> randommodule.c into the core, too."
> I disagree. We don't need a PRNG on platforms without /dev/urandom or
> any other native RNG.
> Initializing the string-hash seed to 0 is perfectly fine on those
> platforms; we can do slightly better by using, say, the current
> time (in ms or µs if available) and the current pid (if available).
> People concerned with the security on those systems either need to
> switch to a different system, or provide a patch to access the
> platform's native random number generator.

+1 (especially given how far back this is going to be ported)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Wed Jan  4 02:59:51 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 4 Jan 2012 02:59:51 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, 03 Jan 2012 23:21:30 +0100
"Martin v. Löwis" <martin at> wrote:
> > Have you read the following sentence:
> > 
> > "Since some platforms may not have /dev/urandom, we need a PRNG in the
> > core, too. I therefore propose to move the Mersenne twister from
> > randommodule.c into the core, too."
> I disagree. We don't need a PRNG on platforms without /dev/urandom or
> any other native RNG.

Well what if /dev/urandom is unavailable because the program is run
e.g. in a chroot?
(or is /dev/urandom still available in a chroot?)



From stephen at  Wed Jan  4 05:10:37 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 04 Jan 2012 13:10:37 +0900
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson writes:
 > Ethan Furman <ethan <at>> writes:
 > > 
 > > Readability also includes more than just the source code; as has already 
 > > been stated:

[diffs elided]

 > > I find the diff version that already had braces in place much more readable.
 > There are much larger problems facing diff readability. On your basis, we might
 > as well decree that code should never be arranged or reindented.

That's a reasonable approach sometimes used, but it would be hard in
Python.  Specifically, I often produce two patches when substantial
rearrangement is involved.  The first isolates the actual changes, the
second does the reformatting.

In Python, the first patch might be syntactically erroneous, which
would be both annoying for automatic testing and less readable.  A
Python-friendly alternative is to provide both a machine-appliable
diff and a diff ignoring whitespace changes.  This could be a toggle
in web interfaces to the VCS.  I've also sometimes found doing word
diffs to be useful.

Most developers resist such procedures passionately, though.  *shrug*

From benjamin at  Wed Jan  4 05:32:23 2012
From: benjamin at (Benjamin Peterson)
Date: Tue, 3 Jan 2012 22:32:23 -0600
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/3 Stephen J. Turnbull <stephen at>:
> Benjamin Peterson writes:
>  > Ethan Furman <ethan <at>> writes:
>  > >
>  > > Readability also includes more than just the source code; as has already
>  > > been stated:
> [diffs elided]
>  > > I find the diff version that already had braces in place much more readable.
>  >
>  > There are much larger problems facing diff readability. On your basis, we might
>  > as well decree that code should never be arranged or reindented.
> That's a reasonable approach sometimes used

My goodness, I was trying to make a ridiculous-sounding proposition.


From pje at  Wed Jan  4 06:07:27 2012
From: pje at (PJ Eby)
Date: Wed, 4 Jan 2012 00:07:27 -0500
Subject: [Python-Dev] Proposed PEP on concurrent programming support
In-Reply-To: <20120103164036.681beeae@mikmeyer-vm-fedora>
References: <20120103164036.681beeae@mikmeyer-vm-fedora>
Message-ID: <>

On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer <mwm at> wrote:

> STM is a relatively new technology being experimented with in newer
> languages, and in a number of 3rd party libraries (both Peak [#Peak]_
> and Kamaelia [#Kamaelia]_ provide STM facilities).

I don't know about Kamaelia, but PEAK's STM (part of the Trellis
event-driven library) is *not* an inter-thread concurrency solution: it's
actually used to sort out the order of events in a co-operative
multitasking scenario.  So, it should not be considered evidence for the
practicality of doing inter-thread co-ordination that way in pure Python.

> A suite is marked
> as a `transaction`, and then when an unlocked object is modified,
> instead of indicating an error, a locked copy of it is created to be
> used through the rest of the transaction. If any of the originals are
> modified during the execution of the suite, the suite is rerun from
> the beginning. If it completes, the locked copies are copied back to
> the originals in an atomic manner.

I'm not sure if "locked" is really the right word here.  A private copy
isn't "locked" because it's not shared.

> The disadvantage is that any code in a transaction must be safe to run
> multiple times.  This forbids any kind of I/O.

More precisely, code in a transaction must be *reversible*, so it doesn't
forbid any I/O that can be undone.  If you can seek backward in an input
file, for example, or delete queued output data, then it can still be done.
Even I/O like re-drawing a screen can be made STM safe by making the
redraw occur after a transaction that reads and empties a buffer written by
other transactions.
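
To illustrate the "delete queued output data" case, here is a sketch
(hypothetical class and method names, not PEAK's API) in which writes
are queued privately and only reach the real stream on commit, so a
rerun can simply throw the queue away.

```python
class BufferedOutput:
    """Hypothetical output wrapper whose writes can be undone."""

    def __init__(self, sink):
        self._sink = sink      # any object with a write(str) method
        self._pending = []

    def write(self, text):
        # Reversible: nothing has left the process yet.
        self._pending.append(text)

    def rollback(self):
        self._pending.clear()

    def commit(self):
        for text in self._pending:
            self._sink.write(text)
        self._pending.clear()
```

A transaction that is restarted calls rollback() before rerunning; only
the run that finally succeeds calls commit().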

> instance, combining STM with explicit locking would allow explicit
> locking when IO was required,

I don't think this idea makes any sense, since STMs don't really "lock",
and to control I/O in an STM system you just STM-ize the queues.
(Generally speaking.)

From stephen at  Wed Jan  4 07:30:16 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 04 Jan 2012 15:30:16 +0900
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson writes:

 > My goodness, I was trying to make a ridiculous-sounding proposition.

In this kind of discussion, that's in the same class as "be careful
what you wish for -- because you might just get it."

From fijall at  Wed Jan  4 08:59:15 2012
From: fijall at (Maciej Fijalkowski)
Date: Wed, 4 Jan 2012 09:59:15 +0200
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen <janssen at> wrote:
> Christian Heimes <lists at> wrote:
>> Am 29.12.2011 12:13, schrieb Mark Shannon:
>> > The attack relies on being able to predict the hash value for a given
>> > string. Randomising the string hash function is quite straightforward.
>> > There is no need to change the dictionary code.
>> >
>> > A possible (*untested*) patch is attached. I'll leave it for those more
>> > familiar with unicodeobject.c to do properly.
>> I'm worried that hash randomization of str is going to break 3rd party
>> software that rely on a stable hash across multiple Python instances.
>> Persistence layers like ZODB and cross interpreter communication
>> channels used by multiprocessing may (!) rely on the fact that the hash
>> of a string is fixed.
> Software that depends on an undefined hash function for synchronization
> and persistence deserves to break, IMO.  There are plenty of
> well-defined hash functions available for this purpose.
> Bill

A lot of software will break their tests, because dict ordering would
depend on the particular run. I know, because some of them break on
pypy which has a different dict ordering. This is probably a good
thing in general, but is it really worth it? People will install
python 2.6.newest and stuff *will* break.
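
As a sketch of the kind of test fix involved (the helper name
render_tags is hypothetical): output built by iterating a hash-ordered
container should be sorted, or compared as a set, rather than compared
against one particular ordering. A set is used here because current
CPython dicts keep insertion order, while sets of strings still follow
hash order.

```python
def render_tags(tags):
    """Join tags in sorted order so the result ignores hash order."""
    return ",".join(sorted(tags))


tags = {"spam", "eggs", "ham"}

# Fragile: ",".join(tags) may differ between runs or interpreters.
# Robust: compare a sorted rendering, or compare as sets.
assert render_tags(tags) == "eggs,ham,spam"
assert set(",".join(tags).split(",")) == tags
```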

Is it *really* a security issue? We knew all along that dicts are
O(n^2) in worst case scenario, how is this suddenly a security
problem?

From martin at  Wed Jan  4 09:02:14 2012
From: martin at (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 04 Jan 2012 09:02:14 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>	<>	<>
	<> <>
Message-ID: <>

> Well what if /dev/urandom is unavailable because the program is run
> e.g. in a chroot?

If the system ought to have /dev/urandom (as e.g. determined during
configure), I propose that Python fails fast, unless the command line
option is given that disables random hash seeds.

For the security fixes, we therefore might want to toggle the meaning
of the command line switch, i.e. only use random seeds if explicitly

> (or is /dev/urandom still available in a chroot?)

You can make it available if you want to: just create a /dev directory,
and do mknod in it. It's common to run /dev/MAKEDEV (or similar), or
to mount devfs into a chroot environment; else many programs run in the
chroot are likely going to fail (e.g. if /dev/tty is missing).

See, for example,

bind apparently requires /dev/null and /dev/random.


From solipsis at  Wed Jan  4 11:55:13 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 4 Jan 2012 11:55:13 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
	<> <>
Message-ID: <>

On Wed, 4 Jan 2012 09:59:15 +0200
Maciej Fijalkowski <fijall at> wrote:
> Is it *really* a security issue? We knew all along that dicts are
> O(n^2) in worst case scenario, how is this suddenly a security
> problem?

Because it has been shown to be exploitable for malicious purposes?



From lists at  Wed Jan  4 12:18:54 2012
From: lists at (Christian Heimes)
Date: Wed, 04 Jan 2012 12:18:54 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Am 04.01.2012 08:59, schrieb Maciej Fijalkowski:
> Is it *really* a security issue? We knew all along that dicts are
> O(n^2) in worst case scenario, how is this suddenly a security
> problem?

For example Microsoft has released an extraordinary and unscheduled
security patch for the issue between Christmas and New Year. I don't
normally use MS as reference but this should give you a hint about the

Have you watched the talk yet?


From victor.stinner at  Wed Jan  4 04:30:06 2012
From: victor.stinner at (Victor Stinner)
Date: Wed, 4 Jan 2012 04:30:06 +0100
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> (or is /dev/urandom still available in a chroot?)

Last time that I played with chroot, I bind-mounted /dev and /proc. Many
programs rely on specific devices like /dev/null.

Python should not refuse to start if /dev/urandom (or CryptGenRandom) is
missing or cannot be used, but should use a weak fallback.


From victor.stinner at  Wed Jan  4 04:30:16 2012
From: victor.stinner at (Victor Stinner)
Date: Wed, 4 Jan 2012 04:30:16 +0100
Subject: [Python-Dev] cpython: Add a new PyUnicode_Fill() function
In-Reply-To: <>
References: <>
Message-ID: <>

Oops, it's a typo in the doc (copy/paste failure). It's now fixed, thanks.


2012/1/4 Antoine Pitrou <solipsis at>:
>> +.. c:function:: int PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
>> +                        Py_ssize_t length, Py_UCS4 fill_char)
>> +
>> +    Fill a string with a character: write *fill_char* into
>> +    ``unicode[start:start+length]``.
>> +
>> +    Fail if *fill_char* is bigger than the string maximum character, or if the
>> +    string has more than 1 reference.
>> +
>> +    Return the number of written character, or return ``-1`` and raise an
>> +    exception on error.
> The return type should then be Py_ssize_t, not int.
> Regards
> Antoine.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

From brian at  Wed Jan  4 15:05:28 2012
From: brian at (Brian Curtin)
Date: Wed, 4 Jan 2012 08:05:28 -0600
Subject: [Python-Dev] PEP 7 clarification request: braces
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 4, 2012 at 00:30, Stephen J. Turnbull <stephen at> wrote:
> Benjamin Peterson writes:
>  > My goodness, I was trying to make a ridiculous-sounding proposition.
> In this kind of discussion, that's in the same class as "be careful
> what you wish for -- because you might just get it."

I wish we could move onto better discussions than brace
placement/existence at this point.

*crosses fingers*

From jimjjewett at  Wed Jan  4 15:41:19 2012
From: jimjjewett at (Jim Jewett)
Date: Wed, 4 Jan 2012 09:41:19 -0500
Subject: [Python-Dev] Proposed PEP on concurrent programming support
Message-ID: <>

(I've added back python-ideas, because I think that is still the
appropriate forum.)

>.... A new
> suite type - the ``transaction`` will be added to the language. The
> suite will have the semantics discussed above: modifying an object in
> the suite will trigger creation of a thread-local shallow copy to be
> used in the Transaction. Further modifications of the original will
> cause all existing copies to be discarded and the transaction to be
> restarted. ...

How will you know that an object has been modified?

The only ways I can think of are

(1)  Timestamp every object -- or at least every mutable object -- and
hope that everybody agrees on which modifications should count.

(2)  Make two copies of every object you're using in the suite; at the
end, compare one of them to both the original and the one you were
operating on.  With this solution, you can decide for yourself what
counts as a modification, but it still isn't straightforward; I would
consider changing a value to be changing a dict, even though
nothing in the item (header) itself changed.


From barry at  Wed Jan  4 16:20:28 2012
From: barry at (Barry Warsaw)
Date: Wed, 4 Jan 2012 10:20:28 -0500
Subject: [Python-Dev] RNG in the core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Jan 04, 2012, at 02:59 AM, Antoine Pitrou wrote:

>Well what if /dev/urandom is unavailable because the program is run
>e.g. in a chroot?
>(or is /dev/urandom still available in a chroot?)

It is (apparently) in an schroot in Ubuntu, so I'd guess it's also available
in Debian (untested).


From ericsnowcurrently at  Wed Jan  4 20:15:46 2012
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 4 Jan 2012 12:15:46 -0700
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Jan 4, 2012 at 12:59 AM, Maciej Fijalkowski <fijall at> wrote:
> On Wed, Jan 4, 2012 at 12:02 AM, Bill Janssen <janssen at> wrote:
>> Christian Heimes <lists at> wrote:
>>> On 29.12.2011 12:13, Mark Shannon wrote:
>>> > The attack relies on being able to predict the hash value for a given
>>> > string. Randomising the string hash function is quite straightforward.
>>> > There is no need to change the dictionary code.
>>> >
>>> > A possible (*untested*) patch is attached. I'll leave it for those more
>>> > familiar with unicodeobject.c to do properly.
>>> I'm worried that hash randomization of str is going to break 3rd party
>>> software that relies on a stable hash across multiple Python instances.
>>> Persistence layers like ZODB and cross interpreter communication
>>> channels used by multiprocessing may (!) rely on the fact that the hash
>>> of a string is fixed.
>> Software that depends on an undefined hash function for synchronization
>> and persistence deserves to break, IMO.  There are plenty of
>> well-defined hash functions available for this purpose.
>> Bill
> A lot of software will break their tests, because dict ordering would
> depend on the particular run. I know, because some of them break on
> pypy which has a different dict ordering. This is probably a good
> thing in general, but is it really worth it? People will install
> python 2.6.newest and stuff *will* break.

So if we're making the new hashing the default and giving an option to
use the old, we should make it _really_ clear in the release
notes/announcement about how to revert the behavior.


> Is it *really* a security issue? We knew all along that dicts are
> O(n^2) in worst case scenario, how is this suddenly a security
> problem?
> Cheers,
> fijal

From andrew at  Thu Jan  5 05:26:27 2012
From: andrew at (Andrew Bennetts)
Date: Thu, 5 Jan 2012 15:26:27 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Jan 04, 2012 at 11:55:13AM +0100, Antoine Pitrou wrote:
> On Wed, 4 Jan 2012 09:59:15 +0200
> Maciej Fijalkowski <fijall at> wrote:
> > 
> > Is it *really* a security issue? We knew all along that dicts are
> > O(n^2) in worst case scenario, how is this suddenly a security
> > problem?
> Because it has been shown to be exploitable for malicious purposes?

I don't think that's news either. and for
instance show that in 2003 it was clearly known to at least be likely to be an
exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).

There was debate about whether it's the language's responsibility to mitigate
the problem or if apps should use safer designs for handling untrusted input
(e.g. limit the number of keys input is allowed to create, or use something
other than dicts), and debate about just how practical an effective exploit
would be.  But I think it was understood to be a real concern 8 years ago, so
not exactly sudden.
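
One of the application-level mitigations mentioned above, limiting the
number of keys that untrusted input may create, can be sketched as follows.
The function name parse_form_limited and the limit of 1000 are hypothetical
choices for illustration, not taken from any framework.

```python
from urllib.parse import parse_qsl

MAX_FIELDS = 1000  # hypothetical limit; tune per application


def parse_form_limited(query, max_fields=MAX_FIELDS):
    """Parse a query string into a dict, refusing oversized inputs."""
    # Count separators before parsing, so that an oversized request is
    # rejected without inserting a single attacker-chosen key into a dict.
    # Both "&" and ";" are counted because older parsers split on either.
    if query.count("&") + query.count(";") + 1 > max_fields:
        raise ValueError("too many form fields")
    return dict(parse_qsl(query, keep_blank_values=True))
```

This bounds the worst case rather than removing it: max_fields colliding
keys still cost on the order of max_fields squared comparisons.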

Just because it's old news doesn't make it not a security problem, of course.


From paul at  Thu Jan  5 09:58:29 2012
From: paul at (Paul Smedley)
Date: Thu, 05 Jan 2012 19:28:29 +1030
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
Message-ID: <je3onm$57p$>

Hi All,

I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2
platform.

I have python.exe and python27.dll compiling fine, but when starting to
build sharedmods I'm getting the following error:
running build
running build_ext
Traceback (most recent call last):
   File "./", line 2092, in <module>
   File "./", line 2087, in main
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 152, in setup
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 953, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/command/", line 127, 
in run
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 326, in run_command
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/command/", line 
340, in run
   File "./", line 152, in build_extensions
     missing = self.detect_modules()
   File "./", line 1154, in detect_modules
     for arg in sysconfig.get_config_var("CONFIG_ARGS").split()]
AttributeError: 'NoneType' object has no attribute 'split'
make: *** [sharedmods] Error 1

Any suggestions?  A Google search showed a similar error on AIX with no clear
resolution.

Thanks in advance,


From solipsis at  Thu Jan  5 14:39:57 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 5 Jan 2012 14:39:57 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
	<> <>
Message-ID: <>

On Thu, 5 Jan 2012 15:26:27 +1100
Andrew Bennetts <andrew at> wrote:
> I don't think that's news either.
> and
> for
> instance show that in 2003 it was clearly known to at least be likely to be an
> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
> There was debate about whether it's the language's responsibility to mitigate
> the problem or if apps should use safer designs for handling untrusted input
> (e.g. limit the number of keys input is allowed to create, or use something
> other than dicts), and debate about just how practical an effective exploit
> would be.  But I think it was understood to be a real concern 8 years ago, so
> not exactly sudden.

That's not news indeed, but that doesn't make it less of a problem,
especially now that the issue has been widely publicized through a
conference and announcements on several widely-read Web sites.

That said, only doing the security fix in 3.3 would have the nice side
effect of pushing people towards Python 3, so perhaps I'm for it after
all.

Half-jokingly,

Antoine.


From mark at  Thu Jan  5 14:46:52 2012
From: mark at (Mark Shannon)
Date: Thu, 05 Jan 2012 13:46:52 +0000
Subject: [Python-Dev] Testing the tests by modifying the ordering of
	dict items.
Message-ID: <>


Python code should not depend upon the ordering of items in a dict. 
Unfortunately it seems that a number of tests in the standard library do 
just that.

Changing PyDict_MINSIZE from 8 to either 4 or 16 causes the following 
tests to fail:

test_dis test_email test_inspect test_nntplib test_packaging
test_plistlib test_pprint test_symtable test_trace

test_sys also fails, but this is a legitimate failure in sys.getsizeof()

Changing the collision resolution function from f(n) = 5n + 1 to
f(n) = n + 1 results in the same failures, except for test_packaging and 
test_symtable which pass.

Finally, changing the seed in unicode_hash() from (implicit) 0 to an 
arbitrary value (12345678) causes the above tests to fail plus:

test_json test_set test_ttk_textonly test_urllib test_urlparse

I think this is a real issue as the unicode_hash() function is likely to 
change soon due to

Should I:

1. Submit one big bug report?

2. Submit a bug report for each "failing" test separately?

3. Ignore it, since the tests only fail when I start messing about?
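
To illustrate the kind of test that breaks under such changes, here is a
small sketch. The function render_headers and its data are made up for the
example; they are not from any of the test files listed above.

```python
def render_headers(headers):
    """Format a mapping as 'Name: value' lines, in iteration order."""
    return ["%s: %s" % (name, value) for name, value in headers.items()]


headers = {"Host": "example.org", "Accept": "*/*"}
lines = render_headers(headers)

# Fragile: compares against one particular iteration order.
#   assert lines == ["Host: example.org", "Accept: */*"]

# Robust: compare in a way that ignores iteration order.
assert sorted(lines) == ["Accept: */*", "Host: example.org"]
assert dict(line.split(": ", 1) for line in lines) == headers
```

The fragile form only passes as long as the hash function, table size and
probing sequence stay exactly as they are; the robust forms do not care.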


From solipsis at  Thu Jan  5 14:58:13 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 5 Jan 2012 14:58:13 +0100
Subject: [Python-Dev] Testing the tests by modifying the ordering of
	dict items.
References: <>
Message-ID: <>

On Thu, 05 Jan 2012 13:46:52 +0000
Mark Shannon <mark at> wrote:
> Should I:
> 1. Submit one big bug report?
> 2. Submit a bug report for each "failing" test separately?

I would say a separate bug report for each failing test file, i.e. one
report for test_dis, one for test_email etc.
Hope this doesn't eat too much of your time :)



From amauryfa at  Thu Jan  5 15:02:44 2012
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Thu, 5 Jan 2012 15:02:44 +0100
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je3onm$57p$>
References: <je3onm$57p$>
Message-ID: <>

2012/1/5 Paul Smedley <paul at>

> Hi All,
> I'm working on updating my port of Python 2.6.5 to v2.7.2 for the OS/2
> platform.
> I have python.exe and python27.dll compiling fine, but when starting to
> build sharedmods I'm getting the following error:
> running build
> running build_ext
> Traceback (most recent call last):
>  File "./", line 2092, in <module>
>    main()
>  File "./", line 2087, in main
>    'Lib/']
>  File "U:/DEV/python-2.7.2/Lib/distutils/", line 152, in setup
>    dist.run_commands()
>  File "U:/DEV/python-2.7.2/Lib/distutils/", line 953, in
> run_commands
>    self.run_command(cmd)
>  File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in
> run_command
>  File "U:/DEV/python-2.7.2/Lib/distutils/command/", line 127,
> in run
>    self.run_command(cmd_name)
>  File "U:/DEV/python-2.7.2/Lib/distutils/", line 326, in
> run_command
>    self.distribution.run_command(command)
>  File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in
> run_command
>  File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py", line
> 340, in run
>    self.build_extensions()
>  File "./", line 152, in build_extensions
>    missing = self.detect_modules()
>  File "./", line 1154, in detect_modules
>    for arg in sysconfig.get_config_var("CONFIG_ARGS").split()]
> AttributeError: 'NoneType' object has no attribute 'split'
> make: *** [sharedmods] Error 1
> Any suggestions?  A google showed a similar error on AIX with no clear
> resolution.

Is it in the part that configures the "dbm" module?
This paragraph is already protected by an "if platform not in ['cygwin']:",
so I suggest excluding 'os2emx' as well.

Amaury Forgeot d'Arc

From barry at  Thu Jan  5 16:15:33 2012
From: barry at (Barry Warsaw)
Date: Thu, 5 Jan 2012 10:15:33 -0500
Subject: [Python-Dev] Testing the tests by modifying the ordering of
 dict items.
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 05, 2012, at 01:46 PM, Mark Shannon wrote:

>2. Submit a bug report for each "failing" test separately?

I'm sure it will be a pain, but this is really the best thing to do.


From fijall at  Thu Jan  5 18:34:13 2012
From: fijall at (Maciej Fijalkowski)
Date: Thu, 5 Jan 2012 19:34:13 +0200
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou <solipsis at> wrote:
> On Thu, 5 Jan 2012 15:26:27 +1100
> Andrew Bennetts <andrew at> wrote:
>> I don't think that's news either.
>> and
>> for
>> instance show that in 2003 it was clearly known to at least be likely to be an
>> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
>> There was debate about whether it's the language's responsibility to mitigate
>> the problem or if apps should use safer designs for handling untrusted input
>> (e.g. limit the number of keys input is allowed to create, or use something
>> other than dicts), and debate about just how practical an effective exploit
>> would be.  But I think it was understood to be a real concern 8 years ago, so
>> not exactly sudden.
> That's not news indeed, but that doesn't make it less of a problem,
> especially now that the issue has been widely publicized through a
> conference and announcements on several widely-read Web sites.
> That said, only doing the security fix in 3.3 would have the nice side
> effect of pushing people towards Python 3, so perhaps I'm for it after
> all.
> Half-jokingly,
> Antoine.

Just to make things clear - stdlib itself has 1/64 of tests relying on
dict order. Changing dict order in *older* pythons will break
everyone's tests and some people's code. Making this new 2.6.x release
would mean that people using new python 2.6 would have to upgrade an
unspecified number of their python packages; that does not sound very
cool. Also consider that new 2.6.x would go as a security fix to old
ubuntu, but all other packages won't, because they'll not contain
security fixes. Just so you know.


From v+python at  Thu Jan  5 20:14:51 2012
From: v+python at (Glenn Linderman)
Date: Thu, 05 Jan 2012 11:14:51 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 1/5/2012 9:34 AM, Maciej Fijalkowski wrote:
> Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes. Just so you know

Why should CPython be constrained by broken policies of Ubuntu?  If the
other packages must be fixed so they work correctly with a security fix 
in Python, then they should be considered as containing a security fix. 
If they aren't, then that is a broken policy.

On the other hand, it is very true that the seductive convenience of 
dict (readily available, good performance) in normal cases has created
the vulnerability because its characteristics are a function of the data 
inserted, and when used for data that is from unknown, possibly 
malicious sources, that is a bug in the program that uses dict, not in 
dict itself.

So it seems to me that:

1) the security problem is not in CPython, but rather in web servers 
that use dict inappropriately.
2) changing CPython in a way that breaks code is not a security fix to 
CPython, but rather a gratuitous breakage of compatibility promises, 
wrapped in a security-fix lie.

The problem for CPython here can be summarized as follows:

a) it is being blamed for problems in web servers that are not problems 
in CPython
b) perhaps dict documentation is a bit too seductive, in not declaring 
that data from malicious sources could cause its performance to degrade 
significantly (but then, any programmer who has actually taken a decent 
set of programming classes should understand that, but on the other 
hand, there are programmers who have not taken such classes).
c) CPython provides no other mapping data structures that rival the 
performance and capabilities of dict as an alternative, nor can such 
data structures be written in CPython, as the performance of dict comes 
not only from hashing, but also from being written in C.

The solutions could be:

A) push back on the blame: it is not a CPython problem
B) perhaps add a warning to the documentation for the naïve, untrained
C) consider adding an additional data structure to the language, and 
mention it in the B warning for versions 3.3+.

On the other hand, the web server vulnerability could be blamed on 
CPython in another way:

identify vulnerable packages in the stdlib that are likely to be used
during the parsing of user-supplied data.  Ones that come to mind 
(Python 3.2) are:
urllib.parse (various parse* functions)  (package names different in 
Python 2.x)
cgi (parse_multipart, FieldStorage)

So, fixing the vulnerable packages could be a sufficient response, 
rather than changing the hash function.  How to fix?  Each of those 
above allocates and returns a dict.  Simply have each of those allocate
and return a wrapped dict, which has the following behaviors:

i) during __init__, create a local, random, string.
ii) for all key values, prepend the string, before passing it to the 
internal dict.
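
Steps i) and ii) could be sketched roughly like this. The class name
SaltedDict, the 8-byte salt, and the restriction to str keys are
assumptions made for the example; the sketch only shows the mechanics and
makes no claim about how well a prefix resists a determined attacker.

```python
import os
from collections.abc import MutableMapping


class SaltedDict(MutableMapping):
    """Mapping with str keys that stores salt + key in an internal dict."""

    def __init__(self, *args, **kwargs):
        # i) a local, random string, created once per instance
        self._salt = os.urandom(8).hex()
        self._data = {}
        self.update(*args, **kwargs)

    def _salted(self, key):
        # ii) prepend the salt before the key reaches the internal dict
        return self._salt + key

    def __getitem__(self, key):
        return self._data[self._salted(key)]

    def __setitem__(self, key, value):
        self._data[self._salted(key)] = value

    def __delitem__(self, key):
        del self._data[self._salted(key)]

    def __iter__(self):
        # Strip the salt again so callers only ever see their own keys.
        offset = len(self._salt)
        return (stored[offset:] for stored in self._data)

    def __len__(self):
        return len(self._data)
```

Because it derives from MutableMapping, methods such as get(), items() and
the "in" operator come for free and all go through the salted lookups.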

Changing these vulnerable packages rather than the hash function is a 
much more constrained change, and wouldn't create bugs in programs that 
erroneously depend on the current hash function directly or indirectly.

This would not fix web servers that use their own parsing and storage 
mechanism for <FORM> fields, if they have also inappropriately used a 
dict as their storage mechanism for user supplied data.  However, a 
similar solution could be similarly applied by the authors of those web 
servers, and would be a security fix to such packages, so should be 
applied to Ubuntu, if available there, or other systems with 
security-only fix acceptance.

This solution does not require changes to the hash, does not require a 
cryptographically secure hash, and does not require code to be added to
the initialization of Python before normal objects and mappings can be 

If a port doesn't contain a good random number generator, a weak one can 
be substituted, but such decisions can be made in Python code after the
interpreter is initialized, and use of stdlib packages is available.

From solipsis at  Thu Jan  5 20:22:22 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 5 Jan 2012 20:22:22 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, 5 Jan 2012 19:34:13 +0200
Maciej Fijalkowski <fijall at> wrote:
> Just to make things clear - stdlib itself has 1/64 of tests relying on
> dict order. Changing dict order in *older* pythons will break
> everyone's tests and some peoples code.

Breaking tests is not a problem: they are typically not run by
production code and so people can take the time to fix them.

Breaking other code is a problem if it is legitimate. Relying on dict
ordering is totally wrong and I don't think we should care about such
cases. The only issue is when relying on hash() being stable across
runs. But hashing already varies from build to build (32-bit vs.
64-bit) and I think that anyone seriously relying on it should already
have been bitten.
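
For code that does need a value that is stable across runs and builds, a
well-defined digest can stand in for hash(). A minimal sketch; the helper
name stable_key and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib


def stable_key(text):
    """Return a digest of text that is identical on every run and build."""
    # Encode explicitly so the result does not depend on any default
    # encoding, then hash the bytes with a specified algorithm.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Unlike hash(), the result here is fixed by the algorithm's specification,
so it can safely be persisted or sent to another interpreter.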

> Making this new 2.6.x release
> would mean that people using new python 2.6 would have to upgrade an
> unspecified amount of their python packages, that does not sound very
> cool.

How about 2.7? Do you think it should also remain untouched?
I am ok for leaving 2.6 alone (that's Barry's call anyway) but 2.7 is
another matter - should people migrate to 3.x to get the security fix?

As for 3.2, it should certainly get the fix IMO. There are not many
Python 3 legacy applications relying on hash() stability, I think.

> Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes.

Ubuntu can decide *not* to ship the fix if they prefer it like that.
Their policies and decisions, though, should not taint ours.



From dmalcolm at  Thu Jan  5 20:33:24 2012
From: dmalcolm at (David Malcolm)
Date: Thu, 05 Jan 2012 14:33:24 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <1325792005.2123.11.camel@surprise>

On Thu, 2012-01-05 at 19:34 +0200, Maciej Fijalkowski wrote:
> On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou <solipsis at> wrote:
> > On Thu, 5 Jan 2012 15:26:27 +1100
> > Andrew Bennetts <andrew at> wrote:
> >>
> >> I don't think that's news either.
> >> and
> >> for
> >> instance show that in 2003 it was clearly known to at least be likely to be an
> >> exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
> >>
> >> There was debate about whether it's the language's responsibility to mitigate
> >> the problem or if apps should use safer designs for handling untrusted input
> >> (e.g. limit the number of keys input is allowed to create, or use something
> >> other than dicts), and debate about just how practical an effective exploit
> >> would be.  But I think it was understood to be a real concern 8 years ago, so
> >> not exactly sudden.
> >
> > That's not news indeed, but that doesn't make it less of a problem,
> > especially now that the issue has been widely publicized through a
> > conference and announcements on several widely-read Web sites.
> >
> > That said, only doing the security fix in 3.3 would have the nice side
> > effect of pushing people towards Python 3, so perhaps I'm for it after
> > all.
> >
> > Half-jokingly,
> >
> > Antoine.

> Just to make things clear - stdlib itself has 1/64 of tests relying on
> dict order. Changing dict order in *older* pythons will break
> everyone's tests and some peoples code. Making this new 2.6.x release
> would mean that people using new python 2.6 would have to upgrade an
> unspecified amount of their python packages, that does not sound very
> cool. Also consider that new 2.6.x would go as a security fix to old
> ubuntu, but all other packages won't, because they'll not contain
> security fixes. Just so you know

We have similar issues in RHEL, with the Python versions going much
further back (e.g. 2.3)

When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.   Any such logic here also suggests the need
for an attribute in the sys module so that you can verify the behavior.
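
A sketch of what such a check could look like from Python code. The names
used are the ones that CPython releases made after this thread ended up
shipping (the sys.flags.hash_randomization flag and the PYTHONHASHSEED
environment variable); they are not something proposed in this message,
and the helper name is hypothetical.

```python
import sys


def hash_randomization_status():
    """Return True/False if the interpreter reports the setting, else None."""
    flags = getattr(sys, "flags", None)
    # Interpreters without the feature have no such attribute at all.
    value = getattr(flags, "hash_randomization", None)
    return None if value is None else bool(value)
```

The getattr() lookups keep the check usable on interpreters that predate
the flag, where the function simply reports None.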

From tseaver at  Thu Jan  5 20:49:53 2012
From: tseaver at (Tres Seaver)
Date: Thu, 05 Jan 2012 14:49:53 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <je4ut2$9fv$>

On 01/05/2012 02:14 PM, Glenn Linderman wrote:
> 1) the security problem is not in CPython, but rather in web servers 
> that use dict inappropriately.

Most webapp vulnerabilities are due to their use of Python's cgi module,
which uses a dict to hold the form / query string data being supplied
by untrusted external users.

- -- 
Tres Seaver          +1 540-429-0999          tseaver at
Palladion Software   "Excellence by Design"


From paul at  Thu Jan  5 21:01:53 2012
From: paul at (Paul Smedley)
Date: Fri, 06 Jan 2012 06:31:53 +1030
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <>
References: <je3onm$57p$>
Message-ID: <je4vjj$dtf$>

Hi Amaury,

On 06/01/12 00:32, Amaury Forgeot d'Arc wrote:
> 2012/1/5 Paul Smedley <paul at>
>     Hi All,
>     I'm working on updating my port of Python 2.6.5 to v2.7.2 for the
>     OS/2 platform.
>     I have python.exe and python27.dll compiling fine, but when starting
>     to build sharedmods I'm getting the following error:
>     running build
>     running build_ext
>     Traceback (most recent call last):
>       File "./", line 2092, in <module>
>         main()
>       File "./", line 2087, in main
>     'Lib/']
>       File "U:/DEV/python-2.7.2/Lib/distutils/", line 152, in setup
>         dist.run_commands()
>       File "U:/DEV/python-2.7.2/Lib/distutils/", line 953, in
>     run_commands
>         self.run_command(cmd)
>       File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in
>     run_command
>       File "U:/DEV/python-2.7.2/Lib/distutils/command/", line
>     127, in run
>         self.run_command(cmd_name)
>       File "U:/DEV/python-2.7.2/Lib/distutils/", line 326, in
>     run_command
>         self.distribution.run_command(command)
>       File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in
>     run_command
>       File "U:/DEV/python-2.7.2/Lib/distutils/command/build_ext.py",
>     line 340, in run
>         self.build_extensions()
>       File "./", line 152, in build_extensions
>         missing = self.detect_modules()
>       File "./", line 1154, in detect_modules
>         for arg in sysconfig.get_config_var("CONFIG_ARGS").split()]
>     AttributeError: 'NoneType' object has no attribute 'split'
>     make: *** [sharedmods] Error 1
>     Any suggestions?  A google showed a similar error on AIX with no
>     clear resolution.
> Is it in the part that configures the "dbm" module?
> This paragraph is already protected by a "if platform not in ['cygwin']:",
> I suggest to exclude 'os2emx' as well.

It is - however, adding os2 to the list of platforms to
exclude gets me only a little further:

It then bombs with:
running build
running build_ext
Traceback (most recent call last):
   File "./", line 2092, in <module>
   File "./", line 2087, in main
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 152, in setup
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 953, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/command/", line 127, 
in run
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 326, in run_command
   File "U:/DEV/python-2.7.2/Lib/distutils/", line 972, in 
   File "U:/DEV/python-2.7.2/Lib/distutils/command/", line 
340, in run
   File "./", line 152, in build_extensions
     missing = self.detect_modules()
   File "./", line 1368, in detect_modules
     if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"):
TypeError: argument of type 'NoneType' is not iterable
make: *** [sharedmods] Error 1

Which again points to problems with sysconfig.get_config_var("CONFIG_ARGS"):
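
Both tracebacks come from calling a str method or operator on the None
that get_config_var() returns when a name is absent from the build
configuration. A possible workaround, sketched under the assumption that
an empty argument list is an acceptable default on this platform; the
helper name config_args is hypothetical, and this does not address why
CONFIG_ARGS is missing in the first place.

```python
import sysconfig


def config_args():
    """Return CONFIG_ARGS as a string, or '' when the variable is unset."""
    # get_config_var() returns None for unknown names, so substitute an
    # empty string before any .split() or "in" test is applied to it.
    return sysconfig.get_config_var("CONFIG_ARGS") or ""


args = config_args().split()
uses_system_expat = "--with-system-expat" in config_args()
```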



From v+python at  Thu Jan  5 21:19:25 2012
From: v+python at (Glenn Linderman)
Date: Thu, 05 Jan 2012 12:19:25 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <je4ut2$9fv$>
References: <>
	<> <>
	<> <je4ut2$9fv$>
Message-ID: <>

On 1/5/2012 11:49 AM, Tres Seaver wrote:
> On 01/05/2012 02:14 PM, Glenn Linderman wrote:
>> 1) the security problem is not in CPython, but rather in web servers
>> that use dict inappropriately.
> Most webapp vulnerabilities are due to their use of Python's cgi module,
> which uses a dict to hold the form / query string data being supplied
> by untrusted external users.

Yes, I understand that (and have some such web apps in production).

In fact, I pointed out urllib.parse and cgi as specific modules for 
which a proposed fix could be made without impacting the Python hash 

From ethan at  Thu Jan  5 21:10:35 2012
From: ethan at (Ethan Furman)
Date: Thu, 05 Jan 2012 12:10:35 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <je4ut2$9fv$>
References: <>	<>
	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Tres Seaver wrote:
> On 01/05/2012 02:14 PM, Glenn Linderman wrote:
>> 1) the security problem is not in CPython, but rather in web servers 
>> that use dict inappropriately.
> Most webapp vulnerabilities are due to their use of Python's cgi module,
> which uses a dict to hold the form / query string data being supplied
> by untrusted external users.

And Glenn suggested further down that an appropriate course of action 
would be to fix the cgi module (and others) instead of messing with dict.


From p.f.moore at  Thu Jan  5 21:35:57 2012
From: p.f.moore at (Paul Moore)
Date: Thu, 5 Jan 2012 20:35:57 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <1325792005.2123.11.camel@surprise>
References: <>
	<> <>
Message-ID: <>

On 5 January 2012 19:33, David Malcolm <dmalcolm at> wrote:
> We have similar issues in RHEL, with the Python versions going much
> further back (e.g. 2.3)
> When backporting the fix to ancient python versions, I'm inclined to
> turn the change *off* by default, requiring the change to be enabled via
> an environment variable: I want to avoid breaking existing code, even if
> such code is technically relying on non-guaranteed behavior.  But we
> could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
> That way /usr/bin/python would default to the old behavior, but web apps
> would have some protection.   Any such logic here also suggests the need
> for an attribute in the sys module so that you can verify the behavior.

Uh, surely no-one is suggesting backporting to "ancient" versions? I
couldn't find the statement quickly on the website (so this
is via google), but isn't it true that 2.6 is in security-only mode
and 2.5 and earlier will never get the fix? Having a source-only
release for 2.6 means the fix is "off by default" in the sense that
you can choose not to build it. Or add a #ifdef to the source if it
really matters.

Personally, I find it hard to see this as a Python security hole, but
I can sympathise with the idea that it would be nice to make dict
"safer by default". (Although the benefit for me personally would be
zero, so I'm reluctant for the change to have a detectable cost...)

My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no
bells and whistles to switch it off or the like. If it's not suitable
to go in on that basis, restrict it to 3.3+ (where it's certainly OK)
and advise users of earlier versions to either upgrade or code
defensively to avoid hitting the pathological case. Surely that sort
of defensive code should be second nature to the people who might be
affected by the issue?


From barry at  Thu Jan  5 21:45:58 2012
From: barry at (Barry Warsaw)
Date: Thu, 5 Jan 2012 15:45:58 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <1325792005.2123.11.camel@surprise>
References: <>
	<> <>
Message-ID: <>

On Jan 05, 2012, at 02:33 PM, David Malcolm wrote:

>We have similar issues in RHEL, with the Python versions going much
>further back (e.g. 2.3)
>When backporting the fix to ancient python versions, I'm inclined to
>turn the change *off* by default, requiring the change to be enabled via
>an environment variable: I want to avoid breaking existing code, even if
>such code is technically relying on non-guaranteed behavior.  But we
>could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
>That way /usr/bin/python would default to the old behavior, but web apps
>would have some protection.

This sounds like a reasonable compromise for all stable Python releases.  It
can be turned on by default for Python 3.3.  If you also make the default
setting easy to change (i.e. parameterized in one place), then distros can
make their own decision about the default, although I'd argue for the above
default approach for Debian/Ubuntu.

>Any such logic here also suggests the need for an attribute in the sys module
>so that you can verify the behavior.

That would be read-only though, right?


From barry at  Thu Jan  5 21:50:34 2012
From: barry at (Barry Warsaw)
Date: Thu, 5 Jan 2012 15:50:34 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Jan 05, 2012, at 08:35 PM, Paul Moore wrote:

>Uh, surely no-one is suggesting backporting to "ancient" versions? I
>couldn't find the statement quickly on the website (so this
>is via google), but isn't it true that 2.6 is in security-only mode
>and 2.5 and earlier will never get the fix? Having a source-only
>release for 2.6 means the fix is "off by default" in the sense that
>you can choose not to build it. Or add a #ifdef to the source if it
>really matters.

Correct, although there's no reason why a patch for versions older than 2.6
couldn't be included on a security page for reference in CVE or
other security notifications.  Distros that care about versions older than
Python 2.6 will basically be back-porting the patch anyway.

>My feeling is that it should go into 2.7, 3.2, and 3.3+, but with no
>bells and whistles to switch it off or the like.

I like David Malcolm's suggestion, but I have no problem applying it to 3.3,
enabled by default with no way to turn it off.  The off-by-default on-switch
policy for stable releases would be justified by maximum backward
compatibility conservativeness.


From a.badger at  Thu Jan  5 21:51:50 2012
From: a.badger at (Toshio Kuratomi)
Date: Thu, 5 Jan 2012 12:51:50 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <> <>
Message-ID: <20120105205150.GM5336@unaka.lan>

On Thu, Jan 05, 2012 at 08:35:57PM +0000, Paul Moore wrote:
> On 5 January 2012 19:33, David Malcolm <dmalcolm at> wrote:
> > We have similar issues in RHEL, with the Python versions going much
> > further back (e.g. 2.3)
> >
> > When backporting the fix to ancient python versions, I'm inclined to
> > turn the change *off* by default, requiring the change to be enabled via
> > an environment variable: I want to avoid breaking existing code, even if
> > such code is technically relying on non-guaranteed behavior.  But we
> > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
> > That way /usr/bin/python would default to the old behavior, but web apps
> > would have some protection.   Any such logic here also suggests the need
> > for an attribute in the sys module so that you can verify the behavior.
> Uh, surely no-one is suggesting backporting to "ancient" versions? I
> couldn't find the statement quickly on the website (so this
> is via google), but isn't it true that 2.6 is in security-only mode
> and 2.5 and earlier will never get the fix?
I think when dmalcolm says "backporting" he means that he'll have to
backport the fix from modern, python to the ancient
pythons that he's supporting as part of the Linux distributions where he's
the python package maintainer.

I think he's mentioning it here mainly to see whether anyone can point out
a reason not to diverge from upstream in that manner for those
distributions.

> Having a source-only
> release for 2.6 means the fix is "off by default" in the sense that
> you can choose not to build it. Or add a #ifdef to the source if it
> really matters.
I don't think that this would satisfy dmalcolm's needs.  What he's talking
about sounds more like a runtime switch (possibly only when initializing,
though, not on-the-fly).


From dmalcolm at  Thu Jan  5 21:52:15 2012
From: dmalcolm at (David Malcolm)
Date: Thu, 05 Jan 2012 15:52:15 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <1325796736.2123.16.camel@surprise>

On Thu, 2012-01-05 at 20:35 +0000, Paul Moore wrote:
> On 5 January 2012 19:33, David Malcolm <dmalcolm at> wrote:
> > We have similar issues in RHEL, with the Python versions going much
> > further back (e.g. 2.3)
> >
> > When backporting the fix to ancient python versions, I'm inclined to
> > turn the change *off* by default, requiring the change to be enabled via
> > an environment variable: I want to avoid breaking existing code, even if
> > such code is technically relying on non-guaranteed behavior.  But we
> > could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
> > That way /usr/bin/python would default to the old behavior, but web apps
> > would have some protection.   Any such logic here also suggests the need
> > for an attribute in the sys module so that you can verify the behavior.
> Uh, surely no-one is suggesting backporting to "ancient" versions? I
> couldn't find the statement quickly on the website (so this
> is via google), but isn't it true that 2.6 is in security-only mode
> and 2.5 and earlier will never get the fix? Having a source-only
> release for 2.6 means the fix is "off by default" in the sense that
> you can choose not to build it. Or add a #ifdef to the source if it
> really matters.
Sorry if I was unclear.  I don't expect python-dev to do this
backporting, but those of us who do maintain such ancient pythons via
Linux distributions may want to do the backport for our users.  My email
was to note that it may make sense to pick more conservative defaults
for such a scenario, as compared to 2.6 onwards.


Hope this is helpful

From g.brandl at  Thu Jan  5 21:52:40 2012
From: g.brandl at (Georg Brandl)
Date: Thu, 05 Jan 2012 21:52:40 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <je52in$3lg$>

On 01/05/2012 09:45 PM, Barry Warsaw wrote:
> On Jan 05, 2012, at 02:33 PM, David Malcolm wrote:
>>We have similar issues in RHEL, with the Python versions going much
>>further back (e.g. 2.3)
>>When backporting the fix to ancient python versions, I'm inclined to
>>turn the change *off* by default, requiring the change to be enabled via
>>an environment variable: I want to avoid breaking existing code, even if
>>such code is technically relying on non-guaranteed behavior.  But we
>>could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
>>That way /usr/bin/python would default to the old behavior, but web apps
>>would have some protection.
> This sounds like a reasonable compromise for all stable Python releases.  It
> can be turned on by default for Python 3.3.  If you also make the default
> setting easy to change (i.e. parameterized in one place), then distros can
> make their own decision about the default, although I'd argue for the above
> default approach for Debian/Ubuntu.



From lists at  Thu Jan  5 22:40:58 2012
From: lists at (Christian Heimes)
Date: Thu, 05 Jan 2012 22:40:58 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Am 05.01.2012 21:45, schrieb Barry Warsaw:
> This sounds like a reasonable compromise for all stable Python releases.  It
> can be turned on by default for Python 3.3.  If you also make the default
> setting easy to change (i.e. parameterized in one place), then distros can
> make their own decision about the default, although I'd argue for the above
> default approach for Debian/Ubuntu.

Hey Barry, stop stealing my ideas! :) I've argued for these default
settings for days.

ver	delivery	randomized hashing
2.3	patch		disabled by default
2.4	patch		disabled
2.5	patch		disabled
2.6	release		disabled
2.7	release		disabled
3.0	ignore?		disabled
3.1	release		disabled
3.2	release		disabled
3.3	n/a yet		enabled by default

2.3 to 2.5 are still used in production (RHEL, Ubuntu LTS). Guido has
stated that he needs a patch for 2.5, too. I think we may safely ignore
Python 3.0. Nobody should use Python 3.0 on a production system.

I've suggested the env var PYRANDOMHASH. It's easy to set env vars in
Apache. For example Debian/Ubuntu has /etc/apache2/envvars.

Settings for PYRANDOMHASH:

 PYRANDOMHASH=1
   enable randomized hashing function

 PYRANDOMHASH=/path/to/seed
   enable randomized hashing function and read seed from 'seed'

 PYRANDOMHASH=0
   disable randomized hashing function
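
A minimal sketch of how the three PYRANDOMHASH settings described above might be interpreted at startup. Everything here is an assumption for illustration: the helper names are hypothetical, the values "1" and "0" are taken to mean "on" and "off", any other value is treated as a path to a seed file, and an unset variable means "disabled" (the proposed default for stable releases). It is not code from any patch in this thread.

```python
import os


def parse_pyrandomhash(value):
    """Interpret a PYRANDOMHASH-style value (hypothetical helper).

    Returns a tuple (enabled, seed_path).
    """
    if value is None or value == "" or value == "0":
        # Unset or "0": keep the old, non-randomized hash.
        return (False, None)
    if value == "1":
        # "1": randomize, seed chosen by the interpreter.
        return (True, None)
    # Anything else is assumed to be a path to a file holding the seed.
    return (True, value)


def settings_from_environ(environ=None):
    """Read the setting from an environment mapping (os.environ by default)."""
    if environ is None:
        environ = os.environ
    return parse_pyrandomhash(environ.get("PYRANDOMHASH"))
```

With this shape, a web server front end such as mod_wsgi could pass its own environment mapping with the variable forced to "1", while /usr/bin/python keeps the old behaviour.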

There isn't an easy way to set env vars in a shebang line, since
something like

  #!/usr/bin/env PYRANDOMHASH=1 python2.7

doesn't work, so we could come up with a solution for the shebang.

IMHO the default setting should be a compile time
option. It's reasonably easy to extend the configure script to support
--enable-randomhash / --disable-randomhash. The MS VC build scripts can
grow a flag, too.

I still think that the topic needs a PEP. A couple of days ago I started
with a PEP. But Guido told me that he doesn't see a point in a PEP
because he prefers a small and quick solution, so I stopped working on
it. However the arguments, worries and ideas in this enormous topic have
repeated over and over. We know from experience that a PEP is a great
way to explain the how, what and why of the change as well as the paths
we didn't take.


From neologix at  Thu Jan  5 22:44:26 2012
From: neologix at (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Thu, 5 Jan 2012 22:44:26 +0100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
Message-ID: <>


Issue #13697 deals with a problem
with the Python version of threading.RLock (a signal handler which
tries to acquire the same RLock is called right at the wrong time)
which doesn't affect the C version.
Whether such a use case can be considered good practise or the best
way to fix this is not settled yet, but the question that arose to me
is: "why do we have both a C and Python version?".
Here's Antoine's answer (he suggested that I bring this up on python-dev):
"""
The C version is quite recent, and there's a school of thought that we
should always provide fallback Python implementations.
(also, arguably a Python implementation makes things easier to
prototype, although I don't think it's the case for an RLock)
"""

So, what do you guys think?
Would it be okay to nuke the Python version?
Do you have more details on this "school of thought"?

Also, while we're at it, Victor created #13550 to try to rewrite the
"logging hack" of the threading module: there again, I think we could
just remove this logging altogether. What do you think?



From lists at  Thu Jan  5 22:46:06 2012
From: lists at (Christian Heimes)
Date: Thu, 05 Jan 2012 22:46:06 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>
	<>	<>	<>	<>	<>	<>	<>	<>
	<je4ut2$9fv$> <>
Message-ID: <>

Am 05.01.2012 21:10, schrieb Ethan Furman:
> Tres Seaver wrote:
>> Hash: SHA1
>> On 01/05/2012 02:14 PM, Glenn Linderman wrote:
>>> 1) the security problem is not in CPython, but rather in web servers
>>> that use dict inappropriately.
>> Most webapp vulnerabilities are due to their use of Python's cgi module,
>> which it uses a dict to hold the form / query string data being supplied
>> by untrusted external users.
> And Glenn suggested further down that an appropriate course of action
> would be to fix the cgi module (and others) instead of messing with dict.

You'd have to fix any Python core module that may handle data from
untrusted sources. The issue isn't limited to web apps and POST
requests. It's possible to trigger the DoS from JSON, a malicious PDF,
JPEG's EXIF metadata or any other data.

Oh, and somebody has to fix all 3rd party modules, too.
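
To illustrate why any code path that builds a dict from untrusted keys is affected, here is a small sketch that counts equality comparisons during dict insertion. The class and function names are hypothetical, and the colliding keys are simulated by forcing the hash value rather than by crafting real colliding strings.

```python
class CountingKey:
    """Key with a caller-chosen hash that counts equality comparisons."""

    comparisons = 0

    def __init__(self, name, forced_hash):
        self.name = name
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.name == other.name


def comparisons_for(n, colliding):
    """Insert n keys into a dict and return how many __eq__ calls it took."""
    CountingKey.comparisons = 0
    table = {}
    for i in range(n):
        # Colliding keys all share hash 0; otherwise each key gets its own hash.
        table[CountingKey(i, 0 if colliding else i)] = None
    return CountingKey.comparisons


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, comparisons_for(n, False), comparisons_for(n, True))
```

Doubling the number of colliding keys roughly quadruples the work, which is the quadratic blow-up an attacker exploits; the source of the keys (form data, JSON, metadata) does not matter.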


From solipsis at  Thu Jan  5 22:59:59 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 5 Jan 2012 22:59:59 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
	<> <>
Message-ID: <>

On Thu, 05 Jan 2012 22:40:58 +0100
Christian Heimes <lists at> wrote:
> Am 05.01.2012 21:45, schrieb Barry Warsaw:
> > This sounds like a reasonable compromise for all stable Python releases.  It
> > can be turned on by default for Python 3.3.  If you also make the default
> > setting easy to change (i.e. parameterized in one place), then distros can
> > make their own decision about the default, although I'd argue for the above
> > default approach for Debian/Ubuntu.
> Hey Barry, stop stealing my ideas! :) I've argued for these default
> settings for days.
> ver	delivery	randomized hashing
> ==========================================
> 2.3	patch		disabled by default
> 2.4	patch		disabled
> 2.5	patch		disabled
> 2.6	release		disabled
> 2.7	release		disabled
> 3.0	ignore?		disabled
> 3.1	release		disabled
> 3.2	release		disabled
> 3.3	n/a yet		enabled by default

I don't think we (python-dev) are really concerned with 2.3, 2.4,
2.5 and 3.0.  They're all unsupported, and people do what they want
with their local source trees.



From ericsnowcurrently at  Thu Jan  5 23:02:42 2012
From: ericsnowcurrently at (Eric Snow)
Date: Thu, 5 Jan 2012 15:02:42 -0700
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/5 Charles-François Natali <neologix at>:
> Hi,
> Issue #13697 ( deals with a problem
> with the Python version of threading.RLock (a signal handler which
> tries to acquire the same RLock is called right at the wrong time)
> which doesn't affect the C version.
> Whether such a use case can be considered good practise or the best
> way to fix this is not settled yet, but the question that arose to me
> is: "why do we have both a C and Python version?".
> Here's Antoine answer (he suggested to me to bring this up on python-dev":
> """
> The C version is quite recent, and there's a school of thought that we
> should always provide fallback Python implementations.
> (also, arguably a Python implementation makes things easier to
> prototype, although I don't think it's the case for an RLock)
> """
> So, what do you guys think?
> Would it be okay to nuke the Python version?
> Do you have more details on this "school of thought"?

From what I understand, the biggest motivation for pure Python
versions is cooperation with the other Python implementations.  See


> Also, while we're at it, Victor created #13550 to try to rewrite the
> "logging hack" of the threading module: there again, I think we could
> just remove this logging altogether. What do you think?
> Cheers,
> cf

From lists at  Thu Jan  5 23:11:41 2012
From: lists at (Christian Heimes)
Date: Thu, 05 Jan 2012 23:11:41 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Am 05.01.2012 22:59, schrieb Antoine Pitrou:
> I don't think we (python-dev) are really concerned with 2.3, 2.4,
> 2.5 and 3.0.  They're all unsupported, and people do what they want
> with their local source trees.

Let me reply with a quote from Barry:

> Correct, although there's no reason why a patch for versions
> older than 2.6 couldn't be included on a security
> page for reference in CVE or other security notifications.
> Distros that care about versions older than Python 2.6 will
> basically be back-porting the patch anyway.


From storchaka at  Thu Jan  5 23:15:31 2012
From: storchaka at (Serhiy Storchaka)
Date: Fri, 06 Jan 2012 00:15:31 +0200
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <je57ed$6nq$>

05.01.12 21:14, Glenn Linderman wrote:
> So, fixing the vulnerable packages could be a sufficient response,
> rather than changing the hash function.  How to fix?  Each of those
> above allocates and returns a dict.  Simply have each of those allocate
> and return and wrapped dict, which has the following behaviors:
> i) during __init__, create a local, random, string.
> ii) for all key values, prepend the string, before passing it to the
> internal dict.

Good idea.
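
A minimal sketch of the wrapped dict Glenn describes, assuming string keys only. The class name SaltedDict and the 8-byte salt length are illustrative choices, not taken from the scrubbed attachment or from any patch; whether a random prefix alone is enough against a determined attacker is not settled in this thread.

```python
import binascii
import os
from collections.abc import MutableMapping


class SaltedDict(MutableMapping):
    """Mapping that prepends a per-instance random string to every key."""

    def __init__(self, *args, **kwargs):
        # i) during __init__, create a local, random string
        self._salt = binascii.hexlify(os.urandom(8)).decode("ascii")
        self._data = {}
        self.update(*args, **kwargs)

    def _salted(self, key):
        if not isinstance(key, str):
            raise TypeError("SaltedDict keys must be str")
        # ii) prepend the string before passing the key to the internal dict
        return self._salt + key

    def __getitem__(self, key):
        return self._data[self._salted(key)]

    def __setitem__(self, key, value):
        self._data[self._salted(key)] = value

    def __delitem__(self, key):
        del self._data[self._salted(key)]

    def __iter__(self):
        # Strip the salt again so callers only ever see their own keys.
        prefix_len = len(self._salt)
        return (key[prefix_len:] for key in self._data)

    def __len__(self):
        return len(self._data)
```

Note that iteration order still follows the internal dict, so code that depends on a particular ordering of keys would see a change here as well.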
-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-python
Size: 1923 bytes
Desc: not available
URL: <>

From solipsis at  Thu Jan  5 23:17:18 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 5 Jan 2012 23:17:18 +0100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
References: <>
Message-ID: <>

On Thu, 5 Jan 2012 15:02:42 -0700
Eric Snow <ericsnowcurrently at> wrote:

> 2012/1/5 Charles-François Natali <neologix at>:
> > Hi,
> >
> > Issue #13697 ( deals with a problem
> > with the Python version of threading.RLock (a signal handler which
> > tries to acquire the same RLock is called right at the wrong time)
> > which doesn't affect the C version.
> > Whether such a use case can be considered good practise or the best
> > way to fix this is not settled yet, but the question that arose to me
> > is: "why do we have both a C and Python version?".
> > Here's Antoine answer (he suggested to me to bring this up on python-dev":
> > """
> > The C version is quite recent, and there's a school of thought that we
> > should always provide fallback Python implementations.
> > (also, arguably a Python implementation makes things easier to
> > prototype, although I don't think it's the case for an RLock)
> > """
> >
> > So, what do you guys think?
> > Would it be okay to nuke the Python version?
> > Do you have more details on this "school of thought"?
> > From what I understand, the biggest motivation for pure Python
> versions is cooperation with the other Python implementations.  See

Apologies, I didn't remember it was written down in a PEP.
A bit more than a school of thought, then :-)



From tjreedy at  Fri Jan  6 00:55:58 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 05 Jan 2012 18:55:58 -0500
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je4vjj$dtf$>
References: <je3onm$57p$>
Message-ID: <je5dal$co1$>

On 1/5/2012 3:01 PM, Paul Smedley wrote:

>> File "./", line 1154, in detect_modules
>> for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()]
>> AttributeError: 'NoneType' object has no attribute 'split'
>> make: *** [sharedmods] Error 1

> File "./", line 1368, in detect_modules
> if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"):
> TypeError: argument of type 'NoneType' is not iterable
> make: *** [sharedmods] Error 1
> Which again points to problems with
> sysconfig.get_config_var("CONFIG_ARGS"):

[The earlier call was with "__CONFIG_ARGS", for whatever difference that 
makes.] It appears to be returning None instead of [] (or a populated list).

In 3.2.2, at line 579 of is
def get_config_var(name):
    return get_config_vars().get(name)

That defaults to None if name is not a key in the dict returned by 
get_config_vars(). My guess is that it always is, and that the value 
is always a list for tested win/*nix/mac systems. So either has 
the bug of assuming that there is always a list value for "CONFIG_ARGS" 
or has the bug of not setting it for os2, perhaps because 
of a bug elsewhere.
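
As a sketch of the defensive alternative, a caller could treat a missing "CONFIG_ARGS" as an empty argument list instead of calling .split() on None. The helper names below are hypothetical, and this is not the change that was actually made to the build scripts.

```python
import sysconfig


def split_config_args(value):
    """Split a CONFIG_ARGS-style string, treating None or '' as no arguments."""
    return value.split() if value else []


def config_args():
    """Return the configure arguments as a list, even where the variable is unset."""
    return split_config_args(sysconfig.get_config_var("CONFIG_ARGS"))


if __name__ == "__main__":
    # Membership tests no longer fail when the variable is missing.
    print("--with-system-expat" in config_args())
```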

At line 440 of is
def get_config_vars(*args):
     global _CONFIG_VARS
     if _CONFIG_VARS is None:
          _CONFIG_VARS = {}
          <code to populate _CONFIG_VARS, including>
          if os.name in ('nt', 'os2'):
               _init_non_posix(_CONFIG_VARS)
     if args:
          vals = []
          for name in args:
               vals.append(_CONFIG_VARS.get(name))
          return vals
     else:
          return _CONFIG_VARS

At 456 is
def _init_non_posix(vars):
     """Initialize the module as appropriate for NT"""
     # set basic install directories

"CONFIG_ARGS" is not set explicitly for any system anywhere in the file, 
so I do not know how the call ever works.

Terry Jan Reedy

From ncoghlan at  Fri Jan  6 01:10:52 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 6 Jan 2012 10:10:52 +1000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <je57ed$6nq$>
References: <>
	<> <>
	<> <je57ed$6nq$>
Message-ID: <>

On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka <storchaka at> wrote:
> 05.01.12 21:14, Glenn Linderman wrote:
>> So, fixing the vulnerable packages could be a sufficient response,
>> rather than changing the hash function.  How to fix?  Each of those
>> above allocates and returns a dict.  Simply have each of those allocate
>> and return and wrapped dict, which has the following behaviors:
>> i) during __init__, create a local, random, string.
>> ii) for all key values, prepend the string, before passing it to the
>> internal dict.
> Good idea.

Not a good idea - a lot of the 3rd party tests that depend on dict
ordering are going to be using those modules anyway, so scattering our
solution across half the standard library is needlessly creating
additional work without really reducing the incompatibility problem.
If we're going to change anything, it may as well be the string
hashing algorithm itself.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Fri Jan  6 01:11:22 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 05 Jan 2012 19:11:22 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>
	<>	<>	<>	<>	<>	<>	<>	<>
	<je4ut2$9fv$> <>
Message-ID: <je5e7g$hvg$>

On 1/5/2012 3:10 PM, Ethan Furman wrote:
> Tres Seaver wrote:

>>> 1) the security problem is not in CPython, but rather in web servers
>>> that use dict inappropriately.
>> Most webapp vulnerabilities are due to their use of Python's cgi module,
>> which it uses a dict to hold the form / query string data being supplied
>> by untrusted external users.
> And Glenn suggested further down that an appropriate course of action
> would be to fix the cgi module (and others) instead of messing with dict.

I think both should be done. For web applications, it would be best to 
reject DoS attempts with 'random' keys in O(1) time rather than in O(n) 
time even with improved hash. But for some other apps, like the Python 
interpreter itself, 'random' names may be quite normal.
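
One way a web-facing parser might reject oversized input early is sketched below. The function name, the exception class and both limits are assumptions chosen for illustration; only the length check is O(1), while the field count is a cheap scan done before any dict is built.

```python
from urllib.parse import parse_qsl


class TooManyFields(ValueError):
    """Raised when form data exceeds the configured limits."""


def parse_form(query_string, max_bytes=64 * 1024, max_fields=1000):
    """Parse a query string into {name: [values]} with upfront limits."""
    # O(1): refuse absurdly large payloads before looking at their content.
    if len(query_string) > max_bytes:
        raise TooManyFields("form data too large")
    # Cheap scan: count separators before any key is hashed.
    fields = 1 + query_string.count("&") + query_string.count(";")
    if fields > max_fields:
        raise TooManyFields("too many form fields")
    result = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        result.setdefault(name, []).append(value)
    return result
```

A limit like this bounds the damage for web input, but it does not help applications where a large number of arbitrary keys is legitimate.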

Terry Jan Reedy

From steve at  Fri Jan  6 01:07:27 2012
From: steve at (Steven D'Aprano)
Date: Fri, 06 Jan 2012 11:07:27 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <1325792005.2123.11.camel@surprise>
References: <>	<>
	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

David Malcolm wrote:

> When backporting the fix to ancient python versions, I'm inclined to
> turn the change *off* by default, requiring the change to be enabled via
> an environment variable: I want to avoid breaking existing code, even if
> such code is technically relying on non-guaranteed behavior.  But we
> could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
> That way /usr/bin/python would default to the old behavior, but web apps
> would have some protection.   Any such logic here also suggests the need
> for an attribute in the sys module so that you can verify the behavior.

Surely the way to verify the behaviour is to run this from the shell:

python -c 'print(hash("abcde"))'

twice, and see that the calls return different values. (Or have I 
misunderstood the way the fix is going to work?)

In any case, I wouldn't want to rely on the presence of a flag in the sys 
module to verify the behaviour, I'd want to see for myself that hash 
collisions are no longer predictable.


From barry at  Fri Jan  6 01:31:28 2012
From: barry at (Barry Warsaw)
Date: Thu, 5 Jan 2012 19:31:28 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Jan 05, 2012, at 10:40 PM, Christian Heimes wrote:

>Hey Barry, stop stealing my ideas! :) I've argued for these default
>settings for days.


>I've suggested the env var PYRANDOMHASH. It's easy to set env vars in
>Apache. For example Debian/Ubuntu has /etc/apache2/envvars.

For consistency, it really should be PYTHONSOMETHING.  I personally don't care
how long it is (e.g. PYTHONIOENCODING).

>Settings for PYRANDOMHASH:
>   enable randomized hashing function
> PYRANDOMHASH=/path/to/seed
>   enable randomized hashing function and read seed from 'seed'
>   disable randomed hashing function

Why not PYTHONHASHSEED then?


>Since there isn't an easy way to set env vars in a shebang line since
>something like
>  #!/usr/bin/env PYRANDOMHASH=1 python2.7
>doesn't work, we could come up with a solution the shebang.

We have precedent for mirroring startup options and env vars, so it doesn't
bother me to add such a switch to Python 3.3.  It *does* bother me to add a
switch to any stable release.

>IMHO the setting for the default setting should be a compile time
>option. It's reasonable easy to extend the configure script to support
>--enable-randomhash / --disable-randomhash. The MS VC build scripts can
>grow a flag, too.
>I still think that the topic needs a PEP. A couple of days ago I started
>with a PEP. But Guido told me that he doesn't see a point in a PEP
>because he prefers a small and quick solution, so I stopped working on
>it. However the arguments, worries and ideas in this enormous topic have
>repeated over and over. We know from experience that a PEP is a great
>way to explain the how, what and why of the change as well as the paths
>we didn't take.

One way to look at it is to have a quick-and-dirty solution for stable
releases.  It could be suboptimal from a ui point of view because of backward
compatibility issues.  The PEP could then outline the boffo perfect solution
for Python 3.3, with a section on how it will be backported to stable
releases.


From ncoghlan at  Fri Jan  6 01:34:55 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 6 Jan 2012 10:34:55 +1000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at> wrote:
> Surely the way to verify the behaviour is to run this from the shell:
> python -c print(hash("abcde"))
> twice, and see that the calls return different values. (Or have I
> misunderstood the way the fix is going to work?)
> In any case, I wouldn't want to rely on the presence of a flag in the sys
> module to verify the behaviour, I'd want to see for myself that hash
> collisions are no longer predictable.

More directly, you can just check that the hash of the empty string is non-zero.

So -1 for a flag in the sys module - "hash('') != 0" should serve as a
sufficient check whether or not process-level string hash
randomisation is in effect.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From victor.stinner at  Fri Jan  6 01:46:58 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 6 Jan 2012 01:46:58 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

2012/1/6 Barry Warsaw <barry at>:
>>Settings for PYRANDOMHASH:
>>   enable randomized hashing function
>> PYRANDOMHASH=/path/to/seed
>>   enable randomized hashing function and read seed from 'seed'
>>   disable randomed hashing function
> Why not PYTHONHASHSEED then?

Have you seen my patch attached to issue #13703? I prepared the code so
that the hash seed can be set easily (it has an LCG whose seed can be
provided by the user directly). I agree that the value 0 should give
the same behaviour as the current hash (i.e. disable the randomized hash).
I will add the variable in the next version of my patch.
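
A rough sketch of the idea of deriving the hash secret from a user-supplied seed with an LCG, with seed 0 meaning "no randomization". The function names, the 16-byte secret size and the LCG constants (a commonly used multiplier/increment pair) are assumptions for illustration and are not claimed to match the patch on the tracker.

```python
def lcg_bytes(seed, count):
    """Generate count pseudo-random bytes from a 32-bit LCG (illustrative)."""
    x = seed & 0xFFFFFFFF
    out = bytearray()
    for _ in range(count):
        # Classic linear congruential step, modulo 2**32.
        x = (x * 214013 + 2531011) & 0xFFFFFFFF
        # Use the higher bits, which are better distributed than the low ones.
        out.append((x >> 16) & 0xFF)
    return bytes(out)


def derive_hash_secret(seed, size=16):
    """Return the secret bytes for a given seed; seed 0 disables randomization."""
    if seed == 0:
        # An all-zero secret leaves the hash function behaving as before.
        return bytes(size)
    return lcg_bytes(seed, size)
```

The point of a deterministic generator here is reproducibility: the same seed always yields the same secret, so a test suite or a cluster of cooperating processes can pin the behaviour.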

From lists at  Fri Jan  6 01:50:00 2012
From: lists at (Christian Heimes)
Date: Fri, 06 Jan 2012 01:50:00 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Am 06.01.2012 01:34, schrieb Nick Coghlan:
> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at> wrote:
>> Surely the way to verify the behaviour is to run this from the shell:
>> python -c print(hash("abcde"))
>> twice, and see that the calls return different values. (Or have I
>> misunderstood the way the fix is going to work?)
>> In any case, I wouldn't want to rely on the presence of a flag in the sys
>> module to verify the behaviour, I'd want to see for myself that hash
>> collisions are no longer predictable.
> More directly, you can just check that the hash of the empty string is non-zero.
> So -1 for a flag in the sys module - "hash('') != 0" should serve as a
> sufficient check whether or not process-level string hash
> randomisation is in effect.

This might not work as we have to special case empty strings and perhaps
\0 strings, too. Otherwise we would give away the random seed to an
attacker if an attacker can somehow get hold of hash('') or hash(n * '\0').
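
The leak can be seen in a simplified model of a prefix/suffix randomized string hash. This is a toy sketch: the function name, the 64-bit mask and the prefix/suffix parameters are assumptions, and only the multiply-and-xor loop with the multiplier 1000003 follows the shape of CPython's classic string hash.

```python
MASK = (1 << 64) - 1


def toy_randomized_hash(data, prefix, suffix, special_case_empty=True):
    """Toy model of a seeded string hash over a bytes object."""
    if not data and special_case_empty:
        # Special case: never mix the secret into the hash of the empty string.
        return 0
    x = prefix
    if data:
        x ^= data[0] << 7
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK
    x ^= len(data)
    x ^= suffix
    return x & MASK
```

Without the special case, the hash of the empty string is simply prefix ^ suffix, and the hash of a single NUL byte is (1000003 * prefix) ^ 1 ^ suffix, so an attacker who learns both values has two equations relating the two secret values.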


From benjamin at  Fri Jan  6 01:59:49 2012
From: benjamin at (Benjamin Peterson)
Date: Thu, 5 Jan 2012 18:59:49 -0600
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

2012/1/5 Nick Coghlan <ncoghlan at>:
> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at> wrote:
>> Surely the way to verify the behaviour is to run this from the shell:
>> python -c print(hash("abcde"))
>> twice, and see that the calls return different values. (Or have I
>> misunderstood the way the fix is going to work?)
>> In any case, I wouldn't want to rely on the presence of a flag in the sys
>> module to verify the behaviour, I'd want to see for myself that hash
>> collisions are no longer predictable.
> More directly, you can just check that the hash of the empty string is non-zero.
> So -1 for a flag in the sys module - "hash('') != 0" should serve as a
> sufficient check whether or not process-level string hash
> randomisation is in effect.

What exactly is the disadvantage of a sys attribute? That would seem
preferable to an obscure incantation like that.


From solipsis at  Fri Jan  6 01:59:10 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 6 Jan 2012 01:59:10 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
References: <>
	<> <>
Message-ID: <>

On Fri, 06 Jan 2012 01:50:00 +0100
Christian Heimes <lists at> wrote:
> Am 06.01.2012 01:34, schrieb Nick Coghlan:
> > On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at> wrote:
> >> Surely the way to verify the behaviour is to run this from the shell:
> >>
> >> python -c print(hash("abcde"))
> >>
> >> twice, and see that the calls return different values. (Or have I
> >> misunderstood the way the fix is going to work?)
> >>
> >> In any case, I wouldn't want to rely on the presence of a flag in the sys
> >> module to verify the behaviour, I'd want to see for myself that hash
> >> collisions are no longer predictable.
> > 
> > More directly, you can just check that the hash of the empty string is non-zero.
> > 
> > So -1 for a flag in the sys module - "hash('') != 0" should serve as a
> > sufficient check whether or not process-level string hash
> > randomisation is in effect.
> This might not work as we have to special case empty strings and perhaps
> \0 strings, too.

The special case value doesn't have to be zero. Make it age(Barry) for
example (which, I think, is still representable in a 32-bit integer!).



From ncoghlan at  Fri Jan  6 02:33:50 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 6 Jan 2012 11:33:50 +1000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 6, 2012 at 10:59 AM, Benjamin Peterson <benjamin at> wrote:
> What exactly is the disadvantage of a sys attribute? That would seem
> preferable to an obscure incantation like that.

Adding sys attributes in maintenance (or security) releases makes me nervous.

However, Victor and Christian are right about the need for a special
case to avoid leaking information, so my particular suggested check
won't work.

The most robust check would be to run sys.executable in a subprocess
and check if it gives the same hash for a non-empty string as the
current process.
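
A minimal sketch of that subprocess check, assuming a Python 3 interpreter. The helper name hashes_differ_across_processes and the sample string are made up for illustration and are not part of any proposed patch:

```python
import subprocess
import sys


def hashes_differ_across_processes(sample="abcde"):
    """Return True if a fresh interpreter hashes `sample` differently."""
    # Run the same executable and let it print hash() of the sample string.
    child = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", sample],
        capture_output=True,
        text=True,
        check=True,
    )
    # Different values suggest the two processes used different seeds.
    return int(child.stdout) != hash(sample)


print("hashes differ across processes:", hashes_differ_across_processes())
```

A single comparison is only evidence, not proof: two seeds can coincide by chance, and a seed forced through the environment would make parent and child agree.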


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From steve at  Fri Jan  6 02:52:45 2012
From: steve at (Steven D'Aprano)
Date: Fri, 06 Jan 2012 12:52:45 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<1325792005.2123.11.camel@surprise>	<>	<>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/5 Nick Coghlan <ncoghlan at>:
>> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at> wrote:
>>> Surely the way to verify the behaviour is to run this from the shell:
>>> python -c print(hash("abcde"))
>>> twice, and see that the calls return different values. (Or have I
>>> misunderstood the way the fix is going to work?)
>>> In any case, I wouldn't want to rely on the presence of a flag in the sys
>>> module to verify the behaviour, I'd want to see for myself that hash
>>> collisions are no longer predictable.
>> More directly, you can just check that the hash of the empty string is non-zero.
>> So -1 for a flag in the sys module - "hash('') != 0" should serve as a
>> sufficient check whether or not process-level string hash
>> randomisation is in effect.
> What exactly is the disadvantage of a sys attribute? That would seem
> preferable to an obscure incantation like that.

There's nothing obscure about directly testing the hash. That's about as far 
from obscure as it is possible to get: you are directly testing the presence 
of a feature by testing the feature.

Relying on a flag to tell you whether hashes are randomised adds additional 
complexity: now you need to care about whether hashes are randomised AND know 
that there is a flag you can look up and what it is called.

And since the flag won't exist in all versions of Python, or even in all 
builds of a particular Python version, it isn't a matter of just testing the 
flag, but of doing the try...except or hasattr() dance to check whether it 
exists first.
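
The "dance" in question might look like the sketch below. It assumes the flag is exposed as sys.flags.hash_randomization, which is the name later CPython releases use; at this point in the thread no name had been agreed, so treat the attribute name as an assumption. The helper randomization_flag is hypothetical.

```python
import sys


def randomization_flag():
    """Return the interpreter's hash randomization flag, or None if absent."""
    # getattr() with a default covers interpreters that predate the flag.
    return getattr(sys.flags, "hash_randomization", None)


flag = randomization_flag()
if flag is None:
    print("this interpreter does not expose the flag")
else:
    print("hash randomization flag:", flag)
```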

At some point, presuming that there is no speed penalty, the behaviour will 
surely become not just enabled by default but mandatory. Python has never 
promised that hashes must be predictable or consistent, so apart from 
backwards compatibility concerns for old versions, future versions of Python 
should make it mandatory. Presuming that there is no speed penalty, I'd argue 
in favour of making it mandatory for 3.3. Why do we need a flag for something 
that is going to be always on?


From benjamin at  Fri Jan  6 03:04:34 2012
From: benjamin at (Benjamin Peterson)
Date: Thu, 5 Jan 2012 20:04:34 -0600
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

2012/1/5 Steven D'Aprano <steve at>:
> Benjamin Peterson wrote:
>> 2012/1/5 Nick Coghlan <ncoghlan at>:
>>> On Fri, Jan 6, 2012 at 10:07 AM, Steven D'Aprano <steve at>
>>> wrote:
>>>> Surely the way to verify the behaviour is to run this from the shell:
>>>> python -c print(hash("abcde"))
>>>> twice, and see that the calls return different values. (Or have I
>>>> misunderstood the way the fix is going to work?)
>>>> In any case, I wouldn't want to rely on the presence of a flag in the
>>>> sys
>>>> module to verify the behaviour, I'd want to see for myself that hash
>>>> collisions are no longer predictable.
>>> More directly, you can just check that the hash of the empty string is
>>> non-zero.
>>> So -1 for a flag in the sys module - "hash('') != 0" should serve as a
>>> sufficient check whether or not process-level string hash
>>> randomisation is in effect.
>> What exactly is the disadvantage of a sys attribute? That would seem
>> preferable to an obscure incantation like that.
> There's nothing obscure about directly testing the hash. That's about as far
> from obscure as it is possible to get: you are directly testing the presence
> of a feature by testing the feature.

It's obscure because hash('') != 0 doesn't necessarily mean the hashes
are randomized. A different hashing algorithm could be in effect.


From lists at  Fri Jan  6 03:09:55 2012
From: lists at (Christian Heimes)
Date: Fri, 06 Jan 2012 03:09:55 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Am 06.01.2012 03:04, schrieb Benjamin Peterson:
> It's obscure because hash('') != 0 doesn't necessarily mean the hashes
> are randomized. A different hashing algorithm could be in effect.

Also in 1 of 2**32 or 2**64 tries hash('') is 0 although randomizing is


From robertc at  Fri Jan  6 03:43:32 2012
From: robertc at (Robert Collins)
Date: Fri, 6 Jan 2012 15:43:32 +1300
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 6, 2012 at 11:17 AM, Antoine Pitrou <solipsis at> wrote:
>> From what I understand, the biggest motivation for pure Python
>> versions is cooperation with the other Python implementations. See
> Apologies, I didn't remember it was written down in PEP.
> A bit more than a school of thought, then :-)

It needs to be correct to aid other implementations though, doesn't it?

Copying/reusing something buggy won't help...


From steve at  Fri Jan  6 04:08:10 2012
From: steve at (Steven D'Aprano)
Date: Fri, 06 Jan 2012 14:08:10 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<1325792005.2123.11.camel@surprise>	<>	<>	<>	<>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/5 Steven D'Aprano <steve at>:
>> There's nothing obscure about directly testing the hash. That's about as far
>> from obscure as it is possible to get: you are directly testing the presence
>> of a feature by testing the feature.
> It's obscure because hash('') != 0 doesn't necessarily mean the hashes
> are randomized. A different hashing algorithm could be in effect.

Fair point, but I didn't actually suggest testing hash('') != 0, that was 
Nick's suggestion, which he's since withdrawn.


From v+python at  Fri Jan  6 04:46:53 2012
From: v+python at (Glenn Linderman)
Date: Thu, 05 Jan 2012 19:46:53 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<1325792005.2123.11.camel@surprise>	<>	<>
Message-ID: <>

On 1/5/2012 5:52 PM, Steven D'Aprano wrote:
> At some point, presuming that there is no speed penalty, the behaviour 
> will surely become not just enabled by default but mandatory. Python 
> has never promised that hashes must be predictable or consistent, so 
> apart from backwards compatibility concerns for old versions, future 
> versions of Python should make it mandatory. Presuming that there is 
> no speed penalty, I'd argue in favour of making it mandatory for 3.3. 
> Why do we need a flag for something that is going to be always on? 

I think the whole paragraph is invalid, because it presumes there is no 
speed penalty.  I presume there will be a speed penalty, until 
benchmarking shows otherwise.

From anacrolix at  Fri Jan  6 07:11:37 2012
From: anacrolix at (Matt Joiner)
Date: Fri, 6 Jan 2012 17:11:37 +1100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

I'm pretty sure the Python version of RLock is in use in several
alternative implementations that provide an alternative _thread.lock. I
think gevent would fall into this camp, as well as a personal project of
mine in a similar vein that operates on python3.

2012/1/6 Charles-François Natali <neologix at>

> Hi,
> Issue #13697 deals with a problem
> with the Python version of threading.RLock (a signal handler which
> tries to acquire the same RLock is called right at the wrong time)
> which doesn't affect the C version.
> Whether such a use case can be considered good practise or the best
> way to fix this is not settled yet, but the question that arose to me
> is: "why do we have both a C and Python version?".
> Here's Antoine's answer (he suggested that I bring this up on python-dev):
> """
> The C version is quite recent, and there's a school of thought that we
> should always provide fallback Python implementations.
> (also, arguably a Python implementation makes things easier to
> prototype, although I don't think it's the case for an RLock)
> """
> So, what do you guys think?
> Would it be okay to nuke the Python version?
> Do you have more details on this "school of thought"?
> Also, while we're at it, Victor created #13550 to try to rewrite the
> "logging hack" of the threading module: there again, I think we could
> just remove this logging altogether. What do you think?
> Cheers,
> cf


From storchaka at  Fri Jan  6 07:41:17 2012
From: storchaka at (Serhiy Storchaka)
Date: Fri, 06 Jan 2012 08:41:17 +0200
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
	<> <je57ed$6nq$>
Message-ID: <je652j$el3$>

06.01.12 02:10, Nick Coghlan wrote:
> Not a good idea - a lot of the 3rd party tests that depend on dict
> ordering are going to be using those modules anyway, so scattering our
> solution across half the standard library is needlessly creating
> additional work without really reducing the incompatibility problem.
> If we're going to change anything, it may as well be the string
> hashing algorithm itself.

Changing the string hashing algorithm will hurt general performance
and will also break any code that depends on dict ordering.
A specialized dict slows down only the parts of an application that need it.

From paul at  Fri Jan  6 08:12:46 2012
From: paul at (Paul Smedley)
Date: Fri, 06 Jan 2012 17:42:46 +1030
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je5dal$co1$>
References: <je3onm$57p$>
	<je4vjj$dtf$> <je5dal$co1$>
Message-ID: <je66tf$oms$>

Hi Terry,

On 06/01/12 10:25, Terry Reedy wrote:
> On 1/5/2012 3:01 PM, Paul Smedley wrote:
>>> File "./", line 1154, in detect_modules
>>> for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()]
>>> AttributeError: 'NoneType' object has no attribute 'split'
>>> make: *** [sharedmods] Error 1
>> File "./", line 1368, in detect_modules
>> if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"):
>> TypeError: argument of type 'NoneType' is not iterable
>> make: *** [sharedmods] Error 1
>> Which again points to problems with
>> sysconfig.get_config_var("CONFIG_ARGS"):
> [The earlier call was with "__CONFIG_ARGS", for whatever difference that
> makes.] It appears to be returning None instead of [] (or a populated
> list).
> In 3.2.2, at line 579 of is
> def get_config_var(name):
>     return get_config_vars().get(name)
> That defaults to None if name is not a key in the dict returned by
> get_config_vars(). My guess is that it always is and that the value
> is always a list for tested win/*nix/mac systems. So either has
> the bug of assuming that there is always a list value for "CONFIG_ARGS"
> or has the bug of not setting it for os2, perhaps because
> of a bug elsewhere.
> At line 440 of is
> def get_config_var(*args):
>     global _CONFIG_VARS
>     if _CONFIG_VARS is None:
>         <code to populate _CONFIG_VARS, including>
>         if in ('nt', 'os2'):
>             _init_non_posix(_CONFIG_VARS)
>     if args:
>         vals = []
>         for name in args:
>             vals.append(_CONFIG_VARS.get(name))
>         return vals
>     else:
>         return _CONFIG_VARS
> At 456 is
> def _init_non_posix(vars):
>     """Initialize the module as appropriate for NT"""
>     # set basic install directories
>     ...
> "CONFIG_ARGS" is not set explicitly for any system anywhere in the file,
> so I do not know how the call ever works.
This looks pretty much the same as the code in 2.7.2 - I don't 
understand Python code well enough to debug the script :(
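
For readers following the two tracebacks: both come from using the result of sysconfig.get_config_var("CONFIG_ARGS") without allowing for None. A defensive wrapper could look like the sketch below. The helper name config_args is hypothetical, and this is not the fix that went into the build script; it only illustrates the None case. On POSIX builds the value is normally a string taken from the Makefile rather than a list.

```python
import sysconfig


def config_args():
    """Return the configure arguments as a list, or [] when unavailable."""
    value = sysconfig.get_config_var("CONFIG_ARGS")
    # get_config_var() returns None for names that were never set,
    # so fall back to an empty list instead of calling .split() on None.
    return value.split() if value else []


print("--with-system-expat" in config_args())
```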

Thanks for the response,


From paul at  Fri Jan  6 09:52:38 2012
From: paul at (Paul Smedley)
Date: Fri, 06 Jan 2012 19:22:38 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
Message-ID: <je6con$r2l$>

Hi All,

I'm a little slow in responding to, 
but I'm interested in stepping up to help maintain OS/2 support in 
Python 3.3 and above.

I've been building Python 2.x for a while, and currently have binaries 
of 2.6.5 available from

Unlike Andrew Mcintyre, I'm using libc for development 
( rather than emx.  libc is still being 
developed whereas emx hasn't been updated in about 10 years.

I haven't attempted a build of 3.x yet, but will grab the latest 3.x 
release and see what it takes to get it building here.  I expect I'll 
hit the same problem with sysconfig.get_config_var("CONFIG_ARGS"): as 
with 2.7.2 but we'll wait and see.



From mark at  Fri Jan  6 10:18:39 2012
From: mark at (Mark Shannon)
Date: Fri, 06 Jan 2012 09:18:39 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <je652j$el3$>
References: <>	<>
	<>	<>	<>	<>	<>	<>	<>	<>
	<je57ed$6nq$>	<>
Message-ID: <>

Serhiy Storchaka wrote:
> 06.01.12 02:10, Nick Coghlan wrote:
>> Not a good idea - a lot of the 3rd party tests that depend on dict
>> ordering are going to be using those modules anyway, so scattering our
>> solution across half the standard library is needlessly creating
>> additional work without really reducing the incompatibility problem.
>> If we're going to change anything, it may as well be the string
>> hashing algorithm itself.
> Changing the string hashing algorithm will hurt general performance
> and will also break any code that depends on dict ordering.
> A specialized dict slows down only the parts of an application that need it.

The minimal proposed change of seeding the hash from a global value (a 
single memory read and an addition) will have such a minimal performance 
effect that it will be undetectable even on the most noise-free testing 


From sandro.tosi at  Fri Jan  6 10:29:40 2012
From: sandro.tosi at (Sandro Tosi)
Date: Fri, 6 Jan 2012 10:29:40 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #12042: a
 queue is only used to retrive results; preliminary patch by
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Thu, Jan 5, 2012 at 23:45, Terry Reedy <tjreedy at> wrote:
> On 1/5/2012 1:51 PM, sandro.tosi wrote:
>> changeset:   74282:3353f9747a39
>> branch:      2.7
>>   Doc/whatsnew/2.6.rst |  4 ++--
> should that have been whatsnew/2.7.rst?

The wording correction was in the 2.6 what's new, when describing
multiprocessing (which was added in 2.6).

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From steve at  Fri Jan  6 11:01:28 2012
From: steve at (Steven D'Aprano)
Date: Fri, 06 Jan 2012 21:01:28 +1100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<1325792005.2123.11.camel@surprise>	<>	<>	<>	<>
Message-ID: <>

Glenn Linderman wrote:
> On 1/5/2012 5:52 PM, Steven D'Aprano wrote:
>> At some point, presuming that there is no speed penalty, the behaviour 
>> will surely become not just enabled by default but mandatory. Python 
>> has never promised that hashes must be predictable or consistent, so 
>> apart from backwards compatibility concerns for old versions, future 
>> versions of Python should make it mandatory. Presuming that there is 
>> no speed penalty, I'd argue in favour of making it mandatory for 3.3. 
>> Why do we need a flag for something that is going to be always on? 
> I think the whole paragraph is invalid, because it presumes there is no 
> speed penalty.  I presume there will be a speed penalty, until 
> benchmarking shows otherwise.

There *may* be a speed penalty, but I draw your attention to Paul McMillian's 
email on 1st of January:

     Empirical testing shows that this unoptimized python
     implementation produces ~10% slowdown in the hashing of
     ~20 character strings.

and Christian Heimes' email on 3rd of January:

     The changeset adds the murmur3 hash algorithm with some
     minor changes, for example more random seeds. At first I
     was worried that murmur might be slower than our old hash
     algorithm. But in fact it seems to be faster!

So I think that it's a fairly safe bet that there will be a solution that is 
as fast, or at worst, trivially slower, than the current hash function. But of 
course, benchmarks will be needed.


From victor.stinner at  Fri Jan  6 12:42:44 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 6 Jan 2012 12:42:44 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Using my patch (random-2.patch), the overhead is 0%. I cannot see a
difference with and without my patch.

== 3 characters ==
1 loops, best of 3: 459 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop


== 3 characters ==
1 loops, best of 3: 458 usec per loop
== 10 characters ==
1 loops, best of 3: 575 usec per loop
== 500 characters ==
1 loops, best of 3: 1.36 msec per loop
(the patched version looks faster just because the timer is not
reliable enough for such a fast test)

echo "== 3 characters =="
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%03i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo "== 10 characters =="
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%010i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'

echo "== 500 characters =="
./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
./python -m timeit -n 1 -s 'text=(("%0500i" % x) for x in
range(1,1000))' 'sum(hash(x) for x in text)'
(Take the smallest timing for each test)

"-n 1" is needed because the hash value is only computed once (is
cached). It may be possible to get more reliable results by completely
disabling the hash cache (comment out the "PyUnicode_HASH(self) = x;" line).
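
The same measurement can be sketched with the timeit module directly. This version assumes that building new string objects before every run is enough to sidestep the cached hash, without patching the interpreter. The function name time_hashing and its parameters are made up for illustration.

```python
import timeit


def time_hashing(width, count=1000, repeat=3):
    """Return the best time (seconds) to hash count-1 fresh strings."""
    best = None
    for _ in range(repeat):
        # New string objects each run, so no hash value is cached yet.
        strings = ["%0*d" % (width, i) for i in range(1, count)]
        elapsed = timeit.timeit(
            lambda: sum(hash(s) for s in strings), number=1
        )
        best = elapsed if best is None else min(best, elapsed)
    return best


for width in (3, 10, 500):
    print("== %d characters ==" % width)
    print("best of 3: %.1f usec" % (time_hashing(width) * 1e6))
```

As in the shell version, the smallest timing per width is the one to compare between a patched and an unpatched build.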


From solipsis at  Fri Jan  6 13:42:45 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 6 Jan 2012 13:42:45 +0100
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
References: <je6con$r2l$>
Message-ID: <>

Hi Paul,

> I'm a little slow in responding to 
> but I'm interested in stepping up to help maintain OS/2 support in 
> Python 3.3 and above.
> I've been building Python 2.x for a while, and currently have binaries 
> of 2.6.5 available from
> Unlike Andrew Mcintyre, I'm using libc for development 
> ( rather than emx.  libc is still being 
> developed whereas emx hasn't been updated in about 10 years.
> I haven't attempted a build of 3.x yet, but will grab the latest 3.x 
> release and see what it takes to get it building here.

I would suggest you start from the Mercurial repository instead. There
you'll find both the current stable branch (named "3.2") and the
current development branch (named "default"). It will also make it
easier for you to write and maintain patches.

Let me point you to the devguide, even though it doesn't talk
specifically about porting:



From status at  Fri Jan  6 18:07:32 2012
From: status at (Python tracker)
Date: Fri,  6 Jan 2012 18:07:32 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (2011-12-30 - 2012-01-06)
Python tracker at

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3180 ( +2)
  closed 22322 (+34)
  total  25502 (+36)

Open issues with patches: 1366 

Issues opened (24)

#13685: argparse does not sanitize help strings for % signs  opened by Jeff.Yurkiw

#13686: Some notes on the docs of multiprocessing  opened by eli.bendersky

#13689: fix CGI Web Applications with Python link in howto/urllib2  opened by sandro.tosi

#13691: pydoc help (or help('help')) claims to run a help utility; doe  opened by Devin Jeanpierre

#13692: 2to3 mangles from . import frobnitz  opened by holmbie

#13694: asynchronous connect in asyncore.dispatcher does not set addr  opened by anacrolix

#13695: "type specific" to "type-specific"  opened by Retro

#13697: python RLock implementation unsafe with signals  opened by rbcollins

#13698: Mailbox module should support other mbox formats in addition t  opened by endolith

#13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani  opened by etukia

#13701: Remove Decimal Python 2.3 Compatibility  opened by ramchandra.apte

#13702: relative symlinks in tarfile.extract broken (windows)  opened by Patrick.von.Reth

#13703: Hash collision security issue  opened by barry

#13704: Random number generator in Python core  opened by christian.heimes

#13706: non-ascii fill characters no longer work in formatting  opened by skrah

#13708: Document ctypes.wintypes  opened by ramchandra.apte

#13709: Capitalization mistakes in the documentation for ctypes  opened by ramchandra.apte

#13712: pysetup create should not convert package_data to extra_files  opened by christian.heimes

#13715: typo in unicodedata documentation  opened by eli.collins

#13716: distutils doc contains lots of XXX  opened by flox

#13718: Format Specification Mini-Language does not accept comma for p  opened by mkesper

#13719: bdist_msi upload fails  opened by schmir

#13720: argparse print_help() fails if COLUMNS is set to a low value  opened by zbysz

#818201: distutils: clean does not use build_base option from build  reopened by eric.araujo

Most recent 15 issues with no replies (15)

#13720: argparse print_help() fails if COLUMNS is set to a low value

#13718: Format Specification Mini-Language does not accept comma for p

#13715: typo in unicodedata documentation

#13708: Document ctypes.wintypes

#13691: pydoc help (or help('help')) claims to run a help utility; doe

#13689: fix CGI Web Applications with Python link in howto/urllib2

#13682: Documentation of os.fdopen() refers to non-existing bufsize ar

#13668: mute ImportError in __del__ of _threading_local module

#13665: TypeError: string or integer address expected instead of str i

#13649: termios.ICANON is not documented

#13638: PyErr_SetFromErrnoWithFilenameObject is undocumented

#13633: Handling of hex character references in HTMLParser.handle_char

#13631: readline fails to parse some forms of .editrc under editline (

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize

#13605: document argparse's nargs=REMAINDER

Most recent 15 issues waiting for review (15)

#13719: bdist_msi upload fails

#13715: typo in unicodedata documentation

#13712: pysetup create should not convert package_data to extra_files

#13704: Random number generator in Python core

#13703: Hash collision security issue

#13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani

#13694: asynchronous connect in asyncore.dispatcher does not set addr

#13691: pydoc help (or help('help')) claims to run a help utility; doe

#13684: httplib tunnel infinite loop

#13681: Aifc read compressed frames fix

#13677: correct docstring for builtin compile

#13676: sqlite3: Zero byte truncates string contents

#13673: PyTraceBack_Print() fails if signal received but PyErr_CheckSi

#13670: Increase test coverage for

#13668: mute ImportError in __del__ of _threading_local module

Top 10 most discussed issues (10)

#13703: Hash collision security issue  70 msgs

#13609: Add "os.get_terminal_size()" function  17 msgs

#8184: multiprocessing.managers will not fail if listening ocket alre  14 msgs

#13697: python RLock implementation unsafe with signals  11 msgs

#13700: imaplib.IMAP4.authenticate authobject fails with PLAIN mechani  10 msgs

#1079: decode_header does not follow RFC 2047   7 msgs

#13704: Random number generator in Python core   6 msgs

#13706: non-ascii fill characters no longer work in formatting   6 msgs

#8416: python 2.6.5 documentation can't search   5 msgs

#9993: shutil.move fails on symlink source   5 msgs

Issues closed (34)

#6031: BaseServer.shutdown documentation is incomplete  closed by sandro.tosi

#8245: email examples don't actually work (SMTP.connect is not called  closed by sandro.tosi

#9201: IDLE: raises Exception TclError in a special case  closed by ned.deily

#9349: document argparse's help=SUPPRESS  closed by sandro.tosi

#9975: Incorrect use of flowinfo and scope_id in IPv6 sockaddr tuple  closed by neologix

#10521: str methods don't accept non-BMP fillchar on a narrow Unicode  closed by benjamin.peterson

#10542: Py_UNICODE_NEXT and other macros for surrogates  closed by benjamin.peterson

#11648: openlog()s 'logopt' keyword broken in syslog module  closed by sandro.tosi

#11984: Wrong "See also" in symbol and token module docs  closed by sandro.tosi

#12042: What's New multiprocessing example error  closed by sandro.tosi

#12926: tarfile tarinfo.extract*() broken with symlinks  closed by lars.gustaebel

#13302: Clarification needed in C API arg parsing  closed by sandro.tosi

#13511: Specifying multiple lib and include directories on linux  closed by loewis

#13558: multiprocessing package incompatible with PyObjC  closed by ned.deily

#13565: test_multiprocessing.test_notify_all() hangs on "AMD64 Snow Le  closed by neologix

#13594: Aifc markers write fix  closed by sandro.tosi

#13636: Python SSL Stack doesn't have a Secure Default set of ciphers  closed by pitrou

#13640: add mimetype for application/  closed by sandro.tosi

#13679: Multiprocessing system crash  closed by pitrou

#13680: Aifc comptype write fix  closed by sandro.tosi

#13683: Docs in Python 3:raise statement mistake  closed by sandro.tosi

#13687: parse incorrect command line on windows 7  closed by balenocui

#13688: ast.literal_eval fails on octal numbers  closed by fidoman

#13690: Add DEBUG flag to documentation of re.compile  closed by sandro.tosi

#13693: email.Header.Header incorrect/non-smart on international chars  closed by r.david.murray

#13696: [urllib.request.HTTPRedirectHandler.http_error_302] Relative R  closed by orsenthil

#13699: test_gdb has recently started failing  closed by python-dev

#13705: Raising exceptions from finally works better than advertised i  closed by python-dev

#13707: Clarify hash() constancy period  closed by rhettinger

#13710: hash() on strings containing only null characters returns the  closed by benjamin.peterson

#13711: html.parser.HTMLParser doesn't parse tags in comments in scrip  closed by ezio.melotti

#13713: Regression for http.client read()  closed by pitrou

#13714: Methods of ftplib never ends if the ip address changes  closed by giampaolo.rodola

#13717: print fails on unicode '\udce5' surrogates not allowed  closed by ezio.melotti

From neologix at  Fri Jan  6 20:10:04 2012
From: neologix at (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Fri, 6 Jan 2012 20:10:04 +0100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks for those clarifications, but I must admit they don't help me much...
Can we drop it? A yes/no answer will do it ;-)

> I'm pretty sure the Python version of RLock is in use in several alternative
> implementations that provide an alternative _thread.lock. I think gevent
> would fall into this camp, as well as a personal project of mine in a
> similar vein that operates on python3.

Sorry, I'm not sure I understand. Do those projects use _PyRLock directly?
If yes, then aliasing it to _CRLock should do the trick, no?
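
For reference, a small sketch of how the two implementations can be inspected. It assumes CPython's private names threading._CRLock and threading._PyRLock; being private, either may be absent on other implementations, hence the getattr() defaults. The helper reenter is hypothetical.

```python
import threading

# Private names in CPython's threading module; either may be missing elsewhere.
c_rlock = getattr(threading, "_CRLock", None)
py_rlock = getattr(threading, "_PyRLock", None)


def reenter(lock):
    """Acquire the same lock twice from one thread, as an RLock allows."""
    with lock:
        with lock:
            return True


print("C implementation available:", c_rlock is not None)
print("Python implementation available:", py_rlock is not None)
print("public RLock is re-entrant:", reenter(threading.RLock()))
```

Code that only calls the public threading.RLock() factory would not notice an alias; code that reaches for the private names directly would.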

From paul at  Fri Jan  6 20:58:00 2012
From: paul at (Paul Smedley)
Date: Sat, 07 Jan 2012 06:28:00 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
In-Reply-To: <je6con$r2l$>
References: <je6con$r2l$>
Message-ID: <je7joa$ndh$>

Hi All,

On 06/01/12 19:22, Paul Smedley wrote:
> I'm a little slow in responding to
> but I'm interested in stepping up to help maintain OS/2 support in
> Python 3.3 and above.
> I've been building Python 2.x for a while, and currently have binaries
> of 2.6.5 available from
> Unlike Andrew Mcintyre, I'm using libc for development
> ( rather than emx. libc is still being
> developed whereas emx hasn't been updated in about 10 years.
> I haven't attempted a build of 3.x yet, but will grab the latest 3.x
> release and see what it takes to get it building here. I expect I'll hit
> the same problem with sysconfig.get_config_var("CONFIG_ARGS"): as with
> 2.7.2 but we'll wait and see.

I now have a dll and exe - however when it tries to build the modules, 
it dies with:
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: Unable to get the locale encoding
LookupError: no codec search functions registered: can't find encoding

Have done a small amount of debugging:
in get_codeset(),     char* codeset = nl_langinfo(CODESET);
returns: ISO8859-1

Which can't be found by:
     codec = _PyCodec_Lookup(encoding);

get_codec_name(const char *encoding)

Where is the list of valid codepages read from? Should ISO8859-1 be 
valid? I see some references to ISO-8859-1 in the code but not ISO8859-1
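
On an interpreter that can find its standard library, the spelling can be checked from Python with codecs.lookup(), as in the sketch below (the helper name resolve is hypothetical). The "no codec search functions registered" message suggests, as an assumption on my part, that the failure happens before any codec name is examined, because the lookup machinery itself was never set up.

```python
import codecs


def resolve(name):
    """Return the canonical codec name, or None if the lookup fails."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for spelling in ("ISO8859-1", "ISO-8859-1", "latin-1"):
    print(spelling, "->", resolve(spelling))
```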



From jimjjewett at  Fri Jan  6 21:06:54 2012
From: jimjjewett at (Jim Jewett)
Date: Fri, 6 Jan 2012 15:06:54 -0500
Subject: [Python-Dev]  Hash collision security issue (now public)
Message-ID: <>

Mark Shannon wrote:

> The minimal proposed change of seeding the hash from a global value (a
> single memory read and an addition) will have such a minimal performance
> effect that it will be undetectable even on the most noise-free testing
> environment.

(1)  Is it established that this (a single initial add, with no
per-loop operations) would be sufficient?

I thought that was in the gray area of "We don't yet have a known
attack, but there are clearly safer options."

(2)  Even if the direct cost (fetch and add) were free, it might be
expensive in practice.  The current hash function is designed to send
"similar" strings (and similar numbers) to similar hashes.

(2a)  That guarantees they won't (initially) collide, even in very small dicts.
(2b)  It keeps them nearby, which has an effect on cache hits.   The
exact effect (and even direction) would of course depend on the
workload, which makes me distrust micro-benchmarks.
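
Point (2a) is easy to see for small integers. The sketch below assumes CPython, where small non-negative ints hash to themselves, a small dict starts with 8 slots, and the first probe is the hash masked by the table size:

```python
# Small non-negative ints hash to themselves in CPython.
table_size = 8                      # initial number of slots in a small dict
mask = table_size - 1

# First slot probed for each key: consecutive keys land in distinct slots.
slots = [hash(i) & mask for i in range(table_size)]
print(slots)                        # [0, 1, 2, 3, 4, 5, 6, 7]
```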

If this were a problem in practice, I could understand accepting a
little slowdown as the price of safety, but ... it isn't.  Even in
theory, the only way to trigger this is to take unreasonable amounts
of user input and turn it directly into an unreasonable number of keys
(as opposed to values, or list elements) placed in the same dict (as
opposed to a series of smaller dicts).


From mark at  Fri Jan  6 21:25:46 2012
From: mark at (Mark Shannon)
Date: Fri, 06 Jan 2012 20:25:46 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>


It seems to me that half the folk discussing this issue want a 
super-strong, resist-all-hypothetical-attacks hash with little regard to 
performance. The other half want no change or a change that will have no 
observable effect. (I may be exaggerating a little.)

Can I propose the following, half-way proposal:

1. Since there is a published vulnerability,
that we fix it with the most efficient solution proposed so far:

2. Decide which versions of Python this should be applied to.
3.3 seems a given, the others are open to debate.

3. If and only if (and I think this unlikely) the solution chosen is 
shown to be vulnerable to a more sophisticated attack then a new issue 
should be opened and dealt with separately.


From solipsis at  Fri Jan  6 21:28:29 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 6 Jan 2012 21:28:29 +0100
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
References: <je6con$r2l$>
Message-ID: <>

On Sat, 07 Jan 2012 06:28:00 +1030
Paul Smedley <paul at> wrote:
> I now have a dll and exe - however when it tried to build the modules, 
> it dies with:
> Could not find platform independent libraries <prefix>
> Could not find platform dependent libraries <exec_prefix>
> Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
> Fatal Python error: Py_Initialize: Unable to get the locale encoding

I would look at this line:

> LookupError: no codec search functions registered: can't find encoding

Normally the standard codec search function is registered when
importing the "encodings" module (see Lib/encodings/, which
is done at the end of _PyCodecRegistry_Init() in Python/codecs.c.
There's this comment there:

            /* Ignore ImportErrors... this is done so that
               distributions can disable the encodings package. Note
               that other errors are not masked, e.g. SystemErrors
               raised to inform the user of an error in the Python
               configuration are still reported back to the user. */

For the purpose of debugging you could *not* ignore the error and
instead print it out or bail out.



From p.f.moore at  Fri Jan  6 21:52:55 2012
From: p.f.moore at (Paul Moore)
Date: Fri, 6 Jan 2012 20:52:55 +0000
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 January 2012 20:25, Mark Shannon <mark at> wrote:
> Hi,
> It seems to me that half the folk discussing this issue want a super-strong,
> resist-all-hypothetical-attacks hash with little regard to performance. The
> other half want no change or a change that will have no observable effect.
> (I may be exaggerating a little.)
> Can I propose the following, half-way proposal:
> 1. Since there is a published vulnerability,
> that we fix it with the most efficient solution proposed so far:
> 2. Decide which versions of Python this should be applied to.
> 3.3 seems a given, the others are open to debate.
> 3. If and only if (and I think this unlikely) the solution chosen is shown
> to be vulnerable to a more sophisticated attack then a new issue should be
> opened and dealt with separately.



From paul at  Fri Jan  6 22:52:36 2012
From: paul at (Paul Smedley)
Date: Sat, 07 Jan 2012 08:22:36 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
In-Reply-To: <>
References: <je6con$r2l$> <je7joa$ndh$>
Message-ID: <je7qfi$6at$>

Hi Antoine,
On 07/01/12 06:58, Antoine Pitrou wrote:
> On Sat, 07 Jan 2012 06:28:00 +1030
> Paul Smedley<paul at>  wrote:
>> I now have a dll and exe - however when it tried to build the modules,
>> it dies with:
>> Could not find platform independent libraries<prefix>
>> Could not find platform dependent libraries<exec_prefix>
>> Consider setting $PYTHONHOME to<prefix>[:<exec_prefix>]
>> Fatal Python error: Py_Initialize: Unable to get the locale encoding
> I would look at this line:
>> LookupError: no codec search functions registered: can't find encoding
> Normally the standard codec search function is registered when
> importing the "encodings" module (see Lib/encodings/, which
> is done at the end of _PyCodecRegistry_Init() in Python/codecs.c.
> There's this comment there:
>              /* Ignore ImportErrors... this is done so that
>                 distributions can disable the encodings package. Note
>                 that other errors are not masked, e.g. SystemErrors
>                 raised to inform the user of an error in the Python
>                 configuration are still reported back to the user. */
> For the purpose of debugging you could *not* ignore the error and
> instead print it out or bail out.
Thanks - commenting out the ImportErrors block, I get:
ImportError: No module named encodings

So seems it's not finding modules - possibly related to the warnings about:
 >> Could not find platform independent libraries<prefix>
 >> Could not find platform dependent libraries<exec_prefix>

Seems getenv() may not be working correctly...

From v+python at  Fri Jan  6 04:39:30 2012
From: v+python at (Glenn Linderman)
Date: Thu, 05 Jan 2012 19:39:30 -0800
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
	<> <je57ed$6nq$>
Message-ID: <>

On 1/5/2012 4:10 PM, Nick Coghlan wrote:
> On Fri, Jan 6, 2012 at 8:15 AM, Serhiy Storchaka<storchaka at>  wrote:
>> 05.01.12 21:14, Glenn Linderman wrote:
>>> So, fixing the vulnerable packages could be a sufficient response,
>>> rather than changing the hash function.  How to fix?  Each of those
>>> above allocates and returns a dict.  Simply have each of those allocate
>>> and return a wrapped dict, which has the following behaviors:
>>> i) during __init__, create a local, random, string.
>>> ii) for all key values, prepend the string, before passing it to the
>>> internal dict.
>> Good idea.

Thanks for the implementation, Serhiy.  That is the sort of thing I had 
in mind, indeed.
> Not a good idea - a lot of the 3rd party tests that depend on dict
> ordering are going to be using those modules anyway,

Stats? Didn't someone post a list of tests  that fail when changing the 
hash? Oh, those were stdlib tests, not 3rd party tests.  I'm not sure 
how to gather the stats, then, are you?

> so scattering our
> solution across half the standard library is needlessly creating
> additional work without really reducing the incompatibility problem.

Half the standard library?  no one has cared to augment my list of 
modules, but I have seen reference to JSON in addition to cgi and 
urllib.parse.  I think there are more than 6 modules in the standard
library.

> If we're going to change anything, it may as well be the string
> hashing algorithm itself.

Changing the string hashing algorithm is known (or at least no one has 
argued otherwise) to be a source of backward incompatibility that will 
break programs.  My proposal (and Serhiy's implementation, assuming it 
works, or can be easily tweaked to work, I haven't reviewed it in detail 
or attempted to test it) will only break programs that have vulnerabilities.

I failed to mention one other benefit of my proposal: every web request 
would have a different random prefix, so attempting to gather info is 
futile: the next request has a different random prefix, so different 
strings would collide.
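
A minimal sketch of such a wrapped dict, following steps i) and ii) above. This is not Serhiy's implementation; `PrefixedDict` is a hypothetical name, it only handles str keys, and whether the approach holds up against a determined attacker is the point under debate:

```python
import os
from collections.abc import MutableMapping


class PrefixedDict(MutableMapping):
    """Hypothetical str-keyed mapping that salts every key with a random prefix."""

    def __init__(self, *args, **kwargs):
        # i) create a local, random string for this instance
        self._prefix = os.urandom(8).hex()
        self._data = {}
        self.update(*args, **kwargs)

    # ii) prepend the string before touching the internal dict
    def __getitem__(self, key):
        return self._data[self._prefix + key]

    def __setitem__(self, key, value):
        self._data[self._prefix + key] = value

    def __delitem__(self, key):
        del self._data[self._prefix + key]

    def __iter__(self):
        # Strip the prefix again so callers see the original keys.
        n = len(self._prefix)
        return (k[n:] for k in self._data)

    def __len__(self):
        return len(self._data)


form = PrefixedDict(name="guido")
form["lang"] = "python"
print(sorted(form), form["lang"])
```

Each instance draws its own prefix, so two requests that parse the same keys store them under different internal strings.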

> Cheers,
> Nick.
Indeed it is nice when we can be cheery even when arguing, for the most 
part :)  I've enjoyed reading the discussions in this forum because most 
folks have respect for other people's opinions, even when they differ.

From anacrolix at  Sat Jan  7 01:10:17 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 7 Jan 2012 11:10:17 +1100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

_PyRLock is not used directly. Instead, when no _CRLock is provided, the
threading.RLock function falls back to _PyRLock.

It's done this way because green threading libraries may only provide a
greened lock. _CRLock in these contexts would not work: It would block the
entire native thread.

I suspect that if you removed _PyRLock, these implementations would have to
expose their own RLock primitive which works the same way as the one just
removed from the standard library. I don't know if this is a good thing.

I would recommend checking with at least the gevent and eventlet developers.

2012/1/7 Charles-François Natali <neologix at>

> Thanks for those precisions, but I must admit it doesn't help me much...
> Can we drop it? A yes/no answer will do it ;-)
> > I'm pretty sure the Python version of RLock is in use in several
> > alternative implementations that provide an alternative _thread.lock.
> > I think gevent would fall into this camp, as well as a personal project
> > of mine in a similar vein that operates on python3.
> Sorry, I'm not sure I understand. Do those projects use _PyRLock directly?
> If yes, then aliasing it to _CRLock should do the trick, no?


From tim.peters at  Sat Jan  7 04:05:46 2012
From: tim.peters at (Tim Peters)
Date: Fri, 6 Jan 2012 22:05:46 -0500
Subject: [Python-Dev] "Sort attacks" (was Re: Hash collision security issue
 (now public))
Message-ID: <>

I can't find it now, but I believe Marc-Andre mentioned that CPython's
list.sort() was vulnerable to attack too, because of its O(n log n)
worst-case behavior.

I wouldn't worry about that, because nobody could stir up anguish
about it by writing a paper ;-)

1. O(n log n) is enormously more forgiving than O(n**2).

2. An attacker need not be clever at all:  O(n log n) is not only
sort()'s worst case, it's also its _expected_ case when fed randomly
ordered data.

3. It's provable that no comparison-based sorting algorithm can have
better worst-case asymptotic behavior when fed randomly ordered data.
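
A rough way to see points 2 and 3 is to count comparisons directly. The sketch below wraps values in a hypothetical `Counted` class whose `__lt__` increments a counter; the exact counts depend on the interpreter version and on the shuffle, so none are quoted here:

```python
import math
import random


class Counted:
    """Wraps a value and counts every '<' comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort the values and return how many comparisons the sort made."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
data = list(range(n))
random.shuffle(data)
shuffled = count_comparisons(data)          # randomly ordered input
already_sorted = count_comparisons(range(n))  # friendliest possible input
print(shuffled, already_sorted, round(n * math.log2(n)))
```

Random input already costs on the order of n log n comparisons, so an attacker gains nothing by crafting a "bad" ordering.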

So if anyone whines about this, tell 'em to go do something useful instead :-)

still-solving-problems-not-in-need-of-attention-ly y'rs  - tim

From paul at  Sat Jan  7 09:48:10 2012
From: paul at (Paul Smedley)
Date: Sat, 07 Jan 2012 19:18:10 +1030
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je5dal$co1$>
References: <je3onm$57p$>
	<je4vjj$dtf$> <je5dal$co1$>
Message-ID: <je90sb$eb8$>

Hi All,

On 06/01/12 10:25, Terry Reedy wrote:
> On 1/5/2012 3:01 PM, Paul Smedley wrote:
>>> File "./", line 1154, in detect_modules
>>> for arg in sysconfig.get_config_var("__CONFIG_ARGS").split()]
>>> AttributeError: 'NoneType' object has no attribute 'split'
>>> make: *** [sharedmods] Error 1
>> File "./", line 1368, in detect_modules
>> if '--with-system-expat' in sysconfig.get_config_var("CONFIG_ARGS"):
>> TypeError: argument of type 'NoneType' is not iterable
>> make: *** [sharedmods] Error 1
>> Which again points to problems with
>> sysconfig.get_config_var("CONFIG_ARGS"):
> [The earlier call was with "__CONFIG_ARGS", for whatever difference that
> makes.] It appears to be returning None instead of [] (or a populated
> list).
> In 3.2.2, at line 579 of is
> def get_config_var(name):
> return get_config_vars().get(name)
> That defaults to None if name is not a key in the dict returned by
> get_config_vars(). My guess is that it always is and the value
> is always a list for tested win/*nix/mac systems. So either has
> the bug of assuming that there is always a list value for "CONFIG_ARGS"
> or has the bug of not setting it for os2, perhaps because
> of a bug elsewhere.
> At line 440 of is
> def get_config_var(*args):
> global _CONFIG_VARS
> if _CONFIG_VARS is None:
> <code to populate _CONFIG_VARS, including>
> if in ('nt', 'os2'):
> _init_non_posix(_CONFIG_VARS)
> if args:
> vals = []
> for name in args:
> vals.append(_CONFIG_VARS.get(name))
> return vals
> else:
> return _CONFIG_VARS
> At 456 is
> def _init_non_posix(vars):
> """Initialize the module as appropriate for NT"""
> # set basic install directories
> ...
> "CONFIG_ARGS" is not set explicitly for any system anywhere in the file,
> so I do not know how the call ever works.

using _init_posix() for 'os2' instead of _init_non_posix is the fix for 
this. also needs the following changes:
--- \dev\Python-2.7.2-o\Lib\	2012-01-06 19:27:14.000000000 +1030
+++	2012-01-07 19:03:00.000000000 +1030
@@ -46,7 +46,7 @@
          'scripts': '{base}/Scripts',
          'data'   : '{base}',
-    'os2_home': {
+    'os2_user': {
          'stdlib': '{userbase}/lib/python{py_version_short}',
          'platstdlib': '{userbase}/lib/python{py_version_short}',
@@ -413,9 +413,9 @@
          _CONFIG_VARS['platbase'] = _EXEC_PREFIX
          _CONFIG_VARS['projectbase'] = _PROJECT_BASE

-        if in ('nt', 'os2'):
+        if in ('nt'):
-        if == 'posix':
+        if in ('posix', 'os2'):

          # Setting 'userbase' is done below the call to the
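
Independently of the platform fix, the failure mode in the tracebacks above (None coming back for an unset name) can be reproduced and guarded against. A small sketch, using a made-up variable name for the lookup that is expected to miss:

```python
import sysconfig

# get_config_var() returns None for a name that is not defined, so
# calling .split() or using "in" on the result raises as shown above.
value = sysconfig.get_config_var("NO_SUCH_CONFIG_VAR")   # hypothetical name
print(value)

# Defensive form: fall back to an empty string before splitting.
config_args = (sysconfig.get_config_var("CONFIG_ARGS") or "").split()
has_system_expat = "--with-system-expat" in config_args
```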

From tjreedy at  Sat Jan  7 10:17:33 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 07 Jan 2012 04:17:33 -0500
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je90sb$eb8$>
References: <je3onm$57p$>
	<je4vjj$dtf$> <je5dal$co1$>
Message-ID: <je92jk$mou$>

On 1/7/2012 3:48 AM, Paul Smedley wrote:

> using _init_posix() for 'os2' instead of _init_non_posix is the fix for
> this.
> also needs the following changes:
> --- \dev\Python-2.7.2-o\Lib\ 2012-01-06 19:27:14.000000000
> +1030
> +++ 2012-01-07 19:03:00.000000000 +1030
> @@ -46,7 +46,7 @@
> 'scripts': '{base}/Scripts',
> 'data' : '{base}',
> },
> - 'os2_home': {
> + 'os2_user': {
> 'stdlib': '{userbase}/lib/python{py_version_short}',
> 'platstdlib': '{userbase}/lib/python{py_version_short}',
> 'purelib': '{userbase}/lib/python{py_version_short}/site-packages',
> @@ -413,9 +413,9 @@
> _CONFIG_VARS['platbase'] = _EXEC_PREFIX
> _CONFIG_VARS['projectbase'] = _PROJECT_BASE
> - if in ('nt', 'os2'):
> + if in ('nt'):
> _init_non_posix(_CONFIG_VARS)
> - if == 'posix':
> + if in ('posix', 'os2'):
> _init_posix(_CONFIG_VARS)

Submit a patch on the tracker, preferably as a file rather than cut and
paste.

Terry Jan Reedy

From stefan_ml at  Sat Jan  7 12:02:04 2012
From: stefan_ml at (Stefan Behnel)
Date: Sat, 07 Jan 2012 12:02:04 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <je98nc$oon$>

Christian Heimes, 31.12.2011 04:59:
> On 31.12.2011 03:22, Victor Stinner wrote:
> The unique structure of CPython's dict implementation makes it harder to
> get the number of values with equal hash. The academic hash map (the one
> I learnt about at university) uses a bucket to store all elements with
> equal hash (more precisely: hash mod mask). Python's dict, however,
> perturbs the hash until it finds a free slot in its array. The second,
> third ... collision can be caused by a legit and completely different
> (!) hash.
>> The last choice is to change the hash algorithm. The *idea* is the same 
>> as adding salt to a hashed password (in practice it will be a little bit 
>> different): if a pseudo-random salt is added, the attacker cannot 
>> prepare a single dataset, he/she will have to regenerate a new dataset 
>> for each possible salt value. If the salt is big enough (size in bits), 
>> the attacker will need too much CPU to generate the dataset (compute N 
>> keys with the same hash value). Basically, it slows down the attack by 
>> 2^(size of the salt).
> That's the idea of randomized hashing functions as implemented by Ruby
> 1.8, Perl and others. The random seed is used as IV. Multiple rounds of
> multiply, XOR and MOD (integer overflows) cause a deviation. In your
> other posting you were worried about the performance implication. A
> randomized hash function just adds a single ADD operation, that's all.
> Downside: With randomization all hashes are unpredictable and change
> after every restart of the interpreter. This has some subtle side
> effects like a different outcome of {a:1, b:1, c:1}.keys() after a
> restart of the interpreter.
>> Another possibility would be to replace our fast hash function by a 
>> better hash function like MD5 or SHA1 (so the creation of the dataset 
>> would be too slow in practice = too expensive), but cryptographic hash 
>> functions are much slower (and so would slow down Python too much).
> I agree with your analysis. Cryptographic hash functions are far too
> slow for our use case. During my research I found another hash function
> that claims to be fast and that may not be vulnerable to this kind of
> attack:

Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's
portable, known to provide a very good distribution for different string
values and is generally fast on both 32 and 64 bit architectures.

The analysis is here:

It seems that there's also support for generating 64bit hash values
(actually 2x32bits) efficiently.

Admittedly, this may require some adaptation for the PEP393 unicode memory
layout in order to produce identical hashes for all three representations
if they represent the same content. So it's not a drop-in replacement.


From ncoghlan at  Sat Jan  7 14:13:23 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 7 Jan 2012 23:13:23 +1000
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/7 Charles-François Natali <neologix at>:
> Thanks for those precisions, but I must admit it doesn't help me much...
> Can we drop it? A yes/no answer will do it ;-)

The yes/no answer is "No, we can't drop it".

Even though CPython no longer uses the Python version of RLock in
normal operation, it's still the reference implementation for everyone
else that has to perform the same task (i.e. wrap Python code around a
non-reentrant lock to create a reentrant one).
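
As an illustration of that task, here is a minimal reentrant lock wrapped around a plain threading.Lock. It is a sketch of the technique only, not the standard library's _PyRLock; `SimpleRLock` is a hypothetical name, and it omits timeouts and non-blocking acquires:

```python
import threading


class SimpleRLock:
    """Reentrant lock built on a plain Lock (illustrative sketch only)."""

    def __init__(self):
        self._block = threading.Lock()   # the non-reentrant primitive
        self._owner = None               # ident of the owning thread
        self._count = 0                  # recursion depth

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            self._count += 1             # already ours: just go one level deeper
            return True
        self._block.acquire()            # may block while another thread owns it
        self._owner = me
        self._count = 1
        return True

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("cannot release un-acquired lock")
        self._count -= 1
        if self._count == 0:             # outermost release frees the real lock
            self._owner = None
            self._block.release()

    __enter__ = acquire

    def __exit__(self, *exc_info):
        self.release()


lock = SimpleRLock()
with lock:
    with lock:                           # would deadlock with a plain Lock
        depth = lock._count
print(depth)
```

Because the only primitive it needs is a lock object with acquire() and release(), the same wrapper works on top of whatever lock an alternative implementation supplies.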


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan  7 14:22:44 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 7 Jan 2012 23:22:44 +1000
Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the
 source and destination are on different filesystems, 
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 7, 2012 at 5:17 AM, antoine.pitrou
<python-checkins at> wrote:
> changeset:   74288:1ea8b7233fd7
> user:        Antoine Pitrou <solipsis at>
> date:        Fri Jan 06 20:16:19 2012 +0100
> summary:
>  Issue #9993: When the source and destination are on different filesystems,
> and the source is a symlink, shutil.move() now recreates a symlink on the
> destination instead of copying the file contents.
> Patch by Jonathan Niehof and Hynek Schlawack.

That seems like a fairly nasty backwards incompatibility right there.
While the old behaviour was different from mv, it was still perfectly
well defined. Now, operations that used to work may fail - basically
anything involving an absolute symlink will silently fail if being
moved to removable media (it will create a symlink that is completely
useless on the destination machine). Relative symlinks may or may not
be broken depending on whether or not their target is *also* being
copied to the destination media.

The new help text also doesn't say what will happen if the destination
doesn't even *support* symlinks (as is quite likely in the removable
media case).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From anacrolix at  Sat Jan  7 16:22:15 2012
From: anacrolix at (Matt Joiner)
Date: Sun, 8 Jan 2012 02:22:15 +1100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

Nick did you mean to say "wrap python code around a reentrant lock to
create a non-reentrant lock"? Isn't that what PyRLock is doing?

FWIW having now read issues 13697 and 13550, I'm +1 for dropping Python
RLock, and all the logging machinery in threading.

2012/1/8 Nick Coghlan <ncoghlan at>

> 2012/1/7 Charles-François Natali <neologix at>:
> > Thanks for those precisions, but I must admit it doesn't help me much...
> > Can we drop it? A yes/no answer will do it ;-)
> The yes/no answer is "No, we can't drop it".
> Even though CPython no longer uses the Python version of RLock in
> normal operation, it's still the reference implementation for everyone
> else that has to perform the same task (i.e. wrap Python code around a
> non-reentrant lock to create a reentrant one).
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia


From ncoghlan at  Sat Jan  7 16:38:26 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 8 Jan 2012 01:38:26 +1000
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/8 Matt Joiner <anacrolix at>:
> Nick did you mean to say "wrap python code around a reentrant lock to create
> a non-reentrant lock"? Isn't that what PyRLock is doing?

Actually, I should have said recursive, not reentrant.

> FWIW having now read issues 13697 and 13550, I'm +1 for dropping Python
> RLock, and all the logging machinery in threading.

While I agree on removing the unused and potentially problematic
debugging machinery, I'm not convinced of the benefits of removing the
pure Python RLock implementation. To quote Charles-François from the
tracker issue: "Now, the fun part: this affects not only RLock, but
every Python code performing "atomic" actions: condition variables,
barriers, etc. There are some constraints on what can be done from a
signal handler, and it should probably be documented."

Removing the pure Python RLock doesn't seem to actually solve anything -
it just pushes the problem of fixing the signal interaction back onto
third party users that are even more ill-equipped to resolve it than
we are.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From hs at  Sat Jan  7 17:11:19 2012
From: hs at (Hynek Schlawack)
Date: Sat, 7 Jan 2012 17:11:19 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the
 source and destination are on different filesystems, 
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Nick,  

On Saturday, 7 January 2012 at 14:22, Nick Coghlan wrote:

> >
> > changeset: 74288:1ea8b7233fd7
> > user: Antoine Pitrou <solipsis at>
> > date: Fri Jan 06 20:16:19 2012 +0100
> > summary:
> > Issue #9993: When the source and destination are on different filesystems,
> > and the source is a symlink, shutil.move() now recreates a symlink on the
> > destination instead of copying the file contents.
> > Patch by Jonathan Niehof and Hynek Schlawack.
> That seems like a fairly nasty backwards incompatibility right there.
> While the old behaviour was different from mv, it was still perfectly
> well defined. Now, operations that used to work may fail - basically
> anything involving an absolute symlink will silently fail if being
> moved to removable media (it will create a symlink that is completely
> useless on the destination machine). Relative symlinks may or may not
> be broken depending on whether or not their target is *also* being
> copied to the destination media.

I had a look at it, the possible cases are as follows:

1. we can just do a os.rename(): if src is a link it stays one
2. os.rename() fails, src is not a symlink but a directory: copytree() is used with symlinks=True, i.e. symlinks are preserved, no matter where they point to, i.e. this would clash with removable media as well.
3. os.rename() fails and src is a symlink. In both former cases, links were preserved. And the removable-media-argument is IMHO moot due to case 2.

If you want hardcore backwards compatibility, we could make the old behavior default and add some flag. But to be honest, the new approach seems more congruent to me.  
> The new help text also doesn't say what will happen if the destination
> doesn't even *support* symlinks (as is quite likely in the removable
> media case).

A clarification might be appropriate. Maybe even a direct warning, that in such cases the usage of copytree(..., symlinks=False) might be a better idea?
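
The difference between the two copytree() modes can be seen with a throwaway directory. This is only a sketch, assuming a platform where os.symlink is available; the file and directory names are made up:

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "src")
os.mkdir(src)
with open(os.path.join(src, "target.txt"), "w") as f:
    f.write("payload")
# A relative symlink inside the tree being copied.
os.symlink("target.txt", os.path.join(src, "link.txt"))

kept = os.path.join(root, "kept")
followed = os.path.join(root, "followed")
shutil.copytree(src, kept, symlinks=True)       # links stay links
shutil.copytree(src, followed, symlinks=False)  # links become regular files

kept_is_link = os.path.islink(os.path.join(kept, "link.txt"))
followed_is_link = os.path.islink(os.path.join(followed, "link.txt"))
with open(os.path.join(followed, "link.txt")) as f:
    followed_content = f.read()
print(kept_is_link, followed_is_link, followed_content)

shutil.rmtree(root)  # clean up the scratch directory
```

With symlinks=False the destination holds real file contents, which is the safer choice when the target filesystem may not support links at all.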

But the more I think about it, the more it's my impression, that symlink problems aren't really our problems as they go through all possible layers and it's next to impossible to catch all edge cases in library code. Therefore I'd say it's best just to behave like UNIX tools (please note I'm not defensive here, I've just fixed the tests+docs :)).


From lists at  Sat Jan  7 18:57:10 2012
From: lists at (Christian Heimes)
Date: Sat, 07 Jan 2012 18:57:10 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <je98nc$oon$>
References: <>
	<> <>
Message-ID: <jea11n$d3v$>

On 07.01.2012 12:02, Stefan Behnel wrote:
> Wouldn't Bob Jenkins' "lookup3" hash function fit in here? After all, it's
> portable, known to provide a very good distribution for different string
> values and is generally fast on both 32 and 64 bit architectures.
> The analysis is here:
> It seems that there's also support for generating 64bit hash values
> (actually 2x32bits) efficiently.

This thread as well as the ticket is getting so long that people barely
have a chance to catch up ...

Guido has stated that he doesn't want a completely new hash algorithm
for Python 2.x to 3.2. A new hash algorithm for 3.3 needs a PEP, too.

I've done some experiments with FNV and Murmur3. With Murmur3 128bit
I've seen some minor speed improvements on 64bit platforms. At first I
was surprised but it makes sense. Murmur3 operates on uint32_t blocks
while Python's hash algorithm iterates over 1 byte (bytes, ASCII), 2
bytes (UCS2) or 4 bytes (UCS4) types. Since most strings are either
ASCII or UCS2, the inner loop of the current algorithm is tighter.

> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content. So it's not a drop-in replacement.

Is this condition required and implemented at the moment?


From martin at  Sat Jan  7 18:57:41 2012
From: martin at (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 07 Jan 2012 18:57:41 +0100
Subject: [Python-Dev] Python as a Metro-style App
Message-ID: <>

I just tried porting Python as a Metro (Windows 8) App, and failed.

Metro Apps use a variant of the Windows API called WinRT that still
allows writing native applications in C++, but restricts various APIs
to a subset of the full Win32 functionality. For example, everything
related to subprocess creation would not work; none of the
byte-oriented file API seems to be present, and a number of file
operation functions are absent as well (such as MoveFile).

Regardless, porting Python ought to be feasible, except that it fails
fundamentally with the preview release of Visual Studio.

The problem is that compilation of C code is apparently not
supported/tested in that preview release. When compiling a trivial
C file in a Metro app, the compiler complains that a temporary file
ending with "md" could not be found, most likely because the C
compiler failed to generate it, whereas the C++ compiler would.

I tried compiling the Python sources as C++, but that produced
hundreds of compilation errors. Most of them are either about missing
casts (e.g. from int to enum types, or from void * to other pointer
types), or about the "static forward" declarations of type objects.

For the latter, anonymous namespaces should be used. While it is
feasible to replace

static PyTypeObject foo;
static PyTypeObject foo = {

with

PyTypeObject foo;
PyTypeObject foo = {

I'm not sure whether such a change would be accepted, in particular as
Microsoft might fix the bug in the compiler by the final release
of Windows 8.


From tjreedy at  Sat Jan  7 21:53:29 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 07 Jan 2012 15:53:29 -0500
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <jea11n$d3v$>
References: <>
	<> <>
	<je98nc$oon$> <jea11n$d3v$>
Message-ID: <jeabci$cjo$>

On 1/7/2012 12:57 PM, Christian Heimes wrote:
> On 07.01.2012 12:02, Stefan Behnel wrote:

>> Admittedly, this may require some adaptation for the PEP393 unicode memory
>> layout in order to produce identical hashes for all three representations
>> if they represent the same content. So it's not a drop-in replacement.
> Is this condition required and implemented at the moment?

If o1 == o2, then hash(o1) == hash(o2) is an unstated requirement 
implied by "They [hash values] are used to quickly compare dictionary 
keys during a dictionary lookup." since hash(o1) != hash(o2) is taken to 
mean o1 != o2 (whereas hash(o1) == hash(o2) is taken to mean o1 == o2 is 
possible but must be checked). Hashing should be a coarsening of == as 
an equivalence relationship.
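
A short illustration of that requirement, using built-in numbers and a hypothetical `Name` class whose __eq__ and __hash__ are kept consistent:

```python
# Equal objects of different types must still hash alike.
assert 1 == 1.0 == True
assert hash(1) == hash(1.0) == hash(True)

# That is what lets a float find an entry stored under an int key.
d = {1: "one"}
print(d[1.0])        # one


class Name:
    """Hypothetical case-insensitive key: __eq__ and __hash__ must agree."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Name) and self.text.lower() == other.text.lower()

    def __hash__(self):
        # Coarser than ==, but never separates two equal objects.
        return hash(self.text.lower())


lookup = {Name("Guido"): "BDFL"}
found = lookup[Name("GUIDO")]
print(found)         # BDFL
```

If __hash__ used self.text without lowercasing, the second lookup would usually miss even though the keys compare equal.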

Terry Jan Reedy

From vinay_sajip at  Sat Jan  7 22:25:37 2012
From: vinay_sajip at (Vinay Sajip)
Date: Sat, 7 Jan 2012 21:25:37 +0000 (UTC)
Subject: [Python-Dev] A question about the subprocess implementation
Message-ID: <>

The subprocess.Popen constructor takes stdin, stdout and stderr keyword
arguments which are supposed to represent the file handles of the child process.
The object also has stdin, stdout and stderr attributes, which one would naively
expect to correspond to the passed in values, except where you pass in e.g.
subprocess.PIPE (in which case the corresponding attribute would be set to an
actual stream or descriptor).

However, in common cases, even when keyword arguments are passed in, the
corresponding attributes are set to None. The following script

import os
from subprocess import Popen, PIPE
import tempfile

cmd = 'ls /tmp'.split()

p = Popen(cmd, stdout=open(os.devnull, 'w+b'))
print('process output streams: %s, %s' % (p.stdout, p.stderr))
p = Popen(cmd, stdout=tempfile.TemporaryFile())
print('process output streams: %s, %s' % (p.stdout, p.stderr))

prints

process output streams: None, None
process output streams: None, None

under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then
the corresponding attribute *is* set: if the last four lines are changed to

p = Popen(cmd, stdout=PIPE)
print('process output streams: %s, %s' % (p.stdout, p.stderr))
p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE)
print('process output streams: %s, %s' % (p.stdout, p.stderr))

then you get

process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None
process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40>

under Python 2.7, and

process output streams: <_io.FileIO name=3 mode='rb'>, None
process output streams: None, <_io.FileIO name=5 mode='rb'>

under Python 3.2.

This seems to me to contradict the principle of least surprise. One would
expect, when a file-like object is passed in as a keyword argument, that it be
placed in the corresponding attribute. That way, if one wants to do
p.stdout.close() (which is necessary in some cases), one doesn't hit an
AttributeError because NoneType has no attribute 'close'.
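
With the current behaviour, one way around this is to keep your own reference to the object you passed in and use that instead of the attribute. A minimal sketch (it runs the current interpreter as the child, purely for portability of the example):

```python
import subprocess
import sys
import tempfile

with tempfile.TemporaryFile() as out:
    # Pass our own file object as the child's stdout.
    p = subprocess.Popen([sys.executable, "-c", "print('hello')"], stdout=out)
    p.wait()
    attr = p.stdout          # None: Popen did not create this stream itself
    out.seek(0)              # so use the reference we already hold
    captured = out.read()

print(attr, captured)
```

The with block closes the file, so no call to p.stdout.close() is needed.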

This seems like it might be a bug, but if so it does seem rather egregious: can
someone tell me if there is a good design reason for the current behaviour? If
there isn't one, I'll raise an issue.


Regards,

Vinay Sajip

From benjamin at  Sat Jan  7 22:47:50 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 7 Jan 2012 16:47:50 -0500
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/7 "Martin v. Löwis" <martin at>:
> I just tried porting Python as a Metro (Windows 8) App, and failed.

Is this required for Python to run on Windows 8?

Sorry if that's a dumb question. I'm not sure if "Metro App" is a
special class of application.


From martin at  Sat Jan  7 23:07:33 2012
From: martin at (martin at
Date: Sat, 07 Jan 2012 23:07:33 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

Quoting Benjamin Peterson <benjamin at>:

> 2012/1/7 "Martin v. Löwis" <martin at>:
>> I just tried porting Python as a Metro (Windows 8) App, and failed.
> Is this required for Python to run on Windows 8?

No. Existing applications ("desktop applications") will continue to work
unmodified. Metro-style apps are primarily intended for smart phones and
tablet PCs, and will be distributed through the Windows app store. The
current VS prerelease supports both Intel and ARM processors for Apps.

A related question is whether Python will compile unmodified with Visual
Studio 11. Although I had some difficulties with that also so far, I expect
that this will ultimately work (although not unmodified - the project files
need to be updated, as will the packaging process).

A then-related question is whether Python 3.3 should be compiled with Visual
Studio 11. I'd still be in favor of that, provided Microsoft manages to
release that soon enough.


From brian at  Sat Jan  7 23:52:44 2012
From: brian at (Brian Curtin)
Date: Sat, 7 Jan 2012 16:52:44 -0600
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 7, 2012 at 16:07,  <martin at> wrote:
> A then-related question is whether Python 3.3 should be compiled with Visual
> Studio 11. I'd still be in favor of that, provided Microsoft manages to
> release that soon enough.

I'm guessing the change would have to be done before the first beta?
It would have to be released awfully soon, and I haven't heard an
estimated release date as of yet.

I currently have the default branch mostly ported to VS 2010 save for
a number of failed tests, FWIW.

From eliben at  Sat Jan  7 23:56:20 2012
From: eliben at (Eli Bendersky)
Date: Sun, 8 Jan 2012 00:56:20 +0200
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

> A then-related question is whether Python 3.3 should be compiled with
> Visual Studio 11. I'd still be in favor of that, provided Microsoft
> manages to release that soon enough.

Martin, I assume you mean the Express version of Visual Studio 11 here,
right?


From solipsis at  Sat Jan  7 23:57:29 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 7 Jan 2012 23:57:29 +0100
Subject: [Python-Dev] Python as a Metro-style App
References: <>
Message-ID: <>

On Sat, 07 Jan 2012 18:57:41 +0100
"Martin v. Löwis" <martin at> wrote:
> For example, everything
> related to subprocess creation would not work; none of the
> byte-oriented file API seems to be present, and a number of file
> operation functions are absent as well (such as MoveFile).

When you say MoveFile is absent, is MoveFileEx supported instead?
Or is moving files just totally impossible?

Depending on the extent of removed/disabled functionality, it might not
be very interesting to have a Metro port at all.

> I'm not sure whether such a change would be accepted, in particular as
> Microsoft might fix the bug in the compiler until the final release
> of Windows 8.

I would hope they finally support compiling C code...



From tjreedy at  Sun Jan  8 00:38:08 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 07 Jan 2012 18:38:08 -0500
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <jeal19$3ij$>

On 1/7/2012 4:47 PM, Benjamin Peterson wrote:
> 2012/1/7 "Martin v. Löwis" <martin at>:
>> I just tried porting Python as a Metro (Windows 8) App, and failed.
> Is this required for Python to run on Windows 8?

No, normal 'desktop' programs will still run in desktop mode.

> Sorry if that's a dumb question. I'm not sure if "Metro App" is a
> special class of application.

Yes. They are basically 'phone/touchpad' apps, and will be managed in
more or less the same way. They will probably only be available
through MS storefront, after vetting by MS. Only Metro Apps will survive
a system Refresh, along with user data. Traditional unvetted,
direct-from-supplier, desktop apps will be wiped because they might be

Terry Jan Reedy

From paul at  Sun Jan  8 00:47:59 2012
From: paul at (Paul Smedley)
Date: Sun, 08 Jan 2012 10:17:59 +1030
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je92jk$mou$>
References: <je3onm$57p$>
	<je4vjj$dtf$> <je5dal$co1$>
	<je90sb$eb8$> <je92jk$mou$>
Message-ID: <jealjg$6c6$>

Hi Terry,

On 07/01/12 19:47, Terry Reedy wrote:
> On 1/7/2012 3:48 AM, Paul Smedley wrote:
>> using _init_posix() for 'os2' instead of _init_non_posix is the fix for
>> this.
>> also needs the following changes:
>> --- \dev\Python-2.7.2-o\Lib\ 2012-01-06 19:27:14.000000000
>> +1030
>> +++ 2012-01-07 19:03:00.000000000 +1030
>> @@ -46,7 +46,7 @@
>> 'scripts': '{base}/Scripts',
>> 'data' : '{base}',
>> },
>> - 'os2_home': {
>> + 'os2_user': {
>> 'stdlib': '{userbase}/lib/python{py_version_short}',
>> 'platstdlib': '{userbase}/lib/python{py_version_short}',
>> 'purelib': '{userbase}/lib/python{py_version_short}/site-packages',
>> @@ -413,9 +413,9 @@
>> _CONFIG_VARS['platbase'] = _EXEC_PREFIX
>> _CONFIG_VARS['projectbase'] = _PROJECT_BASE
>> - if in ('nt', 'os2'):
>> + if in ('nt'):
>> _init_non_posix(_CONFIG_VARS)
>> - if == 'posix':
>> + if in ('posix', 'os2'):
>> _init_posix(_CONFIG_VARS)
> Submit a patch on the tracker, preferably as a file rather than cut and
> paste.
Will do right now.



From tjreedy at  Sun Jan  8 01:02:08 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 07 Jan 2012 19:02:08 -0500
Subject: [Python-Dev] A question about the subprocess implementation
In-Reply-To: <>
References: <>
Message-ID: <jeame8$asu$>

On 1/7/2012 4:25 PM, Vinay Sajip wrote:
> The subprocess.Popen constructor takes stdin, stdout and stderr keyword
> arguments which are supposed to represent the file handles of the child process.
> The object also has stdin, stdout and stderr attributes, which one would naively
> expect to correspond to the passed in values, except where you pass in e.g.
> subprocess.PIPE (in which case the corresponding attribute would be set to an
> actual stream or descriptor).
> However, in common cases, even when keyword arguments are passed in, the
> corresponding attributes are set to None. The following script
> import os
> from subprocess import Popen, PIPE
> import tempfile
> cmd = 'ls /tmp'.split()
> p = Popen(cmd, stdout=open(os.devnull, 'w+b'))
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> p = Popen(cmd, stdout=tempfile.TemporaryFile())
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> prints
> process output streams: None, None
> process output streams: None, None
> under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then
> the corresponding attribute *is* set: if the last four lines are changed to
> p = Popen(cmd, stdout=PIPE)
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE)
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> then you get
> process output streams:<open file '<fdopen>', mode 'rb' at 0x2088660>, None
> process output streams: None,<open file '<fdopen>', mode 'rb' at 0x2088e40>
> under Python 2.7, and
> process output streams:<_io.FileIO name=3 mode='rb'>, None
> process output streams: None,<_io.FileIO name=5 mode='rb'>
> This seems to me to contradict the principle of least surprise. One would
> expect, when an file-like object is passed in as a keyword argument, that it be
> placed in the corresponding attribute.

The behavior matches the doc: Popen.stdin
If the stdin argument was PIPE, this attribute is a file object that 
provides input to the child process. Otherwise, it is None.
-- ditto for Popen.stdout, .stderr

> That way, if one wants to do
> p.stdout.close() (which is necessary in some cases), one doesn't hit an
> AttributeError because NoneType has no attribute 'close'.

I believe you are expected to keep a reference to anything you pass in.

pout = open(os.devnull, 'w+b')
p = Popen(cmd, stdout=pout, stderr=PIPE)

The attributes were added for the case when you do not otherwise have 

> This seems like it might be a bug, but if so it does seem rather egregious:

It would be egregious if it were a bug, but it is not.

> can someone tell me if there is a good design reason for the current
> behaviour? If there isn't one, I'll raise an issue.

That seems like a possibly reasonable enhancement request. But the 
counterargument might be that you have to separately keep track of the 
need to close anyway. Or that you should do things like

with open(os.devnull, 'w+b') as pout:
     p = Popen(cmd, stdout=pout, stderr=PIPE)
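
A minimal runnable sketch of that keep-your-own-reference pattern, assuming
Python 3 and using the running interpreter as a stand-in child command (the
child command and the variable names stdout_attr and stderr_attr are
illustrative assumptions, not part of the original suggestion):

```python
import os
import sys
from subprocess import Popen, PIPE

# Stand-in child: writes a short message to its stderr only.
cmd = [sys.executable, "-c", "import sys; sys.stderr.write('oops')"]

# The caller owns pout, so the with block closes it; Popen never exposes it.
with open(os.devnull, "wb") as pout:
    p = Popen(cmd, stdout=pout, stderr=PIPE)
    _, err = p.communicate()      # reads the parent's end of the stderr pipe
    stdout_attr = p.stdout        # None: stdout was a caller-supplied file
    stderr_attr = p.stderr        # a file object: stderr was PIPE

print(stdout_attr, err)
```

Here only the PIPE stream shows up as an attribute, which is the documented
behaviour quoted above.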

Terry Jan Reedy

From p.f.moore at  Sun Jan  8 01:04:38 2012
From: p.f.moore at (Paul Moore)
Date: Sun, 8 Jan 2012 00:04:38 +0000
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 January 2012 22:56, Eli Bendersky <eliben at> wrote:
>> A then-related question is whether Python 3.3 should be compiled with
>> Visual Studio 11. I'd still be in favor of that, provided Microsoft
>> manages to release that soon enough.
> Martin, I assume you mean the Express version of Visual Studio 11 here,
> right?

I would assume that Express should work, but the
distributed binaries will use the full version (IIUC, the official
distribution uses some optimisations not present in Express - Profile
Guided Optimisation, I believe).


From brian at  Sun Jan  8 01:11:22 2012
From: brian at (Brian Curtin)
Date: Sat, 7 Jan 2012 18:11:22 -0600
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 7, 2012 at 18:04, Paul Moore <p.f.moore at> wrote:
> On 7 January 2012 22:56, Eli Bendersky <eliben at> wrote:
>>> A then-related question is whether Python 3.3 should be compiled with
>>> Visual Studio 11. I'd still be in favor of that, provided Microsoft
>>> manages to release that soon enough.
>> Martin, I assume you mean the Express version of Visual Studio 11 here,
>> right?
> I would assume that Express should work, but the
> distributed binaries will use the full version (IIUC, the official
> distribution uses some optimisations not present in Express - Profile
> Guided Optimisation, I believe).

The bigger issue is how Express doesn't (officially) support x64
builds, unless that's changing in VS11.

Perhaps this is better for another topic, but is anyone using the PGO
stuff? I know we have PGInstrument and PGUpdate build configurations
but I've never seen them mentioned anywhere.

From nyamatongwe at  Sun Jan  8 01:12:08 2012
From: nyamatongwe at (Neil Hodgson)
Date: Sun, 8 Jan 2012 11:12:08 +1100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

Antoine Pitrou:

> When you say MoveFile is absent, is MoveFileEx supported instead?

   WinRT strongly prefers asynchronous methods for all lengthy
operations. The most likely call to use for moving files is
StorageFile.MoveAsync.

> Depending on the extent of removed/disabled functionality, it might not
> be very interesting to have a Metro port at all.

   Asynchronous APIs will become much more important on all platforms
in the future to ensure responsive user interfaces. Python should not
be left behind.


From mwm at  Sun Jan  8 01:14:06 2012
From: mwm at (Mike Meyer)
Date: Sat, 7 Jan 2012 16:14:06 -0800
Subject: [Python-Dev] A question about the subprocess implementation
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 7 Jan 2012 21:25:37 +0000 (UTC)
Vinay Sajip <vinay_sajip at> wrote:

> The subprocess.Popen constructor takes stdin, stdout and stderr keyword
> arguments which are supposed to represent the file handles of the child process.
> The object also has stdin, stdout and stderr attributes, which one would naively
> expect to correspond to the passed in values, except where you pass in e.g.
> subprocess.PIPE (in which case the corresponding attribute would be set to an
> actual stream or descriptor).
> However, in common cases, even when keyword arguments are passed in, the
> corresponding attributes are set to None. The following script

Note that this is documented behavior for these attributes.

> This seems to me to contradict the principle of least surprise. One
> would expect, when an file-like object is passed in as a keyword
> argument, that it be placed in the corresponding attribute.

Since the only reason they exist is so you can access your end of a
pipe, setting them to anything would seem to be a bug. I'd argue that
their existence is more a pola violation than them having the value
None. But None is easier than a call to hasattr.

> That way, if one wants to do p.stdout.close() (which is necessary in
> some cases), one doesn't hit an AttributeError because NoneType has
> no attribute 'close'.

You can close the object you passed in if it wasn't PIPE. If you
passed in PIPE, the object has to be exposed some way, otherwise you
*can't* close it.

This did raise one interesting question, which will go to ideas...


Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -


From solipsis at  Sun Jan  8 01:27:34 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 08 Jan 2012 01:27:34 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <1325982454.3374.1.camel@localhost.localdomain>

> > When you say MoveFile is absent, is MoveFileEx supported instead?
>    WinRT strongly prefers asynchronous methods for all lengthy
> operations. The most likely call to use for moving files is
> StorageFile.MoveAsync.

How does it translate to C?

> > Depending on the extent of removed/disabled functionality, it might not
> > be very interesting to have a Metro port at all.
>    Asynchronous APIs will become much more important on all platforms
> in the future to ensure responsive user interfaces. Python should not
> be left behind.

I'm not sure why "responsive user interfaces" would be more important
today than 10 years ago, but at least I hope Microsoft has found
something more usable than overlapped I/O.



From ncoghlan at  Sun Jan  8 01:32:10 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 8 Jan 2012 10:32:10 +1000
Subject: [Python-Dev] [Python-checkins] cpython: Issue #9993: When the
 source and destination are on different filesystems, 
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 8, 2012 at 4:00 AM, Antoine Pitrou <solipsis at> wrote:
> I'm not sure it was *well* defined (or even defined at all). It seems
> more of a by-product of the implementation. It's not only different
> from mv, but it's inconsistent with itself (the semantics are different
> depending on whether the paths are on the same filesystem or not;
> also, it copied the *file* but erased the *link*).

Yeah, Hynek's explanation pointing out the existing inconsistencies
made sense to me. I have to agree with the point that
symlinks+removable media are almost inevitably going to create
weirdness that isn't easily handled by any means other than
"symlinks=False" :P


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From nyamatongwe at  Sun Jan  8 02:02:21 2012
From: nyamatongwe at (Neil Hodgson)
Date: Sun, 8 Jan 2012 12:02:21 +1100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain>
References: <> <>
Message-ID: <>

Antoine Pitrou:

> How does it translate to C?

   The simplest technique would be to use C++ code to bridge from C to
the API. If you really wanted to you could explicitly call the
function pointer in the COM vtable but doing COM in C is more effort
than calling through C++.

> I'm not sure why "responsive user interfaces" would be more important
> today than 10 years ago, but at least I hope Microsoft has found
> something more usable than overlapped I/O.

   They are more important now due to the use of phones and tablets
together with distant file systems.


From python-dev at  Sun Jan  8 02:19:38 2012
From: python-dev at (Xavier Morel)
Date: Sun, 8 Jan 2012 02:19:38 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain>
References: <> <>
Message-ID: <>

On 2012-01-08, at 01:27 , Antoine Pitrou wrote:
>>> When you say MoveFile is absent, is MoveFileEx supported instead?
>>   WinRT strongly prefers asynchronous methods for all lengthy
>> operations. The most likely call to use for moving files is
>> StorageFile.MoveAsync.
> How does it translate to C?
From what I've read so far, it does not. WinRT inherits from COM (and the .net framework in some parts), so it seems like it's fundamentally an object-based API and the lowest-level language available is two variants of C++ (a template library and an extension to C++ which looks a bit like MS's older C++/CLI).

I have not seen any mention of C bindings for WinRT so far.

From vinay_sajip at  Sun Jan  8 02:48:54 2012
From: vinay_sajip at (Vinay Sajip)
Date: Sun, 8 Jan 2012 01:48:54 +0000 (UTC)
Subject: [Python-Dev] A question about the subprocess implementation
References: <>
Message-ID: <>

Terry Reedy <tjreedy <at>> writes:

> The behavior matches the doc: Popen.stdin
> If the stdin argument was PIPE, this attribute is a file object that 
> provides input to the child process. Otherwise, it is None.

Right, but it's not very helpful, nor especially intuitive. Why does it have to
be None in the case where you pass in a file object? Is there some benefit to be
gained by doing this? Does something bad happen if you store that file object in
proc.stdin / proc.stdout / proc.stderr?

> I believe you are expected to keep a reference to anything you pass in.

This can of course be done, but it can make code less clear than it needs to be.
For example, if you run a subprocess asynchronously, the code that makes the
Popen constructor call can be in a different place to the code that e.g.
captures process output after completion. For that code to know how the Popen
was constructed seems to make coupling overly strong.

> That seems like a possibly reasonable enhancement request. But the
> counterargument might be that you have to separately keep track of the
> need to close anyway.

It may be that the close() needs to be called whether you passed PIPE in, or a
file-like object - (a) because of the need to receive and handle SIGPIPE in
command pipelines, and (b) because it's e.g. set to a pipe you constructed
yourself, and you need to close the write end before you can issue an unsized
read on the read end. So the close logic would have to do e.g.

if proc.stdout is None:
    # pull out the reference from some other place and then close it

rather than just

proc.stdout.close()

It's doable, of course. The with construction you suggested isn't usable in the
general case, where the close() code is in a different place from the code which
fires off the subprocess.

Of course, since the behaviour matches the docs it would be an enhancement
request rather than a bug report. I was hoping someone could enlighten me as to
the *reason* for the current behaviour ... as it is, subprocess comes in for
some stick in the community for being "hard to use" ...


Vinay Sajip

From vinay_sajip at  Sun Jan  8 03:06:33 2012
From: vinay_sajip at (Vinay Sajip)
Date: Sun, 8 Jan 2012 02:06:33 +0000 (UTC)
Subject: [Python-Dev] A question about the subprocess implementation
References: <>
Message-ID: <>

Mike Meyer <mwm <at>> writes:

> Since the only reason they exist is so you can access your end of a
> pipe, setting them to anything would seem to be a bug. I'd argue that
> their existence is more a pola violation than them having the value
> None. But None is easier than a call to hasattr.

I don't follow your reasoning, re. why setting them to a handle used for
subprocess output would be a bug - it's logically the same as the PIPE case. For
example, I might have a pipe (say, constructed using os.pipe()) whose write end
is intended for the subprocess to output to, and whose read end I want to hand
off to some other code to read the output from the subprocess. However, if that
other code does a read() on that pipe, it will hang until the write handle for
the pipe is closed. So, once the subprocess has terminated, I need to close the
write handle. The actual reading might be done not in my code but in some client
code of my code. While I could use some other place to store it, where's the
problem in storing it in proc.stdout or proc.stderr? 
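
The scenario can be sketched as follows, assuming Python 3 on a POSIX-like
system and using the running interpreter as a stand-in child (the child
command and the names read_fd, write_fd and output are illustrative
assumptions):

```python
import os
import subprocess
import sys

# A pipe created by the caller: the child writes, someone else reads.
read_fd, write_fd = os.pipe()

proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=write_fd,
)
print(proc.stdout)   # None, because stdout was not subprocess.PIPE

proc.wait()
# The parent still holds the write end. Until it is closed, an unsized
# read() on the read end cannot see end-of-file and would block.
os.close(write_fd)

with os.fdopen(read_fd, "rb") as reader:
    output = reader.read()

print(output)
```

The os.close(write_fd) call is the step that has to live somewhere after the
child terminates; here the caller has to remember write_fd itself.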

> You can close the object you passed in if it wasn't PIPE. If you
> passed in PIPE, the object has to be exposed some way, otherwise you
> *can't* close it.

Yes, I'm not disputing that I need to keep track of it - just that proc.stdout
seems a good place to keep it. That way, the closing code can be de-coupled from
the code that sets up the subprocess. A use case for this is when you want the
subprocess and the parent to run concurrently/asynchronously, so the proc.wait()
and subsequent processing happens at a different time and place to the kick-off.


Vinay Sajip

From dasdasich at  Sun Jan  8 03:29:45 2012
From: dasdasich at (=?utf-8?Q?Daniel_Neuh=C3=A4user?=)
Date: Sun, 8 Jan 2012 03:29:45 +0100
Subject: [Python-Dev] A question about the subprocess implementation
In-Reply-To: <>
References: <>
Message-ID: <>

That's documented behaviour nonetheless. I would agree that the behaviour is a stupid one (not knowing the reason for it); even so it cannot be changed in a backwards compatible way.

On 07.01.2012 at 22:25, Vinay Sajip <vinay_sajip at> wrote:

> The subprocess.Popen constructor takes stdin, stdout and stderr keyword
> arguments which are supposed to represent the file handles of the child process.
> The object also has stdin, stdout and stderr attributes, which one would naively
> expect to correspond to the passed in values, except where you pass in e.g.
> subprocess.PIPE (in which case the corresponding attribute would be set to an
> actual stream or descriptor).
> However, in common cases, even when keyword arguments are passed in, the
> corresponding attributes are set to None. The following script
> import os
> from subprocess import Popen, PIPE
> import tempfile
> cmd = 'ls /tmp'.split()
> p = Popen(cmd, stdout=open(os.devnull, 'w+b'))
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> p = Popen(cmd, stdout=tempfile.TemporaryFile())
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> prints
> process output streams: None, None
> process output streams: None, None
> under both Python 2.7 and 3.2. However, if subprocess.PIPE is passed in, then
> the corresponding attribute *is* set: if the last four lines are changed to
> p = Popen(cmd, stdout=PIPE)
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> p = Popen(cmd, stdout=open(os.devnull, 'w+b'), stderr=PIPE)
> print('process output streams: %s, %s' % (p.stdout, p.stderr))
> then you get
> process output streams: <open file '<fdopen>', mode 'rb' at 0x2088660>, None
> process output streams: None, <open file '<fdopen>', mode 'rb' at 0x2088e40>
> under Python 2.7, and
> process output streams: <_io.FileIO name=3 mode='rb'>, None
> process output streams: None, <_io.FileIO name=5 mode='rb'>
> This seems to me to contradict the principle of least surprise. One would
> expect, when an file-like object is passed in as a keyword argument, that it be
> placed in the corresponding attribute. That way, if one wants to do
> p.stdout.close() (which is necessary in some cases), one doesn't hit an
> AttributeError because NoneType has no attribute 'close'.
> This seems like it might be a bug, but if so it does seem rather egregious: can
> someone tell me if there is a good design reason for the current behaviour? If
> there isn't one, I'll raise an issue.
> Regards,
> Vinay Sajip

From vandry at TZoNE.ORG  Sun Jan  8 03:28:56 2012
From: vandry at TZoNE.ORG (Phil Vandry)
Date: Sun, 08 Jan 2012 11:28:56 +0900
Subject: [Python-Dev] A question about the subprocess implementation
In-Reply-To: <>
References: <>
Message-ID: <4F08FF68.1020406@TZoNE.ORG>

On 2012-01-08 10:48 , Vinay Sajip wrote:
> Terry Reedy<tjreedy<at>>  writes:
>> The behavior matches the doc: Popen.stdin
>> If the stdin argument was PIPE, this attribute is a file object that
>> provides input to the child process. Otherwise, it is None.
> Right, but it's not very helpful, nor especially intuitive. Why does it have to
> be None in the case where you pass in a file object? Is there some benefit to be
> gained by doing this? Does something bad happen if you store that file object in
> proc.stdin / proc.stdout / proc.stderr?

proc.stdin, proc.stdout, and proc.stderr aren't meant to be a reference 
to the file that got connected to the subprocess' stdin/stdout/stderr. 
They are meant to be a reference to the OTHER END of the pipe that got 
connected. When you pass in a normal file object there is no such thing 
as the OTHER END of that file. The value None reflects this fact, and 
should continue to do so.
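
A small sketch of what "the OTHER END" means in practice, assuming Python 3
and using the running interpreter as a stand-in child that upper-cases its
input (the child code and variable names are illustrative assumptions):

```python
import sys
from subprocess import Popen, PIPE

# Stand-in child: copies stdin to stdout in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
p = Popen([sys.executable, "-c", child_code], stdin=PIPE, stdout=PIPE)

# With PIPE, the attributes are the parent's ends of the two pipes.
has_parent_ends = p.stdin is not None and p.stdout is not None
stderr_attr = p.stderr            # None: stderr was not redirected

# communicate() writes to p.stdin, closes it, and reads p.stdout to EOF.
out, _ = p.communicate(b"hello")
print(has_parent_ends, stderr_attr, out)
```

The parent writes into p.stdin and reads from p.stdout; the child's own ends
of those pipes are never visible to the parent.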


From mwm at  Sun Jan  8 03:48:56 2012
From: mwm at (Mike Meyer)
Date: Sat, 7 Jan 2012 18:48:56 -0800
Subject: [Python-Dev] A question about the subprocess implementation
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, 8 Jan 2012 02:06:33 +0000 (UTC)
Vinay Sajip <vinay_sajip at> wrote:

> Mike Meyer <mwm <at>> writes:
> > Since the only reason they exist is so you can access your end of a
> > pipe, setting them to anything would seem to be a bug. I'd argue that
> > their existence is more a pola violation than them having the value
> > None. But None is easier than a call to hasattr.
> I don't follow your reasoning, re. why setting them to a handle used for
> subprocess output would be a bug - it's logically the same as the PIPE case.

No, it isn't. In the PIPE case, the value of the attributes isn't
otherwise available to the caller.

I think you're not following because you're thinking about what you
want to do with the attributes:

> storing it [the fd] in proc.stdout or proc.stderr?

As opposed to what they're used for, which is communicating the fd's
created in the PIPE case to the caller.  Would you feel the same way
if they were given the more accurate names "pipe_input" and

> > You can close the object you passed in if it wasn't PIPE. If you
> > passed in PIPE, the object has to be exposed some way, otherwise you
> > *can't* close it.
> Yes, I'm not disputing that I need to keep track of it - just that proc.stdout
> seems a good place to keep it.

I disagree. Having the proc object keep track of these things for you
is making it more complicated (by the admittedly trivial change of
assigning those two attributes when they aren't used) so you can make
your process creation code less complicated (by the equally trivial
change of assigning the values in those two attributes when they are
used). Since only the caller knows when this complication is needed,
that's the logical place to put it.

> That way, the closing code can be de-coupled from the code that sets
> up the subprocess.

There are other ways to do that. It's still the same tradeoff - you're
making the proc code more complicated to make the calling code
simpler, even though only the calling code knows if that's needed.
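
One such way, sketched under assumptions: a small Popen subclass in the
calling code that remembers the streams it was given, leaving the documented
stdin/stdout/stderr attributes untouched. TrackingPopen and its given_*
attributes are hypothetical names, not part of the subprocess module, and the
sketch assumes Python 3:

```python
import subprocess
import sys
import tempfile


class TrackingPopen(subprocess.Popen):
    """Hypothetical helper that records the caller-supplied streams."""

    def __init__(self, args, stdin=None, stdout=None, stderr=None, **kwargs):
        # Remember what the caller passed, whether PIPE, a file, or None.
        self.given_stdin = stdin
        self.given_stdout = stdout
        self.given_stderr = stderr
        super().__init__(args, stdin=stdin, stdout=stdout, stderr=stderr,
                         **kwargs)


# Kick-off site: only this code knows a temporary file is being used.
proc = TrackingPopen([sys.executable, "-c", "print('done')"],
                     stdout=tempfile.TemporaryFile())

# Later, possibly elsewhere: recover the stream from the process object.
proc.wait()
stream = proc.given_stdout
stream.seek(0)
captured = stream.read()
stream.close()
print(proc.stdout, captured)
```

This keeps the extra bookkeeping on the caller's side, which is the tradeoff
described above, while still letting the closing code find the stream.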

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From martin at  Sun Jan  8 04:17:14 2012
From: martin at (martin at
Date: Sun, 08 Jan 2012 04:17:14 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

Quoting Eli Bendersky <eliben at>:

>> A then-related question is whether Python 3.3 should be compiled with
>> Visual Studio 11. I'd still be in favor of that, provided Microsoft
>> manages to release that soon enough.
> Martin, I assume you mean the Express version of Visual Studio 11 here,
> right?

*Here*, I mean "Visual Studio 11, any edition". I don't think the edition
matters for determining what version the project files have - any edition
will be able to read the project files, Express or not.

If you are specifically asking whether I would make the release of the
express edition a prerequisite to releasing Python: no, I wouldn't. I would
expect that Microsoft releases the express edition along with or soon after
the commercial editions, and the commercial edition is sufficient for running
the Python release process.


From martin at  Sun Jan  8 04:35:17 2012
From: martin at (martin at
Date: Sun, 08 Jan 2012 04:35:17 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

> When you say MoveFile is absent, is MoveFileEx supported instead?
> Or is moving files just totally impossible?

I can't check the SDK headers right now, but according to the online
documentation, MoveFileExW is indeed available. I'm not sure whether
you are allowed to pass arbitrary file names in an App, though.

> Depending on the extent of removed/disabled functionality, it might not
> be very interesting to have a Metro port at all.

I'm not so sure. Even if the low-level Win32 API was not available, you
might still be able to do useful things with the higher-level APIs, such
as Windows.Storage (in case of file access). If you use, say,
Windows.Storage.ApplicationData.RoamingSettings in your app, you should
not actually worry what the file is named on disk (or whether there is
a spinning disk in the system at all, which there probably isn't).


From martin at  Sun Jan  8 04:38:38 2012
From: martin at (martin at
Date: Sun, 08 Jan 2012 04:38:38 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

> Perhaps this is better for another topic, but is anyone using the PGO
> stuff? I know we have PGInstrument and PGUpdate build configurations
> but I've never seen them mentioned anywhere.

I'm using them in the 32-bit builds. I don't use them for the 64-bit
builds, as the build machine was a 32-bit system (but perhaps I will start
using PGO for Win64 with 3.3).


From martin at  Sun Jan  8 04:42:46 2012
From: martin at (martin at
Date: Sun, 08 Jan 2012 04:42:46 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <1325982454.3374.1.camel@localhost.localdomain>
References: <> <>
Message-ID: <>

Quoting Antoine Pitrou <solipsis at>:

>> > When you say MoveFile is absent, is MoveFileEx supported instead?
>>    WinRT strongly prefers asynchronous methods for all lengthy
>> operations. The most likely call to use for moving files is
>> StorageFile.MoveAsync.
> How does it translate to C?

Not sure whether you are asking literally for *C*: please remember that
my original report said that C is apparently not currently supported for

In any case, for native C++ code, do

   StorageFile ^the_file = something();
   the_file->MoveAsync(destinationFolder, "newfile.txt");

This may look like managed C++ to you, but it really compiles into
native code.


From paul at  Sun Jan  8 09:37:48 2012
From: paul at (Paul Smedley)
Date: Sun, 08 Jan 2012 19:07:48 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
In-Reply-To: <je7qfi$6at$>
References: <je6con$r2l$> <je7joa$ndh$>
	<> <je7qfi$6at$>
Message-ID: <jebkkt$qpm$>

On 07/01/12 08:22, Paul Smedley wrote:
>> For the purpose of debugging you could *not* ignore the error and
>> instead print it out or bail out.
> Thanks - commenting out the ImportErrors block, I get:
> ImportError: No module named encodings

OK got through this - PYTHONPATH in makefile was borked for OS/2 (: 
separators vs ; which don't work so well with drive letters)

Now having trouble importing the _io module even though it's builtin <sigh>

From paul at  Sun Jan  8 09:42:48 2012
From: paul at (Paul Smedley)
Date: Sun, 08 Jan 2012 19:12:48 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
In-Reply-To: <jebkkt$qpm$>
References: <je6con$r2l$> <je7joa$ndh$>
	<je7qfi$6at$> <jebkkt$qpm$>
Message-ID: <jebku8$sl7$>

On 08/01/12 19:07, Paul Smedley wrote:
> On 07/01/12 08:22, Paul Smedley wrote:
>>> For the purpose of debugging you could *not* ignore the error and
>>> instead print it out or bail out.
>> Thanks - commenting out the ImportErrors block, I get:
>> ImportError: No module named encodings
> OK got through this - PYTHONPATH in makefile was borked for OS/2 (:
> separators vs ; which don't work so well with drive letters)
> Now having trouble importing the _io module even though it's builtin <sigh>
to be clear, the error is:
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
   File "U:/DEV/python-3.2.2/Lib/", line 60, in <module>

Killed by SIGABRT

From paul at  Sun Jan  8 09:59:59 2012
From: paul at (Paul Smedley)
Date: Sun, 08 Jan 2012 19:29:59 +1030
Subject: [Python-Dev] What's required to keep OS/2 support in Python 3.3
In-Reply-To: <jebku8$sl7$>
References: <je6con$r2l$> <je7joa$ndh$>
	<je7qfi$6at$> <jebkkt$qpm$>
Message-ID: <jebluf$1mo$>

On 08/01/12 19:12, Paul Smedley wrote:
> On 08/01/12 19:07, Paul Smedley wrote:
>> On 07/01/12 08:22, Paul Smedley wrote:
>>>> For the purpose of debugging you could *not* ignore the error and
>>>> instead print it out or bail out.
>>> Thanks - commenting out the ImportErrors block, I get:
>>> ImportError: No module named encodings
>> OK got through this - PYTHONPATH in makefile was borked for OS/2 (:
>> separators vs ; which don't work so well with drive letters)
>> Now having trouble importing the _io module even though it's builtin
>> <sigh>
> to be clear, the error is:
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> Traceback (most recent call last):
> File "U:/DEV/python-3.2.2/Lib/", line 60, in <module>
> Killed by SIGABRT
  and it's dying in _iomodule.c at:

     /* put os in the module state */
     state->os_module = PyImport_ImportModule("os");
     if (state->os_module == NULL) {
         fprintf(stderr, "_iomodule fail\n");
         goto fail;
     }

for some reason.. at least I'm slowly making progress :P (I think)



From neologix at  Sun Jan  8 12:32:08 2012
From: neologix at (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Sun, 8 Jan 2012 12:32:08 +0100
Subject: [Python-Dev] usefulness of Python version of threading.RLock
In-Reply-To: <>
References: <>
Message-ID: <>

> The yes/no answer is "No, we can't drop it".

Thanks, that's a clear answer :-)

> I'm not convinced of the benefits of removing the pure Python RLock
> implementation

As noted, this issue with signal handlers is more general, so this
wouldn't solve the problem at hand. I just wanted to know whether we
could remove this "duplicate" code, but since it might be used by some
implementations, it's best to keep it.

From vinay_sajip at  Sun Jan  8 13:09:38 2012
From: vinay_sajip at (Vinay Sajip)
Date: Sun, 8 Jan 2012 12:09:38 +0000 (UTC)
Subject: [Python-Dev] A question about the subprocess implementation
References: <>
Message-ID: <>

Phil Vandry <vandry <at> TZoNE.ORG> writes:

> proc.stdin, proc.stdout, and proc.stderr aren't meant to be a reference 
> to the file that got connected to the subprocess' stdin/stdout/stderr. 
> They are meant to be a reference to the OTHER END of the pipe that got 
> connected. 

Of course, and I've been using them like that, in general. But reading those two
sentences above made the light bulb come on :-)
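
To illustrate the point about pipe ends, here is a minimal sketch. The child
command is made up for this example and is not taken from the thread. With
stdin=PIPE and stdout=PIPE, proc.stdin is the writable parent side of the
child's stdin pipe, and proc.stdout is the readable parent side of the
child's stdout pipe.

```python
import subprocess
import sys

# Hypothetical child: reads all of its stdin and writes it back upper-cased.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# The parent holds the OTHER ends of the pipes: it writes into the child's
# stdin and reads from the child's stdout.
parent_can_write = proc.stdin.writable()
parent_can_read = proc.stdout.readable()

# communicate() sends the data, closes the child's stdin, and collects output.
output, _ = proc.communicate(b"hello")
```

Here parent_can_write and parent_can_read describe the parent's view of the
pipes, which is the reverse of how the child sees its own stdin and stdout.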


Vinay Sajip

From jimjjewett at  Sun Jan  8 23:33:32 2012
From: jimjjewett at (Jim Jewett)
Date: Sun, 8 Jan 2012 17:33:32 -0500
Subject: [Python-Dev]  Hash collision security issue (now public)
Message-ID: <>

Stefan Behnel wrote:

> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content.

They SHOULD NOT represent the same content; comparing two strings
currently requires converting them to canonical form, which means the
smallest format (of those three) that works.

If it can be represented in PyUnicode_1BYTE_KIND, then representations
using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as
canonical, won't be created by Python itself, and already compare
unequal according to both PyUnicode_RichCompare and stringlib/eq.h (a
shortcut used by dicts).
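
As a rough illustration of "smallest format that works", the helper below
picks the storage width from the highest code point in a string. The name
canonical_kind is made up for this sketch and is not part of CPython; the
thresholds (below 256, below 65536, everything else) follow the 1-byte,
2-byte, and 4-byte kinds described in PEP 393.

```python
def canonical_kind(text):
    """Return the bytes-per-character of the smallest kind that can hold text."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1  # would fit PyUnicode_1BYTE_KIND
    if highest < 0x10000:
        return 2  # would fit PyUnicode_2BYTE_KIND
    return 4      # needs PyUnicode_4BYTE_KIND


kinds = {
    "ascii": canonical_kind("abc"),
    "latin1": canonical_kind("\u00e9"),
    "bmp": canonical_kind("\u20ac"),
    "astral": canonical_kind("\U0001F600"),
}
```

A string whose characters all fit the 1-byte kind but which is stored in a
wider kind is what this thread calls non-canonical.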

That said, I don't think smallest-format is actually enforced with
anything stronger than comments (such as in unicodeobject.h struct
PyASCIIObject) and asserts (mostly calling
_PyUnicode_CheckConsistency).  I don't have any insight on how
prevalent non-conforming strings will be in practice, or whether
supporting their equality will be required as a bugfix.


From brian at  Mon Jan  9 01:36:59 2012
From: brian at (Brian Curtin)
Date: Sun, 8 Jan 2012 18:36:59 -0600
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 8, 2012 at 16:33, Jim Jewett <jimjjewett at> wrote:
> In
> Stefan Behnel wrote:

Can you please configure your mail client to not create new threads
like this? As if this topic wasn't already hard enough to follow, it
now exists across handfuls of threads with the same title.

From ncoghlan at  Mon Jan  9 01:40:18 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 9 Jan 2012 10:40:18 +1000
Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset
 36f2e236c601: For some reason, rewinddir() doesn't work as
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 9, 2012 at 5:31 AM, charles-francois.natali
<python-checkins at> wrote:
> Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as
> it should on OpenIndiana.

Can rewinddir() end up touching the filesystem to retrieve data? I
noticed that your previous change (the one this checkin reverted)
moved it outside the GIL release macros.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From benjamin at  Mon Jan  9 01:43:33 2012
From: benjamin at (Benjamin Peterson)
Date: Sun, 8 Jan 2012 19:43:33 -0500
Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset
 36f2e236c601: For some reason, rewinddir() doesn't work as
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/8 Nick Coghlan <ncoghlan at>:
> On Mon, Jan 9, 2012 at 5:31 AM, charles-francois.natali
> <python-checkins at> wrote:
>> Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as
>> it should on OpenIndiana.
> Can rewinddir() end up touching the filesystem to retrieve data? I
> noticed that your previous change (the one this checkin reverted)
> moved it outside the GIL release macros.

It just resets a position count. (in glibc).


From lists at  Mon Jan  9 02:01:46 2012
From: lists at (Christian Heimes)
Date: Mon, 09 Jan 2012 02:01:46 +0100
Subject: [Python-Dev] py3benchmark not working
Message-ID: <jede9q$n16$>


I tried to compare the py3k baseline with my randomhash branch but the
benchmark suite is failing.

I've followed the instructions

#   hg clone py2benchmarks
#   mkdir py3benchmarks;
#   cd py3benchmarks
#   ../py2benchmarks/ ../py2benchmarks
#   python3.1 -b py3k old_py3k new_py3k

but the suite immediately bails out:

$ ../3.1/python -r -b default ../py3k/python ../randomhash/python
Running 2to3...
INFO:root:Running ../py3k/python lib/2to3/2to3 -f all lib/2to3_data
Traceback (most recent call last):
  File "", line 2236, in <module>
  File "", line 2192, in main
  File "", line 1279, in BM_2to3
    return SimpleBenchmark(Measure2to3, *args, **kwargs)
  File "", line 706, in SimpleBenchmark
    *args, **kwargs)
  File "", line 1275, in Measure2to3
    return MeasureCommand(command, trials, env, options.track_memory)
  File "", line 1223, in MeasureCommand
    CallAndCaptureOutput(command, env=env)
  File "", line 1053, in CallAndCaptureOutput
    raise RuntimeError("Benchmark died: " + str(stderr, 'ascii'))
RuntimeError: Benchmark died: RefactoringTool: Skipping implicit fixer:
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
Traceback (most recent call last):
  File "lib/2to3/2to3", line 5, in <module>
"/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/", line
173, in main
line 700, in refactor
    items, write, doctests_only)
line 294, in refactor
    self.refactor_dir(dir_or_file, write, doctests_only)
line 314, in refactor_dir
    self.refactor_file(fullname, write, doctests_only)
line 741, in refactor_file
    *args, **kwargs)
line 349, in refactor_file
    tree = self.refactor_string(input, filename)
line 381, in refactor_string
    self.refactor_tree(tree, name)
line 455, in refactor_tree
    new = fixer.transform(node, results)
line 43, in transform
    method = self._check_method(node, results)
line 89, in _check_method
    method = getattr(self, "_" + results["method"][0].value.encode("ascii"))
TypeError: Can't convert 'bytes' object to str implicitly


From solipsis at  Mon Jan  9 02:24:42 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 9 Jan 2012 02:24:42 +0100
Subject: [Python-Dev] py3benchmark not working
References: <jede9q$n16$>
Message-ID: <>

On Mon, 09 Jan 2012 02:01:46 +0100
Christian Heimes <lists at> wrote:
> I tried to compare the py3k baseline with my randomhash branch but the
> benchmark suite is failing.
> I've followed the instructions

For the record, you don't really need this. Just run the "2n3"
benchmark set (it works under both 2.x and 3.x). The "py3k" set will
include a couple more/other benchmarks though.



From jdhardy at  Mon Jan  9 07:13:25 2012
From: jdhardy at (Jeff Hardy)
Date: Sun, 8 Jan 2012 22:13:25 -0800
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou <solipsis at> wrote:
> Depending on the extent of removed/disabled functionality, it might not
> be very interesting to have a Metro port at all.

Win 8 is practically a new OS target - the nt module may need to be
replaced with a metro module to handle it well.

Accessing the WinRT APIs directly from Python will also require a set
of Python projections for the API, which should be straightforward to
generate from the WinRT metadata files. I know Dino Viehland did some
work on that; not sure if he can elaborate or not though.

Otherwise, IronPython would be the only option for writing Metro apps
in Python - not that I'd be *horribly* upset at that :). IronPython is
slowly growing Metro support, and it seems like most things will work,
but the .NET framework shields it from a lot of the WinRT guts.

- Jeff

From stefan_ml at  Mon Jan  9 09:13:15 2012
From: stefan_ml at (Stefan Behnel)
Date: Mon, 09 Jan 2012 09:13:15 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <jee7is$3ff$>

Jim Jewett, 08.01.2012 23:33:
> Stefan Behnel wrote:
>> Admittedly, this may require some adaptation for the PEP393 unicode memory
>> layout in order to produce identical hashes for all three representations
>> if they represent the same content.
> They SHOULD NOT represent the same content; comparing two strings
> currently requires converting them to canonical form, which means the
> smallest format (of those three) that works.
> [...]
> That said, I don't think smallest-format is actually enforced with
> anything stronger than comments (such as in unicodeobject.h struct
> PyASCIIObject) and asserts (mostly calling
> _PyUnicode_CheckConsistency).

That's what I meant. AFAIR, the PEP393 discussions at some point brought up
the suspicion that third party code may end up generating Unicode strings
that do not comply with that "invariant". So internal code shouldn't
strictly rely on it when it deals with user provided data. One example is
the "unequal kinds" optimisation in equality comparison, which, if I'm not
mistaken, wasn't implemented, due to exactly this reasoning. The same
applies to hashing then.


From neologix at  Mon Jan  9 09:23:30 2012
From: neologix at (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Mon, 9 Jan 2012 09:23:30 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset
 36f2e236c601: For some reason, rewinddir() doesn't work as
In-Reply-To: <>
References: <>
Message-ID: <>

>> Can rewinddir() end up touching the filesystem to retrieve data? I
>> noticed that your previous change (the one this checkin reverted)
>> moved it outside the GIL release macros.
> It just resets a position count. (in glibc).

Actually, it also calls lseek() on the directory FD:;a=blob;f=sysdeps/unix/rewinddir.c;hb=HEAD

But lseek() doesn't (normally) perform I/O, it just sets an offset in
the kernel file structure:

For example, it's not documented to return EINTR.

Now, one could imagine that the kernel could do some read-ahead or
some other magic things when passed SEEK_DATA or SEEK_HOLE, but
seeking at the beginning of a directory FD should be fast.
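
A small sketch of the lseek() behaviour described above, using a regular
temporary file rather than a directory FD (an assumption made to keep the
example portable): seeking back to offset 0 only moves the file offset, and
the next read starts from the beginning again.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"entry-1\nentry-2\n")

    # First pass: go to the start and read everything.
    os.lseek(fd, 0, os.SEEK_SET)
    first_pass = os.read(fd, 64)

    # "Rewind": lseek() returns the new offset, measured from the start.
    new_offset = os.lseek(fd, 0, os.SEEK_SET)
    second_pass = os.read(fd, 64)
finally:
    os.close(fd)
    os.remove(path)
```

This says nothing about how a particular libc implements rewinddir(); it only
shows the offset-reset semantics of lseek() itself.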

Anyway, I ended up reverting this change, because for some reason this
broke OpenIndiana buildbots (maybe rewinddir() is a no-op before
readdir() has been called?).



From mark at  Mon Jan  9 09:56:56 2012
From: mark at (Mark Shannon)
Date: Mon, 09 Jan 2012 08:56:56 +0000
Subject: [Python-Dev] py3benchmark not working
In-Reply-To: <jede9q$n16$>
References: <jede9q$n16$>
Message-ID: <>

Christian Heimes wrote:
> Hello,
> I tried to compare the py3k baseline with my randomhash branch but the
> benchmark suite is failing.
> I've followed the instructions
> #   hg clone py2benchmarks
> #   mkdir py3benchmarks;
> #   cd py3benchmarks
> #   ../py2benchmarks/ ../py2benchmarks
> #   python3.1 -b py3k old_py3k new_py3k
> but the suite immediately bails out:
> "/media/ssd/heimes/python/py3benchmarks/lib/2to3/lib2to3/fixes/",
> line 89, in _check_method
>     method = getattr(self, "_" + results["method"][0].value.encode("ascii"))
> TypeError: Can't convert 'bytes' object to str implicitly

You can temporarily "fix" this by removing the .encode("ascii")
from line 89 in lib2to3/fixes/

I'm not sure if this is a bug in 2to3 or the benchmark.
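
The failure can be reproduced in isolation. This is a minimal sketch, assuming
a made-up method name "keys" in place of the value 2to3 pulls out of the parse
tree: under Python 3, .encode("ascii") returns bytes, and concatenating str
with bytes raises TypeError, while dropping the call keeps everything as str.

```python
method_name = "keys"  # hypothetical stand-in for results["method"][0].value

# Original form: str + bytes is rejected by Python 3.
try:
    attribute = "_" + method_name.encode("ascii")
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

# Temporary fix: without .encode("ascii") both operands are str.
fixed_attribute = "_" + method_name
```

Under Python 2 the original expression worked because encoding a str to ASCII
still produced something that could be concatenated with "_".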


From victor.stinner at  Mon Jan  9 10:53:19 2012
From: victor.stinner at (Victor Stinner)
Date: Mon, 9 Jan 2012 10:53:19 +0100
Subject: [Python-Dev] Hash collision security issue (now public)
In-Reply-To: <>
References: <>
Message-ID: <>

> That said, I don't think smallest-format is actually enforced with
> anything stronger than comments (such as in unicodeobject.h struct
> PyASCIIObject) and asserts (mostly calling
> _PyUnicode_CheckConsistency). I don't have any insight on how
> prevalent non-conforming strings will be in practice, or whether
> supporting their equality will be required as a bugfix.

If you are only using Python, you cannot create a string in a non-canonical
form.

If you use the C API, you can create a string in a non-canonical form
using PyUnicode_New() + PyUnicode_WRITE, or
PyUnicode_FromUnicode(NULL, length) (or
PyUnicode_FromStringAndSize(NULL, length)) + direct access to the
Py_UNICODE* string. If you create strings in a non-canonical form, it
is a bug in your application and Python doesn't help you. But how
could Python help you? Expose a function to check your newly created
string? There is already _PyUnicode_CheckConsistency(), which is slow
(O(n)) because it checks each character; it is only used in debug

From victor.stinner at  Mon Jan  9 10:58:25 2012
From: victor.stinner at (Victor Stinner)
Date: Mon, 9 Jan 2012 10:58:25 +0100
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <je90sb$eb8$>
References: <je3onm$57p$>
	<je4vjj$dtf$> <je5dal$co1$>
Message-ID: <>

> -        if in ('nt', 'os2'):
> +        if in ('nt'):

This change is wrong: it should be == 'nt'.


From steve at  Mon Jan  9 11:02:57 2012
From: steve at (Steven D'Aprano)
Date: Mon, 09 Jan 2012 21:02:57 +1100
Subject: [Python-Dev] Compiling 2.7.2 on OS/2
In-Reply-To: <>
References: <je3onm$57p$>	<>	<je4vjj$dtf$>
	<je5dal$co1$>	<je90sb$eb8$>
Message-ID: <>

Victor Stinner wrote:
>> -        if in ('nt', 'os2'):
>> +        if in ('nt'):
> This change is wrong: it should be == 'nt'.

Or possibly in ('nt', ) (note the comma).
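
The difference between the three spellings can be checked directly. In this
sketch, platform_name is a hypothetical stand-in for whatever value the
patched line tests. Parentheses alone do not make a tuple, so in ('nt') is a
substring test against the string 'nt'.

```python
def matches_buggy(platform_name):
    # ('nt') is just the string 'nt', so this is a substring test.
    return platform_name in ('nt')


def matches_equality(platform_name):
    return platform_name == 'nt'


def matches_tuple(platform_name):
    # The trailing comma makes a real one-element tuple.
    return platform_name in ('nt',)


results_for_n = (
    matches_buggy('n'),
    matches_equality('n'),
    matches_tuple('n'),
)
```

For the value 'nt' all three agree; they only diverge for values such as 'n',
't', or the empty string, which are substrings of 'nt'.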


From benjamin at  Mon Jan  9 14:02:53 2012
From: benjamin at (Benjamin Peterson)
Date: Mon, 9 Jan 2012 08:02:53 -0500
Subject: [Python-Dev] [Python-checkins] cpython: Backed out changeset
 36f2e236c601: For some reason, rewinddir() doesn't work as
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/9 Charles-François Natali <neologix at>:
>>> Can rewinddir() end up touching the filesystem to retrieve data? I
>>> noticed that your previous change (the one this checkin reverted)
>>> moved it outside the GIL release macros.
>> It just resets a position count. (in glibc).
> Actually, it also calls lseek() on the directory FD:
> But lseek() doesn't (normally) perform I/O, it just sets an offset in
> the kernel file structure:

Sorry, I should have implied I looked at the kernel source, too. :)


From pasparis at  Mon Jan  9 15:46:04 2012
From: pasparis at (pasparis at
Date: Mon,  9 Jan 2012 15:46:04 +0100 (CET)
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of a
	python Class
Message-ID: <>

An HTML attachment was scrubbed...
URL: <>

From jon at  Mon Jan  9 15:32:13 2012
From: jon at (Jon Wells)
Date: Tue, 10 Jan 2012 01:32:13 +1100
Subject: [Python-Dev] descriptor as instance attribute
Message-ID: <1326119533.16276.54.camel@localhost>

I can't find an answer to this grovelling through get user info. on
descriptors.

Assuming desc() is a data descriptor class why are the following not the
same???

    class poop(object):
        var = desc()

and

    class poop(object):
        def __init__(self):
            self.var = desc()

In the second form the descriptor protocol for access to 'var' is
ignored.

Would seem to not make sense to me.


From phd at  Mon Jan  9 16:51:35 2012
From: phd at (Oleg Broytman)
Date: Mon, 9 Jan 2012 19:51:35 +0400
Subject: [Python-Dev] descriptor as instance attribute
In-Reply-To: <1326119533.16276.54.camel@localhost>
References: <1326119533.16276.54.camel@localhost>
Message-ID: <>


   We are sorry but we cannot help you. This mailing list is to work on
developing Python (adding new features to Python itself and fixing bugs);
if you're having problems learning, understanding or using Python, please
find another forum. Probably python-list/comp.lang.python mailing list/news
group is the best place; there are Python developers who participate in it;
you may get a faster, and probably more complete, answer there. See for other lists/news groups/fora. Thank
you for understanding.

On Tue, Jan 10, 2012 at 01:32:13AM +1100, Jon Wells wrote:
> I can't find an answer to this grovelling through get user info. on
> descriptors.

   Read carefully

> Assuming desc() is a data descriptor class why are the following not the
> same???
>     class poop(object):
>         var = desc()
> and
>     class poop(object):
>         def __init__(self):
>             self.var = desc()
> In the second form the descriptor protocol for access to 'var' is
> ignored. 


...transforms b.x into type(b).__dict__['x'].__get__(b, type(b))..

   Please note the first type(b).
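
A minimal sketch of why the two forms differ. The names Desc, ClassLevel, and
InstanceLevel are made up for this example. Attribute lookup only invokes
__get__ on objects found in the dictionary of the type (or its bases), which
is the "first type(b)" in the expression above; a descriptor stored in the
instance dictionary is returned as an ordinary object.

```python
class Desc:
    """A small data descriptor: it defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "value from __get__"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class ClassLevel:
    var = Desc()  # lives in type(instance).__dict__


class InstanceLevel:
    def __init__(self):
        self.var = Desc()  # lives in instance.__dict__


a = ClassLevel()
b = InstanceLevel()

class_level_result = a.var      # descriptor protocol runs
instance_level_result = b.var   # plain attribute: the Desc object itself

# The explicit form of the lookup quoted above, applied to a.
manual_lookup = type(a).__dict__["var"].__get__(a, type(a))
```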

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From amauryfa at  Mon Jan  9 19:09:19 2012
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Mon, 9 Jan 2012 19:09:19 +0100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <>
References: <>
Message-ID: <>

Good evening,

2012/1/9 <pasparis at>

> I am trying to send a tuple to a method of a Python class and I got a "Run
> failed" from the NetBeans compiler.
> When I want to send a tuple to a simple method in a module it works; when I
> want to send a simple parameter to a method of a class it works also, but not
> a tuple to a method of a class.

This mailing list is for the development *of* python.
For development *with* python, please ask your questions on
the comp.lang.python group or the python-list at mailing list.
There you will find friendly people willing to help.

[for your particular question: keep in mind that PyObject_Call takes
arguments as a tuple;
if you want to pass one tuple, you need to build a 1-tuple around your
tuple]
Amaury Forgeot d'Arc

From dinov at  Mon Jan  9 18:59:45 2012
From: dinov at (Dino Viehland)
Date: Mon, 9 Jan 2012 17:59:45 +0000
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

We spent some time investigating Python/Win8 projections but we don't really have anything else to say right now, but it is certainly possible.

I haven't been following this thread so maybe this was already discussed, but on the whole "new OS target" thing - if people want to write immersive apps in Python then there will need to be a new build of Python.  One thing that might make that easier is the fact that the C runtime is still available to metro apps, even if the C runtime calls a banned API.  So to the extent that Python is just a C program the "port" should be pretty easy and mostly involve disabling functionality that isn't available at all to metro apps.  

I have packaged up Python 2.7 in an appx and run the application verifier on it (this was a while ago, so things may have changed between now and then), the attached banned.txt includes the list of APIs which Python is using that aren't allowed for the curious.

Also, people who write apps will need to distribute Python w/ their app, there's currently no sharing between apps.

-----Original Message-----
From: Jeff Hardy [mailto:jdhardy at] 
Sent: Sunday, January 08, 2012 10:13 PM
To: Antoine Pitrou
Cc: python-dev at; Dino Viehland
Subject: Re: [Python-Dev] Python as a Metro-style App

On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou <solipsis at> wrote:
> Depending on the extent of removed/disabled functionality, it might 
> not be very interesting to have a Metro port at all.

Win 8 is practically a new OS target - the nt module may need to be replaced with a metro module to handle it well.

Accessing the WinRT APIs directly from Python will also require a set of Python projections for the API, which should be straightforward to generate from the WinRT metadata files. I know Dino Viehland did some work on that; not sure if he can elaborate or not though.

Otherwise, IronPython would be the only option for writing Metro apps in Python - not that I'd be *horribly* upset at that :). IronPython is slowly growing Metro support, and it seems like most things will work, but the .NET framework shields it from a lot of the WinRT guts.

- Jeff

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: banned.txt
URL: <>

From solipsis at  Mon Jan  9 22:59:07 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 9 Jan 2012 22:59:07 +0100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
References: <>
Message-ID: <>

On Mon, 09 Jan 2012 21:58:29 +0100
terry.reedy <python-checkins at> wrote:
> -Different branches are used at a time to represent different *minor versions*
> -in which development is made.  All development should be done **first** in the
> -:ref:`in-development <indevbranch>` branch, and selectively backported
> -to other branches when necessary.
> +There is a branch for each *minor version*. Development is done separately
> +for Python 2 and Python 3. For each *major version*, each change should be made
> +**first** in the oldest branch to which it applies and forward-ported as
> +appropriate.

Please avoid using the terms "minor version" and "major version", they
are confusing.



From neologix at  Mon Jan  9 23:01:54 2012
From: neologix at (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Mon, 9 Jan 2012 23:01:54 +0100
Subject: [Python-Dev] certificate expired
Message-ID: <>


All the buildbots are turning red because of test_ssl:
ERROR: test_connect (test.test_ssl.NetworkedTests)
Traceback (most recent call last):
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/",
line 616, in test_connect
    s.connect(("", 443))
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/",
line 519, in connect
    self._real_connect(addr, False)
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/",
line 509, in _real_connect
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/",
line 489, in do_handshake
ssl.SSLError: [Errno 1] _ssl.c:420: error:14090086:SSL
routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

It seems that certificate expired today (09/01/2012).



From ncoghlan at  Tue Jan 10 02:52:40 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 10 Jan 2012 11:52:40 +1000
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou <solipsis at> wrote:
> Please avoid using the terms "minor version" and "major version", they
> are confusing.

Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x,
3.x) are the least confusing terms we have available.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Tue Jan 10 05:05:05 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 09 Jan 2012 23:05:05 -0500
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
Message-ID: <jegddt$dat$>

On 1/9/2012 8:52 PM, Nick Coghlan wrote:
> On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou<solipsis at>  wrote:
>> Please avoid using the terms "minor version" and "major version", they
>> are confusing.
> Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x,
> 3.x) are the least confusing terms we have available.

I minimally edited what was already there to correct what is now an 
error. The change comes immediately after a section defining major, 
minor, and micro releases. To change terms,
and possibly other pages needs more extensive editing.

Terry Jan Reedy

From stefan_ml at  Tue Jan 10 09:35:44 2012
From: stefan_ml at (Stefan Behnel)
Date: Tue, 10 Jan 2012 09:35:44 +0100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
	a python Class
In-Reply-To: <>
References: <>
Message-ID: <jegt9c$ar4$>


sorry for hooking into this off-topic thread.

Amaury Forgeot d'Arc, 09.01.2012 19:09:
> 2012/1/9 <pasparis at>
>> I am trying to send a tuple to a method of a Python class and I got a "Run
>> failed" from the NetBeans compiler.
>> When I want to send a tuple to a simple method in a module it works; when I
>> want to send a simple parameter to a method of a class it works also, but not
>> a tuple to a method of a class.
> This mailing list is for the development *of* python.
> For development *with* python, please ask your questions on
> the comp.lang.python group or the python-list at mailing list.
> There you will find friendly people willing to help.

It's also worth mentioning the cython-users mailing list here, in case the
OP cares about simplifying these kinds of issues from the complexity of
C/C++ into Python. Cython is a really good and simple way to implement
these kinds of language interactions, also for embedding Python.

> [for your particular question: keep in mind that PyObject_Call takes
> arguments as a tuple;
> if you want to pass one tuple, you need to build a 1-tuple around your
> tuple]

The presented code also requires a whole lot of fixes (specifically in the
error handling parts) that Cython would basically just handle for you already.


From anacrolix at  Tue Jan 10 09:40:39 2012
From: anacrolix at (Matt Joiner)
Date: Tue, 10 Jan 2012 19:40:39 +1100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <jegt9c$ar4$>
References: <>
Message-ID: <>

Perhaps the python-dev mailing list should be renamed to python-core.

On Tue, Jan 10, 2012 at 7:35 PM, Stefan Behnel <stefan_ml at> wrote:
> Hi,
> sorry for hooking into this off-topic thread.
> Amaury Forgeot d'Arc, 09.01.2012 19:09:
>> 2012/1/9 <pasparis at>
>>> I am trying to send a tuple to a method of a Python class and I got a "Run
>>> failed" from the NetBeans compiler.
>>> When I want to send a tuple to a simple method in a module it works; when I
>>> want to send a simple parameter to a method of a class it works also, but not
>>> a tuple to a method of a class.
>> This mailing list is for the development *of* python.
>> For development *with* python, please ask your questions on
>> the comp.lang.python group or the python-list at mailing list.
>> There you will find friendly people willing to help.
> It's also worth mentioning the cython-users mailing list here, in case the
> OP cares about simplifying these kinds of issues from the complexity of
> C/C++ into Python. Cython is a really good and simple way to implement
> these kinds of language interactions, also for embedding Python.
>> [for your particular question: keep in mind that PyObject_Call takes
>> arguments as a tuple;
>> if you want to pass one tuple, you need to build a 1-tuple around your
>> tuple]
> The presented code also requires a whole lot of fixes (specifically in the
> error handling parts) that Cython would basically just handle for you already.
> Stefan


From ncoghlan at  Tue Jan 10 09:50:11 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 10 Jan 2012 18:50:11 +1000
Subject: [Python-Dev] [Python-checkins] cpython: Issue #12760: Add a
 create mode to open(). Patch by David Townshend.
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 10, 2012 at 7:40 AM, charles-francois.natali
<python-checkins at> wrote:
> changeset:   74315:bf609baff4d3
> user:        Charles-François Natali <neologix at>
> date:        Mon Jan 09 22:40:02 2012 +0100
> summary:
>   Issue #12760: Add a create mode to open(). Patch by David Townshend.

To help make the 'x' more intuitive, it would be helpful if the mode
was referred to as "exclusive create" in the docs (at least once,
anyway), and the What's New entry stated explicitly that 'x' is used
based on the C11 precedent. Otherwise, I'm sure I'll be far from the
only one thinking "why not 'c'?". People shouldn't have to go read the
tracker item to find out the reason 'x' is used instead of 'c'.
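
A minimal sketch of what "exclusive create" means in practice, assuming Python 3.3 or later (where the 'x' mode and FileExistsError are available). The file name is hypothetical; the point is only that 'x' creates a file that does not exist and refuses to open one that does.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "report.txt")  # hypothetical file name

# First call: the file does not exist yet, so 'x' creates it.
with open(path, "x") as f:
    f.write("first\n")

# Second call: the file exists now, so 'x' refuses to open it.
try:
    open(path, "x")
    second_call_failed = False
except FileExistsError:
    second_call_failed = True

with open(path) as f:
    contents = f.read()
print(second_call_failed, repr(contents))
```

Unlike 'w', the second open cannot truncate the existing file, which is the "exclusive" part of the name.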


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rob.cliffe at  Tue Jan 10 09:49:04 2012
From: rob.cliffe at (Rob Cliffe)
Date: Tue, 10 Jan 2012 08:49:04 +0000
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <jegddt$dat$>
References: <>	<>	<>
Message-ID: <>

But "minor version" and "major version" are readily understandable to 
the general reader, e.g. me, whereas "feature release" and "release 
series" I find are not.  Couldn't the first two terms be defined once 
and then used throughout?
Rob Cliffe

On 10/01/2012 04:05, Terry Reedy wrote:
> On 1/9/2012 8:52 PM, Nick Coghlan wrote:
>> On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou<solipsis at>  
>> wrote:
>>> Please avoid using the terms "minor version" and "major version", they
>>> are confusing.
>> Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x,
>> 3.x) are the least confusing terms we have available.
> I minimally edited what was already there to correct what is now an 
> error. The change comes immediately after a section defining major, 
> minor, and micro releases. To change terms,
> and possibly other pages needs more extensive editing.

From anthony.hw.kong at  Tue Jan 10 11:03:25 2012
From: anthony.hw.kong at (Anthony Kong)
Date: Tue, 10 Jan 2012 21:03:25 +1100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <>

I don't find 'major' and 'minor' confusing either. Maybe because it has been
the designation used in the Linux community for years.

On Tue, Jan 10, 2012 at 7:49 PM, Rob Cliffe <rob.cliffe at>wrote:

> But "minor version" and "major version" are readily understandable to the
> general reader, e.g. me, whereas "feature release" and "release series" I
> find are not.  Couldn't the first two terms be defined once and then used
> throughout?
> Rob Cliffe
> On 10/01/2012 04:05, Terry Reedy wrote:
>> On 1/9/2012 8:52 PM, Nick Coghlan wrote:
>>> On Tue, Jan 10, 2012 at 7:59 AM, Antoine Pitrou<solipsis at>
>>>  wrote:
>>>> Please avoid using the terms "minor version" and "major version", they
>>>> are confusing.
>>> Indeed. "Feature release" (2.7, 3.2, 3.3) and "release series" (2.x,
>>> 3.x) are the least confusing terms we have available.
>> I minimally edited what was already there to correct what is now an
>> error. The change comes immediately after a section defining major, minor,
>> and micro releases. To change terms,
>> and possibly other pages needs more extensive editing.

From barry at  Tue Jan 10 11:09:37 2012
From: barry at (Barry Warsaw)
Date: Tue, 10 Jan 2012 11:09:37 +0100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <20120110110937.4eb53781@rivendell>

On Jan 10, 2012, at 09:03 PM, Anthony Kong wrote:

>I don't find 'major' and 'minor' confusing either. Maybe because it is the
>designation used in the Linux community for years.

Neither do I.  I read them as aliases for "leftmost digit" and "middle digit"
respectively, regardless of Python's interpretation of them.


From peck at  Tue Jan 10 12:00:58 2012
From: peck at (Jon K Peck)
Date: Tue, 10 Jan 2012 04:00:58 -0700
Subject: [Python-Dev] AUTO: Jon K Peck is out of the office (returning
Message-ID: <>

I am out of the office until 01/12/2012.

I will be out of the office Monday through Wednesday with limited access to

Note: This is an automated response to your message  "Python-Dev Digest,
Vol 102, Issue 26" sent on 1/9/2012 21:05:32.

This is the only notification you will receive while this person is away.

From stefan_ml at  Tue Jan 10 13:17:34 2012
From: stefan_ml at (Stefan Behnel)
Date: Tue, 10 Jan 2012 13:17:34 +0100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
	a python Class
In-Reply-To: <>
References: <>
Message-ID: <jeha8u$auc$>

Matt Joiner, 10.01.2012 09:40:
> Perhaps the python-dev mailing list should be renamed to python-core.

Well, there *is* a rather visible warning on the list subscription page
that tells people that it's most likely not the list they actually want to
use. If they manage to ignore that, I doubt that a different list name
would fix it for them.


From solipsis at  Tue Jan 10 13:57:05 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 10 Jan 2012 13:57:05 +0100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
References: <>
	<jegddt$dat$> <>
Message-ID: <>

On Tue, 10 Jan 2012 08:49:04 +0000
Rob Cliffe <rob.cliffe at> wrote:
> But "minor version" and "major version" are readily understandable to 
> the general reader, e.g. me, whereas "feature release" and "release 
> series" I find are not.  Couldn't the first two terms be defined once 
> and then used throughout?

To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature
release, e.g. 3.3.  I have a hard time considering 3.2 or 3.3 "minor".



From victor.stinner at  Tue Jan 10 14:08:52 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 10 Jan 2012 14:08:52 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Fix stock symbol
	for Microsoft
In-Reply-To: <>
References: <>
Message-ID: <>

You may port the fix to 3.2 and 3.3.


2012/1/10 raymond.hettinger <python-checkins at>:
> changeset:   74320:068ce5d7f7e7
> branch:      2.7
> user:        Raymond Hettinger <python at>
> date:        Tue Jan 10 09:51:51 2012 +0000
> summary:
>   Fix stock symbol for Microsoft
> files:
>   Doc/library/sqlite3.rst |  4 ++--
>   1 files changed, 2 insertions(+), 2 deletions(-)
> diff --git a/Doc/library/sqlite3.rst b/Doc/library/sqlite3.rst
> --- a/Doc/library/sqlite3.rst
> +++ b/Doc/library/sqlite3.rst
> @@ -66,7 +66,7 @@
>     # Larger example
>     for t in [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
> -             ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
> +             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
>               ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
>              ]:
>         c.execute('insert into stocks values (?,?,?,?,?)', t)
> @@ -86,7 +86,7 @@
>     (u'2006-01-05', u'BUY', u'RHAT', 100, 35.14)
>     (u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
>     (u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
> -   (u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
> +   (u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0)
>     >>>
> --
> Repository URL:
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at

From sandro.tosi at  Tue Jan 10 17:32:01 2012
From: sandro.tosi at (Sandro Tosi)
Date: Tue, 10 Jan 2012 17:32:01 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <j3a0dn$pas$>
References: <>
Message-ID: <>

Hi all,

On Sat, Aug 27, 2011 at 07:47, Georg Brandl <g.brandl at> wrote:
> One of the main reasons for keeping Sphinx compatibility to 0.6.x was to
> enable distributions (like Debian) to build the docs for the Python they ship
> with the version of Sphinx that they ship.
> This should now be fine with 1.0.x, so since you are ready to do the work of
> converting the 2.7 Doc sources, it will be accepted.  The argument of easier
> backports is a very good one.

Not exactly as quickly as I would have liked, I started to work on upgrading
sphinx for 2.7. Currently I've all the preliminary patches at:

in the 2.7-sphinx branch (they fix one thing at a time, they'll be
collapsed once all ready).

During the build process, there are some warnings that I can't understand:

writing output... [100%] whatsnew/index
/home/morph/cpython/morph_sandbox/Doc/glossary.rst:520: WARNING:
unknown keyword: nonlocal
WARNING: more than one target found for cross-reference u'next':,,,,,,,,,,
WARNING: more than one target found for cross-reference u'next':,,,,,,,,,,
/home/morph/cpython/morph_sandbox/Doc/library/sys.rst:651: WARNING:
unknown keyword: None
/home/morph/cpython/morph_sandbox/Doc/library/sys.rst:712: WARNING:
unknown keyword: None
WARNING: unknown keyword: not in
WARNING: unknown keyword: not in
WARNING: unknown keyword: not in
WARNING: unknown keyword: not in
WARNING: unknown keyword: is not
WARNING: unknown keyword: is not
WARNING: unknown keyword: None
WARNING: unknown keyword: None
WARNING: unknown keyword: None
writing additional files... genindex py-modindex search download index

Do you know how I can fix them?

Thanks & Cheers,
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From glyph at  Tue Jan 10 17:57:03 2012
From: glyph at (Glyph)
Date: Tue, 10 Jan 2012 11:57:03 -0500
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
	that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <>

On Jan 10, 2012, at 7:57 AM, Antoine Pitrou wrote:

> On Tue, 10 Jan 2012 08:49:04 +0000
> Rob Cliffe <rob.cliffe at> wrote:
>> But "minor version" and "major version" are readily understandable to 
>> the general reader, e.g. me, whereas "feature release" and "release 
>> series" I find are not.  Couldn't the first two terms be defined once 
>> and then used throughout?
> To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature
> release, e.g. 3.3.  I have a hard time considering 3.2 or 3.3 "minor".

Whatever your personal feelings, there is a precedent established in the API:

>>> sys.version_info.major
2
>>> sys.version_info.minor
7
>>> sys.version_info.micro
1

This strikes me as the most authoritative definition of the terms, in the context of Python.  (Although the fact that this precedent is widely established elsewhere doesn't hurt.)
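
As a small sketch of that precedent: the named fields of sys.version_info are aliases for its first three positions, so "major", "minor" and "micro" map onto the leftmost, middle and rightmost numbers of a version such as 2.7.1. The variable names here are illustrative only.

```python
import sys

major, minor, micro = sys.version_info[:3]

# The named fields are aliases for the first three tuple positions.
same = (
    sys.version_info.major == major
    and sys.version_info.minor == minor
    and sys.version_info.micro == micro
)
label = "%d.%d.%d" % (major, minor, micro)
print(label, same)
```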

Whatever term is chosen, the important thing is to apply the terminology consistently so that it's clear what is meant.  I doubt that anyone has a term which every reader will intuitively and immediately associate with "middle dot-separated digit increment by one".

If you want to emphasize the importance of a release, just choose a subjective term aside from "major" or "minor".



From anacrolix at  Tue Jan 10 18:09:51 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 11 Jan 2012 04:09:51 +1100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <>

This has made sense since Gentoo days.

On Tue, Jan 10, 2012 at 11:57 PM, Antoine Pitrou <solipsis at> wrote:
> On Tue, 10 Jan 2012 08:49:04 +0000
> Rob Cliffe <rob.cliffe at> wrote:
>> But "minor version" and "major version" are readily understandable to
>> the general reader, e.g. me, whereas "feature release" and "release
>> series" I find are not.  Couldn't the first two terms be defined once
>> and then used throughout?
> To me "minor" is a bugfix release, e.g. 2.7.2, and "major" is a feature
> release, e.g. 3.3.  I have a hard time considering 3.2 or 3.3 "minor".
> Regards
> Antoine.

From anacrolix at  Tue Jan 10 18:15:06 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 11 Jan 2012 04:15:06 +1100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <jeha8u$auc$>
References: <>
Message-ID: <>

I suspect it actually would fix the confusion. "dev" usually means
development, not "core implementation development". People float past
looking for dev help... python-dev. Python-list is a bit generic.

On Tue, Jan 10, 2012 at 11:17 PM, Stefan Behnel <stefan_ml at> wrote:
> Matt Joiner, 10.01.2012 09:40:
>> Perhaps the python-dev mailing list should be renamed to python-core.
> Well, there *is* a rather visible warning on the list subscription page
> that tells people that it's most likely not the list they actually want to
> use. If they manage to ignore that, I doubt that a different list name
> would fix it for them.
> Stefan

From solipsis at  Tue Jan 10 18:14:47 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 10 Jan 2012 18:14:47 +0100
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <>

On Tue, 10 Jan 2012 11:57:03 -0500
Glyph <glyph at> wrote:
> Whatever your personal feelings, there is a precedent established in the API:
> >>> sys.version_info.major
> 2
> >>> sys.version_info.minor
> 7
> >>> sys.version_info.micro
> 1
> This strikes me as the most authoritative definition of the terms, in the context of Python.  (Although the fact that this precedent is widely established elsewhere doesn't hurt.)

While authoritative, it is still counter-intuitive and misleading for
some people (including Nick and me, apparently). I never use the field
names myself, I use version_info as a 3-tuple.
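
A sketch of the tuple-style usage mentioned here, which sidesteps the major/minor vocabulary entirely. The variable names are illustrative, and the (3, 3) threshold is just an example of a feature release.

```python
import sys

# Positional slices instead of named fields.
feature_release = tuple(sys.version_info[:2])   # e.g. (2, 7) or (3, 3)
bugfix_release = tuple(sys.version_info[:3])    # e.g. (2, 7, 1)

# Tuples compare element by element, so feature checks read naturally.
at_least_3_3 = feature_release >= (3, 3)
is_python3 = sys.version_info[0] >= 3
print(feature_release, bugfix_release, at_least_3_3, is_python3)
```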

> Whatever term is chosen, the important thing is to apply the terminology consistently so that it's clear what is meant.  I doubt that anyone has a term which every reader will intuitively and immediately associate with "middle dot-separated digit increment by one".

I changed the terminology in my latest changeset:

Important to notice is that the major / minor distinction isn't
relevant in most contexts, while the feature / bugfix distinction is.
Where "major" plays a role, we can simply avoid the term by talking
about Python 2 and Python 3, which is more explicit too. I doubt this
needs to be revisited before 10 years anyway.



From martin at  Tue Jan 10 23:30:58 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 10 Jan 2012 23:30:58 +0100
Subject: [Python-Dev] certificate expired
In-Reply-To: <>
References: <>
Message-ID: <>

> It seems that certificate expired today (09/01/2012).

I have now replaced the certificate. The current one will expire on
Christmas 2013.


From tjreedy at  Tue Jan 10 23:38:18 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 10 Jan 2012 17:38:18 -0500
Subject: [Python-Dev] devguide: Backporting is obsolete. Add details
 that I had to learn.
In-Reply-To: <>
References: <>
	<jegddt$dat$> <>
Message-ID: <jeiel7$7e4$>

On 1/10/2012 12:14 PM, Antoine Pitrou wrote:

> I changed the terminology in my latest changeset:
> Important to notice is that the major / minor distinction isn't
> relevant in most contexts, while the feature / bugfix distinction is.
> Where "major" plays a role, we can simply avoid the term by talking
> about Python 2 and Python 3, which is more explicit too. I doubt this
> needs to be revisited before 10 years anyway.

FWIW, I like the changes, and you did them better than I would have.

Terry Jan Reedy

From martin at  Wed Jan 11 01:20:21 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 11 Jan 2012 01:20:21 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

Am 09.01.2012 07:13, schrieb Jeff Hardy:
> On Sat, Jan 7, 2012 at 2:57 PM, Antoine Pitrou <solipsis at> wrote:
>> Depending on the extent of removed/disabled functionality, it might not
>> be very interesting to have a Metro port at all.
> Win 8 is practically a new OS target - the nt module may need to be
> replaced with a metro module to handle it well.

No, it's not. Everything continues to work just fine on Windows 8,
as long as we keep developing desktop apps.

Only if Metro Apps are the target things may need to be replaced (but
only very few changes are necessary to the nt module to make it compile).


From martin at  Wed Jan 11 01:32:12 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 11 Jan 2012 01:32:12 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

> I haven't been following this thread so maybe this was already
> discussed, but on the whole "new OS target" thing - if people want to
> write immersive apps in Python then there will need to be a new build
> of Python.  One thing that might make that easier is the fact that
> the C runtime is still available to metro apps, even if the C runtime
> calls a banned API.

Does that hold for all versions of the C runtime (i.e. is msvcr80.dll
also exempt from the ban, or just the version that comes with VS 11)?

> So to the extent that Python is just a C program
> the "port" should be pretty easy and mostly involve disabling
> functionality that isn't available at all to metro apps.

See the start of the thread: I tried to create a "WinRT Component DLL",
and that failed, as VS would refuse to compile any C file in such a
project. Not sure whether this is triggered by defining WINAPI_FAMILY=2,
or any other compiler setting.

I'd really love to use WINAPI_FAMILY=2, as compiler errors are much
easier to fix than verifier errors.


From phd at  Mon Jan  9 16:07:23 2012
From: phd at (Oleg Broytman)
Date: Mon, 9 Jan 2012 19:07:23 +0400
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <>
References: <>
Message-ID: <>


   We are sorry but we cannot help you. This mailing list is to work on
developing Python (adding new features to Python itself and fixing bugs);
if you're having problems learning, understanding or using Python, please
find another forum. Probably python-list/comp.lang.python mailing list/news
group is the best place; there are Python developers who participate in it;
you may get a faster, and probably more complete, answer there. See for other lists/news groups/fora. Thank
you for understanding.

On Mon, Jan 09, 2012 at 03:46:04PM +0100, pasparis at wrote:
> <BODY>Hello,<br><br>I am trying to send a tuple to a method of a python class

   Also please don't send html-only mail.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From martin at  Wed Jan 11 02:09:02 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 11 Jan 2012 02:09:02 +0100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <>
References: <>	<>	<jegt9c$ar4$>	<>	<jeha8u$auc$>
Message-ID: <>

Am 10.01.2012 18:15, schrieb Matt Joiner:
> I suspect it actually would fix the confusion. "dev" usually means
> development, not "core implementation development". People float past
> looking for dev help... python-dev. Python-list is a bit generic.

There is occasional confusion. More often, people think "there are the
folks who could actually answer my question, and nobody on python-list
answered, so I'll just ask there". We have settled on assuming that they
are confused rather than deliberately breaking convention, which is a
polite way of pointing out that we really mean it.

IOW, I think it is all fine the way it is. Typically, somebody answers
quickly. In this case, *two* people answered the same, which
a) really gets the message through, and
b) suggests that people are not too tired of actually typing in
   this message every now and then.

Of course, pointing the OP to a more specific focused forum (which is
not always cython-users) is also kind.


From ncoghlan at  Wed Jan 11 03:25:46 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 11 Jan 2012 12:25:46 +1000
Subject: [Python-Dev] os.walk() with followlinks=False
Message-ID: <>

When discussing, Charles-François
noted that when os.walk() is called with "followlinks=False", symlinks
to directories are still included in the "subdirs" list rather than
the "files" list.

This seems rather odd to me, so I'm asking here to see if there's a
specific rationale for it, or if it's just an artifact of the
implementation.

If it's the latter... could we change it for 3.3, or is that too
significant a breach of backwards compatibility?

Even if we can't change os.walk(), does os.walkfd() need to replicate
the annoying behaviour for consistency, or can it instead consider
such symlinks to be files rather than directories?
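
The behaviour in question can be reproduced with a short sketch, assuming a platform where os.symlink() is permitted. The directory and file names are hypothetical, and the last two lines show one possible way a caller could separate symlinks from real directories today; they are not a proposed change to os.walk().

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "real_dir"))
os.symlink(os.path.join(root, "real_dir"), os.path.join(root, "link_to_dir"))
open(os.path.join(root, "plain.txt"), "w").close()

# Look at the top level only; followlinks=False is also the default.
dirpath, dirnames, filenames = next(os.walk(root, followlinks=False))

# Hypothetical post-processing: pick the directory symlinks out of dirnames.
links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
true_dirs = [d for d in dirnames if d not in links]
print(sorted(dirnames), filenames, links, true_dirs)
```

The symlink is reported among the directories even though os.walk() will not descend into it.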


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From dinov at  Wed Jan 11 02:59:08 2012
From: dinov at (Dino Viehland)
Date: Wed, 11 Jan 2012 01:59:08 +0000
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin wrote:
> Does that hold for all versions of the C runtime (i.e. is msvcr80.dll also
> exempt from the ban, or just the version that comes with VS 11)?

Just the VS 11 CRT is allowed.

> > So to the extent that Python is just a C program the "port" should be
> > pretty easy and mostly involve disabling functionality that isn't
> > available at all to metro apps.
> See the start of the thread: I tried to create a "WinRT Component DLL", and
> that failed, as VS would refuse to compile any C file in such a project. Not
> sure whether this is triggered by defining WINAPI_FAMILY=2, or any other
> compiler setting.
> I'd really love to use WINAPI_FAMILY=2, as compiler errors are much easier
> to fix than verifier errors.

Let me see if I can try this.  Hopefully I still have my VM w/ this all setup and
I can see if I can get it building this way.  I can always ping some people on the
C++ team and ask them for help if I run into issues.  I'll give it a shot tomorrow 
and get back to you.

From ericsnowcurrently at  Wed Jan 11 05:23:28 2012
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 10 Jan 2012 21:23:28 -0700
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 10, 2012 at 6:09 PM, "Martin v. Löwis" <martin at> wrote:
> IOW, I think it is all fine the way it is. Typically, somebody answers
> quickly. In this case, *two* people answered the same, which
> a) really gets the message through, and
> b) suggests that people are not too tired in actually typing in
>    this message every now and then.



From lists at  Wed Jan 11 10:49:46 2012
From: lists at (Christian Heimes)
Date: Wed, 11 Jan 2012 10:49:46 +0100
Subject: [Python-Dev] shutil.copy() and hard links
Message-ID: <jejlvr$ifk$>


here is another fun fact about links, this time hard links and the
shutil.copy() function.

The shutil.copy() function behaves like the Unix cp(1) command. Neither
unlinks the destination file if it already exists. As a consequence,
all hard links point to the updated file data. This behavior may
surprise some users. Perhaps the docs should point out how shutil.copy()
works when hard links join the party.

It might be worth adding a function that works similarly to install(1).
The install(1) command unlinks the destination first and opens it with
exclusive create flags. This compensates for possible symlink attacks, too.


Shell session example of cp and install

$ echo "test1" > test1
$ echo "test2" > test2
$ ln test1 test_hardlink

now test_hardlink points to the same inodes as test1
$ cat test_hardlink

test_hardlink still points to the same inodes
$ cp test2 test1
$ cat test_hardlink

$ echo "test1" > test1
$ cat test_hardlink

install unlinks the file first, test1 and test_hardlink point to
different inodes
$ install test2 test1
$ cat test_hardlink

strace of install test2 test1

stat("test1", {st_mode=S_IFREG|0755, st_size=6, ...}) = 0
stat("test2", {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
lstat("test1", {st_mode=S_IFREG|0755, st_size=6, ...}) = 0
unlink("test1")                         = 0
open("test2", O_RDONLY)                 = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=6, ...}) = 0
open("test1", O_WRONLY|O_CREAT|O_EXCL, 0664) = 4
fstat(4, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
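
The cp-versus-install difference shown in the shell session can be sketched in Python as well. The helper `install_copy` is hypothetical (no such function exists in shutil); it simply mirrors the unlink followed by O_CREAT|O_EXCL seen in the strace output, and the sketch assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def install_copy(src, dst):
    """Hypothetical helper: unlink dst first, then create it exclusively."""
    try:
        os.unlink(dst)
    except FileNotFoundError:
        pass
    # O_EXCL makes the open fail if something recreated dst in the meantime.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)


def read(path):
    with open(path) as f:
        return f.read()


workdir = tempfile.mkdtemp()
test1 = os.path.join(workdir, "test1")
test2 = os.path.join(workdir, "test2")
hardlink = os.path.join(workdir, "test_hardlink")

with open(test1, "w") as f:
    f.write("test1\n")
with open(test2, "w") as f:
    f.write("test2\n")
os.link(test1, hardlink)

shutil.copy(test2, test1)        # rewrites the shared inode in place
after_copy = read(hardlink)

with open(test1, "w") as f:      # restore, like: echo "test1" > test1
    f.write("test1\n")
install_copy(test2, test1)       # test1 becomes a brand-new inode
after_install = read(hardlink)
same_inode = os.stat(test1).st_ino == os.stat(hardlink).st_ino
print(repr(after_copy), repr(after_install), same_inode)
```

After shutil.copy() the hard link sees the new data; after the install-style copy it keeps the old data because test1 now refers to a different inode.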

From martin at  Wed Jan 11 11:12:16 2012
From: martin at (martin at
Date: Wed, 11 Jan 2012 11:12:16 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

> Let me see if I can try this.  Hopefully I still have my VM w/ this  
> all setup and
> I can see if I can get it building this way.  I can always ping some  
> people on the
> C++ team and ask them for help if I run into issues.  I'll give it a  
> shot tomorrow
> and get back to you.

Hi Dino,

I reported that as a bug. If you need that for reference, see


From solipsis at  Wed Jan 11 15:52:07 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 11 Jan 2012 15:52:07 +0100
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
References: <>
Message-ID: <>

On Wed, 11 Jan 2012 02:09:02 +0100
"Martin v. Löwis" <martin at> wrote:

> Am 10.01.2012 18:15, schrieb Matt Joiner:
> > I suspect it actually would fix the confusion. "dev" usually means
> > development, not "core implementation development". People float past
> > looking for dev help... python-dev. Python-list is a bit generic.
> There is occasional confusion. More often, people think "there are the
> folks who could actually answer my question, and nobody on python-list
> answered, so I'll just ask there". We established to assume that they
> are confused instead of deliberately breaking convention, which is a
> polite way of pointing out that we really mean it.
> IOW, I think it is all fine the way it is. Typically, somebody answers
> quickly. In this case, *two* people answered the same, which
> a) really gets the message through, and
> b) suggests that people are not too tired in actually typing in
>    this message every now and then.

I suspect one of them doesn't actually *type* the message ;)



From solipsis at  Wed Jan 11 15:54:05 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 11 Jan 2012 15:54:05 +0100
Subject: [Python-Dev] os.walk() with followlinks=False
References: <>
Message-ID: <>

On Wed, 11 Jan 2012 12:25:46 +1000
Nick Coghlan <ncoghlan at> wrote:
> When discussing, Charles-François
> noted that when os.walk() is called with "followlinks=False", symlinks
> to directories are still included in the "subdirs" list rather than
> the "files" list.
> This seems rather odd to me, so I'm asking here to see if there's a
> specific rationale for it, or if it's just an artifact of the
> implementation.
> If it's the latter... could we change it for 3.3, or is that too
> significant a breach of backwards compatibility?

I think we could change it.

> Even if we can't change os.walk(), does os.walkfd() need to replicate
> the annoying behaviour for consistency, or can it instead consider
> such symlinks to be files rather than directories?

IMO walkfd() should do the right thing.



From phd at  Wed Jan 11 16:07:32 2012
From: phd at (Oleg Broytman)
Date: Wed, 11 Jan 2012 19:07:32 +0400
Subject: [Python-Dev] Python C API: Problem sending tuple to a method of
 a python Class
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Jan 11, 2012 at 03:52:07PM +0100, Antoine Pitrou wrote:
> On Wed, 11 Jan 2012 02:09:02 +0100
> "Martin v. Löwis" <martin at> wrote:
> > b) suggests that people are not too tired in actually typing in
> >    this message every now and then.
> I suspect one of them doesn't actually *type* the message ;)

   Certainly, no.

:0r mail/misc/python-dev

   And even this command is in vim history, I don't type it, just press
:0<Up><Up><Up> ;-)

   Sometimes I add something useful to the OP but this time I didn't - I
just haven't got any helpful information.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From jdhardy at  Wed Jan 11 18:30:28 2012
From: jdhardy at (Jeff Hardy)
Date: Wed, 11 Jan 2012 09:30:28 -0800
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Tue, Jan 10, 2012 at 4:20 PM, "Martin v. Löwis" <martin at> wrote:
>> Win 8 is practically a new OS target - the nt module may need to be
>> replaced with a metro module to handle it well.
> No, it's not. Everything continues to work just fine on Windows 8,
> as long as we keep developing desktop apps.
> Only if Metro Apps are the target things may need to be replaced (but
> only very few changes are necessary to the nt module to make it compile).

Yeah, that's what I meant. I should have said "WinRT is ..." instead
of "Win 8 is ...".  If nt can be made to work, then that's even better
than I expected.

- Jeff

From mwm at  Thu Jan 12 01:01:44 2012
From: mwm at (Mike Meyer)
Date: Wed, 11 Jan 2012 16:01:44 -0800
Subject: [Python-Dev] Proposed PEP on concurrent programming support
In-Reply-To: <>
References: <20120103164036.681beeae@mikmeyer-vm-fedora>
Message-ID: <20120111160144.66c46236@mikmeyer-vm-fedora>

On Wed, 4 Jan 2012 00:07:27 -0500
PJ Eby <pje at> wrote:

> On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer <mwm at> wrote:
> > A suite is marked
> > as a `transaction`, and then when an unlocked object is modified,
> > instead of indicating an error, a locked copy of it is created to be
> > used through the rest of the transaction. If any of the originals
> > are modified during the execution of the suite, the suite is rerun
> > from the beginning. If it completes, the locked copies are copied
> > back to the originals in an atomic manner.
> I'm not sure if "locked" is really the right word here.  A private
> copy isn't "locked" because it's not shared.

Do you have a suggestion for a better word? Maybe the "safe" state
used elsewhere?

> > For
> > instance, combining STM with explicit locking would allow explicit
> > locking when IO was required,
> I don't think this idea makes any sense, since STM's don't really
> "lock", and to control I/O in an STM system you just STM-ize the
> queues. (Generally speaking.)

I thought about that. I couldn't convince myself that STM by itself is
sufficient. If you need to make irreversible changes to the state of
an object, you can't use STM, so what do you use? Can every such
situation be handled by creating "safe" values then using an STM to
update them?
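
The copy, rerun and atomic copy-back cycle described in the quoted proposal can be sketched as a toy in Python. Everything here is an assumption made for illustration: `Ref`, `atomically`, the `read`/`write` callbacks and the version counters are hypothetical names, not part of the proposed PEP or of any existing library, and the private copies are only safe as written for immutable values.

```python
import threading


class Ref:
    """Hypothetical shared cell: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()


def atomically(suite, max_retries=100):
    """Run suite(read, write) on private copies; commit atomically or rerun."""
    for _ in range(max_retries):
        seen = {}        # Ref -> version observed on first access
        private = {}     # Ref -> private working copy of the value
        written = set()  # Refs the suite has modified

        def read(ref):
            if ref not in private:
                seen[ref] = ref.version   # read the version before the value
                private[ref] = ref.value
            return private[ref]

        def write(ref, value):
            read(ref)                     # remember the version we started from
            private[ref] = value
            written.add(ref)

        result = suite(read, write)

        with _commit_lock:
            # Commit only if no original changed while the suite was running.
            if all(ref.version == version for ref, version in seen.items()):
                for ref in written:
                    ref.value = private[ref]
                    ref.version += 1
                return result
        # An original was modified: discard the copies and rerun the suite.
    raise RuntimeError("transaction did not commit after %d attempts" % max_retries)


a, b = Ref(100), Ref(0)


def transfer(read, write):
    write(a, read(a) - 30)
    write(b, read(b) + 30)


atomically(transfer)
print(a.value, b.value)
```

Because the suite may run more than once, any irreversible action inside it (I/O, for instance) would be repeated on a rerun, which is the difficulty raised in the question above.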


From dinov at  Thu Jan 12 01:46:15 2012
From: dinov at (Dino Viehland)
Date: Thu, 12 Jan 2012 00:46:15 +0000
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin wrote:
> See the start of the thread: I tried to create a "WinRT Component DLL", and
> that failed, as VS would refuse to compile any C file in such a project. Not
> sure whether this is triggered by defining WINAPI_FAMILY=2, or any other
> compiler setting.
> I'd really love to use WINAPI_FAMILY=2, as compiler errors are much easier
> to fix than verifier errors.

I got the same errors as you - it seems like they're related to enabling the Immersive 
bit for the compile of the DLL.  I'm not certain if that's necessary, when I did
the run before to see if Python would pass the app store validation it didn't care that
we didn't have the App Container bit set on the DLL (it did want NXCOMPAT and dynamic
base set though).  I was also able to just define WINAPI_FAMILY=2 in the .vcxproj file 
and I got the various expected errors when accessing banned APIs (it actually seems 
like a bunch were missing vs. what the validator reported, but maybe that's just an
issue w/ the developer preview).  Once I fixed those errors up I was able to get a DLL
that successfully compiled.

I'm going to ping some people on the windows team and see if the app container
bit is or will be necessary for DLLs.

From anacrolix at  Thu Jan 12 08:20:08 2012
From: anacrolix at (Matt Joiner)
Date: Thu, 12 Jan 2012 18:20:08 +1100
Subject: [Python-Dev] Proposed PEP on concurrent programming support
In-Reply-To: <20120111160144.66c46236@mikmeyer-vm-fedora>
References: <20120103164036.681beeae@mikmeyer-vm-fedora>
Message-ID: <>

On Thu, Jan 12, 2012 at 11:01 AM, Mike Meyer <mwm at> wrote:
> On Wed, 4 Jan 2012 00:07:27 -0500
> PJ Eby <pje at> wrote:
>> On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer <mwm at> wrote:
>> > A suite is marked
>> > as a `transaction`, and then when an unlocked object is modified,
>> > instead of indicating an error, a locked copy of it is created to be
>> > used through the rest of the transaction. If any of the originals
>> > are modified during the execution of the suite, the suite is rerun
>> > from the beginning. If it completes, the locked copies are copied
>> > back to the originals in an atomic manner.
>> I'm not sure if "locked" is really the right word here.  A private
>> copy isn't "locked" because it's not shared.
> Do you have a suggestion for a better word? Maybe the "safe" state
> used elsewhere?
>> > For
>> > instance, combining STM with explicit locking would allow explicit
>> > locking when IO was required,
>> I don't think this idea makes any sense, since STM's don't really
>> "lock", and to control I/O in an STM system you just STM-ize the
>> queues. (Generally speaking.)
> I thought about that. I couldn't convince myself that STM by itself is
> sufficient. If you need to make irreversible changes to the state of
> an object, you can't use STM, so what do you use? Can every such
> situation be handled by creating "safe" values then using an STM to
> update them?
>     <mike

IMHO STM by itself isn't sufficient. Either immutability, or careful
use of references protected by STM amounting to the same are the only
reasonable ways to do it. Both also perform much better than the

From ncoghlan at  Thu Jan 12 12:47:16 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 12 Jan 2012 21:47:16 +1000
Subject: [Python-Dev] os.walk() with followlinks=False
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 12, 2012 at 12:54 AM, Antoine Pitrou <solipsis at> wrote:
> On Wed, 11 Jan 2012 12:25:46 +1000
> Nick Coghlan <ncoghlan at> wrote:
>> If it's the latter... could we change it for 3.3, or is that too
>> significant a breach of backwards compatibility?
> I think we could change it.

For the benefit of those not following the tracker issue,
Charles-François pointed out that putting the symlinks-to-directories
into the files list instead of the subdirectory list isn't really any
better (it just moves the problem to different use cases, such as
those that actually want to read the file contents).

With that being the case, I've changed my mind and figure we may as
well leave the current behaviour alone. I'll think about adding a
filter to walkdir that makes it easy to control the way they're
handled [1].



Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From victor.stinner at  Fri Jan 13 02:24:33 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 13 Jan 2012 02:24:33 +0100
Subject: [Python-Dev] Status of the fix for the hash collision vulnerability
Message-ID: <>

Many people proposed their own idea to fix the vulnerability, but only
3 wrote a patch:

- Glenn Linderman proposes to fix the vulnerability by adding a new
"safe" dict type (only accepting string keys). His proof-of-concept
( uses a secret of 64 random bits and uses it to compute
the hash of a key.
- Marc Andre Lemburg proposes to fix the vulnerability directly in
dict (for any key type). The patch raises an exception if a lookup
causes more than 1000 collisions.
- I propose to fix the vulnerability only in the Unicode hash (not for
other types). My patch adds a random secret initialized at startup (it
can be disabled or fixed using an environment variable).
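
To make the collision-counting idea concrete, here is a minimal sketch of a toy open-addressing table that raises once a single lookup has probed past a fixed number of occupied slots. It is not Marc Andre's patch and does not reflect how CPython's dict is implemented; the names `CountingTable`, `TooManyCollisions` and `Colliding`, the linear probing and the 2/3 load factor are all assumptions made for illustration.

```python
class TooManyCollisions(Exception):
    """Raised when a single lookup probes past the allowed number of slots."""


class CountingTable:
    """Toy open-addressing hash table with a per-lookup collision limit."""

    def __init__(self, size=8, limit=1000):
        self.slots = [None] * size      # size is kept a power of two
        self.limit = limit
        self.used = 0

    def _probe(self, key):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        collisions = 0
        while True:
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                return i
            collisions += 1
            if collisions > self.limit:
                raise TooManyCollisions(key)
            i = (i + 1) & mask          # linear probing, for simplicity

    def _grow(self):
        entries = [s for s in self.slots if s is not None]
        self.slots = [None] * (len(self.slots) * 2)
        for key, value in entries:
            self.slots[self._probe(key)] = (key, value)

    def __setitem__(self, key, value):
        if (self.used + 1) * 3 > len(self.slots) * 2:   # keep load <= 2/3
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]


class Colliding:
    """Key type whose instances all share one hash value (hypothetical)."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n
```

Ordinary keys never come near the limit in this sketch, while keys that all share one hash value trip it after a handful of insertions. Growing the table does not help in that case, because the full hashes are identical.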


I consider that Glenn's proposition is not applicable in practice
because all applications and all libraries have to be patched to use
the new "safe" dict type.

Some people are concerned by possible regression introduced by Marc's
proposition: his patch may raise an exception for legitimate data.

My proposition tries to be "just enough" secure with a low (runtime
performance) overhead. My patch becomes huge (and so backporting is
more complex), whereas Marc's patch is very simple and so trivial to
backport.

It is still unclear to me if the fix should be enabled by default for
Python < 3.3. Because the overhead (of my patch) is low, I would
prefer to enable the fix by default, to protect everyone with a simple
Python upgrade.

I prefer to explain how to explicitly disable the randomized hash
(PYTHONHASHSEED=0) (or how to fix application bugs) to people having
trouble with the randomized hash, instead of leaving the hole open by
default.

We might change hash() for types other than str, but it looks like web
servers are only concerned by dict with string keys.

We may use Paul's hash function if mine is not secure enough.

My patch doesn't fix the DoS, it just makes the attack more complex.
The attacker cannot pregenerate data for an attack: (s)he first has to
compute the hash secret, and then compute hash collisions using the
secret. The hash secret is at least 64 bits long (128 bits on a 64-bit
system). So I hope that computing collisions requires a lot of CPU
time (is slow) to make the attack ineffective with today's computers.


I plan to write a nice patch for Python 3.3, then write a simpler
patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged,
maybe don't create a new random.c file, maybe don't touch the test
suite while the patch breaks many tests), and finally write patches
for Python 2.6 and 2.7.

Details about my patch:

- I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits)
- a new PYTHONHASHSEED environment variable allows controlling the
randomized hash: PYTHONHASHSEED=0 completely disables the randomized hash
(restoring the previous behaviour), PYTHONHASHSEED=value uses a fixed seed
for processes sharing data and needing the same hash values
(multiprocessing users?)
- no overhead on hash(str)
- no startup overhead on Linux
- startup overhead is 10% on Windows (see the issue, I propose another
solution with a startup overhead of 1%)
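
As an illustration of the fixed-seed use case, the sketch below starts fresh interpreters with a chosen seed and compares the hash they compute. It assumes an interpreter that honours the `PYTHONHASHSEED` environment variable (the name released CPython versions use); `hash_in_child` is a hypothetical helper, not part of the patch.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


if __name__ == "__main__":
    # Two processes started with the same seed agree on the hash value.
    print(hash_in_child("abc", 123) == hash_in_child("abc", 123))
```

Processes that exchange hash values (or anything derived from dict ordering) would need to be started with the same seed in this way.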

The patch is not done, some tests are still failing because of the
randomized hash.


FYI, PHP released a version 5.3.9 adding "max_input_vars directive to
prevent attacks based on hash collisions (CVE-2011-4885)".


From guido at  Fri Jan 13 03:57:42 2012
From: guido at (Guido van Rossum)
Date: Thu, 12 Jan 2012 18:57:42 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Hm... I started out as a big fan of the randomized hash, but thinking more
about it, I actually believe that the chances of some legitimate app having
>1000 collisions are way smaller than the chances that somebody's code will
break due to the variable hashing. In fact we know for a fact that the
latter will break code, since it changes the order of items in a dict. This
affects many tests written without this in mind, and I assume there will be
some poor sap out there who uses Python's hash() function to address some
external persistent hash table or some other external data structure. How
pathological does the data need to be before the collision counter triggers?
I'd expect *very* pathological.
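
On the persistent-hash-table concern, code that addresses external storage can avoid depending on the built-in hash() altogether. A minimal sketch, assuming string keys and using SHA-256 from hashlib purely as a stable, process-independent function (`stable_bucket` is a hypothetical helper name):

```python
import hashlib


def stable_bucket(key: str, buckets: int) -> int:
    """Map a key to a bucket index that is the same in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % buckets
```

Such code keeps working whether or not hash() is randomized, at the cost of a slower hash computation.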

This is depending on how the counting is done (I didn't look at MAL's
patch), and assuming that increasing the hash table size will generally
reduce collisions if items collide but their hashes are different.

That said, even with collision counting I'd like a way to disable it
without changing the code, e.g. a flag or environment variable.


On Thu, Jan 12, 2012 at 5:24 PM, Victor Stinner <
victor.stinner at> wrote:

> Many people proposed their own idea to fix the vulnerability, but only
> 3 wrote a patch:
> - Glenn Linderman proposes to fix the vulnerability by adding a new
> "safe" dict type (only accepting string keys). His proof-of-concept
> ( uses a secret of 64 random bits and uses it to compute
> the hash of a key.
> - Marc Andre Lemburg proposes to fix the vulnerability directly in
> dict (for any key type). The patch raises an exception if a lookup
> causes more than 1000 collisions.
> - I propose to fix the vulnerability only in the Unicode hash (not for
> other types). My patch adds a random secret initialized at startup (it
> can be disabled or fixed using an environment variable).
> --
> I consider that Glenn's proposition is not applicable in practice
> because all applications and all libraries have to be patched to use
> the new "safe" dict type.
> Some people are concerned by possible regression introduced by Marc's
> proposition: his patch may raise an exception for legitimate data.
> My proposition tries to be "just enough" secure with a low (runtime
> performance) overhead. My patch becomes huge (and so backporting is
> more complex), whereas Marc's patch is very simple and so trivial to
> backport.
> --
> It is still unclear to me if the fix should be enabled by default for
> Python < 3.3. Because the overhead (of my patch) is low, I would
> prefer to enable the fix by default, to protect everyone with a simple
> Python upgrade.
> I prefer to explain how to disable explicitly the randomized hash
> (PYTHONHASHSEED=0) (or how to fix application bugs) to people having
> troubles with randomized hash, instead of leaving the hole open by
> default.
> --
> We might change hash() for types other than str, but it looks like web
> servers are only concerned by dict with string keys.
> We may use Paul's hash function if mine is not secure enough.
> My patch doesn't fix the DoS, it just makes the attack more complex.
> The attacker cannot pregenerate data for an attack: (s)he first has to
> compute the hash secret, and then compute hash collisions using the
> secret. The hash secret is at least 64 bits long (128 bits on a 64-bit
> system). So I hope that computing collisions requires a lot of CPU
> time (is slow) to make the attack ineffective with today's computers.
> --
> I plan to write a nice patch for Python 3.3, then write a simpler
> patch for 3.1 and 3.2 (duplicate os.urandom code to keep it unchanged,
> maybe don't create a new random.c file, maybe don't touch the test
> suite while the patch breaks many tests), and finally write patches
> for Python 2.6 and 2.7.
> Details about my patch:
> - I tested it on Linux (32 and 64 bits) and Windows (Seven 64 bits)
> - a new PYTHONHASHSEED environment variable allows controlling the
> randomized hash: PYTHONHASHSEED=0 completely disables the randomized hash
> (restoring the previous behaviour), PYTHONHASHSEED=value uses a fixed seed
> for processes sharing data and needing the same hash values
> (multiprocessing users?)
> - no overhead on hash(str)
> - no startup overhead on Linux
> - startup overhead is 10% on Windows (see the issue, I propose another
> solution with a startup overhead of 1%)
> The patch is not done, some tests are still failing because of the
> randomized hash.
> --
> FYI, PHP released a version 5.3.9 adding "max_input_vars directive to
> prevent attacks based on hash collisions (CVE-2011-4885)".
> Victor

--Guido van Rossum (

From pje at  Fri Jan 13 05:19:29 2012
From: pje at (PJ Eby)
Date: Thu, 12 Jan 2012 23:19:29 -0500
Subject: [Python-Dev] Proposed PEP on concurrent programming support
In-Reply-To: <20120111160144.66c46236@mikmeyer-vm-fedora>
References: <20120103164036.681beeae@mikmeyer-vm-fedora>
Message-ID: <>

On Wed, Jan 11, 2012 at 7:01 PM, Mike Meyer <mwm at> wrote:

> On Wed, 4 Jan 2012 00:07:27 -0500
> PJ Eby <pje at> wrote:
> > On Tue, Jan 3, 2012 at 7:40 PM, Mike Meyer <mwm at> wrote:
> > > For
> > > instance, combining STM with explicit locking would allow explicit
> > > locking when IO was required,
> > I don't think this idea makes any sense, since STM's don't really
> > "lock", and to control I/O in an STM system you just STM-ize the
> > queues. (Generally speaking.)
> I thought about that. I couldn't convince myself that STM by itself is
> sufficient. If you need to make irreversible changes to the state of
> an object, you can't use STM, so what do you use? Can every such
> situation be handled by creating "safe" values then using an STM to
> update them?

If you need to do something irreversible, you just need to use an
STM-controlled queue, with something that reads from it to do the
irreversible things.  The catch is that your queue design has to support
guaranteed-successful item removal, since if the dequeue transaction fails,
it's too late.  Alternately, the queue reader can commit removal first,
then perform the irreversible operation...  but leave open a short window
for failure.  It depends on the precise semantics you're looking for.

In either case, though, the STM is pretty much sufficient, given a good
enough queue data structure.
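
A rough sketch of the second variant (commit the removal first, then perform the irreversible operation) follows. It is only a toy: there is no real STM here, `Transaction` merely buffers requested actions until commit, and every name in it is hypothetical.

```python
import queue


class Transaction:
    """Toy stand-in for an STM transaction that buffers side effects."""

    def __init__(self, outbox):
        self._outbox = outbox
        self._pending = []

    def request(self, action):
        # Nothing irreversible happens yet; the action is only recorded.
        self._pending.append(action)

    def abort(self):
        # A rerun or rollback simply forgets the recorded actions.
        self._pending.clear()

    def commit(self):
        # Only committed transactions hand their actions to the queue.
        for action in self._pending:
            self._outbox.put(action)
        self._pending.clear()


def drain(outbox):
    """Run every committed action; called outside any transaction."""
    while True:
        try:
            action = outbox.get_nowait()   # removal happens first ...
        except queue.Empty:
            return
        action()                           # ... then the irreversible step
```

If `action()` fails after the item has been removed, the request is lost, which is the short window for failure mentioned above.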

From pydev at  Fri Jan 13 09:11:47 2012
From: pydev at (Frank Sievertsen)
Date: Fri, 13 Jan 2012 09:11:47 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Am 13.01.2012 02:24, schrieb Victor Stinner:
> My patch doesn't fix the DoS, it just makes the attack more complex.
> The attacker cannot pregenerate data for an attack: (s)he first has to
> compute the hash secret, and then compute hash collisions using the
> secret. The hash secret is at least 64 bits long (128 bits on a 64-bit
> system). So I hope that computing collisions requires a lot of CPU
> time (is slow) to make the attack ineffective with today's computers.
Unfortunately it requires only a few seconds to compute enough 32bit 
collisions on one core with no precomputed data.  I'm sure it's possible 
to make this less than a second.

In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) 
^ suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) 
is possible.

So the question is: How difficult is it to guess the seed?


From victor.stinner at  Fri Jan 13 10:23:45 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 13 Jan 2012 10:23:45 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> Unfortunately it requires only a few seconds to compute enough 32bit
> collisions on one core with no precomputed data.

Are you running the hash function "backward" to generate strings with
the same value, or are you trying something more like brute forcing?

And how do you get the hash secret? You need it to run an attack.

> In fact, since hash(X) == hash(Y) is independent of the suffix [ hash(X) ^
> suffix == hash(Y) ^ suffix ], a lot of precomputation (from the tail) is
> possible.

My change also adds a prefix (a prefix and a suffix). I don't know if
it changes anything for generating collisions.

> So the question is: How difficult is it to guess the seed?

I wrote some remarks about that in the issue. For example:

(hash("\0")^1) ^ (hash("\0\0")^2) gives ((prefix * 1000003) &
HASH_MASK) ^ ((prefix * 1000003**2)  & HASH_MASK)
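
The identity above can be checked with a small model of the hash. The sketch below is an assumption about the shape of the function (the multiply-and-XOR string hash with a secret prefix mixed in at the start and a secret suffix XORed in at the end, on 64 bits); the actual patch may differ in details, and `randomized_hash` is a hypothetical name.

```python
HASH_MASK = (1 << 64) - 1


def randomized_hash(data: bytes, prefix: int, suffix: int) -> int:
    """Model of a prefix/suffix-randomized multiply-and-XOR hash."""
    if not data:
        return 0
    x = prefix ^ (data[0] << 7)
    for byte in data:
        x = ((1000003 * x) ^ byte) & HASH_MASK
    return x ^ len(data) ^ suffix


if __name__ == "__main__":
    prefix, suffix = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    lhs = (randomized_hash(b"\0", prefix, suffix) ^ 1) ^ (
        randomized_hash(b"\0\0", prefix, suffix) ^ 2
    )
    rhs = ((prefix * 1000003) & HASH_MASK) ^ ((prefix * 1000003**2) & HASH_MASK)
    print(lhs == rhs)
```

In this model the suffix cancels out of the XOR of the two hashes, so the combined value depends on the prefix alone.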

I suppose that in practice you don't directly have the full output of
hash(str), but rather hash(str) & DICT_MASK, where DICT_MASK is the
size of the internal dict array minus 1. For example, for a dictionary
of 65,536 items, the mask is 0x1ffff and so cannot give you more than
17 bits of hash(str) output. I still don't know how difficult it is to
retrieve hash(str) bits from repr(dict).
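
The 17-bit figure can be reproduced with a simplified model of the table sizing, assuming a power-of-two table that is kept less than two thirds full (the real resizing policy has more cases; `table_mask` is a hypothetical helper):

```python
def table_mask(items: int, min_size: int = 8) -> int:
    """Mask for the smallest power-of-two table holding items below 2/3 load."""
    size = min_size
    while items * 3 >= size * 2:
        size *= 2
    return size - 1


if __name__ == "__main__":
    mask = table_mask(65536)
    print(hex(mask), mask.bit_length())
```

Under that assumption a dictionary of 65,536 items uses a table of 131,072 slots, so a slot index exposes at most 17 bits of the hash.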


From regebro at  Fri Jan 13 12:20:28 2012
From: regebro at (Lennart Regebro)
Date: Fri, 13 Jan 2012 12:20:28 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 02:24, Victor Stinner
<victor.stinner at> wrote:
> - Glenn Linderman proposes to fix the vulnerability by adding a new
> "safe" dict type (only accepting string keys). His proof-of-concept
> ( uses a secret of 64 random bits and uses it to compute
> the hash of a key.

This is my preferred solution. The vulnerability is basically only in
the dictionary you keep the form data you get from a request. This
solves it easily and nicely. It can also be a separate module
installable for Python 2, which many web frameworks still use, so it
can be practically implementable now, and not in a couple of years.

Then again, nothing prevents us from having both this, *and* one of
the other solutions.  :-)


From ncoghlan at  Fri Jan 13 13:14:43 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 13 Jan 2012 22:14:43 +1000
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
Message-ID: <>

I marked PEP 380 as Final this evening, after pushing the tested and
documented implementation to

As the list of names in the NEWS and What's New entries suggests, it
was quite a collaborative effort to get this one over the line, and
that's without even listing all the people that offered helpful
suggestions and comments along the way :)

print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))
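
For readers who have not followed the PEP, here is a more conventional sketch of what the new syntax does: values sent to the outer generator reach the inner one, and the inner generator's return value becomes the value of the `yield from` expression. The function names are made up for the example, which needs an interpreter with PEP 380 support.

```python
def inner():
    received = yield 1          # a value sent to outer() arrives here
    yield 2
    return "inner got " + repr(received)


def outer():
    # Delegates next()/send() to inner() until it finishes, then
    # binds inner()'s return value to result.
    result = yield from inner()
    yield result


if __name__ == "__main__":
    gen = outer()
    print(next(gen))
    print(gen.send("hi"))
    print(next(gen))
```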

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From frank at  Fri Jan 13 12:49:15 2012
From: frank at (Frank Sievertsen)
Date: Fri, 13 Jan 2012 12:49:15 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

>> Unfortunately it requires only a few seconds to compute enough 32bit
>> collisions on one core with no precomputed data.
> Are you running the hash function "backward" to generate strings with
> the same value, or you are more trying something like brute forcing?

If you try brute force to hit a specific target, you'll find only
one good string every 4 billion tries. That's why you first blow up
your target:

You start backward from an arbitrary target-value. You brute force for 3 
characters, for example, this will give you 16 million intermediate 
values from which you know that they'll end up in your target-value.

Those 16 million values are a huge target for now brute-forcing forward: 
Every 256 tries you'll hit one of these values.
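
The numbers above follow from simple arithmetic, assuming a 32-bit hash, 8-bit characters, and that the intermediate values are distinct and evenly spread:

```python
HASH_SPACE = 2 ** 32            # possible 32-bit hash values
CHAR_VALUES = 256               # assumed 8-bit characters
BACKWARD_CHARS = 3              # characters brute-forced backward

# Intermediate values known to lead to the chosen target.
targets = CHAR_VALUES ** BACKWARD_CHARS

# Expected number of forward tries before hitting one of them.
tries_per_hit = HASH_SPACE // targets

if __name__ == "__main__":
    print(targets, tries_per_hit)
```

That gives 16,777,216 intermediate targets and, on average, one hit per 256 forward tries instead of one per roughly 4 billion.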

> And how do you get the hash secret? You need it to run an attack.

I don't know. This was meant as an answer to the quoted text "So I hope 
that computing collisions requires a lot of CPU time (is slow) to make 
the attack ineffective with today computers.".

What I wanted to say is: The security relies on the fact that the 
attacker can't guess the prefix, not that he can't precompute the values 
and it takes hours or days to compute the collisions. If the prefix 
leaks out of the application, then the rest is trivial and done in a few 
seconds. The suffix is not important for the collision-prevention, but 
it will probably make it much harder to guess the prefix.

I don't know an effective way to get the prefix either (if the
application doesn't leak full hash(X) values).


From anacrolix at  Fri Jan 13 13:34:38 2012
From: anacrolix at (Matt Joiner)
Date: Fri, 13 Jan 2012 23:34:38 +1100
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
In-Reply-To: <>
References: <>
Message-ID: <>

Great work Nick, I've been looking forward to this one. Thanks all for
putting the effort in.

On Fri, Jan 13, 2012 at 11:14 PM, Nick Coghlan <ncoghlan at> wrote:
> I marked PEP 380 as Final this evening, after pushing the tested and
> documented implementation to
> As the list of names in the NEWS and What's New entries suggests, it
> was quite a collaborative effort to get this one over the line, and
> that's without even listing all the people that offered helpful
> suggestions and comments along the way :)
> print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From and-dev at  Fri Jan 13 13:45:50 2012
From: and-dev at (And Clover)
Date: Fri, 13 Jan 2012 12:45:50 +0000
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-13 11:20, Lennart Regebro wrote:
> The vulnerability is basically only in the dictionary you keep the
> form data you get from a request.

I'd have to disagree with this statement. The vulnerability is anywhere 
that creates a dictionary (or set) from attacker-provided keys. That 
would include HTTP headers, RFC822-family subheaders and parameters, the 
environ, input taken from JSON or XML, and so on - and indeed hash 
collision attacks are not at all web-specific.

The problem with having two dict implementations is that a caller would 
have to tell libraries that use dictionaries which implementation to 
use. So for example an argument would have to be passed to json.load[s] 
to specify whether the input was known-sane or potentially hostile.

Any library that could ever use dictionaries to process untrusted input *or
any library that used another library that did* would have to pass such 
a flag through, which would quickly get very unwieldy indeed... or else 
they'd have to just always use safedict, in which case we're in pretty 
much the same position as we are with changing dict anyway.

And Clover
mailto:and at
gtalk:chat?jid=bobince at

From g.brandl at  Fri Jan 13 16:17:09 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 13 Jan 2012 16:17:09 +0100
Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes
In-Reply-To: <>
References: <>
Message-ID: <jephtj$7d0$>

Caution, long review ahead.

On 01/13/2012 12:43 PM, nick.coghlan wrote:
> changeset:   74356:d64ac9ab4cd0
> user:        Nick Coghlan <ncoghlan at>
> date:        Fri Jan 13 21:43:40 2012 +1000
> summary:
>   Implement PEP 380 - 'yield from' (closes #11682)
> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
> --- a/Doc/reference/expressions.rst
> +++ b/Doc/reference/expressions.rst
> @@ -318,7 +318,7 @@

There should probably be a "versionadded" somewhere on this page.

>  .. productionlist::
>     yield_atom: "(" `yield_expression` ")"
> -   yield_expression: "yield" [`expression_list`]
> +   yield_expression: "yield" [`expression_list` | "from" `expression`]
>  The :keyword:`yield` expression is only used when defining a generator function,
>  and can only be used in the body of a function definition.  Using a
> @@ -336,7 +336,10 @@
>  the generator's methods, the function can proceed exactly as if the
>  :keyword:`yield` expression was just another external call.  The value of the
>  :keyword:`yield` expression after resuming depends on the method which resumed
> -the execution.
> +the execution. If :meth:`__next__` is used (typically via either a
> +:keyword:`for` or the :func:`next` builtin) then the result is :const:`None`,
> +otherwise, if :meth:`send` is used, then the result will be the value passed
> +in to that method.
>  .. index:: single: coroutine
> @@ -346,12 +349,29 @@
>  where should the execution continue after it yields; the control is always
>  transferred to the generator's caller.
> -The :keyword:`yield` statement is allowed in the :keyword:`try` clause of a
> +:keyword:`yield` expressions are allowed in the :keyword:`try` clause of a
>  :keyword:`try` ...  :keyword:`finally` construct.  If the generator is not
>  resumed before it is finalized (by reaching a zero reference count or by being
>  garbage collected), the generator-iterator's :meth:`close` method will be
>  called, allowing any pending :keyword:`finally` clauses to execute.
> +When ``yield from expression`` is used, it treats the supplied expression as
> +a subiterator. All values produced by that subiterator are passed directly
> +to the caller of the current generator's methods. Any values passed in with
> +:meth:`send` and any exceptions passed in with :meth:`throw` are passed to
> +the underlying iterator if it has the appropriate methods. If this is not the
> +case, then :meth:`send` will raise :exc:`AttributeError` or :exc:`TypeError`,
> +while :meth:`throw` will just raise the passed in exception immediately.
> +
> +When the underlying iterator is complete, the :attr:`~StopIteration.value`
> +attribute of the raised :exc:`StopIteration` instance becomes the value of
> +the yield expression. It can be either set explicitly when raising
> +:exc:`StopIteration`, or automatically when the sub-iterator is a generator
> +(by returning a value from the sub-generator).
> +
> +The parentheses can be omitted when the :keyword:`yield` expression is the
> +sole expression on the right hand side of an assignment statement.
> +
>  .. index:: object: generator
>  The following generator's methods can be used to control the execution of a
> @@ -444,6 +464,10 @@
>        The proposal to enhance the API and syntax of generators, making them
>        usable as simple coroutines.
> +   :pep:`0380` - Syntax for Delegating to a Subgenerator
> +      The proposal to introduce the :token:`yield_from` syntax, making delegation
> +      to sub-generators easy.
> +
>  .. _primaries:
>  PEP 3155: Qualified name for classes and functions
>  ==================================================
> @@ -208,7 +224,6 @@
>  how they might be accessible from the global scope.
>  Example with (non-bound) methods::
> -
>     >>> class C:
>     ...     def meth(self):
>     ...         pass

This looks like a spurious (and syntax-breaking) change.

> diff --git a/Grammar/Grammar b/Grammar/Grammar
> --- a/Grammar/Grammar
> +++ b/Grammar/Grammar
> @@ -121,7 +121,7 @@
>                           |'**' test)
>  # The reason that keywords are test nodes instead of NAME is that using NAME
>  # results in an ambiguity. ast.c makes sure it's a NAME.
> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test

This looks like a change without effect?

> diff --git a/Include/genobject.h b/Include/genobject.h
> --- a/Include/genobject.h
> +++ b/Include/genobject.h
> @@ -11,20 +11,20 @@
>  struct _frame; /* Avoid including frameobject.h */
>  typedef struct {
> -	PyObject_HEAD
> -	/* The gi_ prefix is intended to remind of generator-iterator. */
> +        PyObject_HEAD
> +        /* The gi_ prefix is intended to remind of generator-iterator. */
> -	/* Note: gi_frame can be NULL if the generator is "finished" */
> -	struct _frame *gi_frame;
> +        /* Note: gi_frame can be NULL if the generator is "finished" */
> +        struct _frame *gi_frame;
> -	/* True if generator is being executed. */
> -	int gi_running;
> +        /* True if generator is being executed. */
> +        int gi_running;
> -	/* The code object backing the generator */
> -	PyObject *gi_code;
> +        /* The code object backing the generator */
> +        PyObject *gi_code;
> -	/* List of weak reference. */
> -	PyObject *gi_weakreflist;
> +        /* List of weak reference. */
> +        PyObject *gi_weakreflist;
>  } PyGenObject;

While these change tabs into spaces, it should be 4 spaces, not 8.

>  @@ -34,6 +34,7 @@
>  PyAPI_FUNC(PyObject *) PyGen_New(struct _frame *);
>  PyAPI_FUNC(int) PyGen_NeedsFinalizing(PyGenObject *);
> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>  #ifdef __cplusplus
>  }

Does this API need to be public? If yes, it needs to be documented.

> diff --git a/Include/opcode.h b/Include/opcode.h
> --- a/Include/opcode.h
> +++ b/Include/opcode.h
> @@ -7,116 +7,117 @@
>  /* Instruction opcodes for compiled code */
> -#define POP_TOP		1
> -#define ROT_TWO		2
> -#define ROT_THREE	3
> -#define DUP_TOP		4
> +#define POP_TOP         1
> +#define ROT_TWO         2
> +#define ROT_THREE       3
> +#define DUP_TOP         4
>  #define DUP_TOP_TWO     5
> -#define NOP		9
> +#define NOP             9
> -#define UNARY_POSITIVE	10
> -#define UNARY_NEGATIVE	11
> -#define UNARY_NOT	12
> +#define UNARY_POSITIVE  10
> +#define UNARY_NEGATIVE  11
> +#define UNARY_NOT       12
> -#define UNARY_INVERT	15
> +#define UNARY_INVERT    15
> -#define BINARY_POWER	19
> +#define BINARY_POWER    19
> -#define BINARY_MULTIPLY	20
> +#define BINARY_MULTIPLY 20
> -#define BINARY_MODULO	22
> -#define BINARY_ADD	23
> -#define BINARY_SUBTRACT	24
> -#define BINARY_SUBSCR	25
> +#define BINARY_MODULO   22
> +#define BINARY_ADD      23
> +#define BINARY_SUBTRACT 24
> +#define BINARY_SUBSCR   25
>  #define BINARY_TRUE_DIVIDE 27
> -#define STORE_MAP	54
> -#define INPLACE_ADD	55
> -#define INPLACE_SUBTRACT	56
> -#define INPLACE_MULTIPLY	57
> +#define STORE_MAP       54
> +#define INPLACE_ADD     55
> +#define INPLACE_SUBTRACT        56
> +#define INPLACE_MULTIPLY        57
> -#define INPLACE_MODULO	59
> -#define STORE_SUBSCR	60
> -#define DELETE_SUBSCR	61
> +#define INPLACE_MODULO  59
> +#define STORE_SUBSCR    60
> +#define DELETE_SUBSCR   61
> -#define BINARY_LSHIFT	62
> -#define BINARY_RSHIFT	63
> -#define BINARY_AND	64
> -#define BINARY_XOR	65
> -#define BINARY_OR	66
> -#define INPLACE_POWER	67
> -#define GET_ITER	68
> -#define STORE_LOCALS	69
> -#define PRINT_EXPR	70
> +#define BINARY_LSHIFT   62
> +#define BINARY_RSHIFT   63
> +#define BINARY_AND      64
> +#define BINARY_XOR      65
> +#define BINARY_OR       66
> +#define INPLACE_POWER   67
> +#define GET_ITER        68
> +#define STORE_LOCALS    69
> +#define PRINT_EXPR      70
>  #define LOAD_BUILD_CLASS 71
> +#define YIELD_FROM      72
> -#define INPLACE_LSHIFT	75
> -#define INPLACE_RSHIFT	76
> -#define INPLACE_AND	77
> -#define INPLACE_XOR	78
> -#define INPLACE_OR	79
> -#define BREAK_LOOP	80
> +#define INPLACE_LSHIFT  75
> +#define INPLACE_RSHIFT  76
> +#define INPLACE_AND     77
> +#define INPLACE_XOR     78
> +#define INPLACE_OR      79
> +#define BREAK_LOOP      80
>  #define WITH_CLEANUP    81
> -#define RETURN_VALUE	83
> -#define IMPORT_STAR	84
> +#define RETURN_VALUE    83
> +#define IMPORT_STAR     84
> -#define YIELD_VALUE	86
> -#define POP_BLOCK	87
> -#define END_FINALLY	88
> -#define POP_EXCEPT	89
> +#define YIELD_VALUE     86
> +#define POP_BLOCK       87
> +#define END_FINALLY     88
> +#define POP_EXCEPT      89
> -#define HAVE_ARGUMENT	90	/* Opcodes from here have an argument: */
> +#define HAVE_ARGUMENT   90      /* Opcodes from here have an argument: */
> -#define STORE_NAME	90	/* Index in name list */
> -#define DELETE_NAME	91	/* "" */
> -#define UNPACK_SEQUENCE	92	/* Number of sequence items */
> -#define FOR_ITER	93
> +#define STORE_NAME      90      /* Index in name list */
> +#define DELETE_NAME     91      /* "" */
> +#define UNPACK_SEQUENCE 92      /* Number of sequence items */
> +#define FOR_ITER        93
>  #define UNPACK_EX       94      /* Num items before variable part +
>                                     (Num items after variable part << 8) */
> -#define STORE_ATTR	95	/* Index in name list */
> -#define DELETE_ATTR	96	/* "" */
> -#define STORE_GLOBAL	97	/* "" */
> -#define DELETE_GLOBAL	98	/* "" */
> +#define STORE_ATTR      95      /* Index in name list */
> +#define DELETE_ATTR     96      /* "" */
> +#define STORE_GLOBAL    97      /* "" */
> +#define DELETE_GLOBAL   98      /* "" */
> -#define LOAD_CONST	100	/* Index in const list */
> -#define LOAD_NAME	101	/* Index in name list */
> -#define BUILD_TUPLE	102	/* Number of tuple items */
> -#define BUILD_LIST	103	/* Number of list items */
> -#define BUILD_SET	104     /* Number of set items */
> -#define BUILD_MAP	105	/* Always zero for now */
> -#define LOAD_ATTR	106	/* Index in name list */
> -#define COMPARE_OP	107	/* Comparison operator */
> -#define IMPORT_NAME	108	/* Index in name list */
> -#define IMPORT_FROM	109	/* Index in name list */
> +#define LOAD_CONST      100     /* Index in const list */
> +#define LOAD_NAME       101     /* Index in name list */
> +#define BUILD_TUPLE     102     /* Number of tuple items */
> +#define BUILD_LIST      103     /* Number of list items */
> +#define BUILD_SET       104     /* Number of set items */
> +#define BUILD_MAP       105     /* Always zero for now */
> +#define LOAD_ATTR       106     /* Index in name list */
> +#define COMPARE_OP      107     /* Comparison operator */
> +#define IMPORT_NAME     108     /* Index in name list */
> +#define IMPORT_FROM     109     /* Index in name list */
> -#define JUMP_FORWARD	110	/* Number of bytes to skip */
> -#define JUMP_IF_FALSE_OR_POP 111	/* Target byte offset from beginning of code */
> -#define JUMP_IF_TRUE_OR_POP 112	/* "" */
> -#define JUMP_ABSOLUTE	113	/* "" */
> -#define POP_JUMP_IF_FALSE 114	/* "" */
> -#define POP_JUMP_IF_TRUE 115	/* "" */
> +#define JUMP_FORWARD    110     /* Number of bytes to skip */
> +#define JUMP_IF_FALSE_OR_POP 111        /* Target byte offset from beginning of code */
> +#define JUMP_IF_TRUE_OR_POP 112 /* "" */
> +#define JUMP_ABSOLUTE   113     /* "" */
> +#define POP_JUMP_IF_FALSE 114   /* "" */
> +#define POP_JUMP_IF_TRUE 115    /* "" */
> -#define LOAD_GLOBAL	116	/* Index in name list */
> +#define LOAD_GLOBAL     116     /* Index in name list */
> -#define CONTINUE_LOOP	119	/* Start of loop (absolute) */
> -#define SETUP_LOOP	120	/* Target address (relative) */
> -#define SETUP_EXCEPT	121	/* "" */
> -#define SETUP_FINALLY	122	/* "" */
> +#define CONTINUE_LOOP   119     /* Start of loop (absolute) */
> +#define SETUP_LOOP      120     /* Target address (relative) */
> +#define SETUP_EXCEPT    121     /* "" */
> +#define SETUP_FINALLY   122     /* "" */
> -#define LOAD_FAST	124	/* Local variable number */
> -#define STORE_FAST	125	/* Local variable number */
> -#define DELETE_FAST	126	/* Local variable number */
> +#define LOAD_FAST       124     /* Local variable number */
> +#define STORE_FAST      125     /* Local variable number */
> +#define DELETE_FAST     126     /* Local variable number */
> -#define RAISE_VARARGS	130	/* Number of raise arguments (1, 2 or 3) */
> +#define RAISE_VARARGS   130     /* Number of raise arguments (1, 2 or 3) */
>  /* CALL_FUNCTION_XXX opcodes defined below depend on this definition */
> -#define CALL_FUNCTION	131	/* #args + (#kwargs<<8) */
> -#define MAKE_FUNCTION	132	/* #defaults + #kwdefaults<<8 + #annotations<<16 */
> -#define BUILD_SLICE 	133	/* Number of items */
> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
> +#define BUILD_SLICE     133     /* Number of items */
>  #define MAKE_CLOSURE    134     /* same as MAKE_FUNCTION */
>  #define LOAD_CLOSURE    135     /* Load free variable from closure */

Not sure putting these and all the other cosmetic changes into an already
big patch is such a good idea...

> diff --git a/Include/pyerrors.h b/Include/pyerrors.h
> --- a/Include/pyerrors.h
> +++ b/Include/pyerrors.h
> @@ -51,6 +51,11 @@
>      Py_ssize_t written;   /* only for BlockingIOError, -1 otherwise */
>  } PyOSErrorObject;
> +typedef struct {
> +    PyException_HEAD
> +    PyObject *value;
> +} PyStopIterationObject;
> +
>  /* Compatibility typedefs */
>  typedef PyOSErrorObject PyEnvironmentErrorObject;
>  #ifdef MS_WINDOWS
> @@ -380,6 +385,8 @@
>      const char *reason          /* UTF-8 encoded string */
>      );
> +/* create a StopIteration exception with the given value */
> +PyAPI_FUNC(PyObject *) PyStopIteration_Create(PyObject *);

About this API see below.

> diff --git a/Objects/abstract.c b/Objects/abstract.c
> --- a/Objects/abstract.c
> +++ b/Objects/abstract.c
> @@ -2267,7 +2267,6 @@
>      func = PyObject_GetAttrString(o, name);
>      if (func == NULL) {
> -        PyErr_SetString(PyExc_AttributeError, name);
>          return 0;
>      }
> @@ -2311,7 +2310,6 @@
>      func = PyObject_GetAttrString(o, name);
>      if (func == NULL) {
> -        PyErr_SetString(PyExc_AttributeError, name);
>          return 0;
>      }
>      va_start(va, format);

These two changes also look suspiciously unrelated?

> +PyObject *
> +PyStopIteration_Create(PyObject *value)
> +{
> +    return PyObject_CallFunctionObjArgs(PyExc_StopIteration, value, NULL);
> +}

I think this function is rather questionable.  It is only used once at all.
If kept, it should rather be named _PyE{rr,xc}_CreateStopIteration.  But since
it's so trivial, it should be removed altogether.

> diff --git a/Objects/genobject.c b/Objects/genobject.c
> --- a/Objects/genobject.c
> +++ b/Objects/genobject.c
> @@ -5,6 +5,9 @@
>  #include "structmember.h"
>  #include "opcode.h"
> +static PyObject *gen_close(PyGenObject *gen, PyObject *args);
> +static void gen_undelegate(PyGenObject *gen);
> +
>  static int
>  gen_traverse(PyGenObject *gen, visitproc visit, void *arg)
>  {
> @@ -90,12 +93,18 @@
>      /* If the generator just returned (as opposed to yielding), signal
>       * that the generator is exhausted. */
> -    if (result == Py_None && f->f_stacktop == NULL) {
> -        Py_DECREF(result);
> -        result = NULL;
> -        /* Set exception if not called by gen_iternext() */
> -        if (arg)
> +    if (result && f->f_stacktop == NULL) {
> +        if (result == Py_None) {
> +            /* Delay exception instantiation if we can */
>              PyErr_SetNone(PyExc_StopIteration);
> +        } else {
> +            PyObject *e = PyStopIteration_Create(result);
> +            if (e != NULL) {
> +                PyErr_SetObject(PyExc_StopIteration, e);
> +                Py_DECREF(e);
> +            }

Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here?

> +/*
> + *   If StopIteration exception is set, fetches its 'value'
> + *   attribute if any, otherwise sets pvalue to None.
> + *
> + *   Returns 0 if no exception or StopIteration is set.
> + *   If any other exception is set, returns -1 and leaves
> + *   pvalue unchanged.
> + */
> +
> +int
> +PyGen_FetchStopIterationValue(PyObject **pvalue) {
> +    PyObject *et, *ev, *tb;
> +    PyObject *value = NULL;
> +    
> +    if (PyErr_ExceptionMatches(PyExc_StopIteration)) {
> +        PyErr_Fetch(&et, &ev, &tb);
> +        Py_XDECREF(et);
> +        Py_XDECREF(tb);
> +        if (ev) {
> +            value = ((PyStopIterationObject *)ev)->value;
> +            Py_DECREF(ev);
> +        }

PyErr_Fetch without PyErr_Restore clears the exception; that should be
mentioned in the docstring.
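
For reference, the Python-level behaviour that these C helpers support can be
sketched as follows. This is only an illustration of PEP 380 semantics as
released in Python 3.3, not part of the patch under review; the names
inner, outer and log are made up for the example.

```python
def inner():
    yield 1
    return "result"          # carried out of the generator as StopIteration.value


def outer(log):
    # "yield from" evaluates to the value carried by the inner StopIteration
    value = yield from inner()
    log.append(value)


# Driving the generator by hand exposes the exception's value attribute.
g = inner()
first = next(g)
try:
    next(g)
except StopIteration as exc:
    stop_value = exc.value

log = []
items = list(outer(log))
```

The delegating generator never sees the StopIteration itself; it only
receives the value, which is what the fetch helper above extracts in C.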


From techtonik at  Fri Jan 13 16:34:38 2012
From: techtonik at (anatoly techtonik)
Date: Fri, 13 Jan 2012 18:34:38 +0300
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior in
 Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
Message-ID: <>

Posting to python-dev as this no longer relates to the idea of improving
print().

sys.stdout.write() in Python 3 causes backwards-incompatible behavior that
breaks the recipe for unbuffered character reading from stdin on Linux.
At first I thought that the problem was in the new print() function, but it
turned out that the culprit is sys.stdout.write().

Attached is a test script which is a stripped-down version of the recipe
above.

If executed with Python 2, you can see the prompt to press a key (even
though output on Linux is buffered in Python 2).
With Python 3, there is no prompt until you press a key.

Is it a bug or intended behavior? What is the cause of this break?
anatoly t.
-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-python
Size: 489 bytes
Desc: not available
URL: <>

From guido at  Fri Jan 13 16:49:56 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 07:49:56 -0800
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
 in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <>

I think this may be because in Python 2, there is a coupling between stdin
and stdout (in the C stdlib code) that flushes stdout when you read stdin.
This doesn't seem to be required by the C std, but most implementations
seem to do it.

I think it was a nice feature but I can see problems with it; apps that
want this behavior ought to bite the bullet and flush stdout.
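
A minimal sketch of the explicit-flush approach suggested above. The helper
name prompt and the RecordingStream stand-in are hypothetical, introduced
only to make the example self-contained; real code would simply pass
sys.stdout.

```python
import sys


def prompt(message, out=None):
    """Write message without a newline and flush so it is shown immediately."""
    out = sys.stdout if out is None else out
    out.write(message)
    out.flush()


class RecordingStream:
    """Stand-in stream (hypothetical) that records writes and flushes."""

    def __init__(self):
        self.events = []

    def write(self, text):
        self.events.append(("write", text))
        return len(text)

    def flush(self):
        self.events.append(("flush",))


stream = RecordingStream()
prompt("Press a key> ", out=stream)
```

With the flush in place, the prompt does not depend on any implicit
stdin/stdout coupling in the underlying I/O layer.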

On Fri, Jan 13, 2012 at 7:34 AM, anatoly techtonik <techtonik at> wrote:

> Posting to python-dev as it is no more relates to the idea of improving
> print().
> sys.stdout.write() in Python 3 causes backwards incompatible behavior that
> breaks recipe for unbuffered character reading from stdin on Linux -
>  At first I though that the
> problem is in the new print() function, but it appeared that the culprit is
> sys.stdout.write()
> Attached is a test script which is a stripped down version of the recipe
> above.
> If executed with Python 2, you can see the prompt to press a key (even
> though output on Linux is buffered in Python 2).
> With Python 3, there is not prompt until you press a key.
> Is it a bug or intended behavior? What is the cause of this break?
> --
> anatoly t.

--Guido van Rossum (

From guido at  Fri Jan 13 17:00:23 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 08:00:23 -0800
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
In-Reply-To: <>
References: <>
Message-ID: <>


On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan <ncoghlan at> wrote:

> I marked PEP 380 as Final this evening, after pushing the tested and
> documented implementation to
> As the list of names in the NEWS and What's New entries suggests, it
> was quite a collaborative effort to get this one over the line, and
> that's without even listing all the people that offered helpful
> suggestions and comments along the way :)
> print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))

--Guido van Rossum (

From python-dev at  Fri Jan 13 17:00:57 2012
From: python-dev at (Xavier Morel)
Date: Fri, 13 Jan 2012 17:00:57 +0100
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
	in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-13, at 16:34 , anatoly techtonik wrote:
> Posting to python-dev as it is no more relates to the idea of improving
> print().
> sys.stdout.write() in Python 3 causes backwards incompatible behavior that
> breaks recipe for unbuffered character reading from stdin on Linux -
>  At first I though that the
> problem is in the new print() function, but it appeared that the culprit is
> sys.stdout.write()
> Attached is a test script which is a stripped down version of the recipe
> above.
> If executed with Python 2, you can see the prompt to press a key (even
> though output on Linux is buffered in Python 2).
> With Python 3, there is not prompt until you press a key.
> Is it a bug or intended behavior? What is the cause of this break?
FWIW this is not restricted to Linux (the same behavior change can
be observed on OS X), and the script is overly complex: you can expose
the change with 3 lines

    import sys
    sys.stdout.write('prompt>')

Python 2 displays "prompt>" and terminates execution on [Return],
Python 3 does not display anything until [Return] is pressed.

Interestingly, the `-u` option is not sufficient to make
"prompt>" appear in Python 3, the stream has to be flushed
explicitly unless the input is ~16k characters (I guess that's
an internal buffer size of some sort)

From solipsis at  Fri Jan 13 17:19:08 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 13 Jan 2012 17:19:08 +0100
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
 in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
References: <>
Message-ID: <>

On Fri, 13 Jan 2012 17:00:57 +0100
Xavier Morel <python-dev at> wrote:
> FWIW this is not restricted to Linux (the same behavior change can
> be observed in OSX), and the script is overly complex you can expose
> the change with 3 lines
>     import sys
>     sys.stdout.write('promt>')
> Python 2 displays "prompt" and terminates execution on [Return],
> Python 3 does not display anything until [Return] is pressed.
> Interestingly, the `-u` option is not sufficient to make
> "prompt>" appear in Python 3, the stream has to be flushed
> explicitly unless the input is ~16k characters (I guess that's
> an internal buffer size of some sort)

"-u" forces line-buffering mode for stdout/stderr, which is already the
default if they are wired to an interactive device (isatty() returning
True).
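
To illustrate why line buffering does not help with a prompt that lacks a
newline, here is a small sketch using an in-memory stream layered roughly
like sys.stdout (text wrapper over a buffered writer). The helper name
make_stream is made up for the example, and the in-memory backing is an
assumption standing in for a real terminal.

```python
import io


def make_stream(line_buffering):
    """Build a text stream layered like sys.stdout, backed by memory."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(
        io.BufferedWriter(raw),
        encoding="ascii",
        newline="\n",
        line_buffering=line_buffering,
    )
    return raw, text


raw, out = make_stream(line_buffering=True)

out.write("prompt>")            # no newline: the data stays in the buffers
pending = raw.getvalue()

out.write(" ok\n")              # a newline triggers a flush in line-buffered mode
after_newline = raw.getvalue()
```

Only an explicit flush() (or a newline, in line-buffered mode) pushes the
text down to the underlying device.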

But this was already rehashed on python-ideas and the bug tracker, and
apparently Anatoly thought it would be a good idea to post on a third
medium. Sigh.



From status at  Fri Jan 13 18:07:30 2012
From: status at (Python tracker)
Date: Fri, 13 Jan 2012 18:07:30 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (2012-01-06 - 2012-01-13)
Python tracker at

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3210 (+30)
  closed 22352 (+30)
  total  25562 (+60)

Open issues with patches: 1384 

Issues opened (42)

#6774: socket.shutdown documentation: on some platforms, closing one  reopened by neologix

#13721: ssl.wrap_socket on a connected but failed connection succeeds  opened by kiilerix

#13722: "distributions can disable the encodings package"  opened by pitrou

#13723: Regular expressions: (?:X|\s+)*$ takes a long time  opened by ericp

#13725: regrtest does not recognize -d flag  opened by etukia

#13726: regrtest ambiguous -S flag  opened by etukia

#13727: Accessor macros for PyDateTime_Delta members  opened by amaury.forgeotdarc

#13728: Description of -m and -c cli options wrong?  opened by sandro.tosi

#13730: Grammar mistake in Decimal documentation  opened by zacherates

#13733: Change required to for Python 2.7.2 on OS/2  opened by Paul.Smedley

#13734: Add a generic directory walker method to avoid symlink attacks  opened by hynek

#13736: urllib.request.urlopen leaks exceptions from socket and httpli  opened by jmoy

#13737:'s Django settings file DEBUG=True  opened by Bithin.A

#13740: winsound.SND_NOWAIT ignored on modern Windows platforms  opened by bughunter2

#13742: Add a key parameter (like sorted) to heapq.merge  opened by ssapin

#13743: xml.dom.minidom.Document class is not documented  opened by sandro.tosi

#13744: raw byte strings are described in a confusing way  opened by barry

#13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte  opened by doko

#13746: ast.Tuple's have an inconsistent "col_offset" value  opened by bronikkk

#13747: ssl_version documentation error  opened by Ben.Darnell

#13749: socketserver can't stop  opened by teamnoir

#13751: multiprocessing.pool hangs if any worker raises an Exception w  opened by fmitha

#13752: add a str.casefold() method  opened by benjamin.peterson

#13756: Python3.2.2 make fail on cygwin  opened by holgerd00d

#13758: compile() should not encode 'filename' (at least on Windows)  opened by terry.reedy

#13759: Python 3.2.2 Mac installer version doesn't accept multibyte ch  opened by ats

#13760: ConfigParser exceptions are not pickleable  opened by fmitha

#13761: Add flush keyword to print()  opened by georg.brandl

#13763: rm obsolete reference in devguide  opened by tshepang

#13764: Misc/ is outdated... talks about svn  opened by tshepang

#13766: explain the relationship between Lib/lib2to3/Grammar.txt and G  opened by tshepang

#13768: Doc/tools/ available only on 2.7 branch  opened by tshepang

#13769: json.dump(ensure_ascii=False) return str instead of unicode  opened by mmarkk

#13770: python3 & json: add ensure_ascii documentation  opened by mmarkk

#13771: HTTPSConnection __init__ super implementation causes recursion  opened by michael.mulich

#13772: listdir() doesn't work with non-trivial symlinks  opened by pitrou

#13773: Support sqlite3 uri filenames  opened by poq

#13774: json.loads raises a SystemError for invalid encoding on 2.7.2  opened by Julian

#13775: Access Denied message on symlink creation misleading for an ex  opened by santa4nt

#13777: socket: communicating with Mac OS X KEXT controls  opened by goderbauer

#13779: os.walk: bottom-up  opened by patrick.vrijlandt

#13780: make YieldFrom its own node  opened by benjamin.peterson

Most recent 15 issues with no replies (15)

#13780: make YieldFrom its own node

#13779: os.walk: bottom-up

#13777: socket: communicating with Mac OS X KEXT controls

#13771: HTTPSConnection __init__ super implementation causes recursion

#13770: python3 & json: add ensure_ascii documentation

#13769: json.dump(ensure_ascii=False) return str instead of unicode

#13768: Doc/tools/ available only on 2.7 branch

#13766: explain the relationship between Lib/lib2to3/Grammar.txt and G

#13760: ConfigParser exceptions are not pickleable

#13756: Python3.2.2 make fail on cygwin

#13745: configuring --with-dbmliborder=bdb doesn't build the gdbm exte

#13743: xml.dom.minidom.Document class is not documented

#13740: winsound.SND_NOWAIT ignored on modern Windows platforms

#13730: Grammar mistake in Decimal documentation

#13727: Accessor macros for PyDateTime_Delta members

Most recent 15 issues waiting for review (15)

#13780: make YieldFrom its own node

#13777: socket: communicating with Mac OS X KEXT controls

#13775: Access Denied message on symlink creation misleading for an ex

#13774: json.loads raises a SystemError for invalid encoding on 2.7.2

#13773: Support sqlite3 uri filenames

#13763: rm obsolete reference in devguide

#13761: Add flush keyword to print()

#13752: add a str.casefold() method

#13742: Add a key parameter (like sorted) to heapq.merge

#13736: urllib.request.urlopen leaks exceptions from socket and httpli

#13734: Add a generic directory walker method to avoid symlink attacks

#13733: Change required to for Python 2.7.2 on OS/2

#13730: Grammar mistake in Decimal documentation

#13727: Accessor macros for PyDateTime_Delta members

#13725: regrtest does not recognize -d flag

Top 10 most discussed issues (10)

#13703: Hash collision security issue  43 msgs

#13734: Add a generic directory walker method to avoid symlink attacks  13 msgs

#13761: Add flush keyword to print()  12 msgs

#13721: ssl.wrap_socket on a connected but failed connection succeeds   8 msgs

#13122: Out of date links in the sidebar of the documentation index of   7 msgs

#13241: llvm-gcc-4.2 miscompiles Python (XCode 4.1 on Mac OS 10.7)   7 msgs

#13733: Change required to for Python 2.7.2 on OS/2   7 msgs

#13642: urllib incorrectly quotes username and password in https basic   6 msgs

#9253: argparse: optional subparsers   5 msgs

#13521: Make dict.setdefault() atomic   5 msgs

Issues closed (29)

#9637: docs do not say that urllib uses HTTP_PROXY  closed by orsenthil

#9993: shutil.move fails on symlink source  closed by pitrou

#11418: Method's global scope is module containing function definition  closed by python-dev

#11682: PEP 380 reference implementation for 3.3  closed by ncoghlan

#12364: Deadlock in test_concurrent_futures  closed by rosslagerwall

#13168: Python 2.6 having trouble finding modules when invoked via a s  closed by terry.reedy

#13502: Documentation for Event.wait return value is either wrong or i  closed by neologix

#13692: 2to3 mangles from . import frobnitz  closed by benjamin.peterson

#13718: Format Specification Mini-Language does not accept comma for p  closed by eric.smith

#13724: socket.create_connection and multiple IP addresses  closed by pitrou

#13729: Evaluation order for dics key/value  closed by amaury.forgeotdarc

#13731: Awkward phrasing in Decimal documentation  closed by rhettinger

#13732: test_logging failure on Windows buildbots  closed by python-dev

#13735: The protocol > 0 of cPickle does not given stable dictionary v  closed by pitrou

#13738: Optimize bytes.upper() and lower()  closed by pitrou

#13739: os.fdlistdir() is not idempotent  closed by neologix

#13741: *** glibc detected *** python: double free or corruption (!pre  closed by neologix

#13748: Allow rb"" literals as an equivalent to br""  closed by pitrou

#13750: queue broken when built without-thread  closed by rhettinger

#13753: str.join description contains an incorrect reference to argume  closed by terry.reedy

#13754: str.ljust and str.rjust do not exactly describes original stri  closed by python-dev

#13755: str.endswith and str.startswith do not take lists of strings  closed by rhettinger

#13757: os.fdlistdir() should not close the file descriptor given in a  closed by neologix

#13762: missing section: how to contribute to devguide  closed by tshepang

#13765: Distutils does not put quotes around paths that contain spaces  closed by eric.araujo

#13767: Would be nice to have a future import that turned off old exce  closed by benjamin.peterson

#13776: formatter_unicode.c still assumes ASCII  closed by eric.smith

#13778: Python should invalidate all non-owned 'thread.lock' objects w  closed by neologix

#12736: Request for python casemapping functions to use full not simpl  closed by benjamin.peterson

From python-dev at  Fri Jan 13 18:07:28 2012
From: python-dev at (Xavier Morel)
Date: Fri, 13 Jan 2012 18:07:28 +0100
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
	in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-13, at 17:19 , Antoine Pitrou wrote:
> "-u" forces line-buffering mode for stdout/stderr, which is already the
> default if they are wired to an interactive device (isattr() returning
> True).
Oh, I had not noticed the documentation had changed in Python 3 (in
Python 2 it stated that `-u` made IO unbuffered, on Python 3 it now
states that only binary IO is unbuffered and text IO remains
line-buffered). Sorry about that.

From dickinsm at  Fri Jan 13 18:08:26 2012
From: dickinsm at (Mark Dickinson)
Date: Fri, 13 Jan 2012 17:08:26 +0000
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum <guido at> wrote:
> How
> pathological the data needs to be before the collision counter triggers? I'd
> expect *very* pathological.

How pathological do you consider the set

   {1 << n for n in range(2000)}

to be?  What about the set:

   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}

?  The > 2000 elements of the latter set have only 61 distinct hash
values on a 64-bit machine, so there will be over 2000 total collisions
involved in creating this set (though admittedly only around 30
collisions per hash value).
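
The count can be checked directly. The sketch below assumes Python 3.2 or
later, where sys.hash_info exposes the prime modulus used for numeric
hashes; the variable names distinct_hashes and modulus_bits are made up
for the example.

```python
import sys

ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}
distinct_hashes = {hash(x) for x in ieee754_powers_of_two}

# Number of bits in the prime modulus used for numeric hashes
# (61 on a typical 64-bit build, 31 on a typical 32-bit build).
modulus_bits = sys.hash_info.modulus.bit_length()

print(len(ieee754_powers_of_two), len(distinct_hashes), modulus_bits)
```

Every power of two hashes to another power of two below the modulus, which
is why the number of distinct values equals the number of modulus bits.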


From guido at  Fri Jan 13 18:43:00 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 09:43:00 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 9:08 AM, Mark Dickinson <dickinsm at> wrote:

> On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum <guido at>
> wrote:
> > How
> > pathological the data needs to be before the collision counter triggers?
> I'd
> > expect *very* pathological.
> How pathological do you consider the set
>   {1 << n for n in range(2000)}
> to be?  What about the set:
>   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}
> ?  The > 2000 elements of the latter set have only 61 distinct hash
> values on 64-bit machine, so there will be over 2000 total collisions
> involved in creating this set (though admittedly only around 30
> collisions per hash value).

Hm... So how does the collision counting work for this case?

--Guido van Rossum (

From solipsis at  Fri Jan 13 18:54:29 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 13 Jan 2012 18:54:29 +0100
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
References: <>
Message-ID: <>

On Fri, 13 Jan 2012 22:14:43 +1000
Nick Coghlan <ncoghlan at> wrote:
> I marked PEP 380 as Final this evening, after pushing the tested and
> documented implementation to

I don't know if this is supposed to work, but the exception looks wrong:

>>> def g(): yield from ()
>>> f = list(g())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in g
SystemError: error return without exception set

Also, the checkin lacked a bytecode magic number bump. It is not really
a problem since I've just bumped it anyway.
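
For comparison, the behaviour one would expect under PEP 380 (and what
released CPython 3.3 and later do) is that delegating to an empty iterable
simply produces nothing. A minimal check, with result as an illustrative
name:

```python
def g():
    # Delegating to an empty iterable yields no items and raises nothing.
    yield from ()


result = list(g())
```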



From dickinsm at  Fri Jan 13 19:13:08 2012
From: dickinsm at (Mark Dickinson)
Date: Fri, 13 Jan 2012 18:13:08 +0000
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum <guido at> wrote:
>> How pathological do you consider the set
>>   {1 << n for n in range(2000)}
>> to be?  What about the set:
>>   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}
>> ?  The > 2000 elements of the latter set have only 61 distinct hash
>> values on 64-bit machine, so there will be over 2000 total collisions
>> involved in creating this set (though admittedly only around 30
>> collisions per hash value).
> Hm... So how does the collision counting work for this case?

Ah, my bad.  It looks like the ieee754_powers_of_two is safe---IIUC,
it's the number of collisions involved in a single key-set operation
that's limited.  So a dictionary with keys {1<<n for n in range(2000)}
is fine, but a dictionary with keys  {1<<(61*n) for n in range(2000)}
is not:

>>> {1<<(n*61):True for n in range(2000)}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
KeyError: 'too many hash collisions'
[67961 refs]

I'd still not consider this particularly pathological, though.
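
The difference between the two key sets can be seen from the hashes alone.
This sketch assumes Python 3.2 or later (for sys.hash_info) and does not
reproduce the KeyError, which comes from the experimental collision-counting
patch under discussion; the names bits, spread_out and colliding are made
up for the example.

```python
import sys

# Number of bits in the numeric hash modulus (61 on typical 64-bit builds).
bits = sys.hash_info.modulus.bit_length()

# Consecutive powers of two cycle through `bits` different hash values...
spread_out = {hash(1 << n) for n in range(2000)}

# ...while exponents that are multiples of `bits` all hash to the same value.
colliding = {hash(1 << (bits * n)) for n in range(2000)}
```

With all 2000 keys sharing one hash value, inserting a single key has to
walk past every key already stored, which is what a per-operation collision
limit would trip over.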


From guido at  Fri Jan 13 22:22:32 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 13:22:32 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 10:13 AM, Mark Dickinson <dickinsm at> wrote:

> On Fri, Jan 13, 2012 at 5:43 PM, Guido van Rossum <guido at>
> wrote:
> >> How pathological do you consider the set
> >>
> >>   {1 << n for n in range(2000)}
> >>
> >> to be?  What about the set:
> >>
> >>   ieee754_powers_of_two = {2.0**n for n in range(-1074, 1024)}
> >>
> >> ?  The > 2000 elements of the latter set have only 61 distinct hash
> >> values on 64-bit machine, so there will be over 2000 total collisions
> >> involved in creating this set (though admittedly only around 30
> >> collisions per hash value).
> >
> > Hm... So how does the collision counting work for this case?
> Ah, my bad.  It looks like the ieee754_powers_of_two is safe---IIUC,
> it's the number of collisions involved in a single key-set operation
> that's limited.  So a dictionary with keys {1<<n for n in range(2000)}
> is fine, but a dictionary with keys  {1<<(61*n) for n in range(2000)}
> is not:
> >>> {1<<(n*61):True for n in range(2000)}
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "<stdin>", line 1, in <dictcomp>
> KeyError: 'too many hash collisions'
> [67961 refs]
> I'd still not consider this particularly pathological, though.

Really? Even though you came up with it specifically to prove me wrong?

--Guido van Rossum (

From tjreedy at  Fri Jan 13 23:48:09 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 13 Jan 2012 17:48:09 -0500
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
In-Reply-To: <>
References: <>
Message-ID: <jeqcbq$cbm$>

On 1/13/2012 7:14 AM, Nick Coghlan wrote:
> print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))

I pulled, rebuilt, and it indeed works (on Win 7).

I just remembered that Tim Peters somewhere (generator.c?) left a large 
comment with examples of recursive generators, such as knight's tours. 
Could these be rewritten with (and benefit from) 'yield from'? (It 
occurs to me his stuff might be worth exposing in an iterator/generator 
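
As a rough idea of what such a rewrite looks like, here is a small recursive
generator using 'yield from'. It is a generic sketch, not taken from the
comment mentioned above; the function name flatten is made up for the
example.

```python
def flatten(nested):
    """Yield the leaves of arbitrarily nested lists, depth first."""
    for item in nested:
        if isinstance(item, list):
            # Delegate to the recursive call instead of looping over it
            # and re-yielding each item by hand.
            yield from flatten(item)
        else:
            yield item


leaves = list(flatten([1, [2, [3, 4]], 5]))
```

Before PEP 380 the delegation line would have been written as
"for x in flatten(item): yield x".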

Terry Jan Reedy

From dinov at  Sat Jan 14 00:22:20 2012
From: dinov at (Dino Viehland)
Date: Fri, 13 Jan 2012 23:22:20 +0000
Subject: [Python-Dev] Python as a Metro-style App
References: <> <>
Message-ID: <>

Dino wrote:
> Martin wrote:
> > See the start of the thread: I tried to create a "WinRT Component
> > DLL", and that failed, as VS would refuse to compile any C file in
> > such a project. Not sure whether this is triggered by defining
> > WINAPI_FAMILY=2, or any other compiler setting.
> >
> > I'd really love to use WINAPI_FAMILY=2, as compiler errors are much
> > easier to fix than verifier errors.
> ...
> I'm going to ping some people on the windows team and see if the app
> container bit is or will be necessary for DLLs.

I heard back from the Windows team and they are going to require the app 
container bit to be set on all PE files (although they don't currently enforce it).  
I was able to compile a simple .c file and pass /link /appcontainer and that 
worked, so I'm going to try and figure out if there's some way to get the .vcxproj 
to build a working command line that includes that.

From benjamin at  Sat Jan 14 01:37:28 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 13 Jan 2012 19:37:28 -0500
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/13 Guido van Rossum <guido at>:
> Really? Even though you came up with specifically to prove me wrong?

Coming up with a counterexample now invalidates it?


From solipsis at  Sat Jan 14 02:17:08 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 14 Jan 2012 02:17:08 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
References: <>
Message-ID: <>

On Thu, 12 Jan 2012 18:57:42 -0800
Guido van Rossum <guido at> wrote:
> Hm... I started out as a big fan of the randomized hash, but thinking more
> about it, I actually believe that the chances of some legitimate app having
> >1000 collisions are way smaller than the chances that somebody's code will
> break due to the variable hashing.

Breaking due to variable hashing is deterministic: you notice it as
soon as you upgrade (and then you use PYTHONHASHSEED to disable
variable hashing). That seems better than unpredictable breaking when
some legitimate collision chain happens.
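
A sketch of that workflow, assuming the PYTHONHASHSEED environment variable
as eventually shipped in CPython (3.2.3 and later, where the value 0
disables hash randomization). The helper name str_hash_in_child is
hypothetical.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed):
    """Return hash('python-dev') as computed by a fresh interpreter."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('python-dev'))"], env=env
    )
    return int(out)


# With a fixed seed, separate processes agree on string hashes.
fixed_a = str_hash_in_child("0")
fixed_b = str_hash_in_child("0")
```

Code that silently depended on a stable hash order breaks on every run
without the variable, and is stable again once it is set.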



From victor.stinner at  Sat Jan 14 02:35:14 2012
From: victor.stinner at (Victor Stinner)
Date: Sat, 14 Jan 2012 02:35:14 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> - Glenn Linderman proposes to fix the vulnerability by adding a new
> "safe" dict type (only accepting string keys). His proof-of-concept
> ( uses a secret of 64 random bits and uses it to compute
> the hash of a key.

We could mix Marc's collision counter with SafeDict idea (being able
to use a different secret for each dict): use hash(key, secret)
(simple example: hash(secret+key)) instead of hash(key) in dict (and
set), and change the secret if we have more than N collisions. But it
would slow down all dict lookup (dict creation, get, set, del, ...).
And getting new random data can also be slow.

SafeDict and hash(secret+key) lose the benefit of the cached hash
result. Because the hash result depends on an argument, we cannot cache
the result anymore, and we have to recompute the hash for each lookup
(even if you look up the same key twice or more).
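
A rough sketch of the combination described above, assuming string keys only and a toy chained table; the class name SecretKeyedTable, the bucket count and the max_chain threshold are all invented for illustration and are not taken from any posted patch.

```python
import os


class SecretKeyedTable:
    """Toy chained hash table indexed by hash(secret + key); str keys only."""

    def __init__(self, nbuckets=8, max_chain=4):
        self.nbuckets = nbuckets
        self.max_chain = max_chain
        self.rekeys = 0
        self._new_secret()
        self.buckets = [[] for _ in range(nbuckets)]

    def _new_secret(self):
        # 64 random bits, as in the proof-of-concept mentioned above.
        self.secret = os.urandom(8).hex()

    def _index(self, key):
        # Recomputed on every call: the hash cached on the str object is
        # hash(key), which is of no use for hash(secret + key).
        return hash(self.secret + key) % self.nbuckets

    def __setitem__(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        if len(bucket) > self.max_chain:
            self._rekey()

    def __getitem__(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _rekey(self):
        # One chain grew too long: pick a new secret, grow, rehash everything.
        items = [kv for bucket in self.buckets for kv in bucket]
        self._new_secret()
        self.rekeys += 1
        self.nbuckets *= 2
        self.buckets = [[] for _ in range(self.nbuckets)]
        for k, v in items:
            self.buckets[self._index(k)].append((k, v))


table = SecretKeyedTable()
for i in range(100):
    table["key%d" % i] = i
```

Both costs mentioned above are visible: _index() builds and hashes a new string on every lookup, and _rekey() needs fresh random data plus a full rehash.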


From guido at  Sat Jan 14 02:38:02 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 17:38:02 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at> wrote:

> On Thu, 12 Jan 2012 18:57:42 -0800
> Guido van Rossum <guido at> wrote:
> > Hm... I started out as a big fan of the randomized hash, but thinking
> more
> > about it, I actually believe that the chances of some legitimate app
> having
> > >1000 collisions are way smaller than the chances that somebody's code
> will
> > break due to the variable hashing.
> Breaking due to variable hashing is deterministic: you notice it as
> soon as you upgrade (and then you use PYTHONHASHSEED to disable
> variable hashing). That seems better than unpredictable breaking when
> some legitimate collision chain happens.

Fair enough. But I'm now uncomfortable with turning this on for bugfix
releases. I'm fine with making this the default in 3.3, just not in 3.2,
3.1 or 2.x -- it will break too much code and organizations will have to
roll back the release or do extensive testing before installing a bugfix
release -- exactly what we *don't* want for those.

FWIW, I don't believe in the SafeDict solution -- you never know which
dicts you have to change.

--Guido van Rossum (

From greg at  Sat Jan 14 02:58:23 2012
From: greg at (Gregory P. Smith)
Date: Fri, 13 Jan 2012 17:58:23 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at> wrote:

> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at>wrote:
>> On Thu, 12 Jan 2012 18:57:42 -0800
>> Guido van Rossum <guido at> wrote:
>> > Hm... I started out as a big fan of the randomized hash, but thinking
>> more
>> > about it, I actually believe that the chances of some legitimate app
>> having
>> > >1000 collisions are way smaller than the chances that somebody's code
>> will
>> > break due to the variable hashing.
>> Breaking due to variable hashing is deterministic: you notice it as
>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>> variable hashing). That seems better than unpredictable breaking when
>> some legitimate collision chain happens.
> Fair enough. But I'm now uncomfortable with turning this on for bugfix
> releases. I'm fine with making this the default in 3.3, just not in 3.2,
> 3.1 or 2.x -- it will break too much code and organizations will have to
> roll back the release or do extensive testing before installing a bugfix
> release -- exactly what we *don't* want for those.
> FWIW, I don't believe in the SafeDict solution -- you never know which
> dicts you have to change.

Of the three options Victor listed only one is good.

I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to always
get everything right with regard to data that came from outside the
process never ending up hashed in a non-safe dict or set *anywhere*.
"Safe" needs to be the default option for all hash tables.

I don't like the "*too many hash collisions*" exception. *-1*. It provides
non-deterministic application behavior for data driven applications with no
way for them to predict when it'll happen or where and prepare for it. It
may work in practice for many applications but is simply odd behavior.

I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
back ported to any Python version.

It is perfectly okay to break existing users who had anything depending on
ordering of internal hash tables. Their code was already broken. We
*will* provide a flag and/or environment variable that can be set to turn
the feature off at their own peril which they can use in their test
harnesses that are stupid enough to use doctests with order dependencies.

This approach worked fine for Perl 9 years ago.


From v+python at  Sat Jan 14 03:09:33 2012
From: v+python at (Glenn Linderman)
Date: Fri, 13 Jan 2012 18:09:33 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/13/2012 5:35 PM, Victor Stinner wrote:
>> - Glenn Linderman proposes to fix the vulnerability by adding a new
>> "safe" dict type (only accepting string keys). His proof-of-concept
>> ( uses a secret of 64 random bits and uses it to compute
>> the hash of a key.
> We could mix Marc's collision counter with SafeDict idea (being able
> to use a different secret for each dict): use hash(key, secret)
> (simple example: hash(secret+key)) instead of hash(key) in dict (and
> set), and change the secret if we have more than N collisions. But it
> would slow down all dict lookup (dict creation, get, set, del, ...).
> And getting new random data can also be slow.
> SafeDict and hash(secret+key) lose the benefit of the cached hash
> result. Because the hash result depends on an argument, we cannot cache
> the result anymore, and we have to recompute the hash for each lookup
> (even if you look up the same key twice or more).
> Victor

So integrating SafeDict into dict so it could be automatically converted 
would mean changing the data structures underneath dict.  Given that, a 
technique for hash caching could be created, that isn't quite as good as 
the one in place, but may be less expensive than not caching the 
hashes.  It would also take more space, a second dict, internally, as 
well as the secret.

So once the collision counter reaches some threshold (since there would 
be a functional fallback, it could be much lower than 1000), the secret 
is obtained, and the keys are rehashed using hash(secret+key).  Now when 
lookups occur, the object id of the key and the hash of the key are used 
as the index and hash(secret+key) is stored as a cached value.  This 
would only benefit lookups by the same object, other objects with the 
same key value would be recalculated (at least the first time).  Some 
limit on the number of cached values would probably be appropriate.  
This would add complexity, of course, in trying to save time.

An alternate solution would be to convert a dict to a tree once the 
number of collisions produces poor performance.  Converting to a tree 
would result in O(log N) instead of O(1) lookup performance, but that is 
better than the degenerate case of O(N) which is produced by the 
excessive number of collisions resulting from an attack.  This would 
require new tree code to be included in the core, of course, probably a 
red-black tree, which stays balanced.

In either of these cases, the conversion is expensive, because a 
collision threshold must first be reached to determine the need for 
conversion, so the hash could already contain lots of data.  If it were 
too expensive, the attack could still be effective.

Another solution would be to change the collision code, so that 
colliding keys don't produce O(N) behavior, but some other behavior.  
Each colliding entry could convert that entry to a tree of entries, 
perhaps.  This would require no conversion of "bad dicts", and an attack 
could at worst convert O(1) performance to O(log N).
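
A small sketch of the last idea, assuming the colliding keys can be ordered (which ordinary dict keys are not required to be). A sorted list searched with bisect stands in for the tree; SortedBucket and CountingKey are hypothetical names used only here.

```python
import bisect


class SortedBucket:
    """Colliding entries kept sorted by key, so a lookup needs O(log n) comparisons."""

    def __init__(self):
        self.keys = []
        self.values = []

    def set(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            # list.insert still shifts elements; a balanced tree would avoid that.
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)


class CountingKey:
    """Orderable key that counts how many comparisons are performed on it."""

    comparisons = 0

    def __init__(self, n):
        self.n = n

    def __lt__(self, other):
        CountingKey.comparisons += 1
        return self.n < other.n

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.n == other.n


bucket = SortedBucket()
for n in range(1024):
    bucket.set(CountingKey(n), n)

CountingKey.comparisons = 0
value = bucket.get(CountingKey(700))
lookups = CountingKey.comparisons  # roughly log2(1024) comparisons, not hundreds
```

A plain collision chain would compare against up to 1024 entries for the same lookup.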

Clearly these ideas are more complex than adding randomization, but 
adding randomization doesn't seem to produce immunity from attack 
when data about the randomness is leaked.

From greg at  Sat Jan 14 03:25:49 2012
From: greg at (Gregory P. Smith)
Date: Fri, 13 Jan 2012 18:25:49 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> Clearly these ideas are more complex than adding randomization, but adding
> randomization doesn't seem to produce immunity from attack when data
> about the randomness is leaked.

Which will not normally happen.

I'm firmly in the camp that believes the random seed can be probed and
determined by creatively injecting values and measuring timing of things.
 But doing that is difficult and time and bandwidth intensive so the per
process random hash seed is good enough.

There's another elephant in the room here: if you want to avoid this attack,
use a 64-bit Python build, as it uses 64-bit hash values that are
significantly more difficult to force a collision on.
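
For reference, a quick way to check which case applies to a given interpreter, assuming Python 3.2 or later, where sys.hash_info is available:

```python
import sys

# Width in bits of the values hash() returns on this build (32 or 64).
hash_bits = sys.hash_info.width

# Pointer size is the usual proxy for "is this a 64-bit build?".
is_64bit_build = sys.maxsize > 2**32
```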


From greg at  Sat Jan 14 03:34:48 2012
From: greg at (Gregory P. Smith)
Date: Fri, 13 Jan 2012 18:34:48 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

btw, Tim's commit message on this one is amusingly relevant. :)

On Fri, Jan 13, 2012 at 6:25 PM, Gregory P. Smith <greg at> wrote:

>> Clearly these ideas are more complex than adding randomization, but
>> adding randomization doesn't seem to produce immunity from attack when
>> data about the randomness is leaked.
> Which will not normally happen.
> I'm firmly in the camp that believes the random seed can be probed and
> determined by creatively injecting values and measuring timing of things.
>  But doing that is difficult and time and bandwidth intensive so the per
> process random hash seed is good enough.
> There's another elephant in the room here, if you want to avoid this
> attack use a 64-bit Python build as it uses 64-bit hash values that are
> significantly more difficult to force a collision on.
> -gps

From steve at  Sat Jan 14 03:55:22 2012
From: steve at (Steven D'Aprano)
Date: Sat, 14 Jan 2012 13:55:22 +1100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 14/01/12 12:58, Gregory P. Smith wrote:

> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
> back ported to any Python version.
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken.

For the record:

steve at runes:~$ python -c "print(hash('spam ham'))"
steve at runes:~$ jython -c "print(hash('spam ham'))"

So it is already the case that Python code that assumes stable hashing is broken.

For what it's worth, I'm not convinced that we should be overly-concerned by 
"poor saps" (Guido's words) who rely on accidents of implementation regarding 
hash. We shouldn't break their code unless we have a good reason, but this 
strikes me as a good reason. The documentation for hash certainly makes no 
promise about stability, and relying on it strikes me as about as sensible as 
relying on the stability of error messages.

I'm also not convinced that the option to raise an exception after 1000 
collisions actually solves the problem. That relies on the application being 
re-written to catch the exception and recover from it (how?). Otherwise, all 
it does is change the attack vector from "cause an indefinite number of hash 
collisions" to "cause 999 hash collisions followed by crashing the application 
with an exception", which doesn't strike me as much of an improvement.

+1 on random seeding. Default to on in 3.3+ and default to off in older 
versions, which allows people to avoid breaking their code until they're ready 
for it to be broken.


From greg at  Sat Jan 14 04:06:00 2012
From: greg at (Gregory P. Smith)
Date: Fri, 13 Jan 2012 19:06:00 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at> wrote:

> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at>wrote:
>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at>wrote:
>>> On Thu, 12 Jan 2012 18:57:42 -0800
>>> Guido van Rossum <guido at> wrote:
>>> > Hm... I started out as a big fan of the randomized hash, but thinking
>>> more
>>> > about it, I actually believe that the chances of some legitimate app
>>> having
>>> > >1000 collisions are way smaller than the chances that somebody's code
>>> will
>>> > break due to the variable hashing.
>>> Breaking due to variable hashing is deterministic: you notice it as
>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>> variable hashing). That seems better than unpredictable breaking when
>>> some legitimate collision chain happens.
>> Fair enough. But I'm now uncomfortable with turning this on for bugfix
>> releases. I'm fine with making this the default in 3.3, just not in 3.2,
>> 3.1 or 2.x -- it will break too much code and organizations will have to
>> roll back the release or do extensive testing before installing a bugfix
>> release -- exactly what we *don't* want for those.
>> FWIW, I don't believe in the SafeDict solution -- you never know which
>> dicts you have to change.
> Agreed.
> Of the three options Victor listed only one is good.
> I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to
> always get everything right with regard to data that came from outside the
> process never ending up hashed in a non-safe dict or set *anywhere*.
>  "Safe" needs to be the default option for all hash tables.
> I don't like the "*too many hash collisions*" exception. *-1*. It
> provides non-deterministic application behavior for data driven
> applications with no way for them to predict when it'll happen or where and
> prepare for it. It may work in practice for many applications but is simply
> odd behavior.
> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily
> be back ported to any Python version.
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken. We
> *will* provide a flag and/or environment variable that can be set to turn the
> feature off at their own peril which they can use in their test harnesses
> that are stupid enough to use doctests with order dependencies.

What an implementation looks like:

some stuff to be filled in, but this is all that is really required.  add
logic to allow a particular seed to be specified or forced to 0 from the
command line or environment.  add the logic to grab random bytes.  add the
autoconf glue to disable it.  done.
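
In Python terms, the steps listed above could be sketched roughly as follows. This is an illustration under assumptions, not the linked patch: the multiply-and-xor loop is modelled on the classic string hash, 64-bit values are assumed, and pick_seed and seeded_hash are invented names.

```python
import os

MASK = (1 << 64) - 1  # assume 64-bit hash values for this sketch


def pick_seed(environ=os.environ):
    """Use PYTHONHASHSEED if it holds an integer (0 disables), else random bytes."""
    forced = environ.get("PYTHONHASHSEED")
    if forced is not None and forced != "random":
        return int(forced) & MASK
    return int.from_bytes(os.urandom(8), "little")


def seeded_hash(text, seed):
    """Multiply-and-xor string hash with the seed mixed into the initial state."""
    if not text:
        return 0
    x = seed ^ (ord(text[0]) << 7)
    for ch in text:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    return x ^ len(text)


seed = pick_seed({"PYTHONHASHSEED": "0"})  # forced to 0: same as the unseeded hash
unseeded = seeded_hash("spam ham", seed)
randomized = seeded_hash("spam ham", pick_seed({}))
```

With the seed forced to 0 the function degenerates to the unseeded hash, which is what makes an "off" switch cheap to provide.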


> This approach worked fine for Perl 9 years ago.
> -gps

From barry at  Sat Jan 14 04:19:38 2012
From: barry at (Barry Warsaw)
Date: Sat, 14 Jan 2012 04:19:38 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <20120114041938.098fd14b@rivendell>

On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote:

>On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at> wrote:
>> Breaking due to variable hashing is deterministic: you notice it as
>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>> variable hashing). That seems better than unpredictable breaking when
>> some legitimate collision chain happens.
>Fair enough. But I'm now uncomfortable with turning this on for bugfix
>releases. I'm fine with making this the default in 3.3, just not in 3.2,
>3.1 or 2.x -- it will break too much code and organizations will have to
>roll back the release or do extensive testing before installing a bugfix
>release -- exactly what we *don't* want for those.



From merwok at  Sat Jan 14 04:24:52 2012
From: merwok at (Éric Araujo)
Date: Sat, 14 Jan 2012 04:24:52 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <>
References: "\"<>"
Message-ID: <>

Hi Sandro,

Thanks for getting the ball rolling on this.  One style for markup, one
Sphinx version to code our extensions against and one location for the
documenting guidelines will make our work a bit easier.

> During the build process, there are some warnings that I can 
> understand:
I assume you mean “can’t”, as you later ask how to fix them.  As a
general rule, they’re only warnings, so they don’t break the build,
some links or stylings, so I think it’s okay to ignore them *right

> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
That’s a mistake I made in cefe4f38fa0e.  This sentence should be

> Doc/library/stdtypes.rst:2372: WARNING: more than one target found 
> for
> cross-reference u'next':
Need to use :meth:`.next` to let Sphinx find the right target (more 
on request :)

> Doc/library/sys.rst:651: WARNING: unknown keyword: None
Should use ``None``.

> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not
I don’t know if these should work (i.e. create a link to the
language reference section) or abuse the markup (there are “not” and
“in” keywords, but no “not in” keyword; use ``not in``).  I’d say


From martin at  Sat Jan 14 04:45:57 2012
From: martin at (martin at
Date: Sat, 14 Jan 2012 04:45:57 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> What an implementation looks like:
> some stuff to be filled in, but this is all that is really required.

I think this statement (and the patch) is wrong. You also need to change
the byte string hashing, at least for 2.x. This I consider the biggest
flaw in that approach - other people may have written string-like objects
which continue to compare equal to a string but now hash different.
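
To illustrate the concern: a string-like class that hard-codes a copy of the old hash algorithm stops being found in dicts once str hashing changes, while one that delegates to hash() keeps working. FragileName, RobustName and legacy_style_hash are hypothetical names, and the hash loop is only an approximation of what such third-party code might contain.

```python
def legacy_style_hash(text):
    """Hand-rolled copy of an unseeded string hash (hypothetical third-party code)."""
    if not text:
        return 0
    x = ord(text[0]) << 7
    for ch in text:
        x = ((1000003 * x) ^ ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return x ^ len(text)


class FragileName:
    """Compares equal to a str but computes its own hash value."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return self.text == (other.text if isinstance(other, FragileName) else other)

    def __hash__(self):
        return legacy_style_hash(self.text)


class RobustName(FragileName):
    """Delegates to hash(str), so it follows whatever str hashing is in effect."""

    def __hash__(self):
        return hash(self.text)


table = {"spam": 1}
robust_found = RobustName("spam") in table
# False whenever the hand-rolled value disagrees with hash("spam").
fragile_found = FragileName("spam") in table
```

Both objects still compare equal to "spam"; only the one whose hash tracks the interpreter's str hash is guaranteed to be found by a dict lookup.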


From guido at  Sat Jan 14 05:00:54 2012
From: guido at (Guido van Rossum)
Date: Fri, 13 Jan 2012 20:00:54 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at> wrote:

> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken. We
> *will* provide a flag and/or environment variable that can be set to turn the
> feature off at their own peril which they can use in their test harnesses
> that are stupid enough to use doctests with order dependencies.

No, that is not how we usually treat compatibility between bugfix releases.
"Your code is already broken" is not an argument to break forcefully what
worked (even if by happenstance) before. The difference between CPython and
Jython (or between different CPython feature releases) also isn't relevant
-- historically we have often bent over backwards to avoid changing
behavior that was technically undefined, if we believed it would affect a
significant fraction of users.

I don't think anyone doubts that this will break lots of code (at least,
the arguments I've heard have been "their code is broken", not "nobody does

> This approach worked fine for Perl 9 years ago.

I don't know what the Perl attitude about breaking undefined behavior
between micro versions was at the time. But ours is pretty clear -- don't
do it.

--Guido van Rossum (

From ncoghlan at  Sat Jan 14 06:16:32 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 14 Jan 2012 15:16:32 +1000
Subject: [Python-Dev] [Python-checkins] cpython: add test,
	which was missing from d64ac9ab4cd0
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson
<python-checkins at> wrote:
> changeset:   74363:be85914b611c
> parent:      74361:609482c6710e
> user:        Benjamin Peterson <benjamin at>
> date:        Fri Jan 13 14:39:38 2012 -0500
> summary:
>  add test, which was missing from d64ac9ab4cd0

Ah, that's where that came from, thanks.

I still haven't fully trained myself to use hg import instead of
patch, which would avoid precisely this kind of error :P


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Sat Jan 14 06:43:04 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 14 Jan 2012 00:43:04 -0500
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <jer4lp$qe4$>

On 1/13/2012 8:58 PM, Gregory P. Smith wrote:

> It is perfectly okay to break existing users who had anything depending
> on ordering of internal hash tables. Their code was already broken.

Given that the doc says "Return the hash value of the object", I do not 
think we should be so hard-nosed. The above clearly implies that there 
is such a thing as *the* Python hash value for an object. And indeed, 
that has been true across many versions. If we had written "Return a 
hash value for the object, which can vary from run to run", the case 
would be different.

Terry Jan Reedy

From jackdied at  Sat Jan 14 07:24:54 2012
From: jackdied at (Jack Diederich)
Date: Sat, 14 Jan 2012 01:24:54 -0500
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at> wrote:
> Hm... I started out as a big fan of the randomized hash, but thinking more
> about it, I actually believe that the chances of some legitimate app having
>>1000 collisions are way smaller than the chances that somebody's code will
> break due to the variable hashing.

Python's dicts are designed to avoid hash conflicts by resizing and
keeping the available slots bountiful.  1000 conflicts sounds like a
number that couldn't be hit accidentally unless you had a single dict
using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're
good).   The hashes also look to exploit cache locality but that is
very unlikely to get one thousand conflicts by chance.  If you get
that many there is an attack.

> This is depending on how the counting is done (I didn't look at MAL's
> patch), and assuming that increasing the hash table size will generally
> reduce collisions if items collide but their hashes are different.

The patch counts conflicts on an individual insert and not lifetime
conflicts.  Looks sane to me.

> That said, even with collision counting I'd like a way to disable it without
> changing the code, e.g. a flag or environment variable.

Agreed.  Paranoid people can turn the behavior off and if it ever were
to become a problem in practice we could point people to a solution.
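
A toy model of per-insert collision counting, for illustration only. It uses linear probing rather than CPython's real probe sequence, and the names CountingTable, TooManyCollisions and Colliding are invented here rather than taken from MAL's patch.

```python
class TooManyCollisions(Exception):
    """Hypothetical exception raised when one operation probes too many slots."""


class CountingTable:
    """Open-addressing table that counts collisions per operation, not per lifetime."""

    def __init__(self, size=8, limit=1000):
        self.slots = [None] * size
        self.used = 0
        self.limit = limit  # None disables the check

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        collisions = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if self.limit is not None and collisions > self.limit:
                raise TooManyCollisions(key)
            i = (i + 1) % len(self.slots)
        return i

    def __setitem__(self, key, value):
        # Keep the load factor under 2/3 so a free slot always exists.
        if (self.used + 1) * 3 > len(self.slots) * 2:
            self._resize()
        i = self._probe(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self):
        old = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 4)
        self.used = 0
        for k, v in old:
            self[k] = v


class Colliding:
    """Key whose instances all share one hash value, imitating attacker-chosen keys."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n

    def __hash__(self):
        return 0


table = CountingTable(limit=10)
for n in range(10):
    table[Colliding(n)] = n  # at most 9 collisions per insert: allowed

try:
    for n in range(10, 20):
        table[Colliding(n)] = n
    tripped = False
except TooManyCollisions:
    tripped = True
```

Resizing does not help here because the keys share a full hash value, which is the attack case; keys that merely share a slot are spread out again when the table grows.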


From ncoghlan at  Sat Jan 14 07:53:39 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 14 Jan 2012 16:53:39 +1000
Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes
In-Reply-To: <jephtj$7d0$>
References: <>
Message-ID: <>

On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl <g.brandl at> wrote:
> On 01/13/2012 12:43 PM, nick.coghlan wrote:
>> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
> There should probably be a "versionadded" somewhere on this page.

Good catch, I added versionchanged notes to this page, simple_stmts
and the StopIteration entry in the library reference.

>>  PEP 3155: Qualified name for classes and functions
>>  ==================================================
> This looks like a spurious (and syntax-breaking) change.

Yeah, it was an error I introduced last time I merged from default. Fixed.

>> diff --git a/Grammar/Grammar b/Grammar/Grammar
>> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
>> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test
> This looks like a change without effect?


It was a lingering after-effect of Greg's original patch (which also
modified the function call syntax to allow "yield from" expressions
with extra parens). I reverted the change to the function call syntax,
but forgot to ditch the added parens while doing so.

>> diff --git a/Include/genobject.h b/Include/genobject.h
>> -     /* List of weak reference. */
>> -     PyObject *gi_weakreflist;
>> +        /* List of weak reference. */
>> +        PyObject *gi_weakreflist;
>>  } PyGenObject;
> While these change tabs into spaces, it should be 4 spaces, not 8.


>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
> Does this API need to be public? If yes, it needs to be documented.

Hmm, good point - that one needs a bit of thought, so I've put it on
the tracker:

(that issue also covers your comments regarding the docstring for this
function and whether or not we even need the StopIteration instance
creation API)

>> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
>> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>> -#define BUILD_SLICE  133     /* Number of items */
>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>> +#define BUILD_SLICE     133     /* Number of items */
> Not sure putting these and all the other cosmetic changes into an already
> big patch is such a good idea...

I agree, but it's one of the challenges of a long-lived branch like
the PEP 380 one (I believe some of these cosmetic changes started life
in Greg's original patch and separating them out would have been quite
a pain). Anyone that wants to see the gory details of the branch
history can take a look at my bitbucket repo:

>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>> --- a/Objects/abstract.c
>> +++ b/Objects/abstract.c
>> @@ -2267,7 +2267,6 @@
>>      func = PyObject_GetAttrString(o, name);
>>      if (func == NULL) {
>> -        PyErr_SetString(PyExc_AttributeError, name);
>>          return 0;
>>      }
>> @@ -2311,7 +2310,6 @@
>>      func = PyObject_GetAttrString(o, name);
>>      if (func == NULL) {
>> -        PyErr_SetString(PyExc_AttributeError, name);
>>          return 0;
>>      }
>>      va_start(va, format);
> These two changes also look suspiciously unrelated?

IIRC, I removed those lines while working on the patch because the
message they produce (just the attribute name) is worse than the one
produced by the call to PyObject_GetAttrString (which also includes
the type of the object being accessed). Leaving the original
exceptions alone helped me track down some failures I was getting at
the time.

I've now made the various CallMethod helper APIs in abstract.c (1
public, 3 private) consistently leave the GetAttr exception alone and
added an explicit C API note to NEWS.

(Vaguely related tangent: the new code added by the patch probably has
a few parts that could benefit from the new GetAttrId private API)

>> diff --git a/Objects/genobject.c b/Objects/genobject.c
>> +        } else {
>> +            PyObject *e = PyStopIteration_Create(result);
>> +            if (e != NULL) {
>> +                PyErr_SetObject(PyExc_StopIteration, e);
>> +                Py_DECREF(e);
>> +            }
> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here
> anyway?

I think you're right - so noted in the tracker issue about the C API additions.
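
For readers following along at the Python level, the behaviour this C code implements under PEP 380 is that a generator's return value travels on the StopIteration exception; a minimal sketch:

```python
def inner():
    yield 1
    return "done"  # becomes StopIteration.value


def outer():
    result = yield from inner()  # 'yield from' evaluates to inner()'s return value
    yield result


collected = list(outer())

# The same value is visible when driving the generator by hand.
gen = inner()
next(gen)
try:
    next(gen)
except StopIteration as exc:
    returned = exc.value
```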

Thanks for the thorough review, a fresh set of eyes is very helpful :)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 14 08:01:48 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 14 Jan 2012 17:01:48 +1000
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich <jackdied at> wrote:
>> This is depending on how the counting is done (I didn't look at MAL's
>> patch), and assuming that increasing the hash table size will generally
>> reduce collisions if items collide but their hashes are different.
> The patch counts conflicts on an individual insert and not lifetime
> conflicts.  Looks sane to me.

Having a hard limit on the worst-case behaviour certainly sounds like
an attractive prospect. And there's nothing to worry about in terms of
secrecy or sufficient randomness - by default, attackers cannot
generate more than 1000 hash collisions in one lookup, period.

>> That said, even with collision counting I'd like a way to disable it without
>> changing the code, e.g. a flag or environment variable.
> Agreed.  Paranoid people can turn the behavior off and if it ever were
> to become a problem in practice we could point people to a solution.

Does MAL's patch allow the limit to be set on a per-dict basis
(including setting it to None to disable collision limiting
completely)? If people have data sets that need to tolerate that kind
of collision level (and haven't already decided to move to a data
structure other than the builtin dict), then it may make sense to
allow them to remove the limit when using trusted input.

For maintenance versions though, it would definitely need to be
possible to switch it off without touching the code.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Sat Jan 14 08:53:59 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 14 Jan 2012 08:53:59 +0100
Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes
In-Reply-To: <>
References: <>
Message-ID: <jercak$d3$>

On 01/14/2012 07:53 AM, Nick Coghlan wrote:

>>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>> Does this API need to be public? If yes, it needs to be documented.
> Hmm, good point - that one needs a bit of thought, so I've put it on
> the tracker:
> (that issue also covers your comments regarding the docstring for this
> function and whether or not we even need the StopIteration instance
> creation API)


>>> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
>>> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> -#define BUILD_SLICE  133     /* Number of items */
>>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> +#define BUILD_SLICE     133     /* Number of items */
>> Not sure putting these and all the other cosmetic changes into an already
>> big patch is such a good idea...
> I agree, but it's one of the challenges of a long-lived branch like
> the PEP 380 one (I believe some of these cosmetic changes started life
> in Greg's original patch and separating them out would have been quite
> a pain). Anyone that wants to see the gory details of the branch
> history can take a look at my bitbucket repo:

I see.  I hadn't followed the development of PEP 380 closely before.

In any case, it is probably a good idea to mention this branch URL in the
commit message in case it is meant to be kept permanently  (it would also be
possible to put only that branch of your sandbox into another clone at

>>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>>> --- a/Objects/abstract.c
>>> +++ b/Objects/abstract.c
>>> @@ -2267,7 +2267,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>> @@ -2311,7 +2310,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>>      va_start(va, format);
>> These two changes also look suspiciously unrelated?
> IIRC, I removed those lines while working on the patch because the
> message they produce (just the attribute name) is worse than the one
> produced by the call to PyObject_GetAttrString (which also includes
> the type of the object being accessed). Leaving the original
> exceptions alone helped me track down some failures I was getting at
> the time.

I agree that it's useful.

> I've now made the various CallMethod helper APIs in abstract.c (1
> public, 3 private) consistently leave the GetAttr exception alone and
> added an explicit C API note to NEWS.
> (Vaguely related tangent: the new code added by the patch probably has
> a few parts that could benefit from the new GetAttrId private API)

Maybe another candidate for an issue, so that we don't forget?


From chris at  Fri Jan 13 21:11:36 2012
From: chris at (Chris Withers)
Date: Fri, 13 Jan 2012 20:11:36 +0000
Subject: [Python-Dev] PEP 380 ("yield from") is now Final
In-Reply-To: <>
References: <>
Message-ID: <>

Finally, a reason to use Python 3 ;-)


On 13/01/2012 16:00, Guido van Rossum wrote:
> On Fri, Jan 13, 2012 at 4:14 AM, Nick Coghlan <ncoghlan at
> <mailto:ncoghlan at>> wrote:
>     I marked PEP 380 as Final this evening, after pushing the tested and
>     documented implementation to <>:
>     As the list of names in the NEWS and What's New entries suggests, it
>     was quite a collaborative effort to get this one over the line, and
>     that's without even listing all the people that offered helpful
>     suggestions and comments along the way :)
>     print("\n".join(list((lambda:(yield from ("Cheers,", "Nick")))())))
> --
> --Guido van Rossum ( <>)

Simplistix - Content Management, Batch Processing & Python Consulting

From stephen at  Sat Jan 14 09:05:24 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 14 Jan 2012 17:05:24 +0900
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>
Message-ID: <>

Jack Diederich writes:
 > On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at> wrote:
 > > Hm... I started out as a big fan of the randomized hash, but thinking more
 > > about it, I actually believe that the chances of some legitimate app having
 > > more than 1000 collisions are way smaller than the chances that somebody's code will
 > > break due to the variable hashing.
 > Python's dicts are designed to avoid hash conflicts by resizing and
 > keeping the available slots bountiful.  1000 conflicts sounds like a
 > number that couldn't be hit accidentally

I may be missing something, but AIUI, with the resize, the search for
an unused slot after collision will be looking in a different series
of slots, so the N counter for the N^2 behavior resets on resize.  If
not, you can delete this message now.

If so, since (a) in the error-on-many-collisions approach we're adding
a test here for collision count anyway and (b) we think this is almost
never gonna happen, can't we defuse the exploit by just resizing the
dict after 1000 collisions, with strictly better performance than the
error approach, and almost current performance for "normal" input?

In order to prevent attackers from exploiting every 1000th collision
to force out-of-memory, the expansion factor for collision-induced
resizing could be "very small".  (I don't know if that's possible in
the Python dict implementation, if the algorithm requires something
like doubling the dict size on every resize this is right out, of

Or, since this is an error/rare path anyway, offer the user a choice
of an error or a resize on hitting 1000 collisions?

From solipsis at  Sat Jan 14 09:33:02 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 14 Jan 2012 09:33:02 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
References: <>
Message-ID: <>

On Sat, 14 Jan 2012 04:45:57 +0100
martin at wrote:
> > What an implementation looks like:
> >
> >
> >
> > some stuff to be filled in, but this is all that is really required.
> I think this statement (and the patch) is wrong. You also need to change
> the byte string hashing, at least for 2.x. This I consider the biggest
> flaw in that approach - other people may have written string-like objects
> which continue to compare equal to a string but now hash different.

They're unlikely to have rewritten the hash algorithm by hand -
especially given the caveats wrt. differences between Python integers
and C integers.
Rather, they would have returned the hash() of the equivalent str or
unicode object.
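
The delegation pattern described above can be sketched as follows. This is
only an illustration: the class name `Label` is hypothetical and does not come
from any library mentioned in this thread. Because `__hash__` defers to the
built-in hash() of the underlying str, the object stays consistent with plain
strings however the interpreter computes string hashes.

```python
class Label:
    """Hypothetical string-like object that compares equal to a plain str."""

    def __init__(self, text):
        self._text = text

    def __eq__(self, other):
        if isinstance(other, Label):
            return self._text == other._text
        if isinstance(other, str):
            return self._text == other
        return NotImplemented

    def __hash__(self):
        # Delegate to the built-in string hash instead of re-implementing the
        # algorithm by hand, so any change to it (a random seed, for example)
        # is picked up automatically.
        return hash(self._text)


table = {"spam ham": 1}
# The lookup works because equal objects also report equal hashes.
found = table[Label("spam ham")]
```

A hand-copied version of the old algorithm, by contrast, would stop matching
the str hash as soon as the built-in one changes.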



From solipsis at  Sat Jan 14 09:33:28 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 14 Jan 2012 09:33:28 +0100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
References: <>
Message-ID: <>

On Sat, 14 Jan 2012 13:55:22 +1100
Steven D'Aprano <steve at> wrote:
> On 14/01/12 12:58, Gregory P. Smith wrote:
> > I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
> > back ported to any Python version.
> >
> > It is perfectly okay to break existing users who had anything depending on
> > ordering of internal hash tables. Their code was already broken.
> For the record:
> steve at runes:~$ python -c "print(hash('spam ham'))"
> -376510515
> steve at runes:~$ jython -c "print(hash('spam ham'))"
> 2054637885

Not to mention:

$ ./python -c "print(hash('spam ham'))"

(64-bit CPython)
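
For readers on a CPython release that includes hash randomization, the same
kind of variation can be observed between processes. The sketch below assumes
such a release and uses the PYTHONHASHSEED environment variable to pin the
seed of a child interpreter; the helper name `string_hash_under_seed` is made
up for this example.

```python
import os
import subprocess
import sys


def string_hash_under_seed(text, seed):
    """Return hash(text) as computed by a child interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(output.decode().strip())


first = string_hash_under_seed("spam ham", 1)
again = string_hash_under_seed("spam ham", 1)
other = string_hash_under_seed("spam ham", 2)
print(first == again, first == other)
```

The same seed should reproduce the same value, while a different seed is
expected to give a different one, which is why code that depends on a
particular hash() result or dict ordering is fragile.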



From martin at  Sat Jan 14 13:09:40 2012
From: martin at (martin at
Date: Sat, 14 Jan 2012 13:09:40 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

>> I think this statement (and the patch) is wrong. You also need to change
>> the byte string hashing, at least for 2.x. This I consider the biggest
>> flaw in that approach - other people may have written string-like objects
>> which continue to compare equal to a string but now hash different.
> They're unlikely to have rewritten the hash algorithm by hand -
> especially given the caveats wrt. differences between Python integers
> and C integers.

See the CHAR_HASH macro in

It's not *that* unlikely that more copies of that algorithm exist.


From ncoghlan at  Sat Jan 14 15:04:55 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 15 Jan 2012 00:04:55 +1000
Subject: [Python-Dev] cpython: Implement PEP 380 - 'yield from' (closes
Message-ID: <>

On Jan 14, 2012 5:56 PM, "Georg Brandl" <g.brandl at> wrote:
> On 01/14/2012 07:53 AM, Nick Coghlan wrote:

> > I agree, but it's one of the challenges of a long-lived branch like
> > the PEP 380 one (I believe some of these cosmetic changes started life
> > in Greg's original patch and separating them out would have been quite
> > a pain). Anyone that wants to see the gory details of the branch
> > history can take a look at my bitbucket repo:
> >
> >
> I see.  I hadn't followed the development of PEP 380 closely before.
> In any case, it is probably a good idea to mention this branch URL in the
> commit message in case it is meant to be kept permanently  (it would also be
> possible to put only that branch of your sandbox into another clone at

You're right we should have a PSF-controlled copy of the entire branch
history in cases like this. I actually still keep an irregularly updated
clone of my entire sandbox repo on (that's actually where it
started), so I'll refresh that and add a link to the pep380 branch history
into the tracker item that covered the PEP 380 integration into 3.3.

> > (Vaguely related tangent: the new code added by the patch probably has
> > a few parts that could benefit from the new GetAttrId private API)
> Maybe another candidate for an issue, so that we don't forget?

I just added a note about it to the C API cleanup tracker item.

From sandro.tosi at  Sat Jan 14 15:31:31 2012
From: sandro.tosi at (Sandro Tosi)
Date: Sat, 14 Jan 2012 15:31:31 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 14, 2012 at 04:24, Éric Araujo <merwok at> wrote:
> Hi Sandro,
> Thanks for getting the ball rolling on this.  One style for markup, one
> Sphinx version to code our extensions against and one location for the
> documenting guidelines will make our work a bit easier.

thanks :) I'm happy to help!

>> During the build process, there are some warnings that I can understand:
> I assume you mean "can't", as you later ask how to fix them.  As a

yes, indeed

> general rule, they're only warnings, so they don't break the build, only
> some links or stylings, so I think it's okay to ignore them *right now*.

but I'd like to get them fixed nonetheless: after all, the current build
doesn't show warnings - but I agree it's a non-blocking issue.

>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
> That's a mistake I did in cefe4f38fa0e.  This sentence should be removed.

Do you mean revert this whole hunk:

@@ -480,10 +516,11 @@
    nested scope
       The ability to refer to a variable in an enclosing definition.  For
       instance, a function defined inside another function can refer to
-      variables in the outer function.  Note that nested scopes work only for
-      reference and not for assignment which will always write to the innermost
-      scope.  In contrast, local variables both read and write in the innermost
-      scope.  Likewise, global variables read and write to the global
+      variables in the outer function.  Note that nested scopes by default work
+      only for reference and not for assignment.  Local variables both read and
+      write in the innermost scope.  Likewise, global variables read and write
+      to the global namespace.  The :keyword:`nonlocal` allows writing to outer
+      scopes.

    new-style class
       Any class which inherits from :class:`object`.  This includes all built-in

or just "The :keyword:`nonlocal` allows writing to outer scopes."?

>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found for
>> cross-reference u'next':
> Need to use :meth:`.next` to let Sphinx find the right target (more info
> on request :)

It seems what was needed was :meth:`next` (without the dot). The
current page links every 'next' to functions.html#next,
and using :meth:`next` does that.

>> Doc/library/sys.rst:651: WARNING: unknown keyword: None
> Should use ``None``.


>> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
>> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not
> I don't know if these should work (i.e. create a link to the appropriate
> language reference section) or abuse the markup (there are "not" and
> "in" keywords, but no "not in" keyword - use ``not in``).  I'd say ignore
> them.

ACK, but I'm willing to fix them if someone tells me how to :)

I'm going to prepare the patches and then push - i'll send a heads-up afterward.

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From martin at  Sat Jan 14 16:12:19 2012
From: martin at ("Martin v. Löwis")
Date: Sat, 14 Jan 2012 16:12:19 +0100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Am 13.01.2012 18:08, schrieb Mark Dickinson:
> On Fri, Jan 13, 2012 at 2:57 AM, Guido van Rossum <guido at> wrote:
>> How
>> pathological the data needs to be before the collision counter triggers? I'd
>> expect *very* pathological.
> How pathological do you consider the set
>    {1 << n for n in range(2000)}
> to be?  

I think this is not a counter-example for the proposed algorithm (at
least not in the way I think it should be implemented).

Those values may collide on the slot in the set, but they don't collide
on the actual hash value.

So in order to determine whether the collision limit is exceeded, we
shouldn't count colliding slots, but colliding hash values (which we
will all encounter during an insert).

> though admittedly only around 30 collisions per hash value.

I do consider the case of hashing integers with only one bit set
pathological. However, this can be overcome by factoring the magnitude
of the number into the hash as well.
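
The difference between colliding slots and colliding hash values can be
checked directly. The sketch below groups the 2000 integers from the quoted
example by their actual hash() value; the exact numbers depend on the
interpreter version and build (on recent CPython, integer hashes are reduced
modulo sys.hash_info.modulus), so no particular output is claimed here.

```python
import sys
from collections import Counter

values = {1 << n for n in range(2000)}

# Group the 2000 distinct integers by their actual hash value.
per_hash = Counter(hash(v) for v in values)

distinct_hashes = len(per_hash)
largest_group = max(per_hash.values())

print("hash width in bits:", sys.hash_info.width)
print("distinct hash values:", distinct_hashes)
print("most keys sharing one hash value:", largest_group)
```

A limit that counts identical hash values would compare `largest_group`
against its threshold, rather than the number of occupied slots probed.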


From martin at  Sat Jan 14 16:17:59 2012
From: martin at ("Martin v. Löwis")
Date: Sat, 14 Jan 2012 16:17:59 +0100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Am 14.01.2012 01:37, schrieb Benjamin Peterson:
> 2012/1/13 Guido van Rossum <guido at>:
>> Really? Even though you came up with it specifically to prove me wrong?
> Coming up with a counterexample now invalidates it?

There are two concerns here:
- is it possible to come up with an example of constructed values that
  show many collisions in a way that poses a threat? To this, the answer
  is apparently "yes", and the proposed reaction is to hard-limit the
  number of collisions accepted by the implementation.
- then, *assuming* such a limitation is in place: is it possible to come
  up with a realistic application that would break under this
  limitation. Mark's example is no such realistic application, instead,
  it is yet another example demonstrating collisions using constructed
  values (although the specific example would continue to work fine
  even under the limitation).

A valid counterexample would have to come from a real application, or
at least from a scenario that is plausible for a real application.


From sandro.tosi at  Sat Jan 14 17:14:05 2012
From: sandro.tosi at (Sandro Tosi)
Date: Sat, 14 Jan 2012 17:14:05 +0100
Subject: [Python-Dev] 2.7 now uses Sphinx 1.0
Message-ID: <>

just a heads-up: documentation for 2.7 branch has been ported to use
sphinx 1.0, so now the same syntax can be used for 2.x and 3.x
patches, hopefully easing work on both python stacks.

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From sandro.tosi at  Sat Jan 14 19:09:10 2012
From: sandro.tosi at (Sandro Tosi)
Date: Sat, 14 Jan 2012 19:09:10 +0100
Subject: [Python-Dev] "Documenting Python" is moving to devguide
Message-ID: <>

Hi all,
(another) heads-up about my current work: I've just pushed the
"Documenting Python" doc section to devguide. That was
possible now that we use the same sphinx version on all the active
branches.

It was not a re-editing of the content, which might still be outdated
and in need of work, but just a brutal cut & paste of the current
files. Now that we have a central place, additional editing will be
much easier.

The section is still available in the cpython repo, and I'm waiting to
remove it because it's better to have some redirections in place from
the current urls to the new ones. I've prepared a small set of
RewriteRules (attached): I don't know the actual setup of apache for
docs.p.o but at least they are a start :) Could whoever has root access
please review & apply those rules?

Once the rewrites are in place, i'll take care of removing the
Doc/documenting dir from the active branches.

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:
-------------- next part --------------
        RewriteEngine On
        RewriteRule /documenting/$              /devguide/documenting.html                                 [NE,R=permanent,L]
        RewriteRule /documenting/index.html     /devguide/documenting.html                                 [NE,R=permanent,L]
        RewriteRule /documenting/intro.html     /devguide/documenting.html#introduction                    [NE,R=permanent,L]
        RewriteRule /documenting/style.html     /devguide/documenting.html#style-guide                     [NE,R=permanent,L]
        RewriteRule /documenting/rest.html      /devguide/documenting.html#restructuredtext-primer         [NE,R=permanent,L]
        RewriteRule /documenting/markup.html    /devguide/documenting.html#additional-markup-constructs    [NE,R=permanent,L]
        RewriteRule /documenting/fromlatex.html /devguide/documenting.html#differences-to-the-latex-markup [NE,R=permanent,L]
        RewriteRule /documenting/building.html  /devguide/documenting.html#building-the-documentation      [NE,R=permanent,L]

From greg at  Sat Jan 14 20:17:01 2012
From: greg at (Gregory P. Smith)
Date: Sat, 14 Jan 2012 11:17:01 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

My patch example does change the bytes object hash as well as Unicode.
On Jan 13, 2012 7:46 PM, <martin at> wrote:

> What an implementation looks like:
>> some stuff to be filled in, but this is all that is really required.
> I think this statement (and the patch) is wrong. You also need to change
> the byte string hashing, at least for 2.x. This I consider the biggest
> flaw in that approach - other people may have written string-like objects
> which continue to compare equal to a string but now hash different.
> Regards,
> Martin

From sandro.tosi at  Sat Jan 14 22:34:52 2012
From: sandro.tosi at (Sandro Tosi)
Date: Sat, 14 Jan 2012 22:34:52 +0100
Subject: [Python-Dev] "Documenting Python" is moving to devguide
In-Reply-To: <>
References: <>
Message-ID: <>

Hi again,

On Sat, Jan 14, 2012 at 19:09, Sandro Tosi <sandro.tosi at> wrote:
> Hi all,
> (another) heads-up about my current work: I've just pushed the
> "Documenting Python" doc section (ftr:
> to devguide. That was
> possible now that we use the same sphinx version on all the active
> branches.
> It was not a re-editing of the content, that might still be outdated
> and in need of work, but just a brutal cut & paste of the current
> files. Now that we have a central place, additional editing will be
> much more easy.
> The section is still available in the cpython repo, and I'm waiting to
> remove it because it's better to have some redirections in place from
> the current urls to the new ones. I've prepared a small set of
> RewriteRules (attached): I don't know the actual setup of apache for
> docs.p.o but at least they are a start :) whomever has root access,
> could please review & apply those rules?

Thanks to Georg, who applied the rewrites for both 2.7 and 3.2.

> Once the rewrites are in place, i'll take care of removing the
> Doc/documenting dir from the active branches.

and so Doc/documenting is gone on all the active branches.

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From greg at  Sun Jan 15 02:31:34 2012
From: greg at (Gregory P. Smith)
Date: Sat, 14 Jan 2012 17:31:34 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

FWIW the quick change i pastebin'ed is basically covered by the change
already under review in  I've
made my comments and suggestions there.

I looked into Modules/expat/xmlparse.c and it has an odd copy of the old
string hash algorithm entirely for its own internal use and its own
internal hash table implementations.  That module is likely vulnerable to
creatively crafted documents for the same reason.  With 13704 and the
public API it provides to get the random hash seed, that module could
simply be updated to use that in its own hash implementation.

As for when to enable it or not, I unfortunately have to agree, despite my
wild desires we can't turn on the hash randomization change by default in
anything prior to 3.3.


On Sat, Jan 14, 2012 at 11:17 AM, Gregory P. Smith <greg at> wrote:

> My patch example does change the bytes object hash as well as Unicode.
> On Jan 13, 2012 7:46 PM, <martin at> wrote:
>>  What an implementation looks like:
>>> some stuff to be filled in, but this is all that is really required.
>> I think this statement (and the patch) is wrong. You also need to change
>> the byte string hashing, at least for 2.x. This I consider the biggest
>> flaw in that approach - other people may have written string-like objects
>> which continue to compare equal to a string but now hash different.
>> Regards,
>> Martin

From steve at  Sun Jan 15 05:42:59 2012
From: steve at (Steven D'Aprano)
Date: Sun, 15 Jan 2012 15:42:59 +1100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Victor Stinner wrote:

> - Marc Andre Lemburg proposes to fix the vulnerability directly in
> dict (for any key type). The patch raises an exception if a lookup
> causes more than 1000 collisions.

Am I missing something? How does this fix the vulnerability? It seems to me 
that the only thing this does is turn one sort of DOS attack into another sort 
of DOS attack: hostile users will just cause hash collisions until an 
exception is raised and the application falls over.

Catching these exceptions, and recovering from them (how?), would be the 
responsibility of the application author. Given that developers are unlikely 
to ever see 1000 collisions by accident, or even realise that it could happen, 
I don't expect that many people will do that -- until they personally get bitten.


From steve at  Sun Jan 15 05:49:50 2012
From: steve at (Steven D'Aprano)
Date: Sun, 15 Jan 2012 15:49:50 +1100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at> wrote:
>> It is perfectly okay to break existing users who had anything depending on
>> ordering of internal hash tables. Their code was already broken. We *will*
>> provide a flag and/or environment variable that can be set to turn the
>> feature off at their own peril which they can use in their test harnesses
>> that are stupid enough to use doctests with order dependencies.
> No, that is not how we usually take compatibility between bugfix releases.
> "Your code is already broken" is not an argument to break forcefully what
> worked (even if by happenstance) before. The difference between CPython and
> Jython (or between different CPython feature releases) also isn't relevant
> -- historically we have often bent over backwards to avoid changing
> behavior that was technically undefined, if we believed it would affect a
> significant fraction of users.
> I don't think anyone doubts that this will break lots of code (at least,
> the arguments I've heard have been "their code is broken", not "nobody does
> that").

I don't know about "lots" of code, but it will break at least one library (or 
so I'm told):


From ncoghlan at  Sun Jan 15 06:11:44 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 15 Jan 2012 15:11:44 +1000
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 15, 2012 at 2:42 PM, Steven D'Aprano <steve at> wrote:
> Victor Stinner wrote:
>> - Marc Andre Lemburg proposes to fix the vulnerability directly in
>> dict (for any key type). The patch raises an exception if a lookup
>> causes more than 1000 collisions.
> Am I missing something? How does this fix the vulnerability? It seems to me
> that the only thing this does is turn one sort of DOS attack into another
> sort of DOS attack: hostile users will just cause hash collisions until an
> exception is raised and the application falls over.
> Catching these exceptions, and recovering from them (how?), would be the
> responsibility of the application author. Given that developers are unlikely
> to ever see 1000 collisions by accident, or even realise that it could
> happen, I don't expect that many people will do that -- until they
> personally get bitten.

As I understand it, the way the attack works is that a *single*
malicious request from the attacker can DoS the server by eating CPU
resources while evaluating a massive collision chain induced in a dict
by attacker supplied data. Explicitly truncating the collision chain
boots them out almost immediately (likely with a 500 response for an
internal server error), so they no longer affect other events, threads
and processes on the same machine.

In some ways, the idea is analogous to the way we implement explicit
recursion limiting in an attempt to avoid actually blowing the C stack
- we take a hard-to-detect-and-hard-to-handle situation (i.e. blowing
the C stack or malicious generation of long collision chains in a
dict) and replace it with something that is easy to detect and can be
handled by normal exception processing (i.e. a recursion depth
exception or one reporting an excessive number of slot collisions in a
dict lookup).

That then makes the default dict implementation safe from this kind of
attack by default, and use cases that are getting that many collisions
legitimately can be handled in one of two ways:
- switch to a more appropriate data type (if you're getting that many
collisions with benign data, a dict is probably the wrong container to
be using)
- offer a mechanism (command line switch or environment variable) to
turn the collision limiting off

Now, where you can still potentially run into problems is if a single
shared dict is used to store both benign and malicious data - if the
malicious data makes it into the destination dict before the exception
finally gets triggered, and then benign data also happens to trigger
the same collision chain, then yes, the entire app may fall over.
However, such an app would have been crippled by the original DoS
anyway, since its performance would have been gutted - the collision
chain limiting just means it will trigger exceptions for the cases
that would have been insanely slow.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From eliben at  Sun Jan 15 07:33:16 2012
From: eliben at (Eli Bendersky)
Date: Sun, 15 Jan 2012 08:33:16 +0200
Subject: [Python-Dev] "Documenting Python" is moving to devguide
In-Reply-To: <>
References: <>
Message-ID: <>

> > The section is still available in the cpython repo, and I'm waiting to
> > remove it because it's better to have some redirections in place from
> > the current urls to the new ones. I've prepared a small set of
> > RewriteRules (attached): I don't know the actual setup of apache for
> > docs.p.o but at least they are a start :) whomever has root access,
> > could please review & apply those rules?
> Thanks to Georg that applied the rewrites both for 2.7 and 3.2 .
> > Once the rewrites are in place, i'll take care of removing the
> > Doc/documenting dir from the active branches.
> and so Doc/documenting is gone on all the active branches.

Good work Sandro, thanks! "Documenting Python" definitely belongs in the


From hs at  Sun Jan 15 13:15:05 2012
From: hs at (Hynek Schlawack)
Date: Sun, 15 Jan 2012 13:15:05 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Am Sonntag, 15. Januar 2012 um 05:49 schrieb Steven D'Aprano:
> > I don't think anyone doubts that this will break lots of code (at least,
> > the arguments I've heard have been "their code is broken", not "nobody does
> > that").
> I don't know about "lots" of code, but it will break at least one library (or 
> so I'm told):
Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( (on top of that, the development is in limbo ATM)

From victor.stinner at  Sun Jan 15 15:27:55 2012
From: victor.stinner at (Victor Stinner)
Date: Sun, 15 Jan 2012 15:27:55 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

I don't think that it would be hard to patch this library to use
another hash function. It can implement its own hash function, use
MD5, SHA1, or anything else. hash() is not stable across Python
versions and 32/64 bit systems.
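
As a rough sketch of what such a replacement could look like (this is not
code from the library in question, and the helper name `stable_key` is
hypothetical), a digest from hashlib gives a value that does not depend on the
interpreter's hash() at all:

```python
import hashlib


def stable_key(text):
    """Return a digest that is identical in every process and on every build."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Suitable for things like cache file names, where hash() would not be stable.
cache_name = stable_key("spam ham")
print(cache_name)
```

The digest is used here only as a stable identifier, not for any security
purpose, so the choice of MD5, SHA1 or something else matters little.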


2012/1/15 Hynek Schlawack <hs at>:
> Am Sonntag, 15. Januar 2012 um 05:49 schrieb Steven D'Aprano:
>> > I don't think anyone doubts that this will break lots of code (at least,
>> > the arguments I've heard have been "their code is broken", not "nobody does
>> > that").
>> I don't know about "lots" of code, but it will break at least one library (or
>> so I'm told):
> Sadly, suds is also Python's _only_ usable SOAP library at this moment. :( (on top of that, the development is in limbo ATM)

From stefan_ml at  Sun Jan 15 15:30:59 2012
From: stefan_ml at (Stefan Behnel)
Date: Sun, 15 Jan 2012 15:30:59 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <jer4lp$qe4$>
References: <>
Message-ID: <jeunv4$qu1$>

Terry Reedy, 14.01.2012 06:43:
> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>> It is perfectly okay to break existing users who had anything depending
>> on ordering of internal hash tables. Their code was already broken.
> Given that the doc says "Return the hash value of the object", I do not
> think we should be so hard-nosed. The above clearly implies that there is
> such a thing as *the* Python hash value for an object. And indeed, that has
> been true across many versions. If we had written "Return a hash value for
> the object, which can vary from run to run", the case would be different.

Just a side note, but I don't think hash() is the right place to document
this. Hashing is a protocol in Python, just like indexing or iteration.
Nothing keeps an object from changing its hash value due to modification,
and that would even be valid in the face of the usual dict lookup
invariants if changes are only applied while the object is not referenced
by any dict. So the guarantees do not depend on the function hash() and may
be even weaker than your above statement.


From lukasz at  Sun Jan 15 15:17:39 2012
From: lukasz at (Łukasz Langa)
Date: Sun, 15 Jan 2012 15:17:39 +0100
Subject: [Python-Dev] Dinsdale is no more
Message-ID: <>

Gentlemen, is down at the moment.

Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

From eliben at  Sun Jan 15 16:20:06 2012
From: eliben at (Eli Bendersky)
Date: Sun, 15 Jan 2012 17:20:06 +0200
Subject: [Python-Dev] Dinsdale is no more
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/15 Łukasz Langa <lukasz at>

> Gentlemen, is down at the moment.
Well, it's back now:

From guido at  Sun Jan 15 17:10:54 2012
From: guido at (Guido van Rossum)
Date: Sun, 15 Jan 2012 08:10:54 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <jeunv4$qu1$>
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <>

On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel <stefan_ml at> wrote:

> Terry Reedy, 14.01.2012 06:43:
> > On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >
> >> It is perfectly okay to break existing users who had anything depending
> >> on ordering of internal hash tables. Their code was already broken.
> >
> > Given that the doc says "Return the hash value of the object", I do not
> > think we should be so hard-nosed. The above clearly implies that there is
> > such a thing as *the* Python hash value for an object. And indeed, that
> has
> > been true across many versions. If we had written "Return a hash value
> for
> > the object, which can vary from run to run", the case would be different.
> Just a side note, but I don't think hash() is the right place to document
> this.

You mean we shouldn't document that the hash() of a string will vary per
run?

> Hashing is a protocol in Python, just like indexing or iteration.
> Nothing keeps an object from changing its hash value due to modification,

Eh? There's a huge body of cultural awareness that only immutable objects
should define a hash, implying that the hash remains constant during the
object's lifetime.

> and that would even be valid in the face of the usual dict lookup
> invariants if changes are only applied while the object is not referenced
> by any dict.

And how would you know it isn't?

> So the guarantees do not depend on the function hash() and may
> be even weaker than your above statement.

There are no actual guarantees for hash(), but lots of rules for
well-behaved hashes.
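
To make the "constant during the object's lifetime" rule concrete, here is a
minimal sketch of what happens when a key's hash changes while a dict still
holds it. The `Key` class is hypothetical and not taken from this thread; the
exact lookup outcome shown relies on CPython's dict implementation.

```python
class Key:
    """Hypothetical object whose hash follows a mutable attribute."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


k = Key(1)
table = {k: "payload"}

# Mutating the key changes its hash, but the dict still files the entry
# under the hash it saw at insertion time.
k.value = 2

# The lookup probes the slot for the *new* hash and finds nothing there.
found_after_mutation = k in table

# The entry has not gone anywhere; it is just unreachable by lookup.
still_stored = any(stored is k for stored in table)
```

The entry is still physically in the table, yet a normal lookup can no longer
reach it, which is why well-behaved hashable objects keep their hash fixed.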

--Guido van Rossum (

From stefan_ml at  Sun Jan 15 17:46:36 2012
From: stefan_ml at (Stefan Behnel)
Date: Sun, 15 Jan 2012 17:46:36 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <jeuvtc$cun$>

Guido van Rossum, 15.01.2012 17:10:
> On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> Terry Reedy, 14.01.2012 06:43:
>>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>>>> It is perfectly okay to break existing users who had anything depending
>>>> on ordering of internal hash tables. Their code was already broken.
>>> Given that the doc says "Return the hash value of the object", I do not
>>> think we should be so hard-nosed. The above clearly implies that there is
>>> such a thing as *the* Python hash value for an object. And indeed, that
>> has
>>> been true across many versions. If we had written "Return a hash value
>> for
>>> the object, which can vary from run to run", the case would be different.
>> Just a side note, but I don't think hash() is the right place to document
>> this.
> You mean we shouldn't document that the hash() of a string will vary per
> run?

No, I mean that the hash() builtin function is not the right place to
document the behaviour of a string hash. That should go into the string
object documentation.

Although, arguably, it may be worth mentioning in the docs of hash() that,
in general, hash values of builtin types are bound to the lifetime of the
interpreter instance (or entire runtime?) and may change after restarts. I
think that's a reasonable restriction to document that prominently, even if
it will only apply to str for the time being.
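
As a rough illustration of "bound to the lifetime of the interpreter
instance", the sketch below asks fresh interpreters for the hash of the same
string. It assumes a Python version that supports hash randomization and the
PYTHONHASHSEED environment variable mentioned elsewhere in this thread; the
helper name is made up for the example.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter started with
    PYTHONHASHSEED set to `seed` (hypothetical helper for this sketch)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


# Same seed: the value is reproducible across interpreter restarts.
same_seed_a = str_hash_in_fresh_interpreter("python-dev", 1)
same_seed_b = str_hash_in_fresh_interpreter("python-dev", 1)

# Different seed: the value is expected to differ.
other_seed = str_hash_in_fresh_interpreter("python-dev", 2)
```

Code that needs a reproducible value across runs has to pin the seed or, better,
avoid depending on hash() for anything that outlives the process.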

>> Hashing is a protocol in Python, just like indexing or iteration.
>> Nothing keeps an object from changing its hash value due to modification,
> Eh? There's a huge body of cultural awareness that only immutable objects
> should define a hash, implying that the hash remains constant during the
> object's lifetime.
>> and that would even be valid in the face of the usual dict lookup
>> invariants if changes are only applied while the object is not referenced
>> by any dict.
> And how would you know it isn't?

Well, if it's an object with a mutable hash then it's up to the application
defining that object to make sure it's used in a sensible way. Immutability
just makes your life easier. I can imagine that an object gets removed from
a dict (say, a cache), modified and then reinserted, and I think it's valid
to allow the modification to have an impact on the hash in this case, in
order to accommodate for any changes to equality comparisons due to the
modification.

That being said, it seems that the Python docs actually consider constant
hashes a requirement rather than a virtue.

"""
An object is hashable if it has a hash value which never changes during its
lifetime (it needs a __hash__() method), and can be compared to other
objects (it needs an __eq__() or __cmp__() method). Hashable objects which
compare equal must have the same hash value.
"""

It also seems to me that the wording "has a hash value which never changes
during its lifetime" makes it pretty clear that the lifetime of the hash
value is not guaranteed to supersede the lifetime of the object (although
that's a rather muddy definition - memory lifetime? or pickle-unpickle as
well?).

However, this entry in the glossary only seems to have appeared with Py2.6,
likely as a result of the abc changes. So it won't help in defending a
change to the hash function.

>> So the guarantees do not depend on the function hash() and may
>> be even weaker than your above statement.
> There are no actual guarantees for hash(), but lots of rules for
> well-behaved hashes.

Absolutely.



From greg at  Sun Jan 15 18:02:35 2012
From: greg at (Gregory P. Smith)
Date: Sun, 15 Jan 2012 09:02:35 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <jeuvtc$cun$>
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <>

On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at> wrote:
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).

Lifetime to me means of that specific instance of the object. I would not
expect that to survive pickle-unpickle.

> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.

Ugh, I really hope there is no code out there depending on the hash
function being the same across a pickle and unpickle boundary.
Unfortunately the hash function was last changed in 1996, so it is possible
someone somewhere has written code blindly assuming that non-guarantee is
true.


From solipsis at  Sun Jan 15 18:11:10 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 15 Jan 2012 18:11:10 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <>

On Sun, 15 Jan 2012 17:46:36 +0100
Stefan Behnel <stefan_ml at> wrote:
> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>
> >>>> It is perfectly okay to break existing users who had anything depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there is
> >>> such a thing as *the* Python hash value for an object. And indeed, that
> >> has
> >>> been true across many versions. If we had written "Return a hash value
> >> for
> >>> the object, which can vary from run to run", the case would be different.
> >>
> >> Just a side note, but I don't think hash() is the right place to document
> >> this.
> > 
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.

No, but we can document that *any* hash() value can vary between runs
without being specific about which builtin types randomize their
hashes right now.



From guido at  Sun Jan 15 18:44:08 2012
From: guido at (Guido van Rossum)
Date: Sun, 15 Jan 2012 09:44:08 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <jeuvtc$cun$>
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <>

On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at> wrote:

> Guido van Rossum, 15.01.2012 17:10:
> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
> >> Terry Reedy, 14.01.2012 06:43:
> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
> >>>
> >>>> It is perfectly okay to break existing users who had anything
> depending
> >>>> on ordering of internal hash tables. Their code was already broken.
> >>>
> >>> Given that the doc says "Return the hash value of the object", I do not
> >>> think we should be so hard-nosed. The above clearly implies that there
> is
> >>> such a thing as *the* Python hash value for an object. And indeed, that
> >> has
> >>> been true across many versions. If we had written "Return a hash value
> >> for
> >>> the object, which can vary from run to run", the case would be
> different.
> >>
> >> Just a side note, but I don't think hash() is the right place to
> document
> >> this.
> >
> > You mean we shouldn't document that the hash() of a string will vary per
> > run?
> No, I mean that the hash() builtin function is not the right place to
> document the behaviour of a string hash. That should go into the string
> object documentation.
> Although, arguably, it may be worth mentioning in the docs of hash() that,
> in general, hash values of builtin types are bound to the lifetime of the
> interpreter instance (or entire runtime?) and may change after restarts. I
> think that's a reasonable restriction to document that prominently, even if
> it will only apply to str for the time being.

Actually it will apply to a lot more than str, because the hash of
(immutable) compound objects is often derived from the hash of the
constituents, e.g. hash of a tuple.
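
A small sketch of how the variation propagates to compound objects, under the
same assumptions as above (a Python with PYTHONHASHSEED support; the helper
name is hypothetical). A tuple containing a string picks up the string's
per-seed hash, while a tuple of small integers does not change.

```python
import os
import subprocess
import sys


def hash_of_expr_in_fresh_interpreter(expr, seed):
    """Evaluate hash(<expr>) in a new interpreter run with the given seed.

    `expr` is Python source text supplied by this example itself.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(hash(%s))" % expr
    return int(subprocess.check_output([sys.executable, "-c", code], env=env))


# A tuple whose hash is derived from a (randomized) string hash.
str_tuple_seed1 = hash_of_expr_in_fresh_interpreter("('spam', 1)", 1)
str_tuple_seed2 = hash_of_expr_in_fresh_interpreter("('spam', 1)", 2)

# A tuple built only from small ints, whose hashes are not randomized.
int_tuple_seed1 = hash_of_expr_in_fresh_interpreter("(1, 2)", 1)
int_tuple_seed2 = hash_of_expr_in_fresh_interpreter("(1, 2)", 2)
```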

> >> Hashing is a protocol in Python, just like indexing or iteration.
> >> Nothing keeps an object from changing its hash value due to
> modification,
> >
> > Eh? There's a huge body of cultural awareness that only immutable objects
> > should define a hash, implying that the hash remains constant during the
> > object's lifetime.
> >
> >> and that would even be valid in the face of the usual dict lookup
> >> invariants if changes are only applied while the object is not
> referenced
> >> by any dict.
> >
> > And how would you know it isn't?
> Well, if it's an object with a mutable hash then it's up to the application
> defining that object to make sure it's used in a sensible way. Immutability
> just makes your life easier. I can imagine that an object gets removed from
> a dict (say, a cache), modified and then reinserted, and I think it's valid
> to allow the modification to have an impact on the hash in this case, in
> order to accommodate for any changes to equality comparisons due to the
> modification.

That could be considered valid only in a very abstract, theoretical,
non-constructive way, since there is no protocol to detect removal from a
dict (and you cannot assume an object is used in only one dict at a time).
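
The "more than one dict" objection can be sketched as follows. The `Record`
class and both dict names are hypothetical, and the failing lookup relies on
CPython's dict implementation; this is only an illustration of the argument,
not code from the thread.

```python
class Record:
    """Hypothetical cache entry whose hash follows its mutable `version`."""

    def __init__(self, version):
        self.version = version

    def __eq__(self, other):
        return isinstance(other, Record) and self.version == other.version

    def __hash__(self):
        return hash(self.version)


r = Record(1)
cache = {r: "cached"}
index = {r: "indexed"}  # a second dict the owner of `cache` may not know about

# The remove / modify / reinsert sequence keeps `cache` consistent...
value = cache.pop(r)
r.version = 2
cache[r] = value
cache_ok = r in cache

# ...but nothing told `index` about the change, so it still files `r`
# under the old hash and can no longer find it.
index_ok = r in index
```

The object has no way to learn that `index` also holds it, which is the
non-constructive part of the "only modify while not in a dict" rule.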

> That being said, it seems that the Python docs actually consider constant
> hashes a requirement rather than a virtue.
> """
> An object is hashable if it has a hash value which never changes during its
> lifetime (it needs a __hash__() method), and can be compared to other
> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
> compare equal must have the same hash value.
> """
> It also seems to me that the wording "has a hash value which never changes
> during its lifetime" makes it pretty clear that the lifetime of the hash
> value is not guaranteed to supersede the lifetime of the object (although
> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
> well?).

Across pickle-unpickle it's not considered the same object. Pickling at
best preserves values.
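
For illustration, a minimal sketch of "pickling preserves values, not
objects", using only a built-in tuple so that nothing here is specific to any
application:

```python
import pickle

original = ("spam", 42)
restored = pickle.loads(pickle.dumps(original))

# The round trip yields an equal value...
same_value = restored == original

# ...but a distinct object with its own lifetime.
same_object = restored is original

# Within one process, equal values must hash equal. Nothing here says the
# hash would match one computed by a different interpreter run.
same_hash_in_this_process = hash(restored) == hash(original)
```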

> However, this entry in the glossary only seems to have appeared with Py2.6,
> likely as a result of the abc changes. So it won't help in defending a
> change to the hash function.
> >> So the guarantees do not depend on the function hash() and may
> >> be even weaker than your above statement.
> >
> > There are no actual guarantees for hash(), but lots of rules for
> > well-behaved hashes.
> Absolutely.

--Guido van Rossum (

From modelnine at  Sun Jan 15 19:40:49 2012
From: modelnine at (Heiko Wundram)
Date: Sun, 15 Jan 2012 19:40:49 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Am 15.01.2012 15:27, schrieb Victor Stinner:
> I don't think that it would be hard to patch this library to use
> another hash function. It can implement its own hash function, use
> MD5, SHA1, or anything else. hash() is not stable across Python
> versions and 32/64 bit systems.

As I wrote in a reply further down: no, it isn't hard to change this
behaviour (and I find the current caching system, which uses hash() on
a URL to choose the cache index, braindead to begin with). But, as with
all other considerations: the current version of the library, with the
default options, depends on hash() being stable for the cache to make
any sense at all (and especially with "generic" schemas such as the
referenced xml.dtd, caching makes a lot of sense, and not being able to
cache _breaks_ applications, as it did mine). This is just something to
bear in mind.
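
As a sketch of the alternative Victor describes, a cache index can be derived
from a cryptographic digest instead of hash(). The function name, bucket
count, and URL below are assumptions made for the example; they are not the
library's actual API.

```python
import hashlib


def cache_index(url, buckets=256):
    """Map a URL to a bucket number that does not depend on hash().

    Hypothetical helper: SHA-1 is used here only as a stable fingerprint,
    not for any security property.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % buckets


idx = cache_index("http://example.org/schema.dtd")
```

Because the digest depends only on the bytes of the URL, the same URL maps to
the same bucket across runs, Python versions, and 32/64 bit builds.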

--- Heiko.

From ulrich.eckhardt at  Mon Jan 16 10:12:27 2012
From: ulrich.eckhardt at (Ulrich Eckhardt)
Date: Mon, 16 Jan 2012 10:12:27 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <>
Message-ID: <>

Am 07.01.2012 18:57, schrieb "Martin v. Löwis":
> I just tried porting Python as a Metro (Windows 8) App, and failed.
> Metro Apps use a variant of the Windows API called WinRT that still
> allows writing native applications in C++, but restricts various APIs
> to a subset of the full Win32 functionality. For example, everything
> related to subprocess creation would not work; none of the
> byte-oriented file API seems to be present, and a number of file
> operation functions are absent as well (such as MoveFile).

Just wondering, do Metro apps define UNDER_CE or _WIN32_WCE? The point
is that the old ANSI functions (CreateFileA etc.) were removed from
embedded MS Windows CE long ago, too, and MS Windows Mobile used to
be a custom CE variant or at least strongly related. In any case, it
could help to use the existing (incomplete) CE port as a base for Metro.


From neo_python at  Mon Jan 16 11:23:51 2012
From: neo_python at (python)
Date: Mon, 16 Jan 2012 18:23:51 +0800
Subject: [Python-Dev] Python-Dev Digest, Vol 102, Issue 35
Message-ID: <>


python-dev-request at

>Send Python-Dev mailing list submissions to
>	python-dev at
>To subscribe or unsubscribe via the World Wide Web, visit
>or, via email, send a message with subject or body 'help' to
>	python-dev-request at
>You can reach the person managing the list at
>	python-dev-owner at
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Python-Dev digest..."
>Today's Topics:
>   1. Re: Status of the fix for the hash collision	vulnerability
>      (Gregory P. Smith)
>   2. Re: Status of the fix for the hash collision vulnerability
>      (Barry Warsaw)
>   3. Re: Sphinx version for Python 2.x docs (Éric Araujo)
>   4. Re: Status of the fix for the hash collision vulnerability
>      (martin at
>   5. Re: Status of the fix for the hash collision	vulnerability
>      (Guido van Rossum)
>   6. Re: [Python-checkins] cpython: add test,	which was missing
>      from d64ac9ab4cd0 (Nick Coghlan)
>   7. Re: Status of the fix for the hash collision	vulnerability
>      (Terry Reedy)
>   8. Re: Status of the fix for the hash collision	vulnerability
>      (Jack Diederich)
>   9. Re: cpython: Implement PEP 380 - 'yield from' (closes	#11682)
>      (Nick Coghlan)
>  10. Re: Status of the fix for the hash collision	vulnerability
>      (Nick Coghlan)
>Message: 1
>Date: Fri, 13 Jan 2012 19:06:00 -0800
>From: "Gregory P. Smith" <greg at>
>Cc: python-dev at
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>	<CAGE7PNKkHW-_WqiuQC9bhqxnoU77f+eprs_q3nqmycstM3JZag at>
>Content-Type: text/plain; charset="iso-8859-1"
>On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at> wrote:
>> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at>wrote:
>>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at>wrote:
>>>> On Thu, 12 Jan 2012 18:57:42 -0800
>>>> Guido van Rossum <guido at> wrote:
>>>> > Hm... I started out as a big fan of the randomized hash, but thinking more
>>>> > about it, I actually believe that the chances of some legitimate app having
>>>> > >1000 collisions are way smaller than the chances that somebody's code will
>>>> > break due to the variable hashing.
>>>> Breaking due to variable hashing is deterministic: you notice it as
>>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>>> variable hashing). That seems better than unpredictable breaking when
>>>> some legitimate collision chain happens.
>>> Fair enough. But I'm now uncomfortable with turning this on for bugfix
>>> releases. I'm fine with making this the default in 3.3, just not in 3.2,
>>> 3.1 or 2.x -- it will break too much code and organizations will have to
>>> roll back the release or do extensive testing before installing a bugfix
>>> release -- exactly what we *don't* want for those.
>>> FWIW, I don't believe in the SafeDict solution -- you never know which
>>> dicts you have to change.
>> Agreed.
>> Of the three options Victor listed only one is good.
>> I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to
>> always get everything right with regards to data that came from outside the
>> process never ending up hashed in a non-safe dict or set *anywhere*.
>>  "Safe" needs to be the default option for all hash tables.
>> I don't like the "*too many hash collisions*" exception. *-1*. It
>> provides non-deterministic application behavior for data driven
>> applications with no way for them to predict when it'll happen or where and
>> prepare for it. It may work in practice for many applications but is simply
>> odd behavior.
>> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily
>> be back ported to any Python version.
>> It is perfectly okay to break existing users who had anything depending on
>> ordering of internal hash tables. Their code was already broken. We *will*
>> provide a flag and/or environment variable that can be set to turn the
>> feature off at their own peril which they can use in their test harnesses
>> that are stupid enough to use doctests with order dependencies.
>What an implementation looks like:
>some stuff to be filled in, but this is all that is really required.  add
>logic to allow a particular seed to be specified or forced to 0 from the
>command line or environment.  add the logic to grab random bytes.  add the
>autoconf glue to disable it.  done.
>> This approach worked fine for Perl 9 years ago.
>> -gps
>Message: 2
>Date: Sat, 14 Jan 2012 04:19:38 +0100
>From: Barry Warsaw <barry at>
>To: python-dev at
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID: <20120114041938.098fd14b at rivendell>
>Content-Type: text/plain; charset=US-ASCII
>On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote:
>>On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at> wrote:
>>> Breaking due to variable hashing is deterministic: you notice it as
>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>> variable hashing). That seems better than unpredictable breaking when
>>> some legitimate collision chain happens.
>>Fair enough. But I'm now uncomfortable with turning this on for bugfix
>>releases. I'm fine with making this the default in 3.3, just not in 3.2,
>>3.1 or 2.x -- it will break too much code and organizations will have to
>>roll back the release or do extensive testing before installing a bugfix
>>release -- exactly what we *don't* want for those.
>Message: 3
>Date: Sat, 14 Jan 2012 04:24:52 +0100
>From: Éric Araujo <merwok at>
>To: <python-dev at>
>Subject: Re: [Python-Dev] Sphinx version for Python 2.x docs
>Message-ID: <ff8dc5d4bd1c5d3583c3ff9c18e2445e at>
>Content-Type: text/plain; charset=UTF-8; format=flowed
>Hi Sandro,
>Thanks for getting the ball rolling on this.  One style for markup, one
>Sphinx version to code our extensions against and one location for the
>documenting guidelines will make our work a bit easier.
>> During the build process, there are some warnings that I can 
>> understand:
>I assume you mean “can’t”, as you later ask how to fix them.  As a
>general rule, they’re only warnings, so they don’t break the build,
>some links or stylings, so I think it’s okay to ignore them *right
>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
>That’s a mistake I did in cefe4f38fa0e.  This sentence should be
>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found 
>> for
>> cross-reference u'next':
>Need to use :meth:`.next` to let Sphinx find the right target (more 
>on request :)
>> Doc/library/sys.rst:651: WARNING: unknown keyword: None
>Should use ``None``.
>> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
>> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not
>I don’t know if these should work (i.e. create a link to the
>language reference section) or abuse the markup (there are “not” and
>“in” keywords, but no “not in” keyword; use ``not in``).  I’d say
>Message: 4
>Date: Sat, 14 Jan 2012 04:45:57 +0100
>From: martin at
>To: python-dev at
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>	<20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA at>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed; DelSp=Yes
>> What an implementation looks like:
>> some stuff to be filled in, but this is all that is really required.
>I think this statement (and the patch) is wrong. You also need to change
>the byte string hashing, at least for 2.x. This I consider the biggest
>flaw in that approach - other people may have written string-like objects
>which continue to compare equal to a string but now hash different.
>Message: 5
>Date: Fri, 13 Jan 2012 20:00:54 -0800
>From: Guido van Rossum <guido at>
>To: "Gregory P. Smith" <greg at>
>Cc: Antoine Pitrou <solipsis at>, python-dev at
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>	<CAP7+vJL+Qrz0oiqbLPCg3QxVqZLjbOeMQpeQykiidiGC2uN9FQ at>
>Content-Type: text/plain; charset="iso-8859-1"
>On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at> wrote:
>> It is perfectly okay to break existing users who had anything depending on
>> ordering of internal hash tables. Their code was already broken. We *will*
>> provide a flag and/or environment variable that can be set to turn the
>> feature off at their own peril which they can use in their test harnesses
>> that are stupid enough to use doctests with order dependencies.
>No, that is not how we usually take compatibility between bugfix releases.
>"Your code is already broken" is not an argument to break forcefully what
>worked (even if by happenstance) before. The difference between CPython and
>Jython (or between different CPython feature releases) also isn't relevant
>-- historically we have often bent over backwards to avoid changing
>behavior that was technically undefined, if we believed it would affect a
>significant fraction of users.
>I don't think anyone doubts that this will break lots of code (at least,
>the arguments I've heard have been "their code is broken", not "nobody does
>> This approach worked fine for Perl 9 years ago.
>I don't know what the Perl attitude about breaking undefined behavior
>between micro versions was at the time. But ours is pretty clear -- don't
>do it.
>--Guido van Rossum (
>Message: 6
>Date: Sat, 14 Jan 2012 15:16:32 +1000
>From: Nick Coghlan <ncoghlan at>
>To: python-dev at
>Cc: python-checkins at
>Subject: Re: [Python-Dev] [Python-checkins] cpython: add test,	which
>	was missing from d64ac9ab4cd0
>	<CADiSq7fcjLgkrjQEqBhb0oNu9eiLnHhovtoZRDzNSTDvjzx3ZQ at>
>Content-Type: text/plain; charset=ISO-8859-1
>On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson
><python-checkins at> wrote:
>> changeset:   74363:be85914b611c
>> parent:      74361:609482c6710e
>> user:        Benjamin Peterson <benjamin at>
>> date:        Fri Jan 13 14:39:38 2012 -0500
>> summary:
>>  add test, which was missing from d64ac9ab4cd0
>Ah, that's where that came from, thanks.
>I still haven't fully trained myself to use hg import instead of
>patch, which would avoid precisely this kind of error :P
>Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>Message: 7
>Date: Sat, 14 Jan 2012 00:43:04 -0500
>From: Terry Reedy <tjreedy at>
>To: python-dev at
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID: <jer4lp$qe4$1 at>
>Content-Type: text/plain; charset=UTF-8; format=flowed
>On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>> It is perfectly okay to break existing users who had anything depending
>> on ordering of internal hash tables. Their code was already broken.
>Given that the doc says "Return the hash value of the object", I do not 
>think we should be so hard-nosed. The above clearly implies that there 
>is such a thing as *the* Python hash value for an object. And indeed, 
>that has been true across many versions. If we had written "Return a 
>hash value for the object, which can vary from run to run", the case 
>would be different.
>Terry Jan Reedy
>Message: 8
>Date: Sat, 14 Jan 2012 01:24:54 -0500
>From: Jack Diederich <jackdied at>
>To: Guido van Rossum <guido at>
>Cc: Python Dev <Python-Dev at>
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>	<CACLn2+3Z1EW8Rxox7Zif=20P2SDHxYhv+Wo6dhXKKnO09+-uxQ at>
>Content-Type: text/plain; charset=ISO-8859-1
>On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at> wrote:
>> Hm... I started out as a big fan of the randomized hash, but thinking more
>> about it, I actually believe that the chances of some legitimate app having
>> >1000 collisions are way smaller than the chances that somebody's code will
>> break due to the variable hashing.
>Python's dicts are designed to avoid hash conflicts by resizing and
>keeping the available slots bountiful.  1000 conflicts sounds like a
>number that couldn't be hit accidentally unless you had a single dict
>using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're
>good).   The hashes also look to exploit cache locality but that is
>very unlikely to get one thousand conflicts by chance.  If you get
>that many there is an attack.
>> This is depending on how the counting is done (I didn't look at MAL's
>> patch), and assuming that increasing the hash table size will generally
>> reduce collisions if items collide but their hashes are different.
>The patch counts conflicts on an individual insert and not lifetime
>conflicts.  Looks sane to me.
>> That said, even with collision counting I'd like a way to disable it without
>> changing the code, e.g. a flag or environment variable.
>Agreed.  Paranoid people can turn the behavior off and if it ever were
>to become a problem in practice we could point people to a solution.
>Message: 9
>Date: Sat, 14 Jan 2012 16:53:39 +1000
>From: Nick Coghlan <ncoghlan at>
>To: Georg Brandl <g.brandl at>
>Cc: python-dev at
>Subject: Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from'
>	(closes	#11682)
>	<CADiSq7dA6P8U3_MiweM9=s-q49+y0KndeQX=ZNGWog-dZ-hzMA at>
>Content-Type: text/plain; charset=ISO-8859-1
>On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl <g.brandl at> wrote:
>> On 01/13/2012 12:43 PM, nick.coghlan wrote:
>>> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
>> There should probably be a "versionadded" somewhere on this page.
>Good catch, I added versionchanged notes to this page, simple_stmts
>and the StopIteration entry in the library reference.
>>>  PEP 3155: Qualified name for classes and functions
>>>  ==================================================
>> This looks like a spurious (and syntax-breaking) change.
>Yeah, it was an error I introduced last time I merged from default. Fixed.
>>> diff --git a/Grammar/Grammar b/Grammar/Grammar
>>> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
>>> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test
>> This looks like a change without effect?
>It was a lingering after-effect of Greg's original patch (which also
>modified the function call syntax to allow "yield from" expressions
>with extra parens). I reverted the change to the function call syntax,
>but forgot to ditch the added parens while doing so.
>>> diff --git a/Include/genobject.h b/Include/genobject.h
>>> -     /* List of weak reference. */
>>> -     PyObject *gi_weakreflist;
>>> +        /* List of weak reference. */
>>> +        PyObject *gi_weakreflist;
>>>  } PyGenObject;
>> While these change tabs into spaces, it should be 4 spaces, not 8.
>>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>> Does this API need to be public? If yes, it needs to be documented.
>Hmm, good point - that one needs a bit of thought, so I've put it on
>the tracker:
>(that issue also covers your comments regarding the docstring for this
>function and whether or not we even need the StopIteration instance
>creation API)
>>> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
>>> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> -#define BUILD_SLICE  133     /* Number of items */
>>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> +#define BUILD_SLICE     133     /* Number of items */
>> Not sure putting these and all the other cosmetic changes into an already
>> big patch is such a good idea...
>I agree, but it's one of the challenges of a long-lived branch like
>the PEP 380 one (I believe some of these cosmetic changes started life
>in Greg's original patch and separating them out would have been quite
>a pain). Anyone that wants to see the gory details of the branch
>history can take a look at my bitbucket repo:
>>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>>> --- a/Objects/abstract.c
>>> +++ b/Objects/abstract.c
>>> @@ -2267,7 +2267,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>> @@ -2311,7 +2310,6 @@
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>>      va_start(va, format);
>> These two changes also look suspiciously unrelated?
>IIRC, I removed those lines while working on the patch because the
>message they produce (just the attribute name) is worse than the one
>produced by the call to PyObject_GetAttrString (which also includes
>the type of the object being accessed). Leaving the original
>exceptions alone helped me track down some failures I was getting at
>the time.
>I've now made the various CallMethod helper APIs in abstract.c (1
>public, 3 private) consistently leave the GetAttr exception alone and
>added an explicit C API note to NEWS.
>(Vaguely related tangent: the new code added by the patch probably has
>a few parts that could benefit from the new GetAttrId private API)
>>> diff --git a/Objects/genobject.c b/Objects/genobject.c
>>> +        } else {
>>> +            PyObject *e = PyStopIteration_Create(result);
>>> +            if (e != NULL) {
>>> +                PyErr_SetObject(PyExc_StopIteration, e);
>>> +                Py_DECREF(e);
>>> +            }
>> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here
>> anyway?
>I think you're right - so noted in the tracker issue about the C API additions.
>Thanks for the thorough review, a fresh set of eyes is very helpful :)
>Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>Message: 10
>Date: Sat, 14 Jan 2012 17:01:48 +1000
>From: Nick Coghlan <ncoghlan at>
>To: Jack Diederich <jackdied at>
>Cc: Guido van Rossum <guido at>, Python Dev
>	<Python-Dev at>
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>	<CADiSq7cmNjM8mEEhktFjA5Ss+K0Z8u_CF7tmMucn56dWOzVFUQ at>
>Content-Type: text/plain; charset=ISO-8859-1
>On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich <jackdied at> wrote:
>>> This is depending on how the counting is done (I didn't look at MAL's
>>> patch), and assuming that increasing the hash table size will generally
>>> reduce collisions if items collide but their hashes are different.
>> The patch counts conflicts on an individual insert and not lifetime
>> conflicts.  Looks sane to me.
>Having a hard limit on the worst-case behaviour certainly sounds like
>an attractive prospect. And there's nothing to worry about in terms of
>secrecy or sufficient randomness - by default, attackers cannot
>generate more than 1000 hash collisions in one lookup, period.
>>> That said, even with collision counting I'd like a way to disable it without
>>> changing the code, e.g. a flag or environment variable.
>> Agreed.  Paranoid people can turn the behavior off and if it ever were
>> to become a problem in practice we could point people to a solution.
>Does MAL's patch allow the limit to be set on a per-dict basis
>(including setting it to None to disable collision limiting
>completely)? If people have data sets that need to tolerate that kind
>of collision level (and haven't already decided to move to a data
>structure other than the builtin dict), then it may make sense to
>allow them to remove the limit when using trusted input.
>For maintenance versions though, it would definitely need to be
>possible to switch it off without touching the code.
>Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>Python-Dev mailing list
>Python-Dev at
>End of Python-Dev Digest, Vol 102, Issue 35

From steve at  Mon Jan 16 13:28:59 2012
From: steve at (Steven D'Aprano)
Date: Mon, 16 Jan 2012 23:28:59 +1100
Subject: [Python-Dev] Python-Dev Digest, Vol 102, Issue 35
In-Reply-To: <>
References: <>
Message-ID: <>

python wrote:
> jbk
[snip 560+ lines of quoted text]

Please delete irrelevant text when replying to digests, and replace the 
subject line with a meaningful subject.


From merwok at  Mon Jan 16 16:42:14 2012
From: merwok at (=?UTF-8?Q?=C3=89ric_Araujo?=)
Date: Mon, 16 Jan 2012 16:42:14 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <>
References: "\"<>	<>	<j3a0dn$pas$>"
Message-ID: <>


On 14/01/2012 15:31, Sandro Tosi wrote:
> On Sat, Jan 14, 2012 at 04:24, Éric Araujo <merwok at> wrote:
>>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
>> That's a mistake I made in cefe4f38fa0e.  This sentence should be
>> removed.
> Do you mean revert this whole hunk:
> [...]
> or just "The :keyword:`nonlocal` allows writing to outer scopes."?

My proposal was to remove just that one last sentence, but the only
other change in the diff hunk is the addition of "by default", which is
connected to the existence of nonlocal.  Both changes, i.e. the whole
hunk, should be reverted (I think I'll have time to do that today).

>>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found 
>>> for
>>> cross-reference u'next':
>> Need to use :meth:`.next` to let Sphinx find the right target (more 
>> info
>> on request :)
> it seems what it needed was :meth:`next` (without the dot). The
> current page links all 'next' in to functions.html#next,
> and using :meth:`next` does that.

I should have given more info, as I wanted the opposite result :) should not link to the next function but to the
method.  Because Sphinx does not differentiate between
meth/func/class/mod roles, :meth:`next` is not resolved to the nearest
next method as one could expect but to the next function, so we have to
use :meth:`` or :meth:`.next` (local ref markup) to get
our links to methods.

>>> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
>>> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is 
>>> not

Georg fixed them.


From brett at  Mon Jan 16 17:17:42 2012
From: brett at (Brett Cannon)
Date: Mon, 16 Jan 2012 11:17:42 -0500
Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3
	feature list up to date.
In-Reply-To: <>
References: <>
Message-ID: <>

Is the change to the pyc format big enough news to go into the release PEP?
Or should that just be a "What's New" topic?

On Fri, Jan 13, 2012 at 15:18, georg.brandl <python-checkins at> wrote:

> changeset:   4012:ea3ffa3611e5
> user:        Georg Brandl <georg at>
> date:        Fri Jan 13 21:18:11 2012 +0100
> summary:
>  Bring the Python 3.3 feature list up to date.
> files:
>  pep-0398.txt |  17 ++++++++++++-----
>  1 files changed, 12 insertions(+), 5 deletions(-)
> diff --git a/pep-0398.txt b/pep-0398.txt
> --- a/pep-0398.txt
> +++ b/pep-0398.txt
> @@ -57,27 +57,34 @@
>  Features for 3.3
>  ================
> +Implemented PEPs:
> +
> +* PEP 380: Syntax for Delegating to a Subgenerator
> +* PEP 393: Flexible String Representation
> +* PEP 3151: Reworking the OS and IO exception hierarchy
> +* PEP 3155: Qualified name for classes and functions
> +
> +Other final large-scale changes:
> +
> +* Addition of the "packaging" module, deprecating "distutils"
> +* Addition of the faulthandler module
> +
>  Candidate PEPs:
>  * PEP 362: Function Signature Object
> -* PEP 380: Syntax for Delegating to a Subgenerator
>  * PEP 382: Namespace Packages
> -* PEP 393: Flexible String Representation
>  * PEP 395: Module Aliasing
>  * PEP 397: Python launcher for Windows
>  * PEP 3143: Standard daemon process library
> -* PEP 3151: Reworking the OS and IO exception hierarchy
>  (Note that these are not accepted yet and even if they are, they might
>  not be finished in time for Python 3.3.)
>  Other planned large-scale changes:
> -* Addition of the "packaging" module, replacing "distutils"
>  * Implementing ``__import__`` using importlib
>  * Email version 6
>  * A standard event-loop interface (PEP by Jim Fulton pending)
> -* Adding the faulthandler module.
>  * Breaking out standard library and docs in separate repos?
>  * A PEP on supplementing C modules with equivalent Python modules?

From solipsis at  Mon Jan 16 17:28:11 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 16 Jan 2012 17:28:11 +0100
Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3
 feature list up to date.
References: <>
Message-ID: <>

On Mon, 16 Jan 2012 11:17:42 -0500
Brett Cannon <brett at> wrote:
> Is the change to the pyc format big enough news to go into the release PEP?
> Or should that just be a "What's New" topic?

"What's New" sounds enough to me. The change doesn't enable any new
feature, it just makes an issue much less likely to pop out.



From jaraco at  Mon Jan 16 21:00:37 2012
From: jaraco at (Jason R. Coombs)
Date: Mon, 16 Jan 2012 20:00:37 +0000
Subject: [Python-Dev] Script(s) for building Python on Windows
Message-ID: <>

The current scripts for building Python leave something to be desired.


The first thing I notice when I try to build Python on Windows is the
scripts expect to be run inside of a Visual Studio environment, the
environment of which is only defined inside of a cmd.exe context. This means
the scripts can't be executed from within Powershell (my preferred shell on
Windows). One must first shell out to cmd.exe, which disables any
Powershell-specific features the developer might have installed (aliases,
functions, etc).


The second thing I notice is the scripts assume Visual Studio 2008. And
while I recognize that Python is specifically built against Visual Studio
2008 for the official releases and that Visual Studio 2008 may be the only
officially-supported build environment, later releases, such as Visual
Studio 2010 are also adequate for testing purposes. I've been developing
Python against Visual Studio 2010 for quite a while and it seems to be more
than adequate. And while it's not the responsibility of the scripts to
accommodate such environments, if the scripts could allow for such
environments, that would be nice. Furthermore, having scripts that codify
the process to upgrade will facilitate the migration should someone make the
decision to officially upgrade to Visual Studio 2010.


The third thing that I notice is that the command-line argument handling by
the batch scripts is clumsy (compared to argparse, for example). This
clumsiness is not a criticism of the authors, who have done well with the
tools they had. However, batch programming is probably one of the least
powerful ways to automate builds these days.


So to ease my experience, I've developed my own library of functions and
commands to facilitate building Python that aren't subject to the above
limitations. Of course, I built these in Python, so they do require Python
to build Python (not a huge burden, but worth mentioning). All of these
modules are open-source and part of the jaraco.develop package.


The first of these modules is jaraco.develop.vstudio. It exposes a class
for locating Visual Studio in the usual
locations, loading the environment for that instance of Visual Studio, and
upgrading a project or solution file to that version. This class in
particular enables running Visual Studio commands (including msbuild) from
within a Visual Studio environment without actually requiring a cmd.exe
context with that environment.
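
A minimal sketch of that idea, for illustration only: it is not the code
in jaraco.develop.vstudio, and the function names are hypothetical. It
assumes the caller already knows the full path to vcvarsall.bat. The
batch file is run once in a throwaway cmd.exe, the resulting variables
are captured from "set", and later tools receive them through the env
argument of subprocess.

```python
import os
import subprocess


def parse_set_output(text):
    """Turn the output of cmd.exe's ``set`` command into a dict."""
    env = {}
    for line in text.splitlines():
        # Split on the first "=" only; values may contain "=" themselves.
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def load_vs_environment(vcvarsall, arch="x86"):
    """Run vcvarsall.bat in a temporary cmd.exe and capture its variables.

    ``vcvarsall`` is the full path to the batch file (an assumption of
    this sketch; locating it is left to the caller). Windows only.
    """
    command = 'cmd /c ""{}" {} && set"'.format(vcvarsall, arch)
    output = subprocess.check_output(command, universal_newlines=True)
    return parse_set_output(output)


def run_in_vs_environment(args, vcvarsall, arch="x86"):
    """Run a tool such as msbuild with the captured environment."""
    env = dict(os.environ)
    env.update(load_vs_environment(vcvarsall, arch))
    return subprocess.call(args, env=env)
```

Because the environment ends up in an ordinary dict, the calling shell
(PowerShell or anything else) never has to be a Visual Studio prompt.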


Another module is jaraco.develop.python, which includes build_python, a
function (and command) to build
Python using whatever version of Visual Studio can be found (9 or 10
required). It has no environmental requirements except that Visual Studio be
installed. Simply run build-python (part of jaraco.develop's console
scripts) and it will build PCbuild.sln from the current directory to
whatever targets are specified (or all of them if none are specified). The
builder currently makes some assumptions (such as always building the 64-bit
Release targets), but those could easily be customized using argparse


This package and these modules have been tested and run on Python 2.7+.
These tools solve the three shortcomings I mentioned above and make the
development process so much smoother, IMO. If these modules were built into
the repository, building Python could be as simple as "hg clone; cd
cpython/pcbuild; ./" (assuming only Visual Studio and Python


I'd like to propose migrating this functionality (mainly these two modules)
into the CPython heads for Python 2.7, 3.1, 3.2, and default as
PCbuild/ (or similar). This functionality doesn't necessarily need
to supersede the existing scripts (env, build_env, build), though it
certainly could (and would as far as my usage is concerned).


If there are no objections, I'll work to extract the aforementioned
functionality from the jaraco.develop modules and into a portable script and
put together a proof-of-concept in the default branch. The build script
should not interfere with any build bots or other existing build processes,
but should enable another more powerful technique for producing builds.


I look forward to your comments and feedback.





From greg at  Mon Jan 16 21:16:38 2012
From: greg at (Gregory P. Smith)
Date: Mon, 16 Jan 2012 12:16:38 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
	<jer4lp$qe4$> <jeunv4$qu1$>
Message-ID: <>

On Sun, Jan 15, 2012 at 9:44 AM, Guido van Rossum <guido at> wrote:

> On Sun, Jan 15, 2012 at 8:46 AM, Stefan Behnel <stefan_ml at> wrote:
>> Guido van Rossum, 15.01.2012 17:10:
>> > On Sun, Jan 15, 2012 at 6:30 AM, Stefan Behnel wrote:
>> >> Terry Reedy, 14.01.2012 06:43:
>> >>> On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>> >>>
>> >>>> It is perfectly okay to break existing users who had anything
>> >>>> depending on ordering of internal hash tables. Their code was
>> >>>> already broken.
>> >>>
>> >>> Given that the doc says "Return the hash value of the object", I do
>> >>> not think we should be so hard-nosed. The above clearly implies that
>> >>> there is such a thing as *the* Python hash value for an object. And
>> >>> indeed, that has been true across many versions. If we had written
>> >>> "Return a hash value for the object, which can vary from run to run",
>> >>> the case would be different.
>> >>
>> >> Just a side note, but I don't think hash() is the right place to
>> >> document this.
>> >
>> > You mean we shouldn't document that the hash() of a string will vary per
>> > run?
>> No, I mean that the hash() builtin function is not the right place to
>> document the behaviour of a string hash. That should go into the string
>> object documentation.
>> Although, arguably, it may be worth mentioning in the docs of hash() that,
>> in general, hash values of builtin types are bound to the lifetime of the
>> interpreter instance (or entire runtime?) and may change after restarts. I
>> think that's a reasonable restriction to document that prominently, even
>> if it will only apply to str for the time being.
> Actually it will apply to a lot more than str, because the hash of
> (immutable) compound objects is often derived from the hash of the
> constituents, e.g. hash of a tuple.
>> >> Hashing is a protocol in Python, just like indexing or iteration.
>> >> Nothing keeps an object from changing its hash value due to
>> >> modification,
>> >
>> > Eh? There's a huge body of cultural awareness that only immutable
>> > objects should define a hash, implying that the hash remains constant
>> > during the object's lifetime.
>> >
>> >> and that would even be valid in the face of the usual dict lookup
>> >> invariants if changes are only applied while the object is not
>> >> referenced by any dict.
>> >
>> > And how would you know it isn't?
>> Well, if it's an object with a mutable hash then it's up to the
>> application defining that object to make sure it's used in a sensible way.
>> Immutability just makes your life easier. I can imagine that an object gets
>> removed from a dict (say, a cache), modified and then reinserted, and I
>> think it's valid to allow the modification to have an impact on the hash in
>> this case, in order to accommodate for any changes to equality comparisons
>> due to the modification.
> That could be considered valid only in a very abstract, theoretical,
> non-constructive way, since there is no protocol to detect removal from a
> dict (and you cannot assume an object is used in only one dict at a time).
>> That being said, it seems that the Python docs actually consider constant
>> hashes a requirement rather than a virtue.
>> """
>> An object is hashable if it has a hash value which never changes during
>> its lifetime (it needs a __hash__() method), and can be compared to other
>> objects (it needs an __eq__() or __cmp__() method). Hashable objects which
>> compare equal must have the same hash value.
>> """
>> It also seems to me that the wording "has a hash value which never changes
>> during its lifetime" makes it pretty clear that the lifetime of the hash
>> value is not guaranteed to supersede the lifetime of the object (although
>> that's a rather muddy definition - memory lifetime? or pickle-unpickle as
>> well?).
> Across pickle-unpickle it's not considered the same object. Pickling at
> best preserves values.

Updating the docs to explicitly clarify this sounds like a good idea.  How
does this wording to be added to the glossary.rst hashing section sound?

"""Hash values may not be stable across Python processes and must not be
used for storage or otherwise communicated outside of a single Python
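
To illustrate what the proposed wording guards against, here is a
hedged sketch. It assumes an interpreter that has the randomized string
hashing which eventually came out of this discussion, together with its
PYTHONHASHSEED environment variable; neither existed when this message
was written. The helper names are made up for the example.

```python
import hashlib
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed):
    """Return hash(text) as computed by a new interpreter with this seed."""
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = str(seed)  # assumption: interpreter honours it
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


def stable_digest(text):
    """A value that is safe to store or to send to another process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two processes started with different seeds generally disagree about
hash("spam"), while stable_digest("spam") is the same everywhere, so
anything written to disk or sent over the wire should use the latter
kind of value.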


From brian at  Mon Jan 16 21:19:33 2012
From: brian at (Brian Curtin)
Date: Mon, 16 Jan 2012 14:19:33 -0600
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 16, 2012 at 14:00, Jason R. Coombs <jaraco at> wrote:
> The second thing I notice is the scripts assume Visual Studio 2008. And
> while I recognize that Python is specifically built against Visual Studio
> 2008 for the official releases and that Visual Studio 2008 may be the only
> officially-supported build environment, later releases, such as Visual
> Studio 2010 are also adequate for testing purposes. I've been developing
> Python against Visual Studio 2010 for quite a while and it seems to be more
> than adequate. And while it's not the responsibility of the scripts to
> accommodate such environments, if the scripts could allow for such
> environments, that would be nice.

2010 is adequate for limited use but the test suite doesn't pass, so I
would be hesitant to add support and/or documentation for building
with it until we actually support it the same as or in place of 2008.

From jaraco at  Mon Jan 16 21:33:08 2012
From: jaraco at (Jason R. Coombs)
Date: Mon, 16 Jan 2012 20:33:08 +0000
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> From: Brian Curtin [mailto:brian at]
> Sent: Monday, 16 January, 2012 15:20
> 2010 is adequate for limited use but the test suite doesn't pass, so I would
> be hesitant to add support and/or documentation for building with it until we
> actually support it the same as or in place of 2008.

Good point. The current tools don't automatically support 2010; an extra
command is required to perform the conversion. I'll be cautious and not
expose that functionality without some indication to the user of the


From martin at  Mon Jan 16 22:24:40 2012
From: martin at (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 16 Jan 2012 22:24:40 +0100
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> If there are no objections, I'll work to extract the aforementioned
> functionality from the jaraco.develop modules and into a portable script
> and put together a proof-of-concept in the default branch. The build
> script should not interfere with any build bots or other existing build
> processes, but should enable another more powerful technique for
> producing builds.

I'd be hesitant to put too many specialized tools into the tree that
will become unmaintained. Please take a look at the vs9to8 tool
in PCbuild; if you could adjust that to support VS 10, it would be
better IMO.

As for completely automating the build: please take notice of
Tools/buildbot/build.bat. It also fully automates the build, also
doesn't require that the VS environment is already activated,
and has the additional advantage of not requiring Python to be


From paul at  Mon Jan 16 23:23:40 2012
From: paul at (Paul McMillan)
Date: Mon, 16 Jan 2012 14:23:40 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> As I understand it, the way the attack works is that a *single*
> malicious request from the attacker can DoS the server by eating CPU
> resources while evaluating a massive collision chain induced in a dict
> by attacker supplied data. Explicitly truncating the collision chain
> boots them out almost immediately (likely with a 500 response for an
> internal server error), so they no longer affect other events, threads
> and processes on the same machine.

This is only true in the specific attack presented at 28c3. If an
attacker can insert data without triggering the attack, it's possible
to produce (in the example of a web application) urls that (regardless
of the request) always produce pathological behavior. For example, a
collection of pathological usernames might make it impossible to list
users (and so choose which ones to delete) without resorting to
removing the problem data at an SQL level.

This is why the "simply throw an error" solution isn't a complete fix.
Making portions of an interface unusable for regular users is clearly
a bad thing, and is clearly applicable to other types of poisoned data
as well. We need to detect collisions and work around them
transparently.

> However, such an app would have been crippled by the original DoS
> anyway, since its performance would have been gutted - the collision
> chain limiting just means it will trigger exceptions for the cases
> that would have been insanely slow.

We can do better than saying "it would have been broken before, it's
broken differently now". The universal hash function idea has merit,
and for practical purposes hash randomization would fix this too
(since colliding data is only likely to collide within a single
process, persistent poisoning is far less feasible).
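
For readers who want to see the cost being argued about, a small sketch
follows. It is only an illustration: the Key class and the helper are
hypothetical, and the constant hash stands in for attacker-chosen
strings that all hash alike. It counts equality comparisons while a
dict is built, which is the work a collision chain forces on every
insert or lookup.

```python
class Key:
    """A key whose hash is chosen by the caller, to force collisions."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, name, forced_hash):
        self.name = name
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name


def comparisons_to_build(n, colliding):
    """Count the __eq__ calls needed to insert n distinct keys into a dict."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        # Colliding keys all report hash 0; the others get distinct hashes.
        table[Key(i, 0 if colliding else i)] = None
    return Key.eq_calls
```

With colliding=True the count grows roughly with the square of n,
because each new key has to be compared against the keys already in the
chain; with distinct hashes it stays small. Poisoned data stored in a
database triggers the same cost every time it is loaded into a dict,
which is the persistence problem described above.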


From timothy.c.delaney at  Tue Jan 17 00:14:02 2012
From: timothy.c.delaney at (Tim Delaney)
Date: Tue, 17 Jan 2012 10:14:02 +1100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On 17 January 2012 09:23, Paul McMillan <paul at> wrote:

> This is why the "simply throw an error" solution isn't a complete fix.
> Making portions of an interface unusable for regular users is clearly
> a bad thing, and is clearly applicable to other types of poisoned data
> as well. We need to detect collisions and work around them
> transparently.

What if in a pathological collision (e.g. > 1000 collisions), we increased
the size of a dict by a small but random amount? Should be transparent,
have negligible speed penalty, maximal reuse of existing code, and should be
very difficult to attack since the dictionary would change size in a (near)
non-deterministic manner when being attacked (i.e. first attack causes
non-deterministic remap, next attack should fail).

It should also have near-zero effect on existing tests and frameworks since
we would only get the non-deterministic behaviour in pathological cases,
which we would presumably need new tests for.


Tim Delaney

From timothy.c.delaney at  Tue Jan 17 00:17:05 2012
From: timothy.c.delaney at (Tim Delaney)
Date: Tue, 17 Jan 2012 10:17:05 +1100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On 17 January 2012 10:14, Tim Delaney <timothy.c.delaney at> wrote:

> On 17 January 2012 09:23, Paul McMillan <paul at> wrote:
>> This is why the "simply throw an error" solution isn't a complete fix.
>> Making portions of an interface unusable for regular users is clearly
>> a bad thing, and is clearly applicable to other types of poisoned data
>> as well. We need to detect collisions and work around them
>> transparently.
> What if in a pathological collision (e.g. > 1000 collisions), we increased
> the size of a dict by a small but random amount? Should be transparent,
> have negligible speed penalty, maximal reuse of existing code, and should be
> very difficult to attack since the dictionary would change size in a (near)
> non-deterministic manner when being attacked (i.e. first attack causes
> non-deterministic remap, next attack should fail).
> It should also have near-zero effect on existing tests and frameworks
> since we would only get the non-deterministic behaviour in pathological
> cases, which we would presumably need new tests for.
> Thoughts?

And one thought I had immediately after hitting send is that there could be
an attack of the form "build a huge dict, then hit it with something that
causes it to rehash due to >1000 collisions". But that's not really going
to be any worse than just building a huge dict and hitting a resize anyway.

Tim Delaney

From jaraco at  Tue Jan 17 01:01:12 2012
From: jaraco at (Jason R. Coombs)
Date: Tue, 17 Jan 2012 00:01:12 +0000
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> From: "Martin v. Löwis" [mailto:martin at]
> Sent: Monday, 16 January, 2012 16:25
> I'd be hesitant to put too many specialized tools into the tree that will
> become unmaintained. Please take a look at the vs9to8 tool in PCbuild; if you
> could adjust that to support VS 10, it would be better IMO.

Are you suggesting creating vs10to9, which would be congruent to vs9to8, or

I'm unsure if the conversion from 9 to 10 or 10 to 9 can be as simple as the
vs9to8 suggests. When I run the upgrade using the Visual Studio tools, it
does upgrade the .sln file [as so]( But as you can
see, it also converts all of the .vcproj to .vcxproj, which appears to be a
very different schema. According to [this article](
4345a151-d288-48d6-b7c7-a7c598d0f85e) it should be trivial to downgrade by
only updating the .sln file (perhaps Visual Studio 2008 is forward
compatible with the .vcxproj format).

I'll look into this more when I have a better idea what you had in mind.

My goal in adding the upgrade code was to provide a one-step upgrade for
developers with only VS 10 installed. That's what vs-upgrade in
jaraco.develop does.

> As for completely automating the build: please take notice of
> Tools/buildbot/build.bat. It also fully automates the build, also doesn't
> require that the VS environment is already activated, and has the
> advantage of not requiring Python to be installed.

That's interesting, but it still suffers from several shortcomings:

1) It still assumes Visual Studio 2008 and fails with an obscure error
2) You can't use it to build different targets (only the whole solution).
3) It automatically downloads the external dependencies (it'd be nice to
build without them on occasion).
4) It's still a batch file, so still gives the abominable "Terminate batch
job (Y/N)?" when cancelling any operation via Ctrl+C.
5) This functionality isn't in PCBuild/*. Why not?
6) There's no good way to select which type to build (64-bit versus 32-bit,
release versus debug). Adding these command-line options is clumsy in batch
7) Since it's written in batch script, Python programmers might be hesitant
to work with it (improve it).
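
As a rough illustration of points 2 and 6, here is what the option
handling could look like with argparse. This is a sketch, not the
interface of jaraco.develop's build-python command: the option names,
the defaults (64-bit Release, matching the assumption mentioned
earlier) and the helper functions are all invented for the example.

```python
import argparse


def parse_build_args(argv=None):
    """Parse hypothetical command-line options for a PCbuild build script."""
    parser = argparse.ArgumentParser(
        description="Build PCbuild.sln (option names are illustrative)")
    parser.add_argument("targets", nargs="*",
                        help="projects to build; default is the whole solution")
    parser.add_argument("--platform", choices=["Win32", "x64"], default="x64")
    parser.add_argument("--config", choices=["Release", "Debug"],
                        default="Release")
    return parser.parse_args(argv)


def msbuild_command(options, solution="PCbuild.sln"):
    """Translate the parsed options into an msbuild command line."""
    command = ["msbuild", solution,
               "/p:Configuration=" + options.config,
               "/p:Platform=" + options.platform]
    if options.targets:
        # msbuild accepts several targets separated by semicolons.
        command.append("/t:" + ";".join(options.targets))
    return command
```

For example, parse_build_args(["--config", "Debug", "pythoncore"])
followed by msbuild_command() yields a Debug x64 build of a single
project, and argparse supplies --help and error messages for free.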

For a buildbot, the batch file is perfectly adequate. It should do the same
thing every time reliably.

For anyone but a robot or seasoned CPython Windows developer, however, the
build tools are not intuitive, and I find that I'm constantly tweaking the
batch scripts and asking myself, "why couldn't this be in Python, which is a
much more powerful language?" This is why I developed the scripts, and my
thought is they could be useful to others as well.

My hope is they might even supersede the existing scripts and become
canonical, in which case there would be no possibility of them becoming
unmaintained. If it turns out that they do become unused and unmaintained,
they can be removed, but my feeling is since they're concise, documented,
Python scripts, they'd be more likely to be maintained than their '.bat'


From brian at  Tue Jan 17 01:13:29 2012
From: brian at (Brian Curtin)
Date: Mon, 16 Jan 2012 18:13:29 -0600
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 16, 2012 at 18:01, Jason R. Coombs
> My goal in adding the upgrade code was to provide a one-step upgrade for
> developers with only VS 10 installed. That's what vs-upgrade in
> jaraco.develop does.

Upgrading to 2010 requires some code changes in addition to the
conversion, so the process might not be as ripe for automation as the
previous versions. For one, a lot of constants in errno had to be
updated, then a few places that set certain errnos had to be updated.

From victor.stinner at  Tue Jan 17 01:16:43 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 01:16:43 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/17 Tim Delaney <timothy.c.delaney at>:
> What if in a pathological collision (e.g. > 1000 collisions), we increased
> the size of a dict by a small but random amount?

It doesn't change anything, you will still get collisions.


From guido at  Tue Jan 17 02:18:27 2012
From: guido at (Guido van Rossum)
Date: Mon, 16 Jan 2012 17:18:27 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 16, 2012 at 4:16 PM, Victor Stinner <
victor.stinner at> wrote:

> 2012/1/17 Tim Delaney <timothy.c.delaney at>:
> > What if in a pathological collision (e.g. > 1000 collisions), we
> increased
> > the size of a dict by a small but random amount?
> It doesn't change anything, you will still get collisions.

That depends, right? If the collision is because they all have the same
hash(), yes. It might be different if it is because the secondary hashing
(or whatever it's called :-) causes collisions.

--Guido van Rossum (

From jaraco at  Tue Jan 17 04:08:27 2012
From: jaraco at (Jason R. Coombs)
Date: Tue, 17 Jan 2012 03:08:27 +0000
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> From: at [mailto:python-
> at] On Behalf Of Jason R. Coombs
> Sent: Monday, 16 January, 2012 19:01
> I'm unsure if the conversion from 9 to 10 or 10 to 9 can be as simple as the
> vs9to8 suggests. When I run the upgrade using the Visual Studio tools, it
> does upgrade the .sln file [as so]( But as you can see, it also
> converts all of the .vcproj to .vcxproj, which appears to be a very different
> schema. According to [this article](
> ad/
> 4345a151-d288-48d6-b7c7-a7c598d0f85e) it should be trivial to downgrade by
> only updating the .sln file (perhaps Visual Studio 2008 is forward compatible
> with the .vcxproj format).

I upgraded the solution file using Visual Studio, then followed those
instructions suggested by the article, but the solution no longer builds
under Visual Studio 2008, so apparently that answer is incorrect.

Perhaps it's possible to upgrade the .sln in a less aggressive way than the
Visual Studio tools do by default, but my initial experience suggests it
won't be as easy to upgrade/downgrade the solution file as it was between

From g.brandl at  Tue Jan 17 08:22:34 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 17 Jan 2012 08:22:34 +0100
Subject: [Python-Dev] [Python-checkins] peps: Bring the Python 3.3
 feature list up to date.
In-Reply-To: <>
References: <>
Message-ID: <jf37d2$af6$>

Am 16.01.2012 17:28, schrieb Antoine Pitrou:
> On Mon, 16 Jan 2012 11:17:42 -0500
> Brett Cannon <brett at> wrote:
>> Is the change to the pyc format big enough news to go into the release PEP?
>> Or should that just be a "What's New" topic?
> "What's New" sounds enough to me. The change doesn't enable any new
> feature, it just makes an issue much less likely to pop out.



From martin at  Tue Jan 17 09:16:36 2012
From: martin at (martin at
Date: Tue, 17 Jan 2012 09:16:36 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

>> It doesn't change anything, you will still get collisions.
> That depends right? If the collision is because they all have the same
> hash(), yes. It might be different if it is because the secondary hashing
> (or whatever it's called :-) causes collisions.

But Python deals with the latter case just fine already. The open hashing
approach relies on the dict resizing "enough" to prevent collisions after
the dictionary has grown. Unless somebody can demonstrate a counter example,
I believe this discussion is a red herring.

Plus: if an attacker could craft keys that deliberately cause collisions
because of the dictionary size, they could likely also craft keys in the same
number that collide on actual hash values, bringing us back to the original
problem.


From techtonik at  Tue Jan 17 11:59:16 2012
From: techtonik at (anatoly techtonik)
Date: Tue, 17 Jan 2012 13:59:16 +0300
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
 in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 13, 2012 at 7:19 PM, Antoine Pitrou <solipsis at> wrote:

> On Fri, 13 Jan 2012 17:00:57 +0100
> Xavier Morel <python-dev at> wrote:
> > FWIW this is not restricted to Linux (the same behavior change can
> > be observed in OSX), and the script is overly complex you can expose
> > the change with 3 lines
> >
> >     import sys
> >     sys.stdout.write('prompt>')
> >
> >
> > Python 2 displays "prompt" and terminates execution on [Return],
> > Python 3 does not display anything until [Return] is pressed.
> >
> > Interestingly, the `-u` option is not sufficient to make
> > "prompt>" appear in Python 3, the stream has to be flushed
> > explicitly unless the input is ~16k characters (I guess that's
> > an internal buffer size of some sort)
> "-u" forces line-buffering mode for stdout/stderr, which is already the
> default if they are wired to an interactive device (isatty() returning
> True).
> But this was already rehashed on python-ideas and the bug tracker, and
> apparently Anatoly thought it would be a good idea to post on a third
> medium. Sigh.

If you track this more closely, you'll notice there are four issues
(surprises) from the user point of view:
1. print() buffers output on Python 3
2. print() also buffers output on Python 2, but only on Linux
3. there is some useless '-u' command line parameter
    (useless, because the last thing a user wants is to care not only about
    Python 2/3, but also about how to invoke them)
4. print() is not guilty - it is sys.stdout.write() that buffers output

The discussion of 1-2 was about the idea of making the new print() function's
behavior more 'pythonic', i.e. 'user-friendly' or just KISS, which resulted
in adding a flush parameter.
3 is just a side FYI remark.
4 no longer relates to the python-ideas thread about fixing print() - it is
about the *cause* of the problem with print() UX, which is the underlying
sys.stdout.write() behavior.

I asked 4 here, because it is the more appropriate place to ask not only
whether it can be/will be fixed, but also why. The target audience of the
question is developers.

Hope that helps Antoine recover from the sorrow. ;)
anatoly t.

From stephen at  Tue Jan 17 12:10:38 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 17 Jan 2012 20:10:38 +0900
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

martin at writes:
 > >> It doesn't change anything, you will still get collisions.
 > >
 > >
 > > That depends right? If the collision is because they all have the same
 > > hash(), yes. It might be different if it is because the secondary hashing
 > > (or whatever it's called :-) causes collisions.
 > But Python deals with the latter case just fine already. The open hashing
 > approach relies on the dict resizing "enough" to prevent collisions after
 > the dictionary has grown. Unless somebody can demonstrate a counter example,
 > I believe this discussion is a red herring.
 > Plus: if an attacker could craft keys that deliberately cause collisions
 > because of the dictionary size, they could likely also craft keys in the same
 > number that collide on actual hash values, bringing us back to the original
 > problem.

I thought that the original problem was that with N insertions in the
dictionary, by repeatedly inserting different keys generating the same
hash value an attacker could arrange that the cost of finding an open
slot is O(N), and thus the cost of N insertions is O(N^2).

If so, frequent resizing could make the attacker's problem much more
difficult, as the distribution of secondary probes should change with
each resize.

From victor.stinner at  Tue Jan 17 12:55:02 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 12:55:02 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> I thought that the original problem was that with N insertions in the
> dictionary, by repeatedly inserting different keys generating the same
> hash value an attacker could arrange that the cost of finding an open
> slot is O(N), and thus the cost of N insertions is O(N^2).
> If so, frequent resizing could make the attacker's problem much more
> difficult, as the distribution of secondary probes should change with
> each resize.

The attack creates 60,000 strings (or more) with exactly the same hash
value. A dictionary uses hash(str) & DICT_MASK to compute the bucket
index, where DICT_MASK is the number of buckets minus one. If all
strings have the same hash value, we always start in the same bucket
and the key has to be compared to all previous strings to find the
next empty bucket. The attack works because a LOT of strings are
compared and comparing strings is slow.

If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but
hash(str1)!=hash(str2), strings are not compared (this is a common
optimization in Python), and so the attack would not be successful
(it would be slow, but not as slow as comparing two strings).
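
The difference between the two cases can be sketched with a toy table. This
is only an illustration under simplifying assumptions: the names
CountingKey, insert and count_comparisons are made up for the example, and
the table uses plain linear probing rather than CPython's real probing. It
counts how often keys are compared with == when the stored hash is checked
first.

```python
class CountingKey:
    """String-like key with a forced hash that counts equality checks."""
    comparisons = 0

    def __init__(self, name, forced_hash):
        self.name = name
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.name == other.name


def insert(table, key):
    """Insert key into a fixed-size open-addressing table."""
    mask = len(table) - 1            # number of buckets minus one
    h = hash(key)
    i = h & mask
    while table[i] is not None:
        stored_hash, stored_key = table[i]
        # Compare the stored hash first; only equal hashes trigger ==.
        if stored_hash == h and stored_key == key:
            return
        i = (i + 1) & mask           # simplified linear probing (assumption)
    table[i] = (h, key)


def count_comparisons(hashes, size=1024):
    """Insert one distinct key per hash value and count == calls."""
    CountingKey.comparisons = 0
    table = [None] * size
    for n, h in enumerate(hashes):
        insert(table, CountingKey("key%d" % n, h))
    return CountingKey.comparisons
```

With 200 keys sharing one hash value, every insertion compares the new key
against all earlier ones, so the number of comparisons grows quadratically.
With 200 keys that only share a bucket index (hashes 7, 7 + 1024, ...), the
probing still walks over occupied slots, but no key comparison happens.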


From ronaldoussoren at  Tue Jan 17 12:25:18 2012
From: ronaldoussoren at (Ronald Oussoren)
Date: Tue, 17 Jan 2012 12:25:18 +0100
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
 in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <>

On 17 Jan, 2012, at 11:59, anatoly techtonik wrote:
> If you track this more closely, you'll notice there are four issues (surprises) from the user point of view:
> 1. print() buffers output on Python3
> 2. print() also buffers output on Python2, but only on Linux
> 3. there is some useless '-u' command line parameter
>     (useless, because the last thing user wants is not only care about Python 2/3, but also how to invoke them)
> 4. print() is not guilty - it is sys.stdout.write() that buffers output
> 1-2 discussion was about idea to make new print() function behavior more 'pythonic', i.e. 'user-friendly' or just KISS, which resulted in adding a flush parameter
> 3 is just a side FYI remark
> 4 doesn't relate to python-ideas anymore about fixing print() - it is about the *cause* of the problem with print() UX, which is underlying sys.stdout.write() behavior
> I asked 4 here, because it is the more appropriate place not only to ask if it can be/will be fixed, but also why. The target audience of the question are developers.

All four "issues" are related to output buffering and how that is not user-friendly. The new issue you raise is the same as before: sys.stdout is line buffered when writing to a tty, which means that you have to explicitly flush output when you want to output a partial line.  Why is this a problem for you? Is that something that bothers you personally or do you have data that suggests that this is a problem for a significant number of (new) users?
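
For reference, the explicit flush mentioned above might look like the
following minimal sketch. The helper name show_prompt is made up for the
example; only sys.stdout.write() and sys.stdout.flush() come from the
discussion.

```python
import sys


def show_prompt(text, stream=None):
    """Write a partial line and push it out without waiting for a newline."""
    stream = sys.stdout if stream is None else stream
    stream.write(text)
    # A line-buffered stream holds a partial line back until flushed.
    stream.flush()
```

Calling show_prompt('prompt>') should make the text appear before any
input is read, on both Python 2 and Python 3.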



From victor.stinner at  Tue Jan 17 13:28:52 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 13:28:52 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

I finished my patch transforming hash(str) to a randomized hash
function, see random-8.patch attached to the issue:

The remaining question is which random number generator should be used
on Windows to initialize the hash secret (CryptGenRandom adds an overhead
of 10%, at least when the DLL is loaded dynamically); read the issue
for the details.

I plan to commit my fix to Python 3.3 if it is accepted, then write a
simplified version for Python 3.2 and backport it to 3.1, then backport
the simplified fix to 2.7, and finally to 2.6.

The vulnerability has been public for one month; it may be time to fix
it before it is widely exploited.


From jeremy at  Tue Jan 17 16:39:03 2012
From: jeremy at (Jeremy Sanders)
Date: Tue, 17 Jan 2012 15:39:03 +0000
Subject: [Python-Dev] Status of the fix for the hash collision
References: <>
Message-ID: <jf44mp$ii$>

Victor Stinner wrote:

> If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but
> hash(str1)!=hash(str2), strings are not compared (this is a common
> optimization in Python), and so the attack would not be successful
> (it would be slow, but not as slow as comparing two strings).

It's a shame the hash function can't take a second salt parameter to include 
in the hash. Each dict could have its own salt, generated from a quick 
pseudo-random generator.


From jeremy at  Tue Jan 17 16:44:21 2012
From: jeremy at (Jeremy Sanders)
Date: Tue, 17 Jan 2012 15:44:21 +0000
Subject: [Python-Dev] Status of the fix for the hash collision
References: <>
Message-ID: <jf450l$2dn$>

Jeremy Sanders wrote:

> Victor Stinner wrote:
>> If hash(str1)&DICT_MASK == hash(str2)&DICT_MASK but
>> hash(str1)!=hash(str2), strings are not compared (this is a common
>> optimization in Python), and so the attack would not be successful
>> (it would be slow, but not as slow as comparing two strings).
> It's a shame the hash function can't take a second salt parameter to
> include in the hash. Each dict could have its own salt, generated from a
> quick pseudo-random generator.

Please ignore... forgot that the hashes are cached for strings!


From merwok at  Tue Jan 17 18:26:05 2012
From: merwok at (Éric Araujo)
Date: Tue, 17 Jan 2012 18:26:05 +0100
Subject: [Python-Dev] [Python-checkins] cpython: add str.casefold()
	(closes #13752)
In-Reply-To: <>
References: <>
Message-ID: <>


> changeset:   d4669f43d05f
> user:        Benjamin Peterson <benjamin at>
> date:        Sat Jan 14 13:23:30 2012 -0500
> summary:
>   add str.casefold() (closes #13752)

> diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
> --- a/Doc/library/stdtypes.rst
> +++ b/Doc/library/stdtypes.rst
> @@ -1002,6 +1002,14 @@
>     rest lowercased.
> +.. method:: str.casefold()
> +
> +   Return a casefolded copy of the string. Casefolded strings may be used for
> +   caseless matching. For example, ``"MASSE".casefold() == "maße".casefold()``.
> +
> +   .. versionadded:: 3.3

I think this method requires at least a link to relevant definitions
(Unicode website or Wikipedia), and at best a bit more explanation (for
example, it is not locale-dependent, even though the example above is
only meaningful for German).
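
As a small illustration of what the documented example does (a sketch only;
the helper name caseless_equal is made up here, and str.casefold() needs
Python 3.3 or later):

```python
def caseless_equal(a, b):
    """Compare two strings using Unicode case folding."""
    return a.casefold() == b.casefold()


# casefold() maps the German sharp s to "ss", while lower() leaves it alone,
# so only the casefolded comparison treats the two spellings as equal.
folded_match = caseless_equal("MASSE", "maße")
lowered_match = "MASSE".lower() == "maße".lower()
```

Here folded_match is True and lowered_match is False, regardless of the
locale the interpreter runs under.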


From merwok at  Tue Jan 17 18:27:31 2012
From: merwok at (Éric Araujo)
Date: Tue, 17 Jan 2012 18:27:31 +0100
Subject: [Python-Dev]
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Giampaolo,

> changeset:   53a5a5b8859d
> user:        Giampaolo Rodola' <g.rodola at>
> date:        Mon Jan 09 17:10:10 2012 +0100
> summary:
>   provide a common method to check for RETR_DATA validity, first
> checking the expected len and then the actual data content; this
> way we get a failure on len mismatch rather than content mismatch
> (which is very long and unreadable)

My trick is to convert long strings to lists (with
data.split(appropriate line ending)) and pass them to assertEqual.
Then I get more readable element-based diffs when there is a test
failure.

Another trick I use is this (for example when I don't want to make
too much diff noise, or when I don't want to build the list of
expected results):

   self.assertEqual(len(got), 3, got)

unittest will print the third argument on failure.
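
A self-contained sketch of that second trick, for illustration: the helper
failure_report_for and the LengthCheck class are made-up names, while
assertEqual's optional third argument is the standard unittest msg
parameter.

```python
import unittest


def failure_report_for(got, expected_len):
    """Run one length check and return unittest's failure text ('' if it passes)."""

    class LengthCheck(unittest.TestCase):
        def runTest(self):
            # The third argument is appended to the failure message.
            self.assertEqual(len(got), expected_len, got)

    result = unittest.TestResult()
    LengthCheck().run(result)
    return result.failures[0][1] if result.failures else ""
```

On a length mismatch, the report contains both the short "2 != 3" style
message and the repr of the data that was actually received.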


From matrixhasu at  Tue Jan 17 19:02:13 2012
From: matrixhasu at (Sandro Tosi)
Date: Tue, 17 Jan 2012 19:02:13 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 16, 2012 at 16:42, Éric Araujo <merwok at> wrote:
> Hi,
> Le 14/01/2012 15:31, Sandro Tosi a écrit :
>> On Sat, Jan 14, 2012 at 04:24, Éric Araujo <merwok at> wrote:
>>>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
>>> That's a mistake I made in cefe4f38fa0e.  This sentence should be removed.
>> Do you mean revert this whole hunk:
>> [...]
>> or just "The :keyword:`nonlocal` allows writing to outer scopes."?
> My proposal was to remove just that one last sentence, but the only
> other change in the diff hunk is the addition of "by default", which is
> connected to the existence of nonlocal.  Both changes, i.e. the whole
> hunk, should be reverted (I think I'll have time to do that today).

I've reverted it with ef1612a6a4f7

>>>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found for
>>>> cross-reference u'next':
>>> Need to use :meth:`.next` to let Sphinx find the right target (more info
>>> on request :)
>> it seems what it needed was :meth:`next` (without the dot). The
>> current page links all 'next' in to functions.html#next,
>> and using :meth:`next` does that.
> I should have given more info, as I wanted the opposite result :)
> should not link to the next function but to the
> method.  Because Sphinx does not differentiate between
> meth/func/class/mod roles, :meth:`next` is not resolved to the nearest
> next method as one could expect but to the next function, so we have to
> use :meth:`` or :meth:`.next` (local ref markup) to get
> our links to methods.

I tried :meth:`.next` but got a lot of:

/home/morph/cpython/py27/Doc/library/stdtypes.rst:2372: WARNING: more
than one target found for cross-reference u'next':,,,,,,,,,,

so I ended up with :meth:`next` but it was still wrong. I've committed
51e11b4937b7 which uses :meth:`` instead, and it works.

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From g.brandl at  Tue Jan 17 20:33:30 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 17 Jan 2012 20:33:30 +0100
Subject: [Python-Dev] Sphinx version for Python 2.x docs
In-Reply-To: <>
References: <>
Message-ID: <jf4i7i$eiu$>

Am 17.01.2012 19:02, schrieb Sandro Tosi:

>> I should have given more info, as I wanted the opposite result :)
>> should not link to the next function but to the
>> method.  Because Sphinx does not differentiate between
>> meth/func/class/mod roles, :meth:`next` is not resolved to the nearest
>> next method as one could expect but to the next function, so we have to
>> use :meth:`` or :meth:`.next` (local ref markup) to get
>> our links to methods.
> I tried :meth:`.next` but got a lot of:
> /home/morph/cpython/py27/Doc/library/stdtypes.rst:2372: WARNING: more
> than one target found for cross-reference u'next':,
> so I ended up with :meth:`next` but it was still wrong. I've committed
> 51e11b4937b7 which uses :meth:`` instead, and it works.

No need to try, just read the docs :)

`next` looks in the current (class, then module) namespaces.
`.next` looks everywhere, so the match must be unique.
So for something as common as "next", an explicit `` is


From martin at  Tue Jan 17 21:09:02 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:09:02 +0100
Subject: [Python-Dev] Script(s) for building Python on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> Are you suggesting creating vs10to9, which would be congruent to vs9to8, or
> vs9to10?

After reconsidering, I don't think I want anything like this in the tree
at this point. The code will be outdated by the time Python 3.3 is
released, as Python 3.3 will be built with a Visual Studio different
from 2008.


P.S. Please shorten your messages. They contain too much text for me to

From martin at  Tue Jan 17 21:29:51 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:29:51 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

> I thought that the original problem was that with N insertions in the
> dictionary, by repeatedly inserting different keys generating the same
> hash value an attacker could arrange that the cost of finding an open
> slot is O(N), and thus the cost of N insertions is O(N^2).
> If so, frequent resizing could make the attacker's problem much more
> difficult, as the distribution of secondary probes should change with
> each resize.

Not sure what you mean by "distribution of secondary probes".

Let H be the initial hash value, and let MASK be the current size
of the dictionary minus one. Then I(n), the sequence of dictionary indices
being probed, is computed as

   I(0) = H & MASK
   PERTURB(0) = H
   I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
   PERTURB(n+1) = PERTURB(n) >> 5

So if two objects O1 and O2 have the same hash value H, their sequences of
probed indices are identical for any given MASK value. Resizing yields a
different sequence, yes, but the two objects will still collide on each and
every slot.

This is the very nature of open addressing. If it *didn't* try all
indices in the probe sequence, it might not be possible to perform
the lookup for a key correctly.
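
The recurrence quoted above can be written down directly. This is a sketch
of the formula exactly as stated in this message, assuming a non-negative H
and a MASK of the form 2**k - 1; the names probe_sequence and first_probes
are made up for the example, and the real dictobject.c code may differ in
detail.

```python
from itertools import islice


def probe_sequence(h, mask):
    """Yield I(0), I(1), ... for hash value h and the given mask."""
    i = h & mask
    perturb = h
    while True:
        yield i
        i = (5 * i + 1 + perturb) & mask
        perturb >>= 5


def first_probes(h, mask, count=8):
    """Return the first `count` probed indices as a list."""
    return list(islice(probe_sequence(h, mask), count))
```

Because the sequence depends only on H and MASK, two keys with the same H
get the same list from first_probes() for every table size. Changing MASK
(a resize) changes the list, but it changes it identically for both keys.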


From solipsis at  Tue Jan 17 21:34:40 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 17 Jan 2012 21:34:40 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term
 support versions
Message-ID: <>


We would like to propose the following PEP to change (C)Python's release
cycle. Discussion is welcome, especially from people involved in the
release process, and maintainers from third-party distributions of



PEP: 407
Title: New release cycle and introducing long-term support versions
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis at>,
        Georg Brandl <georg at>,
        Barry Warsaw <barry at>
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 2012-01-12
Resolution: TBD


Finding a release cycle for an open-source project is a delicate
exercise in managing mutually contradicting constraints: developer
manpower, availability of release management volunteers, ease of
maintenance for users and third-party packagers, quick availability of
new features (and behavioural changes), availability of bug fixes
without pulling in new features or behavioural changes.

The current release cycle errs on the conservative side.  It is
adequate for people who value stability over reactivity.  This PEP is
an attempt to keep the stability that has become a Python trademark,
while offering a more fluid release of features, by introducing the
notion of long-term support versions.


This PEP doesn't try to change the maintenance period or release
scheme for the 2.7 branch.  Only 3.x versions are considered.


Under the proposed scheme, there would be two kinds of feature
versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
normal feature versions and long-term support (LTS) versions.

Normal feature versions would get either zero or at most one bugfix
release; the latter only if needed to fix critical issues.  Security
fix handling for these branches needs to be decided.

LTS versions would get regular bugfix releases until the next LTS
version is out.  They then would go into security fixes mode, up to a
termination date at the release manager's discretion.


A new feature version would be released every X months.  We
tentatively propose X = 6 months.

LTS versions would be one out of N feature versions.  We tentatively
propose N = 4.

With these figures, a new LTS version would be out every 24 months,
and remain supported until the next LTS version 24 months later.  This
is mildly similar to today's 18 months bugfix cycle for every feature

Pre-release versions

More frequent feature releases imply a smaller number of disruptive
changes per release.  Therefore, the number of pre-release builds
(alphas and betas) can be brought down considerably.  Two alpha builds
and a single beta build would probably be enough in the regular case.
The number of release candidates depends, as usual, on the number of
last-minute fixes before final release.


Effect on development cycle

More feature releases might mean more stress on the development and
release management teams.  This is quantitatively alleviated by the
smaller number of pre-release versions; and qualitatively by the
lesser amount of disruptive changes (meaning less potential for
breakage).  The shorter feature freeze period (after the first beta
build until the final release) is easier to accept.  The rush for
adding features just before feature freeze should also be much

Effect on bugfix cycle

The effect on fixing bugs should be minimal with the proposed figures.
The same number of branches would be simultaneously open for regular
maintenance (two until 2.x is terminated, then one).

Effect on workflow

The workflow for new features would be the same: developers would only
commit them on the ``default`` branch.

The workflow for bug fixes would be slightly updated: developers would
commit bug fixes to the current LTS branch (for example ``3.3``) and
then merge them into ``default``.

If some critical fixes are needed to a non-LTS version, they can be
grafted from the current LTS branch to the non-LTS branch, just like
fixes are ported from 3.x to 2.7 today.

Effect on the community

People who value stability can just synchronize on the LTS releases
which, with the proposed figures, would give a similar support cycle
(both in duration and in stability).

People who value reactivity and access to new features (without taking
the risk to install alpha versions or Mercurial snapshots) would get
much more value from the new release cycle than currently.

People who want to contribute new features or improvements would be
more motivated to do so, knowing that their contributions will be more
quickly available to normal users.  Also, a smaller feature freeze
period makes it less cumbersome to interact with contributors of


These are open issues that should be worked out during discussion:

* Decide on X (months between feature releases) and N (feature releases
  per LTS release) as defined above.

* For given values of X and N, is the no-bugfix-releases policy for
  non-LTS versions feasible?

* Restrict new syntax and similar changes (i.e. everything that was
  prohibited by PEP 3003) to LTS versions?

* What is the effect on packagers such as Linux distributions?

* How will release version numbers or other identifying and marketing
  material make it clear to users which versions are normal feature
  releases and which are LTS releases?  How do we manage user

A community poll or survey to collect opinions from the greater Python
community would be valuable before making a final decision.


This document has been placed in the public domain.


From martin at  Tue Jan 17 21:43:49 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:43:49 +0100
Subject: [Python-Dev] Switching to Visual Studio 2010
Message-ID: <>

It seems a number of people are interested that the Python trunk
switches to Visual Studio 2010 *now*. I've been hesitant to agree
to such a change, as I still hope that Python can skip over VS 2010
(a.k.a.  VS 10), and go straight to VS 11.

However, I just learned that VS 11 supposedly reads VS 10 project files
just fine, with no need of conversion.

So I'd be willing to agree to converting the Python trunk now. It
will surely cause all kinds of issues, as any switching of Visual Studio
releases has caused in the past.

Since a number of people have already started with such a project,
I'd like to ask for a volunteer who will lead this project. You
get the honor to commit the changes, and you will be in charge if
something breaks, hopefully finding out solutions in a timely manner
(not necessarily implementing the solutions yourself).

Any volunteers?


P.S. Here is my personal list of requirements and non-requirements:
- must continue to live in PCbuild, and must replace the VS 9
  project files "for good"
- may or may not support automatic conversion to VS 9. If it turns
  out that conversion to old project files is not feasible, we could
  either decide to maintain old project files manually (in PC/VS9),
  or just give up on maintaining build support for old VS releases.
- must generate binaries that run on Windows XP
- must support x86 and AMD64 builds
- must support debug and no-debug builds
- must support PGO builds
- must support buildbot
- must support building all extensions that we currently build
- may break existing buildbot installations until they upgrade to
  a new VS release
- must support PCbuild/rt.bat
- should support Tools/msi. If it doesn't, I'll look into it.
- must nearly pass the test suite (i.e. number of additional failures
  due to VS 2010 should be "small")

From brian at  Tue Jan 17 21:51:04 2012
From: brian at (Brian Curtin)
Date: Tue, 17 Jan 2012 14:51:04 -0600
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 14:43, "Martin v. Löwis" <martin at> wrote:
> It seems a number of people are interested that the Python trunk
> switches to Visual Studio 2010 *now*. I've been hesitant to agree
> to such a change, as I still hope that Python can skip over VS 2010
> (a.k.a.  VS 10), and go straight to VS 11.
> However, I just learned that VS 11 supposedly reads VS 10 project files
> just fine, with no need of conversion.
> So I'd be willing to agree to converting the Python trunk now. It
> will surely cause all kinds of issues, as any switching of Visual Studio
> releases has caused in the past.
> Since a number of people have already started with such a project,
> I'd like to ask for a volunteer who will lead this project. You
> get the honor to commit the changes, and you will be in charge if
> something breaks, hopefully finding out solutions in a timely manner
> (not necessarily implementing the solutions yourself).
> Any volunteers?

I previously completed the port at my old company (but could not
release it), and I have a good bit of it completed for us at That repo is a little bit
behind 'default' but updating it shouldn't pose any problems.

From martin at  Tue Jan 17 21:52:02 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:52:02 +0100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>
Message-ID: <>

> I plan to commit my fix to Python 3.3 if it is accepted. Then write a
> simplified version to Python 3.2 and backport it to 3.1.

I'm opposed to any change to the hash values of strings in maintenance
releases, so I guess I'm opposed to your patch in principle.

See my next message for an alternative proposal.

> The vulnerability has been public for one month; it may be time to fix
> it before it is widely exploited.

I don't think there is any urgency. The vulnerability has been known for
more than five years now. From creating a release to the point where
the change actually arrives at end users, many months will pass.


From martin at  Tue Jan 17 21:59:28 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 21:59:28 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
Message-ID: <>

I'd like to propose a different approach to seeding the string hashes:
only do so for dictionaries involving only strings, and leave the
tp_hash slot of strings unchanged.

Each string would get two hashes: the "public" hash, which is constant
across runs and bugfix releases, and the dict-hash, which is only used
by the dictionary implementation, and only if all keys to the dict are
strings. In order to allow caching of the hash, all dicts should use
the same hash (if caching wasn't necessary, each dict could use its own

There are several variants of that approach wrt. caching of the hash
1. add an additional field to all string objects, to cache the second
   hash value.
   a) variant: in 3.3, drop the extra field, and declare that hashes
   may change across runs
2. only cache the dict-hash, recomputing the public hash each time
3. on a per-string choice, cache either the dict-hash or the public
   hash, depending on which one gets computed first, and recompute
   the other one every time it's needed.

As you can see, 1 vs. 2/3 is a classical time-space-tradeoff.

What do you think?
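
To make the idea concrete, here is a rough sketch of the two hashes. All
names (public_hash, dict_hash, hash_for_dict) and the simple multiplicative
hash are stand-ins invented for this illustration; they are not the actual
string hash and not code from any patch.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _mix(h, s):
    """Toy multiplicative string hash, used only for illustration."""
    for ch in s:
        h = ((h * 1000003) ^ ord(ch)) & MASK64
    return h


def public_hash(s):
    """Stand-in for the unseeded hash: constant across runs."""
    return _mix(0, s)


def dict_hash(s, seed):
    """Stand-in for the seeded hash used only inside string-only dicts."""
    return _mix(seed, s)


def hash_for_dict(key, all_keys_are_str, seed):
    """Pick the hash a dict would use under the proposal."""
    if isinstance(key, str):
        if all_keys_are_str:
            return dict_hash(key, seed)
        return public_hash(key)
    return hash(key)                 # non-string keys are left unchanged
```

The public hash of a string never depends on the seed, while the value used
for bucket selection in a string-only dict does, so precomputed colliding
keys would stop colliding there once the seed changes.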


From martin at  Tue Jan 17 22:01:21 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 22:01:21 +0100
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

> I previously completed the port at my old company (but could not
> release it), and I have a good bit of it completed for us at
> That repo is a little bit
> behind 'default' but updating it shouldn't pose any problems.

So: do you agree that we switch? Do you volunteer to drive the change?


From martin at  Tue Jan 17 22:06:30 2012
From: martin at ("Martin v. Löwis")
Date: Tue, 17 Jan 2012 22:06:30 +0100
Subject: [Python-Dev] Python as a Metro-style App
In-Reply-To: <>
References: <> <>
Message-ID: <>

> Just wondering, do Metro apps define UNDER_CE or _WIN32_WCE? The point
> is that the old ANSI functions (CreateFileA etc) have been removed from
> the embedded MS Windows CE long ago, too, and MS Windows Mobile used to
> be a custom CE variant or at least strongly related. In any case, it
> could help using the existing (incomplete) CE port as base for Metro.

I have now completed building Python as a Metro DLL; the WinRT
restrictions are fairly limited (code-wise, not so in impact).

They are quite different from the CE restrictions. For example,
CreateSemaphore is not available on WinRT, you have to use
CreateSemaphoreExW (which is new in Windows Vista). No traces of
the CE API can be seen in the restrictions, and the separation
is done in a different manner (WINAPI_FAMILY==2).


From brian at  Tue Jan 17 22:11:21 2012
From: brian at (Brian Curtin)
Date: Tue, 17 Jan 2012 15:11:21 -0600
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 15:01, "Martin v. Löwis" <martin at> wrote:
>> I previously completed the port at my old company (but could not
>> release it), and I have a good bit of it completed for us at
>> That repo is a little bit
>> behind 'default' but updating it shouldn't pose any problems.
> So: do you agree that we switch? Do you volunteer to drive the change?

I do, and I'll volunteer.

From solipsis at  Tue Jan 17 22:26:11 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 17 Jan 2012 22:26:11 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
References: <>
Message-ID: <>

On Tue, 17 Jan 2012 21:59:28 +0100
"Martin v. Löwis" <martin at> wrote:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.

I think Python 3 would be better with a clean fix (all hashes
randomized).
Now for Python 2... The problem with this idea is that it only
addresses str dicts. Unicode dicts, and any other dicts, are left
vulnerable. Unicode dicts are quite likely in Web
frameworks/applications and other places which have well-thought text

That said, here's a suggestion to squeeze those bits:

> 1. add an additional field to all string objects, to cache the second
>    hash value.
>    a) variant: in 3.3, drop the extra field, and declare that hashes
>    may change across runs

In 2.7, a string object has the following fields:

    long ob_shash;
    int ob_sstate;

Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
could cache a "hash perturbation" computed from the string and the
random bits:

- hash() would use ob_shash
- dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))

This way, you cache almost all computations, adding only a computation
and a couple logical ops when looking up a string in a dict.
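
To make the bit arithmetic concrete, here is a rough Python model of the layout described above. It is only a sketch: the helper names (pack_state, dict_hash), the 32-bit masking, and the sample values are assumptions for illustration. The real fields live in the C struct of a 2.7 str, and the message does not say how the perturbation itself would be derived from the string and the random bits.

```python
MASK32 = 0xFFFFFFFF   # model ob_sstate as a 32-bit field (assumption)
STATE_BITS = 0x3      # the low 2 bits already in use in ob_sstate

def pack_state(state, perturbation):
    """Store a 30-bit perturbation above the 2 bits already in use."""
    assert 0 <= state <= 3
    assert 0 <= perturbation < (1 << 30)
    return ((perturbation << 2) & MASK32) | state

def dict_hash(ob_shash, ob_sstate):
    """Hash used only for dict lookups, following the formula in the message."""
    return ((ob_shash * 1000003) ^ (ob_sstate & ~STATE_BITS)) & MASK32

public_hash = 0x1234ABCD            # made-up cached value of hash(s)
sstate_a = pack_state(1, 0x0ABCDEF) # same string, one random perturbation
sstate_b = pack_state(1, 0x1234567) # same string, a different perturbation
```

The public hash() value is untouched, while the value used for dict lookups depends on the cached perturbation; the 2 low bits never leak into it because they are masked out.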



From mark at  Tue Jan 17 23:03:45 2012
From: mark at (Mark Shannon)
Date: Tue, 17 Jan 2012 22:03:45 +0000
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Hi all.

Let's start controversially: I don't like PEP 380, I think it's a kludge.

I think that CPython should have proper coroutines, rather than add more 
bits and pieces to generators in an attempt to make them more like 
coroutines.

I have mentioned this before, but this time I have done something about 
it :)

I have a working, portable, (asymmetric) coroutine implementation here:

It's all standard C, no messing with the C stack, just using standard 
techniques to convert recursion to iteration
(in the VM not at the Python level) and a revised internal calling 
convention to make CPython stackless:

Then I've added a Coroutine class and fiddled with the implementation of 
YIELD_VALUE to support it.

I think the stackless implementation is pretty solid, but the
coroutine stuff needs some refinement.
I've not tested it well (it passes the test suite, but I've added no new 
tests).
It is (surprisingly) a bit faster than tip (on my machine).
There are limitations: all calls must be Python-to-Python calls,
which rules out most __xxx__ methods. It might be worth special casing 
__iter__, but I've not done that yet.

To try it out:

 >>> import coroutine
To send a value to a coroutine:
 >>> co.send(val)
where co is a Coroutine()
To yield a value:
 >>> coroutine.co_yield(val)
send() is a method, co_yield is a function.

Here's a little program to demonstrate:

import coroutine

class Node:
     def __init__(self, l, item, r):
         self.l = l
         self.item = item
         self.r = r

def make_tree(n):
     if n == 0:
         return Node(None, n, None)
     else:
         return Node(make_tree(n-1), n, make_tree(n-1))

def walk_tree(t, f):
     if t is not None:
         walk_tree(t.l, f)
         f(t)  # visit the node between its two subtrees
         walk_tree(t.r, f)

def yielder(t):
     coroutine.co_yield(t.item)  # presumably the item is what gets yielded

def tree_yielder(t):
     walk_tree(t, yielder)

co = coroutine.Coroutine(tree_yielder, (make_tree(2),))

while True:
     print(co.send(None))  # resume the coroutine until it halts

Which will output:

Traceback (most recent call last):
   File "", line 30, in <module>
TypeError: can't send to a halted coroutine


From victor.stinner at  Tue Jan 17 23:06:47 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 23:06:47 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/17 "Martin v. Löwis" <martin at>:
> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.

The real problem is in dict (or any structure using a hash table), so
if it is possible, I would also prefer to fix the problem directly in

> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>    hash value.
>    a) variant: in 3.3, drop the extra field, and declare that hashes
>    may change across runs
> 2. only cache the dict-hash, recomputing the public hash each time
> 3. on a per-string choice, cache either the dict-hash or the public
>    hash, depending on which one gets computed first, and recompute
>    the other one every time it's needed.

There is a simpler solution:

bucket_index = (hash(str) ^ secret) & DICT_MASK.

Remark: set must also be fixed.


From victor.stinner at  Tue Jan 17 23:23:48 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 23:23:48 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

> There is a simpler solution:
> bucket_index = (hash(str) ^ secret) & DICT_MASK.

Oops, hash^secret doesn't add any security.
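
The reason, sketched in Python: XOR with a fixed secret is applied after hashing, so two keys whose hashes agree in the masked bits still agree after the XOR, whatever the secret is. The table size and the two hash values below are made up for illustration.

```python
DICT_MASK = 0xFF  # hypothetical table with 256 slots

def bucket_index(h, secret):
    """The 'simpler solution' proposed above."""
    return (h ^ secret) & DICT_MASK

# Two made-up hash values that already collide in the low 8 bits.
h1 = 0x1234AB
h2 = 0x9876AB

# They land in the same bucket for every secret tried.
still_collide = all(
    bucket_index(h1, secret) == bucket_index(h2, secret)
    for secret in range(1 << 16)
)
```

An attacker who has precomputed colliding keys therefore does not need to know the secret at all; the secret only permutes which bucket the whole colliding set ends up in.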


From ericsnowcurrently at  Tue Jan 17 23:24:09 2012
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 17 Jan 2012 15:24:09 -0700
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 1:34 PM, Antoine Pitrou <solipsis at> wrote:
> Under the proposed scheme, there would be two kinds of feature
> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
> normal feature versions and long-term support (LTS) versions.
> A new feature version would be released every X months.  We
> tentatively propose X = 6 months.
> LTS versions would be one out of N feature versions.  We tentatively
> propose N = 4.

It sounds like every six months we would get a new feature version,
with every fourth one an LTS release.  That sounds great, but, unless
I've misunderstood, there has been a strong desire to keep that number
to one digit.  It doesn't matter to me all that much.  However, if
there is such a limit, implied or explicit, it should be mentioned and
factor into the PEP.
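
As a quick illustration of the numbering concern, here is the arithmetic for the tentative figures (X = 6, N = 4). It assumes, purely for the sake of the example, that 3.3 is the first release under the scheme and that it is an LTS; the helper name is made up.

```python
X_MONTHS = 6   # months between feature releases (tentative value from the PEP)
N = 4          # one LTS out of every N feature releases (tentative value)

def feature_versions(first_minor, count, lts_every):
    """Return (version, is_lts) pairs, treating the first version as an LTS."""
    return [
        ("3.%d" % (first_minor + i), i % lts_every == 0)
        for i in range(count)
    ]

schedule = feature_versions(3, 8, N)

# Months from the 3.3 release until the minor number needs two digits (3.10).
months_until_two_digits = (10 - 3) * X_MONTHS
```

Under these assumptions the LTS releases would be 3.3 and 3.7, and 3.10 would arrive 42 months (three and a half years) after 3.3.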

That aside, +1.


From victor.stinner at  Tue Jan 17 23:57:46 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 17 Jan 2012 23:57:46 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings.

The distinction between secret (private, secure) and "public" hash
(deterministic) is not clear to me.

Example: collections.UserDict implements __hash__() using
hash( Should it use the public or the private hash?
computes its hash using hash(x) of each item. Same question.

If we need to use the secret hash, it should be exposed in Python.
Which function/method would be used? I suppose that we cannot add
anything to stable releases like 2.7.


From anacrolix at  Wed Jan 18 00:04:19 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 18 Jan 2012 10:04:19 +1100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

If minor/feature releases are introducing breaking changes perhaps it's
time to adopt an accelerated major versioning schedule. For instance there are
breaking ABI changes between 3.0/3.1, and 3.2, and while acceptable for the
early adoption state of Python 3, such changes should normally be reserved
for major versions.

If every 4th or so feature release is sufficiently different to be worthy of
an LTS, consider this a major release albeit with smaller breaking changes
than Python 3.

Aside from this, given the radical features of 3.3, and the upcoming Ubuntu
12.04 LTS, I would recommend adopting 2.7 and 3.2 as the first LTSs, to be
reviewed 2 years hence should this go ahead.

From anacrolix at  Wed Jan 18 00:17:13 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 18 Jan 2012 10:17:13 +1100
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Just to clarify, this differs in functionality from enhanced generators by
allowing you to yield from an arbitrary call depth rather than having to
"yield from" through a chain of calling generators? Furthermore there's no
syntactical change except to the bottommost frame doing a co_yield? Does
this capture the major differences?
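
For comparison, this is roughly what the same in-order tree walk looks like with PEP 380 generators (Python 3.3+): every level of the recursion has to be a generator itself and delegate with "yield from". Node and make_tree mirror the demo in Mark's message; the rest is an illustrative sketch, not code from the thread.

```python
class Node:
    def __init__(self, l, item, r):
        self.l = l
        self.item = item
        self.r = r

def make_tree(n):
    if n == 0:
        return Node(None, n, None)
    return Node(make_tree(n - 1), n, make_tree(n - 1))

def walk_tree(t):
    """In-order walk; each recursive call must be delegated to explicitly."""
    if t is not None:
        yield from walk_tree(t.l)
        yield t.item
        yield from walk_tree(t.r)

items = list(walk_tree(make_tree(2)))
```

With the proposed coroutines, walk_tree could stay an ordinary function taking a callback, and only the callback at the bottom of the call chain would need to know it is running inside a coroutine.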

From solipsis at  Wed Jan 18 00:20:06 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 00:20:06 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>


On Wed, 18 Jan 2012 10:04:19 +1100
Matt Joiner <anacrolix at> wrote:
> If minor/feature releases are introducing breaking changes perhaps it's
> time to adopt accelerated major versioning schedule.

The PEP doesn't propose to accelerate compatibility breakage. So I don't
think a change in numbering is required.

> For instance there are
> breaking ABI changes between 3.0/3.1, and 3.2, and while acceptable for the
> early adoption state of Python 3, such changes should normally be reserved
> for major versions.

Which "breaking ABI changes" are you thinking about? Python doesn't
guarantee any A*B*I (as opposed to API), unless you use Py_LIMITED_API
which was introduced in 3.2.



From victor.stinner at  Wed Jan 18 00:25:23 2012
From: victor.stinner at (Victor Stinner)
Date: Wed, 18 Jan 2012 00:25:23 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

>> I plan to commit my fix to Python 3.3 if it is accepted. Then write a
>> simplified version to Python 3.2 and backport it to 3.1.
> I'm opposed to any change to the hash values of strings in maintenance
> releases, so I guess I'm opposed to your patch in principle.

If randomized hash cannot be turned on by default, an alternative is
to switch them off by default, and add an option (command line option,
environment variable, etc.) to enable it.
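
A minimal sketch of what such an opt-in switch could look like, written in Python for readability (the real decision would be made in C at interpreter startup). The environment variable name and the function are hypothetical and were not proposed in this thread.

```python
import os
import random

ENV_VAR = "EXAMPLE_HASH_RANDOMIZATION"  # hypothetical name, for illustration only

def pick_hash_seed(environ=os.environ):
    """Return 0 (hashes unchanged) unless the user explicitly opted in."""
    value = environ.get(ENV_VAR, "")
    if value == "":
        return 0                       # default: off, fully backwards compatible
    if value == "random":
        # A fresh nonzero seed per process, taken from the OS entropy source.
        return random.SystemRandom().getrandbits(32) or 1
    return int(value)                  # fixed seed, e.g. for reproducible tests
```

With this shape, existing deployments see no change in hash values, and anyone exposed to untrusted input can enable randomization without rebuilding the interpreter.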

>> The vulnerability is public since one month, it is maybe time to fix
>> it before it is widely exploited.
> I don't think there is any urgency. The vulnerability has been known for
> more than five years now. From creating a release to the point where
> the change actually arrives at end users, many months will pass.

In 2003, Python was not seen as vulnerable, maybe because its hash
function is different from Perl's hash function, or because nobody tried
to generate collisions. Today it is clear that Python is vulnerable
(the 64-bit version is also affected), and it's really fast to generate
collisions using the right algorithm.

Why is it taking so long to fix the vulnerability in Python, whereas it was
fixed quickly in Ruby? (They chose to use a randomized hash.)


From tjreedy at  Wed Jan 18 00:29:11 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 17 Jan 2012 18:29:11 -0500
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <jf508b$k14$>

On 1/17/2012 3:34 PM, Antoine Pitrou wrote:
> Hello,
> We would like to propose the following PEP to change (C)Python's release
> cycle. Discussion is welcome, especially from people involved in the
> release process, and maintainers from third-party distributions of
> Python.
> Regards
> Antoine.
> PEP: 407
> Title: New release cycle and introducing long-term support version

To me, as I understand the proposal, the title is wrong. Our current 
feature releases already are long-term support versions. They get bugfix 
releases at close to 6 month intervals for 1 1/2 to 2 years and security 
fixes for 3 years. The only change here is that you propose, for 
instance, a fixed 6-month interval and 2 year period.

As I read this, you propose to introduce a new short-term (interim, 
preview) feature release along with each bugfix release. Each would have 
all the bugfixes plus a preview of the new features expected to be in 
the next long-term release. (I know, this is not exactly how you spun it.)

There has been discussion on python-ideas about whether new features are 
or can be considered experimental, or whether there should be an 
'experimental' package. An argument against is that long-term production 
releases should not have experimental features that might go away or 
have their apis changed.

If the short-term, non-production, interim feature releases were called 
preview releases, then some or all of the new features could be labelled 
experimental and subject to change. It might actually be good to have 
major new features tested in at least one preview release before being 
frozen. Maybe then more of the initial bugs would be found and repaired 
*before* their initial appearance in a long-term release. (All of this 
is not to say that experimental features should be casually changed or 
reverted without good reason.)

One problem, at least on Windows, is that short-term releases would 
almost never have compiled binaries for 3rd-party libraries. It already 
takes a while for them to appear for the current long-term releases. On 
the other hand, library authors might be more inclined to test new 
features, a few at a time, if part of tested preview releases, than if 
just in the repository. So the result *might* be quicker library updates 
after each long-term release.

Terry Jan Reedy

From ethan at  Tue Jan 17 23:46:35 2012
From: ethan at (Ethan Furman)
Date: Tue, 17 Jan 2012 14:46:35 -0800
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Shannon wrote:
> I think that CPython should have proper coroutines, rather than add more 
> bits and pieces to generators in an attempt to make them more like 
> coroutines.
> I have mentioned this before, but this time I have done something about 
> it :)
> I have a working, portable, (asymmetric) coroutine implementation here:

As a user, this sounds cool!


From glyph at  Wed Jan 18 00:37:31 2012
From: glyph at (Glyph)
Date: Tue, 17 Jan 2012 18:37:31 -0500
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote:

> Lets start controversially: I don't like PEP 380, I think it's a kludge.

Too late; it's already accepted.  There's not much point in making controversial statements about it now.

> I think that CPython should have proper coroutines, rather than add more bits and pieces to generators in an attempt to make them more like coroutines.

By "proper" coroutines, you mean implicit coroutines (cooperative threads) rather than explicit coroutines (cooperative generators).  Python has been going in the "explicit" direction on this question for a long time.  (And, in my opinion, this is the right direction to go, but that's not really relevant here.)

I think this discussion would be more suitable for python-ideas though, since you have a long row to hoe here.  There's already a PEP - - apparently deferred and not rejected, which you may want to revisit.

There are several libraries which can give you cooperative threading already; I assume you're already aware of greenlet and stackless, but I didn't see what advantages your proposed implementation provides over those.  I would guess that one of the first things you should address on python-ideas is why adopting your implementation would be a better idea than just bundling one of those with the standard library :).



From solipsis at  Wed Jan 18 00:42:21 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 00:42:21 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
References: <> <jf508b$k14$>
Message-ID: <>

On Tue, 17 Jan 2012 18:29:11 -0500
Terry Reedy <tjreedy at> wrote:
> To me, as I understand the proposal, the title is wrong. Our current 
> feature releases already are long-term support versions. They get bugfix 
> releases at close to 6 month intervals for 1 1/2 -2 years and security 
> fixes for 3 years. The only change here is that you propose, for 
> instance, a fixed 6-month interval and 2 year period.
> As I read this, you propose to introduce a new short-term (interim, 
> preview) feature release along with each bugfix release. Each would have 
> all the bugfixes plus a preview of the new features expected to be in 
> the next long-term release. (I know, this is not exactly how you spun it.)

Well, "spinning" is important here. We are not proposing any "preview"
releases. These would have the same issue as alphas or betas: nobody
wants to install them where they could disrupt working applications and

What we are proposing are first-class releases that are as robust as
any other (and usable in production). It's really about making feature
releases more frequent, not making previews available during

I agree "long-term" could be misleading as their support duration is
not significantly longer than current feature releases. I chose this
term because it is quite well-known and well-understood, but we could
pick something else ("extended support", "2-year support", etc.).

> There has been discussion on python-ideas about whether new features are 
> or can be considered experimental, or whether there should be an 
> 'experimental' package. An argument against is that long-term production 
> releases should not have experimental features that might go away or 
> have their apis changed.

That's orthogonal to this PEP.
(that said, more frequent feature releases are also a benefit for the
__preview__ proposal, since we could be more reactive changing APIs in
that namespace)

> One problem, at least on Windows, is that short-term releases would 
> almost never have compiled binaries for 3rd-party libraries.

That's a good point, although Py_LIMITED_API will hopefully make things
better in the middle term.



From ezio.melotti at  Wed Jan 18 00:50:52 2012
From: ezio.melotti at (Ezio Melotti)
Date: Wed, 18 Jan 2012 01:50:52 +0200
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>


On 17/01/2012 22.34, Antoine Pitrou wrote:
> [...]
> Proposal
> ========
> Under the proposed scheme, there would be two kinds of feature
> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
> normal feature versions and long-term support (LTS) versions.
> Normal feature versions would get either zero or at most one bugfix
> release; the latter only if needed to fix critical issues.  Security
> fix handling for these branches needs to be decided.

If non-LTS releases won't get bug fixes, a bug that is fixed in 3.3.x 
might not be fixed in 3.4, unless the bug fix releases are 
synchronized with the new feature releases (see below).

> LTS versions would get regular bugfix releases until the next LTS
> version is out.  They then would go into security fixes mode, up to a
> termination date at the release manager's discretion.
> Periodicity
> -----------
> A new feature version would be released every X months.  We
> tentatively propose X = 6 months.
> LTS versions would be one out of N feature versions.  We tentatively
> propose N = 4.

If LTS bug fix releases and feature releases are synchronized, we will 
have something like:

3.3.1 / 3.4
3.3.2 / 3.5
3.3.3 / 3.6
3.7.1 / 3.8

so every new feature release will have all the bug fixes of the current 
LTS release, plus new features.

With this scheme we will soon run out of 1-digit numbers though.
Currently we already have a 3.x release every ~18 months, so if we keep 
doing that (just every 24 months instead of 18) and introduce the 
feature releases in between under a different versioning scheme, we 
might avoid the problem.

This means:
... 18 months, N bug fix releases...
... 18 months, N bug fix releases ...
3.3 LTS
... 24 months, 3 bug fix releases, 3 feature releases ...
3.4 LTS
... 24 months, 3 bug fix releases, 3 feature releases ...
3.5 LTS

In this way we solve the numbering problem and keep a familiar scheme 
(all the 3.x will be LTS and will be released at the same pace as 
before, no need to mark some 3.x as LTS).  OTOH this will make the 
feature releases less "noticeable" and people might just ignore them and 
stick with the LTS releases.  Also we would need to define a versioning 
convention for the feature releases.

> [...]
> Effect on bugfix cycle
> ----------------------
> The effect on fixing bugs should be minimal with the proposed figures.
> The same number of branches would be simultaneously open for regular
> maintenance (two until 2.x is terminated, then one).

Wouldn't it still be two?
Bug fixes will go to the last LTS and on default, features only on default.

> Effect on workflow
> ------------------
> The workflow for new features would be the same: developers would only
> commit them on the ``default`` branch.
> The workflow for bug fixes would be slightly updated: developers would
> commit bug fixes to the current LTS branch (for example ``3.3``) and
> then merge them into ``default``.

So here the difference is that instead of committing on the previous 
release (what currently is 3.2), we commit it to the previous LTS 
release, ignoring the ones between that and default.

> If some critical fixes are needed to a non-LTS version, they can be
> grafted from the current LTS branch to the non-LTS branch, just like
> fixes are ported from 3.x to 2.7 today.
> Effect on the community
> -----------------------
> People who value stability can just synchronize on the LTS releases
> which, with the proposed figures, would give a similar support cycle
> (both in duration and in stability).

That's why I proposed to keep the same versioning scheme for these 
releases, and have a different numbering for the feature releases.

> [...]
> Discussion
> ==========
> These are open issues that should be worked out during discussion:
> * Decide on X (months between feature releases) and N (feature releases
>    per LTS release) as defined above.

This doesn't necessarily have to be fixed, especially if we don't change 
the versioning scheme (so we don't need to know that we have a LTS 
release every N releases).

> * For given values of X and N, is the no-bugfix-releases policy for
>    non-LTS versions feasible?

If LTS bug fix releases and feature releases are synchronized it should 
be feasible.

> * Restrict new syntax and similar changes (i.e. everything that was
>    prohibited by PEP 3003) to LTS versions?

(I was reading this the other way around, maybe rephrase it to "Allow 
new syntax and similar changes only in LTS versions")

> * What is the effect on packagers such as Linux distributions?

* What is the effect on PyPy/Jython/IronPython?  Can they just skip the 
feature releases and focus on the LTS ones?

> * How will release version numbers or other identifying and marketing
>    material make it clear to users which versions are normal feature
>    releases and which are LTS releases?  How do we manage user
>    expectations?

This is not an issue with the scheme I proposed.

> A community poll or survey to collect opinions from the greater Python
> community would be valuable before making a final decision.
> [...]

Best Regards,
Ezio Melotti

From tjreedy at  Wed Jan 18 00:58:55 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 17 Jan 2012 18:58:55 -0500
Subject: [Python-Dev] Backwards incompatible sys.stdout.write() behavior
 in Python 3 (Was: [Python-ideas] Pythonic buffering in Py3 print())
In-Reply-To: <>
References: <>
Message-ID: <jf5203$uql$>

On 1/17/2012 5:59 AM, anatoly techtonik wrote:

> 1. print() buffers output on Python3
> 2. print() also buffers output on Python2, but only on Linux

No, print() does not buffer output. It merely sends it to a file.

> 4. print() is not guilty - it is sys.stdout.write() that buffers output

Oh, you already know that 1&2 are false.

So is 4, if interpreted as saying that sys.stdout.write() *will* buffer 
output. sys.stdout can be *any* file-like object. Its .write method 
*may* buffer output, or it *may not*. With IDLE, it does not. We have 
been over this before. At your instigation, the doc has been changed to 
make this clearer. At your request, a new feature has been added to 
force flushing. By most people's standards, you won.
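
A small illustration of the point, assuming the feature referred to is the flush keyword of print() added in Python 3.3. The UnbufferedSink class is made up for the example; it stands in for any file-like object installed as sys.stdout.

```python
import contextlib

class UnbufferedSink:
    """Minimal file-like object: every write is recorded immediately."""
    def __init__(self):
        self.chunks = []
        self.flush_calls = 0

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        self.flush_calls += 1

sink = UnbufferedSink()
with contextlib.redirect_stdout(sink):
    print("hello")               # print() just hands the text to sink.write()
    print("world", flush=True)   # additionally asks the file object to flush

output = "".join(sink.chunks)
```

Whether anything is buffered is entirely up to the object behind sys.stdout; print() itself keeps no buffer.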

Terry Jan Reedy

From jdhardy at  Wed Jan 18 01:24:34 2012
From: jdhardy at (Jeff Hardy)
Date: Tue, 17 Jan 2012 16:24:34 -0800
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Tue, Jan 17, 2012 at 3:50 PM, Ezio Melotti <ezio.melotti at> wrote:
> * What is the effect on PyPy/Jython/IronPython? Can they just skip the
> feature releases and focus on the LTS ones?

At least for IronPython it's unlikely we'd be able to track the feature
releases. We're still trying to catch up as it is.

Honestly, I don't see the advantages of this. Are there really enough
new features planned that Python needs a full release more than every
18 months?

- Jeff

From martin at  Wed Jan 18 01:30:59 2012
From: martin at (martin at
Date: Wed, 18 Jan 2012 01:30:59 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

Quoting Victor Stinner <victor.stinner at>:

>> Each string would get two hashes: the "public" hash, which is constant
>> across runs and bugfix releases, and the dict-hash, which is only used
>> by the dictionary implementation, and only if all keys to the dict are
>> strings.
> The distinction between secret (private, secure) and "public" hash
> (deterministic) is not clear to me.

It's not about privacy or security. It's about compatibility. The
dict-hash is only used in the dict implementation, and never exposed,
leaving the tp_hash unmodified.

> Example: collections.UserDict implements __hash__() using
> hash(

Are you sure? I only see that used for UserString, not UserDict.

> computes its hash using hash(x) of each item. Same
> question.

The hash of the Set should most certainly use the element's tp_hash.
That *is* the hash of the objects, and it may collide for strings
just fine due to the vulnerability.

> If we need to use the secret hash, it should be exposed in Python.

It's not secret, just specific. I don't mind it being exposed. However,
that would be a new feature, which cannot be added in a security fix
or bug fix release.

> Which function/method would be used? I suppose that we cannot add
> anything to stable releases like 2.7.

Right. Nor do I see any need to expose it. It fixes the vulnerability
just fine without being exposed.


From martin at  Wed Jan 18 01:37:49 2012
From: martin at (martin at
Date: Wed, 18 Jan 2012 01:37:49 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

> If randomized hash cannot be turned on by default, an alternative is
> to switch them off by default, and add an option (command line option,
> environment variable, etc.) to enable it.

That won't really fix the problem. If people install a new release because
it fixes a vulnerability, it had better do so.

> In 2003, Python was not seen as vulnerable. Maybe because the hash
> function is different than Perl hash function, or because nobody tried
> to generate collisions. Today it is clear that Python is vulnerable
> (64 bits version is also affected), and it's really fast to generate
> collisions using the right algorithm.

There is the common vulnerability to the threat of confusing threats
with vulnerabilities [1]. Python was vulnerable all the time, and nobody
claimed otherwise. It's just that nobody saw it as a threat. I still
don't see it as a practical threat, as there are many ways that people
use in practice to protect against this threat already. But I understand
that others feel threatened now.

> Why is it so long to fix the vulnerability in Python, whereas it was
> fixed quickly in Ruby? (they chose to use a randomized hash)

Because the risk of breakage for Python is much higher than it is for Ruby.



From merwok at  Wed Jan 18 01:39:17 2012
From: merwok at (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Wed, 18 Jan 2012 01:39:17 +0100
Subject: [Python-Dev] [Python-checkins] cpython: Refactored logging
 rotating handlers for improved flexibility.
In-Reply-To: <>
References: <>
Message-ID: <>


> changeset:   57295c4d81ac
> user:        Vinay Sajip <vinay_sajip at>
> date:        Wed Jan 04 12:02:26 2012 +0000
> summary:
>   Refactored logging rotating handlers for improved flexibility.

> diff --git a/Doc/howto/logging-cookbook.rst b/Doc/howto/logging-cookbook.rst
> --- a/Doc/howto/logging-cookbook.rst
> +++ b/Doc/howto/logging-cookbook.rst
> [snip]
> +These are not “true” .gz files, as they are bare compressed data, with no
> +“container” such as you’d find in an actual gzip file. This snippet is just
> +for illustration purposes.

I believe using the right characters for quote marks will upset Latex
and thus PDF generation, so the docs use ASCII straight quote marks.

> diff --git a/Doc/library/logging.handlers.rst b/Doc/library/logging.handlers.rst
> --- a/Doc/library/logging.handlers.rst
> +++ b/Doc/library/logging.handlers.rst
> [snip]
> +   .. method:: BaseRotatingHandler.rotation_filename(default_name)
> +
> +      Modify the filename of a log file when rotating.
> +
> +      This is provided so that a custom filename can be provided.
> +
> +      The default implementation calls the 'namer' attribute of the handler,
> +      if it's callable, passing the default name to it. If the attribute isn't
> +      callable (the default is `None`), the name is returned unchanged.

Should be ``None``.
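
For readers unfamiliar with the hook being documented, a short sketch of the behaviour the quoted paragraph describes, using RotatingFileHandler from the standard library (Python 3.3+). The file name and the ".gz" suffix are arbitrary choices for the example.

```python
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# delay=True avoids opening the log file, since only the naming hook is shown.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=3, delay=True
)

default_name = log_path + ".1"

# With the default namer (None), the name is returned unchanged.
unchanged = handler.rotation_filename(default_name)

# With a callable namer, the default name is passed through it.
handler.namer = lambda name: name + ".gz"
renamed = handler.rotation_filename(default_name)

handler.close()
```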


From martin at  Wed Jan 18 01:58:42 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 18 Jan 2012 01:58:42 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 17.01.2012 22:26, Antoine Pitrou wrote:
> On Tue, 17 Jan 2012 21:59:28 +0100
> "Martin v. Löwis" <martin at> wrote:
>> I'd like to propose a different approach to seeding the string hashes:
>> only do so for dictionaries involving only strings, and leave the
>> tp_hash slot of strings unchanged.
> I think Python 3 would be better with a clean fix (all hashes
> randomized).
> Now for Python 2... The problem with this idea is that it only
> addresses str dicts. Unicode dicts, and any other dicts, are left
> vulnerable.

No, you misunderstood. I meant to propose that this applies to both
kinds of string (unicode and byte strings); for 2.x also dictionaries
including a mix of them.

> Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> could cache a "hash perturbation" computed from the string and the
> random bits:
> - hash() would use ob_shash
> - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> This way, you cache almost all computations, adding only a computation
> and a couple logical ops when looking up a string in a dict.

That's a good idea. For Unicode, it might be best to add another slot
into the object, even though this increases the object size.
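
A minimal sketch of the scheme quoted above, written in Python rather than C,
may make the data flow clearer. Only the lookup formula comes from the quoted
mail; the 32-bit truncation, the seeded second pass in make_sstate() and all
function names are assumptions made for illustration.

```python
import os

MULT = 1000003
MASK32 = 0xFFFFFFFF


def public_hash(s):
    """Unseeded multiplicative string hash, truncated to 32 bits.

    Stand-in for the cached ob_shash value that hash() keeps returning.
    """
    if not s:
        return 0
    x = ord(s[0]) << 7
    for ch in s:
        x = ((MULT * x) ^ ord(ch)) & MASK32
    return x ^ len(s)


def make_sstate(s, seed, intern_state=0):
    """Pack a seeded perturbation into the upper 30 bits of ob_sstate.

    The low 2 bits keep the existing state flags. How the perturbation is
    derived from the string and the seed is an assumption of this sketch.
    """
    x = seed & MASK32
    for ch in s:
        x = ((MULT * x) ^ ord(ch)) & MASK32
    return (x & ~3 & MASK32) | (intern_state & 3)


def dict_hash(ob_shash, ob_sstate):
    """Lookup hash from the quoted formula, ignoring the 2 state bits."""
    return ((ob_shash * MULT) ^ (ob_sstate & ~3)) & MASK32


seed = int.from_bytes(os.urandom(4), "little")  # chosen once per process
shash = public_hash("spam")
sstate = make_sstate("spam", seed, intern_state=1)
print("public hash:", shash, "dict hash:", dict_hash(shash, sstate))
```

The public hash stays the same from run to run, while the value used for
dict lookups changes with the per-process seed.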


From stephen at  Wed Jan 18 03:37:08 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 18 Jan 2012 11:37:08 +0900
Subject: [Python-Dev] PEP 407: New release cycle and introducing long-term
 support versions
In-Reply-To: <>
References: <>
Message-ID: <>

Executive summary:

My take is "show us the additional resources, and don't be stingy!"
Sorry, Antoine, I agree with your goals, but I think you are too
optimistic about the positive effects and way too optimistic about the

Antoine Pitrou writes:

 > Finding a release cycle for an open-source project is a delicate
 > exercise in managing mutually contradicting constraints: developer
 > manpower,

This increases the demand for developer manpower somewhat.

 > availability of release management volunteers,

Dramatic increase here.  It may look like RM is not so demanding --
run a few scripts to put out the alphas/betas/releases.  But the RM
needs to stay on top of breaking news, make decisions.  That takes
time, interrupts other work, etc.

 > ease of maintenance for users and third-party packagers,

Dunno about users, but 3rd party packagers will also have more work to
do, or will have to tell their users "we only promise compatibility
with LTS releases."

 > quick availability of new features (and behavioural changes),

These are already *available*, just not *tested*.

Since testing is the bottleneck on what users consider to be
"available for me", you cannot decrease the amount of testing (alpha,
beta releases) by anywhere near the amount you're increasing
frequency, or you're just producing "as is" snapshots.  Percentage of
time in feature freeze goes way up, features get introduced all at
once just before the next release, schedule slippage is inevitable on
some releases.

 > availability of bug fixes without pulling in new features or
 > behavioural changes.

Sounds like a slight further increase in demand for RM, and as
described a dramatic decrease in the bugfixing for throw-away releases.

 > The current release cycle errs on the conservative side.

What evidence do you have for that, besides people who aren't RMs
wishing that somebody else would do more RM work?

 > More feature releases might mean more stress on the development and
 > release management teams.  This is quantitatively alleviated by the
 > smaller number of pre-release versions; and qualitatively by the
 > lesser amount of disruptive changes (meaning less potential for
 > breakage).

Way optimistic IMO (theoretical, admitted, but I do release management
for a less well-organized project, and I teach in a business school,

 > The shorter feature freeze period (after the first beta build until
 > the final release) is easier to accept.

But you need to look at total time in feature freeze over the LTS
cycle, not just before each throw-away release.

 > The rush for adding features just before feature freeze should also
 > be much smaller.

This doesn't depend on the length of time in feature freeze per
release, it depends on the fraction of time in feature freeze over the
cycle.  Given your quality goals, this will go way up.

From tjreedy at  Wed Jan 18 05:32:04 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 17 Jan 2012 23:32:04 -0500
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <> <jf508b$k14$>
Message-ID: <jf5i09$r29$>

On 1/17/2012 6:42 PM, Antoine Pitrou wrote:
> On Tue, 17 Jan 2012 18:29:11 -0500
> Terry Reedy<tjreedy at>  wrote:
>> To me, as I understand the proposal, the title is wrong. Our current
>> feature releases already are long-term support versions. They get bugfix
>> releases at close to 6 month intervals for 1 1/2 -2 years and security
>> fixes for 3 years. The only change here is that you propose, for
>> instance, a fixed 6-month interval and 2 year period.
>> As I read this, you propose to introduce a new short-term (interim,
>> preview) feature release along with each bugfix release. Each would have
>> all the bugfixes plus a preview of the new features expected to be in
>> the next long-term release. (I know, this is not exactly how you spun it.)

The main point of my comment is that the new thing you are introducing 
is not long-term supported versions but short term unsupported versions.

> Well, "spinning" is important here. We are not proposing any "preview"
> releases. These would have the same issue as alphas or betas: nobody

I said nothing about quality. We aim to keep default in near-release 
condition and seem to be getting better. The new unicode is still 
getting polished a bit, it seems, after 3 months, but that is fairly
unusual.

> wants to install them where they could disrupt working applications and
> libraries.
> What we are proposing are first-class releases that are as robust as
> any other (and usable in production).

But I am dubious that releases that are obsolete in 6 months and lack 
3rd party support will see much production use.

> It's really about making feature releases more frequent,
> not making previews available during development.

Given the difficulty of making a complete windows build, it would be 
nice to have one made available every 6 months, regardless of how it is
labeled.

I believe that some people will see and use good-for-6-months releases 
as previews of the new features that will be in the 'real', normal, 
bug-fix supported, long-term releases.

Every release is a snapshot of a continuous process, with some extra 
effort made to tie up some (but not all) of the loose ends.

Terry Jan Reedy

From greg at  Wed Jan 18 06:58:51 2012
From: greg at (Gregory P. Smith)
Date: Tue, 17 Jan 2012 21:58:51 -0800
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 12:59 PM, "Martin v. Löwis" <martin at> wrote:

> I'd like to propose a different approach to seeding the string hashes:
> only do so for dictionaries involving only strings, and leave the
> tp_hash slot of strings unchanged.
> Each string would get two hashes: the "public" hash, which is constant
> across runs and bugfix releases, and the dict-hash, which is only used
> by the dictionary implementation, and only if all keys to the dict are
> strings. In order to allow caching of the hash, all dicts should use
> the same hash (if caching wasn't necessary, each dict could use its own
> seed).
> There are several variants of that approach wrt. caching of the hash
> 1. add an additional field to all string objects, to cache the second
>   hash value.

yuck, our objects are large enough as it is.

>   a) variant: in 3.3, drop the extra field, and declare that hashes
>   may change across runs

+1 Absolutely.  We can and should make 3.3 change hashes across runs
(behavior that can be disabled via a flag or environment variable).

I think the issue of doctests and such breaking even in 2.7 due to hash
order changes is being overblown.  Code like that has already needed to
fix its tests at least once if it wants them to pass on both 32-bit
and 64-bit Python VMs (they have different hashes).  Do we have _any_
measure of how big a deal this will be before going too far here?
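
For readers coming to this thread later: the switch discussed here eventually
shipped as the PYTHONHASHSEED environment variable. A small sketch, assuming
a CPython new enough to honour that variable, shows how a test suite can pin
the seed so that hash() is reproducible across runs. The helper name
hash_in_child is made up for this example.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The same seed gives the same hash in two separate processes.
print(hash_in_child("spam", 0), hash_in_child("spam", 0))
```

Tests that depend on dict ordering or on hash values can be run under a
fixed seed this way, while production processes keep a random one.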

From greg at  Wed Jan 18 07:06:33 2012
From: greg at (Gregory P. Smith)
Date: Tue, 17 Jan 2012 22:06:33 -0800
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 12:52 PM, "Martin v. Löwis" <martin at> wrote:

> > I plan to commit my fix to Python 3.3 if it is accepted. Then write a
> > simplified version to Python 3.2 and backport it to 3.1.
> I'm opposed to any change to the hash values of strings in maintenance
> releases, so I guess I'm opposed to your patch in principle.

Please at least consider his patch for 3.3 onwards then.  Changing the hash
seed per interpreter instance / process is the right thing to do going
forward.

What to do on maintenance releases is a separate discussion.

From martin at  Wed Jan 18 08:15:35 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 18 Jan 2012 08:15:35 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

Am 18.01.2012 07:06, schrieb Gregory P. Smith:
> On Tue, Jan 17, 2012 at 12:52 PM, "Martin v. Löwis" <martin at
> <mailto:martin at>> wrote:
>     > I plan to commit my fix to Python 3.3 if it is accepted. Then write a
>     > simplified version to Python 3.2 and backport it to 3.1.
>     I'm opposed to any change to the hash values of strings in maintenance
>     releases, so I guess I'm opposed to your patch in principle.
> Please at least consider his patch for 3.3 onwards then.  Changing the
> hash seed per interpreter instance / process is the right thing to do
> going forward.

For 3.3 onwards, I'm skeptical whether all this configuration support is
really necessary. I think a much smaller patch which leaves no choice
would be more appropriate.


From martin at  Wed Jan 18 08:19:44 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 18 Jan 2012 08:19:44 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

> +1 Absolutely.  We can and should make 3.3 change hashes across runs
> (behavior that can be disabled via a flag or environment variable).
> I think the issue of doctests and such breaking even in 2.7 due to hash
> order changes is being overblown.  Code like that has already needed to
> fix its tests at least once if it wants them to pass on both
> 32-bit and 64-bit Python VMs (they have different hashes).  Do we have
> _any_ measure of how big a deal this will be before going too far here?

My concern is not about breaking doctests: this proposal will also break
them. My concern is about applications that assume that hash(s) is
stable across runs, and we do have reports that it will break


From g.brandl at  Wed Jan 18 08:46:39 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 18 Jan 2012 08:46:39 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf5i09$r29$>
References: <> <jf508b$k14$>
	<> <jf5i09$r29$>
Message-ID: <jf5t6b$pma$>

Am 18.01.2012 05:32, schrieb Terry Reedy:
> On 1/17/2012 6:42 PM, Antoine Pitrou wrote:
>> On Tue, 17 Jan 2012 18:29:11 -0500
>> Terry Reedy<tjreedy at>  wrote:
>>> To me, as I understand the proposal, the title is wrong. Our current
>>> feature releases already are long-term support versions. They get bugfix
>>> releases at close to 6 month intervals for 1 1/2 -2 years and security
>>> fixes for 3 years. The only change here is that you propose, for
>>> instance, a fixed 6-month interval and 2 year period.
>>> As I read this, you propose to introduce a new short-term (interim,
>>> preview) feature release along with each bugfix release. Each would have
>>> all the bugfixes plus a preview of the new features expected to be in
>>> the next long-term release. (I know, this is not exactly how you spun it.)
> The main point of my comment is that the new thing you are introducing 
> is not long-term supported versions but short term unsupported versions.

That is really a matter of perspective.  For the proposed cycle, there
would be more regular versions than LTS versions, so they are the exception
and get the special name.  (And at the same time, the name is already
established and people probably grasp instantly what it means.)

>> Well, "spinning" is important here. We are not proposing any "preview"
>> releases. These would have the same issue as alphas or betas: nobody
> I said nothing about quality. We aim to keep default in near-release 
> condition and seem to be getting better. The new unicode is still 
> getting polished a bit, it seems, after 3 months, but that is fairly 
> unusual.
>> wants to install them where they could disrupt working applications and
>> libraries.
>> What we are proposing are first-class releases that are as robust as
>> any other (and usable in production).
> But I am dubious that releases that are obsolete in 6 months and lack 
> 3rd party support will see much production use.

Whether people would use the releases is probably something that only
they can tell us -- that's why a community survey is mentioned in the
PEP.

Not sure what you mean by lacking 3rd party support.

>> It's really about making feature releases more frequent,
>> not making previews available during development.
> Given the difficulty of making a complete windows build, it would be 
> nice to have one made available every 6 months, regardless of how it is 
> labeled.
> I believe that some people will see and use good-for-6-months releases 
> as previews of the new features that will be in the 'real', normal, 
> bug-fix supported, long-term releases.

Maybe they will.  That's another thing that is made clear in the PEP:
for one group of people (those preferring stability over long time),
nothing much changes, except that the release period is a little longer,
and there are these "previews" as you call them.


From p.f.moore at  Wed Jan 18 08:44:30 2012
From: p.f.moore at (Paul Moore)
Date: Wed, 18 Jan 2012 07:44:30 +0000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf5i09$r29$>
References: <> <jf508b$k14$>
	<> <jf5i09$r29$>
Message-ID: <>

On 18 January 2012 04:32, Terry Reedy <tjreedy at> wrote:
>> It's really about making feature releases more frequent,
>> not making previews available during development.
> Given the difficulty of making a complete windows build, it would be nice to
> have one made available every 6 months, regardless of how it is labeled.
> I believe that some people will see and use good-for-6-months releases as
> previews of the new features that will be in the 'real', normal, bug-fix
> supported, long-term releases.

I'd love to see 6-monthly releases, including Windows binaries, and
binary builds of all packages that needed a compiler to build. Oh, and
a pony every LTS release :-)

Seriously, this proposal doesn't really acknowledge the amount of work
by other people that would be needed for a 6-month release to be
*usable* in normal cases (by Windows users, at least). It's usually
some months after a release on the current schedule that Windows
binaries have appeared for everything I use regularly.

I could easily imagine 3rd-party developers tending to only focus on
LTS releases, making the release cycle effectively *slower* for me,
rather than faster.


PS Things that might help improve this: (1) PY_LIMITED_API, and (2)
support in packaging for binary releases, including a way to force
installation of a binary release on the "wrong" version (so that
developers don't have to repackage and publish identical binaries
every 6 months).

From g.brandl at  Wed Jan 18 08:55:08 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 18 Jan 2012 08:55:08 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <> <>
Message-ID: <jf5tm3$sil$>

Am 18.01.2012 01:24, schrieb Jeff Hardy:
> On Tue, Jan 17, 2012 at 3:50 PM, Ezio Melotti <ezio.melotti at> wrote:
>> * What is the effect on PyPy/Jython/IronPython?  Can they just skip the
>> feature releases and focus on the LTS ones?
> At least for IronPython it's unlikely we'd be able to track the feature
> releases. We're still trying to catch up as it is.
> Honestly, I don't see the advantages of this. Are there really enough
> new features planned that Python needs a full release more than every
> 18 months?

Yes, we think so.  (What is a non-full release, by the way?)

The main reason is changes in the library.  We have been getting complaints
about the standard library bitrotting for years now, and one of the main
reasons it's so hard to a) get decent code into the stdlib and b) keep it
maintained is that the release cycles are so long.  It's a tough thing for
contributors to accept that the feature you've just implemented will only
be in a stable release in 16 months.

If the stdlib does not get more reactive, it might just as well be cropped
down to a bare core, because 3rd-party libraries do everything as well and
do it before we do.  But you're right that if Python came without batteries,
the current release cycle would be fine.

(Another, more far-reaching proposal, has been to move the stdlib out of
the cpython repo and share a new repo with Jython/IronPython/PyPy.  It could
then also be released separately from the core.  But this is much more work
than the current proposal.)


From p.f.moore at  Wed Jan 18 08:52:20 2012
From: p.f.moore at (Paul Moore)
Date: Wed, 18 Jan 2012 07:52:20 +0000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf5t6b$pma$>
References: <> <jf508b$k14$>
	<> <jf5i09$r29$>
Message-ID: <>

On 18 January 2012 07:46, Georg Brandl <g.brandl at> wrote:
>> But I am dubious that releases that are obsolete in 6 months and lack
>> 3rd party support will see much production use.
> Whether people would use the releases is probably something that only
> they can tell us -- that's why a community survey is mentioned in the
> PEP.

The class of people who we need to consider carefully is those who
want to use the latest release, but are limited by the need for other
parties to release stuff that works with that release (usually, this
means Windows binaries of extensions, or platform vendor packaged
releases of modules/packages). For them, if the other parties focus on
LTS releases (as is certainly possible), the release cycle becomes
slower, going from 18 months to 24.

> Not sure what you mean by lacking 3rd party support.

I take it as referring to the people who release Windows binaries on
PyPI, and vendors who package up PyPI distributions in their own
distribution format. Lacking support in the sense that these people
might well decide that a 6 month cycle is too fast (too much work) and
explicitly decide to focus only on LTS releases.


From g.brandl at  Wed Jan 18 09:00:55 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 18 Jan 2012 09:00:55 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <> <>
Message-ID: <jf5u0u$ujv$>

Am 18.01.2012 00:50, schrieb Ezio Melotti:
> Hi,
> On 17/01/2012 22.34, Antoine Pitrou wrote:
>> [...]
>> Proposal
>> ========
>> Under the proposed scheme, there would be two kinds of feature
>> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
>> normal feature versions and long-term support (LTS) versions.
>> Normal feature versions would get either zero or at most one bugfix
>> release; the latter only if needed to fix critical issues.  Security
>> fix handling for these branches needs to be decided.
> If non-LTS releases won't get bug fixes, a bug that is fixed in 3.3.x 
> might not be fixed in 3.4, unless the bug fix releases are
> synchronized with the new feature releases (see below).

That's already the case today.  3.2.5 might be released before 3.3.1 and
therefore include bugfixes that 3.3.0 doesn't.  True, there will be a
3.3.1 afterwards that does include it, but in the new case, there will be
a new feature release instead.

>> LTS versions would get regular bugfix releases until the next LTS
>> version is out.  They then would go into security fixes mode, up to a
>> termination date at the release manager's discretion.
>> Periodicity
>> -----------
>> A new feature version would be released every X months.  We
>> tentatively propose X = 6 months.
>> LTS versions would be one out of N feature versions.  We tentatively
>> propose N = 4.
> If LTS bug fix releases and feature releases are synchronized, we will
> have something like:
> 3.3
> 3.3.1 / 3.4
> 3.3.2 / 3.5
> 3.3.3 / 3.6
> 3.7
> 3.7.1 / 3.8
> ...
> so every new feature release will have all the bug fixes of the current 
> LTS release, plus new features.
> With this scheme we will soon run out of 1-digit numbers though.
> Currently we already have a 3.x release every ~18 months, so if we keep 
> doing that (just every 24 months instead of 18) and introduce the 
> feature releases in between under a different versioning scheme, we 
> might avoid the problem.
> This means:
> 3.1
> ... 18 months, N bug fix releases...
> 3.2
> ... 18 months, N bug fix releases ...
> 3.3 LTS
> ... 24 months, 3 bug fix releases, 3 feature releases ...
> 3.4 LTS
> ... 24 months, 3 bug fix releases, 3 feature releases ...
> 3.5 LTS
> In this way we solve the numbering problem and keep a familiar scheme 
> (all the 3.x will be LTS and will be released at the same pace as
> before, no need to mark some 3.x as LTS).  OTOH this will make the 
> feature releases less "noticeable" and people might just ignore them and 
> stick with the LTS releases.  Also we would need to define a versioning 
> convention for the feature releases.

Let's see how Guido feels about 3.10 first.
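
The synchronized schedule Ezio lists above, with the tentative N = 4, can be
generated mechanically. The sketch below is only an illustration of the
numbering; the function name and its parameters are invented here, and
nothing in the PEP prescribes this code.

```python
def synchronized_schedule(first_minor=3, count=8, n=4, major=3):
    """List release slots, one per X-month interval.

    Every n-th feature version is an LTS. In between, each slot pairs a
    bugfix release of the current LTS with a new feature version.
    """
    rows = []
    for k in range(count):
        minor = first_minor + k
        feature = f"{major}.{minor}"
        position = k % n  # 0 means this slot is a new LTS
        if position == 0:
            rows.append(feature)
        else:
            lts_minor = minor - position
            rows.append(f"{major}.{lts_minor}.{position} / {feature}")
    return rows


for row in synchronized_schedule():
    print(row)
```

With the defaults, the eighth slot is already a 3.10 feature release, which
is the numbering question raised above.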

>> [...]
>> Effect on bugfix cycle
>> ----------------------
>> The effect on fixing bugs should be minimal with the proposed figures.
>> The same number of branches would be simultaneously open for regular
>> maintenance (two until 2.x is terminated, then one).
> Wouldn't it still be two?
> Bug fixes will go to the last LTS and on default, features only on default.

"Maintenance" excludes the feature development branch here.  Will clarify.

>> Effect on workflow
>> ------------------
>> The workflow for new features would be the same: developers would only
>> commit them on the ``default`` branch.
>> The workflow for bug fixes would be slightly updated: developers would
>> commit bug fixes to the current LTS branch (for example ``3.3``) and
>> then merge them into ``default``.
> So here the difference is that instead of committing on the previous 
> release (what currently is 3.2), we commit it to the previous LTS 
> release, ignoring the ones between that and default.


>> If some critical fixes are needed to a non-LTS version, they can be
>> grafted from the current LTS branch to the non-LTS branch, just like
>> fixes are ported from 3.x to 2.7 today.
>> Effect on the community
>> -----------------------
>> People who value stability can just synchronize on the LTS releases
>> which, with the proposed figures, would give a similar support cycle
>> (both in duration and in stability).
> That's why I proposed to keep the same versioning scheme for these 
> releases, and have a different numbering for the feature releases.
>> [...]
>> Discussion
>> ==========
>> These are open issues that should be worked out during discussion:
>> * Decide on X (months between feature releases) and N (feature releases
>>    per LTS release) as defined above.
> This doesn't necessarily have to be fixed, especially if we don't change 
> the versioning scheme (so we don't need to know that we have a LTS 
> release every N releases).

For these relatively short times (X = 6 months), I feel it is important
to fix the time spans to have predictability for our developers.


From mark at  Wed Jan 18 09:47:57 2012
From: mark at (Mark Shannon)
Date: Wed, 18 Jan 2012 08:47:57 +0000
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Matt Joiner wrote:
> Just to clarify, this differs in functionality from enhanced generators 
> by allowing you to yield from an arbitrary call depth rather than having 
> to "yield from" through a chain of calling generators? Furthermore 
> there's no syntactical change except to the bottommost frame doing a 
> co_yield? Does this capture the major differences?

From mark at  Wed Jan 18 10:23:49 2012
From: mark at (Mark Shannon)
Date: Wed, 18 Jan 2012 09:23:49 +0000
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Glyph wrote:
> On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote:
>> Let's start controversially: I don't like PEP 380, I think it's a kludge.
> Too late; it's already accepted.  There's not much point in making 
> controversial statements about it now.

Why is it too late? Presenting this as a fait accompli does not make it 
any better. The PEP mailing list is closed to most people, so what forum 
for debate is there?

>> I think that CPython should have proper coroutines, rather than add 
>> more bits and pieces to generators in an attempt to make them more 
>> like coroutines.
> By "proper" coroutines, you mean implicit coroutines (cooperative 
> threads) rather than explicit coroutines (cooperative generators). 

Nothing "implicit" about it.

>  Python has been going in the "explicit" direction on this question for 
> a long time.  (And, in my opinion, this is the right direction to go, 
> but that's not really relevant here.)

You can use asymmetric coroutines with a scheduler to provide 
cooperative threads if you want, but coroutines do not have to be used as

The key advantages of my coroutine implementation over PEP 380 are:

1. No syntax change.
2. Code can be used in coroutines without modification.
3. No stack unwinding is required at a yield point.
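
For comparison, here is a minimal sketch of the PEP 380 style that these
points are contrasted with. It does not show the proposed coroutine
implementation, only standard "yield from" delegation: every frame between
the scheduler and the yield point has to be a generator and has to delegate
explicitly. The function names are invented for this example.

```python
def leaf():
    # The only frame that actually suspends.
    reply = yield "need-data"
    return reply * 2


def middle():
    # Must itself be a generator and delegate with "yield from".
    result = yield from leaf()
    return result + 1


def top():
    result = yield from middle()
    return result


gen = top()
request = next(gen)  # runs down to the yield in leaf()
try:
    gen.send(10)  # resumes leaf(); the return values unwind through each frame
except StopIteration as stop:
    final = stop.value

print(request, final)
```

Calling middle() as a plain function from top() would not work here, which
is the "code can be used without modification" point in item 2.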

> I think this discussion would be more suitable for python-ideas though, 
> since you have a long row to hoe here.  There's already a PEP - 
> - apparently deferred and not 
> rejected, which you may want to revisit.
> There are several libraries which can give you cooperative threading 
> already; I assume you're already aware of greenlet and stackless, but I 
> didn't see what advantages your proposed implementation provides over 
> those.  I would guess that one of the first things you should address on 
> python-ideas is why adopting your implementation would be a better idea 
> than just bundling one of those with the standard library :).

Already been discussed:

All of the objections to coroutines (as I propose) also apply to PEP 380.

The advantage of my implementation over greenlets is portability.

I suspect stackless is actually fairly similar to what I have done,
I haven't checked in detail.


From victor.stinner at  Wed Jan 18 10:54:26 2012
From: victor.stinner at (Victor Stinner)
Date: Wed, 18 Jan 2012 10:54:26 +0100
Subject: [Python-Dev] Status of the fix for the hash collision
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/18 "Martin v. Löwis" <martin at>:
> For 3.3 onwards, I'm skeptical whether all this configuration support is
> really necessary. I think a much smaller patch which leaves no choice
> would be more appropriate.

The configuration helps unit testing: see the changes to Lib/test/*.py in
my last patch. I hesitate to say that the configuration is required
for tests. Anyway, users upgrading from Python 3.2 to 3.3 may need to
keep the same hash function and may not care about security (e.g. programs
running locally with trusted data).


From hrvoje.niksic at  Wed Jan 18 11:15:49 2012
From: hrvoje.niksic at (Hrvoje Niksic)
Date: Wed, 18 Jan 2012 11:15:49 +0100
Subject: [Python-Dev] Status of the fix for the hash
	collision	vulnerability
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On 01/17/2012 09:29 PM, "Martin v. Löwis" wrote:
>     I(0) = H & MASK
>     PERTURB(0) = H
>     I(n+1) = (5*I(n) + 1 + PERTURB(n)) & MASK
>     PERTURB(n+1) = PERTURB(n) >> 5
> So if two objects O1 and O2 have the same hash value H, the sequence of
> probed indices is the same for any MASK value. It will be a different
> sequence, yes, but they will still collide on each and every slot.
> This is the very nature of open addressing.

Open addressing can still deploy a collision resolution mechanism 
without this property. For example, double hashing uses a different hash 
function (applied to the key) to calculate PERTURB(0). To defeat it, the 
attacker would have to produce keys that hash the same using both hash
functions.

Double hashing is not a good general solution for Python dicts because 
it complicates the interface of hash tables that support arbitrary keys. 
Still, it could be considered for dicts with known key types (built-ins 
could hardcode the alternative hash function) or for SafeDicts, if they 
are still considered.
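
A short sketch may help to see both points. It follows the recurrence exactly
as quoted above (not necessarily the order of operations in dictobject.c),
and the second hash values are arbitrary numbers chosen for illustration, not
the output of any real hash function.

```python
def probe_sequence(h, mask, perturb=None, steps=6):
    """Slots probed for hash h in a table of size mask + 1.

    With perturb=None the quoted scheme is used (PERTURB(0) = H).
    Passing a separate value models double hashing, where PERTURB(0)
    comes from a second hash of the key.
    """
    if perturb is None:
        perturb = h
    i = h & mask
    slots = [i]
    for _ in range(steps - 1):
        i = (5 * i + 1 + perturb) & mask
        perturb >>= 5
        slots.append(i)
    return slots


H = 0x12345  # primary hash shared by two colliding keys
SECOND_A = 0x0F0F0  # hypothetical second hash of key A
SECOND_B = 0x33333  # hypothetical second hash of key B

for mask in (7, 63, 1023):
    print("mask", mask)
    print("  single hash:", probe_sequence(H, mask))
    print("  double, A:  ", probe_sequence(H, mask, perturb=SECOND_A))
    print("  double, B:  ", probe_sequence(H, mask, perturb=SECOND_B))
```

With a single hash the sequence depends only on H and MASK, so two keys with
equal H collide on every probe. With a second hash they share the first slot
and then diverge.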


From solipsis at  Wed Jan 18 12:00:00 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 12:00:00 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
References: <> <jf508b$k14$>
	<> <jf5i09$r29$>
Message-ID: <>

On Wed, 18 Jan 2012 07:52:20 +0000
Paul Moore <p.f.moore at> wrote:
> On 18 January 2012 07:46, Georg Brandl <g.brandl at> wrote:
> >> But I am dubious that releases that are obsolete in 6 months and lack
> >> 3rd party support will see much production use.
> >
> > Whether people would use the releases is probably something that only
> > they can tell us -- that's why a community survey is mentioned in the
> > PEP.
> The class of people who we need to consider carefully is those who
> want to use the latest release, but are limited by the need for other
> parties to release stuff that works with that release (usually, this
> means Windows binaries of extensions, or platform vendor packaged
> releases of modules/packages).

Well, do consider, though, that anyone not using third-party C
extensions under Windows (either Windows users that are content with
pure Python libs, or users of other platforms) won't have that problem.
That should be quite a lot of people already.

As for vendors, they have their own release management independent of
ours already, so this PEP wouldn't change anything for them.



From solipsis at  Wed Jan 18 12:15:30 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 12:15:30 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 18 Jan 2012 11:37:08 +0900
"Stephen J. Turnbull" <stephen at> wrote:
>  > availability of release management volunteers,
> Dramatic increase here.  It may look like RM is not so demanding --
> run a few scripts to put out the alphas/betas/releases.  But the RM
> needs to stay on top of breaking news, make decisions.  That takes
> time, interrupts other work, etc.

Georg and Barry may answer you here: they are release managers and PEP

>  > quick availability of new features (and behavioural changes),
> These are already *available*, just not *tested*.
> Since testing is the bottleneck on what users consider to be
> "available for me", you cannot decrease the amount of testing (alpha,
> beta releases) by anywhere near the amount you're increasing
> frequency, or you're just producing "as is" snapshots.

The point is to *increase* the amount of testing by making features
available in stable releases on a more frequent basis. Not decrease it.

Alphas and betas never produce much feedback, because people are
reluctant to install them for anything else than toying around. Python
is not emacs or Firefox, you don't use it in a vacuum and therefore
installing non-stable versions is dangerous.



From ncoghlan at  Wed Jan 18 12:26:19 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 18 Jan 2012 21:26:19 +1000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

This won't be a surprise to Antoine or Georg (since I've already
expressed the same opinion privately), but I'm -1 on the idea of
official releases of the whole shebang every 6 months. We're not
Ubuntu, Fedora, Chrome or Firefox with a for-profit company (or large
foundation) with multiple paid employees kicking around to really
drive the QA process. If we had official support from Red Hat or
Canonical promising to devote paid QA and engineering resources to
keeping things on track my opinion might be different, but that is
highly unlikely. I'm also wholly in agreement with Ezio that using the
same versioning scheme for both full releases and interim releases is
thoroughly confusing for users (for example, I consider Red Hat's
completely separate branding and versioning for Fedora and RHEL a
better model for end users than Canonical's more subtle 'Ubuntu' and
'Ubuntu LTS' distinction, and that's been my opinion since long before
I started working for RH).

My original suggestion to Antoine and Georg for 3.4 was that we simply
propose to Larry Hastings (the 3.4 RM) that we spread out the release
cycle, releasing the first alpha after ~6 months, the second after
about ~12, then rolling into the regular release cycle of a final
alpha, some beta releases, one or two release candidates and then the
actual release. However, I'm sympathetic to Antoine's point that early
alphas aren't likely to be at all interesting to folks that would like
a fully supported stdlib update to put into production and no longer
think that suggestion makes much sense on its own.

Instead, if the proposal involves instituting a PEP 3003 style
moratorium (i.e. stdlib changes only) for all interim releases, then
we're essentially talking about splitting the versioning of the core
language (and the CPython C API) and the standard library. If we're
going to discuss that, we may as well go a bit further and just split
development of the two out onto separate branches, with the current
numbering scheme applying to full language version releases and
switching to a date-based versioning scheme for the standard library
(i.e. if 3.3 goes out in August as planned, then it would be "Python
3.3 with the 12.08 stdlib release").
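
The date-based scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names stdlib_version and combined_release_name are hypothetical, and the "YY.MM" layout is inferred from the "12.08" example in the text.

```python
import datetime


def stdlib_version(release_date, micro=None):
    """Build a date-based 'YY.MM' stdlib version (hypothetical helper)."""
    base = '{0:02d}.{1:02d}'.format(release_date.year % 100, release_date.month)
    if micro is None:
        return base
    # An optional micro field for maintenance updates, e.g. '12.08.1'.
    return '{0}.{1}'.format(base, micro)


def combined_release_name(python_version, release_date):
    """Name a combined release the way the proposal words it."""
    return 'Python {0} with the {1} stdlib release'.format(
        python_version, stdlib_version(release_date))


print(combined_release_name('3.3', datetime.date(2012, 8, 1)))
```

Because the version is derived from the target date, a slipped release date would simply yield a later version string.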

What might such a change mean?

1. For 3.3, the following releases would be made:
    - 3.2.x is cut from the 3.2 branch (1 rc + 1 release)
    - 3.3.0 + PyStdlib 12.08 is created from the default branch (1
alpha, 2 betas, 1+ rc, 1 release)
    - the 3.3 maintenance branch is created
    - the stdlib development branch is created

2. Once 3.2 goes into security-fix only mode, this would then leave us
with 4 active branches:
    - 2.7 (maintenance)
    - 3.3 (maintenance)
    - stdlib (Python 3.3 compatible, PEP 3003 compliant updates)
    - default (3.4 development)

The 2.7 branch would remain a separate head of development, but for
3.x development the update flow would become:
    Bug fixes: 3.3->stdlib->default
    Stdlib features: stdlib->default
    Language changes: default

3. Somewhere around February 2013, we prepare to release Python 3.4a1
and 3.3.1, along with PyStdlib 13.02:
    - 3.3.1 + PyStdlib 12.08 is cut from the 3.3 branch (1 rc + 1 release)
    - 3.3.1 + PyStdlib 13.02 comes from the stdlib branch (1 alpha, 1
beta, 1+ rc, 1 release)
    - 3.4.0a1 comes from the default branch (may include additional
stdlib changes)

4. Around August 2013 this process repeats:
    - 3.3.2 + PyStdlib 12.08 is cut from the 3.3 branch
    - 3.3.2 + PyStdlib 13.08 comes from the stdlib branch (final 3.3
compatible stdlib release)
    - 3.4.0a2 comes from the default branch

5. And then in February 2014, we gear up for a new major release:
    - 3.3.3 is cut from the 3.3 branch and the 3.3 branch enters
security fix only mode
    - 3.4.0 + PyStdlib 14.02 is created from the default branch (1
alpha, 2 betas, 1+ rc, 1 release)
    - the 3.4 maintenance branch is created and merged into the stdlib branch

(alternatively, Feb 2014 could be another interim release of 3.4 alpha
and a 3.3 compatible stdlib update, with 3.4 delayed until August

I believe this approach would get to the core of what the PEP authors
want (i.e. more frequent releases of the standard library), while
being quite explicit in *avoiding* the concerns associated with more
frequent releases of the core language itself. The rate of updates on
the language spec, the C API (and ABI), the bytecode format and the
AST would remain largely unchanged at 18-24 months. Other key
protocols (e.g. default pickle formats) could also be declared
ineligible for changes in interim releases.

If a critical security problem is found, then additional releases may
be cut for the maintenance branch and for the stdlib branch.

There's a slight annoyance in having all development filtered through
an additional branch, but there's a large advantage in that having a
stable core in the stdlib branch makes it more likely we'll be able to
use it as a venue for collaboration with the PyPy, Jython and
IronPython folks (they all have push rights and a separate branch
means they can use it without having to worry about any of the core
changes going on in the default branch). A separate branch with
combined "3.x.y + PyStdlib YY.MM" releases is also significantly less
work than trying to split the stdlib out completely into a separate


From glyph at  Wed Jan 18 12:27:39 2012
From: glyph at (Glyph)
Date: Wed, 18 Jan 2012 06:27:39 -0500
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 18, 2012, at 4:23 AM, Mark Shannon wrote:

> Glyph wrote:
>> On Jan 17, 2012, at 5:03 PM, Mark Shannon wrote:
>>> Let's start controversially: I don't like PEP 380, I think it's a kludge.
>> Too late; it's already accepted.  There's not much point in making controversial statements about it now.
> Why is it too late?

Because discussion happens before the PEP is accepted.  See the description of the workflow in <>.  The time to object to PEP 380 was when those threads were going on.

> Presenting this as a fait accompli does not make it any better.

But it is[1] a fait accompli, whether you like it or not; I'm first and foremost informing you of the truth, not trying to make you feel better (or worse).  Secondly, I am trying to forestall a long and ultimately pointless conversation :).

> The PEP mailing list is closed to most people,

The PEP mailing list is just where you submit your PEPs, and where the PEP editors do their work.  I'm not on it, but to my understanding of the process, there's not really any debate there.

> so what forum for debate is there?

python-ideas, and then this mailing list, in that order.  Regarding PEP 380 specifically, there's been quite a bit.  See for example <>.  Keep in mind that the purpose of debate in this context is to inform Guido's opinion.  There's no voting involved, although he will occasionally delegate decisions about particular PEPs to people knowledgeable in a relevant area.

>> I think this discussion would be more suitable for python-ideas though [...]
> Already been discussed:

If you're following the PEP process, then the next step would be for you (having built some support) to author a new PEP, or to resurrect the deferred Stackless PEP with some new rationale - personally I'd recommend the latter.

My brief skimming of the linked thread doesn't indicate you have a lot of strong support though, just some people who would be somewhat interested.  So I still think it bears more discussion there, especially on the motivation / justification side of things.

> All of the objections to coroutines (as I propose) also apply to PEP 380.

You might want to see the video of Guido's "Fireside Chat" last year <>.  Skip to a little before 15:00.  He mentions the point that coroutines that can implicitly switch out from under you have the same non-deterministic property as threads: you don't know where you're going to need a lock or lock-like construct to update any variables, so you need to think about concurrency more deeply than if you could explicitly always see a 'yield'.  I have more than one "painful event in my past" (as he refers to it) indicating that microthreads have the same problem as real threads :).

(And yes, they're microthreads, even if you don't have an elaborate scheduling construct.  If you can switch to another stack by making a function call, then you are effectively context switching, and it can become arbitrarily complex.  Any coroutine in a system may introduce an arbitrarily complex microthread scheduler just by calling a function that yields to it.)
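
The point about explicit switch points can be illustrated with plain generators. This is a minimal sketch, not anything taken from PEP 380 or the thread: the worker tasks and the round-robin run scheduler are hypothetical names chosen for the example.

```python
from collections import deque

counter = {'value': 0}


def worker(n):
    """Increment the shared counter n times, yielding after each update."""
    for _ in range(n):
        current = counter['value']       # read
        counter['value'] = current + 1   # write: nothing can run in between
        yield                            # the only place another task may run


def run(tasks):
    """Tiny round-robin scheduler: resume each task until all are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        queue.append(task)


run([worker(1000), worker(1000)])
print(counter['value'])
```

Since a task can only be suspended at a visible yield, the read-modify-write above needs no lock. If any function call could switch tasks implicitly, the same two lines would need the kind of protection that threads require.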


([1]: Well, actually it isn't; note the dashed line from "Accepted" to "Rejected" in the workflow diagram.  But you have to have a really darn good reason, and championing the rejection of a PEP that Guido has explicitly accepted and has liked from pretty much the beginning is going to be very, very hard.)


From barry at  Wed Jan 18 13:30:10 2012
From: barry at (Barry Warsaw)
Date: Wed, 18 Jan 2012 07:30:10 -0500
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 18, 2012, at 08:19 AM, Martin v. Löwis wrote:

>My concern is not about breaking doctests: this proposal will also break
>them. My concern is about applications that assume that hash(s) is
>stable across runs, and we do have reports that it will break

I am a proponent of doctests, and thus use them heavily.  I can tell you that
the issue of dict hashing (non-)order has been well known for *years* and I
have convenience functions in my own doctests to sort and print dict
elements.  Back in my Launchpad days (which has oodles of doctests), many
years ago we went on a tear to fix dict printing when some change in Python
caused them to break.  So I'm not personally worried that such a change would
break any of my own code.
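
A convenience function of the kind mentioned above might look like the following. This is only a guess at the general shape, not the actual Launchpad helper; the names format_sorted and dump are hypothetical.

```python
def format_sorted(mapping):
    """Render a dict's items one per line, ordered by key."""
    return '\n'.join(
        '{0!r}: {1!r}'.format(key, mapping[key]) for key in sorted(mapping))


def dump(mapping):
    """Print a dict in a stable order so doctest output never depends on hashing."""
    print(format_sorted(mapping))


dump({'b': 2, 'a': 1})
```

A doctest that calls dump(d) instead of printing d directly keeps passing whatever order the dict happens to iterate in.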

Even though I hope anybody who uses doctests has their own workarounds for
this, I still support being conservative in default behavior for stable
releases, because it's the right thing to do for our users.


From solipsis at  Wed Jan 18 13:30:13 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 13:30:13 +0100
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <1326889813.3395.37.camel@localhost.localdomain>

On Wednesday 18 January 2012 at 21:26 +1000, Nick Coghlan wrote:
> I'm also wholly in agreement with Ezio that using the
> same versioning scheme for both full releases and interim releases is
> thoroughly confusing for users 

It's a straightforward way to track the feature support of a release.
How do you suggest all these "sys.version_info >= (3, 2)" - and the
corresponding documentation snippets a.k.a. "versionadded" or
"versionchanged" tags - be spelt otherwise?

> for example, I consider Red Hat's
> completely separate branding and versioning for Fedora and RHEL a
> better model for end users

It's not only branding and versioning, is it? They're completely
different projects with different goals (and different commercial

If you're suggesting we do only short-term releases and leave the
responsibility of long-term support to another project or entity, I'm
not against it, but it's far more radical than what we are proposing in
the PEP :-)

> Instead, if the proposal involves instituting a PEP 3003 style
> moratorium (i.e. stdlib changes only) for all interim releases, then
> we're essentially talking about splitting the versioning of the core
> language (and the CPython C API) and the standard library. If we're
> going to discuss that, we may as well go a bit further and just split
> development of the two out onto separate branches, with the current
> numbering scheme applying to full language version releases and
> switching to a date-based versioning scheme for the standard library
> (i.e. if 3.3 goes out in August as planned, then it would be "Python
> 3.3 with the 12.08 stdlib release").

Well, you're opposing the PEP on the basis that it's workforce-intensive
but you're proposing something much more workforce-intensive :-)

Splitting the stdlib:
- requires someone to do the splitting (highly non-trivial given the
interactions of some modules with interpreter details or low-level C
code)
- requires setting up separate resources (continuous integration with N
stdlib versions and M interpreter versions, for example)
- requires separate maintenance and releases for the stdlib (but with
non-trivial interaction with interpreter maintenance, since they will
affect each other and must be synchronized for Python to be usable at
all)
- requires more attention by users since there are now *two* release
schedules and independent version numbers to track

The former two are one-time costs, but the latter two are recurring

Therefore, splitting the stdlib is much more complicated and involved
than many people think; it's not just "move a few directories around and
be done".
And it's not even obvious it would have an actual benefit, since
developers of other implementations are busy doing just that (see Jeff
Hardy's message in this thread).



From stephen at  Wed Jan 18 13:48:58 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 18 Jan 2012 21:48:58 +0900
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > > Since testing is the bottleneck on what users consider to be
 > > "available for me", you cannot decrease the amount of testing (alpha,
 > > beta releases) by anywhere near the amount you're increasing
 > > frequency, or you're just producing "as is" snapshots.
 > The point is to *increase* the amount of testing by making features
 > available in stable releases on a more frequent basis. Not decrease
 > it.

We're talking about different kinds of testing.  You're talking about
(what old-school commercial software houses meant by "beta") testing
in a production or production prototype environment.  I'd love to see
more of that, too!  My claim is that I don't expect much uptake if you
don't do close to as many of what are called "alpha" and "beta" tests
on python-dev as are currently done.

 > Alphas and betas never produce much feedback, because people are
 > reluctant to install them for anything else than toying around. Python
 > is not emacs or Firefox, you don't use it in a vacuum
 > and therefore installing non-stable versions is dangerous.

Exactly my point, except that the PEP authors seem to think that we
can cut back on the number of alpha and beta prereleases and still
achieve the stability that such users expect from a Python release.  I
don't think that's right.  I expect that unless quite substantial
resources (far more than "proportional to 1/frequency") are devoted to
each non-LTS release, a large fraction of such users will avoid non-LTS
releases the way they avoid betas now.

From solipsis at  Wed Jan 18 14:02:07 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 14:02:07 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <1326891727.3395.44.camel@localhost.localdomain>

On Wednesday 18 January 2012 at 21:48 +0900, Stephen J. Turnbull wrote:
> My claim is that I don't expect much uptake if you
> don't do close to as many of what are called "alpha" and "beta" tests
> on python-dev as are currently done.

You claim people won't use stable releases because of not enough alphas?
That sounds completely unrelated. I don't know of any users who would
bother about that.
(you can produce flimsy software with many alphas, too)

>  > Alphas and betas never produce much feedback, because people are
>  > reluctant to install them for anything else than toying around. Python
>  > is not emacs or Firefox, you don't use it in a vacuum
>  > and therefore installing non-stable versions is dangerous.
> Exactly my point, except that the PEP authors seem to think that we
> can cut back on the number of alpha and beta prereleases and still
> achieve the stability that such users expect from a Python release.  I
> don't think that's right.

Sure, and we think it is :)



From ncoghlan at  Wed Jan 18 15:08:49 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 00:08:49 +1000
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <1326889813.3395.37.camel@localhost.localdomain>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 10:30 PM, Antoine Pitrou <solipsis at> wrote:
> Splitting the stdlib:
> - requires someone to do the splitting (highly non-trivial given the
> interactions of some modules with interpreter details or low-level C
> code)
> - requires setting up separate resources (continuous integration with N
> stdlib versions and M interpreter versions, for example)
> - requires separate maintenance and releases for the stdlib (but with
> non-trivial interaction with interpreter maintenance, since they will
> affect each other and must be synchronized for Python to be usable at
> all)
> - requires more attention by users since there are now *two* release
> schedules and independent version numbers to track

Did you read what I actually proposed? I specifically *didn't* propose
separate stdlib releases (for all the reasons you point out), only
separate date based stdlib *versioning*. Distribution of the CPython
interpreter + stdlib would remain monolithic, as it is today. Any
given stdlib release would only be supported for the most recent
language release. The only difference is that between language
releases, where we currently only release maintenance builds, we'd
*also* release a second version of each maintenance build with an
updated standard library, along with an alpha release of the next
language version (with the last part being entirely optional, but I
figured I may as well make the suggestion since I like the idea to
encourage getting syntax updates and the like out for earlier

When you initially pitched the proposal via email, you didn't include
the "language moratorium applies to interim releases" idea. That one
additional suggestion makes the whole concept *much* more appealing to
me, but I only like it on the condition that we decouple the stdlib
versioning from the language definition versioning (even though I
recommend we only officially support very specific combinations of the
two). My suggestion is really just a concrete proposal for
implementing Ezio's idea of only bumping the Python version for
releases with long term support, and using some other mechanism to
distinguish the interim releases.

So, assuming a 2 year LTS cycle, the released versions up to February
2015 with my suggestion would end up being:

From the default branch:
Python 3.3.0 + stdlib 12.08.0  (~August 2012)
Python 3.4.0a1 + stdlib 14.08.0a1  (~February 2013)
Python 3.4.0a2 + stdlib 14.08.0a2 (~August 2013)
Python 3.4.0a3 + stdlib 14.08.0a3  (~February 2014)
Python 3.4.0a4 + stdlib 14.08.0a4  (~2014)
Python 3.4.0b1 + stdlib 14.08.0b1  (~2014)
Python 3.4.0b2 + stdlib 14.08.0b2  (~2014)
Python 3.4.0c1 + stdlib 14.08.0c1  (~2014)
Python 3.4.0 + stdlib 14.08  (~August 2014)
Python 3.5.0a1 + stdlib 16.08.0a1  (~February 2015)

From the 3.3 maintenance branch (these are maintenance updates to the
"LTS" release):
Python 3.3.1 + stdlib 12.08.1  (~February 2013)
Python 3.3.2 + stdlib 12.08.2  (~August 2013)
Python 3.3.3 + stdlib 12.08.3  (~February 2014)
Python 3.3.4 + stdlib 12.08.4  (~August 2014) (and 3.3 branch enters
security patch only mode)

From the 3.4 maintenance branch (these are maintenance updates to the
"LTS" release):
Python 3.4.1 + stdlib 14.08.1  (~February 2015)

From the stdlib feature development branch (these are the new interim
releases with standard library updates only as proposed by PEP 407):
Python 3.3.1 + stdlib 13.02.0  (~February 2013)
Python 3.3.2 + stdlib 13.08.0  (~August 2013)
Python 3.3.3 + stdlib 14.02.0  (~February 2014) (only upgrade path
from here is to make the jump to 3.4.0)
-- 3.4.0 + 14.08.0 is released from default branch --
Python 3.4.1 + stdlib 15.02.0  (~February 2015)

If we have to make "brown paper bag" releases for the maintenance or
stdlib branches then the micro versions get bumped - the date based
version of the standard library versions relates to when that
particular *API* was realised, not when bugs were last fixed in it. If
a target release date slips, then the stdlib version would be
increased accordingly (cf. Ubuntu 6.06).

Yes, we'd have an extra set of active buildbots to handle the stdlib
branch, but a) that's no harder than creating the buildbots for a new
maintenance branch and b) the interim release proposal will need to
separate language level changes from stdlib level changes *anyway*.

As far as how sys.version checks would be updated, I would propose a
simple API addition to track the new date-based standard lib
versioning: sys.stdlib_version. People could choose to just depend on
a specific Python version (implicitly depending on the stdlib version
that was originally shipped with that version of CPython), or they may
instead decide to depend on a specific stdlib version (implicitly
depending on the first Python version that was shipped with that

The reason I like this scheme is that it allows us (and users) to
precisely track the things that can vary at the two different rates.
At least the following would still be governed by changes in the first
two fields of sys.version (i.e. the major Python version):
    - deprecation policy
    - language syntax
    - compiler AST
    - C ABI stability
    - Windows compilation suite and C runtime version
    - anything else we decide to link with the Python language version
(e.g. default pickle protocol)

However, the addition of date based stdlib versioning would allow us
to clearly identify the new interim releases proposed by PEP 407
*without* mucking up all those things that are currently linked to
sys.version and really *shouldn't* be getting updated every 6 months.
Users get a clear guarantee that if they follow the stdlib updates
instead of the regular maintenance releases, they'll get nice new
features along with their bug fixes, but no new deprecations or
backwards incompatible API changes. However, they're also going to be
obliged to transition to each new language release as it comes out if
they want to continue getting security updates.

Basically, what it boils down to is that I'm now +1 on the general
proposal in the PEP, *so long as*:
1. We get a separate Hg branch for "stdlib only" changes and default
becomes the destination specifically for "language update" changes
(with the latter being a superset of the former)
2. The proposed "interim releases" are denoted by a new date-based
sys.stdlib_version field and sys.version retains its current meaning
(and slow rate of change)
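
A version check against the proposed field could be sketched as follows. Note that sys.stdlib_version does not exist in any released Python; the string format, the fallback default and the helper names below are all assumptions made for illustration, and the parser handles final releases only (not tags such as "14.08.0a1").

```python
import sys


def parse_stdlib_version(text):
    """Turn 'YY.MM' or 'YY.MM.micro' into a comparable 3-tuple of ints."""
    parts = [int(part) for part in text.split('.')]
    while len(parts) < 3:
        parts.append(0)  # treat '12.08' the same as '12.08.0'
    return tuple(parts)


def has_stdlib(minimum, default='12.08'):
    """Check the (hypothetical) sys.stdlib_version against a minimum."""
    # getattr keeps the sketch runnable on interpreters without the attribute.
    current = getattr(sys, 'stdlib_version', default)
    return parse_stdlib_version(current) >= parse_stdlib_version(minimum)


# Existing language-level checks would stay exactly as they are today.
if sys.version_info >= (3, 2) and has_stdlib('12.08'):
    print('language and stdlib requirements met')
```

Keeping the two checks separate mirrors the proposal: sys.version_info tracks the slowly changing language, while the date-based field tracks stdlib additions.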


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Wed Jan 18 16:06:07 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 01:06:07 +1000
Subject: [Python-Dev] [Python-checkins] Daily reference leaks
	(12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 2:31 PM,  <solipsis at> wrote:
> results for 12de1ad1cee8 on branch "default"
> --------------------------------------------
> test_capi leaked [2008, 2008, 2008] references, sum=6024

Yikes, you weren't kidding about that new subinterpreter code
execution test upsetting the refleak detection...


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Wed Jan 18 16:25:10 2012
From: stephen at (Stephen J. Turnbull)
Date: Thu, 19 Jan 2012 00:25:10 +0900
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <1326891727.3395.44.camel@localhost.localdomain>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > You claim people won't use stable releases because of not enough
 > alphas?  That sounds completely unrelated.

Surely testing is related to user perceptions of stability.  More
testing helps reduce bugs in released software, which improves user
perception of stability, encouraging them to use the software in
production.  Less testing, then, will have the opposite effect.  But
you understand that theory, I'm sure.  So what do you mean to say?

 > (you can produce flimsy software with many alphas, too)

The problem is the converse: can you produce Python-release-quality
software with much less pre-release testing than current feature
releases get?

 > Sure, and we think it is [possible to do that] :)

Given the relative risk of rejecting PEP 407 and me being wrong (the
status quo really isn't all that bad AFAICS), vs. accepting PEP 407
and you being wrong, I don't find a smiley very convincing.  In fact,
I don't find the PEP itself convincing -- and I'm not the only one.

We'll see what Barry and Georg have to say.

From solipsis at  Wed Jan 18 16:51:59 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 16:51:59 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <1326901919.3395.67.camel@localhost.localdomain>

On Thursday 19 January 2012 at 00:25 +0900, Stephen J. Turnbull wrote:
>  > You claim people won't use stable releases because of not enough
>  > alphas?  That sounds completely unrelated.
> Surely testing is related to user perceptions of stability.  More
> testing helps reduce bugs in released software, which improves user
> perception of stability, encouraging them to use the software in
> production.

I asked a practical question; a theoretical answer isn't exactly
what I was waiting for.

>  > Sure, and we think it is [possible to do that] :)
> Given the relative risk of rejecting PEP 407 and me being wrong (the
> status quo really isn't all that bad AFAICS), vs. accepting PEP 407
> and you being wrong, I don't find a smiley very convincing.

I don't care to convince *you*, since you are not involved in Python
development and release management (you haven't ever been a contributor
AFAIK). Unless you produce practical arguments, saying "I don't think
you can do it" is plain FUD and certainly not worth answering.



From senthil at  Wed Jan 18 16:54:49 2012
From: senthil at (Senthil Kumaran)
Date: Wed, 18 Jan 2012 23:54:49 +0800
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <20120118155449.GE1958@mathmagic>

On Wed, Jan 18, 2012 at 09:26:19PM +1000, Nick Coghlan wrote:
> My original suggestion to Antoine and Georg for 3.4 was that we simply
> propose to Larry Hastings (the 3.4 RM) that we spread out the release
> cycle, releasing the first alpha after ~6 months, the second after
> about ~12, then rolling into the regular release cycle of a final
> alpha, some beta releases, one or two release candidates and then the
> actual release. However, I'm sympathetic to Antoine's point that early
> alphas aren't likely to be at all interesting to folks that would like
> a fully supported stdlib update to put into production and no longer
> think that suggestion makes much sense on its own.

This looks like a 'good bridge' suggestion between rapid releases
and stable releases.

What would be the purpose of an alpha release? Would we encourage people
to use it or test it? With the rapid release cycle, the encouragement is
to use rather than test.


From solipsis at  Wed Jan 18 16:56:04 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 16:56:04 +0100
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
References: <>
Message-ID: <>

On Thu, 19 Jan 2012 01:06:07 +1000
Nick Coghlan <ncoghlan at> wrote:
> On Wed, Jan 18, 2012 at 2:31 PM,  <solipsis at> wrote:
> > results for 12de1ad1cee8 on branch "default"
> > --------------------------------------------
> >
> > test_capi leaked [2008, 2008, 2008] references, sum=6024
> Yikes, you weren't kidding about that new subinterpreter code
> execution test upsetting the refleak detection...

Well, these are real leaks, but I expect them to be quite difficult to
track (I've found a couple of them), because they can be scattered
around in C module initialization routines and the like. I suggest we
skip this test on refleak runs.



From pje at  Wed Jan 18 17:01:10 2012
From: pje at (PJ Eby)
Date: Wed, 18 Jan 2012 11:01:10 -0500
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <martin at> wrote:

> On 17.01.2012 22:26, Antoine Pitrou wrote:
> > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> > could cache a "hash perturbation" computed from the string and the
> > random bits:
> >
> > - hash() would use ob_shash
> > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> >
> > This way, you cache almost all computations, adding only a computation
> > and a couple logical ops when looking up a string in a dict.
> That's a good idea. For Unicode, it might be best to add another slot
> into the object, even though this increases the object size.

Wouldn't that break the ABI in 2.x?
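
The quoted bit-packing idea can be modelled in Python to make the arithmetic concrete. This is a sketch under assumptions: a 32-bit ob_sstate whose low two bits hold the interning state (as the quoted text says), and a perturbation value supplied by the caller, since the thread does not specify how it would be derived. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF   # assume a 32-bit ob_sstate field
STATE_BITS = 0x3      # the two low bits already used for interning state


def pack_sstate(interned_state, perturbation):
    """Store a perturbation in the 30 spare bits, keeping the 2 state bits."""
    return ((perturbation & ~STATE_BITS) | (interned_state & STATE_BITS)) & MASK32


def dict_lookup_hash(ob_shash, ob_sstate):
    """Hash used for dict lookup, per the formula quoted above."""
    return ((ob_shash * 1000003) ^ (ob_sstate & ~STATE_BITS)) & MASK32


sstate = pack_sstate(interned_state=1, perturbation=0x12345678)
print(hex(dict_lookup_hash(5, sstate)))
```

The masking means the interning state never influences the lookup hash, while hash() itself would keep returning the unperturbed ob_shash.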

From brett at  Wed Jan 18 17:14:50 2012
From: brett at (Brett Cannon)
Date: Wed, 18 Jan 2012 11:14:50 -0500
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou <solipsis at> wrote:

> On Thu, 19 Jan 2012 01:06:07 +1000
> Nick Coghlan <ncoghlan at> wrote:
> > On Wed, Jan 18, 2012 at 2:31 PM,  <solipsis at> wrote:
> > > results for 12de1ad1cee8 on branch "default"
> > > --------------------------------------------
> > >
> > > test_capi leaked [2008, 2008, 2008] references, sum=6024
> >
> > Yikes, you weren't kidding about that new subinterpreter code
> > execution test upsetting the refleak detection...
> Well, these are real leaks, but I expect them to be quite difficult to
> track (I've found a couple of them), because they can be scattered
> around in C module initialization routines and the like. I suggest we
> skip this test on refleak runs.

Do we have any general strategy to help make it more fine-grained to detect
where the leak might be coming from? We could then maybe try to get some
people to pound on this at the PyCon sprints. Otherwise I'm reluctant to skip
it since they are legitimate leaks that should get fixed.

From solipsis at  Wed Jan 18 17:27:56 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 17:27:56 +0100
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 18 Jan 2012 11:14:50 -0500
Brett Cannon <brett at> wrote:

> On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou <solipsis at> wrote:
> > On Thu, 19 Jan 2012 01:06:07 +1000
> > Nick Coghlan <ncoghlan at> wrote:
> > > On Wed, Jan 18, 2012 at 2:31 PM,  <solipsis at> wrote:
> > > > results for 12de1ad1cee8 on branch "default"
> > > > --------------------------------------------
> > > >
> > > > test_capi leaked [2008, 2008, 2008] references, sum=6024
> > >
> > > Yikes, you weren't kidding about that new subinterpreter code
> > > execution test upsetting the refleak detection...
> >
> > Well, these are real leaks, but I expect them to be quite difficult to
> > track (I've found a couple of them), because they can be scattered
> > around in C module initialization routines and the like. I suggest we
> > skip this test on refleak runs.
> >
> Do we have any general strategy to help make it more fine-grained to detect
> where the leak might be coming from?

Unfortunately not. I've tried to track down the remaining leaks (*) by
using gc.get_objects(), but apart from a couple of false positives
(dead weakrefs lingering in some tp_subclasses slots until the next
subclasses take their place ;-)), most refleaks seem to be either on
long-lived objects (meaning the leaks are not severe) or on
non-gc-tracked objects.

(*)
$ ./python -m test -R 3:2 test_capi
[1/1] test_capi
beginning 5 repetitions
12345
.....
test_capi leaked [152, 152] references, sum=304
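
The gc.get_objects() approach described above can be sketched roughly as
follows. This is only an illustration, not the code actually used: the
helper names `snapshot` and `leaked_types` are hypothetical, and, as noted
above, objects that are not tracked by the cyclic GC (strings, ints, ...)
stay invisible to it.

```python
import gc
from collections import Counter


def snapshot():
    """Count the objects currently tracked by the cyclic GC, keyed by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def leaked_types(func, runs=3):
    """Return {type name: net new gc-tracked objects} after calling func `runs` times.

    Hypothetical helper: it only sees gc-tracked objects, so leaked
    references to untracked objects will not show up here.
    """
    func()                # warm-up call, so one-time caches are not counted
    before = snapshot()
    before = snapshot()   # taken twice so the snapshot object itself is in the baseline
    for _ in range(runs):
        func()
    after = snapshot()
    return dict(after - before)   # Counter subtraction keeps only positive deltas
```

A type whose count grows by the same amount on every run is a reasonable
first suspect; a type that never appears may still be leaking references if
its instances are not gc-tracked.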

> We could then maybe try to get some
> people pound on this at the PyCon sprints. Otherwise I'm reluctant to skip
> it since they are legitimate leaks that should be get fixed.

Well it's the old well-known issue with pseudo-"permanent" references
not being appropriately managed/cleaned up. Which only shows when
calling Py_Initialize/Py_Finalize multiple times, or using
sub-interpreters.


From brett at  Wed Jan 18 17:39:42 2012
From: brett at (Brett Cannon)
Date: Wed, 18 Jan 2012 11:39:42 -0500
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 11:27, Antoine Pitrou <solipsis at> wrote:

> On Wed, 18 Jan 2012 11:14:50 -0500
> Brett Cannon <brett at> wrote:
> > On Wed, Jan 18, 2012 at 10:56, Antoine Pitrou <solipsis at>
> wrote:
> >
> > > On Thu, 19 Jan 2012 01:06:07 +1000
> > > Nick Coghlan <ncoghlan at> wrote:
> > > > On Wed, Jan 18, 2012 at 2:31 PM,  <solipsis at> wrote:
> > > > > results for 12de1ad1cee8 on branch "default"
> > > > > --------------------------------------------
> > > > >
> > > > > test_capi leaked [2008, 2008, 2008] references, sum=6024
> > > >
> > > > Yikes, you weren't kidding about that new subinterpreter code
> > > > execution test upsetting the refleak detection...
> > >
> > > Well, these are real leaks, but I expect them to be quite difficult to
> > > track (I've found a couple of them), because they can be scattered
> > > around in C module initialization routines and the like. I suggest we
> > > skip this test on refleak runs.
> > >
> >
> > Do we have any general strategy to help make it more fine-grained to
> detect
> > where the leak might be coming from?
> Unfortunately not. I've tried to track down the remaining leaks (*) by
> using gc.get_objects(), but apart from a couple of false positives
> (dead weakrefs lingering in some tp_subclasses slots until the next
> subclasses take their place ;-)), most refleaks seem to be either on
> long-lived objects (meaning the leaks are not severe) or on
> non-gc-tracked objects.
> (*)
> $ ./python -m test -R 3:2 test_capi
> [1/1] test_capi
> beginning 5 repetitions
> 12345
> .....
> test_capi leaked [152, 152] references, sum=304
> > We could then maybe try to get some
> > people pound on this at the PyCon sprints. Otherwise I'm reluctant to
> skip
> > it since they are legitimate leaks that should be get fixed.
> Well it's the old well-known issue with pseudo-"permanent" references
> not being appropriately managed/cleaned up. Which only shows when
> calling Py_Initialize/Py_Finalize multiple times, or using
> sub-interpreters.

Could we tweak the report to somehow ignore the permanent refcounts for
just this test? If not then we might as well leave it out since that number
will never hit 0.

From solipsis at  Wed Jan 18 17:42:15 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 17:42:15 +0100
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 18 Jan 2012 11:39:42 -0500
Brett Cannon <brett at> wrote:
> >
> > > We could then maybe try to get some
> > > people pound on this at the PyCon sprints. Otherwise I'm reluctant to
> > skip
> > > it since they are legitimate leaks that should be get fixed.
> >
> > Well it's the old well-known issue with pseudo-"permanent" references
> > not being appropriately managed/cleaned up. Which only shows when
> > calling Py_Initialize/Py_Finalize multiple times, or using
> > sub-interpreters.
> >
> Could we tweak the report to somehow ignore the permanent refcounts for
> just this test? If not then we might as well leave it out since that number
> will never hit 0.

I can't think of any way to specifically ignore them (if we knew where
they are we could just fix the refleaks :-)).



From stephen at  Wed Jan 18 17:57:12 2012
From: stephen at (Stephen J. Turnbull)
Date: Thu, 19 Jan 2012 01:57:12 +0900
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan writes:

 > From the stdlib feature development branch (these are the new interim
 > releases with standard library updates only as proposed by PEP 407):
 > Python 3.3.1 + stdlib 13.02.0  (~February 2013)
 > Python 3.3.2 + stdlib 13.08.0  (~August 2013)
 > Python 3.3.3 + stdlib 14.02.0  (~February 2014) (only upgrade path
 > from here is to make the jump to 3.4.0)
 > -- 3.4.0 + 12.08.0 is released from default branch --

Typo? -> 3.4.0 + 14.08.0, right?

 > Python 3.4.1 + stdlib 15.02.0  (~February 2015)

It seems to me there could be considerable divergence between the
stdlib code in

 > the default branch:
 > Python 3.4.0a1 + stdlib 14.08.0a1  (~February 2013)
 > Python 3.4.0a2 + stdlib 14.08.0a2  (~August 2013)
 > Python 3.4.0a3 + stdlib 14.08.0a3  (~February 2014)

and

 > the stdlib feature development branch
 > Python 3.3.1 + stdlib 13.02.0  (~February 2013)
 > Python 3.3.2 + stdlib 13.08.0  (~August 2013)
 > Python 3.3.3 + stdlib 14.02.0  (~February 2014) (only upgrade path

because 14.08.0a* will be targeting 3.4, and *should* use new language
constructs and APIs where they are appropriate, while 13.02.0 ...
14.02.0 will be targeting the 3.3 API, and mustn't use them.

From dirkjan at  Wed Jan 18 18:32:22 2012
From: dirkjan at (Dirkjan Ochtman)
Date: Wed, 18 Jan 2012 18:32:22 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

On Tuesday, January 17, 2012, Antoine Pitrou <solipsis at> wrote:
> We would like to propose the following PEP to change (C)Python's release
> cycle. Discussion is welcome, especially from people involved in the
> release process, and maintainers from third-party distributions of
> Python.

As a Gentoo packager, this would mean much more work for us, unless all the
non-LTS releases promised to be backwards compatible. I.e., the hard part
for us is managing all the incompatibilities between other packages and
new Python versions.

As a user of Python, I would rather dislike the change from 18 to 24 months
for LTS release cycles. And the limiting factor for my use of Python
features is largely old Python versions still in use, not the availability
of newer features in the newest Python. So I'm much more interested in
finding ways of improving 2.7/3.2 uptake than adding more feature releases.
I also think that it would be sensible to wait with something like this
process change until the 3.x adoption curve is much further along.



> Regards
> Antoine.
> PEP: 407
> Title: New release cycle and introducing long-term support versions
> Version: $Revision$
> Last-Modified: $Date$
> Author: Antoine Pitrou <solipsis at>,
>        Georg Brandl <georg at>,
>        Barry Warsaw <barry at>
> Status: Draft
> Type: Process
> Content-Type: text/x-rst
> Created: 2012-01-12
> Post-History:
> Resolution: TBD
> Abstract
> ========
> Finding a release cycle for an open-source project is a delicate
> exercise in managing mutually contradicting constraints: developer
> manpower, availability of release management volunteers, ease of
> maintenance for users and third-party packagers, quick availability of
> new features (and behavioural changes), availability of bug fixes
> without pulling in new features or behavioural changes.
> The current release cycle errs on the conservative side.  It is
> adequate for people who value stability over reactivity.  This PEP is
> an attempt to keep the stability that has become a Python trademark,
> while offering a more fluid release of features, by introducing the
> notion of long-term support versions.
> Scope
> =====
> This PEP doesn't try to change the maintenance period or release
> scheme for the 2.7 branch.  Only 3.x versions are considered.
> Proposal
> ========
> Under the proposed scheme, there would be two kinds of feature
> versions (sometimes dubbed "minor versions", for example 3.2 or 3.3):
> normal feature versions and long-term support (LTS) versions.
> Normal feature versions would get either zero or at most one bugfix
> release; the latter only if needed to fix critical issues.  Security
> fix handling for these branches needs to be decided.
> LTS versions would get regular bugfix releases until the next LTS
> version is out.  They then would go into security fixes mode, up to a
> termination date at the release manager's discretion.
> Periodicity
> -----------
> A new feature version would be released every X months.  We
> tentatively propose X = 6 months.
> LTS versions would be one out of N feature versions.  We tentatively
> propose N = 4.
> With these figures, a new LTS version would be out every 24 months,
> and remain supported until the next LTS version 24 months later.  This
> is mildly similar to today's 18 months bugfix cycle for every feature
> version.
> Pre-release versions
> --------------------
> More frequent feature releases imply a smaller number of disruptive
> changes per release.  Therefore, the number of pre-release builds
> (alphas and betas) can be brought down considerably.  Two alpha builds
> and a single beta build would probably be enough in the regular case.
> The number of release candidates depends, as usual, on the number of
> last-minute fixes before final release.
> Effects
> =======
> Effect on development cycle
> ---------------------------
> More feature releases might mean more stress on the development and
> release management teams.  This is quantitatively alleviated by the
> smaller number of pre-release versions; and qualitatively by the
> lesser amount of disruptive changes (meaning less potential for
> breakage).  The shorter feature freeze period (after the first beta
> build until the final release) is easier to accept.  The rush for
> adding features just before feature freeze should also be much
> smaller.
> Effect on bugfix cycle
> ----------------------
> The effect on fixing bugs should be minimal with the proposed figures.
> The same number of branches would be simultaneously open for regular
> maintenance (two until 2.x is terminated, then one).
> Effect on workflow
> ------------------
> The workflow for new features would be the same: developers would only
> commit them on the ``default`` branch.
> The workflow for bug fixes would be slightly updated: developers would
> commit bug fixes to the current LTS branch (for example ``3.3``) and
> then merge them into ``default``.
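
The periodicity figures in the quoted PEP (X = 6 months between feature
versions, one LTS out of every N = 4) can be checked with a few lines of
Python. This is only a sketch: the August 2012 start date and treating the
first release as an LTS are assumptions made for illustration, and
`add_months` and `schedule` are hypothetical helper names, not part of the
PEP.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that lies `months` after d."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def schedule(first_release, count, period_months=6, lts_every=4):
    """Return (release date, is_lts) pairs; the first release is assumed to be LTS."""
    return [
        (add_months(first_release, i * period_months), i % lts_every == 0)
        for i in range(count)
    ]


releases = schedule(date(2012, 8, 1), 9)      # assumed start: August 2012
lts_dates = [d for d, is_lts in releases if is_lts]
# lts_dates -> [date(2012, 8, 1), date(2014, 8, 1), date(2016, 8, 1)]
```

With these figures, consecutive LTS versions are 24 months apart, and three
normal feature versions fall between them.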

From martin at  Wed Jan 18 18:55:31 2012
From: martin at ("Martin v. Löwis")
Date: Wed, 18 Jan 2012 18:55:31 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Am 18.01.2012 17:01, schrieb PJ Eby:
> On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <martin at> wrote:
>     Am 17.01.2012 22:26, schrieb Antoine Pitrou:
>     > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
>     > could cache a "hash perturbation" computed from the string and the
>     > random bits:
>     >
>     > - hash() would use ob_shash
>     > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
>     >
>     > This way, you cache almost all computations, adding only a computation
>     > and a couple logical ops when looking up a string in a dict.
>     That's a good idea. For Unicode, it might be best to add another slot
>     into the object, even though this increases the object size.
> Wouldn't that break the ABI in 2.x?

I was thinking about adding the field at the end, so I thought it
shouldn't. However, if somebody inherits from PyUnicodeObject, it still
might - so my new proposal is to add the extra hash into the str block,
either at str[-1], or after the terminating 0. This would cause an
average increase of four bytes of the storage (0 bytes in 50% of the
cases, 8 bytes because of padding in the other 50%).

What do you think?
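
The two pieces of arithmetic in this exchange can be sketched in Python.
Everything here rests on stated assumptions rather than on the actual patch:
an allocator that rounds block sizes up to 8 bytes, a 4-byte extra hash
field, and a 64-bit word for the lookup hash. The names `round_up`,
`extra_bytes` and `dict_lookup_hash` are hypothetical.

```python
ALIGN = 8   # assumed allocator granularity, in bytes
EXTRA = 4   # assumed size of the extra cached hash, in bytes


def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def extra_bytes(block_size):
    """Growth in allocated size when EXTRA bytes are appended to the string block."""
    return round_up(block_size + EXTRA) - round_up(block_size)


# Average growth over every possible remainder modulo ALIGN:
# half the sizes fit into existing padding (0 bytes), half spill over (8 bytes).
average = sum(extra_bytes(n) for n in range(1, ALIGN + 1)) / ALIGN

MASK64 = (1 << 64) - 1   # assumed word size; C arithmetic would wrap on its own


def dict_lookup_hash(ob_shash, ob_sstate):
    """Lookup hash from the quoted formula; the low 2 bits of ob_sstate are masked out."""
    return ((ob_shash * 1000003) ^ (ob_sstate & ~3)) & MASK64
```

Under these assumptions `average` comes out at 4.0 bytes, matching the
estimate above, and the two low bits of ob_sstate never influence the
lookup hash.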


From brett at  Wed Jan 18 18:56:21 2012
From: brett at (Brett Cannon)
Date: Wed, 18 Jan 2012 12:56:21 -0500
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 09:08, Nick Coghlan <ncoghlan at> wrote:

> On Wed, Jan 18, 2012 at 10:30 PM, Antoine Pitrou <solipsis at>
> wrote:
> > Splitting the stdlib:
> > - requires someone to do the splitting (highly non-trivial given the
> > interactions of some modules with interpreter details or low-level C
> > code)
> > - requires setting up separate resources (continuous integration with N
> > stdlib versions and M interpreter versions, for example)
> > - requires separate maintenance and releases for the stdlib (but with
> > non-trivial interaction with interpreter maintenance, since they will
> > affect each other and must be synchronized for Python to be usable at
> > all)
> > - requires more attention by users since there are now *two* release
> > schedules and independent version numbers to track
> Did you read what I actually proposed? I specifically *didn't* propose
> separate stdlib releases (for all the reasons you point out), only
> separate date based stdlib *versioning*. Distribution of the CPython
> interpreter + stdlib would remain monolithic, as it is today. Any
> given stdlib release would only be supported for the most recent
> language release. The only difference is that between language
> releases, where we currently only release maintenance builds, we'd
> *also* release a second version of each maintenance build with an
> updated standard library, along with an alpha release of the next
> language version (with the last part being entirely optional, but I
> figured I may as well make the suggestion since I like the idea to
> encourage getting syntax updates and the like out for earlier
> experimentation).

> When you initially pitched the proposal via email, you didn't include
> the "language moratorium applies to interim releases" idea. That one
> additional suggestion makes the whole concept *much* more appealing to
> me, but I only like it on the condition that we decouple the stdlib
> versioning from the language definition versioning (even though I
> recommend we only officially support very specific combinations of the
> two). My suggestion is really just a concrete proposal for
> implementing Ezio's idea of only bumping the Python version for
> releases with long term support, and using some other mechanism to
> distinguish the interim releases.

IOW we would have a language moratorium every 2 years (i.e. between LTS
releases) while switching to a 6 month release cycle for language/VM
bugfixes and full stdlib releases? I would support that as it has several
benefits from several angles.

From a VM perspective, it gives other VMs 2 years to catch up to the next
release instead of 18 months; not a big switch, but still better than
shortening it.

It also makes disruptive language changes less frequent so people have more
time to catch up, update books/docs, etc. We can also let them bake longer
and we all get more experience with them.

Doing a release every 6 months that includes updates to the stdlib and
bugfixes to the language/VM also benefits other VMs by getting
compatibility fixes in faster. All of the other VM maintainers have told me
that keeping the stdlib non-CPython compliant is the biggest hurdle. This
kind of switch means they could release a VM that supports a release 6
months or a year after a language change release (e.g. 1 to 2 releases in)
so as to get changes in faster and lower the need to keep their own fork.

It should also increase the chances of external developers of projects
being willing to become core developers and contributing their project to
Python. If they get to keep a 6 month release cycle we could consider
pulling in projects like httplib2 and others that have resisted inclusion in
the stdlib because of the painfully long (for them) wait between releases.

> So, assuming a 2 year LTS cycle, the released versions up to February
> 2015 with my suggestion would end up being:
> From the default branch:
> Python 3.3.0 + stdlib 12.08.0  (~August 2012)
> Python 3.4.0a1 + stdlib 14.08.0a1  (~February 2013)
> Python 3.4.0a2 + stdlib 14.08.0a2 (~August 2013)
> Python 3.4.0a3 + stdlib 14.08.0a3  (~February 2014)
> Python 3.4.0a4 + stdlib 14.08.0a4  (~2014)
> Python 3.4.0b1 + stdlib 14.08.0b1  (~2014)
> Python 3.4.0b2 + stdlib 14.08.0b2  (~2014)
> Python 3.4.0c1 + stdlib 14.08.0c1  (~2014)
> Python 3.4.0 + stdlib 14.08  (~August 2014)
> Python 3.5.0a1 + stdlib 16.08.0a1  (~February 2015)
> From the 3.3 maintenance branch (these are maintenance updates to the
> "LTS" release):
> Python 3.3.1 + stdlib 12.08.1  (~February 2013)
> Python 3.3.2 + stdlib 12.08.2  (~August 2013)
> Python 3.3.3 + stdlib 12.08.3  (~February 2014)
> Python 3.3.4 + stdlib 12.08.4  (~August 2014) (and 3.3 branch enters
> security patch only mode)
> From the 3.4 maintenance branch (these are maintenance updates to the
> "LTS" release):
> Python 3.4.1 + stdlib 14.08.1  (~February 2015)
> From the stdlib feature development branch (these are the new interim
> releases with standard library updates only as proposed by PEP 407):
> Python 3.3.1 + stdlib 13.02.0  (~February 2013)
> Python 3.3.2 + stdlib 13.08.0  (~August 2013)
> Python 3.3.3 + stdlib 14.02.0  (~February 2014) (only upgrade path
> from here is to make the jump to 3.4.0)
> -- 3.4.0 + 12.08.0 is released from default branch --
> Python 3.4.1 + stdlib 15.02.0  (~February 2015)
> If we have to make "brown paper bag" releases for the maintenance or
> stdlib branches then the micro versions get bumped - the date based
> version of the standard library versions relates to when that
> particular *API* was realised, not when bugs were last fixed in it. If
> a target release date slips, then the stdlib version would be
> increased accordingly (cf. Ubuntu 6.06).
> Yes, we'd have an extra set of active buildbots to handle the stdlib
> branch, but a) that's no harder than creating the buildbots for a new
> maintenance branch and b) the interim release proposal will need to
> separate language level changes from stdlib level changes *anyway*.
> As far as how sys.version checks would be updated, I would propose a
> simple API addition to track the new date-based standard lib
> versioning: sys.stdlib_version. People could choose to just depend on
> a specific Python version (implicitly depending on the stdlib version
> that was originally shipped with that version of CPython), or they may
> instead decide to depend on a specific stdlib version (implicitly
> depending on the first Python version that was shipped with that
> stdlib).
> The reason I like this scheme is that it allows us (and users) to
> precisely track the things that can vary at the two different rates.
> At least the following would still be governed by changes in the first
> two fields of sys.version (i.e. the major Python version):
>    - deprecation policy
>    - language syntax
>    - compiler AST
>    - C ABI stability
>    - Windows compilation suite and C runtime version
>    - anything else we decide to link with the Python language version
> (e.g. default pickle protocol)
> However, the addition of date based stdlib versioning would allow us
> to clearly identify the new interim releases proposed by PEP 407
> *without* mucking up all those things that are currently linked to
> sys.version and really *shouldn't* be getting updated every 6 months.
> Users get a clear guarantee that if they follow the stdlib updates
> instead of the regular maintenance releases, they'll get nice new
> features along with their bug fixes, but no new deprecations or
> backwards incompatible API changes. However, they're also going to be
> obliged to transition to each new language release as it comes out if
> they want to continue getting security updates.
> Basically, what it boils down to is that I'm now +1 on the general
> proposal in the PEP, *so long as*:
> 1. We get a separate Hg branch for "stdlib only" changes and default
> becomes the destination specifically for "language update" changes
> (with the latter being a superset of the former)
> 2. The proposed "interim releases" are denoted by a new date-based
> sys.stdlib_version field and sys.version retains its current meaning
> (and slow rate of change)
I don't think we need to do a new versioning scheme. Why can't we just say
which releases are covered by a language moratorium? The community seemed
to pick up on that rather well when we did it for Python 3 and I didn't see
anyone having difficulty explaining it when someone didn't know what was
going on. As long as we are clear which releases are under a language
moratorium and which ones aren't, we shouldn't need to switch to a language +
stdlib versioning scheme. This will lead to us reaching Python 4 faster
(in about 4 years), but even that doesn't need to be a big deal. Linux
jumped from 2 to 3 w/o issue. Once again, as long as we are clear on which
new versions have language changes it should be clear as to what to expect.

Otherwise I say we just bump the major version when we do a
language-changing release (i.e. every 2 years) and just do a minor/feature
number bump (i.e. every 6 months) when we add/change stuff in the stdlib.
People can then be told "learn Python 4" which is easy to point out on
docs, e.g. you won't have to go digging for what minor/feature release a
book covers, just what major release, which will probably be emblazoned on
the cover. And with the faster stdlib release schedule other VMs can aim
for X.N versions when they have all the language features *and* all of
their compatibility fixes into the stdlib. And then once they hit that they
can just continue to support that major version by just keeping up with
minor releases with compatibility fixes (which buildbots can help with).
And honestly, if we don't go with this I'm with Georg's comment in another
email of beginning to consider stripping the stdlib down to core libraries
to help stop with the bitrot (sorry, Paul). If we can't attract new
replacements for modules we can't ditch because of backwards compatibility
I start to wonder if I should even care about improving the stdlib outside
of core code required to make Python simply function.


> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From martin at  Wed Jan 18 18:52:23 2012
From: martin at ("Martin v. Löwis")
Date: Wed, 18 Jan 2012 18:52:23 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Am 18.01.2012 13:30, schrieb Barry Warsaw:
> On Jan 18, 2012, at 08:19 AM, Martin v. Löwis wrote:
>> My concern is not about breaking doctests: this proposal will also break
>> them. My concern is about applications that assume that hash(s) is
>> stable across runs, and we do have reports that it will break
>> applications.
> I am a proponent of doctests, and thus use them heavily.  I can tell you that
> the issue of dict hashing (non-)order has been well known for *years* and I
> have convenience functions in my own doctests to sort and print dict
> elements.

Indeed. So that breakage may actually be less than people expect.

As for cases that still rely on dict order: none of the proposed
solutions preserve full compatibility in dict order. The only solution
(not actually proposed so far) is to add an AVL tree into the hash
table, to track keys that collide on hash values (rather than hash
slots). Such a tree would be only used if there is an actual collision,
which, in practical dict usage, never occurs.

I've been seriously considering implementing a balanced tree inside
the dict (again for string-only dicts, as ordering can't be guaranteed
otherwise). However, this would be a lot of code for a security fix.
It *would* solve the issue for good, though.
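
To make the idea concrete, here is a toy sketch in Python, not the proposed
C implementation. Two things are swapped in and should be read as
assumptions: a sorted list searched with `bisect` stands in for the AVL
tree (lookups are O(log n), but insertion into a list is still O(n), which
a real balanced tree would avoid), and an ordinary dict keyed by the full
hash value stands in for the hash table. The class names `CollisionBucket`
and `StringDict` are hypothetical.

```python
import bisect


class CollisionBucket:
    """All keys sharing one full hash value, kept sorted for binary search."""

    def __init__(self):
        self.keys = []
        self.values = []

    def set(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # existing key: overwrite
        else:
            self.keys.insert(i, key)        # new key: keep the list sorted
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)


class StringDict:
    """Toy mapping for string keys; colliding keys are ordered, not chained."""

    def __init__(self, hash_func=hash):
        self._hash = hash_func
        self._buckets = {}                  # full hash value -> CollisionBucket

    def __setitem__(self, key, value):
        bucket = self._buckets.setdefault(self._hash(key), CollisionBucket())
        bucket.set(key, value)

    def __getitem__(self, key):
        bucket = self._buckets.get(self._hash(key))
        if bucket is None:
            raise KeyError(key)
        return bucket.get(key)
```

Passing a constant `hash_func` simulates an attacker forcing every key onto
the same hash value; lookups then cost a binary search instead of a linear
scan. The ordering requirement is why this only works for string-only
dicts, as noted above.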


From solipsis at  Wed Jan 18 19:37:33 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 19:37:33 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

Hello Dirkjan,

On Wed, 18 Jan 2012 18:32:22 +0100
Dirkjan Ochtman <dirkjan at> wrote:
> On Tuesday, January 17, 2012, Antoine Pitrou <solipsis at> wrote:
> > We would like to propose the following PEP to change (C)Python's release
> > cycle. Discussion is welcome, especially from people involved in the
> > release process, and maintainers from third-party distributions of
> > Python.
> As a Gentoo packager, this would mean much more work for us, unless all the
> non-LTS releases promised to be backwards compatible. I.e. the hard part
> for us is managing all the incompatibilities in other packages,
> compatibility with Python.

It might need to be spelt clearly in the PEP, but one of my assumptions
is that packagers choose on what release series they want to
synchronize. So packagers can synchronize on the LTS releases if it's
more practical for them, or if it maps better to their own release
model (e.g. Debian).

Do you think that's a valid answer to Gentoo's concerns?

> So I'm much more interested in
> finding ways of improving 2.7/3.2 uptake than adding more feature releases.

That would be nice as well, but I think it's orthogonal to the PEP.
Besides, I'm afraid there's not much we (python-dev) can do about it.
Some vendors (Debian, Redhat) will always lag behind the bleeding-edge
feature releases.



From g.brandl at  Wed Jan 18 19:43:13 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 18 Jan 2012 19:43:13 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <jf73l8$95c$>

Am 18.01.2012 16:25, schrieb Stephen J. Turnbull:
> Antoine Pitrou writes:
>  > You claim people won't use stable releases because of not enough
>  > alphas?  That sounds completely unrelated.
> Surely testing is related to user perceptions of stability.  More
> testing helps reduce bugs in released software, which improves user
> perception of stability, encouraging them to use the software in
> production.  Less testing, then, will have the opposite effect.  But
> you understand that theory, I'm sure.  So what do you mean to say?
>  > (you can produce flimsy software with many alphas, too)
> The problem is the converse: can you produce Python-release-quality
> software with much less pre-release testing than current feature
> releases get?
>  > Sure, and we think it is [possible to do that] :)
> Given the relative risk of rejecting PEP 407 and me being wrong (the
> status quo really isn't all that bad AFAICS), vs. accepting PEP 407
> and you being wrong, I don't find a smiley very convincing.

"The status quo really isn't all that bad" applies to any PEP.  Also,
compared to most PEPs, it is quite easy to revert to the previous
state of things if they don't work out as wanted.

> In fact,
> I don't find the PEP itself convincing -- and I'm not the only one.

That is noted.  And I think Antoine was a little harsh earlier; of
course we also need to convince users that the new cycle is
advantageous and not detrimental.

> We'll see what Barry and Georg have to say.

Two things:

a) The release manager's job is not as bad as you might believe.  We
have an incredibly helpful and active core of developers which means
that the RM job is more or less "reduced" to pronouncing on changes
during the rc phase, and actually producing the releases.

b) I did not have the impression (maybe someone can underline that
with tracker stats?) that there were a lot more bug reports than
usual during the alpha and early beta stages of Python 3.2.


From g.brandl at  Wed Jan 18 19:46:38 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 18 Jan 2012 19:46:38 +0100
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <jf73rl$aoj$>

Am 18.01.2012 18:56, schrieb Brett Cannon:

> IOW we would have a language moratorium every 2 years (i.e. between LTS
> releases) while switching to a 6 month release cycle for language/VM bugfixes
> and full stdlib releases?

That is certainly a possibility (it's listed as an open issue in the PEP).

> I would support that as it has several benefits from
> several angles.
> From a VM perspective, it gives other VMs 2 years to catch up to the next
> release instead of 18 months; not a big switch, but still better than shortening it.
> It also makes disruptive language changes less frequent so people have more time
> to catch up, update books/docs, etc. We can also let them bake longer and we all
> get more experience with them.

Yes.  In the end, the moratorium really was a good idea, and this would be
carrying on the spirit.

> Doing a release every 6 months that includes updates to the stdlib and bugfixes
> to the language/VM also benefits other VMs by getting compatibility fixes in
> faster. All of the other VM maintainers have told me that keeping the stdlib
> non-CPython compliant is the biggest hurdle. This kind of switch means they
> could release a VM that supports a release 6 months or a year after a language
> change release (e.g. 1 to 2 releases in) so as to get changes in faster and
> lower the need to keep their own fork.
> It should also increase the chances of external developers of projects being
> willing to become core developers and contributing their project to Python. If
> they get to keep a 6 month release cycle we could consider pulling in project
> like httplib2 and others that have resisted inclusion in the stdlib because
> painfully long (for them) wait between releases.



From v+python at  Wed Jan 18 20:09:27 2012
From: v+python at (Glenn Linderman)
Date: Wed, 18 Jan 2012 11:09:27 -0800
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

On 1/18/2012 9:52 AM, "Martin v. Löwis" wrote:
> I've been seriously considering implementing a balanced tree inside
> the dict (again for string-only dicts, as ordering can't be guaranteed
> otherwise). However, this would be a lot of code for a security fix.
> It *would* solve the issue for good, though.

To handle keys that contain non-orderable components along with strings, which
are just as vulnerable as string-only keys, especially if the non-string
components can have fixed values during an attack, you could simply use
their hash value as an orderable proxy for the non-orderable key components.
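
A minimal sketch of that proxy, assuming the keys are tuples mixing strings
with arbitrary hashable components (`order_key` is a hypothetical name):

```python
def order_key(key):
    """Map a tuple key to something sortable.

    Strings are compared as strings; every other component is replaced by
    its hash value. The leading 0/1 tag keeps strings and hashes from ever
    being compared with each other.
    """
    return tuple(
        (0, part) if isinstance(part, str) else (1, hash(part))
        for part in key
    )


# None and complex numbers cannot be ordered against each other directly,
# but their hash values can.
keys = [("b", 1j), ("a", None), ("a", 1j)]
ordered = sorted(keys, key=order_key)
```

Two unequal components can still share a hash value, so the proxy only
gives an ordering for the tree; an equality check on the real keys is still
needed at the end of a lookup.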

From solipsis at  Wed Jan 18 21:50:57 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 18 Jan 2012 21:50:57 +0100
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
References: <>
Message-ID: <>

Well, they should be fixed now :-)



On Wed, 18 Jan 2012 17:42:15 +0100
Antoine Pitrou <solipsis at> wrote:
> On Wed, 18 Jan 2012 11:39:42 -0500
> Brett Cannon <brett at> wrote:
> > >
> > > > We could then maybe try to get some
> > > > people pound on this at the PyCon sprints. Otherwise I'm reluctant to
> > > skip
> > > > it since they are legitimate leaks that should be get fixed.
> > >
> > > Well it's the old well-known issue with pseudo-"permanent" references
> > > not being appropriately managed/cleaned up. Which only shows when
> > > calling Py_Initialize/Py_Finalize multiple times, or using
> > > sub-interpreters.
> > >
> > 
> > Could we tweak the report to somehow ignore the permanent refcounts for
> > just this test? If not then we might as well leave it out since that number
> > will never hit 0.
> I can't think of any way to specifically ignore them (if we knew where
> they are we could just fix the refleaks :-)).
> Regards
> Antoine.

From fwierzbicki at  Wed Jan 18 22:31:26 2012
From: fwierzbicki at (fwierzbicki at
Date: Wed, 18 Jan 2012 13:31:26 -0800
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 9:56 AM, Brett Cannon <brett at> wrote:

> Doing a release every 6 months that includes updates to the stdlib and
> bugfixes to the language/VM also benefits other VMs by getting compatibility
> fixes in faster. All of the other VM maintainers have told me that keeping
> the stdlib non-CPython compliant is the biggest hurdle. This kind of switch
> means they could release a VM that supports a release 6 months or a year
> after a language change release (e.g. 1 to 2 releases in) so as to get
> changes in faster and lower the need to keep their own fork.
As one of the other VM maintainers I agree with everything Brett has
said here. The proposal sounds very good to me from that perspective.


From steve at  Thu Jan 19 01:12:06 2012
From: steve at (Steven D'Aprano)
Date: Thu, 19 Jan 2012 11:12:06 +1100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <1326901919.3395.67.camel@localhost.localdomain>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Thursday, 19 January 2012 at 00:25 +0900, Stephen J. Turnbull wrote:
>>  > You claim people won't use stable releases because of not enough
>>  > alphas?  That sounds completely unrelated.
>> Surely testing is related to user perceptions of stability.  More
>> testing helps reduce bugs in released software, which improves user
>> perception of stability, encouraging them to use the software in
>> production.
> I have asked a practical question, a theoretical answer isn't exactly
> what I was waiting for.
> I don't care to convince *you*, since you are not involved in Python
> development and release management (you haven't ever been a contributor
> AFAIK). Unless you produce practical arguments, saying "I don't think
> you can do it" is plain FUD and certainly not worth answering to.

Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the 
sort of people you DO have to convince that moving to an accelerated or more 
complex release process will result in a better product. The risk is that you 
will lose users, or fragment the user base even more than it is now with 2.x 
vs 3.x.

Quite frankly, I like the simplicity and speed of the current release cycle. 
All this talk about separate LTS releases and parallel language releases and 
library releases makes my head spin. I fear the day that people asking 
questions on the tutor or python-list mailing lists will have to say (e.g.) 
"I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the 
version they're using.

I fear change, because the current system works well and for every way to make 
it better there are a thousand ways to make it worse. Dismissing fears like 
this as FUD doesn't do anyone any favours.

One on-going complaint is that Python-Dev doesn't have the manpower or time to 
do everything that needs to be done. Bugs languish for months or years because 
nobody has the time to look at them. Will going to a more rapid release cycle 
give people more time, or just increase their workload? You're hoping that a 
more rapid release cycle will attract more developers, and there is a chance 
that you could be right; but a more rapid release cycle WILL increase the 
total work load. So you're betting that this change will attract enough new 
developers that the work load per person will decrease even as the total work 
load increases. I don't think that's a safe bet.


From steve at  Thu Jan 19 01:19:29 2012
From: steve at (Steven D'Aprano)
Date: Thu, 19 Jan 2012 11:19:29 +1100
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>	<>	<1326889813.3395.37.camel@localhost.localdomain>	<>
Message-ID: <>

Brett Cannon wrote:

> And honestly, if we don't go with this I'm with Georg's comment in another
> email of beginning to consider stripping the stdlib down to core libraries
> to help stop with the bitrot (sorry, Paul). If we can't attract new
> replacements for modules we can't ditch because of backwards compatibility
> I start to wonder if I should even care about improving the stdlib outside
> of core code required to make Python simply function.

Do we have any evidence of this alleged bitrot? I spend a lot of time on the 
comp.lang.python newsgroup and I see no evidence that people using Python 
believe the standard library is rotting from lack of attention.

I do see people having trouble with installing third party packages. I see 
that stripping back the standard library and forcing people to rely more on 
external libraries will hurt, rather than help, the experience they have with 


From anacrolix at  Thu Jan 19 01:42:00 2012
From: anacrolix at (Matt Joiner)
Date: Thu, 19 Jan 2012 11:42:00 +1100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf5tm3$sil$>
References: <> <>
Message-ID: <>

On Wed, Jan 18, 2012 at 6:55 PM, Georg Brandl <g.brandl at> wrote:
> The main reason is changes in the library. We have been getting complaints
> about the standard library bitrotting for years now, and one of the main
> reasons it's so hard to a) get decent code into the stdlib and b) keep it
> maintained is that the release cycles are so long. It's a tough thing for
> contributors to accept that the feature you've just implemented will only
> be in a stable release in 16 months.
> If the stdlib does not get more reactive, it might just as well be cropped
> down to a bare core, because 3rd-party libraries do everything as well and
> do it before we do. But you're right that if Python came without batteries,
> the current release cycle would be fine.

I think this is the real issue here. The batteries in Python are so
important because:
1) The stability and quality of 3rd party libraries is not guaranteed.
2) The mechanism used to obtain 3rd party libraries is not popular or
considered reliable.

Much of the "bitrot" is that standard library modules have been
superseded by third party ones that offer much more functionality.
Rather than pulling these libraries into the stdlib, it needs to be
trivial to obtain them.

Putting some of these higher quality 3rd party modules into lock step
with Python is an unpopular move, and hampers their future growth.

Off the top of my head, libraries such as LXML, argparse, and
requests are such popular libraries that shouldn't be baked in. In the
long term, it would be nice to see these kinds of libraries dropped
from the standard installation, and made available through the new
distribute package systems etc.

From ethan at  Thu Jan 19 01:01:23 2012
From: ethan at (Ethan Furman)
Date: Wed, 18 Jan 2012 16:01:23 -0800
Subject: [Python-Dev] Writable __doc__
Message-ID: <>

Is there a reason why normal classes can't have their __doc__ strings 
rewritten?  Creating a do-nothing metaclass seems like overkill for such 
a simple operation.

Python 3.2 ... on win32
--> class Test():
...   __doc__ = 'am I permanent?'
--> Test.__doc__
'am I permanent?'
--> Test.__doc__ = 'yes'
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'type' objects is not writable
--> type(Test)
<class 'type'>

--> class Meta(type):
...   "only exists to allow writable __doc__"
--> class Test(metaclass=Meta):
...   __doc__ = 'am I permanent?'
--> Test.__doc__
'am I permanent?'
--> Test.__doc__ = 'No!'
--> Test.__doc__
'No!'
--> type(Test)
<class '__main__.Meta'>

Should I create a bug report?


From benjamin at  Thu Jan 19 01:54:39 2012
From: benjamin at (Benjamin Peterson)
Date: Wed, 18 Jan 2012 19:54:39 -0500
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/18 Ethan Furman <ethan at>:
> Is there a reason why normal classes can't have their __doc__ strings
> rewritten? Creating a do-nothing metaclass seems like overkill for such a
> simple operation.
> Python 3.2 ... on win32
> --> class Test():
> ...   __doc__ = 'am I permanent?'
> ...
> --> Test.__doc__
> 'am I permanent?'
> --> Test.__doc__ = 'yes'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: attribute '__doc__' of 'type' objects is not writable
> --> type(Test)
> <class 'type'>
> --> class Meta(type):
> ...   "only exists to allow writable __doc__"
> ...
> --> class Test(metaclass=Meta):
> ...   __doc__ = 'am I permanent?'
> ...
> --> Test.__doc__
> 'am I permanent?'
> --> Test.__doc__ = 'No!'
> --> Test.__doc__
> 'No!'
> --> type(Test)
> <class '__main__.Meta'>
> Should I create a bug report?

 $ ./python
Python 3.3.0a0 (default:095de2293f39, Jan 18 2012, 10:34:18)
[GCC 4.5.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class Test:
...     __doc__ = "time machine"
>>> Test.__doc__ = "strikes again"
>>> Test.__doc__
'strikes again'


From anacrolix at  Thu Jan 19 01:58:24 2012
From: anacrolix at (Matt Joiner)
Date: Thu, 19 Jan 2012 11:58:24 +1100
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

PEP380 and Mark's coroutines could coexist, so I really don't think "it's
too late" matters. Furthermore, PEP380 has utility in its own right
without considering its use for "explicit coroutines".

I would like to see these coroutines considered, but as someone else
mentioned, coroutines via PEP380 enhanced generators have some
interesting characteristics, from my experimentations they feel

From ncoghlan at  Thu Jan 19 02:03:15 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 11:03:15 +1000
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 7:31 AM, fwierzbicki at
<fwierzbicki at> wrote:
> On Wed, Jan 18, 2012 at 9:56 AM, Brett Cannon <brett at> wrote:
>> Doing a release every 6 months that includes updates to the stdlib and
>> bugfixes to the language/VM also benefits other VMs by getting compatibility
>> fixes in faster. All of the other VM maintainers have told me that keeping
>> the stdlib non-CPython compliant is the biggest hurdle. This kind of switch
>> means they could release a VM that supports a release 6 months or a year
>> after a language change release (e.g. 1 to 2 releases in) so as to get
>> changes in faster and lower the need to keep their own fork.
> As one of the other VM maintainers I agree with everything Brett has
> said here. The proposal sounds very good to me from that perspective.

Yes, with the addition of the idea of a PEP 3003 style language change
moratorium for interim releases, I've been converted from an initial
opponent of the idea (since we don't want to give the wider community
whiplash) to a supporter (since some parts of the community,
especially web service developers that deploy to tightly controlled
environments, aren't well served by the standard library's inability
to keep up with externally maintained standards and recommended
development practices).

It means PEP 407 can end up serving two goals:

1. Speeding up the rate of release for the standard library, allowing
enhanced features to be made available to end users sooner.
2. Slowing down (slightly) the rate of release of changes to the core
language and builtins, providing more time for those changes to filter
out through the wider Python ecosystem.

Agreeing with those goals in principle then leaves two key questions
to be addressed:

1. How would we have to update our development practices to make such
a dual versioning scheme feasible?
2. How can we best communicate a new approach to versioning without
unduly confusing developers that have built up certain expectations
about Python's release cycle over the past 20+ years?

For the first point, I think having two active development branches
(one for stdlib updates, one for language updates) will prove to be
absolutely essential. Otherwise all language updates would have to be
landed in the 6 month window between the last stdlib release for a
given language version and the next language release, which seems to
me a crazy way to go about things. As a consequence, I think we'd be
obliged to do something to avoid conflicts on Misc/NEWS (this could be
as simple as splitting it out into NEWS and NEWS_STDLIB, but if we're
restructuring those files anyway, we may also want to do something
about the annoying conflicts between maintenance releases and
development releases).

That then leaves the question of how to best communicate such a change
to the rest of the Python community. This is more a political and
educational question than it is a technical one. A few different
approaches have already been suggested:

1. I believe the PEP currently proposes just taking the "no more than
9" limit off the minor version of the language. Feature releases would
just come out every 6 months, with every 4th release flagged as a
language release. This could even be conveyed programmatically by
offering "sys.lang_version" and "sys.lang_version_info" attributes
that define the *language* version of a given release - 3.3, 3.4, 3.5
and 3.6 would all have something like sys.lang_version == '3.3', and
then in 3.7 (the next language release) it would be updated to say
sys.lang_version == '3.7'.

This approach would require that some policies (such as the
deprecation cycle) be updated to refer to changes in the language
version (sys.lang_version) rather than change in the stdlib version

I don't like this scheme because it tries to use one number (the minor
version field) to cover two very different concepts (stdlib updates
and language updates). While technically feasible, this is
unnecessarily obscure and confusing for end users.

2. Brett's alternative proposal is that we switch to using the major
version for language releases and the minor version for stdlib
releases. We would then release 3.3, 3.4, 3.5 and 3.6 at 6 month
intervals, with 4.0 then being released in August 2014 as a new
language version.

Without taking recent history into account, I actually like this scheme
- it fits well with traditional usage of major.minor.micro version
numbering. However, I'm not confident that the "python" name will
refer to Python 3 on a majority of systems by 2014 and accessing
Python 4.0 through the "python3" name would just be odd.

It also means we lose our ability to signal to the community when we
plan to make a backwards incompatible language release (making the
assumption that we're never going to want to do that again would be
incredibly naive). On a related note, we'd also be setting ourselves
up to have to explain to everyone that "no, no, Python 3 -> 4 is like
upgrading from Python 3.2 -> 3.3, not 2.7 -> 3.2". I expect the
disruptions of the Python 3 transition will still be fresh enough in
everyone's mind at that point that we really shouldn't go there if we
don't have to.

3. Finally, we get to my proposal: that we just leave sys.version and
sys.version_info alone. They will still refer to Python language
versions, the micro release will be incremented every 6 months or so,
the minor release once every couple of years to indicate a language
update and the major release every decade or so (if absolutely
necessary) to indicate the introduction of backwards

All current intuitions and expectations regarding the meaning of
sys.version and sys.version_info remain completely intact.

However, we would still need *something* to indicate that the stdlib
has changed in the interim releases. This should be a monotonically
increasing value, but should also be clearly distinct from the
language version. Hence my proposal of a date based sys.stdlib_version
and sys.stdlib_version_info.

That way, nobody has to *unlearn* anything about current Python
development practices and policies. Instead, all people have to do is
*learn* that we now effectively have two release streams: a date-based
release stream that comes out every 6 months (described by
sys.stdlib_version) and an explicitly numbered release stream
(described by sys.version) that comes out every 24 months.

So in August this year, we would release 3.3+12.08, followed by
3.3+13.02, 3.3+13.08, 3.3+14.02 at 6 month intervals, and then the
next language release as 3.4+14.08. If someone refers to just Python
3.3, then the "at least stdlib 12.08" is implied. If they refer to
Python stdlib 12.08, 13.02, 13.08 or 14.02, then it is the dependency
on "Python 3.3" that is implied.

Two different rates of release -> two different version numbers. Makes
sense to me.
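
To make the combined numbering concrete, here is a rough sketch of how
strings such as "3.3+12.08" could be split into a language version and a
date-based stdlib version and then compared. The names DualVersion and
parse_dual_version are made up for illustration; sys.stdlib_version is only
a proposal in this thread and does not exist in any released Python.

```python
from typing import NamedTuple


class DualVersion(NamedTuple):
    lang: tuple    # language version, e.g. (3, 3)
    stdlib: tuple  # date-based stdlib version, e.g. (12, 8) for 12.08


def parse_dual_version(text):
    """Split 'LANG+STDLIB' into two integer tuples (hypothetical format)."""
    lang_part, _, stdlib_part = text.partition("+")
    lang = tuple(int(piece) for piece in lang_part.split("."))
    # A bare language version such as "3.3" carries no stdlib part here.
    stdlib = tuple(int(piece) for piece in stdlib_part.split(".")) if stdlib_part else ()
    return DualVersion(lang, stdlib)


releases = ["3.3+13.02", "3.4+14.08", "3.3+12.08", "3.3+14.02", "3.3+13.08"]
in_order = sorted(releases, key=parse_dual_version)
```

Because both fields are plain tuples, ordinary tuple comparison sorts the
releases first by language version and then by stdlib date.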


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Thu Jan 19 02:06:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 11:06:01 +1000
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano <steve at> wrote:
> Brett Cannon wrote:
> Do we have any evidence of this alleged bitrot? I spend a lot of time on the
> comp.lang.python newsgroup and I see no evidence that people using Python
> believe the standard library is rotting from lack of attention.

IMO, it's a problem mainly with network (especially web) protocols and
file formats. It can take the stdlib a long time to catch up with
external developments due to the long release cycle, so people are
often forced to switch to third party libraries that better track the
latest versions of relevant standards (de facto or otherwise).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Thu Jan 19 02:54:45 2012
From: tjreedy at (Terry Reedy)
Date: Wed, 18 Jan 2012 20:54:45 -0500
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <jf7t5b$k4h$>

On 1/18/2012 8:06 PM, Nick Coghlan wrote:
> On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano<steve at>  wrote:
>> Do we have any evidence of this alleged bitrot? I spend a lot of time on the
>> comp.lang.python newsgroup and I see no evidence that people using Python
>> believe the standard library is rotting from lack of attention.
> IMO, it's a problem mainly with network (especially web) protocols and
> file formats. It can take the stdlib a long time to catch up with
> external developments due to the long release cycle, so people are
> often forced to switch to third party libraries that better track the
> latest versions of relevant standards (de facto or otherwise).

Some of those modules are more than 2 years out of date and I guess what 
Brett is saying is that the people interested and able to update them 
will not do so in the stdlib because they want to be able to push out 
feature updates whenever they are needed and available and not be tied 
to a slow release schedule. Moreover, since the external standards will 
continue to evolve for the foreseeable future, the need to track them 
more quickly will also continue.

We could relax the ban on new features in micro releases and designate 
such modules as volatile and let them get new features in each x.y.z 
release. In a sense, this would be less drastic than inventing a new 
type of release. Code can require an x.y.z release, as it must if it 
depends on a bug fix not in x.y.0.
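
As a small illustration of code requiring a particular x.y.z release, a
module could compare sys.version_info against a minimum at import time. The
helper name require_python and the version numbers used below are
placeholders chosen for this sketch, not anything proposed in the thread.

```python
import sys


def require_python(minimum):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` is a (major, minor, micro) tuple, e.g. (3, 2, 3).
    """
    current = tuple(sys.version_info[:3])
    if current < tuple(minimum):
        raise RuntimeError(
            "Python %d.%d.%d or later is required, found %d.%d.%d"
            % (tuple(minimum) + current)
        )
    return current


# Passes on any Python 3 interpreter.
running = require_python((3, 0, 0))
```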

I also like the idea of stretching out the alpha release cycle. I would 
like to see 3.3.0a1 appear along with 3.2.3 (in February?). If alpha 
releases are released with all buildbots green, they are as good, at 
least with respect to old features, as a corresponding bugfix release. 
All releases will become more dependable as test coverage improves. 
Again, this idea avoids inventing a new type of release with new release 

I think one reason people avoid alpha releases is that they so quickly 
become obsolete. If one sat for 3 to 6 months, it might get more 
attention. As for any alpha stigma, we should emphasize that alpha only 
means not feature frozen.

Terry Jan Reedy

From ericsnowcurrently at  Thu Jan 19 04:31:38 2012
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 18 Jan 2012 20:31:38 -0700
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 5:01 PM, Ethan Furman <ethan at> wrote:
> Is there a reason why normal classes can't have their __doc__ strings
> rewritten? Creating a do-nothing metaclass seems like overkill for such a
> simple operation.
> Python 3.2 ... on win32
> --> class Test():
> ...   __doc__ = 'am I permanent?'
> ...
> --> Test.__doc__
> 'am I permanent?'
> --> Test.__doc__ = 'yes'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: attribute '__doc__' of 'type' objects is not writable
> --> type(Test)
> <class 'type'>
> --> class Meta(type):
> ...   "only exists to allow writable __doc__"
> ...
> --> class Test(metaclass=Meta):
> ...   __doc__ = 'am I permanent?'
> ...
> --> Test.__doc__
> 'am I permanent?'
> --> Test.__doc__ = 'No!'
> --> Test.__doc__
> 'No!'
> --> type(Test)
> <class '__main__.Meta'>
> Should I create a bug report?  :)


From thatiparthysreenivas at  Thu Jan 19 05:52:08 2012
From: thatiparthysreenivas at (Sreenivas Reddy T)
Date: Thu, 19 Jan 2012 10:22:08 +0530
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

this is happening on python 2.6 too.

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Test(type):
...       __doc__=
  File "<stdin>", line 2
    __doc__=
           ^
SyntaxError: invalid syntax
>>> class Test(type):
...       __doc__='asasdas'
>>> Test.__doc__='sadfsdff'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__doc__' of 'type' objects is not writable
>>> type(Test)
<type 'type'>

From noufal at  Thu Jan 19 06:07:30 2012
From: noufal at (Noufal Ibrahim)
Date: Thu, 19 Jan 2012 10:37:30 +0530
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
	(Sreenivas Reddy T.'s message of "Thu, 19 Jan 2012 10:22:08 +0530")
References: <>
Message-ID: <874nvs8elp.fsf@sanitarium.localdomain>

Sreenivas Reddy T <thatiparthysreenivas at> writes:

> this is happening on python 2.6 too.
> Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> class Test(type):
> ...       __doc__=
>   File "<stdin>", line 2
>     __doc__=
>            ^
> SyntaxError: invalid syntax
>>>> class Test(type):
> ...       __doc__='asasdas'
> ...

I don't get any syntax errors (Python2.7 and 2.6)

>>> class Test(object):
...   __doc__ = "Something"
>>> help(Test)

>>> class Test(type):
...   __doc__ = "something"
>>> help(Test)

>>> Test.__doc__

>>>> Test.__doc__='sadfsdff'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: attribute '__doc__' of 'type' objects is not writable
>>>> type(Test)
> <type 'type'>

The __name__, __bases__, __module__, __abstractmethods__, __dict__ and
__doc__ attributes have custom getters and setters in the type object
definition. __doc__ has only a getter. No setter and no deleter.

That is why you're seeing this. What's the question here?
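
A pure-Python analogy of a getter without a setter, as a sketch only: the
real __doc__ entry lives in the C definition of the type object, whereas
here a read-only property on a metaclass plays the same role. The names
Meta, label and try_assign are invented for the example.

```python
class Meta(type):
    @property
    def label(cls):
        # Getter only: no setter and no deleter are defined.
        return "fixed"


class Test(metaclass=Meta):
    pass


def try_assign(cls, name, value):
    """Return True if the class attribute could be assigned, else False."""
    try:
        setattr(cls, name, value)
    except AttributeError:
        return False
    return True


readable = Test.label                           # the getter works
writable = try_assign(Test, "label", "new")     # the assignment is refused
```

Assignment fails because the property is a data descriptor found on the
class's type, and it has no setter to delegate to.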



May I ask a question?

From noufal at  Thu Jan 19 06:12:51 2012
From: noufal at (Noufal Ibrahim)
Date: Thu, 19 Jan 2012 10:42:51 +0530
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <874nvs8elp.fsf@sanitarium.localdomain> (Noufal Ibrahim's message
	of "Thu, 19 Jan 2012 10:37:30 +0530")
References: <>
Message-ID: <87zkdk6zsc.fsf@sanitarium.localdomain>

Noufal Ibrahim <noufal at> writes:


> That is why you're seeing this. What's the question here?


My apologies. I didn't read the whole thread. 


Some bird populations soaring down -Headline of an article in Science News, page 126, February 20, 1993.

From turnbull at  Thu Jan 19 07:29:35 2012
From: turnbull at (Stephen J. Turnbull)
Date: Thu, 19 Jan 2012 15:29:35 +0900
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf73l8$95c$>
References: <>
Message-ID: <>

Georg Brandl writes:

 > "The status quo really isn't all that bad" applies to any PEP.  Also,
 > compared to most PEPs, it is quite easy to revert to the previous
 > state of things if they don't work out as wanted.

That depends on how "doesn't work out" plays out.  If meeting the
schedule *and* producing a good release regularly is just more work
than expected, of course you're right.

If you stick to the schedule with insufficient resources, and lack of
testing produces a really bad release (or worse, a couple of sorta bad
releases in succession), reverting Python's reputation for stability
is going to be non-trivial.

 > a) The release manager's job is not as bad as you might believe.  We
 > have an incredibly helpful and active core of developers which means
 > that the RM job is more or less "reduced" to pronouncing on changes
 > during the rc phase, and actually producing the releases.

I've done release management and I've been watching Python do release
management since PEP 263; I'm well aware that Python has a truly
excellent process in place, and I regularly recommend studying it to
friends interested in improving their own projects' processes.

But I've also (twice) been involved (as RM) in a major revision of RM
procedures, and both times it was a lot more work than anybody
expected.  Finally, the whole point of this exercise is to integrate a
lot more stdlib changes (including whole packages) than in the past on
a much shorter timeline, and to do it repeatedly.  "Every six months"
still sounds like a long time if you are a "leaf" project still
working on your changes on your own schedule and chafing at the bit
waiting to get them in to the core project's releases, but it's
actually quite short for the RM.

I'm not against this change (especially since, as Antoine so
graciously pointed out, I'm not going to be actually doing the work in
the foreseeable future), but I do advise that the effort required
seems to be dramatically underestimated.

 > b) I did not have the impression (maybe someone can underline that
 > with tracker stats?) that there were a lot more bug reports than
 > usual during the alpha and early beta stages of Python 3.2.

Yeah, but the question for Python's stability reputation is "were
there more than zero?"  Every bug that gets through is a risk.

From stephen at  Thu Jan 19 07:33:51 2012
From: stephen at (Stephen J. Turnbull)
Date: Thu, 19 Jan 2012 15:33:51 +0900
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano writes:

 > Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the 
 > sort of people you DO have to convince that moving to an accelerated or more 
 > complex release process will result in a better product.

Well, to be fair, Antoine is right in excluding me from the user base
he's trying to attract (as I understand it).  I do not maintain
products or systems that depend on Python working 99.99999% of the
time, and in fact in many of my personal projects I use trunk.

One of the problems with this kind of discussion is that the targets
of the new procedures are not clear in everybody's mind, but all of us
tend to use generic terms like "users" when we mean to discuss
benefits or costs to a specific class of users.

From greg at  Thu Jan 19 07:59:10 2012
From: greg at (Gregory P. Smith)
Date: Wed, 18 Jan 2012 22:59:10 -0800
Subject: [Python-Dev] Daily reference leaks (12de1ad1cee8): sum=6024
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 18, 2012 at 12:50 PM, Antoine Pitrou <solipsis at> wrote:
> Well, they should be fixed now :-)
> Regards
> Antoine.

awesome! :)

From victor.stinner at  Thu Jan 19 11:02:05 2012
From: victor.stinner at (Victor Stinner)
Date: Thu, 19 Jan 2012 11:02:05 +0100
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

> :)

The bug is marked as closed, whereas the bug still exists in Python 3.2 and
has not been fixed there. The fix must be backported.


From solipsis at  Thu Jan 19 12:07:59 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 19 Jan 2012 12:07:59 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
References: <>
Message-ID: <>

On Thu, 19 Jan 2012 11:12:06 +1100
Steven D'Aprano <steve at> wrote:
> Antoine Pitrou wrote:
> > On Thursday, 19 January 2012 at 00:25 +0900, Stephen J. Turnbull wrote:
> >>  > You claim people won't use stable releases because of not enough
> >>  > alphas?  That sounds completely unrelated.
> >>
> >> Surely testing is related to user perceptions of stability.  More
> >> testing helps reduce bugs in released software, which improves user
> >> perception of stability, encouraging them to use the software in
> >> production.
> > 
> > I have asked a practical question, a theoretical answer isn't exactly
> > what I was waiting for.
> [...]
> > I don't care to convince *you*, since you are not involved in Python
> > development and release management (you haven't ever been a contributor
> > AFAIK). Unless you produce practical arguments, saying "I don't think
> > you can do it" is plain FUD and certainly not worth answering to.
> Pardon me, but people like Stephen Turnbull are *users* of Python, exactly the 
> sort of people you DO have to convince that moving to an accelerated or more 
> complex release process will result in a better product. The risk is that you 
> will lose users, or fragment the user base even more than it is now with 2.x 
> vs 3.x.

Well, you might bring some examples here, but I haven't seen any project
lose users *because* they switched to a faster release cycle (*). I
don't understand why this proposal would fragment the user base, either.
We're not proposing to drop compatibility or build Python 4.

((*) Firefox's decrease in popularity seems to be due to Chrome uptake,
and their new release cycle is arguably in response to that)

> Quite frankly, I like the simplicity and speed of the current release cycle. 
> All this talk about separate LTS releases and parallel language releases and 
> library releases makes my head spin.

Well, the PEP discussion might make your head spin, because various
possibilities are explored. Obviously the final solution will have to
be simple enough to be understood by anyone :-)

(do you find Ubuntu's release model, for example, too complicated?)

> I fear the day that people asking 
> questions on the tutor or python-list mailing lists will have to say (e.g.) 
> "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the 
> version they're using.

Yeah, that's my biggest problem with Nick's proposal. Hopefully we can
avoid parallel version schemes.

> You're hoping that a 
> more rapid release cycle will attract more developers, and there is a chance 
> that you could be right; but a more rapid release cycle WILL increase the 
> total work load. So you're betting that this change will attract enough new 
> developers that the work load per person will decrease even as the total work 
> load increases.

This is not something that we can find out without trying, I think.
As Georg pointed out, the decision is easy to revert or amend if
we find out that the new release cycle is unworkable.



From solipsis at  Thu Jan 19 12:17:51 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 19 Jan 2012 12:17:51 +0100
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 19 Jan 2012 11:03:15 +1000
Nick Coghlan <ncoghlan at> wrote:
> 1. I believe the PEP currently proposes just taking the "no more than
> 9" limit off the minor version of the language. Feature releases would
> just come out every 6 months, with every 4th release flagged as a
> language release.

With the moratorium suggestion factored in, yes. The PEP insists on
support duration rather than the breadth of changes, though. I think
that's a more important piece of information for users.

(you don't care whether or not new language constructs were added, if
you were not planning to use them)

> I don't like this scheme because it tries to use one number (the minor
> version field) to cover two very different concepts (stdlib updates
> and language updates). While technically feasible, this is
> unnecessarily obscure and confusing for end users.

As an end user I wouldn't really care whether a release is "stdlib
changes only" or "language/builtins additions too" (especially in a
language like Python where the boundaries are somewhat blurry). I think
this distinction is useful mainly for experts and therefore not worth
complicating version numbering for.

> 2. Brett's alternative proposal is that we switch to using the major
> version for language releases and the minor version for stdlib
> releases. We would then release 3.3, 3.4, 3.5 and 3.6 at 6 month
> intervals, with 4.0 then being released in August 2014 as a new
> language version.

The main problem I see with this is that Python 3 was a big
disruptive event for the community, and calling a new version "Python
4" may make people anxious at the prospect of compatibility breakage.
Instead of spending some time advertising that "Python 4" is a safe
upgrade, perhaps we could simply call it "Python 3.X+1"?

(and, as you point out, keep "Python X+1" for when we want to change the
language in incompatible ways again)

> So in August this year, we would release 3.3+12.08, followed by
> 3.3+13.02, 3.3+13.08, 3.3+14.02 at 6 month intervals, and then the
> next language release as 3.4+14.08. If someone refers to just Python
> 3.3, then the "at least stdlib 12.08" is implied. If they refer to
> Python stdlib 12.08, 13.02, 13.08 or 14.02, then it is the dependency
> on "Python 3.3" that is implied.

If I were a casual user of a piece of software, I'd really find such a
numbering scheme complicated and intimidating. I don't think most users
want such a level of information.
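
For concreteness, the labels in the quoted proposal can be generated
mechanically. This is only a sketch: the helper name release_labels and its
parameters are invented for illustration, and the cadence of four stdlib
releases per language version is read off the quoted example rather than
taken from any PEP text.

```python
def release_labels(count, start_year=2012, start_month=8, first_minor=3,
                   releases_per_language=4):
    """Yield labels such as '3.3+12.08' at six-month intervals.

    Hypothetical helper: the cadence is inferred from the quoted example.
    """
    year, month = start_year, start_month
    for index in range(count):
        # The language minor version only moves every few stdlib releases.
        minor = first_minor + index // releases_per_language
        yield "3.%d+%02d.%02d" % (minor, year % 100, month)
        # Advance six months, rolling over into the next year if needed.
        month += 6
        if month > 12:
            month -= 12
            year += 1


labels = list(release_labels(5))
print(labels)
```

Five releases starting in August 2012 reproduce the sequence quoted above,
which shows how much a reader has to decode from each label.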



From solipsis at  Thu Jan 19 12:18:44 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 19 Jan 2012 12:18:44 +0100
Subject: [Python-Dev] Writable __doc__
References: <>
Message-ID: <>

On Wed, 18 Jan 2012 20:31:38 -0700
Eric Snow <ericsnowcurrently at> wrote:
> >
> > Should I create a bug report?
>  :)

Well done Eric :)

From ncoghlan at  Thu Jan 19 12:35:19 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 21:35:19 +1000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Jan 19, 2012 at 9:07 PM, Antoine Pitrou <solipsis at> wrote:
>> I fear the day that people asking
>> questions on the tutor or python-list mailing lists will have to say (e.g.)
>> "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the
>> version they're using.
> Yeah, that's my biggest problem with Nick's proposal. Hopefully we can
> avoid parallel version schemes.

They're not really parallel - the stdlib version would fully determine
the language version. I'm only proposing two version numbers because
we're planning to start versioning *two* things (the standard library,
updated every 6 months, and the language spec, updated every 18-24
months).

Since the latter matches what we do now, I'm merely proposing that we
leave its versioning alone, and add a *new* identifier specifically
for the interim stdlib updates.

Thinking about it though, I've realised that the sys.version string
already contains a lot more than just the language version number, so
I think it should just be updated to include the stdlib version
information, and the version_info named tuple could get a new 'stdlib'
field as a string.

That way, sys.version and sys.version_info would still fully define
the Python version, we just wouldn't be mucking with the meaning of
any of the existing fields.

For example, the current:

>>> sys.version
'3.2.2 (default, Sep  5 2011, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=2, micro=2, releaselevel='final', serial=0)

might become:

>>> sys.version
'3.3.1 (stdlib 12.08, default, Feb  18 2013, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=3, micro=1, releaselevel='final',
serial=0, stdlib='12.08')

for the maintenance release and:

>>> sys.version
'3.3.1 (stdlib 13.02, default, Feb  18 2013, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=3, micro=1, releaselevel='final',
serial=0, stdlib='13.02')

for the stdlib-only update.
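
A rough sketch of how code might consume such a field, assuming the layout
shown above. VersionInfo here is a hypothetical stand-in built with
namedtuple; the real sys.version_info has no 'stdlib' attribute, and the
helper names are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for sys.version_info with the proposed extra field.
VersionInfo = namedtuple(
    "VersionInfo", "major minor micro releaselevel serial stdlib")

maintenance = VersionInfo(3, 3, 1, "final", 0, "12.08")
stdlib_only = VersionInfo(3, 3, 1, "final", 0, "13.02")


def language_version(info):
    """Return the (major, minor) language version, ignoring the stdlib."""
    return info[:2]


def stdlib_version(info):
    """Parse a 'YY.MM' stdlib string into a comparable (year, month) tuple."""
    year, month = info.stdlib.split(".")
    return int(year), int(month)


# Same language version, different standard library release.
print(language_version(maintenance) == language_version(stdlib_only))
print(stdlib_version(maintenance) < stdlib_version(stdlib_only))
```

Because the new field sits at the end, existing comparisons against the
first fields, such as info[:2] >= (3, 3), would keep their current meaning.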

Explicit-is-better-than-implicit'ly yours,

Nick Coghlan | ncoghlan at | Brisbane, Australia

From ncoghlan at  Thu Jan 19 13:00:06 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 19 Jan 2012 22:00:06 +1000
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 9:17 PM, Antoine Pitrou <solipsis at> wrote:
> If I were a casual user of a piece of software, I'd really find such a
> numbering scheme complicated and intimidating. I don't think most users
> want such a level of information.

I think the ideal numbering scheme from a *new* user point of view is
the one Brett suggested (where major=language update, minor=stdlib
update), but (as has been noted) there are solid historical reasons we
can't use that.

While I still have misgivings, I'm starting to come around to the idea
of just allowing the minor release number to increment faster (Barry's
co-authorship of the PEP, suggesting he doesn't see such a scheme
causing any problems for Ubuntu, is a big factor in that). I'd still like
the core language version to be available programmatically, though,
and I'd like the PEP to consider displaying it as part of sys.version
and using it to allow things like having bytecode compatible versions
share bytecode files in the cache.


Nick Coghlan | ncoghlan at | Brisbane, Australia

From barry at  Thu Jan 19 13:00:35 2012
From: barry at (Barry Warsaw)
Date: Thu, 19 Jan 2012 07:00:35 -0500
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 19, 2012, at 12:17 PM, Antoine Pitrou wrote:

>The main problem I see with this is that Python 3 was a big
>disruptive event for the community, and calling a new version "Python
>4" may make people anxious at the prospect of compatibility breakage.


The Python 3 transition is ongoing, and Guido himself at the time thought it
would take 5 years.  I think we're making excellent progress, but there are
still occasional battles just to convince upstream third party developers that
supporting Python 3 (let alone *switching* to Python 3) is even worth the
effort.  I think we're soon going to be at a tipping point where not
supporting Python 3 will be the minority position.  Even if a hypothetical
Python 4 were completely backward compatible, I shudder at the PR nightmare
that would entail.

I'm not saying there will never be a time for Python 4, but I sure hope it's
far enough in the future that you youngun's will be telling us about it in the
Tim Peters Home for Python Old Farts, where we'll smile blankly, bore you
again with stories of vinyl records, phones with real buttons, and Python
1.6.1 while you feed us our mush under chronologically arranged pictures of
BDFLs Van Rossum, Peterson, and Van Rossum.


From benjamin at  Thu Jan 19 14:07:45 2012
From: benjamin at (Benjamin Peterson)
Date: Thu, 19 Jan 2012 08:07:45 -0500
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/19 Victor Stinner <victor.stinner at>:
>>  :)
> The bug is marked as closed, whereas the bug exists in Python 3.2 and
> has not been closed. The fix must be backported.

It's not a bug; it's a feature.


From merwok at  Thu Jan 19 15:03:07 2012
From: merwok at (Éric Araujo)
Date: Thu, 19 Jan 2012 15:03:07 +0100
Subject: [Python-Dev] [Python-checkins] cpython: add str.casefold()
 (closes #13752)
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks for 0b5ce36a7a24 Benjamin.

From pje at  Thu Jan 19 16:17:18 2012
From: pje at (PJ Eby)
Date: Thu, 19 Jan 2012 10:17:18 -0500
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Jan 18, 2012 12:55 PM, Martin v. Löwis <martin at> wrote:
> On 18.01.2012 17:01, PJ Eby wrote:
> > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <martin at
> > <mailto:martin at>> wrote:
> >
> >     On 17.01.2012 22:26, Antoine Pitrou wrote:
> >     > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30
> >     > bits could cache a "hash perturbation" computed from the string
> >     > and the random bits:
> >     >
> >     > - hash() would use ob_shash
> >     > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> >     >
> >     > This way, you cache almost all computations, adding only a
> >     > computation and a couple logical ops when looking up a string in
> >     > a dict.
> >
> >     That's a good idea. For Unicode, it might be best to add another slot
> >     into the object, even though this increases the object size.
> >
> >
> > Wouldn't that break the ABI in 2.x?
> I was thinking about adding the field at the end, so I thought it
> shouldn't. However, if somebody inherits from PyUnicodeObject, it still
> might - so my new proposal is to add the extra hash into the str block,
> either at str[-1], or after the terminating 0. This would cause an
> average increase of four bytes of the storage (0 bytes in 50% of the
> cases, 8 bytes because of padding in the other 50%).
> What do you think?

So far it sounds like the very best solution of all, as far as backward
compatibility is concerned.  If the extra bits are only used when two
strings have a matching hash value, the only doctests that could be
affected are ones testing for this issue.  ;-)
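
To make the quoted idea concrete, here is a toy model in Python of caching a
perturbation in the unused bits of ob_sstate. It is a sketch under stated
assumptions, not CPython code: plain_hash, perturbation and SECRET are
invented for illustration, and only the dict_hash expression comes from the
quoted mail.

```python
import os
import struct

MASK32 = 0xFFFFFFFF
STATE_BITS = 0x3  # the two low bits of ob_sstate that are already in use

# Per-process secret; an assumption of this sketch.
SECRET = struct.unpack("<I", os.urandom(4))[0]


def plain_hash(data):
    """Toy stand-in for the unrandomized string hash (ob_shash)."""
    value = 0
    for byte in bytearray(data):
        value = ((value * 1000003) ^ byte) & MASK32
    return value


def perturbation(data):
    """Toy perturbation mixed from the string and the secret (upper 30 bits)."""
    value = SECRET
    for byte in bytearray(data):
        value = ((value * 31) + byte) & MASK32
    return value & ~STATE_BITS & MASK32


def dict_hash(ob_shash, ob_sstate):
    """The lookup hash from the quoted proposal, truncated to 32 bits."""
    return ((ob_shash * 1000003) ^ (ob_sstate & ~STATE_BITS)) & MASK32


key = b"spam"
ob_shash = plain_hash(key)          # what hash() would keep returning
ob_sstate = perturbation(key) | 1   # low bits keep their existing meaning
print(ob_shash, dict_hash(ob_shash, ob_sstate))
```

In this model hash() stays stable across processes, while the value used for
dict lookups changes with the secret.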

From ethan at  Thu Jan 19 17:36:59 2012
From: ethan at (Ethan Furman)
Date: Thu, 19 Jan 2012 08:36:59 -0800
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/19 Victor Stinner <victor.stinner at>:
>>>  :)
>> The bug is marked as closed, whereas the bug exists in Python 3.2 and
>> has not been closed. The fix must be backported.
> It's not a bug; it's a feature.

Where does one draw the line between feature and bug?  As a user I'm 
inclined to classify this as a bug:  __doc__ was writable with old-style 
classes; __doc__ is writable with new-style classes with any metaclass; 
and there exists no good reason (that I'm aware of ;) for __doc__ to not 
be writable.
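
A small check of the behaviour described above, written so that it runs on
both old and new interpreters. The names Meta, Plain, WithMeta and
can_set_doc are illustrative only. On releases without the fix, the first
call is expected to report False and the second True.

```python
class Meta(type):
    """A do-nothing metaclass written in Python."""


# Built with the default 'type' metaclass.
class Plain(object):
    """plain doc"""


# Built by calling the metaclass directly, which works on Python 2 and 3.
WithMeta = Meta("WithMeta", (object,), {"__doc__": "meta doc"})


def can_set_doc(cls):
    """Return True if cls.__doc__ accepts assignment on this interpreter."""
    try:
        cls.__doc__ = "updated"
    except (AttributeError, TypeError):
        return False
    return cls.__doc__ == "updated"


print("Plain:", can_set_doc(Plain))
print("WithMeta:", can_set_doc(WithMeta))
```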


From guido at  Thu Jan 19 18:21:56 2012
From: guido at (Guido van Rossum)
Date: Thu, 19 Jan 2012 09:21:56 -0800
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 8:36 AM, Ethan Furman <ethan at> wrote:

> Benjamin Peterson wrote:
>> 2012/1/19 Victor Stinner <victor.stinner at>:
>>> issue12773 <> :)
>>> The bug is marked as closed, whereas the bug exists in Python 3.2 and
>>> has not been closed. The fix must be backported.
>> It's not a bug; it's a feature.
> Where does one draw the line between feature and bug?  As a user I'm
> inclined to classify this as a bug:  __doc__ was writable with old-style
> classes; __doc__ is writable with new-style classes with any metaclass; and
> there exists no good reason (that I'm aware of ;) for __doc__ to not be
> writable.

Like it or not, this has worked this way ever since new-style classes were
introduced. That has made it a de-facto feature. We should not encourage
people to write code that works with a certain bugfix release but not with
the previous bugfix release of the same feature release.

Given that we haven't had any complaints about this in nearly a decade, the
backport can't be important. Don't do it.

--Guido van Rossum (

From janssen at  Thu Jan 19 18:22:03 2012
From: janssen at (Bill Janssen)
Date: Thu, 19 Jan 2012 09:22:03 PST
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:

> On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano <steve at> wrote:
> > Brett Cannon wrote:
> > Do we have any evidence of this alleged bitrot? I spend a lot of time on the
> > comp.lang.python newsgroup and I see no evidence that people using Python
> > believe the standard library is rotting from lack of attention.
> IMO, it's a problem mainly with network (especially web) protocols and
> file formats. It can take the stdlib a long time to catch up with
> external developments due to the long release cycle, so people are
> often forced to switch to third party libraries that better track the
> latest versions of relevant standards (de facto or otherwise).

I'm not sure how much of a problem this really is.  I continually build
fairly complicated systems with Python that do a lot of HTTP networking,
for instance.  It's fairly easy to replace use of the standard library
modules with use of Tornado and httplib2, and I wouldn't think of *not*
doing that.  But the standard modules are there, out-of-the-box, for
experimentation and tinkering, and they work in the sense that they pass
their module tests.  Are those standard modules as "Internet-proof" as
some commercially-supported package with an income stream that supports
frequent security updates would be?

Perhaps not.  But maybe that's OK.

Another way of doing this would be to "bless" certain third-party
modules in some fashion short of incorporation, and provide them with
more robust development support, again, "somehow", so that they don't
fall by the wayside when their developers move on to something else,
but are still able to release on an independent schedule.


From greg at  Thu Jan 19 18:41:56 2012
From: greg at (Gregory P. Smith)
Date: Thu, 19 Jan 2012 09:41:56 -0800
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Wed, Jan 18, 2012 at 9:55 AM, "Martin v. Löwis" <martin at> wrote:

> On 18.01.2012 17:01, PJ Eby wrote:
> > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <martin at
> > <mailto:martin at>> wrote:
> >
> >     On 17.01.2012 22:26, Antoine Pitrou wrote:
> >     > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30
> >     > bits could cache a "hash perturbation" computed from the string
> >     > and the random bits:
> >     >
> >     > - hash() would use ob_shash
> >     > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
> >     >
> >     > This way, you cache almost all computations, adding only a
> >     > computation and a couple logical ops when looking up a string in
> >     > a dict.
> >
> >     That's a good idea. For Unicode, it might be best to add another slot
> >     into the object, even though this increases the object size.
> >
> > Wouldn't that break the ABI in 2.x?
> I was thinking about adding the field at the end, so I thought it
> shouldn't. However, if somebody inherits from PyUnicodeObject, it still
> might - so my new proposal is to add the extra hash into the str block,
> either at str[-1], or after the terminating 0. This would cause an
> average increase of four bytes of the storage (0 bytes in 50% of the
> cases, 8 bytes because of padding in the other 50%).
> What do you think?

str[-1] is not likely to work if you want to maintain ABI compatibility.
Appending it to the data after the terminating \0 is more likely to be
possible, but if there is any possibility that existing compiled extension
modules have somehow inlined code to allocate the str field, even that is
questionable (I don't think there are any?).

I'd also be concerned about C API code that uses PyUnicode_Resize(). How do
you keep track of whether these extra bytes at the end have been filled in?
Would allocation and resize fill them with a magic value indicating "not
filled in", similar to a tp_hash of -1?

Regardless of all of this, I don't think this fully addresses the overall
issue as strings within other hashable data structures like tuples would
not be treated this way, only strings directly stored in a dict.  Sure you
can continue on and "fix" tuples and such in a similar manner but then what
about user defined classes that implement __hash__ based on the return
value of hash() on some strings they contain?

I don't see anything I'd consider a real complete fix unless we also
backport the randomized hash code so that people who need a guaranteed fix
can enable it and use it.
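
The concern about derived hashes can be illustrated with a short sketch.
Colliding stands in for strings an attacker has chosen to share one hash
value, and Wrapper is a hypothetical user-defined class of the kind
described above; neither name comes from any real code base.

```python
class Colliding(object):
    """Stand-in for attacker-chosen strings that all share one hash value."""

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.text == other.text


class Wrapper(object):
    """User-defined key whose hash is derived from a contained value."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.key == other.key


keys = [Colliding("a"), Colliding("b"), Colliding("c")]
# Tuples and wrappers inherit the collision from the values they contain.
tuple_hashes = set(hash((k,)) for k in keys)
wrapper_hashes = set(hash(Wrapper(k)) for k in keys)
print(len(tuple_hashes), len(wrapper_hashes))
```

Both sets end up with a single element, so a dict keyed by such tuples or
wrappers degrades in the same way as one keyed by the colliding values.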


From ericsnowcurrently at  Thu Jan 19 19:01:15 2012
From: ericsnowcurrently at (Eric Snow)
Date: Thu, 19 Jan 2012 11:01:15 -0700
Subject: [Python-Dev] PEP 407 / splitting the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 19, 2012 9:28 AM, "Bill Janssen" <janssen at> wrote:
> I'm not sure how much of a problem this really is.  I continually build
> fairly complicated systems with Python that do a lot of HTTP networking,
> for instance.  It's fairly easy to replace use of the standard library
> modules with use of Tornado and httplib2, and I wouldn't think of *not*
> doing that.  But the standard modules are there, out-of-the-box, for
> experimentation and tinkering, and they work in the sense that they pass
> their module tests.  Are those standard modules as "Internet-proof" as
> some commercially-supported package with an income stream that supports
> frequent security updates would be?

This is starting to sound a little like the discussion about the
__preview__ / __experimental__ idea.  If I recall correctly, one of the
points is that for some organizations getting a third-party library
approved for use is not trivial.  In contrast, inclusion in the stdlib is
like a free pass, since the organization can rely on the robustness of the
CPython QA and release processes.

As well, there is at least a small cost with third-party libraries for
those that maintain more rigorous configuration management.  In contrast,
there is basically no extra cost with new/updated stdlib, beyond upgrading


> Perhaps not.  But maybe that's OK.
> Another way of doing this would be to "bless" certain third-party
> modules in some fashion short of incorporation, and provide them with
> more robust development support, again, "somehow", so that they don't
> fall by the wayside when their developers move on to something else,
> but are still able to release on an independent schedule.
> Bill

From stephen at  Thu Jan 19 19:04:36 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 20 Jan 2012 03:04:36 +0900
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman writes:

 > Where does one draw the line between feature and bug?

Bug:      Doesn't work as documented.
Feature:  Works as expected but not documented[1] to do so.
Miracle:  Works as documented.[2]

Unspecified behavior that doesn't work as you expect is the unmarked
case (ie, none of the above).

The Devil's Dictionary defines feature somewhat differently:

Feature: Name for any behavior you don't feel like justifying to a user.

[1]  Including cases where the patch contains documentation but hasn't
been committed to trunk yet.

[2]  Python is pretty miraculous, isn't it?

From ethan at  Thu Jan 19 18:46:07 2012
From: ethan at (Ethan Furman)
Date: Thu, 19 Jan 2012 09:46:07 -0800
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
 > We should not encourage people to write code that works with a certain
 > bugfix release but not with the previous bugfix release of the same
 > feature release.

Then what's the point of a bug-fix release?  If 3.2.1 had broken 
threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to 
3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is that 
more or less what happened with 3.0?)

> Like it or not, this has worked this way ever since new-style classes 
> were introduced. That has made it a de-facto feature.

But what of the discrepancy between the 'type' metaclass and any other 
Python metaclass?

> Given that we haven't had any complaints about this in nearly a decade, 
> the backport can't be important. Don't do it.

Agreed.



From fuzzyman at  Thu Jan 19 19:40:09 2012
From: fuzzyman at (Michael Foord)
Date: Thu, 19 Jan 2012 18:40:09 +0000
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

On 19/01/2012 17:46, Ethan Furman wrote:
> Guido van Rossum wrote:
> > We should not encourage people to write code that works with a certain
> > bugfix release but not with the previous bugfix release of the same
> > feature release.
> Then what's the point of a bug-fix release?  If 3.2.1 had broken 
> threading, wouldn't we fix it in 3.2.2 and encourage folks to switch 
> to 3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is 
> that more or less what happened with 3.0?)
>> Like it or not, this has worked this way ever since new-style classes 
>> were introduced. That has made it a de-facto feature.
> But what of the discrepancy between the 'type' metaclass and any other 
> Python metaclass?

There are many discrepancies between built-in types and any Python 
class. Writable attributes are (generally) one of them.


>> Given that we haven't had any complaints about this in nearly a 
>> decade, the backport can't be important. Don't do it.
> Agreed.
> ~Ethan~


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From g.brandl at  Thu Jan 19 21:12:01 2012
From: g.brandl at (Georg Brandl)
Date: Thu, 19 Jan 2012 21:12:01 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>
Message-ID: <jf9t7m$hu4$>

On 19.01.2012 01:12, Steven D'Aprano wrote:

> One on-going complaint is that Python-Dev doesn't have the manpower or time to 
> do everything that needs to be done. Bugs languish for months or years because 
> nobody has the time to look at it. Will going to a more rapid release cycle 
> give people more time, or just increase their workload? You're hoping that a 
> more rapid release cycle will attract more developers, and there is a chance 
> that you could be right; but a more rapid release cycle WILL increase the 
> total work load. So you're betting that this change will attract enough new 
> developers that the work load per person will decrease even as the total work 
> load increases. I don't think that's a safe bet.

I can't help noticing that so far, worries about the workload came mostly from
people who don't actually bear that load (this is no accusation!), while those
that do are the proponents of the PEP...

That is, I don't want to exclude you from the discussion, but on the issue of
workload I would like to encourage more of our (past and present) release
managers and active bug triagers to weigh in.


From nadeem.vawda at  Thu Jan 19 22:09:40 2012
From: nadeem.vawda at (Nadeem Vawda)
Date: Thu, 19 Jan 2012 23:09:40 +0200
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add
 documentation for nargs=argparse.REMAINDER
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi
<python-checkins at> wrote:
> +  are gathered into a lits. This is commonly useful for command line

s/lits/list ?

From sandro.tosi at  Thu Jan 19 22:17:56 2012
From: sandro.tosi at (Sandro Tosi)
Date: Thu, 19 Jan 2012 22:17:56 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add
 documentation for nargs=argparse.REMAINDER
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 22:09, Nadeem Vawda <nadeem.vawda at> wrote:
> On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi
> <python-checkins at> wrote:
>> +  are gathered into a lits. This is commonly useful for command line
> s/lits/list ?

crap! I committed an older version of the patch... thanks for spotting
it, i'll fix it right away

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From guido at  Thu Jan 19 22:21:28 2012
From: guido at (Guido van Rossum)
Date: Thu, 19 Jan 2012 13:21:28 -0800
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 9:46 AM, Ethan Furman <ethan at> wrote:

> Guido van Rossum wrote:
> > We should not encourage people to write code that works with a certain
> > bugfix release but not with the previous bugfix release of the same
> > feature release.
> Then what's the point of a bug-fix release?  If 3.2.1 had broken
> threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to
> 3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is that more
> or less what happened with 3.0?)

The bugs fixed in bugfix releases are usually things that normally go well
but don't work under certain circumstances.

But I'd also be happy to just declare that assignable __doc__ is a feature
without explaining why.

> > Like it or not, this has worked this way ever since new-style classes were
> > introduced. That has made it a de-facto feature.
>
> But what of the discrepancy between the 'type' metaclass and any other
> Python metaclass?

Michael Foord explained that.

--Guido van Rossum (

From sandro.tosi at  Thu Jan 19 23:10:42 2012
From: sandro.tosi at (Sandro Tosi)
Date: Thu, 19 Jan 2012 23:10:42 +0100
Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add
 documentation for nargs=argparse.REMAINDER
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Thu, Jan 19, 2012 at 22:07, Terry Reedy <tjreedy at> wrote:
> typo
> lits -> list

yep, i've already fixed it committing a more useful example too

Sandro Tosi (aka morph, morpheus, matrixhasu)
My website:
Me at Debian:

From tjreedy at  Thu Jan 19 23:14:49 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 19 Jan 2012 17:14:49 -0500
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>
Message-ID: <jfa4l0$a3i$>

On 1/19/2012 1:04 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>   >  Where does one draw the line between feature and bug?
> Bug:      Doesn't work as documented.

The basic idea is that the x.y docs define (mostly) the x.y language. 
Patches to the x.y docs fix typos, omissions, ambiguities, and the 
occasional error. The x.y.z cpython releases are increasingly better 
implementations of Python x.y.

Terry Jan Reedy

From ethan at  Thu Jan 19 23:44:18 2012
From: ethan at (Ethan Furman)
Date: Thu, 19 Jan 2012 14:44:18 -0800
Subject: [Python-Dev] Writable __doc__
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> Where does one draw the line between feature and bug?
> Miracle:  Works as documented.[2]
> [2]  Python is pretty miraculous, isn't it?

Yes, indeed it is!  :)


From martin at  Fri Jan 20 00:54:00 2012
From: martin at ("Martin v. Löwis")
Date: Fri, 20 Jan 2012 00:54:00 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jf9t7m$hu4$>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>	<1326901919.3395.67.camel@localhost.localdomain>	<>
Message-ID: <>

> I can't help noticing that so far, worries about the workload came mostly from
> people who don't actually bear that load (this is no accusation!), while those
> that do are the proponents of the PEP...

Ok, so let me add then that I'm worried about the additional work-load.

I'm particularly worried about the coordination of vacation across the
three people that work on a release. It might well not be possible to
make any release for a period of two months, which, in a six-months
release cycle with two alphas and a beta, might mean that we (the
release people) would need to adjust our vacation plans with the release
schedule, or else step down (unless you would release the "normal"
feature releases as source-only releases).

FWIW, it might well be that I can't be available for the 3.3 final
release (I haven't finalized my vacation schedule yet for August).


From vijaymajagaonkar at  Fri Jan 20 00:56:25 2012
From: vijaymajagaonkar at (Vijay N. Majagaonkar)
Date: Thu, 19 Jan 2012 18:56:25 -0500
Subject: [Python-Dev] python build failed on mac
Message-ID: <>

Hi all,

I am trying to build python 3 on mac and build failing with following error
can somebody help me with this

$ hg clone

$ ./configure
$ make

gcc   -framework CoreFoundation -o python.exe Modules/python.o
libpython3.3m.a -ldl  -framework CoreFoundation
./python.exe -SE -m sysconfig --generate-posix-vars
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
python.exe(43296) malloc: *** mmap(size=7310873954244194304) failed (error
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
make: *** [Lib/] Segmentation fault: 11
make: *** Deleting file `Lib/'


From victor.stinner at  Fri Jan 20 01:48:53 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 01:48:53 +0100
Subject: [Python-Dev] Counting collisions for the win
Message-ID: <>


I have been working on the hash collision issue for 2 or 3 weeks. I
evaluated all the solutions and I think that I now have a good knowledge
of the problem and how it should be solved. The major issue is having
little or no impact on applications (don't break backward
compatibility). I saw three major solutions:

 - use a randomized hash
 - use two hashes, a randomized hash and the actual hash kept for
backward compatibility
 - count collisions on dictionary lookup

Using a randomized hash does break a lot of tests (e.g. tests relying
on the representation of a dictionary). The patch is huge, too big to
backport it directly on stable versions. Using a randomized hash may
also break (indirectly) real applications because the application
output is also somehow "randomized". For example, in the Django test
suite, the HTML output is different at each run. Web browsers may
render the web page differently, or crash, or ... I don't think that
Django would like to sort attributes of each HTML tag, just because we
wanted to fix a vulnerability.
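
As an illustration of the kind of change an application would need in order
to keep its output stable under a randomized hash, here is a minimal sketch.
render_tag is a hypothetical helper, not Django code, and it leaves out
attribute escaping for brevity.

```python
def render_tag(name, attrs):
    """Render an opening tag with attributes in sorted, repeatable order."""
    # Sorting removes any dependence on dictionary iteration order.
    parts = ['%s="%s"' % (key, value) for key, value in sorted(attrs.items())]
    if not parts:
        return "<%s>" % name
    return "<%s %s>" % (name, " ".join(parts))


print(render_tag("a", {"href": "/", "class": "nav", "id": "home"}))
```

The output is the same on every run, at the price of a sort for each tag
rendered.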

A randomized hash also has a major issue: if the attacker is able to
compute the secret, (s)he can easily compute collisions and exploit
the hash collision vulnerability again. I don't know exactly how
complex it is to compute the secret, but our hash function is weak (it
is far from being cryptographic, it is really simple to run it
backward). If someone writes a fast function to compute the secret, we
will go back to the same point.

IMO using two hashes has the same disadvantages as the randomized hash
solution, while being more complex to implement.

The last solution is very simple: count collisions and raise an
exception if the count hits a limit. The patch is something like 10 lines
whereas the randomized hash is closer to 500 lines, adds a new
file, changes the Visual Studio project file, etc. First I thought that it
would break more applications than the randomized hash, but I tried on
Django: the test suite fails with a limit of 20 collisions, but not
with a limit of 50 collisions, whereas the patch uses a limit of 1000
collisions. According to my basic tests, a limit of 35 collisions
requires a dictionary with more than 10,000,000 integer keys to raise
an error. I am not talking about the attack, but valid data.

More details about my tests on the Django test suite:


I propose to solve the hash collision vulnerability by counting
collisions because it does fix the vulnerability with a minor or no
impact on applications or backward compatibility. I don't see why we
should use a different fix for Python 3.3. If counting collisions
solves the issue for stable versions, it is also enough for Python
3.3. We now know all issues of the randomized hash solution, and I
think that there are more drawbacks than advantages. IMO the
randomized hash is overkill to fix the hash collision issue.

I just have some requests on Marc Andre Lemburg's patch:

 - the limit should be configurable: a new function in the sys module
should be enough. It may be private (or replaced by an environment
variable?) in stable versions
 - the set type should also be patched (I didn't check if it is
vulnerable or not using the patch)
 - the patch has no test! (a class with a fixed hash should be enough
to write a test)
 - the limit must be documented somewhere
 - the exception type should be different than KeyError
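
As a rough illustration of the collision-counting idea and of the kind of
test suggested above (a class with a fixed hash), here is a pure-Python
sketch. It is not MAL's patch, which changes the C dict implementation.
The names CountingDict, TooManyCollisions and FixedHash are hypothetical,
the table has a fixed size, and linear probing stands in for CPython's
real probe sequence.

```python
class TooManyCollisions(Exception):
    """Hypothetical exception type, deliberately not a KeyError."""


class CountingDict:
    """Toy fixed-size open-addressing table that counts collisions per lookup."""

    def __init__(self, limit=1000, size=4096):
        if limit >= size:
            # Guarantees every probe ends: empty slot, match, or limit hit.
            raise ValueError("limit must be smaller than the table size")
        self.limit = limit
        self.slots = [None] * size  # each slot is None or a (key, value) pair

    def _find(self, key):
        size = len(self.slots)
        index = hash(key) % size
        collisions = 0
        while True:
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                return index
            collisions += 1
            if collisions > self.limit:
                raise TooManyCollisions(
                    "more than %d collisions on one lookup" % self.limit)
            index = (index + 1) % size  # linear probing, for simplicity

    def __setitem__(self, key, value):
        self.slots[self._find(key)] = (key, value)

    def __getitem__(self, key):
        entry = self.slots[self._find(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


class FixedHash:
    """Test helper: every instance has the same hash, so all keys collide."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return isinstance(other, FixedHash) and self.name == other.name
```

With limit=50, the first 51 FixedHash keys fit (the last one sees exactly
50 collisions) and the next insertion raises TooManyCollisions.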


From greg.ewing at  Thu Jan 19 22:41:17 2012
From: greg.ewing at (Greg)
Date: Fri, 20 Jan 2012 10:41:17 +1300
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Glyph wrote:
> [Guido] mentions the point that coroutines that can implicitly switch out from 
> under you have the same non-deterministic property as threads: you don't 
> know where you're going to need a lock or lock-like construct to update 
> any variables, so you need to think about concurrency more deeply than 
> if you could explicitly always see a 'yield'.

I'm not convinced that being able to see 'yield's will help
all that much. In any system that makes substantial use of
generator-based coroutines, you're going to see 'yield from's
all over the place, from the lowest to the highest levels.
But that doesn't mean you need a correspondingly large
number of locks. You can't look at a 'yield' and conclude
that you need a lock there or tell what needs to be locked.

There's no substitute for deep thought where any kind of
threading is involved, IMO.


From guido at  Fri Jan 20 03:47:13 2012
From: guido at (Guido van Rossum)
Date: Thu, 19 Jan 2012 18:47:13 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner <
victor.stinner at> wrote:

> Hi,
> I have been working on the hash collision issue for 2 or 3 weeks. I
> evaluated all solutions and I think that I now have a good understanding
> of the problem and how it should be solved. The major constraint is to
> have minor or no impact on applications (don't break backward
> compatibility). I saw three major solutions:
>  - use a randomized hash
>  - use two hashes, a randomized hash and the actual hash kept for
> backward compatibility
>  - count collisions on dictionary lookup
> Using a randomized hash does break a lot of tests (e.g. tests relying
> on the representation of a dictionary). The patch is huge, too big to
> backport directly to stable versions. Using a randomized hash may
> also break (indirectly) real applications because the application
> output is also somehow "randomized". For example, in the Django test
> suite, the HTML output is different at each run. Web browsers may
> render the web page differently, or crash, or ... I don't think that
> Django would like to sort attributes of each HTML tag, just because we
> wanted to fix a vulnerability.
> A randomized hash also has a major issue: if the attacker is able to
> compute the secret, (s)he can easily compute collisions and exploit
> the hash collision vulnerability again. I don't know exactly how
> complex it is to compute the secret, but our hash function is weak (it
> is far from being cryptographic, it is really simple to run it
> backward). If someone writes a fast function to compute the secret, we
> will go back to the same point.
> IMO using two hashes has the same disadvantages as the randomized hash
> solution, while being more complex to implement.
> The last solution is very simple: count collisions and raise an
> exception if the count hits a limit. The patch is something like 10 lines
> whereas the randomized hash is closer to 500 lines, adds a new
> file, changes the Visual Studio project file, etc. First I thought that it
> would break more applications than the randomized hash, but I tried on
> Django: the test suite fails with a limit of 20 collisions, but not
> with a limit of 50 collisions, whereas the patch uses a limit of 1000
> collisions. According to my basic tests, a limit of 35 collisions
> requires a dictionary with more than 10,000,000 integer keys to raise
> an error. I am not talking about the attack, but valid data.
> More details about my tests on the Django test suite:
> --
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisions
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.


> I just have some requests on Marc Andre Lemburg's patch:
>  - the limit should be configurable: a new function in the sys module
> should be enough. It may be private (or replaced by an environment
> variable?) in stable versions
>  - the set type should also be patched (I didn't check if it is
> vulnerable or not using the patch)
>  - the patch has no test! (a class with a fixed hash should be enough
> to write a test)
>  - the limit must be documented somewhere
>  - the exception type should be different than KeyError
> Victor

--Guido van Rossum (

From ncoghlan at  Fri Jan 20 03:49:29 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 20 Jan 2012 12:49:29 +1000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
	<> <jf9t7m$hu4$>
Message-ID: <>

On Fri, Jan 20, 2012 at 9:54 AM, "Martin v. Löwis" <martin at> wrote:
>> I can't help noticing that so far, worries about the workload came mostly from
>> people who don't actually bear that load (this is no accusation!), while those
>> that do are the proponents of the PEP...
> Ok, so let me add then that I'm worried about the additional work-load.
> I'm particularly worried about the coordination of vacation across the
> three people that work on a release. It might well not be possible to
> make any release for a period of two months, which, in a six-months
> release cycle with two alphas and a beta, might mean that we (the
> release people) would need to adjust our vacation plans with the release
> schedule, or else step down (unless you would release the "normal"
> feature releases as source-only releases).

I must admit that aspect had concerned me as well. Currently we use
the 18-24 month window for releases to slide things around to
accommodate the schedules of the RM, Martin (Windows binaries) and
Ned/Ronald (Mac OS X binaries).

Before we could realistically switch to more frequent releases,
something would need to change on the binary release side.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From anacrolix at  Fri Jan 20 04:01:19 2012
From: anacrolix at (Matt Joiner)
Date: Fri, 20 Jan 2012 14:01:19 +1100
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 8:41 AM, Greg <greg.ewing at> wrote:
> Glyph wrote:
>> [Guido] mentions the point that coroutines that can implicitly switch out
>> from under you have the same non-deterministic property as threads: you
>> don't know where you're going to need a lock or lock-like construct to
>> update any variables, so you need to think about concurrency more deeply
>> than if you could explicitly always see a 'yield'.
> I'm not convinced that being able to see 'yield's will help
> all that much. In any system that makes substantial use of
> generator-based coroutines, you're going to see 'yield from's
> all over the place, from the lowest to the highest levels.
> But that doesn't mean you need a correspondingly large
> number of locks. You can't look at a 'yield' and conclude
> that you need a lock there or tell what needs to be locked.
> There's no substitute for deep thought where any kind of
> threading is involved, IMO.
> --
> Greg

I wasn't aware that Guido had brought this up, and I believe what he
says to be true. Preemptive coroutines are just a hack around the
GIL, and reduce OS overheads. It's the explicit nature of the enhanced
generators that is their greatest value.

FWIW, I wrote a Python 3 compatible equivalent to gevent (also
greenlet based, and also very similar to the coroutine proposal from
Brett et al.), which didn't really solve the concurrency problems I hoped
it would. There were no guarantees whether functions would "switch out",
so all the locking and threading issues simply reemerged. On top of that,
all calls had to be non-blocking, which lost compatibility with any
routine that didn't make use of nonblocking calls and/or expose its
"yield" in the correct way. It did reduce GIL contention, but overall it
was not worth it.

In short, implicit coroutines are just a GIL workaround that breaks
compatibility for little gain.

Thanks Glyph for those links.

From ivan at  Fri Jan 20 04:32:13 2012
From: ivan at (Ivan Kozik)
Date: Fri, 20 Jan 2012 03:32:13 +0000
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 00:48, Victor Stinner
<victor.stinner at> wrote:
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisions
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.

I'd like to point out that an attacker is not limited to sending just
one dict full of colliding keys.  Given a 22ms stall for a dict full
of 1000 colliding keys, and 100 such objects inside a parent object
(perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
the raise-at-1000 approach doesn't solve the problem for everyone.

In addition, because the raise-at-N-collisions approach raises an
exception, everyone who wants to handle this error condition properly
has to change their code to catch a previously-unexpected exception.
(I know they're usually still better off with the fix, but why force
many people to change code when you can actually fix the hashing
problem?)

Another issue is that even with a configurable limit, different
modules can't have their own limits.  One module might want a
relatively safe raise-at-100, and another module creating massive
dicts might want raise-at-1000.  How does a developer know whether
they can raise or lower the limit, given that they use a bunch of
different modules?

I actually went with this stop-at-N-collisions approach by patching my
CPython a few years ago, where I limited dictobject and setobject's
critical `for` loop to 100 iterations (I realize this might handle
fewer than 100 collisions.)  This worked fine until I tried to compile
PyPy, where the translator blew up due to a massive dict.  This,
combined with the second problem (needing to catch an exception), led
me to abandon this approach and write Securetypes, which has a
securedict that uses SHA-1.  Not that I like this either; I think I'm
happy with the randomize-hash() approach.


From guido at  Fri Jan 20 04:48:16 2012
From: guido at (Guido van Rossum)
Date: Thu, 19 Jan 2012 19:48:16 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik <ivan at> wrote:

> On Fri, Jan 20, 2012 at 00:48, Victor Stinner
> <victor.stinner at> wrote:
> > I propose to solve the hash collision vulnerability by counting
> > collisions because it does fix the vulnerability with a minor or no
> > impact on applications or backward compatibility. I don't see why we
> > should use a different fix for Python 3.3. If counting collisions
> > solves the issue for stable versions, it is also enough for Python
> > 3.3. We now know all issues of the randomized hash solution, and I
> > think that there are more drawbacks than advantages. IMO the
> > randomized hash is overkill to fix the hash collision issue.
> I'd like to point out that an attacker is not limited to sending just
> one dict full of colliding keys.  Given a 22ms stall for a dict full
> of 1000 colliding keys, and 100 such objects inside a parent object
> (perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
> the raise-at-1000 approach doesn't solve the problem for everyone.

It's "just" a DoS attack. Those won't go away. We just need to raise the
effort needed for the attacker. The original attack would cause something
like 5 minutes of CPU usage per request (with a set of colliding keys that
could be computed once and used to attack every Python-run website in the
world). That's at least 2 orders of magnitude worse.

> In addition, because the raise-at-N-collisions approach raises an
> exception, everyone who wants to handle this error condition properly
> has to change their code to catch a previously-unexpected exception.
> (I know they're usually still better off with the fix, but why force
> many people to change code when you can actually fix the hashing
> problem?)

Why would anybody need to change their code? Every web framework worth its
salt has a top-level error catcher that logs the error, serves a 500
response, and possibly does other things like email the admin.
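
A minimal sketch of such a top-level error catcher, written as generic WSGI
middleware. The name catch_all is hypothetical and this is not the code of
any particular framework. It only covers exceptions raised while the
application callable itself runs, not errors raised later while the response
body is being iterated.

```python
import logging


def catch_all(app):
    """Wrap a WSGI app so that an unhandled exception becomes a 500 response."""

    def wrapped(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            # Log the traceback; a real deployment might also notify an admin.
            logging.exception("unhandled error while serving request")
            body = b"Internal Server Error"
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
            return [body]

    return wrapped
```

With a wrapper like this in place, an exception raised by a dict that hit a
collision limit would end one request with a 500 rather than take the server
process down.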

> Another issue is that even with a configurable limit, different
> modules can't have their own limits.  One module might want a
> relatively safe raise-at-100, and another module creating massive
> dicts might want raise-at-1000.  How does a developer know whether
> they can raise or lower the limit, given that they use a bunch of
> different modules?

I don't think it needs to be configurable. There just needs to be a way to
turn it off.

> I actually went with this stop-at-N-collisions approach by patching my
> CPython a few years ago, where I limited dictobject and setobject's
> critical `for` loop to 100 iterations (I realize this might handle
> fewer than 100 collisions.)  This worked fine until I tried to compile
> PyPy, where the translator blew up due to a massive dict.

I think that's because your collision-counting algorithm was much more
primitive than MAL's.

> This,
> combined with the second problem (needing to catch an exception), led
> me to abandon this approach and write Securetypes, which has a
> securedict that uses SHA-1.  Not that I like this either; I think I'm
> happy with the randomize-hash() approach.

Why did you need to catch the exception? Were you not happy with the
program simply terminating with a traceback when it got attacked?

--Guido van Rossum (

From brian at  Fri Jan 20 04:57:53 2012
From: brian at (Brian Curtin)
Date: Thu, 19 Jan 2012 21:57:53 -0600
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
	<> <jf9t7m$hu4$>
Message-ID: <>

On Thu, Jan 19, 2012 at 17:54, "Martin v. Löwis" <martin at> wrote:
> Ok, so let me add then that I'm worried about the additional work-load.
> I'm particularly worried about the coordination of vacation across the
> three people that work on a release. It might well not be possible to
> make any release for a period of two months, which, in a six-months
> release cycle with two alphas and a beta, might mean that we (the
> release people) would need to adjust our vacation plans with the release
> schedule, or else step down (unless you would release the "normal"
> feature releases as source-only releases).
> FWIW, it might well be that I can't be available for the 3.3 final
> release (I haven't finalized my vacation schedule yet for August).

In the interest of not having Windows releases depend on one person,
and having gone through building the installer myself (which I know is
but one of the duties), I'm available to help should you need it.

From steve at  Fri Jan 20 05:00:48 2012
From: steve at (Steven D'Aprano)
Date: Fri, 20 Jan 2012 15:00:48 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Victor Stinner wrote:

> The last solution is very simple: count collision and raise an
> exception if it hits a limit. ...
> According to my basic tests, a limit of 35 collisions
> requires a dictionary with more than 10,000,000 integer keys to raise
> an error. I am not talking about the attack, but valid data.

You might think that 10 million keys is a lot of data, but that's only about 
100 MB worth. I already see hardware vendors advertising computers with 6 GB 
RAM as "entry level", e.g. the HP Pavilion starts with 6GB expandable to 16GB. 
I expect that there are already people using Python who will unpredictably hit 
that limit by accident, and the number will only grow as computers get more 

With a limit of 35 collisions, it only takes 35 keys to force a dict to 
raise an exception, if you are an attacker able to select colliding keys. 
We're trying to defend against an attacker who is able to force collisions, 
not one who is waiting for accidental collisions. I don't see that causing the 
dict to raise an exception helps matters: it just changes the attack from 
"keep the dict busy indefinitely" to "cause an exception and crash the 
application".

This moves responsibility for dealing with collisions out of the dict and 
into the application code. Instead of solving the problem in one place (the 
built-in dict), now every application that uses dicts has to identify which 
dicts can be attacked, and deal with the exception.

That pushes the responsibility for security onto people who are the least 
willing or able to deal with it: the average developer, who neither 
understands nor cares about security, or if they do care, they can't convince 
their manager to care.

I suppose an exception is an improvement over the application hanging 
indefinitely, but I'd hardly call it a fix.

Ruby uses randomized hashes. Are there any other languages with a dict or 
mapping class that raises on too many collisions?


From ivan at  Fri Jan 20 05:06:25 2012
From: ivan at (Ivan Kozik)
Date: Fri, 20 Jan 2012 04:06:25 +0000
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 03:48, Guido van Rossum <guido at> wrote:
> I think that's because your collision-counting algorithm was much more
> primitive than MAL's.


>> This,
>> combined with the second problem (needing to catch an exception), led
>> me to abandon this approach and write Securetypes, which has a
>> securedict that uses SHA-1.  Not that I like this either; I think I'm
>> happy with the randomize-hash() approach.
> Why did you need to catch the exception? Were you not happy with the program
> simply terminating with a traceback when it got attacked?

No, I wasn't happy with termination.  I wanted to treat it just like a
JSON decoding error, and send the appropriate response.

I actually forgot to mention the main reason I abandoned the
stop-at-N-collisions approach.  I had a server with a dict that stayed
in memory, across many requests.  It was being populated with
identifiers chosen by clients.  I couldn't have my server stay broken
if this dict filled up with a bunch of colliding keys.  (I don't think
I could have done another thing either, like nuke the dict or evict
some keys.)


From carl at  Fri Jan 20 05:54:18 2012
From: carl at (Carl Meyer)
Date: Thu, 19 Jan 2012 21:54:18 -0700
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Victor,

On 01/19/2012 05:48 PM, Victor Stinner wrote:
> Using a randomized hash may
> also break (indirectly) real applications because the application
> output is also somehow "randomized". For example, in the Django test
> suite, the HTML output is different at each run. Web browsers may
> render the web page differently, or crash, or ... I don't think that
> Django would like to sort attributes of each HTML tag, just because we
> wanted to fix a vulnerability.

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it. (I presume that it is true, as it sounds
like you experienced it directly; I don't have time to play around at
the moment, but I'm surprised we haven't seen bug reports about it from
users of 64-bit Pythons long ago). I can't speak for the core team, but
I doubt there would be much disagreement on this point: ideally Django
would run equally well on any implementation of Python, and as far as I
know none of the alternative implementations guarantee hash or
dict-ordering compatibility with CPython.

I don't have the expertise to speak otherwise to the alternatives for
fixing the collisions vulnerability, but I don't believe it's accurate
to presume that Django would not want to fix a dict-ordering dependency,
and use that as a justification for one approach over another.



From ncoghlan at  Fri Jan 20 06:15:16 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 20 Jan 2012 15:15:16 +1000
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano <steve at> wrote:
> With a limit of 35 collisions, it only takes 35 keys to force a dict to
> raise an exception, if you are an attacker able to select colliding keys.
> We're trying to defend against an attacker who is able to force collisions,
> not one who is waiting for accidental collisions. I don't see that causing
> the dict to raise an exception helps matters: it just changes the attack
> from "keep the dict busy indefinitely" to "cause an exception and crash the
> application".

No, that's fundamentally misunderstanding the nature of the attack.
The reason the hash collision attack is a problem is because it allows
you to DoS a web service in a way that requires minimal client side
resources but can have a massive effect on the server. The attacker is
making a single request that takes the server an inordinately long
time to process, consuming CPU resources all the while, and likely
preventing the handling of any other requests (especially for an
event-based server, since the attack is CPU based, bypassing all use
of asynchronous IO).
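
To make the asymmetry concrete, here is a small sketch that counts how many
key comparisons a dict performs when every key has the same hash. The class
Colliding and the helper comparisons_to_insert are made up for illustration.
A real attack uses strings crafted to collide, not a custom __hash__, but the
cost pattern is the same: each new key is compared against the colliding keys
already stored, so the total work grows roughly with the square of the number
of keys.

```python
class Colliding:
    """Key with a constant hash, so every instance follows the same probe chain."""

    comparisons = 0  # class-wide counter of __eq__ calls

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1

    def __eq__(self, other):
        Colliding.comparisons += 1
        return self.name == other.name


def comparisons_to_insert(count):
    """Insert `count` colliding keys into a dict; return the number of __eq__ calls."""
    Colliding.comparisons = 0
    table = {}
    for i in range(count):
        table[Colliding(i)] = None
    return Colliding.comparisons


if __name__ == "__main__":
    # Doubling the number of keys roughly quadruples the comparisons.
    for count in (100, 200, 400):
        print(count, comparisons_to_insert(count))
```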

With the 1000 collision limit in place, the attacker sends their
massive request, the affected dict quickly hits the limit, throws an
unhandled exception which is then caught by the web framework and
turned into a 500 Error response (or whatever's appropriate for the
protocol being attacked).

If a given web service doesn't *already* have a catch all handler to
keep an unexpected exception from bringing the entire service down,
then DoS attacks like this one are the least of its worries.

As for why other languages haven't gone this way, I have no idea.
There are lots of details relating to a language's hash and hash map
design that will drive how suitable randomisation is as an answer, and
it also depends greatly on how you decide to characterise the threat.

FWIW, Victor's analysis in the opening post of this thread matches the
conclusions I came to a few days ago, although he's been over the
alternatives far more thoroughly than I have.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri Jan 20 06:18:36 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 20 Jan 2012 15:18:36 +1000
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer <carl at> wrote:
> I don't have the expertise to speak otherwise to the alternatives for
> fixing the collisions vulnerability, but I don't believe it's accurate
> to presume that Django would not want to fix a dict-ordering dependency,
> and use that as a justification for one approach over another.

It's more a matter of wanting deployment of a security fix to be as
painless as possible - a security fix that system administrators can't
deploy because it breaks critical applications may as well not exist.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From v+python at  Fri Jan 20 06:24:55 2012
From: v+python at (Glenn Linderman)
Date: Thu, 19 Jan 2012 21:24:55 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/19/2012 8:54 PM, Carl Meyer wrote:
> Hi Victor,
> On 01/19/2012 05:48 PM, Victor Stinner wrote:
> [snip]
>> Using a randomized hash may
>> also break (indirectly) real applications because the application
>> output is also somehow "randomized". For example, in the Django test
>> suite, the HTML output is different at each run. Web browsers may
>> render the web page differently, or crash, or ... I don't think that
>> Django would like to sort attributes of each HTML tag, just because we
>> wanted to fix a vulnerability.
> I'm a Django core developer, and if it is true that our test-suite has a
> dictionary-ordering dependency that is expressed via HTML attribute
> ordering, I consider that a bug and would like to fix it. I'd be
> grateful for, not resentful of, a change in CPython that revealed the
> bug and prompted us to fix it. (I presume that it is true, as it sounds
> like you experienced it directly; I don't have time to play around at
> the moment, but I'm surprised we haven't seen bug reports about it from
> users of 64-bit Pythons long ago). I can't speak for the core team, but
> I doubt there would be much disagreement on this point: ideally Django
> would run equally well on any implementation of Python, and as far as I
> know none of the alternative implementations guarantee hash or
> dict-ordering compatibility with CPython.
> I don't have the expertise to speak otherwise to the alternatives for
> fixing the collisions vulnerability, but I don't believe it's accurate
> to presume that Django would not want to fix a dict-ordering dependency,
> and use that as a justification for one approach over another.
> Carl

It might be a good idea to have a way to seed the hash with some value 
to allow testing with different dict orderings -- this would allow tests 
to be developed using one Python implementation that would be immune to 
the different orderings on different implementations; however, 
randomizing the hash not only doesn't solve the problem for long-running 
applications, it causes non-deterministic performance from one run to 
the next even with the exact same data: a different (random) seed could 
cause collisions sporadically with data that usually gave good 
performance results, and there would be little explanation for it, and 
little way to reproduce the problem to report it or understand it.
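
For reference, a sketch of what a forced seed looks like in practice. It
relies on the PYTHONHASHSEED environment variable, which is not part of this
thread: it was added in later CPython releases that shipped hash
randomization. The helper name string_hash_with_seed is hypothetical. It
starts a fresh interpreter with a fixed seed and reports the hash of a
string, so two runs with the same seed can be compared.

```python
import os
import subprocess
import sys


def string_hash_with_seed(text, seed):
    """Return hash(text) as computed by a fresh interpreter using the given seed."""
    # PYTHONHASHSEED must be set before the child interpreter starts.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env)
    return int(output)


if __name__ == "__main__":
    # Same seed, same hash: a failing run can be reproduced by reusing its seed.
    print(string_hash_with_seed("spam", 123))
    print(string_hash_with_seed("spam", 123))
```

A test suite could run itself under a few different seeds to flush out
dict-ordering dependencies, and log the seed so that a failure can be
replayed.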

From glyph at  Fri Jan 20 07:28:22 2012
From: glyph at (Glyph)
Date: Fri, 20 Jan 2012 01:28:22 -0500
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 19, 2012, at 4:41 PM, Greg wrote:

> Glyph wrote:
>> [Guido] mentions the point that coroutines that can implicitly switch out from under you have the same non-deterministic property as threads: you don't know where you're going to need a lock or lock-like construct to update any variables, so you need to think about concurrency more deeply than if you could explicitly always see a 'yield'.
> I'm not convinced that being able to see 'yield's will help
> all that much.

Well, apparently we disagree, and I work on such a system all day, every day :-).  It was nice to see that Matt Joiner also agreed for very similar reasons, and at least I know I'm not crazy.

> In any system that makes substantial use of
> generator-based coroutines, you're going to see 'yield from's
> all over the place, from the lowest to the highest levels.
> But that doesn't mean you need a correspondingly large
> number of locks. You can't look at a 'yield' and conclude
> that you need a lock there or tell what needs to be locked.

Yes, but you can look at a 'yield' and conclude that you might need a lock, and that you have to think about it.
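
A toy sketch of that point, using plain generators and a hypothetical
round-robin scheduler (run_round_robin, deposit_with_yield and deposit_atomic
are illustrative names, not from any library). The only place another
coroutine can run is at a 'yield', so the 'yield' between the read and the
write is exactly the spot that needs a second look.

```python
def deposit_with_yield(account, amount):
    current = account["balance"]            # read
    yield                                   # switch point: others may run here
    account["balance"] = current + amount   # write based on a possibly stale read


def deposit_atomic(account, amount):
    account["balance"] += amount            # read and write with no yield between
    yield


def run_round_robin(coroutines):
    """Advance each generator one step at a time until all are finished."""
    pending = list(coroutines)
    while pending:
        for coroutine in list(pending):
            try:
                next(coroutine)
            except StopIteration:
                pending.remove(coroutine)


def final_balance(deposit):
    """Run two deposits of 10 concurrently and return the resulting balance."""
    account = {"balance": 0}
    run_round_robin([deposit(account, 10), deposit(account, 10)])
    return account["balance"]


if __name__ == "__main__":
    print(final_balance(deposit_with_yield))  # one update is lost
    print(final_balance(deposit_atomic))      # both updates are kept
```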

Further exploration of my own feelings on the subject grew a bit beyond a good length for a reply here, so if you're interested in my thoughts you can have a look at my blog: <>.

> There's no substitute for deep thought where any kind of threading is involved, IMO.

Sometimes there's no alternative, but wherever I can, I avoid thinking, especially hard thinking.  This maxim has served me very well throughout my programming career ;-).



From hs at  Fri Jan 20 10:29:06 2012
From: hs at (Hynek Schlawack)
Date: Fri, 20 Jan 2012 10:29:06 +0100
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

Hello Vijay 

Am Freitag, 20. Januar 2012 um 00:56 schrieb Vijay N. Majagaonkar:

> I am trying to build python 3 on mac and build failing with following error can somebody help me with this

It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python:

make clean
CC=clang ./configure && make -s

works though (despite the abundant warnings).


From martin at  Fri Jan 20 10:34:09 2012
From: martin at ("Martin v. Löwis")
Date: Fri, 20 Jan 2012 10:34:09 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> The last solution is very simple: count collisions and raise an
> exception if the count hits a limit. The patch is something like 10 lines
> whereas the randomized hash is closer to 500 lines, adds a new
> file, changes the Visual Studio project file, etc. At first I thought that it
> would break more applications than the randomized hash

The main issue with that approach is that it allows a new kind of attack.

An attacker now needs to find 1000 colliding keys, and submit them
one-by-one into a database. The limit will not trigger, as those are
just database insertions.

Now, if the application also has a need to read the entire database
table into a dictionary, that will suddenly break, and not for the
attacker (which would be ok), but for the regular user of the
application or the site administrator.

So it may be that this approach actually simplifies the attack, making
the cure worse than the disease.


From hrvoje.niksic at  Fri Jan 20 10:49:06 2012
From: hrvoje.niksic at (Hrvoje Niksic)
Date: Fri, 20 Jan 2012 10:49:06 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

On 01/18/2012 06:55 PM, "Martin v. Löwis" wrote:
> I was thinking about adding the field at the end,

Will this make all strings larger, or only those that create dict 
collisions?  Making all strings larger to fix this issue sounds like a 
really bad idea.

Also, would it be acceptable to simply not cache the alternate hash? 
The cached string hash is an optimization anyway.


From pydev at  Fri Jan 20 10:57:44 2012
From: pydev at (Frank Sievertsen)
Date: Fri, 20 Jan 2012 10:57:44 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> The main issue with that approach is that it allows a new kind of attack.

Indeed, I posted another example:

This kind of fix can be used in a specific application or maybe in a
special-purpose framework, but not on the level of a general-purpose


From ncoghlan at  Fri Jan 20 11:06:32 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 20 Jan 2012 20:06:32 +1000
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 7:34 PM, "Martin v. Löwis" <martin at> wrote:
> The main issue with that approach is that it allows a new kind of attack.
> An attacker now needs to find 1000 colliding keys, and submit them
> one-by-one into a database. The limit will not trigger, as those are
> just database insertions.
> Now, if the application also has a need to read the entire database
> table into a dictionary, that will suddenly break, and not for the
> attacker (which would be ok), but for the regular user of the
> application or the site administrator.
> So it may be that this approach actually simplifies the attack, making
> the cure worse than the disease.

Ouch, I think you're right. So hash randomisation may be the best
option, and admins will need to test for themselves to see if it
breaks things...


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mark at  Fri Jan 20 11:49:05 2012
From: mark at (Mark Shannon)
Date: Fri, 20 Jan 2012 10:49:05 +0000
Subject: [Python-Dev] Changing the order of iteration over a dictionary
In-Reply-To: <>
References: <>	<>	<1326889813.3395.37.camel@localhost.localdomain>	<>	<>	<>	<>
Message-ID: <>


One of the main sticking points over possible fixes for the 
hash-collision security issue seems to be a fear that changing the 
iteration order of a dictionary will break backwards compatibility.

The order of iteration has never been specified. In fact not only is it 
arbitrary, it cannot be determined from the contents of a dict alone; it 
may depend on the insertion order.
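
The insertion-order point can be illustrated with a minimal sketch. This is a toy open-addressing table with linear probing, not CPython's actual dict; the function name `iteration_order` and the use of the integer key as its own hash are assumptions made purely for illustration.

```python
def iteration_order(keys, size=8):
    """Insert integer keys into a toy open-addressing table and
    return them in slot order, which is the order iteration would see.

    Assumptions: the key is its own hash, collisions are resolved by
    linear probing, keys are distinct, and the table never resizes
    (so len(keys) must be smaller than size).
    """
    slots = [None] * size
    for key in keys:
        i = key % size
        while slots[i] is not None:   # slot taken: try the next one
            i = (i + 1) % size
        slots[i] = key
    return [key for key in slots if key is not None]


# 0 and 8 both want slot 0, so whichever arrives first gets it.
print(iteration_order([0, 8]))   # [0, 8]
print(iteration_order([8, 0]))   # [8, 0]
```

Both tables hold the same two keys, yet they iterate in different orders, so the contents alone do not determine the order.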

Changing a hash function is not the only change that will change the 
iteration order; any of the following will also do so:
* Changing the minimum size of a dict.
* Changing the load factor of a dict.
* Changing the resizing policy of a dict.
* Sharing of keys between dicts.

By treating iteration order as part of the API we are effectively ruling 
out ever making any improvements to the dict.

For example, my new dictionary implementation
reduces memory use by 47% for gcbench, and by about 20% for the 2to3 
benchmark, on my 32bit machine.
(Nice graphs: )

The new dict implementation (necessarily) changes the iteration order 
and will break code that relies on it.

If dict iteration order is to be treated as part of the API (and I think 
that is a very bad idea) then it should be documented, which will be 
difficult since it is barely deterministic.
This will also be a major problem for PyPy, Jython and IronPython, as 
they will have to reimplement their dicts.

So, don't be afraid to change that hash function :)


From fdrake at  Fri Jan 20 12:11:47 2012
From: fdrake at (Fred Drake)
Date: Fri, 20 Jan 2012 06:11:47 -0500
Subject: [Python-Dev] Changing the order of iteration over a dictionary
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 20, 2012 at 5:49 AM, Mark Shannon <mark at> wrote:
> So, don't be afraid to change that hash function :)


The hash function *has* been changed in the past, and a lot of developers
were schooled in not relying on the iteration order.  That's a good thing,
as those developers now write tests of what's actually important rather
than relying on implementation details of the Python runtime.

A hash function that changes more often than during an occasional major
version update will encourage more developers to write better tests.  We
can think of it as an educational tool.


Fred L. Drake, Jr.    <fdrake at>
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens

From ncoghlan at  Fri Jan 20 12:32:04 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 20 Jan 2012 21:32:04 +1000
Subject: [Python-Dev] Changing the order of iteration over a dictionary
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 20, 2012 at 8:49 PM, Mark Shannon <mark at> wrote:
> So, don't be afraid to change that hash function :)

Changing it for 3.3 isn't really raising major concerns: the real
concern is with changing it in maintenance and security patches for
earlier releases. Security patches that may break production
applications aren't desirable, since it means admins have to weigh up
the risk of being affected by the security vulnerability against the
risk of breakage from the patch itself.

The collision counting approach was attractive because it looked like
it might offer a way out that was less likely to break deployed
systems. Unfortunately, I think the point Martin raised about just
opening a new (even more subtle) attack vector kills that idea dead.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From victor.stinner at  Fri Jan 20 12:46:39 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 12:46:39 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/20 Ivan Kozik <ivan at>:
> I'd like to point out that an attacker is not limited to sending just
> one dict full of colliding keys.  Given a 22ms stall ...

The presented attack produces a stall of at least 30 seconds (5
minutes or more if there is no time limit in the application), not
0.022 seconds. You have to send a lot of requests to produce a DoS if a
single request eats just 22 ms. I suppose that there are a lot of
other kinds of requests that take much longer than 22 ms, even valid

> Another issue is that even with a configurable limit, different
> modules can't have their own limits.  One module might want a
> relatively safe raise-at-100, and another module creating massive
> dicts might want raise-at-1000.  How does a developer know whether
> they can raise or lower the limit, given that they use a bunch of
> different modules?

Python becomes really slow when you have more than N collisions
(an O(n^2) problem). If an application hits this limit with valid data,
it is time to use another data structure or a different hash
function. We have to do more tests to choose N correctly, but
according to my first tests, it looks like N=1000 is a safe limit.

Marc Andre's patch doesn't count all "collisions", but only
"collisions" that require comparing objects. When two objects have the
same hash value, the open addressing algorithm searches for a free bucket.
If a bucket is not free but has a different hash value, the objects
are not compared and the collision counter is not incremented. The
limit is only reached when you have N objects having the same hash
value modulo the size of the bucket (hash(str) & DICT_MASK).
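
A rough sketch of the counting rule described above, under stated assumptions: this is a toy fixed-size table, not Marc Andre's patch and not CPython's dict. The names `CountingTable` and `TooManyCollisions` are hypothetical, hashes are passed in by hand, and the probing step is modeled on CPython's perturbation scheme as I understand it.

```python
PERTURB_SHIFT = 5  # shift used by CPython's dict probing


class TooManyCollisions(Exception):
    """Hypothetical exception name, used only in this sketch."""


class CountingTable:
    """Toy fixed-size open-addressing table (size must be a power of two).

    Hashes are passed in explicitly and must be non-negative; the table
    never resizes, so the caller must leave at least one slot free.
    """

    def __init__(self, size=64, limit=1000):
        self.mask = size - 1
        self.slots = [None] * size      # each slot: None or (hash, key, value)
        self.limit = limit

    def insert(self, key, value, h):
        """Store key/value; return the number of counted collisions."""
        i = perturb = h
        counted = 0
        while True:
            slot = i & self.mask
            entry = self.slots[slot]
            if entry is None:
                self.slots[slot] = (h, key, value)
                return counted
            if entry[0] == h:           # same full hash: keys get compared
                if entry[1] == key:
                    self.slots[slot] = (h, key, value)
                    return counted
                counted += 1
                if counted > self.limit:
                    raise TooManyCollisions(key)
            # Different full hash: move on without comparing or counting.
            i = 5 * i + perturb + 1
            perturb >>= PERTURB_SHIFT


table = CountingTable(size=64, limit=3)
for n in range(4):
    table.insert("k%d" % n, n, h=42)    # counted collisions: 0, 1, 2, 3
table.insert("other", 0, h=106)         # same first slot, different hash: 0 counted
try:
    table.insert("k4", 4, h=42)         # a fourth comparison exceeds limit=3
except TooManyCollisions:
    print("limit hit")
```

Only keys with an identical full hash advance the counter; "other" shares the first slot with the "k" keys but is never compared against them.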

When there are not enough empty buckets (this happens before all buckets
are full), Python resizes the dictionary (it does something like size
= size * 2) and so it uses at least one more bit each time that the
dictionary is resized. Collisions are very likely with a small
dictionary, but become rarer each time that the dictionary is
resized. It means that the number of potential collisions (with valid
data) decreases when the dictionary grows. Tell me if I am wrong.
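
To make the resizing argument concrete, here is a small sketch with made-up hash values (the helper name `home_slots` is an assumption for illustration). It only shows the first slot each hash maps to, not the full probe sequence.

```python
def home_slots(hashes, size):
    """Return the first slot each hash maps to in a table of `size`
    slots (size is a power of two, so the mask is size - 1)."""
    mask = size - 1
    return [h & mask for h in hashes]


hashes = [1, 9, 17, 25]   # made-up hashes that agree in their low 3 bits
for size in (8, 16, 32):
    print(size, home_slots(hashes, size))
# prints: 8 [1, 1, 1, 1]
# prints: 16 [1, 9, 1, 9]
# prints: 32 [1, 9, 17, 25]
```

Each doubling brings one more hash bit into play, so keys that merely share low bits drift apart. Keys with an identical full hash value would still land in the same slot at any size.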

From victor.stinner at  Fri Jan 20 13:08:43 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 13:08:43 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> I'm surprised we haven't seen bug reports about it from users
> of 64-bit Pythons long ago

A Python dictionary only uses the lower bits of a hash value. If your
dictionary has fewer than 2**32 items, the dictionary order is exactly
the same on 32-bit and 64-bit systems: hash32(str) & mask == hash64(str) &
mask for mask <= 2**32-1.

From frank at  Fri Jan 20 13:12:57 2012
From: frank at (Frank Sievertsen)
Date: Fri, 20 Jan 2012 13:12:57 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

No, that's not true.
Whenever a collision happens, other bits are mixed in very fast.


On 20.01.2012 13:08, Victor Stinner wrote:
>> I'm surprised we haven't seen bug reports about it from users
>> of 64-bit Pythons long ago
> A Python dictionary only uses the lower bits of a hash value. If your
> dictionary has less than 2**32 items, the dictionary order is exactly
> the same on 32 and 64 bits system: hash32(str) & mask == hash64(str) &
> mask for mask <= 2**32-1.

From victor.stinner at  Fri Jan 20 13:42:37 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 13:42:37 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/20 Frank Sievertsen <frank at>:
> No, that's not true.
> Whenever a collision happens, other bits are mixed in very fast.

Oh, I didn't know that. So the dict order is only the same if there is
no collision.
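
A minimal sketch of how the higher bits get mixed in after a collision. It is modeled on the perturbation loop in CPython's dict lookup as I understand it (recurrence `i = 5*i + perturb + 1` with `perturb >>= 5`); the function name `probe_slots` and the sample hashes are assumptions for illustration, and hashes are taken to be non-negative.

```python
PERTURB_SHIFT = 5


def probe_slots(h, mask, count):
    """Return the first `count` slots probed for non-negative hash `h`."""
    slots = []
    i = perturb = h
    for _ in range(count):
        slots.append(i & mask)
        i = 5 * i + perturb + 1
        perturb >>= PERTURB_SHIFT
    return slots


# 10 and 42 agree in their low 3 bits, so both start at slot 2,
# but the bits above the mask soon send them to different slots.
print(probe_slots(10, 7, 4))   # [2, 5, 2, 3]
print(probe_slots(42, 7, 4))   # [2, 5, 3, 0]
```

So two hashes that are equal under the mask only share the start of their probe sequence. Once a collision forces further probing, bits outside the mask influence where a key ends up, and with it the iteration order.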


From victor.stinner at  Fri Jan 20 13:50:18 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 13:50:18 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> The main issue with that approach is that it allows a new kind of attack.
> An attacker now needs to find 1000 colliding keys, and submit them
> one-by-one into a database. The limit will not trigger, as those are
> just database insertions.
> Now, if the application also has a need to read the entire database
> table into a dictionary, that will suddenly break, and not for the
> attacker (which would be ok), but for the regular user of the
> application or the site administrator.

Oh, good catch. But I would not call it a new kind of attack; it is
just a particular case of the hash collision vulnerability.

Counting collisions doesn't solve this case, but it doesn't make the
situation worse than before. Quickly raising an exception is better
than stalling for minutes, even if I agree that it is not the best
behaviour. The best would be to answer quickly with the expected
result :-) (using a different data structure or a different hash

Right now, I don't see any countermeasure against this case.


From p.f.moore at  Fri Jan 20 13:57:45 2012
From: p.f.moore at (Paul Moore)
Date: Fri, 20 Jan 2012 12:57:45 +0000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>
	<> <jf9t7m$hu4$>
Message-ID: <>

On 20 January 2012 03:57, Brian Curtin <brian at> wrote:
>> FWIW, it might well be that I can't be available for the 3.3 final
>> release (I haven't finalized my vacation schedule yet for August).
> In the interest of not having Windows releases depend on one person,
> and having gone through building the installer myself (which I know is
> but one of the duties), I'm available to help should you need it.

One thought comes to mind - while we need a PEP to make a permanent
change to the release schedule, would it be practical in any way to do
a "trial run" of the process, and simply aim to release 3.4 about 6
months after 3.3? Based on the experiences gained from that, some of
the discussions around this PEP could be supported (or not :-)) with
more concrete information. If we can't do that, then that says
something about the practicality of the proposal in itself...

The plan for 3.4 would need to be publicised well in advance, of
course, but doing that as a one-off exercise might well be viable.


PS I have no view on whether the proposal is a good idea or a bad idea
from a RM point of view. That's entirely up to the people who do the
work to decide, in my opinion.

From barry at  Fri Jan 20 14:10:30 2012
From: barry at (Barry Warsaw)
Date: Fri, 20 Jan 2012 08:10:30 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:

>Counting collisions doesn't solve this case, but it doesn't make the
>situation worse than before. Quickly raising an exception is better
>than stalling for minutes, even if I agree that it is not the best

ISTM that adding the possibility of raising a new exception on dictionary
insertion is *more* backward incompatible than changing dictionary order,
which for a very long time has been known to not be guaranteed.  You're
running some application, you upgrade Python because you apply all security
fixes, and suddenly you're starting to get exceptions in places you can't
really do anything about.  Yet those exceptions are now part of the documented
public API for dictionaries.  This is asking for trouble.  Bugs will suddenly
start appearing in that application's tracker and they will seem to the
application developer like Python just added a new public API in a security

OTOH, if you change dictionary order and *that* breaks the application, then
the bugs submitted to the application's tracker will be legitimate bugs that
have to be fixed even if nothing else changed.

So I still think we should ditch the paranoia about dictionary order changing,
and fix this without counting.  A little bit of paranoia could creep back in
by disabling the hash fix by default in stable releases, but I think it would
be fine to make that a compile-time option.


From barry at  Fri Jan 20 14:17:05 2012
From: barry at (Barry Warsaw)
Date: Fri, 20 Jan 2012 08:17:05 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 20, 2012, at 03:18 PM, Nick Coghlan wrote:

>On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer <carl at> wrote:
>> I don't have the expertise to speak otherwise to the alternatives for
>> fixing the collisions vulnerability, but I don't believe it's accurate
>> to presume that Django would not want to fix a dict-ordering dependency,
>> and use that as a justification for one approach over another.
>It's more a matter of wanting deployment of a security fix to be as
>painless as possible - a security fix that system administrators can't
>deploy because it breaks critical applications may as well not exist.

True, but collision counting is worse IMO.  It's just as likely (maybe) that
an application would start getting new exceptions on dictionary insertion, as
they would failures due to dictionary order changes.  Unfortunately, in the
former case it's because Python just added a new public API in a security
release (the new exception *is* public API).  In the latter case, no new API
was added, but something exposed an already existing bug in the application.
That's still a bug in the application even if counting was added.  It's also a
bug that any number of changes in the environment, or OS vendor deployment,
could have triggered.

-1 for collision counting.


From barry at  Fri Jan 20 14:20:55 2012
From: barry at (Barry Warsaw)
Date: Fri, 20 Jan 2012 08:20:55 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 20, 2012, at 03:15 PM, Nick Coghlan wrote:

>With the 1000 collision limit in place, the attacker sends their
>massive request, the affected dict quickly hits the limit, throws an
>unhandled exception which is then caught by the web framework and
>turned into a 500 Error response (or whatever's appropriate for the
>protocol being attacked).

Let's just be clear about it: this exception is new public API.  Changing
dictionary order is not.

For me, that comes down firmly on the side of the latter rather than the
former for stable releases.


From solipsis at  Fri Jan 20 14:51:59 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 20 Jan 2012 14:51:59 +0100
Subject: [Python-Dev] Counting collisions for the win
References: <>
Message-ID: <>

On Fri, 20 Jan 2012 13:50:18 +0100
Victor Stinner <victor.stinner at> wrote:

> > The main issue with that approach is that it allows a new kind of attack.
> >
> > An attacker now needs to find 1000 colliding keys, and submit them
> > one-by-one into a database. The limit will not trigger, as those are
> > just database insertions.
> >
> > Now, if the application also has a need to read the entire database
> > table into a dictionary, that will suddenly break, and not for the
> > attacker (which would be ok), but for the regular user of the
> > application or the site administrator.
> Oh, good catch. But I would not call it a new kind of attack; it is
> just a particular case of the hash collision vulnerability.
> Counting collisions doesn't solve this case, but it doesn't make the
> situation worse than before. Quickly raising an exception is better
> than stalling for minutes, even if I agree that it is not the best
> behaviour.

Actually, it *is* worse because stalling for seconds or minutes may not
be a problem in some cases (e.g. some batch script that gets run



From solipsis at  Fri Jan 20 14:53:47 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 20 Jan 2012 14:53:47 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
References: <>
	<> <jf9t7m$hu4$>
Message-ID: <>

On Fri, 20 Jan 2012 12:57:45 +0000
Paul Moore <p.f.moore at> wrote:
> On 20 January 2012 03:57, Brian Curtin <brian at> wrote:
> >> FWIW, it might well be that I can't be available for the 3.3 final
> >> release (I haven't finalized my vacation schedule yet for August).
> >
> > In the interest of not having Windows releases depend on one person,
> > and having gone through building the installer myself (which I know is
> > but one of the duties), I'm available to help should you need it.
> One thought comes to mind - while we need a PEP to make a permanent
> change to the release schedule, would it be practical in any way to do
> a "trial run" of the process, and simply aim to release 3.4 about 6
> months after 3.3?

It sounds reasonable to me, although we probably wouldn't market it as
a "trial run".



From meadori at  Fri Jan 20 16:02:24 2012
From: meadori at (Meador Inge)
Date: Fri, 20 Jan 2012 09:02:24 -0600
Subject: [Python-Dev] [Python-checkins] cpython (3.1): Closes #13807:
 Now checks for sys.stderr being there before writing to it.
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 5:32 AM, vinay.sajip <python-checkins at> wrote:
> changeset:   74538:73dad4940b88
> branch:      3.1

I thought that the 3.1 branch is in security mode?  Is this a security
related fix?
From my brief scan of the changeset, it doesn't seem to be.

> parent:      74253:fb5707168351
> user:        Vinay Sajip <vinay_sajip at>
> date:        Fri Jan 20 11:23:02 2012 +0000
> summary:
>  Closes #13807: Now checks for sys.stderr being there before writing to it.
> files:
>  Lib/logging/ |  2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> diff --git a/Lib/logging/ b/Lib/logging/
> --- a/Lib/logging/
> +++ b/Lib/logging/
> @@ -721,7 +721,7 @@
>          You could, however, replace this with a custom handler if you wish.
>          The record which was being processed is passed in to this method.
>          """
> -        if raiseExceptions:
> +        if raiseExceptions and sys.stderr:  # see issue 13807
>              ei = sys.exc_info()
>              try:
>                  traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr)

# Meador

From guido at  Fri Jan 20 16:33:28 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 07:33:28 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 1:34 AM, "Martin v. Löwis" <martin at> wrote:

> > The last solution is very simple: count collisions and raise an
> > exception if the count hits a limit. The patch is something like 10 lines
> > whereas the randomized hash is closer to 500 lines, adds a new
> > file, changes the Visual Studio project file, etc. At first I thought that it
> > would break more applications than the randomized hash
> The main issue with that approach is that it allows a new kind of attack.
> An attacker now needs to find 1000 colliding keys, and submit them
> one-by-one into a database. The limit will not trigger, as those are
> just database insertions.
> Now, if the application also has a need to read the entire database
> table into a dictionary, that will suddenly break, and not for the
> attacker (which would be ok), but for the regular user of the
> application or the site administrator.
> So it may be that this approach actually simplifies the attack, making
> the cure worse than the disease.

It would be a pretty lousy app that tried to load the contents of an entire
database into a dict. It seems that this would require much more knowledge
of what the app is trying to do before a successful attack can be mounted.
So I don't think this is worse than the original attack -- I think it
requires much more ingenuity of an attacker. (I'm thinking that the
original attack is trivial once the set of 65000 colliding keys is public
knowledge, which must be only a matter of time.)

--Guido van Rossum (

From pydev at  Fri Jan 20 16:55:32 2012
From: pydev at (Frank Sievertsen)
Date: Fri, 20 Jan 2012 16:55:32 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>


I still see at least two ways to create a DOS attack even with the

I assumed that it's possible to send ~500KB of payload to the

1. It's fully deterministic which slots the dict will look up.
Since we don't count slot-collisions, but only hash-value-collisions,
this can be exploited easily by creating strings with the hash-values
along the lookup path of an arbitrary (short) string.

So first we pick an arbitrary string. Then calculate which slots it will
visit on the way to the first empty slot. Then we create strings with
hash-values for these slots.

This attack first injects the strings to fill all the slots that the
one short string will want to visit. Then it adds THE SAME string
again and again. Since the entry is already there, nothing will be added
and no additional collisions happen, no exception raised.
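
The mechanism of this first attack can be sketched with a toy model. Everything here is an assumption for illustration: a fixed 64-slot table, hand-picked integer hashes instead of real strings, hypothetical helper names `probe` and `lookup_or_insert`, and a probing step modeled on CPython's perturbation scheme as I understand it.

```python
from itertools import islice

PERTURB_SHIFT = 5


def probe(h, mask):
    """Yield the slots probed for non-negative hash `h`, forever."""
    i = perturb = h
    while True:
        yield i & mask
        i = 5 * i + perturb + 1
        perturb >>= PERTURB_SHIFT


def lookup_or_insert(slots, key, h):
    """Return (slots_visited, counted_collisions) for setting `key`.

    Only occupied slots holding the same full hash but a different
    key are counted, mirroring the hash-value-collision rule.
    """
    mask = len(slots) - 1
    visited = counted = 0
    for slot in probe(h, mask):
        visited += 1
        entry = slots[slot]
        if entry is None:
            slots[slot] = (h, key)
            return visited, counted
        if entry[0] == h:
            if entry[1] == key:
                return visited, counted   # key already present
            counted += 1                  # equal hash, different key


slots = [None] * 64
victim_hash = 42
# Fill the first four slots on the victim's path with keys whose full
# hash differs from the victim's (slot + 64 maps to the same slot).
for slot in islice(probe(victim_hash, 63), 4):
    lookup_or_insert(slots, "blocker%d" % slot, slot + 64)

print(lookup_or_insert(slots, "victim", victim_hash))   # (5, 0)
print(lookup_or_insert(slots, "victim", victim_hash))   # (5, 0)
```

Every repeated insertion of the victim key walks the whole blocked path, yet the collision counter stays at zero, so a limit on hash-value collisions never triggers.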

$ ls -l super.txt
-rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
$ tail -n3 super.txt
$ wc -l super.txt
90000 super.txt
$ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("super.txt"))'
real    0m52.724s
user    0m51.543s
sys    0m0.028s

2. The second attack exploits the fact that 1000 allowed string
comparisons are still a lot of work.
First I added 999 strings that collide with a one-byte string "a". In
some applications a zero-byte string might work even better. Then I
can add many thousands of "a"s, just like in the first attack.

$ ls -l 1000.txt
-rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
$ head -n 3 1000.txt
$ wc -l 1000.txt
247000 1000.txt
$ tail -n 3 1000.txt
$ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("1000.txt"))'
real    0m17.408s
user    0m15.897s
sys    0m0.008s

Of course the first attack is far more efficient. One could argue
that 16 seconds is not enough for an attack. But maybe it's possible
to send 1MB and use zero-byte strings, and since, for example, Django
does 5 lookups per query string, this will keep it busy for ~80 seconds on
my PC.

What to do now?
I think it's not smart to reduce the number of allowed collisions 
AND count all slot-collisions at the same time.


From victor.stinner at  Fri Jan 20 17:04:18 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 17:04:18 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> (I'm thinking that the original
> attack is trivial once the set of 65000 colliding keys is public knowledge,
> which must be only a matter of time.)

I have a program able to generate collisions: it takes 1 second to
compute 60,000 colliding strings on a desktop computer. So the
security of the randomized hash is based on the fact that the attacker
cannot compute the secret.


From victor.stinner at  Fri Jan 20 17:17:24 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 20 Jan 2012 17:17:24 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> So I still think we should ditch the paranoia about dictionary order changing,
> and fix this without counting.

The randomized hash has other issues:

 - its security is based on its secret, whereas it looks to be easy to
compute it (see more details in the issue)
 - my patch only changes hash(str), whereas other developers asked me
to also patch bytes, int and other types

hash(bytes) can be changed. But changing hash(int) may easily leak the
secret. We may use a different secret for each type, but if it is easy
to compute the int hash secret, dictionaries using int are still


There is no perfect solution; the drawbacks of each solution should be compared.


From solipsis at  Fri Jan 20 17:31:17 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 20 Jan 2012 17:31:17 +0100
Subject: [Python-Dev] Counting collisions for the win
References: <>
Message-ID: <>

On Fri, 20 Jan 2012 17:17:24 +0100
Victor Stinner <victor.stinner at> wrote:
> > So I still think we should ditch the paranoia about dictionary order changing,
> > and fix this without counting.
> The randomized hash has other issues:
>  - its security is based on its secret, whereas it looks to be easy to
> compute it (see more details in the issue)

How do you compute the secret? I see two possibilities:

- the application leaks the hash() values: this sounds unlikely since I
  don't see the use case for it;

- the application shows the dict iteration order (e.g. order of HTML
  attributes): then we could add a second per-dictionary secret so that
  the iteration order of a single dict doesn't give any useful
  information about the hash function.

But the bottom line for me is the following:

- randomized hashes eliminate the possibility of using a single exploit
  for all Python-powered applications: for each application, the
  attacker now has to find a way to extract the secret;

- collision counting doesn't eliminate the possibility of generic
  exploits, as Frank Sievertsen has just shown in



From regebro at  Fri Jan 20 17:41:34 2012
From: regebro at (Lennart Regebro)
Date: Fri, 20 Jan 2012 17:41:34 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 01:48, Victor Stinner
<victor.stinner at> wrote:
>  - the limit should be configurable: a new function in the sys module
> should be enough. It may be private (or replaced by an environment
> variable?) in stable versions

I'd like to see both. I would like both the programmer and the "user"
to be able to control what the limit is.


From status at  Fri Jan 20 18:07:33 2012
From: status at (Python tracker)
Date: Fri, 20 Jan 2012 18:07:33 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (2012-01-13 - 2012-01-20)
Python tracker at

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3209 ( -1)
  closed 22405 (+53)
  total  25614 (+52)

Open issues with patches: 1376 

Issues opened (37)

#13411: Hashable memoryviews  reopened by skrah

#13782: xml.etree.ElementTree: Element.append doesn't type-check its a  opened by sjmachin

#13783: Clean up PEP 380 C API additions  opened by ncoghlan

#13784: Documentation of  xml.sax.xmlreader: Locator.getLineNumber() a  opened by patrick.vrijlandt

#13785: Make concurrent.futures.Future state public  opened by jjdominguezm

#13788: os.closerange optimization  opened by ferringb

#13789: _tkinter does not build on Windows 7  opened by terry.reedy

#13790: In str.format an incorrect error message for list, tuple, dict  opened by py.user

#13792: The "os.execl" call doesn't give programs exit code  opened by kayhayen

#13793: hasattr, delattr, getattr fail with unnormalized names  opened by Jim.Jewett

#13796: use 'text=...' to define the text attribute of and xml.etree.E  opened by paaguti

#13797: Allow objects implemented in pure Python to export PEP 3118 bu  opened by ncoghlan

#13798: Pasting and then running code doesn't work in the IDLE Shell  opened by ramchandra.apte

#13799: Base 16 should be hexadecimal in Unicode HOWTO  opened by ramchandra.apte

#13801: The Python 3 Docs don't highlight nonlocal  opened by ramchandra.apte

#13802: IDLE Prefernces/Fonts: use multiple alphabets in examples  opened by terry.reedy

#13804: Python library structure creates hard to read code when using  opened by dwt

#13806: Audioop decompression frames size check fix  opened by Oleg.Plakhotnyuk

#13812: multiprocessing package doesn't flush stderr on child exceptio  opened by brandj

#13814: Generators as context managers.  opened by yak

#13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper  opened by cjwatson

#13816: Two typos in the docs  opened by Retro

#13817: deadlock in subprocess while running several threads using Pop  opened by glaubich

#13818: argparse: -h listening required options under optional argumen  opened by mgodinho

#13819: _warnings settings are process-wide  opened by pitrou

#13820: 2.6 is no longer in the future  opened by Jim.Jewett

#13821: misleading return from isidentifier  opened by Jim.Jewett

#13822: is(upper/lower/title) are not exactly correct  opened by benjamin.peterson

#13823: xml.etree.ElementTree.ElementTree.write - argument checking  opened by patrick.vrijlandt

#13824: argparse.FileType opens a file without excepting resposibility  opened by David.Layton

#13825: Datetime failing while reading active directory time attribute  opened by scape

#13826: Having a shlex example in the subprocess.Popen docs is confusi  opened by Julian

#13828: Further improve casefold documentation  opened by Jim.Jewett

#13829: exception error  opened by Dan.kamp

#13830: codecs error handler is called with a UnicodeDecodeError with  opened by amaury.forgeotdarc

#13831: get method of  multiprocessing.pool.Async should return full t  opened by fmitha

#13833: No documentation for PyStructSequence  opened by torsten

Most recent 15 issues with no replies (15)

#13833: No documentation for PyStructSequence

#13831: get method of  multiprocessing.pool.Async should return full t

#13830: codecs error handler is called with a UnicodeDecodeError with

#13829: exception error

#13824: argparse.FileType opens a file without excepting resposibility

#13823: xml.etree.ElementTree.ElementTree.write - argument checking

#13822: is(upper/lower/title) are not exactly correct

#13820: 2.6 is no longer in the future

#13819: _warnings settings are process-wide

#13818: argparse: -h listening required options under optional argumen

#13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

#13802: IDLE Prefernces/Fonts: use multiple alphabets in examples

#13784: Documentation of  xml.sax.xmlreader: Locator.getLineNumber() a

#13777: socket: communicating with Mac OS X KEXT controls

#13771: HTTPSConnection __init__ super implementation causes recursion

Most recent 15 issues waiting for review (15)

#13833: No documentation for PyStructSequence

#13817: deadlock in subprocess while running several threads using Pop

#13816: Two typos in the docs

#13815: tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

#13806: Audioop decompression frames size check fix

#13788: os.closerange optimization

#13785: Make concurrent.futures.Future state public

#13777: socket: communicating with Mac OS X KEXT controls

#13775: Access Denied message on symlink creation misleading for an ex

#13773: Support sqlite3 uri filenames

#13742: Add a key parameter (like sorted) to heapq.merge

#13736: urllib.request.urlopen leaks exceptions from socket and httpli

#13734: Add a generic directory walker method to avoid symlink attacks

#13733: Change required to for Python 2.7.2 on OS/2

#13719: bdist_msi upload fails

Top 10 most discussed issues (10)

#13703: Hash collision security issue  48 msgs

#12600: Add example of using load_tests to parameterise Test Cases  12 msgs

#13790: In str.format an incorrect error message for list, tuple, dict  10 msgs

#6727: ImportError when package is symlinked on Windows   8 msgs

#13405: Add DTrace probes   8 msgs

#6531: atexit_callfuncs() crashing within Py_Finalize() when using mu   7 msgs

#13804: Python library structure creates hard to read code when using   7 msgs

#8052: subprocess close_fds behavior should only close open fds   6 msgs

#11805: package_data only allows one glob per-package   6 msgs

#10181: Problems with Py_buffer management in memoryobject.c (and else   5 msgs

Issues closed (51)

#2124: xml.sax and xml.dom fetch DTDs by default  closed by loewis

#2134: Add new attribute to TokenInfo to report specific token IDs  closed by meador.inge

#6528: builtins colored as keyword at beginning of line  closed by terry.reedy

#8285: IDLE not smart indenting correctly in nested statements  closed by terry.reedy

#11906: test_argparse failure in interactive mode  closed by terry.reedy

#11948: Tutorial/Modules - small fix to better clarify the modules sea  closed by sandro.tosi

#12705: Make compile('1\n2\n', '', 'single') raise an exception instea  closed by meador.inge

#12949: Documentation of PyCode_New() lacks kwonlyargcount argument  closed by meador.inge

#13039: IDLE editor: shell-like behaviour on line starting with ">>>"  closed by terry.reedy

#13516: Gzip old log files in rotating handlers  closed by vinay.sajip

#13589: Aifc low level serialization primitives fix  closed by pitrou

#13605: document argparse's nargs=REMAINDER  closed by sandro.tosi

#13629: _PyParser_TokenNames does not match up with the token.h number  closed by meador.inge

#13642: urllib incorrectly quotes username and password in https basic  closed by orsenthil

#13645: import machinery vulnerable to timestamp collisions  closed by pitrou

#13665: TypeError: string or integer address expected instead of str i  closed by ezio.melotti

#13695: "type specific" to "type-specific"  closed by ezio.melotti

#13715: typo in unicodedata documentation  closed by ezio.melotti

#13722: "distributions can disable the encodings package"  closed by pitrou

#13723: Regular expressions: (?:X|\s+)*$ takes a long time  closed by terry.reedy

#13725: regrtest does not recognize -d flag  closed by meador.inge

#13726: regrtest ambiguous -S flag  closed by orsenthil

#13727: Accessor macros for PyDateTime_Delta members  closed by amaury.forgeotdarc

#13728: Description of -m and -c cli options wrong?  closed by sandro.tosi

#13730: Grammar mistake in Decimal documentation  closed by python-dev

#13746: ast.Tuple's have an inconsistent "col_offset" value  closed by georg.brandl

#13752: add a str.casefold() method  closed by python-dev

#13760: ConfigParser exceptions are not pickleable  closed by lukasz.langa

#13761: Add flush keyword to print()  closed by python-dev

#13763: Potentially hard to understand wording in devguide  closed by terry.reedy

#13764: Misc/ is outdated... talks about svn  closed by pitrou

#13766: explain the relationship between Lib/lib2to3/Grammar.txt and G  closed by python-dev

#13768: Doc/tools/ available only on 2.7 branch  closed by georg.brandl

#13774: json.loads raises a SystemError for invalid encoding on 2.7.2  closed by amaury.forgeotdarc

#13780: make YieldFrom its own node  closed by python-dev

#13781: gzip module does the wrong thing with an os.fdopen()'ed fileob  closed by nadeem.vawda

#13786: does not handle --trace  closed by meador.inge

#13787: PyCode_New not round-trippable (TypeError)  closed by amaury.forgeotdarc

#13791: Reword "Old versions" in the doc sidebar  closed by eric.araujo

#13794: Copyright Year - Change it to 2012 please  closed by eric.araujo

#13795: CDATA Element missing  closed by amaury.forgeotdarc

#13803: Under Solaris, distutils doesn't include bitness in the direct  closed by python-dev

#13805: [].sort() should return self  closed by ezio.melotti

#13807: logging.Handler.handlerError() may raise AttributeError in tra  closed by python-dev

#13808: url for Tutor mailing list is broken  closed by ezio.melotti

#13809: bz2 does not work when threads are disabled  closed by nadeem.vawda

#13810: refer people to Doc/Makefile when not using 'make' to build ma  closed by sandro.tosi

#13811: In str.format, if invalid fill and alignment are specified, th  closed by python-dev

#13813: "" and "distutils/" redundancy  closed by eric.araujo

#13827: Unexecuted import changes namespace  closed by benjamin.peterson

#13832: tokenization assuming ASCII whitespace; missing multiline case  closed by benjamin.peterson

From ethan at  Fri Jan 20 18:09:57 2012
From: ethan at (Ethan Furman)
Date: Fri, 20 Jan 2012 09:09:57 -0800
Subject: [Python-Dev] exception chaining
Message-ID: <>


Exception Chaining is cool, unless you are writing libraries that want 
to transform from Exception X to Exception Y as the previous 
exception context is unnecessary, potentially confusing, and cluttery 
(yup, just made that word up!).

For all the gory details, see

I'm going to attempt a patch implementing MRAB's suggestion:

except ValueError:
     raise as OtherError() # `raise` keeps context, `raise as` does not

The question I have at the moment is:  should `raise as` be an error if 
no exception is currently being handled?


def smurfy(x):
     if x != 'magic flute':
          raise as WrongInstrument

If this is allowed then `smurfy` could be called from inside an `except` 
clause or outside it.

I don't care for it for two reasons:

   - I don't like the way it looks
   - I can see it encouraging always using `raise as` instead of `raise` 
and losing the value of exception chaining.

Other thoughts?


From guido at  Fri Jan 20 18:43:27 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 09:43:27 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 19, 2012 at 8:06 PM, Ivan Kozik <ivan at> wrote:

> No, I wasn't happy with termination.  I wanted to treat it just like a
> JSON decoding error, and send the appropriate response.

So was this attack actually being mounted on your service regularly? I'd
think it would be sufficient to treat it as a MemoryError -- unavoidable,
if it happens things are really bad, and hopefully you'll crash quickly and
some monitor process restarts your service. That's a mechanism that you
should have anyway.

> I actually forgot to mention the main reason I abandoned the
> stop-at-N-collisions approach.  I had a server with a dict that stayed
> in memory, across many requests.  It was being populated with
> identifiers chosen by clients.  I couldn't have my server stay broken
> if this dict filled up with a bunch of colliding keys.  (I don't think
> I could have done another thing either, like nuke the dict or evict
> some keys.)

What would your service do if it ran out of memory?

Maybe one tweak to the collision counting would be that the exception needs
to inherit from BaseException (like MemoryError) so most generic exception
handlers don't actually handle it. (Style note: never use "except:", always
use "except Exception:".)
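
A minimal sketch of that tweak, assuming a hypothetical TooManyCollisions
class (no such built-in exists; the name is only for illustration). It shows
that a handler written as "except Exception:" does not intercept an exception
derived directly from BaseException. One caveat worth checking on your own
interpreter: in current CPython, MemoryError itself is a subclass of
Exception, so only the direct-BaseException variant behaves this way.

```python
class TooManyCollisions(BaseException):
    """Hypothetical exception for illustration; not a real built-in."""


def insert_untrusted(key):
    # Stand-in for a dict insertion that trips the collision limit.
    raise TooManyCollisions(key)


def handle_request(key):
    try:
        insert_untrusted(key)
    except Exception:
        # A generic handler like this one never sees direct
        # BaseException subclasses, so the error keeps propagating.
        return "handled"
    return "ok"


try:
    handle_request("evil")
    outcome = "swallowed"
except TooManyCollisions:
    outcome = "propagated"

# MemoryError, by contrast, would be caught by "except Exception:".
memory_error_is_exception = issubclass(MemoryError, Exception)
```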

--Guido van Rossum (

From benjamin at  Fri Jan 20 18:47:13 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 20 Jan 2012 12:47:13 -0500
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/20 Ethan Furman <ethan at>:
> Summary:
> Exception Chaining is cool, unless you are writing libraries that want to
> transform from Exception X to Exception Y as the previous exception
> context is unnecessary, potentially confusing, and cluttery (yup, just made
> that word up!).
> For all the gory details, see
> I'm going to attempt a patch implementing MRAB's suggestion:
> try:
>     some_op
> except ValueError:
>     raise as OtherError() # `raise` keeps context, `raise as` does not

I dislike this syntax. Raise what as OtherError()? I think the "raise
x from None" idea is preferable, since it indicates you are nulling
the context. The optimal solution would be to have "raise X
nocontext", but that would obviously require another keyword...


From guido at  Fri Jan 20 18:50:25 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 09:50:25 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 20, 2012 at 1:57 AM, Frank Sievertsen <pydev at> wrote:

>  The main issue with that approach is that it allows a new kind of attack.
> Indeed, I posted another example:
> This kind of fix can be used in a specific application or maybe in a
> special-purpose framework, but not on the level of a general-purpose
> language.

Right. We are discussing this issue (for weeks now...) because it makes
pretty much any Python app that takes untrusted data vulnerable, especially
web apps, and after extensive analysis we came to the conclusion that
defenses in the framework or in the app are really hard to do and very
disruptive for developers, whereas preventing the attack by a modification
of the dict or hash algorithms would fix it for everybody. And moreover,
the attack would work against pretty much any Python web app using a set of
evil strings computed once (hence encouraging script kiddies to just fire
their fully-automatic weapon at random websites).
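
To make the cost concrete, here is a toy illustration, not the actual attack:
the Colliding class is a hypothetical key type whose instances all hash to
the same value, standing in for precomputed evil strings. Counting __eq__
calls shows the work growing roughly quadratically with the number of keys
on CPython.

```python
class Colliding:
    """Key type whose instances all share one hash value (illustration only)."""

    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Every instance lands on the same probe sequence.
        return 42

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return self.name == other.name


def comparisons_to_build(n):
    """Count __eq__ calls made while inserting n distinct colliding keys."""
    Colliding.eq_calls = 0
    table = {Colliding(i): None for i in range(n)}
    assert len(table) == n
    return Colliding.eq_calls


small = comparisons_to_build(100)
large = comparisons_to_build(200)
```

Doubling the number of keys should roughly quadruple the comparison count,
which is why a modest payload can keep a server busy for a long time.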

The new attacks that are now being considered require analysis of how the
website is implemented, how it uses and stores data, etc. So an attacker
has to sit down and come up with an attack tailored to a specific website.
That can be dealt with on an ad-hoc basis.

--Guido van Rossum (

From guido at  Fri Jan 20 19:15:21 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 10:15:21 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw <barry at> wrote:

> On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
>
> >Counting collision doesn't solve this case, but it doesn't make the
> >situation worse than before. Raising quickly an exception is better
> >than stalling for minutes, even if I agree that it is not the best
> >behaviour.
>
> ISTM that adding the possibility of raising a new exception on dictionary
> insertion is *more* backward incompatible than changing dictionary order,
> which for a very long time has been known to not be guaranteed.  You're
> running some application, you upgrade Python because you apply all security
> fixes, and suddenly you're starting to get exceptions in places you can't
> really do anything about.  Yet those exceptions are now part of the
> documented public API for dictionaries.  This is asking for trouble.  Bugs
> will suddenly start appearing in that application's tracker and they will
> seem to the application developer like Python just added a new public API
> in a security release.

Dict insertion can already raise an exception: MemoryError. I think we
should be safe if the new exception also derives from BaseException. We
should actually seriously consider just raising MemoryError, since
introducing a new built-in exception in a bugfix release is also very
questionable: code explicitly catching or raising it would not work on
previous bugfix releases of the same feature release.

> OTOH, if you change dictionary order and *that* breaks the application, then
> the bugs submitted to the application's tracker will be legitimate bugs
> that have to be fixed even if nothing else changed.

There are lots of things that are undefined according to the language spec
(and quite possibly known to vary between versions or platforms or
implementations like PyPy or Jython) but which we would never change in a
bugfix release.

> So I still think we should ditch the paranoia about dictionary order
> changing, and fix this without counting.  A little bit of paranoia could
> creep back in by disabling the hash fix by default in stable releases, but
> I think it would be fine to make that a compile-time option.

I'm sorry, but I don't want to break a user's app with a bugfix release and
say "haha your code was already broken you just didn't know it".

Sure, the dict order already varies across Python implementations, possibly
across 32/64 bits or operating systems. But many organizations (I know a
few :-) have a very large installed software base, created over many years
by many people with varying skills, that is kept working in part by very
carefully keeping the environment as constant as possible. This means that
the target environment is much more predictable than it is for the typical
piece of open source software.

Sure, a good Python developer doesn't write apps or tests that depend on
dict order. But time and again we see that not everybody writes perfect
code every time. Especially users writing "in-house" apps (as opposed to
frameworks shared as open source) are less likely to always use the most
robust, portable algorithms in existence, because they may know with much
more certainty that their code will never be used on certain combinations
of platforms. For example, I rarely think  about whether code I write might
not work on IronPython or Jython, or even CPython on Windows. And if
something I wrote suddenly needs to be ported to one of those, well, that's
considered a port and I'll just accept that it might mean changing a few
things.

The time to break a dependency on dict order is not with a bugfix release
but with a feature release: those are more likely to break other things as
well anyway, and users are well aware that they have to test everything and
anticipate having to fix some fraction of their code for each feature
release. OTOH we have established a long and successful track record of
conservative bugfix releases that don't break anything. (I am aware of
exactly one thing that was broken by a bugfix release in application code I
am familiar with.)

--Guido van Rossum (

From guido at  Fri Jan 20 19:16:15 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 10:16:15 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 5:20 AM, Barry Warsaw <barry at> wrote:

> Let's just be clear about it: this exception is new public API.  Changing
> dictionary order is not.

Not if you raise MemoryError or BaseException.

--Guido van Rossum (

From guido at  Fri Jan 20 19:20:08 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 10:20:08 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

This is the first objection I have seen against collision-counting that
might stand.
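
The first attack in the message quoted below relies on the probe order being
a pure function of the hash value and the table size. A simplified sketch of
that idea, assuming the recurrence used by CPython 2.7's dict lookup
(i = 5*i + perturb + 1, with perturb shifted right by 5 after each step) and
a 64-bit hash; probe_slots is a hypothetical helper, and the real C code
differs in detail:

```python
PERTURB_SHIFT = 5


def probe_slots(hash_value, table_size, limit):
    """Return the first `limit` slots visited for a hash (table_size = 2**k)."""
    mask = table_size - 1
    # Assumption: mimic the C cast of a signed hash to an unsigned 64-bit value.
    hash_value &= (1 << 64) - 1
    i = hash_value & mask
    perturb = hash_value
    slots = [i]
    while len(slots) < limit:
        # Masking i at every step gives the same slots as masking only at
        # the lookup, because the table size is a power of two.
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
        slots.append(i)
    return slots


path = probe_slots(12345, 8, 5)
```

Nothing here depends on the other keys in the table, so an attacker who
knows the hash function can compute the path offline and fill exactly
those slots.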

On Fri, Jan 20, 2012 at 7:55 AM, Frank Sievertsen <pydev at> wrote:

> Hello,
> I still see at least two ways to create a DOS attack even with the
> collision-counting patch.
> I assumed that it's possible to send ~500KB of payload to the
> application.
> 1. It's fully deterministic which slots the dict will look up.
> Since we don't count slot collisions, but only hash-value collisions,
> this can be exploited easily by creating strings with the hash values
> along the lookup path of an arbitrary (short) string.
> So first we pick an arbitrary string. Then calculate which slots it will
> visit on the way to the first empty slot. Then we create strings with
> hash-values for these slots.
> This attack first injects the strings to fill all the slots that the
> one short string will want to visit. Then it adds THE SAME string
> again and again. Since the entry is already there, nothing will be added
> and no additional collisions happen, no exception raised.
> $ ls -l super.txt
> -rw-r--r-- 1 fx5 fx5 520000 20. Jan 10:19 super.txt
> $ tail -n3 super.txt
> FX5
> FX5
> FX5
> $ wc -l super.txt
> 90000 super.txt
> $ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("super.txt"))'
> real    0m52.724s
> user    0m51.543s
> sys    0m0.028s
> 2. The second attack exploits the fact that 1000 allowed string
> comparisons are still a lot of work.
> First I added 999 strings that collide with a one-byte string "a". In
> some applications a zero-byte string might work even better. Then I
> can add many thousands of "a"s, just like in the first attack.
> $ ls -l 1000.txt
> -rw-r--r-- 1 fx5 fx5 500000 20. Jan 16:15 1000.txt
> $ head -n 3 1000.txt
> 7hLci00
> 4wVFm10
> _rZJU50
> $ wc -l 1000.txt
> 247000 1000.txt
> $ tail -n 3 1000.txt
> a
> a
> a
> $ time python -c 'dict((unicode(l[:-1]), 0)  for l in open("1000.txt"))'
> real    0m17.408s
> user    0m15.897s
> sys    0m0.008s
> Of course the first attack is far more efficient. One could argue
> that 16 seconds is not enough for an attack. But maybe it's possible
> to send 1MB, have zero-byte strings, and since for example django
> does 5 lookups per query-string this will keep it busy for ~80 seconds on
> my pc.
> What to do now?
> I think it's not smart to reduce the number of allowed collisions
> dramatically AND count all slot-collisions at the same time.
> Frank

--Guido van Rossum (

From g.brandl at  Fri Jan 20 19:52:44 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 20 Jan 2012 19:52:44 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <jfccv1$skc$>

Am 20.01.2012 19:15, schrieb Guido van Rossum:

>     OTOH, if you change dictionary order and *that* breaks the application, then
>     the bugs submitted to the application's tracker will be legitimate bugs that
>     have to be fixed even if nothing else changed.
> There are lots of things that are undefined according to the language spec (and
> quite possibly known to vary between versions or platforms or implementations
> like PyPy or Jython) but which we would never change in a bugfix release.
>     So I still think we should ditch the paranoia about dictionary order changing,
>     and fix this without counting.  A little bit of paranoia could creep back in
>     by disabling the hash fix by default in stable releases, but I think it would
>     be fine to make that a compile-time option.
> I'm sorry, but I don't want to break a user's app with a bugfix release and say
> "haha your code was already broken you just didn't know it".
> Sure, the dict order already varies across Python implementations, possibly
> across 32/64 bits or operating systems. But many organizations (I know a few :-)
> have a very large installed software base, created over many years by many
> people with varying skills, that is kept working in part by very carefully
> keeping the environment as constant as possible. This means that the target
> environment is much more predictable than it is for the typical piece of open
> source software.

I agree.  This applies to 3.2 and 2.7, but even more to 3.1 and 2.6, which are
in security-fix mode.

Even if relying on dict order is a bug right now, I believe it happens many
times more often in code bases out there than dicts that are filled with many
many colliding keys.  So even if we can honestly blame the programmer in the
former case, the users applying the security fix will have the same bad
experience and won't likely care if we claim "undefined behavior".  This means
that it seems preferable to go with the situation where you have fewer breakages
in total.

Not to mention that changing dict order is likely to lead to much more subtle
bugs than a straight MemoryError on a dict insert.

Also, another advantage of collision counting I haven't seen in the discussion
yet is that it's quite trivial to provide an API in sys to turn it on or off --
while turning on or off randomized hashes has to be done before Python starts
up, i.e. at build time or with an environment variable or flag.
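
For comparison, a sketch of the environment-variable route, assuming the
PYTHONHASHSEED variable that later CPython releases read at startup (it did
not exist when this was written). hash_in_child is a hypothetical helper; it
starts a fresh interpreter because the seed cannot be changed once the
process is running.

```python
import os
import subprocess
import sys


def hash_in_child(seed, text="abc"):
    """Return hash(text) as computed by a fresh interpreter with this seed."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % text], env=env
    )
    return int(out.decode().strip())


# The same seed gives the same string hash in two separate processes.
first = hash_in_child(123)
second = hash_in_child(123)
```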


From brett at  Fri Jan 20 19:49:55 2012
From: brett at (Brett Cannon)
Date: Fri, 20 Jan 2012 13:49:55 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 13:15, Guido van Rossum <guido at> wrote:

> On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw <barry at> wrote:
>> On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
>> >Counting collision doesn't solve this case, but it doesn't make the
>> >situation worse than before. Raising quickly an exception is better
>> >than stalling for minutes, even if I agree that it is not the best
>> >behaviour.
>> ISTM that adding the possibility of raising a new exception on dictionary
>> insertion is *more* backward incompatible than changing dictionary order,
>> which for a very long time has been known to not be guaranteed.  You're
>> running some application, you upgrade Python because you apply all
>> security
>> fixes, and suddenly you're starting to get exceptions in places you can't
>> really do anything about.  Yet those exceptions are now part of the
>> documented
>> public API for dictionaries.  This is asking for trouble.  Bugs will
>> suddenly
>> start appearing in that application's tracker and they will seem to the
>> application developer like Python just added a new public API in a
>> security
>> release.
> Dict insertion can already raise an exception: MemoryError. I think we
> should be safe if the new exception also derives from BaseException. We
> should actually seriously consider just raising MemoryError, since
> introducing a new built-in exception in a bugfix release is also very
> questionable: code explicitly catching or raising it would not work on
> previous bugfix releases of the same feature release.
> OTOH, if you change dictionary order and *that* breaks the application,
>> then
>> the bugs submitted to the application's tracker will be legitimate bugs
>> that
>> have to be fixed even if nothing else changed.
> There are lots of things that are undefined according to the language spec
> (and quite possibly known to vary between versions or platforms or
> implementations like PyPy or Jython) but which we would never change in a
> bugfix release.
> So I still think we should ditch the paranoia about dictionary order
>> changing,
>> and fix this without counting.  A little bit of paranoia could creep back
>> in
>> by disabling the hash fix by default in stable releases, but I think it
>> would
>> be fine to make that a compile-time option.
> I'm sorry, but I don't want to break a user's app with a bugfix release
> and say "haha your code was already broken you just didn't know it".
> Sure, the dict order already varies across Python implementations,
> possibly across 32/64 bits or operating systems. But many organizations (I
> know a few :-) have a very large installed software base, created over many
> years by many people with varying skills, that is kept working in part by
> very carefully keeping the environment as constant as possible. This means
> that the target environment is much more predictable than it is for the
> typical piece of open source software.
> Sure, a good Python developer doesn't write apps or tests that depend on
> dict order. But time and again we see that not everybody writes perfect
> code every time. Especially users writing "in-house" apps (as opposed to
> frameworks shared as open source) are less likely to always use the most
> robust, portable algorithms in existence, because they may know with much
> more certainty that their code will never be used on certain combinations
> of platforms. For example, I rarely think  about whether code I write might
> not work on IronPython or Jython, or even CPython on Windows. And if
> something I wrote suddenly needs to be ported to one of those, well, that's
> considered a port and I'll just accept that it might mean changing a few
> things.
> The time to break a dependency on dict order is not with a bugfix release
> but with a feature release: those are more likely to break other things as
> well anyway, and users are well aware that they have to test everything and
> anticipate having to fix some fraction of their code for each feature
> release. OTOH we have established a long and successful track record of
> conservative bugfix releases that don't break anything. (I am aware of
> exactly one thing that was broken by a bugfix release in application code I
> am familiar with.)

Why can't we have our cake and eat it too?

Can we do hash randomization in 3.3 and use the hash count solution for
bugfix releases? That way we get a basic fix into the bugfix releases that
won't break people's code (hopefully) but we go with a more thorough (and
IMO correct) solution of hash randomization starting with 3.3 and moving
forward. We aren't breaking compatibility in any way by doing this since
it's a feature release anyway where we change tactics. And it can't be that
much work since we seem to have patches for both solutions. At worst it
will make merging commits harder for those files affected by the patches,
but that will most likely be isolated and not a common collision (and less
of an issue once 3.3 is released later this year).

I understand the desire to keep backwards-compatibility, but collision
counting could cause an error in some random input that someone didn't
expect to cause issues whether they were under a DoS attack or just had
some unfortunate input from private data. The hash randomization, though,
is only weak if someone is attacked, not if they are just using Python with
their own private data.

From tjreedy at  Fri Jan 20 20:03:36 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 20 Jan 2012 14:03:36 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <jfcdqh$vg4$>

On 1/20/2012 11:17 AM, Victor Stinner wrote:

> There is no perfect solution; the drawbacks of each solution should be compared.


One possible attack that has been described for a collision counting 
dict depends on knowing precisely the trigger point. So let 
MAXCOLLISIONS either be configurable or just choose a random count 
between M and N, say 700 and 999.
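
A toy sketch of the randomized limit, not a patch against dictobject.c:
CountingTable and CollisionLimitExceeded are hypothetical names, the table
chains entries by hash value purely for illustration, and the 700..999 range
is the one suggested above.

```python
import random


class CollisionLimitExceeded(Exception):
    """Hypothetical error for this sketch; not part of any proposed patch."""


class CountingTable:
    """Toy hash table that limits how many keys may share one hash value."""

    def __init__(self, low=700, high=999, rng=None):
        rng = rng or random.SystemRandom()
        # The limit is drawn once per table, so an attacker cannot know
        # exactly how many colliding keys will trigger the error.
        self.max_collisions = rng.randint(low, high)
        self._buckets = {}

    def insert(self, key, value, hash_fn=hash):
        chain = self._buckets.setdefault(hash_fn(key), [])
        for index, (existing, _) in enumerate(chain):
            if existing == key:
                # Re-inserting a known key replaces its value.
                chain[index] = (key, value)
                return
        if len(chain) >= self.max_collisions:
            raise CollisionLimitExceeded(key)
        chain.append((key, value))


# Usage: force every key onto one hash value to reach the limit quickly.
table = CountingTable(low=5, high=9)
inserted = 0
try:
    for number in range(20):
        table.insert(number, None, hash_fn=lambda key: 0)
        inserted += 1
except CollisionLimitExceeded:
    pass
```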

It would not hurt to have alternate patches available in case a 
particular Python-powered site comes under prolonged attack. Though 
given our minuscule share of the market, that is much less likely than 
an attack on a PHP- or MS-powered site.

Terry Jan Reedy

From donald.stufft at  Fri Jan 20 20:04:21 2012
From: donald.stufft at (Donald Stufft)
Date: Fri, 20 Jan 2012 14:04:21 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Even if a MemoryError is raised, I believe that is still a fundamental change in the documented contract of the dictionary API. I don't believe there is a way to fix this without breaking someone's application. The major difference I see between the two solutions is that counting will break applications that otherwise follow the documented API contract of dictionaries, while randomization will break applications that violate the documented API contract of dictionaries. 

Personally I feel that the lesser of two evils is to reward those who followed the documentation, and not reward those who didn't.

So +1 for Randomization as the only option in 3.3, and off by default with a flag or environment variable in bug fixes. I think it's the only way to proceed that won't hurt people who have followed the documented behavior. 

On Friday, January 20, 2012 at 1:49 PM, Brett Cannon wrote:

> On Fri, Jan 20, 2012 at 13:15, Guido van Rossum <guido at (mailto:guido at> wrote:
> > On Fri, Jan 20, 2012 at 5:10 AM, Barry Warsaw <barry at (mailto:barry at> wrote:
> > > On Jan 20, 2012, at 01:50 PM, Victor Stinner wrote:
> > > 
> > > >Counting collision doesn't solve this case, but it doesn't make the
> > > >situation worse than before. Raising quickly an exception is better
> > > >than stalling for minutes, even if I agree that it is not the best
> > > >behaviour.
> > > 
> > > ISTM that adding the possibility of raising a new exception on dictionary
> > > insertion is *more* backward incompatible than changing dictionary order,
> > > which for a very long time has been known to not be guaranteed.  You're
> > > running some application, you upgrade Python because you apply all security
> > > fixes, and suddenly you're starting to get exceptions in places you can't
> > > really do anything about.  Yet those exceptions are now part of the documented
> > > public API for dictionaries.  This is asking for trouble.  Bugs will suddenly
> > > start appearing in that application's tracker and they will seem to the
> > > application developer like Python just added a new public API in a security
> > > release.
> > 
> > Dict insertion can already raise an exception: MemoryError. I think we should be safe if the new exception also derives from BaseException. We should actually seriously consider just raising MemoryError, since introducing a new built-in exception in a bugfix release is also very questionable: code explicitly catching or raising it would not work on previous bugfix releases of the same feature release.
> > 
> > > OTOH, if you change dictionary order and *that* breaks the application, then
> > > the bugs submitted to the application's tracker will be legitimate bugs that
> > > have to be fixed even if nothing else changed.
> > 
> > There are lots of things that are undefined according to the language spec (and quite possibly known to vary between versions or platforms or implementations like PyPy or Jython) but which we would never change in a bugfix release.
> > 
> > > So I still think we should ditch the paranoia about dictionary order changing,
> > > and fix this without counting.  A little bit of paranoia could creep back in
> > > by disabling the hash fix by default in stable releases, but I think it would
> > > be fine to make that a compile-time option.
> > 
> > I'm sorry, but I don't want to break a user's app with a bugfix release and say "haha your code was already broken you just didn't know it".
> > 
> > Sure, the dict order already varies across Python implementations, possibly across 32/64 bits or operating systems. But many organizations (I know a few :-) have a very large installed software base, created over many years by many people with varying skills, that is kept working in part by very carefully keeping the environment as constant as possible. This means that the target environment is much more predictable than it is for the typical piece of open source software.
> > 
> > Sure, a good Python developer doesn't write apps or tests that depend on dict order. But time and again we see that not everybody writes perfect code every time. Especially users writing "in-house" apps (as opposed to frameworks shared as open source) are less likely to always use the most robust, portable algorithms in existence, because they may know with much more certainty that their code will never be used on certain combinations of platforms. For example, I rarely think  about whether code I write might not work on IronPython or Jython, or even CPython on Windows. And if something I wrote suddenly needs to be ported to one of those, well, that's considered a port and I'll just accept that it might mean changing a few things.
> > 
> > The time to break a dependency on dict order is not with a bugfix release but with a feature release: those are more likely to break other things as well anyway, and users are well aware that they have to test everything and anticipate having to fix some fraction of their code for each feature release. OTOH we have established a long and successful track record of conservative bugfix releases that don't break anything. (I am aware of exactly one thing that was broken by a bugfix release in application code I am familiar with.)
> Why can't we have our cake and eat it too?
> Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will complicate merging commits for those files affected by the patches, but that will most likely be isolated and not a common collision (and less of an issue once 3.3 is released later this year).
> I understand the desire to keep backwards-compatibility, but collision counting could cause an error in some random input that someone didn't expect to cause issues whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data. 

From casevh at  Fri Jan 20 20:06:46 2012
From: casevh at (Case Van Horsen)
Date: Fri, 20 Jan 2012 11:06:46 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 8:17 AM, Victor Stinner
<victor.stinner at> wrote:
>> So I still think we should ditch the paranoia about dictionary order changing,
>> and fix this without counting.
> The randomized hash has other issues:
> - its security is based on its secret, whereas it looks to be easy to
> compute it (see more details in the issue)
> - my patch only changes hash(str), whereas other developers asked me
> to patch also bytes, int and other types

Changing hash(int) on a bugfix release will cause issues with
extensions (gmpy, sage, probably others) that calculate the hash of
numerical objects.

> hash(bytes) can be changed. But changing hash(int) may leak easily the
> secret. We may use a different secret for each type, but if it is easy
> to compute int hash secret, dictionaries using int are still
> vulnerable.
> --
> There is no perfect solution; the drawbacks of each should be compared.
> Victor

From g.brandl at  Fri Jan 20 20:14:49 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 20 Jan 2012 20:14:49 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>	<1326901919.3395.67.camel@localhost.localdomain>	<>
	<jf9t7m$hu4$> <>
Message-ID: <jfce8h$1c0$>

Am 20.01.2012 00:54, schrieb "Martin v. Löwis":
>> I can't help noticing that so far, worries about the workload came mostly from
>> people who don't actually bear that load (this is no accusation!), while those
>> that do are the proponents of the PEP...
> Ok, so let me add then that I'm worried about the additional work-load.
> I'm particularly worried about the coordination of vacation across the
> three people that work on a release. It might well not be possible to
> make any release for a period of two months, which, in a six-months
> release cycle with two alphas and a beta, might mean that we (the
> release people) would need to adjust our vacation plans with the release
> schedule, or else step down (unless you would release the "normal"
> feature releases as source-only releases).

Thanks for the reminder, Martin.  Even with the current release schedule,
I think that the load on you is too much, and we need a whole team of
Windows release experts.  It's not really fair that the RM usually changes
from release to release (at least every 2), and you have to do the same
for everyone.

It looks like we have one volunteer already; if we find another, I think
one of them will also be not on vacation at most times :)

For the Mac, at least we're up to two experts, but I'd like to see a
third there too.


From tjreedy at  Fri Jan 20 20:29:31 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 20 Jan 2012 14:29:31 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <jfcfb3$559$>

On 1/20/2012 10:55 AM, Frank Sievertsen wrote:
> Hello,
> I still see at least two ways to create a DoS attack even with the
> collision-counting patch.

> 2. The second attack actually attacks that 1000 allowed string
> comparisons are still a lot of work.
> First I added 999 strings that collide with a one-byte string "a". In
> some applications a zero-byte string might work even better. Then I
> can add many thousands of the "a"s, just like the first attack.

If 1000 were replaced by, for instance, random.randint(700,1000) the 
dict could not be set to have an exception triggered with one other 
entry (which I believe was Martin's idea). But I suppose you would say 
that 699 entries would still make for much work.

The obvious defense for this particular attack is to reject duplicate 
keys. Perhaps there should be write-once string sets and dicts available.

This gets to the point that there is no best blind defense to all 
possible attacks.
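
A minimal sketch of the randomized limit discussed above, assuming a toy
linear-probing table rather than CPython's actual dict implementation.
The names CountingTable, TooManyCollisions and BadKey are hypothetical
and appear in no patch from this thread; only the 700-1000 range comes
from the message.

```python
import random


class TooManyCollisions(Exception):
    """Hypothetical exception; the thread also debated reusing MemoryError."""


class BadKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 42

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.n == other.n


class CountingTable:
    """Toy linear-probing table that refuses inserts after too many collisions."""

    def __init__(self, size=4096, low=700, high=1000, rng=None):
        if high >= size:
            raise ValueError("limit range must stay below the table size")
        if rng is None:
            rng = random.Random()
        # Drawn once per table, so the exact threshold is not predictable.
        self.limit = rng.randint(low, high)
        self.slots = [None] * size

    def insert(self, key, value):
        """Store the pair and return how many collisions the insert hit."""
        index = hash(key) % len(self.slots)
        collisions = 0
        while True:
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return collisions
            collisions += 1
            if collisions > self.limit:
                raise TooManyCollisions(
                    "gave up after %d collisions" % collisions)
            index = (index + 1) % len(self.slots)
```

Re-inserting a key that is already present returns as soon as the key is
found and never raises, however long the chain in front of it is. That
is the situation the second attack exploits, and a randomized limit does
not change it.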

Terry Jan Reedy

From tseaver at  Fri Jan 20 20:36:56 2012
From: tseaver at (Tres Seaver)
Date: Fri, 20 Jan 2012 14:36:56 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <jfcfoo$461$>

On 01/20/2012 02:04 PM, Donald Stufft wrote:

> Even if a MemoryException is raised I believe that is still a 
> fundamental change in the documented contract of dictionary API.

How so?  Dictionary inserts can *already* raise that error.

> I don't believe there is a way to fix this without breaking someones 
> application. The major differences I see between the two solutions is
>  that counting will break people's applications who are otherwise 
> following the documented api contract of dictionaries,

Do you have a case in mind where legitimate user data (not crafted as
part of a DoS attack) would trip the 1000-collision limit?  How likely is
it that such cases exist in already-deployed applications, compared to
the known breakage in existing applications due to hash randomization?

> and randomization will break people's applications who are violating 
> the documented api contract of dictionaries.
> Personally I feel that the lesser of two evils is to reward those who
>  followed the documentation, and not reward those who didn't.

Except that I think your set is purely hypothetical, while the second set
is *lots* of deployed applications.

-- 
Tres Seaver          +1 540-429-0999          tseaver at
Palladion Software   "Excellence by Design"


From chris at  Fri Jan 20 20:05:41 2012
From: chris at (Chris Withers)
Date: Fri, 20 Jan 2012 19:05:41 +0000
Subject: [Python-Dev] 2.7 now uses Sphinx 1.0
In-Reply-To: <>
References: <>
Message-ID: <>

On 14/01/2012 16:14, Sandro Tosi wrote:
> Hello,
> just a heads-up: documentation for 2.7 branch has been ported to use
> sphinx 1.0, so now the same syntax can be used for 2.x and 3.x
> patches, hopefully easing work on both python stacks.

That's great news, does that now mean the objects inventory for Python 
2.7 and Python 3 on now supports referring to section headers 
from 3rd party packages?


Simplistix - Content Management, Batch Processing & Python Consulting

From donald.stufft at  Fri Jan 20 20:51:16 2012
From: donald.stufft at (Donald Stufft)
Date: Fri, 20 Jan 2012 14:51:16 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <jfcfoo$461$>
References: <>
Message-ID: <>

On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:

> On 01/20/2012 02:04 PM, Donald Stufft wrote:
> > Even if a MemoryException is raised I believe that is still a 
> > fundamental change in the documented contract of dictionary API.
> > 
> How so? Dictionary inserts can *already* raise that error.
Because it's raising it for a fundamentally different thing. "You have plenty of memory, but we decided to add an arbitrary limit that has nothing to do with memory and pretend you are out of memory anyways".
> > I don't believe there is a way to fix this without breaking someones 
> > application. The major differences I see between the two solutions is
> > that counting will break people's applications who are otherwise 
> > following the documented api contract of dictionaries,
> > 
> Do you have a case in mind where legitimate user data (not crafted as
> part of a DoS attack) would trip the 1000-collision limit? How likely is
> it that such cases exist in already-deployed applications, compared to
> the known breakage in existing applications due to hash randomization?

I don't, but as there's never been a limit on how many collisions a dictionary can have, this would be a fundamental change in the documented (and undocumented) abilities of a dictionary. Dictionary key order has never been guaranteed, is documented to not be relied on, already changes depending on if you are using 32bit, 64bit, Jython, PyPy etc or as someone else pointed out, to any number of possible improvements to dict. The counting solution violates the existing contract in order to serve people who themselves are violating the contract. Even with their violation the method that I +1'd still serves to not break existing applications by default.
> > and randomization will break people's applications who are violating 
> > the documented api contract of dictionaries.
> > 
> > Personally I feel that the lesser of two evils is to reward those who
> > followed the documentation, and not reward those who didn't.
> > 
> Except that I think your set is purely hypothetical, while the second set
> is *lots* of deployed applications.

Which is why I believe that it should be off by default on the bugfix, but easily enabled. (Flag, env var, whatever). That allows people to upgrade to a bugfix without breaking their application, and if this vulnerability affects them, they can enable it.

I think collision counting is at best a band-aid and not a proper fix. It stems from a desire to not break existing applications on a bugfix release, which can be better solved by implementing the real fix and allowing people to control whether it is enabled (only on the bugfix releases; on 3.3+ it should always be forced on).
> Tres.
> -- 
> Tres Seaver +1 540-429-0999 tseaver at
> Palladion Software "Excellence by Design"

From ethan at  Fri Jan 20 20:27:07 2012
From: ethan at (Ethan Furman)
Date: Fri, 20 Jan 2012 11:27:07 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Donald Stufft wrote:
> Even if a MemoryException is raised I believe that is still a 
> fundamental change in the documented contract of dictionary API. I don't 
> believe there is a way to fix this without breaking someones 
> application. The major differences I see between the two solutions is 
> that counting will break people's applications who are otherwise 
> following the documented api contract of dictionaries, and randomization 
> will break people's applications who are violating the documented api 
> contract of dictionaries. 
> Personally I feel that the lesser of two evils is to reward those who 
> followed the documentation, and not reward those who didn't.
> So +1 for Randomization as the only option in 3.3, and off by default 
> with a flag or environment variable in bug fixes. I think it's the only 
> way to proceed that won't hurt people who have followed the documented 
> behavior.



From guido at  Fri Jan 20 21:02:39 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 12:02:39 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft <donald.stufft at> wrote:

> On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:
> > On 01/20/2012 02:04 PM, Donald Stufft wrote:
> > > Even if a MemoryException is raised I believe that is still a
> > > fundamental change in the documented contract of dictionary API.
> > How so? Dictionary inserts can *already* raise that error.
> Because it's raising it for a fundamentally different thing. "You have
> plenty of memory, but we decided to add an arbitrary limit that has nothing
> to do with memory and pretend you are out of memory anyways".

Actually due to fragmentation that can already happen.

--Guido van Rossum (

From mail at  Fri Jan 20 21:08:50 2012
From: mail at (Tim Golden)
Date: Fri, 20 Jan 2012 20:08:50 +0000
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <jfce8h$1c0$>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>	<1326901919.3395.67.camel@localhost.localdomain>	<>
	<jf9t7m$hu4$> <>
Message-ID: <>

On 20/01/2012 19:14, Georg Brandl wrote:
> Am 20.01.2012 00:54, schrieb "Martin v. Löwis":
>>> I can't help noticing that so far, worries about the workload came mostly from
>>> people who don't actually bear that load (this is no accusation!), while those
>>> that do are the proponents of the PEP...
>> Ok, so let me add then that I'm worried about the additional work-load.
>> I'm particularly worried about the coordination of vacation across the
>> three people that work on a release. It might well not be possible to
>> make any release for a period of two months, which, in a six-months
>> release cycle with two alphas and a beta, might mean that we (the
>> release people) would need to adjust our vacation plans with the release
>> schedule, or else step down (unless you would release the "normal"
>> feature releases as source-only releases).
> Thanks for the reminder, Martin.  Even with the current release schedule,
> I think that the load on you is too much, and we need a whole team of
> Windows release experts.  It's not really fair that the RM usually changes
> from release to release (at least every 2), and you have to do the same
> for everyone.
> It looks like we have one volunteer already; if we find another, I think
> one of them will also be not on vacation at most times :)

I'm certainly happy to help out there. Like everyone I'm
not always clear on my availability but the more people
who know what needs to be done, the better ISTM.


From ethan at  Fri Jan 20 21:05:12 2012
From: ethan at (Ethan Furman)
Date: Fri, 20 Jan 2012 12:05:12 -0800
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/20 Ethan Furman <ethan at>:
>> Summary:
>> Exception Chaining is cool, unless you are writing libraries that want to
>> transform from Exception X to Exception Y as the previous exception
>> context is unnecessary, potentially confusing, and cluttery (yup, just made
>> that word up!).
>> For all the gory details, see
>> I'm going to attempt a patch implementing MRAB's suggestion:
>> try:
>>    some_op
>> except ValueError:
>>    raise as OtherError() # `raise` keeps context, `raise as` does not
> I dislike this syntax. Raise what as OtherError()? I think the "raise
> x from None" idea is preferable, since it indicates you are nulling
> the context. The optimal solution would be to have "raise X
> nocontext", but that would obviously require another keyword...

Raise 'the error' as OtherError.

The problem I have with 'raise x from None' is it puts 'from None' clear 
at the end of line -- not a big deal on this very short example, but 
when you have actual text it's not as obvious:

except SomeError():
     raise SomeOtherError('explanatory text with actual %data to help 
track down the problem' % data) from None

Of course, I suppose that same issue exists with the 'raise x from exc' 
syntax, and 'from None' certainly matches that better...
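
For comparison, a minimal sketch of the 'raise x from None' spelling.
This is the form Python 3.3 later adopted (PEP 409); at the time of this
thread it was only a proposal. LookupFailed and the helper functions are
hypothetical names used for illustration.

```python
class LookupFailed(Exception):
    """Hypothetical library-level exception used only for this illustration."""


def find(mapping, key):
    try:
        return mapping[key]
    except KeyError:
        # 'from None' asks the traceback display to hide the KeyError.
        raise LookupFailed("no entry for %r" % (key,)) from None


def find_chained(mapping, key):
    try:
        return mapping[key]
    except KeyError:
        # Plain raise inside a handler: the KeyError is shown as context.
        raise LookupFailed("no entry for %r" % (key,))


def caught(func):
    """Call func on an empty mapping and return the exception it raises."""
    try:
        func({}, "spam")
    except LookupFailed as exc:
        return exc
```

In both variants the original KeyError is still recorded on the new
exception as __context__; 'from None' only sets __suppress_context__ so
that the default traceback does not print it.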


From benjamin at  Fri Jan 20 21:56:27 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 20 Jan 2012 15:56:27 -0500
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/20 Ethan Furman <ethan at>:
> Benjamin Peterson wrote:
>> 2012/1/20 Ethan Furman <ethan at>:
>>> Summary:
>>> Exception Chaining is cool, unless you are writing libraries that want to
>>> transform from Exception X to Exception Y as the previous exception
>>> context is unnecessary, potentially confusing, and cluttery (yup, just
>>> made
>>> that word up!).
>>> For all the gory details, see
>>> I'm going to attempt a patch implementing MRAB's suggestion:
>>> try:
>>>    some_op
>>> except ValueError:
>>>    raise as OtherError() # `raise` keeps context, `raise as` does not
>> I dislike this syntax. Raise what as OtherError()? I think the "raise
>> x from None" idea is preferable, since it indicates you are nulling
>> the context. The optimal solution would be to have "raise X
>> nocontext", but that would obviously require another keyword...
> Raise 'the error' as OtherError.

Where 'the error' is? Aren't you trying to override the current error?

> The problem I have with 'raise x from None' is it puts 'from None' clear at
> the end of line -- not a big deal on this very short example, but when you
> have actual text it's not as obvious:
> except SomeError():
>      raise SomeOtherError('explanatory text with actual %data to help track
> down the problem' % data) from None
> Of course, I suppose that same issue exists with the 'raise x from exc'
> syntax, and 'from None' certainly matches that better...



From d01c at  Fri Jan 20 21:20:12 2012
From: d01c at (Dr.-Ing. Ingo D. Rullhusen)
Date: Fri, 20 Jan 2012 21:20:12 +0100
Subject: [Python-Dev] negative ref count on windows debug version
Message-ID: <>



using

loc = PyDict_New();
Py_XDECREF(loc);

or

PyObject *src = Py_CompileString( code.toStdString().c_str(),
"<console>", Py_single_input );
Py_XDECREF(src);

results in a "object at blahblah has negative ref count -1" error on
windows visual studio in debug mode. And yes, python is compiled and
linked in debug mode also. The release version seems to work.

This happens in version 2.6.7 and 2.7.2.

Any hints?


From g.brandl at  Fri Jan 20 22:06:11 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 20 Jan 2012 22:06:11 +0100
Subject: [Python-Dev] PEP 407: New release cycle and introducing
 long-term support versions
In-Reply-To: <>
References: <>	<>	<>	<>	<1326891727.3395.44.camel@localhost.localdomain>	<>	<1326901919.3395.67.camel@localhost.localdomain>	<>
	<jf9t7m$hu4$> <>
	<jfce8h$1c0$> <>
Message-ID: <jfckp7$mrg$>

Am 20.01.2012 21:08, schrieb Tim Golden:
> On 20/01/2012 19:14, Georg Brandl wrote:
>> Am 20.01.2012 00:54, schrieb "Martin v. Löwis":
>>>> I can't help noticing that so far, worries about the workload came mostly from
>>>> people who don't actually bear that load (this is no accusation!), while those
>>>> that do are the proponents of the PEP...
>>> Ok, so let me add then that I'm worried about the additional work-load.
>>> I'm particularly worried about the coordination of vacation across the
>>> three people that work on a release. It might well not be possible to
>>> make any release for a period of two months, which, in a six-months
>>> release cycle with two alphas and a beta, might mean that we (the
>>> release people) would need to adjust our vacation plans with the release
>>> schedule, or else step down (unless you would release the "normal"
>>> feature releases as source-only releases).
>> Thanks for the reminder, Martin.  Even with the current release schedule,
>> I think that the load on you is too much, and we need a whole team of
>> Windows release experts.  It's not really fair that the RM usually changes
>> from release to release (at least every 2), and you have to do the same
>> for everyone.
>> It looks like we have one volunteer already; if we find another, I think
>> one of them will also be not on vacation at most times :)
> I'm certainly happy to help out there. Like everyone I'm
> not always clear on my availability but the more people
> who know what needs to be done, the better ISTM.

Definitely.  Thanks for volunteering, Tim!


From g.brandl at  Fri Jan 20 22:07:54 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 20 Jan 2012 22:07:54 +0100
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <jfckse$mrg$>

Am 20.01.2012 21:05, schrieb Ethan Furman:
> Benjamin Peterson wrote:
>> 2012/1/20 Ethan Furman <ethan at>:
>>> Summary:
>>> Exception Chaining is cool, unless you are writing libraries that want to
>>> transform from Exception X to Exception Y as the previous exception
>>> context is unnecessary, potentially confusing, and cluttery (yup, just made
>>> that word up!).
>>> For all the gory details, see
>>> I'm going to attempt a patch implementing MRAB's suggestion:
>>> try:
>>>    some_op
>>> except ValueError:
>>>    raise as OtherError() # `raise` keeps context, `raise as` does not
>> I dislike this syntax. Raise what as OtherError()? I think the "raise
>> x from None" idea is preferable, since it indicates you are nulling
>> the context. The optimal solution would be to have "raise X
>> nocontext", but that would obviously require another keyword...
> Raise 'the error' as OtherError.
> The problem I have with 'raise x from None' is it puts 'from None' clear 
> at the end of line -- not a big deal on this very short example, but 
> when you have actual text it's not as obvious:

Well, the "as" in "raise as" would be very easily overlooked too.

> except SomeError():
>      raise SomeOtherError('explanatory text with actual %data to help 
> track down the problem' % data) from None

In any case, I don't think the context suppression is the most important
thing about the exception raising, so it doesn't need to stand out...


From tjreedy at  Fri Jan 20 22:38:01 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 20 Jan 2012 16:38:01 -0500
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <jfcms2$ttp$>

Since 'raise' means 're-raise the current error', 'raise as OtherError' 
means (clearly to me, anyway) 're-raise the current error as 
OtherError'. This is just what you want to be able to say. Since 'raise' 
without a current error results in a TypeError, so should 'raise as 
OtherError'. I would just go with this as the proposal.
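
A quick check of the bare 'raise' case, assuming Python 3: there,
re-raising with no active exception produces a RuntimeError, while the
TypeError mentioned above is the Python 2 behaviour. The function name
is hypothetical.

```python
def bare_raise_error_name():
    """Return the name of the exception that a bare `raise` triggers
    when no exception is currently being handled."""
    try:
        raise  # nothing is being handled at this point
    except Exception as exc:
        return type(exc).__name__
```

A 'raise as OtherError' form would presumably have to fail the same way
when there is nothing to re-raise.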

Terry Jan Reedy

From amauryfa at  Fri Jan 20 22:42:20 2012
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Fri, 20 Jan 2012 22:42:20 +0100
Subject: [Python-Dev] negative ref count on windows debug version
In-Reply-To: <>
References: <>
Message-ID: <>


2012/1/20 Dr.-Ing. Ingo D. Rullhusen <d01c at>

> using
> loc = PyDict_New();
> Py_XDECREF(loc);
> or
> PyObject *src = Py_CompileString( code.toStdString().c_str(),
> "<console>", Py_single_input );
> Py_XDECREF(src);
> results in a "object at blahblah has negative ref count -1" error on
> windows visual studio in debug mode. And yes, python is compiled and
> linked in debug mode also. The release version seems to work.
> This happens in version 2.6.7 and 2.7.2.

This looks very unlikely. Python itself is written with tons of similar
code and works very well in debug mode.
If you can isolate a reproducible case, please file a ticket on,
with all details: code, versions of the compiler, etc.

Amaury Forgeot d'Arc

From tjreedy at  Fri Jan 20 23:11:15 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 20 Jan 2012 17:11:15 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <jfcoqc$4aq$>

On 1/20/2012 2:51 PM, Donald Stufft wrote:

> I think the counting collision is at best a bandaid and not a proper fix
> stemmed from a desire to not break existing applications on a bugfix
> release ...

My opinion of counting is better than yours, but even conceding the 
theoretical, purity argument, our release process is practical as well. 
There have been a few occasions when fixes to bugs in our code have been 
delayed from a bugfix release to the next feature release -- because the 
fix would break too much code depending on the bug.

Some years ago there was a proposal that we should deliberately tweak 
hash() to break 'buggy' code that depended on it not changing. This 
never happened. So it has been left de facto constant, to the extent it 
is, for some years.

Terry Jan Reedy

From wolfson at  Fri Jan 20 23:33:08 2012
From: wolfson at (Ben Wolfson)
Date: Fri, 20 Jan 2012 14:33:08 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <jfcoqc$4aq$>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy <tjreedy at> wrote:
> On 1/20/2012 2:51 PM, Donald Stufft wrote:
>> I think the counting collision is at best a bandaid and not a proper fix
>> stemmed from a desire to not break existing applications on a bugfix
>> release ...
> My opinion of counting is better than yours, but even conceding the
> theoretical, purity argument, our release process is practical as well.
> There have been a few occasions when fixes to bugs in our code have been
> delayed from a bugfix release to the next feature release -- because the fix
> would break too much code depending on the bug.

AFAICT Brett's suggestion (which had occurred to me as well, but I'm
not a core developer by any stretch) seemed to get lost in the debate:
would it be possible to go with collision counting for bugfix releases
and hash randomization for new feature releases? (Brett made it here:

Ben Wolfson
"Human kind has used its intelligence to vary the flavour of drinks,
which may be sweet, aromatic, fermented or spirit-based. ... Family
and social life also offer numerous other occasions to consume drinks
for pleasure." [Larousse, "Drink" entry]

From ethan at  Fri Jan 20 23:17:29 2012
From: ethan at (Ethan Furman)
Date: Fri, 20 Jan 2012 14:17:29 -0800
Subject: [Python-Dev] exception chaining
In-Reply-To: <jfckse$mrg$>
References: <>	<>	<>
Message-ID: <>

Georg Brandl wrote:
> Well, the "as" in "raise as" would be very easily overlooked too.
> In any case, I don't think the context suppression is the most important
> thing about the exception raising, so it doesn't need to stand out...

Good point.

From pydev at  Fri Jan 20 23:35:42 2012
From: pydev at (Frank Sievertsen)
Date: Fri, 20 Jan 2012 23:35:42 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Am 20.01.2012 16:33, schrieb Guido van Rossum:
> (I'm thinking that the original attack is trivial once the set of 
> 65000 colliding keys is public knowledge, which must be only a matter 
> of time.

I think it's very likely that this will happen soon.

For ASP and PHP there are attack payloads publicly available.
PHP and ASP have patches to limit the number of query-variables.

We're very lucky that there's no public payload for python yet,
and all non-public software and payload I'm aware of is based
upon my software.

But this can change any moment. It's not really difficult to
write software to create 32bit-collisions.
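
To illustrate why colliding keys are costly (this is not an attack
payload), the sketch below counts equality comparisons while inserting
keys that all share one hash into an ordinary dict. CollidingKey and
count_comparisons are hypothetical names; real attacks rely on strings
whose hashes collide, which is not shown here.

```python
class CollidingKey:
    """Hypothetical key: every instance hashes to the same value."""

    comparisons = 0  # class-wide counter of __eq__ calls

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0

    def __eq__(self, other):
        CollidingKey.comparisons += 1
        return isinstance(other, CollidingKey) and self.n == other.n


def count_comparisons(n):
    """Insert n distinct colliding keys and return how many __eq__ calls ran."""
    CollidingKey.comparisons = 0
    table = {}
    for i in range(n):
        table[CollidingKey(i)] = i
    return CollidingKey.comparisons
```

Each new key has to be compared against every key already stored, so the
count is at least n*(n-1)/2: doubling the number of keys roughly
quadruples the work.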


From donald.stufft at  Fri Jan 20 23:36:20 2012
From: donald.stufft at (Donald Stufft)
Date: Fri, 20 Jan 2012 17:36:20 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <jfcoqc$4aq$>
References: <>
Message-ID: <>

I believe that either solution has the potential to break existing applications so to ensure that no applications are broken there will need to be a method of disabling it. I also believe that to maintain the backwards compatibility that Python has traditionally had in bug fix releases that either solution will need to default to off. 

Given those 2 things that I believe, I don't think that the argument should be which solution will break less, because I believe both will need to be off by default, but which solution more adequately solves the underlying problem. 
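
The "method of disabling it" is left open in this message. As a sketch
of what such a switch can look like, the example below uses the
PYTHONHASHSEED environment variable that later CPython releases provide
(it is not named anywhere in this thread). hash_in_child is a
hypothetical helper, and the code assumes a Python 3 interpreter is
available as sys.executable.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter started with
    PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % (text,)], env=env)
    return int(out.decode().strip())
```

With a fixed integer seed the child's string hashes are reproducible
from run to run. Setting the variable to 0 turns randomization off, and
the value "random" asks for a fresh seed per process.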

On Friday, January 20, 2012 at 5:11 PM, Terry Reedy wrote:

> On 1/20/2012 2:51 PM, Donald Stufft wrote:
> > I think the counting collision is at best a bandaid and not a proper fix
> > stemmed from a desire to not break existing applications on a bugfix
> > release ...
> > 
> My opinion of counting is better than yours, but even conceding the 
> theoretical, purity argument, our release process is practical as well. 
> There have been a few occasions when fixes to bugs in our code have been 
> delayed from a bugfix release to the next feature release -- because the 
> fix would break too much code depending on the bug.
> Some years ago there was a proposal that we should deliberately tweak 
> hash() to break 'buggy' code that depended on it not changing. This 
> never happened. So it has been left de facto constant, to the extent it 
> is, for some years.
> -- 
> Terry Jan Reedy


From vijaymajagaonkar at  Fri Jan 20 23:40:29 2012
From: vijaymajagaonkar at (Vijay Majagaonkar)
Date: Fri, 20 Jan 2012 17:40:29 -0500
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-20, at 4:29 AM, Hynek Schlawack wrote:

> Hello Vijay 
> Am Freitag, 20. Januar 2012 um 00:56 schrieb Vijay N. Majagaonkar:
>> I am trying to build python 3 on mac and build failing with following error can somebody help me with this
> It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: 
> make clean
> CC=clang ./configure && make -s

Hi Hynek,

Thanks for the help, but the above command needs to be run in a different way:

./configure CC=clang

This allowed me to build the code, but when I ran the tests I got the following error message:

[363/364/3] test_io
python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I am using Mac OS X 10.7.2 and have installed Xcode 4.2.1.


From benjamin at  Fri Jan 20 23:45:08 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 20 Jan 2012 17:45:08 -0500
Subject: [Python-Dev] exception chaining
In-Reply-To: <jfcms2$ttp$>
References: <>
	<> <jfcms2$ttp$>
Message-ID: <>

2012/1/20 Terry Reedy <tjreedy at>:
> Since 'raise' means 're-raise the current error', 'raise as OtherError'
> means (clearly to me, anyway) 're-raise the current error as OtherError'.

That doesn't make any sense. You're changing the exception completely
not reraising it.


From guido at  Fri Jan 20 23:51:19 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 14:51:19 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 2:33 PM, Ben Wolfson <wolfson at> wrote:
> On Fri, Jan 20, 2012 at 2:11 PM, Terry Reedy <tjreedy at> wrote:
>> On 1/20/2012 2:51 PM, Donald Stufft wrote:
>>> I think the counting collision is at best a bandaid and not a proper fix
>>> stemmed from a desire to not break existing applications on a bugfix
>>> release ...
>> My opinion of counting is better than yours, but even conceding the
>> theoretical, purity argument, our release process is practical as well.
>> There have been a few occasions when fixes to bugs in our code have been
>> delayed from a bugfix release to the next feature release -- because the fix
>> would break too much code depending on the bug.
> AFAICT Brett's suggestion (which had occurred to me as well, but I'm
> not a core developer by any stretch) seemed to get lost in the debate:
> would it be possible to go with collision counting for bugfix releases
> and hash randomization for new feature releases? (Brett made it here:
> <>.)

I made it earlier.

--Guido van Rossum (

From guido at  Fri Jan 20 23:55:03 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 14:55:03 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 2:35 PM, Frank Sievertsen <pydev at> wrote:
> Am 20.01.2012 16:33, schrieb Guido van Rossum:
>> (I'm thinking that the original attack is trivial once the set of 65000
>> colliding keys is public knowledge, which must be only a matter of time.
> I think it's very likely that this will happen soon.
> For ASP and PHP there is attack-payload publicly available.
> PHP and ASP have patches to limit the number of query-variables.
> We're very lucky that there's no public payload for python yet,
> and all non-public software and payload I'm aware of is based
> upon my software.
> But this can change any moment. It's not really difficult to
> write software to create 32bit-collisions.

While we're debating the best fix, could we allow people to at least
protect themselves against script-kiddies by offering fixes to
Django, WebOb and a few other popular frameworks that limit forms to
1000 keys? (I suppose it's really only POST requests that are
vulnerable to script kiddies, because of the length restriction on

--Guido van Rossum (

From amauryfa at  Sat Jan 21 00:03:55 2012
From: amauryfa at (Amaury Forgeot d'Arc)
Date: Sat, 21 Jan 2012 00:03:55 +0100
Subject: [Python-Dev] Hashing proposal: change only string-only dicts
In-Reply-To: <>
References: <> <>
Message-ID: <>

2012/1/19 Gregory P. Smith <greg at>

> str[-1] is not likely to work if you want to maintain ABI compatibility.
>  Appending it to the data after the terminating \0 is more likely to be
> possible, but if there is any possibility that existing compiled extension
> modules have somehow inlined code to do allocation of the str field even
> that is questionable (i don't think there are?).

There are. Unfortunately.

Amaury Forgeot d'Arc

From steve at  Sat Jan 21 01:40:15 2012
From: steve at (Steven D'Aprano)
Date: Sat, 21 Jan 2012 11:40:15 +1100
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/20 Terry Reedy <tjreedy at>:
>> Since 'raise' means 're-raise the current error', 'raise as OtherError'
>> means (clearly to me, anyway) 're-raise the current error as OtherError'.
> That doesn't make any sense. You're changing the exception completely
> not reraising it.

I expect Terry is referring to the coder's intention, not the actual nuts and 
bolts of how it is implemented.

def spam():
     try:
         something()
     except HamError:
         raise SpamError

is implemented by catching a HamError and raising a completely different 
SpamError, but the intention is to "replace the HamError which actually 
occurred with a more appropriate SpamError".

At least that is *my* intention when I write code like the above, and it 
appears to be the usual intention in code I've seen that uses that idiom. 
Typically SpamError is part of the function's API while HamError is not.

The fact that Python doesn't actually "replace" anything is beside the point. 
The purpose of the idiom is to turn one exception into another exception, 
which is as close as we can get to re-raising HamError as a SpamError instead.

(It's not actually a re-raise as the traceback will point to a different line 
of code, but it's close enough.)

I'd prefer "raise SpamError from None", but I understand that this cannot work 
due to technical limitations. If that is the case, then "raise as SpamError" 
works for me.
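
A minimal sketch of the replacement idiom, assuming Python 3.3 or later, where `raise ... from None` was eventually accepted (PEP 409). `HamError`, `SpamError` and `something()` are the placeholder names from the example above, not real library names.

```python
class HamError(Exception):
    """Placeholder: an internal error that is not part of the API."""


class SpamError(Exception):
    """Placeholder: the error the function's API promises to raise."""


def something():
    raise HamError("internal detail")


def spam():
    try:
        something()
    except HamError:
        # Replace the HamError with a SpamError and suppress the
        # "During handling of the above exception..." chaining display.
        raise SpamError("spam failed") from None


try:
    spam()
except SpamError as exc:
    caught = exc  # keep a reference; 'exc' is unbound after the block
```

The original `HamError` is still reachable as `caught.__context__`; `from None` only sets `__suppress_context__` so that the traceback does not print it.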


From steve at  Sat Jan 21 01:53:31 2012
From: steve at (Steven D'Aprano)
Date: Sat, 21 Jan 2012 11:53:31 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<jfcfoo$461$>	<>
Message-ID: <>

Guido van Rossum wrote:
> On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft <donald.stufft at>wrote:
>>  On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:
>> On 01/20/2012 02:04 PM, Donald Stufft wrote:
>> Even if a MemoryException is raised I believe that is still a
>> fundamental change in the documented contract of dictionary API.
>> How so? Dictionary inserts can *already* raise that error.
>> Because it's raising it for a fundamentally different thing. "You have
>> plenty of memory, but we decided to add an arbitrary limit that has nothing
>> to do with memory and pretend you are out of memory anyways".
> Actually due to fragmentation that can already happen.

Whether you have run out of total memory, or a single contiguous block, it is 
still a memory error.

An arbitrary limit on collisions is not a memory error. If we were designing 
this API from scratch, would anyone propose using MemoryError for "you have 
reached a limit on collisions"? It has nothing to do with memory. Using 
MemoryError for something which isn't a memory error is ugly.

How about RuntimeError?


From guido at  Sat Jan 21 02:02:53 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 17:02:53 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

It should derive from BaseException.

--Guido van Rossum (sent from Android phone)
On Jan 20, 2012 4:59 PM, "Steven D'Aprano" <steve at> wrote:

> Guido van Rossum wrote:
>> On Fri, Jan 20, 2012 at 11:51 AM, Donald Stufft <donald.stufft at> wrote:
>>   On Friday, January 20, 2012 at 2:36 PM, Tres Seaver wrote:
>>> On 01/20/2012 02:04 PM, Donald Stufft wrote:
>>> Even if a MemoryException is raised I believe that is still a
>>> fundamental change in the documented contract of dictionary API.
>>> How so? Dictionary inserts can *already* raise that error.
>>> Because it's raising it for a fundamentally different thing. "You have
>>> plenty of memory, but we decided to add an arbitrary limit that has
>>> nothing
>>> to do with memory and pretend you are out of memory anyways".
>> Actually due to fragmentation that can already happen.
> Whether you have run out of total memory, or a single contiguous block, it
> is still a memory error.
> An arbitrary limit on collisions is not a memory error. If we were
> designing this API from scratch, would anyone propose using MemoryError for
> "you have reached a limit on collisions"? It has nothing to do with memory.
> Using MemoryError for something which isn't a memory error is ugly.
> How about RuntimeError?
> --
> Steven

From steve at  Sat Jan 21 03:25:08 2012
From: steve at (Steven D'Aprano)
Date: Sat, 21 Jan 2012 13:25:08 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <jfcdqh$vg4$>
References: <>	<>	<>	<>	<>
Message-ID: <>

Terry Reedy wrote:
> On 1/20/2012 11:17 AM, Victor Stinner wrote:
>> There is no perfect solutions, drawbacks of each solution should be 
>> compared.
> Amen.
> One possible attack that has been described for a collision counting 
> dict depends on knowing precisely the trigger point. So let 
> MAXCOLLISIONS either be configurable or just choose a random count 
> between M and N, say 700 and 999.

Have I missed something? Why wouldn't the attacker simply target 1000 
collisions, and if the collision triggers at 700 instead of 1000, that's a bonus?
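
A toy sketch of the point, assuming a simple open-addressing table with linear probing (CPython's real dict probing is different). `CountingTable` and `TooManyCollisions` are hypothetical names; the 700 to 999 range is the one suggested above. Integer keys that are multiples of the table size all land in slot 0, which stands in for an attacker's colliding keys.

```python
import random


class TooManyCollisions(Exception):
    """Hypothetical exception for exceeding the collision limit."""


class CountingTable:
    """Toy hash table that counts probes per insert (linear probing)."""

    def __init__(self, size=4096, low=700, high=999):
        self.slots = [None] * size
        # The trigger point is random, so the attacker cannot know it.
        self.limit = random.randint(low, high)

    def insert(self, key, value):
        i = hash(key) % len(self.slots)
        probes = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            probes += 1
            if probes >= self.limit:
                raise TooManyCollisions("%d probes for one insert" % probes)
            i = (i + 1) % len(self.slots)
        self.slots[i] = (key, value)


# An attacker who simply sends 1000 colliding keys trips the limit
# wherever it happens to fall between 700 and 999.
table = CountingTable()
tripped = False
try:
    for n in range(1000):
        table.insert(n * 4096, None)  # every key hashes to slot 0
except TooManyCollisions:
    tripped = True
```

In this toy model, hiding the exact trigger point does not stop an attacker who only wants the exception to fire.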


From steve at  Sat Jan 21 03:33:24 2012
From: steve at (Steven D'Aprano)
Date: Sat, 21 Jan 2012 13:33:24 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<jfcfoo$461$>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> It should derive from BaseException.

RuntimeError meets that requirement, and it is an existing exception so there 
are no issues with introducing a new built-in exception to a point release.

py> issubclass(RuntimeError, BaseException)
True


From benjamin at  Sat Jan 21 03:36:32 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 20 Jan 2012 21:36:32 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/20 Steven D'Aprano <steve at>:
> Guido van Rossum wrote:
>> It should derive from BaseException.
> RuntimeError meets that requirement, and it is an existing exception so
> there are no issues with introducing a new built-in exception to a point
> release.
> py> issubclass(RuntimeError, BaseException)
> True

Guido meant a direct subclass.


From guido at  Sat Jan 21 03:37:25 2012
From: guido at (Guido van Rossum)
Date: Fri, 20 Jan 2012 18:37:25 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 20, 2012 at 6:33 PM, Steven D'Aprano <steve at> wrote:
> Guido van Rossum wrote:
>> It should derive from BaseException.

> RuntimeError meets that requirement, and it is an existing exception so
> there are no issues with introducing a new built-in exception to a point
> release.
> py> issubclass(RuntimeError, BaseException)
> True

Sorry, I was ambiguous. I meant it should not derive from Exception.
It goes RuntimeError -> StandardError -> Exception -> BaseException.
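
A minimal sketch of why the distinction matters, written for Python 3 (where `StandardError` no longer exists and `RuntimeError` derives directly from `Exception`). `CollisionLimitError` is a hypothetical name, not an exception that exists in any release.

```python
class CollisionLimitError(BaseException):
    """Hypothetical: derives from BaseException, *not* from Exception."""


def handle_request(work):
    """Typical backstop handler that swallows ordinary errors."""
    try:
        return work()
    except Exception:
        return "handled"


def ordinary_bug():
    raise RuntimeError("ordinary failure")


def attack():
    raise CollisionLimitError("too many collisions")
```

`handle_request(ordinary_bug)` returns `"handled"`, while `handle_request(attack)` lets the `CollisionLimitError` propagate, just as `KeyboardInterrupt` and `SystemExit` pass through a generic `except Exception:` clause.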

--Guido van Rossum (

From steve at  Sat Jan 21 07:02:14 2012
From: steve at (Steven D'Aprano)
Date: Sat, 21 Jan 2012 17:02:14 +1100
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman wrote:

> The question I have at the moment is:  should `raise as` be an error if 
> no exception is currently being handled?

I think so.

"raise as Spam" essentially means "raise Spam with the context set to None". 
That's actually only useful if the context otherwise wouldn't be None, that 
is, if you're raising an exception when another exception is active. Doing it 
"just in case" defeats the usefulness of exception chaining, and should be 

It is easier to change our minds later and allow "raise as" outside of an 
except block, than to change our minds and forbid it.

> Example:
> def smurfy(x):
>     if x != 'magic flute':
>          raise as WrongInstrument
>     do_something_with_x
> If this is allowed then `smurfy` could be called from inside an `except` 
> clause or outside it.

What's your use-case? The only one I can think of is this:

def choose_your_own_exception(x):
     if x < 0:
         raise as ValueError
     elif x == 0:
         raise as SpamError
     elif x < 1:
         raise as SmallerThanOneError
     else:
         raise as RuntimeError

except TypeError:

I don't think we want to encourage such practices. Besides, if you really need 
such an exception selector, change every "raise as" to return, then do:

except TypeError:
     raise as choose_your_own_exception(x)

Much more clear.
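
Since `raise as` is only proposed syntax, here is a sketch of the same selector pattern using `raise ... from None`, assuming Python 3.3 or later. `SpamError`, `SmallerThanOneError`, `choose_exception` and `measure` are hypothetical names for illustration.

```python
class SpamError(Exception):
    """Placeholder exception."""


class SmallerThanOneError(Exception):
    """Placeholder exception."""


def choose_exception(x):
    """Return (not raise) the exception class appropriate for x."""
    if x < 0:
        return ValueError
    elif x == 0:
        return SpamError
    elif x < 1:
        return SmallerThanOneError
    else:
        return RuntimeError


def measure(x):
    try:
        return len(x)  # raises TypeError for numbers
    except TypeError:
        # The raise stays visible at the point where the error is handled.
        raise choose_exception(x)("cannot measure %r" % (x,)) from None
```

Keeping the `raise` in the `except` block means the selector is an ordinary function that can be called and tested anywhere.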

> I don't care for it for two reasons:
>   - I don't like the way it looks
>   - I can see it encouraging always using `raise as` instead of `raise` 
> and losing the value of exception chaining.
> Other thoughts?
> ~Ethan~

From tjreedy at  Sat Jan 21 07:07:00 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 21 Jan 2012 01:07:00 -0500
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <jfdkmd$1vf$>

On 1/20/2012 7:40 PM, Steven D'Aprano wrote:
> Benjamin Peterson wrote:
>> 2012/1/20 Terry Reedy <tjreedy at>:
>>> Since 'raise' means 're-raise the current error', 'raise as OtherError'
>>> means (clearly to me, anyway) 're-raise the current error as
>>> OtherError'.
>> That doesn't make any sense. You're changing the exception completely
>> not reraising it.
> I expect Terry is referring to the coder's intention, not the actual
> nuts and bolts of how it is implemented.

Yes, same error situation, translated, typically from developer language 
to app language.

> def spam():
>     try:
>         something()
>     except HamError:
>         raise SpamError
> is implemented by catching a HamError and raising a completely different
> SpamError, but the intention is to "replace the HamError which actually
> occurred with a more appropriate SpamError".
> At least that is *my* intention when I write code like the above, and it
> appears to be the usual intention in code I've seen that uses that
> idiom. Typically SpamError is part of the function's API while HamError
> is not.

Terry Jan Reedy

From senthil at  Sat Jan 21 07:32:12 2012
From: senthil at (Senthil Kumaran)
Date: Sat, 21 Jan 2012 14:32:12 +0800
Subject: [Python-Dev] 2.7 now uses Sphinx 1.0
In-Reply-To: <>
References: <>
Message-ID: <20120121063212.GA1988@mathmagic>

On Fri, Jan 20, 2012 at 07:05:41PM +0000, Chris Withers wrote:
> That's great news, does that now mean the objects inventory for
> Python 2.7 and Python 3 on now supports referring to
> section headers from 3rd party packages?

Nope. It does not seem to have any relation to that. Would you care to
explain more, possibly show an example in rst as what you mean?


From g.brandl at  Sat Jan 21 08:43:33 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 21 Jan 2012 08:43:33 +0100
Subject: [Python-Dev] 2.7 now uses Sphinx 1.0
In-Reply-To: <>
References: <>
Message-ID: <jfdq48$c13$>

Am 20.01.2012 20:05, schrieb Chris Withers:
> On 14/01/2012 16:14, Sandro Tosi wrote:
>> Hello,
>> just a heads-up: documentation for 2.7 branch has been ported to use
>> sphinx 1.0, so now the same syntax can be used for 2.x and 3.x
>> patches, hopefully easing work on both python stacks.
> That's great news, does that now mean the objects inventory for Python 
> 2.7 and Python 3 on now supports referring to section headers 
> from 3rd party packages?

Yes, they should -- there's something wrong with the automatic build still,
but I'll fix that ASAP.


From matthew at  Sat Jan 21 16:50:59 2012
From: matthew at (Matthew Woodcraft)
Date: Sat, 21 Jan 2012 15:50:59 +0000 (UTC)
Subject: [Python-Dev] Counting collisions for the win
References: <>
Message-ID: <jfemt2$te5$>

Victor Stinner  <victor.stinner at> wrote:
> I propose to solve the hash collision vulnerability by counting
> collisions [...]

> We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.

For web frameworks, forcing an exception is less harmful than forcing a
many-second delay, but I think it's hard to be confident that there
aren't other vulnerable applications where it's the other way round.

Web frameworks like the exception because they already have backstop
exception handlers, and anyway they use short-lived processes and keep
valuable data in databases rather than process memory.

Web frameworks don't like the delay because they allow unauthenticated
users to submit many requests (including multiple requests in parallel),
and they normally expect each response to take little cpu time.

But many programs are not like this.

What about a log analyser or a mailing list archiver or a web crawler or
a game server or some other kind of program we haven't considered?


From guido at  Sat Jan 21 17:45:29 2012
From: guido at (Guido van Rossum)
Date: Sat, 21 Jan 2012 08:45:29 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <jfemt2$te5$>
References: <>
Message-ID: <>

On Sat, Jan 21, 2012 at 7:50 AM, Matthew Woodcraft
<matthew at> wrote:
> Victor Stinner <victor.stinner at> wrote:
>> I propose to solve the hash collision vulnerability by counting
>> collisions [...]
>> We now know all issues of the randomized hash solution, and I
>> think that there are more drawbacks than advantages. IMO the
>> randomized hash is overkill to fix the hash collision issue.
> For web frameworks, forcing an exception is less harmful than forcing a
> many-second delay, but I think it's hard to be confident that there
> aren't other vulnerable applications where it's the other way round.
> Web frameworks like the exception because they already have backstop
> exception handlers, and anyway they use short-lived processes and keep
> valuable data in databases rather than process memory.
> Web frameworks don't like the delay because they allow unauthenticated
> users to submit many requests (including multiple requests in parallel),
> and they normally expect each response to take little cpu time.
> But many programs are not like this.
> What about a log analyser or a mailing list archiver or a web crawler or
> a game server or some other kind of program we haven't considered?

If my log crawler ended up taking minutes per log entry instead of
milliseconds I'd have to kill it anyway. Web crawlers are huge
multi-process systems that are as robust as web servers, or more. Game
servers are just web apps.

--Guido van Rossum (

From hs at  Sat Jan 21 19:57:23 2012
From: hs at (Hynek Schlawack)
Date: Sat, 21 Jan 2012 19:57:23 +0100
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

Am Freitag, 20. Januar 2012 um 23:40 schrieb Vijay Majagaonkar:
> > > I am trying to build python 3 on mac and build failing with following error can somebody help me with this
> > 
> > It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: 
> > 
> > make clean
> > CC=clang ./configure && make -s
> Thanks for the help, but above command need to run in different way
> ./configure CC=clang
> make

I'm not sure why you think it "needs" to be that way, but it's fine by me as both ways work fine.

> this allowed me to build the code but when ran test I got following error message
> [363/364/3] test_io
> python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
> *** error: can't allocate region
> *** set a breakpoint in malloc_error_break to debug
> I am using Mac OS-X 10.7.2 and insatlled Xcode 4.2.1 

Please ensure there aren't any gcc-created objects left by running "make distclean" first.


From dmalcolm at  Sat Jan 21 21:22:34 2012
From: dmalcolm at (David Malcolm)
Date: Sat, 21 Jan 2012 15:22:34 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <1327177356.4992.265.camel@surprise>

On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote:
> Hello,
> I still see at least two ways to create a DOS attack even with the
> collision-counting-patch.

[snip description of two types of attack on the collision counting

> What to do now?
> I think it's not smart to reduce the number of allowed collisions 
> dramatically
> AND count all slot-collisions at the same time.

Frank: did you see the new approach I proposed in:

(repurposes the ma_smalltable region of large dictionaries to add
tracking of each such dict's average iterations taken per modification,
and raise an exception when it exceeds a particular ratio)
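
A loose sketch of the general idea only, assuming a toy linear-probing table. It does not model the reuse of `ma_smalltable` or anything else in the actual patch; `RatioTable`, `SuspiciousUsage`, and the thresholds `max_ratio=32.0` and `min_mods=64` are hypothetical choices for illustration.

```python
class SuspiciousUsage(Exception):
    """Hypothetical exception for an abnormal probe/modification ratio."""


class RatioTable:
    """Toy table tracking average probe iterations per modification."""

    def __init__(self, size=1024, max_ratio=32.0, min_mods=64):
        self.slots = [None] * size
        self.max_ratio = max_ratio
        self.min_mods = min_mods      # ignore the ratio for small tables
        self.iterations = 0           # total probes over the table's life
        self.modifications = 0        # total inserts over the table's life

    def insert(self, key, value):
        i = hash(key) % len(self.slots)
        probes = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            probes += 1
            if probes >= len(self.slots):
                raise OverflowError("table is full")
            i = (i + 1) % len(self.slots)
        self.slots[i] = (key, value)

        self.iterations += probes
        self.modifications += 1
        if (self.modifications >= self.min_mods
                and self.iterations / self.modifications > self.max_ratio):
            raise SuspiciousUsage(
                "average of %.1f probes per modification"
                % (self.iterations / self.modifications))
```

Because the check is on an average over the table's lifetime, a single unlucky insert does not trigger it, whereas a sustained run of colliding keys does.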

I'm interested in hearing how it holds up against your various test
cases, or what flaws there are in it.


From vijaymajagaonkar at  Sat Jan 21 21:24:00 2012
From: vijaymajagaonkar at (Vijay Majagaonkar)
Date: Sat, 21 Jan 2012 15:24:00 -0500
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-21, at 1:57 PM, Hynek Schlawack wrote:

> Am Freitag, 20. Januar 2012 um 23:40 schrieb Vijay Majagaonkar:
>>>> I am trying to build python 3 on mac and build failing with following error can somebody help me with this
>>> It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: 
>>> make clean
>>> CC=clang ./configure && make -s
>> Thanks for the help, but above command need to run in different way
>> ./configure CC=clang
>> make
> I'm not sure why you think it "needs" to be that way, but it's fine by me as both ways work fine.

I am not sure; it was just something I tried that worked for me. The first option you suggested threw the same compile error, so I tried this one, and it worked :)

>> this allowed me to build the code but when ran test I got following error message
>> [363/364/3] test_io
>> python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>> *** error: can't allocate region
>> *** set a breakpoint in malloc_error_break to debug
>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>> *** error: can't allocate region
>> *** set a breakpoint in malloc_error_break to debug
>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>> *** error: can't allocate region
>> *** set a breakpoint in malloc_error_break to debug
>> I am using Mac OS-X 10.7.2 and insatlled Xcode 4.2.1 
> Please ensure there aren't any gcc-created objects left by running "make distclean" first.

I have tried this option too, but the result is still the same. I have attached the test results in case that helps.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mac_test.log
Type: application/octet-stream
Size: 3051915 bytes
Desc: not available
URL: <>
-------------- next part --------------
I would like to work on this if you can give me some guidelines for looking into this issue.

Thanks for the help

From solipsis at  Sat Jan 21 23:20:47 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 21 Jan 2012 23:20:47 +0100
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
References: <>
Message-ID: <>

On Sat, 21 Jan 2012 21:51:43 +0100
gregory.p.smith <python-checkins at> wrote:
> changeset:   74561:d01fecadf3ea
> branch:      3.2
> parent:      74558:03e61104f7a2
> user:        Gregory P. Smith <greg at>
> date:        Sat Jan 21 12:31:25 2012 -0800
> summary:
>   Avoid the compiler warning about the unused return value.

Can't that give you another warning about the ssize_t being truncated to
int?
How about the following instead?

    (void) write(...);



From benjamin at  Sat Jan 21 23:24:56 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 21 Jan 2012 17:24:56 -0500
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/21 Antoine Pitrou <solipsis at>:
> On Sat, 21 Jan 2012 21:51:43 +0100
> gregory.p.smith <python-checkins at> wrote:
>> changeset:   74561:d01fecadf3ea
>> branch:      3.2
>> parent:      74558:03e61104f7a2
>> user:        Gregory P. Smith <greg at>
>> date:        Sat Jan 21 12:31:25 2012 -0800
>> summary:
>>   Avoid the compiler warning about the unused return value.
> Can't that give you another warning about the ssize_t being truncated to
> int?
> How about the following instead?
>    (void) write(...);

Also, if you use a recent enough version of gcc, ./configure will
disable the warning. I would prefer if we stop using these kinds of
hacks.


From stefan at  Sat Jan 21 23:33:08 2012
From: stefan at (Stefan Krah)
Date: Sat, 21 Jan 2012 23:33:08 +0100
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning
	about	the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson <benjamin at> wrote:
> > Can't that give you another warning about the ssize_t being truncated to
> > int?
> > How about the following instead?
> >
> >   (void) write(...);
> Also, if you use a recent enough version of gcc, ./configure will
> disable the warning. I would prefer if stop using these kinds of
> hacks.

Do you mean (void)write(...)? Many people think this is good practice,
since it indicates to the reader that the return value is deliberately
ignored.

Stefan Krah

From benjamin at  Sat Jan 21 23:53:17 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 21 Jan 2012 17:53:17 -0500
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/21 Stefan Krah <stefan at>:
> Benjamin Peterson <benjamin at> wrote:
>> > Can't that give you another warning about the ssize_t being truncated to
>> > int?
>> > How about the following instead?
>> >
>> >   (void) write(...);
>> Also, if you use a recent enough version of gcc, ./configure will
>> disable the warning. I would prefer if stop using these kinds of
>> hacks.
> Do you mean (void)write(...)? Many people think this is good practice,
> since it indicates to the reader that the return value is deliberately
> ignored.

Not doing anything with it seems fairly deliberate to me.


From solipsis at  Sat Jan 21 23:52:36 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 21 Jan 2012 23:52:36 +0100
Subject: [Python-Dev] cpython (3.2): Fixes issue #8052: The posix
 subprocess module's close_fds behavior was
References: <>
Message-ID: <>

On Sat, 21 Jan 2012 23:39:41 +0100
gregory.p.smith <python-checkins at> wrote:
> changeset:   74563:61aa484a3e54
> branch:      3.2
> parent:      74561:d01fecadf3ea
> user:        Gregory P. Smith <greg at>
> date:        Sat Jan 21 14:01:08 2012 -0800
> summary:
>   Fixes issue #8052: The posix subprocess module's close_fds behavior was
> suboptimal by closing all possible file descriptors rather than just
> the open ones in the child process before exec().

For what it's worth, I'm not really confident with so much new low-level
code in a bugfix release.
IMHO it's more of a new feature, since it's a performance improvement.



From greg at  Sun Jan 22 00:02:58 2012
From: greg at (Gregory P. Smith)
Date: Sat, 21 Jan 2012 15:02:58 -0800
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 21, 2012 at 2:33 PM, Stefan Krah <stefan at> wrote:
> Benjamin Peterson <benjamin at> wrote:
>> > Can't that give you another warning about the ssize_t being truncated to
>> > int?
>> > How about the following instead?
>> >
>> >   (void) write(...);
>> Also, if you use a recent enough version of gcc, ./configure will
>> disable the warning. I would prefer if stop using these kinds of
>> hacks.
> Do you mean (void)write(...)? Many people think this is good practice,
> since it indicates to the reader that the return value is deliberately
> ignored.

Unfortunately (void) write(...) does not get rid of the warning.

Asking me to change the version of the compiler I'm using is
unfortunately not helpful.  I don't want to see this warning on any
common default compiler versions today.

I am not going to use a different gcc/g++ version than what my distro
provides to build python unless we start making that demand of all
CPython users as well.

It is normally a useful warning message, just not in this specific case.


From greg.ewing at  Sun Jan 22 00:22:56 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 22 Jan 2012 12:22:56 +1300
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

Glyph wrote:

> Yes, but you /can/ look at a 'yield' and conclude that you /might/ need 
> a lock, and that you have to think about it.

My concern is that you will end up with vastly more 'yield from's
than places that require locks, so most of them are just noise.
If you bite your nails over whether a lock is needed every time
you see one, they will cause you a lot more anxiety than they

> Sometimes there's no alternative, but wherever I can, I avoid thinking, 
> especially hard thinking.  This maxim has served me very well throughout 
> my programming career ;-).

There are already well-known techniques for dealing with
concurrency that minimise the amount of hard thinking required.
You devise some well-behaved abstractions, such as queues, and
put all your hard thinking into implementing them. Then you
build the rest of your code around those abstractions. That
way you don't have to rely on crutches such as explicitly
marking everything that might cause a task switch, because
it doesn't matter.
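
A minimal sketch of that pattern, assuming Python's standard queue.Queue
plays the role of the well-behaved abstraction. The worker function and
the squaring task are illustrative only and do not come from this thread.
The two threads share nothing except the queues, so the calling code
contains no explicit locks:

```python
import queue
import threading


def worker(inbox, outbox):
    """Square numbers taken from inbox until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)  # sentinel: tell the worker to stop
thread.join()

# A single worker reading a FIFO queue preserves the input order.
results = [outbox.get() for _ in range(5)]
```

All of the locking lives inside the queue implementation; the code built
on top of it only calls put() and get().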


From greg at  Sun Jan 22 00:36:26 2012
From: greg at (Gregory P. Smith)
Date: Sat, 21 Jan 2012 15:36:26 -0800
Subject: [Python-Dev] cpython (3.2): Fixes issue #8052: The posix
 subprocess module's close_fds behavior was
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 21, 2012 at 2:52 PM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 21 Jan 2012 23:39:41 +0100
> gregory.p.smith <python-checkins at> wrote:
>> changeset:   74563:61aa484a3e54
>> branch:      3.2
>> parent:      74561:d01fecadf3ea
>> user:        Gregory P. Smith <greg at>
>> date:        Sat Jan 21 14:01:08 2012 -0800
>> summary:
>>   Fixes issue #8052: The posix subprocess module's close_fds behavior was
>> suboptimal by closing all possible file descriptors rather than just
>> the open ones in the child process before exec().
> For what it's worth, I'm not really confident with so much new low-level
> code in a bugfix release.
> IMHO it's more of a new feature, since it's a performance improvement.

No APIs change and it makes the subprocess module usable on systems
running with high file descriptor limits where it was painfully slow
to use in the past.

This was a regression in behavior introduced with 3.2's change to make
close_fds=True be the (quite sane) default so I do consider it a fix
rather than a performance improvement.

Obviously the final decision rests with the 3.2.3 release manager.

For anyone uncomfortable with the code itself: The equivalent of that
code has been in use in production at work continuously in
multithreaded processes across a massive number of machines running a
variety of versions of Linux for many years now.  And the non-Linux
code is effectively what the Java VM's Process module does.


From benjamin at  Sun Jan 22 01:15:38 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 21 Jan 2012 19:15:38 -0500
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/21 Gregory P. Smith <greg at>:
> On Sat, Jan 21, 2012 at 2:33 PM, Stefan Krah <stefan at> wrote:
>> Benjamin Peterson <benjamin at> wrote:
>>> > Can't that give you another warning about the ssize_t being truncated to
>>> > int?
>>> > How about the following instead?
>>> >
>>> >   (void) write(...);
>>> Also, if you use a recent enough version of gcc, ./configure will
>>> disable the warning. I would prefer if we stop using these kinds of
>>> hacks.
>> Do you mean (void)write(...)? Many people think this is good practice,
>> since it indicates to the reader that the return value is deliberately
>> ignored.
> Unfortunately (void) write(...) does not get rid of the warning.
> Asking me to change the version of the compiler I'm using is
> unfortunately not helpful.  I don't want to see this warning on any
> common default compiler versions today.

I'm not asking you to. I'm just saying this annoyance (which is all it
is) has been fixed when the infrastructure is new enough to support
it.

> I am not going to use a different gcc/g++ version than what my distro
> provides to build python unless we start making that demand of all
> CPython users as well.
> It is normally a useful warning message, just not in this specific case.


From jared.grubb at  Sun Jan 22 01:19:46 2012
From: jared.grubb at (Jared Grubb)
Date: Sat, 21 Jan 2012 16:19:46 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On 20 Jan 2012, at 10:49, Brett Cannon wrote:
> Why can't we have our cake and eat it too?
> Can we do hash randomization in 3.3 and use the hash count solution for bugfix releases? That way we get a basic fix into the bugfix releases that won't break people's code (hopefully) but we go with a more thorough (and IMO correct) solution of hash randomization starting with 3.3 and moving forward. We aren't breaking compatibility in any way by doing this since it's a feature release anyway where we change tactics. And it can't be that much work since we seem to have patches for both solutions. At worst it will make merging commits for those files affected by the patches, but that will most likely be isolated and not a common collision (and less of any issue once 3.3 is released later this year).
> I understand the desire to keep backwards-compatibility, but collision counting could cause an error in some random input that someone didn't expect to cause issues whether they were under a DoS attack or just had some unfortunate input from private data. The hash randomization, though, is only weak if someone is attacked, not if they are just using Python with their own private data.

I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash:
 * this would solve the security issue that someone mentioned about being able to deduce the hash because if they keep being mean it'll change anyway
 * for bugfix, start off without randomization (seed==0) and start to use it only when the collision count hits the threshold
 * for release, reseeding when you hit a certain threshold still seems like a good idea as it will make lookups/insertions better in the long-run

AFAIUI, Python already doesn't guarantee order stability when you insert something into a dictionary, as in the worst case the dictionary has to resize its hash table, and then the order is freshly jumbled again.

Just my two cents.


From benjamin at  Sun Jan 22 01:21:52 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 21 Jan 2012 19:21:52 -0500
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fixes issue
 #8052: The posix subprocess module's close_fds behavior was
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/21 gregory.p.smith <python-checkins at>:
> +/* Convert ASCII to a positive int, no libc call. no overflow. -1 on error. */

Is no libc call important?

> +static int _pos_int_from_ascii(char *name)

To be consistent with the rest of posixmodule.c, "static int" should
be on a different line from the signature. This also applies to all
other function declarations added by this.

> +{
> +    int num = 0;
> +    while (*name >= '0' && *name <= '9') {
> +        num = num * 10 + (*name - '0');
> +        ++name;
> +    }
> +    if (*name)
> +        return -1;  /* Non digit found, not a number. */
> +    return num;
> +}
> +
> +
> +/* Returns 1 if there is a problem with fd_sequence, 0 otherwise. */
> +static int _sanity_check_python_fd_sequence(PyObject *fd_sequence)
> +{
> +    Py_ssize_t seq_idx, seq_len = PySequence_Length(fd_sequence);

PySequence_Length can fail.

> +    long prev_fd = -1;
> +    for (seq_idx = 0; seq_idx < seq_len; ++seq_idx) {
> +        PyObject* py_fd = PySequence_Fast_GET_ITEM(fd_sequence, seq_idx);
> +        long iter_fd = PyLong_AsLong(py_fd);
> +        if (iter_fd < 0 || iter_fd < prev_fd || iter_fd > INT_MAX) {
> +            /* Negative, overflow, not a Long, unsorted, too big for a fd. */
> +            return 1;
> +        }
> +    }
> +    return 0;
> +}
> +
> +
> +/* Is fd found in the sorted Python Sequence? */
> +static int _is_fd_in_sorted_fd_sequence(int fd, PyObject *fd_sequence)
> +{
> +    /* Binary search. */
> +    Py_ssize_t search_min = 0;
> +    Py_ssize_t search_max = PySequence_Length(fd_sequence) - 1;
> +    if (search_max < 0)
> +        return 0;
> +    do {
> +        long middle = (search_min + search_max) / 2;
> +        long middle_fd = PyLong_AsLong(
> +                PySequence_Fast_GET_ITEM(fd_sequence, middle));

No check for error?

> +        if (fd == middle_fd)
> +            return 1;
> +        if (fd > middle_fd)
> +            search_min = middle + 1;
> +        else
> +            search_max = middle - 1;
> +    } while (search_min <= search_max);
> +    return 0;
> +}
> +
> +
> +/* Close all file descriptors in the range start_fd inclusive to
> + * end_fd exclusive except for those in py_fds_to_keep.  If the
> + * range defined by [start_fd, end_fd) is large this will take a
> + * long time as it calls close() on EVERY possible fd.
> + */
> +static void _close_fds_by_brute_force(int start_fd, int end_fd,
> +                                      PyObject *py_fds_to_keep)
> +{
> +    Py_ssize_t num_fds_to_keep = PySequence_Length(py_fds_to_keep);
> +    Py_ssize_t keep_seq_idx;
> +    int fd_num;
> +    /* As py_fds_to_keep is sorted we can loop through the list closing
> +     * fds inbetween any in the keep list falling within our range. */
> +    for (keep_seq_idx = 0; keep_seq_idx < num_fds_to_keep; ++keep_seq_idx) {
> +        PyObject* py_keep_fd = PySequence_Fast_GET_ITEM(py_fds_to_keep,
> +                                                        keep_seq_idx);
> +        int keep_fd = PyLong_AsLong(py_keep_fd);
> +        if (keep_fd < start_fd)
> +            continue;
> +        for (fd_num = start_fd; fd_num < keep_fd; ++fd_num) {
> +            while (close(fd_num) < 0 && errno == EINTR);
> +        }
> +        start_fd = keep_fd + 1;
> +    }
> +    if (start_fd <= end_fd) {
> +        for (fd_num = start_fd; fd_num < end_fd; ++fd_num) {
> +            while (close(fd_num) < 0 && errno == EINTR);
> +        }
> +    }
> +}
> +
> +
> +#if defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)
> +/* It doesn't matter if d_name has room for NAME_MAX chars; we're using this
> + * only to read a directory of short file descriptor number names.  The kernel
> + * will return an error if we didn't give it enough space.  Highly Unlikely.
> + * This structure is very old and stable: It will not change unless the kernel
> + * chooses to break compatibility with all existing binaries.  Highly Unlikely.
> + */
> +struct linux_dirent {
> +   unsigned long  d_ino;        /* Inode number */
> +   unsigned long  d_off;        /* Offset to next linux_dirent */
> +   unsigned short d_reclen;     /* Length of this linux_dirent */
> +   char           d_name[256];  /* Filename (null-terminated) */
> +};
> +
> +/* Close all open file descriptors in the range start_fd inclusive to end_fd
> + * exclusive. Do not close any in the sorted py_fds_to_keep list.
> + *
> + * This version is async signal safe as it does not make any unsafe C library
> + * calls, malloc calls or handle any locks.  It is _unfortunate_ to be forced
> + * to resort to making a kernel system call directly but this is the ONLY api
> + * available that does no harm.  opendir/readdir/closedir perform memory
> + * allocation and locking so while they usually work they are not guaranteed
> + * to (especially if you have replaced your malloc implementation).  A version
> + * of this function that uses those can be found in the _maybe_unsafe variant.
> + *
> + * This is Linux specific because that is all I am ready to test it on.  It
> + * should be easy to add OS specific dirent or dirent64 structures and modify
> + * it with some cpp #define magic to work on other OSes as well if you want.
> + */
> +static void _close_open_fd_range_safe(int start_fd, int end_fd,
> +                                      PyObject* py_fds_to_keep)
> +{
> +    int fd_dir_fd;
> +    if (start_fd >= end_fd)
> +        return;
> +    fd_dir_fd = open(LINUX_SOLARIS_FD_DIR, O_RDONLY | O_CLOEXEC, 0);
> +    /* Not trying to open the BSD_OSX path as this is currently Linux only. */
> +    if (fd_dir_fd == -1) {
> +        /* No way to get a list of open fds. */
> +        _close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep);
> +        return;
> +    } else {
> +        char buffer[sizeof(struct linux_dirent)];
> +        int bytes;
> +        while ((bytes = syscall(SYS_getdents, fd_dir_fd,
> +                                (struct linux_dirent *)buffer,
> +                                sizeof(buffer))) > 0) {
> +            struct linux_dirent *entry;
> +            int offset;
> +            for (offset = 0; offset < bytes; offset += entry->d_reclen) {
> +                int fd;
> +                entry = (struct linux_dirent *)(buffer + offset);
> +                if ((fd = _pos_int_from_ascii(entry->d_name)) < 0)
> +                    continue;  /* Not a number. */
> +                if (fd != fd_dir_fd && fd >= start_fd && fd < end_fd &&
> +                    !_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) {
> +                    while (close(fd) < 0 && errno == EINTR);
> +                }
> +            }
> +        }
> +        close(fd_dir_fd);
> +    }
> +}
> +
> +#define _close_open_fd_range _close_open_fd_range_safe
> +
> +#else  /* NOT (defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)) */
> +
> +
> +/* Close all open file descriptors in the range start_fd inclusive to end_fd
> + * exclusive. Do not close any in the sorted py_fds_to_keep list.
> + *
> + * This function violates the strict use of async signal safe functions. :(
> + * It calls opendir(), readdir64() and closedir().  Of these, the one most
> + * likely to ever cause a problem is opendir() as it performs an internal
> + * malloc().  Practically this should not be a problem.  The Java VM makes the
> + * same calls between fork and exec in its own UNIXProcess_md.c implementation.
> + *
> + * readdir_r() is not used because it provides no benefit.  It is typically
> + * implemented as readdir() followed by memcpy().  See also:
> + *
> + */
> +static void _close_open_fd_range_maybe_unsafe(int start_fd, int end_fd,
> +                                              PyObject* py_fds_to_keep)
> +{
> +    DIR *proc_fd_dir;
> +#ifndef HAVE_DIRFD
> +    while (_is_fd_in_sorted_fd_sequence(start_fd, py_fds_to_keep) &&
> +           (start_fd < end_fd)) {
> +        ++start_fd;
> +    }
> +    if (start_fd >= end_fd)
> +        return;
> +    /* Close our lowest fd before we call opendir so that it is likely to
> +     * reuse that fd otherwise we might close opendir's file descriptor in
> +     * our loop.  This trick assumes that fd's are allocated on a lowest
> +     * available basis. */
> +    while (close(start_fd) < 0 && errno == EINTR);
> +    ++start_fd;
> +#endif
> +    if (start_fd >= end_fd)
> +        return;
> +
> +    proc_fd_dir = opendir(BSD_OSX_FD_DIR);
> +    if (!proc_fd_dir)
> +        proc_fd_dir = opendir(LINUX_SOLARIS_FD_DIR);
> +    if (!proc_fd_dir) {
> +        /* No way to get a list of open fds. */
> +        _close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep);
> +    } else {
> +        struct dirent64 *dir_entry;
> +#ifdef HAVE_DIRFD
> +        int fd_used_by_opendir = DIRFD(proc_fd_dir);
> +#else
> +        int fd_used_by_opendir = start_fd - 1;
> +#endif
> +        errno = 0;
> +        /* readdir64 is used to work around Solaris 9 bug 6395699. */
> +        while ((dir_entry = readdir64(proc_fd_dir))) {
> +            int fd;
> +            if ((fd = _pos_int_from_ascii(dir_entry->d_name)) < 0)
> +                continue;  /* Not a number. */
> +            if (fd != fd_used_by_opendir && fd >= start_fd && fd < end_fd &&
> +                !_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) {
> +                while (close(fd) < 0 && errno == EINTR);
> +            }
> +            errno = 0;
> +        }
> +        if (errno) {
> +            /* readdir error, revert behavior. Highly Unlikely. */
> +            _close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep);
> +        }
> +        closedir(proc_fd_dir);
> +    }
> +}
> +
> +#define _close_open_fd_range _close_open_fd_range_maybe_unsafe
> +
> +#endif  /* else NOT (defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)) */
> +
> +
>  /*
>   * This function is code executed in the child process immediately after fork
>   * to set things up and call exec().
> @@ -46,12 +292,12 @@
>                         int errread, int errwrite,
>                         int errpipe_read, int errpipe_write,
>                         int close_fds, int restore_signals,
> -                       int call_setsid, Py_ssize_t num_fds_to_keep,
> +                       int call_setsid,
>                         PyObject *py_fds_to_keep,
>                         PyObject *preexec_fn,
>                         PyObject *preexec_fn_args_tuple)
>  {
> -    int i, saved_errno, fd_num, unused;
> +    int i, saved_errno, unused;
>      PyObject *result;
>      const char* err_msg = "";
>      /* Buffer large enough to hold a hex integer.  We can't malloc. */
> @@ -113,33 +359,8 @@
>          POSIX_CALL(close(errwrite));
>      }
> -    /* close() is intentionally not checked for errors here as we are closing */
> -    /* a large range of fds, some of which may be invalid. */
> -    if (close_fds) {
> -        Py_ssize_t keep_seq_idx;
> -        int start_fd = 3;
> -        for (keep_seq_idx = 0; keep_seq_idx < num_fds_to_keep; ++keep_seq_idx) {
> -            PyObject* py_keep_fd = PySequence_Fast_GET_ITEM(py_fds_to_keep,
> -                                                            keep_seq_idx);
> -            int keep_fd = PyLong_AsLong(py_keep_fd);
> -            if (keep_fd < 0) {  /* Negative number, overflow or not a Long. */
> -                err_msg = "bad value in fds_to_keep.";
> -                errno = 0;  /* We don't want to report an OSError. */
> -                goto error;
> -            }
> -            if (keep_fd < start_fd)
> -                continue;
> -            for (fd_num = start_fd; fd_num < keep_fd; ++fd_num) {
> -                close(fd_num);
> -            }
> -            start_fd = keep_fd + 1;
> -        }
> -        if (start_fd <= max_fd) {
> -            for (fd_num = start_fd; fd_num < max_fd; ++fd_num) {
> -                close(fd_num);
> -            }
> -        }
> -    }
> +    if (close_fds)
> +        _close_open_fd_range(3, max_fd, py_fds_to_keep);
>      if (cwd)
>          POSIX_CALL(chdir(cwd));
> @@ -227,7 +448,7 @@
>      pid_t pid;
>      int need_to_reenable_gc = 0;
>      char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
> -    Py_ssize_t arg_num, num_fds_to_keep;
> +    Py_ssize_t arg_num;
>      if (!PyArg_ParseTuple(
>              args, "OOOOOOiiiiiiiiiiO:fork_exec",
> @@ -243,9 +464,12 @@
>          PyErr_SetString(PyExc_ValueError, "errpipe_write must be >= 3");
>          return NULL;
>      }
> -    num_fds_to_keep = PySequence_Length(py_fds_to_keep);
> -    if (num_fds_to_keep < 0) {
> -        PyErr_SetString(PyExc_ValueError, "bad fds_to_keep");
> +    if (PySequence_Length(py_fds_to_keep) < 0) {
> +        PyErr_SetString(PyExc_ValueError, "cannot get length of fds_to_keep");
> +        return NULL;
> +    }
> +    if (_sanity_check_python_fd_sequence(py_fds_to_keep)) {
> +        PyErr_SetString(PyExc_ValueError, "bad value(s) in fds_to_keep");
>          return NULL;
>      }
> @@ -348,8 +572,7 @@
>                     p2cread, p2cwrite, c2pread, c2pwrite,
>                     errread, errwrite, errpipe_read, errpipe_write,
>                     close_fds, restore_signals, call_setsid,
> -                   num_fds_to_keep, py_fds_to_keep,
> -                   preexec_fn, preexec_fn_args_tuple);
> +                   py_fds_to_keep, preexec_fn, preexec_fn_args_tuple);
>          _exit(255);
>          return NULL;  /* Dead code to avoid a potential compiler warning. */
>      }
> diff --git a/configure b/configure
> --- a/configure
> +++ b/configure
> @@ -6165,7 +6165,7 @@
>  sys/audioio.h sys/bsdtty.h sys/epoll.h sys/event.h sys/file.h sys/loadavg.h \
>  sys/lock.h sys/mkdev.h sys/modem.h \
>  sys/param.h sys/poll.h sys/select.h sys/socket.h sys/statvfs.h sys/stat.h \
> -sys/termio.h sys/time.h \
> +sys/syscall.h sys/termio.h sys/time.h \
>  sys/times.h sys/types.h sys/un.h sys/utsname.h sys/wait.h pty.h libutil.h \
>  sys/resource.h netpacket/packet.h sysexits.h bluetooth.h \
>  bluetooth/bluetooth.h linux/tipc.h spawn.h util.h
> diff --git a/ b/
> --- a/
> +++ b/
> @@ -1341,7 +1341,7 @@
>  sys/audioio.h sys/bsdtty.h sys/epoll.h sys/event.h sys/file.h sys/loadavg.h \
>  sys/lock.h sys/mkdev.h sys/modem.h \
>  sys/param.h sys/poll.h sys/select.h sys/socket.h sys/statvfs.h sys/stat.h \
> -sys/termio.h sys/time.h \
> +sys/syscall.h sys/termio.h sys/time.h \
>  sys/times.h sys/types.h sys/un.h sys/utsname.h sys/wait.h pty.h libutil.h \
>  sys/resource.h netpacket/packet.h sysexits.h bluetooth.h \
>  bluetooth/bluetooth.h linux/tipc.h spawn.h util.h)
> diff --git a/ b/
> --- a/
> +++ b/
> @@ -789,6 +789,9 @@
>  /* Define to 1 if you have the <sys/stat.h> header file. */
>  #undef HAVE_SYS_STAT_H
> +/* Define to 1 if you have the <sys/syscall.h> header file. */
> +#undef HAVE_SYS_SYSCALL_H
> +
>  /* Define to 1 if you have the <sys/termio.h> header file. */


From paul at  Sun Jan 22 05:09:10 2012
From: paul at (Paul McMillan)
Date: Sat, 21 Jan 2012 20:09:10 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb <jared.grubb at> wrote:
> I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash

This is nonsense. You have to determine the random seed at startup,
and it has to be uniform for the entire life of the process. You can't
change it after Python has started.


From steve at  Sun Jan 22 05:24:02 2012
From: steve at (Steven D'Aprano)
Date: Sun, 22 Jan 2012 15:24:02 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Paul McMillan wrote:
> On Sat, Jan 21, 2012 at 4:19 PM, Jared Grubb <jared.grubb at> wrote:
>> I agree; it sounds really odd to throw an exception since nothing is actually wrong and there's nothing the caller would do about it to recover anyway. Rather than throwing an exception, maybe you just reseed the random value for the hash
> This is nonsense. You have to determine the random seed at startup,
> and it has to be uniform for the entire life of the process. You can't
> change it after Python has started.

I may have a terminology problem here. I expect that a random seed must change 
every time it is used, otherwise the pseudorandom number generator using it 
just returns the same value each time. Should we be talking about a salt 
rather than a seed?


From stephen at  Sun Jan 22 05:59:37 2012
From: stephen at (Stephen J. Turnbull)
Date: Sun, 22 Jan 2012 13:59:37 +0900
Subject: [Python-Dev] cpython (3.2): Avoid the compiler warning about
 the unused return value.
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson writes:
 > 2012/1/21 Stefan Krah <stefan at>:

 > > Do you mean (void)write(...)? Many people think this is good practice,
 > > since it indicates to the reader that the return value is deliberately
 > > ignored.
 > Not doing anything with it seems fairly deliberate to me.

It may be deliberate, but then again it may not be.  EIBTI applies.

From anacrolix at  Sun Jan 22 07:45:11 2012
From: anacrolix at (Matt Joiner)
Date: Sun, 22 Jan 2012 17:45:11 +1100
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

> My concern is that you will end up with vastly more 'yield from's
> than places that require locks, so most of them are just noise.
> If you bite your nails over whether a lock is needed every time
> you see one, they will cause you a lot more anxiety than they
> alleviate.

Not necessarily. The yield from's follow the blocking control flow,
which is surprisingly less common than you might think. Parts of your
code naturally arise as not requiring blocking behaviour in the same
manner as in Haskell where parts of your code are identified as
requiring the IO monad.

>> Sometimes there's no alternative, but wherever I can, I avoid thinking,
>> especially hard thinking.  This maxim has served me very well throughout my
>> programming career ;-).

I'd replace "hard thinking" with "future confusion" here.

> There are already well-known techniques for dealing with
> concurrency that minimise the amount of hard thinking required.
> You devise some well-behaved abstractions, such as queues, and
> put all your hard thinking into implementing them. Then you
> build the rest of your code around those abstractions. That
> way you don't have to rely on crutches such as explicitly
> marking everything that might cause a task switch, because
> it doesn't matter.

It's my firm belief that this isn't sufficient. If this were true,
then the Python internals could be improved by replacing the GIL with
a series of channels/queues or what have you. State is complex, and
without guarantees of immutability, it's just not practical to try to
wrap every state object in some protocol to be passed back and forth
on queues.

From paul at  Sun Jan 22 08:44:24 2012
From: paul at (Paul McMillan)
Date: Sat, 21 Jan 2012 23:44:24 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> I may have a terminology problem here. I expect that a random seed must
> change every time it is used, otherwise the pseudorandom number generator
> using it just returns the same value each time. Should we be talking about a
> salt rather than a seed?

You should read the several other threads, the bug, as well as the
implementation and patch under discussion. Briefly, Python string
hashes are calculated once per string, and then used in many places.
You can't change the hash value for a string during program execution
without breaking everything. The proposed change modifies the starting
value of the hash function to include a process-wide randomly
generated seed. This seed is chosen randomly at runtime, but cannot
change once chosen. Using the seed changes the final output of the
hash to be unpredictable to an attacker, solving the underlying
problem.

Salt could also be an appropriate term here, but since salt is
generally changed on a per-use basis (a single process may use many
different salts), seed is more correct, since this value is only
chosen once per process.
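
A minimal sketch of the idea, assuming a toy multiplicative string hash.
This is not CPython's actual hash function and not the patch under
discussion; the names _SEED and seeded_hash are hypothetical. The seed is
drawn once when the module is loaded and mixed into the starting value,
so hashes are stable for the life of the process but differ between
processes:

```python
import os

# Hypothetical process-wide seed: chosen once at import time, never changed.
_SEED = int.from_bytes(os.urandom(8), "little")
_MASK = (1 << 64) - 1  # keep the toy hash within 64 bits


def seeded_hash(text, seed=None):
    """Toy string hash whose starting value includes a seed."""
    if seed is None:
        seed = _SEED
    data = text.encode("utf-8")
    if not data:
        return 0
    value = (seed ^ (data[0] << 7)) & _MASK
    for byte in data:
        value = ((value * 1000003) ^ byte) & _MASK
    return value ^ len(data)


# Within one process the same string always hashes to the same value.
stable = seeded_hash("spam") == seeded_hash("spam")

# Two different seeds (standing in for two processes) give different hashes.
differs = seeded_hash("spam", seed=1) != seeded_hash("spam", seed=2)
```

The explicit seed parameter exists only so the sketch can imitate two
processes side by side; in the scheme described above there is exactly
one seed per process.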


From greg at  Sun Jan 22 10:08:13 2012
From: greg at (Gregory P. Smith)
Date: Sun, 22 Jan 2012 01:08:13 -0800
Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fixes issue
 #8052: The posix subprocess module's close_fds behavior was
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 21, 2012 at 4:21 PM, Benjamin Peterson <benjamin at> wrote:
> 2012/1/21 gregory.p.smith <python-checkins at>:
> ...
>> +/* Convert ASCII to a positive int, no libc call. no overflow. -1 on error. */
> Is no libc call important?

Yes.  strtol() is not on the async signal safe C library function list.

>> +static int _pos_int_from_ascii(char *name)
> To be consistent with the rest of posixmodule.c, "static int" should
> be on a different line from the signature. This also applies to all
> other function declarations added by this.

Python C style as a whole, yes.  This file already has a mix of same
line vs two line declarations; I added these following the style of
the functions immediately surrounding them.  Want a style fixup on the
whole file?

>> +{
>> +    int num = 0;
>> +    while (*name >= '0' && *name <= '9') {
>> +        num = num * 10 + (*name - '0');
>> +        ++name;
>> +    }
>> +    if (*name)
>> +        return -1;  /* Non digit found, not a number. */
>> +    return num;
>> +}
>> +
>> +
>> +/* Returns 1 if there is a problem with fd_sequence, 0 otherwise. */
>> +static int _sanity_check_python_fd_sequence(PyObject *fd_sequence)
>> +{
>> +    Py_ssize_t seq_idx, seq_len = PySequence_Length(fd_sequence);
> PySequence_Length can fail.

It has already been checked not to by the only entry point into the
code in this file.

>> +    long prev_fd = -1;
>> +    for (seq_idx = 0; seq_idx < seq_len; ++seq_idx) {
>> +        PyObject* py_fd = PySequence_Fast_GET_ITEM(fd_sequence, seq_idx);
>> +        long iter_fd = PyLong_AsLong(py_fd);
>> +        if (iter_fd < 0 || iter_fd < prev_fd || iter_fd > INT_MAX) {
>> +            /* Negative, overflow, not a Long, unsorted, too big for a fd. */
>> +            return 1;
>> +        }
>> +    }
>> +    return 0;
>> +}
>> +
>> +
>> +/* Is fd found in the sorted Python Sequence? */
>> +static int _is_fd_in_sorted_fd_sequence(int fd, PyObject *fd_sequence)
>> +{
>> +    /* Binary search. */
>> +    Py_ssize_t search_min = 0;
>> +    Py_ssize_t search_max = PySequence_Length(fd_sequence) - 1;
>> +    if (search_max < 0)
>> +        return 0;
>> +    do {
>> +        long middle = (search_min + search_max) / 2;
>> +        long middle_fd = PyLong_AsLong(
>> +                PySequence_Fast_GET_ITEM(fd_sequence, middle));
> No check for error?

_sanity_check_python_fd_sequence() already checked the entire list to
guarantee that there would not be any such error.

>> +        if (fd == middle_fd)
>> +            return 1;
>> +        if (fd > middle_fd)
>> +            search_min = middle + 1;
>> +        else
>> +            search_max = middle - 1;
>> +    } while (search_min <= search_max);
>> +    return 0;
>> +}

In general this is an extension module that is best viewed as a whole
including its existing comments rather than as a diff.

It contains code that will look "odd" in a diff because much of it
executes in a path where not much is allowed (post fork, pre exec) and
no useful way of responding to an error is possible.  It therefore
pre-checks for any possible errors up front so that later code that is
unable to handle errors cannot possibly fail.
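
A rough Python sketch of that validate-first structure, assuming the keep
list is an ordinary list of ints. The function names mirror the C helpers
but are illustrative only; unlike the C helper, the checker here returns
True for a good list and tracks the previous value itself, and INT_MAX is
an assumed 32-bit limit:

```python
import bisect

INT_MAX = 2**31 - 1  # assumed width of a C int on the target platform


def sanity_check_fd_sequence(fds):
    """Return True if fds is a strictly increasing sequence of valid fds."""
    prev_fd = -1
    for fd in fds:
        if not isinstance(fd, int) or fd < 0 or fd <= prev_fd or fd > INT_MAX:
            return False
        prev_fd = fd
    return True


def is_fd_in_sorted_sequence(fd, fds):
    """Binary search; relies on fds having been validated beforehand."""
    idx = bisect.bisect_left(fds, fd)
    return idx < len(fds) and fds[idx] == fd


def fds_to_close(open_fds, start_fd, end_fd, fds_to_keep):
    """List the open fds in [start_fd, end_fd) that are not in fds_to_keep."""
    # All validation happens here, before the part that cannot report errors.
    if not sanity_check_fd_sequence(fds_to_keep):
        raise ValueError("bad value(s) in fds_to_keep")
    # From here on the code does no further error checking.
    return [fd for fd in open_fds
            if start_fd <= fd < end_fd
            and not is_fd_in_sorted_sequence(fd, fds_to_keep)]
```

For example, fds_to_close([0, 1, 2, 3, 4, 5, 7], 3, 8, [4, 7]) selects
fds 3 and 5, while an unsorted keep list is rejected before any
selection takes place.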


From victor.stinner at  Sun Jan 22 11:11:29 2012
From: victor.stinner at (Victor Stinner)
Date: Sun, 22 Jan 2012 11:11:29 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> This seed is chosen randomly at runtime, but cannot
> change once chosen.

The hash is used to compare objects: if hash(obj1) != hash(obj2),
objects are considered different. So two strings must have the same
hash if their value is the same.
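
A small illustration of that rule, assuming a hypothetical Key class
whose hash is supplied by the caller (nothing here comes from the patch).
Two keys with equal values but different hashes are treated as different
dictionary keys:

```python
class Key:
    """Hypothetical key type whose hash is chosen by the caller."""

    def __init__(self, value, forced_hash):
        self.value = value
        self.forced_hash = forced_hash

    def __hash__(self):
        return self.forced_hash

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value


table = {Key("spam", 1): "found"}

# Equal value and equal hash: the lookup finds the entry.
same_hash = table.get(Key("spam", 1))

# Equal value but a different hash: the lookup treats it as another key.
other_hash = table.get(Key("spam", 2))
```

Here same_hash is "found" and other_hash is None, even though the two
Key objects compare equal with ==.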

> Salt could also be an appropriate term here, but since salt is
> generally changed on a per-use basis (a single process may use many
> different salts), seed is more correct, since this value is only
> chosen once per process.

We may use a different salt per dictionary.


From fuzzyman at  Sun Jan 22 14:14:19 2012
From: fuzzyman at (Michael Foord)
Date: Sun, 22 Jan 2012 13:14:19 +0000
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 Jan 2012, at 20:24, Vijay Majagaonkar wrote:

> On 2012-01-21, at 1:57 PM, Hynek Schlawack wrote:
>> Am Freitag, 20. Januar 2012 um 23:40 schrieb Vijay Majagaonkar:
>>>>> I am trying to build Python 3 on a Mac and the build is failing with the following error; can somebody help me with this?
>>>> It is a known bug that Apple's latest gcc-llvm (that comes with Xcode 4.1 by default as gcc) miscompiles Python: 
>>>> make clean
>>>> CC=clang ./configure && make -s
>>> Thanks for the help, but above command need to run in different way
>>> ./configure CC=clang
>>> make
>> I'm not sure why you think it "needs" to be that way, but it's fine by me as both ways work fine.
> I am not sure; it was just something I tried. The first option you suggested threw the same compile error, so I tried this one and it worked :)

The problems compiling Python 3 on the Mac with XCode 4.1 have been reported and discussed here:

This invocation worked for me:

	./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug

All the best,

Michael Foord

>>> this allowed me to build the code but when ran test I got following error message
>>> [363/364/3] test_io
>>> python.exe(11411) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>>> *** error: can't allocate region
>>> *** set a breakpoint in malloc_error_break to debug
>>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>>> *** error: can't allocate region
>>> *** set a breakpoint in malloc_error_break to debug
>>> python.exe(11411,0x7fff7a8ba960) malloc: *** mmap(size=9223372036854775808) failed (error code=12)
>>> *** error: can't allocate region
>>> *** set a breakpoint in malloc_error_break to debug
>>> I am using Mac OS X 10.7.2 and have installed Xcode 4.2.1.
>> Please ensure there aren't any gcc-created objects left by running "make distclean" first.
> I have tried this option too but the result is still the same. I have attached the test results in case they help (<mac_test.log>), and I would like to work on this if you can give me some guidelines for looking into the issue.
> Thanks for the help
> ;)


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From lukasz at  Sun Jan 22 18:43:52 2012
From: lukasz at (Łukasz Langa)
Date: Sun, 22 Jan 2012 18:43:52 +0100
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 Jan 2012, at 14:14, Michael Foord wrote:

> 	./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug

Why the phony prefix?

Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

Pomyśl o środowisku naturalnym zanim wydrukujesz tę wiadomość!
Please consider the environment before printing out this e-mail.

From fuzzyman at  Sun Jan 22 19:17:03 2012
From: fuzzyman at (Michael Foord)
Date: Sun, 22 Jan 2012 18:17:03 +0000
Subject: [Python-Dev] python build failed on mac
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 Jan 2012, at 17:43, Łukasz Langa wrote:

> On 22 Jan 2012, at 14:14, Michael Foord wrote:
>> 	./configure CC=gcc-4.2 --prefix=/dev/null --with-pydebug
> Why the phony prefix?

Heh, it's what I've always done - I think copied from other developers. 

The dev guide suggests it:

There is normally no need to install your built copy of Python! The interpreter will realize where it is being run from and thus use the files found in the working copy. If you are worried you might accidentally install your working copy build, you can add --prefix=/dev/null to the configuration step.

Not that this is particularly a worry for me...

All the best,


> -- 
> Best regards,
> Łukasz Langa
> Senior Systems Architecture Engineer
> IT Infrastructure Department
> Grupa Allegro Sp. z o.o.
> Pomyśl o środowisku naturalnym zanim wydrukujesz tę wiadomość!
> Please consider the environment before printing out this e-mail.


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From regebro at  Sun Jan 22 20:53:41 2012
From: regebro at (Lennart Regebro)
Date: Sun, 22 Jan 2012 20:53:41 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 22, 2012 at 11:11, Victor Stinner
<victor.stinner at> wrote:
>> This seed is chosen randomly at runtime, but cannot
>> change once chosen.
> The hash is used to compare objects: if hash(obj1) != hash(obj2),
> objects are considered different. So two strings must have the same
> hash if their value is the same.
>> Salt could also be an appropriate term here, but since salt is
>> generally changed on a per-use basis (a single process may use many
>> different salts), seed is more correct, since this value is only
>> chosen once per process.
> We may use a different salt per dictionary.

Can we do that? I was thinking of ways to not raise errors when we get
over a collision count, but instead somehow change the way the
dictionary behaves when we get over the collision count, but I
couldn't come up with something. Somehow adding a salt would be one
possibility. But I don't see how it's doable except for the
string-keys only case mentioned before.

But I might just be lacking imagination. :-)


From solipsis at  Sun Jan 22 21:13:32 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 22 Jan 2012 21:13:32 +0100
Subject: [Python-Dev] Counting collisions for the win
References: <>
Message-ID: <>

I think this thread is approaching the recursion limit.
Be careful not to blow the stack :)



On Sun, 22 Jan 2012 20:53:41 +0100
Lennart Regebro <regebro at> wrote:
> On Sun, Jan 22, 2012 at 11:11, Victor Stinner
> <victor.stinner at> wrote:
> >> This seed is chosen randomly at runtime, but cannot
> >> change once chosen.
> >
> > The hash is used to compare objects: if hash(obj1) != hash(obj2),
> > objects are considered different. So two strings must have the same
> > hash if their value is the same.
> >
> >> Salt could also be an appropriate term here, but since salt is
> >> generally changed on a per-use basis (a single process may use many
> >> different salts), seed is more correct, since this value is only
> >> chosen once per process.
> >
> > We may use a different salt per dictionary.
> Can we do that? I was thinking of ways to not raise errors when we get
> over a collision count, but instead somehow change the way the
> dictionary behaves when we get over the collision count, but I
> couldn't come up with something. Somehow adding a salt would be one
> possibility. But I don't see how it's doable except for the
> string-keys only case mentioned before.
> But I might just be lacking imagination. :-)
> //Lennart

From paul at  Mon Jan 23 06:02:46 2012
From: paul at (Paul McMillan)
Date: Sun, 22 Jan 2012 21:02:46 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

> We may use a different salt per dictionary.

If we're willing to re-hash everything on a per-dictionary basis. That
doesn't seem reasonable given our existing usage.

From regebro at  Mon Jan 23 06:49:16 2012
From: regebro at (Lennart Regebro)
Date: Mon, 23 Jan 2012 06:49:16 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 23, 2012 at 06:02, Paul McMillan <paul at> wrote:
>> We may use a different salt per dictionary.
> If we're willing to re-hash everything on a per-dictionary basis. That
> doesn't seem reasonable given our existing usage.

Well, if we get crazy amounts of collisions, re-hashing with a new
salt to get rid of those collisions seems quite reasonable to me...


From stephen at  Mon Jan 23 07:15:54 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 23 Jan 2012 15:15:54 +0900
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Lennart Regebro writes:
 > On Mon, Jan 23, 2012 at 06:02, Paul McMillan <paul at> wrote:
 > >> We may use a different salt per dictionary.
 > >
 > > If we're willing to re-hash everything on a per-dictionary basis. That
 > > doesn't seem reasonable given our existing usage.
 > Well, if we get crazy amounts of collisions, re-hashing with a new
 > salt to get rid of those collisions seems quite reasonable to me...

But doesn't the whole idea of a hash table fall flat on its face if
you need to worry about crazy amounts of collisions (outside of
deliberate attacks)?

From timothy.c.delaney at  Mon Jan 23 07:41:51 2012
From: timothy.c.delaney at (Tim Delaney)
Date: Mon, 23 Jan 2012 17:41:51 +1100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On 23 January 2012 16:49, Lennart Regebro <regebro at> wrote:

> On Mon, Jan 23, 2012 at 06:02, Paul McMillan <paul at> wrote:
> >> We may use a different salt per dictionary.
> >
> > If we're willing to re-hash everything on a per-dictionary basis. That
> > doesn't seem reasonable given our existing usage.
> Well, if we get crazy amounts of collisions, re-hashing with a new
> salt to get rid of those collisions seems quite reasonable to me...

Actually, this looks like it has the seed of a solution in it. I haven't
scrutinised the following beyond "it sounds like it could work" - it could
well contain nasty flaws.

Assumption: We only get an excessive number of collisions during an attack
(directly or indirectly).
Assumption: Introducing a salt into hashes will change those hashes
sufficiently to mitigate the attack (all discussion of randomising hashes
makes this assumption).

1. Keep the current hashing (for all dictionaries) i.e. just using
hash(key).

2. Count collisions.

3. If any key hits X collisions change that dictionary to use a random salt
for hashes (at least for str and unicode keys). This salt would be
remembered for the dictionary.

Consequence: The dictionary would need to be rebuilt when an attack was
detected.
Consequence: Hash caching would no longer occur for this dictionary, making
most operations more expensive.
Consequence: Anything relying on the iteration order of a dictionary which
has suffered excessive conflicts would fail.

4. (Optional) in 3.3, provide a way to get a dictionary with random salt
(i.e. not wait for attack).
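
Steps 1 to 3 can be sketched in Python roughly as follows. This is a toy under stated assumptions, not a patch: `FallbackDict`, `plain_hash`, `salted_hash`, the bucket-list layout and the limit of 8 are all hypothetical, the unsalted hash is deliberately weak so that an attack is easy to stage, and a keyed BLAKE2 digest merely stands in for "a salted hash".

```python
import hashlib
import itertools
import os

COLLISION_LIMIT = 8   # "X" in the proposal; this value is an arbitrary assumption
NUM_BUCKETS = 64

def plain_hash(key: str) -> int:
    """Stand-in for the unsalted hash(); weak on purpose (anagrams collide)."""
    return sum(key.encode("utf-8"))

def salted_hash(key: str, salt: bytes) -> int:
    """Keyed hash used only once an attack is suspected."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8, key=salt).digest()
    return int.from_bytes(digest, "little")

class FallbackDict:
    def __init__(self):
        self.salt = None                          # step 1: no salt to begin with
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def _bucket(self, key):
        if self.salt is None:
            h = plain_hash(key)
        else:
            h = salted_hash(key, self.salt)       # not cached on the key
        return self.buckets[h % NUM_BUCKETS]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):       # step 2: walk the collisions
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        if self.salt is None and len(bucket) > COLLISION_LIMIT:
            self._rebuild_with_salt()             # step 3

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def _rebuild_with_salt(self):
        items = [item for bucket in self.buckets for item in bucket]
        self.salt = os.urandom(16)                # remembered for this dict only
        self.buckets = [[] for _ in range(NUM_BUCKETS)]
        for k, v in items:
            self._bucket(k).append((k, v))

d = FallbackDict()
attack_keys = ["".join(p) for p in itertools.permutations("abcde")]  # 120 anagrams
for n, key in enumerate(attack_keys):
    d[key] = n
```

The sketch also shows the listed consequences: the rebuild touches every stored item, and the bucket order (and so the iteration order) changes once the salt is in place.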

Tim Delaney

From pydev at  Mon Jan 23 09:53:06 2012
From: pydev at (Frank Sievertsen)
Date: Mon, 23 Jan 2012 09:53:06 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>


I'd still prefer to see a randomized hash()-function (at least for 3.3).

But to protect against the attacks it would be sufficient to use
randomization for collision resolution in dicts (and sets).

What if we use a second (randomized) hash-function in case there
are many collisions in ONE lookup. This hash-function is used only
for collision resolution and is not cached.

The benefits:

* protection against the known attacks
* hash(X) stays stable and the same
* dict order is only changed when there are many collisions
* doctests will not break
* enhanced collision resolution
* RNG doesn't have to be initialized in smaller programs
* nearly no slowdown of most dicts
* second hash-function is only used for keys with higher collision-rate
* lower probability to leak secrets
* possibility to use different secrets for each dict

The drawback:

* need to add a second hash-function
* slower than using one hash-function only, when > 20 collisions
* need to add this to container-types? (if used for py3.3)
* need to expose this to the user? (if used for py3.3)
* works only for datatypes with this new function
* possible to implement without breaking ABI?

The following code is meant for explanation purpose only:

for(perturb = hash; ; perturb >>= 5) {
     i = (i << 2) + i + perturb + 1;

     if((collisions++) == 20) {
         // perturb is already zero after 13 rounds.
         // 20 collisions are rare.

         // you can add && (ma_mask > 256) to make 100% sure
         // that it's not used for smaller dicts.

         if(Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) {
             // If type has a randomized hash, use this now for lookup
             i = perturb = PyObject_RandomizedHash(key);
         }
     }
     .....
}

If I got this right we could add a new function "tp_randomized_hash"
to 3.3 release. But can we also add functions in older releases, without
breaking ABI?

If not, can we implement this somehow using a flag?


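Py_hash_t
PyObject_RandomizedHash(PyObject *v) {
     PyTypeObject *tp = Py_TYPE(v);
     // Types without the flag have no randomized hash to offer.
     if(! (tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH))
         return -1;

     // Ask the next tp_hash call to return the randomized variant.
     global_flags_somewhere->USE_RANDOMIZED_HASH = 1;
     return (*tp->tp_hash)(v);
}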

.... and in unicodeobject.c: (and wherever we need randomization)

static Py_hash_t
unicode_hash(PyUnicodeObject *self)
{
     Py_ssize_t len;
     Py_UNICODE *p;
     Py_hash_t x;
     Py_hash_t prefix=0;
     Py_hash_t suffix=0;

     if(global_flags_somewhere->USE_RANDOMIZED_HASH) {
         // One-shot flag: reset it so the next call hashes normally.
         global_flags_somewhere->USE_RANDOMIZED_HASH = 0;

         ..... (and don't cache in this case) .....
     }
     .....
}

It's ugly, but if I understand this correctly, the GIL will
protect us against race-conditions, right?

Hello, internals experts: Would this work or is there a better way to do
this without breaking the ABI?


From lukasz at  Mon Jan 23 15:49:11 2012
From: lukasz at (Łukasz Langa)
Date: Mon, 23 Jan 2012 15:49:11 +0100
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <>

On 20 Jan 2012, at 21:05, Ethan Furman wrote:

> The problem I have with 'raise x from None' is it puts 'from None' clear at the end of line

from None raise SomeOtherError('etc.')

Better yet:

with nocontext():
	raise SomeOtherError('etc.')

But that's python-ideas territory ;)
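
For reference, the spelling being discussed is the one that eventually shipped in Python 3.3 (PEP 409). A minimal sketch of what it does; the `lookup` helper is a hypothetical example, not code from the thread:

```python
def lookup(mapping, name):
    """Translate a KeyError into an AttributeError without chaining."""
    try:
        return mapping[name]
    except KeyError:
        # "from None" asks the traceback display to hide the KeyError.
        raise AttributeError(name) from None

try:
    lookup({}, "spam")
except AttributeError as exc:
    caught = exc

# The original exception is still recorded, only its display is suppressed.
cause_is_none = caught.__cause__ is None
context_kept = isinstance(caught.__context__, KeyError)
display_suppressed = caught.__suppress_context__
```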

Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

Pomyśl o środowisku naturalnym zanim wydrukujesz tę wiadomość!
Please consider the environment before printing out this e-mail.


From v+python at  Mon Jan 23 19:25:43 2012
From: v+python at (Glenn Linderman)
Date: Mon, 23 Jan 2012 10:25:43 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/23/2012 12:53 AM, Frank Sievertsen wrote:
> What if we use a second (randomized) hash-function in case there
> are many collisions in ONE lookup. This hash-function is used only
> for collision resolution and is not cached. 

So this sounds like SafeDict, but putting it under the covers and 
automatically converting from dict to SafeDict after a collision 
threshold has been reached.  Let's call it fallback-dict.

Compared to SafeDict as a programmer tool, fallback-dict has these benefits:

* No need to change program (or library) source to respond to an attack
* Order is preserved until the collision threshold has been reached
* Performance is preserved until the collision threshold has been reached

and costs:

* converting the dict from one hash to the other by rehashing all the keys.

Compared to always randomizing the hash, fallback-dict has these benefits:

* hash (and perfomance) is deterministic: programs running on the same 
data set will have the same performance characteristic, unless the 
collision threshold is reached for that data set.
* lower probability to leak secrets, because each attacked set/dict can 
have its own secret, randomized hash seed
* patch would not need to include RNG initialization during startup, 
lowering the impact on short-running programs.

What is not clear is how much SafeDict degrades performance when it is 
used; non-cached hashes will definitely have an impact.  I'm not sure 
whether an implementation of fallback-dict in C code, would be 
significantly faster than the implementation of SafeDict in Python, to 
know whether comparing the performance of SafeDict and dict would be 
representative of the two stages of fallback-dict performance, but 
certainly the performance cost of SafeDict would be an upper bound on 
the performance cost of fallback-dict, once conversion takes place, but 
would not measure the conversion cost.  The performance of fallback-dict 
does have to be significantly better than the performance of dict with 
collisions to be beneficial, but if the conversion cost is significant, 
triggering conversions could be an attack vector.

From pydev at  Mon Jan 23 19:58:25 2012
From: pydev at (Frank Sievertsen)
Date: Mon, 23 Jan 2012 19:58:25 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 23.01.2012 19:25, Glenn Linderman wrote:
> So this sounds like SafeDict, but putting it under the covers and 
> automatically converting from dict to SafeDict after a collision 
> threshold has been reached.  Let's call it fallback-dict.
> and costs:
> * converting the dict from one hash to the other by rehashing all the 
> keys.

That's not exactly what it does: it calls the randomized hash function 
only for those keys that didn't find a free slot after 20 collisions, 
and it uses this value only for the further collision resolution.

So the value of hash() is used for the first 20 slots; randomized_hash() 
is used after that.

1st try: slot[i = perturb = hash];
2nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
3rd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
....
20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
21st try: slot[i = perturb = randomized_hash(key)] <---- HERE
22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]

This is also why there is no conversion needed. It's a
per-key/per-lookup rule.
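
The probe order listed above can be simulated in a few lines of Python. This is only an illustration under assumptions: `probe_sequence` is a hypothetical helper, hashes are taken as non-negative integers, and the randomized hash is passed in as a plain number rather than computed.

```python
def probe_sequence(key_hash, randomized_hash, mask, limit=20, max_probes=30):
    """Yield slot indices: driven by hash() for `limit` tries, then randomized."""
    i = perturb = key_hash
    yield i & mask                      # 1st try
    collisions = 0
    for _ in range(max_probes - 1):
        collisions += 1
        if collisions == limit:
            # 21st try: restart the sequence from the randomized hash.
            i = perturb = randomized_hash
        else:
            perturb >>= 5
            i = i * 5 + 1 + perturb
        yield i & mask

MASK = 2**10 - 1
# Two keys with the same hash() but different randomized hashes.
a = list(probe_sequence(key_hash=12345, randomized_hash=0x1111, mask=MASK))
b = list(probe_sequence(key_hash=12345, randomized_hash=0x2222, mask=MASK))

shared_prefix = a[:20] == b[:20]        # identical for the first 20 tries
diverged = a[20] != b[20]               # different from the 21st try on
```

Because the switch depends only on how many probes a single lookup has made, insertion and lookup of the same key walk the same slots, which is why no table-wide conversion is involved.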


From g.brandl at  Mon Jan 23 21:18:33 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 23 Jan 2012 21:18:33 +0100
Subject: [Python-Dev] exception chaining
In-Reply-To: <>
References: <>
Message-ID: <jfkfap$5fo$>

On 23.01.2012 15:49, Łukasz Langa wrote:

> Pomyśl o środowisku naturalnym zanim wydrukujesz tę wiadomość!
> Please consider the environment before printing out this e-mail.

Oh please?!


From v+python at  Mon Jan 23 21:15:36 2012
From: v+python at (Glenn Linderman)
Date: Mon, 23 Jan 2012 12:15:36 -0800
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 1/23/2012 10:58 AM, Frank Sievertsen wrote:
> On 23.01.2012 19:25, Glenn Linderman wrote:
>> So this sounds like SafeDict, but putting it under the covers and 
>> automatically converting from dict to SafeDict after a collision 
>> threshold has been reached.  Let's call it fallback-dict.
>> and costs:
>> * converting the dict from one hash to the other by rehashing all the 
>> keys.
> That's not exactly what it does: it calls the randomized hash function 
> only for those keys that didn't find a free slot after 20 collisions, 
> and it uses this value only for the further collision resolution.
> So the value of hash() is used for the first 20 slots; 
> randomized_hash() is used after that.
> 1st try: slot[i = perturb = hash];
> 2nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
> 3rd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
> ....
> 20th try: slot[i = i * 5 + 1 + (perturb >>= 5)]
> 21st try: slot[i = perturb = randomized_hash(key)] <---- HERE
> 22nd try: slot[i = i * 5 + 1 + (perturb >>= 5)]
> This is also why there is no conversion needed. It's a
> per-key/per-lookup rule.
> Frank

Interesting idea, and I see it would avoid conversions.  What happens if 
the data are also removed from the hash?  So you enter 20 colliding 
keys, then 20 more that get randomized, then delete 18 of the first 
20.  Can you still find the second 20 keys? Takes two sets of probes, 
somehow?

From lukasz at  Mon Jan 23 21:32:22 2012
From: lukasz at (Łukasz Langa)
Date: Mon, 23 Jan 2012 21:32:22 +0100
Subject: [Python-Dev] exception chaining
In-Reply-To: <jfkfap$5fo$>
References: <>
Message-ID: <>

On 23 Jan 2012, at 21:18, Georg Brandl wrote:

> On 23.01.2012 15:49, Łukasz Langa wrote:
> [graphics]
>> Pomyśl o środowisku naturalnym zanim wydrukujesz tę wiadomość!
>> Please consider the environment before printing out this e-mail.
> Oh please?!

Excuse me. Corpo speak! (at least it's short)

Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

From mal at  Mon Jan 23 22:55:47 2012
From: mal at (M.-A. Lemburg)
Date: Mon, 23 Jan 2012 22:55:47 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
Message-ID: <>

Frank Sievertsen wrote:
> Hello,
> I'd still prefer to see a randomized hash()-function (at least for 3.3).
> But to protect against the attacks it would be sufficient to use
> randomization for collision resolution in dicts (and sets).
> What if we use a second (randomized) hash-function in case there
> are many collisions in ONE lookup. This hash-function is used only
> for collision resolution and is not cached.

This sounds a lot like what I'm referring to as universal hash function
in the discussion on the ticket:

However, I don't like the term "random" in there. It's better to make
the approach deterministic to avoid issues with not being able
to easily reproduce Python application runs for debugging purposes.

If you find that the data is manipulated, simply incrementing the
universal hash parameter and rehashing the dict with that parameter
should be enough to solve the issue (if not, which is highly unlikely,
the dict will simply reapply the fix). No randomness needed.
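
A sketch of the "increment the parameter and rehash" idea, using integer keys. The hash family below is an assumption made for illustration (a Carter-Wegman-style multiply-mod-prime), not necessarily what is proposed on the ticket; `universal_hash`, `max_chain`, `settle_param` and the constants are hypothetical.

```python
P = 2**31 - 1            # a Mersenne prime, larger than every key used below
NBUCKETS = 64
LIMIT = 8                # collision threshold; an arbitrary assumption

def universal_hash(key: int, param: int) -> int:
    """h_param(key) = ((a * key) mod P) mod NBUCKETS, with a derived from param."""
    a = (1 + param * 0x9E3779B1) % P   # param 0 gives a == 1: plain key % NBUCKETS
    return (a * key) % P % NBUCKETS

def max_chain(keys, param):
    """Length of the fullest bucket when `keys` are hashed with `param`."""
    counts = [0] * NBUCKETS
    for k in keys:
        counts[universal_hash(k, param)] += 1
    return max(counts)

def settle_param(keys, start=0, max_tries=1000):
    """Increment the parameter until no bucket exceeds LIMIT. No randomness."""
    for param in range(start, start + max_tries):
        if max_chain(keys, param) <= LIMIT:
            return param
    raise RuntimeError("no suitable parameter found")

# Multiples of NBUCKETS all land in bucket 0 under the default parameter.
attack_keys = [NBUCKETS * j for j in range(64)]
chosen = settle_param(attack_keys)
```

Since nothing here is random, two runs over the same data settle on the same parameter, which is the reproducibility argument made above. The trade-off is that an attacker who knows the family can also predict the next parameter.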

BTW: I attached a demo script to the ticket which demonstrates both
types of collisions using integers.

Marc-Andre Lemburg


From janzert at  Mon Jan 23 23:38:47 2012
From: janzert at (Janzert)
Date: Mon, 23 Jan 2012 17:38:47 -0500
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <jfkni8$2b0$>

On 1/23/2012 1:25 PM, Glenn Linderman wrote:
> On 1/23/2012 12:53 AM, Frank Sievertsen wrote:
>> What if we use a second (randomized) hash-function in case there
>> are many collisions in ONE lookup. This hash-function is used only
>> for collision resolution and is not cached.
> So this sounds like SafeDict, but putting it under the covers and
> automatically converting from dict to SafeDict after a collision
> threshold has been reached.  Let's call it fallback-dict.

If you're going to essentially switch data structures dynamically 
anyway, why not just switch to something that doesn't have n**2 
worst-case performance?


From frank at  Mon Jan 23 21:43:11 2012
From: frank at (Frank Sievertsen)
Date: Mon, 23 Jan 2012 21:43:11 +0100
Subject: [Python-Dev] Counting collisions for the win
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

> Interesting idea, and I see it would avoid conversions.  What happens 
> if the data are also removed from the hash?  So you enter 20 
> colliding keys, then 20 more that get randomized, then delete 18 
> of the first 20.  Can you still find the second 20 keys? Takes two 
> sets of probes, somehow?
That's no problem, because the dict doesn't really free a slot; it
replaces the value with a dummy value.

These places are later reused for new values or the whole dict is 
recreated and


From brett at  Tue Jan 24 16:42:17 2012
From: brett at (Brett Cannon)
Date: Tue, 24 Jan 2012 10:42:17 -0500
Subject: [Python-Dev] Sprinting at PyCon US
Message-ID: <>

I went ahead  and signed us up as usual: . I listed myself as
the leader, but I will only be at the sprints one full day and whatever
part of Tuesday I can fit in before flying out to Toronto (which is
probably not much thanks to the timezone difference). So if someone wants
to be the official leader who will be there longer feel free to take me off
and put yourself in (and you don't need to ask me beforehand).

From g.brandl at  Tue Jan 24 19:52:53 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 24 Jan 2012 19:52:53 +0100
Subject: [Python-Dev] devguide: Use -j0 to maximimze parallel execution.
In-Reply-To: <>
References: <>
Message-ID: <jfmum4$se9$>

On 24.01.2012 18:58, brett.cannon wrote:
> changeset:   489:a34e4a6b89dc
> user:        Brett Cannon <brett at>
> date:        Tue Jan 24 12:58:01 2012 -0500
> summary:
>   Use -j0 to maximimze parallel execution.
> files:
>   runtests.rst |  2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> diff --git a/runtests.rst b/runtests.rst
> --- a/runtests.rst
> +++ b/runtests.rst
> @@ -41,7 +41,7 @@
>  If you have a multi-core or multi-CPU machine, you can enable parallel testing
>  using several Python processes so as to speed up things::
> -   ./python -m test -j2
> +   ./python -m test -j0

That only works on 3.3 though...


From brett at  Tue Jan 24 20:03:35 2012
From: brett at (Brett Cannon)
Date: Tue, 24 Jan 2012 14:03:35 -0500
Subject: [Python-Dev] devguide: Use -j0 to maximimze parallel execution.
In-Reply-To: <jfmum4$se9$>
References: <>
Message-ID: <>

On Tue, Jan 24, 2012 at 13:52, Georg Brandl <g.brandl at> wrote:

> On 24.01.2012 18:58, brett.cannon wrote:
> >
> > changeset:   489:a34e4a6b89dc
> > user:        Brett Cannon <brett at>
> > date:        Tue Jan 24 12:58:01 2012 -0500
> > summary:
> >   Use -j0 to maximimze parallel execution.
> >
> > files:
> >   runtests.rst |  2 +-
> >   1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >
> > diff --git a/runtests.rst b/runtests.rst
> > --- a/runtests.rst
> > +++ b/runtests.rst
> > @@ -41,7 +41,7 @@
> >  If you have a multi-core or multi-CPU machine, you can enable parallel
> testing
> >  using several Python processes so as to speed up things::
> >
> > -   ./python -m test -j2
> > +   ./python -m test -j0
> That only works on 3.3 though...

Bugger. I will add a note.

From alexis at  Tue Jan 24 21:54:13 2012
From: alexis at (Alexis Métaireau)
Date: Tue, 24 Jan 2012 21:54:13 +0100
Subject: [Python-Dev] Packaging and setuptools compatibility
Message-ID: <>

Hi folks,

I've had this on my mind for a long time, but I haven't talked about it 
on this list; I was only writing on distutils@ or another list we had for 
distutils2 (the fellowship of packaging).

AFAIK, we're almost good about packaging in Python 3.3, but there is 
still something that keeps bugging me. What we've done (I worked 
especially on this bit) is to provide a compatibility layer for the 
distributions packaged using setuptools/distribute. What it does, 
basically, is to install things using setuptools or distribute (the one 
present with the system) and then convert the metadata to the new one 
described in PEP 345.

A few things are not handled yet, regarding setuptools: entrypoints and 
namespaces. I would like to talk especially about entrypoints here.

Entrypoints are basically a plugin system. They store information 
in the metadata and retrieve it when needed. The problem with this, 
as with everything that reads information from metadata, is that we 
need to parse the metadata of all the installed distributions 
(say O(N)).
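
To illustrate the O(N) cost, here is a small Python sketch of an entry-point lookup. It is not the setuptools implementation: `INSTALLED` is a hypothetical in-memory stand-in for the INI-style entry_points.txt files stored with each installed distribution, and `iter_entry_points` here is a toy function written for this example.

```python
import configparser

# Hypothetical metadata, one INI-style blob per installed distribution.
INSTALLED = {
    "alpha": "[myapp.plugins]\ncsv = alpha.io:CsvPlugin\n",
    "beta": "[console_scripts]\nbeta = beta.cli:main\n",
    "gamma": "[myapp.plugins]\njson = gamma.io:JsonPlugin\n",
}

def iter_entry_points(group):
    """Yield (name, target) pairs for `group`.

    Every distribution's metadata is parsed, even those that declare
    nothing for the group, so the cost grows with the number installed.
    """
    for dist_name in sorted(INSTALLED):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(INSTALLED[dist_name])
        if parser.has_section(group):
            for name, target in parser.items(group):
                yield name, target

plugins = dict(iter_entry_points("myapp.plugins"))
```

A cache or index keyed by group name would avoid the full scan, at the price of having to keep it in sync on install and uninstall.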

I'm wondering if we should support that (a way to have plugins) in the 
new packaging thing, or not. If not, this means we should come up with 
another solution to support this outside of packaging (maybe in 
distribute). If yes, then we should design it, and probably make it a 
sub-part of packaging.

What are your opinions on that? Should we do it or not? And if yes, 
what's the way to go?

-- Alexis

From glyph at  Tue Jan 24 22:58:52 2012
From: glyph at (Glyph Lefkowitz)
Date: Tue, 24 Jan 2012 13:58:52 -0800
Subject: [Python-Dev] Packaging and setuptools compatibility
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 24, 2012, at 12:54 PM, Alexis Métaireau wrote:

> I'm wondering if we should support that (a way to have plugins) in the new packaging thing, or not. If not, this means we should come up with another solution to support this outside of packaging (maybe in distribute). If yes, then we should design it, and probably make it a sub-part of packaging.

First, my interest: Twisted has its own plugin system.  I would like this to continue to work in the future.

I do not believe that packaging should support plugins directly.  Run-time metadata is not the packaging system's job.  However, the packaging system does need to provide some guarantees about how to install and update data at installation (and post-installation time) so that databases of plugin metadata may be kept up to date.  Basically, packaging's job is constructing explicitly declared parallels between your development environment and your deployment environment.

Some such databases are outside of Python entirely (for example, you might think of /etc/init.d as such a database), so even if you don't care about the future of Twisted's weirdo plugin system, it would be nice for this to be supported.

In other words, packaging should have a meta-plugin system: a way for a plugin system to register itself and provide an API for things to install their metadata, and a way to query the packaging module about the way that a Python package is installed so that it can put things near to it in an appropriate way.  (Keep in mind that "near to it" may mean in a filesystem directory, or a zip file, or stuffed inside a bundle or executable.)

In my design of Twisted's plugin system, we used PEP 302 as this sort of meta-standard, and (modulo certain bugs in easy_install and pip, most of which are apparently getting fixed in pip pretty soon) it worked out reasonably well.  The big missing pieces are post-install and post-uninstall hooks.  If we had those, translating to "native" packages for Twisted (and for things that use it) could be made totally automatic.


From nadeem.vawda at  Wed Jan 25 05:05:19 2012
From: nadeem.vawda at (Nadeem Vawda)
Date: Wed, 25 Jan 2012 06:05:19 +0200
Subject: [Python-Dev] Status of Mac buildbots
Message-ID: <>

Hi all,

I've noticed that most of the Mac buildbots have been offline for a while:


Does anyone know what the status of these bots is? Are they
permanently down, or just temporarily inaccessible?


From greg at  Wed Jan 25 06:24:31 2012
From: greg at (Gregory P. Smith)
Date: Tue, 24 Jan 2012 21:24:31 -0800
Subject: [Python-Dev] Counting collisions w/ no need for a fatal
Message-ID: <>

On Sun, Jan 22, 2012 at 10:41 PM, Tim Delaney
<timothy.c.delaney at> wrote:
> On 23 January 2012 16:49, Lennart Regebro <regebro at> wrote:
>> On Mon, Jan 23, 2012 at 06:02, Paul McMillan <paul at> wrote:
>> >> We may use a different salt per dictionary.
>> >
>> > If we're willing to re-hash everything on a per-dictionary basis. That
>> > doesn't seem reasonable given our existing usage.
>> Well, if we get crazy amounts of collisions, re-hashing with a new
>> salt to get rid of those collisions seems quite reasonable to me...
> Actually, this looks like it has the seed of a solution in it. I haven't
> scrutinised the following beyond "it sounds like it could work" - it could
> well contain nasty flaws.
> Assumption: We only get an excessive number of collisions during an attack
> (directly or indirectly).
> Assumption: Introducing a salt into hashes will change those hashes
> sufficiently to mitigate the attack (all discussion of randomising hashes
> makes this assumption).
> 1. Keep the current hashing (for all dictionaries) i.e. just using
> hash(key).
> 2. Count collisions.
> 3. If any key hits X collisions change that dictionary to use a random salt
> for hashes (at least for str and unicode keys). This salt would be
> remembered for the dictionary.
> Consequence: The dictionary would need to be rebuilt when an attack was
> detected.
> Consequence: Hash caching would no longer occur for this dictionary, making
> most operations more expensive.
> Consequence: Anything relying on the iteration order of a dictionary which
> has suffered excessive conflicts would fail.


I like this!  The dictionary would still be O(n) but the constant cost
in front of that just went up.  When you are dealing with keys coming
in from outside of the process, those are unlikely to already have any
hash values so the constant cost at insertion time has really not
changed at all because they would need hashing anyways. Their cost at
non-iteration lookup time will be a constant factor greater but I do
not see that as being a problem given that known keys being looked up
in a

This approach also allows for the dictionary hashing mode switch to
occur after a lower number of collisions than the previous
investigations into raising a MemoryError or similar were asking for
(because they wanted to avoid false hard failures).  It prevents that
case from breaking in favor of a brief performance hiccup.
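
To make steps 1-3 concrete, here is a rough sketch of a table that uses
plain hash(key) until one insertion sees too many collisions and then
rebuilds itself with a random salt. Everything in it is an assumption for
illustration only: the class name SaltingTable, the linear probing, the
threshold of 8, and hash((salt, key)) as a stand-in for a real salted
string hash. It is not how CPython's dict is laid out.

```python
import random


class SaltingTable:
    """Toy open-addressing table: plain hash() until a key probes too far."""

    def __init__(self, size=64, max_probes=8):
        self.size = size
        self.max_probes = max_probes   # the "X collisions" threshold (assumed)
        self.salt = 0                  # 0 means unsalted, i.e. plain hash(key)
        self.slots = [None] * size
        self.used = 0
        self.rebuilds = 0

    def _index(self, key):
        # hash((salt, key)) is only a stand-in for a salted string hash.
        h = hash(key) if self.salt == 0 else hash((self.salt, key))
        return h % self.size

    def _find(self, key):
        """Linear probe; return (slot index, number of collisions seen)."""
        i, probes = self._index(key), 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            i, probes = (i + 1) % self.size, probes + 1
        return i, probes

    def _store(self, key, value):
        i, probes = self._find(key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)
        return probes

    def _rebuild(self, size, salt):
        # Re-insert every item under the new size and salt.
        items = [slot for slot in self.slots if slot is not None]
        self.size, self.salt = size, salt
        self.slots, self.used = [None] * size, 0
        for key, value in items:
            self._store(key, value)

    def __setitem__(self, key, value):
        if (self.used + 1) * 2 > self.size:        # keep load factor <= 1/2
            self._rebuild(self.size * 2, self.salt)
        if self._store(key, value) >= self.max_probes:
            # Suspected attack: switch this table to a fresh non-zero salt.
            self.rebuilds += 1
            self._rebuild(self.size, random.getrandbits(64) | 1)

    def __getitem__(self, key):
        i, _ = self._find(key)
        if self.slots[i] is None:
            raise KeyError(key)
        return self.slots[i][1]
```

Feeding it keys that all land in the same slot (small multiples of the
table size, since small ints hash to themselves in CPython) should trip
the switch after the ninth key, and lookups keep working afterwards. If a
later insertion trips the threshold again the sketch simply picks another
salt, which is one possible answer to Question A below.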

I would *combine* this with a per process/interpreter-instance seed in
3.3 and later for added impact (less need for this code path to ever
be triggered).  For the purposes of backporting as a security fix,
that part would be disabled by default but #1-3 would be enabled by

Question A: Does the dictionary get rebuilt -again- with a new
dict-salt if a large number of collisions occurs after a dict-salt has
already been established?

Question B: Is there a size of dictionary in which we refuse to
rebuild & rehash it because it would simply be too costly?  obviously
if we lack the ram to malloc a new table, when else?  ever?

Suggestion: Would there be any benefit to making the number of
collisions threshold on when to rebuild & rehash a log function of the
dictionary's current size rather than a constant for all dicts?
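
A sketch of what such a size-dependent threshold could look like; the
function name and the floor and scale constants are made up for
illustration and would need tuning against real workloads:

```python
import math


def collision_threshold(n_entries, floor=8, scale=4):
    """Collisions tolerated before a rebuild, growing with log2 of the size.

    floor and scale are placeholder constants, not measured values.
    """
    return max(floor, scale * math.ceil(math.log2(n_entries + 1)))
```

With these placeholder constants a near-empty dict tolerates 8
collisions while one with 1023 entries tolerates 40, so small dicts
switch modes quickly and large dicts are not rebuilt on ordinary bad luck.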

> 4. (Optional) in 3.3, provide a way to get a dictionary with random salt
> (i.e. not wait for attack).

I don't like #4 as a documented public API as I'm not sure how well
that'd play with other VMs (I suppose they could ignore it) but it
would be useful for dict implementation testing purposes and easier
studying of the behavior.  If this is added it should be a method on
the dict such as ._set_hash_salt() or something and for testing
purposes it would be good to allow a dictionary to be queried to see
if they are using their own salt or not (perhaps just
._get_hash_salt() returning non 0 means they are?)


From anacrolix at  Wed Jan 25 06:32:43 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 25 Jan 2012 16:32:43 +1100
Subject: [Python-Dev] io module types
Message-ID: <>

Can calls to the C types in the io module be made into module lookups
more akin to how it would work were it written in Python? The C
implementation for io_open invokes the C type objects for FileIO, and
friends, instead of looking them up on the io or _io modules. This
makes it difficult to subclass and/or modify the behaviour of those
classes from Python.
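
A small way to observe the behaviour described above: temporarily replace
io.FileIO with a counting subclass and see whether io.open() picks it up.
CountingFileIO and open_with_patched_fileio are names invented for this
sketch. On CPython I would expect the counter to stay at zero, because the
C implementation does not consult the module attribute, but that is an
implementation detail and other interpreters may differ.

```python
import io
import os
import tempfile


class CountingFileIO(io.FileIO):
    """Hypothetical subclass that counts how often it is instantiated."""
    opened = 0

    def __init__(self, *args, **kwargs):
        CountingFileIO.opened += 1
        super().__init__(*args, **kwargs)


def open_with_patched_fileio(path):
    """Open path while io.FileIO is swapped out; return the raw layer's type."""
    original = io.FileIO
    io.FileIO = CountingFileIO
    try:
        with io.open(path, "wb") as f:
            return type(f.raw)
    finally:
        io.FileIO = original          # always restore the real class


with tempfile.TemporaryDirectory() as tmp:
    raw_type = open_with_patched_fileio(os.path.join(tmp, "demo.bin"))

print(raw_type, CountingFileIO.opened)
```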

From anacrolix at  Wed Jan 25 08:35:30 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 25 Jan 2012 18:35:30 +1100
Subject: [Python-Dev] Coroutines and PEP 380
In-Reply-To: <>
References: <>
Message-ID: <>

After much consideration, and playing with PEP380, I've changed my
stance on this. Full blown coroutines are the proper way forward.
greenlet doesn't cut it because the Python interpreter isn't aware of
the context switches. Profiling, debugging and tracebacks are
completely broken by this. Stackless would need to be merged, and
that's clearly not going to happen.

I built a basic scheduler and had a go at "enhancing" the stdlib using
PEP380, here are some examples making use of this style:

After realising it was a dead-end, I read up on Mark's ideas, there's
some really good stuff in there:

If someone can explain what's stopping real coroutines being added to
Python (3.3), that would be great.

From matteo at  Wed Jan 25 10:41:20 2012
From: matteo at (Matteo Bertini)
Date: Wed, 25 Jan 2012 10:41:20 +0100
Subject: [Python-Dev] distutils 'depends' management
Message-ID: <jfoio0$t4b$>

I've noted that distutils manages depends in a way I cannot understand.

Suppose I have a minimal

from distutils.core import setup, Extension
      depends=['fop.conf'] # <---- note the typo foo->fop

Now will rebuild all every time, this is because the policy of
newer_group in build_extension is to consider 'newer' any missing file.

def build_extension(self, ext):
    depends = sources + ext.depends
    if not (self.force or newer_group(depends, ext_path, 'newer')):
        logger.debug("skipping '%s' extension (up-to-date)", ext.name)
        return
    else:
        logger.info("building '%s' extension", ext.name)

Can someone explain the reason for this choice, instead of
missing='error' (at least for ext.depends)?
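
To show the difference between the three policies, here is a simplified
stand-in for the check. newer_group_sketch is written for this example and
is not the distutils function itself; the file names and timestamps are
invented. It only illustrates how a misspelled dependency behaves under
'newer', 'ignore' and 'error'.

```python
import os
import tempfile


def newer_group_sketch(sources, target, missing="error"):
    """Return True if target is missing or older than any source.

    Simplified illustration of the policy, not the real distutils code.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for source in sources:
        if not os.path.exists(source):
            if missing == "error":
                raise FileNotFoundError(source)   # surface the typo
            if missing == "ignore":
                continue                          # pretend it is not listed
            return True                           # 'newer': always rebuild
        if os.path.getmtime(source) > target_mtime:
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "foo.c")
    target = os.path.join(tmp, "foo.so")
    typo = os.path.join(tmp, "fop.conf")          # never created
    for path, mtime in ((source, 1000), (target, 2000)):
        open(path, "w").close()
        os.utime(path, (mtime, mtime))            # target is newer than source
    rebuild_newer = newer_group_sketch([source, typo], target, "newer")
    rebuild_ignore = newer_group_sketch([source, typo], target, "ignore")
    try:
        newer_group_sketch([source, typo], target, "error")
        typo_reported = False
    except FileNotFoundError:
        typo_reported = True
```

Under 'newer' the up-to-date target is rebuilt on every run because of the
typo, under 'ignore' the typo goes unnoticed, and only 'error' reports it.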


From janssen at  Wed Jan 25 16:35:20 2012
From: janssen at (Bill Janssen)
Date: Wed, 25 Jan 2012 07:35:20 PST
Subject: [Python-Dev] Status of Mac buildbots
In-Reply-To: <>
References: <>
Message-ID: <>

Nadeem Vawda <nadeem.vawda at> wrote:

> Hi all,
> I've noticed that most of the Mac buildbots have been offline for a while:
> *
> *
> *
> Does anyone know what the status of these bots is? Are they
> permanently down, or just temporarily inaccessible?

We're tinkering with that server room.  They should be back by the end of
the week.


From nadeem.vawda at  Wed Jan 25 16:57:38 2012
From: nadeem.vawda at (Nadeem Vawda)
Date: Wed, 25 Jan 2012 17:57:38 +0200
Subject: [Python-Dev] Status of Mac buildbots
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Jan 25, 2012 at 5:35 PM, Bill Janssen <janssen at> wrote:
> We're tinkering with that server room.  They should be back by the end of
> the week.

OK, cool. Thanks for the info.

From pje at  Wed Jan 25 18:28:23 2012
From: pje at (PJ Eby)
Date: Wed, 25 Jan 2012 12:28:23 -0500
Subject: [Python-Dev] Packaging and setuptools compatibility
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/24 Alexis Métaireau <alexis at>

> Entrypoints basically are a plugin system. They are storing information in
> the metadata and then retrieving them when needing them. The problem with
> this, as everything when trying to get information from metadata is that we
> need to parse all the metadata for all the installed distributions. (say
> O(N)).

Note that this is why setuptools doesn't put entry points into PKG-INFO,
but instead uses separate metadata files.  Thus there is a lower "N" as
well as smaller files to parse.  ;-)

Entrypoints are also only one type of extension metadata supported by
setuptools; there is for example the EggTranslations system built on
setuptools metadata system: it allows plugins to provide translations and
localized resources for applications, and for other plugins in the same
application.  And it does this by using a different metadata file, again
stored in the installed project's metadata.

Since the new packaging metadata format is still a directory (replacing
setuptools' EGG-INFO or .egg-info directories), it seems a reasonable
migration path to simply install entry_points.txt and other metadata
extensions to that same directory, and provide API to iterate over all the
packages that offer a particular metadata file name.  Entry points work
this way now in setuptools, i.e. they iterate over all eggs containing
entry_points metadata, then parse and cache the contents.  An API for doing
the same sort of thing here seems appropriate.  This is still "meta" as
Glyph suggests, and allows both setuptools-style entry point plugins,
EggTranslations-style plugins, and whatever other sorts of plugin systems
people would like.  (I believe some other systems exist with this sort of
metadata scheme; ISTM that Paster has a metadata format, but I don't know
if it's exposed in egg-info metadata like this currently.)

Anyway, if you offer an API for finding packages by metadata file (or even
just a per-installed-package object API to query the existence of a
metadata file), and for process-level caching of extended metadata for
installed packages, that is sufficient for the above systems to work,
without needing to bless any particular plugin API per se.

From mark at  Thu Jan 26 10:30:45 2012
From: mark at (Mark Shannon)
Date: Thu, 26 Jan 2012 09:30:45 +0000
Subject: [Python-Dev] [Python-ideas] Coroutines and PEP 380
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Nick Coghlan wrote:
> (redirecting to python-ideas - coroutine proposals are nowhere near
> mature enough for python-dev)
> On Wed, Jan 25, 2012 at 5:35 PM, Matt Joiner <anacrolix at> wrote:
>> If someone can explain what's stopping real coroutines being added to
>> Python (3.3), that would be great.
> The general issues with that kind of idea:
> - the author hasn't offered the code for inclusion and relicensing
> under the PSF license (thus we legally aren't allowed to do it)
If by the author you mean me, then of course it can be included.
Since it is a fork of CPython and I haven't changed the licence
I assumed it already was under the PSF licence.
> - complexity
> - maintainability
Hard to measure, but it adds about 200 lines of code.
> - platform support
It's all fully portable standard C.
> In the specific case of coroutines, you have the additional hurdle of
> convincing people whether or not they're a good idea at all.
That may well be the biggest obstacle :)

One other obstacle (and this may be a killer) is that it may not be
practical to refactor Jython to use coroutines since Jython compiles
Python direct to JVM bytecodes and the JVM doesn't support coroutines.
Jython should be able to support yield-from much more easily.


From brian at  Thu Jan 26 21:33:43 2012
From: brian at (Brian Curtin)
Date: Thu, 26 Jan 2012 14:33:43 -0600
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 17, 2012 at 15:11, Brian Curtin <brian at> wrote:
> On Tue, Jan 17, 2012 at 15:01, "Martin v. Löwis" <martin at> wrote:
>>> I previously completed the port at my old company (but could not
>>> release it), and I have a good bit of it completed for us at
>>> That repo is a little bit
>>> behind 'default' but updating it shouldn't pose any problems.
>> So: do you agree that we switch? Do you volunteer to drive the change?
> I do, and I'll volunteer.

Is this considered a new feature that has to be in by the first beta?
I'm hoping to have it completed much sooner than that so we can get
mileage on it, but is there a cutoff for changing the compiler?

From martin at  Thu Jan 26 21:54:31 2012
From: martin at (martin at
Date: Thu, 26 Jan 2012 21:54:31 +0100
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

> Is this considered a new feature that has to be in by the first beta?
> I'm hoping to have it completed much sooner than that so we can get
> mileage on it, but is there a cutoff for changing the compiler?

At some point, I'll start doing this myself if it hasn't been done by
then, and I would certainly want the build process adjusted (with
all buildbots updated) before beta 1.


From ethan at  Fri Jan 27 04:19:45 2012
From: ethan at (Ethan Furman)
Date: Thu, 26 Jan 2012 19:19:45 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
Message-ID: <>

Title: Interpreter support for concurrent programming
Version: $Revision$
Last-Modified: $Date$
Author: Ethan Furman <ethan at>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 26-Jan-2012
Python-Version: 3.3


One of the open issues from PEP 3134 is suppressing context:  currently 
there is no way to do it.  This PEP proposes one.


There are two basic ways to generate exceptions: 1) Python does it 
(buggy code, missing resources, ending loops, etc.); and, 2) manually 
(with a raise statement).

When writing libraries, or even just custom classes, it can become 
necessary to raise exceptions; moreover it can be useful, even 
necessary, to change from one exception to another.  To take an example 
from my dbf module:

     try:
         value = int(value)
     except Exception:
         raise DbfError(...)

Whatever the original exception was (ValueError, TypeError, or something
else) is irrelevant.  The exception from this point on is a DbfError,
and the original exception is of no value.  However, if this exception
is printed, we would currently see both.

Several possibilities have been put forth:

   - raise as NewException()

     Reuses the 'as' keyword; can be confusing since we are not really 
reraising the originating exception

   - raise NewException() from None

     Follows existing syntax of explicitly declaring the originating
exception

   - exc = NewException(); exc.__context__ = None; raise exc

     Very verbose way of the previous method

   - raise NewException.no_context(...)

     Make context suppression a class method.

All of the above options will require changes to the core.


I propose going with the second option:

     raise NewException from None

It has the advantage of using the existing pattern of explicitly setting 
the cause:

     raise KeyError() from NameError()

but because the 'cause' is None the previous context is discarded. 
There is a patch to this effect attached to Issue6210 
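
For reference, a sketch of the proposed spelling applied to the dbf
example above. It runs on Python 3.3 and later, where "raise ... from
None" is accepted; DbfError here is a bare placeholder class and to_int is
a made-up helper, neither is taken from the dbf module.

```python
class DbfError(Exception):
    """Placeholder for the library-specific exception in the example."""


def to_int(value):
    try:
        return int(value)
    except Exception:
        # Replace whatever went wrong and hide it from the default traceback.
        raise DbfError("not a number: %r" % (value,)) from None


try:
    to_int("abc")
except DbfError as exc:
    caught = exc

print(repr(caught.__cause__), repr(caught.__context__))
```

In those later versions the original exception is still reachable through
__context__ for programmatic handlers; only its display in the default
traceback is suppressed.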


This document has been placed in the public domain.

    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8

From benjamin at  Fri Jan 27 04:54:06 2012
From: benjamin at (Benjamin Peterson)
Date: Thu, 26 Jan 2012 22:54:06 -0500
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/26 Ethan Furman <ethan at>:
> Title: Interpreter support for concurrent programming


> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <ethan at>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 26-Jan-2012
> Python-Version: 3.3
> Post-History:

BTW, I don't really think this needs a PEP.


From barry at  Fri Jan 27 05:16:06 2012
From: barry at (Barry Warsaw)
Date: Thu, 26 Jan 2012 23:16:06 -0500
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 26, 2012, at 10:54 PM, Benjamin Peterson wrote:

>2012/1/26 Ethan Furman <ethan at>:
>> Title: Interpreter support for concurrent programming
>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Ethan Furman <ethan at>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 26-Jan-2012
>> Python-Version: 3.3
>> Post-History:
>BTW, I don't really think this needs a PEP.

I think a PEP is appropriate, but the title is certainly misnamed.


From ethan at  Fri Jan 27 05:03:46 2012
From: ethan at (Ethan Furman)
Date: Thu, 26 Jan 2012 20:03:46 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/26 Ethan Furman <ethan at>:
>> Title: Interpreter support for concurrent programming
> mm?

>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Ethan Furman <ethan at>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 26-Jan-2012
>> Python-Version: 3.3
>> Post-History:
> BTW, I don't really think this needs a PEP.

I was surprised, but Nick seems to think it does.

If somebody could fix that oopsie, and any others ;) and then commit it 
(if necessary) I would appreciate it.


From benjamin at  Fri Jan 27 05:40:02 2012
From: benjamin at (Benjamin Peterson)
Date: Thu, 26 Jan 2012 23:40:02 -0500
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/26 Ethan Furman <ethan at>:
>> BTW, I don't really think this needs a PEP.

Obviously it doesn't hurt. And I see from the issue that the change
was not as uncontroversial as I originally thought, so it's likely for
the better.


From ncoghlan at  Fri Jan 27 06:18:49 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 27 Jan 2012 15:18:49 +1000
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 1:54 PM, Benjamin Peterson <benjamin at> wrote:
> BTW, I don't really think this needs a PEP.

That's largely my influence - the discussion in the relevant tracker
item ( had covered enough ground that
I didn't notice that Ethan's specific proposal *isn't* a syntax
change, but is rather just a matter of giving some additional
semantics to the "raise X from Y" syntax (some of the other
suggestions like "raise as <whatever>" really were syntax changes).

So I've changed my mind to being +1 on the idea and proposed syntax of
the draft PEP, but I think there are still some details to be worked
through in terms of the detailed semantics. (The approach in Ethan's
patch actually *clobbers* the context info when "from None" is used,
and I don't believe that's a good idea. My own suggestions in the
tracker item aren't very good either, for exactly the same reason)

Currently, the raise from syntax is just syntactic sugar for setting
__cause__ manually:

>>> try:
...     1/0
... except ZeroDivisionError as ex:
...     new_exc = ValueError("Denominator is zero")
...     new_exc.__cause__ = ex
...     raise new_exc
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
ValueError: Denominator is zero

The context information isn't lost in that case, the display of it is
simply suppressed when an explicit cause is set:

>>> try:
...     try:
...         1/0
...     except ZeroDivisionError as ex:
...         new_exc = ValueError()
...         new_exc.__cause__ = ex
...         raise new_exc
... except ValueError as ex:
...     saved = ex
>>> saved.__context__
ZeroDivisionError('division by zero',)
>>> saved.__cause__
ZeroDivisionError('division by zero',)

This behaviour (i.e. preserving the context, but not displaying it by
default) is retained when using the dedicated syntax:

>>> try:
...     try:
...         1/0
...     except ZeroDivisionError as ex:
...         raise ValueError() from ex
... except ValueError as ex:
...     saved = ex
>>> saved.__context__
ZeroDivisionError('division by zero',)
>>> saved.__cause__
ZeroDivisionError('division by zero',)

However, if you try to set the __cause__ to None explicitly, then the
display falls back to showing the context:

>>> try:
...     1/0
... except ZeroDivisionError as ex:
...     new_exc = ValueError("Denominator is zero")
...     new_exc.__cause__ = None
...     raise new_exc
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
ValueError: Denominator is zero

This happens because None is used by the exception display logic to
indicate "no specific cause, so report the context if that is set".

My proposal would be that instead of using None as the "not set"
sentinel value for __cause__, we instead use a dedicated sentinel
object (exposed to Python at least as "BaseException().__cause__", but
potentially being given its own name somewhere).

Then the display logic for exceptions would be changed to be:
- if the __cause__ is None, then don't report a cause or exception
context at all
- if the __cause__ is BaseException().__cause__, report the exception
context (from __context__)
- otherwise report __cause__ as the specific cause of the raised exception

That way we make it easy to emit nicer default tracebacks when
replacing exceptions without completely hiding the potentially useful
data that can be provided by retaining information in __context__.
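
The three display rules above could be sketched like this. NOT_SET stands
in for the dedicated sentinel and chained_exception is a name invented for
the example; the real logic would live in the interpreter's traceback
display code, not in a helper like this.

```python
NOT_SET = object()   # hypothetical "no cause given" sentinel

CAUSE_MSG = "The above exception was the direct cause of the following exception:"
CONTEXT_MSG = "During handling of the above exception, another exception occurred:"


def chained_exception(cause, context):
    """Return (exception to show first, connecting message) or (None, None)."""
    if cause is None:
        return None, None             # "from None": show nothing extra
    if cause is NOT_SET:
        if context is None:
            return None, None         # nothing was being handled
        return context, CONTEXT_MSG   # default: report the implicit context
    return cause, CAUSE_MSG           # explicit "raise X from Y"
```

The point of the design is that __context__ is passed in unchanged in
every branch, so nothing is thrown away; only the choice of what to print
depends on __cause__.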

I've been burnt by too much code that replaces detailed, informative
and useful error messages that tell me exactly what is going wrong
with bland, useless garbage to be in favour of an approach that
doesn't even set the __context__ attribute in the first place. If
__context__ is always set regardless, and then __cause__ is used to
control whether or not __context__ gets displayed in the standard
tracebacks, that's a much more flexible approach.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Fri Jan 27 06:51:35 2012
From: guido at (Guido van Rossum)
Date: Thu, 26 Jan 2012 21:51:35 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 26, 2012 at 9:18 PM, Nick Coghlan <ncoghlan at> wrote:
> I've been burnt by too much code that replaces detailed, informative
> and useful error messages that tell me exactly what is going wrong
> with bland, useless garbage to be in favour of an approach that
> doesn't even set the __context__ attribute in the first place.

Ditto here.

> If __context__ is always set regardless, and then __cause__ is used
> to control whether or not __context__ gets displayed in the standard
> tracebacks, that's a much more flexible approach.

Well, but usually all you see is the printed traceback, so it might as
well be lost, right? (It gives full control to programmatic handlers,
of course, but that's usually not where the problem lies. It's when
things go horribly wrong in the hash function and all you see in the
traceback is a lousy KeyError. :-) Did you consider to just change the
words so users can ignore it more easily?

--Guido van Rossum (

From v+python at  Fri Jan 27 07:47:57 2012
From: v+python at (Glenn Linderman)
Date: Thu, 26 Jan 2012 22:47:57 -0800
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/26/2012 10:25 PM, Gregory P. Smith wrote:
> (and on top of all of this I believe we're all settled on having per
> interpreter hash randomization _as well_ in 3.3; but this AVL tree
> approach is one nice option for a backport to fix the major
> vulnerability)

If the tree code cures the problem, then randomization just makes 
debugging harder.  I think if it is included in 3.3, it needs to have a 
switch to turn it on/off (whichever is not default).

I'm curious why AVL tree rather than RB tree, simpler implementation? 
C++ stdlib includes RB tree, though, for even simpler implementation :)

Can we have a tree type in 3.3, independent of dict?

From stefan_ml at  Fri Jan 27 09:32:58 2012
From: stefan_ml at (Stefan Behnel)
Date: Fri, 27 Jan 2012 09:32:58 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
Message-ID: <jftnfq$3kh$>

Glenn Linderman, 27.01.2012 07:47:
> Can we have a tree type in 3.3, independent of dict?

I'd be happy to see that happen, but I guess the usual requirements on
stdlib extensions would apply here. I.e., someone has to write the code,
make sure people actually use it to prove that it's worth being added, make
sure it runs in different Python implementations, donate the code to the
PSF asking for stdlib addition and agree to maintain it in the future.

Such an addition is a totally separate issue from the hash collision attack


From martin at  Fri Jan 27 09:55:07 2012
From: martin at (martin at
Date: Fri, 27 Jan 2012 09:55:07 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
Message-ID: <>

> I'm curious why AVL tree rather than RB tree, simpler implementation?

Somewhat arbitrary. AVL trees have a better performance than RB trees
(1.44 log2(N) vs 2 log2(N) in the worst case). Wrt. implementation,
I looked around for a trustworthy, reusable, free (as in speech),
C-only implementation of both AVL and RB trees. The C++ std::map is
out of question as it is C++, and many other free implementations are
out of question as they are GPLed and LGPLed. Writing an implementation
from scratch for a bugfix release is also out of the question.
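
To put numbers on the two bounds quoted above, read as worst-case tree
heights (so an upper limit on comparisons per lookup), a quick
calculation; the helper names are mine and the constants are simply the
ones from the paragraph:

```python
import math


def avl_worst_height(n):
    """Approximate worst-case AVL height, using the 1.44 * log2(N) figure."""
    return 1.44 * math.log2(n)


def rb_worst_height(n):
    """Approximate worst-case red-black height, using 2 * log2(N)."""
    return 2 * math.log2(n)


n = 2 ** 20   # about a million colliding keys
print(round(avl_worst_height(n), 1), rb_worst_height(n))
```

For roughly a million keys that is about 29 levels against 40, so the
difference is a constant factor, not a change in complexity class.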

So I found Ian Piumarta's AVL tree 1.0 from 2006. I trust Ian Piumarta
to get it right (plus I reviewed the code a little). There are some
API glitches (such as assuming a single comparison function, whereas
it would better be rewritten to directly invoke rich comparison, or
such as node removal not returning the node that was removed). It
gets most API decisions right, in particular wrt. memory management.
The license is in the style of the MIT license. If somebody could
propose an alternative implementation (e.g. one with an even more liberal
license, or with a smaller per-node memory usage), I'd be open to
change it.

From stefan_ml at  Fri Jan 27 10:49:08 2012
From: stefan_ml at (Stefan Behnel)
Date: Fri, 27 Jan 2012 10:49:08 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
Message-ID: <jftruk$23t$>

martin at, 27.01.2012 09:55:
> So I found Ian Piumarta's AVL tree 1.0 from 2006. I trust Ian Piumarta
> to get it right (plus I reviewed the code a little). There are some
> API glitches (such as assuming a single comparison function, whereas
> it would better be rewritten to directly invoke rich comparison, or
> such as node removal not returning the node that was removed). It
> gets most API decisions right, in particular wrt. memory management.
> The license is in the style of the MIT license.

That sounds ok for internal use, and the implementation really looks short
enough to allow the adaptations you propose and generic enough to be
generally usable.

However, note that my comment on Glenn's question regarding a stdlib
addition of a tree type still applies - someone would have to write a
suitable CPython wrapper for it as well as a separate pure Python
implementation, and then offer both for inclusion and maintenance. I'm not
sure it's a good idea to have multiple C tree implementations in CPython,
i.e. one for internal use and one for the stdlib. Unless there's a serious
interest in maintaining both, that is. After all, writing a Python wrapper
for this may not be simpler than the work that went into one of the
existing (C)Python tree implementations already.


From martin at  Fri Jan 27 10:59:15 2012
From: martin at (martin at
Date: Fri, 27 Jan 2012 10:59:15 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <jftruk$23t$>
References: <>
Message-ID: <>

> However, note that my comment on Glenn's question regarding a stdlib
> addition of a tree type still applies

I agree with all that. Having a tree-based mapping type in the standard
library is a different issue entirely.

From eliben at  Fri Jan 27 14:21:33 2012
From: eliben at (Eli Bendersky)
Date: Fri, 27 Jan 2012 15:21:33 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
Message-ID: <>


Following an earlier discussion on python-ideas [1], we would like to
propose the following PEP for review. Discussion is welcome. The PEP
can also be viewed in HTML form at




PEP: 408
Title: Standard library __preview__ package
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at>,
        Eli Bendersky <eliben at>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2012-01-07
Python-Version: 3.3
Post-History: 2012-01-27


The process of including a new module into the Python standard library is
hindered by the API lock-in and promise of backward compatibility implied by
a module being formally part of Python.  This PEP proposes a transitional
state for modules - inclusion in a special ``__preview__`` package for the
duration of a minor release (roughly 18 months) prior to full acceptance into
the standard library.  On one hand, this state provides the module with the
benefits of being formally part of the Python distribution.  On the other hand,
the core development team explicitly states that no promises are made with
regards to the module's eventual full inclusion into the standard library,
or to the stability of its API, which may change for the next release.

Proposal - the __preview__ package

Whenever the Python core development team decides that a new module should be
included into the standard library, but isn't entirely sure about whether the
module's API is optimal, the module can be placed in a special package named
``__preview__`` for a single minor release.

In the next minor release, the module may either be "graduated" into the
standard library (and occupy its natural place within its namespace, leaving the
``__preview__`` package), or be rejected and removed entirely from the Python
source tree.  If the module ends up graduating into the standard library after
spending a minor release in ``__preview__``, its API may be changed according
to accumulated feedback.  The core development team explicitly makes no
guarantees about API stability and backward compatibility of modules in
``__preview__``.

Entry into the ``__preview__`` package marks the start of a transition of the
module into the standard library.  It means that the core development team
assumes responsibility of the module, similarly to any other module in the
standard library.

Which modules should go through ``__preview__``

We expect most modules proposed for addition into the Python standard library
to go through a minor release in ``__preview__``. There may, however, be some
exceptions, such as modules that use a pre-defined API (for example ``lzma``,
which generally follows the API of the existing ``bz2`` module), or modules
with an API that has wide acceptance in the Python development community.

In any case, modules that are proposed to be added to the standard library,
whether via ``__preview__`` or directly, must fulfill the acceptance conditions
set by PEP 2.

It is important to stress that the aim of this proposal is not to make the
process of adding new modules to the standard library more difficult.  On the
contrary, it tries to provide a means to add *more* useful libraries.  Modules
which are obvious candidates for entry can be added as before.  Modules which
due to uncertainties about the API could be stalled for a long time now have
a means to still be distributed with Python, via an incubation period in the
``__preview__`` package.

Criteria for "graduation"

In principle, most modules in the ``__preview__`` package should eventually
graduate to the stable standard library.  Some reasons for not graduating are:

* The module may prove to be unstable or fragile, without sufficient developer
  support to maintain it.
* A much better alternative module may be found during the preview release.

Essentially, the decision will be made by the core developers on a per-case
basis.  The point to emphasize here is that a module's appearance in the
``__preview__`` package in some release does not guarantee it will continue
being part of Python in the next release.


Suppose the ``example`` module is a candidate for inclusion in the standard
library, but some Python developers aren't convinced that it presents the best
API for the problem it intends to solve.  The module can then be added to the
``__preview__`` package in release ``3.X``, importable via::

    from __preview__ import example

Assuming the module is then promoted to the standard library proper in
release ``3.X+1``, it will be moved to a permanent location in the library::

    import example

And importing it from ``__preview__`` will no longer work.
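
Code that has to run on both ``3.X`` and ``3.X+1`` would then need to try
both locations. A possible helper is sketched below; ``import_first`` is a
name invented for this illustration and is not part of the proposal, and
``example`` remains the hypothetical module from the text above.

```python
import importlib


def import_first(*names):
    """Return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue                  # try the next candidate location
    raise ImportError("none of %r could be imported" % (names,))


# Intended use with the hypothetical module from the text:
#     example = import_first("example", "__preview__.example")
```

Trying the permanent location first means the code keeps working
unchanged once the module has graduated.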


Benefits for the core development team

Currently, the core developers are really reluctant to add new interfaces to
the standard library.  This is because as soon as they're published in a
release, API design mistakes get locked in due to backward compatibility
requirements.

By gating all major API additions through some kind of a preview mechanism
for a full release, we get one full release cycle of community feedback
before we lock in the APIs with our standard backward compatibility guarantee.

We can also start integrating preview modules with the rest of the standard
library early, so long as we make it clear to packagers that the preview
modules should not be considered optional.  The only difference between preview
APIs and the rest of the standard library is that preview APIs are explicitly
exempted from the usual backward compatibility guarantees.

Essentially, the ``__preview__`` package is intended to lower the risk of
locking in minor API design mistakes for extended periods of time.  Currently,
this concern can block new additions, even when the core development team
consensus is that a particular addition is a good idea in principle.

Benefits for end users

For future end users, the broadest benefit lies in a better "out-of-the-box"
experience - rather than being told "oh, the standard library tools for task X
are horrible, download this 3rd party library instead", those superior tools
are more likely to be just an import away.

For environments where developers are required to conduct due diligence on
their upstream dependencies (severely harming the cost-effectiveness of, or
even ruling out entirely, much of the material on PyPI), the key benefit lies
in ensuring that anything in the ``__preview__`` package is clearly under
python-dev's aegis from at least the following perspectives:

* Licensing:  Redistributed by the PSF under a Contributor Licensing Agreement.
* Documentation: The documentation of the module is published and organized via
  the standard Python documentation tools (i.e. ReST source, output generated
  with Sphinx and published on
* Testing: The module test suites are run on the buildbot fleet
  and results published via
* Issue management: Bugs and feature requests are handled on
* Source control: The master repository for the software is published

Candidates for inclusion into __preview__

For Python 3.3, there are a number of clear current candidates:

* ``regex`` (
* ``daemon`` (PEP 3143)
* ``ipaddr`` (PEP 3144)

Other possible future use cases include:

* Improved HTTP modules (e.g. ``requests``)
* HTML 5 parsing support (e.g. ``html5lib``)
* Improved URL/URI/IRI parsing
* A standard image API (PEP 368)
* Encapsulation of the import state (PEP 406)
* Standard event loop API (PEP 3153)
* A binary version of WSGI for Python 3 (e.g. PEP 444)
* Generic function support (e.g. ``simplegeneric``)

Relationship with PEP 407

PEP 407 proposes a change to the core Python release cycle to permit interim
releases every 6 months (perhaps limited to standard library updates). If
such a change to the release cycle is made, the following policy for the
``__preview__`` namespace is suggested:

* For long term support releases, the ``__preview__`` namespace would always
  be empty.
* New modules would be accepted into the ``__preview__`` namespace only in
  interim releases that immediately follow a long term support release.
* All modules added will either be migrated to their final location in the
  standard library or dropped entirely prior to the next long term support
  release.

Rejected alternatives and variations

Using ``__future__``

Python already has a "forward-looking" namespace in the form of the
``__future__`` module, so it's reasonable to ask why that can't be re-used for
this new purpose.

There are two reasons why doing so is not appropriate:

1. The ``__future__`` module is actually linked to a separate compiler
directives feature that can actually change the way the Python interpreter
compiles a module.  We don't want that for the preview package - we just want
an ordinary Python package.

2. The ``__future__`` module comes with an express promise that names will be
maintained in perpetuity, long after the associated features have become the
compiler's default behaviour.  Again, this is precisely the opposite of what is
intended for the preview package - it is almost certain that all names added to
the preview will be removed at some point, most likely due to their being moved
to a permanent home in the standard library, but also potentially due to their
being reverted to third party package status (if community feedback suggests the
proposed addition is irredeemably broken).

Versioning the package

One proposed alternative [1]_ was to add explicit versioning to the
``__preview__`` package, i.e. ``__preview34__``.  We think that it's better to
simply define that a module being in ``__preview__`` in Python 3.X will either
graduate to the normal standard library namespace in Python 3.X+1 or will
disappear from the Python source tree altogether.  Versioning the ``__preview__``
package complicates the process and does not align well with the main intent of
this proposal.

Using a package name without leading and trailing underscores

It was proposed [1]_ to use a package name like ``preview`` or ``exp``, instead
of ``__preview__``.  This was rejected in the discussion due to the special
meaning a "dunder" package name (that is, a name *with* leading and
trailing double-underscores) conveys in Python.  Besides, a non-dunder name
would suggest normal standard library API stability guarantees, which is not
the intention of the ``__preview__`` package.

Preserving pickle compatibility

A pickled class instance based on a module in ``__preview__`` in release 3.X
won't be unpickle-able in release 3.X+1, where the module won't be in
``__preview__``.  Special code may be added to make this work, but this goes
against the intent of this proposal, since it implies backward compatibility.
Therefore, this PEP does not propose to preserve pickle compatibility.
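
The breakage described above follows from how pickle records a class: by
module name and qualified name, not by value. A minimal sketch, assuming two
made-up module names (``preview_demo_example`` stands in for
``__preview__.example`` and ``stable_demo_example`` for the graduated
``example``); neither name is part of this proposal:

```python
import pickle
import sys
import types

# Hypothetical stand-ins for ``__preview__.example`` and ``example``.
OLD_NAME = "preview_demo_example"
NEW_NAME = "stable_demo_example"


class Point:
    def __init__(self, x):
        self.x = x


# Pretend Point was defined at the top level of the preview module.
Point.__module__ = OLD_NAME
Point.__qualname__ = "Point"
old_module = types.ModuleType(OLD_NAME)
old_module.Point = Point
sys.modules[OLD_NAME] = old_module

# The payload stores the module name and class name, not the class itself.
data = pickle.dumps(Point(1))

# Simulate graduation: the module is now reachable only under NEW_NAME.
del sys.modules[OLD_NAME]
new_module = types.ModuleType(NEW_NAME)
new_module.Point = Point
sys.modules[NEW_NAME] = new_module

try:
    pickle.loads(data)
    old_pickle_loads = True
except ImportError:
    # The unpickler tries to import OLD_NAME, which no longer exists.
    old_pickle_loads = False

# Registering the old name as an alias lets the same payload load again.
sys.modules[OLD_NAME] = new_module
restored = pickle.loads(data)
```

The last two lines show what "special code" could amount to in the simplest
case: an alias for the old module name. Whether such an alias is desirable is
exactly the compatibility question this section raises.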


Dj Gilcrease initially proposed the idea of having a ``__preview__`` package
in Python [2]_.  Although his original proposal uses the name
``__experimental__``, we feel that ``__preview__`` conveys the meaning of this
package in a better way.


.. [#] Discussed in this thread:

.. [#]


This document has been placed in the public domain.


From anacrolix at  Fri Jan 27 14:48:06 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 28 Jan 2012 00:48:06 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

+0. I think the idea is right, and will help to get good quality
modules in at a faster rate. However it is compensating for a lack of
interface and packaging standardization in the 3rd party module world.

From phil at  Fri Jan 27 15:37:08 2012
From: phil at (Philippe Fremy)
Date: Fri, 27 Jan 2012 15:37:08 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>


A small comment from a user perspective.

Since a package in preview is strongly linked to a given version of
Python, any program taking advantage of it becomes strongly specific to
a given version of Python.

Such programs will of course break for any upgrade or downgrade of
python version. To make the reason for the breakage more explicit, I
believe that the PEP should provide examples of correct versioned usage
of the module.

Something along the lines of :

import sys

if sys.version_info[:2] == (3, X):  # X: the release that ships the preview
    from __preview__ import example
else:
    raise ImportError(
        'Package example is only available as preview in Python version '
        '3.X. Please check the documentation of your version of Python '
        'to see if and how you can get the package example.')



From solipsis at  Fri Jan 27 16:09:34 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 27 Jan 2012 16:09:34 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Fri, 27 Jan 2012 15:21:33 +0200
Eli Bendersky <eliben at> wrote:
> Following an earlier discussion on python-ideas [1], we would like to
> propose the following PEP for review. Discussion is welcome. The PEP
> can also be viewed in HTML form at

A big +1 from me.

> Assuming the module is then promoted to the standard library proper in
> release ``3.X+1``, it will be moved to a permanent location in the library::
>     import example
> And importing it from ``__preview__`` will no longer work.

Why not leave it accessible through __preview__ too?

> Benefits for the core development team
> --------------------------------------
> Currently, the core developers are really reluctant to add new interfaces to
> the standard library.

A nit, but I think "reluctant" is enough and "really" makes the
tone very defensive :)

> Relationship with PEP 407
> =========================
> PEP 407 proposes a change to the core Python release cycle to permit interim
> releases every 6 months (perhaps limited to standard library updates). If
> such a change to the release cycle is made, the following policy for the
> ``__preview__`` namespace is suggested:
> * For long term support releases, the ``__preview__`` namespace would always
>   be empty.
> * New modules would be accepted into the ``__preview__`` namespace only in
>   interim releases that immediately follow a long term support release.

Well this is all speculative (due to the status of PEP 407) but I think
a simpler approach of having a __preview__ namespace in all releases
(including LTS) would be easier to handle for both us and our users.
People can refrain from using anything in __preview__ if that's what
they prefer. The naming and the double underscores make it quite
recognizable at the top of a source file :-)

> Preserving pickle compatibility
> -------------------------------
> A pickled class instance based on a module in ``__preview__`` in release 3.X
> won't be unpickle-able in release 3.X+1, where the module won't be in
> ``__preview__``.  Special code may be added to make this work, but this goes
> against the intent of this proposal, since it implies backward compatibility.
> Therefore, this PEP does not propose to preserve pickle compatibility.

Wouldn't it be a good argument to keep __preview__.XXX as an alias?



From fuzzyman at  Fri Jan 27 16:25:28 2012
From: fuzzyman at (Michael Foord)
Date: Fri, 27 Jan 2012 15:25:28 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 14:37, Philippe Fremy wrote:
> Hi,
> A small comment from a user perspective.
> Since a package in preview is strongly linked to a given version of
> Python, any program taking advantage of it becomes strongly specific to
> a given version of Python.
> Such programs will of course break for any upgrade or downgrade of
> python version. To make the reason for the breakage more explicit, I
> believe that the PEP should provide examples of correct versioned usage
> of the module.
> Something along the lines of :
> if sys.version_info[:2] == (3, X):
> 	from __preview__ import example
> else:
> 	raise ImportError( 'Package example is only available as preview in
> Python version 3.X. Please check the documentation of your version of
> Python to see if and how you can get the package example.' )

A more normal incantation, as is often the way for packages that became 
parts of the standard library after first being a third party library 
(sometimes under a different name, e.g. simplejson -> json):

try:
    from __preview__ import thing
except ImportError:
    import thing

So no need to target a very specific version of Python.


> cheers,
> Philippe


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From fuzzyman at  Fri Jan 27 16:27:36 2012
From: fuzzyman at (Michael Foord)
Date: Fri, 27 Jan 2012 15:27:36 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 15:09, Antoine Pitrou wrote:
> On Fri, 27 Jan 2012 15:21:33 +0200
> Eli Bendersky<eliben at>  wrote:
>> Following an earlier discussion on python-ideas [1], we would like to
>> propose the following PEP for review. Discussion is welcome. The PEP
>> can also be viewed in HTML form at
> A big +1 from me.
>> Assuming the module is then promoted to the standard library proper in
>> release ``3.X+1``, it will be moved to a permanent location in the library::
>>      import example
>> And importing it from ``__preview__`` will no longer work.
> Why not leave it accessible through __preview__ too?


The point about pickling is one good reason; minimising code breakage
(due to package name changing) is another.


>> Benefits for the core development team
>> --------------------------------------
>> Currently, the core developers are really reluctant to add new interfaces to
>> the standard library.
> A nit, but I think "reluctant" is enough and "really" makes the
> tone very defensive :)
>> Relationship with PEP 407
>> =========================
>> PEP 407 proposes a change to the core Python release cycle to permit interim
>> releases every 6 months (perhaps limited to standard library updates). If
>> such a change to the release cycle is made, the following policy for the
>> ``__preview__`` namespace is suggested:
>> * For long term support releases, the ``__preview__`` namespace would always
>>    be empty.
>> * New modules would be accepted into the ``__preview__`` namespace only in
>>    interim releases that immediately follow a long term support release.
> Well this is all speculative (due to the status of PEP 407) but I think
> a simpler approach of having a __preview__ namespace in all releases
> (including LTS) would be easier to handle for both us and our users.
> People can refrain from using anything in __preview__ if that's what
> they prefer. The naming and the double underscores make it quite
> recognizable at the top of a source file :-)
>> Preserving pickle compatibility
>> -------------------------------
>> A pickled class instance based on a module in ``__preview__`` in release 3.X
>> won't be unpickle-able in release 3.X+1, where the module won't be in
>> ``__preview__``.  Special code may be added to make this work, but this goes
>> against the intent of this proposal, since it implies backward compatibility.
>> Therefore, this PEP does not propose to preserve pickle compatibility.
> Wouldn't it be a good argument to keep __preview__.XXX as an alias?
> Regards
> Antoine.



From benjamin at  Fri Jan 27 16:34:59 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 10:34:59 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/27 Eli Bendersky <eliben at>:
> Criteria for "graduation"
> -------------------------

I think you also need "Criteria for being placed in __preview__". Do
we just toss everything someone suggests in?


From anacrolix at  Fri Jan 27 16:35:53 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 28 Jan 2012 02:35:53 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

> A more normal incantation, as is often the way for packages that became
> parts of the standard library after first being a third party library
> (sometimes under a different name, e.g. simplejson -> json):
> try:
>     from __preview__ import thing
> except ImportError:
>     import thing
> So no need to target a very specific version of Python.

I think this is suboptimal, having to guess where modules are located,
you end up with this in every module:

try:
    import cjson as json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        import json as json

Perhaps the versioned import stuff could be implemented (whatever the
syntax may be), in order that something like this can be done instead:

import regex('__preview__')
import regex('3.4')

Where clearly the __preview__ version makes no guarantees about
interface or implementation whatsoever.


From fuzzyman at  Fri Jan 27 16:37:44 2012
From: fuzzyman at (Michael Foord)
Date: Fri, 27 Jan 2012 15:37:44 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 15:34, Benjamin Peterson wrote:
> 2012/1/27 Eli Bendersky<eliben at>:
>> Criteria for "graduation"
>> -------------------------
> I think you also need "Criteria for being placed in __preview__". Do
> we just toss everything someone suggests in?
And given that permanently deleting something from __preview__ would be 
a big deal (deciding it didn't make the grade and should never 
graduate), the criteria shouldn't be much less strict than for adopting 
a package into the standard library.

i.e. once something gets into __preview__ people are going to assume it 
will graduate at some point - __preview__ is a place for APIs to
stabilise and mature, not a place for dubious libraries that we may or 
may not want in the standard library at some point.




From fuzzyman at  Fri Jan 27 16:39:44 2012
From: fuzzyman at (Michael Foord)
Date: Fri, 27 Jan 2012 15:39:44 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 15:35, Matt Joiner wrote:
>> A more normal incantation, as is often the way for packages that became
>> parts of the standard library after first being a third party library
>> (sometimes under a different name, e.g. simplejson ->  json):
>> try:
>>     from __preview__ import thing
>> except ImportError:
>>     import thing
>> So no need to target a very specific version of Python.
> I think this is suboptimal, having to guess where modules are located,
> you end up with this in every module:
> try:
>     import cjson as json
> except ImportError:
>     try:
>         import simplejson as json
>     except ImportError:
>         import json as json

It's trivial to wrap in a function though - or do the import in one 
place and then import the package from there.
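
A minimal sketch of such a wrapper, assuming a hypothetical helper named
``import_first`` (the name and the ``__preview__.json`` location are made up
for illustration; only ``importlib.import_module`` and ``json`` are real):

```python
import importlib


def import_first(*names):
    """Return the first module in ``names`` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This location is unavailable; try the next candidate.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(names))


# Try the (hypothetical) preview location first, then the stable one.
compat_json = import_first("__preview__.json", "json")
```

Other modules in the program can then import ``compat_json`` from the one
place that defines it, instead of repeating the nested try/except blocks.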


> Perhaps the versioned import stuff could be implemented (whatever the
> syntax may be), in order that something like this can be done instead:
> import regex('__preview__')
> import regex('3.4')
> Where clearly the __preview__ version makes no guarantees about
> interface or implementation whatsoever.
> etc.



From benjamin at  Fri Jan 27 16:42:51 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 10:42:51 -0500
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/26 Ethan Furman <ethan at>:

Congratulations, you are now PEP 409.


From phil at  Fri Jan 27 17:09:08 2012
From: phil at (Philippe Fremy)
Date: Fri, 27 Jan 2012 17:09:08 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 16:25, Michael Foord wrote:
> On 27/01/2012 14:37, Philippe Fremy wrote:
>> Hi,
>> A small comment from a user perspective.
>> Since a package in preview is strongly linked to a given version of
>> Python, any program taking advantage of it becomes strongly specific to
>> a given version of Python.
>> Such programs will of course break for any upgrade or downgrade of
>> python version. To make the reason for the breakage more explicit, I
>> believe that the PEP should provide examples of correct versioned usage
>> of the module.
>> Something along the lines of :
>> if sys.version_info[:2] == (3, X):
>>     from __preview__ import example
>> else:
>>     raise ImportError( 'Package example is only available as preview in
>> Python version 3.X. Please check the documentation of your version of
>> Python to see if and how you can get the package example.' )
> A more normal incantation, as is often the way for packages that became
> parts of the standard library after first being a third party library
> (sometimes under a different name, e.g. simplejson -> json):
> try:
>     from __preview__ import thing
> except ImportError:
>     import thing
> So no need to target a very specific version of Python.

According to the PEP, the interface may change between __preview__ and
final inclusion in stdlib. It would be unwise as a developer to assume
that a program written for the preview version will work correctly in
the stdlib version, wouldn't it?

I would use your "normal" incantation only after checking that no
significant API changes have occurred after stdlib integration.

By the way, if as Antoine suggests, the package remains available in
__preview__ even after it's accepted in the stdlib, how is the user
supposed to deal with possible API changes?



From solipsis at  Fri Jan 27 17:39:50 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 27 Jan 2012 17:39:50 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

Hello Philippe,

On Fri, 27 Jan 2012 17:09:08 +0100
Philippe Fremy <phil at> wrote:
> According to the PEP, the interface may change between __preview__ and
> final inclusion in stdlib. It would be unwise as a developer to assume
> that a program written for the preview version will work correctly in
> the stdlib version, wouldn't it?
> I would use your "normal" incantation only after checking that no
> significant API changes have occurred after stdlib integration.
> By the way, if as Antoine suggests, the package remains available in
> __preview__ even after it's accepted in the stdlib, how is the user
> supposed to deal with possible API changes?

The API *may* change but it would probably not change much anyway.
Consider e.g. the "regex" module: it aims at compatibility with the
standard "re" module; there may be additional APIs (e.g. new flags),
but whoever uses it with the standard "re" API would not see any
difference between the __preview__ version and the final version.
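
A rough sketch of that point, assuming the third-party ``regex`` module may
or may not be installed: code restricted to the standard ``re`` API should
behave the same with either implementation. The alias ``re_impl`` is made up
for this example.

```python
try:
    import regex as re_impl  # third-party module, may not be installed
except ImportError:
    import re as re_impl  # standard library fallback

# Only the API shared with the standard "re" module is used here.
match = re_impl.match(r"(\d+)-(\d+)", "2012-01")
year, month = match.groups()
```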



From eliben at  Fri Jan 27 17:44:02 2012
From: eliben at (Eli Bendersky)
Date: Fri, 27 Jan 2012 18:44:02 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

>> Assuming the module is then promoted to the standard library proper in
>> release ``3.X+1``, it will be moved to a permanent location in the library::
>>     import example
>> And importing it from ``__preview__`` will no longer work.
> Why not leave it accessible through __preview__ too?

I guess there's no real problem with leaving it accessible, as long as
it's clear that the API may have changed between releases. I.e. when a
package "graduates" and is also left accessible through __preview__,
it should obviously be just a pointer to the same package, so if the
API changed, code that imported it from __preview__ in a previous
release may stop working.

>> Benefits for the core development team
>> --------------------------------------
>> Currently, the core developers are really reluctant to add new interfaces to
>> the standard library.
> A nit, but I think "reluctant" is enough and "really" makes the
> tone very defensive :)

Agreed, I will change this.

>> Relationship with PEP 407
>> =========================
>> PEP 407 proposes a change to the core Python release cycle to permit interim
>> releases every 6 months (perhaps limited to standard library updates). If
>> such a change to the release cycle is made, the following policy for the
>> ``__preview__`` namespace is suggested:
>> * For long term support releases, the ``__preview__`` namespace would always
>>   be empty.
>> * New modules would be accepted into the ``__preview__`` namespace only in
>>   interim releases that immediately follow a long term support release.
> Well this is all speculative (due to the status of PEP 407) but I think
> a simpler approach of having a __preview__ namespace in all releases
> (including LTS) would be easier to handle for both us and our users.
> People can refrain from using anything in __preview__ if that's what
> they prefer. The naming and the double underscores make it quite
> recognizable at the top of a source file :-)

I agree that it's speculative, and would recommend decoupling the two
PEPs. They surely can live on their own and aren't tied. If PEP 407
gets accepted, this section can be reworded appropriately.

>> Preserving pickle compatibility
>> -------------------------------
>> A pickled class instance based on a module in ``__preview__`` in release 3.X
>> won't be unpickle-able in release 3.X+1, where the module won't be in
>> ``__preview__``.  Special code may be added to make this work, but this goes
>> against the intent of this proposal, since it implies backward compatibility.
>> Therefore, this PEP does not propose to preserve pickle compatibility.
> Wouldn't it be a good argument to keep __preview__.XXX as an alias?

Good point.


From eliben at  Fri Jan 27 17:45:27 2012
From: eliben at (Eli Bendersky)
Date: Fri, 27 Jan 2012 18:45:27 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

>> Something along the lines of :
>> if sys.version_info[:2] == (3, X):
>>     from __preview__ import example
>> else:
>>     raise ImportError( 'Package example is only available as preview in
>> Python version 3.X. Please check the documentation of your version of
>> Python to see if and how you can get the package example.' )
> A more normal incantation, as is often the way for packages that became
> parts of the standard library after first being a third party library
> (sometimes under a different name, e.g. simplejson -> json):
> try:
>     from __preview__ import thing
> except ImportError:
>     import thing
> So no need to target a very specific version of Python.

Yep, this is what I had in mind. And it appeared too trivial to place
it in the PEP.


From eliben at  Fri Jan 27 17:47:05 2012
From: eliben at (Eli Bendersky)
Date: Fri, 27 Jan 2012 18:47:05 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 17:34, Benjamin Peterson <benjamin at> wrote:
> 2012/1/27 Eli Bendersky <eliben at>:
>> Criteria for "graduation"
>> -------------------------
> I think you also need "Criteria for being placed in __preview__". Do
> we just toss everything someone suggests in?

I hoped to have this covered by:

  "In any case, modules that are proposed to be added to the standard
library, whether via __preview__ or directly, must fulfill the
acceptance conditions set by PEP 2."

PEP 2 is quite detailed and I saw no need to repeat large chunks of it
here. The idea is that all the same restrictions and caveats apply.
The thing that goes away is the promise of future API stability.


From status at  Fri Jan 27 18:07:35 2012
From: status at (Python tracker)
Date: Fri, 27 Jan 2012 18:07:35 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <>

ACTIVITY SUMMARY (2012-01-20 - 2012-01-27)
Python tracker at

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    3234 (+25)
  closed 22437 (+32)
  total  25671 (+57)

Open issues with patches: 1391 

Issues opened (44)

#6631: Disallow relative files paths in urllib*.open()  reopened by amaury.forgeotdarc

#13829: exception error in  reopened by ned.deily

#13836: Define key failed  opened by olivier57

#13837: test_shutil fails with symlinks enabled under Windows  opened by pitrou

#13839: -m pstats should combine all the profiles given as arguments  opened by anacrolix

#13841: multiprocessing should use sys.exit() where possible  opened by brandj

#13842: Cannot pickle Ellipsis or NotImplemented  opened by James.Sanders

#13843: Python doesn't compile anymore on our Solaris buildbot: undefi  opened by haypo

#13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on  opened by haypo

#13846: Add time.monotonic() function  opened by haypo

#13847: Catch time(), ftime(), localtime() and clock() errors  opened by haypo

#13848: doesn't check for embedded NUL characters  opened by pitrou

#13849: Add tests for NUL checking in certain strs  opened by alex

#13850: Summary tables for argparse add_argument options  opened by ncoghlan

#13851: Packaging distutils2 for Fedora  opened by vikash

#13854: multiprocessing: SystemExit from child with non-int, non-str a  opened by brandj

#13855: Add qualname support to types.FunctionType  opened by meador.inge

#13856: xmlrpc / httplib changes to allow for certificate verification  opened by Nathanael.Noblet

#13857: Add textwrap.indent() as counterpart to textwrap.dedent()  opened by ncoghlan

#13860: PyBuffer_FillInfo() return value  opened by skrah

#13861: test_pydoc failure  opened by skrah

#13863: import.c sometimes generates incorrect timestamps on Windows +  opened by mark.dickinson

#13865: distutils documentation says Extension has "optional" argument  opened by tebeka

#13866: {urllib,urllib.parse}.urlencode should not use quote_plus  opened by Stephen.Day

#13867: misleading comment in weakrefobject.h  opened by Jim.Jewett

#13868: Add hyphen doc fix  opened by Retro

#13869: CFLAGS="-UNDEBUG" build failure  opened by skrah

#13871: namedtuple does not normalize field names when checking for du  opened by Jim.Jewett

#13872: socket.detach doesn't mark socket._closed  opened by anacrolix

#13873: SIGBUS in test_zlib on Debian bigmem buildbot  opened by nadeem.vawda

#13874: test_faulthandler: read_null test fails with current clang  opened by skrah

#13875: cmd: no user documentation  opened by techtonik

#13876: Sporadic failure in test_socket  opened by nadeem.vawda

#13878: test_sched failures on Windows buildbot  opened by nadeem.vawda

#13879: Argparse does not support subparser aliases in 2.7  opened by Tim.Willis

#13880: pydoc -k throws "AssertionError: distutils has already been pa  opened by __KFL__

#13881: Stream encoder for zlib_codec doesn't use the incremental enco  opened by amcnabb

#13882: Add format argument for time.time(), time.clock(), ... to get  opened by haypo

#13884: IDLE 2.6.5 Recent Files undocks  opened by mcgrete

#13886: readline-related test_builtin failure  opened by nadeem.vawda

#13888: test_builtin failure when run after test_tk  opened by nadeem.vawda

#13889: str(float) and round(float) issues with FPU precision  opened by samuel.iseli

#13890: test_importlib failures under Windows  opened by pitrou

#1003195: segfault when running smtplib example  reopened by neologix

Most recent 15 issues with no replies (15)

#13890: test_importlib failures under Windows

#13889: str(float) and round(float) issues with FPU precision

#13888: test_builtin failure when run after test_tk

#13881: Stream encoder for zlib_codec doesn't use the incremental enco

#13876: Sporadic failure in test_socket

#13872: socket.detach doesn't mark socket._closed

#13869: CFLAGS="-UNDEBUG" build failure

#13868: Add hyphen doc fix

#13867: misleading comment in weakrefobject.h

#13866: {urllib,urllib.parse}.urlencode should not use quote_plus

#13865: distutils documentation says Extension has "optional" argument

#13861: test_pydoc failure

#13860: PyBuffer_FillInfo() return value

#13856: xmlrpc / httplib changes to allow for certificate verification

#13855: Add qualname support to types.FunctionType

Most recent 15 issues waiting for review (15)

#13889: str(float) and round(float) issues with FPU precision

#13886: readline-related test_builtin failure

#13882: Add format argument for time.time(), time.clock(), ... to get

#13879: Argparse does not support subparser aliases in 2.7

#13872: socket.detach doesn't mark socket._closed

#13868: Add hyphen doc fix

#13856: xmlrpc / httplib changes to allow for certificate verification

#13848: doesn't check for embedded NUL characters

#13847: Catch time(), ftime(), localtime() and clock() errors

#13846: Add time.monotonic() function

#13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on

#13842: Cannot pickle Ellipsis or NotImplemented

#13839: -m pstats should combine all the profiles given as arguments

#13833: No documentation for PyStructSequence

#13817: deadlock in subprocess while running several threads using Pop

Top 10 most discussed issues (10)

#13703: Hash collision security issue  61 msgs

#4966: Improving Lib Doc Sequence Types Section  10 msgs

#13790: In str.format an incorrect error message for list, tuple, dict   9 msgs

#11457: os.stat(): add new fields to get timestamps as Decimal objects   8 msgs

#13850: Summary tables for argparse add_argument options   8 msgs

#6210: Exception Chaining missing method for suppressing context   7 msgs

#13845: Use GetSystemTimeAsFileTime() to get a resolution of 100 ns on   7 msgs

#13847: Catch time(), ftime(), localtime() and clock() errors   7 msgs

#13849: Add tests for NUL checking in certain strs   7 msgs

#13609: Add "os.get_terminal_size()" function   6 msgs

Issues closed (31)

#8052: subprocess close_fds behavior should only close open fds  closed by gregory.p.smith

#11235: Source files with date modifed in 2106 cause OverflowError  closed by pitrou

#12922: StringIO and seek()  closed by pitrou

#13071: IDLE accepts, then crashes, on invalid key bindings.  closed by terry.reedy

#13190: ConfigParser uses wrong newline on Windows  closed by lukasz.langa

#13435: Copybutton does not hide tracebacks  closed by ezio.melotti

#13737:'s Django settings file DEBUG=True  closed by ezio.melotti

#13772: listdir() doesn't work with non-trivial symlinks  closed by pitrou

#13793: hasattr, delattr, getattr fail with unnormalized names  closed by benjamin.peterson

#13796: use 'text=...' to define the text attribute of and xml.etree.E  closed by terry.reedy

#13798: Pasting and then running code doesn't work in the IDLE Shell  closed by terry.reedy

#13804: Python library structure creates hard to read code when using  closed by terry.reedy

#13812: multiprocessing package doesn't flush stderr on child exceptio  closed by pitrou

#13816: Two typos in the docs  closed by georg.brandl

#13820: 2.6 is no longer in the future  closed by terry.reedy

#13834: In help(bytes.strip) there is no info about leading ASCII whit  closed by georg.brandl

#13835: whatsnew/3.3 misspelling/mislink  closed by sandro.tosi

#13838: In str.format "{0:#.5g}" for decimal.Decimal doesn't print tra  closed by eric.smith

#13840: create_string_buffer rejects str init_or_size parameter  closed by meador.inge

#13844: doesn't escape title attributes in annotate view  closed by pitrou

#13852: Doc fixes with patch  closed by georg.brandl

#13853: SystemExit/sys.exit() doesn't print boolean argument  closed by brett.cannon

#13858: readline fails on nonblocking, unbuffered io.FileIO objects  closed by neologix

#13859: Lingering StandardError in logging module  closed by python-dev

#13862: test_zlib failure  closed by nadeem.vawda

#13864: IDLE: Python 2.7.2 refuses to open  closed by terry.reedy

#13870: Out-of-date comment in collections/ ordered dict  closed by rhettinger

#13877: segfault when running smtplib example  closed by neologix

#13883: PYTHONCASEOK docs mistakenly says it is limited to Windows  closed by brett.cannon

#13885: CVE-2011-3389: _ssl module always disables the CBC IV attack c  closed by pitrou

#13887: defaultdict.get does not default to initial default but None  closed by python-dev

From alex.gaynor at  Fri Jan 27 18:26:29 2012
From: alex.gaynor at (Alex)
Date: Fri, 27 Jan 2012 17:26:29 +0000 (UTC)
Subject: [Python-Dev]
References: <>
Message-ID: <>

Eli Bendersky <eliben <at>> writes:

> Hello,
> Following an earlier discussion on python-ideas [1], we would like to
> propose the following PEP for review. Discussion is welcome. The PEP
> can also be viewed in HTML form at
> [1]

I'm -1 on this, for a pretty simple reason. Something goes into __preview__,
instead of its final destination directly because it needs feedback/possibly
changes. However, given the release cycle of the stdlib (~18 months), any
feedback it gets can't be seen by actual users until it's too late. Essentially
you can only get one round of stdlib.

I think a significantly healthier process (in terms of maximizing feedback and
getting something into its best shape) is to let a project evolve naturally on
PyPI and in the ecosystem, give feedback to it from an inclusion perspective,
and then include it when it becomes ready on its own merits. The counter
argument to this is that putting it in the stdlib gets you significantly more
eyeballs (and hopefully more feedback, therefore), my only response to this is:
if it doesn't get eyeballs on PyPI I don't think there's a great enough need to
justify it in the stdlib.


From ethan at  Fri Jan 27 18:08:40 2012
From: ethan at (Ethan Furman)
Date: Fri, 27 Jan 2012 09:08:40 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> Did you consider to just change the
> words so users can ignore it more easily?

Yes, that has also been discussed.

Speaking for myself, it would be only slightly better.

Speaking for everyone that wants context suppression (using Steven 
D'Aprano's words):  chained exceptions expose details to the caller that 
are irrelevant implementation details.

It seems to me that generating the amount of information needed to track 
down errors is a balancing act between too much and too little; forcing 
the print of previous context when switching from exception A to 
exception B feels like too much:  at the very least it's extra noise; at 
the worst it can be confusing to the actual problem.  When the library 
(or custom class) author is catching A, saying "Yes, expected, now let's 
raise B instead", A is no longer necessary.

Also, the programmer is free to *not* use 'from None', leaving the 
complete traceback in place.
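
A minimal sketch of the pattern being discussed, using the semantics that
later shipped in Python 3.3. ConfigError and lookup() are hypothetical names
made up for illustration; they are not from the PEP.

```python
class ConfigError(Exception):
    """Hypothetical library-level exception shown to callers."""


def lookup(settings, name):
    try:
        return settings[name]
    except KeyError:
        # The KeyError is an implementation detail; keep it out of the traceback.
        raise ConfigError("missing setting: %s" % name) from None


try:
    lookup({}, "timeout")
except ConfigError as exc:
    caught = exc

# The original KeyError is still recorded on the exception, just not displayed.
print(repr(caught.__context__))
print(caught.__cause__, caught.__suppress_context__)
```

Dropping the 'from None' leaves the usual chained traceback in place.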


From v+python at  Fri Jan 27 19:18:35 2012
From: v+python at (Glenn Linderman)
Date: Fri, 27 Jan 2012 10:18:35 -0800
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/26/2012 10:47 PM, Glenn Linderman wrote:
> On 1/26/2012 10:25 PM, Gregory P. Smith wrote:
>> (and on top of all of this I believe we're all settled on having per
>> interpreter hash randomization *as well* in 3.3; but this AVL tree
>> approach is one nice option for a backport to fix the major
>> vulnerability)
> If the tree code cures the problem, then randomization just makes 
> debugging harder.  I think if it is included in 3.3, it needs to have 
> a switch to turn it on/off (whichever is not default).

In case it is not clear, I meant randomization should always be able to 
be switched off.
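
For illustration only: the CPython releases that later shipped hash
randomization expose such a switch as the PYTHONHASHSEED environment
variable, where the value 0 disables randomization. The sketch below assumes
such an interpreter; string_hash() is a hypothetical helper.

```python
import os
import subprocess
import sys


def string_hash(seed):
    """Return hash('abc') as computed by a fresh interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env
    )
    return int(out)


# With randomization switched off, separate processes agree on the hash.
fixed_a = string_hash("0")
fixed_b = string_hash("0")
print(fixed_a == fixed_b)
```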

Another issue occurs to me: when a hash with colliding keys (one that 
has been attacked, and has trees) has a non-string key added, isn't the 
flattening process likely to have extremely poor performance?

Agreed that the common HTML FORM or JSON attack vectors are unlikely to 
produce anything except string keys, but if an application grabs those, 
knows that the user keys are all strings, and adds a few more bits of 
info to the dict for convenience, using other key types, then ...  WHAM?

Seems a bit unlikely, but I know I've coded things along that line from 
time to time... I don't recall doing it in Python Web applications...

From martin at  Fri Jan 27 20:39:28 2012
From: martin at (martin at
Date: Fri, 27 Jan 2012 20:39:28 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> Another issue occurs to me: when a hash with colliding keys (one  
> that has been attacked, and has trees) has a non-string key added,  
> isn't the flattening process likely to have extremely poor  
> performance?

Correct. "Don't do that, then"

I don't consider it mandatory to fix all issues with hash collision.
In fact, none of the strategies fixes all issues with hash collisions;
even the hash-randomization solutions only deal with string keys, and
don't consider collisions on non-string keys.
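
A small sketch of the non-string case, assuming a CPython that provides
sys.hash_info (3.2 or later). Integer hashes are reduced modulo a fixed
prime with no seed involved, so colliding integer keys can be written down
directly.

```python
import sys

# CPython reduces integer hashes modulo sys.hash_info.modulus, with no seed
# involved, so these distinct keys all land on the same hash value.
modulus = sys.hash_info.modulus
colliding_keys = [1 + i * modulus for i in range(5)]

hashes = {hash(key) for key in colliding_keys}
print(hashes)
```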

From guido at  Fri Jan 27 20:54:31 2012
From: guido at (Guido van Rossum)
Date: Fri, 27 Jan 2012 11:54:31 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman <ethan at> wrote:
> Guido van Rossum wrote:
>> Did you consider to just change the
>> words so users can ignore it more easily?
> Yes, that has also been discussed.
> Speaking for myself, it would be only slightly better.
> Speaking for everyone that wants context suppression (using Steven
> D'Aprano's words): chained exceptions expose details to the caller that are
> irrelevant implementation details.
> It seems to me that generating the amount of information needed to track
> down errors is a balancing act between too much and too little; forcing the
> print of previous context when switching from exception A to exception B
> feels like too much: at the very least it's extra noise; at the worst it
> can be confusing to the actual problem. When the library (or custom class)
> author is catching A, saying "Yes, expected, now let's raise B instead", A
> is no longer necessary.
> Also, the programmer is free to *not* use 'from None', leaving the complete
> traceback in place.

Ok, got it. The developer has to explicitly say "raise <something>
from None" and that indicates they have really thought about the issue
of suppressing too much information and they are okay with it. I dig that.

--Guido van Rossum (

From storchaka at  Fri Jan 27 20:59:02 2012
From: storchaka at (Serhiy Storchaka)
Date: Fri, 27 Jan 2012 21:59:02 +0200
Subject: [Python-Dev] Hashing proposal: 64-bit hash
Message-ID: <jfuvme$i5b$>

As already mentioned, the vulnerability of 64-bit Python is theoretical rather than practical. The size of the hash makes the attack extremely unlikely. Perhaps the easiest change, which would remove the vulnerability from 32-bit Python, is to use a 64-bit (or larger) hash on all platforms. The performance is comparable to randomization. Code that depends on key order would be broken no more than it is by changing the platform or the Python feature version. Maybe all 64 bits could be used only for strings, with only the lower 32 bits used for other objects.

From benjamin at  Fri Jan 27 21:39:44 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 15:39:44 -0500
Subject: [Python-Dev] Hashing proposal: 64-bit hash
In-Reply-To: <jfuvme$i5b$>
References: <jfuvme$i5b$>
Message-ID: <>

2012/1/27 Serhiy Storchaka <storchaka at>:
> As already mentioned, the vulnerability of 64-bit Python is theoretical rather than practical. The size of the hash makes the attack extremely unlikely. Perhaps the easiest change, which would remove the vulnerability from 32-bit Python, is to use a 64-bit (or larger) hash on all platforms. The performance is comparable to randomization. Code that depends on key order would be broken no more than it is by changing the platform or the Python feature version. Maybe all 64 bits could be used only for strings, with only the lower 32 bits used for other objects.

A tempting idea, but binary incompatible.


From steve at  Fri Jan 27 21:43:46 2012
From: steve at (Steven D'Aprano)
Date: Sat, 28 Jan 2012 07:43:46 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Eli Bendersky wrote:
> Hello,
> Following an earlier discussion on python-ideas [1], we would like to
> propose the following PEP for review. Discussion is welcome.

I think you need to emphasize that modules in __preview__ are NOT expected to 
have a forward-compatible, stable, API. This is a feature of __preview__, not 
a bug, and I believe it is the most important feature.

I see responses to this PEP that assume that APIs will be stable, and that 
having a module fail to graduate out of __preview__ should be an extraordinary 
event. But if this is the case, then why bother with __preview__? It just adds 
complexity to the process -- if __preview__.spam and spam are expected to be 
the same, then just spam straight into the std lib and be done with it.

This PEP only makes sense if we assume that __preview__.spam and spam *will* 
be different, even if only in minor ways, and that there might not even be a 
spam. There should be no expectation that every __preview__ module must 
graduate, or that every standard library module must go through __preview__. 
If it is stable and uncontroversial, __preview__ adds nothing to the process.

Even when there are candidates for inclusion with relatively stable APIs, like 
regex, we should *assume* that there will be API differences between 
__preview__.regex and regex, simply because it is less harmful to expect 
changes that don't eventuate than to expect stability and be surprised by changes.

This, I believe, rules out Antoine's suggestion that modules remain importable 
from __preview__ even after graduation to a full member of the standard 
library. We simply can't say have all three of these statements true at the 
same time:

1) regular standard library modules are expected to be backward compatible
2) __preview__ modules are not expected to be forward compatible
3) __preview__.spam is an alias to regular standard library spam

At least one of them has to go. Since both 1) and 2) are powerful features, 
and 3) is only a convenience, the obvious one to drop is 3). I note that the 
PEP, as it is currently written, explicitly states that __preview__.spam will 
be dropped when it graduates to spam. This is a good thing and should not be 

Keeping __preview__.spam around after graduation is, I believe, actively 
harmful. It adds complexity to the developer's decision-making process 
("Should I import spam from __preview__, or just import spam? What's the 
difference?"). It gives a dangerous impression that code written for 
__preview__.spam will still work for spam.

We should be discouraging simple-minded recipes like

try:
     import spam
except ImportError:
     from __preview__ import spam

since they undermine the vital feature of __preview__ that the signature and 
even the existence of is subject to change.

I would go further and suggest that __preview__ be explicitly called 
__unstable__. If that name is scary, and it frightens some users off, good! 
The last thing we want is when 3.4 comes around to have dozens of bug reports 
along the line of " and have different 
function signatures and aren't compatible". Of course they do. That's why 
__preview__.spam existed in the first place, to allow the API to mature 
without the expectation that it was already stable.

Since __preview__.spam (or, as I would prefer, __unstable__.spam) and spam 
cannot be treated as drop-in replacements, what is __preview__.spam good for? 
Without a stable API, __preview__.spam is not suitable for use in production 
applications that expect to run under multiple versions of the standard library.

I think the PEP needs more use-cases on who might use __preview__.spam, and 
why. These come to my mind:

* if you don't care about Python 3.x+1, then there is no reason not to
   treat Python 3.x's __preview__.spam as stable;

* rapid development proof-of-concept software ("build one to throw away")
   can safely use __preview__.spam, since they are expected to be replaced

* one-use scripts;

* use at the interactive interpreter;

* any other time where forward-compatibility is not required.

I am reminded of the long, often acrimonious arguments that took place on 
Python-Dev a few years back about the API for the ipaddr library. A lot of the 
arguments could have been short-circuited if we had said "putting ipaddr into 
__preview__ does not constitute acceptance of its API".

(On the other hand, if __preview__ becomes used in the future for library 
authors to fob-off criticism for 18 months in the hope it will just be 
forgotten, then this will be a bad thing.)


From steve at  Fri Jan 27 21:48:55 2012
From: steve at (Steven D'Aprano)
Date: Sat, 28 Jan 2012 07:48:55 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Eli Bendersky wrote:

>> try:
>>    from __preview__ import thing
>> except ImportError:
>>    import thing
>> So no need to target a very specific version of Python.
> Yep, this is what I had in mind. And it appeared too trivial to place
> it in the PEP.

Trivial and wrong.

Since thing and __preview__.thing may have subtle, or major, API differences, 
how do you use it?

try:
     result =, b, c) +
except AttributeError:
     # Must be the preview version
     result = thing.foobar(a, c, b, x)
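
A runnable sketch of the kind of shim this leads to. Every name here
(preview_thing, stable_thing, foo, foobar, the argument order) is
hypothetical and chosen only to illustrate the point; none of it comes from
the PEP.

```python
from types import SimpleNamespace

# Two stand-ins for the same module before and after graduation.
# Both the function names and the argument order are hypothetical.
preview_thing = SimpleNamespace(foobar=lambda a, c, b: (a, b, c))
stable_thing = SimpleNamespace(foo=lambda a, b, c: (a, b, c))


def call_thing(thing, a, b, c):
    """Call whichever API this version of the module provides."""
    try:
        return thing.foo(a, b, c)
    except AttributeError:
        # Must be the preview version, with a different name and signature.
        return thing.foobar(a, c, b)


print(call_thing(stable_thing, 1, 2, 3))
print(call_thing(preview_thing, 1, 2, 3))
```

Every call site that touches a changed API needs a wrapper like this, which
is why the two-line import fallback on its own is not enough.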


From pydev at  Fri Jan 27 22:08:37 2012
From: pydev at (Frank Sievertsen)
Date: Fri, 27 Jan 2012 22:08:37 +0100
Subject: [Python-Dev] Hashing proposal: 64-bit hash
In-Reply-To: <jfuvme$i5b$>
References: <jfuvme$i5b$>
Message-ID: <>

> As already mentioned, the vulnerability of 64-bit Python is theoretical rather than practical. The size of the hash makes the attack extremely unlikely.

Unfortunately this assumption is not correct. It works very good with

It's much harder to create (efficiently) 64-bit hash-collisions.
But I managed to do so and created strings with
a length of 16 (6-bit)-characters (a-z,  A-Z, 0-9, _, .). Even
14 characters would have been enough.

You need less than twice as many characters for the same effect as in
the 32bit-world.
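
The arithmetic behind those lengths can be checked quickly. This is only a
count of the input bits available with the alphabet named above, not a
reconstruction of the collision search itself.

```python
import math
import string

# The alphabet described above: a-z, A-Z, 0-9, "_" and ".".
alphabet = string.ascii_letters + string.digits + "_."
bits_per_char = math.log2(len(alphabet))

for length in (14, 16):
    print(length, "characters carry", int(length * bits_per_char), "bits")
```

Both lengths carry more than 64 bits of input, so many different strings
necessarily share each 64-bit hash value.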


From barry at  Fri Jan 27 22:10:51 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 16:10:51 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Jan 27, 2012, at 05:26 PM, Alex wrote:

>I'm -1 on this, for a pretty simple reason. Something goes into __preview__,
>instead of its final destination directly because it needs feedback/possibly
>changes. However, given the release cycle of the stdlib (~18 months), any
>feedback it gets can't be seen by actual users until it's too
>late. Essentially you can only get one round of stdlib.

I'm -1 on this as well.  It just feels like the completely wrong way to
stabilize an API, and I think despite the caveats that are explicit in
__preview__, Python will just catch tons of grief from users and haters about
API instability anyway, because from a practical standpoint, applications
written using __preview__ APIs *will* be less stable.

It also won't improve the situation for prospective library developers because
they're locked into Python's development cycle anyway.  I also think the
benefit to users is a false one since it will be much harder to write
applications that are portable across Python releases.

>I think a significantly healthier process (in terms of maximizing feedback
>and getting something into its best shape) is to let a project evolve
>naturally on PyPI and in the ecosystem, give feedback to it from an inclusion
>perspective, and then include it when it becomes ready on its own
>merits. The counter argument to this is that putting it in the stdlib gets
>you significantly more eyeballs (and hopefully more feedback, therefore), my
>only response to this is: if it doesn't get eyeballs on PyPI I don't think
>there's a great enough need to justify it in the stdlib.

I agree with everything Alex said here.


From solipsis at  Fri Jan 27 22:48:58 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 27 Jan 2012 22:48:58 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Fri, 27 Jan 2012 16:10:51 -0500
Barry Warsaw <barry at> wrote:
> I'm -1 on this as well.  It just feels like the completely wrong way to
> stabilize an API, and I think despite the caveats that are explicit in
> __preview__, Python will just catch tons of grief from users and haters about
> API instability anyway, because from a practical standpoint, applications
> written using __preview__ APIs *will* be less stable.

Well, obviously __preview__ is not for the most conservative users. I
think the name clearly conveys the idea that you are trying out
something which is not in its definitive state, doesn't it?

> >I think a significantly healthier process (in terms of maximizing feedback
> >and getting something into its best shape) is to let a project evolve
> >naturally on PyPI and in the ecosystem, give feedback to it from an inclusion
> >perspective, and then include it when it becomes ready on its own
> >merits. The counter argument to this is that putting it in the stdlib gets
> >you significantly more eyeballs (and hopefully more feedback, therefore), my
> >only response to this is: if it doesn't get eyeballs on PyPI I don't think
> >there's a great enough need to justify it in the stdlib.
> I agree with everything Alex said here.

The idea that being on PyPI is sufficient is nice but flawed (the
IPaddr example). PyPI doesn't guarantee any visibility (how many
packages are there?). Furthermore, having users is not a guarantee that
the API is appropriate, either; it just means that the API is
appropriate for *some* users.

On the other hand, __preview__ would clearly signal that something is
on the verge of being frozen as an official stdlib API, and would
prompt people to actively try it.



From p.f.moore at  Fri Jan 27 23:02:00 2012
From: p.f.moore at (Paul Moore)
Date: Fri, 27 Jan 2012 22:02:00 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27 January 2012 21:48, Antoine Pitrou <solipsis at> wrote:
> Well, obviously __preview__ is not for the most conservative users. I
> think the name clearly conveys the idea that you are trying out
> something which is not in its definitive state, doesn't it?

Agreed. But that in turn implies to me that should not
be maintained as an alias for foo once it gets "promoted". Firstly,
because if you're not comfortable with changing your code to make the
simple change to remove the __preview__ prefix in the import, then how
could you be comfortable with using a module with no compatibility
guarantee anyway?

(BTW, I assume that the normal incantation would actually be "from
__preview__ import foo", as that limits the module name change to the
import statement).

> The idea that being on PyPI is sufficient is nice but flawed (the
> IPaddr example). PyPI doesn't guarantee any visibility (how many
> packages are there?). Furthermore, having users is not a guarantee that
> the API is appropriate, either; it just means that the API is
> appropriate for *some* users.

Agreed entirely. We need a way to signal somehow that a module is
being seriously considered for stdlib inclusion. That *would* result
in more uptake, and hence more testing and feedback. As an example, I
would definitely try out MRAB's regex module if it were in
__preview__, but even though I keep meaning to, I've never actually
got round to bothering to download from PyPI - I end up just using the
stdlib re for my one-off scripts.

> On the other hand, __preview__ would clearly signal that something is
> on the verge of being frozen as an official stdlib API, and would
> prompt people to actively try it.

Precisely. It's in effect a "last call for feedback", and people
should view it that way, in my opinion.


From tjreedy at  Fri Jan 27 23:40:16 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 27 Jan 2012 17:40:16 -0500
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <jfv94p$jt7$>

On 1/27/2012 2:54 PM, Guido van Rossum wrote:
> On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman<ethan at>  wrote:
>> Guido van Rossum wrote:
>>> Did you consider to just change the
>>> words so users can ignore it more easily?
>> Yes, that has also been discussed.
>> Speaking for myself, it would be only slightly better.
>> Speaking for everyone that wants context suppression (using Steven
>> D'Aprano's words):  chained exceptions expose details to the caller that are
>> irrelevant implementation details.

Especially if the users are non-programmer app users.

>> It seems to me that generating the amount of information needed to track
>> down errors is a balancing act between too much and too little; forcing the
>> print of previous context when switching from exception A to exception B
>> feels like too much:  at the very least it's extra noise; at the worst it
>> can be confusing to the actual problem.  When the library (or custom class)
>> author is catching A, saying "Yes, expected, now let's raise B instead", A
>> is no longer necessary.

I find double tracebacks to be 'jarring'. If there is a double bug, one 
in both the try and except blocks, it *should* stand out. If there is 
just one bug and the developer merely wants to rename it and change the 
message, it should not.

>> Also, the programmer is free to *not* use 'from None', leaving the complete
>> traceback in place.
> Ok, got it. The developer has to explicitly say "raise<something>
> from None" and that indicates they have really thought about the issue
> of suppressing too much information and they are okay with it. I dig
> that.

Now that I have been reminded that 'from x' was already added to raise 
statements, I am fine with reusing that. I still think it 'sticks out' 
more than the 'as' version, but when reading code, having (rare) info 
suppression stick out is not so bad.

The PEP does not address the issue of whether the new variation of raise 
is valid outside of an except block. My memory is that it was not to be 
and I think it should not be. One advantage of the 'as' form is that it 
is clear that raising the default as something else is invalid if there 
is no default.
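
For reference, a sketch of how this behaves in the CPython releases that
later implemented the feature (3.3 and up): the statement is accepted
outside an except block, where there is simply no context to suppress.
fail() is a hypothetical helper.

```python
def fail():
    # No exception is being handled here, so there is no context to suppress.
    raise ValueError("bad input") from None


try:
    fail()
except ValueError as exc:
    outside = exc

print(outside.__context__, outside.__cause__, outside.__suppress_context__)
```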

Terry Jan Reedy

From barry at  Fri Jan 27 23:54:14 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 17:54:14 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote:

>On Fri, 27 Jan 2012 16:10:51 -0500
>Barry Warsaw <barry at> wrote:
>> I'm -1 on this as well.  It just feels like the completely wrong way to
>> stabilize an API, and I think despite the caveats that are explicit in
>> __preview__, Python will just catch tons of grief from users and haters about
>> API instability anyway, because from a practical standpoint, applications
>> written using __preview__ APIs *will* be less stable.
>Well, obviously __preview__ is not for the most conservative users. I
>think the name clearly conveys the idea that you are trying out
>something which is not in its definitive state, doesn't it?

Maybe.  I could quibble about the name, but let's not bikeshed on that
right now.  The problem as I see it is that __preview__ will be very tempting
to use in production.  In fact, its use case is almost predicated on that.
(We want you to use it so you can tell us if the API is good.)

Once people use it, they will probably ship code that relies on it, and then
the pressure will be applied to us to continue to support that API even if a
newer, better one gets promoted out of __preview__.  I worry that over time,
for all practical purposes, there won't be much difference between __preview__
and the stdlib.

>> >I think a significantly healthier process (in terms of maximizing feedback
>> >and getting something into its best shape) is to let a project evolve
>> >naturally on PyPI and in the ecosystem, give feedback to it from an inclusion
>> >perspective, and then include it when it becomes ready on its own
>> >merits. The counter argument to this is that putting it in the stdlib gets
>> >you significantly more eyeballs (and hopefully more feedback, therefore), my
>> >only response to this is: if it doesn't get eyeballs on PyPI I don't think
>> >there's a great enough need to justify it in the stdlib.
>> I agree with everything Alex said here.
>The idea that being on PyPI is sufficient is nice but flawed (the
>IPaddr example). PyPI doesn't guarantee any visibility (how many
>packages are there?). Furthermore, having users is not a guarantee that
>the API is appropriate, either; it just means that the API is
>appropriate for *some* users.

I can't argue with that, it's just that I don't think __preview__ solves that
problem.  And it seems to me that __preview__ introduces a whole 'nother set
of problems on top of that.

So taking the IPaddr example further.  Would having it in the stdlib,
relegated to an explicitly unstable API part of the stdlib, increase eyeballs
enough to generate the kind of API feedback we're looking for, without
imposing an additional maintenance burden on us?  If you were writing an app
that used something in __preview__, how would you provide feedback on what
parts of the API you'd want to change, *and* how would you adapt your
application to use those better APIs once they became available 18 months from
now?  I think we'll just see folks using the unstable APIs and then
complaining when we remove them, even though they *know* *upfront* that these
APIs will go away.

I'm also nervous about it from an OS vendor point of view.  Should I reject
any applications that import from __preview__?  Or do I have to make a
commitment to support those APIs longer than Python does because the
application that uses it is important to me?

I think the OS vendor problem is easier with an application that uses some
PyPI package, because I can always make that package available to the
application by pulling in the version I care about.  It's harder if a newer,
incompatible version is released upstream and I want to provide both, but I
don't think __preview__ addresses that.  A robust, standard approach to
versioning of modules would though, and I think would better solve what
__preview__ is trying to solve.

>On the other hand, __preview__ would clearly signal that something is
>on the verge of being frozen as an official stdlib API, and would
>prompt people to actively try it.

I'm not so sure about that.  If I were to actively try it, I'm not sure how
much motivation I'd have to rewrite key parts of my code when an incompatible
version gets promoted to the un__preview__d stdlib.


From barry at  Fri Jan 27 23:56:03 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 17:56:03 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 27, 2012, at 10:02 PM, Paul Moore wrote:

>Agreed entirely. We need a way to signal somehow that a module is
>being seriously considered for stdlib inclusion. That *would* result
>in more uptake, and hence more testing and feedback.

I'm just not convinced that's a message that we can clearly articulate to
users of the library.  I think most people will see it in the module
documentation, just use it, and then complain when it's gone.


From solipsis at  Sat Jan 28 00:19:37 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 00:19:37 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Fri, 27 Jan 2012 17:54:14 -0500
Barry Warsaw <barry at> wrote:
> On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote:
> >On Fri, 27 Jan 2012 16:10:51 -0500
> >Barry Warsaw <barry at> wrote:
> >> 
> >> I'm -1 on this as well.  It just feels like the completely wrong way to
> >> stabilize an API, and I think despite the caveats that are explicit in
> >> __preview__, Python will just catch tons of grief from users and haters about
> >> API instability anyway, because from a practical standpoint, applications
> >> written using __preview__ APIs *will* be less stable.
> >
> >Well, obviously __preview__ is not for the most conservative users. I
> >think the name clearly conveys the idea that you are trying out
> >something which is not in its definitive state, doesn't it?
> Maybe.  I could quibble about the name, but let's not bikeshed on that
> right now.  The problem as I see it is that __preview__ will be very tempting
> to use in production.  In fact, its use case is almost predicated on that.
> (We want you to use it so you can tell us if the API is good.)

That's my opinion too. But using it in production doesn't mean you
lose control over the code and its users. Perhaps you are used to a kind
of production where the code gets disseminated all over the
GNUniverse :)
But for most people "production" means a single server or machine where
they have full control.

> If you were writing an app
> that used something in __preview__, how would you provide feedback on what
> parts of the API you'd want to change, *and* how would you adapt your
> application to use those better APIs once they became available 18 months from
> now?

For the former, the normal channels probably apply (bug tracker or

For the latter, depending on the API change, catching e.g.
AttributeError on module lookup, or TypeError on function call, or
explicitly examining the Python version are all plausible choices.
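
As a rough sketch of those three options (not anything taken from the PEP
itself), the fallbacks might look like the following. The helper names
`import_first` and `get_api` are made up for illustration, `compile_pattern`
is an invented rename, and `__preview__.regex` is only the hypothetical
preview location under discussion; `re` is the one module assumed to exist.

```python
import importlib
import sys


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (names,))


def get_api(module, *attr_names):
    """Return the first attribute found on `module`, tolerating renames."""
    for attr in attr_names:
        try:
            return getattr(module, attr)
        except AttributeError:
            continue
    raise AttributeError("%s has none of %r" % (module.__name__, attr_names))


# Prefer the hypothetical preview location, fall back to the stable module.
regex = import_first("__preview__.regex", "re")

# Tolerate a hypothetical rename of an entry point between versions:
# "compile_pattern" is invented, "compile" is the real name in re.
compile_pattern = get_api(regex, "compile_pattern", "compile")

# Explicit version check, the third option mentioned above.
HAS_FULLMATCH = sys.version_info >= (3, 4)  # re.fullmatch was added in 3.4

print(compile_pattern(r"\d+").match("408").group())
```

Catching TypeError around a call works the same way, though it can also hide
a TypeError raised inside the called function, so the import and attribute
checks are probably the safer default.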

Let's take another example: the regex module, where the API is unlikely
to change much (since it's meant to be re-compatible), and the main
concerns are ease of maintenance, data-wise compatibility with re
(rather than API-wise), performance, and the like.

> I think we'll just see folks using the unstable APIs and then
> complaining when we remove them, even though they *know* *upfront* that these
> APIs will go away.

Hmm, isn't that a bit pessimistic about our users?

> I'm also nervous about it from an OS vendor point of view.  Should I reject
> any applications that import from __preview__?  Or do I have to make a
> commitment to support those APIs longer than Python does because the
> application that uses it is important to me?

Well, is the application supported upstream? If yes, then there
shouldn't be any additional burden. If no, then you have a complication

> A robust, standard approach to
> versioning of modules would though, and I think would better solve what
> __preview__ is trying to solve.

I don't think versioning can replace API stability. __preview__ is
explicitly and visibly special, and that's a protection against us
becoming too complacent.

> >On the other hand, __preview__ would clearly signal that something is
> >on the verge of being frozen as an official stdlib API, and would
> >prompt people to actively try it.
> I'm not so sure about that.  If I were to actively try it, I'm not sure how
> much motivation I'd have to rewrite key parts of my code when an incompatible
> version gets promoted to the un__preview__d stdlib.

Obviously you would only use a module from __preview__ if the
functionality is exciting enough for you (or the cost/benefit ratio is
good enough).



From v+python at  Sat Jan 28 01:17:28 2012
From: v+python at (Glenn Linderman)
Date: Fri, 27 Jan 2012 16:17:28 -0800
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 1/27/2012 11:39 AM, martin at wrote:
>> Another issue occurs to me: when a hash with colliding keys (one that 
>> has been attacked, and has trees) has a non-string key added, isn't 
>> the flattening process likely to have extremely poor performance?
> Correct. 

Thanks for the clarification.

> "Don't do that, then"
> I don't consider it mandatory to fix all issues with hash collision.
> In fact, none of the strategies fixes all issues with hash collisions;
> even the hash-randomization solutions only deal with string keys, and
> don't consider collisions on non-string keys. 

Which is fine, I just wanted the clarification.

From solipsis at  Sat Jan 28 01:32:42 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 01:32:42 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
References: <>
	<> <>
Message-ID: <>

> I don't consider it mandatory to fix all issues with hash collision.
> In fact, none of the strategies fixes all issues with hash collisions;
> even the hash-randomization solutions only deal with string keys, and
> don't consider collisions on non-string keys.

How so? None of the patches did, but I think it was said several times
that other types (int, tuple, float) could also be converted to use
randomized hashes. What's more, there isn't any technical difficulty in
doing so.

And once you have randomized the hashes for these 4 or 5 built-in
types, most third-party types follow since the common case of a
__hash__ implementation is to call hash() on one or several



From steve at  Sat Jan 28 01:50:16 2012
From: steve at (Steven D'Aprano)
Date: Sat, 28 Jan 2012 11:50:16 +1100
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <jfv94p$jt7$>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Terry Reedy wrote:
> On 1/27/2012 2:54 PM, Guido van Rossum wrote:
>> On Fri, Jan 27, 2012 at 9:08 AM, Ethan Furman<ethan at>  wrote:
>>> Guido van Rossum wrote:
>>>> Did you consider to just change the
>>>> words so users can ignore it more easily?
>>> Yes, that has also been discussed.
>>> Speaking for myself, it would be only slightly better.
>>> Speaking for everyone that wants context suppression (using Steven
>>> D'Aprano's words):  chained exceptions expose details to the caller
>>> that are irrelevant implementation details.
> Especially if the users are non-programmer app users.

Or beginner programmers, e.g. on the python-list and tutor mailing lists. It 
is hard enough to get beginners to post the entire traceback without making 
them bigger. The typical newbie posts just the error message, sometimes not 
even the exception type. What they will make of chained exceptions, I hate to 

> I find double tracebacks to be 'jarring'. If there is a double bug, one 
> in both the try and except blocks, it *should* stand out. If there is 
> just one bug and the developer merely wants to rename it and change the 
> message, it should not.

Agreed with all of this.

> The PEP does not address the issue of whether the new variation of raise 
> is valid outside of an except block. My memory is that it was not to be 
> and I think it should not be. One advantage of the 'as' form is that it 
> is clear that raising the default as something else is invalid if there 
> is no default.

I think that raise ... from None should be illegal outside an except block. My 
reasoning is:

1) It ensures that raise from None only occurs when the developer can see
    the old exception right there, and not "just in case".

2) I can't think of any use-cases for raise from None outside of an
    except block.

3) When in doubt, start with something more restrictive, because it is
    easier to loosen the restriction later if it turns out to be too much,
    than to change our mind and add the restriction afterwards.
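
For reference, a sketch of the pattern under discussion, using the
`raise ... from None` syntax proposed in PEP 409 (it needs an interpreter that
implements the PEP, i.e. Python 3.3 or later). `ConfigError` and `lookup` are
hypothetical names chosen for the example.

```python
class ConfigError(Exception):
    """Hypothetical application-level error."""


def lookup(settings, key):
    try:
        return settings[key]
    except KeyError:
        # The KeyError is an implementation detail; keep it out of the
        # traceback shown to the caller.
        raise ConfigError("missing setting: %r" % (key,)) from None


try:
    lookup({}, "timeout")
except ConfigError as exc:
    caught = exc

# The original exception is still recorded, but its display is suppressed.
print(caught.__cause__, caught.__suppress_context__)
```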


From ethan at  Sat Jan 28 01:33:21 2012
From: ethan at (Ethan Furman)
Date: Fri, 27 Jan 2012 16:33:21 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <jfv94p$jt7$>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Terry Reedy wrote:
> The PEP does not address the issue of whether the new variation of raise 
> is valid outside of an except block. My memory is that it was not to be 
> and I think it should not be. One advantage of the 'as' form is that it 
> is clear that raising the default as something else is invalid if there 
> is no default.

Were you speaking of the original (PEP 3134), or this new one (PEP 409)?

Because at this point it is possible to do:

     raise ValueError from NameError

outside a try block.  I don't see it as incredibly useful, but I don't 
know that it's worth making it illegal.

So the question is:

   - should 'raise ... from ...' be legal outside a try block?

   - should 'raise ... from None' be legal outside a try block?


From martin at  Sat Jan 28 01:53:40 2012
From: martin at (martin at
Date: Sat, 28 Jan 2012 01:53:40 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

> How so? None of the patches did, but I think it was said several times
> that other types (int, tuple, float) could also be converted to use
> randomized hashes. What's more, there isn't any technical difficulty in
> doing so.

The challenge again is about incompatibility: the more types you apply this
to, the higher the risk of breaking third-party code.

Plus you still risk that the hash seed might leak out of the application,
opening it up again to the original attack.

From ncoghlan at  Sat Jan 28 02:04:09 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 11:04:09 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 11:48 PM, Matt Joiner <anacrolix at> wrote:
> +0. I think the idea is right, and will help to get good quality
> modules in at a faster rate. However it is compensating for a lack of
> interface and packaging standardization in the 3rd party module world.

No, it really isn't. virtualenv and pip already work *beautifully*, so
long as you're in an environment where:

1. Due diligence isn't a problem
2. Network connectivity isn't a problem
3. You *already know* about virtual environments and the Python Package Index
4. You either don't need dependencies written in C, or the ones you
need are written to compile cleanly under distutils and you aren't on
Windows (because Microsoft consider building fully functional binaries
from source to be an optional extra people should be charged for
rather than a fundamental feature of an operating system)

It would probably be worth adding a heading specifically countering
this myth, though.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 28 02:13:06 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 11:13:06 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 3:26 AM, Alex <alex.gaynor at> wrote:
> I think a significantly healthier process (in terms of maximizing feedback and
> getting something into its best shape) is to let a project evolve naturally on
> PyPI and in the ecosystem, give feedback to it from an inclusion perspective,
> and then include it when it becomes ready on its own merits. The counter
> argument to this is that putting it in the stdlib gets you significantly more
> eyeballs (and hopefully more feedback, therefore), my only response to this is:
> if it doesn't get eyeballs on PyPI I don't think there's a great enough need to
> justify it in the stdlib.

And what about a project like regex, which *has* the eyeballs on PyPI,
but the core devs aren't confident enough of its maintainability yet
to be happy about adding it directly to the stdlib with full backwards
compatibility guarantees? The easy answer for us in that context is to
just not add it (i.e. the status quo), which isn't a healthy outcome
for the overall language ecosystem.

Really, regex is the *reason* this PEP exists: we *know* we need to
either replace or seriously enhance "re" (since its Unicode handling
isn't up to scratch), but we're only *pretty sure* adding "regex" to
the stdlib is the right answer. Adding "__preview__.regex" instead
gives us a chance to back out if we uncover serious problems (e.g.
with the cross-platform support).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Sat Jan 28 02:13:40 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 02:13:40 +0100
Subject: [Python-Dev] [issue13703] Hash collision security issue
References: <>
	<> <>
Message-ID: <>

On Sat, 28 Jan 2012 01:53:40 +0100
martin at wrote:
> > How so? None of the patches did, but I think it was said several times
> > that other types (int, tuple, float) could also be converted to use
> > randomized hashes. What's more, there isn't any technical difficulty in
> > doing so.
> The challenge again is about incompatibility: the more types you apply this
> to, the higher the risk of breaking third-party code.
> Plus you still risk that the hash seed might leak out of the application,
> opening it up again to the original attack.

Attacks on the hash seed are a different level of difficulty than
sending a well-known universal payload to a Web site.

Unless the application leaks hash() values directly, you have to guess
them from the dict ordering observed in the application's output. IMHO
it's ok if our hash function is vulnerable to cryptanalysts rather than
script kiddies.



From benjamin at  Sat Jan 28 02:19:58 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 20:19:58 -0500
Subject: [Python-Dev] plugging the hash attack
Message-ID: <>

Hello everyone,
In an effort to get a fix out before Perl 6 goes mainstream, Barry and I
have decided to pronounce on what we want for our stable releases.
What we have decided is that
1. Simple hash randomization is the way to go. We think this has the
best chance of actually fixing the problem while being fairly
straightforward such that we're comfortable putting it in a stable
2. It will be off by default in stable releases and enabled by an
envar at runtime. This will prevent code breakage from dictionary
order changing as well as people depending on the hash stability.
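
For illustration, this is roughly how such an environment-variable switch
behaves in the form that later shipped in CPython as `PYTHONHASHSEED`. The
message above does not name the variable, so treat the name as coming from
later releases rather than from this announcement. The helper
`string_hash_with_seed` is made up for the example.

```python
import os
import subprocess
import sys


def string_hash_with_seed(seed, text="python-dev"):
    """Run a child interpreter with PYTHONHASHSEED=seed and return hash(text)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "import sys; print(hash(sys.argv[1]))"
    out = subprocess.check_output([sys.executable, "-c", code, text], env=env)
    return int(out.decode().strip())


# A fixed seed gives reproducible string hashes across processes...
print(string_hash_with_seed(1) == string_hash_with_seed(1))
# ...while a different seed selects a different hash function.
print(string_hash_with_seed(1) == string_hash_with_seed(2))
```

Setting the variable to 0 disables randomization altogether, which is the
escape hatch for code that depends on dictionary order or on stable hashes.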


From ncoghlan at  Sat Jan 28 02:27:35 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 11:27:35 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 6:43 AM, Steven D'Aprano <steve at> wrote:
> This PEP only makes sense if we assume that __preview__.spam and spam *will*
> be different, even if only in minor ways, and that there might not even be a
> spam. There should be no expectation that every __preview__ module must
> graduate, or that every standard library module must go through __preview__.
> If it is stable and uncontroversial, __preview__ adds nothing to the
> process.

Yes, the PEP already points to lzma as an example of a module with a
sufficiently obvious API that it didn't need to go through a preview

> Keeping __preview__.spam around after graduation is, I believe, actively
> harmful. It adds complexity to the developer's decision-making process
> ("Should I import spam from __preview__, or just import spam? What's the
> difference?"). It gives a dangerous impression that code written for
> __preview__.spam will still work for spam.

Yes, this was exactly the reasoning behind removing the names from
the __preview__ namespace when the modules graduated. It draws a line in
the sand: "An API compatibility break is not only allowed, it is 100%
guaranteed. If you are not prepared to deal with this, then you are
*not* part of the target audience for the __preview__ namespace. Wait
until the module reaches the main section of the standard library
before you start using it, or else download a third party supported
version with backwards compatibility guarantees from PyPI. The
__preview__ namespace is not designed for anything that requires long
term support spanning multiple Python versions - it is intended for use
in single version environments, such as intranet web services and
student classrooms"

> I would go further and suggest that __preview__ be explicitly called
> __unstable__. If that name is scary, and it frightens some users off, good!

Hmm, the problem with "unstable" is that we only mean the *API* is
unstable. The software itself will be as thoroughly tested as
everything else we ship.

> I think the PEP needs more use-cases on who might use __preview__.spam, and
> why. These come to my mind:
> * if you don't care about Python 3.x+1, then there is no reason not to
>   treat Python 3.x's __preview__.spam as stable;
> * rapid development proof-of-concept software ("build one to throw away")
>   can safely use __preview__.spam, since they are expected to be replaced
>   anyway;
> * one-use scripts;
> * use at the interactive interpreter;
> * any other time where forward-compatibility is not required.

A specific list of use cases is a good idea.

I'd add a couple more:

* in a student classroom where the concept of PyPI and third party
packages has yet to be introduced

* for an intranet web service deployment where due diligence adds
significant overhead to any use of third party packages

> I am reminded of the long, often acrimonious arguments that took place on
> Python-Dev a few years back about the API for the ipaddr library. A lot of
> the arguments could have been short-circuited if we had said "putting ipaddr
> into __preview__ does not constitute acceptance of its API".

Yep, there's a reason 'ipaddr' was high on the list of modules this
could be used for :)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From s.brunthaler at  Sat Jan 28 02:28:28 2012
From: s.brunthaler at (stefan brunthaler)
Date: Fri, 27 Jan 2012 17:28:28 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <>


On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson <benjamin at> wrote:
> 2011/11/8 stefan brunthaler <s.brunthaler at>:
>> How does that sound?
> I think I can hear real patches and benchmarks most clearly.
I spent the better part of my -20% time on implementing the work as
"suggested". Please find the benchmarks attached to this email, I just
did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched
off the regular 3.3a0 default tip changeset 73977 shortly after your
email. I do not have an official patch yet, but am going to create one
if wanted. Changes to the existing interpreter are minimal, the
biggest chunk is a new interpreter dispatch loop.

Merging dispatch loops eliminates some of my optimizations, but my
inline caching technique enables inlining some functionality, which
results in visible speedups. The code is normalized to the
non-threaded-code version of the CPython interpreter (named
"vanilla"), so that I can reference it to my preceding results. I
anticipate *no* compatibility issues and the interpreter requires less
than 100 KiB of extra memory at run-time. Since my interpreter is
using 215 of a maximum of 255 instructions, there is room for adding
additional derivatives, e.g., for popular Python libraries, too.
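
Judging from the attached figures, each reported speedup appears to be the
vanilla interpreter's time divided by the other interpreter's time. A quick
check of that reading against the overall numbers in the attachment (the
dictionary keys are shortened labels chosen here, not names used by the
benchmark tool):

```python
# Overall times copied from the "Overall performance" lines of the attachment.
overall = {
    "threaded-code": 1.129733,
    "pio-no-psfc": 1.000752,
    "pio": 1.036613,
    "vanilla": 1.469095,
}

# Normalize every interpreter against the vanilla (non-threaded-code) build.
baseline = overall["vanilla"]
speedups = {name: baseline / t for name, t in overall.items()}

for name, s in speedups.items():
    print("%-14s %.4f" % (name, s))
```

The results agree with the reported 1.3004, 1.4680 and 1.4172 to four decimal
places, which supports the reading but does not prove how the tool computes
them.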

Let me know what python-dev thinks of this and have a nice weekend,

PS: AFAIR the version without partial stack frame caching also passes
all regression tests modulo the ones that test against specific
-------------- next part --------------
currently processing:  bench/
phd-cpy-3a0-thr-cod-pytho      arg:     10 | time:   0.161876  | stdev:  0.007780 | var:  0.000061 | mem:   6633.60
phd-cpy-3a0-thr-cod-pytho      arg:     12 | time:   0.699243  | stdev:  0.019112 | var:  0.000365 | mem:   8142.67
phd-cpy-3a0-thr-cod-pytho      arg:     14 | time:   3.388344  | stdev:  0.048042 | var:  0.002308 | mem:  13586.93
phd-cpy-pio-sne-pre-pyt-no-psf arg:     10 | time:   0.153875  | stdev:  0.003828 | var:  0.000015 | mem:   6873.73
phd-cpy-pio-sne-pre-pyt-no-psf arg:     12 | time:   0.632572  | stdev:  0.019121 | var:  0.000366 | mem:   8246.27
phd-cpy-pio-sne-pre-pyt-no-psf arg:     14 | time:   3.020988  | stdev:  0.043483 | var:  0.001891 | mem:  13640.27
phd-cpy-pio-sne-pre-pytho      arg:     10 | time:   0.150942  | stdev:  0.005157 | var:  0.000027 | mem:   6901.87
phd-cpy-pio-sne-pre-pytho      arg:     12 | time:   0.660841  | stdev:  0.020538 | var:  0.000422 | mem:   8286.80
phd-cpy-pio-sne-pre-pytho      arg:     14 | time:   3.184198  | stdev:  0.051103 | var:  0.002612 | mem:  13680.40
phd-cpy-3a0-van-pytho          arg:     10 | time:   0.202812  | stdev:  0.005480 | var:  0.000030 | mem:   6633.33
phd-cpy-3a0-van-pytho          arg:     12 | time:   0.908456  | stdev:  0.015744 | var:  0.000248 | mem:   8153.07
phd-cpy-3a0-van-pytho          arg:     14 | time:   4.364805  | stdev:  0.037522 | var:  0.001408 | mem:  13593.60
### phd-cpy-3a0-thr-cod-pytho     :  1.2887 (avg-sum:   1.416488)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.4383 (avg-sum:   1.269145)
### phd-cpy-pio-sne-pre-pytho     :  1.3704 (avg-sum:   1.331994)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.825358)
currently processing:  bench/
phd-cpy-3a0-thr-cod-pytho      arg:      8 | time:   0.172677  | stdev:  0.006620 | var:  0.000044 | mem:   6424.13
phd-cpy-3a0-thr-cod-pytho      arg:      9 | time:   1.426755  | stdev:  0.035545 | var:  0.001263 | mem:   6425.20
phd-cpy-pio-sne-pre-pyt-no-psf arg:      8 | time:   0.168010  | stdev:  0.010277 | var:  0.000106 | mem:   6481.07
phd-cpy-pio-sne-pre-pyt-no-psf arg:      9 | time:   1.345817  | stdev:  0.033127 | var:  0.001097 | mem:   6479.60
phd-cpy-pio-sne-pre-pytho      arg:      8 | time:   0.165876  | stdev:  0.007136 | var:  0.000051 | mem:   6520.00
phd-cpy-pio-sne-pre-pytho      arg:      9 | time:   1.351150  | stdev:  0.028822 | var:  0.000831 | mem:   6519.73
phd-cpy-3a0-van-pytho          arg:      8 | time:   0.216146  | stdev:  0.012879 | var:  0.000166 | mem:   6419.07
phd-cpy-3a0-van-pytho          arg:      9 | time:   1.834247  | stdev:  0.028224 | var:  0.000797 | mem:   6418.67
### phd-cpy-3a0-thr-cod-pytho     :  1.2820 (avg-sum:   0.799716)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.3544 (avg-sum:   0.756913)
### phd-cpy-pio-sne-pre-pytho     :  1.3516 (avg-sum:   0.758513)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.025197)
currently processing:  bench/
phd-cpy-3a0-thr-cod-pytho      arg:  50000 | time:   0.374023  | stdev:  0.010870 | var:  0.000118 | mem:   6495.07
phd-cpy-3a0-thr-cod-pytho      arg: 100000 | time:   0.714577  | stdev:  0.024713 | var:  0.000611 | mem:   6495.47
phd-cpy-3a0-thr-cod-pytho      arg: 150000 | time:   1.062866  | stdev:  0.040138 | var:  0.001611 | mem:   6496.27
phd-cpy-pio-sne-pre-pyt-no-psf arg:  50000 | time:   0.345621  | stdev:  0.022549 | var:  0.000508 | mem:   6551.87
phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time:   0.656174  | stdev:  0.031608 | var:  0.000999 | mem:   6551.60
phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time:   0.964326  | stdev:  0.046202 | var:  0.002135 | mem:   6552.13
phd-cpy-pio-sne-pre-pytho      arg:  50000 | time:   0.381223  | stdev:  0.015771 | var:  0.000249 | mem:   6592.40
phd-cpy-pio-sne-pre-pytho      arg: 100000 | time:   0.739112  | stdev:  0.035685 | var:  0.001273 | mem:   6591.60
phd-cpy-pio-sne-pre-pytho      arg: 150000 | time:   1.080334  | stdev:  0.035524 | var:  0.001262 | mem:   6591.73
phd-cpy-3a0-van-pytho          arg:  50000 | time:   0.417759  | stdev:  0.016483 | var:  0.000272 | mem:   6490.27
phd-cpy-3a0-van-pytho          arg: 100000 | time:   0.788182  | stdev:  0.019665 | var:  0.000387 | mem:   6492.40
phd-cpy-3a0-van-pytho          arg: 150000 | time:   1.187140  | stdev:  0.035640 | var:  0.001270 | mem:   6491.73
### phd-cpy-3a0-thr-cod-pytho     :  1.1123 (avg-sum:   0.717155)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.2172 (avg-sum:   0.655374)
### phd-cpy-pio-sne-pre-pytho     :  1.0874 (avg-sum:   0.733556)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   0.797694)
currently processing:
phd-cpy-3a0-thr-cod-pytho      arg:    200 | time:   0.244281  | stdev:  0.009795 | var:  0.000096 | mem:   6424.13
phd-cpy-3a0-thr-cod-pytho      arg:    400 | time:   0.861120  | stdev:  0.019812 | var:  0.000393 | mem:   6501.87
phd-cpy-3a0-thr-cod-pytho      arg:    500 | time:   1.338883  | stdev:  0.029741 | var:  0.000885 | mem:   6730.67
phd-cpy-pio-sne-pre-pyt-no-psf arg:    200 | time:   0.220013  | stdev:  0.013307 | var:  0.000177 | mem:   6476.00
phd-cpy-pio-sne-pre-pyt-no-psf arg:    400 | time:   0.789915  | stdev:  0.028319 | var:  0.000802 | mem:   6566.00
phd-cpy-pio-sne-pre-pyt-no-psf arg:    500 | time:   1.180740  | stdev:  0.042762 | var:  0.001829 | mem:   6794.00
phd-cpy-pio-sne-pre-pytho      arg:    200 | time:   0.218946  | stdev:  0.014494 | var:  0.000210 | mem:   6519.47
phd-cpy-pio-sne-pre-pytho      arg:    400 | time:   0.767381  | stdev:  0.042411 | var:  0.001799 | mem:   6614.67
phd-cpy-pio-sne-pre-pytho      arg:    500 | time:   1.162739  | stdev:  0.029852 | var:  0.000891 | mem:   6842.67
phd-cpy-3a0-van-pytho          arg:    200 | time:   0.328553  | stdev:  0.009619 | var:  0.000093 | mem:   6419.60
phd-cpy-3a0-van-pytho          arg:    400 | time:   1.202208  | stdev:  0.018670 | var:  0.000349 | mem:   6514.27
phd-cpy-3a0-van-pytho          arg:    500 | time:   1.860382  | stdev:  0.036647 | var:  0.001343 | mem:   6712.93
### phd-cpy-3a0-thr-cod-pytho     :  1.3874 (avg-sum:   0.814761)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.5480 (avg-sum:   0.730223)
### phd-cpy-pio-sne-pre-pytho     :  1.5780 (avg-sum:   0.716355)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.130381)
currently processing:  bench/
phd-cpy-3a0-thr-cod-pytho      arg:  50000 | time:   0.907789  | stdev:  0.021787 | var:  0.000475 | mem:   6668.13
phd-cpy-3a0-thr-cod-pytho      arg: 100000 | time:   1.788778  | stdev:  0.042285 | var:  0.001788 | mem:   6674.67
phd-cpy-3a0-thr-cod-pytho      arg: 150000 | time:   2.666433  | stdev:  0.062115 | var:  0.003858 | mem:   6663.20
phd-cpy-pio-sne-pre-pyt-no-psf arg:  50000 | time:   0.789515  | stdev:  0.022475 | var:  0.000505 | mem:   6720.00
phd-cpy-pio-sne-pre-pyt-no-psf arg: 100000 | time:   1.525695  | stdev:  0.039957 | var:  0.001597 | mem:   6735.87
phd-cpy-pio-sne-pre-pyt-no-psf arg: 150000 | time:   2.283342  | stdev:  0.071985 | var:  0.005182 | mem:   6730.93
phd-cpy-pio-sne-pre-pytho      arg:  50000 | time:   0.789915  | stdev:  0.012848 | var:  0.000165 | mem:   6771.47
phd-cpy-pio-sne-pre-pytho      arg: 100000 | time:   1.563297  | stdev:  0.033950 | var:  0.001153 | mem:   6770.00
phd-cpy-pio-sne-pre-pytho      arg: 150000 | time:   2.324945  | stdev:  0.050021 | var:  0.002502 | mem:   6768.93
phd-cpy-3a0-van-pytho          arg:  50000 | time:   1.167939  | stdev:  0.025035 | var:  0.000627 | mem:   6666.80
phd-cpy-3a0-van-pytho          arg: 100000 | time:   2.327478  | stdev:  0.047759 | var:  0.002281 | mem:   6666.93
phd-cpy-3a0-van-pytho          arg: 150000 | time:   3.434881  | stdev:  0.066780 | var:  0.004460 | mem:   6666.67
### phd-cpy-3a0-thr-cod-pytho     :  1.2922 (avg-sum:   1.787667)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.5071 (avg-sum:   1.532851)
### phd-cpy-pio-sne-pre-pytho     :  1.4814 (avg-sum:   1.559386)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   2.310099)
currently processing:  bench/
phd-cpy-3a0-thr-cod-pytho      arg:    100 | time:   0.267083  | stdev:  0.010964 | var:  0.000120 | mem:   6548.80
phd-cpy-3a0-thr-cod-pytho      arg:    200 | time:   0.970060  | stdev:  0.023750 | var:  0.000564 | mem:   6539.20
phd-cpy-3a0-thr-cod-pytho      arg:    300 | time:   2.160668  | stdev:  0.044157 | var:  0.001950 | mem:   6528.93
phd-cpy-pio-sne-pre-pyt-no-psf arg:    100 | time:   0.233081  | stdev:  0.007929 | var:  0.000063 | mem:   6611.87
phd-cpy-pio-sne-pre-pyt-no-psf arg:    200 | time:   0.837918  | stdev:  0.019807 | var:  0.000392 | mem:   6596.80
phd-cpy-pio-sne-pre-pyt-no-psf arg:    300 | time:   1.865183  | stdev:  0.028789 | var:  0.000829 | mem:   6616.40
phd-cpy-pio-sne-pre-pytho      arg:    100 | time:   0.241614  | stdev:  0.006662 | var:  0.000044 | mem:   6647.60
phd-cpy-pio-sne-pre-pytho      arg:    200 | time:   0.870454  | stdev:  0.017455 | var:  0.000305 | mem:   6646.53
phd-cpy-pio-sne-pre-pytho      arg:    300 | time:   1.969456  | stdev:  0.052760 | var:  0.002784 | mem:   6651.33
phd-cpy-3a0-van-pytho          arg:    100 | time:   0.355088  | stdev:  0.007057 | var:  0.000050 | mem:   6545.07
phd-cpy-3a0-van-pytho          arg:    200 | time:   1.335549  | stdev:  0.021511 | var:  0.000463 | mem:   6555.47
phd-cpy-3a0-van-pytho          arg:    300 | time:   3.042990  | stdev:  0.032533 | var:  0.001058 | mem:   6599.87
### phd-cpy-3a0-thr-cod-pytho     :  1.3931 (avg-sum:   1.132603)
### phd-cpy-pio-sne-pre-pyt-no-psf:  1.6122 (avg-sum:   0.978727)
### phd-cpy-pio-sne-pre-pytho     :  1.5361 (avg-sum:   1.027175)
### phd-cpy-3a0-van-pytho         :  1.0000 (avg-sum:   1.577876)
Overall performance:
  Interpreter: cpython-3.3a0-threaded-code/python           :  1.129733 (speedup:  1.3004, counts: 510)
Overall performance:
  Interpreter: cpython-pio-sneak-preview/python-no-psfc     :  1.000752 (speedup:  1.4680, counts: 510)
Overall performance:
  Interpreter: cpython-pio-sneak-preview/python             :  1.036613 (speedup:  1.4172, counts: 510)
Overall performance:
  Interpreter: cpython-3.3a0-vanilla/python                 :  1.469095 (speedup:  1.0000, counts: 510)

From benjamin at  Sat Jan 28 02:31:46 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 20:31:46 -0500
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/27 stefan brunthaler <s.brunthaler at>:
> Hi,
> On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson <benjamin at> wrote:
>> 2011/11/8 stefan brunthaler <s.brunthaler at>:
>>> How does that sound?
>> I think I can hear real patches and benchmarks most clearly.
> I spent the better part of my -20% time on implementing the work as
> "suggested". Please find the benchmarks attached to this email, I just
> did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched
> off the regular 3.3a0 default tip changeset 73977 shortly after your
> email. I do not have an official patch yet, but am going to create one
> if wanted. Changes to the existing interpreter are minimal, the
> biggest chunk is a new interpreter dispatch loop.
> Merging dispatch loops eliminates some of my optimizations, but my
> inline caching technique enables inlining some functionality, which
> results in visible speedups. The code is normalized to the
> non-threaded-code version of the CPython interpreter (named
> "vanilla"), so that I can reference it to my preceding results. I
> anticipate *no* compatibility issues and the interpreter requires less
> than 100 KiB of extra memory at run-time. Since my interpreter is
> using 215 of a maximum of 255 instructions, there is room for adding
> additional derivatives, e.g., for popular Python libraries, too.
> Let me know what python-dev thinks of this and have a nice weekend,

Cool. It'd be nice to see a patch.


From ncoghlan at  Sat Jan 28 02:37:37 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 11:37:37 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 8:54 AM, Barry Warsaw <barry at> wrote:
> I think the OS vendor problem is easier with an application that uses some
> PyPI package, because I can always make that package available to the
> application by pulling in the version I care about.  It's harder if a newer,
> incompatible version is released upstream and I want to provide both, but I
> don't think __preview__ addresses that.  A robust, standard approach to
> versioning of modules would though, and I think would better solve what
> __preview__ is trying to solve.

I'd be A-OK with an explicit requirement that *any* module shipped in
__preview__ must have a third-party supported multi-version compatible
alternative on PyPI. (PEP 2 actually pretty much says that should be
the case, but making it mandatory in the __preview__ case would be a
good idea).

As an OS vendor, you'd then be able to say: "Don't use __preview__,
since that will cause problems when we next upgrade the system Python.
Use the PyPI version instead."

Then the stdlib docs for that module (while it is in __preview__)
would say "If you are able to easily use third party packages, package
<X> offers this API for multiple Python versions with stronger API
stability guarantees. This preview version of the module is for use in
environments that specifically target a single Python version and/or
where the use of third party packages outside the standard library
poses additional complications beyond simply downloading and
installing the code."


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From barry at  Sat Jan 28 02:48:10 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 20:48:10 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 28, 2012, at 11:37 AM, Nick Coghlan wrote:

>Then the stdlib docs for that module (while it is in __preview__)
>would say "If you are able to easily use third party packages, package
><X> offers this API for multiple Python versions with stronger API
>stability guarantees. This preview version of the module is for use in
>environments that specifically target a single Python version and/or
>where the use of third party packages outside the standard library
>poses additional complications beyond simply downloading and
>installing the code."

Would it be acceptable then for a distro to disable __preview__ or empty it
out?

The thinking goes like this: if you would normally use an __preview__ module
because you can't get approval to download some random package from PyPI, well
then your distro probably could or should provide it, so get it from them.  In
fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents
were a requirement, then a distro vendor could just ensure those PyPI versions
are available as distro packages outside of the __preview__ stdlib namespace
(i.e. in their normal third-party namespace).  Then folks developing on that
platform could just use the distro package and ignore __preview__.

If that's acceptable, then maybe it should be explicitly so in the PEP.


From martin at  Sat Jan 28 02:49:26 2012
From: martin at (martin at
Date: Sat, 28 Jan 2012 02:49:26 +0100
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

> 1. Simple hash randomization is the way to go. We think this has the
> best chance of actually fixing the problem while being fairly
> straightforward such that we're comfortable putting it in a stable
> release.
> 2. It will be off by default in stable releases and enabled by an
> envar at runtime. This will prevent code breakage from dictionary
> order changing as well as people depending on the hash stability.

I think this is a good compromise given the widely varying assessments
of the issue.


From barry at  Sat Jan 28 02:54:22 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 20:54:22 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 28, 2012, at 11:27 AM, Nick Coghlan wrote:

>* for an intranet web service deployment where due diligence adds
>significant overhead to any use of third party packages

Which really means that *we* are assuming the responsibility for this due
diligence.  And of course, we should not add anything to __preview__ without
consent (and contributor agreement) of the upstream developers.


From barry at  Sat Jan 28 02:58:40 2012
From: barry at (Barry Warsaw)
Date: Fri, 27 Jan 2012 20:58:40 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 28, 2012, at 11:13 AM, Nick Coghlan wrote:

>Really, regex is the *reason* this PEP exists: we *know* we need to
>either replace or seriously enhance "re" (since its Unicode handling
>isn't up to scratch), but we're only *pretty sure* adding "regex" to
>the stdlib is the right answer. Adding "__preview__.regex" instead
>gives us a chance to back out if we uncover serious problems (e.g.
>with the cross-platform support).

I'd also feel much better about this PEP if we had specific ways to measure
success.  If, for example, regex were added to Python 3.3, but removed from
3.4 because we didn't get enough feedback about it, then I'd consider the
approach put forward in this PEP to be a failure.  Experiments that fail are
*okay* of course, if they are viewed as experiments, there are clear metrics
to measure their success, and we have the guts to end the experiment if it
doesn't work out.

Of course, if it's a resounding success, then that's fantastic too.


From fuzzyman at  Sat Jan 28 03:02:02 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 02:02:02 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 20:43, Steven D'Aprano wrote:
> Eli Bendersky wrote:
>> Hello,
>> Following an earlier discussion on python-ideas [1], we would like to
>> propose the following PEP for review. Discussion is welcome.
> I think you need to emphasize that modules in __preview__ are NOT 
> expected to have a forward-compatible, stable, API. This is a feature 
> of __preview__, not a bug, and I believe it is the most important 
> feature.
> I see responses to this PEP that assume that APIs will be stable, 

I didn't see responses like that - the *point* of this pep is to allow 
an api we think *should* be in the standard library stabilise and mature 
(that's how I see it anyway). There is a difference between "not yet 
stable" and "we will make huge gratuitous changes" though. We *might* 
make huge gratuitous changes, but only if they're really needed (meaning 
they're huge but not gratuitous I guess).

> and that having a module fail to graduate out of __preview__ should be 
> an extraordinary event. 

I would say this will probably be the case. Once we add something there 
will be resistance to removing it and we shouldn't let things rot in 
__preview__ either. I would say failing to graduate would be the 
exception, although maybe not extraordinary.

> But if this is the case, then why bother with __preview__? It just 
> adds complexity to the process -- if __preview__.spam and spam are 
> expected to be the same, then just spam straight into the std lib and 
> be done with it.
I think you're misunderstanding what was suggested. The suggestion was 
that once spam has graduated from __preview__ into stdlib, that 
__preview__.spam should remain as an alias - so that code using it from 
__preview__ at least has a fighting chance of working.
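
As a rough illustration of the aliasing idea (not part of the PEP, and
`__preview__` is not an importable package in any released Python), the
sketch below fakes a `__preview__` package at runtime and points
`__preview__.json` at the regular `json` module. Using `json` as the stand-in
for "spam" and building the package object by hand are assumptions made
purely for the example.

```python
import json
import sys
import types

# Build an empty stand-in for the proposed __preview__ package (hypothetical).
preview = types.ModuleType("__preview__")
preview.__path__ = []  # an (empty) __path__ marks the module as a package

# Alias the "graduated" module under its old preview name.
preview.json = json
sys.modules["__preview__"] = preview
sys.modules["__preview__.json"] = json

# Code written against the preview location keeps importing...
from __preview__ import json as previewed_json

# ...and it gets the very same module object as the stdlib one.
print(previewed_json is json)
```

An alias like this only keeps the import statement working; it cannot paper
over any API changes made between the preview and the graduated module.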

> This PEP only makes sense if we assume that __preview__.spam and spam 
> *will* be different,

I disagree. Once there is a spam they should remain the same. 
__preview__ is for packages that haven't yet made it into the standard 
library - not a place for experimenting with apis that are already there.
> even if only in minor ways, and that there might not even be a spam. 
> There should be no expectation that every __preview__ module must 
> graduate, 

Graduate or die however.
> or that every standard library module must go through __preview__. If 
> it is stable and uncontroversial, __preview__ adds nothing to the 
> process.
Sure. __preview__ is for things that *need* previewing.

All the best,

Michael Foord

> Even when there are candidates for inclusion with relatively stable 
> APIs, like regex, we should *assume* that there will be API 
> differences between __preview__.regex and regex, simply because it is 
> less harmful to expect changes that don't eventuate than to expect 
> stability and be surprised by changes.
> This, I believe, rules out Antoine's suggestion that modules remain 
> importable from __preview__ even after graduation to a full member of 
> the standard library. We simply can't have all three of these 
> statements true at the same time:
> 1) regular standard library modules are expected to be backward 
> compatible
> 2) __preview__ modules are not expected to be forward compatible
> 3) __preview__.spam is an alias to regular standard library spam
> At least one of them has to go. Since both 1) and 2) are powerful 
> features, and 3) is only a convenience, the obvious one to drop is 3). 
> I note that the PEP, as it is currently written, explicitly states 
> that __preview__.spam will be dropped when it graduates to spam. This 
> is a good thing and should not be changed.
> Keeping __preview__.spam around after graduation is, I believe, 
> actively harmful. It adds complexity to the developer's 
> decision-making process ("Should I import spam from __preview__, or 
> just import spam? What's the difference?"). It gives a dangerous 
> impression that code written for __preview__.spam will still work for 
> spam.
> We should be discouraging simple-minded recipes like
> try:
>     import spam
> except ImportError:
>     from __preview__ import spam
>, b, c)
> since they undermine the vital feature of __preview__ that the 
> signature and even the existence of is subject to change.
> I would go further and suggest that __preview__ be explicitly called 
> __unstable__. If that name is scary, and it frightens some users off, 
> good! The last thing we want is when 3.4 comes around to have dozens 
> of bug reports along the line of " and 
> have different function signatures and aren't 
> compatible". Of course they do. That's why __preview__.spam existed in 
> the first place, to allow the API to mature without the expectation 
> that it was already stable.
> Since __preview__.spam (or, as I would prefer, __unstable__.spam) and 
> spam cannot be treated as drop-in replacements, what is 
> __preview__.spam good for? Without a stable API, __preview__.spam is 
> not suitable for use in production applications that expect to run 
> under multiple versions of the standard library.
> I think the PEP needs more use-cases on who might use 
> __preview__.spam, and why. These come to my mind:
> * if you don't care about Python 3.x+1, then there is no reason not to
>   treat Python 3.x's __preview__.spam as stable;
> * rapid development proof-of-concept software ("build one to throw away")
>   can safely use __preview__.spam, since they are expected to be replaced
>   anyway;
> * one-use scripts;
> * use at the interactive interpreter;
> * any other time where forward-compatibility is not required.
> I am reminded of the long, often acrimonious arguments that took place 
> on Python-Dev a few years back about the API for the ipaddr library. A 
> lot of the arguments could have been short-circuited if we had said 
> "putting ipaddr into __preview__ does not constitute acceptance of its 
> API".
> (On the other hand, if __preview__ becomes used in the future for 
> library authors to fob-off criticism for 18 months in the hope it will 
> just be forgotten, then this will be a bad thing.)


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From fuzzyman at  Sat Jan 28 03:05:23 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 02:05:23 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>	<>
Message-ID: <>

On 27/01/2012 20:48, Steven D'Aprano wrote:
> Eli Bendersky wrote:
>>> try:
>>>    from __preview__ import thing
>>> except ImportError:
>>>    import thing
>>> So no need to target a very specific version of Python.
>> Yep, this is what I had in mind. And it appeared too trivial to place
>> it in the PEP.
> Trivial and wrong.
> Since thing and __preview__.thing may have subtle, or major, API 
> differences, how do you use it?
No, potentially wrong in cases where the APIs are different. Even with 
the try...except ImportError dance around StringIO / cStringIO there are 
some API differences. But for a lot of use cases it works fine 
(simplejson and json aren't *identical*, but it works for most people).
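
For reference, the import dance mentioned above can be sketched as a small
helper. `import_first` is a hypothetical name invented for this example, not
something in the stdlib, and the sketch assumes the candidate modules are
close enough in API for the calling code not to care which one it gets.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: %r" % (names,))


# simplejson is used if it is installed; the stdlib json is the fallback.
json_mod = import_first("simplejson", "json")
print(json_mod.dumps({"spam": 1}))
```

The helper only decides *which* module is bound; any API differences between
the candidates still have to be handled by the caller.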


> try:
>     result =, b, c) +
> except AttributeError:
>     # Must be the preview version
>     result = thing.foobar(a, c, b, x)


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From steve at  Sat Jan 28 03:28:21 2012
From: steve at (Steven D'Aprano)
Date: Sat, 28 Jan 2012 13:28:21 +1100
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> Hello everyone,
> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
> have decided to pronounce on what we want for our stable releases.
> What we have decided is that
> 1. Simple hash randomization is the way to go. We think this has the
> best chance of actually fixing the problem while being fairly
> straightforward such that we're comfortable putting it in a stable
> release.
> 2. It will be off by default in stable releases and enabled by an
> envar at runtime. This will prevent code breakage from dictionary
> order changing as well as people depending on the hash stability.

Do you have the expectation that it will become on by default in some future
release?


From benjamin at  Sat Jan 28 03:33:57 2012
From: benjamin at (Benjamin Peterson)
Date: Fri, 27 Jan 2012 21:33:57 -0500
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/27 Steven D'Aprano <steve at>:
> Benjamin Peterson wrote:
>> Hello everyone,
>> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
>> have decided to pronounce on what we want for our stable releases.
>> What we have decided is that
>> 1. Simple hash randomization is the way to go. We think this has the
>> best chance of actually fixing the problem while being fairly
>> straightforward such that we're comfortable putting it in a stable
>> release.
>> 2. It will be off by default in stable releases and enabled by an
>> envar at runtime. This will prevent code breakage from dictionary
>> order changing as well as people depending on the hash stability.
> Do you have the expectation that it will become on by default in some future
> release?

Yes, 3.3. The solution in 3.3 could even be one of the more
sophisticated proposals we have today.
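
A minimal sketch of how the runtime switch can be observed, assuming a Python
version in which the fix has shipped: there the environment variable is
`PYTHONHASHSEED` and the interpreter reports its state in
`sys.flags.hash_randomization`. The helper name `hash_in_child` is made up
for this example.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter using `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % (text,)], env=env
    )
    return int(out.decode().strip())


# A fixed seed gives repeatable string hashes across separate processes.
print(hash_in_child("spam", 0) == hash_in_child("spam", 0))

# Whether the current interpreter is running with hash randomization enabled.
print(sys.flags.hash_randomization)
```

Forcing the seed this way is what lets code that depends on a stable hash()
or dictionary order keep working while randomization is otherwise enabled.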


From guido at  Sat Jan 28 03:40:32 2012
From: guido at (Guido van Rossum)
Date: Fri, 27 Jan 2012 18:40:32 -0800
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 5:19 PM, Benjamin Peterson <benjamin at> wrote:
> Hello everyone,
> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
> have decided to pronounce on what we want for our stable releases.
> What we have decided is that
> 1. Simple hash randomization is the way to go. We think this has the
> best chance of actually fixing the problem while being fairly
> straightforward such that we're comfortable putting it in a stable
> release.
> 2. It will be off by default in stable releases and enabled by an
> envar at runtime. This will prevent code breakage from dictionary
> order changing as well as people depending on the hash stability.

Okay, good call!

--Guido van Rossum (

From steve at  Sat Jan 28 03:51:41 2012
From: steve at (Steven D'Aprano)
Date: Sat, 28 Jan 2012 13:51:41 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Michael Foord wrote:
> On 27/01/2012 20:48, Steven D'Aprano wrote:
>> Eli Bendersky wrote:
>>>> try:
>>>>    from __preview__ import thing
>>>> except ImportError:
>>>>    import thing
>>>> So no need to target a very specific version of Python.
>>> Yep, this is what I had in mind. And it appeared too trivial to place
>>> it in the PEP.
>> Trivial and wrong.
>> Since thing and __preview__.thing may have subtle, or major, API 
>> differences, how do you use it?
> No, potentially wrong in cases where the APIs are different. Even with 
> the try...except ImportError dance around StringIO / cStringIO there are 
> some API differences. But for a lot of use cases it works fine 
> (simplejson and json aren't *identical*, but it works for most people).

Okay, granted, I accept your point.

But I think we need to distinguish between these cases.

In the case of StringIO and cStringIO, API compatibility is expected, and 
differences are either bugs or implementation differences that you shouldn't 
be relying on.

In the case of the typical[1] __preview__ module, one of the motivations of 
adding it to __preview__ is to test the existing API. We should expect 
changes, even if in practice often there won't be. We might hope for no API 
changes, but we should plan for the case where there will be.

And that rules out the "try import" dance for the typical __preview__ module. 
There may be modules which graduate and keep the same API. In those cases, 
people will quickly work out the import dance on their own, it's a very common 

But we shouldn't advertise it as the right way to deal with __preview__, since 
that implies the expectation of API stability, and we want to send the 
opposite message: __preview__ is the last time the API can change without a 
big song and dance, so be prepared for it to change.

I'm with Nick on this one: if you're not prepared to change "from __preview__ 
import module" to "import module" in your app, then you certainly won't be 
prepared to deal with the potential API changes and you aren't the target 
audience for __preview__.

[1] I am fully aware of the folly of referring to a "typical" example of 
something that doesn't exist yet <wink>


From eliben at  Sat Jan 28 04:11:14 2012
From: eliben at (Eli Bendersky)
Date: Sat, 28 Jan 2012 05:11:14 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

>> No, potentially wrong in cases where the APIs are different. Even with the
>> try...except ImportError dance around StringIO / cStringIO there are some
>> API differences. But for a lot of use cases it works fine (simplejson and
>> json aren't *identical*, but it works for most people).
> Okay, granted, I accept your point.
> But I think we need to distinguish between these cases.
> In the case of StringIO and cStringIO, API compatibility is expected, and
> differences are either bugs or implementation differences that you shouldn't
> be relying on.

I just recently ran into an incompatibility between StringIO and cStringIO.
It's a good thing it's documented:

   "Another difference from the StringIO module is that calling
StringIO() with a string parameter creates a read-only object. Unlike
an object created without a string parameter, it does not have write
methods. These objects are not generally visible. They turn up in
tracebacks as StringI and StringO."

But it did cause me a couple of minutes of grief until I found this
piece in the docs and wrote a work-around. But no, even in the current
stable stdlib, the "try import ... except import from elsewhere" trick
doesn't "just work" for StringIO/cStringIO. And as far as I can
understand this is documented, not a bug or some obscure
implementation detail.

My point is that if our users accept *this*, in the stable stdlib, I
see no reason they won't accept the same happening between __preview__
and a graduated module, when they (hopefully) understand the intention
of __preview__.


From turnbull at  Sat Jan 28 05:44:52 2012
From: turnbull at (Stephen J. Turnbull)
Date: Sat, 28 Jan 2012 13:44:52 +0900
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Foord writes:

 > >> Assuming the module is then promoted to the standard library proper in
 > >> release ``3.X+1``, it will be moved to a permanent location in the library::
 > >>
 > >>      import example
 > >>
 > >> And importing it from ``__preview__`` will no longer work.
 > > Why not leave it accessible through __preview__ too?
 > +1

Er, doesn't this contradict your point about using

try:
    from __preview__ import spam
except ImportError:
    import spam

?

I think it's a bad idea to introduce a feature that's *supposed* to
break (in the sense of "make a break", ie, change the normal pattern)
with every release and then try to avoid breaking (in the sense of
"causing an unexpected failure") code written by people who don't want
to follow the discipline of keeping up with changing APIs.  If they
want that stability, they should wait for the stable release.

Modules should become unavailable from __preview__ as soon as they
have a stable home.

From stephen at  Sat Jan 28 06:22:54 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 28 Jan 2012 14:22:54 +0900
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Executive summary:

If the promise to remove the module from __preview__ is credible (ie,
strictly kept), then __preview__ will have a specific audience in
those who want the stdlib candidate code and are willing to deal with
a certain amount of instability in that code.

(Whether that audience is big enough to be worth the effort of
managing __preview__ is another question.)

Barry Warsaw writes:

 > >> I agree with everything Alex said here.

I don't necessarily disagree.  But:

 > I can't argue with that, it's just that I don't think __preview__
 > solves [the visibility] problem.

I do disagree with that.  I frequently refer to the library reference
for modules that do what I need, but almost never to PyPI (my own
needs are usually not very hard to program, but if there's a stdlib
module it's almost surely far more general, robust, and tested than my
special-case code would be; PyPI provides far less of a robustness
guarantee than a stdlib candidate would).

I don't know how big or important a use case this is, though I think
that Antoine's point that a similar argument applies to those who
develop software for their own internal use (like me, but they have
actual standards for QA) is valid.

 > I think we'll just see folks using the unstable APIs and then
 > complaining when we remove them, even though they *know* *upfront*
 > that these APIs will go away.

So maybe the Hon. Mr. Broytman would be willing to supply a form
letter for those folks, too.  "We promised to remove the module from
__preview__, and we did.  We warned you the API would be likely
unstable, and it was.  You have no complaint." would be the gist.

 > A robust, standard approach to versioning of modules would though,
 > and I think would better solve what __preview__ is trying to solve.

I suspect that "robust, standard approach to versioning of modules" is
an oxymoron.  The semantics of "module version" from the point of view
of application developers and users is very complex, and cannot be
encapsulated in a linear sequence.  The only reliable comparison that
can be done on versions is equality (and Python knows that; that's why
there is a stdlib bound to the core in the first place!)

 > I'm not so sure about that.  If I were to actively try it, I'm not
 > sure how much motivation I'd have to rewrite key parts of my code
 > when an incompatible version gets promoted to the un__preview__d
 > stdlib.

So use the old version of Python.  You do that anyway.  Or avoid APIs
where you are unwilling to deal with more or less frequent changes.
You do that anyway.  And if you're motivated enough, use __preview__.

I don't understand what you think you lose here.

From scott+python-dev at  Sat Jan 28 06:09:13 2012
From: scott+python-dev at (Scott Dial)
Date: Sat, 28 Jan 2012 00:09:13 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/27/2012 8:48 PM, Barry Warsaw wrote:
> The thinking goes like this: if you would normally use an __preview__ module
> because you can't get approval to download some random package from PyPI, well
> then your distro probably could or should provide it, so get it from them.

That is my thought about the entire __preview__ concept. Anything that
would/should go into __preview__ would be better off being packaged for
a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would
get better visibility than just being on PyPI and would be more flexible
in terms of release schedule to allow API changes.

If the effort being put into making the __preview__ package was put into
packaging those modules for distros, then you would get the same
exposure with better flexibility and a better maintenance story.  The
whole idea of __preview__ seems to be a workaround for the difficult
packaging story for Python modules on common distros -- stuffing them
into __preview__ is a cheat to get the distro packagers to distribute
these interesting modules since we would be bundling them.

However, as you have pointed out, it would be very desirable to them to not
do so. So in the end, these modules may not receive as wide of
visibility as the PEP suggests. I could very easily imagine the more
stable distributions refusing or patching anything that used __preview__
in order to eliminate difficulties.

Scott Dial
scott at

From stephen at  Sat Jan 28 06:41:44 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 28 Jan 2012 14:41:44 +0900
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Eli Bendersky writes:

 > My point is that if our users accept *this*, in the stable stdlib, I
 > see no reason they won't accept the same happening between __preview__
 > and a graduated module, when they (hopefully) understand the intention
 > of __preview__.

If it doesn't happen with sufficiently high frequency and annoyance
factors to make attempting to use both the __preview__ and graduated
versions in the same code base unacceptable to most users, then
__preview__ is unnecessary, and the PEP should be rejected.

From eliben at  Sat Jan 28 07:05:53 2012
From: eliben at (Eli Bendersky)
Date: Sat, 28 Jan 2012 08:05:53 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Jan 28, 2012 at 07:41, Stephen J. Turnbull <stephen at> wrote:
> Eli Bendersky writes:
>  > My point is that if our users accept *this*, in the stable stdlib, I
>  > see no reason they won't accept the same happening between __preview__
>  > and a graduated module, when they (hopefully) understand the intention
>  > of __preview__.
> If it doesn't happen with sufficiently high frequency and annoyance
> factors to make attempting to use both the __preview__ and graduated
> versions in the same code base unacceptable to most users, then
> __preview__ is unnecessary, and the PEP should be rejected.

API differences such as changing one method to another (perhaps
repeated over several methods) are unacceptable for stdlib modules. On
the other hand, for a determined user importing from either
__preview__ or the graduated version it's only a matter of a few lines
in a conditional import. IMHO this is much preferable to having the
module either external or in the stdlib, because that imposes another
external dependency.

But I think that the issue of keeping __preview__ in a later release
is just an "implementation detail" of the PEP and shouldn't be seen as
its main decision point.


From ncoghlan at  Sat Jan 28 07:31:41 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 16:31:41 +1000
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
	<jfv94p$jt7$> <>
Message-ID: <>

On Sat, Jan 28, 2012 at 10:33 AM, Ethan Furman <ethan at> wrote:
> Because at this point it is possible to do:
>    raise ValueError from NameError
> outside a try block.  I don't see it as incredibly useful, but I don't know
> that it's worth making it illegal.
> So the question is:
>  - should 'raise ... from ...' be legal outside a try block?
>  - should 'raise ... from None' be legal outside a try block?

Given that it would be quite a bit of work to make it illegal, my
preference is to leave it alone.

I believe that means there's only one open question. Should "raise ex
from None" be syntactic sugar for:

1. clearing the current thread's exception state (as I believe Ethan's
patch currently does), thus meaning that __context__ and __cause__
both end up being None
2. setting __cause__ to None (so that __context__ still gets set
normally, as it is now when __cause__ is set to a specific exception),
and having __cause__ default to a *new* sentinel object that indicates
"use __context__"

I've already stated my own preference in favour of 2 - that approach
means developers that think about it can explicitly change exception
types such that the context isn't displayed by default, but
application and framework developers remain free to insert their own
exception handlers that *always* report the full exception stack.
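
To make option 2 concrete, here is a small sketch of what it looks like in
Python versions that support `raise ... from None`. In those versions the
"use __context__" signal is carried by a separate `__suppress_context__` flag
rather than a sentinel value in `__cause__`; that detail comes from the later
implementation, not from this thread. The `lookup` function is made up for
the example.

```python
def lookup(mapping, key):
    """Translate a KeyError into a ValueError without showing the KeyError."""
    try:
        return mapping[key]
    except KeyError:
        raise ValueError("unknown key: %r" % (key,)) from None


try:
    lookup({}, "spam")
except ValueError as exc:
    caught = exc

# No explicit cause was chained...
print(caught.__cause__ is None)
# ...but the original exception is still recorded as the context,
print(type(caught.__context__).__name__)
# and a flag tells the default traceback display not to show it.
print(caught.__suppress_context__)
```

Because `__context__` is still populated, an application or framework
handler can choose to report the full chain even though the default display
hides it.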


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 28 07:37:57 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 16:37:57 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 11:48 AM, Barry Warsaw <barry at> wrote:
> Would it be acceptable then for a distro to disable __preview__ or empty it
> out?
> The thinking goes like this: if you would normally use an __preview__ module
> because you can't get approval to download some random package from PyPI, well
> then your distro probably could or should provide it, so get it from them.  In
> fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents
> were a requirement, then a distro vendor could just ensure those PyPI versions
> are available as distro packages outside of the __preview__ stdlib namespace
> (i.e. in their normal third-party namespace).  Then folks developing on that
> platform could just use the distro package and ignore __preview__.
> If that's acceptable, then maybe it should be explicitly so in the PEP.

I think that's an excellent idea - in that case, the distro vendor is
taking over the due diligence responsibilities, which are the main
point of __preview__.

Similarly, sumo distributions like ActiveState or Python(x, y) could
choose to add the PyPI version.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 28 08:10:22 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 17:10:22 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 3:22 PM, Stephen J. Turnbull <stephen at> wrote:
> Executive summary:
> If the promise to remove the module from __preview__ is credible (ie,
> strictly kept), then __preview__ will have a specific audience in
> those who want the stdlib candidate code and are willing to deal with
> a certain amount of instability in that code.

People need to remember there's another half to this equation: the
core dev side.

The reason *regex* specifically isn't in the stdlib already is largely
due to (perhaps excessive) concerns about the potential maintenance
burden. It's not a small chunk of code and we don't want to deal with
another bsddb.

That's the main roadblock to inclusion. Not lack of user demand. Not
blindness to the problems with re. Just concerns about
maintainability. Add to that some niggling concerns about backwards
compatibility in obscure corner cases that may not be exercised by
current users. And so we have an impasse. Matthew has indicated he's
happy to include it and maintain it as part of the core, but it hasn't
really gone anywhere because we don't currently have a good way to
address those maintainability concerns (aside from saying "you're
worrying about it too much", which isn't what I would call

That's what __preview__ gives us: a way to deal with the *negative*
votes that keep positive additions out of the standard library. Most
of the PEP's arguments for due diligence etc are actually talking
about why we want things in the standard library in the first place,
rather than about __preview__ in particular.

The core idea behind the __preview__ namespace is to allow *3*
possible responses when a module is proposed for stdlib inclusion:

1. Yes, that's a good idea, we'll add it (cf. lzma for 3.3)
2. Maybe, so we'll add it to __preview__ for a release and see if it
blows up in our face (hopefully at least regex for 3.3, maybe ipaddr
and daemon as well)
3. No, not going to happen.

Currently, anything where we would answer "2" ends up becoming a "3"
by default, and that's not a good thing for the long-term health of
the language.

The reason this will be more effective in building core developer
confidence than third party distribution via PyPI is due to a few
different things:
- we all run the test suite, so we get to see that the software builds
and tests effectively
- we know what our own buildbots cover, so we know it's passing on all
those platforms
- we'll get to see more of the related discussions in channels we
monitor *anyway* (i.e. the bug tracker, python-dev)

As far as the criteria for failing to graduate goes, I'd say something
that ends up in __preview__ will almost always make it into the main
part of the standard library, with the following exceptions:
- excessive build process, test suite and buildbot instability.
Whether this is due to fragile test cases or fragile code, we don't
want to deal with another bsddb. If the test suite can't be stabilised
over the course of an entire feature release, then the module would
most likely be rejected rather than allowing it to graduate to the
standard library.
- strongly negative (or just plain confused) user feedback. We deal
with feedback on APIs all the time. Sometimes we add new ones, or
tweak the existing ones. Occasionally we'll judge them to be
irredeemably broken and just plain remove them (cf. CObject,
contextlib.nested, Bastion, rexec). This wouldn't change just because
a module was in __preview__ - instead, we'd just have another option
available to us (i.e. rejecting the module for stdlib inclusion
post-preview rather than trying to fix it).

Really, the main benefit for end users doesn't lie in __preview__
itself: it lies in the positive effect __preview__ will have on the
long term evolution of the standard library, as it aims to turn
python-dev's inherent conservatism (which is a good thing!) into a
speed bump rather than a road block.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 28 08:13:28 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 17:13:28 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 4:37 PM, Nick Coghlan <ncoghlan at> wrote:
> I think that's an excellent idea - in that case, the distro vendor is
> taking over the due diligence responsibilities, which are the main
> point of __preview__.

Heh, contradicted myself in my next email. python-dev handling due
diligence is a key benefit for *stdlib inclusion*, not __preview__ per se.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From anacrolix at  Sat Jan 28 08:42:42 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 28 Jan 2012 02:42:42 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 12:26 PM, Alex <alex.gaynor at> wrote:
> I think a significantly healthier process (in terms of maximizing feedback and
> getting something into its best shape) is to let a project evolve naturally on
> PyPI and in the ecosystem, give feedback to it from an inclusion perspective,
> and then include it when it becomes ready on its own merits. The counter
> argument to this is that putting it in the stdlib gets you significantly more
> eyeballs (and hopefully more feedback, therefore), my only response to this is:
> if it doesn't get eyeballs on PyPI I don't think there's a great enough need to
> justify it in the stdlib.

Strongly agree.

From anacrolix at  Sat Jan 28 08:49:40 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 28 Jan 2012 02:49:40 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

FWIW I'm now -1 for this idea. Stronger integration with PyPI and
packaging systems is much preferable. Python core public releases are
no place for testing.

On Sat, Jan 28, 2012 at 2:42 AM, Matt Joiner <anacrolix at> wrote:
> On Fri, Jan 27, 2012 at 12:26 PM, Alex <alex.gaynor at> wrote:
>> I think a significantly healthier process (in terms of maximizing feedback and
>> getting something into its best shape) is to let a project evolve naturally on
>> PyPI and in the ecosystem, give feedback to it from an inclusion perspective,
>> and then include it when it becomes ready on its own merits. The counter
>> argument to this is that putting it in the stdlib gets you significantly more
>> eyeballs (and hopefully more feedback, therefore), my only response to this is:
>> if it doesn't get eyeballs on PyPI I don't think there's a great enough need to
>> justify it in the stdlib.
> Strongly agree.

From raymond.hettinger at  Sat Jan 28 08:50:48 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Fri, 27 Jan 2012 23:50:48 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 26, 2012, at 7:19 PM, Ethan Furman wrote:

> One of the open issues from PEP 3134 is suppressing context:  currently there is no way to do it.  This PEP proposes one.

Thanks for proposing fixes to this issue. 
It is an annoying problem.



From hodgestar+pythondev at  Sat Jan 28 08:58:15 2012
From: hodgestar+pythondev at (Simon Cross)
Date: Sat, 28 Jan 2012 09:58:15 +0200
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 9:49 AM, Matt Joiner <anacrolix at> wrote:
> FWIW I'm now -1 for this idea. Stronger integration with PyPI and
> packaging systems is much preferable. Python core public releases are
> no place for testing.

+1. I'd much rather just use the module from PyPI.

It would be good to have a practical guide on how to manage the
transition from third-party to core library module though. A PEP with
a list of modules earmarked for upcoming inclusion in the standard
library (and which Python version they're intended to be included in)
might focus community effort on using, testing and fixing modules
before they make it into core and fixing becomes a lot harder.


From anacrolix at  Sat Jan 28 09:08:33 2012
From: anacrolix at (Matt Joiner)
Date: Sat, 28 Jan 2012 19:08:33 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

> +1. I'd much rather just use the module from PyPI.
> It would be good to have a practical guide on how to manage the
> transition from third-party to core library module though. A PEP with
> a list of modules earmarked for upcoming inclusion in the standard
> library (and which Python version they're intended to be included in)
> might focus community effort on using, testing and fixing modules
> before they make it into core and fixing becomes a lot harder.

+1 for your +1, and earmarking. That's the word I was looking for, and
instead chose "advocacy".

From stephen at  Sat Jan 28 09:38:20 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 28 Jan 2012 17:38:20 +0900
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan writes:

 > People need to remember there's another half to this equation: the
 > core dev side.

Why?  There's nothing about it in the PEP.<wink/>

 > The reason *regex* specifically isn't in the stdlib already is
 > largely due to (perhaps excessive) concerns about the potential
 > maintenance burden.

But then giving regex as an example seems to contradict the PEP: "The
only difference between preview APIs and the rest of the standard
library is that preview APIs are explicitly exempted from the usual
backward compatibility guarantees," "in principle, most modules in the
__preview__ package should eventually graduate to the stable standard
library," and "whenever the Python core development team decides that
a new module should be included into the standard library, but isn't
sure about whether the module's API is optimal".

True, there were a few bits spilled on the possibility of being
"without sufficient developer support to maintain it," but I read that
as a risk that is basically a consequence of instability of the API.
The rationale is entirely focused on API instability, and a focus on
API instability is certainly the reason for calling it "__preview__"
rather than "__experimental__".

I don't have an opinion on whether this is an argument for rejecting
the PEP or for rewriting it (specifically, seriously beefing up the
"after trying it, maybe we won't want to maintain it" rationale).  I
also think that if "we need to try it to decide if the maintenance
burden is acceptable" is a rationale, the name "__experimental__"
should be seriously reconsidered as more accurately reflecting the
intended content of the package.

From ncoghlan at  Sat Jan 28 09:55:13 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 18:55:13 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 5:49 PM, Matt Joiner <anacrolix at> wrote:
> FWIW I'm now -1 for this idea. Stronger integration with PyPI and
> packaging systems is much preferable. Python core public releases are
> no place for testing.

People saying this: we KNOW this approach doesn't work in all cases.
If it worked perfectly, regex would be in the standard library by now.

Don't consider this PEP a purely theoretical proposal, because it
isn't. It's really being put forward to solve a specific problem: the
fact that we need to do something about re's lack of proper Unicode
support [1]. Those issues are actually hard to solve, so replacing re
with Matthew Barnett's regex module (just as re itself was a
replacement for the original regex module) that already addresses most
of them seems like a good way forward, but this is currently being
blocked because there are still a few lingering concerns with
maintainability and backwards compatibility.

We *need* to break the impasse preventing its inclusion in the
standard library, and __preview__ lets us do that without running
roughshod over the legitimate core developer concerns raised in the
associated tracker issue [2].

With the current criteria for stdlib inclusion, it doesn't *matter* if
a module is oh-so-close to being accepted: it gets rejected anyway,
just like a module that has no chance of ever being suitable. There is
currently *no* path forward for resolving any stdlib-specific concerns
that arise with already popular PyPI modules, and so such situations
remain unresolved and key components of the standard library stagnate.

While regex is the current poster-child for this problem, it's quite
likely that similar problems will arise in the future. Kenneth Reitz's
requests module is an obvious candidate: it's enormously popular with
users, Kenneth has indicated he's amenable to the idea of stdlib
inclusion once the feature set is sufficiently stable (i.e. not for
3.3), but I expect there will be legitimate concerns with
incorporating it, given its scope.




Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Jan 28 10:18:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 28 Jan 2012 19:18:01 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 6:38 PM, Stephen J. Turnbull <stephen at> wrote:
> I don't have an opinion on whether this is an argument for rejecting
> the PEP or for rewriting it (specifically, seriously beefing up the
> "after trying it, maybe we won't want to maintain it" rationale).  I
> also think that if "we need to try it to decide if the maintenance
> burden is acceptable" is a rationale, the name "__experimental__"
> should be seriously reconsidered as more accurately reflecting the
> intended content of the package.

I think it's an argument for rewriting it (and, as you point out,
perhaps reverting to __experimental__ as the proposed name). Eli
started from a draft I wrote a while back and my own thinking on the
topic wasn't particularly clear (in fact, it's only this thread that
has really clarified things for me).

The main thing I've realised is that the end user benefits currently
discussed in the PEP are really about the importance of a robust
*standard library*. They aren't specific to the new namespace at all -
that part of the rationale is really only needed to counter the
predictable "who cares about the standard library, we can just use
PyPI!" responses (and the answer is, "lots of people that can't or
won't use PyPI modules for a wide range of reasons").

The only reason to add a new double-underscore namespace is to address
*core developer* concerns in cases where we're *almost* sure that we
want to add the module to the standard library, but aren't quite
prepared to commit to maintaining it for the life of the 3.x series
(cf. 2.x and the ongoing problems we had with keeping the bsddb module
working properly, especially before Jesus Cea stepped up to wrangle it
into submission).

It's basically us saying to Python users "We're explicitly flagging
this PyPI module for inclusion in the next major Python release. We've
integrated it into our build process, test suite and binary releases,
so you don't even have to download it from PyPI in order to try it
out, you can just import it from the __preview__ namespace (although
you're still free to download it from PyPI if you prefer - in fact, if
you need to support multiple Python versions, we actively recommend
it!). There's still a small chance this module won't make the grade
and will be dropped from the standard library entirely (that's why
it's only a preview), but most likely it will move into the main part
of the standard library with full backwards compatibility guarantees
in the next release".
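
To make the import story above concrete, here is a minimal sketch. It
assumes the proposed __preview__ namespace (which exists only as a proposal
in this thread) and a hypothetical helper name, import_preview. The helper
tries the preview namespace first and falls back to a top-level module of
the same name, such as a copy installed from PyPI:

```python
import importlib


def import_preview(name, preview_package="__preview__"):
    """Import `name` from the preview namespace, else from the top level.

    Both the helper and the `__preview__` package name are assumptions
    made for illustration; they are not an existing API.
    """
    try:
        # Preferred: the copy bundled with the interpreter, if there is one.
        return importlib.import_module(preview_package + "." + name)
    except ImportError:
        # Fallback: a graduated stdlib module or a third-party install.
        return importlib.import_module(name)


# On an interpreter without a __preview__ package this simply yields the
# ordinary top-level module.
json_module = import_preview("json")
```

Code that has to run on several Python versions could use the same pattern
so that it keeps working if the module later moves into the main part of
the standard library. One caveat of this sketch: an ImportError raised
inside the preview copy itself would also trigger the fallback.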


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mark at  Sat Jan 28 10:56:39 2012
From: mark at (Mark Shannon)
Date: Sat, 28 Jan 2012 09:56:39 +0000
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>	<>
Message-ID: <>

stefan brunthaler wrote:
> Hi,
> On Tue, Nov 8, 2011 at 10:36, Benjamin Peterson <benjamin at> wrote:
>> 2011/11/8 stefan brunthaler <s.brunthaler at>:
>>> How does that sound?
>> I think I can hear real patches and benchmarks most clearly.
> I spent the better part of my -20% time on implementing the work as
> "suggested". Please find the benchmarks attached to this email, I just

Could you try benchmarking with the "standard" benchmarks:
and see what sort of performance gains you get?

> did them on my system (i7-920, Linux 3.0.0-15, GCC 4.6.1). I branched
> off the regular 3.3a0 default tip changeset 73977 shortly after your
> email. I do not have an official patch yet, but am going to create one
> if wanted. Changes to the existing interpreter are minimal, the
> biggest chunk is a new interpreter dispatch loop.

How portable is the threaded interpreter?

Do you have a public repository for the code, so we can take a look?


From lukasz at  Sat Jan 28 11:37:53 2012
From: lukasz at (Łukasz Langa)
Date: Sat, 28 Jan 2012 11:37:53 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Message written by Simon Cross on 28 Jan 2012, at 08:58:

> +1. I'd much rather just use the module from PyPI.
> It would be good to have a practical guide on how to manage the
> transition from third-party to core library module though. A PEP with
> a list of modules earmarked for upcoming inclusion in the standard
> library (and which Python version they're intended to be included in)
> might focus community effort on using, testing and fixing modules
> before they make it into core and fixing becomes a lot harder.


Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

From fijall at  Sat Jan 28 13:21:18 2012
From: fijall at (Maciej Fijalkowski)
Date: Sat, 28 Jan 2012 14:21:18 +0200
Subject: [Python-Dev] Python 3 benchmarks
Message-ID: <>


Something that's maybe worth mentioning is that the "official" Python
benchmark suite has a pretty incomplete set of benchmarks for Python 3
compared to, say, what we run for pypy:
I think a very worthwhile project would be to port other benchmarks
(ones that actually use existing Python projects like sympy or django),
for those projects that have been ported to Python 3.

Any thoughts?


From p.f.moore at  Sat Jan 28 14:04:45 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 28 Jan 2012 13:04:45 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28 January 2012 09:18, Nick Coghlan <ncoghlan at> wrote:

> It's basically us saying to Python users "We're explicitly flagging
> this PyPI module for inclusion in the next major Python release. We've
> integrated it into our build process, test suite and binary releases,
> so you don't even have to download it from PyPI in order to try it
> out, you can just import it from the __preview__ namespace (although
> you're still free to download it from PyPI if you prefer - in fact, if
> you need to support multiple Python versions, we actively recommend
> it!). There's still a small chance this module won't make the grade
> and will be dropped from the standard library entirely (that's why
> it's only a preview), but most likely it will move into the main part
> of the standard library with full backwards compatibility guarantees
> in the next release".

+1.



From p.f.moore at  Sat Jan 28 14:13:09 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 28 Jan 2012 13:13:09 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28 January 2012 01:48, Barry Warsaw <barry at> wrote:
> The thinking goes like this: if you would normally use an __preview__ module
> because you can't get approval to download some random package from PyPI, well
> then your distro probably could or should provide it, so get it from them.  In
> fact, if the number of __preview__ modules is kept low, *and* PyPI equivalents
> were a requirement, then a distro vendor could just ensure those PyPI versions
> are available as distro packages outside of the __preview__ stdlib namespace
> (i.e. in their normal third-party namespace).  Then folks developing on that
> platform could just use the distro package and ignore __preview__.

Just so that you know that such cases exist, I am in a position where
I have access to systems with (distro-supplied) Python installed. I
can use anything supplied with Python (i.e., the stdlib - and
__preview__ would fall into this category as well). And yet I have
essentially no means of gaining access to any 3rd party modules,
whether they are packaged by the distro or obtained from PyPI.  (And
"build your own" isn't an option in many cases, if only because a C
compiler may well not be available!) This is essentially due to
corporate inertia and bogged down "do-nothing" policies rather than
due diligence or supportability concerns. But it is a reality for me
(and many others, I suspect).

Having said this, of course, the same corporate inertia means that
Python 3.3 is a pipe-dream for me in those environments for many years
yet. So ignoring them may be reasonable.

Just some facts to consider :-)


From eric at  Sat Jan 28 14:23:45 2012
From: eric at (Eric V. Smith)
Date: Sat, 28 Jan 2012 08:23:45 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/28/2012 2:10 AM, Nick Coghlan wrote:
> On Sat, Jan 28, 2012 at 3:22 PM, Stephen J. Turnbull <stephen at> wrote:
>> Executive summary:
>> If the promise to remove the module from __preview__ is credible (ie,
>> strictly kept), then __preview__ will have a specific audience in
>> those who want the stdlib candidate code and are willing to deal with
>> a certain amount of instability in that code.
> People need to remember there's another half to this equation: the
> core dev side.
> The reason *regex* specifically isn't in the stdlib already is largely
> due to (perhaps excessive) concerns about the potential maintenance
> burden. It's not a small chunk of code and we don't want to deal with
> another bsddb.


> Really, the main benefit for end users doesn't lie in __preview__
> itself: it lies in the positive effect __preview__ will have on the
> long term evolution of the standard library, as it aims to turn
> python-dev's inherent conservatism (which is a good thing!) into a
> speed bump rather than a road block.

I was -0 on this proposal, but after Nick's discussion above I'm now +1.

I also think it's worth thinking about how multiprocessing would have
benefited from the __preview__ process.

And for people saying "just use PyPI": that tends to exclude many
Windows users from trying out packages that aren't pure Python.

From anacrolix at  Sat Jan 28 14:55:09 2012
From: anacrolix at (Matt Joiner)
Date: Sun, 29 Jan 2012 00:55:09 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

> __preview__ would fall into this category as well). And yet I have
> essentially no means of gaining access to any 3rd party modules,
> whether they are packaged by the distro or obtained from PyPI.  (And
> "build your own" isn't an option in many cases, if only because a C
> compiler may well not be available!) This is essentially due to
> corporate inertia and bogged down "do-nothing" policies rather than
> due diligence or supportability concerns. But it is a reality for me
> (and many others, I suspect).
> Having said this, of course, the same corporate inertia means that
> Python 3.3 is a pipe-dream for me in those environments for many years
> yet. So ignoring them may be reasonable.

You clearly want access to external modules sooner. A preview
namespace addresses this indirectly. The separated stdlib versioning
concept is far superior for this use case.

From hs at  Sat Jan 28 15:07:00 2012
From: hs at (Hynek Schlawack)
Date: Sat, 28 Jan 2012 15:07:00 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>


On 27.01.2012 at 18:26, Alex wrote:

> I'm -1 on this, for a pretty simple reason. Something goes into __preview__,
> instead of its final destination directly because it needs feedback/possibly
> changes. However, given the release cycle of the stdlib (~18 months), any
> feedback it gets can't be seen by actual users until it's too late. Essentially
> you can only get one round of stdlib.
> I think a significantly healthier process (in terms of maximizing feedback and
> getting something into its best shape) is to let a project evolve naturally on
> PyPI and in the ecosystem, give feedback to it from an inclusion perspective,
> and then include it when it becomes ready on its own merits. The counter
> argument to this is that putting it in the stdlib gets you significantly more
> eyeballs (and hopefully more feedback, therefore), my only response to this is:
> if it doesn't get eyeballs on PyPI I don't think there's a great enough need to
> justify it in the stdlib.

I agree with Alex on this: the iterations, even with PEP 407, would be wayyy too long to be useful.

As for the only downside: how about endorsing certain PyPI projects as possible future additions in order to give them more exposure? I'm sure there is some nice way to do that.

Plus: everybody could pin the version their code depends on right now, so updates wouldn't break anything. I.e. API users would have more peace of mind and API developers could develop more aggressively.
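
A rough sketch of the pinning idea, under stated assumptions: the pin table
(PINS), its version string and the helper name (check_pins) are hypothetical,
and importlib.metadata is the standard-library API available in Python 3.8
and later, so this is how one might check pins today rather than anything
proposed in this thread:

```python
from importlib import metadata

# Hypothetical pin table: distribution name -> exact version the code expects.
PINS = {"regex": "1.0"}


def check_pins(pins):
    """Return {name: (pinned, installed)} for every pin that is not met.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Report any distribution whose installed version differs from its pin.
    for name, (pinned, installed) in check_pins(PINS).items():
        print(name, "pinned to", pinned, "but found", installed)
```

An application could run such a check at startup and refuse to continue on a
mismatch, which is the "peace of mind" half of the argument; in practice the
pin itself would normally live in the packaging tool's requirements file.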


From fuzzyman at  Sat Jan 28 15:59:22 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 14:59:22 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28/01/2012 13:04, Paul Moore wrote:
> On 28 January 2012 09:18, Nick Coghlan<ncoghlan at>  wrote:
>> It's basically us saying to Python users "We're explicitly flagging
>> this PyPI module for inclusion in the next major Python release. We've
>> integrated it into our build process, test suite and binary releases,
>> so you don't even have to download it from PyPI in order to try it
>> out, you can just import it from the __preview__ namespace (although
>> you're still free to download it from PyPI if you prefer - in fact, if
>> you need to support multiple Python versions, we actively recommend
>> it!). There's still a small chance this module won't make the grade
>> and will be dropped from the standard library entirely (that's why
>> it's only a preview), but most likely it will move into the main part
>> of the standard library with full backwards compatibility guarantees
>> in the next release".
> +1.

Yep, nice way of putting it - and summing up the virtues of the 
approach. (Although I might say "most likely it will move into the main 
part of the standard library with full backwards compatibility 
guarantees in a future release".)


> Paul.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From fuzzyman at  Sat Jan 28 16:12:45 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 15:12:45 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 27/01/2012 22:54, Barry Warsaw wrote:
> On Jan 27, 2012, at 10:48 PM, Antoine Pitrou wrote:
>> On Fri, 27 Jan 2012 16:10:51 -0500
>> Barry Warsaw<barry at>  wrote:
>>> I'm -1 on this as well.  It just feels like the completely wrong way to
>>> stabilize an API, and I think despite the caveats that are explicit in
>>> __preview__, Python will just catch tons of grief from users and haters about
>>> API instability anyway, because from a practical standpoint, applications
>>> written using __preview__ APIs *will* be less stable.
>> Well, obviously __preview__ is not for the most conservative users. I
>> think the name clearly conveys the idea that you are trying out
>> something which is not in its definitive state, doesn't it?
> Maybe.  I could quibble about the name, but let's not bikeshed on that
> right now.  The problem as I see it is that __preview__ will be very tempting
> to use in production.  In fact, its use case is almost predicated on that.
> (We want you to use it so you can tell us if the API is good.)
> Once people use it, they will probably ship code that relies on it, and then
> the pressure will be applied to us to continue to support that API even if a
> newer, better one gets promoted out of __preview__.  I worry that over time,
> for all practical purposes, there won't be much difference between __preview__
> and the stdlib.
>>>> I think a significantly healthier process (in terms of maximizing feedback
>>>> and getting something into its best shape) is to let a project evolve
>>>> naturally on PyPI and in the ecosystem, give feedback to it from an inclusion
>>>> perspective, and then include it when it becomes ready on its own
>>>> merits. The counter argument to this is that putting it in the stdlib gets
>>>> you significantly more eyeballs (and hopefully more feedback, therefore), my
>>>> only response to this is: if it doesn't get eyeballs on PyPI I don't think
>>>> there's a great enough need to justify it in the stdlib.
>>> I agree with everything Alex said here.
>> The idea that being on PyPI is sufficient is nice but flawed (the
>> IPaddr example). PyPI doesn't guarantee any visibility (how many
>> packages are there?). Furthermore, having users is not a guarantee that
>> the API is appropriate, either; it just means that the API is
>> appropriate for *some* users.
> I can't argue with that, it's just that I don't think __preview__ solves that
> problem.  And it seems to me that __preview__ introduces a whole 'nother set
> of problems on top of that.
> So taking the IPaddr example further.  Would having it in the stdlib,
> relegated to an explicitly unstable API part of the stdlib, increase eyeballs
> enough to generate the kind of API feedback we're looking for, without
> imposing an additional maintenance burden on us?

I think the answer is yes. That's kind of the crux of the matter I guess.
>    If you were writing an app
> that used something in __preview__, how would you provide feedback on what
> parts of the API you'd want to change,
The bugtracker.

> *and* how would you adapt your
> application to use those better APIs once they became available 18 months from
> now?

How do users do it for the standard library? Using the third party 
version is one way.

> I think we'll just see folks using the unstable APIs and then
> complaining when we remove them, even though they *know* *upfront* that these
> APIs will go away.
> I'm also nervous about it from an OS vendor point of view.  Should I reject
> any applications that import from __preview__?  Or do I have to make a
> commitment to support those APIs longer than Python does because the
> application that uses it is important to me?
> I think the OS vendor problem is easier with an application that uses some
> PyPI package, because I can always make that package available to the
> application by pulling in the version I care about.  It's harder if a newer,
> incompatible version is released upstream and I want to provide both, but I
> don't think __preview__ addresses that.  A robust, standard approach to
> versioning of modules would though, and I think would better solve what
> __preview__ is trying to solve.

Don't OS vendors go further and say "pin your dependency to the version 
we ship", whether it's in the Python standard library or not?

So "just use a more recent version from pypi" is explicitly not an 
option for people using system packages. As OS packagers tend to target 
a specific version of python, using  __preview__ for that version would 
be fine - and when they upgrade to the next version applications may 
need fixing in the same way as they would if the system packaged a new 
release of the third party library. (When moving between Ubuntu 
distributions I've found that my software using system packages often 
needs to change because the version of some library has now changed.)

Plus having a package in __preview__ has no bearing on whether or not 
the system packages the third party version, so I think it's a bit of a 


>> On the other hand, __preview__ would clearly signal that something is
>> on the verge of being frozen as an official stdlib API, and would
>> prompt people to actively try it.
> I'm not so sure about that.  If I were to actively try it, I'm not sure how
> much motivation I'd have to rewrite key parts of my code when an incompatible
> version gets promoted to the un__preview__d stdlib.
> -Barry
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:


May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing

From fuzzyman at  Sat Jan 28 17:05:11 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 16:05:11 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28/01/2012 05:09, Scott Dial wrote:
> On 1/27/2012 8:48 PM, Barry Warsaw wrote:
>> The thinking goes like this: if you would normally use an __preview__ module
>> because you can't get approval to download some random package from PyPI, well
>> then your distro probably could or should provide it, so get it from them.
> That is my thought about the entire __preview__ concept. Anything that
> would/should go into __preview__ would be better off being packaged for
> a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would
> get better visibility than just being on PyPI and would be more flexible
> in terms of release schedule to allow API changes.
> If the effort being put into making the __preview__ package was put into
> packaging those modules for distros,

That effort wouldn't be put in though. Largely those involved in working 
on Python are not the ones packaging for Linux distributions. So it 
isn't an alternative to __preview__ - it could happily be done alongside 
it though. Those who work on Python won't just switch to Linux if this 
proposal isn't accepted, they'll do different work on Python instead.

>   then you would get the same
> exposure
Packaging libraries for Linux gets you no exposure on Windows or the 
Mac, so __preview__ is wider.

>   with better flexibility and a better maintenance story.  The
> whole idea of __preview__ seems to be a workaround for the difficult
> packaging story for Python modules on common distros
I don't know where you got that impression. :-)

One of the reasons for __preview__ is that it means integrating 
libraries with the Python build and test systems, for all platforms. 
Packaging for [some-variants-of] Linux only doesn't do anything for this.

All the best,


> -- stuffing them
> into __preview__ is a cheat to get the distro packagers to distribute
> these interesting modules since we would be bundling them.
> However, as you have pointed out, it would very desirable to them to not
> do so. So in the end, these modules may not receive as wide of
> visibility as the PEP suggests. I could very easily imagine the more
> stable distributions refusing or patching anything that used __preview__
> in order to eliminate difficulties.



From fuzzyman at  Sat Jan 28 17:09:08 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 16:09:08 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28/01/2012 13:55, Matt Joiner wrote:
>> __preview__ would fall into this category as well). And yet I have
>> essentially no means of gaining access to any 3rd party modules,
>> whether they are packaged by the distro or obtained from PyPI.  (And
>> "build your own" isn't an option in many cases, if only because a C
>> compiler may well not be available!) This is essentially due to
>> corporate inertia and bogged down "do-nothing" policies rather than
>> due diligence or supportability concerns. But it is a reality for me
>> (and many others, I suspect).
>> Having said this, of course, the same corporate inertia means that
>> Python 3.3 is a pipe-dream for me in those environments for many years
>> yet. So ignoring them may be reasonable.
> You clearly want access to external modules sooner. A preview
> namespace addresses this indirectly. The separated stdlib versioning
> concept is far superior for this use case.
There are two proposals for the standard library - one is to do 
development in a separate repository to make it easier for other 
implementations to contribute. To my understanding this proposal is 
mildly controversial, but doesn't involve changing the way the standard 
library is distributed or versioned.

A separate proposal about standard library versioning has been floated 
but is *much* more controversial and therefore much less likely to 
happen. So I wouldn't hold your breath on it...

All the best,

Michael Foord


From fuzzyman at  Sat Jan 28 17:12:47 2012
From: fuzzyman at (Michael Foord)
Date: Sat, 28 Jan 2012 16:12:47 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 28/01/2012 04:44, Stephen J. Turnbull wrote:
> Michael Foord writes:
>   >  >>  Assuming the module is then promoted to the standard library proper in
>   >  >>  release ``3.X+1``, it will be moved to a permanent location in the library::
>   >  >>
>   >  >>       import example
>   >  >>
>   >  >>  And importing it from ``__preview__`` will no longer work.
>   >  >  Why not leave it accessible through __preview__ too?
>   >
>   >  +1
> Er, doesn't this contradict your point about using
> try:
>      from __preview__ import spam
> except ImportError:
>      import spam
> ?
> I think it's a bad idea to introduce a feature that's *supposed* to
> break (in the sense of "make a break", ie, change the normal pattern)
> with every release and then try to avoid breaking (in the sense of
> "causing an unexpected failure") code written by people who don't want
> to follow the discipline of keeping up with changing APIs.  If they
> want that stability, they should wait for the stable release.
> Modules should become unavailable from __preview__ as soon as they
> have a stable home.
I like not breaking people's code where *possible*.
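
The try/except idiom quoted above generalises to a small helper. The
following is only a sketch: __preview__ is the package proposed by PEP 408
and does not exist in any released Python, and import_first_available is a
hypothetical name invented for this illustration.

```python
import importlib


def import_first_available(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper: tries each dotted name in order and falls back
    to the next one on ImportError.
    """
    last_error = None
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # includes ModuleNotFoundError
            last_error = exc
    raise ImportError("none of %r could be imported" % (names,)) from last_error


# "__preview__.json" stands in for a previewed module; it is not importable
# here, so the helper falls back to the stable location.
json_mod = import_first_available("__preview__.json", "json")
print(json_mod.__name__)
```

If promoted modules stayed importable from __preview__, code like this would
keep working unchanged; if they were removed, only the order of the names
would decide which copy is picked up.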




From storchaka at  Sat Jan 28 11:18:34 2012
From: storchaka at (Serhiy Storchaka)
Date: Sat, 28 Jan 2012 12:18:34 +0200
Subject: [Python-Dev] Hashing proposal: 64-bit hash
In-Reply-To: <>
References: <jfuvme$i5b$> <>
Message-ID: <jg0i21$9oe$>

27.01.12 23:08, Frank Sievertsen wrote:
>> As already mentioned, the vulnerability of 64-bit Python is rather
>> theoretical and not practical. The size of the hash makes the attack
>> extremely unlikely.
> Unfortunately this assumption is not correct. It works very well with
> 64bit-hashing.
> It's much harder to create (efficiently) 64-bit hash-collisions.
> But I managed to do so and created strings with
> a length of 16 (6-bit)-characters (a-z, A-Z, 0-9, _, .). Even
> 14 characters would have been enough.
> You need less than twice as many characters for the same effect as in
> the 32bit-world.

The point is not the length of the string, but the size of the string space
that has to be searched. To find a string with a specified 64-bit hash, one
has to iterate over 2 ** 64 strings. Spending 1 nanosecond on each string
(a very optimistic estimate), that would take 2 ** 64 / 1e9 /
(3600 * 24 * 365.25) = 585 years. For the attack we need to find 1000
such strings, which takes more than half a million years. For a 32-bit hash
this would need only about an hour.

Of course, this assumes that the hash function is secure, i.e. that it does
not allow one to "cut corners" and reduce the computation time.
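
The estimate can be checked with a few lines of Python. This is only a
sketch of the arithmetic above: the rate of one string per nanosecond is the
assumption made in the text, the function name is made up for the
illustration, and it covers exhaustive search only, not any smarter way of
finding collisions.

```python
SECONDS_PER_YEAR = 3600 * 24 * 365.25
HASHES_PER_SECOND = 1e9  # one candidate string per nanosecond (assumed rate)


def brute_force_seconds(hash_bits, targets=1):
    """Seconds needed to try 2**hash_bits strings for each of `targets` hashes."""
    return targets * 2 ** hash_bits / HASHES_PER_SECOND


years_one_64 = brute_force_seconds(64) / SECONDS_PER_YEAR
years_1000_64 = brute_force_seconds(64, targets=1000) / SECONDS_PER_YEAR
hours_1000_32 = brute_force_seconds(32, targets=1000) / 3600

print("one 64-bit target:   %.0f years" % years_one_64)
print("1000 64-bit targets: %.0f years" % years_1000_64)
print("1000 32-bit targets: %.1f hours" % hours_1000_32)
```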

From solipsis at  Sat Jan 28 15:30:30 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 15:30:30 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Sat, 28 Jan 2012 02:49:40 -0500
Matt Joiner <anacrolix at> wrote:
> FWIW I'm now -1 for this idea. Stronger integration with PyPI and
> packaging systems is much preferable.

That will probably never happen. "pip install XXX" is the best we
(python-dev and the community) can do. "import some_module" won't
magically start fetching some_module from PyPI if it isn't installed on
your system.

So the bottom line is: we would benefit from an intermediate status
between "available on PyPI" and "shipped as a stable API in the
stdlib". The __preview__ proposal does just that in a useful way; are
there any alternatives you'd like to suggest?



From solipsis at  Sat Jan 28 15:17:17 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 15:17:17 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Sat, 28 Jan 2012 00:09:13 -0500
Scott Dial <scott+python-dev at> wrote:
> On 1/27/2012 8:48 PM, Barry Warsaw wrote:
> > The thinking goes like this: if you would normally use an __preview__ module
> > because you can't get approval to download some random package from PyPI, well
> > then your distro probably could or should provide it, so get it from them.
> That is my thought about the entire __preview__ concept. Anything that
> would/should go into __preview__ would be better off being packaged for
> a couple of key distros (e.g., Ubuntu/Fedora/Gentoo) where they would
> get better visibility than just being on PyPI and would be more flexible
> in terms of release schedule to allow API changes.

This is a red herring. First, not everyone uses a distro. There are
almost a million monthly downloads of the Windows installers. Second,
what a distro puts in their packages has nothing to do with considering
a module for inclusion in the Python stdlib.

Besides, I don't understand how being packaged by a distro makes a
difference. My distro has thousands of packages, many of them quite

OTOH, being shipped in the stdlib *and* visibly documented on (in the stdlib docs, in the what's new, etc.) will make a



From guido at  Sat Jan 28 18:15:15 2012
From: guido at (Guido van Rossum)
Date: Sat, 28 Jan 2012 09:15:15 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 5:04 AM, Paul Moore <p.f.moore at> wrote:
> On 28 January 2012 09:18, Nick Coghlan <ncoghlan at> wrote:
>> It's basically us saying to Python users "We're explicitly flagging
>> this PyPI module for inclusion in the next major Python release. We've
>> integrated it into our build process, test suite and binary releases,
>> so you don't even have to download it from PyPI in order to try it
>> out, you can just import it from the __preview__ namespace (although
>> you're still free to download it from PyPI if you prefer - in fact, if
>> you need to support multiple Python versions, we actively recommend
>> it!). There's still a small chance this module won't make the grade
>> and will be dropped from the standard library entirely (that's why
>> it's only a preview), but most likely it will move into the main part
>> of the standard library with full backwards compatibility guarantees
>> in the next release".
> +1.

Hm. You could do this just as well without a __preview__ package. You
just flag the module as experimental in the docs and get on with your

We have some experience with this in Google App Engine. We used to use
a separate "labs" package in our namespace and when packages were
deemed stable enough they were moved from labs to non-labs. But the
move always turned out to be a major pain, causing more breakage than
we would have had if we had simply kept the package location the same
but let the API mutate. Now we just put new, experimental packages in
the right place from the start, and put a loud "experimental" banner
on all pages of their docs, which is removed once the API is stable.

There is much less pain now: while incompatible changes do happen for
experimental packages, they are not frequent, and rarely
earth-shattering, and usually the final step is simply removing the
banner without making any (incompatible) changes to the code. This
means that the final step is painless for early adopters, thereby
rewarding them for their patience instead of giving them one final
kick while they sort out the import changes.

So I do not support the __preview__ package. I think we're better off
flagging experimental modules in the docs than in their name. For the
specific case of the regex module, the best way to adoption may just
be to include it in the stdlib as regex and keep it there. Any other
solution will just cause too much anxiety.

--Guido van Rossum (

From g.brandl at  Sat Jan 28 18:54:38 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 28 Jan 2012 18:54:38 +0100
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <jg1cop$ph0$>

On 28.01.2012 02:19, Benjamin Peterson wrote:
> Hello everyone,
> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
> have decided to pronounce on what we want for our stable releases.
> What we have decided is that
> 1. Simple hash randomization is the way to go. We think this has the
> best chance of actually fixing the problem while being fairly
> straightforward such that we're comfortable putting it in a stable
> release.
> 2. It will be off by default in stable releases and enabled by an
> envar at runtime. This will prevent code breakage from dictionary
> order changing as well as people depending on the hash stability.

FWIW, the same will be done for 3.2.


From barry at  Sat Jan 28 19:14:36 2012
From: barry at (Barry Warsaw)
Date: Sat, 28 Jan 2012 13:14:36 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote:

>So I do not support the __preview__ package. I think we're better off
>flagging experimental modules in the docs than in their name. For the
>specific case of the regex module, the best way to adoption may just
>be to include it in the stdlib as regex and keep it there. Any other
>solution will just cause too much anxiety.


What does the PEP give you above this "simple as possible" solution?


From solipsis at  Sat Jan 28 19:29:49 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 19:29:49 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Sat, 28 Jan 2012 13:14:36 -0500
Barry Warsaw <barry at> wrote:
> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote:
> >So I do not support the __preview__ package. I think we're better off
> >flagging experimental modules in the docs than in their name. For the
> >specific case of the regex module, the best way to adoption may just
> >be to include it in the stdlib as regex and keep it there. Any other
> >solution will just cause too much anxiety.
> +1
> What does the PEP give you above this "simple as possible" solution?

"I think we'll just see folks using the unstable APIs and then
complaining when we remove them, even though they *know* *upfront* that
these APIs will go away."

That problem would be much worse if some modules were simply marked
"experimental" in the doc, rather than put in a separate namespace.
You will see people copying recipes found on the internet without
knowing that they rely on unstable APIs.



From solipsis at  Sat Jan 28 19:39:08 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 19:39:08 +0100
Subject: [Python-Dev] Python 3 benchmarks
References: <>
Message-ID: <>

On Sat, 28 Jan 2012 14:21:18 +0200
Maciej Fijalkowski <fijall at> wrote:
> Hi
> Something that's maybe worth mentioning is that the "official" python
> benchmark suite has a pretty
> incomplete set of benchmarks for python 3 compared to say what we run
> for pypy: I think a very
> worthwhile project would be to try to port other benchmarks (that
> actually use existing python projects like sympy or django) for those
> that has been ported to python 3.




From mwm at  Sat Jan 28 19:46:18 2012
From: mwm at (Mike Meyer)
Date: Sat, 28 Jan 2012 10:46:18 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at> wrote:

>On Sat, 28 Jan 2012 13:14:36 -0500
>Barry Warsaw <barry at> wrote:
>> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote:
>> >So I do not support the __preview__ package. I think we're better off
>> >flagging experimental modules in the docs than in their name. For the
>> >specific case of the regex module, the best way to adoption may just
>> >be to include it in the stdlib as regex and keep it there. Any other
>> >solution will just cause too much anxiety.
>> +1
>> What does the PEP give you above this "simple as possible" solution?
>"I think we'll just see folks using the unstable APIs and then
>complaining when we remove them, even though they *know* *upfront* that
>these APIs will go away."
>That problem would be much worse if some modules were simply marked
>"experimental" in the doc, rather than put in a separate namespace.
>You will see people copying recipes found on the internet without
>knowing that they rely on unstable APIs.

How about doing them the way we do deprecated modules, and have them spit warnings to stderr?  Maybe add a flag and environment variable to disable that.

Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From solipsis at  Sat Jan 28 19:49:01 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 19:49:01 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <1327776541.8904.5.camel@localhost.localdomain>

On Saturday 28 January 2012 at 10:46 -0800, Mike Meyer wrote:
> Antoine Pitrou <solipsis at> wrote:
> >On Sat, 28 Jan 2012 13:14:36 -0500
> >Barry Warsaw <barry at> wrote:
> >> On Jan 28, 2012, at 09:15 AM, Guido van Rossum wrote:
> >> 
> >> >So I do not support the __preview__ package. I think we're better off
> >> >flagging experimental modules in the docs than in their name. For the
> >> >specific case of the regex module, the best way to adoption may just
> >> >be to include it in the stdlib as regex and keep it there. Any other
> >> >solution will just cause too much anxiety.
> >> 
> >> +1
> >> 
> >> What does the PEP give you above this "simple as possible" solution?
> >
> >"I think we'll just see folks using the unstable APIs and then
> >complaining when we remove them, even though they *know* *upfront* that
> >these APIs will go away."
> >
> >That problem would be much worse if some modules were simply marked
> >"experimental" in the doc, rather than put in a separate namespace.
> >You will see people copying recipes found on the internet without
> >knowing that they rely on unstable APIs.
> How about doing them the way we do deprecated modules, and have them
> spit warnings to stderr?  Maybe add a flag and environment variable to
> disable that.

You're proposing that new experimental modules spit warnings when you
use them? I don't think that's a good way of promoting their use :)
(something we do want to do even though we also want to convey the idea
that they're not yet "stable" or "fully approved")



From ethan at  Sat Jan 28 19:48:31 2012
From: ethan at (Ethan Furman)
Date: Sat, 28 Jan 2012 10:48:31 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Michael Foord wrote:
> On 28/01/2012 04:44, Stephen J. Turnbull wrote:
>> I think it's a bad idea to introduce a feature that's *supposed* to
>> break (in the sense of "make a break", ie, change the normal pattern)
>> with every release and then try to avoid breaking (in the sense of
>> "causing an unexpected failure") code written by people who don't want
>> to follow the discipline of keeping up with changing APIs.  If they
>> want that stability, they should wait for the stable release.
>> Modules should become unavailable from __preview__ as soon as they
>> have a stable home.
> I like not breaking people's code where *possible*.

__preview__ is not about stability.  It's about making code easily 
available for testing before the API freezes.

If nothing has changed once it graduates, how hard is it to change a few 
lines of code from

     from __preview__ import blahblahblah

to

     import blahblahblah

?

It seems to me that including a __preview__ package in production 
software is a mistake, and not its intention.


From ethan at  Sat Jan 28 19:56:57 2012
From: ethan at (Ethan Furman)
Date: Sat, 28 Jan 2012 10:56:57 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<jfv94p$jt7$>	<>
Message-ID: <>

Nick Coghlan wrote:
> On Sat, Jan 28, 2012 at 10:33 AM, Ethan Furman <ethan at> wrote:
>> So the question is:
>>  - should 'raise ... from ...' be legal outside a try block?
>>  - should 'raise ... from None' be legal outside a try block?
> Given that it would be quite a bit of work to make it illegal, my
> preference is to leave it alone.
> I believe that means there's only one open question. Should "raise ex
> from None" be syntactic sugar for:
> 1. clearing the current thread's exception state (as I believe Ethan's
> patch currently does), thus meaning that __context__ and __cause__
> both end up being None
> 2. setting __cause__ to None (so that __context__ still gets set
> normally, as it is now when __cause__ is set to a specific exception),
> and having __cause__ default to a *new* sentinel object that indicates
> "use __context__"
> I've already stated my own preference in favour of 2 - that approach
> means developers that think about it can explicitly change exception
> types such that the context isn't displayed by default, but
> application and framework developers remain free to insert their own
> exception handlers that *always* report the full exception stack.

The reasoning behind choice two makes a lot of sense.  My latest effort 
(I should be able to get the patch posted within two days) involves 
creating a new dummy exception, SuppressContext, and 'raise ... from 
None' sets cause to it; the printing logic checks to see if cause is 
SuppressContext, and if so, prints neither context nor cause.

Not exactly how Nick describes it, but as far as I've gotten in my 
Python core hacking skills.  ;)
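
For comparison, here is how 'raise ... from None' behaves in Python 3.3 and
later, where the feature shipped. This is not the SuppressContext patch
described above: released versions record the request in a separate
__suppress_context__ flag, so __context__ stays available to handlers that
want the full chain. The lookup() function is a made-up example.

```python
def lookup(mapping, key):
    """Translate KeyError into LookupError, hiding the original context."""
    try:
        return mapping[key]
    except KeyError:
        raise LookupError("no such entry: %r" % (key,)) from None


try:
    lookup({}, "spam")
except LookupError as exc:
    caught = exc  # keep a reference; `exc` is unbound after the block

print(caught.__cause__)                   # the explicit cause set by "from"
print(caught.__suppress_context__)        # flag consulted by the traceback printer
print(type(caught.__context__).__name__)  # the original exception is still recorded
```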


From bauertomer at  Sat Jan 28 20:59:10 2012
From: bauertomer at (T.B.)
Date: Sat, 28 Jan 2012 21:59:10 +0200
Subject: [Python-Dev] threading.Semaphore()'s counter can become negative
	for non-ints
Message-ID: <>

Hello python-dev,

This is probably worthy of a bug report: While looking at I 
noticed that Semaphore's counter can go below zero. This is opposed to 
the docs: "The counter can never go below zero; ...". Just try:

import threading
s = threading.Semaphore(0.5)
# You can now acquire s as many times as you want!
# even when s._value < 0.

The fix is tiny:
diff -r 265d35e8fe82 Lib/
--- a/Lib/  Fri Jan 27 21:17:04 2012 +0000
+++ b/Lib/  Sat Jan 28 21:22:04 2012 +0200
@@ -322,7 +321,7 @@
          rc = False
          endtime = None
-        while self._value == 0:
+        while self._value <= 0:
              if not blocking:
              if __debug__:

Which is better than forcing s._value to be an int.
I also think that the docs should be updated to reflect that the counter 
is not compared to be equal to zero, but non-positive. e.g. "when 
acquire() finds that it is zero...", "If it is zero on entry, block...".
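
The difference between the two loop conditions can be seen without any
threads. The class below is a toy stand-in written for this illustration,
not the code of threading.Semaphore: it keeps only the counter and a
non-blocking acquire, and the `strict` switch selects between the "== 0" and
"<= 0" tests.

```python
class ToySemaphore:
    """Non-blocking counter that mimics only the test under discussion."""

    def __init__(self, value, strict):
        self.value = value
        self.strict = strict  # True: any non-positive value counts as exhausted

    def try_acquire(self):
        exhausted = self.value <= 0 if self.strict else self.value == 0
        if exhausted:
            return False
        self.value -= 1
        return True


def count_acquires(sem, attempts=5):
    """Number of successful acquires out of `attempts` tries."""
    return sum(sem.try_acquire() for _ in range(attempts))


loose = ToySemaphore(0.5, strict=False)
strict = ToySemaphore(0.5, strict=True)
print(count_acquires(loose), loose.value)
print(count_acquires(strict), strict.value)
```

With the "== 0" test every attempt succeeds, because the counter steps from
0.5 to -0.5 and never equals zero. With "<= 0" only the first attempt
succeeds; the counter still ends below zero, but no further acquire gets
through.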

On another commit: Regarding, an unused 
import was left:
-from collections import deque


From benjamin at  Sat Jan 28 21:07:16 2012
From: benjamin at (Benjamin Peterson)
Date: Sat, 28 Jan 2012 15:07:16 -0500
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/28 T.B. <bauertomer at>:
> Hello python-dev,
> This is probably worthy of a bug report: While looking at I
> noticed that Semaphore's counter can go below zero. This is opposed to the
> docs: "The counter can never go below zero; ...". Just try:
> import threading
> s = threading.Semaphore(0.5)

But why would you want to pass a float? It seems like API abuse to me.


From martin at  Sat Jan 28 21:57:01 2012
From: martin at (martin at
Date: Sat, 28 Jan 2012 21:57:01 +0100
Subject: [Python-Dev] Hashing proposal: 64-bit hash
In-Reply-To: <jg0i21$9oe$>
References: <jfuvme$i5b$> <>
Message-ID: <>

Quoting Serhiy Storchaka <storchaka at>:

> 27.01.12 23:08, Frank Sievertsen wrote:
>>> As already mentioned, the vulnerability of 64-bit Python is rather
>>> theoretical and not practical. The size of the hash makes the attack
>>> extremely unlikely.
>> Unfortunately this assumption is not correct. It works very well with
>> 64bit-hashing.
>> It's much harder to create (efficiently) 64-bit hash-collisions.
>> But I managed to do so and created strings with
>> a length of 16 (6-bit)-characters (a-z, A-Z, 0-9, _, .). Even
>> 14 characters would have been enough.
>> You need less than twice as many characters for the same effect as in
>> the 32bit-world.
> The point is not the length of the string, but the size of the string
> space that has to be searched. To find a string with a specified 64-bit
> hash, one has to iterate over 2 ** 64 strings.

I think you entirely missed the point of Frank's message. Despite your
analysis that it shall not be possible, Frank has *actually* computed
colliding strings, most likely also for a specified hash value.

> Of course, this assumes that the hash function is secure, i.e. that it
> does not allow one to "cut corners" and reduce the computation time.

This issue wouldn't be that relevant if there wasn't a documented
algorithm to significantly reduce the number of tries you need to
make to produce a string with a desired hash value. My own implementation
would need 2**33 tries in the worst case (for a 64-bit hash value);
thanks to the birthday paradox, there is actually a significant chance
that the algorithm finds collisions even faster.
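
The birthday effect alone can be shown on a deliberately small hash space.
The sketch below is not the algorithm referred to above and does not attack
Python's string hash: it truncates SHA-256 to 16 bits as a stand-in (an
assumption made purely for the demonstration) and counts how many inputs are
hashed before two of them collide. For n bits this is expected to happen
after roughly 2 ** (n / 2) tries rather than 2 ** n.

```python
import hashlib
from itertools import count


def toy_hash(text, bits=16):
    """Truncate SHA-256 to `bits` bits; a stand-in for a small hash space."""
    digest = hashlib.sha256(text.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def first_collision(bits=16):
    """Hash "s0", "s1", ... until two inputs share a hash value."""
    seen = {}
    for tries in count(1):
        text = "s%d" % (tries - 1)
        value = toy_hash(text, bits)
        if value in seen:
            return seen[value], text, tries
        seen[value] = text


a, b, tries = first_collision(bits=16)
print("collision after %d tries (hash space: %d values)" % (tries, 2 ** 16))
```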


From mwm at  Sat Jan 28 22:49:52 2012
From: mwm at (Mike Meyer)
Date: Sat, 28 Jan 2012 13:49:52 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <1327776541.8904.5.camel@localhost.localdomain>
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at> wrote:

>On Saturday 28 January 2012 at 10:46 -0800, Mike Meyer wrote:
>> Antoine Pitrou <solipsis at> wrote:
>> >You will see people copying recipes found on the internet without
>> >knowing that they rely on unstable APIs.
>> How about doing them the way we do deprecated modules, and have them
>> spit warnings to stderr?  Maybe add a flag and environment variable to
>> disable that.
>You're proposing that new experimental modules spit warnings when you
>use them?

To be explicit, when the system loads them.

> I don't think that's a good way of promoting their use :)

And importing something from __preview__ or __experimental__ or whatever won't? This thread did include the suggestion that they go into their final location instead of a magic module.

>(something we do want to do even though we also want to convey the idea
>that they're not yet "stable" or "fully approved")

Doing it with a message pointing at the page describing the status makes sure users read the docs before using them. That solves the problem of using them without realizing it.
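
A rough sketch of that idea with the existing warnings machinery follows.
ProvisionalWarning and warn_provisional are hypothetical names invented
here; nothing like them exists in the stdlib. The second block stands for
the opt-out, which in practice could be done with the -W option or the
PYTHONWARNINGS environment variable.

```python
import warnings


class ProvisionalWarning(UserWarning):
    """Hypothetical category for modules whose API may still change."""


def warn_provisional(module_name):
    """What a provisional module might call when it is loaded."""
    warnings.warn(
        "%s is provisional; see its documentation for the current status"
        % module_name,
        ProvisionalWarning,
        stacklevel=2,
    )


# Default behaviour: the warning is reported.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")
    warn_provisional("regex")

# Opt-out: a filter silences the whole category.
with warnings.catch_warnings(record=True) as hidden:
    warnings.simplefilter("ignore", ProvisionalWarning)
    warn_provisional("regex")

print(len(shown), len(hidden))
```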


From solipsis at  Sat Jan 28 23:02:37 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 28 Jan 2012 23:02:37 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <1327788157.8904.17.camel@localhost.localdomain>

> >You're proposing that new experimental modules spit warnings when you
> >use them?
> To be explicit, when the system loads them.

There are many reasons to import a module, such as viewing its
documentation. And the warning will trigger if the import happens in
non-user code, such as a library; or when there is a fallback for the
module not being present. People usually get annoyed by untimely
warnings which don't warn about an actual problem.

> >(something we do want to do even though we also want to convey the idea
> >that they're not yet "stable" or "fully approved")
> Doing it with a message pointing at the page describing the status
> makes sure users read the docs before using them.

Sure, it's just much less user-friendly than conveying that idea in the
module's namespace. Besides, it only works if warnings are not silenced.

People are used to __future__ (and I've seen no indication that they
don't like it). __preview__ is another application of the same pattern
(using a special namespace to indicate the status of a feature).



From ericsnowcurrently at  Sun Jan 29 00:03:45 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 28 Jan 2012 16:03:45 -0700
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <1327788157.8904.17.camel@localhost.localdomain>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou <solipsis at> wrote:
> There are many reasons to import a module, such as viewing its
> documentation. And the warning will trigger if the import happens in
> non-user code, such as a library; or when there is a fallback for the
> module not being present. People usually get annoyed by untimely
> warnings which don't warn about an actual problem.

As an alternative, how about a __preview__ or __provisional__
attribute on modules that are in this provisional state?  So just add
that big warning to the docs, as Guido suggested, and set the
attribute as a programmatic indicator.  Perhaps also add
sys.provisional_modules (or wherever) to explicitly give the full list
for the current Python version.


From solipsis at  Sun Jan 29 00:08:37 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 29 Jan 2012 00:08:37 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <1327792117.4376.8.camel@localhost.localdomain>

On Saturday, 28 January 2012 at 16:03 -0700, Eric Snow wrote:
> On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou <solipsis at> wrote:
> > There are many reasons to import a module, such as viewing its
> > documentation. And the warning will trigger if the import happens in
> > non-user code, such as a library; or when there is a fallback for the
> > module not being present. People usually get annoyed by intempestive
> > warnings which don't warn about an actual problem.
> As an alternative, how about a __preview__ or __provisional__
> attribute on modules that are in this provisional state?  So just add
> that big warning to the docs, as Guido suggested, and set the
> attribute as a programmatic indicator.  Perhaps also add
> sys.provisional_modules (or wherever) to explicitly give the full list
> for the current Python version.

Well, how often do you examine the attributes of a module before using
it? I think that's a much too obscure way to convey the information.



From ericsnowcurrently at  Sun Jan 29 00:34:37 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 28 Jan 2012 16:34:37 -0700
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <1327792117.4376.8.camel@localhost.localdomain>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 4:08 PM, Antoine Pitrou <solipsis at> wrote:
> On Saturday, 28 January 2012 at 16:03 -0700, Eric Snow wrote:
>> On Sat, Jan 28, 2012 at 3:02 PM, Antoine Pitrou <solipsis at> wrote:
>> > There are many reasons to import a module, such as viewing its
>> > documentation. And the warning will trigger if the import happens in
>> > non-user code, such as a library; or when there is a fallback for the
>> > module not being present. People usually get annoyed by intempestive
>> > warnings which don't warn about an actual problem.
>> As an alternative, how about a __preview__ or __provisional__
>> attribute on modules that are in this provisional state?  So just add
>> that big warning to the docs, as Guido suggested, and set the
>> attribute as a programmatic indicator.  Perhaps also add
>> sys.provisional_modules (or wherever) to explicitly give the full list
>> for the current Python version.
> Well, how often do you examine the attributes of a module before using
> it? I think that's a much too obscure way to convey the information.

Granted.  However, actively looking for the attribute is only one of
the lesser use-cases.  The key is that it allows you to check any
library programmatically for dependence on any of the provisional
modules.  The warning in the docs is important, but being able to have
code check for it is important too.  As a small bonus, it would show
up in help for the module and in dir().
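
A minimal sketch of the programmatic check described above, assuming the
marker is a module attribute named __provisional__ (the name comes from
this proposal; no released Python sets it, and the helper name and the two
stand-in modules are hypothetical):

```python
import sys
import types

# Hypothetical marker name taken from the proposal in this thread.
MARKER = "__provisional__"


def provisional_modules(modules=None):
    """Return the sorted names of modules that carry the marker attribute."""
    if modules is None:
        modules = sys.modules
    return sorted(
        name for name, mod in list(modules.items())
        if getattr(mod, MARKER, False)
    )


# Demo with two stand-in modules instead of real stdlib ones.
stable = types.ModuleType("stable_mod")
preview = types.ModuleType("preview_mod")
preview.__provisional__ = True

found = provisional_modules({"stable_mod": stable, "preview_mod": preview})
print(found)
```

Called without arguments, the helper scans sys.modules, which is roughly
what a tool auditing a library's dependence on provisional modules would
do after importing that library.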


From pydev at  Sun Jan 29 00:39:48 2012
From: pydev at (Frank Sievertsen)
Date: Sun, 29 Jan 2012 00:39:48 +0100
Subject: [Python-Dev] Hashing proposal: 64-bit hash
In-Reply-To: <jg13p8$e12$>
References: <jfuvme$i5b$> <>
Message-ID: <>

> The point is not the length of the string, but the size of the string 
> space to inspect. To find a string with a specified 64-bit hash, you 
> have to iterate over 2 ** 64 strings. Spending 1 nanosecond on a single 
> string scan (a very optimistic estimate), it would take 2 ** 64 / 1e9 
> / (3600 * 24 * 365.25) = 585 years. For the attack we need to find 
> 1000 such strings -- more than half a million years. For a 32-bit hash 
> we would need only about an hour.

With meet-in-the-middle and some other tricks it's possible to generate 
25,000 64-bit-collisions per hour using an older desktop-cpu and 4gb ram.
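
The brute-force estimate quoted above can be checked with a few lines of
arithmetic. This only reproduces the quoted figures under the same
assumption of 1 nanosecond per candidate string; it does not model the
meet-in-the-middle approach, and the function name is made up for the
illustration:

```python
SECONDS_PER_YEAR = 3600 * 24 * 365.25
STRINGS_PER_SECOND = 1e9  # assumption from the quoted text: 1 ns per string


def brute_force_seconds(hash_bits, targets=1):
    """Seconds needed to scan 2**hash_bits strings for each target."""
    return targets * 2 ** hash_bits / STRINGS_PER_SECOND


years_one_64 = brute_force_seconds(64) / SECONDS_PER_YEAR
years_1000_64 = brute_force_seconds(64, 1000) / SECONDS_PER_YEAR
hours_1000_32 = brute_force_seconds(32, 1000) / 3600

print("one 64-bit target:    %.0f years" % years_one_64)
print("1000 64-bit targets:  %.0f years" % years_1000_64)
print("1000 32-bit targets:  %.2f hours" % hours_1000_32)
```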



From tjreedy at  Sun Jan 29 00:47:23 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 28 Jan 2012 18:47:23 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <jg21em$vj8$>

On 1/28/2012 3:55 AM, Nick Coghlan wrote:

I am currently -something on the proposal as it stands, because it will 
surely create a lot of hassles and because I do not think it is 
necessarily the best solution to the motivating concerns.

> Don't consider this PEP a purely theoretical proposal, because it
> isn't. It's really being put forward to solve a specific problem: the
> fact that we need to do something about re's lack of proper Unicode
> support [1]. Those issues are actually hard to solve, so replacing re
> with Matthew Barnett's regex module (just as re itself was a
> replacement for the original regex module) that already addresses most
> of them seems like a good way forward, but this is currently being
> blocked because there are still a few lingering concerns with
> maintainability and backwards compatibility.

I find the concern about 'maintainability' a bit strange as regex seems 
to be getting more maintenance and improvement than re. The re author 
is no longer active. If neither were in the library, and we were 
considering both, regex would certainly win, at least from a user view. 
Tom Christiansen reviewed about 8 unicode-capable extended r. e. 
packages, including both re and regex, and regex came out much better.

The concern about back compatibility ignores the code that re users 
cannot write. In any case, that problem would be solved by adding regex 
in addition to re instead of as a replacement. If it were initially 
added as __preview__.regex, would the next step be to call it regex, or 
to change it to re and remove the current package? If the former, I think 
we might as well do it now. If the latter, that is different from what 
the PEP proposes.

> While regex is the current poster-child for this problem,

I see it as a special case that is not really addressed by the PEP.

The other proposed use-case for __preview__ is packages whose API is not 
stable. Such packages may need their API changed a lot sooner than 18-24 
months. Or, their API may change for a lot longer than just one release 
cycle. So the PEP would be best suited for packages whose API may be fixed 
but might need code-breaking adjustments *once* in 18 months.

A counter-proposal: add an __x__ package to site-packages. Document the 
contents separately in an X-Library manual. Let the api of such packages 
change with every micro release. Don't guarantee that modules won't 
disappear completely. Don't put a time limit on residence there before 
being moved up (to the stdlib) or out. Packages that track volatile 
external standards could stay there indefinitely.

If a module is moved to the stdlib, leave a stub for at least two versions 
that emits a deprecation warning (to switch to import a instead of 
__x__.a) and a notice that the doc has moved, along with importing the 
contents of the stdlib version. (This would work for the __preview__ 
proposal also.)
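
A rough sketch of what such a stub could do, assuming the old location is
"__x__.json" and the new one is the stdlib json module (both chosen only
for the demo; "__x__" is the hypothetical package from this message and
make_stub is an invented helper). A real stub would be a small file in the
old location; here the same steps are wrapped in a function so the example
runs on its own:

```python
import importlib
import types
import warnings


def make_stub(old_name, new_name):
    """Build a module object that warns and re-exports new_name's contents."""
    # Step 1: tell the caller to switch imports.
    warnings.warn(
        "%s has moved to %s; please update your imports" % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,
    )
    # Step 2: import the stdlib version and copy its public names.
    target = importlib.import_module(new_name)
    stub = types.ModuleType(
        old_name, "Moved to %s; see that module's documentation." % new_name
    )
    public = getattr(target, "__all__", None) or [
        name for name in vars(target) if not name.startswith("_")
    ]
    for name in public:
        setattr(stub, name, getattr(target, name))
    return stub


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stub = make_stub("__x__.json", "json")

print(stub.dumps({"a": 1}))
print(caught[0].category.__name__, "-", caught[0].message)
```

The stub's docstring carries the "doc has moved" notice, so help() on the
old name points readers at the new location.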

Terry Jan Reedy

From ncoghlan at  Sun Jan 29 02:33:21 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 29 Jan 2012 11:33:21 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 29, 2012 at 3:15 AM, Guido van Rossum <guido at> wrote:
> Hm. You could do this just as well without a __preview__ package. You
> just flag the module as experimental in the docs and get on with your
> life.
> We have some experience with this in Google App Engine. We used to use
> a separate "labs" package in our namespace and when packages were
> deemed stable enough they were moved from labs to non-labs. But the
> move always turned out to be a major pain, causing more breakage than
> we would have had if we had simply kept the package location the same
> but let the API mutate. Now we just put new, experimental packages in
> the right place from the start, and put a loud "experimental" banner
> on all pages of their docs, which is removed once the API is stable.
> There is much less pain now: while incompatible changes do happen for
> experimental package, they are not frequent, and rarely
> earth-shattering, and usually the final step is simply removing the
> banner without making any (incompatible) changes to the code. This
> means that the final step is painless for early adopters, thereby
> rewarding them for their patience instead of giving them one final
> kick while they sort out the import changes.
> So I do not support the __preview__ package. I think we're better off
> flagging experimental modules in the docs than in their name. For the
> specific case of the regex module, the best way to adoption may just
> be to include it in the stdlib as regex and keep it there. Any other
> solution will just cause too much anxiety.

I'm willing to go along with that (especially given your report of
AppEngine's experience with the "labs" namespace).

Can we class this as a pronouncement on PEP 408? That is, "No to
adding a __preview__ namespace, but yes to adding regex directly for


Nick Coghlan   |   ncoghlan at |   Brisbane, Australia

From guido at  Sun Jan 29 04:29:05 2012
From: guido at (Guido van Rossum)
Date: Sat, 28 Jan 2012 19:29:05 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan <ncoghlan at> wrote:
> I'm willing to go along with that (especially given your report of
> AppEngine's experience with the "labs" namespace).
> Can we class this as a pronouncement on PEP 408? That is, "No to
> adding a __preview__ namespace, but yes to adding regex directly for
> 3.3"?

Yup. We seem to have a tendency to over-analyze decisions a bit lately
(witness the hand-wringing about the hash collision DoS attack).

For those who worry about people who copy recipes that stop working, I
think they're worrying too much. If people want to take a shortcut
without reading the documentation or understanding the code they are
copying, fine, but they should realize the limitations of free advice.

I don't mean to put down the many great recipes that exist or the
value of copying code to get started quickly. But I think our
liability as maintainers of the library is sufficiently delineated
when we clearly mark a module as experimental in the documentation.
(Recipe authors should ideally also add this warning to their recipe
if it depends on an experimental API.)

Finally, if you really want to put warnings in whenever an
experimental module is being used, make it a silent warning, like
SilentDeprecationWarning. That allows people to request more strict
warnings without unduly alarming the users of an app.
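
A small sketch of the "silent by default, strict on request" idea. As far
as I know there is no class named SilentDeprecationWarning in the standard
library, so this assumes PendingDeprecationWarning was meant, which a
regular release build ignores by default. ProvisionalAPIWarning and
use_experimental_feature are hypothetical names:

```python
import warnings


class ProvisionalAPIWarning(PendingDeprecationWarning):
    """Hypothetical category; the default 'ignore' filter for its base applies."""


def use_experimental_feature():
    """Stand-in for a call into an experimental module."""
    warnings.warn(
        "this API is experimental and may change",
        ProvisionalAPIWarning,
        stacklevel=2,
    )
    return 42


# A developer who wants strict checking turns the category into an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", ProvisionalAPIWarning)
    try:
        use_experimental_feature()
        strict_raised = False
    except ProvisionalAPIWarning:
        strict_raised = True

# With the category ignored (spelled out here), the call passes quietly.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ProvisionalAPIWarning)
    quiet_result = use_experimental_feature()

print(strict_raised, quiet_result)
```

The same opt-in is available from the command line through the -W option,
so end users of an application never see the warning unless they ask.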

--Guido van Rossum (

From ncoghlan at  Sun Jan 29 07:42:28 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 29 Jan 2012 16:42:28 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 29, 2012 at 1:29 PM, Guido van Rossum <guido at> wrote:
> On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan <ncoghlan at> wrote:
>> I'm willing to go along with that (especially given your report of
>> AppEngine's experience with the "labs" namespace).
>> Can we class this as a pronouncement on PEP 408? That is, "No to
>> adding a __preview__ namespace, but yes to adding regex directly for
>> 3.3"?
> Yup. We seem to have a tendency to over-analyze decisions a bit lately
> (witness the hand-wringing about the hash collision DoS attack).

I have now updated PEP 408 accordingly (i.e. rejected, but with a
specific note about regex).

And (since Alex Gaynor brought it up off-list), I'll explicitly note
here that I'm taking your approval as granting the special permission
PEP 399 needs to accept a C extension module without a pure Python
equivalent. Patches to *add* a pure Python version for use by other
implementations are of course welcome (in practice, I suspect it's
likely only in PyPy that such an engine would be fast enough to be


Nick Coghlan   |   ncoghlan at |   Brisbane, Australia

From ethan at  Sun Jan 29 08:42:20 2012
From: ethan at (Ethan Furman)
Date: Sat, 28 Jan 2012 23:42:20 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

Benjamin Peterson wrote:
> 2012/1/26 Ethan Furman <ethan at>:
> Congratulations, you are now PEP 409.

Thanks, Benjamin!

So, how do I make changes to it?


From ethan at  Sun Jan 29 08:44:32 2012
From: ethan at (Ethan Furman)
Date: Sat, 28 Jan 2012 23:44:32 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <>

For those not on the nosy list, here's the latest post


It looks like agreement is forming around the

     raise ... from None

method.  It has been mentioned more than once that having the context 
saved on the exception would be a Good Thing, and for further debugging 
(or logging or what-have-you) I must agree.

The patch attached now sets __cause__ to True, leaving __context__ 
unclobbered.  The exception printing routine checks to see if __cause__ 
is True, and if so simply skips the display of either cause or 
__context__, but __context__ can still be queried by later code.

One concern raised was that since it is possible to write (even before 
this patch)

     raise KeyError from NameError

outside of a try block that some would get into the habit of writing

     raise KeyError from None

as a way of preemptively suppressing implicit context chaining;  I am 
happy to report that this is not an issue, since when that exception is 
caught and a new exception raised, it is the new exception that controls 
the display.

In other words:

>>> try:
...   raise ValueError from None
... except:
...   raise NameError
...
Traceback (most recent call last):
   File "<stdin>", line 2, in <module>
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
   File "<stdin>", line 4, in <module>
NameError
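
For comparison, a short sketch of how this reads in Python 3.3 and later.
Note the assumption: it shows the behaviour as eventually released, where
the suppression is recorded in __suppress_context__ and __cause__ stays
None, rather than the draft patch described above that sets __cause__ to
True. The lookup() helper is made up for the example:

```python
def lookup(mapping, key):
    """Translate an internal KeyError into an AttributeError."""
    try:
        return mapping[key]
    except KeyError:
        # "from None" hides the KeyError from the printed traceback only.
        raise AttributeError("no such field: %r" % (key,)) from None


try:
    lookup({}, "colour")
except AttributeError as exc:
    caught = exc

# The original exception is still available for debugging or logging.
print(type(caught.__context__).__name__)
print(caught.__cause__)
print(caught.__suppress_context__)
```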

From g.brandl at  Sun Jan 29 09:39:01 2012
From: g.brandl at (Georg Brandl)
Date: Sun, 29 Jan 2012 09:39:01 +0100
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
Message-ID: <jg30j2$q67$>

Am 29.01.2012 08:42, schrieb Ethan Furman:
> Benjamin Peterson wrote:
>> 2012/1/26 Ethan Furman <ethan at>:
>>> PEP: XXX
>> Congratulations, you are now PEP 409.
> Thanks, Benjamin!
> So, how do I make changes to it?

Please send PEP updates to the PEP editors at peps at


From mark at  Sun Jan 29 11:31:48 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 10:31:48 +0000
Subject: [Python-Dev] A new dictionary implementation
Message-ID: <>


Now that issue 13703 has been largely settled,
I want to propose my new dictionary implementation again.
It is a little more polished than before.

Object-oriented benchmarks use considerably less memory and are
sometimes faster (by a small amount).
(I've only benchmarked on my old 32bit machine)

E.g.  2to3      no speed change   -28% memory
      GCbench   +10% speed        -47% memory

Other benchmarks show little or no change in behaviour,
mainly minor memory savings.

If an application is OO and uses lots of memory
the new dict will save a lot of memory and maybe boost performance.
Other applications will be largely unaffected.

It passes all the tests.
(I had to change a couple that relied on dict repr() ordering)


From solipsis at  Sun Jan 29 15:09:34 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 29 Jan 2012 15:09:34 +0100
Subject: [Python-Dev] A new dictionary implementation
References: <>
Message-ID: <>


On Sun, 29 Jan 2012 10:31:48 +0000
Mark Shannon <mark at> wrote:
> Now that issue 13703 has been largely settled,
> I want to propose my new dictionary implementation again.
> It is a little more polished than before.

I briefly took a look at your code yesterday and it looked generally
reasonable to me. It would be nice to open an issue on so that we can review it there (just fill the
"repository" field and use the "create patch" button).



From benjamin at  Sun Jan 29 15:56:11 2012
From: benjamin at (Benjamin Peterson)
Date: Sun, 29 Jan 2012 09:56:11 -0500
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/29 Mark Shannon <mark at>:
> Hi,
> Now that issue 13703 has been largely settled,
> I want to propose my new dictionary implementation again.
> It is a little more polished than before.

If you're serious about changing the dictionary implementation, I
think you should write a PEP. It should explain the new dict's
advantages (and disadvantages?) and give comprehensive benchmark
numbers. Something along the lines of I should think.


From solipsis at  Sun Jan 29 16:08:41 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 29 Jan 2012 16:08:41 +0100
Subject: [Python-Dev] A new dictionary implementation
References: <>
Message-ID: <>

On Sun, 29 Jan 2012 09:56:11 -0500
Benjamin Peterson <benjamin at> wrote:

> 2012/1/29 Mark Shannon <mark at>:
> > Hi,
> >
> > Now that issue 13703 has been largely settled,
> > I want to propose my new dictionary implementation again.
> > It is a little more polished than before.
> If you're serious about changing the dictionary implementation, I
> think you should write a PEP. It should explain the new dicts
> advantages (and disadvantages?) and give comprehensive benchmark
> numbers. Something along the lines of
> I should think.

"New dictionary implementation" is a misnomer here. Mark's patch merely
allows sharing the keys array between several dictionaries. The lookup
algorithm remains exactly the same as far as I've read. It's actually
much less invasive than e.g. Martin's AVL trees-for-hash-collisions
proposal.



From benjamin at  Sun Jan 29 16:19:39 2012
From: benjamin at (Benjamin Peterson)
Date: Sun, 29 Jan 2012 10:19:39 -0500
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/29 Antoine Pitrou <solipsis at>:
> On Sun, 29 Jan 2012 09:56:11 -0500
> Benjamin Peterson <benjamin at> wrote:
>> 2012/1/29 Mark Shannon <mark at>:
>> > Hi,
>> >
>> > Now that issue 13703 has been largely settled,
>> > I want to propose my new dictionary implementation again.
>> > It is a little more polished than before.
>> If you're serious about changing the dictionary implementation, I
>> think you should write a PEP. It should explain the new dicts
>> advantages (and disadvantages?) and give comprehensive benchmark
>> numbers. Something along the lines of
>> I should think.
> "New dictionary implementation" is a misnomer here. Mark's patch merely
> allows to share the keys array between several dictionaries. The lookup
> algorithm remains exactly the same as far as I've read. It's actually
> much less invasive than e.g. Martin's AVL trees-for-hash-collisions
> proposal.

Ah, okay. So, the subject makes it sound scarier than it is. :)


From mark at  Sun Jan 29 16:16:24 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 15:16:24 +0000
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <> <>
Message-ID: <>

Antoine Pitrou wrote:
> Hi,
> On Sun, 29 Jan 2012 10:31:48 +0000
> Mark Shannon <mark at> wrote:
>> Now that issue 13703 has been largely settled,
>> I want to propose my new dictionary implementation again.
>> It is a little more polished than before.
> I briefly took a look at your code yesterday and it looked generally
> reasonable to me. It would be nice to open an issue on
> so that we can review it there (just fill the
> "repository" field and use the "create patch" button).



From andrea.crotti.0 at  Sun Jan 29 16:34:38 2012
From: andrea.crotti.0 at (Andrea Crotti)
Date: Sun, 29 Jan 2012 15:34:38 +0000
Subject: [Python-Dev] #include "Python.h"
Message-ID: <>

I have a newbie question about CPython.
Looking at the C code I noted that for example in tupleobject.c there is
only one include
#include "Python.h"

Python.h actually includes everything as far as I can see, so:
- with an editor that is not smart enough, it's very hard to find out
   where the not-locally defined symbols are actually defined (well, sure,
   that is not a problem for most people)

- if all the files include Python.h, doesn't it generate very big object
   files? Or is it not a problem since they are stripped out afterwards?


From mark at  Sun Jan 29 17:07:56 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 16:07:56 +0000
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Sun, 29 Jan 2012 09:56:11 -0500
> Benjamin Peterson <benjamin at> wrote:
>> 2012/1/29 Mark Shannon <mark at>:
>>> Hi,
>>> Now that issue 13703 has been largely settled,
>>> I want to propose my new dictionary implementation again.
>>> It is a little more polished than before.
>> If you're serious about changing the dictionary implementation, I
>> think you should write a PEP. It should explain the new dicts
>> advantages (and disadvantages?) and give comprehensive benchmark
>> numbers. Something along the lines of
>> I should think.
> "New dictionary implementation" is a misnomer here. Mark's patch merely
> allows to share the keys array between several dictionaries. The lookup
> algorithm remains exactly the same as far as I've read. It's actually
> much less invasive than e.g. Martin's AVL trees-for-hash-collisions
> proposal.

Antoine is right. It is a reorganisation of the dict, plus a couple of 
changes to typeobject.c and object.c to ensure that instance 
dictionaries do indeed share keys arrays.
The lookup algorithm remains the same (it works well).
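
To illustrate the key-sharing idea only, here is a toy model in Python. It
is not the patch itself, which reorganises the C hash table; the class
names are invented, and an ordinary dict stands in for the shared keys
array:

```python
_MISSING = object()


class SharedKeys:
    """Key table shared by many dicts: maps each key to a slot index."""

    def __init__(self):
        self.index = {}

    def slot(self, key, create=False):
        if create and key not in self.index:
            self.index[key] = len(self.index)
        return self.index.get(key)


class SplitDict:
    """Per-instance part: just a list of values laid out by the shared table."""

    def __init__(self, keys):
        self.keys = keys
        self.values = []

    def __setitem__(self, key, value):
        i = self.keys.slot(key, create=True)
        if i >= len(self.values):
            # Grow this instance's value list up to the new slot.
            self.values.extend([_MISSING] * (i + 1 - len(self.values)))
        self.values[i] = value

    def __getitem__(self, key):
        i = self.keys.slot(key)
        if i is None or i >= len(self.values) or self.values[i] is _MISSING:
            raise KeyError(key)
        return self.values[i]


shared = SharedKeys()
a, b = SplitDict(shared), SplitDict(shared)
a["x"], a["y"] = 1, 2
b["y"] = 20

print(a["x"], a["y"], b["y"], len(shared.index))
```

Each instance stores only its values; the key names exist once, which is
where the memory saving for many objects of the same class would come from.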


From phd at  Sun Jan 29 18:22:23 2012
From: phd at (Oleg Broytman)
Date: Sun, 29 Jan 2012 21:22:23 +0400
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
Message-ID: <>


   We are sorry but we cannot help you. This mailing list is to work on
developing Python (adding new features to Python itself and fixing bugs);
if you're having problems learning, understanding or using Python, please
find another forum. Probably python-list/comp.lang.python mailing list/news
group is the best place; there are Python developers who participate in it;
you may get a faster, and probably more complete, answer there. See for other lists/news groups/fora. Thank
you for understanding.

On Sun, Jan 29, 2012 at 03:34:38PM +0000, Andrea Crotti wrote:
> I have a newbie question about CPython.
> Looking at the C code I noted that for example in tupleobject.c there is
> only one include
> #include "Python.h"
> Python.h actually includes everything as far as I can I see so:
> - it's very hard with a not-enough smart editor to find out where the
>   not-locally defined symbols are actually defined (well sure that is
>   not a problem for most of the people)
> - if all the files include python.h, doesn't it generate very big object
>   files? Or is it not a problem since they are stripped out after?
> Thanks,
> Andrea

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From andrea.crotti.0 at  Sun Jan 29 18:59:51 2012
From: andrea.crotti.0 at (Andrea Crotti)
Date: Sun, 29 Jan 2012 17:59:51 +0000
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
Message-ID: <>

On 01/29/2012 05:22 PM, Oleg Broytman wrote:
> Hello.
>     We are sorry but we cannot help you. This mailing list is to work on
> developing Python (adding new features to Python itself and fixing bugs);
> if you're having problems learning, understanding or using Python, please
> find another forum. Probably python-list/comp.lang.python mailing list/news
> group is the best place; there are Python developers who participate in it;
> you may get a faster, and probably more complete, answer there. See
> for other lists/news groups/fora. Thank
> you for understanding.

I wrote here because I thought it was the best place, but I understand
this point of view, I can ask on python or python-core for example..

From ctb at  Sun Jan 29 19:10:07 2012
From: ctb at (C. Titus Brown)
Date: Sun, 29 Jan 2012 10:10:07 -0800
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 29, 2012 at 05:59:51PM +0000, Andrea Crotti wrote:
> On 01/29/2012 05:22 PM, Oleg Broytman wrote:
>> Hello.
>>     We are sorry but we cannot help you. This mailing list is to work on
>> developing Python (adding new features to Python itself and fixing bugs);
>> if you're having problems learning, understanding or using Python, please
>> find another forum. Probably python-list/comp.lang.python mailing list/news
>> group is the best place; there are Python developers who participate in it;
>> you may get a faster, and probably more complete, answer there. See
>> for other lists/news groups/fora. Thank
>> you for understanding.
> I wrote here because I thought it was the best place, but I understand
> this point of view, I can ask on python or python-core for example..

python-dev isn't that inappropriate, IMO, but probably the best place to
go with this discussion is python-ideas.  Could you repost over there?

C. Titus Brown, ctb at

From p.f.moore at  Sun Jan 29 19:34:49 2012
From: p.f.moore at (Paul Moore)
Date: Sun, 29 Jan 2012 18:34:49 +0000
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 29 January 2012 18:10, C. Titus Brown <ctb at> wrote:
> python-dev isn't that inappropriate, IMO, but probably the best place to
> go with this discussion is python-ideas.  Could you repost over there?

I agree that python-dev isn't particularly appropriate; python-list is
probably your best bet. python-ideas isn't really appropriate, as
this isn't a proposal for a change to Python, but rather a question
about how the Python C code is structured. That's always a grey area,
and I can see why the OP thought python-dev might be a reasonable
place.
Having said all that:

> Python.h actually includes everything as far as I can I see so:
> - it's very hard with a not-enough smart editor to find out where the
>  not-locally defined symbols are actually defined (well sure that is
>  not a problem for most of the people)

Well, that's more of a question of what tools you use to edit/read
Python code. I guess you could view it as a trade-off between ease of
writing the core code and extensions (avoiding micromanagement of
headers, and being able to document #include "Python.h" as the
canonical way to get access to the Python API from C) versus tracking
down macro definitions and symbol declarations (and that's really only
for information, as the API is documented in the manuals anyway).

I don't use an editor that can automatically find the definitions, but
grep and the manuals do me fine.

> - if all the files include python.h, doesn't it generate very big object
>  files? Or is it not a problem since they are stripped out after?

That's more of a C/linker question, but generally .h files only
contain declarations and macros, and nothing that generates code. So
there is no impact on object code size if you include multiple .h
files, or too many, or whatever. So no, it doesn't generate big object
files.


From andrea.crotti.0 at  Sun Jan 29 19:53:31 2012
From: andrea.crotti.0 at (Andrea Crotti)
Date: Sun, 29 Jan 2012 18:53:31 +0000
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 01/29/2012 06:34 PM, Paul Moore wrote:
> On 29 January 2012 18:10, C. Titus Brown<ctb at>  wrote:
>> python-dev isn't that inappropriate, IMO, but probably the best place to
>> go with this discussion is python-ideas.  Could you repost over there?
> I agree that python-dev isn't particularly appropriate, python-list is
> probably your best bet. The python-ideas isn't really appropriate, as
> this isn't a proposal for a change to Python, but rather a question
> about how the Python C code is structured. That's always a grey area,
> and I can see why the OP thought python-dev might be a reasonable
> place.

Ok well for this I won't repost it anywhere else, I have already all
the answers I wanted and it was not so important..

> Having said all that:
>> Python.h actually includes everything as far as I can I see so:
>> - it's very hard with a not-enough smart editor to find out where the
>>   not-locally defined symbols are actually defined (well sure that is
>>   not a problem for most of the people)
> Well, that's more of a question of what tools you use to edit/read
> Python code. I guess you could view it as a trade-off between ease of
> writing the core code and extensions (avoiding micromanagement of
> headers, and being able to document #include "Python.h" as the
> canonical way to get access to the Python API from C) versus tracking
> down macro definitions and symbol declarations (and that's really only
> for information, as the API is documented in the manuals anyway).
> I don't use an editor that can automatically find the definitions, but
> grep and the manuals does me fine.

Yes sure it makes sense, probably it's even better than including
only simple files, since all the contributions to Python.h can be moved
around and refactored without breaking all the code..

And for editor I use Emacs, which can actually do any kind of magic
on the symbols, I just didn't set it up for the python source code..


From eliben at  Sun Jan 29 20:05:33 2012
From: eliben at (Eli Bendersky)
Date: Sun, 29 Jan 2012 21:05:33 +0200
Subject: [Python-Dev] #include "Python.h"
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Jan 29, 2012 at 17:34, Andrea Crotti <andrea.crotti.0 at> wrote:
> I have a newbie question about CPython.
> Looking at the C code I noted that for example in tupleobject.c there is
> only one include
> #include "Python.h"
> Python.h actually includes everything as far as I can I see so:
> - it's very hard with a not-enough smart editor to find out where the
>  not-locally defined symbols are actually defined (well sure that is
>  not a problem for most of the people)

Hi Andrea,

Not sure what you mean by "not-enough smart editor". Dismissing IDEs
for the moment (which by your classifications are probably "smart
enough"), Python's source code (including headers included in
Python.h) is readily navigable with Emacs or Vim using ctags, which is
very easy to set up. Declarations are then easily found.

Even if you forgo such features of the editor, grepping (or source
specific greppers like ack or pss) also works fine most of the time.

> - if all the files include python.h, doesn't it generate very big object
>  files? Or is it not a problem since they are stripped out after?

Header files usually don't affect object file size. Unless something
very fishy is going on (and this is not the case for headers included
from Python.h, AFAIK) headers only contain declarations which don't
affect code size. They may affect compilation time, but that's not a
big problem for Python's code base which is very fast to compile.


From greg at  Sun Jan 29 22:20:07 2012
From: greg at (Gregory P. Smith)
Date: Sun, 29 Jan 2012 13:20:07 -0800
Subject: [Python-Dev] [issue13703] Hash collision security issue
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Jan 27, 2012 at 11:39 AM,  <martin at> wrote:
> In fact, none of the strategies fixes all issues with hash collisions;
> even the hash-randomization solutions only deal with string keys, and
> don't consider collisions on non-string keys.

The hash-randomization approach also works fine on immutable container
objects containing bytes and string keys such as tuples and UserString
that merely expose a combination of the hashes of all of their
contained elements.
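
A rough illustration of that point, not taken from any patch in this
thread: a tuple's hash is computed from the hashes of its elements, so
reseeding the str/bytes hash changes the tuple's hash as well. The sketch
assumes a CPython that reads the seed from the PYTHONHASHSEED environment
variable (the name used by the releases that eventually shipped the fix);
the helper name tuple_hash_under_seed is made up for the example.

```python
import os
import subprocess
import sys


def tuple_hash_under_seed(seed):
    """Return hash(('spam', b'eggs')) as computed by a child interpreter.

    Assumption: the child honours PYTHONHASHSEED (CPython 3.3 and later).
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(('spam', b'eggs')))"],
        env=env,
    )
    return int(out)


same_a = tuple_hash_under_seed(1)
same_b = tuple_hash_under_seed(1)   # same seed: expected to be reproducible
other = tuple_hash_under_seed(2)    # different seed: expected to differ
```

The same seed should give the same tuple hash in both runs, while a
different seed should give a different one, even though tuple hashing
itself was not touched.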


From greg at  Sun Jan 29 22:26:06 2012
From: greg at (Gregory P. Smith)
Date: Sun, 29 Jan 2012 13:26:06 -0800
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 6:33 PM, Benjamin Peterson <benjamin at> wrote:
> 2012/1/27 Steven D'Aprano <steve at>:
>> Benjamin Peterson wrote:
>>> Hello everyone,
>>> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
>>> have decided to pronounce on what we want for our stable releases.
>>> What we have decided is that
>>> 1. Simple hash randomization is the way to go. We think this has the
>>> best chance of actually fixing the problem while being fairly
>>> straightforward such that we're comfortable putting it in a stable
>>> release.
>>> 2. It will be off by default in stable releases and enabled by an
>>> envar at runtime. This will prevent code breakage from dictionary
>>> order changing as well as people depending on the hash stability.
>> Do you have the expectation that it will become on by default in some future
>> release?
> Yes, 3.3. The solution in 3.3 could even be one of the more
> sophisticated proposals we have today.

Yay!  Thanks for the decision Release Managers!


From greg at  Sun Jan 29 22:39:10 2012
From: greg at (Gregory P. Smith)
Date: Sun, 29 Jan 2012 13:39:10 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 9:26 AM, Alex <alex.gaynor at> wrote:
> Eli Bendersky <eliben <at>> writes:
>> Hello,
>> Following an earlier discussion on python-ideas [1], we would like to
>> propose the following PEP for review. Discussion is welcome. The PEP
>> can also be viewed in HTML form at
>> [1]
> I'm -1 on this, for a pretty simple reason. Something goes into __preview__,
> instead of its final destination directly because it needs feedback/possibly
> changes. However, given the release cycle of the stdlib (~18 months), any
> feedback it gets can't be seen by actual users until it's too late. Essentially
> you can only get one round of stdlib.
> I think a significantly healthier process (in terms of maximizing feedback and
> getting something into its best shape) is to let a project evolve naturally on
> PyPi and in the ecosystem, give feedback to it from an inclusion perspective,
> and then include it when it becomes ready on its own merits. The counter
> argument to this is that putting it in the stdlib gets you significantly more
> eyeballs (and hopefully more feedback, therefore), my only response to this is:
> if it doesn't get eyeballs on PyPi I don't think there's a great enough need to
> justify it in the stdlib.

-1 from me as well.

How is the __preview__ namespace any different than the
PendingDeprecationWarning that nobody ever uses?  Nobody is likely to
write significant code depending on anything in __preview__ thus the
amount of feedback received would be low.

A better way to get additional feedback would be to promote libraries
that we are considering including by way of direct links to them on
pypi from the relevant areas of the Python documentation (including
the Module Reference / Index pages?) for that release and let the
feedback on them roll in via that route.

An example of this working: ipaddr is ready to go in. It got the
eyeballs and API modifications while still a pypi library as a result
of the discussion around the time it was originally suggested as being
added.  I or any other committers have simply not added it yet.


From francismb at  Sun Jan 29 21:59:27 2012
From: francismb at (francis)
Date: Sun, 29 Jan 2012 21:59:27 +0100
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>
Message-ID: <>

On 01/29/2012 11:31 AM, Mark Shannon wrote:
> It passes all the tests.
> (I had to change a couple that relied on dict repr() ordering)

Hi Mark,
I've cloned the repo, built it, and then tried ./python -m test. I
got some errors:

First in general:
340 tests OK.
2 tests failed:
     test_dis test_gdb
4 tests altered the execution environment:
     test_multiprocessing test_packaging test_site test_strlit
18 tests skipped:
     test_curses test_devpoll test_kqueue test_lzma test_msilib
     test_ossaudiodev test_smtpnet test_socketserver test_startfile
     test_timeout test_tk test_ttk_guionly test_urllib2net
     test_urllibnet test_winreg test_winsound test_xmlrpc_net
1 skip unexpected on linux:
[1348560 refs]

then test_dis:

== CPython 3.3.0a0 (default:f15cf35c9922, Jan 29 2012, 18:12:19) [GCC 4.6.2]
==   Linux-3.1.0-1-amd64-x86_64-with-debian-wheezy-sid little-endian
==   /home/ci/prog/cpython/hotpy_new_dict/build/test_python_14470
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, 
optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, 
ignore_environment=0, verbose=0, bytes_warning=0, quiet=0)
[1/1] test_dis
test_big_linenos (test.test_dis.DisTests) ... ok
test_boundaries (test.test_dis.DisTests) ... ok
test_bug_1333982 (test.test_dis.DisTests) ... ok
test_bug_708901 (test.test_dis.DisTests) ... ok
test_dis (test.test_dis.DisTests) ... ok
test_dis_none (test.test_dis.DisTests) ... ok
test_dis_object (test.test_dis.DisTests) ... ok
test_dis_traceback (test.test_dis.DisTests) ... ok
test_disassemble_bytes (test.test_dis.DisTests) ... ok
test_disassemble_method (test.test_dis.DisTests) ... ok
test_disassemble_method_bytes (test.test_dis.DisTests) ... ok
test_disassemble_str (test.test_dis.DisTests) ... ok
test_opmap (test.test_dis.DisTests) ... ok
test_opname (test.test_dis.DisTests) ... ok
test_code_info (test.test_dis.CodeInfoTests) ... FAIL
test_code_info_object (test.test_dis.CodeInfoTests) ... ok
test_pretty_flags_no_flags (test.test_dis.CodeInfoTests) ... ok
test_show_code (test.test_dis.CodeInfoTests) ... FAIL

FAIL: test_code_info (test.test_dis.CodeInfoTests)
Traceback (most recent call last):
   File "/home/ci/prog/cpython/hotpy_new_dict/Lib/test/", 
line 439, in test_code_info
     self.assertRegex(dis.code_info(x), expected)
AssertionError: Regex didn't match: 'Name:              
f\nFilename:          (.*)\nArgument count:    1\nKw-only arguments: 
0\nNumber of locals:  1\nStack size:        8\nFlags:             
OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n   0: None\nNames:\n   0: 
print\nVariable names:\n   0: c\nFree variables:\n   0: e\n   1: d\n   
2: f\n   3: y\n   4: x\n   5: z' not found in 'Name:              
count:    1\nKw-only arguments: 0\nNumber of locals:  1\nStack 
size:        8\nFlags:             OPTIMIZED, NEWLOCALS, 
NESTED\nConstants:\n   0: None\nNames:\n   0: print\nVariable names:\n   
0: c\nFree variables:\n   0: y\n   1: e\n   2: d\n   3: f\n   4: x\n   5: z'

FAIL: test_show_code (test.test_dis.CodeInfoTests)
Traceback (most recent call last):
   File "/home/ci/prog/cpython/hotpy_new_dict/Lib/test/", 
line 446, in test_show_code
     self.assertRegex(output.getvalue(), expected+"\n")
AssertionError: Regex didn't match: 'Name:              
f\nFilename:          (.*)\nArgument count:    1\nKw-only arguments: 
0\nNumber of locals:  1\nStack size:        8\nFlags:             
OPTIMIZED, NEWLOCALS, NESTED\nConstants:\n   0: None\nNames:\n   0: 
print\nVariable names:\n   0: c\nFree variables:\n   0: e\n   1: d\n   
2: f\n   3: y\n   4: x\n   5: z\n' not found in 'Name:              
count:    1\nKw-only arguments: 0\nNumber of locals:  1\nStack 
size:        8\nFlags:             OPTIMIZED, NEWLOCALS, 
NESTED\nConstants:\n   0: None\nNames:\n   0: print\nVariable names:\n   
0: c\nFree variables:\n   0: y\n   1: e\n   2: d\n   3: f\n   4: x\n   
5: z\n'

Ran 18 tests in 0.070s

FAILED (failures=2)
test test_dis failed
1 test failed:
[111919 refs]

For test gdb:

Lots of output .....

Ran 42 tests in 11.361s

FAILED (failures=28)
test test_gdb failed
1 test failed:
[109989 refs]

From p.f.moore at  Sun Jan 29 23:02:44 2012
From: p.f.moore at (Paul Moore)
Date: Sun, 29 Jan 2012 22:02:44 +0000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 29 January 2012 21:39, Gregory P. Smith <greg at> wrote:

> An example of this working: ipaddr is ready to go in. It got the
> eyeballs and API modifications while still a pypi library as a result
> of the discussion around the time it was originally suggested as being
> added.  I or any other committers have simply not added it yet.

Interesting. I recall the API debates and uncertainty, but I don't
recall having seen anything to indicate that it all got resolved and
we're essentially "ready to go". If I were looking for an IP address
library, I wouldn't know where to go, and I certainly wouldn't know
that there was an option that would become part of the stdlib. Not
sure that counts as the approach "working"... (although I concede that
my lack of a *real* need for an IP address library may be a
contributing factor to my lack of knowledge...)


From martin at  Sun Jan 29 23:15:37 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 29 Jan 2012 23:15:37 +0100
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>
Message-ID: <>

> Now that issue 13703 has been largely settled,
> I want to propose my new dictionary implementation again.
> It is a little more polished than before.

Please clarify the status of that code: are you actually proposing
6a21f3b35e20 for inclusion into Python as-is? If so, please post it
as a patch to the tracker, as it will need to be reviewed (possibly
with requests for further changes).

If not, it would be good if you could give a list of things that need to
be done before you consider submission to Python.

Also, please submit a contrib form if you haven't done so.


From mark at  Sun Jan 29 23:26:21 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 22:26:21 +0000
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <> <>
Message-ID: <>

francis wrote:
> On 01/29/2012 11:31 AM, Mark Shannon wrote:
>> It passes all the tests.
>> (I had to change a couple that relied on dict repr() ordering)
> Hi Mark,
> I've cloned the repo, built it, and then tried ./python -m test. I
> got some errors:
> First in general:
> 340 tests OK.
> 2 tests failed:
>     test_dis test_gdb

> ****************************************************
> then test_dis:
> ======================================================================
> FAIL: test_code_info (test.test_dis.CodeInfoTests)
> ----------------------------------------------------------------------
> ======================================================================
> FAIL: test_show_code (test.test_dis.CodeInfoTests)
> ----------------------------------------------------------------------

These are known failures, the tests are at fault as they rely on dict 
ordering. However, they should be commented out. Probably crept back in 
again when I pulled the latest version of cpython -- I'll fix them now.
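
As a small sketch of why these tests are fragile, and not the actual fix
applied to test_dis: the two failures above differ only in the order of
the free variable names, which comes from dict iteration order. The two
lists below are copied from the expected and actual output in the failure
report; comparing them without regard to order is one possible way to
make such a check independent of the dict implementation.

```python
# Free variable names in the order the test expected them.
expected = ["e", "d", "f", "y", "x", "z"]
# The same names in the order produced with the new dict layout.
actual = ["y", "e", "d", "f", "x", "z"]

# Order-sensitive comparison: breaks whenever dict ordering changes.
order_sensitive_match = (actual == expected)

# Order-insensitive comparison: only the set of names matters.
order_insensitive_match = (sorted(actual) == sorted(expected))
```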

> *****************************************************
> For test gdb:
> Lots of output .....
> Ran 42 tests in 11.361s
> FAILED (failures=28)
> test test_gdb failed
> 1 test failed:
>     test_gdb
> [109989 refs]

I still have gdb 6.something,
would you mail me the full output please,
so I can see what the problem is.


From barry at  Sun Jan 29 23:44:07 2012
From: barry at (Barry Warsaw)
Date: Sun, 29 Jan 2012 17:44:07 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 28, 2012, at 07:29 PM, Guido van Rossum wrote:

>Finally, if you really want to put warnings in whenever an
>experimental module is being used, make it a silent warning, like
>SilentDeprecationWarning. That allows people to request more strict
>warnings without unduly alarming the users of an app.

I'll just note too that we have examples of "stable" APIs in modules being
used successfully in the field for years, and still having long hand-wringing
debates about whether the API choices are right or not.  <cough>email</cough>

Nothing beats people beating on it heavily for years in production code to
shake things out.  I often think a generic answer to "did I get the API right"
could be "no, but it's okay" :)


From mark at  Sun Jan 29 23:44:53 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 22:44:53 +0000
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis wrote:
>> Now that issue 13703 has been largely settled,
>> I want to propose my new dictionary implementation again.
>> It is a little more polished than before.
> Please clarify the status of that code: are you actually proposing
> 6a21f3b35e20 for inclusion into Python as-is? If so, please post it
> as a patch to the tracker, as it will need to be reviewed (possibly
> with requests for further changes).

I thought it already was a patch. What do I need to do to make it a patch?

> If not, it would be good if you could give a list of things that need to
> be done before you consider submission to Python.

A few tests that rely on dict ordering should probably be fixed first.
I'll submit bug reports for those.

> Also, please submit a contrib form if you haven't done so.

Where do I find it?


From martin at  Sun Jan 29 23:57:36 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 29 Jan 2012 23:57:36 +0100
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

>     I... I think I might have already done this, inadvertently.  I
>     needed an x64 VS2010 debug build of Subversion/APR*/Python a few
>     weeks ago -- forgetting the fact that we're still on VS2008.

There is a lot of duplication of work going on here: at least four
people have done the same. The more people duplicate the work, the
more urgent it apparently becomes that the trunk switches "officially".

>   * Three new buildbot scripts:
>         - build-amd64-vs10.bat
>         - clean-amd64-vs10.bat
>         - external-amd64-vs10.bat

When we switch, these should actually replace the current ones, rather
than being additions.

>     So, I guess my question is, is that work useful?

Perhaps not, given that several other copies of that to draw from may
exist. OTOH, I haven't heard anybody reporting these specific changes.
In any case, it's now in Brian's hand.


From martin at  Mon Jan 30 00:01:24 2012
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 30 Jan 2012 00:01:24 +0100
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <> <>
Message-ID: <>

>> Please clarify the status of that code: are you actually proposing
>> 6a21f3b35e20 for inclusion into Python as-is? If so, please post it
>> as a patch to the tracker, as it will need to be reviewed (possibly
>> with requests for further changes).
> I thought it already was a patch. What do I need to do to make it a patch?

I missed your announcement of issue13903; all is fine here.

> Where do I find it?


From mark at  Mon Jan 30 00:20:32 2012
From: mark at (Mark Shannon)
Date: Sun, 29 Jan 2012 23:20:32 +0000
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Matt Joiner wrote:
> Mark, Good luck with getting this in, I'm also hopeful about coroutines, 
> maybe after pushing your dict optimization your coroutine implementation 
> will get more consideration.

Shush, don't say the C word or you'll put people off ;)

I'm actually not that fussed about the coroutine implementation.
With "yield from" generators have all the power of asymmetric coroutines.
I think my coroutine implementation is a neater way to do things,
but it is not worth the fuss.
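
For readers unfamiliar with "yield from" (PEP 380, new in Python 3.3),
here is a minimal sketch of the delegation it provides. The generator
names averager and delegator are invented for the example: values sent to
the outer generator are passed through to the inner one, and the inner
generator's return value becomes the value of the "yield from" expression.

```python
def averager():
    """Subgenerator: receives numbers via send(), returns the final mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        if value is None:          # sentinel: stop and hand back the result
            return average
        total += value
        count += 1
        average = total / count


def delegator(results):
    """Outer generator: send() calls pass straight through to averager()."""
    final = yield from averager()
    results.append(final)


results = []
gen = delegator(results)
next(gen)                  # prime: run to the first yield inside averager()
first = gen.send(10)       # running mean after one value
second = gen.send(20)      # running mean after two values
try:
    gen.send(None)         # averager() returns; delegator() then finishes
except StopIteration:
    pass
```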

Anyway, I'm working on my next crazy experiment :)


From steve at  Mon Jan 30 00:30:14 2012
From: steve at (Steven D'Aprano)
Date: Mon, 30 Jan 2012 10:30:14 +1100
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Mark Shannon wrote:
> Antoine Pitrou wrote:
>> On Sun, 29 Jan 2012 09:56:11 -0500
>> Benjamin Peterson <benjamin at> wrote:
>>> 2012/1/29 Mark Shannon <mark at>:
>>>> Hi,
>>>> Now that issue 13703 has been largely settled,
>>>> I want to propose my new dictionary implementation again.
>>>> It is a little more polished than before.
>>> If you're serious about changing the dictionary implementation, I
>>> think you should write a PEP. It should explain the new dicts
>>> advantages (and disadvantages?) and give comprehensive benchmark
>>> numbers. Something along the lines of
>>> I should think.
>> "New dictionary implementation" is a misnomer here. Mark's patch merely
>> allows to share the keys array between several dictionaries. The lookup
>> algorithm remains exactly the same as far as I've read. It's actually
>> much less invasive than e.g. Martin's AVL trees-for-hash-collisions
>> proposal.
> Antoine is right. It is a reorganisation of the dict, plus a couple of 
> changes to typeobject.c and object.c to ensure that instance 
> dictionaries do indeed share keys arrays.

I don't quite follow how that could work.

If I have this:

class C:
    pass

a = C()
b = C()

a.spam = 1
b.ham = 2

how can a.__dict__ and b.__dict__ share key arrays? I've tried reading the 
source, but I'm afraid I don't understand it well enough to make sense of it.


From ncoghlan at  Mon Jan 30 00:46:13 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 30 Jan 2012 09:46:13 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 30, 2012 at 8:44 AM, Barry Warsaw <barry at> wrote:
> Nothing beats people beating on it heavily for years in production code to
> shake things out.  I often think a generic answer to "did I get the API right"
> could be "no, but it's okay" :)

Heh, my answer to complaints about the urllib (etc) APIs being
horrendous in the modern web era is to point out that they were put
together in an age where "web" mostly meant "unauthenticated HTTP GET
requests".

They're hard to use for modern authentication protocols because they
*predate* widespread use of such things...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From victor.stinner at  Mon Jan 30 00:46:32 2012
From: victor.stinner at (Victor Stinner)
Date: Mon, 30 Jan 2012 00:46:32 +0100
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

>> import threading
>> s = threading.Semaphore(0.5)
> But why would you want to pass a float? It seems like API abuse to me.

If something should be changed, Semaphore(arg) should raise a
TypeError if arg is not an integer.
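
A minimal sketch of what such a check could look like, written as a
subclass rather than a change to the stdlib. It assumes a Python 3
version in which threading.Semaphore is a class that can be subclassed;
the name StrictSemaphore and the error message are made up for the
example.

```python
import numbers
import threading


class StrictSemaphore(threading.Semaphore):
    """Semaphore that refuses non-integer initial values (illustrative only)."""

    def __init__(self, value=1):
        if not isinstance(value, numbers.Integral):
            raise TypeError("semaphore initial value must be an integer")
        super().__init__(value)


ok = StrictSemaphore(2)        # accepted: 2 is an integer

try:
    StrictSemaphore(0.5)       # rejected before the counter is ever stored
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None
```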


From anacrolix at  Mon Jan 30 01:31:34 2012
From: anacrolix at (Matt Joiner)
Date: Mon, 30 Jan 2012 11:31:34 +1100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

I think an advocacy of 3rd party modules would start with modules such as
ipaddr, requests, regex. Linking directly to them from the python core
documentation, while requesting they hold a successful moratorium in order
to be included in a later standard module release.
On Jan 30, 2012 10:47 AM, "Nick Coghlan" <ncoghlan at> wrote:

> On Mon, Jan 30, 2012 at 8:44 AM, Barry Warsaw <barry at> wrote:
> > Nothing beats people beating on it heavily for years in production code
> to
> > shake things out.  I often think a generic answer to "did I get the API
> right"
> > could be "no, but it's okay" :)
> Heh, my answer to complaints about the urllib (etc) APIs being
> horrendous in the modern web era is to point out that they were put
> together in an age where "web" mostly meant "unauthenticated HTTP GET
> requests".
> They're hard to use for modern authentication protocols because they
> *predate* widespread use of such things...
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From francismb at  Mon Jan 30 00:22:07 2012
From: francismb at (francis)
Date: Mon, 30 Jan 2012 00:22:07 +0100
Subject: [Python-Dev] A new dictionary implementation
In-Reply-To: <>
References: <> <>
Message-ID: <>

> I still have gdb 6.something,
> would you mail me the full output please,
> so I can see what the problem is.
It's done, let me know if you need more output.


From jxo6948 at  Mon Jan 30 03:11:08 2012
From: jxo6948 at (John O'Connor)
Date: Sun, 29 Jan 2012 21:11:08 -0500
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson <benjamin at> wrote:
> But why would you want to pass a float? It seems like API abuse to me.

Agreed. Anything else seems meaningless.

From ethan at  Mon Jan 30 05:51:29 2012
From: ethan at (Ethan Furman)
Date: Sun, 29 Jan 2012 20:51:29 -0800
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <jg30j2$q67$>
References: <>	<>	<>
Message-ID: <>

Latest addition for PEP 409 has been sent.  Text follows:

Language Details

Currently, __context__ and __cause__ start out as None, and then get set
as exceptions occur.

To support 'from None', __context__ will stay as it is, but __cause__
will start out as False, and will change to None when the 'raise ...
from None' method is used.

If __cause__ is False the __context__ (if any) will be printed.

If __cause__ is None the __context__ will not be printed.

If __cause__ is anything else, __cause__ will be printed.

This has the benefit of leaving the __context__ intact for future
logging, querying, etc., while suppressing its display if it is not caught.

raise ... from ... is not disallowed outside a try block, but this
behavior is not guaranteed to remain.


Should that last disclaimer be there?  Should it be changed?
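
For reference, a short sketch of the user-visible effect being specified.
It runs on Python 3.3 or later, where 'raise ... from None' was
eventually released; note that the released implementation records the
suppression in a separate __suppress_context__ attribute (PEP 415)
instead of the False/None distinction on __cause__ described above. The
function name get_attribute is invented for the example.

```python
def get_attribute(store, name):
    """Translate a KeyError into an AttributeError, hiding the KeyError."""
    try:
        return store[name]
    except KeyError:
        raise AttributeError(name) from None


try:
    get_attribute({}, "spam")
except AttributeError as exc:
    caught = exc

cause = caught.__cause__                 # None: set explicitly by "from None"
context = caught.__context__             # the original KeyError, still kept
suppressed = caught.__suppress_context__ # True: traceback display skips it
```

The original KeyError stays reachable for logging or inspection, but the
default traceback printing no longer shows it.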


From ncoghlan at  Mon Jan 30 07:23:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 30 Jan 2012 16:23:01 +1000
Subject: [Python-Dev] PEP for allowing 'raise NewException from None'
In-Reply-To: <>
References: <>
	<> <jg30j2$q67$>
Message-ID: <>

On Mon, Jan 30, 2012 at 2:51 PM, Ethan Furman <ethan at> wrote:
> raise ... from ... is not disallowed outside a try block, but this
> behavior is not guaranteed to remain.
> ------------------------------------------------------------------
> Should that last disclaimer be there?  Should it be changed?

I'd leave it out - the original PEP didn't disallow it, enforcing it
would be annoying, and it's easy enough to pick up if you happen to
care (it will mean __cause__ is set along with __context__ ==

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From thcberserk at  Mon Jan 30 10:51:45 2012
From: thcberserk at (Ivano)
Date: Mon, 30 Jan 2012 09:51:45 +0000 (UTC)
Subject: [Python-Dev] Release cycle question
Message-ID: <>

Hello everyone.
I'm writing to ask if Python uses a "fixed" release
time or if it depends strongly on something else.
In example, Blender does and since I'm diving
into Python because I would like to extend it, I 
would like to know if my work will have a default
lifetime or not.
By the way, Python 3 changed the game AFAIK,
will another major change come soon?
Thanks in advance for any help.

Bye, Ivano.

From ncoghlan at  Mon Jan 30 11:55:06 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 30 Jan 2012 20:55:06 +1000
Subject: [Python-Dev] Release cycle question
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 30, 2012 at 7:51 PM, Ivano <thcberserk at> wrote:
> Hello everyone.
> I'm writing to ask if Python uses a "fixed" release
> time or if it depends strongly on something else.
> For example, Blender does, and since I'm diving
> into Python because I would like to extend it, I
> would like to know if my work will have a default
> lifetime or not.

Hi Ivano,

The current release cycle is documented in the developer's guide:

At this point in time, there are two official releases:
Python 2.7 and 3.2

2.7 was released in July 2010 and will receive maintenance updates
until around July 2015 (as it is the final release in the 2.x series)
3.2 was released in February 2011 and will receive maintenance updates
until around August this year (but will receive further source-only
security updates until around February 2016)
3.3 is due for release in August this year.

However, those are the official support dates specifically for
python-dev. OS vendors such as Red Hat and Canonical provide support
for older versions of Python as part of their enterprise releases
(e.g. RHEL5 is still supported by Red Hat and ships with Python 2.4,
even though python-dev ended upstream security updates for 2.4 in

> By the way, Python 3 changed the game AFAIK,
> will another major change come soon?

No, as noted on the development cycle page, changes on the scale of
those between Python 2 and Python 3 are not expected any time in the
near future. I'd personally be surprised if anything like that
transition happened again within the next decade.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From brett at  Mon Jan 30 18:03:20 2012
From: brett at (Brett Cannon)
Date: Mon, 30 Jan 2012 12:03:20 -0500
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Jan 27, 2012 at 21:33, Benjamin Peterson <benjamin at> wrote:

> 2012/1/27 Steven D'Aprano <steve at>:
> > Benjamin Peterson wrote:
> >>
> >> Hello everyone,
> >> In effort to get a fix out before Perl 6 goes mainstream, Barry and I
> >> have decided to pronounce on what we want for our stable releases.
> >> What we have decided is that
> >> 1. Simple hash randomization is the way to go. We think this has the
> >> best chance of actually fixing the problem while being fairly
> >> straightforward such that we're comfortable putting it in a stable
> >> release.
> >> 2. It will be off by default in stable releases and enabled by an
> >> envar at runtime. This will prevent code breakage from dictionary
> >> order changing as well as people depending on the hash stability.
> >
> > Do you have the expectation that it will become on by default in some
> future
> > release?
> Yes, 3.3. The solution in 3.3 could even be one of the more
> sophisticated proposals we have today.

I think that would be good. And I would  even argue we remove support for
turning it off to force people to no longer lean on dict ordering as a
crutch (in 3.3 obviously).

From barry at  Mon Jan 30 18:14:44 2012
From: barry at (Barry Warsaw)
Date: Mon, 30 Jan 2012 12:14:44 -0500
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 30, 2012, at 12:03 PM, Brett Cannon wrote:

>I think that would be good. And I would  even argue we remove support for
>turning it off to force people to no longer lean on dict ordering as a
>crutch (in 3.3 obviously).

Yes, please!


From scott+python-dev at  Mon Jan 30 19:19:56 2012
From: scott+python-dev at (Scott Dial)
Date: Mon, 30 Jan 2012 13:19:56 -0500
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/29/2012 4:39 PM, Gregory P. Smith wrote:
> An example of this working: ipaddr is ready to go in. It got the
> eyeballs and API modifications while still a pypi library as a result
> of the discussion around the time it was originally suggested as being
> added.  I or any other committers have simply not added it yet.

This is wrong. PEP 3144 was not pronounced upon, so ipaddr is not just
waiting for someone to commit it; it's waiting on consensus and
pronouncement.
PEP 3144 wasn't pronounced upon because there were significant
disagreements about the design of the API proposed in the PEP. As it
stands, I believe the authorship of ipaddr either decided that they were
not going to compromise their module or lost interest.

See Nick Coghlan's summary:

Scott Dial
scott at

From guido at  Mon Jan 30 19:29:44 2012
From: guido at (Guido van Rossum)
Date: Mon, 30 Jan 2012 10:29:44 -0800
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

Maybe that's another example of waiting too long for the perfect
decision though. In the last ~12 months, ipaddr was downloaded at
least 11,000 times from its home
( There's been a
fair amount of changes over that time and a new release was put out 10
days ago. What are the stats for the "competing" package?


On Mon, Jan 30, 2012 at 10:19 AM, Scott Dial
<scott+python-dev at> wrote:
> On 1/29/2012 4:39 PM, Gregory P. Smith wrote:
>> An example of this working: ipaddr is ready to go in. It got the
>> eyeballs and API modifications while still a pypi library as a result
>> of the discussion around the time it was originally suggested as being
>> added.  I or any other committers have simply not added it yet.
> This is wrong. PEP 3144 was not pronounced upon, so ipaddr is not just
> waiting for someone to commit it; it's waiting on consensus and
> pronouncement.
> PEP 3144 wasn't pronounced upon because there were significant
> disagreements about the design of the API proposed in the PEP. As it
> stands, I believe the authorship of ipaddr either decided that they were
> not going to compromise their module or lost interest.
> See Nick Coghlan's summary:
> --
> Scott Dial
> scott at

--Guido van Rossum (

From bauertomer at  Mon Jan 30 19:40:32 2012
From: bauertomer at (T.B.)
Date: Mon, 30 Jan 2012 20:40:32 +0200
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-30 01:46, Victor Stinner wrote:
>> But why would you want to pass a float? It seems like API abuse to me.
> If something should be changed, Semaphore(arg) should raise a
> TypeError if arg is not an integer.
Short version:
I propose the change to be
-        while self._value == 0:
+        while self._value < 1:
This should not change the flow when Semaphore._value is an int.

Longer explanation:
I thought it is surprising to use math.floor() for threading.Semaphore, 
but now as you propose, we will need to use something like 
int(math.floor(value)) in Python2.x - which is even more surprising. 
That is because math.floor() (and round() for that matter) return a 
float object in Python2.x.

Note: isinstance(4.0, numbers.Integral) is False, even in Python3.x, but 
until now 4.0 was valid as a value for Semaphore(). Also, using the 
builtin int()/math.trunc() on a float is probably not what you want 
here, but rather math.floor().
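
The distinctions drawn in the two paragraphs above can be checked
directly. This is a small sketch meant for Python 3, where math.floor()
already returns an int; the variable names are only for illustration.

```python
import math
import numbers

value = 4.0

# A float is not an Integral, even when it holds a whole number.
is_integral = isinstance(value, numbers.Integral)

# floor() rounds toward negative infinity ...
floored = math.floor(-0.5)
# ... while trunc() and int() round toward zero.
truncated = math.trunc(-0.5)
converted = int(-0.5)

# On Python 3 the result of floor() is an int, so no extra int() is needed.
floor_is_int = isinstance(math.floor(4.7), int)
```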

The value argument given to threading.Semaphore() is really a duck (or 
an object) that can be compared to 0 and 1, incremented by 1 and 
decremented by 1. These are properties that fit float. Why should you 
force the entire builtin int behavior on that object?

I agree that using a float as the counter smells bad, but at times you 
might have something like a fractional resource (which is different from 
a floating point number). In such cases Semaphore.acquire(), after the 
tiny patch above, can be thought as checking if you have at least one 
"unit of resource" available. If you do have at least one such resource 
- acquire it. This will make sure the invariant "The counter can never 
go below zero" holds.
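
To make the difference between the two loop conditions concrete, here is
a toy, non-blocking model of the counter. It is a sketch only: ToyCounter
and try_acquire are invented names, there is no locking or waiting, and
it is not the threading.Semaphore implementation. The "patched" flag
selects the "< 1" test proposed above instead of the "== 0" test.

```python
class ToyCounter:
    """Non-blocking model of a semaphore counter (illustrative only)."""

    def __init__(self, value, patched):
        self.value = value
        self.patched = patched

    def try_acquire(self):
        # Original condition treats only exactly 0 as "nothing available";
        # the proposed condition treats anything below 1 that way.
        if self.patched:
            exhausted = self.value < 1
        else:
            exhausted = self.value == 0
        if exhausted:
            return False
        self.value -= 1
        return True


original = ToyCounter(0.5, patched=False)
original_result = original.try_acquire()   # succeeds, counter goes negative

patched = ToyCounter(0.5, patched=True)
patched_result = patched.try_acquire()     # refused, counter unchanged

# With an int counter both variants behave the same way.
int_original = ToyCounter(2, patched=False)
int_patched = ToyCounter(2, patched=True)
int_original_results = [int_original.try_acquire() for _ in range(3)]
int_patched_results = [int_patched.try_acquire() for _ in range(3)]
```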


From guido at  Mon Jan 30 19:52:29 2012
From: guido at (Guido van Rossum)
Date: Mon, 30 Jan 2012 10:52:29 -0800
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

TB, what's your use case for passing a float to a semaphore?
Semaphores are conceptually tied to integers. You've kept arguing a
few times now that the workarounds you need are clumsy, but you've not
explained why you're passing floats in the first place. A "fractional
resource" just doesn't sound like a real use case to me.

On Mon, Jan 30, 2012 at 10:40 AM, T.B. <bauertomer at> wrote:
> On 2012-01-30 01:46, Victor Stinner wrote:
>>> But why would you want to pass a float? It seems like API abuse to me.
>> If something should be changed, Semaphore(arg) should raise a
>> TypeError if arg is not an integer.
> Short version:
> I propose the change to be
> -        while self._value == 0:
> +        while self._value < 1:
> This should not change the flow when Semaphore._value is an int.
> Longer explanation:
> I thought it is surprising to use math.floor() for threading.Semaphore, but
> now as you propose, we will need to use something like
> int(math.floor(value)) in Python2.x - which is even more surprising. That is
> because math.floor() (and round() for that matter) return a float object in
> Python2.x.
> Note: isinstance(4.0, numbers.Integral) is False, even in Python3.x, but
> until now 4.0 was valid as a value for Semaphore(). Also, using the builtin
> int()/math.trunc() on a float is probably not what you want here, but rather
> math.floor().
> The value argument given to threading.Semaphore() is really a duck (or an
> object) that can be compared to 0 and 1, incremented by 1 and decremented by
> 1. These are properties that fit float. Why should you force the entire
> builtin int behavior on that object?
> I agree that using a float as the counter smells bad, but at times you might
> have something like a fractional resource (which is different from a
> floating point number). In such cases Semaphore.acquire(), after the tiny
> patch above, can be thought as checking if you have at least one "unit of
> resource" available. If you do have at least one such resource - acquire it.
> This will make sure the invariant "The counter can never go below zero"
> holds.
> Regards,
> TB
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

--Guido van Rossum (

From solipsis at  Mon Jan 30 19:59:22 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 30 Jan 2012 19:59:22 +0100
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
References: <>
Message-ID: <>

On Sun, 29 Jan 2012 16:42:28 +1000
Nick Coghlan <ncoghlan at> wrote:
> On Sun, Jan 29, 2012 at 1:29 PM, Guido van Rossum <guido at> wrote:
> > On Sat, Jan 28, 2012 at 5:33 PM, Nick Coghlan <ncoghlan at> wrote:
> >> I'm willing to go along with that (especially given your report of
> >> AppEngine's experience with the "labs" namespace).
> >>
> >> Can we class this as a pronouncement on PEP 408? That is, "No to
> >> adding a __preview__ namespace, but yes to adding regex directly for
> >> 3.3"?
> >
> > Yup. We seem to have a tendency to over-analyze decisions a bit lately
> > (witness the hand-wringing about the hash collision DoS attack).
> I have now updated PEP 408 accordingly (i.e. rejected, but with a
> specific note about regex).

It would be nice if that pronouncement or decision could outline the
steps required to include an "experimental" module in the stdlib, and
the steps required to move it from "experimental" to "stable".



From stefan at  Mon Jan 30 20:06:44 2012
From: stefan at (stefan brunthaler)
Date: Mon, 30 Jan 2012 11:06:44 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <>


> Could you try benchmarking with the "standard" benchmarks:
> and see what sort of performance gains you get?
Yeah, of course. I already did. Refer to the page listed below for
details. I did not look into the results yet, though.

> How portable is the threaded interpreter?
Well, you can implement threaded code on any machine that supports
indirect branch instructions. Fortunately, GCC supports the
"label-as-values" feature, which makes it available on any machine
that supports GCC. My optimizations themselves are portable, and I
tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports
this feature, too.)

> Do you have a public repository for the code, so we can take a look?
I have created a patch (as Benjamin wanted) and put all of the
resources (i.e., benchmark results and the patch itself) on my home


From solipsis at  Mon Jan 30 20:13:52 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 30 Jan 2012 20:13:52 +0100
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
References: <>
Message-ID: <>


> Well, you can implement threaded code on any machine that supports
> indirect branch instructions. Fortunately, GCC supports the
> "label-as-values" feature, which makes it available on any machine
> that supports GCC. My optimizations themselves are portable, and I
> tested them on a PowerPC for my thesis, too. (AFAIR, llvm supports
> this feature, too.)

Well, you're aware that Python already uses threaded code where
available? Or are you testing against Python 2?



From stefan at  Mon Jan 30 20:18:09 2012
From: stefan at (stefan brunthaler)
Date: Mon, 30 Jan 2012 11:18:09 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <>

> Well, you're aware that Python already uses threaded code where
> available? Or are you testing against Python 2?
Yes, and I am building on that.


From ncoghlan at  Mon Jan 30 22:07:53 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 07:07:53 +1000
Subject: [Python-Dev] plugging the hash attack
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 3:03 AM, Brett Cannon <brett at> wrote:
> I think that would be good. And I would even argue we remove support for
> turning it off to force people to no longer lean on dict ordering as a
> crutch (in 3.3 obviously).

On-by-default should be enough to cover that. Just as we allow people
to force the random seed to reproduce particular sequences, there's
value in being able to increase determinism in cases where the
collision attack isn't a concern.
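
The random-seed analogy, as a minimal sketch. The seed value is arbitrary, and this only shows the existing random module behaviour, not any hash-seed option.

```python
import random

# Two generators forced to the same (arbitrary) seed.
first = random.Random(1234)
second = random.Random(1234)

seq1 = [first.randrange(100) for _ in range(5)]
seq2 = [second.randrange(100) for _ in range(5)]

# Same seed, same sequence: this is the determinism being referred to.
same = seq1 == seq2
```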


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From brett at  Mon Jan 30 22:26:30 2012
From: brett at (Brett Cannon)
Date: Mon, 30 Jan 2012 16:26:30 -0500
Subject: [Python-Dev] [Python-checkins] cpython: Issue #8828: Add new
 function os.replace(), for cross-platform renaming with
In-Reply-To: <>
References: <>
Message-ID: <>

Should this end up being used in importlib through _os?

On Mon, Jan 30, 2012 at 16:11, antoine.pitrou <python-checkins at> wrote:

> changeset:   74689:80ddbd822227
> user:        Antoine Pitrou <solipsis at>
> date:        Mon Jan 30 22:08:52 2012 +0100
> summary:
>  Issue #8828: Add new function os.replace(), for cross-platform renaming
> with overwriting.
> files:
>  Doc/library/os.rst    |  18 +++++++++-
>  Lib/test/   |  12 ++++++
>  Misc/NEWS             |   3 +
>  Modules/posixmodule.c |  55 +++++++++++++++++++++---------
>  4 files changed, 69 insertions(+), 19 deletions(-)
> diff --git a/Doc/library/os.rst b/Doc/library/os.rst
> --- a/Doc/library/os.rst
> +++ b/Doc/library/os.rst
> @@ -1889,8 +1889,9 @@
>    Unix flavors if *src* and *dst* are on different filesystems.  If
> successful,
>    the renaming will be an atomic operation (this is a POSIX requirement).
>  On
>    Windows, if *dst* already exists, :exc:`OSError` will be raised even if
> it is a
> -   file; there may be no way to implement an atomic rename when *dst*
> names an
> -   existing file.
> +   file.
> +
> +   If you want cross-platform overwriting of the destination, use
> :func:`replace`.
>    Availability: Unix, Windows.
> @@ -1908,6 +1909,19 @@
>       permissions needed to remove the leaf directory or file.
> +.. function:: replace(src, dst)
> +
> +   Rename the file or directory *src* to *dst*.  If *dst* is a directory,
> +   :exc:`OSError` will be raised.  If *dst* exists and is a file, it will
> +   be replaced silently if the user has permission.  The operation may
> fail
> +   if *src* and *dst* are on different filesystems.  If successful,
> +   the renaming will be an atomic operation (this is a POSIX requirement).
> +
> +   Availability: Unix, Windows
> +
> +   .. versionadded:: 3.3
> +
> +
>  .. function:: rmdir(path)
>    Remove (delete) the directory *path*.  Only works when the directory is
> diff --git a/Lib/test/ b/Lib/test/
> --- a/Lib/test/
> +++ b/Lib/test/
> @@ -129,6 +129,18 @@
>         self.fdopen_helper('r')
>         self.fdopen_helper('r', 100)
> +    def test_replace(self):
> +        TESTFN2 = support.TESTFN + ".2"
> +        with open(support.TESTFN, 'w') as f:
> +            f.write("1")
> +        with open(TESTFN2, 'w') as f:
> +            f.write("2")
> +        self.addCleanup(os.unlink, TESTFN2)
> +        os.replace(support.TESTFN, TESTFN2)
> +        self.assertRaises(FileNotFoundError, os.stat, support.TESTFN)
> +        with open(TESTFN2, 'r') as f:
> +            self.assertEqual(f.read(), "1")
> +
>  # Test attributes on return values from os.*stat* family.
>  class StatAttributeTests(unittest.TestCase):
> diff --git a/Misc/NEWS b/Misc/NEWS
> --- a/Misc/NEWS
> +++ b/Misc/NEWS
> @@ -463,6 +463,9 @@
>  Library
>  -------
> +- Issue #8828: Add new function os.replace(), for cross-platform renaming
> +  with overwriting.
> +
>  - Issue #13848: open() and the FileIO constructor now check for NUL
>   characters in the file name.  Patch by Hynek Schlawack.
> diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
> --- a/Modules/posixmodule.c
> +++ b/Modules/posixmodule.c
> @@ -3280,17 +3280,16 @@
>  #endif /* HAVE_SETPRIORITY */
> -PyDoc_STRVAR(posix_rename__doc__,
> -"rename(old, new)\n\n\
> -Rename a file or directory.");
> -
> -static PyObject *
> -posix_rename(PyObject *self, PyObject *args)
> +static PyObject *
> +internal_rename(PyObject *self, PyObject *args, int is_replace)
>  {
>  #ifdef MS_WINDOWS
>     PyObject *src, *dst;
>     BOOL result;
> -    if (PyArg_ParseTuple(args, "UU:rename", &src, &dst))
> +    int flags = is_replace ? MOVEFILE_REPLACE_EXISTING : 0;
> +    if (PyArg_ParseTuple(args,
> +                         is_replace ? "UU:replace" : "UU:rename",
> +                         &src, &dst))
>     {
>         wchar_t *wsrc, *wdst;
> @@ -3301,16 +3300,17 @@
>         if (wdst == NULL)
>             return NULL;
> -        result = MoveFileW(wsrc, wdst);
> +        result = MoveFileExW(wsrc, wdst, flags);
>         if (!result)
> -            return win32_error("rename", NULL);
> +            return win32_error(is_replace ? "replace" : "rename", NULL);
>         Py_INCREF(Py_None);
>         return Py_None;
>     }
>     else {
>         PyErr_Clear();
> -        if (!PyArg_ParseTuple(args, "O&O&:rename",
> +        if (!PyArg_ParseTuple(args,
> +                              is_replace ? "O&O&:replace" : "O&O&:rename",
>                               PyUnicode_FSConverter, &src,
>                               PyUnicode_FSConverter, &dst))
>             return NULL;
> @@ -3319,15 +3319,15 @@
>             goto error;
> -        result = MoveFileA(PyBytes_AS_STRING(src),
> -                           PyBytes_AS_STRING(dst));
> +        result = MoveFileExA(PyBytes_AS_STRING(src),
> +                             PyBytes_AS_STRING(dst), flags);
>         Py_XDECREF(src);
>         Py_XDECREF(dst);
>         if (!result)
> -            return win32_error("rename", NULL);
> +            return win32_error(is_replace ? "replace" : "rename", NULL);
>         Py_INCREF(Py_None);
>         return Py_None;
> @@ -3337,10 +3337,30 @@
>         return NULL;
>     }
>  #else
> -    return posix_2str(args, "O&O&:rename", rename);
> -#endif
> -}
> -
> +    return posix_2str(args,
> +                      is_replace ? "O&O&:replace" : "O&O&:rename",
> rename);
> +#endif
> +}
> +
> +PyDoc_STRVAR(posix_rename__doc__,
> +"rename(old, new)\n\n\
> +Rename a file or directory.");
> +
> +static PyObject *
> +posix_rename(PyObject *self, PyObject *args)
> +{
> +    return internal_rename(self, args, 0);
> +}
> +
> +PyDoc_STRVAR(posix_replace__doc__,
> +"replace(old, new)\n\n\
> +Rename a file or directory, overwriting the destination.");
> +
> +static PyObject *
> +posix_replace(PyObject *self, PyObject *args)
> +{
> +    return internal_rename(self, args, 1);
> +}
>  PyDoc_STRVAR(posix_rmdir__doc__,
>  "rmdir(path)\n\n\
> @@ -10555,6 +10575,7 @@
>     {"readlink",        win_readlink, METH_VARARGS, win_readlink__doc__},
>  #endif /* !defined(HAVE_READLINK) && defined(MS_WINDOWS) */
>     {"rename",          posix_rename, METH_VARARGS, posix_rename__doc__},
> +    {"replace",         posix_replace, METH_VARARGS,
> posix_replace__doc__},
>     {"rmdir",           posix_rmdir, METH_VARARGS, posix_rmdir__doc__},
>     {"stat",            posix_stat, METH_VARARGS, posix_stat__doc__},
>     {"stat_float_times", stat_float_times, METH_VARARGS,
> stat_float_times__doc__},
> --
> Repository URL:
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at

From solipsis at  Mon Jan 30 22:34:29 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 30 Jan 2012 22:34:29 +0100
Subject: [Python-Dev] cpython: Issue #8828: Add new function
 os.replace(), for cross-platform renaming with
References: <>
Message-ID: <>

On Mon, 30 Jan 2012 16:26:30 -0500
Brett Cannon <brett at> wrote:
> Should this end up being used in importlib through _os?

Yes, probably. I hadn't thought about that.
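
For context, the kind of use this would enable is the usual "write to a temporary file, then swap it into place" pattern. The sketch below is only an illustration of that pattern with os.replace(); atomic_write is a made-up helper name, and this is not what importlib currently does.

```python
import os
import tempfile


def atomic_write(path, data):
    """Hypothetical helper: write bytes next to path, then swap into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so source and destination share a filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Overwrites an existing destination on both Unix and Windows.
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temporary file on failure.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

A reader of the destination then sees either the old contents or the new ones, never a partially written file (with the atomicity caveats from the docs above).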



From ncoghlan at  Mon Jan 30 22:44:26 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 07:44:26 +1000
Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard
 library __preview__ package)
Message-ID: <>

On Tue, Jan 31, 2012 at 4:19 AM, Scott Dial
<scott+python-dev at> wrote:
> PEP 3144 wasn't pronounced upon because there were significant
> disagreements about the design of the API proposed in the PEP. As it
> stands, I believe the authorship of ipaddr either decided that they were
> not going to compromise their module or lost interest.
> See Nick Coghlan's summary:

Peter Moody actually addressed all my comments from last year (alas, I
forgot that python-ideas got dropped from the latter part of the email
chain, so it became a private discussion between Peter, Guido and
myself). I apparently got distracted by other issues and never
followed up on Peter's final review request. The branch with the
relevant changes is here (these weren't added back into ipaddr
mainline since they aren't all backwards compatible with the existing
ipaddr API):

Peter was very responsive and accommodating during that discussion :)

(The notes below are an edited version of Peter's off-list reply to me
from last year, reflecting the final state of the ipaddr 3144 branch)

On Mon, Aug 29, 2011 at 7:09 PM, Nick Coghlan <ncoghlan at> wrote:

    I believe the PEP would be significantly more palatable with the
    following changes/additions:
    1. Draft ReStructuredText documentation for inclusion in the stdlib docs

(still needed)

    2. Removal of the "ip" attribute of IP network objects (since it makes
    the nominal "networks" behave like IP interface definitions)

the Class hierarchy now looks like:

_IPAddrBase(object) # mother of everything
_BaseAddress(_IPAddrBase) # base for addresses
_BaseNetwork(_IPAddrBase) # base for networks and interfaces, could
be renamed.
_BaseV4(object) # ipv4 base
_BaseV6(object) # ipv6 base

IPv4Address(_BaseV4, _BaseAddress)
IPv4Interface(_BaseV4, _BaseNetwork)
IPv4Network(IPv4Interface)

IPv6Address(_BaseV6, _BaseAddress)
IPv6Interface(_BaseV6, _BaseNetwork)
IPv6Network(IPv6Interface)

(essentially, the current ipaddr "Network" objects become "Interface"
objects in PEP 3144, with a new strict "Network" object that has no ip
attribute)

    3. "network" property renamed to "netaddr" (since it returns an
    address object rather than a network object)

renamed to network_address.
did the same for the broadcast_address.

    4. "strict" parameter removed from class signatures, replaced with
    class method for non-strict behaviour

'strict' is gone, just create IPv*Interface objects or use the
ip_interface API instead. Network objects are always strict.

    5. Factory functions renamed so they don't look like class names
    (ip_network, ip_address, ip)

Now ip_address, ip_network, ip_interface

    6. "strict" parameter on factory functions modified to default to True
    rather than False

'strict' is gone. Interfaces allow a host IP, Networks don't.

    7. Addition of an explicit "IPInterface" class to cover the
    association of an address with a specific network that is currently
    handled by storing arbitrary addresses on IP network objects

done.


So with a cleanup of the docstrings (and creation of some ReST docs
based on them) a definite +1 from me for inclusion of ipaddr (based on
the 3144 branch in SVN) in 3.3. (with the tweaks to the API, we may
want to use a different name like "ipaddress" or "iptools", though -
otherwise people could be legitimately confused by the differences
relative to the PyPI "ipaddr" module)
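
As a rough illustration of the Address / Network / Interface split described above, here is a sketch using the ipaddress module as it later shipped in the standard library. Treat that as an assumption: it may differ in detail from the 3144 branch discussed here (for one, the shipped ip_network() still accepts a strict keyword). The addresses are arbitrary examples.

```python
import ipaddress

# An interface is a host address together with the network it sits on.
iface = ipaddress.ip_interface("192.168.1.5/24")
net = ipaddress.ip_network("192.168.1.0/24")

host = iface.ip                   # the address part of the interface
iface_net = iface.network         # the network part of the interface
first = net.network_address       # was the "network" property
last = net.broadcast_address

# Networks are strict: host bits set in the address are rejected.
try:
    ipaddress.ip_network("192.168.1.5/24")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```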


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From pmoody at  Mon Jan 30 22:52:26 2012
From: pmoody at (Peter Moody)
Date: Mon, 30 Jan 2012 13:52:26 -0800
Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 --
 Standard library __preview__ package)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 30, 2012 at 1:44 PM, Nick Coghlan <ncoghlan at> wrote:
> On Tue, Jan 31, 2012 at 4:19 AM, Scott Dial
> <scott+python-dev at> wrote:
>> PEP 3144 wasn't pronounced upon because there were significant
>> disagreements about the design of the API proposed in the PEP. As it
>> stands, I believe the authorship of ipaddr either decided that they were
>> not going to compromise their module or lost interest.
>> See Nick Coghlan's summary:
> Peter Moody actually addressed all my comments from last year (alas, I
> forgot that python-ideas got dropped from the latter part of the email
> chain, so it became a private discussion between Peter, Guido and
> myself). I apparently got distracted by other issues and never
> followed up on Peter's final review request. The branch with the
> relevant changes is here (these weren't added back into ipaddr
> mainline since they aren't all backwards compatible with the existing
> ipaddr API):
> Peter was very responsive and accommodating during that discussion :)
> (The notes below are an edited version of Peter's off-list reply to me
> from last year, reflecting the final state of the ipaddr 3144 branch)
> On Mon, Aug 29, 2011 at 7:09 PM, Nick Coghlan <ncoghlan at> wrote:
>    I believe the PEP would be significantly more palatable with the
>    following changes/additions:
>    1. Draft ReStructuredText documentation for inclusion in the stdlib docs
> (still needed)
>    2. Removal of the "ip" attribute of IP network objects (since it makes
>    the nominal "networks" behave like IP interface definitions)
> the Class hierarchy now looks like:
> _IPAddrBase(object) # mother of everything
> _BaseAddress(_IPAddrBase) # base for addresses
> _BaseNetwork(_IPAddrBase) # base for networks and interfaces, could
> be renamed.
> _BaseV4(object) # ipv4 base
> _BaseV6(object) # ipv6 base
> IPv4Address(_BaseV4, _BaseAddress)
> IPv4Interface(_BaseV4, _BaseNetwork)
> IPv4Network(IPv4Interface)
> IPv6Address(_BaseV6, _BaseAddress)
> IPv6Interface(_BaseV6, _BaseNetwork)
> IPv6Network(IPv6Interface)
> (essentially, the current ipaddr "Network" objects become "Interface"
> objects in PEP 3144, with a new strict "Network" object that has no ip
> attribute)
>    3. "network" property renamed to "netaddr" (since it returns an
>    address object rather than a network object)
> renamed to network_address.
> did the same for the broadcast_address.
>    4. "strict" parameter removed from class signatures, replaced with
>    class method for non-strict behaviour
> 'strict' is gone, just create IPv*Interface objects or use the
> ip_interface API instead. Network objects are always strict.
>    5. Factory functions renamed so they don't look like class names
>    (ip_network, ip_address, ip)
> Now ip_address, ip_network, ip_interface
>    6. "strict" parameter on factory functions modified to default to True
>    rather than False
> 'strict' is gone. Interfaces allow a host IP, Networks don't.
>    7. Addition of an explicit "IPInterface" class to cover the
>    association of an address with a specific network that is currently
>    handled by storing arbitrary addresses on IP network objects
> done.
> So with a cleanup of the docstrings (and creation of some ReST docs
> based on them) a definite +1 from me for inclusion of ipaddr (based on
> the 3144 branch in SVN) in 3.3. (with the tweaks to the API, we may
> want to use a different name like "ipaddress" or "iptools", though -
> otherwise people could be legitimately confused by the differences
> relative to the PyPI "ipaddr" module)

Cleaning up the docstrings and re-tooling the PEP was where I stalled
after addressing your comments. Easy enough to complete if there's
still interest.

Note, is actually the same module,
but down a few versions. I'm not sure if your concern is about the
same library having such a different api or if you had thought they
were completely different libraries.


> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

Peter Moody      Google    1.650.253.7306
Security Engineer  pgp:0xC3410038

From ncoghlan at  Mon Jan 30 23:04:28 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 08:04:28 +1000
Subject: [Python-Dev] PEP 408 -- Standard library __preview__ package
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 4:59 AM, Antoine Pitrou <solipsis at> wrote:
> It would be nice if that pronouncement or decision could outline the
> steps required to include an "experimental" module in the stdlib, and
> the steps required to move it from "experimental" to "stable".

Actually, that's a good idea - Eli, care to try your hand at writing
up a counter-PEP to 408 that more explicitly documents Guido's
preferred approach?

It should document a standard note to be placed in the module
documentation and in What's New for experimental/provisional/whatever
modules. For example:

"The <X> module has been included in the standard library on a
provisional basis. While major changes are not anticipated, as long as
this notice remains in place, backwards incompatible changes are
permitted if deemed necessary by the standard library developers. Such
changes will not be made gratuitously - they will occur only if
serious API flaws are uncovered that were missed prior to inclusion of
the module. If the small chance of such changes is not acceptable for
your use, the module is also available from PyPI with full backwards
compatibility guarantees." (include direct link to module on PyPI)

As far as the provisional->stable transition goes, I'd say there are a
couple of options:
1. Just make it part of the normal release process to ask for each
provisional module "This hasn't been causing any dramas, shall we drop
the provisional warning?"
2. Explicitly create 'release blocker' tracker issues for the *next*
release whenever a provisional module is added. These will basically
say "either drop the provisional warning for module <X> or bump this
issue along to the next release"

Former is obviously easier, latter means we're less likely to forget to do it.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Jan 30 23:09:22 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 08:09:22 +1000
Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 --
 Standard library __preview__ package)
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 7:52 AM, Peter Moody <pmoody at> wrote:
> Note, is actually the same module,
> but down a few versions. I'm not sure if your concern is about the
> same library having such a different api or if you had thought they
> were completely different libraries.

No, I knew that - my point was that the changes in the PEP 3144 branch
are backwards incompatible with the existing ipaddr API (mainly due to
the always-strict Network objects, with the permissive behaviour moved
out to the separate Interface objects, but also due to the renamed
factory functions), so it may be easier to just give the 3144 version
of the module a different name.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From bauertomer at  Mon Jan 30 23:11:04 2012
From: bauertomer at (T.B.)
Date: Tue, 31 Jan 2012 00:11:04 +0200
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-30 20:52, Guido van Rossum wrote:
> TB, what's your use case for passing a float to a semaphore?
> Semaphores are conceptually tied to integers. You've kept arguing a
> few times now that the workaround you need are clumsy, but you've not
> explained why you're passing floats in the first place. A "fractional
> resource" just doesn't sound like a real use case to me.

Not an example from real life, and certainly not one that can't be worked
around; rather, something that caught my eye while looking at
Lib/ Say you have a "known" constant guaranteed bandwidth
and you need to split it among several connections, each of which
takes a known fixed amount of bandwidth (no more, no less).
How many connections can I reliably serve?

Side note: If someone really wants a discrete math implementation of a
semaphore, you can replace _value with a list of resources. Then you 
check in acquire() "while not self._resources:" and pop a resource. In 
that case when a semaphore is used as a context manager it can have a 
useful 'as' clause. To me it seems too complicated for something that 
should be simple like a semaphore.
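
A minimal sketch of what that side note could look like. ResourcePool and borrowed() are made-up names, and a Condition is used in place of whatever threading.Semaphore does internally. The wait loop is the "while not self._resources:" check mentioned above.

```python
import threading
from contextlib import contextmanager


class ResourcePool:
    """Hypothetical semaphore variant that hands out concrete resources."""

    def __init__(self, resources):
        self._cond = threading.Condition()
        self._resources = list(resources)

    def acquire(self):
        with self._cond:
            # Wait until at least one resource is available, then take it.
            while not self._resources:
                self._cond.wait()
            return self._resources.pop()

    def release(self, resource):
        with self._cond:
            self._resources.append(resource)
            self._cond.notify()

    @contextmanager
    def borrowed(self):
        # Gives the 'as' clause something useful: the acquired resource.
        resource = self.acquire()
        try:
            yield resource
        finally:
            self.release(resource)
```

Usage would be along the lines of "with pool.borrowed() as conn: ...", which is more machinery than a plain counter needs.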


From anacrolix at  Mon Jan 30 23:11:22 2012
From: anacrolix at (Matt Joiner)
Date: Tue, 31 Jan 2012 09:11:22 +1100
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

It's also potentially lossy if you incremented and decremented until
integer precision is lost. My vote is for an int type check. No casting.
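
A quick sketch of the drift in question, using an arbitrary step of 0.1 on a plain float rather than a real Semaphore:

```python
counter = 0.0
for _ in range(10):
    counter += 0.1      # ten "releases" of a tenth of a unit each

# Binary floats cannot represent 0.1 exactly, so the total drifts.
exactly_one = counter == 1.0
# A "while value < 1" test would therefore still wait here.
would_wait = counter < 1
```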

From ncoghlan at  Mon Jan 30 23:19:42 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 08:19:42 +1000
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner <anacrolix at> wrote:
> It's also potentially lossy if you incremented and decremented until integer
> precision is lost. My vote is for an int type check. No casting.

operator.index() is built for that purpose (it's what we use these
days to restrict slicing to integers).

+1 for the type restriction from me.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Mon Jan 30 23:14:51 2012
From: guido at (Guido van Rossum)
Date: Mon, 30 Jan 2012 14:14:51 -0800
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 30, 2012 at 2:11 PM, Matt Joiner <anacrolix at> wrote:
> It's also potentially lossy if you incremented and decremented until integer
> precision is lost. My vote is for an int type check. No casting.

+1. Anything else is insane scope creep for something called "Semaphore".

--Guido van Rossum (

From benjamin at  Mon Jan 30 23:23:42 2012
From: benjamin at (Benjamin Peterson)
Date: Mon, 30 Jan 2012 17:23:42 -0500
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

2012/1/30 Nick Coghlan <ncoghlan at>:
> On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner <anacrolix at> wrote:
>> It's also potentially lossy if you incremented and decremented until integer
>> precision is lost. My vote is for an int type check. No casting.
> operator.index() is built for that purpose (it's what we use these
> days to restrict slicing to integers).
> +1 for the type restriction from me.

We don't need a type check. Just pass integers (obviously the only
right type) to it.


From victor.stinner at  Tue Jan 31 00:31:13 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 31 Jan 2012 00:31:13 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
Message-ID: <>


In issues #13882 and #11457, I propose to add an argument to functions
returning timestamps to choose the timestamp format. Python uses float
in most cases whereas float is not enough to store a timestamp with a
resolution of 1 nanosecond. I recently added time.clock_gettime() to
Python 3.3 which has a resolution of a nanosecond. The (first?) new
timestamp format will be decimal.Decimal because it is able to store
any timestamp in any resolution without losing bits. Instead of
adding a boolean argument, I would prefer to support more formats. My
last patch provides the following formats:

 - "float": float (used by default)
 - "decimal": decimal.Decimal
 - "datetime": datetime.datetime
 - "timespec": (sec, nsec) tuple # I don't think that we need it, it
is just another example

The proposed API is:

  time.time(format="datetime")
  time.clock_gettime(time.CLOCK_REALTIME, format="decimal")
  os.stat(path, timestamp="datetime")
  etc.

This API has an issue: importing the datetime or decimal object is
implicit, I don't know if it is really an issue. (In my last patch,
the import is done too late, but it can be fixed, it is not really a
matter.)

Alexander Belopolsky proposed to use
time.time(format=datetime.datetime) instead.


The first step would be to add an argument to functions returning
timestamps. The second step is to accept these new formats (Decimal?)
as input, for datetime.datetime.fromtimestamp() and os.utime() for
example.

(Using decimal.Decimal, we may remove os.utimens() and use the right
function depending on the timestamp resolution.)


I prefer Decimal over a dummy tuple like (sec, nsec) because you can
do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the
resolution of the clock: time.time() and time.clock_gettime() have for
example different resolution (sec, ms, us for time.time() and ns for

The decimal module is still implemented in Python, but there is a
working implementation in C which is much faster. Storing timestamps as
Decimal could be a motivation to integrate the C implementation :-)
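
To illustrate why float is not enough at nanosecond resolution, here is a small sketch using only the existing decimal module. It does not use the proposed format argument; the (sec, nsec) pair is the timespec value from the os.stat example below.

```python
from decimal import Decimal

# (sec, nsec) pair as a clock with nanosecond resolution would report it.
sec, nsec = 1323458640, 702327236

as_float = sec + nsec / 1e9
as_decimal = Decimal(sec) + Decimal(nsec).scaleb(-9)

# Decimal(float) converts exactly, so this compares the value the
# float really holds against the true timestamp.
float_is_exact = Decimal(as_float) == as_decimal

# Arithmetic keeps the nanosecond resolution with Decimal.
later = as_decimal + Decimal("0.000000001")
delta = later - as_decimal
```

Near this magnitude, adjacent floats are about 2.4e-7 apart, so the last few digits of the nanosecond field cannot survive the conversion.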


Examples with the time module:

$ ./python
Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31)
>>> import time
>>> time.time()
>>> time.time('decimal')
>>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1
>>> t1=time.time('float'); t2=time.time('float'); t2-t1
>>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal')
>>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal')
>>> time.clock()
>>> time.clock('decimal')

Examples with os.stat:

$ ./python
Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24)
>>> import os
>>> s=os.stat("", timestamp="datetime")
>>> s.st_mtime - s.st_ctime
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293
>>> os.stat("", timestamp="timespec").st_ctime
(1323458640, 702327236)
>>> os.stat("", timestamp="decimal").st_ctime


From anacrolix at  Tue Jan 31 00:50:45 2012
From: anacrolix at (Matt Joiner)
Date: Tue, 31 Jan 2012 10:50:45 +1100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

Sounds good, but I also prefer Alexander's method. The type information is
already encoded in the class object. This way you don't need to maintain a
mapping of strings to classes, and other functions/third party can join in
the fun without needing access to the latest canonical mapping. Lastly
there will be no confusion or contention for duplicate keys.
On Jan 31, 2012 10:32 AM, "Victor Stinner" <victor.stinner at> wrote:

> Hi,
> In issues #13882 and #11457, I propose to add an argument to functions
> returning timestamps to choose the timestamp format. Python uses float
> in most cases whereas float is not enough to store a timestamp with a
> resolution of 1 nanosecond. I added recently time.clock_gettime() to
> Python 3.3 which has a resolution of a nanosecond. The (first?) new
> timestamp format will be decimal.Decimal because it is able to store
> any timestamp in any resolution without losing bits. Instead of
> adding a boolean argument, I would prefer to support more formats. My
> last patch provides the following formats:
>  - "float": float (used by default)
>  - "decimal": decimal.Decimal
>  - "datetime": datetime.datetime
>  - "timespec": (sec, nsec) tuple # I don't think that we need it, it
> is just another example
> The proposed API is:
>  time.time(format="datetime")
>  time.clock_gettime(time.CLOCK_REALTIME, format="decimal")
>  os.stat(path, timestamp="datetime")
>  etc.
> This API has an issue: importing the datetime or decimal object is
> implicit, I don't know if it is really an issue. (In my last patch,
> the import is done too late, but it can be fixed, it is not really a
> matter.)
> Alexander Belopolsky proposed to use
> time.time(format=datetime.datetime) instead.
> --
> The first step would be to add an argument to functions returning
> timestamps. The second step is to accept these new formats (Decimal?)
> as input, for datetime.datetime.fromtimestamp() and os.utime() for
> example.
> (Using decimal.Decimal, we may remove os.utimens() and use the right
> function depending on the timestamp resolution.)
> --
> I prefer Decimal over a dummy tuple like (sec, nsec) because you can
> do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the
> resolution of the clock: time.time() and time.clock_gettime() have for
> example different resolution (sec, ms, us for time.time() and ns for
> clock_gettime()).
> The decimal module is still implemented in Python, but there is a
> working implementation in C which is much faster. Storing timestamps as
> Decimal can be a motivation to integrate the C implementation :-)
> --
> Examples with the time module:
> $ ./python
> Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31)
> >>> import time
> >>> time.time()
> 1327611705.948446
> >>> time.time('decimal')
> Decimal('1327611708.988419')
> >>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1
> Decimal('0.000550')
> >>> t1=time.time('float'); t2=time.time('float'); t2-t1
> 5.9604644775390625e-06
> >>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal')
> Decimal('1211833.389740312')
> >>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal')
> Decimal('1E-9')
> >>> time.clock()
> 0.12
> >>> time.clock('decimal')
> Decimal('0.120000')
> Examples with os.stat:
> $ ./python
> Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24)
> >>> import os
> >>> s=os.stat("", timestamp="datetime")
> >>> s.st_mtime - s.st_ctime
> datetime.timedelta(0)
> >>> print(s.st_atime - s.st_ctime)
> 52 days, 1:44:06.191293
> >>> os.stat("", timestamp="timespec").st_ctime
> (1323458640, 702327236)
> >>> os.stat("", timestamp="decimal").st_ctime
> Decimal('1323458640.702327236')
> Victor

From stephen at  Tue Jan 31 01:51:09 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 31 Jan 2012 09:51:09 +0900
Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 -- Standard
 library __preview__ package)
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan writes:

 >     1. Draft ReStructuredText documentation for inclusion in the stdlib docs
 > (still needed)

No wonder people (not directly involved in development of the module)
think that the proponents don't care!  What good is a battery if the
odds are even that you will hook it up with wrong polarity and fry
your expensive components?

I don't mean to criticize the proponents and mentors of *this* PEP; I
recall the ipaddr vs. netaddr discussions, and clearly the API needed
and got a lot of changes.  That's definitely a chilling factor for
writing a second document that largely covers the same material as the
PEP.  On the other hand, people who are not battery manufacturers have
every right to use stdlib-ready documentation as a litmus test for
readiness (and even if you think otherwise, you can't stop them).

While you probably won't get a lot of comments from those people if
you publish such docs, if you don't publish docs, you will get none.

I suggest emphasizing (in the 408bis PEP that Nick suggested) the
importance of documentation in convincing the "just users" audience
(which is the one that stdlib is really aimed at) of the value and
readiness of a module proposed for stdlib integration.

From ncoghlan at  Tue Jan 31 02:26:09 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 11:26:09 +1000
Subject: [Python-Dev] PEP 3144 ipaddr module (was Re: PEP 408 --
 Standard library __preview__ package)
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 10:51 AM, Stephen J. Turnbull
<stephen at> wrote:
> Nick Coghlan writes:
>  >     1. Draft ReStructuredText documentation for inclusion in the stdlib docs
>  >
>  > (still needed)
> No wonder people (not directly involved in development of the module)
> think that the proponents don't care!  What good is a battery if the
> odds are even that you will hook it up with wrong polarity and fry
> your expensive components?

Thinking about how to document the library from a network engineer's
perspective was actually the driving force behind my asking for the
Address/Interface/Network split in the PEP 3144 branch. Without that,
Network tries to fill both the Interface and Network role and it
becomes a bit of a nightmare to write coherent prose documentation.

Sure, merging them can *work* from a programming point of view, but
you can't document it that way and have the API seem sensible to
anyone familiar with the underlying networking concepts.

Now that ReadTheDocs exists, it is of course *much* easier to draft
and publish such documentation than it once was


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Tue Jan 31 07:22:08 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 31 Jan 2012 07:22:08 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <jg81ad$5s9$>

Am 31.01.2012 00:50, schrieb Matt Joiner:
> Sounds good, but I also prefer Alexander's method. The type information is
> already encoded in the class object. This way you don't need to maintain a
> mapping of strings to classes, and other functions/third party can join in the
> fun without needing access to the latest canonical mapping. Lastly there will be
> no confusion or contention for duplicate keys.

Sorry, I don't think it makes any sense to pass around classes as flags.
Sure, if you do something directly with the class, it's fine, but in this case
that's impossible. So you will be testing

  if format is datetime.datetime:
  elif format is decimal.Decimal:

which has no advantage at all over

  if format == "datetime":
  elif format == "decimal":

Not to speak of formats like "timespec" that don't have a respective class.
And how do you propose to handle the extensibility you speak of to work?
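
For illustration, the string-flag dispatch argued for above could look
roughly like the sketch below. The convert() helper and its (sec, nsec)
input are hypothetical and are not taken from Victor's patch; the point
is only that "timespec" fits the string scheme although no class
corresponds to it.

```python
import datetime
import decimal


def convert(sec, nsec, format="float"):
    """Hypothetical helper: render a (sec, nsec) pair in a named format."""
    if format == "float":
        return sec + nsec * 1e-9
    elif format == "decimal":
        return decimal.Decimal(sec) + decimal.Decimal(nsec).scaleb(-9)
    elif format == "datetime":
        # datetime only keeps microseconds, so the last three digits are dropped.
        base = datetime.datetime.fromtimestamp(sec, datetime.timezone.utc)
        return base + datetime.timedelta(microseconds=nsec // 1000)
    elif format == "timespec":
        # No class stands for this format; a plain tuple is returned.
        return (sec, nsec)
    raise ValueError("unknown timestamp format: %r" % (format,))


print(convert(1323458640, 702327236, "timespec"))
print(convert(1323458640, 702327236, "decimal"))
```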


From stefan_ml at  Tue Jan 31 07:55:29 2012
From: stefan_ml at (Stefan Behnel)
Date: Tue, 31 Jan 2012 07:55:29 +0100
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <jg8391$fnd$>

stefan brunthaler, 30.01.2012 20:18:
>> Well, you're aware that Python already uses threaded code where
>> available? Or are you testing against Python 2?
> Yes, and I am building on that.

I assume "yes" here means "yes, I'm aware" and not "yes, I'm using Python
2", right? And you're building on top of the existing support for threaded
code in order to improve it?


From g.brandl at  Tue Jan 31 08:12:22 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 31 Jan 2012 08:12:22 +0100
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <jg848i$ioq$>

Am 30.01.2012 20:06, schrieb stefan brunthaler:

>> Do you have a public repository for the code, so we can take a look?
> I have created a patch (as Benjamin wanted) and put all of the
> resources (i.e., benchmark results and the patch itself) on my home
> page:

If I read the patch correctly, most of it is auto-generated (and there
are probably a few spurious changes that blow it up, such as the
file).  But the tool that actually generates the code
doesn't seem to be included?  (Which means that in this form, the
patch couldn't possibly be accepted.)


From ncoghlan at  Tue Jan 31 08:16:06 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 17:16:06 +1000
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 9:31 AM, Victor Stinner
<victor.stinner at> wrote:
> Hi,
> In issues #13882 and #11457, I propose to add an argument to functions
> returning timestamps to choose the timestamp format. Python uses float
> in most cases whereas float is not enough to store a timestamp with a
> resolution of 1 nanosecond. I added recently time.clock_gettime() to
> Python 3.3 which has a resolution of a nanosecond. The (first?) new
> timestamp format will be decimal.Decimal because it is able to store
> any timestamp in any resolution without losing bits. Instead of
> adding a boolean argument, I would prefer to support more formats.

I think this is definitely worth elaborating in a PEP (to recap the
long discussion in #11457 if nothing else). In particular, I'd want to
see a very strong case being made for supporting multiple formats over
standardising on a *single* new higher precision format (for example,
using decimal.Decimal in conjunction with integration of Stefan's
cdecimal work) that can then be converted to other formats (like
datetime) via the appropriate APIs.

"There are lots of alternatives, so let's choose not to choose!" is a
bad way to design an API. Helping to make decisions like this by
laying out the alternatives and weighing up their costs and benefits
is one of the major reasons the PEP process exists.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From victor.stinner at  Tue Jan 31 10:42:39 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 31 Jan 2012 10:42:39 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

> I think this is definitely worth elaborating in a PEP (to recap the
> long discussion in #11457 if nothing else).

The discussion in issues #13882 and #11457 already lists many
alternatives with their costs and benefits, but I can produce a PEP if
you need a summary.

> In particular, I'd want to
> see a very strong case being made for supporting multiple formats over
> standardising on a *single* new higher precision format (for example,
> using decimal.Decimal in conjunction with integration of Stefan's
> cdecimal work) that can then be converted to other formats (like
> datetime) via the appropriate APIs.

To convert a Decimal to a datetime object, we have already the
datetime.datetime.fromtimestamp() function (it converts Decimal to
float, but the function can be improved without touching its API). But
I like the possibility of getting the file modification time directly
as a datetime object to have something like:

>>> s=os.stat("", timestamp="datetime")
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293
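
A minimal sketch of the Decimal-to-datetime path mentioned above,
assuming the Decimal is converted explicitly with float() before the
call. The timestamp value is the one from the os.stat() examples earlier
in the thread; passing a timezone is my own choice to keep the result
independent of the local zone.

```python
import datetime
from decimal import Decimal

ts = Decimal("1323458640.702327236")

# Going through float() and datetime keeps microseconds at best.
dt = datetime.datetime.fromtimestamp(float(ts), datetime.timezone.utc)

print(dt)              # date and time in UTC
print(dt.microsecond)  # six fractional digits; the nanosecond part is gone
```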

We have already more than one timestamp format: os.stat() uses int or
float depending on os.stat_float_times() value. In 5 years, we may
prefer to use directly float128 instead of Decimal. I prefer to have
an extensible API to prepare future needs, even if we just add Decimal

Hum, by the way, we need a "int" format for os.stat(), so
os.stat_float_times() can be deprecated. So there will be a minimum of
3 types:
 - int
 - float
 - decimal.Decimal


From ncoghlan at  Tue Jan 31 12:11:37 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 31 Jan 2012 21:11:37 +1000
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner
<victor.stinner at> wrote:
>> I think this is definitely worth elaborating in a PEP (to recap the
>> long discussion in #11457 if nothing else).
> The discussion in issues #13882 and #11457 already lists many
> alternatives with their costs and benefits, but I can produce a PEP if
> you need a summary.

PEPs are about more than just providing a summary - they're about
presenting the alternatives in a clear form instead of having them
scattered across a long meandering tracker discussion. Laying out the
alternatives and clearly articulating their pros and cons (as Larry
attempted to do on the tracker) *helps to make better decisions*.

I counted several options presented as possibilities and I probably missed some:

- expose the raw POSIX (seconds, nanoseconds) 2-tuples (lots of good
reasons not to go that way)
- use decimal.Decimal (with or without cdecimal)
- use float128 (nixed due to cross-platform supportability problems)
- use datetime (bad idea for the reasons Martin mentioned)
- use timedelta (not mentioned on the tracker, but a *much* better fit
for a timestamp than datetime, since timestamps are relative to the
epoch while datetime objects try to be absolute)

A PEP would also allow the following items to be specifically addressed:

- a survey of what other languages are doing to cope with nanosecond
time resolutions (as suggested by Raymond but not actually done as far
as I could see on the tracker)
- how to avoid a negative performance impact on os.stat() (new API?
flag argument? new lazily populated attributes accessed by name only?)

Guido's admonition against analysis paralysis doesn't mean we should
go to the other extreme and skip clearly documenting our analysis of
complex problems altogether (particularly for something like this
which may end up having ramifications for a lot of other time related

Having a low-level module like os needing to know about higher-level
types like decimal.Decimal and datetime.datetime (or even timedelta)
should be setting off all kinds of warning bells. Of all the
possibilities that offer decent arithmetic support, timedelta is
probably the one currently most suited to being pushed down to the os
level, although decimal.Decimal is also a contender if backed up by
Stefan's C implementation.

You're right that supporting this does mean being able to at least
select between 'int', 'float' and <high precision> output, but that's
the kind of case that can be made most clearly in a PEP.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From p.f.moore at  Tue Jan 31 12:47:27 2012
From: p.f.moore at (Paul Moore)
Date: Tue, 31 Jan 2012 11:47:27 +0000
Subject: [Python-Dev] cdecimal (Was: Store timestamps as decimal.Decimal
Message-ID: <>

On 31 January 2012 11:11, Nick Coghlan <ncoghlan at> wrote:
> although decimal.Decimal is also a contender if backed up by
> Stefan's C implementation.

As you mention this, and given the ongoing thread about __preview__
and "nearly ready for stdlib" modules, what is the current position on
cdecimal? I seem to recall it being announced some time ago, but I
don't recall any particular discussions/conclusions about including it
in the stdlib.

Is it being considered for stdlib inclusion? What obstacles remain
before inclusion (clearly not many, if it's being seriously considered
as an option to support functions in something as fundamental as os)?
Do Guido's comments on the __preview__ thread make any difference
here?

(Note - I don't have any particular *need* for cdecimal, I'm just curious...)


From victor.stinner at  Tue Jan 31 13:08:21 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 31 Jan 2012 13:08:21 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>


2012/1/31 Matt Joiner <anacrolix at>:
> Sounds good, but I also prefer Alexander's method. The type information is
> already encoded in the class object.

Ok, I posted a patch version 6 to use types instead of strings. I also
prefer types because it solves the "hidden import" issue.

> This way you don't need to maintain a
> mapping of strings to classes, and other functions/third party can join in
> the fun without needing access to the latest canonical mapping. Lastly there
> will be no confusion or contention for duplicate keys.

My patch checks isinstance(format, type), format.__module__ and
format.__name__ to do the "mapping". It is not a direct mapping
because I don't always call the same method; the implementation is
completely different for each type.
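
As a rough Python sketch of the check described above (the actual patch
is not shown in this thread, so format_key() and convert_by_type() are
hypothetical names and the conversion bodies are my own assumptions):

```python
import datetime
import decimal


def format_key(format):
    """Identify a timestamp type by module and name, as described above."""
    if not isinstance(format, type):
        raise TypeError("format must be a type, not %r" % (format,))
    return (format.__module__, format.__name__)


def convert_by_type(sec, nsec, format=float):
    """Hypothetical dispatcher: each type gets its own conversion code."""
    key = format_key(format)
    if key == ("builtins", "int"):
        return sec
    if key == ("builtins", "float"):
        return sec + nsec * 1e-9
    if key == ("decimal", "Decimal"):
        return format(sec) + format(nsec).scaleb(-9)
    if key == ("datetime", "timedelta"):
        # timedelta stores microseconds, so nanoseconds are truncated.
        return format(seconds=sec, microseconds=nsec // 1000)
    raise ValueError("unsupported timestamp type: %s.%s" % key)


print(convert_by_type(1323458640, 702327236, decimal.Decimal))
print(convert_by_type(1323458640, 702327236, datetime.timedelta))
```

Comparing names rather than the class objects means the dispatcher
itself never has to reference decimal or datetime; the imports above are
only needed by the caller.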

I don't think that we need user defined timestamp formats. My last
patch provides 5 formats:

- int
- float
- decimal.Decimal
- datetime.datetime
- datetime.timedelta

(I removed the timespec format, I consider that we don't need it.)


    >>> time.time()
    >>> time.time(format=int)
    >>> time.time(format=decimal.Decimal)
    >>> time.time(format=datetime.datetime)
    datetime.datetime(2012, 1, 31, 11, 49, 49, 409831)
    >>> print(time.time(format=datetime.timedelta))
    15370 days, 10:49:52.842116

If someone wants another format, he/she should pick up an existing
format to build his/her own format.

datetime.datetime and datetime.timedelta can be used on any function,
but datetime.datetime format gives surprising results on clocks using
an arbitrary start like time.clock() or time.wallclock(). We may raise
an error in these cases.
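
The surprising result can be shown with a small sketch. The value 0.12
is the time.clock() reading from the earlier examples; treating it as a
timestamp is exactly the mistake being illustrated, and the UTC timezone
is my own choice to make the output reproducible.

```python
import datetime

clock_value = 0.12  # seconds since an arbitrary start, not since the epoch

# Interpreted as a timestamp, the reading lands just after the epoch.
as_datetime = datetime.datetime.fromtimestamp(clock_value, datetime.timezone.utc)

# Interpreted as a duration, the same reading stays meaningful.
as_delta = datetime.timedelta(seconds=clock_value)

print(as_datetime)
print(as_delta)
```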

From solipsis at  Tue Jan 31 13:13:30 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 31 Jan 2012 13:13:30 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
References: <>
Message-ID: <>

On Tue, 31 Jan 2012 21:11:37 +1000
Nick Coghlan <ncoghlan at> wrote:
> Having a low-level module like os needing to know about higher-level
> types like decimal.Decimal and datetime.datetime (or even timedelta)
> should be setting off all kinds of warning bells.

Decimal is ideally low-level (it's a number), it's just that it has a
complicated high-level implementation :)
But we can't use Decimal by default, for the obvious reason
(performance impact that threatens to contaminate other parts of the
code through operator application).

> Of all the
> possibilities that offer decent arithmetic support, timedelta is
> probably the one currently most suited to being pushed down to the os
> level, although decimal.Decimal is also a contender if backed up by
> Stefan's C implementation.

I'm -1 on using timedelta. This is a purity proposition that will make
no sense to the average user. By the way, datetimes are relative too,
by the same reasoning.



From victor.stinner at  Tue Jan 31 13:20:23 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 31 Jan 2012 13:20:23 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

> - use datetime (bad idea for the reasons Martin mentioned)

It is only a bad idea if it is the only available choice.

> - use timedelta (not mentioned on the tracker, but a *much* better fit
> for a timestamp than datetime, since timestamps are relative to the
> epoch while datetime objects try to be absolute)

The latest version of my patch also supports timedelta.

> - a survey of what other languages are doing to cope with nanosecond
> time resolutions (as suggested by Raymond but not actually done as far
> as I could see on the tracker)

I haven't checked that yet. I don't know if it is really relevant
because some languages don't have a builtin Decimal class or a
"builtin" datetime module.

> - how to avoid a negative performance impact on os.stat() (new API?
> flag argument? new lazily populated attributes accessed by name only?)

Because timestamp is an optional argument to os.stat() and the
behaviour is unchanged by default, the performance impact of my patch
on os.stat() is null (if you don't set timestamp argument).

> Having a low-level module like os needing to know about higher-level
> types like decimal.Decimal and datetime.datetime (or even timedelta)
> should be setting off all kinds of warning bells.

What is the problem with using decimal in the os module? Especially if
it is an option.

In my patch version 6, the timestamp argument is now a type (e.g.
decimal.Decimal) instead of a string, so the os module doesn't import
directly the module (well, to be exact, it does import the module, but
the module should already be in the cache, sys.modules).

> You're right that supporting this does mean being able to at least
> select between 'int', 'float' and <high precision> output, but that's
> the kind of case that can be made most clearly in a PEP.

Why do you want to limit the available formats? Why not give the
choice to the user between Decimal, datetime and timedelta? Each type
has a different use case and different features, sometimes exclusive.


From stefan_ml at  Tue Jan 31 14:19:40 2012
From: stefan_ml at (Stefan Behnel)
Date: Tue, 31 Jan 2012 14:19:40 +0100
Subject: [Python-Dev] PEPs and cons (was: Re: Store timestamps as
	decimal.Decimal objects)
In-Reply-To: <>
References: <>
Message-ID: <jg8ppc$g9i$>

Nick Coghlan, 31.01.2012 12:11:
> On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner wrote:
>>> I think this is definitely worth elaborating in a PEP (to recap the
>>> long discussion in #11457 if nothing else).
>> The discussion in issues #13882 and #11457 already lists many
>> alternatives with their costs and benefits, but I can produce a PEP if
>> you need a summary.
> PEPs are about more than just providing a summary - they're about
> presenting the alternatives in a clear form instead of having them
> scattered across a long meandering tracker discussion.

There was a keynote by Jan Lehnardt (of CouchDB fame) on last year's
PyCon-DE on the end of language wars and why we should just give each other
a hug and get along and all that. To seed some better understanding, he had
come up with mottoes for the Ruby and Python language communities, which
find themselves in continuous quarrel. I remember the motto for Python
being "you do it right - and you document it".

A clear hit IMHO. Decisions about language changes and environmental
changes (such as the stdlib) aren't easily taken in the Python world, but
when they are taken, they tend to show a good amount of well reflected
common sense, and we make it transparent how they come to be by writing a
PEP about them, so that we (and others) can go back and read them up later
on when they are being questioned again or when similar problems appear in
other languages. That's a good thing, and we should keep that up.


From s.brunthaler at  Tue Jan 31 16:33:15 2012
From: s.brunthaler at (stefan brunthaler)
Date: Tue, 31 Jan 2012 07:33:15 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <jg8391$fnd$>
References: <>
Message-ID: <>

> I assume "yes" here means "yes, I'm aware" and not "yes, I'm using Python
> 2", right? And you're building on top of the existing support for threaded
> code in order to improve it?
Your assumption is correct, I'm sorry for the sloppiness (I was
heading out for lunch.) None of the code is 2.x compatible, all of my
work has always targeted Python 3.x. My work does not improve threaded
code (as in interpreter dispatch technique), but enables efficient and
purely interpretative inline caching via quickening. (So, after
execution of BINARY_ADD, I rewrite the specific occurrence of the
bytecode instruction to a, say, FLOAT_ADD instruction and ensure that
my assumption is correct in the FLOAT_ADD instruction.)
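
As a toy model of the quickening idea described above (this is neither
CPython's interpreter loop nor the patch under discussion): the opcode
names come from the message, while the list-of-strings "bytecode", the
run() function and the fallback to BINARY_ADD when the guard fails are
assumptions made for illustration.

```python
def run(code, stack):
    """Toy stack interpreter; `code` is a mutable list of opcode names."""
    for pc, op in enumerate(code):
        if op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            if type(a) is float and type(b) is float:
                # Quicken: rewrite this occurrence for later executions.
                code[pc] = "FLOAT_ADD"
            stack.append(a + b)
        elif op == "FLOAT_ADD":
            b, a = stack.pop(), stack.pop()
            if type(a) is not float or type(b) is not float:
                # Guard failed: the float assumption no longer holds.
                code[pc] = "BINARY_ADD"
            stack.append(a + b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack


code = ["BINARY_ADD"]
print(run(code, [1.0, 2.0]), code)  # the instruction is now specialized
print(run(code, [1, 2]), code)      # guard fails, generic form restored
```

In a real interpreter the specialized instruction would skip the generic
dispatch on operand types; here both paths compute a + b and only the
rewriting of the instruction stream is modelled.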


From s.brunthaler at  Tue Jan 31 16:46:04 2012
From: s.brunthaler at (stefan brunthaler)
Date: Tue, 31 Jan 2012 07:46:04 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <jg848i$ioq$>
References: <>
Message-ID: <>

> If I read the patch correctly, most of it is auto-generated (and there
> is probably a few spurious changes that blow it up, such as the
> file).

Hm, honestly I don't know where the file comes from, I
thought it came with the switch from 3.1 to the tip version I was
using. Anyways, I did not touch it or at least have no recollection of
doing so. Regarding the spurious changes: This might very well be,
regression testing works, and it would actually be fairly easy to
figure out crashes (e.g., by tracing all executed bytecode
instructions and seeing if all of them are actually executed, I could
easily do that if wanted/necessary.)

> But the tool that actually generates the code
> doesn't seem to be included?  (Which means that in this form, the
> patch couldn't possibly be accepted.)
Well, the tool is not included because it does a lot more (e.g.,
generate the code for elimination of reference count operations.)
Unfortunately, my interpreter architecture that achieves the highest
speedups is more complicated, and I got the feeling that this is not
going well with python-dev. So, I had the idea of basically using just
one (but a major one) optimization technique and going with that. I
don't see why you would need my code generator, though. Not that I
cared, but I would need to strip down and remove many parts of it and
also make it more accessible to other people. However, if python-dev
decides that it wants to include the optimizations and requires the
code generator, I'll happily chip in the extra work and give you the
corresponding code generator, too.


From brett at  Tue Jan 31 16:54:22 2012
From: brett at (Brett Cannon)
Date: Tue, 31 Jan 2012 10:54:22 -0500
Subject: [Python-Dev] cdecimal (Was: Store timestamps as decimal.Decimal
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 06:47, Paul Moore <p.f.moore at> wrote:

> On 31 January 2012 11:11, Nick Coghlan <ncoghlan at> wrote:
> > although decimal.Decimal is also a contender if backed up by
> > Stefan's C implementation.
> As you mention this, and given the ongoing thread about __preview__
> and "nearly ready for stdlib" modules, what is the current position on
> cdecimal? I seem to recall it being announced some time ago, but I
> don't recall any particular discussions/conclusions about including it
> in the stdlib.
> Is it being considered for stdlib inclusion? What obstacles remain
> before inclusion (clearly not many, if it's being seriously considered
> as an option to support functions in something as fundamental as os)?
> Do Guido's comments on the __preview__ thread make any difference
> here?
> (Note - I don't have any particular *need* for cdecimal, I'm just
> curious...)

Because cdecimal is just an accelerated version of decimal there is no
specific stdlib restriction from it going in. At this point I think it just
needs to be finished and then committed.

From bauertomer at  Tue Jan 31 19:46:54 2012
From: bauertomer at (T.B.)
Date: Tue, 31 Jan 2012 20:46:54 +0200
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-01-31 00:23, Benjamin Peterson wrote:
> 2012/1/30 Nick Coghlan<ncoghlan at>:
>> On Tue, Jan 31, 2012 at 8:11 AM, Matt Joiner<anacrolix at>  wrote:
>>> It's also potentially lossy if you incremented and decremented until integer
>>> precision is lost. My vote is for an int type check. No casting.
>> operator.index() is built for that purpose (it's what we use these
>> days to restrict slicing to integers).
>> +1 for the type restriction from me.
> We don't need a type check. Just pass integers (obviously the only
> right type) to it.
When a float is used, think of debugging such a thing, e.g. a float from 
integer division. I don't care if float (or generally non-integers) are 
not allowed in threading.Semaphore, but please make it fail with a bang.
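
A minimal sketch of what "fail with a bang" could look like, using the
operator.index() check suggested in the quoted text. checked_count() is
a hypothetical helper, not the code in threading.Semaphore.

```python
import operator


def checked_count(value):
    """Hypothetical validation for a semaphore's initial value."""
    count = operator.index(value)  # raises TypeError for float, str, ...
    if count < 0:
        raise ValueError("semaphore initial value must be >= 0")
    return count


print(checked_count(3))

try:
    checked_count(6 / 2)  # true division yields the float 3.0
except TypeError as exc:
    print("rejected:", exc)
```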


From alexander.belopolsky at  Tue Jan 31 19:57:49 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Tue, 31 Jan 2012 13:57:49 -0500
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Jan 30, 2012 at 6:31 PM, Victor Stinner
<victor.stinner at> wrote:
> Alexander Belopolsky proposed to use
> time.time(format=datetime.datetime) instead.

Just to make sure my view is fully expressed: I am against adding flag
arguments to time.time().  My preferred solution to exposing high
resolution clocks is to do it in a separate module.  You can even call
the new function time() and access it as hirestime.time().  Longer
names that reflect various time representation are also an option:
hirestime.decimal_time(), hirestime.datetime_time() etc.

The suggestion to use the actual type as a flag was motivated by the
desire to require module import before fancy time.time() can be
called.  When you care about nanoseconds in your time stamps you won't
tolerate an I/O delay between calling time() and getting the result.
A separate module can solve this issue much better: simply import
decimal or datetime or both at the top of the module.
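
A sketch of what such a separate module might contain. The names
decimal_time() and datetime_time() come from the suggestion above; the
bodies are assumptions, and they rely on time.time_ns(), which was added
to Python much later (3.7) and did not exist when this was written.

```python
# Contents of a hypothetical hirestime module.
import datetime   # imported up front, so no import happens inside a call
import decimal
import time

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def decimal_time():
    """Seconds since the epoch as a Decimal with nanosecond digits."""
    return decimal.Decimal(time.time_ns()).scaleb(-9)


def datetime_time():
    """Current UTC time as a datetime (microsecond resolution only)."""
    return _EPOCH + datetime.timedelta(microseconds=time.time_ns() // 1000)


print(decimal_time())
print(datetime_time())
```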

From alexander.belopolsky at  Tue Jan 31 20:08:31 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Tue, 31 Jan 2012 14:08:31 -0500
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou <solipsis at> wrote:
> On Tue, 31 Jan 2012 21:11:37 +1000
> Nick Coghlan <ncoghlan at> wrote:
>> Having a low-level module like os needing to know about higher-level
>> types like decimal.Decimal and datetime.datetime (or even timedelta)
>> should be setting off all kinds of warning bells.
> Decimal is ideally low-level (it's a number), it's just that it has a
> complicated high-level implementation :)

FWIW, my vote is also for Decimal and against datetime or timedelta.
(I dream of Decimal replacing float in Python 4000, so take my vote
with an appropriate amount of salt. :-)

From raymond.hettinger at  Tue Jan 31 21:10:16 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Tue, 31 Jan 2012 12:10:16 -0800
Subject: [Python-Dev] threading.Semaphore()'s counter can become
	negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 29, 2012, at 6:11 PM, John O'Connor wrote:

> On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson <benjamin at> wrote:
>> But why would you want to pass a float? It seems like API abuse to me.
> Agreed. Anything else seems meaningless.

I concur.  This is very much a non-problem.
There is no need to add more code and slow
running time with superfluous type checks.



From g.brandl at  Tue Jan 31 21:49:52 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 31 Jan 2012 21:49:52 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <jg9k5c$2rs$>

Am 31.01.2012 13:08, schrieb Victor Stinner:

>> This way you don't need to maintain a
>> mapping of strings to classes, and other functions/third party can join in
>> the fun without needing access to the latest canonical mapping. Lastly there
>> will be no confusion or contention for duplicate keys.
> My patch checks isinstance(format, type), format.__module__ and
> format.__name__ to do the "mapping". It is not a direct mapping
> because I don't always call the same method, the implementation is
> completely different for each type.
> I don't think that we need user defined timestamp formats. My last
> patch provides 5 formats:
> - int
> - float
> - decimal.Decimal
> - datetime.datetime
> - datetime.timedelta
> (I removed the timespec format, I consider that we don't need it.)

Rather, I guess you removed it because it didn't fit the "types as flags"
pattern.

As I said in another message, another hint that this is the wrong API design:
Will the APIs ever support passing in types other than these five?  Probably
not, so I strongly believe they should not be passed in as types.
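
For readers following the thread, a minimal sketch of what "types as flags"
dispatch looks like, loosely following Victor's description of checking
isinstance(format, type), format.__module__ and format.__name__. The helper
name convert_timestamp and its nanosecond argument are hypothetical, and only
four of the five formats are shown; this is not the code from the patch.

```python
import datetime
import decimal


def convert_timestamp(nanoseconds, format=float):
    """Hypothetical helper: convert integer nanoseconds to the type `format`."""
    if not isinstance(format, type):
        raise TypeError("format must be a type")
    # Identify the type by module and name, as described in the thread.
    key = (format.__module__, format.__name__)
    if key == ("builtins", "int"):
        return nanoseconds // 10**9            # whole seconds only
    if key == ("builtins", "float"):
        return nanoseconds / 10**9             # may lose low-order digits
    if key == ("decimal", "Decimal"):
        return decimal.Decimal(nanoseconds).scaleb(-9)   # exact
    if key == ("datetime", "timedelta"):
        # timedelta only stores microseconds, so nanoseconds are truncated.
        return datetime.timedelta(microseconds=nanoseconds // 1000)
    raise ValueError("unsupported timestamp format: %r" % (format,))
```

Each branch does something different, which is the point made above: the
type is acting as a flag that selects a code path, not as a constructor
that could be swapped for an arbitrary user-defined type.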


From g.brandl at  Tue Jan 31 21:50:57 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 31 Jan 2012 21:50:57 +0100
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <>
References: <>
Message-ID: <jg9k7c$2rs$>

Am 31.01.2012 16:46, schrieb stefan brunthaler:
>> If I read the patch correctly, most of it is auto-generated (and there
>> is probably a few spurious changes that blow it up, such as the
>> file).
> Hm, honestly I don't know where the file comes from, I
> thought it came with the switch from 3.1 to the tip version I was
> using. Anyways, I did not touch it or at least have no recollection of
> doing so. Regarding the spurious changes: This might very well be,
> regression testing works, and it would actually be fairly easy to
> figure out crashes (e.g., by tracing all executed bytecode
> instructions and seeing if all of them are actually executed, I could
> easily do that if wanted/necessary.)

There is also the issue of the two test modules removed from the
test suite.

>> But the tool that actually generates the code
>> doesn't seem to be included?  (Which means that in this form, the
>> patch couldn't possibly be accepted.)
> Well, the tool is not included because it does a lot more (e.g.,
> generate the code for elimination of reference count operations.)
> Unfortunately, my interpreter architecture that achieves the highest
> speedups is more complicated, and I got the feeling that this is not
> going well with python-dev. So, I had the idea of basically using just
> one (but a major one) optimization technique and going with that. I
> don't see why you would need my code generator, though. Not that I
> cared, but I would need to strip down and remove many parts of it and
> also make it more accessible to other people. However, if python-dev
> decides that it wants to include the optimizations and requires the
> code generator, I'll happily chip in the extra work and give you the
> corresponding code generator, too.

Well, nobody wants to review generated code.


From stefan at  Tue Jan 31 22:17:41 2012
From: stefan at (stefan brunthaler)
Date: Tue, 31 Jan 2012 13:17:41 -0800
Subject: [Python-Dev] Python 3 optimizations, continued,
	continued again...
In-Reply-To: <jg9k7c$2rs$>
References: <>
Message-ID: <>

> There is also the issue of the two test modules removed from the
> test suite.
Oh, I'm sorry, seems like the patch did contain too much of my
development stuff. (I did remove them before, because they were always
failing due to the instruction opcodes being changed because of
quickening; they pass the tests, though.)

> Well, nobody wants to review generated code.
I agree. The code generator basically uses templates that contain the
information and a dump of the C-structure of several types to traverse
and see which one of them implements which functions. There is really
no magic there, the most "complex" thing is to get the inline-cache
miss checks for function calls right. But I tried to make the
generated code look pretty, so that working with it is not too much of
a hassle. The code generator itself is a little bit more complicated,
so I am not sure it would help a lot...


From victor.stinner at  Tue Jan 31 22:41:39 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 31 Jan 2012 22:41:39 +0100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <jg9k5c$2rs$>
References: <>
Message-ID: <>

>> (I removed the timespec format, I consider that we don't need it.)
> Rather, I guess you removed it because it didn't fit the "types as flags"
> pattern.

I removed it because I don't like tuples: you cannot do arithmetic on a
tuple, like t2-t1. Printing a tuple doesn't give you nice output. It is
used in C because you have no other choice, but in Python, we can do
better.

> As I said in another message, another hint that this is the wrong API design:
> Will the APIs ever support passing in types other than these five?  Probably
> not, so I strongly believe they should not be passed in as types.

I don't know if we should only support 3 types today, or more, but I
suppose that we will add more later (e.g. if datetime is replaced by
another new and better datetime module).

You mean that we should use a string instead of type, so
time.time(format="decimal")? Or do something else?
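
A rough sketch of the string-keyed alternative asked about here, for
comparison. The function name timestamp, the clock_ns parameter and the set
of format names are assumptions made for illustration; nothing like this
signature was agreed in the thread. time.time_ns() is used only as a
convenient integer clock source.

```python
import time
from decimal import Decimal

# Hypothetical mapping from format names to converters.
# Every converter takes an integer number of nanoseconds.
_CONVERTERS = {
    "int": lambda ns: ns // 10**9,
    "float": lambda ns: ns / 10**9,
    "decimal": lambda ns: Decimal(ns).scaleb(-9),
}


def timestamp(format="float", clock_ns=time.time_ns):
    """Return the current time in the representation named by `format`."""
    try:
        convert = _CONVERTERS[format]
    except KeyError:
        raise ValueError("unknown timestamp format: %r" % (format,)) from None
    return convert(clock_ns())
```

With a string key the caller never has to import decimal just to name the
format, at the cost of a fixed vocabulary of names.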


From bauertomer at  Tue Jan 31 22:58:40 2012
From: bauertomer at (T.B.)
Date: Tue, 31 Jan 2012 23:58:40 +0200
Subject: [Python-Dev] threading.Semaphore()'s counter can become
 negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <>

> I concur. This is very much a non-problem.
> There is no need to add more code and slow
> running time with superfluous type checks.
> Raymond

What do you think about the following check from

@@ -317,8 +317,6 @@
          self._value = value

      def acquire(self, blocking=True, timeout=None):
-        if not blocking and timeout is not None:
-            raise ValueError("can't specify timeout for non-blocking acquire")
          rc = False
(There are similar checks in Modules/_threadmodule.c)

Removing the check means that we ignore the timeout argument when 
blocking=False. Currently in the multiprocessing docs there is an 
outdated note concerning acquire() methods that also says: "If block is 
False then timeout is ignored". This makes the acquire() methods of the 
threading and multiprocessing modules have different behaviors.
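
A small sketch of the behaviour the quoted check produces, assuming a
Python 3 threading module that still contains it: combining blocking=False
with a timeout is rejected instead of silently ignoring the timeout.

```python
import threading

sem = threading.Semaphore(1)

try:
    # A timeout makes no sense for a non-blocking acquire, so this raises.
    sem.acquire(blocking=False, timeout=1.0)
except ValueError as exc:
    message = str(exc)
else:
    message = None

# The rejected call did not consume the semaphore.
still_available = sem.acquire(blocking=False)
```

If the check were removed, the first call would simply behave like
acquire(blocking=False) and the timeout would be dropped without notice.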


From tjreedy at  Tue Jan 31 23:07:53 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 31 Jan 2012 17:07:53 -0500
Subject: [Python-Dev] threading.Semaphore()'s counter can become
	negative for non-ints
In-Reply-To: <>
References: <>
Message-ID: <jg9ons$70l$>

On 1/31/2012 3:10 PM, Raymond Hettinger wrote:
> On Jan 29, 2012, at 6:11 PM, John O'Connor wrote:
>> On Sat, Jan 28, 2012 at 3:07 PM, Benjamin Peterson
>> <benjamin at> wrote:
>>> But why would you want to pass a float? It seems like API abuse to me.
>> Agreed. Anything else seems meaningless.
> I concur. This is very much a non-problem.
> There is no need to add more code and slow
> running time with superfluous type checks.

If it does not now, the doc could be changed to say that the arg must be 
an int, and behavior is undefined otherwise. Then the contract is clear.

Terry Jan Reedy
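
To make the non-problem concrete, here is a simplified, single-threaded
stand-in for the counter logic being discussed. The class CounterSketch and
its require_int flag are hypothetical; it is not the threading.Semaphore
implementation, only a sketch that assumes the counter is compared against
zero and then decremented.

```python
class CounterSketch:
    """Toy model of a semaphore counter (no locking, no blocking)."""

    def __init__(self, value=1, require_int=False):
        # Optional type check of the kind the thread argues is superfluous.
        if require_int and not isinstance(value, int):
            raise TypeError("initial value must be an int")
        if value < 0:
            raise ValueError("initial value must be >= 0")
        self.value = value

    def try_acquire(self):
        # Refuse only when the counter is exactly zero.
        if self.value == 0:
            return False
        self.value -= 1
        return True


s = CounterSketch(0.5)
first = s.try_acquire()     # succeeds, counter is now -0.5
second = s.try_acquire()    # still succeeds, counter never hits exactly 0
```

With an int the counter stops at zero as expected; with 0.5 it steps past
zero and keeps going, which is the behaviour a documented "must be an int"
contract would declare undefined.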

From anacrolix at  Tue Jan 31 23:41:56 2012
From: anacrolix at (Matt Joiner)
Date: Wed, 1 Feb 2012 09:41:56 +1100
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>
Message-ID: <>

Nick mentioned using a single type and converting upon return, I'm starting
to like that more. A limited set of time formats is mostly arbitrary, and
there will always be a performance hit deciding which type to return.

The goal here is to allow high precision timings with minimal cost. A
separate module, and agreement on the best performing high precision
type, is I think the best way forward.

On Feb 1, 2012 8:47 AM, "Victor Stinner" <victor.stinner at> wrote:

> > (I removed the timespec format, I consider that we don't need it.)
> >
> > Rather, I guess you removed it because it didn't fit the "types as flags"
> > pattern.
> I removed it because I don't like tuples: you cannot do arithmetic on a
> tuple, like t2-t1. Printing a tuple doesn't give you nice output. It is
> used in C because you have no other choice, but in Python, we can do
> better.
> > As I said in another message, another hint that this is the wrong API
> > design: Will the APIs ever support passing in types other than these
> > five?  Probably not, so I strongly believe they should not be passed
> > in as types.
> I don't know if we should only support 3 types today, or more, but I
> suppose that we will add more later (e.g. if datetime is replaced by
> another new and better datetime module).
> You mean that we should use a string instead of type, so
> time.time(format="decimal")? Or do something else?
> Victor

From mark at  Tue Jan 31 23:58:48 2012
From: mark at (Mark Shannon)
Date: Tue, 31 Jan 2012 22:58:48 +0000
Subject: [Python-Dev] Store timestamps as decimal.Decimal objects
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Alexander Belopolsky wrote:
> On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou <solipsis at> wrote:
>> On Tue, 31 Jan 2012 21:11:37 +1000
>> Nick Coghlan <ncoghlan at> wrote:
>>> Having a low-level module like os needing to know about higher-level
>>> types like decimal.Decimal and datetime.datetime (or even timedelta)
>>> should be setting off all kinds of warning bells.
>> Decimal is ideally low-level (it's a number), it's just that it has a
>> complicated high-level implementation :)
> FWIW, my vote is also for Decimal and against datetime or timedelta.
> (I dream of Decimal replacing float in Python 4000, so take my vote
> with an appropriate amount of salt. :-)

Why not add a new function rather than modifying time.time()?
(after all it's just a timestamp; does it really need nanosecond precision?)

For those who do want super-accuracy, add a new function
time.picotime() (it could be nanotime but why not future proof it :) )
which returns an int representing the number of picoseconds since the
epoch. ints never lose precision and never overflow.
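
A minimal sketch of the idea, assuming an integer nanosecond clock such as
time.time_ns() is available as the source. The function picotime() below is
hypothetical (the time module has no such function), and the sample
timestamp is an arbitrary value chosen for illustration.

```python
import time
from decimal import Decimal


def picotime():
    """Hypothetical: picoseconds since the epoch, as an int."""
    # The clock only resolves nanoseconds; the last three digits are zero.
    return time.time_ns() * 1000


# Why an int (or Decimal) rather than a float: a float cannot hold
# nineteen significant digits, so nanoseconds are lost on the way.
ns = 1_327_000_000_123_456_789           # arbitrary sample, in nanoseconds
as_float = ns / 10**9                     # seconds as a float
round_tripped = int(as_float * 10**9)     # no longer equal to ns
exact = Decimal(ns).scaleb(-9)            # seconds as an exact Decimal
```

The int keeps every digit and subtracts cleanly (t2 - t1), while the float
round trip comes back changed.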


From trent at  Sun Jan 29 21:23:14 2012
From: trent at (Trent Nelson)
Date: Sun, 29 Jan 2012 15:23:14 -0500
Subject: [Python-Dev] Switching to Visual Studio 2010
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 26, 2012 at 12:54:31PM -0800, martin at wrote:
> > Is this considered a new feature that has to be in by the first beta?
> > I'm hoping to have it completed much sooner than that so we can get
> > mileage on it, but is there a cutoff for changing the compiler?
> At some point, I'll start doing this myself if it hasn't been done by
> then, and I would certainly want the build process adjusted (with
> all buildbots updated) before beta 1.

    I... I think I might have already done this, inadvertently.  I
    needed an x64 VS2010 debug build of Subversion/APR*/Python a few
    weeks ago -- forgetting the fact that we're still on VS2008.

    By the time I got to building Python, I'd already coerced everything
    else to use VS2010, so I just bit the bullet and coerced Python to
    use it too, including updating all the buildbot scripts and relevant
    externals to use VS2010, too.

    Things that immediately come to mind as potentially being useful:

  * Three new buildbot scripts:
        - build-amd64-vs10.bat
        - clean-amd64-vs10.bat
        - external-amd64-vs10.bat

  * Updates to externals/(tcl|tk)-8.5.9.x so that they both build with
    VS2010.  This was a tad fiddly.  I ended up creating makefile.vs10
    from win/ and encapsulating the changes there, then
    calling that from the buildbot *-vs10.bat scripts.  I had to change
    win/, too.

  * A few other things I can't remember off the top of my head.

    So, I guess my question is, is that work useful?  Based on Martin's
    original list, it seems to check a few boxes.

    Brian, what are your plans?  Are you going to continue working in then merge everything over when
    ready?  I have some time available to work on this for the next
    three weeks or so and would like to help out.

