From tim at  Wed Feb  1 00:09:46 2006
From: tim at (Tim Parkin)
Date: Tue, 31 Jan 2006 23:09:46 +0000
Subject: [Python-Dev] YAML (was Re: Extension to ConfigParser)
In-Reply-To: <drofeu$9g9$>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Georg Brandl wrote:
> Guido van Rossum wrote:
>>Ah. This definitely isn't what ConfigParser was meant to do. I'd think
>>for this you should use some kind of XML pickle though. That's
>>horrible if end users must edit it, but great for saving
>>near-arbitrary persistent data in a readable and occasionally editable
>>(for the developer) form.
> While we're at it, is the Python library going to incorporate some YAML
> parser in the future? YAML seems like a perfectly matching data format
> for Python.

Unfortunately, YAML still doesn't have a fully featured pure python
parser (pyyaml works on simple yaml documents).

The specification also doesn't have a blueprint implementation (there
was talk about one at some point) and the fact that the specification
has a context sensitive grammar and quite a large lookahead means that
writing parsers with standard components is a little tricky (I know I
tried for some time). The de facto standard implementation is 'syck'
which is a c library that is used in the ruby distribution and works
very well. Up until recently the only python wrapper that didn't
segfault for syck was our own pyrex wrapper. Fortunately, Kirill Simonov
has written an excellent wrapper (which handles load and dump) which is
available at

Although we make extensive use of yaml and it is definitely the best
human editable data format I've used - and our non techy clients agree
that it's pretty simple to use - it is a lot more complicated than ini
files. Our opinion is that it undoubtedly has its bad points but that
it makes complex configuration files easy to write, read and edit.

If you want a human readable serialisation format, it's way, way better
than xml. If you want to create config files that have some nesting and
typing, have a look and see what you think.

Tim Parkin

p.s. JSON is 'nearly' a subset of YAML (the nearly point is being
considered by various parties).

From bob at  Wed Feb  1 00:21:15 2006
From: bob at (Bob Ippolito)
Date: Tue, 31 Jan 2006 15:21:15 -0800
Subject: [Python-Dev] YAML (was Re: Extension to ConfigParser)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
	<drofeu$9g9$> <>
Message-ID: <>

On Jan 31, 2006, at 3:09 PM, Tim Parkin wrote:

> Georg Brandl wrote:
>> Guido van Rossum wrote:
>>> Ah. This definitely isn't what ConfigParser was meant to do. I'd  
>>> think
>>> for this you should use some kind of XML pickle though. That's
>>> horrible if end users must edit it, but great for saving
>>> near-arbitrary persistent data in a readable and occasionally  
>>> editable
>>> (for the developer) form.
>> While we're at it, is the Python library going to incorporate some YAML
>> parser in the future? YAML seems like a perfectly matching data format
>> for Python.
> Unfortunately, YAML still doesn't have a fully featured pure python
> parser (pyyaml works on simple yaml documents).

That's the killer for me.  I wanted to try it out once, but since  
there wasn't a good implementation I tossed it.

> p.s. JSON is 'nearly' a subset of YAML (the nearly point is being
> considered by various parties).

There's a subset of JSON that is valid YAML.  The output of  
simplejson is intentionally valid JSON and YAML, for example.   
Basically, the JSON serializer just needs to put whitespace in the  
right places.

JSON isn't a great human editable format... Better than XML I guess,  
but it's not terribly natural.  However, it is simple to implement,  
and the tools to deal with it are very widely available.
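
As a rough illustration of the whitespace point, here is a minimal sketch using Python's standard `json` module (not simplejson itself, and not code from this thread). The sample data is made up, and the `separators` argument is passed explicitly so the space after each comma and colon is visible. Whether a particular YAML parser accepts either form is an assumption this sketch does not check.

```python
import json

# Hypothetical sample data, for illustration only.
data = {"name": "example", "ports": [80, 443], "debug": False}

# Explicit separators: a space after every comma and every colon.
spaced = json.dumps(data, separators=(", ", ": "), sort_keys=True)

# Compact form with no whitespace at all: still valid JSON, but the
# "key":value pairs are the part some YAML parsers may reject.
compact = json.dumps(data, separators=(",", ":"), sort_keys=True)

print(spaced)
print(compact)
```

Both strings decode to the same data as JSON; only the placement of whitespace differs.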


From bokr at  Wed Feb  1 01:05:11 2006
From: bokr at (Bengt Richter)
Date: Wed, 01 Feb 2006 00:05:11 GMT
Subject: [Python-Dev] Octal literals
References: <>
Message-ID: <>

On Tue, 31 Jan 2006 17:17:22 -0500, "Andrew Koenig" <ark at> wrote:

>> Apart from making 0640 a syntax error (which I think is wrong too),
>> could this be solved by *requiring* the argument to be a string? (Or
>> some other data type, but that's probably overkill.)
>That solves the problem only in that particular context.
>I would think that if it is deemed undesirable for a leading 0 to imply
>octal, then it would be best to decide on a different syntax for octal
>literals and use that syntax consistently everywhere.
>I am personally partial to allowing an optional radix (in decimal) followed
>by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would
>all represent the same value.
In that case, could I also make a pitch for the letter c which would similarly
follow a radix (in decimal) but would introduce the rest of the number as
a radix-complement signed number, e.g., -2, 16cfe, 8c76, 2c110, 10c98 would
all have the same value, and the sign-digit could be arbitrarily repeated to
the left without changing the value, e.g., -2, 16cfffe, 8c776, 2c1110, 10c99998
would all have the same value. Likewise the positive values, where the "sign-digit"
would be 0 instead of radix-1 (in the particular digit set for the radix). E.g.,
2, 16c02, 16c0002, 8c02, 8c0002, 2c010, 2c0010, 10c02, 10c00002, etc. Of course
you can put a unary minus in front of any of those, so -16cf7 == 16c09, and
-2c0110 == -6 == 2c1010 etc.

This permits negative literal constants to be expressed "showing the bits"
as they are in two's complement or with the bits grouped to show as hex or
octal digits etc. And 16cf80000000 would become a 32-bit int, not a long as
would -0x80000000 (being a unary minus on a positive value that is promoted to long).

Bengt Richter
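
The radix-complement reading described above can be sketched in Python as follows. This is only an illustration of the proposal, not existing Python syntax: the function name `parse_radix_complement` is hypothetical, and rejecting a leading digit that is neither 0 nor radix-1 is this sketch's own choice, since the message only describes those two sign digits.

```python
def parse_radix_complement(literal):
    """Parse a hypothetical '<radix>c<digits>' literal, e.g. '16cfe' -> -2."""
    # Split at the first 'c'; the radix before it is written in decimal.
    radix_text, sep, digits = literal.lower().partition("c")
    if not sep or not radix_text or not digits:
        raise ValueError("expected '<radix>c<digits>': %r" % (literal,))
    radix = int(radix_text)
    magnitude = int(digits, radix)        # ValueError on a bad digit or radix
    sign_digit = int(digits[0], radix)
    if sign_digit == 0:
        return magnitude                  # leading 0: non-negative value
    if sign_digit == radix - 1:
        # Leading radix-1: subtract radix**n to get the negative value.
        return magnitude - radix ** len(digits)
    raise ValueError("leading digit must be 0 or radix-1: %r" % (literal,))


for text in ("16cfe", "8c76", "2c110", "10c98", "16cfffe", "2c0010"):
    print(text, parse_radix_complement(text))
```

Repeating the sign digit to the left leaves the result unchanged, because each extra digit adds (radix-1) * radix**n to the magnitude while the subtracted power grows by the same amount.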

From facundobatista at  Wed Feb  1 02:01:31 2006
From: facundobatista at (Facundo Batista)
Date: Tue, 31 Jan 2006 22:01:31 -0300
Subject: [Python-Dev] Extension to ConfigParser
In-Reply-To: <drm57e$isn$>
References: <>
	<> <>
Message-ID: <>

2006/1/30, Fredrik Lundh <fredrik at>:

> fwiw, I've *never* used INI files to store program state, and I've
> never used the save support in ConfigParser.

As a SiGeFi development decision, we committed ourselves to keeping the
program state between executions (hey, if I set the window this big, I
want the window this big next time!).

It was natural for us to save it in the user's home directory, in a
".sigefi" file.

And we thought it was impolite, at the least, to put a pickled dictionary
in the user's home directory. That's how we ended up keeping program state
in an .INI file :s.
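
For readers who have not used the save support, a minimal sketch of keeping state in INI form might look like this. It uses the Python 3 module name `configparser` (the module was called `ConfigParser` when this was written), the section and option names are invented for the example, and an in-memory buffer stands in for the real file.

```python
import configparser
import io

# Hypothetical state to remember between runs.
state = configparser.ConfigParser()
state["window"] = {"width": "800", "height": "600"}

# Write to an in-memory buffer; a real program would open a file
# in the user's home directory instead.
buffer = io.StringIO()
state.write(buffer)
saved_text = buffer.getvalue()

# On the next run: read the text back and recover typed values.
restored = configparser.ConfigParser()
restored.read_string(saved_text)
width = restored.getint("window", "width")
height = restored.getint("window", "height")

print(saved_text)
print(width, height)
```

The saved text stays readable and hand-editable, which is the property being weighed against a pickled dictionary.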


.    Facundo


From facundobatista at  Wed Feb  1 02:11:24 2006
From: facundobatista at (Facundo Batista)
Date: Tue, 31 Jan 2006 22:11:24 -0300
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

2006/1/31, Bengt Richter <bokr at>:

> In that case, could I also make a pitch for the letter c which would similarly
> follow a radix (in decimal) but would introduce the rest of the number as
> a radix-complement signed number, e.g., -2, 16cfe, 8c76, 2c110, 10c98 would
> all have the same value, and the sign-digit could be arbitrarily repeated to
> the left without changing the value, e.g., -2, 16cfffe, 8c776, 2c1110, 10c99998
> would all have the same value. Likewise the positive values, where the "sign-digit"
> would be 0 instead of radix-1 (in the particular digit set for the radix). E.g.,
> 2, 16c02, 16c0002, 8c02, 8c0002, 2c010, 2c0010, 10c02, 10c00002, etc. Of course
> you can put a unary minus in front of any of those, so -16cf7 == 16c09, and
> -2c0110 == -6 == 2c1010 etc.

This is getting too complicated.

I don't want to read code and pause for 5 minutes while doing math
to understand a number.

I think that the whole point of modifying something is to simplify it.

I'm +0 on removing 0-leading literals. But only if we create "d", "h"
and "o" suffixes to represent decimal, hex and octal literals (2.35d,
3Fh, 660o). And +0 on keeping the "0x" prefix for hex (c'mon, it
seems so natural....).


.    Facundo


From tim.peters at  Wed Feb  1 02:16:21 2006
From: tim.peters at (Tim Peters)
Date: Tue, 31 Jan 2006 20:16:21 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

[Thomas Wouters]
> I noticed a few compiler warnings, when I compile Python on my amd64 with
> gcc 4.0.3:
> Objects/longobject.c: In function 'PyLong_AsDouble':
> Objects/longobject.c:655: warning: 'e' may be used uninitialized in this function

Well, that's pretty bizarre.  There's _obviously_ no way to get to a
reference to `e` without going through

	x = _PyLong_AsScaledDouble(vv, &e);

first.  That isn't a useful warning.

> Objects/longobject.c: In function 'long_true_divide':
> Objects/longobject.c:2263: warning: 'aexp' may be used uninitialized in this function
> Objects/longobject.c:2263: warning: 'bexp' may be used uninitialized in this function

Same thing, really, complaining about vrbls whose values are always
set by _PyLong_AsScaledDouble().

> Modules/linuxaudiodev.c: In function 'lad_obuffree':
> Modules/linuxaudiodev.c:392: warning: 'ssize' may be used uninitialized in this function
> Modules/linuxaudiodev.c: In function 'lad_bufsize':
> Modules/linuxaudiodev.c:348: warning: 'ssize' may be used uninitialized in this function
> Modules/linuxaudiodev.c: In function 'lad_obufcount':
> Modules/linuxaudiodev.c:369: warning: 'ssize' may be used uninitialized in this function

Those are Linux bugs ;-)

> ...
> Should these warnings be fixed?

I don't know.  Is this version of gcc broken in some way relative to
other gcc versions, or newer, or ... ?  We certainly don't want to see
warnings under gcc, since it's heavily used, but I'm not clear on why
other versions of gcc aren't producing these warnings (or are they,
and people have been ignoring that?).

> I know Tim has always argued to fix them, in the past (and I agree,) and it
> doesn't look like doing so, by initializing the variables, would be too big a
> performance hit.

We shouldn't see any warnings under a healthy gcc.

> I also noticed test_logging is spuriously failing, and not just on my
> machine (according to buildbot logs.) Is anyone (Vinay?) looking at that
> yet?

FWIW, I've never seen this fail on Windows.  The difference is
probably that sockets on Windows work <wink>.

From guido at  Wed Feb  1 02:59:54 2006
From: guido at (Guido van Rossum)
Date: Tue, 31 Jan 2006 17:59:54 -0800
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/31/06, Tim Peters <tim.peters at> wrote:
> [Thomas Wouters]
> > Objects/longobject.c:655: warning: 'e' may be used uninitialized in this function
> Well, that's pretty bizarre.  There's _obviously_ no way to get to a
> reference to `e` without going through
>         x = _PyLong_AsScaledDouble(vv, &e);
> first.  That isn't a useful warning.

But how can the compiler know that it is an output-only argument?

--Guido van Rossum (home page:

From tim.peters at  Wed Feb  1 03:19:41 2006
From: tim.peters at (Tim Peters)
Date: Tue, 31 Jan 2006 21:19:41 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

>> Well, that's pretty bizarre.  There's _obviously_ no way to get to a
>> reference to `e` without going through
>>         x = _PyLong_AsScaledDouble(vv, &e);
>> first.  That isn't a useful warning.

> But how can the compiler know that it is an output-only argument?

In the absence of interprocedural analysis, it cannot -- and neither
can it know that it's not an output argument.  It can't know anything
non-trivial, and because it can't, a reasonable compiler would avoid
raising a red flag at "warning" level.  "info", maybe, if it has such
a concept.  It's as silly to me as seeing, e.g.,

double recip(double z)
{
    return 1.0 / z;
}

"warning: possible division by 0 or signaling NaN"

Perhaps, but not useful because there's no reason to presume it's a
_likely_ error.

From evdo.hsdpa at  Wed Feb  1 03:42:27 2006
From: evdo.hsdpa at (Robert Kim Wireless Internet Advisor)
Date: Tue, 31 Jan 2006 18:42:27 -0800
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

u guys are way over my head :)

Robert Kim
2611s Highway 101
suite 102
San diego CA 92007
206 984 0880

On 1/31/06, Guido van Rossum <guido at> wrote:
> On 1/31/06, Tim Peters <tim.peters at> wrote:
> > [Thomas Wouters]
> > > Objects/longobject.c:655: warning: 'e' may be used uninitialized in this function
> >
> > Well, that's pretty bizarre.  There's _obviously_ no way to get to a
> > reference to `e` without going through
> >
> >         x = _PyLong_AsScaledDouble(vv, &e);
> >
> > first.  That isn't a useful warning.
> But how can the compiler know that it is an output-only argument?
> --
> --Guido van Rossum (home page:


From foom at  Wed Feb  1 04:27:01 2006
From: foom at (James Y Knight)
Date: Tue, 31 Jan 2006 22:27:01 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On Jan 31, 2006, at 8:16 PM, Tim Peters wrote:

> [Thomas Wouters]
>> I noticed a few compiler warnings, when I compile Python on my  
>> amd64 with
>> gcc 4.0.3:
>> Objects/longobject.c: In function 'PyLong_AsDouble':
>> Objects/longobject.c:655: warning: 'e' may be used uninitialized  
>> in this function
> Well, that's pretty bizarre.  There's _obviously_ no way to get to a
> reference to `e` without going through
> 	x = _PyLong_AsScaledDouble(vv, &e);
> first.  That isn't a useful warning.

Look closer, and it's not quite so obvious. Here's the beginning of
PyLong_AsDouble:

> double
> PyLong_AsDouble(PyObject *vv)
> {
>     int e;
>     double x;
>     if (vv == NULL || !PyLong_Check(vv)) {
>         PyErr_BadInternalCall();
>         return -1;
>     }
>     x = _PyLong_AsScaledDouble(vv, &e);
>     if (x == -1.0 && PyErr_Occurred())
>         return -1.0;
>     if (e > INT_MAX / SHIFT)
>         goto overflow;

Here's the beginning of _PyLong_AsScaledDouble:

> _PyLong_AsScaledDouble(PyObject *vv, int *exponent)
> {
> #define NBITS_WANTED 57
>     PyLongObject *v;
>     double x;
>     const double multiplier = (double)(1L << SHIFT);
>     int i, sign;
>     int nbitsneeded;
>     if (vv == NULL || !PyLong_Check(vv)) {
>         PyErr_BadInternalCall();
>         return -1;
>     }

Now here's the thing: _PyLong_AsScaledDouble *doesn't* set exponent  
before returning -1 there, which is where the warning comes from.  
Now, you might protest, it's impossible to go down that code path,  
because of two reasons:

1) PyLong_AsDouble has an identical "(vv == NULL || !PyLong_Check(vv))"
check, so that codepath in _PyLong_AsScaledDouble cannot possibly be
gone down. However, PyLong_Check is a macro which expands to a function
call to an external function, "PyType_IsSubtype((vv)->ob_type,
(&PyLong_Type)))", so GCC has no idea it cannot return an error the
second time. This is the kind of thing C++'s const

2) There's a guard "(x == -1.0 && PyErr_Occurred())" before "e" is
used in PyLong_AsDouble, which checks the conditions that
_PyLong_AsScaledDouble set. Thus, e cannot possibly be used, even if
the previous codepath *was* possible to go down. However, again,
PyErr_BadInternalCall() is an external function, so the compiler has
no way of knowing that PyErr_BadInternalCall() causes
PyErr_Occurred() to return true.

So in conclusion, from all the information the compiler has available  
to it, it is giving a correct diagnostic.


From jeremy at  Wed Feb  1 05:28:22 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 31 Jan 2006 23:28:22 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On 1/31/06, Robert Kim Wireless Internet Advisor <evdo.hsdpa at> wrote:
> u guys are way over my head :)
> bob
> --
> Robert Kim
> 2611s Highway 101
> suite 102
> San diego CA 92007
> 206 984 0880

Stop spamming our list.


From ianb at  Wed Feb  1 05:32:20 2006
From: ianb at (Ian Bicking)
Date: Tue, 31 Jan 2006 22:32:20 -0600
Subject: [Python-Dev] Extension to ConfigParser
In-Reply-To: <>
References: <>		<>		<>	<>	<>	<>
Message-ID: <>

Sorry, I didn't follow up here like I should have, and I haven't 
followed the rest of this conversation, so apologies if I am being 

Fuzzyman wrote:
>>While ConfigParser is okay for simple configuration, it is (IMHO) not a 
>>very good basis for anyone who wants to build better systems, like 
>>config files that can be changed programmatically, or error messages 
>>that point to file and line numbers.  Those aren't necessarily features 
>>we need to expose in the standard library, but it'd be nice if you could 
>>implement that kind of feature without having to ignore the standard 
>>library entirely.
> Can you elaborate on what kinds of programmatic changes you envisage ?
> I'm just wondering if there are classes of usage not covered by
> ConfigObj. Of course you can pretty much do anything to a ConfigObj
> instance programmatically, but even so...

ConfigObj does fine, my criticism was simply of ConfigParser in this 
case.  Just yesterday I was doing (with ConfigParser):'app:main', '## Uncomment this next line to enable 
authentication:\n#filter-with', 'openid')

This is clearly lame ;)

>>That said, I'm not particularly enthused about a highly featureful 
>>config file *format* in the standard library, even if I would like a 
>>much more robust implementation.
> I don't see how you can easily separate the format from the parser -
> unless you just leave raw values. (As I said in the other email, I don't
> think I fully understand you.)
> If accessing raw values suits your purposes, why not subclass
> ConfigParser and do magic in the get* methods ?

I guess I haven't really looked closely at the implementation of 
ConfigParser, so I don't know how serious the subclassing would have to 
be.  But, for example, if you wanted to do nested sections this is not 
infeasible with the current syntax, you just have to overload the 
meaning of the section names.  E.g., [] (a section named 
"") could mean that this is a subsection of "foo".  Or, if the 
parser allows you to see the order of sections, you could use [[bar]] (a 
section named "[bar]") to imply a subsection, not unlike what you have 
already, except without the indentation.
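
The idea of overloading section names can be sketched roughly as follows. Everything here is an assumption made for illustration: the dotted naming convention, the sample text, and the helper name `nested_sections` are invented, and the code uses the Python 3 module name `configparser` rather than the `ConfigParser` module of the time.

```python
import configparser

# Hypothetical config text; dotted section names are this sketch's convention.
SAMPLE = """
[server]
host = localhost

[server.logging]
level = debug

[server.logging.file]
path = app.log
"""


def nested_sections(parser, separator="."):
    """Build nested dicts from section names split on `separator`."""
    tree = {}
    for section in parser.sections():
        node = tree
        # Walk (or create) one dict level per name component.
        for part in section.split(separator):
            node = node.setdefault(part, {})
        # Options of this section land in the innermost dict.
        node.update(parser.items(section))
    return tree


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
config = nested_sections(parser)
print(config["server"]["logging"]["level"])
```

The file format itself is unchanged; only the interface layered on top interprets the names. One limitation of this sketch is that an option and a subsection sharing a name under the same parent would collide.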

I think there's lots of other kinds of things you can do with the INI 
syntax as-is, but providing a different interface to it.  If you allow 
an easy-to-reuse parser, you can even check that syntax at read time. 
(Or if you keep enough information, check the syntax later and still be 
able to signal errors with filenames and line numbers)

An example of a parser that doesn't imply much of anything about the 
object being produced is one that I wrote here:

On top of that I was able to build some other fancy things without much 
problem (which ended up being too fancy, but that's a different issue ;)

>> From my light reading on ConfigObj, it looks like it satisfies my 
>>personal goals (though I haven't used it), but maybe has too many 
>>features, like nested sections.  And it seems like maybe the API can be 
> I personally think nested sections are very useful and would be sad to
> not see them included. Grouping additional configuration options as a
> sub-section can be *very* handy.

Using .'s in names can also do grouping, or section naming conventions.

Ian Bicking  |  ianb at  |

From jcarlson at  Wed Feb  1 05:36:34 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 31 Jan 2006 20:36:34 -0800
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

Robert Kim Wireless Internet Advisor <evdo.hsdpa at> wrote:
> u guys are way over my head :)
> bob

You seem to be new to the python-dev mailing list.  As a heads-up,
python-dev is for the development _of_ python.  If you are using Python,
and want help or want to help others using Python, you should instead
join python-list, or the equivalent comp.lang.python newsgroup.

Posting, as a new user, messages like "u guys are way over my head :)",
as well as your earlier post of "anybody here?", is a good and fast way
of being placed in everyone's kill file.

 - Josiah

From martin at  Wed Feb  1 08:15:41 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 01 Feb 2006 08:15:41 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

Tim Peters wrote:
>>I noticed a few compiler warnings, when I compile Python on my amd64 with
>>gcc 4.0.3:
>>Objects/longobject.c: In function 'PyLong_AsDouble':
>>Objects/longobject.c:655: warning: 'e' may be used uninitialized in this function
> Well, that's pretty bizarre.  There's _obviously_ no way to get to a
> reference to `e` without going through
> 	x = _PyLong_AsScaledDouble(vv, &e);
> first.  That isn't a useful warning.

It inlines the function to make this determination. Now, it's not true
that e can be uninitialized then, but there the gcc logic fails:

If you take the

        if (vv == NULL || !PyLong_Check(vv)) {
                PyErr_BadInternalCall();
                return -1;
        }

case in _PyLong_AsScaledDouble, *exponent won't be initialized. Then,
in PyLong_AsDouble, with

        x = _PyLong_AsScaledDouble(vv, &e);
        if (x == -1.0 && PyErr_Occurred())
                return -1.0;

it looks like the return would not be taken if PyErr_Occurred returns
false. Of course, it won't, but that is difficult to analyse.

> I don't know.  Is this version of gcc broken in some way relative to
> other gcc versions, or newer, or ... ?  We certainly don't want to see
> warnings under gcc, since it's heavily used, but I'm not clear on why
> other versions of gcc aren't producing these warnings (or are they,
> and people have been ignoring that?).

gcc 4 does inlining in far more cases now.


From martin at  Wed Feb  1 08:20:21 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 01 Feb 2006 08:20:21 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Guido van Rossum wrote:
>>Well, that's pretty bizarre.  There's _obviously_ no way to get to a
>>reference to `e` without going through
>>        x = _PyLong_AsScaledDouble(vv, &e);
>>first.  That isn't a useful warning.
> But how can the compiler know that it is an output-only argument?

If a variable's address is passed to a function, gcc normally assumes
that the function will modify the variable, so you normally don't
see "might be used uninitialized" warnings. However, gcc now also
inlines the functions called if possible, to find out how the pointer
is used inside the function.

Changing the order of the functions in the file won't help anymore,
either. If you want to suppress inlining, you must put


before the function.


From thomas at  Wed Feb  1 11:14:05 2006
From: thomas at (Thomas Wouters)
Date: Wed, 1 Feb 2006 11:14:05 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Jan 31, 2006 at 08:16:21PM -0500, Tim Peters wrote:

> Is this version of gcc broken in some way relative to other gcc versions,
> or newer, or ... ?  We certainly don't want to see warnings under gcc,
> since it's heavily used, but I'm not clear on why other versions of gcc
> aren't producing these warnings (or are they, and people have been
> ignoring that?).

Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
(in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
behaves the same way. The normal make process shows quite a lot of output on
systems that use gcc, so I wouldn't be surprised if people did ignore it,
for the most part.

My main problem with fixing the warnings is that I don't see the difference
between, for example, the 'ssize' variable and the 'nchannels' variable in
linuxaudio's lad_obuffree/lad_bufsize/lad_obufcount. 'ssize' gets a warning,
'nchannels' doesn't, yet how they are treated is not particularly different.
The ssize output parameter gets set inside a switch, is directly followed by
a break, and the switch is directly followed by a set of the nchannels
output parameter. The only way through the switch is through the set of
ssize. I understand the compiler doesn't "see" it this way, but who knows
for how long :)

I guess we ignore this until we're closer to a 2.5alpha1 ;P

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From sjoerd at  Wed Feb  1 11:34:00 2006
From: sjoerd at (Sjoerd Mullender)
Date: Wed, 01 Feb 2006 11:34:00 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Thomas Wouters wrote:
> On Tue, Jan 31, 2006 at 08:16:21PM -0500, Tim Peters wrote:
>>Is this version of gcc broken in some way relative to other gcc versions,
>>or newer, or ... ?  We certainly don't want to see warnings under gcc,
>>since it's heavily used, but I'm not clear on why other versions of gcc
>>aren't producing these warnings (or are they, and people have been
>>ignoring that?).
> Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
> (in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
> behaves the same way. The normal make process shows quite a lot of output on
> systems that use gcc, so I wouldn't be surprised if people did ignore it,
> for the most part.
> My main problem with fixing the warnings is that I don't see the difference
> between, for example, the 'ssize' variable and the 'nchannels' variable in
> linuxaudio's lad_obuffree/lad_bufsize/lad_obufcount. 'ssize' gets a warning,
> 'nchannels' doesn't, yet how they are treated is not particularly different.
> The ssize output parameter gets set inside a switch, is directly followed by
> a break, and the switch is directly followed by a set of the nchannels
> output parameter. The only way through the switch is through the set of
> ssize. I understand the compiler doesn't "see" it this way, but who knows
> for how long :)
> I guess we ignore this until we're closer to a 2.5alpha1 ;P

I don't quite understand what's the big deal.  The compiler issues a
warning.  We know better (and I agree, we *do* know better in most of
these cases), but it's easy to add a "= 0" to the declaration of the
variable to shut up the compiler, hopefully with a comment saying as
much.  That's what I've been doing in my code that generated these
warnings.  It's clearly a "bug" in the compiler that it isn't smart
enough to figure out that variables do actually only get used after
they've been set.  Hence, this is Somebody Else's Problem.

Sjoerd Mullender

From gjc at  Wed Feb  1 13:33:36 2006
From: gjc at (Gustavo J. A. M. Carneiro)
Date: Wed, 01 Feb 2006 12:33:36 +0000
Subject: [Python-Dev] Octal literals
In-Reply-To: <011c01c626b4$2d6a0750$6402a8c0@arkdesktop>
References: <011c01c626b4$2d6a0750$6402a8c0@arkdesktop>
Message-ID: <1138797216.6791.38.camel@localhost.localdomain>

On Tue, 2006-01-31 at 17:17 -0500, Andrew Koenig wrote:
> > Apart from making 0640 a syntax error (which I think is wrong too),
> > could this be solved by *requiring* the argument to be a string? (Or
> > some other data type, but that's probably overkill.)
> That solves the problem only in that particular context.
> I would think that if it is deemed undesirable for a leading 0 to imply
> octal, then it would be best to decide on a different syntax for octal
> literals and use that syntax consistently everywhere.

  +1, and then issue a warning every time the parser sees leading 0
octal constant instead of the new syntax, although the old syntax would
continue to work for compatibility reasons.

> I am personally partial to allowing an optional radix (in decimal) followed
> by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would
> all represent the same value.

  For me, adding the radix to the right instead of left looks nicer:
23r8, 13r16, etc., since a radix is almost like a unit, and units are
always to the right.  Plus, we already use suffix characters to the
right, like 10L.  And I seem to recall an old assembler (a z80
assembler, IIRC :P) that used a syntax like 10h and 11b for hex an bin

  Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
casual observer; perhaps a suffix letter is more readable, since we
don't need arbitrary radix support anyway.

/me thinks of some examples:

  644o # I _think_ the small 'o' cannot be easily confused with 0 or O,
  10h  # hex.. hm.. but we already have 0x10
  101b # binary

  Another possibility is to extend the 0x syntax to non-hex,

   0xff   # hex
   0o644  # octal
   0b1101 # binary

  I'm unsure which one I like better.
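
To make the competing spellings concrete, here is a small sketch. The function `parse_radix_prefixed` is hypothetical and implements the `8r23`-style form discussed in this thread, which is not Python syntax. The final three lines rely on `int(text, 0)`, which in current Python infers the base from a `0x`, `0o`, or `0b` prefix.

```python
def parse_radix_prefixed(literal):
    """Parse the hypothetical '<radix>r<digits>' form, e.g. '8r23' -> 19."""
    # The radix is written in decimal before the first 'r'.
    radix_text, sep, digits = literal.lower().partition("r")
    if not sep:
        return int(literal)               # no 'r': plain decimal
    return int(digits, int(radix_text))   # ValueError on a bad digit or radix


for text in ("19", "8r23", "16r13"):
    print(text, parse_radix_prefixed(text))

# Prefix spellings, with the base inferred from the prefix (base 0).
print(int("0xff", 0))
print(int("0o644", 0))
print(int("0b1101", 0))
```

With the prefix forms the radix is limited to the few letters that have been assigned, which matches the remark above that arbitrary radix support is not really needed.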


Gustavo J. A. M. Carneiro
<gjc at> <gustavo at>
The universe is always one step beyond logic

From scott+python-dev at  Wed Feb  1 14:07:00 2006
From: scott+python-dev at (Scott Dial)
Date: Wed, 01 Feb 2006 08:07:00 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Thomas Wouters wrote:
> My main problem with fixing the warnings is that I don't see the difference
> between, for example, the 'ssize' variable and the 'nchannels' variable

As was pointed out elsewhere, any variable that is passed by-reference 
to another function is ignored for the purposes of these warnings. The 
fact that the ioctl call with nchannels happens well after potential 
problem spots doesn't matter. It appears that GCC has eliminated it from 
the decision process for the purposes of these warnings already.

The problem stems from the ambiguity of the returns. At compile-time, 
there is no way for GCC to know that the return value will be negative in 
the error case, and thus the return may cause us to go down an execution 
path on which ssize (and nchannels) need to be initialized. This check seems 
to be very shallow: even if you provide a guarantee that the return 
value will be well-behaved, GCC has already given up on figuring this 
out. The rule of thumb here seems to be "if you make a call to a 
function which provides the condition under which the uninitialized 
variable is used, then the condition is decided to be ambiguous."

So, either the GCC people have not noticed this problem, or (more 
likely) have decided that this is acceptable, but clearly it will cause 
spurious warnings. Hey, after all, they are just warnings.

Scott Dial
scott at
dialsa at

From mwh at  Wed Feb  1 14:51:03 2006
From: mwh at (Michael Hudson)
Date: Wed, 01 Feb 2006 13:51:03 +0000
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <> (Scott Dial's message of "Wed,
	01 Feb 2006 08:07:00 -0500")
References: <>
	<> <>
Message-ID: <>

Scott Dial <scott+python-dev at> writes:

> So, either the GCC people have not noticed this problem, or (more 
> likely) have decided that this is acceptable, but clearly it will cause 
> spurious warnings. Hey, after all, they are just warnings.

Well, indeed, but "no warnings" is a useful policy -- it makes new
warnings much easier to spot :)

The warnings under discussion seem rather excessive to me.


  Ignoring the rules in the FAQ: 1" slice in spleen and prevention 
    of immediate medical care.
                                              -- Mark C. Langston, asr

From thomas at  Wed Feb  1 15:33:22 2006
From: thomas at (Thomas Wouters)
Date: Wed, 1 Feb 2006 15:33:22 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Feb 01, 2006 at 01:51:03PM +0000, Michael Hudson wrote:
> Scott Dial <scott+python-dev at> writes:
> > So, either the GCC people have not noticed this problem, or (more 
> > likely) have decided that this is acceptable, but clearly it will cause 
> > spurious warnings. Hey, after all, they are just warnings.

> Well, indeed, but "no warnings" is a useful policy -- it makes new
> warnings much easier to spot :)

> The warnings under discussion seem rather excessive to me.

Yes, and more than that; fixing them 'properly' requires more than just
initializing them. There is no sane default for some of those warnings, so a
proper fix would have to check for a sane value after the function returns.
That is, if we take the warning seriously. If we don't take it seriously,
initializing the variable may suppress a warning in the future: one of the
called functions could change, opening a code path that in fact doesn't
initialize the output variable. But initializing to a sentinel value,
checking the value before use and handling that case sanely isn't always
easy, or efficient. Hence my suggestion to let this wait a bit (since they
are, at this time, spurious errors). Fixing the warnings *now* won't fix any
bugs, may mask future bugs, and may not be necessary if gcc 4.0.4 grows a
better way to suppress these warnings. Or gcc 4.0 may grow more such
warnings, in which case we may want to change the 'no warnings' policy or
the flags to gcc, or add a 'known spurious warnings' checking thing to the
buildbot.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tim.peters at  Wed Feb  1 16:15:15 2006
From: tim.peters at (Tim Peters)
Date: Wed, 1 Feb 2006 10:15:15 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

[Martin v. Löwis]
> It inlines the function to make this determination.

Very cool!  Is this a new(ish) behavior?

> Now, it's not true that e can be uninitialized then, but there
> the gcc logic fails:

That's fine -- there are any number of ways a compiler can reach a
wrong conclusion by making conservative assumptions, and so long as
it's actually staring at code I don't mind that at all.  What I would
mind is griping about some_func(&a) possibly not setting `a` in the
_absence_ of staring at `some_func`'s internals.

> If you take the
>         if (vv == NULL || !PyLong_Check(vv)) {
>                 PyErr_BadInternalCall();
>                 return -1;
>         }
> case in _PyLong_AsScaledDouble, *exponent won't be initialized.

Certainly, and I don't expect a compiler to realize that this branch
is impossible when _PyLong_AsScaledDouble is invoked from the call
sites where gcc is complaining.

> Then, in PyLong_AsDouble, with
>         x = _PyLong_AsScaledDouble(vv, &e);
>         if (x == -1.0 && PyErr_Occurred())
>                 return -1.0;
> it looks like the return would not be taken if PyErr_Occurred returns
> false. Of course, it won't, but that is difficult to analyse.

PyLong_AsDouble already did:

	if (vv == NULL || !PyLong_Check(vv)) {
		PyErr_BadInternalCall();
		return -1;
	}

before calling _PyLong_AsScaledDouble(), and the latter's `x` is the
former's `vv`.  That is, the check you showed above from
_PyLong_AsScaledDouble() is exactly the same as the check
PyLong_AsDouble already made.  To exploit that, gcc would have to
realize PyLong_Check() is a "pure enough" function, and I don't expect
gcc to be able to figure that out.

>> I don't know.  Is this version of gcc broken in some way relative to
>> other gcc versions, or newer, or ... ?  We certainly don't want to see
>> warnings under gcc, since it's heavily used, but I'm not clear on why
>> other versions of gcc aren't producing these warnings (or are they,
>> and people have been ignoring that?).

> gcc 4 does inlining in far more cases now.

OK then.  Thomas, for these _PyLong_AsScaledDouble()-caller cases, I
suggest doing whatever obvious thing manages to silence the warning. 
For example, in PyLong_AsDouble:

	int e = -1;  /* silence gcc warning */

and then add:

	assert(e >= 0);

after the call.

From scott+python-dev at  Wed Feb  1 16:17:26 2006
From: scott+python-dev at (Scott Dial)
Date: Wed, 01 Feb 2006 10:17:26 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>	<>
	<>	<>
Message-ID: <>

Thomas Wouters wrote:
> On Wed, Feb 01, 2006 at 01:51:03PM +0000, Michael Hudson wrote:
>> Scott Dial <scott+python-dev at> writes:
>>> So, either the GCC people have not noticed this problem, or (more 
>>> likely) have decided that this is acceptable, but clearly it will cause 
>>> spurious warnings. Hey, after all, they are just warnings.
>> Well, indeed, but "no warnings" is a useful policy -- it makes new
>> warnings much easier to spot :)
>> The warnings under discussion seem rather excessive to me.
> Yes, and more than that; fixing them 'properly' requires more than just
> initializing them. There is no sane default for some of those warnings, so a
> proper fix would have to check for a sane value after the function returns.
> That is, if we take the warning seriously. If we don't take it seriously,
> initializing the variable may suppress a warning in the future: one of the
> called functions could change, opening a code path that in fact doesn't
> initialize the output variable. But initializing to a sentinel value,
> checking the value before use and handling that case sanely isn't always
> easy, or efficient. Hence my suggestion to let this wait a bit (since they
> are, at this time, spurious errors). Fixing the warnings *now* won't fix any
> bugs, may mask future bugs, and may not be necessary if gcc 4.0.4 grows a
> better way to suppress these warnings. Or gcc 4.0 may grow more such
> warnings, in which case we may want to change the 'no warnings' policy or
> the flags to gcc, or add a 'known spurious warnings' checking thing to the
> buildbot.

Although it is no consolation, there are two types of uninitialized
variable warnings: the known-error ("is used uninitialized in this
function") and the probable-error ("may be used uninitialized in this
function"). It may be reasonable to ignore the probable-error case. I
think someone even mentioned that they really should be an "info" and
not a "warning".

The points in the code clearly should have attention brought to them 
because there is a real possibility of error, but as you say, there is 
no way to rid yourself of this type of warning.

Also, note that the phrasing "is"/"may be" is a change from 3.x to 4.x. 
The old warning was "might" always, and as I understand gcc the "might" 
of 3.x maps directly to the "is" of 4.x -- leaving "may be" an entirely 
new thing to 4.x.

 From gcc/tree-ssa.c:
    The second pass follows PHI nodes to find uses that are potentially
    uninitialized.  In this case we can't necessarily prove that the use
    is really uninitialized.  This pass is run after most optimizations,
    so that we thread as many jumps and possible, and delete as much dead
    code as possible, in order to reduce false positives.  We also look
    again for plain uninitialized variables, since optimization may have
    changed conditionally uninitialized to unconditionally uninitialized.

Scott Dial
scott at
dialsa at

From rhamph at  Wed Feb  1 16:32:49 2006
From: rhamph at (Adam Olsen)
Date: Wed, 1 Feb 2006 08:32:49 -0700
Subject: [Python-Dev] Octal literals
In-Reply-To: <1138797216.6791.38.camel@localhost.localdomain>
References: <011c01c626b4$2d6a0750$6402a8c0@arkdesktop>
Message-ID: <>

On 2/1/06, Gustavo J. A. M. Carneiro <gjc at> wrote:
> On Tue, 2006-01-31 at 17:17 -0500, Andrew Koenig wrote:
> > I am personally partial to allowing an optional radix (in decimal) followed
> > by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would
> > all represent the same value.
>   For me, adding the radix to the right instead of left looks nicer:
> 23r8, 13r16, etc., since a radix is almost like a unit, and units are
> always to the right.  Plus, we already use suffix characters to the
> right, like 10L.  And I seem to recall an old assembler (a z80
> assembler, IIRC :P) that used a syntax like 10h and 11b for hex and bin
> radix.

ffr16  #16rff or 255
Iamadeadparrotr36 # 36rIamadeadparrot or 3120788520272999375597

Suffix syntax for bases higher than 10 is ambiguous with variable
names.  Prefix syntax is not.
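
A minimal sketch of that ambiguity, written for a present-day Python 3
interpreter (an assumption; the thread predates it). The helper name
parse_prefixed is hypothetical and only mimics the proposed prefix form
by delegating to the built-in int(text, base):

```python
# A suffix-style literal such as ffr16 is already a legal identifier,
# so a tokenizer could not tell it apart from a variable name.
suffix_forms = ["ffr16", "Iamadeadparrotr36"]
prefix_forms = ["16rff", "36rIamadeadparrot"]

suffix_is_name = [s.isidentifier() for s in suffix_forms]
prefix_is_name = [s.isidentifier() for s in prefix_forms]


def parse_prefixed(text):
    """Hypothetical reader for '<radix>r<digits>', built on int(text, base)."""
    radix, _, digits = text.partition("r")  # split at the first 'r'
    return int(digits, int(radix))


value = parse_prefixed("16rff")
```

Because the prefix form starts with a decimal digit, it can never collide
with a name, which is the property being argued for here.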

Adam Olsen, aka Rhamphoryncus

From bokr at  Wed Feb  1 16:35:55 2006
From: bokr at (Bengt Richter)
Date: Wed, 01 Feb 2006 15:35:55 GMT
Subject: [Python-Dev] Octal literals
References: <011c01c626b4$2d6a0750$6402a8c0@arkdesktop>
Message-ID: <>

On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro" <gjc at> wrote:
>  Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
>casual observer; perhaps a suffix letter is more readable, since we
>don't need arbitrary radix support anyway.
>/me thinks of some examples:
>  644o # I _think_ the small 'o' cannot be easily confused with 0 or O,
>  10h  # hex.. hm.. but we already have 0x10
>  101b # binary
>  Another possibility is to extend the 0x syntax to non-hex,
>   0xff   # hex
>   0o644  # octal
>   0b1101 # binary
>  I'm unsure which one I like better.
Sorry if I seem to be picking nits, but IMO there's more than a nit here:

The trouble with all of these is that they are all literals
for integers, but integers are signed, and there is no way
to represent the sign bit (wherever it is for a particular platform)
along with the others, without triggering a promotion to positive long.

So you get stuff like

 >>> def i32(i): return int(-(i&0x80000000))+int(i&0x7fffffff)
 >>> MYCONST = i32(0x87654321)
 >>> type(MYCONST)
 <type 'int'>
 >>> hex(MYCONST)
Oops ;-/
 >>> hex(MYCONST&0xffffffff)

instead of

    MYCONST = 16cf87654321

Hm... maybe an explicit ordinary sign _after_ the prefix would be more mnemonic
instead of indicating it with the radix-complement (f or 0 for hex). E.g.,

    MYCONST = 16r-87654321  # all bits above the 8 are ones


    MYCONST = 16r+87654321   # explicitly positive, all bits above 8 (none for 32 bits) are zeroes
    MYCONST = 16r87654321    # implicitly positive, ditto

or the above in binary

    MYCONST = 2r-10000111011001010100001100100001  # leading bits are ones (here all are specified for 32-bit int, but
                                                   # effect would be noticeable for smaller numbers or wider ints)
    MYCONST = 2r+10000111011001010100001100100001  # leading bits are zeroes (ditto)
    MYCONST = 2r10000111011001010100001100100001   # ditto

This could also be done as alternative 0x syntax, e.g. using 0h, 0o, and 0b,
but I sure don't like that '0o' ;-)

BTW, for non-power-of-two radices(?), it should be remembered that the '-'
is mnemonic for the symbol for (radix-1), and '+' or no sign is mnemonic for
a prefixed 0 (which is 0 in any allowable radix) in order to have this notation
have general radix expressivity for free ;-)
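
A rough sketch of the idea above, in present-day Python 3 (an
assumption; ints there are unbounded, so no promotion to long occurs).
The function radix_literal is hypothetical: it is one possible reading
of the proposed '<radix>r[+-]<digits>' notation, where '-' stands for
an endless run of leading (radix-1) digits, and is not an existing
Python feature:

```python
def i32(bits):
    """Reinterpret the low 32 bits of `bits` as a signed two's-complement int."""
    return (bits & 0x7FFFFFFF) - (bits & 0x80000000)


def radix_literal(text):
    """Hypothetical parser for the proposed '<radix>r[+-]<digits>' form."""
    radix_part, _, rest = text.partition("r")
    radix = int(radix_part)
    sign = "+"
    if rest[0] in "+-":
        sign, rest = rest[0], rest[1:]
    value = int(rest, radix)
    if sign == "-":
        # Leading digits are all (radix - 1): subtract radix ** digit_count.
        value -= radix ** len(rest)
    return value


MYCONST = i32(0x87654321)
as_hex = hex(MYCONST)                 # signed rendering, the "Oops" case
as_bits = hex(MYCONST & 0xFFFFFFFF)   # masked to recover the bit pattern
```

Under this reading, radix_literal("16r-87654321") and i32(0x87654321)
name the same integer, while the '+' and unsigned forms stay positive.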

Bengt Richter

From dw at  Wed Feb  1 16:33:09 2006
From: dw at (David Wilson)
Date: Wed, 1 Feb 2006 15:33:09 +0000
Subject: [Python-Dev] failing sender verification.
Message-ID: <>

Hi there,

Recently, updates from MoinMoin have started getting quarantined due to sender
verification failing. On investigating the problem, it seems that an assumption
about the webmaster mailbox is incorrect:

    220 ESMTP Postfix (Debian/GNU)
    MAIL FROM: <>
    503 Error: send HELO/EHLO first
    MAIL FROM: <>
    250 Ok
    RCPT TO: webmaster at
    553 invalid bounce (address does not send mail)

The MoinMoin instance on is sending mail as "webmaster at".
Can somebody take a look? Or at least tell me who to contact.


PS: Please CC me in replies as I am not currently subscribed.

It's never too late to have a happy childhood.

From tim.peters at  Wed Feb  1 17:29:16 2006
From: tim.peters at (Tim Peters)
Date: Wed, 1 Feb 2006 11:29:16 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

[Thomas Wouters]
> Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
> (in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
> behaves the same way. The normal make process shows quite a lot of output on
> systems that use gcc, so I wouldn't be surprised if people did ignore it,
> for the most part.

Does it really?  It's completely warning-free on Windows, and that's
the intent, and it takes ongoing work to keep it that way.  Over at,

I only see one gcc warning, coming from Python/Python-ast.c.  I
suppose that isn't a complete build, though.

From foom at  Wed Feb  1 18:40:42 2006
From: foom at (James Y Knight)
Date: Wed, 1 Feb 2006 12:40:42 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <1138797216.6791.38.camel@localhost.localdomain>
References: <011c01c626b4$2d6a0750$6402a8c0@arkdesktop>
Message-ID: <>

On Feb 1, 2006, at 7:33 AM, Gustavo J. A. M. Carneiro wrote:

>   Another possibility is to extend the 0x syntax to non-hex,
>    0xff   # hex
>    0o644  # octal
>    0b1101 # binary



From jcarlson at  Wed Feb  1 18:47:34 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 01 Feb 2006 09:47:34 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

bokr at (Bengt Richter) wrote:
> On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro" <gjc at> wrote:
> [...]
> >  Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
> >casual observer; perhaps a suffix letter is more readable, since we
> >don't need arbitrary radix support anyway.

[snip discussion over radix and complements]

I hope I'm not the only one who thinks that "simple is better than
complex", at least when it comes to numeric constants.  Certainly it
would be _convenient_ to express constants in a radix other than decimal,
hexadecimal, or octal, but to me, it all looks like noise.

Personally, I was on board for the removal of octal literals, if only
because I find _seeing_ a leading zero without something else (like the
'x' for hexadecimal) to be difficult, and because I've found little use
for them in my work (decimals and hex are usually all I need).

Should it change for me?  Of course not, but I think that adding
different ways to spell integer values will tend to confuse new and
seasoned python users.  Some will like the flexibility that adding new
options offers, but I believe such a change will be a net loss for the
understandability of those pieces of code which use it.

 - Josiah

From bjourne at  Wed Feb  1 19:14:25 2006
From: bjourne at (BJörn Lindqvist)
Date: Wed, 1 Feb 2006 18:14:25 +0000
Subject: [Python-Dev] The path module PEP
In-Reply-To: <>
References: <>
Message-ID: <>

I've submitted an updated version of the PEP. The only major change is
that instead of the property atime and method getatime() there is now
only one method named atime(). I also added some information about the
string inheritance problem to Open Issues. I still have no idea what to
do about it though.

mvh Björn

From bokr at  Wed Feb  1 19:17:30 2006
From: bokr at (Bengt Richter)
Date: Wed, 01 Feb 2006 18:17:30 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Wed, 01 Feb 2006 09:47:34 -0800, Josiah Carlson <jcarlson at> wrote:

>bokr at (Bengt Richter) wrote:
>> On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro" <gjc at> wrote:
>> [...]
>> >  Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
>> >casual observer; perhaps a suffix letter is more readable, since we
>> >don't need arbitrary radix support anyway.
>[snip discussion over radix and complements]
>I hope I'm not the only one who thinks that "simple is better than
>complex", at least when it comes to numeric constants.  Certainly it
>would be _convenient_ to express constants in a radix other than decimal,
>hexadecimal, or octal, but to me, it all looks like noise.
You don't have to use any other radix, any more than you have to use all forms
of float literals if you are happy with xx.yy. The others just become available
through a consistent methodology.

>Personally, I was on board for the removal of octal literals, if only
>because I find _seeing_ a leading zero without something else (like the
>'x' for hexadecimal) to be difficult, and because I've found little use
>for them in my work (decimals and hex are usually all I need).
I agree that 8r641 is more easily disambiguated than 0641 ;-)

But how do you represent a negative int in hex? Or have you never encountered the need?
The failure of current formats with respect to negative values whose values you
want to specify in a bit-specifying format was my main point.

Bengt Richter

From barry at  Wed Feb  1 19:35:14 2006
From: barry at (Barry Warsaw)
Date: Wed, 01 Feb 2006 13:35:14 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Wed, 2006-02-01 at 09:47 -0800, Josiah Carlson wrote:

> I hope I'm not the only one who thinks that "simple is better than
> complex", at least when it comes to numeric constants.  Certainly it
> would be _convenient_ to express constants in a radix other than decimal,
> hexadecimal, or octal, but to me, it all looks like noise.

As a Unix weenie and occasional bit twiddler, I've had needs for octal,
hex, and binary literals.  +1 for coming up with a common syntax for
these.  -1 on removing any way to write octal literals.

The proposal for something like 0xff, 0o664, and 0b1001001 seems like
the right direction, although 'o' for octal literal looks kind of funky.
Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).



From gvwilson at  Wed Feb  1 19:55:42 2006
From: gvwilson at (Greg Wilson)
Date: Wed, 1 Feb 2006 13:55:42 -0500 (EST)
Subject: [Python-Dev] syntactic support for sets
Message-ID: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>


I have a student who may be interested in adding syntactic support for
sets to Python, so that:

    x = {1, 2, 3, 4, 5}

and:

    y = {z for z in x if (z % 2)}

would be legal.  There are of course issues (what's the syntax for a
frozen set? for the empty set?), but before he even starts, I'd like to
know if this would ever be considered for inclusion into the language.


p.s. please Cc: me as well as the list, since I'm no longer subscribed.

From martin at  Wed Feb  1 20:16:04 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 01 Feb 2006 20:16:04 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Sjoerd Mullender wrote:
> I don't quite understand what's the big deal.

Traditionally, people see two problems with these initializations:
- the extra initialization may cause a performance loss.
- the initialization might hide real bugs later on. For example,
  if an additional control flow branch is added which fails to
  initialize the variable, you don't get the warning anymore,
  not even from compilers which previously did a correct analysis.

Whether this is a big deal, I don't know.


From jcarlson at  Wed Feb  1 20:07:17 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 01 Feb 2006 11:07:17 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

bokr at (Bengt Richter) wrote:
> On Wed, 01 Feb 2006 09:47:34 -0800, Josiah Carlson <jcarlson at> wrote:
> >bokr at (Bengt Richter) wrote:
> >> On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro" <gjc at> wrote:
> >> [...]
> >> >  Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
> >> >casual observer; perhaps a suffix letter is more readable, since we
> >> >don't need arbitrary radix support anyway.
> >
> >[snip discussion over radix and complements]
> >
> >I hope I'm not the only one who thinks that "simple is better than
> >complex", at least when it comes to numeric constants.  Certainly it
> >would be _convenient_ to express constants in a radix other than decimal,
> >hexadecimal, or octal, but to me, it all looks like noise.
> You don't have to use any other radix, any more than you have to use all forms
> of float literals if you are happy with xx.yy. The others just become available
> through a consistent methodology.
> >Personally, I was on board for the removal of octal literals, if only
> >because I find _seeing_ a leading zero without something else (like the
> >'x' for hexadecimal) to be difficult, and because I've found little use
> >for them in my work (decimals and hex are usually all I need).
> I agree that 8r641 is more easily disambiguated than 0641 ;-)
> But how do you represent a negative int in hex? Or have you never encountered the need?
> The failure of current formats with respect to negative values whose values you
> want to specify in a bit-specifying format was my main point.

In my experience, I've rarely had the opportunity (or misfortune?) to
deal with negative constants whose exact bit representation I needed to
get "just right".  For my uses, I find specifying "-0x..." or "-..."
to be sufficient.

Certainly it may or may not be the case in what you are doing (hence
your exposition on signs, radixes, etc.).

Would the i32() function you previously defined, as well as a utility
h32() function which does the reverse be a reasonable start?  Are there
any radixes beyond binary, octal, decimal, and hexadecimal that people
want to use?  Does it make sense to create YYrXXXXX syntax for integer
literals for basically 4 representations, all of which can be handled by
int('XXXXXX', YY) (ignoring the runtime overhead)?  Does the suffix idea
for different types (long, decimal, ...) necessarily suggest that
suffixes for radixes for one type (int/long), such as 1011b or 2000o,
are a good idea?

I'll expand what I said before; there are many things that would make
integer literals more convenient for heavy (or experienced) users of
non-decimal or non-decimal-non-positive literals, but it wouldn't
necessarily increase the understandability of code which uses them.
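
To make the int('XXXXXX', YY) point concrete, here is a small sketch
in present-day Python 3 (an assumption, since the thread predates it).
The name h32 is taken from the paragraph above, but its body is only a
guess at what such a utility might look like:

```python
# int(text, base) already accepts any radix from 2 to 36.
same_value = [int("19", 10), int("23", 8), int("13", 16)]


def h32(n):
    """Hypothetical inverse of i32(): render n's low 32 bits as hex."""
    return hex(n & 0xFFFFFFFF)


pattern = h32(-1)
```

The cost is a call at run time rather than a constant fixed at compile
time, which is the overhead set aside above.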

 - Josiah

From paul-python at  Wed Feb  1 19:54:49 2006
From: paul-python at (Paul Svensson)
Date: Wed, 1 Feb 2006 13:54:49 -0500 (EST)
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Wed, 1 Feb 2006, Barry Warsaw wrote:

> The proposal for something like 0xff, 0o664, and 0b1001001 seems like
> the right direction, although 'o' for octal literal looks kind of funky.
> Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).

Shouldn't it be 0t644 then, and 0n1001001 for binary ?
That would sidestep the issue of 'b' and 'c' being valid
hexadecimal digits as well.

Regarding negative numbers, I think they're a red herring.
If there is any need for a new literal format,
it would be to express ~0x0f, not -0x10.
1xf0 has been proposed before, but I think YAGNI.
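
A quick illustration of the distinction, in present-day Python 3 (an
assumption); the variable names are made up for the example:

```python
mask = 0x0F
inverted = ~mask             # bitwise complement of the mask
low_byte = inverted & 0xFF   # keep only the lowest eight bits
```

Both ~0x0f and -0x10 denote the same integer; the difference is only in
which spelling tells the reader that a bit mask is meant.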


From martin at  Wed Feb  1 20:21:58 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 01 Feb 2006 20:21:58 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Tim Peters wrote:
>>It inlines the function to make this determination.
> Very cool!  Is this a new(ish) behavior?

In 3.4:

# A new unit-at-a-time compilation scheme for C, Objective-C, C++ and
# Java which is enabled via -funit-at-a-time (and implied by -O2). In
# this scheme a whole file is parsed first and optimized later. The
# following basic inter-procedural optimizations are implemented:
#  - ...

The actual "might be uninitialized" warning comes from the SSA branch,
which was merged in 4.0, as somebody else pointed out.


From rasky at  Wed Feb  1 20:40:58 2006
From: rasky at (Giovanni Bajo)
Date: Wed, 1 Feb 2006 20:40:58 +0100
Subject: [Python-Dev] Compiler warnings
References: <>
Message-ID: <07d601c62767$6cde4aa0$bf03030a@trilan>

Tim Peters <tim.peters at> wrote:

> [Thomas Wouters]
>> I noticed a few compiler warnings, when I compile Python on my amd64 with
>> gcc 4.0.3:
>> Objects/longobject.c: In function 'PyLong_AsDouble':
>> Objects/longobject.c:655: warning: 'e' may be used uninitialized in this
>> function
> Well, that's pretty bizarre.  There's _obviously_ no way to get to a
> reference to `e` without going through
> x = _PyLong_AsScaledDouble(vv, &e);
> first.  That isn't a useful warning.

This has been discussed many times on the GCC mailing list. Ultimately,
detecting whether a variable is used uninitialized or not (given full
interprocedural and whole-program compilation) is a problem that can be
reduced to the halting problem. The only thing that GCC should (and will) do
is find a way to be consistent across different releases and optimization
levels, and to produce a useful number of warnings, while not issuing too
many false positives.
Giovanni Bajo

From barry at  Wed Feb  1 20:51:41 2006
From: barry at (Barry Warsaw)
Date: Wed, 01 Feb 2006 14:51:41 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-01 at 11:07 -0800, Josiah Carlson wrote:

> In my experience, I've rarely had the opportunity (or misfortune?) to
> deal with negative constants, whose exact bit representation I needed to
> get "just right".  For my uses, I find that specifying "-0x..." or "-..."
> to be sufficient.

I can't remember a time when signed hex, oct, or binary representation
wasn't a major inconvenience, let alone something desirable.  Don't get
me started about hex(id(object()))!  I typically use hex for addresses
and bit fields, binary for bit flags and other bit twiddling, and oct
for OS/file system interfaces.  In none of those cases do you actually
need or want signed values.  IME.



From brett at  Wed Feb  1 20:59:10 2006
From: brett at (Brett Cannon)
Date: Wed, 1 Feb 2006 11:59:10 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On 2/1/06, Greg Wilson <gvwilson at> wrote:
> Hi,
> I have a student who may be interested in adding syntactic support for
> sets to Python, so that:
>     x = {1, 2, 3, 4, 5}
> and:
>     y = {z for z in x if (z % 2)}
> would be legal.  There are of course issues (what's the syntax for a
> frozen set? for the empty set?), but before he even starts, I'd like to
> know if this would ever be considered for inclusion into the language.

I am -0 on set syntax support.  If the set() constructor was expanded
to take an arbitrary number of arguments (and thus be more in line with
the dict constructor) then the syntax need really starts to go away
since the above could be done as ``set(1, 2, 3, 4, 5)``.

As for the set comprehension/expression/thing, I don't think that is
needed at all when ``set(z for z in x if z % 2)`` will get the job
done just as well without adding more syntactic sugar to the language
for something that is so easy to do already.

> p.s. please Cc: me as well as the list, since I'm no longer subscribed.


From raymond.hettinger at  Wed Feb  1 20:50:28 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 01 Feb 2006 14:50:28 -0500
Subject: [Python-Dev] syntactic support for sets
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <000d01c62768$c11c0980$b83efea9@RaymondLaptop1>

[Greg Wilson]
> I have a student who may be interested in adding syntactic support for
> sets to Python, so that:
>    x = {1, 2, 3, 4, 5}
> and:
>    y = {z for z in x if (z % 2)}
> would be legal.  There are of course issues (what's the syntax for a
> frozen set? for the empty set?), but before he even starts, I'd like to
> know if this would ever be considered for inclusion into the language.

Generator expressions make syntactic support irrelevant:

  x = set(xrange(1,6))
  y = set(z for z in x if (z % 2))
  y = frozenset(z for z in x if (z % 2))

Accordingly, Guido rejected the braced notation for set comprehensions.


From pje at  Wed Feb  1 21:03:22 2006
From: pje at (Phillip J. Eby)
Date: Wed, 01 Feb 2006 15:03:22 -0500
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

At 01:55 PM 2/1/2006 -0500, Greg Wilson wrote:
>I have a student who may be interested in adding syntactic support for
>sets to Python, so that:
>     x = {1, 2, 3, 4, 5}
>     y = {z for z in x if (z % 2)}
>would be legal.  There are of course issues (what's the syntax for a
>frozen set? for the empty set?),

Ones that work now:

    frozenset(z for z in x if (z%2))


The only case that looks slightly less than optimal is:

    set((1, 2, 3, 4, 5))

But I'm not sure that it warrants a special syntax just to get rid of the 
extra ().

From raymond.hettinger at  Wed Feb  1 21:16:58 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 01 Feb 2006 15:16:58 -0500
Subject: [Python-Dev] syntactic support for sets
References: <>
Message-ID: <000701c6276c$7504d500$b83efea9@RaymondLaptop1>

[Phillip J. Eby]
> The only case that looks slightly less than optimal is:
>    set((1, 2, 3, 4, 5))
> But I'm not sure that it warrants a special syntax just to get rid of the
> extra ().

The PEP records that Tim argued for leaving the extra parentheses.
What would you do with {'title'} -- create a four element set consisting
of letters or a single element set consisting of a string?


From thomas at  Wed Feb  1 22:15:11 2006
From: thomas at (Thomas Wouters)
Date: Wed, 1 Feb 2006 22:15:11 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 01, 2006 at 11:29:16AM -0500, Tim Peters wrote:
> [Thomas Wouters]
> > Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
> > (in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
> > behaves the same way. The normal make process shows quite a lot of output on
> > systems that use gcc, so I wouldn't be surprised if people did ignore it,
> > for the most part.

> Does it really?  It's completely warning-free on Windows, and that's
> the intent, and it takes ongoing work to keep it that way.  Over at,
> e.g.,

No, it's mostly warning-free, it just outputs a lot of text. By default,
the warnings don't stand out much. And if you have a decent computer, it
scrolls by pretty fast, too. ;)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at  Wed Feb  1 22:34:22 2006
From: thomas at (Thomas Wouters)
Date: Wed, 1 Feb 2006 22:34:22 +0100
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 01, 2006 at 10:15:15AM -0500, Tim Peters wrote:

> Thomas, for these _PyLong_AsScaledDouble()-caller cases, I suggest doing
> whatever obvious thing manages to silence the warning.  For example, in
> PyLong_AsDouble:
> 	int e = -1;  /* silence gcc warning */
> and then add:
> 	assert(e >= 0);
> after the call.

Done, although it was nowhere near obvious to me that -1 would be a sane
sentinel value ;) Not that I don't believe you, but it took some actual
reading of _PyLong_AsScaledDouble to confirm it.

Reading--imagine-that-ly y'rs,
Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From steven.bethard at  Wed Feb  1 22:58:41 2006
From: steven.bethard at (Steven Bethard)
Date: Wed, 1 Feb 2006 14:58:41 -0700
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <000701c6276c$7504d500$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
> [Phillip J. Eby]
> > The only case that looks slightly less than optimal is:
> >
> >    set((1, 2, 3, 4, 5))
> >
> > But I'm not sure that it warrants a special syntax just to get rid of the
> > extra ().
> The PEP records that Tim argued for leaving the extra parentheses.
> What would you do with {'title'} -- create a four element set consisting
> of letters or a single element set consisting of a string?

I think the answer to this one is clearly that it is a single element
set consisting of a string, just as ['title'] is a single element list
consisting of a string.

I believe the confusion arises if Brett's proposal for ``set(1, 2, 3,
4, 5)`` is considered.  Currently, set('title') is a four element set
consisting of letters.  But set('title', 'author') would be a two
element set consisting of two strings?  The problem is in calling the
set constructor, not in writing a set literal.
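
A minimal sketch of the constructor behaviour discussed above, assuming
a current Python 3 interpreter rather than the 2.4 of this thread. The
variable names are only illustrative:

```python
# Passing a string to set() iterates over its characters.
letters = set('title')        # 't' occurs twice, so four distinct letters remain

# Wrapping the string in a list gives a single-element set instead.
single = set(['title'])

print(sorted(letters))        # ['e', 'i', 'l', 't']
print(len(single))            # 1
```

The ambiguity only exists for the call form: the argument is always
treated as an iterable of elements, never as an element itself.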

That said, I don't think there's really that much of a need for set
literals.  I use sets almost exclusively to remove duplicates, so I
almost always start with empty sets and add things to them.  And I'm
certainly never going to write ``set([1, 1, 2])`` when I could just
write ``set([1, 2])``.

You can wordify anything if you just verb it.
        --- Bucky Katt, Get Fuzzy

From tim.peters at  Wed Feb  1 23:42:36 2006
From: tim.peters at (Tim Peters)
Date: Wed, 1 Feb 2006 17:42:36 -0500
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

> Done,


> although it was nowhere near obvious to me that -1 would be a sane
> sentinel value ;) Not that I don't believe you, but it took some actual
> reading of _PyLong_AsScaledDouble to confirm it.

Nope, the thing to do was to read the docs for _PyLong_AsScaledDouble,
which explicitly promise e >= 0.  That's what I did :-)  "The docs"
are in longobject.h.  You can tell which functions I wrote, BTW,
because they're the ones with comments in the header file documenting
what they do.  It's an ongoing mystery to me why nobody else found
that to be a practice worth emulating ;-)/:-(

From raymond.hettinger at  Wed Feb  1 23:49:21 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 01 Feb 2006 17:49:21 -0500
Subject: [Python-Dev] syntactic support for sets
References: <>
Message-ID: <001b01c62781$be6b8620$b83efea9@RaymondLaptop1>

[Greg Wilson]
> This is a moderately-fertile source of bugs for newcomers: judging from
> the number of students who come into my office with code that they think
> ought to work, but doesn't, most people believe that:
>    set(1, 2, 3)

Like many things in Python where people pre-emptively believe one thing
or another, the interpreter's corrective feedback is immediate:

    >>> set(1, 2, 3)
    Traceback (most recent call last):
        set(1, 2, 3)
    TypeError: set expected at most 1 arguments, got 3

There is further feedback in the repr string which serves as a reminder
of how to construct a literal:

    >>> set(xrange(3))
    set([0, 1, 2])

Once the students have progressed beyond academic finger drills and
have started writing real code, have you observed a shift in emphasis
away from hard-coded literals and towards something like s=set(data)
where the data is either read-in from outside the script or generated by
another part of the program?

For academic purposes, I think the genexp form also has value in that
it is broadly applicable to more than just sets (e.g. dict comprehensions)
and that it doesn't have to grapple with arbitrary choices about whether
{1,2,3} would be a set or frozenset.
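
The two points above (immediate feedback, and building sets from data
or a generator expression) can be sketched as follows. This assumes
Python 3, where range replaces xrange; the variable names are made up
for the illustration:

```python
# set() takes at most one argument, and it must be iterable.
try:
    set(1, 2, 3)
except TypeError as exc:
    error_message = str(exc)

# Typical non-literal uses: deduplicate existing data, or feed a genexp.
data = [3, 1, 3, 2, 1]
unique = set(data)
squares = set(x * x for x in range(4))

print(error_message)
print(sorted(unique), sorted(squares))
```

The same genexp form works unchanged with frozenset(), so no choice
between the two types has to be baked into the syntax.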


From mwh at  Wed Feb  1 23:59:21 2006
From: mwh at (Michael Hudson)
Date: Wed, 01 Feb 2006 22:59:21 +0000
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <> (Thomas Wouters's message of
	"Wed, 1 Feb 2006 22:15:11 +0100")
References: <>
Message-ID: <>

Thomas Wouters <thomas at> writes:

> On Wed, Feb 01, 2006 at 11:29:16AM -0500, Tim Peters wrote:
>> [Thomas Wouters]
>> > Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
>> > (in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
>> > behaves the same way. The normal make process shows quite a lot of output on
>> > systems that use gcc, so I wouldn't be surprised if people did ignore it,
>> > for the most part.
>> Does it really?  It's completely warning-free on Windows, and that's
>> the intent, and it takes ongoing work to keep it that way.  Over at,
>> e.g.,
> No, it's mostly warning-free, it just outputs a lot of text. By default,
> the warnings don't stand out much. And if you have a decent computer, it
> scrolls by pretty fast, too. ;)

"make -s" is a wonderful thing :)


  In case you're not a computer person, I should probably point out
  that "Real Soon Now" is a technical term meaning "sometime before
  the heat-death of the universe, maybe".
                                     -- Scott Fahlman <sef at>

From evdo.hsdpa at  Thu Feb  2 00:03:57 2006
From: evdo.hsdpa at (Robert Kim Wireless Internet Advisor)
Date: Wed, 1 Feb 2006 15:03:57 -0800
Subject: [Python-Dev] Compiler warnings
In-Reply-To: <>
References: <>
Message-ID: <>

Thomas,,,, thanks.. useful string ... bob

On 2/1/06, Michael Hudson <mwh at> wrote:
> Thomas Wouters <thomas at> writes:
> > On Wed, Feb 01, 2006 at 11:29:16AM -0500, Tim Peters wrote:
> >> [Thomas Wouters]
> >> > Well, I said 4.0.3, and that was wrong. It's actually a pre-release of 4.0.3
> >> > (in Debian's 'unstable' distribution.) However, 4.0.2 (the actual release)
> >> > behaves the same way. The normal make process shows quite a lot of output on
> >> > systems that use gcc, so I wouldn't be surprised if people did ignore it,
> >> > for the most part.
> >
> >> Does it really?  It's completely warning-free on Windows, and that's
> >> the intent, and it takes ongoing work to keep it that way.  Over at,
> >> e.g.,
> >
> > No, it's mostly warning-free, it just outputs a lot of text. By default,
> > the warnings don't stand out much. And if you have a decent computer, it
> > scrolls by pretty fast, too. ;)
> "make -s" is a wonderful thing :)
> Cheers,
> mwh
> --
>  In case you're not a computer person, I should probably point out
>  that "Real Soon Now" is a technical term meaning "sometime before
>  the heat-death of the universe, maybe".
>                                     -- Scott Fahlman <sef at>

Robert Q Kim, Wireless Internet Advisor

2611 S. Pacific Coast Highway 101
Suite 102
Cardiff by the Sea, CA 92007
206 984 0880

From dw at  Thu Feb  2 01:36:24 2006
From: dw at (David Wilson)
Date: Thu, 2 Feb 2006 00:36:24 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Wed, Feb 01, 2006 at 03:03:22PM -0500, Phillip J. Eby wrote:

> The only case that looks slightly less than optimal is:
>     set((1, 2, 3, 4, 5))
> But I'm not sure that it warrants a special syntax just to get rid of the 
> extra ().

In any case I don't think it's possible to differentiate between the
current calling convention and the 'parenless' one reliably, eg.:

    S = set([])

There is no way to tell if that is a set containing an empty list
created using the parenless syntax, or an empty set, as is created with
the current calling convention.
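
A rough sketch of that ambiguity, assuming Python 3. varargs_set is a
hypothetical stand-in for the 'parenless' constructor, and an empty
tuple is used instead of an empty list because lists are unhashable and
could not be set elements anyway:

```python
def varargs_set(*items):
    """Hypothetical 'parenless' constructor: every argument is one element."""
    return set(items)

empty = ()  # hashable stand-in for the empty list in the example above

current = set(empty)             # current convention: iterate over the argument
parenless = varargs_set(empty)   # hypothetical convention: the argument is an element

print(len(current), len(parenless))   # 0 1
```

The same spelling gives two different results, so one constructor could
not support both conventions without guessing.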

DISOBEY, v.t.  To celebrate with an appropriate ceremony the maturity
of a command.

From eric.nieuwland at  Thu Feb  2 14:45:52 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Thu, 2 Feb 2006 14:45:52 +0100
Subject: [Python-Dev] The path module PEP
In-Reply-To: <>
References: <>
Message-ID: <>

On 1 feb 2006, at 19:14, Björn Lindqvist wrote:

> I've submitted an updated version of the PEP. The only major change is
> that instead of the method getatime() and property atime there is now
> only one method named atime(). Also some information about the string
> inheritance problem in Open Issues. I still have no idea what to do
> about it though.

The current PEP still contains some redundancy between properties and 
methods under Specifications:

basename() <-> name
basename(), stripext() <-> namebase
splitpath() <-> parent, name (documented)
I would suggest using only properties, with splitall() returning a
tuple with the complete breakdown of the path.
Maybe splitall() could then be renamed to split().

The directory methods mkdir()/makedirs() and rmdir()/removedirs() could 
be unified. To me it seems they only exist because of Un*x details.
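
To make the overlap concrete, here is a sketch using the existing
posixpath functions rather than the PEP's Path class. The splitall()
helper below is hypothetical and only approximates what the PEP method
would return:

```python
import posixpath

path = "/usr/lib/python2.4/site.py"

parent, name = posixpath.split(path)       # roughly splitpath() <-> parent, name
namebase, ext = posixpath.splitext(name)   # roughly basename() + stripext() <-> namebase

def splitall(p):
    """Hypothetical helper: complete breakdown of a POSIX path as a tuple."""
    parts = []
    while True:
        head, tail = posixpath.split(p)
        if head == p:            # reached the root "/" or the empty string
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        p = head
    return tuple(reversed(parts))

print(parent, name, namebase, ext)
print(splitall(path))
```

With such a tuple available, parent and name are just different views
of the same breakdown.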

my $0.005


From hyeshik at  Thu Feb  2 18:44:13 2006
From: hyeshik at (Hye-Shik Chang)
Date: Fri, 3 Feb 2006 02:44:13 +0900
Subject: [Python-Dev] ctypes patch (was:  (libffi) Re: Copyright issue)
Message-ID: <>

On 1/30/06, "Martin v. Löwis" <martin at> wrote:
> Hye-Shik Chang wrote:
> > I did some work to make ctypes+libffi compacter and liberal.
> >  (svn)
> >
> > I removed sources/gcc and put sources/libffi copied from gcc 4.0.2.
> > And removed all automake-related build processes and integrated
> > them into There's still aclocal.m4 in sources/libffi. But
> > it is just identical to libffi's acinclude.m4 which looks liberal.
> Well done! Would you like to derive a Python patch from that?
> Don't worry about MSVC, yet, I will do that once the sources
> are in the subversion.

Here goes patches for the integration:


I implemented it in two flavors.  [1] runs libffi's configure along with
Python's and just builds it.  And [2] has no change to
Python's configure and runs libffi configure and builds it.
And both patches don't have things for documentations yet.

> (Of course, for due process, it would be better if this code gets
> integrated into the official ctypes first, and then we incorporate
> some named/versioned snapshot into /external, and svn cp it into
> python/trunk from there).

Thomas and I collaborated on integration into the ctypes repository
and testing on various platforms yesterday.  My patches for Python
are derived from ctypes CVS with a change of only one line.


From scs5mjf at  Wed Feb  1 20:09:52 2006
From: scs5mjf at (M J Fleming)
Date: Wed, 1 Feb 2006 19:09:52 +0000
Subject: [Python-Dev] Octal literals
Message-ID: <>

On Wed, Feb 01, 2006 at 01:35:14PM -0500, Barry Warsaw wrote:
> The proposal for something like 0xff, 0o664, and 0b1001001 seems like
> the right direction, although 'o' for octal literal looks kind of funky.
> Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).
> -Barry


I definitely agree with the 0c664 octal literal. Seems rather more
intuitive.


From gvwilson at  Wed Feb  1 22:44:32 2006
From: gvwilson at (Greg Wilson)
Date: Wed, 1 Feb 2006 16:44:32 -0500 (EST)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <000d01c62768$c11c0980$b83efea9@RaymondLaptop1>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <Pine.GSO.4.58.0602011642460.5572@dvp.cs>

> Generator expressions make syntactic support irrelevant:

Not when you're teaching the language to undergraduates: I haven't
actually done the study yet (though I may this summer), but I'm willing to
bet that allowing "math" notation for sets will more than double their
use.  (Imagine having to write "list(1, 2, 3, 4, 5)"...)

> Accordingly, Guido rejected the braced notation for set comprehensions.
> See:

"...however, the issue could be revisited for Python 3000 (see PEP 3000)."
So I'm only 1994 years early ;-)


From gvwilson at  Wed Feb  1 22:48:23 2006
From: gvwilson at (Greg Wilson)
Date: Wed, 1 Feb 2006 16:48:23 -0500 (EST)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <000701c6276c$7504d500$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <Pine.GSO.4.58.0602011646130.5572@dvp.cs>

> The PEP records that Tim argued for leaving the extra parentheses. What
> would you do with {'title'} -- create a four element set consisting of
> letters or a single element set consisting of a string?

This is a moderately-fertile source of bugs for newcomers: judging from
the number of students who come into my office with code that they think
ought to work, but doesn't, most people believe that:

    set(1, 2, 3)

is "right".  I believe curly-brace notation would eliminate this problem.


From gvwilson at  Thu Feb  2 02:55:38 2006
From: gvwilson at (Greg Wilson)
Date: Wed, 1 Feb 2006 20:55:38 -0500 (EST)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <001b01c62781$be6b8620$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <Pine.GSO.4.58.0602012050540.14355@qew.cs>

> Like many things in Python where people pre-emptively believe one thing
> or another, the interpreter's corrective feedback is immediate:

Yup, that's the theory; it's a shame practice is different.

> Once the students have progressed beyond academic finger drills and have
> started writing real code, have you observed a shift in emphasis away
> from hard-coded literals and towards something like s=set(data) where
> the data is either read-in from outside the script or generated by
> another part of the program?

The problem is that once people classify something as "hard" or "fragile",
they (consciously or unconsciously) avoid it thereafter, which of course
means that it doesn't get any easier or more robust, since they're not
practicing it.  This has been observed in many arenas, not just
programming.  I agree it's not a compelling reason to add set notation to
the language, but I'd rather eliminate the sand traps than require people
to learn to recognize and avoid them.


From theller at  Thu Feb  2 19:55:50 2006
From: theller at (Thomas Heller)
Date: Thu, 02 Feb 2006 19:55:50 +0100
Subject: [Python-Dev] ctypes patch
References: <>
Message-ID: <>

Hye-Shik Chang <hyeshik at> writes:

> On 1/30/06, "Martin v. Löwis" <martin at> wrote:
>> Hye-Shik Chang wrote:
>> > I did some work to make ctypes+libffi compacter and liberal.
>> >  (svn)
>> >
>> > I removed sources/gcc and put sources/libffi copied from gcc 4.0.2.
>> > And removed all automake-related build processes and integrated
>> > them into There's still aclocal.m4 in sources/libffi. But
>> > it is just identical to libffi's acinclude.m4 which looks liberal.
>> Well done! Would you like to derive a Python patch from that?
>> Don't worry about MSVC, yet, I will do that once the sources
>> are in the subversion.
> Here goes patches for the integration:
> [1]
> [2]
> I implemented it in two flavors.  [1] runs libffi's configure along with
> Python's and just builds it.  And [2] has no change to
> Python's configure and runs libffi configure and builds it.
> And both patches don't have things for documentations yet.

My plan is to make separate ctypes releases for 2.3 and 2.4, even after
it is integrated into Python 2.5, so it seems [2] would be better - it
must be possible to build ctypes without Python.

As I said before, docs still need to be written.  I think content is
more important than markup, so I'm writing in reST; it can be converted
to LaTeX later.  I expect that writing the docs will show quite some
edges that need to be cleaned up - that should certainly be done before
the first 2.5 release.

Also I want to make a few releases before declaring the 1.0 version.
This does not mean that I'm against integrating it right now.

>> (Of course, for due process, it would be better if this code gets
>> integrated into the official ctypes first, and then we incorporate
>> some named/versioned snapshot into /external, and svn cp it into
>> python/trunk from there).
> Thomas and I collaborated on integration into the ctypes repository
> and testing on various platforms yesterday.  My patches for Python
> are derived from ctypes CVS with a change of only one line.
Hye-Shik has done a great job!  Many thanks to him for that.


From ark at  Thu Feb  2 20:09:38 2006
From: ark at (Andrew Koenig)
Date: Thu, 2 Feb 2006 14:09:38 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
Message-ID: <000001c6282c$3a0efa50$6402a8c0@arkdesktop>

> I definitely agree with the 0c664 octal literal. Seems rather more
> intuitive.

I still prefer 8r664.

From bokr at  Thu Feb  2 20:11:13 2006
From: bokr at (Bengt Richter)
Date: Thu, 02 Feb 2006 19:11:13 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Wed, 1 Feb 2006 13:54:49 -0500 (EST), Paul Svensson <paul-python at> wrote:

>On Wed, 1 Feb 2006, Barry Warsaw wrote:
>> The proposal for something like 0xff, 0o664, and 0b1001001 seems like
>> the right direction, although 'o' for octal literal looks kind of funky.
>> Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).
>Shouldn't it be 0t644 then, and 0n1001001 for binary ?
>That would sidestep the issue of 'b' and 'c' being valid
>hexadecimal digits as well.
>Regarding negative numbers, I think they're a red herring.
>If there is any need for a new literal format,
>it would be to express ~0x0f, not -0x10.
>1xf0 has been proposed before, but I think YAGNI.
YMMV re YAGNI, but you have an excellent point re negative numbers vs ~.

If you look at examples, the representation digits _are_ actually "~" ;-)
I.e., I first proposed 'c' in place of 'r' for 16cf0, where "c" stands for
radix _complement_, and 0 and 1 are complements wrt 2, as are
hex 0 and f wrt radix 16.

So the actual notation has digits that are radix-complement, and
are evaluated as such to get the integer value.

So ~0x0f is represented 16r-f0, which does produce a negative number
(but whose integer value BTW is -0x10, not 0x0f). I.e., -16r-f0 == 16r+10,
and the sign after the 'r' is a complement-notation indicator, not
an algebraic sign. (Perhaps '^' would be a better indicator, as -16r^f0 == 0x10.)

Thank you for making the point that the negative value per se is a red herring.

Still, that is where the problem shows up: e.g. when we want to define a hex bit mask
as an int and the sign bit happens to be set. IMO it's a wart that if you want
to define bit masks as integer data, you have to invoke computation for the sign bit,

BIT_0 = 0x1
BIT_1 = 0x02
BIT_30 = 0x40000000
BIT_31 = int(-0x80000000)

instead of defining true literals all the way, e.g.,

BIT_0 = 16r1
BIT_1 = 16r2 # or 16r00000002 obviously
BIT_30 = 16r+40000000
BIT_31 = 16r-80000000

and if you wanted to define the bit-wise complement masks as literals,
you could, though radix-2 is certainly easier to see (introducing '_' as transparent elision)

CBIT_0 = 16r-e # or 16r-fffffffe or 2r-0 or 2r-11111111_11111111_11111111_11111110
CBIT_1 = 16r-d # or 16r-fffffffd or 2r-01 or 2r-11111111_11111111_11111111_11111101
CBIT_30 = 16r-bfffffff # or 2r-10111111_11111111_11111111_11111111
CBIT_31 = 16r+7fffffff # or 2r+01111111_11111111_11111111_11111111

With constant-folding optimization and some kind of inference-guiding for expressions like
-sys.maxint-1, perhaps computation vs true literals will become moot. And practically
it already is, since a one-time computation is normally insignificant in time or space.

But aren't we also targeting platforms where space is at a premium, and being able to
define constants as literal data without resorting to workaround pre-processing would be nice?

BTW, base-complement decoding works by generalized analogy to twos complement decoding, by assuming
that the most significant digit is a signed coefficient value for base**digitpos in radix-complement form,
where the upper half of the range of digits represents negative values as digit-radix, and the rest positive as digit.
The rest of the digits are all positive coefficients for base powers.

E.g., to decode our simple example[1] represented as a literal in base-complement form (very little tested):

 >>> def bclitval(s, digits='0123456789abcdefghijklmnopqrstuvwxyz'):
 ...     """
 ...     decode base complement literal of form <base>r<sign><digits>
 ...     where
 ...         <base> is in range(2,37) or more if digits supplied
 ...         <sign> is a mnemonic + for digits[0] and - for digits[<base>-1] or absent
 ...         <digits> are decoded as base-complement notation after <sign> if
 ...             present is changed to appropriate digit.
 ...         The first digit is taken as a signed coefficient with value
 ...         digit-<base> (negative) if the digit*2>=B and digit (positive) otherwise.
 ...     """
 ...     B, s = s.split('r', 1)
 ...     B = int(B)
 ...     if s[0] =='+': s = digits[0]+s[1:]
 ...     elif s[0] =='-': s = digits[B-1]+s[1:]
 ...     ds = digits.index(s[0])
 ...     if ds*2 >= B: acc = ds-B
 ...     else: acc = ds
 ...     for c in s[1:]: acc = acc*B + digits.index(c)
 ...     return acc
 >>> bclitval('16r80000004')
 >>> bclitval('2r10000000000000000000000000000100')

BTW, because of the decoding method, extended "sign" bits
don't force promotion to a long value:

 >>> bclitval('16rffffffff80000004')
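
For readers who want to run the decoder, the following is a lightly
restructured port of bclitval, assuming Python 3 (where int and long
are one type, so the promotion remark above no longer applies). The
logic follows the session above; the local names base, body and lead
are changes made only for readability:

```python
def bclitval(s, digits='0123456789abcdefghijklmnopqrstuvwxyz'):
    """Decode a base-complement literal of the form <base>r<sign><digits>."""
    base, body = s.split('r', 1)
    base = int(base)
    # '+' and '-' are mnemonics for the smallest and the largest digit.
    if body[0] == '+':
        body = digits[0] + body[1:]
    elif body[0] == '-':
        body = digits[base - 1] + body[1:]
    lead = digits.index(body[0])
    # The leading digit is a signed coefficient: the upper half of the
    # digit range counts as digit - base, the lower half as the digit itself.
    acc = lead - base if lead * 2 >= base else lead
    for c in body[1:]:
        acc = acc * base + digits.index(c)
    return acc

print(bclitval('16r80000004'))                        # -2147483644
print(bclitval('16rffffffff80000004'))                # -2147483644
print(bclitval('2r' + format(0x80000004, '032b')))    # -2147483644
```

All three spellings decode to the same value, the 32-bit two's
complement reading of the pattern with bits 31 and 2 set.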

[1] To reduce all this eye-glazing discussion to a simple example, how do people now
use hex notation to define an integer bit-mask constant with bits 31 and 2 set?
(assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign).

Bengt Richter

From foom at  Thu Feb  2 21:26:24 2006
From: foom at (James Y Knight)
Date: Thu, 2 Feb 2006 15:26:24 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Feb 2, 2006, at 7:11 PM, Bengt Richter wrote:
> [1] To reduce all this eye-glazing discussion to a simple example,  
> how do people now
> use hex notation to define an integer bit-mask constant with bits  
> 31 and 2 set?

That's easy:
0x80000004

That was broken in python < 2.4, though, so there you need to do:
MASK = 2**32 - 1
0x80000004 & MASK
> (assume 32-bit int for target platform, counting bit 0 as LSB and  
> bit 31 as sign).

The 31st bit _isn't_ the sign bit in python and the bit-ness of the  
target platform doesn't matter. Python's integers are arbitrarily  
long. I'm not sure why you're trying to pretend as if python was C.


From jjl at  Thu Feb  2 21:30:00 2006
From: jjl at (John J Lee)
Date: Thu, 2 Feb 2006 20:30:00 +0000 (GMT Standard Time)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602012050540.14355@qew.cs>
References: <>
Message-ID: <Pine.WNT.4.64.0602022026170.2060@shaolin>

On Wed, 1 Feb 2006, Greg Wilson wrote:

>> Like many things in Python where people pre-emptively believe one thing
>> or another, the interpreter's corrective feedback is immediate:
> Yup, that's the theory; it's a shame practice is different.

So what mistake(s) *do* your students make?  As people have pointed out, 
the mistake you complain about *does* usually result in an immediate 

>>> set(1, 2, 3)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: set expected at most 1 arguments, got 3
>>> set(1)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence

Perhaps this?

>>> set("argh")
set(['a', 'h', 'r', 'g'])

> the language, but I'd rather eliminate the sand traps than require people
> to learn to recognize and avoid them.

I'm sure nobody would disagree with you, but of course the devil is in 
the detail.


From jjl at  Thu Feb  2 21:32:34 2006
From: jjl at (John J Lee)
Date: Thu, 2 Feb 2006 20:32:34 +0000 (GMT Standard Time)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602011642460.5572@dvp.cs>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <Pine.WNT.4.64.0602022031250.2060@shaolin>

On Wed, 1 Feb 2006, Greg Wilson wrote:
> (Imagine having to write "list(1, 2, 3, 4, 5)"...)

I believe that was actually proposed on this list for Python 3.


From mrovner at  Thu Feb  2 22:28:40 2006
From: mrovner at (Mike Rovner)
Date: Thu, 02 Feb 2006 13:28:40 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <000001c6282c$3a0efa50$6402a8c0@arkdesktop>
References: <>
Message-ID: <drtu5b$nbr$>

Andrew Koenig wrote:
>>I definitely agree with the 0c664 octal literal. Seems rather more
> I still prefer 8r664.

664[8] looks better and allows any radix
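
For comparison, arbitrary radixes can already be reached at run time
through the built-in int() with an explicit base, though not as a
literal. A small sketch, assuming Python 3 (where the 0o and 0b
prefixes discussed in this thread are accepted):

```python
octal = int("664", 8)         # 6*64 + 6*8 + 4
binary = int("1001001", 2)
hexadecimal = int("ff", 16)
base36 = int("zz", 36)        # int() accepts bases 2 through 36

print(octal, binary, hexadecimal, base36)   # 436 73 255 1295
print(octal == 0o664, binary == 0b1001001, hexadecimal == 0xff)
```

What the literal proposals add is only that the value is fixed at
compile time without a function call.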

From aleaxit at  Thu Feb  2 23:26:30 2006
From: aleaxit at (Alex Martelli)
Date: Thu, 2 Feb 2006 14:26:30 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602011642460.5572@dvp.cs>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On 2/1/06, Greg Wilson <gvwilson at> wrote:
> > Generator expressions make syntactic support irrelevant:
> Not when you're teaching the language to undergraduates: I haven't
> actually done the study yet (though I may this summer), but I'm willing to
> bet that allowing "math" notation for sets will more than double their
> use.  (Imagine having to write "list(1, 2, 3, 4, 5)"...)

Actually, as far as I'm concerned, I'd just love to remove the [ ... ]
notation for building lists if good ways could be found to distinguish
"a list with this one item" from "a list with the same items as this
iterable".  list(1, 2, 3) is perfectly easy to explain, more readable,
and just as likely to be used, if not more, than cryptic shorthand
[1,2,3]. "If you want APL, you know where to find it" (==on IBM's
online store, called APL2!-).

> > Accordingly, Guido rejected the braced notation for set comprehensions.
> > See:
> "...however, the issue could be revisited for Python 3000 (see PEP 3000)."
> So I'm only 1994 years early ;-)

Don't be such a pessimist, it's ONLY 994 years to go!


From bokr at  Thu Feb  2 23:36:03 2006
From: bokr at (Bengt Richter)
Date: Thu, 02 Feb 2006 22:36:03 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Thu, 2 Feb 2006 15:26:24 -0500, James Y Knight <foom at> wrote:

>On Feb 2, 2006, at 7:11 PM, Bengt Richter wrote:
>> [1] To reduce all this eye-glazing discussion to a simple example,  
>> how do people now
>> use hex notation to define an integer bit-mask constant with bits
>> 31 and 2 set?                    |
>                                   |
>That's easy:                       |
>0x80000004                         |
 >>> 0x80000004                     |
 2147483652L                        |  

That didn't meet specs ;-)

>That was broken in python < 2.4, though, so there you need to do:
I agree it was broken, but
>MASK = 2**32 - 1
>0x80000004 & MASK
does not solve the problem of doing correctly what it was doing (creating
a mask in a signed type int variable, which happened to have the sign bit set).
So long as there is a fixed-width int different from long, the problem will reappear.

>> (assume 32-bit int for target platform, counting bit 0 as LSB and
>> bit 31 as sign).
>The 31st bit _isn't_ the sign bit in python and the bit-ness of the
>target platform doesn't matter. Python's integers are arbitrarily 
>long. I'm not sure why you're trying to pretend as if python was C.
Evidently I haven't made myself clear to you, and your mind reading wrt
what I am trying to pretend is definitely flawed (and further speculations
along that line are likely to be OT ;-)

So long as we have a distinction between int and long, IWT int will be fixed width
for any given implementation, and for interfacing with foreign functions it will
continue to be useful at times to limit the type of arguments being passed.

To do this arms-length C argument type control, it may be important to have constants
of int type, knowing what that means on a given platform, and therefore _nice_ to be able
to define them directly, understanding full well all the issues, and that there are workarounds ;-)
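
One such workaround, sketched under the assumption of Python 3 (a
single unbounded int type). to_signed32 is a hypothetical helper, not
an existing library function:

```python
def to_signed32(value):
    """Hypothetical helper: read the low 32 bits as a two's complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value

mask = 0x80000004             # bits 31 and 2 set, as an unbounded integer
signed = to_signed32(mask)    # what a 32-bit C int with the same bits holds

print(mask, signed)           # 2147483652 -2147483644
```

This is computation rather than a literal, which is exactly the
distinction being argued about here.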

Whatever the fixed width of int, ISTM we'll have predictable type promotion effects
such as

 >>> width=32
 >>> -1*2**(width-2)*2
 >>> -1*2**(width-1)


 >>> hex(-sys.maxint-1)
 >>> (-int(hex(-sys.maxint-1)[1:],16)) ==  (-sys.maxint-1)
 >>> (-int(hex(-sys.maxint-1)[1:],16)) ,   (-sys.maxint-1)
 (-2147483648L, -2147483648)
 >>> type(-int(hex(-sys.maxint-1)[1:],16)) ==  type(-sys.maxint-1)
 >>> type(-int(hex(-sys.maxint-1)[1:],16)) ,   type(-sys.maxint-1)
 (<type 'long'>, <type 'int'>)

[1] Even though BTW you could well define a sign bit position abstractly for any
integer value. E.g., the LSB of the arbitrarily repeated sign bits to the left
of any integer in a twos complement representation (which can be well defined abstractly too).
Code left as exercise ;-)

Bottom line: You haven't shown me an existing way to do "16r80000004" and produce the int ;-)

Bengt Richter

From aleaxit at  Thu Feb  2 23:43:51 2006
From: aleaxit at (Alex Martelli)
Date: Thu, 2 Feb 2006 14:43:51 -0800
Subject: [Python-Dev] any support for a methodcaller HOF?
Message-ID: <>

I was recently reviewing a lot of the Python 2.4 code I have written,
and I've noticed one thing: thanks to the attrgetter and itemgetter
functions in module operator, I've been using (or been tempted to use)
far fewer lambdas, particularly but not exclusively in key= arguments
to sort and sorted.  Most of those "lambda temptations" will be
removed by PEP 309 (functional.partial), and most remaining ones are
of the form:
    lambda x: x.amethod(zip, zop)

So I was thinking -- wouldn't it be nice to have (possibly in module
functional, like partial; possibly in module operator, like itemgetter
and attrgetter -- I'm partial to functional;-) a methodcaller entry
akin to (...possibly with a better name...):

def methodcaller(methodname, *a, **k):
    def caller(self):
        return getattr(self, methodname)(*a, **k)
    caller.__name__ = methodname
    return caller

...?  This would allow removal of even more lambdas.
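
A usage sketch, assuming Python 3. The function is repeated here with
an explicit return so that the result of the method call is passed
back, and it is compared with operator.methodcaller, which later
versions of the standard library provide with the same calling
convention. The sample data is made up:

```python
from operator import methodcaller as stdlib_methodcaller

def methodcaller(methodname, *a, **k):
    def caller(self):
        # Return the method's result so the wrapper can serve as a key function.
        return getattr(self, methodname)(*a, **k)
    caller.__name__ = methodname
    return caller

words = ["banana", "Apple", "cherry"]

# Replaces: sorted(words, key=lambda w: w.lower())
by_lower = sorted(words, key=methodcaller("lower"))

# Replaces: [w.count("a") for w in words], with a bound extra argument
counts = list(map(methodcaller("count", "a"), words))

print(by_lower)   # ['Apple', 'banana', 'cherry']
print(counts)     # [3, 0, 0]
```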

I'll be glad to write a PEP, but I first want to check whether the
Python-Dev crowd would just blast it out of the waters, in which case
I may save writing it...


From martin at  Thu Feb  2 23:46:00 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 02 Feb 2006 23:46:00 +0100
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:
>>>[1] To reduce all this eye-glazing discussion to a simple example,  
>>>how do people now
>>>use hex notation to define an integer bit-mask constant with bits
>                                  ^^^^^^^  
>>>31 and 2 set?                    |
>>                                  |
>>That's easy:                       |
>>0x80000004                         |
>  >>> 0x80000004                     |
>  2147483652L                        |  
>            ^------------------------'
> That didn't meet specs ;-)

It sure does: 2147483652L is an integer (a long one); it isn't an


From tdelaney at  Fri Feb  3 00:14:00 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Fri, 3 Feb 2006 10:14:00 +1100
Subject: [Python-Dev] Octal literals
Message-ID: <>

M J Fleming wrote:

> +1
> I definitely agree with the 0c664 octal literal. Seems rather more
> intuitive.

And importantly, sounds like "Oc" 664 ;)

Tim Delaney

From tdelaney at  Fri Feb  3 00:16:17 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Fri, 3 Feb 2006 10:16:17 +1100
Subject: [Python-Dev] Octal literals
Message-ID: <>

Andrew Koenig wrote:

>> I definitely agree with the 0c664 octal literal. Seems rather more
>> intuitive.
> I still prefer 8r664.

The more I look at this, the worse it gets. Something beginning with
zero (like 0xFF, 0c664) immediately stands out as "unusual". Something
beginning with any other digit doesn't. This just looks like noise to

I found the suffix version even worse, but they're blown out of the
water anyway by the fact that FFr16 is a valid identifier.
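
The identifier clash is easy to check. A small sketch, assuming Python
3, where str.isidentifier() is available; the candidate spellings are
taken from this thread:

```python
candidates = ["FFr16", "8r664", "0c664", "r16"]

# Spellings that the tokenizer would already read as ordinary names.
clashes = [text for text in candidates if text.isidentifier()]

print(clashes)   # ['FFr16', 'r16']
```

Anything that starts with a digit cannot be a name, which is why the
prefix forms stay unambiguous while the suffix forms do not.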

Tim Delaney

From bokr at  Fri Feb  3 01:08:18 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 00:08:18 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On Thu, 02 Feb 2006 23:46:00 +0100, "Martin v. Löwis" <martin at> wrote:

>Bengt Richter wrote:
>>>>[1] To reduce all this eye-glazing discussion to a simple example,  
>>>>how do people now
>>>>use hex notation to define an integer bit-mask constant with bits
>>                                  ^^^^^^^  
>>>>31 and 2 set?                    |
>>>                                  |
>>>That's easy:                       |
>>>0x80000004                         |
>>  >>> 0x80000004                     |
>>  2147483652L                        |  
>>            ^------------------------'
>> That didn't meet specs ;-)
>It sure does: 2147483652L is an integer (a long one); it isn't an int.
Aw, shux, dang. I didn't say what I meant ;-/
Apologies to James & all 'round. s/integer/int/ in the above.

Bengt Richter

From bokr at  Fri Feb  3 01:27:19 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 00:27:19 GMT
Subject: [Python-Dev] Octal literals
References: <>
Message-ID: <>

On Fri, 3 Feb 2006 10:16:17 +1100, "Delaney, Timothy (Tim)" <tdelaney at> wrote:

>Andrew Koenig wrote:
>>> I definitely agree with the 0c664 octal literal. Seems rather more
>>> intuitive.
>> I still prefer 8r664.
>The more I look at this, the worse it gets. Something beginning with
>zero (like 0xFF, 0c664) immediately stands out as "unusual". Something
>beginning with any other digit doesn't. This just looks like noise to me.
>I found the suffix version even worse, but they're blown out of the
>water anyway by the fact that FFr16 is a valid identifier.
Are you sure you aren't just used to the x in 0xff? I.e., if the leading
0 were just an alias for 16, we could use 8x664 instead of 8r664.

BTW Ada uses radix prefix, but with # separating the prefix, so we can't use that.
How about apostrophe as separator?

     8'664   # or the suffix version could work also, although you'd have to back out of some names:

Bengt Richter

From foom at  Fri Feb  3 02:39:01 2006
From: foom at (James Y Knight)
Date: Thu, 2 Feb 2006 20:39:01 -0500
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Feb 2, 2006, at 10:36 PM, Bengt Richter wrote:
> So long as we have a distinction between int and long, IWT int will  
> be fixed width
> for any given implementation, and for interfacing with foreign  
> functions it will
> continue to be useful at times to limit the type of arguments being  
> passed.

We _don't_ have a distinction in any meaningful way, anymore. ints  
and longs are almost always treated exactly the same, other than the  
"L" suffix. I expect that suffix will soon go away as well. If there  
is code that _doesn't_ treat them the same, there is the bug. We  
don't need strange new syntax to work around buggy code.

Note that 10**14/10**13 is also a long, yet any interface that did  
not accept that as an argument but did accept "10" is simply buggy.  
Same goes for code that says it takes a 32-bit bitfield argument but  
won't accept 0x80000000.
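
A minimal sketch of the point about width, shown in Python 3, where int and long are already a single type. The helper name `as_uint32` is hypothetical and only illustrates an interface that accepts any integer and narrows it explicitly:

```python
# The mask from earlier in the thread: bits 31 and 2 set.
MASK = (1 << 31) | (1 << 2)


def as_uint32(value):
    """Reduce any Python integer to its low 32 bits (unsigned)."""
    return value & 0xFFFFFFFF


print(hex(MASK))            # same value as the literal 0x80000004
print(10**14 // 10**13)     # an ordinary integer, however it was computed
print(hex(as_uint32(-1)))   # negative values wrap to the unsigned range
```

An interface written this way does not care whether its argument was spelled as a literal or computed, which is the behaviour argued for above.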


From bokr at  Fri Feb  3 09:05:25 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 08:05:25 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>
Message-ID: <>

On Thu, 2 Feb 2006 20:39:01 -0500, James Y Knight <foom at> wrote:

>On Feb 2, 2006, at 10:36 PM, Bengt Richter wrote:
>> So long as we have a distinction between int and long, IWT int will  
>> be fixed width
>> for any given implementation, and for interfacing with foreign  
>> functions it will
>> continue to be useful at times to limit the type of arguments being  
>> passed.
>We _don't_ have a distinction in any meaningful way, anymore. ints
Which will disappear, "int" or "long"? Or both in favor of "integer"?
What un-"meaningful" distinction(s) are you hedging your statement about? ;-)
>and longs are almost always treated exactly the same, other than the  
>"L" suffix. I expect that suffix will soon go away as well. If there  
>is code that _doesn't_ treat them the same, there is the bug. We  
If you are looking at them in C code receiving them as args in a call,
"treat them the same" would have to mean provide code to coerce long->int
or reject it with an exception, IWT. This could be a performance issue
that one might like to control by calling strictly with int args, or even
an implementation restriction due to lack of space on some microprocessor
for unnecessary general coercion code.

>don't need strange new syntax to work around buggy code.
It's not a matter of "buggy" if you are trying to optimize.
(I am aware of premature optimization issues, and IMO "strange"
is in the eye of the beholder. What syntax would you suggest?
I am not married to any particular syntax, just looking for
expressive control over what my programs will do ;-)

>Note that 10**14/10**13 is also a long, yet any interface that did  
>not accept that as an argument but did accept "10" is simply buggy.
def foo(i): assert isinstance(i, int); ... # when this becomes illegal, yes.

>Same goes for code that says it takes a 32-bit bitfield argument but  
>won't accept 0x80000000.
If the bitfield is signed, it can't, unless you are glossing over
an assumed coercion rule.

 >>> int(0x80000000)
 >>> int(-0x80000000)
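
One way to make the "assumed coercion rule" explicit is to reinterpret the low 32 bits as a signed two's-complement value. This is only a sketch in Python 3 using the standard `struct` module; the function name `to_signed32` is made up for illustration:

```python
import struct


def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    packed = struct.pack("<I", value & 0xFFFFFFFF)   # unsigned, little-endian
    (signed,) = struct.unpack("<i", packed)          # same bytes, read as signed
    return signed


print(to_signed32(0x80000000))
print(to_signed32(0x7FFFFFFF))
```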

BTW, I am usually on the pure-abstraction-view side of discussions ;-)

Bengt Richter

From stefan.rank at  Fri Feb  3 09:38:06 2006
From: stefan.rank at (Stefan Rank)
Date: Fri, 03 Feb 2006 09:38:06 +0100
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

on 03.02.2006 00:16 Delaney, Timothy (Tim) said the following:
> Andrew Koenig wrote:
>>> I definitely agree with the 0c664 octal literal. Seems rather more
>>> intuitive.
>> I still prefer 8r664.
> The more I look at this, the worse it gets. Something beginning with
> zero (like 0xFF, 0c664) immediately stands out as "unusual". Something
> beginning with any other digit doesn't.

Let me throw something into the arena :-)

I know there should only be one way to do it, but what about requiring a 
leading 0 for any 'special' number format, and then allow::






and maybe have 0b be a synonym of 02r, and some other nice character 
(o/c) for octals.

For backwards compatibility you could even allow classic octal literals, 
though I think it would be better to have a Syntax Error for any literal 
starting with 0 but missing a radix code.


From mwh at  Fri Feb  3 10:36:30 2006
From: mwh at (Michael Hudson)
Date: Fri, 03 Feb 2006 09:36:30 +0000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <> (Alex
	Martelli's message of "Thu, 2 Feb 2006 14:43:51 -0800")
References: <>
Message-ID: <>

Alex Martelli <aleaxit at> writes:

> I was recently reviewing a lot of the Python 2.4 code I have written,
> and I've noticed one thing: thanks to the attrgetter and itemgetter
> functions in module operator, I've been using (or been tempted to use)
> far fewer lambdas, particularly but not exclusively in key= arguments
> to sort and sorted.

Interesting.  Something I'd noticed was that *until* the key= argument
to sort appeared, I was hardly using any lambdas at all (most of the
places I had used them were rendered obsolete by list comprehensions).

> Most of those "lambda temptations" will be
> removed by PEP 309 (functional.partial), and most remaining ones are
> of the form:
>     lambda x: x.amethod(zip, zop)
> So I was thinking -- wouldn't it be nice to have (possibly in module
> functional, like partial; possibly in module operator, like itemgetter
> and attrgetter -- I'm partial to functional;-) a methodcaller entry
> akin to (...possibly with a better name...):
> def methodcaller(methodname, *a, **k):
>     def caller(self):
>         return getattr(self, methodname)(*a, **k)
>     caller.__name__ = methodname
>     return caller
> ...?  This would allow removal of even more lambdas.
> I'll be glad to write a PEP, but I first want to check whether the
> Python-Dev crowd would just blast it out of the waters, in which case
> I may save writing it...
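
For comparison, a helper of this shape exists in later Python versions as `operator.methodcaller`. The sketch below puts it next to the lambda spelling; the sample data is invented for illustration:

```python
from operator import methodcaller

words = ["banana", "apple", "cherry"]

# Sort by how many times "a" occurs in each word, spelled two ways.
by_lambda = sorted(words, key=lambda w: w.count("a"))
by_caller = sorted(words, key=methodcaller("count", "a"))

print(by_lambda)
print(by_caller)
```

Both spellings produce the same ordering; which one reads better is exactly the question raised in this thread.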


>>> funcTakingCallback(lambda x:x.method(zip, zop))
>>> funcTakingCallback(methodcaller("method", zip, zop))

I'm not sure which of these is clearer really.  Are lambdas so bad?
(FWIW, I haven't internalized itemgetter/attrgetter yet and still tend
to use lambdas instead of those too).

A class I wrote (and lost) ages ago was a "placeholder" class, so if
'X' was an instance of this class, "X + 1" was roughly equivalent to
"lambda x:x+1" and "X.method(zip, zop)" was roughly equivalent to your
"methodcaller("method", zip, zop)".  I threw it away when listcomps
got implemented.  Not sure why I mention it now, something about your
post made me think of it...
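
The original class is lost, so the following is only a guess at what such a placeholder might look like, written for Python 3. The class name `Placeholder` and everything inside it are assumptions made for illustration:

```python
class Placeholder:
    """Hypothetical sketch: operations on an instance build one-argument callables."""

    def __add__(self, other):
        # X + 1  behaves roughly like  lambda x: x + 1
        return lambda x: x + other

    def __getitem__(self, key):
        # X[1]  behaves roughly like  lambda x: x[1]
        return lambda x: x[key]

    def __getattr__(self, name):
        # X.method(zip, zop)  behaves roughly like  lambda x: x.method(zip, zop)
        def deferred_call(*args, **kwargs):
            return lambda x: getattr(x, name)(*args, **kwargs)
        return deferred_call


X = Placeholder()

add_one = X + 1
second = X[1]
split_commas = X.split(",")

print(add_one(41))
print(second("abc"))
print(split_commas("zip,zop"))
```

In this sketch attribute access always means "deferred method call", so a plain deferred attribute lookup (the attrgetter case) would need a separate spelling.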


  If you give someone Fortran, he has Fortran.
  If you give someone Lisp, he has any language he pleases.
    -- Guy L. Steele Jr, quoted by David Rush in comp.lang.scheme.scsh

From ncoghlan at  Fri Feb  3 11:07:12 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 03 Feb 2006 20:07:12 +1000
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

Bengt Richter wrote:
> On Fri, 3 Feb 2006 10:16:17 +1100, "Delaney, Timothy (Tim)" <tdelaney at> wrote:
>> Andrew Koenig wrote:
>>>> I definitely agree with the 0c664 octal literal. Seems rather more
>>>> intuitive.
>>> I still prefer 8r664.
>> The more I look at this, the worse it gets. Something beginning with
>> zero (like 0xFF, 0c664) immediately stands out as "unusual". Something
>> beginning with any other digit doesn't. This just looks like noise to
>> me.
>> I found the suffix version even worse, but they're blown out of the
>> water anyway by the fact that FFr16 is a valid identifier.
> Are you sure you aren't just used to the x in 0xff? I.e., if the leading
> 0 were just an alias for 16, we could use 8x664 instead of 8r664.

No, I'm with Tim - it's definitely the distinctive shape of the '0' that helps 
the non-standard base stand out. '0c' creates a similar shape, also helping it 
to stand out. More on distinctive shapes below, though.

That said, I'm still trying to figure out exactly what problem is being solved 
here. Thinking out loud. . .

The full syntax for writing integers in any base is:

   int("LITERAL", RADIX)
   int("LITERAL", base=RADIX)

5 prefix chars, 3 or 8 in the middle (counting the space, and depending on 
whether the keyword is used or not), one on the end, and one or two to specify 
the radix. That's quite verbose, so it's unsurprising that many would like 
something nicer in the toolkit when they need to write multiple numeric 
literals in a base other than ten. This can typically happen when writing Unix 
system admin scripts, bitbashing to control a piece of hardware or some other 
low-level task.
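
As a small illustration of the verbose spelling, using values mentioned elsewhere in this thread (the variable names are arbitrary):

```python
# Each value is written as a string plus an explicit radix.
perms = int("664", 8)          # octal, e.g. Unix file permissions
mask = int("ff", base=16)      # hex, using the keyword form
bits = int("1001001", 2)       # binary

print(perms, mask, bits)
```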

The genuine use cases we have for integer literals are:
   - decimal (normal numbers)
   - hex (compact bitmasks)
   - octal (unix file permissions)
   - binary (explicit bitmasks for those that don't speak fluent hex)

Currently, there is no syntax for binary literals, and the syntax for octal 
literals is both magical (where else in integer mathematics does a leading 
zero matter?) and somewhat error prone (int and eval will give different 
answers for a numeric literal with a leading zero - int ignores the leading 
zero, eval treats it as signifying that the value is in octal. The charming 
result is that the following statement fails: assert int('0123') == 0123).
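
For readers on Python 3, a sketch of how the same mismatch looks there: a bare leading zero on a decimal literal is rejected by the compiler rather than silently read as octal. The helper `is_valid_literal` is a made-up name used only for this check:

```python
def is_valid_literal(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


print(int("0123"))                 # int() still ignores the leading zero
print(is_valid_literal("0123"))    # the old octal spelling no longer compiles
print(is_valid_literal("0o123"))   # the explicit octal prefix does
```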

Looking at existing precedent in the language, a prefix is currently used when 
the parsing of the subsequent literal may be affected (that is, the elements 
that make up the literal may be interpreted differently depending on the 
prefix). This is the case for hex and octal literals, and also for raw and 
unicode strings.

Suffixes are currently used when the literal as a whole is affected, but the 
meaning of the individual elements remains the same. This is the case for both 
long integer and imaginary number literals. A suffix also makes sense for 
decimal float literals, as the individual elements would still be interpreted 
as base 10 digits.

So, since we want to affect the parsing process, this means we want a prefix. 
The convention of using '0x' to denote hex extends far beyond Python, and 
doesn't seem to provoke much in the way of objection.

This suggests options like '0o' or '0c' for octal literals. Given that '0x' 
matches the '%x' in string formatting, the least magical option would be '0o' 
(to match the existing '%o' output format). While '0c' is cute and quite 
suggestive, it creates a significant potential for confusion, as it most 
emphatically does *not* align with the meaning of the '%c' format specifier.

I'd be +0 on changing the octal literal prefix from '0' to '0o', and also +0 
on adding an '0b' prefix and '%b' format specifier for binary numbers.
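
A sketch of how these spellings look in Python 3, which accepts both the '0o' and '0b' prefixes. Binary output is shown with `format()` and `bin()`, since this example does not assume a '%b' specifier for %-formatting of strings:

```python
perms = 0o664        # octal literal with an explicit prefix
flags = 0b1001001    # binary literal

print("%o" % perms)          # octal digits via the existing '%o' specifier
print(format(flags, "b"))    # binary digits via the format() mini-language
print(oct(perms), bin(flags))
```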

Whether anyone will actually care enough to implement a patch to change the 
syntax for any of these is an entirely different question ;)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From Ben.Young at  Fri Feb  3 11:15:54 2006
From: Ben.Young at (Ben.Young at
Date: Fri, 3 Feb 2006 10:15:54 +0000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
Message-ID: <>

Michael Hudson wrote on 03/02/2006 09:36:30:

> Hmm.
> >>> funcTakingCallback(lambda x:x.method(zip, zop))
> >>> funcTakingCallback(methodcaller("method", zip, zop))
> I'm not sure which of these is clearer really.  Are lambdas so bad?
> (FWIW, I haven't internalized itemgetter/attrgetter yet and still tend
> to use lambdas instead of those too).
> A class I wrote (and lost) ages ago was a "placeholder" class, so if
> 'X' was an instance of this class, "X + 1" was roughly equivalent to
> "lambda x:x+1" and "X.method(zip, zop)" was roughly equivalent to your
> "methodcaller("method", zip, zop)".  I threw it away when listcomps
> got implemented.  Not sure why I mention it now, something about your
> post made me think of it...

The C++ library Boost makes use of this method, but has a number of 
"placeholder" variables _1, _2, _3 ... _9 which can be combined to form 
expressions, e.g. _1 + _2 is the same as lambda x,y: x+y, so maybe there 
could be a lambda module that exposes placeholders like this. Python's ones 
will be better than the C++ ones because we would be able to delay 
function calls as above with a much nicer syntax than the C++ versions. 

_1.method(_2+_3) !


> Cheers,
> mwh
> -- 
>   If you give someone Fortran, he has Fortran.
>   If you give someone Lisp, he has any language he pleases.
>     -- Guy L. Steele Jr, quoted by David Rush in comp.lang.scheme.scsh

From bob at  Fri Feb  3 11:40:54 2006
From: bob at (Bob Ippolito)
Date: Fri, 3 Feb 2006 02:40:54 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Feb 3, 2006, at 2:07 AM, Nick Coghlan wrote:

> Bengt Richter wrote:
>> On Fri, 3 Feb 2006 10:16:17 +1100, "Delaney, Timothy (Tim)"  
>> <tdelaney at> wrote:
>>> Andrew Koenig wrote:
>>>>> I definitely agree with the 0c664 octal literal. Seems rather more
>>>>> intuitive.
>>>> I still prefer 8r664.
>>> The more I look at this, the worse it gets. Something beginning with
>>> zero (like 0xFF, 0c664) immediately stands out as "unusual".  
>>> Something
>>> beginning with any other digit doesn't. This just looks like  
>>> noise to
>>> me.
>>> I found the suffix version even worse, but they're blown out of the
>>> water anyway by the fact that FFr16 is a valid identifier.
>> Are you sure you aren't just used to the x in 0xff? I.e., if the  
>> leading
>> 0 were just an alias for 16, we could use 8x664 instead of 8r664.
> Currently, there is no syntax for binary literals, and the syntax  
> for octal
> literals is both magical (where else in integer mathematics does a  
> leading
> zero matter?) and somewhat error prone (int and eval will give  
> different
> answers for a numeric literal with a leading zero - int ignores the  
> leading
> zero, eval treats it as signifying that the value is in octal. The  
> charming
> result is that the following statement fails: assert int('0123') ==  
> 0123).

That's just a misunderstanding on your part.  The default radix is  
10, not DWIM.  0 signifies that behavior::

assert int('0123', 0) == 0123
assert int('0x123', 0) == 0x123
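
The same radix-0 mode still exists in Python 3, with one difference relevant here: a bare leading zero is rejected instead of being read as octal. A minimal sketch, where `parse_auto` is a hypothetical wrapper name:

```python
def parse_auto(text):
    """Parse text with int(text, 0); return None if the string is rejected."""
    try:
        return int(text, 0)
    except ValueError:
        return None


for sample in ("123", "0x123", "0o123", "0b101", "0123"):
    print(sample, "->", parse_auto(sample))
```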


From ncoghlan at  Fri Feb  3 11:44:47 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 03 Feb 2006 20:44:47 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Hudson wrote:
> Alex Martelli <aleaxit at> writes:
>> I'll be glad to write a PEP, but I first want to check whether the
>> Python-Dev crowd would just blast it out of the waters, in which case
>> I may save writing it...
> Hmm.
>>>> funcTakingCallback(lambda x:x.method(zip, zop))
>>>> funcTakingCallback(methodcaller("method", zip, zop))
> I'm not sure which of these is clearer really.  Are lambdas so bad?
> (FWIW, I haven't internalized itemgetter/attrgetter yet and still tend
> to use lambdas instead of those too).

I've been convinced for a while that the proliferation of features like 
operator.itemgetter and attrgetter (and some uses of functional.partial) 
demonstrate that the ability to defer a single expression (as lambda currently 
allows) is a very useful feature to have in the language. Unfortunately that 
utility gets overshadowed by the ugliness of the syntax, the mathematical 
baggage associated with the current keyword, and the fact that lambda gets 
pitched as an "anonymous function limited to a single expression", rather than 
as "the ability to defer an expression for later evaluation" (the former 
sounds like a limitation that should be fixed, the latter sounds like the 
deliberate design choice that it is).

At the moment it looks like the baby is going to get thrown out with the 
bathwater in Py3k, but I'd love to be able to simply write the following 
instead of some byzantine mixture of function calls to get the same effect:

     funcTakingCallback(x.method(zip, zop) def (x))

Consider these comparisons:

   itemgetter(1)     <=> (x[1] def (x))
   attrgetter('foo') <=> (x.foo def (x))
   partial(y, arg)   <=> (y(arg) def)
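
The left-hand sides of these comparisons can be checked against today's lambda spelling. The sketch below uses Python 3, where partial lives in `functools` rather than the `functional` module named in this thread; the `Point` class and sample values are invented for illustration:

```python
from functools import partial
from operator import attrgetter, itemgetter


class Point:
    """Tiny stand-in object with a 'foo' attribute."""

    def __init__(self, foo):
        self.foo = foo


pair = ("a", "b")
point = Point(foo=3)

# itemgetter(1)  vs  a deferred x[1]
print(itemgetter(1)(pair), (lambda x: x[1])(pair))

# attrgetter('foo')  vs  a deferred x.foo
print(attrgetter("foo")(point), (lambda x: x.foo)(point))

# partial(y, arg)  vs  a deferred y(arg)
make_list = partial(list, "ab")
print(make_list(), (lambda: list("ab"))())
```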

So rather than yet another workaround for lambda being ugly, I'd rather see a 
PEP that proposed "Let's make the syntax for deferring an expression not be 
ugly anymore, now that we have generator expressions and conditionals as an 
example of how to do it right".

Guido was rather unenthused the last time this topic came up, though, so maybe 
it isn't worth the effort. . . (although he did eventually change his mind on 
PEP 308, so I haven't entirely given up hope yet).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abo at  Fri Feb  3 12:12:05 2006
From: abo at (Donovan Baarda)
Date: Fri, 03 Feb 2006 11:12:05 +0000
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-01 at 19:09 +0000, M J Fleming wrote:
> On Wed, Feb 01, 2006 at 01:35:14PM -0500, Barry Warsaw wrote:
> > The proposal for something like 0xff, 0o664, and 0b1001001 seems like
> > the right direction, although 'o' for octal literal looks kind of funky.
> > Maybe 'c' for oCtal?  (remember it's 'x' for heXadecimal).
> >
> > -Barry
> >
> +1

+1 too. 

It seems like a "least changes" way to fix the IMHO strange 0123 != 123

Any sort of arbitrary base syntax is overkill; decimal, hexadecimal,
octal, and binary cover 99.9% of cases. The 0.1% of other cases are very
special, and can use int("LITERAL",base=RADIX).

For me, binary is far more useful than octal, so I'd be happy to let
octal languish as legacy support, but I definitely want "0b10110101".

Donovan Baarda <abo at>

From ncoghlan at  Fri Feb  3 12:15:22 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 03 Feb 2006 21:15:22 +1000
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Bob Ippolito wrote:
> On Feb 3, 2006, at 2:07 AM, Nick Coghlan wrote:
>> Currently, there is no syntax for binary literals, and the syntax for 
>> octal
>> literals is both magical (where else in integer mathematics does a 
>> leading
>> zero matter?) and somewhat error prone (int and eval will give different
>> answers for a numeric literal with a leading zero - int ignores the 
>> leading
>> zero, eval treats it as signifying that the value is in octal. The 
>> charming
>> result is that the following statement fails: assert int('0123') == 
>> 0123).
> That's just a misunderstanding on your part.  The default radix is 10, 
> not DWIM.  0 signifies that behavior::
> assert int('0123', 0) == 0123
> assert int('0x123', 0) == 0x123

How does that make the situation any better? The fact remains that a leading 
zero on an integer string may be significant, depending on the exact method 
used to convert the string to a number. The fact that int() can be made to 
behave like eval() doesn't change the fact that the default behaviours are 
different, and in a fashion that allows errors to pass silently.

You've highlighted a nice way to turn this into a real bug, though - use the 
DWIM feature of int() to accept numbers in either decimal or hex, and wait 
until someone relying on the mathematics they learned in high school enters a 
decimal number with a leading zero (leading zeros don't matter, right?).

I think it's a bad thing that Python defaults to handling numbers differently 
from high school mathematics. One of the virtues of '0x' and '0o' is that the 
resulting strings aren't actually legal numbers, leading to people wondering 
what the prefixes mean. The danger of the leading 0 denoting octal is that 
programmers without a background in C (or one of its successors that use the 
same convention) may *think* they know what it means, only to discover they're 
wrong the hard way (when their program doesn't work right).

Do I think this *really* matters? Nope - I think most bugs due to this will be 
pretty shallow. That's why I was only +0 on doing anything about it.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From abo at  Fri Feb  3 13:04:52 2006
From: abo at (Donovan Baarda)
Date: Fri, 03 Feb 2006 12:04:52 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
> Hi,
> I have a student who may be interested in adding syntactic support for
> sets to Python, so that:
>     x = {1, 2, 3, 4, 5}
> and:
>     y = {z for z in x if (z % 2)}

Personally I'd like this. Currently the "set(...)"  syntax makes sets
feel tacked on compared to tuples, lists, dicts, and strings which have
nice built in syntax support. Many people don't realise they are there
because of this.

Before set() the standard way to do them was to use dicts with None
values... to me the "{1,2,3}" syntax would have been a logical extension
of the "a set is a dict with no values, only keys" mindset. I don't know
why it wasn't done this way in the first place, though I missed the
arguments where it was rejected.

As for frozenset vs set, I would be inclined to make them normal mutable
sets. This is in line with the "dict without values" idea.

Frozensets are to sets what tuples are to lists. It would be nice if
there was another type of bracket that could be used for frozenset...
something like ':1,2,3:'... yuk... I dunno.

Alternatively you could do the same thing we do with strings; add a
prefix char for different variants; {1,2,3} is a set, f{1,2,3} is a
frozen set...

For Python 3000 you could extend this approach to lists and dicts;
[1,2,3] is a list, f[1,2,3] is a "frozen list" or tuple, {1:'a',2:'b'}
is a dict, f{1:'a',2:'b'} is a "frozen dict" which can be used as a key
in other dicts... etc.
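
For reference, the brace syntax from the proposal quoted at the top of this message is what Python 3 accepts for mutable sets, while frozen sets are still spelled with a call. A short sketch with arbitrary sample values:

```python
x = {1, 2, 3, 4, 5}                  # set display
y = {z for z in x if z % 2}          # set comprehension

# A frozenset is hashable, so it can serve as a dict key; a plain set cannot.
lookup = {frozenset(y): "odd numbers"}
print(lookup[frozenset([5, 3, 1])])

try:
    {y: "odd numbers"}
except TypeError as exc:
    print("mutable set rejected as key:", exc)

print(type({}))                      # empty braces still mean an empty dict
```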

Donovan Baarda <abo at>

From fredrik at  Fri Feb  3 13:10:33 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 3 Feb 2006 13:10:33 +0100
Subject: [Python-Dev] syntactic support for sets
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <drvh7r$uqg$>

Donovan Baarda wrote:

> For Python 3000 you could extend this approach to lists and dicts;
> [1,2,3] is a list, f[1,2,3] is a "frozen list" or tuple, {1:'a',2:'b'}
> is a dict, f{1:'a',2:'b'} is a "frozen dict" which can be used as a key
> in other dicts... etc.

Traceback (most recent call last):
  File "", line 219, in monitor
HyperGeneralizationViolationError: please let your brain cool down before proceeding

From aleaxit at  Fri Feb  3 15:35:02 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 3 Feb 2006 06:35:02 -0800
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 3, 2006, at 1:36 AM, Michael Hudson wrote:

> Alex Martelli <aleaxit at> writes:
>> I was recently reviewing a lot of the Python 2.4 code I have written,
>> and I've noticed one thing: thanks to the attrgetter and itemgetter
>> functions in module operator, I've been using (or been tempted to  
>> use)
>> far fewer lambdas, particularly but not exclusively in key= arguments
>> to sort and sorted.
> Interesting.  Something I'd noticed was that *until* the key= argument
> to sort appeared, I was hardly using any lambdas at all (most of the
> places I had used them were rendered obsolete by list comprehensions).

Mine too, but many new places appeared, especially in itertools.

> A class I wrote (and lost) ages ago was a "placeholder" class, so if
> 'X' was an instance of this class, "X + 1" was roughly equivalent to
> "lambda x:x+1" and "X.method(zip, zop)" was roughly equivalent to your
> "methodcaller("method", zip, zop)".  I threw it away when listcomps
> got implemented.  Not sure why I mention it now, something about your
> post made me think of it...

Such a placeholder would certainly offer better syntax and more power  
than methodcaller (and itemgetter and attrgetter, too).  A lovely idea!


From rasky at  Fri Feb  3 15:47:05 2006
From: rasky at (Giovanni Bajo)
Date: Fri, 3 Feb 2006 15:47:05 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <><>
Message-ID: <02c401c628d0$b3236df0$bf03030a@trilan>

Nick Coghlan <ncoghlan at> wrote:

> Consider these comparisons:
>    itemgetter(1)     <=> (x[1] def (x))
>    attrgetter('foo') <=> (x.foo def (x))
>    partial(y, arg)   <=> (y(arg) def)
> So rather than yet another workaround for lambda being ugly, I'd rather see
> a PEP that proposed "Let's make the syntax for deferring an expression not
> be ugly anymore, now that we have generator expressions and conditionals as
> an example of how to do it right".

+1000. Instead of continuing to add arcane functions which return objects which
(when called) do things that are not obvious unless you know the function
beforehand, a generic syntax should be added for deferred execution. I too
use itemgetter and friends but the "correct" way of doing a deferred "x[1]"
*should* let you write "x[1]" in the code. This is my main opposition to
partial/itemgetter/attrgetter/methodcaller: they allow deferred execution
using a syntax which is not equivalent to that of immediate execution.
Unless we propose to deprecate "x[1]" in favor of "itemgetter(1)(x)"...
Giovanni Bajo

From aleaxit at  Fri Feb  3 16:00:26 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 3 Feb 2006 07:00:26 -0800
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <02c401c628d0$b3236df0$bf03030a@trilan>
References: <><>
Message-ID: <>

On Feb 3, 2006, at 6:47 AM, Giovanni Bajo wrote:
> use itemgetter and friends but the "correct" way of doing a  
> deferred "x[1]"
> *should* let you write "x[1]" in the code. This is my main  
> opposition to
> partial/itemgetter/attrgetter/methodcaller: they allow deferred  
> execution
> using a syntax which is not equivalent to that of immediate execution.

I understand your worry re the syntax issue.  So what about Michael  
Hudson's "placeholder class" idea, where X[1] returns the callable  
that will do x[1] when called, etc?  Looks elegant to me...


From bokr at  Fri Feb  3 16:16:23 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 15:16:23 GMT
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>
	<> <>
Message-ID: <>

On Fri, 03 Feb 2006 20:44:47 +1000, Nick Coghlan <ncoghlan at> wrote:

>Michael Hudson wrote:
>> Alex Martelli <aleaxit at> writes:
>>> I'll be glad to write a PEP, but I first want to check whether the
>>> Python-Dev crowd would just blast it out of the waters, in which case
>>> I may save writing it...
>> Hmm.
>>>>> funcTakingCallback(lambda x:x.method(zip, zop))
>>>>> funcTakingCallback(methodcaller("method", zip, zop))
>> I'm not sure which of these is clearer really.  Are lambdas so bad?
>> (FWIW, I haven't internalized itemgetter/attrgetter yet and still tend
>> to use lambdas instead of those too).
If you are familiar with lambda, it's clearer, because the expression
evaluation is just deferred, and you can see that zip and zop are going
to be accessed at lambda call time. methodcaller could potentially hide
an alternate binding of zip and zop like lambda x, zip=zip, zop=zop:x.method(zip,zop)
So you have to know what methodcaller does rather than just reading the expression.
And if you want to customize with def-time (lambda eval time) bindings as well as
call arg bindings, you can't easily AFAICS.

BTW, re def-time bindings, the default arg abuse is a hack, so I would like to
see a syntax that would permit default-arg-like def-time function-local bindings without
affecting the call signature. E.g., if def foo(*args, **keywords, ***bindings): ...
would use bindings as a dict at def-time to create local namespace bindings like **keywords,
but not affecting the call signature. This would allow a nicer version of above-mentioned
   lambda x, zip=zip, zop=zop:x.method(zip,zop)
   lambda x, ***dict(zip=zip, zop=zop):x.method(zip,zop)
   lambda x, ***{'zip':zip, 'zop':zop}:x.method(zip,zop)
This could also be used to do currying without the typical cost of wrapped nested calling.
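
The proposed `***bindings` syntax does not exist, so it cannot be demonstrated. What can be shown is the difference between call-time lookup and the default-argument hack it is meant to replace, plus `functools.partial` as another way to bind early. A sketch in Python 3 with invented names:

```python
from functools import partial

# Call-time lookup: every lambda sees the final value of n.
late = [lambda x: x + n for n in range(3)]

# Default-argument hack: n is bound when each lambda is created,
# but the extra parameter leaks into the call signature.
early = [lambda x, n=n: x + n for n in range(3)]

# partial binds early as well, at the cost of a wrapping call.
bound = [partial(lambda n, x: x + n, n) for n in range(3)]

print([f(10) for f in late])
print([f(10) for f in early])
print([f(10) for f in bound])
print(early[0](10, 5))   # the leaked parameter can be overridden by a caller
```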

>I've been convinced for a while that the proliferation of features like 
>operator.itemgetter and attrgetter (and some uses of functional.partial) 
>demonstrate that the ability to defer a single expression (as lambda currently 
>allows) is a very useful feature to have in the language. Unfortunately that
<BTW>note that "deferring" is a particular case of controlling evaluation time.
The other direction in time goes towards reader macros & such.</BTW>
>utility gets overshadowed by the ugliness of the syntax, the mathematical 
>baggage associated with the current keyword, and the fact that lambda gets 
>pitched as an "anonymous function limited to a single expression", rather than 
>as "the ability to defer an expression for later evaluation" (the former 
>sounds like a limitation that should be fixed, the latter sounds like the 
>deliberate design choice that it is).
>At the moment it looks like the baby is going to get thrown out with the 
>bathwater in Py3k, but I'd love to be able to simply write the following 
>instead of some byzantine mixture of function calls to get the same effect:
>     funcTakingCallback(x.method(zip, zop) def (x))
>Consider these comparisons:
This looks a lot like the "anonymous def" expression in a postfix form ;-)
>   itemgetter(1)     <=> (x[1] def (x))
                      <=> def(x):x[1]
>   attrgetter('foo') <=> (x.foo def (x))
                      <=> def(x):x.foo
>   partial(y, arg)   <=> (y(arg) def)
                      <=> def(***{'arg':arg}):y()  # ?? (not sure about semantics of partial)
>So rather than yet another workaround for lambda being ugly, I'd rather see a 
>PEP that proposed "Let's make the syntax for deferring an expression not be 
>ugly anymore, now that we have generator expressions and conditionals as an 
>example of how to do it right".
I guess you can guess my vote is for anonymous def ;-)
>Guido was rather unenthused the last time this topic came up, though, so maybe 
>it isn't worth the effort. . . (although he did eventually change his mind on 
>PEP 308, so I haven't entirely given up hope yet).
Likewise ;-)

Bengt Richter
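
For readers trying the comparisons above in a current Python 3, here is a
minimal sketch of the existing helpers next to the lambdas they replace. It
assumes functional.partial from PEP 309 is spelled functools.partial, as it is
in later releases; the Point class and the sample values are hypothetical and
exist only for illustration.

```python
from functools import partial
from operator import attrgetter, itemgetter


class Point:
    """Hypothetical record type, used only for this illustration."""

    def __init__(self, foo):
        self.foo = foo


pair = ("a", "b")
point = Point(42)

# itemgetter(1) defers the expression x[1].
assert itemgetter(1)(pair) == (lambda x: x[1])(pair) == "b"

# attrgetter("foo") defers the expression x.foo.
assert attrgetter("foo")(point) == (lambda x: x.foo)(point) == 42

# partial(y, arg) freezes arg now and defers only the call y(arg).
y = str.upper
assert partial(y, "spam")() == (lambda: y("spam"))() == "SPAM"
```

In each pair the lambda spells out the deferred expression itself, while the
helper names the operation indirectly, which is the readability point being
argued in this thread.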

From abo at  Fri Feb  3 17:09:34 2006
From: abo at (Donovan Baarda)
Date: Fri, 03 Feb 2006 16:09:34 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Fri, 2006-02-03 at 12:04 +0000, Donovan Baarda wrote:
> On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
> Personally I'd like this. currently the "set(...)"  syntax makes sets
> feel tacked on compared to tuples, lists, dicts, and strings which have
> nice built in syntax support. Many people don't realise they are there
> because of this.
> Frozensets are to sets what tuples are to lists. It would be nice if
> there was another type of bracket that could be used for frozenset...
> something like ':1,2,3:'... yuk... I dunno.

One possible bracket option for frozenset would be "<1,2,3>" which I
initially rejected because of the possible syntactic clash with the <
and > operators... however, there may be a way this could work... dunno.

The other thing that keeps nagging me is set, frozenset, tuple, and list
all overlap in functionality to fairly significant degrees. Sometimes it
feels like just implementation or application differences... could a
list that is never modified be optimised under the hood as a tuple?
Could the immutability constraint of tuples be just acquired by a list
when it is used as a key? Could a set simply be a list with unique
values? etc.

Donovan Baarda <abo at>

From exarkun at  Fri Feb  3 17:31:44 2006
From: exarkun at (Jean-Paul Calderone)
Date: Fri, 3 Feb 2006 11:31:44 -0500
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
Message-ID: <20060203163144.2697.2017601097.divmod.quotient.5992@ohm>

On Fri, 3 Feb 2006 07:00:26 -0800, Alex Martelli <aleaxit at> wrote:
>On Feb 3, 2006, at 6:47 AM, Giovanni Bajo wrote:
>    ...
>> use itemgetter and friends but the "correct" way of doing a deferred
>> "x[1]" *should* let you write "x[1]" in the code. This is my main
>> opposition to partial/itemgetter/attrgetter/methodcaller: they allow
>> deferred execution using a syntax which is not equivalent to that of
>> immediate execution.
>I understand your worry re the syntax issue.  So what about Michael
>Hudson's "placeholder class" idea, where X[1] returns the callable
>that will do x[1] when called, etc?  Looks elegant to me...




From rasky at  Fri Feb  3 17:32:30 2006
From: rasky at (Giovanni Bajo)
Date: Fri, 3 Feb 2006 17:32:30 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <><>
Message-ID: <046a01c628df$6d4da160$bf03030a@trilan>

Alex Martelli <aleaxit at> wrote:

>> use itemgetter and friends but the "correct" way of doing a deferred
>> "x[1]" *should* let you write "x[1]" in the code. This is my main
>> opposition to partial/itemgetter/attrgetter/methodcaller: they allow
>> deferred execution using a syntax which is not equivalent to that of
>> immediate execution.
> I understand your worry re the syntax issue.  So what about Michael
> Hudson's "placeholder class" idea, where X[1] returns the callable
> that will do x[1] when called, etc?  Looks elegant to me...

Depends on what the final API looks like. "deferred(x)[1]" isn't that bad,
but "def x: x[1]" still looks clearer as the 'def' keyword immediately makes
clear you're DEFining a DEFerred function <g> :) Of course we can paint our
bikeshed whatever color we like, but I'm happy enough if we agree with
the general idea of keeping the same syntax in both deferred and immediate
execution.

There is also an issue with deferred execution without arguments. By
grepping my code it turned out that many lambda instances are in calls to
assertRaises() (unittest), where I strictly prefer the syntax:

self.assertRaises(TypeError, lambda: int("ABK", 16))

to the allowed:

self.assertRaises(TypeError, int, "ABK", 16)

With the inline def proposal we'd get something along the lines of:

self.assertRaises(TypeError, def(): int("ABK", 16))
self.assertRaises(TypeError, (int("ABK", 16) def))   # it's not lisp, really, I swear

while I'm not sure how this would work with the placeholder class.
Giovanni Bajo
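
The two assertRaises spellings compared above can be tried with the sketch
below. It makes two assumptions: the test class name is made up, and the
expected exception is ValueError, since that is what int("ABK", 16) raises in
practice ('K' is not a hexadecimal digit). The third test uses the context
manager form that later versions of unittest provide, which also keeps the
call written as it would be for immediate execution.

```python
import unittest


class IntParsingTest(unittest.TestCase):
    """Hypothetical test case comparing ways to defer the failing call."""

    def test_lambda_form(self):
        # The call is written exactly as it would be executed immediately.
        self.assertRaises(ValueError, lambda: int("ABK", 16))

    def test_positional_form(self):
        # The callable and its arguments are passed separately.
        self.assertRaises(ValueError, int, "ABK", 16)

    def test_context_manager_form(self):
        # The block body is ordinary code; nothing has to be wrapped.
        with self.assertRaises(ValueError):
            int("ABK", 16)


def run_tests():
    """Run the three tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IntParsingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```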

From hyeshik at  Fri Feb  3 17:41:25 2006
From: hyeshik at (Hye-Shik Chang)
Date: Sat, 4 Feb 2006 01:41:25 +0900
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <20060203163144.2697.2017601097.divmod.quotient.5992@ohm>
References: <>
Message-ID: <>

On 2/4/06, Jean-Paul Calderone <exarkun at> wrote:
> On Fri, 3 Feb 2006 07:00:26 -0800, Alex Martelli <aleaxit at> wrote:
> >
> >I understand your worry re the syntax issue.  So what about Michael
> >Hudson's "placeholder class" idea, where X[1] returns the callable
> >that will do x[1] when called, etc?  Looks elegant to me...
> >
> <>
> <>

Yet another implementation,


From jcarlson at  Fri Feb  3 18:00:56 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 03 Feb 2006 09:00:56 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

Donovan Baarda <abo at> wrote:
> On Wed, 2006-02-01 at 13:55 -0500, Greg Wilson wrote:
> > Hi,
> > 
> > I have a student who may be interested in adding syntactic support for
> > sets to Python, so that:
> > 
> >     x = {1, 2, 3, 4, 5}
> > 
> > and:
> > 
> >     y = {z for z in x if (z % 2)}
> Personally I'd like this. currently the "set(...)"  syntax makes sets
> feel tacked on compared to tuples, lists, dicts, and strings which have
> nice built in syntax support. Many people don't realise they are there
> because of this.

Sets are tacked on.  That's why you need to use 'import sets' to get to
them, in a similar fashion that you need to use 'import array' to get
access to C-like arrays.

People don't realize that sets are there because they tend to not read
the "what's new in Python X.Y", and also fail to read through the
"global module index" every once in a while.

I personally object to making syntax for sets for the same reasons I
object to making arrays, heapqs, Queues, deques, or any of the other
data structure-defining modules in the standard library into syntax.

 - Josiah

From abo at  Fri Feb  3 18:04:24 2006
From: abo at (Donovan Baarda)
Date: Fri, 03 Feb 2006 17:04:24 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Fri, 2006-02-03 at 09:00 -0800, Josiah Carlson wrote:
> Sets are tacked on.  That's why you need to use 'import sets' to get to
> them, in a similar fashion that you need to use 'import array' to get
> access to C-like arrays.

No you don't;

$ python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> v=set((1,2,3))
>>> f=frozenset(v)

set and frozenset are now builtin.

> I personally object to making syntax for sets for the same reasons I
> object to making arrays, heapqs, Queues, deques, or any of the other
> data structure-defining modules in the standard library into syntax.

Nuff was a fairy... though I guess it depends on where you draw the
line; should [1,2,3] be list(1,2,3)?

Donovan Baarda <abo at>

From mwh at  Fri Feb  3 18:13:08 2006
From: mwh at (Michael Hudson)
Date: Fri, 03 Feb 2006 17:13:08 +0000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <20060203163144.2697.2017601097.divmod.quotient.5992@ohm>
	Calderone's message of "Fri, 3 Feb 2006 11:31:44 -0500")
References: <20060203163144.2697.2017601097.divmod.quotient.5992@ohm>
Message-ID: <>

Jean-Paul Calderone <exarkun at> writes:

> On Fri, 3 Feb 2006 07:00:26 -0800, Alex Martelli <aleaxit at> wrote:
>>On Feb 3, 2006, at 6:47 AM, Giovanni Bajo wrote:
>>    ...
>>> use itemgetter and friends but the "correct" way of doing a deferred
>>> "x[1]" *should* let you write "x[1]" in the code. This is my main
>>> opposition to partial/itemgetter/attrgetter/methodcaller: they allow
>>> deferred execution using a syntax which is not equivalent to that of
>>> immediate execution.
>>I understand your worry re the syntax issue.  So what about Michael
>>Hudson's "placeholder class" idea, where X[1] returns the callable
>>that will do x[1] when called, etc?  Looks elegant to me...

I'd just like to point out here that I only mentioned this class; I
didn't suggest it for anything :)

> <>
> <>

Yow.  My implementation was somewhere in between those for length, I
think (and pre-dated new style classes, which probably changes


  I'm sorry, was my bias showing again? :-)
                                      -- William Tanksley, 13 May 2000

From g.brandl at  Fri Feb  3 18:31:00 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 03 Feb 2006 18:31:00 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
Message-ID: <ds040k$a4m$>

Alex Martelli wrote:

>> A class I wrote (and lost) ages ago was a "placeholder" class, so if
>> 'X' was an instance of this class, "X + 1" was roughly equivalent to
>> "lambda x:x+1" and "X.method(zip, zop)" was roughly equivalent to your
>> "methodcaller("method", zip, zop)".  I threw it away when listcomps
>> got implemented.  Not sure why I mention it now, something about your
>> post made me think of it...
> Such a placeholder would certainly offer better syntax and more power  
> than methodcaller (and itemgetter and attrgetter, too).  A lovely idea!

Yep. And it would make Python stand out from the crowd another time ;)

The question is: is it "serious" and deterministic enough to be builtin?
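
A rough idea of what such a placeholder could look like, as a minimal sketch
only: the class name, the set of supported operations and the calling
convention are all assumptions, and this is not a reconstruction of the lost
implementation mentioned above.

```python
class Placeholder:
    """Hypothetical sketch: records operations and replays them on call."""

    def __init__(self, func=None):
        # func maps the eventual argument to the value built so far.
        self._func = func if func is not None else (lambda x: x)

    def __call__(self, value):
        # Supplying the argument evaluates the recorded expression.
        return self._func(value)

    def __getitem__(self, key):
        func = self._func
        return Placeholder(lambda x: func(x)[key])

    def __add__(self, other):
        func = self._func
        return Placeholder(lambda x: func(x) + other)

    def __getattr__(self, name):
        # Only reached for names not found normally; refuse private names
        # so that internal lookups never create placeholders by accident.
        if name.startswith("_"):
            raise AttributeError(name)
        func = self._func
        return Placeholder(lambda x: getattr(func(x), name))


X = Placeholder()

second = X[1]        # behaves like lambda x: x[1]
increment = X + 1    # behaves like lambda x: x + 1
real_part = X.real   # behaves like lambda x: x.real
```

Because calling the placeholder is used here to supply the argument, a
spelling such as X.method(zip, zop) is ambiguous in this sketch: it cannot
tell "call the method later" apart from "evaluate now". A fuller design would
need some explicit way to separate the two.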


From bokr at  Fri Feb  3 18:51:31 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 17:51:31 GMT
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>
Message-ID: <>

On Thu, 2 Feb 2006 14:43:51 -0800, Alex Martelli <aleaxit at> wrote:

>I was recently reviewing a lot of the Python 2.4 code I have written,
>and I've noticed one thing: thanks to the attrgetter and itemgetter
>functions in module operator, I've been using (or been tempted to use)
>far fewer lambdas, particularly but not exclusively in key= arguments
>to sort and sorted.  Most of those "lambda temptations" will be
>removed by PEP 309 (functional.partial), and most remaining ones are
>of the form:
>    lambda x: x.amethod(zip, zop)
>So I was thinking -- wouldn't it be nice to have (possibly in module
>functional, like partial; possibly in module operator, like itemgetter
>and attrgetter -- I'm partial to functional;-) a methodcaller entry
>akin to (...possibly with a better name...):
>def methodcaller(methodname, *a, **k):
>    def caller(self):
>        getattr(self, methodname)(*a, **k)
>    caller.__name__ = methodname
>    return caller
>...?  This would allow removal of even more lambdas.
Yes, but what semantics do you really want? The above, as I'm sure you know, is
not a direct replacement for the lambda:

 >>> import dis
 >>> foo = lambda x: x.amethod(zip, zop)
 >>> def methodcaller(methodname, *a, **k):
 ...     def caller(self):
 ...         getattr(self, methodname)(*a, **k)
 ...     caller.__name__ = methodname
 ...     return caller
 >>> bar = methodcaller('amethod', zip, zop)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 NameError: name 'zop' is not defined
 >>> zop = 'must exist at methodcaller call time'
 >>> bar = methodcaller('amethod', zip, zop)
 >>> dis.dis(foo)
   1           0 LOAD_FAST                0 (x)
               3 LOAD_ATTR                1 (amethod)
               6 LOAD_GLOBAL              2 (zip)
               9 LOAD_GLOBAL              3 (zop)
              12 CALL_FUNCTION            2
              15 RETURN_VALUE
 >>> dis.dis(bar)
   3           0 LOAD_GLOBAL              0 (getattr)
               3 LOAD_FAST                0 (self)
               6 LOAD_DEREF               2 (methodname)
               9 CALL_FUNCTION            2
              12 LOAD_DEREF               0 (a)
              15 LOAD_DEREF               1 (k)
              18 CALL_FUNCTION_VAR_KW     0
              21 POP_TOP
              22 LOAD_CONST               0 (None)
              25 RETURN_VALUE

>I'll be glad to write a PEP, but I first want to check whether the
>Python-Dev crowd would just blast it out of the waters, in which case
>I may save writing it...
-0 ;-)

Bengt Richter
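
The two differences visible in the disassembly above (arguments evaluated when
methodcaller is called, and a caller that discards the method's result) can be
seen in a small experiment. This is a sketch for a current Python 3, with a
'return' added to the inner function; the sep variable and the sample string
are made up for illustration. Later Python releases ship an
operator.methodcaller in the standard library.

```python
def methodcaller(methodname, *args, **kwargs):
    def caller(obj):
        # Return the result; without 'return' the caller always gives None.
        return getattr(obj, methodname)(*args, **kwargs)
    caller.__name__ = methodname
    return caller


sep = ", "
late = lambda s: s.split(sep)        # looks up sep each time it is called
early = methodcaller("split", sep)   # captures the current value of sep

sep = ";"                            # rebinding affects only the lambda
late_result = late("a, b;c")
early_result = early("a, b;c")
```

After the rebinding, the lambda splits on ";" while the methodcaller result
still splits on ", ", so the two are interchangeable only when the arguments
do not change between definition and call.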

From tismer at  Fri Feb  3 19:10:42 2006
From: tismer at (Christian Tismer)
Date: Fri, 03 Feb 2006 19:10:42 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Bengt Richter wrote:


> BTW, re def-time bindings, the default arg abuse is a hack, so I would like to
> see a syntax that would permit default-arg-like def-time function-local bindings without
> affecting the call signature. E.g., if def foo(*args, **keywords, ***bindings): ...
> would use bindings as a dict at def-time to create local namespace bindings like **keywords,
> but not affecting the call signature. This would allow a nicer version of above-mentioned
>    lambda x, zip=zip, zop=zop:x.method(zip,zop)
> as
>    lambda x, ***dict(zip=zip, zop=zop):x.method(zip,zop)
> or
>    lambda x, ***{'zip':zip, 'zop':zop}:x.method(zip,zop)
> This could also be used to do currying without the typical cost of wrapped nested calling.

Just in case you might not be aware of it (like I was):
lambda does support local scope, like here:

 >>> def locallambda(x, y):
 ...     func = lambda: x+y
 ...     return func
 >>> f=locallambda(2, 3)
 >>> f()
 5

ciao - chris

Christian Tismer             :^)   <mailto:tismer at>
tismerysoft GmbH             :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A     :    *Starship*
14109 Berlin                 :     PGP key ->
work +49 30 802 86 56  mobile +49 173 24 18 776  fax +49 30 80 90 57 05
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?

From martin at  Fri Feb  3 19:56:20 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Feb 2006 19:56:20 +0100
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:
> If you are looking at them in C code receiving them as args in a call,
> "treat them the same" would have to mean provide code to coerce long->int
> or reject it with an exception, IWT.

The typical way of processing incoming ints in C is through
PyArg_ParseTuple, which already has the code to coerce long->int
(which in turn may raise an exception for a range violation).

So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.

> It's not a matter of "buggy" if you are trying to optimize.
> (I am aware of premature optimization issues, and IMO "strange"
> is in the eye of the beholder. What syntax would you suggest?

The question is: what is the problem you are trying to solve?
If it is "bit masks", then consider the problem solved already.

>>Same goes for code that says it takes a 32-bit bitfield argument but  
>>won't accept 0x80000000.
> If the bitfield is signed, it can't, unless you are glossing over
> an assumed coercion rule.

Just have a look at the 'k' specifier in PyArg_ParseTuple.


From martin at  Fri Feb  3 20:02:16 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 03 Feb 2006 20:02:16 +0100
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

Donovan Baarda wrote:
> Before set() the standard way to do them was to use dicts with None
> Values... to me the "{1,2,3}" syntax would have been a logical extension
> of the "a set is a dict with no values, only keys" mindset. I don't know
> why it wasn't done this way in the first place, though I missed the
> arguments where it was rejected.

There might be many reasons; one obvious reason is that you can't spell
the empty set that way.

> Frozensets are to sets what tuples are to lists. It would be nice if
> there was another type of bracket that could be used for frozenset...
> something like ':1,2,3:'... yuk... I dunno.

Readability counts.
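
For reference, the brace syntax discussed in this thread was adopted in later
Python versions. A minimal sketch, assuming such a version, including the
empty-set limitation mentioned above:

```python
x = {1, 2, 3, 4, 5}                  # set display
y = {z for z in x if z % 2}          # set comprehension

empty_set = set()                    # the only spelling of the empty set
empty_braces = {}                    # still an empty dict, not a set

frozen = frozenset(x)                # no literal syntax; hashable
lookup = {frozen: "usable as a dict key"}
```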


From fumanchu at  Fri Feb  3 19:52:01 2006
From: fumanchu at (Robert Brewer)
Date: Fri, 3 Feb 2006 10:52:01 -0800
Subject: [Python-Dev] any support for a methodcaller HOF?
Message-ID: <6949EC6CD39F97498A57E0FA55295B21015D82B8@ex9.hostedexchange.local>

Giovanni Bajo wrote:
> Alex Martelli <aleaxit at> wrote:
> > I understand your worry re the syntax issue.  So what about Michael
> > Hudson's "placeholder class" idea, where X[1] returns the callable
> > that will do x[1] when called, etc?  Looks elegant to me...
> Depends on what the final API looks like. "deferred(x)[1]" 
> isn't that bad, but "def x: x[1]" still looks clearer as
> the 'def' keyword immediately makes clear you're DEFining
> a DEFerred function <g> :) Of course we can paint our
> bikeshed of whatever color we like, but I'm happy enough if 
> we agree with the general idea of keeping the same syntax
> in both deferred and immediate execution.

I don't agree with that "general idea" at all. Sorry. ;) I think the
semantic emphasis should not be on "execution", but rather on
"expression". The word "execution" to me implies "statements", and
although some functions somewhere are called behind the scenes to
evaluate any expression, the lambda (and its potential successors)
differ from "def" by not allowing statements. They may be used to "defer
execution" but to me, their value lies in being static
expressions--object instances which are portable and introspectable.

This is where LINQ [1] is taking off: expressions are declared with
"var" (in C#). I used Expression() in Dejavu [2] for the same reasons
(before LINQ came along ;), and am using it to build SQL from Python
lambdas. I had to use lambda because that's Python's only builtin
support for expressions-as-objects at the moment, but I'd like to see
Python grow a syntax like:

    e = expr(x: x + 1)

...where expr() does early binding like dejavu.logic does. [Looking back
over my logic module, I'm noticing it requires boolean return values,
but it would not be difficult to extend to return arbitrary values--even
easier if it were rewritten as builtin functionality. Guess I need to
write myself another ticket. ;)]

Robert Brewer
System Architect
Amor Ministries
fumanchu at


From rasky at  Fri Feb  3 20:05:42 2006
From: rasky at (Giovanni Bajo)
Date: Fri, 3 Feb 2006 20:05:42 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <6949EC6CD39F97498A57E0FA55295B21015D82B8@ex9.hostedexchange.local>
Message-ID: <009e01c628f4$d4705260$bf03030a@trilan>

Robert Brewer <fumanchu at> wrote:

> The word "execution" to me implies "statements", and
> although some functions somewhere are called behind the scenes to
> evaluate any expression, the lambda (and its potential successors)
> differ from "def" by not allowing statements. They may be used to "defer
> execution" but to me, their value lies in being static
> expressions--object instances which are portable and introspectable.

> This is where LINQ [1] is taking off: expressions are declared with
> "var" (in C#). I used Expression() in Dejavu [2] for the same reasons
> (before LINQ came along ;), and am using it to build SQL from Python
> lambdas. I had to use lambda because that's Python's only builtin
> support for expressions-as-objects at the moment, but I'd like to see
> Python grow a syntax like:
>     e = expr(x: x + 1)

I see what you mean, but in a way you're still agreeing with me :) Your
expression-as-objects proposal is very clever, but to me (and as far as this
thread is concerned) it still allows to write a "decorated" piece of code
(expression), pass it around, and execute (evaluate) it later. This is what
I (and others) mainly use lambda for, and your expr() thing would still
serve me well. Instead, itemgetter() and friends are going to a different
direction (the expression which is later evaluated is not clearly expressed
in familiar Python terms), and that's what I find inconvenient.
Giovanni Bajo

From jcarlson at  Fri Feb  3 20:56:55 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 03 Feb 2006 11:56:55 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <>
Message-ID: <>

Donovan Baarda <abo at> wrote:
> On Fri, 2006-02-03 at 09:00 -0800, Josiah Carlson wrote:
> [...]
> > Sets are tacked on.  That's why you need to use 'import sets' to get to
> > them, in a similar fashion that you need to use 'import array' to get
> > access to C-like arrays.
> No you don't;
> $ python
> Python 2.4.1 (#2, Mar 30 2005, 21:51:10)
> [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> v=set((1,2,3))
> >>> f=frozenset(v)
> >>>
> set and frozenset are now builtin.

Indeed they are.  My apologies for being incorrect, I'm still using 2.3
for all of my commercial work.

> > I personally object to making syntax for sets for the same reasons I
> > object to making arrays, heapqs, Queues, deques, or any of the other
> > data structure-defining modules in the standard library into syntax.
> Nuff was a fairy... though I guess it depends on where you draw the
> line; should [1,2,3] be list(1,2,3)?

Who is "Nuff"?

Along the lines of "not every x line function should be a builtin", "not
every builtin should have syntax".  I think that sets have particular
uses, but I don't believe those uses are sufficiently varied to
warrant the creation of a syntax.  I suggest that people take a walk
through their code. How often do you use other sequence and/or mapping
types? How many lists, tuples and dicts are there?  How many sets? Ok,
now how many set literals?

Syntax for sets is only really useful for the equivalent of a set
literal, and with minimal syntax for a set literal being some sort of
start and ending character pair, the only thing gained is a 3 key
reduction in the amount of typing necessary, and a possible compiler
optimization to call the set creation code directly instead of looking
up 'set' in the local, global, then builtin namespaces.

Essentially, I'm saying that "set(...)" isn't significantly worse than
"{...}" (or some other pair) for set creation.  One can say the same
thing about list(), tuple(), and dict(), but I think that their millions
of uses far overwhelm the minimal uses (and usage) of set(), and put
them in a completely different class.

 - Josiah

From bokr at  Fri Feb  3 21:58:06 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 20:58:06 GMT
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<>
	<> <>
Message-ID: <>

On Fri, 03 Feb 2006 19:10:42 +0100, Christian Tismer <tismer at> wrote:

>Bengt Richter wrote:
>> BTW, re def-time bindings, the default arg abuse is a hack, so I would like to
>> see a syntax that would permit default-arg-like def-time function-local bindings without
>> affecting the call signature. E.g., if def foo(*args, **keywords, ***bindings): ...
>> would use bindings as a dict at def-time to create local namespace bindings like **keywords,
>> but not affecting the call signature. This would allow a nicer version of above-mentioned
>>    lambda x, zip=zip, zop=zop:x.method(zip,zop)
>> as
>>    lambda x, ***dict(zip=zip, zop=zop):x.method(zip,zop)
>> or
>>    lambda x, ***{'zip':zip, 'zop':zop}:x.method(zip,zop)
>> This could also be used to do currying without the typical cost of wrapped nested calling.
>Just in case you might not be aware of it (like I was):
>lambda does support local scope, like here:
> >>> def locallambda(x, y):
>... 	func = lambda: x+y
>... 	return func
> >>> f=locallambda(2, 3)
> >>> f()
Yes, thanks, I really did know that ;-/ Just got thinking along another line. So
    lambda x, zip=zip, zop=zop:x.method(zip,zop)
    lambda x, ***{'zip':zip, 'zop':zop}:x.method(zip,zop)
would better have been
    (lambda zip,zop:lambda x:x.method(zip,zop))(zip, zop)

Bengt Richter
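
The three binding styles compared in this message can be checked with a short
experiment. This is a sketch for a current Python 3; the loop over range(3)
and the variable names are made up for illustration.

```python
# Late binding: every lambda looks up n when called, after the loop ended.
late = [lambda x: x + n for n in range(3)]

# Default-argument hack: n is bound at definition time, but it also
# becomes an extra, overridable parameter of the call signature.
default_arg = [lambda x, n=n: x + n for n in range(3)]

# Nested lambda: the outer call binds n immediately and the inner
# function keeps a one-argument signature.
nested = [(lambda n: lambda x: x + n)(n) for n in range(3)]

late_results = [f(10) for f in late]
default_results = [f(10) for f in default_arg]
nested_results = [f(10) for f in nested]
```

The default-argument version gives the intended values but still accepts a
second positional argument, which is the call-signature side effect the
proposed ***bindings syntax was meant to avoid; the nested form avoids it at
the cost of an extra call at definition time.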

From bokr at  Sat Feb  4 00:08:39 2006
From: bokr at (Bengt Richter)
Date: Fri, 03 Feb 2006 23:08:39 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On Fri, 03 Feb 2006 19:56:20 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at> wrote:

>Bengt Richter wrote:
>> If you are looking at them in C code receiving them as args in a call,
>> "treat them the same" would have to mean provide code to coerce long->int
>> or reject it with an exception, IWT.
>The typical way of processing incoming ints in C is through
>PyArg_ParseTuple, which already has the code to coerce long->int
>(which in turn may raise an exception for a range violation).
>So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
I thought there might be a case in a hot loop where it could make a difference.
I confess not having done a C extension since I wrote one to access RDTSC quite some time ago.

>> It's not a matter of "buggy" if you are trying to optimize.
>> (I am aware of premature optimization issues, and IMO "strange"
>> is in the eye of the beholder. What syntax would you suggest?
>The question is: what is the problem you are trying to solve?
>If it is "bit masks", then consider the problem solved already.
Well, I was visualizing having a homogeneous bunch of bit mask
definitions all as int type if they could fit. I can't express
them all in hex as literals without some processing. That got me started ;-)
Not that some one-time processing at module import time is a big deal.
Just that it struck me as a wart not to be able to do it without processing,
even if constant folding is on the way.

>>>Same goes for code that says it takes a 32-bit bitfield argument but  
>>>won't accept 0x80000000.
>> If the bitfield is signed, it can't, unless you are glossing over
>> an assumed coercion rule.
>Just have a look at the 'k' specifier in PyArg_ParseTuple.
Ok, well that's the provision for the coercion then.
BTW, is long mandatory for all implementations? Is there a doc that
defines minimum features for a conforming Python implementation?
E.g., IIRC Scheme has a list naming what's optional and not.

Bengt Richter
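
The signed versus unsigned reading of a 32-bit mask can be illustrated from
pure Python with the struct module. This is only an analogy at the Python
level, not the C API path through PyArg_ParseTuple discussed above; the MASK
name is made up for the example.

```python
import struct

MASK = 0x80000004

# Packed as an unsigned 32-bit field, the mask fits as written.
unsigned_bytes = struct.pack("<I", MASK)

# Reading the same four bytes back as a signed field gives a negative int.
as_signed = struct.unpack("<i", unsigned_bytes)[0]

# Packing directly into a signed 32-bit field is rejected as out of range.
try:
    struct.pack("<i", MASK)
    fits_signed = True
except struct.error:
    fits_signed = False
```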

From brett at  Sat Feb  4 00:29:17 2006
From: brett at (Brett Cannon)
Date: Fri, 3 Feb 2006 15:29:17 -0800
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B21015D82B8@ex9.hostedexchange.local>
References: <6949EC6CD39F97498A57E0FA55295B21015D82B8@ex9.hostedexchange.local>
Message-ID: <>

On 2/3/06, Robert Brewer <fumanchu at> wrote:
> Giovanni Bajo wrote:
> > Alex Martelli <aleaxit at> wrote:
> > > I understand your worry re the syntax issue.  So what about Michael
> > > Hudson's "placeholder class" idea, where X[1] returns the callable
> > > that will do x[1] when called, etc?  Looks elegant to me...
> >
> > Depends on what the final API looks like. "deferred(x)[1]"
> > isn't that bad, but "def x: x[1]" still looks clearer as
> > the 'def' keyword immediately makes clear you're DEFining
> > a DEFerred function <g> :) Of course we can paint our
> > bikeshed of whatever color we like, but I'm happy enough if
> > we agree with the general idea of keeping the same syntax
> > in both deferred and immediate execution.
> I don't agree with that "general idea" at all. Sorry. ;) I think the
> semantic emphasis should not be on "execution", but rather on
> "expression". The word "execution" to me implies "statements", and
> although some functions somewhere are called behind the scenes to
> evaluate any expression, the lambda (and its potential successors)
> differ from "def" by not allowing statements. They may be used to "defer
> execution" but to me, their value lies in being static
> expressions--object instances which are portable and introspectable.
> This is where LINQ [1] is taking off: expressions are declared with
> "var" (in C#). I used Expression() in Dejavu [2] for the same reasons
> (before LINQ came along ;), and am using it to build SQL from Python
> lambdas. I had to use lambda because that's Python's only builtin
> support for expressions-as-objects at the moment, but I'd like to see
> Python grow a syntax like:
>     e = expr(x: x + 1)
> ...where expr() does early binding like dejavu.logic does. [Looking back
> over my logic module, I'm noticing it requires boolean return values,
> but it would not be difficult to extend to return abitrary values--even
> easier if it were rewritten as builtin functionality. Guess I need to
> write myself another ticket. ;)]

Well, maybe what we really want is lambda but under a different name. 
``expr x: x + 1`` seems fine to me and it doesn't have the issue of
portraying Python as having a crippled lambda expression.  I do think
that a general solution can be found that can allow us to do away with
itemgetter, attrgetter, and Alex's methodcaller (something like
Michael's Placeholder class).

The problem is when we want deferred arguments to a function call.  We
have functional.partial, but it can't do something like ``lambda x:
func(1, 2, x, 4, 5)`` unless everything is turned to keyword arguments
but that doesn't work when something only takes positional arguments. 
This does not seem to have a good solution outside of lambda in terms
of non-function definition.

But then again small functions can be defined for those situations. 
So I think that functional.partial along with some deferred object
implementation should deal with most uses of lambda and then allow us
to use custom functions to handle all the other cases of when we would
want lambda.
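
The limitation described above can be sketched as follows, assuming a current
Python 3 where partial lives in functools. The five-argument func is a made-up
stand-in that simply returns its arguments.

```python
from functools import partial


def func(a, b, c, d, e):
    """Hypothetical five-argument function used only for illustration."""
    return (a, b, c, d, e)


# partial can pre-fill leading positional arguments...
leading = partial(func, 1, 2)

# ...but a hole in the middle needs a lambda or a small named function.
middle_lambda = lambda x: func(1, 2, x, 4, 5)


def middle_def(x):
    return func(1, 2, x, 4, 5)


# Keyword arguments work only if func accepts d and e by keyword.
middle_kw = partial(func, 1, 2, d=4, e=5)
```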


From ncoghlan at  Sat Feb  4 03:18:11 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 04 Feb 2006 12:18:11 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Bengt Richter wrote:
> On Fri, 03 Feb 2006 20:44:47 +1000, Nick Coghlan <ncoghlan at> wrote:
>>     funcTakingCallback(x.method(zip, zop) def (x))
>> Consider these comparisons:
> This looks a lot like the "anonymous def" expression in a postfix form ;-)

If you think about the way a for-loop statement maps to the looping portion of 
a listcomp or genexp, or the way an if statement maps to a conditional 
expression, you might notice that this is *not* a coincidence :)

   def g(_seq):
     for x in _seq:
       yield x*x
   g = g(seq)

=> g = (x*x for x in seq)

   l = []
   for x in seq:
     l.append(x*x)

=> l = [x*x for x in seq]

   if cond:
     val = x
   else:
     val = y

=> val = x if cond else y

In all three of the recent cases where a particular usage of a statement has 
been converted to an expression, the variable portion of the innermost part of 
the first suite is pulled up and placed to the left of the normal 
statement keyword. A bracketing syntax is used when the expression creates a 
new object. All I'm suggesting is that a similarly inspired syntax is worth 
considering when it comes to deferred expressions:

   def f(x):
     return x*x

=> f = (x*x def (x))
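
The statement-to-expression pairs above can be run side by side in a current
Python 3. In this sketch the proposed "def" expression is not available, so
lambda stands in for it; the sample data and the variable names are made up
for illustration.

```python
seq = [1, 2, 3]
cond = True
x, y = "yes", "no"


# Generator function versus generator expression.
def gen(_seq):
    for item in _seq:
        yield item * item


g_statement = list(gen(seq))
g_expression = list(item * item for item in seq)

# Accumulating loop versus list comprehension.
l_statement = []
for item in seq:
    l_statement.append(item * item)
l_expression = [item * item for item in seq]

# if statement versus conditional expression.
if cond:
    val_statement = x
else:
    val_statement = y
val_expression = x if cond else y


# def statement versus a deferred expression (spelled with lambda today).
def f_statement(n):
    return n * n


f_expression = lambda n: n * n
```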


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From eric.nieuwland at  Sat Feb  4 03:48:18 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Sat, 4 Feb 2006 03:48:18 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

On 4 feb 2006, at 3:18, Nick Coghlan wrote:
> All I'm suggesting is that a similarly inspired syntax is worth
> considering when it comes to deferred expressions:
>    def f(x):
>      return x*x
> => f = (x*x def (x))

It's not the same, as x remains free whereas in g = [x*x for x in seq] 
x is bound.

Yours is

f = lambda x: x*x

and it will die by Guido's hand...

From ncoghlan at  Sat Feb  4 04:11:21 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 04 Feb 2006 13:11:21 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Eric Nieuwland wrote:
> On 4 feb 2006, at 3:18, Nick Coghlan wrote:
>> All I'm suggesting is that a similarly inspired syntax is worth
>> considering when it comes to deferred expressions:
>>    def f(x):
>>      return x*x
>> => f = (x*x def (x))
> It's not the same, as x remains free whereas in g = [x*x for x in seq] x 
> is bound.

That's like saying "it's not the same because '(x*x def (x))' creates a 
function while '(x*x for x in seq)' creates a generator-iterator". Well, 
naturally - if the expression didn't do something different, what would be the 
point in having it?

The parallel I'm trying to draw is at the syntactic level, not the semantic. 
I'm quite aware that the semantics will be very different ;)

> Yours is
> f = lambda x: x*x
> and it will die by Guido's hand...

In the short term, probably. I'm hoping that the progressive accumulation of 
workarounds like itemgetter, attrgetter and partial (and Alex's suggestion of 
'methodcaller') and the increasing use of function arguments for things like 
sorting and the itertools module will eventually convince Guido that deferring 
expressions is a feature that needs to be *fixed* rather than discarded entirely.

But until the BDFL is willing to at least entertain the notion of fixing 
deferred expressions rather than getting rid of them, there isn't much 
point in writing a PEP or a patch to tweak the parser (with the AST in place, 
this is purely a change to the parser front-end - the AST and code generation 
back end don't need to be touched).
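
To make the "workarounds" concrete, here is a small sketch of each helper next to the lambda it replaces. It assumes Python 3; `operator.methodcaller` was only a suggestion at the time of this thread and was added to the standard library later (Python 2.6). The sample data and variable names are made up for illustration.

```python
from functools import partial
from operator import attrgetter, itemgetter, methodcaller

pairs = [("b", 2), ("a", 3), ("c", 1)]

# itemgetter(1) stands in for: lambda p: p[1]
by_count = sorted(pairs, key=itemgetter(1))
by_count_lambda = sorted(pairs, key=lambda p: p[1])

# methodcaller("upper") stands in for: lambda w: w.upper()
words = ["Spam", "eggs"]
upper = list(map(methodcaller("upper"), words))
upper_lambda = list(map(lambda w: w.upper(), words))

# attrgetter("real") stands in for: lambda z: z.real
real_parts = list(map(attrgetter("real"), [1 + 2j, 3 + 4j]))

# partial(int, base=2) stands in for: lambda s: int(s, base=2)
parse_binary = partial(int, base=2)
```

Each helper covers one fixed shape of deferred expression, which is why they tend to accumulate one by one.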


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From gvwilson at  Fri Feb  3 01:26:14 2006
From: gvwilson at (Greg Wilson)
Date: Thu, 2 Feb 2006 19:26:14 -0500 (EST)
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs> 
Message-ID: <Pine.GSO.4.58.0602021925490.3688@qew.cs>

> > > Raymond:
> > > Accordingly, Guido rejected the braced notation for set comprehensions.
> > > See:

> > Greg:
> > "...however, the issue could be revisited for Python 3000 (see PEP 3000)."
> > So I'm only 1994 years early ;-)

> Alex:
> Don't be such a pessimist, it's ONLY 994 years to go!

I was allowing for likely schedule slippage... ;-)


From ncoghlan at  Sat Feb  4 07:01:43 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 04 Feb 2006 16:01:43 +1000
Subject: [Python-Dev] Path PEP and the division operator
Message-ID: <>

I was tinkering with something today, and wondered whether it would cause 
fewer objections if the PEP used the floor division operator (//) to combine 
path fragments, instead of the true division operator?

The parallel to directory separators is still there, but the syntax isn't tied 
quite so strongly to the Unix path separator.
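
For readers who have not seen the idea, a rough sketch of what joining with `//` could look like. `PathFragment` is a hypothetical class invented for this illustration, not the PEP's Path class, and `posixpath.join` is used only so that the result is the same on every platform.

```python
import posixpath


class PathFragment(str):
    """Hypothetical str subclass, used only to illustrate the operator."""

    def __floordiv__(self, other):
        # Join with posixpath so the example behaves identically everywhere.
        return PathFragment(posixpath.join(self, other))


home = PathFragment("/home/nick") // "src" // "python"
```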


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From eric.nieuwland at  Sat Feb  4 09:05:37 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Sat, 4 Feb 2006 09:05:37 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Nick Coghlan wrote:
> That's like saying "it's not the same because '(x*x def (x))' creates a
> function while '(x*x for x in seq)' creates a generator-iterator". 
> Well,
> naturally - if the expression didn't do something different, what 
> would be the
> point in having it?
Naturally.  I just wanted to point out it's a beast of another kind, so 
like syntax may not be a good idea.

> The parallel I'm trying to draw is at the syntactic level, not the 
> semantic.
> I'm quite aware that the semantics will be very different ;)
>> Yours is
>> f = lambda x: x*x
>> and it will die by Guido's hand...
> In the short term, probably. I'm hoping that the progressive 
> accumulation of
> workarounds like itemgetter, attrgetter and partial (and Alex's 
> suggestion of
> 'methodcaller') and the increasing use of function arguments for 
> things like
> sorting and the itertools module will eventually convince Guido that 
> deferring
> expressions is a feature that needs to be *fixed* rather than 
> discarded entirely.

Then how about nameless function/method definition:
	def (x):
		... usual body ...
produces an unnamed method object
and
	def spam(x):
		....
is just
	spam = def (x):
		...
while our beloved
	eggs(lambda x: x*x)
would become
	eggs(def(x): return x*x)


From martin at  Sat Feb  4 11:11:08 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Feb 2006 11:11:08 +0100
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Bengt Richter wrote:
>>The typical way of processing incoming ints in C is through
>>PyArg_ParseTuple, which already has the code to coerce long->int
>>(which in turn may raise an exception for a range violation).
>>So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
> Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.

I didn't say that 'k' takes no significant time for longs vs ints. In
fact, I did not make any performance claims. I don't know what the
relative performance is.

> Well, I was visualizing having a homogeneous bunch of bit mask
> definitions all as int type if they could fit. I can't express
> them all in hex as literals without some processing. That got me started ;-)

I still can't see *why* you want to do that. Just write them as
hex literals the way you expect it to work, and it typically will
work just fine. Some of these literals are longs, some are ints,
but there is no need to worry about this. It will all work just fine.

> BTW, is long mandatory for all implementations? Is there a doc that
> defines minimum features for a conforming Python implementation?

The Python language reference is typically considered as a specification
of what Python is. There is no "minimal Python" specification: you have
to do all of it.


From ncoghlan at  Sat Feb  4 13:41:48 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 04 Feb 2006 22:41:48 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Eric Nieuwland wrote:
> Then how about nameless function/method definition:
>     def (x):
>         ... usual body ...

Hell no. If I want to write a real function, I already have perfectly good 
syntax for that in the form of a def statement. I want to *increase* the 
conceptual (and pedagogical) difference between deferred expressions and real 
functions, not reduce it. There's a reason I try to use the term 'deferred 
expression' for lambda rather than 'anonymous function'. Even if lambdas are 
*implemented* as normal function objects, they're a conceptually different 
beast as far as I'm concerned - a function is typically about factoring out a 
piece of common code to be used in multiple places, while a lambda is about 
defining *here* and *now* an operation that is to be carried out *elsewhere* 
and possibly *later* (e.g., sorting and predicate arguments are defined at the 
call site but executed in the function body, callbacks are defined when 
registered but executed when the relevant event occurs).

> produces an unnamed method object
> and
>     def spam(x):
>         ....
> is just
>     spam = def (x):
>         ...

Except that it wouldn't be - the name used in a def statement has special 
status that a normal variable name does not (e.g. the function knows about its 
real name, but nothing about the aliases given to it by assignment statements).
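
The special status of the def name is easy to observe. A minimal sketch, assuming Python 3; `spam`, `eggs` and `anon` are just sample names:

```python
def spam(x):
    return x * x


eggs = spam             # an alias created by plain assignment
anon = lambda x: x * x  # a deferred expression bound to a name

names = (spam.__name__, eggs.__name__, anon.__name__)
```

`names` comes out as `('spam', 'spam', '<lambda>')`: the function remembers the name from its def statement, and assignment changes nothing about that.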

> while our beloved
>     eggs(lambda x: x*x)
> would become
>     eggs(def(x): return x*x)

I personally believe this fascination with "we want to be able to include a 
suite inside an expression" has been a major contributor to Guido's irritation 
with the whole concept of anonymous functions. That may just be me projecting 
my own feelings though - every time I try to start a discussion about getting 
a clean deferred expression syntax, at least one part of the thread will veer 
off onto the topic of embedded suites. IMO, if what you want to do is complex 
enough that you can't write it using a single expression, then giving it a 
name and a docstring would probably make the code more comprehensible anyway.

Generator expressions allow a generator to be embedded only if it is simple 
enough to be written using a single expression in the body of the loop. Lambda 
does the same thing for functions, but for some reason people seem to love the 
flexibility provided by genexps, while many think the exact same restriction 
in lambda is a problem that needs "fixing". Maybe once PEP 308 has been 
implemented, some of that griping will go away, as it will then be possible to 
cleanly embed conditional logic inside an expression (and hence inside a lambda).
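
As a sketch of what that looks like once conditional expressions are available (PEP 308 was implemented in Python 2.5), here is a lambda with embedded conditional logic. The name `clamp_low` and the sample values are made up for illustration.

```python
# A single expression with a branch, so it still fits in a lambda.
clamp_low = lambda x: x if x >= 0 else 0

values = [-2, 5, -1, 3]
clamped = [clamp_low(v) for v in values]
```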


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From martin at  Sat Feb  4 13:55:23 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Feb 2006 13:55:23 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>	<>
	<>	<>	<>	<>	<>
Message-ID: <>

Nick Coghlan wrote:
> Hell no. If I want to write a real function, I already have perfectly good 
> syntax for that in the form of a def statement. I want to *increase* the 
> conceptual (and pedagogical) difference between deferred expressions and real 
> functions, not reduce it. There's a reason I try to use the term 'deferred 
> expression' for lambda rather than 'anonymous function'. Even if lambdas are 
> *implemented* as normal function objects, they're a conceptually different 
> beast as far as I'm concerned - a function is typically about factoring out a 
> piece of common code to be used in multiple places, while a lambda is about 
> defining *here* and *now* an operation that is to be carried out *elsewhere* 
> and possibly *later* (e.g., sorting and predicate arguments are defined at the 
> call site but executed in the function body, callbacks are defined when 
> registered but executed when the relevant event occurs).

Hmm. A function also defines *here* and *now* an operation to be carried
out *elsewhere* and *later*.

> Generator expressions allow a generator to be embedded only if it is simple 
> enough to be written using a single expression in the body of the loop. Lambda 
> does the same thing for functions, but for some reason people seem to love the 
> flexibility provided by genexps, while many think the exact same restriction 
> in lambda is a problem that needs "fixing". Maybe once PEP 308 has been 
> implemented, some of that griping will go away, as it will then be possible to 
> cleanly embed conditional logic inside an expression (and hence inside a lambda).

I believe that usage of a keyword with the name of a Greek letter also
contributes to people considering something broken.


From ncoghlan at  Sat Feb  4 15:01:47 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 05 Feb 2006 00:01:47 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>	<>
	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Hmm. A function also defines *here* and *now* an operation to be carried
> out *elsewhere* and *later*.

Agreed, but when I use a lambda, I almost always have a *specific* elsewhere 
in mind (such as a sorting operation or a callback registration). With named 
functions, that isn't usually the case - I'll either be returning the function 
from a factory function or decorator (allowing the caller to do whatever they 
want with it), or I'll be storing the function in a module or class namespace 
where any code that needs to use it can retrieve it later.

Local utility functions occupy a middle ground - their usage is localised to 
one function or class definition, but they aren't necessarily defined just for 
one particular use. Using them more than once is a clear sign that they're 
worth naming, and the occasional need to name a complex single-use function 
seems a worthwhile trade-off when compared to trying to permit that complexity 
to be embedded inside an expression.

>> Generator expressions allow a generator to be embedded only if it is simple 
>> enough to be written using a single expression in the body of the loop. Lambda 
>> does the same thing for functions, but for some reason people seem to love the 
>> flexibility provided by genexps, while many think the exact same restriction 
>> in lambda is a problem that needs "fixing". Maybe once PEP 308 has been 
>> implemented, some of that griping will go away, as it will then be possible to 
>> cleanly embed conditional logic inside an expression (and hence inside a lambda).
> I believe that usage of a keyword with the name of a Greek letter also
> contributes to people considering something broken.

Aye, I agree there are serious problems with the current syntax. All I'm 
trying to say above is that I don't believe the functionality itself is broken.

At last count, Guido's stated preference was to ditch the functionality 
entirely for Py3k, so unless he says something to indicate he's changed his 
mind, we'll simply need to continue with proposing functions like 
methodcaller() as workarounds for its absence...


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Sat Feb  4 17:05:59 2006
From: guido at (Guido van Rossum)
Date: Sat, 4 Feb 2006 08:05:59 -0800
Subject: [Python-Dev] Path PEP and the division operator
In-Reply-To: <>
References: <>
Message-ID: <>

I won't even look at the PEP as long as it uses / or // (or any other
operator) for concatenation.

On 2/3/06, Nick Coghlan <ncoghlan at> wrote:
> I was tinkering with something today, and wondered whether it would cause
> fewer objections if the PEP used the floor division operator (//) to combine
> path fragments, instead of the true division operator?
> The parallel to directory separators is still there, but the syntax isn't tied
> quite so strongly to the Unix path separator.
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

--Guido van Rossum (home page:

From bjourne at  Sat Feb  4 17:16:46 2006
From: bjourne at (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Sat, 4 Feb 2006 17:16:46 +0100
Subject: [Python-Dev] Path PEP and the division operator
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/4/06, Guido van Rossum <guido at> wrote:
> I won't even look at the PEP as long as it uses / or // (or any other
> operator) for concatenation.

That's good, because it doesn't. :)

mvh Björn

From ncoghlan at  Sat Feb  4 17:28:43 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 05 Feb 2006 02:28:43 +1000
Subject: [Python-Dev] Path PEP and the division operator
In-Reply-To: <>
References: <>	<>
Message-ID: <>

BJörn Lindqvist wrote:
> On 2/4/06, Guido van Rossum <guido at> wrote:
>> I won't even look at the PEP as long as it uses / or // (or any other
>> operator) for concatenation.
> That's good, because it doesn't. :)

My mistake - that's been significantly updated since I last read it. I should 
have known better, though, as I think I was one of the people advocating use 
of the constructor instead of an operator. . .


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From duncan.booth at  Sat Feb  4 18:26:12 2006
From: duncan.booth at (Duncan Booth)
Date: Sat, 4 Feb 2006 11:26:12 -0600
Subject: [Python-Dev] Path PEP and the division operator
References: <>
Message-ID: <n2m-g.Xns9760B1543BBB8duncanrcpcouk@>

BJörn Lindqvist <bjourne at> wrote in
news:740c3aec0602040816w34981344n271b237d6b6c9fd5 at 

> On 2/4/06, Guido van Rossum <guido at> wrote:
>> I won't even look at the PEP as long as it uses / or // (or any other
>> operator) for concatenation.
> That's good, because it doesn't. :)
No, but it does say that / may be reintroduced 'if the BFDL so desires'. I 
hope that doesn't mean the BDFL may be overruled. :^)

I'm not convinced by the rationale given why atime,ctime,mtime and size are 
methods rather than properties but I do find this PEP much more agreeable 
than the last time I looked at it.

From eric.nieuwland at  Sat Feb  4 20:11:31 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Sat, 4 Feb 2006 20:11:31 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>	<>
	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> I believe that usage of a keyword with the name of a Greek letter also
> contributes to people considering something broken.

QOTW! ;-)


From eric.nieuwland at  Sat Feb  4 20:17:15 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Sat, 4 Feb 2006 20:17:15 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<>	<>
	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Nick Coghlan wrote:
>> I believe that usage of a keyword with the name of a Greek letter also
>> contributes to people considering something broken.
> Aye, I agree there are serious problems with the current syntax. All 
> I'm
> trying to say above is that I don't believe the functionality itself 
> is broken.

Lambda is not broken, it's restricted to a single calculation and 
therefore of limited use.
Although I wasn't too serious (I should have added more signs of that), an 
anonymous 'def' would allow use of the full power of method definition.

> At last count, Guido's stated preference was to ditch the functionality
> entirely for Py3k, so unless he says something to indicate he's 
> changed his
> mind, we'll simply need to continue with proposing functions like
> methodcaller() as workarounds for its absence...

Yep, we'll just have to learn to live without it. :-( / ;-)


From rasky at  Sat Feb  4 20:35:43 2006
From: rasky at (Giovanni Bajo)
Date: Sat, 4 Feb 2006 20:35:43 +0100
Subject: [Python-Dev] Path PEP: some comments
Message-ID: <030001c629c2$30345c90$bf03030a@trilan>


my comments on the Path PEP:

- Many methods contain the word 'path' in them. I suppose this is to help
transition from the old library to the new library. But in the context of a
new Python user, I don't think that Path.abspath() is optimal. Path.abs()
looks better. Maybe it's not so fundamental to have exactly the same names
of the old library, especially when thinking of future? If I rearrange my
code to use Path, I can as well rename methods to something more sound at
the same time.

- Why have both a basename() and a .namebase property? Again for backward
compatibility? I guess we can live with the property only.

- The operations that return list of files have confusing names. Something
more orthogonal could be: list, listdirs, listfiles / walk, walkdirs,
walkfiles. Where, I guess, the first triplet does not recurse into subdirs
while the second does. glob() could be dropped (as someone else proposed).

- ctime() is documented to be unportable: it has different semantics on UNIX
and Windows. I believe the class should abstract from these details. One
solution is to rip it off and forget about it. Another is to provide two
different functions which have fixed semantics (and are possibly available only
on a subset of the operating systems / file systems).

- remove() and unlink() are duplicates, I'd drop one (unlink() has a more
arcane name).

- mkdir+makedirs and rmdir+removedirs are confusing and could use some
example. I believe it's enough to have a single makedir() (which is
recursive by default) and a single remove() (again recursive by default, and
could work with both files and directories). rmtree() should go for the same
reason (duplicated).

- Whatever function we come up with for removing trees, it should have a
force=True flag to mimic "rm -rf". That is, it should try to remove
read-only files as well. I saw so many times people writing their own
rmtree_I_mean_it() wrapper which uses the onerror callback to change the
permissions. That's so unpythonic for such a common task.

- copy/copy2/copyfile mean the same to me. copy2() is really a bad name
though, I'd use copy(stats=True).

- My own feeling on the controversial split() vs splitpath() is that split()
is always wrong for paths so I don't see anything fundamentally wrong in
overwriting it. I don't expect to find existing code (using strings for
path) calling split() on a path. split("/") might be common though, and in
fact my proposal is to overwrite the zero-argument split() giving it the
meaning of split("/").

- I'm missing read(), write(), readlines() and bytes() from the original
Path class. When I have a Path() that points to a file, it's pretty common
to read from it. Those functions were handy because they were saving much
obvious code:

for L in Path("foo.txt").readlines():
     print L,

vs

f = open(Path("foo.txt"), "rU")
try:
   for L in f:
        print L,
finally:
   f.close()

- Since we're at it, we could also move part of "fileinput" into Path. For
instance, why not have a replacelines() method:

import fileinput
# fileinput takes the backup *extension* as a string, not a flag
for L in fileinput.FileInput("foo.txt", inplace=True, backup=".bak"):
     print "(modified) " + L,

vs

for L in Path("foo.txt").replacelines(backup=True):
     print "(modified) " + L,

Thanks for working on this!
Giovanni Bajo
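
On the force=True point, the wrapper people keep rewriting looks roughly like the sketch below. It assumes Python 3; `force_rmtree` is a hypothetical name, and the version check is there only because `shutil.rmtree` gained an `onexc` parameter in Python 3.12, with the older `onerror` deprecated from that release.

```python
import os
import shutil
import stat
import sys


def force_rmtree(path):
    """Remove a tree, retrying entries that fail after making them writable."""

    def make_writable_and_retry(func, failed_path, _exc):
        # Called by rmtree when func(failed_path) raised; the third argument
        # is an exception (onexc) or an exc_info tuple (onerror), unused here.
        os.chmod(failed_path, stat.S_IWRITE)
        func(failed_path)

    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=make_writable_and_retry)
    else:
        shutil.rmtree(path, onerror=make_writable_and_retry)
```

The handler mostly matters on Windows, where a read-only file cannot be deleted; on POSIX systems deletion is governed by the permissions of the containing directory instead.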

From pje at  Sat Feb  4 22:08:42 2006
From: pje at (Phillip J. Eby)
Date: Sat, 04 Feb 2006 16:08:42 -0500
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <030001c629c2$30345c90$bf03030a@trilan>
Message-ID: <>

At 08:35 PM 2/4/2006 +0100, Giovanni Bajo wrote:
>- ctime() is documented to be unportable: it has different semantics on UNIX
>and Windows. I believe the class should abstract from these details.

Note that this is the opposite of normal Python policy: Python does not 
attempt to create cross-platform abstractions, but instead chooses to 
expose platform differences.  The Path class shouldn't abstract this any 
more than the original *path modules do.

>  One
>solution is to rip it off and forget about it. Another is to provide two
>different functions which have a fixed semantic (and possibly available only
>a subset of the operating systems / file systems).

Keep in mind that to properly replace os.path, each of the various *path 
modules will need their own Path variant to support foreign path 
manipulation.  For example, one can use posixpath.join() right now on 
Windows to manipulate Posix paths, and ntpath.join() to do the reverse on 
Unix.  So there is already going to have to be a Path class for each os 
anyway - and they will all need to be simultaneously usable.

Note that this is a big difference from the Path implementation currently 
in circulation, which is limited to processing the native OS's paths.  The 
PEP also currently doesn't address this point at all; it should probably 
mention that each of the posixpath, ntpath, macpath, etc. modules will each 
need to include a Path implementation.  Whether this should be made 
available as os.Path or os.path.Path is the only open question; the latter 
of course would be automatic by simply adding a Path implementation to each 
of the *path modules.
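
A short sketch of the "foreign path manipulation" mentioned above, assuming Python 3. The first two lines use the per-platform modules that exist today; the last two use the pure path classes of the later `pathlib` module (standard library since Python 3.4), which are shown only to illustrate per-flavour classes being usable side by side. The sample paths are made up.

```python
import ntpath
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# Both flavours of join work regardless of the OS running the code.
posix_joined = posixpath.join("usr", "lib", "python")
nt_joined = ntpath.join("C:\\Python24", "Lib", "os.py")

# One class per flavour, both usable in the same process.
pure_posix = PurePosixPath("usr") / "lib"
pure_nt = PureWindowsPath("C:/Python24") / "Lib"
```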

From Scott.Daniels at Acm.Org  Sat Feb  4 22:42:21 2006
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sat, 04 Feb 2006 13:42:21 -0800
Subject: [Python-Dev] Path PEP -- a couple of typos.
Message-ID: <ds3739$qcq$>

Here are a couple of simple-minded fixes for the PEP.

Near the bottom of "Replacing older functions with the Path class":

 >    fname = Path("Python2.4.tar.gz")
 >    base, ext = fname.namebase, fname.extx

Surely this should be:
      base, ext = fname.namebase, fname.ext

 >    lib_dir = "/lib"
 >    libs = glob.glob(os.path.join(lib_dir, "*s.o"))
 >    ==>
 >    lib_dir = Path("/lib")
 >    libs = lib_dir.files("*.so")

Probably that should be:
      libs = glob.glob(os.path.join(lib_dir, "*.so"))

--Scott David Daniels
Scott.Daniels at Acm.Org

From rasky at  Sun Feb  5 00:18:08 2006
From: rasky at (Giovanni Bajo)
Date: Sun, 5 Feb 2006 00:18:08 +0100
Subject: [Python-Dev] Path PEP: some comments
References: <>
Message-ID: <028701c629e1$42c11b90$5cbc2997@bagio>

Phillip J. Eby <pje at> wrote:

>> - ctime() is documented to be unportable: it has different semantics
>> on UNIX and Windows. I believe the class should abstract from these
>> details.
> Note that this is the opposite of normal Python policy: Python does
> not attempt to create cross-platform abstractions, but instead
> chooses to expose platform differences.  The Path class
> shouldn't abstract this
> any more than the original *path modules do.

I don't follow. One thing is to provide an interface which totally abstracts
from low-level details. Another is to provide a function which holds different
results depending on the operating system. I'm fine to have different functions
available for different purposes on different platforms, I'm not fine with
having a single function which does different things. Do you have any other

Giovanni Bajo

From ncoghlan at  Sun Feb  5 02:26:19 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 05 Feb 2006 11:26:19 +1000
Subject: [Python-Dev] Path PEP and the division operator
In-Reply-To: <n2m-g.Xns9760B1543BBB8duncanrcpcouk@>
References: <>	<>	<>
Message-ID: <>

Duncan Booth wrote:
> I'm not convinced by the rationale given why atime,ctime,mtime and size are 
> methods rather than properties but I do find this PEP much more agreeable 
> than the last time I looked at it.

A better rationale for doing it is that all of them may raise IOException. 
It's rude for properties to do that, so it's better to make them methods instead.

That was a general guideline that came up the first time adding Path was 
proposed - if the functionality involved querying or manipulating the actual 
filesystem (and therefore potentially raising IOError), then it should be a 
method. If the operation related solely to the string representation, then it 
could be a property.
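
A rough sketch of that guideline, assuming Python 3 (where IOError is an alias of OSError). `SketchPath` is a hypothetical class made up for this example, not the PEP's implementation.

```python
import os
import posixpath


class SketchPath(str):
    """Hypothetical path class illustrating property versus method."""

    @property
    def ext(self):
        # Pure string manipulation: never touches the filesystem.
        return posixpath.splitext(self)[1]

    def size(self):
        # Queries the filesystem and may raise OSError, so it is a method.
        return os.path.getsize(self)
```

With this split, `SketchPath("Python2.4.tar.gz").ext` is always safe to evaluate, while the parentheses on `size()` signal that the call can fail.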


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tjreedy at  Sun Feb  5 05:38:29 2006
From: tjreedy at (Terry Reedy)
Date: Sat, 4 Feb 2006 23:38:29 -0500
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<><>
Message-ID: <ds3vg6$nd5$>

"Nick Coghlan" <ncoghlan at> wrote in message 
news:43E4A10C.7020703 at
> Hell no. If I want to write a real function, I already have perfectly 
> good
> syntax for that in the form of a def statement. I want to *increase* the
> conceptual (and pedagogical) difference between deferred expressions and 
> real
> functions, not reduce it.

Mathematically, a function is a function.  Expressions and statements are 
two syntaxes for composing functions to create/define new functions.  A few 
languages use just one or the other.  Python intentionally uses both.  But 
I think making an even bigger deal of surface syntax is exactly the wrong 
movement, especially pedagogically.

Terry Jan Reedy

From tjreedy at  Sun Feb  5 06:06:28 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 5 Feb 2006 00:06:28 -0500
Subject: [Python-Dev] Path PEP: some comments
References: <030001c629c2$30345c90$bf03030a@trilan>
Message-ID: <ds418g$qta$>

"Phillip J. Eby" <pje at> wrote in message 
news: at
> Note that this is the opposite of normal Python policy: Python does not
> attempt to create cross-platform abstractions, but instead chooses to
> expose platform differences.

I had the opposite impression about Python -- that it generally masks such 
differences.  Overall, I see it as a cross-platform abstraction.  The 
requirement that ints be at least 32 bits masked the difference between 
16-bit int and 32-bit int platforms, in a way that C did/does not.  I am 
pretty sure that Tim Peters has said that he would welcome better 
uniformity in binary float computations, but that he won't do the work 
needed.  The decimal package attempts to completely mask the underlying 
platform.  Cross-platform guis, whether written in Python or just 
accessible from Python, also mask differences.  The os module has names 
like  sep and pathsep precisely so people can more easily write platform 
independent code.  And so on.

From ncoghlan at  Sun Feb  5 08:09:14 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 05 Feb 2006 17:09:14 +1000
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <ds3vg6$nd5$>
References: <>	<><>	<><><><><>	<>
Message-ID: <>

Terry Reedy wrote:
> "Nick Coghlan" <ncoghlan at> wrote in message 
> news:43E4A10C.7020703 at
>> Hell no. If I want to write a real function, I already have perfectly 
>> good
>> syntax for that in the form of a def statement. I want to *increase* the
>> conceptual (and pedagogical) difference between deferred expressions and 
>> real
>> functions, not reduce it.
> Mathematically, a function is a function.  Expressions and statements are 
> two syntaxes for composing functions to create/define new functions.  A few 
> languages use just one or the other.  Python intentionally uses both.  But 
> I think making an even bigger deal of surface syntax is exactly the wrong 
> movement, especially pedagogically.

I guess I misstated myself slightly - I've previously advocated re-using the 
'def' keyword, so there are obviously parallels I want to emphasize.

I guess my point is that expressions are appropriate sometimes, functions are 
appropriate other times, and it *is* possible to give reasonably simple 
guidelines as to which one is most appropriate when (one consumer->deferred 
expression, multiple consumers->named function).

I see it as similar to the choice of whether to use a generator function or 
generator expression in a given situation.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From duncan.booth at  Sun Feb  5 11:10:08 2006
From: duncan.booth at (Duncan Booth)
Date: Sun, 5 Feb 2006 04:10:08 -0600
Subject: [Python-Dev] Path PEP and the division operator
References: <n2m-g.Xns9760B1543BBB8duncanrcpcouk@>
Message-ID: <n2m-g.Xns9761676A7369Eduncanrcpcouk@>

Nick Coghlan <ncoghlan at> wrote in
news:43E5543B.1080907 at 

> Duncan Booth wrote:
>> I'm not convinced by the rationale given why atime,ctime,mtime and
>> size are methods rather than properties but I do find this PEP much
>> more agreeable than the last time I looked at it.
> A better rationale for doing it is that all of them may raise
> IOException. It's rude for properties to do that, so it's better to
> make them methods instead. 

Yes, that rationale sounds good to me.

> That was a general guideline that came up the first time adding Path
> was proposed - if the functionality involved querying or manipulating
> the actual filesystem (and therefore potentially raising IOError),
> then it should be a method. If the operation related solely to the
> string representation, then it could be a property.

Perhaps Bjorn could add that to the PEP?

From martin at  Sun Feb  5 13:57:41 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Feb 2006 13:57:41 +0100
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <>
References: <>
Message-ID: <>

Phillip J. Eby wrote:
>>- ctime() is documented to be unportable: it has different semantics on UNIX
>>and Windows. I believe the class should abstract from these details.
> Note that this is the opposite of normal Python policy: Python does not 
> attempt to create cross-platform abstractions, but instead chooses to 
> expose platform differences.  The Path class shouldn't abstract this any 
> more than the original *path modules do.

I think this is partially due to a misunderstanding, both by Microsoft,
and in Python. There is a long-time myth that ctime denotes "creation
time", as this is really in-line with mtime and atime.

I think the path module should provide these under a different name:
creation_time and status_change_time. Either of these might be absent.
ctime should be provided to report whatever ctime used to report in
the past (i.e. creation_time on Windows, status_change_time on Unix).
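
One way this split could look, as a rough sketch on top of os.stat. The three helper functions are hypothetical; only os.stat, os.name, st_ctime and (on platforms that have it) st_birthtime are existing names:

```python
import os


def creation_time(path):
    """Creation time if the platform reports one, else None (hypothetical)."""
    st = os.stat(path)
    if hasattr(st, "st_birthtime"):
        # Some platforms expose a real creation ("birth") time.
        return st.st_birthtime
    if os.name == "nt":
        # On Windows, st_ctime has traditionally held the creation time.
        return st.st_ctime
    return None


def status_change_time(path):
    """Status-change time on Unix-like systems, else None (hypothetical)."""
    if os.name == "nt":
        return None
    return os.stat(path).st_ctime


def ctime(path):
    """Whatever the platform's ctime has always reported."""
    return os.stat(path).st_ctime
```

Callers that care about the meaning use the explicit names and handle None; callers that only want the historical behaviour keep using ctime.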


From martin at  Sun Feb  5 14:03:28 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Feb 2006 14:03:28 +0100
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <ds418g$qta$>
References: <030001c629c2$30345c90$bf03030a@trilan>	<>
Message-ID: <>

Terry Reedy wrote:
>>Note that this is the opposite of normal Python policy: Python does not
>>attempt to create cross-platform abstractions, but instead chooses to
>>expose platform differences.
> I had the opposite impression about Python -- that it generally masks such 
> differences.

I think it is both ways. For counter-examples, consider GUIs: Python
does *not* attempt to provide a cross-platform GUI library (Tk tries
that, but that is a different story). It also exposes os.lstat on
systems that provide it, but doesn't try to emulate it on systems
which don't. Likewise, there is a module linuxaudiodev which is only
useful on some systems, and winsound, which is only useful on others.

So first of all, Python exposes the platform API as-is, and doesn't
try to "correct" things that it thinks the system got "wrong", or
forgot to implement. On top of that, you have layers which try to
mask differences, e.g. the os module or the subprocess module.
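
In code, exposing the platform as-is usually means the caller does feature detection. A small sketch, where beep and HAVE_FORK are hypothetical names and os.fork is my own choice of an API that only some platforms provide:

```python
import os

try:
    import winsound  # only present on Windows builds
except ImportError:
    winsound = None


def beep():
    """Hypothetical helper: beep where the platform module exists."""
    if winsound is not None:
        winsound.MessageBeep()
        return True
    return False


# Feature detection instead of emulation: the attribute is simply
# absent on platforms that do not offer the call.
HAVE_FORK = hasattr(os, "fork")
```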


From martin at  Sun Feb  5 14:09:22 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Feb 2006 14:09:22 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
In-Reply-To: <>
References: <>	<><>	<><><><><>	<>	<ds3vg6$nd5$>
Message-ID: <>

Nick Coghlan wrote:
> I guess my point is that expressions are appropriate sometimes, functions are 
> appropriate other times, and it *is* possible to give reasonably simple 
> guidelines as to which one is most appropriate when (one consumer->deferred 
> expression, multiple consumers->named function).

I don't think this guideline is really valuable. If you transfer this to
variables, you would get "one reader -> inline expression, multiple
readers -> named variable".

This is clearly wrong: it is established practice to use local variables
even if there is only one access to the variable, if creating the
variable improves readability of the code (e.g. if the expression is
very complex).

For functions, the same should hold: if it improves readability, make it
a local function.
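
As an illustration (the function names here are made up), a sort key with a single consumer can still read better as a named local function than as an inline lambda:

```python
def longest_words(text, n=3):
    # A named local function documents the sort key, even though
    # it has exactly one consumer.
    def by_length_then_alpha(word):
        return (-len(word), word.lower())

    return sorted(set(text.split()), key=by_length_then_alpha)[:n]
```

The inline equivalent, key=lambda w: (-len(w), w.lower()), behaves the same; the choice is purely about readability.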


From martin at  Sun Feb  5 14:12:10 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Feb 2006 14:12:10 +0100
Subject: [Python-Dev] ctypes patch (was:  (libffi) Re: Copyright issue)
In-Reply-To: <>
References: <>
Message-ID: <>

Hye-Shik Chang wrote:
> Thomas and I collaborated on integration into the ctypes repository
> and testing on various platforms yesterday.  My patches for Python
> are derived from ctypes CVS with a change of only one line.

Not sure whether you think you need further approval: if you are ready
to check this into the Python trunk, just go ahead. As I said, I would
prefer if what is checked in is a literal copy of the ctypes CVS (as
far as reasonable).


From rasky at  Sun Feb  5 15:17:42 2006
From: rasky at (Giovanni Bajo)
Date: Sun, 5 Feb 2006 15:17:42 +0100 (CET)
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, February 5, 2006 13:57, "Martin v. Löwis" wrote:

> I think the path module should provide these under a different name:
> creation_time and status_change_time. Either of these might be absent.

+1. This is exactly what I proposed, in fact.

> ctime should be provided to report whatever ctime used to report in
> the past (i.e. creation_time on Windows, status_change_time on Unix).

As I stated in my mail, I don't agree that there needs to be such a strict
compatibility between methods in the new Path class and functions in the
old os.path (or other) modules. Some consistency will ease the transition
of course, but there is absolutely no need to provide a 1:1 mapping. Old
code will continue to work, and new code might adapt to a new (possibly)
better API. Given the confusion with 'ctime', I don't think that providing
it in the new Path class would be a good move. It's better to force people
to explicitly name what they're asking for (either creation_time or
status_change_time).

In other words, if there are mistakes in the old API, this is the time to
fix them. Why should we carry them over to a new API?

Giovanni Bajo

From bokr at  Sun Feb  5 16:57:54 2006
From: bokr at (Bengt Richter)
Date: Sun, 05 Feb 2006 15:57:54 GMT
Subject: [Python-Dev] Octal literals
References: <1138797216.6791.38.camel@localhost.localdomain>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On Sat, 04 Feb 2006 11:11:08 +0100, "Martin v. Löwis" <martin at> wrote:

>Bengt Richter wrote:
>>>The typical way of processing incoming ints in C is through
>>>PyArg_ParseTuple, which already has the code to coerce long->int
>>>(which in turn may raise an exception for a range violation).
>>>So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
>> Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
>I didn't say that 'k' takes no significant time for longs vs ints. In
>fact, I did not make any performance claims. I don't know what the
>relative performance is.
Sorry, I apologize for putting words in your mouth.
>> Well, I was visualizing having a homogeneous bunch of bit mask
>> definitions all as int type if they could fit. I can't express
>> them all in hex as literals without some processing. That got me started ;-)
>I still can't see *why* you want to do that. Just write them as
>hex literals the way you expect it to work, and it typically will
>work just fine. Some of these literals are longs, some are ints,
>but there is no need to worry about this. It will all work just
>fine.
Perhaps it's mostly aesthetics.
Imagine that I was a tile-setter and my supplier had an order form where I could
order square glazed tiles in various colors with dimensions in multiples of 4cm,
and I said that I was very happy with the product, except why does the supplier
have to send stretchable plastic tiles whenever I order the 32cm size, when I know
they can be made like the others? (Granted that the plastic works just fine for most uses ;-).

I have to admit the price for supplies is unbeatable, and that the necessary kit for
converting 32cm plastic to ceramic was also supplied, but still, if one can order ceramic
at all, why not the full range? Especially since if one orders the 32cm size in another dialect
one can get it without having to use the conversion kit, e.g.,
 >>> -2147483648
 -2147483648
 >>> -0x80000000
 -2147483648L
 >>> int(-0x80000000)
 -2147483648
That minus seems to bind differently in different literal dialects,
e.g. to make the point clearer, compare with above:
 >>> -2147483648
 -2147483648
 >>> -(2147483648)
 -2147483648L
>> BTW, is long mandatory for all implementations? Is there a doc that
>> defines minimum features for a conforming Python implementation?
>The Python language reference is typically considered as a specification
>of what Python is. There is no "minimal Python" specification: you have
>to do all of it.
Good to know, thanks. Sorry to go OT. If someone wants to add something about supersetting
and pypy's facilitation of same, I guess that belongs in another thread ;-)

Bengt Richter

From martin at  Sun Feb  5 18:04:06 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Feb 2006 18:04:06 +0100
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <>
References: <>
Message-ID: <>

Giovanni Bajo wrote:
>>ctime should be provided to report whatever ctime used to report in
>>the past (i.e. creation_time on Windows, status_change_time on Unix).
> In other words, if there are mistakes in the old API, this is the time to
> fix them. Why should we carry them over to a new API?

I'm not talking about all API in general, I'm talking about ctime
specifically. People will ask "where is ctime?", and it better be
where they think it should be. I don't see a point in confusing

(Plus, there might be systems that associate yet a different meaning
with ctime - it is just our guess that it means "status change time").


From aleaxit at  Sun Feb  5 18:31:48 2006
From: aleaxit at (Alex Martelli)
Date: Sun, 5 Feb 2006 09:31:48 -0800
Subject: [Python-Dev] math.areclose ...?
Message-ID: <>

When teaching some programming to total newbies, a common frustration  
is how to explain why a==b is False when a and b are floats computed  
by different routes which ``should'' give the same results (if  
arithmetic had infinite precision).  Decimals can help, but another  
approach I've found useful is embodied in Numeric.allclose(a,b) --  
which returns True if all items of the arrays are ``close'' (equal to  
within certain absolute and relative tolerances):

 >>> (1.0/3.0)==(0.1/0.3)
 False
 >>> Numeric.allclose(1.0/3.0, 0.1/0.3)
 1

But pulling in the whole of Numeric just to have that one handy  
function is often overkill.  So I was wondering if module math (and  
perhaps by symmetry module cmath, too) shouldn't grow a function  
'areclose' (calling it just 'close' seems likely to engender  
confusion, since 'close' is more often used as a verb than as an  
adjective; maybe some other name would work better, e.g.  
'almost_equal') taking two float arguments and optional tolerances  
and using roughly the same specs as Numeric, e.g.:

def areclose(x,y,rtol=1.e-5,atol=1.e-8):
     return abs(x-y)<atol+rtol*abs(y)

What do y'all think...?
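
For illustration, here is the proposed function applied to the example above. The name and defaults are the ones from the proposal; nothing here is an existing math-module API, and print() is used as a function so the snippet runs on current interpreters:

```python
def areclose(x, y, rtol=1.e-5, atol=1.e-8):
    # True when x lies within atol + rtol*|y| of y.
    # Note the test is asymmetric: only y scales the relative term.
    return abs(x - y) < atol + rtol * abs(y)


a = 1.0 / 3.0
b = 0.1 / 0.3
print(a == b)          # False: the two routes round differently
print(areclose(a, b))  # True: they agree well within the tolerances
```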


From jcarlson at  Sun Feb  5 18:38:35 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 05 Feb 2006 09:38:35 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
Message-ID: <>

bokr at (Bengt Richter) wrote:
> Martin v. Lowis <martin at> wrote:
> >Bengt Richter wrote:
> >>>The typical way of processing incoming ints in C is through
> >>>PyArg_ParseTuple, which already has the code to coerce long->int
> >>>(which in turn may raise an exception for a range violation).
> >>>
> >>>So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
> >> 
> >> Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
> >
> >I didn't say that 'k' takes no significant time for longs vs ints. In
> >fact, I did not make any performance claims. I don't know what the
> >relative performance is.
> Sorry, I apologize for putting words in your mouth.

In regards to the aesthetics and/or inconsistencies of:
 >>> -0x80000000
 -2147483648L
 >>> -2147483648
 -2147483648
 >>> -(2147483648)
 -2147483648L

1. If your Python code distinguishes between ints and longs, it has a
bug.

2. If your C extension to Python isn't using the 'k' format specifier as
Martin is telling you to, then your C extension has a bug.

3. If you are concerned about *potential* performance degradation due to
a use of 'k' rather than 'i' or 'I', then you've forgotten the fact that
Python function calling is orders of magnitude slower than the minimal
bit twiddling that PyInt_AsUnsignedLongMask() or
PyLong_AsUnsignedLongMask() has to do.

Please, just use 'k' and let the list get past this.

 - Josiah

From guido at  Sun Feb  5 18:43:28 2006
From: guido at (Guido van Rossum)
Date: Sun, 5 Feb 2006 09:43:28 -0800
Subject: [Python-Dev] Let's just *keep* lambda
Message-ID: <>

After so many attempts to come up with an alternative for lambda,
perhaps we should admit defeat. I've not had the time to follow the
most recent rounds, but I propose that we keep lambda, so as to stop
wasting everybody's talent and time on an impossible quest.

--Guido van Rossum (home page:

From aahz at  Sun Feb  5 19:01:24 2006
From: aahz at (Aahz)
Date: Sun, 5 Feb 2006 10:01:24 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Feb 05, 2006, Alex Martelli wrote:
> But pulling in the whole of Numeric just to have that one handy  
> function is often overkill.  So I was wondering if module math (and  
> perhaps by symmetry module cmath, too) shouldn't grow a function  
> 'areclose' (calling it just 'close' seems likely to engender  
> confusion, since 'close' is more often used as a verb than as an  
> adjective; maybe some other name would work better, e.g.  
> 'almost_equal') taking two float arguments and optional tolerances  
> and using roughly the same specs as Numeric, e.g.:
> def areclose(x,y,rtol=1.e-5,atol=1.e-8):
>      return abs(x-y)<atol+rtol*abs(y)

Looks interesting.  I don't quite understand what atol/rtol are, though.
You're right that another name would be better; almost_equal is fine.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From g.brandl at  Sun Feb  5 19:02:34 2006
From: g.brandl at (Georg Brandl)
Date: Sun, 05 Feb 2006 19:02:34 +0100
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <>
References: <>
Message-ID: <ds5ejq$kff$>

Alex Martelli wrote:
> When teaching some programming to total newbies, a common frustration  
> is how to explain why a==b is False when a and b are floats computed  
> by different routes which ``should'' give the same results (if  
> arithmetic had infinite precision).  Decimals can help, but another  
> approach I've found useful is embodied in Numeric.allclose(a,b) --  
> which returns True if all items of the arrays are ``close'' (equal to  
> within certain absolute and relative tolerances):
>  >>> (1.0/3.0)==(0.1/0.3)
> False
>  >>> Numeric.allclose(1.0/3.0, 0.1/0.3)
> 1
> But pulling in the whole of Numeric just to have that one handy  
> function is often overkill.  So I was wondering if module math (and  
> perhaps by symmetry module cmath, too) shouldn't grow a function  
> 'areclose' (calling it just 'close' seems likely to engender  
> confusion, since 'close' is more often used as a verb than as an  
> adjective; maybe some other name would work better, e.g.  
> 'almost_equal') taking two float arguments and optional tolerances  
> and using roughly the same specs as Numeric, e.g.:
> def areclose(x,y,rtol=1.e-5,atol=1.e-8):
>      return abs(x-y)<atol+rtol*abs(y)
> What do y'all think...?

atol sounds suspicious to me, but otherwise fine.


From tjreedy at  Sun Feb  5 19:11:29 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 5 Feb 2006 13:11:29 -0500
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<><>	<><><><><>	<><ds3vg6$nd5$>
Message-ID: <ds5fic$o3i$>

"Nick Coghlan" <ncoghlan at> wrote in message 
news:43E5A49A.5050907 at
> I guess I misstated myself slightly - I've previously advocated re-using 
> the
> 'def' keyword, so there are obviously parallels I want to emphasize.

If 3.0 comes with a conversion program, then I would like to see 'lambda' 
replaced with either 'def' or another abbreviation like 'edef' (expression 
def) or 'func'. 

From bokr at  Sun Feb  5 19:45:52 2006
From: bokr at (Bengt Richter)
Date: Sun, 05 Feb 2006 18:45:52 GMT
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<>	<>
	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On Sat, 4 Feb 2006 20:17:15 +0100, Eric Nieuwland <eric.nieuwland at> wrote:

>Nick Coghlan wrote:
>>> I believe that usage of a keyword with the name of a Greek letter also
>>> contributes to people considering something broken.
>> Aye, I agree there are serious problems with the current syntax. All 
>> I'm
>> trying to say above is that I don't believe the functionality itself 
>> is broken.
>Lambda is not broken, it's restricted to a single calculation and
>therefore of limited use.
It's not even that restricted, if you want to be perverse, e.g.,

 >>> (lambda w:eval(compile("""if 1: # indented looks nicer ;-)
 ...     if len(w)<=3: adj ='short'
 ...     elif len(w)<=5: adj ='medium length'
 ...     else: adj = 'long'
 ...     print 'Hi, %s! I would say you have a %s name ;-)'%(w,adj)
 ... """,'','exec')))('Monty')
 Hi, Monty! I would say you have a medium length name ;-)

lazy copy/pasting and changing the arg:

 >>> (lambda w:eval(compile("""if 1: # indented looks nicer ;-)
 ...     if len(w)<=3: adj ='short'
 ...     elif len(w)<=5: adj ='medium length'
 ...     else: adj = 'long'
 ...     print 'Hi, %s! I would say you have a %s name ;-)'%(w,adj)
 ... """,'','exec')))('Ada')
 Hi, Ada! I would say you have a short name ;-)

My point is that ISTM preventing easy inclusion of suites in lambda/anonymous_def
is more of a morality/taste/catechistic issue than a technical one. It seems like
an attempt to control coding style by disincentivizing the disapproved.
That may be ok in the big picture, I'm not sure, but IMO transparency of motivations is best.

>Although I wasn't too serious (should have added more signs of that), an
>anonymous 'def' would allow to use the full power of method definition.
It's already allowed, just not in a way that generates efficient code (although
the above can be improved upon, let's not go there ;-)

>> At last count, Guido's stated preference was to ditch the functionality
>> entirely for Py3k, so unless he says something to indicate he's 
>> changed his
>> mind, we'll simply need to continue with proposing functions like
>> methodcaller() as workarounds for its absence...
>Yep, we'll just have to learn to live without it. :-( / ;-)
If it's needed, I believe a way will be found to have it ;-)
I do think the current lambda serves a valuable purpose, so I
hope some way is found to preserve the functionality, whatever
problems anyone may have with its current simple syntax.
Psst, Nick, how about
    (x*y for x,y in ()) ? # "()" as mnemonic for call args

Bengt Richter

From python at  Sun Feb  5 19:48:51 2006
From: python at (Raymond Hettinger)
Date: Sun, 5 Feb 2006 13:48:51 -0500
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <001601c62a84$cef91d80$b83efea9@RaymondLaptop1>

>>So I was wondering if module math (and
>> perhaps by symmetry module cmath, too) shouldn't grow a function
>> 'areclose' (calling it just 'close' seems likely to engender
>> confusion, since 'close' is more often used as a verb than as an
>> adjective; maybe some other name would work better, e.g.
>> 'almost_equal') taking two float arguments and optional tolerances
>> and using roughly the same specs as Numeric, e.g.:
>> def areclose(x,y,rtol=1.e-5,atol=1.e-8):
>>      return abs(x-y)<atol+rtol*abs(y)

IMO, the cure is worse than the disease.  It is easier to learn about
the hazards of floating point equality testing than to think through the
implications of tolerance testing (such as loss of transitivity) and 
how to set the right tolerance values for a given application (ones that
give the right results across the entire domain of expected inputs).

The areclose() function can be a dangerous crutch that temporarily
glosses over the issue.  Without some numerical sophistication, it would not
be hard to create programs that look correct and pass a few tests but, in fact,
contain nasty bugs (non-termination, incorrect acceptance/rejection, etc).
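
The loss of transitivity mentioned above is easy to demonstrate with the function as proposed. This is a small sketch; the three sample values are picked by hand to straddle the default relative tolerance:

```python
def areclose(x, y, rtol=1.e-5, atol=1.e-8):
    return abs(x - y) < atol + rtol * abs(y)


a, b, c = 1.0, 1.000009, 1.000018

print(areclose(a, b))  # True:  a and b differ by about 0.9e-5
print(areclose(b, c))  # True:  b and c differ by about 0.9e-5
print(areclose(a, c))  # False: a and c differ by about 1.8e-5
```

So "a is close to b" and "b is close to c" do not together imply "a is close to c", which matters for any code that sorts, groups or deduplicates by closeness.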

This proposal is one of several that have recently surfaced that aim to help
newbies skip learning basic lessons.  I think the efforts are noble but
misguided:

* If someone doesn't get why set(1,2,3) raises an exception, it is a good
   opportunity to teach a broadly applicable skill:

       def Set(*args): return set(args)

* If someone doesn't get why sum([0.1]*10)!=1.0, then we have a good
   opportunity to teach the basics of floating point.  Otherwise, we're going
   to get people writing accounting apps using floats instead of ints or
   Decimals.

* If someone doesn't get how to empty a list using a[:]=[], it is a good
   opportunity to go through the basics of slicing which are a foundation for
   many parts of the language.

A language suitable for beginners should be easy to learn, but it should not
leave them permanently crippled.  All of the above are sets of training
wheels that don't come off.  To misquote Einstein:  The language should be as
simple as possible, but no simpler.
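
The three lessons listed above fit in a few lines. This is only a sketch, written with print() as a function so it runs on current interpreters:

```python
# set() takes one iterable, not separate elements.
try:
    set(1, 2, 3)
except TypeError as exc:
    print("TypeError:", exc)


def Set(*args):
    # *args collects the positional arguments into a tuple.
    return set(args)


print(Set(1, 2, 3) == {1, 2, 3})   # True

# Ten additions of 0.1 accumulate binary rounding error.
print(sum([0.1] * 10) == 1.0)      # False

# Slice assignment empties the list in place, so every alias sees it.
a = [1, 2, 3]
alias = a
a[:] = []
print(alias)                       # []
```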


From Scott.Daniels at Acm.Org  Sun Feb  5 19:53:30 2006
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sun, 05 Feb 2006 10:53:30 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <ds5ejq$kff$>
References: <>
Message-ID: <ds5him$up2$>

Georg Brandl wrote:
> Alex Martelli wrote:
>>So I was wondering if module math (and perhaps by symmetry module cmath, 
>> too) shouldn't grow a function 'areclose' ...maybe ... 'almost_equal')
>> def areclose(x, y, rtol=1.e-5, atol=1.e-8):
>>      return abs(x-y)<atol+rtol*abs(y)
> atol sounds suspicious to me, but otherwise fine.

"almost_equal", "closeto", or some variant of "near" (no nasty verb to
worry about) would do for me.  atol / rtol would be better as either
abs_tol / rel_tol or even absolute_tolerance / relative_tolerance.

As to the equation itself, wouldn't a symmetric version be somewhat
better?

     def nearby(x, y, rel_tol=1.e-5, abs_tol=1.e-8):
         return abs(x - y) < abs_tol + rel_tol * (abs(x) + abs(y))

This avoids areclose(0, 1e-8) != areclose(1e-8, 0), for example.
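
Putting the two definitions side by side makes the asymmetry concrete. This sketch uses the formulas exactly as posted in this thread, with print() as a function so it runs on current interpreters:

```python
def areclose(x, y, rtol=1.e-5, atol=1.e-8):
    # Original proposal: only y scales the relative term.
    return abs(x - y) < atol + rtol * abs(y)


def nearby(x, y, rel_tol=1.e-5, abs_tol=1.e-8):
    # Symmetric variant: both arguments scale the relative term.
    return abs(x - y) < abs_tol + rel_tol * (abs(x) + abs(y))


print(areclose(0, 1e-8), areclose(1e-8, 0))  # True False
print(nearby(0, 1e-8), nearby(1e-8, 0))      # True True
```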

--Scott David Daniels
Scott.Daniels at Acm.Org

From gherron at  Sun Feb  5 19:35:23 2006
From: gherron at (Gary Herron)
Date: Sun, 05 Feb 2006 10:35:23 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

>After so many attempts to come up with an alternative for lambda,
>perhaps we should admit defeat. I've not had the time to follow the
>most recent rounds, but I propose that we keep lambda, so as to stop
>wasting everybody's talent and time on an impossible quest.
>--Guido van Rossum (home page:
Hear hear!


Gary Herron

From tjreedy at  Sun Feb  5 20:01:58 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 5 Feb 2006 14:01:58 -0500
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <ds5i3r$um$>

"Guido van Rossum" <guido at> wrote in message 
news:ca471dc20602050943q5bad4d1ehadd9d3b653d8b4fb at
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

To me, there are two separate issues: the keyword and the syntax.  I also 
have not been impressed by any of the numerous alternative syntaxes 
proposed over several years and just this morning was thinking something 
similar to the above.

But will you consider changing the keyword from the charged and overladen 
'lambda' to something else?  (See other post today.)  I think this would 
cut at least half the fuss.

I base this on the following observation: generator expressions are to 
generator statement definitions much like function expressions are to 
function statement definitions.  Both work when the payload yielded or 
returned is computed in a single expression.  But I personally have not 
seen any complaints about the 'limitations of generator expressions' nor 
proposals to duplicate the generality of statement definitions by stuffing 
compound statement bodies within expressions.

But if we had called them generator lambdas, I suspect we would have.
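
The parallel can be spelled out in code. The function names below are illustrative only:

```python
# Statement forms: the body can be an arbitrary suite.
def squares_gen(n):
    for i in range(n):
        yield i * i


def add_def(x, y):
    return x + y


# Expression forms: a single expression supplies the payload
# that is yielded or returned.
squares_expr = (i * i for i in range(5))
add_lambda = lambda x, y: x + y
```

In both pairs the expression form covers the single-expression case and the statement form covers everything else.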

Terry Jan Reedy

From fredrik at  Sun Feb  5 20:02:39 2006
From: fredrik at (Fredrik Lundh)
Date: Sun, 5 Feb 2006 20:02:39 +0100
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<><>	<><><><><>	<><ds3vg6$nd5$><>
Message-ID: <ds5i4g$10u$>

Terry Reedy wrote:

> If 3.0 comes with a conversion program, then I would like to see 'lambda'
> replaced with either 'def' or another abbreviation like 'edef' (expression
> def) or 'func'.

making the implied return statement visible might also be a good idea, e.g.

    lambda x, y: return x + y

or even

    def (x, y): return x + y


From bob at  Sun Feb  5 20:16:19 2006
From: bob at (Bob Ippolito)
Date: Sun, 5 Feb 2006 11:16:19 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <001601c62a84$cef91d80$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On Feb 5, 2006, at 10:48 AM, Raymond Hettinger wrote:

>>> So I was wondering if module math (and
>>> perhaps by symmetry module cmath, too) shouldn't grow a function
>>> 'areclose' (calling it just 'close' seems likely to engender
>>> confusion, since 'close' is more often used as a verb than as an
>>> adjective; maybe some other name would work better, e.g.
>>> 'almost_equal') taking two float arguments and optional tolerances
>>> and using roughly the same specs as Numeric, e.g.:
>>> def areclose(x,y,rtol=1.e-5,atol=1.e-8):
>>>      return abs(x-y)<atol+rtol*abs(y)
> IMO, the cure is worse than the disease.  It is easier to learn about
> the hazards of floating point equality testing than to think  
> through the
> implications of tolerance testing (such as loss of transitivity) and
> learning
> how to set the right tolerance values for a given application (ones  
> that
> give the right results across the entire domain of expected inputs).
> The areclose() function can be a dangerous crutch that temporarily
> glosses over the issue.  Without some numerical sophistication, it  
> would not
> be hard to create programs that look correct and pass a few tests but,
> in fact,
> contain nasty bugs (non-termination, incorrect acceptance/ 
> rejection, etc).

For those of us that already know what we're doing with floating  
point, areclose would be very convenient to have.  Especially for  
unit testing.  I could definitely throw away a bunch of ugly code  
that uses less correct arbitrary tolerance guesses if it were around.


From python at  Sun Feb  5 20:31:42 2006
From: python at (Raymond Hettinger)
Date: Sun, 5 Feb 2006 14:31:42 -0500
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <006001c62a8a$cb558230$b83efea9@RaymondLaptop1>

[Bob Ippolito]
> For those of us that already know what we're doing with floating  
> point, areclose would be very convenient to have.

Do you agree that the original proposed use (helping newbs ignore floating
point realities) is misguided and error-prone?

Just curious, for your needs, do you want both absolute and relative 
checks combined into the same function?

>  Especially for  
> unit testing.  I could definitely throw away a bunch of ugly code  
> that uses less correct arbitrary tolerance guesses if it were around.

The unittest module already has assertAlmostEqual().  Does that
method meet your needs or does it need to be improved in some way?


From bob at  Sun Feb  5 20:46:25 2006
From: bob at (Bob Ippolito)
Date: Sun, 5 Feb 2006 11:46:25 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <006001c62a8a$cb558230$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On Feb 5, 2006, at 11:31 AM, Raymond Hettinger wrote:

> [Bob Ippolito]
>> For those of us that already know what we're doing with floating   
>> point, areclose would be very convenient to have.
> Do you agree that the original proposed use (helping newbs ignore  
> floating
> point realities) is misguided and error-prone?

Maybe it's a bit misguided, but it's less error-prone than more naive  
comparisons.  It could delay the necessity for a newer programmer to  
learn all about floating point, but maybe most of those users don't
really need to learn it.

Whether the function is there or not, this is really a documentation  
issue.  If the function is there then maybe it could highly suggest  
reading some "floating point in Python" guide that would describe the  
scenario, then lists common pitfalls with patterns that avoid those  

> Just curious, for your needs, do you want both absolute and  
> relative checks combined into the same function?

Having both makes it less likely that you'll need to tweak the  
constants, except of course if you're working with very small numbers  
such that the absolute tolerance is too big.  Of course, if you only  
want one or the other in a given case, you can always pass in 0  

For my needs, the proposed function and default tolerances would be  
better than the sloppy stuff that usually ends up in my tests.

>>  Especially for  unit testing.  I could definitely throw away a  
>> bunch of ugly code  that uses less correct arbitrary tolerance  
>> guesses if it were around.
> The unittest module already has assertAlmostEqual().  Does that
> method meet your needs or does it need to be improved in some way?

I generally write tests that don't run directly under the unittest  
framework, such as doctests or assert-based functions for nose or  
py.test.  The unittest module does not expose assertAlmostEqual as a  
function so it's of little use for me.
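
A stand-alone helper along these lines is short to write. This is a hypothetical sketch that borrows the rounding idea behind assertAlmostEqual (the difference must round to zero at a given number of decimal places); it is not unittest's own code:

```python
def assert_almost_equal(first, second, places=7):
    """Hypothetical helper usable from doctests or plain assert-style tests."""
    if round(abs(first - second), places) != 0:
        raise AssertionError(
            "%r != %r within %d places" % (first, second, places))


def test_thirds():
    # Passes: the two values differ only in the last bits.
    assert_almost_equal(1.0 / 3.0, 0.1 / 0.3)
```

Note that this is an absolute check only, so it is a poor fit for values that are very small or very large.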


From bokr at  Sun Feb  5 21:28:30 2006
From: bokr at (Bengt Richter)
Date: Sun, 05 Feb 2006 20:28:30 GMT
Subject: [Python-Dev] Octal literals
References: <> <>
Message-ID: <>

On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson <jcarlson at> wrote:

>bokr at (Bengt Richter) wrote:
>> Martin v. Lowis <martin at> wrote:
>> >Bengt Richter wrote:
>> >>>The typical way of processing incoming ints in C is through
>> >>>PyArg_ParseTuple, which already has the code to coerce long->int
>> >>>(which in turn may raise an exception for a range violation).
>> >>>
>> >>>So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
>> >> 
>> >> Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
>> >
>> >I didn't say that 'k' takes no significant time for longs vs ints. In
>> >fact, I did not make any performance claims. I don't know what the
>> >relative performance is.
>> Sorry, I apologize for putting words in your mouth.
>In regards to the aesthetics and/or inconsistencies of:
> >>> -0x80000000
> -2147483648L
> >>> -2147483648
> -2147483648
> >>> -(2147483648)
> -2147483648L
>1. If your Python code distinguishes between ints and longs, it has a
>bug.
Are you just lecturing me personally (in which case off list would be more appropriate),
or do you include the authors of the 17 files I count under <some prefix>/Lib that have
isinstance(<something>, int) in them?
Or would you like to rephrase that with suitable qualifications? ;-)

>2. If your C extension to Python isn't using the 'k' format specifier as
>Martin is telling you to, then your C extension has a bug.
I respect Martin's expert knowledge and manner of communication. He said,
"Just have a look at the 'k' specifier in PyArg_ParseTuple."

Bengt Richter

From fdrake at  Sun Feb  5 21:38:06 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Sun, 5 Feb 2006 15:38:06 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On Sunday 05 February 2006 12:43, Guido van Rossum wrote:
 > After so many attempts to come up with an alternative for lambda,
 > perhaps we should admit defeat. I've not had the time to follow the
 > most recent rounds, but I propose that we keep lambda, so as to stop
 > wasting everybody's talent and time on an impossible quest.



Fred L. Drake, Jr.   <fdrake at>

From bokr at  Sun Feb  5 21:42:19 2006
From: bokr at (Bengt Richter)
Date: Sun, 05 Feb 2006 20:42:19 GMT
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <>

On Sun, 5 Feb 2006 13:48:51 -0500, "Raymond Hettinger" <python at> wrote:
>A language suitable for beginners should be easy to learn, but it should not
>leave them permanently crippled.  All of the above are sets of training
>wheels that don't come off.  To misquote Einstein:  The language should be as
>simple as possible, but no simpler.
++1 QOTW

Bengt Richter

From p.f.moore at  Sun Feb  5 21:43:57 2006
From: p.f.moore at (Paul Moore)
Date: Sun, 5 Feb 2006 20:43:57 +0000
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/5/06, Guido van Rossum <guido at> wrote:
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

+1

The recently suggested keyword change, from lambda to expr (as in
'''expr x, y: x+y''') looks like an improvement to me, but I suspect
opening up the possibility of a keyword change would simply restart
all the discussions... (Nevertheless, I'd be +1 on lambda being
renamed to expr, if it was an option).


From python at  Sun Feb  5 21:49:14 2006
From: python at (Raymond Hettinger)
Date: Sun, 5 Feb 2006 15:49:14 -0500
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <004e01c62a95$a03e9ef0$6701a8c0@RaymondLaptop1>

> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

+1 -- trying to cover all the use cases is a fool's errand


From allison at  Sun Feb  5 22:02:37 2006
From: allison at (Dennis Allison)
Date: Sun, 5 Feb 2006 13:02:37 -0800 (PST)
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
Message-ID: <>

+1 on retaining lambda
-1 on any name change

On Sun, 5 Feb 2006, Paul Moore wrote:

> On 2/5/06, Guido van Rossum <guido at> wrote:
> > After so many attempts to come up with an alternative for lambda,
> > perhaps we should admit defeat. I've not had the time to follow the
> > most recent rounds, but I propose that we keep lambda, so as to stop
> > wasting everybody's talent and time on an impossible quest.
> +1
> The recently suggested keyword change, from lambda to expr (as in
> '''expr x, y: x+y''') looks like an improvement to me, but I suspect
> opening up the possibility of a keyword change would simply restart
> all the discussions... (Nevertheless, I'd be +1 on lambda being
> renamed to expr, if it was an option).
> Paul.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:


From crutcher at  Sun Feb  5 22:09:50 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 5 Feb 2006 13:09:50 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>


On 2/5/06, Guido van Rossum <guido at> wrote:
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.
> --
> --Guido van Rossum (home page:

Crutcher Dunnavant <crutcher at>

From tim.peters at  Sun Feb  5 23:16:38 2006
From: tim.peters at (Tim Peters)
Date: Sun, 5 Feb 2006 17:16:38 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

Huh!  Was someone bad-mouthing lambda again?  We should keep it, but
rename it to honor a different Greek letter.  xi is a good one, easier
to type, and would lay solid groundwork for future flamewars between
xi enthusiasts and Roman numeral fans :-)

From crutcher at  Sun Feb  5 23:26:54 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 5 Feb 2006 14:26:54 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Which reminds me, we need to support roman numeral constants.
A silly implementation follows.

class RomanNumeralDict(dict):
  # Keys that are not stored but look like Roman numerals are decoded instead.
  def __getitem__(self, key):
    if key not in self and self.isRN(key):
      return self.decodeRN(key)
    return dict.__getitem__(self, key)

  def isRN(self, key):
    # Accept only the Roman numeral letters, in either case.
    for c in key:
      if c not in 'MmDdCcLlXxVvIi':
        return False
    return True

  def decodeRN(self, key):
    val = 0
    # ... do stuff ...
    return val
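
A minimal sketch of how the decodeRN stub might be filled in, assuming the usual subtractive rule (IV is 4, XL is 40). The names decode_roman, ROMAN_VALUES and is_roman are hypothetical and not from the thread. The last line passes the mapping as the locals argument of eval, which accepts any mapping in current CPython.

```python
# Hypothetical completion of the stub above; not the code from the thread.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def decode_roman(text):
    """Decode a Roman numeral using the usual subtractive rule."""
    total = 0
    previous = 0
    # Walk right to left: a symbol smaller than its right neighbour is subtracted.
    for symbol in reversed(text.upper()):
        value = ROMAN_VALUES[symbol]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


class RomanNumeralDict(dict):
    def __getitem__(self, key):
        # Stored keys win; otherwise try to read the key as a Roman numeral.
        if key not in self and self.is_roman(key):
            return decode_roman(key)
        return dict.__getitem__(self, key)

    @staticmethod
    def is_roman(key):
        return isinstance(key, str) and key != "" and all(
            c in ROMAN_VALUES for c in key.upper()
        )


names = RomanNumeralDict(answer=42)
print(names["XL"])                  # decoded on the fly
print(names["answer"])              # ordinary stored key
print(eval("XL + II", {}, names))   # names resolved through the mapping
```

Note that any name made up only of those seven letters (for example "mix") is also treated as a numeral, which is part of what makes the idea silly.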

On 2/5/06, Tim Peters <tim.peters at> wrote:
> [Guido]
> > After so many attempts to come up with an alternative for lambda,
> > perhaps we should admit defeat. I've not had the time to follow the
> > most recent rounds, but I propose that we keep lambda, so as to stop
> > wasting everybody's talent and time on an impossible quest.
> Huh!  Was someone bad-mouthing lambda again?  We should keep it, but
> rename it to honor a different Greek letter.  xi is a good one, easier
> to type, and would lay solid groundwork for future flamewars between
> xi enthusiasts and Roman numeral fans :-)

Crutcher Dunnavant <crutcher at>

From tim.peters at  Sun Feb  5 23:35:44 2006
From: tim.peters at (Tim Peters)
Date: Sun, 5 Feb 2006 17:35:44 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

[Crutcher Dunnavant]
> Which reminds me, we need to support roman numeral constants.

One of my more-normal relatives reminded me that this is Super Bowl XL
Sunday, so your demand is more topical than it would ordinarily be. 
Alas, there's already a PEP on this, and it was already rejected.  See

From crutcher at  Sun Feb  5 23:53:20 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 5 Feb 2006 14:53:20 -0800
Subject: [Python-Dev] [PATCH] Fix dictionary subclass semantics whenused
	as global dictionaries
In-Reply-To: <>
References: <>
Message-ID: <>

I've significantly re-worked the patch to permit globals to be
arbitrary mappings.
The regression tests continue to all pass.

On 1/24/06, "Martin v. L?wis" <martin at> wrote:
> Crutcher Dunnavant wrote:
> > Okay, but is there any reason not to include this in 2.5? There
> > doesn't seem to be any noticeable performance impact, and it does add
> > consistency (and opens some really, really cool options up).
> I see no reason, except perhaps the lack of volunteers to actually
> patch the repository (along with the accompanying work).
> Regards,
> Martin

Crutcher Dunnavant <crutcher at>

From python at  Mon Feb  6 00:01:16 2006
From: python at (Raymond Hettinger)
Date: Sun, 5 Feb 2006 18:01:16 -0500
Subject: [Python-Dev] [PATCH] Fix dictionary subclass semantics whenused
	as global dictionaries
References: <>
Message-ID: <002001c62aa8$12439890$6701a8c0@RaymondLaptop1>

You don't have to keep writing notes to python-dev on this patch.
It is assigned to me and when I get a chance to go through it in detail,
it has a good likelihood of going in (if no issues arise).



From bokr at  Mon Feb  6 00:28:55 2006
From: bokr at (Bengt Richter)
Date: Sun, 05 Feb 2006 23:28:55 GMT
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>	<>	<>
	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

On Sun, 05 Feb 2006 18:45:52 GMT, bokr at (Bengt Richter) wrote:
>Psst, Nick, how about
>    (x*y for x,y in ()) ? # "()" as mnemonic for call args
D'oh, sorry, that should have been illegal syntax, e.g.,

     (x*y for x,y in *) ? # "*" as mnemonic for call *args
     (x*y for x,y in *)(3,5) # => 15
     (x*y for x,y in *)(*[3,5]) # => 15

Hm, along that line why not

     (x*y for x,y in **) ? # "**" as mnemonic for call **kwargs
     (x*y for x,y in **)(x=3, y=5) # => 15
or maybe even
     (x*y+z for (x,y),z in *,**)(3, 5, z=200) # => 215

Though I see this is moot, since Guido decided to "keep lambda,"
(+1 on that, although this is kind of growing on me, no doubt from
partial ih-factor ;-)
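
For comparison, a sketch of what the hypothetical forms above correspond to with the existing lambda syntax. The names product and combined are illustrative only, and the keyword-only marker in the second lambda assumes a current Python 3 interpreter.

```python
# Existing-syntax counterparts of the hypothetical forms proposed above.
product = lambda x, y: x * y      # cf. (x*y for x,y in *)
print(product(3, 5))
print(product(*[3, 5]))
print(product(x=3, y=5))          # cf. the ** form

# cf. (x*y+z for (x,y),z in *,**): positional x and y, keyword-only z.
combined = lambda x, y, *, z: x * y + z
print(combined(3, 5, z=200))
```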

Bengt Richter

From eric.nieuwland at  Mon Feb  6 02:19:52 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Mon, 6 Feb 2006 02:19:52 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 feb 2006, at 18:43, Guido van Rossum wrote:

> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.
> --
> --Guido van Rossum (home page:


And let's add "Wise" to the BDFL's title: WBDFL. ;-)


From guido at  Mon Feb  6 03:08:58 2006
From: guido at (Guido van Rossum)
Date: Sun, 5 Feb 2006 18:08:58 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 2/5/06, Bengt Richter <bokr at> wrote:
> On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson <jcarlson at> wrote:
> >1. If your Python code distinguishes between ints and longs, it has a
> >bug.
> Are you just lecturing me personally (in which case off list would be more appropriate),
> or do you include the authors of the 17 files I count under <some prefix>/Lib that have
> isinstance(<something>, int) in them?

Josiah is correct, and those modules all have bugs.

--Guido van Rossum (home page:

From jcarlson at  Mon Feb  6 03:47:13 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 05 Feb 2006 18:47:13 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <>
Message-ID: <>

bokr at (Bengt Richter) wrote:
> Are you just lecturing me personally (in which case off list would be more appropriate),
> or do you include the authors of the 17 files I count under <some prefix>/Lib that have
> isinstance(<something>, int) in them?
> Or would you like to rephrase that with suitable qualifications? ;-)

I did not mean to sound like I was lecturing you personally.

Without taking a peek at the source, I would guess that the various uses
of isinstance(<something>, int) are bugs, possibly replacing previous
uses of type(<something>) is int, shortly after int subclassing was
allowed.  But that's just a guess.

 - Josiah

From smiles at  Mon Feb  6 03:50:43 2006
From: smiles at (Chris or Leslie Smith)
Date: Sun, 5 Feb 2006 20:50:43 -0600
Subject: [Python-Dev] any support for a methodcaller HOF?
References: <>
Message-ID: <003001c62aca$47d7e750$cf2c4fca@csmith>

| making the implied return statement visible might also be a good idea,
| e.g. 
|    lambda x, y: return x + y
| or even
|    def (x, y): return x + y

Although I don't understand the implications of making such a change, the 2nd alternative above looks very nice. Whenever I write a lambda I feel like I am doing something non-pythonic. I think the 2nd proposal increases the readability of the lambda, something that is often touted as being part of what makes python beautiful.


From steven.bethard at  Mon Feb  6 05:44:15 2006
From: steven.bethard at (Steven Bethard)
Date: Sun, 5 Feb 2006 21:44:15 -0700
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

Personally, I'd rather see a callable-from-expression syntax (the
``lambda`` expression) that looks more like our
callable-from-statements syntax (the ``def`` statement), e.g. Nick
Coghlan's def-from syntax::

    (def f(a) + o(b) - o(c) from (a, b, c))

Something like this is more consistent with how list creation is
turned into list comprehensions, how generator functions are turned
into generator expressions and how if/else statements are turned into
conditional expressions.
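
The statement/expression pairs mentioned above can be put side by side. This is only an illustrative sketch in current Python; all function and variable names are made up for the comparison.

```python
# List creation as statements vs. a list comprehension.
squares_stmt = []
for n in range(5):
    squares_stmt.append(n * n)
squares_expr = [n * n for n in range(5)]


# Generator function vs. generator expression.
def gen_squares(limit):
    for n in range(limit):
        yield n * n


squares_gen = (n * n for n in range(5))


# if/else statement vs. conditional expression.
def sign_stmt(x):
    if x < 0:
        return "negative"
    else:
        return "non-negative"


def sign_expr(x):
    return "negative" if x < 0 else "non-negative"


# def statement vs. lambda: the one pair whose two forms look unrelated.
def add_stmt(a, b):
    return a + b


add_expr = lambda a, b: a + b
```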

That said, I firmly believe that syntax decisions *must* be left to
the BDFL.  The decorator syntax and with-statement syntax debates
clearly showed this.  So if after looking at all the syntax
alternatives_, you feel that the current lambda syntax is the best we
can do, I'm willing to accept that decision.

.. _alternatives:

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From bokr at  Mon Feb  6 06:33:57 2006
From: bokr at (Bengt Richter)
Date: Mon, 06 Feb 2006 05:33:57 GMT
Subject: [Python-Dev] Octal literals
References: <> <>
Message-ID: <>

On Sun, 5 Feb 2006 18:08:58 -0800, Guido van Rossum <guido at> wrote:

>On 2/5/06, Bengt Richter <bokr at> wrote:
>> On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson <jcarlson at> wrote:
>> >1. If your Python code distinguishes between ints and longs, it has a
>> >bug.
>> Are you just lecturing me personally (in which case off list would be more appropriate),
>> or do you include the authors of the 17 files I count under <some prefix>/Lib that have
>> isinstance(<something>, int) in them?
>Josiah is correct, and those modules all have bugs.
It seems I stand incontestably corrected. Sorry, both ways ;-/
Perhaps I missed a py3k assumption in this thread (where I see in the PEP that 
"Remove distinction between int and long types" is core item number one)?
I googled, but could not find that isinstance(<something>,int) was slated for deprecation,
so I assumed that Josiah's absolute statement "1. ..." (above) could not be absolutely true, at least
in the "has" (present) tense that he used. Is PEP 237 phase C to be implemented sooner than py3k,
making isinstance(<something>, int) a transparently distinction-hiding alias for
isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int)
be _guaranteed_ to be a bug, thus requiring code change? If so, when?

Bengt Richter

From bokr at  Mon Feb  6 06:55:05 2006
From: bokr at (Bengt Richter)
Date: Mon, 06 Feb 2006 05:55:05 GMT
Subject: [Python-Dev] Octal literals
References: <>
Message-ID: <>

On Sun, 05 Feb 2006 18:47:13 -0800, Josiah Carlson <jcarlson at> wrote:

>bokr at (Bengt Richter) wrote:
>> Are you just lecturing me personally (in which case off list would be more appropriate),
>> or do you include the authors of the 17 files I count under <some prefix>/Lib that have
>> isinstance(<something>, int) in them?
>> Or would you like to rephrase that with suitable qualifications? ;-)
>I did not mean to sound like I was lecturing you personally.
>Without taking a peek at the source, I would guess that the various uses
>of isinstance(<something>, int) are bugs, possibly replacing previous
>uses of type(<something>) is int, shortly after int subclassing was
>allowed.  But that's just a guess.
Thank you. I didn't look either, but I did notice that most (but not all) of them were
under <some prefix>/Lib/test/. Maybe it's excusable for test code ;-)

Bengt Richter

From thomas at  Mon Feb  6 09:05:01 2006
From: thomas at (Thomas Wouters)
Date: Mon, 6 Feb 2006 09:05:01 +0100
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Mon, Feb 06, 2006 at 05:33:57AM +0000, Bengt Richter wrote:

> Perhaps I missed a py3k assumption in this thread (where I see in the PEP
> that "Remove distinction between int and long types" is core item number
> one)?

 -- an ongoing process, not a
Py3K-eventual one.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From abo at  Mon Feb  6 15:09:18 2006
From: abo at (Donovan Baarda)
Date: Mon, 06 Feb 2006 14:09:18 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, 2006-02-03 at 11:56 -0800, Josiah Carlson wrote:
> Donovan Baarda <abo at> wrote:
> > Nuff was a fairy... though I guess it depends on where you draw the
> > line; should [1,2,3] be list(1,2,3)?
> Who is "Nuff"?

fairynuff... :-)

> Along the lines of "not every x line function should be a builtin", "not
> every builtin should have syntax".  I think that sets have particular
> uses, but I don't believe those uses are sufficiently varied enough to
> warrant the creation of a syntax.  I suggest that people take a walk
> through their code. How often do you use other sequence and/or mapping
> types? How many lists, tuples and dicts are there?  How many sets? Ok,
> now how many set literals?

The absence of sets in early Python, the requirement to "import sets"
when they first appeared, and the lack of a set syntax now all mean that
people tend to avoid using sets and resort to lists, tuples, and "dicts
of None" instead, even though they really want a set. Anywhere you see
"if value in sequence:", they probably mean sequence is a set, and this
code would run much faster if it really was, and might even avoid
potential bugs because it would prevent duplicates...
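
A small sketch of the point about membership tests and duplicates. The data is invented, and the timing is only indicative: it depends on the machine, and for very short collections the difference may be negligible.

```python
import timeit

methods_list = ["GET", "POST", "PUT", "GET"]   # a duplicate slips in unnoticed
methods_set = set(methods_list)                # a set cannot hold duplicates

print(len(methods_list), len(methods_set))
print("POST" in methods_list, "POST" in methods_set)   # same answer either way

# Rough timing on a larger collection.
big_list = list(range(10_000))
big_set = set(big_list)
needle = 9_999  # worst case for the list: the last element
list_time = timeit.timeit(lambda: needle in big_list, number=200)
set_time = timeit.timeit(lambda: needle in big_set, number=200)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```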

Donovan Baarda <abo at>

From abo at  Mon Feb  6 15:11:20 2006
From: abo at (Donovan Baarda)
Date: Mon, 06 Feb 2006 14:11:20 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Fri, 2006-02-03 at 20:02 +0100, "Martin v. L?wis" wrote:
> Donovan Baarda wrote:
> > Before set() the standard way to do them was to use dicts with None
> > Values... to me the "{1,2,3}" syntax would have been a logical extension
> > of the "a set is a dict with no values, only keys" mindset. I don't know
> > why it wasn't done this way in the first place, though I missed the
> > arguments where it was rejected.
> There might be many reasons; one obvious reason is that you can't spell
> the empty set that way.

Hmm... how about "{,}", which is the same trick tuples use for the empty
tuple?

Donovan Baarda <abo at>

From bokr at  Mon Feb  6 15:27:12 2006
From: bokr at (Bengt Richter)
Date: Mon, 06 Feb 2006 14:27:12 GMT
Subject: [Python-Dev] Octal literals
References: <> <>
Message-ID: <>

On Mon, 6 Feb 2006 09:05:01 +0100, Thomas Wouters <thomas at> wrote:

>On Mon, Feb 06, 2006 at 05:33:57AM +0000, Bengt Richter wrote:
>> Perhaps I missed a py3k assumption in this thread (where I see in the PEP
>> that "Remove distinction between int and long types" is core item number
>> one)?
> -- an ongoing process, not a
>Py3K-eventual one.
Thanks, I noticed. Hence my question following what you quote:
 Is PEP 237 phase C to be implemented sooner than py3k,
making isinstance(<something>, int) a transparently distinction-hiding alias for
isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int)
be _guaranteed_ to be a bug, thus requiring code change? If so, when?
Sorry that my paragraph-packing habit tends to bury things. I'll have to work on that ;-/

Bengt Richter

From ronaldoussoren at  Mon Feb  6 15:36:06 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Mon, 06 Feb 2006 15:36:06 +0100
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Monday, February 06, 2006, at 03:12PM, Donovan Baarda <abo at> wrote:

>On Fri, 2006-02-03 at 20:02 +0100, "Martin v. L?wis" wrote:
>> Donovan Baarda wrote:
>> > Before set() the standard way to do them was to use dicts with None
>> > Values... to me the "{1,2,3}" syntax would have been a logical extension
>> > of the "a set is a dict with no values, only keys" mindset. I don't know
>> > why it wasn't done this way in the first place, though I missed the
>> > arguments where it was rejected.
>> There might be many reasons; one obvious reason is that you can't spell
>> the empty set that way.
>Hmm... how about "{,}", which is the same trick tuples use for the empty
>tuple?

Isn't () the empty tuple? I guess you're confusing this with a single element tuple: (1,) instead of (1) (well actually it is "1,")

BTW. I don't like your proposal for spelling the empty set as {,} because that is entirely non-obvious. If {1,2,3} were a valid way to spell a set literal, I'd expect {} for the empty set.
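
The tuple spellings discussed above, as a short sketch that can be pasted into an interpreter (variable names are illustrative):

```python
empty = ()            # the empty tuple needs no comma
single = (1,)         # one element: the comma makes the tuple
also_single = 1,      # parentheses are optional here
not_a_tuple = (1)     # just the integer 1 in parentheses

print(type(empty).__name__, len(empty))
print(single == also_single)
print(type(not_a_tuple).__name__)
print(type({}).__name__)   # {} is already taken by the empty dict
```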


>Donovan Baarda <abo at>

From abo at  Mon Feb  6 15:42:31 2006
From: abo at (Donovan Baarda)
Date: Mon, 06 Feb 2006 14:42:31 +0000
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On Mon, 2006-02-06 at 15:36 +0100, Ronald Oussoren wrote:
>  On Monday, February 06, 2006, at 03:12PM, Donovan Baarda <abo at> wrote:
> >On Fri, 2006-02-03 at 20:02 +0100, "Martin v. L?wis" wrote:
> >> Donovan Baarda wrote:
> >> > Before set() the standard way to do them was to use dicts with None
> >> > Values... to me the "{1,2,3}" syntax would have been a logical extension
> >> > of the "a set is a dict with no values, only keys" mindset. I don't know
> >> > why it wasn't done this way in the first place, though I missed the
> >> > arguments where it was rejected.
> >> 
> >> There might be many reasons; one obvious reason is that you can't spell
> >> the empty set that way.
> >
> >Hmm... how about "{,}", which is the same trick tuples use for the empty
> >tuple?
> Isn't () the empty tuple? I guess you're confusing this with a single element tuple: (1,) instead of (1) (well actually it is "1,")

Yeah, sorry.. nasty brainfart...

> BTW. I don't like your proposal for spelling the empty set as {,} because that is entirely non-obvious. If {1,2,3} were a valid way to spell a set literal, I'd expect {} for the empty set.

yeah... the problem is differentiating the empty set from an empty dict.
The only alternative that occurred to me was the not-so-nice and
not-backwards-compatible "{:}" for an empty dict and "{}" for an empty
set.

Donovan Baarda <abo at>

From guido at  Mon Feb  6 18:34:42 2006
From: guido at (Guido van Rossum)
Date: Mon, 6 Feb 2006 09:34:42 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On 2/6/06, Donovan Baarda <abo at> wrote:
> yeah... the problem is differentiating the empty set from an empty dict.
> The only alternative that occurred to me was the not-so-nice and
> not-backwards-compatible "{:}" for an empty dict and "{}" for an empty
> set.

How about spelling the empty set as ``set()''? Wouldn't that solve the
ambiguity and the backwards compatibility nicely?

--Guido van Rossum (home page:

From guido at  Mon Feb  6 18:44:32 2006
From: guido at (Guido van Rossum)
Date: Mon, 6 Feb 2006 09:44:32 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/6/06, Bengt Richter <bokr at> wrote:
>  Is PEP 237 phase C to be implemented sooner than py3k,
> making isinstance(<something>, int) a transparently distinction-hiding alias for
> isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int)
> be _guaranteed_ to be a bug, thus requiring code change? If so, when?

Probably not before Python 3.0. Until then, int and long will be
distinct types for backwards compatibility reasons. But we want as much
code as possible to treat longs the same as ints, hence the party line
that (barring attenuating circumstances :-) isinstance(x, int) is a
bug if the code doesn't also have a similar case for long. If you find
standard library code (in Python *or* C!) that treats int
preferentially, please submit a patch or bug.

What we should do in 3.0 is not entirely clear to me. It would be nice
if there was only a single type (named 'int', of course) with two
run-time representations, one similar to the current int and one
similar to the current long. But that's not so easy, and somewhat
contrary to the philosophy that differences in (C-level)
representation are best distinguished by looking at the type of an
object. The next most likely solution is to make long a subclass of
int, or perhaps to make int an abstract base class with two
subclasses, short and long.

--Guido van Rossum (home page:

From jcarlson at  Mon Feb  6 19:39:38 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 06 Feb 2006 10:39:38 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <>
Message-ID: <>

Donovan Baarda <abo at> wrote:
> On Fri, 2006-02-03 at 11:56 -0800, Josiah Carlson wrote:
> > Along the lines of "not every x line function should be a builtin", "not
> > every builtin should have syntax".  I think that sets have particular
> > uses, but I don't believe those uses are sufficiently varied enough to
> > warrant the creation of a syntax.  I suggest that people take a walk
> > through their code. How often do you use other sequence and/or mapping
> > types? How many lists, tuples and dicts are there?  How many sets? Ok,
> > now how many set literals?
> The absence of sets in early Python, the requirement to "import sets"
> when they first appeared, and the lack of a set syntax now all mean that
> people tend to avoid using sets and resort to lists, tuples, and "dicts
> of None" instead, even though they really want a set. Anywhere you see
> "if value in sequence:", they probably mean sequence is a set, and this
> code would run much faster if it really was, and might even avoid
> potential bugs because it would prevent duplicates...

Maybe they mean set, maybe they don't.  'if obj in seq' is used for
various reasons.

A quick check of the Python standard library shows that some of the uses
of 'if obj in tuple_literal' could certainly be converted into sets, but
that ignores the performance impact of using sets instead of short
tuples (where short, if I remember correctly, is a length of 3, check
the python-dev archives), as well as the module-level constant creation
that occurs with tuples. There was probably a good reason why such a
thing hasn't happened with lists and dicts (according to my Python 2.4
installation), and why it may not happen with sets.

A nontrivial number of other 'if obj in seq' instances actually need
dictionaries, the test is for some sort of data handler or headers with
a particular name.

 - Josiah

From aleaxit at  Mon Feb  6 19:37:42 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 6 Feb 2006 10:37:42 -0800
Subject: [Python-Dev] syntactic support for sets
In-Reply-To: <>
References: <Pine.GSO.4.58.0602011353350.2165@dvp.cs>
Message-ID: <>

On 2/6/06, Guido van Rossum <guido at> wrote:
> On 2/6/06, Donovan Baarda <abo at> wrote:
> > yeah... the problem is differentiating the empty set from an empty dict.
> > The only alternative that occurred to me was the not-so-nice and
> > not-backwards-compatible "{:}" for an empty dict and "{}" for an empty
> > set.
> How about spelling the empty set as ``set()''? Wouldn't that solve the
> ambiguity and the backwards compatibility nicely?

And of course, thanks to the time machine, it has always worked that way:

hesperos:~$ python2.4
Python 2.4.1 (#1, Apr 21 2005, 11:14:17)
[GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> set()
set([])

just like dict(), tuple(), list(), str(), int(), float(), bool(),
complex() -- each type, called without args, returns an instance F of
that type such that "bool(F) is False" holds (meaning len(F)==0 for
container types, F==0 for number types).
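
The rule described above can be checked directly. A minimal sketch; the list name zero_arg_types is illustrative:

```python
zero_arg_types = [set, dict, tuple, list, str, int, float, bool, complex]

for t in zero_arg_types:
    instance = t()   # each type called without arguments
    # Show the type name, the instance it produced, and its truth value.
    print(t.__name__, repr(instance), bool(instance))

# set() is the empty set; {} stays the empty dict.
print(type(set()).__name__, type({}).__name__)
```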


From aleaxit at  Mon Feb  6 20:00:10 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 6 Feb 2006 11:00:10 -0800
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/6/06, Guido van Rossum <guido at> wrote:
> What we should do in 3.0 is not entirely clear to me. It would be nice
> if there was only a single type (named 'int', of course) with two
> run-time representations, one similar to the current int and one
> similar to the current long. But that's not so easy, and somewhat
> contrary to the philosophy that differences in (C-level)
> representation are best distinguished by looking at the type of an
> object. The next most likely solution is to make long a subclass of
> int, or perhaps to make int an abstract base class with two
> subclasses, short and long.

Essentially, you need to decide: does type(x) mostly refer to the
protocol that x respects ("interface" plus semantics and pragmatics),
or to the underlying implementation?  If the latter,  as your
observation about "the philosophy" suggests, then it would NOT be nice
if int was an exception wrt other types.

If int is to be a concrete type, then I'd MUCH rather it didn't get
subclassed, for all sorts of both practical and principled reasons. 
So, to me, the best solution would be the abstract base class with
concrete implementation subclasses.  Besides being usable for
isinstance checks, like basestring, it should also work as a factory
when called, returning an instance of the appropriate concrete
subclass.  AND it would let me have (part of) what I was pining for a
while ago -- an abstract base class that type gmpy.mpz can subclass to
assert "I _am_ an integer type!", so lists will accept mpz instances
as indices, etc etc.

Now consider how nice it would be, on occasion, to be able to operate
on an integer that's guaranteed to be 8, 16, 32, or 64 bits, to
ensure the desired shifting/masking behavior for certain kinds of
low-level programming; and also on one that's unsigned, in each of
these sizes.  Python could have a module offering signed8, unsigned16,
and so forth (all combinations of size and signedness supported by the
underlying C compiler), all subclassing the abstract int, and
guarantee much happiness to people who are, for example, writing a
Python prototype of code that's going to become C or assembly...
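
Until such types exist, fixed-width behaviour is usually emulated by masking. A minimal sketch with hypothetical helper names (to_unsigned, to_signed), assuming two's-complement semantics as on typical C targets:

```python
def to_unsigned(value, bits):
    """Wrap an arbitrary Python integer into the range of an unsigned bits-wide integer."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    """Interpret the low bits of value as a two's-complement signed integer."""
    value = to_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


print(to_unsigned(0xF0 << 1, 8))   # the shifted-out top bit is dropped
print(to_signed(200, 8))           # 200 does not fit in a signed 8-bit value
print(to_unsigned(-1, 16))         # all sixteen bits set
```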

Similarly, it would help a slightly different kind of prototyping a
lot if another Python module could offer 32-bit, 64-bit, 80-bit and
128-bit floating point types (if supported by the underlying C
compiler) -- all subclassing an ABSTRACT 'float'; the concrete
implementation that one gets by calling float or using a float literal
would also subclass it... and so would the decimal type (why not? it's
floating point -- 'float' doesn't mean 'BINARY fp';-).  And I'd be
happy, because gmpy.mpf could also subclass the abstract float!

And then finally we could have an abstract superclass 'number', whose
subclasses are the abstract int and the abstract float (dunno 'bout
complex, I'd be happy either way), and Python's typesystem would
finally start being nice and cleanly organized instead of
grand-prairie-level flat ...!-)
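
Later Python versions ship a standard numbers module with abstract base classes along broadly similar lines. The sketch below assumes such an interpreter; Bits8 is a made-up stand-in for a third-party integer type, and it implements none of the arithmetic a real integer type would need.

```python
import numbers


class Bits8:
    """Hypothetical stand-in for a third-party integer type."""

    def __init__(self, value):
        self.value = value & 0xFF


# Virtual subclassing: declare "I am an integer type" without inheriting.
numbers.Integral.register(Bits8)

print(isinstance(3, numbers.Integral))          # built-in int
print(isinstance(Bits8(300), numbers.Integral)) # registered type
print(isinstance(Bits8(300), numbers.Number))   # Integral sits under Number
print(isinstance(1.5, numbers.Integral))        # floats are Real, not Integral
print(isinstance(1.5, numbers.Real))
```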


From rhamph at  Mon Feb  6 20:39:52 2006
From: rhamph at (Adam Olsen)
Date: Mon, 6 Feb 2006 12:39:52 -0700
Subject: [Python-Dev] Octal literals
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/6/06, Alex Martelli <aleaxit at> wrote:
> Now consider how nice it would be, on occasion, to be able to operate
> on an integer that's guaranteed to be 8, 16, 32, or 64 bits, to
> ensure the desired shifting/masking behavior for certain kinds of
> low-level programming; and also on one that's unsigned, in each of
> these sizes.  Python could have a module offering signed8, unsigned16,
> and so forth (all combinations of size and signedness supported by the
> underlying C compiler), all subclassing the abstract int, and
> guarantee much happiness to people who are, for example, writing a
> Python prototype of code that's going to become C or assembly...

I dearly hope such types do NOT subclass abstract int.  The reason is
that although they can represent an integral value they do not behave
like one.  Approximately half of all possible float values are
integral, but would you want it to subclass abstract int when
possible?  Of course not, the behavior is vastly different, and any
function doing more than just comparing to it would have to convert it
to the true int type before using it.
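
To make the behavioural difference concrete, here is a small sketch of
8-bit signed wraparound.  The helper name to_signed8 is hypothetical and
exists only for this illustration; it is not part of any module
discussed in this thread.

```python
def to_signed8(value):
    """Wrap an int into the signed 8-bit range, two's complement style.

    Hypothetical helper for illustration only.
    """
    value &= 0xFF
    return value - 0x100 if value >= 0x80 else value


# An unbounded integer simply keeps growing.
print(127 + 1)               # 128

# The 8-bit result wraps around, which generic integer code would not expect.
print(to_signed8(127 + 1))   # -128
print(to_signed8(-128 - 1))  # 127
```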

I see little point for more than one integer type.  long behaves
properly like an integer in all cases I can think of, with the long
exception of performance.  And given that python tends to be orders of
magnitude slower than C code there is little desire to trade off
functionality for performance.

That we have two integer types is more of a historical artifact than a
conscious decision.  We may not be willing to trade off functionality
for performance, but once we've already made the tradeoff we're
reluctant to go back.  So it seems the challenge is this: can anybody
patch long to have performance sufficiently close to int for small

Adam Olsen, aka Rhamphoryncus

From smiles at  Mon Feb  6 21:12:03 2006
From: smiles at (Chris or Leslie Smith)
Date: Mon, 6 Feb 2006 14:12:03 -0600
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <001301c62b59$cdfbe900$152c4fca@csmith>

|| def areclose(x,y,rtol=1.e-5,atol=1.e-8):
||      return abs(x-y)<atol+rtol*abs(y)
| Looks interesting.  I don't quite understand what atol/rtol are,
| though. 

Does it help to spell it like this?

def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
 diff = abs(x - y)
 ave = (abs(x) + abs(y))/2
 return diff < absolute_err or diff/ave < relative_err

Also, separating the two terms with 'or' rather than '+' makes the two error terms mean more what they are named. The '+' mixes the two effects and even though the result is basically the same, it makes it difficult to explain when the test will be true.
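
A short comparison of the two spellings, as a sketch only.  The function
names areclose_numeric and areclose_or are labels invented for this
example; the bodies are the two definitions quoted above.

```python
def areclose_numeric(x, y, rtol=1.e-5, atol=1.e-8):
    # Definition quoted at the top of this message ('+' form).
    return abs(x - y) < atol + rtol * abs(y)


def areclose_or(x, y, relative_err=1.e-5, absolute_err=1.e-8):
    # The 'or' form proposed in this message.
    diff = abs(x - y)
    ave = (abs(x) + abs(y)) / 2
    return diff < absolute_err or diff / ave < relative_err


# Both accept a tiny relative difference between large numbers.
print(areclose_numeric(1e10, 1e10 + 1))  # True
print(areclose_or(1e10, 1e10 + 1))       # True

# The '+' form scales by abs(y) only, so argument order can matter.
print(areclose_numeric(100, 101, rtol=0.01, atol=0.0))  # True
print(areclose_numeric(101, 100, rtol=0.01, atol=0.0))  # False

# The 'or' form uses the average magnitude and is symmetric.
print(areclose_or(100, 101, relative_err=0.01, absolute_err=0.0))  # True
print(areclose_or(101, 100, relative_err=0.01, absolute_err=0.0))  # True
```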


From aahz at  Mon Feb  6 21:20:31 2006
From: aahz at (Aahz)
Date: Mon, 6 Feb 2006 12:20:31 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <001301c62b59$cdfbe900$152c4fca@csmith>
References: <>
Message-ID: <>

On Mon, Feb 06, 2006, Chris or Leslie Smith wrote:
> || def areclose(x,y,rtol=1.e-5,atol=1.e-8):
> ||      return abs(x-y)<atol+rtol*abs(y)
> | 
> | Looks interesting.  I don't quite understand what atol/rtol are,
> | though. 
> Does it help to spell it like this?
> def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
>   diff = abs(x - y)
>   ave = (abs(x) + abs(y))/2
>   return diff < absolute_err or diff/ave < relative_err
> Also, separating the two terms with 'or' rather than '+' makes the
> two error terms mean more what they are named. The '+' mixes the two
> effects and even though the result is basically the same, it makes it
> difficult to explain when the test will be true.

Yes, that's a big help.  I was a bit concerned that this would have no
utility for numbers with large magnitude.  Alex, given your focus on
Python readability, I'm a bit surprised you didn't write this to start
with!
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From janssen at  Mon Feb  6 21:55:50 2006
From: janssen at (Bill Janssen)
Date: Mon, 6 Feb 2006 12:55:50 PST
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: Your message of "Sun, 05 Feb 2006 09:43:28 PST."
Message-ID: <06Feb6.125557pst."58633">

> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.


This would remove my strongest objection to the current Python 3000 PEP.

Now, let's improve lambda... :-).


From aleaxit at  Mon Feb  6 22:03:26 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 6 Feb 2006 13:03:26 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/6/06, Aahz <aahz at> wrote:
> > def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
> >   diff = abs(x - y)
> >   ave = (abs(x) + abs(y))/2
> >   return diff < absolute_err or diff/ave < relative_err
> >
> > Also, separating the two terms with 'or' rather than '+' makes the
> > two error terms mean more what they are named. The '+' mixes the two
> > effects and even though the result is basically the same, it makes it
> > difficult to explain when the test will be true.
> Yes, that's a big help.  I was a bit concerned that this would have no
> utility for numbers with large magnitude.  Alex, given your focus on
> Python readability, I'm a bit surprised you didn't write this to start
> with!

As I said, I was just copying the definition in Numeric, which is
well-tried by long use.  Besides, this "clear expression" could
present problems, such as possible overflows or divisions by zero when
ave is 0 or very small; much as I care about readability, I care about
correctness even more.
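
The division-by-zero case can be reproduced with a short sketch.  The
function body is the "clear expression" quoted above; the choice of
absolute_err=0.0 is an assumption made here so that the first test does
not short-circuit before the division.

```python
def areclose(x, y, relative_err=1.e-5, absolute_err=1.e-8):
    # The 'or' spelling quoted above.
    diff = abs(x - y)
    ave = (abs(x) + abs(y)) / 2
    return diff < absolute_err or diff / ave < relative_err


# With the default absolute_err the first test succeeds and nothing divides.
print(areclose(0.0, 0.0))  # True

# With absolute_err=0.0 the code reaches diff/ave, and ave is 0.
try:
    areclose(0.0, 0.0, absolute_err=0.0)
except ZeroDivisionError as exc:
    print("raised:", exc)
```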

Once it comes to readability, I prefer Numeric's choice to call the
two terms "tolerances", rather than (as here) "errors"; maybe that
depends on my roots being in engineering, where an error means a
mistake (like it does in real life), while tolerance's a good and
useful thing to have (ditto), rather than some scientific discipline
where terms carry different nuances.


From thomas at  Mon Feb  6 22:32:10 2006
From: thomas at (Thomas Lotze)
Date: Mon, 06 Feb 2006 22:32:10 +0100
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <>

Steven Bethard wrote:

> Guido van Rossum wrote:
>> After so many attempts to come up with an alternative for lambda,
>> perhaps we should admit defeat. I've not had the time to follow the most
>> recent rounds, but I propose that we keep lambda, so as to stop wasting
>> everybody's talent and time on an impossible quest.

+1 for keeping the functionality, especially given list and generator
expressions being "compound lambda expressions" in a sense. Removing
anonymous functions would break a nice symmetry there.

> .. _alternatives:

Of those, I like the "for" syntax without parens around the arguments
best: (x*y + z for x, y, z). Parentheses around the whole expression
should be optional in the same cases that allow for omitting parentheses
around generator expressions. It fits perfectly with the way generator
expression syntax relates to generator function definitions, and re-using
the "for" keyword keeps the zoo of reserved words small.

Just my 2 cents and all that...


From raymond.hettinger at  Mon Feb  6 22:37:22 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 06 Feb 2006 16:37:22 -0500
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <002701c62b65$844a9750$7f00a8c0@RaymondLaptop1>

[Chris Smith]
> Does it help to spell it like this?
> def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
>     diff = abs(x - y)
>     ave = (abs(x) + abs(y))/2
>     return diff < absolute_err or diff/ave < relative_err

There is a certain beauty and clarity to this presentation; however, it is 
problematic numerically:

* the division by ave can overflow or 
trigger a ZeroDivisionError

* the 'or' part of the expression can introduce an unnecessary discontinuity 
in the first derivative.

The original Numeric definition is likely to be better for people who know 
what they're doing; however, I still question whether it is an appropriate 
remedy for the beginner issue of why 1.1 + 1.1 + 1.1 doesn't equal 3.3.
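
For reference, the beginner issue looks like this.  The sketch uses the
Numeric-style definition quoted earlier in the thread, with its default
tolerances.

```python
def areclose(x, y, rtol=1.e-5, atol=1.e-8):
    # Numeric-style definition quoted earlier in the thread.
    return abs(x - y) < atol + rtol * abs(y)


total = 1.1 + 1.1 + 1.1

# Exact comparison fails because of binary floating point rounding.
print(total == 3.3)          # False

# A tolerance-based comparison accepts the result.
print(areclose(total, 3.3))  # True
```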


From imbaczek at  Mon Feb  6 22:48:31 2006
From: imbaczek at (=?ISO-8859-2?Q?Marek_=22Baczek=22_Baczy=F1ski?=)
Date: Mon, 6 Feb 2006 22:48:31 +0100
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <002701c62b65$844a9750$7f00a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

2006/2/6, Raymond Hettinger <raymond.hettinger at>:
> The original Numeric definition is likely to be better for people who know
> what they're doing; however, I still question whether it is an appropriate
> remedy for the beginner issue of why 1.1 + 1.1 + 1.1 doesn't equal 3.3.

Beginners won't know about math.areclose anyway (and if they do,
they won't use it, thinking "why bother?"), and having a standard,
well-behaved and *correct* version of a useful function can't hurt.

{ Marek Baczyński :: UIN 57114871 :: GG 161671 :: JID imbaczek at  }
{ | imbaczek at poczta fm | }
.. .. .. .. ... ... ...... evolve or face extinction ...... ... ... .. .. .. ..

From rrr at  Tue Feb  7 00:51:29 2006
From: rrr at (Ron Adam)
Date: Mon, 06 Feb 2006 17:51:29 -0600
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <>
References: <>	<001301c62b59$cdfbe900$152c4fca@csmith>	<>
Message-ID: <>

Alex Martelli wrote:
> On 2/6/06, Aahz <aahz at> wrote:
>    ...
>>>def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
>>>  diff = abs(x - y)
>>>  ave = (abs(x) + abs(y))/2
>>>  return diff < absolute_err or diff/ave < relative_err
>>>Also, separating the two terms with 'or' rather than '+' makes the
>>>two error terms mean more what they are named. The '+' mixes the two
>>>effects and even though the result is basically the same, it makes it
>>>difficult to explain when the test will be true.
>>Yes, that's a big help.  I was a bit concerned that this would have no
>>utility for numbers with large magnitude.  Alex, given your focus on
>>Python readability, I'm a bit surprised you didn't write this to start
> As I said, I was just copying the definition in Numeric, which is
> well-tried by long use.  Besides, this "clear expression" could
> present problems, such as possible overflows or divisions by zero when
> ave is 0 or very small; much as I care about readability, I care about
> correctness even more.

It looks like the definition from Numeric measures relative error while 
the above measures relative deviation.  I'm not sure which one would be 
desirable or if they are interchangeable.  I was looking up relative 
error to try and understand the above at the following site.

As far as beginner vs advanced users are concerned I think that is a 
matter of documentation especially when intermediate users are concerned 
which I believe are the majority.

Possibly something like the following would be suitable... ?

   The absolute error is the absolute value of the difference between
   the accepted value and the measurement.

   Absolute error = abs( Observed - Accepted value )

   The relative error is the percentage of absolute error relative to the
   accepted value.

                    Absolute error
   Relative error = -------------- x 100%
                    Accepted value

def isclose(observed, accepted, abs_err, rel_err):
    """Determine if the accuracy of an observed value is close
       to an accepted value"""
    diff = abs(observed - accepted)
    if diff < abs_err:
        return True
    try:
        # rel_err is a percentage, matching the formula above
        return 100 * diff / accepted < rel_err
    except ZeroDivisionError:
        return False

    Ron Adam

From kbk at  Tue Feb  7 03:45:28 2006
From: kbk at (Kurt B. Kaiser)
Date: Mon, 6 Feb 2006 21:45:28 -0500 (EST)
Subject: [Python-Dev] Weekly Python Patch/Bug Summary
Message-ID: <>

Patch / Bug Summary

Patches :  391 open ( +0) /  3038 closed (+10) /  3429 total (+10)
Bugs    :  915 open ( +9) /  5540 closed (+21) /  6455 total (+30)
RFE     :  209 open ( +2) /   197 closed ( +0) /   406 total ( +2)

New / Reopened Patches

difflib exceeding recursion limit  (2006-01-24)
CLOSED  opened by  Gustavo Niemeyer

Patch for bug #1380970  (2006-01-25)  opened by  Collin Winter

Clairify docs on reference stealing  (2006-01-26)  opened by  Collin Winter

optparse enable_interspersed_args disable_interspersed_args  (2006-01-26)  opened by  Rocky Bernstein

Configure patch for Mac OS X 10.3  (2006-01-27)  opened by  Ronald Oussoren

have SimpleHTTPServer return last-modified headers  (2006-01-28)  opened by  Aaron Swartz

Fix "be be" documentation typo in lang ref  (2006-02-01)
CLOSED  opened by  Wummel

Changes to nis module to support multiple NIS domains  (2006-02-02)
CLOSED  opened by  Ben Bell

Patches Closed

difflib exceeding recursion limit  (2006-01-24)  closed by  niemeyer

fix bsddb test associate problems w/bsddb 4.1  (2006-01-16)  closed by  greg

Patch f. bug 495682 cannot handle http_proxy with user:pass@  (2005-11-05)  closed by  loewis

bsddb3 build problems on FreeBSD (2.4 + 2.5)  (2005-02-22)  closed by  greg

Add support for db 4.3  (2004-11-23)  closed by  nnorwitz

zipfile: use correct system type on unixy systems  (2006-01-23)  closed by  loewis

Fill out the functional module  (2006-01-22)  closed by  rhettinger

Fix "be be" documentation typo in lang ref  (2006-02-01)  closed by  effbot

Changes to nis module to support multiple NIS domains  (2006-02-02)  closed by  loewis

anonymous mmap  (2006-01-16)  closed by  nnorwitz

New / Reopened Bugs

Popenhangs with latest Cygwin update  (2006-01-23)
CLOSED  opened by  Eric McRae

Popened file object close hangs in latest Cygwin update  (2006-01-23)  opened by  Eric McRae

zipfile: inserting some filenames produces corrupt .zips  (2006-01-24)  opened by  Grant Olson

UnicodeError in RFC2322 header  (2006-01-25)  opened by  A. Sagawa

Can only install 1 of each version of Python on Windows  (2006-01-25)
CLOSED  opened by  Max M Rasmussen

Underspecified behaviour of string.split/rsplit  (2006-01-25)  opened by  Collin Winter

inconsistency in help(set)  (2006-01-25)  opened by  Gregory Petrosyan

Typo in online documentation - Replacing popen2.*  (2006-01-26)
CLOSED  opened by  Phil Wright

Inconsistency between StringIO and cStringIO  (2006-01-27)  opened by  Michael Kerrin

Problem with SOAPpy on 64-bit systems  (2006-01-27)
CLOSED  opened by  Gustavo J. A. M. Carneiro

SimpleHTTPServer doesn't return last-modified headers  (2006-01-28)  opened by  Aaron Swartz

EditorWindow demo causes attr-error  (2006-01-29)  opened by  snowman

float/atof have become locale aware  (2006-01-29)  opened by  Bernhard Herzog

PyRun_SimpleString won't parse \\x  (2006-01-30)
CLOSED  opened by  gnupun

PyImport_AppendInittab stores pointer to parameter  (2006-01-31)  opened by  coder_5

class dictionary shortcircuits __getattr__  (2006-01-31)  opened by  Shaun Cutts

IMPORT PROBLEM: Local submodule shadows global module  (2006-02-01)  opened by  Jens Engel

[win32] stderr atty encoding not set  (2006-02-01)  opened by  Snaury

http response dictionary incomplete  (2006-02-01)  opened by  Jim Jewett

CVS (not SVN) mentioned in Python FAQ  (2006-02-01)
CLOSED  opened by  Gregory Petrosyan

2.4.1 mentioned in Python FAQ as most stable version  (2006-02-01)
CLOSED  opened by  Gregory Petrosyan

Inconsistency in Programming FAQ  (2006-02-01)  opened by  Gregory Petrosyan

email.MIME*.as_string removes duplicate spaces  (2006-02-02)  opened by  hads

Unicode IOError: execfile(u'\u043a\u043a\u043a/')   (2006-02-02)  opened by  Robert Kiendl

PEP 4 additions  (2006-02-02)  opened by  Jim Jewett

mmap module leaks file descriptors on UNIX  (2006-02-02)
CLOSED  opened by  Fazal Majid

Email tests fail  (2006-02-04)
CLOSED  opened by  Martin v. Löwis

Assert failure in signal handling  (2006-02-04)
CLOSED  opened by  doom

The mmap module does unnecessary dup()  (2006-02-04)
CLOSED  opened by  Keith Dart

The email package needs an "application" type  (2006-02-04)  opened by  Keith Dart

urllib.FancyURLopener.redirect_internal looses data on POST!  (2006-02-04)  opened by  Robert Kiendl

urllib: HTTPS over (Squid) Proxy fails  (2006-02-04)  opened by  Robert Kiendl

patch for etree cdata and attr quoting  (2006-02-04)  opened by  Chris McDonough

os.remove OSError: [Errno 13] Permission denied  (2006-02-06)  opened by  cheops

modified to work with .NET 2005 on win64  (2006-02-06)  opened by  beaudrym

Bugs Closed

Popenhangs with latest Cygwin update  (2006-01-23)  deleted by  sferic

__self - Watcom compiler reserved word  (2006-01-23)  closed by  nnorwitz

bsddb: segfault on db.associate call with Txn and large data  (2006-01-23)  closed by  nnorwitz

Closing dbenv first bsddb doesn't release locks & segfau  (2003-08-13)  closed by  nnorwitz

cannot handle http_proxy with user:pass@  (2001-12-21)  closed by  loewis

BSD DB test failures for BSD DB 4.1  (2005-10-19)  closed by  nnorwitz

2.[345]: --with-wctype-functions 4 test failures  (2004-01-10)  closed by  nnorwitz

posixmodule uses utimes, which is broken in glibc-2.3.2  (2003-08-10)  closed by  nnorwitz

Error: ... ossaudiodev.c, line 48: Missing type specifier  (2005-05-05)  closed by  nnorwitz

Can only install 1 of each version of Python on Windows  (2006-01-25)  closed by  loewis

Typo in online documentation - Replacing popen2.*  (2006-01-26)  closed by  nnorwitz

Problem with SOAPpy on 64-bit systems  (2006-01-27)  closed by  loewis

PyRun_SimpleString won't parse \\x  (2006-01-30)  deleted by  effbot

Registry key CurrentVersion not set  (2003-10-22)  closed by  loewis

CVS (not SVN) mentioned in Python FAQ  (2006-02-01)  closed by  loewis

2.4.1 mentioned in Python FAQ as most stable version  (2006-02-01)  closed by  loewis

urllib2 doesn't do HTTP-EQUIV & Refresh  (2002-10-21)  closed by  jjlee

urllib2 dont respect debuglevel in httplib  (2005-02-27)  closed by  abbatini

TimedRotatingFileHandler midnight rollover time increases  (01/04/06)  closed by  sf-robot

mmap module leaks file descriptors on UNIX  (2006-02-02)  closed by  nnorwitz

Email tests fail  (2006-02-04)  closed by  bwarsaw

Assert failure in signal handling  (2006-02-04)  closed by  nnorwitz

The mmap module does unnecessary dup()  (2006-02-04)  closed by  nnorwitz

r41552 broke test_file on OS X  (2005-12-04)  closed by  nnorwitz

New / Reopened RFE

lib-deprecated  (2006-02-02)  opened by  Jim Jewett

Support for MSVC 7 and MSVC8 in msvccompiler  (2006-02-06)  opened by  dlm

From edgimar at  Sun Feb  5 22:26:48 2006
From: edgimar at (Mark Edgington)
Date: Sun, 05 Feb 2006 22:26:48 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
Message-ID: <>

Does anyone have any comments about applying the following patch to 
asynchat?  It should not affect the behavior of the module in any way 
for those who do not want to use the feature provided by the patch.  The 
point of the patch is to make it easy to use asynchat in a multithreaded 
application.  Maybe I am missing something, and the patch really doesn't 
make it threadsafe?  Any comments would be appreciated.  Also, if it 
looks good to everyone, feel free to use it.

-------BEGIN PATCH----------
---    Fri Oct 15 03:03:16 2004
+++    Sun Feb 05 22:05:42 2006
@@ -59,10 +59,11 @@
     ac_in_buffer_size       = 4096
     ac_out_buffer_size      = 4096
-    def __init__ (self, conn=None):
+    def __init__ (self, conn=None, running_in_thread=False):
         self.ac_in_buffer = ''
         self.ac_out_buffer = ''
         self.producer_fifo = fifo()
+        self.running_in_thread = running_in_thread
         asyncore.dispatcher.__init__ (self, conn)
     def collect_incoming_data(self, data):
@@ -157,7 +158,9 @@
     def push (self, data):
         self.producer_fifo.push (simple_producer (data))
-        self.initiate_send()
+        # only initiate a send if not running in a threaded environment,
+        # since initiate_send() is not threadsafe.
+        if not self.running_in_thread: self.initiate_send()
     def push_with_producer (self, producer):
         self.producer_fifo.push (producer)
-------END PATCH----------


From xavier.morel at  Sun Feb  5 19:43:04 2006
From: xavier.morel at (Morel Xavier)
Date: Sun, 05 Feb 2006 19:43:04 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.
The inline anonymous `def` isn't as ugly/problematic as the block (block 
anonymous def) version, and could probably work better than lambda, I 
think (a bit more verbose, but at least it doesn't feel like a castrated 
function definition, is more coherent with the existing function 
definition syntax, and accepts more than a single statement... well that 
last part probably isn't a pro argument...).

Couldn't it be enabled (as an inline construct only) to replace the 
current lambda?

From brett at  Tue Feb  7 03:56:12 2006
From: brett at (Brett Cannon)
Date: Mon, 6 Feb 2006 18:56:12 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/5/06, Guido van Rossum <guido at> wrote:
> After so many attempts to come up with an alternative for lambda,
> perhaps we should admit defeat. I've not had the time to follow the
> most recent rounds, but I propose that we keep lambda, so as to stop
> wasting everybody's talent and time on an impossible quest.

I have been thinking about this, and I have to say I am a little
disappointed (-0 disappointed, not -1 disappointed).  I honestly
bought the argument for removing lambda.  And I think that a deferred
object would help with one of lambda's biggest uses and make its loss
totally reasonable.

But I know that everyone and their email client is against me on this
one, so I am not going to really try to tear into this.  But I do
think that lambda needs a renaming.  Speaking as someone who still
forgets that Python's lambda is not the same as those found in
functional languages, I would much rather have it named 'expr' or
'expression' or something that is more in line with its abilities than
with a name taken for CS historical reasons.  This ain't your father's
lambda and thus shouldn't be named so.

Then again, Guido did say he "should", not that he "did" admit defeat.  =)


From martin at  Tue Feb  7 07:29:49 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Feb 2006 07:29:49 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Edgington wrote:
> Does anyone have any comments about applying the following patch to 
> asynchat? 

That patch looks wrong. What does it mean to "run in a thread"?
All code runs in a thread, all the time: sometimes, that thread
is the main thread.

Furthermore, I can't see any presumed thread-unsafety in asynchat.

Sure, there are a lot of member variables in asynchat which aren't
specifically protected against mutual access from different threads.
So you shouldn't be accessing the same async_chat object from multiple
threads. I cannot see why creating and using
an async_chat object in a thread that is not the main thread
could cause any problems. I also cannot see how this patch could
have significant effect on async_chat's behaviour when used in
multiple threads.


From martin at  Tue Feb  7 07:36:12 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Feb 2006 07:36:12 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Brett Cannon wrote:
> But I know that everyone and their email client is against me on this
> one, so I am not going to really try to tear into this.  But I do
> think that lambda needs a renaming.  Speaking as someone who still
> forgets that Python's lambda is not the same as those found in
> functional languages

Can you elaborate on that point? I feel that Python's lambda is exactly
the same as the one in Lisp. Sure, the Lisp lambda supports multiple
sequential expressions (the "progn" feature), but I understand that
this is just "an extension" (although one that has been around several
decades).

Of course, Python's expressions are much more limited than Lisp's (where
you really can have macros and special forms as the "expression"
in a lambda), but the lambda construct itself seems to be the very
same one.


From radeex at  Tue Feb  7 08:19:35 2006
From: radeex at (Christopher Armstrong)
Date: Tue, 7 Feb 2006 18:19:35 +1100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/7/06, "Martin v. Löwis" <martin at> wrote:
> Brett Cannon wrote:
> > But I know that everyone and their email client is against me on this
> > one, so I am not going to really try to tear into this.  But I do
> > think that lambda needs a renaming.  Speaking as someone who still
> > forgets that Python's lambda is not the same as those found in
> > functional languages
> Can you elaborate on that point? I feel that Python's lambda is exactly
> the same as the one in Lisp. Sure, the Lisp lambda supports multiple
> sequential expressions (the "progn" feature), but I understand that
> this is just "an extension" (although one that has been around several
> decades).
> Of course, Python's expressions are much more limited than Lisp's (where
> you really can have macros and special forms as the "expression"
> in a lambda), but the lambda construct itself seems to be the very
> same one.

If we phrase it somewhat differently, we can see that lambdas are
different in Python and Lisp, in a very practical way. First:
Everything in Lisp is an expression. There's no statement, in Lisp,
that isn't also an expression. Lambdas in Lisp can contain arbitrary
expressions; therefore you can put any language construct inside a
lambda. In Python, you cannot put any language construct inside a
lambda. Python's and Lisp's lambdas are effectively totally different.
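
The restriction is easy to check from Python itself.  This is a sketch
only; the helper name compiles is hypothetical and simply reports
whether the compiler accepts a piece of source text.

```python
def compiles(source):
    """Return True if the compiler accepts source, else False."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# An expression body is fine.
print(compiles("f = lambda x: x + 1"))          # True

# Statements are rejected inside a lambda.
print(compiles("f = lambda x: return x + 1"))   # False
print(compiles("f = lambda x: while x: pass"))  # False
```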

+1 on keeping Lambda, +1 on making it more useful.

  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix    |    --
            |  Release Manager, Twisted Project
  \\\V///   |    --
   |o O|    |

From seojiwon at  Tue Feb  7 09:34:39 2006
From: seojiwon at (Jiwon Seo)
Date: Tue, 7 Feb 2006 00:34:39 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/6/06, Christopher Armstrong <radeex at> wrote:
> On 2/7/06, "Martin v. Löwis" <martin at> wrote:
> > Brett Cannon wrote:
> > > But I know that everyone and their email client is against me on this
> > > one, so I am not going to really try to tear into this.  But I do
> > > think that lambda needs a renaming.  Speaking as someone who still
> > > forgets that Python's lambda is not the same as those found in
> > > functional languages
> >
> > Can you elaborate on that point? I feel that Python's lambda is exactly
> > the same as the one in Lisp. Sure, the Lisp lambda supports multiple
> > sequential expressions (the "progn" feature), but I understand that
> > this is just "an extension" (although one that has been around several
> > decades).
> >
> > Of course, Python's expressions are much more limited than Lisp's (where
> > you really can have macros and special forms as the "expression"
> > in a lambda), but the lambda construct itself seems to be the very
> > same one.
> If we phrase it somewhat differently, we can see that lambdas are
> different in Python and Lisp, in a very practical way. First:
> Everything in Lisp is an expression. There's no statement, in Lisp,
> that isn't also an expression. Lambdas in Lisp can contain arbitrary
> expressions; therefore you can put any language construct inside a
> lambda. In Python, you cannot put any language construct inside a
> lambda. Python's and Lisp's lambdas are effectively totally different.
> +1 on keeping Lambda, +1 on making it more useful.

After lambda being made more useful, can I hope that I will be able to
use lambda with multiple statements? :) Lambdas in Lisp and Python are
different, but from a usability perspective they don't need to differ
too much.


From thomas at  Tue Feb  7 09:52:02 2006
From: thomas at (Thomas Lotze)
Date: Tue, 07 Feb 2006 09:52:02 +0100
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <>

Jiwon Seo wrote:

> After lambda being made more useful, can I hope that I will be able to use
> lambda with multiple statements? :) Lambdas in Lisp and Python are
> different, but from a usability perspective they don't need to differ too
> much.

I don't think it helps usability much if anonymous functions are allowed
multiple statements. IMO greater amounts of code deserve a named function
for readability's sake, and the distinction between expressions and suites
feels like a good criterion for what is a greater amount of code. In any
case, it's the same limit as found in list and generator expressions or
the proposed conditional expression.


From p.f.moore at  Tue Feb  7 10:56:31 2006
From: p.f.moore at (Paul Moore)
Date: Tue, 7 Feb 2006 09:56:31 +0000
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/7/06, Brett Cannon <brett at> wrote:
> On 2/5/06, Guido van Rossum <guido at> wrote:
> > After so many attempts to come up with an alternative for lambda,
> > perhaps we should admit defeat. I've not had the time to follow the
> > most recent rounds, but I propose that we keep lambda, so as to stop
> > wasting everybody's talent and time on an impossible quest.
> I have been thinking about this, and I have to say I am a little
> disappointed (-0 disappointed, not -1 disappointed).  I honestly
> bought the argument for removing lambda.  And I think that a deferred
> object would help with one of lambda's biggest uses and make its loss
> totally reasonable.

I'm not 100% sure what you mean here, but as far as my understanding
goes, current lambda *is* a "deferred object" (or at least a "deferred
expression", which may not be quite what you mean...)
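
A minimal sketch of lambda as a deferred expression.  The names calls,
expensive and deferred are hypothetical, used only to show that the body
is not evaluated until the lambda is called.

```python
calls = []


def expensive():
    """Record that the work actually ran, then return a value."""
    calls.append("ran")
    return 42


# Building the lambda does not evaluate its body.
deferred = lambda: expensive() + 1
print(calls)       # []

# Calling it evaluates the expression at that point.
result = deferred()
print(result)      # 43
print(calls)       # ['ran']
```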

> But I know that everyone and their email client is against me on this
> one, so I am not going to really try to tear into this.  But I do
> think that lambda needs a renaming.

I agree with this. The *name* "lambda" is a wart, even if the deferred
expression feature isn't. My preference is to simply replace the
keyword lambda with a keyword "expr" (or if that's not acceptable
because there's too much prior use of expr as a variable name, then
maybe "expression" - but that's starting to get a bit long).

> Speaking as someone who still
> forgets that Python's lambda is not the same as those found in
> functional languages,

Well, only in the sense that Python's *expressions* are not the same
as those found in functional languages (ie, Python has statements
which are not expressions). But I see your point - and I strongly
object to going the other way and extending lambda/expr to allow
statements or suites.

> I would much rather have it named 'expr' or
> 'expression' or something that is more in line with its abilities than
> with a name taken for CS historical reasons.  This ain't your father's
> lambda and thus shouldn't be named so.

Agreed. But if "expr" isn't acceptable, I don't like the other common
suggestion of reusing "def". It's not a definition, nor is it "like an
anonymous function" (the lack of support for statements/suites being
the key difference).

> Then again, Guido did say he "should", not that he "did" admit defeat.  =)

OTOH, he was trying to stop the endless discussion... :-)


From crutcher at  Tue Feb  7 11:46:45 2006
From: crutcher at (Crutcher Dunnavant)
Date: Tue, 7 Feb 2006 02:46:45 -0800
Subject: [Python-Dev] Any interest in tail call optimization as a decorator?
Message-ID: <>

Maybe someone has already brought this up, but my searching hasn't
revealed it. Is there any interest in something like this for the
functional module?

#!/usr/bin/env python2.4
# This program shows off a python decorator which implements
# tail call optimization. It does this by throwing an exception
# if it is its own grandparent, and catching such exceptions
# to recall the stack.

import sys

class TailRecurseException:
  def __init__(self, args, kwargs):
    self.args = args
    self.kwargs = kwargs

def tail_call_optimized(g):
  def func(*args, **kwargs):
    # Raise and catch a dummy exception to get hold of the current frame.
    try:
      raise ZeroDivisionError
    except ZeroDivisionError:
      f = sys.exc_info()[2].tb_frame
    # If this wrapper is its own grandparent, unwind to the outer wrapper.
    if f.f_back and f.f_back.f_back \
        and f.f_back.f_back.f_code == f.f_code:
      raise TailRecurseException(args, kwargs)
    else:
      while 1:
        try:
          return g(*args, **kwargs)
        except TailRecurseException, e:
          args = e.args
          kwargs = e.kwargs
  func.__doc__ = g.__doc__
  return func

@tail_call_optimized
def factorial(n, acc=1):
  "calculate a factorial"
  if n == 0:
    return acc
  return factorial(n-1, n*acc)

print factorial(10000)
# prints a big, big number,
# but doesn't hit the recursion limit.

@tail_call_optimized
def fib(i, current = 0, next = 1):
  if i == 0:
    return current
  else:
    return fib(i - 1, next, current + next)

print fib(10000)
# also prints a big number,
# but doesn't hit the recursion limit.
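
For anyone trying this on a current interpreter, here is a minimal Python 3 sketch of the same idea. It is an illustrative port, not the posted code: it assumes sys._getframe() (a CPython implementation detail) in place of the raise-and-catch trick, makes the exception an Exception subclass as Python 3 requires, and stores the arguments under the made-up names call_args and call_kwargs.

```python
import sys


class TailRecurseException(Exception):
    """Carries the arguments of the tail call back to the outer wrapper."""

    def __init__(self, args, kwargs):
        # Hypothetical attribute names, chosen to avoid BaseException.args.
        self.call_args = args
        self.call_kwargs = kwargs


def tail_call_optimized(g):
    def func(*args, **kwargs):
        f = sys._getframe()  # assumption: CPython-specific frame access
        # wrapper -> g -> wrapper means we are inside a recursive call.
        if f.f_back and f.f_back.f_back \
                and f.f_back.f_back.f_code is f.f_code:
            raise TailRecurseException(args, kwargs)
        while True:
            try:
                return g(*args, **kwargs)
            except TailRecurseException as e:
                args, kwargs = e.call_args, e.call_kwargs
    func.__doc__ = g.__doc__
    return func


@tail_call_optimized
def factorial(n, acc=1):
    "calculate a factorial"
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)
```

With this in place factorial(5000) returns without a RecursionError. The trick is only safe when every recursive call really is a tail call, and because all wrappers share one code object, mutual recursion between two decorated functions would be misrouted.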

Crutcher Dunnavant <crutcher at>

From arigo at  Tue Feb  7 11:42:24 2006
From: arigo at (Armin Rigo)
Date: Tue, 7 Feb 2006 11:42:24 +0100
Subject: [Python-Dev] cProfile module
Message-ID: <>

Hi all,

As promised two months ago, I eventually finished the integration of the
'lsprof' profiler.  It's now in an internal '_lsprof' module that is
exposed via a 'cProfile' module with the same interface as 'profile',
producing compatible dump stats that can be inspected with 'pstats'.
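
A short sketch of what that interface looks like in use, written against cProfile and pstats as they ship in current Python releases (details may differ from the version under review here). The function busy() is a made-up workload.

```python
import cProfile
import io
import pstats


def busy(n):
    # Hypothetical workload, only here to give the profiler something to see.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy(10000)
profiler.disable()

# pstats reads the collected data and formats a report into any stream.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

The text in report lists busy() along with the calls made underneath it.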

See previous discussion here:


The code is currently in the following repository, from where I'll merge
it into CPython if nobody objects:


with tests and docs, including new tests and doc refinements for profile
itself.  The docs mark hotshot as "reserved for specialized usage".
They probably need a bit of bad-English-hunting...

And yes, I do promise to maintain this code in the future.

A bientot,


From murman at  Tue Feb  7 16:47:46 2006
From: murman at (Michael Urman)
Date: Tue, 7 Feb 2006 09:47:46 -0600
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/6/06, Brett Cannon <brett at> wrote:
> And I think that a deferred object would help with one of
> lambda's biggest uses and make its loss totally reasonable.

The ambiguity inherent from the perspective of a deferred object makes
a general one impractical. Both map(Deferred().attribute, seq) and
map(Deferred().method(arg), seq) look the same - how does the object
know that in the first case it should return the attribute of the first
element of seq when called, but in the second it should wait for the
next call when it will call method(arg) on the first element of seq?

Since there's also no way to spell "lambda y: foo(x, y, z)" on a
simple deferred object, it's strictly less powerful. If the current
Python lambda's functionality is desired, there is no better pythonic
way to spell it. There are plenty of new syntactic options that help
highlight its expression nature, but are they worth the change?
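
The ambiguity can be made concrete with a small sketch. The Deferred class below is hypothetical (no such class exists in the standard library); it only records what it is asked to do. The explicit spellings at the end use operator.attrgetter and operator.methodcaller from current Python, which state the intended meaning up front.

```python
from operator import attrgetter, methodcaller


class Deferred:
    """Hypothetical recorder object; not an existing library class."""

    def __init__(self, path=()):
        self._path = path

    def __getattr__(self, name):
        # Remember the attribute name and hand back a new recorder.
        return Deferred(self._path + (name,))

    def __call__(self, *args):
        # All the object ever sees is "called with these arguments".
        return (self._path, args)


# map() calling a deferred attribute with an element of seq ...
seen_as_attribute = Deferred().imag(3)
# ... and the user supplying the argument of a deferred method call.
seen_as_method = Deferred().split(",")

# Explicit spellings say which of the two meanings is wanted.
imag_parts = list(map(attrgetter("imag"), [1 + 2j, 3 + 4j]))
fields = list(map(methodcaller("split", ","), ["a,b", "c"]))
```

Both recorded calls have the same shape, a path plus one positional argument, so the object has nothing to tell them apart by.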

Michael Urman

From martin at  Tue Feb  7 20:11:06 2006
From: martin at ("Martin v. Löwis")
Date: Tue, 07 Feb 2006 20:11:06 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>	
Message-ID: <>

Jiwon Seo wrote:
> After lambda being made more useful, can I hope that I will be able to
> use lambda with multiple statements? :) Lambdas in Lisp and Python are
> different, but in the usability perspective they don't need to differ
> too much.

To my knowledge, nobody proposed to make it "more useful", or to allow
statements in the body of a lambda expression (neither single nor


From oliphant.travis at  Tue Feb  7 20:52:21 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Tue, 07 Feb 2006 12:52:21 -0700
Subject: [Python-Dev] Help with Unicode arrays in NumPy
Message-ID: <dsatpo$4fo$>

This is a design question which is why I'm posting here.  Recently the 
NumPy developers have become more aware of the difference between UCS2 
and UCS4 builds of Python.  NumPy arrays can be of Unicode type.  In 
other words a NumPy array can be made up of fixed-data-length unicode
strings.

Currently that means that they are "unicode" strings of basic size UCS2 
or UCS4 depending on the platform.  It is this duality that has some 
people concerned.  For all other data-types, NumPy allows the user to 
explicitly request a bit-width for the data-type.

So, we are thinking of introducing another data-type to NumPy to 
differentiate between UCS2 and UCS4 unicode strings.  (This also means a 
unicode scalar object, i.e. string of each of these, exactly one of 
which will inherit from the Python type).

Before embarking on this journey, however, we are seeking advice from 
individuals wiser to the way of Unicode on this list.

Perhaps all we need to do is be more careful on input and output of 
Unicode data-types so that transfer of unicode can be handled correctly 
on each platform.

Any thoughts?

-Travis Oliphant

From martin at  Tue Feb  7 21:06:28 2006
From: martin at ("Martin v. Löwis")
Date: Tue, 07 Feb 2006 21:06:28 +0100
Subject: [Python-Dev] Help with Unicode arrays in NumPy
In-Reply-To: <dsatpo$4fo$>
References: <dsatpo$4fo$>
Message-ID: <>

Travis E. Oliphant wrote:
> Currently that means that they are "unicode" strings of basic size UCS2 
> or UCS4 depending on the platform.  It is this duality that has some 
> people concerned.  For all other data-types, NumPy allows the user to 
> explicitly request a bit-width for the data-type.

Why is that a desirable property? Also: Why does NumPy have support for
Unicode arrays in the first place?

> Before embarking on this journey, however, we are seeking advice from 
> individuals wiser to the way of Unicode on this list.

My initial reaction is: use whatever Python uses in "NumPy Unicode".
Upon closer inspection, it is not all that clear what operations
are supported on a Unicode array, and how these operations relate
to the Python Unicode type.

In any case, I think NumPy should have only a single "Unicode array"
type (please do explain why having zero of them is insufficient).

If the purpose of the type is to interoperate with a Python
unicode object, it should use the same width (as this will
allow for memcpy).

If the purpose is to support arbitrary Unicode characters, it should
use 4 bytes (as two bytes are insufficient to represent arbitrary
Unicode characters).

If the purpose is something else, please explain what the purpose
is.


From oliphant.travis at  Tue Feb  7 21:23:13 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Tue, 07 Feb 2006 13:23:13 -0700
Subject: [Python-Dev] Help with Unicode arrays in NumPy
In-Reply-To: <>
References: <dsatpo$4fo$> <>
Message-ID: <dsavjl$bgg$>

Martin v. Löwis wrote:
> Travis E. Oliphant wrote:
>>Currently that means that they are "unicode" strings of basic size UCS2 
>>or UCS4 depending on the platform.  It is this duality that has some 
>>people concerned.  For all other data-types, NumPy allows the user to 
>>explicitly request a bit-width for the data-type.
> Why is that a desirable property? Also: Why does NumPy have support for
> Unicode arrays in the first place?

Numpy supports arrays of arbitrary fixed-length "records".  It is much 
more than numeric-only data now.  One of the fields that a record can 
contain is a string.  If strings are supported, it makes sense to 
support unicode strings as well.

This allows NumPy to memory-map arbitrary data-files on disk.
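
To make the "fixed-length record" idea concrete without NumPy, here is a small sketch using only the standard struct module. The record layout (a 4-byte id followed by a text field of four UCS-4 code points) is invented for illustration and is not NumPy's own representation.

```python
import struct

# Assumed layout: little-endian uint32 id + 16 bytes holding 4 UCS-4 units.
RECORD = struct.Struct("<I16s")


def pack_record(ident, name):
    # struct pads short fields with NUL bytes and truncates long ones.
    return RECORD.pack(ident, name.encode("utf-32-le"))


def unpack_record(raw):
    ident, field = RECORD.unpack(raw)
    return ident, field.decode("utf-32-le").rstrip("\x00")


raw = pack_record(7, "L\u00f6w")
```

Every record occupies exactly RECORD.size bytes, which is what makes it possible to index into a file of them by position.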

Perhaps you should explain why you think NumPy "shouldn't support Unicode"

> My initial reaction is: use whatever Python uses in "NumPy Unicode".
> Upon closer inspection, it is not all that clear what operations
> are supported on a Unicode array, and how these operations relate
> to the Python Unicode type.

That is currently what is done.  The current unicode data-type is 
exactly what Python uses.

The chararray subclass gives to unicode and string arrays all the 
methods of unicode and strings (operating on an element-by-element basis).

When you extract an element from the unicode data-type you get a Python 
unicode object (every NumPy data-type has a corresponding "type-object" 
that determines what is returned when an element is extracted).  All of 
these types are in a hierarchy of data-types which inherit from the 
basic Python types when available.

> In any case, I think NumPy should have only a single "Unicode array"
> type (please do explain why having zero of them is insufficient).

Please explain why having zero of them is *sufficient*.

> If the purpose of the type is to interoperate with a Python
> unicode object, it should use the same width (as this will
> allow for memcpy).
> If the purpose is to support arbitrary Unicode characters, it should
> use 4 bytes (as two bytes are insufficient to represent arbitrary
> Unicode characters).

And Python does not support arbitrary Unicode characters on narrow 
builds?  Then how is \U0010FFFF represented?

> If the purpose is something else, please explain what the purpose
> is.

The purpose is to represent bytes as they might exist in a file or 
data-stream according to the user's specification.  The purpose is 
whatever the user wants them for.  It's the same purpose as having an 
unsigned 64-bit data-type --- because users may need it to represent 
data as it exists in a file.

From theller at  Tue Feb  7 21:52:07 2006
From: theller at (Thomas Heller)
Date: Tue, 07 Feb 2006 21:52:07 +0100
Subject: [Python-Dev] ctypes patch (was:  (libffi) Re: Copyright issue)
In-Reply-To: <> (Martin v. Löwis's message of "Sun, 05 Feb 2006 14:12:10 +0100")
References: <>
Message-ID: <>

> Hye-Shik Chang <hyeshik at> writes:
>>> > I did some work to make ctypes+libffi compacter and liberal.
>>> >  (svn)
>>> >
>> Here goes patches for the integration:
>> [1]
>> [2]
>> I implemented it in two flavors.  [1] runs libffi's configure along with
>> Python's and just builds it.  And [2] has no change to
>> Python's configure and runs libffi configure and builds it.
>> And both patches don't have things for documentations yet.

[Thomas Heller]

> My plan is to make separate ctypes releases for 2.3 and 2.4, even after
> it is integrated into Python 2.5, so it seems [2] would be better - it
> must be possible to build ctypes without Python.
> As I said before, docs still need to be written.  I think content is
> more important than markup, so I'm writing in reST, it can be converted
> to latex later.  I expect that writing the docs will show quite some
> edges that need to be cleaned up - that should certainly be done before
> the first 2.5 release.
> Also I want to make a few releases before declaring the 1.0 version.
> This does not mean that I'm against integrating it right now.

"Martin v. Löwis" <martin at> writes:

> Not sure whether you think you need further approval: if you are ready
> to check this into the Python trunk, just go ahead. As I said, I would
> prefer if what is checked in is a literal copy of the ctypes CVS (as
> far as reasonable).

I was not looking for further approval, I wanted to explain why I prefer
the patch [2] that Hye-Shik posted above.

I'll do at least one separate ctypes release before checking this into
the Python trunk.



From martin at  Tue Feb  7 21:53:16 2006
From: martin at ("Martin v. Löwis")
Date: Tue, 07 Feb 2006 21:53:16 +0100
Subject: [Python-Dev] Help with Unicode arrays in NumPy
In-Reply-To: <dsavjl$bgg$>
References: <dsatpo$4fo$> <>
Message-ID: <>

Travis E. Oliphant wrote:
> Numpy supports arrays of arbitrary fixed-length "records".  It is
> much more than numeric-only data now.  One of the fields that a
> record can contain is a string.  If strings are supported, it makes
> sense to support unicode strings as well.

Hmm. How do you support strings in fixed-length records? Strings are
variable-sized, after all.

One common application is that you have a C struct in some API which
has a fixed-size array for string data (either with a length field,
or null-terminated). In this case, it is moderately useful to model
such a struct in Python. However, transferring this to Unicode is
pointless - there aren't any similar Unicode structs that need

> This allows NumPy to memory-map arbitrary data-files on disk.

Ok, so this is the "C struct" case. Then why do you need Unicode
support there? Which common file format has embedded fixed-size
Unicode data?

> Perhaps you should explain why you think NumPy "shouldn't support
> Unicode"

I think I said "Unicode arrays", not Unicode. Unicode arrays are
a pointless data type, IMO. Unicode always comes in strings
(i.e. variable sized, either null-terminated or with an introducing
length). On disk/on the wire Unicode comes as UTF-8 more often
than not.

Using UCS-2/UCS-4 as an on-disk representation is also questionable
practice (although admittedly Microsoft uses that a lot).

> That is currently what is done.  The current unicode data-type is 
> exactly what Python uses.

Then I wonder how this goes along with the use case "allow to
map arbitrary files".

> The chararray subclass gives to unicode and string arrays all the 
> methods of unicode and strings (operating on an element-by-element
> basis).

For strings, I can see use cases (although I wonder how you deal
with data formats that also support variable-sized strings, as
most data formats supporting strings do).

> Please explain why having zero of them is *sufficient*.

Because I (still) cannot imagine any specific application that
might need such a feature (IOW, YAGNI).

>> If the purpose is to support arbitrary Unicode characters, it
>> should use 4 bytes (as two bytes are insufficient to represent
>> arbitrary Unicode characters).
> And Python does not support arbitrary Unicode characters on narrow 
> builds?  Then how is \U0010FFFF represented?

It's represented using UTF-16. Try this for yourself:

py> len(u"\U0010FFFF")
py> u"\U0010FFFF"[0]
py> u"\U0010FFFF"[1]

This has all kinds of non-obvious implications.
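
The two code units a narrow build holds for this character can be reproduced on any current Python by encoding to UTF-16 explicitly. A minimal sketch (the helper names are made up for illustration; the arithmetic is the standard UTF-16 surrogate-pair rule):

```python
def utf16_code_units(text):
    """Return the 16-bit code units of text in UTF-16."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


units = utf16_code_units("\U0010FFFF")
pair = surrogate_pair(0x10FFFF)
```

One character becomes two code units, which is why length and indexing on a narrow build count surrogates rather than characters.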

> The purpose is to represent bytes as they might exist in a file or 
> data-stream according to the user's specification.

See, and this is precisely the statement that I challenge. Sure,
they "might" exist - but I'd rather expect that they don't.

If they exist, "Unicode" might come as variable-sized UTF-8, UTF-16,
or UTF-32. In either case, NumPy should already support that by
mapping a string object onto the encoded bytes, to which you then
can apply .decode() should you need to process the actual Unicode

> The purpose is 
> whatever the user wants them for.  It's the same purpose as having an
>  unsigned 64-bit data-type --- because users may need it to represent
>  data as it exists in a file.

No. I would expect you have 64-bit longs because users *do* need them,
and because there wouldn't be an easy work-around if users wouldn't have
them. For Unicode, it's different: users don't directly need them
(at least not many users), and if they do, there is an easy work-around
for their absence.

Say I want to process NTFS run lists. In NTFS run lists, there are
24-bit integers, 40-bit integers, and 4-bit integers (i.e. nibbles).
Can I represent them all in NumPy? Can I have NumPy transparently
map a sequence of run list records (which are variable-sized)
as an array of run list records?
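
To show why such records resist a fixed-width array type, here is a sketch of a parser for a simplified run-list layout. The layout is an assumption made for illustration: a header byte whose low nibble gives the byte width of a length field and whose high nibble gives the width of a signed offset field, both little-endian, with a zero byte ending the list. Real NTFS has further cases that this does not handle.

```python
def parse_runs(data):
    """Parse (length, offset) pairs from the assumed simplified layout."""
    runs = []
    pos = 0
    while pos < len(data) and data[pos] != 0:
        header = data[pos]
        length_size = header & 0x0F   # low nibble: width of the length field
        offset_size = header >> 4     # high nibble: width of the offset field
        pos += 1
        length = int.from_bytes(data[pos:pos + length_size], "little")
        pos += length_size
        offset = int.from_bytes(data[pos:pos + offset_size], "little",
                                signed=True)
        pos += offset_size
        runs.append((length, offset))
    return runs


example = parse_runs(bytes([0x21, 0x18, 0x34, 0x56, 0x00]))
```

Each record's size is only known after reading its header byte, so the records have to be walked in order rather than indexed.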


From brett at  Tue Feb  7 22:43:15 2006
From: brett at (Brett Cannon)
Date: Tue, 7 Feb 2006 13:43:15 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/6/06, Christopher Armstrong <radeex at> wrote:
> On 2/7/06, "Martin v. Löwis" <martin at> wrote:
> > Brett Cannon wrote:
> > > But I know that everyone and their email client is against me on this
> > > one, so I am not going to really try to tear into this.  But I do
> > > think that lambda needs a renaming.  Speaking as someone who still
> > > forgets that Python's lambda is not the same as those found in
> > > functional languages
> >
> > Can you elaborate on that point? I feel that Python's lambda is exactly
> > the same as the one in Lisp. Sure, the Lisp lambda supports multiple
> > sequential expressions (the "progn" feature), but I understand that
> > this is just "an extension" (although one that has been around several
> > decades).
> >
> > Of course, Python's expressions are much more limited than Lisp's (where
> > you really can have macros and special forms as the "expression"
> > in a lambda), but the lambda construct itself seems to be the very
> > same one.
> If we phrase it somewhat differently, we can see that lambdas are
> different in Python and Lisp, in a very practical way. First:
> Everything in Lisp is an expression. There's no statement, in Lisp,
> that isn't also an expression. Lambdas in Lisp can contain arbitrary
> expressions; therefore you can put any language construct inside a
> lambda. In Python, you cannot put any language construct inside a
> lambda. Python's and Lisp's lambdas are effectively totally different.

Chris is exactly right in what I meant.  Lisp-like languages do not
have the statement/expression dichotomy.  For instance, function
definitions are syntactic sugar for defining a lambda expression that
is bound to a name.  This only works in Python if the function body is
a single expression, which is not the entire language.  For Lisp,
though, that can be anything allowed in the language, so the abilities
are different.
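
The restriction is easy to check mechanically. In this sketch the helper is_valid_expression() is a made-up name; it simply asks the compiler whether a piece of source is acceptable as an expression.

```python
def is_valid_expression(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


# An expression body is fine, including a conditional expression.
expression_body = is_valid_expression("lambda x: x + 1")
conditional_body = is_valid_expression("lambda x: x if x > 0 else -x")

# A statement in the body is rejected; this needs a def.
statement_body = is_valid_expression("lambda x: assert x")


def checked_increment(x):
    assert x
    return x + 1
```

Anything that needs a statement has to move into a named function like checked_increment().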


From brett at  Tue Feb  7 22:52:51 2006
From: brett at (Brett Cannon)
Date: Tue, 7 Feb 2006 13:52:51 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/7/06, Paul Moore <p.f.moore at> wrote:
> On 2/7/06, Brett Cannon <brett at> wrote:
> > On 2/5/06, Guido van Rossum <guido at> wrote:
> > > After so many attempts to come up with an alternative for lambda,
> > > perhaps we should admit defeat. I've not had the time to follow the
> > > most recent rounds, but I propose that we keep lambda, so as to stop
> > > wasting everybody's talent and time on an impossible quest.
> >
> > I have been thinking about this, and I have to say I am a little
> > disappointed (-0 disappointed, not -1 disappointed).  I honestly
> > bought the argument for removing lambda.  And I think that a deferred
> > object would help with one of lambda's biggest uses and make its loss
> > totally reasonable.
> I'm not 100% sure what you mean here, but as far as my understanding
> goes, current lambda *is* a "deferred object" (or at least a "deferred
> expression", which may not be quite what you mean...)

Yes, lambda is deferred.  What I mean is using lambda for things like
``lambda x: x.attr`` and such; specifically for deferred execution,
and not for stuff like ``lambda x: func(1, 2, x, 3, 4)`` stuff.

> > But I know that everyone and their email client is against me on this
> > one, so I am not going to really try to tear into this.  But I do
> > think that lambda needs a renaming.
> I agree with this. The *name* "lambda" is a wart, even if the deferred
> expression feature isn't. My preference is to simply replace the
> keyword lambda with a keyword "expr" (or if that's not acceptable
> because there's too much prior use of expr as a variable name, then
> maybe "expression" - but that's starting to get a bit long).
> > Speaking as someone who still
> > forgets that Python's lambda is not the same as those found in
> > functional languages,
> Well, only in the sense that Python's *expressions* are not the same
> as those found in functional languages (ie, Python has statements
> which are not expressions). But I see your point - and I strongly
> object to going the other way and extending lambda/expr to allow
> statements or suites.
> > I would much rather have it named 'expr' or
> > 'expression' or something that is more in line with its abilities than
> > with a name taken for CS historical reasons.  This ain't your father's
> > lambda and thus shouldn't be named so.
> Agreed. But if "expr" isn't acceptable, I don't like the other common
> suggestion of reusing "def". It's not a definition, nor is it "like an
> anonymous function" (the lack of support for statements/suites being
> the key difference).

Yeah, reusing def is taking us back into the functional world too much.
It makes our current use of def seem more like syntactic sugar for
assigning a lambda to a name for function definition and that is not
what is happening here.

> > Then again, Guido did say he "should", not that he "did" admit defeat.  =)
> OTOH, he was trying to stop the endless discussion... :-)

=)  Well, it should when Python 3 comes out, so there is some extra
incentive for that to happen sooner than later.


From brett at  Tue Feb  7 23:05:32 2006
From: brett at (Brett Cannon)
Date: Tue, 7 Feb 2006 14:05:32 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/7/06, Michael Urman <murman at> wrote:
> On 2/6/06, Brett Cannon <brett at> wrote:
> > And I think that a deferred object would help with one of
> > lambda's biggest uses and make its loss totally reasonable.
> The ambiguity inherent from the perspective of a deferred object makes
> a general one impractical. Both map(Deferred().attribute, seq) and
> map(Deferred().method(arg), seq) look the same - how does the object
> know that in the first case it should return the attribute of the first
> element of seq when called, but in the second it should wait for the
> next call when it will call method(arg) on the first element of seq?

Magic.  =)  Honestly, I don't know, but I bet there is some evil,
black magic way to pull it off.  Otherwise, worst case, Deferred takes
an argument that flags that it has a method being called on it for it
to defer against and not to treat it as an attribute access only.  And
that is within reason in terms of interface requirement for the object,
in my opinion.

> Since there's also no way to spell "lambda y: foo(x, y, z)" on a
> simple deferred object, it's strictly less powerful. If the current
> Python lambda's functionality is desired, there is no better pythonic
> way to spell it. There are plenty of new syntactic options that help
> highlight its expression nature, but are they worth the change?

I never claimed that a deferred object would replace all uses of
lambda, just that it would make it reasonable.  For the above
suggestion I would go to a named function.  Or, if the argument was on
the end or everything named, use functional.partial().
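
A sketch of those two cases, using partial from the functools module of current Python (the module this functionality ended up in). The function foo() and the bound values are placeholders.

```python
from functools import partial


def foo(x, y, z):
    # Placeholder function that just reports what it was called with.
    return (x, y, z)


x, z = "x", "z"

# Free argument at the end: bind the leading ones positionally.
last_free = partial(foo, x, "y")        # like lambda z: foo(x, "y", z)

# Free argument in the middle: bind the others by name and pass y by name.
middle_free = partial(foo, x=x, z=z)    # like lambda y: foo(x, y, z)

end_result = last_free("c")
middle_result = middle_free(y="b")
```

Calling middle_free("b") positionally would fail, since the positional value would collide with the already bound x.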


From martin at  Tue Feb  7 23:55:47 2006
From: martin at ("Martin v. Löwis")
Date: Tue, 07 Feb 2006 23:55:47 +0100
Subject: [Python-Dev] Linking with mscvrt
Message-ID: <>

I just came up with an idea how to resolve the VC versioning
problems for good: Python should link with msvcrt.dll (which
is part of the operating system), not with the CRT that the
compiler provides.

To do that, we would need to compile and link with the SDK
header files and import libraries, not with the ones that
Visual Studio provides.

For that to work, everyone building Python or Python extensions (*)
would have to install the Platform SDK (which is available
for free, but contains quite a number of bits). Would that be

Disclaimer: I haven't tried yet whether this would actually


(*) For Python extensions, it should be possible to use mingw
instead, and configure it for linking against msvcrt.

From guido at  Wed Feb  8 00:05:38 2006
From: guido at (Guido van Rossum)
Date: Tue, 7 Feb 2006 15:05:38 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <> <>
Message-ID: <>

IMO asynchat and asyncore are braindead. They should really be removed
from the standard library. The code is 10 years old and represents at
least 10-year-old thinking about how to do this. The amount of hackery
in Zope related to asyncore was outrageous -- basically most of
asyncore's guts were replaced with more advanced Zope code, but the
API was maintained for compatibility reasons. A nightmare.


On 2/6/06, "Martin v. Löwis" <martin at> wrote:
> Mark Edgington wrote:
> > Does anyone have any comments about applying the following patch to
> > asynchat?
> That patch looks wrong. What does it mean to "run in a thread"?
> All code runs in a thread, all the time: sometimes, that thread
> is the main thread.
> Furthermore, I can't see any presumed thread-unsafety in asynchat.
> Sure, there are a lot of member variables in asynchat which aren't
> specifically protected against mutual access from different threads.
> So you shouldn't be accessing the same async_chat object from multiple
> threads. I cannot see why creating and using
> an async_chat object in a thread that is not the main thread
> could cause any problems. I also cannot see how this patch could
> have significant effect on async_chat's behaviour when used in
> multiple threads.
> Regards,
> Martin

--Guido van Rossum (home page:

From fredrik at  Wed Feb  8 00:46:18 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 8 Feb 2006 00:46:18 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
References: <> <>
Message-ID: <dsbbgc$pio$>

Guido van Rossum wrote:

> IMO asynchat and asyncore are braindead. They should really be removed
> from the standard library. The code is 10 years old and represents at
> least 10-year-old thinking about how to do this.

strange.  I'd say it works perfectly fine for what it was designed for
(after all, sockets haven't changed much in 10 years either).

what other reactive socket framework is there that would fit well into
the standard library ?  is twisted really simple enough ?


From fredrik at  Wed Feb  8 00:56:32 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 8 Feb 2006 00:56:32 +0100
Subject: [Python-Dev] release plan for 2.5 ?
Message-ID: <dsbc3h$rct$>

a while ago, I wrote

> > Hopefully something can get hammered out so that at least the Python
> > 3 docs can premiere having been developed on by the whole community.
> why wait for Python 3 ?
> what's the current release plan for Python 2.5, btw?  I cannot find a
> relevant PEP, and the "what's new" says "late 2005":

but I don't think that anyone followed up on this.  what's the current
status ?


From fumanchu at  Wed Feb  8 01:01:38 2006
From: fumanchu at (Robert Brewer)
Date: Tue, 7 Feb 2006 16:01:38 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
Message-ID: <6949EC6CD39F97498A57E0FA55295B210171975F@ex9.hostedexchange.local>

Guido van Rossum wrote:
> IMO asynchat and asyncore are braindead. They should really be removed
> from the standard library. The code is 10 years old and represents at
> least 10-year-old thinking about how to do this. The amount of hackery
> in Zope related to asyncore was outrageous -- basically most of
> asyncore's guts were replaced with more advanced Zope code, but the
> API was maintained for compatibility reasons. A nightmare.

Perhaps, but please keep in mind that the smtpd module uses both, currently, and would have to be rewritten if either is "removed".

Robert Brewer
System Architect
Amor Ministries
fumanchu at

From aleaxit at  Wed Feb  8 01:28:09 2006
From: aleaxit at (Alex Martelli)
Date: Tue, 7 Feb 2006 16:28:09 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsbbgc$pio$>
References: <> <>
Message-ID: <>

On 2/7/06, Fredrik Lundh <fredrik at> wrote:
> what other reactive socket framework is there that would fit well into
> the standard library ?  is twisted really simple enough ?

Twisted is wonderful, powerful, rich, and very large.  Perhaps a small
subset could be carefully extracted that (given suitable volunteers to
maintain it in the future) might fit in the standard library, but [a]
that extraction is not going to be a simple or fast job, and [b] I
suspect that the minimum sensible subset would still be much larger
(and richer / more powerful) than asyncore.


From jcarlson at  Wed Feb  8 01:57:15 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 07 Feb 2006 16:57:15 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum <guido at> wrote:
> IMO asynchat and asyncore are braindead. They should really be removed
> from the standard library. The code is 10 years old and represents at
> least 10-year-old thinking about how to do this. The amount of hackery
> in Zope related to asyncore was outrageous -- basically most of
> asyncore's guts were replaced with more advanced Zope code, but the
> API was maintained for compatibility reasons. A nightmare.

I'm going to go ahead and disagree with Guido on this one.  Before
removing asyncore (and asynchat) from the standard library, I believe
that there would necessarily need to be a viable replacement already
in place. The SocketServer module and its derivatives are wholly
unscalable for server-oriented applications once you get past a few
dozen threads (where properly designed asyncore derivatives will do
quite well all the way to your platform file handle limit).

Every once in a while I hear about people pushing for Twisted to be
included with Python, but at 2 megs for the base bz2 package, it seems a
little...hefty.  I'm not aware of any other community-accepted package
for asynchronous socket clients and servers, but I'm always looking.

Now, don't get me wrong, writing servers and clients using asyncore or
asynchat can be a beast, but it does get one into the callback/reactor
method of programming, which seems to have invaded other parts of Python
and 3rd party libraries (xml.sax, tk, Twisted, wxPython, ...).
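
For readers who have not met the callback/reactor style, here is a minimal sketch of it. It uses the selectors module from current Python as a stand-in rather than asyncore itself, and a local socket pair instead of a real network connection; the names on_readable and received are made up.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
received = []


def on_readable(sock):
    """Callback run by the loop when sock has data or has reached EOF."""
    data = sock.recv(1024)
    if data:
        received.append(data)
    else:
        selector.unregister(sock)
        sock.close()


left, right = socket.socketpair()
right.setblocking(False)
selector.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"ping")
left.close()

# The "reactor": wait for events, dispatch each to its registered callback.
for _ in range(10):                     # bounded so the demo cannot hang
    if not selector.get_map():
        break
    for key, _events in selector.select(timeout=1):
        key.data(key.fileobj)
selector.close()
```

One thread services every registered socket, which is why this style scales with the number of file handles rather than the number of threads.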

Back to the topic that Guido was really complaining about: Zope +
asyncore.  I don't doubt that getting Zope to play nicely with asyncore
was difficult, but it raises the questions: what would have been done if
asyncore didn't exist, and why wasn't that done instead of trying to
play nicely with asyncore?

 - Josiah

From barry at  Wed Feb  8 02:19:33 2006
From: barry at (Barry Warsaw)
Date: Tue, 07 Feb 2006 20:19:33 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B210171975F@ex9.hostedexchange.local>
References: <6949EC6CD39F97498A57E0FA55295B210171975F@ex9.hostedexchange.local>
Message-ID: <>

On Tue, 2006-02-07 at 16:01 -0800, Robert Brewer wrote:

> Perhaps, but please keep in mind that the smtpd module uses both, currently, and would have to be rewritten if either is "removed".

Would that really be a huge loss?


From radeex at  Wed Feb  8 02:59:00 2006
From: radeex at (Christopher Armstrong)
Date: Wed, 8 Feb 2006 12:59:00 +1100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 2/8/06, Alex Martelli <aleaxit at> wrote:
> On 2/7/06, Fredrik Lundh <fredrik at> wrote:
>    ...
> > what other reactive socket framework is there that would fit well into
> > the standard library ?  is twisted really simple enough ?
> Twisted is wonderful, powerful, rich, and very large.  Perhaps a small
> subset could be carefully extracted that (given suitable volunteers to
> maintain it in the future) might fit in the standard library, but [a]
> that extraction is not going to be a simple or fast job, and [b] I
> suspect that the minimum sensible subset would still be much larger
> (and richer / more powerful) than asyncore.

The subject of putting (parts of) Twisted into the standard library
comes up once every 6 months or so, at least on our mailing list. For
all that I think asyncore is worthless, I'm still against copying
Twisted into the stdlib. Or at least I'm not willing to maintain the
necessary fork, and I fear the nightmares about versioning that can
easily occur when you've got both standard library and third party
versions of a project.

But, for the record, to the people who argue not to put Twisted into
the stdlib because of its size: The parts of it that would actually be
applicable (i.e. those that obsolete async* in the stdlib) are only a
few kilobytes of code. At a quick run of "wc", the parts that support
event loops, accurate timed calls, SSL, Unix sockets, TCP, UDP,
arbitrary file descriptors, processes, and threads sum up to about
5300 lines of code. asynchat and asyncore are about 1200.

  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix    |    --
            |  Release Manager, Twisted Project
  \\\V///   |    --
   |o O|    |

From nnorwitz at  Wed Feb  8 04:03:11 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 7 Feb 2006 19:03:11 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <dsbc3h$rct$>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/7/06, Fredrik Lundh <fredrik at> wrote:
> >
> > what's the current release plan for Python 2.5, btw?  I cannot find a
> > relevant PEP, and the "what's new" says "late 2005":
> >
> but I don't think that anyone followed up on this.  what's the current
> status ?

Guido and I had a brief discussion about this.  IIRC, he was thinking
alpha around March and release around summer.  I think this is
aggressive with all the things still to do.  We really need to get the
ssize_t branch integrated.

There are a bunch of PEPs that have been accepted (or close), but not
implemented.  I think these include (please correct me, so we can get
a good list):

 SA  308  Conditional Expressions
 SA  328  Imports: Multi-Line and Absolute/Relative
 SA  342  Coroutines via Enhanced Generators
 S   343  The "with" Statement
 S   353  Using ssize_t as the index type

This one should be marked as final I believe:

  SA  341  Unifying try-except and try-finally


From jeremy at  Wed Feb  8 04:26:02 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 7 Feb 2006 22:26:02 -0500
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

It looks like we need a Python 2.5 Release Schedule PEP.


On 2/7/06, Neal Norwitz <nnorwitz at> wrote:
> On 2/7/06, Fredrik Lundh <fredrik at> wrote:
> > >
> > > what's the current release plan for Python 2.5, btw?  I cannot find a
> > > relevant PEP, and the "what's new" says "late 2005":
> > >
> > but I don't think that anyone followed up on this.  what's the current
> > status ?
> Guido and I had a brief discussion about this.  IIRC, he was thinking
> alpha around March and release around summer.  I think this is
> aggressive with all the things still to do.  We really need to get the
> ssize_t branch integrated.
> There are a bunch of PEPs that have been accepted (or close), but not
> implemented.  I think these include (please correct me, so we can get
> a good list):
>  SA  308  Conditional Expressions
>  SA  328  Imports: Multi-Line and Absolute/Relative
>  SA  342  Coroutines via Enhanced Generators
>  S   343  The "with" Statement
>  S   353  Using ssize_t as the index type
> This one should be marked as final I believe:
>   SA  341  Unifying try-except and try-finally
> n

From nnorwitz at  Wed Feb  8 05:49:32 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 7 Feb 2006 20:49:32 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 2/7/06, Christopher Armstrong <radeex at> wrote:
> > Twisted is wonderful, powerful, rich, and very large.  Perhaps a small
> > subset could be carefully extracted
> The subject of putting (parts of) Twisted into the standard library
> comes up once every 6 months or so, at least on our mailing list. For
> all that I think asyncore is worthless, I'm still against copying
> Twisted into the stdlib. Or at least I'm not willing to maintain the
> necessary fork, and I fear the nightmares about versioning that can
> easily occur when you've got both standard library and third party
> versions of a project.

I wouldn't be enthusiastic about putting all of Twisted in the stdlib
either.  Twisted is on a different release schedule than Python. 
However, isn't there a relatively small core subset like Alex
mentioned that isn't changing much?  Could we split up those
components and have those live in the core, but the vast majority of
Twisted live outside as it does now?


From janssen at  Wed Feb  8 05:53:46 2006
From: janssen at (Bill Janssen)
Date: Tue, 7 Feb 2006 20:53:46 PST
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: Your message of "Tue, 07 Feb 2006 15:46:18 PST."
Message-ID: <06Feb7.205350pst."58633">

> what other reactive socket framework is there that would fit well into
> the standard library ?  is twisted really simple enough ?

I've been very happy with Medusa, which is asyncore-based.

Perhaps the right idea is to fix the various problems of asyncore.  We
might lift the similar code from the kernel of ILU, for example, which
carefully addresses the various issues around this style of action loop.


From tim.peters at  Wed Feb  8 06:15:41 2006
From: tim.peters at (Tim Peters)
Date: Wed, 8 Feb 2006 00:15:41 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
Message-ID: <>

[Josiah Carlson]
> ...
> Back to the topic that Guido was really complaining about: Zope +
> asyncore.  I don't doubt that getting Zope to play nicely with asyncore
> was difficult,

It's more that mixing asyncore with threads is a bloody nightmare, and
ZEO and Zope both do that.  Zope (but not ZEO) goes on to mix threads
with asynchat too.  In addition, ZEO makes life much harder than
should be necessary by running in two different modes and
auto-switching between them, depending on whether "the app" is or is
not running an asyncore mainloop itself.  In order to _detect_ when
"the app" fires up an asyncore mainloop, ZEO monkey-patches asyncore's
loop() function and physically replaces it with its own loop()
function.  It goes downhill from there.

Guido's memories are partly out of date now:  ZEO used to replace a
lot more of asyncore than it does now, because of bugs in the asyncore
distributed with older Python versions.  The _needs_ for that went
away little by little over the years, but the code in ZEO stuck around
much longer.  ZEO's ThreadedAsync/ is much smaller now
(ZODB 3.6) than Guido remembers.

For a brief while, I even ripped out ZEO's monkey-patching of Python's
asyncore loop(), but it turned out that newer code in Zope3 (but not
Zope2) relied on, in turn, poking values into ZEO's module globals to
cause ZEO's loop() replacement to shut down (that's the kind of
"expedient" joy you get when mixing asyncore with threads).

Every piece of it remains "underdocumented" and, IMO, highly obscure.

> but it begs the questions: what would have been done if asyncore didn't exist,

Who knows?  What would python-dev be like if you didn't exist :-)?

> and why wasn't that done instead of trying to play nicely with asyncore?

Bugs and "missing features" in asyncore.  For ZEO's purposes, if I had
designed it, I expect it would have used threads (without asyncore). 
However, bits of code still sitting around suggest that it was at
least the _intent_ at one time that ZEO be able to run without threads
at all.  That's certainly not possible now.

If you look at asyncore's revision history, you'll note that Jeremy
and Guido made many changes when they worked at Zope Corp.  Those
largely reflect the history of moving ZEO's asyncore monkey-patches
into the Python core.

BTW, if you don't use ZEO, I believe it's possible to run Zope3
without asyncore (you can use Twisted in Zope3 instead).

From brett at  Wed Feb  8 06:39:19 2006
From: brett at (Brett Cannon)
Date: Tue, 7 Feb 2006 21:39:19 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/7/06, Neal Norwitz <nnorwitz at> wrote:
> On 2/7/06, Fredrik Lundh <fredrik at> wrote:
> > >
> > > what's the current release plan for Python 2.5, btw?  I cannot find a
> > > relevant PEP, and the "what's new" says "late 2005":
> > >
> > but I don't think that anyone followed up on this.  what's the current
> > status ?
> Guido and I had a brief discussion about this.  IIRC, he was thinking
> alpha around March and release around summer.  I think this is
> aggressive with all the things still to do.  We really need to get the
> ssize_t branch integrated.
> There are a bunch of PEPs that have been accepted (or close), but not
> implemented.  I think these include (please correct me, so we can get
> a good list):
>  SA  308  Conditional Expressions
>  SA  328  Imports: Multi-Line and Absolute/Relative
>  SA  342  Coroutines via Enhanced Generators
>  S   343  The "with" Statement
>  S   353  Using ssize_t as the index type
> This one should be marked as final I believe:
>   SA  341  Unifying try-except and try-finally

Supposedly Guido is close on pronouncing on PEP 352 (Required
Superclass for Exceptions), or so he said last time that thread came up.


From nnorwitz at  Wed Feb  8 07:35:31 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 7 Feb 2006 22:35:31 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/7/06, Jeremy Hylton <jeremy at> wrote:
> It looks like we need a Python 2.5 Release Schedule PEP.

Very draft:

Needs lots of work and release managers.  Anthony, Martin, Fred, Sean
are all mentioned with TBDs and question marks.


From smiles at  Wed Feb  8 06:57:18 2006
From: smiles at (Smith)
Date: Tue, 7 Feb 2006 23:57:18 -0600
Subject: [Python-Dev] math.areclose ...?
References: <>
Message-ID: <006c01c62c7d$d33926b0$1f2c4fca@csmith>

Raymond Hettinger wrote:
| [Chris Smith]
|| Does it help to spell it like this?
|| def areclose(x, y, relative_err = 1.e-5, absolute_err=1.e-8):
||     diff = abs(x - y)
||     ave = (abs(x) + abs(y))/2
||     return diff < absolute_err or diff/ave < relative_err
| There is a certain beauty and clarity to this presentation; however,
| it is problematic numerically:
| * the division by either absolute_err and relative_err can overflow or
| trigger a ZeroDivisionError

I'm not dividing by either of these values so that shouldn't be a problem. As long as absolute_err is not 0 then the first test would catch the possibility that x==y==ave==0. (see below)

As for the overflow, does your version of python overflow? Mine (2.4) just returns 1.#INF which still computes as a number:

>>> 1.79769313486e+308+1.79769313486e+308
>>> inf=_
>>> inf>1
>>> inf<1
>>> 2./inf
>>> inf/inf

There is a problem with dividing by 'ave' if the x and y are at the floating point limits, but the symmetric behaving form (presented by Scott Daniels) will have the same problem. The following format for close() has the same semantic meaning but avoids the overflow possibility and avoids extra work for the case when abs_tol=0 and x==y:

def close(x, y, abs_tol=1.e-8, rel_tol=1.e-5):
 '''Return True if |x-y| < abs_tol or |x-y|/ave(|x|,|y|) < rel_tol.
 The average is not computed directly so as to avoid overflow for
 numbers close to the floating point upper limit.'''
 if x==y: return True
 diff = abs(x - y)
 if diff < abs_tol: return True
 f = rel_tol/2.
 if diff < f*abs(x) + f*abs(y): return True
 return False
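
As a rough illustration of the overflow point, here is a sketch that compares the two spellings near the top of the float range. It assumes IEEE 754 doubles and a current CPython 3; the name areclose_avg is only a label used here for the averaged form quoted earlier, not a proposed API.

```python
import sys


def areclose_avg(x, y, relative_err=1.e-5, absolute_err=1.e-8):
    # Averaged spelling: adds |x| and |y| at full magnitude.
    diff = abs(x - y)
    ave = (abs(x) + abs(y)) / 2
    return diff < absolute_err or diff / ave < relative_err


def close(x, y, abs_tol=1.e-8, rel_tol=1.e-5):
    # Rearranged spelling: scales each magnitude before adding.
    if x == y:
        return True
    diff = abs(x - y)
    if diff < abs_tol:
        return True
    f = rel_tol / 2.
    return diff < f * abs(x) + f * abs(y)


big = sys.float_info.max
a, b = big, big * 0.9          # two finite values about 10% apart

# abs(a) + abs(b) overflows to inf, so diff/ave collapses to 0.0 and the
# averaged form reports the values as close.
avg_says_close = areclose_avg(a, b)

# The rearranged form never forms that sum, so it stays finite.
close_says_close = close(a, b)
```

With these inputs the averaged form answers True while close() answers False, which is the behavior the docstring is trying to guarantee. For ordinary magnitudes such as 1.1*3 and 3.3 both spellings agree.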

| * the 'or' part of the expression can introduce an unnecessary
| discontinuity in the first derivative.
If a value other than boolean were being returned, I could see the desire for continuity in derivative. Since the original form presents a boolean result, however, I'm having a hard time thinking of how the continuity issue comes into play.
| The original Numeric definition is likely to be better for people who
| know what they're doing; however, I still question whether it is an
| appropriate remedy for the beginner issue
| of why 1.1 + 1.1 + 1.1 doesn't equal 3.3.

I'm in total agreement. Being able to see that math.areclose(1.1*3,3.3) is True but 1.1*3==3.3 is False is not going to make them feel much better. They are going to have to face the floating point issue. 

As for the experienced user, perhaps such a function would be helpful. Maybe it would be better to require that the tolerances be given rather than defaulting so as to make clear which test is being used if only one test was going to be used:



From smiles at  Wed Feb  8 07:18:54 2006
From: smiles at (Smith)
Date: Wed, 8 Feb 2006 00:18:54 -0600
Subject: [Python-Dev] small floating point number problem
Message-ID: <006d01c62c7d$d7bab730$1f2c4fca@csmith>

I just ran into a curious behavior with small floating points, trying to find the limits of them on my machine (XP). Does anyone know why the '0.0' is showing up for one case below but not for the other? According to my tests, the smallest representable float on my machine is much smaller than 1e-308: it is

2.470328229206234e-325

but I can only create it as a product of two numbers, not directly. Here is an attempt to create the much larger 1e-308:

>>> a=1e-308
>>> a
0.0
>>> a==0
True            <-- it really is 0; this is not a repr issue
>>> b=.1*1e-307
>>> b
>>> a==b
False            <--they really are different

Also, I see that there is some graininess in the numbers at the low end, but I'm guessing that there is some issue with floating points that I would need to read up on again. The above dilemma is a little more troublesome.

>>> m=2.470328229206234e-017
>>> s=1e-307
>>> m*s
4.9406564584124654e-324 #2x too large
>>> 2*m*s
>>> 3*m*s==4*m*s


From martin at  Wed Feb  8 08:05:51 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 08 Feb 2006 08:05:51 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Tim Peters wrote:
> Bugs and "missing features" in asyncore.  For ZEO's purposes, if I had
> designed it, I expect it would have used threads (without asyncore). 
> However, bits of code still sitting around suggest that it was at
> least the _intent_ at one time that ZEO be able to run without threads
> at all.  That's certainly not possible now.

What is the reason that people want to use threads when they can have
poll/select-style message processing? Why does Zope require threads?
IOW, why would anybody *want* a "threadsafe patch for asynchat"?


From stephen at  Wed Feb  8 08:19:30 2006
From: stephen at (Stephen J. Turnbull)
Date: Wed, 08 Feb 2006 16:19:30 +0900
Subject: [Python-Dev] Help with Unicode arrays in NumPy
In-Reply-To: <dsavjl$bgg$> (Travis E. Oliphant's message of
	"Tue, 07 Feb 2006 13:23:13 -0700")
References: <dsatpo$4fo$> <>
Message-ID: <>

>>>>> "Travis" == Travis E Oliphant <oliphant.travis at> writes:

    Travis> Numpy supports arrays of arbitrary fixed-length "records".
    Travis> It is much more than numeric-only data now.  One of the
    Travis> fields that a record can contain is a string.  If strings
    Travis> are supported, it makes sense to support unicode strings
    Travis> as well.

That is not obvious.  A string is really an array of bytes, which for
historical reasons in some places (primarily the U.S. of A.) can be
used to represent text.  Unicode, on the other hand, is intended to
represent text streams robustly and does so in a universal but
flexible way ... but all of the different Unicode transformation
formats are considered to represent the *identical* text stream.  Some
applications may specify a transformation format, others will not.

In any case, internally Python is only going to support *one*; all the
others must be read in through codecs anyway.  See below.

    Travis> This allows NumPy to memory-map arbitrary data-files on
    Travis> disk.

In the case where a transformation format *is* specified, I don't see
why you can't use a byte array field (ie, ordinary "string") of
appropriate size for this purpose, and read it through a codec when it
needs to be treated as text.  This is going to be necessary in
essentially all of the cases I encounter, because the files are UTF-8
and sane internal representations are either UTF-16 or UTF-32.  In
particular, Python's internal representation is 16 or 32 bits wide.
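
A minimal sketch of that approach, assuming a fixed 16-byte field holding NUL-padded UTF-8 (the field width, the padding convention, and the helper names pack_field and unpack_field are all invented for illustration, not NumPy API):

```python
FIELD_WIDTH = 16  # bytes reserved for the field in each record (assumed)


def pack_field(text, width=FIELD_WIDTH):
    # Encode to the external format and pad to the fixed record width.
    data = text.encode("utf-8")
    if len(data) > width:
        raise ValueError("text does not fit in the field")
    return data.ljust(width, b"\x00")


def unpack_field(raw):
    # Treat the field as plain bytes until it is needed as text,
    # then run it through the codec.
    return raw.rstrip(b"\x00").decode("utf-8")


original = "Tsukuba \u7b51\u6ce2"   # 8 ASCII characters plus 2 CJK characters
raw = pack_field(original)
text = unpack_field(raw)

# The same text occupies a different number of bytes in each format.
sizes = {name: len(text.encode(name))
         for name in ("utf-8", "utf-16-le", "utf-32-le")}
```

The record layout only ever sees 16 bytes; the character count (10 here) and the in-memory width are the codec's business.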

    Travis> Perhaps you should explain why you think NumPy "shouldn't
    Travis> support Unicode"

Because it can't, not in the way you would like to, if I understand
you correctly.  Python chooses *one* of the many standard
representations for internal use, and because of the way the standard
is specified, it doesn't matter which one!  And none of the others can
be represented directly, all must be decoded for internal use and
encoded when written back to external media.  So any memory mapping
application is inherently nonportable, even across Python

    Travis> And Python does not support arbitrary Unicode characters
    Travis> on narrow builds?  Then how is \U0010FFFF represented?

In a way incompatible with the concept of character array.  Now what
do you do?

The point is that Unicode is intentionally designed in such a way that
a plethora of representations is possible, but all are easily and
reliably interconverted.  Implementations are then free to choose an
appropriate internal representation, knowing that conversion from
external representations is "cheap" and standardized.
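
To make the interconversion concrete, here is a small sketch run on a current CPython 3 (an assumption: the narrow/wide build distinction discussed in this thread no longer exists there). It shows U+10FFFF in three transformation formats, all decoding to the identical one-character string:

```python
ch = "\U0010FFFF"

utf8 = ch.encode("utf-8")        # four bytes
utf16 = ch.encode("utf-16-be")   # a surrogate pair, two 16-bit units
utf32 = ch.encode("utf-32-be")   # one 32-bit unit

# Every external representation decodes to the same text.
decoded = {
    utf8.decode("utf-8"),
    utf16.decode("utf-16-be"),
    utf32.decode("utf-32-be"),
}
```

The UTF-16 form is the surrogate pair DBFF DFFF, which is why a 16-bit internal representation cannot treat this character as a single array element.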

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From steve at  Wed Feb  8 08:33:28 2006
From: steve at (Steve Holden)
Date: Wed, 08 Feb 2006 02:33:28 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <dsc6sa$85k$>

Martin v. Löwis wrote:
> Tim Peters wrote:
>>Bugs and "missing features" in asyncore.  For ZEO's purposes, if I had
>>designed it, I expect it would have used threads (without asyncore). 
>>However, bits of code still sitting around suggest that it was at
>>least the _intent_ at one time that ZEO be able to run without threads
>>at all.  That's certainly not possible now.
> What is the reason that people want to use threads when they can have
> poll/select-style message processing? Why does Zope require threads?
> IOW, why would anybody *want* a "threadsafe patch for asynchat"?
In case the processing of events needed to block? If I'm processing web 
requests in an async* dispatch loop and a request needs the results of a 
(probably lengthy) database query in order to generate its output, how 
do I give the dispatcher control again to process the next asynchronous 
network event?

The usual answer is "process the request in a thread". That way the 
dispatcher can spring to life for each event as quickly as needed.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From fredrik at  Wed Feb  8 08:44:25 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 8 Feb 2006 08:44:25 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
References: <>	<>	<>	<><>
Message-ID: <dsc7gr$9to$>

Steve Holden wrote:

> > What is the reason that people want to use threads when they can have
> > poll/select-style message processing? Why does Zope require threads?
> > IOW, why would anybody *want* a "threadsafe patch for asynchat"?
> >
> In case the processing of events needed to block? If I'm processing web
> requests in an async* dispatch loop and a request needs the results of a
> (probably lengthy) database query in order to generate its output, how
> do I give the dispatcher control again to process the next asynchronous
> network event?
> The usual answer is "process the request in a thread". That way the
> dispatcher can spring to life for each event as quickly as needed.

but why do such threads have to talk to asyncore directly ?


From jcarlson at  Wed Feb  8 08:57:00 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 07 Feb 2006 23:57:00 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsc7gr$9to$>
References: <dsc6sa$85k$> <dsc7gr$9to$>
Message-ID: <>

"Fredrik Lundh" <fredrik at> wrote:
> Steve Holden wrote:
> > > What is the reason that people want to use threads when they can have
> > > poll/select-style message processing? Why does Zope require threads?
> > > IOW, why would anybody *want* a "threadsafe patch for asynchat"?
> > >
> > In case the processing of events needed to block? If I'm processing web
> > requests in an async* dispatch loop and a request needs the results of a
> > (probably lengthy) database query in order to generate its output, how
> > do I give the dispatcher control again to process the next asynchronous
> > network event?
> >
> > The usual answer is "process the request in a thread". That way the
> > dispatcher can spring to life for each event as quickly as needed.
> but why do such threads have to talk to asyncore directly ?

Indeed.  I seem to remember a discussion a few months ago about "easy"
thread programming, which invariably directed people off to use the
simplest abstractions necessary: Queues.
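
A minimal sketch of that pattern, written for a current Python 3 (where the module is spelled queue). The names slow_query, worker and run_demo are hypothetical, and the polling loop is only a stand-in for a real async dispatch loop:

```python
import queue
import threading
import time


def slow_query(request):
    # Stand-in for a lengthy, blocking database call.
    time.sleep(0.01)
    return "result for %s" % request


def worker(requests, results):
    # Runs in its own thread and touches nothing but the two queues.
    while True:
        request = requests.get()
        if request is None:          # sentinel: shut down
            break
        results.put((request, slow_query(request)))


def run_demo(pending):
    requests, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, results))
    thread.start()
    for request in pending:
        requests.put(request)
    finished = {}
    # Stand-in for the dispatcher: it checks for finished work without
    # blocking, so it stays free to service network events.
    while len(finished) < len(pending):
        try:
            request, answer = results.get_nowait()
        except queue.Empty:
            time.sleep(0.001)        # a real loop would select()/poll() here
            continue
        finished[request] = answer
    requests.put(None)
    thread.join()
    return finished


answers = run_demo(["a", "b"])
```

The worker never calls into the dispatch loop's objects, so nothing in the loop itself has to be made thread-safe.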

 - Josiah

From jcarlson at  Wed Feb  8 09:07:12 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 08 Feb 2006 00:07:12 -0800
Subject: [Python-Dev] small floating point number problem
In-Reply-To: <006d01c62c7d$d7bab730$1f2c4fca@csmith>
References: <006d01c62c7d$d7bab730$1f2c4fca@csmith>
Message-ID: <>

"Smith" <smiles at> wrote:
> I just ran into a curious behavior with small floating points, trying
> to find the limits of them on my machine (XP). Does anyone know why the
> '0.0' is showing up for one case below but not for the other? According
> to my tests, the smallest representable float on my machine is much
> smaller than 1e-308: it is

There are all sorts of ugly bits when working with all binary fp numbers
(for the small ones, look for a reference on 'denormals'). I'm sure that
Raymond has more than a few things to say about them (and fp in general),
but I will speed up the discussion by saying that you should read the
IEEE 754 standard for floating point, or alternatively ask on
comp.lang.python where more users would get more out of the answers that
you will receive there.

One thing to remember is that decimal is not the native representation
of binary floating point, so 1e-100 differs from 1e-101 significantly in
various bit positions.  You can use struct.pack('d', flt) to see this,
or you can try any one of the dozens of IEEE 754 javascript calculators
out there.

 - Josiah

From raymond.hettinger at  Wed Feb  8 09:08:25 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 08 Feb 2006 03:08:25 -0500
Subject: [Python-Dev] small floating point number problem
References: <006d01c62c7d$d7bab730$1f2c4fca@csmith>
Message-ID: <001c01c62c86$d694beb0$b83efea9@RaymondLaptop1>

>I just ran into a curious behavior with small floating points, trying to 
>find the limits of them on my machine (XP). Does anyone know why the '0.0' 
>is showing up for one case below but not for the other? According to my 
>tests, the smallest representable float on my machine is much smaller than 
>1e-308: it is
> 2.470328229206234e-325
> but I can only create it as a product of two numbers, not directly. Here 
> is an attempt to create the much larger 1e-308:
>>>> a=1e-308
>>>> a
> 0.0

The clue is in that the two differ by 17 orders of magnitude (325-308) which 
is about 52 bits.

The interpreter builds 1e-308 by using the underlying C library 
string-to-float function and it isn't constructing numbers outside the 
normal range for floats.  When you enter a value outside that range, the 
function underflows it to zero.

In contrast, your computed floats (such as .1*1e-307) return a denormal 
result (where the significand is stored with fewer bits than normal because 
the exponent is already at its outer limit).  That denormal result is not 
zero and the C library float-to-string conversion successfully generates a 
decimal string representation.

The asymmetric handling of denormals by the atof() and ftoa() functions is 
why you see a difference.  A consequence of that asymmetry is the breakdown 
of the expected eval(repr(f))==f invariant:

>>> f = .1*1e-307
>>> eval(repr(f)) == f
False
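
For comparison, a short sketch of how to inspect denormals directly, assuming IEEE 754 doubles and a current CPython 3. As far as I know, current CPython does its own correctly rounded string conversion rather than relying on the platform C library, so the round trip is expected to hold there, unlike on the 2006 Windows build discussed in this thread:

```python
import math
import struct
import sys

smallest_normal = sys.float_info.min          # about 2.2250738585072014e-308
smallest_denormal = math.ldexp(1.0, -1074)    # 2**-1074, about 4.94e-324

tiny = .1 * 1e-307                            # computed, lands in denormal range
is_denormal = 0.0 < tiny < smallest_normal

# The smallest denormal has only the lowest significand bit set.
bits = struct.pack(">d", smallest_denormal)

# Anything at or below half the smallest denormal rounds to zero.
halved = smallest_denormal / 2

# String conversion in both directions.
parsed = float("1e-308")
round_trips = float(repr(tiny)) == tiny
```

Note that 2.470328229206234e-325 is well below half of the smallest denormal, so as a literal it can only ever come out as 0.0.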


From thomas at  Wed Feb  8 11:00:06 2006
From: thomas at (Thomas Wouters)
Date: Wed, 8 Feb 2006 11:00:06 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <06Feb7.205350pst."58633">
References: <dsbbgc$pio$>
Message-ID: <>

On Tue, Feb 07, 2006 at 08:53:46PM -0800, Bill Janssen wrote:

> Perhaps the right idea is to fix the various problems of asyncore.

The problem with making asyncore more useful is that you end up with (a cut
down version of) Twisted, although not one that would be able to integrate
with Twisted. asyncore/asynchat and Twisted are really not that different,
and anything you do to enhance the former will make it look more like the
latter. I'd personally rather fork parts of Twisted, in spite of the
maintenance issues, than re-invent Twisted, fix all the issues Twisted
already solves and face the same kind of maintenance issues. It would be
perfect if the twisted-light in the stdlib would integrate with the 'real'
Twisted, so that users can 'upgrade' their programs just by installing
Twisted and using the extra features.

Not that I think we should stop at the event core and the TCP/SSL parts of
Twisted; imaplib, poplib, httplib, xmlrpclib, for instance, could all do
with Twisted-inspired alternatives (or even replacements, if the synchronous
API was kept the same.) The synchronous versions are fine for simple scripts
(or complex scripts that don't mind long blocking operations.) If we start
exporting a really useful asynchronous framework, I would expect
asynchronous counterparts to the useful higher-level networking modules,
too. But that doesn't have to come right away ;)

Anything beyond simple bugfixes on asyncore/asynchat seems like a terrible
waste of effort, to me. And I hardly ever use Twisted.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fuzzyman at  Wed Feb  8 11:01:40 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 08 Feb 2006 10:01:40 +0000
Subject: [Python-Dev] Old Style Classes Going in Py3K
Message-ID: <>

Hello all,

I understand that old style classes are slated to disappear in Python 3000.

Does this mean that the following will be a syntax error :

class Something:

*or* that instead it will automatically inherit from object ?

The latter would break a few orders of magnitude less code of course...

All the best,

Michael Foord

From raveendra-babu.m at  Wed Feb  8 10:41:35 2006
From: raveendra-babu.m at (M, Raveendra Babu (STSD))
Date: Wed, 8 Feb 2006 15:11:35 +0530
Subject: [Python-Dev] Make error on solaris 9 x86 - error: parse error
	before "upad128_t"
Message-ID: <>


I am trying to build python-2.3.5 on solaris 9 - X86. 
1) first  I have unpacked : Python-2.3.5.tgz  using   : tar -zxvf
         no errors at this stage
2) then run :
            No errors at this stage
3)then /usr/ccs/bin/make

  it is giving  some errors and the error is :
 gcc -c -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes
-I. -I./Include  -DPy_BUILD_CORE -o Python/pythonrun.o
Python/pythonrun.c
In file included from /usr/include/sys/reg.h:13,
                 from /usr/include/sys/regset.h:24,
                 from /usr/include/sys/ucontext.h:21,
                 from /usr/include/sys/signal.h:240,
                 from /usr/include/signal.h:27,
                 from Python/pythonrun.c:17:
/usr/include/ia32/sys/reg.h:300: error: parse error before "upad128_t"
/usr/include/ia32/sys/reg.h:302: error: parse error before '}' token
/usr/include/ia32/sys/reg.h:309: error: field `kfpu_fx' has incomplete
type
/usr/include/ia32/sys/reg.h:314: confused by earlier errors, bailing out
*** Error code 1
make: Fatal error: Command failed for target `Python/pythonrun.o'

Can you please reply with a fix for this problem?


From fredrik at  Wed Feb  8 11:15:42 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 8 Feb 2006 11:15:42 +0100
Subject: [Python-Dev] Make error on solaris 9 x86 - error: parse error
	before "upad128_t"
References: <>
Message-ID: <dscgcf$5ks$>

M, Raveendra Babu (STSD) wrote:

> 3)then /usr/ccs/bin/make
>   it is giving  some errors and the error is :
>  gcc -c -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes
> -I. -I./Include  -DPy_BUILD_CORE -o Python/pythonrun.o
> Python/pythonrun.c
> In file included from /usr/include/sys/reg.h:13,
>                  from /usr/include/sys/regset.h:24,
>                  from /usr/include/sys/ucontext.h:21,
>                  from /usr/include/sys/signal.h:240,
>                  from /usr/include/signal.h:27,
>                  from Python/pythonrun.c:17:
> /usr/include/ia32/sys/reg.h:300: error: parse error before "upad128_t"
> /usr/include/ia32/sys/reg.h:302: error: parse error before '}' token
> /usr/include/ia32/sys/reg.h:309: error: field `kfpu_fx' has incomplete
> type
> /usr/include/ia32/sys/reg.h:314: confused by earlier errors, bailing out
> *** Error code 1
> make: Fatal error: Command failed for target `Python/pythonrun.o'
> Can you  please reply me with some fix for this problem.

a quick google search indicates that this is a compiler problem.  random
FAQ entry:

    The problem is that the Solaris headers changed across updates
    of Solaris 9 and you are using a GCC from before the change on
    an updated system. (i.e. a GCC built for Solaris 9 <= 12/03 on
    Solaris 9 >= 4/04).

    You can either rebuild GCC for your version of the system (it
    works, even using a GCC built for the previous version), or fix
    your headers:


From patrick at  Wed Feb  8 11:13:25 2006
From: patrick at (Patrick Collison)
Date: Wed, 8 Feb 2006 10:13:25 +0000
Subject: [Python-Dev] Let's just *keep* lambda
Message-ID: <>

>> After so many attempts to come up with an alternative for lambda,
>> perhaps we should admit defeat. I've not had the time to follow the
>> most recent rounds, but I propose that we keep lambda, so as to stop
>> wasting everybody's talent and time on an impossible quest.
> I agree with this. The *name* "lambda" is a wart, even if the deferred
> expression feature isn't. My preference is to simply replace the
> keyword lambda with a keyword "expr" (or if that's not acceptable
> because there's too much prior use of expr as a variable name, then
> maybe "expression" - but that's starting to get a bit long).

Sorry, I'm a little late to this discussion.

How about `procedure', or just `proc'?

And to think that people thought that keeping "lambda", but changing  
the name, would avoid all the heated discussion... :-)


From theller at  Wed Feb  8 12:05:43 2006
From: theller at (Thomas Heller)
Date: Wed, 08 Feb 2006 12:05:43 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <> (Martin v. Löwis's message of
	"Tue, 07 Feb 2006 23:55:47 +0100")
References: <>
Message-ID: <>

"Martin v. Löwis" <martin at> writes:

> I just came up with an idea how to resolve the VC versioning
> problems for good: Python should link with msvcrt.dll (which
> is part of the operating system), not with the CRT that the
> compiler provides.
> To do that, we would need to compile and link with the SDK
> header files and import libraries, not with the ones that
> visual studio provides.
> For that to work, everyone building Python or Python extensions (*)
> would have to install the Platform SDK (which is available
> for free, but contains quite a number of bits). Would that be
> acceptable?
> Disclaimer: I haven't tried yet whether this would actually
> work.
> Regards,
> Martin
> (*) For Python extensions, it should be possible to use mingw
> instead, and configure it for linking against msvcrt.

I think this would remove a lot of headaches.  Downloading and
installing the Platform SDK should not be an issue, imo.

The only problem that I see is this:

I'm not sure the platform SDK include files (.H and .IDL) are really
compatible with VC7.1.  I remember that we (at our company, building C++
software) had to 'Unregister the PSDK Directories with Visual Studio'
(available from the start menu) before building the stuff, otherwise
there were compiler errors.


From steve at  Wed Feb  8 13:25:35 2006
From: steve at (Steve Holden)
Date: Wed, 08 Feb 2006 07:25:35 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <dsc6sa$85k$> <dsc7gr$9to$>
Message-ID: <dsco01$vkl$>

Josiah Carlson wrote:
> "Fredrik Lundh" <fredrik at> wrote:
>>Steve Holden wrote:
>>>>What is the reason that people want to use threads when they can have
>>>>poll/select-style message processing? Why does Zope require threads?
>>>>IOW, why would anybody *want* a "threadsafe patch for asynchat"?
>>>In case the processing of events needed to block? If I'm processing web
>>>requests in an async* dispatch loop and a request needs the results of a
>>>(probably lengthy) database query in order to generate its output, how
>>>do I give the dispatcher control again to process the next asynchronous
>>>network event?
>>>The usual answer is "process the request in a thread". That way the
>>>dispatcher can spring to life for each event as quickly as needed.
>>but why do such threads have to talk to asyncore directly ?
Good question.
> Indeed.  I seem to remember a discussion a few months ago about "easy"
> thread programming, which invariably directed people off to use the
> simplest abstractions necessary: Queues.
Maybe people are finding Python too easy and they just want to 
complicate their code to the point where it contains interesting bugs? I 
dunno ....

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From abo at  Wed Feb  8 14:23:26 2006
From: abo at (Donovan Baarda)
Date: Wed, 08 Feb 2006 13:23:26 +0000
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsc6sa$85k$>
References: <>
	<>  <dsc6sa$85k$>
Message-ID: <>

On Wed, 2006-02-08 at 02:33 -0500, Steve Holden wrote:
> Martin v. Löwis wrote:
> > Tim Peters wrote:
> > What is the reason that people want to use threads when they can have
> > poll/select-style message processing? Why does Zope require threads?
> > IOW, why would anybody *want* a "threadsafe patch for asynchat"?
> > 
> In case the processing of events needed to block? If I'm processing web 
> requests in an async* dispatch loop and a request needs the results of a 
> (probably lengthy) database query in order to generate its output, how 
> do I give the dispatcher control again to process the next asynchronous 
> network event?
> The usual answer is "process the request in a thread". That way the 
> dispatcher can spring to life for each event as quickly as needed.

I believe that Twisted does pretty much this with its "deferred" stuff.
It shoves slow stuff off for processing in a separate thread that
re-syncs with the event loop when it's finished.

In the case of Zope/ZEO I'm not entirely sure but I think what happened
was medusa (asyncore/asynchat based stuff Zope2 was based on) didn't
have this deferred handler support. When they found some of the stuff
Zope was doing took a long time, they came up with an initially simpler
but IMHO uglier solution of running multiple async loops in separate
threads and using a front-end dispatcher to distribute connections to
them. This way it wasn't too bad if an async loop stalled, because the
other loops in other threads could continue to process stuff.

If ZEO is still using this approach I think switching to a twisted style
approach would be a good idea. However, I suspect this would be a very
painful refactor...

Donovan Baarda <abo at>

From andrew-pythondev at  Wed Feb  8 14:57:04 2006
From: andrew-pythondev at (Andrew Bennetts)
Date: Thu, 9 Feb 2006 00:57:04 +1100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
	<> <dsc6sa$85k$>
Message-ID: <>

Donovan Baarda wrote:
> On Wed, 2006-02-08 at 02:33 -0500, Steve Holden wrote:
> > Martin v. Löwis wrote:
> > > Tim Peters wrote:
> [...]
> > > What is the reason that people want to use threads when they can have
> > > poll/select-style message processing? Why does Zope require threads?
> > > IOW, why would anybody *want* a "threadsafe patch for asynchat"?
> > > 
> > In case the processing of events needed to block? If I'm processing web 
> > requests in an async* dispatch loop and a request needs the results of a 
> > (probably lengthy) database query in order to generate its output, how 
> > do I give the dispatcher control again to process the next asynchronous 
> > network event?
> > 
> > The usual answer is "process the request in a thread". That way the 
> > dispatcher can spring to life for each event as quickly as needed.
> I believe that Twisted does pretty much this with its "deferred" stuff.
> It shoves slow stuff off for processing in a separate thread that
> re-syncs with the event loop when it's finished.

Argh!  No.  Threading is completely orthogonal to Deferreds.

Deferreds are just an abstraction for managing callbacks for an asynchronous
operation.  They don't magically invoke threads, or otherwise turn synchronous
code into asynchronous code for you.

This seems to be a depressingly common misconception.  I wish I knew how to
stop it.

They're much simpler than people seem to think.  They're an object a function
returns to say "I don't have a result for you yet, but if you attach callbacks
to this I'll run those when I do."  We do this because it's much nicer than
having to pass callbacks into functions, particularly when you want to deal with
chains of callbacks and error handling.

There is a single utility function in Twisted called "deferToThread" that will
run a function in a threadpool, and arrange for a Deferred to be fired with the
result (in the event loop thread, of course).  This is just one of many possible
uses for Deferreds, and not an especially common one.
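The hand-off described above can be sketched with nothing but the standard
library. This is not Twisted's implementation and does not use Deferreds; it
only illustrates the idea that the blocking call runs in a worker thread while
the callback runs in the loop thread. The names run_in_thread, pump and
slow_query are made up for this sketch, it is written for a current Python 3,
and error handling is left out.

```python
import queue
import threading


def run_in_thread(func, args, done_queue, on_result):
    """Run func(*args) in a worker thread; hand the result back via done_queue."""
    def worker():
        # Only the blocking call happens here; on_result is never run in this thread.
        done_queue.put((on_result, func(*args)))
    threading.Thread(target=worker, daemon=True).start()


def pump(done_queue, timeout=5.0):
    """One step of a hypothetical event loop: run one finished callback."""
    on_result, value = done_queue.get(timeout=timeout)
    on_result(value)


def slow_query(n):
    return n * 2  # stands in for a lengthy database call


results = []
done = queue.Queue()
run_in_thread(slow_query, (21,), done, results.append)
pump(done)  # the loop thread, not the worker, runs the callback
```

A real event loop would poll the queue alongside its sockets rather than block
on it; the blocking get() here just keeps the sketch short.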

I'm happy to provide pointers to several Twisted docs if anyone is at all
unclear on this.

While they are very useful, I don't think they're an essential part of a minimal
Twisted replacement for asyncore/asynchat -- in fact, they'd work just fine with
asyncore/asynchat, because they do so little.


From arigo at  Wed Feb  8 15:20:34 2006
From: arigo at (Armin Rigo)
Date: Wed, 8 Feb 2006 15:20:34 +0100
Subject: [Python-Dev] _length_cue()
Message-ID: <>

Hi all,

Last September, the __len__ method of iterators was removed -- see
discussion at:

It was replaced by an optional undocumented method called _length_cue(),
which would be used to guess the number of remaining items in an
iterator, for performance reasons.

I'm worried about the name.  There are now exactly two names that behave
like a special method without having the double-underscores around it.
The first name is 'next', which is kind of fine because it's for
iterator classes only and it's documented.  But now, consider: the
CPython implementation can unexpectedly invoke a method on a
user-defined iterator class, even though this method's name is not
'__*__' and not documented as special!  That's new and that's bad.

IMHO for safety reasons we need to stick double-underscores around this
name too, e.g. __length_cue__().  It's new in 2.5 and not documented
anyway so this change won't break anything.  Do you agree with that?

BTW the reason I'm looking at this is that I'm considering adding
another undocumented internal-use-only method, maybe __getitem_cue__(),
that would try to guess what the nth item to be returned will be.  This
would allow the repr of some iterators to display more helpful
information when playing around with them at the prompt, e.g.:

>>> enumerate([3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141596])
<enumerate (0, 3.1), (1, 3.14), (2, 3.141),... length 6>

A bientot,


From rasky at  Wed Feb  8 15:24:33 2006
From: rasky at (Giovanni Bajo)
Date: Wed, 8 Feb 2006 15:24:33 +0100
Subject: [Python-Dev] Linking with mscvrt
References: <>
Message-ID: <031401c62cbb$61810630$bf03030a@trilan>

Martin v. Löwis <martin at> wrote:

> I just came up with an idea how to resolve the VC versioning
> problems for good: Python should link with msvcrt.dll (which
> is part of the operating system), not with the CRT that the
> compiler provides.

Can you elaborate exactly on which versioning problems you think of?

> For that to work, everyone building Python or Python extensions (*)
> would have to install the Platform SDK (which is available
> for free, but contains quite a number of bits). Would that be
> acceptable?

It would complicate the build process and make Python lag behind CRT
development (including bugfixes and whatnot) that Microsoft does. You could
as well ask to always stick with GCC 2.95 to solve ABI problems, but I don't
think it's the correct long-term solution. I expect more and more Windows
libraries (binary version) to be shipped with dependencies on MSVCR71.DLL.

Anyway, it's just a feeling, since I still don't understand which problems
you are trying to solve in the first place.
Giovanni Bajo

From dialtone at  Wed Feb  8 15:14:42 2006
From: dialtone at (Valentino Volonghi aka Dialtone)
Date: Wed, 8 Feb 2006 15:14:42 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
	<> <dsc6sa$85k$>
Message-ID: <>

On Wed, Feb 08, 2006 at 01:23:26PM +0000, Donovan Baarda wrote:
> I believe that Twisted does pretty much this with its "deferred" stuff.
> It shoves slow stuff off for processing in a separate thread that
> re-syncs with the event loop when it's finished.

Deferreds are only an elaborate way to deal with a bunch of callbacks.
It's Twisted itself that provides a way to run something in a separate thread
and then fire a deferred (from the main thread) when the child thread
finishes (reactor.callInThread() to call stuff in a different thread,
reactor.callFromThread() to call reactor APIs from a different thread).
Deferreds are just a bit more than:

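class Deferred(object):
    def __init__(self):
        # callbacks run in the order they were added
        self.callbacks = []

    def addCallback(self, callback):
        # remember the callback until a result is available
        self.callbacks.append(callback)

    def callback(self, value):
        # fire: feed each callback's return value to the next one
        for callback in self.callbacks:
            value = callback(value)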

This is mostly what a deferred is (without error handling, extra argument
passing, 'nested' deferreds handling and blabla, the core concept however
is there). As you see there is no extra magic in deferreds (or weird
dependency on Twisted, they are pure python and could be used everywhere,
you can implement them in any language that supports first class functions).
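To show how such an object is used, here is a small runnable sketch along the
same lines. MiniDeferred and fetch_page are names invented for this example,
not Twisted APIs, and as above there is no error handling. The function hands
back the object at once; whoever eventually has the result fires it.

```python
class MiniDeferred(object):
    """Toy stand-in for a Deferred: a list of callbacks plus a way to fire them."""

    def __init__(self):
        self.callbacks = []

    def addCallback(self, callback):
        self.callbacks.append(callback)
        return self  # returning self allows chaining (a convenience of this sketch)

    def callback(self, value):
        # Each callback receives the previous callback's return value.
        for cb in self.callbacks:
            value = cb(value)
        return value


def fetch_page():
    """Hypothetical asynchronous operation: returns before any result exists."""
    return MiniDeferred()


seen = []
d = fetch_page()
d.addCallback(lambda body: body.upper())
d.addCallback(seen.append)
d.callback("hello")  # later, when the result arrives
```

Note that nothing here starts a thread: the callbacks run in whichever thread
calls callback().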

> In the case of Zope/ZEO I'm not entirely sure but I think what happened
> was medusa (asyncore/asynchat based stuff Zope2 was based on) didn't
> have this deferred handler support. When they found some of the stuff

Here I think you meant that medusa didn't handle computation in separate
threads instead.

Valentino Volonghi aka Dialtone
Now Running MacOSX 10.4
New Pet:

From g.brandl at  Wed Feb  8 15:39:28 2006
From: g.brandl at (Georg Brandl)
Date: Wed, 08 Feb 2006 15:39:28 +0100
Subject: [Python-Dev] Old Style Classes Goiung in Py3K
In-Reply-To: <>
References: <>
Message-ID: <dscvr0$tck$>

Fuzzyman wrote:
> Hello all,
> I understand that old style classes are slated to disappear in Python 3000.
> Does this mean that the following will be a syntax error :
> class Something:
>     pass
> *or* that instead it will automatically inherit from object ?

Of course, I would say. There's no reason to forbid this in Py3k.


From g.brandl at  Wed Feb  8 15:42:39 2006
From: g.brandl at (Georg Brandl)
Date: Wed, 08 Feb 2006 15:42:39 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
	<>	<>	<dsbbgc$pio$>	<>	<>
Message-ID: <dsd00v$tck$>

Neal Norwitz wrote:
> On 2/7/06, Christopher Armstrong <radeex at> wrote:
>> > Twisted is wonderful, powerful, rich, and very large.  Perhaps a small
>> > subset could be carefully extracted
>> The subject of putting (parts of) Twisted into the standard library
>> comes up once every 6 months or so, at least on our mailing list. For
>> all that I think asyncore is worthless, I'm still against copying
>> Twisted into the stdlib. Or at least I'm not willing to maintain the
>> necessary fork, and I fear the nightmares about versioning that can
>> easily occur when you've got both standard library and third party
>> versions of a project.
> I wouldn't be enthusiastic about putting all of Twisted in the stdlib
> either.  Twisted is on a different release schedule than Python. 
> However, isn't there a relatively small core subset like Alex
> mentioned that isn't changing much?  Could we split up those
> components and have those live in the core, but the vast majority of
> Twisted live outside as it does now?

+1. This would be very useful for simple networking applications.


From aahz at  Wed Feb  8 16:42:29 2006
From: aahz at (Aahz)
Date: Wed, 8 Feb 2006 07:42:29 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <dsbbgc$pio$>
Message-ID: <>

On Wed, Feb 08, 2006, Thomas Wouters wrote:
> Anything beyond simple bugfixes on asyncore/asynchat seems like a terrible
> waste of effort, to me. And I hardly ever use Twisted.

Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From aahz at  Wed Feb  8 16:48:39 2006
From: aahz at (Aahz)
Date: Wed, 8 Feb 2006 07:48:39 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 08, 2006, Patrick Collison wrote:
> How about `procedure', or just `proc'?


lambdas are *expected* to return a result -- procedures are functions
with side-effects that don't return a result.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From aahz at  Wed Feb  8 16:50:54 2006
From: aahz at (Aahz)
Date: Wed, 8 Feb 2006 07:50:54 -0800
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 08, 2006, Armin Rigo wrote:
> IMHO for safety reasons we need to stick double-underscores around this
> name too, e.g. __length_cue__().  It's new in 2.5 and not documented
> anyway so this change won't break anything.  Do you agree with that?

Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From bokr at  Wed Feb  8 16:54:38 2006
From: bokr at (Bengt Richter)
Date: Wed, 08 Feb 2006 15:54:38 GMT
Subject: [Python-Dev] small floating point number problem
References: <006d01c62c7d$d7bab730$1f2c4fca@csmith>
Message-ID: <>

On Wed, 08 Feb 2006 03:08:25 -0500, "Raymond Hettinger" <raymond.hettinger at> wrote:

>>I just ran into a curious behavior with small floating points, trying to 
>>find the limits of them on my machine (XP). Does anyone know why the '0.0' 
>>is showing up for one case below but not for the other? According to my 
>>tests, the smallest representable float on my machine is much smaller than 
>>1e-308: it is
>> 2.470328229206234e-325
>> but I can only create it as a product of two numbers, not directly. Here 
>> is an attempt to create the much larger 1e-308:
>>>>> a=1e-308
>>>>> a
>> 0.0
>The clue is in that the two differ by 17 orders of magnitude (325-308) which 
>is about 52 bits.
>The interpreter builds 1e-308 by using the underlying C library 
>string-to-float function and it isn't constructing numbers outside the 
>normal range for floats.  When you enter a value outside that range, the 
>function underflows it to zero.
>In contrast, your computed floats (such as .1*1e-307) return a denormal 
>result (where the significand is stored with fewer bits than normal because 
>the exponent is already at its outer limit).  That denormal result is not 
>zero and the C library float-to-string conversion successfully generates a 
>decimal string representation.
>The asymmetric handling of denormals by the atof() and ftoa() functions is 
>why you see a difference.  A consequence of that asymmetry is the breakdown 
>of the expected eval(repr(f))==f invariant:
>>>> f = f = .1*1e-307
>>>> eval(repr(f)) == f
BTW, for the OP, chasing minimum float values is probably best done with powers of 2

 >>> math.ldexp(1, -1074)
 >>> math.ldexp(1, -1075)
 >>> .5**1074
 >>> .5**1075
 >>> math.frexp(.5**1074)
 (0.5, -1073)
 >>> math.frexp(.5**1075)
 (0.0, 0)
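The same limits can be checked programmatically. The following is a minimal
sketch assuming IEEE 754 doubles and a current Python 3 (sys.float_info did
not exist when this thread was written); is_denormal is a helper name made up
for the example.

```python
import math
import sys

# Smallest normal double (2**-1022) and smallest denormal double (2**-1074).
smallest_normal = sys.float_info.min
smallest_denormal = math.ldexp(1.0, -1074)


def is_denormal(x):
    """True for nonzero values below the smallest normal double."""
    return x != 0.0 and abs(x) < smallest_normal
```

With these definitions, 1e-308 counts as a denormal while 1e-307 does not, and
math.ldexp(1.0, -1075) underflows to zero. Current CPython versions do their
own string conversion, so there a literal such as 1e-308 is expected to give a
nonzero denormal rather than 0.0.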

Bengt Richter

From gjc at  Wed Feb  8 16:47:04 2006
From: gjc at (Gustavo J. A. M. Carneiro)
Date: Wed, 08 Feb 2006 15:47:04 +0000
Subject: [Python-Dev] Python modules should link to libpython
Message-ID: <1139413624.10037.36.camel@localhost>

gjc:/usr/lib/python2.4/lib-dynload$ ldd
 => /lib/ (0x00002aaaaabcc000)
 => /lib/ (0x00002aaaaace2000)
        /lib/ (0x0000555555554000)

It seems that Python C extension modules are not linking explicitly to
libpython.  Yet, they explicitly reference symbols defined in libpython.
When libpython is loaded in a global scope all is fine.  However, when
libpython is dlopen()ed with the RTLD_LOCAL flag, python C extensions
always get undefined symbols.

  This problem happened recently with the nautilus-python package, which
installs an extension for the Nautilus file manager that allows
extensions in Python.  For performance reasons, it now opens extensions
with RTLD_LOCAL flag, thus breaking python extensions.

  Any thoughts?  Should I go ahead and open a bug report (maybe with
patch), or is this controversial?

Gustavo J. A. M. Carneiro
<gjc at> <gustavo at>
The universe is always one step beyond logic.

From Scott.Daniels at Acm.Org  Wed Feb  8 17:11:55 2006
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Wed, 08 Feb 2006 08:11:55 -0800
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <006c01c62c7d$d33926b0$1f2c4fca@csmith>
References: <>	<001301c62b59$cdfbe900$152c4fca@csmith>	<002701c62b65$844a9750$7f00a8c0@RaymondLaptop1>
Message-ID: <dsd57n$lup$>

Smith wrote:
> ... There is a problem with dividing by 'ave' if the x and y are at
> the floating point limits, but the symmetric behaving form (presented
> by Scott Daniels) will have the same problem.
Upon reflection, 'max' is probably better than averaging, and avoiding
divide is also a reasonably good idea.  Note that relative_tol < 1.0
(typically) so underflow, rather than overflow, is the issue:

     def nearby(x, y, relative_tol=1.e-5, absolute_tol=1.e-8):
         difference = abs(x - y)
         return (difference <= absolute_tol or
                 difference <= max(abs(x), abs(y)) * relative_tol)

I use <=, since "zero-tolerance" should pass equal values.
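A few sample calls may make the roles of the two tolerances clearer. The
function is repeated so the snippet runs on its own; the inputs are arbitrary
values picked for illustration.

```python
def nearby(x, y, relative_tol=1.e-5, absolute_tol=1.e-8):
    difference = abs(x - y)
    return (difference <= absolute_tol or
            difference <= max(abs(x), abs(y)) * relative_tol)


checks = [
    nearby(1.0, 1.0 + 1e-9),     # within the absolute tolerance
    nearby(1e10, 1e10 + 1.0),    # relative tolerance scales with magnitude
    nearby(0.0, 1e-9),           # absolute tolerance covers values near zero
    nearby(1.0, 1.1),            # outside both tolerances
    nearby(3.0, 3.0, 0.0, 0.0),  # zero tolerance, equal values
]
```

The comparison near zero is where the absolute tolerance matters: with only a
relative test, no nonzero value would ever be close to 0.0.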

--Scott David Daniels
scott.Daniels at Acm.Org

From ark at  Wed Feb  8 17:54:14 2006
From: ark at (Andrew Koenig)
Date: Wed, 8 Feb 2006 11:54:14 -0500
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
Message-ID: <008701c62cd0$4e1b4eb0$6402a8c0@arkdesktop>

> I'm worried about the name.  There are now exactly two names that behave
> like a special method without having the double-underscores around it.
> The first name is 'next', which is kind of fine because it's for
> iterator classes only and it's documented.  But now, consider: the
> CPython implementation can unexpectedly invoke a method on a
> user-defined iterator class, even though this method's name is not
> '__*__' and not documented as special!  That's new and that's bad.

Might I suggest that at least you consider using "hint" instead of "cue"?
I'm pretty sure that "hint" has been in use for some time, and always to
mean a value that can't be assumed to be correct but that improves
performance if it is.

For example, algorithms that insert values in balanced trees sometimes take
hint arguments that suggest where the algorithm should start searching for
the insertion point.
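As an illustration of a hint in this sense, here is a sketch of an iterator
that reports how many items it has left. It assumes the double-underscore
spelling __length_hint__ and the operator.length_hint() helper found in later
Python 3 releases, neither of which exists at the time of this thread;
Countdown is a made-up class.

```python
import operator


class Countdown:
    """Iterator yielding n-1, ..., 0 that advertises how many items remain."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Only an estimate for the consumer's benefit, e.g. to presize a list.
        return self.remaining


it = Countdown(5)
next(it)
hint = operator.length_hint(it)  # asks the iterator for its estimate
```

A consumer may use the value to preallocate, but must still cope with the
iterator producing more or fewer items than the hint suggested.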

From pedro.werneck at  Wed Feb  8 17:48:11 2006
From: pedro.werneck at (Pedro Werneck)
Date: Wed, 8 Feb 2006 14:48:11 -0200
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsd00v$tck$>
References: <> <>
Message-ID: <>

On Wed, 08 Feb 2006 15:42:39 +0100
Georg Brandl <g.brandl at> wrote:

> Neal Norwitz wrote:
> > On 2/7/06, Christopher Armstrong <radeex at> wrote:
> >> > Twisted is wonderful, powerful, rich, and very large.  Perhaps a
> >> > small subset could be carefully extracted
> >>
> >> The subject of putting (parts of) Twisted into the standard library
> >> comes up once every 6 months or so, at least on our mailing list.
> >> For all that I think asyncore is worthless, I'm still against
> >> copying Twisted into the stdlib. Or at least I'm not willing to
> >> maintain the necessary fork, and I fear the nightmares about
> >> versioning that can easily occur when you've got both standard
> >> library and third party versions of a project.
> > 
> > I wouldn't be enthusiastic about putting all of Twisted in the
> > stdlib either.  Twisted is on a different release schedule than
> > Python.  However, isn't there a relatively small core subset like
> > Alex mentioned that isn't changing much?  Could we split up those
> > components and have those live in the core, but the vast majority of
> > Twisted live outside as it does now?
> +1. This would be very useful for simple networking applications.

I have a simple library I wrote some time ago to make asynchronous TCP
servers (honeypots), and I wrote it exactly for the reasons being
discussed on this thread: the other developers were not very familiar
with Python (they were planning to use Perl on the project) and a bit
confused with asyncore. Twisted was the obvious answer, but I could not
convince them to put it in the project because of the size and the work
needed to put it in all machines they were planning to use.

I have used this library several times over the last two years. For the last
two weeks I've been using it with another project, but yesterday (a
coincidence?) I decided to reduce all of it to a single module.

It is roughly based on Twisted, the interface is similar, some parts are
a copy of Twisted code (select code, LineProtocol is a copy of twisted's
LineReceiver) but only 16k in size, everything is covered by unittests.
It's intended for servers, but client support can be added with some
effort too. Maybe it fits the needs of what is being discussed on this
thread.

It's available here:

Pedro Werneck

From guido at  Wed Feb  8 18:59:07 2006
From: guido at (Guido van Rossum)
Date: Wed, 8 Feb 2006 09:59:07 -0800
Subject: [Python-Dev] _length_cue()
In-Reply-To: <008701c62cd0$4e1b4eb0$6402a8c0@arkdesktop>
References: <>
Message-ID: <>

+1 for __length_hint__. Raymond?

On 2/8/06, Andrew Koenig <ark at> wrote:
> > I'm worried about the name.  There are now exactly two names that behave
> > like a special method without having the double-underscores around it.
> > The first name is 'next', which is kind of fine because it's for
> > iterator classes only and it's documented.  But now, consider: the
> > CPython implementation can unexpectedly invoke a method on a
> > user-defined iterator class, even though this method's name is not
> > '__*__' and not documented as special!  That's new and that's bad.
> Might I suggest that at least you consider using "hint" instead of "cue"?
> I'm pretty sure that "hint" has been in use for some time, and always to
> mean a value that can't be assumed to be correct but that improves
> performance if it is.
> For example, algorithms that insert values in balanced trees sometimes take
> hint arguments that suggest where the algorithm should start searching for
> the insertion point.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

--Guido van Rossum (home page:

From guido at  Wed Feb  8 19:07:01 2006
From: guido at (Guido van Rossum)
Date: Wed, 8 Feb 2006 10:07:01 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Patrick Collison <patrick at> wrote:
> And to think that people thought that keeping "lambda", but changing
> the name, would avoid all the heated discussion... :-)

Note that I'm not participating in any attempts to "improve" lambda.

Just about the only improvement I'd like to see is to add parentheses
around the arguments, so you'd write lambda(x, y): x**y instead of
lambda x, y: x**y.

--Guido van Rossum (home page:

From pje at  Wed Feb  8 19:16:16 2006
From: pje at (Phillip J. Eby)
Date: Wed, 08 Feb 2006 13:16:16 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

At 10:07 AM 2/8/2006 -0800, Guido van Rossum wrote:
>On 2/8/06, Patrick Collison <patrick at> wrote:
> > And to think that people thought that keeping "lambda", but changing
> > the name, would avoid all the heated discussion... :-)
>Note that I'm not participating in any attempts to "improve" lambda.
>Just about the only improvement I'd like to see is to add parentheses
>around the arguments, so you'd write lambda(x, y): x**y instead of
>lambda x, y: x**y.

lambda(x,y) looks like a function call until you hit the ':'; we don't 
usually have keywords that work that way.

How about (lambda x,y: x**y)?  It seems like all the recently added 
constructs (conditionals, yield expressions, generator expressions) take on 
this rather lisp-y look.  :)

Or, if you wanted to eliminate the "lambda" keyword, then "(from x,y return 
x**y)" could be a "function expression", and it looks even more like most 
of the recently-added expression constructs.

Well, actually, I guess to mirror the style of conditionals and genexps 
more closely, it would have to be something like "(return x**y from x,y)" 
or "(x**y from x,y)".

Ugh.  Never mind, let's just leave it the way it is today.  :)

From martin at  Wed Feb  8 19:21:51 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 19:21:51 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Thomas Heller wrote:
> I'm not sure the platform SDK include files (.H and .IDL) are really
> compatible with VC7.1.  I remember that we (at our company, building C++
> software) had to 'Unregister the PSDK Directories with Visual Studio'
> (available from the start menu) before building the stuff, otherwise
> there were compiler errors.

This needs some testing, sure. However, I'm fairly confident that
Microsoft has fixed/is going to fix whatever issues arise - they
want the platform SDK to be usable with a "recent" compiler (not
necessarily the latest one). There was a recent update to the platform
SDK (which now comes with both Itanium and AMD64 compilers), so
I'm (still) optimistic.


From ldlandis at  Wed Feb  8 18:56:07 2006
From: ldlandis at (LD 'Gus' Landis)
Date: Wed, 8 Feb 2006 11:56:07 -0600
Subject: [Python-Dev] _length_cue()
In-Reply-To: <008701c62cd0$4e1b4eb0$6402a8c0@arkdesktop>
References: <>
Message-ID: <>

+1 on 'hint' vs 'cue'... also implies 'not definitive' (sort of like having a
    hint of how much longer the "honey do" list is... the honey do list is
    never 'exhaustive', only exhausting! ;-)

On 2/8/06, Andrew Koenig <ark at> wrote:
> > I'm worried about the name.  There are now exactly two names that behave
> > like a special method without having the double-underscores around it.
> > The first name is 'next', which is kind of fine because it's for
> > iterator classes only and it's documented.  But now, consider: the
> > CPython implementation can unexpectedly invoke a method on a
> > user-defined iterator class, even though this method's name is not
> > '__*__' and not documented as special!  That's new and that's bad.
> Might I suggest that at least you consider using "hint" instead of "cue"?
> I'm pretty sure that "hint" has been in use for some time, and always to
> mean a value that can't be assumed to be correct but that improves
> performance if it is.
> For example, algorithms that insert values in balanced trees sometimes take
> hint arguments that suggest where the algorithm should start searching for
> the insertion point.

LD Landis - N0YRQ - from the St Paul side of Minneapolis

From fumanchu at  Wed Feb  8 19:24:35 2006
From: fumanchu at (Robert Brewer)
Date: Wed, 8 Feb 2006 10:24:35 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
Message-ID: <6949EC6CD39F97498A57E0FA55295B210179086B@ex9.hostedexchange.local>

Barry Warsaw wrote:
> On Tue, 2006-02-07 at 16:01 -0800, Robert Brewer wrote:
> > Perhaps, but please keep in mind that the smtpd module uses 
> > both, currently, and would have to be rewritten if either is 
> > "removed".
> Would that really be a huge loss?

It'd be a huge loss for the random fellow who needs to write an email
fixup proxy between a broken client and Exim in a couple of hours. ;)
But I can't speak for how often this need comes up among users.

Robert Brewer
System Architect
Amor Ministries
fumanchu at

From ronaldoussoren at  Wed Feb  8 19:33:31 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Wed, 8 Feb 2006 19:33:31 +0100
Subject: [Python-Dev] Python modules should link to libpython
In-Reply-To: <1139413624.10037.36.camel@localhost>
References: <1139413624.10037.36.camel@localhost>
Message-ID: <>

On 8-feb-2006, at 16:47, Gustavo J. A. M. Carneiro wrote:

> gjc:/usr/lib/python2.4/lib-dynload$ ldd
> => /lib/ (0x00002aaaaabcc000)
> => /lib/ (0x00002aaaaace2000)
>         /lib/ (0x0000555555554000)
> gjc:/usr/lib/python2.4/lib-dynload$
> It seems that Python C extension modules are not linking explicitly to
> libpython.  Yet, they explicitly reference symbols defined in  
> libpython.
> When libpython is loaded in a global scope all is fine.  However, when
> libpython is dlopen()ed with the RTLD_LOCAL flag, python C extensions
> always get undefined symbols.
>   This problem happened recently with the nautilus-python package,  
> which
> installs an extension for the Nautilus file manager that allows
> extensions in Python.  For performance reasons, it now opens  
> extensions
> with RTLD_LOCAL flag, thus breaking python extensions.
>   Any thoughts?  Should I go ahead and open a bug report (maybe with
> patch), or is this controversial?

I don't know about Linux, but on OSX we don't link with libpython
(or Python.framework) on purpose: this allows you to share extensions
between several builds of the same version of Python. If you do link with
libpython, an extension that was compiled by a Python installed at a
different location will result in having two copies of libpython in memory,
only one of which is initialized. You end up with very interesting crashes.


> -- 
> Gustavo J. A. M. Carneiro
> <gjc at> <gustavo at>
> The universe is always one step beyond logic.


From martin at  Wed Feb  8 19:38:42 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 19:38:42 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <031401c62cbb$61810630$bf03030a@trilan>
References: <>
Message-ID: <>

Giovanni Bajo wrote:
>>I just came up with an idea how to resolve the VC versioning
>>problems for good: Python should link with msvcrt.dll (which
>>is part of the operating system), not with the CRT that the
>>compiler provides.
> Can you elaborate exactly on which versioning problems you think of?

I could, but I don't want to elaborate too much. Please google for
it - a lot has been written about it.

In short, you cannot really link two different versions of msvcrt
(e.g. msvcrt.dll, msvcrt40.dll, msvcr70.dll, msvcr71.dll, msvcrtd.dll,
msvcr71d.dll, ...) into a single program, plus you cannot redistribute
the CRT unless you are a Visual Studio licensee. This causes problems
for extension writers: they need to own the same version of Visual
Studio that Python was built with.

> It would complicate the build process and make Python lag behind CRT
> development (including bugfixes and whatnot) that Microsoft does.

There isn't really too much development in the CRT, and the little
development I can see (e.g. in VS 2005) is rather counter-productive.
So ideally, Python should drop usage of the CRT entirely (but getting
there will be a long process). Hopefully, P3k will drop usage of
stdio for file objects, which will be a big step forward.

> You could
> as well ask to always stick with GCC 2.95 to solve ABI problems, but I don't
> think it's the correct long time solution. I expect more and more Windows
> libraries (binary version) to be shipped with dependencies on MSVCR71.DLL.

Now that VS2005 is out, I doubt that. More and more will also depend on
msvcr80.dll. Then, when the next visual studio comes out, you can
(probably) add msvcr81.dll to the list of libraries that might be used.
This will go on forever, and we cannot win.

It's really not using GCC 2.95 which I'm after. It's using /lib/
that I want to. People should be free to use whatever compiler they have
access to.


From martin at  Wed Feb  8 19:43:48 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 19:43:48 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsc6sa$85k$>
References: <>	<>	<>	<>	<>
Message-ID: <>

Steve Holden wrote:
> In case the processing of events needed to block? If I'm processing web 
> requests in an async* dispatch loop and a request needs the results of a 
> (probably lengthy) database query in order to generate its output, how 
> do I give the dispatcher control again to process the next asynchronous 
> network event?

I see. Ideally, you should obtain the socket for the connection to
the database, and add it to the asyncore loop. That would require
that you have an async database API, of course.
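
A minimal sketch of that idea, under stated assumptions: asyncore has since
been removed from the standard library (Python 3.12), so this uses the
selectors module instead, and socket.socketpair() merely stands in for a
real client connection and a real database connection. The labels and
payloads are made up for illustration. The point is only that one loop
waits on both sockets, so a slow database reply does not block other
network events.

```python
import selectors
import socket

# socketpair() stands in for two real connections (an assumption for the demo):
# one to a web client, one to a database server.
client_side, client_peer = socket.socketpair()
db_side, db_peer = socket.socketpair()

sel = selectors.DefaultSelector()
sel.register(client_side, selectors.EVENT_READ, data="web client")
sel.register(db_side, selectors.EVENT_READ, data="database")

# Simulate traffic arriving on both connections.
client_peer.sendall(b"GET / HTTP/1.0")
db_peer.sendall(b"query result")

# A single dispatch loop services whichever socket becomes readable.
seen = {}
for _ in range(10):
    for key, _mask in sel.select(timeout=1.0):
        seen[key.data] = key.fileobj.recv(1024)
    if len(seen) == 2:
        break

sel.close()
for s in (client_side, client_peer, db_side, db_peer):
    s.close()
```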


From barry at  Wed Feb  8 19:45:55 2006
From: barry at (Barry Warsaw)
Date: Wed, 08 Feb 2006 13:45:55 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B210179086B@ex9.hostedexchange.local>
References: <6949EC6CD39F97498A57E0FA55295B210179086B@ex9.hostedexchange.local>
Message-ID: <>

On Wed, 2006-02-08 at 10:24 -0800, Robert Brewer wrote:

> It'd be a huge loss for the random fellow who needs to write an email
> fixup proxy between a broken client and Exim in a couple of hours. ;)

Or the guy who needs to whip together an RFC-compliant minimal SMTP
server to use in unit tests of some random Python implemented mailing
list manager.  Just fer instance.  But still...

> But I can't speak for how often this need comes up among users.

Yeah, there is that. ;)



From guido at  Wed Feb  8 19:49:14 2006
From: guido at (Guido van Rossum)
Date: Wed, 8 Feb 2006 10:49:14 -0800
Subject: [Python-Dev] Old Style Classes Goiung in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Fuzzyman <fuzzyman at> wrote:
> I understand that old style classes are slated to disappear in Python 3000.
> Does this mean that the following will be a syntax error :
> class Something:
>     pass
> *or* that instead it will automatically inherit from object ?

The latter of course. I never even considered making this illegal. :-)
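
For reference, this is how it behaves in Python 3 as eventually released: a
class statement with no bases is not an error and the class inherits from
object. A minimal sketch:

```python
class Something:
    pass

# With no bases listed, the class still derives from object.
bases = Something.__bases__
is_new_style = issubclass(Something, object)
```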

--Guido van Rossum (home page:

From brett at  Wed Feb  8 19:49:22 2006
From: brett at (Brett Cannon)
Date: Wed, 8 Feb 2006 10:49:22 -0800
Subject: [Python-Dev] Old Style Classes Goiung in Py3K
In-Reply-To: <dscvr0$tck$>
References: <> <dscvr0$tck$>
Message-ID: <>

On 2/8/06, Georg Brandl <g.brandl at> wrote:
> Fuzzyman wrote:
> > Hello all,
> >
> > I understand that old style classes are slated to disappear in Python 3000.
> >
> > Does this mean that the following will be a syntax error :
> >
> > class Something:
> >     pass
> >
> > *or* that instead it will automatically inherit from object ?
> Of course, I would say. There's no reason to forbid this in Py3k.

And you would be right.  Guido has always said that classes would act
as if they inherited from object by default.  There are no plans to
change the syntax of how you can specify inheritance in Python 3.  All
that is changing is what the default is when you specify no


From martin at  Wed Feb  8 19:55:38 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 19:55:38 +0100
Subject: [Python-Dev] Python modules should link to libpython
In-Reply-To: <1139413624.10037.36.camel@localhost>
References: <1139413624.10037.36.camel@localhost>
Message-ID: <>

Gustavo J. A. M. Carneiro wrote:
>   Any thoughts?  Should I go ahead and open a bug report (maybe with
> patch), or is this controversial?

You should only link with libpython if there really is a shared
libpython. In a standard Python installation, there is no libpython, but
instead, symbols are in the executable.
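
A minimal sketch of checking, from Python itself, whether the running
interpreter was built with a shared libpython. It assumes a Unix-like build
where the Py_ENABLE_SHARED configuration variable is recorded; on other
platforms the variable may be unset, which this helper (a hypothetical
name) treats as "not shared".

```python
import sysconfig

def has_shared_libpython():
    """Return True if the build configuration reports a shared libpython."""
    # Py_ENABLE_SHARED is typically 1 for --enable-shared builds, 0 otherwise,
    # and may be None where the build does not record it.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))

shared = has_shared_libpython()
```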

Notice that libpython isn't really supported: all changes to that code
originate from contributions, and I refuse to develop changes to it
myself. So you can file a bug report, but there likely won't be any
reaction in the next few years (at least not from me).

OTOH, if a working patch was contributed, I could apply that fairly
quickly: I agree that modules should link with libpython if libpython
is shared.

I can accept that the Mac does it differently, although I think the
rationale for doing that is dangerous: you shouldn't really attempt
to share extension modules across Python versions.


From brett at  Wed Feb  8 19:58:46 2006
From: brett at (Brett Cannon)
Date: Wed, 8 Feb 2006 10:58:46 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Phillip J. Eby <pje at> wrote:
> At 10:07 AM 2/8/2006 -0800, Guido van Rossum wrote:
> >On 2/8/06, Patrick Collison <patrick at> wrote:
> > > And to think that people thought that keeping "lambda", but changing
> > > the name, would avoid all the heated discussion... :-)
> >
> >Note that I'm not participating in any attempts to "improve" lambda.
> >
> >Just about the only improvement I'd like to see is to add parentheses
> >around the arguments, so you'd write lambda(x, y): x**y instead of
> >lambda x, y: x**y.
> lambda(x,y) looks like a function call until you hit the ':'; we don't
> usually have keywords that work that way.

I agree with Phillip.  Making it look more like a function definition,
I think, is a bad move to make.  The thing is quirky as-is, let's not
partially mask that fact.

> How about (lambda x,y: x**y)?  It seems like all the recently added
> constructs (conditionals, yield expressions, generator expressions) take on
> this rather lisp-y look.  :)
> Or, if you wanted to eliminate the "lambda" keyword, then "(from x,y return
> x**y)" could be a "function expression", and it looks even more like most
> of the recently-added expression constructs.
> Well, actually, I guess to mirror the style of conditionals and genexps
> more closely, it would have to be something like "(return x**y from x,y)"
> or "(x**y from x,y)".
> Ugh.  Never mind, let's just leave it the way it is today.  :)

``(use x, y, in x**y)`` is the best I can think of off the top of my
head.  But if Guido is not budging on tweaking lambda in any way other
than parentheses, then I say just leave the busted thing as it is and
let it be the wart that was never removed.


From ronaldoussoren at  Wed Feb  8 20:02:40 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Wed, 8 Feb 2006 20:02:40 +0100
Subject: [Python-Dev] Python modules should link to libpython
In-Reply-To: <>
References: <1139413624.10037.36.camel@localhost> <>
Message-ID: <>

On 8-feb-2006, at 19:55, Martin v. Löwis wrote:

> Gustavo J. A. M. Carneiro wrote:
>>   Any thoughts?  Should I go ahead and open a bug report (maybe with
>> patch), or is this controversial?
> I can accept that the Mac does it differently, although I think the
> rationale for doing that is dangerous: you shouldn't really attempt
> to share extension modules across Python versions.

My explanation seems to be bad, I meant to say sharing extensions across
different builds of the same Python version. One might install a normal
unix build in /opt/python and a framework build in /Library/Frameworks.

This is not as important now as it was when Python 2.3.x was state of the
art, then you could have a python 2.3.x framework both in
/System/Library/Frameworks (provided by Apple) and in /Library/Frameworks
(build yourself or downloaded the official MacPython binaries). Those would
share the same site-packages directory (/Library/Python/2.3).



From tim.peters at  Wed Feb  8 20:07:58 2006
From: tim.peters at (Tim Peters)
Date: Wed, 8 Feb 2006 14:07:58 -0500
Subject: [Python-Dev] small floating point number problem
In-Reply-To: <001c01c62c86$d694beb0$b83efea9@RaymondLaptop1>
References: <006d01c62c7d$d7bab730$1f2c4fca@csmith>
Message-ID: <>

[Raymond Hettinger]
> ...
> The asymmetric handling of denormals by the atof() and ftoa() functions is
> why you see a difference.  A consequence of that asymmetry is the breakdown
> of the expected eval(repr(f))==f invariant:

Just noting that such behavior is a violation of the 754 standard for
string->double conversion.  But Microsoft's libraries don't _claim_ to
support the 754 standard, so good luck suing them ;-).  Python doesn't
promise anything here either.
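
A minimal sketch of the invariant under discussion. It uses float(repr(f))
rather than eval(), and the helper name is hypothetical. On current CPython
builds, which use correctly rounded string conversions, the round trip is
expected to hold even for the smallest denormal; the 2006 Windows behaviour
described above is exactly the case where it did not.

```python
import sys

def round_trips(f):
    """Check the repr/float round-trip invariant for one value."""
    return float(repr(f)) == f

smallest_denormal = 2.0 ** -1074          # smallest positive subnormal double
smallest_normal = sys.float_info.min      # smallest positive normal double

checks = [round_trips(x) for x in (smallest_denormal, smallest_normal, 0.1)]
```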

From raymond.hettinger at  Wed Feb  8 20:16:10 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 08 Feb 2006 14:16:10 -0500
Subject: [Python-Dev] Let's just *keep* lambda
References: <><>
Message-ID: <009401c62ce4$1f139320$b83efea9@RaymondLaptop1>

> How about (lambda x,y: x**y)?

The purpose of this thread was to conserve brain-power by bringing the issue 
to a close.  Instead, it is turning into a syntax/renaming fest.  May I 
suggest that this be moved to comp.lang.python and return only if a 
community consensus emerges from the thousands of random variants?


From bob at  Wed Feb  8 20:17:07 2006
From: bob at (Bob Ippolito)
Date: Wed, 8 Feb 2006 11:17:07 -0800
Subject: [Python-Dev] Python modules should link to libpython
In-Reply-To: <>
References: <1139413624.10037.36.camel@localhost> <>
Message-ID: <>

On Feb 8, 2006, at 11:02 AM, Ronald Oussoren wrote:

> On 8-feb-2006, at 19:55, Martin v. Löwis wrote:
>> Gustavo J. A. M. Carneiro wrote:
>>>   Any thoughts?  Should I go ahead and open a bug report (maybe with
>>> patch), or is this controversial?
>> I can accept that the Mac does it differently, although I think the
>> rationale for doing that is dangerous: you shouldn't really attempt
>> to share extension modules across Python versions.
> My explanation seems to be bad, I meant to say sharing extensions across
> different builds of the same Python version. One might install a normal
> unix build in /opt/python and a framework build in /Library/Frameworks.
> This is not as important now as it was when Python 2.3.x was state of the
> art, then you could have a python 2.3.x framework both in
> /System/Library/Frameworks (provided by Apple) and in /Library/Frameworks
> (build yourself or downloaded the official MacPython binaries). Those would
> share the same site-packages directory (/Library/Python/2.3).

They never shared the same site-packages directory...  The major  
reason we use -undefined dynamic_lookup rather than linking directly  
to a particular Python is so that the framework can be moved around  
without everything going to hell.. e.g., to the inside of an  
application bundle.  At the time, we didn't have any tools that could  
do the Mach-O header rewriting that py2app does now.  There isn't a  
whole lot of reason to use -undefined dynamic_lookup these days, but  
there also aren't many compelling reasons to go back to direct linking.

There is one use case that direct linking would support: having  
multiple distinct Python interpreters in the same process space,  
which could be useful for writing plug-ins to applications that are  
not Python based... Other than that, there's little reason to bother  
with it.


From martin at  Wed Feb  8 20:18:40 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 20:18:40 +0100
Subject: [Python-Dev] Python modules should link to libpython
In-Reply-To: <>
References: <1139413624.10037.36.camel@localhost> <>
Message-ID: <>

Ronald Oussoren wrote:
> My explanation seems to be bad, I meant to say sharing extensions across
> different builds of the same Python version. One might install a normal
> unix build in /opt/python and a framework build in /Library/Frameworks.

Sorry, I didn't read your message carefully enough. This isn't a problem
in Unix/ELF: you (normally) only put the name of the library into the
resulting executable/library, not the absolute path. You then use
the library search path (system-defined or LD_LIBRARY_PATH) to find the


From fumanchu at  Wed Feb  8 20:20:23 2006
From: fumanchu at (Robert Brewer)
Date: Wed, 8 Feb 2006 11:20:23 -0800
Subject: [Python-Dev] Let's just *keep* lambda
Message-ID: <6949EC6CD39F97498A57E0FA55295B21017909B9@ex9.hostedexchange.local>

Raymond Hettinger wrote:
> > How about (lambda x,y: x**y)?
> The purpose of this thread was to conserve brain-power by 
> bringing the issue to a close.  Instead, it is turning into
> syntax/renaming fest.  May I suggest that this be moved to
> comp.lang.python and return only if a community consensus
> emerges from the thousands of random variants?

I'd like to suggest this be moved to comp.lang.python and never return.
Community consensus on syntax is a pipe dream.

Robert Brewer
System Architect
Amor Ministries
fumanchu at

From steve at  Wed Feb  8 20:28:25 2006
From: steve at (Steve Holden)
Date: Wed, 08 Feb 2006 14:28:25 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <009401c62ce4$1f139320$b83efea9@RaymondLaptop1>
References: <><>	<>
Message-ID: <dsdgop$85b$>

Raymond Hettinger wrote:
>>How about (lambda x,y: x**y)?
> The purpose of this thread was to conserve brain-power by bringing the issue 
> to a close.  Instead, it is turning into syntax/renaming fest.  May I 
> suggest that this be moved to comp.lang.python and return only if a 
> community consensus emerges from the thousands of random variants?
Right, then we can get back to important stuff like how to represent 
octal constants.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From keith at  Wed Feb  8 20:45:48 2006
From: keith at (Keith Dart)
Date: Wed, 8 Feb 2006 11:45:48 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <6949EC6CD39F97498A57E0FA55295B210179086B@ex9.hostedexchange.local>
Message-ID: <>

Barry Warsaw wrote the following on 2006-02-08 at 13:45 PST:
> Or the guy who needs to whip together an RFC-compliant minimal SMTP
> server to use in unit tests of some random Python implemented mailing
> list manager.  Just fer instance.  But still...
> > But I can't speak for how often this need comes up among users.
> Yeah, there is that. ;)

There are other, third-party, SMTP server objects available. You could
always use one of those. 

Once the "Python egg" and PyPI improve and start widespread use perhaps
the question of what is in the core library and what is not will become

Being a Gentoo Linux user I already enjoy having many modules
available, with automatic dependency installation, on demand. So the
idea of "core" library is already blurred for me.


-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Keith Dart <keith at>
   public key: ID: 19017044

From keith at  Wed Feb  8 21:00:17 2006
From: keith at (Keith Dart)
Date: Wed, 8 Feb 2006 12:00:17 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote the following on 2006-02-08 at 10:07 PST:
> Note that I'm not participating in any attempts to "improve" lambda.


FWIW, I like lambda. No need to change it. Thank you.


-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Keith Dart <keith at>
   public key: ID: 19017044

From raymond.hettinger at  Wed Feb  8 21:02:21 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 08 Feb 2006 15:02:21 -0500
Subject: [Python-Dev] _length_cue()
References: <>
Message-ID: <00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1>

[Armin Rigo]
> It was replaced by an optional undocumented method called _length_cue(),
> which would be used to guess the number of remaining items in an
> iterator, for performance reasons.
> I'm worried about the name.  There are now exactly two names that behave
> like a special method without having the double-underscores around it.
> IMHO for safety reasons we need to stick double-underscores around this
> name too, e.g. __length_cue__().

The single underscore was meant to communicate that this method is private 
(which is why it is undocumented).  Accordingly, the dir() function is smart 
enough to omit the method from its listing (which is a good thing).

We follow similar naming conventions in pure Python library code.  OTOH, 
this one is a bit different in that it is not truly private; rather, it is 
more like a friend method used internally for various tools to be able to 
communicate with each other.  If you change to a double underscore 
convention, you're essentially making this a public protocol.

IMHO, the "safety reasons" are imaginary -- the scenario would involve 
subclassing one of these builtin objects and attaching an identically named 
private method.

All that being said, I don't feel strongly about it and you guys are welcome 
to change it if it offends your naming convention sensibilities.

[Andrew Koenig]
> Might I suggest that at least you consider using "hint" instead of "cue"?

Personally, I prefer "cue" which my dictionary defines as "a signal, hint, 
or suggestion".   The alternate definition of "a prompt for some action" 
applies equally well.

Also, to my ear, length_hint doesn't sound right.

I'm -0 on changing the name.  If you must, then go ahead.

[Armin Rigo]
> BTW the reason I'm looking at this is that I'm considering adding
> another undocumented internal-use-only method, maybe __getitem_cue__(),
> that would try to guess what the nth item to be returned will be.  This
> would allow the repr of some iterators to display more helpful
> information when playing around with them at the prompt, e.g.:
> >>> enumerate([3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141596])
> <enumerate (0, 3.1), (1, 3.14), (2, 3.141),... length 6>

At one point, I explored and then abandoned this idea.  For objects like 
itertools.count(n), it worked fine -- the state was readily knowable and the 
eval(repr(obj)) round-trip was possible.  However, for tools like 
enumerate(), it didn't make sense to have a preview that only applied in a 
tiny handful of (mostly academic) cases and was not evaluable in any case.

I was really attracted to the idea of having more informative iterator 
representations but learned that even when it could be done, it wasn't 
especially useful.  When someone creates an iterator at the interactive 
prompt, they almost always either wrap it in a consumer function or they 
assign it to a variable.  The case of typing just, "enumerate([1,2,3])", 
comes up only once, when first learning what enumerate() does.
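
The contrast between the two cases can be seen directly in current Python.
A minimal sketch: itertools.count exposes its whole state in its repr, so
the representation can be evaluated back into an equivalent iterator, while
enumerate only reports an opaque object.

```python
import itertools

counter = itertools.count(5)
counter_repr = repr(counter)              # the full state is visible here

# Evaluating the repr gives a fresh iterator that starts at the same value.
rebuilt = eval(counter_repr, {"count": itertools.count})
first_three = [next(rebuilt) for _ in range(3)]

# enumerate() gives no preview of its contents, only a generic object repr.
enum_repr = repr(enumerate([1, 2, 3]))
```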


From barry at  Wed Feb  8 21:08:08 2006
From: barry at (Barry Warsaw)
Date: Wed, 08 Feb 2006 15:08:08 -0500
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <6949EC6CD39F97498A57E0FA55295B210179086B@ex9.hostedexchange.local>
Message-ID: <>

On Wed, 2006-02-08 at 11:45 -0800, Keith Dart wrote:

> There are other, third-party, SMTP server objects available. You could
> always use one of those. 

Very true.  In fact, Twisted comes to the rescue again here.  When I
needed to test Mailman's NNTP integration I could either spend several
weeks figuring out how to install and configure some traditional NNTP
server, or I could just install Twisted and run exactly three commands
(one of which was "sudo" :).

> Once the "Python egg" and PyPI improve and start widespread use perhaps
> the question of what is in the core library and what is not will become
> moot.


> Being a Gentoo Linux user I already enjoy having many modules
> available, with automatic dependency installation, on demand. So the
> idea of "core" library is already blurred for me.

Although I'm doing a lot more dev on the Mac these days, I definitely
agree that this is what makes Gentoo so cool for Linux, and I can't wait
for Gentoo-on-OSX to switch to doing things the Right Way (can you say
bye-bye DarwinPorts?).



From ateijelo at  Wed Feb  8 17:10:45 2006
From: ateijelo at (Andy Teijelo =?iso-8859-1?q?P=E9rez?=)
Date: Wed, 8 Feb 2006 11:10:45 -0500
Subject: [Python-Dev] Path PEP: some comments
In-Reply-To: <030001c629c2$30345c90$bf03030a@trilan>
References: <030001c629c2$30345c90$bf03030a@trilan>
Message-ID: <>

On Saturday, 4 February 2006 2:35, Giovanni Bajo wrote:
> Hello,
> my comments on the Path PEP:
> - Many methods contain the word 'path' in them. I suppose this is to help
> transition from the old library to the new library. But in the context of a
> new Python user, I don't think that Path.abspath() is optimal. Path.abs()
> looks better. Maybe it's not so fundamental to have exactly the same names
> of the old library, especially when thinking of future? If I rearrange my
> code to use Path, I can as well rename methods to something more sound at
> the same time.
I haven't reviewed the whole class to see exactly which methods contain the 
word path and which do not, so this is just a simple comment. It's 
clear to me that Path.abspath() looks redundant and that Path.abs() says 
clearly what the method does. But I think in most cases the method won't be 
called through the class, like 'Path.abs(instance)', but through an existing 
instance, like 'home.abs()'. In this case, I think 'home.abspath()' would be 
more readable than 'home.abs()'. Anyway, in the long term, I think people will 
just get used to whatever is finally decided, so I could say I'm +0 about this. 
(Does one have to be a python developer or something to use the {+,-}{0,1} 
thing? 'Cause I'm not.)
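
To make the class-versus-instance point concrete, here is a minimal sketch
using today's standard pathlib module. This is an assumption for
illustration only: pathlib is not the Path class proposed in the PEP under
discussion, and its method is named absolute(), not abs() or abspath().

```python
from pathlib import Path

home = Path("some") / "relative" / "dir"

# Called through the class, the method name has to carry all the meaning.
via_class = Path.absolute(home)

# Called through an instance, the variable name supplies the context.
via_instance = home.absolute()
```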


From steven.bethard at  Wed Feb  8 22:16:15 2006
From: steven.bethard at (Steven Bethard)
Date: Wed, 8 Feb 2006 14:16:15 -0700
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B21017909B9@ex9.hostedexchange.local>
References: <6949EC6CD39F97498A57E0FA55295B21017909B9@ex9.hostedexchange.local>
Message-ID: <>

Robert Brewer wrote:
> Community consensus on syntax is a pipe dream.


And trust me, it'll be in there, since I'm one of the summary writers. ;-)

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From joao.macaiba at  Wed Feb  8 22:31:27 2006
From: joao.macaiba at (Joao Macaiba)
Date: Wed, 08 Feb 2006 19:31:27 -0200
Subject: [Python-Dev] Help on choosing a PEP to volunteer on it : 308,
	328 or 343
Message-ID: <>

Hi. I'm interested in doing an undergraduate project under some Python 
core PEP.

I'm newbie to Python core. Program in C/C++.

I've downloaded the sources with svn and now I'm studying it.

There are 3 PEP accepted :

. 308 : Conditional Expressions

. 328 : Imports: Multi-Line and Absolute/Relative

. 343 : The "with" Statement

I've some questions :

1. For a newbie in the Python core development, what is the best PEP to 
begin with ?

2. PEP's "owner" is the one who submitted the proposal or the one who is 
working on it;

3. How do we know what are the developers working on the PEP ?

Thanks in advance.

Joao Macaiba (wavefunction).

From brett at  Wed Feb  8 22:39:34 2006
From: brett at (Brett Cannon)
Date: Wed, 8 Feb 2006 13:39:34 -0800
Subject: [Python-Dev] Help on choosing a PEP to volunteer on it : 308,
	328 or 343
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Joao Macaiba <joao.macaiba at> wrote:
> Hi. I'm interested in doing an undergraduate project under some Python
> core PEP.
> I'm newbie to Python core. Program in C/C++.
> I've downloaded the sources with svn and now I'm studying it.
> There are 3 PEP accepted :
> . 308 : Conditional Expressions
> . 328 : Imports: Multi-Line and Absolute/Relative
> . 343 : The "with" Statement
> I've some questions :
> 1. For a newbie in the Python core development, what is the best PEP to
> begin with ?

Wild guess?  308, but that still requires changing the grammar and
editing the AST compiler.  328 will need playing with the import code
which is known to be hairy.  343 has the same needs as 308, but I bet
would be more complicated.
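
For orientation, a minimal sketch of what two of these PEPs look like from
the user's side in the syntax that was eventually released: a PEP 308
conditional expression and a PEP 343 "with" statement. The helper names are
hypothetical, and PEP 328 is left out because relative imports need a
package layout to demonstrate.

```python
from contextlib import contextmanager

events = []

@contextmanager
def traced(name):
    """Record entry and exit, as a stand-in for acquiring a resource."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)

def sign_label(n):
    # PEP 308: value_if_true if condition else value_if_false
    return "negative" if n < 0 else "non-negative"

# PEP 343: the block runs between the manager's entry and exit steps.
with traced("demo"):
    events.append(sign_label(-1))
```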

> 2. PEP's "owner" is the one who submitted the proposal or the one who is
> working on it;

Technically it is the person who drew up the proposal and agreed to
carry it through.  Usually, though, they are also the ones willing to
implement it (or at least make sure that happens).

> 3. How do we know what are the developers working on the PEP ?

You ask just like you are.  =)  Otherwise you just have to listen on
python-dev for anyone to mention they are working on it.


From martin at  Wed Feb  8 22:45:54 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 08 Feb 2006 22:45:54 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:
> To do that, we would need to compile and link with the SDK
> header files and import libraries, not with the ones that
> visual studio provides.

I withdraw that idea. It appears that the platform SDK doesn't
(any longer?) provide an import library for msvcrt.dll, and
Microsoft documents msvcrt as intended only for "system


From raymond.hettinger at  Wed Feb  8 22:58:01 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 08 Feb 2006 16:58:01 -0500
Subject: [Python-Dev] Help on choosing a PEP to volunteer on it : 308,
 328 or 343
References: <>
Message-ID: <001f01c62cfa$bba337c0$b83efea9@RaymondLaptop1>

[Joao Macaiba]
> 1. For a newbie in the Python core development, what is the best PEP to 
> begin with ?

I recommend, PEP 308: Conditional Expressions


From nyamatongwe at  Wed Feb  8 23:38:11 2006
From: nyamatongwe at (Neil Hodgson)
Date: Thu, 9 Feb 2006 09:38:11 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis:

> So ideally, Python should drop usage of the CRT entirely (but getting
> there will be a long process). Hopefully, P3k will drop usage of
> stdio for file objects, which will be a big step forward.

  You don't need to drop the CRT, just encapsulate it so there is one
copy controlled by Python that hands out wrapped objects (file
handles, file pointers, memory blocks, others?). These wrappers can
only be manipulated through calls back to that owning code that then
calls the CRT. Unfortunately this change would itself be incompatible
with current extensions.


From arigo at  Thu Feb  9 00:51:56 2006
From: arigo at (Armin Rigo)
Date: Thu, 9 Feb 2006 00:51:56 +0100
Subject: [Python-Dev] _length_cue()
In-Reply-To: <00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Hi Raymond,

On Wed, Feb 08, 2006 at 03:02:21PM -0500, Raymond Hettinger wrote:
> IMHO, the "safety reasons" are imaginary -- the scenario would involve 
> subclassing one of these builtin objects and attaching an identically named 
> private method.

No, the scenario applies to any user-defined iterator class, not
necessarily one subclassing an existing one:

>>> class MyIter(object):
...     def __iter__(self):
...         return self
...     def next(self):
...         return whatever
...     def _length_cue(self):
...         print "oups! please, CPython, don't call me unexpectedly"
>>> list(MyIter())
oups! please, CPython, don't call me unexpectedly

This means that _length_cue() is at the moment a special method, in the
sense that Python can invoke it implicitly.

This said, do we vote for __length_hint__ or __length_cue__? :-)
And does anyone object to __getitem_hint__ or __getitem_cue__?
Maybe __lookahead_hint__ or __lookahead_cue__?
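
For readers following this today: the double-underscore spelling with
"hint" is the one later standardized as __length_hint__ (PEP 424), with
operator.length_hint() as the public way to query it. A minimal sketch with
a hypothetical iterator class, written for current Python rather than the
2006 interpreter discussed here:

```python
import operator

class Countdown:
    """Iterator yielding start-1, ..., 0 that advertises its remaining length."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Consumers such as list() may call this implicitly to presize storage.
        return self.remaining

it = Countdown(3)
hint_before = operator.length_hint(it)
items = list(it)
hint_after = operator.length_hint(it)
```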


From thomas at  Thu Feb  9 01:08:01 2006
From: thomas at (Thomas Wouters)
Date: Thu, 9 Feb 2006 01:08:01 +0100
Subject: [Python-Dev] Help on choosing a PEP to volunteer on it : 308,
	328 or 343
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 08, 2006 at 01:39:34PM -0800, Brett Cannon wrote:
> On 2/8/06, Joao Macaiba <joao.macaiba at> wrote:

> > 1. For a newbie in the Python core development, what is the best PEP to
> > begin with ?

> Wild guess?  308, but that still requires changing the grammar and
> editing the AST compiler.  328 will need playing with the import code
> which is known to be hairy.  343 has the same needs as 308, but I bet
> would be more complicated.

Joao brought up an interesting point on #python on freenode, though... Is
there any documentation regarding the AST code? I started fiddling with it
just to get to know it, adding some weird syntax just for the hell of it,
and I *think* I understand how the AST is supposed to work. I haven't gotten
around to actually coding it, though (just like I haven't gotten around to
PEP 13 ;) so maybe I have it all wrong. A short description of the
principles and design choices would be nice, maybe with a paragraph on how
to add new syntax constructs. How tightly should the AST follow the grammar,
for instance?

(I pointed Joao to the augmented assignment patch for 2.0, which doesn't say
anything about the AST but should be helpful hints in his quest to
understand Python's internals. Lord knows that's how I learned it... By the
time he groks it all, hopefully someone can help him with the AST parts ;)
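
As a rough illustration of how closely the AST mirrors surface syntax, here
is a minimal sketch using the ast module available in current Python. That
module is an assumption relative to this thread, which concerns the C-level
AST compiler; the node names shown are those of the modern module.

```python
import ast

# A conditional expression parses to a single IfExp node with three children.
tree = ast.parse("a if cond else b", mode="eval")
node = tree.body
parts = (node.test.id, node.body.id, node.orelse.id)

# A "with" statement parses to a With node holding its context items and body.
with_node = ast.parse("with f() as x:\n    pass").body[0]
```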

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mwh at  Thu Feb  9 01:18:28 2006
From: mwh at (Michael Hudson)
Date: Thu, 09 Feb 2006 00:18:28 +0000
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <> (Guido
	van Rossum's message of "Wed, 8 Feb 2006 10:07:01 -0800")
References: <>
Message-ID: <>

Guido van Rossum <guido at> writes:

> On 2/8/06, Patrick Collison <patrick at> wrote:
>> And to think that people thought that keeping "lambda", but changing
>> the name, would avoid all the heated discussion... :-)
> Note that I'm not participating in any attempts to "improve" lambda.
> Just about the only improvement I'd like to see is to add parentheses
> around the arguments, so you'd write lambda(x, y): x**y instead of
> lambda x, y: x**y.

That would seem to be a bad idea, as it means something already:

>>> f = lambda (x,y): x + y
>>> t = (1,2)
>>> f(t)
3

  Solaris: Shire horse that dreams of being a race horse,
  blissfully unaware that its owners don't quite know whether
  to put it out to grass, to stud, or to the knackers yard.
                           -- Jim's pedigree of operating systems, asr

From edgimar at  Tue Feb  7 09:34:05 2006
From: edgimar at (Mark Edgington)
Date: Tue, 07 Feb 2006 09:34:05 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
Message-ID: <>

Martin v. Löwis wrote:
 > That patch looks wrong. What does it mean to "run in a thread"?
 > All code runs in a thread, all the time: sometime, that thread
 > is the main thread.
 > Furthermore, I can't see any presumed thread-unsafety in asynchat.

Ok, perhaps the notation could be improved, but the idea of the 
semaphore in the patch is "Does it run inside of a multithreaded 
environment, and could its push() functions be called from a different 
thread?"

I have verified that there is a problem with running it in such an 
environment.  My results are more or less identical to those described 
in the following thread:
(see also the reply message to this one regarding the solution -- if you 
look at the Zope source, Zope deals with the problem in the way I am 
suggesting asynchat be patched)

It seems that somehow in line 271 (python 2.4) of, 
producer_fifo.list is not empty, and thus popleft() is executed. 
However, popleft() finds the deque empty.  This means that somehow the 
deque (or list -- the bug is identical in python 2.3) is emptied between 
the if() and the popleft(), so perhaps asyncore.loop(), running in a 
different thread from the thread which calls async_chat.push(), empties it.

The problem is typically exhibited when running in a multithreaded 
environment, and when calling the async_chat.push() function many (i.e. 
perhaps tens of thousands) times quickly in a row from a different 
thread.  However, this behavior is avoided by creating a Lock for 
refill_buffer(), so that it cannot be executed simultaneously.  It is 
also avoided by not executing initiate_send() at all (as is done by Zope 
in ZHTTPServer.zhttp_channel).
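
The check-then-pop race described above, and the Lock-based way around it, can be sketched roughly as follows. This is not asynchat's code: GuardedFifo, pop_if_any and run_demo are made-up names, and the sketch assumes current Python 3 with only threading and collections.deque.

```python
import threading
from collections import deque


class GuardedFifo:
    """Toy stand-in for the producer fifo; not asynchat's actual class."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def pop_if_any(self):
        # The emptiness test and the popleft() happen under one lock, so no
        # other thread can empty the deque between the two steps.
        with self._lock:
            if self._items:
                return self._items.popleft()
            return None


def run_demo(n_items=10000, n_consumers=4):
    fifo = GuardedFifo()
    for i in range(n_items):
        fifo.push(i)

    popped = []
    popped_lock = threading.Lock()

    def consume():
        while True:
            item = fifo.pop_if_any()
            if item is None:
                return
            with popped_lock:
                popped.append(item)

    threads = [threading.Thread(target=consume) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return popped


popped = run_demo()
print(len(popped))
```

Without the lock in pop_if_any, two consumers could both see a non-empty deque and one of them would then call popleft() on an empty one, which is the symptom reported here.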

 > Sure, there is a lot of member variables in asynchat which aren't
 > specifically protected against mutual access from different threads.
 > So you shouldn't be accessing the same async_chat object from multiple
 > threads.

If applying this patch does indeed make it safe to use async_chat.push() 
from other threads, why would it be a bad thing to have?  It seems to 
make the code less cryptic (i.e. I don't need to override base classes 
in order to include code which processes a nonempty Queue object -- I 
simply make a call to the push() function of my instance of async_chat, 
and I'm done).


(also, of course push_with_producer() would probably also need the same 
changes that would be made to push() )

From pedro.werneck at  Wed Feb  8 18:11:38 2006
From: pedro.werneck at (Pedro Werneck)
Date: Wed, 8 Feb 2006 15:11:38 -0200
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsd00v$tck$>
References: <> <>
Message-ID: <>

On Wed, 08 Feb 2006 15:42:39 +0100
Georg Brandl <g.brandl at> wrote:

> Neal Norwitz wrote:
> > On 2/7/06, Christopher Armstrong <radeex at> wrote:
> >>
> >> Twisted is wonderful, powerful, rich, and very large.  Perhaps a
> >> small subset could be carefully extracted
> >>
> >The subject of putting (parts of) Twisted into the standard library
> >comes up once every 6 months or so, at least on our mailing list.
> >For all that I think asyncore is worthless, I'm still against
> >copying Twisted into the stdlib. Or at least I'm not willing to
> >maintain the necessary fork, and I fear the nightmares about
> >versioning that can easily occur when you've got both standard
> >library and third party versions of a project.
> > 
> > I wouldn't be enthusiastic about putting all of Twisted in the
> > stdlib either.  Twisted is on a different release schedule than
> > Python.  However, isn't there a relatively small core subset like
> > Alex mentioned that isn't changing much?  Could we split up those
> > components and have those live in the core, but the vast majority of
> > Twisted live outside as it does now?
> +1. This would be very useful for simple networking applications.

I have a simple library I wrote some time ago to make asynchronous TCP
servers (honeypots), and I wrote it exactly for the reasons being
discussed on this thread: the other developers were not very familiar
with Python (they were planning to use Perl on the project) and a bit
confused with asyncore. Twisted was the obvious answer, but I could not
convince them to put it in the project because of the size and the work
needed to put it in all machines they were planning to use.

I used this library several times the last two years. The last two weeks
I've been using it with another project, but yesterday (a coincidence ?)
I decided to reduce all of it to a single module. 

It is roughly based on Twisted, the interface is similar, some parts are
a copy of Twisted code (select code, LineProtocol is a copy of twisted's
LineReceiver) but only 16k in size, everything is covered by unittests.
It's intended for servers, but client support can be added with some
effort too. Maybe it fits the needs of what is being discussed on this
thread.

It's available here:

Pedro Werneck

From seojiwon at  Thu Feb  9 02:22:31 2006
From: seojiwon at (Jiwon Seo)
Date: Wed, 8 Feb 2006 17:22:31 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Guido van Rossum <guido at> wrote:
> On 2/8/06, Patrick Collison <patrick at> wrote:
> > And to think that people thought that keeping "lambda", but changing
> > the name, would avoid all the heated discussion... :-)
> Note that I'm not participating in any attempts to "improve" lambda.

Then, is there any chance anonymous function - or closure - is
supported in python 3.0 ? Or at least have a discussion about it?

(IMHO, closure is very handy for function like map, sort etc. And
having to write a function for multiple statement is kind of good in
that function name explains what it does. However, I sometimes feel
that having no name at all is clearer. Also, having to define a
function when it'll be used only once seemed inappropriate sometimes.)

or is there already discussion about it (and closed)?



> Just about the only improvement I'd like to see is to add parentheses
> around the arguments, so you'd write lambda(x, y): x**y instead of
> lambda x, y: x**y.
> --
> --Guido van Rossum (home page:

From jcarlson at  Thu Feb  9 02:39:38 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 08 Feb 2006 17:39:38 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Jiwon Seo <seojiwon at> wrote:
> On 2/8/06, Guido van Rossum <guido at> wrote:
> > On 2/8/06, Patrick Collison <patrick at> wrote:
> > > And to think that people thought that keeping "lambda", but changing
> > > the name, would avoid all the heated discussion... :-)
> >
> > Note that I'm not participating in any attempts to "improve" lambda.
> Then, is there any chance anonymous function - or closure - is
> supported in python 3.0 ? Or at least have a discussion about it?
> or is there already discussion about it (and closed)?

Closures already exist in Python.

>>> def foo(bar):
...     return lambda: bar + 1
...
>>> a = foo(5)
>>> a()
6

 - Josiah

From janssen at  Thu Feb  9 02:54:51 2006
From: janssen at (Bill Janssen)
Date: Wed, 8 Feb 2006 17:54:51 PST
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: Your message of "Wed, 08 Feb 2006 09:11:38 PST."
Message-ID: <06Feb8.175452pst."58633">

Not terrible.  I think I may try re-working Medusa to use this.


From python at  Thu Feb  9 03:21:02 2006
From: python at (Raymond Hettinger)
Date: Wed, 8 Feb 2006 21:21:02 -0500
Subject: [Python-Dev] _length_cue()
References: <>
Message-ID: <009001c62d1f$79a6abc0$b83efea9@RaymondLaptop1>

[Armin Rigo]
> Hi Raymond,
 . . .
> This means that _length_cue() is at the moment a special method, in the
> sense that Python can invoke it implicitly.

Okay, that makes sense.  Go ahead and make the swap.

> This said, do we vote for __length_hint__ or __length_cue__? :-)

I prefer __length_cue__ unless someone has a strong objection.

> And does anyone object to __getitem_hint__ or __getitem_cue__?
> Maybe __lookahead_hint__ or __lookahead_cue__?

No objections here though I do question the utility of the protocol.  It is 
going to be difficult to find pairs of objects (one providing the lookahead 
value and the other consuming it) that can make good use of the protocol. 
Outside of those unique pairings, there is no value at all.  Thinking back 
over the code I have seen, I cannot think of one case where this would have 
been helpful (except for the ill-fated adventure of trying to make iterators 
have more informative __repr__ methods).

Before putting this in production, it would probably be worthwhile to search 
for code where it would have been helpful.  In the case of __length_cue__, 
there was an immediate payoff.
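
A rough sketch of how a length estimate can be offered by an iterator and read by a consumer. It assumes the spelling that later Python releases settled on, __length_hint__ together with operator.length_hint (PEP 424), not the _length_cue() name under discussion here. Countdown is a hypothetical class.

```python
import operator


class Countdown:
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # An estimate only; a consumer may use it to presize a buffer.
        return self.remaining


it = Countdown(5)
hint_before = operator.length_hint(it)   # asks the iterator for its estimate
next(it)                                 # consume one item
hint_after = operator.length_hint(it)
items = list(it)                         # drain the rest
print(hint_before, hint_after, items)
```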

Value pre-fetching has more utility in an environment where the concept is 
used everywhere (such as your lightning demo at PyCon last year where you 
ran iterators forwards/backwards and did tricks with infinite iterators). 
Outside of such an environment, I think it is going to be use-case 


From brett at  Thu Feb  9 03:45:01 2006
From: brett at (Brett Cannon)
Date: Wed, 8 Feb 2006 18:45:01 -0800
Subject: [Python-Dev] Help on choosing a PEP to volunteer on it : 308,
	328 or 343
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Thomas Wouters <thomas at> wrote:
> On Wed, Feb 08, 2006 at 01:39:34PM -0800, Brett Cannon wrote:
> > On 2/8/06, Joao Macaiba <joao.macaiba at> wrote:
> > > 1. For a newbie in the Python core development, what is the best PEP to
> > > begin with ?
> > Wild guess?  308, but that still requires changing the grammar and
> > editing the AST compiler.  328 will need playing with the import code
> > which is known to be hairy.  343 has the same needs as 308, but I bet
> > would be more complicated.
> Joao brought up an interesting point on #python on freenode, though... Is
> there any documentation regarding the AST code? I started fiddling with it
> just to get to know it, adding some weird syntax just for the hell of it,
> and I *think* I understand how the AST is supposed to work. I haven't gotten
> around to actually coding it, though (just like I haven't gotten around to
> PEP 13 ;) so maybe I have it all wrong. A short description of the
> principles and design choices would be nice, maybe with a paragraph on how
> to add new syntax constructs. How tightly should the AST follow the grammar,
> for instance?

There is a Python/compile.txt that was originally started by Jeremy
that I subsequently picked up and heavily fleshed out at the last
PyCon sprint.  It didn't get checked in during the merge because
Jeremy was not sure where to put it.  But I just checked it in since I
realized I can delete it once PEP 339 is updated.  It is slightly out
of date, though, because of the lack of info on the arena API.

> (I pointed Joao to the augmented assignment patch for 2.0, which doesn't say
> anything about the AST but should be helpful hints in his quest to
> understand Python's internals. Lord knows that's how I learned it... By the
> time he groks it all, hopefully someone can help him with the AST parts ;)

Probably the best way to read it is to follow how an 'if' statement gets
compiled.  That's how I picked it up.
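
One way to follow an 'if' statement from source to bytecode, sketched with the ast and dis modules of current Python. This only shows the Python-level view of the pipeline, not the C code discussed in this thread, and the source string and variable names are made up for the example.

```python
import ast
import dis

source = "if x:\n    y = 1\nelse:\n    y = 2\n"

# Step 1: source text -> AST. The top-level node is a Module whose body
# holds a single ast.If node with test, body and orelse fields.
tree = ast.parse(source)
if_node = tree.body[0]
print(type(if_node).__name__, if_node._fields)

# Step 2: AST -> code object, then show the resulting bytecode.
code = compile(tree, "<example>", "exec")
dis.dis(code)

# Step 3: run the code object to confirm it behaves as the source reads.
namespace = {"x": 0}
exec(code, namespace)
print(namespace["y"])
```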


From barry at  Thu Feb  9 04:15:14 2006
From: barry at (Barry Warsaw)
Date: Wed, 08 Feb 2006 22:15:14 -0500
Subject: [Python-Dev] email 3.1 for Python 2.5 using PEP 8 module names
Message-ID: <>

I posted a message to the email-sig expressing my desire to change our
module naming scheme to conform to PEP 8.  This would entail a bump in
the email version to 3.1, and would be included in Python 2.5.  Of
course, the old names would still work, for at least one Python release.

All the responses so far have been favorable, and Fred Drake provided a
nice hook to allow us to support both the old and new names.  Code is
now checked into the Python sandbox that implements this.

Here's the top of the thread:

I'd like to keep discussion on the email-sig, so please join us there if
you care about this one way or the other.



From greg.ewing at  Thu Feb  9 04:24:05 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 09 Feb 2006 16:24:05 +1300
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis wrote:

> I withdraw that idea. It appears that the platform SDK doesn't
> (any longer?) provide an import library for msvcrt.dll, and
> Microsoft documents msvcrt as intended only for "system
> components".

Insofar as it forms a base on which other separately-
compiled pieces of code run, it seems to me that Python
itself deserves to be classed as a "system component".

Although I concede that's probably not quite what
Microsoft mean by the term...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb  9 04:27:50 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 09 Feb 2006 16:27:50 +1300
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Neil Hodgson wrote:

>   You don't need to drop the CRT, just encapsulate it so there is one
> copy controlled by Python that hands out wrapped objects (file
> handles, file pointers, memory blocks, others?). These wrappers can
> only be manipulated through calls back to that owning code that then
> calls the CRT.

But that won't help when you need to deal with third-party
code that knows nothing about Python or its wrapped file
objects, and calls the CRT (or one of the myriad extant
CRTs, chosen at random:-) directly.

I can't see *any* solution to this that works in general.
Even if Python itself and all its extensions completely
avoid using the CRT, there's still the possibility that
two different extensions will use two third-party libraries
that were compiled with different CRTs.

As far as I can see, Microsoft have created an intractable
mess here. Their solution of "compile your whole program
with the same CRT" completely misses the possibility that
the "whole program" may consist of disparate separately-
written and separately-compiled parts, and there may be no
single person with the ability and/or legal right to
compile and link the whole thing.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb  9 04:27:54 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 09 Feb 2006 16:27:54 +1300
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

Armin Rigo wrote:

> This said, do we vote for __length_hint__ or __length_cue__? :-)

I prefer something containing "hint" rather than "cue"
because it more explicitly says what we mean.

I feel that __length_hint__ is a bit long, though.
We have __len__, not __length__, so maybe it should
be __len_hint__ or __lenhint__.

> And does anyone object to __getitem_hint__ or __getitem_cue__?

I'm having trouble seeing widespread use cases for this.
If an object is capable of computing arbitrary items on
demand, seems to me it should be implemented as a
lazily-evaluated sequence or mapping rather than an
iterator.

The iterator protocol is currently very simple and
well-focused on a single task -- producing things
one at a time, in sequence. Let's not clutter it up
with too much more cruft.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb  9 04:41:10 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 09 Feb 2006 16:41:10 +1300
Subject: [Python-Dev] Let's send lambda to the shearing shed (Re: Let's just
	*keep* lambda)
In-Reply-To: <>
References: <>
Message-ID: <>

My thought on lambda at the moment is that it's too VERBOSE.

If a syntax for anonymous functions is to pull its weight,
it needs to be *very* concise. The only time I ever consider
writing a function definition in-line is when the body is
extremely short, otherwise it's clearer to use a def instead.

Given that, I do *not* have the space to waste with 6 or 7
characters of geeky noise-word.

So my vote for Py3k is to either

1) Replace lambda args: value with

   args -> value

or something equivalently concise, or

2) Remove lambda entirely.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From seojiwon at  Thu Feb  9 05:03:31 2006
From: seojiwon at (Jiwon Seo)
Date: Wed, 8 Feb 2006 20:03:31 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, Josiah Carlson <jcarlson at> wrote:
> Jiwon Seo <seojiwon at> wrote:
> >
> > On 2/8/06, Guido van Rossum <guido at> wrote:
> > > On 2/8/06, Patrick Collison <patrick at> wrote:
> > > > And to think that people thought that keeping "lambda", but changing
> > > > the name, would avoid all the heated discussion... :-)
> > >
> > > Note that I'm not participating in any attempts to "improve" lambda.
> >
> > Then, is there any chance anonymous function - or closure - is
> > supported in python 3.0 ? Or at least have a discussion about it?
> >
> > or is there already discussion about it (and closed)?
> Closures already exist in Python.
> >>> def foo(bar):
> ...     return lambda: bar + 1
> ...
> >>> a = foo(5)
> >>> a()
> 6

Not in that we don't have anonymous function (or closure) with
multiple statements. Also, current limited closure does not capture
programming context - or variables.


From smiles at  Thu Feb  9 03:07:59 2006
From: smiles at (Smith)
Date: Wed, 8 Feb 2006 20:07:59 -0600
Subject: [Python-Dev] [BULK]  Python-Dev Digest, Vol 31, Issue 37
References: <>
Message-ID: <008b01c62d30$7f3dbb30$2b2c4fca@csmith>

| From: Michael Hudson <mwh at>
| Guido van Rossum <guido at> writes:
|| On 2/8/06, Patrick Collison <patrick at> wrote:
||| And to think that people thought that keeping "lambda", but changing
||| the name, would avoid all the heated discussion... :-)
|| Note that I'm not participating in any attempts to "improve" lambda.
|| Just about the only improvement I'd like to see is to add parentheses
|| around the arguments, so you'd write lambda(x, y): x**y instead of
|| lambda x, y: x**y.
| That would seem to be a bad idea, as it means something already:
|||| f = lambda (x,y): x + y
|||| t = (1,2)
|||| f(t)
| 3
| Cheers,
| mwh

Hey! I didn't know you could do that. I'm happy. My lambdas just grew parentheses on the arguments:

>>> f=lambda(x):x+1
>>> f(2)
3
>>> def go(f,x):
...  print f(x)
...
>>> go(lambda(x):x+1,1)
2
>>> go(lambda(x,y):x+y,(1,3))
4


From jcarlson at  Thu Feb  9 05:51:33 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 08 Feb 2006 20:51:33 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Jiwon Seo <seojiwon at> wrote:
> On 2/8/06, Josiah Carlson <jcarlson at> wrote:
> > Closures already exist in Python.
> >
> > >>> def foo(bar):
> > ...     return lambda: bar + 1
> > ...
> > >>> a = foo(5)
> > >>> a()
> > 6
> Not in that we don't have anonymous function (or closure) with
> multiple statements.

As already said, lambdas (Python's anonymous functions) are limited to a
single expression.  If you can't do what you want with a single
expression, then it probably SHOULD have a name, so you should use a
standard function definition.

> Also, current limited closure does not capture
> programming context - or variables.

You should clarify yourself.  According to my experience, you can do
anything you want with Python closures, it just may take more work than
you are used to.

def environment():
    env = {}
    def get_variable(name):
        return env[name]
    def set_variable(name, value):
        env[name] = value
    def del_variable(name):
        del env[name]
    return get_variable, set_variable, del_variable
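
A short usage sketch for the environment() helper above, repeated here so the example runs on its own. The variable names spam, get2 and set2 are only illustrative.

```python
def environment():
    env = {}
    def get_variable(name):
        return env[name]
    def set_variable(name, value):
        env[name] = value
    def del_variable(name):
        del env[name]
    return get_variable, set_variable, del_variable


# All three functions close over the same env dict.
get_variable, set_variable, del_variable = environment()
set_variable("spam", 1)
set_variable("spam", get_variable("spam") + 1)
value = get_variable("spam")

del_variable("spam")
try:
    get_variable("spam")
    still_defined = True
except KeyError:
    still_defined = False

# A second call creates a separate, independent env dict.
get2, set2, _ = environment()
set2("spam", "other")
print(value, still_defined, get2("spam"))
```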

 - Josiah

From jcarlson at  Thu Feb  9 06:02:59 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 08 Feb 2006 21:02:59 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Edgington <edgimar at> wrote:
> Martin v. Löwis wrote:
>  > That patch looks wrong. What does it mean to "run in a thread"?
>  > All code runs in a thread, all the time: sometime, that thread
>  > is the main thread.
>  >
>  > Furthermore, I can't see any presumed thread-unsafety in asynchat.
> Ok, perhaps the notation could be improved, but the idea of the 
> semaphore in the patch is "Does it run inside of a multithreaded 
> environment, and could its push() functions be called from a different 
> thread?"

Asyncore is not threadsafe.  The reason it is not threadsafe is because
there was no effort made to make it threadsafe, because it is not
uncommon for the idea of asynchronous sockets to be the antithesis of
threaded socket servers.

In any case, one must be very careful as (at least in older versions of
Python on certain platforms), running sock.send(data) on two threads
simultaneously for the same socket was a segfault.  I understand that
this is what you are trying to avoid, but have you considered just doing...

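import Queue

q = Queue.Queue()
def push(sock, data):
    # callable from any thread; Queue does its own locking
    q.put((sock, data))

def mainloop():
    # runs only in the thread that drives asyncore
    while not q.empty():
        sock, data = q.get()
        sock.push(data)  # assumed step: hand the data over in the loop thread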

Wow, now we don't have to update the standard library to introduce a
false sense of thread-safety into asyncore!
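
The same hand-off idea as a runnable sketch for current Python 3, where the module is spelled queue. FakeChannel stands in for an async_chat object and drain_pending is a made-up name; a real program would call it from the thread that runs the event loop.

```python
import queue
import threading


class FakeChannel:
    """Stand-in for an async_chat instance; only records what it is given."""

    def __init__(self):
        self.sent = []

    def push(self, data):
        self.sent.append(data)


pending = queue.Queue()


def push(channel, data):
    # Safe to call from any thread: Queue does its own locking.
    pending.put((channel, data))


def drain_pending():
    # Called only from the thread that runs the event loop, so the
    # channel objects themselves are never touched by two threads.
    while True:
        try:
            channel, data = pending.get_nowait()
        except queue.Empty:
            break
        channel.push(data)


channel = FakeChannel()
workers = [threading.Thread(target=push, args=(channel, i)) for i in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()

drain_pending()
print(len(channel.sent))
```

The sketch uses get_nowait() and catches queue.Empty where the original tests q.empty() before q.get(); with a single consumer the two behave the same, and the try/except form stays safe if a second consumer is ever added.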

 - Josiah

From greg.ewing at  Thu Feb  9 03:13:25 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 09 Feb 2006 15:13:25 +1300
Subject: [Python-Dev] _length_cue()
In-Reply-To: <00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
> [Andrew Koenig]
>>Might I suggest that at least you consider using "hint" instead of "cue"?
> Personally, I prefer "cue" which my dictionary defines as "a signal, hint, 
> or suggestion". The alternate definition of "a prompt for some action" 
> applies equally well.

No, it doesn't, because it's in the wrong direction. The
caller isn't prompting the callee to perform an action,
it's asking for some information.

I agree that "hint" is a more precise name.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From martin at  Thu Feb  9 06:28:40 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 09 Feb 2006 06:28:40 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Greg Ewing wrote:
> As far as I can see, Microsoft have created an intractable
> mess here. Their solution of "compile your whole program
> with the same CRT" completely misses the possibility that
> the "whole program" may consist of disparate separately-
> written and separately-compiled parts, and there may be no
> single person with the ability and/or legal right to
> compile and link the whole thing.

Hence, Microsoft's suggestion is entirely different these
days: use .NET, and you won't have these versioning problems

I'm getting off-topic...


From martin at  Thu Feb  9 06:33:01 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 09 Feb 2006 06:33:01 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Jiwon Seo wrote:
> Then, is there any chance anonymous function - or closure - is
> supported in python 3.0 ? Or at least have a discussion about it?

That discussion appears to be closed (or, not really: everybody
can discuss, but it likely won't change anything).

> (IMHO, closure is very handy for function like map, sort etc. And
> having to write a function for multiple statement is kind of good in
> that function name explains what it does. However, I sometimes feel
> that having no name at all is clearer. Also, having to define a
> function when it'll be used only once seemed inappropriate sometimes.)

Hmm. Can you give real-world examples (of existing code) where you
needed this?


From oliphant.travis at  Thu Feb  9 08:18:40 2006
From: oliphant.travis at (Travis Oliphant)
Date: Thu, 09 Feb 2006 00:18:40 -0700
Subject: [Python-Dev] Help with Unicode arrays in NumPy
In-Reply-To: <>
References: <dsatpo$4fo$>
	<>	<dsavjl$bgg$>
Message-ID: <dseqcg$g2v$>

Thank you, Martin and Stephen, for the suggestions and comments.

For your information:

We decided that all NumPy arrays of unicode strings will use UCS4 for 
internal representation.  When an element of the array is selected, a 
unicodescalar (which inherits directly from the unicode builtin type but 
has attributes and methods of arrays) will be returned.   On wide 
builds, the scalar is a perfect match.  On narrow builds, surrogate 
pairs will be used if they are necessary as the data is copied over to 
the scalar.
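
To illustrate the difference between the UCS4 storage and the surrogate pairs a narrow build needs, here is a small pure-Python sketch. It is not NumPy's code; the helper to_surrogates and the sample character are assumptions chosen for the example.

```python
import struct

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

# UCS4 / UTF-32: one 32-bit unit per code point.
ucs4_units = struct.unpack("<1I", ch.encode("utf-32-le"))

# UTF-16 (what a narrow build stores): two 16-bit units, a surrogate pair.
utf16_units = struct.unpack("<2H", ch.encode("utf-16-le"))


def to_surrogates(code_point):
    """Split a code point above 0xFFFF into (high, low) surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print([hex(u) for u in ucs4_units], [hex(u) for u in utf16_units])
```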

Best regards,


From nyamatongwe at  Thu Feb  9 08:29:39 2006
From: nyamatongwe at (Neil Hodgson)
Date: Thu, 9 Feb 2006 18:29:39 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
	<031401c62cbb$61810630$bf03030a@trilan> <>
Message-ID: <>

Martin v. Löwis:

> I don't think this would be good enough. I then also need a way to
> provide extension authors with an API that looks like the CRT, but
> isn't: they cannot realistically change all their code to use the
> wrapped objects. In a recent case, somebody tried to pass a FILE*
> to a postgres DLL linked with a different CRT; he shouldn't need
> to change the entire postgres code to use the modified API.

   The postgres example is strange to me as I'd never consider passing
a FILE* over a DLL boundary. Maybe this is a Unix/Windows cultural
thing due to such practices being more dangerous on Windows.

> Also, there is still the redistribution issue: to redistribute
> msvcr71.dll, you need to own a MSVC license. People that want to
> use py2exe (or some such) are in trouble: they need to distribute
> both python25.dll, and msvcr71.dll. They are allowed to distribute
> the former, but (formally) not allowed to distribute the latter.

   Link statically.


From nyamatongwe at  Thu Feb  9 08:29:59 2006
From: nyamatongwe at (Neil Hodgson)
Date: Thu, 9 Feb 2006 18:29:59 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Greg Ewing:

> But that won't help when you need to deal with third-party
> code that knows nothing about Python or its wrapped file
> objects, and calls the CRT (or one of the myriad extant
> CRTs, chosen at random:-) directly.

   Can you explain exactly why there is a problem here? It's fairly
normal under Windows to build applications that provide a generic
plugin interface (think Netscape plugins or COM) that allow the
plugins to be built with any compiler and runtime.


From oliphant.travis at  Thu Feb  9 09:00:22 2006
From: oliphant.travis at (Travis Oliphant)
Date: Thu, 09 Feb 2006 01:00:22 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
 a or b, can be used in X[a:b] notation
Message-ID: <>

Guido seemed receptive to this idea about 9 months ago when I spoke to 
him.  I finally got around to writing up the PEP.   I'd really like to 
get this into Python 2.5 if possible.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PEP_index.txt

From smiles at  Thu Feb  9 09:41:17 2006
From: smiles at (Smith)
Date: Thu, 9 Feb 2006 02:41:17 -0600
Subject: [Python-Dev] py3k and not equal; re names
Message-ID: <034901c62d54$cf30c320$2b2c4fca@csmith>

I'm wondering if it's just "foolish consistency" (to quote PEP 8) that is calling for the dropping of <> in favor of only !=. I've used the former since the beginning in everything from basic, fortran, claris works, excel, gnumeric, and python. I tried to find a rationale for the dropping--perhaps there is some other object that will be represented (like an empty set). I'm sure there must be some reason, but just want to put a vote in for keeping this variety.

And another suggestion for py3k would be to increase the correspondence between string methods and re methods. e.g. since re.match and string.startswith are checking for the same thing, was there a reason to introduce the new names? The same question is asked for string.find and

Instead of having to learn another set of method names to use re, it would be nice to have the only change be the pattern used for the method.  Here is a side-by-side listing of methods in both modules that are candidates for consistency--hopefully not "foolish" ;-)

      string        re
      ------        ------
      find          search
      startswith    match
      split         split
      replace       sub
      NA            subn
      NA            findall
      NA            finditer
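
The pairs in the listing can be put side by side in a short sketch. It uses only str methods and the re module; the sample text and variable names are made up.

```python
import re

text = "spam and eggs"

# find / search: position of the first occurrence
find_pos = text.find("and")
search_pos = re.search(r"and", text).start()

# startswith / match: is the pattern at the very start?
starts = text.startswith("spam")
matches = re.match(r"spam", text) is not None

# split / split
str_parts = text.split(" ")
re_parts = re.split(r" ", text)

# replace / sub
str_replaced = text.replace("eggs", "ham")
re_replaced = re.sub(r"eggs", "ham", text)

print(find_pos, search_pos, starts, matches)
```

One difference the names hide: str.find returns -1 when nothing is found, while re.search returns None, so the two are not drop-in replacements for each other.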


From eric.nieuwland at  Thu Feb  9 09:46:12 2006
From: eric.nieuwland at (Eric Nieuwland)
Date: Thu, 9 Feb 2006 09:46:12 +0100
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

Travis Oliphant wrote:
> PEP:  ###
> Title:  Allowing any object to be used for slicing
> [...]
> Rationale
>    Currently integers and long integers play a special role in slice
>    notation in that they are the only objects allowed in slice
>    syntax. In other words, if X is an object implementing the sequence
>    protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>    integers or long integers.  There is no way for obj1 and obj2 to
>    tell Python that they could be reasonably used as indexes into a
>    sequence.  This is an unnecessary limitation.
> [...]

I like the general idea from an academic point of view.
Just one question: could you explain what I should expect from x[ 
slicer('spam') : slicer('eggs') ]  when slicer implements this 
protocol?
Specifically, I'd like to know how you want to define the interval 
between two objects. Or is that for the sliced/indexed object to 
decide?

From g.brandl at  Thu Feb  9 09:53:36 2006
From: g.brandl at (Georg Brandl)
Date: Thu, 09 Feb 2006 09:53:36 +0100
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <dsevug$138$>

Eric Nieuwland wrote:
> Travis Oliphant wrote:
>> PEP:  ###
>> Title:  Allowing any object to be used for slicing
>> [...]
>> Rationale
>>    Currently integers and long integers play a special role in slice
>>    notation in that they are the only objects allowed in slice
>>    syntax. In other words, if X is an object implementing the sequence
>>    protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>>    integers or long integers.  There is no way for obj1 and obj2 to
>>    tell Python that they could be reasonably used as indexes into a
>>    sequence.  This is an unnecessary limitation.
>> [...]
> I like the general idea from an academic point of view.
> Just one question: could you explain what I should expect from x[ 
> slicer('spam') : slicer('eggs') ]  when slicer implements this 
> protocol?
> Specifically, I'd like to known how you want to define the interval 
> between two objects. Or is that for the sliced/indexed object to 
> decide?

As I understand it:

The sliced object will only see integers. The PEP wants to give arbitrary
objects the possibility of pretending to be an integer that can be used
for indexing.
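
A minimal sketch of that behaviour, written against the form the idea later
took in Python 2.5 and newer (the __index__ method of PEP 357) rather than
the sq_index slot named in the subject line. The Slicer class and its
integer argument are hypothetical, chosen only for illustration.

```python
import operator


class Slicer:
    """Hypothetical index-like object that wraps an integer position."""

    def __init__(self, position):
        self.position = position

    def __index__(self):
        # Called wherever Python needs a true integer index.
        return self.position


data = ["a", "b", "c", "d", "e"]

# The list never inspects Slicer itself; it only works with 1 and 4.
middle = data[Slicer(1):Slicer(4)]
single = data[Slicer(2)]
as_int = operator.index(Slicer(3))
```

How an object such as slicer('spam') maps itself to an integer is entirely
its own business; the sequence only defines what the resulting integers mean.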


From oliphant.travis at  Thu Feb  9 10:08:36 2006
From: oliphant.travis at (Travis Oliphant)
Date: Thu, 09 Feb 2006 02:08:36 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

Eric Nieuwland wrote:

> Travis Oliphant wrote:
>> PEP:  ###
>> Title:  Allowing any object to be used for slicing
>> [...]
>> Rationale
>>    Currently integers and long integers play a special role in slice
>>    notation in that they are the only objects allowed in slice
>>    syntax. In other words, if X is an object implementing the sequence
>>    protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>>    integers or long integers.  There is no way for obj1 and obj2 to
>>    tell Python that they could be reasonably used as indexes into a
>>    sequence.  This is an unnecessary limitation.
>> [...]
> I like the general idea from an academic point of view.
> Just one question: could you explain what I should expect from x[ 
> slicer('spam') : slicer('eggs') ]  when slicer implements this protocol?
> Specifically, I'd like to known how you want to define the interval 
> between two objects. Or is that for the sliced/indexed object to decide?

I'm not proposing to define that.  The sequence protocol already 
provides to the object only a c-integer (currently it's int but there's 
a PEP to change it to ssize_t).   Right now, only Python integer and 
Python Long integers are allowed to be converted to this c-integer 
passed to the object that is implementing the slicing protocol.  It's up 
to the object to deal with those integers as it sees fit.

One possible complaint that is easily addressed is that the slot should 
really go into the PyNumber_Methods as nb_index because a number-like 
object is what would typically be easily convertible to a c-integer.  
Having to implement the sequence protocol (on the C-level) just to 
enable sq_index seems inappropriate.

So, I would change the PEP to implement nb_index instead...


From rhamph at  Thu Feb  9 10:47:40 2006
From: rhamph at (Adam Olsen)
Date: Thu, 9 Feb 2006 02:47:40 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> Guido seemed accepting to this idea about 9 months ago when I spoke to
> him.  I finally got around to writing up the PEP.   I'd really like to
> get this into Python 2.5 if possible.


-1

I've detailed my reasons here:

In short, there are purely math usages that want to convert to int
while raising exceptions from inexact results.  The name __index__
seems inappropriate for this, and I feel it would be cleaner to fix
float.__int__ to raise exceptions from inexact results (after a
suitable warning period and with a trunc() function added to math.)

Adam Olsen, aka Rhamphoryncus

From seojiwon at  Thu Feb  9 10:51:58 2006
From: seojiwon at (Jiwon Seo)
Date: Thu, 9 Feb 2006 01:51:58 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/8/06, "Martin v. Löwis" <martin at> wrote:
> Jiwon Seo wrote:
> > Then, is there any chance anonymous function - or closure - is
> > supported in python 3.0 ? Or at least have a discussion about it?
> That discussion appears to be closed (or, not really: everybody
> can discuss, but it likely won't change anything).
> > (IMHO, closure is very handy for function like map, sort etc. And
> > having to write a function for multiple statement is kind of good in
> > that function name explains what it does. However, I sometimes feel
> > that having no name at all is clearer. Also, having to define a
> > function when it'll be used only once seemed inappropriate sometimes.)
> Hmm. Can you give real-world examples (of existing code) where you
> needed this?

Apparently, simplest example is,

collection.visit(lambda x: print x)

which currently is not possible. Another example is,

map(lambda x: if odd(x): return 1
                      else: return 0,
        listOfNumbers)

(however, with new if/else expression, that's not so much a problem any more.)

Also, anything with exception handling code can't be without explicit
function definition.

collection.visit(lambda x: try: foo(x); except SomeError: error("error
message"))

Anyway, I was just curious whether anyone is interested in having a more
closure-like closure in python 3.0 - in any form, not necessarily an
extension of lambda.


> Regards,
> Martin

From abo at  Thu Feb  9 11:32:56 2006
From: abo at (Donovan Baarda)
Date: Thu, 09 Feb 2006 10:32:56 +0000
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
	<> <dsc6sa$85k$>
Message-ID: <>

On Wed, 2006-02-08 at 15:14 +0100, Valentino Volonghi aka Dialtone
wrote:
> On Wed, Feb 08, 2006 at 01:23:26PM +0000, Donovan Baarda wrote:
> > I believe that Twisted does pretty much this with it's "deferred" stuff.
> > It shoves slow stuff off for processing in a separate thread that
> > re-syncs with the event loop when it's finished.
> Deferreds are only an elaborate way to deal with a bunch of callbacks.
> It's Twisted itself that provides a way to run something in a separate thread
> and then fire a deferred (from the main thread) when the child thread
> finishes (reactor.callInThread() to call stuff in a different thread,

I know they are more than just a way to run slow stuff in threads, but
once you have them, simple as they are, they present an obvious solution
to all sorts of things, including long computations in a thread.

Note that once zope2 took the approach it did, blocking the async-loop
didn't hurt so bad, so lots of zope add-ons just did it gratuitously. In
many cases the slow event handlers were slow because they are waiting on
IO that could in theory be serviced as yet another event handler in the
async-loop. However, the Zope/Medusa async framework had become so scary
hardly anyone knew how to do this without breaking Zope itself.

> > In the case of Zope/ZEO I'm not entirely sure but I think what happened
> > was medusa (asyncore/asynchat based stuff Zope2 was based on) didn't
> > have this deferred handler support. When they found some of the stuff
> Here I think you meant that medusa didn't handle computation in separate
> threads instead.

No, I pretty much meant what I said :-)

Medusa didn't have any concept of a deferred, hence the idea of using
one to collect the results of a long computation in another thread never
occurred to them... remember the highly refactored OO beauty that is
twisted was not even a twinkle in anyone's eye yet.
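
As a rough illustration of the pattern under discussion, here is a toy
sketch: slow work runs in a child thread, and the callback fires back in the
thread that runs the event loop. The names Deferred, Loop and
defer_to_thread are hypothetical; this is neither Twisted's nor Medusa's
actual API, and a real loop would also multiplex sockets.

```python
import queue
import threading


class Deferred:
    """Hypothetical minimal deferred: just a chain of callbacks."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, func):
        self._callbacks.append(func)
        return self

    def fire(self, result):
        # Each callback receives the previous callback's return value.
        for func in self._callbacks:
            result = func(result)


class Loop:
    """Toy event loop that only waits for finished thread jobs."""

    def __init__(self):
        self._finished = queue.Queue()
        self._pending = 0

    def defer_to_thread(self, func, *args):
        deferred = Deferred()
        self._pending += 1

        def worker():
            # Runs in the child thread; only the queue is shared.
            # Error handling is omitted: func is assumed not to raise.
            self._finished.put((deferred, func(*args)))

        threading.Thread(target=worker).start()
        return deferred

    def run(self):
        # Callbacks fire here, in the thread that runs the loop.
        while self._pending:
            deferred, result = self._finished.get()
            self._pending -= 1
            deferred.fire(result)


loop = Loop()
loop_thread = threading.get_ident()
seen = []
loop.defer_to_thread(sum, range(1000)).add_callback(
    lambda total: seen.append((total, threading.get_ident() == loop_thread))
)
loop.run()
```

The point of the design is that handler code never touches shared state from
the child thread; the result is handed back through the queue and processed
like any other event.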

In theory it would be just as easy to add twisted style deferToThread to
Medusa, and IMHO it is a much better approach. Unfortunately at the time
they went the other way and implemented multiple async-loops in separate

Donovan Baarda <abo at>

From fredrik at  Thu Feb  9 13:12:29 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 9 Feb 2006 13:12:29 +0100
Subject: [Python-Dev] threadsafe patch for asynchat
References: <><><><><>
Message-ID: <dsfbjd$bh8$>

Donovan Baarda wrote:

>> Here I think you meant that medusa didn't handle computation in separate
>> threads instead.
> No, I pretty much meant what I said :-)
> Medusa didn't have any concept of a deferred, hence the idea of using
> one to collect the results of a long computation in another thread never
> occurred to them... remember the highly refactored OO beauty that is
> twisted was not even a twinkle in anyone's eye yet.
> In theory it would be just as easy to add twisted style deferToThread to
> Medusa, and IMHO it is a much better approach. Unfortunately at the time
> they went the other way and implemented multiple async-loops in separate
> threads.

that doesn't mean that everyone using Medusa has done things in the wrong
way, of course ;-)


From barry at  Thu Feb  9 13:39:06 2006
From: barry at (Barry Warsaw)
Date: Thu, 9 Feb 2006 07:39:06 -0500
Subject: [Python-Dev] py3k and not equal; re names
In-Reply-To: <034901c62d54$cf30c320$2b2c4fca@csmith>
References: <034901c62d54$cf30c320$2b2c4fca@csmith>
Message-ID: <>

On Feb 9, 2006, at 3:41 AM, Smith wrote:

> I'm wondering if it's just "foolish consistency" (to quote a PEP 8)  
> that is calling for the dropping of <> in preference of only !=.  
> I've used the former since the beginning in everything from basic,  
> fortran, claris works, excel, gnumeric, and python. I tried to find  
> a rationale for the dropping--perhaps there is some other object  
> that will be represented (like an empty set). I'm sure there must  
> be some reason, but just want to put a vote in for keeping this  
> variety.

I've long advocated for keeping <> as I find it much more visually  
distinctive when reading code.


From p.f.moore at  Thu Feb  9 13:53:10 2006
From: p.f.moore at (Paul Moore)
Date: Thu, 9 Feb 2006 12:53:10 +0000
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 2/9/06, Neil Hodgson <nyamatongwe at> wrote:
> Greg Ewing:
> > But that won't help when you need to deal with third-party
> > code that knows nothing about Python or its wrapped file
> > objects, and calls the CRT (or one of the myriad extant
> > CRTs, chosen at random:-) directly.
>    Can you explain exactly why there is a problem here? Its fairly
> normal under Windows to build applications that provide a generic
> plugin interface (think Netscape plugins or COM) that allow the
> plugins to be built with any compiler and runtime.

This has all been thrashed out before, but the issue is passing
CRT-allocated objects across DLL boundaries. If you open a file
(getting a FILE*) in one DLL, using one CRT, and pass it to a second
DLL, linked with a different CRT, the FILE* is not valid in that
second CRT, and operations on it will fail.

At first glance, this is a minor issue - passing FILE* pointers across
DLL boundaries isn't something I'd normally expect people to do - but
look further and you find you're opening a real can of worms. For
example, Python has public APIs which take FILE* parameters. Further,
memory allocation is CRT-managed - allocate memory with one CRT's
malloc, and deallocate it elsewhere, and you have issues. So *any*
pointer could be CRT-managed, to some extent. Etc, etc...

As a counterexample, however, I've heard reports that you can do a
binary edit of the DLLs in the Subversion Python bindings, to change
references to python23.dll to python24.dll, and everything still
works. Make of that what you will...

Also, there are intractable cases, like mod_python. Apache is still
built with MSVC6, where Python is built with MSVC7.1. And so,
mod_python, as a bridge, has *no* CRT that is "officially" OK. And
yet, it works. I don't know how, maybe the mod_python developers could

Anyway, that's the brief summary.


From thomas at  Thu Feb  9 14:49:57 2006
From: thomas at (Thomas Wouters)
Date: Thu, 9 Feb 2006 14:49:57 +0100
Subject: [Python-Dev] py3k and not equal; re names
In-Reply-To: <>
References: <034901c62d54$cf30c320$2b2c4fca@csmith>
Message-ID: <>

On Thu, Feb 09, 2006 at 07:39:06AM -0500, Barry Warsaw wrote:

> I've long advocated for keeping <> as I find it much more visually  
> distinctive when reading code.

+1. And, two years ago, in his PyCon keynote, Guido forgot to say <> was
going away, so I think Barry and I are completely in our rights to demand
it'd stay.

<0.5 wink>-ly y'rs,
Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From skip at  Thu Feb  9 15:38:58 2006
From: skip at (skip at
Date: Thu, 9 Feb 2006 08:38:58 -0600
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

    >> [Andrew Koenig]
    >>> Might I suggest that at least you consider using "hint" instead of "cue"?

    Greg> I agree that "hint" is a more precise name.

Ditto.  In addition, we already have queues.  Do we really need to use a
homonym that means something entirely different?  (Hint: consider the added
difficulty for non-native English speakers).


From abo at  Thu Feb  9 15:39:15 2006
From: abo at (Donovan Baarda)
Date: Thu, 09 Feb 2006 14:39:15 +0000
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <dsfbjd$bh8$>
References: <>
	<> <dsc6sa$85k$>
Message-ID: <>

On Thu, 2006-02-09 at 13:12 +0100, Fredrik Lundh wrote:
> Donovan Baarda wrote:
> >> Here I think you meant that medusa didn't handle computation in separate
> >> threads instead.
> >
> > No, I pretty much meant what I said :-)
> >
> > Medusa didn't have any concept of a deferred, hence the idea of using
> > one to collect the results of a long computation in another thread never
> > occurred to them... remember the highly refactored OO beauty that is
> > twisted was not even a twinkle in anyone's eye yet.
> >
> > In theory it would be just as easy to add twisted style deferToThread to
> > Medusa, and IMHO it is a much better approach. Unfortunately at the time
> > they went the other way and implemented multiple async-loops in separate
> > threads.
> that doesn't mean that everyone using Medusa has done things in the wrong
> way, of course ;-)

Of course... and even Zope2 was not necessarily the "wrong way"... it
was a perfectly valid design decision, given that it was all new ground
at the time. And it works really well... there were many consequences of
that design that probably contributed to the robustness of other Zope
components like ZODB...

Donovan Baarda <abo at>

From skip at  Thu Feb  9 15:52:19 2006
From: skip at (skip at
Date: Thu, 9 Feb 2006 08:52:19 -0600
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

    >> Hmm. Can you give real-world examples (of existing code) where you
    >> needed this?

    Jiwon> Apparently, simplest example is,

    Jiwon> collection.visit(lambda x: print x)

Sure, but as several other people have indicated, statements are not
expressions in Python as they are in C (or in Lisp, which doesn't have
statements).  You can't do this either:

    if print x:
        print 5

because "print x" is a statement, while the if statement only accepts
expressions.

Lambdas are expressions.  Statements can't be embedded in expressions.  That
statements and expressions are different is a core feature of the language.
That is almost certainly not going to change.
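
The distinction can be checked mechanically. The sketch below assumes a
modern Python 3 interpreter, where print has since become a function (so
the first case is an expression there, unlike in the Python 2 of this
thread); the helper name compiles_as_expression is made up for illustration.

```python
def compiles_as_expression(source):
    """Return True if source is accepted where an expression is required."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


results = {
    source: compiles_as_expression(source)
    for source in (
        "lambda x: print(x)",            # a function call is an expression
        "lambda x: 1 if x % 2 else 0",   # conditional expression
        "lambda x: return x",            # return is a statement
        "lambda x: try: foo(x)",         # try is a statement
    )
}
```

Anything that is a statement is still rejected inside a lambda body, which
is exactly the boundary described above.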


From oliphant.travis at  Thu Feb  9 16:23:05 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 08:23:05 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <dsfmp3$pc4$>

Adam Olsen wrote:
> On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
>>Guido seemed accepting to this idea about 9 months ago when I spoke to
>>him.  I finally got around to writing up the PEP.   I'd really like to
>>get this into Python 2.5 if possible.
> -1
> I've detailed my reasons here:
> In short, there are purely math usages that want to convert to int
> while raising exceptions from inexact results.  The name __index__
> seems inappropriate for this, and I feel it would be cleaner to fix
> float.__int__ to raise exceptions from inexact results (after a
> suitable warning period and with a trunc() function added to math.)

I'm a little confused.  Is your opposition solely due to the fact that 
you think float's __int__ method ought to raise exceptions and the 
apply_slice code should therefore use the __int__ slot?

In theory I can understand this reasoning.  In practice, however, the 
__int__ slot has been used for "coercion" and changing the semantics of 
int(3.2) at this stage seems like a recipe for lots of code breakage.  I 
don't think something like that is possible until Python 3k.

If that is not your opposition, please be more clear. Regardless of how 
it is done, it seems rather unPythonic to only allow 2 special types to 
be used in apply_slice and assign_slice.
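
For comparison, a short sketch of the difference between coercion and
indexing, assuming a current Python 3 interpreter: int() still truncates
floats, math.trunc() (added in Python 2.6) spells the truncation out, and
floats are rejected wherever a real index is required.

```python
import math
import operator

coerced = int(3.2)           # __int__ style coercion drops the fraction
explicit = math.trunc(3.2)   # the same truncation, stated explicitly

try:
    operator.index(3.2)      # floats do not provide __index__
except TypeError:
    float_usable_as_index = False
else:
    float_usable_as_index = True

try:
    [10, 20, 30][1.0]        # even an exact float is refused as a subscript
except TypeError:
    float_subscript_accepted = False
else:
    float_subscript_accepted = True
```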


From jeremy at  Thu Feb  9 16:29:36 2006
From: jeremy at (Jeremy Hylton)
Date: Thu, 9 Feb 2006 10:29:36 -0500
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

Hint seems like the standard terminology in the field.  I don't think
it makes sense to invent our own terminology without some compelling
reason.


On 2/9/06, skip at <skip at> wrote:
>     >> [Andrew Koenig]
>     >>
>     >>> Might I suggest that at least you consider using "hint" instead of "cue"?
>     ...
>     Greg> I agree that "hint" is a more precise name.
> Ditto.  In addition, we already have queues.  Do we really need to use a
> homonym that means something entirely different?  (Hint: consider the added
> difficulty for non-native English speakers).
> Skip

From guido at  Thu Feb  9 16:43:28 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 07:43:28 -0800
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, skip at <skip at> wrote:
>     Greg> I agree that "hint" is a more precise name.
> Ditto.  In addition, we already have queues.  Do we really need to use a
> homonym that means something entirely different?  (Hint: consider the added
> difficulty for non-native English speakers).

Right. As a non-native speaker I can confirm that for English
learners, "cue" is a bit mysterious at first while "hint" is obvious.

--Guido van Rossum (home page:

From tjreedy at  Thu Feb  9 16:39:40 2006
From: tjreedy at (Terry Reedy)
Date: Thu, 9 Feb 2006 10:39:40 -0500
Subject: [Python-Dev] _length_cue()
References: <><00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1><>
Message-ID: <dsfo0e$un8$>

<skip at> wrote in message 
news:17387.21506.776343.95040 at
>    >>> Might I suggest that at least you consider using "hint" instead of 
> "cue"?
>    ...
>    Greg> I agree that "hint" is a more precise name.
> Ditto.  In addition, we already have queues.  Do we really need to use a
> homonym that means something entirely different?  (Hint: consider the 
> added
> difficulty for non-native English speakers).

Even as a native English speaker, but without knowing the intended meaning, 
I did not understand or guess that length_cue meant length_hint.  The 
primary meaning of cue is 'signal to begin some action', with 'hint, 
suggestion' being a secondary meaning.  Even then, I would take it as 
referring to possible action rather than possible information.

Cue is also short for queue, leading to cue stick (looks like a pigtail, 
long and tapering) and cue ball. 

From skip at  Thu Feb  9 16:54:59 2006
From: skip at (skip at
Date: Thu, 9 Feb 2006 09:54:59 -0600
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

    >> Ditto.  In addition, we already have queues.  Do we really need to
    >> use a homonym that means something entirely different?  (Hint:
    >> consider the added difficulty for non-native English speakers).

    Guido> Right. As a non-native speaker I can confirm that for English
    Guido> learners, "cue" is a bit mysterious at first while "hint" is
    Guido> obvious.

Guido, you're hardly your typical non-native speaker.  I think your English
may be better than mine. ;-) At any rate, I was thinking of some of the
posts I see on where it requires a fair amount of detective work just
to figure out what the poster has written, what with all the incorrect
grammar and wild misspellings.  For that sort of person I can believe that
"cue", "queue" and "kew" might mean exactly the same thing...


From jack at  Thu Feb  9 17:21:49 2006
From: jack at (Jack Diederich)
Date: Thu, 9 Feb 2006 11:21:49 -0500
Subject: [Python-Dev] _length_cue()
In-Reply-To: <00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

[Raymond Hettinger]
> [Armin Rigo]
> > BTW the reason I'm looking at this is that I'm considering adding
> > another undocumented internal-use-only method, maybe __getitem_cue__(),
> > that would try to guess what the nth item to be returned will be.  This
> > would allow the repr of some iterators to display more helpful
> > information when playing around with them at the prompt, e.g.:
> >
> >>>> enumerate([3.1, 3.14, 3.141, 3.1415, 3.14159, 3.141596])
> > <enumerate (0, 3.1), (1, 3.14), (2, 3.141),... length 6>
> At one point, I explored and then abandoned this idea.  For objects like 
> itertools.count(n), it worked fine -- the state was readily knowable and the 
> eval(repr(obj)) round-trip was possible.  However, for tools like 
> enumerate(), it didn't make sense to have a preview that only applied in a 
> tiny handful of (mostly academic) cases and was not evaluable in any case.

That is my experience too.  Even for knowable sequences, people consume
them in series rather than pulling out one element.  My permutation module supports
pulling out just the Nth canonical permutation.  Lots of people have
used the module and no one uses that feature.

>>> import probstat
>>> p = probstat.Permutation(range(4))
>>> p[0]
[0, 1, 2, 3]
>>> len(p)
24
>>> p[23]
[3, 2, 1, 0]
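
For readers unfamiliar with the idea, pulling out the Nth permutation
directly can be sketched with the factorial number system. This is an
illustrative pure-Python version under that assumption, not probstat's
actual implementation; the function name nth_permutation is made up.

```python
from math import factorial


def nth_permutation(items, n):
    """Return the n-th permutation of items in lexicographic order."""
    pool = list(items)
    if not 0 <= n < factorial(len(pool)):
        raise IndexError("permutation index out of range")
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice for the next slot covers (remaining - 1)! permutations.
        block = factorial(remaining - 1)
        index, n = divmod(n, block)
        result.append(pool.pop(index))
    return result


first = nth_permutation(range(4), 0)
last = nth_permutation(range(4), 23)
```

No earlier permutation has to be generated to reach the requested one, which
is what makes random access cheap even when few callers need it.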


From martin at  Thu Feb  9 17:34:57 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Feb 2006 17:34:57 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>	
Message-ID: <>

Neil Hodgson wrote:
>    The postgres example is strange to me as I'd never consider passing
> a FILE* over a DLL boundary. Maybe this is a Unix/Windows cultural
> thing due to such practices being more dangerous on Windows.

In the specific example, Postgres has a PQprint function that can
print a query result to a file; the file was sys.stdout.

>>Also, there is still the redistribution issue: to redistribute
>>msvcr71.dll, you need to own a MSVC license. People that want to
>>use py2exe (or some such) are in trouble: they need to distribute
>>both python25.dll, and msvcr71.dll. They are allowed to distribute
>>the former, but (formally) not allowed to distribute the latter.
>    Link statically.

Not sure whether this was a serious suggestion. If pythonxy.dll
was statically linked, you would get all the CRT duplication
already in extension modules. Given that there are APIs in Python
where you have to do malloc/free across the python.dll
boundary, you get memory leaks.


From martin at  Thu Feb  9 17:39:31 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Feb 2006 17:39:31 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>	
Message-ID: <>

Jiwon Seo wrote:
> Apparently, simplest example is,
> collection.visit(lambda x: print x)

Ok. I remotely recall Guido suggesting that print should become
a function.

It's not a specific example though: what precise library provides
the visit method?

> which currently is not possible. Another example is,
> map(lambda x: if odd(x): return 1
>                       else: return 0,
>         listOfNumbers)

Hmm. What's wrong with

map(odd, listOfNumbers)

or, if you really need ints:

map(lambda x:int(odd(x)), listOfNumbers)

> Also, anything with exception handling code can't be without explicit
> function definition.
> collection.visit(lambda x: try: foo(x); except SomeError: error("error
> message"))

That's not a specific example.


From martin at  Thu Feb  9 17:43:32 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 09 Feb 2006 17:43:32 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
	<>	<>	<>
Message-ID: <>

Neil Hodgson wrote:
>>But that won't help when you need to deal with third-party
>>code that knows nothing about Python or its wrapped file
>>objects, and calls the CRT (or one of the myriad extant
>>CRTs, chosen at random:-) directly.
>    Can you explain exactly why there is a problem here? Its fairly
> normal under Windows to build applications that provide a generic
> plugin interface (think Netscape plugins or COM) that allow the
> plugins to be built with any compiler and runtime.

COM really solves all problems people might have on Windows.
Alas, it is not a cross-platform API. Standard C is cross-platform,
so Python uses it in its own APIs.


From brett at  Thu Feb  9 18:42:44 2006
From: brett at (Brett Cannon)
Date: Thu, 9 Feb 2006 09:42:44 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> Guido seemed accepting to this idea about 9 months ago when I spoke to
> him.  I finally got around to writing up the PEP.   I'd really like to
> get this into Python 2.5 if possible.
> -Travis
> PEP:  ###
> Title:  Allowing any object to be used for slicing

Overall I am fine with the idea.  Being used as an index is different
than coercion into an int so adding this extra method seems

> Implementation Plan
>    1) Add the slots
>    2) Change the ISINT macro in ceval.c to accomodate objects with the
>    index slot defined.

Maybe the macro should also be renamed?  Not exactly testing if
something is an int anymore if it checks for __index__.

>    3) Change the _PyEval_SliceIndex function to accomodate objects
>    with the index slot defined.


From raymond.hettinger at  Thu Feb  9 19:13:37 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Thu, 09 Feb 2006 13:13:37 -0500
Subject: [Python-Dev] _length_cue()
References: <><00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1><><>
Message-ID: <003101c62da4$8d0a3260$b83efea9@RaymondLaptop1>

> Hint seems like the standard terminology in the field.  I don't think
> it makes sense to invent our own terminology without some compelling
> reason.

Okay, I give, hint wins.
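
The name did stick: current Python spells the method __length_hint__ and,
since Python 3.4, exposes it through operator.length_hint(). A small sketch
follows; the Countdown class is hypothetical and exists only to show the
protocol.

```python
import operator


class Countdown:
    """Hypothetical iterator that knows how many items remain."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        # Only an estimate; callers must not rely on it being exact.
        return self.remaining


countdown = Countdown(3)
hint_before = operator.length_hint(countdown)
next(countdown)
hint_after = operator.length_hint(countdown)

# A generator offers no hint, so the supplied default is returned.
fallback = operator.length_hint((c for c in "abc"), 10)
```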


From bokr at  Thu Feb  9 19:24:43 2006
From: bokr at (Bengt Richter)
Date: Thu, 09 Feb 2006 18:24:43 GMT
Subject: [Python-Dev] Let's just *keep* lambda
References: <>	
Message-ID: <>

On Thu, 09 Feb 2006 17:39:31 +0100, "Martin v. Löwis" <martin at> wrote:

>Jiwon Seo wrote:
>> Apparently, simplest example is,
>> collection.visit(lambda x: print x)
>Ok. I remotely recall Guido suggesting that print should become
>a function.
Even so, that one is so trivial to define (other than the >> part):

 >>> import sys
 >>> def printfun(*args): sys.stdout.write(' '.join(map(str,args))+'\n')
 >>> lamb = lambda x: printfun(x)
 >>> lamb(123)
 123
 >>> printfun('How', 'about', 'that?')
 How about that?

Also the quasi-C variant:
 >>> def printf(fmt, *args): sys.stdout.write(fmt%args)
 >>> (lambda x: printf('How about this: %s', x))('-- also a function\n(no \\n here ;-) ')
 How about this: -- also a function
 (no \n here ;-) >>>

>It's not a specific example though: what precise library provides
>the visit method?
>> which currently is not possible. Another example is,
>> map(lambda x: if odd(x): return 1
>>                       else: return 0,
>>         listOfNumbers)
>Hmm. What's wrong with
>map(odd, listOfNumbers)
>or, if you really need ints:
>map(lambda x:int(odd(x)), listOfNumbers)
>> Also, anything with exception handling code can't be without explicit
>> function definition.
>> collection.visit(lambda x: try: foo(x); except SomeError: error("error
>> message"))
>That's not a specific example.
 >>> (lambda : """
 ...     I will say that the multi-line part
 ...     of the argument against lambda suites
 ...     is bogus, though ;-)
 ... """)(
 ...     ).splitlines(
 ...     )[-1].split()[1].capitalize(
 ...     ).rstrip(',')+'! (though this is ridiculous ;-)'
 'Bogus! (though this is ridiculous ;-)'

And, as you know, you can abuse the heck out of lambda (obviously this is
ridiculous**2 avoidance of external def)

 >>> lamb = lambda x: eval(compile("""if 1:
 ...     def f(x):
 ...         try: return 'zero one two three'.split()[x]
 ...         except Exception,e:return 'No name for %r -- %s:%s'%(x,e.__class__.__name__, e)
 ... """,'','exec')) or locals()['f'](x)
 >>> lamb(2)
 'two'
 >>> lamb(0)
 'zero'
 >>> lamb(4)
 'No name for 4 -- IndexError:list index out of range'
 >>> lamb('x')
 "No name for 'x' -- TypeError:list indices must be integers"

But would e.g. [1]

    collection.visit(lambda x::  # double ':' to signify suite start
       try: return 'zero one two three'.split()[x]
       except Exception,e:return 'No name for %r -- %s:%s'%(x,e.__class__.__name__, e)
    )

be so bad an "improvement"? Search your heart for the purest answer ;-)
(requires enclosing parens, and suite ends on closing ')' and if multiline,
the first line after the :: defines the indent-one left edge, and explicit
return of value required after ::).

[1] (using the function body above just as example, not saying it makes sense for collection.visit)

Bengt Richter

From guido at  Thu Feb  9 19:33:10 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 10:33:10 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Enough already.

As has clearly been proven, lambda is already perfect.


To those folks attempting to propose alternate syntax (e.g. x -> y):
this is the wrong thread for that (see subject). Seriously, I've seen
lots of proposals that just change the syntax, and none of them are so
much better than what we have. My comments on some recent proposals:

- <expr> for <formals>
Smells too much like a loop. And what if there are no formals? Also the
generalization from a generator without the "in <sequence>" part is
wrong; "f(x) for x in S" binds x, while the proposed "f(x) for x" has
x as a free variable. Very odd.

- <formals> -> <expr>
The -> symbol is much easier to miss. Also it means something
completely different in other languages. And it has some problems with
multiple formals: (x, y -> x+y) isn't very clear on the binding --
since '->' is an uncommon operator, there's no strong intuition about
whether ',' or '->' binds stronger. (x, y) -> x+y would make more
sense, but has an ambiguity as long as we want to allow argument
tuples (which I've wanted to take out, but that is also getting a lot
of opposition).

- lambda(<formals>): <expr>
This was my own minimal proposal. I withdraw it -- I agree with the
criticism that it looks too much like a function call.

- Use a different keyword instead of lambda
What is that going to solve?

- If there were other proposals, I missed them, or they were too far
out to left field to be taken seriously.


To those people complaining that Python's lambda misleads people into
thinking that it is the same as Lisp's lambda: you better get used to
it. Python has a long tradition of borrowing notations from other
languages and changing the "deep" meaning -- for example, Python's
assignment operator does something completely different from the same
operator in C or C++.
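
One way to see the difference, as a minimal sketch: in C, assigning one
struct variable to another copies the value, whereas Python assignment only
binds another name to the same object. The variable names are arbitrary.

```python
a = [1, 2, 3]
b = a              # binds a second name to the same list; nothing is copied
b.append(4)        # the change is visible through both names

c = list(a)        # an explicit copy is a separate object
c.append(5)        # does not affect a or b
```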


To those people who believe that lambda is required in some situations
because it behaves differently with respect to the surrounding scope
than def: it doesn't, and it never did. This is (still!) a
surprisingly common myth. I have no idea where it comes from; does
this difference exist in some other language that has lambda as well
as some other function definition mechanism?
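
A small sketch of that claim, written in present-day Python syntax. The function names are made up for illustration. Both forms close over the enclosing scope in the same way, including the late binding of loop variables:

```python
def make_adders(n):
    # Both functions close over the same local variable n.
    def add_def(x):
        return x + n
    add_lambda = lambda x: x + n
    return add_def, add_lambda

add_def, add_lambda = make_adders(10)
results = (add_def(1), add_lambda(1))

def late_binding_demo():
    # Each closure looks up i when called, not when defined,
    # whether it was created with lambda or with def.
    lambdas = []
    defs = []
    for i in range(3):
        lambdas.append(lambda: i)
        def f():
            return i
        defs.append(f)
    return [g() for g in lambdas], [g() for g in defs]
```

Calling late_binding_demo() gives the same list for both halves, since neither form captures the value of i at definition time.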


To those people still complaining that lambda is crippled because it
doesn't do statements: First, remember that adding statement
capability wouldn't really add any power to the language; lambda is
purely syntactic sugar for an anonymous function definition (see above
myth debunking section). Second, years of attempts to overcome this
haven't come up with a usable syntax (and yes, curly braces have been
proposed and rejected like everything else). It's a hard problem
because switching back to indentation-based parsing inside an
expression is problematic. For example, consider this hypothetical

a = foo(lambda x, y:
      print x
      print y)

Should this be considered legal? Or should it be written as

a = foo(lambda x, y:
          print x
          print y

??? (Indenting the prints so they start at a later column than the 'l'
of 'lambda', and adding an explicit dedent before the close
parenthesis.) Note that if the former were allowed, we'd have
additional ambiguity if foo() took two parameters, e.g.:

a = foo(lambda x, y:
      print x
      print y, 42)

-- is 42 the second argument to foo() or is it printed?

I'd much rather avoid this snake's nest by giving the function a name
and using existing statement syntax, like this:

def callback(x, y):
    print x
    print y
a = foo(callback)

This is unambiguous, easier to parse (for humans as well as for
computers), and doesn't actually span more text lines. Since this
typically happens in a local scope, the name 'callback' disappears as
soon as the scope is exited.

BTW I use the same approach regularly for breaking up long
expressions; for example instead of writing

a = foo(some_call(another_call(some_long_argument,
                               another_argument),
                  and_more(1, 2, 3)),
        and_still_more())

I'll write

x = another_call(some_long_argument, another_argument)
a = foo(some_call(x, and_more(1, 2, 3)), and_still_more())

and suddenly my code is more compact and yet easier to read! (In real
life, I'd use a more meaningful name than 'x', but since the example
is nonsense it's hard to come up with a meaningful name here. :-)

Regarding the leakage of temporary variable names in this case: I
don't care; this typically happens in a local scope where a compiler
could easily enough figure out that a variable is no longer in use.
And for clarity we use local variables in this way all the time


Parting shot: it appears that we're getting more and more
expressionized versions of statements: first list comprehensions, then
generator expressions, most recently conditional expressions, in
Python 3000 print() will become a function... Seen this way, lambda
was just ahead of its time! Perhaps we could add a try/except/finally
expression, and allow assignments in expressions, and then we could
get rid of statements altogether, turning Python into an expression
language. Change the use of parentheses a bit, and... voila, Lisp! :-)

--Guido van Rossum (home page:

From dialtone at  Thu Feb  9 19:47:51 2006
From: dialtone at (Valentino Volonghi aka Dialtone)
Date: Thu, 9 Feb 2006 19:47:51 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
Message-ID: <20060209184751.20077.352925131.divmod.quotient.1@ohm>

On Thu, 09 Feb 2006 17:39:31 +0100, "Martin v. Löwis" <martin at> wrote:
>It's not a specific example though: what precise library provides
>the visit method?

I'll provide my own usecase right now which is event driven programming of
any kind (from GUI toolkits, to network frameworks/libraries).

From my experience these are the kind of usecases that suffer most from
having to define functions every time and, worse, to define functions before
their actual usage (which is responsible for part of the bad reputation that,
for example, deferreds have).

Let's consider this piece of code (actual code that works today and uses
twisted for convenience):

def do_stuff(result):
    if result == 'Initial Value':
        d2 = work_on_result_and_return_a_deferred(result)
        return d2
    return 'No work on result'

def println(something):
    print something

d1 = some_operation_that_results_in_a_deferred()
d1.addCallback(lambda _: reactor.stop())

As evident the execution order is almost upside-down and this is because I
have to define a function before using it (instead of defining and using a
function inline). However python cannot have a statement inside an expression
as has already been said, thus I think some new syntax to support this could
be helpful, for example:

when some_operation_that_results_in_a_deferred() -> result:
    if result == 'Initial Value':
        when work_on_result_and_return_a_deferred(result) -> inner_res:
            print inner_res
        print "No work on result"

In this case the execution order is correct and indentation helps in
identifying which pieces of the execution will be run at a later time
(depending on the when block).

This way of coding could be useful for many kinds of event driven frameworks
like GUI toolkits that could do the following:

when button.clicked() -> event, other_args:
    when some_dialog() -> result:
        if result is not None:

IMHO similar considerations are valid for other libraries/frameworks like
asyncore. What would this require? Python should basically support a protocol
for a deferred like object that could be used by a framework to support that
syntax. Something like:

add_callback(callback, *a, **kw)
add_errback(callback, *a, **kw)
(extra methods if needed)
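
A minimal sketch of what an object following such a protocol might look like, in plain Python with no framework. Only add_callback and add_errback come from the list above. The class name SimpleDeferred and the fire/fail methods are hypothetical and are not Twisted's API. Unlike a real deferred, this toy does not handle callbacks that return another deferred.

```python
class SimpleDeferred:
    """Toy deferred: handlers run in order once a result arrives."""

    def __init__(self):
        self._chain = []          # entries are (kind, func, args, kwargs)

    def add_callback(self, callback, *a, **kw):
        self._chain.append(("ok", callback, a, kw))
        return self

    def add_errback(self, callback, *a, **kw):
        self._chain.append(("err", callback, a, kw))
        return self

    def _run(self, value, failed):
        for kind, func, a, kw in self._chain:
            # Skip handlers that do not match the current state.
            if (kind == "err") != failed:
                continue
            try:
                value = func(value, *a, **kw)
                failed = False
            except Exception as exc:
                value, failed = exc, True
        return value

    def fire(self, result):       # hypothetical: deliver a result
        return self._run(result, failed=False)

    def fail(self, error):        # hypothetical: deliver an error
        return self._run(error, failed=True)


log = []
d = SimpleDeferred()
d.add_callback(lambda r: r.upper())
d.add_errback(lambda e: "recovered")   # skipped unless something failed
d.add_callback(log.append)
d.fire("initial value")
```

A "when" block as proposed would presumably be translated by the compiler into calls like these, with the indented body becoming the callback.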


Valentino Volonghi aka Dialtone
Now Running MacOSX 10.4
New Pet:

From bokr at  Thu Feb  9 20:06:12 2006
From: bokr at (Bengt Richter)
Date: Thu, 09 Feb 2006 19:06:12 GMT
Subject: [Python-Dev] _length_cue()
References: <>
Message-ID: <>

On Thu, 9 Feb 2006 09:54:59 -0600, skip at wrote:

>    >> Ditto.  In addition, we already have queues.  Do we really need to
>    >> use a homonym that means something entirely different?  (Hint:
>    >> consider the added difficulty for non-native English speakers).
>    Guido> Right. As a non-native speaker I can confirm that for English
>    Guido> learners, "cue" is a bit mysterious at first while "hint" is
>    Guido> obvious.
>Guido, you're hardly your typical non-native speaker.  I think your English
>may be better than mine. ;-) At any rate, I was thinking of some of the
>posts I see on where it requires a fair amount of detective work just
>to figure out what the poster has written, what with all the incorrect
>grammar and wild misspellings.  For that sort of person I can believe that
>"cue", "queue" and "kew" might mean exactly the same thing...
FWIW, I first thought "cue" might be a typo mutation of "clue" ;-)
+1 on something with "hint".

Bengt Richter

From guido at  Thu Feb  9 20:30:01 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 11:30:01 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> Guido seemed accepting to this idea about 9 months ago when I spoke to
> him.  I finally got around to writing up the PEP.   I'd really like to
> get this into Python 2.5 if possible.

Excellent! I was just going over the 2.5 schedule with Neal Norwitz
last night, and looking back in my slides for OSCON 2005 I noticed
this idea, and was wondering if you still wanted it. I'm glad the
answer is yes!

BTW do you also still want to turn ZeroDivisionError into a warning
(that is changed into an error by default)? That idea shared a slide
and I believe it was discussed in the same meeting you & I and some
others had in San Mateo last summer.

I'll comment on the PEP in-line. I've assigned it number 357 and checked it in.


In the past, the protocol for acquiring a PEP number has been to ask
the PEP coordinators (Barry Warsaw and David Goodger) to assign one. I
believe that we could simplify this protocol to avoid necessary
involvement of the PEP coordinators; all that is needed is someone
with checkin privileges. I propose the following protocol:

1. In the peps directory, do a svn update.

2. Look at the files that are there and the contents of pep-0000.txt.
This should provide you with the last PEP number in sequence, ignoring
the out-of-sequence PEPs (666, 754, and 3000). The reason to look in
PEP 0 is that it is conceivable that a PEP number has been reserved in
the index but not yet committed, so you should use the largest number.

3. Add 1 to the last PEP number. This gives your new PEP number, NNNN.

4. Using svn add and svn commit, check in the file pep-NNNN.txt (use
%04d to format the number); the contents can be a minimal summary or
even just headers. If this succeeds, you have successfully assigned
yourself PEP number NNNN. Exit.

5. If you get an error from svn about the commit, someone else was
carrying out the same protocol at the same time, and they won the
race. Start over from step 1.
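
Steps 2 and 3 can be sketched roughly as follows. This is an illustration only: the function name and the "reserved" parameter are made up, reading reserved numbers out of pep-0000.txt is not shown, and the svn commit and retry of steps 4 and 5 are left out.

```python
import re

OUT_OF_SEQUENCE = {666, 754, 3000}   # the exceptions named in step 2

def next_pep_filename(filenames, reserved=()):
    """Return the next pep-NNNN.txt name given the existing file names.

    `reserved` stands in for numbers listed in PEP 0 but not yet
    committed (a hypothetical input, obtained by hand).
    """
    numbers = set(reserved)
    for name in filenames:
        match = re.fullmatch(r"pep-(\d{4})\.txt", name)
        if match:
            numbers.add(int(match.group(1)))
    # Ignore the out-of-sequence PEPs, then take the largest number.
    last = max(numbers - OUT_OF_SEQUENCE)
    return "pep-%04d.txt" % (last + 1)

existing = ["pep-0000.txt", "pep-0355.txt", "pep-0356.txt",
            "pep-0666.txt", "pep-3000.txt", "README.txt"]
new_name = next_pep_filename(existing)
```

The race in step 5 is what makes the commit itself the real allocation step; the computation above only proposes a candidate number.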

I suspect the PEP coordinators have informally been using this
protocol amongst themselves -- and amongst the occasional developer who
bypassed the "official" protocol, like I've done in the past and like
Neal Norwitz did last night with the Python 2.5 release schedule, PEP
356. I'm simply extending the protocol to all developers with checkin
permissions. For PEP authors without checkin permissions, nothing
changes, except that optionally if they don't get a timely response
from the PEP coordinators, they can ask someone else with checkin
permissions.


> PEP:  ###
> Title:  Allowing any object to be used for slicing
> Version:  $Revision 1.1$
> Last Modified: $Date: 2006/02/09 $
> Author: Travis Oliphant <oliphant at>
> Status: Draft
> Type:  Standards Track
> Created:  09-Feb-2006
> Python-Version:  2.5
> Abstract
>    This PEP proposes adding an sq_index slot in PySequenceMethods and
>    an __index__ special method so that arbitrary objects can be used
>    in slice syntax.
> Rationale
>    Currently integers and long integers play a special role in slice
>    notation in that they are the only objects allowed in slice
>    syntax. In other words, if X is an object implementing the sequence
>    protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>    integers or long integers.  There is no way for obj1 and obj2 to
>    tell Python that they could be reasonably used as indexes into a
>    sequence.  This is an unnecessary limitation.
>    In NumPy, for example, there are 8 different integer scalars
>    corresponding to unsigned and signed integers of 8, 16, 32, and 64
>    bits.  These type-objects could reasonably be used as indexes into
>    a sequence if there were some way for their typeobjects to tell
>    Python what integer value to use.
> Proposal
>    Add a sq_index slot to PySequenceMethods, and a corresponding
>    __index__ special method.  Objects could define a function to
>    place in the sq_index slot that returns a C-integer for use in
>    PySequence_GetSlice, PySequence_SetSlice, and PySequence_DelSlice.

Shouldn't this slot be in the PyNumberMethods extension? It feels more
like a property of numbers than a property of sequences. Also, the
slot name should then probably be nb_index.

There's also an ambiguity when using simple indexing. When writing
x[i] where x is a sequence and i an object that isn't int or long but
implements __index__, I think i.__index__() should be used rather than
bailing out. I suspect that you didn't think of this because you've
already special-cased this in your code -- when a non-integer is
passed, the mapping API is used (mp_subscript). This is done to
support extended slicing. The built-in sequences (list, str, unicode,
tuple for sure, probably more) that implement mp_subscript should
probe for nb_index before giving up. The generic code in
PyObject_GetItem should also check for nb_index before giving up.
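
A sketch of the behaviour being asked for, as it looks from the Python side in versions that implement __index__. The Int8 class is a made-up stand-in for something like a NumPy integer scalar:

```python
class Int8:
    """Toy fixed-width integer scalar (hypothetical, for illustration)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Tell Python which integer to use when this object is an index.
        return self.value

seq = ["a", "b", "c", "d"]
item = seq[Int8(2)]              # simple indexing, x[i]
part = seq[Int8(1):Int8(3)]      # slicing, x[a:b]
```

Both the simple index and the slice bounds go through __index__, which is the point of probing for the slot in the mp_subscript path as well as in the slice path.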

> Implementation Plan
>    1) Add the slots
>    2) Change the ISINT macro in ceval.c to accommodate objects with the
>    index slot defined.
>    3) Change the _PyEval_SliceIndex function to accommodate objects
>    with the index slot defined.

I think all sequence objects that implement mp_subscript should
probably be modified according to the lines I sketched above.

> Possible Concerns
>    Speed:
>    Implementation should not slow down Python because integers and long
>    integers used as indexes will complete in the same number of
>    instructions.  The only change will be that what used to generate
>    an error will now be acceptable.
>    Why not use nb_int which is already there?
>    The nb_int, nb_oct, and nb_hex methods are used for coercion.
>    Floats have these methods defined and floats should not be used in
>    slice notation.
> Reference Implementation
>    Available on PEP acceptance.

This is very close to acceptance. I think I'd like to see the patch
developed and submitted to SF (and assigned to me) prior to

> Copyright
>    This document is placed in the public domain

If you agree with the above comments, please send me an updated
version of the PEP and I'll check it in over the old one, and approve
it. Then just use SF to submit the patch etc.

--Guido van Rossum (home page:

From guido at  Thu Feb  9 20:31:27 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 11:31:27 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Brett Cannon <brett at> wrote:
> >    2) Change the ISINT macro in ceval.c to accommodate objects with the
> >    index slot defined.
> Maybe the macro should also be renamed?  Not exactly testing if
> something is an int anymore if it checks for __index__.

Have you looked at the code? ceval.c uses this macro only in the slice
processing code. I don't particularly care what it's called...

--Guido van Rossum (home page:

From guido at  Thu Feb  9 20:37:48 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 11:37:48 -0800
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 2/9/06, "Martin v. Löwis" <martin at> wrote:
> COM really solves all problems people might have on Windows.

Taken deliberately out of context, that sounds rather like a claim
even Microsoft itself wouldn't make. :-)

--Guido van Rossum (home page:

From guido at  Thu Feb  9 20:42:21 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 11:42:21 -0800
Subject: [Python-Dev] threadsafe patch for asynchat
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/7/06, Mark Edgington <edgimar at> wrote:
> Ok, perhaps the notation could be improved, but the idea of the
> semaphore in the patch is "Does it run inside of a multithreaded
> environment, and could its push() functions be called from a different
> thread?"

The long-term fate of asyncore/asynchat aside, instead of wanting to
patch asynchat, you should be able to subclass it easily to introduce
the functionality you want. Given the disagreement over whether this
is a good thing, I suggest that that's a much better way for you to
solve your problem than to introduce yet another obscure confusing
optional parameter. And you won't have to wait for Python 2.5.

--Guido van Rossum (home page:

From brett at  Thu Feb  9 21:28:37 2006
From: brett at (Brett Cannon)
Date: Thu, 9 Feb 2006 12:28:37 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Guido van Rossum <guido at> wrote:
> On 2/9/06, Brett Cannon <brett at> wrote:
> > >    2) Change the ISINT macro in ceval.c to accommodate objects with the
> > >    index slot defined.
> >
> > Maybe the macro should also be renamed?  Not exactly testing if
> > something is an int anymore if it checks for __index__.
> Have you looked at the code? ceval.c uses this macro only in the slice
> processing code. I don't particularly care what it's called...

Yeah, I looked.  I just don't want a misnamed macro to start being
abused for some odd reason.  Might as well rename it while we are
thinking about it than let it have a bad name.


From bokr at  Thu Feb  9 21:53:39 2006
From: bokr at (Bengt Richter)
Date: Thu, 09 Feb 2006 20:53:39 GMT
Subject: [Python-Dev] Let's send lambda to the shearing shed (Re: Let's
	just *keep* lambda)
References: <>
Message-ID: <>

On Thu, 09 Feb 2006 16:41:10 +1300, Greg Ewing <greg.ewing at> wrote:

>My thought on lambda at the moment is that it's too VERBOSE.
>If a syntax for anonymous functions is to pull its weight,
>it needs to be *very* concise. The only time I ever consider
>writing a function definition in-line is when the body is
>extremely short, otherwise it's clearer to use a def instead.
>Given that, I do *not* have the space to waste with 6 or 7
>characters of geeky noise-word.
OTOH, it does stand out as a flag to indicate what is being done.

>So my vote for Py3k is to either
>1) Replace lambda args: value with
>   args -> value
>or something equivalently concise, or
Yet another bike shed color chip:

    !(args:expr)   # <==> lambda args:expr
    !(args::suite) # <==> (lambda args::suite)

(where the latter lambda form requires outer enclosing parens) But either "::" form
allows full def suite, with indentation for multilines having left edge of single indent
defined by first line following the "::"-containing line, and explicit returns for values
required and top suite ending on closing outer paren)

Probable uses for the "::" form would be for short inline suite definitions
    !(x::print x)               # <==> (lambda x::print x) & etc. similarly
    !(::global_counter+=1;return global_counter)
    !(::raise StopIteration)()  # more honest than iter([]).next()

but the flexibility would be there for an in-context definition, e.g.,

    sorted(seq, key= !(x::
        try: return abs(x)
        except TypeError: return 0))

and closures could be spelled

    !(c0,c1:!(x:c0+c1*x))(3,5)   # single use with constants is silly spelling of !(x:3+5*x)

Hm, are the latter two really better for eliminating "lambda"? Cf:

    sorted(seq, key=(lambda x::
        try:return abs(x)
        except TypeError: return 0))
    (lambda c0,c1:lambda x:c0+c1*x)(3,5) # also silly with constants

I'm not sure. I think I kind of like lambda args:expr and (lambda args::suite)
but sometimes super-concise is nice ;-)

>2) Remove lambda entirely.

Bengt Richter

From rhamph at  Thu Feb  9 22:14:43 2006
From: rhamph at (Adam Olsen)
Date: Thu, 9 Feb 2006 14:14:43 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <dsfmp3$pc4$>
References: <>
Message-ID: <>

On 2/9/06, Travis E. Oliphant <oliphant.travis at> wrote:
> I'm a little confused.  Is your opposition solely due to the fact that
> you think float's __int__ method ought to raise exceptions and the
> apply_slice code should therefore use the __int__ slot?
> In theory I can understand this reasoning.  In practice, however, the
> __int__ slot has been used for "coercion" and changing the semantics of
> int(3.2) at this stage seems like a recipe for lots of code breakage.  I
> don't think something like that is possible until Python 3k.
> If that is not your opposition, please be more clear. Regardless of how
> it is done, it seems rather unPythonic to only allow 2 special types to
> be used in apply_slice and assign_slice.

Yes, that is the basis of my opposition, and I do understand it would
take a long time to change __int__.

What is the recommended practice for python?  I can think of three
distinct categories of behavior:
- float to str.  Some types converted to str might be lossy, but in
general it's a very drastic conversion and unrelated to the others
- float to Decimal.  Raises an exception because it's usually lossy.
- Decimal to int.  Truncates, quite happily losing precision.

I guess my confusion revolves around float to Decimal.  Is lossless
conversion a good thing in python, or is prohibiting float to Decimal
conversion just a fudge to prevent people from initializing a Decimal
from a float when they really want a str?
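
Two of the three categories can be shown in a few lines with the standard decimal module. The float-to-Decimal case is left out of the sketch because the constructor's behaviour on floats has changed in later Python releases. Variable names are illustrative:

```python
from decimal import Decimal

# Decimal to int: truncates toward zero, silently dropping the fraction.
truncated = int(Decimal("3.99"))
truncated_negative = int(Decimal("-3.99"))

# float to str, then str to Decimal: the usual way to get the value
# "as written" rather than the binary approximation.
text = str(0.5)
via_str = Decimal(text)
```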

Adam Olsen, aka Rhamphoryncus

From bokr at  Thu Feb  9 22:26:34 2006
From: bokr at (Bengt Richter)
Date: Thu, 09 Feb 2006 21:26:34 GMT
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
References: <>
Message-ID: <>

On Thu, 09 Feb 2006 01:00:22 -0700, Travis Oliphant <oliphant.travis at> wrote:
>   This PEP proposes adding an sq_index slot in PySequenceMethods and
>   an __index__ special method so that arbitrary objects can be used
>   in slice syntax.
>   Currently integers and long integers play a special role in slice
>   notation in that they are the only objects allowed in slice
>   syntax. In other words, if X is an object implementing the sequence
>   protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>   integers or long integers.  There is no way for obj1 and obj2 to
>   tell Python that they could be reasonably used as indexes into a
>   sequence.  This is an unnecessary limitation.  
>   In NumPy, for example, there are 8 different integer scalars
>   corresponding to unsigned and signed integers of 8, 16, 32, and 64
>   bits.  These type-objects could reasonably be used as indexes into
>   a sequence if there were some way for their typeobjects to tell
>   Python what integer value to use.  
>   Add a sq_index slot to PySequenceMethods, and a corresponding
>   __index__ special method.  Objects could define a function to
>   place in the sq_index slot that returns a C-integer for use in
>   PySequence_GetSlice, PySequence_SetSlice, and PySequence_DelSlice.
How about if SLICE byte code interpretation would try to call
obj.__int__() if passed a non-(int,long) obj ? Would that cover your use case?

BTW the slice type happily accepts anything for start:stop:step I believe,
and something[slice(whatever)] will call something.__getitem__ with the slice
instance, though this is neither a fast nor nicely spelled way to customize.
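
A short sketch of that observation. The Recorder class is made up and simply returns whatever key it is handed:

```python
class Recorder:
    """Returns the key passed to __getitem__ (illustrative only)."""

    def __getitem__(self, key):
        return key

r = Recorder()
s = r["start":"stop":1.5]          # slice syntax with non-integer parts
t = r[slice(None, "whatever")]     # explicit slice object
```

The slice object carries its start, stop and step through untouched, so any interpretation of them is left to the __getitem__ implementation.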

Bengt Richter

From barry at  Thu Feb  9 22:20:16 2006
From: barry at (Barry Warsaw)
Date: Thu, 09 Feb 2006 16:20:16 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any	object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 2006-02-09 at 11:30 -0800, Guido van Rossum wrote:

> In the past, the protocol for acquiring a PEP number has been to ask
> the PEP coordinators (Barry Warsaw and David Goodger) to assign one. I
> believe that we could simplify this protocol to avoid necessary
> involvement of the PEP coordinators; all that is needed is someone
> with checkin privileges. I propose the following protocol:


In general, this is probably fine.  Occasionally we reserve a PEP number
for something special, or for a pre-request, but I think both are pretty
rare.  And because of svn and the commit messages we can at least catch
those fairly quickly and fix them.  Maybe we can add known reserved
numbers to PEP 0 so they aren't taken accidentally.

What I'm actually more concerned about is that we (really David) often
review PEPs and reject first submissions on several grounds.  I must say
that David's done such a good job at keeping the quality of PEPs high
that I'm leery of interfering with that.  OTOH, perhaps those with
commit privileges should be expected to produce high quality PEPs on the
first draft.

Maybe we can amend your rules to those people who both have commit
privileges and have successfully submitted a PEP before.  PEP virgins
should go through the normal process.



From g.brandl at  Thu Feb  9 22:38:49 2006
From: g.brandl at (Georg Brandl)
Date: Thu, 09 Feb 2006 22:38:49 +0100
Subject: [Python-Dev] Let's send lambda to the shearing shed (Re: Let's
 just *keep* lambda)
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <dsgcp9$p1e$>

Bengt Richter wrote:

>>1) Replace lambda args: value with
>>   args -> value
>>or something equivalently concise, or
> Yet another bike shed color chip:
>     !(args:expr)   # <==> lambda args:expr
> and
>     !(args::suite) # <==> (lambda args::suite)

Please drop it. Guido pronounced on it, it is _not_ going to change,
and the introduction of new punctuation is clearly not improving anything.

> (where the latter lambda form requires outer enclosing parens) But either "::" form
> allows full def suite, with indentation for multilines having left edge of single indent
> defined by first line following the "::"-containing line, and explicit returns for values
> required and top suite ending on closing outer paren)
> Probable uses for the "::" form would be for short inline suite definitions
>     !(x::print x)               # <==> (lambda x::print x) & etc. similarly

Use sys.stdout.write.

>     !(::global_counter+=1;return global_counter)
>     !(::raise StopIteration)()  # more honest than iter([]).next()

Use a function.

> but the flexibility would be there for an in-context definition, e.g.,
>     sorted(seq, key= !(x::
>         try: return abs(x)
>         except TypeError: return 0))

Bah! I can't parse this. In "!(x::" there's clearly too much noise.


From oliphant.travis at  Thu Feb  9 22:39:07 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 14:39:07 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <> <>
Message-ID: <dsgcqa$pdg$>

Bengt Richter wrote:
> How about if SLICE byte code interpretation would try to call
> obj.__int__() if passed a non-(int,long) obj ? Would that cover your use case?

I believe that this is pretty much exactly what I'm proposing.  The 
apply_slice and assign_slice functions in ceval.c are called for the 

> BTW the slice type happily accepts anything for start:stop:step I believe,
> and something[slice(whatever)] will call something.__getitem__ with the slice
> instance, though this is neither a fast nor nicely spelled way to customize.

Yes, the slice object itself takes whatever you want.  However, Python 
special-cases what happens for X[a:b] *if* X has the sequence-protocol 
defined.   This is the code-path I'm trying to enhance.


From oliphant.travis at  Thu Feb  9 22:40:29 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 14:40:29 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <> <>
Message-ID: <dsgcsd$pdg$>

Bengt Richter wrote:
> How about if SLICE byte code interpretation would try to call
> obj.__int__() if passed a non-(int,long) obj ? Would that cover your use case?

I believe that this is pretty much what I'm proposing (except I'm not 
proposing to use the __int__ method because it is already used as 
coercion and doing this would allow floats to be used in slices which is 
a bad thing).  The apply_slice and assign_slice functions in ceval.c are 
called for the SLICE and STORE_SLICE and DELETE_SLICE opcodes.
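
The distinction between coercion and "usable as an index" can be seen in Python versions that implement __index__. This sketch uses operator.index from the standard library; the variable names are illustrative:

```python
import operator

coerced = int(3.2)            # __int__-style coercion silently truncates

try:
    operator.index(3.2)       # floats define no __index__
    float_is_index = True
except TypeError:
    float_is_index = False

text = "abcdef"
lo, hi = 1.0, 3.0
try:
    text[lo:hi]               # so float slice bounds are rejected
    float_slice_ok = True
except TypeError:
    float_slice_ok = False
```

Had slicing fallen back on __int__, the last case would quietly behave like text[1:3].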

> BTW the slice type happily accepts anything for start:stop:step I believe,
> and something[slice(whatever)] will call something.__getitem__ with the slice
> instance, though this is neither a fast nor nicely spelled way to customize.

Yes, the slice object itself takes whatever you want.  However, Python 
special-cases what happens for X[a:b] *if* X has the sequence-protocol 
defined.   This is the code-path I'm trying to enhance.


From nyamatongwe at  Thu Feb  9 23:00:10 2006
From: nyamatongwe at (Neil Hodgson)
Date: Fri, 10 Feb 2006 09:00:10 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis:

> COM really solves all problems people might have on Windows.

   COM was partly just a continuation of the practices used for
controls, VBXs and other forms of extension. Visual Basic never forced
use of a particular compiler or runtime library for extensions so why
should Python? It was also easy to debug an extension DLL inside
release-mode VB (I can't recall if debug versions of VB were ever
readily available) which is something that is more difficult than it
should be for Python.

> Alas, it is not a cross-platform API. Standard C is cross-platform,
> so Python uses it in its own APIs.

   The old (pre-XPCOM) Netscape plugin interface was cross-platform
and worked with any compiler on Windows.


From nyamatongwe at  Thu Feb  9 23:00:35 2006
From: nyamatongwe at (Neil Hodgson)
Date: Fri, 10 Feb 2006 09:00:35 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
	<031401c62cbb$61810630$bf03030a@trilan> <>
Message-ID: <>

Martin v. Löwis:

> Not sure whether this was a serious suggestion.

   Yes it is.

> If pythonxy.dll
> was statically linked, you would get all the CRT duplication
> already in extension modules. Given that there are APIs in Python
> where you have to do malloc/free across the python.dll
> boundary, you get memory leaks.

   Memory allocations across DLL boundaries will have to use wrapper functions.


From nyamatongwe at  Thu Feb  9 23:00:51 2006
From: nyamatongwe at (Neil Hodgson)
Date: Fri, 10 Feb 2006 09:00:51 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Paul Moore:

> This has all been thrashed out before, but the issue is passing
> CRT-allocated objects across DLL boundaries.

   Yes, that was the first point I addressed through wrapping CRT objects.

> At first glance, this is a minor issue - passing FILE* pointers across
> DLL boundaries isn't something I'd normally expect people to do - but
> look further and you find you're opening a real can of worms. For
> example, Python has public APIs which take FILE* parameters.

   So convert them to taking PyWrappedFile * parameters.

> Further,
> memory allocation is CRT-managed - allocate memory with one CRT's
> malloc, and deallocate it elsewhere, and you have issues. So *any*
> pointer could be CRT-managed, to some extent. Etc, etc...

   I thought PyMem_Malloc was the correct call to use for memory
allocation now and avoided direct links to the CRT for memory


From brett at  Thu Feb  9 23:01:42 2006
From: brett at (Brett Cannon)
Date: Thu, 9 Feb 2006 14:01:42 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Barry Warsaw <barry at> wrote:
> On Thu, 2006-02-09 at 11:30 -0800, Guido van Rossum wrote:
> > In the past, the protocol for acquiring a PEP number has been to ask
> > the PEP coordinators (Barry Warsaw and David Goodger) to assign one. I
> > believe that we could simplify this protocol to avoid necessary
> > involvement of the PEP coordinators; all that is needed is someone
> > with checkin privileges. I propose the following protocol:
> [omitted]
> In general, this is probably fine.  Occasionally we reserve a PEP number
> for something special, or for a pre-request, but I think both are pretty
> rare.  And because of svn and the commit messages we can at least catch
> those fairly quickly and fix them.  Maybe we can add known reserved
> numbers to PEP 0 so they aren't taken accidentally.
> What I'm actually more concerned about is that we (really David) often
> review PEPs and reject first submissions on several grounds.  I must say
> that David's done such a good job at keeping the quality of PEPs high
> that I'm leery of interfering with that.  OTOH, perhaps those with
> commit privileges should be expected to produce high quality PEPs on the
> first draft.
> Maybe we can amend your rules to those people who both have commit
> privileges and have successfully submitted a PEP before.  PEP virgins
> should go through the normal process.

Sounds reasonable to me.  Then again I don't think I would ever
attempt to get a PEP accepted without at least a single pass over by
python-dev or .  But making it simpler definitely would be nice
when you can already check in yourself.


From oliphant.travis at  Thu Feb  9 23:11:17 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 15:11:17 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <dsgem7$10u$>

Attached is an updated PEP for 357.  I think the index concept is better 
situated in the PyNumberMethods structure.  That way an object doesn't 
have to define the Sequence protocol just to be treated like an index.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PEP_index.txt

From martin at  Thu Feb  9 23:23:02 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 09 Feb 2006 23:23:02 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>	
Message-ID: <>

Neil Hodgson wrote:
>    COM was partly just a continuation of the practices used for
> controls, VBXs and other forms of extension. Visual Basic never forced
> use of a particular compiler or runtime library for extensions so why
> should Python?

Do you really not know? Because of API that happens to be defined
the way it is.

>>Alas, it is not a cross-platform API. Standard C is cross-platform,
>>so Python uses it in its own APIs.
>    The old (pre-XPCOM) Netscape plugin interface was cross-platform
> and worked with any compiler on Windows.

Yes, and consequently, it avoids using standard C library types


From martin at  Thu Feb  9 23:24:58 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 09 Feb 2006 23:24:58 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>	
Message-ID: <>

Neil Hodgson wrote:
>>If pythonxy.dll
>>was statically linked, you would get all the CRT duplication
>>already in extension modules. Given that there are APIs in Python
>>where you have to do malloc/free across the python.dll
>>boundary, you get memory leaks.
>    Memory allocations across DLL boundaries will have to use wrapper functions.

Sure, but that is a change to the API. Contributions are welcome, along
with a plan how breakage of existing code can be minimized.


From martin at  Thu Feb  9 23:28:58 2006
From: martin at ("Martin v. Löwis")
Date: Thu, 09 Feb 2006 23:28:58 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <>
	<>	<>	<>	<>	<>
Message-ID: <>

Neil Hodgson wrote:
>>At first glance, this is a minor issue - passing FILE* pointers across
>>DLL boundaries isn't something I'd normally expect people to do - but
>>look further and you find you're opening a real can of worms. For
>>example, Python has public APIs which take FILE* parameters.
>    So convert them to taking PyWrappedFile * parameters.

Easy to say, hard to do.


From brett at  Thu Feb  9 23:32:47 2006
From: brett at (Brett Cannon)
Date: Thu, 9 Feb 2006 14:32:47 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <dsgem7$10u$>
References: <> <dsgem7$10u$>
Message-ID: <>

Looks good to me.  Only change I might make is mention why __int__
doesn't work sooner (such as in the rationale).  Otherwise +1 from me.


On 2/9/06, Travis E. Oliphant <oliphant.travis at> wrote:
> Attached is an updated PEP for 357.  I think the index concept is better
> situated in the PyNumberMethods structure.  That way an object doesn't
> have to define the Sequence protocol just to be treated like an index.
> -Travis
> PEP: 357
> Title:  Allowing any object to be used for slicing
> Version:  Revision 1.2
> Last Modified: 02/09/2006
> Author: Travis Oliphant <oliphant at>
> Status: Draft
> Type:  Standards Track
> Created:  09-Feb-2006
> Python-Version:  2.5
> Abstract
>    This PEP proposes adding an nb_as_index slot in PyNumberMethods and
>    an __index__ special method so that arbitrary objects can be used
>    in slice syntax.
> Rationale
>    Currently integers and long integers play a special role in slice
>    notation in that they are the only objects allowed in slice
>    syntax. In other words, if X is an object implementing the sequence
>    protocol, then X[obj1:obj2] is only valid if obj1 and obj2 are both
>    integers or long integers.  There is no way for obj1 and obj2 to
>    tell Python that they could be reasonably used as indexes into a
>    sequence.  This is an unnecessary limitation.
>    In NumPy, for example, there are 8 different integer scalars
>    corresponding to unsigned and signed integers of 8, 16, 32, and 64
>    bits.  These type-objects could reasonably be used as indexes into
>    a sequence if there were some way for their typeobjects to tell
>    Python what integer value to use.
> Proposal
>    Add a nb_index slot to PyNumberMethods, and a corresponding
>    __index__ special method.  Objects could define a function to
>    place in the nb_index slot that returns an appropriate
>    C-integer for use as ilow or ihigh in PySequence_GetSlice,
>    PySequence_SetSlice, and PySequence_DelSlice.
> Implementation Plan
>    1) Add the slots
>    2) Change the ISINT macro in ceval.c to ISINDEX and alter it to
>       accommodate objects with the index slot defined.
>    3) Change the _PyEval_SliceIndex function to accommodate objects
>       with the index slot defined.
> Possible Concerns
>    Speed:
>    Implementation should not slow down Python because integers and long
>    integers used as indexes will complete in the same number of
>    instructions.  The only change will be that what used to generate
>    an error will now be acceptable.
>    Why not use nb_int which is already there?
>    The nb_int, nb_oct, and nb_hex methods are used for coercion.
>    Floats have these methods defined and floats should not be used in
>    slice notation.
> Reference Implementation
>    Available on PEP acceptance.
> Copyright
>    This document is placed in the public domain

From oliphant.travis at  Thu Feb  9 23:38:16 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 15:38:16 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <dsgg8s$7it$>

Guido van Rossum wrote:
> On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> BTW do you also still want to turn ZeroDivisionError into a warning
> (that is changed into an error by default)? That idea shared a slide
> and I believe it was discussed in the same meeting you & I and some
> others had in San Mateo last summer.

I think that idea has some support, but I haven't been thinking about it 
for a while.

> Shouldn't this slot be in the PyNumberMethods extension? It feels more
> like a property of numbers than a property of sequences. Also, the
> slot name should then probably be nb_index.

Yes, definitely!!!

> There's also an ambiguity when using simple indexing. When writing
> x[i] where x is a sequence and i an object that isn't int or long but
> implements __index__, I think i.__index__() should be used rather than
> bailing out. I suspect that you didn't think of this because you've
> already special-cased this in your code -- when a non-integer is
> passed, the mapping API is used (mp_subscript). This is done to
> support extended slicing. The built-in sequences (list, str, unicode,
> tuple for sure, probably more) that implement mp_subscript should
> probe for nb_index before giving up. The generic code in
> PyObject_GetItem should also check for nb_index before giving up.

I agree.  These should also be changed. I'll change the PEP, too.
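
A minimal sketch of the behaviour agreed on here, written against
present-day Python 3 rather than the 2.5 patch under discussion. The class
name Offset is hypothetical; only the __index__ hook itself comes from the
PEP. It shows the same object working both as a slice bound and as a plain
index.

```python
class Offset:
    """Hypothetical integer-like type that is not a subclass of int."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Tell Python which exact integer this object stands for.
        return self.value


letters = ["a", "b", "c", "d", "e"]

# Slice bounds: the case the PEP started from.
middle = letters[Offset(1):Offset(4)]

# Simple indexing: the x[i] case raised in the quoted text above.
third = letters[Offset(2)]

print(middle, third)
```
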
> I think all sequence objects that implement mp_subscript should
> probably be modified according to the lines I sketched above.


> This is very close to acceptance. I think I'd like to see the patch
> developed and submitted to SF (and assigned to me) prior to
> acceptance.

O.K. I'll work on it.


From guido at  Thu Feb  9 23:42:15 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 14:42:15 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Adam Olsen <rhamph at> wrote:
> On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> >
> > Guido seemed accepting to this idea about 9 months ago when I spoke to
> > him.  I finally got around to writing up the PEP.   I'd really like to
> > get this into Python 2.5 if possible.
> -1
> I've detailed my reasons here:

I don't actually see anything relevant to this discussion in that post.

> In short, there are purely math usages that want to convert to int
> while raising exceptions from inexact results.  The name __index__
> seems inappropriate for this, and I feel it would be cleaner to fix
> float.__int__ to raise exceptions from inexact results (after a
> suitable warning period and with a trunc() function added to math.)

Maybe cleaner, but a thousand times harder given the status quo. Travis
has a need for this *today* and __index__ can be added without any
incompatibilities. I'm not even sure that it's worth changing __int__
for Python 3.0.

Even if float.__int__ raised an exception if the float isn't exactly
an integer, I still think it's wrong to use it here. Suppose I naively
write some floating point code that usually (or with sufficiently
lucky inputs) produces exact results, but which can produce inaccurate
(or at least approximate) results in general. If I use such a result
as an index, your proposal would allow that -- but the program would
suddenly crash with an ImpreciseConversionError exception if the
inputs produced an approximated result. I'd rather be made aware of
this problem on the first run. Then I can decide whether to use int()
or int(round()) or whatever other appropriate conversion.
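
To illustrate the point about finding out on the first run: in present-day
Python 3 (an assumption, since this thread predates it), float defines no
__index__, so even an exactly integral float is rejected as an index and
the conversion has to be spelled out. A small sketch, with made-up data:

```python
items = ["a", "b", "c", "d"]
position = 6 / 3  # 2.0 in Python 3: exact, but still a float

try:
    items[position]
    rejected = False
except TypeError:
    # Raised on every run, not only when the inputs are unlucky.
    rejected = True

# The programmer chooses the conversion explicitly.
truncated = items[int(2.7)]        # int() truncates toward zero: index 2
rounded = items[int(round(2.7))]   # round first: index 3

print(rejected, truncated, rounded)
```
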

--Guido van Rossum (home page:

From nyamatongwe at  Thu Feb  9 23:47:56 2006
From: nyamatongwe at (Neil Hodgson)
Date: Fri, 10 Feb 2006 09:47:56 +1100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis:

> > Visual Basic never forced
> > use of a particular compiler or runtime library for extensions so why
> > should Python?
> Do you really not know? Because of API that happens to be defined
> the way it is.

   It was rhetorical: Why should Python be inferior to VB?

   I suppose the answer (hmm, am I allowed to answer my own rhetorical
questions?) is that it was originally developed on other operating
systems and the Windows port has never been as much of a focus for
most contributors.


From rhamph at  Thu Feb  9 23:52:06 2006
From: rhamph at (Adam Olsen)
Date: Thu, 9 Feb 2006 15:52:06 -0700
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <20060209184751.20077.352925131.divmod.quotient.1@ohm>
References: <>
Message-ID: <>

On 2/9/06, Valentino Volonghi aka Dialtone <dialtone at> wrote:
> Let's consider this piece of code (actual code that works today and uses
> twisted for convenience):
> def do_stuff(result):
>     if result == 'Initial Value':
>         d2 = work_on_result_and_return_a_deferred(result)
>         d2.addCallback(println)
>         return d2
>     return 'No work on result'
> def println(something):
>     print something
> d1 = some_operation_that_results_in_a_deferred()
> d1.addCallback(do_stuff)
> d1.addCallback(lambda _: reactor.stop())

PEP 342 provides a much better alternative:

def do_stuff():
    result = (yield some_operation())
    something = (yield work_on_result(result))
    print something
    reactor.stop()  # Maybe unnecessary?

Apparently it's already been applied to Python 2.5:

Now that may not be the exact syntax that Twisted provides, but the
point is that the layout (and the top-to-bottom execution order) is

Adam Olsen, aka Rhamphoryncus

From martin at  Fri Feb 10 00:03:33 2006
From: martin at ("Martin v. Löwis")
Date: Fri, 10 Feb 2006 00:03:33 +0100
Subject: [Python-Dev] Linking with mscvrt
In-Reply-To: <>
References: <> <>	
Message-ID: <>

Neil Hodgson wrote:
>    I suppose the answer (hmm, am I allowed to answer my own rhetorical
> questions?) is that it was originally developed on other operating
> systems and the Windows port has never been as much of a focus for
> most contributors.

That's certainly the case. It is all Mark Hammond's doing still;
not much has happened since the original Windows port.

The other reason, of course, is that adding *specific* support
for Windows will break support of other platforms. Microsoft
had no problems breaking support of VB on Linux :-)


From thomas at  Fri Feb 10 00:27:34 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 00:27:34 +0100
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <> <dsgem7$10u$>
Message-ID: <>

On Thu, Feb 09, 2006 at 02:32:47PM -0800, Brett Cannon wrote:
> Looks good to me.  Only change I might make is mention why __int__
> doesn't work sooner (such as in the rationale).  Otherwise +1 from me.

I have a slight reservation about the name. On the one hand it's clear the
canonical use will be for indexing sequences, and __index__ doesn't look
enough like __int__ to get people confused on the difference. On the other
hand, there are other places (in C) that want an actual int, and they could
use __index__ too. Even more so if a PyArg_Parse* grew a format for 'the
index-value for this object' ;)

On the *other* one hand, I can't think of a good name... but on the other
other hand, it would be awkward to have to support an old name just because
the real use wasn't envisioned yet.

One-time-machine-for-the-shortsighted-quadrumanus-please-ly y'r,s
Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From aahz at  Fri Feb 10 00:39:46 2006
From: aahz at (Aahz)
Date: Thu, 9 Feb 2006 15:39:46 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <> <dsgem7$10u$>
Message-ID: <>

On Fri, Feb 10, 2006, Thomas Wouters wrote:
> I have a slight reservation about the name. On the one hand it's clear the
> canonical use will be for indexing sequences, and __index__ doesn't look
> enough like __int__ to get people confused on the difference. On the other
> hand, there are other places (in C) that want an actual int, and they could
> use __index__ too. Even more so if a PyArg_Parse* grew a format for 'the
> index-value for this object' ;)
> On the *other* one hand, I can't think of a good name... but on the other
> other hand, it would be awkward to have to support an old name just because
> the real use wasn't envisioned yet.

Can you provide a couple of examples where you think you'd want __index__
functionality but the name would be inappropriate?
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From thomas at  Fri Feb 10 01:03:48 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 01:03:48 +0100
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <> <dsgem7$10u$>
Message-ID: <>

On Thu, Feb 09, 2006 at 03:39:46PM -0800, Aahz wrote:

> Can you provide a couple of examples where you think you'd want __index__
> functionality but the name would be inappropriate?

Not really, or I wouldn't have had only a _slight_ reservation :) There are
many function calls and method calls that only take integers, though, and they
all currently use int() on their argument., socket.recv,
signal.signal, str.zfill/center/ljust -- basically anything that uses the
'i' PyArg_Parse* format specifier, which is quite a lot. For a great many of
them it will not make sense to pass objects that don't have an appropriate
__int__, but who knows how many really *mean* to ask for __index__ instead. I
mostly voice the reservation to lure out people with actual reservations ;)

Thomas Wouters <thomas at>


From jimjjewett at  Fri Feb 10 01:10:38 2006
From: jimjjewett at (Jim Jewett)
Date: Thu, 9 Feb 2006 19:10:38 -0500
Subject: [Python-Dev]  py3k and not equal; re names
Message-ID: <>

Smith asked:

> I'm wondering if it's just "foolish consistency" (to quote a PEP 8)
> that is calling for the dropping of <> in preference of only !=.

Logically, "<=" means the same as "< or ="

<> does not mean the same as "< or >"; it might just mean that
they aren't comparable.  Whether that is a strong enough reason
to remove it is another question.
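
One concrete case, assuming IEEE 754 floats: a NaN is unordered with
respect to every number, so "not equal" holds while "less than or greater
than" does not. The sketch uses != because present-day Python 3 no longer
accepts <>.

```python
nan = float("nan")
x = 1.0

# Neither ordering comparison holds for an unordered pair...
less_or_greater = (x < nan) or (x > nan)

# ...yet the two values are still not equal.
not_equal = x != nan

print(less_or_greater, not_equal)
```
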


From bokr at  Fri Feb 10 01:16:30 2006
From: bokr at (Bengt Richter)
Date: Fri, 10 Feb 2006 00:16:30 GMT
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <>

On Thu, 9 Feb 2006 10:33:10 -0800, Guido van Rossum <guido at> wrote:

>Enough already.
>As has clearly been proven, lambda is already perfect.
>To those people still complaining that lambda is crippled because it
>doesn't do statements: First, remember that adding statement
>capability wouldn't really add any power to the language; lambda is
>purely syntactic sugar for an anonymous function definition (see above
>myth debunking section). Second, years of attempts to overcome this
>haven't come up with a usable syntax (and yes, curly braces have been
Yes, but if you're an optimist, those years mean we're closer to the magic moment ;-)

>proposed and rejected like everything else). It's a hard problem
>because switching back to indentation-based parsing inside an
>expression is problematic. For example, consider this hypothetical
>a = foo(lambda x, y:
>      print x
>      print y)
>Should this be considered legal? Or should it be written as
>a = foo(lambda x, y:
>          print x
>          print y
>        )
Neither. If I may ;-)
First, please keep the existing expression lambda exactly as is.

Second, allow a new lambda variant to have a suite. But this necessitates:

1. same suite syntax as a function def suite, with explicit returns of values
   except if falling out with a default None. Just like a function def.

2. differentiating the variant lambda, and providing for suite termination.
   2a. differentiate by using doubled ':'
       a = foo(lambda x, y :: print x+y)

   2b. the lambda:: variant _requires_ enclosing parens, and the top suite ends at the closing ')'
       A calling context may be sufficient parens, but sometimes, like tuple expressions,
       yet another pair of enclosing expression-parens may be needed.

   2c. Single-line suites terminate on closing paren. Hence
       a = foo(lambda x, y :: print x; print y)  # is ok

   2d. For multiline suites, the first line after the one with the '::' defines
       the column of a single indent (COLSI), at the first non-whitespace character.
       Further indents work normally and terminate by dedent, or the closing ')' may be placed
       anywhere convenient to terminate the  whole lambda suite. Any statement dedenting to left
       of the established single indent column (COLSI) before the closing ')' is a syntax error.
       I recognize that this requires keeping track of independent nested indentation contexts,
       but that part of tokenizing was always fun, I imagine. I'd volunteer to suffer appropriately
       if you like this (the lambda variant, I mean, not my suffering ;-)

I think that's it, though I'm always prepared for a d'oh moment ;-)

>??? (Indenting the prints so they start at a later column than the 'l'
>of 'lambda', and adding an explicit dedent before the close
>parenthesis.) Note that if the former were allowed, we'd have
>additional ambiguity if foo() took two parameters, e.g.:
>a = foo(lambda x, y:
>      print x
>      print y, 42)
>-- is 42 the second argument to foo() or is it printed?
To make 42 a second argument, it would be spelled

    a = foo((lambda x, y::
           print x
           print y), 42)

to have the "print y, 42" statement, you could move the closing paren like

    a = foo((lambda x, y::
           print x
           print y, 42))

but that would have redundant parens with the same meaning as

    a = foo(lambda x, y::
           print x
           print y, 42)

Though perhaps requiring the redundant parens for _all_
(lambda::) expressions would make the grammar easier.
>I'd much rather avoid this snake's nest by giving the function a name
>and using existing statement syntax, like this:
This is Python! How can a snake's nest be bad? ;-)

Seriously, with the above indentation rules it seems straightforward to me.
I do think it would be hard to do something reasonable without an explicitly
differentiated lambda variant though.

>def callback(x, y):
>    print x
>    print y
>a = foo(callback)
    a = foo(lambda x, y :: print x; print y)
>This is unambiguous, easier to parse (for humans as well as for
>computers), and doesn't actually span more text lines. Since this
Well, it does use more lines when :: allows simple statement suites ;-)

>typically happens in a local scope, the name 'callback' disappears as
>soon as the scope is exited.
>BTW I use the same approach regularly for breaking up long
>expressions; for example instead of writing
>a = foo(some_call(another_call(some_long_argument,
>                               another_argument),
>                  and_more(1, 2, 3),
>        and_still_more())
>I'll write
>x = another_call(some_long_argument, another_argument)
>a = foo(some_call(x, and_more(1, 2, 3)), and_still_more())
>and suddenly my code is more compact and yet easier to read! (In real
>life, I'd use a more meaningful name than 'x', but since the example
>is nonsense it's hard to come up with a meaningful name here. :-)
I can't argue with any of that, except that I think I would like to be
able to do both styles. Sometimes it's nice to define right
in the context of one-shot use, e.g., I could see writing

    ss = sorted(seq, key=(lambda x::
            try: return abs(x)
            except TypeError: return 0))

(unless I desperately wanted to avoid the LOAD_CONST, MAKE_FUNCTION
overhead of using an inline lambda at all. I guess that does favor
a global def done once).
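
For comparison, the same key function spelled with syntax that exists
today, as a named def. This is a sketch in Python 3; the name abs_or_zero
and the sample data are made up for illustration.

```python
def abs_or_zero(x):
    # Same logic as the proposed (lambda x:: ...) suite above.
    try:
        return abs(x)
    except TypeError:
        return 0


seq = [3, -1, "a", -2]
ss = sorted(seq, key=abs_or_zero)
print(ss)
```

The string has no absolute value, so it sorts with key 0, ahead of the
numbers.
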

>Parting shot: it appears that we're getting more and more
>expressionized versions of statements: first list comprehensions, then
>generator expressions, most recently conditional expressions, in
>Python 3000 print() will become a function... Seen this way, lambda
>was just ahead of its time! Perhaps we could add a try/except/finally
>expression, and allow assignments in expressions, and then we could
>get rid of statements altogether, turning Python into an expression
>language. Change the use of parentheses a bit, and... voila, Lisp! :-)
Well, if you want to do it, (lambda args::suite) is perhaps a start.
I promise not to use it immoderately ;-)
<ducking further>

Bengt Richter

From thomas at  Fri Feb 10 01:23:25 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 01:23:25 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 10, 2006 at 12:16:30AM +0000, Bengt Richter wrote:
> On Thu, 9 Feb 2006 10:33:10 -0800, Guido van Rossum <guido at> wrote:
> >Enough already.

> Yes, but if you're an optimist, those years mean we're closer to the magic
> moment ;-)

Please stop. Discuss it elsewhere. I suggest not CC'ing Guido in that
discussion, either, at least not if you want the proposals to still have a
chance. Also don't CC me, please, although it's not as hazardous as pissing
off Guido ;)

Make-the-hurting-stop-ly y'rs,
Thomas Wouters <thomas at>


From guido at  Fri Feb 10 01:27:35 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 16:27:35 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

[Bengt, on lambda :: suite]

Since you probably won't stop until I give you an answer: I'm really
not interested in a syntactic solution that allows multi-line lambdas.
I don't think the complexity (in terms of users needing to learn them)
is worth it. So please stop (as several people have already asked
you). There's some text somewhere in the guidelines for python
developers on when to know when to give up. Read it. :-)

--Guido van Rossum (home page:

From steve at  Fri Feb 10 02:03:40 2006
From: steve at (Steve Holden)
Date: Thu, 09 Feb 2006 20:03:40 -0500
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <dsgopd$290$>

Guido van Rossum wrote:
> [Bengt, on lambda :: suite]
> Since you probably won't stop until I give you an answer: I'm really
> not interested in a syntactic solution that allows multi-line lambdas.
> I don't think the complexity (in terms of users needing to learn them)
> is worth it. So please stop (as several people have already asked
> you). There's some text somewhere in the guidelines for python
> developers on when to know when to give up. Read it. :-)
It's not just a matter of knowing when to give up. It's also a matter of 
actually *giving up* once you know it's time.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From bokr at  Fri Feb 10 02:09:18 2006
From: bokr at (Bengt Richter)
Date: Fri, 10 Feb 2006 01:09:18 GMT
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <>

On Fri, 10 Feb 2006 01:23:25 +0100, Thomas Wouters <thomas at> wrote:

>On Fri, Feb 10, 2006 at 12:16:30AM +0000, Bengt Richter wrote:
>> On Thu, 9 Feb 2006 10:33:10 -0800, Guido van Rossum <guido at> wrote:
>> >Enough already.
[...some stuff snipped...]
>> Yes, but if you're an optimist, those years mean we're closer to the magic
>> moment ;-)
[...some stuff snipped...]
>Please stop. Discuss it elsewhere. I suggest not CC'ing Guido in that
>discussion, either, at least not if you want the proposals to still have a
>chance. Also don't CC me, please, although it's not as hazardous as pissing
>off Guido ;)
Well, he presented a technical problem (indentation for lambda suites),
and my main point was to address it with a suggestion he may not have seen
(or why wouldn't he have mentioned it at least as a dumb failing attempt
at solving the problem he was discussing?)

IMHO it does solve the problem (modulo stupidities that I am prepared to have
my nose rubbed in if I missed something) and was on topic.

If a solution to a problem that Guido presents as an obstacle pisses him off,
I'd be surprised, and disappointed.

>Make-the-hurting-stop-ly y'rs,
I'm sorry you're hurting. That was not my intent ;-/

BTW, I never CC anyone unless they have asked me to. Unless gmane is doing it
automatically, it shouldn't be happening.

Bengt Richter

From oliphant.travis at  Fri Feb 10 02:09:27 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 18:09:27 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <dsgp48$41b$>

Guido van Rossum wrote:
> On 2/9/06, Travis Oliphant <oliphant.travis at> wrote:
> This is very close to acceptance. I think I'd like to see the patch
> developed and submitted to SF (and assigned to me) prior to
> acceptance.
>>   This document is placed in the public domain
> If you agree with the above comments, please send me an updated
> version of the PEP and I'll check it in over the old one, and approve
> it. Then just use SF to submit the patch etc.

I uploaded a patch to SF against current SVN.  The altered code compiles 
and the functionality works with classes defined in Python.  I have yet 
to test against a C-type that defines the method.

The patch adds a new API function int PyObject_AsIndex(obj).

This was not specifically in the PEP but probably should be.  The name 
could also be PyNumber_AsIndex(obj)  but I was following the nb_nonzero 
slot example to help write the code.


From aleaxit at  Fri Feb 10 02:26:45 2006
From: aleaxit at (Alex Martelli)
Date: Thu, 9 Feb 2006 17:26:45 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <dsgp48$41b$>
References: <>
Message-ID: <>

On 2/9/06, Travis E. Oliphant <oliphant.travis at> wrote:
> The patch adds a new API function int PyObject_AsIndex(obj).
> This was not specifically in the PEP but probably should be.  The name
> could also be PyNumber_AsIndex(obj)  but I was following the nb_nonzero
> slot example to help write the code.

Shouldn't that new API function (whatever its name) also be somehow
exposed for easy access from Python code? I realize new builtins are
unpopular, so a builtin 'asindex' might not be appropriate, but
perhaps operator.asindex might be. My main point is that I don't think
we want every Python-coded sequence to have to call x.__index__()


From guido at  Fri Feb 10 02:34:22 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 17:34:22 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/9/06, Alex Martelli <aleaxit at> wrote:
> Shouldn't that new API function (whatever its name) also be somehow
> exposed for easy access from Python code? I realize new builtins are
> unpopular, so a builtin 'asindex' might not be appropriate, but
> perhaps operator.asindex might be. My main point is that I don't think
> we want every Python-coded sequence to have to call x.__index__()
> instead.

Very good point; this is why we have a PEP discussion phase.

If it's x.__index__(), I think it ought to be operator.index(x). I'm
not sure we need a builtin (also not sure we don't).

I wonder if hasattr(x, "__index__") can be used as the litmus test for
int-ness? (Then int and long should have one that returns self.)
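
As a sketch of how this looks from Python code, assuming present-day
Python 3, where operator.index() exists and int defines __index__ while
float does not. The classes Position and Window are hypothetical, made up
for illustration.

```python
import operator


class Position:
    """Hypothetical integer-like object."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


class Window:
    """Hypothetical Python-coded sequence."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, key):
        # operator.index() accepts ints and anything with __index__,
        # and raises TypeError for floats and other non-integers.
        return self._data[operator.index(key)]


w = Window("abcde")
first = w[0]
fourth = w[Position(3)]

# The hasattr() litmus test mentioned above.
int_like = [hasattr(v, "__index__") for v in (1, 1.0, Position(2))]
print(first, fourth, int_like)
```
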

Travis, can you send me additional PEP updates as context or unified
diffs vs. the PEP in SVN? (or against if
you don't want to download the entire PEP directory).

--Guido van Rossum (home page:

From oliphant.travis at  Fri Feb 10 02:35:43 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Thu, 09 Feb 2006 18:35:43 -0700
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
	<dsgem7$10u$>	<>
Message-ID: <dsgqlg$7ml$>

Thomas Wouters wrote:
> On Thu, Feb 09, 2006 at 02:32:47PM -0800, Brett Cannon wrote:
>>Looks good to me.  Only change I might make is mention why __int__
>>doesn't work sooner (such as in the rationale).  Otherwise +1 from me.
> I have a slight reservation about the name. On the one hand it's clear the
> canonical use will be for indexing sequences, and __index__ doesn't look
> enough like __int__ to get people confused on the difference. On the other
> hand, there are other places (in C) that want an actual int, and they could
> use __index__ too. Even more so if a PyArg_Parse* grew a format for 'the
> index-value for this object' ;)

There are other places in Python that check specifically for int objects 
and long integer objects and fail with anything else.  Perhaps all of 
these should also call the __index__ slot.

But, then it *should* be renamed to i.e. "__true_int__".  One such place 
is in abstract.c sequence_repeat function.

The patch I submitted, perhaps aggressivele, changed that function to 
call the nb_index slot as well instead of raising an error.

Perhaps the slot should be called nb_true_int?


> On the *other* one hand, I can't think of a good name... but on the other
> other hand, it would be awkward to have to support an old name just because
> the real use wasn't envisioned yet.
> One-time-machine-for-the-shortsighted-quadrumanus-please-ly y'r,s

From guido at  Fri Feb 10 02:54:41 2006
From: guido at (Guido van Rossum)
Date: Thu, 9 Feb 2006 17:54:41 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <dsgqlg$7ml$>
References: <> <dsgem7$10u$>
	<> <dsgqlg$7ml$>
Message-ID: <>

On 2/9/06, Travis E. Oliphant <oliphant.travis at> wrote:
> Thomas Wouters wrote:
> > I have a slight reservation about the name. On the one hand it's clear the
> > canonical use will be for indexing sequences, and __index__ doesn't look
> > enough like __int__ to get people confused on the difference. On the other
> > hand, there are other places (in C) that want an actual int, and they could
> > use __index__ too. Even more so if a PyArg_Parse* grew a format for 'the
> > index-value for this object' ;)

I think we should just change all the existing formats that require
int or long to support nb_as_index.

> There are other places in Python that check specifically for int objects
> and long integer objects and fail with anything else.  Perhaps all of
> these should also call the __index__ slot.

Right, absolutely.

> But, then it *should* be renamed to e.g. "__true_int__".  One such place
> is in abstract.c sequence_repeat function.

I don't like __true_int__ very much. Personally, I'm fine with calling
it __index__ after the most common operation. (Well, I would be since
I think I came up with the name in the first place. :-) Since naming
is always so subjective *and* important, I'll wait a few days, but if
nobody suggests something better then we should just go with
__index__.

--Guido van Rossum (home page:

From bokr at  Fri Feb 10 03:07:28 2006
From: bokr at (Bengt Richter)
Date: Fri, 10 Feb 2006 02:07:28 GMT
Subject: [Python-Dev] Let's just *keep* lambda
References: <>
Message-ID: <>

On Thu, 9 Feb 2006 16:27:35 -0800, Guido van Rossum <guido at> wrote:

>[Bengt, on lambda :: suite]
>Since you probably won't stop until I give you an answer: I'm really
>not interested in a syntactic solution that allows multi-line lambdas.
>I don't think the complexity (in terms of users needing to learn them)
>is worth it. So please stop (as several people have already asked
>you). There's some text somewhere in the guidelines for python
>developers on when to know when to give up. Read it. :-)
Thank you. I give up ;-) I will try to find it and read it.

But no fair tempting the weak with
It's a hard problem ...  For example, consider this hypothetical
example: ...

Bengt Richter

From stephen at  Fri Feb 10 03:43:41 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 10 Feb 2006 11:43:41 +0900
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
	(Brett Cannon's message of "Thu, 9 Feb 2006 14:01:42 -0800")
References: <>
Message-ID: <>

>>>>> "Brett" == Brett Cannon <brett at> writes:

    Brett> On 2/9/06, Barry Warsaw <barry at> wrote:

    >> Maybe we can amend your rules to those people who both have
    >> commit privileges and have successfully submitted a PEP before.
    >> PEP virgins should go through the normal process.


    Brett> Sounds reasonable to me.  Then again I don't think I would
    Brett> ever attempt to get a PEP accepted without at least a
    Brett> single pass over by python-dev or .  But making it
    Brett> simpler definitely would be nice when you can already check
    Brett> in yourself.

Besides Brett's point that in some sense most new authors *want* to go
through the normal process, having the normal process means that there
are a couple of people you can contact who are default mentor/editors,

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From tim.peters at  Fri Feb 10 04:17:02 2006
From: tim.peters at (Tim Peters)
Date: Thu, 9 Feb 2006 22:17:02 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
Message-ID: <>

Noticed that various socket tests are failing today, WinXP, Python trunk:

Exception in thread Thread-27:
Traceback (most recent call last):
  File "C:\Code\python\lib\", line 444, in __bootstrap
  File "C:\Code\python\lib\", line 424, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Code\python\lib\test\", line 50, in listener
  File "C:\Code\python\lib\", line 169, in accept
    sock, addr = self._sock.accept()
error: unable to select on socket

test test_socket_ssl crashed -- socket.error: (10061, 'Connection refused')

test test_urllibnet failed -- errors occurred; run in verbose mode for details
Running that in verbose mode shows 2 "ok" and 8 "ERROR".  A typical ERROR:

ERROR: test_basic (test.test_urllibnet.urlopenNetworkTests)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 43, in test_basic
    open_url = urllib.urlopen("")
  File "C:\Code\python\lib\", line 82, in urlopen
  File "C:\Code\python\lib\", line 190, in open
    return getattr(self, name)(url)
  File "C:\Code\python\lib\", line 325, in open_http
  File "C:\Code\python\lib\", line 798, in endheaders
  File "C:\Code\python\lib\", line 679, in _send_output
  File "C:\Code\python\lib\", line 658, in send
  File "<string>", line 1, in sendall
IOError: [Errno socket error] unable to select on socket

test_logging appears to consume 100% of a CPU now, and never finishes.
 This may be an independent error.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Code\python\lib\", line 444, in __bootstrap
  File "C:\Code\python\lib\test\", line 18, in run
    conn, client = sock.accept()
  File "C:\Code\python\lib\", line 169, in accept
    sock, addr = self._sock.accept()
error: unable to select on socket

test_socket is a long-winded disaster.

test test_socketserver crashed -- socket.error: (10061, 'Connection refused')

There are others, but tests that use sockets hang a lot now & it's
tedious to worm around that.

I _suspect_ that rev 42253 introduced these problems.  For example, that added:

+       /* Guard against socket too large for select*/
+       if (s->sock_fd >= FD_SETSIZE)
+               return SOCKET_INVALID;

to _ssl.c, and added

+/* Can we call select() with this socket without a buffer overrun? */
+#define IS_SELECTABLE(s) ((s)->sock_fd < FD_SETSIZE)

to socketmodule.c, but those appear to make no sense.  FD_SETSIZE is
the maximum number of distinct fd's an fdset can hold, and the
numerical magnitude of any specific fd has nothing to do with that in
general (they may be related in fact on Unix systems that implement an
fdset as "a big bit vector" -- but Windows doesn't work that way, and
neither do all Unix systems, and nothing in socket specs requires an
implementation to work that way).

From greg.ewing at  Fri Feb 10 04:20:22 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 10 Feb 2006 16:20:22 +1300
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

skip at wrote:

> Lambdas are expressions.  Statements can't be embedded in expressions.  That
> statements and expressions are different is a core feature of the language.
> That is almost certainly not going to change.

Although "print" may become a function in 3.0, so that this
particular example would no longer be a problem.
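
A small sketch of that point, assuming Python 3, where print is an
ordinary function and can therefore appear inside a lambda. The name
announce is made up for the example:

```python
import io
from contextlib import redirect_stdout

# print(...) is an expression in Python 3, so it is legal in a lambda body.
announce = lambda item: print("got", item)

# Capture the output so the effect can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    for item in ("spam", "eggs"):
        announce(item)
captured = buffer.getvalue()
```

With the Python 2 print statement the lambda line is a syntax error,
which is the restriction described in the quoted text.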

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From barry at  Fri Feb 10 00:25:29 2006
From: barry at (Barry Warsaw)
Date: Thu, 09 Feb 2006 18:25:29 -0500
Subject: [Python-Dev] py3k and not equal; re names
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, 2006-02-09 at 19:10 -0500, Jim Jewett wrote:

> Logically, "<=" means the same as "< or ="
> <> does not mean the same as "< or >"; it might just mean that
> they aren't comparable.  Whether that is a strong enough reason
> to remove it is another question.

Visually, "==" looks very symmetrical and stands out nicely, while "!="
is asymmetric and jarring.  "<>" has a visual symmetry that is a nice
counterpart to "==".  For me, that's enough of a reason to keep it.


From greg.ewing at  Fri Feb 10 04:49:13 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 10 Feb 2006 16:49:13 +1300
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <20060209184751.20077.352925131.divmod.quotient.1@ohm>
References: <20060209184751.20077.352925131.divmod.quotient.1@ohm>
Message-ID: <>

Valentino Volonghi aka Dialtone wrote:

> when some_operation_that_results_in_a_deferred() -> result:
>     if result == 'Initial Value':
>         when work_on_result_and_return_a_deferred(result) -> inner_res:
>             print inner_res
>     else:
>         print "No work on result"
>     reactor.stop()

Hmmm. This looks remarkably similar to something I got half
way through dreaming up a while back, that I was going to
call "Simple Continuations" (by analogy with "Simple Generators").
Maybe I should finish working out the details and write it up.

On the other hand, it may turn out that it's subsumed by
the new enhanced generators plus a trampoline.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Fri Feb 10 04:59:38 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 10 Feb 2006 16:59:38 +1300
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <> <dsgem7$10u$>
Message-ID: <>

Thomas Wouters wrote:

> I have a slight reservation about the name. ... On the other
> hand, there are other places (in C) that want an actual int, and they could
> use __index__ too.

Maybe __exactint__?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Fri Feb 10 05:05:22 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 10 Feb 2006 17:05:22 +1300
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> To those people who believe that lambda is required in some situations
> because it behaves differently with respect to the surrounding scope
> than def: it doesn't, and it never did. This is (still!) a
> surprisingly common myth. I have no idea where it comes from; does
> this difference exist in some other language that has lambda as well
> as some other function definition mechanism?

Not that I know of. Maybe it's because these people first
encountered the concept of a closure when using lambda in
Lisp or Scheme, and unconsciously assumed there was a

> Parting shot: it appears that we're getting more and more
> expressionized versions of statements: ...
> Perhaps we could add a try/except/finally
> expression, and allow assignments in expressions, and then we could
> get rid of statements altogether, turning Python into an expression
> language. Change the use of parentheses a bit, and... voila, Lisp! :-)
> <duck>

Or we could go the other way and provide means of writing
all expressions as statements.

     lambda y,z:
       w =:
         "Result is"


Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From scott+python-dev at  Fri Feb 10 06:02:25 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 00:02:25 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

Tim Peters wrote:
> I _suspect_ that rev 42253 introduced these problems.  For example, that added:
> +       /* Guard against socket too large for select*/
> +       if (s->sock_fd >= FD_SETSIZE)
> +               return SOCKET_INVALID;
> to _ssl.c, and added
> +/* Can we call select() with this socket without a buffer overrun? */
> +#define IS_SELECTABLE(s) ((s)->sock_fd < FD_SETSIZE)
> to socketmodule.c, but those appear to make no sense.  FD_SETSIZE is
> the maximum number of distinct fd's an fdset can hold, and the
> numerical magnitude of any specific fd has nothing to do with that in
> general (they may be related in fact on Unix systems that implement an
> fdset as "a big bit vector" -- but Windows doesn't work that way, and
> neither do all Unix systems, and nothing in socket specs requires an
> implementation to work that way).

Neal checked these changes in to address bug #876637, "Random stack
corruption from socketmodule.c".  But the Windows implementation of
select() is entirely different from that of other platforms: Windows
uses an internal counter to assign fds to an fd_set, so the fd number
itself has no bearing on where it is placed in the fd_set.  The stack
corruption bug therefore does not exist on Windows, and so the guard
should not be used on Windows either.
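
The difference between the two layouts can be modelled in a few lines.
This is only a toy model under stated assumptions: both classes are
hypothetical, FD_SETSIZE is set to a deliberately tiny value, and an
exception stands in for what would be an out-of-bounds write in C:

```python
FD_SETSIZE = 8  # tiny on purpose; real values are platform-defined


class BitVectorFdSet:
    """Model of an fd_set stored as a fixed bit vector indexed by fd."""

    def __init__(self):
        self.bits = [False] * FD_SETSIZE

    def add(self, fd):
        if fd >= FD_SETSIZE:
            # In C this store would land outside the array; the
            # IS_SELECTABLE guard exists to prevent exactly that.
            raise OverflowError("fd %d does not fit in the bit vector" % fd)
        self.bits[fd] = True


class CountedFdSet:
    """Model of an fd_set stored as a count plus an array of handles."""

    def __init__(self):
        self.handles = []

    def add(self, fd):
        if len(self.handles) >= FD_SETSIZE:
            raise OverflowError("fd_set already holds FD_SETSIZE handles")
        # The numeric value of fd is irrelevant; only the count matters.
        self.handles.append(fd)


big_fd = 1234

counted = CountedFdSet()
counted.add(big_fd)          # fine: one handle, whatever its magnitude

bitvec = BitVectorFdSet()
try:
    bitvec.add(big_fd)
except OverflowError:
    overflowed = True        # the case the guard is written for
else:
    overflowed = False
```

In the counted layout, comparing an individual fd against FD_SETSIZE
rejects sockets that would have been handled without any overrun.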

Scott Dial
scott at
dialsa at

From tjreedy at  Fri Feb 10 06:07:37 2006
From: tjreedy at (Terry Reedy)
Date: Fri, 10 Feb 2006 00:07:37 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
References: <> <dsgem7$10u$>
Message-ID: <dsh72n$c5c$>

>   Add a nb_index slot to PyNumberMethods, and a corresponding
>   __index__ special method.  Objects could define a function to
>   place in the sq_index slot that returns an appropriate

I presume 'sq_index' should also be 'nb_index'.

From tim.peters at  Fri Feb 10 06:36:09 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 00:36:09 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

> ...  FD_SETSIZE is the maximum number of distinct fd's an fdset can
> hold, and the numerical magnitude of any specific fd has nothing to do
> with that in general (they may be related in fact on Unix systems that
> implement an fdset as "a big bit vector" -- but Windows doesn't work
> that way, and neither do all Unix systems, and nothing in socket
> specs requires an implementation to work that way).

Hmm.  Looks like POSIX _does_ require that.  Can't work on Windows,
though.  I have a distinct memory of a 64-bit Unix that didn't work
that way either, but while that memory is younger than I am, it's too
old for me to recall more than just that ;-).

From dialtone at  Fri Feb 10 09:19:37 2006
From: dialtone at (Valentino Volonghi aka Dialtone)
Date: Fri, 10 Feb 2006 09:19:37 +0100
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
Message-ID: <20060210081937.20077.1221015021.divmod.quotient.180@ohm>

On Fri, 10 Feb 2006 16:49:13 +1300, Greg Ewing <greg.ewing at> wrote:
>Valentino Volonghi aka Dialtone wrote:
>> when some_operation_that_results_in_a_deferred() -> result:
>>     if result == 'Initial Value':
>>         when work_on_result_and_return_a_deferred(result) -> inner_res:
>>             print inner_res
>>     else:
>>         print "No work on result"
>>     reactor.stop()
>Hmmm. This looks remarkably similar to something I got half
>way through dreaming up a while back, that I was going to
>call "Simple Continuations" (by analogy with "Simple Generators").
>Maybe I should finish working out the details and write it up.
>On the other hand, it may turn out that it's subsumed by
>the new enhanced generators plus a trampoline.

This is only partially true. Taking Twisted as the example again, you can do something like this:

def foo():
    for url in urls:
        page = yield client.getPage(url)
        print page

This has two disadvantages, IMHO. First, I have to use a function or a method decorated with @defgen to write it. More importantly, that code, although correct, serializes things that could run in parallel. The solution is again simple, but not really intuitive:

def foo():
    for d in map(client.getPage, urls):
        page = yield d
        print page

Written this way it will actually run the requests in parallel, but it is not really an intuitive solution.
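
The difference between the two generator versions can be reproduced without Twisted. The sketch below is a toy model: ToyDeferred, run and make_client are hypothetical helpers, not Twisted's API, and run is a very small trampoline that resumes a generator whenever the deferred it yielded fires. Note one assumption about the Python version: in Python 3, map() is lazy, so a list comprehension is used to issue every request up front, as the list-returning map() of Python 2 did in the code above.

```python
class ToyDeferred:
    """Toy stand-in for a result that arrives later (not Twisted's class)."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def fire(self, value):
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(value)


def run(gen):
    """Minimal trampoline: resume gen each time its yielded deferred fires."""
    def step(value=None):
        try:
            deferred = gen.send(value)
        except StopIteration:
            return
        deferred.add_callback(step)
    step()


def make_client():
    started = []   # URLs whose request has been issued, in order
    pending = {}   # url -> ToyDeferred waiting for its page

    def get_page(url):
        started.append(url)
        pending[url] = ToyDeferred()
        return pending[url]

    return get_page, started, pending


def serial(get_page, urls, pages):
    for url in urls:
        page = yield get_page(url)   # next request starts only after this one
        pages.append(page)


def eager(get_page, urls, pages):
    for d in [get_page(url) for url in urls]:   # all requests start here
        page = yield d
        pages.append(page)


urls = ["u1", "u2", "u3"]

get_page, serial_started, serial_pending = make_client()
run(serial(get_page, urls, []))
serial_in_flight = list(serial_started)   # snapshot before any page arrives

get_page, eager_started, eager_pending = make_client()
eager_pages = []
run(eager(get_page, urls, eager_pages))
eager_in_flight = list(eager_started)

for url in urls:
    eager_pending[url].fire("page of " + url)
```

Before any page has arrived, the serial version has issued one request and the eager version has issued all three, which is the serialization problem described above.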

Using when instead:

for url in urls:
    when client.getPage(url) -> page:
        print page

This wouldn't have either problem and is quite readable. A similar construct is used in the E language, where when works together with their promise objects. You can also wait for several things at once:

when (client.getPage(url), cursor.execute(query)) -> (page, results):
    print page, results


l = [list, of, deferreds]

when l -> *results:
    print results

and we could catch errors in the following way:

when client.getPage(url) -> page:
    print page
except socket.error, e:
    print "something bad happened"


Valentino Volonghi aka Dialtone
Now Running MacOSX 10.4
New Pet:

From rasky at  Fri Feb 10 09:43:10 2006
From: rasky at (Giovanni Bajo)
Date: Fri, 10 Feb 2006 09:43:10 +0100
Subject: [Python-Dev] Linking with mscvrt
References: <><>	<>	<>	<>	<><>
Message-ID: <018b01c62e1e$05a47b80$0e4d2597@bagio>

Martin v. L?wis wrote:

>>> At first glance, this is a minor issue - passing FILE* pointers
>>> across
>>> DLL boundaries isn't something I'd normally expect people to do -
>>> but
>>> look further and you find you're opening a real can of worms. For
>>> example, Python has public APIs which take FILE* parameters.
>>    So convert them to taking PyWrappedFile * parameters.
> Easy to say, hard to do.

But *that's* the solution for this problem. It's always been like this under
Windows and will always be. Changing back to msvcrt, so that people must compile
their extensions with non-standard compilation options, is really *worse* than
just requiring msvcrt71 and punting. There's also a free compiler from Microsoft
and tons of webpages which say how to compile with it. Or with mingw, even.

So, I really believe that the situation is settling down. People are doing what
they want to, with some difficulties perhaps, but there's nothing really
undoable. If another change has to be pursued, it is to abstract Python from
CRT altogether, or at least across boundaries.
Giovanni Bajo

From python at  Fri Feb 10 09:57:14 2006
From: python at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 03:57:14 -0500
Subject: [Python-Dev] Let's just *keep* lambda
References: <20060210081937.20077.1221015021.divmod.quotient.180@ohm>
Message-ID: <000e01c62e1f$fd2e9470$b83efea9@RaymondLaptop1>

Die thread, die!

From python at  Fri Feb 10 10:12:55 2006
From: python at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 04:12:55 -0500
Subject: [Python-Dev] _length_cue()
References: <>
Message-ID: <002001c62e22$2de27760$b83efea9@RaymondLaptop1>

>> I was really attracted to the idea of having more informative iterator
>> representations but learned that even when it could be done, it wasn't
>> especially useful.  When someone creates an iterator at the interactive
>> prompt, they almost always either wrap it in a consumer function or they
>> assign it to a variable.  The case of typing just "enumerate([1,2,3])"
>> comes up only once, when first learning what enumerate() does.
> On the other hand, it's very common to see the iterator in the debug window
> showing the locals or the watches. And it's pretty easy to add some debugging
> print statement to the code, run the program/test, find out that, hey, that
> function returns an iterator, go back and add a list() around it to find out
> what's inside.
> I would welcome it if the iterator repr string could show, when possible, the
> next couple of elements.

Sorry, that's a pipe-dream.  Real use-cases for enumerate() don't usually 
have the luxury of having an argument that is a sequence.  Instead, you have 
to run the iteration a few steps to see what lies ahead.  In general, this 
isn't always possible (stdin for example) or desirable (where the iterator 
is time consuming or memory intensive and so shouldn't be run unless the 
value is actually needed) or may even be a disaster (if the iterator 
participates in co-routine style code that expects to be passing control 
back and forth between multiple open iterators).  IOW, you cannot safely run 
an iterator a few steps in advance, save-up the results for display, and 
then expect everything else to work right.
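
A short sketch of why looking ahead is not free. The helper
naive_preview is hypothetical; it only shows that any preview built on
advancing the iterator removes those items for every later consumer:

```python
from itertools import islice


def naive_preview(iterator, n=2):
    """Return the next n items, which also consumes them."""
    return list(islice(iterator, n))


stream = iter(["line 1", "line 2", "line 3"])

preview = naive_preview(stream)   # what a look-ahead repr would have to do
remaining = list(stream)          # what the real consumer then receives
```

A repr that did this silently would change the behaviour of the program
being inspected, and buffering the items instead runs into the cost and
co-routine problems listed above.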

I spent a good deal of time pursuing this mirage, but there was no water:

AFAICT, the only way to achieve the effect you want is to get an environment 
where all iterators are designed around an API that supports being run 
forward and backward (such as the one demonstrated by Armin at PyCon last 


From thomas at  Fri Feb 10 11:27:13 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 11:27:13 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 10, 2006 at 12:36:09AM -0500, Tim Peters wrote:
> [Tim]
> > ...  FD_SETSIZE is the maximum number of distinct fd's an fdset can
> > hold, and the numerical magnitude of any specific fd has nothing to do
> > with that in general (they may be related in fact on Unix systems that
> > implement an fdset as "a big bit vector" -- but Windows doesn't work
> > that way, and neither do all Unix systems, and nothing in socket
> > specs requires an implementation to work that way).

> Hmm.  Looks like POSIX _does_ require that.  Can't work on Windows,
> though.  I have a distinct memory of a 64-bit Unix that didn't work
> that way either, but while that memory is younger than I am, it's too
> old for me to recall more than just that ;-).

Perhaps the memory you have is of select-lookalikes, like poll(), or maybe
of vendor-specific (and POSIX-breaking) extensions to select(). select()
performs pretty poorly on large fdsets with holes in, and has the fixed size
fdset problem, so poll() was added to fix that (by Linux and later by XPG4,
IIRC.) poll() takes an array of structs containing the fd, the operations to
watch for and an output parameter with seen events. Does that jar your
memory? :)
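
For comparison, a minimal sketch of the poll() interface as exposed by
Python's select module, assuming a platform where select.poll is
available (it is not on Windows). The helper name readable_fds is made
up for the example:

```python
import os
import select


def readable_fds(fds, timeout_ms=0):
    """Return the fds that poll() reports as readable."""
    poller = select.poll()
    for fd in fds:
        # Each registration is one (fd, requested events) entry; there is
        # no fixed-size set, so the fd's magnitude does not matter.
        poller.register(fd, select.POLLIN)
    return sorted(fd for fd, events in poller.poll(timeout_ms)
                  if events & select.POLLIN)


read_end, write_end = os.pipe()
try:
    before = readable_fds([read_end])   # nothing written yet
    os.write(write_end, b"x")
    after = readable_fds([read_end])    # one byte is now waiting
finally:
    os.close(read_end)
    os.close(write_end)
```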

(The socketmodule has support for poll(), on systems that have it, by the

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ncoghlan at  Fri Feb 10 12:26:33 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 10 Feb 2006 21:26:33 +1000
Subject: [Python-Dev] cProfile module
In-Reply-To: <>
References: <>
Message-ID: <>

Armin Rigo wrote:
> Hi all,
> As promised two months ago, I eventually finished the integration of the
> 'lsprof' profiler.  It's now in an internal '_lsprof' module that is
> exposed via a 'cProfile' module with the same interface as 'profile',
> producing compatible dump stats that can be inspected with 'pstats'.

Hurrah! (trying to optimise the Decimal module before 2.4 was a painful 
exercise, because hotshot wasn't really up to the job and executing the tests 
and the benchmark under the normal profile module was horribly slow. . .).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri Feb 10 13:16:42 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 10 Feb 2006 22:16:42 +1000
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>	<>	<dsfmp3$pc4$>
Message-ID: <>

Adam Olsen wrote:
> I guess my confusion revolves around float to Decimal.  Is lossless
> conversion a good thing in python, or is prohibiting float to Decimal
> conversion just a fudge to prevent people from initializing a Decimal
> from a float when they really want a str?

The general rule is that a lossy conversion is fine, so long as the programmer 
explicitly requests it.

float to Decimal is a special case, which has more to do with the nature of 
Decimal and the guarantees it provides, than to do with general issues of 
lossless conversion.

Specifically, what does Decimal(1.1) mean? Did you want Decimal("1.1") or 
Decimal("1.100000001")? Allowing direct conversion from float would simply 
infect the Decimal type with all of the problems of binary floating point 
representations, without providing any countervailing benefit.
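
A brief sketch of the ambiguity. One assumption about versions: when
this was written, Decimal(1.1) raised TypeError, while later releases
(Python 2.7 and 3.2 onward) accept a float and convert its binary value
exactly. The snippet assumes such a later release:

```python
from decimal import Decimal

from_string = Decimal("1.1")       # exactly one and one tenth
from_float = Decimal(1.1)          # exact value of the nearest binary double
via_repr = Decimal(repr(1.1))      # explicit request for the short form

same = from_float == from_string   # the two readings disagree
expanded = str(from_float)         # carries the binary representation error
```

The exact conversion shows the binary float's error in full, which is
the "infection" described above; going through a string makes the
intended value explicit.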

The idea of providing a special notation or separate method for float 
precision was toyed with, but eventually rejected in favour of the existing 
string formatting notation and a straight up type error. Facundo included the 
gory details in the final version of his PEP [1].



Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Fri Feb 10 13:45:44 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 10 Feb 2006 22:45:44 +1000
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
	<dsgem7$10u$>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
>> But, then it *should* be renamed to i.e. "__true_int__".  One such place
>> is in abstract.c sequence_repeat function.
> I don't like __true_int__ very much. Personally, I'm fine with calling
> it __index__ after the most common operation. (Well, I would be since
> I think I came up with the name in the first place. :-) Since naming
> is always so subjective *and* important, I'll wait a few days, but if
> nobody suggests something better then we should just go with
> __index__.

An alternative would be to call it "__discrete__", as that is the key 
characteristic of an indexing type - it consists of a sequence of discrete 
values that can be isomorphically mapped to the integers. Numbers conceptually 
representing continuously variable quantities (such as floats and decimals) 
are the ones that really shouldn't define this method.

I wouldn't mind __index__ though, as even though some of the use cases won't 
be strictly using the result as an index, the shared characteristic of being 
isomorphic to the integers should be sufficient to allow the term to make some 
sort of sense.

This would hardly be the first case where names of operators are overloaded
using imprecise terminology, after all. 'or', 'and', 'sub' and 'xor' aren't
the right terms for set union, intersection, difference and symmetric
difference, but they're close enough conceptually that the names still have
meaning. Ditto for 'mul' and 'add' meaning repetition and concatenation for
sequences (no comment on 'mod' and string formatting though. . .)
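
The overloads mentioned above, written out as a quick reference sketch
(the variable names are arbitrary):

```python
a, b = {1, 2, 3}, {3, 4}

union = a | b                    # __or__
intersection = a & b             # __and__
difference = a - b               # __sub__
symmetric_difference = a ^ b     # __xor__

repeated = [0] * 3               # __mul__ as repetition
joined = [1] + [2]               # __add__ as concatenation
formatted = "%s=%d" % ("x", 5)   # __mod__ as string formatting
```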


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From arigo at  Fri Feb 10 14:08:06 2006
From: arigo at (Armin Rigo)
Date: Fri, 10 Feb 2006 14:08:06 +0100
Subject: [Python-Dev] _length_cue()
In-Reply-To: <009001c62d1f$79a6abc0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Hi Raymond,

On Wed, Feb 08, 2006 at 09:21:02PM -0500, Raymond Hettinger wrote:
> (... __getitem_cue__ ...)
> Before putting this in production, it would probably be worthwhile to search 
> for code where it would have been helpful.  In the case of __length_cue__, 
> there was an immediate payoff.

Indeed, I don't foresee any place where it would help apart from the
__repr__ of the iterators, which is precisely what I'm aiming at.  The
alternative here would be a kind of "smart" global function that knows
about many built-in iterator types and is able to fish for the data
inside automatically (but this hits problems of data structures being
private).  I thought that __getitem_cue__ would be a less dirty
solution.  I really think a better __repr__ would be generally helpful,
and I cannot think of a 3rd solution at the moment...  (Ideas welcome!)

A bientot,


From ncoghlan at  Fri Feb 10 14:21:52 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 10 Feb 2006 23:21:52 +1000
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>	<00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1>	<>	<009001c62d1f$79a6abc0$b83efea9@RaymondLaptop1>
Message-ID: <>

Armin Rigo wrote:
> Indeed, I don't foresee any place where it would help apart from the
> __repr__ of the iterators, which is precisely what I'm aiming at.  The
> alternative here would be a kind of "smart" global function that knows
> about many built-in iterator types and is able to fish for the data
> inside automatically (but this hits problems of data structures being
> private).  I thought that __getitem_cue__ would be a less dirty
> solution.  I really think a better __repr__ would be generally helpful,
> and I cannot think of a 3rd solution at the moment...  (Ideas welcome!)

Do they really need anything more sophisticated than:

   def __repr__(self):
     return "%s(%r)" % (type(self).__name__, self._subiter)

(modulo changes in the format of arguments, naturally. This simple one would 
work for things like enumerate and reversed, though)

If the subiterators themselves have decent repr methods, the top-level repr 
should also look reasonable.
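
A runnable sketch of that repr on a pure-Python stand-in. The Enumerate
class and its _subiter attribute are hypothetical, mirroring the snippet
above rather than the built-in's C implementation:

```python
class Enumerate:
    """Hypothetical pure-Python enumerate with a constructor-style repr."""

    def __init__(self, iterable):
        self._subiter = iterable     # kept so that repr can show it
        self._it = iter(iterable)
        self._count = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)       # StopIteration propagates naturally
        pair = (self._count, value)
        self._count += 1
        return pair

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, self._subiter)


simple = repr(Enumerate("spam"))
nested = repr(Enumerate(Enumerate([1, 2])))   # sub-iterator repr is reused
pairs = list(Enumerate("ab"))
```

The nested case shows the point about sub-iterators: the outer repr is
only as informative as the repr of what it wraps.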


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From arigo at  Fri Feb 10 14:25:41 2006
From: arigo at (Armin Rigo)
Date: Fri, 10 Feb 2006 14:25:41 +0100
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Greg,

On Thu, Feb 09, 2006 at 04:27:54PM +1300, Greg Ewing wrote:
> The iterator protocol is currently very simple and
> well-focused on a single task -- producing things
> one at a time, in sequence. Let's not clutter it up
> with too much more cruft.

Please refer to my original message: I intended these methods to be
private and undocumented, not part of any official protocol in any way.

A bientot,


From arigo at  Fri Feb 10 14:33:08 2006
From: arigo at (Armin Rigo)
Date: Fri, 10 Feb 2006 14:33:08 +0100
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Nick,

On Fri, Feb 10, 2006 at 11:21:52PM +1000, Nick Coghlan wrote:
> Do they really need anything more sophisticated than:
>    def __repr__(self):
>      return "%s(%r)" % (type(self).__name__, self._subiter)
> (modulo changes in the format of arguments, naturally. This simple one would 
> work for things like enumerate and reversed, though)

My goal here is not primarily to help debugging, but to help playing
around at the interactive command-line.  Python's command-line should
not be dismissed as "useless for real programmers"; I definitely use it
all the time to try things out.  It would be nicer if all these
iterators I'm not familiar with would give me a hint about what they
actually return, instead of:

>>> itertools.count(17)
count(17)                  # yes, thank you, not very helpful
>>> enumerate("spam")
enumerate("spam")          # with your proposed extension -- not better

However, if this kind of goal is considered "not serious enough" for
adding a private special method, then I'm fine with trying out a fishing

A bientot,


From ncoghlan at  Fri Feb 10 14:44:45 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 10 Feb 2006 23:44:45 +1000
Subject: [Python-Dev] _length_cue()
In-Reply-To: <>
References: <>
Message-ID: <>

Armin Rigo wrote:
> Hi Nick,
> On Fri, Feb 10, 2006 at 11:21:52PM +1000, Nick Coghlan wrote:
>> Do they really need anything more sophisticated than:
>>    def __repr__(self):
>>      return "%s(%r)" % (type(self).__name__, self._subiter)
>> (modulo changes in the format of arguments, naturally. This simple one would 
>> work for things like enumerate and reversed, though)
> My goal here is not primarily to help debugging, but to help playing
> around at the interactive command-line.  Python's command-line should
> not be dismissed as "useless for real programmers"; I definitely use it
> all the time to try things out.  It would be nicer if all these
> iterators I'm not familiar with would give me a hint about what they
> actually return, instead of:
>>>> itertools.count(17)
> count(17)                  # yes, thank you, not very helpful
>>>> enumerate("spam")
> enumerate("spam")          # with your proposed extension -- not better
> However, if this kind of goal is considered "not serious enough" for
> adding a private special method, then I'm fine with trying out a fishing
> approach.

Ah, I see the use case now. You're right in thinking I was mainly considering 
the debugging element (and supporting even that would be an improvement on the 
current repr methods, which are just the 'type with instance ID' default repr).

In terms of "what does it do" though, I'd tend to actually iterate the thing:

Py> for x in enumerate("spam"): print x
(0, 's')
(1, 'p')
(2, 'a')
(3, 'm')


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mwh at  Fri Feb 10 14:56:45 2006
From: mwh at (Michael Hudson)
Date: Fri, 10 Feb 2006 13:56:45 +0000
Subject: [Python-Dev] Post-PyCon PyPy Sprint: February 27th - March 2nd 2006
Message-ID: <>

The next PyPy sprint is scheduled to take place right after 
PyCon 2006 in Dallas, Texas, USA. 

We hope to see lots of newcomers at this sprint, so we'll give
friendly introductions.  Note that during the PyCon conference
we are giving PyPy talks which serve well as preparation.  

Goals and topics of the sprint 

While attendees of the sprint are of course welcome to work on what
they wish, we offer these ideas:

  - Work on an 'rctypes' module aiming at letting us use a ctypes
    implementation of an extension module from the compiled pypy-c.

  - Writing ctypes implementations of modules to be used by the above

  - Experimenting with different garbage collection strategies.

  - Implementing Python 2.5 features in PyPy

  - Implementation of constraints solvers and integration of dataflow
    variables to PyPy.

  - Implement new features and improve the 'py' lib and py.test 
    which are heavily used by PyPy (doctests/test selection/...).

  - Generally experiment with PyPy -- for example, play with
    transparent distribution of objects or coroutines and stackless
    features at application level.

  - Have fun!


The sprint will be held wherever the PyCon sprints end up being held,
which is to say somewhere within the Dallas/Addison Marriott Quorum

For more information see the PyCon 06 sprint pages:


Exact times 

The PyPy sprint will run from Monday February 27th until Thursday
March 2nd 2006. Hours will be from 10:00 until people have had enough.

Registration, etc.

If you know before the conference that you definitely want to attend
our sprint, please subscribe to the `PyPy sprint mailing list`_,
introduce yourself and post a note that you want to come.  Feel free
to ask any questions or make suggestions there!

There is a separate `PyCon 06 people`_ page tracking who is already
planning to come.  If you have commit rights on codespeak then you can
modify yourself a checkout of

.. _`PyPy sprint mailing list`:
.. _`PyCon 06 people`:

42. You can measure a programmer's perspective by noting his
    attitude on the continuing vitality of FORTRAN.
  -- Alan Perlis,

From Jack.Jansen at  Fri Feb 10 15:23:07 2006
From: Jack.Jansen at (Jack Jansen)
Date: Fri, 10 Feb 2006 15:23:07 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
Message-ID: <>

I keep running into problems with the "const" modifications to  
PyArg_ParseTupleAndKeywords() (rev. 41638 by Jeremy).

I have lots of code of the form
	char *kw[] = {"itself", 0};

	if (PyArg_ParseTupleAndKeywords(_args, _kwds, "O&", kw,  
CFTypeRefObj_Convert, &itself)) ...
which now no longer compiles, neither with C nor with C++ (gcc4, both  
MacOSX and Linux). Changing the kw declaration to "const char *kw[]"  
makes it compile again.

I don't understand why it doesn't compile: even though the  
PyArg_ParseTupleAndKeywords signature promises that it won't change  
the "kw" argument I see no reason why I shouldn't be able to pass a  
non-const argument.

And to make matters worse adding the "const" of course makes the code  
non-portable to previous versions of Python (where the C compiler  
rightly complains that I'm passing a const object through a non-const  
parameter).

Can anyone enlighten me?
Jack Jansen, <Jack.Jansen at>,
If I can't dance I don't want to be part of your revolution -- Emma  
Goldman

From guido at  Fri Feb 10 16:39:53 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 07:39:53 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

OMG. Are we now adding 'const' modifiers to random places? I thought
"const propagation hell" was a place we were happily avoiding by not
falling for that meme. What changed?


On 2/10/06, Jack Jansen <Jack.Jansen at> wrote:
> I keep running into problems with the "const" modifications to
> PyArg_ParseTupleAndKeywords() (rev. 41638 by Jeremy).
> I have lots of code of the form
>         char *kw[] = {"itself", 0};
>         if (PyArg_ParseTupleAndKeywords(_args, _kwds, "O&", kw,
> CFTypeRefObj_Convert, &itself)) ...
> which now no longer compiles, neither with C nor with C++ (gcc4, both
> MacOSX and Linux). Changing the kw declaration to "const char *kw[]"
> makes it compile again.
> I don't understand why it doesn't compile: even though the
> PyArg_ParseTupleAndKeywords signature promises that it won't change
> the "kw" argument I see no reason why I shouldn't be able to pass a
> non-const argument.
> And to make matters worse adding the "const" of course makes the code
> non-portable to previous versions of Python (where the C compiler
> rightly complains that I'm passing a const object through a non-const
> parameter).
> Can anyone enlighten me?
> --
> Jack Jansen, <Jack.Jansen at>,
> If I can't dance I don't want to be part of your revolution -- Emma
> Goldman

--Guido van Rossum (home page:

From jeremy at  Fri Feb 10 17:30:30 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 11:30:30 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Guido van Rossum <guido at> wrote:
> OMG. Are we now adding 'const' modifiers to random places? I thought
> "const propagation hell" was a place we were happily avoiding by not
> falling for that meme. What changed?

I added some const to several API functions that take char* but are
typically called by passing string literals.  In C++, a string literal
is a const char* so you need to add a const_cast<> to every call site,
which is incredibly cumbersome.  After some discussion on python-dev,
I made changes to a small set of API functions and chased the
const-ness the rest of the way, as you would expect.  There was
nothing random about the places const was added.

I admit that I'm also puzzled by Jack's specific question.  I don't
understand why an array passed to PyArg_ParseTupleAndKeywords() would
need to be declared as const.  I observed the problem in my initial
changes but didn't think very hard about the cause of the problem. 
Perhaps someone with better C/C++ standards chops can explain.


> --Guido
> On 2/10/06, Jack Jansen <Jack.Jansen at> wrote:
> > I keep running into problems with the "const" modifications to
> > PyArg_ParseTupleAndKeywords() (rev. 41638 by Jeremy).
> >
> > I have lots of code of the form
> >         char *kw[] = {"itself", 0};
> >
> >         if (PyArg_ParseTupleAndKeywords(_args, _kwds, "O&", kw,
> > CFTypeRefObj_Convert, &itself)) ...
> > which now no longer compiles, neither with C nor with C++ (gcc4, both
> > MacOSX and Linux). Changing the kw declaration to "const char *kw[]"
> > makes it compile again.
> >
> > I don't understand why it doesn't compile: even though the
> > PyArg_ParseTupleAndKeywords signature promises that it won't change
> > the "kw" argument I see no reason why I shouldn't be able to pass a
> > non-const argument.
> >
> > And to make matters worse adding the "const" of course makes the code
> > non-portable to previous versions of Python (where the C compiler
> > rightly complains that I'm passing a const object through a non-const
> > parameter).
> >
> > Can anyone enlighten me?
> > --
> > Jack Jansen, <Jack.Jansen at>,
> > If I can't dance I don't want to be part of your revolution -- Emma
> > Goldman
> --
> --Guido van Rossum (home page:

From keith at  Fri Feb 10 17:37:34 2006
From: keith at (Keith Dart)
Date: Fri, 10 Feb 2006 08:37:34 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote the following on 2006-02-09 at 16:27 PST:
> Since you probably won't stop until I give you an answer: I'm really
> not interested in a syntactic solution that allows multi-line lambdas.

Fuzzy little lambdas, wouldn't hurt a fly.
Object of much derision, one has to wonder why?

Docile little lambdas, so innocent and pure
Only wants to function with finality and closure.

Cute little lambdas, they really are so sweet
When ingested by a Python they make a tasty treat.


-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Keith Dart <keith at>
   public key: ID: 19017044

From thomas at  Fri Feb 10 17:53:39 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 17:53:39 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 10, 2006 at 11:30:30AM -0500, Jeremy Hylton wrote:
> On 2/10/06, Guido van Rossum <guido at> wrote:
> > OMG. Are we now adding 'const' modifiers to random places? I thought
> > "const propagation hell" was a place we were happily avoiding by not
> > falling for that meme. What changed?
> I added some const to several API functions that take char* but
> typically called by passing string literals.  In C++, a string literal
> is a const char* so you need to add a const_cast<> to every call site,
> which is incredibly cumbersome.  After some discussion on python-dev,
> I made changes to a small set of API functions and chased the
> const-ness the rest of the way, as you would expect.  There was
> nothing random about the places const was added.
> I admit that I'm also puzzled by Jack's specific question.  I don't
> understand why an array passed to PyArg_ParseTupleAndKeywords() would
> need to be declared as const.  I observed the problem in my initial
> changes but didn't think very hard about the cause of the problem. 
> Perhaps someone with better C/C++ standards chops can explain.

Well, it's counter-intuitive, but a direct result of how pointer equivalence
is defined in C. I'm rusty in this part, so I will get some terminology
wrong, but IIRC, a variable A is of an equivalent type of variable B if they
hold the same type of data. So, a 'const char *' is equivalent to a 'char *'
because they both hold the address of a 'char'. But a 'const char**' (or
'const char *[]') is not equivalent to a 'char **' (or 'char *[]') because
the first holds the address of a 'const char *', and the second the address
of a 'char *'. A 'char * const *' is equivalent to a 'char **' though.

As I said, I got some of the terminology wrong, but the end result is
exactly that: a 'const char **' is not equivalent to a 'char **', even
though a 'const char *' is equivalent to a 'char *'. Equivalence, in this
case, means 'can be automatically downcasted'. Peter v/d Linden explains
this quite well in "Expert C Programming" (aka 'Deep C Secrets'), but
unfortunately I'm working from home and I left my copy at a coworker's desk.
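
A minimal sketch of the conversions that C does accept without a cast,
assuming nothing beyond standard C. The helper names (text_length,
entry_count, demo_conversions) are hypothetical and exist only for
illustration; they are not part of any Python API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: only reads through a pointer to const char. */
size_t text_length(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: promises not to reseat the pointers in the array,
   but promises nothing about the characters they point at. */
size_t entry_count(char *const *entries)
{
    size_t n = 0;
    assert(entries != NULL);
    while (entries[n] != NULL)
        n++;
    return n;
}

size_t demo_conversions(void)
{
    char word[] = "spam";
    char *list[] = {word, NULL};

    /* char * -> const char *: const is added to the pointed-to type. */
    size_t len = text_length(word);

    /* char ** -> char *const *: const is again added only to the type the
       outer pointer points at, so this is accepted as well. */
    size_t count = entry_count(list);

    /* Passing 'list' where a 'const char **' is expected would draw a
       diagnostic in C; that is the case discussed in this thread. */
    return len * 10 + count;
}
```

In both accepted calls the qualifier is added one level below the pointer
being converted. The rejected case adds it two levels down.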

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at  Fri Feb 10 18:02:03 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 18:02:03 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Jeremy Hylton wrote:
> I admit that I'm also puzzled by Jack's specific question.  I don't
> understand why an array passed to PyArg_ParseTupleAndKeywords() would
> need to be declared as const.  I observed the problem in my initial
> changes but didn't think very hard about the cause of the problem. 
> Perhaps someone with better C/C++ standards chops can explain.

Please take a look at this code:

void foo(const char** x, const char*s)
{
        x[0] = s;
}

void bar()
{
        char *kwds[] = {0};
        const char *s = "Text";
        foo(kwds, s);
        kwds[0][0] = 't';
}

If it was correct, you would be able to modify the const char
array in the string literal, without any compiler errors. The
assignment

  x[0] = s;

is kosher, because you are putting a const char* into a
const char* array, and the assignment

     kwds[0][0] = 't';

is ok, because you are modifying a char array. So the place
where it has to fail is the passing of the pointer-pointer.
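
For contrast, a sketch of the const-correct variant of the same example,
in which the array is declared with const-qualified elements. The names
store_first and first_char_after_store are hypothetical stand-ins for
foo and bar above.

```c
#include <assert.h>

/* Same shape as foo() above: stores s into slot 0 of the array. */
void store_first(const char **x, const char *s)
{
    assert(x != 0);
    x[0] = s;
}

char first_char_after_store(void)
{
    const char *kwds[] = {0};   /* element type is now const char * */
    const char *s = "Text";

    store_first(kwds, s);       /* argument types match, no diagnostic */

    /* kwds[0][0] = 't';  would be rejected here, because the element
       type no longer allows writing to the characters. */
    return kwds[0][0];
}
```

With the array declared this way the call is accepted, and the compiler
rejects the write to the literal instead.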


From jeremy at  Fri Feb 10 18:06:21 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 12:06:21 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

It looks like a solution may be to define it as "const char * const *"
rather than "const char **".  I'll see if that works.


On 2/10/06, "Martin v. L?wis" <martin at> wrote:
> Jeremy Hylton wrote:
> > I admit that I'm also puzzled by Jack's specific question.  I don't
> > understand why an array passed to PyArg_ParseTupleAndKeywords() would
> > need to be declared as const.  I observed the problem in my initial
> > changes but didn't think very hard about the cause of the problem.
> > Perhaps someone with better C/C++ standards chops can explain.
> Please take a look at this code:
> void foo(const char** x, const char*s)
> {
>         x[0] = s;
> }
> void bar()
> {
>         char *kwds[] = {0};
>         const char *s = "Text";
>         foo(kwds, s);
>         kwds[0][0] = 't';
> }
> If it was correct, you would be able to modify the const char
> array in the string literal, without any compiler errors. The
> assignment
>   x[0] = s;
> is kosher, because you are putting a const char* into a
> const char* array, and the assignment
>      kwds[0][0] = 't';
> is ok, because you are modifying a char array. So the place
> where it has to fail is the passing of the pointer-pointer.
> Regards,
> Martin

From martin at  Fri Feb 10 18:07:24 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 18:07:24 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Jeremy Hylton wrote:
> I added some const to several API functions that take char* but
> typically called by passing string literals.  In C++, a string literal
> is a const char* so you need to add a const_cast<> to every call site,

That's not true.

A string literal of length N is of type const char[N+1]. However,
a (deprecated) conversion of string literals to char* is provided
in the language. So assigning a string literal to char* or passing
it in a char* parameter is compliant with standard C++, no
const_cast is required.
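
The "N+1" part can be checked with a short sketch. It is written in C,
where the literal's element type is plain char rather than const char,
but the array length is the same. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

size_t literal_array_size(void)
{
    /* "Text" has N = 4 characters; the array also holds the trailing
       '\0', so sizeof reports N + 1 elements of size 1. */
    return sizeof("Text");
}
```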


From jeremy at  Fri Feb 10 18:14:28 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 12:14:28 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, "Martin v. L?wis" <martin at> wrote:
> Jeremy Hylton wrote:
> > I added some const to several API functions that take char* but
> > typically called by passing string literals.  In C++, a string literal
> > is a const char* so you need to add a const_cast<> to every call site,
> That's not true.
> A string literal of length N is of type const char[N+1]. However,
> a (deprecated) conversion of string literals to char* is provided
> in the language. So assigning a string literal to char* or passing
> it in a char* parameter is compliant with standard C++, no
> const_cast is required.

Ok.  I reviewed the original problem and you're right, the problem was
not that it failed outright but that it produced a warning about the
deprecated conversion:
warning: deprecated conversion from string constant to 'char*''

I work at a place that takes the same attitude as python-dev about
warnings:  They're treated as errors and you can't check in code that
the compiler generates warnings for.

Nonetheless, the consensus on the c++ sig and python-dev at the time
was to fix Python.  If we don't allow warnings in our compilations, we
shouldn't require our users to accept warnings in theirs.


From tim.peters at  Fri Feb 10 18:19:01 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 12:19:01 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

[Jeremy Hylton]
> ...
> I admit that I'm also puzzled by Jack's specific question.  I don't
> understand why an array passed to PyArg_ParseTupleAndKeywords() would
> need to be declared as const.  I observed the problem in my initial
> changes but didn't think very hard about the cause of the problem.
> Perhaps someone with better C/C++ standards chops can explain.

Oh, who cares?  I predict "Jack's problem" would go away if we changed
the declaration of PyArg_ParseTupleAndKeywords to what you intended
<wink> to begin with:

PyAPI_FUNC(int) PyArg_ParseTupleAndKeywords(PyObject *, PyObject *,
                                                  const char *, const
char * const *, ...);

That is, declare the keywords argument as a pointer to const pointer
to const char, rather than the current pointer to pointer to const
char.

How about someone on a Linux box try that with gcc, and check it in if
it solves Jack's problem (meaning that gcc stops whining about the
original spelling of his original example).

From jeremy at  Fri Feb 10 18:22:24 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 12:22:24 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Jeremy Hylton <jeremy at> wrote:
> It looks like a solution may be to define it as "const char * const *"
> rather than "const char **".  I'll see if that works.

No.  It doesn't work.  I'm not sure about this one either, but some
searching suggests that you can pass a char** to a function taking
const char* const* in C++ but not in C.  Sigh.  I don't see any way to
avoid a warning in Jack's case.


> Jeremy
> On 2/10/06, "Martin v. L?wis" <martin at> wrote:
> > Jeremy Hylton wrote:
> > > I admit that I'm also puzzled by Jack's specific question.  I don't
> > > understand why an array passed to PyArg_ParseTupleAndKeywords() would
> > > need to be declared as const.  I observed the problem in my initial
> > > changes but didn't think very hard about the cause of the problem.
> > > Perhaps someone with better C/C++ standards chops can explain.
> >
> > Please take a look at this code:
> >
> > void foo(const char** x, const char*s)
> > {
> >         x[0] = s;
> > }
> >
> > void bar()
> > {
> >         char *kwds[] = {0};
> >         const char *s = "Text";
> >         foo(kwds, s);
> >         kwds[0][0] = 't';
> > }
> >
> > If it was correct, you would be able to modify the const char
> > array in the string literal, without any compiler errors. The
> > assignment
> >
> >   x[0] = s;
> >
> > is kosher, because you are putting a const char* into a
> > const char* array, and the assignment
> >
> >      kwds[0][0] = 't';
> >
> > is ok, because you are modifying a char array. So the place
> > where it has to fail is the passing of the pointer-pointer.
> >
> > Regards,
> > Martin
> >

From tim.peters at  Fri Feb 10 18:27:35 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 12:27:35 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

>> It looks like a solution may be to define it as "const char * const *"
>> rather than "const char **".  I'll see if that works.

> No.  It doesn't work.  I'm not sure about this one either, but some
> searching suggests that you can pass a char** to a function taking
> const char* const* in C++ but not in C.

Oops!  I think that's right.

> Sigh.  I don't see any way to avoid a warning in Jack's case.

Martin's turn ;-)

From guido at  Fri Feb 10 18:29:42 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 09:29:42 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Jeremy Hylton <jeremy at> wrote:
> I added some const to several API functions that take char* but
> typically called by passing string literals.  In C++, a string literal
> is a const char* so you need to add a const_cast<> to every call site,
> which is incredibly cumbersome.  After some discussion on python-dev,
> I made changes to a small set of API functions and chased the
> const-ness the rest of the way, as you would expect.  There was
> nothing random about the places const was added.

I still don't understand *why* this was done, nor how the set of
functions was chosen if not randomly.

--Guido van Rossum (home page:

From tim.peters at  Fri Feb 10 18:43:00 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 12:43:00 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

[Thomas Wouters]
> Perhaps the memory you have is of select-lookalikes, like poll(),

No, it was definitely select(), and on a 64-bit Unix (probably _not_
Linux) that allowed for an enormous number of sockets.

> or maybe of vendor-specific (and POSIX-breaking) extensions to select().

Yes, it must have been non-POSIX.

> select() performs pretty poorly on large fdsets with holes in, and has the fixed
> size fdset problem, so poll() was added to fix that (by Linux and later by XPG4,
> IIRC.) poll() takes an array of structs containing the fd, the operations to
> watch for and an output parameter with seen events. Does that jar your
> memory? :)

No more than it had been jarred ;-)  Well, a bit more:  it was
possible to pass a first argument to select() that was larger than
FD_SETSIZE.  In effect, FD_SETSIZE had no meaning.

> (The socketmodule has support for poll(), on systems that have it, by the
> way.)


From tim.peters at  Fri Feb 10 18:54:17 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 12:54:17 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

>> I added some const to several API functions that take char* but
>> typically called by passing string literals.  In C++, a string literal
>> is a const char* so you need to add a const_cast<> to every call site,
>> which is incredibly cumbersome.  After some discussion on python-dev,
>> I made changes to a small set of API functions and chased the
>> const-ness the rest of the way, as you would expect.  There was
>> nothing random about the places const was added.

> I still don't understand *why* this was done,

Primarily to make life easier for C++ programmers using Python's C
API.  But didn't Jeremy just say that?

Some people (including me) have been adding const to char* API
arguments for years, but in much slower motion, and at least I did it
only when someone complained about a specific function.

> nor how the set of functions was chosen if not randomly.

    I added some const to several API functions that take char* but
    typically called by passing string literals.

If he had _stuck_ to that, we wouldn't be having this discussion :-) 
(that is, nobody passes string literals to
PyArg_ParseTupleAndKeywords's kws argument).

From jeremy at  Fri Feb 10 19:05:41 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 13:05:41 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Tim Peters <tim.peters at> wrote:
>     [Jeremy]
>     I added some const to several API functions that take char* but
>     typically called by passing string literals.
> If he had _stuck_ to that, we wouldn't be having this discussion :-)
> (that is, nobody passes string literals to
> PyArg_ParseTupleAndKeywords's kws argument).

They are passing arrays of string literals.  In my mind, that was a
nearly equivalent use case.  I believe the C++ compiler complains
about passing an array of string literals to char**.


From guido at  Fri Feb 10 19:07:45 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 10:07:45 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Tim Peters <tim.peters at> wrote:
> [Jeremy]
> >> I added some const to several API functions that take char* but
> >> typically called by passing string literals.  In C++, a string literal
> >> is a const char* so you need to add a const_cast<> to every call site,
> >> which is incredibly cumbersome.  After some discussion on python-dev,
> >> I made changes to a small set of API functions and chased the
> >> const-ness the rest of the way, as you would expect.  There was
> >> nothing random about the places const was added.
> [Guido]
> > I still don't understand *why* this was done,
> Primarily to make life easier for C++ programmers using Python's C
> API.  But didn't Jeremy just say that?

I didn't connect the dots.

> Some people (including me) have been adding const to char* API
> arguments for years, but in much slower motion, and at least I did it
> only when someone complained about a specific function.
> > nor how the set of functions was chosen if not randomly.
>     [Jeremy]
>     I added some const to several API functions that take char* but
>     typically called by passing string literals.
> If he had _stuck_ to that, we wouldn't be having this discussion :-)
> (that is, nobody passes string literals to
> PyArg_ParseTupleAndKeywords's kws argument).

Is it too late to revert this one?

Is there another way to make C++ programmers happy (e.g. by having a
macro that expands to const when compiled with C++ but vanishes when
compiled with C?)

--Guido van Rossum (home page:

From tim.peters at  Fri Feb 10 19:27:35 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 13:27:35 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

>>> I added some const to several API functions that take char* but
>>> typically called by passing string literals.

>> If he had _stuck_ to that, we wouldn't be having this discussion :-)
>> (that is, nobody passes string literals to
>> PyArg_ParseTupleAndKeywords's kws argument).

> They are passing arrays of string literals.  In my mind, that was a
> nearly equivalent use case.  I believe the C++ compiler complains
> about passing an array of string literals to char**.

It's the consequences:  nobody complains about tacking "const" on to a
former honest-to-God "char *" argument that was in fact not modified,
because that's not only helpful for C++ programmers, it's _harmless_
for all programmers.  For example, nobody could sanely object (and
nobody did :-)) to adding const to the attribute-name argument in
PyObject_SetAttrString().  Sticking to that creates no new problems
for anyone, so that's as far as I ever went.
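
A sketch of why const on a single-level "char *" argument is harmless
for callers. The function is_known_attr is a hypothetical stand-in for
an API that only reads its name argument; it is not a real Python
function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an API that only reads its name argument. */
int is_known_attr(const char *name)
{
    assert(name != 0);
    return strcmp(name, "itself") == 0 || strcmp(name, "other") == 0;
}

int demo_callers(void)
{
    char buffer[] = "itself";      /* modifiable array */
    const char *fixed = "other";   /* pointer to const char */

    /* A modifiable buffer, a pointer to const, and a string literal are
       all accepted without a cast or a diagnostic. */
    return is_known_attr(buffer)
         + is_known_attr(fixed)
         + is_known_attr("missing");
}
```

No existing caller has to change, which is the difference from the
pointer-to-pointer keywords argument.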

From jeremy at  Fri Feb 10 19:32:51 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 13:32:51 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Guido van Rossum <guido at> wrote:
> On 2/10/06, Tim Peters <tim.peters at> wrote:
> >     [Jeremy]
> >     I added some const to several API functions that take char* but
> >     typically called by passing string literals.
> >
> > If he had _stuck_ to that, we wouldn't be having this discussion :-)
> > (that is, nobody passes string literals to
> > PyArg_ParseTupleAndKeywords's kws argument).
> Is it too late to revert this one?

The change is still beneficial to C++ programmers, so my initial
preference is to keep it.  There are still some benefits to the other
changes, so it isn't a complete loss if we revert it.

> Is there another way to make C++ programmers happy (e.g. by having a
> macro that expands to const when compiled with C++ but vanishes when
> compiled with C?)

Sounds icky.  Are we pretty sure there is no way to do the right thing
in plain C?  That is, declare the argument as taking a set of const
strings and still allow non-const strings to be passed without


From martin at  Fri Feb 10 20:18:53 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 20:18:53 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Jeremy Hylton wrote:
> Ok.  I reviewed the original problem and you're right, the problem was
> not that it failed outright but that it produced a warning about the
> deprecated conversion:
> warning: deprecated conversion from string constant to 'char*''
> I work at a place that takes the same attitude as python-dev about
> warnings:  They're treated as errors and you can't check in code that
> the compiler generates warnings for.

In that specific case, I think the compiler's warning should be turned
off; it is a bug in the compiler if that specific warning cannot be
turned off separately.

While it is true that the conversion is deprecated, the C++ standard
defines this as

"Normative for the current edition of the Standard, but not guaranteed
to be part of the Standard in future revisions."

The current version is from 1998. I haven't been following closely,
but I believe there are no plans to actually remove the feature
in the next revision.

FWIW, Annex D also defines these features as deprecated:
- the use of "static" for objects in namespace scope (AFAICT
  including C file-level static variables and functions)
- C library headers (i.e. <stdio.h>)

Don't you get a warning when including Python.h, because that
includes <limits.h>?

> Nonetheless, the consensus on the c++ sig and python-dev at the time
> was to fix Python.  If we don't allow warnings in our compilations, we
> shouldn't require our users to accept warnings in theirs.

We don't allow warnings for "major compilers". This specific compiler
appears flawed (or your configuration of it).


From martin at  Fri Feb 10 20:33:42 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 20:33:42 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Tim Peters wrote:
>>Sigh.  I don't see any way to avoid a warning in Jack's case.
> Martin's turn ;-)

I see two options:

1. Revert the change for the const char** keywords argument (but
   leave the change for everything else). C++ users should only
   see a problem if they have a const char* variable, not if
   they use literals (Jeremy's compiler's warning is insensate)

   For keyword arguments, people typically don't have char*
   variables; instead, they have an array of string literals.

2. Only add the const in C++:

#ifdef __cplusplus
#define Py_cxxconst const
#else
#define Py_cxxconst
#endif

   PyAPI_FUNC(int) PyArg_ParseTupleAndKeywords(PyObject *, PyObject *,
       const char *, Py_cxxconst char *Py_cxxconst*, ...);

   This might look like it could break C/C++ interoperability on
   platforms that take an inventive interpretation of the standard
   (e.g. if they would mangle even C symbols). However, I believe
   it won't make things worse: The C++ standard doesn't guarantee
   interoperability of C and C++ implementations at all, and the
   platforms I'm aware of support the above construct (since
   PA_PTAK is extern "C").


From python at  Fri Feb 10 20:52:09 2006
From: python at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 14:52:09 -0500
Subject: [Python-Dev] _length_cue()
References: <><00a101c62cea$9315b2c0$b83efea9@RaymondLaptop1><><009001c62d1f$79a6abc0$b83efea9@RaymondLaptop1><><>
Message-ID: <001501c62e7b$7ab86dc0$b83efea9@RaymondLaptop1>

> It would be nicer if all these
> iterators I'm not familiar with would give me a hint about what they
> actually return, instead of:
>>>> itertools.count(17)
> count(17)                  # yes, thank you, not very helpful

I prefer that the repr() of count() be left alone.  It follows the style 
used by xrange() and other repr's that can be run through eval(). Also, the 
existing repr keeps its information up-to-date to reflect the current state 
of the iterator:

>>> it = count(10)
>>> it
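
The point about the repr tracking the iterator's state can be checked with
a short sketch. It uses the Python 3 spelling next(it), which is an
assumption relative to the interpreter discussed in this thread:

```python
from itertools import count

it = count(10)
before = repr(it)    # repr of a fresh counter
first = next(it)     # consume one value
after = repr(it)     # repr now reflects the advanced state

print(before, first, after)
```

Because the repr is also valid constructor syntax, eval(after) would build
a counter that resumes from the same position.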

A good deal of thought and discussion went into these repr forms.  See the 
python-dev discussions in April 2004.  Please don't randomly go in and 
change those choices.

For most of the tools like enumerate(), there are very few assumptions you 
can make about the input without actually running the iteration.  So, I 
don't see how you can change enumerate's repr method unless adopting a 
combination of styles, switching back and forth depending on the input:

>>> enumerate('abcde')
<(0, 'a'), (1, 'b'), ...>
>>> enumerate(open('tmp.txt'))
<enumerate object at 0x00BFD800>

IMO, switching back and forth is an especially bad idea.
Hence, enumerate's repr ought to be left alone too.
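
A small sketch of why a previewing repr is risky for one-shot inputs. The
generator lines() is a hypothetical stand-in for something like an open
file; nothing here is taken from the actual enumerate implementation:

```python
def lines():
    # Stand-in for a one-shot source such as an open file (made-up data).
    yield "first"
    yield "second"


source = lines()
numbered = enumerate(source)
text = repr(numbered)        # generic form; does not touch the data

# A repr that previewed items would have to pull them from the source,
# and a one-shot iterator cannot hand them back afterwards:
preview = next(source)       # what a "helpful" repr would need to do
remaining = list(numbered)   # enumerate now starts at the second item

print(text)
print(preview, remaining)
```

The item consumed by the preview is lost to the consumer of enumerate(),
which is the kind of side effect a repr should not have.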


From guido at  Fri Feb 10 21:21:26 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 12:21:26 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/7/06, Neal Norwitz <nnorwitz at> wrote:
> On 2/7/06, Jeremy Hylton <jeremy at> wrote:
> > It looks like we need a Python 2.5 Release Schedule PEP.
> Very draft:
> Needs lots of work and release managers.  Anthony, Martin, Fred, Sean
> are all mentioned with TBDs and question marks.

Before he went off to a boondoggle^Woff-site at a Mexican resort, Neal
made me promise that I'd look at this and try to get the 2.5 release
plan going for real.

First things first: we need a release manager. Anthony, do you want to
do the honors again, or are you ready for retirement?

Next, the schedule. Neal's draft of the schedule has us releasing 2.5
in October. That feels late -- nearly two years after 2.4 (which was
released on Nov 30, 2004). Do people think it's reasonable to strive
for a more aggressive (by a month) schedule, like this:

    alpha 1: May 2006
    alpha 2: June 2006
    beta 1:  July 2006
    beta 2:  August 2006
    rc 1:    September 2006
    final:   September 2006

??? Would anyone want to be even more aggressive (e.g. alpha 1 right
after PyCon???). We could always do three alphas.

There's a bunch of sections (some very long) towards the end of the
PEP of questionable use; Neal just copied these from the 2.4 release
schedule (PEP 320):

- Ongoing tasks
- Carryover features from Python 2.4
- Carryover features from Python 2.3 (!)

Can someone go over these and suggest which we should keep, which we
should drop? (I may do this later, but I have other priorities below.)

Then, the list of features that ought to be in 2.5. Quoting Neal's draft:

>    PEP 308: Conditional Expressions

Definitely. Don't we have a volunteer doing this now?

>    PEP 328: Absolute/Relative Imports

Yes, please.

>    PEP 343: The "with" Statement

Didn't Michael Hudson have a patch?

>    PEP 352: Required Superclass for Exceptions

I believe this is pretty much non-controversial; it's a much weaker
version of PEP 348 which was rightfully rejected for being too
radical. I've tweaked some text in this PEP and approved it. Now we
need to make it happen. It might be quite a tricky thing, since
Exception is currently implemented in C as a classic class. If Brett
wants to sprint on this at PyCon I'm there to help (Mon/Tue only).
Fortunately we have MWH's patch 1104669 as a starting point.

>    PEP 353: Using ssize_t as the index type

Neal tells me that this is in progress in a branch, but that the code
is not yet flawless (tons of warnings etc.). Martin, can you tell us
more? When do you expect this to land? Maybe aggressively merging into
the HEAD and then releasing it as alpha would be a good way to shake
out the final issues???

Other PEPs I'd like comment on:

PEP 357 (__index__): the patch isn't on SF yet, but otherwise I'm all
for this, and I'd like to accept it ASAP to get it in 2.5. It doesn't
look like it'll cause any problems.

PEP 314 (metadata v1.1): this is marked as completed, but there's a
newer PEP available: PEP 345 (metadata v1.2). That PEP has 2.5 as its
target date. Shouldn't we implement it? (This is a topic that I
haven't followed closely.) There's also the question whether 314
should be marked final. Andrew or Richard?

PEP 355 (path module): I still haven't reviewed this, because I'm -0
on adding what appears to me duplicate functionality. But if there's a
consensus building perhaps it should be allowed to go forward (and
then I *will* review it carefully).

I found a few more PEPs slated for 2.5 but that haven't seen much action lately:

PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
freezing arbitrary mutable data structures. Are there champions who
want to argue this?

PEP 349 - str() may return unicode. Where is this? I'm not at all sure
the PEP is ready. It would probably be a lot of work to make this work
everywhere in the C code, not to mention the stdlib .py code. Perhaps
this should be targeted for 2.6 instead? The consequences seem
potentially huge.

PEP 315 - do while. A simple enough syntax proposal, albeit one
introducing a new keyword (which I'm fine with). I kind of like it but
it doesn't strike me as super important -- if we put this off until
Py3k I'd be fine with that too. Opinions? Champions?

Ouch, a grep produced tons more. Quick rundown:

PEP 246 - adaptation. I'm still as lukewarm as ever; it needs
interfaces, promises to cause a paradigm shift, and the global map
worries me.

PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

PEP 337 - logging in the stdlib. What of it? This seems a good idea
but potentially disruptive (because backwards incompatible). Also it
could be done piecemeal on an opportunistic basis. Any volunteers?

PEP 338 - support -m for modules in packages. I believe Nick Coghlan
is close to implementing this. I'm fine with accepting it.

PEP 344 - exception chaining. There are deep problems with this due to
circularities; perhaps we should drop this, or revisit it for Py3k.

That's the "pep parade" for now. It would be appropriate to start a
new topic to discuss specific PEPs; a response to this thread
referencing the new thread would be appropriate.

--Guido van Rossum (home page:

From scott+python-dev at  Fri Feb 10 21:24:28 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 15:24:28 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Tim Peters wrote:
> No more than it had been jarred ;-)  Well, a bit more:  it was
> possible to pass a first argument to select() that was larger than
> FD_SETSIZE.  In effect, FD_SETSIZE had no meaning.

This begs the question then whether the check that is implemented has 
any relevance to any platform other than Linux. I am no portability 
guru, but I have to think there are other platforms where this patch 
will cause problems. For now at least, can we at least do some 
preprocessing magic to not use this code with Windows?

Scott Dial
scott at
dialsa at

From thomas at  Fri Feb 10 21:40:29 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 21:40:29 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 10, 2006 at 03:24:28PM -0500, Scott Dial wrote:
> Tim Peters wrote:
> >No more than it had been jarred ;-)  Well, a bit more:  it was
> >possible to pass a first argument to select() that was larger than
> >FD_SETSIZE.  In effect, FD_SETSIZE had no meaning.

> This begs the question then whether the check that is implemented has
> any relevance to any platform other than Linux. I am no portability
> guru, but I have to think there are other platforms where this patch 
> will cause problems. For now at least, can we at least do some 
> preprocessing magic to not use this code with Windows?

I doubt it will have problems on other platforms. As Tim said, FD_SETSIZE is
mandated by POSIX. Perhaps some platforms do allow larger sizes, by
replacing the FD_* macros with functions that dynamically grow whatever
magic is the 'fdset' datatype. I sincerely doubt it's a common approach,
though, and for them to be POSIX they would need to have FD_SETSIZE set to
some semi-sane value. So at worst, on those platforms (if any), we're
reducing the number of sockets you can actually select() on, from some
undefined platform maximum to whatever the platform *claims* is the maximum.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at  Fri Feb 10 21:40:59 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 21:40:59 +0100
Subject: [Python-Dev] ssize_t status (Was: release plan for 2.5 ?)
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
>>   PEP 353: Using ssize_t as the index type
> Neal tells me that this is in progress in a branch, but that the code
> is not yet flawless (tons of warnings etc.). Martin, can you tell us
> more?

"It works", in a way. You only get the tons of warnings with the
right compiler, and you don't actually need to fix them all to get
something useful. Not all modules need to be converted to support
more than 2**31 elements for all containers they operate on, so
this could also be based on user feedback.

Some users (so far, just Marc-Andre) have complained that this
breaks backwards compatibility. Some improvements can be made still,
but for some aspects (tp_as_sequence callbacks), I think the best
we can hope for is compiler warnings about incorrect function
pointer types.

> When do you expect this to land? Maybe aggressively merging into
> the HEAD and then releasing it as alpha would be a good way to shake
> out the final issues???

Sure: I hope to complete this all in March.


From martin at  Fri Feb 10 21:47:26 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 21:47:26 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Scott Dial wrote:
> This begs the question then whether the check that is implemented has 
> any relevance to any platform other than Linux. I am no portability 
> guru, but I have to think there are other platforms where this patch 
> will cause problems.

The patch is right on all platforms conforming to the POSIX standard.
POSIX says that FD_ISSET and friends have undefined behaviour if
the file descriptor is larger than FD_SETSIZE.

For platforms not conforming to the POSIX standard, the patch errs
on the conservative side: it refuses to do something that POSIX
says has undefined behaviour, yet may be well-defined on that

Disabling this for Windows is fine with me; I also think there should
be some kind of documentation that quickly shows the potential cause
of the exception


From martin at  Fri Feb 10 22:00:46 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 22:00:46 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Thomas Wouters wrote:
> I doubt it will have problems on other platforms. As Tim said, FD_SETSIZE is
> mandated by POSIX. Perhaps some platforms do allow larger sizes, by
> replacing the FD_* macros with functions that dynamically grow whatever
> magic is the 'fdset' datatype. I sincerely doubt it's a common approach,
> though, and for them to be POSIX they would need to have FD_SETSIZE set to
> some semi-sane value. So at worst, on those platforms (if any), we're
> reducing the number of sockets you can actually select() on, from some
> undefined platform maximum to whatever the platform *claims* is the maximum.

I think the Windows interpretation is actually well-designed: FD_SETSIZE
shouldn't be the number of the largest descriptor, but instead be the
maximum size of the set. So FD_SETSIZE is 64 on Windows, but you still
can have much larger file descriptor numbers.

The implementation strategy of Windows is to use an array of integers,
rather than the bit mask, and an index telling you how many slots have
already been filled. With FD_SETSIZE being 64, the fd_set requires
260 bytes.

This strategy has a number of interesting implications:
- a naive implementation of FD_SET is not idempotent; old winsock
  implementations were so naive. So you might fill the set by
  setting the same descriptor 64 times. Current implementations
  use a linear search to make the operation idempotent.
- FD_CLR needs to perform a linear scan for the descriptor,
  and then shift all subsequent entries by one (it could actually
  just move the very last entry to the deleted slot, but doesn't)

In any case, POSIX makes it undefined what FD_SET does when the
socket is larger than FD_SETSIZE, and apparently clearly expects
an fd_set to be a bit mask.
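
The array-plus-count strategy described above can be modelled roughly as
follows. This is a sketch of the idea only, not Winsock's code; the class
name, method names, and sample handle values are hypothetical:

```python
FD_SETSIZE = 64  # Winsock's default capacity, per the message above


class ArrayFdSet:
    """Toy model of an fd_set stored as a count plus an array of handles."""

    def __init__(self):
        self.fd_array = [0] * FD_SETSIZE
        self.fd_count = 0

    def is_set(self, fd):
        # Linear scan over the filled slots only.
        return fd in self.fd_array[:self.fd_count]

    def set(self, fd):
        # The membership check is what makes repeated calls idempotent;
        # without it the same handle could fill every slot.
        if self.is_set(fd):
            return
        if self.fd_count < FD_SETSIZE:
            self.fd_array[self.fd_count] = fd
            self.fd_count += 1

    def clear(self, fd):
        # Find the handle, then shift every later entry down by one.
        for i in range(self.fd_count):
            if self.fd_array[i] == fd:
                for j in range(i, self.fd_count - 1):
                    self.fd_array[j] = self.fd_array[j + 1]
                self.fd_count -= 1
                return


fds = ArrayFdSet()
for _ in range(3):
    fds.set(1892)   # a large handle value is fine; only the count is bounded
fds.set(40)
fds.set(7)
fds.clear(1892)
```

In this layout the limit applies to how many handles are stored, not to
how large any one handle value is, which is the distinction drawn above.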


From fabianosidler at  Fri Feb 10 22:03:59 2006
From: fabianosidler at (Fabiano Sidler)
Date: Fri, 10 Feb 2006 22:03:59 +0100
Subject: [Python-Dev] compiler.pyassem
Message-ID: <>

Hi folks!

Do I see things as they are and compiler.pyassem generates bytecode
straight without involving any C code, i.e. code from the VM or the
compiler? How is this achieved? I took a look at Python/compile.c as
mentioned in compiler.pyassem and I'm trying to get into it, but about
6500 lines of C code are too much for me in one file. Could someone
here please give me some hints on how one can do what compiler.pyassem


From raymond.hettinger at  Fri Feb 10 22:05:35 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 16:05:35 -0500
Subject: [Python-Dev] release plan for 2.5 ?
References: <dsbc3h$rct$><><><>
Message-ID: <000a01c62e85$bd081770$b83efea9@RaymondLaptop1>

[Guido van Rossum]
> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?

It has at least one anti-champion.  I think it is a horrible idea and would 
like to see it rejected in a way that brings finality.  If needed, I can 
elaborate in a separate thread.

> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?

I helped tweak a few issues with the PEP and got added as a co-author.
I didn't push for it because the syntax is a little odd if nothing appears 
in the while suite:

    val =
while val != lastitem:

I never found a way to improve this.  Dropping the final colon and 
steps improved the looks but diverged too far away from the rest of the 

    val =
while val != lastitem

So, unless another champion arises, putting this off until Py3k is fine with 

 > PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

I installed the underlying mechanism in support of itertools.tee() in Py2.4.

So, if anyone really wants to make xrange() copyable, it is now a trivial 
task --
likewise for any other iterator that has a potentially copyable state.

I've yet to find a use case for it, so I never pushed for the rest of
the PEP to be implemented.  There's nothing wrong with the idea,
but there doesn't seem to be much interest.

> PEP 344 - exception chaining. There are deep problems with this due to
> circularities; perhaps we should drop this, or revisit it for Py3k.

I wouldn't hold-up Py2.5 for this.

My original idea for this was somewhat simpler.  Essentially, a high-level 
function would concatenate extra string information onto the result of an 
exception raised at a lower level.  That strategy was applied to an existing 
problem for type objects and has met with good success.
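
A rough sketch of that strategy in Python 3 syntax, which is an assumption
relative to the 2006 code base. The function names and the sample
configuration are invented for illustration:

```python
def parse_port(text):
    return int(text)    # low-level step; raises ValueError on bad input


def load_port(config, key):
    """High-level step that adds context to whatever the lower level raised."""
    try:
        return parse_port(config[key])
    except ValueError as exc:
        # Append extra information to the existing message, then re-raise
        # the same exception object instead of chaining a new one.
        exc.args = ("%s (while reading %r)" % (exc.args[0], key),)
        raise


try:
    load_port({"port": "eighty"}, "port")
except ValueError as exc:
    message = str(exc)

print(message)
```

Since no second exception object is created, nothing refers back to a
traceback, so the circularity concern raised for PEP 344 does not arise
in this form.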

IOW, there is a simpler alternative on the table, but resolution won't take 
place until we collectively take interest in it again.  At this point, it 
seems to be low on everyone's priority list (including mine).


From pje at  Fri Feb 10 22:07:50 2006
From: pje at (Phillip J. Eby)
Date: Fri, 10 Feb 2006 16:07:50 -0500
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <
References: <>
Message-ID: <>

At 12:21 PM 2/10/2006 -0800, Guido van Rossum wrote:
> >    PEP 343: The "with" Statement
>Didn't Michael Hudson have a patch?

PEP 343's "Accepted" status was reverted to "Draft" in October, and then 
changed back to "Accepted".  I believe the latter change is an error, since 
you haven't pronounced on the changes.  Have you reviewed the __context__ 
stuff that was added?

In any case Michael's patch was pre-AST branch merge, and no longer 
reflects the current spec.

>PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

Wasn't the plan to just make this a builtin version of array.array for 
bytes, plus a .decode method and maybe a few other tweaks?  We presumably 
won't be able to .encode() to bytes or get bytes from sockets and files 
until 3.0, but having the type and being able to write it to files and 
sockets would be nice.  I'm not sure about the b"" syntax, ISTR it was 
controversial but I don't remember if there was a resolution.

>PEP 314 (metadata v1.1): this is marked as completed, but there's a
>newer PEP available: PEP 345 (metadata v1.2). That PEP has 2.5 as its
>target date. Shouldn't we implement it? (This is a topic that I
>haven't followed closely.) There's also the question whether 314
>should be marked final. Andrew or Richard?

I'm concerned that both metadata PEPs push to define syntax for things that 
have undefined semantics.  And worse, to define incompatible syntax in some 
cases.  PEP 345 for example, dictates the use of StrictVersion syntax for 
the required version of Python and the version of external requirements, 
but Python's own version numbers don't conform to strict version 
syntax.  ISTM that the metadata standard needs more work, especially since 
PyPI doesn't actually support using all of the metadata provided by the 
implemented version of the standard.  There's no way to search for 
requires/provides, for example (which is one reason why I went with 
distribution names for dependency resolution in setuptools).  Also, the 
specs don't allow for a Maintainer distinct from the package Author, even 
though the distutils themselves allow this.  IMO, 345 needs to go back to 
the drawing board, and I'm not really thrilled with the currently-useless 
"requires/provides" stuff in PEP 314.

If we do anything with the package metadata in Python 2.5, I'd like it to 
be *installing* PKG-INFO files alongside the packages, using a filename of 
the form "distributionname-version-py2.5.someext".  Setuptools supports 
such files currently under the ".egg-info" extension, but I'd be just as 
happy with '.pkg-info' if it becomes a Python standard addition to the 
installation.  Having this gives most of the benefits of PEP 262 (database 
of installed packages), although I wouldn't mind extending the PKG-INFO 
file format to include some of the PEP 262 additional data.
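
The filename pattern mentioned above could be built along these lines.
This is a hypothetical helper, not setuptools' actual code, and it leaves
out any escaping of unusual characters in names or versions:

```python
import sys


def metadata_filename(name, version, ext=".egg-info", py_version=None):
    """Build 'distributionname-version-pyX.Y.ext' for an installed PKG-INFO."""
    if py_version is None:
        # Default to the running interpreter's major.minor version.
        py_version = "%d.%d" % sys.version_info[:2]
    return "%s-%s-py%s%s" % (name, version, py_version, ext)


example = metadata_filename("Example", "1.0", py_version="2.5")
alternate = metadata_filename("Example", "1.0", ext=".pkg-info", py_version="2.5")
print(example, alternate)
```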

These are probably distutils-sig and/or catalog-sig topics; I just mainly 
wanted to point out that 314, 345, and 262 need at least some tweaking and
possibly rethinking before any push to implementation.

From thomas at  Fri Feb 10 22:11:31 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 22:11:31 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Fri, Feb 10, 2006 at 12:21:26PM -0800, Guido van Rossum wrote:

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

Well, PyCon might be a nice place to finish any PEP patches. I know I'll be
available to do such work on the sprint days ;) I don't think that means
we'll have a working repository with all 2.5 features right after, though.

> >    PEP 308: Conditional Expressions

> Definitely. Don't we have a volunteer doing this now?

There is a volunteer, but he's new at this, so he probably needs a bit of
time to work through the intricacies of the AST, the compiler and the eval
loop.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From jeremy at  Fri Feb 10 22:14:12 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 10 Feb 2006 16:14:12 -0500
Subject: [Python-Dev] compiler.pyassem
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, Fabiano Sidler <fabianosidler at> wrote:
> Do I see things as they are and compiler.pyassem generates bytecode
> straight without involving any C code, i.e. code from the VM or the
> compiler? How is this achieved? I took a look at Python/compile.c as
> mentioned in compiler.pyassem and I'm trying to get into it, but about
> 6500 lines of C code are too much for me in one file. Could someone
> here please give me some hints on how one can do what compiler.pyassem
> does?

I'm not sure what exactly you want to know.  The compiler package
implements most of a Python bytecode compiler in Python.  It re-uses
the parser written in C, but otherwise does the entire transformation
in Python.  The "how is this achieved?" question is hard to answer
without saying "read the source."  There are about 6000 lines of
Python code in the compiler package, but you can largely ignore
and if you just want to study the compiler.

Perhaps your specific question is: How does the interpreter create new
bytecode or function objects from a program instead of compiling from
source or importing a module?  At some level, bytecode is simply a
string representation of a program.  The new module takes the bytecode
plus a lot of meta-data including the names of variables and a list of
constants, and produces a new code object.  See the newCodeObject()
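
As a rough illustration of code objects as "bytecode plus metadata", here
is a sketch using compile() and the types module from current Python 3.
That is an assumption; it is not the "new" module interface referred to
above, and the sample source string is made up:

```python
import types

source = "def double(x):\n    return 2 * x\n"
module_code = compile(source, "<generated>", "exec")

# The function body's code object sits among the module code's constants.
func_code = next(
    c for c in module_code.co_consts if isinstance(c, types.CodeType)
)

# A code object carries the raw bytecode plus metadata such as names.
bytecode = func_code.co_code         # the bytecode itself, as bytes
arg_names = func_code.co_varnames    # local variable names

# Pairing the code object with a globals dict yields a callable function.
double = types.FunctionType(func_code, {})
result = double(21)
print(func_code.co_name, arg_names, result)
```

No source text is needed at the point where the function is built; only
the code object and a namespace are.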

I suspect further discussion on this topic might be better done on
python-list, unless you have some discussion that is relevant for
Python implementors.


From tim.peters at  Fri Feb 10 22:14:38 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 16:14:38 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

[Scott Dial]
> This begs the question then whether the check that is implemented has
> any relevance to any platform other than Linux. I am no portability
> guru, but I have to think there are other platforms where this patch
> will cause problems. For now at least, can we at least do some
> preprocessing magic to not use this code with Windows?

We _have_ to gut this patch on Windows, because Python code using
sockets on Windows no longer works.  That can't stand.  Indeed, I'm
half tempted to revert the checkin right now since Python's test suite
fails or hangs on Windows in test after test now.  This at least
blocks me from doing work I wanted to do (instead I spent the time
allocated for that staring at test failures).

I suggest skipping the new crud conditionalized on a symbol like


The Windows pyconfig.h can #define that, and other platforms can
ignore its possible existence.  If it applies to some Unix variant
too, fine, that variant can also #define it.  No idea here what the
story is on, e.g., Cygwin or OS2.

From guido at  Fri Feb 10 22:29:30 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 13:29:30 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Phillip J. Eby <pje at> wrote:

I'm not following up to anything that Phillip wrote (yet), but his
response reminded me of two more issues:

- wsgiref, an implementation of PEP 333 (Web Server Gateway
interface). I think this might make a good addition to the standard
library. The web-sig has been discussing additional things that might
be proposed for addition but I believe there's no consensus -- in any
case we ought to be conservative.

- setuplib? Wouldn't it make sense to add this to the 2.5 stdlib?

--Guido van Rossum (home page:

From martin at  Fri Feb 10 22:33:28 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 10 Feb 2006 22:33:28 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Tim Peters wrote:
> I suggest skipping the new crud conditionalized on a symbol like

Hmm... How about this patch:

Index: Modules/socketmodule.c
===================================================================
--- Modules/socketmodule.c      (Revision 42308)
+++ Modules/socketmodule.c      (Arbeitskopie)
@@ -396,7 +396,14 @@
 static PyTypeObject sock_type;

 /* Can we call select() with this socket without a buffer overrun? */
+#ifdef MS_WINDOWS
+/* Everything is selectable on Windows */
+#define IS_SELECTABLE(s)  1
+#else
+/* POSIX says selecting descriptors above FD_SETSIZE is undefined
+   behaviour. */
 #define IS_SELECTABLE(s) ((s)->sock_fd < FD_SETSIZE)
+#endif

 static PyObject*
 select_error(void)


From tim.peters at  Fri Feb 10 22:35:49 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 16:35:49 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

[Martin v. Löwis]
> I think the Windows interpretation is actually well-designed: FD_SETSIZE
> shouldn't be the number of the largest descriptor, but instead be the
> maximum size of the set.

It's more that the fdset macros were well designed:  correct code
using FD_SET() etc is portable across Windows and Linux, and that's so
because the macros define an interface rather than an implementation. 
BTW, note that the first argument to select() is ignored on Windows.

> So FD_SETSIZE is 64 on Windows,

In Python FD_SETSIZE is 512 on Windows (see the top of selectmodule.c).

> but you still can have much larger file descriptor numbers.

Which is the _source_ of "the problem" on Windows:  Windows socket
handles aren't file descriptors (if they were, they'd be little
integers ;-)).

> ...
> In any case, POSIX makes it undefined what FD_SET does when the
> socket is larger than FD_SETSIZE, and apparently clearly expects
> an fd_set to be a bit mask.

Yup -- although the people who designed the fdset macros to begin with
didn't appear to have this assumption.

From scott+python-dev at  Fri Feb 10 22:41:41 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 16:41:41 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> Tim Peters wrote:
>> I suggest skipping the new crud conditionalized on a symbol like
> Hmm... How about this patch:
> Index: Modules/socketmodule.c
> ===================================================================
> --- Modules/socketmodule.c      (Revision 42308)
> +++ Modules/socketmodule.c      (Arbeitskopie)
> @@ -396,7 +396,14 @@
>  static PyTypeObject sock_type;
>  /* Can we call select() with this socket without a buffer overrun? */
> +#ifdef MS_WINDOWS
> +/* Everything is selectable on Windows */
> +#define IS_SELECTABLE(s)  1
> +#else
> +/* POSIX says selecting descriptors above FD_SETSIZE is undefined
> +   behaviour. */
>  #define IS_SELECTABLE(s) ((s)->sock_fd < FD_SETSIZE)
> +#endif
>  static PyObject*
>  select_error(void)
> Regards,
> Martin

That is the exact patch I applied, but you also need to patch _ssl.c

--- C:/python-trunk/Modules/_ssl.c	(revision 42305)
+++ C:/python-trunk/Modules/_ssl.c	(working copy)
@@ -376,9 +376,11 @@
  	if (s->sock_fd < 0)

+#ifndef MS_WINDOWS
  	/* Guard against socket too large for select*/
  	if (s->sock_fd >= FD_SETSIZE)
  		return SOCKET_INVALID;
+#endif

  	/* Construct the arguments to select */
  	tv.tv_sec = (int)s->sock_timeout;

But then that leaves whether to go with the 

Scott Dial
scott at
dialsa at

From scott+python-dev at  Fri Feb 10 22:46:03 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 16:46:03 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

Tim Peters wrote:
> [Martin v. Löwis]
>> So FD_SETSIZE is 64 on Windows,
> In Python FD_SETSIZE is 512 on Windows (see the top of selectmodule.c).

Although I agree, in terms of the socketmodule, there was no such define 
overriding the default FD_SETSIZE, so you are both right.

Scott Dial
scott at
dialsa at

From tim.peters at  Fri Feb 10 22:49:09 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 16:49:09 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

>> I suggest skipping the new crud conditionalized on a symbol like

> Hmm... How about this patch:

I don't know.  Of course it misses similar new tests added to _ssl.c
(see the msg that started this thread), so it spreads beyond just
this.  Does it do the right thing for Windows variants like Cygwin,
and OS/2?  Don't know.  If the initial


here gets duplicated in multiple modules (and looks like it must -- or
IS_SELECTABLE should be given a _Py name and defined once in pyport.h
instead), and gets hairier over time, then I'd rather have a name like
the one I suggested (to describe the _intent_ rather than paste
together a growing collection of "which platform do I think I'm being
compiled on?" names).

> Index: Modules/socketmodule.c
> ===================================================================
> --- Modules/socketmodule.c      (Revision 42308)
> +++ Modules/socketmodule.c      (Arbeitskopie)
> @@ -396,7 +396,14 @@
>  static PyTypeObject sock_type;
>  /* Can we call select() with this socket without a buffer overrun? */
> +#ifdef MS_WINDOWS
> +/* Everything is selectable on Windows */
> +#define IS_SELECTABLE(s)  1
> +#else
> +/* POSIX says selecting descriptors above FD_SETSIZE is undefined
> +   behaviour. */
>  #define IS_SELECTABLE(s) ((s)->sock_fd < FD_SETSIZE)
> +#endif
>  static PyObject*
>  select_error(void)

From tim.peters at  Fri Feb 10 22:55:18 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 16:55:18 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

[Martin v. Löwis]
>>> So FD_SETSIZE is 64 on Windows,

[Tim Peters]
>> In Python FD_SETSIZE is 512 on Windows (see the top of selectmodule.c).

[Scott Dial]
> Although I agree, in terms of the socketmodule, there was no such define
> overriding the default FD_SETSIZE, so you are both right.

?  Sorry, don't know what you're talking about here.  Python's
selectmodule.c #defines FD_SETSIZE before it includes winsock.h on
Windows, so Microsoft's default is irrelevant to Python.  The reason
selectmodule.c uses "!defined(FD_SETSIZE)" in its

#if defined(MS_WINDOWS) && !defined(FD_SETSIZE)
#define FD_SETSIZE 512

is explained in the comment right before that code.
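
The ordering described above can be sketched in a few lines of C. This is only an illustration, not the actual selectmodule.c source: MS_WINDOWS is defined by hand so the fragment is self-contained, the second guard imitates the #ifndef that winsock.h is understood to place around its own default of 64, and effective_fd_setsize() is a hypothetical helper added for inspection.

```c
/* Stand-in for the build-time platform symbol. On a real Windows build
   Python's own headers provide MS_WINDOWS; it is defined by hand here
   only so that the sketch is self-contained. */
#ifndef MS_WINDOWS
#define MS_WINDOWS 1
#endif

/* Pattern quoted in the message: raise the limit, but only if nothing
   (for example a compiler command-line define) has chosen a value yet. */
#if defined(MS_WINDOWS) && !defined(FD_SETSIZE)
#define FD_SETSIZE 512
#endif

/* <winsock.h> would be included at this point. The lines below imitate
   the guard around its default: because FD_SETSIZE already has a value,
   the header's 64 is never used. */
#ifndef FD_SETSIZE
#define FD_SETSIZE 64
#endif

/* Hypothetical helper so the value that won can be inspected. */
static int effective_fd_setsize(void)
{
    return FD_SETSIZE;
}
```

If nothing else has defined FD_SETSIZE earlier in the translation unit, the helper reports 512. Swapping the two guards would leave the limit at 64, which is why the define has to come before the include.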

From barry at  Fri Feb 10 23:00:23 2006
From: barry at (Barry Warsaw)
Date: Fri, 10 Feb 2006 17:00:23 -0500
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Feb 10, 2006, at 3:21 PM, Guido van Rossum wrote:
> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?

I have no interest in it any longer, and wouldn't shed a tear if it  
were rejected.

One other un-PEP'd thing.  I'd like to put email 3.1 in Python 2.5  
with the new module naming scheme.  The old names will still work,  
and all the unit tests pass.  Do we need a PEP for that?


From mal at  Fri Feb 10 23:06:24 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 10 Feb 2006 23:06:24 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
>>    PEP 328: Absolute/Relative Imports
> Yes, please.

+0 for adding relative imports. -1 for raising errors for
in-package relative imports using the current notation
in Python 2.6.


for a previous discussion.

The PEP still doesn't have any mention of the above discussion or
later follow-ups.

The main argument is that the strategy to make absolute imports
mandatory and offer relative imports as work-around breaks the
possibility to produce packages that work in e.g. Python 2.4 and
2.6, simply because Python 2.4 doesn't support the needed
relative import syntax.

The only strategy left would be to use absolute imports throughout,
which isn't all that bad, except when it comes to relocating a
package or moving a set of misc. modules into a package - which is
not all that uncommon in larger projects, e.g. to group third-party
top-level modules into a package to prevent cluttering up the
top-level namespace or to simply make a clear distinction in
your code that you are relying on a third-party module, e.g.

from thirdparty import tool

I don't mind having to deal with a warning for these, but don't
want to see this raise an error before Py3k.

Marc-Andre Lemburg


From thomas at  Fri Feb 10 23:38:42 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 23:38:42 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Fri, Feb 10, 2006 at 11:06:24PM +0100, M.-A. Lemburg wrote:
> Guido van Rossum wrote:
> >>    PEP 328: Absolute/Relative Imports
> > 
> > Yes, please.

> +0 for adding relative imports. -1 for raising errors for
> in-package relative imports using the current notation
> in Python 2.6.

+1/-1 for me. Being able to explicitly demand relative imports is good,
breaking things soon bad. I'll happily shoehorn this in at the sprints after
PyCon ;)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From guido at  Fri Feb 10 23:45:54 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 14:45:54 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
	<> <>
Message-ID: <>

On 2/10/06, Thomas Wouters <thomas at> wrote:
> On Fri, Feb 10, 2006 at 11:06:24PM +0100, M.-A. Lemburg wrote:
> > Guido van Rossum wrote:
> > >>    PEP 328: Absolute/Relative Imports
> > >
> > > Yes, please.
> > +0 for adding relative imports. -1 for raising errors for
> > in-package relative imports using the current notation
> > in Python 2.6.
> +1/-1 for me. Being able to explicitly demand relative imports is good,
> breaking things soon bad. I'll happily shoehorn this in at the sprints after
> PyCon ;)

The PEP has the following timeline (my interpretation):

2.4: implement new behavior with from __future__ import absolute_import
2.5: deprecate old-style relative import unless future statement present
2.6: disable old-style relative import, future statement no longer necessary

Since it wasn't implemented in 2.4, I think all these should be bumped
by one release. Aahz, since you own the PEP, can you do that (and make
any other updates that might result)?

--Guido van Rossum (home page:

From raymond.hettinger at  Fri Feb 10 23:45:54 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 17:45:54 -0500
Subject: [Python-Dev] release plan for 2.5 ?
References: <dsbc3h$rct$><><><><>
Message-ID: <000801c62e93$c0de4460$b83efea9@RaymondLaptop1>

[Barry Warsaw]
> I'd like to put email 3.1 in Python 2.5
> with the new module naming scheme.  The old names will still work,
> and all the unit tests pass.  Do we need a PEP for that?

+1


From guido at  Fri Feb 10 23:47:01 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 14:47:01 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <000801c62e93$c0de4460$b83efea9@RaymondLaptop1>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Raymond Hettinger <raymond.hettinger at> wrote:
> [Barry Warsaw] I'd like to put email 3.1 in Python 2.5
> > with the new module naming scheme.  The old names will still work,
> > and all the unit tests pass.  Do we need a PEP for that?
> +1

I don't know if Raymond meant "we need a PEP" or "go ahead with the
feature" but my own feeling is that this doesn't need a PEP and Barry
can Just Do It.

--Guido van Rossum (home page:

From martin at  Fri Feb 10 23:49:53 2006
From: martin at ("Martin v. Löwis")
Date: Fri, 10 Feb 2006 23:49:53 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	
Message-ID: <>

Tim Peters wrote:
> I don't know.  Of course it misses similar new tests added to _ssl.c
> (see the msg that started this thread), so it spreads beyond just
> this.  Does it do the right thing for Windows variants like Cygwin,
> and OS/2?  Don't know.

I see. How does Py_SOCKET_FD_CAN_BE_GE_FD_SETSIZE help here?
Does defining it in PC/pyconfig.h do the right thing?

I guess I'm primarily opposed to the visual ugliness of the
define. Why does it spell out "can be", but abbreviates
"greater than or equal to"? What about Py_CHECK_FD_SETSIZE?


From aleaxit at  Fri Feb 10 23:54:25 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 10 Feb 2006 14:54:25 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Guido van Rossum <guido at> wrote:
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:

October would seem to me to be just about right.  I don't see that one
month either way should make any big difference, though.

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

If I could have a definitive frozen list of features by the first week
of April at the latest, that could make it (as a "2.5 preview") into
the 2nd edition of "Python in a Nutshell". But since alphas are not
feature-frozen, it wouldn't make much of a difference to me, I think.

> Other PEPs I'd like comment on:
> PEP 357 (__index__): the patch isn't on SF yet, but otherwise I'm all
> for this, and I'd like to accept it ASAP to get it in 2.5. It doesn't
> look like it'll cause any problems.

It does look great, and by whatever name I support it most heartily. 
Do, however, notice that it's "yet another special-purpose adaptation
protocol" and that such specific restricted solutions to the general
problem, with all of their issues, will just keep piling up forever
(and need legacy support ditto) until and unless your temperature wrt
246 (or any variation thereof) should change.

> PEP 355 (path module): I still haven't reviewed this, because I'm -0
> on adding what appears to me duplicate functionality. But if there's a

I feel definitely -0 towards it too.

> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?

Another -0 from me. I suggest we shelve it for now and revisit in 3k
(maybe PEPs in that state, "not in any 2.* but revisit for 3.0", need
a special status value).

> PEP 246 - adaptation. I'm still as lukewarm as ever; it needs
> interfaces, promises to cause a paradigm shift, and the global map
> worries me.

Doesn't _need_ interfaces as a concept -- any unique markers as
"protocol names" would do, even strings, although obviously the
"stronger" the markers the better (classes/types for example would be
just perfect).  It was written on the assumption of interfaces just
because they were being proposed just before it.  The key "paradigm
shift" is to offer a way to unify what's already being widely done, in
haphazard and dispersed manners.  And I'll be quite happy to rewrite
it in terms of a more nuanced hierarchy of maps (e.g. builtin /
per-module / lexically nested, or whatever) if that's what it takes to
warm you to it -- I just think it would be over-engineering it, since
in practice the global-on-all-modules map would cover by far most
usage (both for "blessed" protocols that come with Python, and for the
use of "third party" adapting framework A to consume stuff that
framework B produces, global is the natural "residence"; other uses
are far less important).

> PEP 323 - copyable iterators. Seems stalled. Alex, do you care?

Sure, I'd like to make this happen, particularly since Raymond appears
to have already done the hard part.  What would you like to see
happening to bless it for 2.5?

> PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

Ditto -- I'd like at least SOME of it to be in 2.5.  What needs to
happen for that?


From thomas at  Fri Feb 10 23:55:36 2006
From: thomas at (Thomas Wouters)
Date: Fri, 10 Feb 2006 23:55:36 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
	<> <>
Message-ID: <>

On Fri, Feb 10, 2006 at 02:45:54PM -0800, Guido van Rossum wrote:

> The PEP has the following timeline (my interpretation):
> 2.4: implement new behavior with from __future__ import absolute_import
> 2.5: deprecate old-style relative import unless future statement present
> 2.6: disable old-style relative import, future statement no longer necessary

> Since it wasn't implemented in 2.4, I think all these should be bumped
> by one release. Aahz, since you own the PEP, can you do that (and make
> any other updates that might result)?

Bumping is fine (of course), but I'd like a short discussion on the actual
disabling before it happens (rather than the disabling happening without
anyone noticing until beta2.) There seem to be a lot of users still using
2.3, at the moment, in spite of its age. Hopefully, by the time 2.7 comes
out, everyone will have switched to 2.5, but if not, it could still be a
major annoyance to conscientious module-writers, like MAL.

Thomas Wouters <thomas at>


From brett at  Sat Feb 11 00:06:45 2006
From: brett at (Brett Cannon)
Date: Fri, 10 Feb 2006 15:06:45 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Guido van Rossum <guido at> wrote:
> On 2/7/06, Neal Norwitz <nnorwitz at> wrote:
> > On 2/7/06, Jeremy Hylton <jeremy at> wrote:
> > > It looks like we need a Python 2.5 Release Schedule PEP.
> >
> > Very draft:
> >
> > Needs lots of work and release managers.  Anthony, Martin, Fred, Sean
> > are all mentioned with TBDs and question marks.
> Before he went off to a boondoggle^Woff-site at a Mexican resort, Neal
> made me promise that I'd look at this and try to get the 2.5 release
> plan going for real.
> First things first: we need a release manager. Anthony, do you want to
> do the honors again, or are you ready for retirement?
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>     alpha 1: May 2006
>     alpha 2: June 2006
>     beta 1:  July 2006
>     beta 2:  August 2006
>     rc 1:    September 2006
>     final:   September 2006
> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

I think that schedule is fine, but going alpha after PyCon is too fast
with the number of PEPs that need implementing.
> >    PEP 352: Required Superclass for Exceptions
> I believe this is pretty much non-controversial; it's a much weaker
> version of PEP 348 which was rightfully rejected for being too
> radical. I've tweaked some text in this PEP and approved it. Now we
> need to make it happen. It might be quite a tricky thing, since
> Exception is currently implemented in C as a classic class. If Brett
> wants to sprint on this at PyCon I'm there to help (Mon/Tue only).
> Fortunately we have MWH's patch 1104669 as a starting point.

I might sprint on it.  It's either this or I will work on the AST
stuff (the PyObject branch is still not finished and thus it has not
been finalized if that solution or the way it is now will be the final
way of implementing the compiler and I would like to see this

Either way I take responsibility to make sure the PEP gets implemented
so you can take that question off of the schedule PEP.

> PEP 351 - freeze protocol. I'm personally -1; I don't like the idea of
> freezing arbitrary mutable data structures. Are there champions who
> want to argue this?

If Barry doesn't even care anymore I say kill it.

> PEP 315 - do while. A simple enough syntax proposal, albeit one
> introducing a new keyword (which I'm fine with). I kind of like it but
> it doesn't strike me as super important -- if we put this off until
> Py3k I'd be fine with that too. Opinions? Champions?

Eh, seems okay but I am not jumping up and down for it.  Waiting until
Python 3 is fine with me if a discussion is warranted (don't really
remember it coming up before).
> PEP 332 - byte vectors. Looks incomplete. Put off until 2.6?

I say put off.  This could be discussed at PyCon since this might be
an important type to get right.

> PEP 344 - exception chaining. There are deep problems with this due to
> circularities; perhaps we should drop this, or revisit it for Py3k.

I say revisit issues later.  Raymond says he has an idea for chaining
just the messages which could be enough help for developers.  But
either way I don't think this has been hashed out enough to go in
as-is.  I suspect a simpler solution will work, such as ditching the
traceback and only keeping either the text that would have been
printed or just the exception instance (and thus also its message).


From barry at  Sat Feb 11 00:26:51 2006
From: barry at (Barry Warsaw)
Date: Fri, 10 Feb 2006 18:26:51 -0500
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Feb 10, 2006, at 5:47 PM, Guido van Rossum wrote:

> On 2/10/06, Raymond Hettinger <raymond.hettinger at> wrote:
>> [Barry Warsaw] I'd like to put email 3.1 in Python 2.5
>>> with the new module naming scheme.  The old names will still work,
>>> and all the unit tests pass.  Do we need a PEP for that?
>> +1
> I don't know if Raymond meant "we need a PEP" or "go ahead with the
> feature" but my own feeling is that this doesn't need a PEP and Barry
> can Just Do It.

I was going to ask the same thing. :)

Cool.  So far there have been no objections on the email-sig, so I'll  
try to move the sandbox to the trunk this weekend.  That should give  
us plenty of time to shake out any nastiness.


From raymond.hettinger at  Sat Feb 11 00:32:06 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Fri, 10 Feb 2006 18:32:06 -0500
Subject: [Python-Dev] release plan for 2.5 ?
References: <dsbc3h$rct$>
Message-ID: <001901c62e9a$351be660$b83efea9@RaymondLaptop1>

Just do it.

----- Original Message ----- 
From: "Guido van Rossum" <guido at>
To: "Raymond Hettinger" <python at>
Cc: "Barry Warsaw" <barry at>; <python-dev at>
Sent: Friday, February 10, 2006 5:47 PM
Subject: Re: [Python-Dev] release plan for 2.5 ?

On 2/10/06, Raymond Hettinger <raymond.hettinger at> wrote:
> [Barry Warsaw] I'd like to put email 3.1 in Python 2.5
> > with the new module naming scheme.  The old names will still work,
> > and all the unit tests pass.  Do we need a PEP for that?
> +1

I don't know if Raymond meant "we need a PEP" or "go ahead with the
feature" but my own feeling is that this doesn't need a PEP and Barry
can Just Do It.

--Guido van Rossum (home page:

From greg.ewing at  Sat Feb 11 00:46:20 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 11 Feb 2006 12:46:20 +1300
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:

> FWIW, Annex D also defines these features as deprecated:
> - the use of "static" for objects in namespace scope (AFAICT
>   including C file-level static variables and functions)
> - C library headers (i.e. <stdio.h>)

Things like this are really starting to get on my groat.
It used to be that C++ was very nearly a superset of C,
so it was easy to write code that would compile as either.
But C++ seems to be evolving into a different language

(And an obnoxiously authoritarian one at that. If I want
to write some C++ code that uses stdio because I happen
to like it better, why the heck shouldn't I be allowed
to? It's MY program, not the C++ standards board's!)

Sorry, I just had to say that.


From greg.ewing at  Sat Feb 11 01:14:23 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 11 Feb 2006 13:14:23 +1300
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Tim Peters wrote:
> [Martin v. Löwis]
> > In any case, POSIX makes it undefined what FD_SET does when the
> > socket is larger than FD_SETSIZE, and apparently clearly expects
> > an fd_set to be a bit mask.
> Yup -- although the people who designed the fdset macros to begin with
> didn't appear to have this assumption.

I don't agree. I rather think the entire purpose of
the fdset interface was simply to allow more than
32 items in the set (which the original select()
in BSD was limited to). The whole thing still seems
totally bitmask-oriented, down to the confusion between
set size and file descriptor number.

The MacOSX man page for select() (which seems fairly
closely BSD-based) even explicitly says "The descriptor
sets are stored as bit fields in arrays of integers."
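
To make the bit-field point concrete, here is a minimal sketch of a descriptor set stored as bits in an array of integers. It is not the BSD or MacOSX implementation; every demo_ name and DEMO_SETSIZE are hypothetical. Unlike the real FD_SET macro, demo_set() checks its argument, which shows where an unchecked descriptor at or above the set size would index past the end of the array.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity; plays the role of FD_SETSIZE in this sketch. */
#define DEMO_SETSIZE 64
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS ((DEMO_SETSIZE + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* One bit per descriptor number, packed into an array of integers. */
typedef struct {
    unsigned long bits[DEMO_WORDS];
} demo_fd_set;

static void demo_zero(demo_fd_set *set)
{
    memset(set, 0, sizeof *set);
}

/* Returns 0 on success and -1 when fd does not fit in the set.
   Without this range check, bits[index / DEMO_WORD_BITS] would be
   written outside the array for any fd >= DEMO_SETSIZE. */
static int demo_set(demo_fd_set *set, int fd)
{
    size_t index;

    if (fd < 0 || fd >= DEMO_SETSIZE)
        return -1;
    index = (size_t)fd;
    set->bits[index / DEMO_WORD_BITS] |= 1UL << (index % DEMO_WORD_BITS);
    return 0;
}

/* Returns 1 if fd is a member of the set, 0 otherwise. */
static int demo_isset(const demo_fd_set *set, int fd)
{
    size_t index;

    if (fd < 0 || fd >= DEMO_SETSIZE)
        return 0;
    index = (size_t)fd;
    return (int)((set->bits[index / DEMO_WORD_BITS]
                  >> (index % DEMO_WORD_BITS)) & 1UL);
}
```

In this layout the set size and the largest usable descriptor number are the same quantity, which is the confusion mentioned above.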


From tim.peters at  Sat Feb 11 02:48:30 2006
From: tim.peters at (Tim Peters)
Date: Fri, 10 Feb 2006 20:48:30 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

[Martin v. Löwis]
> I see. How does Py_SOCKET_FD_CAN_BE_GE_FD_SETSIZE help here?

By naming a logical condition as opposed to a list of
platform-specific symbols that aren't documented anywhere.  For
example, I have no idea exactly which compiler+OS combinations define
MS_WINDOWS, so "#ifdef MS_WINDOWS" is always something of a mystery. 
I don't want to see mystery-symbols inside modules -- to the extent
that they must be used, I want to hide them in .h files clearly
dedicated to wrestling with portability headaches (like pyconfig.h and

> Does defining it in PC/pyconfig.h do the right thing?

That much would stop the test failures _I_ see, which is what I need
to get unstuck.  If POSIX systems simply ignore it, it would do the
right thing for them too.  Documentation in pyport.h would serve to
guide others (in the "Config #defines referenced here:" comments near
the top of that file).  I don't know what other systems need, so
assuming "we have to do something" _at all_ here, the best I can do is
provide documented macros and config symbols to deal with it.

I think the relationship between SIGNED_RIGHT_SHIFT_ZERO_FILLS and
pyport.h's Py_ARITHMETIC_RIGHT_SHIFT macro is a good analogy here. 
Almost everyone ignores SIGNED_RIGHT_SHIFT_ZERO_FILLS, and that's
fine, because almost all C compilers generate code to do
sign-extending right shifts.  If someone has a box that doesn't, fine,
it's up to them to get SIGNED_RIGHT_SHIFT_ZERO_FILLS #define'd in
their pyconfig.h, and everything else "just works" for them then.  All
other platforms can remain blissfully ignorant.
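
The analogy can be sketched as follows. This is an illustration modelled on the idea, not a copy of pyport.h: the DEMO_ names and demo_shift_right() are hypothetical, and the config symbol is left undefined, as it would be on nearly every platform.

```c
/* A platform whose >> zero-fills negative values would define this in
   its own config header. It is deliberately left undefined here. */
/* #define DEMO_SIGNED_RIGHT_SHIFT_ZERO_FILLS 1 */

#ifdef DEMO_SIGNED_RIGHT_SHIFT_ZERO_FILLS
/* Oddball platform: build a sign-extending shift out of a shift on a
   non-negative value, since -1 - I is >= 0 whenever I is negative. */
#define DEMO_ARITHMETIC_RIGHT_SHIFT(I, J) \
    ((I) < 0 ? -1 - ((-1 - (I)) >> (J)) : (I) >> (J))
#else
/* Common case: the compiler's >> already sign-extends. */
#define DEMO_ARITHMETIC_RIGHT_SHIFT(I, J) ((I) >> (J))
#endif

/* Module code uses only the macro and never mentions a platform name. */
static long demo_shift_right(long value, int count)
{
    return DEMO_ARITHMETIC_RIGHT_SHIFT(value, count);
}
```

The module code states what it wants (a sign-extending shift), and the one config symbol decides how to get it.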

> I guess I'm primarily opposed to the visual ugliness of the define.

I don't much care how it's spelled.

> Why does it spell out "can be", but abbreviates
> "greater than or equal to"?

Don't care.  I don't know of a common abbreviation for "can be", but GE
same-as >= is in my Fortran-trained blood :-)

> What about Py_CHECK_FD_SETSIZE?

That's fine, except I think it would be pragmatically better to make
it Py_DONT_CHECK_FD_SETSIZE, since most platforms want to check it. 
The platforms that don't want this check (like Windows) are the
oddballs, so it's better to default to checking, making the oddballs
explicitly do something to stop such checking.

It's no problem to add a #define to PC/pyconfig.h, since that
particular config file is 100% hand-written (and always will be).
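
A minimal sketch of the default-to-checking arrangement, assuming the symbol name Py_DONT_CHECK_FD_SETSIZE floated above (nothing here confirms which name was finally adopted). demo_sock and DEMO_FD_SETSIZE are hypothetical stand-ins for the socket object and the platform's FD_SETSIZE.

```c
/* Hypothetical stand-in for the socket object used by socketmodule.c. */
typedef struct {
    int sock_fd;
} demo_sock;

/* Hypothetical stand-in for the platform's FD_SETSIZE. */
#define DEMO_FD_SETSIZE 1024

/* An oddball platform such as Windows would put the following line in
   its hand-written config header (PC/pyconfig.h in the discussion):

       #define Py_DONT_CHECK_FD_SETSIZE 1

   It is not defined here, so the default branch below is selected. */

#ifdef Py_DONT_CHECK_FD_SETSIZE
/* Any socket handle can be passed to select() on this platform. */
#define IS_SELECTABLE(s) 1
#else
/* Default: refuse descriptors that would overrun an fd_set. */
#define IS_SELECTABLE(s) ((s)->sock_fd < DEMO_FD_SETSIZE)
#endif
```

With this shape a platform that does nothing gets the safe check, and only the platforms that want to skip it have to say so explicitly.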

From bokr at  Sat Feb 11 03:02:10 2006
From: bokr at (Bengt Richter)
Date: Sat, 11 Feb 2006 02:02:10 GMT
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
References: <>
Message-ID: <>

On Fri, 10 Feb 2006 17:53:39 +0100, Thomas Wouters <thomas at> wrote:

>On Fri, Feb 10, 2006 at 11:30:30AM -0500, Jeremy Hylton wrote:
>> On 2/10/06, Guido van Rossum <guido at> wrote:
>> > OMG. Are we now adding 'const' modifiers to random places? I thought
>> > "const propagation hell" was a place we were happily avoiding by not
>> > falling for that meme. What changed?
>> I added some const to several API functions that take char* but
>> typically called by passing string literals.  In C++, a string literal
>> is a const char* so you need to add a const_cast<> to every call site,
>> which is incredibly cumbersome.  After some discussion on python-dev,
>> I made changes to a small set of API functions and chased the
>> const-ness the rest of the way, as you would expect.  There was
>> nothing random about the places const was added.
>> I admit that I'm also puzzled by Jack's specific question.  I don't
>> understand why an array passed to PyArg_ParseTupleAndKeywords() would
>> need to be declared as const.  I observed the problem in my initial
>> changes but didn't think very hard about the cause of the problem. 
>> Perhaps someone with better C/C++ standards chops can explain.
>Well, it's counter-intuitive, but a direct result of how pointer equivalence
>is defined in C. I'm rusty in this part, so I will get some terminology
>wrong, but IIRC, a variable A is of an equivalent type of variable B if they
>hold the same type of data. So, a 'const char *' is equivalent to a 'char *'
>because they both hold the memory of a 'char'. But a 'const char**' (or
>'const *char[]') is not equivalent to a 'char **' (or 'char *[]') because
>the first holds the address of a 'const char *', and the second the address
>of a 'char *'. A 'char * const *' is equivalent to a 'char **' though.
>As I said, I got some of the terminology wrong, but the end result is
>exactly that: a 'const char **' is not equivalent to a 'char **', even
>though a 'const char *' is equivalent to a 'char *'. Equivalence, in this
>case, means 'can be automatically downcasted'. Peter v/d Linden explains
>this quite well in "Expert C Programming" (aka 'Deep C Secrets'), but
>unfortunately I'm working from home and I left my copy at a coworkers' desk.
Would it make sense to use a typedef for readability's sake? E.g.,

    typedef const char * p_text_literal;

and then use

    p_text_literal, const p_text_literal *

in the signature, for read-only access to the data? (hope I got that right).
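
A small sketch of what that typedef gives, next to the pointer rule quoted above. The demo_ names are hypothetical and this is not the PyArg_ParseTupleAndKeywords signature. One caveat, as far as I know: in C (unlike C++) a plain 'char **' does not convert implicitly to 'const char *const *', so callers holding mutable arrays would still need a cast.

```c
#include <stddef.h>

/* Typedef suggested in the message above. */
typedef const char *p_text_literal;

/* "const p_text_literal *" expands to "const char *const *": neither the
   array slots nor the characters can be modified through this pointer. */
static size_t demo_count_names(const p_text_literal *names)
{
    size_t n = 0;

    while (names[n] != NULL)
        n++;
    return n;
}

/* "char *const *" only promises not to reassign the array slots, so C
   accepts a plain "char **" argument here without a cast. */
static size_t demo_count_mutable(char *const *names)
{
    size_t n = 0;

    while (names[n] != NULL)
        n++;
    return n;
}

/* Why "char **" is refused where "const char **" is expected: the callee
   could store the address of a genuinely const string into one of the
   caller's "char *" slots, and the caller could then write through it. */

/* Hypothetical keyword list, read-only at both levels. */
static const char *const demo_kwlist[] = {"host", "port", NULL};
```

Declaring the keyword list itself as 'const char *const' lets it be passed to demo_count_names() with no cast at all.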

(also testing whether I have been redirected to /dev/null ;-)

Bengt Richter

From scott+python-dev at  Sat Feb 11 03:31:52 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 21:31:52 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	
	<> <>	
Message-ID: <>

Tim Peters wrote:
> ?  Sorry, don't know what you're talking about here.  Python's
> selectmodule.c #defines FD_SETSIZE before it includes winsock.h on
> Windows, so Microsoft's default is irrelevant to Python.  The reason
> selectmodule.c uses "!defined(FD_SETSIZE)" in its

Not that this is really that important, but if we are talking about as 
the code stands right now, IS_SELECTABLE uses FD_SETSIZE with no such 
define ever appearing. That is what I meant, and I am pretty sure that 
is where Martin came up with saying it was 64. But like I say.. it's not 
that important. Sorry for the noise.

Scott Dial
scott at
dialsa at

From scott+python-dev at  Sat Feb 11 03:42:46 2006
From: scott+python-dev at (Scott Dial)
Date: Fri, 10 Feb 2006 21:42:46 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Tim Peters wrote:
 > Does it do the right thing for Windows variants like Cygwin, and OS/2?

I can at least say that Cygwin implements a full POSIX facade in 
front of Windows sockets, so it would be important that the code in 
question is used to protect it as well. Also, MS_WINDOWS is not defined 
for a Cygwin compile, so it is fine to be using that. But I realize 
there is a whole 'nother discussion about that.

Scott Dial
scott at
dialsa at

From nas at  Sat Feb 11 06:08:09 2006
From: nas at (Neil Schemenauer)
Date: Sat, 11 Feb 2006 05:08:09 +0000 (UTC)
Subject: [Python-Dev] release plan for 2.5 ?
References: <dsbc3h$rct$>
Message-ID: <dsjrfp$g72$>

Guido van Rossum <guido at> wrote:
> PEP 349 - str() may return unicode. Where is this?

Does that mean you didn't find and read the PEP or was it written so
badly that it answered none of your questions?  The PEP is on with all the rest.  I set the status to "Deferred"
because it seemed that no one was interested in the change.

> I'm not at all sure the PEP is ready. it would probably be a lot
> of work to make this work everywhere in the C code, not to mention
> the stdlib .py code. Perhaps this should be targeted for 2.6
> instead? The consequences seem potentially huge.

The backwards compatibility problems *seem* to be relatively minor.
I only found one instance of breakage in the standard library.  Note
that my patch does not change PyObject_Str(); that would break
massive amounts of code.  Instead, I introduce a new function:
PyString_New().  I'm not crazy about the name but I couldn't think
of anything better.


From guido at  Sat Feb 11 06:25:21 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 21:25:21 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <dsjrfp$g72$>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Neil Schemenauer <nas at> wrote:
> Guido van Rossum <guido at> wrote:
> > PEP 349 - str() may return unicode. Where is this?
> Does that mean you didn't find and read the PEP or was it written so
> badly that it answered none of your questions?  The PEP is on
> with all the rest.  I set the status to "Deferred"
> because it seemed that no one was interested in the change.

Sorry -- it was an awkward way to ask "what's the status"? You've answered that.

> > I'm not at all sure the PEP is ready. it would probably be a lot
> > of work to make this work everywhere in the C code, not to mention
> > the stdlib .py code. Perhaps this should be targeted for 2.6
> > instead? The consequences seem potentially huge.
> The backwards compatibility problems *seem* to be relatively minor.
> I only found one instance of breakage in the standard library.  Note
> that my patch does not change PyObject_Str(); that would break
> massive amounts of code.  Instead, I introduce a new function:
> PyString_New().  I'm not crazy about the name but I couldn't think
> of anything better.

So let's think about this more post 2.5.

--Guido van Rossum (home page:

From bokr at  Sat Feb 11 06:30:00 2006
From: bokr at (Bengt Richter)
Date: Sat, 11 Feb 2006 05:30:00 GMT
Subject: [Python-Dev] release plan for 2.5 ?
References: <dsbc3h$rct$>
Message-ID: <>

On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at> wrote:

>Guido van Rossum <guido at> wrote:
>> PEP 349 - str() may return unicode. Where is this?
>Does that mean you didn't find and read the PEP or was it written so
>badly that it answered none of your questions?  The PEP is on
> with all the rest.  I set the status to "Deferred"
>because it seemed that no one was interested in the change.
>> I'm not at all sure the PEP is ready. it would probably be a lot
>> of work to make this work everywhere in the C code, not to mention
>> the stdlib .py code. Perhaps this should be targeted for 2.6
>> instead? The consequences seem potentially huge.
>The backwards compatibility problems *seem* to be relatively minor.
>I only found one instance of breakage in the standard library.  Note
>that my patch does not change PyObject_Str(); that would break
>massive amounts of code.  Instead, I introduce a new function:
>PyString_New().  I'm not crazy about the name but I couldn't think
>of anything better.
Should this not be coordinated with PEP 332?

Bengt Richter

From guido at  Sat Feb 11 06:35:26 2006
From: guido at (Guido van Rossum)
Date: Fri, 10 Feb 2006 21:35:26 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at> wrote:
> >The backwards compatibility problems *seem* to be relatively minor.
> >I only found one instance of breakage in the standard library.  Note
> >that my patch does not change PyObject_Str(); that would break
> >massive amounts of code.  Instead, I introduce a new function:
> >PyString_New().  I'm not crazy about the name but I couldn't think
> >of anything better.

On 2/10/06, Bengt Richter <bokr at> wrote:
> Should this not be coordinated with PEP 332?

Probably.. But that PEP is rather incomplete. Wanna work on fixing that?

--Guido van Rossum (home page:

From bokr at  Sat Feb 11 09:20:27 2006
From: bokr at (Bengt Richter)
Date: Sat, 11 Feb 2006 08:20:27 GMT
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <guido at> wrote:

>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at> wrote:
>> >The backwards compatibility problems *seem* to be relatively minor.
>> >I only found one instance of breakage in the standard library.  Note
>> >that my patch does not change PyObject_Str(); that would break
>> >massive amounts of code.  Instead, I introduce a new function:
>> >PyString_New().  I'm not crazy about the name but I couldn't think
>> >of anything better.
>On 2/10/06, Bengt Richter <bokr at> wrote:
>> Should this not be coordinated with PEP 332?
>Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
I'd be glad to add my thoughts, but first of course it's Skip's PEP,
and Martin casts a long shadow when it comes to character coding issues
that I suspect will have to be considered.

(E.g., if there is a b'...' literal for bytes, the actual characters of
the source code itself that the literal is being expressed in could be ascii
or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, the source
is at least temporarily normalized to Unicode, and then re-encoded (except now
for string literals?) per coding cookie or other encoding inference. I may be
out of date, gotta catch up.)

If one way or the other a string literal is in Unicode, then presumably so is
a byte string b'...' literal -- i.e. internally u"b'...'" just before
being turned into bytes.

Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
for non-ascii and non-printables, to define the full 8 bits without encoding error?
Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
(but how does this play with str being able to produce unicode? And when do these changes happen?)
I guess I'm getting ahead of myself ;-)
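
For readers trying this on a current Python 3, the mutable-versus-immutable
distinction asked about above can be sketched with the built-ins that exist
there. This is only an illustration: `bytes` and `bytearray` are later
additions, and the u'abc'.encode('byte','latin1') spelling floated above is a
hypothetical that is not used here.

```python
# Sketch on Python 3: str.encode() yields immutable bytes, while bytearray
# is the mutable counterpart. Neither existed when this message was written.
text = "abc\xe9"                      # the last character is outside ASCII

immutable = text.encode("latin-1")    # bytes: one byte per character here
mutable = bytearray(text, "latin-1")  # bytearray: same content, but mutable

mutable[0] = ord("A")                 # in-place change is allowed

try:
    immutable[0] = ord("A")           # bytes rejects item assignment
except TypeError:
    bytes_is_immutable = True
else:
    bytes_is_immutable = False

# ASCII cannot represent the last character, so a strict ASCII encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The explicit codec name plays the role of the "specific encoding" in the
question above; which type you get back is decided by the constructor used,
not by an extra argument to encode().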

So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.

I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
that anyone could then improve further. I don't know about an early deadline. I don't want
to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
time more effectively ;-)

I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
know who else might be interested...

Bengt Richter

From martin at  Sat Feb 11 09:30:52 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 09:30:52 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
>>FWIW, Annex D also defines these features as deprecated:
>>- the use of "static" for objects in namespace scope (AFAICT
>>  including C file-level static variables and functions)
>>- C library headers (i.e. <stdio.h>)
> Things like this are really starting to get on my groat.
> It used to be that C++ was very nearly a superset of C,
> so it was easy to write code that would compile as either.
> But C++ seems to be evolving into a different language
> altogether.

Not at all. People appear to completely fail to grasp
the notion of "deprecated" in this context. It just
means "it may go away in a future version", implying
that the rest of it may *not* go away in a future
version.

That future version might get published in 2270,
when everybody has switched to C++, and compatibility
with C is no longer required.

So the compiler is wrong for warning about it (or
the user is wrong for asking to get warned), and
you are wrong for getting upset about this.

> (And an obnoxiously authoritarian one at that. If I want
> to write some C++ code that uses stdio because I happen
> to like it better, why the heck shouldn't I be allowed
> to? It's MY program, not the C++ standards board's!)

Again, you are misunderstanding what precisely is
deprecated. Sure you can still use stdio, and it is
never going away (it isn't deprecated). However, you
have to spell the header as

#include <cstdio>

and then refer to the functions as std::printf,
std::stderr, etc.

What is really being deprecated here is the global
namespace. That's also the reason to deprecate
file-level static: you should use anonymous namespaces
instead.

(Also, just in case this is misunderstood again:
it is *not* that programs cannot put stuff in
the global namespace anymore. It's just that the
standard library should not put stuff in the
global namespace).


From martin at  Sat Feb 11 09:33:23 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 09:33:23 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:
> Would it make sense to use a typedef for readability's sake? E.g.,
>     typedef const char * p_text_literal;
> and then use
>     p_text_literal, const p_text_literal *
> in the signature, for read-only access to the data? (hope I got that right).
> (also testing whether I have been redirected to /dev/null ;-)

Nearly. Please try your proposals out in a sandbox before posting.
How does this contribute to solving the PyArg_ParseTupleAndKeywords
issue? Readability is not the problem that puzzled Jack.


From bokr at  Fri Feb 10 18:35:10 2006
From: bokr at (Bengt Richter)
Date: Fri, 10 Feb 2006 17:35:10 GMT
Subject: [Python-Dev] _length_cue()
References: <>
Message-ID: <>

On Fri, 10 Feb 2006 14:33:08 +0100, Armin Rigo <arigo at> wrote:

>Hi Nick,
>On Fri, Feb 10, 2006 at 11:21:52PM +1000, Nick Coghlan wrote:
>> Do they really need anything more sophisticated than:
>>    def __repr__(self):
>>      return "%s(%r)" % (type(self).__name__, self._subiter)
>> (modulo changes in the format of arguments, naturally. This simple one would 
>> work for things like enumerate and reversed, though)
>My goal here is not primarily to help debugging, but to help playing
>around at the interactive command-line.  Python's command-line should
>not be dismissed as "useless for real programmers"; I definitely use it
>all the time to try things out.  It would be nicer if all these
>iterators I'm not familiar with would give me a hint about what they
>actually return, instead of:
>>>> itertools.count(17)
>count(17)                  # yes, thank you, not very helpful
>>>> enumerate("spam")
>enumerate("spam")          # with your proposed extension -- not better
>However, if this kind of goal is considered "not serious enough" for
>adding a private special method, then I'm fine with trying out a fishing
For enhancing interactive usage, how about putting the special info and smarts in help?
Or even a specialized part of help, e.g.,


or maybe


leading to an interactive prompt putting handy cmdwords in a line to get
easily to type, mro, non-underscore methods, attribute name list, etc.

E.g. I often find myself typing stuff like
    [x for x in dir(obj) if not x.startswith('_')]
    [k for k,v in type(obj).__dict__.items() if callable(v) and not k.startswith('_')]
that I would welcome being able to do easily with a specialized help.plaindir(obj)
or help.plainmethods(obj) or help.mromethods(obj) etc.

Hm, now that I think of it, I guess I could do stuff like that in, since
 >>> help.plaindir = lambda x: sorted([x for x in dir(x) if not x.startswith('_')])
 >>> help.plaindir(int)
 >>> help.plaindir([])
 ['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

But some kind of standards would probably be nice for everyone if they like the general idea.
I'll leave it to someone else as to whether and where a thread re help enhancements
might be ok.
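
A rough sketch of what such helpers might look like as plain functions. The
names plaindir, plainmethods and mromethods are taken from the text above;
their exact behaviour is an assumption, and they are written here as
standalone functions rather than as attributes of help.

```python
def plaindir(obj):
    """Sorted attribute names of obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


def plainmethods(obj):
    """Sorted public callables defined directly on type(obj), not inherited."""
    return sorted(
        name
        for name, value in vars(type(obj)).items()
        if callable(value) and not name.startswith("_")
    )


def mromethods(obj):
    """Map each class name in the MRO to that class's own public callables."""
    return {
        cls.__name__: sorted(
            name
            for name, value in vars(cls).items()
            if callable(value) and not name.startswith("_")
        )
        for cls in type(obj).__mro__
    }
```

plaindir([]) then lists the public list attributes, and mromethods([]) shows
which of them come from list itself and which from object. Two classes with
the same __name__ in one MRO would collide in this sketch.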

My .02USD ;-)

Bengt Richter

From g.brandl at  Sat Feb 11 10:29:56 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 11 Feb 2006 10:29:56 +0100
Subject: [Python-Dev] The decorator(s) module
Message-ID: <dskaqk$ene$>


it has been proposed before, but there was no conclusive answer last time:
is there any chance for 2.5 to include commonly used decorators in a module?

Of course not everything that jumps around should go in, only pretty basic
stuff that can be widely used.

Candidates are:
 - @decorator. This properly wraps up a decorator function to change the
   signature of the new function according to the decorated one's.

 - @contextmanager, see PEP 343.

 - @synchronized/@locked/whatever, for thread safety.

 - @memoize

 - Others from wiki:PythonDecoratorLibrary and Michele Simionato's decorator
   module at <>.

Unfortunately, a @property decorator is impossible...
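
As a rough illustration of two of the candidates above, here is a minimal
sketch of @memoize and @synchronized. The implementations and the helper
names (square, increment, counter_lock) are assumptions made for this
example, not a proposed stdlib API; functools.wraps is used to copy the
wrapped function's name and docstring and is a later stdlib addition.

```python
import functools
import threading


def memoize(func):
    """Cache results of func keyed by its (hashable) positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


def synchronized(lock):
    """Return a decorator that runs the wrapped function while holding lock."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorate


calls = []


@memoize
def square(n):
    calls.append(n)          # record real invocations to show the caching
    return n * n


counter_lock = threading.Lock()
counter = {"value": 0}


@synchronized(counter_lock)
def increment():
    counter["value"] += 1
```

Calling square(3) twice runs the body once; increment() can be called from
several threads without losing updates. Note that wraps() copies metadata
only; it does not change the wrapper's signature, which is the harder problem
the @decorator candidate is meant to address.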


From g.brandl at  Fri Feb 10 22:09:52 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 10 Feb 2006 22:09:52 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>
Message-ID: <dsivf1$p6j$>

Guido van Rossum wrote:

> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>     alpha 1: May 2006
>     alpha 2: June 2006
>     beta 1:  July 2006
>     beta 2:  August 2006
>     rc 1:    September 2006
>     final:   September 2006
> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

I am not experienced in releasing, but with the multitude of new things
introduced in Python 2.5, could it be a good idea to release an early alpha
not long after all (most of?) the desired features are in the trunk?
That way people would get to testing sooner and the number of non-obvious
bugs may be reduced (I'm thinking of the import PEP, the implementation of
which is bound to be hairy, or "with" in its full extent).


From g.brandl at  Fri Feb 10 22:38:59 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 10 Feb 2006 22:38:59 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>
Message-ID: <dsj15j$uob$>

Guido van Rossum wrote:

> - setuplib? Wouldn't it make sense to add this to the 2.5 stdlib?

If you mean setuptools, I'm a big +1 (if it's production-ready by that time).
Together with a whipped up cheese shop we should finally be able to put up
something equal to cpan/rubygems.


From g.brandl at  Fri Feb 10 22:32:23 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 10 Feb 2006 22:32:23 +0100
Subject: [Python-Dev] The decorator(s) module
Message-ID: <dsj0p7$tk3$>


it has been proposed before, but there was no conclusive answer last time:
is there any chance for 2.5 to include commonly used decorators in a module?

Of course not everything that jumps around should go in, only pretty basic
stuff that can be widely used.

Candidates are:
 - @decorator. This properly wraps up a decorator function to change the
   signature of the new function according to the decorated one's.

 - @contextmanager, see PEP 343.

 - @synchronized/@locked/whatever, for thread safety.

 - @memoize

 - Others from wiki:PythonDecoratorLibrary and Michele Simionato's decorator
   module at <>.

Unfortunately, a @property decorator is impossible...


From ncoghlan at  Sat Feb 11 12:04:41 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 11 Feb 2006 21:04:41 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
Message-ID: <>

I finally finished updating PEP 338 to comply with the flexible importing 
system in PEP 302.

The result is a not-yet-thoroughly-tested module that should allow the -m 
switch to execute any module written in Python that is accessible via an 
absolute import statement.

The PEP now uses runpy for the module name, and run_module for the function 
used to locate and execute scripts. There's probably some discussion to be had 
in relation to the Design Decisions section of the PEP, relating to the way I 
wrote the module (the handling of locals dictionaries in particular deserves 
consideration).

Tracker items for the runpy module [1] and its documentation [2] are on 
Sourceforge (the interesting parts of the documentation are in the PEP, so I 
suggest reading that rather than the LaTeX version).

Still missing from the first tracker item are a patch to update '-m' to invoke 
the new module and some unit tests (the version on SF has only had ad hoc 
testing from the interactive prompt at this stage). I hope to have those up 
shortly, though.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Feb 11 12:04:53 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 11 Feb 2006 21:04:53 +1000
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> PEP 338 - support -m for modules in packages. I believe Nick Coghlan
> is close to implementing this. I'm fine with accepting it.

I just checked in a new version of PEP 338 that cleans up the approach so that 
it provides support for any PEP 302 compliant packaging mechanism as well as 
normal filesystem packages.

I've started a new thread for the discussion:
   PEP 338 - Executing Modules as Scripts


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From thomas at  Sat Feb 11 13:51:02 2006
From: thomas at (Thomas Wouters)
Date: Sat, 11 Feb 2006 13:51:02 +0100
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dsj0p7$tk3$>
References: <dsj0p7$tk3$>
Message-ID: <>

On Fri, Feb 10, 2006 at 10:32:23PM +0100, Georg Brandl wrote:

> Unfortunately, a @property decorator is impossible...

Depends. You can do, e.g.,

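def propertydef(propertydesc):
    # propertydesc() returns a tuple of one to three accessor functions
    data = propertydesc()
    if not data:
        raise ValueError("Invalid property descriptors")
    # pad with None so a missing setter and/or deleter is allowed
    getter, setter, deller = (data + (None, None))[:3]
    return property(fget=getter, fset=setter, fdel=deller,
                    doc=propertydesc.__doc__)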

and use it like:

class X(object):
    def __init__(self):
        self._prop = None

    @propertydef
    def prop():
        "Public, read-only access to self._prop"
        def getter(self):
            return self._prop
        return (getter,)

    @propertydef
    def rwprop():
        "Public read-write access to self._prop"
        def getter(self):
            return self._prop
        def setter(self, val):
            self._prop = val
        def deller(self):
            self._prop = None
        return (getter, setter, deller)

    @propertydef
    def hiddenprop():
        "Public access to a value stored in a closure"
        prop = [None]
        def getter(self):
            return prop[0]
        def setter(self, val):
            prop[0] = val
        def deller(self):
            prop[0] = None
        return (getter, setter, deller)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From bokr at  Fri Feb 10 21:36:15 2006
From: bokr at (Bengt Richter)
Date: Fri, 10 Feb 2006 20:36:15 GMT
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
References: <>	<>
Message-ID: <>

On Fri, 10 Feb 2006 18:02:03 +0100, "Martin v. Löwis" <martin at> wrote:

>Jeremy Hylton wrote:
>> I admit that I'm also puzzled by Jack's specific question.  I don't
>> understand why an array passed to PyArg_ParseTupleAndKeywords() would
>> need to be declared as const.  I observed the problem in my initial
>> changes but didn't think very hard about the cause of the problem. 
>> Perhaps someone with better C/C++ standards chops can explain.
>Please take a look at this code:
>void foo(const char** x, const char*s)
>{
>        x[0] = s;
>}
>void bar()
>{
>        char *kwds[] = {0};
>        const char *s = "Text";
>        foo(kwds, s);
>        kwds[0][0] = 't';
>}
>If it was correct, you would be able to modify the const char
>array in the string literal, without any compiler errors. The
>assignment
>  x[0] = s;
>is kosher, because you are putting a const char* into a
>const char* array, and the assignment
>     kwds[0][0] = 't';
>is ok, because you are modifying a char array. So the place
>where it has to fail is the passing of the pointer-pointer.
Will a typedef help?

----< martin.c >-------------------------------------------
#include <cstdio>
typedef const char *ptext;
void foo(ptext *kw)
{
    const char *s = "Text";
    ptext *p;
    for(p=kw;*p;p++){ printf("foo:%s\n", *p);}
    kw[0] = s;
    for(p=kw;*p;p++){ printf("foo2:%s\n", *p);}
    kw[0][0] = 't';  /* comment this out and it compiles and runs */
    for(p=kw;*p;p++){ printf("foo3:%s\n", *p);}
}

int main()
{
    char *kwds[] = {"Foo","Bar",0};
    char **p;
    for(p=kwds;*p;p++){ printf("%s\n", *p);}
    foo(kwds);
    for(p=kwds;*p;p++){ printf("%s\n", *p);}
}
[12:32] C:\pywk\pydev>cl martin.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Copyright (C) Microsoft Corp 1984-1998. All rights reserved.

martin.c(10) : error C2166: l-value specifies const object

But after commenting out:

[12:32] C:\pywk\pydev>cl martin.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Copyright (C) Microsoft Corp 1984-1998. All rights reserved.

Microsoft (R) Incremental Linker Version 6.00.8168
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.


[12:34] C:\pywk\pydev>martin

Bengt Richter

From martin at  Sat Feb 11 14:14:00 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 14:14:00 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:
> Will a typedef help?

A typedef can never help. It is always possible to reformulate
a program using typedefs to one that doesn't use typedefs.

Compiling your program with the const modification line
removed gives

martin.c: In function 'int main()':
martin.c:18: error: invalid conversion from 'char**' to 'const char**'
martin.c:18: error:   initializing argument 1 of 'void foo(const char**)'


From duncan.booth at  Sat Feb 11 14:29:07 2006
From: duncan.booth at (Duncan Booth)
Date: Sat, 11 Feb 2006 07:29:07 -0600
Subject: [Python-Dev] The decorator(s) module
References: <dsj0p7$tk3$>
Message-ID: <n2m-g.Xns976789233915Fduncanrcpcouk@>

Georg Brandl <g.brandl at> wrote in news:dsj0p7$tk3$1 at

> Unfortunately, a @property decorator is impossible...

It all depends what you want (and whether you want the implementation to be 
portable to other Python implementations). Here's one possible but not 
exactly portable example:

from inspect import getouterframes, currentframe
import unittest

class property(property):
    # Each classmethod looks up an earlier property of the same name in the
    # caller's (class body) namespace and returns a copy with one accessor
    # replaced.
    @classmethod
    def get(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(f, prop.fset, prop.fdel, prop.__doc__)

    @classmethod
    def set(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(prop.fget, f, prop.fdel, prop.__doc__)

    @classmethod
    def delete(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(prop.fget, prop.fset, f, prop.__doc__)

class PropTests(unittest.TestCase):
    def test_setgetdel(self):
        class C(object):
            def __init__(self, colour):
                self._colour = colour

            @property.set
            def colour(self, value):
                self._colour = value

            @property.get
            def colour(self):
                return self._colour

            @property.delete
            def colour(self):
                self._colour = 'none'
        inst = C('red')
        self.assertEqual(inst.colour, 'red')
        inst.colour = 'green'
        self.assertEqual(inst._colour, 'green')
        del inst.colour
        self.assertEqual(inst._colour, 'none')

if __name__=='__main__':
    unittest.main()

From ronaldoussoren at  Sat Feb 11 14:48:46 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Sat, 11 Feb 2006 14:48:46 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On 10-feb-2006, at 23:49, Martin v. Löwis wrote:

> Tim Peters wrote:
>> I don't know.  Of course it misses similar new tests added to _ssl.c
>> (see the msg that started this thread), so it spreads beyond just
>> this.  Does it do the right thing for Windows variants like Cygwin,
>> and OS/2?  Don't know.
> I see. How does Py_SOCKET_FD_CAN_BE_GE_FD_SETSIZE help here?
> Does defining it in PC/pyconfig.h do the right thing?
> I guess I'm primarily opposed to the visual ugliness of the
> define. Why does it spell out "can be", but abbreviates
> "greater than or equal to"? What about Py_CHECK_FD_SETSIZE?

If I understand this discussion correctly, the code that would be
conditionalized using this define is the IS_SELECTABLE macro in
selectmodule.c and very similar code in other modules. I'd say that
calling the test _Py_IS_SELECTABLE and putting it into pyport.h
as Tim mentioned in an aside seems to be a good solution. At the
very least it is a lot nicer than defining a very long name in
pyconfig.h and then having very similar code in several #if blocks.

> Regards,
> Martin

From keith at  Sat Feb 11 14:51:52 2006
From: keith at (Keith Dart)
Date: Sat, 11 Feb 2006 05:51:52 -0800
Subject: [Python-Dev] Let's just *keep* lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote the following on 2006-02-10 at 16:20 PST:
> Although "print" may become a function in 3.0, so that this
> particular example would no longer be a problem.


You can always make your own Print function. The pyNMS framework adds
many new builtins, as well as a Print function, when it is installed.


-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Keith Dart <keith at>
   public key: ID: 19017044

From martin at  Sat Feb 11 14:59:26 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 14:59:26 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

Ronald Oussoren wrote:
> If I understand this discussion correctly, the code that would be
> conditionalized using this define is the IS_SELECTABLE macro in
> selectmodule.c and very similar code in other modules. I'd say that
> calling the test _Py_IS_SELECTABLE and putting it into pyport.h
> as Tim mentioned in an aside seems to be a good solution. At the
> very least it is a lot nicer than defining a very long name in
> pyconfig.h and then having very similar code in several #if blocks.

For the moment, I have committed Tim's original proposal. Moving
the macro into pyport.h could be done in addition. That should
be done only if selectmodule is also adjusted; this currently
tests for _MSC_VER.


From ncoghlan at  Sat Feb 11 14:59:40 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 11 Feb 2006 23:59:40 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> The PEP now uses runpy for the module name, and run_module for the function 
> used to locate and execute scripts. There's probably some discussion to be had 
> in relation to the Design Decisions section of the PEP, relating to the way I 
> wrote the module (the handling of locals dictionaries in particular deserves 
> consideration).

Huh. Speaking of not-thoroughly-tested, exec + function code objects doesn't 
seem to work anything like I expected, so some of my assumptions in the PEP 
relating to the way the locals dictionary should be handled are clearly wrong. 
As I discovered, the name binding operations in a function code object have no 
effect whatsoever on the dictionaries passed to an invocation of exec.
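
The effect described above can still be reproduced on a current CPython 3,
where exec is a function rather than a statement. The following is a small
sketch under that assumption (the names sample and bound_inside are made up
for the illustration): a function code object binds its names as fast locals,
so the dictionaries handed to exec never see them, whereas module-level code
writes straight into the locals dictionary.

```python
def sample():
    bound_inside = 1          # compiled as a fast local, not a dict store


# The same binding compiled as module-level code, for comparison.
module_code = compile("bound_inside = 1", "<sketch>", "exec")

func_globals, func_locals = {}, {}
exec(sample.__code__, func_globals, func_locals)

mod_globals, mod_locals = {}, {}
exec(module_code, mod_globals, mod_locals)

# exec adds __builtins__ to the globals dict it is given, but the function
# code's binding appears in neither dictionary.
binding_lost = "bound_inside" not in func_globals and not func_locals
binding_kept = mod_locals == {"bound_inside": 1}
```

Only the module-style code object leaves anything behind for a caller to
inspect, which is why a wrapper that returns the 'locals' dictionary is
useful for module code but not for function code.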

I'll update the PEP to drop run_function_code, and make run_code a simple 
wrapper around the exec statement that always returns the dictionary used as 
'locals' (which may happen to be the same dictionary used as 'globals').

If the way exec handles function code objects and provision of a locals 
dictionary ever changes, then run_code will pick up the new semantics 


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From dave at  Sat Feb 11 15:11:26 2006
From: dave at (David Abrahams)
Date: Sat, 11 Feb 2006 09:11:26 -0500
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
Message-ID: <>

It isn't completely clear which branch or tag to get, and Google
turned up no obvious documentation.


Dave Abrahams
Boost Consulting

From martin at  Sat Feb 11 16:02:02 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 16:02:02 +0100
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
In-Reply-To: <>
References: <>
Message-ID: <>

David Abrahams wrote:
> It isn't completely clear which branch or tag to get, and Google
> turned up no obvious documentation.


From skip at  Sat Feb 11 16:10:41 2006
From: skip at (skip at
Date: Sat, 11 Feb 2006 09:10:41 -0600
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
In-Reply-To: <>
References: <>
Message-ID: <>

    Dave> It isn't completely clear which branch or tag to get, and Google
    Dave> turned up no obvious documentation.

On subversion, you want releaseXY-maint for the various X.Y releases.  For
2.4.2, release24-maint is what you want, though it may have a few bug fixes
since 2.4.2 was released.  With CVS I used to use "cvs log README" to see
what all the tags and branches were.  I don't know what the equivalent svn
command is.


From raveendra-babu.m at  Fri Feb 10 13:36:34 2006
From: raveendra-babu.m at (M, Raveendra Babu (STSD))
Date: Fri, 10 Feb 2006 18:06:34 +0530
Subject: [Python-Dev] To know how to set "pythonpath"
Message-ID: <>

I am a newbie to python. While I am running some scripts it reports some
errors because of the PYTHONPATH variable.

Can you send me information on how to set PYTHONPATH?
I am using  python 2.1.3 on aix 5.2.


From mrussell at  Fri Feb 10 14:08:39 2006
From: mrussell at (Mark Russell)
Date: Fri, 10 Feb 2006 13:08:39 +0000
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
	<dsgem7$10u$>	<>	<>
Message-ID: <>

On 10 Feb 2006, at 12:45, Nick Coghlan wrote:
> An alternative would be to call it "__discrete__", as that is the key
> characteristic of an indexing type - it consists of a sequence of  
> discrete
> values that can be isomorphically mapped to the integers.

Another alternative: __as_ordinal__.  Wikipedia describes ordinals as  
"numbers used to denote the position in an ordered sequence" which  
seems a pretty precise description of the intended result.  The "as_"  
prefix also captures the idea that this should be a lossless conversion.
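
For reference, the hook was spelled __index__ when it landed in Python 2.5
(PEP 357); __discrete__ and __as_ordinal__ remained only candidate names. A
minimal sketch of the intended use follows, with a made-up Ordinal class
standing in for any non-int type that maps losslessly onto an integer
position.

```python
class Ordinal:
    """Hypothetical non-int type that can report an exact integer position."""

    def __init__(self, position):
        self.position = position

    def __index__(self):
        # Called by the interpreter wherever a true integer index is required.
        return self.position


letters = "abcdefgh"
start, stop = Ordinal(2), Ordinal(5)

sliced = letters[start:stop]      # slice bounds are converted via __index__
picked = letters[Ordinal(0)]      # so is a plain subscript
```

Because the method must return an exact integer, a float-like type would
simply not define it, which keeps the conversion lossless as described above.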

Mark Russell

From thomas at  Sat Feb 11 16:29:57 2006
From: thomas at (Thomas Wouters)
Date: Sat, 11 Feb 2006 16:29:57 +0100
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Feb 11, 2006 at 09:10:41AM -0600, skip at wrote:

>     Dave> It isn't completely clear which branch or tag to get, and Google
>     Dave> turned up no obvious documentation.

> On subversion, you want releaseXY-maint for the various X.Y releases.  For
> 2.4.2, release24-maint is what you want, though it may have a few bug fixes
> since 2.4.2 was released.  With CVS I used to use "cvs log README" to see
> what all the tags and branches were.  I don't know what the equivalent svn
> command is.

The 'cvs log' trick only works if the file you log is actually part of the
branch. Not an issue with Python or any other project that always branches
sanely, fortunately, but there's always wackos out there ;)
You get the list of branches in SVN with:

svn ls

And similarly, tags with:

svn ls

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at  Sat Feb 11 16:32:23 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 16:32:23 +0100
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
In-Reply-To: <>
References: <>
Message-ID: <>

skip at wrote:
> On subversion, you want releaseXY-maint for the various X.Y releases.  For
> 2.4.2, release24-maint is what you want, though it may have a few bug fixes
> since 2.4.2 was released.  With CVS I used to use "cvs log README" to see
> what all the tags and branches were.  I don't know what the equivalent svn
> command is.

The easiest is to open either


in a web browser. If you want to use the subversion command line,

svn ls


From g.brandl at  Sat Feb 11 16:33:25 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 11 Feb 2006 16:33:25 +0100
Subject: [Python-Dev] Where to put "post-it notes"?
Message-ID: <dsl045$94n$>

I just updated the general copyright notice to include the
year 2006. This is scattered in at least 6 files (I found that many searching
for 2004 and 2005) which would be handy to record somewhere so that next year
it's easier. Where does this belong?


From aahz at  Sat Feb 11 16:37:34 2006
From: aahz at (Aahz)
Date: Sat, 11 Feb 2006 07:37:34 -0800
Subject: [Python-Dev] To know how to set "pythonpath"
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 10, 2006, M, Raveendra Babu (STSD) wrote:
> I am a newbie to python. While I am running some scripts it reports some
> errors because of the PYTHONPATH variable.
> Can you send me information on how to set PYTHONPATH?
> I am using  python 2.1.3 on aix 5.2.

Sorry, this is the wrong place.  Please use another place, such as
comp.lang.python, and read
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From aahz at  Sat Feb 11 16:40:35 2006
From: aahz at (Aahz)
Date: Sat, 11 Feb 2006 07:40:35 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sat, Feb 11, 2006, "Martin v. Löwis" wrote:
> Not at all. People appear to completely fail to grasp the notion of
> "deprecated" in this context. It just means "it may go away in a
> future version", implying that the rest of it may *not* go away in a
> future version.
> That future version might get published in 2270, when everybody has
> switched to C++, and compatibility with C is no longer required.

Just for the clarification of those of us who are not C/C++ programmers,
are you saying that this is different from the meaning in Python, where
"deprecated" means that something *IS* going away?
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From p.f.moore at  Sat Feb 11 16:50:35 2006
From: p.f.moore at (Paul Moore)
Date: Sat, 11 Feb 2006 15:50:35 +0000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/11/06, Nick Coghlan <ncoghlan at> wrote:
> I finally finished updating PEP 338 to comply with the flexible importing
> system in PEP 302.
> The result is a not-yet-thoroughly-tested module that should allow the -m
> switch to execute any module written in Python that is accessible via an
> absolute import statement.

Does this implementation resolve as
well? A reading of the PEP would seem to imply that it does, but the
SF patches you mention don't include any changes to the core, so I'm
not sure...


From thomas at  Sat Feb 11 16:56:51 2006
From: thomas at (Thomas Wouters)
Date: Sat, 11 Feb 2006 16:56:51 +0100
Subject: [Python-Dev] vs. file.readline()
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Jan 05, 2006 at 07:30:08PM +0100, Thomas Wouters wrote:
> On Wed, Jan 04, 2006 at 10:10:07AM -0800, Guido van Rossum wrote:

> > I'd say go right ahead and submit a change to SF (and then after it's
> > reviewed you can check it in yourself :-).


So, any objections to me checking this in? It doesn't break anything that
wasn't already broken, but neither does it fix it; it just makes the error
more apparent. I don't think it'd be a bugfix candidate, since it changes
the effect of the error (rather than silently delivering data out of order,
it complains.)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ncoghlan at  Sat Feb 11 17:06:54 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 12 Feb 2006 02:06:54 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore wrote:
> On 2/11/06, Nick Coghlan <ncoghlan at> wrote:
>> I finally finished updating PEP 338 to comply with the flexible importing
>> system in PEP 302.
>> The result is a not-yet-thoroughly-tested module that should allow the -m
>> switch to execute any module written in Python that is accessible via an
>> absolute import statement.
> Does this implementation resolve as
> well? A reading of the PEP would seem to imply that it does, but the
> SF patches you mention don't include any changes to the core, so I'm
> not sure...

It will. I haven't updated the command line switch itself yet, so you'd need 
to do "-m runpy <whatever>". I do plan on fixing the switch, but at the moment 
there's a bug in the module's handling of nested packages, so I want to sort 
that out first.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From bokr at  Sat Feb 11 17:23:16 2006
From: bokr at (Bengt Richter)
Date: Sat, 11 Feb 2006 16:23:16 GMT
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

On Sat, 11 Feb 2006 14:14:00 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at> wrote:

>Bengt Richter wrote:
>> Will a typedef help?
>A typedef can never help. It is always possible to reformulate
>a program using typedefs to one that doesn't use typedefs.
I realize that's true for a correct compiler, and should have
reflected that you aren't just trying to appease a particular possibly quirky one.
>Compiling your program with the const modification line
>removed gives
>martin.c: In function 'int main()':
>martin.c:18: error: invalid conversion from 'char**' to 'const char**'
>martin.c:18: error:   initializing argument 1 of 'void foo(const char**)'
Sorry, I should have tried it with gcc, which does complain:

[07:16] /c/pywk/pydev>gcc martin.c
martin.c: In function `main':
martin.c:19: warning: passing arg 1 of `foo' from incompatible pointer type

also g++, but not just warning (no a.exe generated)

[07:16] /c/pywk/pydev>g++ martin.c
martin.c: In function `int main()':
martin.c:19: invalid conversion from `char**' to `const char**'

[07:17] /c/pywk/pydev>gcc -v
<snip full specs>
gcc version 3.2.3 (mingw special 20030504-1)

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
didn't complain. But then it doesn't complain about const char** x either.
I wonder if I have complaints accidentally turned off someplace ;-/

Bengt Richter

From ncoghlan at  Sat Feb 11 18:02:25 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 12 Feb 2006 03:02:25 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Nick Coghlan wrote:
> Paul Moore wrote:
>> On 2/11/06, Nick Coghlan <ncoghlan at> wrote:
>>> I finally finished updating PEP 338 to comply with the flexible importing
>>> system in PEP 302.
>>> The result is a not-yet-thoroughly-tested module that should allow the -m
>>> switch to execute any module written in Python that is accessible via an
>>> absolute import statement.
>> Does this implementation resolve as
>> well? A reading of the PEP would seem to imply that it does, but the
>> SF patches you mention don't include any changes to the core, so I'm
>> not sure...
> It will. I haven't updated the command line switch itself yet, so you'd need 
> to do "-m runpy <whatever>". I do plan on fixing the switch, but at the moment 
> there's a bug in the module's handling of nested packages, so I want to sort 
> that out first.

OK, nested packages now work right (I'd managed to make the common mistake 
that's highlighted quite clearly in the docs for __import__).

Running from inside a zipfile also appears to be working, but I don't have 
zlib in my Python 2.5 build to be 100% certain of that (I could check for 
certain with Python 2.4, but that would involve enough mucking around that I 
don't want to do it right now).

My aim is to have a patch up for the command line switch tomorrow. It 
shouldn't be too tricky, since it is just a matter of retrieving and calling 
the function from the module.

That should supply the last missing piece for the PEP implementation (aside 
from figuring out how to integrate my current manual test setup for 
runpy.run_module into the unit tests - it shouldn't be that hard to create a 
temp directory and add some files to it, similar to what test_pkg already does).
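
A minimal sketch of the kind of test setup described above, assuming a
modern Python 3 with the standard runpy module.  The package and module
names (demo_pkg, sub, hello) are made up for illustration, and this is
not the actual unit test that went into the tree:

```python
import os
import runpy
import sys
import tempfile


def run_demo():
    """Build a throwaway nested package and execute a module inside it."""
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "demo_pkg")      # hypothetical package name
        sub = os.path.join(top, "sub")           # nested package
        os.makedirs(sub)
        for directory in (top, sub):
            with open(os.path.join(directory, "__init__.py"), "w") as f:
                f.write("")
        with open(os.path.join(sub, "hello.py"), "w") as f:
            f.write("RESULT = 6 * 7\nNAME = __name__\n")

        sys.path.insert(0, tmp)
        try:
            # Roughly what "python -m demo_pkg.sub.hello" does under the hood.
            globs = runpy.run_module("demo_pkg.sub.hello", run_name="__main__")
        finally:
            sys.path.remove(tmp)
            # Drop the temporary packages so later imports are unaffected.
            for name in [n for n in sys.modules
                         if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
    return globs["RESULT"], globs["NAME"]


result, name = run_demo()
```

run_module returns the globals of the executed module, which is what
makes it convenient to check from a test.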


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Feb 11 18:08:40 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 12 Feb 2006 03:08:40 +1000
Subject: [Python-Dev] Where to put "post-it notes"?
In-Reply-To: <dsl045$94n$>
References: <dsl045$94n$>
Message-ID: <>

Georg Brandl wrote:
> I just updated the general copyright notice to include the
> year 2006. This is scattered in at least 6 files (I found that many searching
> for 2004 and 2005) which would be handy to record somewhere so that next year
> it's easier. Where does this belong?

PEP 101 maybe? Checking the copyright notices can be done independently of 
releases, but they should *definitely* be checked before a release goes out.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Sat Feb 11 19:28:25 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 11 Feb 2006 19:28:25 +0100
Subject: [Python-Dev] Where to put "post-it notes"?
In-Reply-To: <>
References: <dsl045$94n$> <>
Message-ID: <dslac9$qhs$>

Nick Coghlan wrote:
> Georg Brandl wrote:
>> I just updated the general copyright notice to include the
>> year 2006. This is scattered in at least 6 files (I found that many searching
>> for 2004 and 2005) which would be handy to record somewhere so that next year
>> it's easier. Where does this belong?
> PEP 101 maybe? Checking the copyright notices can be done independently of 
> releases, but they should *definitely* be checked before a release goes out.

Ah! They were already there. I added two more files.

By the way, PEP 101 will need to be rewritten to reflect the move to SVN.


From crutcher at  Sat Feb 11 19:33:45 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sat, 11 Feb 2006 10:33:45 -0800
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dskaqk$ene$>
References: <dskaqk$ene$>
Message-ID: <>

+1, and we could maybe include tail_call_optimized?

On 2/11/06, Georg Brandl <g.brandl at> wrote:
> Hi,
> it has been proposed before, but there was no conclusive answer last time:
> is there any chance for 2.5 to include commonly used decorators in a module?
> Of course not everything that jumps around should go in, only pretty basic
> stuff that can be widely used.
> Candidates are:
>  - @decorator. This properly wraps up a decorator function to change the
>    signature of the new function according to the decorated one's.
>  - @contextmanager, see PEP 343.
>  - @synchronized/@locked/whatever, for thread safety.
>  - @memoize
>  - Others from wiki:PythonDecoratorLibrary and Michele Simionato's decorator
>    module at <>.
> Unfortunately, a @property decorator is impossible...
> regards,
> Georg

Crutcher Dunnavant <crutcher at>
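
For readers unfamiliar with the candidates, here is a rough sketch of
what @memoize and @synchronized could look like.  It assumes a modern
Python 3 and uses functools.wraps from the standard library to carry
over the name and docstring of the decorated function; the decorator
names and the .cache attribute are illustrative only, not the API that
was proposed in this thread:

```python
import functools
import threading


def memoize(func):
    """Cache results keyed on the positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed for inspection; an illustrative extra
    return wrapper


def synchronized(lock=None):
    """Run the decorated function while holding a lock."""
    if lock is None:
        lock = threading.RLock()

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper

    return decorate


@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


counter = {"value": 0}


@synchronized()
def bump():
    counter["value"] += 1
    return counter["value"]
```

Both wrappers keep the wrapped function's __name__, which is the
simplest part of the "signature preserving" behaviour asked of
@decorator above.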

From dave at  Sat Feb 11 19:54:50 2006
From: dave at (David Abrahams)
Date: Sat, 11 Feb 2006 13:54:50 -0500
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
References: <>
Message-ID: <>

Thomas Wouters <thomas at> writes:

> On Sat, Feb 11, 2006 at 09:10:41AM -0600, skip at wrote:
>>     Dave> It isn't completely clear which branch or tag to get, and Google
>>     Dave> turned up no obvious documentation.
>> On subversion, you want releaseXY-maint for the various X.Y releases.  For
>> 2.4.2, release24-maint is what you want, though it may have a few bug fixes
>> since 2.4.2 was released.  With CVS I used to use "cvs log README" to see
>> what all the tags and branches were.  I don't know what the equivalent svn
>> command is.
> The 'cvs log' trick only works if the file you log is actually part of the
> branch. Not an issue with Python or any other project that always branches
> sanely, fortunately, but there's always wackos out there ;)
> You get the list of branches in SVN with:
> svn ls
> And similarly, tags with:
> svn ls

Yes, that's easy enough, but being sure of the meaning of any given
tag or branch name is less easy.

Dave Abrahams
Boost Consulting

From aleaxit at  Sat Feb 11 21:55:10 2006
From: aleaxit at (Alex Martelli)
Date: Sat, 11 Feb 2006 12:55:10 -0800
Subject: [Python-Dev] PEP 351
In-Reply-To: <000a01c62e85$bd081770$b83efea9@RaymondLaptop1>
References: <dsbc3h$rct$><><><>
Message-ID: <>

On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:

> [Guido van Rossum]
>> PEP 351 - freeze protocol. I'm personally -1; I don't like the  
>> idea of
>> freezing arbitrary mutable data structures. Are there champions who
>> want to argue this?
> It has at least one anti-champion.  I think it is a horrible idea  
> and would
> like to see it rejected in a way that brings finality.  If needed,  
> I can
> elaborate in a separate thread.

Could you please do that?  I'd like to understand all of your  
objections.  Thanks!


From arigo at  Sat Feb 11 21:57:35 2006
From: arigo at (Armin Rigo)
Date: Sat, 11 Feb 2006 21:57:35 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Tim,

On Fri, Feb 10, 2006 at 12:19:01PM -0500, Tim Peters wrote:
> Oh, who cares?  I predict "Jack's problem" would go away if we changed
> the declaration of PyArg_ParseTupleAndKeywords to what you intended
> <wink> to begin with:
> PyAPI_FUNC(int) PyArg_ParseTupleAndKeywords(PyObject *, PyObject *,
>                                                   const char *, const
> char * const *, ...);

Alas, this doesn't make gcc happy either.  (I'm trying gcc 3.4.4.)  In
theory, it prevents the const-bypassing trick shown by Martin, but
apparently the C standard (or gcc) is not smart enough to realize that.

I don't see a way to spell it in C so that the same extension module
compiles with 2.4 and 2.5 without a warning, short of icky macros.

A bientot,


From barry at  Sat Feb 11 22:18:59 2006
From: barry at (Barry Warsaw)
Date: Sat, 11 Feb 2006 16:18:59 -0500
Subject: [Python-Dev] PEP 351
In-Reply-To: <>
References: <dsbc3h$rct$><><><>
Message-ID: <>

On Feb 11, 2006, at 3:55 PM, Alex Martelli wrote:

> On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the
>>> idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>> It has at least one anti-champion.  I think it is a horrible idea
>> and would
>> like to see it rejected in a way that brings finality.  If needed,
>> I can
>> elaborate in a separate thread.
> Could you please do that?  I'd like to understand all of your
> objections.  Thanks!

Better yet, add them to the PEP.


From greg.ewing at  Sat Feb 11 22:48:05 2006
From: greg.ewing at (Greg Ewing)
Date: Sun, 12 Feb 2006 10:48:05 +1300
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:

> That future version might get published in 2270,

There are *already* differences which make C and C++
annoyingly incompatible. One is the const char * const *
issue that appeared here. Another is that it no longer
seems to be permissible to forward-declare static things,
which has caused me trouble with Pyrex. That's not
just a deprecation -- some compilers refuse to compile it
at all.

Personally I wouldn't mind about these things, as I
currently don't care if I never write another line of
C++ in my life. But if e.g. Pyrex-generated code is to
interoperate with other people's C++ code, I need to
worry about these issues.

> when everybody has switched to C++, and compatibility
> with C is no longer required.

Yeeks, I hope not! The world needs *less* C++, not more...

> Sure you can still use stdio, and it is
> never going away (it isn't deprecated). However, you
> have to spell the header as
> #include <cstdio>
> and then refer to the functions as std::printf,
> std::stderr, etc.

Which makes it a very different language from C in
this area. That's my point.


From python at  Sat Feb 11 23:04:43 2006
From: python at (Raymond Hettinger)
Date: Sat, 11 Feb 2006 17:04:43 -0500
Subject: [Python-Dev] PEP 351
References: <dsbc3h$rct$><><><>
Message-ID: <001501c62f57$2b070b60$6a01a8c0@RaymondLaptop1>

----- Original Message ----- 
From: "Alex Martelli" <aleaxit at>
To: "Raymond Hettinger" <python at>
Cc: <python-dev at>
Sent: Saturday, February 11, 2006 3:55 PM
Subject: PEP 351

> On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the  idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>> It has at least one anti-champion.  I think it is a horrible idea  and 
>> would
>> like to see it rejected in a way that brings finality.  If needed,  I can
>> elaborate in a separate thread.
> Could you please do that?  I'd like to understand all of your  objections. 
> Thanks!

Here was one email on the subject:

I have a number of comp.lang.python posts on the subject also.

The presence of frozenset() tempts this sort of hypergeneralization.  The 
first stumbling block comes with dictionaries.  Even if you skip past the 
question of why you would want to freeze a dictionary (do you really want to 
use it as a key?), one finds that dicts are not naturally freezable -- dicts
compare using both keys and values; hence, if you want to hash a dict, you
need to hash both the keys and values, which means that the values have to
be hashable, a new and surprising requirement -- also, the values cannot be
mutated or else an equality comparison will fail when searching for a frozen
dict that has been used as a key.  One person who experimented with an 
implementation dealt with the problem by recursively freezing all the 
components (perhaps one of the dict's values is another dict which then 
needs to be frozen too).  Executive summary:  freezing dicts is a can of 
worms and not especially useful.
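
To make the hashability requirement concrete, here is a small sketch
assuming Python 3.  The helper names dict_key and try_key are made up
for illustration; nothing like them is proposed in the PEP:

```python
def dict_key(d):
    """Build a hashable stand-in for a dict from its (key, value) pairs."""
    return frozenset(d.items())


def try_key(d):
    """Return the key, or the TypeError message if a value is unhashable."""
    try:
        return dict_key(d)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Works when every value is hashable, regardless of insertion order.
lookup = {dict_key({"a": 1, "b": 2}): "found"}
hit = lookup[dict_key({"b": 2, "a": 1})]

# Fails as soon as one value is mutable, e.g. a list.
failure = try_key({"a": [1, 2]})
```

The second call is the stumbling block: the dict itself was a perfectly
good dict, but it cannot be hashed without first doing something about
its values.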

Another thought is that PEP 351 reflects a world view of wanting to treat 
all containers polymorphically.  I would suggest that they aren't designed 
that way (i.e. you use different methods to add elements to lists, dicts, 
and sets).  Also, it is not especially useful to shovel around mutable 
containers without respect to their type.  Further, even if they were 
polymorphic and freezable, treating them generically is likely to reflect 
bad design -- the soul of good programming is the correct choice of 
appropriate data structures.

Another PEP 351 world view is that tuples can serve as frozenlists; however, 
that view represents a Liskov violation (tuples don't support the same 
methods).  This idea resurfaces and has to be shot down again every few months.
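
A quick way to see the mismatch, sketched for Python 3 (public_names
and add_item are illustrative helpers):

```python
def public_names(tp):
    """Public attribute names of a type."""
    return {n for n in dir(tp) if not n.startswith("_")}


# Methods a list offers that a tuple does not.
list_only = sorted(public_names(list) - public_names(tuple))


def add_item(seq, item):
    """Code written against the list interface."""
    seq.append(item)
    return seq


ok = add_item([1, 2], 3)

try:
    add_item((1, 2), 3)
    tuple_failure = None
except AttributeError as exc:
    tuple_failure = str(exc)
```

Code written against the list interface cannot be handed a tuple, which
is the substitutability problem in a nutshell.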

More important than all of the above is the thought that auto-freezing is 
like a bad C macro, it makes too much implicit and hides too much -- the 
supported methods change, there is an issue keeping in sync with the
non-frozen original, etc.

In my experience with frozensets, I've learned that freezing is not an 
incidental downstream effect; instead, it is an intentional, essential part 
of the design and needs to be explicit.

If more is needed on the subject, I'll hunt down my old posts and organize 
them.  I hope we don't offer a freeze() builtin.  If it is there, it will be 
tempting to use it and I think it will steer people away from good design 
and have a net harmful effect.


P.S.  The word "freezing" is itself misleading because it suggests an 
in-place change.  However, it really means that a new object is created 
(just like tuple(somelist)). 
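
A short illustration of that point, assuming Python 3:

```python
somelist = [1, 2, 3]
snapshot = tuple(somelist)   # a new object; somelist itself is untouched

somelist.append(4)           # the original stays mutable

still_mutable = somelist     # [1, 2, 3, 4]
out_of_sync = snapshot       # (1, 2, 3), no longer matches the original
```

This is also the "keeping in sync" issue mentioned above: once the
original changes, the frozen copy silently goes stale.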

From tim.peters at  Sat Feb 11 23:11:20 2006
From: tim.peters at (Tim Peters)
Date: Sat, 11 Feb 2006 17:11:20 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

[Martin v. Löwis]
> For the moment, I have committed Tim's original proposal.

Thank you!  I checked, and that fixed all the test failures I was
seeing on Windows.

> Moving the macro into pyport.h could be done in addition. That
> should be done only if selectmodule is also adjusted; this currently
> tests for _MSC_VER.

It's a nice illustration of why platform-dependent code sprayed across
modules sucks, too.  Why _MSC_VER instead of MS_WINDOWS?  What's the
difference, exactly?  Who knows?

I see that selectmodule.c has this comment near the top:

   Under BeOS, we suffer the same dichotomy as Win32; sockets can be anything
   >= 0.

but there doesn't appear to be any _code_ matching that comment in
that module -- unless on BeOS _MSC_VER is defined.  Beats me whether
it is, but doubt it.

The code in selectmodule when _MSC_VER is _not_ defined complains if a
socket fd is >= FD_SETSIZE _or_ is < 0.  But the new code in
socketmodule on non-Windows boxes is happy with negative fds, saying
"fine" whenever fd < FD_SETSIZE.  Is that right or wrong?

"The answer" isn't so important to me as that this kind of crap always
happens when platform-specific logic ends up getting defined in
multiple modules.  Much better to define macros to hide this junk,
exactly once; pyport.h is the natural place for it.

From martin at  Sat Feb 11 23:52:45 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 11 Feb 2006 23:52:45 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

Aahz wrote:
>>That future version might get published in 2270, when everybody has
>>switched to C++, and compatibility with C is no longer required.
> Just for the clarification of those of us who are not C/C++ programmers,
> are you saying that this is different from the meaning in Python, where
> "deprecated" means that something *IS* going away?

To repeat the literal words from the standard:

Annex D [depr]:

1 This clause describes features of the C++ Standard that are specified
  for compatibility with existing implementations.
2 These are deprecated features, where deprecated is defined as:
  Normative for the current edition of the Standard, but not guaranteed
  to be part of the Standard in future revisions.


From martin at  Sun Feb 12 00:02:32 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Feb 2006 00:02:32 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

Greg Ewing wrote:
> There are *already* differences which make C and C++
> annoyingly incompatible. One is the const char * const *
> issue that appeared here.

Of course there are differences. C++ has classes, C doesn't.
C++ has function overloading, C doesn't.

C++ has assignment from char** to const char*const*,
C doesn't. Why is it annoying that C++ extends C?

> Another is that it no longer
> seems to be permissible to forward-declare static things,

Not sure what you are referring to. You can forward-declare
static functions in C++ just fine.

>>when everybody has switched to C++, and compatibility
>>with C is no longer required.
> Yeeks, I hope not! The world needs *less* C++, not more...

I'm sure the committee waits until you retire before
deciding that compatibility with C is not needed
anymore :-)

>>Sure you can still use stdio, and it is
>>never going away (it isn't deprecated). However, you
>>have to spell the header as
>>#include <cstdio>
>>and then refer to the functions as std::printf,
>>std::stderr, etc.
> Which makes it a very different language from C in
> this area. That's my point.

That future version of C++ to be published in 2270,
yes, it will be different from C, because the last
C programmer will have died 20 years ago.


From dave at  Sun Feb 12 00:04:00 2006
From: dave at (David Abrahams)
Date: Sat, 11 Feb 2006 18:04:00 -0500
Subject: [Python-Dev] How to get the Python-2.4.2 sources from SVN?
References: <> <>
Message-ID: <>

"Martin v. Löwis" <martin at> writes:

> David Abrahams wrote:
>> It isn't completely clear which branch or tag to get, and Google
>> turned up no obvious documentation.


Dave Abrahams
Boost Consulting

From noamraph at  Sun Feb 12 00:15:12 2006
From: noamraph at (Noam Raphael)
Date: Sun, 12 Feb 2006 01:15:12 +0200
Subject: [Python-Dev] PEP 351
In-Reply-To: <001501c62f57$2b070b60$6a01a8c0@RaymondLaptop1>
References: <dsbc3h$rct$>
Message-ID: <>


I just wanted to say this: you can reject PEP 351, but please don't reject
the idea of frozen objects completely. I'm working on an idea similar
to that of the PEP, and I think that it can be done elegantly, without
the concrete problems that Raymond pointed out. I haven't worked on it in the
last few weeks, because of my job, but I hope to come back to it soon
and post a PEP and a reference implementation in CPython.

My quick responses, mostly to try to convince you that I know a bit about
what I'm talking about:

First about the last point: I suggest that the function be named
frozen(x), which suggests that nothing happens to x, you only get a
"frozen x". I suggest that this operation won't be called "freezing
x", but "making a frozen copy of x".

Now, along with the original order. Frozen dicts - if you want, you
can decide that dicts aren't frozenable, and that's ok. But if you do
want to make frozen copies of dicts, it isn't really such a problem -
it's similar to hashing a tuple, which requires recursive hashing of
all its elements; for making a frozen copy of a dict, you make a
frozen copy of all its values.
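
A rough sketch of what such a recursive frozen copy might look like,
assuming Python 3.  This is only an illustration built from existing
types (tuple and frozenset); it is not the CPython implementation
mentioned in this message, and the representation chosen for dicts is
an assumption:

```python
def frozen(obj):
    """Return a hashable copy of obj, leaving obj itself untouched."""
    if isinstance(obj, dict):
        # Assumption: represent a frozen dict as a frozenset of pairs.
        return frozenset((k, frozen(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(frozen(x) for x in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(frozen(x) for x in obj)
    hash(obj)  # raises TypeError for unsupported mutable types
    return obj


config = {"name": "x", "tags": ["a", "b"], "opts": {"depth": [1, 2]}}
key = frozen(config)

registry = {key: "first"}                 # usable as a dict key
same = frozen({"opts": {"depth": [1, 2]}, "tags": ["a", "b"], "name": "x"})

config["tags"].append("c")                # the original is still mutable
changed = frozen(config)
```

Note the shortcuts taken here: the frozen dict loses key lookup, and a
dict and a set of pairs freeze to the same value, so a real design
would need dedicated frozen types.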

Treating all containers polymorphically - I don't suggest that. In my
suggestion, you may have frozen lists, frozen tuples (which are normal
tuples with frozen elements), frozen sets and frozen dicts.

Treating tuples as frozen lists - I don't suggest to do that. But if
my suggestion is accepted, there would be no need for tuples - frozen
lists would be just as useful.

And about the other concerns:

> More important than all of the above is the thought that auto-freezing is
> like a bad C macro, it makes too much implicit and hides too much -- the
> supported methods change, there is an issue keeping in sync with the
> non-frozen original, etc.
> In my experience with frozensets, I've learned that freezing is not an
> incidental downstream effect; instead, it is an intentional, essential part
> of the design and needs to be explicit.

I think these concerns can only be judged given a real suggestion,
along with an implementation. I have already implemented most of my
idea in CPython, and I think it's elegant and doesn't cause problems.
Of course, I may not be objective about the subject, but I only ask to
wait for the real suggestion before dropping it down.

To summarize, I see the faults in PEP 351. I think that another,
fairly similar idea might be a good one.

Have a good week,

From martin at  Sun Feb 12 00:45:59 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Feb 2006 00:45:59 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Armin Rigo wrote:
> Alas, this doesn't make gcc happy either.  (I'm trying gcc 3.4.4.)  In
> theory, it prevents the const-bypassing trick shown by Martin, but
> apparently the C standard (or gcc) is not smart enough to realize that.

It appears to be language-defined. Looking at the assignment

  char **a;
  const char* const* b;
  b = a;

then, in C++, 4.4p4 [conv.qual] has a rather longish formula to
decide that the assignment is well-formed. In essence, it goes
like this:
- the pointers are "similar": they have the same levels of indirection,
  and the same underlying type.
- In all places where the type of a has const/volatile qualification,
  the type of b also has these qualifications (i.e. none in the
- Starting from the first point where the qualifications differ
  (from left to right), all later levels also have const.

I'm unsure about C; I think the rule comes from

       [#2]  For  any  qualifier  q, a pointer to a non-q-qualified
       type may be  converted  to  a  pointer  to  the  q-qualified
       version  of  the type; the values stored in the original and
       converted pointers shall compare equal.

So it is possible to convert a non-const pointer to a const pointer,
but only if the target types are the same. In the example, they
are not: the target type of a is char*, the target of b is
const char*.


From martin at  Sun Feb 12 00:54:41 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Feb 2006 00:54:41 +0100
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Tim Peters wrote:
> The code in selectmodule when _MSC_VER is _not_ defined complains if a
> socket fd is >= FD_SETSIZE _or_ is < 0.  But the new code in
> socketmodule on non-Windows boxes is happy with negative fds, saying
> "fine" whenever fd < FD_SETSIZE.  Is that right or wrong?

I think it is right: the code just "knows" that negative values
cannot happen. The socket handles originate from system calls
(socket(2), accept(2)), and a negative value returned there is
an error. However, the system might (and did) return handles
larger than FD_SETSIZE (as the kernel often won't know what
value FD_SETSIZE has).

> "The answer" isn't so important to me as that this kind of crap always
> happens when platform-specific logic ends up getting defined in
> multiple modules.  Much better to define macros to hide this junk,
> exactly once; pyport.h is the natural place for it.

That must be done carefully, though. For example, how should
the line

                max = 0;                     /* not used for Win32 */

be treated? Should we introduce a


From ncoghlan at  Sun Feb 12 03:05:17 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 12 Feb 2006 12:05:17 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore wrote:
> On 2/11/06, Nick Coghlan <ncoghlan at> wrote:
>> I finally finished updating PEP 338 to comply with the flexible importing
>> system in PEP 302.
>> The result is a not-yet-thoroughly-tested module that should allow the -m
>> switch to execute any module written in Python that is accessible via an
>> absolute import statement.
> Does this implementation resolve as
> well? A reading of the PEP would seem to imply that it does, but the
> SF patches you mention don't include any changes to the core, so I'm
> not sure...

I copied the module and test packages over to my Python 2.4 site packages, and 
running modules from inside zip packages does indeed work as intended (with an 
explicit redirection through runpy, naturally). Kudos to the PEP 302 folks - I 
only tested with's Python emulation of PEP 302 style imports for the 
normal file system initially, but zipimport still worked correctly on the 
first go.

For Python 2.5, this redirection from the command line switch to 
runpy.run_module should be automatic.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From raymond.hettinger at  Sun Feb 12 03:49:47 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sat, 11 Feb 2006 21:49:47 -0500
Subject: [Python-Dev] PEP 351
References: <dsbc3h$rct$>
Message-ID: <001c01c62f7e$fd086b50$b83efea9@RaymondLaptop1>

> I just wanted to say this: you can reject PEP 351, please don't reject
> the idea of frozen objects completely. I'm working on an idea similar
> to that of the PEP,
 . . .
> I think these concerns can only be judged given a real suggestion,
> along with an implementation. I have already implemented most of my
> idea in CPython, and I think it's elegant and doesn't cause problems.
> Of course, I may not be objective about the subject, but I only ask to
> wait for the real suggestion before dropping it down

I was afraid of this -- the freezing concept is a poison that will cause 
some good minds to waste a good deal of their time.  Once frozensets were 
introduced, it was like lighting a flame drawing moths to their doom.  At 
first, it seems like such a natural, obvious extension to generically freeze 
anything that is mutable.  People exploring it seem to lose sight of 
motivating use cases and get progressively turned around.  It doesn't take 
long to suddenly start thinking it is a good idea to have mutable strings, 
to recursively freeze components of a dictionary, to introduce further 
list/tuple variants, etc.  Perhaps a consistent solution can be found, but 
it no longer resembles Python; rather, it is a new language, one that is not 
grounded in real-world use cases.  Worse, I think a frozen() built-in would 
be hazardous to users, drawing them away from better solutions to their 

Expect writing and defending a PEP to consume a month of your life.  Before 
devoting more of your valuable time, here's a checklist of questions to ask 
yourself (sort of a mid-project self-assessment and reality check):

1.  It is already possible to turn many objects into key strings -- perhaps 
by marshaling, pickling, or making a custom repr such as 
repr(sorted(mydict.items())).  Have you ever had occasion to use this?  IOW, 
have you ever really needed to use a dictionary as a key to another 
dictionary?   Has there been any clamor for a frozendict(), not as a toy 
recipe but as a real user need that cannot be met by other Python 
techniques?  If the answer is no, it should be a hint that a generalized 
freezing protocol will rot in the basement.
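
The custom-repr technique from item 1, sketched for Python 3 (the
helper name key_string and the sample data are illustrative):

```python
def key_string(d):
    """Order-independent string key for a dict with sortable keys."""
    return repr(sorted(d.items()))


cache = {}
cache[key_string({"host": "a", "port": 80})] = "conn-1"

# A dict with the same items in a different order yields the same key.
found = cache[key_string({"port": 80, "host": "a"})]
```

This works without any freezing machinery, provided the dict's keys can
be sorted against each other and the values have a stable repr.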

2. Before introducing a generalized freezing protocol, wouldn't it make 
sense to write a third-party extension for just frozendicts, just to see if 
anyone can possibly make productive use of it?  One clue would be to search 
for code that exercises the existing code in dict.__eq__().  If you rarely 
have occasion to compare dicts, then it is certainly even more rare to want 
to be able to hash them.  If not, then is this project being pursued because 
it is interesting or because there's a burning need that hasn't surfaced 

3. Does working out the idea entail recursive freezing of a dictionary? 
Does that impose limits on generality (you can freeze some dicts but not 
others)?  Does working out the idea lead you to mutable strings?  If so, 
don't count on Guido's support.

4. Leaving reality behind (meaning actual problems that aren't readily 
solvable with the existing language), try to contrive some hypothetical use 
cases?  Any there any that are not readily met by the simple recipe in the 
earlier email: ?

5. How extensively does the rest of Python have to change to support the new
built-in?  If the patch ends up touching many objects and introducing new
rules, then the payoff needs to be pretty darned good.  I presume that for
frozen(x) to work a lot of types have to be modified.  Python seems to fare 
quite well without frozendicts and frozenlists, so do we need to introduce 
them just to make the new frozen() built-in work with more than just sets?
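
As a rough illustration of item 1, here is a minimal Python 3 sketch of using a dictionary's contents as a key without any freezing protocol. The helper name dict_key is made up for this example, and the sketch assumes the keys are sortable and the values have a stable repr (or, for the frozenset variant, that the values are hashable).

```python
def dict_key(d):
    """Build a string key from a dict's items (hypothetical helper)."""
    return repr(sorted(d.items()))

cache = {}
cache[dict_key({'b': 2, 'a': 1})] = 'cached result'

# Insertion order does not matter, because the items are sorted first.
print(cache[dict_key({'a': 1, 'b': 2})])

# When every value is hashable, a frozenset of the items avoids the sort.
by_items = {frozenset({'a': 1, 'b': 2}.items()): 'cached result'}
print(frozenset({'b': 2, 'a': 1}.items()) in by_items)
```

Both variants take a snapshot: later changes to the original dict do not affect the key that was stored.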


From bokr at  Sun Feb 12 04:24:17 2006
From: bokr at (Bengt Richter)
Date: Sun, 12 Feb 2006 03:24:17 GMT
Subject: [Python-Dev] PEP 351
References: <dsbc3h$rct$><><><>
Message-ID: <>

On Sat, 11 Feb 2006 12:55:10 -0800, Alex Martelli <aleaxit at> wrote:

>On Feb 10, 2006, at 1:05 PM, Raymond Hettinger wrote:
>> [Guido van Rossum]
>>> PEP 351 - freeze protocol. I'm personally -1; I don't like the  
>>> idea of
>>> freezing arbitrary mutable data structures. Are there champions who
>>> want to argue this?
>> It has at least one anti-champion.  I think it is a horrible idea  
>> and would
>> like to see it rejected in a way that brings finality.  If needed,  
>> I can
>> elaborate in a separate thread.
>Could you please do that?  I'd like to understand all of your  
>objections.  Thanks!
PMJI. I just read PEP 351, and had an idea for doing the same without pre-instantiating protected
subclasses, and doing the wrapping on demand instead. Perhaps of interest? (Or if already considered
and rejected, shouldn't this be mentioned in the PEP?)

The idea is to factor out freezing from the objects to be frozen. If it's going to involve copying anyway,
feeding the object to a wrapping class constructor doesn't seem like much extra overhead.

The examples in the PEP were very amenable to this approach, but I don't know how it would apply
to whatever Alex's use cases might be.

Anyhow, why shouldn't you be able to call freeze(an_ordinary_list) and get back freeze(xlist(an_ordinary_list))
automatically, based e.g. on a freeze_registry_dict[type(an_ordinary_list)] => xlist lookup, if plain hash fails?

Common types that might be usefully freezable could be pre-registered, and when a freeze fails
on a user object (presumably inheriting a __hash__ that bombs or because he wants it to) the programmer's
solution would be to define a suitable callable to produce the frozen object, and register that, but not modify his
unwrapped pre-freeze-mods object types and instantiations.

BTW, xlist wouldn't need to exist, since freeze_registry_dict[type(alist)] could just return the tuple type.
Otherwise the programmer would make a wrapper class taking the object as an __init__ (or maybe __new__) arg,
and intercepting the mutating methods etc., and stuff that in the freeze_registry_dict. IWT some metaclass stuff
might make it possible to parameterize a lot of wrapper class aspects, e.g., if you gave it a
__mutator_method_name_list__ to work with.

Perhaps freeze builtin could be a callable object with __call__ for the freeze "function" call
and with e.g. freeze.register(objtype, wrapper_class) as a registry API.

I am +0 on any of this in any case, not having had a use case to date, but I thought taking the
__freeze__ out of the objects (by not forcing them to be pre-instantiated as wrapped instances)
and letting registered freeze wrappers do it on demand instead might be interesting to someone.
If not, or if it's been discussed (no mention on the PEP tho) feel free to ignore ;-)

BTW freeze as just described might be an instance of

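class Freezer(object):
    def __init__(self):
        # Entries inferred from the usage example below; imdict is the
        # immutable dict class from PEP 351.
        self._registry_dict = {
            list: tuple,
            dict: imdict}
    def __call__(self, obj):
        try:
            hash(obj)
            return obj  # already hashable, so hand back the object itself
        except TypeError:
            freezer = self._registry_dict.get(type(obj))
            if freezer: return freezer(obj)
            raise TypeError('object is not freezable')
    def register(self, objtype, wrapper):
        self._registry_dict[objtype] = wrapper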

(above refers to imdict from PEP 351) 
Usage example:

 >>> import alt351
 >>> freeze = alt351.Freezer()
(well, pretend freeze is builtin)

 >>> fr5 = freeze(range(5))
 >>> fr5
 (0, 1, 2, 3, 4)
 >>> d = dict(a=1,b=2)
 >>> d
 {'a': 1, 'b': 2}
 >>> fd = freeze(d)
 >>> fd
 {'a': 1, 'b': 2}
 >>> fd['a']
 1
 >>> fd['a']=3
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "", line 7, in _immutable
     raise TypeError('object is immutable')
 TypeError: object is immutable
 >>> type(fd)
 <class 'alt351.imdict'>
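
For readers who want to try the registry idea without the alt351 module, here is a self-contained Python 3 sketch. The imdict below is only a minimal stand-in written for this example (PEP 351's own class differs, for instance in how it hashes), and the set entry in the registry is an assumption, not something taken from the message.

```python
class imdict(dict):
    """Minimal stand-in for an immutable dict (not PEP 351's exact class)."""
    def __hash__(self):
        return hash(frozenset(self.items()))
    def _immutable(self, *args, **kwargs):
        raise TypeError('object is immutable')
    __setitem__ = __delitem__ = clear = update = _immutable
    setdefault = pop = popitem = _immutable

class Freezer:
    def __init__(self):
        # Pre-registered wrappers; the set entry is an assumption.
        self._registry = {list: tuple, set: frozenset, dict: imdict}
    def register(self, objtype, wrapper):
        self._registry[objtype] = wrapper
    def __call__(self, obj):
        try:
            hash(obj)
            return obj              # already hashable, nothing to do
        except TypeError:
            pass
        wrapper = self._registry.get(type(obj))
        if wrapper is None:
            raise TypeError('object is not freezable')
        return wrapper(obj)

freeze = Freezer()
print(freeze([0, 1, 2]))
print(type(freeze({'a': 1})).__name__)

# A type with no registered wrapper fails until one is registered.
freeze.register(bytearray, bytes)
print(freeze(bytearray(b'xy')))
```

Note that this freeze is shallow: freeze([[1]]) returns a tuple that still contains a list, so a recursive wrapper would be needed for nested containers.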

+0 ;-)

Bengt Richter

From tim.peters at  Sun Feb 12 04:35:35 2006
From: tim.peters at (Tim Peters)
Date: Sat, 11 Feb 2006 22:35:35 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

>> The code in selectmodule when _MSC_VER is _not_ defined complains if a
>> socket fd is >= FD_SETSIZE _or_ is < 0.  But the new code in
>> socketmodule on non-Windows boxes is happy with negative fds, saying
>> "fine" whenever fd < FD_SETSIZE.  Is that right or wrong?

> I think it is right: the code just "knows" that negative values
> cannot happen. The socket handles originate from system calls
> (socket(2), accept(2)), and a negative value returned there is
> an error. However, the system might (and did) return handles
> larger than FD_SETSIZE (as the kernel often won't know what
> value FD_SETSIZE has).

Since the new code was just added, you can remember that now.  No
comments record the reasoning, though, and over time it's likely to
become another mass of micro-optimized "mystery code".  If it's true
that negative values can't happen (and I believe that), then it
doesn't hurt to verify that they're >= 0 either (except from a
micro-efficiency view), and it would simplify the code to do so.

>> "The answer" isn't so important to me as that this kind of crap always
>> happens when platform-specific logic ends up getting defined in
>> multiple modules.  Much better to define macros to hide this junk,
>> exactly once; pyport.h is the natural place for it.

> That must be done carefully, though. For example, how should
> the line
>                 max = 0;                     /* not used for Win32 */
> be treated? Should we introduce a

I wouldn't:  I'd simply throw away the current confusing avoidance of
computing "max" on Windows.  That's another case where
platform-specific micro-efficiency seems the only justification
(select() on Windows ignores its first argument; there's nothing
special about "0" here, despite that the code currently makes 0 _look_
special on Windows somehow).

So fine by me if the current:

#if defined(_MSC_VER)
		max = 0;		     /* not used for Win32 */
#else  /* !_MSC_VER */
		if (v < 0 || v >= FD_SETSIZE) {
			PyErr_SetString(PyExc_ValueError,
				    "filedescriptor out of range in select()");
			goto finally;
		}
		if (v > max)
			max = v;
#endif /* _MSC_VER */

block got replaced by, e.g.,:

		max = 0;
		if (! Py_IS_SOCKET_FD_OK(v)) {
			PyErr_SetString(PyExc_ValueError,
				    "filedescriptor out of range in select()");
			goto finally;
		}
		if (v > max)
			max = v;

Unlike the current code, that would, for example, also allow for the
_possibility_ of checking that v != INVALID_SOCKET on Windows, by
fiddling the Windows expansion of Py_IS_SOCKET_FD_OK (and of course
all users of that macro would grow the same new smarts).

I'm not really a macro fan:  I'm a fan of centralizing portability
hacks in config header files, and hiding them under abstractions.  C
macros are usually strong enough to support this, and are all the
concession to micro-efficiency I'm eager ;-) to make.

From ncoghlan at  Sun Feb 12 04:48:20 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 12 Feb 2006 13:48:20 +1000
Subject: [Python-Dev] PEP 338 - Executing Modules as Scripts
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore wrote:
> On 2/11/06, Nick Coghlan <ncoghlan at> wrote:
>> I finally finished updating PEP 338 to comply with the flexible importing
>> system in PEP 302.
>> The result is a not-yet-thoroughly-tested module that should allow the -m
>> switch to execute any module written in Python that is accessible via an
>> absolute import statement.
> Does this implementation resolve as
> well? A reading of the PEP would seem to imply that it does, but the
> SF patches you mention don't include any changes to the core, so I'm
> not sure...

I've uploaded a patch with the necessary changes to main.c to the PEP 338 
implementation tracker item.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From nnorwitz at  Sun Feb 12 06:59:23 2006
From: nnorwitz at (Neal Norwitz)
Date: Sat, 11 Feb 2006 21:59:23 -0800
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/11/06, Tim Peters <tim.peters at> wrote:
>> [Tim telling how I broke Python]
> [Martin fixing it]

Sorry for the breakage (I didn't know about the Windows issues). 
Thank you Martin for fixing it.  I agree with the solution.

I was away from mail, ahem, "working".


From nnorwitz at  Sun Feb 12 07:32:58 2006
From: nnorwitz at (Neal Norwitz)
Date: Sat, 11 Feb 2006 22:32:58 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Guido van Rossum <guido at> wrote:
> Next, the schedule. Neal's draft of the schedule has us releasing 2.5
> in October. That feels late -- nearly two years after 2.4 (which was
> released on Nov 30, 2004). Do people think it's reasonable to strive
> for a more aggressive (by a month) schedule, like this:
>     alpha 1: May 2006
>     alpha 2: June 2006
>     beta 1:  July 2006
>     beta 2:  August 2006
>     rc 1:    September 2006
>     final:   September 2006

I think this is very reasonable.  Based on Martin's message and if we
can get everyone fired up and implementing, it would be possible to start
in April.  I'll update the PEP for starting in May now.  We can revise
further later.

> ??? Would anyone want to be even more aggressive (e.g. alpha 1 right
> after PyCon???). We could always do three alphas.

I think PyCon is too early, but 3 alphas is a good idea.  I'll add
this as well.  Probably separated by 3-4 weeks so it doesn't change
the schedule much.  The exact schedule will still change based on
release manager availability and other stuff that needs to be

> >    PEP 353: Using ssize_t as the index type
> Neal tells me that this is in progress in a branch, but that the code
> is not yet flawless (tons of warnings etc.). Martin, can you tell us
> more? When do you expect this to land? Maybe aggressively merging into
> the HEAD and then releasing it as alpha would be a good way to shake
> out the final issues???

I'm tempted to say we should merge now.  I know the branch works on
64-bit boxes.  I can test on a 32-bit box if Martin hasn't already. 
There will be a lot of churn fixing problems, but maybe we can get
more people involved.


From nnorwitz at  Sun Feb 12 07:38:10 2006
From: nnorwitz at (Neal Norwitz)
Date: Sat, 11 Feb 2006 22:38:10 -0800
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <dsivf1$p6j$>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/10/06, Georg Brandl <g.brandl at> wrote:
> I am not experienced in releasing, but with the multitude of new things
> introduced in Python 2.5, could it be a good idea to release an early alpha
> not long after all (most of?) the desired features are in the trunk?

In the past, all new features had to be in before beta 1 IIRC (it
could have been beta 2 though).  The goal is to get things in sooner,
preferably prior to alpha.

For 2.5, we should strive really hard to get features implemented
prior to alpha 1.  Some of the changes (AST, ssize_t) are pervasive. 
AST while localized, ripped the guts out of something every script
needs (more or less).  ssize_t touches just about everything it seems.


From thomas at  Sun Feb 12 11:51:41 2006
From: thomas at (Thomas Wouters)
Date: Sun, 12 Feb 2006 11:51:41 +0100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Sat, Feb 11, 2006 at 10:38:10PM -0800, Neal Norwitz wrote:
> On 2/10/06, Georg Brandl <g.brandl at> wrote:

> > I am not experienced in releasing, but with the multitude of new things
> > introduced in Python 2.5, could it be a good idea to release an early alpha
> > not long after all (most of?) the desired features are in the trunk?

> In the past, all new features had to be in before beta 1 IIRC (it
> could have been beta 2 though).  The goal is to get things in sooner,
> preferably prior to alpha.

Well, in the past, features -- even syntax changes -- have gone in between
the last beta and the final release (but reminding Guido might bring him to
tears of regret. ;) Features have also gone into what would have been
'bugfix releases' if you looked at the numbering alone (1.5 -> 1.5.1 ->
1.5.2, for instance.) "The past" doesn't have a very impressive track
record... However, beta 1 is a very good ultimate deadline, and it's been
stuck by for the last few years, AFAIK. But I concur with:

> For 2.5, we should strive really hard to get features implemented
> prior to alpha 1.  Some of the changes (AST, ssize_t) are pervasive. 
> AST while localized, ripped the guts out of something every script
> needs (more or less).  ssize_t touches just about everything it seems.

that as many features as possible, in particular the broad-touching ones,
should be in alpha 1.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin at  Sun Feb 12 12:13:53 2006
From: martin at ("Martin v. Löwis")
Date: Sun, 12 Feb 2006 12:13:53 +0100
Subject: [Python-Dev] ssize_t branch (Was: release plan for 2.5 ?)
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>
Message-ID: <>

Neal Norwitz wrote:
> I'm tempted to say we should merge now.  I know the branch works on
> 64-bit boxes.  I can test on a 32-bit box if Martin hasn't already. 
> There will be a lot of churn fixing problems, but maybe we can get
> more people involved.

The ssize_t branch has now all the API I want it to have. I just
posted the PEP to comp.lang.python, maybe people have additional
things they consider absolutely necessary.

There are two aspects left, and both can be done after the merge:
- a lot of modules still need adjustments, to really support
  64-bit collections. This shouldn't cause any API changes, AFAICT.

- the printing of Py_ssize_t values should be supported. I think
  Tim proposed to provide the 'z' formatter across platforms.
  This is a new API, but it's a pure extension, so it can be
  done in the trunk.

I would like to avoid changing APIs after the merge to the trunk
has happened; I remember Guido saying (a few years ago) that this
change must be a single large change, rather than many small incremental
changes. I agree, and I hope I have covered everything that needs
to be covered.


From smiles at  Sun Feb 12 19:44:51 2006
From: smiles at (Smith)
Date: Sun, 12 Feb 2006 12:44:51 -0600
Subject: [Python-Dev] nice()
Message-ID: <038701c63004$733603c0$132c4fca@csmith>

I've been thinking about a function that was recently proposed at python-dev named 'areclose'. It is a function that is meant to tell whether two (or possibly more) numbers are close to each other. It is a function similar to one that exists in Numeric. One such implementation is

def areclose(x,y,abs_tol=1e-8,rel_tol=1e-5):
    diff = abs(x-y)
    return diff <= abs_tol or diff <= rel_tol*max(abs(x),abs(y))

(This is the form given by Scott Daniels on python-dev.)

Anyway, one of the rationales for including such a function was: 

  When teaching some programming to total newbies, a common frustration
  is how to explain why a==b is False when a and b are floats computed
  by different routes which ``should'' give the same results (if
  arithmetic had infinite precision).  Decimals can help, but another
  approach I've found useful is embodied in Numeric.allclose(a,b) --
  which returns True if all items of the arrays are ``close'' (equal to
  within certain absolute and relative tolerances)

The problem with the above function, however, is that it *itself* has a comparison between floats and it will give an undesired result for something like the following test:

>>> print areclose(2, 2.1, .1, 0) #see if 2 and 2.1 are within 0.1 of each other
False
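
The surprising result comes from the subtraction, not from the tolerance test. A minimal Python 3 sketch (using the abs_tol name from the function signature) makes the rounding error visible:

```python
def areclose(x, y, abs_tol=1e-8, rel_tol=1e-5):
    diff = abs(x - y)
    return diff <= abs_tol or diff <= rel_tol * max(abs(x), abs(y))

diff = abs(2 - 2.1)
# 2.1 has no exact binary representation, so the difference comes out
# slightly larger than the float closest to 0.1.
print(repr(diff))
print(diff <= 0.1)
print(areclose(2, 2.1, 0.1, 0))
```

The first line printed shows a value a little above 0.1, which is why both comparisons come out False.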

Here is an alternative that might be a nice companion to the repr() and round() functions: nice(). It is a combination of Tim Peters' delightful 'case closed' presentation in the thread, "Rounding to n significant digits?" [1] and the hidden magic of print's simplification of floating point numbers when being asked to show them. 

Its default behavior is to return a number in the form that the number would have when being printed. An optional argument, however, allows the user to specify the number of digits to round the number to as counted from the most significant digit. (An alternative name, then, could be 'lround' but I think there is less baggage for the new user to think about if the name is something like nice()--a function that makes the floating point numbers "play nice." And I also think the name...sounds nice.) 

Here it is in action:

>>> 3*1.1==3.3
False
>>> nice(3*1.1)==nice(3.3)
True
>>> x=3.21/0.65; print x
4.93846153846
>>> print nice(x,2)
4.9
>>> x=x*1e5; print nice(x,2)
490000.0

Here's the function: 
def nice(x,leadingDigits=0):
 """Return x either as 'print' would show it (the default) or rounded to the
 specified digit as counted from the leftmost non-zero digit of the number,

 e.g. nice(0.00326,2) --> 0.0033"""
 assert leadingDigits>=0
 if leadingDigits==0:
  return float(str(x)) #just give it back like 'print' would give it
 return float('%.*e' % (leadingDigits,x)) #give it back as rounded by the %e format
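
For anyone trying this on a current interpreter, here is a hedged Python 3 sketch of the same idea. Two details are assumptions of this sketch rather than part of the posted function: '%.12g' stands in for what Python 2's print showed for floats, and the precision handed to %e is digits - 1, because %e always keeps one digit before the decimal point. With that adjustment the docstring example nice(0.00326, 2) --> 0.0033 holds.

```python
def nice(x, digits=0):
    """Round x to `digits` significant digits.

    With digits == 0, round to 12 significant digits, which is what
    Python 2's print displayed for floats.
    """
    if digits < 0:
        raise ValueError('digits must be non-negative')
    if digits == 0:
        return float('%.12g' % x)
    # One digit sits before the decimal point in %e output, so a
    # precision of digits - 1 gives `digits` significant digits overall.
    return float('%.*e' % (int(digits) - 1, x))

print(nice(3 * 1.1) == nice(3.3))
print(nice(3.21 / 0.65, 2))
print(nice(0.00326, 2))
```

Comparing nice() values still compares floats, so this is a convenience for display and teaching rather than a general replacement for a tolerance test.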

Might something like this be useful? For new users, no arguments are needed other than x and floating points suddenly seem to behave in tests made using nice() values. It's also useful for those computing who want to show a physically meaningful value that has been rounded to the appropriate digit as counted from the most significant digit rather than from the decimal point. 

Some time back I had worked on the significant digit problem and had several math calls to figure out what the exponent was. The beauty of Tim's solution is that you just use built in string formatting to do the work. Nice.



From mwh at  Mon Feb 13 00:30:27 2006
From: mwh at (Michael Hudson)
Date: Sun, 12 Feb 2006 23:30:27 +0000
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
	(Phillip J. Eby's message of "Fri, 10 Feb 2006 16:07:50 -0500")
References: <>
Message-ID: <>

"Phillip J. Eby" <pje at> writes:

> At 12:21 PM 2/10/2006 -0800, Guido van Rossum wrote:
>> >    PEP 343: The "with" Statement
>>Didn't Michael Hudson have a patch?
> PEP 343's "Accepted" status was reverted to "Draft" in October, and then 
> changed back to "Accepted".  I believe the latter change is an error, since 
> you haven't pronounced on the changes.  Have you reviewed the __context__ 
> stuff that was added?
> In any case Michael's patch was pre-AST branch merge, and no longer 
> reflects the current spec.

It also never quite reflected the spec at the time, although I forget
the detail it didn't support :/


81. In computing, turning the obvious into the useful is a living
    definition of the word "frustration".
  -- Alan Perlis,

From kxroberto at  Sun Feb 12 21:46:50 2006
From: kxroberto at (Robert)
Date: Sun, 12 Feb 2006 21:46:50 +0100
Subject: [Python-Dev] Fwd: Ruby/Python Continuations: Turning a block
 callback into a read()-method ?
Message-ID: <>

Fwd: news:<dso4vi$2ndu$1 at>

After failing on a yield/iterator-continuation problem in Python (see
below) I tried the Ruby (1.8.2) language first time on that construct:
The example tries to convert a block callback interface
(Net::FTP.retrbinary) into a read()-like iterator function in order to
virtualize the existing FTP class as kind of file system.  4 bytes max
per read in this first simple test below. But it fails on the second
continuation with ThreadError after this second continuation really
executing!? Any ideas how to make this work/correct?

(The question is not about the specific FTP example as it - e.g. about a
rewrite of FTP/retrbinary or use of OS tricks, real threads with polling
etc... - but about the continuation language trick to get the execution
flow right in order to turn any callback interface into an "enslaved
callable iterator". Python can do such things in simple situations with
yield-generator functions. But Python obviously fails by a
hair when there is a function-context barrier for "yield". Ruby's
block-yield-mechanism seems to not at all have the power of real
generator-continuation as in Python, but in principle only to be that
what a normal callback would be in Python. Yet "callcc" seems to be
promising - I thought so far :-(   )

=== Ruby callcc Pattern : execution fails with ThreadError!? ===========
require 'net/ftp'
module Net

class FTPFile
   def initialize(ftp,path)
      @ftp = ftp
   def read
      if @iter
         puts ""
         puts "RETR "+@path
         @ftp.retrbinary("RETR "+@path,4) do |block|
            print "CALLBACK ",block,"\n"
            callcc{|@iter| @flag=true}
            if @flag
               return block


ftp = Net::FTP.new("localhost",'user','pass')
ff  = Net::FTPFile.new(ftp,'data.txt')

=== Output/Error ====

vs:~/test$ ruby ftpfile.rb
RETR data.txt

/usr/lib/ruby/1.8/monitor.rb:259:in `mon_check_owner': current thread
not owner (ThreadError)
         from /usr/lib/ruby/1.8/monitor.rb:211:in `mon_exit'
         from /usr/lib/ruby/1.8/monitor.rb:231:in `synchronize'
         from /usr/lib/ruby/1.8/net/ftp.rb:399:in `retrbinary'
         from ftpfile.rb:17:in `read'
         from ftpfile.rb:33

===  Python Pattern : I cannot write down the idea because of a barrier ===

#### I tried a pattern like:
     def open(self,ftppath,mode='rb'):
         class FTPFile:
             def iter_retr()
                 def callback(blk):
                     how-to-yield-from-here-as-iter_retr blk???
                 self.ftp.retrbinary("RETR %s" % self.relpath,callback)
             def read(self, bytes=-1):
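
One way around the barrier, sketched in Python 3, is to run the callback-driven call in a worker thread and hand each block to the consumer through a queue. This is a thread-based workaround rather than a continuation, and the names callback_to_iterator and fake_retrbinary are hypothetical; fake_retrbinary merely imitates a callback interface such as retrbinary so the example runs without an FTP server.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the callback stream

def callback_to_iterator(run_with_callback, maxsize=1):
    """Turn a function that pushes blocks into a callback into an iterator."""
    q = queue.Queue(maxsize)

    def worker():
        try:
            run_with_callback(q.put)   # each callback call blocks until consumed
        finally:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

def fake_retrbinary(callback, data=b'abcdefghij', blocksize=4):
    """Stand-in for a callback API: feeds data to callback in small blocks."""
    for i in range(0, len(data), blocksize):
        callback(data[i:i + blocksize])

for block in callback_to_iterator(fake_retrbinary):
    print(block)
```

The bounded queue keeps the producer at most one block ahead of the reader. If the consumer stops iterating early, the worker stays blocked on its next put, so a real implementation would need a way to cancel the transfer.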



From alan.gauld at  Mon Feb 13 00:24:45 2006
From: alan.gauld at (Alan Gauld)
Date: Sun, 12 Feb 2006 23:24:45 -0000
Subject: [Python-Dev] [Tutor] nice()
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <00a001c6302b$82d51f10$0b01a8c0@xp>

I have no particularly strong view on the concept (except that I usually
see the "problem" as a valuable opportunity to introduce a concept
that has far wider reaching consequences than floating point

However I do dislike the name nice() - there is already a nice() in the
os module with a fairly well understood function. But I'm sure some
time with a thesaurus can overcome that single mild objection. :-)

Alan G
Author of the learn to program web tutor

----- Original Message ----- 
From: "Smith" <smiles at>
To: <tutor at>
Cc: <edu-sig at>; <python-dev at>
Sent: Sunday, February 12, 2006 6:44 PM
Subject: [Tutor] nice()

I've been thinking about a function that was recently proposed at python-dev 
named 'areclose'. It is a function that is meant to tell whether two (or 
possibly more) numbers are close to each other. It is a function similar to 
one that exists in Numeric. One such implementation is

def areclose(x,y,abs_tol=1e-8,rel_tol=1e-5):
    diff = abs(x-y)
    return diff <= abs_tol or diff <= rel_tol*max(abs(x),abs(y))

(This is the form given by Scott Daniels on python-dev.)

Anyway, one of the rationales for including such a function was:

  When teaching some programming to total newbies, a common frustration
  is how to explain why a==b is False when a and b are floats computed
  by different routes which ``should'' give the same results (if
  arithmetic had infinite precision).  Decimals can help, but another
  approach I've found useful is embodied in Numeric.allclose(a,b) --
  which returns True if all items of the arrays are ``close'' (equal to
  within certain absolute and relative tolerances)

The problem with the above function, however, is that it *itself* has a 
comparison between floats and it will give an undesired result for something 
like the following test:

>>> print areclose(2, 2.1, .1, 0) #see if 2 and 2.1 are within 0.1 of each other
False

Here is an alternative that might be a nice companion to the repr() and 
round() functions: nice(). It is a combination of Tim Peters' delightful 
'case closed' presentation in the thread, "Rounding to n significant 
digits?" [1] and the hidden magic of print's simplification of floating 
point numbers when being asked to show them.

Its default behavior is to return a number in the form that the number 
would have when being printed. An optional argument, however, allows the 
user to specify the number of digits to round the number to as counted from 
the most significant digit. (An alternative name, then, could be 'lround' 
but I think there is less baggage for the new user to think about if the 
name is something like nice()--a function that makes the floating point 
numbers "play nice." And I also think the name...sounds nice.)

Here it is in action:

>>> 3*1.1==3.3
False
>>> nice(3*1.1)==nice(3.3)
True
>>> x=3.21/0.65; print x
4.93846153846
>>> print nice(x,2)
4.9
>>> x=x*1e5; print nice(x,2)
490000.0

Here's the function:
def nice(x,leadingDigits=0):
 """Return x either as 'print' would show it (the default) or rounded to the
 specified digit as counted from the leftmost non-zero digit of the number,

 e.g. nice(0.00326,2) --> 0.0033"""
 assert leadingDigits>=0
 if leadingDigits==0:
  return float(str(x)) #just give it back like 'print' would give it
 return float('%.*e' % (leadingDigits,x)) #give it back as rounded by the %e format

Might something like this be useful? For new users, no arguments are needed 
other than x and floating points suddenly seem to behave in tests made using 
nice() values. It's also useful for those computing who want to show a 
physically meaningful value that has been rounded to the appropriate digit 
as counted from the most significant digit rather than from the decimal 
point.

Some time back I had worked on the significant digit problem and had several 
math calls to figure out what the exponent was. The beauty of Tim's solution 
is that you just use built in string formatting to do the work. Nice.



From jcarlson at  Mon Feb 13 01:14:50 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 12 Feb 2006 16:14:50 -0800
Subject: [Python-Dev] [Tutor] nice()
In-Reply-To: <00a001c6302b$82d51f10$0b01a8c0@xp>
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <>

"Alan Gauld" <alan.gauld at> wrote:
> However I do dislike the name nice() - there is already a nice() in the
> os module with a fairly well understood function. But I'm sure some
> time with a thesaurus can overcome that single mild objection. :-)

Presumably it would be located somewhere like the math module.

 - Josiah

> Alan G
> Author of the learn to program web tutor
> ----- Original Message ----- 
> From: "Smith" <smiles at>
> To: <tutor at>
> Cc: <edu-sig at>; <python-dev at>
> Sent: Sunday, February 12, 2006 6:44 PM
> Subject: [Tutor] nice()
> I've been thinking about a function that was recently proposed at python-dev 
> named 'areclose'. It is a function that is meant to tell whether two (or 
> possibly more) numbers are close to each other. It is a function similar to 
> one that exists in Numeric. One such implementation is
> def areclose(x,y,abs_tol=1e-8,rel_tol=1e-5):
>     diff = abs(x-y)
>     return diff <= abs_tol or diff <= rel_tol*max(abs(x),abs(y))
> (This is the form given by Scott Daniels on python-dev.)
> Anyway, one of the rationales for including such a function was:
>   When teaching some programming to total newbies, a common frustration
>   is how to explain why a==b is False when a and b are floats computed
>   by different routes which ``should'' give the same results (if
>   arithmetic had infinite precision).  Decimals can help, but another
>   approach I've found useful is embodied in Numeric.allclose(a,b) --
>   which returns True if all items of the arrays are ``close'' (equal to
>   within certain absolute and relative tolerances)
> The problem with the above function, however, is that it *itself* has a 
> comparison between floats and it will give an undesired result for something 
> like the following test:
> ###
> >>> print areclose(2, 2.1, .1, 0) #see if 2 and 2.1 are within 0.1 of each other
> False
> >>>
> ###
> Here is an alternative that might be a nice companion to the repr() and 
> round() functions: nice(). It is a combination of Tim Peters' delightful 
> 'case closed' presentation in the thread, "Rounding to n significant 
> digits?" [1] and the hidden magic of print's simplification of floating 
> point numbers when being asked to show them.
> Its default behavior is to return a number in the form that the number 
> would have when being printed. An optional argument, however, allows the 
> user to specify the number of digits to round the number to as counted from 
> the most significant digit. (An alternative name, then, could be 'lround' 
> but I think there is less baggage for the new user to think about if the 
> name is something like nice()--a function that makes the floating point 
> numbers "play nice." And I also think the name...sounds nice.)
> Here it is in action:
> ###
> >>> 3*1.1==3.3
> False
> >>> nice(3*1.1)==nice(3.3)
> True
> >>> x=3.21/0.65; print x
> 4.93846153846
> >>> print nice(x,2)
> 4.9
> >>> x=x*1e5; print nice(x,2)
> 490000.0
> ###
> Here's the function:
> ###
> def nice(x,leadingDigits=0):
>  """Return x either as 'print' would show it (the default) or rounded to the
>  specified digit as counted from the leftmost non-zero digit of the number,
>  e.g. nice(0.00326,2) --> 0.0033"""
>  assert leadingDigits>=0
>  if leadingDigits==0:
>   return float(str(x)) #just give it back like 'print' would give it
>  leadingDigits=int(leadingDigits)
>  return float('%.*e' % (leadingDigits,x)) #give it back as rounded by the %e format
> ###
> Might something like this be useful? For new users, no arguments are needed 
> other than x and floating points suddenly seem to behave in tests made using 
> nice() values. It's also useful for those computing who want to show a 
> physically meaningful value that has been rounded to the appropriate digit 
> as counted from the most significant digit rather than from the decimal 
> point.
> Some time back I had worked on the significant digit problem and had several 
> math calls to figure out what the exponent was. The beauty of Tim's solution 
> is that you just use built in string formatting to do the work. Nice.
> /c
> [1] 

From kd5bjo at  Mon Feb 13 01:19:46 2006
From: kd5bjo at (Eric Sumner)
Date: Sun, 12 Feb 2006 18:19:46 -0600
Subject: [Python-Dev] PEP 343: Context managers a superset of decorators?
Message-ID: <>

Forgive me if someone has already come up with this; I know I am
coming to the party several months late.  All of the proposals for
decorators (including the accepted one) seemed a bit kludgey to me,
and I couldn't figure out why.  When I read PEP 343, I realized that
they all provide a solution for an edge case without addressing the
larger problem.

If context managers are provided access to the contained and
containing namespaces of their with statement, they can perform the
same function that decorators do now.  A transforming class could be
implemented as:

    ## Code Start -------------------------------------------------
    class DecoratorContext(object):
        def __init__(self, func): self.func = func
        def __context__(self): return self
        def __enter__(self, contained, containing): pass
        def __exit__(self, contained, containing):
            for k,v in contained.iteritems():
                containing[k] = self.func(v)
    ## Code End ---------------------------------------------------

With this in place, decorators can be used with the with statement:

    ## Code Start -------------------------------------------------
    classmethod = DecoratorContext(classmethod)

    class foo:
        def __init__(self, ...): pass
        with classmethod:
            def method1(cls, ...): pass
            def method2(cls, ...): pass
    ## Code End ---------------------------------------------------

The extra level of indentation could be avoided by dealing with multiple
block-starting statements on a line by stating that all except the
last block contain only one statement:

    ## Code Start -------------------------------------------------
    classmethod = DecoratorContext(classmethod)

    class foo:
        def __init__(self, ...): pass
        with classmethod: def method1(cls, ...): pass
        with classmethod: def method2(cls, ...): pass
    ## Code End ---------------------------------------------------

I will readily admit that I have no idea how difficult either of these
suggestions would be to implement, or if it would be a good idea to do
so.  At this point, they are just something to think about.

  -- Eric Sumner

From jcarlson at  Mon Feb 13 03:24:18 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 12 Feb 2006 18:24:18 -0800
Subject: [Python-Dev] PEP 343: Context managers a superset of decorators?
In-Reply-To: <>
References: <>
Message-ID: <>

Eric Sumner <kd5bjo at> wrote:
> Forgive me if someone has already come up with this; I know I am
> coming to the party several months late.  All of the proposals for
> decorators (including the accepted one) seemed a bit kludgey to me,
> and I couldn't figure out why.  When I read PEP 343, I realized that
> they all provide a solution for an edge case without addressing the
> larger problem.

[snip code samples]

> I will readily admit that I have no idea how difficult either of these
> suggestions would be to implement, or if it would be a good idea to do
> so.  At this point, they are just something to think about

Re-read the decorator PEP: to
understand why both of these options (indentation and prefix notation)
are undesirable for a general decorator syntax.

The desire for context managers to have access to their enclosing scope is
another discussion entirely, though it may do so without express
permission via stack frame manipulation.
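
The stack-frame route can be sketched roughly as follows.  This is only
an illustration, not part of PEP 343: it uses the no-argument
__enter__/__exit__ protocol, relies on the CPython-specific
sys._getframe(), and only works where the frame's f_locals is the real
namespace (class and module bodies, not function bodies).  The name
as_classmethod is made up for the example.

```python
import sys

_MISSING = object()  # sentinel for "name did not exist before the block"


class DecoratorContext:
    """Apply func to every name bound or rebound inside the with block."""

    def __init__(self, func):
        self.func = func

    def __enter__(self):
        # Frame 1 is the code running the with statement (CPython detail).
        self._before = dict(sys._getframe(1).f_locals)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            namespace = sys._getframe(1).f_locals
            for name, value in list(namespace.items()):
                if self._before.get(name, _MISSING) is not value:
                    namespace[name] = self.func(value)
        return False  # never swallow exceptions


as_classmethod = DecoratorContext(classmethod)


class Foo:
    with as_classmethod:
        def method1(cls):
            return cls.__name__
```

Inside a function body the write-back would not reach the real local
variables, which is one reason this cannot stand in for decorators in
general.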

 - Josiah

From martin at  Mon Feb 13 05:06:21 2006
From: martin at ("Martin v. Löwis")
Date: Mon, 13 Feb 2006 05:06:21 +0100
Subject: [Python-Dev] Fwd: Ruby/Python Continuations: Turning a block
 callback into a read()-method ?
In-Reply-To: <>
References: <>
Message-ID: <>

Robert wrote:
> Any ideas how to make this work/correct?

Why is that a question for python-dev?


From steve at  Mon Feb 13 05:39:10 2006
From: steve at (Steve Holden)
Date: Sun, 12 Feb 2006 23:39:10 -0500
Subject: [Python-Dev] Pervasive socket failures on Windows
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <dsp2hc$grb$>

Neal Norwitz wrote:
> On 2/11/06, Tim Peters <tim.peters at> wrote:
>>>[Tim telling how I broke python]
>>[Martin fixing it]
> Sorry for the breakage (I didn't know about the Windows issues). 
> Thank you Martin for fixing it.  I agree with the solution.
> I was away from mail, ahem, "working".
yeah, right, at your off-site boondoggle south of the border. we know.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From greg.ewing at  Mon Feb 13 07:10:18 2006
From: greg.ewing at (Greg Ewing)
Date: Mon, 13 Feb 2006 19:10:18 +1300
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:

> then, in C++, 4.4p4 [conv.qual] has a rather longish formula to
> decide that the assignment is well-formed. In essence, it goes
> like this:
 > [A large head-exploding set of rules]


Const - Just Say No.


From greg.ewing at  Mon Feb 13 07:34:57 2006
From: greg.ewing at (Greg Ewing)
Date: Mon, 13 Feb 2006 19:34:57 +1300
Subject: [Python-Dev] nice()
In-Reply-To: <038701c63004$733603c0$132c4fca@csmith>
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <>

Smith wrote:

>     When teaching some programming to total newbies, a common frustration
>     is how to explain why a==b is False when a and b are floats computed
>     by different routes which ``should'' give the same results (if
>     arithmetic had infinite precision).

This is just a special case of the problems inherent
in the use of floating point. As with all of these,
papering over this particular one isn't going to help
in the long run -- another one will pop up in due
course.

Seems to me it's better to educate said newbies not
to use algorithms that require comparing floats for
equality at all. In my opinion, if you ever find
yourself trying to do this, you're not thinking about
the problem correctly, and your algorithm is simply
wrong, even if you had infinitely precise floats.


From greg.ewing at  Mon Feb 13 07:35:07 2006
From: greg.ewing at (Greg Ewing)
Date: Mon, 13 Feb 2006 19:35:07 +1300
Subject: [Python-Dev] PEP 351
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

Bengt Richter wrote:

> Anyhow, why shouldn't you be able to call freeze(an_ordinary_list) and get back freeze(xlist(an_ordinary_list))
> automatically, based e.g. on a freeze_registry_dict[type(an_ordinary_list)] => xlist lookup, if plain hash fails?

[Cue: sound of loud alarm bells going off in Greg's head]

-1 on having any kind of global freezing registry.

If we need freezing at all, I think it would be quite
sufficient to have a few types around such as
frozenlist(), frozendict(), etc.

I would consider it almost axiomatic that code needing
to freeze something will know what type of thing it is
freezing. If it doesn't, it has no business attempting
to do so.

If you need to freeze something not covered by the
standard frozen types, write your own class or function
to handle it, and invoke it explicitly where appropriate.
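
A minimal sketch of that explicit approach, assuming a hypothetical
settings dict that maps strings to lists (the function name and the
data layout are made up for illustration):

```python
def freeze_settings(settings):
    """Return a hashable snapshot of a dict mapping str -> list."""
    # The caller knows the exact shape of the data, so no registry or
    # type dispatch is needed: each list becomes a tuple and the items
    # go into a frozenset.
    return frozenset((key, tuple(values)) for key, values in settings.items())


snapshot = freeze_settings({"paths": ["/usr", "/opt"], "flags": ["-v"]})
cache = {snapshot: "result computed for these settings"}
```

The frozen value can be used as a dict key, and the code that calls
freeze_settings() is the code that knows what it is freezing.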


From kd5bjo at  Mon Feb 13 11:52:28 2006
From: kd5bjo at (Eric Sumner)
Date: Mon, 13 Feb 2006 04:52:28 -0600
Subject: [Python-Dev] PEP 343: Context managers a superset of decorators?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/12/06, Josiah Carlson <jcarlson at> wrote:
[paragraphs swapped]
> The desire for context managers to have access to its enclosing scope is
> another discussion entirely, though it may do so without express
> permission via stack frame manipulation.

My main point was that, with relatively small changes to 343, it can
replace the decorator syntax with a more general solution that matches
the style of the rest of the language better.  The main change (access
to scopes) makes this possible, and the secondary change (altering the
block syntax) mitigates (but does not remove) the syntax difficulties
presented.  I realize that I made an assumption that may not be valid;
namely, that a new scope is generated by the 'with' statement.  Stack
frame manipulation would not be able to provide access to a scope that
no longer exists.

> Re-read the decorator PEP: to
> understand why both of these options (indentation and prefix notation)
> are undesireable for a general decorator syntax.

With the changes that I propose, both syntaxes are equivalent and can
be used interchangeably.  While each of them has problems, I believe
that in situations where one has a problem, the other usually does
not.

From this point on, I provide a point-by-point reaction to the most
applicable syntax objections listed in PEP 318.  If you're not
interested in this, bail out now.

In the PEP, there is no discussion of a prefix notation in which the
decorator is placed before the 'def' on the same line.  The most
similar example has the decorator between the 'def' and the parameter
list.  It mentions two problems:

> There are a couple of objections to this form. The first is that it breaks
> easily 'greppability' of the source -- you can no longer search for 'def foo('
> and find the definition of the function. The second, more serious, objection
> is that in the case of multiple decorators, the syntax would be extremely
> unwieldy.

The first does not apply, as this syntax does not separate 'def' and
the function name.  The second is still a valid concern, but the
decorator list can easily be broken across multiple lines.

The main objection to an indented syntax seems to be that it requires
decorated functions to be indented an extra level.  For simple
decorators, the compacted syntax could be used to sidestep this
problem.  The main complaints about the J2 proposal don't quite apply:
the code in the block is a sequence of statements and 'with' is
already going to be added to the language as a compound statement.

  -- Eric

From ncoghlan at  Mon Feb 13 12:15:29 2006
From: ncoghlan at (Nick Coghlan)
Date: Mon, 13 Feb 2006 21:15:29 +1000
Subject: [Python-Dev] PEP 343: Context managers a superset of decorators?
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Eric Sumner wrote:
> I realize that I made an assumption that may not be valid;
> namely, that a new scope is generated by the 'with' statement.

The with statement uses the existing scope - it's just a way of factoring out 
try/finally boilerplate code. No more, and, in fact, fractionally less (the 
'less' being the fact that just like any other Python function, you only get 
to supply one value to be bound to a name in the invoking scope).

Trying to link this with the function definition pipelining provided by 
decorators seems like a bit of a stretch. It certainly isn't a superset of the 
decorator functionality - if you want a statement that manipulates the 
namespace it contains, that's what class statements are for :)
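
A rough sketch of that equivalence, using the protocol as it eventually
shipped (the Tracker class is made up for illustration, and the
hand-written expansion is simplified: the real one also passes
exception details to __exit__):

```python
class Tracker:
    """Toy context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False


def with_version():
    tracker = Tracker()
    with tracker:
        inner = "bound inside the block"
    # 'inner' is still visible here: the block did not create a new scope.
    return tracker.events, inner


def try_finally_version():
    tracker = Tracker()
    tracker.__enter__()
    try:
        inner = "bound inside the block"
    finally:
        tracker.__exit__(None, None, None)
    return tracker.events, inner
```

Both functions return the same events and the same value of inner.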


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Mon Feb 13 16:03:29 2006
From: g.brandl at (Georg Brandl)
Date: Mon, 13 Feb 2006 16:03:29 +0100
Subject: [Python-Dev] still available
Message-ID: <dsq741$4un$>

The above docs are from August 2005 while is current.
Shouldn't the old docs be removed?


From kd5bjo at  Mon Feb 13 17:42:30 2006
From: kd5bjo at (Eric Sumner)
Date: Mon, 13 Feb 2006 10:42:30 -0600
Subject: [Python-Dev] PEP 343: Context managers a superset of decorators?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/13/06, Nick Coghlan <ncoghlan at> wrote:
> Eric Sumner wrote:
> > I realize that I made an assumption that may not be valid;
> > namely, that a new scope is generated by the 'with' statement.
> The with statement uses the existing scope - its just a way of factoring out
> try/finally boilerplate code. No more, and, in fact, fractionally less (the
> 'less' being the fact that just like any other Python function, you only get
> to supply one value to be bound to a name in the invoking scope).

Ok.  These changes are more substantial than I thought, then.

> Trying to link this with the function definition pipelining provided by
> decorators seems like a bit of a stretch. It certainly isn't a superset of the
> decorator functionality - if you want a statement that manipulates the
> namespace it contains, that's what class statements are for :)

Several examples of how the 'with' block would be used involve
transactions which are either rolled back or confirmed.  All of these
use the transaction capabilities of some external database.  With
separate scopes, the '__exit__' function can decide which names to
export outwards to the containing scope.  Unlike class statements, the
contained scope is used temporarily and can be discarded when the
'with' statement is completed.  This would allow a context manager to
provide a local transaction handler.

To me, it is not much of a leap from copying data between scopes to
modifying it as it goes through, which is exactly what decorators do. 
The syntax that this provides for decorators seems reasonable enough
(to me) to make the '@' syntax redundant.  However, this is a larger
change than I thought, and maybe not worth the effort to implement.

  -- Eric

From smiles at  Mon Feb 13 18:10:28 2006
From: smiles at (Smith)
Date: Mon, 13 Feb 2006 11:10:28 -0600
Subject: [Python-Dev] nice()
References: <>
Message-ID: <004f01c630c0$f051e1f0$5f2c4fca@csmith>

| From: Josiah Carlson <jcarlson at>
| "Alan Gauld" <alan.gauld at> wrote:
|| However I do dislike the name nice() - there is already a nice() in
|| the 
|| os module with a fairly well understood function. 

perhaps trim(), nearly(), about(), defer_the_pain_of() :-) I've waited to think of names until after writing this. The reason for the last name option may become apparent after reading the rest of this post.

|| But I'm sure some
|| time with a thesaurus can overcome that single mild objection. :-)
| Presumably it would be located somewhere like the math module.

I would like to see it as accessible as round, int, float, and repr. I really think a round-from-the-left is a nice tool to have. It's obviously very easy to build your own if you know what tools to use. Not everyone is going to be reading the python-dev or similar lists, however, and so having it handy would be nice.

| From: Greg Ewing <greg.ewing at>
| Smith wrote:
||     When teaching some programming to total newbies, a common
||     frustration is how to explain why a==b is False when a and b are
||     floats computed by different routes which ``should'' give the
||     same results (if arithmetic had infinite precision).
| This is just a special case of the problems inherent
| in the use of floating point. As with all of these,
| papering over this particular one isn't going to help
| in the long run -- another one will pop up in due
| course.
| Seems to me it's better to educate said newbies not
| to use algorithms that require comparing floats for
| equality at all. 

I think that having a helper function like nice() is a middle ground solution to the problem, falling short of using only decimal or rational values for numbers and doing better than requiring a test of error between floating values that should be equal but aren't because of alternate methods of computation. Just like the argument for having true division being the default behavior for the computational environment, it seems a little unfriendly to expect the more casual user to have to worry that 3*0.1 is not the same as 3/10.0. I know--they really are different, and one should (eventually) understand why, but does anyone really want the warts of floating point representation to be popping up in their work if they could be avoided, or at least easily circumvented?

I know you know why the following numbers show up as not equal, but this would be an example of the pain in working with a reasonably simple exercise of, say, computing the bin boundaries for a histogram where bins are a width of 0.1: 

>>> for i in range(20):
...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
3 0.30000000000000004 0.29999999999999999 0.3 0.3
6 0.60000000000000009 0.59999999999999998 0.6 0.6
7 0.70000000000000007 0.69999999999999996 0.7 0.7
12 1.2000000000000002 1.2 1.2 1.2
14 1.4000000000000001 1.3999999999999999 1.4 1.4
17 1.7000000000000002 1.7 1.7 1.7
19 1.9000000000000001 1.8999999999999999 1.9 1.9

For, say, garden variety numbers that aren't full of garbage digits resulting from fp computation, the boundaries computed as 0.1*i are not going to agree with such simple numbers as 1.4 and 0.7.
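
For readers on a current Python, the check above can be sketched as
follows.  The nice() here is a reconstruction, not the original: the
'%.12g' branch is an assumption standing in for the old str() rounding
of floats, and the parameter name is adapted.

```python
def nice(x, leading_digits=0):
    """Round x the way printing would, or to leading_digits via '%e'."""
    if leading_digits == 0:
        # Assumption: 12 significant digits, like str() of a float in
        # the Python of this thread.
        return float("%.12g" % x)
    return float("%.*e" % (leading_digits, x))


# Multiples of 0.1 where the two routes disagree as raw floats but agree
# once both sides are passed through nice().
mismatches = [
    i for i in range(20)
    if (i * 0.1 == i / 10.0) != (nice(i * 0.1) == nice(i / 10.0))
]
print(mismatches)
```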

Would anyone (and I truly don't know the answer) really mind if all floating point values were filtered through whatever lies behind the str() manipulation of floats before the computation was made? I'm not saying that strings would be compared, but that float(str(x)) would be compared to float(str(y)) if x were being compared to y as in x<=y. If this could be done, wouldn't a lot of grief just go away and not require the use of decimal or rational types for many users? 

I understand that the above really is just a patch over the problem, but I'm wondering if it moves the problem far enough away that most users wouldn't have to worry about it. Here, for example, are the first values where the running sum doesn't equal the straight multiple of some step size:

>>> def go(x,n=1000):
...  s=0;i=0
...  while s<n:
...   i+=1;s+=x
...   if nice(s)<>nice(i*x):
...    return i,s,i*x,`s`,`i*x`
>>> for i in range(1,100):
...  print i, go(i/1000.)
...  print
1 (60372 60.3719999999 60.372 60.371999999949999 60.372)

2 (49645 99.2899999999 99.29 99.289999999949998 99.290000000000006)

The soonest the breakdown occurs is at the 22496th multiple of 0.041 for the range given above. By the time someone starts getting into needs of iterating so many times, they will be ready to use the more sophisticated option of nice()--the one which makes it more versatile and less of a patch--the option to round the answers to a given number of leading digits rather than a given decimal precision like round. nice() gives a simple way to think about making a comparison of floats. You just have to ask yourself at what "part per X" do you no longer care whether the numbers are different or not. e.g., for approximately 1 part in 100, use nice(x,2) and nice(y,2) to make the comparison between x and y. Replacing nice(*) with nice(*,6) in the go() defined above produces no discrepancy in values computed the two different ways. Since the cost of str() and '%.*e' is nearly the same, perhaps a default value of leadingDigits=9 would be a good default value, and the float(str()) option could be eliminated from nice. Isn't nice() sort of a poor-man's decimal-type without all the extra baggage?

| In my opinion, if you ever find
| yourself trying to do this, you're not thinking about
| the problem correctly, and your algorithm is simply
| wrong, even if you had infinitely precise floats.

As for real world examples of when this would be nice I will have to rely on others to justify this more heavily. Some quick examples that come to mind are:

* Creating histograms of physical measurements with limited significant digits (i.e., not lots of digits from computation)
* Collecting sets of points within a certain range of a given value (all points within 10% of a given value)
* Stopping iterations when computed errors have fallen below a certain threshold. (For this, getting the stopping condition "right" is not so critical because doing one more iteration usually isn't a problem if an error happens to be a tiny bit larger than the required tolerance. However, the leadingDigits option on nice() allows one to even get this stopping condition right to a limited precision, something like

tol = 1e-5
while 1:
    #do something and compute err
    if nice(err,3)<=nice(tol,3):
        break

By specifying the leadingDigits value of 3, the user is saying that it's fine to quit when the err >= 0.9995. Since there is no additional cost in specifying more digits, a value of 9 could be used as well.

| Ismael at tutor wrote:
| How about overloading Float comparison? 

I'm not so adept at such things--how easy is this to do for all comparisons in a script? in an interactive session? For the latter, if it were easy, perhaps it could be part of a "newbie" mode that could be loaded. I think that some (one above has said so) would rather not have an issue pushed away, they would want to leave things as they are and just learn to work around it, not be given a hand-holding device that is eventually going to let them down anyway. I'm wondering if just having to use the function to make a comparison will be like putting your helmet on before you cycle--a reminder that there may be hazards ahead, proceed with caution. If so, then overloading the Float comparison would be better not done, requiring the "buckling" of the floats within nice().

| If I have understood correctly, float to float comparison must be done
| comparing relative errors, so that when dealing with small but rightly
| represented numbers it won't tell "True" just because they're
| "close". I 
| think your/their solution only covers the case when dealing with "big"
| numbers.

Think of two small numbers that you think might fail the nice test and then use the leadingDigits option (set at something like 6) and see if the problem doesn't disappear. If I understand you correctly, is this such a case: x and y defined below are truly close and nice()'s default comparison would say they are different, but nice(*,6) would say they are the same--the same to the first 6 digits of the exponential representation:

>>> x=1.234567e-7
>>> y=1.234568e-7
>>> nice(x)==nice(y)
>>> nice(x,6)==nice(y,6)

| Chuck Allison wrote on edu-sig:
| There is a reliable way to compute the exact number of floating-point
| "intervals" (one less than the number of FP numbers) between any two
| FP numbers. It is a long-ago solved problem. I have attached a C++
| version. You can't define closeness by a "distance" in a FP system -
| you should use this measure instead (called "ulps" - units in the
| last place). The distance between large FP numbers may always be
| greater than the tolerance you prescribe. The spacing between
| adjacent FP numbers at the top of the scale for IEEE double precision
| numbers is 2^(972) (approx. 10^(293))! I doubt you're going to make
| your tolerance this big. I don't believe newbies can grasp this, but
| they can be taught to get a "feel" for floating-point number systems.
| You can't write reliable FP code without this understanding. See

A very readable 13 page introduction to some floating point issues. Thanks for the reference. The author concludes with,

"Computer science students don't need to be numerical analysts, but they may be called upon to write mathematical software. Indeed, scientists and engineers use tools like Matlab and Mathematica, but who implements these systems? It takes the expertise that only CS graduates have to write such sophisticated software. Without knowledge of the intricacies of floating-point computation, they will make a mess of things. In this paper I have surveyed the basics that every CS graduate should have mastered before they can be trusted in a workplace that does any kind of computing with real numbers."

So perhaps this brings us back to the original comment that "fp issues are a learning opportunity." They are. The question I have is "how soon do they need to run into them?" Is decreasing the likelihood that they will see the problem (but not eliminate it) a good thing for the python community or not?
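
The "ulps" measure quoted above can be sketched in Python as well.  This
is not the C++ version mentioned in the quote; it is a small
illustration that assumes IEEE 754 doubles and finite, non-NaN inputs,
and the function names are made up.

```python
import struct


def _ordered_bits(x):
    """Map a double to an integer that sorts in the same order as the float."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Non-negative floats already sort correctly as signed integers.
    # For negative floats, strip the sign bit and negate the magnitude.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulps_between(a, b):
    """Number of floating-point intervals between a and b."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


print(ulps_between(0.1 * 3, 0.3))
```

Two values that "should" be equal but were computed by different routes
are typically a handful of ulps apart, regardless of their magnitude.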


From guido at  Mon Feb 13 18:55:56 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 09:55:56 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should suffice, at least initially.

I also wonder if having a b"..." literal would just add more confusion
-- bytes are not characters, but b"..." makes it appear as if they
are.


On 2/11/06, Bengt Richter <bokr at> wrote:
> On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <guido at> wrote:
> >> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at> wrote:
> >> >The backwards compatibility problems *seem* to be relatively minor.
> >> >I only found one instance of breakage in the standard library.  Note
> >> >that my patch does not change PyObject_Str(); that would break
> >> >massive amounts of code.  Instead, I introduce a new function:
> >> >PyString_New().  I'm not crazy about the name but I couldn't think
> >> >of anything better.
> >
> >On 2/10/06, Bengt Richter <bokr at> wrote:
> >> Should this not be coordinated with PEP 332?
> >
> >Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
> >
> I'd be glad to add my thoughts, but first of course it's Skip's PEP,
> and Martin casts a long shadow when it comes to character coding issues
> that I suspect will have to be considered.
> (E.g., if there is a b'...' literal for bytes, the actual characters of
> the source code itself that the literal is being expressed in could be ascii
> or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the source
> is at least temporarily normalized to Unicode, and then re-encoded (except now
> for string literals?) per coding cookie or other encoding inference. (I may be
> out of date, gotta catch up).
> If one way or the other a string literal is in Unicode, then presumably so is
> a byte string b'...' literal -- i.e. internally u"b'...'" just before
> being turned into bytes.
> Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
> for non-ascii and non-printables, to define the full 8 bits without encoding error?
> Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
> to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
> (but how does this play with str being able to produce unicode? And when do these changes happen?)
> I guess I'm getting ahead of myself ;-)
> So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
> going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.
> I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
> that anyone could then improve further. I don't know about an early deadline. I don't want
> to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
> time more effectively ;-)
> I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
> know who else might be interested...
> Regards,
> Bengt Richter

--Guido van Rossum (home page:

From mal at  Mon Feb 13 19:12:18 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 13 Feb 2006 19:12:18 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> One recommendation: for starters, I'd much rather see the bytes type
> standardized without a literal notation. There should be are lots of
> ways to create bytes objects from string objects, with specific
> explicit encodings, and those should suffice, at least initially.
> I also wonder if having a b"..." literal would just add more confusion
> -- bytes are not characters, but b"..." makes it appear as if they
> are.


Given that we have a source code encoding which would need
to be honored, b"..." doesn't really make all that much sense
(unless you always use hex escapes).

Note that if we drop the string type, all codecs which currently
return strings will have to return bytes. This gives you a pretty
exhaustive way of defining your binary literals in Python :-)

Here's one:

	data = "abc".encode("latin-1")

To simplify things we might want to have


do the above encoding per default.

> --Guido
> On 2/11/06, Bengt Richter <bokr at> wrote:
>> On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <guido at> wrote:
>>>> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <nas at> wrote:
>>>>> The backwards compatibility problems *seem* to be relatively minor.
>>>>> I only found one instance of breakage in the standard library.  Note
>>>>> that my patch does not change PyObject_Str(); that would break
>>>>> massive amounts of code.  Instead, I introduce a new function:
>>>>> PyString_New().  I'm not crazy about the name but I couldn't think
>>>>> of anything better.
>>> On 2/10/06, Bengt Richter <bokr at> wrote:
>>>> Should this not be coordinated with PEP 332?
>>> Probably.. But that PEP is rather incomplete. Wanna work on fixing that?
>> I'd be glad to add my thoughts, but first of course it's Skip's PEP,
>> and Martin casts a long shadow when it comes to character coding issues
>> that I suspect will have to be considered.
>> (E.g., if there is a b'...' literal for bytes, the actual characters of
>> the source code itself that the literal is being expressed in could be ascii
>> or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the source
>> is at least temporarily normalized to Unicode, and then re-encoded (except now
>> for string literals?) per coding cookie or other encoding inference. (I may be
>> out of date, gotta catch up).
>> If one way or the other a string literal is in Unicode, then presumably so is
>> a byte string b'...' literal -- i.e. internally u"b'...'" just before
>> being turned into bytes.
>> Should that then be an internal straight u"b'...'".encode('byte') with default ascii + escapes
>> for non-ascii and non-printables, to define the full 8 bits without encoding error?
>> Should unicode be encodable into byte via a specific encoding? E.g., u'abc'.encode('byte','latin1'),
>> to distinguish producing a mutable byte string vs an immutable str type as with u'abc'.encode('latin1').
>> (but how does this play with str being able to produce unicode? And when do these changes happen?)
>> I guess I'm getting ahead of myself ;-)
>> So I would first ask Skip what he'd like to do, and Martin for some hints on reading, to avoid
>> going down paths he already knows lead to brick walls ;-) And I need to think more about PEP 349.
>> I would propose to do the reading they suggest, and edit up a new version of pep-0332.txt
>> that anyone could then improve further. I don't know about an early deadline. I don't want
>> to over-commit, as time and energies vary. OTOH, as you've noticed, I could be spending my
>> time more effectively ;-)
>> I changed the thread title, and will wait for some signs from you, Skip, Martin, Neil, and I don't
>> know who else might be interested...
>> Regards,
>> Bengt Richter
> --
> --Guido van Rossum (home page:

Marc-Andre Lemburg


From pje at  Mon Feb 13 19:19:04 2006
From: pje at (Phillip J. Eby)
Date: Mon, 13 Feb 2006 13:19:04 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <> <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
>One recommendation: for starters, I'd much rather see the bytes type
>standardized without a literal notation. There should be are lots of
>ways to create bytes objects from string objects, with specific
>explicit encodings, and those should suffice, at least initially.
>I also wonder if having a b"..." literal would just add more confusion
>-- bytes are not characters, but b"..." makes it appear as if they
>are.

Why not just have the constructor be:

     bytes(initializer [,encoding])

Where initializer must be either an iterable of suitable integers, or a 
unicode/string object.  If the latter (i.e., it's a basestring), the 
encoding argument would then be required.  Then, there's no need for 
special codec support for the bytes type, since you call bytes on the thing 
to be encoded.  And of course, no need for a 'b' literal.
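
For comparison, the bytes constructor in present-day Python 3 behaves
much like this proposal.  A short sketch (the variable names are only
for illustration):

```python
# From an iterable of integers in range(256): no encoding involved.
from_ints = bytes([97, 98, 99])

# From a text string: the encoding has to be given explicitly.
from_text = bytes("abc", "latin-1")

# Leaving the encoding out for a text string is rejected.
try:
    bytes("abc")
    encoding_required = False
except TypeError:
    encoding_required = True
```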

From guido at  Mon Feb 13 19:52:03 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 10:52:03 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <> <dsgem7$10u$>
	<> <dsgqlg$7ml$>
Message-ID: <>

On 2/10/06, Mark Russell <mrussell at> wrote:
> On 10 Feb 2006, at 12:45, Nick Coghlan wrote:
> An alternative would be to call it "__discrete__", as that is the key
> characteristic of an indexing type - it consists of a sequence of discrete
> values that can be isomorphically mapped to the integers.
> Another alternative: __as_ordinal__.  Wikipedia describes ordinals as
> "numbers used to denote the position in an ordered sequence" which seems a
> pretty precise description of the intended result.  The "as_" prefix also
> captures the idea that this should be a lossless conversion.

Aren't ordinals generally assumed to be non-negative? The numbers used
as slice or sequence indices can be negative!
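
To make the point about negative indices concrete, here is a small sketch that assumes the __index__ hook behaves as it does in present-day Python. The FromEnd class is hypothetical and exists only for illustration.

```python
class FromEnd:
    """Hypothetical int-like type that counts positions from the end."""

    def __init__(self, offset):
        self.offset = offset

    def __index__(self):
        # A negative result is perfectly legal for indexing and slicing.
        return -self.offset


letters = ["a", "b", "c", "d"]
last = letters[FromEnd(1)]     # same as letters[-1]
tail = letters[FromEnd(2):]    # same as letters[-2:]
```

An "ordinal" in the non-negative sense would rule out values like these.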

Also, I don't buy the reason for 'as'; I don't see how this word would
require the conversion to be lossless.

The PEP continues to use __index__ and I'm happy with that.

--Guido van Rossum (home page:

From guido at  Mon Feb 13 20:12:42 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 11:12:42 -0800
Subject: [Python-Dev] ssize_t branch (Was: release plan for 2.5 ?)
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/12/06, "Martin v. Löwis" <martin at> wrote:
> Neal Norwitz wrote:
> > I'm tempted to say we should merge now.  I know the branch works on
> > 64-bit boxes.  I can test on a 32-bit box if Martin hasn't already.
> > There will be a lot of churn fixing problems, but maybe we can get
> > more people involved.
> The ssize_t branch has now all the API I want it to have. I just
> posted the PEP to comp.lang.python, maybe people have additional
> things they consider absolutely necessary.
> There are two aspects left, and both can be done after the merge:
> - a lot of modules still need adjustments, to really support
>   64-bit collections. This shouldn't cause any API changes, AFAICT.
> - the printing of Py_ssize_t values should be supported. I think
>   Tim proposed to provide the 'z' formatter across platforms.
>   This is a new API, but it's a pure extension, so it can be
>   done in the trunk.

Great news. I'm looking forward to getting this over with!

> I would like to avoid changing APIs after the merge to the trunk
> has happened; I remember Guido saying (a few years ago) that this
> change must be a single large change, rather than many small incremental
> changes. I agree, and I hope I have covered everything that needs
> to be covered.

Let me qualify that a bit -- I'd be okay with one honking big change
followed by some minor adjustments. I'd say that, since you've already
done so much in the branch, we're quickly approaching the point where
the extra testing we get from merging soon out-benefits the problems
some folks may experience due to the branch not being perfect yet.

--Guido van Rossum (home page:

From python at  Mon Feb 13 21:27:28 2006
From: python at (Raymond Hettinger)
Date: Mon, 13 Feb 2006 15:27:28 -0500
Subject: [Python-Dev] nice()
References: <>
Message-ID: <005a01c630db$e93fa030$b83efea9@RaymondLaptop1>

Please do not spam multiple mail lists with these posts (edu-sig, 
python-dev, and tutor).


----- Original Message ----- 
From: "Smith" <smiles at>
To: <python-dev at>
Cc: <edu-sig at>; <tutor at>
Sent: Monday, February 13, 2006 12:10 PM
Subject: Re: [Python-Dev] nice() 

From jimjjewett at  Mon Feb 13 21:28:37 2006
From: jimjjewett at (Jim Jewett)
Date: Mon, 13 Feb 2006 15:28:37 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
Message-ID: <>


> I don't like __true_int__ very much. Personally,
> I'm fine with calling it __index__

index is OK, but is there a reason __integer__ would be
rejected?

__int__ roughly follows the low-level C implementation,
and may do odd things on unusual input.

__integer__ properly creates a conceptual integer, so
it won't lose or corrupt information (unless the class
writer does this intentionally).


From guido at  Mon Feb 13 21:32:15 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 12:32:15 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/13/06, Jim Jewett <jimjjewett at> wrote:
> Guido:
> > I don't like __true_int__ very much. Personally,
> > I'm fine with calling it __index__
> index is OK, but is there a reason __integer__ would be
> rejected?
> __int__ roughly follows the low-level C implementation,
> and may do odd things on unusual input.
> __integer__ properly creates a conceptual integer, so
> it won't lose or corrupt information (unless the class
> writer does this intentionally).

Given the number of folks who misappreciate the difference between
__getattr__ and __getattribute__, I'm not sure I'd want to encourage
using abbreviated and full forms of the same term in the same context.
When confronted with the existence of __int__ and __integer__ I can
see plenty of confusion ahead.

--Guido van Rossum (home page:

From guido at  Mon Feb 13 21:34:52 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 12:34:52 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

On 2/13/06, Phillip J. Eby <pje at> wrote:
> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
> >One recommendation: for starters, I'd much rather see the bytes type
> >standardized without a literal notation. There should be lots of
> >ways to create bytes objects from string objects, with specific
> >explicit encodings, and those should suffice, at least initially.
> >
> >I also wonder if having a b"..." literal would just add more confusion
> >-- bytes are not characters, but b"..." makes it appear as if they
> >are.
> Why not just have the constructor be:
>      bytes(initializer [,encoding])
> Where initializer must be either an iterable of suitable integers, or a
> unicode/string object.  If the latter (i.e., it's a basestring), the
> encoding argument would then be required.  Then, there's no need for
> special codec support for the bytes type, since you call bytes on the thing
> to be encoded.  And of course, no need for a 'b' literal.

It'd be cruel and unusual punishment though to have to write

  bytes("abc", "Latin-1")

I propose that the default encoding (for basestring instances) ought
to be "ascii" just like everywhere else. (Meaning, it should really be
the system default encoding, which defaults to "ascii" and is
intentionally hard to change.)
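
A short illustration of what an "ascii" default would mean in practice. It uses present-day Python 3 str.encode as a stand-in for the proposed bytes() call, which is an assumption; the sample string is arbitrary.

```python
plain = "abc"
accented = "caf\u00e9"   # 'cafe' with an acute accent on the final letter

plain_bytes = plain.encode("ascii")       # pure ASCII text encodes fine

try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False                      # the 7-bit default refuses to guess

latin1_bytes = accented.encode("latin-1") # an explicit 8-bit encoding works
```

So a 7-bit default keeps the common case short while forcing an explicit choice as soon as non-ASCII data appears.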

--Guido van Rossum (home page:

From guido at  Mon Feb 13 21:40:57 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 12:40:57 -0800
Subject: [Python-Dev] still available
In-Reply-To: <dsq741$4un$>
References: <dsq741$4un$>
Message-ID: <>

Shouldn't be removed? It seems to add more confusion
than anything, especially since most links on continue to
point to

On 2/13/06, Georg Brandl <g.brandl at> wrote:
> The above docs are from August 2005 while is current.
> Shouldn't the old docs be removed?

(Now that I work for Google I realize more than ever before the
importance of keeping URLs stable; PageRank(tm) numbers don't get
transferred as quickly as contents. I have this worry too in the
context of the redesign; 301 permanent redirect is *not*
going to help PageRank of the new page.)

--Guido van Rossum (home page:

From fdrake at  Mon Feb 13 21:52:44 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Mon, 13 Feb 2006 15:52:44 -0500
Subject: [Python-Dev] still available
In-Reply-To: <dsq741$4un$>
References: <dsq741$4un$>
Message-ID: <>

On Monday 13 February 2006 10:03, Georg Brandl wrote:
 > The above docs are from August 2005 while is current.
 > Shouldn't the old docs be removed?

I'm afraid I've generally been too busy to chime in much on this topic, but 
I've spent a bit of time thinking about it, and would like to keep on top of 
the issue still.

The automatically-maintained version of the development docs is certainly 
preferable to the manually-maintained-by-me version, and I've updated the 
link from to refer to that version for now.  However, I 
do have some concerns about how this is all structured still.

One of the goals of was to be able to do a Google site-search 
and only see the current version.  Having multiple versions on that site is 
contrary to that purpose.  I'd like to see the development version(s) move 
back to being in the hierarchy.

What I would also like to see is to have an automatically-updated version for 
each of the maintainer versions of Python, as well as the development trunk.  
That would mean two versions at this point (2.4.x, 2.5.x); only one of those 
is currently handled automatically.


Fred L. Drake, Jr.   <fdrake at>

From fredrik at  Mon Feb 13 21:53:58 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 13 Feb 2006 21:53:58 +0100
Subject: [Python-Dev] moving content around (Re: still available)
References: <dsq741$4un$>
Message-ID: <dsqrl8$v9t$>

Guido van Rossum wrote:

> (Now that I work for Google I realize more than ever before the
> importance of keeping URLs stable; PageRank(tm) numbers don't get
> transferred as quickly as contents. I have this worry too in the
> context of the redesign; 301 permanent redirect is *not*
> going to help PageRank of the new page.)

so what's the best way to move stuff around?

wikipedia seems to display the content from the "new" location under
the old URL, but with a small blurb at the top that says "redirected
from <old url>", e.g.

(not sure if it's done that way to avoid HTTP roundtrips, or for some
obscure googlerank reason...)


From jimjjewett at  Mon Feb 13 21:55:14 2006
From: jimjjewett at (Jim Jewett)
Date: Mon, 13 Feb 2006 15:55:14 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

Is there a reason __integer__ would be rejected?

Guido van Rossum answered:

> Given the number of folks who misappreciate the difference between
> __getattr__ and __getattribute__, I'm not sure I'd want to encourage
> using abbreviated and full forms of the same term in the same context.
> When confronted with the existence of __int__ and __integer__ I can
> see plenty of confusion ahead.

I see this case as slightly different.

getattr and getattribute are both things you might
reasonably want to do.  __int__ is something you
probably shouldn't be doing very often anymore;
it is being kept for backwards compatibility.

Switching getattr and getattribute will cause bugs,
which may be hard to diagnose, even for people
who might reasonably be using the hooks.  Switching
__int__ and (newname) won't matter, unless
__int__ was already doing something unexpected.
Since backwards compatibility means we can't
prevent __int__ from doing the unexpected, a
similar name might be *good* -- at least it would
tip people off that __int__ might not be what they

I can't think of any way to associate getattr vs
getattribute with timing or precedence.  I already
associate int with a specific C datatype and integer
with something more abstract.  (I'm not sure the
new method is a better match for my integer
concept, and it probably isn't a better match
for java.lang.Integer, but ... the separation is there.)


From guido at  Mon Feb 13 22:03:07 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 13:03:07 -0800
Subject: [Python-Dev] moving content around (Re: still available)
In-Reply-To: <dsqrl8$v9t$>
References: <dsq741$4un$>
Message-ID: <>

On 2/13/06, Fredrik Lundh <fredrik at> wrote:
> Guido van Rossum wrote:
> > (Now that I work for Google I realize more than ever before the
> > importance of keeping URLs stable; PageRank(tm) numbers don't get
> > transferred as quickly as contents. I have this worry too in the
> > context of the redesign; 301 permanent redirect is *not*
> > going to help PageRank of the new page.)
> so what's the best way to move stuff around?

I don't know; my point was to avoid needless moving rather than giving
a best practice for moving.

> wikipedia seems to display the content from the "new" location under
> the old URL, but with a small blurb at the top that says "redirected
> from <old url>", e.g.
> (not sure if it's done that way to avoid HTTP roundtrips, or for some
> obscure googlerank reason...)

Can't say I understand that particular example. Wikipedia has
different requirements though; there are aliases (e.g. homonyms,
synonyms) that won't go away. For we're looking at
minimizing the URL space churn.

--Guido van Rossum (home page:

From guido at  Mon Feb 13 22:09:59 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 13:09:59 -0800
Subject: [Python-Dev] PEP 351
In-Reply-To: <001501c62f57$2b070b60$6a01a8c0@RaymondLaptop1>
References: <dsbc3h$rct$>
Message-ID: <>

I've rejected PEP 351, with a reference to this thread as the rationale.

--Guido van Rossum (home page:

From guido at  Mon Feb 13 22:11:36 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 13:11:36 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/12/06, Greg Ewing <greg.ewing at> wrote:
>  > [A large head-exploding set of rules]
> Blarg.
> Const - Just Say No.

+1


--Guido van Rossum (home page:

From jimjjewett at  Mon Feb 13 22:16:17 2006
From: jimjjewett at (Jim Jewett)
Date: Mon, 13 Feb 2006 16:16:17 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
Message-ID: <>

Travis wrote:

>  The patch adds a new API function int PyObject_AsIndex(obj)

How did you decide between int and long?

Why not ssize_t?

Also, if index is being added as a builtin, should the failure
result be changed?  I'm thinking that this may become a
replacement for isinstance(val, (int, long)).  If so, it might
be nice not to raise errors, or at least to raise a more
specific subclass.  (Catching a TypeError and then
checking the message string ... does not seem clean.)


From jimjjewett at  Mon Feb 13 22:24:52 2006
From: jimjjewett at (Jim Jewett)
Date: Mon, 13 Feb 2006 16:24:52 -0500
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
Message-ID: <>

Travis wrote:

>  The patch adds a new API function int PyObject_AsIndex(obj)

How did you decide between int and long?

Why not ssize_t?

From guido at  Mon Feb 13 22:30:26 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 13:30:26 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/13/06, Jim Jewett <jimjjewett at> wrote:
> Travis wrote:
> >  The patch adds a new API function int PyObject_AsIndex(obj)
> How did you decide between int and long?
> Why not ssize_t?

It should be the same type used everywhere for indexing. In the svn
HEAD that's int. Once PEP 353 lands it should be ssize_t. I've made
Travis aware of this issue already.

> Also, if index is being added as a builtin, should the failure
> result be changed?

I don't like to add a built-in index() at this point; mostly because
of Occam's razor (we haven't found a need).

> I'm thinking that this may become a
> replacement for isinstance(val, (int, long)).

But only if it's okay if values > sys.maxint (or some other constant
indicating the limit of ssize_t) are not required to be supported.

> If so, it might
> be nice not to raise errors, or at least to raise a more
> specific subclass.  (Catching a TypeError and then
> checking the message string ... does not seem clean.)

I'm not sure what you mean. How could index(x) ever replace
isinstance(x, (int, long)) without raising an exception? Surely
index("abc") *should* raise an exception.
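
For reference, a sketch of how this looks with operator.index, the function present-day Python provides for the __index__ protocol. The as_index wrapper is hypothetical; it shows one way a caller could get the non-raising check asked about above. The behaviour shown for very large values is that of CPython.

```python
import operator


def as_index(value):
    """Hypothetical helper: return value as an int, or None if not int-like."""
    try:
        return operator.index(value)
    except TypeError:
        return None


items = [10, 20, 30]
huge = 2 ** 100                 # far beyond any ssize_t

clipped = items[:huge]          # slice bounds are clipped to the length
try:
    items[huge]
    huge_item_ok = True
except IndexError:
    huge_item_ok = False        # a single out-of-range index is rejected
```

Strings and floats come back as None from as_index, so the exception stays the primary signal and the wrapper is only a convenience.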

--Guido van Rossum (home page:

From amk at  Mon Feb 13 23:41:00 2006
From: amk at (A.M. Kuchling)
Date: Mon, 13 Feb 2006 17:41:00 -0500
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$> <>
Message-ID: <>

On Mon, Feb 13, 2006 at 03:52:44PM -0500, Fred L. Drake, Jr. wrote:
> What I would also like to see is to have an automatically-updated
> version for each of the maintainer versions of Python, as well as
> the development trunk.  That would mean two versions at this point
> (2.4.x, 2.5.x); only one of those is currently handled
> automatically.

If Thomas could set up a wildcard DNS of some sort, would it be a good
idea to have lots of hostnames, e.g.,, etc.?  We could probably make it work in Apache
with mod_rewrite so that we aren't endlessly tweaking the config file
as new versions are released.


From aahz at  Mon Feb 13 22:43:45 2006
From: aahz at (Aahz)
Date: Mon, 13 Feb 2006 13:43:45 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Feb 13, 2006, Jim Jewett wrote:
> getattr and getattribute are both things you might reasonably want to
> do. __int__ is something you probably shouldn't be doing very often
> anymore; it is being kept for backwards compatibility.

And how do you convert a float to an int?  __int__ is NOT going away; the
sole purpose of __index__ is to enable sequence index functionality and
similar use-cases for int-like objects that do not subclass from int.
(For example, one might want to allow an enumeration type to index into
a list.)
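
A minimal sketch of the enumeration use-case, assuming the __index__ hook as it exists in present-day Python. The Color class and the palette data are hypothetical.

```python
class Color:
    """Hypothetical enumeration-like type that does not subclass int."""

    def __init__(self, name, position):
        self.name = name
        self.position = position

    def __index__(self):
        # Lossless: the object really does denote exactly this integer.
        return self.position


RED, GREEN, BLUE = Color("red", 0), Color("green", 1), Color("blue", 2)
palette = ["#ff0000", "#00ff00", "#0000ff"]

green_hex = palette[GREEN]      # works through __index__
truncated = int(2.7)            # __int__ is allowed to discard information

try:
    palette[2.7]
    float_allowed = True
except TypeError:
    float_allowed = False       # floats have __int__ but no __index__
```

The float case is the reason a separate hook is needed: reusing __int__ for indexing would silently accept palette[2.7].
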
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From jeremy at  Mon Feb 13 22:49:44 2006
From: jeremy at (Jeremy Hylton)
Date: Mon, 13 Feb 2006 16:49:44 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

It sounds like the right answer for Python is to change the signature
of PyArg_ParseTupleAndKeywords() back.  We'll fix it when C fixes its
const rules <wink>.


On 2/13/06, Guido van Rossum <guido at> wrote:
> On 2/12/06, Greg Ewing <greg.ewing at> wrote:
> >  > [A large head-exploding set of rules]
> >
> > Blarg.
> >
> > Const - Just Say No.
> +1
> --
> --Guido van Rossum (home page:

From guido at  Mon Feb 13 22:52:54 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 13:52:54 -0800
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>


On 2/13/06, Jeremy Hylton <jeremy at> wrote:
> It sounds like the right answer for Python is to change the signature
> of PyArg_ParseTupleAndKeywords() back.  We'll fix it when C fixes its
> const rules <wink>.
> Jeremy
> On 2/13/06, Guido van Rossum <guido at> wrote:
> > On 2/12/06, Greg Ewing <greg.ewing at> wrote:
> > >  > [A large head-exploding set of rules]
> > >
> > > Blarg.
> > >
> > > Const - Just Say No.
> >
> > +1
> >
> > --
> > --Guido van Rossum (home page:

--Guido van Rossum (home page:

From mal at  Mon Feb 13 22:55:01 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 13 Feb 2006 22:55:01 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/13/06, Phillip J. Eby <pje at> wrote:
>> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
>>> One recommendation: for starters, I'd much rather see the bytes type
>>> standardized without a literal notation. There should be lots of
>>> ways to create bytes objects from string objects, with specific
>>> explicit encodings, and those should suffice, at least initially.
>>> I also wonder if having a b"..." literal would just add more confusion
>>> -- bytes are not characters, but b"..." makes it appear as if they
>>> are.
>> Why not just have the constructor be:
>>      bytes(initializer [,encoding])
>> Where initializer must be either an iterable of suitable integers, or a
>> unicode/string object.  If the latter (i.e., it's a basestring), the
>> encoding argument would then be required.  Then, there's no need for
>> special codec support for the bytes type, since you call bytes on the thing
>> to be encoded.  And of course, no need for a 'b' literal.
> It'd be cruel and unusual punishment though to have to write
>   bytes("abc", "Latin-1")
> I propose that the default encoding (for basestring instances) ought
> to be "ascii" just like everywhere else. (Meaning, it should really be
> the system default encoding, which defaults to "ascii" and is
> intentionally hard to change.)

We're talking about Py3k here: "abc" will be a Unicode string,
so why restrict the conversion to 7 bits when you can have 8 bits
without any conversion problems ?

While we're at it: I'd suggest that we remove the auto-conversion
from bytes to Unicode in Py3k and the default encoding along with
it. In Py3k the standard lib will have to be Unicode compatible
anyway and string parser markers like "s#" will have to go away
as well, so there's not much need for this anymore.

(Maybe a bit radical, but I guess that's what Py3k is meant for.)

Marc-Andre Lemburg


From jeremy at  Mon Feb 13 22:58:33 2006
From: jeremy at (Jeremy Hylton)
Date: Mon, 13 Feb 2006 16:58:33 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/10/06, "Martin v. Löwis" <martin at> wrote:
> Jeremy Hylton wrote:
> > Ok.  I reviewed the original problem and you're right, the problem was
> > not that it failed outright but that it produced a warning about the
> > deprecated conversion:
> > warning: deprecated conversion from string constant to 'char*'
> >
> > I work at a place that takes the same attitude as python-dev about
> > warnings:  They're treated as errors and you can't check in code that
> > the compiler generates warnings for.
> In that specific case, I think the compiler's warning should be turned
> off; it is a bug in the compiler if that specific warning cannot be
> turned off separately.

The compiler in question is gcc and the warning can be turned off with
-Wno-write-strings.  I think we'd be better off leaving that option
on, though.  This warning will help me find places where I'm passing a
string literal to a function that does not take a const char*.  That's
valuable, not insensate.


> While it is true that the conversion is deprecated, the C++ standard
> defines this as
> "Normative for the current edition of the Standard, but not guaranteed
> to be part of the Standard in future revisions."
> The current version is from 1998. I haven't been following closely,
> but I believe there are no plans to actually remove the feature
> in the next revision.
> FWIW, Annex D also defines these features as deprecated:
> - the use of "static" for objects in namespace scope (AFAICT
>   including C file-level static variables and functions)
> - C library headers (i.e. <stdio.h>)
> Don't you get a warning when including Python.h, because that
> includes <limits.h>?
> > Nonetheless, the consensus on the c++ sig and python-dev at the time
> > was to fix Python.  If we don't allow warnings in our compilations, we
> > shouldn't require our users to accept warnings in theirs.
> We don't allow warnings for "major compilers". This specific compiler
> appears flawed (or your configuration of it).
> Regards,
> Martin

From thomas at  Mon Feb 13 23:04:55 2006
From: thomas at (Thomas Wouters)
Date: Mon, 13 Feb 2006 23:04:55 +0100
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$> <>
Message-ID: <>

On Mon, Feb 13, 2006 at 05:41:00PM -0500, A.M. Kuchling wrote:
> On Mon, Feb 13, 2006 at 03:52:44PM -0500, Fred L. Drake, Jr. wrote:
> > What I would also like to see is to have an automatically-updated
> > version for each of the maintainer versions of Python, as well as
> > the development trunk.  That would mean two versions at this point
> > (2.4.x, 2.5.x); only one of those is currently handled
> > automatically.

> If Thomas could set up a wildcard DNS of some sort,

That wouldn't be a problem. I fear what it'll do to the PageRank though ;-)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fuzzyman at  Mon Feb 13 23:14:23 2006
From: fuzzyman at (Michael Foord)
Date: Mon, 13 Feb 2006 22:14:23 +0000
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

Guido van Rossum wrote:
> Shouldn't be removed? It seems to add more confusion
> than anything, especially since most links on continue to
> point to
All the web says about 1200 links into the subdomain. 
(Different to the google link feature, which only shows links to a 
specific URL I believe.)

It's where I link to as well. Be a shame to lose it. ;-)

Michael Foord

> On 2/13/06, Georg Brandl <g.brandl at> wrote:
>> The above docs are from August 2005 while is current.
>> Shouldn't the old docs be removed?
> (Now that I work for Google I realize more than ever before the
> importance of keeping URLs stable; PageRank(tm) numbers don't get
> transferred as quickly as contents. I have this worry too in the
> context of the redesign; 301 permanent redirect is *not*
> going to help PageRank of the new page.)
> --
> --Guido van Rossum (home page:

From pje at  Mon Feb 13 23:15:05 2006
From: pje at (Phillip J. Eby)
Date: Mon, 13 Feb 2006 17:15:05 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
	<dsjrfp$g72$> <>
Message-ID: <>

At 10:55 PM 2/13/2006 +0100, M.-A. Lemburg wrote:
>Guido van Rossum wrote:
> > On 2/13/06, Phillip J. Eby <pje at> wrote:
> >> At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
> >>> One recommendation: for starters, I'd much rather see the bytes type
> >>> standardized without a literal notation. There should be lots of
> >>> ways to create bytes objects from string objects, with specific
> >>> explicit encodings, and those should suffice, at least initially.
> >>>
> >>> I also wonder if having a b"..." literal would just add more confusion
> >>> -- bytes are not characters, but b"..." makes it appear as if they
> >>> are.
> >> Why not just have the constructor be:
> >>
> >>      bytes(initializer [,encoding])
> >>
> >> Where initializer must be either an iterable of suitable integers, or a
> >> unicode/string object.  If the latter (i.e., it's a basestring), the
> >> encoding argument would then be required.  Then, there's no need for
> >> special codec support for the bytes type, since you call bytes on the thing
> >> to be encoded.  And of course, no need for a 'b' literal.
> >
> > It'd be cruel and unusual punishment though to have to write
> >
> >   bytes("abc", "Latin-1")
> >
> > I propose that the default encoding (for basestring instances) ought
> > to be "ascii" just like everywhere else. (Meaning, it should really be
> > the system default encoding, which defaults to "ascii" and is
> > intentionally hard to change.)
>We're talking about Py3k here: "abc" will be a Unicode string,
>so why restrict the conversion to 7 bits when you can have 8 bits
>without any conversion problems ?

Actually, I thought we were talking about adding bytes() in 2.5.

However, now that you've brought this up, it actually makes perfect sense 
to just use latin-1 as the effective encoding for both strings and 
unicode.  In Python 2.x, strings are byte strings by definition, so it's 
only in 3.0 that an encoding would be required.  And again, latin1 is a 
reasonable, roundtrippable default encoding.

So, it sounds like making the encoding default to latin-1 would be a 
reasonably safe approach in both 2.x and 3.x.
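
The round-trip property claimed for latin-1 can be checked directly. This is a sketch in present-day Python 3, where bytes and str are already separate types; the variable names are illustrative only.

```python
every_byte = bytes(range(256))

# Decoding as latin-1 maps byte value N to code point N, for all 256 values.
as_text = every_byte.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# Encoding the result gives back exactly the original bytes.
round_tripped = as_text.encode("latin-1")

try:
    "\u20ac".encode("latin-1")   # the euro sign lies outside latin-1
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```

The round trip is lossless for byte data, but text containing code points above 255 still cannot be encoded, so the default is safe only in the bytes-to-text-to-bytes direction.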

>While we're at it: I'd suggest that we remove the auto-conversion
>from bytes to Unicode in Py3k and the default encoding along with
>it. In Py3k the standard lib will have to be Unicode compatible
>anyway and string parser markers like "s#" will have to go away
>as well, so there's not much need for this anymore.

I thought all this was already in the plan for 3.0, but maybe I assume too 
much.  :)

From mal at  Mon Feb 13 23:18:23 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 13 Feb 2006 23:18:23 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Tim Peters wrote:
> [Jeremy]
>>>> I added some const to several API functions that take char* but
>>>> typically called by passing string literals.
> [Tim]
>>> If he had _stuck_ to that, we wouldn't be having this discussion :-)
>>> (that is, nobody passes string literals to
>>> PyArg_ParseTupleAndKeywords's kws argument).
> [Jeremy]
>> They are passing arrays of string literals.  In my mind, that was a
>> nearly equivalent use case.  I believe the C++ compiler complains
>> about passing an array of string literals to char**.
> It's the consequences:  nobody complains about tacking "const" on to a
> former honest-to-God "char *" argument that was in fact not modified,
> because that's not only helpful for C++ programmers, it's _harmless_
> for all programmers.  For example, nobody could sanely object (and
> nobody did :-)) to adding const to the attribute-name argument in
> PyObject_SetAttrString().  Sticking to that creates no new problems
> for anyone, so that's as far as I ever went.

Well, it broke my C extensions... I now have this in my code:

/* The keyword array changed to const char* in Python 2.5 */
#if PY_VERSION_HEX >= 0x02050000
# define Py_KEYWORDS_STRING_TYPE const char
#else
# define Py_KEYWORDS_STRING_TYPE char
#endif
static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
if (!PyArg_ParseTupleAndKeywords(args,kws,format,kwslist,&a1))
    goto onError;

The crux is that code which should be portable across Python
versions won't work otherwise: you either get Python 2.5 xor
Python 2.x (for x < 5) compatibility.

Not too happy about it, but then compared to the ssize_t
changes and the relative imports PEP, this one is an easy
one to handle.

Marc-Andre Lemburg


From fdrake at  Mon Feb 13 23:29:11 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Mon, 13 Feb 2006 17:29:11 -0500
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

On Monday 13 February 2006 15:40, Guido van Rossum wrote:
 > Shouldn't be removed? It seems to add more confusion
 > than anything, especially since most links on continue to
 > point to

was created specifically to make searching the most recent 
"stable" version of the docs easier (using Google's site: modifier, no less).  
I don't know what the link count statistics say (other than what you 
mention), and don't know which gets hit more often, but I still think it's a 
reasonable approach.

I've been switching links to point to whenever I find an older 
link that points to; other parts of the doc/ area 
from the site didn't move, and perhaps that's a problem that should be 

 > (Now that I work for Google I realize more than ever before the
 > importance of keeping URLs stable; PageRank(tm) numbers don't get
 > transferred as quickly as contents. I have this worry too in the
 > context of the redesign; 301 permanent redirect is *not*
 > going to help PageRank of the new page.)

Maybe I'm just not getting why that's relevant.


Fred L. Drake, Jr.   <fdrake at>

From fredrik at  Mon Feb 13 23:45:18 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 13 Feb 2006 23:45:18 +0100
Subject: [Python-Dev] still available
References: <dsq741$4un$><>
Message-ID: <dsr260$ol4$>

Fred L. Drake, Jr. wrote:

> was created specifically to make searching the most recent
> "stable" version of the docs easier (using Google's site: modifier, no less).
> I don't know what the link count statistics say (other than what you
> mention), and don't know which gets hit more often

I've been looking into page stats for the AltPyDotOrgCms activity; from
what I can tell, it's evenly distributed (~55% on,
45% on


From mal at  Tue Feb 14 00:03:35 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 14 Feb 2006 00:03:35 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>	<>
Message-ID: <>

Phillip J. Eby wrote:
>>>> Why not just have the constructor be:
>>>>      bytes(initializer [,encoding])
>>>> Where initializer must be either an iterable of suitable integers, or a
>>>> unicode/string object.  If the latter (i.e., it's a basestring), the
>>>> encoding argument would then be required.  Then, there's no need for
>>>> special codec support for the bytes type, since you call bytes on the
>>>> thing to be encoded.  And of course, no need for a 'b' literal.
>>> It'd be cruel and unusual punishment though to have to write
>>>   bytes("abc", "Latin-1")
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
> Actually, I thought we were talking about adding bytes() in 2.5.

Then we'd need to make the "ascii" encoding assumption
again, just like Guido proposed.

> However, now that you've brought this up, it actually makes perfect sense 
> to just use latin-1 as the effective encoding for both strings and 
> unicode.  In Python 2.x, strings are byte strings by definition, so it's 
> only in 3.0 that an encoding would be required.  And again, latin1 is a 
> reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption of the
default encoding since there are many encodings out there
that special case the characters 0x80-0xFF, hence the choice
of using ASCII as default encoding in Python.

The conversion from Unicode to bytes is different in this
respect, since you are converting from a "bigger" type to
a "smaller" one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.
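
To make the 7-bit versus 8-bit point concrete, here is a small sketch in present-day Python 3 (an assumption, since this thread predates it; the `bytes` type used is the one that eventually shipped, not the 2.5 proposal). Latin-1 maps every byte value 0-255 to the code point with the same number, so it round-trips any byte string, whereas ASCII rejects anything at or above 0x80.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Latin-1 decodes each byte to the code point with the same number...
as_text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# ...and encoding the result gives back the original bytes.
round_tripped = as_text.encode("latin-1")

# ASCII only covers the lower half; 0x80 and above are rejected.
try:
    all_bytes.decode("ascii")
    ascii_accepts_all = True
except UnicodeDecodeError:
    ascii_accepts_all = False
```

This only shows that Latin-1 is lossless as a byte container; it says nothing about whether it is a sensible default, which is the question under debate.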

> So, it sounds like making the encoding default to latin-1 would be a 
> reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it. In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
> I thought all this was already in the plan for 3.0, but maybe I assume too 
> much.  :)

Wouldn't want to wait for Py4D :-)

Marc-Andre Lemburg


From guido at  Tue Feb 14 00:10:50 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 15:10:50 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

On 2/13/06, M.-A. Lemburg <mal at> wrote:
> Guido van Rossum wrote:
> > It'd be cruel and unusual punishment though to have to write
> >
> >   bytes("abc", "Latin-1")
> >
> > I propose that the default encoding (for basestring instances) ought
> > to be "ascii" just like everywhere else. (Meaning, it should really be
> > the system default encoding, which defaults to "ascii" and is
> > intentionally hard to change.)
> We're talking about Py3k here: "abc" will be a Unicode string,
> so why restrict the conversion to 7 bits when you can have 8 bits
> without any conversion problems ?

As Phillip guessed, I was indeed thinking about introducing bytes()
sooner than that, perhaps even in 2.5 (though I don't want anything

Even in Py3k though, the encoding issue stands -- what if the file
encoding is Unicode? Then using Latin-1 to encode bytes by default
might not be what the user expected. Or what if the file encoding is
something totally different? (Cyrillic, Greek, Japanese, Klingon.)
Any default other than ASCII isn't going to work as expected. ASCII
isn't going to work as expected either, but it will complain loudly
(by throwing a UnicodeError) whenever you try it, rather than causing
subtle bugs later.
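
The "complain loudly" argument can be sketched as follows, using present-day Python 3 string and bytes types (an assumption made for illustration; the sample text is also made up). ASCII fails at the point of conversion, while Latin-1 succeeds and hands back bytes that are only correct if the consumer happens to assume Latin-1 too.

```python
text = "caf\u00e9"  # the last character, U+00E9, is outside ASCII

# ASCII as the default: the failure is immediate and visible.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Latin-1 as the default: no error, but the bytes are only right if the
# consumer also assumes Latin-1. A UTF-8 consumer expects different bytes.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
```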

> While we're at it: I'd suggest that we remove the auto-conversion
> from bytes to Unicode in Py3k and the default encoding along with
> it.

I'm not sure which auto-conversion you're talking about, since there
is no bytes type yet. If you're talking about the auto-conversion from
str to unicode: the bytes type should not be assumed to have *any*
properties that the current str type has, and that includes

> In Py3k the standard lib will have to be Unicode compatible
> anyway and string parser markers like "s#" will have to go away
> as well, so there's not much need for this anymore.
> (Maybe a bit radical, but I guess that's what Py3k is meant for.)


--Guido van Rossum (home page:

From guido at  Tue Feb 14 00:15:23 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 15:15:23 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

On 2/13/06, Phillip J. Eby <pje at> wrote:
> Actually, I thought we were talking about adding bytes() in 2.5.

I was.

> However, now that you've brought this up, it actually makes perfect sense
> to just use latin-1 as the effective encoding for both strings and
> unicode.  In Python 2.x, strings are byte strings by definition, so it's
> only in 3.0 that an encoding would be required.  And again, latin1 is a
> reasonable, roundtrippable default encoding.
> So, it sounds like making the encoding default to latin-1 would be a
> reasonably safe approach in both 2.x and 3.x.

I disagree. IMO the same reasons why we don't do this now for the
conversion between str and unicode stand for bytes.

> >While we're at it: I'd suggest that we remove the auto-conversion
> >from bytes to Unicode in Py3k and the default encoding along with
> >it. In Py3k the standard lib will have to be Unicode compatible
> >anyway and string parser markers like "s#" will have to go away
> >as well, so there's not much need for this anymore.

I don't know yet what the C API will look like in 3.0. But it may well
have to support auto-conversion from Unicode to char* using some
system default encoding (e.g. the Windows default code page?) in order
to be able to conveniently wrap OS APIs that use char* instead of some
sort of Unicode (and each OS has its own way of interpreting char* as
Unicode -- I believe Apple uses UTF-8?).

> I thought all this was already in the plan for 3.0, but maybe I assume too
> much.  :)

In Py3k, I can see two reasonable approaches to conversion between
strings (Unicode) and bytes: always require an explicit encoding, or
assume ASCII. Anything else is asking for trouble IMO.

--Guido van Rossum (home page:

From pje at  Tue Feb 14 00:17:07 2006
From: pje at (Phillip J. Eby)
Date: Mon, 13 Feb 2006 18:17:07 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
	<dsjrfp$g72$> <>
Message-ID: <>

At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
>The conversion from Unicode to bytes is different in this
>respect, since you are converting from a "bigger" type to
>a "smaller" one. Choosing latin-1 as default for this
>conversion would give you all 8 bits, instead of just 7
>bits that ASCII provides.

I was just pointing out that since byte strings are bytes by definition, 
then simply putting those bytes in a bytes() object doesn't alter the 
existing encoding.  So, using latin-1 when converting a string to bytes 
actually seems like the One Obvious Way to do it.

I'm so accustomed to being wary of encoding issues that the idea doesn't 
*feel* right at first - I keep going, "but you can't know what encoding 
those bytes are".  Then I go, Duh, that's the point.  If you convert 
str->bytes, there's no conversion and no interpretation - neither the str 
nor the bytes object knows its encoding, and that's okay.  So 
str(bytes_object) (in 2.x) should also just turn it back to a normal
bytestring.

In fact, the 'encoding' argument seems useless in the case of str objects, 
and it seems it should default to latin-1 for unicode objects.  The only 
use I see for having an encoding for a 'str' would be to allow confirming 
that the input string in fact is valid for that encoding.  So,
"bytes(some_str,'ascii')" would be an assertion that some_str must be valid
ASCII.

> > So, it sounds like making the encoding default to latin-1 would be a
> > reasonably safe approach in both 2.x and 3.x.
>Reasonable for bytes(): yes. In general: no.

Right, I was only talking about bytes().

For 3.0, the type formerly known as "str" won't exist, so only the Unicode 
part will be relevant then.

From guido at  Tue Feb 14 00:23:45 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 15:23:45 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>
Message-ID: <>

On 2/13/06, Phillip J. Eby <pje at> wrote:
> At 12:03 AM 2/14/2006 +0100, M.-A. Lemburg wrote:
> >The conversion from Unicode to bytes is different in this
> >respect, since you are converting from a "bigger" type to
> >a "smaller" one. Choosing latin-1 as default for this
> >conversion would give you all 8 bits, instead of just 7
> >bits that ASCII provides.
> I was just pointing out that since byte strings are bytes by definition,
> then simply putting those bytes in a bytes() object doesn't alter the
> existing encoding.  So, using latin-1 when converting a string to bytes
> actually seems like the One Obvious Way to do it.

This actually makes some sense -- bytes(s) where isinstance(s, str)
should just copy the data, since we can't know what encoding the user
believes it is in anyway. (With the exception of string literals,
where it makes sense to assume that the user believes it is in the
same encoding as the source code -- but I believe non-ASCII characters
in string literals are disallowed anyway, or at least known to cause
undefined results in rats.)

> I'm so accustomed to being wary of encoding issues that the idea doesn't
> *feel* right at first - I keep going, "but you can't know what encoding
> those bytes are".  Then I go, Duh, that's the point.  If you convert
> str->bytes, there's no conversion and no interpretation - neither the str
> nor the bytes object knows its encoding, and that's okay.  So
> str(bytes_object) (in 2.x) should also just turn it back to a normal
> bytestring.

You've got me convinced. Scrap my previous responses in this thread.

> In fact, the 'encoding' argument seems useless in the case of str objects,


> and it seems it should default to latin-1 for unicode objects.

But here I disagree.

> The only
> use I see for having an encoding for a 'str' would be to allow confirming
> that the input string in fact is valid for that encoding.  So,
> "bytes(some_str,'ascii')" would be an assertion that some_str must be valid
> ASCII.

We already have ways to assert that a string is ASCII.

> For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> part will be relevant then.

And I think then the encoding should be required or default to ASCII.

--Guido van Rossum (home page:

From fuzzyman at  Tue Feb 14 00:40:16 2006
From: fuzzyman at (Michael Foord)
Date: Mon, 13 Feb 2006 23:40:16 +0000
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

Phillip J. Eby wrote:
> In fact, the 'encoding' argument seems useless in the case of str objects,
> and it seems it should default to latin-1 for unicode objects.  The only

-1 for having an implicit encode that behaves differently to other
implicit encodes/decodes that happen in Python. Life is confusing enough
already.
Michael Foord

From guido at  Tue Feb 14 00:44:27 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 15:44:27 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>
Message-ID: <>

On 2/13/06, Michael Foord <fuzzyman at> wrote:
> Phillip J. Eby wrote:
> [snip..]
> >
> > In fact, the 'encoding' argument seems useless in the case of str objects,
> > and it seems it should default to latin-1 for unicode objects.  The only
> >
> -1 for having an implicit encode that behaves differently to other
> implicit encodes/decodes that happen in Python. Life is confusing enough
> already.

But adding an encoding doesn't help. The str.encode() method always
assumes that the string itself is ASCII-encoded, and that's not good
enough:

>>> "abc".encode("latin-1")
'abc'
>>> "abc".decode("latin-1")
u'abc'
>>> "abc\xf0".decode("latin-1")
u'abc\xf0'
>>> "abc\xf0".encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
3: ordinal not in range(128)

The right way to look at this is, as Phillip says, to consider
conversion between str and bytes as not an encoding but a data type
change *only*.
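
The failing call in the session can be read as two steps: Python 2's str.encode() first decodes the byte string with the default codec (ASCII) and only then encodes to the requested codec. The sketch below spells out that hidden step. It is written for present-day Python 3, where `bytes` plays the role of the 2.x `str`; that mapping is an assumption made purely for illustration.

```python
raw = b"abc\xf0"  # stands in for the 2.x str "abc\xf0"

# Explicit version of what 2.x "abc\xf0".encode("latin-1") did implicitly:
# decode with the default codec (ascii), then encode to latin-1.
try:
    raw.decode("ascii").encode("latin-1")
    failure = None
except UnicodeDecodeError as exc:
    failure = (exc.encoding, exc.start)

# Treating the conversion as a type change only: no codec, bytes unchanged.
copied = bytes(raw)
```

The recorded failure names the ascii codec and position 3, matching the traceback, even though the caller asked for latin-1.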

--Guido van Rossum (home page:

From barry at  Tue Feb 14 00:50:40 2006
From: barry at (Barry Warsaw)
Date: Mon, 13 Feb 2006 18:50:40 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349?
	[	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>
Message-ID: <>

On Mon, 2006-02-13 at 15:44 -0800, Guido van Rossum wrote:

> The right way to look at this is, as Phillip says, to consider
> conversion between str and bytes as not an encoding but a data type
> change *only*.

That sounds right to me too.


From fuzzyman at  Tue Feb 14 00:53:16 2006
From: fuzzyman at (Michael Foord)
Date: Mon, 13 Feb 2006 23:53:16 +0000
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/13/06, Michael Foord <fuzzyman at> wrote:
>> Phillip J. Eby wrote:
>> [snip..]
>>> In fact, the 'encoding' argument seems useless in the case of str objects,
>>> and it seems it should default to latin-1 for unicode objects.  The only
>> -1 for having an implicit encode that behaves differently to other
>> implicit encodes/decodes that happen in Python. Life is confusing enough
>> already.
> But adding an encoding doesn't help. The str.encode() method always
> assumes that the string itself is ASCII-encoded, and that's not good
> enough:

Sorry - I meant for the unicode to bytes case. A default encoding that
behaves differently to the current implicit encodes/decodes would be
confusing IMHO.

I agree that string to bytes shouldn't change the value of the bytes. 
The least confusing description of a non-unicode string is 'byte-string'.

Michael Foord
> >>> "abc".encode("latin-1")
> 'abc'
> >>> "abc".decode("latin-1")
> u'abc'
> >>> "abc\xf0".decode("latin-1")
> u'abc\xf0'
> >>> "abc\xf0".encode("latin-1")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> 3: ordinal not in range(128)
> The right way to look at this is, as Phillip says, to consider
> conversion between str and bytes as not an encoding but a data type
> change *only*.
> --
> --Guido van Rossum (home page:

From aleaxit at  Tue Feb 14 00:53:31 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 13 Feb 2006 15:53:31 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/13/06, Guido van Rossum <guido at> wrote:
> I don't like to add a built-in index() at this point; mostly because
> of Occam's razor (we haven't found a need).

I thought you had agreed, back when I had said that __index__ should
also be made easily available to implementors of Python-coded classes
implementing sequences, more elegantly than by demanding that they
code x.__index__() [I can't think offhand of any other special-named
method that you HAVE to call directly -- there's always some syntax or
functionality in the standard library to call it more elegantly on
your behalf].  This doesn't necessarily argue that index should be in
the built-ins module, of course, but I thought there was a sentiment
towards having it in either the operator or math modules.


From guido at  Tue Feb 14 01:04:26 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 16:04:26 -0800
Subject: [Python-Dev] bdist_* to stdlib?
Message-ID: <>

In private email, Phillip Eby suggested adding these things to the
2.5 standard library:

bdist_deb, bdist_msi, and friends

He explained them as follows:

bdist_deb makes .deb files (packages for Debian-based Linux distros, like
Ubuntu).  bdist_msi makes .msi installers for Windows (it's by Martin v.
Loewis).  Marc Lemburg proposed on the distutils-sig that these and various
other implemented bdist_* formats (other than bdist_egg) be included in the
next Python release, and there was no opposition there that I recall.

I guess bdist_egg should also be added if we support setuptools (not
setuplib as I mistakenly called it previously)? (I'm still a bit
unclear on the various concepts here, not having made a distribution
of anything in a very long time...)

--Guido van Rossum (home page:

From guido at  Tue Feb 14 01:07:56 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 16:07:56 -0800
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object,
	a or b, can be used in X[a:b] notation
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry, you're right. operator.index() sounds fine.
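
A short sketch of what this looks like in practice, assuming the operator.index() function and the __index__ special method as they exist in present-day Python. The `Offset` class is a hypothetical example type, not something from this thread.

```python
import operator


class Offset:
    """Hypothetical integer-like class, used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called by slicing and by operator.index().
        return self.value


letters = ["a", "b", "c", "d", "e"]

# Objects defining __index__ can be used directly as slice bounds.
middle = letters[Offset(1):Offset(4)]

# operator.index() calls __index__ without spelling out the dunder name.
as_int = operator.index(Offset(3))

# Objects without __index__ (a float, for example) are rejected.
try:
    operator.index(2.5)
    float_rejected = False
except TypeError:
    float_rejected = True
```

A Python-coded sequence class can call operator.index() on its arguments in __getitem__ rather than invoking x.__index__() by hand.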


On 2/13/06, Alex Martelli <aleaxit at> wrote:
> On 2/13/06, Guido van Rossum <guido at> wrote:
>    ...
> > I don't like to add a built-in index() at this point; mostly because
> > of Occam's razor (we haven't found a need).
> I thought you had agreed, back when I had said that __index__ should
> also be made easily available to implementors of Python-coded classes
> implementing sequences, more elegantly than by demanding that they
> code x.__index__() [I can't think offhand of any other special-named
> method that you HAVE to call directly -- there's always some syntax or
> functionality in the standard library to call it more elegantly on
> your behalf].  This doesn't necessarily argue that index should be in
> the built-ins module, of course, but I thought there was a sentiment
> towards having it in either the operator or math modules.
> Alex

--Guido van Rossum (home page:

From pje at  Tue Feb 14 01:09:57 2006
From: pje at (Phillip J. Eby)
Date: Mon, 13 Feb 2006 19:09:57 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
	<dsbc3h$rct$> <dsjrfp$g72$>
Message-ID: <>

At 03:23 PM 2/13/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, Phillip J. Eby <pje at> wrote:
> > The only
> > use I see for having an encoding for a 'str' would be to allow confirming
> > that the input string in fact is valid for that encoding.  So,
> > "bytes(some_str,'ascii')" would be an assertion that some_str must be valid
> > ASCII.
>We already have ways to assert that a string is ASCII.

I didn't mean that it was the only purpose.  In Python 2.x, practical code 
has to sometimes deal with "string-like" objects.  That is, code that takes 
either strings or unicode.  If such code calls bytes(), it's going to want 
to include an encoding so that unicode conversions won't fail.  But 
silently ignoring the encoding argument in that case isn't a good idea.

Ergo, I propose to permit the encoding to be specified when passing in a 
(2.x) str object, to allow code that handles both str and unicode to be 
"str-stable" in 2.x.

I'm fine with rejecting an encoding argument if the initializer is not a 
str or unicode; I just don't want the call signature to vary based on a 
runtime distinction between str and unicode.  And, I don't want the 
encoding argument to be silently ignored when you pass in a string.  If I 
assert that I'm encoding ASCII (or utf-8 or whatever), then the string 
should be required to be valid.  If I don't pass in an encoding, then I'm 
good to go.
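
One possible reading of "the string should be required to be valid" is a trial decode: the bytes are copied unchanged, but only after confirming that they decode under the named encoding. The helper below is hypothetical (`checked_bytes` is not a real or proposed API) and is written for present-day Python 3, with `bytes` standing in for the 2.x `str`.

```python
def checked_bytes(data, encoding=None):
    """Copy `data` unchanged; if an encoding is given, first verify it.

    Hypothetical helper for illustration; `data` is a Python 3 bytes
    object standing in for a 2.x str.
    """
    if encoding is not None:
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        data.decode(encoding)
    return bytes(data)


ok_ascii = checked_bytes(b"abc", "ascii")
ok_utf8 = checked_bytes(b"caf\xc3\xa9", "utf-8")
unchecked = checked_bytes(b"\x80")  # no encoding given, so no check

try:
    checked_bytes(b"\x80", "ascii")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Other readings are possible; this is only the "assert that the input is already in that encoding" interpretation.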

(This is orthogonal to the issue of what encoding is used as a default for 
conversions from the unicode type, btw.)

> > For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> > part will be relevant then.
>And I think then the encoding should be required or default to ASCII.

The reason I'm arguing for latin-1 is symmetry in 2.x versions only.  (In 
3.x, there's no str vs. unicode, and thus nothing to be symmetrical.)  So, 
if you invoke bytes() without an encoding on a 2.x basestring, you should 
get the same result.  Latin-1 produces "the same result" when viewed in 
terms of the resulting byte string.

If we don't go with latin-1, I'd argue for requiring an encoding for 
unicode objects in 2.x, because that seems like the only reasonable way to 
break the symmetry between str and unicode, even though it forces 
"str-stable" code to specify an encoding.  The key is that at least *one* 
of the signatures needs to be stable in meaning across both str and unicode 
in 2.x in order to allow unicode-safe, str-stable code to be written.

(Again, for 3.x, this issue doesn't come into play because there's only one 
string type to worry about; what the default is or whether there's a 
default is therefore entirely up to you.)

From guido at  Tue Feb 14 01:09:32 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 16:09:32 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, Michael Foord <fuzzyman at> wrote:
> Sorry - I meant for the unicode to bytes case. A default encoding that
> behaves differently to the current implicit encodes/decodes would be
> confusing IMHO.

And I am in agreement with you there (I think only Phillip argued otherwise).

> I agree that string to bytes shouldn't change the value of the bytes.

It's a deal then.

Can the owner of PEP 332 update the PEP to record these decisions?

--Guido van Rossum (home page:

From python-dev at  Tue Feb 14 01:16:10 2006
From: python-dev at (Ka-Ping Yee)
Date: Mon, 13 Feb 2006 18:16:10 -0600 (CST)
Subject: [Python-Dev] Missing PyCon 2006
Message-ID: <>

Hi folks.  I had been planning to attend PyCon this year and was really
looking forward to it, but i need to cancel.  I am sorry that i won't
be getting to see you all in a couple of weeks.

If you know anyone who hasn't yet registered but wants to go, please
contact me -- we can transfer my registration.  Thanks, and sorry for
using python-dev for this.

-- ?!ng

From guido at  Tue Feb 14 01:29:27 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 16:29:27 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, Phillip J. Eby <pje at> wrote:
> I didn't mean that it was the only purpose.  In Python 2.x, practical code
> has to sometimes deal with "string-like" objects.  That is, code that takes
> either strings or unicode.  If such code calls bytes(), it's going to want
> to include an encoding so that unicode conversions won't fail.

That sounds like a rather hypothetical example. Have you thought it
through? Presumably code that accepts both str and unicode either
doesn't care about encodings, but simply returns objects of the same
type as the arguments -- and then it's unlikely to want to convert the
arguments to bytes; or it *does* care about encodings, and then it
probably already has to special-case str vs. unicode because it has to
control how str objects are interpreted.

> But
> silently ignoring the encoding argument in that case isn't a good idea.
> Ergo, I propose to permit the encoding to be specified when passing in a
> (2.x) str object, to allow code that handles both str and unicode to be
> "str-stable" in 2.x.

Again, have you thought this through?

What would bytes("abc\xf0", "latin-1") *mean*? Take the string
"abc\xf0", interpret it as being encoded in XXX, and then encode from
XXX to Latin-1. But what's XXX? As I showed in a previous post,
"abc\xf0".encode("latin-1") *fails* because the source for the
encoding is assumed to be ASCII.

I think we can make this work only when the string in fact only
contains ASCII and the encoding maps ASCII to itself (which most
encodings do -- but e.g. EBCDIC does not). But I'm not sure how useful
that is.
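
The EBCDIC caveat can be checked with the codecs in the standard library. This is a small sketch in present-day Python 3; picking cp500 as the EBCDIC representative is an assumption made for the example.

```python
text = "abc"  # pure ASCII

# Most codecs are ASCII supersets, so ASCII text encodes to the same bytes.
same_in_latin1 = text.encode("latin-1") == text.encode("ascii")
same_in_utf8 = text.encode("utf-8") == text.encode("ascii")

# cp500 is one of the EBCDIC codecs in the standard library; it assigns
# different byte values to the same letters.
ebcdic_bytes = text.encode("cp500")
same_in_ebcdic = ebcdic_bytes == text.encode("ascii")
```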

> I'm fine with rejecting an encoding argument if the initializer is not a
> str or unicode; I just don't want the call signature to vary based on a
> runtime distinction between str and unicode.

I'm still not sure that this will actually help anyone.

> And, I don't want the
> encoding argument to be silently ignored when you pass in a string.


> If I
> assert that I'm encoding ASCII (or utf-8 or whatever), then the string
> should be required to be valid.

Defined how? That the string is already in that encoding?

> If I don't pass in an encoding, then I'm
> good to go.
> (This is orthogonal to the issue of what encoding is used as a default for
> conversions from the unicode type, btw.)

Right. The issues are completely different!

> > > For 3.0, the type formerly known as "str" won't exist, so only the Unicode
> > > part will be relevant then.
> >
> >And I think then the encoding should be required or default to ASCII.
> The reason I'm arguing for latin-1 is symmetry in 2.x versions only.  (In
> 3.x, there's no str vs. unicode, and thus nothing to be symmetrical.)  So,
> if you invoke bytes() without an encoding on a 2.x basestring, you should
> get the same result.  Latin-1 produces "the same result" when viewed in
> terms of the resulting byte string.

Only if you assume the str object is encoded in Latin-1.

Your argument for symmetry would be a lot stronger if we used Latin-1
for the conversion between str and Unicode. But we don't. I like the
other interpretation (which I thought was yours too?) much better: str
<--> bytes conversions don't use encodings but simply change the type
without changing the bytes; conversion between either and unicode
works exactly the same, and requires an encoding unless all the
characters involved are pure ASCII.

> If we don't go with latin-1, I'd argue for requiring an encoding for
> unicode objects in 2.x, because that seems like the only reasonable way to
> break the symmetry between str and unicode, even though it forces
> "str-stable" code to specify an encoding.  The key is that at least *one*
> of the signatures needs to be stable in meaning across both str and unicode
> in 2.x in order to allow unicode-safe, str-stable code to be written.

Using ASCII as the default encoding has the same property -- it can
remain stable across the 2.x / 3.0 boundary.

> (Again, for 3.x, this issue doesn't come into play because there's only one
> string type to worry about; what the default is or whether there's a
> default is therefore entirely up to you.)

A nice-to-have property would be that it might be possible to write
code that today deals with Unicode and str, but in 3.0 will deal with
Unicode and bytes instead. But I'm not sure how likely that is since
bytes objects won't have most methods that str and Unicode objects
have (like lower(), find(), etc.).

There's one property that bytes, str and unicode all share: type(x[0])
== type(x), at least as long as len(x) >= 1. This is perhaps the
ultimate test for string-ness.

Or should b[0] be an int, if b is a bytes object? That would change
things dramatically.
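
For reference, this is how the question turned out in present-day Python 3 (a fact about the language as it later shipped, not something decided in this message): indexing a `bytes` object yields an `int`, so the type(x[0]) == type(x) test holds for `str` but not for `bytes`, while a one-element slice does keep the type.

```python
s = "abc"
b = b"abc"

str_item_is_str = type(s[0]) is type(s)      # indexing a str gives a str
bytes_item_is_bytes = type(b[0]) is type(b)  # indexing bytes gives an int
first_byte = b[0]
first_slice = b[0:1]                         # slicing keeps the bytes type
```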

There's also the consideration for APIs that, informally, accept
either a string or a sequence of objects. Many of these exist, and
they are probably all being converted to support unicode as well as
str (if it makes sense at all). Should a bytes object be considered as
a sequence of things, or as a single thing, from the POV of these
types of APIs? Should we try to standardize how code tests for the
difference? (Currently all sorts of shortcuts are being taken, from
isinstance(x, (list, tuple)) to isinstance(x, basestring).)
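
One way such an API might draw the line is sketched below for present-day Python 3, where `basestring` no longer exists and `(str, bytes)` is the closest equivalent. The helper name `as_items` and the choice to treat a bytes object as a single thing are both assumptions for illustration, not a standard.

```python
def as_items(value):
    """Normalise an argument that may be one string or a sequence of them.

    Hypothetical helper: str and bytes count as a single item,
    anything else is assumed to be an iterable of items.
    """
    if isinstance(value, (str, bytes)):
        return [value]
    return list(value)


single_text = as_items("abc")
single_bytes = as_items(b"abc")
several = as_items(("abc", "def"))
```

Testing for the string types first, rather than for list or tuple, lets any other iterable be accepted as the sequence case.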

--Guido van Rossum (home page:

From foom at  Tue Feb 14 01:49:55 2006
From: foom at (James Y Knight)
Date: Mon, 13 Feb 2006 19:49:55 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Feb 13, 2006, at 7:09 PM, Guido van Rossum wrote:

> On 2/13/06, Michael Foord <fuzzyman at> wrote:
>> Sorry - I meant for the unicode to bytes case. A default encoding that
>> behaves differently to the current implicit encodes/decodes would be
>> confusing IMHO.
> And I am in agreement with you there (I think only Phillip argued
> otherwise).
>> I agree that string to bytes shouldn't change the value of the bytes.
> It's a deal then.
> Can the owner of PEP 332 update the PEP to record these decisions?

So, in python2.X, you have:
- bytes("\x80"), you get a bytestring with a single byte of value  
0x80 (when no encoding is specified, and the object is a str, it  
doesn't try to encode it at all).
- bytes("\x80", encoding="latin-1"), you get an error, because  
encoding "\x80" into latin-1 implicitly decodes it into a unicode  
object first, via the system-wide default: ascii.
- bytes(u"\x80"), you get an error, because the default encoding for  
a unicode string is ascii.
- bytes(u"\x80", encoding="latin-1"), you get a bytestring with a  
single byte of value 0x80.

In py3k, when the str object is eliminated, then what do you have?
Perhaps
- bytes("\x80"), you get an error, encoding is required. There is no  
such thing as "default encoding" anymore, as there's no str object.
- bytes("\x80", encoding="latin-1"), you get a bytestring with a  
single byte of value 0x80.
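
A minimal sketch of the two py3k cases, assuming a Python 3 interpreter
in which str is the Unicode type and bytes() refuses a text string
without an encoding (this is how Python 3 was eventually released). The
helper name try_bytes is illustrative only.

```python
def try_bytes(*args, **kwargs):
    """Return the bytes object, or the exception type name if bytes() refuses."""
    try:
        return bytes(*args, **kwargs)
    except TypeError as exc:
        return type(exc).__name__

# No encoding given for a text string: bytes() refuses rather than guessing.
no_encoding = try_bytes("\x80")

# Explicit encoding: U+0080 maps to the single byte 0x80 in latin-1.
with_encoding = try_bytes("\x80", encoding="latin-1")

print(no_encoding)    # TypeError
print(with_encoding)  # b'\x80'
```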


From guido at  Tue Feb 14 02:11:42 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 17:11:42 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, James Y Knight <foom at> wrote:
> So, in python2.X, you have:
> - bytes("\x80"), you get a bytestring with a single byte of value
> 0x80 (when no encoding is specified, and the object is a str, it
> doesn't try to encode it at all).
> - bytes("\x80", encoding="latin-1"), you get an error, because
> encoding "\x80" into latin-1 implicitly decodes it into a unicode
> object first, via the system-wide default: ascii.
> - bytes(u"\x80"), you get an error, because the default encoding for
> a unicode string is ascii.
> - bytes(u"\x80", encoding="latin-1"), you get a bytestring with a
> single byte of value 0x80.

Yes to all.

> In py3k, when the str object is eliminated, then what do you have?
> Perhaps
> - bytes("\x80"), you get an error, encoding is required. There is no
> such thing as "default encoding" anymore, as there's no str object.
> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
> single byte of value 0x80.

Yes to both again.

--Guido van Rossum (home page:

From jeremy at  Tue Feb 14 02:49:21 2006
From: jeremy at (Jeremy Hylton)
Date: Mon, 13 Feb 2006 20:49:21 -0500
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

On 2/13/06, Fred L. Drake, Jr. <fdrake at> wrote:
> On Monday 13 February 2006 15:40, Guido van Rossum wrote:
>  > Shouldn't be removed? It seems to add more confusion
>  > than anything, especially since most links on continue to
>  > point to
> was created specifically to make searching the most recent
> "stable" version of the docs easier (using Google's site: modifier, no less).
> I don't know what the link count statistics say (other than what you
> mention), and don't know which gets hit more often, but I still think it's a
> reasonable approach.

Why not do a query like this?


From nas at  Tue Feb 14 03:52:40 2006
From: nas at (Neil Schemenauer)
Date: Tue, 14 Feb 2006 02:52:40 +0000 (UTC)
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
References: <dsbc3h$rct$>
Message-ID: <dsrglo$2a7$>

Guido van Rossum <guido at> wrote:
>> In py3k, when the str object is eliminated, then what do you have?
>> Perhaps
>> - bytes("\x80"), you get an error, encoding is required. There is no
>> such thing as "default encoding" anymore, as there's no str object.
>> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
>> single byte of value 0x80.
> Yes to both again.

I haven't been following this discussion about bytes() real closely
but I don't think that bytes() should do the encoding.  We already
have a way to spell that:

    "\x80".encode('latin-1')

Also, I think it would be useful to introduce byte array literals at
the same time as the bytes object.  That would allow people to use
byte arrays without having to get involved with all the silly string
encoding confusion.


From fdrake at  Tue Feb 14 04:29:21 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Mon, 13 Feb 2006 22:29:21 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <dsrglo$2a7$>
References: <dsbc3h$rct$>
Message-ID: <>

On Monday 13 February 2006 21:52, Neil Schemenauer wrote:
 > Also, I think it would be useful to introduce byte array literals at
 > the same time as the bytes object.  That would allow people to use
 > byte arrays without having to get involved with all the silly string
 > encoding confusion.

bytes([0, 1, 2, 3])


Fred L. Drake, Jr.   <fdrake at>

From guido at  Tue Feb 14 05:07:49 2006
From: guido at (Guido van Rossum)
Date: Mon, 13 Feb 2006 20:07:49 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <dsrglo$2a7$>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, Neil Schemenauer <nas at> wrote:
> Guido van Rossum <guido at> wrote:
> >> In py3k, when the str object is eliminated, then what do you have?
> >> Perhaps
> >> - bytes("\x80"), you get an error, encoding is required. There is no
> >> such thing as "default encoding" anymore, as there's no str object.
> >> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >> single byte of value 0x80.
> >
> > Yes to both again.
> I haven't been following this discussion about bytes() real closely
> but I don't think that bytes() should do the encoding.  We already
> have a way to spell that:
>     "\x80".encode('latin-1')

But in 2.5 we can't change that to return a bytes object without
creating HUGE incompatibilities.

In general I've come to appreciate that there are two ways of
converting an object of type A to an object of type B: ask an A
instance to convert itself to a B, or ask the type B to create a new
instance from an A. Depending on what A and B are, both APIs make
sense; sometimes reasons of decoupling require that A can't know about
B, in which case you have to use the latter approach; sometimes B
can't know about A, in which case you have to use the former. Even
when A == B we sometimes support both APIs: to create a new list from
a list a, you can write a[:] or list(a); to create a new dict from a
dict d, you can write d.copy() or dict(d).

An advantage of the latter API is that there's no confusion about the
resulting type -- dict(d) is definitely a dict, and list(a) is
definitely a list. Not so for d.copy() or a[:] -- if the input type is
another mapping or sequence, it'll probably return an object of that
same type.
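
A small sketch of the difference between the two conversion styles,
assuming a tuple and a collections.OrderedDict as the "other sequence"
and "other mapping" (these example inputs are my own choice, not from
the thread):

```python
from collections import OrderedDict

t = (1, 2, 3)
od = OrderedDict(a=1, b=2)

# Asking the instance to copy itself preserves the input type.
slice_copy = t[:]          # still a tuple
method_copy = od.copy()    # still an OrderedDict

# Asking the target type to build a new instance fixes the result type.
as_list = list(t)          # always a list
as_dict = dict(od)         # always a plain dict

print(type(slice_copy).__name__, type(method_copy).__name__)  # tuple OrderedDict
print(type(as_list).__name__, type(as_dict).__name__)         # list dict
```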

Again, it depends on the application which is better.

I think that bytes(s, <encoding>) is fine, especially for expressing a
new type, since it is unambiguous about the result type, and has no
backwards compatibility issues.

> Also, I think it would be useful to introduce byte array literals at
> the same time as the bytes object.  That would allow people to use
> byte arrays without having to get involved with all the silly string
> encoding confusion.

You missed the part where I said that introducing the bytes type
*without* a literal seems to be a good first step. A new type, even
built-in, is much less drastic than a new literal (which requires
lexer and parser support in addition to everything else).

--Guido van Rossum (home page:

From barry at  Tue Feb 14 05:59:03 2006
From: barry at (Barry Warsaw)
Date: Mon, 13 Feb 2006 23:59:03 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:

> There's one property that bytes, str and unicode all share: type(x[0])
> == type(x), at least as long as len(x) >= 1. This is perhaps the
> ultimate test for string-ness.

But not perfect, since of course other containers can contain objects  
of their own type too.  But it leads to an interesting issue...

> Or should b[0] be an int, if b is a bytes object? That would change
> things dramatically.

This makes me think I want an unsigned byte type, which b[0] would  
return.  In another thread I think someone mentioned something about  
fixed width integral types, such that you could have an object that  
was guaranteed to be 8-bits wide, 16-bits wide, etc.   Maybe you also  
want signed and unsigned versions of each.  This may seem like YAGNI  
to many people, but as I've been working on a tightly embedded/ 
extended application for the last few years, I've definitely had  
occasions where I wish I could more closely and more directly model  
my C values as Python objects (without using the standard workarounds  
or writing my own C extension types).

But anyway, without hyper-generalizing, it's still worth asking  
whether a bytes type is just a container of byte objects, where the  
contained objects would be distinct, fixed 8-bit unsigned integral  

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects. Many of these exist, and
> they are probably all being converted to support unicode as well as
> str (if it makes sense at all). Should a bytes object be considered as
> a sequence of things, or as a single thing, from the POV of these
> types of APIs? Should we try to standardize how code tests for the
> difference? (Currently all sorts of shortcuts are being taken, from
> isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I think bytes objects are very much like string objects today --  
they're the photons of Python since they can act like either  
sequences or scalars, depending on the context.  For example, we have  
code that needs to deal with situations where an API can return  
either a scalar or a sequence of those scalars.  So we have a utility  
function like this:

def thingiter(obj):
     try:
         it = iter(obj)
     except TypeError:
         yield obj
     else:
         for item in it:
             yield item

Maybe there's a better way to do this, but the most obvious problem  
is that (for our use cases), this fails for strings because in this  
context we want strings to act like scalars.  So we add a little test  
just before the "try:" like "if isinstance(obj, basestring): yield  
obj".  But that's yucky.

I don't know what the solution is -- if there /is/ a solution short  
of special case tests like above, but I think the key observation is  
that sometimes you want your string to act like a sequence and  
sometimes you want it to act like a scalar.  I suspect bytes objects  
will be the same way.
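
A sketch of the special-cased variant described above, written for a
Python 3 interpreter where basestring no longer exists, so (str, bytes)
stands in for it. The scalar_types parameter is a hypothetical addition
for illustration.

```python
def thingiter(obj, scalar_types=(str, bytes)):
    """Yield obj itself if it is a scalar (or string-like), else its items."""
    if isinstance(obj, scalar_types):
        # Strings are iterable, but here we want them treated as scalars.
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all: a plain scalar.
        yield obj
    else:
        for item in it:
            yield item

print(list(thingiter(42)))         # [42]
print(list(thingiter("abc")))      # ['abc']
print(list(thingiter([1, 2, 3])))  # [1, 2, 3]
```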


From pje at  Tue Feb 14 06:20:56 2006
From: pje at (Phillip J. Eby)
Date: Tue, 14 Feb 2006 00:20:56 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <
References: <>
Message-ID: <>

At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, Phillip J. Eby <pje at> wrote:
> > I didn't mean that it was the only purpose.  In Python 2.x, practical code
> > has to sometimes deal with "string-like" objects.  That is, code that takes
> > either strings or unicode.  If such code calls bytes(), it's going to want
> > to include an encoding so that unicode conversions won't fail.
>That sounds like a rather hypothetical example. Have you thought it
>through? Presumably code that accepts both str and unicode either
>doesn't care about encodings, but simply returns objects of the same
>type as the arguments -- and then it's unlikely to want to convert the
>arguments to bytes; or it *does* care about encodings, and then it
>probably already has to special-case str vs. unicode because it has to
>control how str objects are interpreted.

Actually, it's the other way around.  Code that wants to output 
uninterpreted bytes right now and accepts either strings or Unicode has to 
special-case *unicode* -- not str, because str is the only "bytes type" we 
currently have.

This creates an interesting issue in WSGI for Jython, which of course only 
has one (unicode-based) string type now.  Since there's no bytes type in 
Python in general, the only solution we could come up with was to treat 
such strings as latin-1:

This is why I'm biased towards latin-1 encoding of unicode to bytes; it's 
"the same thing" as an uninterpreted string of bytes.

I think the difference in our viewpoints is that you're still thinking 
"string" thoughts, whereas I'm thinking "byte" thoughts.  Bytes are just 
bytes; they don't *have* an encoding.

So, if you think of "converting a string to bytes" as meaning "create an 
array of numerals corresponding to the characters in the string", then this 
leads to a uniform result whether the characters are in a str or a unicode 
object.  In other words, to me, bytes(str_or_unicode) should be treated as:

     bytes(map(ord, str_or_unicode))

In other words, without an encoding, bytes() should simply treat str and 
unicode objects *as if they were a sequence of integers*, and produce an 
error when an integer is out of range.  This is a logical and consistent 
interpretation in the absence of an encoding, because in that case you 
don't care about the encoding - it's just raw data.

If, however, you include an encoding, then you're stating that you want to 
encode the *meaning* of the string, not merely its integer values.
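
A rough sketch of the "sequence of integers" interpretation, assuming a
Python 3 interpreter where bytes() accepts an iterable of ints. The
function name ordinals_to_bytes is made up for this example.

```python
def ordinals_to_bytes(s):
    """Build bytes from the code points of s, with no codec involved."""
    return bytes(map(ord, s))

low = ordinals_to_bytes("abc\xf0")
print(low)                                # b'abc\xf0'

# For code points 0-255 this coincides with a latin-1 encode.
print(low == "abc\xf0".encode("latin-1"))  # True

try:
    ordinals_to_bytes("\u0100")   # code point 256 does not fit in a byte
    out_of_range = False
except ValueError:
    out_of_range = True
print(out_of_range)                       # True
```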

>What would bytes("abc\xf0", "latin-1") *mean*? Take the string
>"abc\xf0", interpret it as being encoded in XXX, and then encode from
>XXX to Latin-1. But what's XXX? As I showed in a previous post,
>"abc\xf0".encode("latin-1") *fails* because the source for the
>encoding is assumed to be ASCII.

I'm saying that XXX would be the same encoding as you specified.  i.e., 
including an encoding means you are encoding the *meaning* of the string.

However, I believe I mainly proposed this as an alternative to having 
bytes(str_or_unicode) work like bytes(map(ord,str_or_unicode)), which I 
think is probably a saner default.

>Your argument for symmetry would be a lot stronger if we used Latin-1
>for the conversion between str and Unicode. But we don't.

But that's because we're dealing with its meaning *as a string*, not merely 
as ordinals in a sequence of bytes.

>  I like the
>other interpretation (which I thought was yours too?) much better: str
><--> bytes conversions don't use encodings but simply change the type
>without changing the bytes;

I like it better too.  The part you didn't like was where MAL and I believe 
this should be extended to Unicode characters in the 0-255 range also.  :)

>There's one property that bytes, str and unicode all share: type(x[0])
>== type(x), at least as long as len(x) >= 1. This is perhaps the
>ultimate test for string-ness.
>Or should b[0] be an int, if b is a bytes object? That would change
>things dramatically.

+1 for it being an int.  Heck, I'd want to at least consider the 
possibility of introducing a character type (chr?) in Python 3.0, and 
getting rid of the "iterating a string yields strings" 
characteristic.  I've found it to be a bit of a pain when dealing with 
heterogeneous nested sequences that contain strings.
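
A short sketch of how the type(x[0]) == type(x) test plays out when
b[0] is an int, assuming Python 3 as eventually released, where indexing
a bytes object does return an int:

```python
text = "abc"
data = b"abc"

# Text strings: indexing yields another str of length 1.
text_is_stringy = type(text[0]) is type(text)

# bytes: indexing yields an int in range(256), so the test fails.
data_is_stringy = type(data[0]) is type(data)

print(text_is_stringy)  # True
print(data_is_stringy)  # False
print(data[0])          # 97
print(data[0:1])        # b'a'  (slicing still returns bytes)
```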

>There's also the consideration for APIs that, informally, accept
>either a string or a sequence of objects. Many of these exist, and
>they are probably all being converted to support unicode as well as
>str (if it makes sense at all). Should a bytes object be considered as
>a sequence of things, or as a single thing, from the POV of these
>types of APIs? Should we try to standardize how code tests for the
>difference? (Currently all sorts of shortcuts are being taken, from
>isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I'm inclined to think of certain features at least in terms of the buffer 
interface, but that's not something that's really exposed at the Python level.

From martin at  Tue Feb 14 07:30:16 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 07:30:16 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	
Message-ID: <>

Jeremy Hylton wrote:
> The compiler in question is gcc and the warning can be turned off with
> -Wno-write-strings.  I think we'd be better off leaving that option
> on, though.  This warning will help me find places where I'm passing a
> string literal to a function that does not take a const char*.  That's
> valuable, not insensate.

Hmm. I'd say this depends on what your reaction to the warning is.
If you sprinkle const_casts in the code, nothing is gained.

Perhaps there is some value in finding functions which ought to expect
const char*. For that, occasional checks should be sufficient; I cannot
see a point in having code permanently pass with that option. In
particular not if you are interfacing with C libraries.


From martin at  Tue Feb 14 07:47:13 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 07:47:13 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> We're talking about Py3k here: "abc" will be a Unicode string,
> so why restrict the conversion to 7 bits when you can have 8 bits
> without any conversion problems ?

YAGNI. If you have a need for byte string in source code, it will
typically be "random" bytes, which can be nicely used through

  bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14,  0x8b, 0xee])

For larger blocks, people should use base64.string_to_bytes (which
can become a synonym for base64.decodestring in Py3k).
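
A minimal sketch of both spellings. base64.string_to_bytes is only the
name proposed above and is not assumed to exist; the sketch uses the
standard library's base64.b64encode and base64.b64decode instead, and
assumes a Python 3 interpreter.

```python
import base64

# Short "random" byte strings: spell them as a list of integers.
raw = bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14, 0x8b, 0xee])

# Larger blocks: keep the data as ASCII-safe base64 text in the source...
encoded = base64.b64encode(raw)
# ...and decode it back to bytes at run time.
decoded = base64.b64decode(encoded)

print(len(raw))        # 10
print(decoded == raw)  # True
```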

If you have bytes that are meaningful text for some application
(say, a wire protocol), it is typically ASCII-Text. No protocol
I know of uses non-ASCII characters for protocol information.

Of course, you need a way to get .encode output as bytes somehow,
both in 2.5, and in Py3k. I suggest writing


In 2.5, bytes() can be constructed from strings, and will do a
conversion; in Py3k, .encode will already return a string, so
this will be a no-op.


From martin at  Tue Feb 14 07:52:13 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 07:52:13 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

Phillip J. Eby wrote:
> I was just pointing out that since byte strings are bytes by definition, 
> then simply putting those bytes in a bytes() object doesn't alter the 
> existing encoding.  So, using latin-1 when converting a string to bytes 
> actually seems like the One Obvious Way to do it.

This is a misconception. In Python 2.x, the type str already *is* a
bytes type. So if S is an instance of 2.x str, bytes(S) does not need
to do any conversion. You don't need to assume it is latin-1: it's
already bytes.

> In fact, the 'encoding' argument seems useless in the case of str objects, 
> and it seems it should default to latin-1 for unicode objects.

I agree with the former, but not with the latter. There shouldn't be a
conversion of Unicode objects to bytes at all. If you want bytes from
a Unicode string U, write



From martin at  Tue Feb 14 07:58:01 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 07:58:01 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
>>In py3k, when the str object is eliminated, then what do you have?
>>- bytes("\x80"), you get an error, encoding is required. There is no
>>such thing as "default encoding" anymore, as there's no str object.
>>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
>>single byte of value 0x80.
> Yes to both again.

Please reconsider, and don't give bytes() an encoding= argument.
It doesn't need one. In Python 3, people should write


if they absolutely want to, although they better write


Now, the first form isn't valid in 2.5, but


could work in all versions.


From rhamph at  Tue Feb 14 08:04:32 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 00:04:32 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
	<> <>
Message-ID: <>

On 2/13/06, "Martin v. Löwis" <martin at> wrote:
> M.-A. Lemburg wrote:
> > We're talking about Py3k here: "abc" will be a Unicode string,
> > so why restrict the conversion to 7 bits when you can have 8 bits
> > without any conversion problems ?
> YAGNI. If you have a need for byte string in source code, it will
> typically be "random" bytes, which can be nicely used through
>   bytes([0x73, 0x9f, 0x44, 0xd2, 0xfb, 0x49, 0xa3, 0x14,  0x8b, 0xee])
> For larger blocks, people should use base64.string_to_bytes (which
> can become a synonym for base64.decodestring in Py3k).
> If you have bytes that are meaningful text for some application
> (say, a wire protocol), it is typically ASCII-Text. No protocol
> I know of uses non-ASCII characters for protocol information.

What would that imply for repr()?  To support eval(repr(x)) it would
have to produce whatever format the source code includes to begin

If I understand correctly there are three main candidates:
1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
2. Direct copying to str/unicode if it's only ascii values, switching
to a list of hex literals if there's any non-ascii values
3. b"foo" literal with ascii for all ascii characters (other than \
and "), \xFF for individual characters that aren't ascii

Given the choice I prefer the third option, with the second option as
my runner up.  The first option just screams "silent errors" to me.
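
For comparison, a sketch of what the third option looks like in
practice, assuming a Python 3 interpreter (which ended up with a b"..."
literal and a matching repr):

```python
data = bytes([0x66, 0x6f, 0x6f, 0xff])

# ASCII bytes are shown as characters, everything else as \xNN escapes.
text = repr(data)
print(text)                # b'foo\xff'

# The repr is valid source, so it round-trips through eval().
round_trip = eval(text)
print(round_trip == data)  # True
```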

Adam Olsen, aka Rhamphoryncus

From martin at  Tue Feb 14 08:04:50 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 08:04:50 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
>>It's the consequences:  nobody complains about tacking "const" on to a
>>former honest-to-God "char *" argument that was in fact not modified,
>>because that's not only helpful for C++ programmers, it's _harmless_
>>for all programmers.  For example, nobody could sanely object (and
>>nobody did :-)) to adding const to the attribute-name argument in
>>PyObject_SetAttrString().  Sticking to that creates no new problems
>>for anyone, so that's as far as I ever went.
> Well, it broke my C extensions... I now have this in my code:
> /* The keyword array changed to const char* in Python 2.5 */
> #if PY_VERSION_HEX >= 0x02050000
> # define Py_KEYWORDS_STRING_TYPE const char
> #else
> # define Py_KEYWORDS_STRING_TYPE char
> #endif
> ...
> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
> ...

You did not read Tim's message carefully enough. He wasn't talking
about PyArg_ParseTupleAndKeywords *at all*. He only talked about
changing char* arguments to const char*, e.g. in
PyObject_SetAttrString. Did that break your C extensions also?


From foom at  Tue Feb 14 08:09:55 2006
From: foom at (James Y Knight)
Date: Tue, 14 Feb 2006 02:09:55 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 14, 2006, at 12:20 AM, Phillip J. Eby wrote:
>      bytes(map(ord, str_or_unicode))
> In other words, without an encoding, bytes() should simply treat  
> str and
> unicode objects *as if they were a sequence of integers*, and  
> produce an
> error when an integer is out of range.  This is a logical and  
> consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.

If you're talking about "raw data", then make bytes(unicodestring)  
produce what buffer(unicodestring) currently does -- something  
completely and utterly worthless. :) [it depends on how you compiled  
python and what endianness your system has.]

There really is no case where you don't care about the  
encoding...there is always a specific desired output encoding, and  
you have to think about what encoding that is. The argument that  
latin-1 is a sensible default just because you can convert to latin-1  
by chopping off the upper 3 bytes of a unicode character's ordinal  
position is not convincing; you're still doing an encoding operation,  
it just happens to be computationally easy. That Jython programs have  
to pretend that unicode strings are an appropriate way to store  
bytes, and thus often have to do fake "latin-1" conversions which are  
really no such thing, doesn't make a convincing argument either.  
Using unicode strings to store bytes read from or written to a socket  
is really just broken.
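
A small sketch of why there is always an encoding choice to make,
assuming a Python 3 interpreter and an arbitrary example character
(U+00E9). The same one-character string becomes different bytes under
each codec, and latin-1 is just the one that matches the ordinal.

```python
ch = "\xe9"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

codecs_to_try = ("latin-1", "utf-8", "utf-16-le", "utf-32-le")
encoded = {codec: ch.encode(codec) for codec in codecs_to_try}

for codec in codecs_to_try:
    print(codec, encoded[codec])

# latin-1 coincides with the low byte of the code point; it is still
# an encoding step, only a computationally trivial one.
print(encoded["latin-1"] == bytes([ord(ch)]))  # True
```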

Actually having any default encoding at all is IMO a poor idea, but  
as python has one at the moment (ascii), might as well keep using it  
for consistency until it's eliminated (sys.setdefaultencoding 
('undefined') is my friend.)


From martin at  Tue Feb 14 08:11:50 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 08:11:50 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> In private email, Phillip Eby suggested to add these things to the
> 2.5. standard library:
> bdist_deb, bdist_msi, and friends
> I guess bdist_egg should also be added if we support setuptools (not
> setuplib as I mistakenly called it previously)? 

I'm in favour of that (and not only because I wrote bdist_msi :-).
I think distutils should support all native package formats we can
get code for.

I'm actually opposed to bdist_egg, from a conceptual point of view.
I think it is wrong if Python creates its own packaging format
(just as it was wrong that Java created jar files - but they are
without deployment procedures even today). The burden should be
on developer's side, for creating packages for the various systems,
not on the users side, when each software comes with its own
deployment infrastructure.

OTOH, users are fond of eggs, for reasons that I haven't yet

From a release management point of view, I would still like to
make another bdist_msi release before contributing it to Python.


From martin at  Tue Feb 14 08:14:57 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 14 Feb 2006 08:14:57 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

Adam Olsen wrote:
> What would that imply for repr()?  To support eval(repr(x))

I don't think eval(repr(x)) needs to be supported for the bytes
type. However, if that is desirable, it should return something



From thomas at  Tue Feb 14 08:19:46 2006
From: thomas at (Thomas Wouters)
Date: Tue, 14 Feb 2006 08:19:46 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Feb 13, 2006 at 04:04:26PM -0800, Guido van Rossum wrote:
> In private email, Phillip Eby suggested to add these things to the
> 2.5. standard library:
> bdist_deb, bdist_msi, and friends

FWIW, I've been using a patched distutils with bdist_deb, and it's worked
fine for the most part. The only issue I had was with a setuptools package
(rather than distutils), which I'm sure can be worked out. (Not that I'm
particularly convinced setuptools is the right approach for a .deb, but I
haven't really seen the point of setuptools anyway ;)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at  Tue Feb 14 09:09:22 2006
From: thomas at (Thomas Wouters)
Date: Tue, 14 Feb 2006 09:09:22 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsjrfp$g72$> <>
Message-ID: <>

On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:

> But adding an encoding doesn't help. The str.encode() method always
> assumes that the string itself is ASCII-encoded, and that's not good
> enough:

> >>> "abc".encode("latin-1")
> 'abc'
> >>> "abc".decode("latin-1")
> u'abc'
> >>> "abc\xf0".decode("latin-1")
> u'abc\xf0'
> >>> "abc\xf0".encode("latin-1")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> 3: ordinal not in range(128)

These comments disturb me. I never really understood why (byte) strings grew
the 'encode' method, since 8-bit strings *are already encoded*, by their
very nature. I mean, I understand it's useful because Python does
non-unicode encodings like 'hex', but I don't really understand *why*. The
benefits don't seem to outweigh the cost (but that's hindsight.)

Directly encoding a (byte) string into a unicode encoding is mostly useless,
as you've shown. The only use-case I can think of is translating ASCII into,
for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
unless the system encoding isn't 'ascii' (and that's pretty rare, and not
something a Python programmer should depend on.) On the other hand, the fact
that (byte) strings have an 'encode' method creates a lot of confusion in
unicode-newbies, and causes programs to break only when input is non-ASCII.
And non-ASCII input just happens too often and too unpredictably in
'real-world' code, and not enough in European programmers' tests ;P
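
A rough emulation of the 2.x behaviour quoted above, written so it runs
on a Python 3 interpreter. It assumes that 2.x str.encode() amounts to
an implicit decode with the default codec followed by the requested
encode; the function name py2_style_encode is hypothetical.

```python
def py2_style_encode(raw, encoding, default="ascii"):
    """Emulate 2.x str.encode(): implicit decode via the default codec, then encode."""
    return raw.decode(default).encode(encoding)

# Pure-ASCII input passes through unchanged, which hides the problem.
ascii_result = py2_style_encode(b"abc", "latin-1")
print(ascii_result)  # b'abc'

# Non-ASCII input fails in the hidden *decode* step, not in the encode.
try:
    py2_style_encode(b"abc\xf0", "latin-1")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc.reason
print(failure)       # ordinal not in range(128)
```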

Unicode objects and strings are not the same thing. We shouldn't treat them
as the same thing. They share an interface (like lists and tuples do), and
if you only use that interface, treating them as the same kind of object is
mostly ok. They actually share *less* of an interface than lists and tuples,
though, as comparing strings to unicode objects can raise an exception,
whereas comparing lists to tuples is not expected to. For anything less
trivial than indexing, slicing and most of the string methods, and anything
whatsoever involving non-ASCII (or, rather, non-system-encoding), unicode
objects and strings *must* be treated separately. For instance, there is no
correct way to do:


unless you know the type of 's'. If it's unicode, you want u"\x80" instead
of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
but you wouldn't know from looking at the code -- maybe it expects a
specific encoding (or encoding family), maybe not. As soon as you deal with
unicode, you need to really understand the concept, and too many programmers
don't. And it's very hard to tell from someone's comments whether they fail
to understand or just get some of the terminology wrong; that's why Guido's
comments about 'encoding a byte string' and 'what if the file encoding is
Unicode' scare me. The unicode/string mixup almost makes me wish Python
was statically typed.
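
A sketch of the "you need to know the type of 's'" problem, assuming a
Python 3 interpreter, where mixing the two types is a hard error rather
than a silent coercion. The helper split_on_0x80 is invented for this
example.

```python
def split_on_0x80(s):
    """Split s on the 0x80 code unit, choosing a separator of the same type."""
    sep = "\x80" if isinstance(s, str) else b"\x80"
    return s.split(sep)

print(split_on_0x80("a\x80b"))   # ['a', 'b']
print(split_on_0x80(b"a\x80b"))  # [b'a', b'b']

# Using a text separator on bytes is rejected outright.
try:
    b"a\x80b".split("\x80")
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                  # False
```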

So please, please, please don't make the mistake of 'doing something' with
the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
It wouldn't actually be usable except for the same things as 'str.encode':
to convert from ASCII to non-ASCII-supersets, or to convert to non-unicode
encodings (such as 'hex'.) You can achieve those two by doing, e.g.,
'bytes(s.encode('hex'))' if you really want to. Ignoring the encoding
(rather than raising an exception) would also allow code to be trivially
portable between Python 2.x and Py3K, when "" is actually a unicode object.

Not that I'm happy with ignoring anything, but not ignoring would be a bigger
crime here.

Oh, and while on the subject, I'm not convinced going all-unicode in Py3K is
a good idea either, but maybe I should save that discussion for PyCon. I'm
not thinking "why do we need unicode" anymore (which I did two years ago ;)
but I *am* thinking it'll be a big step for 90% of the programmers if they
have to grasp unicode and encodings to be able to even do 'raw_input()'
sensibly. I know I spend an inordinate amount of time trying to explain the
basics on #python on already.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From nnorwitz at  Tue Feb 14 09:09:36 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 14 Feb 2006 00:09:36 -0800
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$> <>
Message-ID: <>

On 2/13/06, Fred L. Drake, Jr. <fdrake at> wrote:
> On Monday 13 February 2006 10:03, Georg Brandl wrote:
>  > The above docs are from August 2005 while is current.
>  > Shouldn't the old docs be removed?
> I'm afraid I've generally been too busy to chime in much on this topic, but
> I've spent a bit of time thinking about it, and would like to keep on top of
> the issue still.


While you are here, are you planning to do the doc releases for 2.5? 
You are tentatively listed in PEP 356.  (Technically it says TBD with
a ? next to your name.)

> The automatically-maintained version of the development docs is certainly
> preferable to the manually-maintained-by-me version, and I've updated the
> link from to refer to that version for now.  However, I
> do have some concerns about how this is all structured still.

I think this was the quick hack I did.  I hope there are many
concerns. :-)  For example, if the doc build fails, ...  Hmmm, this
probably isn't a problem.  The doc won't be updated, but will still be
the last good version.  So if I send mail when the doc doesn't build,
then it might not be so bad.  Will have to test this.  I still need to
switch over the failure mails to go to python-checkins.  There are too
many right now though.  Unless people don't mind getting several
messages about refleaks every day?  Anyone?

> What I would also like to see is to have an automatically-updated version for
> each of the maintainer versions of Python, as well as the development trunk.
> That would mean two versions at this point (2.4.x, 2.5.x); only one of those
> is currently handled automatically.

That shouldn't be a problem.  See


From mal at  Tue Feb 14 09:09:56 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 14 Feb 2006 09:09:56 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>>> It's the consequences:  nobody complains about tacking "const" on to a
>>> former honest-to-God "char *" argument that was in fact not modified,
>>> because that's not only helpful for C++ programmers, it's _harmless_
>>> for all programmers.  For example, nobody could sanely object (and
>>> nobody did :-)) to adding const to the attribute-name argument in
>>> PyObject_SetAttrString().  Sticking to that creates no new problems
>>> for anyone, so that's as far as I ever went.
>> Well, it broke my C extensions... I now have this in my code:
>> /* The keyword array changed to const char* in Python 2.5 */
>> #if PY_VERSION_HEX >= 0x02050000
>> # define Py_KEYWORDS_STRING_TYPE const char
>> #else
>> # define Py_KEYWORDS_STRING_TYPE char
>> #endif
>> ...
>> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
>> ...
> You did not read Tim's message carefully enough. He wasn't talking
> about PyArg_ParseTupleAndKeywords *at all*. He only talked about
> changing char* arguments to const char*, e.g. in
> PyObject_SetAttrString. Did that break your C extensions also?

I did read Tim's post: sorry for phrasing the reply the way I did.

I was referring to his statement "nobody complains about tacking "const"
on to a former honest-to-God "char *" argument that was in fact not
modified".

Also: it's not me complaining, it's the compilers !

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 14 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From fuzzyman at  Tue Feb 14 10:29:37 2006
From: fuzzyman at (Fuzzyman)
Date: Tue, 14 Feb 2006 09:29:37 +0000
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <dsbc3h$rct$>	
Message-ID: <>

Guido van Rossum wrote:

> [snip..]
>>In py3k, when the str object is eliminated, then what do you have?
>>- bytes("\x80"), you get an error, encoding is required. There is no
>>such thing as "default encoding" anymore, as there's no str object.
>>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
>>single byte of value 0x80.
>Yes to both again.
*Slightly* related question. Sorry for the tangent.

In Python 3K, when the string data-type has gone, what will
``open(filename).read()`` return ? Will the object returned have a
``decode`` method, to coerce to a unicode string ?

Also, what datatype will ``u'some string'.encode('ascii')`` return ?

I assume that when the ``bytes`` datatype is implemented, we will be
able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
probably ought to read the bytes PEP and the Py3k one...

Just curious...

All the best,

Michael Foord


From greg.ewing at  Tue Feb 14 11:52:59 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 14 Feb 2006 23:52:59 +1300
Subject: [Python-Dev] nice()
In-Reply-To: <004f01c630c0$f051e1f0$5f2c4fca@csmith>
References: <>
Message-ID: <>

Smith wrote:

> computing the bin boundaries for a histogram
> where bins are a width of 0.1:
> >>> for i in range(20):
> ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
> ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.

I don't see how that has any relevance to the way bin boundaries
would be used in practice, which is to say something like

   i = int(value / 0.1)
   bin[i] += 1 # modulo appropriate range checks

which doesn't require comparing floats for equality at all.
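
A minimal sketch of that approach, written for present-day Python 3 rather
than the 2.x of this thread. The name bin_counts, its parameters and the
sample values are made up for illustration, and math.floor stands in for
int() so that small negative values are not truncated into bin 0:

```python
import math

def bin_counts(values, width, nbins):
    """Count how many values fall into each bin of the given width."""
    counts = [0] * nbins
    for value in values:
        i = math.floor(value / width)   # bin index; no == between floats
        if 0 <= i < nbins:              # the "appropriate range checks"
            counts[i] += 1
    return counts

counts = bin_counts([0.05, 0.15, 0.17, 0.93], width=0.1, nbins=10)
```

A value that sits exactly on a decimal boundary may land on either side of
it, but every in-range value still lands in exactly one bin.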

> For, say, garden variety numbers that aren't full of garbage digits
> resulting from fp computation, the boundaries computed as 0.1*i are
> not going to agree with such simple numbers as 1.4 and 0.7.

Because the arithmetic is binary rather than decimal. But even using
decimal, you get the same sort of problems using a bin width of
1.0/3.0. The solution is to use an algorithm that isn't sensitive
to those problems, then it doesn't matter what base your arithmetic
is done in.
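
A small illustration of that point, assuming the default context of the
standard decimal module (28 significant digits); the variable names are
only for this sketch:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly.
binary_exact = (0.1 * 3 == 0.3)

# Decimal floating point cannot represent 1/3 exactly either.
third = Decimal(1) / Decimal(3)
decimal_exact = (third * 3 == Decimal(1))
```

Both comparisons come out false, so switching the base only moves the
awkward values around.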

> I understand that the above really is just a patch over the problem,
> but I'm wondering if it moves the problem far enough away that most
> users wouldn't have to worry about it.

No, it doesn't. The problems are not conveniently grouped together
in some place you can get away from; they're scattered all over the
place where you can stumble upon one at any time.

> So perhaps this brings us back to the original comment that "fp issues
> are a learning opportunity." They are. The question I have is "how
> soon do they need to run into them?" Is decreasing the likelihood that
> they will see the problem (but not eliminate it) a good thing for the
> python community or not?

I don't think you're doing anyone any favours by trying to protect
them from having to know about these things, because they *need* to
know about them if they're not to write algorithms that seem to
work fine on tests but mysteriously start producing garbage when
run on real data, possibly without it even being obvious that it is


From greg.ewing at  Tue Feb 14 11:59:04 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 14 Feb 2006 23:59:04 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<dsjrfp$g72$> <>
Message-ID: <>

Guido van Rossum wrote:

> I also wonder if having a b"..." literal would just add more confusion
> -- bytes are not characters, but b"..." makes it appear as if they
> are.

I'm inclined to agree. Bytes objects are more likely to be used
for things which are *not* characters -- if they're characters,
they would be better kept in strings or char arrays.

+1 on any eventual bytes literal looking completely different
from a string literal.


From greg.ewing at  Tue Feb 14 12:25:03 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 00:25:03 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

Guido van Rossum wrote:

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects.

My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of


From greg.ewing at  Tue Feb 14 12:35:17 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 00:35:17 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

Barry Warsaw wrote:

> This makes me think I want an unsigned byte type, which b[0] would  
> return.

Come to think of it, this is something I don't
remember seeing discussed. I've been thinking
that bytes[i] would return an integer, but is
the intention that it would return another bytes


From ncoghlan at  Tue Feb 14 12:53:04 2006
From: ncoghlan at (Nick Coghlan)
Date: Tue, 14 Feb 2006 21:53:04 +1000
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	<>	<>	<>	<>	<>	<>	<>	<>	<dsrglo$2a7$>
Message-ID: <>

Guido van Rossum wrote:
> In general I've come to appreciate that there are two ways of
> converting an object of type A to an object of type B: ask an A
> instance to convert itself to a B, or ask the type B to create a new
> instance from an A.

And the difference between the two isn't even always that clear cut. Sometimes 
you'll ask type B to create a new instance from an A, and then while you're 
not looking type B cheats and goes and asks the A instance to do it instead ;)
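
A sketch of that kind of cheating, using the bytes type as it eventually
shipped in Python 3 (it did not exist when this was written). Packet is a
hypothetical class invented for the example:

```python
class Packet:
    """Hypothetical type A that knows how to turn itself into bytes."""

    def __init__(self, payload):
        self.payload = payload
        self.asked = False

    def __bytes__(self):
        # bytes(packet) ends up here: type B delegates to the A instance.
        self.asked = True
        return bytes(self.payload)

packet = Packet([1, 2, 3])
raw = bytes(packet)     # looks like "ask type B", but A does the work
```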


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Tue Feb 14 13:08:47 2006
From: ncoghlan at (Nick Coghlan)
Date: Tue, 14 Feb 2006 22:08:47 +1000
Subject: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b,
 can be used in X[a:b] notation
In-Reply-To: <>
References: <>
	<dsgem7$10u$>	<>	<>
	<dsgqlg$7ml$>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/10/06, Mark Russell <mrussell at> wrote:
>> On 10 Feb 2006, at 12:45, Nick Coghlan wrote:
>> An alternative would be to call it "__discrete__", as that is the key
>> characteristic of an indexing type - it consists of a sequence of discrete
>> values that can be isomorphically mapped to the integers.
>> Another alternative: __as_ordinal__.  Wikipedia describes ordinals as
>> "numbers used to denote the position in an ordered sequence" which seems a
>> pretty precise description of the intended result.  The "as_" prefix also
>> captures the idea that this should be a lossless conversion.
> Aren't ordinals generally assumed to be non-negative? The numbers used
> as slice or sequence indices can be negative!

The other problem with 'ordinal' as a name is that the term already has a 
meaning in Python (what else would 'ord' be short for?).

I liked index from the start, but I thought we should put at least a bit of 
effort into seeing if we could come up with anything better. I don't really 
see any way that either 'discrete' or 'ordinal' can be said to qualify as 
better :)
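
For reference, the name that won is the one Python 2.5 shipped as
__index__ (PEP 357). A minimal sketch of how it behaves in current Python,
with Ordinal as a made-up class for illustration only:

```python
class Ordinal:
    """Hypothetical integer-like type that is not an int subclass."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        # Must return a real int; negative values are allowed.
        return self.n

letters = "abcdef"
middle = letters[Ordinal(1):Ordinal(-1)]   # usable in X[a:b] notation
last = letters[Ordinal(-1)]                # and as a plain index
```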


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rhamph at  Tue Feb 14 13:47:39 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 05:47:39 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
	<> <>
Message-ID: <>

On 2/14/06, "Martin v. Löwis" <martin at> wrote:
> Adam Olsen wrote:
> > What would that imply for repr()?  To support eval(repr(x))
> I don't think eval(repr(x)) needs to be supported for the bytes
> type. However, if that is desirable, it should return something
> like
>   bytes([1,2,3])

I'm starting to wonder, do we really need anything fancy?  Wouldn't it
be sufficient to have a way to compactly store 8-bit integers?

In 2.x we could convert unicode like this:
bytes(ord(c) for c in u"It's...".encode('utf-8'))
u"It's...".byteencode('utf-8')  # Shortcut for above

In 3.0 it changes to:
u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility

Passing a str or unicode directly to bytes() would be an error. 
repr(bytes(...)) would produce bytes([1,2,3]).

Probably need a __bytes__() method that print can call, or even better
a __print__(file) method[0].  The write() methods would of course have
to support bytes objects.

I realize it would be odd for the interactive interpreter to print them
as a list of ints by default:
>>> u"It's...".byteencode('utf-8')
[73, 116, 39, 115, 46, 46, 46]
But maybe it's time we stopped hiding the real nature of bytes from users?
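
For comparison, a sketch of how this looks in Python 3 as eventually
released. There is no byteencode() method there; plain str.encode()
returns a bytes object, and the repr settled on a b'...' literal form
rather than a list of ints. The variable names are only for this example:

```python
text = "It's..."
data = text.encode("utf-8")     # str.encode returns bytes in Python 3
as_ints = list(data)            # iterating over bytes yields ints
round_trip = bytes(as_ints)     # bytes() accepts an iterable of ints 0..255
```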

[0] By this I mean calling objects recursively and telling them what
file to print to, rather than getting a temporary string from them and
printing that.  I always wondered why you could do that from C
extensions but not from Python code.

Adam Olsen, aka Rhamphoryncus

From Jack.Jansen at  Tue Feb 14 13:59:31 2006
From: Jack.Jansen at (Jack Jansen)
Date: Tue, 14 Feb 2006 13:59:31 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Thanks to all for a rather insightful discussion, it's always fun to  
learn that after 28 years of C programming the language still has  
little corners that I know absolutely nothing about:-)

Practically speaking, though, I've adopted MAL's solution for the  
time being:

> /* The keyword array changed to const char* in Python 2.5 */
> #if PY_VERSION_HEX >= 0x02050000
> # define Py_KEYWORDS_STRING_TYPE const char
> #else
> # define Py_KEYWORDS_STRING_TYPE char
> #endif
> ...
> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
> ...
> if (!PyArg_ParseTupleAndKeywords(args,kws,format,kwslist,&a1))
>     goto onError;

At least this appears to work...
Jack Jansen, <Jack.Jansen at>,
If I can't dance I don't want to be part of your revolution -- Emma  

From jeremy at  Tue Feb 14 14:01:10 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 14 Feb 2006 08:01:10 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/14/06, M.-A. Lemburg <mal at> wrote:
> Martin v. Löwis wrote:
> > M.-A. Lemburg wrote:
> >>> It's the consequences:  nobody complains about tacking "const" on to a
> >>> former honest-to-God "char *" argument that was in fact not modified,
> >>> because that's not only helpful for C++ programmers, it's _harmless_
> >>> for all programmers.  For example, nobody could sanely object (and
> >>> nobody did :-)) to adding const to the attribute-name argument in
> >>> PyObject_SetAttrString().  Sticking to that creates no new problems
> >>> for anyone, so that's as far as I ever went.
> >>
> >> Well, it broke my C extensions... I now have this in my code:
> >>
> >> /* The keyword array changed to const char* in Python 2.5 */
> >> #if PY_VERSION_HEX >= 0x02050000
> >> # define Py_KEYWORDS_STRING_TYPE const char
> >> #else
> >> # define Py_KEYWORDS_STRING_TYPE char
> >> #endif
> >> ...
> >> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
> >> ...
> >
> > You did not read Tim's message carefully enough. He wasn't talking
> > about PyArg_ParseTupleAndKeywords *at all*. He only talked about
> > changing char* arguments to const char*, e.g. in
> > PyObject_SetAttrString. Did that break your C extensions also?
> I did read Tim's post: sorry for phrasing the reply the way I did.
> I was referring to his statement "nobody complains about tacking "const"
> on to a former honest-to-God "char *" argument that was in fact not
> modified".
> Also: it's not me complaining, it's the compilers !

Tim was talking about adding const to a char* not adding const to a
char** (note the two stars).  The subsequent discussion has been about
the different way those are handled in C and C++ and a general
agreement that the "const char**" has been a bother for people.


From mwh at  Tue Feb 14 14:03:39 2006
From: mwh at (Michael Hudson)
Date: Tue, 14 Feb 2006 13:03:39 +0000
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <> (Greg Ewing's message of
	"Wed, 15 Feb 2006 00:25:03 +1300")
References: <dsbc3h$rct$>
Message-ID: <>

Greg Ewing <greg.ewing at> writes:

> Guido van Rossum wrote:
>> There's also the consideration for APIs that, informally, accept
>> either a string or a sequence of objects.
> My preference these days is not to design APIs that
> way. It's never necessary and it avoids a lot of
> problems.

Oh yes.


  ZAPHOD:  Listen three eyes, don't try to outweird me, I get stranger
           things than you free with my breakfast cereal.
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 7

From jeremy at  Tue Feb 14 14:05:32 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 14 Feb 2006 08:05:32 -0500
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, "Martin v. Löwis" <martin at> wrote:
> Jeremy Hylton wrote:
> > The compiler in question is gcc and the warning can be turned off with
> > -Wno-write-strings.  I think we'd be better off leaving that option
> > on, though.  This warning will help me find places where I'm passing a
> > string literal to a function that does not take a const char*.  That's
> > valuable, not insensate.
> Hmm. I'd say this depends on what your reaction to the warning is.
> If you sprinkle const_casts in the code, nothing is gained.

Except for the Python APIs, we would declare the function as taking a
const char* if it took a const char*.  If the function legitimately takes
a char*, then you have to change the code to avoid a segfault.

> Perhaps there is some value in finding functions which ought to expect
> const char*. For that, occasional checks should be sufficient; I cannot
> see a point in having code permanently pass with that option. In
> particular not if you are interfacing with C libraries.

I don't understand what you mean:  I'm not sure what you mean by
"occasional checks" or "permanently pass".  The compiler flags are
always the same.


From barry at  Tue Feb 14 14:32:42 2006
From: barry at (Barry Warsaw)
Date: Tue, 14 Feb 2006 08:32:42 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Feb 14, 2006, at 6:35 AM, Greg Ewing wrote:

> Barry Warsaw wrote:
>> This makes me think I want an unsigned byte type, which b[0] would
>> return.
> Come to think of it, this is something I don't
> remember seeing discussed. I've been thinking
> that bytes[i] would return an integer, but is
> the intention that it would return another bytes
> object?

A related question: what would bytes([104, 101, 108, 108, 111, 8004])  
return?  An exception hopefully.  I also think you'd want bytes([x  
for x in some_bytes_object]) to return an object equal to the original.
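
As it turned out, Python 3's bytes type behaves this way. A short sketch
against current Python 3 (not the 2.x proposal under discussion), with
names chosen only for the example:

```python
hello = bytes([104, 101, 108, 108, 111])

try:
    bytes([104, 101, 108, 108, 111, 8004])
    out_of_range_rejected = False
except ValueError:                      # 8004 does not fit in a byte
    out_of_range_rejected = True

first = hello[0]                        # an int, not a length-1 bytes object
copy = bytes([x for x in hello])        # round-trips through a list of ints
```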


From foom at  Tue Feb 14 17:08:30 2006
From: foom at (James Y Knight)
Date: Tue, 14 Feb 2006 11:08:30 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>
	<>	<>	<>	<>	<>	<>
Message-ID: <>

On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:

> Phillip J. Eby wrote:
>> I was just pointing out that since byte strings are bytes by  
>> definition,
>> then simply putting those bytes in a bytes() object doesn't alter the
>> existing encoding.  So, using latin-1 when converting a string to  
>> bytes
>> actually seems like the One Obvious Way to do it.
> This is a misconception. In Python 2.x, the type str already *is* a
> bytes type. So if S is an instance of 2.x str, bytes(S) does not need
> to do any conversion. You don't need to assume it is latin-1: it's
> already bytes.
>> In fact, the 'encoding' argument seems useless in the case of str  
>> objects,
>> and it seems it should default to latin-1 for unicode objects.
> I agree with the former, but not with the latter. There shouldn't be a
> conversion of Unicode objects to bytes at all. If you want bytes from
> a Unicode string U, write
>   bytes(U.encode(encoding))

I like it, it makes sense. Unicode strings are simply not allowed as  
arguments to the byte constructor. Thinking about it, why would it be  
otherwise? And if you're mixing str-strings and unicode-strings, that  
means the str-strings you're sometimes giving are actually not byte  
strings, but character strings anyhow, so you should be encoding  
those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Kill the encoding argument, and you're left with:

Python2.X:
- bytes(bytes_object) -> copy constructor
- bytes(str_object) -> copy the bytes from the str to the bytes object
- bytes(sequence_of_ints) -> make bytes with the values of the ints,  
error on overflow

Python3.X removes str, and most APIs that did return str return bytes  
instead. Now all you have is:
- bytes(bytes_object) -> copy constructor
- bytes(sequence_of_ints) -> make bytes with the values of the ints,  
error on overflow

Nice and simple.


From pje at  Tue Feb 14 17:25:01 2006
From: pje at (Phillip J. Eby)
Date: Tue, 14 Feb 2006 11:25:01 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
	<dsjrfp$g72$> <>
Message-ID: <>

At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:

>On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
>>Phillip J. Eby wrote:
>>>I was just pointing out that since byte strings are bytes by definition,
>>>then simply putting those bytes in a bytes() object doesn't alter the
>>>existing encoding.  So, using latin-1 when converting a string to bytes
>>>actually seems like the One Obvious Way to do it.
>>This is a misconception. In Python 2.x, the type str already *is* a
>>bytes type. So if S is an instance of 2.x str, bytes(S) does not need
>>to do any conversion. You don't need to assume it is latin-1: it's
>>already bytes.
>>>In fact, the 'encoding' argument seems useless in the case of str objects,
>>>and it seems it should default to latin-1 for unicode objects.
>>I agree with the former, but not with the latter. There shouldn't be a
>>conversion of Unicode objects to bytes at all. If you want bytes from
>>a Unicode string U, write
>>   bytes(U.encode(encoding))
>I like it, it makes sense. Unicode strings are simply not allowed as
>arguments to the byte constructor. Thinking about it, why would it be
>otherwise? And if you're mixing str-strings and unicode-strings, that
>means the str-strings you're sometimes giving are actually not byte
>strings, but character strings anyhow, so you should be encoding
>those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Actually, I think you mean:

     if isinstance(s_or_U, str):
         s_or_U = s_or_U.decode('utf-8')

     b = bytes(s_or_U.encode('utf-8'))

Or maybe:

     if isinstance(s_or_U, unicode):
         s_or_U = s_or_U.encode('utf-8')

     b = bytes(s_or_U)

Which is why I proposed that the boilerplate logic get moved *into* the 
bytes constructor.  I think this use case is going to be common in today's 
Python, but in truth I'm not as sure what bytes() will get used *for* in 
today's Python.  I'm probably overprojecting based on the need to use str 
objects now, but bytes aren't going to be a replacement for str for a good 
while anyway.
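
A sketch of that boilerplate as a standalone helper, translated to
Python 3 terms, where str is the text type and so plays the role that
unicode plays in the second variant above. The name to_bytes and the
UTF-8 default are assumptions for illustration, not a proposed API:

```python
def to_bytes(s_or_u, encoding="utf-8"):
    """Return bytes for either text or byte input (hypothetical helper)."""
    if isinstance(s_or_u, str):         # text: must be encoded explicitly
        return s_or_u.encode(encoding)
    return bytes(s_or_u)                # already bytes-like: just copy

from_text = to_bytes("caf\u00e9")
from_bytes = to_bytes(b"caf\xc3\xa9")
```

Moving the check into the constructor would hide exactly this branch,
which is where the "when does the encoding apply" question comes from.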

>Kill the encoding argument, and you're left with:
>- bytes(bytes_object) -> copy constructor
>- bytes(str_object) -> copy the bytes from the str to the bytes object
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>Python3.X removes str, and most APIs that did return str return bytes
>instead. Now all you have is:
>- bytes(bytes_object) -> copy constructor
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>Nice and simple.

I could certainly live with that approach, and it certainly rules out all 
the "when does the encoding argument apply and when should it be an error 
to pass it" questions.  :)

From mal at  Tue Feb 14 17:47:39 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 14 Feb 2006 17:47:39 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

James Y Knight wrote:
> Kill the encoding argument, and you're left with:
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> Python3.X removes str, and most APIs that did return str return bytes  
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> Nice and simple.

Albeit, too simple.

The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.

It's hard to imagine how you'd provide a decent upgrade path
for bytes() if you introduce the above semantics in Py2.x.

People would start writing bytes("123") in Py2.x and expect
it to also work in Py3k, which it wouldn't.

To prevent this, you'd have to rule out bytes() construction
from strings altogether, which doesn't look like a viable
option either.

Marc-Andre Lemburg


From alan.gauld at  Mon Feb 13 01:35:16 2006
From: alan.gauld at (Alan Gauld)
Date: Mon, 13 Feb 2006 00:35:16 -0000
Subject: [Python-Dev] [Tutor] nice()
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <00a401c63035$5cad50a0$0b01a8c0@xp>

>> However I do dislike the name nice() - there is already a nice() in the
>> os module with a fairly well understood function. But I'm sure some

> Presumably it would be located somewhere like the math module.

For sure, but let's avoid as many name clashes as we can.
Python is very good at managing namespaces but there are still a 
lot of folks who favour the 

from x import * 

mode of working.

Alan G.

From jcarlson at  Tue Feb 14 18:28:54 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 14 Feb 2006 09:28:54 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

James Y Knight <foom at> wrote:
> I like it, it makes sense. Unicode strings are simply not allowed as  
> arguments to the byte constructor. Thinking about it, why would it be  
> otherwise? And if you're mixing str-strings and unicode-strings, that  
> means the str-strings you're sometimes giving are actually not byte  
> strings, but character strings anyhow, so you should be encoding  
> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

I also like the removal of the encoding...

> Kill the encoding argument, and you're left with:
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> Python3.X removes str, and most APIs that did return str return bytes  
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow

What's great is that this already works:

>>> import array
>>> array.array('b', [1,2,3])
array('b', [1, 2, 3])
>>> array.array('b', "hello")
array('b', [104, 101, 108, 108, 111])
>>> array.array('b', u"hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array initializer must be list or string
>>> array.array('b', [150])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: signed char is greater than maximum
>>> array.array('B', [150])
array('B', [150])
>>> array.array('B', [350])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: unsigned byte integer is greater than maximum

And out of the deal we can get both signed and unsigned ints.

Re: Adam Olsen
> I'm starting to wonder, do we really need anything fancy?  Wouldn't it
> be sufficient to have a way to compactly store 8-bit integers?

It already exists.  It could just use another interface.  The buffer
interface offers any array the ability to return strings.  That may have
to change to return bytes objects in Py3k.

 - Josiah

From crutcher at  Tue Feb 14 18:48:59 2006
From: crutcher at (Crutcher Dunnavant)
Date: Tue, 14 Feb 2006 09:48:59 -0800
Subject: [Python-Dev] [Tutor] nice()
In-Reply-To: <00a401c63035$5cad50a0$0b01a8c0@xp>
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <>

On 2/12/06, Alan Gauld <alan.gauld at> wrote:
> >> However I do dislike the name nice() - there is already a nice() in the
> >> os module with a fairly well understood function. But I'm sure some
> > Presumably it would be located somewhere like the math module.
> For sure, but let's avoid as many name clashes as we can.
> Python is very good at managing namespaces but there are still a
> lot of folks who favour the
> from x import *
> mode of working.

Yes, and there are people who insist on drinking and driving; that
doesn't mean cars should be designed with that as a motivating
assumption. There are just too many places where you are going to get
name clashes, where something which is _obvious_ in one context will
have a different (and _obvious_) meaning in another. Let's just keep
the namespaces clean, and not worry about inter-module conflicts.

Crutcher Dunnavant <crutcher at>

From mal at  Tue Feb 14 18:58:11 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 14 Feb 2006 18:58:11 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>	
	<dsjrfp$g72$> <>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/13/06, M.-A. Lemburg <mal at> wrote:
>> Guido van Rossum wrote:
>>> It'd be cruel and unusual punishment though to have to write
>>>   bytes("abc", "Latin-1")
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
> As Phillip guessed, I was indeed thinking about introducing bytes()
> sooner than that, perhaps even in 2.5 (though I don't want anything
> rushed).

Hmm, that is probably going to be too early. As the thread shows
there are lots of things to take into account, esp. since if you
plan to introduce bytes() in 2.x, the upgrade path to 3.x would
have to be carefully planned. Otherwise, we end up introducing
a feature which is meant to prepare for 3.x and then we end up
causing breakage when the move is finally implemented.

> Even in Py3k though, the encoding issue stands -- what if the file
> encoding is Unicode? Then using Latin-1 to encode bytes by default
> might not be what the user expected. Or what if the file encoding is
> something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> Anything default but ASCII isn't going to work as expected. ASCII
> isn't going to work as expected either, but it will complain loudly
> (by throwing a UnicodeError) whenever you try it, rather than causing
> subtle bugs later.

I think there's a misunderstanding here: in Py3k, all "string"
literals will be converted from the source code encoding to
Unicode. There are no ambiguities - a Klingon character will still
map to the same ordinal used to create the byte content regardless
of whether the source file is encoded in UTF-8, UTF-16 or
some Klingon charset (are there any ?).

Furthermore, by restricting to ASCII you'd also rule out hex escapes
which seem to be the natural choice for presenting binary data in
literals - the Unicode representation would then only be an
implementation detail of the way Python treats "string" literals
and a user would certainly expect to find e.g. \x88 in the bytes object
if she writes bytes('\x88').

But maybe you have something different in mind... I'm talking
about ways to create bytes() in Py3k using "string" literals.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it.
> I'm not sure which auto-conversion you're talking about, since there
> is no bytes type yet. If you're talking about the auto-conversion from
> str to unicode: the bytes type should not be assumed to have *any*
> properties that the current str type has, and that includes
> auto-conversion.

I was talking about the automatic conversion of 8-bit strings to
Unicode - which was a key feature to make the introduction of
Unicode less painful, but will no longer be necessary in Py3k.

>> In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
>> (Maybe a bit radical, but I guess that's what Py3k is meant for.)
> Right.

Marc-Andre Lemburg
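
The Latin-1 versus ASCII point above can be checked in an interpreter
where string literals are already Unicode. What follows is a minimal
sketch, assuming Python 3; the variable names are illustrative only.
Latin-1 maps each code point below 256 to the byte with the same value,
while ASCII rejects anything above 127 with an error.

```python
# Assumes Python 3, where "..." literals are Unicode strings.
text = "\x88"                      # one character, code point 0x88

# Latin-1 maps code points 0..255 to the identical byte values.
as_latin1 = text.encode("latin-1")
print(list(as_latin1))

# The mapping is the identity for every code point below 256.
all_chars = "".join(chr(i) for i in range(256))
round_trip = list(all_chars.encode("latin-1")) == list(range(256))

# ASCII refuses the same character loudly instead of guessing.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```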


From at  Tue Feb 14 19:15:37 2006
From: at (Michael Walter)
Date: Tue, 14 Feb 2006 19:15:37 +0100
Subject: [Python-Dev] [Tutor] nice()
In-Reply-To: <>
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <>

It doesn't seem to me that math.nice has an obvious meaning.


On 2/14/06, Crutcher Dunnavant <crutcher at> wrote:
> On 2/12/06, Alan Gauld <alan.gauld at> wrote:
> > >> However I do dislike the name nice() - there is already a nice() in the
> > >> os module with a fairly well understood function. But I'm sure some
> >
> > > Presumably it would be located somewhere like the math module.
> >
> > For sure, but let's avoid as many name clashes as we can.
> > Python is very good at managing namespaces but there are still a
> > lot of folks who favour the
> >
> > from x import *
> >
> > mode of working.
> Yes, and there are people who insist on drinking and driving, that
> doesn't mean cars should be designed with that as a motivating
> assumption. There are just too many places where you are going to get
> name clashes, where something which is _obvious_ in one context will
> have a different ( and _obvious_ ) meaning in another. Let's just keep
> the namespaces clean, and not worry about inter-module conflicts.
> --
> Crutcher Dunnavant <crutcher at>

From foom at  Tue Feb 14 19:35:44 2006
From: foom at (James Y Knight)
Date: Tue, 14 Feb 2006 13:35:44 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>	<>	<dsbc3h$rct$>	<>	<>	<>	<>	<dsjrfp$g72$>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

On Feb 14, 2006, at 11:47 AM, M.-A. Lemburg wrote:
> The above approach would basically remove the possibility to easily
> create bytes() from literals in Py3k, since literals in Py3k create
> Unicode objects, e.g. bytes("123") would not work in Py3k.

That is true. And I think that is correct. There should be b"string"  

> It's hard to imagine how you'd provide a decent upgrade path
> for bytes() if you introduce the above semantics in Py2.x.
> People would start writing bytes("123") in Py2.x and expect
> it to also work in Py3k, which it wouldn't.

Agreed, it won't work.

> To prevent this, you'd have to rule out bytes() construction
> from strings altogether, which doesn't look like a viable
> option either.

I don't think you have to do that, you just have to provide b"string".

I'd like to point out that the previous proposal had the same issue:

On Feb 13, 2006, at 8:11 PM, Guido van Rossum wrote:
> On 2/13/06, James Y Knight <foom at> wrote:
>> In py3k, when the str object is eliminated, then what do you have?
>> Perhaps
>> - bytes("\x80"), you get an error, encoding is required. There is no
>> such thing as "default encoding" anymore, as there's no str object.
>> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
>> single byte of value 0x80.
> Yes to both again.
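
For comparison, the two cases quoted above can be tried in Python 3 as
it was eventually released, which also has a b"..." literal. This is
only a small sketch under that assumption, not a statement of what was
being proposed at the time of this thread; the variable names are made
up for the example.

```python
# Assumes Python 3, where str is always Unicode.
try:
    bytes("\x80")                  # no encoding given
    needs_encoding = False
except TypeError:
    needs_encoding = True

# With an explicit encoding the result is a single byte of value 0x80.
one_byte = bytes("\x80", encoding="latin-1")

# The literal spelling produces the same object value.
literal = b"\x80"
```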


From foom at  Tue Feb 14 19:36:26 2006
From: foom at (James Y Knight)
Date: Tue, 14 Feb 2006 13:36:26 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
	<dsjrfp$g72$> <>
Message-ID: <>

On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
> At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>> I like it, it makes sense. Unicode strings are simply not allowed as
>> arguments to the byte constructor. Thinking about it, why would it be
>> otherwise? And if you're mixing str-strings and unicode-strings, that
>> means the str-strings you're sometimes giving are actually not byte
>> strings, but character strings anyhow, so you should be encoding
>> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good  
>> spelling.
> Actually, I think you mean:
>     if isinstance(s_or_U, str):
>         s_or_U = s_or_U.decode('utf-8')
>     b = bytes(s_or_U.encode('utf-8'))
> Or maybe:
>     if isinstance(s_or_U, unicode):
>         s_or_U = s_or_U.encode('utf-8')
>     b = bytes(s_or_U)
> Which is why I proposed that the boilerplate logic get moved *into*  
> the bytes constructor.  I think this use case is going to be common  
> in today's Python, but in truth I'm not as sure what bytes() will  
> get used *for* in today's Python.  I'm probably overprojecting  
> based on the need to use str objects now, but bytes aren't going to  
> be a replacement for str for a good while anyway.

I most certainly *did not* mean that. If you are mixing together str  
and unicode instances, the str instances _must be_ in the default  
encoding (ascii). Otherwise, you are bound for failure anyhow, e.g.  
''.join(['\x95', u'1']). Str is used for two things right now: 1) a  
byte string. 2) a unicode string restricted to 7bit ASCII. These two  
uses are separate and you cannot mix them without causing disaster.
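
The "cannot mix them" point can be illustrated in Python 3, where the
two uses ended up as separate types. A minimal sketch, assuming
Python 3; `parts`, `mixing_allowed` and `joined` are names chosen for
the example. Joining a byte string with a text string fails outright,
and the text has to be encoded explicitly at the boundary.

```python
# Assumes Python 3: bytes and str are distinct types.
parts = [b"\x95", "1"]             # one byte string, one text string

try:
    b"".join(parts)
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Encoding the text explicitly keeps the two kinds of data apart.
joined = b"".join([b"\x95", "1".encode("ascii")])
```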

You've created an interface which can take either a utf8 byte-string,  
or unicode character string. But that's wrong and can only cause  
problems. It should take either an encoded bytestring, or a unicode  
character string. Not both. If it takes a unicode character string,  
there are two ways of spelling that in current python: a "str" object  
with only ASCII in it, or a "unicode" object with arbitrary  
characters in it. bytes(s_or_U.encode('utf-8')) works correctly with  


From guido at  Tue Feb 14 20:07:09 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 11:07:09 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/14/06, Fuzzyman <fuzzyman at> wrote:
> In Python 3K, when the string data-type has gone,

Technically it won't be gone; str will mean what it already means in
Jython and IronPython (for which CPython uses unicode in 2.x).

> what will
> ``open(filename).read()`` return ?

Since you didn't specify an open mode, it'll open it as a text file
using some default encoding (or perhaps it can guess the encoding from
file metadata -- this is all OS specific). So it'll return a string.

If you open the file in binary mode, however, read() will return a
bytes object. I'm currently considering whether we should have a
single open() function which returns different types of objects
depending on a string parameter's value, or whether it makes more
sense to have different functions, e.g. open() for text files and
openbinary() for binary files. I believe Fredrik Lundh wants open() to
use binary mode and opentext() for text files, but that seems
backwards -- surely text files are more commonly used, and surely the
most common operation should have the shorter name -- call it the
Huffman Principle.

> Will the object returned have a
> ``decode`` method, to coerce to a unicode string ?

No, the object returned will *be* a (unicode) string.

But a bytes object (returned by a binary open operation) will have a
decode() method.

> Also, what datatype will ``u'some string'.encode('ascii')`` return ?

It will be a syntax error (u"..." will be illegal).

The str.encode() method will return a bytes object (if the design goes
as planned -- none of this is set in stone yet).

> I assume that when the ``bytes`` datatype is implemented, we will be
> able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
> probably ought to read the bytes PEP and the Py3k one...

Sort of (except perhaps we'd be using openbinary(filename, 'w')).
Perhaps write(somedata) should automatically coerce the data to bytes?

--Guido van Rossum (home page:
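
The text/binary split described above can be sketched with the
single-function design, which is what Python 3 ended up shipping. The
example assumes Python 3 and a throwaway temporary file; the explicit
encoding="utf-8" is an assumption made so the result does not depend on
the platform default.

```python
import os
import tempfile

# Write a small file, then read it back in both modes.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("utf-8"))

    with open(path, encoding="utf-8") as f:   # text mode is the default
        text = f.read()
    with open(path, "rb") as f:               # binary mode
        raw = f.read()
finally:
    os.remove(path)

# Only the bytes object has decode(); the text result already is a string.
decoded = raw.decode("utf-8")
```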

From guido at  Tue Feb 14 20:16:32 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 11:16:32 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/13/06, "Martin v. Löwis" <martin at> wrote:
> I'm actually opposed to bdist_egg, from a conceptual point of view.
> I think it is wrong if Python creates its own packaging format
> (just as it was wrong that Java created jar files - but they are
> without deployment procedures even today).

I think Jars are a lower-level thing than what we're talking about
here; they're no different than shared libraries, and for an
architecture that has its own bytecode and toolchain it only makes
sense to invent its own cross-platform shared library format
(especially given the "deploy anywhere" slogan).

> The burden should be
> on developer's side, for creating packages for the various systems,
> not on the users side, when each software comes with its own
> deployment infrastructure.

Well, just like Java, if you have pure Python code, why should a
developer have to duplicate the busy-work of creating distributions
for different platforms? (Especially since there are so many different
target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
you -- I'm no expert but ISTM there are too many!)

> OTOH, users are fond of eggs, for reasons that I haven't yet
> understood.

I'm neutral on them; to be honest I don't even understand the
difference between eggs and setuptools yet. :-) I imagine that users
don't particularly care about eggs, but do care about the ease of use
of the tools around them, i.e. ez_setup.

> From a release management point of view, I would still like to
> make another bdist_msi release before contributing it to Python.

Please go ahead.

--Guido van Rossum (home page:

From nas at  Tue Feb 14 20:31:07 2006
From: nas at (Neil Schemenauer)
Date: Tue, 14 Feb 2006 12:31:07 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Feb 13, 2006 at 08:07:49PM -0800, Guido van Rossum wrote:
> On 2/13/06, Neil Schemenauer <nas at> wrote:
> >     "\x80".encode('latin-1')
> But in 2.5 we can't change that to return a bytes object without
> creating HUGE incompatibilities.

People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.  That spelling would provide a way of ensuring the type
of the return value.

> You missed the part where I said that introducing the bytes type
> *without* a literal seems to be a good first step. A new type, even
> built-in, is much less drastic than a new literal (which requires
> lexer and parser support in addition to everything else).

Are you concerned about the implementation effort?  If so, I don't
think that's justified since adding a new string prefix should be
pretty straightforward (relative to rest of the effort involved).
Are you comfortable with the proposed syntax?


From pje at  Tue Feb 14 20:53:37 2006
From: pje at (Phillip J. Eby)
Date: Tue, 14 Feb 2006 14:53:37 -0500
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <
References: <>
Message-ID: <>

(Disclaimer: I'm not currently promoting the addition of bdist_egg or any 
egg-specific features for the 2.5 timeframe, but neither am I 
opposed.  This message is just to clarify a few points and questions under 
discussion, not to advocate a particular outcome.  If you read this and 
think you see arguments for *doing* anything, you're projecting your own 
conclusions where there is only analysis.)

At 11:16 AM 2/14/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, "Martin v. Löwis" <martin at> wrote:
> > I'm actually opposed to bdist_egg, from a conceptual point of view.
> > I think it is wrong if Python creates its own packaging format
> > (just as it was wrong that Java created jar files - but they are
> > without deployment procedures even today).
>I think Jars are a lower-level thing than what we're talking about
>here; they're no different than shared libraries, and for an
>architecture that has its own bytecode and toolchain it only makes
>sense to invent its own cross-platform shared library format
>(especially given the "deploy anywhere" slogan).

Java, however, layers many things atop jars, including resources (files, 
images, messages, etc.) and metadata (manifests, deployment descriptors, 
etc.).  Eggs are the same.

To think that jars or eggs are a "packaging format" is a conceptual error 
if by "packaging format" you're equating them with .rpm, .deb, .msi, 
etc.  It is merely a convenient side benefit that .jar files and .egg files 
are convenient transport mechanisms for what's inside them - the jar or 
egg.  Jars and eggs are conceptual entities independent of the distribution 
format, and in the case of eggs there are two other formats (.egg directory 
and .egg-info tags) that can be used to express the conceptual entity.

> > The burden should be
> > on developer's side, for creating packages for the various systems,
> > not on the users side, when each software comes with its own
> > deployment infrastructure.
>Well, just like Java, if you have pure Python code, why should a
>developer have to duplicate the busy-work of creating distributions
>for different platforms? (Especially since there are so many different
>target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
>you -- I'm no expert but ISTM there are too many!)

Indeed.  Placing the burden on the developer's side simply means that it 
doesn't happen until volunteers pick it up, which happens slowly and only 
for "popular enough" packages.  Which means that as a practical matter, 
developers cannot release packages that depend on other packages without 
committing to some small set of target platforms and packaging systems -- 
the situation that setuptools was created to help change.

> > OTOH, users are fond of eggs, for reasons that I haven't yet
> > understood.
>I'm neutral on them; to be honest I don't even understand the
>difference between eggs and setuptools yet. :-)

Eggs are a way of associating metadata and resources with installed Python 
packages.  ".egg" is a zip or directory file layout that is one 
implementation of this concept.

Setuptools is a set of distutils enhancements that make it easier to build, 
test, distribute and deploy eggs, including the pkg_resources module (egg 
runtime support) and  the easy_install package manager.

>  I imagine that users
>don't particularly care about eggs, but do care about the ease of use
>of the tools around them, i.e. ez_setup.

And developers of course also care about not having to create those myriad 
installation formats, for platforms they may not even have.  :)  They also 
care about being able to specify dependencies reliably, which rules out 
entire classes of support issues and debugging.  It actually makes reuse of 
Python packages practical *without* unnecessarily tying the result to just 
one of the myriad platforms that Python runs on.  Some developers also like 
the plugin features, the ability to easily get data from their package 
directories, etc.

(Setuptools also offers a lot of creature comforts that the distutils 
doesn't, and some of those conveniences depend on eggs, but others do not.)

From just at  Tue Feb 14 21:35:50 2006
From: just at (Just van Rossum)
Date: Tue, 14 Feb 2006 21:35:50 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
Message-ID: <r01050400-1039-7EC926449D9911DA8736001124365170@[]>

Guido van Rossum wrote:

> > what will
> > ``open(filename).read()`` return ?
> Since you didn't specify an open mode, it'll open it as a text file
> using some default encoding (or perhaps it can guess the encoding from
> file metadata -- this is all OS specific). So it'll return a string.
> If you open the file in binary mode, however, read() will return a
> bytes object. I'm currently considering whether we should have a
> single open() function which returns different types of objects
> depending on a string parameter's value, or whether it makes more
> sense to have different functions, e.g. open() for text files and
> openbinary() for binary files. I believe Fredrik Lundh wants open() to
> use binary mode and opentext() for text files, but that seems
> backwards -- surely text files are more commonly used, and surely the
> most common operation should have the shorter name -- call it the
> Huffman Principle.

+1 for two functions.

My choice would be open() for binary and opentext() for text. I don't
find that backwards at all: the text function is going to be more
different from the current open() function than the binary function
would be, since in many ways the str type is closer to bytes than to
unicode.

Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).


From crutcher at  Tue Feb 14 22:41:21 2006
From: crutcher at (Crutcher Dunnavant)
Date: Tue, 14 Feb 2006 13:41:21 -0800
Subject: [Python-Dev] [Tutor] nice()
In-Reply-To: <>
References: <038701c63004$733603c0$132c4fca@csmith>
Message-ID: <>

On 2/14/06, Michael Walter < at> wrote:
> It doesn't seem to me that math.nice has an obvious meaning.

I don't disagree, I think math.nice is a terrible name. I was
objecting to the desire to try to come up with interesting, different
names in every module namespace.

> Regards,
> Michael
> On 2/14/06, Crutcher Dunnavant <crutcher at> wrote:
> > On 2/12/06, Alan Gauld <alan.gauld at> wrote:
> > > >> However I do dislike the name nice() - there is already a nice() in the
> > > >> os module with a fairly well understood function. But I'm sure some
> > >
> > > > Presumably it would be located somewhere like the math module.
> > >
> > > For sure, but let's avoid as many name clashes as we can.
> > > Python is very good at managing namespaces but there are still a
> > > lot of folks who favour the
> > >
> > > from x import *
> > >
> > > mode of working.
> >
> > Yes, and there are people who insist on drinking and driving, that
> > doesn't mean cars should be designed with that as a motivating
> > assumption. There are just too many places where you are going to get
> > name clashes, where something which is _obvious_ in one context will
> > have a different ( and _obvious_ ) meaning in another. Let's just keep
> > the namespaces clean, and not worry about inter-module conflicts.
> >
> > --
> > Crutcher Dunnavant <crutcher at>
> >
> >

Crutcher Dunnavant <crutcher at>
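
The namespace argument can be made concrete with two standard modules
that both define sqrt. This is a small sketch using math and cmath as
stand-ins (the thread is about a hypothetical math.nice versus
os.nice); the star-imports are run through exec into a scratch dict
only so the clash can be inspected.

```python
import math
import cmath

# Qualified names coexist: each module keeps its own meaning of sqrt.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)

# With "from x import *" the later import silently replaces the earlier one.
scratch = {}
exec("from math import *\nfrom cmath import *", scratch)
clobbered = scratch["sqrt"] is cmath.sqrt
```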

From thomas at  Tue Feb 14 22:46:08 2006
From: thomas at (Thomas Wouters)
Date: Tue, 14 Feb 2006 22:46:08 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 11:16:32AM -0800, Guido van Rossum wrote:

> Well, just like Java, if you have pure Python code, why should a
> developer have to duplicate the busy-work of creating distributions
> for different platforms? (Especially since there are so many different
> target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
> you -- I'm no expert but ISTM there are too many!)

Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
writes a regular distutils, and I can install the latest,
not-quite-in-apt version by doing ' bdist_deb' and installing the
resulting .deb. Very convenient for both parties ;)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From unknown_kev_cat at  Tue Feb 14 23:05:08 2006
From: unknown_kev_cat at (Joe Smith)
Date: Tue, 14 Feb 2006 17:05:08 -0500
Subject: [Python-Dev] bdist_* to stdlib?
References: <>
Message-ID: <dstlvb$6cb$>

"Guido van Rossum" <guido at> wrote in message 
news:ca471dc20602131604v12a4d70eq9d41b5ce543f3264 at
> In private email, Phillip Eby suggested to add these things to the
> 2.5. standard library:
> bdist_deb, bdist_msi, and friends
> He explained them as follows:
> """
> bdist_deb makes .deb files (packages for Debian-based Linux distros, like
> Ubuntu).  bdist_msi makes .msi installers for Windows (it's by Martin v.
> Loewis).  Marc Lemburg proposed on the distutils-sig that these and various
> other implemented bdist_* formats (other than bdist_egg) be included in the
> next Python release, and there was no opposition there that I recall.
> """

I don't like the idea of bdist_deb very much.
The idea behind the debian packaging system is that unlike with RPM and 
Windows, package management should be clean.

Windows and RPM are known for major dependency problems, letting packages 
damage each other, having packages that do not uninstall cleanly (i.e. 
packages that leave junk all over the place) and generally messing the system 
up quite badly over time, so that the OS is usually removed and 
re-installed periodically.

The Debian style system attempts to overcome these deficiencies, and 
generally does a decent job with it. The problem is that this can really 
only work if packages are well maintained, and adhere to a set of policies 
that help to further mitigate these problems. Even with all of that, 
packages from one debian based distribution may well cause problems with a 
different one. For that reason it is quite rare to see .debs distributed by 
parties other than those directly involved with a Debian-based distribution, 
and even then they are normally targeted specifically at one distribution. 
Making it easy to generate .debs of python modules will likely result in a 
noticeable increase in the number of .debs that do not target a specific 
distribution and/or do not follow the policies of that distribution.

So basically what I am saying is that such a system has a pretty good chance 
of resulting in debs that mess up users' systems, and that is not good. I'm 
not saying don't do it, but if it would be included in the standard library, 
proceed with caution! 

From aleaxit at  Tue Feb 14 23:37:59 2006
From: aleaxit at (Alex Martelli)
Date: Tue, 14 Feb 2006 14:37:59 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <r01050400-1039-7EC926449D9911DA8736001124365170@>
References: <>
Message-ID: <>

On 2/14/06, Just van Rossum <just at> wrote:
> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
might make it easy to eventually migrate off it. Maybe text and data
could be two subclasses of file, with file remaining initially as it
is (and perhaps becoming an abstract-only baseclass at the time 'open'
is deprecated).

In real life, people do all the time use 'open' inappropriately (on
non-text files on Windows): one of the most frequent tasks on
python-help has to do with diagnosing that this is what happened and
suggest the addition of an explicit 'rb' or 'wb' argument.  This
unending chore, in particular, makes me very wary of forever keeping
open to mean "open this _text_ file".
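
The failure mode described above comes from newline translation in text
mode. A rough sketch, assuming Python 3: passing newline="\r\n" forces
the Windows-style translation on any platform, so the effect is visible
everywhere, whereas an explicit 'wb' leaves the data untouched. The
file is a temporary one and the variable names are illustrative.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text mode: every "\n" written becomes "\r\n" on disk.
    with open(path, "w", newline="\r\n") as f:
        f.write("a\nb\n")
    with open(path, "rb") as f:
        on_disk = f.read()

    # Binary mode: the payload, including its 0x0A byte, is written as-is.
    payload = bytes([0x00, 0x0A, 0xFF])
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        unchanged = f.read()
finally:
    os.remove(path)
```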


From barry at  Tue Feb 14 23:48:57 2006
From: barry at (Barry Warsaw)
Date: Tue, 14 Feb 2006 17:48:57 -0500
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:

> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

I was actually thinking about static methods file.text() and
which seem nicely self descriptive, if a little bit longer.



From guido at  Tue Feb 14 23:51:20 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 14:51:20 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <r01050400-1039-7EC926449D9911DA8736001124365170@>
References: <>
Message-ID: <>

On 2/14/06, Just van Rossum <just at> wrote:
> Guido van Rossum wrote:
> > [...] surely text files are more commonly used, and surely the
> > most common operation should have the shorter name -- call it the
> > Huffman Principle.
> +1 for two functions.
> My choice would be open() for binary and opentext() for text. I don't
> find that backwards at all: the text function is going to be more
> different from the current open() function than the binary function
> would be since in many ways the str type is closer to bytes than to
> unicode.

It's still backwards because the current open function defaults to
text on Windows (the only platform where it matters any more).

> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

And then, on 2/14/06, Alex Martelli <aleaxit at> wrote:
> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

Plain 'text' and 'data' don't convey the fact that we're talking about
opening I/O objects here. If you want, we could say textfile() and
datafile(). (I'm fine with data instead of binary.)

But somehow I still like the 'open' verb. It has a long and rich
tradition. And it also nicely conveys that it is a factory function
which may return objects of different types (though similar in API)
based upon either additional arguments (e.g. buffering) or the
environment (e.g. encodings) or even inspection of the file being

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:25 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:25 -0800
Subject: [Python-Dev] bytes type discussion
Message-ID: <>

I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the mean time, to save you reading through
all those responses, here's a summary of where I believe I stand.
Let's continue the discussion in this new thread unless there are
specific hairs to be split in the other thread that aren't addressed
below or by later posts.

Non-controversial (or almost):

- we need a new PEP; PEP 332 won't cut it

- no b"..." literal

- bytes objects are mutable

- bytes objects are composed of ints in range(256)

- you can pass any iterable of ints to the bytes constructor, as long
as they are in range(256)

- longs or anything with an __index__ method should do, too

- when you index a bytes object, you get a plain int

- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'

Somewhat controversial:

- it's probably too big to attempt to rush this into 2.5

- bytes("abc") == bytes(map(ord, "abc"))

- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 255])

Very controversial:

- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument

- bytes(u"abc") == bytes("abc") # for ASCII at least

- bytes(u"\x80\xff") raises UnicodeError

- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
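
Several of the less controversial points can be tried out in Python 3
as released, with one caveat that is not part of this summary: there
the type named bytes is immutable and the mutable variant is called
bytearray, and the repr uses the literal form. A minimal sketch under
those assumptions, with illustrative variable names:

```python
# Assumes Python 3.
data = bytes([10, 20, 30])         # any iterable of ints in range(256)
first = data[0]                    # indexing yields a plain int

# Values outside range(256) are rejected.
try:
    bytes([128, 256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

# Building from code points via ord() gives the matching byte values.
from_ords = bytes(map(ord, "\x80\xff"))

# bytearray is the mutable counterpart.
buf = bytearray([10, 20, 30])
buf[0] = 11
```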

Martin von Loewis's alternative for the "very controversial" set is to
disallow an encoding argument and (I believe) also to disallow Unicode
arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
only way to convert a string (which is always unicode) to bytes. The
problem with this is that there's no code that works in both 2.x and

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:29 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:29 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, "Martin v. Löwis" <martin at> wrote:
> Guido van Rossum wrote:
> >>In py3k, when the str object is eliminated, then what do you have?
> >>Perhaps
> >>- bytes("\x80"), you get an error, encoding is required. There is no
> >>such thing as "default encoding" anymore, as there's no str object.
> >>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >>single byte of value 0x80.
> >
> > Yes to both again.
> Please reconsider, and don't give bytes() an encoding= argument.
> It doesn't need one. In Python 3, people should write
>   "\x80".encode("latin-1")
> if they absolutely want to, although they better write
>   bytes([0x80])
> Now, the first form isn't valid in 2.5, but
>   bytes(u"\x80".encode("latin-1"))
> could work in all versions.

In 3.0, I agree that .encode() should return a bytes object.

I'd almost be convinced that in 2.x bytes() doesn't need an encoding
argument, except it will require excessive copying.
bytes(u.encode("utf8")) will certainly use 2*len(u) bytes of space (plus
a constant); bytes(u, "utf8") only needs len(u) bytes. In 3.0,
bytes(s.encode(xxx)) would also create an extra copy, since the bytes
type is mutable (we all agree on that, don't we?).

I think that's a good enough argument for 2.x. We could keep the
extended API as an alternative form in 3.x, or automatically translate
calls to bytes(x, y) into x.encode(y).

BTW I think we'll need a new PEP instead of PEP 332. The latter has
almost no details relevant to this discussion, and it seems to treat
bytes as a near-synonym for str in 2.x. That's not the way this
discussion is going it seems.

--Guido van Rossum (home page:
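
The two spellings discussed above can be compared directly in Python 3,
where both ended up being accepted. A short sketch assuming Python 3;
it only shows that the results are equal and says nothing about how
many copies either spelling makes internally.

```python
# Assumes Python 3, where str is Unicode.
u = "h\u00e9llo"

via_method = u.encode("utf8")      # the x.encode(y) form
via_constructor = bytes(u, "utf8") # the bytes(x, y) form

same_result = via_method == via_constructor
```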

From guido at  Wed Feb 15 00:13:33 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:33 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsjrfp$g72$>
Message-ID: <>

On 2/14/06, Thomas Wouters <thomas at> wrote:
> On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
> > But adding an encoding doesn't help. The str.encode() method always
> > assumes that the string itself is ASCII-encoded, and that's not good
> > enough:
> > >>> "abc".encode("latin-1")
> > 'abc'
> > >>> "abc".decode("latin-1")
> > u'abc'
> > >>> "abc\xf0".decode("latin-1")
> > u'abc\xf0'
> > >>> "abc\xf0".encode("latin-1")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> > 3: ordinal not in range(128)

(Note that I've since been convinced that bytes(s) where type(s) ==
str should just return a bytes object containing the same bytes as s,
regardless of encoding. So basically you're preaching to the choir
now. The only remaining question is what if anything to do with an
encoding argument when the first argument is of type str...)
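
The failure in the quoted session can be modelled on Python 3, under the
assumption that 2.x str.encode() amounts to an implicit ASCII decode
followed by the requested encode. The helper name py2_style_encode is
hypothetical and exists only to illustrate that assumption.

```python
def py2_style_encode(raw: bytes, encoding: str) -> bytes:
    """Model 2.x str.encode(): decode as ASCII first, then encode."""
    return raw.decode("ascii").encode(encoding)


# Pure ASCII input survives the implicit decode step.
print(py2_style_encode(b"abc", "latin-1"))

try:
    py2_style_encode(b"abc\xf0", "latin-1")
except UnicodeDecodeError as exc:
    # The error comes from the implicit ASCII decode, not from Latin-1.
    print("decode failed:", exc.encoding, exc.start)
```

The target encoding never gets a chance to run, which is why passing an
encoding does not help for str input.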

> These comments disturb me. I never really understood why (byte) strings grew
> the 'encode' method, since 8-bit strings *are already encoded*, by their
> very nature. I mean, I understand it's useful because Python does
> non-unicode encodings like 'hex', but I don't really understand *why*. The
> benefits don't seem to outweigh the cost (but that's hindsight.)

It may also have something to do with Jython compatibility (which has
str and unicode being the same thing) or 3.0 future-proofing.

> Directly encoding a (byte) string into a unicode encoding is mostly useless,
> as you've shown. The only use-case I can think of is translating ASCII into,
> for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
> unless the system encoding isn't 'ascii' (and that's pretty rare, and not
> something a Python programmer should depend on.) On the other hand, the fact
> that (byte) strings have an 'encode' method creates a lot of confusion in
> unicode-newbies, and causes programs to break only when input is non-ASCII.
> And non-ASCII input just happens too often and too unpredictably in
> 'real-world' code, and not enough in European programmers' tests ;P

Oh, there are lots of ways that non-ASCII input can break code, you
don't have to invoke encode() on str objects to get that effect. :/

> Unicode objects and strings are not the same thing. We shouldn't treat them
> as the same thing.

Well in 3.0 they *will* be the same thing, and in Jython they already are.

> They share an interface (like lists and tuples do), and
> if you only use that interface, treating them as the same kind of object is
> mostly ok. They actually share *less* of an interface than lists and tuples,
> though, as comparing strings to unicode objects can raise an exception,
> whereas comparing lists to tuples is not expected to.

No, it causes silent surprises since [1,2,3] != (1,2,3).

> For anything less
> trivial than indexing, slicing and most of the string methods, and anything
> whatsoever involving non-ASCII (or, rather, non-system-encoding), unicode
> objects and strings *must* be treated separately. For instance, there is no
> correct way to do:
>   s.split("\x80")
> unless you know the type of 's'. If it's unicode, you want u"\x80" instead
> of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
> but you wouldn't know from looking at the code -- maybe it expects a
> specific encoding (or encoding family), maybe not. As soon as you deal with
> unicode, you need to really understand the concept, and too many programmers
> don't. And it's very hard to tell from someone's comments whether they fail
> to understand or just get some of the terminology wrong; that's why Guido's
> comments about 'encoding a byte string' and 'what if the file encoding is
> Unicode' scare me. The unicode/string mixup almost makes me wish Python
> was statically typed.

I'm mostly trying to reflect various broken mental models that users
may have. Believe me, my own confusion is nothing compared to the
confusion that occurs in less gifted users. :-)

The only use case for mixing ASCII and Unicode that I *wanted* to work
right was the mixing of pure ASCII strings (typically literals) with
Unicode data. And that works.

Where things unfortunately fall flat is when you start reading data
from files or interactive input and it gives you some encoded str
object instead of a Unicode object. Our mistake was that we didn't
foresee this clearly enough. Perhaps open(filename).read(), where the
file contains non-ASCII bytes, should have been changed to either
return a Unicode string (if an encoding can somehow be guessed), or
raise an exception, rather than returning an str object in some
unknown (and usually unknowable) encoding.

I hope to fix that in 3.0 too, BTW.

> So please, please, please don't make the mistake of 'doing something' with
> the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
> It wouldn't actually be usable except for the same things as 'str.encode':
> to convert from ASCII to non-ASCII-supersets, or to convert to non-unicode
> encodings (such as 'hex'.) You can achieve those two by doing, e.g.,
> 'bytes(s.encode('hex'))' if you really want to. Ignoring the encoding
> (rather than raising an exception) would also allow code to be trivially
> portable between Python 2.x and Py3K, when "" is actually a unicode object.
> Not that I'm happy with ignoring anything, but not ignoring would be bigger
> crime here.

I'm beginning to see that this is a pretty reasonable interpretation.

> Oh, and while on the subject, I'm not convinced going all-unicode in Py3K is
> a good idea either, but maybe I should save that discussion for PyCon. I'm
> not thinking "why do we need unicode" anymore (which I did two years ago ;)
> but I *am* thinking it'll be a big step for 90% of the programmers if they
> have to grasp unicode and encodings to be able to even do 'raw_input()'
> sensibly. I know I spend an inordinate amount of time trying to explain the
> basics on #python on already.

I'm actually hoping that by having all strings be Unicode we'd
*reduce* the amount of confusion. The key (see above where I admitted
this as our biggest Unicode mistake) is to make sure that the
encoding/decoding is built into all I/O operations.
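
As a sketch of what "built into all I/O operations" can look like, here
is how a current Python 3 interpreter behaves (an assumption relative to
this thread, which predates it). Text mode decodes with a stated
encoding, and binary mode hands back the raw bytes. The file name and
sample text are made up for the example.

```python
import os
import tempfile

text = "na\u00efve"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Writing in text mode: the string is encoded on the way out.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading in text mode with the same encoding yields decoded text.
    with open(path, encoding="utf-8") as f:
        as_text = f.read()

    # Reading in binary mode yields the encoded bytes, uninterpreted.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)
```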

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:37 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:37 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Neil Schemenauer <nas at> wrote:
> People could spell it bytes(s.encode('latin-1')) in order to make it
> work in 2.X.  That spelling would provide a way of ensuring the type
> of the return value.

At the cost of an extra copying step.

> > You missed the part where I said that introducing the bytes type
> > *without* a literal seems to be a good first step. A new type, even
> > built-in, is much less drastic than a new literal (which requires
> > lexer and parser support in addition to everything else).
> Are you concerned about the implementation effort?  If so, I don't
> think that's justified since adding a new string prefix should be
> pretty straightforward (relative to rest of the effort involved).

Not so much the implementation but also the documentation, updating
3rd party Python preprocessors, etc.

> Are you comfortable with the proposed syntax?

Not entirely, since I don't know what b"abc<euro>def" would mean
(where <euro> is a Unicode Euro character typed in whatever source
encoding was used).

Instead of b"abc" (only ASCII) you could write bytes("abc"). Instead
of b"\xf0\xff\xee" you could write bytes([0xf0, 0xff, 0xee]).

The key disconnect for me is that if bytes are not characters, we
shouldn't use a literal notation that resembles the literal notation
for characters. And there's growing consensus that a bytes type should
be considered as an array of (8-bit unsigned) ints.

Also, bytes objects are (in my mind anyway) mutable. We have no other
literal notation for mutable objects. What would the following code do?

  for i in range(2):
    b = b"abc"
    print b
    b[0] = ord("A")

Would the second output line print abc or Abc?

I guess the only answer that makes sense is that it should print abc
both times; but that means that b"abc" must be internally implemented
by creating a new bytes object each time. Perhaps the implementation
effort isn't so minimal after all...
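
The two possible answers can be compared with a small sketch. It uses
Python 3's bytearray as a stand-in for the mutable type under discussion
(an assumption, since no b"..." literal for a mutable type exists here).
The helper run_loop is hypothetical; it replays the loop above and
records what each pass would print.

```python
def run_loop(make_value):
    """Replay the loop from the message, collecting each printed value."""
    seen = []
    for i in range(2):
        b = make_value()
        seen.append(b.decode("ascii"))
        b[0] = ord("A")
    return seen


# A fresh object per evaluation: the mutation never leaks into pass two.
fresh = run_loop(lambda: bytearray(b"abc"))

# One shared object, as if the literal were a stored constant: it leaks.
shared_obj = bytearray(b"abc")
shared = run_loop(lambda: shared_obj)

print(fresh, shared)
```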

(PS why is there a reply-to in your email that excludes you from the
list of recipients but includes me?)

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:36 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:36 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<> <>
Message-ID: <>

On 2/14/06, Adam Olsen <rhamph at> wrote:
> I'm starting to wonder, do we really need anything fancy?  Wouldn't it
> be sufficient to have a way to compactly store 8-bit integers?
> In 2.x we could convert unicode like this:
> bytes(ord(c) for c in u"It's...".encode('utf-8'))


> u"It's...".byteencode('utf-8')  # Shortcut for above

Yuck**2. I'd like to avoid adding new APIs to existing types to return
bytes instead of str. (It's okay to change existing APIs to *accept*
bytes as an alternative to str though.)

> In 3.0 it changes to:
> "It's...".encode('utf-8')
> u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility

No. 3.0 won't have "backward compatibility" features. That's the whole
point of 3.0.

> Passing a str or unicode directly to bytes() would be an error.
> repr(bytes(...)) would produce bytes([1,2,3]).

I'm fine with that.

> Probably need a __bytes__() method that print can call, or even better
> a __print__(file) method[0].  The write() methods would of course have
> to support bytes objects.

Right on the latter.

> I realize it would be odd for the interactive interpreter to print them
> as a list of ints by default:
> >>> u"It's...".byteencode('utf-8')
> [73, 116, 39, 115, 46, 46, 46]

No. This prints the repr() which should include the type. bytes([73,
116, 39, 115, 46, 46, 46]) is the right thing to print here.

> But maybe it's time we stopped hiding the real nature of bytes from users?

That's the whole point.

> [0] By this I mean calling objects recursively and telling them what
> file to print to, rather than getting a temporary string from them and
> printing that.  I always wondered why you could do that from C
> extensions but not from Python code.

I want to keep the Python-level API small.

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:41 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:41 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/13/06, Barry Warsaw <barry at> wrote:
> This makes me think I want an unsigned byte type, which b[0] would
> return.  In another thread I think someone mentioned something about
> fixed width integral types, such that you could have an object that
> was guaranteed to be 8-bits wide, 16-bits wide, etc.   Maybe you also
> want signed and unsigned versions of each.  This may seem like YAGNI
> to many people, but as I've been working on a tightly embedded/
> extended application for the last few years, I've definitely had
> occasions where I wish I could more closely and more directly model
> my C values as Python objects (without using the standard workarounds
> or writing my own C extension types).

So I'm taking that the specific properties you want to model are the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit more tricky to define but similar. These
never overflow but instead just throw away bits in an exactly
specified manner (2's complement arithmetic).

While I personally am comfortable with writing (x+y) & 0xFFFF (for
16-bit unsigned), I can see that someone who spends a lot of time
doing arithmetic in this field might want specialized types.
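
A minimal sketch of the masking idiom, generalized to N bits, with the
"bit more tricky" signed case spelled out. The function names
wrap_unsigned and wrap_signed are made up for the example.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """Reduce value mod 2**bits, i.e. keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as 2's complement."""
    value = wrap_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the value represents a negative number.
    return value - (1 << bits) if value & sign_bit else value


x, y = 0xFFFF, 1
print(wrap_unsigned(x + y, 16))      # same as (x + y) & 0xFFFF
print(wrap_signed(0x7FFF + 1, 16))   # wraps around to the most negative value
```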

But I'm not sure that that's what the Numeric folks want -- I believe
they're more interested in saving space, not in the mod 2**N
properties. So (here I'm to some extent guessing) they have different
array types whose elements are ints or floats of various widths; I'm
guessing they also have scalars of those widths for consistency or to
guide the creation of new arrays from scalars. I wouldn't be surprised
if, rather than requiring N-bit 2's complement, they would prefer more
flexible control over overflow -- e.g. ignore, warn, error, turn into
NaN, etc.

> But anyway, without hyper-generalizing, it's still worth asking
> whether a bytes type is just a container of byte objects, where the
> contained objects would be distinct, fixed 8-bit unsigned integral
> types.

There's certainly a point to treating bytes as ints; I don't know if
it's more compelling than to treating them as unit bytes. But if we
decide that the bytes type contains ints, b[0] should return a plain
int (whose value necessarily is in range(0, 256)), not some new
unsigned-8-bit type. And creating a bytes object from a list of ints
should accept any input values as long as their __index__ value is in
that same range.

I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and
bytes([-1]) should raise a ValueError.
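
The validation rule described here could be sketched as follows, in
Python 3 syntax (so there is no 2L spelling). The helper make_bytes and
the class Small are hypothetical; operator.index() is the standard way
to ask an object for its __index__ value.

```python
import operator


def make_bytes(values):
    """Build a bytes object from anything whose __index__ is in range(0, 256)."""
    out = bytearray()
    for v in values:
        n = operator.index(v)  # TypeError for floats, strings, etc.
        if not 0 <= n < 256:
            raise ValueError("byte must be in range(0, 256)")
        out.append(n)
    return bytes(out)


class Small:
    """Hypothetical integer-like object, accepted via __index__."""

    def __index__(self):
        return 7


print(make_bytes([1, Small()]))
```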

> > There's also the consideration for APIs that, informally, accept
> > either a string or a sequence of objects. Many of these exist, and
> > they are probably all being converted to support unicode as well as
> > str (if it makes sense at all). Should a bytes object be considered as
> > a sequence of things, or as a single thing, from the POV of these
> > types of APIs? Should we try to standardize how code tests for the
> > difference? (Currently all sorts of shortcuts are being taken, from
> > isinstance(x, (list, tuple)) to isinstance(x, basestring).)
> I think bytes objects are very much like string objects today --
> they're the photons of Python since they can act like either
> sequences or scalars, depending on the context.  For example, we have
> code that needs to deal with situations where an API can return
> either a scalar or a sequence of those scalars.  So we have a utility
> function like this:
> def thingiter(obj):
>      try:
>          it = iter(obj)
>      except TypeError:
>          yield obj
>      else:
>          for item in it:
>              yield item
> Maybe there's a better way to do this, but the most obvious problem
> is that (for our use cases), this fails for strings because in this
> context we want strings to act like scalars.  So we add a little test
> just before the "try:" like "if isinstance(obj, basestring): yield
> obj".  But that's yucky.
> I don't know what the solution is -- if there /is/ a solution short
> of special case tests like above, but I think the key observation is
> that sometimes you want your string to act like a sequence and
> sometimes you want it to act like a scalar.  I suspect bytes objects
> will be the same way.

I agree it's icky, and I'd rather not design APIs like that -- but I
can't help it that others continue to want to use that idiom. I also
agree that most likely we'll want to treat bytes the same as strings
here. But no basestring (bytes are mutable and don't behave like
sequences of characters).
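
For illustration, the special-cased version of Barry's utility might
look like the sketch below in Python 3. The scalar_types parameter is an
addition for the example, not part of the quoted code; it lists the
types that should be yielded whole rather than iterated.

```python
def thingiter(obj, scalar_types=(str, bytes, bytearray)):
    """Yield obj itself if it is scalar-like, else yield its items."""
    if isinstance(obj, scalar_types):
        # Strings and bytes are iterable, but here they count as scalars.
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all: a plain scalar.
        yield obj
    else:
        yield from it


print(list(thingiter("abc")), list(thingiter([1, 2])), list(thingiter(5)))
```

Listing bytes explicitly, rather than relying on a shared base class
with str, matches the "no basestring" point above.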

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:38 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:38 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>
	<> <>
Message-ID: <>

On 2/13/06, Adam Olsen <rhamph at> wrote:
> What would that imply for repr()?  To support eval(repr(x)) it would
> have to produce whatever format the source code includes to begin
> with.

I'm not sure that's a requirement. (I do think that in 2.x,
str(bytes(s)) == s should hold as long as type(s) == str.)

> If I understand correctly there's three main candidates:
> 1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x

I'm not sure what you mean, but I'm guessing you're thinking that the
repr() of a bytes object created from bytes('abc\xf0') would be


under this rule. What's so bad about that?

> 2. Direct copying to str/unicode if it's only ascii values, switching
> to a list of hex literals if there's any non-ascii values

That works for me too. But why hex literals? As MvL stated, a list of
decimals would be just as useful.

> 3. b"foo" literal with ascii for all ascii characters (other than \
> and "), \xFF for individual characters that aren't ascii
> Given the choice I prefer the third option, with the second option as
> my runner up.  The first option just screams "silent errors" to me.

The 3rd is out of the running for many reasons.

I'm not sure I understand your "silent errors" fear; can you elaborate?

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:47 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:47 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
Message-ID: <>

On 2/13/06, Phillip J. Eby <pje at> wrote:
> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
> >On 2/13/06, Phillip J. Eby <pje at> wrote:
> > > I didn't mean that it was the only purpose.  In Python 2.x, practical code
> > > has to sometimes deal with "string-like" objects.  That is, code that takes
> > > either strings or unicode.  If such code calls bytes(), it's going to want
> > > to include an encoding so that unicode conversions won't fail.
> >
> >That sounds like a rather hypothetical example. Have you thought it
> >through? Presumably code that accepts both str and unicode either
> >doesn't care about encodings, but simply returns objects of the same
> >type as the arguments -- and then it's unlikely to want to convert the
> >arguments to bytes; or it *does* care about encodings, and then it
> >probably already has to special-case str vs. unicode because it has to
> >control how str objects are interpreted.
> Actually, it's the other way around.  Code that wants to output
> uninterpreted bytes right now and accepts either strings or Unicode has to
> special-case *unicode* -- not str, because str is the only "bytes type" we
> currently have.

But this is assuming that the str input is indeed uninterpreted bytes.
That may be a tacit assumption or agreement but it may be wrong. Also,
there are many ways to interpret "uninterpreted bytes" -- is it an
image, a sound file, or UTF-8 text? In 2 out of those 3, passing
unicode is more likely a bug than anything else (except in Jython).

> This creates an interesting issue in WSGI for Jython, which of course only
> has one (unicode-based) string type now.  Since there's no bytes type in
> Python in general, the only solution we could come up with was to treat
> such strings as latin-1:

I believe that's the general convention in Jython, as it matches the
default (albeit deprecated) conversion between bytes and characters in
Java itself.

> This is why I'm biased towards latin-1 encoding of unicode to bytes; it's
> "the same thing" as an uninterpreted string of bytes.

But in CPython this is not how this is generally done.

> I think the difference in our viewpoints is that you're still thinking
> "string" thoughts, whereas I'm thinking "byte" thoughts.  Bytes are just
> bytes; they don't *have* an encoding.

I think when one side of the equation is Unicode, in CPython, I can be
forgiven for thinking string thoughts, since Unicode is never used to
carry binary bytes in CPython.

You may have to craft some kind of different rule for Jython; it
doesn't have a default encoding used when str meets unicode.

> So, if you think of "converting a string to bytes" as meaning "create an
> array of numerals corresponding to the characters in the string", then this
> leads to a uniform result whether the characters are in a str or a unicode
> object.  In other words, to me, bytes(str_or_unicode) should be treated as:
>      bytes(map(ord, str_or_unicode))
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range.  This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.

I see your point (now that you mentioned Jython). But I still don't
think that this is a good default for CPython.
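
To make the proposal concrete, here is a sketch of bytes(map(ord, ...))
on Python 3, where every string is Unicode (an assumption relative to
the 2.x setting discussed here). The function name ordinals_to_bytes is
hypothetical.

```python
def ordinals_to_bytes(s: str) -> bytes:
    """Treat each character as its ordinal; fail if any is above 255."""
    return bytes(map(ord, s))


# For characters in the 0-255 range this coincides with Latin-1.
low = ordinals_to_bytes("abc\xf0")
print(low == "abc\xf0".encode("latin-1"))

try:
    ordinals_to_bytes("\u20ac")  # Euro sign, ordinal 8364
except ValueError as exc:
    print("rejected:", exc)
```

The sketch shows why this rule and a Latin-1 default are effectively the
same thing for text, which is the behavior being objected to for
CPython.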

> If, however, you include an encoding, then you're stating that you want to
> encode the *meaning* of the string, not merely its integer values.

Note that in Python 3000 we won't be using str/unicode to carry
integer values around, since we will have the bytes type. So there, it
makes sense to think of the conversion to always involve an encoding,
possibly a default one. (And I think the default might more usefully
be UTF-8 then.)

> >What would bytes("abc\xf0", "latin-1") *mean*? Take the string
> >"abc\xf0", interpret it as being encoded in XXX, and then encode from
> >XXX to Latin-1. But what's XXX? As I showed in a previous post,
> >"abc\xf0".encode("latin-1") *fails* because the source for the
> >encoding is assumed to be ASCII.
> I'm saying that XXX would be the same encoding as you specified.  i.e.,
> including an encoding means you are encoding the *meaning* of the string.

That would be the same as ignoring the encoding argument when the
input is str in CPython 2.x, right? I believe we started out saying we
didn't want to ignore the encoding. Perhaps we need to reconsider
that, given the Jython requirement? Then code that converts str to
bytes and needs to be portable between Jython and CPython could write

  b = bytes(s, "latin-1")

> However, I believe I mainly proposed this as an alternative to having
> bytes(str_or_unicode) work like bytes(map(ord,str_or_unicode)), which I
> think is probably a saner default.

Sorry, I still don't buy that.

> >Your argument for symmetry would be a lot stronger if we used Latin-1
> >for the conversion between str and Unicode. But we don't.
> But that's because we're dealing with its meaning *as a string*, not merely
> as ordinals in a sequence of bytes.

Well, *sometimes* the user *meant* it as a string, and *sometimes* she
*didn't*. But we can't tell. I think it's safer to force her to be

> >  I like the
> >other interpretation (which I thought was yours too?) much better: str
> ><--> bytes conversions don't use encodings but simply change the type
> >without changing the bytes;
> I like it better too.  The part you didn't like was where MAL and I believe
> this should be extended to Unicode characters in the 0-255 range also.  :)

I still don't.

> >There's one property that bytes, str and unicode all share: type(x[0])
> >== type(x), at least as long as len(x) >= 1. This is perhaps the
> >ultimate test for string-ness.
> >
> >Or should b[0] be an int, if b is a bytes object? That would change
> >things dramatically.
> +1 for it being an int.  Heck, I'd want to at least consider the
> possibility of introducing a character type (chr?) in Python 3.0, and
> getting rid of the "iterating a string yields strings"
> characteristic.  I've found it to be a bit of a pain when dealing with
> heterogeneous nested sequences that contain strings.

Can you give an example of that pain?

What would a chr type behave like?  Would it be a tiny int or a tiny
string or something else again? Would you write its literals as 'c'?

This would be a huge change and it needs more thought going into it
than "sometimes it can be a bit of a pain", since I'm sure that's also
true with the char-as-tiny-int interpretation.

> >There's also the consideration for APIs that, informally, accept
> >either a string or a sequence of objects. Many of these exist, and
> >they are probably all being converted to support unicode as well as
> >str (if it makes sense at all). Should a bytes object be considered as
> >a sequence of things, or as a single thing, from the POV of these
> >types of APIs? Should we try to standardize how code tests for the
> >difference? (Currently all sorts of shortcuts are being taken, from
> >isinstance(x, (list, tuple)) to isinstance(x, basestring).)
> I'm inclined to think of certain features at least in terms of the buffer
> interface, but that's not something that's really exposed at the Python level.

(And where it is, it's wrong. :-)

If bytes support the buffer interface, we get another interesting
issue -- regular expressions over bytes. Brr.

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:13:44 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:13:44 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On 2/14/06, Barry Warsaw <barry at> wrote:
> A related question: what would bytes([104, 101, 108, 108, 111, 8004])
> return?  An exception hopefully.


> I also think you'd want bytes([x
> for x in some_bytes_object]) to return an object equal to the original.

You mean if type(some_bytes_object) is bytes? Yes. But that doesn't
constrain the API much.

Anyway, I'm now convinced that bytes should act as an array of ints,
where the ints are restricted to range(0, 256) but have type int.
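
These properties can be checked against the bytes type in a current
Python 3 interpreter (an assumption relative to this thread, where the
type is still being designed). The variable names are illustrative.

```python
data = bytes([104, 101, 108, 108, 111])

# Indexing yields a plain int, not a length-1 bytes object.
first = data[0]

# Rebuilding from the elements gives back an equal object.
round_trip = bytes([x for x in data])

# A value outside range(0, 256) is rejected.
try:
    bytes([104, 101, 108, 108, 111, 8004])
    rejected = False
except ValueError:
    rejected = True

print(type(first).__name__, round_trip == data, rejected)
```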

--Guido van Rossum (home page:

From guido at  Wed Feb 15 00:14:07 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 15:14:07 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <dsjrfp$g72$>
Message-ID: <>

On 2/14/06, M.-A. Lemburg <mal at> wrote:
> Guido van Rossum wrote:
> > As Phillip guessed, I was indeed thinking about introducing bytes()
> > sooner than that, perhaps even in 2.5 (though I don't want anything
> > rushed).
> Hmm, that is probably going to be too early. As the thread shows
> there are lots of things to take into account, esp. since if you
> plan to introduce bytes() in 2.x, the upgrade path to 3.x would
> have to be carefully planned. Otherwise, we end up introducing
> a feature which is meant to prepare for 3.x and then we end up
> causing breakage when the move is finally implemented.

You make a good point. Someone probably needs to write up a new PEP
summarizing this discussion (or rather, consolidating the agreement
that is slowly emerging, where there is agreement, and summarizing the
key open questions).

> > Even in Py3k though, the encoding issue stands -- what if the file
> > encoding is Unicode? Then using Latin-1 to encode bytes by default
> > might not be what the user expected. Or what if the file encoding is
> > something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> > Anything default but ASCII isn't going to work as expected. ASCII
> > isn't going to work as expected either, but it will complain loudly
> > (by throwing a UnicodeError) whenever you try it, rather than causing
> > subtle bugs later.
> I think there's a misunderstanding here: in Py3k, all "string"
> literals will be converted from the source code encoding to
> Unicode. There are no ambiguities - a Klingon character will still
> map to the same ordinal used to create the byte content regardless
> of whether the source file is encoded in UTF-8, UTF-16 or
> some Klingon charset (are there any ?).

OK, so a string (literal or otherwise) containing a Klingon character
won't be acceptable to the bytes() constructor in 3.0. It shouldn't be
in 2.x either then.

I still think that someone who types a file in Latin-1 and enters
non-ASCII Latin-1 characters in a string literal and then passes it to
the bytes() constructor might expect to get bytes encoded in Latin-1,
and someone who types a file in UTF-8 and enters non-ASCII Unicode
characters might expect to get UTF-8-encoded bytes. Since they can't
both get what they want, we should disallow both, and only allow

> Furthermore, by restricting to ASCII you'd also outrule hex escapes
> which seem to be the natural choice for presenting binary data in
> literals - the Unicode representation would then only be an
> implementation detail of the way Python treats "string" literals
> and a user would certainly expect to find e.g. \x88 in the bytes object
> if she writes bytes('\x88').

I guess we'll just have to disappoint her. Too bad for the person who
wrote bytes("\x12\x34\x56\x78\x9a\xbc\xde\xf0") -- they'll have to
write bytes([0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0]). Not so bad IMO
and certainly easier than a *mixture* of hex and ASCII like

> But maybe you have something different in mind... I'm talking
> about ways to create bytes() in Py3k using "string" literals.

I'm not sure that's going to be common practice except for ASCII
characters used in network protocols.

> >> While we're at it: I'd suggest that we remove the auto-conversion
> >> from bytes to Unicode in Py3k and the default encoding along with
> >> it.
> >
> > I'm not sure which auto-conversion you're talking about, since there
> > is no bytes type yet. If you're talking about the auto-conversion from
> > str to unicode: the bytes type should not be assumed to have *any*
> > properties that the current str type has, and that includes
> > auto-conversion.
> I was talking about the automatic conversion of 8-bit strings to
> Unicode - which was a key feature to make the introduction of
> Unicode less painful, but will no longer be necessary in Py3k.

OK. The bytes type certainly won't have this property.

--Guido van Rossum (home page:

From bob at  Wed Feb 15 00:14:29 2006
From: bob at (Bob Ippolito)
Date: Tue, 14 Feb 2006 15:14:29 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <dstlvb$6cb$>
References: <>
Message-ID: <>

On Feb 14, 2006, at 2:05 PM, Joe Smith wrote:

> "Guido van Rossum" <guido at> wrote in message
> news:ca471dc20602131604v12a4d70eq9d41b5ce543f3264 at
>> In private email, Phillip Eby suggested to add these things to the
>> 2.5. standard library:
>> bdist_deb, bdist_msi, and friends
>> He explained them as follows:
>> """
>> bdist_deb makes .deb files (packages for Debian-based Linux  
>> distros, like
>> Ubuntu).  bdist_msi makes .msi installers for Windows (it's by  
>> Martin v.
>> Loewis).  Marc Lemburg proposed on the distutils-sig that these and
>> various
>> other implemented bdist_* formats (other than bdist_egg) be  
>> included in
>> the
>> next Python release, and there was no opposition there that I recall.
>> """
> I don't like the idea of bdist_deb very much.
> The idea behind the debian packaging system is that unlike with RPM  
> and
> Windows, package management should be clean.
> Windows and RPM are known for major dependency problems, letting  
> packages
> damage each other, having packages that do not uninstall cleanly (i.e.
> packages that leave junk all over the place) and generally messing  
> the system
> up quite badly over time, so that the OS is usually removed and
> re-installed periodically.)

This is one problem that eggs go a LONG way towards solving,  
especially for platforms such as Windows and OS X that do not ship  
with an intelligent package management solution.

The way that eggs are built more or less guarantees that they remain  
consistent, because it temporarily replaces file/open/etc and some  
other functions with sanity checks to make sure that the installation  
layout is self-contained** and thus compatible with eggs.  It's not a  
real chroot, of course, but it's good enough for all practical purposes.

The only things that easy_install overwrites** in the context of eggs  
are other eggs with an identical filename (version, platform, etc.),  
unless explicitly asked to do otherwise (e.g. remove some existing  
older version).  Uninstallation is of course similarly clean, because  
it just nukes one directory or .egg file, and/or an associated .pth file.

** The exception is scripts.  Scripts go wherever --install-scripts=  
point to, and AFAIK there is no means to ensure that the scripts from  
one egg do not interfere with the scripts for another egg or anything  
else on the PATH.  I'm also not sure what the uninstallation story  
with scripts is.


From thomas at  Wed Feb 15 00:15:37 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 00:15:37 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 05:48:57PM -0500, Barry Warsaw wrote:
> On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:
> > What about shorter names, such as 'text' instead of 'opentext' and
> > 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> > might make it easy to eventually migrate off it. Maybe text and data
> > could be two subclasses of file, with file remaining initially as it
> > is (and perhaps becoming an abstract-only baseclass at the time 'open'
> > is deprecated).
> I was actually thinking about static methods file.text() and
> which seem nicely self descriptive, if a little bit longer.

Make them classmethods though, like dict.fromkeys.
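
A minimal sketch of why classmethods fit here, assuming a hypothetical
File class (not the real file type) and using the names 'text' and 'data'
from Alex's suggestion. Like dict.fromkeys, a classmethod constructor
returns an instance of whichever subclass it is called on:

```python
import io


class File:
    """Hypothetical stand-in for the 'file' type; not a real stdlib class."""

    def __init__(self, stream):
        self.stream = stream

    @classmethod
    def text(cls, content=""):
        # cls is the class the method was called on, so subclasses
        # get instances of themselves rather than of File.
        return cls(io.StringIO(content))

    @classmethod
    def data(cls, content=b""):
        return cls(io.BytesIO(content))


class LoggedFile(File):
    """Hypothetical subclass used only to show the inheritance behaviour."""


class MyDict(dict):
    pass


f = LoggedFile.text("hello")
d = MyDict.fromkeys(["a", "b"])
print(type(f).__name__, type(d).__name__)
```

With staticmethods, LoggedFile.text() would have to hard-code which class
to instantiate.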

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at  Wed Feb 15 00:25:58 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 00:25:58 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <dstlvb$6cb$>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 05:05:08PM -0500, Joe Smith wrote:

> I don't like the idea of bdist_deb very much.
> The idea behind the debian packaging system is that unlike with RPM and 
> Windows, package management should be clean.

The idea behind RPM is also that package management should be clean. Debian
packages, on average, do a better job, and 'dpkg' deals a bit more flexibly
with overwritten files and such, but it's not that big a difference.

> The Debian style system attempts to overcome these deficiencies, and
> generally does a decent job with it. The problem is that this can really
> only work if packages are well maintained, and adhere to a set of policies
> that help to further mitigate these problems. Making it easy to generate
> .debs of python modules will likely result in a noticeable increase in the
> number of .debs that do not target a specific distribution and/or do not
> follow the policies of that distribution.

That sounds like "oh no, what if the user presses the wrong button". Users
can already mess up the system if they do the wrong thing. Distutils offers
a simple, generic way of saying 'install this' while letting distutils
figure out most of the details. bdist_deb can then put it all in
debian-specific locations, in the debian-preferred way, while registering
all the files so they get deleted properly on deinstall. Things get more
complicated when you have pre-/post-install/remove scripts, but those are
pretty rare for the average Python packages, and since they would (in the
Python package) most likely run from, those would break at
bdist-time, not deb-install-time.

It's not easier for bdist-deb created .deb's to break things than it is for
arbitrary developer-built .deb's to do so, and it's quite a bit easier for
' install' to break things. At least a .deb can be easily removed.
And the alternative to bdist_deb is in many cases ' install'.

Thomas Wouters <thomas at>


From bob at  Wed Feb 15 00:35:14 2006
From: bob at (Bob Ippolito)
Date: Tue, 14 Feb 2006 15:35:14 -0800
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:

> I'm about to send 6 or 8 replies to various salient messages in the
> PEP 332 revival thread. That's probably a sign that there's still a
> lot to be sorted out. In the mean time, to save you reading through
> all those responses, here's a summary of where I believe I stand.
> Let's continue the discussion in this new thread unless there are
> specific hairs to be split in the other thread that aren't addressed
> below or by later posts.
> Non-controversial (or almost):
> - we need a new PEP; PEP 332 won't cut it
> - no b"..." literal
> - bytes objects are mutable
> - bytes objects are composed of ints in range(256)
> - you can pass any iterable of ints to the bytes constructor, as long
> as they are in range(256)

Sounds like array.array('B').

Will the bytes object support the buffer interface?  Will it accept  
objects supporting the buffer interface in the constructor (or a  
class method)?  If so, will it be a copy or a view?  Current  
array.array behavior says copy.

> - longs or anything with an __index__ method should do, too
> - when you index a bytes object, you get a plain int

When slicing a bytes object, do you get another bytes object or a  
list?  If it's a bytes object, is it a copy or a view?  Current  
array.array behavior says copy.

> - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
> Somewhat controversial:
> - it's probably too big to attempt to rush this into 2.5
> - bytes("abc") == bytes(map(ord, "abc"))
> - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,  
> 256])

It would be VERY controversial if ord('\xff') == 256 ;)

> Very controversial:
> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"  
> argument
> - bytes(u"abc") == bytes("abc") # for ASCII at least
> - bytes(u"\x80\xff") raises UnicodeError
> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
> Martin von Loewis's alternative for the "very controversial" set is to
> disallow an encoding argument and (I believe) also to disallow Unicode
> arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
> only way to convert a string (which is always unicode) to bytes. The
> problem with this is that there's no code that works in both 2.x and
> 3.0.

Given a base64 or hex string, how do you get a bytes object out of  
it?  Currently str.decode('base64') and str.decode('hex') are good  
solutions to this... but you get a str object back.


From nas at  Wed Feb 15 00:38:33 2006
From: nas at (Neil Schemenauer)
Date: Tue, 14 Feb 2006 16:38:33 -0700
Subject: [Python-Dev] byte literals unnecessary [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 03:13:37PM -0800, Guido van Rossum wrote:
> Also, bytes objects are (in my mind anyway) mutable. We have no other
> literal notation for mutable objects. What would the following code
> print?
>   for i in range(2):
>     b = b"abc"
>     print b
>     b[0] = ord("A")
> Would the second output line print abc or Abc?
> I guess the only answer that makes sense is that it should print abc
> both times; but that means that b"abc" must be internally implemented
> by creating a new bytes object each time. Perhaps the implementation
> effort isn't so minimal after all...

I agree.  I was thinking that bytes() would be immutable and
therefore very similar to the current str object.  You've convinced
me that a literal representation is not needed.  Thanks for
clarifying your position.
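
The loop in question can be tried in present-day Python 3, under the
assumption that bytearray stands in for the mutable bytes type being
discussed (no such type existed when this was written). Building the
object inside the loop creates a fresh one on each pass, so the mutation
never leaks into the next iteration:

```python
outputs = []
for i in range(2):
    b = bytearray(b"abc")    # a new mutable object on every pass
    outputs.append(bytes(b))  # snapshot of what would be printed
    b[0] = ord("A")          # mutates only this pass's object

print(outputs)
```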

> (PS why is there a reply-to in your email that excludes you from the
> list of recipients but includes me?)

Maybe you should ask your coworkers. :-)  I think gmail is trying to
do something intelligent with the Mail-Followup-To header.


From pje at  Wed Feb 15 00:44:13 2006
From: pje at (Phillip J. Eby)
Date: Tue, 14 Feb 2006 18:44:13 -0500
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <dstlvb$6cb$>
Message-ID: <>

At 03:14 PM 2/14/2006 -0800, Bob Ippolito wrote:
>I'm also not sure what the uninstallation story
>with scripts is.

The scripts have enough breadcrumbs in them that you can figure out what 
egg they go with.  More precisely, an egg contains enough information for 
you to search PATH for its scripts and verify that they still refer to the 
egg before removing them.

This is of course fragile if you put the scripts in some random location 
not on your PATH.

Anyway, actual *implementation* of uninstallation features isn't going to 
be until the 0.7 development cycle.

From barry at  Wed Feb 15 00:51:48 2006
From: barry at (Barry Warsaw)
Date: Tue, 14 Feb 2006 18:51:48 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349?
	[	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Tue, 2006-02-14 at 15:13 -0800, Guido van Rossum wrote:

> So I'm taking that the specific properties you want to model are the
> overflow behavior, right? N-bit unsigned is defined as arithmetic mod
> 2**N; N-bit signed is a bit more tricky to define but similar. These
> never overflow but instead just throw away bits in an exactly
> specified manner (2's complement arithmetic).

That would be my use case, yep.

> While I personally am comfortable with writing (x+y) & 0xFFFF (for
> 16-bit unsigned), I can see that someone who spends a lot of time
> doing arithmetic in this field might want specialized types.

I'd put it in the "annoying, although there exists a workaround that
might confound newbies" category.  Which means it's definitely not
urgent enough to address for 2.5 -- if ever -- especially given your
current stance on bytes(bunch_of_ints)[0].  The two are of course
separate issues, but thinking about one led to the other.
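
For reference, the workaround looks roughly like this. It is only a
sketch; the helper names add_u16 and to_s16 are made up for illustration:

```python
def add_u16(x, y):
    """16-bit unsigned addition: arithmetic mod 2**16."""
    return (x + y) & 0xFFFF


def to_s16(value):
    """Reinterpret the low 16 bits as a two's complement signed value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


print(add_u16(0xFFFF, 1))       # wraps around instead of overflowing
print(to_s16(add_u16(0x7FFF, 1)))  # most negative 16-bit value
```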

> But I'm not sure that that's what the Numeric folks want -- I believe
> they're more interested in saving space, not in the mod 2**N
> properties. 

Could be.  I don't care about space savings.  And I definitely have no
clue what the Numeric folks want. ;)

> There's certainly a point to treating bytes as ints; I don't know if
> it's more compelling than to treating them as unit bytes. But if we
> decide that the bytes types contains ints, b[0] should return a plain
> int (whose value necessarily is in range(0, 256)), not some new
> unsigned-8-bit type. And creating a bytes object from a list of ints
> should accept any input values as long as their __index__ value is in
> that same range.
> I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and
> bytes([-1]) should raise a ValueError.

That seems fine to me.
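
As a rough check of those rules, here is how the bytes constructor in
present-day Python 3 behaves (an assumption: today's immutable bytes is
used as a stand-in for the type being designed; the Two class is
hypothetical):

```python
class Two:
    """Hypothetical object that is usable wherever an index is expected."""

    def __index__(self):
        return 2


accepted = bytes([1, Two()])   # anything with __index__ is accepted

rejected = []
for bad in (-1, 256):
    try:
        bytes([bad])
    except ValueError:
        rejected.append(bad)   # values outside range(256) are refused

print(accepted, rejected)
```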

> I agree it's icky, and I'd rather not design APIs like that -- but I
> can't help it that others continue to want to use that idiom. I also
> agree that most likely we'll want to treat bytes the same as strings
> here. But no basestring (bytes are mutable and don't behave like
> sequences of characters).

That's interesting.  So bytes really behave a lot more like some weird
string/lists hybrid then? It makes some sense.  You read 801 bytes from
a binary file, twiddle bytes 223 and 741 and then write those bytes back
out to a different binary file.

If we don't inherit from basestring, what I'm worried about is that for
those who do continue to use the idiom described previously, we'll have
to extend our isinstance() to include both basestring and bytes.  Which
definitely gets ickier.  But if bytes are mutable, as makes sense, then
it also makes sense that they don't inherit from basestring.

BTW, using that idiom is a bit of a hedge against such API (which you
may not control).  It allows us to say "okay, at /this/ point I don't
know whether I have a scalar or a sequence, but from this point forward,
I know I have something I can safely iterate over."
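
A sketch of that idiom in present-day Python 3 terms, where str takes the
place of basestring; the helper name as_items is made up for illustration.
Note how the isinstance() check has to list the bytes types explicitly:

```python
def as_items(value):
    """Return a list, treating strings and bytes as scalars."""
    if isinstance(value, (str, bytes, bytearray)):
        return [value]          # string-like: one item, not its characters
    try:
        return list(value)      # any other iterable: its elements
    except TypeError:
        return [value]          # non-iterable scalar


print(as_items("abc"), as_items(b"abc"), as_items([1, 2]), as_items(7))
```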

I wonder if it makes sense to add a more fundamental abstract base class
that can be used as a marker for "photonic behavior".  I don't know what
that class would be called, but you'd then have a hierarchy like this:


OTOH, it seems like a lot to add for a specialized (and some would say
dubious) use case.



From guido at  Wed Feb 15 01:13:07 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 16:13:07 -0800
Subject: [Python-Dev] byte literals unnecessary [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Neil Schemenauer <nas at> wrote:
> Maybe you should ask your coworkers. :-)  I think gmail is trying to
> do something intelligent with the Mail-Followup-To header.

But you're the only person for whom it does that. Do you have a funny
gmail setting?

--Guido van Rossum (home page:

From guido at  Wed Feb 15 01:17:11 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 16:17:11 -0800
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Bob Ippolito <bob at> wrote:
> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
> > - we need a new PEP; PEP 332 won't cut it
> >
> > - no b"..." literal
> >
> > - bytes objects are mutable
> >
> > - bytes objects are composed of ints in range(256)
> >
> > - you can pass any iterable of ints to the bytes constructor, as long
> > as they are in range(256)
> Sounds like array.array('B').


> Will the bytes object support the buffer interface?

Do you want them to?

I suppose they should *not* support the *text* part of that API.

> Will it accept
> objects supporting the buffer interface in the constructor (or a
> class method)?  If so, will it be a copy or a view?  Current
> array.array behavior says copy.

bytes() should always copy -- thanks for asking.

> > - longs or anything with an __index__ method should do, too
> >
> > - when you index a bytes object, you get a plain int
> When slicing a bytes object, do you get another bytes object or a
> list? If it's a bytes object, is it a copy or a view?  Current
> array.array behavior says copy.

Another bytes object which is a copy.

(Why would you even think about views here? They are evil.)
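
These answers can be illustrated with bytearray from present-day
Python 3, assuming it is a fair stand-in for the mutable type described
here: indexing yields a plain int, and slicing yields an independent copy.

```python
data = bytearray([10, 20, 30])

first = data[0]     # indexing gives a plain int
part = data[0:2]    # slicing gives another bytearray...
part[0] = 99        # ...which is a copy, so data is untouched

print(first, list(part), list(data))
```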

> > - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
> >
> > Somewhat controversial:
> >
> > - it's probably too big to attempt to rush this into 2.5
> >
> > - bytes("abc") == bytes(map(ord, "abc"))
> >
> > - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,
> > 256])
> It would be VERY controversial if ord('\xff') == 256 ;)

Oops. :-)

> > Very controversial:
> >
> > - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
> > argument
> >
> > - bytes(u"abc") == bytes("abc") # for ASCII at least
> >
> > - bytes(u"\x80\xff") raises UnicodeError
> >
> > - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
> >
> > Martin von Loewis's alternative for the "very controversial" set is to
> > disallow an encoding argument and (I believe) also to disallow Unicode
> > arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
> > only way to convert a string (which is always unicode) to bytes. The
> > problem with this is that there's no code that works in both 2.x and
> > 3.0.
> Given a base64 or hex string, how do you get a bytes object out of
> it?  Currently str.decode('base64') and str.decode('hex') are good
> solutions to this... but you get a str object back.

I don't know -- you can propose an API you like here. base64 is as
likely to encode text as binary data, so I don't think it's wrong for
those things to return strings.

--Guido van Rossum (home page:

From thomas at  Wed Feb 15 01:24:46 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 01:24:46 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 03:13:25PM -0800, Guido van Rossum wrote:

> Martin von Loewis's alternative for the "very controversial" set is to
> disallow an encoding argument and (I believe) also to disallow Unicode
> arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
> only way to convert a string (which is always unicode) to bytes. The
> problem with this is that there's no code that works in both 2.x and
> 3.0.

Unless you only ever create (byte)strings by doing s.encode(), and only send
them to code that is either byte/string-agnostic or -aware. Oh, and don't
use indexing, only slicing (length-1 if you have to.) I guess it depends on
how much code will accept a bytes-string where currently a string is the norm
(and a unicode object is default-encoded.)

I'm still worried that all this is quite a big leap. Very few people
understand the intricacies of unicode encodings. (Almost everyone
understands unicode, except they don't know it yet; it's the encodings that
are the problem.) By forcing everything to be unicode without a uniform
encoding-detection scheme, we're forcing every programmer who opens a file
or reads from the network to think about encodings. This will be a pretty
big step for newbie programmers.

And it's not just that. The encoding of network streams or files may be
entirely unknown beforehand, and depend on the content: a content-encoding,
a <META EQUIV> HTML tag. Will bytes-strings get string methods for easy
searching of content descriptors? Will the 're' module accept bytes-strings?
What would the literals you want to search for, look like? Do I really do
'if bytes("Content-Type:") in data:' and such? Should data perhaps get read
using the opentext() equivalent of 'decode('ascii', 'replace')' and then
parsed the 'normal' way? What about data gotten from an extension? And
nevermind what the 'right way' for that is; what will *programmers* do? The
'right way' often escapes them.
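
For comparison, this is roughly how those questions play out in
present-day Python 3, which did end up with a b"..." literal, string-like
methods on bytes, and bytes patterns in 're'. The sample data below is
made up for illustration:

```python
import re

# Hypothetical response whose body encoding is only known from a header.
data = (b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/html; charset=latin-1\r\n"
        b"\r\n"
        b"\xa1Hola!")

has_header = b"Content-Type:" in data            # containment on raw bytes
match = re.search(rb"charset=([\w-]+)", data)     # 're' with a bytes pattern
charset = match.group(1).decode("ascii")

# Lossy decode for inspecting headers only; non-ASCII bytes become U+FFFD.
header_view = data.decode("ascii", "replace")

# Decode the body explicitly with the encoding found in the content.
body = data.split(b"\r\n\r\n", 1)[1].decode(charset)
print(has_header, charset, body)
```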

It may well be that I'm thinking too conservatively, too stuck in the old
ways, but I think we're being too hasty in dismissing the ol' string. Don't
get me wrong, I really like the idea of as much of Python doing unicode as
possible, and the idea of a mutable bytes type sounds good to me too. I just
don't like the wide gap between the troublesome-to-get unicode object and
the unreadable-repr, weird-indexing, hard-to-work-with bytes-string. I don't
think adding something inbetween is going to work (we basically have that
now, the normal string), so I suggest the bytes-string becomes a bit more
'string' and a bit less 'sequence of bytes'. Perhaps in the form of:

 - A bytes type that repr()'s to something readable

 - A way to write byte literals that doesn't bleed the eyes, and isn't so
   fragile in the face of source-encoding (all the suggestions so far have
   you explicitly re-stating the source-encoding at each bytes("".encode()))
   If you have to wonder why that's fragile, just think about a recoding
   editor. Alternatively, get a short way to say 'encode in source-encoding'

 (I can't think of anything better than b"..." for the above two...
  Except... hmm... didn't `` become available in Py3k? Too little visual

 - A way to manipulate the bytes as character-strings. Pattern matching,
   splitting, finding, slicing, etc. Quite like current strings.

 - Disallowing any interaction between bytes and real (meaning 'unicode')
   strings. Not "oh, let's assume ascii or the default encoding", either. If
   the user wants to explicitly decode using 'ascii', that's their choice,
   but they should consciously make it.

 - Mutable or immutable, I don't know. I fear that if the bytes type was
   easy enough to handle and mutable, and the normal (unicode) strings were
   immutable, people may end up using bytes all the time. In fact, they may
   do that anyway; I'm sure Python will grow entire subcults that prefer
   doing 'string("\xa1Python!")' where 'string' is

Bytes should be easy enough to manipulate 'as strings' to do the basic
tasks, but not easy enough to encourage people to forget about that whole
annoying 'encoding' business and just use them instead (which is basically
what we have now.) On the other hand, if people don't want to deal with that
whole encoding business, we should allow them to -- consciously. We can
offer a variety of hints and tips on how to figure out the encoding of
something, but we can't do the thinking for them (trust me, I've tried.)

When a file's encoding is specified in file metadata, that's great, really
great. When a network connection is handled by a library that knows how to
deal with the content (*cough*Twisted*cough*) and can decode it for you,
that's really great too. But we're not there yet, not by a long shot. And
explaining encodings to an ADHD-infested teenager high on adrenalin and
creative inspiration who just wants to connect to an IRC server to make his
bot say "Hi!", well, that's hard. I'd rather they don't go and do PHP
instead. Doing it right is hard, but it's even harder to do it all right the
first time, and Python never really worried about that ;P

Thomas Wouters <thomas at>


From martin at  Wed Feb 15 01:25:15 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 01:25:15 +0100
Subject: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Jeremy Hylton wrote:
>>Perhaps there is some value in finding functions which ought to expect
>>const char*. For that, occasional checks should be sufficient; I cannot
>>see a point in having code permanently pass with that option. In
>>particular not if you are interfacing with C libraries.
> I don't understand what you mean:  I'm not sure what you mean by
> "occasional checks" or "permanently pass".  The compiler flags are
> always the same.

I'm objecting to the "this warning should never occur" rule. If the
warning is turned on in a regular build, then clearly it is desirable
to make it go away in all cases, and add work-arounds to make it
go away if necessary.

This is bad, because it means you add work-arounds to code where
really no work-around is necessary (e.g. because it is *known* that
some function won't modify the storage behind a char*, even though
it doesn't take a const char*). So it is appropriate that the
warning generates many false positives. Therefore, it should be
a manual interaction to turn this warning on, inspect all the
messages, and fix those that need correction, then turn the warning
off again.


From tjreedy at  Wed Feb 15 01:32:12 2006
From: tjreedy at (Terry Reedy)
Date: Tue, 14 Feb 2006 19:32:12 -0500
Subject: [Python-Dev] nice()
References: <><004f01c630c0$f051e1f0$5f2c4fca@csmith>
Message-ID: <dstsqa$sqo$>

"Greg Ewing" <greg.ewing at> wrote in message 
news:43F1B68B.5010604 at
> I don't think you're doing anyone any favours by trying to protect
> them from having to know about these things, because they *need* to
> know about them if they're not to write algorithms that seem to
> work fine on tests but mysteriously start producing garbage when
> run on real data,

I agree.  Here was my 'kick-in-the-butt' lesson (from 20+ years ago):  the 
'simplified for computation' formula for standard deviation, found in too 
many statistics books without a warning as to its danger, and specialized 
for three data points, is sqrt( ((a*a+b*b+c*c)-(a+b+c)**2/3.0) /2.0). 
After 1000s of ok calculations, the data were something like a,b,c = 
10005,10006,10007.  The correct answer is 1.0 but with numbers rounded to 7 
digits, the computed answer is sqrt(-.5) == CRASH.  I was aware that 
subtraction lost precision but not how rounding could make a theoretically 
guaranteed non-negative difference negative.
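
The effect can be reproduced with a small sketch that uses the decimal
module at 7 significant digits to imitate the rounding (an assumption;
the original machine's arithmetic may have rounded differently, so the
exact negative value can differ). The function names are made up for
illustration:

```python
from decimal import Decimal, localcontext


def shortcut_variance(a, b, c):
    """'Simplified for computation' formula: subtracts two huge sums."""
    total = a + b + c
    return ((a * a + b * b + c * c) - total * total / 3) / 2


def two_pass_variance(a, b, c):
    """Subtract the mean first, then square the small deviations."""
    mean = (a + b + c) / 3
    da, db, dc = a - mean, b - mean, c - mean
    return (da * da + db * db + dc * dc) / 2


with localcontext() as ctx:
    ctx.prec = 7   # every operation rounds to 7 significant digits
    values = [Decimal(n) for n in (10005, 10006, 10007)]
    rounded_shortcut = shortcut_variance(*values)
    rounded_two_pass = two_pass_variance(*values)

# The same shortcut formula in ordinary Python floats (C doubles).
double_shortcut = shortcut_variance(10005.0, 10006.0, 10007.0)

print(rounded_shortcut, rounded_two_pass, double_shortcut)
```

Taking the square root of rounded_shortcut would fail because it is
negative, whereas the two-pass form keeps the intermediate numbers small
enough that 7 digits suffice.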

Of course, Python floats being C doubles makes such glitches much rarer. 
Not exposing C floats is a major newbie (and journeyman) protection 

Terry Jan Reedy

From jimjjewett at  Wed Feb 15 01:39:54 2006
From: jimjjewett at (Jim Jewett)
Date: Tue, 14 Feb 2006 19:39:54 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
Message-ID: <>

On 2/14/06, Neil Schemenauer <nas at> wrote:
> People could spell it bytes(s.encode('latin-1')) in order to make it
> work in 2.X.

Guido wrote:
> At the cost of an extra copying step.

That sounds like an implementation issue.  If it is important
enough to matter, then why not just add some smarts to the
bytes constructor?

If the argument is a str, and the constructor owns the only
reference, then go ahead and use the argument's own
underlying array; the string itself will be deallocated when
(or before) the constructor returns, so no one else can use
it expecting an immutable.


From python at  Wed Feb 15 01:41:07 2006
From: python at (Raymond Hettinger)
Date: Tue, 14 Feb 2006 19:41:07 -0500
Subject: [Python-Dev] bytes type discussion
References: <>
Message-ID: <007e01c631c8$82c1b170$b83efea9@RaymondLaptop1>

[Guido van Rossum]
> Somewhat controversial:
> - bytes("abc") == bytes(map(ord, "abc"))

At first glance, this seems obvious and necessary, so if it's somewhat 
controversial, then I'm missing something.  What's the issue?


From martin at  Wed Feb 15 01:45:39 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 01:45:39 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

Bob Ippolito wrote:
>>Martin von Loewis's alternative for the "very controversial" set is to
>>disallow an encoding argument and (I believe) also to disallow Unicode
>>arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
>>only way to convert a string (which is always unicode) to bytes. The
>>problem with this is that there's no code that works in both 2.x and
> Given a base64 or hex string, how do you get a bytes object out of  
> it?  Currently str.decode('base64') and str.decode('hex') are good  
> solutions to this... but you get a str object back.

If s is a base64 string,


should work. In 2.x, it returns a str, which is then copied into
bytes; in 3.x, .decode("base64") returns a byte string already (*),
for which an extra copy is made.

I would prefer to see base64.decodestring to return bytes,
though - perhaps even in 2.x already.


(*) Interestingly enough, the "base64" encoding will work reversed
in terms of types, compared to all other encodings. Where .encode
returns bytes normally, it will return a string for base64, and
vice versa (assuming the bytes type has .decode/.encode methods).
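
For comparison, a short sketch of how this works in present-day Python 3,
using only the base64 module and bytes.fromhex as they exist today (this
is not the API being proposed in this thread):

```python
import base64

from_b64 = base64.b64decode("aGVsbG8=")   # text in, bytes out
from_hex = bytes.fromhex("68656c6c6f")     # text in, bytes out

# Encoding goes bytes -> bytes; an explicit decode is needed to get text.
encoded = base64.b64encode(b"hello")
encoded_text = encoded.decode("ascii")

print(from_b64, from_hex, encoded, encoded_text)
```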

From greg.ewing at  Wed Feb 15 01:51:03 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 13:51:03 +1300
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

Thomas Wouters wrote:

> Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
> writes a regular distutils, and I can install the latest,
> not-quite-in-apt version by doing ' bdist_deb' and installing the
> resulting .deb.

Why not just do ' install' directly?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From bob at  Wed Feb 15 01:56:00 2006
From: bob at (Bob Ippolito)
Date: Tue, 14 Feb 2006 16:56:00 -0800
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:

> On 2/14/06, Bob Ippolito <bob at> wrote:
>> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
>>> - we need a new PEP; PEP 332 won't cut it
>>> - no b"..." literal
>>> - bytes objects are mutable
>>> - bytes objects are composed of ints in range(256)
>>> - you can pass any iterable of ints to the bytes constructor, as  
>>> long
>>> as they are in range(256)
>> Sounds like array.array('B').
> Sure.
>> Will the bytes object support the buffer interface?
> Do you want them to?
> I suppose they should *not* support the *text* part of that API.

I would imagine that it'd be convenient for integrating with existing  
extensions... e.g. initializing an array or Numeric array with one.

>> Will it accept
>> objects supporting the buffer interface in the constructor (or a
>> class method)?  If so, will it be a copy or a view?  Current
>> array.array behavior says copy.
> bytes() should always copy -- thanks for asking.

I only really ask because it's worth fully specifying these things.   
Copy seems a lot more sensible given the rest of the interpreter and  
stdlib (e.g. buffer(x) seems to always return a read-only buffer).

>>> - longs or anything with an __index__ method should do, too
>>> - when you index a bytes object, you get a plain int
>> When slicing a bytes object, do you get another bytes object or a
>> list? If it's a bytes object, is it a copy or a view?  Current
>> array.array behavior says copy.
> Another bytes object which is a copy.
> (Why would you even think about views here? They are evil.)

I mention views because that's what numpy/Numeric/numarray/etc.  
do...  It's certainly convenient at times to have that functionality,  
for example, to work with only the alpha channel in an RGBA image.   
Probably too magical for the bytes type.

 >>> import numpy
 >>> image = numpy.array(list('RGBARGBARGBA'))
 >>> alpha = image[3::4]
 >>> alpha
array([A, A, A], dtype=(string,1))
 >>> alpha[:] = 'X'
 >>> image
array([R, G, B, X, R, G, B, X, R, G, B, X], dtype=(string,1))
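
The same copy-versus-view distinction can be sketched with only the
standard library of present-day Python 3, assuming bytearray as the
mutable bytes type and memoryview as the explicit way to ask for a view:

```python
image = bytearray(b"RGBARGBARGBA")

# A slice is a copy: changing it leaves the image alone.
alpha_copy = image[3::4]
alpha_copy[:] = b"XXX"
after_copy_edit = bytes(image)

# A memoryview slice is a view: writes go through to the image.
alpha_view = memoryview(image)[3::4]
for i in range(len(alpha_view)):
    alpha_view[i] = ord("X")

print(after_copy_edit, bytes(image))
```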

>>> Very controversial:
>>> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
>>> argument
>>> - bytes(u"abc") == bytes("abc") # for ASCII at least
>>> - bytes(u"\x80\xff") raises UnicodeError
>>> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>>> Martin von Loewis's alternative for the "very controversial" set  
>>> is to
>>> disallow an encoding argument and (I believe) also to disallow  
>>> Unicode
>>> arguments. In 3.0 this would leave us with s.encode(<encoding>)  
>>> as the
>>> only way to convert a string (which is always unicode) to bytes. The
>>> problem with this is that there's no code that works in both 2.x and
>>> 3.0.
>> Given a base64 or hex string, how do you get a bytes object out of
>> it?  Currently str.decode('base64') and str.decode('hex') are good
>> solutions to this... but you get a str object back.
> I don't know -- you can propose an API you like here. base64 is as
> likely to encode text as binary data, so I don't think it's wrong for
> those things to return strings.

That's kinda true I guess -- but you'd still need an encoding in py3k  
to turn base64 -> text.  A lot of the current codecs infrastructure  
doesn't make sense in py3k -- for example, the 'zlib' encoding, which  
is really a bytes transform, or 'unicode_escape' which is a text  

I suppose there aren't too many different ways you'd want to encode  
or decode data to binary (beyond the text codecs), they should  
probably just live in a module -- something like the binascii we have  
now.  I do find the codecs infrastructure to be convenient at times  
(maybe too convenient), but since you're not interested in adding  
functions to existing types then a module seems like the best approach.


From thomas at  Wed Feb 15 02:00:10 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 02:00:10 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006 at 01:51:03PM +1300, Greg Ewing wrote:
> Thomas Wouters wrote:

> > Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
> > writes a regular distutils, and I can install the latest,
> > not-quite-in-apt version by doing ' bdist_deb' and installing the
> > resulting .deb.

> Why not just do ' install' directly?

Because that *does* overwrite files the package system might not want
overwritten, and the resulting install is not listed in the packaging
system, not taken into account on upgrades, etc. I don't want to keep track
of a separate list of distutils-installed packages; that's what I use APT
for. If I wanted to keep manually massaging my system after each install or
upgrade, I'd be using Gentoo or FreeBSD ;)

(I should point out that CPAN and CPANPLUS on FreeBSD do this slightly
better; they register packages installed through CPAN (or actually the
build/install part of it, MakefileMaker or whatever it's called) with the
FreeBSD packaging database. I don't know what distutils does on FreeBSD, but
that packaging database is just a bunch of files in appropriately named
directories in /var/db/pkg...)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From greg.ewing at  Wed Feb 15 02:00:21 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 14:00:21 +1300
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <dstlvb$6cb$>
References: <>
Message-ID: <>

Joe Smith wrote:

> Windows and RPM are known for major dependency problems, letting packages 
> damage each other, having packages that do not uninstall cleanly (i.e. 
> packages that leave junk all over the place) and generally messing the system 
> up quite badly over time, so that the OS is usually removed and 
> re-installed periodically.)

I'm disappointed that the various Linux distributions
still don't seem to have caught onto the very simple
idea of *not* scattering files all over the place when
installing something.

MacOSX seems to be the only system so far that has got
this right -- organising the system so that everything
related to a given application or library can be kept
under a single directory, clearly labelled with a
version number.

I haven't looked closely into eggs yet, but if they allow
Python packages to be managed this way, and do it cross-
platform, that's a very good reason to prefer using eggs
over a platform-specific package format.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 15 02:06:17 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 14:06:17 +1300
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

Alex Martelli wrote:

> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?

Because those words are just names for pieces of data,
with nothing to connect them with files or the act of
opening a file.

I think the association of "open" with "file" is
established strongly enough in programmers' brains that
dropping it now would just lead to unnecessary confusion.


From martin at  Wed Feb 15 02:11:24 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 15 Feb 2006 02:11:24 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <007e01c631c8$82c1b170$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
>>- bytes("abc") == bytes(map(ord, "abc"))
> At first glance, this seems obvious and necessary, so if it's somewhat 
> controversial, then I'm missing something.  What's the issue?

There is an "implicit Latin-1" assumption in that code. Suppose
you do

# -*- coding: koi-8r -*-
print bytes("Гвидо ван Россум")

in Python 2.x, then this means something (*). In Python 3, it gives
you an exception, as the ordinals of this are suddenly above 256.

Or, perhaps worse, the code

# -*- coding: utf-8 -*-
print bytes("Martin v. Löwis")

will work in 2.x and 3.x, but produce different numbers (**).


(*) [231, 215, 201, 196, 207, 32, 215, 193, 206, 32, 242, 207, 211, 211,
213, 205]

(**) In 2.x, this will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 195, 182, 119, 105, 115]
whereas in 3.x, it will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 246, 119, 105, 115]
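
The two footnoted lists can be reproduced with a short sketch, assuming
present-day Python 3 (where a string is always Unicode). The variable
names are illustrative only, and the KOI8-R values are copied from
footnote (*) above.

```python
# Footnote (**): UTF-8 bytes versus Unicode code points for the same text.
name = "Martin v. L\u00f6wis"
utf8_values = list(name.encode("utf-8"))   # what 2.x saw in a utf-8 source file
code_points = [ord(ch) for ch in name]     # what map(ord, ...) sees on Unicode

# Footnote (*): the same byte values, interpreted as KOI8-R text.
koi8_values = [231, 215, 201, 196, 207, 32, 215, 193,
               206, 32, 242, 207, 211, 211, 213, 205]
cyrillic = bytes(koi8_values).decode("koi8_r")
cyrillic_code_points = [ord(ch) for ch in cyrillic]
```

The Cyrillic code points all lie well above 255, so they cannot be
stored one per byte, which is why an implicit Latin-1 style conversion
has to fail for that string.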

From guido at  Wed Feb 15 02:15:03 2006
From: guido at (Guido van Rossum)
Date: Tue, 14 Feb 2006 17:15:03 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Jim Jewett <jimjjewett at> wrote:
> On 2/14/06, Neil Schemenauer <nas at> wrote:
> > People could spell it bytes(s.encode('latin-1')) in order to make it
> > work in 2.X.
> Guido wrote:
> > At the cost of an extra copying step.
> That sounds like an implementation issue.  If it is important
> enough to matter, then why not just add some smarts to the
> bytes constructor?

Short answer: you can't.

> If the argument is a str, and the constructor owns the only
> reference, then go ahead and use the argument's own
> underlying array; the string itself will be deallocated when
> (or before) the constructor returns, so no one else can use
> it expecting an immutable.

Hard to explain, but the VM usually keeps an extra reference on the
stack so the refcount is never 1. But you can't rely on that, so you
can't assume that it's safe to reuse the storage if it's >1. Also, since
the str's underlying array is allocated inline with the str header,
this would require str and bytes to have the same object layout. But
since bytes are mutable, they can't.
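
A rough illustration of both points, assuming present-day CPython 3,
where the mutable type is spelled bytearray. The exact number reported
by sys.getrefcount() is an implementation detail that varies between
versions, so the sketch deliberately does not depend on a particular
value.

```python
import sys

def refcount_inside_call(obj):
    # getrefcount() itself holds a temporary reference while it runs,
    # and the calling frames may hold others, so the reported count is
    # not a reliable "I am the sole owner" signal.
    return sys.getrefcount(obj)

observed = refcount_inside_call(bytearray(b"abc"))

# Mutability is the other obstacle: a mutable buffer can grow in place,
# which a fixed-size immutable object never needs to do.
buf = bytearray(b"abc")
buf.extend(b"def")
```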

Summary: you don't understand the implementation well enough to
suggest these kinds of things.

--Guido van Rossum (home page:

From trentm at  Wed Feb 15 02:22:14 2006
From: trentm at (Trent Mick)
Date: Tue, 14 Feb 2006 17:22:14 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

[Greg Ewing wrote]
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number.

ActivePython and MacPython have to install stuff to:

    /usr/local/bin/...
    /Library/Frameworks/Python.framework/...
    /Applications/MacPython-2.4/...  # just MacPython does this
    /Library/Documentation/Help/...
        # Symlink needed here to have a hope of registration with
        # Apple's (crappy) help viewer system to work.

Also, a receipt of the installation ends up here:

    /Library/Receipts/$package_name/...

though Apple does not provide tools for uninstallation using those
receipts.
Mac OS X's installation tech ain't no panacea. If one is just
distributing a single .app, then it is okay. If one is just distributing
a library with no UI (graphical or otherwise) for the user, then it is
okay. And "okay" here still means a pretty poor installation experience
for the user: open DMG, don't run the app from here, drag it to your
Applications folder, then eject this window/disk, then run it from
/Applications, etc.


Trent Mick
TrentM at

From greg.ewing at  Wed Feb 15 02:30:29 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 14:30:29 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsjrfp$g72$>
Message-ID: <>

Guido van Rossum wrote:

> The only remaining question is what if anything to do with an
> encoding argument when the first argument is of type str...)

 From what you said earlier about str in 2.x being
interpretable as a unicode string which contains
only ascii, it seems to me that if you say

   bytes(s, encoding)

where s is a str, then by the presence of the encoding
argument you're saying that you want s to be treated as
unicode and encoded using the specified encoding.
So the result should be the same as

   bytes(u, encoding)

where u is a unicode string containing the same code
points as s. This implies that it should be an error
if s contains non-ascii characters.
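
The interpretation above can be sketched as follows, assuming
present-day Python 3, where a bytes value stands in for a 2.x str.
make_bytes is a hypothetical helper, not a proposed or existing API.

```python
def make_bytes(s, encoding):
    """Hypothetical helper: treat a byte string as ASCII-only text,
    then encode the resulting characters with `encoding`."""
    if isinstance(s, bytes):
        # Raises UnicodeDecodeError if s contains non-ASCII bytes.
        s = s.decode("ascii")
    return s.encode(encoding)

# A str-used-as-ascii and the equivalent unicode string give one result.
same = make_bytes(b"abc", "utf-16-le") == make_bytes("abc", "utf-16-le")

# Non-ASCII content in the byte string is an error, not a guess.
try:
    make_bytes(b"abc\xf0", "latin-1")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```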

This interpretation would satisfy the requirement for
a single call signature covering both unicode and
str-used-as-ascii-characters, while providing a
different call signature (without encoding) for


From bob at  Wed Feb 15 02:24:06 2006
From: bob at (Bob Ippolito)
Date: Tue, 14 Feb 2006 17:24:06 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On Feb 14, 2006, at 5:00 PM, Greg Ewing wrote:

> Joe Smith wrote:
>> Windows and RPM are known for major dependency problems, letting packages
>> damage each other, having packages that do not uninstall cleanly (i.e.
>> packages that leave junk all over the place) and generally messing the system
>> up quite badly over time, so that the OS is usually removed and
>> re-installed periodically.)
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number.
> I haven't looked closely into eggs yet, but if they allow
> Python packages to be managed this way, and do it cross-
> platform, that's a very good reason to prefer using eggs
> over a platform-specific package format.

It should also be mentioned that eggs and platform-specific package  
formats are absolutely not mutually exclusive.  You could use apt/rpm/ 
ports/etc. to fetch/build/install eggs too.  There are very few  
reasons not to use eggs -- in theory anyway, the implementation isn't  
finished yet.

The only things that really need to change are the packages like  
Twisted, numpy, or SciPy that don't have a distutils-based main Technically, since egg is just a specification, they  
could even implement it themselves without the help of setuptools  
(though that seems like a bad approach).


From thomas at  Wed Feb 15 02:35:03 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 02:35:03 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On Wed, Feb 15, 2006 at 02:00:21PM +1300, Greg Ewing wrote:
> Joe Smith wrote:
> > Windows and RPM are known for major dependency problems, letting packages 
> > damage each other, having packages that do not uninstall cleanly (i.e. 
> > packages that leave junk all over the place) and generally messing the system 
> > up quite badly over time, so that the OS is usually removed and 
> > re-installed periodically.)
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.

Well, as an end user, I honestly don't care. I install stuff through apt, it
installs the dependencies for me, does basic configuration where applicable
(often asking for user-input once, then remembering the settings) and allows
me to deinstall when I'm tired of a package. As long as apt handles it, I
couldn't care less whether it's installed in separate directories, large
bzip2 archives with suitable playmates from mixed ethnicity to improve
social contact, or spread out across every 17th byte of a logical volume.

As a programmer, I also don't care. I tell distutils which modules/packages,
data files and scripts to install, and it does the rest. And that's why I
like my Python packages to become .deb's through bdist_deb :)

You-think-too-much'ly y'rs,
Thomas Wouters <thomas at>


From greg.ewing at  Wed Feb 15 02:59:24 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 14:59:24 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
Message-ID: <>

Guido van Rossum wrote:
> On 2/13/06, Phillip J. Eby <pje at> wrote:
>>At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>On 2/13/06, Phillip J. Eby <pje at> wrote:
>>>What would bytes("abc\xf0", "latin-1") *mean*? 
>>I'm saying that XXX would be the same encoding as you specified.  i.e.,
>>including an encoding means you are encoding the *meaning* of the string.

No, this is wrong. As I understand it, the encoding
argument to bytes() is meant to specify how to *encode*
characters into the bytes object. If you want to be able
to specify how to *decode* a str argument as well, you'd
need a third argument.
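
For comparison, a small sketch of how this reads in present-day
Python 3, which postdates this thread: the second argument to bytes()
only says how to encode characters, and decoding is a separate step on
the bytes object. The variable names are illustrative.

```python
# The encoding argument controls how characters become bytes.
encoded_latin1 = bytes("abc\xf0", "latin-1")
encoded_utf8 = bytes("abc\xf0", "utf-8")

# Going the other way is a separate, explicit decode with its own encoding.
decoded = encoded_latin1.decode("latin-1")
```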


From bob at  Wed Feb 15 03:18:48 2006
From: bob at (Bob Ippolito)
Date: Tue, 14 Feb 2006 18:18:48 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On Feb 14, 2006, at 5:22 PM, Trent Mick wrote:

> [Greg Ewing wrote]
>> MacOSX seems to be the only system so far that has got
>> this right -- organising the system so that everything
>> related to a given application or library can be kept
>> under a single directory, clearly labelled with a
>> version number.
> ActivePython and MacPython have to install stuff to:
>     /usr/local/bin/...

The /usr/local/bin links are superfluous.. people should really be  
putting sys.prefix/bin on their path, cause that's where distutils  
scripts get installed to.
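
A quick way to check where scripts land and whether that directory is on
the path, assuming a present-day Python with the sysconfig module (which
postdates this thread). The variable names are illustrative.

```python
import os
import sys
import sysconfig

prefix = sys.prefix                            # installation prefix
scripts_dir = sysconfig.get_path("scripts")    # where scripts are installed

# True when the scripts directory already appears in PATH.
on_path = scripts_dir in os.environ.get("PATH", "").split(os.pathsep)
```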

>     /Library/Frameworks/Python.framework/...
>     /Applications/MacPython-2.4/...  # just MacPython does this

ActivePython doesn't install app bundles for IDLE or anything?

>     /Library/Documentation/Help/...
>         # Symlink needed here to have a hope of registration with
>         # Apple's (crappy) help viewer system to work.

It is pretty bad.. probably even worth punting on this step.

> Also, a receipt of the installation ends up here:
>     /Library/Receipts/$package_name/...
> though Apple does not provide tools for uninstallation using those
> receipts.

That stuff is really behind the scenes stuff that's wholly managed by and is pretty much irrelevant.

> Mac OS X's installation tech ain't no panacea. If one is just
> distributing a single .app, then it is okay. If one is just distributing
> a library with no UI (graphical or otherwise) for the user, then it is
> okay. And "okay" here still means a pretty poor installation experience
> for the user: open DMG, don't run the app from here, drag it to your
> Applications folder, then eject this window/disk, then run it from
> /Applications, etc.

Single apps are better than OK.  Download them by whatever means you  
want, put them wherever you want, and run them.  You can run any well- 
behaved application from a DMG (or a CD, or a USB key, or any other  
readable media).

Libraries are not so great, as you've said.  However, only developers  
should have to install libraries.  Good applications are shipped with  
all of the libraries they need embedded in the application bundle.   
Dynamic linkage should only really happen internally, and to vendor  
supplied libraries.


From greg.ewing at  Wed Feb 15 04:03:09 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 16:03:09 +1300
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

Trent Mick wrote:

> ActivePython and MacPython have to install stuff to:
>     /usr/local/bin/...
>     /Library/Frameworks/Python.framework/...
>     /Applications/MacPython-2.4/...  # just MacPython does this

It's not perfect, but it's still a lot better than the
situation on any other unix I've seen so far. It's a
bit more complicated with something like Python, which
is really several things - a library, an application,
and some unix programs (the latter of which don't really
fit into the MacOSX structure).

At least all of the myriad library and header files go
together under a single easily-identified directory, if
you know where to look for it.

 >     /Library/Documentation/Help/...
 >         # Symlink needed here to have a hope of registration with
 >         # Apple's (crappy) help viewer system to work.

I didn't know about that one. It never even occurred to me
that Python might *have* Apple Help Viewer files. I use
Firefox to view all my Python documentation. :-)

> Also, a receipt of the installation ends up here:
>     /Library/Receipts/$package_name/...
> though Apple does not provide tools for uninstallation using those
> receipts.

And I hope they don't! I'd rather see progress towards
a system where you don't *need* a special tool to uninstall
something. It should be as simple and obvious as dragging
a file or folder to the trash.

> open DMG, don't run the app from here, drag it to your
> Applications folder, then eject this window/disk, then run it from
> /Applications,

A decently-designed application should be runnable from
anywhere, including a dmg, if the user wants to do that.
If an app refuses to run from a dmg, I consider that a
bug in the application.

Likewise, the user should be able to put it anywhere on
the HD, not just the Applications folder.

Also I consider the need for a dmg in the first place
to be a bug in the Web. :-) (You should be able to just
directly download the .app file.)

This sort of thing is still not quite as smooth as it
was under Classic MacOS, but I'm hopeful of improvement.


From greg.ewing at  Wed Feb 15 04:19:19 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 16:19:19 +1300
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

Thomas Wouters wrote:

> Well, as an end user, I honestly don't care.
> As a programmer, I also don't care.

Perhaps I've been burned once too often by someone's
oh-so-clever installer script screwing up and leaving
me to wade through an impenetrable pile of makefiles,
shell scripts and m4 macros trying to figure out what
went wrong and what I can possibly do to fix it, but
I've become a deep believer in keeping things simple.

Common sense suggests that a system which keeps
everything related to a package, and only to that
package, in one directory, has got to be more robust
than one which scatters files far and wide and then
relies on some elaborate bookkeeping system to try
to make sure things don't step on each other's toes.

When everything goes right, I don't care either. But
things go wrong often enough to make me care about
unnecessary complexity in the tools I use.


From greg.ewing at  Wed Feb 15 04:34:23 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 16:34:23 +1300
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

Thomas Wouters wrote:
> The encoding of network streams or files may be
> entirely unknown beforehand, and depend on the content: a content-encoding,
> a <META EQUIV> HTML tag. Will bytes-strings get string methods for easy
> searching of content descriptors?

Seems to me this is a case where you want to be able
to change encodings in the middle of reading the stream.
You start off reading the data as ascii, and once you've
figured out the encoding, you switch to that and carry
on reading.

Are there any plans to make it possible to change the
encoding of a text file object on the fly like this?

If that would be awkward, maybe file objects themselves
shouldn't be where the decoding occurs, but decoders
should be separate objects that wrap byte streams.
Under that model,

   opentext(filename, encoding)

would be a factory function that did something like

   codecs.streamdecoder(encoding, openbinary(filename))

Having codecs be stream filters might be a good idea
anyway, since then you could use them to wrap anything
that can be treated as a stream of bytes (sockets,
some custom object in your program, etc.), you
could create pipelines of encoders and decoders, etc.
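
The wrapping model can be approximated with what the codecs module
already offers. This is only a sketch, assuming present-day Python 3;
opentext_sketch is a hypothetical name, and it takes an already-open
byte stream rather than a filename so that any byte source can be
wrapped.

```python
import codecs
import io

def opentext_sketch(byte_stream, encoding):
    """Hypothetical factory: wrap any readable byte stream in a decoder."""
    reader_class = codecs.getreader(encoding)  # StreamReader for the codec
    return reader_class(byte_stream)

# Any object with a bytes-returning read() will do; BytesIO stands in
# for a file or socket here.
raw = io.BytesIO("L\u00f6wis\n".encode("utf-8"))
text = opentext_sketch(raw, "utf-8").read()
```

The decoder is a separate object layered over the byte stream, so the
underlying stream stays available for other wrappers.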


From oliphant.travis at  Wed Feb 15 04:39:49 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Tue, 14 Feb 2006 20:39:49 -0700
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <dsu7q8$m2i$>

Guido van Rossum wrote:
> I'm about to send 6 or 8 replies to various salient messages in the
> PEP 332 revival thread. That's probably a sign that there's still a
> lot to be sorted out. In the mean time, to save you reading through
> all those responses, here's a summary of where I believe I stand.
> Let's continue the discussion in this new thread unless there are
> specific hairs to be split in the other thread that aren't addressed
> below or by later posts.

I hope bytes objects will be pickle-able?  If so, and they support the 
buffer protocol, then many NumPy users will be very happy.
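
As a point of reference, a minimal sketch assuming present-day Python 3,
where both wishes hold for the built-in bytes type. The sample values
are made up.

```python
import pickle

data = bytes([1, 2, 3, 255])

# Pickling round-trips the object unchanged.
restored = pickle.loads(pickle.dumps(data))

# memoryview() works on any object exposing the buffer protocol.
view = memoryview(data)
first, last = view[0], view[-1]
```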


From rrr at  Wed Feb 15 04:45:26 2006
From: rrr at (Ron Adam)
Date: Tue, 14 Feb 2006 21:45:26 -0600
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 2/13/06, Phillip J. Eby <pje at> wrote:
>>> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>> On 2/13/06, Phillip J. Eby <pje at> wrote:
>>>> What would bytes("abc\xf0", "latin-1") *mean*? 
>>> I'm saying that XXX would be the same encoding as you specified.  i.e.,
>>> including an encoding means you are encoding the *meaning* of the string.
> No, this is wrong. As I understand it, the encoding
> argument to bytes() is meant to specify how to *encode*
> characters into the bytes object. If you want to be able
> to specify how to *decode* a str argument as well, you'd
> need a third argument.

I'm not sure I understand why this would be needed?  But maybe it's 
still too early to pin anything down.

My first impression and thoughts were:  (and seems incorrect now)

     bytes(object) ->  byte sequence of object's value

Basically a "memory dump" of the object's value.  And so...

     object(bytes) ->  copy of original object

This would reproduce a copy of the original object as long as the from 
and to object are the same type with no encoding needed.  If they are 
different then you would get garbage, or an error. But that would be a 
programming error and not a language issue. It would be up to the 
programmer to not do that.

Of course this is one of those easier to say than do concepts I'm sure.

And I was thinking a bytes argument of more than one item would indicate 
a byte sequence.

     bytes(1,2,3)  ->  bytes([1,2,3])

Where any values above 255 would give an error,  but it seems an 
explicit list is preferred.  And that's fine because it creates a way 
for bytes to know how to handle everything else. (I think)

    bytes([1,2,3])  -> bytes([1,2,3])

Which is fine... so ???

    b = bytes(0L) ->  bytes([0,0,0,0])

    long(b) ->  0L    convert it back to 0L

And ...

    b = bytes([0L])  ->  bytes([0])  # a single byte

    int(b) ->  0    convert it back to 0
    long(b) ->  0L

It's up to the programmer to know if it's safe. Working with raw data is 
always a "programmer needs to be aware of what's going on" thing.
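
For comparison, a sketch of how the integer round trip looks in
present-day Python 3, which postdates this thread. There the width and
byte order are stated explicitly instead of being implied by the type;
note that bytes(0) in Python 3 gives an empty bytes object rather than
a dump of the integer zero.

```python
# Integer to a fixed-width byte sequence and back again.
as_bytes = (0).to_bytes(4, "little")
back = int.from_bytes(as_bytes, "little")

# A non-zero value shows the byte order: 258 == 0x0102.
wide = (258).to_bytes(4, "little")

# Values above 255 in an explicit list are rejected.
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```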

But would it be any different with strings?  You wouldn't ever want to 
encode one type's bytes into a different type directly. It would be 
better to just encode it back to the original type, then use *its* 
encoding method to change it.


   b = bytes(s)  ->  bytes( raw sequence of bytes )

Whether or not you get a single byte per char or multiple bytes per 
character would depend on the string's encoding.

   s = str(bytes, encoding)  ->  original string

You need to specify it here, because there is more than one string 
encoding. To avoid encodings entirely we would need a type for each 
encoding. (which isn't really avoiding anything) And it's the "raw data 
so programmer needs to be aware" situation again. Don't decode to 
something other than what it is.

If someone needs automatic encoding/decoding, then they probably should 
write a class to do what they want.  Something roughly like...

   class bytekeeper(object):
      b = None
      t = None
      e = None
      def __init__(self, obj, enc='bytes'):   # or whatever encoding
         self.e = enc
         self.t = type(obj)
         self.b = bytes(obj)
      def decode(self):

Would we be able to subclass bytes?

     class bytekeeper(bytes):   ?
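
One way the idea could look, as a sketch only and assuming present-day
Python 3. ByteKeeper and TaggedBytes are hypothetical names, the decode
body is a guess at the intent, and the class is limited to str input
because only str has an obvious encode/decode pair.

```python
class ByteKeeper:
    """Hypothetical wrapper: remember the type and encoding next to the bytes."""

    def __init__(self, text, encoding="utf-8"):
        self.encoding = encoding
        self.kind = type(text)            # str, or a subclass of it
        self.data = text.encode(encoding)

    def decode(self):
        # Rebuild an object of the original type from the stored bytes.
        return self.kind(self.data, self.encoding)


class TaggedBytes(bytes):
    """Subclassing bytes directly is also possible in Python 3."""


kept = ByteKeeper("h\u00e9llo", "utf-8")
round_trip = kept.decode()
tagged = TaggedBytes(b"abc")
```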

Ok.. enough rambling... I wonder how much of this is way out in left 
field.  ;)

  Ronald Adam

And as fa

In this case the encoding argument would only be needed not to

From oliphant.travis at  Wed Feb 15 04:41:19 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Tue, 14 Feb 2006 20:41:19 -0700
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot to
Message-ID: <dsu7t9$m9c$>

After some revisions, PEP 357 is ready for more comments.  Please voice 
any concerns.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pep-0357.txt

From rhamph at  Wed Feb 15 05:14:45 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 21:14:45 -0700
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <r01050400-1039-7EC926449D9911DA8736001124365170@>
References: <>
Message-ID: <>

On 2/14/06, Just van Rossum <just at> wrote:
> +1 for two functions.
> My choice would be open() for binary and opentext() for text. I don't
> find that backwards at all: the text function is going to be more
> different from the current open() function than the binary function
> would be since in many ways the str type is closer to bytes than to
> unicode.
> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

Thus providing us with a transition period, even with warnings on use
of the old function.

I think coming up with a way to transition that doesn't silently break
code and doesn't leave us with permanent ugly names is the hardest
challenge here.

+1 on opentext(), openbinary()
-1 on silently changing open() in a way that results in breakage
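
A sketch of what such a transition period could look like, assuming
present-day Python 3. opentext, openbinary and legacy_open are
hypothetical stand-ins built on the existing open(); none of them is an
actual builtin.

```python
import os
import tempfile
import warnings

def openbinary(path):
    """Hypothetical: always returns a byte-oriented file object."""
    return open(path, "rb")

def opentext(path, encoding="utf-8"):
    """Hypothetical: always returns a text file object."""
    return open(path, "r", encoding=encoding)

def legacy_open(path, mode="r"):
    """Stand-in for the old function: still works, but warns loudly."""
    warnings.warn("legacy_open() is deprecated; use opentext() or openbinary()",
                  DeprecationWarning, stacklevel=2)
    return openbinary(path) if "b" in mode else opentext(path)

# Small demonstration against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello\n")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with legacy_open(path, "rb") as f:
            legacy_data = f.read()
    with opentext(path) as f:
        text_data = f.read()
```

Old call sites keep working during the transition, and the warning
points each caller at the replacement instead of breaking silently.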

Adam Olsen, aka Rhamphoryncus

From fdrake at  Wed Feb 15 05:23:45 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Tue, 14 Feb 2006 23:23:45 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Tuesday 14 February 2006 22:34, Greg Ewing wrote:
 > Seems to me this is a case where you want to be able
 > to change encodings in the middle of reading the stream.
 > You start off reading the data as ascii, and once you've
 > figured out the encoding, you switch to that and carry
 > on reading.

Not quite.  The proper response in this case is often to re-start decoding 
with the correct encoding, since some of the data extracted so far may have 
been decoded incorrectly.  A very carefully constructed application may be 
able to go back and re-decode any data saved from the stream with the 
previous encoding, but that seems like it would be pretty fragile in 

There may be cases where switching encoding on the fly makes sense, but I'm 
not aware of any actual examples of where that approach would be required.
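
The restart approach can be sketched like this, assuming present-day
Python 3 and a seekable byte stream. decode_restarting and CHARSET_RE
are hypothetical names, and the charset sniffing is deliberately crude;
it is not a real HTML parser.

```python
import io
import re

# Crude pattern for a charset declaration in the first bytes of a page.
CHARSET_RE = re.compile(rb"charset=[\"']?([\w.-]+)")

def decode_restarting(stream, sniff_bytes=1024, default="ascii"):
    """Sniff the declared encoding, then decode again from the start."""
    head = stream.read(sniff_bytes)
    match = CHARSET_RE.search(head)
    encoding = match.group(1).decode("ascii") if match else default
    stream.seek(0)                      # discard what was read so far
    return stream.read().decode(encoding)

page = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "L\u00f6wis").encode("utf-8")
decoded_page = decode_restarting(io.BytesIO(page))
```

Nothing decoded under the provisional encoding is kept, which avoids the
fragile step of re-decoding saved fragments.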


Fred L. Drake, Jr.   <fdrake at>

From fdrake at  Wed Feb 15 05:40:53 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Tue, 14 Feb 2006 23:40:53 -0500
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$> <>
Message-ID: <>

On Tuesday 14 February 2006 03:09, Neal Norwitz wrote:
 > While you are here, are you planning to do the doc releases for 2.5?
 > You are tentatively listed in PEP 356.  (Technically it says TBD with
 > a ? next to your name.)

Releases generally aren't a problem, since they're heavily automated and 
scheduled well in advance.  I'm glad to continue helping with that, 
especially since that seems to be about all I can get to sometimes.

 > I think this was the quick hack I did.  I hope there are many
 > concerns. :-)  For example, if the doc build fails, ...  Hmmm, this
 > probably isn't a problem.  The doc won't be updated, but will still be
 > the last good version.  So if I send mail when the doc doesn't build,
 > then it might not be so bad.  

Seems reasonable to me.

 > I still need to 
 > switch over the failure mails to go to python-checkins.  There are too 
 > many right now though.  Unless people don't mind getting several
 > messages about refleaks every day?  Anyone?

Documentation build errors should probably be separated from leak detection 
reports.  I don't know what it would take to get them separated.

 > That shouldn't be a problem.  See

Works for me!  Thanks for putting the effort into this.

The general question of where the development docs should show up remains.  
There are a number of options:

1., where I'd put them at one point

2., which is reasonable, but new

3., which seems reasonable, but
   proponents may not like

4. for trunk documentation, and and/or for maintenance updates

That last one has a certain appeal.  It would allow corrections to go online 
quicker, so people using or a mirror would get updates quickly (an 
advantage of delivering docs over the net!), and I wouldn't get so many 
repeat reports of commonly-noticed typos.  The released versions would still 
be available via

My own inclination is that if we continue to use, it should 
contain only one copy of the documentation, and that should be for the most 
recent "stable" release (though perhaps an updated version of the 
documentation).  I'm not really on either side of the fence about whether is the "right thing" to do; the idea came out of the folks 
interested in advocacy.


Fred L. Drake, Jr.   <fdrake at>

From rhamph at  Wed Feb 15 05:41:02 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 21:41:02 -0700
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, "Martin v. Löwis" <martin at> wrote:
> Raymond Hettinger wrote:
> >>- bytes("abc") == bytes(map(ord, "abc"))
> >
> >
> > At first glance, this seems obvious and necessary, so if it's somewhat
> > controversial, then I'm missing something.  What's the issue?
> There is an "implicit Latin-1" assumption in that code. Suppose
> you do
> # -*- coding: koi-8r -*-
> print bytes("Гвидо ван Россум")
> in Python 2.x, then this means something (*). In Python 3, it gives
> you an exception, as the ordinals of this are suddenly above 256.
> Or, perhaps worse, the code
> # -*- coding: utf-8 -*-
> print bytes("Martin v. Löwis")
> will work in 2.x and 3.x, but produce different numbers (**).

My assumption is these would become errors in 3.x.  bytes(str) is only
needed so you can do bytes(u"abc".encode('utf-8')) and have it work in
2.x and 3.x.

(I wonder if maybe they should be an error in 2.x as well.  Source
encoding is for unicode literals, not str literals.)
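
A minimal sketch of the "implicit Latin-1" point, assuming a present-day
Python 3 interpreter (which did not exist when this was posted).  The
helper name bytes_from_ordinals is hypothetical; it stands in for the
proposed bytes(map(ord, s)) rule.

```python
def bytes_from_ordinals(text):
    """Build bytes from code points, as bytes(map(ord, s)) would."""
    return bytes(map(ord, text))


name = "Martin v. L\u00f6wis"

# Code points below 256 pass straight through, which is exactly what
# encoding to Latin-1 does.
same_as_latin1 = bytes_from_ordinals(name) == name.encode("latin-1")

# The same text encoded as UTF-8 yields different numbers.
differs_from_utf8 = bytes_from_ordinals(name) != name.encode("utf-8")

# Code points above 255 (Cyrillic here) cannot be stored in a byte.
try:
    bytes_from_ordinals("\u043f\u0440\u0438\u0432\u0435\u0442")
    cyrillic_failed = False
except ValueError:
    cyrillic_failed = True

print(same_as_latin1, differs_from_utf8, cyrillic_failed)
```

Passing an explicit encoding, as in u"abc".encode('utf-8'), avoids the
hidden assumption altogether.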

Adam Olsen, aka Rhamphoryncus

From rhamph at  Wed Feb 15 06:02:32 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 22:02:32 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
	<> <>
Message-ID: <>

On 2/14/06, Guido van Rossum <guido at> wrote:
> On 2/14/06, Adam Olsen <rhamph at> wrote:
> > In 3.0 it changes to:
> > "It's...".encode('utf-8')
> > u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility
> No. 3.0 won't have "backward compatibility" features. That's the whole
> point of 3.0.


> > I realize it would be odd for the interactive interpreter to print them
> > as a list of ints by default:
> > >>> u"It's...".byteencode('utf-8')
> > [73, 116, 39, 115, 46, 46, 46]
> No. This prints the repr() which should include the type. bytes([73,
> 116, 39, 115, 46, 46, 46]) is the right thing to print here.

Typo, sorry :)
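
For reference, a small sketch of the list-of-ints view versus a repr()
that names the type, assuming a present-day Python 3 interpreter.  Note
that Python 3 uses the plain encode() spelling and a b"..." repr, not the
byteencode() name or the bytes([...]) repr discussed above.

```python
encoded = "It's...".encode("utf-8")

# Iterating over a bytes object yields the integers quoted above.
as_ints = list(encoded)
print(as_ints)

# The repr identifies the type instead of looking like a plain list.
print(repr(encoded))

# The list-of-ints form round-trips through the constructor.
rebuilt = bytes(as_ints)
```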

Adam Olsen, aka Rhamphoryncus

From nnorwitz at  Wed Feb 15 06:04:48 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 14 Feb 2006 21:04:48 -0800
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$> <>
Message-ID: <>

On 2/14/06, Fred L. Drake, Jr. <fdrake at> wrote:
> Releases generally aren't a problem, since they're heavily automated and
> scheduled well in advance.  I'm glad to continue helping with that,
> especially since that seems to be about all I can get to sometimes.

Great, I updated the PEP.

> Documentation build errors should probably be separated from leak detection
> reports.  I don't know what it would take to get them separated.

Yup, they already are AFAICT.  I will activate the 2.4 doc builds to
send failures to python-checkins unless someone has a better idea. 
These should be very rare.  The destination is controlled by

> The general question of where the development docs should show up remains.
[4 options sliced]

Agreed, I don't have a strong opinion either.  There should definitely
only be one place to look though.  That should make things easier. 
What do others think?

> My own inclination is that if we continue to use, it should
> contain only one copy of the documentation, and that should be for the most
> recent "stable" release (though perhaps an updated version of the
> documentation).



From rhamph at  Wed Feb 15 06:11:49 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 22:11:49 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
	<> <>
Message-ID: <>

On 2/14/06, Guido van Rossum <guido at> wrote:
> On 2/13/06, Adam Olsen <rhamph at> wrote:
> > If I understand correctly there's three main candidates:
> > 1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
> I'm not sure what you mean, but I'm guessing you're thinking that the
> repr() of a bytes object created from bytes('abc\xf0') would be
>   bytes('abc\xf0')
> under this rule. What's so bad about that?

See below.

> > 2. Direct copying to str/unicode if it's only ascii values, switching
> > to a list of hex literals if there's any non-ascii values
> That works for me too. But why hex literals? As MvL stated, a list of
> decimals would be just as useful.

PEBKAC.  Yeah, decimals are simpler and shorter even.

> > 3. b"foo" literal with ascii for all ascii characters (other than \
> > and "), \xFF for individual characters that aren't ascii
> >
> > Given the choice I prefer the third option, with the second option as
> > my runner up.  The first option just screams "silent errors" to me.
> The 3rd is out of the running for many reasons.
> I'm not sure I understand your "silent errors" fear; can you elaborate?

I think it's that someone will create a unicode object with real
latin-1 characters and it'll get passed through without errors, the
code assuming it's 8bit-as-latin-1.  If they had put other unicode
characters in they would have gotten an exception instead.

However, at this point all the posts on latin-1 encoding/decoding have
become so muddled in my mind that I don't know what they're
suggesting.  I think I'll wait for the pep to clear that up.

Adam Olsen, aka Rhamphoryncus

From rhamph at  Wed Feb 15 06:20:16 2006
From: rhamph at (Adam Olsen)
Date: Tue, 14 Feb 2006 22:20:16 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Guido van Rossum <guido at> wrote:
> Not entirely, since I don't know what b"abc<euro>def" would mean
> (where <euro> is a Unicode Euro character typed in whatever source
> encoding was used).

SyntaxError I would hope.  Ascii and hex escapes only please. :)

Although I'm not arguing for or against byte literals.  They do make
for a much terser form, but they're not strictly necessary.
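
A sketch of the "ascii and hex escapes only" rule, assuming a
present-day Python 3 interpreter, where byte literals ended up behaving
this way.  The helper name is_valid_source is hypothetical.

```python
def is_valid_source(source):
    """Return True if the expression compiles, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A hex escape inside a byte literal is accepted.
escaped_ok = is_valid_source('b"abc\\xa4def"')

# A literal Euro sign inside a byte literal is rejected at compile time.
euro_ok = is_valid_source('b"abc\u20acdef"')

print(escaped_ok, euro_ok)
```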

Adam Olsen, aka Rhamphoryncus

From nnorwitz at  Wed Feb 15 06:24:57 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 14 Feb 2006 21:24:57 -0800
Subject: [Python-Dev] 2.5 release schedule
Message-ID: <>

I was hoping to get a lot more feedback about PEP 356 and the 2.5
release schedule.

I updated the schedule; it is now:

    alpha 1: May 6, 2006 [planned]
    alpha 2: June 3, 2006 [planned]
    alpha 3: July 1, 2006 [planned]
    beta 1:  July 29, 2006 [planned]
    beta 2:  August 26, 2006 [planned]
    rc 1:    September 16, 2006 [planned]
    final:   September 30, 2006 [planned]

What do people think about that?  There are still a lot of features we
want to add.  Is this ok with everyone?  Do you think it's realistic?

We still need a release manager.  No one has heard from Anthony.  If
he isn't interested is someone else interested in trying their hand at
it?  There are many changes necessary in PEP 101 because, since the
last release, both python and pydotorg have transitioned from CVS to
SVN.  Creosote also moved.


From janssen at  Wed Feb 15 06:32:09 2006
From: janssen at (Bill Janssen)
Date: Tue, 14 Feb 2006 21:32:09 PST
Subject: [Python-Dev] how to upload new MacPython web page?
Message-ID: <06Feb14.213215pst."58633">

We (the pythonmac-sig mailing list) seem to have converged (almost --
still talking about the logo) on a new download page for MacPython, to
replace the page currently at  The strawman can be
seen at

How do I get the bits changed on (when we're finished)?


From brett at  Wed Feb 15 06:49:17 2006
From: brett at (Brett Cannon)
Date: Tue, 14 Feb 2006 21:49:17 -0800
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Neal Norwitz <nnorwitz at> wrote:
> I was hoping to get a lot more feedback about PEP 356 and the 2.5
> release schedule.
> I updated the schedule; it is now:
>     alpha 1: May 6, 2006 [planned]
>     alpha 2: June 3, 2006 [planned]
>     alpha 3: July 1, 2006 [planned]
>     beta 1:  July 29, 2006 [planned]
>     beta 2:  August 26, 2006 [planned]
>     rc 1:    September 16, 2006 [planned]
>     final:   September 30, 2006 [planned]
> What do people think about that?  There are still a lot of features we
> want to add.  Is this ok with everyone?  Do you think it's realistic?

Speaking as one of the people who has a PEP to implement, I am okay with it.


From nnorwitz at  Wed Feb 15 06:58:46 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 14 Feb 2006 21:58:46 -0800
Subject: [Python-Dev] 2.5 PEP
Message-ID: <>

Attached is the 2.5 release PEP 356.  It's also available from:

Does anyone have any comments?  Is this good or bad?  Feel free to
send me comments.

We need to ensure that PEPs 308, 328, and 343 are implemented.  We
have possible volunteers for 308 and 343, but not 328.  Brett is doing
352 and Martin is doing 353.

We also need to resolve a bunch of other implementation details about
providing the C AST to Python, bdist_* issues and a few more possible
stdlib modules.  Don't be shy, tell the world what you think about
these.

Can someone go through PEP 4 and 11 and determine what work needs to be done?

The more we distribute the work, the easier it will be on everyone. 
You don't really want to listen to me whine any more do you? ;-)

Thank you,
-------------- next part --------------
PEP: 356
Title: Python 2.5 Release Schedule
Version: $Revision: 42375 $
Author: Neal Norwitz, GvR
Status: Draft
Type: Informational
Created: 07-Feb-2006
Python-Version: 2.5


    This document describes the development and release schedule for
    Python 2.5.  The schedule primarily concerns itself with PEP-sized
    items.  Small features may be added up to and including the first
    beta release.  Bugs may be fixed until the final release.

    There will be at least two alpha releases, two beta releases, and
    one release candidate.  The release date is planned 30 September 2006.

Release Manager

    TBD (Anthony Baxter?)

    Martin von Loewis is building the Windows installers,
    Fred Drake the doc packages, and
    TBD (Sean Reifschneider?) the RPMs.

Release Schedule

    alpha 1: May 6, 2006 [planned]
    alpha 2: June 3, 2006 [planned]
    alpha 3: July 1, 2006 [planned]
    beta 1:  July 29, 2006 [planned]
    beta 2:  August 26, 2006 [planned]
    rc 1:    September 16, 2006 [planned]
    final:   September 30, 2006 [planned]

Completed features for 2.5

    PEP 309: Partial Function Application
    PEP 314: Metadata for Python Software Packages v1.1
        (should PEP 314 be marked final?)
    PEP 341: Unified try-except/try-finally to try-except-finally
    PEP 342: Coroutines via Enhanced Generators

    - AST-based compiler

    - Add support for reading shadow passwords (

    - any()/all() builtin truth functions

    - new hashlib module adds support for SHA-224, -256, -384, and -512
      (replaces old md5 and sha modules)

    - new cProfile module suitable for profiling long running applications
      with minimal overhead

Planned features for 2.5

    PEP 308: Conditional Expressions
    (Someone volunteered on python-dev, is there progress?)

    PEP 328: Absolute/Relative Imports
    (Needs volunteer, mail python-dev if interested)

    PEP 343: The "with" Statement
    (nn: I have a possible volunteer.)
        Note there are two separate implementation parts:
        interpreter changes and python code for utilities.

    PEP 352: Required Superclass for Exceptions
    (Brett Cannon is expected to implement this.)

    PEP 353: Using ssize_t as the index type
    MvL expects this to be complete in March.

    Access to C AST from Python

    Add bdist_msi to the distutils package.  (MvL wants one more
    independent release first.)

    Add bdist_deb to the distutils package?

    Add bdist_egg to the distutils package???

    Add setuptools to the standard library.

    Add wsgiref to the standard library.

    (GvR: I have a bunch more that could/would/should be added. -- Still true?)

Deferred until 2.6:

    - None

Open issues

    This PEP needs to be updated and release managers confirmed.

    - Review PEP  4: Deprecate and/or remove the modules
    - Review PEP 11: Remove support for platforms as described


    This document has been placed in the public domain.


From greg.ewing at  Wed Feb 15 07:44:12 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 19:44:12 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
	<> <>
Message-ID: <>

Ron Adam wrote:

> My first impression and thoughts were:  (and seems incorrect now)
>      bytes(object) ->  byte sequence of objects value
> Basically a "memory dump" of objects value.

As I understand the current intentions, this is correct.
The bytes constructor would have two different signatures:

    (1)   bytes(seq) --> interprets seq as a sequence of
                         integers in the range 0..255,
                         exception otherwise

    (2a)  bytes(str, encoding)     --> encodes the characters of
    (2b)  bytes(unicode, encoding)     the string using the specified
                                       encoding

In (2a) the string would be interpreted as containing
ascii characters, with an exception otherwise. In 3.0,
(2a) will disappear leaving only (1) and (2b).
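
A sketch of signatures (1) and (2b), assuming a present-day Python 3
interpreter, whose bytes constructor behaves this way.  The helper name
try_bytes is hypothetical.

```python
def try_bytes(*args):
    """Return the constructed bytes, or the name of the exception raised."""
    try:
        return bytes(*args)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__


# (1) a sequence of integers in the range 0..255
in_range = try_bytes([0, 65, 255])
out_of_range = try_bytes([0, 65, 256])

# (2b) a text string plus an explicit encoding
encoded = try_bytes("abc", "ascii")
not_ascii = try_bytes("L\u00f6wis", "ascii")

# A text string without an encoding is refused outright.
no_encoding = try_bytes("abc")

print(in_range, out_of_range, encoded, not_ascii, no_encoding)
```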

> And I was thinking a bytes argument of more than one item would indicate 
> a byte sequence.
>      bytes(1,2,3)  ->  bytes([1,2,3])

But then you have to test the argument in the one-argument
case and try to guess whether it should be interpreted as
a sequence or an integer. Best to avoid having to do that.

> Which is fine... so ???
>     b = bytes(0L) ->  bytes([0,0,0,0])

No, bytes(0L) --> TypeError because 0L doesn't implement
the iterator protocol or the buffer interface.

I suppose long integers might be enhanced to support the
buffer interface in 3.0, but that doesn't seem like a good
idea, because the bytes you got that way would depend on
the internal representation of long integers. In particular,


via the buffer interface would most likely *not* give you
bytes([0x12, 0x34, 0x56, 0x78]).

Maybe types should grow a __bytes__ method?
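
A sketch of both ideas, assuming a present-day Python 3 interpreter.
There, integers are converted with an explicit byte order instead of
exposing their internal representation, and a __bytes__ hook does
exist.  The four bytes 0x12, 0x34, 0x56, 0x78 are the big-endian form of
the integer 0x12345678.  The Version class is a hypothetical example
type.

```python
value = 0x12345678

# The caller chooses the byte order; nothing depends on how the
# interpreter stores integers internally.
big = value.to_bytes(4, "big")
little = value.to_bytes(4, "little")


class Version:
    """Hypothetical type that defines its own byte representation."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __bytes__(self):
        return bytes([self.major, self.minor])


packed = bytes(Version(2, 5))
print(big, little, packed)
```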


From greg.ewing at  Wed Feb 15 07:44:20 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 19:44:20 +1300
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Fred L. Drake, Jr. wrote:

> The proper response in this case is often to re-start decoding 
> with the correct encoding, since some of the data extracted so far may have 
> been decoded incorrectly.

If the protocol has been sensibly designed, that shouldn't
happen, since everything up to the coding marker should
be ascii (or some other protocol-defined initial coding).

For protocols that are not sensibly designed (or if you're
just trying to guess) what you suggest may be needed. But
it would be good to have a nicer way of going about it
for when the protocol is sensible.


From fdrake at  Wed Feb 15 08:12:37 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Wed, 15 Feb 2006 02:12:37 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Wednesday 15 February 2006 01:44, Greg Ewing wrote:
 > If the protocol has been sensibly designed, that shouldn't
 > happen, since everything up to the coding marker should
 > be ascii (or some other protocol-defined initial coding).


 > For protocols that are not sensibly designed (or if you're
 > just trying to guess) what you suggest may be needed. But
 > it would be good to have a nicer way of going about it
 > for when the protocol is sensible.

I agree in principle, but the example of using an HTML <meta> tag as a source 
of document encoding information isn't sensible.  Unfortunately, it's still 
part of the HTML specification.  :-(

I'm not opposing a way to do a sensible thing, but wanted to note that it 
wasn't going to be right for all cases, with such an example having been 
mentioned already (though the issues with it had not been fully spelled out).


Fred L. Drake, Jr.   <fdrake at>

From martin at  Wed Feb 15 09:03:49 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 15 Feb 2006 09:03:49 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>	
Message-ID: <>

Adam Olsen wrote:
> My assumption is these would become errors in 3.x.  bytes(str) is only
> needed so you can do bytes(u"abc".encode('utf-8')) and have it work in
> 2.x and 3.x.

I think the proposal for bytes(seq) to mean bytes(map(ord, seq))
was meant to be valid for both 2.x and 3.x, on the grounds that
you should be able to write byte string constants in the same
way in all versions.

> (I wonder if maybe they should be an error in 2.x as well.  Source
> encoding is for unicode literals, not str literals.)

Source encoding applies to the entire source code, including (byte)
string literals, comments, identifiers, and keywords. IOW, if you
declare your source encoding is utf-8, the keyword "print" must
be represented with the bytes that represent the Unicode letters
for "p","r","i","n", and "t" in UTF-8.


From martin at  Wed Feb 15 09:14:37 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 15 Feb 2006 09:14:37 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>	<>
	<>	<>
Message-ID: <>

Greg Ewing wrote:
> If the protocol has been sensibly designed, that shouldn't
> happen, since everything up to the coding marker should
> be ascii (or some other protocol-defined initial coding).

XML, for one protocol, requires you to restart over. The
initial sequence could be UTF-16, or it could be EBCDIC.
You read a few bytes (up to four), then know which of
these it is. Then you start over, reading further if
it looks like an ASCII superset, to find out the real
encoding. You normally then start over, although switching
at that point could also work.

> For protocols that are not sensibly designed (or if you're
> just trying to guess) what you suggest may be needed. But
> it would be good to have a nicer way of going about it
> for when the protocol is sensible.

There might be buffering of decoded strings already,
(ie. beyond the point to which you have read), so
you would need to unbuffer these, and reinterpret
them. To support that, you really need to buffer
both the original bytes, and the decoded ones, since
the encoding might not roundtrip.
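
A rough sketch of the "sniff, then start over" approach, assuming
Python 3.  It keeps the original bytes and always decodes from them, so
nothing depends on a decoded prefix round-tripping.  The function names
are hypothetical, the byte patterns cover only a few of the cases listed
in the XML specification's appendix on encoding autodetection, and
falling back to UTF-8 when no declaration is found is a simplification.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration that is
# readable as ASCII.
_DECLARATION = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_family(raw):
    """Guess a codec from the first four bytes, or None if ASCII-like."""
    head = raw[:4]
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"            # UTF-8 with a byte order mark
    if head[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return "utf-16"               # UTF-16 with a byte order mark
    if head == b"<\x00?\x00":
        return "utf-16-le"            # "<?" in little-endian UTF-16
    if head == b"\x00<\x00?":
        return "utf-16-be"            # "<?" in big-endian UTF-16
    if head == b"\x4c\x6f\xa7\x94":
        return "cp037"                # "<?xm" in one EBCDIC flavour
    return None


def decode_document(raw):
    """Decode the whole document from the original bytes."""
    family = sniff_family(raw)
    if family is not None:
        return raw.decode(family)
    # ASCII superset: read the declaration, then start over from byte 0.
    match = _DECLARATION.match(raw)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return raw.decode(encoding)
```

For example, a document declared as iso-8859-1 is first scanned as
bytes for its declaration and only then decoded in full.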


From martin at  Wed Feb 15 09:19:33 2006
From: martin at ("Martin v. Löwis")
Date: Wed, 15 Feb 2006 09:19:33 +0100
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:
> What do people think about that?  There are still a lot of features we
> want to add.  Is this ok with everyone?  Do you think it's realistic?

My view on schedules is that they need to exist, whether they are
followed or not. So having one is orders of magnitude better than
having none. This specific one "looks right" also.

> We still need a release manager.  No one has heard from Anthony.  If
> he isn't interested is someone else interested in trying their hand at
> it?

He might be on vacation, no need to worry yet. If he doesn't want to
do it, I would.


From alain.poirier at  Wed Feb 15 09:22:43 2006
From: alain.poirier at (Alain Poirier)
Date: Wed, 15 Feb 2006 09:22:43 +0100
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>


2 questions:

  - is (c)ElementTree still planned for inclusion ?
  - isn't the current implementation of itertools.tee (cache of previous
    generated values) incompatible with the new possibility to feed a
    generator (PEP 342) ?


Neal Norwitz wrote:
> Attached is the 2.5 release PEP 356.  It's also available from:
> Does anyone have any comments?  Is this good or bad?  Feel free to
> send me comments.
> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.
> We also need to resolve a bunch of other implementation details about
> providing the C AST to Python, bdist_* issues and a few more possible
> stdlib modules.  Don't be shy, tell the world what you think about
> these.
> Can someone go through PEP 4 and 11 and determine what work needs to be
> done?
> The more we distribute the work, the easier it will be on everyone.
> You don't really want to listen to me whine any more do you? ;-)
> Thank you,

From ncoghlan at  Wed Feb 15 09:33:54 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 18:33:54 +1000
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>	<dstlvb$6cb$>
Message-ID: <>

Bob Ippolito wrote:
> ** The exception is scripts.  Scripts go wherever --install-scripts=  
> point to, and AFAIK there is no means to ensure that the scripts from  
> one egg do not interfere with the scripts for another egg or anything  
> else on the PATH.  I'm also not sure what the uninstallation story  
> with scripts is.

Hopefully PEP 338 will go some way towards fixing that - in Python 2.5, the 
'-m' switch should be able to run modules inside eggs as scripts, reducing the 
need to install them directly into the filesystem.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From brett at  Wed Feb 15 09:34:35 2006
From: brett at (Brett Cannon)
Date: Wed, 15 Feb 2006 00:34:35 -0800
Subject: [Python-Dev] C AST to Python discussion
Message-ID: <>

As per Neal's prodding email, here is a thread to discuss where we
want to go with the C AST to Python stuff and what I think are the
core issues at the moment.

First issue is the ast-objects branch.  Work is being done on it, but
it still leaks some references (Neal or Martin can correct me if I am
wrong).  We really should choose either this branch or the current
solution before really diving into coding stuff for exposing the AST
so as to not waste too much time.  Basically the issues are that the
current solution will require using a serialization form to go from C
to Python and back again.  The PyObjects solution in the branch won't
need this.  Serialization protects us from ending up with an unusable AST,
since the original AST can be kept around, and if the version
passed back in from Python code is junk it can be tossed and the
original version used.  The PyObjects branch most likely won't have
this since the actual AST will most likely be passed to Python code.
But there are performance issues with all of this serialization compared
to a simple PyObject pointer into Pythonland.  Jeremy supports the
serialization option.  I am personally indifferent while leaning
towards the serialization.

Then there is the API.  First we need to decide if AST modification is
allowed or not.  It has been argued on my blog by someone (see for the
entry on this whole topic which highly mirrors this email) that Guido
won't okay AST transformations since it can lead to control flow
changes behind the scenes.  I say that is fine as long as it is
sufficiently obvious that AST transformations are occurring.  I say
allow transformations.

Once that is settled, I see three places for possible access to the
AST.  One is the command line like -m.  Totally obvious to the user as
long as they are not just working off of the .pyc files.  Next is
something like sys.ast_transformations that is a list of functions
that are passed in the AST (and return a new version if modifications
are allowed).  This could allow chaining of AST transformations by
feeding the output of one function to the next.  Next is per-object AST
access.  This could get expensive since if we don't keep a copy of the
AST with the code objects (which we probably shouldn't since that is
wasted memory if the AST is not used a lot) we will need to read the
code a second time to get the AST regenerated.

I personally think we should choose an initial global access API to
the AST as a starting API.  I like the sys.ast_transformations idea
since it is simple and gives enough access that whether read-only or
read-write is allowed something like PyChecker can get the access it
needs.  It also allows for simple Python scripts that can install the
desired functions and then compile or check the passed-in files. 
Obviously write access would be needed for optimization stuff (such
as if the peepholer was rewritten in Python and used by default), but
we can also expose this later if we want.
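
A rough sketch of how such a hook could behave, assuming Python 3.8 or
later and its ast module.  No sys.ast_transformations attribute exists;
the module-level list ast_transformations and the function
compile_with_transformations below are hypothetical stand-ins for it.

```python
import ast

# Hypothetical stand-in for the proposed sys.ast_transformations.
ast_transformations = []


def compile_with_transformations(source, filename="<example>"):
    """Parse, run every registered function in order, then compile."""
    tree = ast.parse(source, filename)
    for transform in ast_transformations:
        tree = transform(tree)  # each function receives the previous result
    ast.fix_missing_locations(tree)
    return compile(tree, filename, "exec")


seen_function_names = []


def record_function_names(tree):
    """Read-only use, in the spirit of a checker: inspect, return as-is."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            seen_function_names.append(node.name)
    return tree


class _FoldAddition(ast.NodeTransformer):
    """Replace 'int constant + int constant' with the computed constant."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        left, right = node.left, node.right
        if (isinstance(node.op, ast.Add)
                and isinstance(left, ast.Constant)
                and isinstance(right, ast.Constant)
                and isinstance(left.value, int)
                and isinstance(right.value, int)):
            folded = ast.Constant(value=left.value + right.value)
            return ast.copy_location(folded, node)
        return node


def fold_constant_addition(tree):
    """Read-write use, in the spirit of an optimizer."""
    return _FoldAddition().visit(tree)


ast_transformations.append(record_function_names)
ast_transformations.append(fold_constant_addition)

namespace = {}
exec(compile_with_transformations("def f():\n    return 1 + 2\n"), namespace)
print(namespace["f"](), seen_function_names)
```

The same list serves both the read-only and the read-write case; the
only difference is whether a function returns the tree unchanged.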

In terms of 2.5, I think we really need to settle on the fate of the
ast-objects branch.  If we can get the very basic API for exposing the
AST to Python code in 2.5 that would be great, but I don't view that
as being as critical as settling on the final AST implementation style, since
wasting work on a version that will disappear would just plain suck. 
It would be great to resolve this before the PyCon sprints since a
good chunk of the AST-caring folk will be there for at least part of
the time.


From rhamph at  Wed Feb 15 09:39:10 2006
From: rhamph at (Adam Olsen)
Date: Wed, 15 Feb 2006 01:39:10 -0700
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, "Martin v. Löwis" <martin at> wrote:
> Adam Olsen wrote:
> > (I wonder if maybe they should be an error in 2.x as well.  Source
> > encoding is for unicode literals, not str literals.)
> Source encoding applies to the entire source code, including (byte)
> string literals, comments, identifiers, and keywords. IOW, if you
> declare your source encoding is utf-8, the keyword "print" must
> be represented with the bytes that represent the Unicode letters
> for "p","r","i","n", and "t" in UTF-8.

Although it does apply to the entire source file, I think this is more
for convenience (try telling an editor that only a single line is
Shift_JIS!) than to allow 8-bit (or 16-bit?!) str literals.  Indeed,
you could have arbitrary 8-bit str literals long before the source
encoding was added.  Keywords and identifiers continue to be limited
to ascii characters (even if they make a roundtrip through other
encodings), and comments continue to be ignored.

Source encoding exists so that you can write u"123" with the encoding
stated once at the top of the file, rather than "123".decode('utf-8')
with the encoding repeated everywhere.

Making it an error to have 8-bit str literals in 2.x would help
educate the user that they will change behavior in 3.0 and not be
8-bit str literals anymore.

Adam Olsen, aka Rhamphoryncus

From stephen at  Wed Feb 15 09:45:23 2006
From: stephen at (Stephen J. Turnbull)
Date: Wed, 15 Feb 2006 17:45:23 +0900
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <> (M.'s message of "Tue, 14 Feb
	2006 17:47:39 +0100")
References: <>
Message-ID: <>

>>>>> "M" == "M.-A. Lemburg" <mal at> writes:

    M> James Y Knight wrote:

    >> Nice and simple.

    M> Albeit, too simple.

    M> The above approach would basically remove the possibility to
    M> easily create bytes() from literals in Py3k, since literals in
    M> Py3k create Unicode objects, e.g. bytes("123") would not work
    M> in Py3k.

No, it just rules out a builtin easy way to create bytes() from

But who needs to do that?  codec writers and people implementing wire
protocols with bytes() that look like character strings but aren't.
OK, so this makes life hard on codec writers.  But those implementing
wire protocols can use existing codecs, presumably 'ascii' will do 99%
of the time:

def make_wire_token (unicode_string, encoding='ascii'):
    return bytes(unicode_string.encode(encoding))

Everybody else is just asking for trouble by using bytes() for
character strings.  It would really be desirable to have "string" be a
Unicode literal in Py3k, and u"string" a syntax error.

    M> To prevent [people from learning to write "bytes('string')" in
    M> 2.x and expecting that to work in Py3k], you'd have to outrule
    M> bytes() construction from strings altogether, which doesn't
    M> look like a viable option either.

Why not?  Either bytes() are the same as strings, in which case why
change the name? or they're not, in which case we ask people to jump
through the required hoops to create them.  Maybe I'm missing some
huge use case, of course, but it looks to me like the use cases are
pretty specialized, and are likely to involve explicit coding anyway.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From ncoghlan at  Wed Feb 15 09:48:27 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 18:48:27 +1000
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
 to	PyNumberMethods
In-Reply-To: <dsu7t9$m9c$>
References: <dsu7t9$m9c$>
Message-ID: <>

Travis E. Oliphant wrote:
>     3) A new C-API function PyNumber_Index will be added with signature
>        Py_ssize_t PyNumber_index (PyObject *obj)

There's a typo in the function name here. Other than that, the PEP looks 
pretty much fine to me.

About the only other quibble is that it could arguably do with a link to the 
thread where we discussed (and discarded) 'discrete' and 'ordinal' as 
alternative names (you mention the discussion, but don't give a reference).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From t-meyer at  Wed Feb 15 09:48:43 2006
From: t-meyer at (Tony Meyer)
Date: Wed, 15 Feb 2006 21:48:43 +1300
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

> We still need a release manager.  No one has heard from Anthony.

It is the peak of the summer down here.  Perhaps he is lucky enough  
to be enjoying it away from computers for a while?


From just at  Wed Feb 15 09:51:44 2006
From: just at (Just van Rossum)
Date: Wed, 15 Feb 2006 09:51:44 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
Message-ID: <r01050400-1039-4C033D2A9E0011DA8736001124365170@[]>

Guido van Rossum wrote:

> If bytes support the buffer interface, we get another interesting
> issue -- regular expressions over bytes. Brr.

We already have that:

  >>> import re, array
  >>> re.search('\2', array.array('B', [1, 2, 3, 4])).group()
  array('B', [2])

Not sure whether to blame array or re, though...
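
For comparison, a sketch of regular expressions over bytes, assuming a
present-day Python 3 interpreter, where the re module accepts bytes
patterns against bytes-like objects.

```python
import re

data = bytes([1, 2, 3, 4])

# A bytes pattern searches a bytes object directly.
match = re.search(b"\x02", data)
print(match.group())

# Other objects supporting the buffer interface, such as bytearray,
# can be searched too.
span = re.search(b"\x03", bytearray(data)).span()
print(span)

# Mixing a text pattern with bytes data is refused.
try:
    re.search("\x02", data)
    mixed_allowed = True
except TypeError:
    mixed_allowed = False
```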


From greg.ewing at  Wed Feb 15 09:43:57 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 15 Feb 2006 21:43:57 +1300
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

Brett Cannon wrote:
> One protects us from ending up with an unusable AST since
> the serialization can keep the original AST around and if the version
> passed back in from Python code is junk it can be tossed and the
> original version used.

I don't understand why this is an issue. If Python code
produces junk and tries to use it as an AST, then it's
buggy and deserves what it gets. All the AST compiler
should be responsible for is to try not to crash the
interpreter under those conditions. But that's true
whatever method is used for passing ASTs from Python
to the compiler.

> The PyObjects branch most likely won't have
> this since the actual AST will most likely be passed to Python code.
> But there are performance issues with all of this serialization compared
> to a simple PyObject pointer into Pythonland.  Jeremy supports the
> serialization option.  I am personally indifferent while leaning
> towards the serialization.
> Then there is the API.  First we need to decide if AST modification is
> allowed or not.  It has been argued on my blog by someone (see
> for the
> entry on this whole topic which highly mirrors this email) that Guido
> won't okay AST transformations since it can lead to control flow
> changes behind the scenes.  I say that is fine as long as it is
> sufficiently obvious that AST transformations are occurring.  I say
> allow transformations.
> Once that is settled, I see three places for possible access to the
> AST.  One is the command line like -m.  Totally obvious to the user as
> long as they are not just working off of the .pyc files.  Next is
> something like sys.ast_transformations that is a list of functions
> that are passed in the AST (and return a new version if modifications
> are allowed).  This could allow chaining of AST transformations by
> feeding the output of one function to the next.  Next is per-object AST
> access.  This could get expensive since if we don't keep a copy of the
> AST with the code objects (which we probably shouldn't since that is
> wasted memory if the AST is not used a lot) we will need to read the
> code a second time to get the AST regenerated.
> I personally think we should choose an initial global access API to
> the AST as a starting API.  I like the sys.ast_transformations idea
> since it is simple and gives enough access that whether read-only or
> read-write is allowed something like PyChecker can get the access it
> needs.  It also allows for simple Python scripts that can install the
> desired functions and then compile or check the passed-in files. 
> Obviously write access would be needed for optimization stuff (such
> as if the peepholer was rewritten in Python and used by default), but
> we can also expose this later if we want.
> In terms of 2.5, I think we really need to settle on the fate of the
> ast-objects branch.  If we can get the very basic API for exposing the
> AST to Python code in 2.5 that would be great, but I don't view that
> as critical as choosing on the final AST implementation style since
> wasting work on a version that will disappear would just plain suck. 
> It would be great to resolve this before the PyCon sprints since a
> good chunk of the AST-caring folk will be there for at least part of
> the time.
> -Brett

From ncoghlan at  Wed Feb 15 10:01:21 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 19:01:21 +1000
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Bob Ippolito wrote:
> On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:
>> (Why would you even think about views here? They are evil.)
> I mention views because that's what numpy/Numeric/numarray/etc.  
> do...  It's certainly convenient at times to have that functionality,  
> for example, to work with only the alpha channel in an RGBA image.   
> Probably too magical for the bytes type.

The key difference between numpy arrays and normal sequences is that the 
length of a sequence can change, but the shape of a numpy array is essentially 

So view behaviour can be reserved for a dimensioned array type (if the numpy 
folks ever find the time to finish writing their PEP. . .)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From thomas at  Wed Feb 15 10:22:30 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 10:22:30 +0100
Subject: [Python-Dev] how to upload new MacPython web page?
In-Reply-To: <06Feb14.213215pst."58633">
References: <06Feb14.213215pst."58633">
Message-ID: <>

On Tue, Feb 14, 2006 at 09:32:09PM -0800, Bill Janssen wrote:
> We (the pythonmac-sig mailing list) seem to have converged (almost --
> still talking about the logo) on a new download page for MacPython, to
> replace the page currently at
>  The strawman can be
> seen at
> How do I get the bits changed on (when we're finished)?

pydotorg at is probably the right email address (although most of
them are on here as well.)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From ncoghlan at  Wed Feb 15 10:28:36 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 19:28:36 +1000
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote:
> Brett Cannon wrote:
>> One protects us from ending up with an unusable AST since
>> the serialization can keep the original AST around and if the version
>> passed back in from Python code is junk it can be tossed and the
>> original version used.
> I don't understand why this is an issue. If Python code
> produces junk and tries to use it as an AST, then it's
> buggy and deserves what it gets. All the AST compiler
> should be responsible for is to try not to crash the
> interpreter under those conditions. But that's true
> whatever method is used for passing ASTs from Python
> to the compiler.

I'd prefer the AST nodes be real Python objects. The arena approach seems to be 
working reasonably well, but I still don't see a good reason for using a 
specialised memory allocation scheme when it really isn't necessary and we 
have a perfectly good memory management system for PyObjects.

On the 'unusable AST' front, if AST transformation code creates illegal 
output, then the main thing is to raise an exception complaining about what's 
wrong with it. I believe that may need a change to the compiler whether the 
modified AST was serialised or not.

In terms of reverting back to the untransformed AST if the transformation 
fails, then that option is up to the code doing the transformation. Instead of 
serialising all the time (even for cases where the AST is just being inspected 
instead of transformed), we can either let the AST objects support the 
copy/deepcopy protocol, or else provide a method to clone a tree before trying 
to transform it.
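
As a rough illustration of the clone-before-transform idea, here is a minimal
sketch. It assumes the `ast` module and `copy.deepcopy` support found in later
Python 3 releases, not the ast-objects branch under discussion here; the
`NegateConstants` transformer and the `transform_or_fallback` helper are
hypothetical names invented for the example.

```python
import ast
import copy


class NegateConstants(ast.NodeTransformer):
    """Hypothetical transform: replace every integer constant n with -n."""

    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=-node.value), node)
        return node


def transform_or_fallback(tree, transformer):
    """Run the transformer on a deep copy; return the original tree on failure."""
    candidate = copy.deepcopy(tree)
    try:
        candidate = transformer.visit(candidate)
        ast.fix_missing_locations(candidate)
        # compile() rejects malformed trees with an exception.
        compile(candidate, "<transformed>", "exec")
    except (TypeError, ValueError, SyntaxError):
        return tree
    return candidate


original = ast.parse("x = 2 + 3")
result = transform_or_fallback(original, NegateConstants())

namespace = {}
exec(compile(result, "<transformed>", "exec"), namespace)
print(namespace["x"])
```

Because the transformer only ever touches the copy, the decision to fall back
stays with the calling code, and plain inspection of a tree needs no copy at
all.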

A unified representation means we only have one API to learn, that is 
accessible from both Python and C. It also eliminates any need to either 
implement features twice (once in Python and once in C) or else let the Python 
and C APIs diverge to the point where what you can do with one differs from 
what you can do with the other.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From thomas at  Wed Feb 15 10:52:27 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 10:52:27 +0100
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006 at 09:58:46PM -0800, Neal Norwitz wrote:

> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.

I can volunteer for 328 if no one else wants it, I've messed with the import
mechanism before (and besides, it's fun.) I've also written an unfinished
308 implementation to get myself acquainted with the AST code more.
'Unfinished' means that it works completely, except for some cases of
ambiguous syntax. I can fix that in a few days if the deadline nears and
there's no working patch.

(Naively adding if/else expressions broke list comprehensions with an 'if'
clause, and fixing that broke list comprehensions with 'for x in lambda:0,
lambda:1', and fixing that broke list comprehensions altogether... I added
"clean up Grammar file" to the PyCon core sprint topics for that reason. I
guess 308 wasn't as much a trainer implementation as people thought ;) The
syntax part of 328 is probably easier (but the rest isn't.)

>     Access to C AST from Python

If this still needs work when I finish grokking the AST code and the PyObj
branch of it, I can help.

I should have more than enough spare time to finish these things before
alpha 1.

Thomas Wouters <thomas at>


From thomas at  Wed Feb 15 11:03:09 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 11:03:09 +0100
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Feb 15, 2006 at 07:28:36PM +1000, Nick Coghlan wrote:

> On the 'unusable AST' front, if AST transformation code creates illegal
> output, then the main thing is to raise an exception complaining about
> what's wrong with it. I believe that may need a change to the compiler
> whether the modified AST was serialised or not.

I would personally prefer the AST validation to be a separate part of the
compiler. It means the one or the other can be out of sync, but it also
means it can be accessed directly (validating AST before sending it to the
compiler) and the compiler (or CFG generator, or something between AST and
CFG) can decide not to validate internally generated AST for non-debug
builds, for instance.

I like both those reasons.

Thomas Wouters <thomas at>


From stephen at  Wed Feb 15 11:06:21 2006
From: stephen at (Stephen J. Turnbull)
Date: Wed, 15 Feb 2006 19:06:21 +0900
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <> (Fred L. Drake, Jr.'s
	message of "Tue, 14 Feb 2006 23:23:45 -0500")
References: <>
	<> <>
Message-ID: <>

>>>>> "Fred" == Fred L Drake, <fdrake at> writes:

    Fred> On Tuesday 14 February 2006 22:34, Greg Ewing wrote:

    >> Seems to me this is a case where you want to be able to change
    >> encodings in the middle of reading the stream.  You start off
    >> reading the data as ascii, and once you've figured out the
    >> encoding, you switch to that and carry on reading.

    Fred> Not quite.  The proper response in this case is often to
    Fred> re-start decoding with the correct encoding, since some of
    Fred> the data extracted so far may have been decoded incorrectly.
    Fred> A very carefully constructed application may be able to go
    Fred> back and re-decode any data saved from the stream with the
    Fred> previous encoding, but that seems like it would be pretty
    Fred> fragile in practice.

I believe GNU Emacs is currently doing this.  AIUI, they save
annotations where the codec is known to be non-invertible (eg, two
charset-changing escape sequences in a row).  I do think this is
fragile, and a robust application really should buffer everything it's
not sure of decoding correctly.
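
The "buffer everything and decode again from the start" approach might look
roughly like the sketch below. The `# coding: ...` declaration format, the
200-byte sniffing window and the function name are all assumptions made for
the example; real formats signal their encoding in many different ways.

```python
import re

# Hypothetical declaration format: an ASCII line such as "# coding: latin-1".
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def decode_with_declared_encoding(raw, default="ascii"):
    """Keep the raw bytes, sniff the declared encoding, then decode once."""
    match = CODING_RE.search(raw[:200])  # look only near the start
    encoding = match.group(1).decode("ascii") if match else default
    # Decode the whole buffer from scratch instead of switching mid-stream.
    return raw.decode(encoding)


raw = "# coding: latin-1\nname = 'café'\n".encode("latin-1")
text = decode_with_declared_encoding(raw)
print(text.splitlines()[1])
```

Nothing decoded provisionally is kept, so there is no earlier, possibly wrong
result to repair afterwards.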

    Fred> There may be cases where switching encoding on the fly makes
    Fred> sense, but I'm not aware of any actual examples of where
    Fred> that approach would be required.

This is exactly what ISO 2022 formalizes: switching encodings on the

mboxes of Japanese mail often contain random and unsignaled encoding

A terminal emulator may need to switch when logging in to a remote

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From ncoghlan at  Wed Feb 15 11:09:33 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 20:09:33 +1000
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>
Message-ID: <>

Guido van Rossum wrote:
> But somehow I still like the 'open' verb. It has a long and rich
> tradition. And it also nicely conveys that it is a factory function
> which may return objects of different types (though similar in API)
> based upon either additional arguments (e.g. buffering) or the
> environment (e.g. encodings) or even inspection of the file being
> opened.

If we went with longer names, a slight variation on the opentext/openbinary 
idea would be to use opentext and opendata.

That is, "give me something that looks like a text file (it contains 
characters)", or "give me something that looks like a data file (it contains 

"opentext" would map to "" (that is, accepting an encoding argument)

"opendata" would map to the standard "open", but with the 'b' in the mode 
string added automatically.

So the mode choices common to both would be:

   'r'/'w'/'a'   - read/write/append (default 'r')
   ''/'+'        - update (IOError if file does not already exist) (default '')

opentext would allow the additional option:
   ''/'U'        - universal newlines (default '')

Neither of them would accept a 'b' in the mode string.
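
A minimal sketch of how such a pair could behave, written against the built-in
`open()` of Python 3. The function bodies, the UTF-8 default and the
`_check_mode` helper are assumptions for illustration only, and the 'U' flag
is left out because recent Python 3 releases no longer accept it.

```python
import os
import tempfile


def _check_mode(mode):
    """Reject 'b' (and anything else outside r/w/a plus an optional '+')."""
    if mode not in ("r", "w", "a", "r+", "w+", "a+"):
        raise ValueError("invalid mode: %r" % (mode,))


def opentext(path, mode="r", encoding="utf-8"):
    """Hypothetical: always returns a file object that yields characters."""
    _check_mode(mode)
    return open(path, mode, encoding=encoding)


def opendata(path, mode="r"):
    """Hypothetical: always returns a file object that yields bytes."""
    _check_mode(mode)
    return open(path, mode + "b")  # the 'b' is added automatically


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with opentext(path, "w") as f:
    f.write("héllo\n")
with opendata(path) as f:
    data = f.read()
print(data)
```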


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tim at  Wed Feb 15 11:11:49 2006
From: tim at (Tim Parkin)
Date: Wed, 15 Feb 2006 10:11:49 +0000
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

Guido van Rossum wrote:

> (Now that I work for Google I realize more than ever before the
> importance of keeping URLs stable; PageRank(tm) numbers don't get
> transferred as quickly as contents. I have this worry too in the
> context of the redesign; 301 permanent redirect is *not*
> going to help PageRank of the new page.)
Hi Guido,

Could you expand on why 301 redirects won't help with the transfer of
page rank (if you're allowed)? We've done exactly this on many sites and
the pagerank (or more relevantly the search rankings on specific terms)
has transferred almost overnight. The bigger pagerank updates (both
algorithm changes and overhauls in approach) seem to only happen every
few months and these also seem to take notice of 301 redirects (they
generally clear up any supplemental results).

The addition of the was also intended (I thought) to be
used in the google customised search (the google page you go to when you
search from I'm not sure if that got lost in implementation
but the idea was that the google box would have a radio button for

I agree that should only be the current documentation
however what about the large number of people who use 2.3 as standard?
perhaps the makes sense.

In terms of pagerank for the different versions of the docs, would it
make sense to 'hide' the older versions of the docs with a noindex so
that general google searches will only return the current docs.

<aside> Google seems to have a policy of ranking 'long standing' links
with a higher pagerank weighting (hence older versions of python docs
ranking higher). Hence keeping a single 'current' set of docs and having
all inbound links pointing to them (e.g. will gradually
build up the search ranking.</aside>

+1 on only containing current (with the caveat that
there be an equivalent for users of specific versions, e.g. 2.3 users)

Tim Parkin

p.s. All my knowledge of how google works is gained through personal
research so the terminology, techniques and results may be completely
wrong (and also may vary from time to time) - however they do reflect
direct experience.

p.p.s regarding 'site:', 'allinurl:' and other google modifiers; It
would seem a good idea to create a single page that helped site users
make such searches without having to learn how the modifiers work.

It should perhaps be noted that you can also add 'temporary redirects'
(302s), which are taken by google to mean "leave the original search
results in place". This has also worked for us (old urls remain the same
as far as google is concerned).

From rrr at  Wed Feb 15 11:24:38 2006
From: rrr at (Ron Adam)
Date: Wed, 15 Feb 2006 04:24:38 -0600
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Greg Ewing wrote:
> Ron Adam wrote:
>> My first impression and thoughts were:  (and seems incorrect now)
>>      bytes(object) ->  byte sequence of objects value
>> Basically a "memory dump" of objects value.
> As I understand the current intentions, this is correct.
> The bytes constructor would have two different signatures:
>     (1)   bytes(seq) --> interprets seq as a sequence of
>                          integers in the range 0..255,
>                          exception otherwise
>     (2a)  bytes(str, encoding)     --> encodes the characters of
>     (2b)  bytes(unicode, encoding)     the string using the specified
>                                        encoding
> In (2a) the string would be interpreted as containing
> ascii characters, with an exception otherwise. In 3.0,
> (2a) will disappear leaving only (1) and (2b).

I was presuming it would be done in C code and it will just need a 
pointer to the first byte, memchr(), and then read n bytes directly into 
a new memory range via  memcpy(). But I don't know if that's possible 
with Python's object model.  (My C skills are a bit rusty as well)

However, if it's done with a Python iterator and then each item is 
translated to bytes in a sequence, (much slower), an encoding will need 
to be known for it to work correctly.  Unfortunately a Unicode string 
doesn't set an attribute to indicate its own encoding. So bytes() can't 
just do encoding = s.encoding to find out, it would need to be specified 
in this case.
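
For comparison, the bytes constructor that later shipped in Python 3 behaves
much like signatures (1) and (2b) quoted above. The sketch below assumes a
Python 3 interpreter; it shows that released behaviour, not the 2006 draft.

```python
# Signature (1): a sequence of integers in range(256).
b1 = bytes([72, 105])

# Signature (2b): a text string plus an explicit encoding.
b2 = bytes("Hi", "ascii")

# Out-of-range integers are rejected.
try:
    bytes([256])
except ValueError as exc:
    range_error = exc

# A text string without an encoding is rejected too.
try:
    bytes("Hi")
except TypeError as exc:
    missing_encoding = exc

print(b1, b2)
```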

And that should give you a byte object that is equivalent to the bytes 
in memory, providing Python doesn't compress data internally to save 
space. (?, I don't think it does)

I'd prefer the first version *if possible* because of the performance.

>> And I was thinking a bytes argument of more than one item would indicate 
>> a byte sequence.
>>      bytes(1,2,3)  ->  bytes([1,2,3])
> But then you have to test the argument in the one-argument
> case and try to guess whether it should be interpreted as
> a sequence or an integer. Best to avoid having to do that.

Yes, I agree.

>> Which is fine... so ???
>>     b = bytes(0L) ->  bytes([0,0,0,0])
> No, bytes(0L) --> TypeError because 0L doesn't implement
> the iterator protocol or the buffer interface.

It wouldn't need it if it was a direct C memory copy.

> I suppose long integers might be enhanced to support the
> buffer interface in 3.0, but that doesn't seem like a good
> idea, because the bytes you got that way would depend on
> the internal representation of long integers. In particular,

Since some longs will be of different length, yes a bytes(0L) could give 
differing results on different platforms, but it will always give the 
same result on the platform it is run on. I actually think this is a 
plus and not a problem. If you are using Python to implement a byte 
interface you need to *know* it is different, not have it hidden.

     bytesize = len(bytes(0L))  # find how long a long is

   Ronald Adam

From ncoghlan at  Wed Feb 15 11:29:45 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 15 Feb 2006 20:29:45 +1000
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Thomas Wouters wrote:
> On Wed, Feb 15, 2006 at 07:28:36PM +1000, Nick Coghlan wrote:
>> On the 'unusable AST' front, if AST transformation code creates illegal
>> output, then the main thing is to raise an exception complaining about
>> what's wrong with it. I believe that may need a change to the compiler
>> whether the modified AST was serialised or not.
> I would personally prefer the AST validation to be a separate part of the
> compiler. It means the one or the other can be out of sync, but it also
> means it can be accessed directly (validating AST before sending it to the
> compiler) and the compiler (or CFG generator, or something between AST and
> CFG) can decide not to validate internally generated AST for non-debug
> builds, for instance.
> I like both those reasons.

Aye, I was thinking much the same thing.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From thomas at  Wed Feb 15 11:37:46 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 11:37:46 +0100
Subject: [Python-Dev] Generalizing *args and **kwargs
Message-ID: <>

I've been thinking about generalization of the *args/**kwargs syntax for
quite a while, and even though I'm pretty sure Guido (and many people) will
consider it overgeneralization, I am finally going to suggest it. This whole
idea is not something dear to my heart, although I obviously would like to
see it happen. If the general vote is 'no', I'll write a small PEP or add it
to PEP 13 and be done with it.

The grand total of the generalization would be something like this:

Allow 'unpacking' of arbitrary iterables in sequences:
>>> iterable = (1, 2)
>>> ['a', 'b', *iterable, 'c']
['a', 'b', 1, 2, 'c']
>>> ('a', 'b', *iterable, 'c')
('a', 'b', 1, 2, 'c')

Possibly also allow 'unpacking' in list comprehensions and genexps:
>>> [ *subseq for subseq in [(1, 2), (3, 4)] ]
[1, 2, 3, 4]

(You can already do this by adding an extra 'for' loop inside the LC)

Allow 'unpacking' of mapping types (anything supporting 'items' or
'iteritems') in dictionaries:
>>> args = {'verbose': 1}
>>> defaults = {'verbose': 0}
>>> {**defaults, **args, 'fixedopt': 1}
{'verbose': 1, 'fixedopt': 1}

Allow 'packing' in assignment, stuffing left-over items in a list.
>>> a, b, *rest = range(5)
>>> a, b, rest
(0, 1, [2, 3, 4])
>>> a, b, *rest = range(2)
(0, 1, [])

(A list because you can't always take the type of the RHS and it's the right
Python type for 'an arbitrary length homogeneous sequence'.)

While generalizing that, it may also make sense to allow:

>>> def spam(*args, **kwargs):
...     return args, kwargs
>>> args = (1, 2); kwargs = {'eggs': 'no'}
>>> spam(*args, 3)
((1, 2, 3), {})
>>> spam(*args, 3, **kwargs, spam='extra', eggs='yes')
((1, 2, 3), {'spam': 'extra', 'eggs': 'yes'})

(In spite of the fact that both are already possible by fiddling args/kwargs
beforehand or doing '*(args + (3,))'.)

Maybe it also makes sense on the defining side, particularly for keyword
arguments to indicate 'keyword-only arguments'. Maybe with a '**' without a
name attached:

>>> def spam(pos1, pos2, **, kwarg1=.., kwarg2=..)

But I dunno yet.
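
Most of these forms can be tried directly in later interpreters. The sketch
below assumes Python 3.5 or newer, which is an assumption about the reader's
interpreter and not something this proposal specifies. The comprehension form
is left out, and keyword-only parameters are written with a bare `*`, the
spelling Python 3 adopted instead of the bare `**` floated above. The
function `eggs` is a hypothetical name.

```python
iterable = (1, 2)
as_list = ['a', 'b', *iterable, 'c']    # unpacking inside a list display
as_tuple = ('a', 'b', *iterable, 'c')   # and inside a tuple display

defaults = {'verbose': 0}
args = {'verbose': 1}
merged = {**defaults, **args, 'fixedopt': 1}  # later entries win

a, b, *rest = range(5)                  # left-over items land in a list


def spam(*args, **kwargs):
    return args, kwargs


def eggs(pos1, pos2, *, kwarg1=None, kwarg2=None):
    """The bare '*' marks the parameters after it as keyword-only."""
    return pos1, pos2, kwarg1, kwarg2


call = spam(*iterable, 3, **{'eggs': 'no'}, extra='yes')
print(as_list, merged, rest, call)
```

One difference from the examples above: in Python 3 a keyword supplied both
through `**kwargs` and explicitly in the same call raises TypeError, so the
sketch avoids repeating a keyword.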

Although I've made it look like I have a working implementation, I haven't.
I know exactly how to do it, though, except for the AST part ;) Once I
figure out how to properly work with the AST code I'll probably write this
patch whether it's a definite 'no' or not, just to see if I can. I wouldn't
mind if people gave their opinion, though.

Thomas Wouters <thomas at>


From rhamph at  Wed Feb 15 12:08:46 2006
From: rhamph at (Adam Olsen)
Date: Wed, 15 Feb 2006 04:08:46 -0700
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<> <>
	<> <>
Message-ID: <>

On 2/15/06, Ron Adam <rrr at> wrote:
> Greg Ewing wrote:
> > Ron Adam wrote:
> >>     b = bytes(0L) ->  bytes([0,0,0,0])
> >
> > No, bytes(0L) --> TypeError because 0L doesn't implement
> > the iterator protocol or the buffer interface.
> It wouldn't need it if it was a direct C memory copy.
> > I suppose long integers might be enhanced to support the
> > buffer interface in 3.0, but that doesn't seem like a good
> > idea, because the bytes you got that way would depend on
> > the internal representation of long integers. In particular,
> Since some longs will be of different length, yes a bytes(0L) could give
> differing results on different platforms, but it will always give the
> same result on the platform it is run on. I actually think this is a
> plus and not a problem. If you are using Python to implement a byte
> interface you need to *know* it is different, not have it hidden.
>      bytesize = len(bytes(0L))  # find how long a long is

I believe you're confusing a C long with a Python long.  A Python long
is implemented as an array and has variable size.

In any case we already have the struct module:

>>> import struct
>>> struct.calcsize('l')

Adam Olsen, aka Rhamphoryncus

From simon at  Wed Feb 15 23:07:34 2006
From: simon at (Simon Burton)
Date: Wed, 15 Feb 2006 22:07:34 +0000
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 15 Feb 2006 00:34:35 -0800
Brett Cannon <brett at> wrote:

> As per Neal's prodding email, here is a thread to discuss where we
> want to go with the C AST to Python stuff and what I think are the
> core issues at the moment.
> First issue is the ast-objects branch.  Work is being done on it, but
> it still leaks some references (Neal or Martin can correct me if I am
> wrong).  

I've been doing the heavy lifting on ast-objects the last few weeks.
Today it finally passed the python test suite. The last thing to do is
the addition of XDECREF's, so yes, it is leaking a lot of references.

I won't make it to PyCon (it's a long way for me to come), but gee I've left
all the fun stuff for you to do !

Even if AST transforms are not allowed, I see it as the strongest form of
code reflection, and long over-due in python.


Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Ph. 61 02 6249 6940 

From smiles at  Wed Feb 15 12:54:44 2006
From: smiles at (Smith)
Date: Wed, 15 Feb 2006 05:54:44 -0600
Subject: [Python-Dev] nice()
References: <>
Message-ID: <000e01c63226$fc342660$7c2c4fca@csmith>

I am reluctantly posting here since this is of less intense interest than other things being discussed right now, but this is related to the areclose proposal that was discussed here recently. 

The following discussion ends with things that python-dev might want to consider in terms of adding a function that allows something other than the default 12- and 17-digit precision representations of numbers that str() and repr() give. Such a function (like nice(), perhaps named trim()?) would provide a way to convert fp numbers that are being used in comparisons into a precision that reflects the user's preference.  

Everyone knows that fp numbers must be compared with caution, but there is a void in the relative-error department for exercising such caution, thus the proposal for something like 'areclose'. The problem with areclose(), however, is that it only solves one part of the problem that needs to be solved if two fp's *are* going to be compared: if you are going to check if a < b you would need to do something like 

    not areclose(a,b) and a < b

With something like trim() (a.k.a nice()) you could do

    trim(a) < trim(b)

to get the comparison to 12-digit default precision or arbitrary precision with optional arguments, e.g. to 3 digits of precision:

    trim(a,3) < trim(b,3)

From a search on the documentation, I don't see that the name trim() is taken yet.

OK, comments responding to Greg follow.

| From: Greg Ewing greg.ewing at
| Smith wrote:
|| computing the bin boundaries for a histogram
|| where bins are a width of 0.1:
||||| for i in range(20):
|| ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
|| ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
| I don't see how that has any relevance to the way bin boundaries
| would be used in practice, which is to say something like
|   i = int(value / 0.1)
|   bin[i] += 1 # modulo appropriate range checks

This is just masking the issue by converting numbers to integers. The fact remains that two mathematically equal numbers can have two different internal representations with one being slightly larger than the exact integer value and one smaller:

>>> a=(23*.1)*10;a
>>> b=2.3/.1;b
>>> int(a/.1),int(b/.1)
(230, 229)

Part of the answer in this context is to use round() rather than int so you are getting to the closest integer.
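
A small sketch of the difference, reusing the two values from the session
above and assuming Python 3, where round() returns an integer. The truncated
pair is printed without asserting its value; the session above reports
(230, 229) for it.

```python
a = (23 * .1) * 10
b = 2.3 / .1

truncated = (int(a / .1), int(b / .1))     # int() always drops the fraction
rounded = (round(a / .1), round(b / .1))   # round() goes to the nearest integer
print(truncated, rounded)
```

With round(), a value that sits a hair below an integer and one that sits a
hair above it end up in the same bin.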

|| For, say, garden variety numbers that aren't full of garbage digits
|| resulting from fp computation, the boundaries computed as 0.1*i are
|| not going to agree with such simple numbers as 1.4 and 0.7.
| Because the arithmetic is binary rather than decimal. But even using
| decimal, you get the same sort of problems using a bin width of
| 1.0/3.0. The solution is to use an algorithm that isn't sensitive
| to those problems, then it doesn't matter what base your arithmetic
| is done in.


|| I understand that the above really is just a patch over the problem,
|| but I'm wondering if it moves the problem far enough away that most
|| users wouldn't have to worry about it.
| No, it doesn't. The problems are not conveniently grouped together
| in some place you can get away from; they're scattered all over the
| place where you can stumble upon one at any time.

Yes, even a simple computation of the wrong type can lead to unexpected results. I agree.

|| So perhaps this brings us back to the original comment that "fp
|| issues are a learning opportunity." They are. The question I have is
|| "how soon do they need to run into them?" Is decreasing the likelihood
|| that they will see the problem (but not eliminate it) a good thing
|| for the python community or not?
| I don't think you're doing anyone any favours by trying to protect
| them from having to know about these things, because they *need* to
| know about them if they're not to write algorithms that seem to
| work fine on tests but mysteriously start producing garbage when
| run on real data, possibly without it even being obvious that it is
| garbage.

Mostly I agree, but if you go to the extreme then why don't we just drop floating point comparisons altogether and force the programmer to convert everything to integers and make their own bias evident (like converting to int rather than nearest int). Or we drop the fp comparison operators and introduce fp comparison functions that require the use of tolerance terms to again make the assumptions transparent: 

def lt(x, y, rel_err = 1e-5, abs_err = 1e-8):
    return not areclose(x,y,rel_err,abs_err) and int(x-y)<=0
print lt(a,b,0,1e-10) --> False (they are equal to that tolerance)
print lt(a,b,0,1e-20) --> True (a is less than b at that tolerance)
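
A runnable variant of the same idea, as a sketch only. It assumes Python 3.5
or newer and substitutes `math.isclose` for the proposed areclose(), whose
exact definition may differ; it also uses a plain `x < y` for the final test
where the snippet above uses `int(x-y)<=0`.

```python
import math


def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    """True only if x is below y by more than the stated tolerance."""
    return not math.isclose(x, y, rel_tol=rel_err, abs_tol=abs_err) and x < y


print(lt(1.0, 1.0 + 1e-12))   # within tolerance, so treated as equal
print(lt(1.0, 1.1))
print(lt(1.1, 1.0))
```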

The fact is, we make things easier and let the programmer shoot themselves in the foot if they want to by providing things like fp comparisons and even functions like sum that do dumb-sums (though Raymond Hettinger's Python Recipe at ASPN provides a smart-sum).

I think the biggest argument for something like nice() is that it fills the void for a simple way to round numbers to a relative error rather than an absolute error. round() handles absolute error--it rounds to a given precision. str() rounds to the 12th digit and repr() to the 17th digit. There is nothing else except build-your-own solutions to rounding to an arbitrary significant figure. nice() would fill that niche and provide the default 12 significant digit solution. 
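
Rounding to a number of significant figures can be sketched with the `%g`
format, as below. The signature `nice(x, digits=12)` is an assumption based on
the description in this message, not an existing function.

```python
def nice(x, digits=12):
    """Hypothetical helper: round x to `digits` significant figures."""
    return float("%.*g" % (digits, x))


print(nice(0.1 * 3) == nice(0.3))
print(nice(1234.5678, 3))
```

Unlike round(x, n), which fixes the number of decimal places, the precision
here scales with the magnitude of x.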

I agree that making all float comparisons default to 12-digit precision would not be smart. That would be throwing away 5 digits that someone might really want. Providing a simple way to specify the desired significance is something that is needed, especially since fp issues are such a thorny issue. The user that explicitly uses nice(x)<nice(y) is being rewarded at the moment by getting a result that they expect, e.g. 


and also getting a subtle reminder that their result is only true at the default (12th digit) precision level.


From fuzzyman at  Wed Feb 15 13:19:02 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 15 Feb 2006 12:19:02 +0000
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>
Message-ID: <>

Adam Olsen wrote:
> On 2/14/06, Just van Rossum <just at> wrote:
>> +1 for two functions.
>> My choice would be open() for binary and opentext() for text. I don't
>> find that backwards at all: the text function is going to be more
>> different from the current open() function than the binary function
>> would be since in many ways the str type is closer to bytes than to
>> unicode.
>> Maybe it's even better to use opentext() AND openbinary(), and deprecate
>> plain open(). We could even introduce them at the same time as bytes()
>> (and leave the open() deprecation for 3.0).
> Thus providing us with a transition period, even with warnings on use
> of the old function.

I personally like the move towards all unicode strings, basically any 
text where you don't know the encoding used is 'random binary data'. 
This works fine, so long as you are in control of the text source. 
*However*, it leaves the following problem :

The current situation (treating byte-sequences as text and assuming they 
are an ascii-superset encoded text-string) *works* (albeit with many 
breakages), simply because this assumption is usually correct.

Forcing the programmer to be aware of encodings, also pushes the same 
requirement onto the user (who is often the source of the text in question).

Currently you can read a text file and process it - making sure that any 
changes/requirements only use ascii characters. It therefore doesn't 
matter what 8 bit ascii-superset encoding is used in the original. If 
you force the programmer to specify the encoding in order to read the 
file, they would have to pass that requirement onto their user. Their 
user is even less likely to be encoding aware than the programmer.

What this means, is that for simple programs where the programmer 
doesn't want to have to worry about encoding, or can't force the user to 
be aware, they will read in the file as bytes. Modules will quickly and 
inevitably be created implementing all the 'string methods' for bytes. 
New programmers will gravitate to these and the old mess will continue, 
but with a more awkward hybrid than before. (String manipulations of 
byte sequences will no longer be a core part of the language - and so be 
harder to use.)

Not sure what we can do to obviate this of course... but is this change 
actually going to improve the situation or make it worse ?

All the best,

Michael Foord

From raymond.hettinger at  Wed Feb 15 13:45:32 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 15 Feb 2006 07:45:32 -0500
Subject: [Python-Dev] nice()
References: <>
Message-ID: <001401c6322d$b5c017a0$b83efea9@RaymondLaptop1>

> The following discussion ends with things that python-dev might want to 
> consider in terms of adding a function that allows something other than the 
> default 12- and 17-digit precision representations of numbers that str() and 
> repr() give. Such a function (like nice(), perhaps named trim()?) would 
> provide a way to convert fp numbers that are being used in comparisons into a 
> precision that reflects the user's preference.

-1  See posts by Greg, Terry, and myself which recommend against trim(), nice(), 
or other variants.  For the purpose of precision sensitive comparisons, these 
constructs are unfit for their intended purpose -- they are error-prone and do 
not belong in Python.  They may have some legitimate uses, but those tend to be 
dominated by the existing round() function.

If anything, then some variant of is_close() can go in the math module.  BUT, 
the justification should not be for newbies to ignore issues with floating-point 
equality comparisons.  The justification would have to be that folks with some 
numerical sophistication have a recurring need for the function (with 
sophistication meaning that they know how to come up with relative and absolute 
tolerances that make their application succeed over the full domain of possible 


---- relevant posts from Greg and Terry ----

[Greg Ewing]
>> I don't think you're doing anyone any favours by trying to protect
>> them from having to know about these things, because they *need* to
>> know about them if they're not to write algorithms that seem to
>> work fine on tests but mysteriously start producing garbage when
>> run on real data,

[Terry Reedy]
> I agree.  Here was my 'kick-in-the-butt' lesson (from 20+ years ago):  the
> 'simplified for computation' formula for standard deviation, found in too
> many statistics books without a warning as to its danger, and specialized
> for three data points, is sqrt( ((a*a+b*b+c*c)-(a+b+c)**2/3.0) /2.0).
> After 1000s of ok calculations, the data were something like a,b,c =
> 10005,10006,10007.  The correct answer is 1.0 but with numbers rounded to 7
> digits, the computed answer is sqrt(-.5) == CRASH.  I was aware that
> subtraction lost precision but not how rounding could make a theoretically
> guaranteed non-negative difference negative.
> Of course, Python floats being C doubles makes such glitches much rarer.
> Not exposing C floats is a major newbie (and journeyman) protection
> feature.
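
Terry's anecdote can be reproduced by limiting arithmetic to 7 significant
digits with the decimal module. This is a minimal sketch, not code from the
thread: the names naive_variance and two_pass_variance are made up for
illustration, and the two-pass formula is one common remedy, not something
Terry proposed.

```python
from decimal import Decimal, localcontext


def naive_variance(values, digits=7):
    """Sample variance via (sum(x*x) - sum(x)**2 / n) / (n - 1), with every
    intermediate result rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        xs = [Decimal(v) for v in values]
        n = len(xs)
        sum_of_squares = sum((x * x for x in xs), Decimal(0))
        total = sum(xs, Decimal(0))
        return (sum_of_squares - total * total / n) / (n - 1)


def two_pass_variance(values, digits=7):
    """Sample variance computed from deviations about the mean, at the same
    limited precision."""
    with localcontext() as ctx:
        ctx.prec = digits
        xs = [Decimal(v) for v in values]
        n = len(xs)
        mean = sum(xs, Decimal(0)) / n
        return sum(((x - mean) ** 2 for x in xs), Decimal(0)) / (n - 1)


data = (10005, 10006, 10007)
print(naive_variance(data))     # negative at 7 digits, so sqrt() would fail
print(two_pass_variance(data))  # 1, the correct sample variance
```

The one-pass formula subtracts two nearly equal 9-digit quantities after each
has already been rounded to 7 digits, so the rounding error is larger than the
true difference.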

I recommend rejecting trim(), nice(), areclose(), and all variants.

Greg, Terry, and myself have

> OK, comments responding to Greg follow.
> | From: Greg Ewing greg.ewing at
> | Smith wrote:
> |
> || computing the bin boundaries for a histogram
> || where bins are a width of 0.1:
> ||
> || >>> for i in range(20):
> || ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
> || ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.
> |
> | I don't see how that has any relevance to the way bin boundaries
> | would be used in practice, which is to say something like
> |
> |   i = int(value / 0.1)
> |   bin[i] += 1 # modulo appropriate range checks
> This is just masking the issue by converting numbers to integers. The fact 
> remains that two mathematically equal numbers can have two different internal 
> representations with one being slightly larger than the exact integer value 
> and one smaller:
> >>> a=(23*.1)*10;a
> 23.000000000000004
> >>> b=2.3/.1;b
> 22.999999999999996
> >>> int(a/.1),int(b/.1)
> (230, 229)
> Part of the answer in this context is to use round() rather than int so you 
> are getting to the closest integer.
> || For, say, garden variety numbers that aren't full of garbage digits
> || resulting from fp computation, the boundaries computed as 0.1*i are
> || not going to agree with such simple numbers as 1.4 and 0.7.
> |
> | Because the arithmetic is binary rather than decimal. But even using
> | decimal, you get the same sort of problems using a bin width of
> | 1.0/3.0. The solution is to use an algorithm that isn't sensitive
> | to those problems, then it doesn't matter what base your arithmetic
> | is done in.
> Agreed.
> |
> || I understand that the above really is just a patch over the problem,
> || but I'm wondering if it moves the problem far enough away that most
> || users wouldn't have to worry about it.
> |
> | No, it doesn't. The problems are not conveniently grouped together
> | in some place you can get away from; they're scattered all over the
> | place where you can stumble upon one at any time.
> |
> Yes, even a simple computation of the wrong type can lead to unexpected 
> results. I agree.
> || So perhaps this brings us back to the original comment that "fp
> || issues are a learning opportunity." They are. The question I have is
> || "how soon do they need to run into them?" Is decreasing the likelihood
> || that they will see the problem (but not eliminate it) a good thing
> || for the python community or not?
> |
> | I don't think you're doing anyone any favours by trying to protect
> | them from having to know about these things, because they *need* to
> | know about them if they're not to write algorithms that seem to
> | work fine on tests but mysteriously start producing garbage when
> | run on real data, possibly without it even being obvious that it is
> | garbage.
> Mostly I agree, but if you go to the extreme then why don't we just drop 
> floating point comparisons altogether and force the programmer to convert 
> everything to integers and make their own bias evident (like converting to int 
> rather than nearest int). Or we drop the fp comparison operators and introduce 
> fp comparison functions that require the use of tolerance terms to again make 
> the assumptions transparent:
> def lt(x, y, rel_err = 1e-5, abs_err = 1e-8):
>    return not areclose(x,y,rel_err,abs_err) and int(x-y)<=0
> print lt(a,b,0,1e-10) --> False (they are equal to that tolerance)
> print lt(a,b,0,1e-20) --> True (a is less than b at that tolerance)
> The fact is, we make things easier and let the programmer shoot themselves in 
> the foot if they want to by providing things like fp comparisons and even 
> functions like sum that do dumb-sums (though Raymond Hettinger's Python Recipe 
> at ASPN provides a smart-sum).
> I think the biggest argument for something like nice() is that it fills the 
> void for a simple way to round numbers to a relative error rather than an 
> absolute error. round() handles absolute error--it rounds to a given 
> precision. str() rounds to the 12th digit and repr() to the 17th digit. There 
> is nothing else except build-your-own solutions to rounding to an arbitrary 
> significant figure. nice() would fill that niche and provide the default 12 
> significant digit solution.
> I agree that making all float comparisons default to 12-digit precision would 
> not be smart. That would be throwing away 5 digits that someone might really 
> want. Providing a simple way to specify the desired significance is something 
> that is needed, especially since fp issues are such a thorny issue. The user 
> that explicitly uses nice(x)<nice(y) is being rewarded at the moment by 
> getting a result that they expect, e.g.
>    nice(2.3/.1)==nice((23*.1)*10)
> and also getting a subtle reminder that their result is only true at the 
> default (12th digit) precision level.
> /c
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe: 
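
The quoted message describes nice() as rounding to a number of significant
digits and suggests round() instead of int() for bin indices. The sketch below
shows both behaviours in modern Python. The body of nice() is an assumption
(the thread does not show the actual implementation); it only illustrates the
idea and does not answer the objections raised above.

```python
def nice(x, digits=12):
    """Hypothetical helper: round x to `digits` significant figures."""
    return float(f"{x:.{digits}g}")


a = (23 * .1) * 10   # slightly above 23
b = 2.3 / .1         # slightly below 23

print(a == b)                        # False
print(nice(a) == nice(b))            # True
print(int(a / .1), int(b / .1))      # truncation: 230 229
print(round(a / .1), round(b / .1))  # nearest integer: 230 230
```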

From gustavo at  Wed Feb 15 13:35:52 2006
From: gustavo at (Gustavo Niemeyer)
Date: Wed, 15 Feb 2006 10:35:52 -0200
Subject: [Python-Dev] Generalizing *args and **kwargs
In-Reply-To: <>
References: <>
Message-ID: <20060215123552.GA8946@localhost.localdomain>

> I've been thinking about generalization of the *args/**kwargs syntax for
> quite a while, and even though I'm pretty sure Guido (and many people) will
> consider it overgeneralization, I am finally going to suggest it. This whole
> idea is not something dear to my heart, although I obviously would like to
> see it happen. If the general vote is 'no', I'll write a small PEP or add it
> to PEP 13 and be done with it.

A PEP would be great, even if not accepted. At least we'll have it discussed
in a single place and avoid rediscussing it every time someone figures
out it's a nice idea. Have a look for the subject "Extending tuple unpacking"
in the mailing list for a recent discussion on the topic.

Gustavo Niemeyer

From lists at  Wed Feb 15 13:49:04 2006
From: lists at (Jan Claeys)
Date: Wed, 15 Feb 2006 13:49:04 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$>  <>
Message-ID: <1140007745.13739.7.camel@localhost.localdomain>

On Wed, 15-02-2006 at 14:00 +1300, Greg Ewing wrote:
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number. 

Those directories might be mounted on entirely different hardware (even
over a network), often with different characteristics (access speed,
writeability, etc.).

Jan Claeys

From tim at  Wed Feb 15 16:14:41 2006
From: tim at (Tim Parkin)
Date: Wed, 15 Feb 2006 15:14:41 +0000
Subject: [Python-Dev] how to upload new MacPython web page?
In-Reply-To: <>
References: <06Feb14.213215pst."58633">
Message-ID: <>

Thomas Wouters wrote:

>On Tue, Feb 14, 2006 at 09:32:09PM -0800, Bill Janssen wrote:
>>We (the pythonmac-sig mailing list) seem to have converged (almost --
>>still talking about the logo) on a new download page for MacPython, to
>>replace the page currently at
>>  The strawman can be
>>seen at
>>How do I get the bits changed on (when we're finished)?
>pydotorg at is probably the right email address (although most of
>them are on here as well.)
I'm happy to upload the pages when you're ready.


From jeremy at  Wed Feb 15 16:29:38 2006
From: jeremy at (Jeremy Hylton)
Date: Wed, 15 Feb 2006 10:29:38 -0500
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

I am still -1 on the ast-objects branch.  It adds a lot of boilerplate
code and it makes complicated what is now simple.  I'll see if I can
get a rough cut of the marshal code ready today, so there will be a
complete implementation of my original plan.

I also think we should keep the transformation api simple.  If we
provide an extension module, along the lines of the parser module,
users can write transformations with that module.  They can also write
their own wrapper script that runs a script after applying

I agree that the question of saved bytecode files still needs to be
resolved.  I'm not sure that extending the bytecode format to record
modifications is enough, since you also have a filename problem:  How
do you manage two versions of a module, one compiled with
transformation and one compiled without?

How about we arrange for some open space time at PyCon to discuss? 
Unfortunately, the compiler talk isn't until the last day and I can't
stay for sprints.  It would be better to have the talk, then the open
space, then the sprint.


On 2/15/06, Simon Burton <simon at> wrote:
> On Wed, 15 Feb 2006 00:34:35 -0800
> Brett Cannon <brett at> wrote:
> > As per Neal's prodding email, here is a thread to discuss where we
> > want to go with the C AST to Python stuff and what I think are the
> > core issues at the moment.
> >
> > First issue is the ast-objects branch.  Work is being done on it, but
> > it still leaks some references (Neal or Martin can correct me if I am
> > wrong).
> I've been doing the heavy lifting on ast-objects the last few weeks.
> Today it finally passed the python test suite. The last thing to do is
> the addition of XDECREF's, so yes, it is leaking a lot of references.
> I won't make it to PyCon (it's a long way for me to come), but gee I've left
> all the fun stuff for you to do !
> :)
> Even if AST transforms are not allowed, I see it as the strongest form of
> code reflection, and long over-due in python.
> Simon.
> --
> Simon Burton, B.Sc.
> Licensed PO Box 8066
> ANU Canberra 2601
> Australia
> Ph. 61 02 6249 6940

From aahz at  Wed Feb 15 16:47:08 2006
From: aahz at (Aahz)
Date: Wed, 15 Feb 2006 07:47:08 -0800
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006, Thomas Wouters wrote:
> I can volunteer for 328 if no one else wants it, I've messed with the import
> mechanism before (and besides, it's fun.) I've also written an unfinished
> 308 implementation to get myself acquainted with the AST code more.
> 'Unfinished' means that it works completely, except for some cases of
> ambiguous syntax. I can fix that in a few days if the deadline nears and
> there's no working patch.

If you want to also take over the PEP328 editing, please be my guest.  I
keep making time for it that gets overridden by other things.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From foom at  Wed Feb 15 17:48:18 2006
From: foom at (James Y Knight)
Date: Wed, 15 Feb 2006 11:48:18 -0500
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>
Message-ID: <>

On Feb 15, 2006, at 7:19 AM, Fuzzyman wrote:
> [snip..]
> I personally like the move towards all unicode strings, basically  
> any text where you don't know the encoding used is 'random binary  
> data'. This works fine, so long as you are in control of the text  
> source. *However*, it leaves the following problem :
> The current situation (treating byte-sequences as text and assuming  
> they are an ascii-superset encoded text-string) *works* (albeit  
> with many breakages), simply because this assumption is usually  
> correct.
> Forcing the programmer to be aware of encodings, also pushes the  
> same requirement onto the user (who is often the source of the text  
> in question).
> Currently you can read a text file and process it - making sure  
> that any changes/requirements only use ascii characters. It  
> therefore doesn't matter what 8 bit ascii-superset encoding is used  
> in the original. If you force the programmer to specify the  
> encoding in order to read the file, they would have to pass that  
> requirement onto their user. Their user is even less likely to be  
> encoding aware than the programmer.

Or the programmer can just use "iso-8859-1" and call it done. That  
will get you the same "I don't care" behavior as now.
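
A minimal sketch of why this works, written for Python 3 as it was later
released: ISO-8859-1 maps each of the 256 byte values to exactly one code
point, so decoding and re-encoding is lossless whatever the real encoding was.
The sample text is made up for illustration.

```python
# Every possible byte value survives a decode/encode round trip.
raw = bytes(range(256))
assert raw.decode("iso-8859-1").encode("iso-8859-1") == raw

# An ASCII-only edit on data whose real encoding (UTF-8 here) is never named.
original = "na\u00efve caf\u00e9 = 1".encode("utf-8")
edited = (original.decode("iso-8859-1")
                  .replace("= 1", "= 2")
                  .encode("iso-8859-1"))
print(edited.decode("utf-8"))
```

The non-ASCII characters are wrong while the data is held as a string, so this
is only safe when the processing touches ASCII characters alone.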


From smiles at  Wed Feb 15 15:29:01 2006
From: smiles at (Smith)
Date: Wed, 15 Feb 2006 08:29:01 -0600
Subject: [Python-Dev] math.areclose ...?
References: <>
	<00dd01c63142$3dd61280$892c4fca@csmith> <>
Message-ID: <004d01c63251$4ea87340$452c4fca@csmith>

A problem that I pointed out with the proposed areclose() function is that it has within it a fp comparison. If such a function is to have greater utility, it should allow the user to specify how significant to consider the computed error. A natural extension of being able to tell if 2 fp numbers are close is to make a more general comparison. For that purpose, a proposed fpcmp function is appended. From that, fp boolean comparison operators (le, gt, ...) are easily constructed.

Python allows fp comparison. This is a significant source of surprises and learning experiences. Are any of these proposals of interest for providing tools to make fp comparisons more intelligently?

#new proposal for the areclose() function
def areclose(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return False if |x-y| is greater than atol and also greater
     than rtol times the absolute value of the larger of x and y,
     otherwise True. The comparison is made by computing a
     difference that should be 0 if the two numbers satisfy
     either condition; prec controls the precision of the
     value that is obtained, e.g. 8.3__e-17 is obtained
     for (2.1-2)-.1. But rounding to the 12th digit (the default
     precision) the value of 0.0 is returned indicating that for
     that precision there is no (significant) error."""
    diff = abs(x-y)
    # NOTE: the second operand of the "or" is missing from the archived
    # message; the relative test below is inferred from the docstring and
    # the otherwise unused rtol parameter.
    return round(diff-atol,prec)<=0 or \
           round(diff-rtol*max(abs(x),abs(y)),prec)<=0

#fp cmp
def fpcmp(x,y,atol=1e-8,rtol=1e-5,prec=12):
    """Return 0 if x and y are close in the absolute or 
    relative sense. If not, then return -1 if x < y or +1 if x > y.
    Note: prec controls how many digits of the error are retained
    when checking for closeness."""
    if areclose(x,y,atol,rtol,prec):
        return 0
    else:
        return cmp(x,y)

# fp comparisons functions
def lt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==-1
def le(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (-1,0)
def eq(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==0
def gt(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)==1
def ge(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec) in (0,1)
def ne(x,y,atol=1e-8,rtol=1e-5,prec=12):
    return fpcmp(x, y, atol, rtol, prec)<>0
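
For readers on current Python, the same idea can be sketched with
math.isclose(), which was added to the standard library in Python 3.5. This is
an illustration, not the proposal above: it drops the prec rounding step, and
the keyword names rel_tol and abs_tol follow math.isclose() instead of the
atol/rtol names used in the message.

```python
import math


def fpcmp(x, y, rel_tol=1e-5, abs_tol=1e-8):
    """Return 0 if x and y are close in the relative or absolute sense,
    otherwise -1 if x < y or +1 if x > y."""
    if math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
        return 0
    return -1 if x < y else 1


def lt(x, y, **tol):
    return fpcmp(x, y, **tol) < 0


def eq(x, y, **tol):
    return fpcmp(x, y, **tol) == 0


def gt(x, y, **tol):
    return fpcmp(x, y, **tol) > 0


a = (23 * .1) * 10
b = 2.3 / .1
print(a == b, eq(a, b))    # exact comparison vs. tolerant comparison
print(lt(1.0, 1.1), gt(1.0, 1.1))
```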

From guido at  Wed Feb 15 18:17:44 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 09:17:44 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Nick Coghlan <ncoghlan at> wrote:
> If we went with longer names, a slight variation on the opentext/openbinary
> idea would be to use opentext and opendata.

After some thinking I don't like opendata any more -- often data is
text, so the term is wrong. openbinary is fine but long. So how about
openbytes? This clearly links the resulting object with the bytes
type, which is mutually reassuring.

Regarding open vs. opentext, I'm still not sure. I don't want to
generalize from the openbytes precedent to openstr or openunicode
(especially since the former is wrong in 2.x and the latter is wrong
in 3.0). I'm tempted to hold out for open() since it's most
compatible.

--Guido van Rossum (home page:

From guido at  Wed Feb 15 18:25:59 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 09:25:59 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Fuzzyman <fuzzyman at> wrote:
>  Forcing the programmer to be aware of encodings, also pushes the same
> requirement onto the user (who is often the source of the text in question).

The programmer shouldn't have to be aware of encodings most of the
time -- it's the job of the I/O library to determine the end user's
(as opposed to the language's) default encoding dynamically and act
accordingly. Users who use non-ASCII characters without informing the
OS of their encoding are in a world of pain, *unless* they use the OS
default encoding (which may vary per locale). If the OS can figure out
the default encoding, so can the Python I/O library. Many apps won't
have to go beyond this at all.

Note that I don't want to use this OS/user default encoding as the
default encoding between bytes and strings; once you are reading bytes
you are writing "grown-up" code and you will have to be explicit. It's
only the I/O library that should automatically encode on write and
decode on read.
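
As a rough illustration of this split, here is how it looks in Python 3 as
later released: the I/O layer decodes on read and encodes on write, while
bytes are only seen when asked for explicitly. The sketch passes
encoding="utf-8" so the result does not depend on the machine; without that
argument, text-mode open() has historically fallen back to a locale-dependent
default, which locale.getpreferredencoding() reports.

```python
import locale
import os
import tempfile

# The user's/OS's preferred encoding; the value depends on the platform.
print(locale.getpreferredencoding(False))

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:   # encode on write
        f.write("caf\u00e9")
    with open(path, "rb") as f:                    # explicit bytes
        raw = f.read()
    with open(path, encoding="utf-8") as f:        # decode on read
        text = f.read()
finally:
    os.remove(path)

print(type(raw).__name__, type(text).__name__)
```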

>  Currently you can read a text file and process it - making sure that any
> changes/requirements only use ascii characters. It therefore doesn't matter
> what 8 bit ascii-superset encoding is used in the original. If you force the
> programmer to specify the encoding in order to read the file, they would
> have to pass that requirement onto their user. Their user is even less
> likely to be encoding aware than the programmer.

I disagree -- the user most likely has set or received a default
encoding when they first got the computer, and that's all they are
using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
out the encoding, so can Python's I/O library.

>  What this means, is that for simple programs where the programmer doesn't
> want to have to worry about encoding, or can't force the user to be aware,
> they will read in the file as bytes.

Of course not!

> Modules will quickly and inevitably be
> created implementing all the 'string methods' for bytes. New programmers
> will gravitate to these and the old mess will continue, but with a more
> awkward hybrid than before. (String manipulations of byte sequences will no
> longer be a core part of the language - and so be harder to use.)

This seems an unlikely development if we do the conversions in the I/O library.

>  Not sure what we can do to obviate this of course... but is this change
> actually going to improve the situation or make it worse ?

I'm not worried about this scenario. "What if all the programmers in
the world suddenly became dumb?"

--Guido van Rossum (home page:

From mal at  Wed Feb 15 18:29:14 2006
From: mal at (M.-A. Lemburg)
Date: Wed, 15 Feb 2006 18:29:14 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Nick Coghlan <ncoghlan at> wrote:
>> If we went with longer names, a slight variation on the opentext/openbinary
>> idea would be to use opentext and opendata.
> After some thinking I don't like opendata any more -- often data is
> text, so the term is wrong. openbinary is fine but long. So how about
> openbytes? This clearly links the resulting object with the bytes
> type, which is mutually reassuring.
> Regarding open vs. opentext, I'm still not sure. I don't want to
> generalize from the openbytes precedent to openstr or openunicode
> (especially since the former is wrong in 2.x and the latter is wrong
> in 3.0). I'm tempted to hold out for open() since it's most
> compatible.

Maybe a weird idea, but why not use static methods on the
bytes and str type objects for this ?!

E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
renamed to str.openfile())

After all, you are in a certain way constructing object
of the given types - only that the input to these
constructors happen to be files in the file system.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 15 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From barry at  Wed Feb 15 18:51:49 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 12:51:49 -0500
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:

> Regarding open vs. opentext, I'm still not sure. I don't want to
> generalize from the openbytes precedent to openstr or openunicode
> (especially since the former is wrong in 2.x and the latter is wrong
> in 3.0). I'm tempted to hold out for open() since it's most
> compatible.

If we go with two functions, I'd much rather hang them off of the file
type object than add two new builtins.  I really do think file.bytes()
and file.text() (a.k.a. open.bytes() and open.text()) is better than
opentext() or openbytes().



From barry at  Wed Feb 15 18:53:43 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 12:53:43 -0500
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:

> Maybe a weird idea, but why not use static methods on the
> bytes and str type objects for this ?!
> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> renamed to str.openfile())

That's also not a bad idea, but I'd leave off one or the other of the
redundant "open" and "file" parts.  E.g. and
seem fine to me (we all know what 'open' means, right? :).



From amk at  Wed Feb 15 19:57:40 2006
From: amk at (A.M. Kuchling)
Date: Wed, 15 Feb 2006 13:57:40 -0500
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006 at 10:29:38AM -0500, Jeremy Hylton wrote:
> Unfortunately, the compiler talk isn't until the last day and I can't
> stay for sprints.  It would be better to have the talk, then the open
> space, then the sprint.

If you mean "Implementation of the Python Bytecode Compiler", that's
on Saturday at 10:50, so you have a whole day in which to fit an open
space event.  Unfortunately there are already a lot of open space
events on that day, and the next open slot is at 3:15PM.  But if you
don't need a room to talk in, I'm sure you can find a comfortable
place for 5 or 6 people to chat.


From barry at  Wed Feb 15 19:02:13 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 13:02:13 -0500
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 00:34 -0800, Brett Cannon wrote:

> I personally think we should choose an initial global access API to
> the AST as a starting API.  I like the sys.ast_transformations idea
> since it is simple and gives enough access that whether read-only or
> read-write is allowed something like PyChecker can get the access it
> needs.

I haven't been following the AST stuff closely enough, but I'm not crazy
about putting access to this in the sys module.  It seems like it
clutters that up with a name that will be rarely used by the average
Python programmer.



From mal at  Wed Feb 15 19:02:58 2006
From: mal at (M.-A. Lemburg)
Date: Wed, 15 Feb 2006 19:02:58 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	
Message-ID: <>

Barry Warsaw wrote:
> On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
>> Maybe a weird idea, but why not use static methods on the
>> bytes and str type objects for this ?!
>> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
>> renamed to str.openfile())
> That's also not a bad idea, but I'd leave off one or the other of the
> redundant "open" and "file" parts.  E.g. and
> seem fine to me (we all know what 'open' means, right? :).

Thinking about it, I like your idea better (file.bytes()
and file.text()).

Anyway, as long as we don't start adding openthis() and openthat()
I guess I'm happy ;-)

Marc-Andre Lemburg


From barry at  Wed Feb 15 19:06:44 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 13:06:44 -0500
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, 2006-02-14 at 21:24 -0800, Neal Norwitz wrote:

> We still need a release manager.  No one has heard from Anthony.  If
> he isn't interested is someone else interested in trying their hand at
> it?  There are many changes necessary in PEP 101 because since the
> last release both python and pydotorg have transitioned from CVS to
> SVN.  Creosote also moved.

I would definitely like to see a PEP 101 update as part of the 2.5 RM's
responsibilities, and I think it could be done while spinning the first
alpha release.  I know others have volunteered, but in a pinch I'd be
happy to dust off my RM hat and help out too.



From barry at  Wed Feb 15 19:07:51 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 13:07:51 -0500
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 19:02 +0100, M.-A. Lemburg wrote:

> Anyway, as long as we don't start adding openthis() and openthat()
> I guess I'm happy ;-)

Me too! :)



From guido at  Wed Feb 15 19:29:30 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 10:29:30 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, M.-A. Lemburg <mal at> wrote:
> Barry Warsaw wrote:
> > On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
> >
> >> Maybe a weird idea, but why not use static methods on the
> >> bytes and str type objects for this ?!
> >>
> >> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> >> renamed to str.openfile())
> >
> > That's also not a bad idea, but I'd leave off one or the other of the
> > redundant "open" and "file" parts.  E.g. and
> > seem fine to me (we all know what 'open' means, right? :).
> Thinking about it, I like your idea better (file.bytes()
> and file.text()).

This is better than making it a static/class method on file (which has
the problem that it might return something that's not a file at all --
file is a particular stream implementation, there may be others) but I
don't like the tight coupling it creates between a data type and an
I/O library. I still think that having global (i.e. built-in) factory
functions for creating various stream types makes the most sense.
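
A small sketch of what such factory functions could look like, written as thin
wrappers over today's built-in open(). The names opentext and openbytes come
from the thread, but these bodies, the utf-8 default, and the count_lines
helper are assumptions made for illustration only.

```python
import io
import os
import tempfile


def opentext(path, mode="r", encoding="utf-8"):
    """Hypothetical factory for text streams (reads return str)."""
    return open(path, mode, encoding=encoding)


def openbytes(path, mode="r"):
    """Hypothetical factory for binary streams (reads return bytes)."""
    return open(path, mode + "b")


def count_lines(stream):
    """Works with any iterable stream, whether file-backed or not."""
    return sum(1 for _ in stream)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with openbytes(path, "w") as f:
        f.write(b"one\ntwo\n")
    with opentext(path) as f:
        n_text = count_lines(f)
    with openbytes(path) as f:
        n_bytes = count_lines(f)
finally:
    os.remove(path)

# An in-memory stream is "not a file at all" but offers the same interface.
n_memory = count_lines(io.StringIO("one\ntwo\n"))
print(n_text, n_bytes, n_memory)
```

Because the factories are plain functions, they are free to return whichever
stream implementation fits, which is the decoupling argued for above.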

--Guido van Rossum (home page:

From jimjjewett at  Wed Feb 15 19:38:41 2006
From: jimjjewett at (Jim Jewett)
Date: Wed, 15 Feb 2006 13:38:41 -0500
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/14/06, Neil Schemenauer wrote:
> People could spell it bytes(s.encode('latin-1'))

Guido wrote:
> At the cost of an extra copying step.

I asked:
> ... why not just add some smarts to the bytes constructor?

Guido wrote:

> ... the VM usually keeps an extra reference
> on the stack so the refcount is never 1. But
> you can't rely on that

I did miss this, but _PyString_Resize seems to
work around it, and I'm not sure that the bytes
object can't be just as intimate.

Even if that is insurmountable, bytes objects
could recognize two states -- one normal, and
one for "I'm delegating to a string, and have to
copy to my own buffer before I actually mutate."

Then a new bytes object would still need its
own header, but the data copying could often
be avoided.

But back to the possibility of not creating
even a new object header...
> the str's underlying array is allocated inline
> with the str header, this requires str and
> bytes to have the same object layout. But
> since bytes are mutable, they can't.

Looking at the arraymodule, the only extra
fields in an array are weakrefs, description
(which will no longer be needed) and tracking
for the indirection.  There are even a few extra
bytes leftover that could be used to indicate
that ob_item was redirected later, the way
tables do with small_table.


From janssen at  Wed Feb 15 19:59:44 2006
From: janssen at (Bill Janssen)
Date: Wed, 15 Feb 2006 10:59:44 PST
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: Your message of "Wed, 15 Feb 2006 09:51:49 PST."
Message-ID: <06Feb15.105950pst."58633">

> If we go with two functions, I'd much rather hang them off of the file
> type object then add two new builtins.  I really do think file.bytes()
> and file.text() (a.k.a. open.bytes() and open.text()) is better than
> opentext() or openbytes().


The default behavior of the current open() in opening files as text is
particularly grating.  This would make things much clearer.


From jason.orendorff at  Wed Feb 15 20:01:37 2006
From: jason.orendorff at (Jason Orendorff)
Date: Wed, 15 Feb 2006 14:01:37 -0500
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination
	with pep 349?]
Message-ID: <>

Instead of byte literals, how about a classmethod bytes.from_hex(), which
works like this:

  # two equivalent things
  expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
  expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
131, 79, 229, 201, 46, 106])

It's just a nicety; the former fits my brain a little better.  This would
work fine both in 2.5 and in 3.0.
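
A minimal sketch of the equivalence claimed above, assuming a modern Python 3, where the method exists under the spelling bytes.fromhex() (the from_hex spelling in this message is the proposal, not a shipped API):

```python
hex_digest = '5c535024cac5199153e3834fe5c92e6a'

# Build the same 16 bytes two ways: from hex digits and from ordinals.
from_hex = bytes.fromhex(hex_digest)
from_list = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
                   131, 79, 229, 201, 46, 106])

# Going back to hex digits is the inverse operation.
round_trip = from_hex.hex()
```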

I thought about unicode.encode('hex'), but obviously it will continue to
return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
'zip', 'uu', etc.) generally scare me.  And now that bytes and text are
going to be two very different types, they're even weirder than before.

  text.encode('utf-8') ==> bytes
  text.encode('rot13') ==> text
  bytes.encode('zip') ==> bytes
  bytes.encode('uu') ==> text (?)

This state of affairs seems kind of crazy to me.

Actually users trying to figure out Unicode would probably be better served
if bytes.encode() and text.decode() did not exist.


From martin at  Wed Feb 15 20:04:07 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 20:04:07 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>	
Message-ID: <>

Adam Olsen wrote:
> Making it an error to have 8-bit str literals in 2.x would help
> educate the user that they will change behavior in 3.0 and not be
> 8-bit str literals anymore.

You would like to ban string literals from the language? Remember:
all string literals are currently 8-bit (byte) strings.


From guido at  Wed Feb 15 20:16:51 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 11:16:51 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Jason Orendorff <jason.orendorff at> wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
>    # two equivalent things
>    expected_md5_hash =
> bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>    expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
> 131, 79, 229, 201, 46, 106])
>  It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.

Yes, this looks nice.

>  I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me.  And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
>    text.encode('utf-8') ==> bytes
>    text.encode('rot13') ==> text
>    bytes.encode('zip') ==> bytes
>    bytes.encode('uu') ==> text (?)
>  This state of affairs seems kind of crazy to me.
>  Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

Yeah, the pseudogeneralizations seem to be a mistake -- they are
almost universally frowned upon. I'll happily send them to their
grave in Py3k.

It would be better if the signature of text.encode() always returned a
bytes object. But why deny the bytes object a decode() method if text
objects have an encode() method?

I'd say there are two "symmetric" API flavors possible (t and b are
text and bytes objects, respectively, where text is a string type,
either str or unicode; enc is an encoding name):

- b.decode(enc) -> t; t.encode(enc) -> b
- b = bytes(t, enc); t = text(b, enc)

I'm not sure why one flavor would be preferred over the other,
although having both would probably be a mistake.
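
Both flavors can be sketched side by side, assuming a modern Python 3, where the text type is spelled str (so text(b, enc) above becomes str(b, enc)). The variable names are only illustrative.

```python
t = "caf\u00e9"   # a text object containing one non-ASCII character
enc = "utf-8"

# Flavor 1: methods on the objects.
b = t.encode(enc)
t_from_method = b.decode(enc)

# Flavor 2: constructors that take an encoding name.
b_from_ctor = bytes(t, enc)
t_from_ctor = str(b_from_ctor, enc)
```

In this sketch the two flavors produce identical results; the choice between them is purely one of API style.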

--Guido van Rossum (home page:

From trentm at  Wed Feb 15 20:18:56 2006
From: trentm at (Trent Mick)
Date: Wed, 15 Feb 2006 11:18:56 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

[Bob Ippolito wrote]
> >    /Library/Frameworks/Python.framework/...
> >    /Applications/MacPython-2.4/...  # just MacPython does this
> ActivePython doesn't install app bundles for IDLE or anything?

It does, but puts them under here instead:

> >Also, a receipt of the installation ends up here:
> >
> >    /Library/Receipts/$package_name/...
> >
> >though Apple does not provide tools for uninstallation using those
> >receipts.
> That stuff is really behind the scenes stuff that's wholly managed by  
> and is pretty much irrelevant.


> Single apps are better than OK.  Download them by whatever means you  
> want, put them wherever you want, and run them.  You can run any well- 
> behaved application from a DMG (or a CD, or a USB key, or any other  
> readable media).

For naive or new-to-mac users it is a confusing process to get the .app
bundle to an appropriate place and then start running it. Why else have
various app distributors out there come up with myriad slick background
images for their DMG's trying to instruct users what to do with the
icons in the mounted DMG's Finder window?

On Windows you download an MSI (it ends up in your browser downloads
folder), it starts the installation, and at the end of the installation it
starts the app for you. The app is nicely in Program Files. No need to
eject something. No need to find somewhere to drag the icon.

I'll grant that having the whole thing in one bundle is cool/handy/cute.

...anyway this is getting seriously OT for python-dev. :)


Trent Mick
TrentM at

From theller at  Wed Feb 15 20:21:03 2006
From: theller at (Thomas Heller)
Date: Wed, 15 Feb 2006 20:21:03 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
 coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <dsvuuq$i71$>

Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

I hope this will also be equivalent:
>   expected_md5_hash = bytes.from_hex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')
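
As a hedged illustration, assuming a modern Python 3 and its bytes.fromhex() spelling (not the from_hex name proposed here), whitespace between byte pairs is indeed accepted:

```python
compact = bytes.fromhex('5c535024cac5199153e3834fe5c92e6a')
spaced = bytes.fromhex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')

# Both spellings describe the same 16 bytes.
same_value = (compact == spaced)
```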


From jcarlson at  Wed Feb 15 20:25:01 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 15 Feb 2006 11:25:01 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <> <>
Message-ID: <>

Ron Adam <rrr at> wrote:
> Greg Ewing wrote:
> > Ron Adam wrote:
> >>     b = bytes(0L) ->  bytes([0,0,0,0])
> > 
> > No, bytes(0L) --> TypeError because 0L doesn't implement
> > the iterator protocol or the buffer interface.
> It wouldn't need it if it was a direct C memory copy.

Yes it would.  Python long integers are stored as arrays of signed
16-bit short ints.  See longintrepr.h from the source.
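
To make the point concrete: because the in-memory digits are an implementation detail, turning an integer into bytes needs an explicit width and byte order. A minimal sketch, assuming the int.to_bytes() method of a modern Python 3 (which did not exist when this was written):

```python
# The caller chooses the width and byte order; nothing is copied
# straight out of the interpreter's internal representation.
zero_as_four_bytes = (0).to_bytes(4, "little")

value = 0x01020304
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

# The conversion is reversible when the same byte order is used.
restored = int.from_bytes(little, "little")
```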

 - Josiah

From bob at  Wed Feb 15 20:23:22 2006
From: bob at (Bob Ippolito)
Date: Wed, 15 Feb 2006 11:23:22 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <1140007745.13739.7.camel@localhost.localdomain>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On Feb 15, 2006, at 4:49 AM, Jan Claeys wrote:

> Op wo, 15-02-2006 te 14:00 +1300, schreef Greg Ewing:
>> I'm disappointed that the various Linux distributions
>> still don't seem to have caught onto the very simple
>> idea of *not* scattering files all over the place when
>> installing something.
>> MacOSX seems to be the only system so far that has got
>> this right -- organising the system so that everything
>> related to a given application or library can be kept
>> under a single directory, clearly labelled with a
>> version number.
> Those directories might be mounted on entirely different hardware  
> (even
> over a network), often with different characteristics (access speed,
> writeability, etc.).

Huh?  What does that have to do with anything?  I've never seen a  
system where /usr/include, /usr/lib, /usr/bin, etc. are not all on  
the same mount.  It's not really any different with OS X either.


From martin at  Wed Feb 15 20:24:17 2006
From: martin at (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 20:24:17 +0100
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

Alain Poirier wrote:
>   - is (c)ElementTree still planned for inclusion ?

It is included already.


From martin at  Wed Feb 15 20:26:24 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 20:26:24 +0100
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Thomas Wouters wrote:
> I would personally prefer the AST validation to be a separate part of the
> compiler. It means the one or the other can be out of sync, but it also
> means it can be accessed directly (validating AST before sending it to the
> compiler) and the compiler (or CFG generator, or something between AST and
> CFG) can decide not to validate internally generated AST for non-debug
> builds, for instance.

That's how the ast-objects branch currently works. There is a method
checking that the tree actually conforms to the grammar.


From trentm at  Wed Feb 15 20:28:48 2006
From: trentm at (Trent Mick)
Date: Wed, 15 Feb 2006 11:28:48 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

[Greg Ewing wrote]
> It's not perfect, but it's still a lot better than the
> situation on any other unix I've seen so far.

Better than Unix, sure. But you *can* (and ActivePython does do) install
everything under:


> > open DMG, don't run the app from here, drag it to your
> > Applications folder, then eject this window/disk, then run it from
> > /Applications,
> A decently-designed application should be runnable from
> anywhere, including a dmg, if the user wants to do that.
> If an app refuses to run from a dmg, I consider that a
> bug in the application.

Yes, but the typical user probably *wants* to run the app from their
/Applications folder (or somewhere else on their harddrive). When they
start running from the mounted DMG, they can't then unmount the DMG to
clean up. Actually the typical non-geek user doesn't care where they run
the app from. They don't want to worry about those details.


Trent Mick
TrentM at

From martin at  Wed Feb 15 20:37:17 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 20:37:17 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
 coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Jason Orendorff wrote:
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

This looks good, although it duplicates

expected_md5_hash = binascii.unhexlify('5c535024cac5199153e3834fe5c92e6a')
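
A short sketch of that existing route, assuming a modern Python 3 where binascii.unhexlify() returns a bytes object:

```python
import binascii

hex_digest = '5c535024cac5199153e3834fe5c92e6a'

# unhexlify turns pairs of hex digits into raw bytes.
expected_md5_hash = binascii.unhexlify(hex_digest)

# hexlify is the inverse and returns the digits as bytes.
digits_again = binascii.hexlify(expected_md5_hash)
```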


From guido at  Wed Feb 15 20:38:47 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 11:38:47 -0800
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

On 2/15/06, Tim Parkin <tim at> wrote:
> Guido van Rossum wrote:
> > (Now that I work for Google I realize more than ever before the
> > importance of keeping URLs stable; PageRank(tm) numbers don't get
> > transferred as quickly as contents. I have this worry too in the
> > context of the redesign; 301 permanent redirect is *not*
> > going to help PageRank of the new page.)

> Could you expand on why 301 redirects won't help with the transfer of
> page rank (if you're allowed)? We've done exactly this on many sites and
> the pagerank (or more relevantly the search rankings on specific terms)
> has transferred almost overnight. The bigger pagerank updates (both
> algorithm changes and overhauls in approach) seem to only happen every
> few months and these also seem to take notice of 301 redirects (they
> generally clear up any supplemental results).

OK, perhaps I stand corrected. I don't actually know that much about PageRank!

I still don't like, and adding more like it seems a
mistake; but it's possible that this is because of a poor execution of
the idea (there's no "search docs" button near the search button on
the old

--Guido van Rossum (home page:

From guido at  Wed Feb 15 20:40:49 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 11:40:49 -0800
Subject: [Python-Dev] Generalizing *args and **kwargs
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Thomas Wouters <thomas at> wrote:
> I've been thinking about generalization of the *args/**kwargs syntax for
> quite a while, and even though I'm pretty sure Guido (and many people) will
> consider it overgeneralization, I am finally going to suggest it. This whole
> idea is not something dear to my heart, although I obviously would like to
> see it happen. If the general vote is 'no', I'll write a small PEP or add it
> to PEP 13 and be done with it.

Feel free to write a PEP so that at least we have a concrete proposal
where all the nuts and bolts have been thought through.

I'm currently not able to give much thought to any more new proposals,
so don't expect me to look at it any time soon. Unless a miracle
occurs it's off the table for 2.5 so there's no hurry.

--Guido van Rossum (home page:

From guido at  Wed Feb 15 20:43:13 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 11:43:13 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <8245655246816499522@unknownmsgid>
References: <>
Message-ID: <>

On 2/15/06, Bill Janssen <janssen at> wrote:
> The default behavior of the current open() in opening files as text is
> particularly grating.

Why? Are you perhaps one of those rare folks who read more binary data
than text?

--Guido van Rossum (home page:

From tim at  Wed Feb 15 21:08:32 2006
From: tim at (Tim Parkin)
Date: Wed, 15 Feb 2006 20:08:32 +0000
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Tim Parkin <tim at> wrote:
>>Guido van Rossum wrote:
>>>I have this worry too in the
>>>context of the redesign; 301 permanent redirect is *not*
>>>going to help PageRank of the new page.)
>>Could you expand on why 301 redirects won't help with the transfer of
>>page rank (if you're allowed)? We've done exactly this on many sites and
>>the pagerank (or more relevantly the search rankings on specific terms)
>>has transferred almost overnight. The bigger pagerank updates (both
>>algorithm changes and overhauls in approach) seem to only happen every
>>few months and these also seem to take notice of 301 redirects (they
>>generally clear up any supplemental results).
> OK, perhaps I stand corrected. I don't actually know that much about PageRank!
No problem, I don't think that many people do and the general consensus
seems to be that, although the calculations behind pagerank may be one
of the core parts of the google algorithm, there are so many additional
algorithms* that affect searches on a case by case and day by day basis
that the value from is almost meaningless (apart from possibly 0-2 may
be a problem, 3-5 is normal, 6-9 is generally good and 10 I've not seen).

* (for instance, patents on working out the value of inbound links based
on their age, how many other inbound links appeared around the same
time, the status of the originating site as an 'authority' site, the
text contained in the inbound link and title attributes, etc and the
general relation between the inbound links and the 'theme' of the target
site ['theme' == the distribution of important keywords across the site])

> I still don't like, and adding more like it seems a
> mistake; but it's possible that this is because of a poor execution of
> the idea (there's no "search docs" button near the search button on
> the old
I'll try and make a more functional/usable google search page on the new

Tim Parkin

p.s. I hope you didn't think I was digging for 'insider info'..

From jeremy at  Wed Feb 15 21:07:01 2006
From: jeremy at (Jeremy Hylton)
Date: Wed, 15 Feb 2006 15:07:01 -0500
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>
Message-ID: <>

As I said in an earlier message, there's no need to have a separate
domain to restrict queries to just the doc/current part of
 Just type
" your query here"

If there isn't any other rationale, maybe we can redirects back to


On 2/15/06, Guido van Rossum <guido at> wrote:
> On 2/15/06, Tim Parkin <tim at> wrote:
> > Guido van Rossum wrote:
> >
> > > (Now that I work for Google I realize more than ever before the
> > > importance of keeping URLs stable; PageRank(tm) numbers don't get
> > > transferred as quickly as contents. I have this worry too in the
> > > context of the redesign; 301 permanent redirect is *not*
> > > going to help PageRank of the new page.)
> > Could you expand on why 301 redirects won't help with the transfer of
> > page rank (if you're allowed)? We've done exactly this on many sites and
> > the pagerank (or more relevantly the search rankings on specific terms)
> > has transferred almost overnight. The bigger pagerank updates (both
> > algorithm changes and overhauls in approach) seem to only happen every
> > few months and these also seem to take notice of 301 redirects (they
> > generally clear up any supplemental results).
> OK, perhaps I stand corrected. I don't actually know that much about PageRank!
> I still don't like, and adding more like it seems a
> mistake; but it's possible that this is because of a poor execution of
> the idea (there's no "search docs" button near the search button on
> the old
> --
> --Guido van Rossum (home page:

From g.brandl at  Wed Feb 15 21:13:14 2006
From: g.brandl at (Georg Brandl)
Date: Wed, 15 Feb 2006 21:13:14 +0100
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	<>	<>	<>
Message-ID: <dt020q$s7$>

Jeremy Hylton wrote:
> As I said in an earlier message, there's no need to have a separate
> domain to restrict queries to just the doc/current part of
>  Just type
> " your query here"
> If there isn't any other rationale, maybe we can redirects
> back to

If something like Fredrik's new doc system is adopted, it would be extremely
convenient to refer someone to just

without looking up how the page is actually named.


From guido at  Wed Feb 15 21:33:10 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 12:33:10 -0800
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/14/06, Greg Ewing <greg.ewing at> wrote:
> Fred L. Drake, Jr. wrote:
> > The proper response in this case is often to re-start decoding
> > with the correct encoding, since some of the data extracted so far may have
> > been decoded incorrectly.
> If the protocol has been sensibly designed, that shouldn't
> happen, since everything up to the coding marker should
> be ascii (or some other protocol-defined initial coding).
> For protocols that are not sensibly designed (or if you're
> just trying to guess) what you suggest may be needed. But
> it would be good to have a nicer way of going about it
> for when the protocol is sensible.

I think that the implementation of encoding-guessing or
auto-encoding-upgrade techniques should be left out of the standard
library design for now. I know that XML does something like this, but
fortunately we employ dedicated C code to parse XML so that particular
case should be taken care of without complicating the rest of the
standard I/O library.

As far as searching bytes objects, that shouldn't be a problem as long
as the search 'string' is also specified as a bytes object.
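
A minimal sketch of searching bytes with a bytes needle, assuming the bytes type as it ended up in Python 3. The header line used as sample data is made up for the example.

```python
payload = b"Content-Type: text/plain; charset=utf-8\r\n"
marker = b"charset="

# Searching works as long as the needle is bytes too.
position = payload.find(marker)
charset = payload[position + len(marker):].strip()

# Mixing in a text needle is rejected rather than guessed at.
try:
    payload.find("charset=")
    mixed_search_allowed = True
except TypeError:
    mixed_search_allowed = False
```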

--Guido van Rossum (home page:

From tim at  Wed Feb 15 21:52:49 2006
From: tim at (Tim Parkin)
Date: Wed, 15 Feb 2006 20:52:49 +0000
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	
Message-ID: <>

Jeremy Hylton wrote:
> As I said in an earlier message, there's no need to have a separate
> domain to restrict queries to just the doc/current part of
>  Just type
> " your query here"
> If there isn't any other rationale, maybe we can redirects
> back to

One possible reason, I'd like to be able to serve the docs up integrated
with the new design (with a full hierarchical navigation). I had planned
on leaving the as the raw tex2html conversion. If we got
rid of the would we still want the in the
current style? Personally I was hoping that nearly all of the site could
be in the new html structure and design for consistency and usability

Tim Parkin

From bokr at  Wed Feb 15 21:53:06 2006
From: bokr at (Bengt Richter)
Date: Wed, 15 Feb 2006 20:53:06 GMT
Subject: [Python-Dev] bytes type discussion
References: <>
Message-ID: <>

On Tue, 14 Feb 2006 19:41:07 -0500, "Raymond Hettinger" <python at> wrote:

>[Guido van Rossum]
>> Somewhat controversial:
>> - bytes("abc") == bytes(map(ord, "abc"))
>At first glance, this seems obvious and necessary, so if it's somewhat 
>controversial, then I'm missing something.  What's the issue?
ord("x") gets the source encoding's ord value of "x", but if that is not unicode
or latin-1, it will break when PY 3000 makes "x" unicode.

This means until Py 3000 plain str string literals have to use ascii and
escapes in order to preserve the meaning when "x" == u"x".

But the good news is bytes(map(ord, u"x")) works fine for any source encoding
now or after PY 3000. You just have to type characters into your editor
between the quotes that look on the screen like any of the first 256 unicode characters
(or use ascii escapes for unshowables). The u"x" translates x into unicode according
to the *character* of x, whatever the source encoding, so all you have to do is
choose characters of the first 256 unicodes. This happens to be latin-1, but you can ignore that
unless you are interested in the actual byte values. If they have byte meaning, escapes
are clearer anyway, and they work in a unicode string (where "x".decode(source_encoding) might
fail on an illegal character).

The solution is to use u"x" for now or use ascii-only with escapes, and just
map ord on either kind of string. This should work when u"x"
becomes equivalent to "x". The unicode that comes from a current u"x" string
defines a *character* sequence. If you use legal latin-1 *characters* in
whatever source encoding your editor and coding cookie say, you will get
the *characters* you see inside the quotes in the u"..." literal translated
to unicode, and the first 256 characters of unicode happen to be the latin-1 set,
so map ord just works. With a unicode string you don't have to think about encoding,
just use ord/unichr in range(0,256). Hex escapes within unicode strings work as expected,
so IMO it's pretty clean.
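
The argument can be sketched as follows, assuming a modern Python 3 where every string literal is already unicode: mapping ord over characters from the first 256 code points gives the same bytes as encoding to latin-1, and anything beyond that range is refused.

```python
text = "na\u00efve \u00ff"   # all characters are below code point 256

# ord() yields the code point, which is also the latin-1 byte value.
via_ord = bytes(map(ord, text))
via_latin1 = text.encode("latin-1")

# A character outside the first 256 code points cannot become a byte.
try:
    bytes(map(ord, "\u0100"))
    out_of_range_accepted = True
except ValueError:
    out_of_range_accepted = False
```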

I think I have shown this in a couple of other posts in the original thread
(where I created and compiled source code in several encodings including utf-8
and compiled with coding cookies and exec'd the result).

I could always have overlooked something, but I am hopeful.

Bengt Richter

From fredrik at  Wed Feb 15 21:53:53 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 15 Feb 2006 21:53:53 +0100
Subject: [Python-Dev] still available
References: <dsq741$4un$>	<>	<>	<><>
Message-ID: <dt04d3$9gl$>

Georg Brandl wrote:

> If something like Fredrik's new doc system is adopted, it would be extremely
> convenient to refer someone to just
> without looking up how the page is actually named.

you could of course reserve a toplevel directory for that purpose; e.g.

or perhaps



From guido at  Wed Feb 15 21:58:32 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 12:58:32 -0800
Subject: [Python-Dev] bytes type needs a new champion
Message-ID: <>

Skip has mentioned in private email that he's not available to update
PEP 332. I've therefore rejected that PEP; the current ideas are
rather different so we might as well start a new PEP. Anyway, we need
a new PEP author who can take the current discussion and turn it into
a coherent PEP. I've tried to keep up with the current thread but it
takes too much time to organize it all and I need to start focusing on
the 2.5 release schedule.

Any volunteers?

--Guido van Rossum (home page:

From fredrik at  Wed Feb 15 21:56:54 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 15 Feb 2006 21:56:54 +0100
Subject: [Python-Dev] 2.5 PEP
References: <><>
Message-ID: <dt04in$a5l$>

Martin v. Löwis wrote:

> >   - is (c)ElementTree still planned for inclusion ?
> It is included already.

in the xml.etree package, in case someone's looking for it in the
usual place.

that is,

    import xml.etree.ElementTree as ET
    import xml.etree.cElementTree as ET

will work in any 2.5 that has a working pyexpat.
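
A small usage sketch, assuming a modern Python 3. It sticks to the ElementTree import, since the cElementTree alias was later removed (in Python 3.9). The sample XML is made up for the example.

```python
import xml.etree.ElementTree as ET

# Parse a tiny document held in a string.
root = ET.fromstring(
    "<thread list='python-dev'><subject>2.5 PEP</subject></thread>"
)

list_name = root.get("list")          # attribute lookup
subject = root.findtext("subject")    # text of the first matching child
```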

(is the xmlplus/xmlcore issue still an issue, btw?)


From mal at  Wed Feb 15 22:07:02 2006
From: mal at (M.-A. Lemburg)
Date: Wed, 15 Feb 2006 22:07:02 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
 coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
> 131, 79, 229, 201, 46, 106])
> It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.
> I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me. 

Those are not pseudo-encodings, they are regular codecs.

It's a common misunderstanding that codecs are only seen as serving
the purpose of converting between Unicode and strings.

The codec system is deliberately designed to be general enough
to also work with many other types, e.g. it is easily possible to
write a codec that converts between the hex literal sequence you
have above and a list of ordinals:

""" Hex string codec

    Converts between a list of ordinals and a two byte hex literal
    string.

    >>> codecs.encode([1,2,3], 'hexstring')
    '010203'
    >>> codecs.decode(_, 'hexstring')
    [1, 2, 3]

    (c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        # Codecs return a tuple (output, length of input consumed)
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self,input,errors='strict'):

        """ Convert hex literal string to hex ordinal list.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if not size % 2 == 0:
            raise TypeError('input string has uneven length')
        return (
            [int(input[(i<<1):(i<<1)+2], 16)
             for i in range(size >> 1)],
            size)

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)
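
For readers who want to try the idea, here is a self-contained sketch of how such a codec could be hooked into the registry. It assumes a modern Python 3 and uses codecs.register() with a codecs.CodecInfo entry instead of an encodings-package module. The codec name 'hexstring' comes from the docstring above; the function names are hypothetical.

```python
import codecs


def hexstring_encode(input, errors='strict'):
    """Convert a list of ordinals to a hex literal string."""
    if not isinstance(input, list):
        raise TypeError('expected list of integers')
    return ''.join('%02x' % x for x in input), len(input)


def hexstring_decode(input, errors='strict'):
    """Convert a hex literal string to a list of ordinals."""
    if not isinstance(input, str):
        raise TypeError('expected string of hex literals')
    if len(input) % 2:
        raise TypeError('input string has uneven length')
    ordinals = [int(input[i:i + 2], 16) for i in range(0, len(input), 2)]
    return ordinals, len(input)


def find_hexstring(name):
    """Search function: answer only for the 'hexstring' codec name."""
    if name == 'hexstring':
        return codecs.CodecInfo(hexstring_encode, hexstring_decode,
                                name='hexstring')
    return None


codecs.register(find_hexstring)

encoded = codecs.encode([1, 2, 3], 'hexstring')
decoded = codecs.decode(encoded, 'hexstring')
```

Note that neither direction involves Unicode-to-bytes conversion, which is the generality being argued for here.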

> And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
>   text.encode('utf-8') ==> bytes
>   text.encode('rot13') ==> text
>   bytes.encode('zip') ==> bytes
>   bytes.encode('uu') ==> text (?)
> This state of affairs seems kind of crazy to me.

Really ?

It all depends on what you use the codecs for. Going through
the .encode() and .decode() methods, as above, is
not the only way you can make use of them.

To get full access to the codecs, you'll have to use
the codecs module.
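
As a hedged illustration of that distinction, assuming a modern Python 3 (where the outcome of this thread is visible): the codecs module functions accept codecs that are not text encodings, while the str.encode() method refuses them. 'rot_13' and 'hex_codec' are the registry names of two codecs mentioned above.

```python
import codecs

# Text-to-text and bytes-to-bytes codecs through the module functions.
rot13_text = codecs.encode("Hello", "rot_13")
hex_bytes = codecs.encode(b"\x5c\x53\x50", "hex_codec")
raw_again = codecs.decode(hex_bytes, "hex_codec")

# The method on the text type only accepts real text encodings.
try:
    "Hello".encode("rot_13")
    method_accepts_rot13 = True
except LookupError:
    method_accepts_rot13 = False
```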

> Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods
are merely interfaces to the registered codecs. Whether they
make sense for a certain codec depends on the codec, not the
methods that interface to it, and again, codecs do not
only exist to convert between Unicode and strings.

Marc-Andre Lemburg


From thomas at  Wed Feb 15 22:27:13 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 22:27:13 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006 at 01:38:41PM -0500, Jim Jewett wrote:
> On 2/14/06, Neil Schemenauer wrote:
> > People could spell it bytes(s.encode('latin-1'))
> Guido wrote:
> > At the cost of an extra copying step.
> I asked:
> > ... why not just add some smarts to the bytes constructor?
> Guido wrote:
> > ... the VM usually keeps an extra reference
> > on the stack so the refcount is never 1. But
> > you can't rely on that
> I did miss this, but _PyString_Resize seems to
> work around it, and I'm not sure that the bytes
> object can't be just as intimate.

No, _PyString_Resize doesn't work around it. _PyString_Resize only works if
the refcount is exactly one: only the caller has a reference. And by
'caller', I mean 'the calling C function'. Besides that, the caller takes
care to only use _PyString_Resize on strings it created itself.
Theoretically it could 'steal' a reference from someplace else, but I
haven't seen _PyString_Resize-using code do that, and it would be a recipe
for disaster.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From janssen at  Wed Feb 15 22:27:15 2006
From: janssen at (Bill Janssen)
Date: Wed, 15 Feb 2006 13:27:15 PST
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: Your message of "Wed, 15 Feb 2006 11:43:13 PST."
Message-ID: <06Feb15.132716pst."58633">

Well, I probably am, but that's not the reason.  Reading has nothing
to do with it.

The default mode (text) corrupts data on write on a certain platform
(Windows) by inserting extra bytes in the data stream.  This bug
particularly exhibits itself when programs developed on Linux or Mac
OS X are then run on a Windows platform.  I think it's a bug to
default to a mode which modifies the data stream.  The default mode
should be 'binary'; people interested in exploiting the obsolete
Windows distinction between "text" and "binary" should have to use a
mode switch (I suggest "t") to put a file stream in 'text' mode.
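The extra bytes are easy to reproduce on any platform. The sketch below assumes a current Python 3, whose open() accepts a newline argument; passing "\r\n" imitates the Windows text-mode translation on write. The helper name is hypothetical.

```python
import os
import tempfile


def bytes_written_in_text_mode(text, newline):
    """Write text through a text-mode file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# newline="\r\n" translates every "\n" on write, as Windows text mode does.
windows_style = bytes_written_in_text_mode("a\nb\n", "\r\n")
# newline="" disables translation, so the bytes match the text exactly.
untranslated = bytes_written_in_text_mode("a\nb\n", "")
print(windows_style, untranslated)
```

Four characters go in, and the translated file ends up six bytes long, which is the corruption described above when the payload is really binary data.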


> On 2/15/06, Bill Janssen <janssen at> wrote:
> > The default behavior of the current open() in opening files as text is
> > particularly grating.
> Why? Are you perhaps one of those rare folks who read more binary data
> than text?
> --
> --Guido van Rossum (home page:

From guido at  Wed Feb 15 22:37:52 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 13:37:52 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <-8169137970497330069@unknownmsgid>
References: <>
Message-ID: <>

On 2/15/06, Bill Janssen <janssen at> wrote:
> Well, I probably am, but that's not the reason.  Reading has nothing
> to do with it.

Actually if you read binary data in text mode on Windows you also get
corrupt (and often truncated) data, unless you're lucky enough that
the binary data contains neither ^Z (EOF) nor CRLF.

> The default mode (text) corrupts data on write on a certain platform
> (Windows) by inserting extra bytes in the data stream.  This bug
> particularly exhibits itself when programs developed on Linux or Mac
> OS X are then run on a Windows platform.  I think it's a bug to
> default to a mode which modifies the data stream.  The default mode
> should be 'binary'; people interested in exploiting the obsolete
> Windows distinction between "text" and "binary" should have to use a
> mode switch (I suggest "t") to put a file stream in 'text' mode.

This might have been a possibility in Python 2.x where binary reads
return strings. In Python 3000 binary files will return bytes objects
while text files will return strings (which are Unicode, decoded from bytes
using an encoding that's determined when the file is opened, taking
into account system and user settings as well as possible overrides
passed to open()). I expect that the APIs for reading and writing
binary data will be sufficiently different from that for
reading/writing text that even staunch Unix programmers won't make the
mistake of using the text API for creating binary files.
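A small sketch of the split described here, as it can be observed on a current Python 3 interpreter. The file contents are made up, and the encoding is passed explicitly instead of being taken from system settings.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Binary mode takes and returns bytes objects.
    with open(path, "wb") as f:
        f.write("na\u00efve\n".encode("utf-8"))
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode decodes with the encoding chosen when the file is opened.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
finally:
    os.remove(path)

print(type(raw).__name__, type(text).__name__)
```

Mixing the two up fails loudly: writing a str to a file opened with "wb" raises TypeError rather than silently producing a damaged file.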

I realize that's not the answer you're looking for, but for backwards
compatibility we can't change the default on Windows in Python 2.x, so
the point is moot until 3.0 or until a new binary file API is added to

--Guido van Rossum (home page:

From gjc at  Wed Feb 15 21:55:05 2006
From: gjc at (Gustavo J. A. M. Carneiro)
Date: Wed, 15 Feb 2006 20:55:05 +0000
Subject: [Python-Dev] math.areclose ...?
In-Reply-To: <004d01c63251$4ea87340$452c4fca@csmith>
References: <>
	<00dd01c63142$3dd61280$892c4fca@csmith> <>
Message-ID: <1140036905.8544.3.camel@localhost.localdomain>

  Please, I don't much care about the fine points of the function's
semantics, but PLEASE rename that function to are_close.  Every time I
see this subject in my email client I have to think for a few seconds
what the hell 'areclose' means.  This time it's not just because of the
new PEP 8, 'areclose' is really really hard to read.

Gustavo J. A. M. Carneiro
<gjc at> <gustavo at>
The universe is always one step beyond logic

From barry at  Wed Feb 15 22:42:35 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 16:42:35 -0500
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 14:01 -0500, Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(),
> which works like this:
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83,
>                              227, 131, 79, 229, 201, 46, 106])

Kind of like binascii.unhexlify() but returning a bytes object.
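The equivalence can be checked directly. This sketch assumes a current Python 3, where binascii.unhexlify() returns a bytes object and the classmethod is spelled bytes.fromhex() rather than the from_hex() proposed above.

```python
import binascii

hex_digest = "5c535024cac5199153e3834fe5c92e6a"

# The list of ordinals from the quoted proposal.
from_list = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83,
                   227, 131, 79, 229, 201, 46, 106])

# Two ways of turning the hex text into the same 16 bytes.
from_unhexlify = binascii.unhexlify(hex_digest)
from_fromhex = bytes.fromhex(hex_digest)

print(from_list == from_unhexlify == from_fromhex)
```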



From ncoghlan at  Wed Feb 15 22:43:24 2006
From: ncoghlan at (Nick Coghlan)
Date: Thu, 16 Feb 2006 07:43:24 +1000
Subject: [Python-Dev] Generalizing *args and **kwargs
In-Reply-To: <>
References: <>
Message-ID: <>

Thomas Wouters wrote:
> Although I've made it look like I have a working implementation, I haven't.
> I know exactly how to do it, though, except for the AST part ;) Once I
> figure out how to properly work with the AST code I'll probably write this
> patch whether it's a definite 'no' or not, just to see if I can. I wouldn't
> mind if people gave their opinion, though.

A phase 1 for Python 2.5 that allowed keyword args to go between "*args" and 
"**kwds" at the call site would be nice (Guido even approved the concept 
already, it's that it hasn't irritated anyone enough to actually tweak the 
grammar. . .)
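For reference, the call-site form being asked for looks like the sketch below, which runs on current Python versions. The function and argument names are hypothetical.

```python
def describe(*args, **kwds):
    """Return the positional and keyword arguments exactly as received."""
    return args, kwds


positional = (1, 2)
extra = {"colour": "red"}

# An explicit keyword argument placed between *args and **kwds.
result = describe(*positional, size=3, **extra)
print(result)
```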


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From arekm at  Wed Feb 15 22:43:35 2006
From: arekm at (Arkadiusz Miskiewicz)
Date: Wed, 15 Feb 2006 22:43:35 +0100
Subject: [Python-Dev] how bugfixes are handled?
Message-ID: <dt07a8$khp$>


How are bugfixes handled?

I posted a bug and a patch + test case for a quite common issue (see
google, problem mentioned on this ml) a long time ago and nothing has happened
with it.

Is anyone reviewing fixes on a regular basis? Or are just some bugfixes
reviewed + committed depending on the interest of committers?

Arkadiusz Miśkiewicz                    PLD/Linux Team

From guido at  Wed Feb 15 22:48:16 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 13:48:16 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, M.-A. Lemburg <mal at> wrote:
> Jason Orendorff wrote:
> > Also the pseudo-encodings ('hex', 'rot13',
> > 'zip', 'uu', etc.) generally scare me.
> Those are not pseudo-encodings, they are regular codecs.
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that convert between the hex literal sequence you
> have above to a list of ordinals:

It's fine that the codec system supports this. However it's
questionable that these encodings are invoked using the standard
encode() and decode() APIs; and it will be more questionable once
encode() returns a bytes object. Methods that return different types
depending on the value of an argument are generally a bad idea. (Hence
the movement to have separate opentext and openbinary or openbytes

--Guido van Rossum (home page:

From ncoghlan at  Wed Feb 15 22:53:50 2006
From: ncoghlan at (Nick Coghlan)
Date: Thu, 16 Feb 2006 07:53:50 +1000
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:
> Attached is the 2.5 release PEP 356.  It's also available from: 
> Does anyone have any comments?  Is this good or bad?  Feel free to
> send to me comments.
> We need to ensure that PEPs 308, 328, and 343 are implemented.  We
> have possible volunteers for 308 and 343, but not 328.  Brett is doing
> 352 and Martin is doing 353.

PEP 338 is pretty much ready to go, too - just waiting on Guido's review and 
pronouncement on the specific API used in the latest update (his last PEP 
parade said he was OK with the general concept, but I only posted the PEP 302 
compliant version after that).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From barry at  Wed Feb 15 22:57:41 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 16:57:41 -0500
Subject: [Python-Dev] A codecs nit (was Re:  bytes.from_hex())
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:

> Those are not pseudo-encodings, they are regular codecs.
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that convert between the hex literal sequence you
> have above to a list of ordinals:

Slightly off-topic, but one thing that's always bothered me about the
current codecs implementation is that str.encode() (and friends)
implicitly treats its argument as a module, and imports it, even if the
module doesn't live in the encodings package.  That seems like a mistake
to me (and a potential security problem if the import has side-effects).
I don't know whether at the very least restricting the imports to the
encodings package would make sense or would break things.

>>> import sys
>>> sys.modules['smtplib']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'smtplib'
>>> ''.encode('smtplib')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding: smtplib
>>> sys.modules['smtplib']
<module 'smtplib' from '/usr/lib/python2.4/smtplib.pyc'>

I can't see any reason for allowing any randomly importable module to
act like an encoding.
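A way to check a given interpreter for this behaviour is sketched below. The helper name is hypothetical, and "colorsys" is only an arbitrary stdlib module that is unlikely to be loaded already. Current CPython 3 limits the codec search to the encodings package, so the check is expected to report False there; the 2.4 session above corresponds to True.

```python
import codecs
import sys


def lookup_imports_module(name):
    """Report whether a failed codec lookup imported the module `name`.

    Returns None when the module was already loaded, since the check
    cannot tell anything in that case.
    """
    if name in sys.modules:
        return None
    try:
        # str.encode() goes through the same codec registry lookup.
        codecs.lookup(name)
    except LookupError:
        pass
    return name in sys.modules


print(lookup_imports_module("colorsys"))
```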


From guido at  Wed Feb 15 22:58:42 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 13:58:42 -0800
Subject: [Python-Dev] how bugfixes are handled?
In-Reply-To: <dt07a8$khp$>
References: <dt07a8$khp$>
Message-ID: <>

We're all volunteers here, and we get a large volume of bugs.
Unfortunately, bugfixes are reviewed on a voluntary basis.

Are you aware of the standing offer that if you review 5 bugs/patches
some of the developers will pay attention to your bug/patch?

On 2/15/06, Arkadiusz Miskiewicz <arekm at> wrote:
> Hi,
> How bugfixes are handled?
> I've posted a bug and a patch + test case for a quite common issue (see
> google, problem mentioned on this ml) long time ago and nothing happened
> with it
> Is anyone reviewing fixes on regular basis? Or just some bugfixes are
> reviewed + commited depending on interest of commiters?
> Thanks,
> --
> Arkadiusz Miśkiewicz                    PLD/Linux Team

--Guido van Rossum (home page:

From fredrik at  Wed Feb 15 23:12:20 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 15 Feb 2006 23:12:20 +0100
Subject: [Python-Dev] still available
References: <dsq741$4un$>	<>	<>	<><>
Message-ID: <dt0905$qm9$>

Georg Brandl wrote:

> If something like Fredrik's new doc system is adopted

don't hold your breath, by the way.  it's clear that the current PSF-sponsored
site overhaul won't lead to anything remotely close to a best-of-breed
python-powered site, and I'm beginning to think that I should spend my time
on other stuff.

I find it a bit sad that we'll end up with a butt-ugly static and boring
site when we have so much talent in the python universe, but I guess that's
inevitable at this stage in Python's evolution.


From martin at  Wed Feb 15 23:15:00 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 15 Feb 2006 23:15:00 +0100
Subject: [Python-Dev] ssize_t branch merged
Message-ID: <>

Just in case you haven't noticed, I just merged
the ssize_t branch (PEP 353).

If you have any corrections to the code to make which
you would consider bug fixes, just go ahead.

If you are uncertain how specific problems should be resolved,
feel free to ask.

If you think certain API changes should be made, please
discuss them here - they would need to be reflected in the
PEP as well.


From fredrik at  Wed Feb 15 23:28:59 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 15 Feb 2006 23:28:59 +0100
Subject: [Python-Dev] bytes type discussion
References: <>
Message-ID: <dt09vc$tvv$>

Guido van Rossum wrote:

> - it's probably too big to attempt to rush this into 2.5

After reading some of the discussion, and seeing some of the arguments,
I'm beginning to feel that we need working code to get this right.

It would be nice if we could get a bytes() type into the first alpha, so
the design can get some real-world exposure in real-world apps/libs be-
fore 2.5 final.


From thomas at  Wed Feb 15 23:39:43 2006
From: thomas at (Thomas Wouters)
Date: Wed, 15 Feb 2006 23:39:43 +0100
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt09vc$tvv$>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006 at 11:28:59PM +0100, Fredrik Lundh wrote:

> After reading some of the discussion, and seen some of the arguments,
> I'm beginning to feel that we need working code to get this right.
> It would be nice if we could get a bytes() type into the first alpha, so
> the design can get some real-world exposure in real-world apps/libs be-
> fore 2.5 final.

I agree that working code would be nice, but I don't see why it should be in
an alpha release. IMHO it shouldn't be in an alpha release until it at least
looks good enough for the developers, and good enough to put in a PEP.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From fredrik at  Wed Feb 15 23:51:07 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 15 Feb 2006 23:51:07 +0100
Subject: [Python-Dev] bytes type discussion
References: <><dt09vc$tvv$>
Message-ID: <dt0b8s$2eb$>

Thomas Wouters wrote:

> > After reading some of the discussion, and seen some of the arguments,
> > I'm beginning to feel that we need working code to get this right.
> >
> > It would be nice if we could get a bytes() type into the first alpha, so
> > the design can get some real-world exposure in real-world apps/libs be-
> > fore 2.5 final.
> I agree that working code would be nice, but I don't see why it should be in
> an alpha release. IMHO it shouldn't be in an alpha release until it at least
> looks good enough for the developers, and good enough to put in a PEP.

I'm not convinced that the PEP will be good enough without experience
from using a bytes type in *real-world* (i.e. *existing*) byte-crunching
applications.

if we put it in an early alpha, we can use it with real code, fix any issues
that arise, and even remove it if necessary, before 2.5 final.  if it goes in
late, we'll be stuck with whatever the PEP says.


From fuzzyman at  Wed Feb 15 23:54:09 2006
From: fuzzyman at (Michael Foord)
Date: Wed, 15 Feb 2006 22:54:09 +0000
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Fuzzyman <fuzzyman at> wrote:
>>  Forcing the programmer to be aware of encodings, also pushes the same
>> requirement onto the user (who is often the source of the text in question).
> The programmer shouldn't have to be aware of encodings most of the
> time -- it's the job of the I/O library to determine the end user's
> (as opposed to the language's) default encoding dynamically and act
> accordingly. Users who use non-ASCII characters without informing the
> OS of their encoding are in a world of pain, *unless* they use the OS
> default encoding (which may vary per locale). If the OS can figure out
> the default encoding, so can the Python I/O library. Many apps won't
> have to go beyond this at all.
> Note that I don't want to use this OS/user default encoding as the
> default encoding between bytes and strings; once you are reading bytes
> you are writing "grown-up" code and you will have to be explicit. It's
> only the I/O library that should automatically encode on write and
> decode on read.
>>  Currently you can read a text file and process it - making sure that any
>> changes/requirements only use ascii characters. It therefore doesn't matter
>> what 8 bit ascii-superset encoding is used in the original. If you force the
>> programmer to specify the encoding in order to read the file, they would
>> have to pass that requirement onto their user. Their user is even less
>> likely to be encoding aware than the programmer.
> I disagree -- the user most likely has set or received a default
> encoding when they first got the computer, and that's all they are
> using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
> out the encoding, so can Python's I/O library.
I'm intrigued  by the encoding guessing techniques you envisage. I 
currently use a modified version of something contained within docutils.

I read the file in binary and first check for UTF8 or UTF16 BOM.

Then I try to decode the text using the following encodings (in this 
order) :


(The encodings returned by the locale calls are only used on platforms 
for which they exist.)

The first decode that doesn't blow up, I assume is correct. The problem 
I have is that I usually (for the application I have in mind anyway) 
then want to re-encode into a consistent encoding rather than back into 
the original encoding. If the encoding of the original (usually 
unspecified) is any arbitrary 8-bit ascii superset (as it usually is), 
then it will probably not blow up if decoded with any other arbitrary 8 
bit encoding. This means I sometimes get junk.
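The approach described above (BOM check first, then the first candidate that decodes without error) can be sketched as follows for a current Python 3. The function name and the candidate order are assumptions made for illustration; they are not the list from the original message.

```python
import codecs

# BOM prefixes checked first; "utf-8-sig" and "utf-16" strip the BOM on decode.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical candidate order, tried only when no BOM is present.
DEFAULT_CANDIDATES = ["ascii", "utf-8", "latin-1"]


def guess_decode(data, candidates=DEFAULT_CANDIDATES):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data.decode(encoding), encoding
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue
    raise ValueError("no candidate encoding could decode the data")


raw = "caf\u00e9".encode("latin-1")
text, used = guess_decode(raw)

# Any other 8-bit encoding also "succeeds", which is where the junk comes from.
wrong = raw.decode("cp1251")
print(used, text == "caf\u00e9", wrong == text)
```

The last two lines show the weakness mentioned above: a wrong 8-bit guess decodes without any error, so a clean decode proves very little.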

I'm curious whether there are any extra things I could do. This is possibly
beyond the scope of this discussion (in which case I apologise), but we 
are discussing the techniques the I/O layer would use to 'guess' the 
encoding of a file opened in text mode - so maybe it's not so off topic.

There is also the following cookbook recipe that uses an heuristic to 
guess encoding :

XML, HTML, or other text streams may also contain additional information
about their encoding - which may be unreliable. :-)

All the best,

Michael Foord

From tim at  Thu Feb 16 00:02:00 2006
From: tim at (Tim Parkin)
Date: Wed, 15 Feb 2006 23:02:00 +0000
Subject: [Python-Dev] still available
In-Reply-To: <dt0905$qm9$>
References: <dsq741$4un$>	<>	<>	<><>	<dt020q$s7$>
Message-ID: <>

Fredrik Lundh wrote:
> Georg Brandl wrote:
>>If something like Fredrik's new doc system is adopted
> don't hold your breath, by the way.  it's clear that the current PSF-sponsored
> site overhaul won't lead to anything remotely close to a best-of-breed python-
> powered site, and I'm beginning to think that I should spend my time on other
> stuff.
> I find it a bit sad that we'll end up with a butt-ugly static and boring
> site when we have so much talent in the python universe, but I guess that's in-
> evitable at this stage in Python's evolution.
> </F>
Some very large sites - and some may say some very interesting, very
large sites - are delivered as static html (for some time the two
biggest sites in the uk were both delivered as static html, one of which
was and the other was for which I used to be
the main web developer. As far as I know the bbc and sporting life still
both use static html for a large portion of their content).

Regarding the python site, it was a conscious decision to deliver the
pages as static html. This was for many reasons, of which a prominent
one (but by no means the only major one) was mirroring.

One of the advantages of a semantically structured website that uses css
for layout and style is that, as far as design goes, you are welcome to
re-style the html using css; we can also offer it as an alternate
stylesheet (just as I've added a 'large font' style and a 'default font
settings' style). However, design is a subjective thing - I've spent
quite a bit of time reacting to the majority of constructive feedback
(probably far too much time when I should have been getting content
migrated) but obviously it won't please everyone :-)

As for cutting edge, it's using twisted, restructured text, nevow, clean
urls, xhtml, semantic markup, css2, interfaces, adaption, eggs, the path
module, moinmoin, yaml (to avoid xml), etc  - just because it's
generating all of the html up front rather than at runtime doesn't mean
that it's not best-of-breed (although I'm not sure what best-of-breed
is; I'm presuming it's some sort of accolade for excellence in python
programming; something I don't think I would be qualified to judge,
never mind receive).

However, back to Georg's comment, we could use mod_rewrite to map:

rewriteRule ^/lib/(.*)$ /doc/lib/module-$1.html [L,R=301]

(not tested)
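The mapping that the rule is meant to express can be tried outside Apache with the same regular expression. This is only a sketch of the substitution itself, not of mod_rewrite's matching context or redirect handling; the function name is hypothetical.

```python
import re

# Same pattern as the RewriteRule above: capture everything after /lib/.
LIB_PATTERN = re.compile(r"^/lib/(.*)$")


def redirect_target(path):
    """Return the redirect target for a /lib/<name> path, or None if no match."""
    match = LIB_PATTERN.match(path)
    if match is None:
        return None
    return "/doc/lib/module-%s.html" % match.group(1)


print(redirect_target("/lib/os"))
```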

Whether that is a good idea or not is another matter.

Tim Parkin

From fredrik at  Thu Feb 16 00:11:38 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 00:11:38 +0100
Subject: [Python-Dev] still available
References: <dsq741$4un$>	<>	<>	<><>	<dt020q$s7$><dt0905$qm9$>
Message-ID: <dt0cfb$68v$>

Tim Parkin wrote:

> As for cutting edge, it's using twisted, restructured text, nevow, clean
> urls, xhtml, semantic markup, css2, interfaces, adaption, eggs, the path
> module, moinmoin, yaml (to avoid xml),

that's not cutting edge, that's buzzword bingo.

> something I don't think I would be qualified to judge,never mind receive).

no, you're not qualified.  yet, someone gave you total control over the
future of, and there's no way to make you give it up, despite
the fact that you're over a year late and the stuff you've delivered this
far is massively underwhelming.  that's the problem.


From nas at  Thu Feb 16 00:14:08 2006
From: nas at (Neil Schemenauer)
Date: Wed, 15 Feb 2006 23:14:08 +0000 (UTC)
Subject: [Python-Dev] bytes type needs a new champion
References: <>
Message-ID: <dt0cjv$65j$>

Guido van Rossum <guido at> wrote:
> Anyway, we need a new PEP author who can take the current
> discussion and turn it into a coherent PEP.

I'm not sure that I have time to be the official champion.  Right
now I'm spending some time to collect all the ideas presented in the
email messages and put them into a draft PEP.  Hopefully that will
be useful.


From guido at  Thu Feb 16 00:15:48 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 15:15:48 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Michael Foord <fuzzyman at> wrote:
> I'm intrigued  by the encoding guessing techniques you envisage.

Don't hold your breath. *I* am not very interested in guessing
encodings -- I was just commenting on posts by others that mentioned
difficulties caused by this approach. My position is that the standard
library (with the exception of XML processing code perhaps) shouldn't
be *guessing* encodings but simply using the encoding specified by the
user (or the OS default) in the environment or some such place. (It is
OS dependent how to retrieve this information but my hypothesis is
that every OS with any kind of text support has a way to get this info
-- even if it's as rudimentary as "it's always ASCII" (v7 Unix :-) or
"it's always UTF-8" (I am hoping this will eventually be the answer in
the distant future).
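On a current Python 3, one way to ask the OS and user settings for that default, without any guessing from file contents, is the locale module. A minimal sketch, with a hypothetical helper name:

```python
import locale


def default_text_encoding():
    """Return the encoding the platform and user settings designate for text."""
    # False: report the current setting without changing the process locale.
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```

The value differs from machine to machine, which is the point: it comes from the environment, not from inspecting the data.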

--Guido van Rossum (home page:

From guido at  Thu Feb 16 00:20:16 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 15:20:16 -0800
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt0b8s$2eb$>
References: <>
	<dt09vc$tvv$> <>
Message-ID: <>

I'm actually assuming to put this off until 2.6 anyway.

On 2/15/06, Fredrik Lundh <fredrik at> wrote:
> Thomas Wouters wrote:
> > > After reading some of the discussion, and seen some of the arguments,
> > > I'm beginning to feel that we need working code to get this right.
> > >
> > > It would be nice if we could get a bytes() type into the first alpha, so
> > > the design can get some real-world exposure in real-world apps/libs be-
> > > fore 2.5 final.
> >
> > I agree that working code would be nice, but I don't see why it should be in
> > an alpha release. IMHO it shouldn't be in an alpha release until it at least
> > looks good enough for the developers, and good enough to put in a PEP.
> I'm not convinced that the PEP will be good enough without experience
> from using a bytes type in *real-world* (i.e. *existing*) byte-crunching
> applications.
> if we put it in an early alpha, we can use it with real code, fix any issues
> that arises, and even remove it if necessary, before 2.5 final.  if it goes in
> late, we'll be stuck with whatever the PEP says.
> </F>

--Guido van Rossum (home page:

From guido at  Thu Feb 16 00:21:08 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 15:21:08 -0800
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>
Message-ID: <>

Great! I'll mark the PEP as accepted. (Which doesn't mean you can't
update it if changes are found necessary.)


On 2/15/06, "Martin v. Löwis" <martin at> wrote:
> Just in case you haven't noticed, I just merged
> the ssize_t branch (PEP 353).
> If you have any corrections to the code to make which
> you would consider bug fixes, just go ahead.
> If you are uncertain how specific problems should be resolved,
> feel free to ask.
> If you think certain API changes should be made, please
> discuss them here - they would need to be reflected in the
> PEP as well.
> Regards,
> Martin

--Guido van Rossum (home page:

From greg.ewing at  Thu Feb 16 00:54:56 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 12:54:56 +1300
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$> <>
	<> <>
	<> <>
Message-ID: <>

Ron Adam wrote:

> I was presuming it would be done in C code and it will just need a 
> pointer to the first byte, memchr(), and then read n bytes directly into 
> a new memory range via  memcpy().

If the object supports the buffer interface, it can be
done that way. But if not, it would seem to make sense to
fall back on the iterator protocol.

> However, if it's done with a Python iterator and then each item is 
> translated to bytes in a sequence, (much slower), an encoding will need 
> to be known for it to work correctly.

No, it won't. When using the bytes(x) form, encoding has
nothing to do with it. It's purely a conversion from one
representation of an array of 0..255 to another.

When you *do* want to perform encoding, you use
bytes(u, encoding) and say what encoding you want
to use.
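The two forms can be compared on a current Python 3, whose bytes constructor follows this split. The sample values are made up.

```python
# bytes(x): a plain conversion from integers in range(256), no encoding involved.
from_ints = bytes([104, 105])
from_iter = bytes(iter(range(3)))

# bytes(u, encoding): encoding is requested explicitly.
encoded = bytes("\u00e9", "utf-8")

# A str without an encoding is rejected instead of being encoded implicitly.
try:
    bytes("abc")
    needs_encoding = False
except TypeError:
    needs_encoding = True

print(from_ints, from_iter, encoded, needs_encoding)
```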

> Unfortunately Unicode strings 
> don't set an attribute to indicate it's own encoding.

I think you don't understand what an encoding is. Unicode
strings don't *have* an encoding, because they're not encoded!
Encoding is what happens when you go from a unicode string
to something else.

> Since some longs will be of different length, yes a bytes(0L) could give 
> differing results on different platforms,

It's not just a matter of length. I'm not sure of the
details, but I believe longs are currently stored as an
array of 16-bit chunks, of which only 15 bits are used.
I'm having trouble imagining a use for low-level access
to that format, other than just treating it as an opaque
lump of data for turning back into a long later -- in
which case why not just leave it as a long in the first


From fredrik at  Thu Feb 16 01:09:04 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 01:09:04 +0100
Subject: [Python-Dev] bytes type discussion
References: <><dt09vc$tvv$>
Message-ID: <dt0fr2$fmg$>

Guido wrote:

> I'm actually assuming to put this off until 2.6 anyway.

makes sense.

(but will there be a 2.6?  isn't it time to start hacking on 3.0?)


From greg.ewing at  Thu Feb 16 01:03:27 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 13:03:27 +1300
Subject: [Python-Dev] nice()
In-Reply-To: <000e01c63226$fc342660$7c2c4fca@csmith>
References: <>
Message-ID: <>

Smith wrote:

> The problem with areclose(), however, is that it
> only solves one part of the problem that needs to be solved 
> if two fp's *are* going to be compared: if you are going to 
> check if a < b you would need to do something like 
>     not areclose(a,b) and a < b

No, no, no.

If your algorithm is well-designed, it won't matter which
way the comparison goes if a and b are that close.

In any case, the idea behind nice() is fundamentally doomed.
IT CANNOT WORK, because the numbers it's returning are still
binary, not decimal.
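Both halves of this argument can be seen on a current Python 3. In the sketch below, math.isclose() (added to the standard library well after this thread) stands in for areclose(), and round() stands in for the proposed nice(); both substitutions are assumptions made for illustration.

```python
import math
from decimal import Decimal

a = 0.1 + 0.2
b = 0.3

# Exact comparison fails, while a tolerance-based comparison succeeds.
exactly_equal = (a == b)
close = math.isclose(a, b, rel_tol=1e-9)

# Rounding returns another binary float, not the decimal value 0.3.
rounded = round(a, 1)
is_exact_decimal = (Decimal(rounded) == Decimal("0.3"))

print(exactly_equal, close, is_exact_decimal)
```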


From barry at  Thu Feb 16 01:29:17 2006
From: barry at (Barry Warsaw)
Date: Wed, 15 Feb 2006 19:29:17 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt0fr2$fmg$>
References: <>
	<dt09vc$tvv$> <>
Message-ID: <>

On Thu, 2006-02-16 at 01:09 +0100, Fredrik Lundh wrote:

> (but will there be a 2.6?  isn't it time to start hacking on 3.0?)

We know at least there will never be a 2.10, so I think we still have



From nas at  Thu Feb 16 01:36:35 2006
From: nas at (Neil Schemenauer)
Date: Thu, 16 Feb 2006 00:36:35 +0000 (UTC)
Subject: [Python-Dev] from __future__ import unicode_strings?
Message-ID: <dt0hej$j09$>

I'm in the process of summarizing the discussion on the bytes object
and an idea just occurred to me.  Imagine that I want to write code
that deals with strings and I want to be maximally compatible with
P3k.  It would be nice if I could add:

    from __future__ import unicode_strings

and have string literals without a 'u' prefix become unicode
instances.  I'm not sure how tricky the implementation would be but
it seems like a useful feature.

An even crazier idea is to have that import change 'str' to be
an alias for 'unicode'.


From bokr at  Thu Feb 16 01:43:24 2006
From: bokr at (Bengt Richter)
Date: Thu, 16 Feb 2006 00:43:24 GMT
Subject: [Python-Dev] bytes type discussion
References: <>
	<dt09vc$tvv$> <>
Message-ID: <>

On Wed, 15 Feb 2006 15:20:16 -0800, Guido van Rossum <guido at> wrote:

>I'm actually assuming to put this off until 2.6 anyway.
>On 2/15/06, Fredrik Lundh <fredrik at> wrote:
>> Thomas Wouters wrote:
>> > > After reading some of the discussion, and seen some of the arguments,
>> > > I'm beginning to feel that we need working code to get this right.
>> > >
>> > > It would be nice if we could get a bytes() type into the first alpha, so
>> > > the design can get some real-world exposure in real-world apps/libs be-
>> > > fore 2.5 final.
>> >
>> > I agree that working code would be nice, but I don't see why it should be in
>> > an alpha release. IMHO it shouldn't be in an alpha release until it at least
>> > looks good enough for the developers, and good enough to put in a PEP.
>> I'm not convinced that the PEP will be good enough without experience
>> from using a bytes type in *real-world* (i.e. *existing*) byte-crunching
>> applications.
>> if we put it in an early alpha, we can use it with real code, fix any issues
>> that arises, and even remove it if necessary, before 2.5 final.  if it goes in
>> late, we'll be stuck with whatever the PEP says.
>> </F>

I could hardly keep up with reading, never mind trying some things
and writing coherently, so if others had that experience, 2.6 sounds +1.

I agree with Fredrik that an implementation to try in real-world use cases
would probably yield valuable information.

As a step in that direction, could we have a sub-thread on what methods
to implement for bytes?

I.e., which str methods make sense, which special methods? How many methods
from list make sense, given that bytes will be mutable? How much of array.array('B')
should be emulated? (A prototype hack could just wrap array.array for storage.)
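
A minimal sketch of the kind of prototype hack mentioned above, wrapping
array.array('B') for storage.  The class name ProtoBytes is hypothetical, and
the code targets a present-day Python 3 interpreter rather than 2.5.  Note
that array reports out-of-range values with OverflowError; a real design
might prefer a different exception.

```python
from array import array


class ProtoBytes:
    """Hypothetical prototype: a mutable byte sequence stored in array('B')."""

    def __init__(self, initialiser=()):
        # array('B') only accepts integers in range(0, 256).
        self._data = array('B', initialiser)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices come back as a new ProtoBytes, single items as ints.
            return ProtoBytes(self._data[index])
        return self._data[index]

    def __setitem__(self, index, value):
        # array raises TypeError for non-integers, OverflowError for 256+.
        self._data[index] = value

    def __repr__(self):
        return "ProtoBytes(%r)" % list(self._data)


proto = ProtoBytes([65, 66, 67])
proto[0] = 255
print(proto, len(proto), proto[1], proto[1:])
```

Which list and str methods to layer on top is exactly the open question; the
wrapper only settles storage.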

Should the type really be a subclass of int? I think that might be hard for
prototyping, since builtin types as bases seem to get priority subclass bypass
access from some builtin functions. At least I've had some frustrations with that.

If it were a kind of int, would it be an int-string, where int(bytes([65])) would work
like ord does with non-length-1? BTW bytes([1,2])[1] by analogy to str should
then return bytes([2]), shouldn't it? I have a feeling a lot of str-like methods
will bomb if that's not so.

 >>> int(bytes([1,2]))  # faked ;-)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: int() expected a byte, but bytes of length 2 found
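
For comparison, a short sketch of how the indexing question plays out with
the bytes type that Python 3 releases eventually shipped.  This is an
observation about a present-day interpreter, not about any prototype from
this thread: indexing yields an int, and a length-1 slice yields a bytes
object.

```python
data = bytes([1, 2])

item = data[1]      # indexing returns a plain int
piece = data[1:2]   # slicing returns a bytes object of length 1

print(type(item).__name__, item)
print(type(piece).__name__, list(piece))
```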

I've hacked a few pieces, but I think further discussion either in this thread
or maybe a bytes prototype spec thread would be fruitful.

By the time a prototype spec takes shape, someone will probably have beaten me
to something workable, but that's ok ;-)

Then a PEP will mostly be writing and collecting rationale references etc.
That's really not my favorite kind of work, frankly. But I like thinking and programming.

Bengt Richter

From aahz at  Thu Feb 16 01:55:48 2006
From: aahz at (Aahz)
Date: Wed, 15 Feb 2006 16:55:48 -0800
Subject: [Python-Dev] Off-topic:
In-Reply-To: <dt0cfb$68v$>
References: <dsq741$4un$>
	<> <dt0cfb$68v$>
Message-ID: <>

On Thu, Feb 16, 2006, Fredrik Lundh wrote:
> Tim Parkin wrote:
>> [...]
> no, you're not qualified.  yet, someone gave you total control over the
> future of, and there's no way to make you give it up, despite
> the fact that you're over a year late and the stuff you've delivered this
> far is massively underwhelming.  that's the problem.

In all fairness to Tim (and despite the fact that emotionally I agree
with you), the fact is that there had been essentially no forward motion
on redesign until he went to work.  Even if we end up
chucking out all his work in favor of something else, I'll consider the
PSF's money well-spent for bringing the community energy into it.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From exarkun at  Thu Feb 16 01:56:17 2006
From: exarkun at (Jean-Paul Calderone)
Date: Wed, 15 Feb 2006 19:56:17 -0500
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <dt0hej$j09$>
Message-ID: <20060216005617.6122.1311347827.divmod.quotient.50@ohm>

On Thu, 16 Feb 2006 00:36:35 +0000 (UTC), Neil Schemenauer <nas at> wrote:
>I'm in the process of summarizing the discussion on the bytes object
>and an idea just occurred to me.  Imagine that I want to write code
>that deals with strings and I want to be maximally compatible with
>P3k.  It would be nice if I could add:
>    from __future__ import unicode_strings
>and have string literals without a 'u' prefix become unicode
>instances.  I'm not sure how tricky the implementation would be but
>it seems like a useful feature.

FWIW, I've considered this before, and superficially at least, it seems attractive.

>An even crazier idea is to have that import change 'str' to be
>an alias for 'unicode'.

That's further than I went, though :)  Until there's a replacement for str, this would make it impossible to do certain things with that __future__ import in effect.


From guido at  Thu Feb 16 02:23:56 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 17:23:56 -0800
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <dt0hej$j09$>
References: <dt0hej$j09$>
Message-ID: <>

On 2/15/06, Neil Schemenauer <nas at> wrote:
> I'm in the process of summarizing the discussion on the bytes object
> and an idea just occurred to me.  Imagine that I want to write code
> that deals with strings and I want to be maximally compatible with
> P3k.  It would be nice if I could add:
>     from __future__ import unicode_strings
> and have string literals without a 'u' prefix become unicode
> instances.  I'm not sure how tricky the implementation would be but
> it seems like a useful feature.

Didn't we have a command-line option to do this? I believe it was
removed because nobody could see the point. (Or am I hallucinating?
After several days of non-stop discussing bytes that must be
considered a possibility.)

Of course a per-module switch is much more useful.

> An even crazier idea is to have that import change 'str' to be
> an alias for 'unicode'.

Now *that's* crazy talk. :-)

It's probably easier to do that by placing a line

  str = unicode

at the top of the file. Of course (like a good per-module switch
should!) this won't affect code in other modules that you invoke so
it's not clear that it always does the right thing. But it's a start.
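
A small sketch of why a module-level rebinding stays local to that module.
It is written for a present-day Python 3 interpreter, where the name
'unicode' no longer exists, so 'bytes' stands in purely as an illustrative
replacement; the module names are hypothetical.

```python
import types

# Two throwaway modules, each with its own global namespace.
patched = types.ModuleType("patched")
untouched = types.ModuleType("untouched")

# Rebind the name 'str' at the top of one module only.
exec("str = bytes\nkind = str", patched.__dict__)
exec("kind = str", untouched.__dict__)

print(patched.kind)     # the rebound name
print(untouched.kind)   # still the builtin
```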

--Guido van Rossum (home page:

From jeremy at  Thu Feb 16 02:25:06 2006
From: jeremy at (Jeremy Hylton)
Date: Wed, 15 Feb 2006 20:25:06 -0500
Subject: [Python-Dev] still available
In-Reply-To: <dt0cfb$68v$>
References: <dsq741$4un$>
	<dt020q$s7$> <dt0905$qm9$>
	<> <dt0cfb$68v$>
Message-ID: <>

I don't think this message is on-topic for python-dev.  There are lots
of great places to discuss the design of the python web site, but the
list for developers doesn't seem like a good place for it.  Do we need
a different list for people to gripe^H^H^H^H^H discuss the web site?


On 2/15/06, Fredrik Lundh <fredrik at> wrote:
> Tim Parkin wrote:
> > As for cutting edge, it's using twisted, restructured text, nevow, clean
> > urls, xhtml, semantic markup, css2, interfaces, adaption, eggs, the path
> > module, moinmoin, yaml (to avoid xml),
> that's not cutting edge, that's buzzword bingo.
> > something I don't think I would be qualified to judge,never mind receive).
> no, you're not qualified.  yet, someone gave you total control over the
> future of, and there's no way to make you give it up, despite
> the fact that you're over a year late and the stuff you've delivered this
> far is massively underwhelming.  that's the problem.
> </F>

From thomas at  Thu Feb 16 02:43:02 2006
From: thomas at (Thomas Wouters)
Date: Thu, 16 Feb 2006 02:43:02 +0100
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <>
References: <dt0hej$j09$>
Message-ID: <>

On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:

> >     from __future__ import unicode_strings

> Didn't we have a command-line option to do this? I believe it was
> removed because nobody could see the point. (Or am I hallucinating?
> After several days of non-stop discussing bytes that must be
> considered a possibility.)

We do, and it's not been removed: the -U switch.

Python 2.3.5 (#2, Nov 21 2005, 01:27:27)
>>> ""
Python 2.4.2 (#2, Nov 21 2005, 02:24:28)
>>> ""
Python 2.5a0 (trunk:42390, Feb 16 2006, 00:12:03)
>>> ""

I've never seen it *used*, though, and IIRC there were quite a number of
stdlib modules that broke when you used it, at least back when it was

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From rrr at  Thu Feb 16 03:11:59 2006
From: rrr at (Ron Adam)
Date: Wed, 15 Feb 2006 20:11:59 -0600
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
 Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <dsbc3h$rct$>
	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
Message-ID: <>

Greg Ewing wrote:

> I think you don't understand what an encoding is. Unicode
> strings don't *have* an encoding, because they're not encoded!
> Encoding is what happens when you go from a unicode string
> to something else.

Ah.. ok, my mental picture was a bit off.  I had this reversed somewhat.
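
A tiny illustration of that point, using a present-day Python 3 interpreter
where str is the Unicode type: the text itself has no encoding, and only the
byte sequences produced from it do.

```python
text = "na\u00efve"               # one Unicode string, no encoding attached

utf8 = text.encode("utf-8")       # two bytes for the i-diaeresis
latin1 = text.encode("latin-1")   # one byte for the same character

print(len(text), len(utf8), len(latin1))
print(utf8.decode("utf-8") == latin1.decode("latin-1") == text)
```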

> It's not just a matter of length. I'm not sure of the
> details, but I believe longs are currently stored as an
> array of 16-bit chunks, of which only 15 bits are used.
> I'm having trouble imagining a use for low-level access
> to that format, other than just treating it as an opaque
> lump of data for turning back into a long later -- in
> which case why not just leave it as a long in the first
> place.

I had a lapse, thinking Python's longs are the same as C longs. I know
Python's longs can get much, much bigger.

The idea was to be able to show the byte data as-is, in whatever form it
takes, without trying to change it, whether it's longs, floats, strings, etc.
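
A hedged sketch of the "opaque lump" view of a big integer, using methods
that exist in present-day Python 3 (int.to_bytes, int.from_bytes and
sys.int_info); none of these were available at the time of this thread.

```python
import sys

big = 2 ** 70                          # far larger than a C long on most platforms
width = (big.bit_length() + 7) // 8    # 71 bits round up to 9 bytes

raw = big.to_bytes(width, "big")       # a portable byte representation
restored = int.from_bytes(raw, "big")  # and back again

# The internal digit size is an implementation detail of the interpreter.
print(width, restored == big, sys.int_info.bits_per_digit)
```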


From aahz at  Thu Feb 16 03:35:29 2006
From: aahz at (Aahz)
Date: Wed, 15 Feb 2006 18:35:29 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 14, 2006, Guido van Rossum wrote:
> Anyway, I'm now convinced that bytes should act as an array of ints,
> where the ints are restricted to range(0, 256) but have type int.

range(0, 255)?
Aahz (aahz at           <*>


From greg.ewing at  Thu Feb 16 03:47:55 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 15:47:55 +1300
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> So how about
> openbytes? This clearly links the resulting object with the bytes
> type, which is mutually reassuring.

That looks quite nice.

Another thought -- what is going to happen to
Will it change to return bytes, or will there be a new

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From nas at  Thu Feb 16 03:49:11 2006
From: nas at (Neil Schemenauer)
Date: Wed, 15 Feb 2006 19:49:11 -0700
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <>
References: <dt0hej$j09$>
Message-ID: <>

On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
> On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:
> > >     from __future__ import unicode_strings
> > Didn't we have a command-line option to do this? I believe it was
> > removed because nobody could see the point. (Or am I hallucinating?
> > After several days of non-stop discussing bytes that must be
> > considered a possibility.)
> We do, and it's not been removed: the -U switch.

As Guido alluded, the global switch is useless.  A per-module switch
is something that could actually be useful.  One nice advantage is that
you could write code that works the same with Jython (wrt string
literals anyhow).


From bob at  Thu Feb 16 03:49:39 2006
From: bob at (Bob Ippolito)
Date: Wed, 15 Feb 2006 18:49:39 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 15, 2006, at 6:35 PM, Aahz wrote:

> On Tue, Feb 14, 2006, Guido van Rossum wrote:
>> Anyway, I'm now convinced that bytes should act as an array of ints,
>> where the ints are restricted to range(0, 256) but have type int.
> range(0, 255)?

No, Guido was correct.  range(0, 256) is [0, 1, 2, ..., 255].
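
A quick check of the half-open interval, runnable on a present-day Python 3
interpreter (where range is lazy, hence the list call):

```python
values = list(range(0, 256))

print(values[0], values[-1], len(values))
print(256 in values)
```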


From nas at  Thu Feb 16 03:55:16 2006
From: nas at (Neil Schemenauer)
Date: Wed, 15 Feb 2006 19:55:16 -0700
Subject: [Python-Dev] Pre-PEP: The "bytes" object
Message-ID: <>

This could be a replacement for PEP 332.  At least I hope it can
serve to summarize the previous discussion and help focus on the
currently undecided issues.

I'm too tired to dig up the rules for assigning it a PEP number.
Also, there are probably silly typos, etc.   Sorry.

-------------- next part --------------
Title: The "bytes" object
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas at>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-Feb-2006
Python-Version: 2.5


This PEP outlines the introduction of a raw bytes sequence object.
Adding the bytes object is one step in the transition to Unicode based
str objects.


Python's current string objects are overloaded. They serve to hold
both sequences of characters and sequences of bytes. This overloading
of purpose leads to confusion and bugs. In future versions of Python,
string objects will be used for holding character data. The bytes object
will fulfil the role of a byte container. Eventually the unicode
built-in will be renamed to str and the str object will be removed.


A bytes object stores a mutable sequence of integers that are in the
range 0 to 255.  Unlike string objects, indexing a bytes object returns
an integer.  Assigning an element using an object that is not an integer
causes a TypeError exception.  Assigning an element to a value outside
the range 0 to 255 causes a ValueError exception.  The __len__ method of
bytes returns the number of integers stored in the sequence (i.e. the
number of bytes).
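
The behaviour described here can be tried out with bytearray, the mutable
byte type found in present-day Python 3.  Treating bytearray as a stand-in
for the proposed object is an assumption made for illustration only.

```python
buf = bytearray([10, 20, 30])

first = buf[0]     # indexing returns an int
buf[1] = 200       # in-range integers can be assigned

errors = []
try:
    buf[0] = "a"   # not an integer
except TypeError:
    errors.append("TypeError")
try:
    buf[0] = 256   # outside the range 0 to 255
except ValueError:
    errors.append("ValueError")

print(first, list(buf), len(buf), errors)
```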

The constructor of the bytes object has the following signature:

    bytes([initialiser[, encoding]])

If no arguments are provided then an object containing zero elements is
created and returned.  The initialiser argument can be a string or a
sequence of integers.  The pseudo-code for the constructor is:

    def bytes(initialiser=[], encoding=None):
        if isinstance(initialiser, basestring):
            if encoding is None or encoding.lower() == 'ascii':
                # raises UnicodeDecodeError if the string contains
                # non-ASCII characters
                initialiser = initialiser.encode('ascii')
            elif isinstance(initialiser, unicode):
                initialiser = initialiser.encode(encoding)
            else:
                # silently ignore the encoding argument if the
                # initialiser is a str object
                pass
            initialiser = [ord(c) for c in initialiser]
        elif encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        create bytes object and fill with integers from initialiser
        return bytes object
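
A runnable approximation of the same decision logic for a present-day
Python 3 interpreter.  The function name make_bytes is hypothetical, and the
mapping is an assumption: Python 3 str plays the role of unicode, and
Python 3 bytes plays the role of the old 8-bit str.  One visible difference
is that non-ASCII text raises UnicodeEncodeError here.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical sketch of the constructor rules, returning a bytearray."""
    if isinstance(initialiser, str):
        # Text must be encoded; ASCII is the default and fails loudly.
        initialiser = initialiser.encode(encoding or "ascii")
    elif isinstance(initialiser, (bytes, bytearray)):
        # Already bytes: the encoding argument is silently ignored.
        pass
    elif encoding is not None:
        raise TypeError("explicit encoding invalid for non-string "
                        "initialiser")
    return bytearray(initialiser)


print(list(make_bytes("abc")))
print(list(make_bytes("caf\u00e9", "latin-1")))
print(list(make_bytes([92, 83, 80, 255])))
```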

The __repr__ method returns a string that can be evaluated to generate a
new bytes object containing the same sequence of integers.  The sequence
is represented by a list of ints.  For example:

    >>> repr(bytes([10, 20, 30]))
    'bytes([10, 20, 30])'

The object has a decode method equivalent to the decode method of the
str object.  The object has a classmethod fromhex that takes a string of
characters from the set [0-9a-fA-F ] and returns a bytes object (similar
to binascii.unhexlify).  For example:

    >>> bytes.fromhex('5c5350ff')
    bytes([92, 83, 80, 255])
    >>> bytes.fromhex('5c 53 50 ff')
    bytes([92, 83, 80, 255])

The object has a hex method that does the reverse conversion (similar to

    >>> bytes([92, 83, 80, 255]).hex()
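
Both conversions can be exercised on a present-day Python 3 interpreter,
whose bytes type has fromhex and hex methods.  That the shipped methods
match this draft in every detail is an assumption; the binascii calls are
included only for comparison.

```python
import binascii

packed = bytes.fromhex("5c5350ff")
spaced = bytes.fromhex("5c 53 50 ff")   # spaces between byte pairs are allowed

hex_text = packed.hex()                 # the reverse conversion, returns text

print(list(packed), packed == spaced, hex_text)
print(binascii.unhexlify("5c5350ff") == packed)
print(binascii.hexlify(packed))
```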

The bytes object has methods similar to the list object:


Out of scope issues

* If we provide a literal syntax for bytes then it should look distinctly
  different than the syntax for literal strings.  Also, a new type, even
  built-in, is much less drastic than a new literal (which requires
  lexer and parser support in addition to everything else).  Since there
  appears to be no immediate need for a literal representation,
  designing and implementing one is out of the scope of this PEP.

* Python 3k will have a much different I/O subsystem.  Deciding how that
  I/O subsystem will work and interact with the bytes object is out of
  the scope of this PEP.

* It has been suggested that a special method named __bytes__ be added
  to the language to allow objects to be converted into byte arrays.  This
  decision is out of scope.

Unresolved issues

* Perhaps the bytes object should be implemented as an extension module
  until we are more sure of the design (similar to how the set object
  was prototyped).

* Should the bytes object implement the buffer interface?  Probably, but
  we need to look into the implications of that (e.g. regex operations
  on byte arrays).

* Should the object implement __reversed__ and reverse?  Should it
  implement sort?

* Need to clarify what some of the methods do.  How are comparisons
  done?  Hashing?  Pickling and marshalling?

Questions and answers

Q: Why have the optional encoding argument when the encode method of
   Unicode objects does the same thing?

A: In the current version of Python, the encode method returns a str
   object and we cannot change that without breaking code.  The construct
   bytes(s.encode(...)) is expensive because it has to copy the byte
   sequence multiple times.  Also, Python generally provides two ways of
   converting an object of type A into an object of type B: ask an A
   instance to convert itself to a B, or ask the type B to create a new
   instance from an A. Depending on what A and B are, both APIs make
   sense; sometimes reasons of decoupling require that A can't know
   about B, in which case you have to use the latter approach; sometimes
   B can't know about A, in which case you have to use the former.

Q: Why does bytes ignore the encoding argument if the initialiser is a

A: There is no sane meaning that the encoding can have in that case.
   str objects *are* byte arrays and they know nothing about the
   encoding of character data they contain.  We need to assume that the
   programmer has provided a str object that already uses the desired
   encoding. If you need something other than a pure copy of the bytes
   then you need to first decode the string.  For example:

       bytes(s.decode(encoding1), encoding2)
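
   The same decode-then-encode step, sketched for a present-day Python 3
   interpreter, where 8-bit data is a bytes object and bytes(text, encoding)
   is an accepted constructor form.  The sample data and the choice of
   Latin-1 and UTF-8 are illustrative assumptions.

```python
raw_latin1 = b"caf\xe9"            # bytes known to be Latin-1 encoded

# Decode with the source encoding, then build bytes in the target encoding.
as_utf8 = bytes(raw_latin1.decode("latin-1"), "utf-8")

print(list(raw_latin1))
print(list(as_utf8))
```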

Q: Why not have the encoding argument default to Latin-1 (or some other
   encoding that covers the entire byte range) rather than ASCII ?

A: The system default encoding for Python is ASCII.  It seems least
   confusing to use that default.  Also, in Py3k, using Latin-1 as
   the default might not be what users expect.  For example, they might
   prefer a Unicode encoding.  Any default will not always work as
   expected.  At least ASCII will complain loudly if you try to encode
   non-ASCII data.
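
   A brief sketch of the difference, run on a present-day Python 3
   interpreter: Latin-1 accepts the character without comment, while ASCII
   raises an error instead of guessing.

```python
text = "caf\u00e9"

latin1 = text.encode("latin-1")    # succeeds silently, right or wrong

try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True            # ASCII complains loudly

print(list(latin1), ascii_failed)
```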


This document has been placed in the public domain.

   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70

From guido at  Thu Feb 16 03:57:26 2006
From: guido at (Guido van Rossum)
Date: Wed, 15 Feb 2006 18:57:26 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
> > So how about
> > openbytes? This clearly links the resulting object with the bytes
> > type, which is mutually reassuring.
> That looks quite nice.
> Another thought -- what is going to happen to
> Will it change to return bytes, or will there be a new
> os.openbytes?

Hm, I hadn't thought about that yet. On Windows, has the
ability to set text or binary mode. But IMO it's better to make this
always use binary mode.

My expectation is that the Py3k standard I/O library will do all of
its own conversions on top of binary files anyway -- if you missed it,
I'd like to get rid of any ties to C's stdio.

--Guido van Rossum (home page:

From greg.ewing at  Thu Feb 16 04:00:16 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 16:00:16 +1300
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

M.-A. Lemburg wrote:

> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> renamed to str.openfile())

This seems wrong to me, because it creates an unnecessary
dependency of the bytes/str/unicode types on the file type.
These types should remain strictly focused on being just
containers for data.


From greg.ewing at  Thu Feb 16 04:06:45 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 16:06:45 +1300
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

Barry Warsaw wrote:

> If we go with two functions, I'd much rather hang them off of the file
> type object than add two new builtins.  I really do think file.bytes()
> and file.text() (a.k.a. open.bytes() and open.text()) is better than
> opentext() or openbytes().

I'm worried about feeping creaturism of the file type
here. To my mind, the file type already has too many
features, and this hinders code that wants to define
its own file-like objects.

In 3.0 I'd like to see the file type reduced to having
as simple an interface as possible (basically just
read/write) and all other stuff (readlines, text codecs,
etc.) implemented as wrappers around it.

To be compatible with that model, opentext() etc.
need to be factory functions returning the appropriate
stack of objects. As such they shouldn't be class
methods of any type.
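
A rough sketch of that layering, using the io module from present-day
Python 3.  The io classes are real; the factory names opentext and openbytes
are borrowed from this thread and are hypothetical, as is the choice of
UTF-8 as a default.

```python
import io
import os
import tempfile


def openbytes(path):
    # Raw file object wrapped in a buffering layer; reads return bytes.
    return io.BufferedReader(io.FileIO(path, "r"))


def opentext(path, encoding="utf-8"):
    # The same stack with a text codec layered on top; reads return str.
    return io.TextIOWrapper(openbytes(path), encoding=encoding)


fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as handle:
        handle.write("caf\u00e9\n".encode("utf-8"))
    with opentext(path) as reader:
        text_result = reader.read()
    with openbytes(path) as reader:
        bytes_result = reader.read()
finally:
    os.remove(path)

print(repr(text_result), repr(bytes_result))
```

Neither factory is a method of any type; each just assembles a stack of
wrapper objects around a minimal raw file.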


From greg.ewing at  Thu Feb 16 04:12:23 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 16:12:23 +1300
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
 coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Jason Orendorff wrote:

> Also the pseudo-encodings ('hex', 
> 'rot13', 'zip', 'uu', etc.) generally scare me.

I think these will have to cease being implemented as
encodings in 3.0. They should really never have been
in the first place.


From greg.ewing at  Thu Feb 16 04:29:32 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 16:29:32 +1300
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

Trent Mick wrote:

> On Windows you download an MSI (it ends up in your browser downloads
> folder), it starts the installation, and the end of the installation it
> starts the app for you.

Which then conveniently inserts a virus into my system.
No, thanks. (Okay up until that last bit, though.)

This isn't really a problem with the Mac, but with the
Mac-Web interface. If there were a file format (e.g.
.app.tar.gz) that the Mac would recognise as an app
and unpack automatically and put in an appropriate
place, things would be much the same. (Including
running it automatically, if you were insane enough
to turn that option on.)

> ...anyway this is getting seriously OT for python-dev. :)

Agreed. I will say no more about it here.


From kbk at  Thu Feb 16 05:00:22 2006
From: kbk at (Kurt B. Kaiser)
Date: Wed, 15 Feb 2006 23:00:22 -0500 (EST)
Subject: [Python-Dev] Weekly Python Patch/Bug Summary
Message-ID: <>

Patch / Bug Summary

Patches :  399 open ( +8) /  3042 closed ( +4) /  3441 total (+12)
Bugs    :  923 open ( +8) /  5553 closed (+13) /  6476 total (+21)
RFE     :  209 open ( +0) /   198 closed ( +1) /   407 total ( +1)

New / Reopened Patches

urllib proxy_bypass broken  (2006-02-07)  opened by  Anthony Tuininga

Implementation of PEP 357  (2006-02-09)
CLOSED  opened by  Travis Oliphant

pdb: fix for  1326406 (import  __main__ pdb failure)  (2006-02-10)  opened by  Ilya Sandler

Implement PEP 357 for real  (2006-02-11)  opened by  Travis Oliphant

PEP 338 implementation  (2006-02-11)  opened by  Nick Coghlan

PEP 338 documentation  (2006-02-11)  opened by  Nick Coghlan

Link Python modules to libpython on linux if --enable-shared  (2006-02-11)  opened by  Gustavo J. A. M. Carneiro

needs to know about doctests  (2006-02-11)  opened by  Marius Gedminas

Missing HCI sockets in bluetooth code from socketmodule  (2006-02-15)  opened by  Philippe Biondi

Patches Closed

Implementation of PEP 357  (2006-02-10)  closed by  ncoghlan

Prefer linking against ncursesw over ncurses library  (2006-02-09)  closed by  loewis

Enhancing '-m' to support packages (PEP 338)  (2004-10-09)  closed by  ncoghlan

File-iteration and read* method protection  (2006-01-05)  closed by  twouters

New / Reopened Bugs
___________________ bug / corrupt data  (2006-02-08)  opened by  Chris86

List not initialized if used as default argument  (2006-02-08)
CLOSED  opened by  Jason

Crash on invalid coding pragma  (2006-02-09)
CLOSED  opened by  ocean-city

add /usr/local support  (2006-02-09)
CLOSED  opened by  Karol Pietrzak

set documentation deficiencies  (2006-02-10)
CLOSED  opened by  Keith Briggs

For loop exit early  (2006-02-10)  opened by  msmith

segfault in FreeBSD  (2006-02-11)
CLOSED  opened by  aix-d

AttributeError on BadStatusLine  (2006-02-11)  opened by  Robert Kiendl

smtplib: empty mail addresses  (2006-02-12)  opened by  Freek Dijkstra

urlib2  (2006-02-13)  opened by  halfik

recursive __getattr__ in thread crashes OS X  (2006-02-12)  opened by  Aaron Swartz

CSV Sniffer fails to report mismatch of column counts  (2006-02-13)  opened by  Vinko

Logging hangs thread after detaching a StreamHandler's termi  (2006-02-13)  opened by  Yang Zhang

long path support in win32 part of os.listdir(posixmodule.c)  (2006-02-14)  opened by  Sergey Dorofeev

pydoc still doesn't handle lambda well  (2006-02-15)  opened by  William McVey

Descript of file-object read() method is wrong.  (2006-02-15)  opened by  Grant Edwards

arrayobject should use PyObject_VAR_HEAD  (2006-02-15)  opened by  Jim Jewett

Bugs Closed

Random stack corruption from socketmodule.c   (2004-01-13)  closed by  nnorwitz

patch for etree cdata and attr quoting  (2006-02-04)  closed by  effbot

List not initialized if used as default argument  (2006-02-08)  closed by  birkenfeld

modified to work with .NET 2005 on win64  (2006-02-06)  closed by  loewis

email.Message should supress warning from uu.decode  (2006-01-18)  closed by  bwarsaw

Crash on invalid coding pragma  (2006-02-09)  closed by  birkenfeld

add /usr/local support  (2006-02-10)  closed by  loewis

set documentation deficiencies  (2006-02-10)  closed by  birkenfeld

segfault in FreeBSD  (2006-02-11)  closed by  nnorwitz

typo in tutorial  (2006-02-12)  closed by  effbot

New / Reopened RFE

itertools.any and itertools.all  (2006-02-15)
CLOSED  opened by  paul cannon

RFE Closed

itertools.any and itertools.all  (2006-02-15)  closed by  birkenfeld

From aahz at  Thu Feb 16 05:20:31 2006
From: aahz at (Aahz)
Date: Wed, 15 Feb 2006 20:20:31 -0800
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349? [
	Was:Re: release plan for 2.5 ?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006, Bob Ippolito wrote:
> On Feb 15, 2006, at 6:35 PM, Aahz wrote:
>> On Tue, Feb 14, 2006, Guido van Rossum wrote:
>>> Anyway, I'm now convinced that bytes should act as an array of ints,
>>> where the ints are restricted to range(0, 256) but have type int.
>> range(0, 255)?
> No, Guido was correct.  range(0, 256) is [0, 1, 2, ..., 255].

My mistake -- I wasn't thinking of the literal Python function.
Aahz (aahz at           <*>


From jcarlson at  Thu Feb 16 06:36:03 2006
From: jcarlson at (Josiah Carlson)
Date: Wed, 15 Feb 2006 21:36:03 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Jason Orendorff wrote:
> > Also the pseudo-encodings ('hex', 
> > 'rot13', 'zip', 'uu', etc.) generally scare me.
> I think these will have to cease being implemented as
> encodings in 3.0. They should really never have been
> in the first place.

I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
and likely a few others that the two of you may be arguing against
should stay as encodings, because strictly speaking, they are defined as
encodings of data.  They may not be encodings of _unicode_ data, but
that doesn't mean that they aren't useful encodings for other kinds of
data, some text, some binary, ...
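
A sketch of how such transforms look in present-day Python 3, where they are
reachable through codecs.encode and codecs.decode rather than through the
str and bytes methods.  The codec names hex_codec and rot_13 are the
standard-library spellings; the sample data is illustrative.

```python
import codecs

# A bytes-to-bytes transform: not an encoding of Unicode text at all.
hexed = codecs.encode(b"\x5c\x53\x50\xff", "hex_codec")
unhexed = codecs.decode(hexed, "hex_codec")

# A text-to-text transform.
rotated = codecs.encode("Python", "rot_13")

print(hexed, list(unhexed), rotated)
```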

 - Josiah

From nnorwitz at  Thu Feb 16 06:36:01 2006
From: nnorwitz at (Neal Norwitz)
Date: Wed, 15 Feb 2006 21:36:01 -0800
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Barry Warsaw <barry at> wrote:
> I haven't been following the AST stuff closely enough, but I'm not crazy
> about putting access to this in the sys module.  It seems like it
> clutters that up with a name that will be rarely used by the average
> Python programmer.

Agreed.  I'm hoping we can get rid of lots of code in the compiler
module and use the AST provided from C.  The compiler module seems the
best place to put anything related to the AST.

Regardless of what internal approach is used, we can still try to
hash out a nice API.
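
For a sense of what such an API can look like, here is a sketch using the
ast module that present-day Python 3 ships.  That module is where the
C-generated AST ended up being exposed; it is not the API under discussion
in this thread.

```python
import ast

tree = ast.parse("x = 1 + 2")

assign = tree.body[0]    # the single top-level statement
names = [type(node).__name__ for node in ast.walk(tree)]

print(type(assign).__name__, type(assign.value).__name__)
print(names)
```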


From nnorwitz at  Thu Feb 16 06:44:19 2006
From: nnorwitz at (Neal Norwitz)
Date: Wed, 15 Feb 2006 21:44:19 -0800
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <dt04in$a5l$>
References: <>
	<> <dt04in$a5l$>
Message-ID: <>

On 2/15/06, Fredrik Lundh <fredrik at> wrote:
> (is the xmlplus/xmlcore issue still an issue, btw?)

What issue are you talking about?


From nnorwitz at  Thu Feb 16 06:50:47 2006
From: nnorwitz at (Neal Norwitz)
Date: Wed, 15 Feb 2006 21:50:47 -0800
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Alain Poirier <alain.poirier at> wrote:
>   - isn't the current implementation of itertools.tee (cache of previous
>     generated values) incompatible with the new possibility to feed a
>     generator (PEP 342) ?

I'm not sure what you are referring to.  What is the issue?
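
One possible reading of the question, offered only as a guess: a generator
can receive values through send(), but the iterators returned by
itertools.tee replay cached values and expose no send() method.  A sketch for
a present-day Python 3 interpreter; the generator name echo is hypothetical.

```python
import itertools


def echo():
    # Yields back whatever was most recently sent in.
    received = None
    while True:
        received = yield received


gen = echo()
next(gen)                     # advance to the first yield
reply = gen.send("hello")     # plain generators accept send()

left, right = itertools.tee(echo())
has_send = hasattr(left, "send")   # the tee iterators do not

print(reply, has_send)
```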


From brett at  Thu Feb 16 06:55:41 2006
From: brett at (Brett Cannon)
Date: Wed, 15 Feb 2006 21:55:41 -0800
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Jeremy Hylton <jeremy at> wrote:
> How about we arrange for some open space time at PyCon to discuss?
> Unfortunately, the compiler talk isn't until the last day and I can't
> stay for sprints.  It would be better to have the talk, then the open
> space, then the sprint.

I would definitely be interested in having an open space discussion on
where we want to go with this.  If we want to generate as much
interest as possible we should probably hold it the same day as your talk and have
you announce it.  Otherwise it could be scheduled at any time before
the sprints.


From aleaxit at  Thu Feb 16 06:59:55 2006
From: aleaxit at (Alex Martelli)
Date: Wed, 15 Feb 2006 21:59:55 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 15, 2006, at 9:51 AM, Barry Warsaw wrote:

> On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:
>> Regarding open vs. opentext, I'm still not sure. I don't want to
>> generalize from the openbytes precedent to openstr or openunicode
>> (especially since the former is wrong in 2.x and the latter is wrong
>> in 3.0). I'm tempted to hold out for open() since it's most
>> compatible.
> If we go with two functions, I'd much rather hang them off of the file
> type object than add two new builtins.  I really do think file.bytes()
> and file.text() (a.k.a. open.bytes() and open.text()) is better than
> opentext() or openbytes().

I agree, or, MAL's idea of and is also  
good.  My fondest dream is that we do NOT have an 'open' builtin  
which has proven to be very error-prone when used in Windows by  
newbies (as evidenced by beginner errors as seen on, the  
python-help lists, and other venues) -- defaulting 'open' to text is  
errorprone, defaulting it to binary doesn't seem the greatest idea  
either, principle "when in doubt, resist the temptation to guess"  
strongly suggests not having 'open' as a built-in at all.  (And  
namemangling into openthis and openthat seems less Pythonic to me
than exploiting namespaces by making structured names, either
and or open.this and open.that).  IOW, I entirely
agree with Barry and Marc Andre.


From brett at  Thu Feb 16 07:01:18 2006
From: brett at (Brett Cannon)
Date: Wed, 15 Feb 2006 22:01:18 -0800
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Neal Norwitz <nnorwitz at> wrote:
> On 2/15/06, Barry Warsaw <barry at> wrote:
> >
> > I haven't been following the AST stuff closely enough, but I'm not crazy
> > about putting access to this in the sys module.  It seems like it
> > clutters that up with a name that will be rarely used by the average
> > Python programmer.
> Agreed.  I'm hoping we can get rid of lots of code in the compiler
> module and use the AST provided from C.  The compiler module seems the
> best place to put anything related to the AST.

Sure, fine with me.  I am not in love with the sys idea, just seemed
reasonable.  I just happen to think of the compiler module as this
Python implementation of the bytecode compiler and not as this generic
package where all compiler-related stuff goes.  But if we move towards
removing the parts of the compiler package that overlap with any AST
being exposed that would be great.


From brett at  Thu Feb 16 07:14:17 2006
From: brett at (Brett Cannon)
Date: Wed, 15 Feb 2006 22:14:17 -0800
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/15/06, Nick Coghlan <ncoghlan at> wrote:
> Greg Ewing wrote:
> > Brett Cannon wrote:
> >> One protects us from ending up with an unusable AST since
> >> the serialization can keep the original AST around and if the version
> >> passed back in from Python code is junk it can be tossed and the
> >> original version used.
> >
> > I don't understand why this is an issue. If Python code
> > produces junk and tries to use it as an AST, then it's
> > buggy and deserves what it gets. All the AST compiler
> > should be responsible for is to try not to crash the
> > interpreter under those conditions. But that's true
> > whatever method is used for passing ASTs from Python
> > to the compiler.
> I'd prefer the AST node be real Python objects. The arena approach seems to be
> working reasonably well, but I still don't see a good reason for using a
> specialised memory allocation scheme when it really isn't necessary and we
> have a perfectly good memory management system for PyObject's.

If the compiler was hacked on by more people I would agree with this. 
But few people do and so I am not too worried about using a simple,
custom memory system as long as its use is clearly written out for
those few who do decide to work on it (and I am willing to be in
charge of that, regardless of which solution we go with).  Obviously
it could be argued that more people don't because of its "special"
coding style, but then again the old compiler wasn't special and very
few people touched that beast.

> On the 'unusable AST' front, if AST transformation code creates illegal
> output, then the main thing is to raise an exception complaining about what's
> wrong with it. I believe that may need a change to the compiler whether the
> modified AST was serialised or not.

That's fine, but I wasn't sure where this exception would be raised. 
I guess it would come up during the import of a module if it was
automatically passing the AST through a list of processing functions. 
Some might view it as not as bad as a segfault of the interpreter, but
worse than just an ImportError.  As I said, I am fine with allowing
modification, but others have expressed reservations.

> In terms of reverting back to the untransformed AST if the transformation
> fails, then that option is up to the code doing the transformation. Instead of
> serialising all the time (even for cases where the AST is just being inspected
> instead of transformed), we can either let the AST objects support the
> copy/deepcopy protocol, or else provide a method to clone a tree before trying
> to transform it.

I view it as a one-time serialization and a one-time conversion back. 
So the compiler goes C -> Python objects.  That is then subsequently
passed into the first function registered to access the AST.  The AST
returned by that function is then immediately and directly passed to
the next function in the list.  This continues until the last function
in which that returned AST is then converted back to the C
representation, verified, and then sent on to the bytecode compiler.
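
As a rough illustration of the flow described above (not the C
implementation under discussion), the chain of processing functions with
a fall-back to the untouched tree might look like this in present-day
Python. The `ast` module calls are real; `run_ast_pipeline` and `Doubler`
are hypothetical names, and using compile() as the verification step is
an assumption made for the sketch.

```python
import ast
import copy


def run_ast_pipeline(source, transformers, filename="<pipeline>"):
    """Hypothetical helper: pass an AST through each function in turn.

    If the result is not a usable module AST, fall back to the original.
    """
    original = ast.parse(source, filename)
    tree = copy.deepcopy(original)  # the original stays untouched
    try:
        for transform in transformers:
            tree = transform(tree)
            if not isinstance(tree, ast.Module):
                raise TypeError("transformer did not return a module AST")
        ast.fix_missing_locations(tree)
        return compile(tree, filename, "exec")  # verification happens here
    except (TypeError, ValueError):
        # Junk came back from a transformer: toss it, use the original.
        return compile(original, filename, "exec")


class Doubler(ast.NodeTransformer):
    """Example transformer: double every integer constant."""

    def visit_Constant(self, node):
        if isinstance(node.value, int):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


namespace = {}
exec(run_ast_pipeline("x = 21", [Doubler().visit]), namespace)

fallback = {}
exec(run_ast_pipeline("x = 21", [lambda tree: "junk"]), fallback)
```

The second call shows the "junk can be tossed" case: the transformer
returns a string, so the code object is built from the original tree.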

> A unified representation means we only have one API to learn, that is
> accessible from both Python and C. It also eliminates any need to either
> implement features twice (once in Python and once in C) or else let the Python
> and C API's diverge to the point where what you can do with one differs from
> what you can do with the other.

I suspect that any marshalling from C to Python will have a matching
object design based on the AST node layout in the ASDL.  So that API
won't really be different from C to Python if we stick with the arena

And I also realized that marshalling might just go straight C to
Python objects and not an intermediary step as I had in my head. 
Don't know why I thought it might need it or if anyone picked up on
that being a possibility.


From brett at  Thu Feb 16 07:15:33 2006
From: brett at (Brett Cannon)
Date: Wed, 15 Feb 2006 22:15:33 -0800
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/15/06, Nick Coghlan <ncoghlan at> wrote:
> Thomas Wouters wrote:
> > On Wed, Feb 15, 2006 at 07:28:36PM +1000, Nick Coghlan wrote:
> >
> >> On the 'unusable AST' front, if AST transformation code creates illegal
> >> output, then the main thing is to raise an exception complaining about
> >> what's wrong with it. I believe that may need a change to the compiler
> >> whether the modified AST was serialised or not.
> >
> > I would personally prefer the AST validation to be a separate part of the
> > compiler. It means the one or the other can be out of sync, but it also
> > means it can be accessed directly (validating AST before sending it to the
> > compiler) and the compiler (or CFG generator, or something between AST and
> > CFG) can decide not to validate internally generated AST for non-debug
> > builds, for instance.
> >
> > I like both those reasons.
> Aye, I was thinking much the same thing.

Yeah, I would want it to be a separate part as well.


From anthony at  Thu Feb 16 07:45:23 2006
From: anthony at (Anthony Baxter)
Date: Thu, 16 Feb 2006 17:45:23 +1100
Subject: [Python-Dev] 2.5 - I'm ok to do release management
Message-ID: <>

I'm still catching up on the hundreds of python-dev messages from the 
last couple of days, but a quick note first that I'm ok to do release 
management for 2.5

Anthony Baxter     <anthony at>
It's never too late to have a happy childhood.

From rasky at  Thu Feb 16 08:01:25 2006
From: rasky at (Giovanni Bajo)
Date: Thu, 16 Feb 2006 08:01:25 +0100
Subject: [Python-Dev] from __future__ import unicode_strings?
References: <dt0hej$j09$><>
Message-ID: <045901c632c6$cd545f90$1abf2997@bagio>

Thomas Wouters <thomas at> wrote:

>>>     from __future__ import unicode_strings
>> Didn't we have a command-line option to do this? I believe it was
>> removed because nobody could see the point. (Or am I hallucinating?
>> After several days of non-stop discussing bytes that must be
>> considered a possibility.)
> We do, and it's not been removed: the -U switch.

It's not in the output of "python -h", though. Is it secret or what?

Giovanni Bajo

From fredrik at  Thu Feb 16 08:18:59 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 08:18:59 +0100
Subject: [Python-Dev] 2.5 PEP
References: <><><>
Message-ID: <dt1915$gvh$>

(my mails to python-dev are bouncing; guess that's what you get when
you question the PSF's ability to build web sites...  trying again.)

Neal Norwitz wrote:

> > (is the xmlplus/xmlcore issue still an issue, btw?)
> What issue are you talking about?

 the changes described here

   "I'd like to propose that a new package be created in the standard library:

which led to this response from a pyxml maintainer:

   "I don't agree with the change. You just broke source compatibility
   between the core package and PyXML."


From arekm at  Thu Feb 16 08:28:41 2006
From: arekm at (Arkadiusz Miskiewicz)
Date: Thu, 16 Feb 2006 08:28:41 +0100
Subject: [Python-Dev] how bugfixes are handled?
References: <dt07a8$khp$>
Message-ID: <dt19ja$i8s$>

Guido van Rossum wrote:

> We're all volunteers here, and we get a large volume of bugs.
That's obvious (note, I'm not complaining, I'm asking ,,how it works for

> Unfortunately, bugfixes are reviewed on a voluntary basis.
> Are you aware of the standing offer that if you review 5 bugs/patches
> some of the developers will pay attention to your bug/patch?
I wasn't, thanks for information.

Still a few questions: does one of the developers/committers review a patch
and commit it? Or do several developers have to review a single patch?

Arkadiusz Miśkiewicz                    PLD/Linux Team

From nnorwitz at  Thu Feb 16 08:32:19 2006
From: nnorwitz at (Neal Norwitz)
Date: Wed, 15 Feb 2006 23:32:19 -0800
Subject: [Python-Dev] how bugfixes are handled?
In-Reply-To: <dt19ja$i8s$>
References: <dt07a8$khp$>
Message-ID: <>

On 2/15/06, Arkadiusz Miskiewicz <arekm at> wrote:
> Still a few questions: does one of the developers/committers review a patch
> and commit it? Or do several developers have to review a single patch?

One developer can review and commit a patch.  Sometimes we request
more input from other developers or interested parties.


From stefan.rank at  Thu Feb 16 08:37:56 2006
From: stefan.rank at (Stefan Rank)
Date: Thu, 16 Feb 2006 08:37:56 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>
Message-ID: <>

on 16.02.2006 06:59 Alex Martelli said the following:
> On Feb 15, 2006, at 9:51 AM, Barry Warsaw wrote:
>> On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:
>>> Regarding open vs. opentext, I'm still not sure. I don't want to
>>> generalize from the openbytes precedent to openstr or openunicode
>>> (especially since the former is wrong in 2.x and the latter is wrong
>>> in 3.0). I'm tempted to hold out for open() since it's most
>>> compatible.
>> If we go with two functions, I'd much rather hang them off of the file
>> type object than add two new builtins.  I really do think file.bytes()
>> and file.text() (a.k.a. open.bytes() and open.text()) is better than
>> opentext() or openbytes().
> I agree, or, MAL's idea of and is also  
> good.  My fondest dream is that we do NOT have an 'open' builtin  
> which has proven to be very error-prone when used in Windows by  
> newbies (as evidenced by beginner errors as seen on, the  
> python-help lists, and other venues) -- defaulting 'open' to text is  
> errorprone, defaulting it to binary doesn't seem the greatest idea  
> either, principle "when in doubt, resist the temptation to guess"  
> strongly suggests not having 'open' as a built-in at all.  (And  
> namemangling into openthis and openthat seems less Pythonic to me  
> than exploiting namespaces by making structured names, either  
> and or open.this and open.that).  IOW, I entirely  
> agree with Barry and Marc Andre.

`open`ing a file, i.e. constructing a `file` object, always requires a 
path argument.

In case Py3k manages to incorporate a `Path` object, it could be
more natural to have `.openbytes` and `.opentext` as methods on Path
objects. But `` and `text/unicode/` looks nice too.

Just for the record,

From bokr at  Thu Feb 16 08:54:41 2006
From: bokr at (Bengt Richter)
Date: Thu, 16 Feb 2006 07:54:41 GMT
Subject: [Python-Dev] str object going in Py3K
References: <>
Message-ID: <>

On Wed, 15 Feb 2006 18:57:26 -0800, Guido van Rossum <guido at> wrote:

>On 2/15/06, Greg Ewing <greg.ewing at> wrote:
>> Guido van Rossum wrote:
>> > So how about
>> > openbytes? This clearly links the resulting object with the bytes
>> > type, which is mutually reassuring.
>> That looks quite nice.
>> Another thought -- what is going to happen to
>> Will it change to return bytes, or will there be a new
>> os.openbytes?
>Hm, I hadn't thought about that yet. On Windows, has the
>ability to set text or binary mode. But IMO it's better to make this
>always use binary mode.
>My expectation is that the Py3k standard I/O library will do all of
>its own conversions on top of binary files anyway -- if you missed it,
>I'd like to get rid of any ties to C's stdio.
Would the standard I/O module have low level utility stream-processing generators
to do things like linesep normalization in text or splitlines etc? I.e., primitives
that could be composed for unforeseen usefulness, like unix pipeable stuff?

Maybe they could even be composable with '|' for unixy left->right piping, e.g., on windows

    for line in ('somepath') | linechunker | decoder('latin-1')): ...

where'path').__or__(linechunker) returns linechunker('path')),
which in turn has an __or__ to do similarly. Just had this bf, but ISTM it reads ok.
The equivalent nested generator expression with same assumed primitives would I guess be

    for line in decoder('latin-1')(linechunker(binaryfile('path'))): ...

which doesn't have the same natural left to right reading order to match processing order.
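
A minimal sketch of how such '|' composition could be wired up, under
stated assumptions: `Pipe` is a hypothetical wrapper class, and
`binaryfile` here takes an already-open binary stream rather than a path
so that the example is self-contained. `linechunker` and `decoder` are
the names used above, but their bodies are guesses at the intended
behaviour.

```python
import io


class Pipe:
    """Wrap an iterable so that `source | stage` feeds it into `stage`."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def __or__(self, stage):
        # stage is any callable taking an iterable and returning an iterable
        return Pipe(stage(self))


def binaryfile(stream, chunk_size=4):
    """Yield raw chunks from a binary stream (a path in the original idea)."""
    def chunks():
        while True:
            block = stream.read(chunk_size)
            if not block:
                return
            yield block
    return Pipe(chunks())


def linechunker(chunks):
    """Regroup arbitrary byte chunks into lines, without the b'\\n'."""
    pending = b""
    for block in chunks:
        pending += block
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line
    if pending:
        yield pending


def decoder(encoding):
    """Return a stage that decodes each line with the given encoding."""
    def decode(lines):
        for line in lines:
            yield line.decode(encoding)
    return decode


stream = io.BytesIO("caf\xe9\nna\xefve\n".encode("latin-1"))
lines = list(binaryfile(stream) | linechunker | decoder("latin-1"))
```

Each stage is just a function from iterable to iterable, so the nested
spelling decoder('latin-1')(linechunker(...)) still works unchanged.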

Bengt Richter

From talin at  Thu Feb 16 08:05:09 2006
From: talin at (Talin)
Date: Wed, 15 Feb 2006 23:05:09 -0800
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
Message-ID: <>

First off, let me apologize for bringing up a topic that I am sure that 
everyone is sick of: Lambda.

I broached this subject to a couple of members of this list privately, 
and I got wise feedback on my suggestions which basically amounted to 
"don't waste your time."

However, after having thought about this for several weeks, I came to 
the conclusion that I felt so strongly about this issue that the path of 
wisdom simply would not do, and I would have to choose the path of 
folly. Which I did.

In other words, I went ahead and implemented it. Actually, it wasn't too 
bad, it only took about an hour of reading the ast.c code and the 
Grammar file (neither of which I had ever looked at before) to get the 
general sense of what's going on.

So the general notion is similar to the various proposals on the Wiki - 
an inline keyword which serves the function of lambda. I chose the 
keyword "given" because it reminds me of math textbooks, e.g. "given x, 
solve for y". And I like the idea of syntactical structures that make 
sense when you read them aloud.

Here's an interactive console session showing it in action.

The first example shows a simple closure that returns the square of a
number:

    >>> a = (x*x given x)
    >>> a(9)
    81
You can also put parens around the argument list if you like:

    >>> a = (x*x given (x))
    >>> a(9)
    81

Same thing with two arguments, and with the optional parens:

    >>> a = (x*y given x,y)
    >>> a(9, 10)
    90
    >>> a = (x*y given (x,y))
    >>> a(9, 10)
    90

Yup, keyword arguments work too:

    >>> a = (x*y given (x=3,y=4))
    >>> a(9, 10)
    90
    >>> a(9)
    36
    >>> a()
    12

Use an empty paren-list to indicate that you want to define a closure 
with no arguments:

    >>> a = (True given ())
    >>> a()
    True

Note that there are some cases where you have to use the parens around 
the arguments to avoid a syntactical ambiguity:

    >>> map( str(x) given x, (1, 2, 3, 4) )
      File "<stdin>", line 1
        map( str(x) given x, (1, 2, 3, 4) )
    SyntaxError: invalid syntax

As you can see, adding the parens makes this work:

    >>> map( str(x) given (x), (1, 2, 3, 4) )
    ['1', '2', '3', '4']

More fun with "map":

    >>> map( str(x)*3 given (x), (1, 2, 3, 4) )
    ['111', '222', '333', '444']

Here's an example that uses the **args syntax:

    >>> a = (("%s=%s" % pair for pair in kwargs.items()) given **kwargs)
    >>> list( a(color="red") )
    ['color=red']
    >>> list( a(color="red", sky="blue") )
    ['color=red', 'sky=blue']
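
For anyone without the patch applied, the examples above can be
approximated with ordinary lambda. This is only a side-by-side sketch:
the 'given' spellings in the comments are from the session above, the
variable names are made up for illustration, and list() is wrapped
around map() because map returns an iterator in current Python.

```python
square = lambda x: x * x                      # (x*x given x)
product = lambda x, y: x * y                  # (x*y given x,y)
with_defaults = lambda x=3, y=4: x * y        # (x*y given (x=3,y=4))
always_true = lambda: True                    # (True given ())

# map( str(x) given (x), (1, 2, 3, 4) )
as_strings = list(map(lambda x: str(x), (1, 2, 3, 4)))

# map( str(x)*3 given (x), (1, 2, 3, 4) )
tripled = list(map(lambda x: str(x) * 3, (1, 2, 3, 4)))

# (("%s=%s" % pair for pair in kwargs.items()) given **kwargs)
pairs = lambda **kwargs: ("%s=%s" % pair for pair in kwargs.items())
```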

I have to say, the more I use it, the more I like it, but I'm sure that 
this is just a personal taste issue. It looks a lot more natural to me 
than lambda.

I should also mention that I resisted the temptation to make the 'given' 
keyword an optional generator suffix as in "(a for a in l given l)". As I
started working with the code, I started to realize that generators and 
closures, although they have some aspects in common, are very different 
beasts and should not be conflated lightly. (Plus the implementation 
would have been messy. I took that as a clue :))

Anyway, if anyone wants to play around with the patch, it is rather 
small - a couple of lines in Grammar, and a small new function in ast.c, 
plus a few mods to other functions to get them to call it. The context 
diff is less than two printed pages. I can post it somewhere if people 
are interested.

Anyway, I am not going to lobby for a language change or write a PEP 
(unless someone asks me to.) I just wanted to throw this out there and 
see what people think of it. I definately don't want to start a flame 
war, although I suspect I already have :/

Now I can stop thinking about this and go back to my TurboGears-based 
Thesaurus editor :)

-- Talin

From fredrik at  Thu Feb 16 09:33:47 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 09:33:47 +0100
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
References: <>
Message-ID: <dt1ddd$sv2$>

Talin wrote:

> So the general notion is similar to the various proposals on the Wiki -
> an inline keyword which serves the function of lambda. I chose the
> keyword "given" because it reminds me of math textbooks, e.g. "given x,
> solve for y". And I like the idea of syntactical structures that make
> sense when you read them aloud.

but that's about the only advantage you get from writing

    (x*x given x)

instead of

    lambda x: x*x

right ?  or did I miss some subtle detail here ?

> I definately don't want to start a flame war, although I suspect I already
> have :/

I think most about everything has already been said wrt lambda already,
but I guess we could have a little war on spelling issues ;-)


From p.f.moore at  Thu Feb 16 10:28:56 2006
From: p.f.moore at (Paul Moore)
Date: Thu, 16 Feb 2006 09:28:56 +0000
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <dt1ddd$sv2$>
References: <> <dt1ddd$sv2$>
Message-ID: <>

On 2/16/06, Fredrik Lundh <fredrik at> wrote:
> Talin wrote:
> > I definately don't want to start a flame war, although I suspect I already
> > have :/
> I think most about everything has already been said wrt lambda already,
> but I guess we could have a little war on spelling issues ;-)

Agreed, but credit to Talin for actually implementing his suggestion.
And it's nice to see that the AST makes this sort of experimentation


From gh at  Thu Feb 16 10:11:50 2006
From: gh at (=?ISO-8859-1?Q?Gerhard_H=E4ring?=)
Date: Thu, 16 Feb 2006 10:11:50 +0100
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	<>	<>	<>	<>	<dt020q$s7$>
	<dt0905$qm9$>	<>
Message-ID: <>

Jeremy Hylton wrote:
> I don't think this message is on-topic for python-dev.  There are lots
> of great places to discuss the design of the python web site, but the
> list for developers doesn't seem like a good place for it.  Do we need
> a different list for people to gripe^H^H^H^H^H discuss the web site? [...]

Such as ?

-- Gerhard

From greg.ewing at  Thu Feb 16 10:43:25 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 22:43:25 +1300
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
 coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:

> They may not be encodings of _unicode_ data,

But if they're not encodings of unicode data, what
business do they have being available through


From greg.ewing at  Thu Feb 16 10:55:42 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 16 Feb 2006 22:55:42 +1300
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Brett Cannon wrote:

> If the compiler was hacked on by more people I would agree with this. 
> But few people do

This has the potential to be a self-perpetuating situation.
There may be few people hacking on it now, but more people
may want to in the future. Those people may look at the
funky coding style and get discouraged, so there remains
only few people working on it, thus apparently justifying
the decision to keep the funky coding style.

Whereas if there weren't any funky coding style in the
first place, more potential compiler hackers might be
encouraged to have a go.

Also I'm still wondering why we're going to all this effort
to build a whole new AST and compiler structure if the
purpose isn't to *avoid* all this transformation between
different representations.


From mal at  Thu Feb 16 11:24:35 2006
From: mal at (M.-A. Lemburg)
Date: Thu, 16 Feb 2006 11:24:35 +0100
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <>
References: <dt0hej$j09$>	<>	<>
Message-ID: <>

Neil Schemenauer wrote:
> On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
>> On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:
>>>>     from __future__ import unicode_strings
>>> Didn't we have a command-line option to do this? I believe it was
>>> removed because nobody could see the point. (Or am I hallucinating?
>>> After several days of non-stop discussing bytes that must be
>>> considered a possibility.)
>> We do, and it's not been removed: the -U switch.
> As Guido alluded, the global switch is useless.  A per-module switch
> is something that could actually be useful.  One nice advantage is that
> you would write code that works the same with Jython (wrt to string
> literals anyhow).

The global switch is not useless. Its purpose is to test the
standard library (or any other piece of Python code) for Unicode
compatibility.

Since we're not even close to such compatibility, I'm not sure
how useful a per-module switch would be.

Marc-Andre Lemburg

From ncoghlan at  Thu Feb 16 11:57:14 2006
From: ncoghlan at (Nick Coghlan)
Date: Thu, 16 Feb 2006 20:57:14 +1000
Subject: [Python-Dev] PEP 338 issue finalisation (was Re:  2.5 PEP)
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Nick Coghlan <ncoghlan at> wrote:
>> PEP 338 is pretty much ready to go, too - just waiting on Guido's review and
>> pronouncement on the specific API used in the latest update (his last PEP
>> parade said he was OK with the general concept, but I only posted the PEP 302
>> compliant version after that).
> I  like the PEP and the implementation (which I downloaded from SF).
> Here are some comments in the form of diffs (attached).
> Do you have unit tests for everything? I believe I fixed a bug in the
> code that reads a bytecode file (it wasn't skipping the timestamp).

I haven't worked the filesystem based tests into the unit tests yet, and even 
the manual tests I was using managed to leave out compiled bytecode files (as 
you noticed). I'll fix that.

Given I do my testing on Linux, I probably still would have forgotten the 'rb' 
mode definitions on the relevant calls to open() though. . .

> +++ pep-0338.txt	(working copy)
> -    The optional argument ``init_globals`` may be used to pre-populate
> +    The optional argument ``init_globals`` may be a dictionary used to pre-populate
>      the globals dictionary before the code is executed. The supplied
>      dictionary will not be modified.

I just realised that anything that's a legal argument to "dict.update" will 
work. I'll fix the function description in the PEP (and the docs patch as well).
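
To make the point concrete, here is a small sketch of what "anything
that's a legal argument to dict.update" allows. `build_run_globals` is a
hypothetical stand-in for the relevant step, not the actual code in the
patch.

```python
def build_run_globals(init_globals=None):
    """Hypothetical helper: build a fresh globals dict for executed code."""
    run_globals = {}
    if init_globals is not None:
        # dict.update copies the entries, so the caller's object is untouched
        run_globals.update(init_globals)
    return run_globals


supplied = {"DEBUG": True}
from_mapping = build_run_globals(supplied)
from_pairs = build_run_globals([("DEBUG", True), ("LEVEL", 2)])

from_mapping["EXTRA"] = 1  # does not leak back into `supplied`
```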

> ---	Wed Feb 15 15:56:07 2006
>           def get_data(self, pathname):
> !             # XXX Unfortunately PEP 302 assumes text data :-(
> !             return open(pathname).read()


The PEP itself requests that a string be returned from get_data(), but doesn't 
require that the file be opened in text mode. Perhaps the PEP 302 emulation 
should use binary mode here? Otherwise there could be strange data corruption 
bugs on Windows.
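
A binary-mode version of the emulated get_data() could be as small as the
sketch below. It is written as a free function for brevity (the quoted
code is a method), and the temporary file exists only to show that
carriage returns and a Ctrl-Z byte survive the round trip.

```python
import os
import tempfile


def get_data(pathname):
    """Return the file contents as bytes, with no newline translation."""
    with open(pathname, "rb") as data_file:
        return data_file.read()


payload = b"line one\r\nline two\x1a\r\n"
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(payload)
    temp_path = handle.name
try:
    round_tripped = get_data(temp_path)
finally:
    os.remove(temp_path)
```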

> --- 337,349 ----
>   # This helper is needed as both the PEP 302 emulation and the
>   # main file execution functions want to read compiled files
> + # XXX marshal can also raise EOFError; perhaps that should be
> + # turned into ValueError?  Some callers expect ValueError.
>   def _read_compiled_file(compiled_file):
>       magic =
>       if magic != imp.get_magic():
>           raise ValueError("File not compiled for this Python version")
> + # Throw away timestamp
>       return marshal.load(compiled_file)

I'm happy to convert EOFError to ValueError here if you'd prefer (using the 
string representation of the EOFError as the message in the ValueError).

Or did you mean changing the behaviour in marshal itself?
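
For reference, the first option (converting at the call site) might look
roughly like this in present-day Python. Note the assumptions: the header
layout is the CPython 3.7+ one (magic number plus 12 more bytes), not the
4-byte timestamp of the 2.x files discussed here, and
importlib.util.MAGIC_NUMBER stands in for imp.get_magic().

```python
import importlib.util
import io
import marshal

MAGIC = importlib.util.MAGIC_NUMBER
REST_OF_HEADER = 12  # assumption: CPython 3.7+ layout (flags, mtime, size)


def read_compiled_file(compiled_file):
    """Return the code object from an open compiled file, or raise ValueError."""
    magic = compiled_file.read(len(MAGIC))
    if magic != MAGIC:
        raise ValueError("File not compiled for this Python version")
    compiled_file.read(REST_OF_HEADER)  # throw away the rest of the header
    try:
        return marshal.load(compiled_file)
    except EOFError as exc:
        # Reuse the EOFError text as the ValueError message.
        raise ValueError(str(exc)) from exc


code = compile("answer = 6 * 7", "<demo>", "exec")
good_file = io.BytesIO(MAGIC + bytes(REST_OF_HEADER) + marshal.dumps(code))
loaded = read_compiled_file(good_file)
```

Callers then only ever need to catch ValueError, whether the file has the
wrong magic number or is truncated.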

> --- 392,407 ----
>       loader = _get_loader(mod_name)
>       if loader is None:
>           raise ImportError("No module named " + mod_name)
> +     # XXX get_code() is an *optional* loader feature. Is that okay?
>       code = loader.get_code(mod_name)

If the loader doesn't provide access to the code object or the source code, 
then runpy can't really do anything useful with that module (e.g. if its a C 
builtin module). Given that PEP 302 states that if you provide get_source() 
you should also provide get_code(), this check should be sufficient to let 
runpy.run_module get to everything it can handle.

A case could be made for converting the attribute error to an ImportError, I 
guess. . .

>       filename = _get_filename(loader, mod_name)
>       if run_name is None:
>           run_name = mod_name
> +     # else:
> +         # XXX Should we also set sys.modules[run_name] = sys.modules[mod_name]?
> +         #     I know of code that does "import __main__".  It should probably
> +         #     get the substitute __main__ rather than the original __main__,
> +         #     if run_name != mod_name
>       return run_module_code(code, init_globals, run_name, 
>                              filename, loader, as_script)

Hmm, good point. How about a different solution, though: in run_module_code, I 
could create a new module object and put it temporarily in sys.modules, and 
then remove it when done (restoring the original module if necessary).

That would mean any module with code that looks up "sys.modules[__name__]" 
would still work when run via runpy.run_module or runpy.run_file.

I also realised that sys.argv[0] should be restored to its original value, too.

I'd then change the "as_script" flag to "alter_sys" and have it affect both of 
the above operations (and grab the import lock to avoid other import or 
run_module_code operations seeing the altered version of sys.modules).
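
A sketch of the save/alter/restore part of that idea, under stated
assumptions: `temporary_module` is a hypothetical helper name, the import
lock handling is left out entirely, and sys.argv is assumed to be
non-empty.

```python
import sys
import types
from contextlib import contextmanager


@contextmanager
def temporary_module(mod_name, script_name):
    """Hypothetical helper: install a fresh module and sys.argv[0], then undo."""
    missing = object()  # sentinel: distinguishes "absent" from any real entry
    saved_module = sys.modules.get(mod_name, missing)
    saved_argv0 = sys.argv[0]
    temp_module = types.ModuleType(mod_name)
    sys.modules[mod_name] = temp_module
    sys.argv[0] = script_name
    try:
        yield temp_module
    finally:
        sys.argv[0] = saved_argv0
        if saved_module is missing:
            del sys.modules[mod_name]
        else:
            sys.modules[mod_name] = saved_module


argv0_before = sys.argv[0]
with temporary_module("demo_main", "demo_script.py") as module:
    seen_inside = sys.modules["demo_main"] is module
    argv0_inside = sys.argv[0]
```

Code run inside the with block that looks up sys.modules[__name__] would
find the temporary module; afterwards both sys.modules and sys.argv[0]
are back to what they were.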

> --- 439,457 ----
>          Returns the resulting top level namespace dictionary
>          First tries to run as a compiled file, then as a source file
> +        XXX That's not the same algorithm as used by regular import;
> +            if the timestamp in the compiled file is not equal to the
> +            source file's mtime, the compiled file is ignored
> +            (unless there is no source file -- then the timestamp
> +            is ignored)

They're doing different things though - the import process uses that algorithm 
to decide which filename to use (.pyo, .pyc or .py). This code in run_file is 
trying to decide whether the supplied filename points to a compiled file or a 
source file without a tight coupling to the specific file extension used (e.g. 
so it works for Unix Python scripts that rely on the shebang line to identify 
which interpreter to use to run them).

I'll add a comment to that effect.

Another problem that occurred to me is that the module isn't thread safe at 
the moment. The PEP 302 emulation isn't protected by the import lock, and the 
changes to sys.argv in run_module_code will be visible across threads (and may 
clobber each other or the original if multiple threads invoke the function).

On that front, I'll change _get_path_loader to acquire and release the import 
lock, and the same for run_module_code when "alter_sys" is set to True.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rhamph at  Thu Feb 16 12:00:38 2006
From: rhamph at (Adam Olsen)
Date: Thu, 16 Feb 2006 04:00:38 -0700
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, "Martin v. Löwis" <martin at> wrote:
> Adam Olsen wrote:
> > Making it an error to have 8-bit str literals in 2.x would help
> > educate the user that they will change behavior in 3.0 and not be
> > 8-bit str literals anymore.
> You would like to ban string literals from the language? Remember:
> all string literals are currently 8-bit (byte) strings.

That's a rather literal interpretation of what I said. ;)  What I
meant was to only accept 7-bit characters, namely ascii.

Adam Olsen, aka Rhamphoryncus

From ncoghlan at  Thu Feb 16 12:06:59 2006
From: ncoghlan at (Nick Coghlan)
Date: Thu, 16 Feb 2006 21:06:59 +1000
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Neal Norwitz wrote:
> On 2/15/06, Alain Poirier <alain.poirier at> wrote:
>>   - isn't the current implementation of itertools.tee (cache of previous
>>     generated values) incompatible with the new possibility to feed a
>>     generator (PEP 342) ?
> I'm not sure what you are referring to.  What is the issue?

The 'tee' object doesn't have a "send" method. (This is true for all of the 
itertools iterators, I believe).

The request is misguided though - the itertools module is designed to operate 
on output-only iterators, not on generators that expect input via send(). 
Because the output values might depend on the values sent, then it makes no 
sense to cache them (or do most of the other things itertools does).

The relevant functionality would actually make the most sense as a fork() 
method on generators, but PEP 342 was trying to be fairly minimalist.
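
A quick way to see the mismatch for yourself (the generator here is a
made-up example; only itertools.tee and the generator protocol are real):

```python
import itertools


def running_total():
    """Generator whose output depends on the values passed in via send()."""
    total = 0
    while True:
        received = yield total
        total += received if received is not None else 1


gen = running_total()
left, right = itertools.tee(gen)

first_left = next(left)    # pulls a value from gen and caches it
first_right = next(right)  # served from the cache, gen is not advanced

tee_has_send = hasattr(left, "send")
gen_has_send = hasattr(gen, "send")
```

The tee branches only replay whatever came out of plain next() calls, so
there is nowhere for a sent value to go.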


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From rhamph at  Thu Feb 16 12:51:06 2006
From: rhamph at (Adam Olsen)
Date: Thu, 16 Feb 2006 04:51:06 -0700
Subject: [Python-Dev] Rename str/unicode to text [Was: Re: str object going
	in Py3K]
Message-ID: <>

On 2/15/06, Guido van Rossum <guido at> wrote:
> On 2/15/06, M.-A. Lemburg <mal at> wrote:
> > Barry Warsaw wrote:
> > > On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
> > >
> > >> Maybe a weird idea, but why not use static methods on the
> > >> bytes and str type objects for this ?!
> > >>
> > >> E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
> > >> renamed to str.openfile())
> > >
> > > That's also not a bad idea, but I'd leave off one or the other of the
> > > redundant "open" and "file" parts.  E.g. and
> > > seem fine to me (we all know what 'open' means, right? :).
> >
> > Thinking about it, I like your idea better (file.bytes()
> > and file.text()).
> This is better than making it a static/class method on file (which has
> the problem that it might return something that's not a file at all --
> file is a particular stream implementation, there may be others) but I
> don't like the tight coupling it creates between a data type and an
> I/O library. I still think that having global (i.e. built-in) factory
> functions for creating various stream types makes the most sense.

While we're at it, any chance of renaming str/unicode to text in 3.0? 
It's a MUCH better name, as evidenced by the opentext/openbytes names.
str is just some odd C-ism.

Obviously it's a form of gratuitous breakage, but I think the long
term benefits are enough that we need to be *sure* that the breakage
would be too much before we discount it.  This seems the right time to
discuss that.

(And no, I'm not suggesting any special semantics for text.  It's just
the name I want.)

str literal -> text literal
unicode literal -> text literal
text file -> text file (duh!)
tutorial section called "Strings" -> tutorial section called "Text"
Documentation Strings -> Documentation Text
String Pattern Matching -> Text Pattern Matching
String Services -> Text Services.  Actually this is a problem.  struct
should be used on bytes, not unicode/text.
textwrap -> textwrap
stringprep -> textprep?  Doesn't seem like a descriptive name
linecache "Random access to text lines"
gettext (not getstring!)

Adam Olsen, aka Rhamphoryncus

From barry at  Thu Feb 16 13:38:34 2006
From: barry at (Barry Warsaw)
Date: Thu, 16 Feb 2006 07:38:34 -0500
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 16, 2006, at 2:05 AM, Talin wrote:

> Anyway, if anyone wants to play around with the patch, it is rather
> small - a couple of lines in Grammar, and a small new function in  
> ast.c,
> plus a few mods to other functions to get them to call it. The context
> diff is less than two printed pages. I can post it somewhere if people
> are interested.

Please submit a SourceForge patch so others can play with it!


From ncoghlan at  Thu Feb 16 13:45:29 2006
From: ncoghlan at (Nick Coghlan)
Date: Thu, 16 Feb 2006 22:45:29 +1000
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <> <dt1ddd$sv2$>
Message-ID: <>

Paul Moore wrote:
> On 2/16/06, Fredrik Lundh <fredrik at> wrote:
>> Talin wrote:
>>> I definately don't want to start a flame war, although I suspect I already
>>> have :/
>> I think most about everything has already been said wrt lambda already,
>> but I guess we could have a little war on spelling issues ;-)
> Agreed, but credit to Talin for actually implementing his suggestion.
> And it's nice to see that the AST makes this sort of experimentation
> easier.

Aye to both of those comments (and the infix spelling really is kind of 
pretty). Who knows, maybe Guido will decide he wants to change the spelling 
some day. Probably only if the sky fell on him or something, though ;)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From jeremy at  Thu Feb 16 13:49:08 2006
From: jeremy at (Jeremy Hylton)
Date: Thu, 16 Feb 2006 07:49:08 -0500
Subject: [Python-Dev] C AST to Python discussion
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/16/06, Greg Ewing <greg.ewing at> wrote:
> Whereas if there weren't any funky coding style in the
> first place, more potential compiler hackers might be
> encouraged to have a go.

I'm trying to make the code simple.  The style of code is different
than other parts of Python, but a compiler is different than a
bytecode engine or implementations of basic types.  Different problem
domains lead to different program structure.

> Also I'm still wondering why we're going to all this effort
> to build a whole new AST and compiler structure if the
> purpose isn't to *avoid* all this transformation between
> different representations.

The goal is to get the right representation for the problem.  It was
harder to understand and modify the compiler when it worked on the
concrete parse trees.  The compiler now has a couple of abstractions
that are well suited to particular phases of compilation.


From exarkun at  Thu Feb 16 15:10:20 2006
From: exarkun at (Jean-Paul Calderone)
Date: Thu, 16 Feb 2006 09:10:20 -0500
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <>
Message-ID: <20060216141020.6122.443333891.divmod.quotient.541@ohm>

On Thu, 16 Feb 2006 11:24:35 +0100, "M.-A. Lemburg" <mal at> wrote:
>Neil Schemenauer wrote:
>> On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
>>> On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:
>>>>>     from __future__ import unicode_strings
>>>> Didn't we have a command-line option to do this? I believe it was
>>>> removed because nobody could see the point. (Or am I hallucinating?
>>>> After several days of non-stop discussing bytes that must be
>>>> considered a possibility.)
>>> We do, and it's not been removed: the -U switch.
>> As Guido alluded, the global switch is useless.  A per-module switch
>> is something that could actually be useful.  One nice advantage is that
>> you would write code that works the same with Jython (wrt string
>> literals anyhow).
>The global switch is not useless. Its purpose is to test the
>standard library (or any other piece of Python code) for Unicode
>compatibility.
>Since we're not even close to such compatibility, I'm not sure
>how useful a per-module switch would be.

Just what Neil suggested: developers writing new code benefit from having the behavior which will ultimately be Python's default, rather than the behavior that is known to be destined for obsolescence.

Being able to turn this on per-module is useful for the same reason the rest of the future system is useful on a per-module basis.  It's easier to convert things incrementally than monolithically.


From guido at  Thu Feb 16 16:07:41 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 07:07:41 -0800
Subject: [Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Nick Coghlan <ncoghlan at> wrote:
> Guido van Rossum wrote:
> > Do you have unit tests for everything? I believe I fixed a bug in the
> > code that reads a bytecode file (it wasn't skipping the timestamp).

[Hey, I thought I sent that just to you. Is python-dev really
interested in this?]

> I haven't worked the filesystem based tests into the unit tests yet, and even
> the manual tests I was using managed to leave out compiled bytecode files (as
> you noticed). I'll fix that.
> Given I do my testing on Linux, I probably still would have forgotten the 'rb'
> mode definitions on the relevant calls to open() though. . .

But running the unit tests on Windows would have revealed the problem.

> > +++ pep-0338.txt      (working copy)
> > -    The optional argument ``init_globals`` may be used to pre-populate
> > +    The optional argument ``init_globals`` may be a dictionary used to pre-populate
> >      the globals dictionary before the code is executed. The supplied
> >      dictionary will not be modified.
> I just realised that anything that's a legal argument to "dict.update" will
> work. I'll fix the function description in the PEP (and the docs patch as well).

I'm not sure that's a good idea -- you'll never be able to switch to a
different implementation then.

> > ---  Wed Feb 15 15:56:07 2006
> >           def get_data(self, pathname):
> > !             # XXX Unfortunately PEP 302 assumes text data :-(
> > !             return open(pathname).read()
> Hmm.
> The PEP itself requests that a string be returned from get_data(), but doesn't
> require that the file be opened in text mode. Perhaps the PEP 302 emulation
> should use binary mode here? Otherwise there could be strange data corruption
> bugs on Windows.

But PEP 302 shows as its only example reading from a file with a .txt
extension. Adding spurious \r characters is also data corruption. We
should probably post to python-dev a request for clarification of PEP
302, but in the mean time I vote for text mode.

> > --- 337,349 ----
> >
> >   # This helper is needed as both the PEP 302 emulation and the
> >   # main file execution functions want to read compiled files
> > + # XXX marshal can also raise EOFError; perhaps that should be
> > + # turned into ValueError?  Some callers expect ValueError.
> >   def _read_compiled_file(compiled_file):
> >       magic =
> >       if magic != imp.get_magic():
> >           raise ValueError("File not compiled for this Python version")
> > + # Throw away timestamp
> >       return marshal.load(compiled_file)
> I'm happy to convert EOFError to ValueError here if you'd prefer (using the
> string representation of the EOFError as the message in the ValueError).
> Or did you mean changing the behaviour in marshal itself?

No -- the alternative is to catch EOFError in _read_compiled_file()'s
caller, but that seems worse. You should check marshal.c if it can
raise any *other* errors (perhaps OverflowError?).

Also, *perhaps* it makes more sense to return None instead of raising
ValueError? Since you're always catching it? (Or are you?)
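
A minimal sketch of the EOFError-to-ValueError conversion discussed above, assuming Python 3 and using an in-memory stream instead of a real compiled file. `load_marshalled` is a hypothetical name; it is not the `_read_compiled_file` helper from the patch, and it skips the magic number and timestamp handling entirely:

```python
import io
import marshal


def load_marshalled(stream):
    """Hypothetical helper: load one marshalled object, reporting EOF as ValueError."""
    try:
        return marshal.load(stream)
    except EOFError as exc:
        # Reuse the EOFError text as the ValueError message.
        raise ValueError(str(exc))


good = load_marshalled(io.BytesIO(marshal.dumps({"answer": 42})))

try:
    load_marshalled(io.BytesIO(b""))   # empty stream: marshal hits end of file
    converted = False
except ValueError:
    converted = True
```

Callers that already expect ValueError then need only one except clause. Whether marshal can raise other exception types is a separate question that this sketch does not answer.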

> > --- 392,407 ----
> >       loader = _get_loader(mod_name)
> >       if loader is None:
> >           raise ImportError("No module named " + mod_name)
> > +     # XXX get_code() is an *optional* loader feature. Is that okay?
> >       code = loader.get_code(mod_name)
> If the loader doesn't provide access to the code object or the source code,
> then runpy can't really do anything useful with that module (e.g. if its a C
> builtin module). Given that PEP 302 states that if you provide get_source()
> you should also provide get_code(), this check should be sufficient to let
> runpy.run_module get to everything it can handle.

OK. But a loader could return None from get_code() -- do you check for
that? (I don't have the source handy here.)

> A case could be made for converting the attribute error to an ImportError, I
> guess. . .

I'm generally not keen on that; leave it.

> >       filename = _get_filename(loader, mod_name)
> >       if run_name is None:
> >           run_name = mod_name
> > +     # else:
> > +         # XXX Should we also set sys.modules[run_name] = sys.modules[mod_name]?
> > +         #     I know of code that does "import __main__".  It should probably
> > +         #     get the substitute __main__ rather than the original __main__,
> > +         #     if run_name != mod_name
> >       return run_module_code(code, init_globals, run_name,
> >                              filename, loader, as_script)
> Hmm, good point. How about a different solution, though: in run_module_code, I
> could create a new module object and put it temporarily in sys.modules, and
> then remove it when done (restoring the original module if necessary).

That might work too.

What happens when you execute "" as __main__ and then (perhaps
indirectly) something does "import foo"? Does a second copy of
get loaded by the regular loader?

> That would mean any module with code that looks up "sys.modules[__name__]"
> would still work when run via runpy.run_module or runpy.run_file.

Yeah, except if they do that, they're likely to also *assign* to that.
Well, maybe that would just work, too...

> I also realised that sys.argv[0] should be restored to its original value, too.


> I'd then change the "as_script" flag to "alter_sys" and have it affect both of
> the above operations (and grab the import lock to avoid other import or
> run_module_code operations seeing the altered version of sys.modules).

Makes sense.
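
The "install temporarily, then restore" idea can be sketched as a context manager. This is an illustration under stated assumptions: `temporary_module` is a hypothetical name, it is not the runpy implementation, and it does not take the import lock mentioned above:

```python
import sys
import types
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def temporary_module(name):
    """Hypothetical helper: install a fresh module under `name`, then restore."""
    saved = sys.modules.get(name, _MISSING)
    module = types.ModuleType(name)
    sys.modules[name] = module
    try:
        yield module
    finally:
        # Put back whatever was there before, or remove our entry entirely.
        if saved is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved


with temporary_module("hypothetical_main") as mod:
    visible_inside = sys.modules["hypothetical_main"] is mod
visible_after = "hypothetical_main" in sys.modules
```

The same try/finally shape would apply to saving and restoring `sys.argv[0]`.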

I do wonder if isn't getting a bit over-engineered -- it
seems a lot of the functionality isn't actually necessary to implement
-m, and the usefulness in other circumstances is as yet
unproven. What do you think of taking a dose of YAGNI here?
(Especially since I notice that most of the public APIs are very thin
layers over exec or execfile -- people can just use those directly.)

> > --- 439,457 ----
> >
> >          Returns the resulting top level namespace dictionary
> >          First tries to run as a compiled file, then as a source file
> > +        XXX That's not the same algorithm as used by regular import;
> > +            if the timestamp in the compiled file is not equal to the
> > +            source file's mtime, the compiled file is ignored
> > +            (unless there is no source file -- then the timestamp
> > +            is ignored)
> They're doing different things though - the import process uses that algorithm
> to decide which filename to use (.pyo, .pyc or .py). This code in run_file is
> trying to decide whether the supplied filename points to a compiled file or a
> source file without a tight coupling to the specific file extension used (e.g.
> so it works for Unix Python scripts that rely on the shebang line to identify
> which interpreter to use to run them).
> I'll add a comment to that effect.

Ah, good point. So you never go from to foo.pyc, right?

> Another problem that occurred to me is that the module isn't thread safe at
> the moment. The PEP 302 emulation isn't protected by the import lock, and the
> changes to sys.argv in run_module_code will be visible across threads (and may
> clobber each other or the original if multiple threads invoke the function).

Another reason to consider cutting it down to only what's needed by
-m; -m doesn't need thread-safety (I think).

> On that front, I'll change _get_path_loader to acquire and release the import
> lock, and the same for run_module_code when "alter_sys" is set to True.

OK, just be very, very careful. The import lock is not a regular mutex
and if you don't release it you're stuck forever. Just use

--Guido van Rossum (home page:

From fredrik at  Thu Feb 16 16:23:10 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 16:23:10 +0100
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
References: <> <dt1ddd$sv2$>
Message-ID: <dt25d1$hne$>

Paul Moore wrote:

> > I think most about everything has already been said wrt lambda already,
> > but I guess we could have a little war on spelling issues ;-)
> Agreed, but credit to Talin for actually implementing his suggestion.
> And it's nice to see that the AST makes this sort of experimentation
> easier.

absolutely! +1 on experimentation!


From mal at  Thu Feb 16 16:29:40 2006
From: mal at (M.-A. Lemburg)
Date: Thu, 16 Feb 2006 16:29:40 +0100
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <20060216141020.6122.443333891.divmod.quotient.541@ohm>
References: <20060216141020.6122.443333891.divmod.quotient.541@ohm>
Message-ID: <>

Jean-Paul Calderone wrote:
> On Thu, 16 Feb 2006 11:24:35 +0100, "M.-A. Lemburg" <mal at> wrote:
>> Neil Schemenauer wrote:
>>> On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
>>>> On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:
>>>>>>     from __future__ import unicode_strings
>>>>> Didn't we have a command-line option to do this? I believe it was
>>>>> removed because nobody could see the point. (Or am I hallucinating?
>>>>> After several days of non-stop discussing bytes that must be
>>>>> considered a possibility.)
>>>> We do, and it's not been removed: the -U switch.
>>> As Guido alluded, the global switch is useless.  A per-module switch
>>> is something that could actually be useful.  One nice advantage is that
>>> you would write code that works the same with Jython (wrt string
>>> literals anyhow).
>> The global switch is not useless. Its purpose is to test the
>> standard library (or any other piece of Python code) for Unicode
>> compatibility.
>> Since we're not even close to such compatibility, I'm not sure
>> how useful a per-module switch would be.
> Just what Neil suggested: developers writing new code benefit from having the behavior which will ultimately be Python's default, rather than the behavior that is known to be destined for obsolescence.
> Being able to turn this on per-module is useful for the same reason the rest of the future system is useful on a per-module basis.  It's easier to convert things incrementally than monolithically.

Sure, but in this case the option would not only affect the module
you define it in, but also all other code that now gets Unicode
objects instead of strings as a result of the Unicode literals
defined in these modules.

It is rather likely that you'll start hitting Unicode-related
compatibility bugs in the standard lib more often than you'd

It's usually better to switch to Unicode in a controlled manner:
not by switching all literals to Unicode, but only some, then
test things, then switch over some more, etc.

This can be done by prepending the literal with the u"" modifier.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 16 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From p.f.moore at  Thu Feb 16 16:39:30 2006
From: p.f.moore at (Paul Moore)
Date: Thu, 16 Feb 2006 15:39:30 +0000
Subject: [Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Guido van Rossum <guido at> wrote:
> On 2/16/06, Nick Coghlan <ncoghlan at> wrote:

> > The PEP itself requests that a string be returned from get_data(), but doesn't
> > require that the file be opened in text mode. Perhaps the PEP 302 emulation
> > should use binary mode here? Otherwise there could be strange data corruption
> > bugs on Windows.
> But PEP 302 shows as its only example reading from a file with a .txt
> extension. Adding spurious \r characters is also data corruption. We
> should probably post to python-dev a request for clarification of PEP
> 302, but in the mean time I vote for text mode.

FWIW, the .txt example was just a toy example. I'd say that binary
mode makes sense, as I can imagine using the get_data interface to
load image files, for example. It makes getting text files a bit
harder (you have to munge CRLF manually) but at least you have the
*option* of getting binary files.

On reflection, get_data should probably have been given a mode
argument. But given that it didn't, binary seems safest.
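
A small sketch of the difference being argued about, assuming Python 3 semantics (where text mode applies universal newline translation on every platform, unlike the Python 2 of this thread, where the effect was Windows-only). The temporary file and its contents are made up for illustration:

```python
import os
import tempfile

payload = b"line1\r\nline2\r\n"

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(payload)
    path = handle.name

try:
    # Binary mode returns the bytes exactly as stored.
    with open(path, "rb") as f:
        as_bytes = f.read()
    # Text mode translates line endings, so the data no longer round-trips.
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()
finally:
    os.remove(path)
```

For an image or other binary payload the text-mode translation would be corruption; for a text file the binary read leaves the caller to handle CRLF.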

OTOH, I don't know who actually *uses* get_data for real (PJE, for
eggs? py2exe?). Their opinions are likely to be of more importance.

On the third hand, doing whatever the zipimport module does is likely
to be right, as that's the key bit of prior art.

Regardless, the PEP should be clarified. I'll make the change once
agreement is reached.


From 2005a at  Thu Feb 16 17:16:36 2006
From: 2005a at (Alexander Schremmer)
Date: Thu, 16 Feb 2006 17:16:36 +0100
Subject: [Python-Dev] still available
References: <dsq741$4un$>	<>	<>	<>
Message-ID: <19igus5puu6e2$>

On Wed, 15 Feb 2006 21:13:14 +0100, Georg Brandl wrote:

> If something like Fredrik's new doc system is adopted, it would be extremely
> convenient to refer someone to just

In fact, PHP does it like which is even shorter, i.e.
they fallback to the documentation if that path does not exist otherwise.

Kind regards,

From fredrik at  Thu Feb 16 07:18:02 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 07:18:02 +0100
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
	<> <dt04in$a5l$>
Message-ID: <>

> > (is the xmlplus/xmlcore issue still an issue, btw?)
> What issue are you talking about?

the changes described here

    "I'd like to propose that a new package be created in the standard library:

which led to this response from a pyxml maintainer:

    "I don't agree with the change. You just broke source compatibility
    between the core package and PyXML."


From nicolas.chauvat at  Thu Feb 16 09:57:00 2006
From: nicolas.chauvat at (Nicolas Chauvat)
Date: Thu, 16 Feb 2006 09:57:00 +0100
Subject: [Python-Dev] [Python-projects] AST in Python 2.5
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 15, 2006 at 09:40:17PM -0800, Neal Norwitz wrote:
> I'm not sure if anyone here is following the AST discussion on
> python-dev, but it would be great if you had any input.  pylint is a
> pretty big consumer of the compiler module and the decisions with
> respect to the AST could impact you.

We will jump in with better comments, but I just wanted to make sure
you knew about:

and the work being done in PyPy:

Here is a bit from our EU reports that is about Workpackage 10 
"Aspects and Contracts in Python":

WP10 Status

Extend language with aspects and contracts

* researched how other languages do it (AspectJ, HyperJ, AspectS, etc.)
* started allowing AST manipulation (for weaving code and function calls)
* started allowing grammar manipulation (for experimenting with syntax)

WP10 Status (cont.)

AST and grammar manipulation

* needed for both WP9 and WP10
* AST nodes are exposed at application-level and a compiler hook
  allows modifying the AST at compile-time
* syntax can be modified at run-time, but still limited because
  grammar objects are not fully exposed at application-level

WP10 Status (cont.)

AST manipulation example::
    >>>> 3 + 3
    >>>> from parser import install_compiler_hook
    >>>> from hooks import _3becomes2
    >>>> install_compiler_hook(_3becomes2)
    >>>> 3 + 3

Nicolas Chauvat - services en informatique avancée et gestion de connaissances

From fredrik at  Thu Feb 16 11:27:49 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 11:27:49 +0100
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <> <dt1ddd$sv2$>
Message-ID: <>

Paul Moore wrote:

> > > I definately don't want to start a flame war, although I suspect I already
> > > have :/
> >
> > I think most about everything has already been said wrt lambda already,
> > but I guess we could have a little war on spelling issues ;-)
> Agreed, but credit to Talin for actually implementing his suggestion.
> And it's nice to see that the AST makes this sort of experimentation
> easier.

absolutely! +1 for experimentation.


From bsder at  Thu Feb 16 11:36:05 2006
From: bsder at (Andrew Lentvorski)
Date: Thu, 16 Feb 2006 02:36:05 -0800
Subject: [Python-Dev] nice()
In-Reply-To: <000e01c63226$fc342660$7c2c4fca@csmith>
References: <>
Message-ID: <>

Smith wrote:
> Everyone knows that fp numbers must be compared with caution, but
> there is a void in the relative-error department for exercising such
> caution, thus the proposal for something like 'areclose'. The problem
> with areclose(), however, is that it only solves one part of the
> problem that needs to be solved if two fp's *are* going to be
> compared: if you are going to check if a < b you would need to do
> something like
> not areclose(a,b) and a < b


This kind of function, at best, delays the newbie pain of learning about
binary floating point very slightly.  No matter how you set your test,
I can make a pathological case which will catch at the boundary.  The 
standard deviation formula; the area of triangle formula which fails on 
slivers; ill-conditioned linear equations--the examples are endless 
which can trip up newbies.
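
For concreteness, the comparison Smith describes can be sketched with `math.isclose`, which entered the standard library long after this thread and is used here only as a stand-in for the proposed `areclose`. The helper name `definitely_less` and the tolerance value are assumptions:

```python
import math


def definitely_less(a, b, rel_tol=1e-9):
    """Hypothetical helper: True only if a < b and the two are not 'close'."""
    return not math.isclose(a, b, rel_tol=rel_tol) and a < b


x = 0.1 + 0.2
plain = 0.3 < x                          # raw comparison sees rounding error
tolerant = definitely_less(0.3, x)       # treats the two values as equal
clear_case = definitely_less(1.0, 2.0)   # well-separated values still compare
```

The tolerance only moves the boundary: values that differ by almost exactly `rel_tol` will still flip unpredictably, which is the pathological case described above.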

On the other hand, people who do care about accurate numerical analysis 
will not trust that the people who wrote the library really had enough 
numerical sophistication and will simply rewrite the test *anyhow*.

The "best" solution would be to optimize the Decimal module into 
something sufficiently fast that binary floating point goes away by 
default in Python.
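
A two-line illustration of why decimal arithmetic avoids the most common newbie surprise, assuming the standard `decimal` module and values built from strings rather than floats:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
binary_equal = (0.1 + 0.2 == 0.3)

# Decimal stores these values exactly, so the sum compares equal.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

This says nothing about speed, which is the obstacle named above.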

A nice reference about binary floating point is:
"What Every Computer Scientist Should Know About Floating-Point 
Arithmetic" by David Goldberg (available *everywhere*)

For truly pedantic details about the gory nastiness of binary floating 
point, see William Kahan's homepage at Berkeley:


From mal at  Thu Feb 16 17:30:32 2006
From: mal at (M.-A. Lemburg)
Date: Thu, 16 Feb 2006 17:30:32 +0100
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <045901c632c6$cd545f90$1abf2997@bagio>
References: <dt0hej$j09$><>	<>
Message-ID: <>

Giovanni Bajo wrote:
> Thomas Wouters <thomas at> wrote:
>>>>     from __future__ import unicode_strings
>>> Didn't we have a command-line option to do this? I believe it was
>>> removed because nobody could see the point. (Or am I hallucinating?
>>> After several days of non-stop discussing bytes that must be
>>> considered a possibility.)
>> We do, and it's not been removed: the -U switch.
> It's not in the output of "python -h", though. Is it secret or what?


We removed it from the help output to not confuse users
who are not aware of the fact that this is an experimental

Marc-Andre Lemburg


From John.Marshall at  Thu Feb 16 17:30:02 2006
From: John.Marshall at (John Marshall)
Date: Thu, 16 Feb 2006 16:30:02 +0000
Subject: [Python-Dev] Does eval() leak?
Message-ID: <>


Should I expect the virtual memory allocation
to go up if I do the following?
raw = open("data").read()
while True:
	d = eval(raw)

I would have expected the memory allocated to the
object referenced by d to be deallocated, garbage
collected, and reallocated for the new eval(raw)
results, assigned to d.

The file contains a large, SIMPLE (no self refs; all
native python types/objects) dictionary (>300K).

While doing 'd = eval(raw)' in the python interpreter
I am monitoring the VIRT column of top and it keeps
increasing until I run out of memory.

When I use a safe_eval() from:
I have no memory problems.

I see this under python 2.3.5 (fast and obvious).


From fredrik at  Thu Feb 16 18:13:53 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 18:13:53 +0100
Subject: [Python-Dev] bytes type discussion
References: <><dt09vc$tvv$>
Message-ID: <dt2bsi$cb4$>

Barry Warsaw wrote:

> We know at least there will never be a 2.10, so I think we still have
> time.

because there's no way to count to 10 if you only have one digit?

we used to think that back when the gas price was just below 10 SEK/L,
but they found a way...


From walter at  Thu Feb 16 18:14:45 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 16 Feb 2006 18:14:45 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:

> On Wed, 15 Feb 2006 18:57:26 -0800, Guido van Rossum <guido at> wrote:
>> [...]
>> My expectation is that the Py3k standard I/O library will do all of
>> its own conversions on top of binary files anyway -- if you missed it,
>> I'd like to get rid of any ties to C's stdio.
> Would the standard I/O module have low level utility stream-processing generators
> to do things like linesep normalization in text or splitlines etc? I.e., primitives
> that could be composed for unforeseen usefulness, like unix pipeable stuff?
> Maybe they could even be composable with '|' for unixy left->right piping, e.g., on windows
>     for line in ('somepath') | linechunker | decoder('latin-1')): ...
> where'path').__or__(linechunker) returns linechunker('path')),
> which in turn has an __or__ to do similarly. Just had this bf, but ISTM it reads ok.
> The equivalent nested generator expression with same assumed primitives would I guess be
>     for line in decoder('latin-1')(linechunker(binaryfile('path'))): ...
> which doesn't have the same natural left to right reading order to match processing order.

I'm currently implementing something like this, which might go into 
IPython. See for 
code. (This requires the current IPython svn trunk)


for f in ils("/usr/lib/python2.3/") | ifilter("name.endswith('.py')"):
    print, f.size

for p in ipwd | ifilter("shell=='/bin/false'") | isort("uid") | \
    ieval('"%s (%s)" % (, _.gecos)'):
    print p
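
The `|` composition itself can be sketched in a few lines. This is a minimal illustration, not the IPython code referred to above: `Stage`, `linechunker` and `decoder` are hypothetical names borrowed from the quoted message, and the sketch uses `__ror__` so that any plain iterable can sit on the left:

```python
class Stage:
    """Hypothetical pipeline stage wrapping a function from iterable to iterable."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, source):
        # Called for `source | stage` when `source` does not handle `|` itself.
        return self.func(source)


def _chunk_lines(chunks):
    """Reassemble arbitrary byte chunks into newline-terminated lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield line
    if buffer:
        yield buffer


linechunker = Stage(_chunk_lines)


def decoder(encoding):
    """Build a stage that decodes each byte line with the given encoding."""
    return Stage(lambda lines: (line.decode(encoding) for line in lines))


chunks = [b"caf\xe9\nb", b"ar\n"]
lines = list(chunks | linechunker | decoder("latin-1"))
```

Each stage returns a generator, so nothing is read until the final consumer iterates, and processing order matches reading order from left to right.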

The other part of the project is a curses based browser for the output 
of these pipelines. See for a screenshot 
of the result of ils("/usr/lib/python2.3/") | 

    Walter Dörwald

From shane.holloway at  Thu Feb 16 18:20:24 2006
From: shane.holloway at (Shane Holloway (IEEE))
Date: Thu, 16 Feb 2006 10:20:24 -0700
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 15, 2006, at 20:06, Greg Ewing wrote:

> Barry Warsaw wrote:
>> If we go with two functions, I'd much rather hang them off of the  
>> file
>> type object then add two new builtins.  I really do think  
>> file.bytes()
>> and file.text() (a.k.a. open.bytes() and open.text()) is better than
>> opentext() or openbytes().
> I'm worried about feeping creaturism of the file type
> here. To my mind, the file type already has too many
> features, and this hinders code that wants to define
> its own file-like objects.
> In 3.0 I'd like to see the file type reduced to having
> as simple an interface as possible (basically just
> read/write) and all other stuff (readlines, text codecs,
> etc.) implemented as wrappers around it.

I'd like to put my 2 cents in and agree with Greg here.  Implementing a  
"complete" file-like object has come to be something of a pain.   
Perhaps we can do something akin to UserDict -- perhaps UserTextFile  
and UserBinaryFile?  It would be nice if it could handle the default  
implementation of everything but read and write.
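
A rough sketch of what such a base class might look like, in the spirit of UserDict. `UserTextFile` here is hypothetical (only the name comes from the message above), and `StringBackedFile` is a toy subclass invented to exercise it:

```python
class UserTextFile:
    """Hypothetical base: subclasses supply read(size) and write(text) only."""

    def read(self, size=-1):
        raise NotImplementedError

    def write(self, text):
        raise NotImplementedError

    def readline(self):
        # Built purely on read(); slow but correct for any subclass.
        chars = []
        while True:
            ch = self.read(1)
            if not ch:
                break
            chars.append(ch)
            if ch == "\n":
                break
        return "".join(chars)

    def __iter__(self):
        while True:
            line = self.readline()
            if not line:
                return
            yield line

    def readlines(self):
        return list(self)

    def writelines(self, lines):
        for line in lines:
            self.write(line)


class StringBackedFile(UserTextFile):
    """Toy subclass: appends on write, reads from a moving position."""

    def __init__(self, text=""):
        self._data = text
        self._pos = 0

    def read(self, size=-1):
        if size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def write(self, text):
        self._data += text


f = StringBackedFile("a\nb\n")
first_pass = f.readlines()
f.writelines(["c\n"])
appended = f.readline()
```

A subclass that can do better than one character at a time would simply override `readline`.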


From fredrik at  Thu Feb 16 18:15:47 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 18:15:47 +0100
Subject: [Python-Dev] Rename str/unicode to text [Was: Re: str object
	goingin Py3K]
References: <>
Message-ID: <dt2c04$cq5$>

Adam Olsen wrote:

> While we're at it, any chance of renaming str/unicode to text in 3.0?
> It's a MUCH better name, as evidenced by the opentext/openbytes names.
>  str is just some odd C-ism.
> Obviously it's a form of gratuitous breakage, but I think the long
> term benefits are enough that we need to be *sure* that the breakage
> would be too much before we discount it.

it's a very common variable name...


From fredrik at  Thu Feb 16 18:25:40 2006
From: fredrik at (Fredrik Lundh)
Date: Thu, 16 Feb 2006 18:25:40 +0100
Subject: [Python-Dev] Off-topic:
References: <dsq741$4un$><><><>
	<dt0cfb$68v$> <>
Message-ID: <dt2cim$f03$>

Aahz wrote:

> In all fairness to Tim (and despite the fact that emotionally I agree
> with you), the fact is that there had been essentially no forward motion
> on redesign until he went to work.  Even if we end up
> chucking out all his work in favor of something else, I'll consider the
> PSF's money well-spent for bringing the community energy into it.

the problem isn't the work that has already been done, the problem is that
things change, and choices that were made years ago are not necessarily
true today.

more on this in another forum, at some other time.  I'll concentrate on the
library reference for now...


From thomas at  Thu Feb 16 18:43:26 2006
From: thomas at (Thomas Wouters)
Date: Thu, 16 Feb 2006 18:43:26 +0100
Subject: [Python-Dev] Test failures in test_timeout
Message-ID: <>

I'm seeing spurious test failures in test_timeout, on my own workstation and
on (now that it crashes less; Apple sent over some new
memory.) The problem is pretty simple: both macteagle and my workstation
live too closely, network-wise, to

class TimeoutTestCase(unittest.TestCase):
    def setUp(self):
        self.addr_remote = ('', 80)
    def testConnectTimeout(self):
        # Test connect() timeout
        _timeout = 0.001

        _t1 = time.time()
        self.failUnlessRaises(socket.error, self.sock.connect,

In other words, the test fails because responds too quickly.

The test on my workstation only fails occasionally, but I do expect
macteagle's failure to be more common (since it's connected to through (literally) a pair of gigabit switches, whereas my
workstation has to pass through a few more switches, two Junipers and some
dark fiber.) Lowering the timeout has no effect, as far as I can tell, which
is probably a granularity issue.

I'm thinking that it could probably try to connect to a less reliable
website, but that's just moving the problem around (and possibly harassing
an unsuspecting website, particularly around release-time.) Perhaps the test
should try to connect to a known unconnecting site, like a firewalled port
on Not something that refuses connections, just something
that times out.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From lists at  Thu Feb 16 19:09:37 2006
From: lists at (Jan Claeys)
Date: Thu, 16 Feb 2006 19:09:37 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$>  <>
Message-ID: <1140113378.13739.75.camel@localhost.localdomain>

Op wo, 15-02-2006 te 11:23 -0800, schreef Bob Ippolito:
> On Feb 15, 2006, at 4:49 AM, Jan Claeys wrote:
> > Op wo, 15-02-2006 te 14:00 +1300, schreef Greg Ewing:
> >> I'm disappointed that the various Linux distributions
> >> still don't seem to have caught onto the very simple
> >> idea of *not* scattering files all over the place when
> >> installing something.

> > Those directories might be mounted on entirely different hardware  
> > (even over a network), often with different characteristics (access speed,
> > writeability, etc.).
> Huh?  What does that have to do with anything?  I've never seen a  
> system where /usr/include, /usr/lib, /usr/bin, etc. are not all on  
> the same mount.  It's not really any different with OS X either.

Paths like /etc, /var, /srv, /usr/include and /usr/share are good
candidates to be on another mount than the bin & lib directories...

BTW, Mac-style packages do exist for Linux too, if you prefer that.
Look e.g. at Klik: <>

Jan Claeys

From guido at  Thu Feb 16 19:27:53 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 10:27:53 -0800
Subject: [Python-Dev] 2.5 - I'm ok to do release management
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Anthony Baxter <anthony at> wrote:
> I'm still catching up on the hundreds of python-dev messages from the
> last couple of days, but a quick note first that I'm ok to do release
> management for 2.5

Thanks! While catching up, you can ignore the bytes discussion except
for Neil Schemenauer's proto-pep. :-)

--Guido van Rossum (home page:

From guido at  Thu Feb 16 19:33:19 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 10:33:19 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Alex Martelli <aleaxit at> wrote:
> I agree, or, MAL's idea of and is also
> good.

No, the bytes and text data types shouldn't have to be tied to the I/O
system. (The latter tends to evolve at a much faster rate so should be
isolated.)

> My fondest dream is that we do NOT have an 'open' builtin
> which has proven to be very error-prone when used in Windows by
> newbies (as evidenced by beginner errors as seen on, the
> python-help lists, and other venues) -- defaulting 'open' to text is
> errorprone, defaulting it to binary doesn't seem the greatest idea
> either, principle "when in doubt, resist the temptation to guess"
> strongly suggests not having 'open' as a built-in at all.

Bill Janssen has expressed this sentiment too. But this is because
open() *appears* to work for both types to Unix programmers. If open()
is *only* usable for text data, even Unix programmers will be using
openbytes() from the start.

--Guido van Rossum (home page:

From nnorwitz at  Thu Feb 16 19:33:36 2006
From: nnorwitz at (Neal Norwitz)
Date: Thu, 16 Feb 2006 10:33:36 -0800
Subject: [Python-Dev] [Python-checkins] r42396 - peps/trunk/pep-0011.txt
In-Reply-To: <>
References: <>
Message-ID: <>

[Moving to python-dev]

I don't have a strong opinion.  Any one else have an opinion about
removing --with-wctype-functions from configure?


On 2/16/06, M.-A. Lemburg <mal at> wrote:
> neal.norwitz wrote:
> > Author: neal.norwitz
> > Date: Thu Feb 16 06:25:37 2006
> > New Revision: 42396
> >
> > Modified:
> >    peps/trunk/pep-0011.txt
> > Log:
> > MAL says this option should go away in bug report 874534:
> >
> >     The reason for the removal is that the option causes
> >     semantical problems and makes Unicode work in non-standard
> >     ways on platforms that use locale-aware extensions to the
> >     wc-type functions.
> >
> > Since it wasn't previously announced, we can keep the option until 2.6
> > unless someone feels strong enough to rip it out.
> I've been wanting to rip this out for some time now, but
> you're right: I forgot to add this to PEP 11, so let's
> wait for another release.
> OTOH, this normally only affects system builders, so perhaps
> we could do this a little faster, e.g. add a warning in the
> first alpha and then rip it out with one of the last betas ?!
> > Modified: peps/trunk/pep-0011.txt
> >
> > +    Name:             Systems using --with-wctype-functions
> > +    Unsupported in:   Python 2.6
> > +    Code removed in:  Python 2.6

From guido at  Thu Feb 16 19:35:27 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 10:35:27 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Bengt Richter <bokr at> wrote:
> On Wed, 15 Feb 2006 18:57:26 -0800, Guido van Rossum <guido at> wrote:
> >My expectation is that the Py3k standard I/O library will do all of
> >its own conversions on top of binary files anyway -- if you missed it,
> >I'd like to get rid of any ties to C's stdio.
> >
> Would the standard I/O module have low level utility stream-processing generators
> to do things like linesep normalization in text or splitlines etc? I.e., primitives
> that could be composed for unforeseen usefulness, like unix pipeable stuff?

Yes. To get a (very limited) idea of what I'm talking about, see the
sio package in the sandbox:

--Guido van Rossum (home page:

From guido at  Thu Feb 16 19:50:08 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 10:50:08 -0800
Subject: [Python-Dev] Rename str/unicode to text [Was: Re: str object
	going in Py3K]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Adam Olsen <rhamph at> wrote:
> While we're at it, any chance of renaming str/unicode to text in 3.0?
> It's a MUCH better name, as evidenced by the opentext/openbytes names.
>  str is just some odd C-ism.
> Obviously it's a form of gratuitous breakage, but I think the long
> term benefits are enough that we need to be *sure* that the breakage
> would be too much before we discount it.  This seems the right time to
> discuss that.

I'm +/-0 on this. ABC used text. In almost every other currently
popular language it's called string. But the advantage of text is that
it's not an abbreviation, and it reinforces the notion that it's not
binary data. "Binary string" is a common colloquialism; "binary text"
is an oxymoron. Mechanical conversion of code using 'str' (or
'unicode') to use 'text' seems simple enough.

OTOH, even if we didn't rename str/unicode to text, opentext would
still be a good name for the function that opens a text file.

--Guido van Rossum (home page:

From tim.peters at  Thu Feb 16 20:24:54 2006
From: tim.peters at (Tim Peters)
Date: Thu, 16 Feb 2006 14:24:54 -0500
Subject: [Python-Dev] 2.5 - I'm ok to do release management
In-Reply-To: <>
References: <>
Message-ID: <>

[Anthony Baxter]
> I'm still catching up on the hundreds of python-dev messages from the
> last couple of days, but a quick note first that I'm ok to do release
> management for 2.5

I, for one, am delighted to see that Australian millionaires don't
give up tech work after winning an Olympic gold medal. 
Congratulations to Anthony on his!  I didn't even know that
Human-Kangaroo Doubles Luge was a sport until last night.  Damn gutsy
move letting the roo take top position, and I hope to see more bold
thinking like that after Anthony's ribs heal.

From jason.orendorff at  Thu Feb 16 21:00:42 2006
From: jason.orendorff at (Jason Orendorff)
Date: Thu, 16 Feb 2006 15:00:42 -0500
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Guido van Rossum <guido at> wrote:
> > Actually users trying to figure out Unicode would probably be better
> > served if bytes.encode() and text.decode() did not exist.
> [...]
> It would be better if the signature of text.encode() always returned a
> bytes object. But why deny the bytes object a decode() method if text
> objects have an encode() method?

I agree, text.encode() and bytes.decode() are both swell.  It's the other
two that bother me.

> I'd say there are two "symmetric" API flavors possible (t and b are
> text and bytes objects, respectively, where text is a string type,
> either str or unicode; enc is an encoding name):
> - b.decode(enc) -> t; t.encode(enc) -> b
> - b = bytes(t, enc); t = text(b, enc)
> I'm not sure why one flavor would be preferred over the other,
> although having both would probably be a mistake.

I prefer constructor flavor; the word "bytes" feels more concrete than
"encode".  But I worry about constructors being too overloaded.

>>> text(b, enc)  # decode
>>> text(mydict)  # repr
>>> text(b)       # uh... decode with default encoding?
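
For comparison, here is a minimal sketch of both flavors as they can be
spelled in Python 3. It assumes that str plays the role of the "text" type
discussed above (no builtin named text exists), and "utf-8" is only an
example encoding. The last line illustrates the overloading worry: calling
the text constructor on bytes without an encoding gives a repr, not a decode.

```python
# Assumption: Python 3, where str is the text type and bytes the binary type.
t = "caf\u00e9"
enc = "utf-8"          # example encoding, chosen only for illustration

# Method flavor: t.encode(enc) -> b; b.decode(enc) -> t
b = t.encode(enc)
t_again = b.decode(enc)

# Constructor flavor: b = bytes(t, enc); t = str(b, enc)
b2 = bytes(t, enc)
t2 = str(b2, enc)

# Constructor without an encoding: repr-like output, no decoding happens.
as_repr = str(b2)
```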

From guido at  Thu Feb 16 21:09:06 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 12:09:06 -0800
Subject: [Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Paul Moore <p.f.moore at> wrote:
> On 2/16/06, Guido van Rossum <guido at> wrote:
> > On 2/16/06, Nick Coghlan <ncoghlan at> wrote:
> > > The PEP itself requests that a string be returned from get_data(), but doesn't
> > > require that the file be opened in text mode. Perhaps the PEP 302 emulation
> > > should use binary mode here? Otherwise there could be strange data corruption
> > > bugs on Windows.
> >
> > But PEP 302 shows as its only example reading from a file with a .txt
> > extension. Adding spurious \r characters is also data corruption. We
> > should probably post to python-dev a request for clarification of PEP
> > 302, but in the mean time I vote for text mode.
> FWIW, the .txt example was just a toy example. I'd say that binary
> mode makes sense, as I can imagine using the get_data interface to
> load image files, for example. It makes getting text files a bit
> harder (you have to munge CRLF manually) but at least you have the
> *option* of getting binary files.
> On reflection, get_data should probably have been given a mode
> argument. But given that it didn't, binary seems safest.
> OTOH, I don't know who actually *uses* get_data for real (PJE, for
> eggs? py2exe?). Their opinions are likely to be of more importance.
> On the third hand, doing whatever the zipimport module does is likely
> to be right, as that's the key bit of prior art.

It doesn't do any CRLF -> LF translation so this supports the binary theory.

> Regardless, the PEP should be clarified. I'll make the change once
> agreement is reached.

Thanks. Based on the zipimport precedent I propose to make it binary.
The example could be changed to read a GIF image.
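
The difference between the two modes can be sketched as follows. This is an
illustration in Python 3, not the PEP 302 emulation code; the payload and the
temporary file are made up for the example. In Python 3 the default text mode
translates CRLF to LF on every platform, which makes the effect easy to see.

```python
import os
import tempfile

# Hypothetical payload: two lines with Windows-style CRLF endings.
payload = b"line one\r\nline two\r\n"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(payload)

    # Binary mode returns the stored bytes untouched.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with default newline handling translates CRLF to LF.
    with open(path, "r", encoding="ascii") as f:
        cooked = f.read()
finally:
    os.remove(path)
```

A loader that returns the binary form leaves the choice to the caller, who
can still normalise line endings afterwards; the reverse is not possible.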

--Guido van Rossum (home page:

From bh at  Thu Feb 16 20:59:10 2006
From: bh at (Bernhard Herzog)
Date: Thu, 16 Feb 2006 20:59:10 +0100
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
 to PyNumberMethods
In-Reply-To: <dsu7t9$m9c$> (Travis E. Oliphant's message of
	"Tue, 14 Feb 2006 20:41:19 -0700")
References: <dsu7t9$m9c$>
Message-ID: <>

"Travis E. Oliphant" <oliphant.travis at> writes:

>     2) The __index__ special method will have the signature
>        def __index__(self):
>            return obj
>        Where obj must be either an int or a long or another object
>        that has the __index__ special method (but not self).

So int objects will not have an __index__ method (assuming that ints
won't return a different but equal int object).  However:

>     4) A new operator.index(obj) function will be added that calls
>        equivalent of obj.__index__() and raises an error if obj does not
>        implement the special method.

So operator.index(1) will raise an exception.  I would expect
operator.index to be implemented using PyNumber_Index.
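
A short sketch of the behavior in question, assuming a Python version in
which PEP 357 has landed (2.5 or later; written here in Python 3 syntax).
In those releases operator.index(1) returns 1 rather than raising. The
Offset class is hypothetical and exists only for this example.

```python
import operator

class Offset:
    """Hypothetical index-like type, used only for illustration."""
    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n

letters = ["a", "b", "c", "d"]
picked = letters[Offset(2)]           # sequence indexing calls __index__
as_int = operator.index(Offset(3))
plain = operator.index(1)             # ints are accepted as well

try:
    operator.index(1.5)               # floats do not implement __index__
    rejected = False
except TypeError:
    rejected = True
```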


Intevation GmbH                       

From benji at  Thu Feb 16 20:35:26 2006
From: benji at (Benji York)
Date: Thu, 16 Feb 2006 14:35:26 -0500
Subject: [Python-Dev] still available
In-Reply-To: <19igus5puu6e2$>
References: <dsq741$4un$>	<>	<>	<>	<>	<dt020q$s7$>
Message-ID: <>

Alexander Schremmer wrote:
> In fact, PHP does it like which is even shorter, i.e.
> they fall back to the documentation if that path does not exist otherwise.

Like many things PHP, that seems a bit too magical for my tastes.
Benji York

From guido at  Thu Feb 16 21:47:22 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 12:47:22 -0800
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/15/06, Neil Schemenauer <nas at> wrote:
> This could be a replacement for PEP 332.  At least I hope it can
> serve to summarize the previous discussion and help focus on the
> currently undecided issues.
> I'm too tired to dig up the rules for assigning it a PEP number.
> Also, there are probably silly typos, etc.   Sorry.

I may check it in for you, although right now it would be good if we
had some more feedback.

I noticed one behavior in your pseudo-code constructor that seems
questionable: while in the Q&A section you explain why the encoding is
ignored when the argument is a str instance, in fact you require an
encoding (and one that's not "ascii") if the str instance contains any
non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
"blah") would succeed. I think that's a bit strange -- if you ignore
the encoding, you should always ignore it. So IMO bytes("\xff") and
bytes("\xff", "ascii") should both return the same as bytes([255]).
Also, there's a code path where the initializer is a unicode instance
and its encode() method is called with None as the argument. I think
both could be fixed by setting the encoding to
sys.getdefaultencoding() if it is None and the argument is a unicode
instance:

    def bytes(initialiser=[], encoding=None):
        if isinstance(initialiser, basestring):
            if isinstance(initialiser, unicode):
                if encoding is None:
                    encoding = sys.getdefaultencoding()
                initialiser = initialiser.encode(encoding)
            initialiser = [ord(c) for c in initialiser]
        elif encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        create bytes object and fill with integers from initialiser
        return bytes object
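
For readers who want something runnable, here is a rough adaptation of the
pseudo-code to Python 3. It is a sketch under stated assumptions, not the
pre-PEP's constructor: Python 3 has a single text type, so the separate str
and unicode branches collapse into one, and make_bytes is a hypothetical
helper name that returns the builtin bytes type.

```python
import sys

def make_bytes(initialiser=(), encoding=None):
    """Hypothetical helper mirroring the pseudo-code in Python 3 terms."""
    if isinstance(initialiser, str):
        # Text must be encoded; fall back to the interpreter default.
        if encoding is None:
            encoding = sys.getdefaultencoding()
        return initialiser.encode(encoding)
    if encoding is not None:
        raise TypeError("explicit encoding invalid for non-string "
                        "initialiser")
    # Anything else is treated as an iterable of integers in range(256).
    return bytes(initialiser)
```

With this shape, an encoding is either always honoured (text input) or
always rejected (non-text input), which avoids the "sometimes ignored"
behavior questioned above.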

BTW, for folks who want to experiment, it's quite simple to create a
working bytes implementation by inheriting from array.array. Here's a
quick draft (which only takes str instance arguments):

    from array import array
    class bytes(array):
        def __new__(cls, data=None):
            b = array.__new__(cls, "B")
            if data is not None:
                b.fromstring(data)  # append the raw bytes of the str
            return b
        def __str__(self):
            return self.tostring()
        def __repr__(self):
            return "bytes(%s)" % repr(list(self))
        def __add__(self, other):
            if isinstance(other, array):
                return bytes(super(bytes, self).__add__(other))
            return NotImplemented

--Guido van Rossum (home page:

From mal at  Thu Feb 16 21:50:10 2006
From: mal at (M.-A. Lemburg)
Date: Thu, 16 Feb 2006 21:50:10 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Alex Martelli <aleaxit at> wrote:
>> I agree, or, MAL's idea of and is also
>> good.
> No, the bytes and text data types shouldn't have to be tied to the I/O
> system. (The latter tends to evolve at a much faster rate so should be
> isolated.)
>> My fondest dream is that we do NOT have an 'open' builtin
>> which has proven to be very error-prone when used in Windows by
>> newbies (as evidenced by beginner errors as seen on, the
>> python-help lists, and other venues) -- defaulting 'open' to text is
>> errorprone, defaulting it to binary doesn't seem the greatest idea
>> either, principle "when in doubt, resist the temptation to guess"
>> strongly suggests not having 'open' as a built-in at all.
> Bill Janssen has expressed this sentiment too. But this is because
> open() *appears* to work for both types to Unix programmers. If open()
> is *only* usable for text data, even Unix programmers will be using
> openbytes() from the start.

All the variations aside:

What will be the explicit way to open a file in bytes mode
and in text mode (I for one would like to move away from
open() completely as well) ?

Will we have a single file type with two different modes
or two different types ?

Marc-Andre Lemburg


From guido at  Thu Feb 16 22:11:49 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 13:11:49 -0800
Subject: [Python-Dev] Proposal: defaultdict
Message-ID: <>

A bunch of Googlers were discussing the best way of doing the
following (a common idiom when maintaining a dict of lists of values
relating to a key, sometimes called a multimap):

  if key not in d: d[key] = []
  d[key].append(value)

An alternative way to spell this uses setdefault(), but it's not very readable:

  d.setdefault(key, []).append(value)

and it also suffers from creating an unnecessary list instance.
(Timings were inconclusive; the approaches are within 5-10% of each
other in speed.)
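
The two spellings can be compared side by side in a small sketch. The sample
pairs are invented for illustration; both loops build the same multimap.

```python
pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]

# Spelling 1: explicit membership test before appending.
d1 = {}
for key, value in pairs:
    if key not in d1:
        d1[key] = []
    d1[key].append(value)

# Spelling 2: setdefault(); a fresh [] is built on every call,
# even when the key already exists and the list is thrown away.
d2 = {}
for key, value in pairs:
    d2.setdefault(key, []).append(value)
```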

My conclusion is that setdefault() is a failure -- it was a
well-intentioned construct, but doesn't actually create more readable
code.

Google has an internal data type called a DefaultDict which gets
passed a default value upon construction. Its __getitem__ method,
instead of raising KeyError, inserts a shallow copy (!) of the given
default value into the dict when the value is not found. So the above
code, after

  d = DefaultDict([])

can be written as simply

  d[key].append(value)

Note that of all the possible semantics for __getitem__ that could
have produced similar results (e.g. not inserting the default in the
underlying dict, or not copying the default value), the chosen
semantics are the only ones that make this example work.

Over lunch with Alex Martelli, he proposed that a subclass of dict
with this behavior (but implemented in C) would be a good addition to
the language. It looks like it wouldn't be hard to implement. It could
be a builtin named defaultdict. The first, required, argument to the
constructor should be the default value. Remaining arguments (even
keyword args) are passed unchanged to the dict constructor.

Some more design subtleties:

- "key in d" still returns False if the key isn't there
- "d.get(key)" still returns None if the key isn't there
- "d.default" should be a read-only attribute giving the default value


--Guido van Rossum (home page:

From thomas at  Thu Feb 16 22:21:33 2006
From: thomas at (Thomas Wouters)
Date: Thu, 16 Feb 2006 22:21:33 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Feb 16, 2006 at 01:11:49PM -0800, Guido van Rossum wrote:

> Over lunch with Alex Martelli, he proposed that a subclass of dict
> with this behavior (but implemented in C) would be a good addition to
> the language. It looks like it wouldn't be hard to implement. It could
> be a builtin named defaultdict. The first, required, argument to the
> constructor should be the default value. Remaining arguments (even
> keyword args) are passed unchanged to the dict constructor.

Should a dict subclass really change the constructor/initializer signature
in an incompatible way?
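
The concern can be made concrete with collections.defaultdict, the type that
eventually shipped in Python 2.5. Note that its first argument is a factory
rather than a default value, so it differs from the proposal quoted above;
the point here is only the shifted constructor signature.

```python
from collections import defaultdict

# The base class accepts a mapping as its first positional argument.
plain = dict({"a": 1}, b=2)

# The subclass reserves that slot for something else, so the same
# call shape is rejected.
try:
    defaultdict({"a": 1}, b=2)
    same_signature = True
except TypeError:
    same_signature = False

# The mapping has to move to the second position instead.
shifted = defaultdict(list, {"a": 1}, b=2)
```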

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tdelaney at  Thu Feb 16 22:27:04 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Fri, 17 Feb 2006 08:27:04 +1100
Subject: [Python-Dev] Proposal: defaultdict
Message-ID: <>

Guido van Rossum wrote:

> Over lunch with Alex Martelli, he proposed that a subclass of dict
> with this behavior (but implemented in C) would be a good addition to
> the language. It looks like it wouldn't be hard to implement. It could
> be a builtin named defaultdict. The first, required, argument to the
> constructor should be the default value. Remaining arguments (even
> keyword args) are passed unchanged to the dict constructor.
> Feedback?

On behalf of everyone who has answered this question on, may I

FWIW, my usual spelling is:

    try:
        v = d[key]
    except KeyError:
        v = d[key] = value

which breaks the principle of "write it once".

Tim Delaney

From exarkun at  Thu Feb 16 22:53:57 2006
From: exarkun at (Jean-Paul Calderone)
Date: Thu, 16 Feb 2006 16:53:57 -0500
Subject: [Python-Dev] from __future__ import unicode_strings?
In-Reply-To: <>
Message-ID: <20060216215357.6122.733406986.divmod.quotient.864@ohm>

On Thu, 16 Feb 2006 16:29:40 +0100, "M.-A. Lemburg" <mal at> wrote:
>Jean-Paul Calderone wrote:
>> On Thu, 16 Feb 2006 11:24:35 +0100, "M.-A. Lemburg" <mal at> wrote:
>>> Neil Schemenauer wrote:
>>>> On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
>>>>> On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:
>>>>>>>     from __future__ import unicode_strings
>>>>>> Didn't we have a command-line option to do this? I believe it was
>>>>>> removed because nobody could see the point. (Or am I hallucinating?
>>>>>> After several days of non-stop discussing bytes that must be
>>>>>> considered a possibility.)
>>>>> We do, and it's not been removed: the -U switch.
>>>> As Guido alluded, the global switch is useless.  A per-module switch
>>>> is something that could actually be useful.  One nice advantage is that
>>>> you would write code that works the same with Jython (wrt to string
>>>> literals anyhow).
>>> The global switch is not useless. Its purpose is to test the
>>> standard library (or any other piece of Python code) for Unicode
>>> compatibility.
>>> Since we're not even close to such compatibility, I'm not sure
>>> how useful a per-module switch would be.
>> Just what Neil suggested: developers writing new code benefit from having the behavior which will ultimately be Python's default, rather than the behavior that is known to be destined for obsolescence.
>> Being able to turn this on per-module is useful for the same reason the rest of the future system is useful on a per-module basis.  It's easier to convert things incrementally than monolithically.
>Sure, but in this case the option would not only affect the module
>you define it in, but also all other code that now gets Unicode
>objects instead of strings as a result of the Unicode literals
>defined in these modules.

This is precisely correct.  It is also exactly parallel to the only other __future__ import which changes any behavior.  Personally, I _also_ like future division.  Is it generally considered to have been a mistake?

>It is rather likely that you'll start hitting Unicode-related
>compatibility bugs in the standard lib more often than you'd

You can guess this.  I'll guess that it isn't the case.  And who's to say how often I'd like that to happen, anyway? :)  Anyone who's afraid that will happen can avoid using the import.  Voila, problem solved.

>It's usually better to switch to Unicode in a controlled manner:
>not by switching all literals to Unicode, but only some, then
>test things, then switch over some more, etc.

There's nothing uncontrolled about this proposed feature, so this statement doesn't hold any meaning.

>This can be done by prepending the literal with the u"" modifier.

Anyone who is happier converting one string literal at a time can do this.  Anyone who would rather convert a module at a time can use the future import.


From martin at  Thu Feb 16 23:06:15 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 16 Feb 2006 23:06:15 +0100
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>	<>	<>
	<dt04in$a5l$>	<>
Message-ID: <>

Fredrik Lundh wrote:
>     "I don't agree with the change. You just broke source compatibility
>     between the core package and PyXML."

I'm still unhappy with that change, and still nobody has told me how to
maintain PyXML so that it can continue to work both for 2.5 and for 2.4.


From greg.ewing at  Thu Feb 16 23:55:33 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 17 Feb 2006 11:55:33 +1300
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> The first, required, argument to the
> constructor should be the default value.

I'd like to suggest that this argument be a function
for creating default values, rather than an actual
default value. This would avoid any confusion over
exactly how the default value is copied. (Shallow or
deep? How deep?)

In an earlier discussion it was pointed out that
this would be no less convenient for many common
use cases, e.g. in your example,

   d = defaultdict(list)

Also I'm not sure about the name "defaultdict".
When I created a class like this recently, I called
it an "autodict" (i.e. a dict that automatically
extends itself with new entries). And perhaps the
value should be called an "initial value" rather
than a default value, to more strongly suggest that
it becomes a permanent part of the dict.
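
A small sketch of why a factory sidesteps the copying question, using
collections.defaultdict as it later shipped in Python 2.5 (written here in
Python 3 syntax). The nested template dict is a made-up example.

```python
import copy
from collections import defaultdict

# Factory flavor: list() is called afresh for every missing key,
# so no copying question arises.
d = defaultdict(list)
d["a"].append(1)
d["b"].append(2)

# Value flavor with a nested default: a shallow copy shares the inner list.
template = {"items": []}
shallow_1 = copy.copy(template)
shallow_2 = copy.copy(template)
shallow_1["items"].append("x")         # also visible through shallow_2

# A factory builds the whole nested structure anew for each key.
fresh = defaultdict(lambda: {"items": []})
fresh["k1"]["items"].append("x")
```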


From bokr at  Fri Feb 17 00:15:04 2006
From: bokr at (Bengt Richter)
Date: Thu, 16 Feb 2006 23:15:04 GMT
Subject: [Python-Dev] str object going in Py3K
References: <>
Message-ID: <>

On Wed, 15 Feb 2006 21:59:55 -0800, Alex Martelli <aleaxit at> wrote:

>On Feb 15, 2006, at 9:51 AM, Barry Warsaw wrote:
>> On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:
>>> Regarding open vs. opentext, I'm still not sure. I don't want to
>>> generalize from the openbytes precedent to openstr or openunicode
>>> (especially since the former is wrong in 2.x and the latter is wrong
>>> in 3.0). I'm tempted to hold out for open() since it's most
>>> compatible.
>> If we go with two functions, I'd much rather hang them off of the file
>> type object then add two new builtins.  I really do think file.bytes()
>> and file.text() (a.k.a. open.bytes() and open.text()) is better than
>> opentext() or openbytes().
>I agree, or, MAL's idea of and is also  
>good.  My fondest dream is that we do NOT have an 'open' builtin  
>which has proven to be very error-prone when used in Windows by  
>newbies (as evidenced by beginner errors as seen on, the  
>python-help lists, and other venues) -- defaulting 'open' to text is  
>errorprone, defaulting it to binary doesn't seem the greatest idea  
>either, principle "when in doubt, resist the temptation to guess"  
>strongly suggests not having 'open' as a built-in at all.  (And  
>namemangling into openthis and openthat seems less Pythonic to me  
>than exploiting namespaces by making structured names, either  
> and or open.this and open.that).  IOW, I entirely  
>agree with Barry and Marc Andre.
FWIW, I'd vote for file.text and file.bytes

I don't like or because I think
types in general should not know about I/O (IIRC Guido said that, so pay attention ;-)
Especially unicode.

E.g., why should unicode pull in a whole wad of I/O-related code if the user
is only using it as intermediary in some encoding change between low level binary
input and low level binary output? E.g., consider what you could do with one statement like (untested)

    s_str.translate(table, delch).encode('utf-8')

especially if you didn't have to introduce a phony latin-1 decoding and write it as (untested)

    s_str.translate(table, delch).decode('latin-1').encode('utf-8')     # use str.translate
    s_str.decode('latin-1').translate(mapping).encode('utf-8')          # use unicode.translate also for delch

to avoid exceptions if you have non-ascii in your s_str translation

It seems s_str.translate(table, delchars) wants to convert the s_str to unicode
if table is unicode, and then use unicode.translate (which bombs on delchars!) instead
of just effectively defining str.translate as

    def translate(self, table, deletechars=None):
        return ''.join(table[ord(x)] for x in self
                       if deletechars is None or x not in deletechars)

IMO, if you want unicode.translate, then write unicode(s_str).translate and use that.
Let str.translate just use the str ords, so simple custom decodes can be written without
the annoyance of

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 3: ordinal not in range(128)

Can we change this? Or what am I missing? I certainly would like to miss
the above message for str.translate :-(

BTW This would also allow taking advantage of features of both translates if desired, e.g. by
    s_str.translate(unichartable256, strdelchrs).translate(uniord_to_ustr_mapping).
(e.g., the latter permits single to multiple-character substitution)

This makes me think a translate method for bytes would be good for py3k (on topic ;-)
It is just too handy a high speed conversion goodie to forgo IMO.
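
For reference, Python 3 as released does give bytes a translate() method
with a deletion argument. The sketch below shows it next to a pure-Python
equivalent that follows the definition given earlier in this message; the
sample data and the standalone translate() function are illustrative only.

```python
# Assumption: Python 3, whose bytes type has translate() and maketrans().
table = bytes.maketrans(b"abc", b"xyz")    # 256-entry lookup table
data = b"a-b-c-d"

# Map a->x, b->y, c->z and drop every b"-" in one pass.
fast = data.translate(table, b"-")

def translate(data, table, deletechars=None):
    """Pure-Python equivalent; iterating bytes yields integer ordinals."""
    return bytes(table[x] for x in data
                 if deletechars is None or x not in deletechars)

slow = translate(data, table, b"-")
kept = translate(data, table)
```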

BTW, ISTM that it would be nice to have a chunking-iterator-wrapper-returning-method
(as opposed to buffering specification) for file.bytes, so you could plug in

    file.bytes('path').chunk(1)  # maybe keyword opts for simple common record chunking also?

in places where you might now have to have (untested)

    (ord(x) for x in iter(lambda f=open('path','rb'): f.read(1), '') if x)

or write a helper like
    def by_byte_ords(path, bufsize=8192):
        f = open(path, 'rb')
        buf = f.read(bufsize)
        while buf:
            for x in buf: yield ord(x)
            buf = f.read(bufsize)
and plug in

BTW, bytes([]) would presumably be the file.bytes EOF?
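
A chunking iterator of this kind can be sketched in Python 3 with the
two-argument form of iter(). The helper names chunks and by_byte_ords are
hypothetical, and file.bytes itself does not exist; plain open(path, "rb")
stands in for it. In Python 3, read() signals EOF with the empty bytes
object, which is what bytes([]) evaluates to.

```python
import os
import tempfile
from functools import partial

def chunks(path, size=1):
    """Hypothetical helper: yield successive size-byte chunks of a file."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) stops when read() returns b"" at EOF.
        for chunk in iter(partial(f.read, size), b""):
            yield chunk

def by_byte_ords(path, bufsize=8192):
    """Yield each byte of the file as an integer."""
    for buf in chunks(path, bufsize):
        # Iterating a Python 3 bytes object already yields ints.
        yield from buf

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"abcde")
    pieces = list(chunks(path, 2))
    ords = list(by_byte_ords(path))
finally:
    os.remove(path)
```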
Bengt Richter

From martin at  Fri Feb 17 00:21:26 2006
From: martin at (=?ISO-8859-2?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 00:21:26 +0100
Subject: [Python-Dev] how bugfixes are handled?
In-Reply-To: <dt19ja$i8s$>
References: <dt07a8$khp$>	<>
Message-ID: <>

Arkadiusz Miskiewicz wrote:
> I wasn't, thanks for information.
> Still a few questions... does one of the developers/committers review a patch and commit
> it? Do a few developers have to review a single patch?

As Neal says, a single committer can review and commit. However,
non-committers can also review; this is the point of asking for
patch reviews. In many cases, the initial patch will not be "good
enough": it will lack documentation and test cases, it will contain
bugs, not follow the code formatting guidelines, and it will make
changes irrelevant to the issue being addressed ("gratuitous changes").

A reviewer is supposed to sort these all out, and then end up with
a final recommendation ("accept" or "reject"). Of course, if it is
going to be "reject", there is little point in making the submitter
comply with formal criteria.

Ideally, a committer then will only have to read the entire review
process, agree with it step-by-step, and commit the proposed change.

As a historical note: people doing a lot of reviews eventually end
up as committers, just because it is easier for the other committers
if they also do the final step.


From martin at  Fri Feb 17 00:27:16 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 00:27:16 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Another thought -- what is going to happen to
> Will it change to return bytes, or will there be a new
> os.openbytes?

Nit-pickingly: will continue to return integers.

I think it should return OS handles on Windows, instead
of C library handles. (also notice that this has nothing
to do with stdio: does not use stdio; this is
POSIX open).
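
Assuming the call under discussion is os.open, the distinction can be checked directly. This is only a sketch in Python 3 using a temporary file: the handle that comes back is a plain integer, and it is os.read on that handle that produces bytes.

```python
import os
import tempfile

# mkstemp gives us a scratch file plus an already-open descriptor.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"raw bytes\n")
    os.close(fd)

    # os.open is the POSIX-style open: it returns an integer descriptor.
    fd = os.open(path, os.O_RDONLY)
    fd_is_int = isinstance(fd, int)

    # The data only shows up as bytes when reading from the descriptor.
    data = os.read(fd, 100)
    os.close(fd)
finally:
    os.remove(path)
```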


From martin at  Fri Feb 17 00:33:49 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 00:33:49 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Josiah Carlson wrote:
> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
> and likely a few others that the two of you may be arguing against
> should stay as encodings, because strictly speaking, they are defined as
> encodings of data.  They may not be encodings of _unicode_ data, but
> that doesn't mean that they aren't useful encodings for other kinds of
> data, some text, some binary, ...

To support them, the bytes type would have to gain a .encode method,
and I'm -1 on supporting bytes.encode, or string.decode.

Why is


any better than



From bokr at  Fri Feb 17 03:25:25 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 02:25:25 GMT
Subject: [Python-Dev] str.translate vs unicode.translate (was: Re: str
	object going in Py3K)
References: <>
Message-ID: <>

If str becomes unicode for PY 3000, and we then have bytes as our coding-agnostic
byte data, then I think bytes should have the str translation method, with a tweak
that I would hope could also be done to str now.

BTW, str.translate will presumably become unicode.translate, so
perhaps unicode.translate should grow a compatible deletechars parameter.

But that's not the tweak. The tweak is to eliminate unavoidable pre-conversion to unicode
in str(something).translate(u'...', delchars) (and preemptively bytes(something).translate(u'...', delchars))

E.g. suppose you now want to write:

    s_str.translate(table, delch).encode('utf-8')

Note that s_str has no encoding information, and translate is conceptually just a 1:1 substitution
minus characters in delch. But if we want to do one-chr:one-unichr substitution by specifying a
256-long table of unicode characters, we cannot. It would be simple to allow it, and that's the
tweak I would like. It would allow easy custom decodes.

At the moment, if you want to write the above, you have to introduce a phony latin-1 decoding
and write it as (not typo-proof)

    s_str.translate(table, delch).decode('latin-1').encode('utf-8')     # use str.translate
    s_str.decode('latin-1').translate(mapping).encode('utf-8')          # use unicode.translate also for delch

to avoid exceptions if you have non-ascii in your s_str (even if delch would have removed them!!)

It seems s_str.translate(table, delchars) wants to convert the s_str to unicode
if table is unicode, and then use unicode.translate (which bombs on delchars!)
instead of just effectively defining str.translate as

    def translate(self, table, deletechars=None):
        return ''.join((table or isinstance(table,unicode) and uidentity or sidentity)[ord(x)] for x in self
                       if not deletechars or x not in deletechars)

    # For convenience in just pruning with deletechars, s_str.translate('', deletechars) deletes without translating,
    # and s_str.translate(u'', deletechars)  does the same and then maps to same-ord unicode characters
    # given
    #     sidentity = ''.join(chr(i) for i in xrange(256))
    # and
    #     uidentity = u''.join(unichr(i) for i in xrange(256)).

IMO, if you want unicode.translate, then it doesn't hurt to write unicode(s_str).translate and use that.

Let str.translate just use the str ords, so simple custom decodes can be written without
the annoyance of e.g.,

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 3: ordinal not in range(128)

Can we change this for bytes? And why couldn't we change this for str.translate now?
Or what am I missing? I certainly would like to miss the above message for str.translate :-(

BTW This would also allow taking advantage of features of both translates if desired, e.g. by
    s_str.translate(unichartable256, strdelchrs).translate(uniord_to_ustr_or_uniord_mapping).
(e.g., the latter permits single to multiple-character substitution)

I think at least a tweaked translate method for bytes would be good for py3k,
and I hope we can do it for str.translate now.
It is just too handy a high-speed conversion goodie to forgo, IMO.
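
A rough Python 3 sketch of the tweak described above: each byte is looked up by its ordinal in a 256-entry table of characters, after dropping the bytes listed for deletion, and no intermediate codec is involved. The function translate_bytes and the sample table are hypothetical illustrations, not an existing API. The last line uses the built-in bytes.translate, which already accepts None as the table when only deletion is wanted.

```python
def translate_bytes(data, table, delete=b""):
    """Map bytes to characters through a 256-entry table, skipping `delete`."""
    if len(table) != 256:
        raise ValueError("table must have exactly 256 entries")
    drop = frozenset(delete)
    return "".join(table[b] for b in data if b not in drop)


# Start from the same-ordinal mapping and customise a single entry.
custom_table = [chr(i) for i in range(256)]
custom_table[0x80] = "\u20ac"  # treat byte 0x80 as the euro sign

decoded = translate_bytes(b"abc\x80\r\n", custom_table, delete=b"\r")

# Pure deletion with the built-in method, no translation table at all.
pruned = b"a-b-c".translate(None, b"-")
```

Because the lookup is by byte value, a non-ASCII byte never triggers an implicit ASCII decode.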

Bengt Richter

From python at  Fri Feb 17 05:27:17 2006
From: python at (Raymond Hettinger)
Date: Thu, 16 Feb 2006 23:27:17 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <002301c6337a$7001f140$b83efea9@RaymondLaptop1>

>> Over lunch with Alex Martelli, he proposed that a subclass of dict
>> with this behavior (but implemented in C) would be a good addition to
>> the language

I would like to add something like this to the collections module, but a PEP is 
probably needed to deal with issues like:

* implications of a __getitem__ succeeding while get(value, x) returns x 
(possibly different from the overall default)
* implications of a __getitem__ succeeding while __contains__ would fail
* whether to add this to the collections module (I would say yes)
* whether to allow default functions as well as default values (so you could 
instantiate a new default list)
* comparing all the existing recipes and third-party modules that have already 
done this
* evaluating its fitness for common use cases (i.e. bags and dict of lists).
* lay out a few examples:

# bag like behavior
dd = collections.default_dict()
for elem in collection:
    dd[elem] += 1

# setdefault-like behavior
dd = collections.default_dict()
dd.default(list)                                # instantiate a new list for empty cells
for page_number, page in enumerate(book):
    for word in page.split():
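
For reference, both use cases can be written against collections.defaultdict as it later shipped, which takes a factory callable rather than the default_dict()/default() spelling sketched above. The sample data is made up. The last few lines probe two of the questions from the list: get() does not consult the factory, and a key only becomes a member once it has been looked up with [].

```python
from collections import defaultdict

# Bag-like behaviour: int() supplies 0 for unseen elements.
bag = defaultdict(int)
for elem in "abracadabra":
    bag[elem] += 1

# Dict of lists: list() supplies a fresh empty list per new key.
book = ["the cat", "the dog"]
index = defaultdict(list)
for page_number, page in enumerate(book):
    for word in page.split():
        index[word].append(page_number)

# get() and `in` versus __getitem__ on a missing key.
probe = defaultdict(list)
got = probe.get("missing", "fallback")
present_before = "missing" in probe
value = probe["missing"]
present_after = "missing" in probe
```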


From guido at  Fri Feb 17 05:44:43 2006
From: guido at (Guido van Rossum)
Date: Thu, 16 Feb 2006 20:44:43 -0800
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, M.-A. Lemburg <mal at> wrote:
> What will be the explicit way to open a file in bytes mode
> and in text mode (I for one would like to move away from
> open() completely as well) ?
> Will we have a single file type with two different modes
> or two different types ?

I'm currently thinking of an I/O stack somewhat like Java's. At the
bottom there's a class that lets you do raw unbuffered reads and
writes (and seek/tell) on binary files using bytes arrays. We can
layer onto this buffering, text encoding/decoding, and more. (Windows
CRLF<->LF conversion is also an encoding of sorts).
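
As an illustration of the layering, here is a sketch using the io module from later Python 3 releases. It shows how such a stack can look and makes no claim about the sandbox/sio prototype. The helper name read_text_via_layers is invented for the example: a raw unbuffered file is wrapped in a buffer, and the buffer in a text layer that handles decoding and CRLF translation.

```python
import io
import os
import tempfile


def read_text_via_layers(path, encoding="utf-8"):
    """Read a file through explicit raw -> buffered -> text layers."""
    raw = io.FileIO(path, "r")            # unbuffered reads of bytes
    buffered = io.BufferedReader(raw)     # adds buffering
    # newline=None turns \r\n and \r into \n while decoding.
    text = io.TextIOWrapper(buffered, encoding=encoding, newline=None)
    with text:                            # closing the top closes every layer
        return text.read()


fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write("na\u00efve\r\nline two\r\n".encode("utf-8"))
    result = read_text_via_layers(path)
finally:
    os.remove(path)
```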

Years ago I wrote a prototype; checkout sandbox/sio/.

--Guido van Rossum (home page:

From jack at  Fri Feb 17 05:50:38 2006
From: jack at (Jack Diederich)
Date: Thu, 16 Feb 2006 23:50:38 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt2bsi$cb4$>
References: <>
Message-ID: <>

On Thu, Feb 16, 2006 at 06:13:53PM +0100, Fredrik Lundh wrote:
> Barry Warsaw wrote:
> > We know at least there will never be a 2.10, so I think we still have
> > time.
> because there's no way to count to 10 if you only have one digit?
> we used to think that back when the gas price was just below 10 SEK/L,
> but they found a way...

Of course they found a way.  The alternative was cutting taxes.



From jcarlson at  Fri Feb 17 06:20:30 2006
From: jcarlson at (Josiah Carlson)
Date: Thu, 16 Feb 2006 21:20:30 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Josiah Carlson wrote:
> > They may not be encodings of _unicode_ data,
> But if they're not encodings of unicode data, what
> business do they have being available through
> someunicodestring.encode(...)?

I had always presumed that bytes objects are going to be able to be a
source for encode AND decode, like current non-unicode strings are able
to be today.  In that sense, if I have a bytes object which is an
encoding of rot13, hex, uu, etc., or I have a bytes object which I would
like to be in one of those encodings, I should be able to do b.encode(...)
or b.decode(...), given that 'b' is a bytes object.

Are 'encodings' going to become a mechanism to encode and decode
_unicode_ strings, rather than a mechanism to encode and decode _text
and data_ strings?  That would seem like a backwards step to me, as the
email package would need to package their own base-64 encode/decode API
and implementation, and similarly for any other package which uses any
one of the encodings already available.
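
For what it is worth, this is roughly where Python 3 ended up, shown as a sketch: the bytes-to-bytes and text-to-text codecs are still registered, but they are reached through codecs.encode() and codecs.decode() rather than through methods on str.

```python
import codecs

# Bytes-to-bytes codecs work through the module-level functions.
hex_bytes = codecs.encode(b"hello", "hex")
round_trip = codecs.decode(hex_bytes, "hex")
b64_bytes = codecs.encode(b"hello", "base64")

# rot13 is a text-to-text codec.
rot13_text = codecs.encode("hello", "rot13")

# str.encode() only accepts text encodings, so "hex" is refused here.
try:
    "hello".encode("hex")
    str_encode_refused = False
except LookupError:
    str_encode_refused = True
```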

 - Josiah

From steve at  Fri Feb 17 06:43:50 2006
From: steve at (Steve Holden)
Date: Fri, 17 Feb 2006 00:43:50 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt2bsi$cb4$>
References: <><dt09vc$tvv$>	<><dt0b8s$2eb$><><dt0fr2$fmg$>	<>
Message-ID: <dt3nqh$rlt$>

Fredrik Lundh wrote:
> Barry Warsaw wrote:
>>We know at least there will never be a 2.10, so I think we still have
> because there's no way to count to 10 if you only have one digit?
> we used to think that back when the gas price was just below 10 SEK/L,
> but they found a way...
IIRC Guido is on record as saying "There will be no Python 2.10 because 
I hate the ambiguity of double-digit minor release numbers", or words to 
that effect.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From steve at  Fri Feb 17 06:59:25 2006
From: steve at (Steve Holden)
Date: Fri, 17 Feb 2006 00:59:25 -0500
Subject: [Python-Dev] Test failures in test_timeout
In-Reply-To: <>
References: <>
Message-ID: <dt3onn$rlt$>

Thomas Wouters wrote:
> I'm seeing spurious test failures in test_timeout, on my own workstation and
> on (now that it crashes less; Apple sent over some new
> memory.) The problem is pretty simple: both macteagle and my workstation
> live too closely, network-wise, to
> class TimeoutTestCase(unittest.TestCase):
>     [...]
>     def setUp(self):
>         [...]
>         self.addr_remote = ('', 80)
>     [...]
>     def testConnectTimeout(self):
>         # Test connect() timeout
>         _timeout = 0.001
>         self.sock.settimeout(_timeout)
>         _t1 = time.time()
>         self.failUnlessRaises(socket.error, self.sock.connect,
>                 self.addr_remote)
> In other words, the test fails because responds too quickly.
> The test on my workstation only fails occasionally, but I do expect
> macteagle's failure to be more common (since it's connected to
> through (literally) a pair of gigabit switches, whereas my
> workstation has to pass through a few more switches, two Junipers and some
> dark fiber.) Lowering the timeout has no effect, as far as I can tell, which
> is probably a granularity issue.
> I'm thinking that it could probably try to connect to a less reliable
> website, but that's just moving the problem around (and possibly harassing
> an unsuspecting website, particularly around release-time.) Perhaps the test
> should try to connect to a known unconnecting site, like a firewalled port
> on Not something that refuses connections, just something
> that times out.
Couldn't the test use subprocess to start a reliably slow server on 
localhost? It might even be possible to retrieve the ephemeral port 
number used by the server, to avoid conflicts with already-used ports on 
the testing machine.
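
A minimal sketch of the localhost idea, with two simplifications that should be read as assumptions: a background thread stands in for the subprocess, and the timeout exercised is on recv() rather than connect(), since a listening socket on localhost completes the handshake almost immediately. Binding to port 0 lets the OS pick a free ephemeral port, which getsockname() then reports. The helper name start_silent_server is invented for the example.

```python
import socket
import threading


def start_silent_server():
    """Listen on an ephemeral localhost port, accept once, never send."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))          # port 0: let the OS choose
    srv.listen(1)
    port = srv.getsockname()[1]         # the port actually assigned
    stop = threading.Event()

    def serve():
        conn, _ = srv.accept()
        stop.wait()                     # hold the connection open, silently
        conn.close()
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port, stop


port, stop = start_silent_server()
client = socket.create_connection(("127.0.0.1", port))
client.settimeout(0.05)
try:
    client.recv(1)                      # nothing ever arrives
    timed_out = False
except socket.timeout:
    timed_out = True
finally:
    client.close()
    stop.set()
```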


From steve at  Fri Feb 17 06:55:42 2006
From: steve at (Steve Holden)
Date: Fri, 17 Feb 2006 00:55:42 -0500
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>
Message-ID: <>

Talin wrote:
> First off, let me apologize for bringing up a topic that I am sure that 
> everyone is sick of: Lambda.
> I broached this subject to a couple of members of this list privately, 
> and I got wise feedback on my suggestions which basically amounted to 
> "don't waste your time."
> However, after having thought about this for several weeks, I came to 
> the conclusion that I felt so strongly about this issue that the path of 
> wisdom simply would not do, and I would have to choose the path of 
> folly. Which I did.
Thereby proving the truth of the old Scottish adage "The better the 
advice, the worse it's wasted". Not that I haven't been wasted myself a 
time or two.

> In other words, I went ahead and implemented it. Actually, it wasn't too 
> bad, it only took about an hour of reading the ast.c code and the 
> Grammar file (neither of which I had ever looked at before) to get the 
> general sense of what's going on.
> So the general notion is similar to the various proposals on the Wiki - 
> an inline keyword which serves the function of lambda. I chose the 
> keyword "given" because it reminds me of math textbooks, e.g. "given x, 
> solve for y". And I like the idea of syntactical structures that make 
> sense when you read them aloud.
> Here's an interactive console session showing it in action.
> The first example shows a simple closure that returns the square of a 
> number.
>     >>> a = (x*x given x)
>     >>> a(9)
>     81
> You can also put parens around the argument list if you like:
>     >>> a = (x*x given (x))
>     >>> a(9)
>     81
> Same thing with two arguments, and with the optional parens:
>     >>> a = (x*y given x,y)
>     >>> a(9, 10)
>     90
>     >>> a = (x*y given (x,y))
>     >>> a(9, 10)
>     90
> Yup, keyword arguments work too:
>     >>> a = (x*y given (x=3,y=4))
>     >>> a(9, 10)
>     90
>     >>> a(9)
>     36
>     >>> a()
>     12
> Use an empty paren-list to indicate that you want to define a closure 
> with no arguments:
>     >>> a = (True given ())
>     >>> a()
>     True
> Note that there are some cases where you have to use the parens around 
> the arguments to avoid a syntactical ambiguity:
>     >>> map( str(x) given x, (1, 2, 3, 4) )
>       File "<stdin>", line 1
>         map( str(x) given x, (1, 2, 3, 4) )
>                             ^
>     SyntaxError: invalid syntax
> As you can see, adding the parens makes this work:
>     >>> map( str(x) given (x), (1, 2, 3, 4) )
>     ['1', '2', '3', '4']
> More fun with "map":
>     >>> map( str(x)*3 given (x), (1, 2, 3, 4) )
>     ['111', '222', '333', '444']
> Here's an example that uses the **args syntax:
>     >>> a = (("%s=%s" % pair for pair in kwargs.items()) given **kwargs)
>     >>> list( a(color="red") )
>     ['color=red']
>     >>> list( a(color="red", sky="blue") )
>     ['color=red', 'sky=blue']
> I have to say, the more I use it, the more I like it, but I'm sure that 
> this is just a personal taste issue. It looks a lot more natural to me 
> than lambda.
> I should also mention that I resisted the temptation to make the 'given' 
> keyword an optional generator suffix as in "(a for a in l given l)". As I
> started working with the code, I started to realize that generators and 
> closures, although they have some aspects in common, are very different 
> beasts and should not be conflated lightly. (Plus the implementation 
> would have been messy. I took that as a clue :))
> Anyway, if anyone wants to play around with the patch, it is rather 
> small - a couple of lines in Grammar, and a small new function in ast.c, 
> plus a few mods to other functions to get them to call it. The context 
> diff is less than two printed pages. I can post it somewhere if people 
> are interested.
> Anyway, I am not going to lobby for a language change or write a PEP 
> (unless someone asks me to.) I just wanted to throw this out there and 
> see what people think of it. I definitely don't want to start a flame
> war, although I suspect I already have :/
> Now I can stop thinking about this and go back to my TurboGears-based 
> Thesaurus editor :)
Whether or not Guido can steel himself to engage in yet another round of 
this seemingly interminable discussion, at least this proposal has the 
merit of being concrete and not hypothetical.

It appears to hang together, but I'm not sure I see how it overcomes 
objections to lambda by replacing it with another keyword.
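
To make the comparison concrete, here are the quoted "given" examples respelled with the existing lambda keyword (Python 3, so map() is wrapped in list()). The variable names are chosen only for this illustration.

```python
# (x*x given x)
square = lambda x: x * x

# (x*y given (x=3,y=4))
product = lambda x=3, y=4: x * y

# map( str(x)*3 given (x), (1, 2, 3, 4) )
tripled = list(map(lambda x: str(x) * 3, (1, 2, 3, 4)))

# (("%s=%s" % pair for pair in kwargs.items()) given **kwargs)
pairs = lambda **kwargs: ("%s=%s" % pair for pair in kwargs.items())

results = (square(9), product(9, 10), product(9), product())
```

The semantics are the same in each case; only the keyword and the position of the argument list differ.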


From steve at  Fri Feb 17 07:09:26 2006
From: steve at (Steve Holden)
Date: Fri, 17 Feb 2006 01:09:26 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <dt3pah$j0$>

Thomas Wouters wrote:
> On Thu, Feb 16, 2006 at 01:11:49PM -0800, Guido van Rossum wrote:
>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>with this behavior (but implemented in C) would be a good addition to
>>the language. It looks like it wouldn't be hard to implement. It could
>>be a builtin named defaultdict. The first, required, argument to the
>>constructor should be the default value. Remaining arguments (even
>>keyword args) are passed unchanged to the dict constructor.
> Should a dict subclass really change the constructor/initializer signature
> in an incompatible way?
Dict is a particularly difficult type to subclass anyway, given that it 
can take an arbitrary number of arbitrarily-named keyword arguments 
(among many other argument styles).

The proposed behavior is exactly how Icon tables behaved, and it was 
indeed useful in that language. Guido is right about setdefault being a 
busted flush.

If there's no way to resolve the signature issue (which there may not 
be, given that

dict({'one': 2, 'two': 3})
dict({'one': 2, 'two': 3}.items())
dict({'one': 2, 'two': 3}.iteritems())
dict(zip(('one', 'two'), (2, 3)))
dict([['two', 3], ['one', 2]])
dict(one=2, two=3)
dict([(['one', 'two'][i-2], i) for i in (2, 3)])

are all valid calls to the type) then a factory function would be a very 
acceptable substitute, no? (The function could make use of a subclass - 
there's surely no necessity to provide the default as an initializer 
argument: it could be provided as an argument to a method present only 
in the subclass).

wishing-i-could-have-lunch-with-alex-ly y'rs  - steve

From stephen at  Fri Feb 17 07:11:12 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 17 Feb 2006 15:11:12 +0900
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <>
	(Guido van Rossum's message of "Wed, 15 Feb 2006 12:33:10 -0800")
References: <>
	<> <>
Message-ID: <>

>>>>> "Guido" == Guido van Rossum <guido at> writes:

    Guido> I think that the implementation of encoding-guessing or
    Guido> auto-encoding-upgrade techniques should be left out of the
    Guido> standard library design for now.

As far as I can see, little new design is needed.  There's no reason
why an encoding-guesser couldn't be written as a codec that detects
the coding, then dispatches to the appropriate codec.  The only real
issue I know of is that if you ask such a codec "who are you?", there
are two plausible answers: "autoguess" and the codec actually being
used to translate the stream.  If there's no API to ask for both of
those, the API might want generalization.
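
A toy sketch of the dispatching idea in Python 3. It is deliberately limited to byte-order-mark sniffing with a fixed fallback (real guessers do much more), and the function name guess_decode is invented for the example. Returning the codec name alongside the text is one way to answer "which codec was actually used?" while the caller still knows it asked for guessing.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE mark begins with
# the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_decode(data, fallback="utf-8"):
    """Return (text, codec_name), choosing the codec from a leading BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data.decode(name), name
    return data.decode(fallback), fallback


from_utf8_bom = guess_decode(codecs.BOM_UTF8 + b"hi")
from_utf16 = guess_decode("hi".encode("utf-16"))
from_plain = guess_decode(b"plain")
```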

    Guido> As far as searching bytes objects, that shouldn't be a
    Guido> problem as long as the search 'string' is also specified as
    Guido> a bytes object.

You do need to be a little careful in implementation, as (for example)
"case insensitive" should be meaningless for searching bytes objects.
This would be especially important if searching and collation become
more Unicode conformant.
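
A small illustration using the bytes type as it exists in Python 3 (the sample request line is made up): searching works as long as the needle is also bytes, a text needle is rejected outright, and matching is an exact byte comparison with no case folding.

```python
data = b"GET /index.html HTTP/1.0\r\n"

# Bytes needle: ordinary substring search on raw byte values.
pos = data.find(b"HTTP")

# Text needle: refused rather than implicitly encoded.
try:
    data.find("HTTP")
    mixed_types_rejected = False
except TypeError:
    mixed_types_rejected = True

# No case-insensitive matching: these are different byte sequences.
lowercase_found = b"http" in data
```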

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From bokr at  Fri Feb 17 07:24:57 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 06:24:57 GMT
Subject: [Python-Dev] Pre-PEP: The "bytes" object
References: <>
Message-ID: <>

On Thu, 16 Feb 2006 12:47:22 -0800, Guido van Rossum <guido at> wrote:

>On 2/15/06, Neil Schemenauer <nas at> wrote:
>> This could be a replacement for PEP 332.  At least I hope it can
>> serve to summarize the previous discussion and help focus on the
>> currently undecided issues.
>> I'm too tired to dig up the rules for assigning it a PEP number.
>> Also, there are probably silly typos, etc.   Sorry.
>I may check it in for you, although right now it would be good if we
>had some more feedback.
>I noticed one behavior in your pseudo-code constructor that seems
>questionable: while in the Q&A section you explain why the encoding is
>ignored when the argument is a str instance, in fact you require an
>encoding (and one that's not "ascii") if the str instance contains any
>non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
>"blah") would succeed. I think that's a bit strange -- if you ignore
>the encoding, you should always ignore it. So IMO bytes("\xff") and
>bytes("\xff", "ascii") should both return the same as bytes([255]).
>Also, there's a code path where the initializer is a unicode instance
>and its encode() method is called with None as the argument. I think
>both could be fixed by setting the encoding to
>sys.getdefaultencoding() if it is None and the argument is a unicode
>    def bytes(initialiser=[], encoding=None):
>        if isinstance(initialiser, basestring):
>            if isinstance(initialiser, unicode):
>                if encoding is None:
>                    encoding = sys.getdefaultencoding()
>                initialiser = initialiser.encode(encoding)
>            initialiser = [ord(c) for c in initialiser]
>        elif encoding is not None:
>            raise TypeError("explicit encoding invalid for non-string "
>                            "initialiser")
>        create bytes object and fill with integers from initialiser
>        return bytes object

Two things:

As the above shows, str is encoding-agnostic and passes through
unmodified to bytes (except by ord).

I am wondering what it would hurt to allow the same for unicode ords,
since unicode is also encoding-agnostic. Please read [2] before
deciding that you have already decided this ;-)

The beauty of a unicode literal IMO is that it launders away
the source encoding into a coding-agnostic character sequence
that has stable ords across the universe, so why not use them?
It also solves a lot of escaping grief. But see [2].

After all, in either case, an encoding can be specified if so desired. Thus

     def bytes(initialiser=[], encoding=None):
         if isinstance(initialiser, basestring):
             if encoding:
                 initialiser = initialiser.encode(encoding) # XXX for str ?? see [2]
             initialiser = [ord(c) for c in initialiser]
         elif encoding is not None:
             raise TypeError("explicit encoding invalid for non-string "
                             "initialiser")
         create bytes object and fill with integers from initialiser
         return bytes object
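
A runnable sketch of this variant in Python 3 terms, where str plays the role of the unicode type. The name make_bytes is hypothetical, and this is an illustration of the proposal rather than how the built-in bytes constructor behaves. Without an encoding, each character contributes its ordinal directly, so anything above 255 fails with ValueError.

```python
def make_bytes(initialiser=(), encoding=None):
    """Build bytes from ordinals, or via an explicit encoding for text."""
    if isinstance(initialiser, str):
        if encoding is not None:
            return initialiser.encode(encoding)
        # No encoding given: use the character ordinals as byte values.
        return bytes(ord(c) for c in initialiser)
    if encoding is not None:
        raise TypeError("explicit encoding invalid for non-string "
                        "initialiser")
    return bytes(initialiser)


by_ords = make_bytes("abc\xf6")
by_utf8 = make_bytes("abc\xf6", "utf-8")
from_ints = make_bytes([255])
```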


One thing I wonder is where sys.getdefaultencoding() gets its info, and whether
a module_encoding is also necessary for str arguments with encoding.

E.g. if the source encoding is utf-8, and you want sys.getdefaultencoding()
finally, don't you first have to do decode from the source encoding, rather than
let the default decoding assumption for that be ascii? E.g. for utf-8 source,

    initialiser.decode('utf-8').encode(sys.getdefaultencoding()) ?

works, but

    initialiser.encode(sys.getdefaultencoding())  ?

bombs, because it tries to do .decode('ascii') in place of .decode('utf-8')

Notice where the following fails (where utf-8 source is written to
by and using latin-1 as standin for sys.getdefaultencoding())

----< >-------------------------------------------
def test():
    latin_1_src = """\
# -*- coding: utf-8 -*-
print '\\nfrom tutf8 import:'
print map(hex,map(ord, 'abc\xf6'))
print map(hex,map(ord,'abc\xf6'.decode('utf-8').encode('latin-1')))
print map(hex,map(ord,repr('abc\xf6'.encode('latin-1'))))

if __name__ == '__main__':
    print '\ utf-8 binary line reprs:'
    print '\n'.join(repr(L) for L in open('','rb').read().splitlines())
    import tutf8
The result:

[20:17] C:\pywk\pydev\pep0332>py24 utf-8 binary line reprs:
'# -*- coding: utf-8 -*-'
"print '\\nfrom tutf8 import:'"
"print map(hex,map(ord, 'abc\xc3\xb6'))"
"print map(hex,map(ord,'abc\xc3\xb6'.decode('utf-8').encode('latin-1')))"
"print map(hex,map(ord,repr('abc\xc3\xb6'.encode('latin-1'))))"

from tutf8 import:
['0x61', '0x62', '0x63', '0xc3', '0xb6']
['0x61', '0x62', '0x63', '0xf6']
Traceback (most recent call last):
  File "", line 15, in ?
    import tutf8
  File "C:\pywk\pydev\pep0332\", line 5, in ?
    print map(hex,map(ord,repr('abc+¦'.encode('latin-1'))))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

I.e., if you leave out encoding for a str, you apparently get the native source
str representation of the literal, so it would seem that that must be undone
if you want to re-encode to anything else.

Should there be tutf8.__encoding__ available for this after import tutf8?
But that's interesting when str becomes unicode, and all literals will presumably have
an internal uniform unicode encoding, so the 'literal'.decode(source_encoding) will in effect already
have been done. What does a decode mean on unicode? It seems to mean blow up on non-ascii, so
that's not very portable. Why not use latin-1 as the default intermediate str representation when
doing a u'something'.decode(enc) ? The restriction to ascii in that context seems artificial.

IMHO and with all due respect ISTM the pain of all these considerations is not worth it when
the simple practicality of just prefixing a "u" on any ascii literal freely sprinkled
with escapes gets you exactly the bytes values you specify in any hex escapes. That's normally
what you want.

If by 'abc\xf6' you really mean the character with ord value 0xf6 in some encoding, then
bytes('abc\xf6'.decode(someenc), destenc) would be the way, so no one is stuck.

One danger is that someone is writing in an incomplete source character set and
wants to stick in some byte values in hex, happily sticking to the ascii subset
plus escapes, but a decode from the source encoding can fail on non-existent character
if the "ascii escape" is not in the source character set. E.g., cp1252 is pretty complete,

 >>> '\x81'.decode('cp1252')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "d:\python-2.4b1\lib\encodings\", line 22, in decode
     return codecs.charmap_decode(input,errors,decoding_map)
 UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to <undefined>

This can't happen with the same literal of ascii plus escapes passed as a unicode literal, given
that map(ord, literal) is done on it to get bytes when no encoding is specified. You just get what you expect.
It seems practical to me. I'm really trying to help, not piss you off ;-)
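
The contrast can be reproduced in Python 3 as a quick sketch: cp1252 leaves byte 0x81 undefined, so decoding it fails, while latin-1 maps every byte value to the character with the same ordinal and therefore round-trips all 256 values.

```python
# cp1252 has no character assigned to 0x81.
try:
    b"\x81".decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# latin-1 is total: byte value n always becomes the character with ord n.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
ords_preserved = [ord(c) for c in as_text] == list(range(256))
round_trips = as_text.encode("latin-1") == all_bytes
```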

BTW, I recently posted re str.translate vs unicode.translate, which has some tie-in with this, since
I anticipate that bytes.translate would be a useful thing in the absence of str.translate.
unicode.translate won't do all one might like to do with bytes.translate, I believe. Both
have uses.

>BTW, for folks who want to experiment, it's quite simple to create a
>working bytes implementation by inheriting from array.array. Here's a
>quick draft (which only takes str instance arguments):
>    from array import array
>    class bytes(array):
>        def __new__(cls, data=None):
>            b = array.__new__(cls, "B")
>            if data is not None:
>                b.fromstring(data)
>            return b
>        def __str__(self):
>            return self.tostring()
>        def __repr__(self):
>            return "bytes(%s)" % repr(list(self))
>        def __add__(self, other):
>            if isinstance(other, array):
>                return bytes(super(bytes, self).__add__(other))
>            return NotImplemented
Cool, thanks.

Bengt Richter

From stephen at  Fri Feb 17 07:40:53 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 17 Feb 2006 15:40:53 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
	(Guido van Rossum's message of "Wed, 15 Feb 2006 11:16:51 -0800")
References: <>
Message-ID: <>

>>>>> "Guido" == Guido van Rossum <guido at> writes:

    Guido> I'd say there are two "symmetric" API flavors possible (t
    Guido> and b are text and bytes objects, respectively, where text
    Guido> is a string type, either str or unicode; enc is an encoding
    Guido> name):

    Guido> - b.decode(enc) -> t; t.encode(enc) -> b

-0  When taking a binary file and attaching it to the text of a mail
message using BASE64, the tendency to say you're "encoding the file in
BASE64" is very strong.  I just don't see how such usages can be
avoided in discussion, which makes the types of decode and encode hard
to remember, and easy to mistake in some contexts.

    Guido> - b = bytes(t, enc); t = text(b, enc)

+1  The coding conversion operation has always felt like a constructor
to me, and in this particular usage that's exactly what it is.  I
prefer the nomenclature to reflect that.
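
Both flavors can be tried side by side in Python 3, where str is the text type. This is a small sketch with a made-up sample string: the constructor spellings and the method spellings produce the same objects.

```python
text = "na\u00efve"

# Constructor flavor: b = bytes(t, enc); t = text(b, enc)
b = bytes(text, "utf-8")
t = str(b, "utf-8")

# Method flavor: t.encode(enc) -> b; b.decode(enc) -> t
same_as_methods = (b == text.encode("utf-8")) and (t == b.decode("utf-8"))
```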


From stephen at  Fri Feb 17 07:43:48 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 17 Feb 2006 15:43:48 +0900
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <> (Bob
	Ippolito's message of "Wed, 15 Feb 2006 11:23:22 -0800")
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

>>>>> "Bob" == Bob Ippolito <bob at> writes:

    Bob> Huh?  What does that have to do with anything?  I've never
    Bob> seen a system where /usr/include, /usr/lib, /usr/bin,
    Bob> etc. are not all on the same mount.  It's not really any
    Bob> different with OS X either.

/usr/share often is on a different mount; that's the whole rationale
for /usr/share.


From martin at  Fri Feb 17 08:09:23 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 08:09:23 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> Feedback?

I would like this to be part of the standard dictionary type,
rather than being a subtype.

d.setdefault([]) (one argument) should install a default value,
and d.cleardefault() should remove that setting; d.default
should be read-only. Alternatively, d.default could be assignable
and del-able.

Also, I think has_key/in should return True if there is a default.


From martin at  Fri Feb 17 08:13:04 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 08:13:04 +0100
Subject: [Python-Dev] Does eval() leak?
In-Reply-To: <>
References: <>
Message-ID: <>

John Marshall wrote:
> Should I expect the virtual memory allocation
> to go up if I do the following?

python-dev is a list for discussing the development of Python,
not development with Python. Please post this question
to python-list at

For python-dev, a message explaining where the memory leak
is and how to correct it would be more appropriate. Most
likely, there is no memory leak in eval.


From bokr at  Fri Feb 17 08:41:37 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 07:41:37 GMT
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <>

On Thu, 16 Feb 2006 13:11:49 -0800, Guido van Rossum <guido at> wrote:

>A bunch of Googlers were discussing the best way of doing the
>following (a common idiom when maintaining a dict of lists of values
>relating to a key, sometimes called a multimap):
>  if key not in d: d[key] = []
>  d[key].append(value)
>An alternative way to spell this uses setdefault(), but it's not very readable:
>  d.setdefault(key, []).append(value)
>and it also suffers from creating an unnecessary list instance.
>(Timings were inconclusive; the approaches are within 5-10% of each
>other in speed.)
>My conclusion is that setdefault() is a failure -- it was a
>well-intentioned construct, but doesn't actually create more readable
>code.
>Google has an internal data type called a DefaultDict which gets
>passed a default value upon construction. Its __getitem__ method,
>instead of raising KeyError, inserts a shallow copy (!) of the given
>default value into the dict when the value is not found. So the above
>code, after
>  d = DefaultDict([])
>can be written as simply
>  d[key].append(value)
Wouldn't it be more generally powerful to pass a type or factory function
that is used to instantiate a default object when a missing key is encountered, e.g.

   d = DefaultDict(list)



but then you can also do

   d = DefaultDict(dict)


   class Foo(object): pass
   d = DefaultDict(Foo)
   d[key].phone = '415-555-1212'

etc. No worries about generalizing shallow copying either ;-)
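
A minimal sketch of the factory idea above, assuming a hypothetical
FactoryDict class (the class name and the choice to override __getitem__
are illustrative only, not an agreed design). The factory is called with
no arguments each time a missing key is looked up, so no copying of a
shared default is involved:

```python
class FactoryDict(dict):
    """Hypothetical dict subclass: calls a factory to build missing values."""

    def __init__(self, factory, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.factory = factory

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Build a fresh default, store it under the key, and return it.
            value = self[key] = self.factory()
            return value


class Foo(object):
    pass


# A multimap: every missing key starts out as a new empty list.
multimap = FactoryDict(list)
multimap["a"].append(1)
multimap["a"].append(2)

# Any zero-argument callable works, including a class.
people = FactoryDict(Foo)
people["bob"].phone = "415-555-1212"
```

In this sketch "key in d" is left untouched, so a key only appears in the
dict once it has been looked up with d[key].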

>Note that of all the possible semantics for __getitem__ that could
>have produced similar results (e.g. not inserting the default in the
>underlying dict, or not copying the default value), the chosen
>semantics are the only ones that makes this example work.
>Over lunch with Alex Martelli, he proposed that a subclass of dict
>with this behavior (but implemented in C) would be a good addition to
>the language. It looks like it wouldn't be hard to implement. It could
>be a builtin named defaultdict. The first, required, argument to the
>constructor should be the default value. Remaining arguments (even
>keyword args) are passed unchanged to the dict constructor.
>Some more design subtleties:
>- "key in d" still returns False if the key isn't there
>- "d.get(key)" still returns None if the key isn't there
>- "d.default" should be a read-only attribute giving the default value
See above.

Bengt Richter

From lists at  Fri Feb 17 08:27:47 2006
From: lists at (Dmitry Vasiliev)
Date: Fri, 17 Feb 2006 10:27:47 +0300
Subject: [Python-Dev] Test failures in test_timeout
In-Reply-To: <dt3onn$rlt$>
References: <> <dt3onn$rlt$>
Message-ID: <>

Steve Holden wrote:
> Thomas Wouters wrote:
>> I'm thinking that it could probably try to connect to a less reliable
>> website, but that's just moving the problem around (and possibly harassing
>> an unsuspecting website, particularly around release-time.) Perhaps the test
>> should try to connect to a known unconnecting site, like a firewalled port
>> on Not something that refuses connections, just something
>> that times out.
> Couldn't the test use subprocess to start a reliably slow server on 
> localhost? It might even be possible to retrieve the ephemeral port 
> number used by the server, to avoid conflicts with already-used ports on 
> the testing machine.

About 3 years ago I submitted the patch for test_timeout which had fixed some 
of these issues:

Now I think the patch needs more review and needs to be updated for the current 
Python version, and maybe with some new ideas.

Dmitry Vasiliev (dima at

From bob at  Fri Feb 17 10:07:02 2006
From: bob at (Bob Ippolito)
Date: Fri, 17 Feb 2006 01:07:02 -0800
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	<>	<>	<>	<>	<dt020q$s7$>
Message-ID: <>

On Feb 16, 2006, at 11:35 AM, Benji York wrote:

> Alexander Schremmer wrote:
>> In fact, PHP does it like which is even  
>> shorter, i.e.
>> they fallback to the documentation if that path does not exist  
>> otherwise.
> Like many things PHP, that seems a bit too magical for my tastes.

Not only does it fall back to documentation, it falls back to a  
search for documentation if there isn't a function of that name.

It's a convenient feature, I'm sure people would use it if it was  
there... even if it was something like


From g.brandl at  Fri Feb 17 10:10:32 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 10:10:32 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <dt43u8$rd8$>

Guido van Rossum wrote:

>   d = DefaultDict([])
> can be written as simply
>   d[key].append(value)

> Feedback?

Probably a good idea, has been proposed multiple times on clpy.
One good thing would be to be able to specify either a default value
or a factory function.

While at it, other interesting dict subclasses could be:
* sorteddict, practically reinvented by every larger project
* keytransformdict, such as d = keytransformdict(str.lower).


From walter at  Fri Feb 17 10:31:25 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 17 Feb 2006 10:31:25 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> A bunch of Googlers were discussing the best way of doing the
> following (a common idiom when maintaining a dict of lists of values
> relating to a key, sometimes called a multimap):
>   if key not in d: d[key] = []
>   d[key].append(value)
> An alternative way to spell this uses setdefault(), but it's not very readable:
>   d.setdefault(key, []).append(value)
> and it also suffers from creating an unnecessary list instance.
> (Timings were inconclusive; the approaches are within 5-10% of each
> other in speed.)
> My conclusion is that setdefault() is a failure -- it was a
> well-intentioned construct, but doesn't actually create more readable
> code.
> Google has an internal data type called a DefaultDict which gets
> passed a default value upon construction. Its __getitem__ method,
> instead of raising KeyError, inserts a shallow copy (!) of the given
> default value into the dict when the value is not found. So the above
> code, after
>   d = DefaultDict([])
> can be written as simply
>   d[key].append(value)

Using a shallow copy of the default seems a bit too magical to me. How 
would this be done? Via copy.copy?
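
One way the copy-on-miss behaviour could look, as a rough sketch that
assumes copy.copy is the copying mechanism (the quoted description does
not say which mechanism is used) and uses a made-up class name:

```python
import copy


class CopyingDefaultDict(dict):
    """Hypothetical sketch: stores a shallow copy of a default on a miss."""

    def __init__(self, default):
        dict.__init__(self)
        self.default = default

    def __getitem__(self, key):
        if key not in self:
            # Each missing key gets its own shallow copy of the default,
            # so appending to one value does not affect the others.
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)


d = CopyingDefaultDict([])
d["x"].append(1)
d["y"].append(2)
```

Without the copy, every key would share the single list passed to the
constructor.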

And passing [] to the constructor of dict has a different meaning already.

Fetching the default via a static/class method would solve both problems:

class default_dict(dict):
    def __getitem__(self, key):
        # Return the stored value if present; otherwise create,
        # store and return a fresh default.
        if key in self:
            return dict.__getitem__(self, key)
        default = self.getdefault()
        self[key] = default
        return default

class multi_map(default_dict):
    def getdefault(self):
        return []

class counting_dict(default_dict):
    def getdefault(self):
        return 0

> [...]

    Walter Dörwald

From theller at  Fri Feb 17 10:42:40 2006
From: theller at (Thomas Heller)
Date: Fri, 17 Feb 2006 10:42:40 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt43u8$rd8$>
References: <>
Message-ID: <dt45qg$3be$>

> Guido van Rossum wrote:
>>   d = DefaultDict([])
>> can be written as simply
>>   d[key].append(value)
>> Feedback?

Ok, setdefault is a horrible name.  Would it make sense to come up with a better name?

Georg Brandl wrote:

> Probably a good idea, has been proposed multiple times on clpy.
> One good thing would be to be able to specify either a default value
> or a factory function.
> While at it, other interesting dict subclasses could be:
> * sorteddict, practically reinvented by every larger project

You mean ordereddict, not sorteddict, I hope.

> * keytransformdict, such as d = keytransformdict(str.lower).

Not sure what you mean by that.

What *I* would like is probably more ambitious:  I want a dict that allows case-insensitive
lookup of string keys, plus ideally I want to use it as class or instance dictionary.
Use case: COM wrappers.


From bob at  Fri Feb 17 10:50:15 2006
From: bob at (Bob Ippolito)
Date: Fri, 17 Feb 2006 01:50:15 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 16, 2006, at 9:20 PM, Josiah Carlson wrote:

> Greg Ewing <greg.ewing at> wrote:
>> Josiah Carlson wrote:
>>> They may not be encodings of _unicode_ data,
>> But if they're not encodings of unicode data, what
>> business do they have being available through
>> someunicodestring.encode(...)?
> I had always presumed that bytes objects are going to be able to be a
> source for encode AND decode, like current non-unicode strings are  
> able
> to be today.  In that sense, if I have a bytes object which is an
> encoding of rot13, hex, uu, etc., or I have a bytes object which I  
> would
> like to be in one of those encodings, I should be able to do  
> b.encode(...)
> or b.decode(...), given that 'b' is a bytes object.
> Are 'encodings' going to become a mechanism to encode and decode
> _unicode_ strings, rather than a mechanism to encode and decode _text
> and data_ strings?  That would seem like a backwards step to me, as  
> the
> email package would need to package their own base-64 encode/decode  
> and implementation, and similarly for any other package which uses any
> one of the encodings already available.

It would be VERY useful to separate the two concepts.  bytes<->bytes  
transforms should be one function pair, and bytes<->text transforms  
should be another.  The current situation is totally insane:
	str.decode(codec) -> str or unicode or UnicodeDecodeError or  
ZlibError or TypeError.. who knows what else
	str.encode(codec) -> str or unicode or UnicodeDecodeError or  
TypeError... probably other exceptions

Granted, unicode.encode(codec) and unicode.decode(codec) are actually  
somewhat sane in that the return type is always a str and the  
exceptions are either UnicodeEncodeError or UnicodeDecodeError.

I think that rot13 is the only conceptually text<->text transform  
(though the current implementation is really bytes<->bytes),  
everything else is either bytes<->text or bytes<->bytes.
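
To illustrate the separation argued for above, here is a small sketch
with one function pair per kind of transform. The four function names
are hypothetical, and the code assumes a Python in which text and bytes
are distinct types; binascii.hexlify/unhexlify stand in for a
bytes<->bytes transform:

```python
import binascii


def to_hex(data):
    """bytes -> bytes: hex-encode raw data."""
    return binascii.hexlify(data)


def from_hex(data):
    """bytes -> bytes: reverse of to_hex."""
    return binascii.unhexlify(data)


def encode_text(text, encoding="utf-8"):
    """text -> bytes: always returns bytes."""
    return text.encode(encoding)


def decode_text(data, encoding="utf-8"):
    """bytes -> text: always returns text."""
    return data.decode(encoding)


raw = encode_text(u"caf\xe9")      # text -> bytes
hexed = to_hex(raw)                # bytes -> bytes
round_trip = decode_text(from_hex(hexed))
```

With the pairs kept apart, the return type of each call is predictable
from the function alone rather than from the codec name.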


From g.brandl at  Fri Feb 17 10:56:20 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 10:56:20 +0100
Subject: [Python-Dev] still available
In-Reply-To: <>
References: <dsq741$4un$>	<>	<>	<>	<>	<dt020q$s7$>	<19igus5puu6e2$>	<>
Message-ID: <dt46k4$5id$>

Bob Ippolito wrote:
> On Feb 16, 2006, at 11:35 AM, Benji York wrote:
>> Alexander Schremmer wrote:
>>> In fact, PHP does it like which is even  
>>> shorter, i.e.
>>> they fallback to the documentation if that path does not exist  
>>> otherwise.
>> Like many things PHP, that seems a bit too magical for my tastes.
> Not only does it fall back to documentation, it falls back to a  
> search for documentation if there isn't a function of that name.
> It's a convenient feature, I'm sure people would use it if it was  
> there... even if it was something like

Yes. Either that or would be nice.

(alongside with the "custom markers" I proposed one time so that there
can be "speaking" URLs like



From g.brandl at  Fri Feb 17 11:00:26 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 11:00:26 +0100
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
	to PyNumberMethods
In-Reply-To: <>
References: <dsu7t9$m9c$> <>
Message-ID: <dt46rq$5id$>

Bernhard Herzog wrote:
> "Travis E. Oliphant" <oliphant.travis at> writes:
>>     2) The __index__ special method will have the signature
>>        def __index__(self):
>>            return obj
>>        Where obj must be either an int or a long or another object
>>        that has the __index__ special method (but not self).
> So int objects will not have an __index__ method (assuming that ints
> won't return a different but equal int object).  However:
>>     4) A new operator.index(obj) function will be added that calls
>>        equivalent of obj.__index__() and raises an error if obj does not
>>        implement the special method.
> So operator.index(1) will raise an exception.  I would expect
> operator.index to be implemented using PyNumber_index.

I'd expect that __index__ won't be called on an int in the first place.


From ncoghlan at  Fri Feb 17 11:37:57 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 17 Feb 2006 20:37:57 +1000
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
 to PyNumberMethods
In-Reply-To: <dt46rq$5id$>
References: <dsu7t9$m9c$> <>
Message-ID: <>

Georg Brandl wrote:
> Bernhard Herzog wrote:
>> "Travis E. Oliphant" <oliphant.travis at> writes:
>>>     2) The __index__ special method will have the signature
>>>        def __index__(self):
>>>            return obj
>>>        Where obj must be either an int or a long or another object
>>>        that has the __index__ special method (but not self).
>> So int objects will not have an __index__ method (assuming that ints
>> won't return a different but equal int object).  However:
>>>     4) A new operator.index(obj) function will be added that calls
>>>        equivalent of obj.__index__() and raises an error if obj does not
>>>        implement the special method.
>> So operator.index(1) will raise an exception.  I would expect
>> operator.index to be implemented using PyNumber_index.
> I'd expect that __index__ won't be called on an int in the first place.

The PEP has been updated to cover adding the __index__ slot to int/long so 
that "one check finds all". The slot will just get bypassed for ints and longs 
by a lot of the C code in the interpreter.
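
A short illustration of the behaviour described above, as it works in
Python versions that include __index__ support. The Offset class is a
made-up example type:

```python
import operator


class Offset(object):
    """Hypothetical integer-like type used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Must return a real integer.
        return self.value


items = ["a", "b", "c", "d"]

# Slicing accepts any object that implements __index__.
first_two = items[:Offset(2)]

# operator.index works both for such objects and for plain ints,
# since ints carry the slot as well.
as_int = operator.index(Offset(3))
plain = operator.index(1)
```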


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Fri Feb 17 11:55:36 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 11:55:36 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt45qg$3be$>
References: <>	<dt43u8$rd8$>
Message-ID: <dt4a38$gs9$>

Thomas Heller wrote:

>> Probably a good idea, has been proposed multiple times on clpy.
>> One good thing would be to be able to specify either a default value
>> or a factory function.
>> While at it, other interesting dict subclasses could be:
>> * sorteddict, practically reinvented by every larger project
> You mean ordereddict, not sorteddict, I hope.

Well, yes.

>> * keytransformdict, such as d = keytransformdict(str.lower).
> Not sure what you mean by that.
> What *I* would like is probably more ambitious:  I want a dict that allows case-insensitive
> lookup of string keys

This is exactly what this would do. All keys are transformed to lowercase when
setting and looking up.
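
A rough sketch of what such a type might look like. The class below is
hypothetical and deliberately incomplete: only item assignment, item
lookup and "in" apply the transform, while methods such as get(),
update() and the constructor arguments are not handled:

```python
class KeyTransformDict(dict):
    """Hypothetical dict that passes every key through a transform."""

    def __init__(self, transform):
        dict.__init__(self)
        self.transform = transform

    def __setitem__(self, key, value):
        dict.__setitem__(self, self.transform(key), value)

    def __getitem__(self, key):
        return dict.__getitem__(self, self.transform(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.transform(key))


d = KeyTransformDict(str.lower)
d["Content-Type"] = "text/plain"
value = d["CONTENT-TYPE"]          # same entry, different spelling
```

Note that only the transformed key is stored, so the original spelling
is lost.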

> plus ideally I want to use it as class or instance dictionary.
> Use case: COM wrappers.


From mwh at  Fri Feb 17 11:57:25 2006
From: mwh at (Michael Hudson)
Date: Fri, 17 Feb 2006 10:57:25 +0000
Subject: [Python-Dev] Rename str/unicode to text
In-Reply-To: <>
	(Guido van Rossum's message of "Thu, 16 Feb 2006 10:50:08 -0800")
References: <>
Message-ID: <>

Guido van Rossum <guido at> writes:

> OTOH, even if we didn't rename str/unicode to text, opentext would
> still be a good name for the function that opens a text file.

Hnnrgh, not really.  You're not opening a 'text', nor are you
constructing something that might reasonably be called an 'opentext'.
textfile() seems better.


  Q: What are 1000 lawyers at the bottom of the ocean?
  A: A good start.
  (A lawyer told me this joke.)
                                  -- Michael Ströder, comp.lang.python

From p.f.moore at  Fri Feb 17 12:04:17 2006
From: p.f.moore at (Paul Moore)
Date: Fri, 17 Feb 2006 11:04:17 +0000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <002301c6337a$7001f140$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On 2/17/06, Raymond Hettinger <python at> wrote:
> >> Over lunch with Alex Martelli, he proposed that a subclass of dict
> >> with this behavior (but implemented in C) would be a good addition to
> >> the language
> I would like to add something like this to the collections module,


> but a PEP is probably needed to deal with issues like:

+0 (You're probably right, but I fear there's no "perfect answer", so
discussions could go round in circles...)


From fuzzyman at  Fri Feb 17 12:02:05 2006
From: fuzzyman at (Fuzzyman)
Date: Fri, 17 Feb 2006 11:02:05 +0000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Löwis wrote:
> Guido van Rossum wrote:
>> Feedback?
> I would like this to be part of the standard dictionary type,
> rather than being a subtype.
> d.setdefault([]) (one argument) should install a default value,
> and d.cleardefault() should remove that setting; d.default
> should be read-only. Alternatively, d.default could be assignable
> and del-able.
> Also, I think has_key/in should return True if there is a default.
And exactly what use would it then be ?

Michael Foord

> Regards,
> Martin

From fredrik at  Fri Feb 17 12:14:27 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 12:14:27 +0100
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <dt4b6m$lps$>

Guido van Rossum wrote:

> A bunch of Googlers were discussing the best way of doing the
> following (a common idiom when maintaining a dict of lists of values
> relating to a key, sometimes called a multimap):
>   if key not in d: d[key] = []
>   d[key].append(value)


> Feedback?

+1.  check it in, already (as collections.defaultdict, perhaps?)

alternatively, you could specialize even further: collections.multimap,
which deals with list values only (that shallow copy thing feels a bit
questionable, but all alternatives feel slightly overgeneralized...)


From fredrik at  Fri Feb 17 12:23:44 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 12:23:44 +0100
Subject: [Python-Dev] Rename str/unicode to text
References: <><>
Message-ID: <dt4bo2$nm5$>

Michael Hudson wrote:

>  > OTOH, even if we didn't rename str/unicode to text, opentext would
> > still be a good name for the function that opens a text file.
> Hnnrgh, not really.  You're not opening a 'text', nor are you
> constructing something that might reasonably be called an 'opentext'.
> textfile() seems better.

except that in Python, file is a type, and open is an action.

but I agree that textfile reads better (haven't we been through this
a couple of times already, btw?  iirc, my original textfile proposal was
posted in 1846, or so)


From fredrik at  Fri Feb 17 12:39:17 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 12:39:17 +0100
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <dt4cl7$qds$>

Martin v. Löwis wrote:

> Also, I think has_key/in should return True if there is a default.

and keys should return all possible key values!


From g.brandl at  Fri Feb 17 13:00:29 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 13:00:29 +0100
Subject: [Python-Dev] Deprecate ``multifile``?
Message-ID: <dt4dst$tma$>


as Jim Jewett noted, multifile is supplanted by email as much as mimify etc.
but it is not marked as deprecated. Should it be deprecated in 2.5?


From mal at  Fri Feb 17 13:03:29 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 13:03:29 +0100
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On 2/15/06, Neil Schemenauer <nas at> wrote:
>> This could be a replacement for PEP 332.  At least I hope it can
>> serve to summarize the previous discussion and help focus on the
>> currently undecided issues.
>> I'm too tired to dig up the rules for assigning it a PEP number.
>> Also, there are probably silly typos, etc.   Sorry.
> I may check it in for you, although right now it would be good if we
> had some more feedback.
> I noticed one behavior in your pseudo-code constructor that seems
> questionable: while in the Q&A section you explain why the encoding is
> ignored when the argument is a str instance, in fact you require an
> encoding (and one that's not "ascii") if the str instance contains any
> non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
> "blah") would succeed. I think that's a bit strange -- if you ignore
> the encoding, you should always ignore it. So IMO bytes("\xff") and
> bytes("\xff", "ascii") should both return the same as bytes([255]).
> Also, there's a code path where the initializer is a unicode instance
> and its encode() method is called with None as the argument. I think
> both could be fixed by setting the encoding to
> sys.getdefaultencoding() if it is None and the argument is a unicode
> instance:
>     def bytes(initialiser=[], encoding=None):
>         if isinstance(initialiser, basestring):
>             if isinstance(initialiser, unicode):
>                 if encoding is None:
>                     encoding = sys.getdefaultencoding()
>                 initialiser = initialiser.encode(encoding)
>             initialiser = [ord(c) for c in initialiser]
>         elif encoding is not None:
>             raise TypeError("explicit encoding invalid for non-string "
>                             "initialiser")
>         create bytes object and fill with integers from initialiser
>         return bytes object
> BTW, for folks who want to experiment, it's quite simple to create a
> working bytes implementation by inheriting from array.array. Here's a
> quick draft (which only takes str instance arguments):
>     from array import array
>     class bytes(array):
>         def __new__(cls, data=None):
>             b = array.__new__(cls, "B")
>             if data is not None:
>                 b.fromstring(data)
>             return b
>         def __str__(self):
>             return self.tostring()
>         def __repr__(self):
>             return "bytes(%s)" % repr(list(self))
>         def __add__(self, other):
>             if isinstance(other, array):
>                 return bytes(super(bytes, self).__add__(other))
>             return NotImplemented

Another hint:

If you want to play around with the migration
to all Unicode in Py3k, start Python with the -U switch and
monkey-patch the builtin str to be an alias for unicode.

Ideally, the bytes type should work under both the Py3k conditions
and the Py2.x default ones.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 17 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From python at  Fri Feb 17 13:19:11 2006
From: python at (Raymond Hettinger)
Date: Fri, 17 Feb 2006 07:19:11 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <000f01c633bc$5c1f4820$b83efea9@RaymondLaptop1>

> My conclusion is that setdefault() is a failure -- it was a
> well-intentioned construct, but doesn't actually create more readable
> code.

It was an across the board failure:  naming, clarity, efficiency.
Can we agree to slate dict.setdefault() to disappear in Py3.0?


From walter at  Fri Feb 17 13:45:22 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 17 Feb 2006 13:45:22 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:

> On 2/16/06, M.-A. Lemburg <mal at> wrote:
>> What will be the explicit way to open a file in bytes mode
>> and in text mode (I for one would like to move away from
>> open() completely as well) ?
>> Will we have a single file type with two different modes
>> or two different types ?
> I'm currently thinking of an I/O stack somewhat like Java's. At the
> bottom there's a class that lets you do raw unbuffered reads and
> writes (and seek/tell) on binary files using bytes arrays. We can
> layer onto this buffering, text encoding/decoding, and more.  (Windows
> CRLF<->LF conversion is also an encoding of sorts).
> Years ago I wrote a prototype; checkout sandbox/sio/.

However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work 
for encodings that need state (e.g. when reading/writing UTF-16). 
Switching to stateful encoders/decoders isn't so easy, because the 
stateful codecs require a stream-API, which brings in a whole bunch of 
other functionality (readline() etc.), which we'd probably like to keep 
separate. I have a patch ( that should 
fix this problem (at least for all codecs derived from 
codecs.StreamReader/codecs.StreamWriter). Additionally it would make 
stateful codecs more useful in the context of iterators/generators.

I'd like this patch to go into 2.5.

    Walter Dörwald

From pje at  Fri Feb 17 13:52:41 2006
From: pje at (Phillip J. Eby)
Date: Fri, 17 Feb 2006 07:52:41 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt43u8$rd8$>
References: <>
Message-ID: <>

At 10:10 AM 02/17/2006 +0100, Georg Brandl wrote:
>Guido van Rossum wrote:
> >   d = DefaultDict([])
> >
> > can be written as simply
> >
> >   d[key].append(value)
> > Feedback?
>Probably a good idea, has been proposed multiple times on clpy.
>One good thing would be to be able to specify either a default value
>or a factory function.

+1 on factory function, e.g. "DefaultDict(list)".  A default value isn't 
very useful, because for immutable defaults, setdefault() works well 
enough.  If what you want is a copy of some starting object, you can always 
do something like DefaultDict({1:2,3:4}.copy).
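
The bound-method trick can be tried with collections.defaultdict as it
later shipped in the standard library. Using that type here is an
assumption relative to this thread, where the class is still only a
proposal. Any zero-argument callable serves as the factory, including
the copy method of an existing dict:

```python
from collections import defaultdict

template = {1: 2, 3: 4}

# template.copy is a bound method; calling it returns a new dict
# with the same contents each time a key is missing.
d = defaultdict(template.copy)

d["a"][5] = 6      # modifies only the copy stored under "a"
d["b"]             # a plain lookup is enough to create the entry
```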

From fredrik at  Fri Feb 17 13:50:18 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 13:50:18 +0100
Subject: [Python-Dev] Deprecate ``multifile``?
References: <dt4dst$tma$>
Message-ID: <dt4gqa$8i5$>

Georg Brandl wrote:

> as Jim Jewett noted, multifile is supplanted by email as much as mimify etc.
> but it is not marked as deprecated. Should it be deprecated in 2.5?

-0.5 (gratuitous breakage).

I think the current "see also/supersedes" link is good enough.


From mal at  Fri Feb 17 13:53:39 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 13:53:39 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/16/06, M.-A. Lemburg <mal at> wrote:
>> What will be the explicit way to open a file in bytes mode
>> and in text mode (I for one would like to move away from
>> open() completely as well) ?
>> Will we have a single file type with two different modes
>> or two different types ?
> I'm currently thinking of an I/O stack somewhat like Java's. At the
> bottom there's a class that lets you do raw unbuffered reads and
> writes (and seek/tell) on binary files using bytes arrays. We can
> layer onto this buffering, text encoding/decoding, and more. (Windows
> CRLF<->LF conversion is also an encoding of sorts).

Sounds like the stackable StreamWriters and -Readers would
nicely integrate into this design.
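
As a small illustration of the stacking, assuming io.BytesIO as a
stand-in for the raw byte-level layer (that choice is mine, not part of
the proposal), a StreamReader or StreamWriter from the codecs module can
be layered on top of any byte stream:

```python
import codecs
import io

# Reading: bytes at the bottom, decoded text at the top.
raw_in = io.BytesIO(u"gr\xfc\xdfe\nwelt\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw_in)
lines = reader.readlines()

# Writing: text goes in at the top, encoded bytes land at the bottom.
raw_out = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw_out)
writer.write(u"gr\xfc\xdfe\n")
written = raw_out.getvalue()
```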

> Years ago I wrote a prototype; checkout sandbox/sio/.

Thanks. Maybe one of these days I'll get around to having
a look - unlike many of the pydev folks, I don't work for
Google and can't spend 20% or 50% of my time on
Python core development :-)

Marc-Andre Lemburg


From g.brandl at  Fri Feb 17 14:01:22 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 14:01:22 +0100
Subject: [Python-Dev] Deprecate ``multifile``?
In-Reply-To: <dt4gqa$8i5$>
References: <dt4dst$tma$> <dt4gqa$8i5$>
Message-ID: <dt4hf2$adn$>

Fredrik Lundh wrote:
> Georg Brandl wrote:
>> as Jim Jewett noted, multifile is supplanted by email as much as mimify etc.
>> but it is not marked as deprecated. Should it be deprecated in 2.5?
> -0.5 (gratuitous breakage).
> I think the current "see also/supersedes" link is good enough.

Well, it would be deprecated like the other email modules, that is, only
a note is added to the docs and it is added to PEP 4. There would be no


From mal at  Fri Feb 17 14:10:43 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 14:10:43 +0100
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Walter Dörwald wrote:
> Guido van Rossum wrote:
>> On 2/16/06, M.-A. Lemburg <mal at> wrote:
>>> What will be the explicit way to open a file in bytes mode
>>> and in text mode (I for one would like to move away from
>>> open() completely as well) ?
>>> Will we have a single file type with two different modes
>>> or two different types ?
>> I'm currently thinking of an I/O stack somewhat like Java's. At the
>> bottom there's a class that lets you do raw unbuffered reads and
>> writes (and seek/tell) on binary files using bytes arrays. We can
>> layer onto this buffering, text encoding/decoding, and more.  (Windows
>> CRLF<->LF conversion is also an encoding of sorts).
>> Years ago I wrote a prototype; checkout sandbox/sio/.
> However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
> for encodings that need state (e.g. when reading/writing UTF-16).
> Switching to stateful encoders/decoders isn't so easy, because the
> stateful codecs require a stream-API, which brings in a whole bunch of
> other functionality (readline() etc.), which we'd probably like to keep
> separate. I have a patch ( that should
> fix this problem (at least for all codecs derived from
> codecs.StreamReader/codecs.StreamWriter). Additionally it would make
> stateful codecs more useful in the context for iterators/generators.
> I'd like this patch to go into 2.5.

The patch as-is won't go into 2.5. It's simply the wrong approach:
StreamReaders and -Writers work on streams (hence the name). It
doesn't make sense to add functionality that side-steps this behavior,
since that undermines the design.

Like I suggested in the patch discussion, such functionality could
be factored out of the implementations of StreamReaders/Writers
and put into new StatefulEncoder/Decoder classes, the objects of
which then get used by StreamReader/Writer.

In addition to that we could extend the codec registry to also
maintain slots for the stateful encoders and decoders, if needed.
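
For illustration, a stateful decoder that is independent of any stream
could be used as sketched below. The sketch relies on the incremental
codec interface (codecs.getincrementaldecoder) found in later Python
releases; whether that interface matches the design outlined in this
message is an assumption on my part:

```python
import codecs

text = u"h\xe9llo"
data = text.encode("utf-16")       # starts with a byte order mark

# The decoder object keeps its state (BOM seen, pending half of a
# code unit) between calls, so input can arrive in arbitrary pieces.
decoder = codecs.getincrementaldecoder("utf-16")()

pieces = []
for i in range(len(data)):
    chunk = data[i:i + 1]          # feed one byte at a time
    pieces.append(decoder.decode(chunk))
pieces.append(decoder.decode(b"", final=True))

result = u"".join(pieces)
```

No readline() or other stream functionality is involved; the caller
decides where the bytes come from.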

Marc-Andre Lemburg


From mwh at  Fri Feb 17 14:15:05 2006
From: mwh at (Michael Hudson)
Date: Fri, 17 Feb 2006 13:15:05 +0000
Subject: [Python-Dev] Rename str/unicode to text
In-Reply-To: <dt4bo2$nm5$> (Fredrik Lundh's message of "Fri,
	17 Feb 2006 12:23:44 +0100")
References: <>
	<> <dt4bo2$nm5$>
Message-ID: <>

"Fredrik Lundh" <fredrik at> writes:

> Michael Hudson wrote:
>>  > OTOH, even if we didn't rename str/unicode to text, opentext would
>> > still be a good name for the function that opens a text file.
>> Hnnrgh, not really.  You're not opening a 'text', nor are you
>> constructing something that might reasonably be called an 'opentext'.
>> textfile() seems better.
> except that in Python, file is a type, and open is an action.

Well, yeah, but you can interpret each name in a sane way and try to
ignore the fact that they refer to the same object...

> but I agree that textfile reads better (haven't we been through this
> a couple of times already, btw?  iirc, my original textfile proposal was
> posted in 1846, or so)

Yes, that sounds about right.


  I'm sorry, was my bias showing again? :-)
                                      -- William Tanksley, 13 May 2000

From fredrik at  Fri Feb 17 14:16:59 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 14:16:59 +0100
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <dt4icb$e9n$>

Raymond Hettinger wrote:

> I would like to add something like this to the collections module, but a PEP is
> probably needed to deal with issues like:

frankly, now that Guido is working 50% on Python, do we really have to use
the full PEP process also for simple things like this?

I'd say we let the BDFL roam free.

(if he adds something really lousy, it can always be tweaked/removed before
the next final release.  not every checkin needs to be final...).


From mal at  Fri Feb 17 14:24:51 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 14:24:51 +0100
Subject: [Python-Dev] str.translate vs unicode.translate
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>
Message-ID: <>

Bengt Richter wrote:
> If str becomes unicode for PY 3000, and we then have bytes as our coding-agnostic
> byte data, then I think bytes should have the str translation method, with a tweak
> that I would hope could also be done to str now.
> BTW, str.translate will presumably become unicode.translate, so
> perhaps unicode.translate should grow a compatible deletechars parameter.

I'd much rather see the .translate() method deprecated.

Writing a codec for the task is much more effective - the
builtin charmap codec will do all the mapping for you,
if you have a need to go from bytes to Unicode and vice-versa.

We could also have a bytemap codec for doing bytes-to-bytes
mappings.

Marc-Andre Lemburg


From g.brandl at  Fri Feb 17 14:26:45 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 14:26:45 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt4icb$e9n$>
References: <>	<002301c6337a$7001f140$b83efea9@RaymondLaptop1>
Message-ID: <dt4iul$g5k$>

Fredrik Lundh wrote:
> Raymond Hettinger wrote:
>> I would like to add something like this to the collections module, but a PEP is
>> probably needed to deal with issues like:
> frankly, now that Guido is working 50% on Python, do we really have to use
> the full PEP process also for simple things like this?
> I'd say we let the BDFL roam free.
> (if he adds something really lousy, it can always be tweaked/removed before
> the next final release.  not every checkin needs to be final...).



From mal at  Fri Feb 17 14:30:14 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 14:30:14 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> Josiah Carlson wrote:
>> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
>> and likely a few others that the two of you may be arguing against
>> should stay as encodings, because strictly speaking, they are defined as
>> encodings of data.  They may not be encodings of _unicode_ data, but
>> that doesn't mean that they aren't useful encodings for other kinds of
>> data, some text, some binary, ...
> To support them, the bytes type would have to gain a .encode method,
> and I'm -1 on supporting bytes.encode, or string.decode.
> Why is
> s.encode("uu")
> any better than
> binascii.b2a_uu(s)

The .encode() and .decode() methods are merely convenience
interfaces to the registered codecs (with some extra logic to
make sure that only a pre-defined set of return types are allowed).
It's up to the user to use them for e.g. UU-encoding or not.

The reason we have codecs for UU, zip and the others is that
you can use their StreamWriters/Readers in stackable streams.

Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.
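
As a small illustration of the "convenience interface" point, the sketch below compares a registry lookup by name with the direct module function. It assumes a current Python 3, where the module-level codecs.encode() plays the role that the .encode() method plays in this discussion, and it uses the hex codec because its output is easy to check.

```python
import binascii
import codecs

data = b"hello"

# Codec interface: the transformation is looked up by name in the registry.
via_codec = codecs.encode(data, "hex_codec")

# Direct module call: same result, but the function is fixed when the code is written.
via_binascii = binascii.hexlify(data)

print(via_codec, via_binascii)
```

Both calls produce the same bytes; the difference is only that the codec name can be chosen at run time.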

Marc-Andre Lemburg


From mal at  Fri Feb 17 14:39:46 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 14:39:46 +0100
Subject: [Python-Dev] [Python-checkins] r42396 - peps/trunk/pep-0011.txt
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Neal Norwitz wrote:
> [Moving to python-dev]
> I don't have a strong opinion.  Any one else have an opinion about
> removing --with-wctype-functions from configure?

FWIW, I announced this plan in Dec 2004:

I didn't get any replies back then, so assumed that no-one
would object, but forgot to add this to the PEP 11.

The reason I'd like to get this removed early rather than
later is that some Linux distros happen to use the config
switch causing the Python Unicode implementation on those
distros to behave inconsistently with regular Python builds.
After all we've put a lot of effort into making sure that
the Unicode implementation does work independently of
the platform, even on platforms that don't have Unicode
support at all.

Another candidate for removal is the --disable-unicode switch.

We should probably add a deprecation warning for that in
Py 2.5 and then remove the hundreds of
from the source code in time for Py 2.6.

> n
> --
> On 2/16/06, M.-A. Lemburg <mal at> wrote:
>> neal.norwitz wrote:
>>> Author: neal.norwitz
>>> Date: Thu Feb 16 06:25:37 2006
>>> New Revision: 42396
>>> Modified:
>>>    peps/trunk/pep-0011.txt
>>> Log:
>>> MAL says this option should go away in bug report 874534:
>>>     The reason for the removal is that the option causes
>>>     semantical problems and makes Unicode work in non-standard
>>>     ways on platforms that use locale-aware extensions to the
>>>     wc-type functions.
>>> Since it wasn't previously announced, we can keep the option until 2.6
>>> unless someone feels strong enough to rip it out.
>> I've been wanting to rip this out for some time now, but
>> you're right: I forgot to add this to PEP 11, so let's
>> wait for another release.
>> OTOH, this normally only affects system builders, so perhaps
>> we could do this a little faster, e.g. add a warning in the
>> first alpha and then rip it out with one of the last betas ?!
>>> Modified: peps/trunk/pep-0011.txt
>>> +    Name:             Systems using --with-wctype-functions
>>> +    Unsupported in:   Python 2.6
>>> +    Code removed in:  Python 2.6

Marc-Andre Lemburg


From ncoghlan at  Fri Feb 17 14:55:47 2006
From: ncoghlan at (Nick Coghlan)
Date: Fri, 17 Feb 2006 23:55:47 +1000
Subject: [Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> [Hey, I thought I sent that just to you. Is python-dev really
> interested in this?]

Force of habit on my part - I saw the python-dev header and automatically 
dropped "pyd" into the To: field of the reply.

Given Paul's contribution on the get_data front, it turned out to be a 
fortuitous accident :)

[init_globals argument]
>> I just realised that anything that's a legal argument to "dict.update" will
>> work. I'll fix the function description in the PEP (and the docs patch as well).
> I'm not sure that's a good idea -- you'll never be able to switch to a
> different implementation then.

Good point - I'll change the wording so that (officially, at least) it has to 
be a dictionary.

[_read_compiled_file() error handling]
> Also, *perhaps* it makes more sense to return None instead of raising
> ValueError? Since you're always catching it? (Or are you?)

I've changed this in my local copy. That provides a means for dealing with
marshal errors, too: catch any Exception from marshal.load and return None
instead.

This approach loses some details on what exactly was wrong with the file, but 
that doesn't seem like a big issue (and it cleans up some of the other code).

[run_module() error handling]
> OK. But a loader could return None from get_code() -- do you check for
> that? (I don't have the source handy here.)

The current version on SF doesn't check it, but I just updated my local copy 
to fix that.

[run_module() interaction with import]
> What happens when you execute "" as __main__ and then (perhaps
> indirectly) something does "import foo"? Does a second copy of
> get loaded by the regular loader?

Yes - this is the same as if was run directly from the command line via 
its filename.

[YAGNI and 6 public functions where only 1 has a demonstrated use case]
> I do wonder if isn't getting a bit over-engineered -- it
> seems a lot of the functionality isn't actually necessary to implement
> -m, and the usefulness in other circumstances is as yet
> unproven. What do you think of taking a dose of YAGNI here?
> (Especially since I notice that most of the public APIs are very thin
> layers over exec or execfile -- people can just use those directly.)

I had a look at pdb and profile, and the runpy functions really wouldn't help 
with either of those. Since I don't have any convincing use cases, I'll demote 
run_code and run_module_code to be private helper functions and remove the 
three run*file methods (I might throw those three up on ASPN as a cookbook 
recipe instead).

That leaves the public API containing only run_module, which is all -m really
needs.

[thread safety and the import lock]
>> Another problem that occurred to me is that the module isn't thread safe at
>> the moment. The PEP 302 emulation isn't protected by the import lock, and the
>> changes to sys.argv in run_module_code will be visible across threads (and may
>> clobber each other or the original if multiple threads invoke the function).
> Another reason to consider cutting it down to only what's needed by
> -m; -m doesn't need thread-safety (I think).

Yeah, thread-safety is only an issue if invoking runpy.run_module from 
threaded Python code. However, I think this is one of those nasty threading 
problems where it will work for 99.9% of cases and produce intractable bugs 
for the remaining 0.1%. If an extra try-finally block can categorically rule 
out those kinds of problems, then I think it's nicer to include it.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Fri Feb 17 14:56:44 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 14:56:44 +0100
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dsj0p7$tk3$>
References: <dsj0p7$tk3$>
Message-ID: <dt4kms$mfj$>

Georg Brandl wrote:
> Hi,
> it has been proposed before, but there was no conclusive answer last time:
> is there any chance for 2.5 to include commonly used decorators in a module?

No interest at all?


From rhamph at  Fri Feb 17 15:13:51 2006
From: rhamph at (Adam Olsen)
Date: Fri, 17 Feb 2006 07:13:51 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Guido van Rossum <guido at> wrote:
> A bunch of Googlers were discussing the best way of doing the
> following (a common idiom when maintaining a dict of lists of values
> relating to a key, sometimes called a multimap):
>   if key not in d: d[key] = []
>   d[key].append(value)
> An alternative way to spell this uses setdefault(), but it's not very readable:
>   d.setdefault(key, []).append(value)

I'd like to see it done passing a factory function (and with a better name):

d.getorset(key, list).append(value)

The name is slightly odd but it is effective.  Plus it avoids creating
a new class when a slight tweak to an existing one will do.
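
A rough sketch of what such a method might do, written as a free function so it runs against a plain dict. The name getorset comes from the proposal above; the parameter names and the exact behaviour (call the factory only on a miss, then store the result) are assumptions.

```python
def getorset(mapping, key, factory):
    """Return mapping[key], first storing factory() under key if it is missing."""
    try:
        return mapping[key]
    except KeyError:
        # The factory is only called when the key is absent.
        value = mapping[key] = factory()
        return value


d = {}
getorset(d, "a", list).append(1)
getorset(d, "a", list).append(2)
print(d)
```

Unlike d.setdefault(key, []), no throwaway list is built when the key already exists.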

> Over lunch with Alex Martelli, he proposed that a subclass of dict
> with this behavior (but implemented in C) would be a good addition to
> the language. It looks like it wouldn't be hard to implement. It could
> be a builtin named defaultdict. The first, required, argument to the
> constructor should be the default value. Remaining arguments (even
> keyword args) are passed unchanged to the dict constructor.

-1 (at least until you can explain why that's better than .getorset())

Adam Olsen, aka Rhamphoryncus

From ncoghlan at  Fri Feb 17 15:27:48 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 00:27:48 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Phillip J. Eby wrote:
> At 10:10 AM 02/17/2006 +0100, Georg Brandl wrote:
>> Guido van Rossum wrote:
>>>   d = DefaultDict([])
>>> can be written as simply
>>>   d[key].append(value)
>>> Feedback?
>> Probably a good idea, has been proposed multiple times on clpy.
>> One good thing would be to be able to specify either a default value
>> or a factory function.
> +1 on factory function, e.g. "DefaultDict(list)".  A default value isn't 
> very useful, because for immutable defaults, setdefault() works well 
> enough.  If what you want is a copy of some starting object, you can always 
> do something like DefaultDict({1:2,3:4}.copy).

+1 here, too (for permitting a factory function only).

This doesn't really limit usage, as you can still supply 
DefaultDict(partial(copy, x)) or DefaultDict(partial(deepcopy, x)), or (heaven 
forbid) a lambda expression. . .

As others have mentioned, the basic types are all easy, since the typename can 
be used directly.

+1 on supplying that factory function to the constructor, too (the default 
value is a fundamental part of the defaultdict). That is, I'd prefer:

   d = defaultdict(func)
   # The defaultdict is fully defined, but not yet populated

over:

   d = defaultdict(init_values)
   # The defaultdict is partially populated, but not yet fully defined!

That is, something that is the same as the normal dict except for:

     def __init__(self, default):
         self.default = default

     def __getitem__(self, key):
         return self.get(key, self.default())
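
A self-contained version of the two methods above, as a sketch only. It makes one assumption that the fragment does not: a missing key is stored when it is first looked up. Without that, d[key].append(value) would append to a list that is immediately thrown away. The class name is made up for the example.

```python
class DefaultDictSketch(dict):
    """Dict whose indexing falls back to a factory function."""

    def __init__(self, default):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            # Assumption: store the freshly made default so later
            # lookups see the same object.
            value = self[key] = self.default()
            return value


d = DefaultDictSketch(list)
d["x"].append(1)
d["x"].append(2)
print(d)
```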

Considering some of Raymond's questions in light of the above
> * implications of a __getitem__ succeeding while get(value, x) returns x 
> (possibly different from the overall default)
> * implications of a __getitem__ succeeding while __contains__ would fail

These behaviours seem reasonable for a default dictionary - "containment" is 
based on whether or not the key actually exists in the dictionary as it 
currently stands, and the default is really a "default default" that can be 
overridden using 'get'.
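
The behaviours described here can be checked against collections.defaultdict as it later shipped in the standard library. That is an assumption relative to this thread, where the type was still being designed, and the shipped type also stores the default on indexing.

```python
from collections import defaultdict

d = defaultdict(list)

missing_before = "k" in d        # containment looks only at stored keys
fallback = d.get("k", "other")   # get() uses its own default and stores nothing
still_missing = "k" in d
value = d["k"]                   # indexing calls the factory and stores the result
present_after = "k" in d

print(missing_before, fallback, still_missing, value, present_after)
```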

> * whether to add this to the collections module (I would say yes)
> * whether to allow default functions as well as default values (so you could 
> instantiate a new default list)

My preference is for factory functions only, to eliminate ambiguity.

# bag like behavior
dd = collections.default_dict(int)
for elem in collection:
     dd[elem] += 1

# setdefault-like behavior
dd = collections.default_dict(list)
for page_number, page in enumerate(book):
     for word in page.split():
         dd[word].append(page_number)

Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From walter at  Fri Feb 17 15:38:24 2006
From: walter at (Walter Dörwald)
Date: Fri, 17 Feb 2006 15:38:24 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:

> Walter Dörwald wrote:
>> Guido van Rossum wrote:
>>> [...]
>>> Years ago I wrote a prototype; checkout sandbox/sio/.
>> However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
>> for encodings that need state (e.g. when reading/writing UTF-16).
>> Switching to stateful encoders/decoders isn't so easy, because the
>> stateful codecs require a stream-API, which brings in a whole bunch of
>> other functionality (readline() etc.), which we'd probably like to keep
>> separate. I have a patch ( that should
>> fix this problem (at least for all codecs derived from
>> codecs.StreamReader/codecs.StreamWriter). Additionally it would make
>> stateful codecs more useful in the context for iterators/generators.
>> I'd like this patch to go into 2.5.
> The patch as-is won't go into 2.5. It's simply the wrong approach:
> StreamReaders and -Writers work on streams (hence the name). It
> doesn't make sense adding functionality to side-step this behavior,
> since it undermines the design.

I agree that using a StreamWriter without a stream somehow feels wrong.

> Like I suggested in the patch discussion, such functionality could
> be factored out of the implementations of StreamReaders/Writers
> and put into new StatefulEncoder/Decoder classes, the objects of
> which then get used by StreamReader/Writer.
> In addition to that we could extend the codec registry to also
> maintain slots for the stateful encoders and decoders, if needed.

We *have* to do it like this otherwise there would be no way to get a 
StatefulEncoder/Decoder from an encoding name.

Does this mean that codecs.lookup() would have to return a 6-tuple? But 
this would break if someone uses codecs.lookup("foo")[-1]. So maybe 
codecs.lookup() should return an instance of a subclass of tuple which 
has the StatefulEncoder/Decoder as attributes. But then codecs.lookup() 
must be able to handle old 4-tuples returned by old search functions and 
update those to the new 6-tuples. (But we could drop this again after 
several releases, once all third party codecs are updated).
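
A sketch of the tuple-subclass idea, under stated assumptions: the class name CodecEntry, the attribute names and the upgrade() helper are all hypothetical, and the strings stand in for real codec callables.

```python
class CodecEntry(tuple):
    """Hypothetical lookup result: a 4-tuple that also carries two attributes."""

    def __new__(cls, encode, decode, reader, writer,
                statefulencoder=None, statefuldecoder=None):
        # Only the four classic items are part of the tuple itself,
        # so indexing such as entry[-1] keeps its old meaning.
        self = super().__new__(cls, (encode, decode, reader, writer))
        self.statefulencoder = statefulencoder
        self.statefuldecoder = statefuldecoder
        return self


def upgrade(entry):
    """Accept an old plain 4-tuple from a search function and wrap it."""
    if isinstance(entry, CodecEntry):
        return entry
    return CodecEntry(*entry)


old_style = ("enc", "dec", "reader", "writer")  # placeholders, not real codecs
entry = upgrade(old_style)
print(entry[-1], entry.statefulencoder)
```

Because the extra objects live in attributes and not in the tuple, code that unpacks or indexes the result keeps working.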

    Walter Dörwald

From ncoghlan at  Fri Feb 17 15:46:45 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 00:46:45 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Adam Olsen wrote:
>> Over lunch with Alex Martelli, he proposed that a subclass of dict
>> with this behavior (but implemented in C) would be a good addition to
>> the language. It looks like it wouldn't be hard to implement. It could
>> be a builtin named defaultdict. The first, required, argument to the
>> constructor should be the default value. Remaining arguments (even
>> keyword args) are passed unchanged to the dict constructor.
> -1 (at least until you can explain why that's better than .getorset())

Because the "default default" is a fundamental characteristic of the default 
dictionary (meaning it works with normal indexing syntax), whereas "getorset" 
makes it a characteristic of the method call.

Besides, if there are going to be any method changes on normal dicts, I'd 
rather see a boolean third argument "set" to the get method.

That is (for a normal dict):

   def get(self, key, *args):
       set = False
       no_default = False
       if len(args) == 2:
           default, set = args
       elif args:
           default, = args
       else:
           no_default = True

       if key in self:
           return self[key]
       if no_default:
           raise KeyError(repr(key))
       if set:
           self[key] = default
       return default

Using Guido's original example:

   d.get(key, [], True).append(value)

I don't really think this is a replacement for defaultdict, though.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From fredrik at  Fri Feb 17 15:50:10 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 15:50:10 +0100
Subject: [Python-Dev] Proposal: defaultdict
References: <><>
Message-ID: <dt4nr3$3al$>

Nick Coghlan wrote:

> Using Guido's original example:
>   d.get(key, [], True).append(value)

hmm.  are you sure you didn't just reinvent setdefault ?


From mal at  Fri Feb 17 16:04:09 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 16:04:09 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>>> Guido van Rossum wrote:
>>>> [...]
>>>> Years ago I wrote a prototype; checkout sandbox/sio/.
>>> However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
>>> for encodings that need state (e.g. when reading/writing UTF-16).
>>> Switching to stateful encoders/decoders isn't so easy, because the
>>> stateful codecs require a stream-API, which brings in a whole bunch of
>>> other functionality (readline() etc.), which we'd probably like to keep
>>> separate. I have a patch ( that should
>>> fix this problem (at least for all codecs derived from
>>> codecs.StreamReader/codecs.StreamWriter). Additionally it would make
>>> stateful codecs more useful in the context for iterators/generators.
>>> I'd like this patch to go into 2.5.
>> The patch as-is won't go into 2.5. It's simply the wrong approach:
>> StreamReaders and -Writers work on streams (hence the name). It
>> doesn't make sense adding functionality to side-step this behavior,
>> since it undermines the design.
> I agree that using a StreamWriter without a stream somehow feels wrong.
>> Like I suggested in the patch discussion, such functionality could
>> be factored out of the implementations of StreamReaders/Writers
>> and put into new StatefulEncoder/Decoder classes, the objects of
>> which then get used by StreamReader/Writer.
>> In addition to that we could extend the codec registry to also
>> maintain slots for the stateful encoders and decoders, if needed.
> We *have* to do it like this otherwise there would be no way to get a
> StatefulEncoder/Decoder from an encoding name.
> Does this mean that codecs.lookup() would have to return a 6-tuple? 
> But this would break if someone uses codecs.lookup("foo")[-1].

Right; though I'd much rather see that people use the direct
codecs module lookup APIs:

getencoder(), getdecoder(), getreader() and getwriter()

instead of using codecs.lookup() directly.

> So maybe
> codecs.lookup() should return an instance of a subclass of tuple which
> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
> must be able to handle old 4-tuples returned by old search functions and
> update those to the new 6-tuples. (But we could drop this again after
> several releases, once all third party codecs are updated).

This was a design error: I should have not made
codecs.lookup() a documented function.

I'd suggest we keep codecs.lookup() the way it is and
instead add new functions to the codecs module, e.g.
codecs.getencoderobject() and codecs.getdecoderobject().

Changing the codec registration is not much of a problem:
we could simply allow 6-tuples to be passed into the

Marc-Andre Lemburg


From skip at  Fri Feb 17 16:05:27 2006
From: skip at (skip at
Date: Fri, 17 Feb 2006 09:05:27 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

    Guido> Over lunch with Alex Martelli, he proposed that a subclass of
    Guido> dict with this behavior (but implemented in C) would be a good
    Guido> addition to the language.

Instead, why not define setdefault() the way it should have been done in the
first place?  When you create a dict it has the current behavior.  If you
then call its setdefault() method that becomes the default value for missing
keys.

    d = {'a': 1}
    d['b']              # raises KeyError
    d.get('c')          # evaluates to None
    d.setdefault(42)
    d['b']              # evaluates to 42
    d.get('c')          # evaluates to 42

For symmetry, setdefault() should probably be undoable: deldefault(),
removedefault(), nodefault(), default_free(), whatever.

The only question in my mind is whether or not getting a non-existent value
under the influence of a given default value should stick that value in the
dictionary or not.
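
One way to picture the proposal, and the open question about storing the value, is the sketch below. Everything in it is an assumption made for illustration: the class name, the set_default()/remove_default() names (chosen so the real dict.setdefault() stays untouched) and the store flag. It relies on the __missing__ hook that dict subclasses have in current Python.

```python
_UNSET = object()


class DictWithDefault(dict):
    """Plain dict until set_default() is called."""

    _default = _UNSET
    _store = False

    def set_default(self, value, store=False):
        self._default = value
        self._store = store

    def remove_default(self):
        self._default = _UNSET

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self._default is _UNSET:
            raise KeyError(key)
        if self._store:
            self[key] = self._default
        return self._default

    def get(self, key, default=_UNSET):
        if key in self:
            return dict.__getitem__(self, key)
        if default is not _UNSET:
            return default
        return None if self._default is _UNSET else self._default


d = DictWithDefault(a=1)
before = d.get("c")
d.set_default(42)
after = (d["b"], d.get("c"), "b" in d)
print(before, after)
```

With store=False a lookup leaves the dictionary unchanged; with store=True the default is inserted on first access.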

down-with-more-builtins-ly, y'rs,


From ncoghlan at  Fri Feb 17 16:28:55 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 01:28:55 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt4nr3$3al$>
References: <><>	<>
Message-ID: <>

Fredrik Lundh wrote:
> Nick Coghlan wrote:
>> Using Guido's original example:
>>   d.get(key, [], True).append(value)
> hmm.  are you sure you didn't just reinvent setdefault ?

I'm reasonably sure I copied it on purpose, only with a name that isn't 100% 
misleading as to what it does ;)

I think collections.defaultdict is a better approach, though.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Fri Feb 17 16:29:59 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 16:29:59 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>
Message-ID: <dt4q5n$chl$>

skip at wrote:
>     Guido> Over lunch with Alex Martelli, he proposed that a subclass of
>     Guido> dict with this behavior (but implemented in C) would be a good
>     Guido> addition to the language.
> Instead, why not define setdefault() the way it should have been done in the
> first place?  When you create a dict it has the current behavior.  If you
> then call its setdefault() method that becomes the default value for missing
> keys.

That puts it off until 3.0.

From what I read I think defaultdict won't become builtin anyway.


From fdrake at  Fri Feb 17 16:36:11 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Fri, 17 Feb 2006 10:36:11 -0500
Subject: [Python-Dev] 2.5 PEP
In-Reply-To: <>
References: <>
Message-ID: <>

On Thursday 16 February 2006 17:06, Martin v. Löwis wrote:
 > I'm still unhappy with that change, and still nobody has told me how to
 > maintain PyXML so that it can continue to work both for 2.5 and for 2.4.


I do intend to write a proper response for you, but have been massively 
swamped.  python-dev topics occasionally pop up for me, but time has been 
too limited to get back to the important items, like this one.


Fred L. Drake, Jr.   <fdrake at>

From walter at  Fri Feb 17 16:57:01 2006
From: walter at (Walter Dörwald)
Date: Fri, 17 Feb 2006 16:57:01 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> [...]
>>> Like I suggested in the patch discussion, such functionality could
>>> be factored out of the implementations of StreamReaders/Writers
>>> and put into new StatefulEncoder/Decoder classes, the objects of
>>> which then get used by StreamReader/Writer.
>>> In addition to that we could extend the codec registry to also
>>> maintain slots for the stateful encoders and decoders, if needed.
>> We *have* to do it like this otherwise there would be no way to get a
>> StatefulEncoder/Decoder from an encoding name.
>> Does this mean that codecs.lookup() would have to return a 6-tuple? 
>> But this would break if someone uses codecs.lookup("foo")[-1].
> Right; though I'd much rather see that people use the direct
> codecs module lookup APIs:
> getencoder(), getdecoder(), getreader() and getwriter()
> instead of using codecs.lookup() directly.

OK.

>> So maybe
>> codecs.lookup() should return an instance of a subclass of tuple which
>> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
>> must be able to handle old 4-tuples returned by old search functions and
>> update those to the new 6-tuples. (But we could drop this again after
>> several releases, once all third party codecs are updated).
> This was a design error: I should have not made
> codecs.lookup() a documented function.
> I'd suggest we keep codecs.lookup() the way it is and
> instead add new functions to the codecs module, e.g.
> codecs.getencoderobject() and codecs.getdecoderobject().
> Changing the codec registration is not much of a problem:
> we could simply allow 6-tuples to be passed into the
> registry.

OK, so codecs.lookup() returns 4-tuples, but the registry stores 
6-tuples and the search functions must return 6-tuples. And we add 
codecs.getencoderobject() and codecs.getdecoderobject() as well as new 
classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about 
old search functions that return 4-tuples?

    Walter Dörwald

From bokr at  Fri Feb 17 17:04:10 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 16:04:10 GMT
Subject: [Python-Dev] bytes type discussion
References: <><dt09vc$tvv$>	<><dt0b8s$2eb$><><dt0fr2$fmg$>	<>
	<dt2bsi$cb4$> <dt3nqh$rlt$>
Message-ID: <>

On Fri, 17 Feb 2006 00:43:50 -0500, Steve Holden <steve at> wrote:

>Fredrik Lundh wrote:
>> Barry Warsaw wrote:
>>>We know at least there will never be a 2.10, so I think we still have
>> because there's no way to count to 10 if you only have one digit?
>> we used to think that back when the gas price was just below 10 SEK/L,
>> but they found a way...
>IIRC Guido is on record as saying "There will be no Python 2.10 because 
>I hate the ambiguity of double-digit minor release numbers", or words to 
>that effect.

Bengt Richter

From mal at  Fri Feb 17 17:12:36 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 17:12:36 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
Message-ID: <>

Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>>> M.-A. Lemburg wrote:
>>>> [...]
>>>> Like I suggested in the patch discussion, such functionality could
>>>> be factored out of the implementations of StreamReaders/Writers
>>>> and put into new StatefulEncoder/Decoder classes, the objects of
>>>> which then get used by StreamReader/Writer.
>>>> In addition to that we could extend the codec registry to also
>>>> maintain slots for the stateful encoders and decoders, if needed.
>>> We *have* to do it like this otherwise there would be no way to get a
>>> StatefulEncoder/Decoder from an encoding name.
>>> Does this mean that codecs.lookup() would have to return a 6-tuple?
>>> But this would break if someone uses codecs.lookup("foo")[-1].
>> Right; though I'd much rather see that people use the direct
>> codecs module lookup APIs:
>> getencoder(), getdecoder(), getreader() and getwriter()
>> instead of using codecs.lookup() directly.
> OK.
>>> So maybe
>>> codecs.lookup() should return an instance of a subclass of tuple which
>>> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
>>> must be able to handle old 4-tuples returned by old search functions and
>>> update those to the new 6-tuples. (But we could drop this again after
>>> several releases, once all third party codecs are updated).
>> This was a design error: I should have not made
>> codecs.lookup() a documented function.
>> I'd suggest we keep codecs.lookup() the way it is and
>> instead add new functions to the codecs module, e.g.
>> codecs.getencoderobject() and codecs.getdecoderobject().
>> Changing the codec registration is not much of a problem:
>> we could simply allow 6-tuples to be passed into the
>> registry.
> OK, so codecs.lookup() returns 4-tuples, but the registry stores
> 6-tuples and the search functions must return 6-tuples. And we add
> codecs.getencoderobject() and codecs.getdecoderobject() as well as new
> classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
> old search functions that return 4-tuples?

The registry should then simply set the missing entries to None
and the getencoderobject()/getdecoderobject() would then have
to raise an error.

Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!

Marc-Andre Lemburg


From jack at  Fri Feb 17 17:19:32 2006
From: jack at (Jack Diederich)
Date: Fri, 17 Feb 2006 11:19:32 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Feb 16, 2006 at 01:11:49PM -0800, Guido van Rossum wrote:
> Google has an internal data type called a DefaultDict which gets
> passed a default value upon construction. Its __getitem__ method,
> instead of raising KeyError, inserts a shallow copy (!) of the given
> default value into the dict when the value is not found. So the above
> code, after
>   d = DefaultDict([])
> can be written as simply
>   d[key].append(value)
> Note that of all the possible semantics for __getitem__ that could
> have produced similar results (e.g. not inserting the default in the
> underlying dict, or not copying the default value), the chosen
> semantics are the only ones that makes this example work.

Having __getitem__ insert the returned default value allows it to
work with a larger variety of classes.  My own ForgivingDict does not
do this and works fine for ints and lists but not much else.

fd = ForgivingDict(list)
fd[key] += [val] # extends the list and does a __setitem__

The += operator isn't useful for dicts.

How can you make a defaultdict with a defaultdict as the default?
My head asploded when I tried it with the constructor arg.
It does seem possible with the 'd.default = func' syntax

# empty defaultdict constructor
d = defaultdict()
d.default = d
tree = defaultdict()
tree.default = d.copy
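
For comparison, a small sketch of the same idea, assuming the factory-function form of defaultdict that later shipped in the collections module rather than the 'd.default = func' syntax floated above. The helper name tree() is only illustrative:

```python
from collections import defaultdict

def tree():
    # Each missing key produces another tree, so nesting is unbounded.
    return defaultdict(tree)

t = tree()
t["a"]["b"]["c"] = 1
```

Passing the function itself as the factory sidesteps the problem of needing a finished defaultdict before the constructor can be called.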


From arigo at  Fri Feb 17 17:29:32 2006
From: arigo at (Armin Rigo)
Date: Fri, 17 Feb 2006 17:29:32 +0100
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
	to PyNumberMethods
In-Reply-To: <dsu7t9$m9c$>
References: <dsu7t9$m9c$>
Message-ID: <>

Hi Travis,

On Tue, Feb 14, 2006 at 08:41:19PM -0700, Travis E. Oliphant wrote:
>     2) The __index__ special method will have the signature
>        def __index__(self):
>            return obj
>        Where obj must be either an int or a long or another object that has the
>        __index__ special method (but not self).

The "anything but not self" rule is not consistent with any other
special method's behavior.  IMHO we should just do the same as

* __nonzero__(x) must return exactly a bool or an int.

This ensures that there is no infinite loop in C created by a
__nonzero__ that returns something that has a further __nonzero__

The rule that the PEP proposes for __index__ (returns anything but not
'self') is not useful, because you can still get infinite loops (you
just have to work slightly harder, and even not much).  We should just
say that __index__ must return an int or a long.
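
A short sketch of the stricter rule, as current Python happens to enforce it. The class names Three and Bad are made up for the illustration:

```python
import operator

class Three:
    def __index__(self):
        return 3          # a real int: accepted

class Bad:
    def __index__(self):
        return Three()    # another object with __index__: rejected

items = ["a", "b", "c", "d"]
picked = items[Three()]   # indexing goes through __index__

try:
    operator.index(Bad())
    rejected = False
except TypeError:
    rejected = True
```

Because the result must already be an integer, there is no chain of __index__ calls to follow and hence no loop to guard against.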

A bientot,


From monpublic at  Fri Feb 17 17:27:33 2006
From: monpublic at (CM)
Date: Fri, 17 Feb 2006 09:27:33 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>


It's about time!

- C

On 2/16/06, Guido van Rossum <guido at> wrote:
> A bunch of Googlers were discussing the best way of doing the
> following (a common idiom when maintaining a dict of lists of values
> relating to a key, sometimes called a multimap):
>   if key not in d: d[key] = []
>   d[key].append(value)
> An alternative way to spell this uses setdefault(), but it's not very
> readable:
>   d.setdefault(key, []).append(value)
> and it also suffers from creating an unnecessary list instance.
> (Timings were inconclusive; the approaches are within 5-10% of each
> other in speed.)
> My conclusion is that setdefault() is a failure -- it was a
> well-intentioned construct, but doesn't actually create more readable
> code.
> Google has an internal data type called a DefaultDict which gets
> passed a default value upon construction. Its __getitem__ method,
> instead of raising KeyError, inserts a shallow copy (!) of the given
> default value into the dict when the value is not found. So the above
> code, after
>   d = DefaultDict([])
> can be written as simply
>   d[key].append(value)
> Note that of all the possible semantics for __getitem__ that could
> have produced similar results (e.g. not inserting the default in the
> underlying dict, or not copying the default value), the chosen
> semantics are the only ones that makes this example work.
> Over lunch with Alex Martelli, he proposed that a subclass of dict
> with this behavior (but implemented in C) would be a good addition to
> the language. It looks like it wouldn't be hard to implement. It could
> be a builtin named defaultdict. The first, required, argument to the
> constructor should be the default value. Remaining arguments (even
> keyword args) are passed unchanged to the dict constructor.
> Some more design subtleties:
> - "key in d" still returns False if the key isn't there
> - "d.get(key)" still returns None if the key isn't there
> - "d.default" should be a read-only attribute giving the default value
> Feedback?
> --
> --Guido van Rossum (home page:

"A programmer learning programming from Perl is like a chemistry student
learning the definition of 'exothermic' with dynamite." - evilpenguin

From fredrik at  Fri Feb 17 17:37:20 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 17:37:20 +0100
Subject: [Python-Dev] bytes type discussion
References: <><dt09vc$tvv$>	<><dt0b8s$2eb$><><dt0fr2$fmg$>	<><dt2bsi$cb4$>
	<dt3nqh$rlt$> <>
Message-ID: <dt4u42$ss0$>

Bengt Richter wrote:

> >> because there's no way to count to 10 if you only have one digit?
> >>
> >> we used to think that back when the gas price was just below 10 SEK/L,
> >> but they found a way...
> >>
> >IIRC Guido is on record as saying "There will be no Python 2.10 because
> >I hate the ambiguity of double-digit minor release numbers", or words to
> >that effect.
> Hex?

or roman numbers.

I've paid X.35 SEK/L for gas...


From thomas at  Fri Feb 17 17:41:11 2006
From: thomas at (Thomas Wouters)
Date: Fri, 17 Feb 2006 17:41:11 +0100
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
	to PyNumberMethods
In-Reply-To: <>
References: <dsu7t9$m9c$>
Message-ID: <>

On Fri, Feb 17, 2006 at 05:29:32PM +0100, Armin Rigo wrote:
> >        Where obj must be either an int or a long or another object that has the
> >        __index__ special method (but not self).

> The "anything but not self" rule is not consistent with any other
> special method's behavior.  IMHO we should just do the same as
> __nonzero__():

> * __nonzero__(x) must return exactly a bool or an int.

Yes, very much so. And in case people worry that this makes wrapping
objects harder: proxy objects (for instance) would do 'return

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From walter at  Fri Feb 17 17:44:56 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 17 Feb 2006 17:44:56 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>> M.-A. Lemburg wrote:
>>> Walter Dörwald wrote:
>>>> [...]
>>>> So maybe
>>>> codecs.lookup() should return an instance of a subclass of tuple which
>>>> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
>>>> must be able to handle old 4-tuples returned by old search functions and
>>>> update those to the new 6-tuples. (But we could drop this again after
>>>> several releases, once all third party codecs are updated).
>>> This was a design error: I should have not made
>>> codecs.lookup() a documented function.
>>> I'd suggest we keep codecs.lookup() the way it is and
>>> instead add new functions to the codecs module, e.g.
>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>> Changing the codec registration is not much of a problem:
>>> we could simply allow 6-tuples to be passed into the
>>> registry.
>> OK, so codecs.lookup() returns 4-tuples, but the registry stores
>> 6-tuples and the search functions must return 6-tuples. And we add
>> codecs.getencoderobject() and codecs.getdecoderobject() as well as new
>> classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
>> old search functions that return 4-tuples?
> The registry should then simply set the missing entries to None
> and the getencoderobject()/getdecoderobject() would then have
> to raise an error.

Sounds simple enough and we don't lose backwards compatibility.

> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!

+1, but I'd like to have a replacement for this, i.e. a function that 
returns all info the registry has about an encoding:

1. Name
2. Encoder function
3. Decoder function
4. Stateful encoder factory
5. Stateful decoder factory
6. Stream writer factory
7. Stream reader factory

and if this is an object with attributes, we won't have any problems if 
we extend it in the future.

BTW, if we change the API, can we fix the return value of the stateless 
functions? As the stateless function always encodes/decodes the complete 
string, returning the length of the string doesn't make sense. 
codecs.getencoder() and codecs.getdecoder() would have to continue to 
return the old variant of the functions, but 
codecs.getinfo("latin-1").encoder would be the new encoding function.
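
A sketch of the "object with attributes" idea, using what codecs.lookup() returns in current Python (a tuple subclass that still unpacks as four items). There is no codecs.getinfo() in the standard library, so that name from the paragraph above is not used here:

```python
import codecs

info = codecs.lookup("latin-1")

# Still unpacks like the old 4-tuple...
encode, decode, reader, writer = info

# ...but the extra registry data is reachable by attribute.
encoder = info.incrementalencoder()
data = encoder.encode("caf") + encoder.encode("\xe9", final=True)
```

Old code that indexes the tuple keeps working, while new fields can be added as attributes without changing its length.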

    Walter Dörwald

From chris at  Fri Feb 17 17:48:28 2006
From: chris at (Chris AtLee)
Date: Fri, 17 Feb 2006 11:48:28 -0500
Subject: [Python-Dev] Copying zlib compression objects
Message-ID: <>

I'm writing a program in python that creates tar files of a certain
maximum size (to fit onto CD/DVD).  One of the problems I'm running
into is that when using compression, it's pretty much impossible to
determine if a file, once added to an archive, will cause the archive
size to exceed the maximum size.

I believe that to do this properly, you need to copy the state of tar
file (basically the current file offset as well as the state of the
compression object), then add the file.  If the new size of the archive
exceeds the maximum, you need to restore the original state.

The critical part is being able to copy the compression object.
Without compression it is trivial to determine if a given file will
"fit" inside the archive.  When using compression, the compression
ratio of a file depends partially on all the data that has been
compressed prior to it.

The current implementation in the standard library does not allow you
to copy these compression objects in a useful way, so I've made some
minor modifications (patch attached) to the standard 2.4.2 library:
- Add copy() method to zlib compression object.  This returns a new
compression object with the same internal state.  I named it copy() to
keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile.  These
work only in write mode.  snapshot() returns a state object.  Passing
in this state object to restore() will restore the state of the
GzipFile / TarFile to the state represented by the object.

Future work:
- Decompression objects could use a copy() method too
- Add support for copying bzip2 compression objects

Although this patch isn't complete, does this seem like a good approach?
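
A minimal sketch of the trial-compression idea, assuming the copy() method on zlib compression objects that current Python provides. The helper fits() and its parameters are hypothetical; the snapshot()/restore() methods from the patch are not assumed:

```python
import zlib

def fits(compressor, already_written, chunk, limit):
    """Trial-compress chunk on a copy; the real compressor is untouched."""
    trial = compressor.copy()
    extra = len(trial.compress(chunk)) + len(trial.flush())
    return already_written + extra <= limit

comp = zlib.compressobj()
written = len(comp.compress(b"header " * 100))

ok_small = fits(comp, written, b"x" * 10, limit=10_000)
ok_large = fits(comp, written, bytes(range(256)) * 400, limit=50)
```

The copy carries the compressor's history, so the estimate reflects how the new data would compress in context, and discarding the copy plays the role of restoring the original state.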

-------------- next part --------------
A non-text attachment was scrubbed...
Name: snapshots.diff
Type: text/x-patch
Size: 3500 bytes
Desc: not available
Url : 

From bokr at  Fri Feb 17 18:28:45 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 17:28:45 GMT
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
References: <>	<>
Message-ID: <>

On Fri, 17 Feb 2006 00:33:49 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at> wrote:

>Josiah Carlson wrote:
>> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
>> and likely a few others that the two of you may be arguing against
>> should stay as encodings, because strictly speaking, they are defined as
>> encodings of data.  They may not be encodings of _unicode_ data, but
>> that doesn't mean that they aren't useful encodings for other kinds of
>> data, some text, some binary, ...
>To support them, the bytes type would have to gain a .encode method,
>and I'm -1 on supporting bytes.encode, or string.decode.
>Why is
>any better than
One aspect is that dotted notation method calling is serially composable,
whereas function calls nest, and you have to find and read from the innermost,
which gets hard quickly unless you use multiline formatting. But even then
you can't read top to bottom as processing order.

If we had a general serial composition syntax for function calls
something like unix piping (which is a big part of the power of unix shells IMO)
we could make the choice of appropriate composition semantics better.

Decorators already compose functions in a limited way, but processing
order would read like forth horizontally. Maybe '->' ? How about

    foo(x, y) -> bar() -> baz(z)

as sugar for

    baz.__get__(bar.__get__(foo(x, y))())(z)

? (Hope I got that right ;-)

I.e., you'd have self-like args to receive results from upstream. E.g.,

 >>> def foo(x, y): return 'foo(%s, %s)'%(x,y)
 >>> def bar(stream): return 'bar(%s)'%stream
 >>> def baz(stream, z): return 'baz(%s, %s)'%(stream,z)
 >>> x = 'ex'; y='wye'; z='zed'
 >>> baz.__get__(bar.__get__(foo(x, y))())(z)
 'baz(bar(foo(ex, wye)), zed)'
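
The same left-to-right reading order can be approximated without new syntax. This is only a sketch; the helper name pipe() is hypothetical and is not part of the proposal above:

```python
from functools import reduce

def pipe(value, *stages):
    """Feed value through each stage in reading order, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

def foo(x, y): return 'foo(%s, %s)' % (x, y)
def bar(stream): return 'bar(%s)' % stream
def baz(stream, z): return 'baz(%s, %s)' % (stream, z)

# Extra arguments for a stage are bound with a lambda.
result = pipe(foo('ex', 'wye'), bar, lambda s: baz(s, 'zed'))
```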

Bengt Richter

From arigo at  Fri Feb 17 18:30:43 2006
From: arigo at (Armin Rigo)
Date: Fri, 17 Feb 2006 18:30:43 +0100
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>


On Tue, Feb 14, 2006 at 09:24:57PM -0800, Neal Norwitz wrote:

There is at least one SF bug, namely "#1333982 Bugs of the new AST
compiler", that in my humble opinion absolutely needs to be fixed before
the release, even though I won't hide that I have no intention of fixing
it myself.  Should I raise the issue here in python-dev, and see if we
agree that it is critical?

(Sorry if I should know about the procedure.  Does it then go in the
PEP's Planned Features list?)

A bientot,


From skip at  Fri Feb 17 18:35:45 2006
From: skip at (skip at
Date: Fri, 17 Feb 2006 11:35:45 -0600
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <> <>
Message-ID: <>

    Steve> It appears to hang together, but I'm not sure I see how it
    Steve> overcomes objections to lambda by replacing it with another
    Steve> keyword.

Well, it does replace it with a word which has meaning in common English.

FWIW, I would require the parens around the arguments and avoid the
ambiguity altogether.


From tjreedy at  Fri Feb 17 18:39:31 2006
From: tjreedy at (Terry Reedy)
Date: Fri, 17 Feb 2006 12:39:31 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><002301c6337a$7001f140$b83efea9@RaymondLaptop1>
Message-ID: <dt51og$bgr$>

"Fredrik Lundh" <fredrik at> wrote in message 
news:dt4icb$e9n$1 at
> Raymond Hettinger wrote:
>> I would like to add something like this to the collections module, but a 
>> PEP is
>> probably needed to deal with issues like:
> frankly, now that Guido is working 50% on Python, do we really have to 
> use
> the full PEP process also for simple things like this?
> I'd say we let the BDFL roam free.

PEPs are useful for question-answering purposes even after approval.  The 
design phase can be cut short by simply posting the approved design doc.

From jeremy at  Fri Feb 17 18:40:14 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 17 Feb 2006 12:40:14 -0500
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

It is critical, but I hadn't seen the bug report.  Feel free to assign
AST bugs to me and assign them a > 5 priority.


On 2/17/06, Armin Rigo <arigo at> wrote:
> Hi,
> On Tue, Feb 14, 2006 at 09:24:57PM -0800, Neal Norwitz wrote:
> >
> There is at least one SF bug, namely "#1333982 Bugs of the new AST
> compiler", that in my humble opinion absolutely needs to be fixed before
> the release, even though I won't hide that I have no intention of fixing
> it myself.  Should I raise the issue here in python-dev, and see if we
> agree that it is critical?
> (Sorry if I should know about the procedure.  Does it then go in the
> PEP's Planned Features list?)
> A bientot,
> Armin

From jeremy at  Fri Feb 17 18:44:15 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 17 Feb 2006 12:44:15 -0500
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

Actually, it might be easier to assign separate bugs.  A number of the
old bugs appear to have been fixed.  It's hard to track individual
items within a bug report.


On 2/17/06, Jeremy Hylton <jeremy at> wrote:
> It is critical, but I hadn't seen the bug report.  Feel free to assign
> AST bugs to me and assign them a > 5 priority.
> Jeremy
> On 2/17/06, Armin Rigo <arigo at> wrote:
> > Hi,
> >
> > On Tue, Feb 14, 2006 at 09:24:57PM -0800, Neal Norwitz wrote:
> > >
> >
> > There is at least one SF bug, namely "#1333982 Bugs of the new AST
> > compiler", that in my humble opinion absolutely needs to be fixed before
> > the release, even though I won't hide that I have no intention of fixing
> > it myself.  Should I raise the issue here in python-dev, and see if we
> > agree that it is critical?
> >
> > (Sorry if I should know about the procedure.  Does it then go in the
> > PEP's Planned Features list?)
> >
> >
> > A bientot,
> >
> > Armin

From ianb at  Fri Feb 17 18:58:30 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 11:58:30 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <002301c6337a$7001f140$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
>>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>>with this behavior (but implemented in C) would be a good addition to
>>>the language
> I would like to add something like this to the collections module, but a PEP is 
> probably needed to deal with issues like:
> * implications of a __getitem__ succeeding while get(value, x) returns x 
> (possibly different from the overall default)
> * implications of a __getitem__ succeeding while __contains__ would fail
> * whether to add this to the collections module (I would say yes)
> * whether to allow default functions as well as default values (so you could 
> instantiate a new default list)
> * comparing all the existing recipes and third-party modules that have already 
> done this
> * evaluating its fitness for common use cases (i.e. bags and dict of lists).

It doesn't seem that useful for bags, assuming we're talking about an 
{object: count} implementation of bags; bags should really have a more 
set-like interface than a dict-like interface.

A dict of lists typically means a multi-valued dict.  In that case it 
seems like x[key_not_found] should return the empty list, as that means 
zero values; even though zero values also means that 
x.has_key(key_not_found) should return False as well.  *but* getting 
x[key_not_found] does not (for a multi-valued dict) mean that suddenly 
has_key should return true.  I find the side-effect nature of 
__getitem__ as proposed in default_dict to be rather confusing, and when 
reading code it will very much break my expectations.  I assume that 
attribute access and [] access will not have side effects.  Coming at it 
from that direction, I'm -1, though I'm +1 on dealing with the specific 
use case that started this (x.setdefault(key, []).append(value)).
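
The side effect in question can be seen in a few lines. This sketch assumes the defaultdict that later landed in the collections module, which inserts on lookup as proposed:

```python
from collections import defaultdict

d = defaultdict(list)

before = "missing" in d      # False: nothing stored yet
value = d["missing"]         # a plain read...
after = "missing" in d       # ...has now created the key
```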

An implementation targeted specifically at multi-valued dictionaries 
seems like it would be better.  Incidentally, on Web-SIG we've discussed 
wsgiref, and it includes a multi-valued, ordered, case-insensitive 
dictionary.  Such a dictionary(ish) object has clear applicability for 
HTTP headers, but certainly it is something I've used many times 
elsewhere.  In a case-sensitive form it applies to URL variables. 
Really there's several combinations of features, each with different uses.

So we have now...

dicts: unordered, key:value (associative), single-value
sets: unordered, not key:value, single-value
lists: ordered, not key:value, multi-value

We don't have...

bags: unordered, not key:value, multi-value
multi-dict: unordered, key:value, multi-value
ordered-dict: ordered, key:value, single-value
ordered-multi-dict: ordered, key:value, multi-value

For all key:value collections, normalized keys can be useful.  (Though 
notably the wsgiref Headers object does not have normalized keys, but 
instead does case-insensitive comparisons.)  I don't know where 
dict-of-dict best fits in here.

Ian Bicking  /  ianb at  /

From fredrik at  Fri Feb 17 19:10:03 2006
From: fredrik at (Fredrik Lundh)
Date: Fri, 17 Feb 2006 19:10:03 +0100
Subject: [Python-Dev] Proposal: defaultdict
References: <><002301c6337a$7001f140$b83efea9@RaymondLaptop1><dt4icb$e9n$>
Message-ID: <dt53hu$i4g$>

Terry Reedy wrote:

> > I'd say we let the BDFL roam free.
> PEPs are useful for question-answering purposes even after approval.  The
> design phase can be cut short by simply posting the approved design doc.

not for trivialities.  it'll take Guido more time to write a PEP than to
implement the damn thing.  is that really a good use of his time ?

why is python-dev suddenly full of control freaks ?


From skip at  Fri Feb 17 19:20:54 2006
From: skip at (skip at
Date: Fri, 17 Feb 2006 12:20:54 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt4cl7$qds$>
References: <>
	<> <dt4cl7$qds$>
Message-ID: <>

    >> Also, I think has_key/in should return True if there is a default.

    Fredrik> and keys should return all possible key values!

I think keys() and in should reflect reality.  Only when you do something
like

    x = d['nonexistent']

or

    x = d.get('nonexistent')

should the default value come into play.


From ianb at  Fri Feb 17 19:21:54 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 12:21:54 -0600
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dsj0p7$tk3$>
References: <dsj0p7$tk3$>
Message-ID: <>

Georg Brandl wrote:
> Hi,
> it has been proposed before, but there was no conclusive answer last time:
> is there any chance for 2.5 to include commonly used decorators in a module?

One peculiar aspect is that decorators are a programming technique, not 
a particular kind of functionality.  So the module seems kind of funny 
as a result.

> Of course not everything that jumps around should go in, only pretty basic
> stuff that can be widely used.
> Candidates are:
>  - @decorator. This properly wraps up a decorator function to change the
>    signature of the new function according to the decorated one's.

Yes, I like this, and it is purely related to "decorators" not anything 
else.  Without this, decorators really hurt introspectability.
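
As an illustration of the introspection problem, here is a sketch using functools.wraps from the standard library. It is not the @decorator helper from the proposal, and the traced() decorator is made up for the example:

```python
import functools

def traced(func):
    @functools.wraps(func)        # copy __name__, __doc__, etc. to the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@traced
def add(a, b):
    """Return a + b."""
    return a + b

total = add(1, 2)
```

Without the wraps() line, add.__name__ would be 'wrapper' and the docstring would be lost.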

>  - @contextmanager, see PEP 343.

This is abstract enough that it doesn't belong anywhere in particular.

>  - @synchronized/@locked/whatever, for thread safety.

Seems better in the threading module.  Plus contexts and with make it 
much less important as a decorator.

>  - @memoize

Also abstract, so I suppose it would make sense.

>  - Others from wiki:PythonDecoratorLibrary and Michele Simionato's decorator
>    module at <>.

redirecting_stdout is better implemented using contexts/with.  @threaded 
(which runs the decorated function in a thread) seems strange to me. 
@blocking seems like it is going into async directions that don't really 
fit in with "decorators" (as a general concept).

I like @tracing, though it doesn't seem like it is really implemented 
there, it's just an example?

> Unfortunately, a @property decorator is impossible...

It already works!  But only if you want a read-only property.  Which is 
actually about 50%+ of the properties I create.  So the status quo is 
not really that bad.
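
A small sketch of the read-only case; the Circle class is only an illustration:

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):
        # Only a getter is defined, so the attribute cannot be assigned.
        return 3.14159 * self._radius ** 2

c = Circle(1)
try:
    c.area = 10
    assignable = True
except AttributeError:
    assignable = False
```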

Ian Bicking  /  ianb at  /

From skip at  Fri Feb 17 19:26:00 2006
From: skip at (skip at
Date: Fri, 17 Feb 2006 12:26:00 -0600
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dt4kms$mfj$>
References: <dsj0p7$tk3$>
Message-ID: <>

    >> it has been proposed before, but there was no conclusive answer last
    >> time: is there any chance for 2.5 to include commonly used decorators
    >> in a module?

    Georg> No interest at all?

I would think the decorators that allow proper introspection (func_name,
__doc__, etc) should be available, probably in a decorators module.  Beyond
that I'm not sure.  I think it would be better to be conservative.


From g.brandl at  Fri Feb 17 19:35:55 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 19:35:55 +0100
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <>
References: <dsj0p7$tk3$> <>
Message-ID: <dt552b$mnq$>

Ian Bicking wrote:

>> Unfortunately, a @property decorator is impossible...
> It already works!  But only if you want a read-only property.  Which is 
> actually about 50%+ of the properties I create.  So the status quo is 
> not really that bad.

I have abused it this way too and felt bad every time.
Kind of like keeping your hat on in the church. :)


From bokr at  Fri Feb 17 19:37:08 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 18:37:08 GMT
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <>

On Fri, 17 Feb 2006 08:09:23 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at> wrote:

>Guido van Rossum wrote:
>> Feedback?
>I would like this to be part of the standard dictionary type,
>rather than being a subtype.
>d.setdefault([]) (one argument) should install a default value,
>and d.cleardefault() should remove that setting; d.default
>should be read-only. Alternatively, d.default could be assignable
>and del-able.
I like the latter, but d.default_factory = callable # or None
>Also, I think has_key/in should return True if there is a default.
That seems iffy. ISTM potential should not define actual status.

Bengt Richter

From guido at  Fri Feb 17 20:09:47 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 11:09:47 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Guido van Rossum <guido at> wrote:
> Over lunch with Alex Martelli, he proposed that a subclass of dict
> with this behavior (but implemented in C) would be a good addition to
> the language. It looks like it wouldn't be hard to implement. It could
> be a builtin named defaultdict. The first, required, argument to the
> constructor should be the default value. Remaining arguments (even
> keyword args) are passed unchanged to the dict constructor.

Thanks for all the constructive feedback. Here are some responses and
a new proposal.

- Yes, I'd like to kill setdefault() in 3.0 if not sooner.

- It would indeed be nice if this was an optional feature of the
standard dict type.

- I'm ignoring the request for other features (ordering, key
transforms). If you want one of these, write a PEP!

- Many, many people suggested to use a factory function instead of a
default value. This is indeed a much better idea (although slightly
more cumbersome for the simplest cases).

- Some people seem to think that a subclass constructor signature must
match the base class constructor signature. That's not so. The
subclass constructor must just be careful to call the base class
constructor with the correct arguments. Think of the subclass
constructor as a factory function.

- There's a fundamental difference between associating the default
value with the dict object, and associating it with the call. So
proposals to invent a better name/signature for setdefault() don't
compete. (As to one specific such proposal, adding an optional bool as
the 3rd argument to get(), I believe I've explained enough times in
the past that flag-like arguments that always get a constant passed in
at the call site are a bad idea and should usually be refactored into
two separate methods.)

- The inconsistency introduced by __getitem__() returning a value for
keys while get(), __contains__(), and keys() etc. don't show it,
cannot be resolved usefully. You'll just have to live with it.
Modifying get() to do the same thing as __getitem__() doesn't seem
useful -- it just takes away a potentially useful operation.

So here's a new proposal.

Let's add a generic missing-key handling method to the dict class, as
well as a default_factory slot initialized to None. The implementation
is like this (but in C):

def on_missing(self, key):
  if self.default_factory is not None:
    value = self.default_factory()
    self[key] = value
    return value
  raise KeyError(key)

When __getitem__() (and *only* __getitem__()) finds that the requested
key is not present in the dict, it calls self.on_missing(key) and
returns whatever it returns -- or raises whatever it raises.
__getitem__() doesn't need to raise KeyError any more, that's done by
on_missing().
The on_missing() method can be overridden to implement any semantics
you want when the key isn't found: return a value without inserting
it, insert a value without copying it, only do it for certain key
types/values, make the default incorporate the key, etc.
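
A runnable sketch of the proposal in pure Python. One assumption is labeled in the code: the call from __getitem__() to on_missing() is emulated through __missing__, the hook that dict subclasses get in current Python. The class names FactoryDict and UpperDict are hypothetical:

```python
class FactoryDict(dict):
    default_factory = None

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory()
            self[key] = value
            return value
        raise KeyError(key)

    # Assumption: route through __missing__ to emulate __getitem__()
    # calling on_missing() when the key is absent.
    def __missing__(self, key):
        return self.on_missing(key)

class UpperDict(FactoryDict):
    def on_missing(self, key):
        # Default derived from the key, returned without inserting it.
        return key.upper()

d = FactoryDict()
d.default_factory = list
d["a"].append(1)

u = UpperDict()
shout = u["abc"]
```

Only subscript access triggers the hook; get() and the in operator still report what is actually stored.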

But the default implementation is designed so that we can write

d = {}
d.default_factory = list

to create a dict that inserts a new list whenever a key is not found
in __getitem__(), which is most useful in the original use case:
implementing a multiset so that one can write

d[key].append(value)

to add a new key/value to the multiset without having to handle the
case separately where the key isn't in the dict yet. This also works
for sets instead of lists:

d = {}
d.default_factory = set

I went through several iterations to obtain this design; my first
version of on_missing() would just raise KeyError(key), requiring you
to always provide a subclass; this is more minimalistic but less
useful and would probably raise the bar for using the feature to some

To save you attempts to simplify this, here are some near-misses I
considered that didn't quite work out:

- def on_missing(self, key):
    if self.default_factory is not None:
      return self.default_factory()
    raise KeyError(key)

This would require the multiset example to subclass, since
default_factory doesn't see the key so it can't insert it.

- def on_missing(self, key):
    if self.default_factory is not None:
      return self.default_factory(key)
    raise KeyError(key)

This appears to fix that problem, but now you can't write
"d.default_factory = list" since (a) list(key) doesn't return an empty
list, and (b) it also doesn't insert the key into the dict; attempting
to assign a callback function to default_factory that solves these
issues fails because the callback doesn't have access to the dict
instance (unless there's only one).
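
A short illustration of point (a), with a made-up key: calling list with the key as its argument builds a list from the key's elements instead of returning an empty list.

```python
key = "ab"

made_with_key = list(key)  # list() iterates over its argument
made_without = list()      # what the multiset example actually needs


def factory(key):
    # Point (b): a callback like this receives the key but holds no
    # reference to the dict, so it has nowhere to store the new value.
    return []
```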

- Do away with on_missing() and just include its body at the end of
__getitem__(), to be invoked when the key isn't found.

This is less general in case you want different default semantics
(e.g. not inserting the default, or making the default a function of
the key) -- you'd have to override __getitem__() for that, which means
you'd be paying overhead even for keys that *are* present.
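
As a sketch of "different default semantics" without touching __getitem__(): in today's Python the hook a dict subclass can define for a failed lookup is spelled __missing__, and it is only called when the key is absent. The class name and the squaring rule below are made up for the illustration.

```python
class Squares(dict):
    """Default is a function of the key and is not inserted (illustrative)."""

    def __missing__(self, key):
        # Reached only on a miss; lookups of present keys never come here.
        return key * key


s = Squares({2: "two"})
present = s[2]  # stored value, no hook involved
computed = s[3]  # computed from the key, not stored
```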

I'll try to cook up an implementation on SF after I've dug myself out
of the day's email barrage.

--Guido van Rossum (home page:

From fdrake at  Fri Feb 17 20:26:40 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Fri, 17 Feb 2006 14:26:40 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Friday 17 February 2006 14:09, Guido van Rossum wrote:
 > So here's a new proposal.

I like the version you came up with.  It has sufficient functionality to make 
it easy to use, and enough flexibility to be useful in more specialized 
cases.  I'm quite certain it would handle all the cases I've actually dealt 
with where I wanted a variation of a mapping with default values.


Fred L. Drake, Jr.   <fdrake at>

From aleaxit at  Fri Feb 17 20:34:46 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 17 Feb 2006 11:34:46 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, Guido van Rossum <guido at> wrote:
> A bunch of Googlers were discussing the best way of doing the
Wow, what a great discussion!  As you'll recall, I had also mentioned
the "callable factory" as a live possibility, and there seems to be a
strong sentiment in favor of that; not really a "weakness case" for
HOFs, as you feared it might be during the lunchtime discussion.

Out of all I've read here, I like the idea of having a
collections.autodict (a much nicer name than defaultdict, a better
collocation for 2.5 than the builtins). One point I think nobody has
made is that whenever reasonably possible the setting of a callback
(the callable factory here) should include *a and **k to use when
calling back.  So, for example:

ad = collections.autodict(copy.copy, whatever)

would easily cover the use case of Google's DefaultDict (yes, partial
would also cover this use case, but having *a and **k is usefully more
general).  If you're convinced copy.copy is an overwhelmingly popular
use case (I'm not, personally), then this specific idiom might also be
summarized in a classmethod, a la

ad = collections.autodict.by_copy(whatever)

This way, all autodicts would start out empty (and be filled by update
if needed).  An alternative would be to have autodict's ctor have the
same signature as dict's, with a separate .set_initial method to pass
the factory (and *a, **k) -- this way an autodict might start out
populated, but would always start with some default factory, such as
lambda:None I guess. I think the first alternative (autodict always
starts empty, but with a specifically chosen factory [including *a,
**k]) is more useful.


From ianb at  Fri Feb 17 20:51:04 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 13:51:04 -0600
Subject: [Python-Dev] Counter proposal: multidict (was: Proposal:
In-Reply-To: <>
References: <>
Message-ID: <>

I really don't like that defaultdict (or a dict extension) means that 
x[not_found] will have noticeable side effects.  This all seems to be a 
roundabout way to address one important use case of a dictionary with 
multiple values for each key, and in the process breaking an important 
quality of good Python code, that attribute and getitem access not have 
noticeable side effects.

So, here's a proposed interface for a new multidict object, borrowing 
some methods from Set but mostly from dict.  Some things that seemed 
particularly questionable to me are marked with ??.

class multidict:

     def __init__([mapping], [**kwargs]):
         Create a multidict:

         multidict() -> new empty multidict
         multidict(mapping) -> equivalent to:
             ob = multidict()
             ob.update(mapping)
         multidict(**kwargs) -> equivalent to:
             ob = multidict()
             ob.update(kwargs)

     def __contains__(key):
         True if ``self[key]`` is true

     def __getitem__(key):
         Returns a list of items associated with the given key.  If
         nothing, then the empty list.

         ??: Is the list mutable, and to what effect?

     def __delitem__(key):
         Removes any instances of key from the dictionary.  Does
         not raise an error if there are no values associated.

         ??: Should this raise a KeyError sometimes?

     def __setitem__(key, value):
         Same as:

             del self[key]
             self.add(key, value)

     def get(key, default=[]):
         Returns a list of items associated with the given key,
         or if that list would be empty it returns default

     def getfirst(key, default=None):
         Equivalent to:
             if key in self:
                 return self[key][0]
             else:
                 return default

     def add(key, value):
         Adds the value with the given key, so that
         self[key][-1] == value

     def remove(key, value):
         Remove (key, value) from the mapping (raising KeyError if not
         present).

     def discard(key, value):
         Remove like self.remove(key, value), except do not raise
         KeyError if missing.

     def pop(key):
         Removes key and returns the value; returns [] and does nothing
         if the key is not found.

     def keys():
         Returns all the keys which have some associated value.

     def items():
         Returns [(key, value)] for every key/value pair.  Keys that
         have multiple values will be returned as multiple (key, value)
         tuples.

     def __len__():
         Equivalent to len(self.items())

         ??: Not len(self.keys())?

     def update(E, **kwargs):
         if E has iteritems then::

             for k, v in E.iteritems():
                 self.add(k, v)

         elif E has keys:

             for k in E:
                 self.add(k, E[k])

         else:

             for k, v in E:
                 self.add(k, v)

         ??: Should **kwargs be allowed?  If so, should the values
         be sequences?

     # iteritems, iterkeys, iter, has_key, copy, popitem, values, clear
     # with obvious implementations
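
A rough, runnable sketch of a subset of this interface, to make the intended behaviour concrete. It picks one answer for each "??" (lookups return a copy of the list, deleting a missing key is silent, len() counts pairs); those choices, the class name MultiDict and the _data attribute are assumptions of the sketch, not decisions made in the proposal.

```python
class MultiDict:
    """Illustrative subset of the multidict interface outlined above."""

    def __init__(self, mapping=None, **kwargs):
        self._data = {}  # key -> non-empty list of values
        if mapping is not None:
            self.update(mapping)
        if kwargs:
            self.update(kwargs)

    def __contains__(self, key):
        # Only non-empty lists are stored, so this matches bool(self[key]).
        return key in self._data

    def __getitem__(self, key):
        # Returns a copy: a lookup never inserts and never exposes state.
        return list(self._data.get(key, ()))

    def __setitem__(self, key, value):
        self._data[key] = [value]

    def __delitem__(self, key):
        self._data.pop(key, None)

    def add(self, key, value):
        self._data.setdefault(key, []).append(value)

    def remove(self, key, value):
        try:
            values = self._data[key]
            values.remove(value)
        except (KeyError, ValueError):
            raise KeyError((key, value)) from None
        if not values:
            del self._data[key]

    def discard(self, key, value):
        try:
            self.remove(key, value)
        except KeyError:
            pass

    def getfirst(self, key, default=None):
        values = self._data.get(key)
        return values[0] if values else default

    def keys(self):
        return list(self._data)

    def items(self):
        return [(k, v) for k, values in self._data.items() for v in values]

    def __len__(self):
        return sum(len(values) for values in self._data.values())

    def update(self, other):
        if hasattr(other, "keys"):
            pairs = [(k, other[k]) for k in other.keys()]
        else:
            pairs = other
        for k, v in pairs:
            self.add(k, v)


m = MultiDict()
m.add("a", 1)
m.add("a", 2)
```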

From guido at  Fri Feb 17 20:58:10 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 11:58:10 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Alex Martelli <aleaxit at> wrote:
> On 2/16/06, Guido van Rossum <guido at> wrote:
> > A bunch of Googlers were discussing the best way of doing the
>    ...
> Wow, what a great discussion!  As you'll recall, I had also mentioned
> the "callable factory" as a live possibility, and there seems to be a
> strong sentiment in favor of that; not really a "weakness case" for
> HOFs, as you feared it might be during the lunchtime discussion.


You seem to have missed my revised proposal.

> Out of all I've read here, I like the idea of having a
> collections.autodict (a much nicer name than defaultdict, a better
> collocation for 2.5 than the builtins). One point I think nobody has
> made is that whenever reasonably possible the setting of a callback
> (the callable factory here) should include *a and **k to use when
> calling back.

That's your C/C++ brain talking. :-)

If you need additional data passed to a callback (to be provided at
the time the callback is *set*, not when it is *called*) the customary
approach is to make the callback a parameterless lambda; you can also
use a bound method, etc. There's no need to complicate every piece of
code that calls a callback with the machinery to store and use
arbitrary arguments and keyword arguments.
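
To illustrate, here is the copy.copy use case from the quoted message done both ways, with the extra data bound when the callback is set. It uses collections.defaultdict from today's standard library as a stand-in for the proposed dict; the template value is made up.

```python
import copy
from collections import defaultdict
from functools import partial

template = {"count": 0}

# Parameterless lambda: the template is captured when the callback is set.
by_lambda = defaultdict(lambda: copy.copy(template))

# functools.partial achieves the same without a lambda.
by_partial = defaultdict(partial(copy.copy, template))

by_lambda["a"]["count"] += 1  # modifies the copy stored under "a"
fresh = by_partial["b"]       # a new copy of the template
```

Either way the dict only ever calls the factory with no arguments, so it needs no machinery for storing *a and **k.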

I forgot to mention in my revised proposal that the API for setting
the default_factory is slightly odd:

  d = {}   # or dict()
  d.default_factory = list

rather than

  d = dict(default_factory=list)

This is of course because we cut off that way when we defined what
arbitrary keyword arguments to the dict constructor would do. My
original proposal solved this by creating a subclass. But there were
several suggestions that this would be fine functionality to add to
the standard dict type -- and then I really don't see any other way to
do this. (Yes, I could have a set_default_factory() method -- but a
simple settable attribute seems more pythonic!)

--Guido van Rossum (home page:

From guido at  Fri Feb 17 21:02:15 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 12:02:15 -0800
Subject: [Python-Dev] Counter proposal: multidict (was: Proposal:
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Ian Bicking <ianb at> wrote:
> I really don't like that defaultdict (or a dict extension) means that
> x[not_found] will have noticeable side effects.  This all seems to be a
> roundabout way to address one important use case of a dictionary with
> multiple values for each key, and in the process breaking an important
> quality of good Python code, that attribute and getitem access not have
> noticeable side effects.
> So, here's a proposed interface for a new multidict object, borrowing
> some methods from Set but mostly from dict.  Some things that seemed
> particularly questionable to me are marked with ??.

Have you seen my revised proposal (which is indeed an addition to the
standard dict rather than a subclass)?

Your multidict addresses only one use case for the proposed behavior;
what's so special about dicts of lists that they should have special
support? What about dicts of dicts, dicts of sets, dicts of
user-defined objects?

--Guido van Rossum (home page:

From fdrake at  Fri Feb 17 21:03:06 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Fri, 17 Feb 2006 15:03:06 -0500
Subject: [Python-Dev] Counter proposal: multidict (was: Proposal:
In-Reply-To: <>
References: <>
Message-ID: <>

On Friday 17 February 2006 14:51, Ian Bicking wrote:
 > This all seems to be a
 > roundabout way to address one important use case of a dictionary with
 > multiple values for each key, 

I think there are use cases that do not involve multiple values per key.  That 
is one place where this commonly comes up, but not the only one.

 > and in the process breaking an important 
 > quality of good Python code, that attribute and getitem access not have
 > noticeable side effects.

I'm not sure that's quite as well-defined or agreed upon as you think.


Fred L. Drake, Jr.   <fdrake at>

From theller at  Fri Feb 17 21:04:42 2006
From: theller at (Thomas Heller)
Date: Fri, 17 Feb 2006 21:04:42 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> So here's a new proposal.
> Let's add a generic missing-key handling method to the dict class, as
> well as a default_factory slot initialized to None. The implementation
> is like this (but in C):
> def on_missing(self, key):
>   if self.default_factory is not None:
>     value = self.default_factory()
>     self[key] = value
>     return value
>   raise KeyError(key)
> When __getitem__() (and *only* __getitem__()) finds that the requested
> key is not present in the dict, it calls self.on_missing(key) and
> returns whatever it returns -- or raises whatever it raises.
> __getitem__() doesn't need to raise KeyError any more, that's done by
> on_missing().

Will this also work when PyDict_GetItem() does not find the key?


From guido at  Fri Feb 17 21:11:29 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 12:11:29 -0800
Subject: [Python-Dev] Copying zlib compression objects
In-Reply-To: <>
References: <>
Message-ID: <>

Please submit your patch to SourceForge.

On 2/17/06, Chris AtLee <chris at> wrote:
> I'm writing a program in python that creates tar files of a certain
>  maximum size (to fit onto CD/DVD).  One of the problems I'm running
>  into is that when using compression, it's pretty much impossible to
>  determine if a file, once added to an archive, will cause the archive
>  size to exceed the maximum size.
> I believe that to do this properly, you need to copy the state of tar
>  file (basically the current file offset as well as the state of the
>  compression object), then add the file.  If the new size of the archive
>  exceeds the maximum, you need to restore the original state.
> The critical part is being able to copy the compression object.
>  Without compression it is trivial to determine if a given file will
>  "fit" inside the archive.  When using compression, the compression
>  ratio of a file depends partially on all the data that has been
>  compressed prior to it.
> The current implementation in the standard library does not allow you
>  to copy these compression objects in a useful way, so I've made some
>  minor modifications (patch attached) to the standard 2.4.2 library:
>  - Add copy() method to zlib compression object.  This returns a new
>  compression object with the same internal state.  I named it copy() to
>  keep it consistent with things like sha.copy().
>  - Add snapshot() / restore() methods to GzipFile and TarFile.  These
>  work only in write mode.  snapshot() returns a state object.  Passing
>  in this state object to restore() will restore the state of the
>  GzipFile / TarFile to the state represented by the object.
> Future work:
>  - Decompression objects could use a copy() method too
>  - Add support for copying bzip2 compression objects
> Although this patch isn't complete, does this seem like a good approach?
> Cheers,
>  Chris
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

--Guido van Rossum (home page:
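
For readers following the quoted request: zlib compression objects in today's standard library do have a copy() method, so the "will it still fit?" check can be sketched as below. The helper name would_fit and the sample data are made up, and this only covers the raw zlib stream, not the GzipFile/TarFile snapshot() and restore() methods described in the patch.

```python
import zlib


def would_fit(compressor, bytes_written, chunk, limit):
    """Return True if the finished stream stays within limit after chunk."""
    trial = compressor.copy()  # independent copy of the internal state
    extra = trial.compress(chunk) + trial.flush()
    return bytes_written + len(extra) <= limit


compressor = zlib.compressobj()
head = compressor.compress(b"a" * 1000)  # output emitted so far
written = len(head)

small = b"b" * 10
large = bytes(range(256)) * 4  # 1024 bytes, 256 distinct byte values

fits_small = would_fit(compressor, written, small, 10000)
fits_large = would_fit(compressor, written, large, 100)
```

Because only the copy is flushed, the original compressor can keep accepting data after the check.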

From theller at  Fri Feb 17 21:18:42 2006
From: theller at (Thomas Heller)
Date: Fri, 17 Feb 2006 21:18:42 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	
Message-ID: <>

[cc to py-dev again]

Guido van Rossum wrote:
> On 2/17/06, Thomas Heller <theller at> wrote:
>> Guido van Rossum wrote:
>>> So here's a new proposal.
>>> Let's add a generic missing-key handling method to the dict class, as
>>> well as a default_factory slot initialized to None. The implementation
>>> is like this (but in C):
>>> def on_missing(self, key):
>>>   if self.default_factory is not None:
>>>     value = self.default_factory()
>>>     self[key] = value
>>>     return value
>>>   raise KeyError(key)
>>> When __getitem__() (and *only* __getitem__()) finds that the requested
>>> key is not present in the dict, it calls self.on_missing(key) and
>>> returns whatever it returns -- or raises whatever it raises.
>>> __getitem__() doesn't need to raise KeyError any more, that's done by
>>> on_missing().
>> Will this also work when PyDict_GetItem() does not find the key?
> Ouch, tricky. It should, of course, but the code will be a tad tricky
> because it's not supposed to inc the refcount. Thanks for reminding
> me!

Ahem, I'm still looking for ways to 'overtake' the dict to implement
weird and fancy things.  Can on_missing be overridden in subclasses (writing
the subclass in C would not be a problem)?



From guido at  Fri Feb 17 21:23:06 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 12:23:06 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Thomas Heller <theller at> wrote:
> Ahem, I'm still looking for ways to 'overtake' the dict to implement
> weird and fancy things.  Can on_missing be overridden in subclasses (writing
> the subclass in C would not be a problem)?

Why ahem?

The answer is yes.

--Guido van Rossum (home page:

From jack at  Fri Feb 17 21:25:14 2006
From: jack at (Jack Diederich)
Date: Fri, 17 Feb 2006 15:25:14 -0500
Subject: [Python-Dev] Counter proposal: multidict (was: Proposal:
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Feb 17, 2006 at 03:03:06PM -0500, Fred L. Drake, Jr. wrote:
> On Friday 17 February 2006 14:51, Ian Bicking wrote:
>  > and in the process breaking an important 
>  > quality of good Python code, that attribute and getitem access not have
>  > noticeable side effects.
> I'm not sure that's quite as well-defined or agreed upon as you do.
Without the __getitem__ side effect default objects that don't support any
operators would have problems.

  d[key] += val

works fine when the default is a list or int but fails for dicts and presumably
many user defined objects.  By assigning the default value in __getitem__ the
returned value can be manipulated via its methods.
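
A small sketch of that difference, using a made-up dict subclass whose miss handler returns a default without storing it (via the __missing__ hook of today's Python):

```python
class NoInsert(dict):
    """Returns a default on a miss but does not store it (illustrative)."""

    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        return self.factory()


counts = NoInsert(int)
counts["a"] += 1  # reads the default 0, then __setitem__ stores 1

groups = NoInsert(dict)
groups["a"]["x"] = 1  # mutates a temporary dict that is then thrown away
```

The augmented assignment survives because it ends in a store; the method-style update is lost unless the lookup itself inserts the default.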


From theller at  Fri Feb 17 21:27:36 2006
From: theller at (Thomas Heller)
Date: Fri, 17 Feb 2006 21:27:36 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/17/06, Thomas Heller <theller at> wrote:
>> Ahem, I'm still looking for ways to 'overtake' the dict to implement
>> weird and fancy things.  Can on_missing be overridden in subclasses (writing
>> the subclass in C would not be a problem)?
> Why ahem?
> The answer is yes.

Ok, so that allows passing the key, for example, to the default_factory -
allowing case-insensitive lookup in namespaces.
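
One possible reading of that idea, sketched with the __missing__ hook of today's Python: a miss falls back to a lowercased lookup. The class name is made up, and the sketch assumes string keys that are stored in lowercase.

```python
class CaseInsensitiveNamespace(dict):
    """Falls back to a lowercased lookup on a miss (illustrative)."""

    def __missing__(self, key):
        folded = key.lower()
        if folded != key and folded in self:
            # dict.__getitem__ is used directly to avoid re-entering the hook.
            return dict.__getitem__(self, folded)
        raise KeyError(key)


ns = CaseInsensitiveNamespace(width=80)
upper_lookup = ns["WIDTH"]
exact_lookup = ns["width"]
```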


From martin at  Fri Feb 17 21:35:25 2006
From: martin at ("Martin v. Löwis")
Date: Fri, 17 Feb 2006 21:35:25 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Just because some codecs don't fit into the string.decode()
> or bytes.encode() scenario doesn't mean that these codecs are
> useless or that the methods should be banned.

No. The reason to ban string.decode and bytes.encode is that
it confuses users.


From ianb at  Fri Feb 17 21:51:26 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 14:51:26 -0600
Subject: [Python-Dev] Counter proposal: multidict
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> On 2/17/06, Ian Bicking <ianb at> wrote:
>>I really don't like that defaultdict (or a dict extension) means that
>>x[not_found] will have noticeable side effects.  This all seems to be a
>>roundabout way to address one important use case of a dictionary with
>>multiple values for each key, and in the process breaking an important
>>quality of good Python code, that attribute and getitem access not have
>>noticeable side effects.
>>So, here's a proposed interface for a new multidict object, borrowing
>>some methods from Set but mostly from dict.  Some things that seemed
>>particularly questionable to me are marked with ??.
> Have you seen my revised proposal (which is indeed an addition to the
> standard dict rather than a subclass)?

Yes, and though it is more general it has the same issue of side
effects.  Doesn't it seem strange that getting an item will change the
values of .keys(), .items(), and .has_key()?

> Your multidict addresses only one use case for the proposed behavior;
> what's so special about dicts of lists that they should have special
> support? What about dicts of dicts, dicts of sets, dicts of
> user-defined objects?

What's so special?  95% (probably more!) of current use of .setdefault()
is .setdefault(key, []).append(value).

Also, since when do features have to address all possible cases? 
Certainly there are other cases, and I think they can be answered with 
other classes.  Here are some current options:

.setdefault() -- works with any subtype; slightly less efficient than 
what you propose.  Awkward to read; doesn't communicate intent very well.

UserDict -- works for a few cases where you want to make dict-like 
objects.  Messes up the concept of identity and containment -- resulting 
objects both "are" dictionaries, and "contain" a dictionary (

DictMixin -- does anything you can possibly want, requiring only the 
overriding of a couple methods.

dict subclassing -- does anything you want as well, but you typically 
have to override many more methods than with DictMixin (and if you don't 
have to override every method, that's not documented in any way).  Isn't 
written with subclassing in mind.  Really, you are proposing that one 
specific kind of override be made feasible, either with subclassing or 
injecting a method.

That said, I'm not saying that several kinds of behavior shouldn't be 
supported.  I just don't see why dict should support them all (or 
multidict).  And I also think dict will support them poorly.

multidict implements one behavior *well*.  In a documented way, with a 
name people can refer to.  I can say "multidict", I can't say "a dict 
where I set default_factory to list" (well, I can say that, but that 
just opens up yet more questions and clarifications).

Some ways multidict differs from default_factory=list:

* __contains__ works (you have to use .get() with default_factory to get 
a meaningful result)
* Barring cases where there are exceptions, x[key] and x.get(key) return 
the same value for multidict; with default_factory one returns [] and 
the other returns None when the key isn't found.  But if you do x[key]; 
x.get(key) then x.get(key) always returns [].
* You can't use __setitem__ to put non-list items into a multidict; with 
multidict you don't have to guard against non-sequence values.
* [] is meaningful not just as the default value, but as a null value; 
the multidict implementation respects both aspects.
* Specific method x.add(key, value) that indicates intent in a way that 
x[key].append(value) does not.
* items and iteritems return values meaningful to the context (a list of 
(key, single_value) -- this is usually what I want, and avoids a nested 
for loop).  __len__ also usefully different than in dict.
* .update() handles iteritems sensibly, and updates from dictionaries 
sensibly -- if you mix a default_factory=list dict with a "normal" 
(single-value) dictionary you'll get an effectively corrupted dictionary 
(where some keys are lists)
* x.getfirst(key) is useful
* I think this will be much easier to reason about in situations with 
threads -- dict acts very predictably with threads, and people rely upon 
that.
* multidict can be written either with subclassing intended, or with an 
abstract superclass, so that other kinds of specializations of this 
superset of the dict interface can be made more easily (if DictMixin 
itself isn't already sufficient)
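
The side effect and the get() mismatch from the list above can be observed with collections.defaultdict in today's standard library, used here as a stand-in for a dict with default_factory set to list:

```python
from collections import defaultdict

x = defaultdict(list)

get_before = x.get("k")   # get() does not consult the factory
in_before = "k" in x      # neither does the containment test

looked_up = x["k"]        # the lookup inserts a new empty list

get_after = x.get("k")    # now returns the inserted list
in_after = "k" in x
keys_after = list(x.keys())
```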

So, I'm saying: multidict handles one very common collection need that 
dict handles awkwardly now.  multidict is a meaningful and useful class 
with its own identity/name and meaning separate from dict, and has 
methods that represent both the intersection and the difference between 
the two classes.  multidict does not in any way preclude other 
collection objects for other situations; it is entirely unfair to expect 
a new class to solve all issues.  multidict suggests an interface that 
other related classes can use (e.g., an ordered version).  multidict, 
unlike default_factory, is not just a recipe for creating a specific and 
commonly needed object, it is a class for creating it.

Ian Bicking  /  ianb at  /

From ianb at  Fri Feb 17 22:03:49 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 15:03:49 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> d = {}
> d.default_factory = set
> ...
> d[key].add(value)

Another option would be:

   d = {}
   d.default_factory = set
   d.get_default(key).add(value)

Unlike .setdefault, this would use a factory associated with the 
dictionary, and no default value would get passed in.  Unlike the 
proposal, this would not override __getitem__ (not overriding 
__getitem__ is really the only difference with the proposal).  It would 
be clear reading the code that you were not implicitly asserting that 
"key in d" was true.

"get_default" isn't the best name, but another name isn't jumping out at 
me at the moment.  Of course, it is not a Pythonic argument to say that 
an existing method should be overridden, or functionality made nameless 
simply because we can't think of a name (looking to anonymous functions 
of course ;)
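
A sketch of how such a method might look on a dict subclass. The class name FactoryDict is made up, and raising KeyError when no factory has been set is an assumption of the sketch; the message above does not say what should happen in that case.

```python
class FactoryDict(dict):
    """dict with an explicit get_default(); __getitem__ is untouched."""

    default_factory = None

    def get_default(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            if self.default_factory is None:
                raise
            value = self[key] = self.default_factory()
            return value


d = FactoryDict()
d.default_factory = set
d.get_default("k").add(1)  # the insertion is visible at the call site
```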

Ian Bicking  /  ianb at  /

From mal at  Fri Feb 17 22:11:35 2006
From: mal at (M.-A. Lemburg)
Date: Fri, 17 Feb 2006 22:11:35 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<>	<>
	<> <>
Message-ID: <>

Walter Dörwald wrote:
>>>> I'd suggest we keep codecs.lookup() the way it is and
>>>> instead add new functions to the codecs module, e.g.
>>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>> Changing the codec registration is not much of a problem:
>>>> we could simply allow 6-tuples to be passed into the
>>>> registry.
>>> OK, so codecs.lookup() returns 4-tuples, but the registry stores
>>> 6-tuples and the search functions must return 6-tuples. And we add
>>> codecs.getencoderobject() and codecs.getdecoderobject() as well as new
>>> classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
>>> old search functions that return 4-tuples?
>> The registry should then simply set the missing entries to None
>> and the getencoderobject()/getdecoderobject() would then have
>> to raise an error.
> Sounds simple enough and we don't loose backwards compatibility.
>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
> +1, but I'd like to have a replacement for this, i.e. a function that
> returns all info the registry has about an encoding:
> 1. Name
> 2. Encoder function
> 3. Decoder function
> 4. Stateful encoder factory
> 5. Stateful decoder factory
> 6. Stream writer factory
> 7. Stream reader factory
> and if this is an object with attributes, we won't have any problems if
> we extend it in the future.

Shouldn't be a problem: just expose the registry dictionary
via the _codecs module.

The rest can then be done in a Python function defined in using a CodecInfo class.

> BTW, if we change the API, can we fix the return value of the stateless
> functions? As the stateless function always encodes/decodes the complete
> string, returning the length of the string doesn't make sense.
> codecs.getencoder() and codecs.getdecoder() would have to continue to
> return the old variant of the functions, but
> codecs.getinfo("latin-1").encoder would be the new encoding function.

No: you can still write stateless encoders or decoders that do
not process the whole input string. Just because we don't have
any of those in Python, doesn't mean that they can't be written
and used. A stateless codec might want to leave the work
of buffering bytes at the end of the input data which cannot
be processed to the caller. It is also possible to write
stateful codecs on top of such stateless encoding and decoding
functions.
Marc-Andre Lemburg

From aahz at  Fri Feb 17 22:18:54 2006
From: aahz at (Aahz)
Date: Fri, 17 Feb 2006 13:18:54 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Feb 17, 2006, Guido van Rossum wrote:
> But the default implementation is designed so that we can write
> d = {}
> d.default_factory = list


I actually like the fact that you're forced to use a separate statement
for setting the default_factory.  From my POV, this can go into 2.5.

(I was only +0 on the previous proposal and I was -1 on making it a
built-in; this extension is much nicer.)
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From guido at  Fri Feb 17 22:26:30 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 13:26:30 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On 2/16/06, Stephen J. Turnbull <stephen at> wrote:
> /usr/share often is on a different mount; that's the whole rationale
> for /usr/share.

I don't think I've worked at a place where something like that was
done for at least 10 years. Isn't this argument outdated?

--Guido van Rossum (home page:

From rhamph at  Fri Feb 17 22:54:08 2006
From: rhamph at (Adam Olsen)
Date: Fri, 17 Feb 2006 14:54:08 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Guido van Rossum <guido at> wrote:
> - There's a fundamental difference between associating the default
> value with the dict object, and associating it with the call. So
> proposals to invent a better name/signature for setdefault() don't
> compete.

That's a feature, not a bug. :)  See below.

> - The inconsistency introduced by __getitem__() returning a value for
> keys while get(), __contains__(), and keys() etc. don't show it,
> cannot be resolved usefully. You'll just have to live with it.
> Modifying get() to do the same thing as __getitem__() doesn't seem
> useful -- it just takes away a potentially useful operation.

Again, see below.

> So here's a new proposal.
> Let's add a generic missing-key handling method to the dict class, as
> well as a default_factory slot initialized to None. The implementation
> is like this (but in C):
> def on_missing(self, key):
>   if self.default_factory is not None:
>     value = self.default_factory()
>     self[key] = value
>     return value
>   raise KeyError(key)
> When __getitem__() (and *only* __getitem__()) finds that the requested
> key is not present in the dict, it calls self.on_missing(key) and
> returns whatever it returns -- or raises whatever it raises.
> __getitem__() doesn't need to raise KeyError any more, that's done by
> on_missing().

Still -1.  It's better, but it violates the principle of encapsulation
by mixing how-you-use-it state with what-it-stores state.  In doing
that it has the potential to break an API documented as accepting a
dict.  Code that expects d[key] to raise an exception (and catches the
resulting KeyError) will now silently "succeed".  I believe that
necessitates a PEP to document it.

It also makes it harder to read code.  You may expect d[key] to
raise an exception, but it won't because of a single line up several
pages (or in another file entirely!)

d.getorset(key, func) has no such problems and has a much simpler specification:

def getorset(self, key, func):
  try:
    return self[key]
  except KeyError:
    value = self[key] = func()
    return value
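
For anyone who wants to try it, the same method can be attached to a dict subclass; the class name GetOrSetDict is made up for the illustration.

```python
class GetOrSetDict(dict):
    """dict with the getorset() method described above (illustrative)."""

    def getorset(self, key, func):
        try:
            return self[key]
        except KeyError:
            value = self[key] = func()
            return value


d = GetOrSetDict()
d.getorset("k", list).append(1)  # inserts a new list, then appends
d.getorset("k", list).append(2)  # reuses the stored list
```

Plain d[key] lookups on such an object still raise KeyError for missing keys.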

Adam Olsen, aka Rhamphoryncus

From bokr at  Fri Feb 17 22:59:19 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 21:59:19 GMT
Subject: [Python-Dev] Serial function call composition syntax foo(x,
	y) -> bar() -> baz(z)
Message-ID: <>

Cut to the chase: how about being able to write

    baz(bar(foo(x, y)),z)

serially as

    foo(x, y) -> bar() -> baz(z)

via the above as sugar for

    baz.__get__(bar.__get__(foo(x, y))())(z)


I.e., you'd have self-like args to receive results from upstream. E.g.,

 >>> def foo(x, y): return 'foo(%s, %s)'%(x,y)
 >>> def bar(stream): return 'bar(%s)'%stream
 >>> def baz(stream, z): return 'baz(%s, %s)'%(stream,z)
 >>> x = 'ex'; y='wye'; z='zed'

then (faked)
 >>> foo(x, y) -> bar() -> baz(z)
 'baz(bar(foo(ex, wye)), zed)'

would do (actual) 
 >>> baz.__get__(bar.__get__(foo(x, y))())(z)
 'baz(bar(foo(ex, wye)), zed)'

(or if the callable has no __get__, use new.instancemethod methodology behind the scenes)

This is to provide an alternative to serial composition of function calls
as methods of returned objects, which sometimes looks nice, but may have strange
coupling of types and functionality. E.g. you could define classes to be able
to write the above as

    foo(x, y).bar().baz(z)    

and that's effectively what is being done by the -> notation. The __get__ stuff
is really just on-the-fly bound method generation without passing the instance
class argument. But -> allows the composition without creating knowledge coupling
between the foo, bar, and baz sources. It just has to be realized that this way
of composition works via the first argument in passing through prior results.

BTW, note that in the above foo(x, y) is just the first expression result being
fed into the chain, so a constant or any expression can be the first, since
it just becomes the argument for the innermost nested call. I.e.,

    'abcd' -> binascii.hexlify()
    >>> new.instancemethod(binascii.hexlify, 'abcd', str)()

Note that it's valid to leave off the () -- IOW
simply,  a->b is sugar for b.__get__(a)  (or the instancemethod equivalent)

 'expr' -> baz
 <bound method ?.baz of 'expr'>
 >>> baz.__get__('expr')
 <bound method ?.baz of 'expr'>

and then
 >>> baz.__get__('expr')('zee')
 'baz(expr, zee)'

What do you think? 

Bengt Richter

From martin at  Fri Feb 17 23:00:36 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 23:00:36 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Fuzzyman wrote:
>>Also, I think has_key/in should return True if there is a default.
> And exactly what use would it then be ?

Code that checks

if d.has_key(k):
  print d[k]

would work correctly. IOW, you could use a dictionary with a default
key just as if it were a normal dictionary - which is a useful
property, IMO.


From guido at  Fri Feb 17 23:08:41 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 14:08:41 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Adam Olsen <rhamph at> wrote:
> It also makes code harder to read.  You may expect d[key] to
> raise an exception, but it won't because of a single line several
> pages up (or in another file entirely!)

Such are the joys of writing polymorphic code. I don't really see how
you can avoid this kind of confusion -- I could have given you some
other mapping object that does weird stuff.

--Guido van Rossum (home page:

From g.brandl at  Fri Feb 17 23:06:12 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 17 Feb 2006 23:06:12 +0100
Subject: [Python-Dev] Serial function call composition syntax foo(x,
 y) -> bar() -> baz(z)
In-Reply-To: <>
References: <>
Message-ID: <dt5hck$vf4$>

Bengt Richter wrote:
> Cut to the chase: how about being able to write
>     baz(bar(foo(x, y)),z)
> serially as
>     foo(x, y) -> bar() -> baz(z)
> via the above as sugar for
>     baz.__get__(bar.__get__(foo(x, y))())(z)

Reminds me of


From martin at  Fri Feb 17 23:13:14 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 23:13:14 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Adam Olsen wrote:
> Still -1.  It's better, but it violates the principle of encapsulation
> by mixing how-you-use-it state with what-it-stores state.  In doing
> that it has the potential to break an API documented as accepting a
> dict.  Code that expects d[key] to raise an exception (and catches the
> resulting KeyError) will now silently "succeed".

Of course it will, and without quotes. That's the whole point.

> I believe that necessitates a PEP to document it.

You are missing the rationale of the PEP process. The point is
*not* documentation. The point of the PEP process is to channel
and collect discussion, so that the BDFL can make a decision.
The BDFL is not bound at all to the PEP process.

To document things, we use (or should use) documentation.


From guido at  Fri Feb 17 23:15:39 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 14:15:39 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Ian Bicking <ianb at> wrote:
> Guido van Rossum wrote:
> > d = {}
> > d.default_factory = set
> > ...
> > d[key].add(value)
> Another option would be:
>    d = {}
>    d.default_factory = set
>    d.get_default(key).add(value)
> Unlike .setdefault, this would use a factory associated with the
> dictionary, and no default value would get passed in.  Unlike the
> proposal, this would not override __getitem__ (not overriding
> __getitem__ is really the only difference with the proposal).  It would
> be clear reading the code that you were not implicitly asserting that
> "key in d" was true.
> "get_default" isn't the best name, but another name isn't jumping out at
> me at the moment.  Of course, it is not a Pythonic argument to say that
> an existing method should be overridden, or functionality made nameless
> simply because we can't think of a name (looking to anonymous functions
> of course ;)

I'm torn. While trying to implement this I came across some ugliness
in PyDict_GetItem() -- it would make sense if this also called
on_missing(), but it must return a value without incrementing its
refcount, and isn't supposed to raise exceptions -- so what to do if
on_missing() returns a value that's not inserted in the dict?

If the __getattr__()-like operation that supplies and inserts a
dynamic default was a separate method, we wouldn't have this problem.

OTOH most reviewers here seem to appreciate on_missing() as a way to
alter a dict's __getitem__() behavior in various other ways
behind a caller's back -- perhaps it could even be (ab)used to
implement case-insensitive lookup.
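As a rough illustration of the case-insensitive idea: current Python calls a dict subclass's __missing__ method from dict.__getitem__ when a key is absent, which is close in spirit to on_missing(). The class name LowerKeyDict and the convention that keys are stored in lowercase are assumptions made for this sketch.

```python
class LowerKeyDict(dict):
    """Hypothetical mapping whose keys are assumed to be stored in lowercase."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the exact key is absent.
        if isinstance(key, str):
            lowered = key.lower()
            if lowered in self:
                return dict.__getitem__(self, lowered)
        raise KeyError(key)


headers = LowerKeyDict({"content-type": "text/plain"})
value = headers["Content-Type"]  # falls through to __missing__, finds the lowercase key
```

As with the defaulting case, get() and `in` bypass the hook, so "Content-Type" in headers is still False.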

I'm not going to do a point-by-point to your longer post (I don't have
the time); let's (again) agree to disagree and I'll sleep on it.

--Guido van Rossum (home page:

From jcarlson at  Fri Feb 17 23:18:39 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 17 Feb 2006 14:18:39 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <> <>
Message-ID: <>

"Martin v. Löwis" <martin at> wrote:
> M.-A. Lemburg wrote:
> > Just because some codecs don't fit into the string.decode()
> > or bytes.encode() scenario doesn't mean that these codecs are
> > useless or that the methods should be banned.
> No. The reason to ban string.decode and bytes.encode is that
> it confuses users.

How are users confused?  bytes.encode CAN only produce bytes.  Though
string.decode (or bytes.decode) MAY produce strings (or bytes) or
unicode, depending on the codec, I think it is quite reasonable to
expect that users will understand that string.decode('utf-8') is
different than string.decode('base-64'), and that they may produce
different output.  In a similar fashion, dict.get(1) may produce
different results than dict.get(2) for some dictionaries.  If some users
can't understand this (passing different arguments to a function may
produce different output), then I think that some users are broken
beyond repair.

 - Josiah

From guido at  Fri Feb 17 23:17:41 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 14:17:41 -0800
Subject: [Python-Dev] Serial function call composition syntax foo(x,
	y) -> bar() -> baz(z)
In-Reply-To: <>
References: <>
Message-ID: <>

Cut to the chase: -1000.

On 2/17/06, Bengt Richter <bokr at> wrote:
> Cut to the chase: how about being able to write
>     baz(bar(foo(x, y)),z)
> serially as
>     foo(x, y) -> bar() -> baz(z)
> via the above as sugar for
>     baz.__get__(bar.__get__(foo(x, y))())(z)
> ?

--Guido van Rossum (home page:

From martin at  Fri Feb 17 23:22:37 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 23:22:37 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>	<dstlvb$6cb$>
	<>	<1140007745.13739.7.camel@localhost.localdomain>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/16/06, Stephen J. Turnbull <stephen at> wrote:
>>/usr/share often is on a different mount; that's the whole rationale
>>for /usr/share.
> I don't think I've worked at a place where something like that was
> done for at least 10 years. Isn't this argument outdated?

It still *is* the rationale for putting things into /usr/share,
even though I agree that probably nobody actually does that.

That, in turn, is because nobody is so short of disk space that
you really *have* to share /usr/share across architectures, and
because trying to do the sharing still causes problems (e.g.
what if the packaging systems of different architectures
all decide to put the same files into /usr/share?)


From bokr at  Fri Feb 17 23:34:00 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 22:34:00 GMT
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in
	coordination with pep 349?]
References: <>	<>	<>
	<> <>
Message-ID: <>

On Fri, 17 Feb 2006 21:35:25 +0100, "Martin v. Löwis" <martin at> wrote:

>M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
>No. The reason to ban string.decode and bytes.encode is that
>it confuses users.
Well, that's because of semantic overloading. Assuming you mean
string as characters and bytes as binary bytes.

The trouble is encoding and decoding have to have bytes to represent
the coded info, whichever direction. Characters per se aren't coded
info, so string.decode doesn't make sense without faking it with
string.encode().decode() and bytes.encode() likewise first has to
have a hidden .decode to become a string that makes sense to encode.
And the hidden stuff restricts to ascii, for further grief :-(

So yes, please ban string.decode and bytes.encode.

And maybe introduce bytes.recode for bytes->bytes transforms?
(strings don't have any codes to recode).

Bengt Richter

From ianb at  Fri Feb 17 23:38:09 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 16:38:09 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/17/06, Adam Olsen <rhamph at> wrote:
>>It also makes code harder to read.  You may expect d[key] to
>>raise an exception, but it won't because of a single line several
>>pages up (or in another file entirely!)
> Such are the joys of writing polymorphic code. I don't really see how
> you can avoid this kind of confusion -- I could have given you some
> other mapping object that does weird stuff.

The way you avoid confusion is by not working with code or programmers 
who write bad code.  Python and polymorphic code in general pushes the 
responsibility for many errors from the language structure onto the 
programmer -- it is the programmers' responsibility to write good code. 
Python has never kept people from writing obscenely horrible code.  We
ought to have an obfuscated Python contest just to prove that point -- 
it is through practice and convention that readable Python code happens, 
not through the restrictions of the language.  (Honestly, I think such a 
contest would be a good idea.)

I know *I* at least don't like code that mixes up access and 
modification.  Maybe not everyone does (or maybe not everyone thinks of 
getitem as "access", but that's unlikely).  I will assert that it is 
Pythonic to keep access and modification separate, which is why methods 
and attributes are different things, and why assignment is not an 
expression, and why functions with side effects typically return None, 
or have names that are very explicit about the side effect, with names 
containing command verbs like "update" or "set".  All of these 
distinguish access from modification.

Note that all of what I'm saying *only* applies to the overriding of 
__getitem__, not the addition of any new method.  I think multidict is 
better for the places it applies, but I see no problem at all with a new 
method on dictionaries that calls on_missing.

Ian Bicking  /  ianb at  /

From martin at  Fri Feb 17 23:37:59 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 23:37:59 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> I'm torn. While trying to implement this I came across some ugliness
> in PyDict_GetItem() -- it would make sense if this also called
> on_missing(), but it must return a value without incrementing its
> refcount, and isn't supposed to raise exceptions -- so what to do if
> on_missing() returns a value that's not inserted in the dict?

I think there should be a guideline to use
PyObject_GetItem/PyMapping_GetItemString "normally", i.e. in all cases
where you would write d[k] in Python code.

It should be considered a bug if PyDict_GetItem is used in a place
that "should" invoke defaulting; IOW, the function should be reserved
to really low-level cases (e.g. if it is known that the dict doesn't
have any defaulting, e.g. the string interned dictionary).

There should be a policy whether name-lookup invokes defaulting
(i.e. __dict__ access); I think it should. This would cause
__getattr__ to have no effect if the object's dictionary has
a default factory (unless that raises a KeyError).


From martin at  Fri Feb 17 23:52:15 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 17 Feb 2006 23:52:15 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <> <>
Message-ID: <>

Josiah Carlson wrote:
> How are users confused?

Users do

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)

because they want to convert the string "to Unicode", and they have
found a text telling them that .encode("utf-8") is a reasonable
method.

What it *should* tell them is

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'str' object has no attribute 'encode'
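Both tracebacks can be reproduced in Python 3, where byte strings and text are separate types. The sketch below assumes the original byte string was Latin-1 encoded (0xf6 is o-umlaut in ISO-8859-1); the variable names are illustrative.

```python
text = "Martin v. L\u00f6wis"   # the name from the example, as real text
data = text.encode("latin-1")   # the bytes a Python 2 str would have held

# The implicit ASCII decode is what fails in the first traceback.
try:
    data.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = (exc.start, data[exc.start])

# Python 3 behaves like the second traceback: bytes has no encode() method.
try:
    data.encode("utf-8")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

utf8 = text.encode("utf-8")     # encoding text, by contrast, works directly
```

Here failure records the offending position and byte, matching the "byte 0xf6 in position 11" message above.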

> bytes.encode CAN only produce bytes.

I don't understand MAL's design, but I believe in that design,
bytes.encode could produce anything (say, a list). A codec
can convert anything to anything else.

> If some users
> can't understand this (passing different arguments to a function may
> produce different output),

It's worse than that. The return *type* depends on the *value* of
the argument. I think there is little precedent for that: normally,
the return values depend on the argument values, and, in a polymorphic
function, the return type might depend on the argument types (e.g.
the arithmetic operations). Also, the return type may depend on the
number of arguments (e.g. by requesting a return type in a keyword

> then I think that some users are broken beyond repair.

Hmm. I'm speechless.
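The point about the return type depending on the argument value can be seen with the generic codecs.decode() function in Python 3, which still accepts both text codecs and bytes-to-bytes transforms. This is an illustration under that assumption, not a reconstruction of the design being debated.

```python
import codecs

raw = b"68656c6c6f"

# Same function, same input type; only the codec *name* differs.
as_text = codecs.decode(raw, "ascii")       # text codec: bytes -> str
as_bytes = codecs.decode(raw, "hex_codec")  # binary transform: bytes -> bytes
```

A caller cannot tell from the call signature alone whether it gets text or bytes back; it has to know what the named codec does.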


From rhamph at  Fri Feb 17 23:56:24 2006
From: rhamph at (Adam Olsen)
Date: Fri, 17 Feb 2006 15:56:24 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Guido van Rossum <guido at> wrote:
> On 2/17/06, Adam Olsen <rhamph at> wrote:
> > It also makes code harder to read.  You may expect d[key] to
> > raise an exception, but it won't because of a single line several
> > pages up (or in another file entirely!)
> Such are the joys of writing polymorphic code. I don't really see how
> you can avoid this kind of confusion -- I could have given you some
> other mapping object that does weird stuff.

You could pass a float in as well.  But if the function is documented
as taking a dict, and the programmer expects a dict.. that now has to
be changed to "dict without a default".  Or they have to code
defensively since d[key] may or may not raise KeyError, so they must
avoid depending on it either way.

Adam Olsen, aka Rhamphoryncus

From guido at  Fri Feb 17 23:58:34 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 14:58:34 -0800
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On 2/17/06, "Martin v. Löwis" <martin at> wrote:
> Guido van Rossum wrote:
> > On 2/16/06, Stephen J. Turnbull <stephen at> wrote:
> >>/usr/share often is on a different mount; that's the whole rationale
> >>for /usr/share.
> >
> > I don't think I've worked at a place where something like that was
> > done for at least 10 years. Isn't this argument outdated?
> It still *is* the rationale for putting things into /usr/share,
> even though I agree that probably nobody actually does that.
> That, in turn, is because nobody is so short of disk space that
> you really *have* to share /usr/share across architectures, and
> because trying to do the sharing still causes problems (e.g.
> what if the packaging systems of different architectures
> all decide to put the same files into /usr/share?)

I believe /usr/share was intended only to be used for
platform-independent files (e.g. docs, or .py files).

Another reason why nobody does this is because NFS is slow and
unreliable. It's no fun when your NFS server goes down and your
machine hangs because someone wanted to save 50 MB per workstation by
sharing it.

--Guido van Rossum (home page:

From guido at  Sat Feb 18 00:00:22 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 15:00:22 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Adam Olsen <rhamph at> wrote:
> > Such are the joys of writing polymorphic code. I don't really see how
> > you can avoid this kind of confusion -- I could have given you some
> > other mapping object that does weird stuff.
> You could pass a float in as well.  But if the function is documented
> as taking a dict, and the programmer expects a dict.. that now has to
> be changed to "dict without a default".  Or they have to code
> defensively since d[key] may or may not raise KeyError, so they must
> avoid depending on it either way.

I'd like to see a real-life example of code that would break this way.
I believe that *most* code that takes a dict will work just fine if
that dict has a default factory.

--Guido van Rossum (home page:

From martin at  Sat Feb 18 00:06:20 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 00:06:20 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Ian Bicking wrote:
> I know *I* at least don't like code that mixes up access and 
> modification.  Maybe not everyone does (or maybe not everyone thinks of 
> getitem as "access", but that's unlikely).  I will assert that it is 
> Pythonic to keep access and modification separate, which is why methods 
> and attributes are different things, and why assignment is not an 
> expression, and why functions with side effects typically return None, 
> or have names that are very explicit about the side effect, with names 
> containing command verbs like "update" or "set".  All of these 
> distinguish access from modification.

Do you never write

 d[some_key].append(some_value)

This is modification and access, all in a single statement, and all
without assignment operator.

I don't see the setting of the default value as a modification.
The default value has been there, all the time. It only is incarnated


From martin at  Sat Feb 18 00:07:48 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 00:07:48 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Adam Olsen wrote:
> You could pass a float in as well.  But if the function is documented
> as taking a dict, and the programmer expects a dict.. that now has to
> be changed to "dict without a default".  Or they have to code
> defensively since d[key] may or may not raise KeyError, so they must
> avoid depending on it either way.

Can you give an example of real, existing code that will break
if such a dict is passed?


From rhamph at  Sat Feb 18 00:08:35 2006
From: rhamph at (Adam Olsen)
Date: Fri, 17 Feb 2006 16:08:35 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, "Martin v. Löwis" <martin at> wrote:
> Adam Olsen wrote:
> > Still -1.  It's better, but it violates the principle of encapsulation
> > by mixing how-you-use-it state with what-it-stores state.  In doing
> > that it has the potential to break an API documented as accepting a
> > dict.  Code that expects d[key] to raise an exception (and catches the
> > resulting KeyError) will now silently "succeed".
> Of course it will, and without quotes. That's the whole point.

Consider these two pieces of code:

if key in d:

except KeyError:

Before they were the same (assuming dosomething() won't raise
KeyError).  Now they would behave differently.
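The divergence between a membership test and a try/except lookup can be demonstrated with collections.defaultdict, used here as a stand-in for a dict with a default factory. The two helper functions are hypothetical illustrations, not the code from the original message.

```python
from collections import defaultdict


def describe_with_in(d, key):
    # Membership test first: never triggers the default factory.
    if key in d:
        return "found"
    return "missing"


def describe_with_try(d, key):
    # Lookup first: relies on KeyError to detect absence.
    try:
        d[key]
    except KeyError:
        return "missing"
    return "found"


plain = {}
defaulting = defaultdict(list)

plain_results = (describe_with_in(plain, "k"), describe_with_try(plain, "k"))
defaulting_results = (describe_with_in(defaulting, "k"),
                      describe_with_try(defaulting, "k"))
```

With the plain dict both helpers agree; with the defaulting dict the try/except version reports the key as found.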

The latter is even the preferred form, since it only invokes a single
dict lookup:

On 2/16/06, Delaney, Timothy (Tim) <tdelaney at> wrote:
>     try:
>         v = d[key]
>     except:
>         v = d[key] = value

Obviously this example could be changed to use default_factory, but I
find it hard to believe the only use of that pattern is to set default

Of course you could just assume that of all the people passing your
function a dict, none of them will ever use the default_factory when
they build the dict.  Should be easy, right?

Adam Olsen, aka Rhamphoryncus

From ianb at  Sat Feb 18 00:13:51 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 17:13:51 -0600
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> Users do
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.
> What it *should* tell them is
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I think it would be even better if they got "ValueError: utf8 can only 
encode unicode objects".  AttributeError is not much more clear than the 

That str.encode(unicode_encoding) implicitly decodes strings seems like 
a flaw in the unicode encodings, quite separate from the existence of
str.encode.  I for one really like s.encode('zlib').encode('base64') -- 
and if the zlib encoding raised an error when it was passed a unicode 
object (instead of implicitly encoding the string with the ascii 
encoding) that would be fine.

The pipe-like nature of .encode and .decode works very nicely for 
certain transformations, applicable to both unicode and byte objects. 
Let's not throw the baby out with the bath water.

Ian Bicking  /  ianb at  /

From ianb at  Sat Feb 18 00:21:52 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 17:21:52 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
>>I know *I* at least don't like code that mixes up access and 
>>modification.  Maybe not everyone does (or maybe not everyone thinks of 
>>getitem as "access", but that's unlikely).  I will assert that it is 
>>Pythonic to keep access and modification separate, which is why methods 
>>and attributes are different things, and why assignment is not an 
>>expression, and why functions with side effects typically return None, 
>>or have names that are very explicit about the side effect, with names 
>>containing command verbs like "update" or "set".  All of these 
>>distinguish access from modification.
> Do you never write
>  d[some_key].append(some_value)
> This is modification and access, all in a single statement, and all
> without assignment operator.

(d[some_key]) is access.  (...).append(some_value) is modification. 
Expressions are compound; of course you can mix both access and 
modification in a single expression.  d[some_key] is access that returns 
something, and .append(some_value) modifies that something, it doesn't 
modify d.

> I don't see the setting of the default value as a modification.
> The default value has been there, all the time. It only is incarnated
> lazily.

It is lazily incarnated for multidict, because there is no *noticeable* 
side effect -- if there are any internal side effects, that is an
implementation detail.  However for default_factory=list, the result of 
.keys(), .has_key(), and .items() changes when you do d[some_key].
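The visible side effect is easy to observe with collections.defaultdict, again used only as a stand-in for the proposed default_factory behavior:

```python
from collections import defaultdict

d = defaultdict(list)
keys_at_start = sorted(d)

d.get("other_key")            # explicit accessor: no side effect
keys_after_get = sorted(d)

d["some_key"]                 # reads like access, but inserts a new list
keys_after_getitem = sorted(d)
```

The get() call leaves the key set alone, while the subscript lookup adds "some_key" to it.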

Ian Bicking  /  ianb at  /

From ianb at  Sat Feb 18 00:34:29 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 17:34:29 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Adam Olsen wrote:
> The latter is even the preferred form, since it only invokes a single
> dict lookup:
> On 2/16/06, Delaney, Timothy (Tim) <tdelaney at> wrote:
>>    try:
>>        v = d[key]
>>    except:
>>        v = d[key] = value
> Obviously this example could be changed to use default_factory, but I
> find it hard to believe the only use of that pattern is to set default
> keys.

I'd go further -- I doubt many cases where try:except KeyError: is used 
could be refactored to use default_factory -- default_factory can only 
be used to set default keys to something that can be determined sometime 
close to the time the dictionary is created, and that the default is not 
dependent on the context in which the key is fetched, and that default 
value will not cause unintended side effects if the dictionary leaks out 
of the code where it was initially used (like if the dictionary is 
returned to someone).  Any default factory is more often an algorithmic 
detail than truly part of the nature of the dictionary itself.

For instance, here is something I do often:

try:
     value = cache[key]
except KeyError:
     ... calculate value ...
     cache[key] = value

Realistically, factoring "... calculate value ..." into a factory that 
calculates the value would be difficult, produce highly unreadable code, 
perform worse, and have more bugs.  For simple factories like "list" and 
"dict" the factory works okay.  For immutable values like 0 and None, 
the factory (lambda : 0 and lambda : None) is a wasteful way to create a 
default value (because storing the value in the dictionary is 
unnecessary).  For non-trivial factories the whole thing falls apart, 
and one can just hope that no one will try to use this feature and will 
instead stick with the try:except KeyError: technique.
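A runnable version of the cache pattern, as a sketch: the function expensive() is a made-up placeholder for "... calculate value ...". Its result depends on the key, which a zero-argument default factory would not have access to.

```python
calls = []


def expensive(key):
    """Hypothetical stand-in for the real calculation; records each call."""
    calls.append(key)
    return key * key


def cached(cache, key):
    try:
        return cache[key]
    except KeyError:
        # The value is computed from the key itself at the point of use.
        value = cache[key] = expensive(key)
        return value


cache = {}
first = cached(cache, 7)   # computed and stored
second = cached(cache, 7)  # served from the cache
```

The calls list shows that the calculation runs once per key.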

Ian Bicking  /  ianb at  /

From bokr at  Sat Feb 18 00:36:26 2006
From: bokr at (Bengt Richter)
Date: Fri, 17 Feb 2006 23:36:26 GMT
Subject: [Python-Dev] bdist_* to stdlib?
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

On Fri, 17 Feb 2006 14:58:34 -0800, "Guido van Rossum" <guido at> wrote:

>On 2/17/06, "Martin v. Löwis" <martin at> wrote:
>> Guido van Rossum wrote:
>> > On 2/16/06, Stephen J. Turnbull <stephen at> wrote:
>> >>/usr/share often is on a different mount; that's the whole rationale
>> >>for /usr/share.
>> >
>> > I don't think I've worked at a place where something like that was
>> > done for at least 10 years. Isn't this argument outdated?
>> It still *is* the rationale for putting things into /usr/share,
>> even though I agree that probably nobody actually does that.
>> That, in turn, is because nobody is so short of disk space that
>> you really *have* to share /usr/share across architectures, and
>> because trying to do the sharing still causes problems (e.g.
>> what if the packaging systems of different architectures
>> all decide to put the same files into /usr/share?)
>I believe /usr/share was intended only to be used for
>platform-independent files (e.g. docs, or .py files).
> agrees with you, via ref to
and more specifically
>Another reason why nobody does this is because NFS is slow and
>unreliable. It's no fun when your NFS server goes down and your
>machine hangs because someone wanted to save 50 MB per workstation by
>sharing it.
Sometimes a separate mount could be a separate hard disk in the same box, I guess.
Apparently it's read-only, so I guess it could also temporarily be a cdrom even.

Bengt Richter

From oliphant.travis at  Sat Feb 18 00:38:16 2006
From: oliphant.travis at (Travis Oliphant)
Date: Fri, 17 Feb 2006 16:38:16 -0700
Subject: [Python-Dev] Please comment on PEP 357 -- adding nb_index slot
	to PyNumberMethods
In-Reply-To: <>
References: <dsu7t9$m9c$>	<>
Message-ID: <dt5mp8$h0c$>

Thomas Wouters wrote:
> On Fri, Feb 17, 2006 at 05:29:32PM +0100, Armin Rigo wrote:
>>>       Where obj must be either an int or a long or another object that has the
>>>       __index__ special method (but not self).
>>The "anything but not self" rule is not consistent with any other
>>special method's behavior.  IMHO we should just do the same as

Agreed.  I implemented the code, then realized this possible recursion 
problem while writing the specification.  I didn't know how it would be 

It is easy enough to require __index__ to return an actual Python 
integer because for anything that has the nb_index slot you would just 
return obj.__index__()  instead of obj.
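In Python code the resulting protocol looks roughly like this. The class Offset is a hypothetical example type; operator.index() is the standard-library function that calls __index__ and insists on an integer result.

```python
import operator


class Offset:
    """Hypothetical integer-like object usable wherever an index is needed."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        # Must return an actual Python integer, not another index-like object.
        return self.n


letters = "abcdef"
piece = letters[Offset(2):Offset(4)]  # slicing converts both bounds via __index__
as_int = operator.index(Offset(3))
```

Requiring a real integer from __index__ avoids the recursion question raised above: the interpreter never has to call __index__ on the result.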

I'll change the PEP and the implementation.  I have an updated 
implementation that uses the ssize_t patch instead.

There seem to be some issues with the ssize_t patch still, though.

Shouldn't a lot of the checks for INT_MAX be replaced with PY_SSIZE_T_MAX?
But I noticed that the PY_SSIZE_T_MAX definition in pyport.h raises errors.
I don't think it even makes sense.


From mcherm at  Fri Feb 17 23:47:33 2006
From: mcherm at (Michael Chermside)
Date: Fri, 17 Feb 2006 14:47:33 -0800
Subject: [Python-Dev] Proposal: defaultdict
Message-ID: <>

Martin v. Löwis writes:
> You are missing the rationale of the PEP process. The point is
> *not* documentation. The point of the PEP process is to channel
> and collect discussion, so that the BDFL can make a decision.
> The BDFL is not bound at all to the PEP process.
> To document things, we use (or should use) documentation.

You are oversimplifying significantly. The purpose of the PEP
process is to lay out and document the reasoning that went
into forming the decision. The BDFL is *allowed* to be
capricious, but he's sensible enough to choose not to: in
cases where it matters, he tries to document the reasoning
behind his decisions. In fact, he does better than that... he
gets the PEP author to document it for him!

The PEP (whether accepted, rejected, or in-progress) serves
as the official documentation of how the decision was made
(or of what option it is that is still undecided). If a
_trivial_ decision is already made, there's no need for a
PEP, but if a difficult decision has been made, then
documenting it in a PEP saves years of having to justify
it to newbies.

-- Michael Chermside

From jcarlson at  Sat Feb 18 00:51:32 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 17 Feb 2006 15:51:32 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <>
Message-ID: <>

"Martin v. Löwis" <martin at> wrote:
> Josiah Carlson wrote:
> > How are users confused?
> Users do
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.

Removing functionality because some users read bad instructions
somewhere, is a bit like kicking your kitten because your puppy peed on
the floor.  You are punishing the wrong group, for something that
shouldn't result in punishment: it should result in education.

Users are always going to get bad instructions, and removing utility
because some users fail to think before they act, or complain when their
lack of thinking doesn't work, will give us a language where we are
removing features because *new* users have no idea what they are doing.

> What it *should* tell them is
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I disagree.  I think the original error was correct, and we should be
educating users to prefix their literals with a 'u' if they want unicode,
or they should get their data from a unicode source (wxPython with
unicode, StreamReader, etc.)
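
For comparison, a small sketch of how the text/bytes split behaves in Python 3 as later released. This goes beyond what the thread itself had available, so treat it as an outside reference point rather than as either poster's proposal.

```python
name = "Martin v. L\u00f6wis"      # text; the non-ASCII character is at index 11

encoded = name.encode("utf-8")     # text -> bytes
decoded = encoded.decode("utf-8")  # bytes -> text

# bytes objects offer decode() only; there is no bytes.encode().
bytes_has_encode = hasattr(encoded, "encode")
```

Calling encoded.encode("utf-8") there raises AttributeError, which is the kind of message argued for in the quoted text.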

> > bytes.encode CAN only produce bytes.
> I don't understand MAL's design, but I believe in that design,
> bytes.encode could produce anything (say, a list). A codec
> can convert anything to anything else.

That seems to me to be a little overkill...

In any case, I personally find that data.encode('base-64') and
edata.decode('base-64') to be more convenient than binascii.b2a_base64
(data) and binascii.a2b_base64(edata).  Ditto for hexlify/unhexlify, etc.
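
A sketch comparing the two spellings, assuming a current Python 3 where the codec form goes through codecs.encode() and codecs.decode() rather than a method on the object. The sample data is made up.

```python
import binascii
import codecs

data = b"hello world"

# Function spelling, via binascii.
b64_func = binascii.b2a_base64(data)
hex_func = binascii.hexlify(data)

# Codec spelling: the transformation is named by a string.
b64_codec = codecs.encode(data, "base64")
hex_codec = codecs.encode(data, "hex")

round_trip = codecs.decode(b64_codec, "base64")
```

For short inputs like this one the two spellings produce identical bytes; the difference is purely one of notation.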

> > If some users
> > can't understand this (passing different arguments to a function may
> > produce different output),
> It's worse than that. The return *type* depends on the *value* of
> the argument. I think there is little precedent for that: normally,
> the return values depend on the argument values, and, in a polymorphic
> function, the return type might depend on the argument types (e.g.
> the arithmetic operations). Also, the return type may depend on the
> number of arguments (e.g. by requesting a return type in a keyword
> argument).

You only need to look to dictionaries where different values passed into
a function call may very well return results of different types, yet
there have been no restrictions on mapping to and from single types per

Many dict-like interfaces for configuration files do this, things like
config.get('remote_host') and config.get('autoconnect') not being

 - Josiah

From oliphant.travis at  Sat Feb 18 00:40:08 2006
From: oliphant.travis at (Travis Oliphant)
Date: Fri, 17 Feb 2006 16:40:08 -0700
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>
Message-ID: <dt5mso$h0c$>

Martin v. Löwis wrote:
> Just in case you haven't noticed, I just merged
> the ssize_t branch (PEP 353).
> If you have any corrections to the code to make which
> you would consider bug fixes, just go ahead.
> If you are uncertain how specific problems should be resolved,
> feel free to ask.
> If you think certain API changes should be made, please
> discuss them here - they would need to be reflected in the
> PEP as well.

What is PY_SSIZE_T_MAX supposed to be?  The definition in pyport.h 
doesn't compile.

Shouldn't a lot of checks for INT_MAX be replaced with PY_SSIZE_T_MAX? 
Like in the slice indexing code?

Thanks for all your effort on ssize_t fixing.  This is a *big* deal for 
64-bit number crunching with Python.


From martin at  Sat Feb 18 00:52:51 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 00:52:51 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	
Message-ID: <>

Adam Olsen wrote:
> Consider these two pieces of code:
> if key in d:
>   dosomething(d[key])
> else:
>   dosomethingelse()
> try:
>   dosomething(d[key])
> except KeyError:
>   dosomethingelse()
> Before they were the same (assuming dosomething() won't raise
> KeyError).  Now they would behave differently.

I personally think they should continue to do the same thing,
i.e. "in" should return True if there is a default; in the
current proposal, it should invoke the default factory.

But that's beside the point: Where is the real example
where this difference would matter? (I'm not asking for
a realistic example, I'm asking for a real one)
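
A sketch of the difference being argued about, using collections.defaultdict as it exists in today's standard library. That class is an assumption relative to this thread, where the design was still open; in it, "in" does not invoke the factory. The helper names lbyl and eafp are made up for illustration.

```python
from collections import defaultdict


def lbyl(d, key):
    # "Look before you leap": test membership first.
    if key in d:
        return "found"
    return "missing"


def eafp(d, key):
    # Index first and catch KeyError afterwards.
    try:
        d[key]
    except KeyError:
        return "missing"
    return "found"


plain = {}
plain_results = (lbyl(plain, "x"), eafp(plain, "x"))

dd = defaultdict(list)
dd_lbyl = lbyl(dd, "x")   # membership test does not call the factory
dd_eafp = eafp(dd, "x")   # indexing calls the factory and inserts "x"
dd_after = "x" in dd      # the key exists now, as a side effect
```

With a plain dict both styles agree; with a default factory they give different answers for the same key.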


From martin at  Sat Feb 18 00:58:35 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 00:58:35 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>
	<> <>
Message-ID: <>

Ian Bicking wrote:
> That str.encode(unicode_encoding) implicitly decodes strings seems like
> a flaw in the unicode encodings, quite seperate from the existance of
> str.encode.  I for one really like s.encode('zlib').encode('base64') --
> and if the zlib encoding raised an error when it was passed a unicode
> object (instead of implicitly encoding the string with the ascii
> encoding) that would be fine.
> The pipe-like nature of .encode and .decode works very nicely for
> certain transformations, applicable to both unicode and byte objects.
> Let's not throw the baby out with the bath water.

The way you use it, it's a matter of notation only: why
is

  zlib(base64(s))

any worse? I think it's better: it doesn't use string literals to
denote function names.

If there is a point to this overgeneralized codec idea, it is
the streaming aspect: that you don't need to process all data
at once, but can feed data sequentially. Of course, you are
not using this in your example.
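
A rough sketch of the streaming aspect mentioned above, using zlib.compressobj() from the standard library as the incremental interface. The chunk contents are invented for the example.

```python
import zlib

chunks = [b"spam " * 100, b"eggs " * 100, b"ham " * 100]

compressor = zlib.compressobj()
pieces = []
for chunk in chunks:
    # Each call may return some compressed bytes or an empty bytes object.
    pieces.append(compressor.compress(chunk))
pieces.append(compressor.flush())   # emit whatever is still buffered

streamed = b"".join(pieces)
restored = zlib.decompress(streamed)
```

The data never has to be held in one piece on the input side, which a one-shot s.encode('zlib') cannot offer.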


From ianb at  Sat Feb 18 01:00:09 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 18:00:09 -0600
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332
 revival	in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Josiah Carlson wrote:
>>>If some users
>>>can't understand this (passing different arguments to a function may
>>>produce different output),
>>It's worse than that. The return *type* depends on the *value* of
>>the argument. I think there is little precedent for that: normally,
>>the return values depend on the argument values, and, in a polymorphic
>>function, the return type might depend on the argument types (e.g.
>>the arithmetic operations). Also, the return type may depend on the
>>number of arguments (e.g. by requesting a return type in a keyword
> You only need to look to dictionaries where different values passed into
> a function call may very well return results of different types, yet
> there have been no restrictions on mapping to and from single types per
> dictionary.
> Many dict-like interfaces for configuration files do this, things like
> config.get('remote_host') and config.get('autoconnect') not being
> uncommon.

I think there is *some* justification, if you don't understand up front 
that the codec you refer to (using a string) is just a way of avoiding 
an import (thankfully -- dynamically importing unicode codecs is 
obviously infeasible).  Now, if you understand the argument refers to 
some algorithm, it's not so bad.

The other aspect is that there should be something consistent about the 
return types -- the Python type is not what we generally rely on, 
though.  In this case they are all "data".  Unicode and bytes are both 
data, and you could probably argue lists of ints is data too (but an 
arbitrary list definitely isn't data).  On the outer end of data might 
be an ElementTree structure (but that's getting fishy).  An open file 
object is not data.  A tuple probably isn't data.

Ian Bicking  /  ianb at  /

From bokr at  Sat Feb 18 01:02:15 2006
From: bokr at (Bengt Richter)
Date: Sat, 18 Feb 2006 00:02:15 GMT
Subject: [Python-Dev] Serial function call composition syntax foo(x,
	y) -> bar() -> baz(z)
References: <>
Message-ID: <>

Is that a record? ;-)

BTW, does python-dev have different expectations re top-posting?
I've seen more here than on c.l.p I think, so I'm wondering what to do.


When-in-Rome'ly,
Regards,
Bengt Richter

On Fri, 17 Feb 2006 14:17:41 -0800, "Guido van Rossum" <guido at> wrote:

>Cut to the chase: -1000.
>On 2/17/06, Bengt Richter <bokr at> wrote:
>> Cut to the chase: how about being able to write
>>     baz(bar(foo(x, y)),z)
>> serially as
>>     foo(x, y) -> bar() -> baz(z)
>> via the above as sugar for
>>     baz.__get__(bar.__get__(foo(x, y))())(z)
>> ?
>--Guido van Rossum (home page:
>Python-Dev mailing list
>Python-Dev at

From aleaxit at  Sat Feb 18 01:02:05 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 17 Feb 2006 16:02:05 -0800
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dt552b$mnq$>
References: <dsj0p7$tk3$> <>
Message-ID: <>

On 2/17/06, Georg Brandl <g.brandl at> wrote:
> Ian Bicking wrote:
> >> Unfortunately, a @property decorator is impossible...
> >
> > It already works!  But only if you want a read-only property.  Which is
> > actually about 50%+ of the properties I create.  So the status quo is
> > not really that bad.
> I have abused it this way too and felt bad every time.
> Kind of like keeping your hat on in the church. :)

It's not ideal, because the resulting r-o property has no docstring:

>>> class ex(object):
...   @property
...   def amp(self):
...     ''' a nice docstring '''
...     return 23
>>> ex.amp.__doc__
>>> class xe(object):
...   def amp(self): return 23
...   amp=property(amp, doc='whatever!')
>>> xe.amp.__doc__
'whatever!'

Maybe we could fix that by having property(getfunc) use
getfunc.__doc__ as the __doc__ of the resulting property object
(easily overridable in more normal property usage by the doc=
argument, which, I feel, should almost invariably be there).
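
A sketch of the behaviour suggested above, written against a current Python, where property() does pick up the getter's docstring when no doc= argument is given. The class names are illustrative only.

```python
class Ex:
    @property
    def amp(self):
        """a nice docstring"""
        return 23


class Xe:
    def _get_amp(self):
        """getter docstring"""
        return 23

    # An explicit doc= argument takes precedence over the getter's docstring.
    amp = property(_get_amp, doc="whatever!")


decorated_doc = Ex.amp.__doc__
explicit_doc = Xe.amp.__doc__
```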


From guido at  Sat Feb 18 01:03:20 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 16:03:20 -0800
Subject: [Python-Dev] Serial function call composition syntax foo(x,
	y) -> bar() -> baz(z)
In-Reply-To: <>
References: <>
Message-ID: <>

It's only me that's allowed to top-post. :-)

On 2/17/06, Bengt Richter <bokr at> wrote:
> Is that a record? ;-)
> BTW, does python-dev have different expectations re top-posting?
> I've seen more here than on c.l.p I think, so I'm wondering what to do.
> When-in-Rome'ly,
> Regards,
> Bengt Richter
> On Fri, 17 Feb 2006 14:17:41 -0800, "Guido van Rossum" <guido at> wrote:
> >Cut to the chase: -1000.
> >
> >On 2/17/06, Bengt Richter <bokr at> wrote:
> >> Cut to the chase: how about being able to write
> >>
> >>     baz(bar(foo(x, y)),z)
> >>
> >> serially as
> >>
> >>     foo(x, y) -> bar() -> baz(z)
> >>
> >> via the above as sugar for
> >>
> >>     baz.__get__(bar.__get__(foo(x, y))())(z)
> >>
> >> ?
> >
> >--
> >--Guido van Rossum (home page:

--Guido van Rossum (home page:

From ianb at  Sat Feb 18 01:06:13 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 18:06:13 -0600
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Ian Bicking wrote:
>>That str.encode(unicode_encoding) implicitly decodes strings seems like
>>a flaw in the unicode encodings, quite separate from the existence of
>>str.encode.  I for one really like s.encode('zlib').encode('base64') --
>>and if the zlib encoding raised an error when it was passed a unicode
>>object (instead of implicitly encoding the string with the ascii
>>encoding) that would be fine.
>>The pipe-like nature of .encode and .decode works very nicely for
>>certain transformations, applicable to both unicode and byte objects.
>>Let's not throw the baby out with the bath water.
> The way you use it, it's a matter of notation only: why
> is
> zlib(base64(s))
> any worse? I think it's better: it doesn't use string literals to
> denote function names.

Maybe it isn't worse, but the real alternative is:

   import zlib
   import base64
   base64.b64encode(zlib.compress(s))

Encodings cover up eclectic interfaces, where those interfaces fit a 
basic pattern -- data in, data out.

Ian Bicking  /  ianb at  /

From ianb at  Sat Feb 18 01:07:58 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 18:07:58 -0600
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <>
References: <dsj0p7$tk3$>
	<>	<dt552b$mnq$>
Message-ID: <>

Alex Martelli wrote:
> Maybe we could fix that by having property(getfunc) use
> getfunc.__doc__ as the __doc__ of the resulting property object
> (easily overridable in more normal property usage by the doc=
> argument, which, I feel, should almost invariably be there).

+1


Ian Bicking  /  ianb at  /

From martin at  Sat Feb 18 01:12:21 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 01:12:21 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Ian Bicking wrote:
> It is lazily incarnated for multidict, because there is no *noticeable*
> side effect -- if there is any internal side effects that is an
> implementation detail.  However for default_factory=list, the result of
> .keys(), .has_key(), and .items() changes when you do d[some_key].

That's why I think has_key and in should return True for any key.
This leaves keys(), items(), and values(). From a pure point of
view, they should return infinite sets. Practicality beats purity,
so yes, d[k] could be considered a modifying operation.

If you look carefully, you find that many access operations also
have side effects. For example, .read() on a file not only returns
some data, but also advances the file position. Queue.get not
only returns the next item, but also removes it from the queue.
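
The two examples above can be sketched as follows. The sketch assumes Python 3, where the Queue module mentioned in the text is spelled queue, and uses an in-memory io.BytesIO in place of a real file.

```python
import io
import queue

stream = io.BytesIO(b"abcdef")
first = stream.read(3)     # returns data and advances the position
position = stream.tell()
second = stream.read(3)    # the same call now returns different data

q = queue.Queue()
q.put("job")
item = q.get()             # returns the item and removes it
remaining = q.qsize()
```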


From martin at  Sat Feb 18 01:20:15 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 01:20:15 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

Ian Bicking wrote:
> Maybe it isn't worse, but the real alternative is:
>   import zlib
>   import base64
>   base64.b64encode(zlib.compress(s))
> Encodings cover up eclectic interfaces, where those interfaces fit a
> basic pattern -- data in, data out.

So should I write

  3.1415.encode("sin")

or would that be

  3.1415.decode("sin")

What about

  "".decode("URL")

It's "data in, data out", after all. Who needs functions?


From guido at  Sat Feb 18 01:20:46 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 16:20:46 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/17/06, "Martin v. Löwis" <martin at> wrote:
> That's why I think has_key and in should return True for any key.
> This leaves keys(), items(), and values(). From a pure point of
> view, they should return infinite sets. Practicality beats purity,
> so yes, d[k] could be considered a modifying operation.

I think practicality beats purity even for has_key/in; IMO these
operations are more useful when they match keys() instead of always
returning True. But someone should start writing some code to play
with this.

I have a working patch (including a hack for PyDict_GetItem()):

So there's no excuse to be practical now.

--Guido van Rossum (home page:

From guido at  Sat Feb 18 01:21:54 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 16:21:54 -0800
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <>
References: <dsj0p7$tk3$> <>
Message-ID: <>

WFM. Patch anyone?

On 2/17/06, Ian Bicking <ianb at> wrote:
> Alex Martelli wrote:
> > Maybe we could fix that by having property(getfunc) use
> > getfunc.__doc__ as the __doc__ of the resulting property object
> > (easily overridable in more normal property usage by the doc=
> > argument, which, I feel, should almost invariably be there).
> +1

--Guido van Rossum (home page:

From ianb at  Sat Feb 18 01:32:23 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 18:32:23 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>		<>		<>		<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> Adam Olsen wrote:
>>Consider these two pieces of code:
>>if key in d:
>>  dosomething(d[key])
>>else:
>>  dosomethingelse()
>>try:
>>  dosomething(d[key])
>>except KeyError:
>>  dosomethingelse()
>>Before they were the same (assuming dosomething() won't raise
>>KeyError).  Now they would behave differently.
> I personally think they should continue to do the same thing,
> i.e. "in" should return True if there is a default; in the
> current proposal, it should invoke the default factory.

As I believe Fredrik implied, this would break the symmetry between "x 
in d" and "x in d.keys()" (unless d.keys() enumerates all possible 
keys), and either .get() would become useless, or it would also act in 
inconsistent ways.  I think these broken expectations are much worse 
than what Adam's talking about.

> But that's beside the point: Where is the real example
> where this difference would matter? (I'm not asking for
> a realistic example, I'm asking for a real one)

Well, here's a kind of an example: WSGI specifies that the environment 
must be a dictionary, and nothing but a dictionary.  I think it would 
have to be updated to say that it must be a dictionary with 
default_factory not set, as default_factory would break the 
predictability that was the reason WSGI specified exactly a dictionary 
(and not a dictionary-like object or subclass).  So there's something 
that becomes brokenish.

I think this is the primary kind of breakage -- dictionaries with 
default_factory set are not acceptable objects when a "plain" dictionary 
is expected.  Of course, it can always be claimed that it's the fault of 
the person who passes in such a dictionary (they could have passed in 
None and it would probably also be broken).  But now all of a sudden I 
have to say "x(a) takes a dictionary argument.  Oh, and don't you dare 
use the default_factory feature!" where before I could just say 
"dictionary".  And KeyError just... disappears.  KeyError is one of 
those errors that you *expect* to happen (maybe the "Error" part is a 
misnomer); having it disappear is a major change.

Also, I believe there are two ways to handle thread safety, both of which 
are broken:

1) d[key] gets the GIL, and thus while default_factory is being called 
the GIL is locked

2) d[key] doesn't get the GIL and so d[key].append(1) may not actually 
lead to 1 being in d[key] if another thread is appending something to 
the same key at the same time, and the key is not yet present in d.

Admittedly I don't understand the ins and outs of the GIL, so the first 
case might not actually need to acquire the GIL.
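
Without settling what the interpreter itself guarantees, one conservative sketch is to guard the lookup-and-append with an explicit application-level lock. The lock, the helper append_many and the thread counts are all hypothetical, and collections.defaultdict stands in for the proposed feature.

```python
import threading
from collections import defaultdict

d = defaultdict(list)
lock = threading.Lock()   # hypothetical application-level lock


def append_many(key, count):
    for _ in range(count):
        # Creating the missing entry and appending to it happen as one
        # step from the point of view of other threads using the lock.
        with lock:
            d[key].append(1)


threads = [
    threading.Thread(target=append_many, args=("k", 1000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = len(d["k"])
```

This trades some speed for not having to reason about which of the two cases above applies.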

Ian Bicking  /  ianb at  /

From ianb at  Sat Feb 18 01:44:56 2006
From: ianb at (Ian Bicking)
Date: Fri, 17 Feb 2006 18:44:56 -0600
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
>>Maybe it isn't worse, but the real alternative is:
>>  import zlib
>>  import base64
>>  base64.b64encode(zlib.compress(s))
>>Encodings cover up eclectic interfaces, where those interfaces fit a
>>basic pattern -- data in, data out.
> So should I write
> 3.1415.encode("sin")
> or would that be
> 3.1415.decode("sin")

The ambiguity shows that "sin" is clearly not an encoding.  Doesn't read 
right anyway.

[0.3, 0.35, ...].encode('fourier') would be sensible though.  Except of 
course lists don't have an encode method; but that's just a convenience 
of strings and unicode because those objects are always data, where 
lists are only sometimes data.  If extended indefinitely, the namespace 
issue is notable.  But it's not going to be extended indefinitely, so 
that's just a theoretical problem.

> What about
> "".decode("URL")

you mean 'a%20b'.decode('url') == 'a b'?  That's not what you meant, but 
nevertheless that would be an excellent encoding ;)
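
For reference, a sketch of the function spelling of that transformation in a current Python 3, using urllib.parse from the standard library. This is an assumption about today's library layout, not something available under that name when this was written.

```python
from urllib.parse import quote, unquote

decoded = unquote("a%20b")   # percent-escapes -> plain text
encoded = quote("a b")       # plain text -> percent-escapes
```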

Ian Bicking  /  ianb at  /

From thomas at  Sat Feb 18 01:51:49 2006
From: thomas at (Thomas Wouters)
Date: Sat, 18 Feb 2006 01:51:49 +0100
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <dt5mso$h0c$>
References: <> <dt5mso$h0c$>
Message-ID: <>

On Fri, Feb 17, 2006 at 04:40:08PM -0700, Travis Oliphant wrote:

> What is PY_SSIZE_T_MAX supposed to be?  The definition in pyport.h 
> doesn't compile.

Why not? Does it give an error for your particular platform? What platform
is that? What are HAVE_SSIZE_T, SIZEOF_VOID_P and SIZEOF_SIZE_T defined to

While looking at the piece of code in Include/pyport.h I do notice that the
fallback (when ssize_t is not available) is to Py_uintptr_t... Which is an
unsigned type, while ssize_t is supposed to be signed. Martin, is that on
purpose? I don't have any systems that lack ssize_t. ;P

That should prevent the PY_SSIZE_T_MAX definition from compiling though.

> Shouldn't a lot of checks for INT_MAX be replaced with PY_SSIZE_T_MAX? 
> Like in the slice indexing code?

Yes, ideally. (Actually, I think slice indexing was changed earlier today.)
But while changing it would have little to no effect on 32-bit machines,
doing it the wrong way may break the code on 64-bit machines in subtle ways,
so it's not all done blindly, or in one big shot. Also, because some output
parameters to PyArg_Parse* change size (s#/t#), code will have to be
reviewed to make use of the full address range anyway. (There's some
preprocessor hackery that checks for PY_SIZE_T_CLEAN to see if it's safe to
use the large output versions.)

Adapting all code in the right way isn't finished yet (not in the last place
because some of the code is... how shall I put it... 'creative'.)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From bob at  Sat Feb 18 02:12:17 2006
From: bob at (Bob Ippolito)
Date: Fri, 17 Feb 2006 17:12:17 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
	<> <>
Message-ID: <>

On Feb 17, 2006, at 4:20 PM, Martin v. Löwis wrote:

> Ian Bicking wrote:
>> Maybe it isn't worse, but the real alternative is:
>>   import zlib
>>   import base64
>>   base64.b64encode(zlib.compress(s))
>> Encodings cover up eclectic interfaces, where those interfaces fit a
>> basic pattern -- data in, data out.
> So should I write
> 3.1415.encode("sin")
> or would that be
> 3.1415.decode("sin")
> What about
> "".decode("URL")
> It's "data in, data out", after all. Who needs functions?

Well, 3.1415.decode("sin") is of course NaN, because 3.1415.encode 
("sinh") is not defined for numbers outside of [-1, 1] :)


From rhamph at  Sat Feb 18 02:29:34 2006
From: rhamph at (Adam Olsen)
Date: Fri, 17 Feb 2006 18:29:34 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, "Martin v. Löwis" <martin at> wrote:
> Adam Olsen wrote:
> > You could pass a float in as well.  But if the function is documented
> > as taking a dict, and the programmer expects a dict.. that now has to
> > be changed to "dict without a default".  Or they have to code
> > defensively since d[key] may or may not raise KeyError, so they must
> > avoid depending on it either way.
> Can you give an example of real, existing code that will break
> if a such a dict is passed?

I only got halfway through the "grep KeyError" results, but..

Demo/tkinter/guido/  # Subclasses override self.classes
Lib/  # Currently uses UserDict but I assume it will
switch to dict eventually

And the pièce de résistance..

Doc/tools/

It has this:
    try:
        info = rcdict[s]
    except KeyError:
        sys.stderr.write("No refcount data for %s\n" % s)
rcdict is loaded from refcounts.load().  refcounts.load() calls
refcounts.loadfile(), which has this (inside a loop):
    try:
        entry = d[function]
    except KeyError:
        entry = d[function] = Entry(function)
A prime candidate for a default.

Perhaps the KeyError shouldn't ever get triggered in this case, I'm
not sure.  I think that's beside the point though.  The programmer
clearly expected it would.

Adam Olsen, aka Rhamphoryncus

From oliphant.travis at  Sat Feb 18 02:37:32 2006
From: oliphant.travis at (Travis Oliphant)
Date: Fri, 17 Feb 2006 18:37:32 -0700
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <> <dt5mso$h0c$>
Message-ID: <dt5tot$437$>

Thomas Wouters wrote:
> On Fri, Feb 17, 2006 at 04:40:08PM -0700, Travis Oliphant wrote:
>>What is PY_SSIZE_T_MAX supposed to be?  The definition in pyport.h 
>>doesn't compile.

Maybe I have the wrong version of code.  In my pyport.h (checked out 
from svn trunk) I have.

#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))

What is size_t?  Is this supposed to be sizeof(size_t)?

I get a syntax error when I actually use PY_SSIZE_T_MAX somewhere in the
code.

> While looking at the piece of code in Include/pyport.h I do notice that the
> fallback (when ssize_t is not available) is to Py_uintptr_t... Which is an
> unsigned type, while ssize_t is supposed to be signed. Martin, is that on
> purpose? I don't have any systems that lack ssize_t. ;P

I saw the same thing and figured it was an error.

> Adapting all code in the right way isn't finished yet (not in the last place
> because some of the code is... how shall I put it... 'creative'.)

I'm just trying to adapt my __index__ patch to use ssize_t.   I realize 
this was a big change and will take some "adjusting."  I can help with 
that if needed as I do have some experience here.  I just want to make 
sure I fully understand what issues Martin and others are concerned about.


From greg.ewing at  Sat Feb 18 04:13:45 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 18 Feb 2006 16:13:45 +1300
Subject: [Python-Dev] str object going in Py3K
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:

>>Another thought -- what is going to happen to
>>Will it change to return bytes, or will there be a new
> Nit-pickingly: will continue to return integers.

Sorry, what I meant was will return bytes.


From ncoghlan at  Sat Feb 18 04:34:35 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 13:34:35 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Adam Olsen wrote:
> And the pièce de résistance..
> Doc/tools/
> It has this:
>     try:
>         info = rcdict[s]
>     except KeyError:
>         sys.stderr.write("No refcount data for %s\n" % s)
>     else:
>         ...
> rcdict is loaded from refcounts.load().  refcounts.load() calls
> refcounts.loadfile(), which has this (inside a loop):
>     try:
>         entry = d[function]
>     except KeyError:
>         entry = d[function] = Entry(function)
> A prime candidate for a default.
> Perhaps the KeyError shouldn't ever get triggered in this case, I'm
> not sure.  I think that's beside the point though.  The programmer
> clearly expected it would.

Assuming the following override:

   class EntryDict(dict):
       def on_missing(self, key):
           value = Entry(key)
           self[key] = value
           return value

Then what it means is that the behaviour of "missing functions get an empty 
refcount entry" propagates to the rcdict code.

So the consequence is that the code in anno-api will never print an error 
message - all functions are deemed to have associated refcount data in 

But that would be a bug in refcounts.loadfile: if it returns an EntryDict 
instead of a normal dict it is, in effect, returning an *infinite* dictionary 
that contains refcount definitions for every possible function name (some of 
them are just populated on demand).

So *if* refcounts.loadfile was converted to use an EntryDict, it would need to 
return dict(d) instead of returning d directly.
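
A sketch of that conversion, with two assumptions labelled up front: the hook is spelled __missing__, which is how current Python names what is called on_missing here, and Entry and loadfile are simplified stand-ins rather than the real refcounts code.

```python
class Entry:
    """Stand-in for the refcount entry type; hypothetical for this sketch."""

    def __init__(self, name):
        self.name = name


class EntryDict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = Entry(key)
        self[key] = value
        return value


def loadfile(names):
    d = EntryDict()
    for name in names:
        d[name]          # populate on demand while loading
    return dict(d)       # hand back a plain dict with no default


rcdict = loadfile(["PyList_New", "PyDict_New"])

try:
    rcdict["no_such_function"]
    reported_missing = False
except KeyError:
    reported_missing = True
```

Because the caller receives a plain dict, its existing try/except KeyError still reports functions that have no refcount data.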

And this is where the question of whether has_key/__contains__ return True or 
False when default_factory is set is important. If they return False, then the 
LBYL (if key in d:) and EAFP (try/except) approaches give *different answers*.

More importantly, LBYL will never have side effects, whereas EAFP may.

If the methods always return True (as Martin suggests), then we retain the 
current behaviour where there is no real difference between the two 
approaches. Given the amount of time spent in recent years explaining this 
fact, I don't think it is an equivalence that should be broken lightly (IOW, 
I've persuaded myself that I agree with Martin)

The alternative would be to have an additional query API "will_default" that 
reflects whether or not a given key is actually present in the dictionary ("if 
key not in d.keys()" would serve a similar purpose, but requires building the 
list of keys).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tim.peters at  Sat Feb 18 04:37:21 2006
From: tim.peters at (Tim Peters)
Date: Fri, 17 Feb 2006 22:37:21 -0500
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <dt5tot$437$>
References: <> <dt5mso$h0c$>
	<> <dt5tot$437$>
Message-ID: <>

[Travis Oliphant]
> Maybe I have the wrong version of code.  In my pyport.h (checked out
> from svn trunk) I have.
> #define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))
> What is size_t?

size_t is an unsigned integral type defined by, required by, and used
all over the place in standard C.  What exactly is the compiler
message you get, and exactly which compiler are you using (note that
nobody else is having problems with this, so there's something unique
in your setup)?

>  Is this supposed to be sizeof(size_t)?

No.  (size_t)-1 casts -1 to the unsigned integral type size_t, which
creates a "solid string of 1 bits" with the width of the size_t type. 
">> 1" then shifts that right one bit, clearing the sign bit but
leaving the rest of the integer "all 1s".  Then that's cast to type
Py_ssize_t, which is a signed integral type with the same width as the
standard size_t.  In the end, you get the largest positive signed
integer with the same width as size_t, and that's the intent.
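
The same bit arithmetic can be traced in Python as a rough check. The sketch assumes CPython, where sys.maxsize reports the largest Py_ssize_t value, and uses the struct format code "N" to obtain the platform's size_t width.

```python
import struct
import sys

bits = 8 * struct.calcsize("N")   # width of size_t on this platform

all_ones = (1 << bits) - 1        # what (size_t)-1 produces: every bit set
shifted = all_ones >> 1           # clear the top bit, keep the rest
largest_signed = shifted          # fits in a signed type of the same width
```

On a 64-bit build this yields 2**63 - 1, and on a 32-bit build 2**31 - 1.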

> I get a syntax error when I actually use PY_SSIZE_T_MAX somewhere in the
> code.

Nobody else does (PY_SSIZE_T_MAX is referenced in a few places
already), so you need to give more information.

Is it simply that you neglected to include Python.h in some extension
module?  The definition of size_t must be (according to the C
standard) supplied by stdlib.h, and Python.h includes stdlib.h.

It's also possible that some core Python C code doesn't #include enough
stuff to get the platform's size_t definition.

From greg.ewing at  Sat Feb 18 04:27:13 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 18 Feb 2006 16:27:13 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:
>>>>>>"Guido" == Guido van Rossum <guido at> writes:

>     Guido> - b = bytes(t, enc); t = text(b, enc)
> +1  The coding conversion operation has always felt like a constructor
> to me, and in this particular usage that's exactly what it is.  I
> prefer the nomenclature to reflect that.

This also has the advantage that it completely
avoids using the verbs "encode" and "decode"
and the attendant confusion about which direction
they go in.


   s = text(b, "base64")

makes it obvious that you're going from the
binary side to the text side of the base64


From ncoghlan at  Sat Feb 18 04:43:42 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 13:43:42 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Guido van Rossum wrote:
> But there were
> several suggestions that this would be fine functionality to add to
> the standard dict type -- and then I really don't see any other way to
> do this.

Given the constructor problem, and the issue with "this function expects a 
plain dictionary", I think your original instinct to use a subclass may have 
been correct.

The constructor is much cleaner that way:

# bag like behavior
dd = collections.autodict(int)
for elem in collection:
      dd[elem] += 1

# setdefault-like behavior
dd = collections.autodict(list)
for page_number, page in enumerate(book):
      for word in page.split():
            dd[word].append(page_number)

And it can be a simple fact that for an autodict, "if key in d" and "d[key]" 
may give different answers.

Much cleaner than making the semantics of normal dicts dependent on:
  a. whether or not on_missing has been overridden
  b. whether or not default_factory has been set
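
For a runnable version of the two patterns, here is a minimal sketch
using collections.defaultdict, the subclass that shipped in Python 2.5,
in place of the autodict name used above. The sample data is invented.

```python
from collections import defaultdict

# bag-like behaviour: missing keys start at int() == 0
counts = defaultdict(int)
for elem in "abracadabra":
    counts[elem] += 1

# setdefault-like behaviour: missing keys start at list() == []
book = ["the cat sat", "the dog sat"]
index = defaultdict(list)
for page_number, page in enumerate(book):
    for word in page.split():
        index[word].append(page_number)
```

Both objects are still dict instances, so code that only reads existing
keys does not need to know about the factory.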


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From pje at  Sat Feb 18 04:51:13 2006
From: pje at (Phillip J. Eby)
Date: Fri, 17 Feb 2006 22:51:13 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <
References: <>
Message-ID: <>

At 11:58 AM 02/17/2006 -0800, Guido van Rossum wrote:
>I forgot to mention in my revised proposal that the API for setting
>the default_factory is slightly odd:
>   d = {}   # or dict()
>   d.default_factory = list
>rather than
>   d = dict(default_factory=list)
>This is of course because we cut off that way when we defined what
>arbitrary keyword arguments to the dict constructor would do. My
>original proposal solved this by creating a subclass. But there were
>several suggestions that this would be fine functionality to add to
>the standard dict type -- and then I really don't see any other way to
>do this. (Yes, I could have a set_default_factory() method -- but a
>simple settable attribute seems more pythonic!)

Why not a classmethod constructor:

  d = dict.with_factory(list)

Admittedly, the name's not that great.  Actually, it's almost as bad as 
setdefault in some ways.  But I'd rather set the default and create the 
dictionary in one operation, since when reading it as two, you first think 
'd is a dictionary', and then 'oh, but it has a default factory', as 
opposed to "d is a dict with a factory" in one thought.  But maybe that's 
just me.  :)
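
A rough sketch of how such a classmethod constructor could look on a
subclass. FactoryDict is a hypothetical name, and relying on the
__missing__ hook is an assumption of this example, not something
settled in the thread.

```python
class FactoryDict(dict):
    """Hypothetical dict subclass; not the API proposed in the thread."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = None

    @classmethod
    def with_factory(cls, factory, *args, **kwargs):
        # Create the dict and set its factory in a single expression.
        d = cls(*args, **kwargs)
        d.default_factory = factory
        return d

    def __missing__(self, key):
        # dict.__getitem__ calls this hook on subclasses when key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


d = FactoryDict.with_factory(list)
d["spam"].append(1)
```

The reader sees "a dict with a factory" in one expression, which is the
point being argued for.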

From guido at  Sat Feb 18 05:14:28 2006
From: guido at (Guido van Rossum)
Date: Fri, 17 Feb 2006 20:14:28 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Nick Coghlan <ncoghlan at> wrote:
> And this is where the question of whether has_key/__contains__ return True or
> False when default_factory is set is important. If they return False, then the
> LBYL (if key in d:) and EAFP (try/except) approaches give *different answers*.
> More importantly, LBYL will never have side effects, whereas EAFP may.
> If the methods always return True (as Martin suggests), then we retain the
> current behaviour where there is no real difference between the two
> approaches. Given the amount of time spent in recent years explaining this
> fact, I don't think it is an equivalence that should be broken lightly (IOW,
> I've persuaded myself that I agree with Martin)
> The alternative would be to have an additional query API "will_default" that
> reflects whether or not a given key is actually present in the dictionary ("if
> key not in d.keys()" would serve a similar purpose, but requires building the
> list of keys).

Looking at it from the "which invariants hold" POV isn't always the
right perspective.

Reality is that some amount of code that takes a dict won't work if
you give it a dict with a default_factory. Well, that's nothing new.
Some code also breaks if you pass it a dict containing key or value
types it doesn't expect, or if you pass it an anydbm instance, or
os.environ on Windows (which implements case-insensitive keys).

From the POV of someone who decides to use a dict with a
default_factory (or overriding on_missing()), having the 'in' operator
always return True is d*mn annoying -- it means that any kind of
introspection of the dict doesn't work. Take for example the multiset
use case. Suppose you're aware that you're using a dict with this
special behavior. Now you've built up your multiset and now you want
to use it. Part of your app is interested in knowing the list of
values associated with each key. But another part may be interested
only in whether a particular key has *any* values associated. If "key
in d" returns whether that key is currently present, you can write

  if key in d:
      print "whatever"

But under Martin and your proposed semantics, you'd have to write

  if d.get(key):
      print "whatever"

or (worse)

  if d[key]: # inserts an empty list into the dict!
      print "whatever"

I'd much rather be able to write "if key in d" and get the result I want...
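
The difference between the three spellings can be checked with
collections.defaultdict as it later shipped, where "in" reports only
keys that are actually present. A small sketch with invented data:

```python
from collections import defaultdict

multiset = defaultdict(list)
multiset["a"].append(1)

# Membership and get() only inspect the dict; nothing is inserted.
present = "b" in multiset
fetched = multiset.get("b")
keys_before = sorted(multiset)

# Subscripting a missing key calls the factory and stores the result.
if multiset["b"]:
    print("whatever")
keys_after = sorted(multiset)
```

After the last test the dict has grown an empty list under "b", which
is the side effect the "(worse)" spelling above warns about.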

--Guido van Rossum (home page:

From oliphant.travis at  Sat Feb 18 05:17:00 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Fri, 17 Feb 2006 21:17:00 -0700
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>
	<dt5mso$h0c$>	<>
Message-ID: <dt673u$phm$>

Tim Peters wrote:
> [Travis Oliphant]
>>Maybe I have the wrong version of code.  In my pyport.h (checked out
>>from svn trunk) I have.
>>#define PY_SSIZE_T_MAX ((Py_ssize_t)(((size_t)-1)>>1))
>>What is size_t?
> size_t is an unsigned integral type defined by, required by, and used
> all over the place in standard C.  What exactly is the compiler
> message you get, and exactly which compiler are you using (note that
> nobody else is having problems with this, so there's something unique
> in your setup)?

I'm very sorry for my silliness.  I do see the problem I was having now. 
   Thank you for helping me out.  I was assuming that PY_SSIZE_T_MAX 
could be used in a  pre-processor statement like LONG_MAX and INT_MAX.

In other words


This was giving me errors and I tried to understand the #define 
statement as an arithmetic operation (not a type-casting one).  I did 
know about size_t but thought it strange that 1 was being subtracted 
from it.

I would have written this as (size_t)(-1) to avoid that confusion.  I do 
apologize for my error.  Thank you for taking the time to explain it.

I still think that PY_SSIZE_T_MAX ought to be usable in a pre-processor 
statement, but it's a nit.



> No.  (size_t)-1 casts -1 to the unsigned integral type size_t,

That's what I was missing: I saw this as subtraction, not type-casting.
My mistake.


From jcarlson at  Sat Feb 18 05:33:16 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 17 Feb 2006 20:33:16 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Stephen J. Turnbull wrote:
> >>>>>>"Guido" == Guido van Rossum <guido at> writes:
> >     Guido> - b = bytes(t, enc); t = text(b, enc)
> > 
> > +1  The coding conversion operation has always felt like a constructor
> > to me, and in this particular usage that's exactly what it is.  I
> > prefer the nomenclature to reflect that.
> This also has the advantage that it completely
> avoids using the verbs "encode" and "decode"
> and the attendant confusion about which direction
> they go in.
> e.g.
>    s = text(b, "base64")
> makes it obvious that you're going from the
> binary side to the text side of the base64
> conversion.

But you aren't always getting *unicode* text from the decoding of bytes,
and you may be encoding bytes *to* bytes:

    b2 = bytes(b, "base64")
    b3 = bytes(b2, "base64")

Which direction are we going again?
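
To make the ambiguity concrete, here is a small sketch using the
existing base64 module rather than the proposed bytes(b, "base64")
spelling, which is not an existing API.

```python
import base64

raw = b"\x00\xffspam"
encoded = base64.b64encode(raw)      # bytes in, bytes out
decoded = base64.b64decode(encoded)  # bytes in, bytes out

# Applying the "encode" direction twice is legal and silently
# produces doubly encoded data, which is the ambiguity at issue.
twice = base64.b64encode(encoded)
```

Nothing in the types distinguishes raw, encoded and twice; only the
function name records which way the data went.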

 - Josiah

From bokr at  Sat Feb 18 05:41:07 2006
From: bokr at (Bengt Richter)
Date: Sat, 18 Feb 2006 04:41:07 GMT
Subject: [Python-Dev] Proposal: defaultdict
References: <>	
Message-ID: <>

On Sat, 18 Feb 2006 00:52:51 +0100, =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= <martin at> wrote:

>Adam Olsen wrote:
>> Consider these two pieces of code:
>> if key in d:
>>   dosomething(d[key])
>> else:
>>   dosomethingelse()
>> try:
>>   dosomething(d[key])
>> except KeyError:
>>   dosomethingelse()
>> Before they were the same (assuming dosomething() won't raise
>> KeyError).  Now they would behave differently.
>I personally think they should continue to do the same thing,
>i.e. "in" should return True if there is a default; in the
>current proposal, it should invoke the default factory.
>But that's beside the point: Where is the real example
>where this difference would matter? (I'm not asking for
>a realistic example, I'm asking for a real one)
My guess is that realistically default_factory will be used
to make clean code for filling a dict, and then turning the factory
off if it's to be passed into unknown contexts. Those contexts
can then use old code to do as above, or if worth it can
temporarily set a factory to do some work. Tightly coupled
code I guess could pass factory-enabled dicts between each other.

IOW, no code should break unless you pass a factory-enabled dict
where you shouldn't ;-)
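
With collections.defaultdict as it later shipped, this "fill, then
switch the factory off" pattern can be sketched as follows. The data is
invented, and setting default_factory to None is the documented way to
get plain KeyError behaviour back.

```python
from collections import defaultdict

groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# Turn defaulting off before handing the dict to unknown code.
groups.default_factory = None

try:
    groups["c"]
    missing_raises = False
except KeyError:
    missing_raises = True
```

From here on the object behaves like an ordinary dict for lookups, so
old LBYL and try/except code see the same thing.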

That said, maybe enabling/disabling could be separate from d.default_factory
(e.g., d.defaults_enabled) as that could allow e.g. foo(**kw) more options
in how to copy kw and what foo could do. Would total copy including defaulting state
be best? What other copies must be sanitized? type('Foo',(), **{'this':'one?'})

It will be interesting to see what comes out of the woodwork ;-)

Bengt Richter

From bob at  Sat Feb 18 06:10:04 2006
From: bob at (Bob Ippolito)
Date: Fri, 17 Feb 2006 21:10:04 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:

> Greg Ewing <greg.ewing at> wrote:
>> Stephen J. Turnbull wrote:
>>>>>>>> "Guido" == Guido van Rossum <guido at> writes:
>>>     Guido> - b = bytes(t, enc); t = text(b, enc)
>>> +1  The coding conversion operation has always felt like a  
>>> constructor
>>> to me, and in this particular usage that's exactly what it is.  I
>>> prefer the nomenclature to reflect that.
>> This also has the advantage that it completely
>> avoids using the verbs "encode" and "decode"
>> and the attendant confusion about which direction
>> they go in.
>> e.g.
>>    s = text(b, "base64")
>> makes it obvious that you're going from the
>> binary side to the text side of the base64
>> conversion.
> But you aren't always getting *unicode* text from the decoding of  
> bytes,
> and you may be encoding bytes *to* bytes:
>     b2 = bytes(b, "base64")
>     b3 = bytes(b2, "base64")
> Which direction are we going again?

This is *exactly* why the current set of codecs are INSANE.   
unicode.encode and str.decode should be used *only* for unicode  
codecs.  Byte transforms are entirely different semantically and  
should be some other method pair.


From aahz at  Sat Feb 18 06:13:44 2006
From: aahz at (Aahz)
Date: Fri, 17 Feb 2006 21:13:44 -0800
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Fri, Feb 17, 2006, "Martin v. Löwis" wrote:
> Josiah Carlson wrote:
>> How are users confused?
> Users do
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.

The problem is that they don't understand that "Martin v. Löwis" is not
Unicode -- once all strings are Unicode, this is guaranteed to work.
While it's not absolutely true, my experience of watching Unicode
confusion is that the simplest approach for newbies is: encode FROM
Unicode, decode TO Unicode.  Most people when they start playing with
Unicode think of it as just another text encoding rather than suddenly
replacing "the universe" as the most base form of text.
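
The rule of thumb can be sketched as below, assuming a Python where all
string literals are already Unicode text (the situation described above
as "once all strings are Unicode").

```python
name = "Martin v. L\u00f6wis"       # text (Unicode); \u00f6 is o-umlaut

as_utf8 = name.encode("utf-8")      # encode FROM Unicode: text -> bytes
as_latin1 = name.encode("latin-1")  # same text, different byte encoding

round_trip = as_utf8.decode("utf-8")  # decode TO Unicode: bytes -> text
```

In the Latin-1 bytes the umlaut is the single byte 0xf6 at position 11,
the same byte and position the traceback quoted above complains about.
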
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From murman at  Sat Feb 18 06:38:52 2006
From: murman at (Michael Urman)
Date: Fri, 17 Feb 2006 23:38:52 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Adam Olsen <rhamph at> wrote:
> if key in d:
>   dosomething(d[key])
> else:
>   dosomethingelse()
> try:
>   dosomething(d[key])
> except KeyError:
>   dosomethingelse()

I agree with the gut feeling that these should still do the same
thing. Could we modify d.get() instead?

>>> class ddict(dict):
...     default_value_factory = None
...     def get(self, k, d=None):
...         v = super(ddict, self).get(k, d)
...         if (v is not None or d is not None or
...                 self.default_value_factory is None):
...             return v
...         return self.setdefault(k, self.default_value_factory())
...
>>> d = ddict()
>>> d.default_value_factory = list
>>> d.get('list', [])
[]
>>> d['list']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'list'
>>> d.get('list').append(5)
>>> d['list']
[5]

There was never an exception raised by d.get so this wouldn't change
(assuming the C is implemented more carefully than the python above).
What are the problems with this other than, like setdefault, it only
works on values with mutator methods (i.e., no counting dicts)? Is the
lack of counting dicts that d.__getitem__ supports a deal breaker?

>>> d.default_value_factory = int
>>> d.get('count') += 1
SyntaxError: can't assign to function call

How does the above either in dict or a subclass compare to five line
or smaller custom subclasses using something like the following?
    def append(self, k, val):
        self.setdefault(k, []).append(val)
    def accumulate(self, k, val):
        try: self[k] += val
        except KeyError: self[k] = val
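
Put together as a complete class, the two helpers might look like the
sketch below. MultiDict is a hypothetical name chosen for the example.

```python
class MultiDict(dict):
    """Hypothetical name; wraps the two helper methods shown above."""

    def append(self, k, val):
        # Collect values per key in a list.
        self.setdefault(k, []).append(val)

    def accumulate(self, k, val):
        # Add to a running total, starting from val itself.
        try:
            self[k] += val
        except KeyError:
            self[k] = val


d = MultiDict()
d.append("pages", 3)
d.append("pages", 7)
d.accumulate("count", 1)
d.accumulate("count", 1)
```

Plain lookups on such a dict keep raising KeyError, so "if key in d"
and try/except stay equivalent.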

Michael Urman

From talin at  Sat Feb 18 07:47:44 2006
From: talin at (Talin)
Date: Fri, 17 Feb 2006 22:47:44 -0800
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>
Message-ID: <>

All right, the patch is up on SF. Sorry for the delay, I accidentally 
left my powerbook about an hour's drive away from home, and had to drive 
to go get it this morning :)

To those who were asking what advantage the new syntax has - well, from 
a technical perspective there are none, since the underlying 
implementation is identical. The only (minor) difference is in the 
syntactical ambiguity, which both forms have - with lambda you can't be 
certain when to stop parsing the result expression, whereas with 'given' 
you can't be certain when to stop parsing the argument list.

I see the primary advantage of the inline syntax as pedagogical - given 
a choice, I would rather explain the "given" syntax to a novice 
programmer than try to explain lambda. This is especially true given the 
similarity in form to generator expressions - in other words, once 
you've gone through the effort of explaining generator expressions, you 
can re-use most of that explanation when explaining "function 
expressions"; whereas with lambda, which looks like nothing else in 
Python, you have to start from scratch.

-- Talin

From nnorwitz at  Sat Feb 18 07:53:19 2006
From: nnorwitz at (Neal Norwitz)
Date: Fri, 17 Feb 2006 22:53:19 -0800
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Armin Rigo <arigo at> wrote:
> Hi,
> On Tue, Feb 14, 2006 at 09:24:57PM -0800, Neal Norwitz wrote:
> >
> There is at least one SF bug, namely "#1333982 Bugs of the new AST
> compiler", that in my humble opinion absolutely needs to be fixed before
> the release, even though I won't hide that I have no intention of fixing
> it myself.  Should I raise the issue here in python-dev, and see if we
> agree that it is critical?

I agree it's critical.

> (Sorry if I should know about the procedure.  Does it then go in the
> PEP's Planned Features list?)

I don't think it belongs in the PEP.  I bumped the priority to 7 which
is the standard protocol, though I don't know that it's really
followed.  I will enumerate the existing problems for Jeremy in the
bug report.

In the future,  I would also prefer separate bug reports.  Feel free
to assign new bugs to Jeremy too. :-)


From nnorwitz at  Sat Feb 18 08:01:45 2006
From: nnorwitz at (Neal Norwitz)
Date: Fri, 17 Feb 2006 23:01:45 -0800
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <dt673u$phm$>
References: <> <dt5mso$h0c$>
	<> <dt5tot$437$>
Message-ID: <>

On 2/17/06, Travis E. Oliphant <oliphant.travis at> wrote:
> I'm very sorry for my silliness.  I do see the problem I was having now.
>    Thank you for helping me out.  I was assuming that PY_SSIZE_T_MAX
> could be used in a  pre-processor statement like LONG_MAX and INT_MAX.
> In other words

I suppose that might be nice, but would require configure magic.  I'm
not sure how it could be done on Windows.

There are much more important problems to address at this point IMO. 
Just review the recent fixes related to Py_BuildValue() on
python-checkins to see what I mean.


From jcarlson at  Sat Feb 18 08:05:48 2006
From: jcarlson at (Josiah Carlson)
Date: Fri, 17 Feb 2006 23:05:48 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Bob Ippolito <bob at> wrote:
> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
> >
> > Greg Ewing <greg.ewing at> wrote:
> >>
> >> Stephen J. Turnbull wrote:
> >>>>>>>> "Guido" == Guido van Rossum <guido at> writes:
> >>
> >>>     Guido> - b = bytes(t, enc); t = text(b, enc)
> >>>
> >>> +1  The coding conversion operation has always felt like a  
> >>> constructor
> >>> to me, and in this particular usage that's exactly what it is.  I
> >>> prefer the nomenclature to reflect that.
> >>
> >> This also has the advantage that it completely
> >> avoids using the verbs "encode" and "decode"
> >> and the attendant confusion about which direction
> >> they go in.
> >>
> >> e.g.
> >>
> >>    s = text(b, "base64")
> >>
> >> makes it obvious that you're going from the
> >> binary side to the text side of the base64
> >> conversion.
> >
> > But you aren't always getting *unicode* text from the decoding of  
> > bytes,
> > and you may be encoding bytes *to* bytes:
> >
> >     b2 = bytes(b, "base64")
> >     b3 = bytes(b2, "base64")
> >
> > Which direction are we going again?
> This is *exactly* why the current set of codecs are INSANE.   
> unicode.encode and str.decode should be used *only* for unicode  
> codecs.  Byte transforms are entirely different semantically and  
> should be some other method pair.

The problem is that we are overloading data types.  Strings (and bytes)
can contain both encoded text as well as data, or even encoded data.
Unless the plan is to make bytes _only_ contain encoded unicode, or
_only_ data, or _only_ encoded data, the confusion for users may continue. 
Me, I'm a fan of education.  Educating your users is simple, and if you
have good exceptions and documentation, it gets easier.  Raise an
exception when a user tries to use a codec which doesn't have a
particular source ('...'.decode('utf-8') should raise an error like
"Cannot use text as a source for 'utf-8' decoding", when unicode/text
becomes the default format for string literals).

Tossing out bytes.encode(), as well as decodings for bytes->bytes, also
brings up the issue of text.decode() for pure text transformations.  Are
we going to push all of those transformations somewhere else?

Look at what we've currently got going for data transformations in the
standard library to see what these removals will do: base64 module,
binascii module, binhex module, uu module, ...  Do we want or need to
add another top-level module for every future encoding/codec that comes
out (or does everyone think that we're done seeing codecs)?  Do we want
to keep monkey-patching binascii with names like 'a2b_hqx'?  While there
is currently one text->text transform (rot13), do we add another module
for text->text transforms? Would it start having names like t2e_rot13()
and e2t_rot13()?
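
For the hex case this thread is named after, here is how the two styles
compare in a later Python. This is a sketch only; bytes.fromhex() and
bytes.hex() postdate this discussion.

```python
import binascii

raw = b"\xde\xad\xbe\xef"

hex_bytes = binascii.b2a_hex(raw)     # binary -> ASCII hex, as bytes
back = binascii.a2b_hex(hex_bytes)    # ASCII hex -> binary

# Later Pythons also spell this as methods on bytes itself.
hex_text = raw.hex()                  # bytes -> str
from_text = bytes.fromhex(hex_text)   # str -> bytes
```

The a2b/b2a names encode the direction in the function name; the method
pair encodes it in the types on either side.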

Educate the users.  Raise better exceptions telling people why their
encoding or decoding failed, as Ian Bicking already pointed out.  If
bytes.encode() and the equivalent of text.decode() are going to disappear,
Bengt Richter had a good idea with bytes.recode() for strictly bytes
transformations (and the equivalent for text), though it is ambiguous as
to the direction; are we encoding or decoding with bytes.recode()?  In
my opinion, this is why .encode() and .decode() make sense to keep on
both bytes and text: the direction is unambiguous, and if one has even a
remote idea of what the heck the codec is, they know their result.

 - Josiah

From ilya at  Sat Feb 18 08:03:42 2006
From: ilya at (Ilya Sandler)
Date: Fri, 17 Feb 2006 23:03:42 -0800 (PST)
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <Pine.LNX.4.58.0602172256320.1046@bagira>

On Fri, 17 Feb 2006, Phillip J. Eby wrote:

> >   d = {}   # or dict()
> >   d.default_factory = list
> Why not a classmethod constructor:
>   d = dict.with_factory(list)
>  But I'd rather set the default and create the
> dictionary in one operation, since when reading it as two, you first think
> 'd is a dictionary', and then 'oh, but it has a default factory', as
> opposed to "d is a dict with a factory" in one thought.

Also, class method would mean less typing (esp if dictionary name
happens to be longer than a couple of characters ;-)

But I'd like to suggest a different  name:

d = dict.with_default( list)


From ncoghlan at  Sat Feb 18 08:23:20 2006
From: ncoghlan at (Nick Coghlan)
Date: Sat, 18 Feb 2006 17:23:20 +1000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> I'd much rather be able to write "if key in d" and get the result I want...

Somewhere else in this byzantine thread, I realised that what was really 
bothering me was the conditional semantics that dict ended up with (i.e., its 
behaviour changed significantly if the default factory was set).

If we go back to your idea of collections.defaultdict (or Alex's name 
collections.autodict), then the change in semantics bothers me a lot less, and 
I'd be all in favour of the more usual variant (where "key in d" implies "key 
in d.keys()").


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From bokr at  Sat Feb 18 08:24:31 2006
From: bokr at (Bengt Richter)
Date: Sat, 18 Feb 2006 07:24:31 GMT
Subject: [Python-Dev] bytes.from_hex()
References: <>
Message-ID: <>

On Fri, 17 Feb 2006 20:33:16 -0800, Josiah Carlson <jcarlson at> wrote:

>Greg Ewing <greg.ewing at> wrote:
>> Stephen J. Turnbull wrote:
>> >>>>>>"Guido" == Guido van Rossum <guido at> writes:
>> >     Guido> - b = bytes(t, enc); t = text(b, enc)
>> > 
>> > +1  The coding conversion operation has always felt like a constructor
>> > to me, and in this particular usage that's exactly what it is.  I
>> > prefer the nomenclature to reflect that.
>> This also has the advantage that it completely
>> avoids using the verbs "encode" and "decode"
>> and the attendant confusion about which direction
>> they go in.
>> e.g.
>>    s = text(b, "base64")
>> makes it obvious that you're going from the
>> binary side to the text side of the base64
>> conversion.
>But you aren't always getting *unicode* text from the decoding of bytes,
>and you may be encoding bytes *to* bytes:
>    b2 = bytes(b, "base64")
>    b3 = bytes(b2, "base64")
>Which direction are we going again?
Well, base64 is probably not your best example, because it necessarily involves characters ;-)

If you are using "base64" you are looking at characters in your input to
produce your bytes output. The only way you can see characters in bytes input
is to decode them. So you are hiding your assumption about b's encoding.

You can make useful rules of inference from type(b), but with bytes you really
don't know. "base64" has to interpret b bytes as characters, because that's what
it needs to recognize base64 characters, to produce the output bytes.

The characters in b could be encoded in plain ascii, or utf16le, you have to know.
So for utf16le it should be

     b2 = bytes(text(b, 'utf16le'), "base64")

just because you assume an implicit

     b2 = bytes(text(b, 'ascii'), "base64")

doesn't make it so in general. Even if you build that assumption in,
it's not really true that you are going "bytes *to* bytes" without characters
involved when you do bytes(b, "base64"). You have just left undocumented an API restriction
(assert <bytes input is an ascii encoding of base64 characters>) and an implementation
optimization ;-)
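
Spelled with today's base64 module instead of the proposed bytes() and
text() constructors, the explicit version of this argument looks
roughly like the sketch below. The sample payload is arbitrary.

```python
import base64

payload = b"hi"
b64_text = base64.b64encode(payload).decode("ascii")  # 'aGk=' as text

# The same base64 characters stored under two different byte encodings.
b_ascii = b64_text.encode("ascii")
b_utf16 = b64_text.encode("utf-16-le")

# Spell out the decode step instead of assuming ASCII.
from_ascii = base64.b64decode(b_ascii.decode("ascii"))
from_utf16 = base64.b64decode(b_utf16.decode("utf-16-le"))
```

Both routes recover the payload only because the character encoding of
the input bytes was stated rather than assumed.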

This is the trouble with str.encode and unicode.decode. They both hide implicit
decodes and encodes respectively. They should be banned IMO. Let people spell it out
and maybe understand what they are doing.

OTOH, a bytes-to-bytes codec might be decompressing tgz into tar. For conceptual consistency,
one might define a 'bytes' encoding that conceptually turns bytes into unicode byte characters and
vice versa. Then "gunzip" can decode bytes, producing unicode characters which are then
encoded back to bytes from the unicode ;-) The 'bytes' encoding would numerically be just like
latin-1 except on the unicode side it would have wrapped-bytes internal representation.

    b_tar = bytes(text(b_tgz, 'gunzip'), 'bytes')

of course, text(b_tgz, 'gunzip') would produce unicode text with a special internal representation that
just wraps bytes though they are true unicode. The 'bytes' codec encode of course would just unwrap the
internal bytes representation, but it would conceptually be an encoding into bytes. bytes(t, 'latin-1')
would produce the same output from the wrapped bytes unicode.

Sometimes conceptual purity can clarify things and sometimes it's just another confusing description.

Bengt Richter

From martin at  Sat Feb 18 08:33:35 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 08:33:35 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>		<>		<>		<>	<>
	<> <>
Message-ID: <>

Ian Bicking wrote:
> Well, here's a kind of an example: WSGI specifies that the environment
> must be a dictionary, and nothing but a dictionary.  I think it would
> have to be updated to say that it must be a dictionary with
> default_factory not set, as default_factory would break the
> predictability that was the reason WSGI specified exactly a dictionary
> (and not a dictionary-like object or subclass).  So there's something
> that becomes brokenish.

I don't understand. In the rationale of PEP 333, it says
"The rationale for requiring a dictionary is to maximize portability
between servers. The alternative would be to define some subset of a
dictionary's methods as being the standard and portable interface."

That rationale is not endangered: if the environment continues to
be a dict exactly, servers continue to be guaranteed what precise
set of operations is available on the environment.

Of course, that may change from Python version to Python version,
as new dict methods get defined. But that should have been clear
when the PEP was written: the dict type itself may evolve, providing
additional features that weren't present in earlier versions.
Even now, some dict implementations have setdefault(), others don't.

> KeyError is one of
> those errors that you *expect* to happen (maybe the "Error" part is a
> misnomer); having it disappear is a major change.

Well, as you say: you get a KeyError if there is an error with the key.
With a default_factory, there isn't normally an error with the key.

> Also, I believe there's two ways to handle thread safety, both of which
> are broken:
> 1) d[key] gets the GIL, and thus while default_factory is being called
> the GIL is locked
> 2) d[key] doesn't get the GIL and so d[key].append(1) may not actually
> lead to 1 being in d[key] if another thread is appending something to
> the same key at the same time, and the key is not yet present in d.

It's 1), primarily. If default_factory is written in Python, though
(e.g. if it is *not* list()), the interpreter will give up the GIL
every N byte code instructions (or when a blocking operation is

Notice the same issue already exists with __hash__ for the key.

Also notice that the same issue already exists with any kind of
manipulation of a dictionary in multiple threads, today: if you

except KeyError:
   d[k] = [v]

then two threads might interleavingly execute the except-suite.
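
One conventional way to close that window is to hold a lock around the
whole lookup-or-insert step. This is a sketch only; the names d, lock
and add are invented for the example.

```python
import threading

d = {}
lock = threading.Lock()


def add(k, v):
    # Hold the lock across lookup and insert so no append is lost.
    with lock:
        try:
            d[k].append(v)
        except KeyError:
            d[k] = [v]


threads = [threading.Thread(target=add, args=("key", i)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock held, only one thread at a time can run the except-suite,
so every value ends up in the list.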


From martin at  Sat Feb 18 09:08:00 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 09:08:00 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	
Message-ID: <>

Adam Olsen wrote:
> Demo/metaclass/

That wouldn't break. If you had actually read the code, you would have
seen it is

        try:
            ga = dict['__getattr__']
        except KeyError:

How would it break if dict had a default factory? ga would get the
__getattr__ value, and everything would be fine. The KeyError is
ignored, after all.

> Demo/tkinter/guido/  # Subclasses override self.classes


            try:
                cl = self.classes[c]
            except KeyError:
                cl = 'unknown'

So cl wouldn't be 'unknown'. Why would that be a problem?

> Lib/

                try:
                    v = map[var]
                except KeyError:
                    raise InterpolationMissingOptionError(
                        option, section, rest, var)

So there is no InterpolationMissingOptionError. *Of course not*.
The whole point would be to provide a value for all interpolation

> Lib/

This entire function samples k elements with indices between 0
and len(population). Now, people "shouldn't" be passing dictionaries
in in the first place; that specific code tests whether there are
valid values at indices 0, n//2, and n. If the dictionary
isn't really a sequence (i.e. if it doesn't provide values
at all indices), the function may later fail even if it passes
that test.

With a default-valued dictionary, the function would not fail,
but a large number of samples might be the default value.

> Lib/

Same as ConfigParser: the interpolation will always succeed,
interpolating all values (rather than leaving $identifier in the
string). That would be precisely the expected behaviour.
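
As an illustration of that behaviour, here is a sketch using
string.Template with collections.defaultdict as it later shipped. The
"?" default and the sample template are made up.

```python
from collections import defaultdict
from string import Template

template = Template("$who likes $what")

# Plain dict: a missing identifier raises KeyError.
try:
    template.substitute({"who": "tim"})
    plain_failed = False
except KeyError:
    plain_failed = True

# Defaulting dict: every identifier gets a value.
values = defaultdict(lambda: "?", who="tim")
filled = template.substitute(values)
```

Whoever passes the defaulting dict has asked for exactly this: no
identifier is left without a value.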

> Lib/  # Currently uses UserDict but I assume it will
> switch to dict eventually

Or, rather, UserDict might grow the on_missing feature as well.

That is irrelevant for this issue, though:

        o =[key]()
        if o is None:
            raise KeyError, key      # line 56
            return o

So we are looking for lookup failures in, here:
self.dict is initialized to {} in UserDict, with no
default factory. So there cannot be a change in behaviour.

> Perhaps the KeyError shouldn't ever get triggered in this case, I'm
> not sure.  I think that's besides the point though.  The programmer
> clearly expected it would.

No. I now see your problem: An "except KeyError" does *not* mean
that the programmer "clearly expects it will" raise a KeyError.
Instead, the programmer expects it *might* raise a KeyError, and
tries to deal with this situation.

If the situation doesn't arise, the code continues just fine.
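
A small sketch of the point, with a hypothetical helper modelled on the
self.classes lookup quoted earlier and collections.defaultdict standing
in for a dict with a default factory:

```python
from collections import defaultdict


def class_name(classes, c):
    # The handler covers the case where the lookup *might* fail.
    try:
        return classes[c]
    except KeyError:
        return "unknown"


plain = {"a": "Alpha"}
defaulting = defaultdict(str, plain)
```

With plain, a missing key takes the except branch and yields "unknown";
with defaulting, the same call returns the factory's empty string and
the handler simply never runs.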


From martin at  Sat Feb 18 09:21:04 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 09:21:04 +0100
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>
	<dt5mso$h0c$>	<>
	<dt5tot$437$>	<>	<dt673u$phm$>
Message-ID: <>

Neal Norwitz wrote:
> I suppose that might be nice, but would require configure magic.  I'm
> not sure how it could be done on Windows.

Contributions are welcome. On Windows, it can be hard-coded.

Actually, something like

#error What is size_t equal to?

might work.

> There are much more important problems to address at this point IMO. 
> Just review the recent fixes related to Py_BuildValue() on
> python-checkins to see what I mean.

Nevertheless, it would be desirable IMO if it expanded to a literal,
so that the preprocessor could understand it.


From rrr at  Sat Feb 18 09:35:24 2006
From: rrr at (Ron Adam)
Date: Sat, 18 Feb 2006 02:35:24 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Josiah Carlson wrote:
> Bob Ippolito <bob at> wrote:
>> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
>>> Greg Ewing <greg.ewing at> wrote:
>>>> Stephen J. Turnbull wrote:
>>>>>>>>>> "Guido" == Guido van Rossum <guido at> writes:
>>>>>     Guido> - b = bytes(t, enc); t = text(b, enc)
>>>>> +1  The coding conversion operation has always felt like a  
>>>>> constructor
>>>>> to me, and in this particular usage that's exactly what it is.  I
>>>>> prefer the nomenclature to reflect that.
>>>> This also has the advantage that it completely
>>>> avoids using the verbs "encode" and "decode"
>>>> and the attendant confusion about which direction
>>>> they go in.
>>>> e.g.
>>>>    s = text(b, "base64")
>>>> makes it obvious that you're going from the
>>>> binary side to the text side of the base64
>>>> conversion.
>>> But you aren't always getting *unicode* text from the decoding of  
>>> bytes,
>>> and you may be encoding bytes *to* bytes:
>>>     b2 = bytes(b, "base64")
>>>     b3 = bytes(b2, "base64")
>>> Which direction are we going again?
>> This is *exactly* why the current set of codecs are INSANE.   
>> unicode.encode and str.decode should be used *only* for unicode  
>> codecs.  Byte transforms are entirely different semantically and  
>> should be some other method pair.
> The problem is that we are overloading data types.  Strings (and bytes)
> can contain both encoded text as well as data, or even encoded data.


> Educate the users.  Raise better exceptions telling people why their
> encoding or decoding failed, as Ian Bicking already pointed out.  If
> bytes.encode() and the equivalent of text.decode() is going to disappear,

+1 on better documentation all around with regards to encodings and 
Unicode.  The best explanation I've found so far is in PEP 100. 
  The Python docs and built in help hardly explain more than the minimal 
argument list for the encoding and decoding methods, and the str and 
unicode type constructor arguments aren't explained any better.

> Bengt Richter had a good idea with bytes.recode() for strictly bytes
> transformations (and the equivalent for text), though it is ambiguous as
> to the direction; are we encoding or decoding with bytes.recode()?  In
> my opinion, this is why .encode() and .decode() makes sense to keep on
> both bytes and text, the direction is unambiguous, and if one has even a
> remote idea of what the heck the codec is, they know their result.
>  - Josiah

I like the bytes.recode() idea a lot. +1

It seems to me it's a far more useful idea than encoding and decoding by 
overloading and could do both and more.  It has a lot of potential to be 
an intermediate step for encoding as well as being used for many other 
translations to byte data.

I think I would prefer that encode and decode be just functions with 
well defined names and arguments instead of being methods or arguments 
to string and Unicode types.

I'm not sure exactly how this would work. Maybe it would need two 
sets of encodings, i.e. decoders and encoders.  An exception would be
raised if the codec wasn't found for the direction one was going in.

Roughly... something or other like:

     import encodings

     def tostr(obj, encoding):        # i.e. encodings.tostr
        if encoding not in encoders:
            raise LookupError('encoding not found in encoders')
        # check if obj works with encoding to string
        # ...
        b = bytes(obj).recode(encoding)
        return str(b)

     def tounicode(obj, decoding):    # i.e. encodings.tounicode
        if decoding not in decoders:
            raise LookupError('decoding not found in decoders')
        # check if obj works with decoding to unicode
        # ...
        b = bytes(obj).recode(decoding)
        return unicode(b)

Anyway... food for thought.

    Ronald Adam

From g.brandl at  Sat Feb 18 09:38:45 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 18 Feb 2006 09:38:45 +0100
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <>
References: <dsj0p7$tk3$>
	<>	<dt552b$mnq$>	<>	<>
Message-ID: <dt6mem$bu$>

Guido van Rossum wrote:
> WFM. Patch anyone?



> On 2/17/06, Ian Bicking <ianb at> wrote:
>> Alex Martelli wrote:
>> > Maybe we could fix that by having property(getfunc) use
>> > getfunc.__doc__ as the __doc__ of the resulting property object
>> > (easily overridable in more normal property usage by the doc=
>> > argument, which, I feel, should almost invariably be there).

From martin at  Sat Feb 18 09:59:38 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 09:59:38 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332
 revival	in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>	<>
Message-ID: <>

Aahz wrote:
> The problem is that they don't understand that "Martin v. Löwis" is not
> Unicode -- once all strings are Unicode, this is guaranteed to work.

This specific call, yes. I don't think the problem will go away as long
as both encode and decode are available for both strings and byte

> While it's not absolutely true, my experience of watching Unicode
> confusion is that the simplest approach for newbies is: encode FROM
> Unicode, decode TO Unicode.

I think this is what should be ingrained into the library, also. It
shouldn't try to give additional meaning to these terms.


From jcarlson at  Sat Feb 18 10:16:07 2006
From: jcarlson at (Josiah Carlson)
Date: Sat, 18 Feb 2006 01:16:07 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Ron Adam <rrr at> wrote:
> Josiah Carlson wrote:
> > Bengt Richter had a good idea with bytes.recode() for strictly bytes
> > transformations (and the equivalent for text), though it is ambiguous as
> > to the direction; are we encoding or decoding with bytes.recode()?  In
> > my opinion, this is why .encode() and .decode() makes sense to keep on
> > both bytes and text, the direction is unambiguous, and if one has even a
> > remote idea of what the heck the codec is, they know their result.
> > 
> >  - Josiah
> I like the bytes.recode() idea a lot. +1
> It seems to me it's a far more useful idea than encoding and decoding by 
> overloading and could do both and more.  It has a lot of potential to be 
> an intermediate step for encoding as well as being used for many other 
> translations to byte data.

Indeed it does.

> I think I would prefer that encode and decode be just functions with 
> well defined names and arguments instead of being methods or arguments 
> to string and Unicode types.

Attaching it to string and unicode objects is a useful convenience. 
Just like x.replace(y, z) is a convenience for string.replace(x, y, z) . 
Tossing the encode/decode somewhere else, like encodings, or even string,
I see as a backwards step.

> I'm not sure on exactly how this would work. Maybe it would need two 
> sets of encodings, ie.. decoders, and encoders.  An exception would be
> given if it wasn't found for the direction one was going in.
> Roughly... something or other like:
>      import encodings
>      def tostr(obj, encoding):        # i.e. encodings.tostr
>         if encoding not in encoders:
>             raise LookupError('encoding not found in encoders')
>         # check if obj works with encoding to string
>         # ...
>         b = bytes(obj).recode(encoding)
>         return str(b)
>      def tounicode(obj, decoding):    # i.e. encodings.tounicode
>         if decoding not in decoders:
>             raise LookupError('decoding not found in decoders')
>         # check if obj works with decoding to unicode
>         # ...
>         b = bytes(obj).recode(decoding)
>         return unicode(b)
> Anyway... food for thought.

Again, the problem is ambiguity; what does bytes.recode(something) mean?
Are we encoding _to_ something, or are we decoding _from_ something? 
Are we going to need to embed the direction in the encoding/decoding
name (to_base64, from_base64, etc.)?  That doesn't seem any better than
binascii.b2a_base64 .  What about .reencode and .redecode?  It seems as
though the 're' added as a prefix to .encode and .decode makes it
clearer that you get the same type back as you put in, and it is also
unambiguous to direction.

The question remains: is str.decode() returning a string or unicode
depending on the argument passed, when the argument quite literally
names the codec involved, difficult to understand?  I don't believe so;
am I the only one?

 - Josiah

From walter at  Sat Feb 18 10:44:15 2006
From: walter at (=?iso-8859-1?Q?Walter_D=F6rwald?=)
Date: Sat, 18 Feb 2006 10:44:15 +0100 (CET)
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On 2/17/06, Ian Bicking <ianb at> wrote:
>> Guido van Rossum wrote:
>> > d = {}
>> > d.default_factory = set
>> > ...
>> > d[key].add(value)
>> Another option would be:
>>    d = {}
>>    d.default_factory = set
>>    d.get_default(key).add(value)
>> Unlike .setdefault, this would use a factory associated with the dictionary, and no default value would get passed in.
>> Unlike the proposal, this would not override __getitem__ (not overriding
>> __getitem__ is really the only difference with the proposal).  It would be clear reading the code that you were not
>> implicitly asserting that "key in d" was true.
>> "get_default" isn't the best name, but another name isn't jumping out at me at the moment.  Of course, it is not a Pythonic
>> argument to say that an existing method should be overridden, or functionality made nameless simply because we can't think
>> of a name (looking to anonymous functions of course ;)
> I'm torn. While trying to implement this I came across some ugliness in
> PyDict_GetItem() -- it would make sense if this also called
> on_missing(), but it must return a value without incrementing its
> refcount, and isn't supposed to raise exceptions -- so what to do if
> on_missing() returns a value that's not inserted in the dict?
> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.
> OTOH most reviewers here seem to appreciate on_missing() as a way to
> do various other ways of altering a dict's __getitem__() behavior
> behind a caller's back -- perhaps it could even be (ab)used to
> implement case-insensitive lookup.

I don't like the fact that on_missing()/default_factory can change the behaviour of __getitem__, which up to now has been
something simple and understandable.
Why don't we put the on_missing()/default_factory functionality into get() instead?

d.get(key, default) does what it did before. d.get(key) invokes on_missing() (and dict would have default_factory == type(None))
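
A rough sketch of what that could look like, written as a subclass
purely for illustration. The class name FactoryDict, the _MISSING
sentinel, and the choice to store the freshly created default are all
assumptions; only on_missing() and default_factory come from the
proposal:

```python
_MISSING = object()  # sentinel: "no default argument was passed"


class FactoryDict(dict):
    """Hypothetical dict whose one-argument get() uses a default factory."""

    def __init__(self, default_factory=type(None), *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def on_missing(self, key):
        value = self.default_factory()
        if self.default_factory is not type(None):
            # Assumption: store the new default so later mutation sticks.
            self[key] = value
        return value

    def get(self, key, default=_MISSING):
        if key in self:
            return dict.__getitem__(self, key)
        if default is _MISSING:
            return self.on_missing(key)   # d.get(key)
        return default                    # d.get(key, default) as before


d = FactoryDict(set)
d.get("a").add(1)          # creates and stores the set, then mutates it
explicit = d.get("b", 0)   # explicit default: nothing is stored
```

With this arrangement d[key] keeps raising KeyError for missing keys,
so __getitem__ stays as simple as it is today.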

   Walter Dörwald

From mal at  Sat Feb 18 12:06:37 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 12:06:37 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
>> How are users confused?
> Users do
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.
> What it *should* tell them is
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode() methods *do* check the return
types of the codecs and only allow strings or Unicode
on return (no lists, instances, tuples or anything else).

You seem to ignore this fact.

If we were to follow your idea, we should remove .encode()
and .decode() altogether and refer users to the codecs.encode()
and codecs.decode() function. However, I doubt that users
will like this idea.

>> bytes.encode CAN only produce bytes.
> I don't understand MAL's design, but I believe in that design,
> bytes.encode could produce anything (say, a list). A codec
> can convert anything to anything else.

True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From mwh at  Sat Feb 18 12:25:26 2006
From: mwh at (Michael Hudson)
Date: Sat, 18 Feb 2006 11:25:26 +0000
Subject: [Python-Dev] Serial function call composition syntax foo(x,
 y) -> bar() -> baz(z)
In-Reply-To: <>
	(Guido van Rossum's message of "Fri, 17 Feb 2006 16:03:20 -0800")
References: <>
Message-ID: <>

"Guido van Rossum" <guido at> writes:

> It's only me that's allowed to top-post. :-)

At least you include attributions these days! <wink>


  SPIDER:  'Scuse me. [scuttles off]
  ZAPHOD:  One huge spider.
    FORD:  Polite though.
                   -- The Hitch-Hikers Guide to the Galaxy, Episode 11

From thomas at  Sat Feb 18 12:33:58 2006
From: thomas at (Thomas Wouters)
Date: Sat, 18 Feb 2006 12:33:58 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:

> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() methods *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> You seem to ignore this fact.

Actually, I think the problem is that while we all agree the
bytestring/unicode methods are a useful way to convert from bytestring to
unicode and back again, we disagree on their *general* usefulness. Sure, the
codecs mechanism is powerful, and even more so because they can determine
their own returntype. But it still smells and feels like a Perl attitude,
for the reasons already explained numerous times, as well:

 - The return value for the non-unicode encodings depends on the value of
   the encoding argument.

 - The general case, by and large, especially in non-powerusers, is to
   encode unicode to bytestrings and to decode bytestrings to unicode. And
   that is a hard enough task for many of the non-powerusers. Being able to
   use the encode/decode methods for other tasks isn't helping them.

That is why I disagree with the hypergeneralization of the encode/decode
methods, regardless of the fact that it is a natural expansion of the
implementation of codecs. Sure, it looks 'right' and 'natural' when you look
at the implementation. It sure doesn't look natural, to me and to many
others, when you look at the task of encoding and decoding
bytestrings/unicode.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mwh at  Sat Feb 18 12:44:23 2006
From: mwh at (Michael Hudson)
Date: Sat, 18 Feb 2006 11:44:23 +0000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
	(Guido van Rossum's message of "Fri, 17 Feb 2006 14:15:39 -0800")
References: <>
Message-ID: <>

"Guido van Rossum" <guido at> writes:

> I'm torn. While trying to implement this I came across some ugliness
> in PyDict_GetItem() -- it would make sense if this also called
> on_missing(), but it must return a value without incrementing its
> refcount, and isn't supposed to raise exceptions

This last bit has been a painful lie for quite some time.  I don't
know what can be done about it, though -- avoid the use of
PyDict_GetItem() in situations where you don't expect string only
dicts (so using it on globals and instance dicts would still be ok)?

> -- so what to do if
> on_missing() returns a value that's not inserted in the dict?

Well, like some others I am a bit uncomfortable with changing the
semantics of such an important operation on such an important data
structure.  But then I'm also not that unhappy with setdefault, so I
must be weird.

> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.


> OTOH most reviewers here seem to appreciate on_missing() as a way to
> do various other ways of altering a dict's __getitem__() behavior
> behind a caller's back -- perhaps it could even be (ab)used to
> implement case-insensitive lookup.

Well, I'm not sure I do.

There seems to be quite a conceptual difference between being able to
make a new kind of dictionary and mess with the behaviour of one that
exists already, but I don't know if that matters in practice (the fact
that you can currently do things like "import sys; sys.__dict__.clear()"
doesn't seem to cause real problems).

Finally, I'll just note that subclassing to modify the behaviour of a
builtin type has generally been actively discouraged in python so far.
If all dictionary lookups went through a method that you could
override in Python (i.e. subclasses could replace ma_lookup, in
effect) this would be easy to do in Python code.  But they don't, and
bug reports suggesting that they do have been rejected in the past
(and I agree with the rejection, fwiw).
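
A small sketch of the current situation, for illustration only (the
class name Upper and the key-uppercasing rule are made up):

```python
class Upper(dict):
    """dict subclass that overrides only __getitem__."""

    def __getitem__(self, key):
        # Look the key up in upper case.
        return dict.__getitem__(self, key.upper())


d = Upper(A=1)
via_subscript = d["a"]   # goes through the overridden __getitem__
via_get = d.get("a")     # dict.get() does not call the override
```

The subscript finds the value, while get() looks for the literal key
"a" and returns None: the override is not a general lookup hook.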

So that rambled a bit.  But in essence: I'd much prefer the
addition of a method or a type to modification of existing behaviour.


  If you're talking "useful", I'm not your bot.
                                            -- Tim Peters, 08 Nov 2001

From mal at  Sat Feb 18 12:44:27 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 12:44:27 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
> No. The reason to ban string.decode and bytes.encode is that
> it confuses users.

Instead of starting to ban everything that can potentially
confuse a few users, we should educate those users and tell
them what these methods mean and how they should be used.

Marc-Andre Lemburg


From mwh at  Sat Feb 18 13:01:34 2006
From: mwh at (Michael Hudson)
Date: Sat, 18 Feb 2006 12:01:34 +0000
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (
	=?iso-8859-1?q?Martin_v._L=F6wis's_message_of?= "Fri,
	17 Feb 2006 23:52:15 +0100")
References: <> <>
	<> <>
Message-ID: <>

This posting is entirely tangential.  Be warned.

"Martin v. Löwis" <martin at> writes:

> It's worse than that. The return *type* depends on the *value* of
> the argument. I think there is little precedent for that:

There's one extremely significant example where the *value* of
something impacts on the type of something else: functions.  The types
of everything involved in str([1]) and len([1]) are the same but the
results are different.  This shows up in PyPy's type annotation; most
of the time we just track types indeed, but when something is called
we need to have a pretty good idea of the potential values, too.
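
Spelled out as a tiny sketch (the variable names are only for
illustration):

```python
arg = [1]

# Same argument type, same kind of callee (a builtin callable); only the
# *value* of the callee differs between the two calls.
results = {func.__name__: func(arg) for func in (str, len)}
result_types = {name: type(value).__name__ for name, value in results.items()}
```

The two calls produce a str and an int respectively, so the result
type cannot be worked out from the types of the operands alone.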

Relevant to the point at hand?  No.  Apologies for wasting your time


  The ultimate laziness is not using Perl.  That saves you so much
  work you wouldn't believe it if you had never tried it.
                                        -- Erik Naggum, comp.lang.lisp

From rrr at  Sat Feb 18 13:17:42 2006
From: rrr at (Ron Adam)
Date: Sat, 18 Feb 2006 06:17:42 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:
> Ron Adam <rrr at> wrote:
>> Josiah Carlson wrote:
>>> Bengt Richter had a good idea with bytes.recode() for strictly bytes
>>> transformations (and the equivalent for text), though it is ambiguous as
>>> to the direction; are we encoding or decoding with bytes.recode()?  In
>>> my opinion, this is why .encode() and .decode() makes sense to keep on
>>> both bytes and text, the direction is unambiguous, and if one has even a
>>> remote idea of what the heck the codec is, they know their result.
>>>  - Josiah
>> I like the bytes.recode() idea a lot. +1
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
> Indeed it does.
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
> Attaching it to string and unicode objects is a useful convenience. 
> Just like x.replace(y, z) is a convenience for string.replace(x, y, z) . 
> Tossing the encode/decode somewhere else, like encodings, or even string,
> I see as a backwards step.
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, ie.. decoders, and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
>> Roughly... something or other like:
>>      import encodings
>>      def tostr(obj, encoding):        # i.e. encodings.tostr
>>         if encoding not in encoders:
>>             raise LookupError('encoding not found in encoders')
>>         # check if obj works with encoding to string
>>         # ...
>>         b = bytes(obj).recode(encoding)
>>         return str(b)
>>      def tounicode(obj, decoding):    # i.e. encodings.tounicode
>>         if decoding not in decoders:
>>             raise LookupError('decoding not found in decoders')
>>         # check if obj works with decoding to unicode
>>         # ...
>>         b = bytes(obj).recode(decoding)
>>         return unicode(b)
>> Anyway... food for thought.
> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> Are we encoding _to_ something, or are we decoding _from_ something? 

This was just an example of one way that might work, but here are my 
thoughts on why I think it might be good.

In this case, the ambiguity is reduced as far as the encoding and 
decoding operations are concerned.

      somestring = encodings.tostr( someunicodestr, 'latin-1')

It's pretty clear what is happening to me.

     It will encode to a string an object, named someunicodestr, with 
the 'latin-1' encoder.

It would also result in clear errors if the specified encoding is 
unavailable or, if it is available, not compatible with the given 
*someunicodestr* obj type.

Further hints could be gained by.


Which could result in... something like...
     encodings.tostr( <string|unicode>, <encoder> ) -> string

     Encode a unicode string using an encoder codec to a
     non-unicode string or transform a non-unicode string
     to another non-unicode string using an encoder codec.

And if that's not enough, then help(encodings) could give more clues. 
These steps would be what I would do. And then the next thing would be 
to find the python docs entry on encodings.

Placing them in encodings seems like a fairly good place to look for 
these functions if you are working with encodings.  So I find that just 
as convenient as having them be string methods.

There is no intermediate default encoding involved above, (the bytes 
object is used instead), so you wouldn't get some of the messages the 
present system results in when ascii is the default.

(Yes, I know it won't when P3K is here also)

> Are we going to need to embed the direction in the encoding/decoding
> name (to_base64, from_base64, etc.)?  That doesn't seem any better than
> binascii.b2a_base64 .  

No, that's why I suggested two separate lists (or dictionaries might be 
better).  They can contain the same names, but the lists they are in 
determine the context and point to the needed codec.  And that step is 
abstracted out by putting it inside the encodings.tostr() and 
encodings.tounicode() functions.

So either function would call 'base64' from the correct codec list and 
get the correct encoding or decoding codec it needs.
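
Something along these lines, as a sketch only. The registry names
encoders and decoders follow the idea above, but the helper names
to_encoded and to_decoded and the use of the base64 module as the
backing implementation are assumptions:

```python
import base64

# Two registries keyed by the same codec name; which dictionary is
# consulted determines the direction.
encoders = {"base64": base64.b64encode}
decoders = {"base64": base64.b64decode}


def to_encoded(data, name):
    """Transform data with the encoder registered under name."""
    if name not in encoders:
        raise LookupError("encoding not found in encoders: %r" % (name,))
    return encoders[name](data)


def to_decoded(data, name):
    """Transform data with the decoder registered under name."""
    if name not in decoders:
        raise LookupError("decoding not found in decoders: %r" % (name,))
    return decoders[name](data)


encoded = to_encoded(b"hello", "base64")
decoded = to_decoded(encoded, "base64")
```

The caller only ever writes 'base64'; the direction comes from which
function is called, not from the codec name.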

> What about .reencode and .redecode?  It seems as
> though the 're' added as a prefix to .encode and .decode makes it
> clearer that you get the same type back as you put in, and it is also
> unambiguous to direction.

But then wouldn't we end up with a multitude of ways to do things?

     s.encode(codec) == s.redecode(codec)
     s.decode(codec) == s.reencode(codec)
     unicode(s, codec) == s.decode(codec)
     str(u, codec) == u.encode(codec)
     str(s, codec) == s.encode(codec)
     unicode(s, codec) == s.reencode(codec)
     str(u, codec) == s.redecode(codec)
     str(s, codec) == s.redecode(codec)

Umm .. did I miss any?  Which ones would you remove?

Which ones of those will succeed with which codecs?

The method bytes.recode() always does a byte transformation which can 
be almost anything.  It's the context bytes.recode() is used in that 
determines what's happening.  In the above cases, it's using an encoding 
transformation, so what it's doing is precisely what you would expect from 
its context.

There isn't a bytes.decode(), since that's just another transformation. 
So only the one method is needed, which makes it easier to learn.

> The question remains: is str.decode() returning a string or unicode
> depending on the argument passed, when the argument quite literally
> names the codec involved, difficult to understand?  I don't believe so;
> am I the only one?
>  - Josiah

Using help(str.decode) and help(str.encode) gives:

      S.decode([encoding[,errors]]) -> object

      S.encode([encoding[,errors]]) -> object

These look an awful lot alike.  The descriptions are nearly identical as 
well.  The Python docs just reproduce (or close to) the doc strings with 
only a very small amount of additional words.

Learning how the current system works comes awfully close to reverse 
engineering.  Maybe I'm overstating it a bit, but I suspect many end up 
doing exactly that in order to learn how Python does it.

Or they go with the first solution that seems to work and hope for the 
best.  I believe that's what Martin said earlier in this thread.

It's much too late (or early now) to think further on this. So until 

(please ignore typos) ;-)

     Ronald Adam

From mal at  Sat Feb 18 13:21:18 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 13:21:18 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
Message-ID: <>

Thomas Wouters wrote:
> On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() methods *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>> You seem to ignore this fact.
> Actually, I think the problem is that while we all agree the
> bytestring/unicode methods are a useful way to convert from bytestring to
> unicode and back again, we disagree on their *general* usefulness. Sure, the
> codecs mechanism is powerful, and even more so because they can determine
> their own returntype. But it still smells and feels like a Perl attitude,
> for the reasons already explained numerous times, as well:

It's by no means a Perl attitude.

The main reason is symmetry and the fact that strings and Unicode
should be as similar as possible in order to simplify the task of
moving from one to the other.

>  - The return value for the non-unicode encodings depends on the value of
>    the encoding argument.

Not really: you'll always get a basestring instance.

>  - The general case, by and large, especially in non-powerusers, is to
>    encode unicode to bytestrings and to decode bytestrings to unicode. And
>    that is a hard enough task for many of the non-powerusers. Being able to
>    use the encode/decode methods for other tasks isn't helping them.


Still, I believe that this is an educational problem. There are
a couple of gotchas users will have to be aware of (and this is
unrelated to the methods in question):

* "encoding" always refers to transforming original data into
  a derived form

* "decoding" always refers to transforming a derived form of
  data back into its original form

* for Unicode codecs the original form is Unicode, the derived
  form is, in most cases, a string

As a result, if you want to use a Unicode codec such as utf-8,
you encode Unicode into a utf-8 string and decode a utf-8 string
into Unicode.

Encoding a string is only possible if the string itself is
original data, e.g. some data that is supposed to be transformed
into a base64 encoded form.

Decoding Unicode is only possible if the Unicode string itself
represents a derived form, e.g. a sequence of hex literals.
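
A short sketch of the two directions. It is written against a
Python 3 interpreter, which is an assumption of the example: there str
plays the role of Unicode, bytes the role of the 8-bit string, and the
base64 transform is reached through codecs.encode()/codecs.decode()
rather than through string methods:

```python
import codecs

# Unicode codec: the original form is text, the derived form is bytes.
text = "L\u00f6wis"
utf8_bytes = text.encode("utf-8")        # original -> derived
round_trip = utf8_bytes.decode("utf-8")  # derived -> original

# Non-Unicode codec: here the original data is itself a byte string.
payload = b"hello"
b64 = codecs.encode(payload, "base64")   # original -> derived
restored = codecs.decode(b64, "base64")  # derived -> original
```

In both cases "encode" moves away from the original data and "decode"
moves back to it, whatever types happen to be involved.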

> That is why I disagree with the hypergeneralization of the encode/decode
> methods, regardless of the fact that it is a natural expansion of the
> implementation of codecs. Sure, it looks 'right' and 'natural' when you look
> at the implementation. It sure doesn't look natural, to me and to many
> others, when you look at the task of encoding and decoding
> bytestrings/unicode.

That's because you only look at one specific task.

Codecs also unify the various interfaces to common encodings
such as base64, uu or zip which are not Unicode related.

Marc-Andre Lemburg


From pierre.barbier at  Sat Feb 18 12:53:25 2006
From: pierre.barbier at (Pierre Barbier de Reuille)
Date: Sat, 18 Feb 2006 12:53:25 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

An embedded and charset-unspecified text was scrubbed...
Name: not available

From rhamph at  Sat Feb 18 14:19:26 2006
From: rhamph at (Adam Olsen)
Date: Sat, 18 Feb 2006 06:19:26 -0700
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/18/06, Josiah Carlson <jcarlson at> wrote:
> Look at what we've currently got going for data transformations in the
> standard library to see what these removals will do: base64 module,
> binascii module, binhex module, uu module, ...  Do we want or need to
> add another top-level module for every future encoding/codec that comes
> out (or does everyone think that we're done seeing codecs)?  Do we want
> to keep monkey-patching binascii with names like 'a2b_hqx'?  While there
> is currently one text->text transform (rot13), do we add another module
> for text->text transforms? Would it start having names like t2e_rot13()
> and e2t_rot13()?

If top-level modules are the problem then why not make codecs into a package?

from codecs import utf8, base64

utf8.encode(u) -> b
utf8.decode(b) -> u
base64.encode(b) -> b
base64.decode(b) -> b
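
A rough sketch of how such per-codec namespaces could be emulated on top of the existing registry, assuming present-day Python 3. The helper codec_namespace() is hypothetical and exists only for this illustration; it is not part of the codecs module:

```python
import codecs
from types import SimpleNamespace


def codec_namespace(name):
    """Wrap a registered codec as an object with encode/decode attributes.

    Hypothetical helper for illustration; no such function exists in codecs.
    """
    info = codecs.lookup(name)
    return SimpleNamespace(
        # Codec functions return (output, length_consumed); keep the output.
        encode=lambda data: info.encode(data)[0],
        decode=lambda data: info.decode(data)[0],
    )


utf8 = codec_namespace("utf-8")
base64 = codec_namespace("base64_codec")

encoded = utf8.encode("h\u00e9llo")    # text  -> bytes
decoded = utf8.decode(encoded)         # bytes -> text
armored = base64.encode(b"hello")      # bytes -> bytes
plain = base64.decode(armored)         # bytes -> bytes
```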

Adam Olsen, aka Rhamphoryncus

From mal at  Sat Feb 18 14:44:29 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 14:44:29 +0100
Subject: [Python-Dev] A codecs nit
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Barry Warsaw wrote:
> On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:
>> Those are not pseudo-encodings, they are regular codecs.
>> It's a common misunderstanding that codecs are only seen as serving
>> the purpose of converting between Unicode and strings.
>> The codec system is deliberately designed to be general enough
>> to also work with many other types, e.g. it is easily possible to
>> write a codec that converts between the hex literal sequence you
>> have above to a list of ordinals:
> Slightly off-topic, but one thing that's always bothered me about the
> current codecs implementation is that str.encode() (and friends)
> implicitly treats its argument as a module, and imports it, even if the
> module doesn't live in the encodings package.  That seems like a mistake
> to me (and a potential security problem if the import has side-effects).

It was a mistake, yes, and thanks for bringing this up.

Codec packages should implement and register their own
codec search functions.

> I don't know whether at the very least restricting the imports to the
> encodings package would make sense or would break things.
> >>> import sys
> >>> sys.modules['smtplib']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> KeyError: 'smtplib'
> >>> ''.encode('smtplib')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> LookupError: unknown encoding: smtplib
> >>> sys.modules['smtplib']
> <module 'smtplib' from '/usr/lib/python2.4/smtplib.pyc'>
> I can't see any reason for allowing any randomly importable module to
> act like an encoding.

The encodings package search function will try to import
the module and then check the module signature. If the
module fails to export the codec registration API, then
it raises the LookupError you see above.

At the time, it was nice to be able to write codec
packages as Python packages and have them readily usable
by just putting the package on the sys.path.

This was a side-effect of the way the encodings search
function worked. The original design idea was to have
all 3rd party codecs register themselves with the
codec registry. However, this implies that the application
using the codecs would have to run the registration
code at least once. Since the encodings package search
function provided a more convenient way, this was used
by most codec package programmers.

In Py 2.5 we'll change that. The encodings package search
function will only allow codecs in that package to be
imported. All other codec packages will have to provide
their own search function and register this with the
codecs registry.
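
A minimal sketch of a codec package providing and registering its own search function, assuming present-day Python 3. The codec name "myreverse" and both functions are made up for illustration:

```python
import codecs


def _reverse(data, errors="strict"):
    # Codec functions return (output, number_of_input_items_consumed).
    return data[::-1], len(data)


def search(name):
    """Search function for the made-up codec name 'myreverse'."""
    if name == "myreverse":
        return codecs.CodecInfo(name="myreverse", encode=_reverse, decode=_reverse)
    return None  # let other search functions handle every other name


# The application (or the codec package's import code) must run this once.
codecs.register(search)

scrambled = codecs.encode("abc", "myreverse")
restored = codecs.decode(scrambled, "myreverse")
```

The registration call is the step that the import-based shortcut let codec authors skip.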

The big question is: what to do about 2.3 and 2.4 - adding
the same patch will cause serious breakage, since popular
codec packages such as Tamito's Japanese package rely
on the existing behavior.

Marc-Andre Lemburg


From skip at  Sat Feb 18 15:50:18 2006
From: skip at (skip at
Date: Sat, 18 Feb 2006 08:50:18 -0600
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>
Message-ID: <>

    talin> ... whereas with 'given' you can't be certain when to stop
    talin> parsing the argument list.

So require parens around the arglist:

    (x*y given (x, y))
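
For comparison, and assuming the proposed form is meant to behave like an ordinary lambda with the same argument list, the expression above would correspond to this standard Python:

```python
# Standard lambda syntax; 'given' itself is only a proposal and does not parse.
multiply = lambda x, y: x * y

product = multiply(3, 4)
```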


From richard.m.tew at  Fri Feb 17 20:48:55 2006
From: richard.m.tew at (Richard Tew)
Date: Fri, 17 Feb 2006 19:48:55 +0000
Subject: [Python-Dev] Stackless Python sprint at PyCon 2006
Message-ID: <>


During the sprint period after PyCon, we are planning on sprinting to bring
Stackless up to date and to make it more current and approachable.  A key
part of this is porting it and the recently completed 64 bit changes that
have been made to it to the latest version of Python.  At the end of the
sprint we hope to have up to date working 32 and 64 bit versions.

If anyone on this list who is attending PyCon, has some time to spare during
the sprint period and an interest in perhaps getting more familiar with
Stackless, you would be more than welcome in joining us to help out.
Familiarity with the Python source code and its workings would be a great
help in the work we hope to get done.  Especially participants with an
interest in ensuring and testing that the porting done works on other
platforms than those we will be developing on (Windows XP and Windows XP x64
edition).

Obviously being the most familiar with the Stackless Python source code,
Christian Tismer has kindly offered us guidance by acting as the coach for
the sprint, taking time away from the PyPy sprint.

In any case, if you have any questions, or are interested, please feel free
to reply, whether here, to this email address or to richard at


Richard Tew
Senior Programmer
CCP Games

You can read more about the sprint and the scheduled talk about how
Stackless is used in the massively multiplayer game EVE Online we make, at
PyCon at the following URL:

And don't forget the Stackless website :)

From aleaxit at  Sat Feb 18 16:24:41 2006
From: aleaxit at (Alex Martelli)
Date: Sat, 18 Feb 2006 07:24:41 -0800
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <dt6mem$bu$>
References: <dsj0p7$tk3$>
	<>	<dt552b$mnq$>	<>	<>
Message-ID: <>

On Feb 18, 2006, at 12:38 AM, Georg Brandl wrote:

> Guido van Rossum wrote:
>> WFM. Patch anyone?
> Done.

I reviewed the patch and added a comment on it,  but since the point  
may be controversial I had better air it here for discussion: in 2.4,  
property(fset=acallable) does work (maybe silly, but it does make a  
write-only property) -- with the patch as given, it would stop  
working (due to attempts to get __doc__ from the None value of fget);  
I think we should ensure it keeps working (and add a unit test to  
that effect).
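
A small sketch of the write-only case in question, assuming present-day Python 3; the class and attribute names are made up for illustration:

```python
class Account:
    def _set_password(self, value):
        # Store only a marker; the value itself is never readable.
        self._has_password = bool(value)

    # No fget is given, so the property has no getter to take __doc__ from.
    password = property(fset=_set_password)


acct = Account()
acct.password = "secret"      # writing works

try:
    acct.password             # reading fails: there is no getter
    readable = True
except AttributeError:
    readable = False
```

A unit test along these lines would check both that the assignment succeeds and that property() does not fail when fget is None.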


From aahz at  Sat Feb 18 16:32:41 2006
From: aahz at (Aahz)
Date: Sat, 18 Feb 2006 07:32:41 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Feb 18, 2006, Ron Adam wrote:
> I like the bytes.recode() idea a lot. +1
> It seems to me it's a far more useful idea than encoding and decoding by 
> overloading and could do both and more.  It has a lot of potential to be 
> an intermediate step for encoding as well as being used for many other 
> translations to byte data.
> I think I would prefer that encode and decode be just functions with 
> well defined names and arguments instead of being methods or arguments 
> to string and Unicode types.
> I'm not sure on exactly how this would work. Maybe it would need two 
> sets of encodings, ie.. decoders, and encoders.  An exception would be
> given if it wasn't found for the direction one was going in.

Here's an idea I don't think I've seen before:

bytes.recode(b, src_encoding, dest_encoding)

This requires the user to state up-front what the source encoding is.
One of the big problems that I see with the whole encoding mess is that
so much of it contains implicit assumptions about the source encoding;
this gets away from that.
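
A rough sketch of the idea as a free function, assuming present-day Python 3 and assuming both names refer to text encodings. The function recode() is hypothetical and only stands in for the proposed bytes.recode():

```python
def recode(data, src_encoding, dest_encoding):
    """Re-encode bytes from one text encoding to another.

    Hypothetical free function standing in for the proposed bytes.recode().
    """
    # Decode with the stated source encoding, then encode to the target.
    return data.decode(src_encoding).encode(dest_encoding)


latin1_bytes = "caf\u00e9".encode("latin-1")
utf8_bytes = recode(latin1_bytes, "latin-1", "utf-8")
```
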
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From mal at  Sat Feb 18 16:47:08 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 16:47:08 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, ie.. decoders, and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
> Here's an idea I don't think I've seen before:
> bytes.recode(b, src_encoding, dest_encoding)
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.

You might want to look at the module: it has all these
things and a lot more.

Marc-Andre Lemburg


From walter at  Sat Feb 18 17:11:39 2006
From: walter at (=?iso-8859-1?Q?Walter_D=F6rwald?=)
Date: Sat, 18 Feb 2006 17:11:39 +0100 (CET)
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> Walter Dörwald wrote:
>>>>> I'd suggest we keep codecs.lookup() the way it is and
>>>>> instead add new functions to the codecs module, e.g.
>>>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>>> Changing the codec registration is not much of a problem:
>>>>> we could simply allow 6-tuples to be passed into the
>>>>> registry.
>>>> OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples and the search functions must return 6-tuples.
>>>> And we add codecs.getencoderobject() and codecs.getdecoderobject() as well as new classes codecs.StatefulEncoder and
>>>> codecs.StatefulDecoder. What about old search functions that return 4-tuples?
>>> The registry should then simply set the missing entries to None and the getencoderobject()/getdecoderobject() would then
>>> have
>>> to raise an error.
>> Sounds simple enough and we don't lose backwards compatibility.
>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>> +1, but I'd like to have a replacement for this, i.e. a function that returns all info the registry has about an encoding:
>> 1. Name
>> 2. Encoder function
>> 3. Decoder function
>> 4. Stateful encoder factory
>> 5. Stateful decoder factory
>> 6. Stream writer factory
>> 7. Stream reader factory
>> and if this is an object with attributes, we won't have any problems if we extend it in the future.
> Shouldn't be a problem: just expose the registry dictionary
> via the _codecs module.
> The rest can then be done in a Python function defined in
> using a CodecInfo class.

This would require the Python code to call codecs.lookup() and then look into the codecs dictionary (normalizing the encoding
name again). Maybe we should make a version of _PyCodec_Lookup() that allows 4- and 6-tuples available to Python and use that?
The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 4-tuples.
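
For reference, present-day Python's codecs.lookup() returns a codecs.CodecInfo object with attributes. The sketch below assumes a modern Python 3 and maps those attributes onto the seven items listed above; the mapping to the names used in this thread is my reading, not something the thread states:

```python
import codecs

info = codecs.lookup("latin-1")

# Attribute access covers the seven items from the list above.
fields = {
    "name": info.name,
    "encoder function": info.encode,
    "decoder function": info.decode,
    "stateful encoder factory": info.incrementalencoder,
    "stateful decoder factory": info.incrementaldecoder,
    "stream writer factory": info.streamwriter,
    "stream reader factory": info.streamreader,
}

# The object still unpacks as the old 4-tuple for backwards compatibility.
encode, decode, stream_reader, stream_writer = info
```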
>> BTW, if we change the API, can we fix the return value of the stateless functions? As the stateless function always
>> encodes/decodes the complete string, returning the length of the string doesn't make sense.
>> codecs.getencoder() and codecs.getdecoder() would have to continue to return the old variant of the functions, but
>> codecs.getinfo("latin-1").encoder would be the new encoding function.
> No: you can still write stateless encoders or decoders that do
> not process the whole input string. Just because we don't have
> any of those in Python, doesn't mean that they can't be written
> and used. A stateless codec might want to leave the work
> of buffering bytes at the end of the input data which cannot
> be processed to the caller.

But what would the caller do with that info? It can't retry encoding/decoding the rejected input, because the state of the codec
has been thrown away already.
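
A short sketch of the contrast, assuming present-day Python 3 and assuming that its incremental decoders are a fair stand-in for the stateful decoders discussed here:

```python
import codecs

data = "gr\u00fc\u00dfe".encode("utf-8")
first, second = data[:3], data[3:]   # the split falls inside a 2-byte character

# Stateless: each call stands alone, so the truncated chunk is an error.
try:
    codecs.decode(first, "utf-8")
    stateless_ok = True
except UnicodeDecodeError:
    stateless_ok = False

# Stateful: the decoder object remembers the pending byte between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(second, final=True)
```
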
> It is also possible to write
> stateful codecs on top of such stateless encoding and decoding
> functions.

That's what the codec helper functions from Python/_codecs.c are for.

Anyway, I've started implementing a patch that just adds codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
UTF-16-LE and UTF-16-BE are already working.
    Walter Dörwald

From martin at  Sat Feb 18 17:15:14 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 17:15:14 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() methods *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> You seem to ignore this fact.

I'm not ignoring the fact that you have explained this
many times. I just fail to understand your explanations.

For example, you said at some point that codecs are not
restricted to Unicode. However, I don't recall any
explanation what the restriction *is*, if any restriction
exists. No such restriction seems to be documented.

> True. However, note that the .encode()/.decode() methods on
> strings and Unicode narrow down the possible return types.
> The corresponding .bytes methods should only allow bytes and
> Unicode.

I forgot that: what is the rationale for that restriction?


From martin at  Sat Feb 18 17:22:01 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 17:22:01 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
	<>	<>
	<> <>
Message-ID: <>

Michael Hudson wrote:
> There's one extremely significant example where the *value* of
> something impacts on the type of something else: functions.  The types
> of everything involved in str([1]) and len([1]) are the same but the
> results are different.  This shows up in PyPy's type annotation; most
> of the time we just track types indeed, but when something is called
> we need to have a pretty good idea of the potential values, too.
> Relevant to the point at hand?  No.  Apologies for wasting your time
> :)

Actually, I think it is relevant. I never thought about it this way,
but now that you mention it, you are right.

This demonstrates that the string argument to .encode is actually
a function name, at least the way it is implemented now. So
.encode("uu") and .encode("rot13") are *two* different methods,
instead of being a single method.

This brings me back to my original point: "rot13" should be a function,
not a parameter to some function. In essence, .encode reimplements
apply(), with the added feature of not having to pass the function
itself, but just its name.
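
A small sketch of that observation, assuming present-day Python 3: looking a codec up by name hands back a different function for each name, with different argument and result types.

```python
import codecs

# Looking up by name yields two unrelated functions.
rot13 = codecs.getencoder("rot13")
hex_encode = codecs.getencoder("hex_codec")

# Each returns (output, length_consumed), but the input and output types
# differ: text -> text for rot13, bytes -> bytes for hex.
rot13_out, _ = rot13("hello")
hex_out, _ = hex_encode(b"hello")
```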

Maybe this design results from a really deep understanding of

Namespaces are one honking great idea -- let's do more of those!


From martin at  Sat Feb 18 17:28:28 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 17:28:28 +0100
Subject: [Python-Dev] Stackless Python sprint at PyCon 2006
In-Reply-To: <>
References: <>
Message-ID: <>

Richard Tew wrote:
> If anyone on this list who is attending PyCon, has some time to spare
> during the sprint period and an interest in perhaps getting more
> familiar with Stackless, you would be more than welcome in joining us to
> help out.  Familiarity with the Python source code and its workings
> would be a great help in the work we hope to get done.  Especially
> participants with an interest in ensuring and testing that the porting
> done works on other platforms than those we will be developing on
> (Windows XP and Windows XP x64 edition).

If you are going to work on XP x64, make sure you have the latest
platform SDK installed on these machines. I plan to build AMD64
binaries with the platform SDK, not with VS 2005.


From talin at  Sat Feb 18 17:49:29 2006
From: talin at (Talin)
Date: Sat, 18 Feb 2006 08:49:29 -0800
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>
Message-ID: <>

skip at wrote:

>    talin> ... whereas with 'given' you can't be certain when to stop
>    talin> parsing the argument list.
>So require parens around the arglist:
>    (x*y given (x, y))
I would not be opposed to mandating the parens, and it's an easy enough 
change to make. The patch on SF lets you do it both ways, which will 
give people who are interested a chance to get a feel for the various 
alternatives.

I realize of course that this is a moot point. But perhaps I can help to 
winnow down the dozens of rejected lambda replacement proposals to just 
a few rejected lambda proposals :)

-- Talin

From mal at  Sat Feb 18 18:10:14 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 18:10:14 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() methods *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>> You seem to ignore this fact.
> I'm not ignoring the fact that you have explained this
> many times. I just fail to understand your explanations.

Feel free to ask questions.

> For example, you said at some point that codecs are not
> restricted to Unicode. However, I don't recall any
> explanation what the restriction *is*, if any restriction
> exists. No such restriction seems to be documented.

The codecs are not restricted w/r to the data types
they work on. It's up to the codecs to define which
data types are valid and which they take on input and

>> True. However, note that the .encode()/.decode() methods on
>> strings and Unicode narrow down the possible return types.
>> The corresponding .bytes methods should only allow bytes and
>> Unicode.
> I forgot that: what is the rationale for that restriction?

To assure that only those types can be returned from those
methods, ie. instances of basestring, which in return permits
type inference for those methods.

The codecs functions encode() and decode() don't have these
restrictions, and thus provide a generic interface to the
codec's encode and decode functions. It's up to the caller
to restrict the allowed encodings and as result the
possible input/output types.
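
A brief sketch of the difference as it stands in present-day Python 3 (an assumption; the behaviour in the 2.x releases discussed here differs in detail). The generic functions accept any registered codec, while the methods refuse codecs that are not text encodings:

```python
import codecs

# The generic functions accept any registered codec.
packed = codecs.encode(b"data", "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# The bytes method only allows codecs that turn bytes into text.
try:
    packed.decode("zlib_codec")
    method_allowed = True
except LookupError:
    method_allowed = False
```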

Marc-Andre Lemburg


From mal at  Sat Feb 18 18:24:46 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 18:24:46 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>>>>>> I'd suggest we keep codecs.lookup() the way it is and
>>>>>> instead add new functions to the codecs module, e.g.
>>>>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>>>> Changing the codec registration is not much of a problem:
>>>>>> we could simply allow 6-tuples to be passed into the
>>>>>> registry.
>>>>> OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples and the search functions must return 6-tuples.
>>>>> And we add codecs.getencoderobject() and codecs.getdecoderobject() as well as new classes codecs.StatefulEncoder and
>>>>> codecs.StatefulDecoder. What about old search functions that return 4-tuples?
>>>> The registry should then simply set the missing entries to None and the getencoderobject()/getdecoderobject() would then
>>>> have
>>>> to raise an error.
>>> Sounds simple enough and we don't lose backwards compatibility.
>>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>> +1, but I'd like to have a replacement for this, i.e. a function that returns all info the registry has about an encoding:
>>> 1. Name
>>> 2. Encoder function
>>> 3. Decoder function
>>> 4. Stateful encoder factory
>>> 5. Stateful decoder factory
>>> 6. Stream writer factory
>>> 7. Stream reader factory
>>> and if this is an object with attributes, we won't have any problems if we extend it in the future.
>> Shouldn't be a problem: just expose the registry dictionary
>> via the _codecs module.
>> The rest can then be done in a Python function defined in
>> using a CodecInfo class.
> This would require the Python code to call codecs.lookup() and then look into the codecs dictionary (normalizing the encoding
> name again). Maybe we should make a version of _PyCodec_Lookup() that allows 4- and 6-tuples available to Python and use that?
> The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 4-tuples.

Hmm, you're right: the dictionary may not have the requested codec
info yet (it's only used as cache) and only a call to _PyCodec_Lookup()
would fill it.

>>> BTW, if we change the API, can we fix the return value of the stateless functions? As the stateless function always
>>> encodes/decodes the complete string, returning the length of the string doesn't make sense.
>>> codecs.getencoder() and codecs.getdecoder() would have to continue to return the old variant of the functions, but
>>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>> No: you can still write stateless encoders or decoders that do
>> not process the whole input string. Just because we don't have
>> any of those in Python, doesn't mean that they can't be written
>> and used. A stateless codec might want to leave the work
>> of buffering bytes at the end of the input data which cannot
>> be processed to the caller.
> But what would the caller do with that info? It can't retry encoding/decoding the rejected input, because the state of the codec
> has been thrown away already.

This depends a lot on the nature of the codec. It may well be
possible to work on chunks of input data in a stateless way,
e.g. say you have a string of 4-byte hex values, then the decode
function would be able to work on 4 bytes each and let the caller
buffer any remaining bytes for the next call. There'd be no need for
keeping state in the decoder function.
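
A minimal sketch of that arrangement in present-day Python 3. The function decode_hex4() and the sample chunks are made up for illustration; the point is only that the returned length tells the caller what to buffer:

```python
def decode_hex4(data):
    """Stateless decoder: turn complete 4-character hex groups into ints.

    Returns (values, consumed); trailing characters that do not form a
    complete group are left for the caller to buffer.
    """
    usable = len(data) - len(data) % 4
    values = [int(data[i:i + 4], 16) for i in range(0, usable, 4)]
    return values, usable


# The caller, not the decoder, carries leftover input between calls.
pending = ""
decoded = []
for chunk in ["0041004", "2", "00430"]:
    pending += chunk
    values, consumed = decode_hex4(pending)
    decoded.extend(values)
    pending = pending[consumed:]
```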

>> It is also possible to write
>> stateful codecs on top of such stateless encoding and decoding
>> functions.
> That's what the codec helper functions from Python/_codecs.c are for.

I'm not sure what you mean here.

> Anyway, I've started implementing a patch that just adds codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
> UTF-16-LE and UTF-16-BE are already working.

Nice :-)

Marc-Andre Lemburg


From ncoghlan at  Sat Feb 18 19:11:43 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 19 Feb 2006 04:11:43 +1000
Subject: [Python-Dev] Adventures with ASTs - Inline Lambda
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Talin wrote:
> skip at wrote:
>>    talin> ... whereas with 'given' you can't be certain when to stop
>>    talin> parsing the argument list.
>> So require parens around the arglist:
>>    (x*y given (x, y))
>> Skip
> I would not be opposed to mandating the parens, and it's an easy enough 
> change to make. The patch on SF lets you do it both ways, which will 
> give people who are interested a chance to get a feel for the various 
> alternatives.

Another ambiguity is that when they're optional it is unclear whether or not 
adding them means the callable now expects a tuple argument (i.e., doubled 
parens at the call site). If they're mandatory, then it is clear that only 
doubled parentheses at the definition point require doubled parentheses at the 
call site (this is, not coincidentally, exactly the same rule as applies for 
normal functions).

> I realize of course that this is a moot point. But perhaps I can help to 
> winnow down the dozens of rejected lambda replacement proposals to just 
> a few rejected lambda proposals :)



Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From martin at  Sat Feb 18 19:19:55 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 19:19:55 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>
	<> <>
Message-ID: <>

M.-A. Lemburg wrote:
>>>True. However, note that the .encode()/.decode() methods on
>>>strings and Unicode narrow down the possible return types.
>>>The corresponding .bytes methods should only allow bytes and
>>>Unicode.
>>I forgot that: what is the rationale for that restriction?
> To assure that only those types can be returned from those
> methods, ie. instances of basestring, which in return permits
> type inference for those methods.

Hmm. So it is for type inference????
Where is that documented?

This looks pretty inconsistent. Either codecs can give arbitrary
return types, then .encode/.decode should also be allowed to
give arbitrary return types, or codecs should be restricted.
What's the point of first allowing a wide interface, and then
narrowing it?

Also, if type inference is the goal, what is the point in allowing
two result types?


From foom at  Sat Feb 18 19:44:01 2006
From: foom at (James Y Knight)
Date: Sat, 18 Feb 2006 13:44:01 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>		<>		<>		<>	<>
	<> <>
Message-ID: <>

On Feb 18, 2006, at 2:33 AM, Martin v. Löwis wrote:
> I don't understand. In the rationale of PEP 333, it says
> "The rationale for requiring a dictionary is to maximize portability
> between servers. The alternative would be to define some subset of a
> dictionary's methods as being the standard and portable interface."
> That rationale is not endangered: if the environment continues to
> be a dict exactly, servers continue to be guaranteed what precise
> set of operations is available on the environment.

Yes it is endangered.

> Well, as you say: you get a KeyError if there is an error with the  
> key.
> With a default_factory, there isn't normally an error with the key.

But there should be. Consider the case of two servers. One which  
takes all the items out of the dictionary (using items()) and puts  
them in some other data structure. Then it checks if the "Date"  
header has been set. It was not, so it adds it. Consider another  
similar server which checks if the "Date" header has been set on the  
dict passed in by the user. The default_factory then makes one up.  
Different behavior due to internal implementation details of how the  
server uses the dict object, which is what the restriction to  
_exactly_ dict prevents.
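
A small sketch of the two-server scenario, assuming present-day Python 3. It uses collections.defaultdict as a stand-in for a dict with a default factory; the proposal under discussion here concerned dict itself, so this is an approximation:

```python
from collections import defaultdict

# Stand-in for a header dict that makes up missing values.
headers = defaultdict(lambda: "made-up value")
headers["Content-Type"] = "text/plain"

# Server A copies the items first, then checks for the header.
copied = dict(headers.items())
a_sees_date = "Date" in copied

# Server B indexes the dict it was handed; the factory invents a value.
b_value = headers["Date"]
b_sees_date = "Date" in headers

# An exact type check, as WSGI requires, rejects the subclass.
is_exact_dict = type(headers) is dict
```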

Consider another server which takes the dict instance and transports  
it across thread boundaries, from the wsgi-app's thread to the main  
server thread. Because WSGI specifies that you can only use 'dict',  
and the server checked that type(obj) == dict, it is guaranteed that  
using the dict won't run thread-unsafe code. That is now broken,  
since dict.__getitem__ can now invoke arbitrary user code. That is a  
major change.


From g.brandl at  Sat Feb 18 19:55:52 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 18 Feb 2006 19:55:52 +0100
Subject: [Python-Dev] The decorator(s) module
In-Reply-To: <>
References: <dsj0p7$tk3$>	<>	<dt552b$mnq$>	<>	<>	<>	<dt6mem$bu$>
Message-ID: <dt7qjo$25a$>

Alex Martelli wrote:
> On Feb 18, 2006, at 12:38 AM, Georg Brandl wrote:
>> Guido van Rossum wrote:
>>> WFM. Patch anyone?
>> Done.
> I reviewed the patch and added a comment on it,  but since the point  
> may be controversial I had better air it here for discussion: in 2.4,  
> property(fset=acallable) does work (maybe silly, but it does make a  
> write-only property) -- with the patch as given, it would stop  
> working (due to attempts to get __doc__ from the None value of fget);  
> I think we should ensure it keeps working (and add a unit test to  
> that effect).

Yes, of course. Thanks for pointing that out.

I updated the patch and hope it's now bullet-proof when no fget argument
is given.


From martin at  Sat Feb 18 20:06:55 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 18 Feb 2006 20:06:55 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>		<>		<>		<>	<>
	<> <>
Message-ID: <>

James Y Knight wrote:
> But there should be. Consider the case of two servers. One which  takes
> all the items out of the dictionary (using items()) and puts  them in
> some other data structure. Then it checks if the "Date"  header has been
> set. It was not, so it adds it. Consider another  similar server which
> checks if the "Date" header has been set on the  dict passed in by the
> user. The default_factory then makes one up.  Different behavior due to
> internal implementation details of how the  server uses the dict object,
> which is what the restriction to  _exactly_ dict prevents.

Right. I would claim that this is an artificial example: you can't
provide a HTTP_DATE value in a default_factory implementation, since
you don't know what the key is.

However, you are now making up a different rationale from the one the
PEP specifies: The PEP says that you need an "exact dict" so that
everybody knows precisely how the  dictionary behaves; instead of having
to define which precise subset of the dict API  is to be used.

*That* goal is still achieved: everybody knows that the dict might
have an on_missing/default_factory implementation. So to find out
whether HTTP_DATE has a value (which might be defaulted), you need
to invoke d['HTTP_DATE'].

> Consider another server which takes the dict instance and transports  it
> across thread boundaries, from the wsgi-app's thread to the main  server
> thread. Because WSGI specifies that you can only use 'dict',  and the
> server checked that type(obj) == dict, it is guaranteed that  using the
> dict won't run thread-unsafe code. That is now broken,  since
> dict.__getitem__ can now invoke arbitrary user code. That is a  major
> change.

Not at all. dict.__getitem__ could always invoke arbitrary user code,
through __hash__.
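
The point about __hash__ can be checked directly. What follows is a
minimal sketch in Python 3 syntax, assuming a made-up key class named
NoisyKey that is not part of the thread: a lookup on an exact dict
still runs the key's __hash__, which is ordinary user code.

```python
calls = []


class NoisyKey:
    """Hypothetical key type whose __hash__ records every invocation."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        calls.append(self.name)  # arbitrary user code runs here
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, NoisyKey) and self.name == other.name


d = {NoisyKey("Date"): "Sat, 18 Feb 2006"}  # __hash__ runs on insertion
calls.clear()
value = d[NoisyKey("Date")]                 # and again on lookup
```

Here type(d) is exactly dict, yet the lookup leaves an entry in calls.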


From rhamph at  Sat Feb 18 20:06:59 2006
From: rhamph at (Adam Olsen)
Date: Sat, 18 Feb 2006 12:06:59 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/18/06, James Y Knight <foom at> wrote:
> On Feb 18, 2006, at 2:33 AM, Martin v. L?wis wrote:
> > Well, as you say: you get a KeyError if there is an error with the
> > key.
> > With a default_factory, there isn't normally an error with the key.
> But there should be. Consider the case of two servers. One which
> takes all the items out of the dictionary (using items()) and puts
> them in some other data structure. Then it checks if the "Date"
> header has been set. It was not, so it adds it. Consider another
> similar server which checks if the "Date" header has been set on the
> dict passed in by the user. The default_factory then makes one up.
> Different behavior due to internal implementation details of how the
> server uses the dict object, which is what the restriction to
> _exactly_ dict prevents.

It just occurred to me: what effect does this have on repr?  Does it
attempt to store the default_factory in the representation, or does it
remove it?  Is it even possible to store a reference to a builtin such
as list and have eval restore it?
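
For reference, the collections.defaultdict type that later shipped in
the standard library keeps the factory in its repr, in a form that eval
cannot read back. The sketch below only demonstrates that behaviour on
a current Python 3; it says nothing about what the patch under
discussion did at the time.

```python
from collections import defaultdict

dd = defaultdict(list)
dd["abc"].append(1)

# The repr embeds the repr of the factory, e.g. <class 'list'>,
# which is not a valid Python expression.
text = repr(dd)

try:
    eval(text, {"defaultdict": defaultdict})
    round_trips = True
except SyntaxError:
    round_trips = False
```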

Adam Olsen, aka Rhamphoryncus

From mal at  Sat Feb 18 20:38:21 2006
From: mal at (M.-A. Lemburg)
Date: Sat, 18 Feb 2006 20:38:21 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

Martin v. L?wis wrote:
> M.-A. Lemburg wrote:
>>>> True. However, note that the .encode()/.decode() methods on
>>>> strings and Unicode narrow down the possible return types.
>>>> The corresponding .bytes methods should only allow bytes and
>>>> Unicode.
>>> I forgot that: what is the rationale for that restriction?
>> To assure that only those types can be returned from those
>> methods, ie. instances of basestring, which in return permits
>> type inference for those methods.
> Hmm. So it's for type inference????
> Where is that documented?

Somewhere in the python-dev mailing list archives ;-)

Seriously, we should probably add this to the documentation.

> This looks pretty inconsistent. Either codecs can give arbitrary
> return types, then .encode/.decode should also be allowed to
> give arbitrary return types, or codecs should be restricted.


As I've said before: the .encode() and .decode() methods
are convenience methods to interface to codecs which take
string/Unicode on input and create string/Unicode output.

> What's the point of first allowing a wide interface, and then
> narrowing it?

The codec interface is an abstract interface. It is flexible
enough to allow codecs to define possible input and output
types while being strict about the method names and signatures.

Much like the file interface in Python, the copy protocol
or the pickle interface.

> Also, if type inference is the goal, what is the point in allowing
> two result types?

I'm not sure I understand the question: type inference is about
being able to infer the types of (among other things) function
return objects. This is what the restriction guarantees - much
like int() guarantees that you get either an integer or a long.

Marc-Andre Lemburg


From jcarlson at  Sat Feb 18 20:46:40 2006
From: jcarlson at (Josiah Carlson)
Date: Sat, 18 Feb 2006 11:46:40 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <> <>
Message-ID: <>

Ron Adam <rrr at> wrote:
> Josiah Carlson wrote:
> > Again, the problem is ambiguity; what does bytes.recode(something) mean?
> > Are we encoding _to_ something, or are we decoding _from_ something? 
> This was just an example of one way that might work, but here are my 
> thoughts on why I think it might be good.
> In this case, the ambiguity is reduced as far as the encoding and 
> decodings opperations are concerned.)
>       somestring = encodings.tostr( someunicodestr, 'latin-1')
> It's pretty clear what is happening to me.
>      It will encode to a string an object, named someunicodestr, with 
> the 'latin-1' encoder.

But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
unicode(..., 'latin-1')?

What about string transformations:
    somestring = encodings.tostr(somestr, 'base64')

How do we get that back?  encodings.tostr() again is completely
ambiguous, str(somestring, 'base64') seems a bit awkward (switching
namespaces)?

> And also result in clear errors if the specified encoding is
> unavailable, and if it is, if it's not compatible with the given 
> *someunicodestr* obj type.
> Further hints could be gained by.
>      help(encodings.tostr)
> Which could result in... something like...
>      """
>      encoding.tostr( <string|unicode>, <encoder> ) -> string
>      Encode a unicode string using a encoder codec to a
>      non-unicode string or transform a non-unicode string
>      to another non-unicode string using an encoder codec.
>      """
> And if that's not enough, then help(encodings) could give more clues. 
> These steps would be what I would do. And then the next thing would be 
> to find the python docs entry on encodings.
> Placing them in encodings seems like a fairly good place to look for 
> these functions if you are working with encodings.  So I find that just 
> as convenient as having them be string methods.
> There is no intermediate default encoding involved above, (the bytes 
> object is used instead), so you wouldn't get some of the messages the 
> present system results in when ascii is the default.
> (Yes, I know it won't when P3K is here also)
> > Are we going to need to embed the direction in the encoding/decoding
> > name (to_base64, from_base64, etc.)?  That doesn't any better than
> > binascii.b2a_base64 .  
> No, that's why I suggested two separate lists (or dictionaries might be 
> better).  They can contain the same names, but the lists they are in 
> determine the context and point to the needed codec.  And that step is 
> abstracted out by putting it inside the encodings.tostr() and 
> encodings.tounicode() functions.
> So either function would call 'base64' from the correct codec list and 
> get the correct encoding or decoding codec it needs.

Either the API you have described is incomplete, you haven't noticed the
directional ambiguity you are describing, or I have completely lost it.

> > What about .reencode and .redecode?  It seems as
> > though the 're' added as a prefix to .encode and .decode makes it
> > clearer that you get the same type back as you put in, and it is also
> > unambiguous to direction.
> But then wouldn't we end up with multitude of ways to do things?
>      s.encode(codec) == s.redecode(codec)
>      s.decode(codec) == s.reencode(codec)
>      unicode(s, codec) == s.decode(codec)
>      str(u, codec) == u.encode(codec)
>      str(s, codec) == s.encode(codec)
>      unicode(s, codec) == s.reencode(codec)
>      str(u, codec) == s.redecode(codec)
>      str(s, codec) == s.redecode(codec)
> Umm .. did I miss any?  Which ones would you remove?
> Which ones of those will succeed with which codecs?

I must not be expressing myself very well.

Right now:
    s.encode() -> s
    s.decode() -> s, u
    u.encode() -> s, u
    u.decode() -> u

Martin et al's desired change to encode/decode:
    s.decode() -> u
    u.encode() -> s

No others.

What my thoughts on .reencode() and .redecode() would get you given
Martin et al's desired change:
    s.reencode() -> s (you get encoded strings as strings)
    s.redecode() -> s (you get decoded strings as strings)
    u.reencode() -> u (you get encoded unicode as unicode)
    u.redecode() -> u (you get decoded unicode as unicode)

If one wants to go from unicode to string, one uses .encode(). If one
wants to go from string to unicode, one uses .decode().  If one wants to
keep their type unchanged, but encode or decode the data/text, one would
use .reencode() and .redecode(), depending on whether their source is an
encoded block of data, or the original data they want to encode.
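
A rough sketch of the type-preserving idea, written for Python 3 where
"string" means bytes and "unicode" means str. The function names
reencode and redecode are taken from the proposal, but these bodies are
only an assumption about how they might behave; they lean on the
existing codecs.encode() and codecs.decode() functions and reject any
codec that would change the type.

```python
import codecs


def reencode(data, codec):
    """Hypothetical: encode data, insisting the result keeps data's type."""
    result = codecs.encode(data, codec)
    if type(result) is not type(data):
        raise TypeError("%r changes the type; use .encode() instead" % codec)
    return result


def redecode(data, codec):
    """Hypothetical: decode data, insisting the result keeps data's type."""
    result = codecs.decode(data, codec)
    if type(result) is not type(data):
        raise TypeError("%r changes the type; use .decode() instead" % codec)
    return result


packed = reencode(b"hello", "base64")     # bytes in, bytes out
unpacked = redecode(packed, "base64")     # and back again
rotated = reencode("hello", "rot_13")     # str in, str out
```

With these definitions reencode("hello", "utf-8") raises TypeError,
because that codec turns text into bytes and so belongs to .encode().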

The other bonus is that if given .reencode() and .redecode(), one can
quite easily verify that the source is possible as a source, and that
you would get back the proper type.  How this would occur behind the
scenes is beyond the scope of this discussion, but it seems to me to be
easy, given what I've read about the current mechanism.

Whether the constructors for the str and unicode do their own codec
transformations is beside the point.

> The method bytes.recode(), always does a byte transformation which can 
> be almost anything.  It's the context bytes.recode() is used in that 
> determines what's happening.  In the above cases, it's using an encoding 
> transformation, so what it's doing is precisely what you would expect by 
> it's context.

Indeed, there is a translation going on, but it is not clear as to
whether you are encoding _to_ something or _from_ something.  What does
s.recode('base64') mean?  Are you encoding _to_ base64 or _from_ base64? 
That's where the ambiguity lies.
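
The ambiguity is easy to show with the standard base64 module in
Python 3. The input below is an arbitrary example chosen because it is
valid both as raw data and as base64 text, so both directions succeed
and give different answers.

```python
import base64

data = b"abcd"                       # valid raw bytes AND valid base64 text

as_encoded = base64.b64encode(data)  # reading 1: data is raw, go *to* base64
as_decoded = base64.b64decode(data)  # reading 2: data is base64, go *from* it
```

Nothing in a bare 'base64' argument says which of the two results the
caller wanted.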

> There isn't a bytes.decode(), since that's just another transformation. 
> So only the one method is needed.  Which makes it easer to learn.

But ambiguous.

> > The question remains: is str.decode() returning a string or unicode
> > depending on the argument passed, when the argument quite literally
> > names the codec involved, difficult to understand?  I don't believe so;
> > am I the only one?
> Using help(str.decode) and help(str.encode) gives:
>       S.decode([encoding[,errors]]) -> object
>       S.encode([encoding[,errors]]) -> object
> These look an awful lot alike.  The descriptions are nearly identical as 
> well.  The Python docs just reproduce (or close to) the doc strings with 
> only a very small amount of additional words.
> Learning how the current system works comes awfully close to reverse 
> engineering.  Maybe I'm overstating it a bit, but I suspect many end up 
> doing exactly that in order to learn how Python does it.

Again, we _need_ better documentation, regardless of whether or when the
removal of some or all .encode()/.decode() methods happen.

 - Josiah

From walter at  Sat Feb 18 22:08:19 2006
From: walter at (=?iso-8859-1?Q?Walter_D=F6rwald?=)
Date: Sat, 18 Feb 2006 22:08:19 +0100 (CET)
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<r01050400-1039-7EC926449D9911DA8736001124365170@>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> Walter D?rwald wrote:
>> M.-A. Lemburg wrote:
>>> Walter D?rwald wrote:
>>>> [...]
>>>>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
>>>> +1, but I'd like to have a replacement for this, i.e. a function that returns all info the registry has about an encoding:
>>>> 1. Name
>>>> 2. Encoder function
>>>> 3. Decoder function
>>>> 4. Stateful encoder factory
>>>> 5. Stateful decoder factory
>>>> 6. Stream writer factory
>>>> 7. Stream reader factory
>>>> and if this is an object with attributes, we won't have any problems if we extend it in the future.
>>> Shouldn't be a problem: just expose the registry dictionary
>>> via the _codecs module.
>>> The rest can then be done in a Python function defined in
>>> using a CodecInfo class.
>> This would require the Python code to call codecs.lookup() and then look into the codecs dictionary (normalizing the
>> encoding name again). Maybe we should make a version of __PyCodec_Lookup() that allows 4- and 6-tuples available to Python
>> and use that? The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 4-tuples.
> Hmm, you're right: the dictionary may not have the requested codec info yet (it's only used as cache) and only a call to
> _PyCodec_Lookup() would fill it.

I'm now trying a different approach: codecs.lookup() returns a subclass of tuple. We could deprecate calling __getitem__() in
2.5/2.6 and then remove the tuple subclassing later.
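
For illustration, this is roughly what the tuple-subclass approach looks
like from the caller's side. The sketch uses codecs.lookup() as it
behaves in current Python 3, where the result is a CodecInfo object; it
is not a reconstruction of the patch being written here.

```python
import codecs

info = codecs.lookup("latin-1")

# New style: named attributes on the returned object.
encoded, consumed = info.encode("abc")
has_stateful_decoder = info.incrementaldecoder is not None

# Old style: the object is still a 4-tuple, so indexing keeps working.
legacy_encode = info[0]
legacy_result = legacy_encode("abc")
```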
>>>> BTW, if we change the API, can we fix the return value of the stateless functions? As the stateless function always
>>>> encodes/decodes the complete string, returning the length of the string doesn't make sense. codecs.getencoder() and
>>>> codecs.getdecoder() would have to continue to return the old variant of the functions, but
>>>> codecs.getinfo("latin-1").encoder would be the new encoding function.
>>> No: you can still write stateless encoders or decoders that do
>>> not process the whole input string. Just because we don't have
>>> any of those in Python, doesn't mean that they can't be written and used. A stateless codec might want to leave the work
>>> of buffering bytes at the end of the input data which cannot
>>> be processed to the caller.
>> But what would the call do with that info? It can't retry encoding/decoding the rejected input, because the state of the
>> codec has been thrown away already.
> This depends a lot on the nature of the codec. It may well be
> possible to work on chunks of input data in a stateless way,
> e.g. say you have a string of 4-byte hex values, then the decode
> function would be able to work on 4 bytes each and let the caller
> buffer any remaining bytes for the next call. There'd be no need for keeping state in the decoder function.

So incomplete byte sequences would be silently ignored.

>>> It is also possible to write
>>> stateful codecs on top of such stateless encoding and decoding
>>> functions.
>> That's what the codec helper functions from Python/_codecs.c are for.
> I'm not sure what you mean here.

_codecs.utf_8_decode() etc. use (result, count) tuples as the return value, because those functions are the building blocks of
the codecs themselves.
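
A small sketch of why the count matters for such building blocks, using
codecs.utf_8_decode() as exposed in current Python 3 (its third
argument, final, is the part assumed here to match the helper being
discussed). The caller keeps whatever the function did not consume and
feeds it back in with the next chunk.

```python
import codecs

chunk = b"caf\xc3"  # stops halfway through the two-byte sequence for U+00E9

# final=False: an incomplete trailing sequence is left unconsumed.
text, consumed = codecs.utf_8_decode(chunk, "strict", False)
leftover = chunk[consumed:]

# The caller buffers the leftover byte and retries with more input.
text2, consumed2 = codecs.utf_8_decode(leftover + b"\xa9", "strict", True)
```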
>> Anyway, I've started implementing a patch that just adds codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
>> UTF-16, UTF-16-LE and UTF-16-BE are already working.
> Nice :-) is updated now too. The rest should be manageble too. I'll leave updating the CJKV codecs to Hye-Shik though.

   Walter D?rwald

From bh at  Sat Feb 18 22:41:07 2006
From: bh at (Bernhard Herzog)
Date: Sat, 18 Feb 2006 22:41:07 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
	(Guido van Rossum's message of "Fri, 17 Feb 2006 14:15:39 -0800")
References: <>
Message-ID: <>

"Guido van Rossum" <guido at> writes:

> If the __getattr__()-like operation that supplies and inserts a
> dynamic default was a separate method, we wouldn't have this problem.

Why implement it in the dictionary type at all?  If, for instance, the
default value functionality were provided as a decorator, it could be
used with all kinds of mappings.  I.e. you could have something along
these lines:

class defaultwrapper(object):

    def __init__(self, base, factory):
        self.__base = base
        self.__factory = factory

    def __getitem__(self, key):
        try:
            return self.__base[key]
        except KeyError:
            # Missing key: build a default, store it, and return it.
            value = self.__factory()
            self.__base[key] = value
            return value

    def __getattr__(self, attr):
        return getattr(self.__base, attr)

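def test():
    dd = defaultwrapper({}, list)
    # Populate through the wrapper so the factory creates the lists.
    dd["abc"].append(1)
    dd["abc"].append(2)
    dd["def"].append(1)
    assert sorted(dd.keys()) == ["abc", "def"]
    assert sorted(dd.values()) == [[1], [1, 2]]
    assert sorted(dd.items()) == [("abc", [1, 2]), ("def", [1])]
    assert dd.has_key("abc")
    assert not dd.has_key("xyz")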

The precise semantics would have to be determined yet, of course.

> OTOH most reviewers here seem to appreciate on_missing() as a way to
> do various other ways of alterning a dict's __getitem__() behavior
> behind a caller's back -- perhaps it could even be (ab)used to
> implement case-insensitive lookup.

Case-insensitive lookup could be implemented with another
wrapper/decorator.  If you need both case-insensitivity and a default
value, you can easily stack the decorators.
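
Stacking might look something like the sketch below, in Python 3
syntax. DefaultWrapper restates the wrapper idea with single-underscore
attributes, and LowerKeys is a hypothetical second wrapper invented for
this example; neither is meant as the definitive semantics.

```python
class DefaultWrapper:
    """Supplies and stores a default for missing keys."""

    def __init__(self, base, factory):
        self._base = base
        self._factory = factory

    def __getitem__(self, key):
        try:
            return self._base[key]
        except KeyError:
            value = self._factory()
            self._base[key] = value
            return value

    def __getattr__(self, attr):
        # Everything else is delegated to the wrapped mapping.
        return getattr(self._base, attr)


class LowerKeys:
    """Hypothetical wrapper that folds string keys to lower case."""

    def __init__(self, base):
        self._base = base

    def __getitem__(self, key):
        return self._base[key.lower()]

    def __setitem__(self, key, value):
        self._base[key.lower()] = value

    def __getattr__(self, attr):
        return getattr(self._base, attr)


headers = DefaultWrapper(LowerKeys({}), list)
headers["Accept"].append("text/html")
headers["ACCEPT"].append("text/plain")
```

Both appends land in the same list, stored under the key "accept" in
the innermost dict.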


Intevation GmbH                       

From rrr at  Sat Feb 18 23:15:17 2006
From: rrr at (Ron Adam)
Date: Sat, 18 Feb 2006 16:15:17 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>> It seems to me it's a far more useful idea than encoding and decoding by 
>> overloading and could do both and more.  It has a lot of potential to be 
>> an intermediate step for encoding as well as being used for many other 
>> translations to byte data.
>> I think I would prefer that encode and decode be just functions with 
>> well defined names and arguments instead of being methods or arguments 
>> to string and Unicode types.
>> I'm not sure on exactly how this would work. Maybe it would need two 
>> sets of encodings, ie.. decoders, and encoders.  An exception would be
>> given if it wasn't found for the direction one was going in.
> Here's an idea I don't think I've seen before:
> bytes.recode(b, src_encoding, dest_encoding)
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.

Yes, but it's not just the encodings that are implicit, it is also the
types:

    s.encode(codec)  # explicit source type, ? dest type
    s.decode(codec)  # explicit source type, ? dest type

    encodings.tostr(obj, codec) # implicit *known* source type
                                # explicit dest type

    encodings.tounicode(obj, codec) # implicit *known* source type
                                    # explicit dest type

In this case the source is implicit, but there can be a well defined 
check to validate the source type against the codec being used.  It's my 
feeling the user *knows* what he already has, and so it's more important 
that the resulting object type is explicit.

In your suggestion...

    bytes.recode(b, src_encoding, dest_encoding)

Here the encodings are both explicit, but the source and the
destination of the bytes are not.  Since it works on bytes, they
could have come from anywhere, and after the translation they would
then be cast to the type the user *thinks* it should result in.  That
is a source of errors that would likely pass silently.
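
To make both sides concrete, here is a sketch of the two-encoding
signature as a free function in Python 3. The name recode comes from
the suggestion, but the body is an assumption (decode with the source
encoding, then encode with the destination). The last call shows the
kind of silent error described above: the wrong source encoding is
accepted without complaint.

```python
def recode(data, src_encoding, dest_encoding):
    """Hypothetical stand-in for the proposed bytes.recode()."""
    text = data.decode(src_encoding)    # source encoding stated up front
    return text.encode(dest_encoding)   # destination encoding stated too


latin = "caf\u00e9".encode("latin-1")
utf8 = recode(latin, "latin-1", "utf-8")        # correct conversion

# Wrong source encoding: latin-1 accepts any byte, so nothing fails
# and the result is quietly garbled.
garbled = recode(utf8, "latin-1", "utf-8")
```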

The way I see it, the bytes type should be a lower level object that
doesn't care what byte transformation it does, i.e. they are all one-way
byte-to-byte transformations determined by context.  And it should have
the capability to read from and write to types without translating in
the same step.  Keep it simple.

Then it could be used as a lower level byte translator to implement 
encodings and other translations in encoding methods or functions 
instead of trying to make it replace the higher level functionality.


From thomas at  Sat Feb 18 23:33:15 2006
From: thomas at (Thomas Wouters)
Date: Sat, 18 Feb 2006 23:33:15 +0100
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
	in	coordination with pep 349?]
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:

> It's by no means a Perl attitude.

In your eyes, perhaps. It certainly feels that way to me (or I wouldn't have
said it :). Perl happens to be full of general constructs that were added
because they were easy to add, or they were useful in edge cases. The
encode/decode methods remind me of that, even though I fully understand the
reasoning behind it, and the elegance of the implementation.

> The main reason is symmetry and the fact that strings and Unicode
> should be as similar as possible in order to simplify the task of
> moving from one to the other.

Yes, and this is a design choice I don't agree with. They're different
types. They do everything similarly, except when they are mixed together
(unicode takes precedence, in general, encoding the bytestring from the
default encoding.) Going from one to the other isn't symmetric, though. I
understand that you disagree; the disagreement is on the fundamental choice
of allowing 'encode' and 'decode' to do *more* than going from and to
unicode. I regret that decision, not the decision to make encode and decode
symmetric (which makes sense, after the decision to overgeneralize
encode/decode is made.)

> >  - The return value for the non-unicode encodings depends on the value of
> >    the encoding argument.

> Not really: you'll always get a basestring instance.

Which is not a particularly useful distinction, since in any real world
application, you have to be careful not to mix unicode with (non-ascii)
bytestrings. The only way to reliably deal with unicode is to have it
well-contained (when migrating an application from using bytestrings to
using unicode) or to use unicode everywhere, decoding/encoding at
entrypoints. Containment is hard to achieve.

> Still, I believe that this is an educational problem. There are
> a couple of gotchas users will have to be aware of (and this is
> unrelated to the methods in question):
> * "encoding" always refers to transforming original data into
>   a derived form
> * "decoding" always refers to transforming a derived form of
>   data back into its original form
> * for Unicode codecs the original form is Unicode, the derived
>   form is, in most cases, a string
> As a result, if you want to use a Unicode codec such as utf-8,
> you encode Unicode into a utf-8 string and decode a utf-8 string
> into Unicode.
> Encoding a string is only possible if the string itself is
> original data, e.g. some data that is supposed to be transformed
> into a base64 encoded form.
> Decoding Unicode is only possible if the Unicode string itself
> represents a derived form, e.g. a sequence of hex literals.

Most of these gotchas would not have been gotchas had encode/decode only
been usable for unicode encodings.

> > That is why I disagree with the hypergeneralization of the encode/decode
> > methods
> That's because you only look at one specific task.

> Codecs also unify the various interfaces to common encodings
> such as base64, uu or zip which are not Unicode related.

No, I think you misunderstand. I object to the hypergeneralization of the
*encode/decode methods*, not the codec system. I would have been fine with
another set of methods for non-unicode transformations. Although I would
have been even more fine if they got their encoding not as a string, but as,
say, a module object, or something imported from a module.

Not that I think any of this matters; we have what we have and I'll have to
live with it ;)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From tjreedy at  Sat Feb 18 23:48:10 2006
From: tjreedy at (Terry Reedy)
Date: Sat, 18 Feb 2006 17:48:10 -0500
Subject: [Python-Dev] bytes.from_hex()
References: <><>
Message-ID: <dt887b$abi$>

"Josiah Carlson" <jcarlson at> wrote in message 
news:20060218005534.5FA8.JCARLSON at

> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> Are we encoding _to_ something, or are we decoding _from_ something?
> Are we going to need to embed the direction in the encoding/decoding
> name (to_base64, from_base64, etc.)?

To me, that seems simple and clear.  b.recode('from_base64') obviously 
requires that b meet the restrictions of base64.  Similarly for 'from_hex'.
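
One way this could look, sketched in Python 3 with a hypothetical
registry named TRANSFORMS and a free function recode standing in for
the method. The direction-carrying names come from the thread; the
dispatch table and the choice of standard-library functions behind each
name are assumptions.

```python
import base64
import binascii

# Hypothetical registry: the direction is part of each transform's name.
TRANSFORMS = {
    "to_base64": base64.b64encode,
    "from_base64": base64.b64decode,
    "to_hex": binascii.hexlify,
    "from_hex": binascii.unhexlify,
}


def recode(data, name):
    """Apply the named byte-to-byte transform to data."""
    try:
        transform = TRANSFORMS[name]
    except KeyError:
        raise LookupError("unknown transform: %r" % (name,))
    return transform(data)


hexed = recode(b"hi", "to_hex")
plain = recode(hexed, "from_hex")
```

Input that does not meet the restrictions of the named format, such as
recode(b"zz", "from_hex"), is rejected with binascii.Error, which is a
ValueError subclass.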

> That doesn't any better than binascii.b2a_base64

I think 'from_base64' is *much* better.  I think there are now 4 
string-to-string transform modules that do similar things.  Not optimal to 

> What about .reencode and .redecode?  It seems as
> though the 're' added as a prefix to .encode and .decode makes it
> clearer that you get the same type back as you put in, and it is also
> unambiguous to direction.

To me, the 're' prefix is awkward, confusing, and misleading.

Terry J. Reedy

From oliphant.travis at  Sun Feb 19 00:16:02 2006
From: oliphant.travis at (Travis E. Oliphant)
Date: Sat, 18 Feb 2006 16:16:02 -0700
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>	<dt5mso$h0c$>	<>	<dt5tot$437$>	<>	<dt673u$phm$>	<>
Message-ID: <>

Martin v. L?wis wrote:
> Neal Norwitz wrote:
>>I suppose that might be nice, but would require configure magic.  I'm
>>not sure how it could be done on Windows.
> Contributions are welcome. On Windows, it can be hard-coded.
> Actually, something like
> #else
> #error What is size_t equal to?
> #endif
> might work.

Why not just

#if SIZEOF_SIZE_T == 2
#define PY_SSIZE_T_MAX 0x7fff
#elif SIZEOF_SIZE_T == 4
#define PY_SSIZE_T_MAX 0x7fffffff
#elif SIZEOF_SIZE_T == 8
#define PY_SSIZE_T_MAX 0x7fffffffffffffff
#elif SIZEOF_SIZE_T == 16
#define PY_SSIZE_T_MAX 0x7fffffffffffffffffffffffffffffff
#endif


From pje at  Sun Feb 19 00:34:59 2006
From: pje at (Phillip J. Eby)
Date: Sat, 18 Feb 2006 18:34:59 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

At 01:44 PM 02/18/2006 -0500, James Y Knight wrote:
>On Feb 18, 2006, at 2:33 AM, Martin v. L?wis wrote:
> > I don't understand. In the rationale of PEP 333, it says
> > "The rationale for requiring a dictionary is to maximize portability
> > between servers. The alternative would be to define some subset of a
> > dictionary's methods as being the standard and portable interface."
> >
> > That rationale is not endangered: if the environment continues to
> > be a dict exactly, servers continue to be guaranteed what precise
> > set of operations is available on the environment.
>Yes it is endangered.

So we'll update the spec to say you can't use a dict that has the default 
set.  It's not reasonable to expect that language changes might not require 
updates to a PEP.  Certainly, we don't have to worry about being backward 
compatible when it's only Python 2.5 that's affected by the change.  :)

From rrr at  Sun Feb 19 00:56:02 2006
From: rrr at (Ron Adam)
Date: Sat, 18 Feb 2006 17:56:02 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <> <>
Message-ID: <>

Josiah Carlson wrote:
> Ron Adam <rrr at> wrote:
>> Josiah Carlson wrote:
> [snip]
>>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
>>> Are we encoding _to_ something, or are we decoding _from_ something? 
>> This was just an example of one way that might work, but here are my 
>> thoughts on why I think it might be good.
>> In this case, the ambiguity is reduced as far as the encoding and 
>> decodings opperations are concerned.)
>>       somestring = encodings.tostr( someunicodestr, 'latin-1')
>> It's pretty clear what is happening to me.
>>      It will encode to a string an object, named someunicodestr, with 
>> the 'latin-1' encoder.
> But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
> unicode(..., 'latin-1')?

Yes, just do:

      someunicodestr = encodings.tounicode( somestring, 'latin-1')

> What about string transformations:
>     somestring = encodings.tostr(somestr, 'base64')
> How do we get that back?  encodings.tostr() again is completely
> ambiguous, str(somestring, 'base64') seems a bit awkward (switching
> namespaces)?

In the case where a string is converted to another string, it would
probably be best to require that everything gets converted to
unicode as an intermediate step.  By doing that it becomes an explicit
two-step operation.

     # string to string encoding
     u_string = encodings.tounicode(s_string, 'base64')
     s2_string = encodings.tostr(u_string, 'base64')

Or you could have a convenience function to do it in the encodings 
module also.

    def strtostr(s, sourcecodec, destcodec):
        u = tounicode(s, sourcecodec)
        return tostr(u, destcodec)


    s2 = encodings.strtostr(s, 'base64', 'base64')

Which would be kind of pointless in this example, but it would be a good 
way to test a codec.

    assert s == s2

>>> Are we going to need to embed the direction in the encoding/decoding
>>> name (to_base64, from_base64, etc.)?  That doesn't any better than
>>> binascii.b2a_base64 .  
>> No, that's why I suggested two separate lists (or dictionaries might be 
>> better).  They can contain the same names, but the lists they are in 
>> determine the context and point to the needed codec.  And that step is 
>> abstracted out by putting it inside the encodings.tostr() and 
>> encodings.tounicode() functions.
>> So either function would call 'base64' from the correct codec list and 
>> get the correct encoding or decoding codec it needs.
> Either the API you have described is incomplete, you haven't noticed the
> directional ambiguity you are describing, or I have completely lost it.

Most likely I gave an incomplete description of the API in this case 
because there are probably several ways to implement it.
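
One possible reading of the two-list idea, sketched for Python 3 (so
tostr returns bytes and tounicode returns str). The registry names
ENCODERS and DECODERS are hypothetical, and the sketch covers only
text codecs; it is one of several ways the API could be filled in.

```python
import codecs

# Two hypothetical registries that share the same codec names.
# Which registry a function consults is what fixes the direction.
NAMES = ("latin-1", "utf-8")
ENCODERS = {name: codecs.getencoder(name) for name in NAMES}
DECODERS = {name: codecs.getdecoder(name) for name in NAMES}


def tostr(text, name):
    """Encode text with the encoder registered under name."""
    encoded, _consumed = ENCODERS[name](text)
    return encoded


def tounicode(data, name):
    """Decode data with the decoder registered under name."""
    decoded, _consumed = DECODERS[name](data)
    return decoded


s = tostr("caf\u00e9", "latin-1")
u = tounicode(s, "latin-1")
```

A name missing from the relevant registry surfaces as a KeyError, which
is the "exception if it wasn't found for the direction" behaviour.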

>>> What about .reencode and .redecode?  It seems as
>>> though the 're' added as a prefix to .encode and .decode makes it
>>> clearer that you get the same type back as you put in, and it is also
>>> unambiguous to direction.


> I must not be expressing myself very well.
> Right now:
>     s.encode() -> s
>     s.decode() -> s, u
>     u.encode() -> s, u
>     u.decode() -> u
> Martin et al's desired change to encode/decode:
>     s.decode() -> u
>     u.encode() -> s
> No others.

Which would be similar to the functions I suggested.  The main
difference is only whether it would be better to have them as methods or
separate factory functions, and the spelling of the names.  Both have
their advantages I think.

>> The method bytes.recode(), always does a byte transformation which can 
>> be almost anything.  It's the context bytes.recode() is used in that 
>> determines what's happening.  In the above cases, it's using an encoding 
>> transformation, so what it's doing is precisely what you would expect by 
>> it's context.
> Indeed, there is a translation going on, but it is not clear as to
> whether you are encoding _to_ something or _from_ something.  What does
> s.recode('base64') mean?  Are you encoding _to_ base64 or _from_ base64? 
> That's where the ambiguity lies.

Bengt didn't propose adding .recode() to the string types, but only the
bytes type.  The bytes type would "recode" the bytes using a specific
transformation.  I believe his view is that it's a lower level API than
strings, one that can be used to implement the higher level encoding
API, not replace it.  Or that is the way I interpreted
the suggestion.

>> There isn't a bytes.decode(), since that's just another transformation. 
>> So only the one method is needed.  Which makes it easer to learn.
> But ambiguous.

What's ambiguous about it?  It's no more ambiguous than any math
operation where you can do it one way with one operation and get your
original value back with the same operation by using an inverse value.

    n2=n+1; n3=n+(-1); n==n3
    n2=n*2; n3=n*(.5); n==n3

>> Learning how the current system works comes awfully close to reverse 
>> engineering.  Maybe I'm overstating it a bit, but I suspect many end up 
>> doing exactly that in order to learn how Python does it.
> Again, we _need_ better documentation, regardless of whether or when the
> removal of some or all .encode()/.decode() methods happen.

Yes, in the short term some parts of PEP 100 could be moved to the 
python docs I think.


From jcarlson at  Sun Feb 19 02:26:49 2006
From: jcarlson at (Josiah Carlson)
Date: Sat, 18 Feb 2006 17:26:49 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Ron Adam <rrr at> wrote:
> Josiah Carlson wrote:
> > Ron Adam <rrr at> wrote:
> >> Josiah Carlson wrote:
> > [snip]
> >>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> >>> Are we encoding _to_ something, or are we decoding _from_ something? 
> >> This was just an example of one way that might work, but here are my 
> >> thoughts on why I think it might be good.
> >>
> >> In this case, the ambiguity is reduced as far as the encoding and 
> >> decoding operations are concerned.
> >>
> >>       somestring = encodings.tostr( someunicodestr, 'latin-1')
> >>
> >> It's pretty clear what is happening to me.
> >>
> >>      It will encode to a string an object, named someunicodestr, with 
> >> the 'latin-1' encoder.
> > 
> > But now how do you get it back?  encodings.tounicode(..., 'latin-1')?,
> > unicode(..., 'latin-1')?
> Yes, Just do.
>       someunicodestr = encoding.tounicode( somestring, 'latin-1')
> > What about string transformations:
> >     somestring = encodings.tostr(somestr, 'base64')
>  >
> > How do we get that back?  encodings.tostr() again is completely
> > ambiguous, str(somestring, 'base64') seems a bit awkward (switching
> > namespaces)?
> In the case where a string is converted to another string. It would 
> probably be best to have a requirement that they all get converted to 
> unicode as an intermediate step.  By doing that it becomes an explicit 
> two-step operation.
>      # string to string encoding
>      u_string = encodings.tounicode(s_string, 'base64')
>      s2_string = encodings.tostr(u_string, 'base64')

Except that ambiguates it even further.

Is encodings.tounicode() encoding, or decoding?  According to everything
you have said so far, it would be decoding.  But if I am decoding binary
data, why should it be spending any time as a unicode string?  What do I
mean?

    x = #x contains base-64 encoded binary data
    y = encodings.to_unicode(x, 'base64')
y now contains BINARY DATA, except that it is a unicode string

    z = encodings.to_str(y, 'latin-1')

Later you define a str_to_str function, which I (or someone else) would
use like:

    z = str_to_str(x, 'base64', 'latin-1')

But the trick is that I don't want some unicode string encoded in
latin-1, I want my binary data unencoded.  They may happen to be the
same in this particular example, but that doesn't mean that it makes any
sense to the user.
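
The distinction between "binary data" and "text that happens to hold the
same byte values" can be sketched in present-day Python 3. This is an
assumption for illustration only: the thread predates that API, and the
payload and variable names below are made up.

```python
import base64

# x holds base-64 encoded binary data (illustrative payload).
x = base64.b64encode(bytes([0, 255, 16, 128]))

# Decoding yields raw bytes; no text string is involved at any point.
raw = base64.b64decode(x)

# A latin-1 round trip happens to preserve every byte value, which is
# why the two results can coincide even though they mean different things.
as_text = raw.decode('latin-1')
round_tripped = as_text.encode('latin-1')
```

Here raw is the binary payload itself, while as_text is a text string whose
code points merely mirror the byte values.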


> >>> What about .reencode and .redecode?  It seems as
> >>> though the 're' added as a prefix to .encode and .decode makes it
> >>> clearer that you get the same type back as you put in, and it is also
> >>> unambiguous to direction.
> ...
>  > I must not be expressing myself very well.
>  >
> > Right now:
> >     s.encode() -> s
> >     s.decode() -> s, u
> >     u.encode() -> s, u
> >     u.decode() -> u
> > 
> > Martin et al's desired change to encode/decode:
> >     s.decode() -> u
> >     u.encode() -> s
>  >
>  > No others.
> Which would be similar to the functions I suggested.  The main 
> difference is only whether it would be better to have them as methods or
> separate factory functions and the spelling of the names.  Both have 
> their advantages I think.

While others would disagree, I personally am not a fan of to* or from*
style namings, for either function names (especially in the encodings
module) or methods.  Just a personal preference.

Of course, I don't find the current situation regarding
str/unicode.encode/decode to be confusing either, but maybe it's because
my unicode experience is strictly within the realm of GUI widgets, where
compartmentalization can be easier.

> >> The method bytes.recode(), always does a byte transformation which can 
> >> be almost anything.  It's the context bytes.recode() is used in that 
> >> determines what's happening.  In the above cases, it's using an encoding 
> >> transformation, so what it's doing is precisely what you would expect by 
> >> its context.

> > Indeed, there is a translation going on, but it is not clear as to
> > whether you are encoding _to_ something or _from_ something.  What does
> > s.recode('base64') mean?  Are you encoding _to_ base64 or _from_ base64? 
> > That's where the ambiguity lies.
> Bengt didn't propose adding .recode() to the string types, but only the 
> bytes type.  The byte type would "recode" the bytes using a specific 
> transformation.  I believe his view is it's a lower level API than 
> strings that can be used to implement the higher level encoding API 
> with, not replace the encoding API.  Or that is the way I interpreted
> the suggestion.

But again, what would the transformation be?  To something?  From
something?  'to_base64', 'from_base64', 'to_rot13' (which happens to be
identical to) 'from_rot13', ...  Saying it would "recode ... using a
specific transformation" is a cop-out, what would the translation be? 
How would it work?  How would it be spelled?

That smells quite a bit like .encode() and .decode(), just spelled
differently, and without quite a clear path.  That is why I was offering...

> > >     s.reencode() -> s (you get encoded strings as strings)
> > >     s.redecode() -> s (you get decoded strings as strings)
> > >     u.reencode() -> u (you get encoded unicode as unicode)
> > >     u.redecode() -> u (you get decoded unicode as unicode)

You keep the encode and decode to be translating between types, you use
reencode and redecode to keep the type, and define whether you are
encoding or decoding your data/text.

While I have come to agree with Terry Reedy regarding the 're' prefix on
the 'encode' and 'decode', I think that having the name of the method
define the action and the argument of the method define the codec, is
the way to go (essentially the status quo).  It may make sense to
differentiate the cases of what an encoding/decoding process may return
(types change, types stay the same), but we then have a naming issue. 
So far, I've not seen _really_ good names for describing the
encoding/decoding process, except for what we already have: encode and
decode.

What if instead of using encode/decode for the following

> > Martin et al's desired change to encode/decode:
> >     s.decode() -> u
> >     u.encode() -> s

We use some method name for inter-type transformations:
    s.transform() -> u
    u.transform() -> s

... or something better than 'transform', then we use the
.encode()/.decode() for intra-type transformations...

    s.encode() -> s (you get encoded strings as strings)
    s.decode() -> s (you get decoded strings as strings)
    u.encode() -> u (you get encoded unicode as unicode)
    u.decode() -> u (you get decoded unicode as unicode)

Probably DOA, but just a thought.

> >> There isn't a bytes.decode(), since that's just another transformation. 
> >> So only the one method is needed.  Which makes it easier to learn.
> > 
> > But ambiguous.
> What's ambiguous about it?

See the section above that I marked "[THIS IS THE AMBIGUITY]" .

> It's no more ambiguous than any math 
> operation where you can do it one way with one operation and get your
> original value back with the same operation by using an inverse value.
>     n2=n+1; n3=n+(-1); n==n3
>     n2=n*2; n3=n*(.5); n==n3

Ahh, so you are saying 'to_base64' and 'from_base64'.  There is one
major reason why I don't like that kind of a system: I can't just say
encoding='base64' and use str.encode(encoding) and str.decode(encoding),
I necessarily have to use, str.recode('to_'+encoding) and
str.recode('from_'+encoding) .  Seems a bit awkward.
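
The style argued for here, where the function name gives the direction and
the argument names the codec, can be sketched with the standard codecs
module in present-day Python 3. That is an assumption for illustration; it
is not the str.encode/str.decode API under discussion, and the payload is
made up.

```python
import codecs

encoding = 'base64'
data = b'binary \x00\xff payload'

# The same codec name serves both directions.
encoded = codecs.encode(data, encoding)     # bytes -> bytes
decoded = codecs.decode(encoded, encoding)  # bytes -> bytes

# rot_13 is its own inverse, so the direction does not matter there.
scrambled = codecs.encode('Hello', 'rot_13')
restored = codecs.encode(scrambled, 'rot_13')
```

With a single recode() method the caller would instead have to build names
such as 'to_' + encoding and 'from_' + encoding by hand.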

 - Josiah

From greg.ewing at  Sun Feb 19 02:50:44 2006
From: greg.ewing at (Greg Ewing)
Date: Sun, 19 Feb 2006 14:50:44 +1300
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Would people perhaps feel better if defaultdict
*wasn't* a subclass of dict, but a distinct mapping
type of its own? That would make it clearer that it's
not meant to be a drop-in replacement for a dict
in arbitrary contexts.


From raymond.hettinger at  Sun Feb 19 03:10:42 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sat, 18 Feb 2006 21:10:42 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><><><><>
Message-ID: <009501c634f9$ba0ef000$b83efea9@RaymondLaptop1>

[Greg Ewing]
> Would people perhaps feel better if defaultdict
> *wasn't* a subclass of dict, but a distinct mapping
> type of its own? That would make it clearer that it's
> not meant to be a drop-in replacement for a dict
> in arbitrary contexts.

Absolutely.  That's the right way to avoid Liskov violations from altered 
invariants and API changes.  Besides, with Python's propensity for duck typing, 
there's no reason to subclass when we don't have to.


From greg.ewing at  Sun Feb 19 03:11:53 2006
From: greg.ewing at (Greg Ewing)
Date: Sun, 19 Feb 2006 15:11:53 +1300
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Bengt Richter wrote:

> My guess is that realistically default_factory will be used
> to make clean code for filling a dict, and then turning the factory
> off if it's to be passed into unknown contexts.

This suggests that maybe the autodict behaviour shouldn't
be part of the dict itself, but provided by a wrapper
around the dict.

Then you can fill the dict through the wrapper, and still
have a normal dict underneath to use for other purposes.
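
A minimal sketch of such a wrapper, assuming present-day Python 3. The
class name AutoDictWrapper and its attributes are hypothetical and only
illustrate the idea; nothing like it is proposed by name in this thread.

```python
class AutoDictWrapper:
    """Hypothetical wrapper: fills a plain dict, creating defaults on demand."""

    def __init__(self, target, default_factory):
        self.target = target
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return self.target[key]
        except KeyError:
            # Store the fresh default in the underlying dict and return it.
            value = self.target[key] = self.default_factory()
            return value


plain = {}
filler = AutoDictWrapper(plain, list)
filler['spam'].append(1)
filler['spam'].append(2)
filler['eggs'].append(3)
```

After filling, plain is still an ordinary dict and raises KeyError for
missing keys when passed to other code.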


From bokr at  Sun Feb 19 03:47:10 2006
From: bokr at (Bengt Richter)
Date: Sun, 19 Feb 2006 02:47:10 GMT
Subject: [Python-Dev] Proposal: defaultdict
References: <>
Message-ID: <>

On Sat, 18 Feb 2006 10:44:15 +0100 (CET), "Walter Dörwald" <walter at> wrote:

>Guido van Rossum wrote:
>> On 2/17/06, Ian Bicking <ianb at> wrote:
>>> Guido van Rossum wrote:
>>> > d = {}
>>> > d.default_factory = set
>>> > ...
>>> > d[key].add(value)
>>> Another option would be:
>>>    d = {}
>>>    d.default_factory = set
>>>    d.get_default(key).add(value)
>>> Unlike .setdefault, this would use a factory associated with the
>>> dictionary, and no default value would get passed in.
>>> Unlike the proposal, this would not override __getitem__ (not overriding
>>> __getitem__ is really the only difference with the proposal).  It would
>>> be clear reading the code that you were not implicitly asserting that
>>> "key in d" was true.
>>> "get_default" isn't the best name, but another name isn't jumping out at
>>> me at the moment.  Of course, it is not a Pythonic argument to say that
>>> an existing method should be overridden, or functionality made nameless
>>> simply because we can't think of a name (looking to anonymous functions
>>> of course ;)
>> I'm torn. While trying to implement this I came across some ugliness in
>> PyDict_GetItem() -- it would make sense if this also called
>> on_missing(), but it must return a value without incrementing its
>> refcount, and isn't supposed to raise exceptions -- so what to do if
>> on_missing() returns a value that's not inserted in the dict?
>> If the __getattr__()-like operation that supplies and inserts a
>> dynamic default was a separate method, we wouldn't have this problem.
>> OTOH most reviewers here seem to appreciate on_missing() as a way to do
>> various other ways of altering a dict's __getitem__() behavior behind a
>> caller's back -- perhaps it could even be (ab)used to implement
>> case-insensitive lookup.
>I don't like the fact that on_missing()/default_factory can change the
>behaviour of __getitem__, which up to now has been something simple and
>understandable.
>Why don't we put the on_missing()/default_factory functionality into get()?
>d.get(key, default) does what it did before. d.get(key) invokes
>on_missing() (and dict would have default_factory == type(None))
OTOH, I forgot why it was desirable in the first place to overload d[k]
with defaulting logic. E.g., why wouldn't d.defaulting[k] be ok to write
when you want the d.default_factory action?

on_missing feels more like a tracing hook though, so maybe it could always
act either way if defined.

Also, for those wanting to avoid lambda:42 as factory, would a callable test
cost a lot? Of course then the default_factory name might require revision.

Bengt Richter

From nnorwitz at  Sun Feb 19 04:15:07 2006
From: nnorwitz at (Neal Norwitz)
Date: Sat, 18 Feb 2006 19:15:07 -0800
Subject: [Python-Dev] buildbot is all green
Message-ID: <>

Whoever is first to break the build, buys a round of drinks at PyCon! 
That's over 400 people and counting:

Remember to run the tests *before* checkin. :-)


From steve at  Sun Feb 19 04:38:39 2006
From: steve at (Steve Holden)
Date: Sat, 18 Feb 2006 22:38:39 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <dt8p7p$emf$>

Martin v. Löwis wrote:
> Guido van Rossum wrote:
> I would like this to be part of the standard dictionary type,
> rather than being a subtype.
> d.setdefault([]) (one argument) should install a default value,
> and d.cleardefault() should remove that setting; d.default
> should be read-only. Alternatively, d.default could be assignable
> and del-able.
The issue with setting the default this way is that a copy would have to 
be created if the behavior was to differ from the sometimes-confusing 
default argument behavior for functions.
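
The sharing problem can be shown with plain dicts. This is a minimal
sketch in present-day Python 3; the variable names are illustrative.

```python
keys = ['a', 'b']

# One default *value*: every key refers to the same list object.
shared = dict.fromkeys(keys, [])
shared['a'].append(1)

# One default *factory*: each key gets a fresh list.
separate = {key: list() for key in keys}
separate['a'].append(1)
```

Appending through shared['a'] also changes shared['b'], which is the same
surprise as a mutable default argument; the factory form avoids it without
copying anything.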

> Also, I think has_key/in should return True if there is a default.
It certainly seems desirable to see True where d[some_key] doesn't raise 
an exception, but one could argue either way.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From steve at  Sun Feb 19 04:44:37 2006
From: steve at (Steve Holden)
Date: Sat, 18 Feb 2006 22:44:37 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <dt8piu$hqo$>

Guido van Rossum wrote:
> On 2/16/06, Guido van Rossum <guido at> wrote:
>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>with this behavior (but implemented in C) would be a good addition to
>>the language. It looks like it wouldn't be hard to implement. It could
>>be a builtin named defaultdict. The first, required, argument to the
>>constructor should be the default value. Remaining arguments (even
>>keyword args) are passed unchanged to the dict constructor.
> Thanks for all the constructive feedback. Here are some responses and
> a new proposal.
> - Yes, I'd like to kill setdefault() in 3.0 if not sooner.
> - It would indeed be nice if this was an optional feature of the
> standard dict type.
> - I'm ignoring the request for other features (ordering, key
> transforms). If you want one of these, write a PEP!
> - Many, many people suggested to use a factory function instead of a
> default value. This is indeed a much better idea (although slightly
> more cumbersome for the simplest cases).
One might think about calling it if it were callable, otherwise using it 
literally. Of course this would require jiggery-pokery in the cases
where you actually *wanted* the default value to be a callable (you'd
have to provide a callable to return the callable as a default).
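
A minimal sketch of that rule, assuming present-day Python 3. The helper
name resolve_default is hypothetical and not part of any proposal here.

```python
def resolve_default(default):
    """Hypothetical helper: call the default if callable, else use it as is."""
    return default() if callable(default) else default


counter_default = resolve_default(0)     # literal value, used directly
list_default = resolve_default(list)     # factory, called to give []

# To get a callable *as* the default value, wrap it in another callable.
func_default = resolve_default(lambda: len)
```

The last line shows the jiggery-pokery: len itself would be called (and
fail for lack of an argument), so it has to be returned from a wrapper.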

> - Some people seem to think that a subclass constructor signature must
> match the base class constructor signature. That's not so. The
> subclass constructor must just be careful to call the base class
> constructor with the correct arguments. Think of the subclass
> constructor as a factory function.
True, but then this does get in the way of treating the base dict and 
its defaulting subtype polymorphically. That might not be a big issue.

> - There's a fundamental difference between associating the default
> value with the dict object, and associating it with the call. So
> proposals to invent a better name/signature for setdefault() don't
> compete. (As to one specific such proposal, adding an optional bool as
> the 3rd argument to get(), I believe I've explained enough times in
> the past that flag-like arguments that always get a constant passed in
> at the call site are a bad idea and should usually be refactored into
> two separate methods.)
> - The inconsistency introduced by __getitem__() returning a value for
> keys while get(), __contains__(), and keys() etc. don't show it,
> cannot be resolved usefully. You'll just have to live with it.
> Modifying get() to do the same thing as __getitem__() doesn't seem
> useful -- it just takes away a potentially useful operation.
> So here's a new proposal.
> Let's add a generic missing-key handling method to the dict class, as
> well as a default_factory slot initialized to None. The implementation
> is like this (but in C):
> def on_missing(self, key):
>   if self.default_factory is not None:
>     value = self.default_factory()
>     self[key] = value
>     return value
>   raise KeyError(key)
> When __getitem__() (and *only* __getitem__()) finds that the requested
> key is not present in the dict, it calls self.on_missing(key) and
> returns whatever it returns -- or raises whatever it raises.
> __getitem__() doesn't need to raise KeyError any more, that's done by
> on_missing().
> The on_missing() method can be overridden to implement any semantics
> you want when the key isn't found: return a value without inserting
> it, insert a value without copying it, only do it for certain key
> types/values, make the default incorporate the key, etc.
> But the default implementation is designed so that we can write
> d = {}
> d.default_factory = list
> to create a dict that inserts a new list whenever a key is not found
> in __getitem__(), which is most useful in the original use case:
> implementing a multiset so that one can write
> d[key].append(value)
> to add a new key/value to the multiset without having to handle the
> case separately where the key isn't in the dict yet. This also works
> for sets instead of lists:
> d = {}
> d.default_factory = set
> ...
> d[key].add(value)
This seems like a very good compromise.
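
The quoted proposal can be modelled in pure Python. This is a sketch under
stated assumptions: it uses a subclass named MissingDict (hypothetical)
rather than changing the built-in dict, and it is written for present-day
Python 3.

```python
class MissingDict(dict):
    """Hypothetical pure-Python model of the proposed on_missing() hook."""

    default_factory = None

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory()
            self[key] = value
            return value
        raise KeyError(key)

    def __getitem__(self, key):
        # Only __getitem__ consults on_missing(); get() and 'in' do not.
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)


d = MissingDict()
d.default_factory = set
d['colours'].add('red')
d['colours'].add('blue')
```

As in the proposal, d.get('other') and 'other' in d are unaffected, and a
MissingDict without a default_factory still raises KeyError.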

[non-functional alternatives ...]
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From jcarlson at  Sun Feb 19 04:50:07 2006
From: jcarlson at (Josiah Carlson)
Date: Sat, 18 Feb 2006 19:50:07 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Bengt Richter wrote:
> > My guess is that realistically default_factory will be used
> > to make clean code for filling a dict, and then turning the factory
> > off if it's to be passed into unknown contexts.
> This suggests that maybe the autodict behaviour shouldn't
> be part of the dict itself, but provided by a wrapper
> around the dict.
> Then you can fill the dict through the wrapper, and still
> have a normal dict underneath to use for other purposes.

I prefer this to changing dictionaries directly.  The actual wrapper
could sit in the collections module, ready for subclassing/replacement
of the on_missing method.

 - Josiah

From python at  Sun Feb 19 04:53:35 2006
From: python at (Raymond Hettinger)
Date: Sat, 18 Feb 2006 22:53:35 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><>
Message-ID: <007a01c63508$0f00f7d0$b83efea9@RaymondLaptop1>

> > Also, I think has_key/in should return True if there is a default.

> It certainly seems desirable to see True where d[some_key]
> doesn't raise an exception, but one could argue either way.

Some things can be agreed by everyone:

* if __contains__ always returns True, then it is a useless feature (since 
scripts containing a line such as "if k in dd" can always eliminate that line 
without affecting the algorithm).

* if defaultdicts are supposed to be drop-in dict substitutes, then having
__contains__ always return True will violate basic dict invariants:
   del d[some_key]
   assert some_key not in d
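
A minimal sketch of the invariant, assuming present-day Python 3. The
class AlwaysContains is hypothetical; collections.defaultdict is the type
that later shipped in the standard library, shown for contrast.

```python
from collections import defaultdict


class AlwaysContains(dict):
    """Hypothetical mapping whose 'in' test is always true."""

    def __contains__(self, key):
        return True


d = AlwaysContains(spam=1)
del d['spam']
invariant_broken = 'spam' in d      # still true after deletion

# collections.defaultdict leaves __contains__ alone, so the invariant holds.
dd = defaultdict(list, spam=[1])
del dd['spam']
invariant_holds = 'spam' not in dd
```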


From rrr at  Sun Feb 19 04:54:44 2006
From: rrr at (Ron Adam)
Date: Sat, 18 Feb 2006 21:54:44 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:
> Ron Adam <rrr at> wrote:

> Except that ambiguates it even further.
> Is encodings.tounicode() encoding, or decoding?  According to everything
> you have said so far, it would be decoding.  But if I am decoding binary
> data, why should it be spending any time as a unicode string?  What do I
> mean?

Encoding and decoding are relative concepts.  It's all encoding from one
thing to another.  Whether it's "decoding" or "encoding" depends on the
relationship of the current encoding to a standard encoding.

The confusion introduced by "decode" is when the 'default_encoding'
changes, will change, or is unknown.

>     x = #x contains base-64 encoded binary data
>     y = encodings.to_unicode(x, 'base64')
> y now contains BINARY DATA, except that it is a unicode string

No, that wasn't what I was describing.  You get a Unicode string object
as the result, not a bytes object with binary data.  See the toy example
at the bottom.

>     z = encodings.to_str(y, 'latin-1')
> Later you define a str_to_str function, which I (or someone else) would
> use like:
>     z = str_to_str(x, 'base64', 'latin-1')
> But the trick is that I don't want some unicode string encoded in
> latin-1, I want my binary data unencoded.  They may happen to be the
> same in this particular example, but that doesn't mean that it makes any
> sense to the user.

If you want bytes then you would use the bytes() type to get bytes
directly.  Not encode or decode.

     binary_unicode = bytes(unicode_string)

The exact byte order and representation would need to be decided by the
python developers in this case.  The internal representation
'unicode-internal', is UCS-2 I believe.

>> It's no more ambiguous than any math 
>> operation where you can do it one way with one operation and get your
>> original value back with the same operation by using an inverse value.
>>     n2=n+1; n3=n+(-1); n==n3
>>     n2=n*2; n3=n*(.5); n==n3
> Ahh, so you are saying 'to_base64' and 'from_base64'.  There is one
> major reason why I don't like that kind of a system: I can't just say
> encoding='base64' and use str.encode(encoding) and str.decode(encoding),
> I necessarily have to use, str.recode('to_'+encoding) and
> str.recode('from_'+encoding) .  Seems a bit awkward.

Yes, but the encodings API could abstract out the 'to_base64' and
'from_base64' so you can just say 'base64' and have it work either way.

Maybe a toy "incomplete" example might help.

    # in module or someplace else.
    class bytes(list):
       pass  # bytes methods (including recode) defined here

    # in module

    # using a dict of tuples, but other solutions would
    # work just as well.
    unicode_codecs = {
       'base64': ('from_base64', 'to_base64'),
       }

    def tounicode(obj, from_codec):
        b = bytes(obj)
        b = b.recode(unicode_codecs[from_codec][0])
        return unicode(b)

    def tostr(obj, to_codec):
        b = bytes(obj)
        b = b.recode(unicode_codecs[to_codec][1])
        return str(b)

    # in your application

    import encodings

    ... a bunch of code ...

    u = encodings.tounicode(s, 'base64')

    # or if going the other way

    s = encodings.tostr(u, 'base64')

Does this help?  Is the relationship between the bytes object and the
encodings API clearer here?  If not maybe we should discuss it further
off line.

    Ronald Adam

From steve at  Sun Feb 19 04:57:35 2006
From: steve at (Steve Holden)
Date: Sat, 18 Feb 2006 22:57:35 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <dt8qb8$jd8$>

Martin v. Löwis wrote:
> Adam Olsen wrote:
>>Still -1.  It's better, but it violates the principle of encapsulation
>>by mixing how-you-use-it state with what-it-stores state.  In doing
>>that it has the potential to break an API documented as accepting a
>>dict.  Code that expects d[key] to raise an exception (and catches the
>>resulting KeyError) will now silently "succeed".
> Of course it will, and without quotes. That's the whole point.
>>I believe that necessitates a PEP to document it.
> You are missing the rationale of the PEP process. The point is
> *not* documentation. The point of the PEP process is to channel
> and collect discussion, so that the BDFL can make a decision.
> The BDFL is not bound at all to the PEP process.
> To document things, we use (or should use) documentation.
One could wish this ideal had been the case for the import extensions 
defined in PEP 302.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From benji at  Sun Feb 19 05:11:32 2006
From: benji at (Benji York)
Date: Sat, 18 Feb 2006 23:11:32 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:

If there's interest in slightly nicer buildbot CSS (something like I'd be glad to contribute.
Benji York

From tjreedy at  Sun Feb 19 06:13:20 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 19 Feb 2006 00:13:20 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><><>
Message-ID: <dt8uph$vr3$>

> Quoting skip at
>> The only question in my mind is whether or not getting a non-existent 
>> value
>> under the influence of a given default value should stick that value in 
>> the
>> dictionary or not.

It seems to me that there are at least two types of default dicts, which 
have opposite answers to that question.

One is a 'universal dict' that maps every key to something -- the default 
if nothing else.  That should not have the default ever explicitly entered. 
Udict.keys() should only give the keys *not* mapped to the universal value.

Another is the accumulator dict.  The default value is the identity (0, [],
or whatever) for the type of accumulation.  An adict must have the identity
added, even though that null will usually be immediately incremented by +=1
or .append(ob) or whatever.

Guido's last proposal was for the default default_dict to cater to the 
second type (and others needing the same behavior) while catering to the 
first by making the default fill-in method over-rideable.

If we go with, for instance, wrappers in the collections module instead of
modification of dict, then perhaps there should be at least two wrappers 
included, with each of these two behaviors.
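
The two behaviours can be sketched side by side, assuming present-day
Python 3. UniversalDict is a hypothetical name; it relies on the
__missing__ hook that dict subclasses support, and collections.defaultdict
stands in for the accumulator dict.

```python
from collections import defaultdict


class UniversalDict(dict):
    """Hypothetical 'universal dict': every key maps to a default,
    but the default is never stored."""

    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        return self.default


udict = UniversalDict(0, a=5)
looked_up = udict['zzz']            # default returned, nothing inserted

adict = defaultdict(int)            # accumulator: identity value is stored
for word in ['x', 'y', 'x']:
    adict[word] += 1
```

After the lookup, udict.keys() still lists only 'a', whereas every key
touched in adict has been entered.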

Terry Jan Reedy

From martin at  Sun Feb 19 06:46:40 2006
From: martin at ("Martin v. Löwis")
Date: Sun, 19 Feb 2006 06:46:40 +0100
Subject: [Python-Dev] ssize_t branch merged
In-Reply-To: <>
References: <>	<dt5mso$h0c$>	<>	<dt5tot$437$>	<>	<dt673u$phm$>	<>
	<> <>
Message-ID: <>

Travis E. Oliphant wrote:
> Why not just
> #if SIZEOF_SIZE_T == 2
> #define PY_SSIZE_T_MAX 0x7fff
> #elif SIZEOF_SIZE_T == 4
> #define PY_SSIZE_T_MAX 0x7fffffff
> #elif SIZEOF_SIZE_T == 8
> #define PY_SSIZE_T_MAX 0x7fffffffffffffff
> #elif SIZEOF_SIZE_T == 16
> #define PY_SSIZE_T_MAX 0x7fffffffffffffffffffffffffffffff
> #endif

That would not work: 0x7fffffffffffffff is not a valid
integer literal. 0x7fffffffffffffffL might work,
or 0x7fffffffffffffffLL, or 0x7fffffffffffffffi64.
Which of these is correct depends on the compiler.
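
For comparison, the value these macros aim at can be inspected from Python
itself. This is a minimal sketch, assuming a present-day CPython 3 where
sys.maxsize reports the largest Py_ssize_t value and struct supports the
native 'n' (ssize_t) format; it does not address the C literal-suffix
problem.

```python
import struct
import sys

# Size in bytes of the platform's ssize_t, via struct's native 'n' format.
ssize_bytes = struct.calcsize('n')

# Largest signed value representable in that many bytes; Python integers
# are unbounded, so no literal suffix is needed.
computed_max = 2 ** (8 * ssize_bytes - 1) - 1
```

On such a build computed_max is expected to equal sys.maxsize.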

How to spell 128-bit integral constants, I don't know;
it appears that MS foresees a i128 suffix for them.


From martin at  Sun Feb 19 07:05:11 2006
From: martin at ("Martin v. Löwis")
Date: Sun, 19 Feb 2006 07:05:11 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <007a01c63508$0f00f7d0$b83efea9@RaymondLaptop1>
References: <><>	<dt8p7p$emf$>
Message-ID: <>

Raymond Hettinger wrote:
>>>Also, I think has_key/in should return True if there is a default.
> * if __contains__ always returns True, then it is a useless feature (since 
> scripts containing a line such as "if k in dd" can always eliminate that line 
> without affecting the algorithm).

If you mean "if __contains__ always returns True for a default dict,
then it is a useless feature", I disagree. The code using "if k in dd"
cannot be eliminated if you don't know that you have a default dict.

> * if defaultdicts are supposed to be drop-in dict substitutes, then having
> __contains__ always return True will violate basic dict invariants:
>    del d[some_key]
>    assert some_key not in d

If you have a default value, you cannot ultimately del a key. This
sequence is *not* a basic mapping invariant. If it was, then it would
be also an invariant that, after del d[some_key], d[some_key] will
raise a KeyError. This kind of invariant doesn't take into account
that there might be a default value.


From ncoghlan at  Sun Feb 19 07:12:56 2006
From: ncoghlan at (Nick Coghlan)
Date: Sun, 19 Feb 2006 16:12:56 +1000
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:
> Whoever is first to break the build, buys a round of drinks at PyCon! 
> That's over 400 people and counting: 
> Remember to run the tests *before* checkin. :-)

I don't think we can blame Tim's recent checkins for test_logging subsequently 
breaking on Solaris though ;)

There still seems to be something a bit temperamental in that test. . .


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From martin at  Sun Feb 19 07:17:53 2006
From: martin at ("Martin v. Löwis")
Date: Sun, 19 Feb 2006 07:17:53 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <>

Benji York wrote:
> If there's interest in slightly nicer buildbot CSS (something like 
> I'd be glad to contribute.

I personally don't care much about the visual look of web pages.
However, people have commented that the buildbot page is ugly,
so yes, please do contribute something.

Bonus points for visually separating the "trunk" columns from
the "2.4" columns. Would a vertical line be appropriate? Bigger


From martin at  Sun Feb 19 07:19:40 2006
From: martin at ("Martin v. Löwis")
Date: Sun, 19 Feb 2006 07:19:40 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:

Unfortunately, test_logging still fails sporadically on Solaris.


From raymond.hettinger at  Sun Feb 19 07:33:42 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sun, 19 Feb 2006 01:33:42 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><>
Message-ID: <00d301c6351e$6e02b500$b83efea9@RaymondLaptop1>

[Martin v. Löwis]
> If you have a default value, you cannot ultimately del a key. This
> sequence is *not* a basic mapping invariant.

You believe that key deletion is not basic to mappings?

> This kind of invariant doesn't take into account
> that there might be a default value.

Precisely.  Therefore, a defaultdict subclass violates the Liskov Substitution
Principle.

Of course, the __del__ followed __contains__ sequence is not the only invariant 
that is thrown-off.  There are plenty of examples.  Here's one that is 
absolutely basic to the method's contract:

    k, v = dd.popitem()
    assert k not in dd

Any code that was expecting a dictionary and uses popitem() as a means of 
looping over and consuming entries will fail.

No one should kid themselves that a default dictionary is a drop-in substitute. 
Much of the dict's API has an ambiguous meaning when applied to defaultdicts.

If all keys are in-theory predefined, what is the meaning of len(dd)?

Should dd.items() include any entries where the value is equal to the default or 
should the collection never store those?  If the former, then how do you access 
the entries without looping over the whole contents?  If the latter, then do you 
worry that "dd[k]=v" does not imply "(k,v) in dd.items()"?


From g.brandl at  Sun Feb 19 07:49:12 2006
From: g.brandl at (Georg Brandl)
Date: Sun, 19 Feb 2006 07:49:12 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <dt94d8$bbd$>

Neal Norwitz wrote:
> Whoever is first to break the build, buys a round of drinks at PyCon! 
> That's over 400 people and counting: 
> Remember to run the tests *before* checkin. :-)

Don't we have a Windows slave yet?


From martin at  Sun Feb 19 07:59:58 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 07:59:58 +0100
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <00d301c6351e$6e02b500$b83efea9@RaymondLaptop1>
References: <><>	<dt8p7p$emf$>
Message-ID: <>

Raymond Hettinger wrote:
>> If you have a default value, you cannot ultimately del a key. This
>> sequence is *not* a basic mapping invariant.
> You believe that key deletion is not basic to mappings?

No, not in the sense that the key will go away through deletion.
I view a mapping as a modifiable partial function. There is some
initial key/value association (in a classic mapping, it is initially
empty), and then there are modifications. Key deletion means to
reset the key to the initial association.

> Of course, the __del__ followed __contains__ sequence is not the only
> invariant that is thrown-off.  There are plenty of examples.  Here's one
> that is absolutely basic to the method's contract:
>    k, v = dd.popitem()
>    assert k not in dd
> Any code that was expecting a dictionary and uses popitem() as a means
> of looping over and consuming entries will fail.

Well, code that loops over a dictionary using popitem typically
terminates when the dictionary becomes false (or its length becomes
zero). That code wouldn't be affected by the behaviour of "in".

> No one should kid themselves that a default dictionary is a drop-in
> substitute. Much of the dict's API has an ambiguous meaning when applied
> to defaultdicts.

Right. But it is only ambiguous until specified. Of course, in the face
of ambiguity, refuse the temptation to guess.

> If all keys are in-theory predefined, what is the meaning of len(dd)?

Taking my definition from the beginning of the message, it is the number
of keys that have been modified from the initial mapping.

> Should dd.items() include any entries where the value is equal to the
> default or should the collection never store those?

It should include all modified items, and none of the unmodified ones.
Explicitly assigning the default value still makes the entry modified;
you need to del it to set it back to "unmodified".

> If the former, then
> how do you access the entries without looping over the whole contents? 

Not sure I understand the question. You use d[k] to access an entry.
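
A minimal sketch of the "modifiable partial function" reading described in
this message, written for current Python. The class name and the `default`
attribute are made up for illustration, and the sketch relies on the
`__missing__` hook that dict subclasses support today; nothing here is
meant as the proposed implementation.

```python
class PartialFunctionDict(dict):
    """Hypothetical sketch: unmodified keys read as `default`, only modified keys are stored."""

    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is not stored:
        # report the initial association without storing anything.
        return self.default


d = PartialFunctionDict(default=0)
d["a"] = 0             # explicitly assigning the default still counts as "modified"
d["b"] = 5
print(len(d))          # 2
del d["a"]             # resets "a" to its initial association
print(d["a"], len(d))  # 0 1

# A popitem() loop that stops when the mapping becomes false still terminates.
while d:
    d.popitem()
```

Under this reading len() and items() only ever see modified keys, so the
usual "drain until empty" loop is unaffected.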


From raymond.hettinger at  Sun Feb 19 08:11:33 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sun, 19 Feb 2006 02:11:33 -0500
Subject: [Python-Dev] Proposal: defaultdict
References: <><><><>
Message-ID: <005301c63523$b751b2b0$b83efea9@RaymondLaptop1>

[Terry Reedy]
> One is a 'universal dict' that maps every key to something -- the default if 
> nothing else.  That should not have the default ever explicitly entered. 
> Udict.keys() should only give the keys *not* mapped to the universal value.

Would you consider it a mapping invariant that "k in dd" implies "k in 

Is the notion of __contains__ at odds with notion of universality?


From jcarlson at  Sun Feb 19 08:42:56 2006
From: jcarlson at (Josiah Carlson)
Date: Sat, 18 Feb 2006 23:42:56 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <00d301c6351e$6e02b500$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

"Raymond Hettinger" <raymond.hettinger at> wrote:
> [Martin v. Löwis]
> > This kind of invariant doesn't take into account
> > that there might be a default value.
> Precisely.  Therefore, a defaultdict subclass violates the Liskov Substitution 
> Principle.

class defaultdict(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)
    def on_missing(self, key):
        # without a callable default, behave like a plain dict
        if not hasattr(self, 'default') or not callable(self.default):
            raise KeyError(key)
        r = self[key] = self.default()
        return r

In my opinion, the above implementation as a subclass "does the right
thing" in regards to __del__, __contains__, get, pop, popitem, __len__,
has_key, and anything else I can think of.  Does it violate the Liskov
Substitution Principle?  Yes, but only if user code relies on dd[key]
raising a KeyError on a lack of a key.  This can be easily remedied by
removing the default when it is unneeded, at which point, you get your
Liskov Substitution.

> Of course, the __del__ followed __contains__ sequence is not the only invariant 
> that is thrown-off.  There are plenty of examples.  Here's one that is 
> absolutely basic to the method's contract:
>     k, v = dd.popitem()
>     assert k not in dd
> Any code that was expecting a dictionary and uses popitem() as a means of 
> looping over and consuming entries will fail.

>>> a = defaultdict()
>>> a.default = list
>>> a['hello']
[]
>>> k, v = a.popitem()
>>> assert k not in a

Seems to work for the above implementation.
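
For anyone who wants to try this on a current interpreter, here is a rough
Python 3 rendering of the subclass above together with the popitem() check.
The class is renamed DefaultDict only to mark it as a sketch; the behaviour
is intended to be the same, but treat it as an illustration rather than a
reference implementation.

```python
class DefaultDict(dict):
    """Python 3 sketch of the subclass discussed in this message."""

    default = None  # set to a zero-argument callable to enable defaults

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)

    def on_missing(self, key):
        # Without a callable default, behave exactly like a plain dict.
        if not callable(self.default):
            raise KeyError(key)
        value = self[key] = self.default()
        return value


a = DefaultDict()
a.default = list
a["hello"].append(1)       # creates and stores the list on first access
k, v = a.popitem()
assert k not in a          # the popitem() invariant holds
assert "missing" not in a  # 'in' does not trigger the default
assert a.get("missing") is None
assert len(a) == 0
```

Only subscript access consults the default; get(), 'in', len() and
popitem() go straight to the underlying dict.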

> No one should kid themselves that a default dictionary is a drop-in substitute. 
> Much of the dict's API has an ambiguous meaning when applied to defaultdicts.

Actually, if one is careful, the dict's API is completely unchanged,
except for direct access to the object via b = a[i].

>>> del a['hello']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'hello'
>>> 'hello' in a
False
>>> a.get('hello')
>>> a.pop('hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> a.popitem()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'popitem(): dictionary is empty'
>>> len(a)
0
>>> a.has_key('hello')
False

> If all keys are in-theory predefined, what is the meaning of len(dd)?

It depends on the sequence of actions.  Play around with the above
defaultdict implementation.  From what I understood of Guido's original
post, this is essentially what he was proposing, only implemented in C.

> Should dd.items() include any entries where the value is equal to the default or 
> should the collection never store those?

Yes, it should store any value which was stored via 'dd[k]=v', or any
default value created via access by 'v=dd[k]' .

> If the former, then how do you access 
> the entries without looping over the whole contents?

Presumably one is looking for a single kind of default (empty list, 0,
etc.) because one wanted to accumulate into them, similar to one of the
following:

    for item, value in input:
        try:
            d[item] += value
            #or d[item].append(value)
        except KeyError:
            d[item] = value
            #or d[item] = [value]

which becomes

    for item, value in input:
        dd[item] += value
        #or dd[item].append(value)

Once accumulation has occurred, iteration over them via .iteritems(),
.items(), .popitem(), etc., would progress exactly the same way as with
a regular dictionary.  If the code which is using the accumulated data
does things like...

    for key in wanted_keys:
        try:
            value = dd[key]
        except KeyError:
            continue
        #do something nontrivial with value

rather than...

    for key in wanted_keys:
        if key not in dd:
            continue
        value = dd[key]
        #do something nontrivial with value

Then the user has at least three options to make it 'work right':
1. User can change to using 'in' to test for keys rather than relying on a
KeyError.
2. User could remember to remove the default.
3. User can create a copy of the default dictionary via dict(dd) and
pass it into the code which relies on the non-defaulting dictionary.
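
Options 2 and 3 can be tried out roughly as follows. This sketch uses
collections.defaultdict from the current standard library as a stand-in for
the class above (its default_factory attribute plays the role of `default`),
and total_wanted() is a made-up consumer that relies on KeyError.

```python
from collections import defaultdict


def total_wanted(mapping, wanted_keys):
    """Hypothetical consumer that relies on KeyError to skip absent keys."""
    total = 0
    for key in wanted_keys:
        try:
            value = mapping[key]
        except KeyError:
            continue
        total += value
    return total


dd = defaultdict(int)
for item, value in [("a", 1), ("b", 2), ("a", 3)]:
    dd[item] += value

# Option 3: hand the consumer a plain dict copy.
plain = dict(dd)
print(total_wanted(plain, ["a", "c"]))  # 4
print("c" in plain)                     # False

# Option 2: remove the default so missing keys raise KeyError again.
dd.default_factory = None
print(total_wanted(dd, ["a", "c"]))     # 4
print("c" in dd)                        # False
```

Either way the consumer never inserts a spurious "c" entry, which is what
would happen if it subscripted the defaulting dictionary directly.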

> If the latter, then do you 
> worry that "dd[v]=k" does not imply "(k,v) in dd.items()"?

I personally wouldn't want the latter.

My post probably hasn't convinced you, but much of the confusion, I
believe, is based on Martin's original belief that 'k in dd' should
always return true if there is a default.  One can argue that way, but
then you end up on the circular train of thought that gets you to "you
can't do anything useful if that is the case, .popitem() doesn't work,
len() is undefined, ...".  Keep it simple, keep it sane.

 - Josiah

From martin at  Sun Feb 19 09:03:38 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 09:03:38 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <dt94d8$bbd$>
References: <>
Message-ID: <>

Georg Brandl wrote:
> Don't we have a Windows slave yet?

No; nobody volunteered a machine yet (plus the hand-holding that
is always necessary with Windows).


From mwh at  Sun Feb 19 11:18:35 2006
From: mwh at (Michael Hudson)
Date: Sun, 19 Feb 2006 10:18:35 +0000
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <> (Neal
	Norwitz's message of "Sat, 18 Feb 2006 19:15:07 -0800")
References: <>
Message-ID: <>

"Neal Norwitz" <nnorwitz at> writes:


Wow, that's very cool!


  <Aardappel> this "I hate c++" is so old
  <dash> it's as old as C++, yes
                                                -- from Twisted.Quotes

From mwh at  Sun Feb 19 11:36:09 2006
From: mwh at (Michael Hudson)
Date: Sun, 19 Feb 2006 10:36:09 +0000
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (M.'s message of "Sat, 18 Feb
	2006 20:38:21 +0100")
References: <> <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

"M.-A. Lemburg" <mal at> writes:

> Martin v. Löwis wrote:
>> M.-A. Lemburg wrote:
>>>>> True. However, note that the .encode()/.decode() methods on
>>>>> strings and Unicode narrow down the possible return types.
>>>>> The corresponding .bytes methods should only allow bytes and
>>>>> Unicode.
>>>> I forgot that: what is the rationale for that restriction?
>>> To assure that only those types can be returned from those
>>> methods, ie. instances of basestring, which in return permits
>>> type inference for those methods.
>> Hmm. So it is for type inference????
>> Where is that documented?
> Somewhere in the python-dev mailing list archives ;-)
> Seriously, we should probably add this to the documentation.

Err.................. I don't think this is a good argument, for quite
a few reasons.  There certainly aren't many other features in Python
designed to aid type inference and the knowledge that something
returns "unicode or str" isn't especially useful...


  ROOSTA:  Ever since you arrived on this planet last night you've
           been going round telling people that you're Zaphod
           Beeblebrox, but that they're not to tell anyone else.
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 7

From mal at  Sun Feb 19 11:56:28 2006
From: mal at (M.-A. Lemburg)
Date: Sun, 19 Feb 2006 11:56:28 +0100
Subject: [Python-Dev] [Python-checkins] r42490 - in
 python/branches/release24-maint: Lib/
 Lib/test/ Misc/NEWS
In-Reply-To: <>
References: <>
Message-ID: <>

Why are these new features being backported to 2.4 ?

georg.brandl wrote:
> Author: georg.brandl
> Date: Sun Feb 19 10:51:33 2006
> New Revision: 42490
> Modified:
>    python/branches/release24-maint/Lib/
>    python/branches/release24-maint/Lib/test/
>    python/branches/release24-maint/Misc/NEWS
> Log:
> Patch #1337756: fileinput now accepts Unicode filenames.
> Modified: python/branches/release24-maint/Lib/
> ==============================================================================
> --- python/branches/release24-maint/Lib/	(original)
> +++ python/branches/release24-maint/Lib/	Sun Feb 19 10:51:33 2006
> @@ -184,7 +184,7 @@
>      """
>      def __init__(self, files=None, inplace=0, backup="", bufsize=0):
> -        if type(files) == type(''):
> +        if isinstance(files, basestring):
>              files = (files,)
>          else:
>              if files is None:
> Modified: python/branches/release24-maint/Lib/test/
> ==============================================================================
> --- python/branches/release24-maint/Lib/test/	(original)
> +++ python/branches/release24-maint/Lib/test/	Sun Feb 19 10:51:33 2006
> @@ -157,3 +157,13 @@
>      verify(fi.lineno() == 6)
>  finally:
>      remove_tempfiles(t1, t2)
> +
> +if verbose:
> +    print "15. Unicode filenames"
> +try:
> +    t1 = writeTmp(1, ["A\nB"])
> +    fi = FileInput(files=unicode(t1, sys.getfilesystemencoding()))
> +    lines = list(fi)
> +    verify(lines == ["A\n", "B"])
> +finally:
> +    remove_tempfiles(t1)
> Modified: python/branches/release24-maint/Misc/NEWS
> ==============================================================================
> --- python/branches/release24-maint/Misc/NEWS	(original)
> +++ python/branches/release24-maint/Misc/NEWS	Sun Feb 19 10:51:33 2006
> @@ -74,6 +74,8 @@
>  Library
>  -------
> +- Patch #1337756: fileinput now accepts Unicode filenames.
> +
>  - Patch #1373643: The chunk module can now read chunks larger than
>    two gigabytes.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 19 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From hyeshik at  Sun Feb 19 12:17:43 2006
From: hyeshik at (Hye-Shik Chang)
Date: Sun, 19 Feb 2006 20:17:43 +0900
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On 2/19/06, Walter Dörwald <walter at> wrote:
> M.-A. Lemburg wrote:
> > Walter Dörwald wrote:
> >> Anyway, I've started implementing a patch that just adds codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
> >> UTF-16, UTF-16-LE and UTF-16-BE are already working.
> >
> > Nice :-)
> is updated now too. The rest should be manageable too. I'll leave updating the CJKV codecs to Hye-Shik though.

Okay. I'll look into how the CJK codecs can be improved by the
new protocol soon.  I guess it won't be too difficult, because the CJK
codecs already have their own common stateful framework.

BTW, CJK codecs don't have V yet.  :-)


From stephen at  Sun Feb 19 13:38:44 2006
From: stephen at (Stephen J. Turnbull)
Date: Sun, 19 Feb 2006 21:38:44 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Ian Bicking's message of
	"Fri, 17 Feb 2006 18:06:13 -0600")
References: <> <>
	<> <>
	<> <>
Message-ID: <>

>>>>> "Ian" == Ian Bicking <ianb at> writes:

    Ian> Encodings cover up eclectic interfaces, where those
    Ian> interfaces fit a basic pattern -- data in, data out.

Isn't "filter" the word you're looking for?

I think you've just made a very strong case that this is a slippery
slope that we should avoid.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From mal at  Sun Feb 19 14:12:11 2006
From: mal at (M.-A. Lemburg)
Date: Sun, 19 Feb 2006 14:12:11 +0100
Subject: [Python-Dev] [Python-checkins] r42490 - in
 python/branches/release24-maint: Lib/
 Lib/test/ Misc/NEWS
In-Reply-To: <dt9pbi$sm5$>
References: <>	<>
Message-ID: <>

Georg Brandl wrote:
> M.-A. Lemburg wrote:
>> Why are these new features being backported to 2.4 ?
>> georg.brandl wrote:
>>> Author: georg.brandl
>>> Date: Sun Feb 19 10:51:33 2006
>>> New Revision: 42490
>>> Modified:
>>>    python/branches/release24-maint/Lib/
>>>    python/branches/release24-maint/Lib/test/
>>>    python/branches/release24-maint/Misc/NEWS
>>> Log:
>>> Patch #1337756: fileinput now accepts Unicode filenames.
> Is that a new feature? I thought that wherever a filename is accepted,
> it can be unicode too.
> The previous behavior was a bug in any case, since it treated the
> unicode string as a sequence of filenames. Would you fix that by
> raising a ValueError?

No, but from the text in the NEWS file things sounded a lot
like a feature rather than a bug fix.
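
The failure mode Georg describes is easy to reproduce. Below is a small
sketch in Python 3 terms, where str plays the role that basestring plays in
the patch; normalize_files() is a hypothetical helper, not the actual
fileinput code.

```python
def normalize_files(files):
    """Hypothetical helper mirroring the isinstance check from the patch."""
    if isinstance(files, str):  # Python 2 used basestring to cover str and unicode
        return (files,)
    if files is None:
        return ()
    return tuple(files)


print(normalize_files("a.txt"))  # ('a.txt',)
print(tuple("a.txt"))            # ('a', '.', 't', 'x', 't'), the old failure mode
```

A string that slips past the type check is iterated character by
character, so each character ends up being treated as a filename.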

Marc-Andre Lemburg


From stephen at  Sun Feb 19 14:30:54 2006
From: stephen at (Stephen J. Turnbull)
Date: Sun, 19 Feb 2006 22:30:54 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (M.'s message of "Sat, 18 Feb
	2006 12:44:27 +0100")
References: <>
	<> <>
	<> <>
Message-ID: <>

>>>>> "M" == "M.-A. Lemburg" <mal at> writes:

    M> Martin v. Löwis wrote:

    >> No. The reason to ban string.decode and bytes.encode is that it
    >> confuses users.

    M> Instead of starting to ban everything that can potentially
    M> confuse a few users, we should educate those users and tell
    M> them what these methods mean and how they should be used.

ISTM it's neither "potential" nor "a few".

As Aahz pointed out, for the common use of text I/O it requires only a
single clue ("Unicode is The One True Plain Text, everything else must
be decoded to Unicode before use.") and you don't need any "education"
about "how to use" codecs under Martin's restrictions; you just need
to know which ones to use.

This is not a benefit to be given up lightly.

Would it be reasonable to put those restrictions in the codecs?  Ie,
so that bytes().encode('gzip') is allowed for the "generic" codec
'gzip', but bytes().encode('utf-8') is an error for the "charset"
codec 'utf-8'?
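
One way to picture that restriction is the sketch below, written for
current Python. The two sets are a hand-maintained classification invented
for this example (as far as I know the codecs module exposes no public flag
for it), and encode_bytes() is a hypothetical wrapper, not a proposed API.

```python
import codecs

# Assumption: a hand-written classification of codec names.
CHARSET_CODECS = {"utf-8", "shift_jis", "latin-1"}
GENERIC_CODECS = {"zlib_codec", "base64_codec", "hex_codec"}


def encode_bytes(data, codec):
    """Apply a bytes-to-bytes codec; refuse charset codecs."""
    if codec in CHARSET_CODECS:
        raise TypeError("%r is a charset codec; bytes cannot be encoded with it" % codec)
    if codec not in GENERIC_CODECS:
        raise LookupError("unclassified codec: %r" % codec)
    return codecs.encode(data, codec)


packed = encode_bytes(b"spam" * 10, "zlib_codec")
assert codecs.decode(packed, "zlib_codec") == b"spam" * 10
```

With this shape the error for a charset codec is raised before any
conversion is attempted, which is the "nonexistent method" effect without
removing the method.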


From arigo at  Sun Feb 19 15:03:40 2006
From: arigo at (Armin Rigo)
Date: Sun, 19 Feb 2006 15:03:40 +0100
Subject: [Python-Dev] 2.5 release schedule
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Neal & Jeremy,

On Fri, Feb 17, 2006 at 10:53:19PM -0800, Neal Norwitz wrote:
> I don't think it belongs in the PEP.  I bumped the priority to 7 which
> is the standard protocol, though I don't know that it's really
> followed.


> I will enumerate the existing problems for Jeremy in the
> bug report.
> In the future,  I would also prefer separate bug reports.  Feel free
> to assign new bugs to Jeremy too. :-)

Thanks :-)

A bientot,


From stephen at  Sun Feb 19 15:30:02 2006
From: stephen at (Stephen J. Turnbull)
Date: Sun, 19 Feb 2006 23:30:02 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (M.'s message of "Sat, 18 Feb
	2006 13:21:18 +0100")
References: <> <>
	<> <>
	<> <>
Message-ID: <>

>>>>> "M" == "M.-A. Lemburg" <mal at> writes:

    M> The main reason is symmetry and the fact that strings and
    M> Unicode should be as similar as possible in order to simplify
    M> the task of moving from one to the other.

Those are perfectly compatible with Martin's suggestion.

    M> Still, I believe that this is an educational problem. There are
    M> a couple of gotchas users will have to be aware of (and this is
    M> unrelated to the methods in question):

But IMO that's wrong, both in attitude and in fact.  As for attitude,
users should not have to be aware of these gotchas.  Codec writers, on
the other hand, should be required to avoid presenting users with
those gotchas.  Martin's draconian restriction is in the right
direction, but you can argue it goes way too far.

In fact, of course it's related to the methods in question.
"Original" vs "derived" data can only be defined in terms of some
notion of the "usual semantics" of the streams, and that is going to
be strongly reflected in the semantics of the methods.

    M> * "encoding" always refers to transforming original data into a
    M> derived form

    M> * "decoding" always refers to transforming a derived form of
    M> data back into its original form

Users *already* know that; it's a very strong connotation of the
English words.  The problem is that users typically have their own
concept of what's original and what's derived.  For example:

    M> * for Unicode codecs the original form is Unicode, the derived
    M> form is, in most cases, a string

First of all, that's Martin's point!

Second, almost all Americans, a large majority of Japanese, and I
would bet most Western Europeans would say you have that backwards.
That's the problem, and it's the Unicode advocates' problem (ie,
ours), not the users'.  Even if we're right: education will require
lots of effort.  Rather, we should just make it as easy as possible to
do it right, and hard to do it wrong.

BTW, what use cases do you have in mind for Unicode -> Unicode
decoding?  Maximally decomposed forms and/or eliminating compatibility
characters etc?  Very specialized.

    M> Codecs also unify the various interfaces to common encodings
    M> such as base64, uu or zip which are not Unicode related.

Now this is useful and has use cases I've run into, for example in
email, where you would like to use the same interface for base64 as
for shift_jis and you'd like to be able to write

    def encode_mime_body(string, codec_list):
        # charset_codec_list and transfer_codec_list are assumed to be
        # predefined collections of codec names
        if codec_list[0] not in charset_codec_list:
            raise NotCharsetCodecException
        if len(codec_list) > 1 and codec_list[-1] not in transfer_codec_list:
            raise NotTransferCodecException
        for codec in codec_list:
            string = string.encode(codec)
        return string

    mime_body = encode_mime_body("This is a pen.",
                                 ['shift_jis', 'zip', 'base64'])

I guess I have to admit I'm backtracking from my earlier hardline
support for Martin's position, but I'm still sympathetic: (a) that's
the direct way to "make it easy to do it right", and (b) I still think
the use cases for non-Unicode codecs are YAGNI very often.
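
For comparison, a runnable approximation of that chain in current Python,
where the charset step is text -> bytes and every later step is
bytes -> bytes via codecs.encode(). The codec sets, the use of zlib_codec in
place of 'zip', and decode_mime_body() are assumptions made for this
sketch.

```python
import codecs

CHARSET_CODECS = {"shift_jis", "utf-8"}             # text -> bytes
TRANSFER_CODECS = {"base64_codec", "quopri_codec"}  # bytes -> bytes, mail-safe output
BYTES_CODECS = TRANSFER_CODECS | {"zlib_codec"}     # every bytes -> bytes codec accepted here


def encode_mime_body(text, codec_list):
    if not codec_list or codec_list[0] not in CHARSET_CODECS:
        raise ValueError("first codec must be a charset codec")
    if len(codec_list) > 1 and codec_list[-1] not in TRANSFER_CODECS:
        raise ValueError("last codec must be a transfer codec")
    data = text.encode(codec_list[0])
    for codec in codec_list[1:]:
        if codec not in BYTES_CODECS:
            raise ValueError("not a bytes-to-bytes codec: %r" % codec)
        data = codecs.encode(data, codec)
    return data


def decode_mime_body(data, codec_list):
    # Undo the bytes -> bytes steps in reverse order, then the charset step.
    for codec in reversed(codec_list[1:]):
        data = codecs.decode(data, codec)
    return data.decode(codec_list[0])


chain = ["shift_jis", "zlib_codec", "base64_codec"]
body = encode_mime_body("This is a pen.", chain)
assert decode_mime_body(body, chain) == "This is a pen."
```

Keeping the charset step separate from the byte transforms gives each stage
a fixed signature, while the caller still sees one uniform list of codec
names.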


From g.brandl at  Sun Feb 19 16:00:47 2006
From: g.brandl at (Georg Brandl)
Date: Sun, 19 Feb 2006 16:00:47 +0100
Subject: [Python-Dev] Enhancements to the fileinput module
Message-ID: <dta16v$h2q$>

I've just checked in some enhancements to the fileinput module.

* fileno() to check the current file descriptor
* mode argument to allow opening in universal newline mode
* openhook argument to allow transparent opening of compressed
  or encoded files.

Please feel free to comment.
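
A short sketch of how the openhook argument can be used, written against
the fileinput module in current Python. The hook open_by_extension() is
hypothetical; it picks the opener from the file extension and always opens
in text mode, and the temporary files exist only to make the example
self-contained.

```python
import fileinput
import gzip
import os
import tempfile


def open_by_extension(filename, mode):
    """Hypothetical hook: choose the opener from the extension, always in text mode."""
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt", encoding="utf-8")
    return open(filename, "r", encoding="utf-8")


tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "a.txt")
gz_path = os.path.join(tmpdir, "b.txt.gz")
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("A\nB\n")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("C\n")

# The hook is called once per file with (filename, mode).
with fileinput.input(files=[plain_path, gz_path], openhook=open_by_extension) as fi:
    lines = list(fi)

print(lines)  # ['A\n', 'B\n', 'C\n']
```

The caller iterates over plain lines and never needs to know which of the
inputs was compressed.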


From fredrik at  Sun Feb 19 16:05:36 2006
From: fredrik at (Fredrik Lundh)
Date: Sun, 19 Feb 2006 16:05:36 +0100
Subject: [Python-Dev] Enhancements to the fileinput module
References: <dta16v$h2q$>
Message-ID: <dta1g2$ij3$>

Georg Brandl wrote:

> I've just checked in some enhancements to the fileinput module.
> * fileno() to check the current file descriptor
> * mode argument to allow opening in universal newline mode
> * openhook argument to allow transparent opening of compressed
>   or encoded files.
> Please feel free to comment.

hey, where's the PEP, the endless thread where the same arguments are
repeated over and over again, the -1 vetos from the peanut gallery, and
the mandatory off-topic subthreads?

(looks good to me.  it might be an idea to mention that hook_compressed
uses the extension instead of the file signature to determine what
decompressor to use, though...)


From g.brandl at  Sun Feb 19 16:06:16 2006
From: g.brandl at (Georg Brandl)
Date: Sun, 19 Feb 2006 16:06:16 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <dta1h8$h2q$>

Benji York wrote:
> Neal Norwitz wrote:
> If there's interest in slightly nicer buildbot CSS (something like 
> I'd be glad to contribute.

+1. Looks nice!


From stephen at  Sun Feb 19 16:14:14 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 20 Feb 2006 00:14:14 +0900
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
	(Guido van Rossum's message of "Fri, 17 Feb 2006 13:26:30 -0800")
References: <>
	<dstlvb$6cb$> <>
Message-ID: <>

>>>>> "Guido" == Guido van Rossum <guido at> writes:

    Guido> On 2/16/06, Stephen J. Turnbull <stephen at> wrote:

    >> /usr/share often is on a different mount; that's the whole
    >> rationale for /usr/share.

    Guido> I don't think I've worked at a place where something like
    Guido> that was done for at least 10 years. Isn't this argument
    Guido> outdated?

I don't know.  It may be obsolete in practice.  I just know that I do
it, and so do several of the people on the Coda list.

In my case, I don't do it because I'm short of disk space.  I do it
because my preferred distributed file system is Coda, which doesn't
support exporting a local file system.  You use a specialized server
instead.  Because Coda is designed for disconnected use, the files I
actually am using are in the cache (200MB, so cache misses when
disconnected are fairly rare).  But if the host whose files I'm
browsing gets an update and I'm connected, Coda automatically
refreshes the cache.

Coda is still not really production quality, and development on Coda
and similar systems (eg Intermezzo) seems pretty slow, so this use case may
never be of practical importance.


From stephen at  Sun Feb 19 17:21:59 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 20 Feb 2006 01:21:59 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Josiah Carlson's
	message of "Sat, 18 Feb 2006 01:16:07 -0800")
References: <>
	<> <>
Message-ID: <>

>>>>> "Josiah" == Josiah Carlson <jcarlson at> writes:

    Josiah> The question remains: is str.decode() returning a string
    Josiah> or unicode depending on the argument passed, when the
    Josiah> argument quite literally names the codec involved,
    Josiah> difficult to understand?  I don't believe so; am I the
    Josiah> only one?

Do you do any of the user education *about codec use* that you
recommend?  The people I try to teach about coding invariably find it
difficult to understand.  The problem is the near-universal
intuition that for "human-usable text" pretty much anything *but
Unicode* will do.  This is a really hard block to get them past.
There is very good reason why Unicode is plain text ("original" in
MAL's terms) and everything else is encoded ("derived"), but students
new to the concept often take a while to "get" it.

Maybe it's just me, but whether it's the teacher or the students, I am
*not* excited about the education route.  Martin's simple rule *is*
simple, and the exceptions for using a "nonexistent" method mean I
don't have to reinforce---the students will be able to teach each
other.  The exceptions also directly help reinforce the notion that
text == Unicode.

I grant the point that .decode('base64') is useful, but I also believe
that "education" is a lot more easily said than done in this case.


From murman at  Sun Feb 19 17:23:15 2006
From: murman at (Michael Urman)
Date: Sun, 19 Feb 2006 10:23:15 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/19/06, Josiah Carlson <jcarlson at> wrote:
> My post probably hasn't convinced you, but much of the confusion, I
> believe, is based on Martin's original belief that 'k in dd' should
> always return true if there is a default.  One can argue that way, but
> then you end up on the circular train of thought that gets you to "you
> can't do anything useful if that is the case, .popitem() doesn't work,
> len() is undefined, ...".  Keep it simple, keep it sane.

A default factory implementation fundamentally modifies the behavior
of the mapping. There is no single answer to the question "what is the
right behavior for contains, len, popitem" as that depends on what the
code that consumes the mapping is written like, what it is attempting
to do, and what you are attempting to override it to do. Or, simply,
on why you are providing a default value. Resisting the temptation to
guess the why and just leaving the methods as is seems the best
choice; overriding __contains__ to return true is much easier than
reversing that behavior would be.

Here is an example of where it could theoretically be used, even if it is
not particularly useful. The gettext.install() function was just updated to
take a names parameter which controls which gettext accessor functions it
adds to the builtin namespace. Its implementation looks for "method in
names" to decide. Passing a default-true dict would allow the future
behavior to be to bind all checked names, but only if __contains__
returns True.

Even though it would make a poor base implementation, and these
effects aren't a good candidate for it, the code style that could
best leverage such a __contains__ exists.
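
To make the idea concrete, here is a small sketch. AllNames and
install_accessors() are invented for illustration; the latter only imitates
the "method in names" style and is not the gettext implementation.

```python
class AllNames(dict):
    """Hypothetical 'default-true' mapping: every name is reported as present."""

    def __contains__(self, key):
        return True


def install_accessors(available, names):
    """Hypothetical consumer using the 'method in names' style."""
    return [method for method in available if method in names]


available = ["gettext", "ngettext", "lgettext"]
print(install_accessors(available, ["ngettext"]))  # ['ngettext']
print(install_accessors(available, AllNames()))    # ['gettext', 'ngettext', 'lgettext']
```

The consumer is unchanged; only the object passed as names decides whether
a membership test can ever fail.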

Michael Urman

From benji at  Sun Feb 19 18:06:13 2006
From: benji at (Benji York)
Date: Sun, 19 Feb 2006 12:06:13 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> I personally don't care much about the visual look of web pages.
> However, people have commented that the buildbot page is ugly,
> so yes, please do contribute something.


It doesn't look quite as good in IE because of the limited HTML the 
buildbot waterfall display generates and the limitations of IE's CSS 
support.

> Bonus points for visually separating the "trunk" columns from
> the "2.4" columns.

The best I could do without hacking buildbot was to highlight the trunk 
"builder" links.  This only works in Firefox, also because of IE's 
limited CSS2 support.

More could be done if the HTML generation was modified, but that didn't 
seem prudent.
Benji York

From stephen at  Sun Feb 19 18:26:39 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 20 Feb 2006 02:26:39 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Bob
	Ippolito's message of "Fri, 17 Feb 2006 21:10:04 -0800")
References: <>
Message-ID: <>

>>>>> "Bob" == Bob Ippolito <bob at> writes:

    Bob> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:

    >> But you aren't always getting *unicode* text from the decoding
    >> of bytes, and you may be encoding bytes *to* bytes:

Please note that I presumed that you can indeed assume that decoding
of bytes always results in unicode, and encoding of unicode always
results in bytes.  I believe Guido made the proposal relying on that
assumption too.  The constructor notation makes no sense for making an
object of the same type as the original unless it's a copy constructor.

You could argue that the base64 language is indeed a different
language from the bytes language, and I'd agree.  But since there's no
way in Python to determine whether a string that conforms to base64 is
supposed to be base64 or bytes, it would be a very bad idea to
interpret the distinction as one of type.

    >>     b2 = bytes(b, "base64")

    >>     b3 = bytes(b2, "base64")

    >> Which direction are we going again?

    Bob> This is *exactly* why the current set of codecs are INSANE.
    Bob> unicode.encode and str.decode should be used *only* for
    Bob> unicode codecs.  Byte transforms are entirely different
    Bob> semantically and should be some other method pair.

General filters are semantically different, I agree.  But "encode" and
"decode" in English are certainly far more general than character
coding conversion.  The use of those methods for any stream conversion
that is invertible (eg, compression or encryption) is not insane.
It's just pedagogically inconvenient given the existing confusion
(outside of python-dev, of course<wink>) about character coding.

I'd like to rephrase your statement as "*only* unicode.encode and
str.decode should be used for unicode codecs".  Ie, str.encode(codec)
and unicode.decode(codec) should raise errors if codec is a "unicode
codec".  The question in my mind is whether we should allow other
kinds of codecs or not.

I could live with "not"<wink>, but if we're going to have other kinds
of codecs, I think they should have concrete signatures.  Ie,
basestring -> basestring shouldn't be allowed.  Content transfer
encodings like BASE64 and quoted-printable, compression, encryption,
etc IMO should be bytes -> bytes.  Overloading to unicode -> unicode
is sorta plausible for BASE64 or QP, but YAGNI.  OTOH, the Unicode
standard does define a number of unicode -> unicode transformations,
and it might make sense to generalize to case conversions etc.  (Note
that these conversions are pseudo-invertible, so you can think of them
as generalized .encode/.decode pairs.  The inverse is usually the
identity, which seems weird, but from the pedagogical standpoint you
could handle that weirdness by raising an error if the .encode method
were invoked.)
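
The concrete signatures proposed above can be illustrated with a minimal
sketch.  It assumes Python 3, where text (`str`) and `bytes` are distinct
types, rather than the Python 2 str/unicode split under discussion; the
sample text and variable names are illustrative only.

```python
import base64

text = "This is a pen."

# Character codec: text -> bytes on encode, bytes -> text on decode.
encoded = text.encode("utf-8")
assert isinstance(encoded, bytes)
assert encoded.decode("utf-8") == text

# Content transfer encoding: bytes -> bytes in both directions.
wire = base64.b64encode(encoded)
assert isinstance(wire, bytes)
assert base64.b64decode(wire) == encoded
```

Keeping each family to a single signature means the direction of a
conversion can be read off the types involved.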

To be concrete, I could imagine writing

    s2 = s1.decode('upcase')
    if s2 == s1:
        print "Why are you shouting at me?"
    else:
        print "I like calm, well-spoken snakes."

    s3 = s2.encode('upcase')
    if s3 == s2:
        print "Never fails!"
    else:
        print "See a vet; your Python is *very* sick."

I chose the decode method to do the non-trivial transformation because
.decode()'s value is supposed to be "original" text in MAL's terms.
And that's true of uppercase-only text; you're still supposed to be
able to read it, so I guess it's not "encoded".  That's pretty
pedantic; I think it's better to raise on .encode('upcase').
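
A rough sketch of the "raise on .encode('upcase')" variant follows.  It
assumes Python 3, where str has no .decode() method, so the generic
codecs.decode()/codecs.encode() functions stand in for the methods.  The
'upcase' codec is hypothetical and is registered here only for
illustration; the helper names are assumptions as well.

```python
import codecs

def _upcase_decode(text, errors="strict"):
    # The non-trivial direction: produce the upper-cased text.
    return text.upper(), len(text)

def _upcase_encode(text, errors="strict"):
    # Upper-casing loses information, so there is nothing sensible to undo.
    raise TypeError("'upcase' has no encode direction")

def _search(name):
    # Codec search function: answer only for the hypothetical 'upcase' codec.
    if name == "upcase":
        return codecs.CodecInfo(name="upcase",
                                encode=_upcase_encode,
                                decode=_upcase_decode)
    return None

codecs.register(_search)

s1 = "Why not?"
s2 = codecs.decode(s1, "upcase")

try:
    codecs.encode(s2, "upcase")
except TypeError:
    encode_rejected = True
else:
    encode_rejected = False
```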

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From stephen at  Sun Feb 19 19:04:24 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 20 Feb 2006 03:04:24 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Bengt Richter's message
	of "Sat, 18 Feb 2006 07:24:31 GMT")
References: <>
Message-ID: <>

>>>>> "Bengt" == Bengt Richter <bokr at> writes:

    Bengt> The characters in b could be encoded in plain ascii, or
    Bengt> utf16le, you have to know.

Which base64 are you thinking about?  Both RFC 3548 and RFC 2045
(MIME) specify subsets of US-ASCII explicitly.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From walter at  Sun Feb 19 19:33:37 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Sun, 19 Feb 2006 19:33:37 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Benji York wrote:
> Martin v. Löwis wrote:
>> I personally don't care much about the visual look of web pages.
>> However, people have commented that the buildbot page is ugly,
>> so yes, please do contribute something.
> See
> It doesn't look quite as good in IE because of the limited HTML the 
> buildbot waterfall display generates and the limitations of IE's CSS 
> support.
>> Bonus points for visually separating the "trunk" columns from
>> the "2.4" columns.
> The best I could do without hacking buildbot was to highlight the trunk 
> "builder" links.  This only works in Firefox, also because of IE's 
> limited CSS2 support.
> More could be done if the HTML generation was modified, but that didn't 
> seem prudent.

I'd like to see vertical lines between the columns.

Why is everything bold?

    Walter Dörwald

From martin at  Sun Feb 19 19:55:49 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 19:55:49 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> BTW, what use cases do you have in mind for Unicode -> Unicode
> decoding?

I think "rot13" falls into that category: it is a transformation
on text, not on bytes.

For other "odd" cases: "base64" goes Unicode->bytes in the *decode*
direction, not in the encode direction. Some may argue that base64
is bytes, not text, but in many applications, you can combine base64
(or uuencode) with arbitrary other text in a single stream. Of course,
it could be required that you go u.encode("ascii").decode("base64").

>     def encode-mime-body (string, codec-list):
>         if codec-list[0] not in charset-codec-list:
>             raise NotCharsetCodecException
>         if len (codec-list) > 1 and codec-list[-1] not in transfer-codec-list:
>             raise NotTransferCodecException
>         for codec in codec-list:
>             string = string.encode (codec)
>         return string
>     mime-body = encode-mime-body ("This is a pen.",
>                                   [ 'shift_jis', 'zip', 'base64' ])

I think this is an example where you *should* use the codec API,
as designed. As that apparently requires streams for stacking (ie.
no support for codec stacking), you would have to write

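import codecs
import cStringIO

def encode_mime_body(string, codec_list):
    stack = output = cStringIO.StringIO()
    for codec in reversed(codec_list):
        stack = codecs.getwriter(codec)(stack)
    stack.write(string)  # push the text through every stacked writer
    return output.getvalue()  # the bytes collected at the bottom of the stack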

Notice that you have to start the stacking with the last codec,
and you have to keep a reference to the StringIO object where
the actual bytes end up.
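
For comparison, here is a minimal sketch of the same pipeline written as
plain function composition instead of stacked stream writers.  It assumes
Python 3, where the charset step is text -> bytes and the "zlib_codec" and
"base64_codec" binary transforms are reachable through codecs.encode();
the function names and the split into a charset plus a list of transfer
codecs are assumptions made for this illustration.

```python
import codecs

def encode_mime_body(text, charset, transfer_codecs):
    # Charset codec first: text -> bytes.
    data = text.encode(charset)
    # Then each bytes -> bytes transform, applied in the order listed.
    for name in transfer_codecs:
        data = codecs.encode(data, name)
    return data

def decode_mime_body(data, charset, transfer_codecs):
    # Undo the transforms in reverse order, then decode the charset.
    for name in reversed(transfer_codecs):
        data = codecs.decode(data, name)
    return data.decode(charset)

transfer = ["zlib_codec", "base64_codec"]
body = encode_mime_body("This is a pen.", "shift_jis", transfer)
roundtrip = decode_mime_body(body, "shift_jis", transfer)
```

With composition the codecs are applied in list order, so no reversal is
needed on the encoding side.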


P.S. Some LISP shows through in your Python code :-)

From martin at  Sun Feb 19 20:04:41 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 20:04:41 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Do you do any of the user education *about codec use* that you
> recommend?  The people I try to teach about coding invariably find it
> difficult to understand.  The problem is that the near-universal
> intuition is that for "human-usable text" pretty much anything *but
> Unicode* will do.

It really is a matter of education. For the first time in my career,
I have been teaching the first-semester programming course, and I
was happy to see that the text book already has a section on text
and Unicode (actually, I selected the text book also based on whether
there was good discussion of that aspect). So I spent quite some
time with data representation (integrals, floats, characters), and
I hope that the students now "got it".

If they didn't learn it that way in the first semester (or already
got mis-educated in high school), it will be very hard for them to
relearn. So I expect that it will take a decade or two until this
all is common knowledge.


From martin at  Sun Feb 19 20:14:25 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 20:14:25 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
>     Bengt> The characters in b could be encoded in plain ascii, or
>     Bengt> utf16le, you have to know.
> Which base64 are you thinking about?  Both RFC 3548 and RFC 2045
> (MIME) specify subsets of US-ASCII explicitly.

Unfortunately, it is ambiguous as to whether they refer to US-ASCII,
the character set, or US-ASCII, the encoding. It appears that
RFC 3548 talks about the character set only:

- section 2.4 talks about "choosing an alphabet", and how it should
  be possible for humans to handle such data.
- section 2.3 talks about non-alphabet characters

So it appears that RFC 3548 defines a conversion bytes->text.
To transmit this, you then also need encoding. MIME appears
to also use the US-ASCII *encoding* ("charset", in IETF speak),
for the "base64" Content-Transfer-Encoding.

For an example where base64 is *not* necessarily ASCII-encoded,
see the "binary" data type in XML Schema. There, base64 is embedded
into an XML document, and uses the encoding of the entire XML
document. As a result, you may get base64 data in utf16le.
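
A small sketch of that distinction, assuming Python 3 and an arbitrary
sample payload: the base64 alphabet yields the same characters every time,
but the bytes that reach the wire depend on the encoding of the enclosing
document.

```python
import base64

payload = b"\x00\x01binary\xff"

# bytes -> text: the base64 characters, independent of any wire encoding.
b64_text = base64.b64encode(payload).decode("ascii")

# text -> bytes: the same characters under two different encodings.
wire_ascii = b64_text.encode("ascii")
wire_utf16 = b64_text.encode("utf-16-le")

# Recovering the payload means undoing the document encoding first.
recovered = base64.b64decode(wire_utf16.decode("utf-16-le"))
```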


From martin at  Sun Feb 19 20:18:00 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 19 Feb 2006 20:18:00 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>	<>
	<> <>
Message-ID: <>

Walter Dörwald wrote:
> I'd like to see vertical lines between the columns.

Can you please elaborate? Between which columns?

> Why is everything bold?

Not sure.


From walter at  Sun Feb 19 20:37:29 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Sun, 19 Feb 2006 20:37:29 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Walter Dörwald wrote:
>> I'd like to see vertical lines between the columns.
> Can you please elaborate? Between which columns?

Something like this:

>> Why is everything bold?
> Not sure.

    Walter Dörwald

From nnorwitz at  Sun Feb 19 20:42:51 2006
From: nnorwitz at (Neal Norwitz)
Date: Sun, 19 Feb 2006 11:42:51 -0800
Subject: [Python-Dev] [Python-checkins] r42396 - peps/trunk/pep-0011.txt
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, M.-A. Lemburg <mal at> wrote:
> Neal Norwitz wrote:
> >
> > I don't have a strong opinion.  Any one else have an opinion about
> > removing --with-wctype-functions from configure?
> FWIW, I announced this plan in Dec 2004:
> I didn't get any replies back then, so assumed that no-one
> would object, but forgot to add this to the PEP 11.
> The reason I'd like to get this removed early rather than
> later is that some Linux distros happen to use the config
> switch causing the Python Unicode implementation on those
> distros to behave inconsistently with regular Python
> builds.

Like I said I don't have a strong opinion.  At least update PEP 11
now.  It would be good to ask on c.l.p.  I suspect that no one cares
enough about this flag to complain.  So it's probably ok to remove it.
 But we should at least give people the opportunity to object.

> Another candidate for removal is the --disable-unicode
> switch.
> We should probably add a deprecation warning for that in
> Py 2.5 and then remove the hundreds of
> from the source code in time for Py 2.6.

I've heard of a bunch of people using --disable-unicode.  I'm not sure
if it's curiosity or if there are really production builds without
unicode.  Ask this on c.l.p too.

We can update configure to add the warning and add a note to PEP 11. 
If we don't hear any complaints remove it for 2.6.  If there are
complaints, we can always back off.


From ianb at  Sun Feb 19 21:15:38 2006
From: ianb at (Ian Bicking)
Date: Sun, 19 Feb 2006 14:15:38 -0600
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>	<00d301c6351e$6e02b500$b83efea9@RaymondLaptop1>	<>
Message-ID: <>

Michael Urman wrote:
> On 2/19/06, Josiah Carlson <jcarlson at> wrote:
>>My post probably hasn't convinced you, but much of the confusion, I
>>believe, is based on Martin's original belief that 'k in dd' should
>>always return true if there is a default.  One can argue that way, but
>>then you end up on the circular train of thought that gets you to "you
>>can't do anything useful if that is the case, .popitem() doesn't work,
>>len() is undefined, ...".  Keep it simple, keep it sane.
> A default factory implementation fundamentally modifies the behavior
> of the mapping. There is no single answer to the question "what is the
> right behavior for contains, len, popitem" as that depends on what the
> code that consumes the mapping is written like, what it is attempting
> to do, and what you are attempting to override it to do. Or, simply,
> on why you are providing a default value. Resisting the temptation to
> guess the why and just leaving the methods as is seems  the best
> choice; overriding __contains__ to return true is much easier than
> reversing that behavior would be.

I agree that there is simply no universally correct answer for the 
various uses of default_factory.  I think ambiguity on points like this 
is a sign that something is overly general.

In many of the concrete cases it is fairly clear how these methods 
should work.  In the most obvious case (default_factory=list) what seems 
to me to be the correct implementation is one that no one is proposing, 
that is, "x in d" means "d.get(x)".  But that uses the fact that the 
return value of default_factory() is a false value, which we cannot 
assume in general.  And it affects .keys() -- which I would propose 
overriding for multidict (so it only returns keys with non-empty lists 
for values), but I don't see how it could be made correct for 
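
The default_factory=list case can be pictured with the sketch below.  It
assumes Python 3 and the collections.defaultdict that later shipped in the
standard library; the MultiDict class is hypothetical and shows only one of
several possible readings of "contains".

```python
from collections import defaultdict

class MultiDict(defaultdict):
    """Hypothetical multidict: keys holding empty lists count as absent."""

    def __init__(self):
        super().__init__(list)

    def __contains__(self, key):
        # "x in d" means "d.get(x)" is true; get() never calls the factory.
        return bool(self.get(key))

    def keys(self):
        # Report only keys whose list is non-empty.
        return [k for k, v in self.items() if v]

d = MultiDict()
d["a"].append(1)
d["b"]  # the lookup alone creates an empty list for "b"
```

Here "b" is stored but reported as missing, while len(d) still counts it.
That mismatch is the kind of ambiguity a general default_factory cannot
settle for every use.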

I just don't see why we should cram all these potential features into 
dict by using a vague feature like default_factory.  Why can't we just 
add a half-dozen new types of collections (to the module of the same 
name)?  Each one will get its own page of documentation, a name, a 
proper __repr__, and well defined meaning for all of these methods that 
it shares with dict only insofar as it makes sense to share.

Note that even if we use defaultdict or autodict or something besides 
changing dict itself, we still won't get a good __contains__, a good 
repr, or any of the other features that specific collection 
implementations will give us.

Isn't there anyone else who sees the various dict-like objects being 
passed around as recipes, and thinks that maybe that's a sign they 
should go in the stdlib?  The best of those recipes aren't 
all-encompassing, they just do one kind of container well.

Ian Bicking  |  ianb at  |

From benji at  Sun Feb 19 22:06:41 2006
From: benji at (Benji York)
Date: Sun, 19 Feb 2006 16:06:41 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Walter Dörwald wrote:
> I'd like to see vertical lines between the columns.

I've done a version like that (still at

> Why is everything bold?

I was trying to increase the legibility of the smaller type (a result of 
trying to fit more in the horizontal space).  The current version is 
bold-free with slightly larger text.
Benji York

From tim.peters at  Sun Feb 19 22:07:47 2006
From: tim.peters at (Tim Peters)
Date: Sun, 19 Feb 2006 16:07:47 -0500
Subject: [Python-Dev] test_fileinput failing on Windows
Message-ID: <>

This started failing since last night:

C:\Code\python\PCbuild>python ..\lib\test\
1. Simple iteration (bs=0)
2. Status variables (bs=0)
3. Nextfile (bs=0)
4. Stdin (bs=0)
5. Boundary conditions (bs=0)
6. Inplace (bs=0)
7. Simple iteration (bs=30)
8. Status variables (bs=30)
9. Nextfile (bs=30)
10. Stdin (bs=30)
11. Boundary conditions (bs=30)
12. Inplace (bs=30)
13. 0-byte files
14. Files that don't end with newline
15. Unicode filenames
16. fileno()
17. Specify opening mode
Traceback (most recent call last):
  File "..\lib\test\", line 201, in <module>
    verify(lines == ["A\n", "B\n", "C\n", "D"])
  File "C:\Code\python\lib\test\", line 204, in verify
    raise TestFailed(reason)
test.test_support.TestFailed: test failed

`lines` at that point is

    ['A\n', 'B\n', '\n', 'C\n', 'D']

which indeed doesn't equal

    ["A\n", "B\n", "C\n", "D"]

From nnorwitz at  Sun Feb 19 22:13:12 2006
From: nnorwitz at (Neal Norwitz)
Date: Sun, 19 Feb 2006 13:13:12 -0800
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/19/06, Benji York <benji at> wrote:
> Walter Dörwald wrote:
> > I'd like to see vertical lines between the columns.
> I've done a version like that (still at

I liked your current version better so I installed it.


From benji at  Sun Feb 19 22:14:11 2006
From: benji at (Benji York)
Date: Sun, 19 Feb 2006 16:14:11 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Benji York wrote:
> Great! you haven't explicitly stated that: may I copy this on
> (I did, but I need confirmation)

Sure!  Feel free to use it as you wish.

I replied to Walter Dörwald's suggestions and made a few changes, but 
don't know which I like better.  If you prefer the new one at you can use it as well.

(copying python-dev as a permanent record of permission)
Benji York

From tim.peters at  Sun Feb 19 22:22:29 2006
From: tim.peters at (Tim Peters)
Date: Sun, 19 Feb 2006 16:22:29 -0500
Subject: [Python-Dev] test_fileinput failing on Windows
In-Reply-To: <>
References: <>
Message-ID: <>

Never mind -- repaired it.

From crutcher at  Sun Feb 19 22:52:38 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 13:52:38 -0800
Subject: [Python-Dev] New Module: CommandLoop
Message-ID: <>

This is something I've been working on for a bit, and I think it is
more or less ready to bring up on this list. I'd like to add a module
(though probably not for 2.5).

Before you ask, this module is _not_ compatible with, as it is
command oriented, whereas is line oriented.

Anyway, I'm looking for feedback, feature requests before starting the
submission process.

Code available here:

Base class for writing simple interactive command loop environments.

CommandLoop provides a base class for writing simple interactive user
environments.  It is designed around sub-classing, has a simple command
parser, and is trivial to initialize.

Here is a trivial little environment written using CommandLoop:

    import cmdloop

    class Hello(cmdloop.commandLoop):
        PS1='hello>'

        @cmdloop.aliases('hello', 'hi', 'hola')
        @cmdloop.shorthelp('say hello')
        @cmdloop.usage('hello TARGET')
        def helloCmd(self, flags, args):
            '''
            Say hello to TARGET, which defaults to 'world'
            '''
            if flags or len(args) != 1:
                raise cmdloop.InvalidArguments
            print 'Hello %s!' % args[0]

        @cmdloop.aliases('quit')
        def quitCmd(self, flags, args):
            '''
            Quit the environment.
            '''
            raise cmdloop.HaltLoop

    Hello().runLoop()


Here's a more complex example:

    import cmdloop

    class HelloGoodbye(cmdloop.CommandLoop):
        PS1='hello>'

        def __init__(self, default_target = 'world'):
            self.default_target = default_target
            self.target_list = []

        @cmdloop.aliases('hello', 'hi', 'hola')
        @cmdloop.shorthelp('say hello')
        @cmdloop.usage('hello [TARGET]')
        def helloCmd(self, flags, args):
            '''
            Say hello to TARGET, which defaults to 'world'
            '''
            if flags or len(args) > 1:
                raise cmdloop.InvalidArguments
            if args:
                target = args[0]
            else:
                target = self.default_target
            if target not in self.target_list:
                self.target_list.append(target)
            print 'Hello %s!' % target

        @cmdloop.aliases('goodbye')
        @cmdloop.shorthelp('say goodbye')
        @cmdloop.usage('goodbye TARGET')
        def goodbyeCmd(self, flags, args):
            '''
            Say goodbye to TARGET.
            '''
            if flags or len(args) != 1:
                raise cmdloop.InvalidArguments
            target = args[0]
            if target in self.target_list:
                print 'Goodbye %s!' % target
                self.target_list.remove(target)
            else:
                print "I haven't said hello to %s." % target

        @cmdloop.aliases('quit')
        def quitCmd(self, flags, args):
            '''
            Quit the environment.
            '''
            raise cmdloop.HaltLoop

        def _onLoopExit(self):
            if len(self.target_list):
                self.pushCommands(('quit',))
                for target in self.target_list:
                    self.pushCommands(('goodbye', target))
            else:
                raise cmdloop.HaltLoop

    HelloGoodbye().runLoop()


Crutcher Dunnavant <crutcher at>

From crutcher at  Sun Feb 19 23:02:20 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 14:02:20 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <>
References: <>
Message-ID: <>

oops, error in the example: s/commandLoop/CommandLoop/g

On 2/19/06, Crutcher Dunnavant <crutcher at> wrote:
> This is something I've been working on for a bit, and I think it is
> more or less ready to bring up on this list. I'd like to add a module
> (though probably not for 2.5).
> Before you ask, this module is _not_ compatible with, as it is
> command oriented, whereas is line oriented.
> Anyway, I'm looking for feedback, feature requests before starting the
> submission process.
> Code available here:
> Base class for writing simple interactive command loop environments.
> CommandLoop provides a base class for writing simple interactive user
> environments.  It is designed around sub-classing, has a simple command
> parser, and is trivial to initialize.
> Here is a trivial little environment written using CommandLoop:
>     import cmdloop
>     class Hello(cmdloop.commandLoop):
>         PS1='hello>'
>         @cmdloop.aliases('hello', 'hi', 'hola')
>         @cmdloop.shorthelp('say hello')
>         @cmdloop.usage('hello TARGET')
>         def helloCmd(self, flags, args):
>             '''
>             Say hello to TARGET, which defaults to 'world'
>             '''
>             if flags or len(args) != 1:
>                 raise cmdloop.InvalidArguments
>             print 'Hello %s!' % args[0]
>         @cmdloop.aliases('quit')
>         def quitCmd(self, flags, args):
>             '''
>             Quit the environment.
>             '''
>             raise cmdloop.HaltLoop
>     Hello().runLoop()
> Here's a more complex example:
>     import cmdloop
>     class HelloGoodbye(cmdloop.CommandLoop):
>         PS1='hello>'
>         def __init__(self, default_target = 'world'):
>             self.default_target = default_target
>             self.target_list = []
>         @cmdloop.aliases('hello', 'hi', 'hola')
>         @cmdloop.shorthelp('say hello')
>         @cmdloop.usage('hello [TARGET]')
>         def helloCmd(self, flags, args):
>             '''
>             Say hello to TARGET, which defaults to 'world'
>             '''
>             if flags or len(args) > 1:
>                 raise cmdloop.InvalidArguments
>             if args:
>                 target = args[0]
>             else:
>                 target = self.default_target
>             if target not in self.target_list:
>                 self.target_list.append(target)
>             print 'Hello %s!' % target
>         @cmdloop.aliases('goodbye')
>         @cmdloop.shorthelp('say goodbye')
>         @cmdloop.usage('goodbye TARGET')
>         def goodbyeCmd(self, flags, args):
>             '''
>             Say goodbye to TARGET.
>             '''
>             if flags or len(args) != 1:
>                 raise cmdloop.InvalidArguments
>             target = args[0]
>             if target in self.target_list:
>                 print 'Goodbye %s!' % target
>                 self.target_list.remove(target)
>             else:
>                 print "I haven't said hello to %s." % target
>         @cmdloop.aliases('quit')
>         def quitCmd(self, flags, args):
>             '''
>             Quit the environment.
>             '''
>             raise cmdloop.HaltLoop
>         def _onLoopExit(self):
>             if len(self.target_list):
>                 self.pushCommands(('quit',))
>                 for target in self.target_list:
>                     self.pushCommands(('goodbye', target))
>             else:
>                 raise cmdloop.HaltLoop
>     HelloGoodbye().runLoop()
> --
> Crutcher Dunnavant <crutcher at>

Crutcher Dunnavant <crutcher at>

From brett at  Sun Feb 19 23:09:31 2006
From: brett at (Brett Cannon)
Date: Sun, 19 Feb 2006 14:09:31 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/19/06, Crutcher Dunnavant <crutcher at> wrote:
> This is something I've been working on for a bit, and I think it is
> more or less ready to bring up on this list. I'd like to add a module
> (though probably not for 2.5).
> Before you ask, this module is _not_ compatible with, as it is
> command oriented, whereas is line oriented.
> Anyway, I'm looking for feedback, feature requests before starting the
> submission process.
> Code available here:

Just so you know, there is a basic rule that all new modules need to
have been used in the wild and generally accepted and used by the
Python community before being accepted.  While this is not a hard
rule, it is mostly followed so if there is no visible use of the
module by the rest of the world it will be difficult to get it
accepted.

-Brett


From crutcher at  Sun Feb 19 23:15:56 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 14:15:56 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <>
References: <>
Message-ID: <>

Yes, I know. Hence this not being a patch.
This is really meant to be a compelling alternative to cmd.Cmd, and as
such I'm trying to get some discussion about it.

On 2/19/06, Brett Cannon <brett at> wrote:
> On 2/19/06, Crutcher Dunnavant <crutcher at> wrote:
> > This is something I've been working on for a bit, and I think it is
> > more or less ready to bring up on this list. I'd like to add a module
> > (though probably not for 2.5).
> >
> > Before you ask, this module is _not_ compatible with, as it is
> > command oriented, whereas is line oriented.
> >
> > Anyway, I'm looking for feedback, feature requests before starting the
> > submission process.
> >
> > Code available here:
> >
> Just so you know, there is a basic rule that all new modules need to
> have been used in the wild and generally accepted and used by the
> Python community before being accepted.  While this is not a hard
> rule, it is mostly followed so if there is no visible use of the
> module by the rest of the world it will be difficult to get it
> accepted.
> -Brett

Crutcher Dunnavant <crutcher at>

From bob at  Sun Feb 19 23:49:48 2006
From: bob at (Bob Ippolito)
Date: Sun, 19 Feb 2006 14:49:48 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
	<>	<>
Message-ID: <>

On Feb 19, 2006, at 10:55 AM, Martin v. Löwis wrote:

> Stephen J. Turnbull wrote:
>> BTW, what use cases do you have in mind for Unicode -> Unicode
>> decoding?
> I think "rot13" falls into that category: it is a transformation
> on text, not on bytes.

The current implementation is a transformation on bytes, not text.   
Conceptually though, it's a text->text transform.

> For other "odd" cases: "base64" goes Unicode->bytes in the *decode*
> direction, not in the encode direction. Some may argue that base64
> is bytes, not text, but in many applications, you can combine base64
> (or uuencode) with arbitrary other text in a single stream. Of course,
> it could be required that you go u.encode("ascii").decode("base64").

I would say that base64 is bytes->bytes.  Just because those bytes  
happen to be in a subset of ASCII, it's still a serialization meant  
for wire transmission.  Sometimes it ends up in unicode (e.g. in  
XML), but that's the exception not the rule.


From mail at  Sun Feb 19 23:32:23 2006
From: mail at (Manuzhai)
Date: Sun, 19 Feb 2006 23:32:23 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<dt94d8$bbd$>
Message-ID: <dtarlp$4av$>

> No; nobody volunteered a machine yet (plus the hand-holding that
> is always necessary with Windows).

What exactly is needed for this? Does it need to be a machine dedicated 
to this stuff, or could I just run the tests once every day or so when I 
feel like it and have them submitted to buildbot?



From walter at  Mon Feb 20 00:07:40 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Mon, 20 Feb 2006 00:07:40 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>
	<>	<>
	<>	<>
Message-ID: <>

Neal Norwitz wrote:
> On 2/19/06, Benji York <benji at> wrote:
>> Walter Dörwald wrote:
>>> I'd like to see vertical lines between the columns.
>> I've done a version like that (still at
> I liked your current version better so I installed it.

How about this one:

    Walter Dörwald

From python at  Mon Feb 20 00:14:35 2006
From: python at (Raymond Hettinger)
Date: Sun, 19 Feb 2006 18:14:35 -0500
Subject: [Python-Dev] New Module: CommandLoop
References: <>
Message-ID: <001301c635aa$40288e70$b83efea9@RaymondLaptop1>

[Crutcher Dunnavant]
> Anyway, I'm looking for feedback, feature requests before starting the
> submission process.

With respect to the API, the examples tend to be visually dominated by 
the series of decorators.  The three decorators do nothing more than add a 
function attribute, so they are essentially doing the same type of action. 
Consider combining those into a single decorator and using keywords for the 
various arguments.  For example, change:

        @cmdloop.aliases('goodbye')
        @cmdloop.shorthelp('say goodbye')
        @cmdloop.usage('goodbye TARGET')

to just:

        @cmdloop.addspec(aliases=['goodbye'], shorthelp='say goodbye',
                         usage='goodbye TARGET')

leaving the possibility of multiple decorators when one line gets too long:

        @cmdloop.addspec(aliases=['goodbye'], shorthelp='say goodbye')
        @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is a filename in the current directory')
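
One way such a combined decorator might look is sketched below.  The
addspec() shown here is hypothetical: it is written from the description
above (each decorator only sets a function attribute) and is not taken from
the cmdloop source.

```python
def addspec(**spec):
    """Hypothetical decorator: store each keyword as a function attribute."""
    def decorate(func):
        for name, value in spec.items():
            setattr(func, name, value)
        return func
    return decorate

# Stacking still works, because each call only adds its own attributes.
@addspec(aliases=['goodbye'], shorthelp='say goodbye')
@addspec(usage='goodbye TARGET')
def goodbyeCmd(self, flags, args):
    """Say goodbye to TARGET."""
```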

Another thought on the API is to consider adding another decorator option to 
make commands case-insensitive so that 'help', 'HELP', and 'Help' will all work:

Also, in the absence of readline(), consider adding support for "!" style 
repeats of previous commands.

The exception hierarchy looks to be well-designed.  I'm not clear on whether it 
is internal or part of the API.  If the latter, is there an example of how to 
trap and handle specific exceptions in specific contexts?

If you're interested, here are a few code comments based on my first 

1) The "chars" variable can be eliminated and the "while chars" and 
"c=chars.pop(0)" sequence simplified to just:
    for c in reversed(str):
        . . .

2) Can the reformatDocString() function be replaced by textwrap.dedent() or do 
they do something different?

3) In _mapCommands(), the sort can be simplified from:
        self._cmds.sort(lambda a, b: cmp(a.aliases[0], b.aliases[0]))
    or if you want to avoid module dependencies:
        self._cmds.sort(key=lambda a: a[0])

4) In _preCommand, the sort simplifies from:
                    names = self.aliasdict.keys()
                    names.sort()
                    for name in names:
    to:
                    for name in sorted(self.aliasdict):
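
The two sorting suggestions can be checked with a small stand-alone sketch.
It assumes Python 3, where list.sort() no longer accepts a comparison
function, so functools.cmp_to_key stands in for the old cmp-style call.
The Command class and the sample aliases are hypothetical stand-ins for the
module's own command objects.

```python
from functools import cmp_to_key

class Command:
    """Hypothetical stand-in for a command object with an aliases list."""
    def __init__(self, *aliases):
        self.aliases = list(aliases)

def first_alias(command):
    return command.aliases[0]

def compare(a, b):
    # Old-style comparison on the first alias: returns -1, 0 or 1.
    x, y = first_alias(a), first_alias(b)
    return (x > y) - (x < y)

cmds = [Command('quit'), Command('hello', 'hi', 'hola'), Command('goodbye')]

by_cmp = sorted(cmds, key=cmp_to_key(compare))  # comparison-function style
by_key = sorted(cmds, key=first_alias)          # key-function style

# Iterating a dict in sorted key order without building a list first.
aliasdict = {alias: cmd for cmd in cmds for alias in cmd.aliases}
names = [name for name in sorted(aliasdict)]
```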


From brett at  Mon Feb 20 00:23:18 2006
From: brett at (Brett Cannon)
Date: Sun, 19 Feb 2006 15:23:18 -0800
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/19/06, Walter Dörwald <walter at> wrote:
> Neal Norwitz wrote:
> > On 2/19/06, Benji York <benji at> wrote:
> >> Walter Dörwald wrote:
> >>> I'd like to see vertical lines between the columns.
> >> I've done a version like that (still at
> >
> > I liked your current version better so I installed it.
> How about this one:

I like it.  It's really nice to be able to fit it all on a single
screen (at least for me).  Seems slightly crisper to me as well.


From fdrake at  Mon Feb 20 00:32:13 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Sun, 19 Feb 2006 18:32:13 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sunday 19 February 2006 16:14, Benji York wrote:
 > I replied to Walter Dörwald's suggestions and made a few changes, but
 > don't know which I like better.  If you prefer the new one at
 > you can use it as well.

I like the new one better; any chance we can switch to that on as well?  ;-)  The improved use of horizontal space is 


Fred L. Drake, Jr.   <fdrake at>

From fdrake at  Mon Feb 20 00:34:49 2006
From: fdrake at (Fred L. Drake, Jr.)
Date: Sun, 19 Feb 2006 18:34:49 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>
Message-ID: <>

On Sunday 19 February 2006 18:07, Walter Dörwald wrote:
 > How about this one:

Sigh.  This is nice too.  Now I'm not sure which I'd rather see on  ;-)


Fred L. Drake, Jr.   <fdrake at>

From raymond.hettinger at  Mon Feb 20 00:59:07 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sun, 19 Feb 2006 18:59:07 -0500
Subject: [Python-Dev] New Module: CommandLoop
References: <>
Message-ID: <000801c635b0$78a78ac0$b83efea9@RaymondLaptop1>

[Raymond Hettinger]
> 1) The "chars" variable can be eliminated and the "while chars" and 
> "c=chars.pop(0)" sequence simplified to just:
>    for c in reversed(str):

Actually, that should have been just:
      for c in str:
         . . .
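
A small sketch of why the plain loop is the right replacement. The function
names and the sample command line are made up for illustration; the actual
parser from the module is not shown in this thread.

```python
def consume_with_pop(cmdline):
    """Visit each character using the 'while chars' / pop(0) pattern."""
    chars = list(cmdline)
    seen = []
    while chars:
        c = chars.pop(0)  # takes characters from the front, left to right
        seen.append(c)
    return seen


def consume_with_for(cmdline):
    """Visit each character with a plain for loop."""
    seen = []
    for c in cmdline:
        seen.append(c)
    return seen


sample = "say hi"
same_order = consume_with_pop(sample) == consume_with_for(sample)
reversed_order = list(reversed(sample))
```

Because pop(0) removes from the front, the original loop already runs left
to right, so reversed() would have changed the order of the characters.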


From crutcher at  Mon Feb 20 01:26:25 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 16:26:25 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <001301c635aa$40288e70$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

Whoa, thanks. Incorporated the suggestions to the code.

On 2/19/06, Raymond Hettinger <python at> wrote:
> [Crutcher Dunnavant]
> > Anyway, I'm looking for feedback, feature requests before starting the
> > submission process.
> With respect to the API, the examples tend to be visually dominated by
> the series of decorators.  The three decorators do nothing more than add a
> function attribute, so they are essentially doing the same type of action.
> Consider combining those into a single decorator and using keywords for the
> various arguments.  For example, change:
>         @cmdloop.aliases('goodbye')
>         @cmdloop.shorthelp('say goodbye')
>         @cmdloop.usage('goodbye TARGET')
> to just:
>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye',
> usage='goodbye TARGET')
> leaving the possibility of multiple decorators when one line gets too long:
>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye')
>         @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is a filename in
> the current directory')

Well, why not support both, and leave it up to the user?

> Another thought on the API is to consider adding another decorator option to
> make commands case-insensitive so that 'help', 'HELP', and 'Help' will all work:
>          @cmdloop.addspec(case_sensitive=True)

shouldn't this be a property of the shell, and not the individual commands?
Perhaps a CASE_SENSITIVE=False attribute on the shell?
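
A rough sketch of what a shell-level switch could look like. The Shell
class, its register() and dispatch() methods, and the error type are all
assumptions made for illustration; they are not the CommandLoop API.

```python
class Shell:
    """Hypothetical command shell with a class-level case switch."""

    CASE_SENSITIVE = False

    def __init__(self):
        self._commands = {}

    def _normalize(self, name):
        # Fold case once, in one place, when the shell is case-insensitive.
        return name if self.CASE_SENSITIVE else name.lower()

    def register(self, name, handler):
        self._commands[self._normalize(name)] = handler

    def dispatch(self, line):
        name, _, rest = line.strip().partition(" ")
        handler = self._commands.get(self._normalize(name))
        if handler is None:
            raise ValueError("unknown command: %r" % name)
        return handler(rest)


class StrictShell(Shell):
    # A subclass opts back in to exact matching by flipping the attribute.
    CASE_SENSITIVE = True


shell = Shell()
shell.register("help", lambda args: "help text")
```

With the default setting, shell.dispatch("help"), shell.dispatch("HELP")
and shell.dispatch("Help") all reach the same handler, and no individual
command needs to say anything about case.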

> Also, in the absence of readline(), consider adding support for "!" style
> repeats of previous commands.

How would this work? Would it be a simple replay of the previous
command? Would it search a command history? How do we make it interact
with the _preCommand code? I'm not sure how to make this work.

> The exception hierarchy looks to be well-designed.  I'm not clear on whether it
> is internal or part of the API.  If the latter, is there an example of how to
> trap and handle specific exceptions in specific contexts?

The exceptions are part of the API, but are only meant to be thrown by
user code, and handled by the module code. There aren't any situations
when user code needs to catch modules that I know of.

> If you're interested, here are a few code comments based on my first
> read-through:
> 1) The "chars" variable can be eliminated and the "while chars" and
> "c=chars.pop(0)" sequence simplified to just:
>     for c in reversed(str):
>         . . .

chars is something of a navel. The parser went through some evolution,
and for a time, it _didn't_ consume a character every time around.
However, the chars are not reversed, so given s/str/cmdline/g:

for c in cmdline:

> 2) Can the reformatDocString() function be replaced by textwrap.dedent() or do
> they do something different?

I guess so, they seem to do the same thing.
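
For reference, a short sketch of what textwrap.dedent() does to an indented
help text. The sample string is made up, and whether reformatDocString()
behaves identically cannot be checked from this thread alone.

```python
import textwrap

# Made-up help text, indented the way a docstring inside a method would be.
raw = """
    goodbye TARGET

        Say goodbye to TARGET.
    """

# dedent() removes the whitespace common to all non-blank lines, so the
# relative indentation of the second paragraph survives; strip() then
# drops the leading and trailing blank lines.
cleaned = textwrap.dedent(raw).strip()
```

Note that dedent() only removes the common margin; it does not rewrap
paragraphs or collapse blank lines.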

> 3) In _mapCommands(), the sort can be simplified from:
>         self._cmds.sort(lambda a, b: cmp(a.aliases[0], b.aliases[0]))
>     to:
>         self._cmds.sort(key=operator.itemgetter(0))
>     or if you want to avoid module dependencies:
>         self._cmds.sort(key=lambda a: a[0])

well, almost. we are sorting on the aliases, so
self._cmds.sort(key=lambda a: a.aliases[0])
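
A sketch comparing the two sort styles on a stand-in Command class (an
assumption; the real command objects are not shown here). It targets a
current Python, where cmp() and the comparison-function argument no longer
exist, so the old style is emulated with functools.cmp_to_key:

```python
from functools import cmp_to_key


class Command:
    """Stand-in for a command object that carries a list of aliases."""

    def __init__(self, *aliases):
        self.aliases = list(aliases)


def cmp(a, b):
    # Local replacement for the old builtin cmp().
    return (a > b) - (a < b)


cmds = [Command("quit", "exit"), Command("help", "?"), Command("goodbye")]

# Old style: a two-argument comparison function.
old_style = sorted(
    cmds, key=cmp_to_key(lambda a, b: cmp(a.aliases[0], b.aliases[0]))
)

# New style: a key function that extracts the first alias once per item.
new_style = sorted(cmds, key=lambda a: a.aliases[0])

names_old = [c.aliases[0] for c in old_style]
names_new = [c.aliases[0] for c in new_style]
```

Both produce the same ordering; the key function is shorter and is called
once per element rather than once per comparison.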

> 4) In _preCommand, the sort simplifies from:
>                     names = self.aliasdict.keys()
>                     names.sort()
>                     for name in names:
>     to:
>                     for name in sorted(self.aliasdict):


> Raymond

Crutcher Dunnavant <crutcher at>

From crutcher at  Mon Feb 20 01:28:39 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 16:28:39 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <>
References: <>
Message-ID: <>

s/catch modules/catch exceptions/g

On 2/19/06, Crutcher Dunnavant <crutcher at> wrote:
> Whoa, thanks. Incorporated the suggestions to the code.
> On 2/19/06, Raymond Hettinger <python at> wrote:
> > [Crutcher Dunnavant]
> > > Anyway, I'm looking for feedback, feature requests before starting the
> > > submission process.
> >
> > With respect to the API, the examples tend to be visually dominated by
> > the series of decorators.  The three decorators do nothing more than add a
> > function attribute, so they are essentially doing the same type of action.
> > Consider combining those into a single decorator and using keywords for the
> > various arguments.  For example, change:
> >
> >         @cmdloop.aliases('goodbye')
> >         @cmdloop.shorthelp('say goodbye')
> >         @cmdloop.usage('goodbye TARGET')
> >
> > to just:
> >
> >         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye',
> > usage='goodbye TARGET')
> >
> > leaving the possibility of multiple decorators when one line gets too long:
> >
> >         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye')
> >         @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is a filename in
> > the current directory')
> Well, why not support both, and leave it up to the user?
> > Another thought on the API is to consider adding another decorator option to
> > make commands case-insensitive so that 'help', 'HELP', and 'Help' will all work:
> >          @cmdloop.addspec(case_sensitive=True)
> shouldn't this be a property of the shell, and not the individual commands?
> Perhaps a CASE_SENSITIVE=False attribute on the shell?
> > Also, in the absence of readline(), consider adding support for "!" style
> > repeats of previous commands.
> How would this work? Would it be a simple replay of the previous
> command? Would it search a command history? How do we make it interact
> with the _preCommand code? I'm not sure how to make this work.
> > The exception hierarchy looks to be well-designed.  I'm not clear on whether it
> > is internal or part of the API.  If the latter, is there an example of how to
> > trap and handle specific exceptions in specific contexts?
> The exceptions are part of the API, but are only meant to be thrown by
> user code, and handled by the module code. There aren't any situations
> when user code needs to catch modules that I know of.
> > If you're interested, here are a few code comments based on my first
> > read-through:
> >
> > 1) The "chars" variable can be eliminated and the "while chars" and
> > "c=chars.pop(0)" sequence simplified to just:
> >     for c in reversed(str):
> >         . . .
> chars is something of a navel. The parser went through some evolution,
> and for a time, it _didn't_ consume a character every time around.
> However, the chars are not reversed, so given s/str/cmdline/g:
> for c in cmdline:
>   ...
> > 2) Can the reformatDocString() function be replaced by textwrap.dedent() or do
> > they do something different?
> I guess so, they seem to do the same thing.
> > 3) In _mapCommands(), the sort can be simplified from:
> >         self._cmds.sort(lambda a, b: cmp(a.aliases[0], b.aliases[0]))
> >     to:
> >         self._cmds.sort(key=operator.itemgetter(0))
> >     or if you want to avoid module dependencies:
> >         self._cmds.sort(key=lambda a: a[0])
> well, almost. we are sorting on the aliases, so
> self._cmds.sort(key=lambda a: a.aliases[0])
> > 4) In _preCommand, the sort simplifies from:
> >                     names = self.aliasdict.keys()
> >                     names.sort()
> >                     for name in names:
> >     to:
> >                     for name in sorted(self.aliasdict):
> >
> cool.
> > Raymond
> >
> --
> Crutcher Dunnavant <crutcher at>

Crutcher Dunnavant <crutcher at>

From jeff at  Mon Feb 20 01:34:38 2006
From: jeff at (Jeff Rush)
Date: Sun, 19 Feb 2006 18:34:38 -0600
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Neal Norwitz wrote:
> On 2/17/06, M.-A. Lemburg <mal at> wrote:
>>Neal Norwitz wrote:
> >
>>Another candidate for removal is the --disable-unicode
>>We should probably add a deprecation warning for that in
>>Py 2.5 and then remove the hundreds of
>>from the source code in time for Py 2.6.
> I've heard of a bunch of people using --disable-unicode.  I'm not sure
> if it's curiosity or if there are really production builds without
> unicode.  Ask this on c.l.p too.

Such a switch quite likely is useful to those creating Python interpreters 
for small hand-held devices, where space is at a premium.  I would hesitate 
to remove switches to drop features in general, for that reason.

Although I have played with reducing the footprint of Python, I am not 
currently doing so.  I could never get the footprint down sufficiently to 
make it usable, unfortunately.  But I would like to see the Python 
developers maintain an awareness of memory consumption and not assume that 
Python is always run on modern fully-loaded desktops.  We are seeing 
increasing use of Python in embedded systems these days.


From raymond.hettinger at  Mon Feb 20 02:03:15 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Sun, 19 Feb 2006 20:03:15 -0500
Subject: [Python-Dev] New Module: CommandLoop
References: <>
Message-ID: <003101c635b9$6e93a6f0$b83efea9@RaymondLaptop1>

>>         @cmdloop.aliases('goodbye')
>>         @cmdloop.shorthelp('say goodbye')
>>         @cmdloop.usage('goodbye TARGET')
>> to just:
>>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye',
>> usage='goodbye TARGET')
>> leaving the possibility of multiple decorators when one line gets too long:
>>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye')
>>         @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is a filename 
>> in
>> the current directory')

> Well, why not support both, and leave it up to the user?

Having only one method keeps the API simple.  Also, the addspec() approach 
allows the user to choose between single and multiple lines.

BTW, addspec() could be made completely general by supporting all possible 
keywords at once:

def addspec(**kwds):
    def decorator(func):
        func.__dict__.update(kwds)
        return func
    return decorator

With an open definition like that, users can specify new attributes with less
effort.
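
A minimal usage sketch of that open definition. The goodbye() function and
its attribute values are illustrative only, and addspec is defined locally
here rather than taken from the cmdloop module:

```python
def addspec(**kwds):
    """Return a decorator that stores every keyword as a function attribute."""
    def decorator(func):
        func.__dict__.update(kwds)
        return func
    return decorator


# Stacked decorators split a long specification across several lines;
# each one just adds more attributes to the same function object.
@addspec(aliases=["goodbye"], shorthelp="say goodbye")
@addspec(usage="goodbye TARGET")
def goodbye(target):
    return "goodbye " + target
```

After decoration, goodbye.aliases, goodbye.shorthelp and goodbye.usage are
ordinary attributes, and the function itself is returned unchanged.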


From martin at  Mon Feb 20 02:10:53 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 02:10:53 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <dtarlp$4av$>
References: <>	<dt94d8$bbd$>	<>
Message-ID: <>

Manuzhai wrote:
>>No; nobody volunteered a machine yet (plus the hand-holding that
>>is always necessary with Windows).
> What exactly is needed for this? Does it need to be a machine dedicated 
> to this stuff, or could I just run the tests once every day or so when I 
> feel like it and have them submitted to buildbot?

"The point" of buildbot (at least the way we use it) is to see
immediately what check-in broke the tests on some platform. So yes,
permanent availability would be desirable.

However, buildbot runs in the background (at least on Unix), and
gets triggered whenever a checkin occurs. So the machine doesn't
have to be *dedicated*; any machine that is always on might do.


From crutcher at  Mon Feb 20 02:12:44 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 17:12:44 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <003101c635b9$6e93a6f0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On 2/19/06, Raymond Hettinger <raymond.hettinger at> wrote:
> >>         @cmdloop.aliases('goodbye')
> >>         @cmdloop.shorthelp('say goodbye')
> >>         @cmdloop.usage('goodbye TARGET')
> >>
> >> to just:
> >>
> >>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye',
> >> usage='goodbye TARGET')
> >>
> >> leaving the possibility of multiple decorators when one line gets too long:
> >>
> >>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say goodbye')
> >>         @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is a filename
> >> in
> >> the current directory')
> > Well, why not support both, and leave it up to the user?
> Having only one method keeps the API simple.  Also, the addspec() approach
> allows the user to choose between single and multiple lines.
> BTW, addspec() could be made completely general by supporting all possible
> keywords at once:
> def addspec(**kwds):
>     def decorator(func):
>         func.__dict__.update(kwds)
>         return func
>     return decorator
> With an open definition like that, users can specify new attributes with less
> effort.

Well, yes it could. But as it currently stands, there is no mechanism
for user code to manipulate commands, so that would be of limited
utility, versus throwing errors if the user specified something

> Raymond

Crutcher Dunnavant <crutcher at>

From martin at  Mon Feb 20 02:14:09 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 02:14:09 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>
	<> <>
Message-ID: <>

Benji York wrote:
> See
> It doesn't look quite as good in IE because of the limited HTML the 
> buildbot waterfall display generates and the limitations of IE's CSS 
> support.

Thanks again for the contribution!

> The best I could do without hacking buildbot was to highlight the trunk 
> "builder" links.  This only works in Firefox, also because of IE's 
> limited CSS2 support.
> More could be done if the HTML generation was modified, but that didn't 
> seem prudent.

I looked at it, and it would require quite a lot of changes to the
buildbot, so I abstain from wanting such a thing (at least for the moment).

Your regex-matching (or whatever the mechanism is) works quite well for


From bob at  Mon Feb 20 02:24:04 2006
From: bob at (Bob Ippolito)
Date: Sun, 19 Feb 2006 17:24:04 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <003101c635b9$6e93a6f0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On Feb 19, 2006, at 5:03 PM, Raymond Hettinger wrote:

>>>         @cmdloop.aliases('goodbye')
>>>         @cmdloop.shorthelp('say goodbye')
>>>         @cmdloop.usage('goodbye TARGET')
>>> to just:
>>>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say  
>>> goodbye',
>>> usage='goodbye TARGET')
>>> leaving the possibility of multiple decorators when one line gets  
>>> too long:
>>>         @cmdloop.addspec(aliases=['goodbye'], shorthelp ='say  
>>> goodbye')
>>>         @cmdloop.addspec(usage='goodbye TARGET  # where TARGET is  
>>> a filename
>>> in
>>> the current directory')
>> Well, why not support both, and leave it up to the user?
> Having only one method keeps the API simple.  Also, the addspec()  
> approach
> allows the user to choose between single and multiple lines.
> BTW, addspec() could be made completely general by supporting all  
> possible
> keywords at once:
> def addspec(**kwds):
>     def decorator(func):
>         func.__dict__.update(kwds)
>         return func
>     return decorator
> With an open definition like that, users can specify new attributes  
> with less
> effort.

Doesn't this discussion belong on c.l.p / python-list?


From tjreedy at  Mon Feb 20 02:39:13 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 19 Feb 2006 20:39:13 -0500
Subject: [Python-Dev] New Module: CommandLoop
References: <>
Message-ID: <dtb6k2$4om$>

I know it is tempting and perhaps ok in your own private code, but casually 
masking builtins like 'str' in public library code sets a bad example ;-).


From tjreedy at  Mon Feb 20 02:27:10 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 19 Feb 2006 20:27:10 -0500
Subject: [Python-Dev] buildbot is all green
References: <>	<dt94d8$bbd$>	<><dtarlp$4av$>
Message-ID: <dtb6k1$4om$>

>>>is always necessary with Windows).

With a couple of more machines added, should there be two separate pages 
for trunk and 2.4 builds?  Or do most checkins affect both?

From mhammond at  Mon Feb 20 03:07:06 2006
From: mhammond at (Mark Hammond)
Date: Mon, 20 Feb 2006 13:07:06 +1100
Subject: [Python-Dev] javascript "standing on Python's shoulders" as it
	moves forward.
Message-ID: <>

Sorry for the slightly off-topic post, but I thought it of interest that
Brendan Eich (the "father" of javascript) has blogged about the future of
js, and specifically how he will "borrow from Python for iteration,
generators, and comprehensions" and more generally why he is "standing on
Python's shoulders" when appropriate.

The fact my name appears there is a happy coincidence related to the fact I
am working with Mozilla on making their DOM "language agnostic" and
supporting Python - but the general reasons why Python is seen as important
is interesting...


From crutcher at  Mon Feb 20 03:21:39 2006
From: crutcher at (Crutcher Dunnavant)
Date: Sun, 19 Feb 2006 18:21:39 -0800
Subject: [Python-Dev] New Module: CommandLoop
In-Reply-To: <dtb6k2$4om$>
References: <>
Message-ID: <>

totally agree, removed them.

On 2/19/06, Terry Reedy <tjreedy at> wrote:
> I know it is tempting and perhaps ok in your own private code, but casually
> masking builtins like 'str' in public library code sets a bad example ;-).
> tjr

Crutcher Dunnavant <crutcher at>

From fredrik at  Mon Feb 20 03:22:54 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 20 Feb 2006 03:22:54 +0100
Subject: [Python-Dev] New Module: CommandLoop
References: <><001301c635aa$40288e70$b83efea9@RaymondLaptop1><><003101c635b9$6e93a6f0$b83efea9@RaymondLaptop1>
Message-ID: <dtb962$b16$>

Bob Ippolito wrote:

> Doesn't this discussion belong on c.l.p / python-list?

yes, please.


From jcarlson at  Mon Feb 20 04:52:43 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 19 Feb 2006 19:52:43 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

"Michael Urman" <murman at> wrote:
> On 2/19/06, Josiah Carlson <jcarlson at> wrote:
> > My post probably hasn't convinced you, but much of the confusion, I
> > believe, is based on Martin's original belief that 'k in dd' should
> > always return true if there is a default.  One can argue that way, but
> > then you end up on the circular train of thought that gets you to "you
> > can't do anything useful if that is the case, .popitem() doesn't work,
> > len() is undefined, ...".  Keep it simple, keep it sane.
> A default factory implementation fundamentally modifies the behavior
> of the mapping. There is no single answer to the question "what is the
> right behavior for contains, len, popitem" as that depends on what the
> code that consumes the mapping is written like, what it is attempting
> to do, and what you are attempting to override it to do. Or, simply,
> on why you are providing a default value. Resisting the temptation to
> guess the why and just leaving the methods as is seems  the best
> choice; overriding __contains__ to return true is much easier than
> reversing that behavior would be.

I agree, there is nothing perfect.  But at least in all of my use-cases,
and the majority of the ones I've seen 'in the wild', my previous post
provided an implementation that worked precisely like desired, and
precisely like a regular dictionary, except when accessing a
non-existent key via: value = dd[key]. __contains__, etc., all work
exactly like they do with a non-defaulting dictionary. Iteration via
popitem(), pop(key), items(), iteritems(), __iter__, etc., all work the
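way you would expect them to. The only nit is code that iterates like
the following, which relies on a KeyError that the defaulting dictionary
no longer raises:

    for key in keys:
        try:
            value = dd[key]
        except KeyError:
            ...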

(where 'keys' has nothing to do with dd.keys(), it is merely a listing
of keys which are desired at this particular point)  However, the
following works like it always did:

    for key in keys:
        if key not in dd:
            ...
        value = dd[key]
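
A sketch of the two patterns side by side. The pure-Python implementation
referred to above is not included in this thread, so this uses
collections.defaultdict, as it later shipped in the standard library, as a
stand-in; the key names are made up.

```python
from collections import defaultdict

keys = ["seen", "missing"]

# Pattern 1: relies on KeyError. With a defaulting dictionary the lookup
# succeeds, so the except branch never runs, and the lookup also inserts
# the missing key as a side effect.
dd = defaultdict(int)
dd["seen"] = 1
fell_through = []
for key in keys:
    try:
        value = dd[key]
    except KeyError:
        fell_through.append(key)
inserted_by_lookup = "missing" in dd

# Pattern 2: tests membership first. __contains__ is untouched, so this
# behaves exactly as it would with a regular dict.
dd2 = defaultdict(int)
dd2["seen"] = 1
skipped = []
for key in keys:
    if key not in dd2:
        skipped.append(key)
        continue
    value = dd2[key]
```

Only the first pattern changes behaviour: nothing lands in fell_through,
and "missing" ends up stored in dd.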

> An example when it could theoretically be used, if not particularly
> useful. The gettext.install() function was just updated to take a
> names parameter which controls which gettext accessor functions it
> adds to the builtin namespace. Its implementation looks for "method in
> names" to decide. Passing a default-true dict would allow the future
> behavior to be bind all checked names, but only if __contains__
> returns True.
> Even though it would make a poor base implementation, and these
> effects aren't a good candidate for it,  the code style that could
> best leverage such a __contains__ exists.

Indeed, there are cases where an always-true __contains__ exists, and
the pure-Python implementation I previously posted can be easily
modified to offer such a feature.  However, because there are also use
cases for the not-always-true __contains__, picking either as the "one
true way" seems a bit unnecessary.

Presumably, if one goes into the collections module, the other will too. 
Actually, they could share all of their code except for a simple flag
which determines the always-true __contains__.  With minor work, that
'flag', or really the single bit it would require, may even be
embeddable into the type object.  Arguably, there should be a handful of
these defaulting dictionary-like objects, and for each variant, it
should be documented what their use-cases are, and any gotchas that
will inevitably come up.
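
A sketch of the shared-code idea, with a single flag selecting the
always-true __contains__. The class name, the always_contains parameter and
the use of __missing__ are assumptions for illustration, not the
implementation from the earlier post.

```python
class DefaultingDict(dict):
    """Hypothetical dict with a default factory and an optional
    always-true membership test."""

    def __init__(self, default_factory, always_contains=False):
        super().__init__()
        self.default_factory = default_factory
        self.always_contains = always_contains

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = self.default_factory()
        return value

    def __contains__(self, key):
        # The one place where the two variants differ.
        return self.always_contains or super().__contains__(key)


plain = DefaultingDict(list)
eager = DefaultingDict(int, always_contains=True)
```

Here plain answers "in" like a regular dict, while eager reports every key
as present even though len(eager) is still 0, which is exactly the kind of
gotcha that would need documenting.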

 - Josiah

From jcarlson at  Mon Feb 20 05:28:41 2006
From: jcarlson at (Josiah Carlson)
Date: Sun, 19 Feb 2006 20:28:41 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

"Stephen J. Turnbull" <stephen at> wrote:
> >>>>> "Josiah" == Josiah Carlson <jcarlson at> writes:
>     Josiah> The question remains: is str.decode() returning a string
>     Josiah> or unicode depending on the argument passed, when the
>     Josiah> argument quite literally names the codec involved,
>     Josiah> difficult to understand?  I don't believe so; am I the
>     Josiah> only one?
> Do you do any of the user education *about codec use* that you
> recommend?  The people I try to teach about coding invariably find it
> difficult to understand.  The problem is that the near-universal
> intuition is that for "human-usable text" pretty much anything *but
> Unicode* will do.  This is a really hard block to get them past.
> There is very good reason why Unicode is plain text ("original" in
> MAL's terms) and everything else is encoded ("derived"), but students
> new to the concept often take a while to "get" it.

I've not been teaching Python; when I was still a TA, it was strictly
algorithms and data structures.  Of those people who I have had the
opportunity to entice into Python, I've not followed up on their
progress to know if they had any issues.

I try to internalize it by not thinking of strings as encoded data, but
as binary data, and unicode as text.  I then remind myself that unicode
isn't native on-disk or cross-network (which stores and transports bytes,
not characters), so one needs to encode it as binary data.  It's a
subtle difference, but it has worked so far for me.
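
A small sketch of that mental model, written for a current Python where
the text type is str and the binary type is bytes (the names differ from
the unicode/str pair discussed in this thread). The sample name and the
choice of UTF-8 are assumptions.

```python
# Text: a sequence of abstract characters.
text = "Martin v. L\u00f6wis"

# To store or transmit it, encode the text to binary data with a named codec.
data = text.encode("utf-8")

# On the way back in, decode the binary data to text with the same codec.
round_trip = data.decode("utf-8")

char_count = len(text)   # counts characters
byte_count = len(data)   # counts bytes; the non-ASCII letter takes two in UTF-8
```

The two lengths differ, which is a quick reminder that the encoded form is
data about the text rather than the text itself.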

In my experience, at least for only-English speaking users, most people
don't even get to unicode.  I didn't even touch it until I had been well
versed with the encoding and decoding of all different kinds of binary
data, when a half-dozen international users (China, Japan, Russia, ...)
requested its support in my source editor; so I added it.  Supporting it
properly hasn't been very difficult, and the only real nit I have
experienced is supporting the encoding line just after the #! line for
arbitrary codecs (sometimes saving a file in a particular encoding dies).

I notice that you seem to be in Japan, so teaching unicode is a must.
If you are using the "unicode is text" and "strings are data" framing and
they aren't getting it, then I don't know.

> Maybe it's just me, but whether it's the teacher or the students, I am
> *not* excited about the education route.  Martin's simple rule *is*
> simple, and the exceptions for using a "nonexistent" method mean I
> don't have to reinforce---the students will be able to teach each
> other.  The exceptions also directly help reinforce the notion that
> text == Unicode.

Are you sure that they would help?  If .encode() and .decode() drop from
strings and unicode (respectively), they get an AttributeError.  That's
almost useless.  Raising a better exception (with more information)
would be better in that case, but losing the functionality that either
would offer seems unnecessary; which is why I had suggested some of the
other method names.  Perhaps a "This method was removed because it
confused users.  Use help(str.encode) (or unicode.decode) to find out
how you can do the equivalent, or do what you *really* wanted to do."

> I grant the point that .decode('base64') is useful, but I also believe
> that "education" is a lot more easily said than done in this case.

What I meant by "education" is 'better documentation' and 'better
exception messages'.  I didn't learn Python by sitting in a class; I
learned it by going through the tutorial over a weekend as a 2nd year
undergrad and writing software which could do what I wanted/needed.
Compared to the compiler messages I'd been seeing from Codewarrior and
MSVC 6, Python exceptions were like an oracle.  I can understand how
first-time programmers can have issues with *some* Python exception
messages, which is why I think that we could use better ones.  There is
also the other issue that sometimes people fail to actually read the

Again, I don't believe that an AttributeError is any better than an
"ordinal not in range(128)", but "You are trying to encode/decode
to/from incompatible types. expected: a->b got: x->y" is better.  Some
of those can be done *very soon*, given the capabilities of the
encodings module, and they could likely be easily migrated, regardless
of the decisions with .encode()/.decode() .

 - Josiah

From guido at  Mon Feb 20 06:16:37 2006
From: guido at (Guido van Rossum)
Date: Sun, 19 Feb 2006 21:16:37 -0800
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/19/06, Jeff Rush <jeff at> wrote:
[Quoting Neal Norwitz]
> > I've heard of a bunch of people using --disable-unicode.  I'm not sure
> > if it's curiosity or if there are really production builds without
> > unicode.  Ask this on c.l.p too.
> Such a switch quite likely is useful to those creating Python interpreters
> for small hand-held devices, where space is at a premium.  I would hesitate
> to remove switches to drop features in general, for that reason.
> Although I have played with reducing the footprint of Python, I am not
> currently doing so.  I could never get the footprint down sufficiently to
> make it usable, unfortunately.  But I would like to see the Python
> developers maintain an awareness of memory consumption and not assume that
> Python is always run on modern fully-loaded desktops.  We are seeing
> increasing use of Python in embedded systems these days.

Do you know of any embedded platform that doesn't have unicode support
as a requirement? Python runs fine on Nokia phones running Symbian,
where *everything* is a Unicode string.

--Guido van Rossum (home page:

From guido at  Mon Feb 20 06:22:02 2006
From: guido at (Guido van Rossum)
Date: Sun, 19 Feb 2006 21:22:02 -0800
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <dtb6k1$4om$>
References: <>
	<dt94d8$bbd$> <>
	<dtarlp$4av$> <>
Message-ID: <>

On 2/19/06, Terry Reedy <tjreedy at> wrote:
> With a couple of more machines added, should there be two separate pages
> for trunk and 2.4 builds?  Or do most checkins affect both?

They don't; I think a separate page would be a fine idea.

FWIW, it looks like all the sample templates are still wasting a lot
of horizontal space in the first two columns; the second is almost
always empty. Perhaps the author of the change could be placed *below*
the timestamp instead of next to it? Also for all practical purposes
we can probably get rid of the seconds in the timestamp.

--Guido van Rossum (home page:

From steve at  Mon Feb 20 07:35:59 2006
From: steve at (Steve Holden)
Date: Mon, 20 Feb 2006 01:35:59 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Walter Dörwald wrote:
> Neal Norwitz wrote:
>>On 2/19/06, Benji York <benji at> wrote:
>>>Walter Dörwald wrote:
>>>>I'd like to see vertical lines between the columns.
>>>I've done a version like that (still at
>>I liked your current version better so I installed it.
> How about this one:
All formats would be improved if the headers could be made to float at 
the top of the page as scrolling took place.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From martin at  Mon Feb 20 08:09:44 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 08:09:44 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <dtb6k1$4om$>
References: <>	<dt94d8$bbd$>	<><dtarlp$4av$>	<>
Message-ID: <>

Terry Reedy wrote:
>>>>is always necessary with Windows).
> With a couple of more machines added, should there be two separate pages 
> for trunk and 2.4 builds?  Or do most checkins affect both?

I'd like to avoid this, assuming that people only look at the "main"
page. An individual checkin affects either the trunk or 2.4, but never
both; many check-ins come in pairs.


From jeff at  Mon Feb 20 10:36:09 2006
From: jeff at (Jeff Rush)
Date: Mon, 20 Feb 2006 03:36:09 -0600
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Guido van Rossum wrote:
> On 2/19/06, Jeff Rush <jeff at> wrote:
> [Quoting Neal Norwitz]
>>>I've heard of a bunch of people using --disable-unicode.  I'm not sure
>>>if it's curiosity or if there are really production builds without
>>>unicode.  Ask this on c.l.p too.
> Do you know of any embedded platform that doesn't have unicode support
> as a requirement? Python runs fine on Nokia phones running Symbian,
> where *everything* is a Unicode string.

1. PalmOS, at least the last time I was involved with it.  Python on a Palm 
is a very tight fit.

2. "GM862 Cellular Module with Python Interpreter"

These may be diminishing markets as memory capacity increases and I wouldn't 
argue adding compile flags for such at this late date, but if the flags are 
already there, perhaps the slight inconvenience to Python-internal 
developers is worth it.

Hey, perhaps dropping out Unicode support is not a big win - I just know it 
is useful at times to have a collection of flags to drop out floating point, 
complex arithmetic, language parsing and such for memory-constrained cases.


From bokr at  Mon Feb 20 12:52:22 2006
From: bokr at (Bengt Richter)
Date: Mon, 20 Feb 2006 11:52:22 GMT
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was: PEP 332
	revival in coordination with pep 349?]]
References: <>
	<>	<>	<>
	<> <>
Message-ID: <>

On Sat, 18 Feb 2006 09:59:38 +0100, "Martin v. Löwis" <martin at> wrote:

>Aahz wrote:
>> The problem is that they don't understand that "Martin v. Löwis" is not
>> Unicode -- once all strings are Unicode, this is guaranteed to work.
Well, after all the "string" literal escapes that were being used
to define byte values are all rewritten, yes, I'll believe the guarantee ;-)
(BTW, are there plans for migration tools?)

Ok, now back to the s/bytes/octet/ topic:
>This specific call, yes. I don't think the problem will go away as long
>as both encode and decode are available for both strings and byte
>> While it's not absolutely true, my experience of watching Unicode
>> confusion is that the simplest approach for newbies is: encode FROM
>> Unicode, decode TO Unicode.
>I think this is what should be in-grained into the library, also. It
>shouldn't try to give additional meaning to these terms.
Thinking about bytes recently, it occurs to me that bytes are really not intrinsically
numeric in nature. They don't necessarily represent uint8's. E.g., a binary file is
really a sequence of bit octets in its most primitive and abstract sense.

So I'm wondering if we shouldn't have an octet type analogous to unicode, and instances of octet
would be vectors of octets as abstract 8-bit bit vectors, like instances of unicode are vectors of abstract characters.

If you wanted integers you could map ord for integers guaranteed to be in range(256).
The constructor would naturally take any suitable integer sequence so octet([65,66,67]) would work.

In general, all encode methods would produce an octet instance, e.g. unicode.encode.
octet.decode(octet_instance, 'src_encoding') or octet_instance.decode('src_encoding') would do
all the familiar character code sequence decoding,
e.g., octet.decode(oseq, 'utf-8') or oseq.decode('utf-8') to make a unicode instance.

Going from unicode, unicode.encode(uinst, 'utf-8') or uinst.encode('utf-8') would produce an octet instance.
I think this is conceptually purer than the current bytes idea, since the result really has no arithmetic significance.

Also, ord would work on a length-one octet instance, and produce the unsigned integer value you'd expect, but would fail
if not length-one, like ord on unicode (or current str).

Thus octet would replace bytes as the binary info container, and would not have any presumed arithmetic
significance, either as integer or as character-of-current-source-encoding-inferred-from-integer-value-as-ord.

To get a text representation of octets, hex is natural, e.g., octet('6162 6380') # spaces ignored
so repr(octet('a deaf bee')) => "octet('adeafbee')" and octet('616263').decode('ascii') => u'abc' and
back: u'abc'.encode('ascii') => octet('616263'). The base64 codec looks conceptually cleaner too, so long
as you keep in mind base64 as a character subset of unicode and the name of the transformation function pair.
octet('616263').decode('base64') => u'YWJj\n' # octets -> characters
u'YWJj\n'.encode('base64') => octet('616263') # characters -> octets

If you wanted integer-nature bytes, you could have octet codecs for uint8 and int8, e.g., octseq.decode('int8')
could produce a list of signed integers all in range(-128,128). Or maybe map(dec_int8, octseq). The array
module could easily be a target for octet.decode, e.g., octseq.decode('array_B') or octet.decode(octseq, 'array_B'),
and octet(array_instance) the other way.
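
The hex text form and the int8/uint8 codecs described above can be sketched in
present-day Python. This is only an illustration under stated assumptions: the
names Octet, octet_ord and encode_text are invented for the example, and the
class simply wraps today's bytes type rather than anything proposed in this
thread.

```python
class Octet:
    """Hypothetical sketch of the proposed octet type (not a real built-in)."""

    def __init__(self, source=""):
        if isinstance(source, str):
            # Hex text with spaces ignored, e.g. Octet('6162 6380').
            self._data = bytes.fromhex("".join(source.split()))
        else:
            # Any sequence of integers in range(256), e.g. Octet([65, 66, 67]).
            self._data = bytes(source)

    def __repr__(self):
        return "octet('%s')" % self._data.hex()

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        return isinstance(other, Octet) and self._data == other._data

    def decode(self, encoding):
        # Assumed codec names for the integer views mentioned in the text.
        if encoding == "uint8":
            return list(self._data)
        if encoding == "int8":
            return [b - 256 if b > 127 else b for b in self._data]
        # Everything else is treated as a character encoding.
        return self._data.decode(encoding)


def octet_ord(value):
    """Integer value of a length-one Octet; fails otherwise, like ord()."""
    if len(value) != 1:
        raise TypeError("octet_ord() expected a length-one octet")
    return value.decode("uint8")[0]


def encode_text(text, encoding):
    """Characters -> octets, standing in for unicode.encode in the proposal."""
    return Octet(text.encode(encoding))


print(repr(Octet("a deaf bee")))          # octet('adeafbee')
print(Octet("616263").decode("ascii"))    # abc
print(encode_text("abc", "ascii"))        # octet('616263')
```

Keeping the integer view behind an explicit codec name is what makes the
container itself non-numeric in this sketch; nothing on Octet supports
arithmetic.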

Likewise, other types could be destination for octet.decode.

E.g., if you had an abstraction for a display image one could have 'gif' and 'png' and 'bmp' etc
be like 'cp437', 'latin-1', and 'utf-8' etc are for decoding octets to unicode, and write stuff like

    o_seq = open('pic.gif','rb')  # makes octet instance
    img = o_seq.decode('gif89')   # => img is abstract, internally represented suitably but hidden, like unicode.
    open('pic.png', 'wb').write(img.encode('png'))

UIAM PIL has this functionality, if not as encode/decode methods.

Similarly, there could be an abstract archive container, and you could have

    arch = open('tree.tgz','rb').decode('tgz') # => might do lazy things waiting for encode
    egg_octets = arch.encode('python_egg')  # convert to egg format?? (just hand-waving ;-)

Probably all it would take is to wrap some things in abstract-container (AC) types, to enforce the protocol.
Image(octet_seq, 'gif') might produce an AC that only saved a (octet_seq, 'gif') internally, or it might
do eager conversion per optional additional args. Certainly .bmp without rle can be hugely wasteful.

For flexibility like eager vs not, or perhaps returning an iterator instead of a byte sequence,
I guess the encode/decode signatures should be (enc, *args, **kw) and pass those things on to
the worker functions? An abstract container could have a "pack" codec to do serial composition/decomposition.

I'm sure Mal has all this stuff one way or another, but I wanted the conceptual purity of AC instances ac in
ac = octet_seq.decode('src_enc'); octet_seq  = ac.encode('dst_enc') ;-)

Bottom line thought: binary octets aren't numeric ;-)

Bengt Richter

From p.f.moore at  Mon Feb 20 13:01:02 2006
From: p.f.moore at (Paul Moore)
Date: Mon, 20 Feb 2006 12:01:02 +0000
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <dt8qb8$jd8$>
References: <>
	<> <dt8qb8$jd8$>
Message-ID: <>

On 2/19/06, Steve Holden <steve at> wrote:
> > You are missing the rationale of the PEP process. The point is
> > *not* documentation. The point of the PEP process is to channel
> > and collect discussion, so that the BDFL can make a decision.
> > The BDFL is not bound at all to the PEP process.
> >
> > To document things, we use (or should use) documentation.
> >
> >
> One could wish this ideal had been the case for the import extensions
> defined in PEP 302.

(A bit off-topic, but that hit home, so I'll reply...)

Agreed, and it's my fault they weren't, to some extent. I did try to
find a suitable place, but the import docs are generally fairly
scattered, and there wasn't a particularly good place to put the

Any suggestions would be gratefully accepted...

From mal at  Mon Feb 20 14:23:22 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 20 Feb 2006 14:23:22 +0100
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Jeff Rush wrote:
> Guido van Rossum wrote:
>> On 2/19/06, Jeff Rush <jeff at> wrote:
>> [Quoting Neal Norwitz]
>>>> I've heard of a bunch of people using --disable-unicode.  I'm not sure
>>>> if it's curiosity or if there are really production builds without
>>>> unicode.  Ask this on c.l.p too.
>> Do you know of any embedded platform that doesn't have unicode support
>> as a requirement? Python runs fine on Nokia phones running Symbian,
>> where *everything* is a Unicode string.
> 1. PalmOS, at least the last time I was involved with it.  Python on a
> Palm is a very tight fit.
> 2. "GM862 Cellular Module with Python Interpreter"
> These may be diminishing markets as memory capacity increases and I
> wouldn't argue adding compile flags for such at this late date, but if
> the flags are already there, perhaps the slight inconvenience to
> Python-internal developers is worth it.
> Hey, perhaps dropping out Unicode support is not a big win - I just know
> it is useful at times to have a collection of flags to drop out floating
> point, complex arithmetic, language parsing and such for
> memory-constrained cases.

These switches make the code less maintainable. I'm not even
talking about the testing overhead.

I'd say that the parties interested in non-Unicode versions of
Python should maintain these branches of Python. Ditto for other
stripped down versions.

Note that this does not mean that we should forget about memory
consumption issues. It's just that if there's only marginal
interest in certain special builds of Python, I don't see the
requirement for the Python core developers to maintain them.

Marc-Andre Lemburg


From guido at  Mon Feb 20 14:41:43 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 05:41:43 -0800
Subject: [Python-Dev] defaultdict proposal round three
Message-ID: <>

I'm withdrawing the last proposal. I'm not convinced by the argument
that __contains__ should always return True (perhaps it should also
insert the value?), nor by the complaint that a holy invariant would
be violated (so what?).

But the amount of discussion and the number of different viewpoints
present makes it clear that the feature as I last proposed would be
forever divisive.

I see two alternatives. These will cause a different kind of
philosophical discussion; so be it. I'll describe them relative to the
last proposal; for those who wisely skipped the last thread, here's a
link to the proposal:

Alternative A: add a new method to the dict type with the semantics of
__getattr__ from the last proposal, using default_factory if not None
(except on_missing is inlined). This avoids the discussion about
broken invariants, but one could argue that it adds to an already
overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__
semantics from the last proposal. It could be an unrelated type for
all I care, but I do care about implementation inheritance since it
should perform just as well as an unmodified dict object, and that's
hard to do without sharing implementation (copying would be worse).
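
A minimal sketch of what Alternative B might look like, written for a
present-day interpreter. The class name DefaultingDict is an assumption, and
the sketch relies on the __missing__ hook that dict subclasses have in current
Python, which may differ from the on_missing spelling discussed in the earlier
thread.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass with a writable default_factory attribute."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Stored as a plain attribute so it can be inspected and replaced later.
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


groups = DefaultingDict(list)
for word in ("apple", "avocado", "banana"):
    groups[word[0]].append(word)

print(groups)            # {'a': ['apple', 'avocado'], 'b': ['banana']}
print("c" in groups)     # False: __contains__ is unchanged
print(groups.get("c"))   # None: get() does not trigger the factory
```

Only d[key] lookups are affected here; membership tests, get() and iteration
keep ordinary dict behaviour because the subclass inherits them untouched.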

Parting shots:

- Even if the default_factory were passed to the constructor, it still
ought to be a writable attribute so it can be introspected and
modified. A defaultdict that can't change its default factory after
its creation is less useful.

- It would be unwise to have a default value that would be called if
it was callable: what if I wanted the default to be a class instance
that happens to have a __call__ method for unrelated reasons?
Callability is an elusive property; APIs should not attempt to
dynamically decide whether an argument is callable or not.

- A third alternative would be to have a new method that takes an
explicit default factory argument. This differs from setdefault() only
in the type of the second argument. I'm not keen on this; the original
use case came from an example where the readability of

  d.setdefault(key, []).append(value)

was questioned, and I'm not sure that

  d.something(key, list).append(value)

is any more readable. IOW I like (and I believe few have questioned)
associating the default factory with the dict object instead of with
the call site.
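
The second parting shot, about callability, can be illustrated with a small
sketch. Everything named here (Threshold, guess_default, explicit_default) is
invented for the example; the point is only that a "call it if it is callable"
heuristic misfires on a value that happens to define __call__.

```python
class Threshold:
    """Hypothetical value object; callable for reasons unrelated to defaults."""

    def __init__(self, limit):
        self.limit = limit

    def __call__(self, reading):
        return reading > self.limit


def guess_default(default):
    # The heuristic argued against: call the argument if it looks callable.
    return default() if callable(default) else default


def explicit_default(factory):
    # The explicit contract: the argument is always treated as a factory.
    return factory()


alarm = Threshold(10)

try:
    guess_default(alarm)          # tries alarm() with no reading
    outcome = "returned a value"
except TypeError:
    outcome = "TypeError"

print(outcome)
# With an explicit factory the intent is unambiguous: wrap the value.
print(explicit_default(lambda: alarm) is alarm)
```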

Let the third round of the games begin!

--Guido van Rossum (home page:

From benji at  Mon Feb 20 15:10:31 2006
From: benji at (Benji York)
Date: Mon, 20 Feb 2006 09:10:31 -0500
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<dt94d8$bbd$>
	<>	<dtarlp$4av$>
	<>	<dtb6k1$4om$>
Message-ID: <>

Guido van Rossum wrote:
> FWIW, it looks like all the sample templates are still wasting a lot
> of horizontal space in the first two columns; the second is almost
> always empty. Perhaps the author of the change could be placed *below*
> the timestamp instead of next to it? Also for all practical purposes
> we can probably get rid of the seconds in the timestamp.

So far the cosmetic changes have been done purely in CSS, implementing 
the above would (AFAICT) require modifying the buildbot waterfall 
display HTML generation.  Something that's been shied away from thus far.
Benji York

From jonathan.barbero at  Mon Feb 20 16:16:51 2006
From: jonathan.barbero at (Jonathan Barbero)
Date: Mon, 20 Feb 2006 12:16:51 -0300
Subject: [Python-Dev] (-1)**(1/2)==1?
Message-ID: <>

  My name is Jonathan, i'm new with Python.

   I try this in the command line:

   >>> (-1)**(1/2)
    1

   This is wrong, i think it must throw an exception.
    What do you think?


From aahz at  Mon Feb 20 16:18:08 2006
From: aahz at (Aahz)
Date: Mon, 20 Feb 2006 07:18:08 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Feb 19, 2006, Josiah Carlson wrote:
> I agree, there is nothing perfect.  But at least in all of my use-cases,
> and the majority of the ones I've seen 'in the wild', my previous post
> provided an implementation that worked precisely like desired, and
> precisely like a regular dictionary, except when accessing a
> nonexistent key via: value = dd[key] . __contains__, etc., all work
> exactly like they do with a non-defaulting dictionary. Iteration via
> popitem(), pop(key), items(), iteritems(), __iter__, etc., all work the
> way you would expect them. 

This is the telling point, IMO.  My company makes heavy use of a "default
dict" (actually, it's a "default class" because using constants as the
lookup keys is mostly what we do and the convenience of foo.bar is
compelling over foo['bar']).  Anyway, our semantics are as Josiah
outlines, and I can't see much use case for the alternatives.

Those of you arguing something different: do you have a real use case
(that you've implemented in real code)?
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From John.Marshall at  Mon Feb 20 16:19:17 2006
From: John.Marshall at (John Marshall)
Date: Mon, 20 Feb 2006 15:19:17 +0000
Subject: [Python-Dev] Does eval() leak?
In-Reply-To: <>
References: <> <>
Message-ID: <>

Martin v. Löwis wrote:
> John Marshall wrote:
>>Should I expect the virtual memory allocation
>>to go up if I do the following?
> python-dev is a list for discussing development of Python,
> not the development with Python. Please post this question
> to python-list at
> For python-dev, a message explaining where the memory leak
> is and how to correct it would be more appropriate. Most
> likely, there is no memory leak in eval.

My question was not a "development with Python" question.
However, I posted to python-list as you said. Only one
person responded to a request to test the provided code (~10
lines) which demonstrates a problem with eval()--he
confirmed my observations. As the problem _does exist_ for
2.3.5 which is the last 2.3 version still available at, I would suggest people avoid using it if they
do eval()s.

Unfortunately I, myself, cannot check into it more.


From g.brandl at  Mon Feb 20 16:19:45 2006
From: g.brandl at (Georg Brandl)
Date: Mon, 20 Feb 2006 16:19:45 +0100
Subject: [Python-Dev] (-1)**(1/2)==1?
In-Reply-To: <>
References: <>
Message-ID: <dtcmmh$jad$>

Jonathan Barbero wrote:
> Hello!
>   My name is Jonathan, i'm new with Python.
>    I try this in the command line:
>    >>> (-1)**(1/2)
>     1
>    This is wrong, i think it must throw an exception.
>     What do you think?

>>> 1/2
0
>>> (-1)**0
1

It's fine.

If you want to get a floating point result from dividing,
make one of the two numbers a float:

>>> 1.0/2
0.5
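
For readers trying this on a current interpreter, a short sketch of the same
point. It assumes Python 3, where / is true division, so the integer result
that 1/2 gave at the time of this thread corresponds to the // operator.

```python
# Floor division reproduces the integer quotient discussed above.
exponent = 1 // 2            # 0, not 0.5
result = (-1) ** exponent    # any nonzero number to the power 0 is 1

# Making one operand a float gives the fractional quotient instead.
half = 1.0 / 2               # 0.5

print(exponent, result, half)
```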


From aahz at  Mon Feb 20 16:25:37 2006
From: aahz at (Aahz)
Date: Mon, 20 Feb 2006 07:25:37 -0800
Subject: [Python-Dev] (-1)**(1/2)==1?
In-Reply-To: <dtcmmh$jad$>
References: <>
Message-ID: <>


Please do not respond to off-topic posts on python-dev without
redirecting them to comp.lang.python (or other suitable place).  Thanks!

On Mon, Feb 20, 2006, Georg Brandl wrote:
> Jonathan Barbero wrote:
>> Hello!
>>   My name is Jonathan, i'm new with Python.
>>    I try this in the command line:
>>    >>> (-1)**(1/2)
>>     1
>>    This is wrong, i think it must throw an exception.
>>     What do you think?
>>>> 1/2
> 0
>>>> (-1)**0
> 1
> It's fine.
> If you want to get a floating point result from dividing,
> make one of the two numbers a float:
>>>> 1.0/2
> 0.5

Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From facundobatista at  Mon Feb 20 16:50:35 2006
From: facundobatista at (Facundo Batista)
Date: Mon, 20 Feb 2006 12:50:35 -0300
Subject: [Python-Dev] (-1)**(1/2)==1?
In-Reply-To: <>
References: <>
Message-ID: <>

2006/2/20, Jonathan Barbero <jonathan.barbero at>:

> Hello!
>   My name is Jonathan, i'm new with Python.

Hello Jonathan. This list is only for developing Python itself, not
for developing in Python.

You should address this kind of question in comp.lang.python
(available as a newsgroup and a mailing list), see here for

>    I try this in the command line:
>    >>> (-1)**(1/2)
>     1
>    This is wrong, i think it must throw an exception.
>     What do you think?

It's OK, because (1/2) is zero, not 0.5.

>>> 1/2
0


.    Facundo


From aleaxit at  Mon Feb 20 17:05:11 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 08:05:11 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 5:41 AM, Guido van Rossum wrote:
> Alternative A: add a new method to the dict type with the semantics of
> __getattr__ from the last proposal, using default_factory if not None
> (except on_missing is inlined). This avoids the discussion about
> broken invariants, but one could argue that it adds to an already
> overly broad API.
> Alternative B: provide a dict subclass that implements the __getattr__
> semantics from the last proposal. It could be an unrelated type for
> all I care, but I do care about implementation inheritance since it
> should perform just as well as an unmodified dict object, and that's
> hard to do without sharing implementation (copying would be worse).

"Let's do both!"...;-).  Add a method X to dict as per A _and_  
provide in collections a subclass of dict that sets __getattr__ to X  
and also takes the value of default_factory as the first mandatory  
argument to __init__.

Yes, mapping is a "fat interface", chock full of convenience methods,  
but that's what makes it OK to add another, when it's really  
convenient; and nearly nobody's been arguing against defaultdict,  
only about the details of its architecture, so the convenience of  
this X can be taken as established. As long as DictMixin changes  
accordingly, the downsides are small.

Also having a collections.defaultdict as well as method X would be my  
preference, for even more convenience.

 From my POV, either or both of these additions would be an  
improvement wrt 2.4 (as would most of the other alternatives debated  
here), but I'm keen to have _some_ alternative get in, rather than  
all being blocked out of 2.5 by "analysis paralysis".


From guido at  Mon Feb 20 17:31:44 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 08:31:44 -0800
Subject: [Python-Dev] Does eval() leak?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/16/06, John Marshall <John.Marshall at> wrote:
> Hi,
> Should I expect the virtual memory allocation
> to go up if I do the following?
> -----
> raw = open("data").read()
> while True:
>         d = eval(raw)
> -----
> I would have expected the memory allocated to the
> object referenced by d to be deallocated, garbage
> collected, and reallocated for the new eval(raw)
> results, assigned to d.
> The file contains a large, SIMPLE (no self refs; all
> native python types/objects) dictionary (>300K).

You're probably running into the problem that the concrete parse tree
built up by the parser is rather large. While the memory used for that
tree is freed to Python's malloc pool, thus making it available for
other allocations by the same process, it is likely that the VM
allocation for the process will permanently go up.

When I try something like this (*) I see the virtual memory size go up
indefinitely with Python 2.3.5, but not with Python 2.4.1 or
2.5(head). Even so, the problem may be fragmentation instead of a
memory leak; fragmentation problems are even harder to debug than
leaks (since they depend on the heuristics applied by the platform's
malloc implementation).

You can file a bug for 2.3 but unless you also provide a patch it's
unlikely to be fixed; the memory allocation code was revamped
significantly for 2.4 so there's no simple backport of the fix

d = {}
for i in range(100000): d[repr(i)] = i
s = str(d)
while 1: x = eval(s); print 'x'

--Guido van Rossum (home page:

From raymond.hettinger at  Mon Feb 20 17:35:35 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 20 Feb 2006 11:35:35 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <008101c6363b$ad0fc4e0$b83efea9@RaymondLaptop1>

> I'm not convinced by the argument
> that __contains__ should always return True

Me either.  I cannot think of a more useless behavior or one more likely to have 
unexpected consequences.  Besides, as Josiah pointed out, it is much easier for 
a subclass override to substitute always True return values than vice-versa.

> Alternative A: add a new method to the dict type with the semantics of
> __getattr__ from the last proposal

Did you mean __getitem__?  If not, then I'm missing what the current proposal 

>, using default_factory if not None
> (except on_missing is inlined). This avoids the discussion about
> broken invariants, but one could argue that it adds to an already
> overly broad API.


I prefer this approach over subclassing.  The mental load from an additional 
method is less than the load from a separate type (even a subclass).   Also, 
avoidance of invariant issues is a big plus.  Besides, if this allows 
setdefault() to be deprecated, it becomes an all-around win.

> - Even if the default_factory were passed to the constructor, it still
> ought to be a writable attribute so it can be introspected and
> modified. A defaultdict that can't change its default factory after
> its creation is less useful.

Right!  My preference is to have default_factory not passed to the constructor, 
so we are left with just one way to do it.  But that is a nit.

> - It would be unwise to have a default value that would be called if
> it was callable: what if I wanted the default to be a class instance
> that happens to have a __call__ method for unrelated reasons?
> Callability is an elusive property; APIs should not attempt to
> dynamically decide whether an argument is callable or not.

That makes sense, though it seems over-the-top to need a zero-factory for a 

An alternative is to have two possible attributes:
  d.default_factory = list
  d.default_value = 0
with an exception being raised when both are defined (the test is done when the 
attribute is created, not when the lookup is performed).
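
A rough sketch of the two-attribute idea, for illustration only. The class name
TwoDefaultsDict, the choice of AttributeError, and the decision to store the
default in the dict on first lookup are all assumptions; the message itself
only says that setting both attributes should raise at assignment time.

```python
class TwoDefaultsDict(dict):
    """Hypothetical dict with mutually exclusive default attributes."""

    _EXCLUSIVE = {
        "default_factory": "default_value",
        "default_value": "default_factory",
    }

    def __setattr__(self, name, value):
        # The conflict test runs when the attribute is created,
        # not when a lookup is performed.
        other = self._EXCLUSIVE.get(name)
        if other is not None and other in self.__dict__:
            raise AttributeError(
                "%s is already set; cannot also set %s" % (other, name))
        super().__setattr__(name, value)

    def __missing__(self, key):
        if "default_factory" in self.__dict__:
            value = self.default_factory()
        elif "default_value" in self.__dict__:
            value = self.default_value
        else:
            raise KeyError(key)
        self[key] = value  # assumption: the default is inserted on first use
        return value


counts = TwoDefaultsDict()
counts.default_value = 0
for letter in "abca":
    counts[letter] += 1
print(counts)   # {'a': 2, 'b': 1, 'c': 1}

index = TwoDefaultsDict()
index.default_factory = list
index["x"].append(1)
print(index)    # {'x': [1]}
```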


From at  Mon Feb 20 17:52:26 2006
From: at (Michael Walter)
Date: Mon, 20 Feb 2006 17:52:26 +0100
Subject: [Python-Dev] (-1)**(1/2)==1?
In-Reply-To: <>
References: <>
Message-ID: <>

>>> 1/2
0

>>> (-1) ** (1./2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: negative number cannot be raised to a fractional power


On 2/20/06, Jonathan Barbero <jonathan.barbero at> wrote:
> Hello!
>   My name is Jonathan, i'm new with Python.
>    I try this in the command line:
>    >>> (-1)**(1/2)
>     1
>    This is wrong, i think it must throw an exception.
>     What do you think?
>     Bye.
>         Jonathan.

From bokr at  Mon Feb 20 18:10:13 2006
From: bokr at (Bengt Richter)
Date: Mon, 20 Feb 2006 17:10:13 GMT
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <>

On Mon, 20 Feb 2006 05:41:43 -0800, "Guido van Rossum" <guido at> wrote:

>I'm withdrawing the last proposal. I'm not convinced by the argument
>that __contains__ should always return True (perhaps it should also
>insert the value?), nor by the complaint that a holy invariant would
>be violated (so what?).
>But the amount of discussion and the number of different viewpoints
>present makes it clear that the feature as I last proposed would be
>forever divisive.
>I see two alternatives. These will cause a different kind of
>philosophical discussion; so be it. I'll describe them relative to the
>last proposal; for those who wisely skipped the last thread, here's a
>link to the proposal:
>Alternative A: add a new method to the dict type with the semantics of
>__getattr__ from the last proposal, using default_factory if not None
>(except on_missing is inlined). This avoids the discussion about
>broken invariants, but one could argue that it adds to an already
>overly broad API.
>Alternative B: provide a dict subclass that implements the __getattr__
>semantics from the last proposal. It could be an unrelated type for
>all I care, but I do care about implementation inheritance since it
>should perform just as well as an unmodified dict object, and that's
>hard to do without sharing implementation (copying would be worse).
>Parting shots:
>- Even if the default_factory were passed to the constructor, it still
>ought to be a writable attribute so it can be introspected and
>modified. A defaultdict that can't change its default factory after
>its creation is less useful.
>- It would be unwise to have a default value that would be called if
>it was callable: what if I wanted the default to be a class instance
>that happens to have a __call__ method for unrelated reasons?
You'd have to put it in a lambda: thing_with_unrelated__call__method

>Callability is an elusive property; APIs should not attempt to
>dynamically decide whether an argument is callable or not.
>- A third alternative would be to have a new method that takes an
>explicit default factory argument. This differs from setdefault() only
>in the type of the second argument. I'm not keen on this; the original
>use case came from an example where the readability of
>  d.setdefault(key, []).append(value)
>was questioned, and I'm not sure that
>  d.something(key, list).append(value)
>is any more readable. IOW I like (and I believe few have questioned)
>associating the default factory with the dict object instead of with
>the call site.
>Let the third round of the games begin!
Sorry if I missed it, but is it established that defaulting lookup
will be spelled the same as traditional lookup, i.e. d[k] or d.__getitem__(k) ?

IOW, are default-enabled dicts really going to be passed
into unknown contexts for use as a dict workalike? I can see using on_missing
for external side effects like logging etc., or _maybe_ modifying the dict with
a known separate set of keys that wouldn't be used for the normal purposes of the dict.

ISTM a defaulting dict could only reasonably be passed into contexts that expected it,
but that could still be useful also. How about d = dict() for a totally normal dict,
and d.defaulting to get a view that uses d.default_factory if present? E.g.,

d = dict()
d.default_factory = list
for i,name in enumerate('Eeny Meeny Miny Moe'.split()): # prefix insert order
    d.defaulting[name].append(i)  # or hoist d.defaulting => dd[name].append(i)

Maybe d.defaulting could be a descriptor?

If the above were done, could d.on_missing be independent and always active if present? E.g.,

    d.on_missing = lambda self, key: self.__setitem__(key, 0) or 0

would be allowed to work on its own first, irrespective of whether default_factory was set.
If it created d[key] it would effectively override default_factory if active, and
if not active, it would still act, letting you instrument a "normal" dict with special effects.

Of course, if you wanted to write an on_missing handler to use default_factory like your original
example, you could. So on_missing would always trigger if present, for missing keys, but
d.defaulting[k] would only call d.default_factory if the latter was set and the key was missing
even after on_missing (if present) did something (e.g., it could be logging passively).
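
The d.defaulting view can be sketched roughly as follows. Since attributes
cannot be added to the built-in dict, the sketch uses a subclass; the names
ViewDict and _DefaultingView are invented for the example, a property plays
the role of the descriptor, and the on_missing hook is left out for brevity.

```python
class _DefaultingView:
    """Hypothetical view whose lookups fall back to the owner's factory."""

    def __init__(self, target):
        self._target = target

    def __getitem__(self, key):
        target = self._target
        try:
            return dict.__getitem__(target, key)
        except KeyError:
            factory = target.default_factory
            if factory is None:
                raise
            value = target[key] = factory()
            return value


class ViewDict(dict):
    """Plain dict behaviour for d[k]; defaulting only through d.defaulting."""

    default_factory = None

    @property
    def defaulting(self):
        return _DefaultingView(self)


d = ViewDict()
d.default_factory = list
dd = d.defaulting   # hoisted view, as suggested in the text
for i, name in enumerate("Eeny Meeny Miny Moe".split()):
    dd[name].append(i)

print(d)   # {'Eeny': [0], 'Meeny': [1], 'Miny': [2], 'Moe': [3]}
```

Code that receives d itself still sees KeyError for missing keys; only code
that was handed the view gets the defaulting behaviour.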

Bengt Richter

From stephen at  Mon Feb 20 18:31:21 2006
From: stephen at (Stephen J. Turnbull)
Date: Tue, 21 Feb 2006 02:31:21 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Josiah Carlson's
	message of "Sun, 19 Feb 2006 20:28:41 -0800")
References: <>
Message-ID: <>

>>>>> "Josiah" == Josiah Carlson <jcarlson at> writes:

    Josiah> I try to internalize it by not thinking of strings as
    Josiah> encoded data, but as binary data, and unicode as text.  I
    Josiah> then remind myself that unicode isn't native on-disk or
    Josiah> cross-network (which stores and transports bytes, not
    Josiah> characters), so one needs to encode it as binary data.
    Josiah> It's a subtle difference, but it has worked so far for me.

Seems like a lot of work for something that for monolingual usage
should "Just Work" almost all of the time.

    Josiah> I notice that you seem to be in Japan, so teaching unicode
    Josiah> is a must.

Yes.  Japan is more complicated than that, but in Python unicode is a

    Josiah> If you are using the "unicode is text" and "strings are
    Josiah> data", and they aren't getting it; then I don't know.

Well, I can tell you that they don't get it.  One problem is PEP 263.
It makes it very easy to write programs that do line-oriented I/O with
input() and print, and the students come to think it should always be
that easy.  Since Japan has at least 6 common encodings that students
encounter on a daily basis while browsing the web, plus a couple more
that live inside of MSFT Word and Java, they're used to huge amounts
of magic.  The normal response of novice programmers is to mandate
that users of their programs use the encoding of choice and put it in
ordinary strings so that it just works.

Ie, the average student just "eats" the F on the codecs assignment,
and writes the rest of her programs without them.

    >> simple, and the exceptions for using a "nonexistent" method
    >> mean I don't have to reinforce---the students will be able to
    >> teach each other.  The exceptions also directly help reinforce
    >> the notion that text == Unicode.

    Josiah> Are you sure that they would help?  If .encode() and
    Josiah> .decode() drop from strings and unicode (respectively),
    Josiah> they get an AttributeError.  That's almost useless.

Well, I'm not _sure_, but this is the kind of thing that you can learn
by rote.  And it will happen on a sufficiently regular basis that a
large fraction of students will experience it.  They'll ask each
other, and usually they'll find a classmate who knows what happened.
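
A sketch of what that AttributeError experience looks like, assuming a Python 3
interpreter, where text (str) only has encode and binary data (bytes) only has
decode. The sample string is arbitrary.

```python
text = "L\u00f6wis"                # abstract text inside the program
data = text.encode("utf-8")        # encode FROM text: characters -> bytes
round_trip = data.decode("utf-8")  # decode TO text: bytes -> characters

print(data)                        # b'L\xc3\xb6wis'
print(round_trip == text)          # True

# The "nonexistent" direction fails immediately with AttributeError.
for obj, method in ((text, "decode"), (data, "encode")):
    try:
        getattr(obj, method)
    except AttributeError as exc:
        print("AttributeError:", exc)
```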

I haven't tried this with codecs, but that's been my experience with
statistical packages where some routines understand non-linear
equations but others insist on linear equations.[1] The error messages
("Equation is non-linear!  Aaugh!") are not much more specific than

    Josiah> Raising a better exception (with more information) would
    Josiah> be better in that case, but losing the functionality that
    Josiah> either would offer seems unnecessary;

Well, the point is that for the "usual suspects" (ie, Unicode codecs)
there is no functionality that would be lost.  As MAL pointed out, for
these codecs the "original" text is always Unicode; that's the role
Unicode is designed for, and by and large it fits the bill very well.
With few exceptions (such as rot13) the "derived" text will be bytes
that peripherals such as keyboards and terminals can generate and

    Josiah> "You are trying to encode/decode to/from incompatible
    Josiah> types. expected: a->b got: x->y" is better.  Some of those
    Josiah> can be done *very soon*, given the capabilities of the
    Josiah> encodings module,

That's probably the way to go.

If we can have a derived "Unicode codec" class that does this, that
would pretty much entirely serve the need I perceive.  Beginning
students could learn to write, more advanced students could
learn to create codec stacks to generate MIME bodies, which could
include base64 or quoted-printable bytes -> bytes codecs.

[1]  If you're not familiar with regression analysis, the problem is
that the equation "z = a*log(x) + b*log(y)" where a and b are to be
estimated is _linear_ in the sense that x, y, and z are data series,
and X = log(x) and Y = log(y) can be precomputed so that the equation
actually computed is "z = a*X + b*Y".  On the other hand "z = a*(x +
b*y)" is _nonlinear_ because of the coefficient on y being a*b.
Students find this hard to grasp in the classroom, but they learn
quickly in the lab.
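
To make the footnote concrete, here is a small sketch with made-up
data. Once X = log(x) and Y = log(y) are precomputed, ordinary least
squares recovers a and b directly from "z = a*X + b*Y". The helper
fit_two_coefficients and the sample numbers are assumptions for
illustration, not taken from any statistical package.

```python
import math

def fit_two_coefficients(X, Y, z):
    """Least-squares fit of z = a*X + b*Y (no intercept), via the
    2x2 normal equations solved with Cramer's rule."""
    sxx = sum(xi * xi for xi in X)
    sxy = sum(xi * yi for xi, yi in zip(X, Y))
    syy = sum(yi * yi for yi in Y)
    sxz = sum(xi * zi for xi, zi in zip(X, z))
    syz = sum(yi * zi for yi, zi in zip(Y, z))
    det = sxx * syy - sxy * sxy
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    return a, b

# Made-up data generated from known coefficients.
x = [1.5, 2.0, 3.0, 4.5, 7.0]
y = [2.0, 5.0, 1.2, 9.0, 3.3]
true_a, true_b = 2.0, -0.5
z = [true_a * math.log(xi) + true_b * math.log(yi) for xi, yi in zip(x, y)]

# The transformation is applied to the data, not to the parameters.
X = [math.log(xi) for xi in x]
Y = [math.log(yi) for yi in y]
a, b = fit_two_coefficients(X, Y, z)
```

No such precomputation exists for "z = a*(x + b*y)", because the
unknown b sits inside the term multiplied by a.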

I believe the parameter/variable inversion that my students have
trouble with in statistics is similar to the "original"/"derived"
inversion that happens with "text you can see" (derived, string) and
"abstract text inside the program" (original, Unicode).

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From bokr at  Mon Feb 20 18:38:54 2006
From: bokr at (Bengt Richter)
Date: Mon, 20 Feb 2006 17:38:54 GMT
Subject: [Python-Dev] documenting things [Was: Re: Proposal: defaultdict]
References: <>
	<> <dt8qb8$jd8$>
Message-ID: <>

On Mon, 20 Feb 2006 12:01:02 +0000, "Paul Moore" <p.f.moore at> wrote:

>On 2/19/06, Steve Holden <steve at> wrote:
>> > You are missing the rationale of the PEP process. The point is
>> > *not* documentation. The point of the PEP process is to channel
>> > and collect discussion, so that the BDFL can make a decision.
>> > The BDFL is not bound at all to the PEP process.
>> >
>> > To document things, we use (or should use) documentation.
>> >
>> >
>> One could wish this ideal had been the case for the import extensions
>> defined in PEP 302.
>(A bit off-topic, but that hit home, so I'll reply...)
>Agreed, and it's my fault they weren't, to some extent. I did try to
>find a suitable place, but the import docs are generally fairly
>scattered, and there wasn't a particularly good place to put the
>Any suggestions would be gratefully accepted...

I've always thought we could leverage google to find good doc information
if we would just tag it in some consistent way. E.g., if you wanted to
post a partial draft of some pep doc, you could post it here and/or c.l.p
PEP 302 docs version 2 <<--ENDMARK--
text here
(use REST if ambitious)

If we had some standard tag lines, we could make an urllib tool to harvest the
material and merge the most recent version paragraphs and auto-post it as html
in one place for draft docs on

The same tagged section technique could be used re any documentation, so
long as update and/or addition text can be associated with where it should
be tagged in as references. I think mouseover popup hints with clickable js popups
for the additional material would be cool. It would mean automatically editing
the doc to insert the hints though.

Well, nothing there is rocket science, but neither is a wall of bricks
so long and high you can't live long enough to complete it ;-)

Bengt Richter

From rhamph at  Mon Feb 20 19:22:30 2006
From: rhamph at (Adam Olsen)
Date: Mon, 20 Feb 2006 11:22:30 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Aahz <aahz at> wrote:
> On Sun, Feb 19, 2006, Josiah Carlson wrote:
> >
> > I agree, there is nothing perfect.  But at least in all of my use-cases,
> > and the majority of the ones I've seen 'in the wild', my previous post
> > provided an implementation that worked precisely like desired, and
> > precisely like a regular dictionary, except when accessing a
> > non-existent key via: value = dd[key] . __contains__, etc., all work
> > exactly like they do with a non-defaulting dictionary. Iteration via
> > popitem(), pop(key), items(), iteritems(), __iter__, etc., all work the
> > way you would expect them.
> This is the telling point, IMO.  My company makes heavy use of a "default
> dict" (actually, it's a "default class" because using constants as the
> lookup keys is mostly what we do and the convenience of is
> compelling over foo['bar']).  Anyway, our semantics are as Josiah
> outlines, and I can't see much use case for the alternatives.

Can you say, for the record (since nobody else seems to care), if
d.getorset(key, func) would work in your use cases?

> Those of you arguing something different: do you have a real use case
> (that you've implemented in real code)?

(again, for the record) getorset provides the minimum needed
functionality in a clean and intuitive way.  Why go for a complicated
solution when you simply don't need it?

Adam Olsen, aka Rhamphoryncus

From lists at  Mon Feb 20 19:32:48 2006
From: lists at (Jan Claeys)
Date: Mon, 20 Feb 2006 19:32:48 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <>
References: <>
	<dstlvb$6cb$> <>
Message-ID: <1140460369.13739.117.camel@localhost.localdomain>

On Fri, 17-02-2006 at 23:22 +0100, "Martin v. Löwis" wrote:
> That, in turn, is because nobody is so short of disk space that
> you really *have* to share /usr/share across architectures, 

I can see diskless thin clients that boot from flash memory doing things
like that?  (E.g. having documentation and header files and other
less-important stuff on an nfs mount?)

Jan Claeys

From guido at  Mon Feb 20 19:53:30 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 10:53:30 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <008101c6363b$ad0fc4e0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On 2/20/06, Raymond Hettinger <raymond.hettinger at> wrote:
> [GvR]
> > Alternative A: add a new method to the dict type with the semantics of
> > __getattr__ from the last proposal
> Did you mean __getitem__?

Yes, sorry, I meant __getitem__.

--Guido van Rossum (home page:

From jimjjewett at  Mon Feb 20 20:02:02 2006
From: jimjjewett at (Jim Jewett)
Date: Mon, 20 Feb 2006 14:02:02 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

Adam Olsen asked:
> ... d.getorset(key, func) would work in your use
> cases?

It is an improvement over setdefault, because it
doesn't always evaluate the expensive func.  (But
why should every call have to pass in the function,
when it is a property of the dictionary?)

It doesn't actually *solve* the problem because it
doesn't compose well.  This makes it hard to use for

(Use case from plucker web reader, where the
config is arguably overdesigned, but ... the version
here is simplified)

There is a system-wide default config.
Users have config files.
A config file can be specified for this program run.

In each of these, settings can be either general
settings or channel-specific.  The end result is that
the value should be pulled from the first of about half
a dozen dictionaries to have an answer.  Because
most settings are never used in most channels, and
several channels are typically run at once, it feels
wrong to pre-build the whole "anything they
might ask" settings dictionary for each of them.  On
the other hand, I certainly don't want to write



even once, let alone every time I get a config value.

In other words, the program would work correctly
if I passed in a normal but huge dictionary; I want
to avoid that for reasons of efficiency.  This isn't the
only use for a mapping, but it is the only one I've
seen where KeyError is "expected" by the
program's normal flow.
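
A rough sketch of the layered lookup described above, assuming a
hypothetical LayeredConfig class and made-up setting names (this is
not the plucker code). Each layer is an ordinary dict, and KeyError
from one layer simply means "try the next one".

```python
class LayeredConfig(object):
    """Read-only view over several dicts; the first layer that has
    the key wins.  Nothing is merged or copied up front."""

    def __init__(self, *layers):
        self.layers = layers

    def __getitem__(self, key):
        for layer in self.layers:
            try:
                return layer[key]
            except KeyError:
                continue          # expected: fall through to the next layer
        raise KeyError(key)

# Most specific first, most general last (names are illustrative).
run_config = {"verbose": True}
user_channel = {"depth": 3}
user_general = {"depth": 2, "images": False}
system_default = {"depth": 1, "images": True, "verbose": False}

settings = LayeredConfig(run_config, user_channel,
                         user_general, system_default)
depth = settings["depth"]
images = settings["images"]
```

Later versions of the standard library provide collections.ChainMap,
which performs the same first-match lookup over a list of mappings.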


From aleaxit at  Mon Feb 20 20:09:48 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 11:09:48 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <008101c6363b$ad0fc4e0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:

> [GvR]
>> I'm not convinced by the argument
>> that __contains__ should always return True
> Me either.  I cannot think of a more useless behavior or one more  
> likely to have
> unexpected consequences.  Besides, as Josiah pointed out, it is  
> much easier for
> a subclass override to substitute always True return values than  
> vice-versa.

Agreed on all counts.

> I prefer this approach over subclassing.  The mental load from an  
> additional
> method is less than the load from a separate type (even a  
> subclass).   Also,
> avoidance of invariant issues is a big plus.  Besides, if this allows
> setdefault() to be deprecated, it becomes an all-around win.

I'd love to remove setdefault in 3.0 -- but I don't think it can be  
done before that: default_factory won't cover the occasional use  
cases where setdefault is called with different defaults at different  
locations, and, rare as those cases may be, any 2.* should not break  
any existing code that uses that approach.

>> - Even if the default_factory were passed to the constructor, it  
>> still
>> ought to be a writable attribute so it can be introspected and
>> modified. A defaultdict that can't change its default factory after
>> its creation is less useful.
> Right!  My preference is to have default_factory not passed to the  
> constructor,
> so we are left with just one way to do it.  But that is a nit.

No big deal either way, but I see "passing the default factory to the  
ctor" as the "one obvious way to do it", so I'd rather have it (be it  
with a subclass or a classmethod-alternate constructor). I won't weep  
bitter tears if this drops out, though.

>> - It would be unwise to have a default value that would be called if
>> it was callable: what if I wanted the default to be a class instance
>> that happens to have a __call__ method for unrelated reasons?
>> Callability is an elusive property; APIs should not attempt to
>> dynamically decide whether an argument is callable or not.
> That makes sense, though it seems over-the-top to need a zero- 
> factory for a
> multiset.

But int is a convenient zero-factory.

> An alternative is to have two possible attributes:
>   d.default_factory = list
> or
>   d.default_value = 0
> with an exception being raised when both are defined (the test is  
> done when the
> attribute is created, not when the lookup is performed).

I see default_value as a way to get exactly the same beginner's error  
we already have with function defaults: a mutable object will not  
work as beginners expect, and we can confidently predict (based on  
the function defaults case) that python-list and python-help and  
python-tutor and a bazillion other venues will see an unending stream  
of confused beginners (in addition to those confused by mutable  
objects as default values for function arguments, but those can't be  
avoided). I presume you consider the "one obvious way" is to use  
default_value for immutables and default_factory for mutables, but  
based on a lot of experience teaching Python I feel certain that this  
won't be obvious to many, MANY users (and not just non-Dutch ones,  


From steven.bethard at  Mon Feb 20 20:24:09 2006
From: steven.bethard at (Steven Bethard)
Date: Mon, 20 Feb 2006 12:24:09 -0700
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> Alternative A: add a new method to the dict type with the semantics of
> __getattr__ from the last proposal, using default_factory if not None
> (except on_missing is inlined).

I'm not certain I understood this right but (after
s/__getattr__/__getitem__) this seems to suggest that for keeping a
dict of counts the code wouldn't really improve much:

dd = {}
dd.default_factory = int
for item in items:
    # I want to do ``dd[item] += 1`` but with a regular method instead
    # of __getitem__, this is not possible
    dd[item] = dd.somenewmethod(item) + 1

I don't think that's much better than just calling ``dd.get(item,
0)``.  Did I misunderstand Alternative A?

> Alternative B: provide a dict subclass that implements the __getattr__
> semantics from the last proposal.

If I didn't misinterpret Alternative A, I'd definitely prefer
Alternative B.  A dict of counts is by far my most common use case...

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From dw at  Mon Feb 20 20:58:08 2006
From: dw at (David Wilson)
Date: Mon, 20 Feb 2006 19:58:08 +0000
Subject: [Python-Dev] Simple CPython stack overflow.
Message-ID: <>

Just noticed this and wondered if it came under the Python should never
crash mantra. Should sys.getrecursionlimit() perhaps be taken into
account somewhere?

    >>> D = {'a': None}
    >>> for i in xrange(150000):
    ...     D = {'a': D}
    >>> D
    {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a':
    {'a': {'a': {'a': {'a': {'a': {'a': {'a': .... ': {'a': {'a': {'a':
    {'a': {'a': {'a': {'a': {[+]'a': {'a': {'a': {'a': {'a': {'a': {'a':
    {'a': {'a' .... Bus error



'tis better to be silent and be thought a fool,
than to speak and remove all doubt.
    -- Lincoln

From jcarlson at  Mon Feb 20 21:24:18 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 20 Feb 2006 12:24:18 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

"Adam Olsen" <rhamph at> wrote:
> Can you say, for the record (since nobody else seems to care), if
> d.getorset(key, func) would work in your use cases?

It doesn't work for the multiset/accumulation case:

    dd[key] += 1
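
A short sketch of why: "dd[key] += 1" first reads dd[key] through
__getitem__, so only a hook on subscripting can supply the default,
and a separate method cannot. The getorset function follows the
semantics discussed in this thread; CountingDict is a hypothetical
subclass using the __missing__ hook available in current Python.

```python
def getorset(d, key, func):
    # Helper with the semantics proposed in the thread.
    if key not in d:
        d[key] = func()
    return d[key]

words = ["a", "b", "a"]

# With a method, the read and the write must be spelled separately.
counts = {}
for word in words:
    counts[word] = getorset(counts, word, int) + 1

class CountingDict(dict):
    """Hypothetical dict whose missing keys read as 0."""
    def __missing__(self, key):
        return 0

# With a subscripting hook, augmented assignment just works.
dd = CountingDict()
for word in words:
    dd[word] += 1

# A plain dict fails on the implicit read.
plain = {}
try:
    plain["a"] += 1
    plain_failed = False
except KeyError:
    plain_failed = True
```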

 - Josiah

From guido at  Mon Feb 20 21:25:24 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 12:25:24 -0800
Subject: [Python-Dev] Simple CPython stack overflow.
In-Reply-To: <>
References: <>
Message-ID: <>

Yes, this is the type of thing we've been struggling with for years.
There used to be way more of these. I can't guarantee it'll be fixed
with priority (it's mostly of the "then don't do that" type) but
please do file a bug so someone with inclination can fix it. The same
happens for deeply recursive tuples and lists BTW.


On 2/20/06, David Wilson <dw at> wrote:
> Just noticed this and wondered if it came under the Python should never
> crash mantra. Should sys.getrecursionlimit() perhaps be taken into
> account somewhere?
>     >>> D = {'a': None}
>     >>> for i in xrange(150000):
>     ...     D = {'a': D}
>     ...
>     >>> D
>     {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a': {'a':
>     {'a': {'a': {'a': {'a': {'a': {'a': {'a': .... ': {'a': {'a': {'a':
>     {'a': {'a': {'a': {'a': {[+]'a': {'a': {'a': {'a': {'a': {'a': {'a':
>     {'a': {'a' .... Bus error
>     bash$
> Cheers,
> David.
> --
> 'tis better to be silent and be thought a fool,
> than to speak and remove all doubt.
>     -- Lincoln
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

--Guido van Rossum (home page:

From guido at  Mon Feb 20 21:33:04 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 12:33:04 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Steven Bethard <steven.bethard at> wrote:
> Guido van Rossum wrote:
> > Alternative A: add a new method to the dict type with the semantics of
> > [__getitem__] from the last proposal, using default_factory if not None
> > (except on_missing is inlined).
> I'm not certain I understood this right but [...]
> this seems to suggest that for keeping a
> dict of counts the code wouldn't really improve much:

You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1
is perfectly fine there and hard to improve upon.

It's the slightly more esoteric use case where the default is a list
and you want to append to that list that we're trying to improve:
currently the shortest version is d.setdefault(k, []).append(v) but
that lacks legibility and creates an empty list that is thrown away
most of the time. We're trying to obtain the minimal form where the
new list is created by implicitly calling
d.default_factory if d[k] doesn't yet exist, and d.default_factory is
set to the list constructor.
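
For comparison, a sketch of both spellings. It uses
collections.defaultdict, the type that current Python provides with
these semantics; at the time of this message the API was still being
debated, so treat the second half as an illustration rather than the
proposal itself. The sample data is made up.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]

# Current idiom: a new empty list is built on every call, and thrown
# away whenever the key already exists.
old = {}
for k, v in pairs:
    old.setdefault(k, []).append(v)

# Factory attached to the dict: list() is called only when d[k] is
# missing, and the result is stored under k.
new = defaultdict(list)
for k, v in pairs:
    new[k].append(v)
```

The default_factory attribute stays readable and writable after
construction (new.default_factory is list here).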

--Guido van Rossum (home page:

From crutcher at  Mon Feb 20 21:34:44 2006
From: crutcher at (Crutcher Dunnavant)
Date: Mon, 20 Feb 2006 12:34:44 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry to chime in so late, but why are we setting a value when the key
isn't defined?

It seems there are many situations where you want:
  a) default values, and
  b) the ability to determine if a value was defined.

There are many times that I want d[key] to give me a value even when
it isn't defined, but that doesn't always mean I want to _save_ that
value in the dict. Sometimes I do, sometimes I don't. We should have
some means of describing this in any defaultdict implementation

On 2/20/06, Guido van Rossum <guido at> wrote:
> I'm withdrawing the last proposal. I'm not convinced by the argument
> that __contains__ should always return True (perhaps it should also
> insert the value?), nor by the complaint that a holy invariant would
> be violated (so what?).
> But the amount of discussion and the number of different viewpoints
> present makes it clear that the feature as I last proposed would be
> forever divisive.
> I see two alternatives. These will cause a different kind of
> philosophical discussion; so be it. I'll describe them relative to the
> last proposal; for those who wisely skipped the last thread, here's a
> link to the proposal:
> Alternative A: add a new method to the dict type with the semantics of
> __getattr__ from the last proposal, using default_factory if not None
> (except on_missing is inlined). This avoids the discussion about
> broken invariants, but one could argue that it adds to an already
> overly broad API.
> Alternative B: provide a dict subclass that implements the __getattr__
> semantics from the last proposal. It could be an unrelated type for
> all I care, but I do care about implementation inheritance since it
> should perform just as well as an unmodified dict object, and that's
> hard to do without sharing implementation (copying would be worse).
> Parting shots:
> - Even if the default_factory were passed to the constructor, it still
> ought to be a writable attribute so it can be introspected and
> modified. A defaultdict that can't change its default factory after
> its creation is less useful.
> - It would be unwise to have a default value that would be called if
> it was callable: what if I wanted the default to be a class instance
> that happens to have a __call__ method for unrelated reasons?
> Callability is an elusive property; APIs should not attempt to
> dynamically decide whether an argument is callable or not.
> - A third alternative would be to have a new method that takes an
> explicit default factory argument. This differs from setdefault() only
> in the type of the second argument. I'm not keen on this; the original
> use case came from an example where the readability of
>   d.setdefault(key, []).append(value)
> was questioned, and I'm not sure that
>   d.something(key, list).append(value)
> is any more readable. IOW I like (and I believe few have questioned)
> associating the default factory with the dict object instead of with
> the call site.
> Let the third round of the games begin!
> --
> --Guido van Rossum (home page:

Crutcher Dunnavant <crutcher at>

From crutcher at  Mon Feb 20 21:37:30 2006
From: crutcher at (Crutcher Dunnavant)
Date: Mon, 20 Feb 2006 12:37:30 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

I'm thinking something much closer to this (note default_factory gets the key):

def on_missing(self, key):
  if self.default_factory is not None:
    value = self.default_factory(key)
    if self.on_missing_define_key:
      self[key] = value
    return value
  raise KeyError(key)
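
One way the method above could be wired into a runnable class, as a
sketch only. KeyedDefaultDict and the store_missing argument are
hypothetical names; the class relies on the __missing__ hook that
dict subclasses have in current Python, which is not necessarily the
mechanism the proposal would have used.

```python
class KeyedDefaultDict(dict):
    """Hypothetical dict whose default factory receives the key and
    which can be told not to store the computed default."""

    def __init__(self, default_factory=None, store_missing=True):
        dict.__init__(self)
        self.default_factory = default_factory
        self.on_missing_define_key = store_missing

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory(key)
            if self.on_missing_define_key:
                self[key] = value
            return value
        raise KeyError(key)

    # dict calls __missing__(key) when d[key] finds nothing.
    __missing__ = on_missing

# Default computed from the key, but not saved.
lengths = KeyedDefaultDict(len, store_missing=False)
hello_len = lengths["hello"]

# Default computed from the key and cached.
cached = KeyedDefaultDict(str.upper)
abc_upper = cached["abc"]
```

With store_missing=False, "key in d" still answers whether a value
was actually defined, which is point (b) in the earlier message.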

On 2/20/06, Crutcher Dunnavant <crutcher at> wrote:
> Sorry to chime in so late, but why are we setting a value when the key
> isn't defined?
> It seems there are many situations where you want:
>   a) default values, and
>   b) the ability to determine if a value was defined.
> There are many times that I want d[key] to give me a value even when
> it isn't defined, but that doesn't always mean I want to _save_ that
> value in the dict. Sometimes I do, sometimes I don't. We should have
> some means of describing this in any defaultdict implementation
> On 2/20/06, Guido van Rossum <guido at> wrote:
> > I'm withdrawing the last proposal. I'm not convinced by the argument
> > that __contains__ should always return True (perhaps it should also
> > insert the value?), nor by the complaint that a holy invariant would
> > be violated (so what?).
> >
> > But the amount of discussion and the number of different viewpoints
> > present makes it clear that the feature as I last proposed would be
> > forever divisive.
> >
> > I see two alternatives. These will cause a different kind of
> > philosophical discussion; so be it. I'll describe them relative to the
> > last proposal; for those who wisely skipped the last thread, here's a
> > link to the proposal:
> >
> >
> > Alternative A: add a new method to the dict type with the semantics of
> > __getattr__ from the last proposal, using default_factory if not None
> > (except on_missing is inlined). This avoids the discussion about
> > broken invariants, but one could argue that it adds to an already
> > overly broad API.
> >
> > Alternative B: provide a dict subclass that implements the __getattr__
> > semantics from the last proposal. It could be an unrelated type for
> > all I care, but I do care about implementation inheritance since it
> > should perform just as well as an unmodified dict object, and that's
> > hard to do without sharing implementation (copying would be worse).
> >
> > Parting shots:
> >
> > - Even if the default_factory were passed to the constructor, it still
> > ought to be a writable attribute so it can be introspected and
> > modified. A defaultdict that can't change its default factory after
> > its creation is less useful.
> >
> > - It would be unwise to have a default value that would be called if
> > it was callable: what if I wanted the default to be a class instance
> > that happens to have a __call__ method for unrelated reasons?
> > Callability is an elusive property; APIs should not attempt to
> > dynamically decide whether an argument is callable or not.
> >
> > - A third alternative would be to have a new method that takes an
> > explicit default factory argument. This differs from setdefault() only
> > in the type of the second argument. I'm not keen on this; the original
> > use case came from an example where the readability of
> >
> >   d.setdefault(key, []).append(value)
> >
> > was questioned, and I'm not sure that
> >
> >   d.something(key, list).append(value)
> >
> > is any more readable. IOW I like (and I believe few have questioned)
> > associating the default factory with the dict object instead of with
> > the call site.
> >
> > Let the third round of the games begin!
> >
> > --
> > --Guido van Rossum (home page:
> --
> Crutcher Dunnavant <crutcher at>

Crutcher Dunnavant <crutcher at>

From aahz at  Mon Feb 20 21:38:52 2006
From: aahz at (Aahz)
Date: Mon, 20 Feb 2006 12:38:52 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Feb 20, 2006, Adam Olsen wrote:
> On 2/20/06, Aahz <aahz at> wrote:
>> On Sun, Feb 19, 2006, Josiah Carlson wrote:
>>> I agree, there is nothing perfect.  But at least in all of my use-cases,
>>> and the majority of the ones I've seen 'in the wild', my previous post
>>> provided an implementation that worked precisely like desired, and
>>> precisely like a regular dictionary, except when accessing a
>>> non-existent key via: value = dd[key] . __contains__, etc., all work
>>> exactly like they do with a non-defaulting dictionary. Iteration via
>>> popitem(), pop(key), items(), iteritems(), __iter__, etc., all work the
>>> way you would expect them.
>> This is the telling point, IMO.  My company makes heavy use of a "default
>> dict" (actually, it's a "default class" because using constants as the
>> lookup keys is mostly what we do and the convenience of is
>> compelling over foo['bar']).  Anyway, our semantics are as Josiah
>> outlines, and I can't see much use case for the alternatives.
> Can you say, for the record (since nobody else seems to care), if
> d.getorset(key, func) would work in your use cases?

Because I haven't been reading this thread all that closely, you'll have
to remind me what this means.

>> Those of you arguing something different: do you have a real use case
>> (that you've implemented in real code)?
> (again, for the record) getorset provides the minimum needed
> functionality in a clean and intuitive way.  Why go for a complicated
> solution when you simply don't need it?

Ditto above.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From guido at  Mon Feb 20 21:54:42 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 12:54:42 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Josiah Carlson <jcarlson at> wrote:
> "Adam Olsen" <rhamph at> wrote:
> > Can you say, for the record (since nobody else seems to care), if
> > d.getorset(key, func) would work in your use cases?
> It doesn't work for the multiset/accumulation case:
>     dd[key] += 1

This is actually a fairly powerful argument for a subclass that
redefines __getitem__ in favor of a new dict method. (Not to mention
that it's much easier to pick a name for the subclass than for the
method. :-) See the new thread I started.

--Guido van Rossum (home page:

From fuzzyman at  Mon Feb 20 21:57:49 2006
From: fuzzyman at (Michael Foord)
Date: Mon, 20 Feb 2006 20:57:49 +0000
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <dtarlp$4av$>
References: <>	<dt94d8$bbd$>	<>
Message-ID: <>

Manuzhai wrote:
>> No; nobody volunteered a machine yet (plus the hand-holding that
>> is always necessary with Windows).
> What exactly is needed for this? Does it need to be a machine dedicated 
> to this stuff, or could I just run the tests once every day or so when I 
> feel like it and have them submitted to buildbot?
Has a machine been volunteered ?

I have a spare machine and an always on connection. Would the 'right' 
development tools be needed ? (In the case of Microsoft they are a touch 
expensive I believe.)

All the best,

Michael Foord

> Regards,
> Manuzhai

From ianb at  Mon Feb 20 22:13:23 2006
From: ianb at (Ian Bicking)
Date: Mon, 20 Feb 2006 15:13:23 -0600
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	<008101c6363b$ad0fc4e0$b83efea9@RaymondLaptop1>
Message-ID: <>

Alex Martelli wrote:
>>I prefer this approach over subclassing.  The mental load from an
>>additional method is less than the load from a separate type (even a
>>subclass).   Also,
>>avoidance of invariant issues is a big plus.  Besides, if this allows
>>setdefault() to be deprecated, it becomes an all-around win.
> I'd love to remove setdefault in 3.0 -- but I don't think it can be  
> done before that: default_factory won't cover the occasional use  
> cases where setdefault is called with different defaults at different  
> locations, and, rare as those cases may be, any 2.* should not break  
> any existing code that uses that approach.

Would it be deprecated in 2.*, or start deprecating in 3.0?

Also, is default_factory=list threadsafe in the same way .setdefault is? 
  That is, you can safely do this from multiple threads:

   d.setdefault(key, []).append(value)

I believe this is safe with very few caveats -- setdefault itself is 
atomic (or else I'm writing some bad code ;).  My impression is that 
default_factory will not generally be threadsafe in the way setdefault 
is.  For instance:

   def make_list(): return []
   d = dict()
   d.default_factory = make_list
   # from multiple threads:

This would not be correct (a value can be lost if two threads 
concurrently enter make_list for the same key).  In the case of 
default_factory=list (using the list builtin) is the story different? 
Will this work on Jython, IronPython, or PyPy?  Will this be a 
documented guarantee?  Or alternately, are we just creating a new way to 
punish people who use threads?  And if we push threadsafety up to user 
code, are we trading a very small speed issue (creating lists that are 
thrown away) for a much larger speed issue (acquiring a lock)?
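
For reference, a sketch of what pushing thread safety up to user code
might look like. LockedMultiDict is a hypothetical wrapper, and the
sketch says nothing about whether default_factory itself would be
atomic; it only shows the lock-per-operation cost being weighed here.

```python
import threading

class LockedMultiDict(object):
    """Hypothetical key -> list-of-values mapping guarded by a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        # Create-if-missing and append happen as one guarded step.
        with self._lock:
            self._data.setdefault(key, []).append(value)

    def get(self, key):
        with self._lock:
            return list(self._data.get(key, []))

md = LockedMultiDict()

def worker(n):
    for i in range(1000):
        md.add("k", (n, i))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = len(md.get("k"))
```

Every add() pays for acquiring the lock, whether or not the key
already exists.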

I tried to make a test for this threadsafety, actually -- using a 
technique besides setdefault which I knew was bad (try:except 
KeyError:).  And (except using time.sleep(), which is cheating), I 
wasn't actually able to trigger the bug.  Which is frustrating, because 
I know the bug is there.  So apparently threadsafety is hard to test in 
this case.  (If anyone is interested in trying it, I can email what I have.)

Note that multidict -- among other possible concrete collection patterns 
(like Bag, OrderedDict, or others) -- can be readily implemented with 
threading guarantees.

Ian Bicking  /  ianb at  /

From ianb at  Mon Feb 20 22:13:27 2006
From: ianb at (Ian Bicking)
Date: Mon, 20 Feb 2006 15:13:27 -0600
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Bethard wrote:
>>Alternative A: add a new method to the dict type with the semantics of
>>__getattr__ from the last proposal, using default_factory if not None
>>(except on_missing is inlined).
> I'm not certain I understood this right but (after
> s/__getattr__/__getitem__) this seems to suggest that for keeping a
> dict of counts the code wouldn't really improve much:
> dd = {}
> dd.default_factory = int
> for item in items:
>     # I want to do ``dd[item] += 1`` but with a regular method instead
>     # of __getitem__, this is not possible
>     dd[item] = dd.somenewmethod(item) + 1

This would be better done with a bag (a set that can contain multiple 
instances of the same item):

dd = collections.Bag()
for item in items:

Then to see how many there are of an item, perhaps something like:

No collections.Bag exists, but of course one should.  It has nice 
properties -- inclusion is done with __contains__ (with dicts it 
probably has to be done with get), you can't accidentally go below zero, 
the methods express intent, and presumably it will implement only a 
meaningful set of methods.
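
A minimal sketch of what such a type might look like. The class and
its method names (add, remove, count) are assumptions made for this
illustration, not an existing API.

```python
class Bag(object):
    """Hypothetical multiset: a set that remembers how many times
    each item was added."""

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        # Refuse to go below zero: removing an absent item is an error.
        if item not in self._counts:
            raise KeyError(item)
        if self._counts[item] == 1:
            del self._counts[item]
        else:
            self._counts[item] -= 1

    def count(self, item):
        return self._counts.get(item, 0)

    def __contains__(self, item):
        return item in self._counts

    def __len__(self):
        return sum(self._counts.values())

bag = Bag(["x", "y", "x"])
bag.remove("y")
```

Later versions of the standard library added collections.Counter,
which covers much of this ground, although it does allow counts to go
below zero.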

Ian Bicking  /  ianb at  /

From guido at  Mon Feb 20 22:18:14 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 13:18:14 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Ian Bicking <ianb at> wrote:
> Would it be deprecated in 2.*, or start deprecating in 3.0?

3.0 will have no backwards compatibility allowances. Whenever someone
says "remove this in 3.0" they mean exactly that. There will be too
many incompatibilities in 3.0 to be bothered with deprecating them
all; most likely we'll have to have some kind of (semi-)automatic
conversion tool.

Deprecation in 2.x is generally done to indicate that a feature will
be removed in 2.y for y >= x+1.

--Guido van Rossum (home page:

From aleaxit at  Mon Feb 20 22:20:46 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 13:20:46 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 12:33 PM, Guido van Rossum wrote:
> You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1
> is perfectly fine there and hard to improve upon.

I see d[k]+=1 as a substantial improvement -- conceptually more  
direct, "I've now seen one more k than I had seen before".


From aleaxit at  Mon Feb 20 22:24:24 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 13:24:24 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 12:38 PM, Aahz wrote:
>> Can you say, for the record (since nobody else seems to care), if
>> d.getorset(key, func) would work in your use cases?
> Because I haven't been reading this thread all that closely, you'll  
> have
> to remind me what this means.

Roughly the same (save for method/function difference) as:

def getorset(d, key, func):
   if key not in d: d[key] = func()
   return d[key]
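
A usage sketch, repeating the function so the snippet is self-contained; the `make_list` factory and the `calls` log are illustrative additions. Unlike `d.setdefault(key, [])`, the factory only runs when the key is missing:

```python
def getorset(d, key, func):
    if key not in d:
        d[key] = func()
    return d[key]


calls = []

def make_list():
    # Record each invocation so we can see when the factory actually runs.
    calls.append(1)
    return []


groups = {}
getorset(groups, "x", make_list).append(1)
getorset(groups, "x", make_list).append(2)   # key exists: factory not called
```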


From guido at  Mon Feb 20 22:28:46 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 13:28:46 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Ian Bicking <ianb at> wrote:
> Also, is default_factory=list threadsafe in the same way .setdefault is?
>   That is, you can safely do this from multiple threads:
>    d.setdefault(key, []).append(value)
> I believe this is safe with very few caveats -- setdefault itself is
> atomic (or else I'm writing some bad code ;).

Only if the key is a string and all values in the dict are also
strings (or other builtins). And I don't think that Jython or
IronPython promise anything here.

Here's a sketch of a situation that isn't thread-safe:

class C:
  def __eq__(self, other):
    return False
  def __hash__(self):
    return hash("abc")

d = {C(): 42}
print d["abc"]

Because "abc" and C() have the same hash value, the lookup will
compare "abc" to C() which will invoke C.__eq__().
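
The same situation can be made visible with a small sketch (written for Python 3; the `calls` list is an addition used only to record that `__eq__` ran, and `d.get` is used so the missing key does not raise):

```python
calls = []

class C:
    def __eq__(self, other):
        # Arbitrary Python code runs here, in the middle of the dict lookup.
        calls.append(other)
        return False

    def __hash__(self):
        return hash("abc")


d = {C(): 42}
found = d.get("abc")   # same hash as the C() key, so the keys get compared
```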

Why are you so keen on using a dictionary to share data between
threads that  may both modify it? IMO this is asking for trouble --
the advice about sharing data between threads is always to use the
Queue module.

> Note that multidict -- among other possible concrete collection patterns
> (like Bag, OrderedDict, or others) -- can be readily implemented with
> threading guarantees.

I don't believe that this is as easy as you think.

--Guido van Rossum (home page:

From guido at  Mon Feb 20 22:32:08 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 13:32:08 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Alex Martelli <aleaxit at> wrote:
> On Feb 20, 2006, at 12:33 PM, Guido van Rossum wrote:
>     ...
> > You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1
> > is perfectly fine there and hard to improve upon.
> I see d[k]+=1 as a substantial improvement -- conceptually more
> direct, "I've now seen one more k than I had seen before".

Yes, I now agree. This means that I'm withdrawing proposal A (new
method) and championing only B (a subclass that implements
__getitem__() calling on_missing() and on_missing() defined in that
subclass as before, calling default_factory unless it's None). I don't
think this crisis is big enough to need *two* solutions, and this
example shows B's superiority over A.
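
A rough sketch of proposal B as described here. The class name `DefaultDict` and the method bodies are assumptions for illustration, not the implementation that was eventually committed:

```python
class DefaultDict(dict):
    """Sketch of proposal B: __getitem__ falls back to on_missing()."""

    default_factory = None

    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        return self.on_missing(key)

    def on_missing(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Store the new default so that in-place updates such as += persist.
        value = self[key] = self.default_factory()
        return value


counts = DefaultDict()
counts.default_factory = int
for word in ["a", "b", "a"]:
    counts[word] += 1
```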

--Guido van Rossum (home page:

From python at  Mon Feb 20 22:43:20 2006
From: python at (Raymond Hettinger)
Date: Mon, 20 Feb 2006 16:43:20 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <><>
Message-ID: <007701c63666$aaf98080$7600a8c0@RaymondLaptop1>

[Crutcher Dunnavant ]
>> There are many times that I want d[key] to give me a value even when
>> it isn't defined, but that doesn't always mean I want to _save_ that
>> value in the dict. 

How does that differ from the existing dict.get method?
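
For comparison, a quick sketch of what `dict.get` already does: it supplies a fallback without storing it.

```python
d = {"a": 1}
value = d.get("b", 0)   # the fallback is returned...
# ...but the dict itself is left unchanged: "b" was never inserted.
```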


From martin at  Mon Feb 20 22:44:00 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 22:44:00 +0100
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
>     Martin> For an example where base64 is *not* necessarily
>     Martin> ASCII-encoded, see the "binary" data type in XML
>     Martin> Schema. There, base64 is embedded into an XML document,
>     Martin> and uses the encoding of the entire XML document. As a
>     Martin> result, you may get base64 data in utf16le.
> I'll have to take a look.  It depends on whether base64 is specified
> as an octet-stream to Unicode stream transformation or as an embedding
> of an intermediate representation into Unicode.  Granted, defining the
> base64 alphabet as a subset of Unicode seems like the logical way to
> do it in the context of XML.

Please do take a look. It is the only way: If you were to embed base64
*bytes* into character data content of an XML element, the resulting
XML file might not be well-formed anymore (if the encoding of the XML
file is not an ASCII superencoding).
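
A small sketch of the point, using an arbitrary payload (`b"hello"` is only an example): the base64 *characters* are the same, but their byte representation depends on the encoding of the enclosing document.

```python
import base64

payload = b"hello"
# base64 yields characters from a 64-symbol alphabet; hold them as text.
text = base64.b64encode(payload).decode("ascii")

as_ascii = text.encode("ascii")
as_utf16le = text.encode("utf-16-le")   # two bytes per character

# Decoding the characters recovers the payload regardless of file encoding.
roundtrip = base64.b64decode(as_utf16le.decode("utf-16-le"))
```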


From ianb at  Mon Feb 20 22:47:45 2006
From: ianb at (Ian Bicking)
Date: Mon, 20 Feb 2006 15:47:45 -0600
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
> Why are you so keen on using a dictionary to share data between
> threads that  may both modify it? IMO this is asking for trouble --
> the advice about sharing data between threads is always to use the
> Queue module.

I use them often for shared caches.  But yeah, it's harder than I 
thought at first -- I think the actual cases I'm using work, since they 
use simple keys (ints, strings), but yeah, thread guarantees are too 
difficult to handle in general.  Damn threads.

Ian Bicking  /  ianb at  /

From aahz at  Mon Feb 20 23:10:16 2006
From: aahz at (Aahz)
Date: Mon, 20 Feb 2006 14:10:16 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Feb 20, 2006, Alex Martelli wrote:
> On Feb 20, 2006, at 12:38 PM, Aahz wrote:
>     ...
>>> Can you say, for the record (since nobody else seems to care), if
>>> d.getorset(key, func) would work in your use cases?
>> Because I haven't been reading this thread all that closely, you'll  
>> have
>> to remind me what this means.
> Roughly the same (save for method/function difference) as:
> def getorset(d, key, func):
>    if key not in d: d[key] = func()
>    return d[key]

That has the problem of looking clumsy, and doubly so for our use case
where it's an attribute-based dict.  Our style relies on the clean look
of code like this:

    if order.street:

Even as a dict, that doesn't look horrible:

    if order['street']:

OTOH, this starts looking ugly:

    if order.get('street'):

And this is just plain bad:

    if getattr(order, 'street'):

Note that because we have to deal with *both* the possibility that the
attribute/key may not be there *and* that it might be blank -- but both
are semantically equivalent for our application -- there's no other
clean coding style.

Now, I realize this is different from the "primary use case" for needing
mutable values, but any proposed default dict solution that doesn't
cleanly support my use case is less interesting to me.
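
A sketch of the kind of attribute-based dict this style implies. The class name `Record` and the choice of `None` for missing attributes are assumptions, not the actual code being described:

```python
class Record(dict):
    """Hypothetical attribute-based dict: missing attributes read as None."""

    def __getattr__(self, name):
        # __getattr__ is only consulted when normal attribute lookup fails.
        return self.get(name)


order = Record(street="", city="Springfield")

missing = []
for field in ("street", "zip", "city"):
    # Absent ("zip") and blank ("street") both test false.
    if not getattr(order, field):
        missing.append(field)
```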
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From martin at  Mon Feb 20 23:22:53 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 23:22:53 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<dt94d8$bbd$>
	<>	<dtarlp$4av$>
	<>	<dtb6k1$4om$>
Message-ID: <>

Guido van Rossum wrote:
> They don't; I think a separate page would be a fine idea.

Ok, I have now split this into three pages.

> FWIW, it looks like all the sample templates are still wasting a lot
> of horizontal space in the first two columns; the second is almost
> always empty. Perhaps the author of the change could be placed *below*
> the timestamp instead of next to it? Also for all practical purposes
> we can probably get rid of the seconds in the timestamp.

The latter was easy to do, so I did it. The former is tricky;
contributions are welcome.


From martin at  Mon Feb 20 23:25:16 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 23:25:16 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Steve Holden wrote:
> All formats would be improved if the headers could be made to float at 
> the top of the page as scrolling took place.

Can this be done in CSS? If so, contributions are welcome. If not,
can somebody prepare a modified page with the necessary changes
(preferably only additional classes for the header or some such);
I can then try to edit buildbot to add these changes into the


From g.brandl at  Mon Feb 20 23:35:02 2006
From: g.brandl at (Georg Brandl)
Date: Mon, 20 Feb 2006 23:35:02 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <dtdg6m$o8s$>

Martin v. Löwis wrote:
> Steve Holden wrote:
>> All formats would be improved if the headers could be made to float at 
>> the top of the page as scrolling took place.
> Can this be done in CSS? If so, contributions are welcome.

Not as it is. The big table would have to be split so that there is one table
with the heading and one with the rest. But that would make the columns
independent, so the header's column widths would differ from the content's.

Even then, I don't know if there's a working solution for the headers to stay
on top since
* floats are only left or right aligned
* the header's height is variable
* position:absolute doesn't work in MSIE.


From martin at  Mon Feb 20 23:38:27 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 23:38:27 +0100
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:
> Note that this does not mean that we should forget about memory
> consumption issues. It's just that if there's only marginal
> interest in certain special builds of Python, I don't see the
> requirement for the Python core developers to maintain them.

Well, the cost of Unicode support is not so much in the algorithmic
part, but in the tables that come along with it. AFAICT, everything
but unicodectype is optional; that is 5KiB of code and 20KiB of data
on x86. Actually, the size of the code *does* matter, at a second
glance. Here are the largest object files in the Python code base
on my system (not counting dynamic modules):

   text    data     bss     dec     hex filename
   4845   19968       0   24813    60ed Objects/unicodectype.o
  22633    2432     352   25417    6349 Objects/listobject.o
  29259    1412     152   30823    7867 Objects/classobject.o
  20696   11488       4   32188    7dbc Python/bltinmodule.o
  33579     740       0   34319    860f Objects/longobject.o
  34119      16     288   34423    8677 Python/ceval.o
  35179    2796       0   37975    9457 Modules/_sre.o
  26539   15820     416   42775    a717 Modules/posixmodule.o
  35283    8800    1056   45139    b053 Objects/stringobject.o
  50360       0      28   50388    c4d4 Python/compile.o
  68455    4624     440   73519   11f2f Objects/typeobject.o
  69993    9316    1196   80505   13a79 Objects/unicodeobject.o

So it appears that dropping Unicode support can indeed provide
some savings.

For reference, we also have an option to drop complex numbers:

   9654     692       4   10350    286e Objects/complexobject.o


From martin at  Mon Feb 20 23:49:59 2006
From: martin at (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 23:49:59 +0100
Subject: [Python-Dev] bdist_* to stdlib?
In-Reply-To: <1140460369.13739.117.camel@localhost.localdomain>
References: <>	<dstlvb$6cb$>
	<>	<1140007745.13739.7.camel@localhost.localdomain>	<>	<>	<>	<>
Message-ID: <>

Jan Claeys wrote:
>>That, in turn, is because nobody is so short of disk space that
>>you really *have* to share /usr/share across architectures, 
> I can see diskless thin clients that boot from flash memory doing things
> like that?  (E.g. having documentation and header files and other
> less-important stuff on an nfs mount?)

Having parts of the file system on NFS: sure, even have root on NFS:
all the time.

But if you have two classes of machines (say, diskless SPARC and
diskless x86 PCs) for which you have to provide different sets of
binaries on NFS: why do you have to share /usr/share across
architectures? It will only save you a small percentage of disk
space, and at additional hassles.


From martin at  Mon Feb 20 23:53:47 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Feb 2006 23:53:47 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<dt94d8$bbd$>	<>	<dtarlp$4av$>
Message-ID: <>

Michael Foord wrote:
> Has a machine been volunteered ?

Not yet.

> I have a spare machine and an always on connection. Would the 'right' 
> development tools be needed ? (In the case of Microsoft they are a touch 
> expensive I believe.)

Any build process would do. I would prefer to see the official tools on
the buildbot (i.e. VS.NET 2003), but anything else that can build Python
and pass the test suite could do as well.

One issue is that you also have to work with me on defining the build
steps: what sequence of commands to send in what order. For Unix, that
is easy; for Windows, not so.


From steven.bethard at  Mon Feb 20 23:58:09 2006
From: steven.bethard at (Steven Bethard)
Date: Mon, 20 Feb 2006 15:58:09 -0700
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

I wrote:
>    # I want to do ``dd[item] += 1``

Guido van Rossum wrote:
> You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1
> is perfectly fine there and hard to improve upon.

Alex Martelli wrote:
> I see d[k]+=1 as a substantial improvement -- conceptually more
> direct, "I've now seen one more k than I had seen before".

Guido van Rossum wrote:
> Yes, I now agree. This means that I'm withdrawing proposal A (new
> method) and championing only B (a subclass that implements
> __getitem__() calling on_missing() and on_missing() defined in that
> subclass as before, calling default_factory unless it's None).

Probably already obvious from my previous post, but FWIW, +1.

Two unaddressed issues:

* What module should hold the type?  I hope the collections module
isn't too controversial.

* Should default_factory be an argument to the constructor?  The three
answers I see:

  - "No."  I'm not a big fan of this answer.  Since the whole point of
creating a defaultdict type is to provide a default, requiring two
statements (the constructor call and the default_factory assignment)
to initialize such a dictionary seems a little inconvenient.
  - "Yes and it should be followed by all the normal dict constructor
arguments."  This is okay, but a few errors, like
``defaultdict({1:2})`` will pass silently (until you try to use the
dict, of course).
  - "Yes and it should be the only constructor argument."  This is my
favorite mainly because I think it's simple, and I couldn't think of
good examples where I really wanted to do ``defaultdict(list,
some_dict_or_iterable)`` or ``defaultdict(list,
**some_keyword_args)``.  It's also forward compatible if we need to
add some of the dict constructor args in later.

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From martin at  Tue Feb 21 00:00:43 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 00:00:43 +0100
Subject: [Python-Dev] Win64 AMD64 (aka x64) binaries available64
Message-ID: <>

I have now produced a snapshot of a Win64 build for AMD64
processors (also known as EM64T or x64); this is different
from IA-64 (which is also known as Itanium)...

Anyway, the binaries are

This is from today's trunk. If you have general remarks/discussion,
please post to python-dev. If you have specific bug reports, file
them on SF. Bug fixes are particularly welcome.

Known issues:
- _ssl.pyd is not built (I get linker errors)
- some of the tests fail (in some cases, due to bugs in the test suite)

If you want to build extensions for this build using distutils, you
need to
1. install the platform SDK (2003 SP1 should work)
2. open an AMD64 retail shell
3. run the included distutils

It might be possible to drop 2) some day, but finding the SDK from
the registry is really tricky.


From brett at  Tue Feb 21 00:04:57 2006
From: brett at (Brett Cannon)
Date: Mon, 20 Feb 2006 15:04:57 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Steven Bethard <steven.bethard at> wrote:
> I wrote:
> >    # I want to do ``dd[item] += 1``
> Guido van Rossum wrote:
> > You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1
> > is perfectly fine there and hard to improve upon.
> Alex Martelli wrote:
> > I see d[k]+=1 as a substantial improvement -- conceptually more
> > direct, "I've now seen one more k than I had seen before".
> Guido van Rossum wrote:
> > Yes, I now agree. This means that I'm withdrawing proposal A (new
> > method) and championing only B (a subclass that implements
> > __getitem__() calling on_missing() and on_missing() defined in that
> > subclass as before, calling default_factory unless it's None).
> Probably already obvious from my previous post, but FWIW, +1.
> Two unaddressed issues:
> * What module should hold the type?  I hope the collections module
> isn't too controversial.
> * Should default_factory be an argument to the constructor?  The three
> answers I see:
>   - "No."  I'm not a big fan of this answer.  Since the whole point of
> creating a defaultdict type is to provide a default, requiring two
> statements (the constructor call and the default_factory assignment)
> to initialize such a dictionary seems a little inconvenient.
>   - "Yes and it should be followed by all the normal dict constructor
> arguments."  This is okay, but a few errors, like
> ``defaultdict({1:2})`` will pass silently (until you try to use the
> dict, of course).
>   - "Yes and it should be the only constructor argument."  This is my
> favorite mainly because I think it's simple, and I couldn't think of
> good examples where I really wanted to do ``defaultdict(list,
> some_dict_or_iterable)`` or ``defaultdict(list,
> **some_keyword_args)``.  It's also forward compatible if we need to
> add some of the dict constructor args in later.

While #3 is my preferred solution as well, it does pose a Liskov
violation if this is a direct dict subclass instead of storing a dict
internally (can't remember the name of the design pattern that does
this).  But I think it is good to have the constructor be different
since it does also help drive home the point that this is not a
standard dict.


From dan.gass at  Tue Feb 21 00:08:07 2006
From: dan.gass at (Dan Gass)
Date: Mon, 20 Feb 2006 17:08:07 -0600
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <008101c6363b$ad0fc4e0$b83efea9@RaymondLaptop1>
References: <>
Message-ID: <>

On 2/20/06, Raymond Hettinger <raymond.hettinger at> wrote:
> An alternative is to have two possible attributes:
>   d.default_factory = list
> or
>   d.default_value = 0
> with an exception being raised when both are defined (the test is done
> when the
> attribute is created, not when the lookup is performed).
Why not have the factory function take the key being looked up as an
argument?  Seems like there would be uses to customize the default based on
the key.  It also forces list factory functions and constant factory
functions (amongst others) to be handled the same way:

d.default_factory = lambda k : list()
d.default_factory = lambda k : 0

Dan Gass

From python at  Tue Feb 21 00:14:13 2006
From: python at (Raymond Hettinger)
Date: Mon, 20 Feb 2006 18:14:13 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <><><><><>
Message-ID: <00c901c63673$631c8430$7600a8c0@RaymondLaptop1>

[Steven Bethard]
> * Should default_factory be an argument to the constructor?  The three
> answers I see:
>  - "No."  I'm not a big fan of this answer.  Since the whole point of
> creating a defaultdict type is to provide a default, requiring two
> statements (the constructor call and the default_factory assignment)
> to initialize such a dictionary seems a little inconvenient.

You still have to allow assignments to the default_factory attribute to allow 
the factory to be changed:

    dd.default_factory = SomeFactory

If it's too much effort to do the initial setup in two lines, a classmethod 
could serve as an alternate constructor (leaving the regular constructor fully 
interchangeable with dicts):

    dd = defaultdict.setup(list, {'k1':'v1', 'k2':'v2'})

or when there are no initial values:

    dd = defaultdict.setup(list)
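
A sketch of how such an alternate constructor could be written. `setup` is the name floated above; the `defaultdict` class below is a stand-in defined only for this example, not the type that was later added to `collections`:

```python
class defaultdict(dict):
    """Stand-in for illustration; the regular constructor matches dict's."""

    default_factory = None

    def __getitem__(self, key):
        if key not in self and self.default_factory is not None:
            self[key] = self.default_factory()
        return dict.__getitem__(self, key)

    @classmethod
    def setup(cls, factory, *args, **kwds):
        # Build via the ordinary dict signature, then attach the factory.
        dd = cls(*args, **kwds)
        dd.default_factory = factory
        return dd


dd = defaultdict.setup(list, {'k1': 'v1', 'k2': 'v2'})
dd['k3'].append('v3')
empty = defaultdict.setup(list)
```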


From steven.bethard at  Tue Feb 21 00:14:27 2006
From: steven.bethard at (Steven Bethard)
Date: Mon, 20 Feb 2006 16:14:27 -0700
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Dan Gass <dan.gass at> wrote:
> Why not have the factory function take the key being looked up as an
> argument?  Seems like there would be uses to customize the default based on
> the key.  It also forces list factory functions and constant factory
> functions (amongst others) to be handled the same way:
>  d.default_factory = lambda k : list()
>  d.default_factory = lambda k : 0

Guido's currently backing "a subclass that implements __getitem__()
calling on_missing() and on_missing() ... calling default_factory
unless it's None".  I think for 90% of the use-cases, you don't need a
key argument.  If you do, you should subclass defaultdict and override
the on_missing() method.
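
As a sketch of that suggestion: in the `collections.defaultdict` that eventually shipped, the hook is spelled `__missing__` rather than `on_missing`. The subclass name `KeyedDefaults` and the `"<key>"` default are made up for illustration:

```python
from collections import defaultdict


class KeyedDefaults(defaultdict):
    """Default value depends on the key being looked up."""

    def __missing__(self, key):
        # Build a default from the key, store it, and return it.
        value = self[key] = "<%s>" % key
        return value


d = KeyedDefaults()
label = d["title"]
```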

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From guido at  Tue Feb 21 00:17:56 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 15:17:56 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Brett Cannon <brett at> wrote:
> While #3 is my preferred solution as well, it does pose a Liskov
> violation if this is a direct dict subclass instead of storing a dict
> internally (can't remember the name of the design pattern that does
> this).  But I think it is good to have the constructor be different
> since it does also help drive home the point that this is not a
> standard dict.

I've heard this argument a few times now from different folks and I'm
tired of it. It's wrong. It's not true. It's a dead argument. It's
pushing up the daisies, so to speak.

Please stop abusing Barbara Liskov's name and remember that the
constructor signature is *not* part of the interface to an instance!
Changing the constructor signature in a subclass does *not* cause
*any* "Liskov" violations because the constructor is not called by
*users* of the object -- it is only called to *create* an object. As
the *user* of an object you're not allowed to *create* another
instance (unless the object provides an explicit API to do so, of
course, in which case you deal with that API's signature, not with the

--Guido van Rossum (home page:

From guido at  Tue Feb 21 00:23:36 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 15:23:36 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Dan Gass <dan.gass at> wrote:
> Why not have the factory function take the key being looked up as an
> argument?

This was considered and rejected already.

You can already customize based on the key by overriding on_missing()
[*]. If the factory were to take a key argument, we couldn't use list
or int as the factory function; we'd have to write lambda key: list().
There aren't that many use cases for having the factory function
depend on the key anyway; it's mostly on_missing() that needs the key
so it can insert the new value into the dict.

[*] Earlier in this thread I wrote that on_missing() could be inlined.
I take that back; I think it's better to have it be available
explicitly so you can override it without having to override
__getitem__(). This is faster, assuming most __getitem__() calls find
the key already inserted, and reduces the amount of code you have to
write to customize the behavior; it also reduces worries about how to
call the superclass __getitem__ method (catching KeyError *might*
catch an unrelated KeyError caused by a bug in the key's __hash__ or
__eq__ method).
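
The KeyError hazard mentioned in the footnote can be sketched as follows. `CatchingDict` and `BadKey` are hypothetical names, and the buggy `__hash__` is contrived for the demonstration:

```python
class CatchingDict(dict):
    """Naive approach: detect a missing key by catching KeyError."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Also swallows KeyErrors unrelated to a missing key.
            return "default"


class BadKey:
    def __hash__(self):
        # A bug inside __hash__ happens to surface as a KeyError.
        return {}["oops"]


d = CatchingDict()
masked = d[BadKey()]   # the bug is silently reported as "key missing"
```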

--Guido van Rossum (home page:

From greg.ewing at  Tue Feb 21 00:19:36 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 12:19:36 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> I see two alternatives.

Have you considered the third alternative that's been
mentioned -- a wrapper?

The issue of __contains__ etc. could be sidestepped by
not giving the wrapper a __contains__ method at all.
If you want to do an 'in' test you do it on the
underlying dict, and then the semantics are clear.
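
A minimal sketch of the wrapper idea, assuming a class name (`DefaultingView`) and constructor arguments that are not from the thread. Membership tests are done on the wrapped dict, never on the wrapper:

```python
class DefaultingView:
    """Hypothetical wrapper: supplies defaults, defines no __contains__."""

    def __init__(self, data, default_factory):
        self.data = data                      # the underlying plain dict
        self.default_factory = default_factory

    def __getitem__(self, key):
        if key not in self.data:
            self.data[key] = self.default_factory()
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value


counts = {}
view = DefaultingView(counts, int)
view["a"] += 1
seen_b = "b" in counts    # 'in' is asked of the underlying dict
```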


From fuzzyman at  Tue Feb 21 00:37:44 2006
From: fuzzyman at (Michael Foord)
Date: Mon, 20 Feb 2006 23:37:44 +0000
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <>	<dt94d8$bbd$>	<>	<dtarlp$4av$>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> Michael Foord wrote:
>> Has a machine been volunteered ?
> Not yet.
>> I have a spare machine and an always on connection. Would the 'right' 
>> development tools be needed ? (In the case of Microsoft they are a touch 
>> expensive I believe.)
> Any build process would do. I would prefer to see the official tools on
> the buildbot (i.e. VS.NET 2003), 
Man, that's a difficult (and expensive) piece of software to obtain, 
unless you're a student. I couldn't find a legal non-academic version 
for less than £100. I might hunt around though.

Shame. I suspect that hacking the free compilers to work would require 
more knowledge than I possess. Sorry.

> but anything else that can build Python
> and pass the test suite could do as well.
> One issue is that you also have to work with me on defining the build
> steps: what sequence of commands to send in what order. For Unix, that
> is easy; for Windows, not so.
Working with you wouldn't be a problem. Looks like the idea is 
currently a bit of a dead dog though.

All the best,

Michael Foord
> Regards,
> Martin

From rrr at  Tue Feb 21 00:40:19 2006
From: rrr at (Ron Adam)
Date: Mon, 20 Feb 2006 17:40:19 -0600
Subject: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival
 in	coordination with pep 349?]
In-Reply-To: <>
References: <>
	<>	<>	<>
	<>	<>
	<>	<>
Message-ID: <>

Bengt Richter wrote:
> On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters <thomas at> wrote:

> note what base64 really is for. Its essence is to create a _character_ sequence
> which can succeed in being encoded as ascii. The concept of base64 going str->str
> is really a mental shortcut for s_str.decode('base64').encode('ascii'), where
> 3 octets are decoded as code for 4 characters modulo padding logic.

Wouldn't it be...


This would probably also work...

    obj.encode('base64').decode('ascii')  ->  ascii alphabet in unicode

Where the underlying sequence might be ...

    obj -> bytes -> bytes:base64 -> base64 ascii character set

The point is to have the data in a safe to transmit form that can 
survive being encoded and decoded into different forms along the 
transmission path and still be restored at the final destination.

    base64 ascii character set -> bytes:base64 -> original bytes -> obj
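
The round trip above can be sketched in Python 3 terms. The sample string and the UTF-16 hop are arbitrary choices standing in for "different forms along the transmission path":

```python
import base64

obj = "caf\u00e9"                        # stand-in for any serialisable object
raw = obj.encode("utf-8")                # obj -> bytes
coded = base64.b64encode(raw)            # bytes -> bytes:base64
wire = coded.decode("ascii")             # base64 ASCII character set

# The characters can be re-encoded along the path and still be recovered.
received = wire.encode("utf-16").decode("utf-16")
restored = base64.b64decode(received.encode("ascii")).decode("utf-8")
```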

* a related note, derived from this and your other post in this thread.

If the str type constructor had an encode argument like the unicode type 
does, along with a str.encoded_with attribute.  Then it might be 
possible to deprecate the .decode() and .encode() methods and remove 
them from P3k entirely or use them as data coders/decoders instead of 
char type encoders.

It could also create a clear separation between character encodings and 
data coding.  The following should give an exception.

   str(str, 'rot13')

Rot13 isn't a character encoding, but a data coding method.

   data_str.encode('rot13')   # could be ok

But this wouldn't...

   new_str = data_str.encode('latin_1')    # could cause an exception

We'd have to use...

   new_str = str(data_str, 'latin_1')      # New string sub type...
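
For comparison, Python 3 ended up drawing a similar line, though not with the API sketched above: `str.encode()` accepts only character encodings, while text-to-text transforms such as rot13 go through `codecs.encode()`. A minimal illustration:

```python
import codecs

data_str = "Hello"

# rot13 is a data coding: str in, str out.
coded = codecs.encode(data_str, "rot13")

# A character encoding turns str into bytes.
latin = data_str.encode("latin_1")

# Asking str.encode() for a non-character codec is rejected.
try:
    data_str.encode("rot13")
    rejected = False
except LookupError:
    rejected = True
```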

    Ronald Adam

From jcarlson at  Tue Feb 21 00:51:00 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 20 Feb 2006 15:51:00 -0800
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <> <>
Message-ID: <>

Michael Foord <fuzzyman at> wrote:
> Martin v. Löwis wrote:
> > Any build process would do. I would prefer to see the official tools on
> > the buildbot (i.e. VS.NET 2003), 

I can get a free academic license for VS.NET 2003 professional with my
university (MSDNAA), and I've also got a Windows machine sitting in my
office with a few spare cycles.

> > One issue is that you also have to work with me on defining the build
> > steps: what sequence of commands to send in what order. For Unix, that
> > is easy; for Windows, not so.

If you're up for it, I'm up for it.  It'll take me a bit to get the
software on the machine.  Want me to ping you when I get the toolset

 - Josiah

From seojiwon at  Tue Feb 21 00:55:19 2006
From: seojiwon at (Jiwon Seo)
Date: Mon, 20 Feb 2006 15:55:19 -0800
Subject: [Python-Dev] problem with genexp
In-Reply-To: <>
References: <>
Message-ID: <>

Regarding this Grammar change;  (last October)
     from   argument: [test '=' ] test [gen_for]
     to      argument: test [gen_for] | test '=' test ['(' gen_for ')']

- to raise error for "bar(a = i for i in range(10))"

I think we should change it to
     argument: test [gen_for] | test '=' test

instead of
     argument: test [gen_for] | test '=' test ['(' gen_for ')']

that is, without ['(' gen_for ')'] . We don't need that extra term,
because "test" itself includes generator expressions - with all those
Actually, with that extra ['(' gen_for ')'], foo(a= 10 (for y in
'a')) is grammatically correct, although that error seems to be
checked elsewhere.
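
A quick way to check the behaviour under discussion on a current interpreter is to compile the snippets and see which ones are rejected. This is a sketch: `is_valid` is a helper defined here, and `foo` is never actually called.

```python
def is_valid(source):
    """Return True if `source` compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<test>", "eval")
        return True
    except SyntaxError:
        return False


results = {
    "bare genexp as sole argument":
        is_valid("foo(i for i in range(10))"),
    "keyword = unparenthesized genexp":
        is_valid("foo(a = i for i in range(10))"),
    "keyword = parenthesized genexp":
        is_valid("foo(a = (i for i in range(10)))"),
    "stray parenthesized gen_for":
        is_valid("foo(a = 10 (for y in 'a'))"),
}
```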

I tested without ['(' gen_for ')'] , and worked fine passing


On 10/20/05, Neal Norwitz <nnorwitz at> wrote:
> On 10/16/05, Neal Norwitz <nnorwitz at> wrote:
> > On 10/10/05, Neal Norwitz <nnorwitz at> wrote:
> > > There's a problem with genexp's that I think really needs to get
> > > fixed.  The details are below.  This
> > > code:
> > >
> > > >>> foo(a = i for i in range(10))
> > >
> > > I agree with the bug report that the code should either raise a
> > > SyntaxError or do the right thing.
> >
> > The change to Grammar/Grammar below seems to fix the problem and all
> > the tests pass.  Can anyone comment on whether this fix is
> > correct/appropriate?  Is there a better way to fix the problem?
> Since no one responded other than Jiwon, I checked in this change.  I
> did *not* backport it since what was syntactically correct in 2.4.2
> would raise an error in 2.4.3.  I'm not sure which is worse.  I'll
> leave it up to Anthony whether this should be backported.
> BTW, the change was the same regardless of old code vs. new AST code.
> n

From greg.ewing at  Tue Feb 21 01:00:32 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 13:00:32 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Brett Cannon wrote:

> While #3 is my preferred solution as well, it does pose a Liskov
> violation if this is a direct dict subclass

I'm not sure we should be too worried about that.
Inheritance in Python has always been more about
implementation than interface, so Liskov doesn't
really apply in the same way it does in statically
typed languages.

In other words, just because A inherits from B in
Python isn't meant to imply that an A is a drop-in
replacement for a B.


From trentm at  Tue Feb 21 01:17:45 2006
From: trentm at (Trent Mick)
Date: Mon, 20 Feb 2006 16:17:45 -0800
Subject: [Python-Dev] Win64 AMD64 (aka x64) binaries available64
In-Reply-To: <>
References: <>
Message-ID: <>

[Martin v. Loewis wrote]
> If you want to build extensions for this build using distutils, you
> need to
> ...
> 2. open an AMD64 retail shell
> ...
> It might be possible to drop 2) some day, but finding the SDK from
> the registry is really tricky.

Look for:
    def find_platform_sdk_dir()

That is the best code I know for doing that.


Trent Mick
TrentM at

From tdelaney at  Tue Feb 21 01:19:24 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Tue, 21 Feb 2006 11:19:24 +1100
Subject: [Python-Dev] defaultdict proposal round three
Message-ID: <>

Greg Ewing wrote:

> In other words, just because A inherits from B in
> Python isn't meant to imply that an A is a drop-in
> replacement for a B.

Hmm - this is interesting. I'm not arguing Liskov violations or anything

However, *because* Python uses duck typing, I tend to feel that
subclasses in Python *should* be drop-in replacements. If it's not a
drop-in replacement, then it should probably not subclass, but just use
duck typing (probably by wrapping).

Subclassing implies a stronger relationship to me. Which is why I think
I prefer using a wrapper for a default dict, rather than a subclass.
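
A minimal sketch of the wrapper alternative, assuming a hypothetical
class name `DefaultingWrapper` (nothing by that name was proposed in
the thread). It holds a dict instead of subclassing it, so it makes no
is-a claim.

```python
class DefaultingWrapper:
    """Hypothetical wrapper: holds a dict rather than subclassing it."""

    def __init__(self, data, default_factory):
        self.data = data
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return self.data[key]
        except KeyError:
            # Store the default so that later lookups see the same object.
            value = self.data[key] = self.default_factory()
            return value

    def __setitem__(self, key, value):
        self.data[key] = value


counts = DefaultingWrapper({}, int)
counts["spam"] += 1

is_a_dict = isinstance(counts, dict)   # the wrapper is not a dict
```

Only the two methods shown are forwarded; anything else (len(), 'in',
iteration) has to be done on `counts.data` or added by hand.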

Tim Delaney

From martin at  Tue Feb 21 01:36:57 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 01:36:57 +0100
Subject: [Python-Dev] buildbot is all green
In-Reply-To: <>
References: <> <>
Message-ID: <>

Josiah Carlson wrote:
> If you're up for it, I'm up for it.  It'll take me a bit to get the
> software on the machine.  Want me to ping you when I get the toolset
> installed?

Sure! That should work fine. It would be best if the buildbot would
run with the environment variables all set up, so that both svn.exe
and devenv.exe can be found in the path. Then I would need the sequence
of commands that the buildbot master should issue (svn update, build,
run tests, clean).


From martin at  Tue Feb 21 01:41:50 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 01:41:50 +0100
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Delaney, Timothy (Tim) wrote:
> However, *because* Python uses duck typing, I tend to feel that
> subclasses in Python *should* be drop-in replacements. If it's not a
> drop-in replacement, then it should probably not subclass, but just use
> duck typing (probably by wrapping).

Inheritance is more about code reuse than about polymorphism.


From rrr at  Tue Feb 21 01:40:45 2006
From: rrr at (Ron Adam)
Date: Mon, 20 Feb 2006 18:40:45 -0600
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was: PEP
 332	revival in coordination with pep 349?]]
In-Reply-To: <>
References: <>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Bengt Richter wrote:
> On Sat, 18 Feb 2006 09:59:38 +0100, "Martin v. Löwis" <martin at> wrote:

> Thinking about bytes recently, it occurs to me that bytes are really not intrinsically
> numeric in nature. They don't necessarily represent uint8's. E.g., a binary file is
> really a sequence of bit octets in its most primitive and abstract sense.

I agree, in that you would want to do different kinds of operations on
a single byte (an octet) than you would on a str or an integer.

Storing byte information as 16- or 32-bit ints could take up rather a
lot of memory in some cases.

I don't think it's been clarified yet whether the bytes() type would be
implemented in C, where it could be a single object with access to its
individual bytes via indexing, or as a Python list-type object which
stores integers, chars or some other byte-length object like octets.

My first impression is that it would be done in C with a way to access 
and change the actual bytes.  So a Python octet type wouldn't be needed. 
  But if it is implemented as a Python subclass of list or array, then 
an octet type would probably also be desired.
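
A rough illustration of the memory point, assuming CPython (where
sys.getsizeof is available) and using today's built-in bytearray as a
stand-in for the C-implemented option; it is not the type being
designed in this thread.

```python
import sys

# One compact object holding 1000 octets, mutable in place.
packed = bytearray(1000)
packed[0] = 0xFF          # indexing reads and writes individual bytes
first = packed[0]         # comes back as an int in range(256)

# The same number of values held as a list of int references.
unpacked = [0] * 1000

packed_size = sys.getsizeof(packed)
unpacked_size = sys.getsizeof(unpacked)  # list overhead only, not the ints
```

The exact sizes depend on the build, so none are quoted here; the
comparison is only meant to show the direction of the difference.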

> Bottom line thought: binary octets aren't numeric ;-)


    Ronald Adam

From martin at  Tue Feb 21 01:44:18 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 01:44:18 +0100
Subject: [Python-Dev] Win64 AMD64 (aka x64) binaries available64
In-Reply-To: <>
References: <>
Message-ID: <>

Trent Mick wrote:
> Look for:
>     def find_platform_sdk_dir()
> here:
> That is the best code I know for doing that.

Right; I was planning something similar (although I would probably
hard-code the 2003 SP1 registry key - it is not at all certain that
future SDK releases will use the same registry scheme, and Microsoft
has tricked users often enough in thinking they understood the
scheme, just to change it with the next release entirely).


From aleaxit at  Tue Feb 21 01:55:34 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 16:55:34 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 3:04 PM, Brett Cannon wrote:
>>   - "Yes and it should be the only constructor argument."  This is my
> While #3 is my preferred solution as well, it does pose a Liskov
> violation if this is a direct dict subclass instead of storing a dict

How so?  Liskov's principle is (in her own words):

If for each object o1 of type S there is an object o2 of type T such  
that for all programs P defined in terms of T, the behavior of P is  
unchanged when o1 is substituted for o2 then S is a subtype of T.

How can this ever be broken by the mere presence of incompatible  
signatures for T's and S's ctors?

I believe the principle, as stated above, was imperfectly stated, btw  
(it WAS preceded by "something like the following substitution  
property", indicating that Liskov was groping towards a good  
formulation), but that's an aside -- the point is that the principle  
is about substitution of _objects_, i.e., _instances_ of the types S  
and T, not about substitution of the _types_ themselves for each  
other. Instances exist and are supposed to satisfy their invariants  
_after_ ctors are done executing; ctor's signatures don't matter.

In Python, of course, you _could_ call type(o2)(...) and possibly get  
different behavior if that was changed into type(o1)(...) -- the  
curse of powerful introspection;-).  But then, isn't it trivial to  
obtain cases in which the behavior is NOT unchanged?  If it was  
always unchanged, what would be the point of ever subclassing?-)  Say  
that o2 is an int and o1 is a bool -- just a "print o2" already  
breaks the principle as stated (it's harder to get a simpler P than  

Unless you have explicitly documented invariants (such as "any 'print  
o' must emit 1+ digits followed by a newline" for integers), you  
cannot say that some alleged subclass is breaking Liskov's property,  
in general. Mere "change of behavior" in the most general case cannot  
qualify, if method overriding is to be any use; such change IS  
traditionally allowed as long as preconditions are looser and  
postconditions are stricter; and I believe that in any real-world
subclassing, with sufficient introspection you'll always find a
violation. E.g., a subtype IS allowed to add methods, by Liskov's
specific example; but then, len(dir(o1)) cannot fail to be a higher  
number than len(dir(o2)), from which you can easily construct a P  
which "changes behavior" for any definition you care to choose.   
E.g., pick constant N as the len(dir(...)) for instances of type T,  
and say that M>N is the len(dir(...)) for instances of S.  Well,  
then, math.sqrt(N-len(dir(o2))) is well defined -- but change o2 into  
o1, and since N-M is <0, you'll get an exception.
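
The construction above can be sketched directly. The classes T and S
and the function `program` are hypothetical stand-ins chosen to match
the names in the text.

```python
import math


class T:
    def value(self):
        return 1


class S(T):
    def extra(self):  # an added method, which subtyping normally permits
        return 2


N = len(dir(T()))  # attribute count for instances of T
M = len(dir(S()))  # strictly larger, because S adds `extra`


def program(obj):
    """A program P 'defined in terms of T' that relies on introspection."""
    return math.sqrt(N - len(dir(obj)))


result_for_t = program(T())  # sqrt(0)

try:
    program(S())
    broke = False
except ValueError:  # math.sqrt rejects negative input
    broke = True

# The simpler int/bool case: the printed form already differs.
print_differs = str(True) != str(1)
```

Nothing about S is unusual here; the "violation" comes entirely from P
inspecting its argument.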

If you can give an introspection-free example showing how Liskov  
substitution would be broken by a mere change to incompatible  
signature in the ctor, I'll be grateful; but I don't think it can be  


From python at  Tue Feb 21 02:05:33 2006
From: python at (Raymond Hettinger)
Date: Mon, 20 Feb 2006 20:05:33 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <><><><>
Message-ID: <013801c63682$eb39a5f0$7600a8c0@RaymondLaptop1>

>> I see d[k]+=1 as a substantial improvement -- conceptually more
>> direct, "I've now seen one more k than I had seen before".

> Yes, I now agree. This means that I'm withdrawing proposal A (new
> method) and championing only B (a subclass that implements
> __getitem__() calling on_missing() and on_missing() defined in that
> subclass as before, calling default_factory unless it's None). I don't
> think this crisis is big enough to need *two* solutions, and this
> example shows B's superiority over A.

FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5.
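
For comparison, a small sketch of the d[k] += 1 idiom next to the
plain-dict spelling, using collections.defaultdict as it exists in
current Python. The word list is made up for illustration.

```python
from collections import defaultdict

words = "the quick brown fox jumps over the lazy dog the end".split()

# Plain dict: the missing-key case has to be spelled out.
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# defaultdict: int() supplies 0 for a key seen for the first time.
counts = defaultdict(int)
for w in words:
    counts[w] += 1
```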


From aahz at  Tue Feb 21 02:11:48 2006
From: aahz at (Aahz)
Date: Mon, 20 Feb 2006 17:11:48 -0800
Subject: [Python-Dev]  buildbot vs. Windows
In-Reply-To: <>
References: <> <>
Message-ID: <>

If you're willing to commit to running a buildbot, and the only thing
preventing you is shelling out $$$ to Microsoft, send me e-mail.  I'll
compile a list to send to the PSF and we'll either poke Microsoft to
provide some more free licenses or pay for it ourselves.  This is what
the PSF is for!

Note the emphasis on the word "commit", please.  I'm setting an arbitrary
deadline of Saturday Feb 25 so I don't have to monitor indefinitely.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From aleaxit at  Tue Feb 21 02:46:06 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 17:46:06 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <013801c63682$eb39a5f0$7600a8c0@RaymondLaptop1>
References: <><><><>
Message-ID: <>

On Feb 20, 2006, at 5:05 PM, Raymond Hettinger wrote:

> [Alex]
>>> I see d[k]+=1 as a substantial improvement -- conceptually more
>>> direct, "I've now seen one more k than I had seen before".
> [Guido]
>> Yes, I now agree. This means that I'm withdrawing proposal A (new
>> method) and championing only B (a subclass that implements
>> __getitem__() calling on_missing() and on_missing() defined in that
>> subclass as before, calling default_factory unless it's None). I  
>> don't
>> think this crisis is big enough to need *two* solutions, and this
>> example shows B's superiority over A.
> FWIW, I'm happy with the proposal and think it is a nice addition  
> to Py2.5.

OK, sounds great to me.  collections.defaultdict, then?


From crutcher at  Tue Feb 21 02:57:30 2006
From: crutcher at (Crutcher Dunnavant)
Date: Mon, 20 Feb 2006 17:57:30 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <007701c63666$aaf98080$7600a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

in two ways:

1) dict.get doesn't work for object dicts or in exec/eval contexts, and
2) dict.get requires me to generate the default value even if I'm not
going to use it, a process which may be expensive.
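
Point 2 can be sketched as follows. `expensive_default` and
`LazyDefault` are hypothetical names; the `__missing__` hook is the
spelling dict subclasses support in released Python, used here as an
assumption about how "give me a value without saving it" might look.
Point 1 (object dicts, exec/eval) is not covered.

```python
calls = []


def expensive_default():
    calls.append(1)  # record every evaluation
    return "computed"


d = {"present": "stored"}

# dict.get evaluates its default argument before the lookup happens.
value = d.get("present", expensive_default())


class LazyDefault(dict):
    """Hypothetical dict whose d[key] computes a default but does not save it."""

    def __missing__(self, key):
        return expensive_default()


lazy = LazyDefault(present="stored")
hit = lazy["present"]   # no call to expensive_default
miss = lazy["absent"]   # one call; nothing is stored
```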

On 2/20/06, Raymond Hettinger <python at> wrote:
> [Crutcher Dunnavant ]
> >> There are many times that I want d[key] to give me a value even when
> >> it isn't defined, but that doesn't always mean I want to _save_ that
> >> value in the dict.
> How does that differ from the existing dict.get method?
> Raymond

Crutcher Dunnavant <crutcher at>

From guido at  Tue Feb 21 03:03:34 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 18:03:34 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Alex Martelli <aleaxit at> wrote:
> > [Alex]
> >>> I see d[k]+=1 as a substantial improvement -- conceptually more
> >>> direct, "I've now seen one more k than I had seen before".
> >
> > [Guido]
> >> Yes, I now agree. This means that I'm withdrawing proposal A (new
> >> method) and championing only B (a subclass that implements
> >> __getitem__() calling on_missing() and on_missing() defined in that
> >> subclass as before, calling default_factory unless it's None). I don't
> >> think this crisis is big enough to need *two* solutions, and this
> >> example shows B's superiority over A.

> > FWIW, I'm happy with the proposal and think it is a nice addition
> > to Py2.5.

> OK, sounds great to me.  collections.defaultdict, then?

I have a patch ready that implements this. I've assigned it to Raymond
for review. I'm just reusing the same SF patch as before:

One subtlety: for maximul flexibility and speed, the standard dict
type now defines an on_missing(key) method; however this version
*just* raises KeyError and the implementation actually doesn't call it
unless the class is a subtype (with the possibility of overriding

collections.defaultdict overrides on_missing(key) to insert and return
self.fefault_factory() if it is not empty; otherwise it raises
KeyError. (It should really call the base class on_missing() but I
figured I'd just in-line it which is easier to code in C than a

The defaultdict signature takes an optional positional argument which
is the default_factory, defaulting to None. The remaining positional
and all keyword arguments are passed to the dict constructor. IOW:

  d = defaultdict(list, [(1, 2)])

is equivalent to:

 d = defaultdict()
 d.default_factory = list
 d.update([(1, 2)])

At this point, repr(d) will be:

  defaultdict(<type 'list'>, {1: 2})

Once Raymond approves the patch I'll check it in.

--Guido van Rossum (home page:

From guido at  Tue Feb 21 03:06:13 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 18:06:13 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Guido van Rossum <guido at> wrote:
> [stuff with typos]

Here's the proofread version:

I have a patch ready that implements this. I've assigned it to Raymond
for review. I'm just reusing the same SF patch as before: .

One subtlety: for maximal flexibility and speed, the standard dict
type now defines an on_missing(key) method; however this version
*just* raises KeyError and the implementation actually doesn't call it
unless the class is a subtype (with the possibility of overriding

collections.defaultdict overrides on_missing(key) to insert and return
self.default_factory() if it is not None; otherwise it raises
KeyError. (It should really call the base class on_missing() but I
figured I'd just in-line it which is easier to code in C than a

The defaultdict signature takes an optional positional argument which
is the default_factory, defaulting to None. The remaining positional
and all keyword arguments are passed to the dict constructor. IOW:

 d = defaultdict(list, [(1, 2)])

is equivalent to:

 d = defaultdict()
 d.default_factory = list
 d.update([(1, 2)])

At this point, repr(d) will be:

 defaultdict(<type 'list'>, {1: 2})

Once Raymond approves the patch I'll check it in.
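
The equivalence described above can be checked against
collections.defaultdict as it exists in current Python. This is a
sketch rather than a test of the patch itself; note that the hook
shipped under the name __missing__ rather than on_missing, and that
the repr differs between Python 2 and 3.

```python
from collections import defaultdict

d1 = defaultdict(list, [(1, 2)])

d2 = defaultdict()          # default_factory is None
d2.default_factory = list
d2.update([(1, 2)])

same = (d1 == d2) and (d1.default_factory is d2.default_factory)

d1[3].append("x")           # missing key: list() is inserted, then returned

no_factory = defaultdict()
try:
    no_factory["missing"]
    raised = False
except KeyError:            # with default_factory None it acts like a dict
    raised = True
```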

--Guido van Rossum (home page:

From tdelaney at  Tue Feb 21 03:10:21 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Tue, 21 Feb 2006 13:10:21 +1100
Subject: [Python-Dev] defaultdict proposal round three
Message-ID: <>

"Martin v. Löwis" wrote:

> Delaney, Timothy (Tim) wrote:
>> However, *because* Python uses duck typing, I tend to feel that
>> subclasses in Python *should* be drop-in replacements. If it's not a
>> drop-in replacement, then it should probably not subclass, but just
>> use duck typing (probably by wrapping).
> Inheritance is more about code reuse than about polymorphism.

Oh - it's definitely no hard-and-fast rule. However, I have found that *usually* people (including myself) only subclass when they want an is-a relationship, whereas duck typing is behaves-like.

In any case, Guido has produced a patch, and the tone of his message sounded like a Pronouncement ...

Tim Delaney

From guido at  Tue Feb 21 03:12:50 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 18:12:50 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Greg Ewing <greg.ewing at> wrote:
> Have you considered the third alternative that's been
> mentioned -- a wrapper?

I don't like that at all. It's quite tricky to implement a fully
transparent wrapper that supports all the special methods (__setitem__
etc.). It will be slower. And it will be more cumbersome to use.

> The issue of __contains__ etc. could be sidestepped by
> not giving the wrapper a __contains__ method at all.
> If you want to do an 'in' test you do it on the
> underlying dict, and then the semantics are clear.

The semantics of defaultdict are crystal clear. __contains__(), keys()
 and friends represent the *actual*, *current* keys. Only
__getitem__() calls on_missing() when the key is not present; being a
"hook", on_missing() can do whatever it wants.

What's the practical use case for not wanting __contains__() to
function? All I hear is fear of theoretical bugs.
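
A short sketch of those semantics, using collections.defaultdict as it
exists in current Python (where the hook is spelled __missing__):

```python
from collections import defaultdict

d = defaultdict(list)

before = "k" in d           # False: no lookup has happened
via_get = d.get("k")        # None: get() does not trigger the hook
still_absent = "k" in d     # False
value = d["k"]              # __getitem__ invokes the missing-key hook
after = "k" in d            # True: the default was inserted
keys = list(d.keys())
```

Membership tests and keys() only ever report what is actually stored;
the one operation that can add a key is the subscript lookup.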

--Guido van Rossum (home page:

From guido at  Tue Feb 21 03:48:19 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 18:48:19 -0800
Subject: [Python-Dev] readline compilarion fails on OSX
Message-ID: <>

On OSX (10.4.4) the readline module in the svn HEAD fails compilation
as follows. This is particularly strange since the buildbot is green
for OSX... What could be up with this?

building 'readline' extension
gcc -fno-strict-aliasing -Wno-long-double -no-cpp-precomp
-mno-fused-madd -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -I.
-I/Users/guido/projects/python/trunk/./Mac/Include -I../Include -I.
-I/usr/local/include -I/Users/guido/projects/python/trunk/Include
-I/Users/guido/projects/python/trunk/osx -c
/Users/guido/projects/python/trunk/Modules/readline.c -o
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:112: warning:
implicit declaration of function 'history_truncate_file'
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:301: warning:
implicit declaration of function 'remove_history'
/Users/guido/projects/python/trunk/Modules/readline.c:301: warning:
assignment makes pointer from integer without a cast
/Users/guido/projects/python/trunk/Modules/readline.c:310: warning:
passing argument 1 of 'free' discards qualifiers from pointer target
/Users/guido/projects/python/trunk/Modules/readline.c:312: warning:
passing argument 1 of 'free' discards qualifiers from pointer target
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:338: warning:
implicit declaration of function 'replace_history_entry'
/Users/guido/projects/python/trunk/Modules/readline.c:338: warning:
assignment makes pointer from integer without a cast
/Users/guido/projects/python/trunk/Modules/readline.c:347: warning:
passing argument 1 of 'free' discards qualifiers from pointer target
/Users/guido/projects/python/trunk/Modules/readline.c:349: warning:
passing argument 1 of 'free' discards qualifiers from pointer target
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:453: error:
'HISTORY_STATE' undeclared (first use in this function)
/Users/guido/projects/python/trunk/Modules/readline.c:453: error:
(Each undeclared identifier is reported only once
/Users/guido/projects/python/trunk/Modules/readline.c:453: error: for
each function it appears in.)
/Users/guido/projects/python/trunk/Modules/readline.c:453: error:
'hist_st' undeclared (first use in this function)
/Users/guido/projects/python/trunk/Modules/readline.c:455: warning:
implicit declaration of function 'history_get_history_state'
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:503: warning:
implicit declaration of function 'rl_insert_text'
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:637: error:
'rl_attempted_completion_over' undeclared (first use in this function)
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:675: warning:
passing argument 2 of 'completion_matches' from incompatible pointer
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:700: warning:
passing argument 2 of 'rl_bind_key_in_map' from incompatible pointer
/Users/guido/projects/python/trunk/Modules/readline.c:701: warning:
passing argument 2 of 'rl_bind_key_in_map' from incompatible pointer
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:758: warning:
passing argument 2 of 'rl_callback_handler_install' from incompatible
pointer type
/Users/guido/projects/python/trunk/Modules/readline.c:788: warning:
implicit declaration of function 'rl_free_line_state'
/Users/guido/projects/python/trunk/Modules/readline.c:789: warning:
implicit declaration of function 'rl_cleanup_after_signal'
/Users/guido/projects/python/trunk/Modules/readline.c: In function
/Users/guido/projects/python/trunk/Modules/readline.c:883: error:
'HISTORY_STATE' undeclared (first use in this function)
/Users/guido/projects/python/trunk/Modules/readline.c:883: error:
'state' undeclared (first use in this function)
/Users/guido/projects/python/trunk/Modules/readline.c:885: warning:
assignment discards qualifiers from pointer target type

(Yes, the keynote slides are coming along just fine... :-)

--Guido van Rossum (home page:

From bob at  Tue Feb 21 04:04:08 2006
From: bob at (Bob Ippolito)
Date: Mon, 20 Feb 2006 19:04:08 -0800
Subject: [Python-Dev] readline compilarion fails on OSX
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 6:48 PM, Guido van Rossum wrote:

> On OSX (10.4.4) the readline module in the svn HEAD fails compilation
> as follows. This is particularly strange since the buildbot is green
> for OSX... What could be up with this?
> building 'readline' extension
-lots of build junk-

In Apple's quest to make our lives harder, they installed BSD libedit  
and symlinked it to readline.  Python doesn't like that.  The  
buildbot might have a real readline installation, or maybe the  
buildbot is skipping those tests.

You'll need to install a real libreadline if you want it to work.

I've also put together a little tarball that'll build  
statically, and there's pre-built eggs for OS X so the easy_install  
should be quick:


From guido at  Tue Feb 21 04:18:24 2006
From: guido at (Guido van Rossum)
Date: Mon, 20 Feb 2006 19:18:24 -0800
Subject: [Python-Dev] readline compilarion fails on OSX
In-Reply-To: <>
References: <>
Message-ID: <>

Thanks! That worked.

But shouldn't we try to fix to detect this situation instead
of making loud clattering noises?
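
As a runtime analogue of that detection (not the build-time check
being asked about), the module docstring can be inspected. This is a
sketch; `uses_libedit` is a hypothetical helper, and it assumes the
docstring mentions "libedit" when the module is linked against it.

```python
def uses_libedit(doc):
    """Guess from the module docstring whether readline is backed by libedit."""
    return "libedit" in (doc or "")


try:
    import readline
except ImportError:          # e.g. platforms without the module
    backend = "unavailable"
else:
    backend = "libedit" if uses_libedit(readline.__doc__) else "GNU readline"

print(backend)
```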


On 2/20/06, Bob Ippolito <bob at> wrote:
> On Feb 20, 2006, at 6:48 PM, Guido van Rossum wrote:
> > On OSX (10.4.4) the readline module in the svn HEAD fails compilation
> > as follows. This is particularly strange since the buildbot is green
> > for OSX... What could be up with this?
> >
> > building 'readline' extension
> -lots of build junk-
> In Apple's quest to make our lives harder, they installed BSD libedit
> and symlinked it to readline.  Python doesn't like that.  The
> buildbot might have a real readline installation, or maybe the
> buildbot is skipping those tests.
> You'll need to install a real libreadline if you want it to work.
> I've also put together a little tarball that'll build
> statically, and there's pre-built eggs for OS X so the easy_install
> should be quick:
> -bob

--Guido van Rossum (home page:

From tim.peters at  Tue Feb 21 04:24:59 2006
From: tim.peters at (Tim Peters)
Date: Mon, 20 Feb 2006 22:24:59 -0500
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

> If you're willing to commit to running a buildbot, and the only thing
> preventing you is shelling out $$$ to Microsoft, send me e-mail.  I'll
> compile a list to send to the PSF and we'll either poke Microsoft to
> provide some more free licenses or pay for it ourselves.  This is what
> the PSF is for!

Speaking as a PSF director, I might not vote for that :-)  Fact is
I've been keeping the build & tests 100% healthy on WinXP Pro, and
that requires more than just running the tests (it also requires
repairing compiler warnings and Unixisms).

Speaking of which, a number of test failures over the past few weeks
were provoked here only under -r (run tests in random order) or under
a debug build, and didn't look like those were specific to Windows. 
Adding -r to the buildbot test recipe is a decent idea.  Getting
_some_ debug-build test runs would also be good (or do we do that

Anyway, since XP Pro is effectively covered, I'd be keener to see a
Windows buildbot running under a different flavor of Windows.  I
expect I'll eventually volunteer my home box to run an XP buildbot,
but am in no hurry (and probably won't leave any machine here turned
on 24/7 regardless).

From rhamph at  Tue Feb 21 04:55:22 2006
From: rhamph at (Adam Olsen)
Date: Mon, 20 Feb 2006 20:55:22 -0700
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Jim Jewett <jimjjewett at> wrote:
> Adam Olsen asked:
> > ... d.getorset(key, func) would work in your use cases?
> It is an improvement over setdefault, because it doesn't
> always evaluate the expensive func.  (But why should every
> call have to pass in the function, when it is a property of
> the dictionary?)

Because usually it's a property of how you use it, not a property of
the dictionary.  The dictionary is just a generic storage mechanism.

> [snip]
> In other words, the program would work correctly if I passed
> in a normal but huge dictionary; I want to avoid that for reasons
> of efficiency.  This isn't the only use for a mapping, but it is
> the only one I've seen where KeyError is "expected" by the
> program's normal flow.

Looking at your explanation, I agree, getorset is useless for that use case.

However, I'm beginning to think we shouldn't be comparing them.
defaultdict is a powerful but heavyweight option, intended for
complicated behavior.  getorset and setdefault are intended to be
very lightweight, even lighter than the "try/except KeyError" and "if
key not in X: X[key] = default" memes we have right now.  getorset's
factory function is only appropriate for preexisting functions, not
user defined ones.

Essentially, I believe getorset should be discussed on its own merits,
independent of the addition of a defaultdict class.  Perhaps
discussion of it (and the deprecation of setdefault) should wait until
after defaultdict has been completed?
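
For concreteness, a sketch of what getorset might look like next to
setdefault. It is written as a free function because dict has no such
method; the name, argument order and `make_list` are all hypothetical.

```python
def getorset(d, key, factory):
    """Hypothetical helper: return d[key], calling factory() only if absent."""
    try:
        return d[key]
    except KeyError:
        value = d[key] = factory()
        return value


built = []


def make_list():
    built.append(1)          # count how many lists get constructed
    return []


groups = {}
getorset(groups, "a", make_list).append(1)
getorset(groups, "a", make_list).append(2)    # factory not called again

eager = {}
eager.setdefault("a", make_list()).append(1)
eager.setdefault("a", make_list()).append(2)  # builds a list it then discards
```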

Adam Olsen, aka Rhamphoryncus

From barry at  Tue Feb 21 05:11:26 2006
From: barry at (Barry Warsaw)
Date: Mon, 20 Feb 2006 23:11:26 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 2006-02-18 at 12:53 +0100, Pierre Barbier de Reuille wrote:

> >     Guido> Over lunch with Alex Martelli, he proposed that a subclass of
> >     Guido> dict with this behavior (but implemented in C) would be a good
> >     Guido> addition to the language.

I agree that .setdefault() is a well-intentioned failure, although I'm
much less concerned about any potential performance impact than the fact
that it's completely unreadable.  And while I like the basic idea, I
also agree that deriving from dict is problematic, both because the
constructor signature is tough to forward, and because dict is such
a fundamental type that APIs that return dicts may have to be changed to
allow passing in a factory type.

I'd rather like to see what Pierre proposes, with a few minor

> Well, first so as not to break the current interface, and second because I think it
> reads better, I would prefer:
>       d = {'a': 1}
>       d['b']              # raises KeyError
>       d.get('c')          # evaluates to None
>       d.default = 42
>       d['b']              # evaluates to 42
>       d.get('c')          # evaluates to 42

So far so good.

> And to undo the default, you can simply do :
>       del d.default

Although this I'm not crazy about.  If you let .default be a callable,
you could also write this as

def keyerror(): raise KeyError
d.default = keyerror

or possibly just this as a shortcut:

d.default = KeyError

> > The only question in my mind is whether or not getting a non-existent value
> > under the influence of a given default value should stick that value in the
> > dictionary or not.

Agreed.  I'm not sure whether .get(onearg) should return None
or .default.  I /think/ I want the latter, but I'd have to play with
some real code to know for sure.
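
A rough sketch of the callable-.default variant, assuming a
hypothetical dict subclass named `DefaultingDict`. It leaves .get()
untouched, and a default obtained this way is returned but not
stored, which is one possible answer to the question quoted above.

```python
class DefaultingDict(dict):
    """Hypothetical dict with an optional, callable `default` attribute."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        factory = getattr(self, "default", None)
        if factory is None:
            raise KeyError(key)
        return factory()  # returned, not stored


def keyerror():
    raise KeyError


d = DefaultingDict({"a": 1})
d.default = lambda: 42
answer = d["b"]

d.default = keyerror        # "undo" by installing a raising callable
try:
    d["b"]
    undone = False
except KeyError:
    undone = True

del d.default               # Pierre's spelling of the same undo
```

In this sketch the `d.default = KeyError` shortcut would need special
handling, since calling KeyError() returns an exception instance
instead of raising it.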



From bob at  Tue Feb 21 05:27:22 2006
From: bob at (Bob Ippolito)
Date: Mon, 20 Feb 2006 20:27:22 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 20, 2006, at 7:25 PM, Stephen J. Turnbull wrote:

>>>>>> "Martin" == Martin v Löwis <martin at> writes:
>     Martin> Please do take a look. It is the only way: If you were to
>     Martin> embed base64 *bytes* into character data content of an XML
>     Martin> element, the resulting XML file might not be well-formed
>     Martin> anymore (if the encoding of the XML file is not an ASCII
>     Martin> superencoding).
> Excuse me, I've been doing category theory recently.  By "embedding" I
> mean a map from an intermediate object which is a stream of bytes to
> the corresponding stream of characters.  In the case of UTF-16-coded
> characters, this would necessarily imply a representation change, as
> you say.
> What I advocate for Python is to require that the standard base64
> codec be defined only on bytes, and always produce bytes.  Any
> representation change should be done explicitly.  This is surely
> conformant with RFC 2045's definition and with RFC 3548.
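
The bytes-in, bytes-out behaviour advocated above can be sketched with
the base64 module as it exists in current Python 3; the UTF-16 step
stands in for embedding the result in a non-ASCII-compatible document.

```python
import base64

raw = b"\x00\xff"
encoded = base64.b64encode(raw)       # bytes in, bytes out

# The representation change to characters is a separate, explicit step.
text = encoded.decode("ascii")

# Embedding in a UTF-16 document re-encodes those characters.
utf16 = text.encode("utf-16-le")

# Going back requires undoing both steps in order.
roundtrip = base64.b64decode(utf16.decode("utf-16-le").encode("ascii"))
```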



From at  Tue Feb 21 05:29:53 2006
From: at (Almann T. Goo)
Date: Mon, 20 Feb 2006 23:29:53 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
Message-ID: <>

I am considering developing a PEP for enabling a mechanism to assign to free
variables in a closure (nested function).  My rationale is that with the
advent of PEP 227, Python has proper nested lexical scopes, but can have
undesirable behavior (especially for new developers) when a user wants to
make an assignment to a free variable within a nested function.
Furthermore, the numerous kludges I have seen that "solve" the problem by
using a mutable object, like a list, as the free variable do not seem
"Pythonic."  I have also seen mention that the use of classes can mitigate
this, but that seems, IMHO, heavy-handed in cases where an elegant solution
using a closure would suffice and be more appropriate--especially when
Python already has nested lexical scopes.
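
A minimal sketch of the problem and of the mutable-object kludge,
reusing the incgen example; the function names `incgen_naive` and
`incgen_kludge` are made up for illustration.

```python
def incgen_naive(inc=1):
    a = 6

    def incrementer():
        a += inc        # assignment makes `a` local to incrementer
        return a

    return incrementer


def incgen_kludge(inc=1):
    a = [6]             # mutable cell standing in for the free variable

    def incrementer():
        a[0] += inc     # mutates the list; `a` itself is never rebound
        return a[0]

    return incrementer


try:
    incgen_naive()()
    failed = False
except UnboundLocalError:
    failed = True

step = incgen_kludge()
results = [step(), step()]
```

Reading `inc` works in both versions because it is only referenced;
the failure is specific to rebinding `a`.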

I propose two possible approaches to solve this issue:

1.  Adding a keyword such as "use" that would follow similar semantics to
"global" today.  A nested scope could declare names with this keyword to
enable assignment to such names to change the closest parent's binding.  The
semantic would be to keep the behavior we experience today but tell the
compiler/interpreter that a name declared with the "use" keyword would
explicitly use an enclosing scope.  I personally like this approach the most
since it would seem to be in keeping with the current way the language works
and would probably be the most backwards compatible.  The semantics for how
this interacts with the global scope would also need to be defined (should
"use" be equivalent to a global when no name exists in any parent scope, etc.)

def incgen( inc = 1 ) :
  a = 6
  def incrementer() :
    use a
    #use a, inc <-- list of names okay too
    a += inc
    return a
  return incrementer

Of course, this approach suffers from a downside that every nested scope
that wanted to assign to a parent scope's name would need to have the "use"
keyword for those names--but one could argue that this is in keeping with
one of Python's philosophies that "Explicit is better than implicit"
(PEP 20).
This approach also has to deal with a user declaring a name with "use" that
is a named parameter--this would be a semantic error that could be handled
like "global" does today with a SyntaxError.

2.  Adding a keyword such as "scope" that would behave similarly to
JavaScript's "var" keyword.  A name could be declared with such a keyword
optionally and all nested scopes would use the declaring scope's binding
when accessing or assigning to a particular name.  This approach has similar
benefits to my first approach, but is clearly more top-down than the first
approach.  Subsequent "scope" declarations would create a new binding at the
declaring scope for the declaring and child scopes to use.  This could
potentially be a gotcha for users expecting the binding semantics in place
today.  Also the scope keyword would have to be allowed to be used on
parameters to allow such parameter names to be used in a similar fashion in
a child scope.

def incgen( inc = 1 ) :
  #scope inc <-- allow scope declaration for bound parameters (not a big
  #fan of this)
  scope a = 6
  def incrementer() :
    a += inc
    return a
  return incrementer

This approach would be similar to languages like JavaScript that allow for
explicit scope binding with the use of "var" or more static languages that
allow re-declaring names at lower scopes.  I am less in favor of this,
because I don't think it feels very "Pythonic".

As a point of reference, some languages such as Ruby will only bind a new
name to a scope on assignment when an enclosing scope does not have the name
bound.  I do believe the Python name binding semantics have issues (for
which the "global" keyword was born), but I feel that "fixing" the
Python semantic to a more "Ruby-like" one adds as many problems as it solves
since the "Ruby-like" one is just as implicit in nature.  Not to mention the
backwards compatibility impact is probably much larger.

I would like the community's opinion on whether enough people out there think
this would be a worthwhile endeavour--or whether there is already an
initiative that I missed.  Please let me know your questions and comments.

Best Regards,

Almann T. Goo at

From barry at  Tue Feb 21 05:29:59 2006
From: barry at (Barry Warsaw)
Date: Mon, 20 Feb 2006 23:29:59 -0500
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, 2006-02-17 at 11:09 -0800, Guido van Rossum wrote:

> Thanks for all the constructive feedback. Here are some responses and
> a new proposal.
> - Yes, I'd like to kill setdefault() in 3.0 if not sooner.

A worthy goal, but not possible unless you want to break existing code.
I don't think it's worth a DeprecationWarning either.  Slating it for
removal in 3.0 seems fine.

Everything else about your proposal seems great.



From skip at  Tue Feb 21 05:34:04 2006
From: skip at (skip at
Date: Mon, 20 Feb 2006 22:34:04 -0600
Subject: [Python-Dev] readline compilarion fails on OSX
In-Reply-To: <>
References: <>
Message-ID: <>

    Guido> But shouldn't we try to fix to detect this situation
    Guido> instead of making loud clattering noises?

Here's a first-cut try at a patch:

Unfortunately, I don't think distutils provides a clean way to detect
symbols the way configure does, so it's a bit clumsy...


From jcarlson at  Tue Feb 21 05:44:01 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 20 Feb 2006 20:44:01 -0800
Subject: [Python-Dev] Proposal: defaultdict
In-Reply-To: <>
References: <>
Message-ID: <>

"Adam Olsen" <rhamph at> wrote:
> However, I'm beginning to think we shouldn't be comparing them.
> defaultdict is a powerful but heavyweight option, intended for
> complicated behavior.

Check out Guido's patch.  It's not that "heavyweight", and its intended
behavior is to make some operations *more* intuitive, if not a bit
faster in some cases.

Whether or not getorset is introduced, I don't much care, as defaultdict
will cover every use case I've been using setdefault for, as well as
most of my use cases for get.

 - Josiah

From tim.peters at  Tue Feb 21 05:44:20 2006
From: tim.peters at (Tim Peters)
Date: Mon, 20 Feb 2006 23:44:20 -0500
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

> ...
> What's the practical use case for not wanting __contains__() to
> function?

I don't know.  I have practical use cases for wanting __contains__()
to function, but there's been no call for those.  For an example,
think of any real use ;-)

For example, I often use dicts to represent multisets, where a key
maps to a strictly positive count of the number of times that key
appears in the multiset.  A default of 0 is the right thing to return
for a key not in the multiset, so that M[k] += 1 works to add another
k to multiset M regardless of whether k was already present.

I sure hope I can implement multiset intersection as, e.g.,

def minter(a, b):
    if len(b) < len(a): # make `a` the smaller, and iterate over it
        a, b = b, a
    result = defaultdict defaulting to 0, however that's spelled
    for k in a:
        if k in b:
            result[k] = min(a[k], b[k])
    return result

Replacing the loop nest with:

    for k in a:
        result[k] = min(a[k], b[k])

would be semantically correct so far as it goes, but pragmatically
wrong:  I maintain my "strictly positive count" invariant because
consuming RAM to hold elements "that aren't there" can be a pragmatic
disaster.  (When `k` is in `a` but not in `b`, I don't want `k` to be
stored in `result`)
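
For reference, here is a runnable sketch of the same function, assuming the
spelling that later shipped in Python 2.5 (collections.defaultdict, with the
factory passed as the first constructor argument).  The sample multisets x
and y are made up for illustration:

```python
from collections import defaultdict

def minter(a, b):
    """Multiset intersection; keys map to strictly positive counts."""
    if len(b) < len(a):  # make `a` the smaller, and iterate over it
        a, b = b, a
    result = defaultdict(int)
    for k in a:
        if k in b:  # a membership test does not invoke the default factory
            result[k] = min(a[k], b[k])
    return result

x = defaultdict(int)
for ch in "aab":
    x[ch] += 1
y = defaultdict(int)
for ch in "abbc":
    y[ch] += 1

common = minter(x, y)
print(dict(common))
print("c" in x)  # testing membership stored nothing in x
```

Because __contains__() reports only keys that are really present, the
"strictly positive count" invariant survives the intersection.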

I have other examples, but they come so easily it's better to leave
that an exercise for the reader.

From jcarlson at  Tue Feb 21 06:01:01 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 20 Feb 2006 21:01:01 -0800
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

"Almann T. Goo" < at> wrote:
> I would like the community's opinion if there is enough out there that think
> this would be a worthwhile endeavour--or if there is already an initiative
> that I missed.  Please let me know your questions, comments.


Mechanisms which rely on manipulating variables within closures or
nested scopes to function properly can be elegant, but I've not yet seen
one that *really* is. You state that using classes can be "heavy handed",
but one of the major uses of classes is as a *namespace*. Many desired
uses of closures (including the various uses you have outlined) are to
hide a *namespace*, and combining closures with classes can offer
that to you, without requiring a language change.  Of course using
classes directly with a bit of work can offer you everything you want
from a closure, with all of the explicitness that you could ever want.

As an aside, you mention both 'use' and 'scope' as possible keyword
additions for various uses of nested scopes.  In my experience, when one
goes beyond 3 or so levels of nested scopes (methods of a class defined
within a class namespace, or perhaps methods of a class defined within a
method of a class), it starts getting to the point where the programmer
is trying to be too clever.

 - Josiah

From at  Tue Feb 21 07:09:38 2006
From: at (Almann T. Goo)
Date: Tue, 21 Feb 2006 01:09:38 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

> Mechanisms which rely on manipulating variables within closures or
> nested scopes to function properly can be elegant, but I've not yet seen
> one that *really* is.

This really isn't a case for or against what I'm proposing since we
can already do this in today's Python with mutable variables in an
enclosing scope (see below).  I am proposing a language change to help
make closures more orthogonal to the scoping constructs that are
already in place for the global scope.

> You state that using classes can be "heavy handed",
> but one of the major uses of classes is as a *namespace*. Many desired
> uses of closures (including the various uses you have outlined)
> is to hide a *namespace*, and combining both closures with classes can offer
> that to you, without requiring a language change.

Closures are also used in more functional styles of programming for
defining customized control structures (those Ruby folks like them for
this purpose).  Granted, you can do this with classes/objects and by
defining interfaces, though the end result can be somewhat un-natural for
some problems.  I don't want to get into an argument between closures
vs. objects, since that is not what my proposal is aimed at and Python
already has both.

> Of course using
> classes directly with a bit of work can offer you everything you want
> from a closure, with all of the explicitness that you could ever want.

Really, the easiest way to emulate what I want in today's Python is to
create a mutable object (like a dict or list) in the enclosing scope
to work around the semantic that the first assignment in a local scope
binds a new name.  Doing this seems rather un-natural and forcing the
use of classes doesn't seem more natural.

def incgen( inc = 1 ) :
  env = [ 6 ]
  def incrementor() :
    env[ 0 ] += inc
    return env[ 0 ]
  return incrementor

This is a workaround for something a developer cannot do more
naturally today.  I do not think using some combination of classes and
closures makes things clearer--it is still working around what I would
construe as the non-orthogonal nature of nested lexical scopes in
Python since the language provides a construct to deal with the
problem for global variables.

a = 6
def incgen( inc = 1 ) :
  def incrementor() :
    global a
    a += inc
    return a
  return incrementor

Granted this is a somewhat trivial example, but I think it
demonstrates my point about how nested lexical scopes are second class
(since the language has no equivalent construct for them) and don't
behave like the global scope.
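
For later readers: Python 3 eventually added a "nonlocal" statement (PEP
3104), whose semantics are close to the "use" keyword discussed in this
thread.  A minimal sketch, assuming a Python 3 interpreter (the statement is
a syntax error in Python 2):

```python
def incgen(inc=1):
    a = 6
    def incrementer():
        nonlocal a  # rebind `a` in the enclosing incgen() scope
        a += inc
        return a
    return incrementer

counter = incgen()
first, second = counter(), counter()
other = incgen(10)()  # each incgen() call gets its own `a`
print(first, second, other)
```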

> As an aside, you mention both 'use' and 'scope' as possible keyword
> additions for various uses of nested scopes.  In my experience, when one
> goes beyond 3 or so levels of nested scopes (methods of a class defined
> within a class namespace, or perhaps methods of a class defined within a
> method of a class), it starts getting to the point where the programmer
> is trying to be too clever.

Even though I may agree with you on this, your argument is more of an
argument against PEP 227 than what I am proposing.  Again, today's
Python already allows a developer to have deep nested scopes.


Almann T. Goo at

From bokr at  Tue Feb 21 07:43:00 2006
From: bokr at (Bengt Richter)
Date: Tue, 21 Feb 2006 06:43:00 GMT
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <>

On Mon, 20 Feb 2006 11:09:48 -0800, Alex Martelli <aleaxit at> wrote:

>On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:
>> [GvR]
>>> I'm not convinced by the argument
>>> that __contains__ should always return True
>> Me either.  I cannot think of a more useless behavior or one more likely
>> to have unexpected consequences.  Besides, as Josiah pointed out, it is
>> much easier for a subclass override to substitute always True return
>> values than vice-versa.
>Agreed on all counts.
>> I prefer this approach over subclassing.  The mental load from an
>> additional method is less than the load from a separate type (even a
>> subclass).   Also, avoidance of invariant issues is a big plus.  Besides,
>> if this allows setdefault() to be deprecated, it becomes an all-around win.
>I'd love to remove setdefault in 3.0 -- but I don't think it can be
>done before that: default_factory won't cover the occasional use
>cases where setdefault is called with different defaults at different
>locations, and, rare as those cases may be, any 2.* should not break
>any existing code that uses that approach.
>>> - Even if the default_factory were passed to the constructor, it still
>>> ought to be a writable attribute so it can be introspected and
>>> modified. A defaultdict that can't change its default factory after
>>> its creation is less useful.
>> Right!  My preference is to have default_factory not passed to the
>> constructor, so we are left with just one way to do it.  But that is a nit.
How about doing it as an expression, empowering ( ;-) the dict just after creation?
E.g., for

    d = dict()
    d.default_factory = list

you could write

    d = dict()**list

I made a hack to illustrate functionality (code at end).
DD simulates the new dict without defaults.

 >>> d = DD(a=1)
 >>> d
 {'a': 1}
So d is the plain dict with no default action enabled

 >>> ddl = DD()**list
 >>> ddl
 DD({} <= list)

This is a new dict with list default factory

 >>> ddl[42]
 []
Beats the heck out of ddl.setdefault(42, [])

 >>> ddl[42].append(1)
 >>> ddl[42].append(2)
 >>> ddl
 DD({42: [1, 2]} <= list)

Now take the non-default dict d and make an int default wrapper
 >>> ddi = d**int
 >>> ddi
 DD({'a': 1} <= int)

Show there's no default on the orig: 
 >>> d['b']+=1
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 KeyError: 'b'

But use the wrapper proxy:
 >>> ddi['b']+=1
 >>> ddi
 DD({'a': 1, 'b': 1} <= int)
 >>> ddi['b']+=1
 >>> ddi
 DD({'a': 1, 'b': 2} <= int)

Note that augassign works. And info is visible in d:
 >>> d
 {'a': 1, 'b': 2}

probably unusual use, but a one-off

    d.setdefault('S', set()).add(42)

can be written

 >>> (d**set)['S'].add(42)
 >>> d
 {'a': 1, 'S': set([42]), 'b': 2}

i.e., d**different_factory_value creates a temporary d-accessing proxy
with default_factory set to different_factory_value, without affecting
other bindings of d unless you rebind them with the expression result.

I haven't implemented a check for compatible types on mixed defaults.
e.g. the integer-default proxy will show 'S', but note:

 >>> ddi['S']
 set([42])
 >>> ddi['S'] += 5
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: unsupported operand type(s) for +=: 'set' and 'int'

I guess the programmer deserves it ;-)

You can get a new defaulting proxy from an existing one, as it will
use the same base plain dict:

 >>> ddd = ddi**dict
 >>> ddd
 DD({'a': 1, 'S': set([42]), 'b': 2, 'd': 0} <= dict)
 >>> ddd['adict'].update(check=1, this=2)
 >>> ddd
 DD({'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0} <= dict)
 >>> d
 {'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0}

Not sure what the C implementation ramifications would be, but it makes
setdefault easy to spell. And using both modes interchangeably is easy.

And stuff like

 >>> d = DD()**int
 >>> for c in open('').read(): d[c]+=1
 >>> print sorted(d.items(), key=lambda t:t[1])[-5:]
 [('f', 50), ('t', 52), ('_', 71), ('e', 74), (' ', 499)]

Is nice ;-)

 >>> len(d)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: len() of unsized object

 >>> len(d.keys())
 >>> len(open('').read())
 >>> sum(d.values())

>No big deal either way, but I see "passing the default factory to the
>ctor" as the "one obvious way to do it", so I'd rather have it (be it
>with a subclass or a classmethod-alternate constructor). I won't weep
>bitter tears if this drops out, though.
>>> - It would be unwise to have a default value that would be called if
>>> it was callable: what if I wanted the default to be a class instance
>>> that happens to have a __call__ method for unrelated reasons?
>>> Callability is an elusive property; APIs should not attempt to
>>> dynamically decide whether an argument is callable or not.
>> That makes sense, though it seems over-the-top to need a zero-factory
>> for a multiset.
>But int is a convenient zero-factory.
Aha, good one. I didn't think of that one^H^H^Hzero ;-)

I used it in the examples above ;-)

Here is the code (be kind ;-)
----< >-----------------------------------------------
class DD(dict):
    def __pow__(self, factory):
        # d**factory returns a proxy that shares d's storage
        class proxy(object):
            def __init__(self, dct, factory):
                self._d = dct
                self._f = factory
            def __getattribute__(self, attr):
                if attr in ('_d', '_f'):
                    return object.__getattribute__(self, attr)
                else:
                    # everything else is delegated to the wrapped dict
                    _d = object.__getattribute__(self, '_d')
                    return object.__getattribute__(_d, attr)
            def __getitem__(self, k):
                if k in self._d:
                    v = self._d[k]
                elif self._f:
                    # missing key: store and return a fresh default
                    v = self._d[k] = self._f()
                else:
                    raise KeyError(repr(k))
                return v
            def __setitem__(self, i, v): self._d[i]=v
            def __delitem__(self, i): del self._d[i]
            def __repr__(self):
                if self._f:
                    return 'DD(%r <= %s)'%(self._d, self._f.__name__)
                else:
                    return dict.__repr__(self._d)
            def __pow__(self, fct):
                # a new proxy over the same base dict
                return type(self)(self._d, fct)
        return proxy(self, factory)

Bengt Richter

From bokr at  Tue Feb 21 07:53:41 2006
From: bokr at (Bengt Richter)
Date: Tue, 21 Feb 2006 06:53:41 GMT
Subject: [Python-Dev] Memory Error the right error for coding cookie
	promise violation?
Message-ID: <>

Perhaps a more informative message would be nice.
Here's an easy way to trigger it:

 >>> compile("#-*- coding: ascii -*-\nprint 'ab%c'\n"%0x80, '','exec')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 MemoryError

Bengt Richter

From jcarlson at  Tue Feb 21 08:03:08 2006
From: jcarlson at (Josiah Carlson)
Date: Mon, 20 Feb 2006 23:03:08 -0800
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

"Almann T. Goo" < at> wrote:
> > Mechanisms which rely on manipulating variables within closures or
> > nested scopes to function properly can be elegant, but I've not yet seen
> > one that *really* is.
> This really isn't a case for or against what I'm proposing since we
> can already do this in today's Python with mutable variables in an
> enclosing scope (see below).  I am proposing a language change to help
> make closures more orthogonal to the scoping constructs that are
> already in place for the global scope.

Actually, it is.  Introducing these two new keywords is equivalent to
encouraging nested scope use.  Right now nested scope use is "limited" or
"fraught with gotchas".  Adding the 'use' and 'scope' keywords to label
levels of scopes for name resolution will only encourage users to write
closures which could have been written better or not written at all (see some
of my later examples).  Users who had been using closures to solve
real problems "elegantly" likely have not been affected by the current
state of affairs, so arguably may not gain much in 'use' and 'scope'.

> > You state that using classes can be "heavy handed",
> > but one of the major uses of classes is as a *namespace*. Many desired
> > uses of closures (including the various uses you have outlined)
> > is to hide a *namespace*, and combining both closures with classes can offer
> > that to you, without requiring a language change.
> Closures are also used in more functional styles of programming for
> defining customized control structures (those Ruby folks like them for
> this purpose).

Except that Python does not offer user-defined control structures, so
this is not a Python use-case.

> > Of course using
> > classes directly with a bit of work can offer you everything you want
> > from a closure, with all of the explicitness that you could ever want.
> Really, the easiest way to emulate what I want in today's Python is to
> create a mutable object (like a dict or list) in the enclosing scope
> to work around the semantic that the first assignment in a local scope
> binds a new name.  Doing this seems rather un-natural and forcing the
> use of classes doesn't seem more natural
> def incgen( inc = 1 ) :
>   env = [ 6 ]
>   def incrementor() :
>     env[ 0 ] += inc
>     return env[ 0 ]
>   return incrementor

Indeed, there are other "more natural" ways of doing that right now.

    #for inc=1 cases
    from itertools import count as incgen

    #for limited-range but arbitrary integer inc cases:
    from sys import maxint
    def incgen(env=6, inc=1):
        return iter(xrange(env, (-maxint-1, maxint)[inc>0], inc)).next

Or if you want to get fancier, a generator factory works quite well.

    def mycount(start, inc):
        while 1:
            yield start
            start += inc

    def incgen(env=6, inc=1):
        return mycount(env, inc).next

All of which I find clearer than the closure example... but this isn't a
discussion on how to create counters, it's a discussion about the use of
closures and nested scopes, or more specifically, Python's lack of
orthogonality on lexically nested scopes.  Which brings up a question:
what is your actual use-case for nested scopes and closures which makes
the current "use a mutable or class" awkward?  I would like to see a
non-toy example of its use which would not be clearer through the use of
a class, and which is nontrivially hampered by the current state of
Python's nested scopes and name resolution.
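
The counters above rely on Python 2 idioms (xrange, sys.maxint and the .next
method).  A rough Python 3 rendering of the same idea, offered only as a
sketch and assuming itertools.count() with its optional step argument:

```python
from itertools import count

def incgen(env=6, inc=1):
    # count(start, step) yields start, start + step, ...; its bound
    # __next__ method plays the role of `.next` in Python 2.
    return count(env, inc).__next__

step = incgen()
values = [step(), step(), step()]
print(values)
```

Like the generator versions above, this yields the starting value first,
whereas the closure version increments before returning.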

 - Josiah

From aleaxit at  Tue Feb 21 08:15:00 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 20 Feb 2006 23:15:00 -0800
Subject: [Python-Dev] Memory Error the right error for coding cookie
	promise violation?
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 21, 2006, at 6:53 AM, Bengt Richter wrote:

> Perhaps a more informative message would be nice.
> Here's an easy way to trigger it:
>  >>> compile("#-*- coding: ascii -*-\nprint 'ab%c'\n"%0x80, '','exec')
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
>  MemoryError

Definitely looks like a bug, please open a bug report for it.



From rasky at  Tue Feb 21 08:51:03 2006
From: rasky at (Giovanni Bajo)
Date: Tue, 21 Feb 2006 08:51:03 +0100
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <074c01c636bb$96669f90$09b92997@bagio>

Raymond Hettinger wrote:

>> - It would be unwise to have a default value that would be called if
>> it was callable: what if I wanted the default to be a class instance
>> that happens to have a __call__ method for unrelated reasons?
>> Callability is an elusive property; APIs should not attempt to
>> dynamically decide whether an argument is callable or not.
> That makes sense, though it seems over-the-top to need a zero-factory
> for a multiset.
> An alternative is to have two possible attributes:
>   d.default_factory = list
> or
>   d.default_value = 0
> with an exception being raised when both are defined (the test is
> done when the attribute is created, not when the lookup is performed).

What does this buy over just doing:

d.default_factory = lambda: 0

which is also totally unambiguous wrt the semantic of usage of the default
value (copy vs deepcopy vs whatever)? Given that most of the default values
I have ever wanted to use do not even require a lambda (list, set, int come to

Giovanni Bajo
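
As a sketch of the point about factories, assuming collections.defaultdict
as it later shipped in Python 2.5 (the variable names are made up): a
zero-argument callable covers the constant-default case, and it is called
afresh for every missing key, so no copy-versus-deepcopy question arises.

```python
from collections import defaultdict

counts_a = defaultdict(lambda: 0)
counts_b = defaultdict(int)  # int() also returns 0, no lambda needed
for word in ["x", "y", "x"]:
    counts_a[word] += 1
    counts_b[word] += 1

groups = defaultdict(list)  # the factory runs once per missing key
groups["p"].append(1)
groups["q"].append(2)

print(dict(counts_a), dict(counts_b), dict(groups))
```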

From nnorwitz at  Tue Feb 21 09:09:12 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 21 Feb 2006 00:09:12 -0800
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/20/06, Tim Peters <tim.peters at> wrote:
> Speaking as a PSF director, I might not vote for that :-)  Fact is
> I've been keeping the build & tests 100% healthy on WinXP Pro, and
> that requires more than just running the tests (it also requires
> repairing compiler warnings and Unixisms).

These are some ways we need buildbot to help us more.  IMO compiler
warnings should generate emails from buildbot.  We would need to
filter out a bunch, but it would be desirable to know about warnings
on different architectures.  Unfortunately, there are a ton of
warnings on OS X right now.

> Adding -r to the buildbot test recipe is a decent idea.  Getting
> _some_ debug-build test runs would also be good (or do we do that
> already?).

Buildbot runs "make testall" which does not run the tests in random order.

There's nothing to prevent buildbot from making debug builds, though
that is not currently done.  The builds I run on the x86 box every 12
hours *do* use debug builds (Misc/  The results are here:

I also recently switched the email to go to python-checkins, though
there haven't been any failures yet (unless they are sitting in a spam
queue).  There are some hangs (like right now):

Thread 1:

Lib/ (204): wait
Lib/ (543): join
Lib/ (637): __exitfunc
Lib/ (25): _run_exitfuncs

Thread 2:

Lib/ (170): accept
Lib/ (373): get_request
Lib/ (218): handle_request
Lib/test/ (33): serve_a_few
Lib/test/ (82): run
Lib/ (445): __bootstrap

I've seen test_socketserver fail before, this could be due to running
2 tests simultaneously.

> Anyway, since XP Pro is effectively covered, I'd be keener to see a
> Windows buildbot running under a different flavor of Windows.



From nnorwitz at  Tue Feb 21 09:28:06 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 21 Feb 2006 00:28:06 -0800
Subject: [Python-Dev] Memory Error the right error for coding cookie
	promise violation?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Bengt Richter <bokr at> wrote:
> Perhaps a more informative message would be nice.
> Here's an easy way to trigger it:
>  >>> compile("#-*- coding: ascii -*-\nprint 'ab%c'\n"%0x80, '','exec')
>  Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
>  MemoryError

This was fixed in 2.5, but looks like it wasn't backported.  I don't
recall exactly why. -- n

Python 2.5a0 (trunk:42526M, Feb 20 2006, 16:00:48)
>>> compile("#-*- coding: ascii -*-\nprint 'ab%c'\n"%0x80, '','exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 0
SyntaxError: unknown encoding: ascii

From jeff at  Tue Feb 21 09:57:27 2006
From: jeff at (Jeff Rush)
Date: Tue, 21 Feb 2006 02:57:27 -0600
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:

> I'd say that the parties interested in non-Unicode versions of
> Python should maintain these branches of Python. Ditto for other
> stripped down versions.

I understand where you're coming from but the embedded market I encounter
tends to focus on the hardware side.  If they can get a marketing star by
grabbing Python off-the-shelf, tweaking the build and producing something to
include with their product, they will. But if they have to maintain a
branch, they'll just go with the de facto C API most such devices use.

> Note that this does not mean that we should forget about memory
> consumption issues. It's just that if there's only marginal
> interest in certain special builds of Python, I don't see the
> requirement for the Python core developers to maintain them.

These requirements of customization may not be a strong case for today but 
could be impacting future growth of the language in certain sectors.  I'm a 
rabid Python evangelist and always try to push Python into more nooks and
crannies of the marketplace, similar to how the Linux kernel is available 
from the tiniest machines to the largest iron.  If the focus of Python is to 
be strictly a desktop, conventional (mostly ;-) language, restricting its 
adaptability to other less interesting environments may be a reasonable 
tradeoff to improve its maintainability.  But adaptability, especially when 
you don't fully grok where or how it will be used, can also be a competitive 


From ronaldoussoren at  Tue Feb 21 10:10:16 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Tue, 21 Feb 2006 10:10:16 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 21-feb-2006, at 9:09, Neal Norwitz wrote:

> Unfortunately, there are a ton of
> warnings on OS X right now.

How many of those do you see when you ignore the warnings you get
while building the Carbon extensions? Those extensions wrap loads of
deprecated functions, each of which will give a warning.


From nnorwitz at  Tue Feb 21 10:19:03 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 21 Feb 2006 01:19:03 -0800
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/21/06, Ronald Oussoren <ronaldoussoren at> wrote:
> On 21-feb-2006, at 9:09, Neal Norwitz wrote:
> > Unfortunately, there are a ton of
> > warnings on OS X right now.
> How many of those do you see when you ignore the warnings you get
> while building the Carbon extensions? Those extensions wrap loads of
> deprecated functions, each of which will give a warning.


Most but not all of the warnings are due to Carbon AFAICT.  I'd like
to fix those that are important, but it's so far down on the priority
list. :-(


From ronaldoussoren at  Tue Feb 21 10:26:42 2006
From: ronaldoussoren at (Ronald Oussoren)
Date: Tue, 21 Feb 2006 10:26:42 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 21-feb-2006, at 10:19, Neal Norwitz wrote:

> On 2/21/06, Ronald Oussoren <ronaldoussoren at> wrote:
>> On 21-feb-2006, at 9:09, Neal Norwitz wrote:
>>> Unfortunately, there are a ton of
>>> warnings on OS X right now.
>> How many of those do you see when you ignore the warnings you get
>> while building the Carbon extensions? Those extensions wrap loads of
>> deprecated functions, each of which will give a warning.
> Right:
> 138/step-compile/0
> Most but not all of the warnings are due to Carbon AFAICT.  I'd like
> to fix those that are important, but it's so far down on the priority
> list. :-(

I'm working with Bob I. on a universal binary build of python 2.4.  
Some of our patches fix warnings like the ones for _CFmodule.c. I'll  
be starting with submitting the less controversial patches once the  
universal build is mostly ready, which should be any day now.


> n

From jeff at  Tue Feb 21 09:57:27 2006
From: jeff at (Jeff Rush)
Date: Tue, 21 Feb 2006 02:57:27 -0600
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

M.-A. Lemburg wrote:

> I'd say that the parties interested in non-Unicode versions of
> Python should maintain these branches of Python. Dito for other
> stripped down versions.

I understand where you're coming from but the embedded market I encounter 
tends to focus on the hardware side.  If they can get a marketing star by 
grabbing Python off-the shelf, tweak the build and produce something to 
include with their product, they will. But if they have to maintain a 
branch, they'll just go with the defacto C API most such devices use.

> Note that this does not mean that we should forget about memory
> consumption issues. It's just that if there's only marginal
> interest in certain special builds of Python, I don't see the
> requirement for the Python core developers to maintain them.

These requirements of customization may not be a strong case for today but
could be impacting future growth of the language in certain sectors.  I'm a
rabid Python evangelist and always try to push Python into more nooks and
crannies of the marketplace, similar to how the Linux kernel is available
from the tiniest machines to the largest iron.  If the focus of Python is to
be strictly a desktop, conventional (mostly ;-) language, restricting its
adaptability to other less interesting environments may be a reasonable
tradeoff to improve its maintainability.  But adaptability, especially when
you don't fully grok where or how it will be used, can also be a competitive
advantage.


From mal at  Tue Feb 21 10:36:29 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 21 Feb 2006 10:36:29 +0100
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Note that this does not mean that we should forget about memory
>> consumption issues. It's just that if there's only marginal
>> interest in certain special builds of Python, I don't see the
>> requirement for the Python core developers to maintain them.
> Well, the cost of Unicode support is not so much in the algorithmic
> part, but in the tables that come along with it. AFAICT, everything
> but unicodectype is optional; that is 5KiB of code and 20KiB of data
> on x86. Actually, the size of the code *does* matter, at a second
> glance. Here are the largest object files in the Python code base
> on my system (not counting dynamic modules):
>    text    data     bss     dec     hex filename
>    4845   19968       0   24813    60ed Objects/unicodectype.o
>   22633    2432     352   25417    6349 Objects/listobject.o
>   29259    1412     152   30823    7867 Objects/classobject.o
>   20696   11488       4   32188    7dbc Python/bltinmodule.o
>   33579     740       0   34319    860f Objects/longobject.o
>   34119      16     288   34423    8677 Python/ceval.o
>   35179    2796       0   37975    9457 Modules/_sre.o
>   26539   15820     416   42775    a717 Modules/posixmodule.o
>   35283    8800    1056   45139    b053 Objects/stringobject.o
>   50360       0      28   50388    c4d4 Python/compile.o
>   68455    4624     440   73519   11f2f Objects/typeobject.o
>   69993    9316    1196   80505   13a79 Objects/unicodeobject.o
> So it appears that dropping Unicode support can indeed provide
> some savings.
> For reference, we also have an option to drop complex numbers:
>    9654     692       4   10350    286e Objects/complexobject.o

So why not drop that as well ?

Note that I'm not saying that these switches are useless - of
course they do allow stripping down the Python interpreter.
I believe that only very few people are interested in having these
options and it's fair enough to put the burden of maintaining these
branches on them.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 21 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From mal at  Tue Feb 21 10:43:42 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 21 Feb 2006 10:43:42 +0100
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>
	<> <>
Message-ID: <>

Jeff Rush wrote:
> M.-A. Lemburg wrote:
>> I'd say that the parties interested in non-Unicode versions of
>> Python should maintain these branches of Python. Ditto for other
>> stripped down versions.
> I understand where you're coming from but the embedded market I
> encounter tends to focus on the hardware side.  If they can get a
> marketing star by grabbing Python off-the-shelf, tweaking the build and
> producing something to include with their product, they will. But if they
> have to maintain a branch, they'll just go with the de facto C API most
> such devices use.
>> Note that this does not mean that we should forget about memory
>> consumption issues. It's just that if there's only marginal
>> interest in certain special builds of Python, I don't see the
>> requirement for the Python core developers to maintain them.
> These requirements of customization may not be a strong case for today
> but could be impacting future growth of the language in certain
> sectors.  I'm a rabid Python evangelist and always try to push Python
> into more nooks and crannies of the marketplace, similar to how the
> Linux kernel is available from the tiniest machines to the largest
> iron.  If the focus of Python is to be strictly a desktop, conventional
> (mostly ;-) language, restricting its adaptability to other less
> interesting environments may be a reasonable tradeoff to improve its
> maintainability.  But adaptability, especially when you don't fully grok
> where or how it will be used, can also be a competitive advantage.

I don't think this is a strong enough case to warrant having
to maintain a separate branch of the Python core.

Even platforms like Palm nowadays have enough RAM to cope with
the 100kB or so that Unicode support adds.

Marc-Andre Lemburg


From greg.ewing at  Tue Feb 21 10:50:17 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:50:17 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Delaney, Timothy (Tim) wrote:

> However, *because* Python uses duck typing, I tend to feel that
> subclasses in Python *should* be drop-in replacements.

Duck-typing means that the only reliable way to
assess whether two types are sufficiently compatible
for some purpose is to consult the documentation --
you can't just look at the base class list.

I think this should work both ways. It should be
okay to *not* document autodict as being a subclass
of dict, even if it happens to be implemented that
way.

I've adopted a convention like this in PyGUI,
where I document the classes in terms of a
conceptual interface hierarchy, without promising
that they will be implemented that way.


From greg.ewing at  Tue Feb 21 10:50:56 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:50:56 +1300
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was: PEP
 332	revival in coordination with pep 349?]]
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Ron Adam wrote:

> Storing byte information as 16 or 32 bits ints could take up a rather 
> lot of memory in some cases.

I don't quite see the point here. Inside a bytes object,
they would be stored 1 byte per byte. Nobody is suggesting
that they would take up more than that just because
a_bytes_object[i] happens to return an int.

So the only reason to introduce a new "byte" type is to
remove some of the operations that int has. We can already
do bitwise operations on an int, so we don't need a new
type to add that capability.
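
As a rough illustration with hindsight (a bytes type did eventually
land in Python 3, after this thread; the variable names below are
made up for the example): indexing hands back a plain int while the
storage stays at one byte per element, and wrap-around arithmetic can
be had by masking rather than by a dedicated byte type.

```python
# Python 3 sketch; `data`, `masked` and `wrapped` are illustrative names.
data = bytes([0x10, 0xFF, 0x7F])

# Indexing yields an ordinary int, not a special "byte" type.
first = data[0]

# The object holds three bytes, whatever type indexing returns.
length = len(data)

# Bitwise operations work on the returned int directly.
masked = data[1] & 0x0F

# Wrap-around (mod 256) arithmetic is a mask away.
wrapped = (data[1] + 1) & 0xFF
```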

What's more, I can see this leading to people asking for
arithmetic operations to be *added* to the byte type so
they can do wrap-around arithmetic, and then for 16-bit,
32-bit, 64-bit etc. versions of it, etc. etc.

Do we really want to get onto that slope?


From greg.ewing at  Tue Feb 21 10:51:20 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:51:20 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> It's quite tricky to implement a fully
> transparent wrapper that supports all the special
> methods (__setitem__ etc.).

I was thinking the wrapper would only be a means of
filling the dict -- it wouldn't even pretend to
implement the full dict interface. The only method
it would really need to have is __getitem__.

> The semantics of defaultdict are crystal clear. __contains__(), keys()
>  and friends represent the *actual*, *current* keys.

If you're happy with that, then I am too. I was
never particularly attached to the wrapper idea --
I just mentioned it as a possible alternative.

Just one more thing -- have you made a final decision
about the name yet? I'd still prefer something like
'autodict', because to me 'defaultdict' suggests
a type that just returns default values without modifying
the dict. Maybe it should be reserved for some possible
future type that behaves that way.

Also, considering the intended use cases (accumulation,
etc.) it seems more accurate to think of the value
produced by the factory as an 'initial value' rather
than a 'default value', and I'd prefer to see it
described that way in the docs. If that is done,
having 'default' in the name wouldn't be so appropriate.
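
A small sketch of the behaviour under discussion, using the
defaultdict that later shipped in the collections module (the word
list and variable names are only for illustration). It shows the
accumulation use case, and that a lookup with [] stores the factory's
value, so keys() and "in" report only keys actually present.

```python
from collections import defaultdict

# Accumulation: the factory supplies the initial value (0) on first use.
counts = defaultdict(int)
for word in ["spam", "eggs", "spam"]:
    counts[word] += 1

# Membership reflects the actual, current keys.
has_ham_before = "ham" in counts

# A [] lookup on a missing key calls the factory and stores the result...
value = counts["ham"]

# ...so the key exists afterwards. (counts.get("ham") would not insert.)
has_ham_after = "ham" in counts
```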


From greg.ewing at  Tue Feb 21 10:57:31 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:57:31 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> What I advocate for Python is to require that the standard base64
> codec be defined only on bytes, and always produce bytes.

I don't understand that. It seems quite clear to me that
base64 encoding (in the general sense of encoding, not the
unicode sense) takes binary data (bytes) and produces characters.
That's the whole point of base64 -- so you can send arbitrary
data over a channel that is only capable of dealing with
characters.

So in Py3k the correct usage would be

                   base64            unicode
                   encode            encode(x)
   original bytes --------> unicode ---------> bytes for transmission
                  <--------         <---------
                   base64            unicode
                   decode            decode(x)

where x is whatever unicode encoding the transmission
channel uses for characters (probably ascii or an ascii
superset, but not necessarily).

So, however it's spelled, the typing is such that

    base64_encode(bytes) --> unicode


    base64_decode(unicode) --> bytes
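
A minimal sketch of that typing in Python 3 terms. Note the
assumptions: the standard base64 module as it exists today returns
bytes from b64encode(), so the two wrapper functions below are
hypothetical helpers that add the bytes <-> text step described
above, and "utf-16" merely stands in for whatever encoding x the
channel uses.

```python
import base64

def base64_encode(data: bytes) -> str:
    """bytes -> characters (hypothetical wrapper around the stdlib codec)."""
    return base64.b64encode(data).decode("ascii")

def base64_decode(text: str) -> bytes:
    """characters -> bytes (hypothetical wrapper around the stdlib codec)."""
    return base64.b64decode(text.encode("ascii"))

original = b"\x00\xffbinary"

# original bytes --base64 encode--> unicode --encode(x)--> bytes on the wire
text = base64_encode(original)
wire = text.encode("utf-16")

# and back again: decode(x), then base64 decode
roundtrip = base64_decode(wire.decode("utf-16"))
```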


From greg.ewing at  Tue Feb 21 10:58:55 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:58:55 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:

> Mechanisms which rely on manipulating variables within closures or
> nested scopes to function properly can be elegant, but I've not yet seen
> one that *really* is.

It seems a bit inconsistent to say on the one hand
that direct assignment to a name in an outer scope
is not sufficiently useful to be worth supporting,
while at the same time providing a way to do it for
one particular scope, i.e. 'global'. Would you
advocate doing away with it?

> Of course using
> classes directly with a bit of work can offer you everything you want
> from a closure, with all of the explicitness that you could ever want.

There are cases where the overhead (in terms of amount
of code) of defining a class and creating an instance of
it swamps the code which does the actual work, and,
I feel, actually obscures what is being done rather
than clarifies it. These cases benefit from the ability
to refer to names in enclosing scopes, and I believe
they would benefit further from the ability to assign
to such names.
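
To make the code-volume comparison concrete, here is a sketch of the
same counter written both ways (the names Counter and make_counter
are invented for the example). The closure version uses the familiar
one-element-list workaround, since a plain assignment to the outer
name would just create a new local.

```python
class Counter:
    """Class-based version: state lives in an instance attribute."""
    def __init__(self, start=0):
        self.count = start

    def __call__(self):
        self.count += 1
        return self.count


def make_counter(start=0):
    """Closure version: state lives in a mutable cell in the outer scope."""
    count = [start]          # list used only so the inner function can mutate it

    def increment():
        count[0] += 1        # mutates the list; does not rebind `count`
        return count[0]

    return increment
```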

Certainly the feature could be abused, as can the
existing nested scope facilities, or any other language
feature for that matter. Mere potential for abuse is
not sufficient reason to reject a feature, or the
language would have no features at all.

Another consideration is efficiency. CPython currently
implements access to local variables (both in the
current scope and all outer ones except the module
scope) in an extremely efficient way. There's
always the worry that using attribute access in
place of local variable access is greatly increasing
the runtime overhead for no corresponding benefit.

You mention the idea of namespaces. Maybe an answer
is to provide some lightweight way of defining a
temporary, single-use namespace for use within
nested scopes -- lightweight in terms of both code
volume and runtime overhead. Perhaps something like

   def my_func():
     namespace foo
     foo.x = 42

     def inc_x():
       foo.x += 1

The idea here is that foo wouldn't be an object in
its own right, but just a collection of names that
would be implemented as local variables of my_func.
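
No such namespace statement exists, so the closest runnable
approximation has to use a real object, which is exactly the overhead
the idea is meant to avoid. Assuming types.SimpleNamespace from later
Python versions as a stand-in, the shape would be roughly:

```python
from types import SimpleNamespace

def my_func():
    # Stand-in for the hypothetical "namespace foo" declaration.
    foo = SimpleNamespace()
    foo.x = 42

    def inc_x():
        # Attribute assignment mutates foo; the name foo itself is
        # never rebound, so no scope declaration is needed.
        foo.x += 1

    inc_x()
    return foo.x
```

The difference from the proposal is that foo.x here is an attribute
lookup on a heap object rather than a fast local-variable access.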


From greg.ewing at  Tue Feb 21 10:59:06 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 22:59:06 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Bengt Richter wrote:

> you could write
>     d = dict()**list

Or alternatively,

   ld = dict[list]

i.e. "a dict of lists". In the maximally twisted
form of this idea, the result wouldn't be a dict
but a new *type* of dict, which you would then
instantiate:

   d = ld(your_favourite_args_here)

This solves both the constructor-argument problem
(the new type can have the same constructor signature
as a regular dict with no conflict) and the
perceived-Liskov-nonsubstitutability problem (there's
no requirement that the new type have any particular
conceptual and/or actual inheritance relationship to
any other type). Plus being a really cool introduction
to the concepts of metaclasses, higher-order functions
and all that neat head-exploding stuff. :-)
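
The dict[list] spelling is not available for this purpose, but the
type-returning idea can be sketched with an ordinary function.
Everything here is an assumption for illustration: the helper name
dict_of is invented, and the generated type subclasses dict and
relies on the __missing__ hook purely for brevity, even though the
idea above does not require any inheritance relationship.

```python
def dict_of(factory):
    """Return a new dict type whose missing keys are filled by factory()."""
    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent: create, store, return.
        value = self[key] = factory()
        return value

    name = "dict_of_" + getattr(factory, "__name__", "values")
    return type(name, (dict,), {"__missing__": __missing__})


ld = dict_of(list)        # "a dict of lists" type
d = ld(a=[1])             # same constructor signature as a regular dict
d["b"].append(2)          # missing key gets a fresh list first
```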



From greg.ewing at  Tue Feb 21 11:01:51 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 21 Feb 2006 23:01:51 +1300
Subject: [Python-Dev] Papal encyclical on the use of closures (Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:

> Introducing these two new keywords is equivalent to
> encouraging nested scope use.  Right now nested scope
> use is "limited" or "fraught with gotchas".

What you seem to be saying here is: Nested scope use
is Inherently Bad. Therefore we will keep them Limited
and Fraught With Gotchas, so people will be discouraged
from using them.

Sounds a bit like the attitude of certain religious
groups to condoms. (Might encourage people to have
sex -- can't have that -- look at all the nasty diseases
you can get!)


From fuzzyman at  Tue Feb 21 11:17:52 2006
From: fuzzyman at (Fuzzyman)
Date: Tue, 21 Feb 2006 10:17:52 +0000
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote:

>Delaney, Timothy (Tim) wrote:
>>However, *because* Python uses duck typing, I tend to feel that
>>subclasses in Python *should* be drop-in replacements.
>Duck-typing means that the only reliable way to
>assess whether two types are sufficiently compatible
>for some purpose is to consult the documentation --
>you can't just look at the base class list.
What's the API for that ?

I've had problems in code that needs to treat strings, lists and
dictionaries differently (assigning values to a container where all
three need different handling) and telling the difference but allowing
duck typing is *problematic*.


Michael Foord

From skip at  Tue Feb 21 12:58:04 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 05:58:04 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    Neal> IMO compiler warnings should generate emails from buildbot.  

It doesn't generate emails for any other condition.  I think it should just
turn the compilation section yellow.


From skip at  Tue Feb 21 13:01:24 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 06:01:24 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    >> Unfortunately, there are a ton of warnings on OS X right now.

    Ronald> How many of those do you see when you ignore the warnings you
    Ronald> get while building the Carbon extensions? 

I see a bunch related to Py_ssize_t.  Those have nothing to do with Carbon.
I don't see them on the gentoo build, so I assume they just haven't been
tackled yet.


From barry at  Tue Feb 21 13:31:54 2006
From: barry at (Barry Warsaw)
Date: Tue, 21 Feb 2006 07:31:54 -0500
Subject: [Python-Dev] bytes type discussion
In-Reply-To: <dt3nqh$rlt$>
References: <>
	<dt09vc$tvv$>	<>
	<dt0fr2$fmg$>	<>
	<dt2bsi$cb4$>  <dt3nqh$rlt$>
Message-ID: <>

On Fri, 2006-02-17 at 00:43 -0500, Steve Holden wrote:
> Fredrik Lundh wrote:
> > Barry Warsaw wrote:
> > 
> > 
> >>We know at least there will never be a 2.10, so I think we still have
> >>time.
> > 
> > 
> > because there's no way to count to 10 if you only have one digit?
> > 
> > we used to think that back when the gas price was just below 10 SEK/L,
> > but they found a way...
> > 
> IIRC Guido is on record as saying "There will be no Python 2.10 because 
> I hate the ambiguity of double-digit minor release numbers", or words to 
> that effect.

I heard the same quote, so that's what I was referring to!


From barry at  Tue Feb 21 13:55:31 2006
From: barry at (Barry Warsaw)
Date: Tue, 21 Feb 2006 07:55:31 -0500
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
Message-ID: <>

On Sun, 2006-02-19 at 23:30 +0900, Stephen J. Turnbull wrote:
> >>>>> "M" == "M.-A. Lemburg" <mal at> writes:

>     M> * for Unicode codecs the original form is Unicode, the derived
>     M> form is, in most cases, a string
> First of all, that's Martin's point!
> Second, almost all Americans, a large majority of Japanese, and I
> would bet most Western Europeans would say you have that backwards.
> That's the problem, and it's the Unicode advocates' problem (ie,
> ours), not the users'.  Even if we're right: education will require
> lots of effort.  Rather, we should just make it as easy as possible to
> do it right, and hard to do it wrong.

I think you've hit the nail squarely on the head.  Even though I /know/
what the intended semantics are, the originality of the string form is
deeply embedded in my nearly 30 years of programming experience, almost
all of it completely American English-centric.  

I always have to stop and think about which direction .encode()
and .decode() go in because it simply doesn't feel natural.  Or more
simply put, my brain knows what's right, but my heart doesn't and that's
why converting from one to the other is always a hiccup in the smooth
flow of coding.  And while I'm sympathetic to MAL's design decisions,
the overlaying of the generalizations doesn't help.
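
For reference, the direction in question, sketched with Python 3's
str/bytes split (which postdates this thread; the sample word is
arbitrary): encode() goes from text to bytes, decode() from bytes
back to text.

```python
text = "na\u00efve"                  # Unicode text, five characters

encoded = text.encode("utf-8")       # text -> bytes (the derived form)
decoded = encoded.decode("utf-8")    # bytes -> text (back to the original)
```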



From jeremy at  Tue Feb 21 14:02:08 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 21 Feb 2006 08:02:08 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>


The lack of support for rebinding names in enclosing scopes is
certainly a wart.  I think option one is a better fit for Python,
because it more closely matches the existing naming semantics.  Namely
that assignment in a block creates a new name unless a global
statement indicates otherwise.  The revised rules would be that
assignment creates a new name unless a global or XXX statement
indicates otherwise.
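
For later readers: Python 3 did eventually gain a statement along the
lines of option one, spelled nonlocal (PEP 3104). Under that spelling,
and only as a sketch of how the rule reads in practice, the incgen
example from the quoted proposal becomes:

```python
def incgen(inc=1):
    a = 6

    def incrementer():
        # Without this declaration, "a += inc" would treat a as a new
        # local name and fail with UnboundLocalError.
        nonlocal a
        a += inc
        return a

    return incrementer


step = incgen(2)
```

Reading inc needs no declaration; only names that are rebound in the
nested scope have to be listed.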

The names of naming statements are quite hard to get right, I fear.  I
don't particularly like "use."  It's too generic.  (I don't
particularly like "scope" for option 2, either, for similar reasons. 
It doesn't indicate what kind of scope issue is being declared.)  The
most specific thing I can think of is "free" to indicate that the
variable is free in the current scope.  It may be too specialized a
term to be familiar to most people.

I think free == global in the absence of other bindings.


On 2/20/06, Almann T. Goo < at> wrote:
> I am considering developing a PEP for enabling a mechanism to assign to free
> variables in a closure (nested function).  My rationale is that with the
> advent of PEP 227 , Python has proper nested lexical scopes, but can have
> undesirable behavior (especially with new developers) when a user
> wants to make an assignment to a free variable within a nested function.
> Furthermore, the numerous kludges that "solve" the problem with a
> mutable object, like a list, as the free variable do not seem "Pythonic."  I
> have also seen mention that the use of classes can mitigate this, but that
> seems, IMHO, heavy handed in cases when an elegant solution using a closure
> would suffice and be more appropriate--especially when Python already has
> nested lexical scopes.
>  I propose two possible approaches to solve this issue:
>  1.  Adding a keyword such as "use" that would follow similar semantics as
> "global" does today.  A nested scope could declare names with this keyword
> to enable assignment to such names to change the closest parent's binding.
> The semantic would be to keep the behavior we experience today but tell the
> compiler/interpreter that a name declared with the "use" keyword would
> explicitly use an enclosing scope.  I personally like this approach the most
> since it would seem to be in keeping with the current way the language works
> and would probably be the most backwards compatible.  The semantics for how
> this interacts with the global scope would also need to be defined (should
> "use" be equivalent to a global when no name exists in any parent scope, etc.)
> def incgen( inc = 1 ) :
>    a = 6
>    def incrementer() :
>      use a
>      #use a, inc <-- list of names okay too
>      a += inc
>      return a
>    return incrementer
>  Of course, this approach suffers from a downside that every nested scope
> that wanted to assign to a parent scope's name would need to have the "use"
> keyword for those names--but one could argue that this is in keeping with
> one of Python's philosophies that "Explicit is better than implicit" (PEP
> 20).  This approach also has to deal with a user declaring a name with
> "use" that is a named parameter--this would be a semantic error that could be
> handled like "global" does today with a SyntaxError.
>  2.  Adding a keyword such as "scope" that would behave similarly to
> JavaScript's "var" keyword.  A name could be declared with such a keyword
> optionally and all nested scopes would use the declaring scope's binding
> when accessing or assigning to a particular name.  This approach has similar
> benefits to my first approach, but is clearly more top-down than the first
> approach.  Subsequent "scope" declarations would create a new binding at the
> declaring scope for the declaring and child scopes to use.  This could
> potentially be a gotcha for users expecting the binding semantics in place
> today.  Also the scope keyword would have to be allowed to be used on
> parameters to allow such parameter names to be used in a similar fashion in
> a child scope.
> def incgen( inc = 1 ) :
>    #scope inc <-- allow scope declaration for bound parameters (not a big fan of this)
>    scope a = 6
>    def incrementer() :
>      a += inc
>      return a
>    return incrementer
>  This approach would be similar to languages like JavaScript that allow for
> explicit scope binding with the use of "var" or more static languages that
> allow re-declaring names at lower scopes.  I am less in favor of this,
> because I don't think it feels very "Pythonic".
>  As a point of reference, some languages such as Ruby will only bind a new
> name to a scope on assignment when an enclosing scope does not have the name
> bound.  I do believe the Python name binding semantics have issues (for
> which the "global" keyword was born), but I feel that the "fixing" the
> Python semantic to a more "Ruby-like" one adds as many problems as it solves
> since the "Ruby-like" one is just as implicit in nature.  Not to mention the
> backwards compatibility impact is probably much larger.
> I would like the community's opinion on whether enough people out there think
> this would be a worthwhile endeavour--or if there is already an initiative
> that I missed.  Please let me know your questions, comments.
>  Best Regards,
>  Almann
> --
> Almann T. Goo
> at
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at
> Unsubscribe:

From at  Tue Feb 21 14:16:08 2006
From: at (Almann T. Goo)
Date: Tue, 21 Feb 2006 08:16:08 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>


I definitely agree that option one is more in line with the semantics
in place within Python today.

> The names of naming statements are quite hard to get right, I fear.  I
> don't particularly like "use."  It's too generic.  (I don't
> particularly like "scope" for option 2, either, for similar reasons.
> It doesn't indicate what kind of scope issue is being declared.)  The
> most specific thing I can think of is "free" to indicate that the
> variable is free in the current scope.  It may be too specialized a
> term to be familiar to most people.

I am not married to any particular keyword for sure--I would be happy
for the most part if the language was fixed regardless of the keyword
chosen.  "free" gives me the sense that I am de-allocating memory (my
C background talking); I don't think most people would get the
mathematical reference for "free".

I certainly hope that an initiative like this doesn't get stymied by
the lack of a good name for such a keyword.  Maybe something like

> I think free == global in the absence of other bindings.

I actually like this; it would sort of make "global" obsolete (thus
making the global scope behave like other lexical scopes with regard
to re-binding, which is probably a good thing).


Almann T. Goo at

From guido at  Tue Feb 21 14:58:52 2006
From: guido at (Guido van Rossum)
Date: Tue, 21 Feb 2006 05:58:52 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Bengt Richter <bokr at> wrote:
> How about doing it as an expression, empowering ( ;-) the dict just after creation?
> E.g., for
>     d = dict()
>     d.default_factory = list
> you could write
>     d = dict()**list

Bengt, can you let your overactive imagination rest for a while? I
recommend that you sit back, relax for a season, and reflect on the
zen nature of Pythonicity. Then come back and hopefully you'll be able
to post without embarrassing yourself continuously.

--Guido van Rossum (home page:

From guido at  Tue Feb 21 15:04:34 2006
From: guido at (Guido van Rossum)
Date: Tue, 21 Feb 2006 06:04:34 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/21/06, Fuzzyman <fuzzyman at> wrote:
> I've had problems in code that needs to treat strings, lists and
> dictionaries differently (assigning values to a container where all
> three need different handling) and telling the difference but allowing
> duck typing is *problematic*.

Consider designing APIs that don't require you to make that kind of
distinction, if you're worried about edge cases and classifying
arbitrary other objects correctly. It's totally possible to create an
object that behaves like a hybrid of a string and a dict.

If you're only interested in classifying the three specific built-ins
you mention, I'd check for the presence of certain attributes:
hasattr(x, "lower") -> x is a string of some kind; hasattr(x, "sort")
-> x is a list; hasattr(x, "update") -> x is a dict. Also, hasattr(x,
"union") -> x is a set; hasattr(x, "readline") -> x is a file.

That's duck typing!
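
Spelled out as code, the attribute checks might look like the sketch
below. The function name classify and the returned labels are
invented for the example. One ordering detail is an addition of the
sketch, not part of the suggestion above: sets also have an update
method, so "union" is tested before "update" to keep sets from being
reported as dicts.

```python
def classify(x):
    """Classify an object by the presence of a telltale attribute."""
    if hasattr(x, "lower"):
        return "string"      # a string of some kind
    if hasattr(x, "sort"):
        return "list"
    if hasattr(x, "union"):
        return "set"         # checked before "update": sets have update() too
    if hasattr(x, "update"):
        return "dict"
    if hasattr(x, "readline"):
        return "file"
    return "unknown"
```

As the first paragraph warns, this only classifies the built-ins
reliably; an arbitrary object exposing both lower and update would
simply be reported as a string.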

--Guido van Rossum (home page:

From skip at  Tue Feb 21 15:07:56 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 08:07:56 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    Ronald> How many of those do you see when you ignore the warnings you
    Ronald> get while building the Carbon extensions?

    skip> I see a bunch related to Py_ssize_t.  Those have nothing to do
    skip> with Carbon.  I don't see them on the gentoo build, so I assume
    skip> they just haven't been tackled yet.

Let me rephrase that.  I assume the people digging through Py_ssize_t issues
have been looking at compilation warnings for platforms other than Mac OSX.


From thomas at  Tue Feb 21 15:27:54 2006
From: thomas at (Thomas Wouters)
Date: Tue, 21 Feb 2006 15:27:54 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Feb 21, 2006 at 08:02:08AM -0500, Jeremy Hylton wrote:

> The lack of support for rebinding names in enclosing scopes is
> certainly a wart.  I think option one is a better fit for Python,
> because it more closely matches the existing naming semantics.  Namely
> that assignment in a block creates a new name unless a global
> statement indicates otherwise.  The revised rules would be that
> assignment creates a new name unless a global or XXX statement
> indicates otherwise.

I agree with Jeremy on this. I've been thinking about doing something like
this myself, but never got 'round to it. It doesn't make working with
closures much easier, and I doubt it'll encourage using closures much, but
it does remove the wart of needing to use mutable objects to make them

> The names of naming statements are quite hard to get right, I fear.  I
> don't particularly like "use."  It's too generic.  (I don't
> particularly like "scope" for option 2, either, for similar reasons. 
> It doesn't indicate what kind of scope issue is being declared.)  The
> most specific thing I can think of is "free" to indicate that the
> variable is free in the current scope.  It may be too specialized a
> term to be familiar to most people.

I was contemplating 'enclosed' as a declaration, myself. Maybe, if there's
enough of a consensus on any name before Python 2.5a1 is released, and the
feature isn't going to make it into 2.5, we could ease the introduction of a
new keyword by issuing a warning about the keyword in 2.5 already. (Rather
than a future-import to enable it in 2.6.) Maybe, and only if there's no
doubt about how it's going in, of course.

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From bokr at  Tue Feb 21 15:29:11 2006
From: bokr at (Bengt Richter)
Date: Tue, 21 Feb 2006 14:29:11 GMT
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
References: <>
Message-ID: <>

On Tue, 21 Feb 2006 08:02:08 -0500, "Jeremy Hylton" <jeremy at> wrote:

>The lack of support for rebinding names in enclosing scopes is
>certainly a wart.  I think option one is a better fit for Python,
>because it more closely matches the existing naming semantics.  Namely
>that assignment in a block creates a new name unless a global
>statement indicates otherwise.  The revised rules would be that
>assignment creates a new name unless a global or XXX statement
>indicates otherwise.
>The names of naming statements are quite hard to get right, I fear.  I
>don't particularly like "use."  It's too generic.  (I don't
>particularly like "scope" for option 2, either, for similar reasons. 
>It doesn't indicate what kind of scope issue is being declared.)  The
>most specific thing I can think of is "free" to indicate that the
>variable is free in the current scope.  It may be too specialized a
>term to be familiar to most people.
>I think free == global in the absence of other bindings.
Hey, only Guido is allowed to top-post. He said so ;-)

But to the topic, it just occurred to me that any outer scopes could be given names
(including the global namespace, but that would have the name global by default), so
global.x would essentially mean what globals()['x'] means now, except it would
be a name error if x didn't pre-exist when accessed via namespace_name.name_in_space notation.

    namespace g_alias  # g_alias.x becomes alternate spelling of global.x
    def outer():
        namespace mezzanine
        a = 123
        print a  # => 123
        print mezzanine.a  # => 123 (the name space name is visible and functional locally)
        def inner():
            print mezzanine.a  # => 123
            mezzanine.a = 456
        inner()
        print a  # => 456
        # global.x = ... re-binds global x, name error if not preexisting.

This would allow creating mezzanine like an attribute view of the slots in that local namespace,
as well as making namespace itself visible there, so the access to mezzanine would look like a read access to
an ordinary object named mezzanine that happened to have attribute slots matching outer's local name space.

Efficiency might make it desirable not to extend named namespaces with new names, function locals being
slotted in a fixed space tied into the frame (I think). But there are tricks I guess.
Anyway, I hadn't seen this idea before. Seems

Bengt Richter

>On 2/20/06, Almann T. Goo < at> wrote:
>> I am considering developing a PEP for enabling a mechanism to assign to free
>> variables in a closure (nested function).  My rationale is that with the
>> advent of PEP 227 , Python has proper nested lexical scopes, but can have
>> undesirable behavior (especially with new developers) when a user wants
>> to make an assignment to a free variable within a nested function.
>> Furthermore, the numerous kludges I have seen that "solve" the problem with a
>> mutable object, like a list, as the free variable do not seem "Pythonic."  I
>> have also seen mention that the use of classes can mitigate this, but that
>> seems, IMHO, heavy handed in cases when an elegant solution using a closure
>> would suffice and be more appropriate--especially when Python already has
>> nested lexical scopes.
>>  I propose two possible approaches to solve this issue:
>>  1.  Adding a keyword such as "use" that would follow similar semantics as
>> "global" does today.  A nested scope could declare names with this keyword
>> to enable assignment to such names to change the closest parent's binding.
>> The semantic would be to keep the behavior we experience today but tell the
>> compiler/interpreter that a name declared with the "use" keyword would
>> explicitly use an enclosing scope.  I personally like this approach the most
>> since it would seem to be in keeping with the current way the language works
>> and would probably be the most backwards compatible.  The semantics for how
>> this interacts with the global scope would also need to be defined (should
>> "use" be equivalent to a global when no name exists in any parent scope, etc.)
>> def incgen( inc = 1 ) :
>>    a = 6
>>    def incrementer() :
>>      use a
>>      #use a, inc <-- list of names okay too
>>      a += inc
>>      return a
>>    return incrementer
>>  Of course, this approach suffers from a downside that every nested scope
>> that wanted to assign to a parent scope's name would need to have the "use"
>> keyword for those names--but one could argue that this is in keeping with
>> one of Python's philosophies that "Explicit is better than implicit" (PEP
>> 20).  This approach also has to deal with a user declaring a name with
>> "use" that is a named parameter--this would be a semantic error that could be
>> handled like "global" does today with a SyntaxError.
>>  2.  Adding a keyword such as "scope" that would behave similarly to
>> JavaScript's "var" keyword.  A name could be declared with such a keyword
>> optionally and all nested scopes would use the declaring scope's binding
>> when accessing or assigning to a particular name.  This approach has similar
>> benefits to my first approach, but is clearly more top-down than the first
>> approach.  Subsequent "scope" declarations would create a new binding at the
>> declaring scope for the declaring and child scopes to use.  This could
>> potentially be a gotcha for users expecting the binding semantics in place
>> today.  Also the scope keyword would have to be allowed to be used on
>> parameters to allow such parameter names to be used in a similar fashion in
>> a child scope.
>> def incgen( inc = 1 ) :
>>    #scope inc <-- allow scope declaration for bound parameters (not a big fan of this)
>>    scope a = 6
>>    def incrementer() :
>>      a += inc
>>      return a
>>    return incrementer
>>  This approach would be similar to languages like JavaScript that allow for
>> explicit scope binding with the use of "var" or more static languages that
>> allow re-declaring names at lower scopes.  I am less in favor of this,
>> because I don't think it feels very "Pythonic".
>>  As a point of reference, some languages such as Ruby will only bind a new
>> name to a scope on assignment when an enclosing scope does not have the name
>> bound.  I do believe the Python name binding semantics have issues (for
>> which the "global" keyword was born), but I feel that the "fixing" the
>> Python semantic to a more "Ruby-like" one adds as many problems as it solves
>> since the "Ruby-like" one is just as implicit in nature.  Not to mention the
>> backwards compatibility impact is probably much larger.
>> I would like the community's opinion on whether there are enough people out there who think
>> this would be a worthwhile endeavour--or whether there is already an initiative
>> that I missed.  Please let me know your questions and comments.
>>  Best Regards,
>>  Almann
>> --
>> Almann T. Goo
>> at

From jeremy at  Tue Feb 21 15:32:55 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 21 Feb 2006 09:32:55 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

I had to look up top-post :-).

On 2/21/06, Bengt Richter <bokr at> wrote:
> On Tue, 21 Feb 2006 08:02:08 -0500, "Jeremy Hylton" <jeremy at> wrote:
> >Jeremy
> Hey, only Guido is allowed to top-post. He said so ;-)

The Gmail UI makes it really easy to forget where the q


From jeremy at  Tue Feb 21 15:37:06 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 21 Feb 2006 09:37:06 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/21/06, Jeremy Hylton <jeremy at> wrote:
> I had to lookup top-post :-).
> On 2/21/06, Bengt Richter <bokr at> wrote:
> > On Tue, 21 Feb 2006 08:02:08 -0500, "Jeremy Hylton" <jeremy at> wrote:
> > >Jeremy
> > Hey, only Guido is allowed to top-post. He said so ;-)
> The Gmail UI makes it really easy to forget where the q

Sorry about that.  Hit the send key by mistake.

The Gmail UI makes it really easy to forget where the quoted text is
in relation to your own text.

> > But to the topic, it just occurred to me that any outer scopes could be given names
> > (including global namespace, but that would have the name global by default, so
> > global.x would essentially mean what globals()['x'] means now, except it would
> > be a name error if x didn't pre-exist when accessed via namespace_name.name_in_space notation.

Isn't this suggestion the same as Greg Ewing's?

> >     namespace g_alias  # g_alias.x becomes alternate spelling of global.x
> >     def outer():
> >         namespace mezzanine
> >         a = 123
> >         print a  # => 123
> >         print mezzanine.a  # => 123 (the name space name is visible and functional locally)
> >         def inner():
> >             print mezzanine.a  # => 123
> >             mezzanine.a = 456
> >         inner()
> >         print a # = 456
> >         global.x = re-binds global x, name error if not preexisting.
> >
> > This would allow creating mezzanine like an attribute view of the slots in that local namespace,
> > as well as making namespace itself visible there, so the access to mezzanine would look like a read access to
> > an ordinary object named mezzanine that happened to have attribute slots matching outer's local name space.

I don't think using attribute access is particularly clear here.  It
introduces an entirely new concept, a first-class namespace, in order
to solve a small scoping problem.  It looks too much like attribute
access and not enough like accessing a variable.



From at  Tue Feb 21 16:12:06 2006
From: at (Almann T. Goo)
Date: Tue, 21 Feb 2006 10:12:06 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

> But to the topic, it just occurred to me that any outer scopes could be given names
> (including global namespace, but that would have the name global by default, so
> global.x would essentially mean what globals()['x'] means now, except it would
> be a name error if x didn't pre-exist when accessed via namespace_name.name_in_space notation.
>     namespace g_alias  # g_alias.x becomes alternate spelling of global.x
>     def outer():
>         namespace mezzanine
>         a = 123
>         print a  # => 123
>         print mezzanine.a  # => 123 (the name space name is visible and functional locally)
>         def inner():
>             print mezzanine.a  # => 123
>             mezzanine.a = 456
>         inner()
>         print a # = 456
>         global.x = re-binds global x, name error if not preexisting.
> This would allow creating mezzanine like an attribute view of the slots in that local namespace,
> as well as making namespace itself visible there, so the access to mezzanine would look like a read access to
> an ordinary object named mezzanine that happened to have attribute slots matching outer's local name space.

This seems like a neat idea in principle, but I wonder if it removes
consistency from the language.  Consider that in the scope that
declares the namespace and in its child scopes, the names could be
accessed through the namespace object or by the direct name, but in the
child scopes re-binding a name can *only* be done via the namespace

  def outer() :
    namespace n
    a = 5 # <-- same as n.a = 5
    def inner() :
      print a # <-- same as n.a
      n.a = 7 # <-- *not* the same as a = 7
    print n.a

I don't like how a child scope can access a free variable from an
enclosing scope without the namespace object, but needs to use it for
re-binding.  Because of this, namespace objects have the potential to
obfuscate things more than fix the language issue that I am


Almann T. Goo at

From rrr at  Tue Feb 21 16:48:21 2006
From: rrr at (Ron Adam)
Date: Tue, 21 Feb 2006 09:48:21 -0600
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>	<>	<>
Message-ID: <>

Jeremy Hylton wrote:
> On 2/21/06, Jeremy Hylton <jeremy at> wrote:
>> I had to lookup top-post :-).
>> On 2/21/06, Bengt Richter <bokr at> wrote:
>>> On Tue, 21 Feb 2006 08:02:08 -0500, "Jeremy Hylton" <jeremy at> wrote:
>>>> Jeremy
>>> Hey, only Guido is allowed to top-post. He said so ;-)
>> The Gmail UI makes it really easy to forget where the q
> Sorry about that.  Hit the send key by mistake.
> The Gmail UI makes it really easy to forget where the quoted text is
> in relation to your own text.
>>> But to the topic, it just occurred to me that any outer scopes could be given names
>>> (including global namespace, but that would have the name global by default, so
>>> global.x would essentially mean what globals()['x'] means now, except it would
>>> be a name error if x didn't pre-exist when accessed via namespace_name.name_in_space notation.
> Isn't this suggestion that same as Greg Ewing's?
>>>     namespace g_alias  # g_alias.x becomes alternate spelling of global.x
>>>     def outer():
>>>         namespace mezzanine
>>>         a = 123
>>>         print a  # => 123
>>>         print mezzanine.a  # => 123 (the name space name is visible and functional locally)
>>>         def inner():
>>>             print mezzanine.a  # => 123
>>>             mezzanine.a = 456
>>>         inner()
>>>         print a # = 456
>>>         global.x = re-binds global x, name error if not preexisting.
>>> This would allow creating mezzanine like an attribute view of the slots in that local namespace,
>>> as well as making namespace itself visible there, so the access to mezzanine would look like a read access to
>>> an ordinary object named mezzanine that happened to have attribute slots matching outer's local name space.

Why not just use a class?

def incgen(start=0, inc=1):
    class incrementer(object):
        a = start - inc
        def __call__(self):
            self.a += inc
            return self.a
    return incrementer()

a = incgen(7, 5)
for n in range(10):
    print a(),

7 12 17 22 27 32 37 42 47 52

    Ronald Adam

From barry at  Tue Feb 21 16:50:58 2006
From: barry at (Barry Warsaw)
Date: Tue, 21 Feb 2006 10:50:58 -0500
Subject: [Python-Dev] Deprecate ``multifile``?
In-Reply-To: <dt4hf2$adn$>
References: <dt4dst$tma$> <dt4gqa$8i5$>
Message-ID: <>

On Fri, 2006-02-17 at 14:01 +0100, Georg Brandl wrote:
> Fredrik Lundh wrote:
> > Georg Brandl wrote:
> > 
> >> as Jim Jewett noted, multifile is supplanted by email as much as mimify etc.
> >> but it is not marked as deprecated. Should it be deprecated in 2.5?
> > 
> > -0.5 (gratuitous breakage).
> > 
> > I think the current "see also/supersedes" link is good enough.
> Well, it would be deprecated like the other email modules, that is, only
> a note is added to the docs and it is added to PEP 4. There would be no
> warning.

IIRC, when I brought this up ages ago, there was some grumbling that
multifile is useful for other than email/MIME applications.  Still, I'm
+1 on PEP 4'ing it.



From aleaxit at  Tue Feb 21 16:52:22 2006
From: aleaxit at (Alex Martelli)
Date: Tue, 21 Feb 2006 07:52:22 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 21, 2006, at 1:51 AM, Greg Ewing wrote:
> Just one more thing -- have you made a final decision
> about the name yet? I'd still prefer something like
> 'autodict', because to me 'defaultdict' suggests

autodict is shorter and sharper and I prefer it, too: +1

> etc.) it seems more accurate to think of the value
> produced by the factory as an 'initial value' rather
> than a 'default value', and I'd prefer to see it

If we call the type autodict, then having the factory attribute named  
autofactory seems to fit. This leaves it open to the reader's  
imagination to choose whether to think of the value as "initial" or  
"default" -- it's the *auto* (automatic) value.
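For readers following the naming question, here is a minimal sketch of the behaviour being discussed. It assumes the spelling the type eventually shipped with in Python 2.5, collections.defaultdict with a default_factory attribute, rather than the autodict/autofactory names proposed in this message; the sample words are made up for illustration.

```python
from collections import defaultdict

# The factory (here, list) is called with no arguments the first time a
# missing key is looked up, and its result is stored under that key.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:  # made-up sample data
    groups[word[0]].append(word)

# A plain lookup of a missing key also inserts the factory's value, which
# is the argument for calling it an "initial" rather than a "default" value.
probe = defaultdict(int)
value = probe["missing"]

print(dict(groups))
print("missing" in probe)
```

Because the lookup itself stores the value, the key is present afterwards, which a conventional "default" (as in dict.get) would not do.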


From fuzzyman at  Tue Feb 21 17:09:18 2006
From: fuzzyman at (Fuzzyman)
Date: Tue, 21 Feb 2006 16:09:18 +0000
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	
	<> <>
Message-ID: <>

Guido van Rossum wrote:

>On 2/21/06, Fuzzyman <fuzzyman at> wrote:
>>I've had problems in code that needs to treat strings, lists and
>>dictionaries differently (assigning values to a container where all
>>three need different handling) and telling the difference but allowing
>>duck typing is *problematic*.
>Consider designing APIs that don't require you to make that kind of
>distinction, if you're worried about edge cases and classifying
>arbitrary other objects correctly. It's totally possible to create an
>object that behaves like a hybrid of a string and a dict.

>If you're only interested in classifying the three specific built-ins
>you mention, I'd check for the presence of certain attributes:
>hasattr(x, "lower") -> x is a string of some kind; hasattr(x, "sort")
>-> x is a list; hasattr(x, "update") -> x is a dict. Also, hasattr(x,
>"union") -> x is a set; hasattr(x, "readline") -> x is a file.
>That's duck typing!
Sure, but that requires a "dictionary like object" to define an update
method, and a "list like object" to define a sort method.

The mapping and sequence protocols are so loosely defined that some
arbitrary decision like this has to be made. (Any object that defines
"__getitem__" could follow either or both and duck typing doesn't help
you unless you're prepared to make an additional requirement that is
outside the loose requirements of the protocol.)
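To make the trade-off concrete, here is a rough sketch of the attribute checks Guido lists, written for a current Python 3. The function name classify and the class OnlyGetitem are hypothetical, introduced only for this illustration. One ordering detail is an assumption of the sketch, not part of Guido's list: sets also define update(), so "union" is tested before "update".

```python
import io

def classify(x):
    """Classify x by probing for one characteristic method per type."""
    if hasattr(x, "lower"):
        return "string"
    if hasattr(x, "sort"):
        return "list"
    if hasattr(x, "union"):      # tested before "update": sets have update() too
        return "set"
    if hasattr(x, "update"):
        return "dict"
    if hasattr(x, "readline"):
        return "file"
    return "unknown"

class OnlyGetitem(object):
    """Hypothetical object that supports indexing and nothing else."""
    def __getitem__(self, key):
        return key

for obj in ("abc", [1, 2], {"k": 1}, set(), io.StringIO(u""), OnlyGetitem()):
    print(classify(obj))
```

The last object shows the gap described above: it satisfies the loose mapping/sequence protocol through __getitem__ alone, yet none of the checks can place it.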

I can't remember how we solved it, but I think we decided that an object
would be treated as a string if it passed isinstance, and a dictionary
or sequence if it has __getitem__ (but isn't a string instance or
subclass). If it has update as well as __getitem__ it is a

All the best,

Michael Foord


From g.brandl at  Tue Feb 21 17:13:12 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 17:13:12 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>
Message-ID: <dtfe6o$q9u$>

Greg Ewing wrote:

>    def my_func():
>      namespace foo
>      foo.x = 42
>      def inc_x():
>        foo.x += 1
> The idea here is that foo wouldn't be an object in
> its own right, but just a collection of names that
> would be implemented as local variables of my_func.

But why is that better than

class namespace(object): pass

def my_func():
    foo = namespace()
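Spelled out in full, a sketch of this plain-object approach might look like the following. The body of inc_x mirrors Greg's example; returning the new value and returning inc_x from my_func are assumptions added here so the result can be observed.

```python
class namespace(object):
    """Empty class; instances serve only as attribute containers."""
    pass

def my_func():
    foo = namespace()
    foo.x = 42
    def inc_x():
        # Attribute assignment on the object bound to foo; the name foo
        # itself is only read here, never rebound, so no declaration is needed.
        foo.x += 1
        return foo.x
    return inc_x

inc = my_func()
print(inc())
print(inc())
```

This works today because the inner function mutates an object reached through a free variable instead of rebinding the free variable itself.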



From at  Tue Feb 21 17:15:40 2006
From: at (Almann T. Goo)
Date: Tue, 21 Feb 2006 11:15:40 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

> Why not just use a class?
> def incgen(start=0, inc=1) :
>     class incrementer(object):
>       a = start - inc
>       def __call__(self):
>          self.a += inc
>          return self.a
>     return incrementer()
> a = incgen(7, 5)
> for n in range(10):
>     print a(),

Because I think that this is a workaround for a concept that the
language doesn't support elegantly with its lexically nested scopes.

IMO, you are emulating name rebinding in a closure by creating an
object to encapsulate the name you want to rebind--you don't need this
workaround if you only need to access free variables in an enclosing
scope.  I provided a "lighter" example that didn't need a callable
object but could use any mutable such as a list.

This kind of workaround is needed as soon as you want to re-bind a
parent scope's name, except in the case when the parent scope is the
global scope (since there is the "global" keyword to handle this). 
It's this dichotomy that concerns me, since it seems to be against the
elegance of Python--at least in my opinion.
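
A minimal sketch of the dichotomy described above. The names (counter, bump_global, make_counter_broken, make_counter_list) are illustrative and do not come from the thread; the point is only that rebinding works at module level via "global", fails in a nested scope, and needs a mutable container as a workaround.

```python
counter = 0

def bump_global():
    global counter          # 'global' lets a function rebind a module-level name
    counter += 1
    return counter

def make_counter_broken():
    count = 0
    def bump():
        count += 1          # the assignment makes 'count' local to bump(),
        return count        # so reading it first raises UnboundLocalError
    return bump

def make_counter_list():
    count = [0]             # mutable cell: the name 'count' is never rebound
    def bump():
        count[0] += 1       # only the list's contents change
        return count[0]
    return bump
```

Calling the function returned by make_counter_broken() raises UnboundLocalError, while the list-based version counts as expected.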

It seems artificially limiting that enclosing scope name rebinds are
not provided for by the language especially since the behavior with
the global scope is not so.  In a nutshell I am proposing a solution
to make nested lexical scopes orthogonal with the global scope
and to remove a "wart," as Jeremy put it, in the language.


Almann T. Goo at

From g.brandl at  Tue Feb 21 17:18:01 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 17:18:01 +0100
Subject: [Python-Dev] Deprecate ``multifile``?
In-Reply-To: <>
References: <dt4dst$tma$>
	<dt4gqa$8i5$>	<dt4hf2$adn$>
Message-ID: <dtfefp$q9u$>

Barry Warsaw wrote:
> On Fri, 2006-02-17 at 14:01 +0100, Georg Brandl wrote:
>> Fredrik Lundh wrote:
>> > Georg Brandl wrote:
>> > 
>> >> as Jim Jewett noted, multifile is supplanted by email as much as mimify etc.
>> >> but it is not marked as deprecated. Should it be deprecated in 2.5?
>> > 
>> > -0.5 (gratuitous breakage).
>> > 
>> > I think the current "see also/supersedes" link is good enough.
>> Well, it would be deprecated like the other email modules, that is, only
>> a note is added to the docs and it is added to PEP 4. There would be no
>> warning.
> IIRC, when I brought this up ages ago, there was some grumbling that
> multifile is useful for other than email/MIME applications.  Still, I'm
> +1 on PEP 4'ing it.

Which means "go ahead" or "wait for others to be -1"? <wink>


From g.brandl at  Tue Feb 21 17:17:11 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 17:17:11 +0100
Subject: [Python-Dev] Removing Non-Unicode Support?
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <dtfee7$q9u$>

M.-A. Lemburg wrote:

> Note that I'm not saying that these switches are useless - of
> course they do allow to strip down the Python interpreter.
> I believe that only very few people are interested in having these
> options and it's fair enough to put the burden of maintaining these
> branches on them.

Which is proven by the fact that many tests fail without unicode.

So at least the people building --without-unicode don't care much about


From tjreedy at  Tue Feb 21 17:20:13 2006
From: tjreedy at (Terry Reedy)
Date: Tue, 21 Feb 2006 11:20:13 -0500
Subject: [Python-Dev] buildbot vs. Windows
References: <>
Message-ID: <dtff36$1js$>

"Neal Norwitz" <nnorwitz at> wrote in message 
news:ee2a432c0602210009x2f4d1fffl3d49037b9b084d1e at
> There's nothing to prevent buildbot from making debug builds, though
> that is not currently done.

Now that there are separate report pages for 2.4 and 2.5, you could add 
pages for debug builds, perhaps with a lower frequency (once a day?), 
without cluttering up the main two pages.

From guido at  Tue Feb 21 17:31:49 2006
From: guido at (Guido van Rossum)
Date: Tue, 21 Feb 2006 08:31:49 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/21/06, Alex Martelli <aleaxit at> wrote:
> On Feb 21, 2006, at 1:51 AM, Greg Ewing wrote:
>     ...
> > Just one more thing -- have you made a final decision
> > about the name yet? I'd still prefer something like
> > 'autodict', because to me 'defaultdict' suggests
> autodict is shorter and sharper and I prefer it, too: +1

Apart from it somehow hashing to the same place as "autodidact" in my
brain :), I don't like it as much; someone who doesn't already know
what it is doesn't have a clue what an "automatic dictionary" would
offer compared to a regular one. IMO "default" conveys just enough of
a hint that something is being defaulted. A name long enough to convey
all the details of why, when, and how it defaults wouldn't be practical.
(Look up the history of botanical names under Linnaeus for a simile.)

I'll let it brew in SF for a while but I expect to be checking this in at PyCon.

--Guido van Rossum (home page:

From bokr at  Tue Feb 21 18:09:36 2006
From: bokr at (Bengt Richter)
Date: Tue, 21 Feb 2006 17:09:36 GMT
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <>

On Tue, 21 Feb 2006 05:58:52 -0800, "Guido van Rossum" <guido at> wrote:

>On 2/20/06, Bengt Richter <bokr at> wrote:
>> How about doing it as an expression, empowering ( ;-) the dict just after creation?
>> E.g., for
>>     d = dict()
>>     d.default_factory = list
>> you could write
>>     d = dict()**list
>Bengt, can you let your overactive imagination rest for a while? I
>recommend that you sit back, relax for a season, and reflect on the
>zen nature of Pythonicity. Then come back and hopefully you'll be able
>to post without embarrassing yourself continuously.
It is tempting to seek vindication re "embarrassing yourself continuously"
but I'll let it go, and treat it as an opportunity to explore the nature
of my ego a little further ;-)

I am not embarrassed by having an "overactive imagination," thank you,
but if it is causing a problem for you here, I apologize, and will withdraw.
Thanks for the nudge. I really have been wasting a lot of time using python
trivial pursuits as an escape from tackling stuff that I haven't been ready for.
It's time I focus. Thanks, and good luck. I'll be off now ;-)

Bengt Richter

From rrr at  Tue Feb 21 18:40:37 2006
From: rrr at (Ron Adam)
Date: Tue, 21 Feb 2006 11:40:37 -0600
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was: PEP
 332	revival in coordination with pep 349?]]
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
	<>	<>
	<> <>
Message-ID: <>

Greg Ewing wrote:
> Ron Adam wrote:
>> Storing byte information as 16 or 32 bits ints could take up a rather 
>> lot of memory in some cases.
> I don't quite see the point here. Inside a bytes object,
> they would be stored 1 byte per byte. Nobody is suggesting
> that they would take up more than that just because
> a_bytes_object[i] happens to return an int.

Yes, and the above is the obvious reason why not.  Not that I thought it 
was being considered.

> So the only reason to introduce a new "byte" type is to
> remove some of the operations that int has. We can already
> do bitwise operations on an int, so we don't need a new
> type to add that capability.

Yes, and a byte type isn't needed if the individual bytes are always in 
a bytes object. A bytes object with a single byte would be an octet in 
that case.

> What's more, I can see this leading to people asking for
> arithmetic operations to be *added* to the byte type so
> they can do wrap-around arithmetic, and then for 16-bit,
> 32-bit, 64-bit etc. versions of it, etc. etc.

I agree the bytes object shouldn't reimplement arithmetic.  I would 
like bitwise logic operations on bytes() and byte ranges() if possible.
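
A sketch of what element-wise bitwise logic on byte sequences could look like, assuming the Python 3 bytes type, where indexing or iterating yields plain ints. The helper names xor_bytes and and_mask are hypothetical and are not part of any proposal in this thread.

```python
def xor_bytes(a, b):
    """Element-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    # Iterating over bytes yields ints, so ^ is ordinary int XOR.
    return bytes(x ^ y for x, y in zip(a, b))

def and_mask(data, mask):
    """AND every byte of data with a single integer mask in range(256)."""
    return bytes(x & mask for x in data)
```

No separate "byte" type is involved: each element is handled as an int and the result is packed back into a bytes object.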

    Ronald Adam

From barry at  Tue Feb 21 19:19:24 2006
From: barry at (Barry Warsaw)
Date: Tue, 21 Feb 2006 13:19:24 -0500
Subject: [Python-Dev] Deprecate ``multifile``?
In-Reply-To: <dtfefp$q9u$>
References: <dt4dst$tma$> <dt4gqa$8i5$>
	<dt4hf2$adn$> <>
Message-ID: <>

On Tue, 2006-02-21 at 17:18 +0100, Georg Brandl wrote:

> > IIRC, when I brought this up ages ago, there was some grumbling that
> > multifile is useful for other than email/MIME applications.  Still, I'm
> > +1 on PEP 4'ing it.
> Which means "go ahead" or "wait for others to be -1"? <wink>

s/or/and/ ? :)

I say go ahead and add it to PEP 4.


From barry at  Tue Feb 21 19:26:45 2006
From: barry at (Barry Warsaw)
Date: Tue, 21 Feb 2006 13:26:45 -0500
Subject: [Python-Dev] A codecs nit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 2006-02-18 at 14:44 +0100, M.-A. Lemburg wrote:

> In Py 2.5 we'll change that. The encodings package search
> function will only allow codecs in that package to be
> imported. All other codec packages will have to provide
> their own search function and register this with the
> codecs registry.

My weekend experimentation used the imp functions to constrain the
module search path to encodings.__path__, but I'm not sure that's much
better than prepending 'encodings.' on the module name and letting
__import__() do its thing.
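
A rough sketch of the second option (prepending 'encodings.' and letting __import__() do the work). The function name load_builtin_codec_module is hypothetical, and the name normalisation here is a simplification of what the real encodings search function does.

```python
def load_builtin_codec_module(name):
    # Map 'utf-8' style names to module-style 'utf_8' (simplified).
    modname = "encodings." + name.lower().replace("-", "_")
    try:
        # A non-empty fromlist makes __import__ return the submodule itself
        # rather than the top-level 'encodings' package.
        return __import__(modname, fromlist=["getregentry"])
    except ImportError:
        # Nothing outside the encodings package is ever consulted.
        return None
```

Because the module name always starts with 'encodings.', a third-party codec module elsewhere on sys.path cannot be picked up this way.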

> The big question is: what to do about 2.3 and 2.4 - adding
> the same patch will cause serious breakage, since popular
> codec packages such as Tamito's Japanese package rely
> on the existing behavior.

FWIW, Mailman has had to do a bunch of special case loading of the 3rd
party Japanese and Korean codecs for older Pythons, and the email
package also has to do special tests for e.g. euc-jp before it'll do the
Asian codec tests.  I think most of the latter is unnecessary for 2.4
and beyond, and I suspect that the former is also unnecessary for 2.4
and beyond.  It's probably still necessary for 2.3.

IIUC, there are still people who prefer Tamito's package over the
built-in Japanese codecs in 2.4, but I don't understand all the details.
My preference would be to backport the fix to 2.4 but not worry about
2.3 since there are no plans to ever release a 2.3.6 AFAIK.



From rasky at  Tue Feb 21 19:47:04 2006
From: rasky at (Giovanni Bajo)
Date: Tue, 21 Feb 2006 19:47:04 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
References: <>
Message-ID: <0afe01c63717$351ff4a0$bf03030a@trilan>

Almann T. Goo < at> wrote:

>> 1.  Adding a keyword such as "use" that would follow similar semantics as
>> "global" does today.  A nested scope could declare names with this keyword
>> to enable assignment to such names to change the closest parent's

+0, and I like "outer". I like the idea, but I grepped several Python
programs I wrote, and found out that I used the list trick many times, but
almost always in quick-hack code in unittests. I wasn't able to find a
single instance of this in real code I wrote, so I can't really be +1.
Giovanni Bajo

From jcarlson at  Tue Feb 21 19:52:48 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 21 Feb 2006 10:52:48 -0800
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Stephen J. Turnbull wrote:
> > What I advocate for Python is to require that the standard base64
> > codec be defined only on bytes, and always produce bytes.
> I don't understand that. It seems quite clear to me that
> base64 encoding (in the general sense of encoding, not the
> unicode sense) takes binary data (bytes) and produces characters.
> That's the whole point of base64 -- so you can send arbitrary
> data over a channel that is only capable of dealing with
> characters.
> So in Py3k the correct usage would be
>                    base64            unicode
>                    encode            encode(x)
>    original bytes --------> unicode ---------> bytes for transmission
>                   <--------         <---------
>                    base64            unicode
>                    decode            decode(x)
> where x is whatever unicode encoding the transmission
> channel uses for characters (probably ascii or an ascii
> superset, but not necessarily).

It doesn't seem strange to you to need to encode data twice to be able
to have a usable sequence of characters which can be embedded in an
effectively 7-bit email; when base64 was, dare I say it, designed to
have 7-bit email as its destination in the first place?  It does to me.
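
The two positions can be compared concretely. This is a sketch assuming the Python 3 base64 module, where b64encode accepts bytes and returns bytes; the variable names are illustrative only.

```python
import base64

payload = bytes([0, 255, 16, 32])        # arbitrary binary data

# bytes -> bytes: b64encode already produces 7-bit-safe output.
as_bytes = base64.b64encode(payload)

# The two-step route from the diagram: treat the result as characters,
# then encode those characters for the transmission channel.
as_text = as_bytes.decode("ascii")
wire = as_text.encode("ascii")

# Either form decodes back to the original data.
roundtrip = base64.b64decode(wire)
```

With an ASCII channel the second step is a no-op on the byte values, which is the redundancy being objected to here.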

 - Josiah

From python at  Tue Feb 21 19:53:24 2006
From: python at (Raymond Hettinger)
Date: Tue, 21 Feb 2006 13:53:24 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <><><><><>
Message-ID: <007a01c63718$18335d40$6a01a8c0@RaymondLaptop1>

>> > Just one more thing -- have you made a final decision
>> > about the name yet? I'd still prefer something like
>> > 'autodict', because to me 'defaultdict' suggests
>> autodict is shorter and sharper and I prefer it, too: +1
> Apart from it somehow hashing to the same place as "autodidact" in my
> brain :), I don't like it as much; someone who doesn't already know
> what it is doesn't have a clue what an "automatic dictionary" would
> offer compared to a regular one. IMO "default" conveys just enough of
> a hint that something is being defaulted. A name long enough to convey
> all the details of why, when, and how it defaults wouldn't be practical.
> (Look up the history of botanical names under Linnaeus for a simile.)

I'm with Guido on this one.  The word default is closely associated with what 
makes this different from regular dictionaries and it is closely associated with 
the name of the attribute, default_factory.  Also, the word has a history of 
parallel use in the context of dict.get().

The word "auto" on the other hand is associated with nothing.  You might as well 
argue to call it magicdictionary because "magic" has two letters less than 
"default" ;-)
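
A short sketch of the parallel between dict.get() and the default_factory attribute, assuming the collections.defaultdict that shipped in Python 2.5. The sample data is made up for illustration.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana"]

# Plain dict: the default is supplied at each call site through get().
plain = {}
for w in words:
    plain[w[0]] = plain.get(w[0], []) + [w]

# defaultdict: the default comes from the default_factory attribute.
auto = defaultdict(list)
for w in words:
    auto[w[0]].append(w)
```

Both loops group the words by first letter; only the place where the default is stated differs.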


From tjreedy at  Tue Feb 21 20:16:17 2006
From: tjreedy at (Terry Reedy)
Date: Tue, 21 Feb 2006 14:16:17 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
References: <><>
Message-ID: <dtfou0$81t$>

"Almann T. Goo" < at> wrote in message 
news:7e9b97090602210516o5d1a823apedcea66846a271b5 at

> I certainly hope that an initiative like this doesn't get stymied by
> the lack of a good name for such a keyword.  Maybe something like
> "outer"?

Adding a keyword has a cost that you have so far ignored.  Guido is 
rightfully very cautious about additions, especially for esthetic reasons.

The issue of rebinding enclosed names was partly discussed in PEP 227. 
Sometime after the implementation of the PEP in 2.1, it was thoroughly 
discussed again (100+ posts?) in this forum.  There were perhaps 10 
different proposals, including, I believe, 'outer'.  Guido rejected them 
all as having costs greater than the benefits.  Perhaps you can find this 
discussion in the archives.  I remember it as a Jan-Feb discussion but 
might be wrong.

This thread so far seems like a rehash of parts of the earlier discussion. 
In the absence of indication from Guido that he is ready to reopen the 
issue, perhaps it would better go to comp.lang.python.  In any case, 
reconsideration is more likely to be stimulated by new experience with 
problems in real code than by repeats of 'orthogonality' desires and 
rejected changes.


In another post, you rejected the use of class instances by opining:

>Because I think that this is a workaround for a concept that the
>language doesn't support elegantly with its lexically nested scopes.

>IMO, you are emulating name rebinding in a closure by creating an
>object to encapsulate the name you want to rebind

Guido, on the other hand, views classes and instances as Python's method of 
doing what other (functional) languages do with closures.  From the PEP:
    "Given that this
    would encourage the use of local variables to hold state that is
    better stored in a class instance, it's not worth adding new
    syntax to make this possible (in Guido's opinion)."
He reiterated this viewpoint in the post-PEP discussion mentioned above.  I 
think he would specifically reject the view that Python's alternative is a 
'workaround' and 'emulation' of what you must consider to be the real 
thing.

Terry Jan Reedy

From jeremy at  Tue Feb 21 20:25:06 2006
From: jeremy at (Jeremy Hylton)
Date: Tue, 21 Feb 2006 14:25:06 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <dtfou0$81t$>
References: <>
Message-ID: <>

On 2/21/06, Terry Reedy <tjreedy at> wrote:
> "Almann T. Goo" < at> wrote in message
> news:7e9b97090602210516o5d1a823apedcea66846a271b5 at
> > I certainly hope that an initiative like this doesn't get stymied by
> > the lack of a good name for such a keyword.  Maybe something like
> > "outer"?
> Adding a keyword has a cost that you have so far ignored.  Guido is
> rightfully very cautious about additions, especially for esthetic reasons.
> The issue of rebinding enclosed names was partly discussed in PEP 227.
> Sometime after the implementation of the PEP in 2.1, it was thoroughly
> discussed again (100+ posts?) in this forum.  There were perhaps 10
> different proposals, including, I believe, 'outer'.  Guido rejected them
> all as having costs greater than the benefits.  Perhaps you can find this
> discussion in the archives.  I remember it as a Jan-Feb discussion but
> might be wrong.

If I recall the discussion correctly, Guido said he was open to a
version of nested scopes that allowed rebinding.  Not sure that the
specifics of the previous discussion are necessary, but I recall being
surprised by the change in opinion since 2.1 :-).


> This thread so far seems like a rehash of parts of the earlier discussion.
> In the absence of indication from Guido that he is ready to reopen the
> issue, perhaps it would better go to comp.lang.python.  In any case,
> reconsideration is more likely to be stimulated by new experience with
> problems in real code than by repeats of 'orthogonality' desires and
> rejected changes.
> ---
> In another post, you rejected the use of class instances by opining:
> >Because I think that this is a workaround for a concept that the
> >language doesn't support elegantly with its lexically nested scopes.
> >IMO, you are emulating name rebinding in a closure by creating an
> >object to encapsulate the name you want to rebind
> Guido, on the other hand, views classes and instances as Python's method of
> doing what other (functional) languages do with closures.  From the PEP:
>     "Given that this
>     would encourage the use of local variables to hold state that is
>     better stored in a class instance, it's not worth adding new
>     syntax to make this possible (in Guido's opinion)."
> He reiterated this viewpoint in the post-PEP discussion mentioned above.  I
> think he would specifically reject the view that Python's alternative is a
> 'workaround' and 'emulation' of what you must consider to be the real
> thing.
> Terry Jan Reedy

From g.brandl at  Tue Feb 21 20:26:23 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 20:26:23 +0100
Subject: [Python-Dev] Deprecate ``multifile``?
In-Reply-To: <>
References: <dt4dst$tma$>
	<dt4gqa$8i5$>	<dt4hf2$adn$>
	<>	<dtfefp$q9u$>
Message-ID: <dtfpgv$9mb$>

Barry Warsaw wrote:
> On Tue, 2006-02-21 at 17:18 +0100, Georg Brandl wrote:
>> > IIRC, when I brought this up ages ago, there was some grumbling that
>> > multifile is useful for other than email/MIME applications.  Still, I'm
>> > +1 on PEP 4'ing it.
>> Which means "go ahead" or "wait for others to be -1"? <wink>
> s/or/and/ ? :)
> I say go ahead and add it to PEP 4.

Done, and added a note in the docs. More will not be needed until
3.0, I suppose.


From python at  Tue Feb 21 20:33:53 2006
From: python at (Raymond Hettinger)
Date: Tue, 21 Feb 2006 14:33:53 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <003d01c6371d$c0364840$6a01a8c0@RaymondLaptop1>

Then you will likely be happy with Guido's current version of the patch.

----- Original Message ----- 
From: "Crutcher Dunnavant" <crutcher at>
To: "Raymond Hettinger" <python at>
Cc: "Python Dev" <python-dev at>
Sent: Monday, February 20, 2006 8:57 PM
Subject: Re: [Python-Dev] defaultdict proposal round three

in two ways:

1) dict.get doesn't work for object dicts or in exec/eval contexts, and
2) dict.get requires me to generate the default value even if I'm not
going to use it, a process which may be expensive.
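
A sketch of point 2 and of the "give me a value without saving it" behaviour, assuming the __missing__ hook that dict subclasses support from Python 2.5 onward. LazyDefaultDict and expensive_default are hypothetical names used only for illustration.

```python
calls = []

def expensive_default():
    calls.append(1)              # record every time the default is computed
    return "computed"

d = {"present": "stored"}

# dict.get evaluates its second argument before the lookup happens,
# so the default is computed even though the key exists.
value = d.get("present", expensive_default())

class LazyDefaultDict(dict):
    """Compute a default on a miss, but do not store it in the dict."""
    def __init__(self, factory, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        return self.factory()    # returned only; the key is not inserted

lazy = LazyDefaultDict(expensive_default, present="stored")
```

Here lazy["present"] never calls the factory, and lazy["absent"] calls it once without adding "absent" to the dictionary.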

On 2/20/06, Raymond Hettinger <python at> wrote:
> [Crutcher Dunnavant ]
> >> There are many times that I want d[key] to give me a value even when
> >> it isn't defined, but that doesn't always mean I want to _save_ that
> >> value in the dict.
> How does that differ from the existing dict.get method?
> Raymond

Crutcher Dunnavant <crutcher at>

From jcarlson at  Tue Feb 21 20:31:50 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 21 Feb 2006 11:31:50 -0800
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:
> Josiah Carlson wrote:
> > Mechanisms which rely on manipulating variables within closures or
> > nested scopes to function properly can be elegant, but I've not yet seen
> > one that *really* is.
> It seems a bit inconsistent to say on the one hand
> that direct assignment to a name in an outer scope
> is not sufficiently useful to be worth supporting,
> while at the same time providing a way to do it for
> one particular scope, i.e. 'global'. Would you
> advocate doing away with it?

I didn't conceive of the idea or implementation of 'global', it was
before my time.  I have found that *using* global can be convenient (and
sometimes even directly manipulating globals() can be even more
convenient).  However, I believe global was and is necessary for the
same reasons for globals in any other language.

Are accessors for lexically nested scopes necessary?  Obviously no.  The
arguments for their inclusion are: easier access to parent scopes and
potentially faster execution.

The question which still remains in my mind, which I previously asked,
is whether the use cases are compelling enough to warrant the feature

> > Of course using
> > classes directly with a bit of work can offer you everything you want
> > from a closure, with all of the explicitness that you could ever want.
> There are cases where the overhead (in terms of amount
> of code) of defining a class and creating an instance of
> it swamps the code which does the actual work, and,
> I feel, actually obscures what is being done rather
> than clarifies it. These cases benefit from the ability
> to refer to names in enclosing scopes, and I believe
> they would benefit further from the ability to assign
> to such names.

    class namespace: pass

    def fcn(...):
        foo = namespace()

Overwhelms the user?
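
For reference, a runnable sketch of how that two-line pattern covers the my_func/inc_x counter example from this thread. The capitalised class name Namespace is a cosmetic choice for the sketch.

```python
class Namespace(object):
    """Empty class used only as a bag of attributes."""
    pass

def my_func():
    foo = Namespace()
    foo.x = 42
    def inc_x():
        foo.x += 1       # mutates the object; the name 'foo' is never rebound
        return foo.x
    return inc_x
```

Each call to my_func() creates a fresh Namespace instance, so separate closures do not share state.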

> Certainly the feature could be abused, as can the
> existing nested scope facilities, or any other language
> feature for that matter. Mere potential for abuse is
> not sufficient reason to reject a feature, or the
> language would have no features at all.

Indeed, but as I have asked, I would like to see some potential
nontrivial *uses*. No one has responded to this particular request. When
I am confronted with a lack of uses, and the potential for abuses, I'm
going to have to side on "no thanks, the potential abuse outweighs the
nonexistent nontrivial use".

> Another consideration is efficiency. CPython currently
> implements access to local variables (both in the
> current scope and all outer ones except the module
> scope) in an extremely efficient way. There's
> always the worry that using attribute access in
> place of local variable access is greatly increasing
> the runtime overhead for no corresponding benefit.

Indeed, the only benefit to using classes is that you gain explicitness. 
To gain speed in current Python, one may need to do a bit more work
(slots, call frame hacking, perhaps an AST manipulation with the new AST
branch, etc.).

> You mention the idea of namespaces. Maybe an answer
> is to provide some lightweight way of defining a
> temporary, single-use namespace for use within
> nested scopes -- lightweight in terms of both code
> volume and runtime overhead. Perhaps something like
>    def my_func():
>      namespace foo
>      foo.x = 42
>      def inc_x():
>        foo.x += 1

Because this discussion is not about "how do I create a counter in
Python", let's see some examples which are not counters and which are
improved through the use of this "namespace", or "use", "scope", etc.

> > Introducing these two new keywords is equivalent to
> > encouraging nested scope use.  Right now nested scope
> > use is "limited" or "fraught with gotchas".
> What you seem to be saying here is: Nested scope use
> is Inherently Bad. Therefore we will keep them Limited
> and Fraught With Gotchas, so people will be discouraged
> from using them.
> Sounds a bit like the attitude of certain religious
> groups to condoms. (Might encourage people to have
> sex -- can't have that -- look at all the nasty diseases
> you can get!)

If you take that statement within the context of the other things I had
been saying in regards to closures and nested scopes, namely that I find
their use rarely, if ever, truly elegant, it becomes less like "condom
use" as purported by some organizations, and more like kicking a puppy
for barking: it is of my opinion that there are usually better ways of
dealing with the problem (don't kick puppies for barking and don't use

 - Josiah

From walter at  Tue Feb 21 20:36:31 2006
From: walter at (Walter Dörwald)
Date: Tue, 21 Feb 2006 20:36:31 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	
	<> <>	
	<> <>	
	<> <>	
Message-ID: <>

Hye-Shik Chang wrote:

> On 2/19/06, Walter Dörwald <walter at> wrote:
>> M.-A. Lemburg wrote:
>>> Walter Dörwald wrote:
>>>> Anyway, I've started implementing a patch that just adds
>>>> codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
>>>> UTF-16, UTF-16-LE and UTF-16-BE are already working.
>>> Nice :-)
>> is updated now too. The rest should be manageable too.
>> I'll leave updating the CJKV codecs to Hye-Shik though.
> Okay. I'll look into how CJK codecs can be improved by the
> new protocol soon.  I guess it'll be not so difficult because CJK
> codecs have their own common stateful framework already.

OK, here's the patch: (assigned to MAL).

> BTW, CJK codecs don't have V yet.  :-)

   Walter Dörwald

From mal at  Tue Feb 21 20:48:06 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 21 Feb 2006 20:48:06 +0100
Subject: [Python-Dev] Stateful codecs [Was: str object going in Py3K]
In-Reply-To: <>
References: <>	<>
	<>	<>
	<>	<>
	<>	<>	<>	<>	<>
Message-ID: <>

Walter Dörwald wrote:
> Hye-Shik Chang wrote:
>> On 2/19/06, Walter Dörwald <walter at> wrote:
>>> M.-A. Lemburg wrote:
>>>> Walter Dörwald wrote:
>>>>> Anyway, I've started implementing a patch that just adds
>>>>> codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
>>>>> UTF-16, UTF-16-LE and UTF-16-BE are already working.
>>>> Nice :-)
>>> is updated now too. The rest should be manageable too.
>>> I'll leave updating the CJKV codecs to Hye-Shik though.
>> Okay. I'll look into how CJK codecs can be improved by the
>> new protocol soon.  I guess it'll be not so difficult because CJK
>> codecs have their own common stateful framework already.
> OK, here's the patch: (assigned to MAL).

Thanks. I won't be able to look into it this week though, probably
next week.

Marc-Andre Lemburg


From pje at  Tue Feb 21 21:25:38 2006
From: pje at (Phillip J. Eby)
Date: Tue, 21 Feb 2006 15:25:38 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

At 11:31 AM 2/21/2006 -0800, Josiah Carlson wrote:

>Greg Ewing <greg.ewing at> wrote:
> >
> > It seems a bit inconsistent to say on the one hand
> > that direct assignment to a name in an outer scope
> > is not sufficiently useful to be worth supporting,
> > while at the same time providing a way to do it for
> > one particular scope, i.e. 'global'. Would you
> > advocate doing away with it?
>I didn't conceive of the idea or implementation of 'global', it was
>before my time.  I have found that *using* global can be convenient (and
>sometimes even directly manipulating globals() can be even more
>convenient).  However, I believe global was and is necessary for the
>same reasons for globals in any other language.

Here's a crazy idea, that AFAIK has not been suggested before and could 
work for both globals and closures: using a leading dot, à la the new 
relative import feature.  e.g.:

    def incrementer(val):
        def inc():
            .val += 1
            return .val
        return inc

The '.' would mean "this name, but in the nearest outer scope that defines 
it".  Note that this could include the global scope, so the 'global' 
keyword could go away in 2.5.  And in Python 3.0, the '.' could become 
*required* for use in closures, so that it's not necessary for the reader 
to check a function's outer scope to see whether closure is taking 
place.  EIBTI.

Interestingly, the absence of a name before the dot seems to imply that the 
name is an attribute of the Unnameable.  :)  Or more prosaically, it treats 
lexical closures and module globals as special cases of objects.

You could perhaps even extend it so that '.' by itself means the same thing 
as vars(), but that's probably going too far, assuming that the idea wasn't 
too far gone to begin with.  I suspect functional folks will love the '.' 
idea, but also that folks who wanted to get rid of 'self' will probably 
scream bloody murder at the idea of using a leading dot to represent a 
scope instead of 'self'.  :)

From mbland at  Tue Feb 21 21:39:53 2006
From: mbland at (Mike Bland)
Date: Tue, 21 Feb 2006 12:39:53 -0800
Subject: [Python-Dev] PEP 343 "with" statement patch
Message-ID: <>

With Neal Norwitz's help, I've submitted an initial patch to implement
the "with" statement from PEP 343 (SourceForge request ID 1435715). 
There is a little more work to be done (on the doc especially), and I
have a couple of questions written up on the SourceForge page, but the
code works to the best of my understanding of PEP 343 and has a fairly
comprehensive set of unit tests to verify it.

Looking forward to the review,


From mrussell at  Tue Feb 21 21:41:26 2006
From: mrussell at (Mark Russell)
Date: Tue, 21 Feb 2006 20:41:26 +0000
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 21 Feb 2006, at 19:25, Jeremy Hylton wrote:
> If I recall the discussion correctly, Guido said he was open to a
> version of nested scopes that allowed rebinding.

PEP 227 mentions using := as a rebinding operator, but rejects the  
idea as it would encourage the use of closures.  But to me it seems  
more elegant than some special keyword, especially if it could also  
replace the "global" keyword.  It doesn't handle things like "x += y"  
but I think you could deal with that by just writing "x := x + y".

BTW I do think there are some cases where replacing a closure with a  
class is not an improvement.  For example (and assuming the existence  
of :=):

     def check_items(items):
         had_error = False

         def err(mesg):
             print mesg
             had_error := True

         for item in items:
             if too_big(item):
                 err("Too big")
             if too_small(item):
                 err("Too small")

         if had_error:
             print "Some items were out of range"

Using a class for this kind of trivial bookkeeping just adds  
boilerplate and obscures the main purpose of the code:

     def check_items(items):
         class NoteErrors (object):
             def __init__(self):
                 self.had_error = False

             def __call__(self, mesg):
                 print mesg
                 self.had_error = True

         err = NoteErrors()

         for item in items:
             if too_big(item):
                 err("Too big")
             if too_small(item):
                 err("Too small")

         if err.had_error:
             print "Some items were out of range"
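
For comparison, a third spelling that needs neither := nor a class is to
keep the flag in a mutable container, so the nested function mutates an
object instead of rebinding an outer name. This is only a minimal sketch in
Python 3 syntax: too_big and too_small are passed in as parameters purely
for illustration, and messages are collected rather than printed.

```python
def check_items(items, too_big, too_small):
    # A one-element list stands in for the rebindable flag: the nested
    # function mutates the list instead of rebinding the outer name.
    had_error = [False]
    messages = []

    def err(mesg):
        messages.append(mesg)
        had_error[0] = True

    for item in items:
        if too_big(item):
            err("Too big")
        if too_small(item):
            err("Too small")

    return had_error[0], messages


# Hypothetical thresholds, chosen only to exercise both branches.
result = check_items([1, 50, 200], lambda x: x > 100, lambda x: x < 10)
```

The workaround runs, but the had_error[0] indexing is arguably the same
kind of noise the class version adds.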

Any chance of := (and removing "global") in python 3K?

Mark Russell

From just at  Tue Feb 21 22:01:55 2006
From: just at (Just van Rossum)
Date: Tue, 21 Feb 2006 22:01:55 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
Message-ID: <r01050400-1039-4AD7D4ADA31D11DA8736001124365170@[]>

Mark Russell wrote:

> PEP 227 mentions using := as a rebinding operator, but rejects the  
> idea as it would encourage the use of closures.  But to me it seems  
> more elegant than some special keyword, especially if it could also  
> replace the "global" keyword.  It doesn't handle things like "x += y"  
> but I think you could deal with that by just writing "x := x + y".

Actually, it could handle += just fine, since that operator has written
"rebinding" all over it...

I'd be +1 on := (and augmented assignment being rebinding), but the
argument against it (if I recall correctly) was that rebinding should be
a property of the name, not of the operator. Yet "declaring" a name
local is also done through an operator: a = 1 means a is local (unless it
was declared global). It can definitely be argued either way.
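
The point that assignment operators already decide locality can be seen
with augmented assignment in a nested function. A minimal sketch, assuming
Python 3 syntax and an interpreter without any rebinding extension; the
names counter and bump are made up for the illustration.

```python
def counter():
    count = 0

    def bump():
        # The augmented assignment makes 'count' local to bump(), so the
        # read half of += finds no local binding yet.
        count += 1
        return count

    return bump


try:
    counter()()
    outcome = "rebound the outer name"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

Here outcome ends up as "UnboundLocalError": the operator alone made the
name local, with no declaration involved.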

Btw, PJE's "crazy" idea (.name, to rebind an outer name) was proposed
before, but Guido wanted to reserve .name for a (Pascal-like) 'with'
statement. Hmm,
confirms that, although it wasn't in response to a rebinding syntax. So
maybe it wasn't proposed before after all...


From ianb at  Tue Feb 21 22:13:22 2006
From: ianb at (Ian Bicking)
Date: Tue, 21 Feb 2006 15:13:22 -0600
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>	<>	<dtfou0$81t$>	<>
Message-ID: <>

Mark Russell wrote:
> On 21 Feb 2006, at 19:25, Jeremy Hylton wrote:
>>If I recall the discussion correctly, Guido said he was open to a
>>version of nested scopes that allowed rebinding.
> PEP 227 mentions using := as a rebinding operator, but rejects the  
> idea as it would encourage the use of closures.  But to me it seems  
> more elegant than some special keyword, especially if it could also  
> replace the "global" keyword.  It doesn't handle things like "x += y"  
> but I think you could deal with that by just writing "x := x + y".

By rebinding operator, does that mean it is actually an operator?  I.e.:

   # Required assignment to declare?:
   chunk = None
   while chunk :=

Ian Bicking  /  ianb at  /

From martin at  Tue Feb 21 22:25:49 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:25:49 +0100
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <013801c63682$eb39a5f0$7600a8c0@RaymondLaptop1>
References: <><><><>	<>
Message-ID: <>

Raymond Hettinger wrote:
>>Yes, I now agree. This means that I'm withdrawing proposal A (new
>>method) and championing only B (a subclass that implements
>>__getitem__() calling on_missing() and on_missing() defined in that
>>subclass as before, calling default_factory unless it's None). I don't
>>think this crisis is big enough to need *two* solutions, and this
>>example shows B's superiority over A.
> FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5.

I agree. I would have preferred if dict itself was modified, but after
ruling out changes to dict.__getitem__, d[k]+=1 is too important to
not support it.
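
A short sketch of the d[k] += 1 use case, assuming Python 3 and the
collections.defaultdict type in which this proposal eventually shipped; the
word list is made up for the example.

```python
from collections import defaultdict

counts = defaultdict(int)  # int() returns 0, used for missing keys
for word in ["spam", "eggs", "spam"]:
    counts[word] += 1

# A plain dict has no default, so the same statement fails on a new key.
plain = {}
try:
    plain["spam"] += 1
    plain_outcome = "ok"
except KeyError:
    plain_outcome = "KeyError"
```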


From g.brandl at  Tue Feb 21 22:27:26 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 22:27:26 +0100
Subject: [Python-Dev] Two patches
Message-ID: <dtg0jv$54r$>


I have two patches lying around here, please comment:

* I think I've submitted this one to the tracker, but can't remember:
  It's for PySequence_SetItem and makes something like this possible:

  tup = ([], )
  tup[0] += [1]

  I can upload it once more to allow review.

* One patch for staticmethod and classmethod, which currently silently
  accept keyword arguments and throw them away. The patch adds error
  messages.


From martin at  Tue Feb 21 22:34:27 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:34:27 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
Message-ID: <>

Tim Peters wrote:
> Speaking of which, a number of test failures over the past few weeks
> were provoked here only under -r (run tests in random order) or under
> a debug build, and didn't look like those were specific to Windows. 
> Adding -r to the buildbot test recipe is a decent idea.  Getting
> _some_ debug-build test runs would also be good (or do we do that
> already?).

So what is your recipe: Add -r to all buildbots? Only to those which
have an 'a' in their name? Only to every third build? Duplicating
the number of builders?

Same question for --with-pydebug. Combining this with -r would multiply
the number of builders by 4 already.

I'm not keen on deciding this for myself. Somebody else please decide
for me.


From martin at  Tue Feb 21 22:36:51 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:36:51 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>	<>	<>	<>
Message-ID: <>

Neal Norwitz wrote:
>>How many of those do you see when you ignore the warnings you get
>>while building the Carbon extensions? Those extensions wrap loads of
>>deprecated functions, each of which will give a warning.
> Right:
> Most but not all of the warnings are due to Carbon AFAICT.  I'd like
> to fix those that are important, but it's so far down on the priority
> list. :-(

Should we build with -Wno-deprecated (or whatever it is spelled) on OSX?
In general, "deprecated" warnings are useless for Python. We *know* we
are providing wrappers around many deprecated functions. We will (nearly
automatically) discontinue wrapping the functions when they get removed.


From martin at  Tue Feb 21 22:38:53 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:38:53 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <>
	<>	<>	<>
	<>	<>	<>	<>	<>
Message-ID: <>

skip at wrote:
>     Ronald> How many of those do you see when you ignore the warnings you
>     Ronald> get while building the Carbon extensions?
>     skip> I see a bunch related to Py_ssize_t.  Those have nothing to do
>     skip> with Carbon.  I don't see them on the gentoo build, so I assume
>     skip> they just haven't been tackled yet.
> Let me rephrase that.  I assume the people digging through Py_ssize_t issues
> have been looking at compilation warnings for platforms other than Mac OSX.

In the buildbot log, I see only a single one of these, and only in an
OSX-specific module. So no - "we" don't look into fixing them, as they
don't occur on Linux at all (as _Qdmodule isn't built on Linux).


From martin at  Tue Feb 21 22:39:58 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:39:58 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <>
	<>	<>	<>
	<>	<>	<>
Message-ID: <>

skip at wrote:
>     Neal> IMO compiler warnings should generate emails from buildbot.  
> It doesn't generate emails for any other condition.  I think it should just
> turn the compilation section yellow.

It would be easy to run the builds with -Werror, so that any warning makes
the compilation fail, which in turn is flagged red.


From martin at  Tue Feb 21 22:41:13 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:41:13 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <dtff36$1js$>
References: <>	<><>	<><><>	<>
Message-ID: <>

Terry Reedy wrote:
> "Neal Norwitz" <nnorwitz at> wrote in message 
> news:ee2a432c0602210009x2f4d1fffl3d49037b9b084d1e at
>>There's nothing to prevent buildbot from making debug builds, though
>>that is not currently done.
> Now that there are separate report pages for 2.4 and 2.5, you could add 
> pages for debug builds, perhaps with a lower frequency (once a day?), 
> without cluttering up the main two pages.

Not soon, unless somebody has a complete recipe how to change the master


From skip at  Tue Feb 21 22:41:42 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 15:41:42 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    Martin> So what is your recipe: Add -r to all buildbots? Only to those
    Martin> which have an 'a' in their name? Only to every third build?
    Martin> Duplicating the number of builders?

    Martin> Same question for --with-pydebug. Combining this with -r would
    Martin> multiply the number of builders by 4 already.

    Martin> I'm not keen on deciding this for myself. Somebody else please
    Martin> decide for me.

Now that you've broken the buildbot page into two (trunk and 2.4) I assume
breaking it down even further wouldn't be a major undertaking.  If we can
recruit a suitable number of boxes I see no particular reason you can't
support a 2x, 4x, 8x or more increase in the number of buildbot slaves.


From martin at  Tue Feb 21 22:45:13 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:45:13 +0100
Subject: [Python-Dev] readline compilarion fails on OSX
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Guido van Rossum wrote:
> Thanks! That worked.
> But shouldn't we try to fix to detect this situation instead
> of making loud clattering noises?

One of my concerns with the distutils build process is that it
takes failures lightly. Unlike make, it won't stop when an error
occurs, but instead go on with the next module. Then, when you
retry make, it will retry building the module again, which will
again fail. Also happens with the curses module on Solaris.

buildbot's detection of build failures is restricted to the exit
status of the build process. The build either succeeds or fails.


From steven.bethard at  Tue Feb 21 22:47:42 2006
From: steven.bethard at (Steven Bethard)
Date: Tue, 21 Feb 2006 14:47:42 -0700
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/21/06, Josiah Carlson <jcarlson at> wrote:
> The question which still remains in my mind, which I previously asked,
> is whether the use cases are compelling enough to warrant the feature
> addition.

I don't know whether I support the proposal or not, but in reading
Mark Russell's email, I realized that I just recently ran into a use

# group tokens into chunks by their chunk labels
token_groups = []
curr_suffix = ''
curr_tokens = []
for token in document.IterAnnotations('token', percent=80):
    label = token[attr_name]

    # determine the prefix and suffix of the label
    prefix, suffix = label[0], label[2:]

    # B labels start a new chunk
    if prefix == 'B':
        curr_suffix = suffix
        curr_tokens = [token]
        token_groups.append((curr_suffix, curr_tokens))

    # I labels continue the previous chunk
    elif prefix == 'I':
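        if curr_suffix == suffix:
            curr_tokens.append(token)
        # error: change in suffix - this should be a B label
        else:
            # log the error
            message = '%r followed by %r'
            last_label = curr_tokens[-1][attr_name]
            # the name of the logging call is missing from the archive;
            # log_error is a placeholder for it
            log_error(message % (last_label, label))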
            # start a new chunk
            curr_suffix = suffix
            curr_tokens = [token]
            token_groups.append((curr_suffix, curr_tokens))

    # O labels end any previous chunks
    elif prefix == 'O':
        curr_suffix = suffix
        curr_tokens = [token]

You can see that the code::

        curr_suffix = suffix
        curr_tokens = [token]
        token_groups.append((curr_suffix, curr_tokens))

is repeated in two places.  I would have liked to factor this out into
a function, but since the code requires rebinding curr_suffix and
curr_tokens, I can't.  I'm not sure I care that much -- it's only
three lines of code and only duplicated once -- but using something
like ``curr_suffix :=`` or Phillip J. Eby's suggestion of
``.curr_suffix =`` would allow this code to be factored out into a
function.
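
Without a rebinding syntax, one possible way to factor out the repeated
three lines is to have the helper return the new state and rebind at the
call site. This is only a sketch in Python 3 syntax under simplifying
assumptions: tokens are represented by their positions, labels are plain
strings such as 'B-NP', and the names group_chunks and start_chunk are
invented for the illustration. The error logging is left out.

```python
def group_chunks(labels):
    """Group token positions into chunks by their B/I/O labels."""
    token_groups = []
    curr_suffix = ''
    curr_tokens = []

    def start_chunk(suffix, token):
        # Appending mutates token_groups, which needs no rebinding; the
        # new suffix and token list are returned for the caller to rebind.
        tokens = [token]
        token_groups.append((suffix, tokens))
        return suffix, tokens

    for token, label in enumerate(labels):
        prefix, suffix = label[0], label[2:]
        if prefix == 'B':
            curr_suffix, curr_tokens = start_chunk(suffix, token)
        elif prefix == 'I':
            if curr_suffix == suffix:
                curr_tokens.append(token)
            else:
                curr_suffix, curr_tokens = start_chunk(suffix, token)
        elif prefix == 'O':
            curr_suffix, curr_tokens = suffix, [token]
    return token_groups


groups = group_chunks(['B-NP', 'I-NP', 'O', 'I-VP', 'B-NP'])
```

The cost is that every call site has to repeat the tuple assignment, which
is the duplication a rebinding operator would remove.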

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From martin at  Tue Feb 21 22:53:40 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 22:53:40 +0100
Subject: [Python-Dev] Two patches
In-Reply-To: <dtg0jv$54r$>
References: <dtg0jv$54r$>
Message-ID: <>

Georg Brandl wrote:
> * I think I've submitted this one to the tracker, but can't remember:
>   It's for PySequence_SetItem and makes something like this possible:
>   tup = ([], )
>   tup[0] += [1]

That definitely needs fixing:

py> tup = ([], )
py> tup[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
py> tup
([1],)

Errors should never pass silently, but success shouldn't cause
an error message, either.
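
The behaviour in the session above can be reproduced as a script. A minimal
sketch, assuming Python 3 and an interpreter without the proposed patch:

```python
tup = ([],)
try:
    tup[0] += [1]
    raised = False
except TypeError:
    raised = True

# The list was extended in place before the tuple item assignment failed,
# so the exception is raised and the change is visible anyway.
state = (raised, tup)
```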

> * One patch for staticmethod and classmethod, which currently silently
>   accept keyword arguments and throw them away. The patch adds error
>   messages.

Sounds good as well.


From martin at  Tue Feb 21 23:00:08 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 21 Feb 2006 23:00:08 +0100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <>
	<>	<>	<>
	<>	<>	<>
Message-ID: <>

skip at wrote:
> Now that you've broken the buildbot page into two (trunk and 2.4) I assume
> breaking it down even further wouldn't be a major undertaking.  If we can
> recruit a suitable number of boxes I see no particular reason you can't
> support a 2x, 4x, 8x or more increase in the number of buildbot slaves.

Let me explain the procedure for breaking it down, then:

- there are builder objects in buildbot, each displayed as a single

- each builder now gets a "category" attribute; currently, the
  categories are "trunk" and "2.4".

- for each page, there is an instance of the Waterfall object,
  constructed with the list of categories to display. Each Waterfall
  gets its own port number (currently 9010, 9011, and 9012).

- There are reverse proxy rules in Apache's httpd.conf, each page
  requiring 2 lines (giving currently 6 lines of Apache configuration).

So for multiplying this by 8, I would have to create 48 lines of
Apache configuration, and use 24 TCP ports. This can be done, but
it would take some time to implement. And who is going to look
at the 24 pages?
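
The arithmetic behind those figures, as a small sketch in Python 3; the
variable names are made up, the numbers are the ones given above.

```python
# Figures taken from the description above.
current_pages = 3          # one Waterfall (and one TCP port) per page
apache_lines_per_page = 2  # reverse proxy rules per page
factor = 8

pages = current_pages * factor
tcp_ports = pages
apache_lines = pages * apache_lines_per_page
```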


From jjl at  Tue Feb 21 22:38:03 2006
From: jjl at (John J Lee)
Date: Tue, 21 Feb 2006 21:38:03 +0000 (UTC)
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <Pine.LNX.4.64.0602212124150.6171@alice>

On Tue, 21 Feb 2006, Guido van Rossum wrote:
> If you're only interested in classifying the three specific built-ins
> you mention, I'd check for the presence of certain attributes:
> hasattr(x, "lower") -> x is a string of some kind; hasattr(x, "sort")
> -> x is a list; hasattr(x, "update") -> x is a dict. Also, hasattr(x,
> "union") -> x is a set; hasattr(x, "readline") -> x is a file.

dict and set instances both have an .update() method.  I guess "keys" or 
"items" is a better choice for testing dict-ness, if using "LBYL" at all.

(anybody new to "LBYL" can google for that and EAFP -- latter does not 
stand for European Assoc. of Fish Pathologists in this context, though ;-)
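
A minimal LBYL sketch of that attribute test, assuming Python 3; the
function name classify and the order of the checks are choices made for
this illustration only.

```python
def classify(x):
    """Look-before-you-leap classification by attribute presence."""
    if hasattr(x, "lower"):
        return "string"
    if hasattr(x, "keys"):      # dicts have .keys(); sets do not
        return "dict"
    if hasattr(x, "union"):
        return "set"
    if hasattr(x, "sort"):
        return "list"
    if hasattr(x, "readline"):
        return "file"
    return "unknown"


# "update" cannot tell the two apart: both types provide it.
both_have_update = hasattr({}, "update") and hasattr(set(), "update")
kinds = [classify(v) for v in ("abc", [1], {"a": 1}, {1, 2})]
```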

> That's duck typing!

>>> hasattr(python, "quack")


From g.brandl at  Tue Feb 21 23:13:13 2006
From: g.brandl at (Georg Brandl)
Date: Tue, 21 Feb 2006 23:13:13 +0100
Subject: [Python-Dev] Two patches
In-Reply-To: <>
References: <dtg0jv$54r$> <>
Message-ID: <dtg39p$dut$>

Martin v. Löwis wrote:
> Georg Brandl wrote:
>> * I think I've submitted this one to the tracker, but can't remember:
>>   It's for PySequence_SetItem and makes something like this possible:
>>   tup = ([], )
>>   tup[0] += [1]
> That definitely needs fixing:
> py> tup = ([], )
> py> tup[0] += [1]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: object doesn't support item assignment
> py> tup
> ([1],)
> Errors should never pass silently, but success shouldn't cause
> an error message, either.

The patch is now at SF, item #1436226.

>> * One patch for staticmethod and classmethod, which currently silently
>>   accept keyword arguments and throw them away. The patch adds error
>>   messages.
> Sounds good as well.

Checked in to 2.5 branch.


From tjreedy at  Tue Feb 21 23:17:47 2006
From: tjreedy at (Terry Reedy)
Date: Tue, 21 Feb 2006 17:17:47 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
References: <><><><dtfou0$81t$>
Message-ID: <dtg3ia$gle$>

"Jeremy Hylton" <jeremy at> wrote in message 
news:e8bf7a530602211125k28fc64bcx62430c375d060a8b at
> If I recall the discussion correctly, Guido said he was open to a
> version of nested scopes that allowed rebinding.

Yes.  Among other places, he said in
Your PEP wonders why I am against allowing assignment to intermediate
levels.  Here's my answer: all the syntaxes that have been proposed
to spell this have problems.  So let's not provide a way to spell it.
I predict that it won't be a problem.  If it becomes a problem, we can
add a way to spell it later. ''


From tdelaney at  Tue Feb 21 23:23:39 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Wed, 22 Feb 2006 09:23:39 +1100
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was: PEP
	332	revival in coordination with pep 349?]]
Message-ID: <>

Greg Ewing wrote:

> I don't quite see the point here. Inside a bytes object,
> they would be stored 1 byte per byte. Nobody is suggesting
> that they would take up more than that just because
> a_bytes_object[i] happens to return an int.

Speaking of which, I suspect it'll be a lot more common to need integer
objects in the full range [0, 255] than it is now.

Perhaps we should extend the pre-allocated integer objects to cover the
full byte range.

Tim Delaney

From python at  Tue Feb 21 23:31:22 2006
From: python at (Raymond Hettinger)
Date: Tue, 21 Feb 2006 17:31:22 -0500
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was:
	PEP332	revival in coordination with pep 349?]]
References: <>
Message-ID: <004c01c63736$8b33fe80$6a01a8c0@RaymondLaptop1>

> Speaking of which, I suspect it'll be a lot more common to need integer
> objects in the full range [0, 255] than it is now.
> Perhaps we should extend the pre-allocated integer objects to cover the
> full byte range.

+1


From tdelaney at  Tue Feb 21 23:34:37 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Wed, 22 Feb 2006 09:34:37 +1100
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex() [Was:
	PEP332	revival in coordination with pep 349?]]
Message-ID: <>

Raymond Hettinger wrote:

>> Speaking of which, I suspect it'll be a lot more common to need
>> integer objects in the full range [0, 255] than it is now.
>> Perhaps we should extend the pre-allocated integer objects to cover
>> the full byte range.
> +1

Want me to raise an SF request?

Tim Delaney

From tdelaney at  Tue Feb 21 23:44:10 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Wed, 22 Feb 2006 09:44:10 +1100
Subject: [Python-Dev] s/bytes/octet/ [Was:Re: bytes.from_hex()
	[Was:PEP332	revival in coordination with pep 349?]]
Message-ID: <>

Delaney, Timothy (Tim) wrote:

>>> Perhaps we should extend the pre-allocated integer objects to cover
>>> the full byte range.
>> +1
> Want me to raise an SF request?

Done. Item # 1436243.

Tim Delaney

From mrussell at  Tue Feb 21 23:56:34 2006
From: mrussell at (Mark Russell)
Date: Tue, 21 Feb 2006 22:56:34 +0000
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>	<>	<dtfou0$81t$>	<>
Message-ID: <>

On 21 Feb 2006, at 21:13, Ian Bicking wrote:
> By rebinding operator, does that mean it is actually an operator?   
> I.e.:
>    # Required assignment to declare?:
>    chunk = None
>    while chunk :=
>        ...

No, I think that "x := y" should be a statement not an expression  
(i.e. just like "x = y" apart from the treatment of bindings).

I'd be inclined to require that the target of := be already bound, if  
only to prevent people randomly using ":=" in places where it's not  

In a new language I would probably also make it an error to use = to  
do rebinding (i.e. insist on = for new bindings, and := for  
rebindings).  But that's obviously not reasonable for python.

Mark Russell

From pje at  Wed Feb 22 00:19:09 2006
From: pje at (Phillip J. Eby)
Date: Tue, 21 Feb 2006 18:19:09 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

At 11:50 PM 2/21/2006 +0100, Morel Xavier wrote:
>Phillip J. Eby wrote:
>>The '.' would mean "this name, but in the nearest outer scope that 
>>defines it".  Note that this could include the global scope, so the 
>>'global' keyword could go away in 2.5.  And in Python 3.0, the '.' could 
>>become *required* for use in closures, so that it's not necessary for the 
>>reader to check a function's outer scope to see whether closure is taking 
>>place.  EIBTI.
>While the idea is interesting, how would this solution behave if the 
>variable (the name) didn't exist in any outer scope?

The compiler should consider it a name in the global scope, and for an 
assignment the name would be required to have an existing binding, or a 
NameError would result.  (Indicating you are assigning to a global that 
hasn't been defined.)

>Would it create and bind the name in the current scope?

No, never.

>         If yes, why wouldn't this behavior become the default (without 
> any leading dot), efficiency issues of the lookup?

No, it would be because explicit is better than implicit.  The whole point 
of requiring '.' for closures in Python 3.0 would be to keep the person 
who's reading the code from having to inspect an entire function and its 
context to figure out which names are referring to variables in outer 
scopes.  That is, it would go against the whole point of my idea, which is 
to make explicit what variables are part of your closure.

From andrew-pythondev at  Wed Feb 22 00:23:59 2006
From: andrew-pythondev at (Andrew Bennetts)
Date: Wed, 22 Feb 2006 10:23:59 +1100
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

Martin v. Löwis wrote:
> skip at wrote:
> So for multiplying this by 8, I would have to create 48 lines of
> Apache configuration, and use 24 TCP ports. This can be done, but
> it would take some time to implement. And who is going to look
> at the 24 pages?

This last point is the most important, I think.  Most of the time I look at
Twisted's buildbot, it's to see at a glance which, if any, builds are broken.  I
think this is the #1 use case.  Second is getting the details of what broke, and
who broke it.

So massively multiplying the pages seems counter-productive to me.

I suspect there's nearly as much advantage to running randomised tests on just
one platform as there is on many, so a good trade-off may be to just add one
more builder (to each branch) that does -r on just one platform.  I'm assuming
most of the issues randomisation exposes aren't platform-dependent.


From greg.ewing at  Wed Feb 22 01:01:55 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:01:55 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Jeremy Hylton wrote:

> The names of naming statements are quite hard to get right, I fear.

My vote goes for 'outer'.

And if this gets accepted, remove 'global' in 3.0.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 22 01:34:06 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:34:06 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:

> It doesn't seem strange to you to need to encode data twice to be able
> to have a usable sequence of characters which can be embedded in an
> effectively 7-bit email;

I'm talking about a 3.0 world where all strings are unicode
and the unicode <-> external coding is for the most part
done automatically by the I/O objects. So you'd be building
up your whole email as a string (aka unicode) which happens
to only contain code points in the range 0..127, and then
writing it to your socket or whatever. You wouldn't need
to do the second encoding step explicitly very often.
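
A rough sketch of that division of labour, assuming Python 3, where the
io module plays the role of the "I/O objects"; the payload and the header
line are made up for the example.

```python
import base64
import io

payload = bytes([0, 255, 16])
# First step: binary data -> base64 text (code points 0..127 only).
body = base64.b64encode(payload).decode("ascii")
message = "Subject: demo\n\n" + body + "\n"

# The second step happens inside the I/O object, not in user code.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
out.write(message)
out.flush()
wire_bytes = raw.getvalue()
```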

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 22 01:35:26 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:35:26 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <dtfe6o$q9u$>
References: <>
	<> <dtfe6o$q9u$>
Message-ID: <>

Georg Brandl wrote:

> But why is that better than
> class namespace(object): pass
> def my_func():
>     foo = namespace()
>     (...)

Because then it would be extremely difficult for CPython to
optimise accesses to foo into local variable lookups.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 22 01:35:29 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:35:29 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
Message-ID: <>

Alex Martelli wrote:

> If we call the type autodict, then having the factory attribute named  
> autofactory seems to fit.

Or just 'factory', since it's the only kind of factory
the object is going to have.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 22 01:35:32 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:35:32 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Jeremy Hylton wrote:
> On 2/21/06, Jeremy Hylton <jeremy at> wrote:
>>On 2/21/06, Bengt Richter <bokr at> wrote:
>>>But to the topic, it just occurred to me that any outer scopes could be given names
>>>(including global namespace, but that would have the name global by default, so
>>>global.x would essentially mean what globals()['x'] means now, except it would
>>>be a name error if x didn't pre-exist when accessed via namespace_name.name_in_space notation.
> Isn't this suggestion that same as Greg Ewing's?

It's not quite the same, because in my scheme the namespace
statement creates a new namespace embedded in the scope
where it appears, whereas Bengt's one seems to just give
a name to the scope itself.

I'm not really in favour of either of these -- I'd be
just as happy with a simple 'outer' statement.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Wed Feb 22 01:35:33 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 13:35:33 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Fuzzyman wrote:

> I've had problems in code that needs to treat strings, lists and
> dictionaries differently (assigning values to a container where all
> three need different handling) and telling the difference but allowing
> duck typing is *problematic*.

You need to rethink your design so that you don't
have to make that kind of distinction.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From fumanchu at  Wed Feb 22 01:48:46 2006
From: fumanchu at (Robert Brewer)
Date: Tue, 21 Feb 2006 16:48:46 -0800
Subject: [Python-Dev] Unifying trace and profile
Message-ID: <6949EC6CD39F97498A57E0FA55295B2101C312B6@ex9.hostedexchange.local>

There are a number of features I'd like to see happen with Python's
tracing and profiling subsystems (but I don't have the C experience to
do it myself). I started to write an SF feature-request and then
realized it was too much for a single ticket. Maybe a PEP? All of these
would make my latest side project[1] a lot easier.

Anyway, here they are (most important and easiest-to-implement first):

1. Allow trace hooks to receive c_call, c_return, and c_exception events
(like profile does).

2. Allow profile hooks to receive line events (like trace does).

3. Expose new sys.gettrace() and getprofile() methods, so trace and
profile functions that want to play nice can call
sys.settrace/setprofile(None) only if they are the current hook.

4. Make "the same move" that sys.exitfunc -> atexit made (from a single
function to multiple functions via registration), so multiple
tracers/profilers can play nice together.

5. Allow the core to filter on the "event" arg before hook(frame, event,
arg) is called.

6. Unify tracing and profiling, which would remove a lot of redundant
code in ceval and sysmodule and free up some space in the PyThreadState
struct to boot.

7. As if the above isn't enough of a dream, it would be nice to have a
bytecode tracer, which didn't bother with the f_lineno logic in
maybe_call_line_trace, but just called the hook on every instruction.
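
Item 3 can be sketched as follows, assuming a Python version that provides
sys.gettrace() (later releases do); my_tracer, install and uninstall are
hypothetical names used only for this illustration.

```python
import sys


def my_tracer(frame, event, arg):
    # Returning None declines per-line tracing for the new frame.
    return None


def install():
    """Install my_tracer and return whatever hook was active before."""
    previous = sys.gettrace()
    sys.settrace(my_tracer)
    return previous


def uninstall(previous):
    """Restore the earlier hook, but only if my_tracer is still current."""
    if sys.gettrace() is my_tracer:
        sys.settrace(previous)


previous = install()
active = sys.gettrace() is my_tracer
uninstall(previous)
restored = sys.gettrace() is previous
```

This only lets two hooks avoid clobbering each other; it does not give the
registration of multiple hooks asked for in item 4.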

Robert Brewer
System Architect
Amor Ministries
fumanchu at

[1] PyConquer, a trace hook to help understand and debug concurrent
(threaded) code.

From raymond.hettinger at  Wed Feb 22 01:54:57 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Tue, 21 Feb 2006 19:54:57 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <><><><><>
Message-ID: <001401c6374a$9a3a2fd0$6a01a8c0@RaymondLaptop1>

> Alex Martelli wrote:
>> If we call the type autodict, then having the factory attribute named  
>> autofactory seems to fit.
> Or just 'factory', since it's the only kind of factory
> the object is going to have.

Gack, no.  You guys are drifting towards complete ambiguity.
You might as well call it "thingie_that_doth_return_an_object".
The word "factory" by itself says nothing about lookups and default values.
Likewise, "autodict" could mean anything.  Keep in mind that we may well
end up having this side-by-side with collections.ordered_dict.
The word "auto" tells you nothing about how this is different from
a regular dict or ordered dictionary.  It's meaningless.

Please, stick with defaultdictionary and default_factory.
While not perfectly descriptive, they suggest just enough
to jog the memory and make the code readable.
Try to resist generalizing the name into nothingness.


From tim.peters at  Wed Feb 22 02:04:20 2006
From: tim.peters at (Tim Peters)
Date: Tue, 21 Feb 2006 20:04:20 -0500
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

[Martin v. Löwis]
> So what is your recipe:

I don't have one.  I personally always use -uall and -r, and then run
the tests 8 times, w/ and w/o -O, under debug and release builds, and
w/ and w/o deleting .py[co] files first.  Because that last one almost
never finds problems anymore, perhaps it would be good to stop
bothering with it routinely (it really doesn't have potential to find
a problem unless someone has been mucking with the marshaling of code
objects, right?).

> Add -r to all buildbots?

Sure.  -r adds variety to testing at no cost (the same number of tests
run, in the same amount of time, with or without -r).

> Only to those which have an 'a' in their name?

Sorry, no idea what that means.

> Only to every third build? Duplicating the number of builders?

For -r, no.  I'd always use -r (and always do anyway).

> Same question for --with-pydebug. Combining this with -r would multiply
> the number of builders by 4 already.

I would much rather see a debug-build run than the current "with and
without deleting .py[co] files first" variant.  If the latter were
dropped and the former were added, and -r were used all the time, the
number of recipes wouldn't change.  Testing time would increase, by
the time to _do_ a debug build, and by the extra time a debug build
test run requires.

We should test with and without -O too, although that's another that
rarely finds a problem.

> I'm not keen on deciding this for myself. Somebody else please decide
> for me.

I don't know how hard it is to teach the system how to do something
"not so often", and I expect that's an important unknown since I
imagine that vastly increasing test time would discourage people from
volunteering buildbot slaves.

Since the most fruitful variations (IME) for finding code errors are
using -r and running a debug build too, I'd change the current
run-all-the-time recipes to:

- Stop doing the second "without deleting .py[co]" run.
- Do one run with a release build.
- Do one run with a debug build.
- Use -uall -r for both.

If we know how to get something done "occasionally", then about once a
week it would be prudent to also:

- Try the "with and without deleting .py[co] files first" business.
- Try with and without -O.

Those last two choices cover 8 distinct modes, when paired with each
other and with the "release versus debug build" choice.
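
The count of 8 is simply 2 x 2 x 2. A small sketch that enumerates the
combinations; the labels and the function name enumerate_modes are
illustrative, not buildbot configuration:

```python
from itertools import product

BUILDS = ("release", "debug")
OPTIMIZE = ("without -O", "with -O")
PYC = ("keep .py[co] files", "delete .py[co] files first")


def enumerate_modes():
    """Return every (build, optimize, pyc) combination as a tuple."""
    return list(product(BUILDS, OPTIMIZE, PYC))
```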

From kbk at  Wed Feb 22 02:09:20 2006
From: kbk at (Kurt B. Kaiser)
Date: Tue, 21 Feb 2006 20:09:20 -0500 (EST)
Subject: [Python-Dev] Weekly Python Patch/Bug Summary
Message-ID: <>

Patch / Bug Summary

Patches :  385 open (-14) /  3067 closed (+25) /  3452 total (+11)
Bugs    :  864 open (-59) /  5621 closed (+68) /  6485 total ( +9)
RFE     :  211 open ( +2) /   200 closed ( +2) /   411 total ( +4)

New / Reopened Patches

GNU uses double-dashes not single  (2006-02-16)  opened by  splitscreen

restrict codec lookup to encodings package  (2006-02-16)
CLOSED  reopened by  lemburg

restrict codec lookup to encodings package  (2006-02-16)
CLOSED  opened by  Guido van Rossum

add on_missing() and default_factory to dict  (2006-02-17)  opened by  Guido van Rossum

CHM file contains proprietary link format  (2006-02-18)  opened by  Alexander Schremmer

Patch to support lots of file descriptors  (2006-02-19)  opened by  Sven Berkvens-Matthijsse

Add copy() method to zlib's compress and decompress objects  (2006-02-20)  opened by  Chris AtLee

PEP 343 with statement  (2006-02-21)  opened by  mbland

Incremental codecs  (2006-02-21)  opened by  Walter Dörwald

fix inplace assignment for immutable sequences  (2006-02-21)  opened by  Georg Brandl

Patches Closed

GNU uses double-dashes not single  (2006-02-16)  deleted by  gvanrossum

restrict codec lookup to encodings package  (2006-02-16)  closed by  lemburg

restrict codec lookup to encodings package  (2006-02-16)  closed by  lemburg

use computed goto's in ceval loop  (2006-01-18)  closed by  loewis

have SimpleHTTPServer return last-modified headers  (2006-01-28)  closed by  birkenfeld

Feed style codec API  (2005-01-12)  closed by  lemburg

can't handle >2GB chunks  (2005-12-05)  closed by  birkenfeld

Fix of bug 1366000  (2005-11-30)  closed by  birkenfeld

Optional second argument for startfile  (2005-12-29)  closed by  birkenfeld

Clairify docs on reference stealing  (2006-01-26)  closed by  birkenfeld

urllib proxy_bypass broken  (2006-02-07)  closed by  birkenfeld

Speed up EnumKey call  (2004-06-22)  closed by  birkenfeld

[PATCH] Bug #1351707  (2005-11-10)  closed by  birkenfeld

fileinput patch for bug #1336582  (2005-10-25)  closed by  birkenfeld

Fix for int(string, base) wrong answers  (2005-10-22)  closed by  birkenfeld

[PATCH] 100x optimization for ngettext  (2005-11-06)  closed by  birkenfeld

commands.getstatusoutput()  (2005-11-02)  closed by  birkenfeld

two fileinput enhancements (fileno, openhook)  (2005-06-05)  closed by  birkenfeld

mode argument for fileinput class  (2005-05-31)  closed by  birkenfeld

do not add directory of sys.argv[0] into sys.path  (2004-05-02)  closed by  gbrandl

prefix and exec_prefix as root dir bug  (2004-04-08)  closed by  gbrandl

New / Reopened Bugs

optparse docs double-dash confusion  (2006-02-16)  opened by  John Veness

Logging hangs thread after detaching a StreamHandler's termi  (2006-02-13)
CLOSED  reopened by  yangzhang

os.path.expandvars sometimes doesn't expand $HOSTNAME  (2006-02-17)
CLOSED  opened by  Doug Fort

normalize function in minidom unlinks empty child nodes  (2006-02-17)  opened by  RomanKliotzkin

string parameter to ioctl not null terminated, includes fix  (2006-02-17)  opened by  Quentin Barnes

pointer aliasing causes core dump, with workaround  (2006-02-17)  opened by  Quentin Barnes

Python crash on __init__/__getattr__/__setattr__ interaction  (2004-04-26)
CLOSED  reopened by  hhas

Crash when decoding UTF8  (2006-02-20)
CLOSED  opened by  Viktor Ferenczi

CGIHTTPServer doesn't handle path names with embeded space  (2006-02-21)  opened by  Richard Coupland

Bugs Closed

Logging hangs thread after detaching a StreamHandler's termi  (2006-02-14)  closed by  vsajip

logging module's setLoggerClass not really working  (2005-09-08)  closed by  vsajip

pydoc still doesn't handle lambda well  (2006-02-15)  closed by  birkenfeld

smtplib: empty mail addresses  (2006-02-12)  closed by  birkenfeld

IMPORT PROBLEM: Local submodule shadows global module  (2006-02-01)  closed by  birkenfeld

class dictionary shortcircuits __getattr__  (2006-01-31)  closed by  birkenfeld

SimpleHTTPServer doesn't return last-modified headers  (2006-01-28)  closed by  birkenfeld

os.path.expandvars sometimes doesn't expand $HOSTNAME  (2006-02-17)  closed by  birkenfeld

http response dictionary incomplete  (2006-02-01)  closed by  birkenfeld

Bug bz2.BZ2File(...).seek(0,2)  (2005-11-25)  closed by  birkenfeld

Incorrect Decimal-float behavior for +  (2005-11-13)  closed by  arigo

http auth documentation/implementation conflict  (2005-08-13)  closed by  birkenfeld

bsddb.__init__ causes error  (2006-01-04)  closed by  birkenfeld

StreamReader.readline doesn't advance on decode errors  (2005-12-13)  closed by  birkenfeld

zipimport produces incomplete IOError instances  (2005-11-08)  closed by  birkenfeld

zipfile: inserting some filenames produces corrupt .zips  (2006-01-24)  closed by  birkenfeld

socketmodule.c compile error using SunPro cc  (2003-10-06)  closed by  birkenfeld

Python-2.3.3c1, Solaris 2.7: socketmodule does not compile  (2003-12-05)  closed by  birkenfeld

README build instructions for fpectl  (2004-01-07)  closed by  birkenfeld

test_fcntl fails on netbsd2  (2005-01-12)  closed by  birkenfeld

Python 2.4 and 2.3.5 won't build on OpenBSD 3.7  (2005-11-01)  closed by  loewis

getwindowsversion() constants in sys module  (2005-10-10)  closed by  gbrandl

No documentation for PyFunction_* (C-Api)  (2004-08-22)  closed by  gbrandl

pickle files should be opened in binary mode  (2005-01-14)  closed by  gbrandl

os.stat returning a time   (2003-09-22)  closed by  gbrandl

test_sax fails on python 2.2.3 & patch for  (2003-06-21)  closed by  gbrandl

a exception ocurrs when compiling a Python file  (2003-12-09)  closed by  gbrandl

Ms VC 2003 not supported  (2003-06-18)  closed by  gbrandl

breaks if prefix is empty  (2003-04-01)  closed by  gbrandl

AESend on Jaguar  (2002-08-14)  closed by  jackjansen

source files using encoding ./. universal newlines  (2003-07-31)  closed by  lemburg

Solaris 8 declares gethostname().  (2005-08-12)  closed by  gbrandl

httplib.HTTPConnection._send_request header parsing bug  (2003-10-27)  closed by  gbrandl

Problem with ftplib on HP-UX11i  (2003-09-25)  closed by  gbrandl

locale.getdefaultlocale doesnt handle all locales gracefully  (2003-09-27)  closed by  gbrandl

NotImplemented return value misinterpreted in new classes  (2003-11-22)  closed by  gbrandl

fileinput does not use universal input  (2003-12-15)  closed by  gbrandl

Cursors not correctly closed after exception.  (2005-05-28)  closed by  gbrandl

urllib2 doesn't handle username/password in url  (2004-04-29)  closed by  gbrandl

urllib2.HTTPBasicAuthHandler problem with [HOST]:[PORT]  (2004-11-29)  closed by  gbrandl

urllib.urlopen() fails to raise exception  (2004-05-04)  closed by  gbrandl

__mul__ taken as __rmul__ for mul-by-int only  (2003-10-25)  closed by  gbrandl

Python crash on __init__/__getattr__/__setattr__ interaction  (2004-04-26)  closed by  gbrandl

Error "exec"ing python code  (2005-03-21)  closed by  gbrandl

type() and isinstance() do not call __getattribute__  (2005-08-19)  closed by  gbrandl

Crash when decoding UTF8  (2006-02-20)  closed by  nnorwitz

New / Reopened RFE

Implement preemptive threads in Python  (2006-02-16)  reopened by  darkprokoba

Implement preemptive threads in Python  (2006-02-16)  opened by  Andrey Petrov

Use new expat version 2.0  (2006-02-17)  opened by  Wolfgang Langner

python executable optionally should search script on PATH  (2005-12-13)  reopened by  cconrad

Extend pre-allocated integers to cover [0, 255]  (2006-02-21)  opened by  Tim Delaney

RFE Closed

Implement preemptive threads in Python  (2006-02-16)  closed by  mwh

fileinput/gzip modules should play well  (2002-06-01)  closed by  birkenfeld

From guido at  Wed Feb 22 02:22:20 2006
From: guido at (Guido van Rossum)
Date: Tue, 21 Feb 2006 20:22:20 -0500
Subject: [Python-Dev] Fixing to allow copying functions
Message-ID: <>

While playing around with the defaultdict patch, adding __reduce__ to
make defaultdict objects properly copyable through the copy module, I
noticed that it doesn't support copying function objects. This
seems an oversight, since the (closely related) pickle module *does*
support copying functions. The semantics of pickling a function is
that it just stores the module and function name in the pickle; that
is, if you unpickle it in the same process it'll just return a
reference to the same function object. This would translate into
"atomic" semantics for copying functions: the "copy" is just the
original, for shallow as well as deep copies. It's a simple patch:

--- Lib/ (revision 42537)
+++ Lib/ (working copy)
@@ -101,7 +101,8 @@
     return x
 for t in (type(None), int, long, float, bool, str, tuple,
           frozenset, type, xrange, types.ClassType,
-          types.BuiltinFunctionType):
+          types.BuiltinFunctionType,
+         types.FunctionType):
     d[t] = _copy_immutable
 for name in ("ComplexType", "UnicodeType", "CodeType"):
     t = getattr(types, name, None)
@@ -217,6 +218,7 @@
 d[xrange] = _deepcopy_atomic
 d[types.ClassType] = _deepcopy_atomic
 d[types.BuiltinFunctionType] = _deepcopy_atomic
+d[types.FunctionType] = _deepcopy_atomic

 def _deepcopy_list(x, memo):
     y = []

Any objections? Given that these are picklable, I can't imagine there
are any but I thought I'd ask anyway.
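
A quick check of the "atomic" semantics described above, assuming a Python
in which this patch (or its equivalent) is present. json.dumps is used only
as a convenient module-level function for the example:

```python
import copy
import json
import pickle

func = json.dumps  # any module-level Python function would do

# Pickle stores only the module and name, so unpickling in the same
# process hands back the very same function object.
same_after_pickle = pickle.loads(pickle.dumps(func)) is func

# With atomic semantics, shallow and deep copies are the original too.
same_after_copy = copy.copy(func) is func
same_after_deepcopy = copy.deepcopy(func) is func
```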

--Guido van Rossum (home page:

From skip at  Wed Feb 22 03:35:57 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 20:35:57 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    >> Let me rephrase that.  I assume the people digging through Py_ssize_t
    >> issues have been looking at compilation warnings for platforms other
    >> than Mac OSX.

    Martin> In the buildbot log, I see only a single one of these, and only
    Martin> in an OSX-specific module. So no - "we" don't look into fixing
    Martin> them, as they don't occur on Linux at all (as _Qdmodule isn't
    Martin> built on Linux).

Sure looks like core to me:

    Objects/bufferobject.c: In function `buffer_repr':
    Objects/bufferobject.c:250: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/bufferobject.c:258: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/bufferobject.c:258: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/funcobject.c: In function `func_set_code':
    Objects/funcobject.c:254: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/funcobject.c:254: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/funcobject.c: In function `func_new':
    Objects/funcobject.c:406: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/funcobject.c:406: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/listobject.c: In function `list_ass_subscript':
    Objects/listobject.c:2604: warning: signed size_t format, Py_ssize_t arg (arg 3)
    Objects/listobject.c:2604: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/dictobject.c: In function `PyDict_MergeFromSeq2':
    Objects/dictobject.c:1152: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/methodobject.c: In function `PyCFunction_Call':
    Objects/methodobject.c:85: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/methodobject.c:96: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/structseq.c: In function `structseq_new':
    Objects/structseq.c:129: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/structseq.c:129: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/structseq.c:137: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/structseq.c:137: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/structseq.c:146: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/structseq.c:146: warning: signed size_t format, Py_ssize_t arg (arg 5)
    Objects/typeobject.c: In function `check_num_args':
    Objects/typeobject.c:3378: warning: signed size_t format, Py_ssize_t arg (arg 4)
    Objects/unicodeobject.c: In function `unicode_decode_call_errorhandler':
    Objects/unicodeobject.c:794: warning: signed size_t format, Py_ssize_t arg (arg 3)
    Objects/unicodeobject.c: In function `unicode_encode_call_errorhandler':
    Objects/unicodeobject.c:2475: warning: signed size_t format, int arg (arg 3)
    Objects/unicodeobject.c: In function `unicode_translate_call_errorhandler':
    Objects/unicodeobject.c:3374: warning: signed size_t format, int arg (arg 3)

This from the build on g5 osx.3 trunk from 22:54 today (21 Feb).


From jcarlson at  Wed Feb 22 03:42:38 2006
From: jcarlson at (Josiah Carlson)
Date: Tue, 21 Feb 2006 18:42:38 -0800
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

"Steven Bethard" <steven.bethard at> wrote:
> On 2/21/06, Josiah Carlson <jcarlson at> wrote:
> > The question which still remains in my mind, which I previously asked,
> > is whether the use cases are compelling enough to warrant the feature
> > addition.
> I don't know whether I support the proposal or not, but in reading
> Mark Russell's email, I realized that I just recently ran into a use
> case:

[snip example where 3 lines are duplicated twice, and a 2 line subset
are duplicated in a third location]

> using something
> like ``curr_suffix :=`` or Phillip J. Eby's suggestion of
> ``.curr_suffix =`` would allow this code to be factored out into a
> function.

In this particular example, there is no net reduction in line use. The
execution speed of your algorithm would be reduced due to function
calling overhead.  There may be a minor clarification improvement, but
arguably no better than Richie Hindle's functional goto
implementation for Python 2.3 and later.

 - Josiah

From skip at  Wed Feb 22 03:41:16 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 20:41:16 -0600
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

    Martin> So for multiplying this by 8, I would have to create 48 lines of
    Martin> Apache configuration, and use 24 TCP ports. This can be done,
    Martin> but it would take some time to implement.

I'm not too worried about that because it's a one-time cost.  I'd be willing
to help out.  Just shoot me the httpd config file and other necessary bits
and I'll return you the modified stuff.

    Martin> And who is going to look at the 24 pages?

This is, of course, the bigger problem since it's ongoing.  If we solicit
buildbot slaves we should solicit a pair of eyeballs for each slave as well.
That doesn't need to be the owner of the box, but the owner is the likely
first candidate to trick^H^H^H^H^Hask.


From mark.m.mcmahon at  Mon Feb 20 17:06:45 2006
From: mark.m.mcmahon at (Mark Mc Mahon)
Date: Mon, 20 Feb 2006 11:06:45 -0500
Subject: [Python-Dev] Path PEP: some comments (equality)
Message-ID: <>


It seems that the Path module as currently defined leaves equality
testing up to the underlying string comparison. My guess is that this
is fine for Unix (maybe not even) but it is a bit lacking for Windows.

Should the path class implement an __eq__ method that might do some of
the following things:
 - Get the absolute path of both self and the other path
 - normcase both
 - now see if they are equal

This would make working with paths much easier for keys of a
dictionary on windows. (I frequently use a case insensitive string
class for paths if I need them to be keys of a dict.)
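
A rough sketch of the three steps listed above, assuming a str subclass.
NormalizedPath and WindowsNormalizedPath are hypothetical names, not the
Path PEP's class; ntpath is used so the Windows rules can be tried on any
platform:

```python
import ntpath
import os.path


class NormalizedPath(str):
    """Hypothetical path string that compares by absolute, case-normalized form."""

    pathmod = os.path  # platform rules by default

    def _key(self):
        # Step 1: absolute path; step 2: normcase.
        return self.pathmod.normcase(self.pathmod.abspath(str(self)))

    def __eq__(self, other):
        # Step 3: compare the normalized forms.
        if isinstance(other, str):
            return self._key() == type(self)(other)._key()
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must agree with __eq__ so equal paths land on the same dict key.
        return hash(self._key())


class WindowsNormalizedPath(NormalizedPath):
    pathmod = ntpath  # force Windows semantics regardless of host OS
```

Defining __hash__ alongside __eq__ is what makes these objects usable as
dictionary keys; without it, two equal paths could still hash differently.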

My first email to python-dev :-)

From sergey at  Tue Feb 21 10:39:27 2006
From: sergey at (Sergey Dorofeev)
Date: Tue, 21 Feb 2006 12:39:27 +0300
Subject: [Python-Dev] calendar.timegm
Message-ID: <000d01c636ca$b53411a0$>


Historical question ;)

Can anyone explain why the function timegm lives in the calendar module
rather than the time module, where it would sit next to the similar
function mktime?

From skip at  Wed Feb 22 05:47:53 2006
From: skip at (skip at
Date: Tue, 21 Feb 2006 22:47:53 -0600
Subject: [Python-Dev] calendar.timegm
In-Reply-To: <000d01c636ca$b53411a0$>
References: <000d01c636ca$b53411a0$>
Message-ID: <>

    Sergey> Historical question ;)

    Sergey> Can anyone explain why the function timegm lives in the
    Sergey> calendar module rather than the time module, where it would
    Sergey> sit next to the similar function mktime?

Historical accident. ;-)
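
For readers unfamiliar with the pairing, a small sketch of why the two
functions look like siblings: calendar.timegm inverts time.gmtime, while
time.mktime inverts time.localtime. The helper names are illustrative:

```python
import calendar
import time


def roundtrip_utc(timestamp):
    """Convert to a UTC struct_time and back with calendar.timegm()."""
    return calendar.timegm(time.gmtime(timestamp))


def roundtrip_local(timestamp):
    """Convert to a local struct_time and back with time.mktime()."""
    return time.mktime(time.localtime(timestamp))
```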


From nnorwitz at  Wed Feb 22 06:30:31 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 21 Feb 2006 21:30:31 -0800
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/21/06, Tim Peters <tim.peters at> wrote:
> Since the most fruitful variations (IME) for finding code errors are
> using -r and running a debug build too, I'd change the current
> run-all-the-time recipes to:
> - Stop doing the second "without deleting .py[co]" run.
> - Do one run with a release build.
> - Do one run with a debug build.
> - Use -uall -r for both.

I agree with this, but don't know a clean way to do 2 builds.  I
modified buildbot to:

 - Stop doing the second "without deleting .py[co]" run.
 - Do one run with a debug build.
 - Use -uall -r for both.

Buildbot does *not* also do a release build.  That's the only
difference from your request above.  I agree that it would be
desirable, but I think the debug build is more important than the
release build right now.

We don't have to make this perfect right now.  We can talk about this
at PyCon and resolve the remaining issues.  One thing that would be
nice is to have the master.cfg checked in somewhere so we can track


From nnorwitz at  Wed Feb 22 06:36:36 2006
From: nnorwitz at (Neal Norwitz)
Date: Tue, 21 Feb 2006 21:36:36 -0800
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/21/06, "Martin v. Löwis" <martin at> wrote:
> skip at wrote:
> >     Neal> IMO compiler warnings should generate emails from buildbot.
> >
> > It doesn't generate emails for any other condition.  I think it should just
> > turn the compilation section yellow.
> It would be easy to run the builds with -Werror, making warnings let the
> compilation fail, which in turn is flagged red.

And previously:

> Should we build with -Wno-deprecated (or whatever it is spelled) on OSX?

Hmmm, I'm really tempted to add both of these flags (-Werror
-Wno-deprecated).  Let's discuss this at PyCon.  We can make lots of
changes then.  We might want to wait until after the sprints so people
don't have to deal with this churn.


From greg.ewing at  Wed Feb 22 07:04:05 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 19:04:05 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <dtfou0$81t$>
References: <>
Message-ID: <>

Terry Reedy wrote:

> There were perhaps 10 
> different proposals, including, I believe, 'outer'.  Guido rejected them 
> all as having costs greater than the benefits.

As far as I remember, Guido wasn't particularly opposed
to the idea, but the discussion fizzled out after having
failed to reach a consensus on an obviously right way
to go about it.


From greg.ewing at  Wed Feb 22 07:45:29 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 19:45:29 +1300
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Phillip J. Eby wrote:

>    def incrementer(val):
>        def inc():
>            .val += 1
>            return .val
>        return inc

-1, too obscure.
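
For comparison, a sketch of the same closure written with the nonlocal
statement that Python 3 later adopted for rebinding outer names. It is not
one of the spellings proposed in this thread:

```python
def incrementer(val):
    def inc():
        nonlocal val  # rebind val in the enclosing function's scope
        val += 1
        return val
    return inc
```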


From greg.ewing at  Wed Feb 22 08:09:20 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 20:09:20 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Russell wrote:

> PEP 227 mentions using := as a rebinding operator, but rejects the  
> idea as it would encourage the use of closures.

Well, anything that facilitates rebinding in outer scopes
is going to encourage the use of closures, so I can't
see that as being a reason to reject a particular means
of rebinding. You either think such rebinding is a good
idea or not -- and that seems to be a matter of highly
individual taste.

On this particular idea, I tend to think it's too obscure
as well. Python generally avoids attaching randomly-chosen
semantics to punctuation, and I'd like to see it stay
that way.


From greg.ewing at  Wed Feb 22 08:11:40 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 20:11:40 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <r01050400-1039-4AD7D4ADA31D11DA8736001124365170@[]>
References: <r01050400-1039-4AD7D4ADA31D11DA8736001124365170@[]>
Message-ID: <>

Just van Rossum wrote:

> Btw, PJE's "crazy" idea (.name, to rebind an outer name) was proposed
> before, but Guido wanted to reserve .name for a (Pascal-like) 'with'
> statement. Hmm,

I guess that doesn't apply any more, since we've already
used "with" for something else.

Regardless, names with leading dots just look ugly and
perlish to me, so I wouldn't be in favour anyway.


From at  Wed Feb 22 08:36:33 2006
From: at (Almann T. Goo)
Date: Wed, 22 Feb 2006 02:36:33 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
Message-ID: <>

> As far as I remember, Guido wasn't particularly opposed
> to the idea, but the discussion fizzled out after having
> failed to reach a consensus on an obviously right way
> to go about it.

My apologies for bringing this debated topic again to the
front-lines--that said, I think there have been good, constructive
things said again and sometimes it doesn't hurt to kick up an old
topic.  After poring through some of the list archive threads and
reading through this thread, it seems clear to me that the community
doesn't seem all that keen on fixing the issue--which was my goal to
ferret out.

For me this is one of those things where the Pythonic thing to do is
not so clear--and that mysterious, enigmatic definition of what it
means to be Pythonic can be quite individual so I definitely don't
want to waste my time arguing what that means.

The most compelling argument for not doing anything about it is that
the use cases are probably not that many--that in itself makes me less
apt to push much harder--especially since my pragmatic side agrees
with a lot of what has been said to this regard.

IMO, having properly nested scopes in Python in a sense made having
closures a natural idiom to the language and part of its "user
interface."  By not allowing the name re-binding it almost seems like
that "user interface" has a rough edge that is almost too easy to get
cut on.  This inelegance seems very un-Pythonic to me.
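
A small sketch of that rough edge and the usual workaround. The function
names are made up for illustration:

```python
def make_counter_broken():
    count = 0

    def bump():
        # The assignment makes count local to bump, so reading it first
        # raises UnboundLocalError instead of updating the outer name.
        count += 1
        return count

    return bump


def make_counter_workaround():
    count = [0]  # mutate a container instead of rebinding the name

    def bump():
        count[0] += 1
        return count[0]

    return bump
```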

Anyhow, good discussion.


Almann T. Goo at

From greg.ewing at  Wed Feb 22 08:47:40 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 20:47:40 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <001401c6374a$9a3a2fd0$6a01a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:

> Like "autodict" could mean anything.

Everything is meaningless until you know something
about it. If you'd never seen Python before,
would you know what 'dict' meant?

If I were seeing "defaultdict" for the first time,
I would need to look up the docs before I was
confident I knew exactly what it did -- as I've
mentioned before, my initial guess would have
been wrong. The same procedure would lead me to
an understanding of 'autodict' just as quickly.

Maybe 'autodict' isn't the best term either --
I'm open to suggestions. But my instincts still
tell me that 'defaultdict' is the best term
for something *else* that we might want to add
one day as well, so I'm just trying to make
sure we don't squander it lightly.


From greg.ewing at  Wed Feb 22 08:54:45 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 20:54:45 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:

> In this particular example, there is no net reduction in line use. The
> execution speed of your algorithm would be reduced due to function
> calling overhead.

If there were more uses of the function, the line count
reduction would be greater.

In any case, line count and execution speed aren't the
only issues -- there is DRY to consider.


From nnorwitz at  Wed Feb 22 09:10:43 2006
From: nnorwitz at (Neal Norwitz)
Date: Wed, 22 Feb 2006 00:10:43 -0800
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

On 2/21/06, Neal Norwitz <nnorwitz at> wrote:
> I agree with this, but don't know a clean way to do 2 builds.  I
> modified buildbot to:
>  - Stop doing the second "without deleting .py[co]" run.
>  - Do one run with a debug build.
>  - Use -uall -r for both.

I screwed it up, so now it does:

  - Do one run with a debug build.
  - Use -uall -r for both.
  - Still does the second "deleting .py[co]" run

I couldn't think of a simple way to figure out that on most unixes the
program is called python, but on Mac OS X, it's called python.exe.  So
I reverted back to using make testall.  We can make a new test target
to only run once.
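
One possible way to cope with the differing executable name, sketched under
the assumption that the build step can run a little Python. The helper
find_built_interpreter is hypothetical and not part of buildbot or the
Makefile:

```python
import os


def find_built_interpreter(build_dir, candidates=("python.exe", "python")):
    """Return the first candidate that exists as a file in build_dir, else None."""
    for name in candidates:
        path = os.path.join(build_dir, name)
        # isfile() skips directories that happen to share a candidate's name.
        if os.path.isfile(path):
            return path
    return None
```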

I also think I know how to do the "double builds" (one release and one
debug).  But it's too late for me to change it tonight without
screwing it up.

The good/bad news after this change is:

A seg fault on Mac OS when running with -r. :-(


From steve at  Wed Feb 22 09:17:25 2006
From: steve at (Steve Holden)
Date: Wed, 22 Feb 2006 03:17:25 -0500
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<001401c6374a$9a3a2fd0$6a01a8c0@RaymondLaptop1>
Message-ID: <dth6mm$k7i$>

Greg Ewing wrote:
> Raymond Hettinger wrote:
>>Like "autodict" could mean anything.
> Everything is meaningless until you know something
> about it. If you'd never seen Python before,
> would you know what 'dict' meant?
> If I were seeing "defaultdict" for the first time,
> I would need to look up the docs before I was
> confident I knew exactly what it did -- as I've
> mentioned before, my initial guess would have
> been wrong. The same procedure would lead me to
> an understanding of 'autodict' just as quickly.
> Maybe 'autodict' isn't the best term either --
> I'm open to suggestions. But my instincts still
> tell me that 'defaultdict' is the best term
> for something *else* that we might want to add
> one day as well, so I'm just trying to make
> sure we don't squander it lightly.
Given that the default entries behind the non-existent keys don't 
actually exist, something like "virtual_dict" might be appropriate.

Or "phantom_dict", or "ghost_dict".

I agree that the naming of things is important.

Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC           
PyCon TX 2006        

From greg.ewing at  Wed Feb 22 09:23:10 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 21:23:10 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <dth6mm$k7i$>
References: <>
	<> <dth6mm$k7i$>
Message-ID: <>

Steve Holden wrote:

> Given that the default entries behind the non-existent keys don't 
> actually exist, something like "virtual_dict" might be appropriate.

No, that would suggest to me something like
a wrapper object that delegates most of the
mapping protocol to something else. That's
even less like what we're discussing.

In our case the default values are only
virtual until you use them, upon which they
become real. Sort of like a wave function
collapse... hmmm... I suppose 'heisendict'
wouldn't fly, would it?
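
The "virtual until used" behaviour can be seen in a few lines, using the
type as it later shipped in collections:

```python
from collections import defaultdict

d = defaultdict(int)
before = "spam" in d      # membership test does not create the entry
fetched = d.get("spam")   # neither does get(); it returns None
value = d["spam"]         # indexing a missing key stores int(), i.e. 0
after = "spam" in d       # now the entry is real
```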


From fuzzyman at  Wed Feb 22 10:02:33 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 22 Feb 2006 09:02:33 +0000
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Greg Ewing wrote:
> Fuzzyman wrote:
>> I've had problems in code that needs to treat strings, lists and
>> dictionaries differently (assigning values to a container where all
>> three need different handling) and telling the difference but allowing
>> duck typing is *problematic*.
> You need to rethink your design so that you don't
> have to make that kind of distinction.

Well... to *briefly* explain the use case, it's for value assignment in 

It basically accepts as valid values strings and lists of strings [#]_. 
You can also create new subsections by assigning a dictionary.

It needs to be able to recognise lists in order to check each list 
member is a string. (See note below, it still needs to be able to 
recognise lists when writing, even if it is not doing type checking on 

It needs to be able to recognise dictionaries in order to create a new 
section instance (rather than directly assigning the dictionary).

This is *terribly* convenient for the user (trivial example of creating 
a new config file programmatically) :

from configobj import ConfigObj
cfg = ConfigObj(newfilename)
cfg['key'] = 'value'
cfg['key2'] = ['value1', 'value2', 'value3']
cfg['section'] = {'key': 'value', 'key2': ['value1', 'value2', 'value3']}

Writes out :

key = value
key2 = value1, value2, value3
[section]
key = value
key2 = value1, value2, value3

(Note none of those values needed quoting, so they aren't.)

Obviously I could force the creation of sections and the assignment of 
list values to use separate methods, but it's much less readable and 

The code as is works and has a nice API. It still needs to be able to 
tell what *type* of value is being assigned.

Mapping and sequence protocols are so loosely defined that in order to 
support 'list like objects' and 'dictionary like objects' some arbitrary 
decision about what methods they should support has to be made. (For 
example a read only mapping container is unlikely to implement 
__setitem__ or methods like update).

At first we defined a mapping object as one that defines __getitem__ and 
keys (not update as I previously said), and list like objects as ones 
that define __getitem__ and *not* keys. For strings we required a 
basestring subclass. In the end I think we ripped this out and just 
settled on isinstance tests.
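
The isinstance-based dispatch described above can be sketched roughly as follows. This is only an illustration, not ConfigObj's actual code: the ``Section`` class is a hypothetical stand-in, it always behaves like the strict mode, and it uses Python 3's ``str`` where the text mentions ``basestring``.

```python
class Section(dict):
    """Hypothetical stand-in for a config section (not ConfigObj's class)."""

    def __setitem__(self, key, value):
        if isinstance(value, Section):
            pass  # already a section, store as-is
        elif isinstance(value, dict):
            # A plain dict becomes a new section instance instead of
            # being assigned directly.
            section = Section()
            for sub_key, sub_value in value.items():
                section[sub_key] = sub_value
            value = section
        elif isinstance(value, list):
            # Lists must be recognised so each member can be checked.
            for member in value:
                if not isinstance(member, str):
                    raise TypeError("list members must be strings: %r" % (member,))
        elif not isinstance(value, str):
            raise TypeError("unsupported value type: %r" % (type(value),))
        super().__setitem__(key, value)


cfg = Section()
cfg['key'] = 'value'
cfg['key2'] = ['value1', 'value2', 'value3']
cfg['section'] = {'key': 'value', 'key2': ['value1', 'value2', 'value3']}
```

The assignment syntax stays identical for all three kinds of value; only the type test inside ``__setitem__`` differs.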

All the best,

Michael Foord

.. [#] Although it has two modes. In the 'default' mode you can assign 
any object as a value and a string representation is written out. A more 
strict mode checks values at the point you assign them - so errors will 
be raised at that point rather than propagating into the config file. 
When writing you still need to be able to recognise lists because each 
element is properly quoted.

From fredrik at  Wed Feb 22 10:38:38 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 22 Feb 2006 10:38:38 +0100
Subject: [Python-Dev] defaultdict proposal round three
References: <><><><><><>
Message-ID: <dthbeu$39d$>

Raymond Hettinger wrote:

> Like "autodict" could mean anything.

fwiw, the first google hit for "autodict" appears to be part of someone's
link farm

    At this website we have assistance with autodict. In addition to
    information for autodict we also have the best web sites concerning
    dictionary, non profit and new york. This makes the
    most reliable guide for autodict on the Internet.

and the second is a description of a self-initializing dictionary data type
for Python.


From stephen at  Wed Feb 22 10:48:16 2006
From: stephen at (Stephen J. Turnbull)
Date: Wed, 22 Feb 2006 18:48:16 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Tue, 21 Feb 2006 22:57:31 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> Stephen J. Turnbull wrote:

    >> What I advocate for Python is to require that the standard
    >> base64 codec be defined only on bytes, and always produce
    >> bytes.

    Greg> I don't understand that. It seems quite clear to me that
    Greg> base64 encoding (in the general sense of encoding, not the
    Greg> unicode sense) takes binary data (bytes) and produces
    Greg> characters.

Base64 is a (family of) wire protocol(s).  It's not clear to me that
it makes sense to say that the alphabets used by "baseNN" encodings
are composed of characters, but suppose we stipulate that.

    Greg> So in Py3k the correct usage would be [bytes<->unicode].

IMHO, as a wire protocol, base64 simply doesn't care what Python's
internal representation of characters is.  I don't see any case for
"correctness" here, only for convenience, both for programmers on the
job and students in the classroom.  We can choose the character set
that works best for us.  I think that's 8-bit US ASCII.

My belief is that bytes<->bytes is going to be the dominant use case,
although I don't use binary representation in XML.  However, AFAIK for
on the wire use UTF-8 is strongly recommended for XML, and in that
case it's also efficient to use bytes<->bytes for XML, since
conversion of base64 bytes to UTF-8 characters is simply a matter of
"Simon says, be UTF-8!"

And in the classroom, you're just going to confuse students by telling
them that UTF-8 --[Unicode codec]--> Python string is decoding but
UTF-8 --[base64 codec]--> Python string is encoding, when MAL is
telling them that --> Python string is always decoding.

Sure, it all makes sense if you already know what's going on.  But I
have trouble remembering, especially in cases like UTF-8 vs UTF-16
where Perl and Python have opposite internal representations, and
glibc has a third which isn't either.  If base64 (and gzip, etc) are
all considered bytes<->bytes, there just isn't an issue any more.  The
simple rule wins: to Python string is always decoding.

Why fight it when we can run away with efficiency gains?<wink>

(In the above, "Python string" means the unicode type, not str.)
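
For reference, a small sketch of the bytes<->bytes behaviour argued for above, as it can be observed in the ``base64`` module of Python 3 (where ``str`` is the unicode type). The payload is arbitrary example data.

```python
import base64

payload = bytes(range(8))             # arbitrary binary example data

encoded = base64.b64encode(payload)   # bytes in, bytes out
decoded = base64.b64decode(encoded)   # and back again

# Turning the base64 bytes into a Python (unicode) string is a separate,
# explicit decoding step; ASCII is enough for the base64 alphabet.
as_text = encoded.decode("ascii")
```

Under this arrangement "to Python string is always decoding" holds: the only step that produces a unicode string is the ``.decode("ascii")`` call.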

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From greg.ewing at  Wed Feb 22 11:29:40 2006
From: greg.ewing at (Greg Ewing)
Date: Wed, 22 Feb 2006 23:29:40 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Fuzzyman wrote:

> cfg = ConfigObj(newfilename)
> cfg['key'] = 'value'
> cfg['key2'] = ['value1', 'value2', 'value3']
> cfg['section'] = {'key': 'value', 'key2': ['value1', 'value2', 'value3']}

If the main purpose is to support this kind of notational
convenience, then I'd be inclined to require all the values
used with this API to be concrete strings, lists or dicts.
If you're going to make types part of the API, I think it's
better to do so with a firm hand rather than being half-
hearted and wishy-washy about it.

Then, if it's really necessary to support a wider variety
of types, provide an alternative API that separates the
different cases and isn't type-dependent at all. If someone
has a need for this API, using it isn't going to be much
of an inconvenience, since he won't be able to write out
constructors for his types using notation as compact as
the above anyway.


From jeremy at  Wed Feb 22 12:14:21 2006
From: jeremy at (Jeremy Hylton)
Date: Wed, 22 Feb 2006 06:14:21 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/22/06, Greg Ewing <greg.ewing at> wrote:
> Mark Russell wrote:
> > PEP 227 mentions using := as a rebinding operator, but rejects the
> > idea as it would encourage the use of closures.
> Well, anything that facilitates rebinding in outer scopes
> is going to encourage the use of closures, so I can't
> see that as being a reason to reject a particular means
> of rebinding. You either think such rebinding is a good
> idea or not -- and that seems to be a matter of highly
> individual taste.

At the time PEP 227 was written, nested scopes were contentious.  (I
recall one developer who said he'd be embarrassed to tell his
co-workers he worked on Python if it had this feature :-).  Rebinding
was more contentious, so the feature was left out.  I don't think any
particular syntax or spelling for rebinding was favored more or less.

> On this particular idea, I tend to think it's too obscure
> as well. Python generally avoids attaching randomly-chosen
> semantics to punctuation, and I'd like to see it stay
> that way.

I agree.


From fuzzyman at  Wed Feb 22 12:14:12 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 22 Feb 2006 11:14:12 +0000
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	<>
	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Fuzzyman wrote:
>> cfg = ConfigObj(newfilename)
>> cfg['key'] = 'value'
>> cfg['key2'] = ['value1', 'value2', 'value3']
>> cfg['section'] = {'key': 'value', 'key2': ['value1', 'value2', 'value3']}
> If the main purpose is to support this kind of notational
> convenience, then I'd be inclined to require all the values
> used with this API to be concrete strings, lists or dicts.
> If you're going to make types part of the API, I think it's
> better to do so with a firm hand rather than being half-
> hearted and wishy-washy about it.
> [snip..]
Thanks, that's the solution we settled on. We use ``isinstance`` tests 
to determine types.

The user can always do something like :

    cfg['section'] = dict(dict_like_object)

Which isn't so horrible.
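
A minimal sketch of that conversion, assuming a user-defined class that only provides ``keys`` and ``__getitem__`` (the ``DictLike`` class is made up for illustration):

```python
class DictLike:
    """Made-up dict-like object: only keys() and __getitem__ are defined."""

    def __init__(self):
        self._data = {'key': 'value', 'key2': ['value1', 'value2']}

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


# dict() accepts any object with keys() and __getitem__, so the caller can
# hand a real dict to an API that does isinstance tests.
plain = dict(DictLike())
```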

All the best,


From fuzzyman at  Wed Feb 22 12:33:03 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 22 Feb 2006 11:33:03 +0000
Subject: [Python-Dev]*Type
Message-ID: <>

Hello all,

Feel free to shoot this down, but a suggestion.

The operator module defines two functions :

    isMappingType
    isSequenceType

These return a guesstimation as to whether an object passed in supports 
the mapping and sequence protocols.

These protocols are loosely defined. Any object which has a 
``__getitem__`` method defined could support either protocol.

Therefore :

>>> from operator import isSequenceType, isMappingType
>>> class anything(object):
...     def __getitem__(self, index):
...         pass
...
>>> something = anything()
>>> isMappingType(something)
True
>>> isSequenceType(something)
True

I suggest we either deprecate these functions as worthless, *or* we 
define the protocols slightly more clearly for user defined classes.

An object prima facie supports the mapping protocol if it defines a 
``__getitem__`` method, and a ``keys`` method.

An object prima facie supports the sequence protocol if it defines a 
``__getitem__`` method, and *not* a ``keys`` method.

As a result code which needs to be able to tell the difference can use 
these functions and can sensibly refer to the definition of the mapping 
and sequence protocols when documenting what sort of objects an API call 
can accept.
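
The proposed convention could be sketched as two small helper functions. The names ``looks_like_mapping`` and ``looks_like_sequence`` are hypothetical, chosen here only to illustrate the rule; they are not part of the operator module.

```python
def looks_like_mapping(obj):
    # Prima facie mapping: __getitem__ plus a keys method.
    return hasattr(obj, '__getitem__') and hasattr(obj, 'keys')


def looks_like_sequence(obj):
    # Prima facie sequence: __getitem__ but *no* keys method.
    return hasattr(obj, '__getitem__') and not hasattr(obj, 'keys')


class anything(object):
    def __getitem__(self, index):
        pass


something = anything()
```

With this rule ``something`` counts as a sequence only, since it has no ``keys`` method. Note that strings also satisfy the sequence test, so code that treats strings specially would still need a separate check.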

All the best,

Michael Foord

From raymond.hettinger at  Wed Feb 22 12:45:47 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 06:45:47 -0500
Subject: [Python-Dev] defaultdict and on_missing()
Message-ID: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>

I'm concerned that the on_missing() part of the proposal is gratuitous.  The main use cases for defaultdict have a simple factory that supplies a zero, empty list, or empty set.  The on_missing() hook is only there to support the rarer case of needing a key to compute a default value.  The hook is not needed for the main use cases.

As it stands, we're adding a method to regular dicts that cannot be usefully called directly.  Essentially, it is a framework method meant to be overridden in a subclass.  So, it only makes sense in the context of subclassing.  In the meantime, we've added an oddball method to the main dict API, arguably the most important object API in Python.  

To use the hook, you write something like this:

    class D(dict):
        def on_missing(self, key):
             return somefunc(key)

However, we can already do something like that without the hook:

    class D(dict):
        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                self[key] = value = somefunc(key)
                return value

The latter form is already possible, doesn't require modifying a basic API, and is arguably clearer about when it is called and what it does (the former doesn't explicitly show that the returned value gets saved in the dictionary).

Since we can already do the latter form, we can get some insight into whether the need has ever actually arisen in real code.  I scanned the usual sources (my own code, the standard library, and my most commonly used third-party libraries) and found no instances of code like that.   The closest approximation was safe_substitute() in string.Template where missing keys returned themselves as a default value.  Other than that, I conclude that there isn't sufficient need to warrant adding a funky method to the API for regular dicts.

I wondered why the safe_substitute() example was unique.  I think the answer is that we normally handle default computations through simple in-line code ("if k in d: do1() else do2()" or a try/except pair).  Overriding on_missing() then is really only useful when you need to create a type that can be passed to a client function that was expecting a regular dictionary.  So it does come-up but not much.

Aside:  Why on_missing() is an oddball among dict methods.  When teaching dicts to beginners, all the methods are easily explainable except this one.  You don't call this method directly, you only use it when subclassing, you have to override it to do anything useful, it hooks KeyError but only when raised by __getitem__ and not other methods, etc.  I'm concerned that even having this method in the regular dict API will create confusion about when to use dict.get(), when to use dict.setdefault(), when to catch a KeyError, or when to LBYL.  Adding this one extra choice makes the choice more difficult.

My recommendation:  Dump the on_missing() hook.  That leaves the dict API unmolested and allows a more straight-forward implementation/explanation of collections.default_dict or whatever it ends-up being named.  The result is delightfully simple and easy to understand/explain.



From greg.ewing at  Wed Feb 22 12:35:39 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 00:35:39 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> Base64 is a (family of) wire protocol(s).  It's not clear to me that
> it makes sense to say that the alphabets used by "baseNN" encodings
> are composed of characters,

Take a look at

where it says

   ...base64 is a binary to text encoding scheme whereby an
   arbitrary sequence of bytes is converted to a sequence of
   printable ASCII characters.

Also see RFC 2045, which
defines base64 in terms of an encoding from octets to characters,
and also says

   A 65-character subset of US-ASCII is used ... This subset has
   the important property that it is represented identically in
   all versions of ISO 646 ... and all characters in the subset
   are also represented identically in all versions of EBCDIC.

Which seems to make it perfectly clear that the result
of the encoding is to be considered as characters, which
are not necessarily going to be encoded using ascii.

So base64 on its own is *not* a wire protocol. Only after
encoding the characters do you have a wire protocol.

> I don't see any case for
> "correctness" here, only for convenience,

I'm thinking of convenience, too. Keep in mind that in Py3k,
'unicode' will be called 'str' (or something equally neutral
like 'text') and you will rarely have to deal explicitly with
unicode codings, this being done mostly for you by the I/O
objects. So most of the time, using base64 will be just as
convenient as it is today: base64_encode(my_bytes) and write
the result out somewhere.

The reason I say it's *correct* is that if you go straight
from bytes to bytes, you're *assuming* the eventual encoding
is going to be an ascii superset. The programmer is going to
have to know about this assumption and understand all its
consequences and decide whether it's right, and if not, do
something to change it.

Whereas if the result is text, the right thing happens
automatically whatever the ultimate encoding turns out to
be. You can take the text from your base64 encoding, combine
it with other text from any other source to form a complete
mail message or xml document or whatever, and write it out
through a file object that's using any unicode encoding
at all, and the result will be correct.
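
A rough sketch of that argument in Python 3 terms, where ``str`` is the unicode text type. The ``<data>`` wrapper and the payload are invented for the example; the point is only that base64 *text* survives any unicode encoding chosen later.

```python
import base64

payload = b"\x00\xffbinary data"

# Treat the base64 result as text and combine it with other text.
text = base64.b64encode(payload).decode("ascii")
document = "<data>" + text + "</data>"

recovered = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    wire = document.encode(encoding)        # what the I/O layer would write
    read_back = wire.decode(encoding)       # what a reader would see
    body = read_back[len("<data>"):-len("</data>")]
    recovered[encoding] = base64.b64decode(body)
```

Each entry in ``recovered`` equals the original payload, even though the UTF-16 and UTF-32 byte streams are not ASCII supersets.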

> it's also efficient to use bytes<->bytes for XML, since
> conversion of base64 bytes to UTF-8 characters is simply a matter of
> "Simon says, be UTF-8!"

Efficiency is an implementation concern. In Py3k, strings
which contain only ascii or latin-1 might be stored as
1 byte per character, in which case this would not be an

> And in the classroom, you're just going to confuse students by telling
> them that UTF-8 --[Unicode codec]--> Python string is decoding but
> UTF-8 --[base64 codec]--> Python string is encoding, when MAL is
> telling them that --> Python string is always decoding.

Which is why I think that only *unicode* codings should be
available through the .encode and .decode interface. Or
alternatively there should be something more explicit like
.unicode_encode and .unicode_decode that is thus restricted.

Also, if most unicode coding is done in the I/O objects, there
will be far less need for programmers to do explicit unicode
coding in the first place, so likely it will become more of
an advanced topic, rather than something you need to come to
grips with on day one of using unicode, like it is now.


From fredrik at  Wed Feb 22 12:54:28 2006
From: fredrik at (Fredrik Lundh)
Date: Wed, 22 Feb 2006 12:54:28 +0100
Subject: [Python-Dev] defaultdict and on_missing()
References: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
Message-ID: <dthjdk$u0p$>

Raymond Hettinger wrote:

> Aside:  Why on_missing() is an oddball among dict methods.  When
> teaching dicts to beginners, all the methods are easily explainable
> except this one.  You don't call this method directly, you only use it
> when subclassing, you have to override it to do anything useful, it
> hooks KeyError but only when raised by __getitem__ and not
> other methods, etc.


> My recommendation:  Dump the on_missing() hook.  That leaves
> the dict API unmolested and allows a more straight-forward
> implementation/explanation of collections.default_dict or whatever
> it ends-up being named.  The result is delightfully simple and easy
> to understand/explain.


a separate type in collections, a template object (or factory) passed to
the constructor, and implementation inheritance, is more than good
enough.  and if I recall correctly, pretty much what Guido first proposed.
I trust his intuition a lot more than I trust the
design-by-committee-without-use-cases process.


From greg.ewing at  Wed Feb 22 12:42:16 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 00:42:16 +1300
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
References: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
Message-ID: <>

Raymond Hettinger wrote:
> I'm concerned that the on_missing() part of the proposal is gratuitous.  

I second all that. A clear case of YAGNI.


From python at  Wed Feb 22 12:59:59 2006
From: python at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 06:59:59 -0500
Subject: [Python-Dev]*Type
References: <>
Message-ID: <002101c637a7$81d7afa0$6a01a8c0@RaymondLaptop1>

> >>> from operator import isSequenceType, isMappingType
> >>> class anything(object):
> ...     def __getitem__(self, index):
> ...         pass
> ...
> >>> something = anything()
> >>> isMappingType(something)
> True
> >>> isSequenceType(something)
> True
> I suggest we either deprecate these functions as worthless, *or* we
> define the protocols slightly more clearly for user defined classes.

They are not worthless.  They do a damned good job of differentiating anything 
that CAN be differentiated.

Your example simply highlights the consequences of one of Python's most basic, 
original design choices (using getitem for both sequences and mappings).  That 
choice is now so fundamental to the language that it cannot possibly change. 
Get used to it.

In your example, the results are correct.  The "anything" class can be viewed as 
either a sequence or a mapping.

In this and other posts, you seem to be focusing your design around notions of 
strong typing and mandatory interfaces.  I would suggest that that approach is 
futile unless you control all of the code being run.


From theller at  Wed Feb 22 13:03:55 2006
From: theller at (Thomas Heller)
Date: Wed, 22 Feb 2006 13:03:55 +0100
Subject: [Python-Dev]*Type
In-Reply-To: <>
References: <>
Message-ID: <>

Fuzzyman wrote:
> Hello all,
> Feel free to shoot this down, but a suggestion.
> The operator module defines two functions :
>     isMappingType
>     isSequenceType
> These return a guesstimation as to whether an object passed in supports 
> the mapping and sequence protocols.
> These protocols are loosely defined. Any object which has a 
> ``__getitem__`` method defined could support either protocol.

The docs contain clear warnings about that.

> I suggest we either deprecate these functions as worthless, *or* we 
> define the protocols slightly more clearly for user defined classes.

I have no problems deprecating them since I've never used one of these
functions.  If I want to know if something is a string I use isinstance(),
for string-like objects I would use

  try: obj + ""
  except TypeError:

and so on.
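
Spelled out as a complete function, the string-like test might look like the sketch below. The helper name ``is_string_like`` is made up here, and the example assumes Python 3, where ``""`` is a unicode string.

```python
def is_string_like(obj):
    """Return True if obj can be concatenated with a string."""
    try:
        obj + ""
    except TypeError:
        return False
    return True
```

Lists, integers and bytes objects all raise ``TypeError`` on ``obj + ""`` and are therefore rejected, while ``str`` and its subclasses pass.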

> An object prima facie supports the mapping protocol if it defines a 
> ``__getitem__`` method, and a ``keys`` method.
> An object prima facie supports the sequence protocol if it defines a 
> ``__getitem__`` method, and *not* a ``keys`` method.
> As a result code which needs to be able to tell the difference can use 
> these functions and can sensibly refer to the definition of the mapping 
> and sequence protocols when documenting what sort of objects an API call 
> can accept.


From fuzzyman at  Wed Feb 22 13:18:09 2006
From: fuzzyman at (Fuzzyman)
Date: Wed, 22 Feb 2006 12:18:09 +0000
Subject: [Python-Dev]*Type
In-Reply-To: <002101c637a7$81d7afa0$6a01a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
>> >>> from operator import isSequenceType, isMappingType
>> >>> class anything(object):
>> ...     def __getitem__(self, index):
>> ...         pass
>> ...
>> >>> something = anything()
>> >>> isMappingType(something)
>> True
>> >>> isSequenceType(something)
>> True
>> I suggest we either deprecate these functions as worthless, *or* we
>> define the protocols slightly more clearly for user defined classes.
> They are not worthless.  They do a damned good job of differentiating 
> anything that CAN be differentiated.
But as far as I can tell (and I may be wrong), they only work if the 
object is a subclass of a built in type, otherwise they're broken. So 
you'd have to do a type check as well, unless you document that an API 
call *only* works with a builtin type or subclass.

In which case - an isinstance call does the same, with the advantage of 
not being broken if the object is a user-defined class.

At the very least the function would be better renamed 
``MightBeMappingType``  ;-)

> Your example simply highlights the consequences of one of Python's 
> most basic, original design choices (using getitem for both sequences 
> and mappings).  That choice is now so fundamental to the language that 
> it cannot possibly change. Get used to it.
I have no problem with it - it's useful.

> In your example, the results are correct.  The "anything" class can be 
> viewed as either a sequence or a mapping.
But in practice an object is *unlikely* to be both. (Although 
conceivably a mapping container *could* implement integer indexing and 
thus be both - but *very* rare). Therefore the current behaviour is not 
really useful in any conceivable situation - not that I can think of anyway.

> In this and other posts, you seem to be focusing your design around 
> notions of strong typing and mandatory interfaces.  I would suggest 
> that that approach is futile unless you control all of the code being 
> run.
Not directly. I'm suggesting that the loosely defined protocol (used 
with duck typing) can be made quite a bit more useful by making the 
definition *slightly* more specific.

A preference for strong typing would require subclassing, surely ?

The approach I suggest would allow a *less* 'strongly typed' approach to 
code, because it establishes a convention to decide whether a user 
defined class supports the mapping and sequence protocols.

The simple alternative (which we took in ConfigObj) is to require a 
'strongly typed' interface, because there is currently no useful way to 
determine whether an object that implements __getitem__ supports mapping 
or sequence. (Other than *assuming* that a mapping container implements 
a random choice from the other common mapping methods.)

All the best,

> Raymond

From mwh at  Wed Feb 22 14:53:18 2006
From: mwh at (Michael Hudson)
Date: Wed, 22 Feb 2006 13:53:18 +0000
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <> (Martin v. Löwis's message of
	"Tue, 21 Feb 2006 22:34:27 +0100")
References: <> <>
	<> <>
Message-ID: <>

"Martin v. Löwis" <martin at> writes:

> Tim Peters wrote:
>> Speaking of which, a number of test failures over the past few weeks
>> were provoked here only under -r (run tests in random order) or under
>> a debug build, and didn't look like those were specific to Windows. 
>> Adding -r to the buildbot test recipe is a decent idea.  Getting
>> _some_ debug-build test runs would also be good (or do we do that
>> already?).
> So what is your recipe: Add -r to all buildbots? Only to those which
> have an 'a' in their name? Only to every third build? Duplicating
> the number of builders?
> Same question for --with-pydebug. Combining this with -r would multiply
> the number of builders by 4 already.

Instead of running release and debug builds, why not just run debug
builds?  They catch more problems, earlier.


  This song is for anyone ... fuck it.  Shut up and listen.
                         -- Eminem, "The Way I Am"

From raymond.hettinger at  Wed Feb 22 16:21:50 2006
From: raymond.hettinger at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 10:21:50 -0500
Subject: [Python-Dev] defaultdict proposal round three
References: <>
Message-ID: <002101c637c3$b88dd0d0$6a01a8c0@RaymondLaptop1>

> I'd love to remove setdefault in 3.0 -- but I don't think it can be done
> before that: default_factory won't cover the occasional use cases where
> setdefault is called with different defaults at different locations, and,
> rare as those cases may be, any 2.* should not break any existing code that
> uses that approach.

I'm not too concerned about this one.  Whenever setdefault gets deprecated,
then ALL code that used it would have to be changed.  If there were cases with
different defaults, a regular try/except would do the job just fine (heck, it
might even be faster because there won't be a wasted instantiation in the cases
where the key already exists).
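
A small sketch of the two spellings being compared; the function names are invented for the example.

```python
def add_with_setdefault(d, key, item):
    # A fresh empty list is built on every call, even when the key exists.
    d.setdefault(key, []).append(item)


def add_with_try(d, key, item):
    # The default is only created in the (rarer) missing-key case, and a
    # different default can be used at each call site.
    try:
        d[key].append(item)
    except KeyError:
        d[key] = [item]


a, b = {}, {}
for word in ("spam", "eggs", "spam"):
    add_with_setdefault(a, word[0], word)
    add_with_try(b, word[0], word)
```

Both dictionaries end up with the same contents; only the cost on the already-present path differs.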

There may be other reasons to delay removing setdefault(), but the
multiple-default use case isn't one of them.

>> An alternative is to have two possible attributes:
>>   d.default_factory = list
>> or
>>   d.default_value = 0
>> with an exception being raised when both are defined (the test is done
>> when the attribute is created, not when the lookup is performed).
> I see default_value as a way to get exactly the same beginner's error we
> already have with function defaults:

That makes sense.

I'm somewhat happy with the patch as it stands now.  The only part that needs 
serious rethinking is putting on_missing() in regular dicts.  See my other email 
on that subject.


From chris at  Wed Feb 22 16:40:05 2006
From: chris at (Chris AtLee)
Date: Wed, 22 Feb 2006 10:40:05 -0500
Subject: [Python-Dev] Copying zlib compression objects
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/17/06, Guido van Rossum <guido at> wrote:
> Please submit your patch to SourceForge.

I've submitted the zlib patch as patch #1435422.  I added some test cases to and documented the new methods.  I'd like to test my gzip /
tarfile changes more before creating a patch for it, but I'm interested in
any feedback about the idea of adding snapshot() / restore() methods to the
GzipFile and TarFile classes.
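
As an illustration of what copying a compression object makes possible, here is a sketch using the ``copy()`` method that compression objects have in current versions of the zlib module. Whether this matches the methods in the patch mentioned above is an assumption; the input data is invented.

```python
import zlib

base = zlib.compressobj()
head = base.compress(b"shared header line\n" * 100)

# Fork the compressor state after the common prefix has been fed in.
fork = base.copy()

first = head + base.compress(b"first ending") + base.flush()
second = head + fork.compress(b"second ending") + fork.flush()
```

Both ``first`` and ``second`` are complete zlib streams that share the already-compressed prefix, so the common data is only compressed once.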

It doesn't look like the underlying bz2 library supports copying compression
/ decompression streams, so for now it's impossible to make corresponding
changes to the bz2 module.

I also noticed that the tarfile module reimplements the gzip file format when
dealing with streams.  Would it make sense to refactor some of the code
to expose the methods that read/write the gzip file header, and have the
tarfile module use those methods?

-------------- next part --------------
An HTML attachment was scrubbed...

From aleaxit at  Wed Feb 22 16:47:33 2006
From: aleaxit at (Alex Martelli)
Date: Wed, 22 Feb 2006 07:47:33 -0800
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <002101c637c3$b88dd0d0$6a01a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

On Feb 22, 2006, at 7:21 AM, Raymond Hettinger wrote:
> I'm somewhat happy with the patch as it stands now.  The only part  
> that needs serious rethinking is putting on_missing() in regular  
> dicts.  See my other email on that subject.

What if we named it _on_missing? Hook methods intended only to be  
overridden in subclasses are sometimes spelled that way, and it  
removes the need to teach about it to beginners -- it looks private  
so we don't explain it at that point.

My favorite example is Queue.Queue: I teach it (and in fact  
evangelize for it as the one sane way to do threading;-) in "Python  
101", *without* ever mentioning _get, _put etc -- THOSE I teach in
"Patterns with Python" as the very best example of the Gof4's classic
"Template Method" design pattern. If dict had _on_missing I'd have  
another wonderful example to teach from!  (I believe the Library  
Reference avoids teaching about _get, _put etc, too, though I haven't  
checked it for a while).
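
A minimal sketch of the Template Method hooks mentioned above, written against the Python 3 spelling of the module (``queue`` rather than ``Queue``). The ``StackQueue`` name is made up; the technique of overriding ``_init``, ``_put`` and ``_get`` is the same one the standard library's own queue variants use.

```python
import queue


class StackQueue(queue.Queue):
    """Last-in, first-out queue built by overriding only the hook methods."""

    def _init(self, maxsize):
        self.queue = []          # storage consulted by the inherited _qsize

    def _put(self, item):
        self.queue.append(item)

    def _get(self):
        return self.queue.pop()


q = StackQueue()
for item in (1, 2, 3):
    q.put(item)                  # public API: locking is handled by Queue
order = [q.get() for _ in range(3)]
```

Callers only ever use ``put`` and ``get``; the underscore-prefixed hooks are never called directly, which is the point being made about ``_on_missing``.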

TM is my favorite DP, so I'm biased in favor of Guido's design, and I  
think that by giving the hook method (not meant to be called, only  
overridden) a "private name" we're meeting enough of your and /F's  
concerns to let _on_missing remain. Its existence does simplify the  
implementation of defaultdict (and some other dict subclasses), and  
"if the implementation is easy to explain, it may be a good idea",  
after all;-)


From guido at  Wed Feb 22 16:49:48 2006
From: guido at (Guido van Rossum)
Date: Wed, 22 Feb 2006 10:49:48 -0500
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
References: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
Message-ID: <>

On 2/22/06, Raymond Hettinger <raymond.hettinger at> wrote:
> I'm concerned that the on_missing() part of the proposal is gratuitous.  The
> main use cases for defaultdict have a simple factory that supplies a zero,
> empty list, or empty set.  The on_missing() hook is only there to support
> the rarer case of needing a key to compute a default value.  The hook is not
> needed for the main use cases.

The on_missing() hook is there to take the action of inserting the
default value into the dict. For this it needs the key.

It seems attractive to collapse default_factory and on_missing into a
single attribute (my first attempt did this, and I was halfway posting
about it before I realized the mistake). But on_missing() really needs
the key, and at the same time you don't want to lose the convenience
of being able to specify set, list, int etc. as default factories, so
default_factory() must be called without the key.

If you don't have on_missing, then the functionality of inserting the
value produced by default_factory would have to be in-lined in
__getitem__, which means the machinery put in place can't be reused
for other use cases -- several people have claimed to have a use case
for returning a value *without* inserting it into the dict.
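
The division of labour described above can be sketched in pure Python as follows. This is only an illustration of the idea under discussion, not the patch itself: the class names are invented, and the real proposal puts the hook into dict's C implementation rather than overriding ``__getitem__``.

```python
class SketchDefaultDict(dict):
    """Illustrative only: default_factory takes no key, on_missing does."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def on_missing(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # The hook needs the key in order to store the new value.
        value = self[key] = self.default_factory()
        return value

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.on_missing(key)


class NonStoring(SketchDefaultDict):
    """Variant that returns a default *without* inserting it."""

    def on_missing(self, key):
        return self.default_factory()


d = SketchDefaultDict(list)
d['a'].append(1)

n = NonStoring(int)
zero = n['x']
```

Because the factory is called without arguments, plain ``list``, ``set`` or ``int`` can be passed in, while subclasses that want different behaviour on a miss override only ``on_missing``.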

> As it stands, we're adding a method to regular dicts that cannot be usefully
> called directly.  Essentially, it is a framework method meant to be
> overridden in a subclass.  So, it only makes sense in the context of
> subclassing.  In the meantime, we've added an oddball method to the main
> dict API, arguably the most important object API in Python.

Which to me actually means it's a *good* place to put the hook
functionality, since it allows for maximum reuse.

> To use the hook, you write something like this:
>     class D(dict):
>         def on_missing(self, key):
>              return somefunc(key)

Or, more likely,

def on_missing(self, key):
    self[key] = value = somefunc()
    return value

> However, we can already do something like that without the hook:
>     class D(dict):
>         def __getitem__(self, key):
>             try:
>                 return dict.__getitem__(self, key)
>             except KeyError:
>                 self[key] = value = somefunc(key)
>                 return value
> The latter form is already possible, doesn't require modifying a basic API,
> and is arguably clearer about when it is called and what it does (the former
> doesn't explicitly show that the returned value gets saved in the
> dictionary).

This is exactly what Google's internal DefaultDict does. But it is
also its downfall, because now *all* __getitem__ calls are weighed
down by going through Python code; in a particular case that came up
at Google I had to recommend against using it for performance reasons.

> Since we can already do the latter form, we can get some insight into
> whether the need has ever actually arisen in real code.  I scanned the usual
> sources (my own code, the standard library, and my most commonly used
> third-party libraries) and found no instances of code like that.   The
> closest approximation was safe_substitute() in string.Template where missing
> keys returned themselves as a default value.  Other than that, I conclude
> that there isn't sufficient need to warrant adding a funky method to the API
> for regular dicts.

In this case I don't believe that the absence of real-life examples
says much (and BTW Google's DefaultDict *is* such a real life example;
it is used in other code). There is not much incentive for subclassing
dict and overriding __getitem__ if the alternative is that in a few
places you have to write two lines of code instead of one:

    if key not in d: d[key] = set()    # this line would be unneeded

> I wondered why the safe_substitute() example was unique.  I think the answer
> is that we normally handle default computations through simple in-line code
> ("if k in d: do1() else do2()" or a try/except pair).  Overriding
> on_missing() then is really only useful when you need to create a type that
> can be passed to a client function that was expecting a regular dictionary.
> So it does come-up but not much.

I think the pattern hasn't been commonly known; people have been
struggling with setdefault() all these years.

> Aside:  Why on_missing() is an oddball among dict methods.  When teaching
> dicts to beginners, all the methods are easily explainable except this one.

You don't seriously teach beginners all dict methods do you?
setdefault(), update(), copy() are all advanced material, and so are
iteritems(), itervalues() and iterkeys() (*especially* the last since
it's redundant through "for i in d:").

> You don't call this method directly, you only use it when subclassing, you
> have to override it to do anything useful, it hooks KeyError but only when
> raised by __getitem__ and not other methods, etc.

The only other methods that raise KeyError are __delitem__, pop() and
popitem(). I don't see how these could use the same hook as
__getitem__ if the only real known use case for the latter is a hook
that inserts the value -- these methods all *delete* an item, so they
would need a different hook anyway (two different hooks, really, since
__delitem__ doesn't need a value). And I can't even think of a
theoretical use case for hooking these, let alone a real one.

> I'm concerned that
> even having this method in regular dict API will create confusion about
> when to use dict.get(), when to use dict.setdefault(), when to catch a
> KeyError, or when to LBYL.  Adding this one extra choice makes the choice
> more difficult.

Well, obviously if you're not subclassing you can't use on_missing(),
so it doesn't really add much to the available choices, *unless* you
subclass, which is a choice you're likely to make in a different phase
of the design, and not lightly.

> My recommendation:  Dump the on_missing() hook.  That leaves the dict API
> unmolested and allows a more straight-forward implementation/explanation of
> collections.default_dict or whatever it ends-up being named.  The result is
> delightfully simple and easy to understand/explain.

I disagree. on_missing() is exactly the right refactoring. If we
removed on_missing() from dict, we'd have to override __getitem__ in
defaultdict (regardless of whether we give defaultdict an on_missing()
hook or in-line it). But the base class __getitem__ is a careful piece
of work! The override in defaultdict basically has two choices: invoke
dict.__getitem__ and catch the KeyError exception, or copy all the
code. (Using PyDict_GetItem would be even more wrong since it
suppresses exceptions in the hash and comparison phase of the lookup.)
Copying all the code is fraught with maintenance problems. Calling
dict.__getitem__ has the problem that it *could* raise KeyError for
reasons that have nothing to do (directly) with a missing item -- a
broken hash or comparison could also raise this, and in that case it
would be a mistake to call on_missing().
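
The failure mode in the last sentences can be sketched as follows
(Python 3 syntax; BadKey and NaiveDefault are hypothetical names used
only for illustration). A wrapper that catches KeyError around
dict.__getitem__ also catches a KeyError raised by a broken __hash__,
and wrongly treats it as a missing item.

```python
class BadKey:
    """A key whose hash is broken in a way that happens to raise KeyError."""

    def __hash__(self):
        raise KeyError("broken __hash__")


class NaiveDefault(dict):
    """Override that cannot tell 'missing item' from 'lookup itself failed'."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Also reached when BadKey.__hash__ raised, not only on a miss.
            return "default"


masked = NaiveDefault()[BadKey()]   # the real error is silently swallowed

try:
    {}[BadKey()]                    # a plain dict lets the error propagate
    propagated = None
except KeyError as exc:
    propagated = exc.args[0]

print(masked, propagated)
```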

IMO pretty much the only reason for keeping the changes contained
within the collections module would be code modularity; but the above
argument about code reuse deconstructs that argument.

--Guido van Rossum (home page:

From foom at  Wed Feb 22 17:17:41 2006
From: foom at (James Y Knight)
Date: Wed, 22 Feb 2006 11:17:41 -0500
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 22, 2006, at 6:35 AM, Greg Ewing wrote:

> I'm thinking of convenience, too. Keep in mind that in Py3k,
> 'unicode' will be called 'str' (or something equally neutral
> like 'text') and you will rarely have to deal explicitly with
> unicode codings, this being done mostly for you by the I/O
> objects. So most of the time, using base64 will be just as
> convenient as it is today: base64_encode(my_bytes) and write
> the result out somewhere.
> The reason I say it's *correct* is that if you go straight
> from bytes to bytes, you're *assuming* the eventual encoding
> is going to be an ascii superset. The programmer is going to
> have to know about this assumption and understand all its
> consequences and decide whether it's right, and if not, do
> something to change it.
> Whereas if the result is text, the right thing happens
> automatically whatever the ultimate encoding turns out to
> be. You can take the text from your base64 encoding, combine
> it with other text from any other source to form a complete
> mail message or xml document or whatever, and write it out
> through a file object that's using any unicode encoding
> at all, and the result will be correct.

This makes little sense for mail. You combine *bytes*, in various and  
possibly different encodings to form a mail message. Some MIME  
sections might have a base64 Content-Transfer-Encoding, others might  
be 8bit encoded, others might be 7bit encoded, others might be quoted- 
printable encoded. Before the C-T-E encoding, you will have had to do  
the Content-Type encoding, converting your text into bytes with the  
desired character encoding: utf-8, iso-8859-1, etc. Having the final  
mail message be made up of "characters", right before transmission to  
the socket would be crazy.
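
A rough sketch of the two encoding steps, assuming Python 3 and only the
standard base64 and quopri modules. The sample text and variable names
are made up, and this is not a complete MIME message (headers and
boundaries are omitted); it only shows that the parts are combined as
bytes.

```python
import base64
import quopri

text = "Grüße aus Köln"

# Step 1: character encoding (text -> bytes), chosen per MIME part.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# Step 2: Content-Transfer-Encoding (bytes -> 7-bit-safe bytes).
part_one = base64.b64encode(utf8_bytes)
part_two = quopri.encodestring(latin1_bytes)

# The parts are joined as bytes; no single character encoding covers both.
body = b"\r\n".join([part_one, part_two])
print(type(body).__name__, len(body))
```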


From python at  Wed Feb 22 17:20:44 2006
From: python at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 11:20:44 -0500
Subject: [Python-Dev] defaultdict and on_missing()
References: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
Message-ID: <001301c637cb$f05fd960$6a01a8c0@RaymondLaptop1>

[Guido van Rossum]
> If we removed on_missing() from dict, we'd have to override
> __getitem__ in defaultdict (regardless of whether we give
> defaultdict an on_missing() hook or in-line it).

You have another option.  Keep your current modifications to
dict.__getitem__ but do not include dict.on_missing().  Let it only
be called in a subclass IF it is defined; otherwise, raise KeyError.

That keeps me happy since the basic dict API won't show on_missing(),
but it still allows a user to attach an on_missing method to a dict subclass
when or if needed.  I think all your test cases would still pass without
modification.

This approach is not much different than for other magic methods which
kick in if defined or revert to a default behavior if not.

My core concern is to keep the dict API clean as a whistle.
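
For comparison, a sketch of an optional hook of this kind, assuming a
current Python 3, where dict.__getitem__ looks for a method named
__missing__ on subclasses (on_missing() is the working name used in
this thread). CountingDict is a hypothetical name.

```python
class CountingDict(dict):
    """dict subclass that defines the optional hook and counts its calls."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.hook_calls = 0

    def __missing__(self, key):
        # Only reached when the key is absent; hits never run this code.
        self.hook_calls += 1
        self[key] = value = set()
        return value


# The base dict type does not expose the hook at all.
base_has_hook = hasattr(dict, "__missing__")

d = CountingDict()
d["a"].add(1)   # miss: the hook runs once and stores the set
d["a"].add(2)   # hit: handled by dict.__getitem__ without calling the hook
print(base_has_hook, d, d.hook_calls)
```

Because the hook is looked up only on a miss, ordinary lookups on the
subclass do not pay for a Python-level __getitem__ override.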


From bob at  Wed Feb 22 18:04:38 2006
From: bob at (Bob Ippolito)
Date: Wed, 22 Feb 2006 09:04:38 -0800
Subject: [Python-Dev]*Type
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 22, 2006, at 4:18 AM, Fuzzyman wrote:

> Raymond Hettinger wrote:
>>>>>> from operator import isSequenceType, isMappingType
>>>>>> class anything(object):
>>> ...     def __getitem__(self, index):
>>> ...         pass
>>> ...
>>>>>> something = anything()
>>>>>> isMappingType(something)
>>> True
>>>>>> isSequenceType(something)
>>> True
>>> I suggest we either deprecate these functions as worthless, *or* we
>>> define the protocols slightly more clearly for user defined classes.
>> They are not worthless.  They do a damned good job of differentiating
>> anything that CAN be differentiated.
> But as far as I can tell (and I may be wrong), they only work if the
> object is a subclass of a built in type, otherwise they're broken. So
> you'd have to do a type check as well, unless you document that an API
> call *only* works with a builtin type or subclass.

If you really cared, you could check hasattr(something, 'get') and  
hasattr(something, '__getitem__'), which is a pretty good indicator  
that it's a mapping and not a sequence (in a dict-like sense, anyway).


From ianb at  Wed Feb 22 18:10:14 2006
From: ianb at (Ian Bicking)
Date: Wed, 22 Feb 2006 11:10:14 -0600
Subject: [Python-Dev]*Type
In-Reply-To: <002101c637a7$81d7afa0$6a01a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
>>>>>from operator import isSequenceType, isMappingType
>>>>>class anything(object):
>>...     def __getitem__(self, index):
>>...         pass
>>>>>something = anything()
>>I suggest we either deprecate these functions as worthless, *or* we
>>define the protocols slightly more clearly for user defined classes.
> They are not worthless.  They do a damned good job of differentiating anything 
> that CAN be differentiated.

But they are just identical...?  They seem terribly pointless to me. 
Deprecation is one option, of course.  I think Michael's suggestion also 
makes sense.  *If* we distinguish between sequences and mapping types 
with two functions, *then* those two functions should be distinct.  It 
seems kind of obvious, doesn't it?

I think hasattr(obj, 'keys') is the simplest distinction of the two 
kinds of collections.
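
A small sketch of that distinction as a duck-typing check, in Python 3
syntax. The helper names looks_like_mapping and looks_like_sequence are
hypothetical, and the test is only a heuristic: a mapping that does not
define keys() would be classified as a sequence.

```python
def looks_like_mapping(obj):
    # Supports item access and offers keys(), like dict.
    return hasattr(obj, "__getitem__") and hasattr(obj, "keys")


def looks_like_sequence(obj):
    # Supports item access but offers no keys().
    return hasattr(obj, "__getitem__") and not hasattr(obj, "keys")


class Anything(object):
    """User-defined class that defines only __getitem__."""

    def __getitem__(self, index):
        pass


for candidate in ({}, [], "text", set(), Anything()):
    print(type(candidate).__name__,
          looks_like_mapping(candidate),
          looks_like_sequence(candidate))
```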

> Your example simply highlights the consequences of one of Python's most basic, 
> original design choices (using getitem for both sequences and mappings).  That 
> choice is now so fundamental to the language that it cannot possibly change. 
> Get used to it.
> In your example, the results are correct.  The "anything" class can be viewed as 
> either a sequence or a mapping.
> In this and other posts, you seem to be focusing your design around notions of 
> strong typing and mandatory interfaces.  I would suggest that that approach is 
> futile unless you control all of the code being run.

I think you are reading too much into it.  If the functions exist, they 
should be useful.  That's all I see in Michael's suggestion.

Ian Bicking  /  ianb at  /

From guido at  Wed Feb 22 18:44:33 2006
From: guido at (Guido van Rossum)
Date: Wed, 22 Feb 2006 12:44:33 -0500
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <001301c637cb$f05fd960$6a01a8c0@RaymondLaptop1>
References: <001401c637a5$86053ea0$6a01a8c0@RaymondLaptop1>
Message-ID: <>

On 2/22/06, Raymond Hettinger <python at> wrote:
> [Guido van Rossum]
> > If we removed on_missing() from dict, we'd have to override
> > __getitem__ in defaultdict (regardless of whether we give
> > defaultdict an on_missing() hook or in-line it).
> You have another option.  Keep your current modifications to
> dict.__getitem__ but do not include dict.on_missing().  Let it only
> be called in a subclass IF it is defined; otherwise, raise KeyError.

OK. I don't have time right now for another round of patches -- if you
do, please go ahead. The dict docs in my latest patch must be updated
somewhat (since they document on_missing()).

> That keeps me happy since the basic dict API won't show on_missing(),
> but it still allows a user to attach an on_missing method to a dict subclass
> when or if needed.  I think all your test cases would still pass without
> modification.

Except the ones that explicitly test for dict.on_missing()'s presence
and behavior. :-)

> This approach is not much different than for other magic methods which
> kick in if defined or revert to a default behavior if not.

Right. Plenty of precedent there.

> My core concern is to keep the dict API clean as a whistle.


--Guido van Rossum (home page:

From jason.orendorff at  Wed Feb 22 19:17:51 2006
From: jason.orendorff at (Jason Orendorff)
Date: Wed, 22 Feb 2006 13:17:51 -0500
Subject: [Python-Dev] Path PEP: some comments (equality)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Mark Mc Mahon <mark.m.mcmahon at> wrote:
> It seems that the Path module as currently defined leaves equality
> testing up to the underlying string comparison. My guess is that this
> is fine for Unix (maybe not even) but it is a bit lacking for Windows.
> Should the path class implement an __eq__ method that might do some of
> the following things:
> - Get the absolute path of both self and the other path
> - normcase both
> - now see if they are equal

This has been suggested to me many times.

Unfortunately, since Path is a subclass of string, this breaks stuff in
weird ways.

For example:
    '' == path('') == path('X.PY') == 'X.PY', but '' != 'X.PY'

And hashing needs to be consistent with __eq__:
    hash('') == hash(path('X.PY')) == hash('X.PY') ???

Granted these problems would only pop up in code where people are mixing
Path and string objects.  But they would cause really obscure bugs in
practice, very difficult for a non-expert to figure out and fix.  It's safer
for Paths to behave just like strings.
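
A sketch of the kind of breakage meant here, in Python 3 syntax. CIPath
is a hypothetical str subclass written only for this illustration (it is
not the Path class under discussion), and the file names are made up.

```python
class CIPath(str):
    """Hypothetical str subclass that compares case-insensitively."""

    def __eq__(self, other):
        if isinstance(other, str):
            return str.lower(self) == str.lower(other)
        return NotImplemented

    def __hash__(self):
        # Must agree with __eq__, so hash the lower-cased form.
        return hash(str.lower(self))


plain_lower, plain_upper = "setup.py", "SETUP.PY"
p = CIPath("Setup.py")

print(plain_lower == p, p == plain_upper)   # both comparisons succeed
print(plain_lower == plain_upper)           # yet the plain strings differ
print(hash(p) == hash(plain_lower))         # hash matches the lower-case one
```

Equality is no longer transitive once plain strings are mixed in, and the
hash can agree with at most one of the plain strings that compare equal
to the path, so dict and set lookups become unreliable.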


From pje at  Wed Feb 22 19:56:48 2006
From: pje at (Phillip J. Eby)
Date: Wed, 22 Feb 2006 13:56:48 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical  Scopes
In-Reply-To: <
References: <>
Message-ID: <>

At 06:14 AM 2/22/2006 -0500, Jeremy Hylton wrote:
>On 2/22/06, Greg Ewing <greg.ewing at> wrote:
> > Mark Russell wrote:
> >
> > > PEP 227 mentions using := as a rebinding operator, but rejects the
> > > idea as it would encourage the use of closures.
> >
> > Well, anything that facilitates rebinding in outer scopes
> > is going to encourage the use of closures, so I can't
> > see that as being a reason to reject a particular means
> > of rebinding. You either think such rebinding is a good
> > idea or not -- and that seems to be a matter of highly
> > individual taste.
>At the time PEP 227 was written, nested scopes were contentious.  (I
>recall one developer who said he'd be embarrassed to tell his
>co-workers he worked on Python if it had this feature :-).

Was this because of the implicit "inheritance" of variables from the 
enclosing scope?

>   Rebinding
>was more contentious, so the feature was left out.  I don't think any
>particular syntax or spelling for rebinding was favored more or less.
> > On this particular idea, I tend to think it's too obscure
> > as well. Python generally avoids attaching randomly-chosen
> > semantics to punctuation, and I'd like to see it stay
> > that way.
>I agree.

Note that '.' for relative naming already exists (attribute access), and 
Python 2.5 is already introducing the use of a leading '.' (with no name 
before it) to mean "parent of the current namespace".  So, using that 
approach to reference variables in outer scopes wouldn't be without precedents.

IOW, I propose no new syntax for rebinding, but instead making variables' 
context explicit.  This would also fix the issue where right now you have 
to inspect a function and its context to find out whether there's a closure 
and what's in it.  The leading dots will be quite visible.

From tjreedy at  Wed Feb 22 20:09:41 2006
From: tjreedy at (Terry Reedy)
Date: Wed, 22 Feb 2006 14:09:41 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
References: <><><><dtfou0$81t$>
Message-ID: <dticu2$303$>

"Almann T. Goo" < at> wrote in message 
news:7e9b97090602212336ka0b5fd8r2c85b1c0e914aff1 at
> IMO, Having properly nested scopes in Python in a sense made having
> closures a natural idiom to the language and part of its "user
> interface."  By not allowing the name re-binding it almost seems like
> that "user interface" has a rough edge that is almost too easy to get
> cut on.

I can see now how it would look that way to someone who has experience with 
fully functional nested scopes in other languages and who learns Python 
after no-write nested scoping was added.  What is not mentioned in the ref 
manual and what I suppose may not be obvious even reading the PEP is that 
Python added nesting to solve two particular problems.  First was the 
inability to write nested recursive functions without the hack of stuffing 
its name in the global namespace (or of patching the byte code).  Second 
was the need to misuse the default arg mechanism in nested functions.  What 
we have now pretty well fixes both.

Terry Jan Reedy

From tjreedy at  Wed Feb 22 20:32:30 2006
From: tjreedy at (Terry Reedy)
Date: Wed, 22 Feb 2006 14:32:30 -0500
Subject: [Python-Dev] bytes.from_hex()
References: <><><><><><><><><><><>
Message-ID: <dtiemd$a3k$>

"Greg Ewing" <greg.ewing at> wrote in message 
news:43FC4C8B.6080300 at
> Efficiency is an implementation concern.

It is also a user concern, especially if inefficiency overruns memory 

> In Py3k, strings
> which contain only ascii or latin-1 might be stored as
> 1 byte per character, in which case this would not be an
> issue.

If 'might' becomes 'will', I and I suspect others will be happier with the 
change.  And I would be happy if the choice of physical storage was pretty 
much handled behind the scenes, as with the direction int/long unification 
is going.

> Which is why I think that only *unicode* codings should be
> available through the .encode and .decode interface. Or
> alternatively there should be something more explicit like
> .unicode_encode and .unicode_decode that is thus restricted.

I prefer the shorter names and using recode, for instance, for bytes to
bytes.

Terry Jan Reedy

From steven.bethard at  Wed Feb 22 20:41:54 2006
From: steven.bethard at (Steven Bethard)
Date: Wed, 22 Feb 2006 12:41:54 -0700
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/21/06, Phillip J. Eby <pje at> wrote:
> Here's a crazy idea, that AFAIK has not been suggested before and could
> work for both globals and closures: using  a leading dot, ala the new
> relative import feature.  e.g.:
>     def incrementer(val):
>         def inc():
>             .val += 1
>             return .val
>         return inc
> The '.' would mean "this name, but in the nearest outer scope that defines
> it".  Note that this could include the global scope, so the 'global'
> keyword could go away in 2.5.  And in Python 3.0, the '.' could become
> *required* for use in closures, so that it's not necessary for the reader
> to check a function's outer scope to see whether closure is taking
> place.  EIBTI.

FWIW, I think this is nice.  Since it uses the same dot-notation that
normal attribute access uses, it's clearly accessing the attribute of
*some* namespace.  It's not perfectly intuitive that the accessed
namespace is the enclosing one, but I do think it's at least more
intuitive than the suggested := operator, and at least as intuitive as
a ``global``-like declaration.  And, as you mention, it's consistent
with the relative import feature.
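
The `.val` spelling is proposed syntax and does not run in any released
Python. As a point of comparison only, here is a sketch of the same
incrementer in Python 3, where rebinding an outer name is declared with
the nonlocal statement, next to the mutable-cell workaround that needs
no new syntax. The function names are illustrative.

```python
def incrementer(val):
    def inc():
        nonlocal val        # rebind val in the enclosing function scope
        val += 1
        return val
    return inc


def incrementer_cell(val):
    cell = [val]            # workaround: mutate a container, never rebind
    def inc():
        cell[0] += 1
        return cell[0]
    return inc


inc = incrementer(1)
inc_cell = incrementer_cell(1)
print(inc(), inc(), inc_cell(), inc_cell())
```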

I'm a little worried that this proposal will get lost amid the mass of
other suggestions being thrown out right now.  Any chance of turning
this into a PEP?

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From python at  Wed Feb 22 20:43:51 2006
From: python at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 14:43:51 -0500
Subject: [Python-Dev]*Type
References: <>
Message-ID: <000801c637e8$547d8390$6a01a8c0@RaymondLaptop1>

[Ian Bicking]
> They seem terribly pointless to me.

FWIW, here is the script that I had used while updating and improving the two 
functions (can't remember whether it was for Py2.3 or Py2.4).  It lists 
comparative results for many different types of inputs.  Since perfection was 
not possible, the goal was to have no false negatives and mostly accurate 
positives.  IMO, they do a pretty good job and are able to access information 
not otherwise visible to pure Python code.  With respect to user defined 
instances, I don't care that they can't draw a distinction where none exists in 
the first place -- at some point you have to either fallback on duck-typing or 
be in control of what kind of arguments you submit to your functions. 
Practicality beats purity -- especially when a pure solution doesn't exist (i.e. 
given a user defined class that defines just __getitem__, both mapping or 
sequence behavior is a possibility).

---- Analysis Script ----

from collections import deque
from UserList import UserList
from UserDict import UserDict
from operator import *
types = (set,
         int, float, complex, long, bool,
         str, unicode,
         list, UserList, tuple, deque,
)

for t in types:
    print isMappingType(t()), isSequenceType(t()), repr(t()), repr(t)

class c:
    def __repr__(self):
        return 'Instance w/o getitem'

class cn(object):
    def __repr__(self):
        return 'NewStyle Instance w/o getitem'

class cg:
    def __repr__(self):
        return 'Instance w getitem'
    def __getitem__(self):
        return 10

class cng(object):
    def __repr__(self):
        return 'NewStyle Instance w getitem'
    def __getitem__(self):
        return 10

def f():
    return 1

def g():
    yield 1

for i in (None, NotImplemented, g(), c(), cn()):
    print isMappingType(i), isSequenceType(i), repr(i), type(i)

for i in (cg(), cng(), dict(), UserDict()):
    print isMappingType(i), isSequenceType(i), repr(i), type(i)

---- Output ----

False False set([]) <type 'set'>
False False 0 <type 'int'>
False False 0.0 <type 'float'>
False False 0j <type 'complex'>
False False 0L <type 'long'>
False False False <type 'bool'>
False True '' <type 'str'>
False True u'' <type 'unicode'>
False True [] <type 'list'>
True True [] <class UserList.UserList at 0x00F11B70>
False True () <type 'tuple'>
False True deque([]) <type 'collections.deque'>
False False None <type 'NoneType'>
False False NotImplemented <type 'NotImplementedType'>
False False <generator object at 0x00F230A8> <type 'generator'>
False False Instance w/o getitem <type 'instance'>
False False NewStyle Instance w/o getitem <class '__main__.cn'>
True True Instance w getitem <type 'instance'>
True True NewStyle Instance w getitem <class '__main__.cng'>
True False {} <type 'dict'>
True True {} <type 'instance'>

From edcjones at  Wed Feb 22 21:27:56 2006
From: edcjones at (Edward C. Jones)
Date: Wed, 22 Feb 2006 15:27:56 -0500
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> I think the pattern hasn't been commonly known; people have been
> struggling with setdefault() all these years.

I use setdefault _only_ to speed up the following code pattern:

if akey not in somedict:
     somedict[akey] = list()
somedict[akey].append(avalue)

These lines of simple Python are much easier to read and write than

somedict.setdefault(akey, list()).append(avalue)
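
The spellings compared in this thread can be put side by side. This is a
sketch in Python 3 syntax with made-up sample data; defaultdict is the
collections type that this discussion led to.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3)]

# Explicit membership test, as in the three-line pattern above.
explicit = {}
for key, value in pairs:
    if key not in explicit:
        explicit[key] = list()
    explicit[key].append(value)

# setdefault: one line, but list() is evaluated on every call,
# even when the key is already present.
with_setdefault = {}
for key, value in pairs:
    with_setdefault.setdefault(key, list()).append(value)

# defaultdict: the factory is called only when the key is missing.
with_defaultdict = defaultdict(list)
for key, value in pairs:
    with_defaultdict[key].append(value)

print(explicit == with_setdefault == with_defaultdict)
```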

From rrr at  Wed Feb 22 21:28:52 2006
From: rrr at (Ron Adam)
Date: Wed, 22 Feb 2006 14:28:52 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <dtiemd$a3k$>
References: <><><><><><><><><><><>	<>
Message-ID: <>

Terry Reedy wrote:

> "Greg Ewing" <greg.ewing at> wrote in message 
>> Which is why I think that only *unicode* codings should be
>> available through the .encode and .decode interface. Or
>> alternatively there should be something more explicit like
>> .unicode_encode and .unicode_decode that is thus restricted.
> I prefer the shorter names and using recode, for instance, for bytes to 
> bytes.

While I prefer constructors with an explicit encode argument, and use a 
recode() method for 'like to like' coding.  Then the whole encode/decode 
confusion goes away.

From fuzzyman at  Wed Feb 22 22:00:57 2006
From: fuzzyman at (Michael Foord)
Date: Wed, 22 Feb 2006 21:00:57 +0000
Subject: [Python-Dev]*Type
In-Reply-To: <000801c637e8$547d8390$6a01a8c0@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
> [Ian Bicking]
>> They seem terribly pointless to me.
> FWIW, here is the script that I had used while updating and improving 
> the two functions (can't remember whether it was for Py2.3 or Py2.4).  
> It lists comparative results for many different types of inputs.  
> Since perfection was not possible, the goal was to have no false 
> negatives and mostly accurate positives.  IMO, they do a pretty good 
> job and are able to access information not otherwise visible to 
> pure Python code.  With respect to user defined instances, I don't 
> care that they can't draw a distinction where none exists in the first 
> place -- at some point you have to either fallback on duck-typing or 
> be in control of what kind of arguments you submit to your functions. 
> Practicality beats purity -- especially when a pure solution doesn't 
> exist (i.e. given a user defined class that defines just __getitem__, 
> both mapping or sequence behavior is a possibility).
But given :

True True Instance w getitem <type 'instance'>
True True NewStyle Instance w getitem <class '__main__.cng'>
True True [] <class UserList.UserList at 0x00F11B70>
True True {} <type 'instance'>

(Last one is UserDict)

I can't conceive of circumstances where this is useful without duck 
typing *as well*.

The tests seem roughly analogous to :

def isMappingType(obj):
    return isinstance(obj, dict) or hasattr(obj, '__getitem__')

def isSequenceType(obj):
    return (isinstance(obj, (basestring, list, tuple, collections.deque))
            or hasattr(obj, '__getitem__'))

If you want to allow sequence access you could either just use the 
isinstance or you *have* to trap an exception in the case of a mapping 
object being passed in.

Redefining (effectively) as :

def isMappingType(obj):
    return isinstance(obj, dict) or (hasattr(obj, '__getitem__') and
                                     hasattr(obj, 'keys'))

def isSequenceType(obj):
    return (isinstance(obj, (basestring, list, tuple, collections.deque))
            or (hasattr(obj, '__getitem__')
                and not hasattr(obj, 'keys')))

Makes the test useful where you want to know you can safely treat an 
object as a mapping (or sequence) *and* where you want to tell the 

The only code that would break is use of mapping objects that don't 
define ``keys`` and sequences that do. I imagine these must be very rare 
and *would* be interested in seeing real code that does break. 
Especially if that code cannot be trivially rewritten to use the first 

All the best,

Michael Foord
> ---- Analysis Script ----
> from collections import deque
> from UserList import UserList
> from UserDict import UserDict
> from operator import *
> types = (set,
>         int, float, complex, long, bool,
>         str, unicode,
>         list, UserList, tuple, deque,
> )
> for t in types:
>    print isMappingType(t()), isSequenceType(t()), repr(t()), repr(t)
> class c:
>    def __repr__(self):
>        return 'Instance w/o getitem'
> class cn(object):
>    def __repr__(self):
>        return 'NewStyle Instance w/o getitem'
> class cg:
>    def __repr__(self):
>        return 'Instance w getitem'
>    def __getitem__(self):
>        return 10
> class cng(object):
>    def __repr__(self):
>        return 'NewStyle Instance w getitem'
>    def __getitem__(self):
>        return 10
> def f():
>    return 1
> def g():
>    yield 1
> for i in (None, NotImplemented, g(), c(), cn()):
>    print isMappingType(i), isSequenceType(i), repr(i), type(i)
> for i in (cg(), cng(), dict(), UserDict()):
>    print isMappingType(i), isSequenceType(i), repr(i), type(i)
> ---- Output ----
> False False set([]) <type 'set'>
> False False 0 <type 'int'>
> False False 0.0 <type 'float'>
> False False 0j <type 'complex'>
> False False 0L <type 'long'>
> False False False <type 'bool'>
> False True '' <type 'str'>
> False True u'' <type 'unicode'>
> False True [] <type 'list'>
> True True [] <class UserList.UserList at 0x00F11B70>
> False True () <type 'tuple'>
> False True deque([]) <type 'collections.deque'>
> False False None <type 'NoneType'>
> False False NotImplemented <type 'NotImplementedType'>
> False False <generator object at 0x00F230A8> <type 'generator'>
> False False Instance w/o getitem <type 'instance'>
> False False NewStyle Instance w/o getitem <class '__main__.cn'>
> True True Instance w getitem <type 'instance'>
> True True NewStyle Instance w getitem <class '__main__.cng'>
> True False {} <type 'dict'>
> True True {} <type 'instance'>

From mcherm at  Wed Feb 22 22:13:28 2006
From: mcherm at (Michael Chermside)
Date: Wed, 22 Feb 2006 13:13:28 -0800
Subject: [Python-Dev] defaultdict and on_missing()
Message-ID: <>

A minor related point about on_missing():

Haven't we learned from regrets over the .next() method of iterators
that all "magically" invoked methods should be named using the __xxx__
pattern? Shouldn't it be named __on_missing__() instead?

-- Michael Chermside

From python at  Wed Feb 22 22:18:58 2006
From: python at (Raymond Hettinger)
Date: Wed, 22 Feb 2006 16:18:58 -0500
Subject: [Python-Dev]*Type
References: <><002101c637a7$81d7afa0$6a01a8c0@RaymondLaptop1><><000801c637e8$547d8390$6a01a8c0@RaymondLaptop1>
Message-ID: <004801c637f5$9d79acb0$6a01a8c0@RaymondLaptop1>

> But  given :
> True True Instance w getitem <type 'instance'>
> True True NewStyle Instance w getitem <class '__main__.cng'>
> True True [] <class UserList.UserList at 0x00F11B70>
> True True {} <type 'instance'>
> (Last one is UserDict)
> I can't conceive of circumstances where this is useful without duck
> typing *as well*.

Yawn.  Give it up.  For user defined instances, these functions can only 
discriminate between the presence or absence of __getitem__.  If you're trying 
to distinguish between sequences and mappings for instances, you're on your own 
with duck-typing.  Since there is no mandatory mapping or sequence API, the 
operator module functions cannot add more checks without getting some false 
negatives (your original example is a case in point).

Use the function as-is and add your own isinstance checks for your own personal 
definition of what makes a mapping a mapping and what makes a sequence a 
sequence.  Or better yet, stop designing APIs that require you to differentiate 
things that aren't really different ;-)


From pedronis at  Wed Feb 22 22:27:51 2006
From: pedronis at (Samuele Pedroni)
Date: Wed, 22 Feb 2006 22:27:51 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>	<>	<dtfou0$81t$>
Message-ID: <>

Almann T. Goo wrote:
>>As far as I remember, Guido wasn't particularly opposed
>>to the idea, but the discussion fizzled out after having
>>failed to reach a consensus on an obviously right way
>>to go about it.
> My apologies for bringing this debated topic again to the
> front-lines--that said, I think there have been good, constructive
> things said again and sometimes it doesn't hurt to kick up an old
> topic.  After poring through some of the list archive threads and
> reading through this thread, it seems clear to me that the community
> doesn't seem all that keen on fixing the issue--which was my goal to
> ferret out.
> For me this is one of those things where the Pythonic thing to do is
> not so clear--and that mysterious, enigmatic definition of what it
> means to be Pythonic can be quite individual so I definitely don't
> want to waste my time arguing what that means.
> The most compelling argument for not doing anything about it is that
> the use cases are probably not that many--that in itself makes me less
> apt to push much harder--especially since my pragmatic side agrees
> with a lot of what has been said to this regard.
> IMO, Having properly nested scopes in Python in a sense made having
> closures a natural idiom to the language and part of its "user
> interface."  By not allowing the name re-binding it almost seems like
> that "user interface" has a rough edge that is almost too easy to get
> cut on.  This in-elegance seems very un-Pythonic to me.

If you are looking for rough edges about nested scopes in Python
this is probably worse:

 >>> x = []
 >>> for i in range(10):
...   x.append(lambda : i)
 >>> [y() for y in x]
[9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

although experienced people can live with it. The fact is that when
nested scopes were imported from the likes of Scheme, it was not
considered that in Scheme, for example, looping constructs introduce
new scopes. So this works more as expected there. There were long
threads about this at some point too.
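
The behaviour above comes from every lambda closing over the same variable
`i`, which is looked up when the lambda is called. A common workaround,
sketched here, is to capture the current value as a default argument; the
list names are only for illustration.

```python
late = []
bound = []
for i in range(10):
    late.append(lambda: i)        # looks up i when called
    bound.append(lambda i=i: i)   # default evaluated now, once per iteration

print([f() for f in late])    # every call sees the final value of i
print([f() for f in bound])   # each call sees its own captured value
```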

Idioms and features mostly never port straightforwardly from language
to language.

For example, Python has nothing like the explicit context introduction
and grouping of a Scheme 'let', so it is arguable that nested-scope
code, especially with rebindings, would be less clear and readable than
in Scheme (tastes in parentheses aside).

> Anyhow, good discussion.
> Cheers,
> Almann
> --
> Almann T. Goo
> at

From brett at  Wed Feb 22 22:22:19 2006
From: brett at (Brett Cannon)
Date: Wed, 22 Feb 2006 13:22:19 -0800
Subject: [Python-Dev] PEP 358 (bytes type) comments
Message-ID: <>

First off, thanks to Neil for writing this all down.  The whole thread
of discussion on the bytes type was rather long and thus hard to
follow.  Nice to finally have it written down in a PEP.

Anyway, a few comments on the PEP.  One, should the hex() method
instead be an attribute, implemented as a property?  Seems like static
data that is entirely based on the value of the bytes object and thus
is not properly represented by a method.

Next, why are the __*slice__ methods to be defined?  Docs say they are
deprecated.
And for the open-ended questions, I don't think sort() is needed.

Lastly, maybe I am just dense, but it took me a second to realize that
it will most likely return the ASCII string for __str__() for use in
something like socket.send(), but it isn't explicitly stated anywhere.
There is a chance someone might think that __str__ will somehow
return the sequence of integers as a string.


From pedronis at  Wed Feb 22 22:32:01 2006
From: pedronis at (Samuele Pedroni)
Date: Wed, 22 Feb 2006 22:32:01 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Greg Ewing wrote:
> Jeremy Hylton wrote:
>>The names of naming statements are quite hard to get right, I fear.
> My vote goes for 'outer'.
> And if this gets accepted, remove 'global' in 3.0.

In 3.0 we could remove 'global' even without 'outer',
and make module global scopes read-only, not rebindable
after the top-level code has run (i.e. more like function
body scopes). The only free-for-all namespaces would be
class and instance ones. I can think of some
gains from this.  <.3 wink>

From nas at  Wed Feb 22 22:28:44 2006
From: nas at (Neil Schemenauer)
Date: Wed, 22 Feb 2006 14:28:44 -0700
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Feb 16, 2006 at 12:47:22PM -0800, Guido van Rossum wrote:
> BTW, for folks who want to experiment, it's quite simple to create a
> working bytes implementation by inheriting from array.array. Here's a
> quick draft (which only takes str instance arguments):

Here's a more complete prototype.  Also, I checked in the PEP as
#358 after making changes suggested by Guido.


import sys
from array import array
import re
import binascii

class bytes(array):

    __slots__ = []

    def __new__(cls, initialiser=None, encoding=None):
        b = array.__new__(cls, "B")
        if isinstance(initialiser, basestring):
            if isinstance(initialiser, unicode):
                if encoding is None:
                    encoding = sys.getdefaultencoding()
                initialiser = initialiser.encode(encoding)
            initialiser = [ord(c) for c in initialiser]
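        elif encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        if initialiser is not None:
            # copy the initial values into the underlying array
            b.extend(initialiser)
        return b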

    def fromhex(self, data):
        data = re.sub(r'\s+', '', data)
        return bytes(binascii.unhexlify(data))

    def __str__(self):
        return self.tostring()

    def __repr__(self):
        return "bytes(%r)" % self.tolist()

    def __add__(self, other):
        if isinstance(other, array):
            return bytes(super(bytes, self).__add__(other))
        return NotImplemented

    def __mul__(self, n):
        return bytes(super(bytes, self).__mul__(n))
    __rmul__ = __mul__

    def __getslice__(self, i, j):
        return bytes(super(bytes, self).__getslice__(i, j))

    def hex(self):
        return binascii.hexlify((self.tostring()))

    def decode(self, encoding):
        return self.tostring().decode(encoding)
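
For comparison, a short sketch of the same operations using the `bytes` type
built into Python 3. That is an assumption about the reader's interpreter
rather than part of the prototype above, which targets Python 2 and
array.array.

```python
data = bytes.fromhex("de ad be ef")   # spaces between byte pairs are skipped
print(list(data))                     # [222, 173, 190, 239]
print(data.hex())                     # deadbeef

text = bytes("abc", "ascii")          # an encoding is required for str input
combined = text + data[:2]            # slicing and concatenation return bytes
print(combined)
print(text.decode("ascii"))           # abc
```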

From anthony at  Wed Feb 22 22:50:12 2006
From: anthony at (Anthony Baxter)
Date: Thu, 23 Feb 2006 08:50:12 +1100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Sunday 12 February 2006 21:51, Thomas Wouters wrote:
> Well, in the past, features -- even syntax changes -- have gone in
> between the last beta and the final release (but reminding Guido
> might bring him to tears of regret. ;) Features have also gone into
> what would have been 'bugfix releases' if you looked at the
> numbering alone (1.5 -> 1.5.1 -> 1.5.2, for instance.) "The past"
> doesn't have a very impressive track record... 

*cough* Go on. Try slipping a feature into a bugfix release now, see 
how loudly you can make an Australian swear...

See also PEP 006. Do I need to add a "bad language" caveat in it?

From bob at  Wed Feb 22 23:03:29 2006
From: bob at (Bob Ippolito)
Date: Wed, 22 Feb 2006 14:03:29 -0800
Subject: [Python-Dev] PEP 358 (bytes type) comments
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 22, 2006, at 1:22 PM, Brett Cannon wrote:

> First off, thanks to Neil for writing this all down.  The whole thread
> of discussion on the bytes type was rather long and thus hard to
> follow.  Nice to finally have it written down in a PEP.
> Anyway, a few comments on the PEP.  One, should the hex() method
> instead be an attribute, implemented as a property?  Seems like static
> data that is entirely based on the value of the bytes object and thus
> is not properly represented by a method.
> Next, why are the __*slice__ methods to be defined?  Docs say they are
> deprecated.
> And for the open-ended questions, I don't think sort() is needed.

sort would be totally useless for bytes.  array.array doesn't have  
sort either.

> Lastly, maybe I am just dense, but it took me a second to realize that
> it will most likely return the ASCII string for __str__() for use in
> something like socket.send(), but it isn't explicitly stated anywhere.
>  There is a chance someone might think that __str__ will somehow
> return the sequence of integers as a string does exist.

That would be a bad idea given that bytes are supposed to make the str  
type go away.  It's probably better to make __str__ return __repr__  
like it does for most types.  If bytes type supports the buffer API  
(one would hope so), functions like socket.send should do the right  
thing as-is.


From guido at  Wed Feb 22 23:19:19 2006
From: guido at (Guido van Rossum)
Date: Wed, 22 Feb 2006 17:19:19 -0500
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

However the definition of "feature" vs. "bugfix" isn't always crystal clear.

Some things that went into 2.4 recently felt like small features to
me; but others may disagree:

- fixing to allow chunk size to be > 2GB
- supporting Unicode filenames in

Are these features or bugfixes?

On 2/22/06, Anthony Baxter <anthony at> wrote:
> On Sunday 12 February 2006 21:51, Thomas Wouters wrote:
> > Well, in the past, features -- even syntax changes -- have gone in
> > between the last beta and the final release (but reminding Guido
> > might bring him to tears of regret. ;) Features have also gone into
> > what would have been 'bugfix releases' if you looked at the
> > numbering alone (1.5 -> 1.5.1 -> 1.5.2, for instance.) "The past"
> > doesn't have a very impressive track record...
> *cough* Go on. Try slipping a feature into a bugfix release now, see
> how loudly you can make an Australian swear...
> See also PEP 006. Do I need to add a "bad language" caveat in it?

--Guido van Rossum (home page:

From tdelaney at  Wed Feb 22 23:47:06 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Thu, 23 Feb 2006 09:47:06 +1100
Subject: [Python-Dev] [ python-Feature Requests-1436243 ] Extend
	pre-allocated integers to cover [0, 255]
Message-ID: <> wrote:

> Status: Closed
> Resolution: Accepted

And here I was, thinking I might actually work on this and submit a
patch on the weekend ...

Tim Delaney

From anthony at  Wed Feb 22 23:50:29 2006
From: anthony at (Anthony Baxter)
Date: Thu, 23 Feb 2006 09:50:29 +1100
Subject: [Python-Dev] release plan for 2.5 ?
In-Reply-To: <>
References: <dsbc3h$rct$>
Message-ID: <>

On Thursday 23 February 2006 09:19, Guido van Rossum wrote:
> However the definition of "feature" vs. "bugfix" isn't always
> crystal clear.
> Some things that went into 2.4 recently felt like small features to
> me; but others may disagree:
> - fixing to allow chunk size to be > 2GB
> - supporting Unicode filenames in
> Are these features or bugfixes?

Sure, the line isn't so clear sometimes. I consider both of these 
bugfixes, but others could disagree. True/False, on the other hand, I 
don't think anyone disagrees about <wink/duck>

This stuff is always open for discussion, of course. 

Anthony Baxter     <anthony at>
It's never too late to have a happy childhood.

From tdelaney at  Thu Feb 23 00:03:05 2006
From: tdelaney at (Delaney, Timothy (Tim))
Date: Thu, 23 Feb 2006 10:03:05 +1100
Subject: [Python-Dev]*Type
Message-ID: <>

Raymond Hettinger wrote:

> Your example simply highlights the consequences of one of Python's
> most basic, original design choices (using getitem for both sequences
> and mappings).  That choice is now so fundamental to the language
> that it cannot possibly change. 

Hmm - just a thought ...

Since we're adding the __index__ magic method, why not have a
__getindexed__ method for sequences?

Then semantics of indexing operations would be something like:

    if hasattr(obj, '__getindexed__'):
        return obj.__getindexed__(val.__index__())
    else:
        return obj.__getitem__(val)

Similarly __setindexed__ and __delindexed__.

This would allow distinguishing between sequences and mappings in a
fairly backwards-compatible way. It would also enforce that only indexes
can be used for sequences.

The backwards-incompatibility comes in when you have a type that
implements __getindexed__, and a subclass that implements __getitem__
e.g. if `list` implemented __getindexed__ then any `list` subclass that
overrode __getitem__ would fail. However, I think we could make it 100%
backwards-compatible for the builtin sequence types if they just had
__getindexed__ delegate to __getitem__. Effectively:

    class list (object):

        def __getindexed__(self, index):
            return self.__getitem__(index)
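
The proposed dispatch can be emulated in pure Python to see how it would
behave. This is only a sketch: `__getindexed__` is the hypothetical method
from this message, `index_into` is a made-up helper standing in for the
interpreter's `obj[val]`, and operator.index() is used to call __index__.

```python
import operator


def index_into(obj, val):
    """Emulate obj[val] under the proposed __getindexed__ protocol."""
    if hasattr(obj, "__getindexed__"):
        # Sequences: only true indexes are accepted.
        return obj.__getindexed__(operator.index(val))
    # Mappings (and legacy classes): fall back to __getitem__.
    return obj.__getitem__(val)


class Seq(object):
    def __init__(self, items):
        self._items = list(items)

    def __getindexed__(self, index):
        return self._items[index]


print(index_into(Seq("abc"), 1))        # b
print(index_into({"k": "v"}, "k"))      # v
try:
    index_into(Seq("abc"), "1")         # a str is not an index
except TypeError as exc:
    print("rejected:", exc)
```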

Tim Delaney

From anthony at  Thu Feb 23 00:36:12 2006
From: anthony at (Anthony Baxter)
Date: Thu, 23 Feb 2006 10:36:12 +1100
Subject: [Python-Dev] buildbot, and test failures
Message-ID: <>

It took 2 hours, but I caught up on Python-dev email. Hoorah.

So, couple of things - the trunk has test failures for me, right now. 
test test_email failed -- Traceback (most recent call last):
line 2111, in test_parsedate_acceptable_to_time_functions
    eq(time.localtime(t)[:6], timetup[:6])
AssertionError: (2003, 2, 5, 14, 47, 26) != (2003, 2, 5, 13, 47, 26)

Australia is in daylight saving time right now, and I suspect that's
the problem here.

I also see intermittent failures from test_socketserver:
test test_socketserver crashed -- socket.error: (111, 'Connection 
is the only error message. When it fails, regrtest fails to exit - it
just sits there after printing out the summary. This suggests that 
there's a threaded server not getting cleaned up correctly.  
test_socketserver could probably do with a rewrite. 

Who's the person who hands out buildbot username/password pairs? I 
have an Ubuntu x86 box here that can become one (I think the only 
linux, currently, is Gentoo...)

Anthony Baxter     <anthony at>
It's never too late to have a happy childhood.

From spam4bsimons at  Thu Feb 23 00:45:18 2006
From: spam4bsimons at (Brendan Simons)
Date: Wed, 22 Feb 2006 18:45:18 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 21-Feb-06, at 11:21 AM, Almann T. Goo" < at>

>> Why not just use a class?
>> def incgen(start=0, inc=1) :
>>     class incrementer(object):
>>       a = start - inc
>>       def __call__(self):
>>          self.a += inc
>>          return self.a
>>     return incrementer()
>> a = incgen(7, 5)
>> for n in range(10):
>>     print a(),
> Because I think that this is a workaround for a concept that the
> language doesn't support elegantly with its lexically nested scopes.
> IMO, you are emulating name rebinding in a closure by creating an
> object to encapsulate the name you want to rebind--you don't need this
> workaround if you only need to access free variables in an enclosing
> scope.  I provided a "lighter" example that didn't need a callable
> object but could use any mutable such as a list.
> This kind of workaround is needed as soon as you want to re-bind a
> parent scope's name, except in the case when the parent scope is the
> global scope (since there is the "global" keyword to handle this).
> It's this dichotomy that concerns me, since it seems to be against the
> elegance of Python--at least in my opinion.
> It seems artificially limiting that enclosing scope name rebinds are
> not provided for by the language especially since the behavior with
> the global scope is not so.  In a nutshell I am proposing a solution
> to make nested lexical scopes to be orthogonal with the global scope
> and removing a "wart," as Jeremy put it, in the language.
> -Almann
> --
> Almann T. Goo
> at

If I may be so bold, couldn't this be addressed by introducing a
"rebinding" operator?  So the ' = ' operator would continue to create
a new name in the current scope, and the (say) ' := ' operator would
look for an existing name to rebind.  The two operators would highlight
the special way Python handles variable / name assignment, which many
newbies miss.
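
For readers skimming the quoted text, the "lighter" workaround it mentions
can be sketched as follows: a one-element list in the enclosing scope is
mutated rather than rebound. The name `incgen` is reused from the quoted
example; the rest is an illustrative assumption, not Almann's exact code.

```python
def incgen(start=0, inc=1):
    cell = [start - inc]      # mutable container in the enclosing scope

    def incrementer():
        cell[0] += inc        # mutates the list; the name `cell` is never rebound
        return cell[0]

    return incrementer


a = incgen(7, 5)
print([a() for _ in range(3)])   # [7, 12, 17]
```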

(from someone who was surprised by this quirk of Python before:

Brendan Simons


From tim.peters at  Thu Feb 23 00:59:28 2006
From: tim.peters at (Tim Peters)
Date: Wed, 22 Feb 2006 18:59:28 -0500
Subject: [Python-Dev] buildbot vs. Windows
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

[Neal Norwitz]
> ...
> I also think I know how to do the "double builds" (one release and one
> debug).  But it's too late for me to change it tonight without
> screwing it up.

I'm not mad :-).  The debug build is more fruitful than the release
build for finding problems, so doing two debug-build runs is an
improvement (keeping in mind that some bugs only show up in release
builds, though -- for example, subtly incorrect C code that works
differently depending on whether compiler optimization is in effect).

> The good/bad news after this change is:
> A seg fault on Mac OS when running with -r. :-(

Yay!  That's certainly good/bad news.  Since I always run with -r,
I've had the fun of tracking most of these down.  Sometimes it's very
hard, sometimes not.  regrtest's -f option is usually needed, to force
running the tests in exactly the same order, then commenting test
names out in binary-search fashion to get a minimal subset.  Alas,
half the time the cause for a -r segfault turns out to be an error in
refcounting or in setting up gc'able containers, and has nothing in
particular to do with the specific tests being run.  Those are the
"very hard" ones ;-)  Setting the gc threshold to 1 (do a full
collection on every allocation) can sometimes provoke such problems

From greg.ewing at  Thu Feb 23 01:16:11 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 13:16:11 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Josiah Carlson wrote:
> However, I believe global was and is necessary for the
> same reasons for globals in any other language.

Oddly, in Python, 'global' isn't actually necessary,
since the module can always import itself and use
attribute access.

Clearly, though, Guido must have thought at the time
that it was worth providing an alternative way.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 01:30:32 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 13:30:32 +1300
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <dthbeu$39d$>
References: <>
Message-ID: <>

Fredrik Lundh wrote:

> fwiw, the first google hit for "autodict" appears to be part of someone's
> link farm
>     At this website we have assistance with autodict. In addition to
>     information for autodict we also have the best web sites concerning
>     dictionary, non profit and new york.

Hmmm, looks like some sort of bot that takes the words in
your search and stuffs them into its response. I wonder
if they realise how silly the results end up sounding?

I've seen these sorts of things before, but I haven't
quite figured out yet how they manage to get into Google's
database if they're auto-generated. Anyone have any clues
what goes on?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From at  Thu Feb 23 02:12:29 2006
From: at (Almann T. Goo)
Date: Wed, 22 Feb 2006 20:12:29 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

> Oddly, in Python, 'global' isn't actually necessary,
> since the module can always import itself and use
> attribute access.
> Clearly, though, Guido must have thought at the time
> that it was worth providing an alternative way.

I believe that use cases for rebinding globals (module attributes)
from within a module are more numerous than rebinding in an enclosing
lexical scope (although rebinding a name in the global scope from a
local scope is really just a specific case of that).  I would think
this was probably a motivator for the 'global' key word to avoid
clumsier workarounds.  Since there were no nested lexical scopes back
then, there was no need to have a construct for arbitrary enclosing


Almann T. Goo at

From greg.ewing at  Thu Feb 23 03:28:52 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 15:28:52 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <dtiemd$a3k$>
References: <>
	<> <dtiemd$a3k$>
Message-ID: <>

Terry Reedy wrote:
> "Greg Ewing" <greg.ewing at> wrote in message 
>>Efficiency is an implementation concern.
> It is also a user concern, especially if inefficiency overruns memory 
> limits.

Sure, but what I mean is that it's better to find what's
conceptually right and then look for an efficient way
of implementing it, rather than letting the implementation
drive the design.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From spam4bsimons at  Thu Feb 23 03:46:09 2006
From: spam4bsimons at (Brendan Simons)
Date: Wed, 22 Feb 2006 21:46:09 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 22-Feb-06, at 9:28 PM, python-dev-request at wrote:

> On 21-Feb-06, at 11:21 AM, Almann T. Goo" < at>
>  wrote:
>>> Why not just use a class?
>>> def incgen(start=0, inc=1) :
>>>     class incrementer(object):
>>>       a = start - inc
>>>       def __call__(self):
>>>          self.a += inc
>>>          return self.a
>>>     return incrementer()
>>> a = incgen(7, 5)
>>> for n in range(10):
>>>     print a(),
>> Because I think that this is a workaround for a concept that the
>> language doesn't support elegantly with its lexically nested scopes.
>> IMO, you are emulating name rebinding in a closure by creating an
>> object to encapsulate the name you want to rebind--you don't need  
>> this
>> workaround if you only need to access free variables in an enclosing
>> scope.  I provided a "lighter" example that didn't need a callable
>> object but could use any mutable such as a list.
>> This kind of workaround is needed as soon as you want to re-bind a
>> parent scope's name, except in the case when the parent scope is the
>> global scope (since there is the "global" keyword to handle this).
>> It's this dichotomy that concerns me, since it seems to be against  
>> the
>> elegance of Python--at least in my opinion.
>> It seems artificially limiting that enclosing scope name rebinds are
>> not provided for by the language especially since the behavior with
>> the global scope is not so.  In a nutshell I am proposing a solution
>> to make nested lexical scopes to be orthogonal with the global scope
>> and removing a "wart," as Jeremy put it, in the language.
>> -Almann
>> --
>> Almann T. Goo
>> at
> If I may be so bold, couldn't this be addressed by introducing a  
> "rebinding" operator?  So the ' = ' operator would continue to  
> create a new name in the current scope, and the (say) ' := '  
> operator would for an existing name to rebind.   The two operators  
> would highlight the special way Python handles variable / name  
> assignment, which many newbies miss.
> (from someone who was surprised by this quirk of Python before:   
>   -Brendan
> --
> Brendan Simons

Sorry, this got hung up in my email outbox.  I see the thread has  
touched on this idea in the meantime.  So, yeah.  Go team.

Brendan Simons


From greg.ewing at  Thu Feb 23 03:49:42 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 15:49:42 +1300
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Bethard wrote:
>  And, as you mention, it's consistent
> with the relative import feature.

Only rather vaguely -- it's really somewhat different.

With imports, .foo is an abbreviation for,
where myself is the absolute name for the current module,
and you could replace all instances of .foo with that.
But in the suggested scheme, .foo wouldn't have any
such interpretation -- there would be no other way of
spelling it.

Also, with imports, the dot refers to a single well-
defined point in the module-name hierarchy, but here it
would imply a search upwards through the scope hierarchy.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 03:53:21 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 15:53:21 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
	<> <dtiemd$a3k$>
Message-ID: <>

Ron Adam wrote:

> While I prefer constructors with an explicit encode argument, and use a 
> recode() method for 'like to like' coding.  Then the whole encode/decode 
> confusion goes away.

I'd be happy with that, too.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From pje at  Thu Feb 23 04:01:52 2006
From: pje at (Phillip J. Eby)
Date: Wed, 22 Feb 2006 22:01:52 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

At 03:49 PM 2/23/2006 +1300, Greg Ewing wrote:
>Steven Bethard wrote:
> >  And, as you mention, it's consistent
> > with the relative import feature.
>Only rather vaguely -- it's really somewhat different.
>With imports, .foo is an abbreviation for,
>where myself is the absolute name for the current module,
>and you could replace all instances of .foo with that.

Actually, "import .foo" is an abbreviation for "import", not 

From barry at  Thu Feb 23 04:29:08 2006
From: barry at (Barry Warsaw)
Date: Wed, 22 Feb 2006 22:29:08 -0500
Subject: [Python-Dev] getdefault(), the real replacement for setdefault()
Message-ID: <>

Guido's on_missing() proposal is pretty good for what it is, but it is
not a replacement for setdefault().  The use cases for a derivable,
definition- or instantiation-time framework are different from the
call-site based decision being made with setdefault().  The difference
is that in the former case, the class designer or instantiator gets to
decide what the default is, and in the latter (i.e. current) case, the
user gets to decide.

Going back to first principles, the two biggest problems with today's
setdefault() are 1) the default object gets instantiated whether you need
it or not, and 2) the idiom is not very readable.

To directly address these two problems, I propose a new method called
getdefault() with the following signature:

def getdefault(self, key, factory)

This yields the following idiom:

d.getdefault('foo', list).append('bar')

Clearly this completely addresses problem #1.  The implementation is
simple and obvious, and there's no default object instantiated unless
the key is missing.

I think #2 is addressed nicely too because "getdefault()" shifts the
focus to what the method returns rather than the effect of the method on
the target dict.  Perhaps that's enough to make the chained operation on
the returned value feel more natural.  "getdefault()" also looks more
like "get()" so maybe that helps it be less jarring.

This approach also seems to address Raymond's objections because
getdefault() isn't "special" the way on_missing() would be.

Anyway, I don't think it's an either/or choice with Guido's subclass.
Instead I think they are different use cases.  I would add getdefault()
to the standard dict API, remove (eventually) setdefault(), and add
Guido's subclass in a separate module.  But I /wouldn't/ clutter the
built-in dict's API with on_missing().



_missing = object()

def getdefault(self, key, factory):
    value = self.get(key, _missing)
    if value is _missing:
        value = self[key] = factory()
    return value
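
To make the difference from setdefault() concrete, here is a small sketch
built on the implementation above. `DefaultingDict` and `noisy_list` are
hypothetical names used only to count how often the default gets
constructed.

```python
_missing = object()


class DefaultingDict(dict):
    def getdefault(self, key, factory):
        value = self.get(key, _missing)
        if value is _missing:
            value = self[key] = factory()
        return value


calls = []


def noisy_list():
    calls.append(1)          # record each time a default is built
    return []


d = DefaultingDict()
d.getdefault('foo', noisy_list).append('bar')
d.getdefault('foo', noisy_list).append('baz')   # key exists: factory not called
print(d, len(calls))         # {'foo': ['bar', 'baz']} 1

d.setdefault('foo', noisy_list())               # default built even though unused
print(len(calls))            # 2
```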


From steven.bethard at  Thu Feb 23 04:58:57 2006
From: steven.bethard at (Steven Bethard)
Date: Wed, 22 Feb 2006 20:58:57 -0700
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Bethard wrote:
> And, as you mention, it's consistent with the relative import feature.

Greg Ewing wrote:
> With imports, .foo is an abbreviation for,
> where myself is the absolute name for the current module,
> and you could replace all instances of .foo with that.

Phillip J. Eby wrote:
> Actually, "import .foo" is an abbreviation for "import", not
> "import".

If we wanted to be fully consistent with the relative import
mechanism, we would require as many dots as nested scopes.  So:

   def incrementer(val):
       def inc():
           .val += 1
           return .val
       return inc

but also:

    def incrementer_getter(val):
        def incrementer():
            def inc():
                ..val += 1
                return ..val
            return inc
        return incrementer

(Yes, I know the example is silly.  It's not meant as a use case, just
to demonstrate the usage of dots.)  I actually don't care which way it
goes here, but if you want to make the semantics as close to the
relative import semantics as possible, then this is the way to go.
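
The dotted spelling is hypothetical and does not run in any Python release.
For comparison, here is a sketch of the same two examples using the
`nonlocal` statement that Python 3 later provided; note that `nonlocal`
searches outward through the enclosing function scopes, so no per-level
count is written.

```python
def incrementer(val):
    def inc():
        nonlocal val          # rebinding applies to incrementer's `val`
        val += 1
        return val
    return inc


def incrementer_getter(val):
    def incrementer():
        def inc():
            nonlocal val      # found two scopes up, in incrementer_getter
            val += 1
            return val
        return inc
    return incrementer


inc = incrementer(10)
print(inc(), inc())                       # 11 12
print(incrementer_getter(0)()())          # 1
```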

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From greg.ewing at  Thu Feb 23 05:07:33 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:07:33 +1300
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Bethard wrote:

> Phillip J. Eby wrote:
>>Actually, "import .foo" is an abbreviation for "import", not

Oops, sorry, you're right.


Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From martin at  Thu Feb 23 05:13:16 2006
From: martin at (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 23 Feb 2006 05:13:16 +0100
Subject: [Python-Dev] buildbot, and test failures
In-Reply-To: <>
References: <>
Message-ID: <>

Anthony Baxter wrote:
> Who's the person who hands out buildbot username/password pairs?

That's me.

> I 
> have an Ubuntu x86 box here that can become one (I think the only 
> linux, currently, is Gentoo...)

How different are the Linuxes, though? How many of them do we need?


From greg.ewing at  Thu Feb 23 05:23:21 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:23:21 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

James Y Knight wrote:

> Some MIME sections
> might have a base64 Content-Transfer-Encoding, others might be 8bit
> encoded, others might be 7bit encoded, others might be quoted-printable
> encoded.

I stand corrected -- in that situation you would have to encode
the characters before combining them with other material.

However, this doesn't change my view that the result of base64
encoding by itself is characters, not bytes. To go straight
to bytes would require assuming an encoding, and that would
make it *harder* to use in cases where you wanted a different
encoding, because you'd first have to undo the default
encoding and then re-encode it using the one you wanted.
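
A small sketch of the argument above, in present-day Python rather than anything proposed in this thread: the base64 result is treated as text first, and a byte encoding is chosen only at the point of output. The helper name base64_text is made up for illustration.

```python
import base64

def base64_text(data: bytes) -> str:
    """Return the base64 form of `data` as text (hypothetical helper)."""
    # b64encode yields ASCII-only bytes; decoding them gives plain text.
    return base64.b64encode(data).decode("ascii")

payload = b"\x00\xffraw bytes"
text = base64_text(payload)

# The same text can be put on the wire in whichever encoding is wanted.
wire_ascii = text.encode("ascii")
wire_utf16 = text.encode("utf-16-le")

# Undoing the outer encoding recovers the text, and from it the payload.
recovered = base64.b64decode(wire_utf16.decode("utf-16-le"))
```

Here the choice of UTF-16 is arbitrary; the point is only that no encoding has to be undone before another one is applied.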

It may be reasonable to provide an easy way to go straight
from raw bytes to ascii-encoded-base64 bytes, but that should
be a different codec. The plain base64 codec should produce

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 05:25:30 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:25:30 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
Message-ID: <>

Samuele Pedroni wrote:

> If you are looking for rough edges about nested scopes in Python
> this is probably worse:
>  >>> x = []
>  >>> for i in range(10):
> ...   x.append(lambda : i)
> ...
>  >>> [y() for y in x]
> [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

As an aside, is there any chance that this could be
changed in 3.0? I.e. have the for-loop create a new
binding for the loop variable on each iteration.

I know Guido seems to be attached to the idea of
being able to use the value of the loop variable
after the loop exits, but I find that to be a dubious
practice readability-wise, and I can't remember ever
using it. There are other ways of getting the same
effect, e.g. assigning it to another variable before
breaking out of the loop, or putting the loop in a
function and using return.
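
For reference, a minimal sketch of the behaviour being discussed and two common workarounds, written for current Python. The names late, bound, fresh and make_getter are illustrative only.

```python
# All three lambdas share one binding of i, so they see its final value.
late = []
for i in range(3):
    late.append(lambda: i)

# Workaround 1: capture the current value as a default argument.
bound = []
for i in range(3):
    bound.append(lambda i=i: i)

# Workaround 2: create a new scope (and so a new binding) per iteration.
def make_getter(value):
    return lambda: value

fresh = [make_getter(n) for n in range(3)]

late_values = [f() for f in late]
bound_values = [f() for f in bound]
fresh_values = [f() for f in fresh]
```

The second workaround is roughly what a per-iteration binding of the loop variable would give automatically.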

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 05:25:34 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:25:34 +1300
Subject: [Python-Dev]*Type
In-Reply-To: <>
References: <>
Message-ID: <>

Delaney, Timothy (Tim) wrote:

> Since we're adding the __index__ magic method, why not have a
> __getindexed__ method for sequences.

I don't think this is a good idea, since it would be
re-introducing all the confusion that the existence of
two C-level indexing slots has led to, this time for
user-defined types.

> The backwards-incompatibility comes in when you have a type that
> implements __getindexed__, and a subclass that implements __getitem__

I don't think this is just a backwards-incompatibility
issue. Having a single syntax that can correspond to more
than one special method is inherently ambiguous. What do
you do if both are defined? Sure you can come up with
some rule to handle it, but it's better to avoid the
situation in the first place.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 05:27:21 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:27:21 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

Almann T. Goo wrote:

 > (although rebinding a name in the global scope from a
> local scope is really just a specific case of that).

That's what rankles people about this, I think -- there
doesn't seem to be a good reason for treating the global
scope so specially, given that all scopes could be
treated uniformly if only there were an 'outer' statement.
All the arguments I've seen in favour of the status quo
seem like rationalisations after the fact.

 > Since there were no nested lexical scopes back
> then, there was no need to have a construct for arbitrary enclosing
> scopes.

However, if nested scopes *had* existed back then, I
rather suspect we would have had an 'outer' statement
from the beginning, or else 'global' would have been
given the semantics we are now considering for 'outer'.

Of all the suggestions so far, it seems to me that
'outer' is the least radical and most consistent with
what we already have. How about we bung it in and see
how it goes? We can always yank it out in 3.0 if it
turns out to be a horrid mistake and we get swamped
with a terabyte of grievously abusive nested scope
code. :-)
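
As a point of comparison only: Python 3 later gained a "nonlocal" statement (PEP 3104), which plays roughly the role discussed for 'outer' in this thread. A minimal sketch, with make_counter as an illustrative name:

```python
def make_counter(start):
    count = start

    def increment():
        # Rebind the name in the nearest enclosing function scope
        # instead of creating a new local.
        nonlocal count
        count += 1
        return count

    return increment

counter = make_counter(10)
first = counter()
second = counter()
```

Without the nonlocal line, the augmented assignment would make count local to increment() and raise UnboundLocalError.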

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 05:27:28 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:27:28 +1300
Subject: [Python-Dev] Path PEP: some comments (equality)
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Mc Mahon wrote:

> Should the path class implement an __eq__ method that might do some of
> the following things:
>  - Get the absolute path of both self and the other path

I don't think that any path operations should implicitly
touch the file system like this. The paths may not
represent real files or may be for a system other than
the one the program is running on.

>  - normcase both

Not sure about this one either. When dealing with remote
file systems, it can be hard to know whether a path will
be interpreted as case-sensitive or not. This can be a
problem even with local filesystems, e.g. on MacOSX
where you can have both HFS (case-insensitive) and
Unix (case-sensitive) filesystems mounted.
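
To illustrate comparison that never touches the file system, here is a sketch using the pure path classes from the much later pathlib module (not the path class under discussion in this PEP thread). Case handling is fixed by the path flavour chosen, not by what is mounted.

```python
from pathlib import PurePosixPath, PureWindowsPath

# POSIX-flavoured paths compare case-sensitively.
posix_case = PurePosixPath("docs/README") == PurePosixPath("docs/readme")

# Windows-flavoured paths compare case-insensitively.
windows_case = PureWindowsPath("docs/README") == PureWindowsPath("DOCS/readme")

# ".." is not collapsed, since doing so safely would need the file system.
dotdot = PurePosixPath("a/../b") == PurePosixPath("b")
```

This sidesteps rather than solves the remote or mixed-filesystem problem: the caller has to pick the flavour explicitly.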

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From greg.ewing at  Thu Feb 23 05:27:33 2006
From: greg.ewing at (Greg Ewing)
Date: Thu, 23 Feb 2006 17:27:33 +1300
Subject: [Python-Dev]*Type
In-Reply-To: <>
References: <>
Message-ID: <>

Fuzzyman wrote:

> The operator module defines two functions :
>     isMappingType
>     isSequenceType
 > These protocols are loosely defined. Any object which has a
 > ``__getitem__`` method defined could support either protocol.

These functions are actually testing for the presence
of two different __getitem__ methods at the C level, one
in the "mapping" substructure of the type object, and the
other in the "sequence" substructure. This only works
for types implemented in C which make use of this distinction.
It's not much use for user-defined classes, where the
presence of a __getitem__ method causes both of these
slots to become populated.

Having two different slots for __getitem__ seems to have
been an ill-considered feature in the first place and
would probably best be removed in 3.0. I wouldn't mind if
these two functions went away.
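
A tiny sketch of why the distinction is not meaningful for user-defined classes: a single __getitem__ can serve integer positions and arbitrary keys alike. The class name Flexible is made up for illustration.

```python
class Flexible:
    """One __getitem__ that answers both sequence-style and mapping-style lookups."""

    def __getitem__(self, key):
        if isinstance(key, int):
            # Sequence-like use: treat the key as a position.
            return key * key
        # Mapping-like use: treat the key as an arbitrary label.
        return str(key).upper()

obj = Flexible()
by_index = obj[3]
by_key = obj["spam"]
```

Nothing in the class itself says whether it is "a sequence" or "a mapping"; only the caller's usage does.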

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From stephen at  Thu Feb 23 07:05:43 2006
From: stephen at (Stephen J. Turnbull)
Date: Thu, 23 Feb 2006 15:05:43 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Thu, 23 Feb 2006 00:35:39 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> Stephen J. Turnbull wrote:

    >> Base64 is a (family of) wire protocol(s).  It's not clear to me
    >> that it makes sense to say that the alphabets used by "baseNN"
    >> encodings are composed of characters,

    Greg> Take a look at [this that the other]

Those references use "character" in an ambiguous and ill-defined way.
Trying to impose Python unicode object semantics on "vague characters"
is a bad idea IMO.

    Greg> Which seems to make it perfectly clear that the result of
    Greg> the encoding is to be considered as characters, which are
    Greg> not necessarily going to be encoded using ascii.

Please define "character," and explain how its semantics map to
Python's unicode objects.

    Greg> So base64 on its own is *not* a wire protocol. Only after
    Greg> encoding the characters do you have a wire protocol.

No, base64 isn't a wire protocol.  Rather, it's a schema for a family
of wire protocols, whose alphabets are heuristically chosen on the
assumption that code units which happen to correspond to alpha-numeric
code points in a commonly-used coded character set are more likely to
pass through a communication channel without corruption.

Note that I have _precisely_ defined what I mean.  You still have the
problem that you haven't defined character, and that is a real
problem, see below.

    >> I don't see any case for "correctness" here, only for
    >> convenience,

    Greg> I'm thinking of convenience, too. Keep in mind that in Py3k,
    Greg> 'unicode' will be called 'str' (or something equally neutral
    Greg> like 'text') and you will rarely have to deal explicitly
    Greg> with unicode codings, this being done mostly for you by the
    Greg> I/O objects. So most of the time, using base64 will be just
    Greg> as convenient as it is today: base64_encode(my_bytes) and
    Greg> write the result out somewhere.

Convenient, yes, but incorrect.  Once you mix those bytes with the
Python string type, they become subject to all the usual operations on
characters, and there's no way for Python to tell you that you didn't
want to do that.  Ie,

    Greg> Whereas if the result is text, the right thing happens
    Greg> automatically whatever the ultimate encoding turns out to
    Greg> be. You can take the text from your base64 encoding, combine
    Greg> it with other text from any other source to form a complete
    Greg> mail message or xml document or whatever, and write it out
    Greg> through a file object that's using any unicode encoding at
    Greg> all, and the result will be correct.

Only if you do no transformations that will harm the base64-encoding.
This is why I say base64 is _not_ based on characters, at least not in
the way they are used in Python strings.  It doesn't allow any of the
usual transformations on characters that might be applied globally to
a mail composition buffer, for example.

In other words, you don't escape from the programmer having to know
what he's doing.  EIBTI, and the setup I advocate forces the
programmer to explicitly decide where to convert base64 objects to a
textual representation.  This reminds him that he'd better not touch
that text.
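
A short sketch of the hazard described above, in current Python: an ordinary, perfectly legal string operation silently damages base64 data once it is held as text.

```python
import base64

original = b"hello world"
encoded_text = base64.b64encode(original).decode("ascii")

# A transformation that is harmless for prose, e.g. upper-casing a buffer.
mangled = encoded_text.upper()

roundtrip_ok = base64.b64decode(encoded_text) == original
roundtrip_mangled = base64.b64decode(mangled) == original
```

The mangled string is still valid base64, so no error is raised; it simply decodes to different bytes.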

    Greg> The reason I say it's *correct* is that if you go straight
    Greg> from bytes to bytes, you're *assuming* the eventual encoding
    Greg> is going to be an ascii superset.  The programmer is going
    Greg> to have to know about this assumption and understand all its
    Greg> consequences and decide whether it's right, and if not, do
    Greg> something to change it.

I'm not assuming any such thing, except in the context of analysis of
implementation efficiency.  And the programmer needs to know about the
semantics of text that is actually a base64-encoded object, and that
they are different from string semantics.

This is something that programmers are used to dealing with in the
case of Python 2.x str and C char[]; the whole point of the unicode
type is to allow the programmer to abstract from that when dealing
with human-readable text.  Why confuse the issue?

    >> And in the classroom, you're just going to confuse students by
    >> telling them that UTF-8 --[Unicode codec]--> Python string is
    >> decoding but UTF-8 --[base64 codec]--> Python string is
    >> encoding, when MAL is telling them that --> Python string is
    >> always decoding.

    Greg> Which is why I think that only *unicode* codings should be
    Greg> available through the .encode and .decode interface. Or
    Greg> alternatively there should be something more explicit like
    Greg> .unicode_encode and .unicode_decode that is thus restricted.

    Greg> Also, if most unicode coding is done in the I/O objects,
    Greg> there will be far less need for programmers to do explicit
    Greg> unicode coding in the first place, so likely it will become
    Greg> more of an advanced topic, rather than something you need to
    Greg> come to grips with on day one of using unicode, like it is
    Greg> now.

So then you bring it right back in with base64.  Now they need to know
about bytes<->unicode codecs.

Of course it all comes down to a matter of judgment.  I do find your
position attractive, but I just don't think it will work for naive
users the way you think it will.  It's also possible to make a precise
statement of the rationale for my approach, which I have not been able
to achieve for the "base64 uses characters" approach, and nobody else
has demonstrated one, yet.

On the other hand, I don't think either approach imposes substantially
more burden on the advanced programmer, nor does either proposal
involve a specific restriction on usage (aka "dumbing down the

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From nick.bastin at  Thu Feb 23 07:14:12 2006
From: nick.bastin at (Nicholas Bastin)
Date: Thu, 23 Feb 2006 01:14:12 -0500
Subject: [Python-Dev] Unifying trace and profile
In-Reply-To: <6949EC6CD39F97498A57E0FA55295B2101C312B6@ex9.hostedexchange.local>
References: <6949EC6CD39F97498A57E0FA55295B2101C312B6@ex9.hostedexchange.local>
Message-ID: <>

On 2/21/06, Robert Brewer <fumanchu at> wrote:
> 1. Allow trace hooks to receive c_call, c_return, and c_exception events
> (like profile does).

I can easily make this modification.  You can also register the same
bound method for trace and profile, which sort of eliminates this
problem.

> 2. Allow profile hooks to receive line events (like trace does).

You really don't want this in the general case.  Line events make
profiling *really* slow, and they're not that accurate (although many
thanks to Armin last year for helping me make them much more
accurate).  I guess what you require is to be able to selectively turn
on events, thus eliminating the notion of 'trace' or 'profile'
entirely, but I don't have a good idea of how to implement that at
least as efficiently as the current system at the moment - I'm sure it
could be done, I just haven't put any thought into it.

> 3. Expose new sys.gettrace() and getprofile() methods, so trace and
> profile functions that want to play nice can call
> sys.settrace/setprofile(None) only if they are the current hook.

Not a bad idea, although are you really running into this problem a lot?

> 4. Make "the same move" that sys.exitfunc -> atexit made (from a single
> function to multiple functions via registration), so multiple
> tracers/profilers can play nice together.

It seems very unlikely that you'll want to have a trace hook and
profile hook installed at the same time, given the extreme
unreliability this will introduce into the profiler.

> 5. Allow the core to filter on the "event" arg before hook(frame, event,
> arg) is called.

What do you mean by this, exactly?  How would you use this feature?

> 6. Unify tracing and profiling, which would remove a lot of redundant
> code in ceval and sysmodule and free up some space in the PyThreadState
> struct to boot.

The more events you throw in profiling makes it slow, however.  Line
events, while a nice thing to have, theoretically, would probably make
a profiler useless.  If you want to create line-by-line timing data,
we're going to have to look for a more efficient way (like sampling).

> 7. As if the above isn't enough of a dream, it would be nice to have a
> bytecode tracer, which didn't bother with the f_lineno logic in
> maybe_call_line_trace, but just called the hook on every instruction.

I'm working on one, but given how much time I've had to work on my
profiler in the last year, I'm not even going to guess when I'll get a
real shot at looking at that.

My long-term goal is to eliminate profiling and tracing from the core
interpreter entirely and implement the functionality in such a way
that they don't cost you when not in use (i.e., implement profilers
and debuggers which poke into the process from the outside, rather
than be supported natively through events).  This isn't impossible,
but it's difficult because of the large variety of platforms.  I have
access to most of them, but again, my time is hugely constrained right
now for python development work.


From stephen at  Thu Feb 23 07:21:17 2006
From: stephen at (Stephen J. Turnbull)
Date: Thu, 23 Feb 2006 15:21:17 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Ron Adam's message of "Wed, 22
	Feb 2006 14:28:52 -0600")
References: <>
	<> <dtiemd$a3k$>
Message-ID: <>

>>>>> "Ron" == Ron Adam <rrr at> writes:

    Ron> Terry Reedy wrote:

    >> I prefer the shorter names and using recode, for instance, for
    >> bytes to bytes.

    Ron> While I prefer constructors with an explicit encode argument,
    Ron> and use a recode() method for 'like to like' coding. 

'Recode' is a great name for the conceptual process, but the methods
are directional.  Also, in internationalization work, "recode"
strongly connotes "encodingA -> original -> encodingB", as in iconv.

I do prefer constructors, as it's generally not a good idea to do
encoding/decoding in-place for human-readable text, since the codecs
are often lossy.

    Ron> Then the whole encode/decode confusion goes away.

Unlikely.  Errors like "A string".encode("base64").encode("base64")
are all too easy to commit in practice.
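
The same mistake can be sketched with the base64 module in current Python, where str no longer has a "base64" codec: encoding twice raises no error, and one decode does not get the original back.

```python
import base64

once = base64.b64encode(b"A string")
twice = base64.b64encode(once)      # accidental second encoding, no error

decoded_once = base64.b64decode(twice)
decoded_twice = base64.b64decode(decoded_once)
```

Nothing in the types distinguishes once from twice, which is why the slip is easy to make.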

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From at  Thu Feb 23 07:28:24 2006
From: at (Almann T. Goo)
Date: Thu, 23 Feb 2006 01:28:24 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

> If we wanted to be fully consistent with the relative import
> mechanism, we would require as many dots as nested scopes.

At first I was a bit taken aback by the syntax, but after reading
PEP 328 (re: Relative Import) I think I can stomach the syntax a bit
better ; ).

That said, -1 because I believe it adds more problems than the one it
is designed to fix.

Part of me can appreciate using the prefixing "dot" as a way to spell
"my parent's scope" since it does not add a new keyword and in this
regard would appear to be as backwards compatible as the ":="
proposal (of which I am not a particularly big fan either, but could
probably get used to it).

Since the current semantics allow *evaluation* to an enclosing scope's
name by an "un-punctuated" name, "var" is a synonym to ".var" (if
"var" is bound in the immediately enclosing scope).  However for
*re-binding* to an enclosing scope's name, the "punctuated" name is
the only one we can use, so the semantic becomes more cluttered.

This can make a problem that I would say is akin to the "dangling else problem."

    def incrementer_getter(val):
       def incrementer():
           val = 5
           def inc():
               ..val += 1
               return val
           return inc
       return incrementer

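Building on an example that Steve wrote to demonstrate the proposed
syntax, you can see what happens if a user inadvertently uses the
enclosing scope for the return instead of what would presumably be the
outermost bound parameter: "..val" rebinds the parameter of
incrementer_getter, while the bare "val" in the return statement
resolves to the binding in incrementer, so inc() always returns 5.  Now
remove the binding in the incrementer function and it works the way
the user probably thought.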

Because of this, I think adding the "dot" to allow resolving a name
in an explicit way hurts the language by adding a new "gotcha" to the
existing name binding semantics.

I would be okay with this if all name access for enclosing scopes
(binding and evaluation) required the "dot" syntax (as I believe Steve
suggests for Python 3K)--thus keeping the semantics cleaner--but that
would be incredibly backwards incompatible for what I would guess is
*a lot* of code.  This is where the case for the re-bind operator
(i.e. ":=") or an "outer" type keyword is stronger--the semantics in
the language today are not adversely affected.

Almann T. Goo at

From rrr at  Thu Feb 23 08:18:42 2006
From: rrr at (Ron Adam)
Date: Thu, 23 Feb 2006 01:18:42 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<dtiemd$a3k$>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
>>>>>> "Ron" == Ron Adam <rrr at> writes:
>     Ron> Terry Reedy wrote:
>     >> I prefer the shorter names and using recode, for instance, for
>     >> bytes to bytes.
>     Ron> While I prefer constructors with an explicit encode argument,
>     Ron> and use a recode() method for 'like to like' coding. 
> 'Recode' is a great name for the conceptual process, but the methods
> are directional.  Also, in internationalization work, "recode"
> strongly connotes "encodingA -> original -> encodingB", as in iconv.

We could call it transform or translate if needed.  Words are reused 
constantly in languages, so I don't think it's a sticking point.  As 
long as its meaning is documented well and doesn't change later, I think 
it would be just fine.  If the concept of not having encode and decode 
as methods works (and has support from people other than me), the name can be decided 

> I do prefer constructors, as it's generally not a good idea to do
> encoding/decoding in-place for human-readable text, since the codecs
> are often lossy.
>     Ron> Then the whole encode/decode confusion goes away.
> Unlikely.  Errors like "A string".encode("base64").encode("base64")
> are all too easy to commit in practice.

Yes... and wouldn't the above just result in a copy, so it wouldn't be 
an outright error?  But I understand that you mean similar cases where 
it would change the bytes with consecutive calls.  In any case, I was 
referring to the confusion with the method names and how they are used.

This is how I was thinking of it.

    * Given that the string type gains a __codec__ attribute to handle 
automatic decoding when needed.   (is there a reason not to?)

       str(object[,codec][,error]) -> string coded with codec

       unicode(object[,error]) -> unicode

       bytes(object) -> bytes

     * a recode() method is used for transformations that *do not* 
change the current codec.

See any problems with it?  (Other than from gross misuse of course and 
your dislike of 'recode' as the name.)

There may still be a __decode__() method on strings to do the actual 
decoding, but it wouldn't be part of the public interface.  Or it could 
call a function from the codec to do it.

     return self.codec.decode(self)

The only catching point I see is if having an additional attribute on 
strings would increase the memory which many small strings would use. 
That may be why it wasn't done this way to start.  (?)
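
A very rough sketch of the idea, in current Python and with entirely hypothetical names (CodedStr, to_text): a byte string that remembers its codec and uses it to decode itself on request. It is not a design from this thread, only an illustration of "self.codec.decode(self)".

```python
import codecs

class CodedStr(bytes):
    """Hypothetical byte string carrying the codec it was encoded with."""

    def __new__(cls, data, codec="ascii"):
        self = super().__new__(cls, data)
        # The extra attribute is the per-object memory cost mentioned above.
        self.codec = codecs.lookup(codec)
        return self

    def to_text(self):
        # CodecInfo.decode returns a (text, bytes_consumed) pair.
        text, _consumed = self.codec.decode(self)
        return text

greeting = CodedStr("héllo".encode("utf-8"), "utf-8")
decoded = greeting.to_text()
```

Every instance carries a reference to its codec, which is exactly the per-string overhead the last paragraph worries about.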


From fumanchu at  Thu Feb 23 08:53:58 2006
From: fumanchu at (Robert Brewer)
Date: Wed, 22 Feb 2006 23:53:58 -0800
Subject: [Python-Dev] Unifying trace and profile
References: <6949EC6CD39F97498A57E0FA55295B2101C312B6@ex9.hostedexchange.local>
Message-ID: <6949EC6CD39F97498A57E0FA55295B21411613@ex9.hostedexchange.local>

I, Robert, wrote:
> 1. Allow trace hooks to receive c_call, c_return,
> and c_exception events (like profile does).

and Nicholas Bastin replied:
> I can easily make this modification.  You can also
> register the same bound method for trace and profile,
> which sort of eliminates this problem.

Wonderful! It looked easy. :)

I worked around this by registering one function for trace, and another for profile. The profile function rejects any non-C event and then calls the trace function.
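
The workaround can be sketched roughly as follows, using sys.settrace() and sys.setprofile() as they exist in current Python; make_hooks and run_traced are illustrative names, not code from my debugger.

```python
import sys

def make_hooks(log):
    def trace_hook(frame, event, arg):
        log.append(event)
        return trace_hook  # keep receiving events for this frame

    def profile_hook(frame, event, arg):
        # Forward only the C-level events that trace hooks do not get.
        if event in ("c_call", "c_return", "c_exception"):
            trace_hook(frame, event, arg)

    return trace_hook, profile_hook

def run_traced(func):
    log = []
    trace_hook, profile_hook = make_hooks(log)
    old_trace, old_profile = sys.gettrace(), sys.getprofile()
    sys.settrace(trace_hook)
    sys.setprofile(profile_hook)
    try:
        func()
    finally:
        # Restore whatever hooks were active before.
        sys.setprofile(old_profile)
        sys.settrace(old_trace)
    return log

events = run_traced(lambda: len([]))
```

The resulting log mixes Python-level events from the trace hook with C-level events forwarded by the profile hook.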

> 3. Expose new sys.gettrace() and getprofile() methods,
> so trace and profile functions that want to play nice
> can call sys.settrace/setprofile(None) only if they
> are the current hook.

> Not a bad idea, although are you really running into
> this problem a lot?

Well, not "a lot", as I don't expect I'll write very many debuggers in my lifetime ;) But it's important when you have multiple, different debugging systems running at once, either to take advantage of the strengths of each, or to debug a debugger.
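
For what it is worth, later Python releases do provide sys.gettrace() and sys.getprofile(). A minimal sketch of the "play nice" behaviour, with uninstall_if_current as a made-up helper name:

```python
import sys

def my_tracer(frame, event, arg):
    return None  # not interested in per-line events for this frame

def uninstall_if_current(tracer):
    """Remove `tracer` only when it is still the active trace hook."""
    if sys.gettrace() is tracer:
        sys.settrace(None)
        return True
    return False
```

A debugger that finds some other hook installed can then leave it alone instead of clobbering it.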

> 4. Make "the same move" that sys.exitfunc -> atexit made
> (from a single function to multiple functions via
> registration), so multiple tracers/profilers can play
> nice together.

> It seems very unlikely that you'll want to have a trace
> hook and profile hook installed at the same time, given
> the extreme unreliability this will introduce into the
> profiler.

True; this request is partly driven by the differing capabilities of each (only profile can handle C events at the moment). Being able to compose debuggers as described above is another reason. Anything else is just the usual (often ignorable) desire for elegance.

> 5. Allow the core to filter on the "event" arg before
> hook(frame, event, arg) is called.

> What do you mean by this, exactly?  How would you use
> this feature?

As you hinted, I mean that call_trace would only call the trace function if the current event were in a list of "events I want to monitor"; that list of events could be supplied, for example, with a new sys.settrace(func[, events]) signature, with "events" defaulting to all events for backward compatibility. A single int could be used internally, where each bit represents one of the event types.

This would be necessary if trace and profile were unified (see next). If they're not, it's less compelling.
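
A pure-Python approximation of that filtering, as a sketch only: the real proposal would filter inside the core before any Python code runs, whereas this wrapper still pays for every call. EVENT_BITS and make_filtered_tracer are hypothetical names.

```python
import sys

# Hypothetical bit flags, one per event type.
EVENT_BITS = {"call": 1, "line": 2, "return": 4, "exception": 8}

def make_filtered_tracer(hook, events):
    mask = 0
    for name in events:
        mask |= EVENT_BITS[name]

    def tracer(frame, event, arg):
        # Pass the event on only if its bit is set in the mask.
        if EVENT_BITS.get(event, 0) & mask:
            hook(frame, event, arg)
        return tracer  # keep tracing this frame

    return tracer

def demo():
    seen = []

    def hook(frame, event, arg):
        if frame.f_code.co_name == "work":
            seen.append(event)

    def work():
        x = 1
        return x + 1

    old = sys.gettrace()
    sys.settrace(make_filtered_tracer(hook, ["call", "return"]))
    try:
        work()
    finally:
        sys.settrace(old)
    return seen
```

In demo(), the 'line' events for work() are dropped by the mask and only 'call' and 'return' reach the hook.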

> 6. Unify tracing and profiling, which would remove a
> lot of redundant code in ceval and sysmodule and free
> up some space in the PyThreadState struct to boot.

> The more events you throw in profiling makes it slow,
> however.  Line events, while a nice thing to have,
> theoretically, would probably make a profiler useless.

Sure. If trace functions can receive C events, then there's no need to add that to profiling. I guess I just see profiling as a "stripped down" version of the general trace architecture, and wonder if it couldn't be that in reality as well as appearance; that is, 
profiling becomes tracing with the 'line' events ignored (before they reach back into your Python trace function and slow everything down). But I also note that the current hotshot uses PyEval_SetTrace "if (self->lineevents)", and PyEval_SetProfile otherwise.

> 7. As if the above isn't enough of a dream, it would be nice
> to have a bytecode tracer, which didn't bother with the
> f_lineno logic in maybe_call_line_trace, but just called
> the hook on every instruction.

> I'm working on one, but given how much time I've had to
> work on my profiler in the last year, I'm not even going
> to guess when I'll get a real shot at looking at that.
> My long-term goal is to eliminate profiling and tracing
> from the core interpreter entirely and implement the
> functionality in such a way that they don't cost you
> when not in use (i.e., implement profilers and debuggers
> which poke into the process from the outside, rather
> than be supported natively through events).  This isn't
> impossible, but it's difficult because of the large
> variety of platforms.  I have access to most of them,
> but again, my time is hugely constrained right now for
> python development work.

Ah. Sorry to hear that. :/ But no worries on my end; if only #1 can be done someday, I'll be extremely happy. Find me at PyCon, I'll buy you a drink. :)

Robert Brewer
System Architect
Amor Ministries
fumanchu at

From abo at  Thu Feb 23 10:21:13 2006
From: abo at (Donovan Baarda)
Date: Thu, 23 Feb 2006 09:21:13 +0000
Subject: [Python-Dev] calendar.timegm
In-Reply-To: <>
References: <000d01c636ca$b53411a0$>
Message-ID: <>

On Tue, 2006-02-21 at 22:47 -0600, skip at wrote:
>     Sergey> Historical question ;)
>     Sergey> Anyone can explain why function timegm is placed into module
>     Sergey> calendar, not to module time, where it would be near with
>     Sergey> similar function mktime?
> Historical accident. ;-)

It seems time contains a simple wrapper around the equivalent C
functions. There is no C equivalent to timegm() (how do they do it?).

The timegm() function is implemented in python using the datetime
module. The name sux BTW.

It would be nice if there was a time.mkgmtime(), but it would need to be
implemented in C.
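
For readers unfamiliar with it, calendar.timegm() is the inverse of time.gmtime(): it turns a UTC time tuple back into seconds since the epoch. A quick sketch (the timestamp value is arbitrary):

```python
import calendar
import time

stamp = 1140686473                    # arbitrary seconds-since-epoch value
utc_tuple = time.gmtime(stamp)        # seconds -> UTC struct_time
roundtrip = calendar.timegm(utc_tuple)  # UTC struct_time -> seconds

epoch = calendar.timegm(time.gmtime(0))
```

time.mktime() does the analogous conversion for local time, which is why one might expect the two functions to live side by side.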

Donovan Baarda <abo at>

From fuzzyman at  Thu Feb 23 10:59:29 2006
From: fuzzyman at (Fuzzyman)
Date: Thu, 23 Feb 2006 09:59:29 +0000
Subject: [Python-Dev] defaultdict proposal round three
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<001401c6374a$9a3a2fd0$6a01a8c0@RaymondLaptop1>	<dthbeu$39d$>
Message-ID: <>

Greg Ewing wrote:
> Fredrik Lundh wrote:
>> fwiw, the first google hit for "autodict" appears to be part of someone's
>> link farm
>>     At this website we have assistance with autodict. In addition to
>>     information for autodict we also have the best web sites concerning
>>     dictionary, non profit and new york.
> Hmmm, looks like some sort of bot that takes the words in
> your search and stuffs them into its response. I wonder
> if they realise how silly the results end up sounding?
> I've seen these sorts of things before, but I haven't
> quite figured out yet how they manage to get into Google's
> database if they're auto-generated. Anyone have any clues
> what goes on?

I guess the question is, how would google know *not*  to index them ? As 
soon as they are linked to (or more likely they re-use an expired domain 
name that is already in the google database) they will be indexed. They 
may be obviously autogenerated to a human, but it's a lot harder for a 
computer to tell.

It seems that google indexes sites of dubious value - but gives them a 
low pagerank. This means they do appear in results, but only if there is 
nothing more relevant available.

All the best,

Michael Foord

From tzot at  Thu Feb 23 12:01:53 2006
From: tzot at (Christos Georgiou)
Date: Thu, 23 Feb 2006 13:01:53 +0200
Subject: [Python-Dev] buildbot, and test failures
References: <>
Message-ID: <dtk4lp$ihp$>

"Martin v. Löwis" <martin at> wrote in message 
news:43FD365C.10801 at
> Anthony Baxter wrote:

>> I
>> have an Ubuntu x86 box here that can become one (I think the only
>> linux, currently, is Gentoo...)
> How different are the Linuxes, though? How many of them do we need?

Actually, we would need enough to cover the libc/gcc combinations that are
most common. This isn't feasible, though, so in case we add more Linux
machines, at least make sure that the libc/gcc combo is not one already used
in the existing ones.

From skip at  Thu Feb 23 13:35:51 2006
From: skip at (skip at
Date: Thu, 23 Feb 2006 06:35:51 -0600
Subject: [Python-Dev] buildbot, and test failures
In-Reply-To: <dtk4lp$ihp$>
References: <>
	<> <dtk4lp$ihp$>
Message-ID: <>

    Christos> This isn't feasible, though, so in case we add more Linux
    Christos> machines, at least make sure that the libc/gcc combo is not
    Christos> one already used in the existing ones.

Maybe include libc/gcc versions in the name or description?
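
One way such a label could be produced on a build slave is sketched below.
This is only an illustration: the function name describe_toolchain is made
up here, and platform.python_compiler() reports the compiler that built the
running interpreter, which is not necessarily the system gcc.

```python
import platform


def describe_toolchain():
    """Return a short 'libc / compiler' label for the current host.

    describe_toolchain is a hypothetical helper name, not part of buildbot.
    """
    # libc_ver() returns ('', '') when the C library cannot be determined.
    libc_name, libc_version = platform.libc_ver()
    libc = "-".join(part for part in (libc_name, libc_version) if part)
    # Compiler used to build this Python, e.g. a GCC version string.
    compiler = platform.python_compiler()
    return "%s / %s" % (libc or "unknown-libc", compiler or "unknown-compiler")


label = describe_toolchain()
print(label)
```

The printed value depends entirely on the machine it runs on, so it would
have to be collected per slave rather than hard-coded.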


From jason.orendorff at  Thu Feb 23 16:25:11 2006
From: jason.orendorff at (Jason Orendorff)
Date: Thu, 23 Feb 2006 10:25:11 -0500
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/22/06, Neil Schemenauer <nas at> wrote:
>     @classmethod
>     def fromhex(self, data):
>         data = re.sub(r'\s+', '', data)
>         return bytes(binascii.unhexlify(data))

If it's to be a classmethod, I guess that should be "return self(
binascii.unhexlify(data))".


From g.brandl at  Thu Feb 23 16:49:10 2006
From: g.brandl at (Georg Brandl)
Date: Thu, 23 Feb 2006 16:49:10 +0100
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <dtklhm$gsl$>

Phillip J. Eby wrote:
> At 03:49 PM 2/23/2006 +1300, Greg Ewing wrote:
>>Steven Bethard wrote:
>> >  And, as you mention, it's consistent
>> > with the relative import feature.
>>Only rather vaguely -- it's really somewhat different.
>>With imports, .foo is an abbreviation for,
>>where myself is the absolute name for the current module,
>>and you could replace all instances of .foo with that.
> Actually, "import .foo" is an abbreviation for "import", not 
> "import".

Actually, "import .foo" won't work anyway.

nitpicking-ly yours,

From steven.bethard at  Thu Feb 23 17:19:08 2006
From: steven.bethard at (Steven Bethard)
Date: Thu, 23 Feb 2006 09:19:08 -0700
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/22/06, Almann T. Goo < at> wrote:
> Since the current semantics allow *evaluation* to an enclosing scope's
> name by an "un-punctuated" name, "var" is a synonym to ".var" (if
> "var" is bound in the immediately enclosing scope).  However for
> *re-binding* to an enclosing scope's name, the "punctuated" name is
> the only one we can use, so the semantic becomes more cluttered.
> This can make a problem that I would say is akin to the "dangling else problem."
>     def incrementer_getter(val):
>        def incrementer():
>            val = 5
>            def inc():
>                ..val += 1
>                return val
>            return inc
>        return incrementer
> Building on an example that Steve wrote to demonstrate the syntax
> proposed, you can see that if a user inadvertently uses the enclosing
> scope for the return instead of what would presumably be the outer
> most bound parameter.  Now remove the binding in the incrementer
> function and it works the way the user probably thought.

Sorry, what way did the user think?  I'm not sure what you think was
supposed to happen.

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From chris at  Thu Feb 23 22:01:08 2006
From: chris at (Chris AtLee)
Date: Thu, 23 Feb 2006 16:01:08 -0500
Subject: [Python-Dev] Path PEP: some comments (equality)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Mark Mc Mahon <mark.m.mcmahon at> wrote:
> Hi,
> It seems that the Path module as currently defined leaves equality
> testing up to the underlying string comparison. My guess is that this
> is fine for Unix (maybe not even) but it is a bit lacking for Windows.
> Should the path class implement an __eq__ method that might do some of
> the following things:
>  - Get the absolute path of both self and the other path
>  - normcase both
>  - now see if they are equal
> This would make working with paths much easier for keys of a
> dictionary on windows. (I frequently use a case insensitive string
> class for paths if I need them to be keys of a dict.)

The PEP specifies path.samefile(), which is useful in the case of
files that actually exist, but pretty much useless for comparing paths
that don't exist on the local machine.

I think leaving __eq__ as the default string comparison is best.  But
what about providing an alternate platform-specific equality test?

def isequal(self, other, platform="native"):
    """Return True if self is equivalent to other using platform's
path comparison rules.  platform can be one of "native", "posix",
"windows", "mac"."""

This could do some combination of os.path.normpath() and
os.path.normcase() depending on the platform.  The docs for
os.path.normpath() say that it may change the meaning of the path if
it contains symbolic links; it's not clear to me how, though.
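
A minimal sketch of what such an isequal() might look like, written as a
plain function rather than a Path method. The mapping of platform names to
the posixpath and ntpath modules is an assumption made for this example, and
"mac" is left out because current Pythons no longer ship a macpath module.

```python
import ntpath
import os.path
import posixpath

# Assumed mapping from the proposed platform names to path modules.
_PATH_MODULES = {"native": os.path, "posix": posixpath, "windows": ntpath}


def isequal(path, other, platform="native"):
    """Compare two path strings using the given platform's rules.

    Purely lexical: nothing is looked up on disk.
    """
    try:
        mod = _PATH_MODULES[platform]
    except KeyError:
        raise ValueError("unsupported platform: %r" % (platform,))

    def normalize(p):
        # normpath collapses '.', '..' and repeated separators;
        # normcase folds case and slashes where the platform ignores them.
        return mod.normcase(mod.normpath(p))

    return normalize(path) == normalize(other)


print(isequal(r"C:\Temp\a.txt", "c:/temp/./A.TXT", platform="windows"))
print(isequal("/tmp/a.txt", "/TMP/a.txt", platform="posix"))
```

Because normpath() rewrites "x/../y" to "y" without consulting the file
system, two paths can compare equal here even when "x" is a symbolic link
and they name different files; that is the caveat the docs refer to.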


From guido at  Thu Feb 23 22:18:57 2006
From: guido at (Guido van Rossum)
Date: Thu, 23 Feb 2006 16:18:57 -0500
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/22/06, Michael Chermside <mcherm at> wrote:
> A minor related point about on_missing():
> Haven't we learned from regrets over the .next() method of iterators
> that all "magically" invoked methods should be named using the __xxx__
> pattern? Shouldn't it be named __on_missing__() instead?

Good point. I'll call it __missing__. I've uploaded a new patch to

--Guido van Rossum (home page:
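
For readers who have not seen the hook in use, here is a small sketch of a
dict subclass relying on __missing__. The class name ListDefaultDict is
invented for this example; it is not the defaultdict from the patch.

```python
class ListDefaultDict(dict):
    """A dict that creates an empty list for any missing key."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = []
        return value


d = ListDefaultDict()
d["foo"].append("bar")
print(d)
# .get() does not go through __getitem__, so __missing__ is not called.
print(d.get("baz"))
```

Only subscript access triggers the hook; membership tests and get() leave
the dictionary untouched.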

From thomas at  Thu Feb 23 22:24:02 2006
From: thomas at (Thomas Wouters)
Date: Thu, 23 Feb 2006 22:24:02 +0100
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
	<> <>
Message-ID: <>

On Thu, Feb 23, 2006 at 05:25:30PM +1300, Greg Ewing wrote:
> Samuele Pedroni wrote:
> > If you are looking for rough edges about nested scopes in Python
> > this is probably worse:
> > 
> >  >>> x = []
> >  >>> for i in range(10):
> > ...   x.append(lambda : i)
> > ...
> >  >>> [y() for y in x]
> > [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
> As an aside, is there any chance that this could be
> changed in 3.0? I.e. have the for-loop create a new
> binding for the loop variable on each iteration.

You can't do that without introducing a whole new scope for the body of the
'for' loop, and that means (in the current rules) you can't assign to any
function-local names in the for loop. The nested scope in that 'lambda'
refers to the 'slot' for the variable 'i' in the outer namespace (in this
case, the global one.) You can't 'remove' the binding, either; 'del' will
not allow you to.
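
The shared 'slot' can be seen directly, along with the usual workaround of
freezing the current value in a default argument. This is a sketch in
present-day Python syntax; the variable names are chosen for the example.

```python
# Every lambda refers to the same variable i, so all of them
# see whatever value i has when they are finally called.
late = []
for i in range(10):
    late.append(lambda: i)
late_results = [f() for f in late]

# A default argument is evaluated when the lambda is created,
# so each lambda keeps its own copy of the value.
early = []
for i in range(10):
    early.append(lambda i=i: i)
early_results = [f() for f in early]

print(late_results)
print(early_results)
```

The default-argument form changes the lambda's signature, which is why it
is usually treated as an idiom rather than a real fix.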

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas at  Thu Feb 23 22:41:31 2006
From: thomas at (Thomas Wouters)
Date: Thu, 23 Feb 2006 22:41:31 +0100
Subject: [Python-Dev] getdefault(), the real replacement for setdefault()
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 22, 2006 at 10:29:08PM -0500, Barry Warsaw wrote:
> d.getdefault('foo', list).append('bar')

> Anyway, I don't think it's an either/or choice with Guido's subclass.
> Instead I think they are different use cases.  I would add getdefault()
> to the standard dict API, remove (eventually) setdefault(), and add
> Guido's subclass in a separate module.  But I /wouldn't/ clutter the
> built-in dict's API with on_missing().

+1. This is a much closer match to my own use of setdefault than Guido's
dict subtype. I'm +0 on the subtype, but I prefer the call-time decision on
whether to fall back to a default or not.
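
To make the call-time decision concrete, here is a rough sketch of a
getdefault() on a dict subclass. It assumes the method stores the
factory's result under the key, as setdefault() does; the thread does not
spell that out, and the class name GetDefaultDict is invented.

```python
class GetDefaultDict(dict):
    """dict with a getdefault(key, factory) method (hypothetical API)."""

    def getdefault(self, key, factory):
        try:
            return self[key]
        except KeyError:
            # The factory is only called when the key is really missing,
            # unlike setdefault(key, []) which always builds the default.
            value = self[key] = factory()
            return value


d = GetDefaultDict()
d.getdefault("foo", list).append("bar")
d.getdefault("foo", list).append("baz")
print(d)
```

The caller picks the fallback per call, so the same dictionary can use
list for one key and set for another.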

Thomas Wouters <thomas at>


From thomas at  Thu Feb 23 22:45:19 2006
From: thomas at (Thomas Wouters)
Date: Thu, 23 Feb 2006 22:45:19 +0100
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Feb 22, 2006 at 01:13:28PM -0800, Michael Chermside wrote:

> Haven't we learned from regrets over the .next() method of iterators
> that all "magically" invoked methods should be named using the __xxx__
> pattern? Shouldn't it be named __on_missing__() instead?

I agree that on_missing should be __missing__ (or __missing_key__) but I
don't agree on the claim that all 'magically' invoked methods should be
two-way-double-underscored. __methods__ are methods that should only be
called 'magically', or by the object itself. 'next' has quite a few use cases
where it's desirable to call it directly (and I often do.)

Thomas Wouters <thomas at>


From walter at  Thu Feb 23 22:55:40 2006
From: walter at (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 23 Feb 2006 22:55:40 +0100
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> On 2/22/06, Michael Chermside <mcherm at> wrote:
>> A minor related point about on_missing():
>> Haven't we learned from regrets over the .next() method of iterators
>> that all "magically" invoked methods should be named using the __xxx__
>> pattern? Shouldn't it be named __on_missing__() instead?
> Good point. I'll call it __missing__. I've uploaded a new patch to

I always thought that __magic__ method calls are done by Python on 
objects it doesn't know about. The special method name ensures that it 
is indeed the protocol Python is talking about, not some random method 
(with next() being the exception). In the defaultdict case this isn't a 
problem, because defaultdict is calling its own method.

    Walter Dörwald

From greg.ewing at  Fri Feb 24 01:25:50 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 24 Feb 2006 13:25:50 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> Please define "character," and explain how its semantics map to
> Python's unicode objects.

One of the 65 abstract entities referred to in the RFC
and represented in that RFC by certain visual glyphs.
There is a subset of the Unicode code points that
are conventionally associated with very similar glyphs,
so that there is an obvious one-to-one mapping between
these entities and those Unicode code points. These
entities therefore have a natural and obvious
representation using Python unicode strings.

> No, base64 isn't a wire protocol.  Rather, it's a schema for a family
> of wire protocols, whose alphabets are heuristically chosen on the
> assumption that code units which happen to correspond to alpha-numeric
> code points in a commonly-used coded character set are more likely to
> pass through a communication channel without corruption.

Yes, and it's up to the programmer to choose those code
units (i.e. pick an encoding for the characters) that
will, in fact, pass through the channel he is using
without corruption. I don't see how any of this is
inconsistent with what I've said.

> Only if you do no transformations that will harm the base64-encoding.
> ...  It doesn't allow any of the
> usual transformations on characters that might be applied globally to
> a mail composition buffer, for example.

I don't understand that. Obviously if you rot13 your
mail message or turn it into pig latin or something,
it's going to mess up any base64 it might contain.
But that would be a silly thing to do to a message
containing base64.

Given any piece of text, there are things it makes
sense to do with it and things it doesn't, depending
entirely on the use to which the text will eventually
be put. I don't see how base64 is any different in
this regard.

> So then you bring it right back in with base64.  Now they need to know
> about bytes<->unicode codecs.

No, they need to know about the characteristics of
the channel over which they're sending the data.

Base64 is designed for situations in which you
have a *text* channel that you know is capable of
transmitting at least a certain subset of characters,
where "character" means whatever is used as input
to that channel.

In Py3k, text will be represented by unicode strings.
So a Py3k text channel should take unicode as its
input, not bytes.

I think we've got a bit sidetracked by talking about
mime. I wasn't actually thinking about mime, but
just a plain text message into which some base64
data was being inserted. That's the way we used to
do things in the old days with uuencode etc, before
mime was invented.

Here, the "channel" is NOT the socket or whatever
that the ultimate transmission takes place over --
it's the interface to your mail sending software
that takes a piece of plain text and sends it off
as a mail message somehow.

In Py3k, if a channel doesn't take unicode as input,
then it's not a text channel, and it's not appropriate
to be using base64 with it directly. It might be
appropriate to use base64 followed by some encoding,
but the programmer needs to be aware of that and
choose the encoding wisely. It's not possible to
shield him from having to know about encodings in
that situation, even if the encoding is just ascii.
Trying to do so will just lead to more confusion,
in my opinion.
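
As a sketch of the two steps being separated here, using today's base64
module (which returns bytes, so the text form is obtained with an explicit
ASCII decode). The choice of UTF-16 for the byte-level channel is just an
assumption to show that the second step is a separate decision.

```python
import base64

payload = bytes([0, 255, 16, 32])

# Step 1: binary data -> base64 characters (a piece of text).
b64_text = base64.b64encode(payload).decode("ascii")

# Step 2: only if the channel wants bytes, pick an encoding for that text.
wire = b64_text.encode("utf-16-le")

# The receiver undoes the steps in reverse order.
roundtrip = base64.b64decode(wire.decode("utf-16-le"))

print(b64_text)
print(roundtrip == payload)
```

With a channel that accepts text directly, step 2 disappears and the
programmer never touches an encoding.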


From facundobatista at  Fri Feb 24 03:12:21 2006
From: facundobatista at (Facundo Batista)
Date: Thu, 23 Feb 2006 23:12:21 -0300
Subject: [Python-Dev] OT: T-Shirts
Message-ID: <>

Python Argentina finally have T-Shirts (you can see a photo here:

Why this mail to python-dev? Because the group decided to give some,
as a present, to some outstanding members of Python:

  Guido van Rossum
  Alex Martelli
  Tim Peters
  Fredrik Lundh
  David Ascher
  Mark Lutz
  Mark Hammond

Also, some of us want to give one as a personal present:

  Raymond Hettinger (from Facundo Batista)
  Bob Ippolito      (from Alejandro David Weil)
  Glyph Lefkowitz   (from Alejandro J. Cura)

The point is that I don't know some of you, so please grab my shoulder
here in PyCon. And if you're not coming to the conference but somebody
can carry it to you, just let me know.

And if you want to buy one, I brought some, only USD 12, ;).

Thank you very much and sorry for the OT.


.    Facundo


From mcherm at  Fri Feb 24 03:39:56 2006
From: mcherm at (Michael Chermside)
Date: Thu, 23 Feb 2006 21:39:56 -0500
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

Walter Dörwald writes:
> I always thought that __magic__ method calls are done by Python on
> objects it doesn't know about. The special method name ensures that it
> is indeed the protocol Python is talking about, not some random method
> (with next() being the exception). In the defaultdict case this isn't a
> problem, because defaultdict is calling its own method.

I, instead, felt that the __xxx__ convention served a few purposes. First,
it indicates that the method will be called in some means OTHER than
by name (generally, the interpreter invokes it directly, although in this
case it's a built-in method of dict that would invoke it). Secondly, it serves
to flag the method as being special -- true newbies can safely ignore
nearly all special methods aside from __init__(). And it serves to create
a separate namespace... writers of Python code know that names
beginning and ending with double-underscores are "reserved for the
language". Of these, I always felt that special invocation was the most
important feature. The next() method of iterators was an interesting
object lesson. The original reasoning (I think) for using next() not
__next__() was that *sometimes* the method was called directly by
name (when stepping an iterator manually, which one frequently does
for perfectly good reasons). Since it was sometimes invoked by name
and sometimes by special mechanism, the choice was to use the
unadorned name, but later experience showed that it would have been
better the other way.

-- Michael Chermside

From aleaxit at  Fri Feb 24 06:15:26 2006
From: aleaxit at (Alex Martelli)
Date: Thu, 23 Feb 2006 21:15:26 -0800
Subject: [Python-Dev] OT: T-Shirts
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 23, 2006, at 6:12 PM, Facundo Batista wrote:

> Python Argentina finally have T-Shirts (you can see a photo here:
> Why this mail to python-dev? Because the group decided to give some,
> as a present, to some outstanding members of Python:
>   Guido van Rossum
>   Alex Martelli
     <many other worthies snipped>

T-shirts?  I'm an absolute fan of T-shirts...!-)

> The point is that I don't know some of you, so please grab my shoulder
> here in PyCon. And if you're not coming to the conference but somebody
> can carry it to you, just let me know.

Anna can bring mine!!!

> And if you want to buy one, I brought some, only USD 12, ;).

Anna, please buy one for yourself before they run out -- they're  
cool, and this way we can go around as the AR (Anna Ravenscroft, of  
course!) Python Twins...!-)


From greg.ewing at  Fri Feb 24 07:53:12 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 24 Feb 2006 19:53:12 +1300
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Chermside wrote:
> The next() method of iterators was an interesting
> object lesson. ... Since it was sometimes invoked by name
> and sometimes by special mechanism, the choice was to use the
> unadorned name, but later experience showed that it would have been
> better the other way.

Any thoughts about fixing this in 3.0?


From greg.ewing at  Fri Feb 24 07:54:07 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 24 Feb 2006 19:54:07 +1300
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

Thomas Wouters wrote:

> __methods__ are methods that should only be
> called 'magically', or by the object itself.
> 'next' has quite a few use cases where it's
> desirable to call it directly

That's why the proposal to replace .next() with
.__next__() comes along with a function next(obj)
which calls obj.__next__().
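
A small sketch of how that pairing reads in practice, using the spelling
Python 3 later adopted. The Countdown class is made up for the example.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        # Invoked by for-loops and by the next() builtin alike.
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining + 1


c = Countdown(3)
first = next(c)   # manual stepping goes through the builtin
rest = list(c)    # implicit stepping uses the same method
print(first, rest)
```

Code that steps an iterator by hand still gets a short, readable call,
while the method itself stays in the reserved __xxx__ namespace.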


From greg.ewing at  Fri Feb 24 07:54:14 2006
From: greg.ewing at (Greg Ewing)
Date: Fri, 24 Feb 2006 19:54:14 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
	<> <>
Message-ID: <>

Thomas Wouters wrote:
> On Thu, Feb 23, 2006 at 05:25:30PM +1300, Greg Ewing wrote:
>>As an aside, is there any chance that this could be
>>changed in 3.0? I.e. have the for-loop create a new
>>binding for the loop variable on each iteration.
> You can't do that without introducing a whole new scope for the body of the
> 'for' loop,

There's no need for that. The new scope need only
include the loop variable -- everything else could
still refer to the function's main scope.

There's even a rather elegant way of implementing
this in the current CPython. If a nested scope
references the loop variable, then it will be in
a cell. So you just create a new cell each time
round the loop, instead of changing the existing

This would even still let you use the value after
the loop finished, if that were considered a good
idea. But it might be better not to allow that,
since it could make alternative implementations


From hoffman at  Fri Feb 24 10:33:37 2006
From: hoffman at (Michael Hoffman)
Date: Fri, 24 Feb 2006 09:33:37 +0000
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
Message-ID: <>

[Neil Schemenauer]
>>     @classmethod
>>     def fromhex(self, data):
>>         data = re.sub(r'\s+', '', data)
>>         return bytes(binascii.unhexlify(data))

[Jason Orendorff]
> If it's to be a classmethod, I guess that should be "return self(
> binascii.unhexlify(data))".

Am I the only one who finds the use of "self" on a classmethod to be
incredibly confusing? Can we please follow PEP 8 and use "cls"?

Michael Hoffman <hoffman at>
European Bioinformatics Institute
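
Putting both suggestions together, the classmethod might look like the
sketch below. HexBytes and from_hex_text are names invented here, built on
today's bytes type rather than the type proposed in the pre-PEP.

```python
import binascii
import re


class HexBytes(bytes):
    """bytes subclass with a whitespace-tolerant hex constructor."""

    @classmethod
    def from_hex_text(cls, data):
        # Drop all whitespace so "de ad\nbe ef" is accepted.
        data = re.sub(r"\s+", "", data)
        # Calling cls (not a hard-coded type) lets subclasses
        # get instances of themselves back.
        return cls(binascii.unhexlify(data))


class TaggedBytes(HexBytes):
    pass


value = TaggedBytes.from_hex_text("de ad\nbe ef")
print(type(value).__name__, value)
```

With a hard-coded "return bytes(...)" the subclass call would silently
return a plain bytes object instead.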

From stephen at  Fri Feb 24 11:15:34 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 24 Feb 2006 19:15:34 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Ron Adam's message of "Thu, 23
	Feb 2006 01:18:42 -0600")
References: <>
	<> <dtiemd$a3k$>
Message-ID: <>

>>>>> "Ron" == Ron Adam <rrr at> writes:

    Ron> We could call it transform or translate if needed.

You're still losing the directionality, which is my primary objection
to "recode".  The absence of directionality is precisely why "recode"
is used in that sense for i18n work.

There really isn't a good reason that I can see to use anything other
than the pair "encode" and "decode".  In monolingual environments,
once _all_ human-readable text (specifically including Python programs
and console I/O) is automatically mapped to a Python (unicode) string,
most programmers will never need to think about it as long as Python
(the project) very very strongly encourages that all Python programs
be written in UTF-8 if there's any chance the program will be reused
in a locale other than the one where it was written.  (Alternatively
you can depend on PEP 263 coding cookies.)  Then the user (or the
Python interpreter) just changes console and file I/O codecs to the
encoding in use in that locale, and everything just works.

So the remaining uses of "encode" and "decode" are for advanced users
and specialists: people using stuff like base64 or gzip, and those who
need to use unicode codecs explicitly.

I could be wrong about the possibility to get rid of explicit unicode
codec use in monolingual environments, but I hope that we can at least
try to achieve that.

    >> Unlikely.  Errors like "A
    >> string".encode("base64").encode("base64") are all too easy to
    >> commit in practice.

    Ron> Yes,... and wouldn't the above just result in a copy so it
    Ron> wouldn't be an out right error.

No, you either get the following:

A string. -> QSBzdHJpbmcu -> UVNCemRISnBibWN1

or you might get an error if base64 is defined as bytes->unicode.
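
The chain above can be reproduced with the base64 module as it exists
today, where both input and output are bytes, so encoding twice raises no
error. A short sketch:

```python
import base64

once = base64.b64encode(b"A string.")
twice = base64.b64encode(once)   # accepted silently: bytes in, bytes out

print(once)
print(twice)

# Recovering the original takes exactly as many decodes as encodes.
recovered = base64.b64decode(base64.b64decode(twice))
print(recovered)
```

Nothing in the types distinguishes once from twice, which is the kind of
mistake a bytes->unicode signature would catch.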

    Ron>     * Given that the string type gains a __codec__ attribute
    Ron> to handle automatic decoding when needed.  (is there a reason
    Ron> not to?)

    Ron>        str(object[,codec][,error]) -> string coded with codec

    Ron>        unicode(object[,error]) -> unicode

    Ron>        bytes(object) -> bytes

str == unicode in Py3k, so this is a non-starter.  What do you want to

    Ron>      * a recode() method is used for transformations that
    Ron> *do_not* change the current codec.

I'm not sure what you mean by the "current codec".  If it's attached
to an "encoded object", it should be the codec needed to decode the
object.  And it should be allowed to be a "codec stack".  So suppose
you start with a unicode object "obj".  Then

>>> bytes = bytes (obj, 'utf-8')    # implicit .encode()
>>> print bytes.codec
>>> wire = bytes.encode ('base64')  # with apologies to Greg E.
>>> print wire.codec
['base64', 'utf-8']
>>> obj2 = wire.decode ('gzip')
>>> obj2 = wire.decode (wire.codec)
>>> print obj == obj2
>>> print obj2.codec

or maybe None for the last.  I think this would be very nice as a
basis for improving the email module (for one), but I don't really
think it belongs in Python core.

    Ron> That may be why it wasn't done this way to start.  (?)

I suspect the real reason is that Marc-Andre had the generalized codec
in mind from Day 0, and your proposal only works with duck-typing if
codecs always have a well-defined signature with two different types
for the argument and return of the "constructor".

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From stephen at  Fri Feb 24 12:05:55 2006
From: stephen at (Stephen J. Turnbull)
Date: Fri, 24 Feb 2006 20:05:55 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Fri, 24 Feb 2006 13:25:50 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> Stephen J. Turnbull wrote:

    >> No, base64 isn't a wire protocol.  It's a family[...].

    Greg> Yes, and it's up to the programmer to choose those code
    Greg> units (i.e. pick an encoding for the characters) that will,
    Greg> in fact, pass through the channel he is using without
    Greg> corruption. I don't see how any of this is inconsistent with
    Greg> what I've said.

It's not.  It just shows that there are other "correct" ways to think
about the issue.

    >> Only if you do no transformations that will harm the
    >> base64-encoding.  ...  It doesn't allow any of the usual
    >> transformations on characters that might be applied globally to
    >> a mail composition buffer, for example.

    Greg> I don't understand that. Obviously if you rot13 your mail
    Greg> message or turn it into pig latin or something, it's going
    Greg> to mess up any base64 it might contain.  But that would be a
    Greg> silly thing to do to a message containing base64.

What "message containing base64"?  "Any base64 in there?"  "Nope,
nobody here but us Unicode characters!"  I certainly hope that in Py3k
bytes objects will have neither ROT13 nor case-changing methods, but
str objects certainly will.  Why give up the safety of that

    Greg> Given any piece of text, there are things it makes sense to
    Greg> do with it and things it doesn't, depending entirely on the
    Greg> use to which the text will eventually be put.  I don't see
    Greg> how base64 is any different in this regard.

If you're going to be binary about it, it's not different.  However
the kind of "text" for which Unicode was designed is normally produced
and consumed by people, who wll pt up w/ ll knds f nnsns.  Base64
decoders will not put up with the same kinds of nonsense that people

You're basically assuming that the person who implements the code that
processes a Unicode string is the same person who implemented the code
that converts a binary object into base64 and inserts it into a
string.  I think that's a dangerous (and certainly invalid) assumption.

I know I've lost time and data to applications that make assumptions
like that.  In fact, that's why "MULE" is a four-letter word in Emacs

    >> So then you bring it right back in with base64.  Now they need
    >> to know about bytes<->unicode codecs.

    Greg> No, they need to know about the characteristics of the
    Greg> channel over which they're sending the data.

I meant it in a trivial sense: "How do you use a bytes<->unicode codec
properly without knowing that it's a bytes<->unicode codec?"

In most environments, it should be possible to hide bytes<->unicode
codecs almost all the time, and I think that's a very good thing.  I
don't think it's a good idea to gratuitously introduce wire protocols
as unicode codecs, even if a class of bit patterns which represent the
integer 65 are denoted "A" in various sources.  Practicality beats
purity (especially when you're talking about the purity of a pregnant

    Greg> It might be appropriate to use base64 followed by some
    Greg> encoding, but the programmer needs to be aware of that and
    Greg> choose the encoding wisely. It's not possible to shield him
    Greg> from having to know about encodings in that situation, even
    Greg> if the encoding is just ascii.

What do you think the email module does?  Assuming conforming MIME
messages and receivers capable of handling UTF-8, the user of the
email module does not need to know anything about any encodings at
all.  With a little more smarts, the email module could even make a
good choice of output encoding based on the _language_ of the text,
removing the restriction to UTF-8 on the output side, too.  With the
aid of file(1), it can make excellent guesses about attachments.

Sure, the email module programmer needs to know, but the email module
programmer needs to know an awful lot about codecs anyway, since mail
at that level is a binary channel, while users will be throwing a
mixed bag of binary and textual objects at it.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From barry at  Fri Feb 24 15:22:19 2006
From: barry at (Barry Warsaw)
Date: Fri, 24 Feb 2006 09:22:19 -0500
Subject: [Python-Dev] getdefault(), the real replacement for setdefault()
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 23, 2006, at 4:41 PM, Thomas Wouters wrote:

> On Wed, Feb 22, 2006 at 10:29:08PM -0500, Barry Warsaw wrote:
>> d.getdefault('foo', list).append('bar')
>> Anyway, I don't think it's an either/or choice with Guido's subclass.
>> Instead I think they are different use cases.  I would add  
>> getdefault()
>> to the standard dict API, remove (eventually) setdefault(), and add
>> Guido's subclass in a separate module.  But I /wouldn't/ clutter the
>> built-in dict's API with on_missing().
> +1. This is a much closer match to my own use of setdefault than  
> Guido's
> dict subtype. I'm +0 on the subtype, but I prefer the call-time  
> decision on
> whether to fall back to a default or not.

Cool!  As your reward:

SF patch #1438113 


From foom at  Fri Feb 24 16:40:57 2006
From: foom at (James Y Knight)
Date: Fri, 24 Feb 2006 10:40:57 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
	<> <>
Message-ID: <>

On Feb 24, 2006, at 1:54 AM, Greg Ewing wrote:
> Thomas Wouters wrote:
>> On Thu, Feb 23, 2006 at 05:25:30PM +1300, Greg Ewing wrote:
>>> As an aside, is there any chance that this could be
>>> changed in 3.0? I.e. have the for-loop create a new
>>> binding for the loop variable on each iteration.
>> You can't do that without introducing a whole new scope
>> for the body of the 'for' loop,
> There's no need for that. The new scope need only
> include the loop variable -- everything else could
> still refer to the function's main scope.

No, that would be insane. You get the exact same problem, now even  
more confusing:

for x in range(10):
   y = x
   l.append(lambda: (x, y))

print l[0]()

With your suggestion, that would print (0, 9).

Unless python grows a distinction between creating a binding and  
assigning to one as most other languages have, this problem is here  
to stay.
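
The claimed result can be checked by emulating the suggestion in
present-day Python: give x a fresh binding per iteration by passing it to a
helper function, while y stays a single shared variable. The function names
are invented for this sketch, and the emulation is an assumption about how
the proposal would behave, not an implementation of it.

```python
def shared_bindings():
    """Current semantics: x and y are each one shared variable."""
    callbacks = []
    for x in range(10):
        y = x
        callbacks.append(lambda: (x, y))
    return callbacks[0]()


def fresh_x_binding_only():
    """Emulated proposal: new binding for x each time, y still shared."""
    callbacks = []
    y = None

    def body(x):
        # Each call creates a new x; y is rebound in the enclosing scope.
        nonlocal y
        y = x
        callbacks.append(lambda: (x, y))

    for x in range(10):
        body(x)
    return callbacks[0]()


print(shared_bindings())
print(fresh_x_binding_only())
```

The first closure remembers its own x but sees the final y, which is the
mixed result described above.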


From python at  Fri Feb 24 16:59:27 2006
From: python at (Raymond Hettinger)
Date: Fri, 24 Feb 2006 10:59:27 -0500
Subject: [Python-Dev] defaultdict and on_missing()
References: <><><><>
Message-ID: <015901c6395d$3b9fd4b0$2e2a960a@RaymondLaptop1>

> Michael Chermside wrote:
>> The next() method of iterators was an interesting
>> object lesson. ... Since it was sometimes invoked by name
>> and sometimes by special mechanism, the choice was to use the
>> unadorned name, but later experience showed that it would have been
>> better the other way.

> Any thoughts about fixing this in 3.0?

IMO, it isn't broken. It was an intentional divergence from naming conventions. 
The reasons for the divergence haven't changed.  Code that uses next() is more 
understandable, friendly, and readable without the walls of underscores.


From aleaxit at  Fri Feb 24 17:47:45 2006
From: aleaxit at (Alex Martelli)
Date: Fri, 24 Feb 2006 08:47:45 -0800
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <015901c6395d$3b9fd4b0$2e2a960a@RaymondLaptop1>
References: <>
Message-ID: <>

On 2/24/06, Raymond Hettinger <python at> wrote:
> > Michael Chermside wrote:
> >> The next() method of iterators was an interesting
> >> object lesson. ... Since it was sometimes invoked by name
> >> and sometimes by special mechanism, the choice was to use the
> >> unadorned name, but later experience showed that it would have been
> >> better the other way.
> [Greg]
> > Any thoughts about fixing this in 3.0?
> IMO, it isn't broken. It was an intentional divergence from naming conventions.
> The reasons for the divergence haven't changed.  Code that uses next() is more
> understandable, friendly, and readable without the walls of underscores.

Wouldn't, say, next(foo) [[with a hypothetical builtin 'next'
internally calling foo.__next__(), just like builtin 'len' internally
calls foo.__len__()]] be just as friendly etc? No biggie either way,
but that would seem to be more aligned with Python's usual approach.
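
A minimal sketch of that idea, assuming Python 3 syntax: my_next is a
hypothetical stand-in for the proposed builtin and Countdown is an
invented example class.  It looks the method up on the type, which is
roughly how len() finds __len__.

```python
def my_next(iterator):
    # Dispatch through the type, the way len() reaches __len__.
    return type(iterator).__next__(iterator)


class Countdown:
    """Invented iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


c = Countdown(2)
print(my_next(c), my_next(c))  # 2 1
```

Only the class author writes the underscores; callers use the plain
function form.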


From nnorwitz at  Fri Feb 24 18:44:39 2006
From: nnorwitz at (Neal Norwitz)
Date: Fri, 24 Feb 2006 11:44:39 -0600
Subject: [Python-Dev] Dropping support for Win9x in 2.6
Message-ID: <>

Martin and I were talking about dropping support for older versions of
Windows (of the non-NT flavor).  We both thought that it was
reasonable to stop supporting Win9x (including WinME) in Python 2.6. 
I updated PEP 11 to reflect this.

The Python 2.5 installer will present a warning message on the systems
which will not be supported in Python 2.6.


From g.brandl at  Fri Feb 24 19:00:06 2006
From: g.brandl at (Georg Brandl)
Date: Fri, 24 Feb 2006 19:00:06 +0100
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
Message-ID: <dtnhj6$lke$>

Neal Norwitz wrote:
> Martin and I were talking about dropping support for older versions of
> Windows (of the non-NT flavor).  We both thought that it was
> reasonable to stop supporting Win9x (including WinME) in Python 2.6. 
> I updated PEP 11 to reflect this.
> The Python 2.5 installer will present a warning message on the systems
> which will not be supported in Python 2.6.

Hey, someone even wanted to continue supporting DOS...


From fuzzyman at  Fri Feb 24 19:06:59 2006
From: fuzzyman at (Michael Foord)
Date: Fri, 24 Feb 2006 18:06:59 +0000
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <dtnhj6$lke$>
References: <>
Message-ID: <>

Georg Brandl wrote:
> Neal Norwitz wrote:
>> Martin and I were talking about dropping support for older versions of
>> Windows (of the non-NT flavor).  We both thought that it was
>> reasonable to stop supporting Win9x (including WinME) in Python 2.6. 
>> I updated PEP 11 to reflect this.
>> The Python 2.5 installer will present a warning message on the systems
>> which will not be supported in Python 2.6.
> Hey, someone even wanted to continue supporting DOS...
A lot of people are still using Windows 98.  But I guess if no one is 
volunteering to maintain the code...

Michael Foord

> Georg

From aahz at  Fri Feb 24 19:29:27 2006
From: aahz at (Aahz)
Date: Fri, 24 Feb 2006 10:29:27 -0800
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
	<dtnhj6$lke$> <>
Message-ID: <>

On Fri, Feb 24, 2006, Michael Foord wrote:
> Georg Brandl wrote:
>> Neal Norwitz wrote:
>>> Martin and I were talking about dropping support for older versions of
>>> Windows (of the non-NT flavor).  We both thought that it was
>>> reasonable to stop supporting Win9x (including WinME) in Python 2.6. 
>>> I updated PEP 11 to reflect this.
>>> The Python 2.5 installer will present a warning message on the systems
>>> which will not be supported in Python 2.6.
>> Hey, someone even wanted to continue supporting DOS...
> A lot of people are still using Windows 98.  But I guess if no one is 
> volunteering to maintain the code...

DOS has some actual utility for low-grade devices and is overall a
simpler platform to deliver code for.  At the standard 18-month release
cycle, it will be beginning of 2008 for the release of 2.6, which is ten
years after Win98.
Aahz (aahz at           <*>

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis

From 2005a at  Fri Feb 24 20:02:21 2006
From: 2005a at (Alexander Schremmer)
Date: Fri, 24 Feb 2006 20:02:21 +0100
Subject: [Python-Dev] Dropping support for Win9x in 2.6
References: <>
	<dtnhj6$lke$> <>
Message-ID: <>

On Fri, 24 Feb 2006 10:29:27 -0800, Aahz wrote:

> DOS has some actual utility for low-grade devices and is overall a
> simpler platform to deliver code for.  At the standard 18-month release
> cycle, it will be beginning of 2008 for the release of 2.6, which is ten
> years after Win98.

The last Windows release of that branch was Windows ME, in September 2000,
i.e. you have to wait till 2010 in order to be ten years after the last
legacy OS release.

Kind regards,

From guido at  Fri Feb 24 21:04:06 2006
From: guido at (Guido van Rossum)
Date: Fri, 24 Feb 2006 14:04:06 -0600
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
	<dtnhj6$lke$> <>
Message-ID: <>

On 2/24/06, Michael Foord <fuzzyman at> wrote:
> A lot of people are still using Windows 98.  But I guess if no one is
> volunteering to maintain the code...

Agreed. If they're so keen on using an antiquated OS, perhaps they
would be perfectly happy using a matching Python version... Somehow I
doubt this is going to be a big deal for anyone affected.

--Guido van Rossum (home page:

From trentm at  Fri Feb 24 21:09:14 2006
From: trentm at (Trent Mick)
Date: Fri, 24 Feb 2006 12:09:14 -0800
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
Message-ID: <>

[Neal Norwitz wrote]
> Martin and I were talking about dropping support for older versions of
> Windows (of the non-NT flavor).  We both thought that it was
> reasonable to stop supporting Win9x (including WinME) in Python 2.6. 
> I updated PEP 11 to reflect this.

Are there specific code areas in mind that would be ripped out for this
or is this mainly to avoid having to test on and ensure new code is
compatible with?


Trent Mick
TrentM at

From facundobatista at  Fri Feb 24 21:12:48 2006
From: facundobatista at (Facundo Batista)
Date: Fri, 24 Feb 2006 17:12:48 -0300
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
Message-ID: <>

2006/2/24, Neal Norwitz <nnorwitz at>:

> Martin and I were talking about dropping support for older versions of
> Windows (of the non-NT flavor).  We both thought that it was
> reasonable to stop supporting Win9x (including WinME) in Python 2.6.


.    Facundo


From jeremy at  Fri Feb 24 23:38:26 2006
From: jeremy at (Jeremy Hylton)
Date: Fri, 24 Feb 2006 17:38:26 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
	<> <>
	<> <>
Message-ID: <>

On 2/24/06, James Y Knight <foom at> wrote:
> On Feb 24, 2006, at 1:54 AM, Greg Ewing wrote:
> > Thomas Wouters wrote:
> >> On Thu, Feb 23, 2006 at 05:25:30PM +1300, Greg Ewing wrote:
> >>
> >>> As an aside, is there any chance that this could be
> >>> changed in 3.0? I.e. have the for-loop create a new
> >>> binding for the loop variable on each iteration.
> >>
> >> You can't do that without introducing a whole new scope
> >> for the body of the 'for' loop,
> >
> > There's no need for that. The new scope need only
> > include the loop variable -- everything else could
> > still refer to the function's main scope.
> No, that would be insane. You get the exact same problem, now even
> more confusing:
> l=[]
> for x in range(10):
>    y = x
>    l.append(lambda: (x, y))
> print l[0]()
> With your suggestion, that would print (0, 9).
> Unless python grows a distinction between creating a binding and
> assigning to one as most other languages have, this problem is here
> to stay.

The more practical complaint is that list comprehensions use the same
namespace as the block that contains them.  It's much easier to miss
an assignment to, say, i in a list comprehension than it is in a
separate statement in the body of a for loop.  Since list comps are
expressions, the only variable at issue is the index variable.  It
would be simple to fix by renaming, but I suspect we're stuck with the
current behavior for backwards compatibility reasons.


From rrr at  Fri Feb 24 23:46:00 2006
From: rrr at (Ron Adam)
Date: Fri, 24 Feb 2006 16:46:00 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<dtiemd$a3k$>	<>	<>	<>
Message-ID: <>

* The following reply is a longer explanation than I intended of why 
codings like 'rot' (and how they differ) aren't the same thing as pure 
unicode codecs and probably should be treated differently.
If you already understand that, then I suggest skipping this.  But if 
you like detailed logical analysis, it might be of some interest even if 
it's reviewing the obvious to those who already know.

(And hopefully I didn't make any really obvious errors myself.)

Stephen J. Turnbull wrote:
>>>>>> "Ron" == Ron Adam <rrr at> writes:
>     Ron> We could call it transform or translate if needed.
> You're still losing the directionality, which is my primary objection
> to "recode".  The absence of directionality is precisely why "recode"
> is used in that sense for i18n work.

I think you're not understanding what I suggested.  It might help if we 
could agree on some points and then go from there.

So, let's consider a "codec" and a "coding" as being two different things 
where a codec is a character subset of unicode characters expressed in 
a native format.  And a coding is *not* a subset of the unicode 
character set, but an _operation_ performed on text.  So you would have 
the following properties.

    codec ->  text is always in *one_codec* at any time.

    coding ->  operation performed on text.

Let's add a special default coding called 'none' to represent a do 
nothing coding. (figuratively for explanation purposes)

    'none' -> return the input as is, or the uncoded text

Given the above relationships we have the following possible 

   1. codec to like codec:   'ascii' to 'ascii'
   2. codec to unlike codec:   'ascii' to 'latin1'

And we have coding relationships of:

   a. coding to like coding      # Unchanged, do nothing
   b. coding to unlike coding

Then we can express all the possible combinations as...

    [1.a, 1.b, 2.a, 2.b]

    1.a -> coding in codec to like coding in like codec:

        'none' in 'ascii' to 'none' in 'ascii'

    1.b -> coding in codec to diff coding in like codec:

        'none' in 'ascii' to 'base64' in 'ascii'

    2.a -> coding in codec to same coding in diff codec:

        'none' in 'ascii' to 'none' in 'latin1'

    2.b -> coding in codec to diff coding in diff codec:

        'none' in 'latin1' to 'base64' in 'ascii'

This last one is a problem as some codecs combine coding with character 
set encoding and return text in a different encoding than they received. 
The line is also blurred between types and encodings.  Is unicode an 
encoding?  Will bytes also be an encoding?

Using the above combinations:

(1.a) is just creating a new copy of an object.

    s = str(s)

(1.b) is recoding an object, it returns a copy of the object in the same 

    s = s.encode('hex-codec')  # ascii str -> ascii str coded in hex
    s = s.decode('hex-codec')  # ascii str coded in hex -> ascii str

* these are really two different operations. And encoding repeatedly 
results in nested codings.  Codecs (as a pure subset of unicode) don't 
have that property.
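
A short sketch of the nesting property, assuming Python 3, where the 
str.encode('hex-codec') spelling used above is not available on text; 
the base64 and binascii modules stand in for the base64 and hex codecs.

```python
import base64
import binascii

text = "A string."

# Each pass of a coding wraps the previous result in another layer.
once = base64.b64encode(text.encode("ascii"))
twice = base64.b64encode(once)
print(once.decode("ascii"))   # QSBzdHJpbmcu
print(twice.decode("ascii"))  # UVNCemRISnBibWN1

hex_once = binascii.hexlify(b"abc")
hex_twice = binascii.hexlify(hex_once)
print(hex_once, hex_twice)    # b'616263' b'363136323633'

# Undoing a nested coding takes one decode per layer.
restored = base64.b64decode(base64.b64decode(twice)).decode("ascii")
print(restored == text)       # True

# A character encoding has no layers: one decode undoes one encode.
print(text.encode("latin-1").decode("latin-1") == text)  # True
```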

* the hex-codec also fits the 2.b pattern below if the source string is 
of a different type than ascii. (or the default string?)

(2.a) creates a copy encoded in a new codec.

    s = s.encode('latin1')

* I believe string constructors should have an encoding argument for use 
with unicode strings.

    s = str(u, 'latin1')   # This would match the bytes constructor.

(2.b) are combinations of the above.

   s = u.encode('base64')
      # unicode to ascii string as base64 coded characters

   u = unicode(s.decode('base64'))
      # ascii string coded in base64 to unicode characters


>>> u = unicode(s, 'base64')
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  TypeError: decoder did not return an unicode object (type=str)

Ooops...  ;)

So is coding the same as a codec?  I think they have different 
properties and should be treated differently except when the 
practicality over purity rule is needed.  And in those cases maybe the 
names could clearly state the result.

    u.decode('base64ascii')  # name indicates coding to codec

> A string. -> QSBzdHJpbmcu -> UVNCemRISnBibWN1

Looks like the underlying sequence is:

      native string -> unicode -> unicode coded base64 -> coded ascii str

And decode operation would be...

      coded ascii str -> unicode coded base64 -> unicode -> ascii str

Except it may combine some of these steps to speed it up.

Since it's a hybrid codec including a coding operation, we have to treat 
it as a codec.

>     Ron>     * Given that the string type gains a __codec__ attribute
>     Ron> to handle automatic decoding when needed.  (is there a reason
>     Ron> not to?)
>     Ron>        str(object[,codec][,error]) -> string coded with codec
>     Ron>        unicode(object[,error]) -> unicode
>     Ron>        bytes(object) -> bytes
> str == unicode in Py3k, so this is a non-starter.  What do you want to
> say?
>     Ron>      * a recode() method is used for transformations that
>     Ron> *do_not* change the current codec.
> I'm not sure what you mean by the "current codec".  If it's attached
> to an "encoded object", it should be the codec needed to decode the
> object.  And it should be allowed to be a "codec stack".  

I wasn't thinking in terms of stacks, but in that case the current codec 
would be the top of the stack.  I think stackable codecs is a very bad 
idea for the record.

Back to recode vs encode/decode, the example used above might be useful.

    s = s.encode('hex-codec')  # ascii str -> ascii str coded in hex
    s = s.decode('hex-codec')  # ascii str coded in hex -> ascii str

In my opinion these are actually two very different (although related) 
operations that would be better expressed with different names.

Currently it's a hybrid codec that converts its input to an ascii string 
(or default encoding?),  but when decoding you end up with an ascii 
encoding even if you started with something else.  So the decode isn't a 
true inverse to encode in some cases.

As a coding operation it would be.

    u = u.recode('to_hex')
    u = u.recode('from_hex')

Where this would work with both unicode and strings without changing the 

It also keeps the 'if I do it again, it will *recode* the coded text' 
relationship.  So I think the name is appropriate, IMHO.

Pure codecs such as latin-1 can be invoked over and over and you can 
always get back what you put in in a single step.

 >>> s = 'abc'
 >>> for n in range(100):
...   s = s.encode('latin-1')
 >>> print s, type(s)
abc <type 'str'>

Supposedly a lot of these issues will go away in Python 3000. And we can 
probably live with the current state of things.  But even after Python 
3000 it seems to me we will still need access to codecs as we may run 
across encoded text input from various sources.


From nnorwitz at  Sat Feb 25 00:11:26 2006
From: nnorwitz at (Neal Norwitz)
Date: Fri, 24 Feb 2006 17:11:26 -0600
Subject: [Python-Dev] problem with genexp
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/20/06, Jiwon Seo <seojiwon at> wrote:
> Regarding this Grammar change;  (last October)
>      from   argument: [test '=' ] test [gen_for]
>      to      argument: test [gen_for] | test '=' test ['(' gen_for ')']
> - to raise error for "bar(a = i for i in range(10)) )"
> I think we should change it to
>      argument: test [gen_for] | test '=' test
> instead of
>      argument: test [gen_for] | test '=' test ['(' gen_for ')']
> that is, without ['(' gen_for ')'] . We don't need that extra term,
> because "test" itself includes generator expressions - with all those
> parentheses.

Works for me, committed.
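
For illustration, the effect of the simplified rule can be checked with
the built-in compile().  This is a sketch assuming a Python 3
interpreter; compiles() is a helper invented here and bar is never
actually called.

```python
def compiles(source):
    """Return True if source parses as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A bare generator expression after '=' is rejected ...
print(compiles("bar(a = i for i in range(10))"))    # False
# ... while the parenthesised form is an ordinary "test".
print(compiles("bar(a = (i for i in range(10)))"))  # True
# A sole positional generator argument needs no extra parentheses.
print(compiles("bar(i for i in range(10))"))        # True
```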


From nas at  Sat Feb 25 00:52:48 2006
From: nas at (Neil Schemenauer)
Date: Fri, 24 Feb 2006 23:52:48 +0000 (UTC)
Subject: [Python-Dev] Pre-PEP: The "bytes" object
References: <>
Message-ID: <dto68f$3ca$>

Michael Hoffman <hoffman at> wrote:
> Am I the only one who finds the use of "self" on a classmethod to be
> incredibly confusing? Can we please follow PEP 8 and use "cls"
> instead?

Sorry, using "self" was an oversight.  It should be "cls", IMO.


From greg.ewing at  Sat Feb 25 01:20:25 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 25 Feb 2006 13:20:25 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> the kind of "text" for which Unicode was designed is normally produced
> and consumed by people, who wll pt up w/ ll knds f nnsns.  Base64
> decoders will not put up with the same kinds of nonsense that people
> will.

The Python compiler won't put up with that sort of
nonsense either. Would you consider that makes Python
source code binary data rather than text, and that
it's inappropriate to represent it using a unicode
string?

> You're basically assuming that the person who implements the code that
> processes a Unicode string is the same person who implemented the code
> that converts a binary object into base64 and inserts it into a
> string.

No, I'm assuming the user of base64 knows the
characteristics of the channel he's using. You
can only use base64 if you know the channel
promises not to munge the particular characters
that base64 uses. If you don't know that, you
shouldn't be trying to send base64 through that
channel.

> In most environments, it should be possible to hide bytes<->unicode
> codecs almost all the time,

But it *is* hidden in the situation I'm talking
about, because all the Unicode encoding/decoding
takes place inside the implementation of the
text channel, which I'm taking as a given.

> I don't think it's a good idea to gratuitously introduce
 > wire protocols as unicode codecs,

I am *not* saying that base64 is a unicode codec!
If that's what you thought I was saying, it's no
wonder we're confusing each other.

It's just a transformation from bytes to
text. I'm only calling it unicode because all
text will be unicode in Py3k. In py2.x it could
just as well be a str -- but a str interpreted
as text, not binary.

> What do you think the email module does?
> Assuming conforming MIME messages

But I'm not assuming mime in the first place. If I
have a mail interface that will accept chunks of
binary data and encode them as a mime message for
me, then I don't need to use base64 in the first
place.

The only time I need to use something like base64
is when I have something that will only accept
text. In Py3k, "accepts text" is going to mean
"takes a character string as input", where
"character string" is a distinct type from
"binary data". So having base64 produce anything
other than a character string would be awkward
and inconvenient.
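
A rough sketch of that shape, assuming Python 3's split between
bytes and str; the two function names are invented for illustration
and are not an API proposed in this thread.

```python
import base64


def to_base64_text(data: bytes) -> str:
    # Binary data in, character string out.
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text: str) -> bytes:
    # Character string in, binary data out.
    return base64.b64decode(text.encode("ascii"))


payload = bytes([0, 255, 16, 32])
armoured = to_base64_text(payload)
print(armoured)                               # AP8QIA==
print(from_base64_text(armoured) == payload)  # True
```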

I phrased that paragraph carefully to avoid using
the word "unicode" anywhere. Does that make it
clearer what I'm getting at?


From greg.ewing at  Sat Feb 25 01:28:34 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 25 Feb 2006 13:28:34 +1300
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <015901c6395d$3b9fd4b0$2e2a960a@RaymondLaptop1>
References: <>
Message-ID: <>

Raymond Hettinger wrote:
> Code that 
> uses next() is more understandable, friendly, and readable without the 
> walls of underscores.

There wouldn't be any walls of underscores, because

   y = x.next()

would become

   y = next(x)

The only time you would need to write underscores is
when defining a __next__ method. That would be no worse
than defining an __init__ or any other special method,
and has the advantage that it clearly marks the method
as being special.


From greg.ewing at  Sat Feb 25 01:32:23 2006
From: greg.ewing at (Greg Ewing)
Date: Sat, 25 Feb 2006 13:32:23 +1300
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<dtfou0$81t$> <>
	<> <>
	<> <>
Message-ID: <>

Jeremy Hylton wrote:

> The more practical complaint is that list comprehensions use the same
> namespace as the block that contains them.  
 > ... but I suspect we're stuck with the
> current behavior for backwards compatibility reasons.

There will be no backwards compatibility in 3.0,
so perhaps this could be fixed then?


From guido at  Sat Feb 25 01:48:23 2006
From: guido at (Guido van Rossum)
Date: Fri, 24 Feb 2006 18:48:23 -0600
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/24/06, Greg Ewing <greg.ewing at> wrote:
> Jeremy Hylton wrote:
> > The more practical complaint is that list comprehensions use the same
> > namespace as the block that contains them.
>  > ... but I suspect we're stuck with the
> > current behavior for backwards compatibility reasons.
> There will be no backwards compatibility in 3.0,
> so perhaps this could be fixed then?

Yes that's the plan. [f(x) for x in S] will be syntactic sugar for
list(f(x) for x in S) which already avoids the scope problem.
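
A small sketch of the planned behaviour, assuming a Python 3
interpreter where it is already in effect; f, S and demo are
placeholder names.

```python
def f(x):
    return x * x


S = [1, 2, 3]


def demo():
    x = "outer"
    from_comprehension = [f(x) for x in S]
    from_genexp = list(f(x) for x in S)
    # Neither form rebinds the enclosing x.
    return from_comprehension, from_genexp, x


print(demo())  # ([1, 4, 9], [1, 4, 9], 'outer')
```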

--Guido van Rossum (home page:

From rrr at  Sat Feb 25 01:47:04 2006
From: rrr at (Ron Adam)
Date: Fri, 24 Feb 2006 18:47:04 -0600
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <dto68f$3ca$>
References: <>	<>	<>	<>	<>
Message-ID: <>

Neil Schemenauer wrote:
> Michael Hoffman <hoffman at> wrote:
>> Am I the only one who finds the use of "self" on a classmethod to be
>> incredibly confusing? Can we please follow PEP 8 and use "cls"
>> instead?
> Sorry, using "self" was an oversight.  It should be "cls", IMO.
>   Neil


Why was it decided that the unicode encoding argument should be ignored 
if the first argument is a string?  Wouldn't an exception be better 
rather than give the impression it does something when it doesn't?


From nas at  Sat Feb 25 02:14:37 2006
From: nas at (Neil Schemenauer)
Date: Sat, 25 Feb 2006 01:14:37 +0000 (UTC)
Subject: [Python-Dev] Pre-PEP: The "bytes" object
References: <>
	<dto68f$3ca$> <>
Message-ID: <dtob1t$f52$>

Ron Adam <rrr at> wrote:
> Why was it decided that the unicode encoding argument should be ignored 
> if the first argument is a string?  Wouldn't an exception be better 
> rather than give the impression it does something when it doesn't?

>From the PEP:

    There is no sane meaning that the encoding can have in that
    case.  str objects *are* byte arrays and they know nothing about
    the encoding of character data they contain.  We need to assume
    that the programmer has provided str object that already uses
    the desired encoding.

Raising an exception would be a valid option.  However, passing the
string through unchanged makes the transition from str to bytes
easier.


From tim.peters at  Sat Feb 25 06:12:19 2006
From: tim.peters at (Tim Peters)
Date: Fri, 24 Feb 2006 23:12:19 -0600
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
Message-ID: <>

[Neal Norwitz]
>> Martin and I were talking about dropping support for older versions of
>> Windows (of the non-NT flavor).  We both thought that it was
>> reasonable to stop supporting Win9x (including WinME) in Python 2.6.
>> I updated PEP 11 to reflect this.

It's OK by me, but I have the same question as Trent:

[Trent Mick]
> Are there specific code areas in mind that would be ripped out for this
> or is this mainly to avoid having to test on and ensure new code is
> compatible with?

Seems unlikely it's the latter, since I'm not sure any Python developer
tests on a pre-NT Windows anymore anyway.  Maybe Raymond is still
running WinME?

About the former, I don't see much potential.  The ugliest 9x-ism is
w9xpopen.exe, but comments in the places it's used say it's needed on
NT too if the user is running  If so, it stays.

There's a bit of excruciating Win9x-specific code in winsound.c that
could go away, and I suppose we could assume that Unicode filenames
are always supported on Windows.

Maybe best is that if someone reports a Win9x-specific bug against
2.6+, we could close it as Won't-Fix at once instead of letting it sit
around ignored for years :-)

From rrr at  Sat Feb 25 07:23:59 2006
From: rrr at (Ron Adam)
Date: Sat, 25 Feb 2006 00:23:59 -0600
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <dtob1t$f52$>
References: <>	<>	<>	<>	<>	<dto68f$3ca$>
	<> <dtob1t$f52$>
Message-ID: <>

Neil Schemenauer wrote:
> Ron Adam <rrr at> wrote:
>> Why was it decided that the unicode encoding argument should be ignored 
>> if the first argument is a string?  Wouldn't an exception be better 
>> rather than give the impression it does something when it doesn't?
>>From the PEP:
>     There is no sane meaning that the encoding can have in that
>     case.  str objects *are* byte arrays and they know nothing about
>     the encoding of character data they contain.  We need to assume
>     that the programmer has provided str object that already uses
>     the desired encoding.
> Raising an exception would be a valid option.  However, passing the
> string through unchanged makes the transition from str to bytes
> easier.
>   Neil

I guess I'm concerned that if the string isn't already in the specified 
encoding it could pass through without complaining and not be encoded as 

 >>> b.bytes(u'abc', 'hex-codec')
bytes([54, 49, 54, 50, 54, 51])

 >>> b.bytes('abc', 'hex-codec')
bytes([97, 98, 99])                # not hex

If this was in a function I would need to do a check of some sort 
anyways or cast to unicode beforehand, or encode beforehand.  Which 
negates the advantage of having the codec argument in bytes unfortunately.

    def hexabyte(s):
        s = unicode(s)
        return bytes(s, 'hex-codec')

    def hexabyte(s):
        s = s.encode('hex-codec')
        return bytes(s)

It seems to me if you are specifying a codec for bytes, then you will 
not be expecting to get an already encoded string, and if you do, it may 
not be in the codec you want since you are probably not specifying the 
default codec.
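
One way the stricter alternative could look, as a sketch only.  It is 
written for Python 3, where str plays the role of unicode and bytes the 
role of an already encoded string; strict_bytes is a hypothetical 
helper, not the constructor from the PEP.

```python
def strict_bytes(obj, encoding=None):
    """Hypothetical constructor that refuses an encoding it cannot apply."""
    if isinstance(obj, str):
        if encoding is None:
            raise TypeError("text needs an explicit encoding")
        return obj.encode(encoding)
    if encoding is not None:
        # Already encoded data: complain instead of ignoring the argument.
        raise TypeError("encoding given, but the input is already bytes")
    return bytes(obj)


print(strict_bytes("abc", "ascii"))  # b'abc'
print(strict_bytes(b"abc"))          # b'abc'
try:
    strict_bytes(b"abc", "ascii")
except TypeError as exc:
    print("rejected:", exc)
```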


From martin at  Sat Feb 25 14:22:35 2006
From: martin at (martin at
Date: Sat, 25 Feb 2006 14:22:35 +0100
Subject: [Python-Dev] Dropping support for Win9x in 2.6
In-Reply-To: <>
References: <>
Message-ID: <>

Zitat von Trent Mick <trentm at>:

> Are there specific code areas in mind that would be ripped out for this
> or is this mainly to avoid having to test on and ensure new code is
> compatible with?

Primarily the non-W versions of the file system API. I think the
W9x popen support could also go away.

I don't think any testing happens for W9x; I (at least) can't
test it myself (I installed a Windows 95 system to test the 2.4
installer, but had to give up the machine shortly after that).


From stephen at  Sat Feb 25 19:05:38 2006
From: stephen at (Stephen J. Turnbull)
Date: Sun, 26 Feb 2006 03:05:38 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Sat, 25 Feb 2006 13:20:25 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> Stephen J. Turnbull wrote:

    >> the kind of "text" for which Unicode was designed is normally
    >> produced and consumed by people, who wll pt up w/ ll knds f
    >> nnsns.  Base64 decoders will not put up with the same kinds of
    >> nonsense that people will.

    Greg> The Python compiler won't put up with that sort of nonsense
    Greg> either. Would you consider that makes Python source code
    Greg> binary data rather than text, and that it's inappropriate to
    Greg> represent it using a unicode string?

The reason that Python source code is text is that the primary
producers/consumers of Python source code are human beings, not

There are no such human producers/consumers of base64.  Unless you
prefer that I expressed that last sentence as "VGhlIHJlYXNvbiB0aG

    >> You're basically assuming that the person who implements the
    >> code that processes a Unicode string is the same person who
    >> implemented the code that converts a binary object into base64
    >> and inserts it into a string.

    Greg> No, I'm assuming the user of base64 knows the
    Greg> characteristics of the channel he's using.

Yes, which implies that you assume he has control of the data all the
way to the channel that actually requires base64.

Use case: the Gnus MUA supports the RFC that allows non-ASCII names in
MIME headers that take file names.  The interface was written for
message-at-a-time use, which makes sense for composition.  Somebody
else added "save and strip part" editing capability, but this only
works one MIME part at a time.  So if you have a message with four
MIME parts and you save and strip all of them, the first one gets
encoded four times.

The reason for *this* bug, and scores like it over the years, is that
somebody made it convenient to put wire protocols into a text
document.  Shouldn't Python do better than that?  Shouldn't Python
text be for humans, rather than be whatever had the tag "character"
attached to it for convenience of definition of a protocol for
communication of data humans can't process without mechanical

    >> I don't think it's a good idea to gratuitously introduce wire
    >> protocols as unicode codecs,

    Greg> I am *not* saying that base64 is a unicode codec!  If that's
    Greg> what you thought I was saying, it's no wonder we're
    Greg> confusing each other.

I know you don't think that it's a duck, but it waddles and quacks.
Ie, the question is not what I think you're saying.  It's "what is the
Python compiler/interpreter going to think?"  AFAICS, it's going to
think that base64 is a unicode codec.

    Greg> The only time I need to use something like base64 is when I
    Greg> have something that will only accept text. In Py3k, "accepts
    Greg> text" is going to mean "takes a character string as input",

Characters are inherently abstract, as a class they can't be
instantiated as input or output---only derived (ie, encoded)
characters can.  I don't believe that "takes a character string as
input" has any intrinsic meaning.

    Greg> Does that make it clearer what I'm getting at?

No.<wink>  I already understood what you're getting at.  As I said, I'm
sympathetic in principle.  In practice, I think it's a loaded gun
aimed at my foot.  And yours.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From thomas at  Sat Feb 25 19:26:01 2006
From: thomas at (Thomas Wouters)
Date: Sat, 25 Feb 2006 19:26:01 +0100
Subject: [Python-Dev] PEP 328
Message-ID: <>

Since I implemented[*] PEP 328, Aahz suggested I take over editing the PEP,
too, as there were some minor discussion points to add still. I haven't been
around for the discussions, though, and it's been a while for everyone else,
I think, so I'd like to rehash and ask for any other open points.

The one open point that Aahz forwarded me, and is expressed somewhat in , is
the case where you have a package that you want to transparently supply a
particular version of a module for forward/backward compatibility, replacing
a version elsewhere on sys.path (if any.) I see four distinct situations for

 1) Replacing a stdlib module (or a set of them) with a newer version, if the
    stdlib module is too old, where you want the whole stdlib to use the
    newer version.

 2) Same as 1), but private to your package; modules not in your package
    should get the stdlib version when they import the 'replaced' module.

 3) Providing a module (or a set of them) that the stdlib might be missing
    (but which will be a new enough version if it's there)

1) and 3) are easy to solve: put the module in a separate directory, insert
that into sys.path; at the front for 1), at the end for 3). Mailman, IIRC,
does this, and I think it works fine.
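
As a sketch of the sys.path approach for 1) and 3): add_module_dir and
the directory names are invented for illustration, and this is not
taken from Mailman's actual code.

```python
import sys


def add_module_dir(directory, override):
    """Put directory on sys.path, ahead of or behind everything else."""
    if directory in sys.path:
        sys.path.remove(directory)
    if override:
        # Case 1): searched first, so it shadows the stdlib copy.
        sys.path.insert(0, directory)
    else:
        # Case 3): searched last, so it is only a fallback.
        sys.path.append(directory)


add_module_dir("/opt/myapp/override", override=True)
add_module_dir("/opt/myapp/fallback", override=False)
print(sys.path[0], sys.path[-1])
```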

2) is easy if it's a single module; include it in your package and import it
relatively. If it's a package itself, it's again pretty easy; include the
package and include it relatively. The package itself is hopefully already
using relative imports to get sibling packages. If the package is using
absolute imports to get sibling packages, well, crap. I don't think we can
solve that issue whatever we do: that already breaks.

The real problem with 2) is when you have tightly coupled modules that are
not together in a package and not using relative imports, or perhaps when
you want to *partially* override a package. I would argue that tightly
coupled modules should always use relative imports, whether they are
together in a package or not (even though they should probably be in a
package anyway.) I'd also argue that having different modules import
different versions of existing modules is a bad idea. It's workable if the
modules are only used internally, but exposing anything is troublesome. For
instance, an instance of a class defined in foo (1.0) imported by bar will
not be an instance of the same class defined in foo (1.1) imported by
feeble.

Am I missing anything?
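
The class-identity problem can be reproduced without installing two real
versions of anything. The sketch below loads the same source twice into
separate module objects; the module and class names are made up for
illustration.

```python
import types

# Stand-in source for a module "foo"; both copies are textually identical.
SOURCE = "class Widget(object):\n    pass\n"

def load_copy(name):
    """Execute SOURCE in a fresh module object and return that module."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module

foo_seen_by_bar = load_copy("foo")      # plays the role of foo 1.0
foo_seen_by_feeble = load_copy("foo")   # plays the role of foo 1.1

widget = foo_seen_by_bar.Widget()

# The two Widget classes are distinct objects, so isinstance() fails
# across the two copies even though the source is the same.
crosses_copies = isinstance(widget, foo_seen_by_feeble.Widget)
within_copy = isinstance(widget, foo_seen_by_bar.Widget)
```

Here crosses_copies is False and within_copy is True, which is why exposing
objects from a privately replaced module is troublesome.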

([*] incorrectly, to be sure, but I have a 'correct' version ready that I'll
upload in a second; I was trying to confuse Guido into accepting my version,
instead.)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From stephen at  Sat Feb 25 20:44:14 2006
From: stephen at (Stephen J. Turnbull)
Date: Sun, 26 Feb 2006 04:44:14 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Ron Adam's message of "Fri, 24
	Feb 2006 16:46:00 -0600")
References: <>
	<> <dtiemd$a3k$>
Message-ID: <>

>>>>> "Ron" == Ron Adam <rrr at> writes:

    Ron> So, let's consider a "codec" and a "coding" as being two
    Ron> different things where a codec is a character subset of
    Ron> unicode characters expressed in a native format.  And a
    Ron> coding is *not* a subset of the unicode character set, but an
    Ron> _operation_ performed on text.

    Ron>     codec ->  text is always in *one_codec* at any time.

No, a codec is an operation, not a state.

And text qua text has no need of state; the whole point of defining
text (as in the unicode type) is to abstract from such
representational issues.

    Ron> Pure codecs such as latin-1 can be invoked over and over and
    Ron> you can always get back what you put in in a single step.

Maybe you'd like to define them that way, but it doesn't work in
general.  Given that str and unicode currently don't carry state with
them, it's not possible for "to ASCII" and "to EBCDIC" to be
idempotent at the same time.  And for the languages spoken by 75% of
the world's population, "to latin-1" cannot be successfully invoked
even once, let alone be idempotent.  You really need to think about
how your examples apply to codecs like KOI8-R for Russian and Shift
JIS for Japanese.
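
A small sketch of that point, using only the standard codecs. The helper
name and the sample strings are illustrative.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

russian = u"\u041f\u0440\u0438\u0432\u0435\u0442"   # a Russian word in Cyrillic
japanese = u"\u65e5\u672c\u8a9e"                    # a Japanese word in kanji

# "to latin-1" fails for both; the language-specific codecs succeed.
latin1_ok = [can_encode(russian, "latin-1"), can_encode(japanese, "latin-1")]
native_ok = [can_encode(russian, "koi8-r"), can_encode(japanese, "shift_jis")]

# Encoded bytes carry no record of the codec that produced them: decoding
# KOI8-R bytes as Latin-1 raises no error but yields different text.
koi8_bytes = russian.encode("koi8-r")
misread = koi8_bytes.decode("latin-1")
```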

In practice, I just don't think you can distinguish "codecs" from
"coding" using the kind of mathematical properties you have described

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From skip at  Sat Feb 25 21:26:18 2006
From: skip at (skip at
Date: Sat, 25 Feb 2006 14:26:18 -0600
Subject: [Python-Dev] cProfile prints to stdout?
Message-ID: <>

I just noticed that cProfile (like profile) prints to stdout.  Yuck.  I
guess that's to be expected because the pstats module does the actual
printing and it's used by both modules.  I'm willing to give up backward
compatibility to achieve a little more sanity and flexibility here.  I
propose rewriting the necessary bits to add a stream= keyword argument where
necessary and using stream.write(...) or print >> stream, ... instead of the
current bare print.  I'd prefer the default for the stream be sys.stderr as
well.

Thoughts?



From brett at  Sat Feb 25 21:35:22 2006
From: brett at (Brett Cannon)
Date: Sat, 25 Feb 2006 12:35:22 -0800
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, skip at <skip at> wrote:
> I just noticed that cProfile (like profile) prints to stdout.  Yuck.  I
> guess that's to be expected because the pstats module does the actual
> printing and it's used by both modules.  I'm willing to give up backward
> compatibility to achieve a little more sanity and flexibility here.  I
> propose rewriting the necessary bits to add a stream= keyword argument where
> necessary and using stream.write(...) or print >> stream, ... instead of the
> current bare print.  I'd prefer the default for the stream be sys.stderr as
> well.
> Thoughts?

+0 from me (would be +1 since it seems very reasonable, but I never
use profile and this will break some code somewhere).


From python at  Sat Feb 25 21:39:24 2006
From: python at (Raymond Hettinger)
Date: Sat, 25 Feb 2006 14:39:24 -0600
Subject: [Python-Dev] cProfile prints to stdout?
References: <>
Message-ID: <001e01c63a4b$97bfde90$c913020a@RaymondLaptop1>

>I just noticed that cProfile (like profile) prints to stdout.  Yuck.  I
> guess that's to be expected because the pstats module does the actual
> printing and it's used by both modules.  I'm willing to give up backward
> compatibility to achieve a little more sanity and flexibility here.  I
> propose rewriting the necessary bits to add a stream= keyword argument where
> necessary and using stream.write(...) or print >> stream, ... instead of the
> current bare print.  I'd prefer the default for the stream be sys.stderr as
> well.
> Thoughts?

FWIW, this idea has come-up a couple of times before so it should probably get 
fixed once and for all.
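
For reference, later Python releases did add a stream argument to
pstats.Stats. A minimal sketch of sending a report to an arbitrary stream
with that API follows; the names work and profile_to_stream are
illustrative only, and the code assumes a Python 3 interpreter.

```python
import cProfile
import io
import pstats

def work():
    """Stand-in for the program being profiled."""
    return sum(i * i for i in range(1000))

def profile_to_stream(func, stream, limit=5):
    """Profile one call of `func` and write the report to `stream`."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result

# Any object with a write() method works; sys.stderr would keep the
# report apart from the program's own stdout.
report = io.StringIO()
value = profile_to_stream(work, report)
```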


From g.brandl at  Sat Feb 25 21:48:06 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 25 Feb 2006 21:48:06 +0100
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <dtqfq6$44b$>

skip at wrote:
> I just noticed that cProfile (like profile) prints to stdout.  Yuck.  I
> guess that's to be expected because the pstats module does the actual
> printing and it's used by both modules.  I'm willing to give up backward
> compatibility to achieve a little more sanity and flexibility here.  I
> propose rewriting the necessary bits to add a stream= keyword argument where
> necessary and using stream.write(...) or print >> stream, ... instead of the
> current bare print.  I'd prefer the default for the stream be sys.stderr as
> well.

Probably related:


From skip at  Sat Feb 25 21:57:42 2006
From: skip at (skip at
Date: Sat, 25 Feb 2006 14:57:42 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <dtqfq6$44b$>
References: <>
Message-ID: <>

    Georg> Probably related:


Don't think so.  That was just a documentation nit (and is now fixed and
closed at any rate).


From g.brandl at  Sat Feb 25 21:59:18 2006
From: g.brandl at (Georg Brandl)
Date: Sat, 25 Feb 2006 21:59:18 +0100
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>	<dtqfq6$44b$>
Message-ID: <dtqgf6$6ee$>

skip at wrote:
>     Georg> Probably related:
>     Georg>
> Don't think so.  That was just a documentation nit (and is now fixed and
> closed at any rate).

Well, it is another module that prints to stdout instead of stderr.

Okay, not so closely related ;)


From guido at  Sat Feb 25 22:31:56 2006
From: guido at (Guido van Rossum)
Date: Sat, 25 Feb 2006 15:31:56 -0600
Subject: [Python-Dev] PEP 328
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, Thomas Wouters <thomas at> wrote:
> Since I implemented[*] PEP 328, Aahz suggested I take over editing the PEP,
> too, as there were some minor discussion points to add still. I haven't been
> around for the discussions, though, and it's been a while for everyone else,
> I think, so I'd like to rehash and ask for any other open points.
> The one open point that Aahz forwarded me, and is expressed somewhat in
> , is
> the case where you have a package that you want to transparently supply a
> particular version of a module for forward/backward compatibility, replacing
> a version elsewhere on sys.path (if any.) I see four distinct situations for
> this:
>  1) Replacing a stdlib module (or a set of them) with a newer version, if the
>     stdlib module is too old, where you want the whole stdlib to use the
>     newer version.
>  2) Same as 1), but private to your package; modules not in your package
>     should get the stdlib version when they import the 'replaced' module.
>  3) Providing a module (or a set of them) that the stdlib might be missing
>     (but which will be a new enough version if it's there)
> 1) and 3) are easy to solve: put the module in a separate directory, insert
> that into sys.path; at the front for 1), at the end for 3). Mailman, IIRC,
> does this, and I think it works fine.
> 2) is easy if it's a single module; include it in your package and import it
> relatively. If it's a package itself, it's again pretty easy; include the
> package and include it relatively. The package itself is hopefully already
> using relative imports to get sibling packages. If the package is using
> absolute imports to get sibling packages, well, crap. I don't think we can
> solve that issue whatever we do: that already breaks.
> The real problem with 2) is when you have tightly coupled modules that are
> not together in a package and not using relative imports, or perhaps when
> you want to *partially* override a package. I would argue that tightly
> coupled modules should always use relative imports, whether they are
> together in a package or not (even though they should probably be in a
> package anyway.) I'd also argue that having different modules import
> different versions of existing modules is a bad idea. It's workable if the
> modules are only used internally, but exposing anything is troublesome. For
> instance, an instance of a class defined in foo (1.0) imported by bar will
> not be an instance of the same class defined in foo (1.1) imported by
> feeble.
> Am I missing anything?
> ([*] incorrectly, to be sure, but I have a 'correct' version ready that I'll
> upload in a second; I was trying to confuse Guido into accepting my version,
> instead.)

One thing you're missing here is that the original assertion about the
impossibility of editing the source code of the third-party package
that's being incorporated into your distribution, is simply wrong.
Systematically modifying all modules in a package to change their
imports to assume a slightly different hierarchy can easily be done

I'd also add that eggs promise to provide a different solution for
most concerns.

I believe we should go ahead and implement PEP 328 faithfully without
revisiting the decisions. If we were wrong (which I doubt) we'll have
the opportunity to take a different direction in 2.6.

--Guido van Rossum (home page:

From guido at  Sat Feb 25 22:36:33 2006
From: guido at (Guido van Rossum)
Date: Sat, 25 Feb 2006 15:36:33 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, skip at <skip at> wrote:
> I just noticed that cProfile (like profile) prints to stdout.  Yuck.

Can you clarify? Why is it wrong to send the output of the profiler to
stdout? It seems to make sense to me that you should be able to
redirect the profiler's output to a file with a simple ">file".

--Guido van Rossum (home page:

From facundobatista at  Sat Feb 25 22:59:49 2006
From: facundobatista at (Facundo Batista)
Date: Sat, 25 Feb 2006 18:59:49 -0300
Subject: [Python-Dev] Translating docs
Message-ID: <>

After a short talk with Raymond at breakfast yesterday, I
proposed in PyAr the idea of starting to translate the Library Reference.

You'll agree with me that this is a BIG effort. But not only big, it's dynamic!

So, we decided that we need a system to help us manage
the translations. It would also be a good idea for the system to be available
for translations into other languages.

One of the guys proposed to use Launchpad (

The question is: is it OK to use a third-party system for this
initiative, or would you (we) prefer to host it in-house? Has anyone already
thought about this?

Thank you!

.    Facundo


From tjreedy at  Sat Feb 25 23:00:05 2006
From: tjreedy at (Terry Reedy)
Date: Sat, 25 Feb 2006 17:00:05 -0500
Subject: [Python-Dev] PEP 328
References: <>
Message-ID: <dtqk15$h03$>

"Thomas Wouters" <thomas at> wrote in message 
news:20060225182601.GQ23859 at
> The one open point that Aahz forwarded me, and is expressed somewhat in , is
> the case where you have a package that you want to transparently supply a
> particular version of a module for forward/backward compatibility, replacing
> a version elsewhere on sys.path (if any.) I see four distinct situations for
> this:

Did you mean three?

> 1) Replacing a stdlib module (or a set of them) with a newer version, if the
>    stdlib module is too old, where you want the whole stdlib to use the
>    newer version.
> 2) Same as 1), but private to your package; modules not in your package
>    should get the stdlib version when they import the 'replaced' module.
> 3) Providing a module (or a set of them) that the stdlib might be missing
>    (but which will be a new enough version if it's there)

Or did you forget the fourth?

In any case, the easy solution to breaking code is to not do it until 3.0. 
There might never be a 2.7 to worry about.

Terry Jan Reedy

From skip at  Sat Feb 25 23:14:04 2006
From: skip at (skip at
Date: Sat, 25 Feb 2006 16:14:04 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

    >> I just noticed that cProfile (like profile) prints to stdout.  Yuck.

    Guido> Can you clarify? Why is it wrong to send the output of the
    Guido> profiler to stdout?

If the program emits a bunch of output of its own I want to keep it separate
from what is arguably the debug output of the profiler (even though the
profiler prints all its output at the end): > /dev/null 2>

    Guido> It seems to make sense to me that you should be able to redirect
    Guido> the profiler's output to a file with a simple ">file".

It is currently impossible to separate profile output from the program's
output.


From guido at  Sat Feb 25 23:42:03 2006
From: guido at (Guido van Rossum)
Date: Sat, 25 Feb 2006 16:42:03 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, skip at <skip at> wrote:
>     >> I just noticed that cProfile (like profile) prints to stdout.  Yuck.
>     Guido> Can you clarify? Why is it wrong to send the output of the
>     Guido> profiler to stdout?
> If the program emits a bunch of output of its own I want to keep it separate
> from what is arguably the debug output of the profiler (even though the
> profiler prints all its output at the end):
> > /dev/null 2>
>     Guido> It seems to make sense to me that you should be able to redirect
>     Guido> the profiler's output to a file with a simple ">file".
> It is currently impossible to separate profile output from the program's
> output.

It is if you use the "advanced" use of the profiler -- the profiling
run just saves the profiling data to a file, and the pstats module
invoked separately prints the output.
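
A minimal sketch of that two-step use, assuming a Python 3 interpreter. The
function names and the temporary file are illustrative. The first step only
writes profiling data; the second step reads it back and formats it.

```python
import cProfile
import io
import os
import pstats
import tempfile

def work():
    """Stand-in for the program being profiled."""
    return sorted(range(1000), reverse=True)[0]

def save_profile(func, path):
    """Step 1: run `func` under the profiler and dump raw data to `path`."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    profiler.dump_stats(path)
    return result

def format_report(path, limit=5):
    """Step 2: load the saved data and return the report as a string."""
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()

handle, stats_path = tempfile.mkstemp(suffix=".prof")
os.close(handle)
try:
    answer = save_profile(work, stats_path)
    report_text = format_report(stats_path)
finally:
    os.remove(stats_path)
```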

--Guido van Rossum (home page:

From guido at  Sun Feb 26 00:16:21 2006
From: guido at (Guido van Rossum)
Date: Sat, 25 Feb 2006 17:16:21 -0600
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
Message-ID: <>

FWIW this has now been checked in. Enjoy!


On 2/23/06, Guido van Rossum <guido at> wrote:
> On 2/22/06, Michael Chermside <mcherm at> wrote:
> > A minor related point about on_missing():
> >
> > Haven't we learned from regrets over the .next() method of iterators
> > that all "magically" invoked methods should be named using the __xxx__
> > pattern? Shouldn't it be named __on_missing__() instead?
> Good point. I'll call it __missing__. I've uploaded a new patch to

--Guido van Rossum (home page:

From greg.ewing at  Sun Feb 26 00:18:54 2006
From: greg.ewing at (Greg Ewing)
Date: Sun, 26 Feb 2006 12:18:54 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> The reason that Python source code is text is that the primary
> producers/consumers of Python source code are human beings, not
> compilers

I disagree with "primary" -- I think human and computer
use of source code have equal importance. Because of the
fact that Python source code must be acceptable to the
Python compiler, a great many transformations that would
be harmless to English text (upper casing, paragraph
wrapping, etc.) would cause disaster if applied to a
Python program. I don't see how base64 is any different.

> Yes, which implies that you assume he has control of the data all the
> way to the channel that actually requires base64.

Yes. If he doesn't, he can't safely use base64 at all.
That's true regardless of how the base64-encoded data
is represented. It's true of any data of any kind.

> Use case: the Gnus MUA supports the RFC that allows non-ASCII names in
> MIME headers that take file names...

I'm not familiar with all the details you're alluding
to here, but if there's a bug here, I'd say it's due
to somebody not thinking something through properly.
It shouldn't matter if something gets encoded four
times as long as it gets decoded four times at the
other end. If it's not possible to do that, someone
made an assumption about the channel that wasn't

> It's "what is the Python compiler/interpreter going
> to think?"  AFAICS, it's going to think that base64 is
> a unicode codec.

Only if it's designed that way, and I specifically
think it shouldn't -- i.e. it should be an error
to attempt the likes of a_unicode_string.encode("base64")
or unicode(something, "base64"). The interface for
doing base64 encoding should be something else.
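
A sketch of what such a separate interface looks like, using the base64
module and assuming a Python 3 interpreter, where the string encode() call
below is rejected. The payload is made up.

```python
import base64

payload = b"\x00\xff raw bytes"
encoded = base64.b64encode(payload)   # bytes drawn from the base64 alphabet
decoded = base64.b64decode(encoded)   # back to the original bytes

# Encoding several times is harmless as long as it is undone as many times.
wrapped = payload
for _ in range(4):
    wrapped = base64.b64encode(wrapped)
for _ in range(4):
    wrapped = base64.b64decode(wrapped)

# On Python 3, base64 is not accepted as a text encoding for str.encode().
try:
    u"some text".encode("base64")
    text_encode_rejected = False
except LookupError:
    text_encode_rejected = True
```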

> I don't believe that "takes a character string as
> input" has any intrinsic meaning.

I'm using that phrase in the context of Python, where
it means "a function that takes a Python character
string as input".

In the particular case of base64, it has the added
restriction that it must preserve the particular
65 characters used.

> In practice, I think it's a loaded gun
> aimed at my foot.  And yours.

Whereas it seems quite the opposite to me, i.e.
*failing* to clearly distinguish between text and
binary data here is what will lead to confusion and

I think we need some concrete use cases to talk
about if we're to get any further with this. Do
you have any such use cases in mind?


From skip at  Sun Feb 26 01:13:04 2006
From: skip at (skip at
Date: Sat, 25 Feb 2006 18:13:04 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

    >> It is currently impossible to separate profile output from the
    >> program's output.

    Guido> It is if you use the "advanced" use of the profiler -- the
    Guido> profiling run just saves the profiling data to a file, and the
    Guido> pstats module invoked separately prints the output.

Sure, but then it's not "simple".  Your original example was "... > file".
I'd like it to be (nearly) as easy to do it right yet keep it simple.


From martin at  Sun Feb 26 01:43:34 2006
From: martin at (martin at
Date: Sun, 26 Feb 2006 01:43:34 +0100
Subject: [Python-Dev] Translating docs
In-Reply-To: <>
References: <>
Message-ID: <>

Zitat von Facundo Batista <facundobatista at>:

> The question is, it's ok to use a third party system for this
> initiative? Or you (we) prefer to host it in-house? Someone already
> thought of this?

I thought about it at one time, and I think the doc strings can be
translated very well using gettext-based procedures; I once submitted
a POT file to the translation project:

Translating the library reference as such is more difficult, because
it can't be translated in small chunks very well.

Some group of French translators once translated everything for 1.5.2,
and that translation never got updated.


From aleaxit at  Sun Feb 26 02:34:41 2006
From: aleaxit at (Alex Martelli)
Date: Sat, 25 Feb 2006 17:34:41 -0800
Subject: [Python-Dev] Translating docs
In-Reply-To: <>
References: <>
Message-ID: <>

On Feb 25, 2006, at 4:43 PM, martin at wrote:

> Zitat von Facundo Batista <facundobatista at>:
>> The question is, it's ok to use a third party system for this
>> initiative? Or you (we) prefer to host it in-house? Someone already
>> thought of this?
> I thought about it at one time, and I think the doc strings can be
> translated very well using gettext-based procedures; I once submitted
> a POT file to the translation project:
> Translating the library reference as such is more difficult, because
> it can't be translated in small chunks very well.
> Some group of French translators once translated everything for 1.5.2,
> and that translation never got updated.

A similar situation applies to Italy -- a lot of stuff is translated  
at (the C-API and  
Extending and Embedding aren't translated, though), but it's 2.3.4- 
vintage docs.  There's no real mechanism or process to ensure updates.


From rrr at  Sun Feb 26 02:50:24 2006
From: rrr at (Ron Adam)
Date: Sat, 25 Feb 2006 19:50:24 -0600
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> It's "what is the Python compiler/interpreter going
>> to think?"  AFAICS, it's going to think that base64 is
>> a unicode codec.
> Only if it's designed that way, and I specifically
> think it shouldn't -- i.e. it should be an error
> to attempt the likes of a_unicode_string.encode("base64")
> or unicode(something, "base64"). The interface for
> doing base64 encoding should be something else.

I agree

> I think we need some concrete use cases to talk
> about if we're to get any further with this. Do
> you have any such use cases in mind?
> Greg

Or at least somewhere more concrete than trying to debate abstract
meanings. ;-)

Running a short test over all the codecs and converting u"Python" to 
string and back to unicode resulted in the following output.  These are 
the only ones of the 92 that couldn't do the round trip successfully.

It seems to me these will need to be moved and/or made to work with 
unicode at some point.

1: 'bz2_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

2: 'hex_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

3: 'uu_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

4: 'quopri_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

5: 'base64_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

6: 'zlib_codec'
Traceback (most recent call last):
   File "", line 29, in test1
     u2 = unicode(s, c)       # to unicode
TypeError: decoder did not return an unicode object (type=str)

7: 'tactis'
Traceback (most recent call last):
   File "", line 28, in test1
     s = u1.encode(c)         # to string
LookupError: unknown encoding: tactis
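
The test script itself is not shown above. A rough sketch of such a
round-trip check, written for a current Python 3 interpreter, might look
like this. The function name is made up, and the set of failing codecs on a
modern interpreter differs from the 2006 list.

```python
import encodings.aliases

def round_trip_failures(text):
    """Encode and decode `text` with every known codec name.

    Returns a dict mapping each codec that could not round-trip the text
    to a short reason.
    """
    failures = {}
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            encoded = text.encode(name)      # to bytes
            restored = encoded.decode(name)  # back to text
        except Exception as exc:
            # Missing codecs and bytes-to-bytes codecs end up here.
            failures[name] = type(exc).__name__
            continue
        if restored != text:
            failures[name] = "round trip changed the text"
    return failures

failures = round_trip_failures(u"Python")
```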

From facundobatista at  Sun Feb 26 03:13:51 2006
From: facundobatista at (Facundo Batista)
Date: Sat, 25 Feb 2006 23:13:51 -0300
Subject: [Python-Dev] Translating docs
In-Reply-To: <>
References: <>
Message-ID: <>

2006/2/25, Alex Martelli <aleaxit at>:

> A similar situation applies to Italy -- a lot of stuff is translated
> at (the C-API and
> Extending and Embedding aren't translated, though), but it's 2.3.4-
> vintage docs.  There's no real mechanism or process to ensure updates.

We don't want that to happen, no.

BTW, Alex, too bad you're not here. We miss you, :)

.    Facundo


From facundobatista at  Sun Feb 26 03:14:20 2006
From: facundobatista at (Facundo Batista)
Date: Sat, 25 Feb 2006 23:14:20 -0300
Subject: [Python-Dev] Fwd:  Translating docs
In-Reply-To: <>
References: <>
Message-ID: <>

2006/2/25, martin at <martin at>:

> Translating the library reference as such is more difficult, because
> it can't be translated in small chunks very well.

The SVN directory "python/dist/src/Doc/lib/" has 276 .tex's, with an
average of 250 lines each.

Maybe managing each file independently could work.

> Some group of French translators once translated everything for 1.5.2,
> and that translation never got updated.

We're afraid of this. And that's why we think that it'd be necessary
to have some automated system that tells us if the original file got
updated, if there are new files to translate, and shows the state of the
translation (in process, finished, not even started, etc...).
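
A minimal sketch of the bookkeeping such a system would need, assuming each
translation records a digest of the original text it was based on. All
names and file names here are hypothetical, and it ignores the "in process"
state.

```python
import hashlib

def digest(text):
    """Fingerprint of one original source file's contents."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def translation_status(originals, based_on):
    """Classify every original file.

    originals: file name -> current original text
    based_on:  file name -> digest of the original the translation used
    """
    status = {}
    for name, text in originals.items():
        if name not in based_on:
            status[name] = "not started"
        elif based_on[name] == digest(text):
            status[name] = "up to date"
        else:
            status[name] = "outdated"   # the original changed afterwards
    return status

originals = {"libos.tex": "revised text", "libsys.tex": "same text",
             "libre.tex": "brand new file"}
based_on = {"libos.tex": digest("first text"),
            "libsys.tex": digest("same text")}
report = translation_status(originals, based_on)
```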

I think that a system like this is not so difficult, but if the wheel
is already invented...

.    Facundo


From at  Sun Feb 26 07:46:21 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 01:46:21 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/23/06, Steven Bethard <steven.bethard at> wrote:
> On 2/22/06, Almann T. Goo < at> wrote:
> >     def incrementer_getter(val):
> >        def incrementer():
> >            val = 5
> >            def inc():
> >                ..val += 1
> >                return val
> >            return inc
> >        return incrementer
> Sorry, what way did the user think?  I'm not sure what you think was
> supposed to happen.

My apologies ... I shouldn't use vague terms like what the "user
thinks."  My problem, as is demonstrated in the above example, is that
the implicit nature of evaluating a name in Python conflicts with the
explicit nature of the proposed "dot" notation.  It makes it easier
for a user to write obscure code (until Python 3K when we force users
to use "dot" notation for all enclosing scope access ;-) ).

This sort of thing can be done today with code using attribute access
on its module object to evaluate and rebind global names.  With the
"global" keyword however, users don't have to resort to this sort of

Because of Python's name binding semantics and the semantic for the
"global" keyword, I think the case for an "outer"-type keyword is
stronger and we could deprecate "global" going forward in Python 3K. 
One of the biggest points of contention to this is of course the
backwards incompatibility with a new keyword ... Python has already
recently added "yield" and we're about to get "with" and "as" in 2.5. 
As far as the "user-interface" of the language getting bloated, I
personally think trading "global" for an "outer" mitigates that some.


Almann T. Goo at

From greg.ewing at  Sun Feb 26 07:48:03 2006
From: greg.ewing at (Greg Ewing)
Date: Sun, 26 Feb 2006 19:48:03 +1300
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

Almann T. Goo wrote:

> One of the biggest points of contention to this is of course the
> backwards incompatibility with a new keyword ...

Alternatively, 'global' could be redefined to mean
what we're thinking of for 'outer'. Then there would
be no change in keywordage.

There would be potential for breaking code, but I
suspect the actual amount of breakage would be
small, since there would have to be 3 scopes
involved, with something in the middle one
shadowing a global that was referenced in the
inner one with a global statement.
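
The three-scope case can be sketched as follows (names are illustrative).
With today's semantics the global statement skips the middle scope.

```python
value = "module"

def middle():
    value = "middle"      # shadows the module-level name
    def inner():
        global value      # current meaning: the module-level binding
        return value
    return inner()

seen = middle()
```

Today seen is "module". If 'global' were redefined to mean the nearest
enclosing binding, the same code would see "middle" instead, and that is
the only pattern that would break.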

Given the rarity of global statement usage to begin
with, I'd say that narrows things down to something
well within the range of acceptable breakage in 3.0.


From at  Sun Feb 26 08:07:32 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 02:07:32 -0500
Subject: [Python-Dev] PEP for Better Control of Nested Lexical Scopes
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/22/06, Greg Ewing <greg.ewing at> wrote:
> That's what rankles people about this, I think -- there
> doesn't seem to be a good reason for treating the global
> scope so specially, given that all scopes could be
> treated uniformly if only there were an 'outer' statement.
> All the arguments I've seen in favour of the status quo
> seem like rationalisations after the fact.

I agree, hence my initial pre-PEP feeler on the topic ;).

> > Since there were no nested lexical scopes back
> > then, there was no need to have a construct for arbitrary enclosing
> > scopes.
> However, if nested scopes *had* existed back then, I
> rather suspect we would have had an 'outer' statement
> from the beginning, or else 'global' would have been
> given the semantics we are now considering for 'outer'.

Would it be so horrible to make "global" be the "outer"-type
keyword--basically meaning "lexically global" versus "the global
scope"?  It would bring the semantics of Python's nested lexical
scopes more in line with other languages with this feature and
fix my orthogonality gripes.  As far as backwards compatibility, I
doubt there would be too much impact in this regard, as places that
would break would be where "global" was used in a closure where the
name was shadowed in an enclosing scope.  A "from __future__ import
lexical_global" (which we'd have for adding the "outer"-like keyword
anyway) could help diminish the growing pains.


Almann T. Goo at

From at  Sun Feb 26 08:15:20 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 02:15:20 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/26/06, Greg Ewing <greg.ewing at> wrote:
> Alternatively, 'global' could be redefined to mean
> what we're thinking of for 'outer'. Then there would
> be no change in keywordage.
> There would be potential for breaking code, but I
> suspect the actual amount of breakage would be
> small, since there would have to be 3 scopes
> involved, with something in the middle one
> shadowing a global that was referenced in the
> inner one with a global statement.
> Given the rarity of global statement usage to begin
> with, I'd say that narrows things down to something
> well within the range of acceptable breakage in 3.0.

You read my mind--I made a reply similar to this on another branch of
this thread just minutes ago :).

I am curious to see what the community thinks about this.


Almann T. Goo at

From g.brandl at  Sun Feb 26 08:50:57 2006
From: g.brandl at (Georg Brandl)
Date: Sun, 26 Feb 2006 08:50:57 +0100
Subject: [Python-Dev] Fwd:  Translating docs
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <dtrml1$r27$>

Facundo Batista wrote:
> 2006/2/25, martin at <martin at>:
>> Translating the library reference as such is more difficult, because
>> it can't be translated in small chunks very well.
> The SVN directory "python/dist/src/Doc/lib/" has 276 .tex's, with an
> average of 250 lines each.
> Maybe managing each file independently could work.
>> Some group of French translators once translated everything for 1.5.2,
>> and that translation never got updated.
> We're afraid of this. And that's why we think that it'd be necessary
> to have some automated system that tells us if the original file got
> updated, if there are new files to translate, and shows the state of the
> translation (in process, finished, not even started, etc...).
> I think that a system like this is not so difficult, but if the wheel
> is already invented...

Wouldn't a post-commit hook in SVN be enough?

Also, the docs could be managed in a Wiki (or, if the translators know how to
use it, in SVN too) so that translators can correct and revise what others
have translated...
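
A minimal sketch of the bookkeeping such a hook could drive, assuming each
translation records a digest of the original it was made from. The names
`digest` and `translation_status`, the digest scheme and the file names are
all hypothetical, not an existing tool:

```python
import hashlib

def digest(text):
    """Fingerprint of an original source file (any stable hash would do)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def translation_status(originals, translated_from):
    """Classify each original as 'not started', 'outdated' or 'up to date'.

    originals:       {file name: current original text}
    translated_from: {file name: digest of the original when it was translated}
    """
    status = {}
    for name, text in originals.items():
        if name not in translated_from:
            status[name] = "not started"
        elif translated_from[name] != digest(text):
            status[name] = "outdated"
        else:
            status[name] = "up to date"
    return status

# Illustrative data only.
originals = {"libos.tex": "new text", "libsys.tex": "same text", "libre.tex": "x"}
translated_from = {
    "libos.tex": digest("old text"),
    "libsys.tex": digest("same text"),
}
report = translation_status(originals, translated_from)
```

A post-commit hook would only need to rerun this over the changed files and
publish the report.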

Martin: There aren't any German docs, are there?


From 2005a at  Sun Feb 26 11:08:23 2006
From: 2005a at (Alexander Schremmer)
Date: Sun, 26 Feb 2006 11:08:23 +0100
Subject: [Python-Dev] Fwd:  Translating docs
References: <>	<>	<>
Message-ID: <>

On Sun, 26 Feb 2006 08:50:57 +0100, Georg Brandl wrote:

> Martin: There aren't any German docs, are there?

There is e.g.

Kind regards,

From martin at  Sun Feb 26 15:30:13 2006
From: martin at (martin at
Date: Sun, 26 Feb 2006 15:30:13 +0100
Subject: [Python-Dev] Fwd:  Translating docs
In-Reply-To: <dtrml1$r27$>
References: <>
Message-ID: <>

Zitat von Georg Brandl <g.brandl at>:

> Martin: There aren't any German docs, are there?

I started translating the doc strings once, but never got to complete
it. I still believe that the doc string translation is the only approach
that could work in a reasonable way - you would have to use pydoc to
view the translations, though.

There are, of course, various German books.


From massimiliano.leoni at  Sun Feb 26 15:27:34 2006
From: massimiliano.leoni at (Massimiliano Leoni)
Date: Sun, 26 Feb 2006 15:27:34 +0100
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
 Better Control of Nested Lexical Scopes)
Message-ID: <>

Why would you change the Python scoping rules, instead of using the 
function attributes, available from release 2.1 (PEP 232) ?
For example, you may write:

def incgen(start, inc):
   def incrementer():
     incrementer.a += incrementer.b
     return incrementer.a
   incrementer.a = start - inc
   incrementer.b = inc
   return incrementer

f = incgen(100, 2)
g = incgen(200, 3)
for i in range(5):
     print f(), g()

The result is:

100 200
102 203
104 206
106 209
108 212

From stephen at  Sun Feb 26 17:05:51 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 27 Feb 2006 01:05:51 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Sun, 26 Feb 2006 12:18:54 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> I think we need some concrete use cases to talk about if
    Greg> we're to get any further with this. Do you have any such use
    Greg> cases in mind?

I gave you one, MIME processing in email, and a concrete bug that is
possible with the design you propose, but not in mine.  You said, "the
programmers need to try harder."  If that's an acceptable answer, I
have to concede it beats any use case I can imagine.

I think it's your turn.  Give me a use case where it matters
practically that the output of the base64 codec be Python unicode
characters rather than 8-bit ASCII characters.

I don't think you can.  Everything you have written so far is based on
defending your maintained assumption that because Python implements
text processing via the unicode type, everything that is described as
a "character" must be coerced to that type.

If you give up that assumption, you get

1.  an automatic reduction in base64.upcase() bugs because it's a type
    error, ie, binary objects are not text objects, no matter what their

2.  encouragement to programmer teams to carry along binary objects as
    opaque blobs until they're just about to put them on the wire,
    then let the wire protocol guy implement the conversion at that point

3.  efficiency for a very common case where ASCII octets are the wire

4.  efficient and clear implementation and documentation using the
    codec framework and API

I don't really see a downside, except for the occasional double
conversion ASCII -> unicode -> UTF-16, as is allowed (but not
mandated) in XML's use of base64.  What downside do you see?
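
A small sketch of the hazard behind point 1 and the blob handling in point 2.
It is written against the `base64` module as it behaves in Python 3, where
`b64encode` returns a bytes object; that is an assumption about a later
release, not something settled in this thread. The payload is illustrative:

```python
import base64

payload = b"hello"                          # stands in for arbitrary binary data
encoded = base64.b64encode(payload)         # a bytes object, not text

# Carried around as an opaque blob, the data round-trips exactly.
roundtrip_ok = base64.b64decode(encoded) == payload

# Treated as text, it invites text operations. Upper-casing still yields
# valid base64, but it decodes to different data: a silent corruption.
as_text = encoded.decode("ascii")
corrupted = base64.b64decode(as_text.upper()) != payload

# Convert to text only at the boundary that actually needs text.
wire_text = encoded.decode("ascii")
```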

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From gvanrossum at  Sun Feb 26 18:12:25 2006
From: gvanrossum at (Guido van Rossum)
Date: Sun, 26 Feb 2006 11:12:25 -0600
Subject: [Python-Dev] cProfile prints to stdout?
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, skip at <skip at> wrote:
>     >> It is currently impossible to separate profile output from the
>     >> program's output.
>     Guido> It is if you use the "advanced" use of the profiler -- the
>     Guido> profiling run just saves the profiling data to a file, and the
>     Guido> pstats module invoked separately prints the output.
> Sure, but then it's not "simple".  Your original example was "... > file".
> I'd like it to be (nearly) as easy to do it right yet keep it simple.

OK. I believe the default should be stdout though, and the convenience
method print_stats() in should be the only place that
references stderr. The smallest code mod would be to redirect stdout
temporarily inside print_stats(); but I won't complain if you're more
ambitious and modify

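    def print_stats(self, sort=-1, stream=None):
        import pstats
        if stream is None:
            stream = sys.stderr
        save = sys.stdout
        try:
            # Redirect stdout only while pstats prints the report.
            if stream is not None:
                sys.stdout = stream
            pstats.Stats(self).strip_dirs().sort_stats(sort). \
                print_stats()
        finally:
            # Always restore the caller's stdout.
            sys.stdout = save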
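
For comparison, a sketch of the more ambitious variant, in which the stream is
handed to `pstats` instead of swapping `sys.stdout`. It assumes a
`pstats.Stats` that accepts a `stream` keyword, which later Python releases
provide; the profiled function `work` is made up for illustration:

```python
import cProfile
import io
import pstats

def work():
    """Hypothetical workload to profile."""
    return sum(i * i for i in range(1000))

profiler = cProfile.Profile()
profiler.runcall(work)

# The report goes to `report`, so the program's own stdout stays clean.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.strip_dirs().sort_stats("cumulative").print_stats()
text = report.getvalue()
```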

--Guido van Rossum (home page:

From thomas at  Sun Feb 26 18:14:18 2006
From: thomas at (Thomas Wouters)
Date: Sun, 26 Feb 2006 18:14:18 +0100
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Feb 26, 2006 at 03:27:34PM +0100, Massimiliano Leoni wrote:

> Why would you change the Python scoping rules, instead of using the 
> function attributes, available from release 2.1 (PEP 232) ?

Because closures allow for data that isn't trivially reachable by the caller
(or anyone but the function itself.) You can argue that that's unpythonic or
what not, but fact is that the current closures allow that.
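
To make the difference concrete, a sketch of the same incrementer written both
ways. The closure version keeps its state in a one-element list, an
illustrative workaround for the fact that enclosing names cannot be rebound;
`incgen_attr` and `incgen_closure` are names invented here:

```python
def incgen_attr(start, inc):
    def incrementer():
        incrementer.a += inc
        return incrementer.a
    incrementer.a = start - inc   # state is a public attribute
    return incrementer

def incgen_closure(start, inc):
    state = [start - inc]         # state lives only in the enclosing scope
    def incrementer():
        state[0] += inc
        return state[0]
    return incrementer

f = incgen_attr(100, 2)
g = incgen_closure(100, 2)
first = (f(), g())

f.a = 0                           # any caller can reach in and reset it
after_reset = f()
has_public_state = (hasattr(f, "a"), hasattr(g, "a"))
```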

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From steven.bethard at  Sun Feb 26 18:17:31 2006
From: steven.bethard at (Steven Bethard)
Date: Sun, 26 Feb 2006 10:17:31 -0700
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/25/06, Almann T. Goo < at> wrote:
> On 2/23/06, Steven Bethard <steven.bethard at> wrote:
> > On 2/22/06, Almann T. Goo < at> wrote:
> > >     def incrementer_getter(val):
> > >        def incrementer():
> > >            val = 5
> > >            def inc():
> > >                ..val += 1
> > >                return val
> > >            return inc
> > >        return incrementer
> >
> > Sorry, what way did the user think?  I'm not sure what you think was
> > supposed to happen.
> My apologies ... I shouldn't use vague terms like what the "user
> thinks."  My problem, as is demonstrated in the above example, is that
> the implicit nature of evaluating a name in Python conflicts with the
> explicit nature of the proposed "dot" notation.  It makes it easier
> for a user to write obscure code (until Python 3K when we force users
> to use "dot" notation for all enclosing scope access ;-) ).

Then do you also dislike the original proposal: that only a single dot
be allowed, and that the '.' would mean "this name, but in the nearest
outer scope that defines it"?  Then:

    def incrementer_getter(val):
       def incrementer():
           val = 5
           def inc():
               .val += 1
               return val
           return inc
       return incrementer

would do what I think you want it to[1].  Note that I only suggested
extending the dot-notation to allow multiple dots because of Greg
Ewing's complaint that it wasn't enough like the relative import
notation.  Personally I find PJE's original proposal more intuitive,
and based on your example, I suspect so do you.

[1] That is, increment the ``val`` in incrementer(), return the same
``val``, and never modify the ``val`` in incrementer_getter().
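
The behaviour described in [1] can be approximated with the `nonlocal`
statement that Python 3 provides. That statement is not the dot notation
discussed here; it stands in for `.val` purely as an assumption for
illustration, and `peek` is a helper added only to observe the outer `val`:

```python
def incrementer_getter(val):
    def incrementer():
        val = 5
        def inc():
            nonlocal val      # nearest enclosing scope that binds val
            val += 1
            return val
        return inc
    def peek():
        return val            # the val of incrementer_getter itself
    return incrementer, peek

incrementer, peek = incrementer_getter(0)
inc = incrementer()
results = (inc(), inc(), peek())
```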

Grammar am for people who can't think for myself.
        --- Bucky Katt, Get Fuzzy

From tjreedy at  Sun Feb 26 18:45:24 2006
From: tjreedy at (Terry Reedy)
Date: Sun, 26 Feb 2006 12:45:24 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP
	forBetter Control of Nested Lexical Scopes)
References: <><><><><><><><><><>
Message-ID: <dtspfk$o7h$>

"Almann T. Goo" < at> wrote in message 
news:7e9b97090602252315mf6d4686ud86dd5163ea76b37 at
> On 2/26/06, Greg Ewing <greg.ewing at> wrote:
>> Alternatively, 'global' could be redefined to mean
>> what we're thinking of for 'outer'. Then there would
>> be no change in keywordage.
>> Given the rarity of global statement usage to begin
>> with, I'd say that narrows things down to something
>> well within the range of acceptable breakage in 3.0.
> You read my mind--I made a reply similar to this on another branch of
> this thread just minutes ago :).
> I am curious to see what the community thinks about this.

I *think* I like this better than more complicated proposals.  I don't 
think I would ever have a problem with the intermediate scope masking the 
module scope.  After all, if I really meant to access the current global 
scope from a nested function, I simply would not use that name in the 
intermediate scope.


From at  Sun Feb 26 20:06:57 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 14:06:57 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP for
	Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/26/06, Steven Bethard <steven.bethard at> wrote:
> Then do you also dislike the original proposal: that only a single dot
> be allowed, and that the '.' would mean "this name, but in the nearest
> outer scope that defines it"?  Then:
>     def incrementer_getter(val):
>        def incrementer():
>            val = 5
>            def inc():
>                .val += 1
>                return val
>            return inc
>        return incrementer
> would do what I think you want it to[1].  Note that I only suggested
> extending the dot-notation to allow multiple dots because of Greg
> Ewing's complaint that it wasn't enough like the relative import
> notation.  Personally I find PJE's original proposal more intuitive,
> and based on your example, I suspect so do you.
> [1] That is, increment the ``val`` in incrementer(), return the same
> ``val``, and never modify the ``val`` in incrementer_getter().

I'm not sure if I find this more intuitive, but I think it is more
convenient than the "explicit dots" for each scope.  However my
biggest issue is still there.  I am not a big fan of letting users
have synonyms for names.  Notice how ".var" means the same as "var" in
some contexts in the example above--that troubles me.  PEP 227
addresses this concern with regard to the class scope:

    Names in class scope are not accessible.  Names are resolved in
    the innermost enclosing function scope.  If a class definition
    occurs in a chain of nested scopes, the resolution process skips
    class definitions.  This rule prevents odd interactions between
    class attributes and local variable access.

As the PEP further states:

    An alternative would have been to allow name binding in class
    scope to behave exactly like name binding in function scope.  This
    rule would allow class attributes to be referenced either via
    attribute reference or simple name.  This option was ruled out
    because it would have been inconsistent with all other forms of
    class and instance attribute access, which always use attribute
    references.  Code that used simple names would have been obscure.

I especially don't want to add an issue that is similar to one that
PEP 227 went out of its way to avoid.
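
A short sketch of the rule quoted above, with made-up names: a method body
skips the class scope when resolving a simple name, and reaches class
attributes only through an attribute reference.

```python
def demo():
    x = "enclosing function"

    class Example:
        x = "class"

        def simple_name(self):
            # Resolution skips the class scope and finds demo()'s x.
            return x

        def attribute(self):
            # Class attributes are reached via attribute reference.
            return self.x

    obj = Example()
    return obj.simple_name(), obj.attribute()

results = demo()
```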


Almann T. Goo at

From rrr at  Sun Feb 26 20:47:18 2006
From: rrr at (Ron Adam)
Date: Sun, 26 Feb 2006 13:47:18 -0600
Subject: [Python-Dev] Using and binding relative names (was Re:
 PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <dtspfk$o7h$>
References: <><><><><><><><><><>	<>
Message-ID: <>

Terry Reedy wrote:
> "Almann T. Goo" < at> wrote in message 
> news:7e9b97090602252315mf6d4686ud86dd5163ea76b37 at
>> On 2/26/06, Greg Ewing <greg.ewing at> wrote:
>>> Alternatively, 'global' could be redefined to mean
>>> what we're thinking of for 'outer'. Then there would
>>> be no change in keywordage.
>>> Given the rarity of global statement usage to begin
>>> with, I'd say that narrows things down to something
>>> well within the range of acceptable breakage in 3.0.
>> You read my mind--I made a reply similar to this on another branch of
>> this thread just minutes ago :).
>> I am curious to see what the community thinks about this.
> I *think* I like this better than more complicated proposals.  I don't 
> think I would ever have a problem with the intermediate scope masking the 
> module scope.  After all, if I really meant to access the current global 
> scope from a nested function, I simply would not use that name in the 
> intermediate scope.
> tjr

Would this apply to reading intermediate scopes without the global keyword?

How would you know you aren't in inadvertently masking a name in a 

function you call?

In most cases it will probably break something in an obvious way, but I 
suppose in some cases it won't be so obvious.


From martin at  Sun Feb 26 20:59:09 2006
From: martin at (martin at
Date: Sun, 26 Feb 2006 20:59:09 +0100
Subject: [Python-Dev] Exposing the abstract syntax
Message-ID: <>

At PyCon, there was general reluctance for incorporating
the ast-objects branch, primarily because people were
concerned what the reference counting would do to
maintainability, and what (potentially troublesome)
options direct exposure of AST objects would do.

OTOH, the approach of creating a shadow tree did not
find opposition, so I implemented that.

Currently, you can use compile() to create an AST
out of source code, by passing PyCF_ONLY_AST (0x400)
to compile. The mapping of AST to Python objects
is as follows:

- there is a Python type for every sum, product,
  and constructor.
- The constructor types inherit from their sum
  types (e.g. ClassDef inherits from stmt)
- Each constructor and product type has an
  _fields member, giving the names of the fields
  of the product.
- Each node in the AST has members with the names
  given in _fields
- If the field is optional, it might be None
- if the field is zero-or-more, it is represented
  as a list.

It might be reasonable to expose this through
a separate module, in particular to provide
access to the type objects.
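
A sketch of the mapping in use. It relies on the `ast` module shipped with
later Python releases to reach the flag and the type objects; that module is
an assumption relative to this post, which names no such module:

```python
import ast

source = "class A:\n    def f(self):\n        return\n"

# ast.PyCF_ONLY_AST is the 0x400 flag mentioned above.
tree = compile(source, "<demo>", "exec", ast.PyCF_ONLY_AST)

class_node = tree.body[0]        # zero-or-more field: a list of statements
func_node = class_node.body[0]
return_node = func_node.body[0]

checks = {
    "flag": ast.PyCF_ONLY_AST == 0x400,
    "inherits": isinstance(class_node, ast.ClassDef)
                and issubclass(ast.ClassDef, ast.stmt),
    "fields": "name" in class_node._fields and "body" in class_node._fields,
    "member": class_node.name == "A",
    "optional": return_node.value is None,   # bare `return` has no value
    "list": isinstance(tree.body, list),
}
```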


From aleaxit at  Sun Feb 26 21:07:42 2006
From: aleaxit at (Alex Martelli)
Date: Sun, 26 Feb 2006 12:07:42 -0800
Subject: [Python-Dev] Using and binding relative names (was Re:
	PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <><><><><><><><><><>	<>
	<dtspfk$o7h$> <>
Message-ID: <>

On Feb 26, 2006, at 11:47 AM, Ron Adam wrote:
> How would you know you aren't in inadvertently masking a name in a
> function you call?

What does calling have to do with it?  Nobody's proposing a move to  
(shudder) dynamic scopes, we're talking of saner concepts such as  
lexical scopes anyway.  Can you give an example of what you mean?

For the record: I detest the existing 'global' (could I change but  
ONE thing in Python, that would be the one -- move from hated  
'global' to a decent namespace use, e.g. glob.x=23 rather than global  
x;x=23), and I'd detest a similar 'outer' just as intensely (again,  
what I'd like instead is a decent namespace) -- so I might well be  
sympathetic to your POV, if I could but understand it;-).


From at  Sun Feb 26 21:18:12 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 15:18:12 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP
	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
	<dtspfk$o7h$> <>
Message-ID: <>

> Would this apply to reading intermediate scopes without the global keyword?

Using a name from an enclosing scope without re-binding to it would
not require the "global" keyword.  This actually is the case today
with "global" and accessing a name from the global scope versus
re-binding to it--this would make "global" more general than
explicitly overriding to the global scope.

> How would you know you aren't in inadvertently masking a name in a
> function you call?

I think this is really an issue with the name binding semantics in Python.
There are benefits to not having variable declarations, but with
assignment meaning bind locally, you can already shadow a name in a
nested scope inadvertently today.
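
A minimal sketch of that inadvertent shadowing, with made-up names:

```python
def outer():
    count = 0
    def bump():
        count = 1      # binds a new local; outer's count is untouched
        return count
    inner_value = bump()
    return inner_value, count

result = outer()
```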

> In most cases it will probably break something in an obvious way, but I
> suppose in some cases it won't be so obvious.

Having the "global" keyword semantics changed to be "lexically global"
would break in the cases that "global" is used on a name within a
nested scope that has an enclosing scope with the same name.  I would
suppose that actual instances in real code of this would be rare.

>>> x = 1
>>> def f() :
...   x = 2
...   def inner() :
...     global x
...     print x
...   inner()
>>> f()
1

Under the proposed rules:
>>> f()
2

PEP 227 also had backwards incompatibilities that were similar and I
suggest handling them the same way by issuing a warning in these cases
when the new semantics are not being used (i.e. no "from __future__").

Almann T. Goo at

From at  Sun Feb 26 21:28:56 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 15:28:56 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP
	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
	<dtspfk$o7h$> <>
Message-ID: <>

On 2/26/06, Alex Martelli <aleaxit at> wrote:
> For the record: I detest the existing 'global' (could I change but
> ONE thing in Python, that would be the one -- move from hated
> 'global' to a decent namespace use, e.g. glob.x=23 rather than global
> x;x=23), and I'd detest a similar 'outer' just as intensely (again,
> what I'd like instead is a decent namespace) -- so I might well be
> sympathetic to your POV, if I could but understand it;-).

I would prefer a more explicit means to accomplish this too (I sort of
like the prefix dot in this regard), however the fundamental problem
with allowing this lies in how accessing and binding names works in
Python today (sorry if I sound like a broken record in this regard).

Unless we change how names can be accessed/re-bound (very bad for
backwards compatibility), any proposal that forces explicit name
spaces would have to allow for both accessing "simple names" (like
just "var") and names via attribute access (name spaces) like
"glob.var"--I think this adds the problem of introducing obscurity to
the language.


Almann T. Goo at

From rrr at  Mon Feb 27 01:20:07 2006
From: rrr at (Ron Adam)
Date: Sun, 26 Feb 2006 18:20:07 -0600
Subject: [Python-Dev] Using and binding relative names (was Re:
 PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <><><><><><><><><><>	<>
	<dtspfk$o7h$> <>
Message-ID: <>

Alex Martelli wrote:
> On Feb 26, 2006, at 11:47 AM, Ron Adam wrote:
>    ...
>> How would you know you aren't in inadvertently masking a name in a
>> function you call?
> What does calling have to do with it?  Nobody's proposing a move to 
> (shudder) dynamic scopes, we're talking of saner concepts such as 
> lexical scopes anyway.  Can you give an example of what you mean?

(sigh of relief) Ok, so the following example will still be true.

def foo(n):             #foo is a global
    return n

def bar(n):
    return foo(n)        #call to foo is set at compile time

def baz(n):
    foo = lambda x: 7    #will not replace foo called in bar.
    return bar(n)

print baz(42)

I guess I don't quite get what they are proposing yet.

It seems to me adding intermediate scopes is making functions act more 
like classes.  After you add naming conventions to functions they begin 
to look like this.

""" Multiple n itemiter """
class baz(object):
     def getn(baz, n):
         start = baz.start
         baz.start += n
         return baz.lst[start:start+n]
     def __init__(baz, lst):
         baz.lst = lst
         baz.start = 0

b = baz(range(100))

for n in range(1,10):
     print b.getn(n)

> For the record: I detest the existing 'global' (could I change but ONE 
> thing in Python, that would be the one -- move from hated 'global' to a 
> decent namespace use, e.g. glob.x=23 rather than global x;x=23), and I'd 
> detest a similar 'outer' just as intensely (again, what I'd like instead 
> is a decent namespace) -- so I might well be sympathetic to your POV, if 
> I could but understand it;-).

Maybe something explicit like:

 >>> import __main__ as glob
 >>> glob.x = 10
 >>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__',
'glob': <module '__main__' (built-in)>, '__doc__': None, 'x': 10}

That could eliminate the global keyword.

I'm -1 on adding the intermediate (outer) scopes to functions. I'd even 
like to see closures gone completely, but there's probably a reason they 
are there.  What I like about functions is they are fast, clean up 
behind themselves, and act *exactly* the same on consecutive calls.



From thomas at  Mon Feb 27 01:31:56 2006
From: thomas at (Thomas Wouters)
Date: Mon, 27 Feb 2006 01:31:56 +0100
Subject: [Python-Dev] PEP 308
Message-ID: <>

Since I was on a streak of implementing not-quite-the-right-thing, I checked
in my PEP 308 implementation *with* backward compatibility -- just to spite
Guido's latest change to the PEP. It jumps through a minor hoop (two new
grammar rules) in order to be backwardly compatible, but that hoop can go
away in Python 3.0, and that shouldn't be too long from now. I apologize for
the test failures of compile, transform and parser: they seem to all depend
on the parsermodule being updated. If no one feels responsible for it, I'll
do it later in the week (I'll be sprinting until Thursday anyway.)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From greg.ewing at  Mon Feb 27 01:47:25 2006
From: greg.ewing at (Greg Ewing)
Date: Mon, 27 Feb 2006 13:47:25 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

 > I gave you one, MIME processing in email

If implementing a mime packer is really the only use case
for base64, then it might as well be removed from the
standard library, since 99.99999% of all programmers will
never touch it. Those that do will need to have boned up
on the subject of encoding until it's coming out their
ears, so they'll know what they're doing in any case. And
they'll be quite competent to write their own base64
encoder that works however they want it to.

I don't have any real-life use cases for base64 that a
non-mime-implementer might come across, so all I can do
is imagine what shape such a use case might have.

When I do that, I come up with what I've already described.
The programmer wants to send arbitrary data over a channel
that only accepts text. He doesn't know, and doesn't want
to have to know, how the channel encodes that text --
it might be ASCII or EBCDIC or morse code, it shouldn't
matter. If his Python base64 encoder produces a Python
character string, and his Python channel interface accepts
a Python character string, he doesn't have to know.

> I think it's your turn.  Give me a use case where it matters
> practically that the output of the base64 codec be Python unicode
> characters rather than 8-bit ASCII characters.

I'd be perfectly happy with ascii characters, but in Py3k,
the most natural place to keep ascii characters will be in
character strings, not byte arrays.

 > Everything you have written so far is based on
> defending your maintained assumption that because Python implements
> text processing via the unicode type, everything that is described as
> a "character" must be coerced to that type.

I'm not just blindly assuming that because the RFC happens
to use the word "character". I'm also looking at how it uses
that word in an effort to understand what it means. It
*doesn't* specify what bit patterns are to be used to
represent the characters. It *does* mention two "character
sets", namely ASCII and EBCDIC, with the implication that
the characters it is talking about could be taken as being
members of either of those sets. Since the Unicode character
set is a superset of the ASCII character set, it doesn't
seem unreasonable that they could also be thought of as
Unicode characters.

> I don't really see a downside, except for the occasional double
> conversion ASCII -> unicode -> UTF-16, as is allowed (but not
> mandated) in XML's use of base64.  What downside do you see?

It appears that all your upsides I see as downsides, and
vice versa. We appear to be mutually upside-down. :-)

XML is another example. Inside a Python program, the most
natural way to represent an XML document is as a character string.
Your way, embedding base64 in it would require converting
the bytes produced by the base64 encoder into a character
string in some way, taking into account the assumed ascii
encoding of said bytes. My way, you just use the result
directly, with no coding involved at all.
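
A sketch of the conversion step at issue, assuming a base64 encoder that
returns bytes (which is how `base64.b64encode` behaves in Python 3). The
element name `blob` and the payload are illustrative:

```python
import base64
import xml.etree.ElementTree as ET

payload = b"\x00\x01binary\xff"

# Assumption: the encoder hands back bytes, not a character string.
encoded_bytes = base64.b64encode(payload)

# The explicit step under discussion: bytes -> character string.
encoded_text = encoded_bytes.decode("ascii")

element = ET.Element("blob")
element.text = encoded_text
document = ET.tostring(element, encoding="unicode")

# Reading the document back reverses both steps.
recovered = base64.b64decode(ET.fromstring(document).text)
```

With an encoder that returned a character string directly, the `decode` line
would simply disappear.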

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiam!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at	   +--------------------------------------+

From aleaxit at  Mon Feb 27 01:55:21 2006
From: aleaxit at (Alex Martelli)
Date: Sun, 26 Feb 2006 16:55:21 -0800
Subject: [Python-Dev] Using and binding relative names (was Re:
	PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <><><><><><><><><><>	<>
	<dtspfk$o7h$> <>
Message-ID: <>

On Feb 26, 2006, at 4:20 PM, Ron Adam wrote:
> (sigh of relief) Ok, so the following example will still be true.

Yep, no danger of dynamic scoping, be certain of that.

> Maybe something explicit like:
>>>> import __main__ as glob

Sure, or the more general ''glob=__import__(__name__)''.

> I'm -1 on adding the intermediate (outer) scopes to functions. I'd  
> even
> like to see closures gone completely, but there's probably a reason  
> they
> are there.  What I like about functions is they are fast, clean up
> behind themselves, and act *exactly* the same on consecutive calls.

Except that the latter assertion is just untrue in Python -- we  
already have a bazillion ways to perform side effects, and, since  
there is no procedure/function distinction, side effects in functions  
are an extremely common thing.  If you're truly keen on having the  
"exactly the same" property, you may want to look into functional  
languages, such as Haskell -- there, all data is immutable, so the  
property does hold (any *indispensable* side effects, e.g. I/O, are  
packed into 'monads' -- but that's another story).

Closures in Python are often extremely handy, as long as you use them  
much as you would in Haskell -- treating data as immutable (and in  
particular outer names as unrebindable). You'd think that functional  
programming fans wouldn't gripe so much about Python closures being  
meant for use like Haskell ones, hm?-)  But, of course, they do want  
to have their closure and rebind names too...


From at  Mon Feb 27 02:27:42 2006
From: at (Almann T. Goo)
Date: Sun, 26 Feb 2006 20:27:42 -0500
Subject: [Python-Dev] Using and binding relative names (was Re: PEP
	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
	<dtspfk$o7h$> <>
Message-ID: <>

On 2/26/06, Ron Adam <rrr at> wrote:
> I'm -1 on adding the intermediate (outer) scopes to functions. I'd even
> like to see closures gone completely, but there's probably a reason they
> are there.

We already have enclosing scopes since Python 2.1--this is PEP 227
(  The proposal is for a
mechanism to allow for re-binding of enclosing scopes which seems like
a logical step to me.  The rest of the scoping semantics would remain
as they are today in Python.


Almann T. Goo at

From rrr at  Mon Feb 27 02:43:49 2006
From: rrr at (Ron Adam)
Date: Sun, 26 Feb 2006 19:43:49 -0600
Subject: [Python-Dev] Using and binding relative names (was Re:
 PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <><><><><><><><><><>	<>
	<dtspfk$o7h$> <>
Message-ID: <>

Alex Martelli wrote:

>> I'm -1 on adding the intermediate (outer) scopes to functions. I'd even
>> like to see closures gone completely, but there's probably a reason they
>> are there.  What I like about functions is they are fast, clean up
>> behind themselves, and act *exactly* the same on consecutive calls.
> Except that the latter assertion is just untrue in Python -- we already 
> have a bazilion ways to perform side effects, and, since there is no 
> procedure/function distinction, side effects in functions are an 
> extremely common thing.  If you're truly keen on having the "exactly the 
> same" property, you may want to look into functional languages, such as 
> Haskell -- there, all data is immutable, so the property does hold (any 
> *indispensable* side effects, e.g. I/O, are packed into 'monads' -- but 
> that's another story).

True, I should have said mostly act the same when using them in a common 
and direct way. I know we can change all sorts of behaviors fairly 
easily if we choose to.

> Closures in Python are often extremely handy, as long as you use them 
> much as you would in Haskell -- treating data as immutable (and in 
> particular outer names as unrebindable). You'd think that functional 
> programming fans wouldn't gripe so much about Python closures being 
> meant for use like Haskell ones, hm?-)  But, of course, they do want to 
> have their closure and rebind names too...

So far everywhere I've seen closures used, a class would work.  But 
maybe not as conveniently or as fast?

On the other side of the coin there are those who want to get rid of the 
"self" variable in classes also, which would cause classes to look more 
like nested functions.

Haskell sounds interesting, maybe I'll try a bit of it sometime.  But I 
like Python. ;-)


From aleaxit at  Mon Feb 27 03:54:19 2006
From: aleaxit at (Alex Martelli)
Date: Sun, 26 Feb 2006 18:54:19 -0800
Subject: [Python-Dev] Using and binding relative names (was Re:
	PEP	forBetter Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <><><><><><><><><><>	<>
	<dtspfk$o7h$> <>
Message-ID: <>

On Feb 26, 2006, at 5:43 PM, Ron Adam wrote:
> So far everywhere I've seen closures used, a class would work.  But
> maybe not as conveniently or as fast?

Yep.  In this, closures are like generators: much more convenient  
than purpose-built classes, but not as general.
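
A sketch of the generator half of that comparison, using the Python 3 spelling
of the iterator protocol (`__next__`); both names are invented for
illustration:

```python
class Countdown:
    """Purpose-built iterator class: explicit state and protocol methods."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

def countdown(n):
    """Generator: the same iteration, with the state kept in a local."""
    while n > 0:
        yield n
        n -= 1

from_class = list(Countdown(3))
from_generator = list(countdown(3))
```

The class is longer but can grow extra methods, such as a reset; the generator
cannot.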

> Haskel sounds interesting, maybe I'll try a bit of it sometime.  But I
> like Python. ;-)

So do I, so do many others: the first EuroHaskell was held the day  
right after a EuroPython, in the same venue (a Swedish University,  
Chalmers) -- that was convenient because so many delegates were  
interested in both languages, see.

We stole list comprehensions and genexps from Haskell (the idea and  
most of the semantics, not the syntax, which was Pythonized  
relentlessly) -- and the two languages share the concept of  
indentation being significant for grouping, with some minor  
differences in details since they developed these concepts  
independently. Hey, what more do you need?-)


From tim.peters at  Mon Feb 27 06:36:20 2006
From: tim.peters at (Tim Peters)
Date: Sun, 26 Feb 2006 23:36:20 -0600
Subject: [Python-Dev] Current trunk test failures
Message-ID: <>

The buildbot shows that the debug-build test_grammar is dying with a C
assert failure on all boxes.

In case it helps, in a Windows release build test_transformer is also failing:

test test_transformer failed -- Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 16, in
    a = transformer.parse(s)
  File "C:\Code\python\lib\compiler\", line 52, in parse
    return Transformer().parsesuite(buf)
  File "C:\Code\python\lib\compiler\", line 129, in parsesuite
    return self.transform(parser.suite(text))
  File "C:\Code\python\lib\compiler\", line 125, in transform
    return self.compile_node(tree)
  File "C:\Code\python\lib\compiler\", line 158, in compile_node
    return self.file_input(node[1:])
  File "C:\Code\python\lib\compiler\", line 189, in file_input
    self.com_append_stmt(stmts, node)
  File "C:\Code\python\lib\compiler\", line 1036, in
    result = self.lookup_node(node)(node[1:])
  File "C:\Code\python\lib\compiler\", line 305, in stmt
    return self.com_stmt(nodelist[0])
  File "C:\Code\python\lib\compiler\", line 1029, in com_stmt
    result = self.lookup_node(node)(node[1:])
  File "C:\Code\python\lib\compiler\", line 315, in simple_stmt
    self.com_append_stmt(stmts, nodelist[i])
  File "C:\Code\python\lib\compiler\", line 1036, in
    result = self.lookup_node(node)(node[1:])
  File "C:\Code\python\lib\compiler\", line 305, in stmt
    return self.com_stmt(nodelist[0])
  File "C:\Code\python\lib\compiler\", line 1029, in com_stmt
    result = self.lookup_node(node)(node[1:])
  File "C:\Code\python\lib\compiler\", line 353, in expr_stmt
    exprNode = self.lookup_node(en)(en[1:])
  File "C:\Code\python\lib\compiler\", line 763, in lookup_node
    return self._dispatch[node[0]]
KeyError: 324

Also test_parser:

C:\Code\python\PCbuild>python  -E -tt ../lib/test/ -v test_parser
test_assert (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
(test.test_parser.RoundtripLegalSyntaxTestCase) ... ok
test_class_defs (test.test_parser.RoundtripLegalSyntaxTestCase) ... ok
test_expressions (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
test_function_defs (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
(test.test_parser.RoundtripLegalSyntaxTestCase) ... ok
test_pep263 (test.test_parser.RoundtripLegalSyntaxTestCase) ... ok
test_print (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
test_simple_assignments (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
(test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
test_simple_expression (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
test_yield_statement (test.test_parser.RoundtripLegalSyntaxTestCase) ... FAIL
test_a_comma_comma_c (test.test_parser.IllegalSyntaxTestCase) ... ok
test_illegal_operator (test.test_parser.IllegalSyntaxTestCase) ... ok
test_illegal_yield_1 (test.test_parser.IllegalSyntaxTestCase) ... ok
test_illegal_yield_2 (test.test_parser.IllegalSyntaxTestCase) ... ok
test_junk (test.test_parser.IllegalSyntaxTestCase) ... ok
test_malformed_global (test.test_parser.IllegalSyntaxTestCase) ... ok
test_print_chevron_comma (test.test_parser.IllegalSyntaxTestCase) ... ok
test_compile_error (test.test_parser.CompileTestCase) ... ok
test_compile_expr (test.test_parser.CompileTestCase) ... ok
test_compile_suite (test.test_parser.CompileTestCase) ... ok

FAIL: test_assert (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 180, in test_assert
    self.check_suite("assert alo < ahi and blo < bhi\n")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'assert alo < ahi and blo <
bhi\n': Expected node type 303, got 302.

FAIL: test_expressions (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 50, in test_expressions
  File "C:\Code\python\lib\test\", line 25, in check_expr
    self.roundtrip(parser.expr, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'foo(1)': Expected node type 303, got 302.

FAIL: test_function_defs (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 119, in test_function_defs
    self.check_suite("def f(foo=bar): pass")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'def f(foo=bar): pass': Expected
node type 303, got 302.

FAIL: test_print (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 86, in test_print
    self.check_suite("print 1")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'print 1': Expected node type 303, got 302.

FAIL: test_simple_assignments (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 97, in
    self.check_suite("a = b")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'a = b': Expected node type 303, got 302.

FAIL: test_simple_augmented_assignments
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 101, in
    self.check_suite("a += b")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'a += b': Expected node type 303, got 302.

FAIL: test_simple_expression (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 94, in
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'a': Expected node type 303, got 302.

FAIL: test_yield_statement (test.test_parser.RoundtripLegalSyntaxTestCase)
Traceback (most recent call last):
  File "C:\Code\python\lib\test\", line 31, in
    self.check_suite("def f(): yield 1")
  File "C:\Code\python\lib\test\", line 28, in check_suite
    self.roundtrip(parser.suite, s)
  File "C:\Code\python\lib\test\", line 19, in roundtrip"could not roundtrip %r: %s" % (s, why))
AssertionError: could not roundtrip 'def f(): yield 1': Expected node
type 303, got 302.

Ran 22 tests in 0.015s

FAILED (failures=8)
test test_parser failed -- errors occurred; run in verbose mode for details
1 test failed:

and also test_compiler.

From stephen at  Mon Feb 27 06:59:44 2006
From: stephen at (Stephen J. Turnbull)
Date: Mon, 27 Feb 2006 14:59:44 +0900
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <> (Greg Ewing's message of
	"Mon, 27 Feb 2006 13:47:25 +1300")
References: <>
Message-ID: <>

>>>>> "Greg" == Greg Ewing <greg.ewing at> writes:

    Greg> Stephen J. Turnbull wrote:

    >> I gave you one, MIME processing in email

    Greg> If implementing a mime packer is really the only use case
    Greg> for base64, then it might as well be removed from the
    Greg> standard library, since 99.99999% of all programmers will
    Greg> never touch it.  I don't have any real-life use cases for
    Greg> base64 that a non-mime-implementer might come across, so all
    Greg> I can do is imagine what shape such a use case might have.

I guess we don't have much to talk about, then.

    >> Give me a use case where it matters practically that the output
    >> of the base64 codec be Python unicode characters rather than
    >> 8-bit ASCII characters.

    Greg> I'd be perfectly happy with ascii characters, but in Py3k,
    Greg> the most natural place to keep ascii characters will be in
    Greg> character strings, not byte arrays.

Natural != practical.

Anyway, I disagree, and I've lived with the problems that come with an
environment that mixes objects with various underlying semantics into
a single "text stream" for a decade and a half.

That doesn't make me authoritative, but as we agree to disagree, I
hope you'll keep in mind that someone with real-world experience that
is somewhat relevant[1] to the issue doesn't find that natural at all.

    Greg> Since the Unicode character set is a superset of the ASCII
    Greg> character set, it doesn't seem unreasonable that they could
    Greg> also be thought of as Unicode characters.

I agree.  However, as soon as I go past that intuition to thinking
about what that implies for _operations_ on the base64 string, it
begins to seem unreasonable, unnatural, and downright dangerous.  The
base64 string is a representation of an object that doesn't have text
semantics.  Nor do base64 strings have text semantics: they can't even
be concatenated as text (the pad character '=' is typically a syntax
error in a profile of base64, except as terminal padding).  So if you
wish to concatenate the underlying objects, the base64 strings must be
decoded, concatenated, and re-encoded in the general case.  IMO it's
not worth preserving the very superficial coincidence of "character
representation" in the face of such semantics.
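
The concatenation point can be checked with the standard base64 module.
This is only a sketch using present-day Python 3 bytes literals, not
anything that existed when this was written:

```python
import base64

first = base64.b64encode(b"a")
second = base64.b64encode(b"b")

# Gluing the encoded forms together leaves padding in the middle.
naive = first + second

# The general recipe: decode, concatenate, re-encode.
correct = base64.b64encode(
    base64.b64decode(first) + base64.b64decode(second)
)
```

Here naive and correct differ, which is the sense in which base64
strings do not concatenate the way ordinary text does.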

I think the fact that favoring the coincidence of representation
leads you to also deprecate the very natural use of the codec API to
implement and understand base64 is indicative of a deep problem with
the idea of implementing base64 as bytes->unicode.

[1]  That "somewhat" is intended literally; my specialty is working
with codecs for humans in Emacs, but I've also worked with more
abstract codecs such as base64 in contexts like email, in both LISP
and Python.

School of Systems and Information Engineering
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

From greg.ewing at  Mon Feb 27 10:40:08 2006
From: greg.ewing at (Greg Ewing)
Date: Mon, 27 Feb 2006 22:40:08 +1300
Subject: [Python-Dev] Using and binding relative names (was Re: PEP
 for Better Control of Nested Lexical Scopes)
In-Reply-To: <>
References: <>
	<dtspfk$o7h$> <>
Message-ID: <>

Alex Martelli wrote:

> We stole list comprehensions and genexps from Haskell

The idea predates Haskell, I think. I first saw it in
Miranda, and it may have come from something even
earlier -- SETL, maybe?


From greg.ewing at  Mon Feb 27 12:41:25 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 28 Feb 2006 00:41:25 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

>     Greg> I'd be perfectly happy with ascii characters, but in Py3k,
>     Greg> the most natural place to keep ascii characters will be in
>     Greg> character strings, not byte arrays.
> Natural != practical.

That seems to be another thing we disagree about --
to me it seems both natural *and* practical.

The whole business of stuffing binary data down a
text channel is a practicality-beats-purity kind of
thing. You wouldn't do it if you had a real binary
channel available, but if you don't, it's better
than nothing.

> The base64 string is a representation of an object
> that doesn't have text semantics.

But the base64 string itself *does* have text
semantics. That's the whole point of base64 --
to represent a non-text object *using* text.

To me this is no different than using a string
of decimal digit characters to represent an
integer, or a string of hexadecimal digit
characters to represent a bit pattern. Would
you say that those are not text, either?

What about XML? What would you consider the
proper data type for an XML document to be
inside a Python program -- bytes or text?
I'm genuinely interested in your answer to
that, because I'm trying to understand where
you draw the line between text and non-text.

You seem to want to reserve the term "text" for
data that doesn't ever have to be understood
even a little bit by a computer program, but that
seems far too restrictive to me, and a long
way from established usage.

> Nor do base64 strings have text semantics: they can't even
> be concatenated as text ...  So if you
> wish to concatenate the underlying objects, the base64 strings must be
> decoded, concatenated, and re-encoded in the general case.

You can't add two integers by concatenating
their base-10 character representation, either,
but I wouldn't take that as an argument against
putting decimal numbers into text files.

Also, even if we follow your suggestion and store
our base64-encoded data in byte arrays, we *still*
wouldn't be able to concatenate the original data
just by concatenating those byte arrays. So this
argument makes no sense either way.

> IMO it's not worth preserving the very superficial
> coincidence of "character representation"

I disagree entirely that it's superficial. On the
contrary, it seems to me to be the very essence of
what base64 is all about.

If there's any "coincidence of representation" it's
in the idea of storing the result as ASCII bit patterns
in a byte array, on the assumption that that's probably
how they're going to end up being represented down the
line.

That assumption could be very wrong. What happens if
it turns out they really need to be encoded as UTF-16,
or as EBCDIC? All hell breaks loose, as far as I can
see, unless the programmer has kept very firmly in
mind that there is an implicit ASCII encoding involved.

It's exactly to avoid the need for those kinds of
mental gymnastics that Py3k will have a unified,
encoding-agnostic data type for all character strings.
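
A rough illustration of that argument in present-day Python 3, where
b64encode happens to return bytes: once the result is held as a
character string, the wire encoding is chosen explicitly at the last
step. The payload and the choice of cp037 as an EBCDIC code page are
assumptions made for the example.

```python
import base64

payload = bytes([0, 255, 16, 32])

# b64encode returns ASCII bytes; decoding makes the text explicit.
as_text = base64.b64encode(payload).decode("ascii")

# The same characters, serialised for two different channels.
utf16_wire = as_text.encode("utf-16-le")
ebcdic_wire = as_text.encode("cp037")

# The receiver decodes the channel encoding first, then the base64.
recovered = base64.b64decode(ebcdic_wire.decode("cp037"))
```

Neither wire form equals the ASCII bytes, yet both carry the same
base64 text.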

> I think the fact that favoring the coincidence of representation
> leads you to also deprecate the very natural use of the codec API to
> implement and understand base64 is indicative of a deep problem with
> the idea of implementing base64 as bytes->unicode.

Not sure I'm following you. I don't object to
implementing base64 as a codec, only to exposing
it via the same interface as the "real" unicode
codecs like utf8, etc. I thought we were in
agreement about that.

If you're thinking that the mere fact its input
type is bytes and its output type is characters is
going to lead to its mistakenly appearing via that
interface, that would be a bug or design flaw in
the mechanism that controls which codecs appear
via that interface. It needs to be controlled by
something more than just the input and output
types.


From ncoghlan at  Mon Feb 27 13:55:05 2006
From: ncoghlan at (Nick Coghlan)
Date: Mon, 27 Feb 2006 22:55:05 +1000
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<015901c6395d$3b9fd4b0$2e2a960a@RaymondLaptop1>
Message-ID: <>

Greg Ewing wrote:
> Raymond Hettinger wrote:
>> Code that 
>> uses next() is more understandable, friendly, and readable without the 
>> walls of underscores.
> There wouldn't be any walls of underscores, because
>    y = x.next()
> would become
>    y = next(x)
> The only time you would need to write underscores is
> when defining a __next__ method. That would be no worse
> than defining an __init__ or any other special method,
> and has the advantage that it clearly marks the method
> as being special.

I wouldn't mind seeing one of the early ideas from PEP 340 being resurrected 
some day, such that the signature for the special method was "__next__(self, 
input)" and for the builtin "next(iterator, input=None)"

That would go hand in hand with the idea of allowing the continue statement to 
accept an argument though.
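
A sketch of how such a two-argument call might behave, built on the
generator send() method. The helper name next_with_input is hypothetical
and stands in for the proposed builtin; it is not an existing API, and
it only works for iterators that have a send() method.

```python
def next_with_input(iterator, value=None):
    """Hypothetical stand-in for the proposed next(iterator, input=None)."""
    if value is None:
        return next(iterator)        # plain advance
    return iterator.send(value)      # advance and pass a value in


def running_total():
    total = 0
    while True:
        received = yield total
        if received is not None:
            total += received


totals = running_total()
start = next_with_input(totals)          # primes the generator
after_five = next_with_input(totals, 5)
after_two = next_with_input(totals, 2)
unchanged = next_with_input(totals)      # no input, total stays put
```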


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From fredrik at  Mon Feb 27 16:34:00 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 27 Feb 2006 16:34:00 +0100
Subject: [Python-Dev] PEP 332 revival in coordination with pep 349?
	[Was:Re: release plan for 2.5 ?]
References: <>
Message-ID: <dtv659$gtk$>

Just van Rossum wrote:

> > If bytes support the buffer interface, we get another interesting
> > issue -- regular expressions over bytes. Brr.
> We already have that:
>   >>> import re, array
>   >>> re.search('\2', array.array('B', [1, 2, 3, 4])).group()
>   array('B', [2])
>   >>>
> Not sure whether to blame array or re, though...

SRE.  iirc, the design rationale was to support RE over mmap'ed regions.
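
For illustration, the same kind of search in present-day Python 3, over
an array and over an anonymous mmap region. This is a sketch: the
region size and contents are made up, and only match positions are
inspected, since the type returned by group() has varied across
versions.

```python
import array
import mmap
import re

# A bytes pattern can be matched against any buffer of bytes.
data = array.array("B", [1, 2, 3, 4])
array_match = re.search(b"\x02", data)
array_span = array_match.span()

# The mmap case mentioned as the design rationale.
with mmap.mmap(-1, 16) as region:
    region[:5] = b"hello"
    mmap_span = re.search(b"l+", region).span()
```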


From guido at  Mon Feb 27 17:02:12 2006
From: guido at (Guido van Rossum)
Date: Mon, 27 Feb 2006 10:02:12 -0600
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2/27/06, Nick Coghlan <ncoghlan at> wrote:
> I wouldn't mind seeing one of the early ideas from PEP 340 being resurrected
> some day, such that the signature for the special method was "__next__(self,
> input)" and for the builtin "next(iterator, input=None)"
> That would go hand in hand with the idea of allowing the continue statement to
> accept an argument though.

Yup. The continue thing we might add in 2.6. The __next__ API in 3.0.

--Guido van Rossum (home page:

From thomas at  Mon Feb 27 17:33:31 2006
From: thomas at (Thomas Wouters)
Date: Mon, 27 Feb 2006 17:33:31 +0100
Subject: [Python-Dev] Current trunk test failures
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Feb 26, 2006 at 11:36:20PM -0600, Tim Peters wrote:

> The buildbot shows that the debug-build test_grammar is dying with a C
> assert failure on all boxes.
> In case it helps, in a Windows release build test_transformer is also failing:

All build/test failures introduced by the PEP 308 patch should be fixed
(thanks, Martin!)

Thomas Wouters <thomas at>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From janssen at  Mon Feb 27 18:38:41 2006
From: janssen at (Bill Janssen)
Date: Mon, 27 Feb 2006 09:38:41 PST
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: Your message of "Sun, 26 Feb 2006 16:47:25 PST."
Message-ID: <06Feb27.093841pst."58633">

> If implementing a mime packer is really the only use case
> for base64, then it might as well be removed from the
> standard library, since 99.99999% of all programmers will
> never touch it. Those that do will need to have boned up

I use it quite a bit for image processing (converting to and from the
"data:" URL form), and various checksum applications (converting SHA
into a string).
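
Both uses can be sketched with the standard library. The helper names
to_data_url and digest_as_text are invented for this example, and the
sample bytes are just the PNG signature rather than a real image.

```python
import base64
import hashlib


def to_data_url(raw, mime_type):
    """Build a data: URL carrying raw bytes as base64 text."""
    encoded = base64.b64encode(raw).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def digest_as_text(raw):
    """Return the SHA-1 digest of raw as a base64 string."""
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


pixels = b"\x89PNG\r\n\x1a\n"
url = to_data_url(pixels, "image/png")
checksum = digest_as_text(pixels)
```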


From mal at  Mon Feb 27 18:40:34 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 27 Feb 2006 18:40:34 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
Message-ID: <>

Microsoft has recently released their express version of the Visual C++.
Given that this version is free for everyone, wouldn't it make sense
to ship Python 2.5 compiled with this version ?!

I suppose this would make compiling extensions easier for people
who don't have a standard VC++ .NET installed.

Note: This is just a thought - I haven't looked into the consequences
of building with VC8 yet, e.g. from the list of pre-requisites,
it's possible that .NET 2.0 would become a requirement.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 27 2006)
>>> Python/Zope Consulting and Support ...
>>> mxODBC.Zope.Database.Adapter ...   
>>> mxODBC, mxDateTime, mxTextTools ...

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::

From aleaxit at  Mon Feb 27 18:51:40 2006
From: aleaxit at (Alex Martelli)
Date: Mon, 27 Feb 2006 09:51:40 -0800
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/27/06, M.-A. Lemburg <mal at> wrote:
> Microsoft has recently released their express version of the Visual C++.
> Given that this version is free for everyone, wouldn't it make sense
> to ship Python 2.5 compiled with this version ?!
> I suppose this would make compiling extensions easier for people
> who don't have a standard VC++ .NET installed.

It would sure be nice for people like me with "occasional dabbler in
Windows" status, so, selfishly, I'd be all in favor.  However...:

What I hear from the rumor mill (not perhaps a reliable source) is a
bit discouraging about the stability of VS2005 (e.g. internal
rebellion at MS in which groups which need to ship a lot of code
pushed back against any attempt to make them use VS2005, and managed
to win the internal fight and stick with VS2003), but I don't know if
any such worry applies to something as simple as the mere compilation
of C code...

> Note: This is just a thought - I haven't looked into the consequences
> of building with VC8 yet, e.g. from the list of pre-requisites,
> it's possible that .NET 2.0 would become a requirement.

You mean, to RUN vc8-compiled Python?!  That would be perhaps the
first C compiler ever unable to produce "native", stand-alone code,
wouldn't it?


From phil at  Mon Feb 27 19:00:50 2006
From: phil at (Phil Thompson)
Date: Mon, 27 Feb 2006 18:00:50 +0000
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

On Monday 27 February 2006 5:51 pm, Alex Martelli wrote:
> On 2/27/06, M.-A. Lemburg <mal at> wrote:
> > Microsoft has recently released their express version of the Visual C++.
> > Given that this version is free for everyone, wouldn't it make sense
> > to ship Python 2.5 compiled with this version ?!
> >
> >
> >
> > I suppose this would make compiling extensions easier for people
> > who don't have a standard VC++ .NET installed.
> It would sure be nice for people like me with "occasional dabbler in
> Windows" status, so, selfishly, I'd be all in favor.  However...:
> What I hear from the rumor mill (not perhaps a reliable source) is a
> bit discouraging about the stability of VS2005 (e.g. internal
> rebellion at MS in which groups which need to ship a lot of code
> pushed back against any attempt to make them use VS2005, and managed
> to win the internal fight and stick with VS2003), but I don't know if
> any such worry applies to something as simple as the mere compilation
> of C code...

...but some extension modules are 500,000 lines of C++.


From benji at  Mon Feb 27 19:42:32 2006
From: benji at (Benji York)
Date: Mon, 27 Feb 2006 13:42:32 -0500
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

M.-A. Lemburg wrote:
> Microsoft has recently released their express version of the Visual C++.
> Given that this version is free for everyone, wouldn't it make sense
> to ship Python 2.5 compiled with this version ?!

The express editions are only "free" until November 7th:

Benji York

From martin at  Mon Feb 27 20:58:46 2006
From: martin at (martin at
Date: Mon, 27 Feb 2006 20:58:46 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

Zitat von "M.-A. Lemburg" <mal at>:

> Microsoft has recently released their express version of the Visual C++.
> Given that this version is free for everyone, wouldn't it make sense
> to ship Python 2.5 compiled with this version ?!

Not in my opinion. People have also commented that they want to continue
with this version (i.e. 7.1.). I actually hope that Python can skip
VS 2005, and go right away to the next version.


From mal at  Mon Feb 27 21:03:06 2006
From: mal at (M.-A. Lemburg)
Date: Mon, 27 Feb 2006 21:03:06 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

Alex Martelli wrote:
> On 2/27/06, M.-A. Lemburg <mal at> wrote:
>> Microsoft has recently released their express version of the Visual C++.
>> Given that this version is free for everyone, wouldn't it make sense
>> to ship Python 2.5 compiled with this version ?!
>> I suppose this would make compiling extensions easier for people
>> who don't have a standard VC++ .NET installed.
> It would sure be nice for people like me with "occasional dabbler in
> Windows" status, so, selfishly, I'd be all in favor.  However...:
> What I hear from the rumor mill (not perhaps a reliable source) is a
> bit discouraging about the stability of VS2005 (e.g. internal
> rebellion at MS in which groups which need to ship a lot of code
> pushed back against any attempt to make them use VS2005, and managed
> to win the internal fight and stick with VS2003), but I don't know if
> any such worry applies to something as simple as the mere compilation
> of C code...

Should I read this as: VC8 is unstable ?

Perhaps that's the reason they decided to give it away for
free for the first year.

>> Note: This is just a thought - I haven't looked into the consequences
>> of building with VC8 yet, e.g. from the list of pre-requisites,
>> it's possible that .NET 2.0 would become a requirement.
> You mean, to RUN vc8-compiled Python?!  That would be perhaps the
> first C compiler ever unable to produce "native", stand-alone code,
> wouldn't it?

Well, the code that VC7 generates relies on MSVCR71.DLL
which appears to be part of .NET 1.1. It's hard to tell
since I don't have a system around without .NET on it.

Marc-Andre Lemburg


From martin at  Mon Feb 27 21:12:35 2006
From: martin at (martin at
Date: Mon, 27 Feb 2006 21:12:35 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

Zitat von "M.-A. Lemburg" <mal at>:

> > What I hear from the rumor mill (not perhaps a reliable source) is a
> > bit discouraging about the stability of VS2005 (e.g. internal
> > rebellion at MS in which groups which need to ship a lot of code
> > pushed back against any attempt to make them use VS2005, and managed
> > to win the internal fight and stick with VS2003), but I don't know if
> > any such worry applies to something as simple as the mere compilation
> > of C code...
> Should I read this as: VC8 is unstable ?

Not sure how Alex interprets this; I think that one of the good reasons
not to use VS2005 is that they managed to "break" the C library: change
it from standard C in an incompatible way that they think is better for
the end user. One of these changes broke Python; we now have a work-around
for this breakage.

In addition to changing the library behaviour, they also produce tons
of warnings about perfectly correct code.

> Well, the code that VC7 generates relies on MSVCR71.DLL
> which appears to be part of .NET 1.1. It's hard to tell
> since I don't have a system around without .NET on it.

I don't believe .NET 1.1 ships msvcr71.dll. Actually, Microsoft
discourages installing msvcr into system32, so that would be against
their own guidelines.


From jason.orendorff at  Mon Feb 27 21:14:00 2006
From: jason.orendorff at (Jason Orendorff)
Date: Mon, 27 Feb 2006 15:14:00 -0500
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
	<dto68f$3ca$> <>
	<dtob1t$f52$> <>
Message-ID: <>

Neil Schemenauer wrote:
> Ron Adam <rrr at> wrote:
>> Why was it decided that the unicode encoding argument should be ignored
>> if the first argument is a string?  Wouldn't an exception be better
>> rather than give the impression it does something when it doesn't?
> From the PEP:
>     There is no sane meaning that the encoding can have in that
>     case.  str objects *are* byte arrays and they know nothing about
>     the encoding of character data they contain.  We need to assume
>     that the programmer has provided str object that already uses
>     the desired encoding.
> Raising an exception would be a valid option.  However, passing the
> string through unchanged makes the transition from str to bytes
> easier.

Does it?

I am quite certain the bytes PEP is dead wrong on this.  It should be changed.

Suppose I have code like this:

    def faz(s):
        return s.encode('utf-16be')

If I want to transition from str to bytes, how should I change this code?

    def faz(s):
        return bytes(s, 'utf-16be')  # OOPS - subtle bug

This silently does the wrong thing when s is a str.  If I hadn't read
the PEP, I would confidently assume that bytes(str, encoding) ==
bytes(unicode, encoding), modulo the default encoding.  I'd be wrong. 
But there's a really good reason to think this.  Wherever a unicode
argument is expected in Python 2.x, you can pass a str and it'll be
silently decoded.  This is an extremely strong convention.  It's even
embedded in PyArg_ParseTuple().  I can't think of any exceptions to
the rule, offhand.

Is this special case special enough to break the rules?  Arguable.  I
suspect not.  But even if so, allowing the breakage to pass silently
is surely a mistake.  It should just refuse the temptation to guess,
and throw an exception--right?

Now you may be thinking:  the str/unicode duality of text, and the
bytes/text duality of the "str" type, are *bad* things, and we're
trying to get rid of them.  True.  My view is, we'll be rid of them in
3.0 regardless.  In the meantime, there is no point trying to pretend
that 2.0 "str" is bytes and not text.  It just ain't so; you'll only
succeed in confusing people and causing bugs.  (And in 3.0 you're
going to turn around and tell them "str" *is* text!)

Good APIs make simple, sensible, comprehensible promises.  I like
these promises:
  - bytes(arg) works like array.array('b', arg)
  - bytes(arg1, arg2) works like bytes(arg1.encode(arg2))

I dislike these promises:
  - bytes(s, [ignored]), where s is a str, works like array.array('b', s)
  - bytes(u, [encoding]), where u is a unicode,
        works like bytes(u.encode(encoding))

It seems more Pythonic to differentiate based on the number of
arguments, rather than the type.
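
For comparison only: this shows the bytes constructor as it behaves in
present-day Python 3, not the 2006 pre-PEP under discussion. An
encoding is accepted only together with a text argument, and it is
refused rather than silently ignored when the first argument is already
bytes.

```python
def faz(s):
    # Text plus an explicit encoding: the unambiguous case.
    return bytes(s, "utf-16-be")


encoded = faz("hi")

# Passing already-encoded data with an encoding is rejected.
try:
    faz(b"hi")
except TypeError:
    rejected = True
else:
    rejected = False
```

So the "subtle bug" case raises instead of passing the data through
unchanged.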


P.S.  As someone who gets a bit agitated when the word "Pythonic" or
the Zen of Python is taken in vain, I'd like to know if anyone feels
I've done so here, so I can properly apologize.  Thanks.

From trentm at  Mon Feb 27 22:18:07 2006
From: trentm at (Trent Mick)
Date: Mon, 27 Feb 2006 13:18:07 -0800
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <>
References: <>
Message-ID: <>

[Alex Martelli wrote]
> What I hear from the rumor mill (not perhaps a reliable source) is a
> bit discouraging about the stability of VS2005 (e.g. internal
> rebellion at MS in which groups which need to ship a lot of code
> pushed back against any attempt to make them use VS2005, and managed
> to win the internal fight and stick with VS2003), but I don't know if
> any such worry applies to something as simple as the mere compilation
> of C code...

As a (perhaps significant) datapoint: the Mozilla guys are moving to
building with VS2005. That's lots of C++ and widely run -- though
probably not the C runtime so much.


Trent Mick
TrentM at

From fredrik at  Mon Feb 27 22:53:26 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 27 Feb 2006 22:53:26 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
References: <>
Message-ID: <dtvscp$9mj$>

M.-A. Lemburg wrote:

> Microsoft has recently released their express version of the Visual C++.
> Given that this version is free for everyone, wouldn't it make sense
> to ship Python 2.5 compiled with this version ?!
> I suppose this would make compiling extensions easier for people
> who don't have a standard VC++ .NET installed.

it also causes more work for those of us who provide ready-made Windows
binaries for more than just the latest and greatest Python release.

if I could choose, I'd use the same compiler for at least one more release...


From martin at  Mon Feb 27 22:57:41 2006
From: martin at (martin at
Date: Mon, 27 Feb 2006 22:57:41 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <dtvscp$9mj$>
References: <> <dtvscp$9mj$>
Message-ID: <>

Zitat von Fredrik Lundh <fredrik at>:

> it also causes more work for those of us who provide ready-made Windows
> binaries for more than just the latest and greatest Python release.
> if I could choose, I'd use the same compiler for at least one more release...

I find this argument convincing.


From fredrik at  Mon Feb 27 23:21:19 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 27 Feb 2006 23:21:19 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
References: <> <dtvscp$9mj$>
Message-ID: <dtvu11$fjn$>

> if I could choose, I'd use the same compiler for at least one more release...

to clarify, the guideline should be "does the new compiler version add
something important ?", rather than just "is there a new version ?"


From fredrik at  Mon Feb 27 23:46:32 2006
From: fredrik at (Fredrik Lundh)
Date: Mon, 27 Feb 2006 23:46:32 +0100
Subject: [Python-Dev] Translating docs
References: <>
Message-ID: <dtvvga$kj9$>

Facundo Batista wrote:

> After a short talk with Raymond at breakfast yesterday, I proposed in
> PyAr the idea of starting to translate the Library Reference.
> You'll agree with me that this is a BIG effort. But it's not only big, it's dynamic!
> So, we decided that we need a system that manages the translations
> for us. And it'd be a good idea for the system to be available
> for translations into other languages.
> One of the guys proposed to use Launchpad (
> The question is, is it ok to use a third-party system for this
> initiative? Or do you (we) prefer to host it in-house? Has someone already
> thought of this?

localized editions (with editing support) are definitely within the scope
for a more dynamic library reference platform [1].

with a more granular structure, you can easily track changes on a
method/function level, and dynamically generate pages that suit
the reader ("official english for version X.Y", "experimental norwegian",
"mixed latest english/german", etc).

(but until we get there (if ever), I see no reason not to use an existing
infrastructure, of course).



From martin at  Mon Feb 27 23:57:13 2006
From: martin at (martin at
Date: Mon, 27 Feb 2006 23:57:13 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <dtvu11$fjn$>
References: <> <dtvscp$9mj$>
Message-ID: <>

Zitat von Fredrik Lundh <fredrik at>:

> to clarify, the guideline should be "does the new compiler version add
> something important ?", rather than just "is there a new version ?"

In this specific case, the new thing added is the availability of Visual Studio
Express. Whether this is important, and outweighs the disadvantages, I don't

In addition, I'm uncertain whether this is a new feature. I thought you could
get the VS 2003 compiler (VC 7.1) with the .NET 1.1 SDK. But maybe I'm


From nnorwitz at  Tue Feb 28 00:35:58 2006
From: nnorwitz at (Neal Norwitz)
Date: Mon, 27 Feb 2006 17:35:58 -0600
Subject: [Python-Dev] Making ascii the default encoding
Message-ID: <>

PEP 263 states that in Phase 2 the default encoding will be set to
ASCII.  Although the PEP is marked final, this isn't actually
implemented.  The warning about using non-ASCII characters started in
2.3.  Does anyone think we shouldn't enforce the default being ASCII?

This means if an # -*- coding: ... -*- is not set and non-ASCII
characters are used, an error will be generated.
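
A rough sketch of the rule being proposed, assuming the declaration pattern given in PEP 263. `check_source` and its error message are hypothetical and only model the check (a UTF-8 byte order mark is not handled); this is not the compiler's actual code.

```python
import re

# Declaration pattern from PEP 263; it is only looked for on the first two lines.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def check_source(source: bytes) -> None:
    """Raise SyntaxError for non-ASCII bytes in a file with no coding line."""
    first_two = source.splitlines()[:2]
    if any(CODING_RE.match(line) for line in first_two):
        return  # an explicit declaration is present, nothing to enforce here
    try:
        source.decode("ascii")
    except UnicodeDecodeError as exc:
        raise SyntaxError(
            "non-ASCII character found, but no encoding declared"
        ) from exc
```

Pure-ASCII files and files with a coding line pass unchanged; only the combination of non-ASCII bytes and no declaration is rejected.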


From bencvt at  Tue Feb 28 00:50:28 2006
From: bencvt at (Ben Cartwright)
Date: Mon, 27 Feb 2006 18:50:28 -0500
Subject: [Python-Dev] str.count is slow
In-Reply-To: <>
References: <>
Message-ID: <>

From comp.lang.python:
chrisperkins99 at wrote:
> It seems to me that str.count is awfully slow.  Is there some reason
> for this?
> Evidence:
> ######## str.count time test ########
> import string
> import time
> import array
> s = string.printable * int(1e5) # 10**7 character string
> a = array.array('c', s)
> u = unicode(s)
> RIGHT_ANSWER = s.count('a')
> def main():
>     print 'str:    ', time_call(s.count, 'a')
>     print 'array:  ', time_call(a.count, 'a')
>     print 'unicode:', time_call(u.count, 'a')
> def time_call(f, *a):
>     start = time.clock()
>     assert RIGHT_ANSWER == f(*a)
>     return time.clock()-start
> if __name__ == '__main__':
>     main()
> ###### end ########
> On my machine, the output is:
> str:     0.29365715475
> array:   0.448095498171
> unicode: 0.0243757237303
> If a unicode object can count characters so fast, why should an str
> object be ten times slower?  Just curious, really - it's still fast
> enough for me (so far).
> This is with Python 2.4.1 on WinXP.
> Chris Perkins

Your evidence points to some unoptimized code in the underlying C
implementation of Python.  As such, this should probably go to the
python-dev list (

The problem is that the C library function memcmp is slow, and
str.count calls it frequently.  See lines 2165+ in stringobject.c
(inside function string_count):

        r = 0;
        while (i < m) {
                if (!memcmp(s+i, sub, n)) {
                        r++;
                        i += n;
                } else {
                        i++;
                }
        }

This could be optimized as:

        r = 0;
        while (i < m) {
                if (s[i] == *sub && !memcmp(s+i, sub, n)) {
                        r++;
                        i += n;
                } else {
                        i++;
                }
        }

This tactic typically avoids most (sometimes all) of the calls to
memcmp.  Other string search functions, including unicode.count,
unicode.index, and str.index, use this tactic, which is why you see
unicode.count performing better than str.count.

The above might be optimized further for cases such as yours, where a
single character appears many times in the string:

        r = 0;
        if (n == 1) {
                /* optimize for a single character */
                while (i < m) {
                        if (s[i] == *sub)
                                r++;
                        i++;
                }
        } else {
                while (i < m) {
                        if (s[i] == *sub && !memcmp(s+i, sub, n)) {
                                r++;
                                i += n;
                        } else {
                                i++;
                        }
                }
        }

Note that there might be some subtle reason why neither of these
optimizations is done that I'm unaware of... in which case a comment
in the C source would help. :-)
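
The first-character check can be modelled in Python to see the control flow. This is a sketch of the tactic only, not the C code: `count_prechecked` is a hypothetical name, and the slice comparison stands in for `memcmp`.

```python
def count_prechecked(s, sub):
    """Count non-overlapping occurrences of sub in s (bytes or str)."""
    n = len(sub)
    if n == 0:
        # Matches the builtin: an empty pattern is found between every item.
        return len(s) + 1
    first = sub[0]
    last_start = len(s) - n
    i = 0
    r = 0
    while i <= last_start:
        # Cheap one-item test first; the full comparison runs only on a hit.
        if s[i] == first and s[i:i + n] == sub:
            r += 1
            i += n
        else:
            i += 1
    return r
```

The same function accepts both `bytes` and `str`, since it relies only on indexing, slicing and equality.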


From tjreedy at  Tue Feb 28 01:07:40 2006
From: tjreedy at (Terry Reedy)
Date: Mon, 27 Feb 2006 19:07:40 -0500
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
References: <> <>
Message-ID: <du048c$3v4$>

"Benji York" <benji at> wrote in message 
news:44034818.7020101 at
> The express editions are only "free" until November 7th:

One can keep using any version downloaded before that date, but I would not 
be surprised to see a bugfix sometime after.

There is also this:
2.What can I do with the Express Editions?
 Evaluate the .NET Framework for Windows and Web development.
and this

13.Can I develop applications using the Visual Studio Express Editions to 
target the .NET Framework 1.1?
No, each release of Visual Studio is tied to a specific version of the .NET 
Framework. The Express Editions can only be used to create applications 
that run on the .NET Framework 2.0.
'Free' is not always free.  This appears to be a .NET 2 promotion.

Perhaps the Firefox people are using the professional version, without such 
a limitation?

Terry Jan Reedy

From fredrik at  Tue Feb 28 01:06:50 2006
From: fredrik at (Fredrik Lundh)
Date: Tue, 28 Feb 2006 01:06:50 +0100
Subject: [Python-Dev] str.count is slow
References: <><>
Message-ID: <du046r$3rs$>

(manually cross-posting from comp.lang.python)

Ben Cartwright wrote:

> Your evidence points to some unoptimized code in the underlying C
> implementation of Python.  As such, this should probably go to the
> python-dev list (

> This tactic typically avoids most (sometimes all) of the calls to
> memcmp.  Other string search functions, including unicode.count,
> unicode.index, and str.index, use this tactic, which is why you see
> unicode.count performing better than str.count.

it's about time that someone sat down and merged the string and unicode
implementations into a single "stringlib" code base (see the SRE sources for
an efficient way to do this in plain C). [1]

moving to (basic) C++ might also be a good idea (in 3.0, perhaps).  is
anyone still stuck with pure C89 these days ?


1) anyone want me to start working on this ?

From tjreedy at  Tue Feb 28 01:09:34 2006
From: tjreedy at (Terry Reedy)
Date: Mon, 27 Feb 2006 19:09:34 -0500
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
References: <>
Message-ID: <du04bu$4bq$>

"M.-A. Lemburg" <mal at> wrote in message 
news:44033992.8040805 at
> Note: This is just a thought - I haven't looked into the consequences
> of building with VC8 yet, e.g. from the list of pre-requisites,
> it's possible that .NET 2.0 would become a requirement.

From the FAQ (see other reply), it appears that this *is* a requirement for 
the Express editions. 

From martin at  Tue Feb 28 01:20:57 2006
From: martin at (martin at
Date: Tue, 28 Feb 2006 01:20:57 +0100
Subject: [Python-Dev] str.count is slow
In-Reply-To: <du046r$3rs$>
References: <><>
Message-ID: <>

Zitat von Fredrik Lundh <fredrik at>:

> it's about time that someone sat down and merged the string and unicode
> implementations into a single "stringlib" code base (see the SRE sources for
> an efficient way to do this in plain C). [1]
> 1) anyone want me to start working on this ?

This would be a waste of time: In Python 3, the string type will be
gone (or, rather, the unicode type, depending on the point of view).


From fredrik at  Tue Feb 28 01:24:50 2006
From: fredrik at (Fredrik Lundh)
Date: Tue, 28 Feb 2006 01:24:50 +0100
Subject: [Python-Dev] str.count is slow
References: <><><><du046r$3rs$>
Message-ID: <du058j$6tt$>

martin at wrote:

> > it's about time that someone sat down and merged the string and unicode
> > implementations into a single "stringlib" code base (see the SRE sources for
> > an efficient way to do this in plain C). [1]
> [...]
> > 1) anyone want me to start working on this ?
> This would be a waste of time: In Python 3, the string type will be
> gone (or, rather, the unicode type, depending on the point of view).

no matter what ends up in Python 3, you'll still need to perform operations
on both 8-bit buffers and Unicode buffers.

(not to mention that a byte type that doesn't support find/split/count etc
is pretty useless).


From martin at  Tue Feb 28 01:28:21 2006
From: martin at (martin at
Date: Tue, 28 Feb 2006 01:28:21 +0100
Subject: [Python-Dev] Switch to MS VC++ 2005 ?!
In-Reply-To: <du048c$3v4$>
References: <> <>
Message-ID: <>

Zitat von Terry Reedy <tjreedy at>:

> "
> 2.What can I do with the Express Editions?
> ...
>  Evaluate the .NET Framework for Windows and Web development.
> "
> and this
> "

Yes, but also this:

"""4. Can I use Express Editions for commercial use?
Yes, there are no licensing restrictions for applications built using the
Express Editions."""

> 13.Can I develop applications using the Visual Studio Express Editions to
> target the .NET Framework 1.1?
> No ...

> 'Free' is not always free.  This appears to be a .NET 2 promotion.

Well, this is completely irrelevant for Python. Python does not use
any .NET whatsoever (except for IronPython, of course). What framework
version the C# links with is irrelevant for the to-native C compiler.

> Perhaps the Firefox people are using the professional version, without such
> a limitation?

I guess the Express version can also build Firefox just fine.


From jeremy at  Tue Feb 28 03:54:04 2006
From: jeremy at (Jeremy Hylton)
Date: Mon, 27 Feb 2006 21:54:04 -0500
Subject: [Python-Dev] quick status report
Message-ID: <>

I made a few more minor revisions to the AST on the plane this
afternoon.  I'll check them in tomorrow when I get a chance to do a
full test run.

* Remove asdl_seq_APPEND.  All uses replaced with set
* Fix set_context() comments and check return value everywhere.
* Reimplement real arena for pyarena.c


From tim.peters at  Tue Feb 28 05:48:44 2006
From: tim.peters at (Tim Peters)
Date: Mon, 27 Feb 2006 22:48:44 -0600
Subject: [Python-Dev] Long-time shy failure in test_socket_ssl
In-Reply-To: <>
References: <>
Message-ID: <>

[1/24/06, Tim Peters]
>> ...
>> test_rude_shutdown() is dicey, relying on a sleep() instead of proper
>> synchronization to make it probable that the `listener` thread goes
>> away before the main thread tries to connect, but while that race may
>> account for bogus TestFailed deaths, it doesn't seem possible that it
>> could account for the kind of failure above.

[Tim Peters]
> Well, since it's silly to try to guess about one weird failure when a
> clear cause for another kind of weird failure is known, I checked in
> changes to do "proper" thread synchronization and termination in that
> test.  Hasn't failed here since, but that's not surprising (it was
> always a "once in a light blue moon" kind of thing).

Neal plugged another hole later, but-- alas --I have seen the same shy
failure since then on WinXP.  One of the most recent buildbot test
runs saw it too, on a non-Windows box:

test test_socket_ssl crashed --
    exceptions.TypeError: 'NoneType' object is not callable

in the second test run there.

Still no theory!  Maybe we can spend the next 3 days sprinting on it :-)

From greg.ewing at  Tue Feb 28 06:40:51 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 28 Feb 2006 18:40:51 +1300
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Nick Coghlan wrote:

> I wouldn't mind seeing one of the early ideas from PEP 340 being resurrected 
> some day, such that the signature for the special method was "__next__(self, 
> input)" and for the builtin "next(iterator, input=None)"

Aren't we getting an argument to next() anyway?
Or was that idea dropped?


From greg.ewing at  Tue Feb 28 06:43:20 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 28 Feb 2006 18:43:20 +1300
Subject: [Python-Dev] bytes.from_hex()
In-Reply-To: <>
References: <>
Message-ID: <>

Bill Janssen wrote:

> I use it quite a bit for image processing (converting to and from the
> "data:" URL form), and various checksum applications (converting SHA
> into a string).

Aha! We have a customer!

For those cases, would you find it more convenient
for the result to be text or bytes in Py3k?


From greg.ewing at  Tue Feb 28 07:03:06 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 28 Feb 2006 19:03:06 +1300
Subject: [Python-Dev] Pre-PEP: The "bytes" object
In-Reply-To: <>
References: <>
	<dto68f$3ca$> <>
	<dtob1t$f52$> <>
Message-ID: <>

Jason Orendorff wrote:

> I like these promises:
>   - bytes(arg) works like array.array('b', arg)
>   - bytes(arg1, arg2) works like bytes(arg1.encode(arg2))

+1. That's exactly how I think it should work, too.

> I dislike these promises:
>   - bytes(s, [ignored]), where s is a str, works like array.array('b', s)
>   - bytes(u, [encoding]), where u is a unicode,
>         works like bytes(u.encode(encoding))



From greg.ewing at  Tue Feb 28 07:38:38 2006
From: greg.ewing at (Greg Ewing)
Date: Tue, 28 Feb 2006 19:38:38 +1300
Subject: [Python-Dev] str.count is slow
In-Reply-To: <du046r$3rs$>
References: <>
Message-ID: <>

Fredrik Lundh wrote:

> moving to (basic) C++ might also be a good idea (in 3.0, perhaps).  is
> anyone still stuck with pure C89 these days ?

Some of us actually *prefer* working with plain C
when we have a choice, and don't consider ourselves
"stuck" with it.

My personal goal in life right now is to stay as
far away from C++ as I can get. If CPython becomes
C++-based (C++Python?) I will find it quite
distressing, because my most favourite language will
then be built on top of my least favourite language.


From fredrik at  Tue Feb 28 08:45:49 2006
From: fredrik at (Fredrik Lundh)
Date: Tue, 28 Feb 2006 08:45:49 +0100
Subject: [Python-Dev] str.count is slow
References: <><><><du046r$3rs$>
Message-ID: <du0v3e$69k$>

Greg Ewing wrote:

> Fredrik Lundh wrote:
> > moving to (basic) C++ might also be a good idea (in 3.0, perhaps).  is
> > anyone still stuck with pure C89 these days ?
> Some of us actually *prefer* working with plain C
> when we have a choice, and don't consider ourselves
> "stuck" with it.

perhaps, but polymorphic code is a lot easier to write in C++ than
in C.

> My personal goal in life right now is to stay as
> far away from C++ as I can get.

so what C compiler are you using ?


From ncoghlan at  Tue Feb 28 10:03:29 2006
From: ncoghlan at (Nick Coghlan)
Date: Tue, 28 Feb 2006 19:03:29 +1000
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Greg Ewing wrote:
> Nick Coghlan wrote:
>> I wouldn't mind seeing one of the early ideas from PEP 340 being 
>> resurrected some day, such that the signature for the special method 
>> was "__next__(self, input)" and for the builtin "next(iterator, 
>> input=None)"
> Aren't we getting an argument to next() anyway?
> Or was that idea dropped?

PEP 342 opted to extend the generator API instead (using "send") and leave the 
iterator protocol alone for the time being.
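
For readers who have not used it, a small sketch of the generator `send` method that PEP 342 added. The `running_total` generator is a made-up example, not taken from the PEP.

```python
def running_total():
    """Generator that receives values through send() and yields the sum so far."""
    total = 0
    while True:
        received = yield total
        if received is not None:
            total += received


gen = running_total()
first = next(gen)       # advance to the first yield before sending anything
second = gen.send(5)    # resumes the generator with received == 5
third = gen.send(10)    # resumes again with received == 10
```

Plain iteration still works unchanged, which is the point of leaving the iterator protocol alone: only code that wants to pass values in needs `send`.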


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From mal at  Tue Feb 28 11:23:55 2006
From: mal at (M.-A. Lemburg)
Date: Tue, 28 Feb 2006 11:23:55 +0100
Subject: [Python-Dev] Making ascii the default encoding
In-Reply-To: <>
References: <>
Message-ID: <>

Neal Norwitz wrote:
> PEP 263 states that in Phase 2 the default encoding will be set to
> ASCII.  Although the PEP is marked final, this isn't actually
> implemented.  The warning about using non-ASCII characters started in
> 2.3.  Does anyone think we shouldn't enforce the default being ASCII?
> This means if an # -*- coding: ... -*- is not set and non-ASCII
> characters are used, an error will be generated.


Marc-Andre Lemburg


From anthony at  Tue Feb 28 16:39:12 2006
From: anthony at (Anthony Baxter)
Date: Wed, 1 Mar 2006 02:39:12 +1100
Subject: [Python-Dev] 2.4.3 for end of March?
Message-ID: <>

So I'm planning a 2.4.3c1 around the 22nd-23rd of March, with a 2.4.3 
final a week later. This will be the first release since the svn 
cutover, which should make things exciting. 

This is to get things cleared out before we start the cycle of pain - 
ahem - the 2.5 release cycle. A 2.4.4 would then follow when 2.5 
final is done, hopefully October or so... 

Anyone have any screaming issues with this? Martin's ok to do the 
Windows release, and the doc build should be fine, too. 

Anthony Baxter     <anthony at>
It's never too late to have a happy childhood.

From guido at  Tue Feb 28 18:02:55 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 11:02:55 -0600
Subject: [Python-Dev] defaultdict and on_missing()
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 2/28/06, Nick Coghlan <ncoghlan at> wrote:
> Greg Ewing wrote:
> > Nick Coghlan wrote:
> >
> >> I wouldn't mind seeing one of the early ideas from PEP 340 being
> >> resurrected some day, such that the signature for the special method
> >> was "__next__(self, input)" and for the builtin "next(iterator,
> >> input=None)"
> >
> > Aren't we getting an argument to next() anyway?
> > Or was that idea dropped?
> PEP 342 opted to extend the generator API instead (using "send") and leave the
> iterator protocol alone for the time being.

One of the main reasons for this was the backwards compatibility
problems at the C level. The C implementation doesn't take an
argument. Adding an argument would cause all sorts of code breakage
and possible segfaults (if there's 3rd party code calling tp_next for

In 3.0 we could fix this.

--Guido van Rossum (home page:

From guido at  Tue Feb 28 18:34:35 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 11:34:35 -0600
Subject: [Python-Dev] with-statement heads-up
Message-ID: <>

I just realized that there's a bug in the with-statement as currently
checked in. __exit__ is supposed to re-raise the exception if there
was one; if it returns normally, the finally clause is NOT to re-raise
it. The fix is relatively simple (I believe) but requires updating
lots of unit tests. It'll be a while.

--Guido van Rossum (home page:

From mbland at  Tue Feb 28 18:52:04 2006
From: mbland at (Mike Bland)
Date: Tue, 28 Feb 2006 09:52:04 -0800
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Guido van Rossum <guido at> wrote:
> I just realized that there's a bug in the with-statement as currently
> checked in. __exit__ is supposed to re-raise the exception if there
> was one; if it returns normally, the finally clause is NOT to re-raise
> it. The fix is relatively simple (I believe) but requires updating
> lots of unit tests. It'll be a while.

Hmm.  My understanding was that __exit__ was *not* to reraise it, but
was simply given the opportunity to record the exception-in-progress.


From guido at  Tue Feb 28 19:07:36 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 12:07:36 -0600
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Mike Bland <mbland at> wrote:
> On 2/28/06, Guido van Rossum <guido at> wrote:
> > I just realized that there's a bug in the with-statement as currently
> > checked in. __exit__ is supposed to re-raise the exception if there
> > was one; if it returns normally, the finally clause is NOT to re-raise
> > it. The fix is relatively simple (I believe) but requires updating
> > lots of unit tests. It'll be a while.
> Hmm.  My understanding was that __exit__ was *not* to reraise it, but
> was simply given the opportunity to record the exception-in-progress.

Yes, that's what the PEP said. :-(

Unfortunately the way the PEP is specified, the intended equivalence
between writing a try/except in a @contextmanager-decorated generator
and writing things out explicitly is lost. The plan was that this:

@contextmanager
def foo():
    try:
        yield
    except Exception:
        pass

with foo():
    1/0

would be equivalent to this:

try:
    1/0
except Exception:
    pass

IOW

with GENERATOR():
    BLOCK

becomes a macro call, and GENERATOR() becomes a macro definition; its
body is the macro expansion with "yield" replaced by BLOCK. But in
order to get those semantics, it must be possible for __exit__() to
signal that the exception passed into it should *not* be re-raised.

The current expansion uses roughly this:

  finally:
      ctx.__exit__(*exc)

and here the finally clause will re-raise the exception (if there was one).

I ran into this when writing unit tests for @contextmanager.
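
A small sketch of the suppressing semantics argued for here, written against the `contextlib` that later shipped with Python 2.5, where a true return value from `__exit__` swallows the exception. The names `swallow`, `Swallow` and `run_with` are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def swallow():
    # Generator form: the except clause around the yield handles the error.
    try:
        yield
    except Exception:
        pass


class Swallow:
    # Explicit form: returning a true value tells the with statement
    # not to re-raise the exception it passed in.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return exc_type is not None and issubclass(exc_type, Exception)


def run_with(manager):
    with manager:
        1 / 0
    return "survived"
```

Both `run_with(swallow())` and `run_with(Swallow())` reach the return statement, which is the macro-like equivalence described above.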

--Guido van Rossum (home page:

From mbland at  Tue Feb 28 19:25:41 2006
From: mbland at (Mike Bland)
Date: Tue, 28 Feb 2006 10:25:41 -0800
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Guido van Rossum <guido at> wrote:
> On 2/28/06, Mike Bland <mbland at> wrote:
> > On 2/28/06, Guido van Rossum <guido at> wrote:
> > > I just realized that there's a bug in the with-statement as currently
> > > checked in. __exit__ is supposed to re-raise the exception if there
> > > was one; if it returns normally, the finally clause is NOT to re-raise
> > > it. The fix is relatively simple (I believe) but requires updating
> > > lots of unit tests. It'll be a while.
> >
> > Hmm.  My understanding was that __exit__ was *not* to reraise it, but
> > was simply given the opportunity to record the exception-in-progress.
> Yes, that's what the PEP said. :-(
> Unfortunately the way the PEP is specified, the intended equivalence
> between writing a try/except in a @contextmanager-decorated generator
> and writing things out explicitly is lost. The plan was that this:
> @contextmanager
> def foo():
>     try:
>         yield
>     except Exception:
>         pass
> with foo():
>     1/0
> would be equivalent to this:
> try:
>     1/0
> except Exception:
>     pass
> with GENERATOR():
>     BLOCK
> becomes a macro call, and GENERATOR() becomes a macro definition; its
> body is the macro expansion with "yield" replaced by BLOCK. But in
> order to get those semantics, it must be possible for __exit__() to
> signal that the exception passed into it should *not* be re-raised.
> The current expansion uses roughly this:
>   finally:
>       ctx.__exit__(*exc)
> and here the finally clause will re-raise the exception (if there was one).
> I ran into this when writing unit tests for @contextmanager.

This may just be my inexperience talking, and I don't have the code in
front of me right this moment, but in my mind these semantics would
simplify the original version of my patch, as we wouldn't have to
juggle the stack at all.  (Other than rotating the three exception
objects, that is).  We could then just pass the exception objects into
__exit__ without having to leave a copy on the stack, and could forego
the END_FINALLY.  (I *think*.)  Does that make sense?


From guido at  Tue Feb 28 20:07:14 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 13:07:14 -0600
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Mike Bland <mbland at> wrote:
> On 2/28/06, Guido van Rossum <guido at> wrote:
> > On 2/28/06, Mike Bland <mbland at> wrote:
> > > On 2/28/06, Guido van Rossum <guido at> wrote:
> > > > I just realized that there's a bug in the with-statement as currently
> > > > checked in. __exit__ is supposed to re-raise the exception if there
> > > > was one; if it returns normally, the finally clause is NOT to re-raise
> > > > it. The fix is relatively simple (I believe) but requires updating
> > > > lots of unit tests. It'll be a while.
> > >
> > > Hmm.  My understanding was that __exit__ was *not* to reraise it, but
> > > was simply given the opportunity to record the exception-in-progress.
> >
> > Yes, that's what the PEP said. :-(
> >
> > Unfortunately the way the PEP is specified, the intended equivalence
> > between writing a try/except in a @contextmanager-decorated generator
> > and writing things out explicitly is lost. The plan was that this:
> >
> > @contextmanager
> > def foo():
> >     try:
> >         yield
> >     except Exception:
> >         pass
> >
> > with foo():
> >     1/0
> >
> > would be equivalent to this:
> >
> > try:
> >     1/0
> > except Exception:
> >     pass
> >
> > IOW
> >
> > with GENERATOR():
> >     BLOCK
> >
> > becomes a macro call, and GENERATOR() becomes a macro definition; its
> > body is the macro expansion with "yield" replaced by BLOCK. But in
> > order to get those semantics, it must be possible for __exit__() to
> > signal that the exception passed into it should *not* be re-raised.
> >
> > The current expansion uses roughly this:
> >
> >   finally:
> >       ctx.__exit__(*exc)
> >
> > and here the finally clause will re-raise the exception (if there was one).
> >
> > I ran into this when writing unit tests for @contextmanager.
> This may just be my inexperience talking, and I don't have the code in
> front of me right this moment, but in my mind these semantics would
> simplify the original version of my patch, as we wouldn't have to
> juggle the stack at all.  (Other than rotating the three exception
> objects, that is).  We could then just pass the exception objects into
> __exit__ without having to leave a copy on the stack, and could forego
> the END_FINALLY.  (I *think*.)  Does that make sense?

Yes, it does. Except there's yet another wrinkle: non-local gotos
(break, continue, return).

The special WITH_CLEANUP opcode that I added instead of your ROT4
magic now considers the following cases:

- if the "exception indicator" is None or an int, leave it,
  and push three Nones

- otherwise, replace the exception indicator and the two elements below
  it with a single None (thus reducing the stack level by 2), *and* push
  the exception indicator and those two elements back onto the stack,
  in reverse order.

To clarify, let's look at the four cases. I'm drawing the stack top on
the right:

(return or continue; the int is WHY_RETURN or WHY_CONTINUE)
BEFORE: retval; int; __exit__
AFTER: retval; int; __exit__; None; None; None

(break; the int is WHY_BREAK)
BEFORE: int; __exit__
AFTER: int; __exit__; None; None; None

(no exception)
BEFORE: None; __exit__
AFTER: None; __exit__; None; None; None

(exception)
BEFORE: traceback; value; type; __exit__
AFTER: None; __exit__; type; value; traceback

The code generated in the finally clause looks as follows:

WITH_CLEANUP      (this does the above transform)
CALL_FUNCTION 3   (calls __exit__ with three arguments)
POP_TOP           (throws away the result)
END_FINALLY       (interprets the int or None now on top appropriately)
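
The four cases can be modelled with a plain Python list standing in for the value stack (top on the right, as in the drawings above). `with_cleanup` and the placeholder values are hypothetical; this is a model of the description, not the interpreter's C code.

```python
def with_cleanup(stack):
    """Apply the WITH_CLEANUP transform to a list whose last item is the top."""
    exit_func = stack.pop()
    indicator = stack[-1]
    if indicator is None or isinstance(indicator, int):
        # No exception, or a pending break/continue/return: keep the
        # indicator where it is and call __exit__(None, None, None).
        stack.extend([exit_func, None, None, None])
    else:
        # Exception in flight: collapse the triple to a single None, then
        # push it back in argument order for the call to __exit__.
        exc_type = stack.pop()
        exc_value = stack.pop()
        exc_tb = stack.pop()
        stack.extend([None, exit_func, exc_type, exc_value, exc_tb])
    return stack
```

After the following CALL_FUNCTION 3 and POP_TOP, the item left on top is either the original indicator or the new None, which is what END_FINALLY then interprets.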

Hope this helps (if not you, future generations :-).

--Guido van Rossum (home page:

From ncoghlan at  Tue Feb 28 22:57:52 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 01 Mar 2006 07:57:52 +1000
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> I just realized that there's a bug in the with-statement as currently
> checked in. __exit__ is supposed to re-raise the exception if there
> was one; if it returns normally, the finally clause is NOT to re-raise
> it. The fix is relatively simple (I believe) but requires updating
> lots of unit tests. It'll be a while.

So does that mean with statements *will* be able to suppress exceptions now? 
(If I'm reading the PEP changes right it does, but I haven't finished my 
coffee yet. . .)

I'm not complaining if that's so, as I think allowing it makes the operation 
of the statement both more useful and more intuitive, but you were originally 
concerned about the potential for hidden flow control if the context manager 
could suppress exceptions, as well as the need to remember to write "raise" in 
the except clauses of context managers.

If you changed your mind along the way, that should probably be explained in 
the PEP somewhere :)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Tue Feb 28 23:01:49 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 16:01:49 -0600
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Nick Coghlan <ncoghlan at> wrote:
> Guido van Rossum wrote:
> > I just realized that there's a bug in the with-statement as currently
> > checked in. __exit__ is supposed to re-raise the exception if there
> > was one; if it returns normally, the finally clause is NOT to re-raise
> > it. The fix is relatively simple (I believe) but requires updating
> > lots of unit tests. It'll be a while.
> So does that mean with statements *will* be able to suppress exceptions now?
> (If I'm reading the PEP changes right it does, but I haven't finished my
> coffee yet. . .)

Yes. And unless there are peasants at the gate with pitchforks etc. it
will stay that way. :-)

> I'm not complaining if that's so, as I think allowing it makes the operation
> of the statement both more useful and more intuitive, but you were originally
> concerned about the potential for hidden flow control if the context manager
> could suppress exceptions, as well as the need to remember to write "raise" in
> the except clauses of context managers.

Yes, I've changed my mind about that.

> If you changed your mind along the way, that should probably be explained in
> the PEP somewhere :)

I don't know that PEPs benefit from too much "on the one hand, on the
other hand, on the third hand" or "and then I changed my mind, and
then I changed it back, and then I changed it again".

--Guido van Rossum (home page:

From ncoghlan at  Tue Feb 28 23:12:50 2006
From: ncoghlan at (Nick Coghlan)
Date: Wed, 01 Mar 2006 08:12:50 +1000
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>	
Message-ID: <>

Guido van Rossum wrote:
>> If you changed your mind along the way, that should probably be explained in
>> the PEP somewhere :)
> I don't know that PEPs benefit from too much "on the one hand, on the
> other hand, on the third hand" or "and then I changed my mind, and
> then I changed it back, and then I changed it again".

Heh :)

Plus the SVN history and the mailing list archive already provide that record.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From pje at  Tue Feb 28 23:29:16 2006
From: pje at (Phillip J. Eby)
Date: Tue, 28 Feb 2006 16:29:16 -0600
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

At 04:01 PM 2/28/2006, Guido van Rossum wrote:
>On 2/28/06, Nick Coghlan <ncoghlan at> wrote:
> > Guido van Rossum wrote:
> > > I just realized that there's a bug in the with-statement as currently
> > > checked in. __exit__ is supposed to re-raise the exception if there
> > > was one; if it returns normally, the finally clause is NOT to re-raise
> > > it. The fix is relatively simple (I believe) but requires updating
> > > lots of unit tests. It'll be a while.
> >
> > So does that mean with statements *will* be able to suppress exceptions now?
> > (If I'm reading the PEP changes right it does, but I haven't finished my
> > coffee yet. . .)
>Yes. And unless there are peasants at the gate with pitchforks etc. it
>will stay that way. :-)

Notice that these semantics break some of the PEP examples.  For 
example, the 'locked' and 'nested' classes now suppress exceptions, 
and it took a non-trivial study of their code to determine 
this.  This seems to suggest that making suppression the default 
behavior is a bad idea.

I was originally on the side of allowing suppression, but I wanted it 
to be done by explicitly returning some non-None value, so that 
suppression would not be the default, implicit behavior.  I think I'd 
prefer not to be able to suppress the errors, than to have errors 
pass silently unless explicitly re-raised!  I don't see a problem 
with having e.g. __exit__ have to return a flag to suppress the 
exception; it wouldn't need to change how @contextmanager functions 
are written.  (Implicit suppression is only a problem for people 
writing __exit__ methods, in other words; all your reasoning about 
@contextmanager generators is valid, IMO.)
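
A minimal sketch of the flag-returning protocol described above, in which
__exit__ suppresses the exception only by explicitly returning a true value.
The class names Suppress and Locked and the demo() helper are hypothetical,
chosen for illustration; they are not the PEP's own examples. The sketch
runs on released Pythons, whose with statement follows this convention.

```python
import threading


class Suppress:
    """Hypothetical context manager that swallows only the listed exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning a true value is the explicit "suppress" flag.
        # Returning a false value (including None) lets the exception propagate.
        return exc_type is not None and issubclass(exc_type, self.exc_types)


class Locked:
    """Hypothetical lock wrapper: it only cleans up and never suppresses."""

    def __init__(self, lock):
        self.lock = lock

    def __enter__(self):
        self.lock.acquire()

    def __exit__(self, exc_type, exc_value, traceback):
        self.lock.release()
        # No return statement: the implicit None means "do not suppress".


def demo():
    results = []

    # The KeyError is swallowed because __exit__ returns True for it.
    with Suppress(KeyError):
        {}["missing"]
    results.append("suppressed")

    # The ValueError propagates even though __exit__ never re-raises it.
    lock = threading.Lock()
    try:
        with Locked(lock):
            raise ValueError("boom")
    except ValueError:
        results.append("propagated")

    # The lock was released on the way out.
    results.append(lock.locked())
    return results
```

Under this convention an __exit__ method that only cleans up, like Locked
above, cannot swallow an error by accident, which is the failure mode the
message is worried about.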

From guido at  Tue Feb 28 23:36:26 2006
From: guido at (Guido van Rossum)
Date: Tue, 28 Feb 2006 16:36:26 -0600
Subject: [Python-Dev] with-statement heads-up
In-Reply-To: <>
References: <>
Message-ID: <>

On 2/28/06, Phillip J. Eby <pje at> wrote:
> Notice that these semantics break some of the PEP examples.  For
> example, the 'locked' and 'nested' classes now suppress exceptions,
> and it took a non-trivial study of their code to determine
> this.  This seems to suggest that making suppression the default
> behavior is a bad idea.

I presume you're referring to example 4 (locked as a class), not
example 1 (locked as a generator). I'll fix this, and rewrite nested()
as a generator (just like what I checked in :-).

> I was originally on the side of allowing suppression, but I wanted it
> to be done by explicitly returning some non-None value, so that
> suppression would not be the default, implicit behavior.  I think I'd
> prefer not to be able to suppress the errors, than to have errors
> pass silently unless explicitly re-raised!  I don't see a problem
> with having e.g. __exit__ have to return a flag to suppress the
> exception; it wouldn't need to change how @contextmanager functions
> are written.  (Implicit suppression is only a problem for people
> writing __exit__ methods, in other words; all your reasoning about
> @contextmanager generators is valid, IMO.)

Thanks for the validation of the idea -- I ran into this when writing
unit tests for @contextmanager...

I think that providing sufficient *correct* examples will avoid most
of the problems. People will clone existing examples (I know *I* did
when adding context managers to various modules :-).

Changing the API to let __exit__() return something true to suppress
the exception seems somewhat clumsy. Re-raising the exception is
analogous to the throw() method in PEP 342.
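
A minimal sketch of the generator side of this, assuming the standard
library's contextlib.contextmanager decorator. The exception raised in the
with block is thrown into the generator at the yield, much like throw() in
PEP 342: catching it there suppresses it, and a bare raise passes it on. The
names ignoring, logged and demo are hypothetical and used only for
illustration.

```python
from contextlib import contextmanager


@contextmanager
def ignoring(*exc_types):
    """Hypothetical helper: swallow the listed exception types."""
    try:
        yield
    except exc_types:
        # The exception arrived at the yield and is not re-raised,
        # so the with statement continues normally afterwards.
        pass


@contextmanager
def logged(log):
    """Hypothetical helper: record the exception type, then let it propagate."""
    try:
        yield
    except Exception as exc:
        log.append(type(exc).__name__)
        raise  # explicit re-raise: the caller still sees the exception


def demo():
    log = []

    with ignoring(ZeroDivisionError):
        1 / 0
    log.append("after ignoring")

    try:
        with logged(log):
            raise KeyError("k")
    except KeyError:
        log.append("re-raised")

    return log
```

Written as generators, both cases read like ordinary try/except code, so
whether an exception is suppressed is visible at the point where it is
caught.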

--Guido van Rossum (home page: