From  Tue Jan  1 22:54:10 2002
From: (Jack Jansen)
Date: Tue, 1 Jan 2002 23:54:10 +0100
Subject: [Python-Dev] Unicode support in getargs.c
Message-ID: <>

I posted a question on Unicode support in getargs.c last month (working
on a different project), but now that I'm trying to support
unicode-based APIs more seriously I find that it leaves even more to be
desired. I'd like to help to fix this, but I need some direction on
how things should be fixed.

Here are some of the issues I ran into today:
- Unicode objects have a companion string object, meaning that you can
  pass a unicode object to an "s" format and have the right thing happen.
  String objects have no such accompanying unicode object, and I think they
  should have. Right now you cannot pass a string object when the C
  routine expects a unicode object.
- There is no unicode equivalent of "c", the single character.
- "u#" does something useful, but something completely different from
  what "s#" does. More to the point, it probably does something
  dangerous, if I understand correctly. If I write a C routine with an
  "u#" format and the Python code passes a string object the string object
  will be used as a buffer object and its binary contents will be interpreted
  as unicode. If the argument in question is a filename this will produce
  very surprising results:-)

To sum things up, I'd like unicode objects to get a little more first class
citizenship, especially in the light of operating systems that are primarily
(or exclusively) unicode based, such as Mac OS X or Windows CE.

From  Tue Jan  1 23:42:21 2002
From: (M.-A. Lemburg)
Date: Wed, 02 Jan 2002 00:42:21 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <>
Message-ID: <>

Jack Jansen wrote:

> I posted a question on Unicode support in getargs.c last month (working
> on a different project), but now that I'm trying to support
> unicode-based APIs more seriously I find that it leaves even more to be
> desired. I'd like to help to fix this, but I need some direction on
> how things should be fixed.
> Here are some of the issues I ran into today:
> - Unicode objects have a companion string object, meaning that you can
>   pass a unicode object to an "s" format and have the right thing happen.
>   String objects have no such accompanying unicode object, and I think they
>   should have. Right now you cannot pass a string object when the C
>   routine expects a unicode object.

You can: parse the object using "O" and then pass it to
PyUnicode_FromObject().

> - There is no unicode equivalent of "c", the single character.
> - "u#" does something useful, but something completely different from
>   what "s#" does. More to the point, it probably does something
>   dangerous, if I understand correctly. If I write a C routine with an
>   "u#" format and the Python code passes a string object the string object
>   will be used as a buffer object and its binary contents will be interpreted
>   as unicode. If the argument in question is a filename this will produce
>   very surprising results:-)

True; "u#" does exactly the same as "s#" -- it interprets the
input as a binary buffer.

> I'd like unicode objects to get a little more first class citizenship,
> especially in the light of operating systems that are primarily (or
> exclusively) unicode based, such as Mac OS X or Windows CE, to sum things up.

You would be far better off using the Unicode API on the
objects which are passed into the function rather than relying on
the getargs parser to try to apply some magic to the input.

It might be worthwhile to extend the parser markers a bit
more, e.g. to introduce "us#" to return Unicode objects
much like "es#" returns strings... I think we'd need some examples
of use, though, before deciding what's the right way to do this
("es#" was implemented after a request by Mark Hammond to
be able to handle Unicode file names for Win CE).

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Tue Jan  1 23:58:04 2002
From: (Martin v. Loewis)
Date: Wed, 2 Jan 2002 00:58:04 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (message from Jack
 Jansen on Tue, 1 Jan 2002 23:54:10 +0100)
References: <>
Message-ID: <>

>   String objects have no such accompanying unicode object, and I
>   think they should have.

No. That would either give you cyclic structures, or an ever growing
chain of unicode->string->unicode->string objects that could easily
result in unacceptable memory consumption.

Furthermore, I consider the existence of the embedded string object in
a Unicode object as a flaw in itself, as it relies on the default
encoding. IMO, the default encoding shouldn't be used if possible, as
it only serves the transition towards Unicode, and only in limited

> - There is no unicode equivalent of "c", the single character.

Why do you need that?

> - "u#" does something useful, but something completely different from
>   what "s#" does. More to the point, it probably does something
>   dangerous, if I understand correctly. If I write a C routine with an
>   "u#" format and the Python code passes a string object the string object
>   will be used as a buffer object and its binary contents will be interpreted
>   as unicode.

That sounds like a bug to me. Passing a string to u# most certainly
does not do the right thing; it is bad that it does so silently.

OTOH, why do you need u#? Normally, you use s# if a string can have
embedded null bytes; you do that if the string is "binary". For
Unicode, that is useless: A Unicode string typically won't have any
embedded null bytes, and it definitely isn't "binary".


From  Wed Jan  2 00:02:08 2002
From: (Martin v. Loewis)
Date: Wed, 2 Jan 2002 01:02:08 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (
References: <> <>
Message-ID: <>

> True; "u#" does exactly the same as "s#" -- it interprets the
> input as binary buffer.

It doesn't do exactly the same. If s# is applied to a Unicode object,
it transparently invokes the default encoding, which is sensible.  If
u# is applied to a byte string, it does not apply the default encoding.

Instead, it interprets the string "as-is". I cannot see an application
where this is useful, but I can see many applications where it is
clearly wrong.

IMO, u# cannot and should not be symmetric to s#. Instead, it should
accept just Unicode objects, and raise TypeErrors for everything else.
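
A minimal sketch of the two behaviours being contrasted, written in
present-day Python rather than against the getargs.c code under discussion.
The function names are hypothetical, Python 3's bytes and str stand in for
the string and unicode objects of this thread, and UTF-16-LE stands in for a
16-bit Py_UNICODE buffer:

```python
def reinterpret_as_utf16(raw):
    """Treat raw bytes as 16-bit code units, as-is (the behaviour criticised above)."""
    return raw.decode("utf-16-le")


def strict_unicode_arg(arg):
    """Accept only text objects; reject everything else with TypeError."""
    if not isinstance(arg, str):
        raise TypeError("unicode object expected, got %s" % type(arg).__name__)
    # Return the text together with its length, like a pointer/size pair.
    return arg, len(arg)


if __name__ == "__main__":
    garbled = reinterpret_as_utf16(b"file.txt")
    print(len(garbled), garbled == "file.txt")
    print(strict_unicode_arg("file.txt"))
    try:
        strict_unicode_arg(b"file.txt")
    except TypeError as exc:
        print("rejected:", exc)
```

The first helper silently turns an eight-byte file name into four unrelated
characters; the second fails loudly, which is the behaviour argued for above.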


From  Wed Jan  2 01:02:43 2002
From: (Barry A. Warsaw)
Date: Tue, 1 Jan 2002 20:02:43 -0500
Subject: [Python-Dev] Unicode support in getargs.c
References: <>
Message-ID: <>

>>>>> "JJ" == Jack Jansen <> writes:

    JJ> I'd like unicode objects to get a little more first class
    JJ> citizenship, especially in the light of operating systems that
    JJ> are primarily (or exclusively) unicode based, such as Mac OS X
    JJ> or Windows CE, to sum things up.

string/unicode unification?


From  Wed Jan  2 06:57:21 2002
From: (Mark Hammond)
Date: Wed, 2 Jan 2002 17:57:21 +1100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <>
Message-ID: <>

> It might be worthwhile extending the parser markers a bit
> more or allowing e.g. introduce "us#" to return Unicode objects
> much like "es#" returns strings... I think we'd need some examples
> of use though before deciding what's the right way to do this
> ("es#" was implemented after a request by Mark Hammond to
> be able to handle Unicode file names for Win CE).

Actually, it was for Windows itself, allowing the nt module to use Unicode
objects correctly for the platform.


From  Wed Jan  2 10:24:45 2002
From: (M.-A. Lemburg)
Date: Wed, 02 Jan 2002 11:24:45 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > True; "u#" does exactly the same as "s#" -- it interprets the
> > input as binary buffer.
> It doesn't do exactly the same. If s# is applied to a Unicode object,
> it transparently invokes the default encoding, which is sensible.  If
> u# is applied to a byte string, it does not apply the default encoding.

That's because the buffer interface on Unicode objects doesn't
return the raw binary buffer. If you pass in a memory mapped
file or a buffer object wrapping some memory area, u# will
take the input as raw binary stream.

All this weird behaviour is needed to make Unicode objects
behave well together with s#.

The implementation of u# is completely symmetric to that of s#
though. I agree, though, that it would make more sense to special
case Unicode objects here and have u# return a pointer to the
raw internal buffer of the Unicode object.

Jack will probably also need a way to say "decode this encoded
object into Unicode using the encoding xyz". Something like the
Unicode version of "es#". How about "eu#" which then passes through
Unicode as-is while decoding all other objects according to the
given encoding ?!

Marc-Andre Lemburg

From  Wed Jan  2 19:20:38 2002
From: (Martin v. Loewis)
Date: Wed, 2 Jan 2002 20:20:38 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (
References: <> <> <> <>
Message-ID: <>

> That's because the buffer interface on Unicode objects doesn't
> return the raw binary buffer. If you pass in a memory mapped
> file or a buffer object wrapping some memory area, u# will
> take the input as raw binary stream.
> All this weird behaviour is needed to make Unicode objects
> behave well together with s#.

I don't believe this. Why would the implementation of u# have any
effect on making s# work?

> Jack will probably also need a way to say "decode this encoded
> object into Unicode using the encoding xyz". Something like the
> Unicode version of "es#". How about "eu#" which then passes through
> Unicode as-is while decoding all other objects according to the
> given encoding ?!

I'd like to see the requirements, in terms of real-world problems,
before considering any extensions.


From  Wed Jan  2 19:40:56 2002
From: (M.-A. Lemburg)
Date: Wed, 02 Jan 2002 20:40:56 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <> <> <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > That's because the buffer interface on Unicode objects doesn't
> > return the raw binary buffer. If you pass in a memory mapped
> > file or a buffer object wrapping some memory area, u# will
> > take the input as raw binary stream.
> >
> > All this weird behaviour is needed to make Unicode objects
> > behave well together with s#.
> I don't believe this. Why would the implementation of u# have any
> effect on making s# work?

To make s# work, we had to map the read buffer interface to the
encoded version of Unicode -- not the binary version which would
have been the "right" choice in terms of the buffer interface (s#
maps to the read buffer interface, while t# maps to the character
buffer interface).
u# is simply a copy&paste implementation of s# interpreting the
results of the read buffer interface as Py_UNICODE array. As I mentioned
in another mail, we should probably let u# pass through Unicode
objects as-is without going through the read buffer interface.
This functionality is clearly missing and should be added to
make u# useful.

> > Jack will probably also need a way to say "decode this encoded
> > object into Unicode using the encoding xyz". Something like the
> > Unicode version of "es#". How about "eu#" which then passes through
> > Unicode as-is while decoding all other objects according to the
> > given encoding ?!
> I'd like to see the requirements, in terms of real-world problems,
> before considering any extensions.

Agreed. Jack should post some examples of what he needs for his

Marc-Andre Lemburg

From  Wed Jan  2 20:29:02 2002
From: (Martin v. Loewis)
Date: Wed, 2 Jan 2002 21:29:02 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (
References: <> <> <> <> <> <>
Message-ID: <>

> > > All this weird behaviour is needed to make Unicode objects
> > > behave well together with s#.
> > 
> > I don't believe this. Why would the implementation of u# have any
> > effect on making s# work?
> u# is simply a copy&paste implementation of s# interpreting the
> results of the read buffer interface as Py_UNICODE array. 

Ok. That explains its history, but it also clarifies that changing the
u# implementation has *no* effect whatsoever on the proper operation of s#.

Therefore, I still think that u# should reject string objects, instead
of silently doing the wrong thing.

> As I mentioned in another mail, we should probably let u# pass
> through Unicode objects as-is without going through the read buffer
> interface.

Yes, that would be nice. The only use of u# I can see is that it gives
you the number of Py_UNICODE characters, so that the caller doesn't
have to look for the terminating NUL.


From  Wed Jan  2 21:46:46 2002
From: (Jack Jansen)
Date: Wed, 2 Jan 2002 22:46:46 +0100 (CET)
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <>
Message-ID: <>

On Wed, 2 Jan 2002, Martin v. Loewis wrote:

> > Jack will probably also need a way to say "decode this encoded
> > object into Unicode using the encoding xyz". Something like the
> > Unicode version of "es#". How about "eu#" which then passes through
> > Unicode as-is while decoding all other objects according to the
> > given encoding ?!
> I'd like to see the requirements, in terms of real-world problems,
> before considering any extensions.

I have a number of MacOSX API's that expect Unicode buffers, passed as 
"long count, UniChar *buffer". I have the machinery in bgen to generate 
code for this, iff "u#" (or something else) would work the same as "s#", 
i.e. it returns you a pointer and a size, and it would work equally well 
for unicode objects as for classic strings (after conversion).

The approach with O and PyUnicode_FromObject() may do the trick for me,
as my code is generated, so a few more glue calls don't really matter.
But as a general solution it doesn't look right: "How do I call a C 
routine with a string parameter?" "Use the "s" format and you get the 
string pointer to pass". "How do I call a C routine with a unicode string 
parameter?" "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and 
make sure you get all your decrefs right and.....".

The "es#" is a very strange beast, and a similar "eu#" would help me a
little, but it has some serious drawbacks. Aside from it being completely
different from the other converters (being a prefix operator instead of a
postfix one, and having a value-return argument) I would also have to
pre-allocate the buffer, and that sort of defeats the purpose.

Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++      | ++++ if you agree copy these lines to your sig ++++ | see

From  Wed Jan  2 22:51:17 2002
From: (Martin v. Loewis)
Date: Wed, 2 Jan 2002 23:51:17 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <>
 (message from Jack Jansen on Wed, 2 Jan 2002 22:46:46 +0100 (CET))
References: <>
Message-ID: <>

> I have a number of MacOSX API's that expect Unicode buffers, passed as 
> "long count, UniChar *buffer". 

Well, my first question would be: Are you sure that UniChar has the
same underlying integral type as Py_UNICODE? If not, you lose.

So you may need to do even more conversion.

> I have the machinery in bgen to generate code for this, iff "u#" (or
> something else) would work the same as "s#", i.e. it returns you a
> pointer and a size, and it would work equally well for unicode
> objects as for classic strings (after conversion).

I see. u# could be made to work for Unicode objects alone, but it would
have to reject string objects.

> But as a general solution it doesn't look right: "How do I call a C 
> routine with a string parameter?" "Use the "s" format and you get the 
> string pointer to pass". "How do I call a C routine with a unicode string 
> parameter?" 

For that, the answer is u. But you want the length also. So for that,
the answer is u#. But your question is "How do I call a C routine with
either a Unicode object or a string object, getting a reasonable
Py_UNICODE* and the length?".

For that, I'd recommend using O& with a conversion function:

PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
  if (PyUnicode_Check(o)){
    Py_INCREF(o);return o;
  }
  if (PyString_Check(o)){
    return PyUnicode_FromObject(o);
  }
  PyErr_SetString(PyExc_TypeError,"unicode object expected");
  return NULL;
}

> "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and 
> make sure you get all your decrefs right and.....".

With the function above, this becomes:

Use O&, passing a PyObject**, the function, and a NULL pointer, using
PyUnicode_AS_UNICODE and PyUnicode_GET_SIZE, performing a single DECREF at
the end [specifying an encoding is optional].

In this scenario, somebody *has* to deallocate memory, you cannot get
around this. It is your choice whether this is Py_DECREF or PyMem_Free
that you have to call (as with the "esomething" conversions); the
DECREF is more efficient as it will not copy a Unicode object.

> The "es#" is a very strange beast, and a similar "eu#" would help me a 
> little, but it has some serious drawbacks. Aside from it being completely 
> different from the other converters (being a prefix operator instead of a
> postfix one, and having a value-return argument) I would also have to 
> pre-allocate the buffer in advance, and that sort of defeats the purpose.

You don't. If you set the buffer pointer to NULL before invoking getargs,
getargs allocates a buffer of the right size for you, and you have to
PyMem_Free it afterwards.


From  Thu Jan  3 10:34:17 2002
From: (M.-A. Lemburg)
Date: Thu, 03 Jan 2002 11:34:17 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > I have a number of MacOSX API's that expect Unicode buffers, passed as
> > "long count, UniChar *buffer".
> Well, my first question would be: Are you sure that UniChar has the
> same underlying integral type as Py_UNICODE? If not, you lose.
> So you may need to do even more conversion.

This should be the first thing to check. Also note that Python
has two different flavors of Unicode support: UCS-2 and UCS-4,
so you'll have to be careful about this too.
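
A rough illustration of why the code-unit width matters, in present-day
Python rather than C. It assumes UniChar is a 16-bit UTF-16 code unit, and
the helper name is hypothetical; it computes the (count, buffer) pair that
such an API would expect:

```python
def to_unichar_buffer(text):
    """Return (count, buffer): the number of 16-bit units and the UTF-16-LE bytes."""
    buffer = text.encode("utf-16-le")  # explicit byte order, so no BOM is written
    return len(buffer) // 2, buffer


if __name__ == "__main__":
    for sample in ("abc", "a\U0001F600"):
        count, buffer = to_unichar_buffer(sample)
        print(repr(sample), "characters:", len(sample), "16-bit units:", count)
```

For characters outside the Basic Multilingual Plane the two counts differ,
which is the kind of mismatch a UCS-4 build would have to convert around.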
> > I have the machinery in bgen to generate code for this, iff "u#" (or
> > something else) would work the same as "s#", i.e. it returns you a
> > pointer and a size, and it would work equally well for unicode
> > objects as for classic strings (after conversion).
> I see. u# could be made work for Unicode objects alone, but it would
> have to reject string objects.

Martin, I don't agree here: string objects could hold binary UCS-2/UCS-4
data.

Jack, u# cannot auto-convert strings to Unicode since this would
require allocation of a temporary object and there's no logic there
to free that object after usage.

es# has logic in place which allows either copying the raw data
to a buffer you provide or having it allocate a buffer of the
right size for you. That's why I proposed to extend it to support
Unicode raw data as well.

> > But as a general solution it doesn't look right: "How do I call a C
> > routine with a string parameter?" "Use the "s" format and you get the
> > string pointer to pass". "How do I call a C routine with a unicode string
> > parameter?"
> For that, the answer is u. But you want the length also. So for that,
> the answer is u#. But your question is "How do I call a C routine with
> either a Unicode object or a string object, getting a reasonable
> Py_UNICODE* and the length?".
> For that, I'd recommend to use O&, with a conversion function
> PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
>   if (PyUnicode_Check(o)){
>     Py_INCREF(o);return o;
>   }
>   if (PyString_Check(o)){
>     return PyUnicode_FromObject(o);
>   }
>   PyErr_SetString(PyExc_TypeError,"unicode object expected");
>   return NULL;
> }

Martin, note that PyUnicode_FromObject() already does the Unicode
pass-through (even more: it makes sure that you get a true Unicode
object, not a subclass).
> > "Use O and PyUnicode_FromObject() and PyUnicode_AsUnicode and
> > make sure you get all your decrefs right and.....".
> With the function above, this becomes
> Use O&, passing a PyObject**, the function, and a NULL pointer, using
> PyUnicode_AS_UNICODE and PyUnicode_SIZE, performing a single DECREF at
> the end [allowing to specify an encoding is optional]
> In this scenario, somebody *has* to deallocate memory, you cannot get
> around this. It is your choice whether this is Py_DECREF or PyMem_Free
> that you have to call (as with the "esomething" conversions); the
> DECREF is more efficient as it will not copy a Unicode object.
> > The "es#" is a very strange beast, and a similar "eu#" would help me a
> > little, but it has some serious drawbacks. Aside from it being completely
> > different from the other converters (being a prefix operator instead of a
> > postfix one, and having a value-return argument) I would also have to
> > pre-allocate the buffer in advance, and that sort of defeats the purpose.
> You don't. If you set the buffer to NULL before invoking getargs, you
> have to PyMem_Free it afterwards.


Let me see if I can summarize this:

Jack wants to get string and Unicode objects converted to Unicode 
automagically and then receive a pointer to a Py_UNICODE buffer and
a size. 

The current solution for this is to use the "O" parser,
fetch the object, pass it through PyUnicode_FromObject(), then
use PyUnicode_GET_SIZE() and PyUnicode_AS_UNICODE() to access
the Py_UNICODE buffer and finally to Py_DECREF() the object returned
by PyUnicode_FromObject().

What I proposed was to extend the "es#" parser marker with a new
modifier: "eu#" which does all of the above except that it either
copies the Py_UNICODE data to a buffer you provide or a newly
allocated buffer which you then have to PyMem_Free() after usage.

How does this sound ?

Marc-Andre Lemburg

From (Skip Montanaro)  Thu Jan  3 15:11:01 2002
From: (Skip Montanaro) (Skip Montanaro)
Date: Thu, 3 Jan 2002 09:11:01 -0600
Subject: [Python-Dev] Unicode strings as filenames
Message-ID: <>

What's the correct way to deal with filenames in a Unicode environment?

Consider this:

    >>> import site
    >>> site.encoding
    'latin-1'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823c2a0>
    >>> open(u, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823a1e8>
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x81d6160>

If I change my site's default encoding back to ascii, the second open fails

    >>> import site
    >>> site.encoding
    'ascii'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x822b448>
    >>> open(u, "w")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x822d260>

as I expect it should.  The third open is a problem as well, even though it
succeeds with either encoding.  (Why doesn't it fail when the default
encoding is ascii?)  My thought is that before using a plain string or
unicode string as a filename it should first be coerced to a unicode string
with the default encoding, something like:

    if type(fname) == types.StringType:
        fname = unicode(fname, site.encoding)
    elif type(fname) == types.UnicodeType:
        fname = fname.encode(site.encoding)
    else:
        raise TypeError, ("unrecognized type for filename: %s"%type(fname))

Is that the correct approach?  Apparently Python's file object doesn't do
this under the covers.  Should it?
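
For comparison, a sketch of the same coercion in present-day Python, which
postdates this thread. normalize_filename is a hypothetical helper;
os.fsdecode is the standard library call that decodes byte strings with the
filesystem encoding rather than a site-wide default:

```python
import os


def normalize_filename(fname):
    """Return the filename as text, decoding byte strings with the filesystem encoding."""
    if isinstance(fname, str):
        return fname
    if isinstance(fname, bytes):
        return os.fsdecode(fname)
    raise TypeError("unrecognized type for filename: %s" % type(fname).__name__)


if __name__ == "__main__":
    print(normalize_filename("abc.txt"))
    print(normalize_filename(b"abc.txt"))
```

Unlike the snippet above, this always converges on one type (text) instead
of swapping the two string types.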



From Samuele Pedroni" <  Thu Jan  3 16:43:57 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Thu, 3 Jan 2002 17:43:57 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
Message-ID: <021101c19475$d7251260$6d94fea9@newmexico>


[Ok this is maybe more a comp.lang.python thing
but ...]

If I'm correct,
dictionaries are based on equality, and so is the "in" operator.

AFAIK if I'm interested in a dictionary working on identity
I should wrap my objects ...

Now what is the fastest idiom equivalent to:

obj in list

when I'm interested in identity (is) and not equality?

That was the comp.lang.python part. Now,
my impression is that in any case when I'm interested
in identity and not equality I have to use a workaround,
which means I will never directly get the performance of the
equality idioms. Although my experience says that the
equality case is the most common, I wonder whether
some direct support for the identity case isn't worthwhile,
because it is rare but typically then you would like some
speed. [Yes, I have some concrete context but this is long
so unless strictly requested ...]

Am I missing something? Opinions?
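
The workarounds in question can be sketched as follows, in present-day
Python. All names here are hypothetical, and this is one possible spelling
rather than a recommended or fastest idiom:

```python
def contains_identity(items, obj):
    """True if obj itself (not merely an equal object) is in items."""
    return any(item is obj for item in items)


class IdentitySet:
    """Membership keyed on id(); members stay referenced so their ids stay valid."""

    def __init__(self, items=()):
        self._members = {id(item): item for item in items}

    def add(self, obj):
        self._members[id(obj)] = obj

    def __contains__(self, obj):
        return id(obj) in self._members


class AlwaysEqual:
    """Example class with redefined equality: all instances compare equal."""

    def __eq__(self, other):
        return isinstance(other, AlwaysEqual)


if __name__ == "__main__":
    a, b = AlwaysEqual(), AlwaysEqual()
    print(b in [a], contains_identity([a], b), b in IdentitySet([a]))
```

The linear scan costs a Python-level loop per lookup, while the id()-keyed
wrapper gets dictionary speed at the price of the wrapping complained about
above.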


From Samuele Pedroni" <  Thu Jan  3 16:51:15 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Thu, 3 Jan 2002 17:51:15 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
References: <021101c19475$d7251260$6d94fea9@newmexico>
Message-ID: <022f01c19476$dbabb680$6d94fea9@newmexico>

PS: I know that equality for user classes defaults to identity.
But I'm obviously interested in the case when equality has
possibly been redefined and I still need identity.



From  Thu Jan  3 17:16:44 2002
From: (Ka-Ping Yee)
Date: Thu, 3 Jan 2002 11:16:44 -0600 (CST)
Subject: [Python-Dev] updated
Message-ID: <>

I would like to apologize for allowing to fall behind
Python development for the past few months.  At last count, it
only gave documentation for Python 2.1b1 and 1.5.2.

Today, has been updated to provide all the pydoc-generated
documentation pages for Python 1.5.2, 1.6, 2.1, and 2.2 final.
The search feature lets you search the names of all the modules,
packages, functions, classes, and methods, and the text of all
their docstrings.

I hope it can be a useful resource for you.  Any thoughts you have
on making it better would be very welcome, of course.

-- ?!ng

From  Thu Jan  3 21:20:24 2002
From: (Neil Hodgson)
Date: Fri, 4 Jan 2002 08:20:24 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <>
Message-ID: <006e01c1949c$7631d1b0$0acc8490@neil>

> What's the correct way to deal with filenames in a
> Unicode environment?
> Consider this:
[Attempts to use encoding]

   On Windows NT/2K/XP the right thing to do is to use the wide char open
function such as
_CRTIMP FILE * __cdecl _wfopen(const wchar_t *, const wchar_t *);
_CRTIMP int __cdecl _wopen(const wchar_t *, int, ...);

   There may also be techniques for doing this on Windows 9x as the file
system stores Unicode file names but I have never looked into this.


From (Skip Montanaro)  Thu Jan  3 21:28:16 2002
From: (Skip Montanaro) (Skip Montanaro)
Date: Thu, 3 Jan 2002 15:28:16 -0600
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <006e01c1949c$7631d1b0$0acc8490@neil>
References: <>
Message-ID: <>

    Skip> What's the correct way to deal with filenames in a Unicode
    Skip> environment?  Consider this:

    Skip> [Attempts to use encoding]

    Neil> On Windows NT/2K/XP the right thing to do is to use the wide char
    Neil> open function such as
    Neil>   _CRTIMP FILE * __cdecl _wfopen(const wchar_t *, const wchar_t *);
    Neil>   _CRTIMP int __cdecl _wopen(const wchar_t *, int, ...);

    Neil> There may also be techniques for doing this on Windows 9x as the
    Neil> file system stores Unicode file names but I have never looked into
    Neil> this.

How is this exposed (if at all) to Python programmers?  I happen to be
developing on Linux, but the eventual delivery platform will be Windows.  Is
there no way to handle this in a cross-platform way?


From  Thu Jan  3 21:38:56 2002
From: (Martin v. Loewis)
Date: Thu, 3 Jan 2002 22:38:56 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (
References: <> <> <>
Message-ID: <>

> > I see. u# could be made work for Unicode objects alone, but it would
> > have to reject string objects.
> Martin, I don't agree here: string objects could hold binary UCS-2/UCS-4 
> data.

They could. Most likely, they don't. Explicit is better than implicit:
Anybody wishing to pass UCS-2 binary data to a function expecting
character strings should do

  function(unicode(data, "UCS-2BE")) # or LE if appropriate
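
In present-day Python the same explicit step might look like the sketch
below. The codec name "utf-16-be" is an assumption standing in for the
"UCS-2BE" spelling above, and function is a placeholder for whatever routine
expects character strings:

```python
def function(text):
    """Placeholder for a routine that expects a character string."""
    if not isinstance(text, str):
        raise TypeError("character string expected")
    return len(text)


data = b"\x00h\x00i"  # two big-endian 16-bit code units
result = function(data.decode("utf-16-be"))  # the caller decodes explicitly
```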

> es# has logic in place which allows either copying the raw data
> to a buffer you provide or have it allocate a buffer of the
> right size for you. That's why I proposed to extend it to support
> Unicode raw data as well.

Even though es# is cleanly defined, it is still undesirable to use,
IMO: it requires more copies of data than necessary. If explicit
memory management is required, it should be exposed through
Py_DECREF. That is easy to understand, and it allows sharing
immutable objects, thus avoiding copies.

> > PyObject *Py_UnicodeOrString(PyObject *o, void *ignored){
> >   if (PyUnicode_Check(o)){
> >     Py_INCREF(o);return o;
> >   }
> >   if (PyString_Check(o)){
> >     return PyUnicode_FromObject(o);
> >   }
> >   PyErr_SetString(PyExc_TypeError,"unicode object expected");
> >   return NULL;
> > }
> Martin, note that PyUnicode_FromObject() already does the Unicode
> pass-through (even more: it makes sure that you get a true Unicode
> object, not a subclass).

I noticed. However, I'd like Py_UnicodeOrString to fail if you are not
passing a character string (and I'd see no problem in accepting
Unicode subtypes without copying them). This is a minor point, though
- I might have written

PyObject *Py_UnicodeOrString(PyObject *o, void* ignored){
  return PyUnicode_FromObject(o);
}

as well.

> Jack wants to get string and Unicode objects converted to Unicode 
> automagically and then receive a pointer to a Py_UNICODE buffer and
> a size. 
> The current solution for this is to use the "O" parser,
> fetch the object, pass it through PyUnicode_FromObject(), then
> use PyUnicode_GET_SIZE() and PyUnicode_AS_UNICODE() to access
> the Py_UNICODE buffer and finally to Py_DECREF() the object returned
> by PyUnicode_FromObject().

That is the solution, although I would claim that using the O& parser
is simpler, and more flexible.

> What I proposed was to extend the "es#" parser marker with a new
> modifier: "eu#" which does all of the above except that it either
> copies the Py_UNICODE data to a buffer you provide or a newly
> allocated buffer which you then have to PyMem_Free() after usage.
> How does this sound ?

Terrible. It copies a Unicode object without any need. It also adds to
the inflation of format specifiers for getargs; this inflation is
terrible in itself.


From  Thu Jan  3 21:43:11 2002
From: (Neil Hodgson)
Date: Fri, 4 Jan 2002 08:43:11 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <>        <006e01c1949c$7631d1b0$0acc8490@neil> <>
Message-ID: <00c701c1949f$a3cb38c0$0acc8490@neil>

> How is this exposed (if at all) to Python programmers?

   Currently not exposed AFAICT except through calldll.

> I happen to be
> developing on Linux, but the eventual delivery platform will be Windows.  Is
> there no way to handle this in a cross-platform way?

   Cross-platform is tricky as the file systems used on Linux have narrow
string file names. Some higher level software (such as the forthcoming
version of GTK+/GNOME) assumes file names are encoded in UTF-8 but this is a
somewhat dangerous assumption.

   The problem on Windows is that there are files you can not open by
performing encoding operations on the Unicode names. They do have narrow
generated names, but these are mangled and look like Z8F22~1.HTM so are hard
to discover.


From  Thu Jan  3 21:52:19 2002
From: (Martin v. Loewis)
Date: Thu, 3 Jan 2002 22:52:19 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
 (message from Skip Montanaro on Thu, 3 Jan 2002 09:11:01 -0600)
References: <>
Message-ID: <>

> What's the correct way to deal with filenames in a Unicode environment?
> Consider this:
>     >>> import site
>     >>> site.encoding
>     'latin-1'

Setting site.encoding is certainly the wrong thing to do. How can you
know all users of your system use latin-1?

> If I change my site's default encoding back to ascii, the second open fails
>     >>> import site
>     >>> site.encoding
>     'ascii'
>     >>> a = "abc\xe4\xfc\xdf.txt"
>     >>> u = unicode (a, "latin-1")

On my system, the following works fine

>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
>>> a = "abc\xe4\xfc\xdf.txt"
>>> u = unicode (a, "latin-1")
>>> open(u, "w")
<open file 'abcäüß.txt', mode 'w' at 0x8173e88>

On Unix, your best bet for file names is to trust the user's locale
settings. If you do that, open will accept Unicode objects.

What is your locale?

> Is that the correct approach?  Apparently Python's file object doesn't do
> this under the covers.  Should it?

No. There is no established convention, on Unix, how to do non-ASCII
file names. If anything, following the user's locale setting is the
most reasonable thing to do; this should be in sync with how the user's
terminal displays characters. The Python installation's default
encoding is almost useless, and shouldn't be changed.

On Windows, things are much better, since there is a notion of Unicode
file names in the system.
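A minimal sketch of "trust the user's locale", written for present-day Python 3 rather than the Python 2.2 in the transcript above. The helper encode_filename is hypothetical; it only shows that the same name is representable under one encoding and not under another:

```python
import locale


def encode_filename(name, encoding=None):
    """Encode a text file name to bytes, by default using the locale's encoding."""
    if encoding is None:
        # Adopt the user's locale settings, as the message recommends.
        locale.setlocale(locale.LC_ALL, "")
        encoding = locale.getpreferredencoding(False)
    return name.encode(encoding)


name = "abc\xe4\xfc\xdf.txt"
print(encode_filename(name, "latin-1"))
try:
    encode_filename(name, "ascii")
except UnicodeEncodeError as exc:
    print("not representable in ascii:", exc.reason)
```

Passing the encoding explicitly keeps the demonstration independent of the machine it runs on.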


From  Thu Jan  3 22:09:57 2002
From: (Martin v. Loewis)
Date: Thu, 3 Jan 2002 23:09:57 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <006e01c1949c$7631d1b0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil>
Message-ID: <>

>    On Windows NT/2K/XP the right thing to do is to use the wide char open
> function such as
> _CRTIMP FILE * __cdecl _wfopen(const wchar_t *, const wchar_t *);
> _CRTIMP int __cdecl _wopen(const wchar_t *, int, ...);

I agree. However:

- Mark decided to take a different route, using fopen all the time, but
  encoding Unicode strings with the "mbcs" encoding, which calls
  WideCharToMultiByte with CP_ACP. AFAICT, this is correct as well
  (although it invokes an unneeded conversion of the string, since
  fopen, eventually, will convert the string back to Unicode - probably
  inside CreateFileA - at least on WinNT).

  In any case, passing Unicode objects to open() works just fine, at least
  as long as they can be encoded in the ANSI code page. If you want to
  open a Chinese file name on a Russian Windows installation, you lose.

- Skip was likely asking about a Unix installation, in which case all
  of this is irrelevant.
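The "Chinese file name on a Russian Windows installation" case can be sketched without Windows at all. The example below is Python 3 and assumes cp1251 as a typical Russian ANSI code page; fits_code_page is a hypothetical helper, not part of any API mentioned in the thread:

```python
def fits_code_page(name, code_page):
    """Return True if every character of `name` exists in `code_page`."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


russian = "\u0444\u0430\u0439\u043b.txt"   # Cyrillic letters
chinese = "\u4e2d\u6587.txt"               # Chinese characters
print(fits_code_page(russian, "cp1251"))
print(fits_code_page(chinese, "cp1251"))
```

A name that fails this check is one that a narrow-string open() going through that code page cannot express.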

>    There may also be techniques for doing this on Windows 9x as the file
> system stores Unicode file names but I have never looked into this.

To my knowledge, VFAT32 doesn't - only NTFS does (which is not
available on W9x).


From  Thu Jan  3 22:51:03 2002
From: (Neil Hodgson)
Date: Fri, 4 Jan 2002 09:51:03 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <>
Message-ID: <01ab01c194a9$237b6dc0$0acc8490@neil>


>   In any case, passing Unicode objects to open() works just fine, at least
>   as long as they can be encoded in the ANSI code page. If you want to
>   open a Chinese file name on a Russian Windows installation, you lose.

   I want to be able to open all files on my English W2K install and can
with many applications even if some have Chinese names and some have
Russian. The big advance W2K made over NT was to only have one real version
of the OS instead of multiple language versions. There is a system default
language as well as local defaults but with just a few clicks my machine can
be used as a Japanese machine although as the keyboard keys don't grow
Japanese characters, it is a bit harder to use. You do buy localised
versions of W2K and XP but they differ in packaging and defaults - the
underlying code is identical which was not the case for NT or 9x.

   Locales are a really poor choice for people who need to operate in
multiple languages and much software is moving to allowing concurrent use of
multiple languages through the use of Unicode. The term
'multinationalization' (m18n) is sometimes used in Japan to talk about
systems that try to avoid restrictions on character set and language.

> >    There may also be techniques for doing this on Windows 9x as the file
> > system stores Unicode file names but I have never looked into this.
> To my knowledge, VFAT32 doesn't - only NTFS does (which is not
> available on W9x).

   I have a file called u"C:\\z\u0439\u0446.html" on my W2K FAT partition
which displays correctly in the explorer and can be opened in, for example,
notepad.

   This leads to the interesting situation of being able to see a file using
glob but not then use it:
>>> import glob
>>> glob.glob("C:\\*.html")
['C:\\l2.html', 'C:\\list.html', 'C:\\m4.html', 'C:\\x.html',
'C:\\z??.html']
>>> for i in glob.glob("C:\\*.html"):
...    f = open(i)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
IOError: [Errno 22] Invalid argument: 'C:\\z??.html'
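How a name such as 'z??.html' can arise is easy to reproduce in Python 3. This is only an analogy: it assumes the narrow listing behaves like a lossy encode into a Western code page (cp1252 is picked here for illustration), and narrow_name is a made-up helper:

```python
def narrow_name(name, code_page="cp1252"):
    """Lossily squeeze a Unicode name into a narrow code page."""
    # errors="replace" substitutes "?" for every unencodable character.
    return name.encode(code_page, errors="replace")


print(narrow_name("C:\\z\u0439\u0446.html"))
```

The two Cyrillic letters collapse into question marks, so the result no longer identifies the original file, and "?" is not a legal file name character on Windows anyway.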


From  Thu Jan  3 22:56:38 2002
From: (Martin v. Loewis)
Date: Thu, 3 Jan 2002 23:56:38 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <021101c19475$d7251260$6d94fea9@newmexico> (
References: <021101c19475$d7251260$6d94fea9@newmexico>
Message-ID: <>

> Now what is the fastest idiom equivalent to:
> obj in list
> when I'm interested in identity (is) and not equality?

It appears that doing a plain for loop is fastest, see the attached
script below. On my system, it gives

m1         0   0.00      5000   0.29      9999   0.60       1.0   0.61
m2         0   0.60      5000   0.61      9999   0.62       1.0   0.62
m3         0   1.81      5000   1.81      9999   1.81       1.0   1.83
m4         0   0.00      5000   1.54      9999   3.11       1.0   3.17

> Although my experience says that the equality case is the most
> common, I wonder whether some direct support for the identity case
> isn't worthwhile, because it is rare but typically then you would like
> some speed.

In Smalltalk, such things would be done in specialized
containers. E.g. the IdentityDictionary is a dictionary where keys are
considered equal only if identical. Likewise, you could have a
specialized list type. OTOH, if you need speed, just write an
extension module - doing an identical_in function is straight-forward.

I'd hesitate to add identical_in to the API, since it would mean that
it needs to be supported for any container, the same way sq_contains works
now.


import time

x = range(10000)
rep = [None] * 100

values = x[0], x[5000], x[-1], 1.0

def m1(val, rep=rep, x=x):
    for r in rep:
        found = 0
        for s in x:
            if s is val:
                found = 1
                break

def m2(val, rep=rep, x=x):
    for r in rep:
        found = [s for s in x if s is val]

def m3(val, rep=rep, x=x):
    for r in rep:
        def identical(elem):
            return elem is val
        found = filter(identical, x)

class Contains:
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val is other
def m4(val, rep=rep, x=x):
    for r in rep:
        found = Contains(val) in x

for options in [m1, m2, m3, m4]:
    print options.__name__,
    for val in values:
        start = time.time()
        options(val)    # run the variant being timed
        end = time.time()
        print "%9s %6.2f" % (val,end-start),
    print
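For readers on present-day Python 3, the plain-loop variant (m1 above) can be packaged as a small function. This is a sketch, not code from the thread; the name identical_in is borrowed from the message:

```python
def identical_in(obj, seq):
    """Return True if some element of seq *is* obj (identity, not equality)."""
    for item in seq:
        if item is obj:
            return True
    return False


a, b = [], []                    # equal, but two distinct objects
print(a in [b])                  # equality-based membership
print(identical_in(a, [b]))      # identity-based membership
print(identical_in(a, [b, a]))
```

The built-in `in` finds `a` in `[b]` because the two empty lists compare equal; the identity version does not.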

From  Thu Jan  3 22:58:23 2002
From: (Martin v. Loewis)
Date: Thu, 3 Jan 2002 23:58:23 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
 (message from Skip Montanaro on Thu, 3 Jan 2002 15:28:16 -0600)
References: <>
 <006e01c1949c$7631d1b0$0acc8490@neil> <>
Message-ID: <>

> How is this exposed (if at all) to Python programmers?  I happen to be
> developing on Linux, but the eventual delivery platform will be Windows.  Is
> there no way to handle this in a cross-platform way?

Sure. Just pass Unicode strings to open(). Notice that this requires
Python 2.2, and you should expect exceptions. If it fails, fallbacks are up to
your application: it would be best to let the user know that the
choice of file name was not sensible.
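A minimal sketch of "expect exceptions" in Python 3. The helper try_create is hypothetical, and which exception types to catch is an assumption of this example (OSError for file system failures, UnicodeError for names that cannot be encoded):

```python
import os
import tempfile


def try_create(path):
    """Try to create `path`; return (True, None) or (False, reason)."""
    try:
        with open(path, "w"):
            pass
    except (OSError, UnicodeError) as exc:
        # Leave the fallback to the caller, e.g. ask the user for another name.
        return False, str(exc)
    return True, None


with tempfile.TemporaryDirectory() as d:
    print(try_create(os.path.join(d, "a.out")))
    print(try_create(os.path.join(d, "missing", "a.out")))
```

Returning the reason instead of raising makes it easy to tell the user why the chosen name was rejected.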


From  Thu Jan  3 23:09:45 2002
From: (Jack Jansen)
Date: Fri, 4 Jan 2002 00:09:45 +0100 (CET)
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <>
Message-ID: <>

I'm going to jump out of this discussion for a while. Martin and Mark have 
a completely different view on Unicode than I do, apparently, and I think 
I should first try and see if I can use the current implementation.

For the record: my view of Unicode is really "ascii done right", i.e. a 
datatype that allows you to get richer characters than what 1960s ascii 
gives you. For this it should be as backward-compatible as possible, i.e. 
if some API expects a unicode filename and I pass "a.out" it should 
interpret it as u"a.out". All the converting to different charsets is 
icing on the cake, the number one priority should be that unicode is as 
compatible as possible with the 8-bit convention used on the platform 
(whatever it may be). No, make that the number 2 priority: the number one 
priority is compatibility with 7-bit ascii. Using Python StringObjects as 
binary buffers is also far less common than using StringObjects to store 
plain old strings, so if either of these uses bites the other it's the 
binary buffer that needs to suffer. UnicodeObjects and StringObjects 
should behave pretty orthogonal to how FloatObjects and IntObjects behave.

 -- --
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++      | ++++ if you agree copy these lines to your sig ++++ | see 

From (Skip Montanaro)  Thu Jan  3 23:11:10 2002
From: (Skip Montanaro) (Skip Montanaro)
Date: Thu, 3 Jan 2002 17:11:10 -0600
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "Martin" == Martin v Loewis <> writes:

    >> What's the correct way to deal with filenames in a Unicode
    >> environment?  Consider this:
    >> >>> import site
    >> >>> site.encoding
    >> 'latin-1'

    Martin> Setting site.encoding is certainly the wrong thing to do. How
    Martin> can you know all users of your system use latin-1?

Why is setting site.encoding appropriate to your environment at the time you
install Python wrong?  I can't know that all users of my system (whatever
the definition of "my system" is) will use latin-1.  Somewhere along the way
I have to make some assumptions, however.

    On any given computer I assume the people who install Python will set
    site.encoding appropriate to their environment.

    The example I used was latin-1 simply because the folks I'm working with
    are in Austria and they came up with the example.  I assume the best
    default encoding for them is latin-1.

    The application writers themselves will have no problem restricting
    internal filenames to be ascii.  I assume if users want to save files of
    their own, they will choose characters from the Unicode character set
    they use most frequently.

So, my example used latin-1.  I could just as easily have chosen something
else.

    Martin> On my system, the following works fine

    Martin> >>> import locale
    Martin> >>> locale.setlocale(locale.LC_ALL,"")
    Martin> >>> a = "abc\xe4\xfc\xdf.txt"
    Martin> >>> u = unicode (a, "latin-1")
    Martin> >>> open(u, "w")
    Martin> <open file 'abcäüß.txt', mode 'w' at 0x8173e88>

    Martin> On Unix, your best bet for file names is to trust the user's
    Martin> locale settings. If you do that, open will accept Unicode
    Martin> objects.

    Martin> What is your locale?

The above setlocale call prints


I can't get to the machines in Austria right now to see how their locales
are set, though I suspect they haven't fiddled their LC_* environment,
because they are having the problems I described.

    >> Is that the correct approach?  Apparently Python's file object
    >> doesn't do this under the covers.  Should it?

    Martin> No. There is no established convention, on Unix, how to do
    Martin> non-ASCII file names. If anything, following the user's locale
    Martin> setting is the most reasonable thing to do; this should be in
    Martin> sync with how the user's terminal displays characters. The Python
    Martin> installation's default encoding is almost useless, and shouldn't
    Martin> be changed.

    Martin> On Windows, things are much better, since there is a notion of
    Martin> Unicode file names in the system.

This suggests to me that the Python docs need some introductory material on
this topic.  It appears to me that the two people in the Python
community who live and breathe this stuff are you, Martin, and Marc-André.
For most of the rest of us, especially if we've never consciously written
code for consumption outside an ascii environment, the whole thing just
looks like a quagmire.


From  Thu Jan  3 23:16:29 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 00:16:29 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <01ab01c194a9$237b6dc0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil>
Message-ID: <>

>    I want to be able to open all files on my English W2K install and can
> with many applications even if some have Chinese names and some have
> Russian. The big advance W2K made over NT was to only have one real version
> of the OS instead of multiple language versions. 

I understand all that, but I can't agree with all your conclusions.

>    Locales are a really poor choice for people who need to operate in
> multiple languages and much software is moving to allowing concurrent use of
> multiple languages through the use of Unicode. 

On Windows, locales and Unicode don't contradict each other. You can
create files through the locale's code page, and they still end up on
disk correctly. This is a much better situation than you have on Unix.

In any case, there is no alternative. Locales may be good or bad - you
must follow system conventions, if you want to write usable software.

> > To my knowledge, VFAT32 doesn't - only NTFS does (which is not
> > available on W9x).
>    I have a file called u"C:\\z\u0439\u0446.html" on my W2K FAT partition
> which displays correctly in the explorer and can be opened in, for example,
> notepad.

Oops, you are right - the long file name is in Unicode. It is only
when you do not have a long file name that the short one is
interpreted in OEM encoding.

> >>> import glob
> >>> glob.glob("C:\\*.html")
> ['C:\\l2.html', 'C:\\list.html', 'C:\\m4.html', 'C:\\x.html',
> 'C:\\z??.html']
> >>> for i in glob.glob("C:\\*.html"):
> ...    f = open(i)
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in ?
> IOError: [Errno 22] Invalid argument: 'C:\\z??.html'

I agree this is unfortunate; patches are welcome. Please notice that
the strategy of using wchar_t API on Windows has explicitly been
considered and rejected, for the complexity of the code changes
involved. So anybody proposing a patch would need to make it both
useful, and easy to maintain. With these constraints, the current
implementation is the best thing Mark could come up with.

Software always has limitations, which are removed only if somebody is
bothered so much as to change the software.


From  Thu Jan  3 23:23:35 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 00:23:35 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <>
 (message from Jack Jansen on Fri, 4 Jan 2002 00:09:45 +0100 (CET))
References: <>
Message-ID: <>

> For the record: my view of Unicode is really "ascii done right", i.e. a 
> datatype that allows you to get richer characters than what 1960s ascii 
> gives you. 

Exactly, with the stress on *ASCII*. Almost everybody could agree on
ASCII; it is the 8-bit character sets where the troubles start.

> For this it should be as backward-compatible as possible, i.e.  if
> some API expects a unicode filename and I pass "a.out" it should
> interpret it as u"a.out".

That works fine with the current API.
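The behaviour described here (a plain "a.out" is accepted where Unicode is expected) rests on ASCII being the default conversion. A Python 3 sketch of that rule, with a made-up helper name as_text:

```python
def as_text(value):
    """Accept text, or bytes that are pure 7-bit ASCII, and return text."""
    if isinstance(value, str):
        return value
    # Anything outside ASCII raises UnicodeDecodeError rather than guessing.
    return value.decode("ascii")


print(as_text(b"a.out"))
try:
    as_text(b"abc\xe4.txt")
except UnicodeDecodeError as exc:
    print("refused:", exc.reason)
```

ASCII-only input converts without any policy decision; 8-bit input is where an encoding has to be chosen by someone.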

> All the converting to different charsets is icing on the cake, the
> number one priority should be that unicode is as compatible as
> possible with the 8-bit convention used on the platform (whatever it
> may be).

The problem is that there are multiple conventions on many systems,
and only the application can know which of these to apply.

> Using Python StringObjects as binary buffers is also far less common
> than using StringObjects to store plain old strings, so if either of
> these uses bites the other it's the binary buffer that needs to
> suffer.

This is a conclusion I cannot agree with. Most strings are really
binary, if you look at them closely enough :-)

> UnicodeObjects and StringObjects should behave pretty orthogonal to
> how FloatObjects and IntObjects behave.

For the Python programmer: yes. For the C programmer: memory
management makes that inherently difficult, a problem you don't have for
int vs float.


From  Thu Jan  3 23:34:25 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 00:34:25 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
 (message from Skip Montanaro on Thu, 3 Jan 2002 17:11:10 -0600)
References: <>
 <> <>
Message-ID: <>

>     >> What's the correct way to deal with filenames in a Unicode
>     >> environment?  Consider this:
>     >>
>     >> >>> import site
>     >> >>> site.encoding
>     >> 'latin-1'
>     Martin> Setting site.encoding is certainly the wrong thing to do. How
>     Martin> can you know all users of your system use latin-1?
> Why is setting site.encoding appropriate to your environment at the time you
> install Python wrong?  I can't know that all users of my system (whatever
> the definition of "my system" is) will use latin-1.  Somewhere along the way
> I have to make some assumptions, however.

Well, then accept the assumption that almost everybody will use an
ASCII superset. That may be still wrong, for the case of EBCDIC users,
but those are rare on Unix.

However, on our typical Unix system, three different encodings are in
use: ISO-8859-1 (for tradition), ISO-8859-15 (because it has the
Euro), and UTF-8 (because it removes all the limitations). Notice that
all of our users speak German, and we still could not set a meaningful
site.encoding except for 'ascii'.
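Why no single site-wide value works for these three encodings can be seen from one byte. The Python 3 sketch below uses a hypothetical helper decode_all; the byte 0xA4 is chosen because it is the position where ISO-8859-15 places the Euro sign:

```python
def decode_all(raw, encodings):
    """Decode `raw` under each encoding; map failures to None."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


for enc, text in decode_all(b"\xa4", ["ascii", "iso-8859-1",
                                      "iso-8859-15", "utf-8"]).items():
    print(enc, repr(text))
```

The same byte is a currency sign, a Euro sign, or invalid, depending on which convention the data follows. Only 'ascii' never gives a wrong answer, because it refuses to answer.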

>     On any given computer I assume the people who install Python will set
>     site.encoding appropriate to their environment.

That is probably wrong. Most users will install precompiled packages,
and thus will have the value that the package held, which will
be 'ascii' for most packages.

>     The example I used was latin-1 simply because the folks I'm working with
>     are in Austria and they came up with the example.  I assume the best
>     default encoding for them is latin-1.

Well, latin-1 does not have a Euro sign, which may be more and more of
a problem.

>     The application writers themselves will have no problem restricting
>     internal filenames to be ascii.  I assume if users want to save files of
>     their own, they will choose characters from the Unicode character set
>     they use most frequently.

That is a meaningful assumption. However, it is one that you have to
make in your application, not one that you should expect users to make
in their Python installations.

> The above setlocale call prints

You may want to extend your system to support the same configuration
that your users have, i.e. you might want to install an Austrian
locale on your system, and set LANG to de_AT. If your system also sets
all the LC_ variables for you, I recommend unsetting them - setting
LANG is enough (to override all other LC_ variables, setting LC_ALL to
de_AT should also work).
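The precedence behind this advice can be written down as a small lookup. The sketch below follows the usual POSIX rule (LC_ALL first, then the category's own variable, then LANG); effective_locale is a made-up name and the function only inspects a dictionary, it does not call setlocale:

```python
def effective_locale(env, category="LC_CTYPE"):
    """Return the locale name that wins for `category` given environment `env`."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:                 # unset and empty are treated alike
            return value
    return "C"                    # default when nothing is set


print(effective_locale({"LANG": "de_AT"}))
print(effective_locale({"LANG": "de_AT", "LC_CTYPE": "en_US"}))
print(effective_locale({"LANG": "de_AT", "LC_CTYPE": "en_US",
                        "LC_ALL": "de_AT"}))
```

This is why LANG alone is enough when no LC_ variables are set, and why LC_ALL is needed to override them when they are.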

> I can't get to the machines in Austria right now to see how their locales
> are set, though I suspect they haven't fiddled their LC_* environment,
> because they are having the problems I described.

Even if they set the environment variables, they'd still have the problem
because your application doesn't call setlocale.

I do expect that they have set LANG to de_AT, or de_AT.ISO-8859-1.

Perhaps they also have this problem because they use Python 2.1 or
earlier.

> This suggests to me that the Python docs need some introductory material on
> this topic.  It appears to me that the two people in the Python
> community who live and breathe this stuff are you, Martin, and Marc-André.
> For most of the rest of us, especially if we've never consciously written
> code for consumption outside an ascii environment, the whole thing just
> looks like a quagmire.

Well, I'd happily review any introductory material somebody else
writes :-)


From  Fri Jan  4 00:07:19 2002
From: (Neil Hodgson)
Date: Fri, 4 Jan 2002 11:07:19 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <>
Message-ID: <020601c194b3$c85a4320$0acc8490@neil>


> I agree this is unfortunate; patches are welcome. Please notice that
> the strategy of using wchar_t API on Windows has explicitly been
> considered and rejected, for the complexity of the code changes
> involved. So anybody proposing a patch would need to make it both
> useful, and easy to maintain. With these constraints, the current
> implementation is the best thing Mark could come up with.
> Software always has limitations, which are removed only if somebody is
> bothered so much as to change the software.

   Sure, I'm just putting my point of view which appears to be different
from most in that many developers just use a single locale. If I had a
larger supply of time then I'd eventually work on this but there are other
tasks that currently look like having more impact.

   The system provided scripting languages support wide character file
names. In VBScript:

Set fso = CreateObject("Scripting.FileSystemObject")
crlf = chr(13) & chr(10)
For Each f1 in fso.GetFolder("C:\").Files
 if instr(1,, ".htm") > 0 then
  s = s & f1.Path & crlf
  if left(, 1) = "z" then
   fo = fso.OpenTextFile(f1.Path).ReadAll()
   s = s & fo & crlf
  end if
 end if
Next
MsgBox s

   And Python with the win32 extensions can do the same using the

# encode used here just to make things print as a quick demo
import win32com.client
fso = win32com.client.Dispatch("Scripting.FileSystemObject")
s = ""
fol = fso.GetFolder("C:\\")
for f1 in fol.Files:
 if".htm") > 0:
  s += f1.Path.encode("UTF-8") + "\r\n"
  if[0] == u"z":
   fo = fso.OpenTextFile(f1.Path).ReadAll()
   s += fo.encode("UTF-8") + "\r\n"
print s


From Samuele Pedroni" <  Fri Jan  4 00:50:19 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Fri, 4 Jan 2002 01:50:19 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
References: <021101c19475$d7251260$6d94fea9@newmexico> <>
Message-ID: <003d01c194b9$c9f016a0$47fdbac3@newmexico>

[Martin v. Loewis]
> > Now what is the fastest idiom equivalent to:
> > 
> > obj in list
> > 
> > when I'm interested in identity (is) and not equality?
> It appears that doing a plain for loop is fastest, see the attached
> script below. On my system,it gives
> m1         0   0.00      5000   0.29      9999   0.60       1.0   0.61
> m2         0   0.60      5000   0.61      9999   0.62       1.0   0.62
> m3         0   1.81      5000   1.81      9999   1.81       1.0   1.83
> m4         0   0.00      5000   1.54      9999   3.11       1.0   3.17

Thanks, and sorry, I could have done the measurements myself
but I supposed that maybe someone would already know.
The result also makes sense: it is the version that does the least consing
and calling of user Python functions. But only profiling knows
the truth <wink>.

> > Although my experience says that the equality case is the most
> > common, I wonder whether some direct support for the identity case
> > isn't worthwhile, because it is rare but typically then you would like
> > some speed.
> In Smalltalk, such things would be done in specialized
> containers. E.g. the IdentityDictionary is a dictionary where keys are
> considered equal only if identical. Likewise, you could have a
> specialized list type. OTOH, if you need speed, just write an
> extension module - doing a identical_in function is straight-forward.

It is not really my code, but yes, writing an extension (especially in
Jython) would not be too difficult, but see below.
> I'd hesitate to add identical_in to the API, since it would mean that
> it needs to be supported for any container, the same way sq_contains works
> now.

I see the problem. Implicitly I was asking whether adding built-in
identity_list, identity_dict and corresponding weak versions for the dicts
could make sense or is just code bloat.

The context is anygui; I'm following it closely
and I try to help with Jython/Swing issues.

Anygui has code like this in the event handling logic:

                      if not loop and not r.loop \
                           and id(obj) in source_stack: continue

Now this is a nice idiom and works around the identity-list problem,
but mmh ... id is broken under Jython (that means
different objects can get the same id :( ), also in 2.1 final.
It is a long-standing bug
and yes, we are about to solve it, but there is a trade-off: Jython's
id will become precise but many times slower than the CPython version
(we need to implement a weak-key-table :( ).

An identity_list would make for a portable idiom with comparable
overhead and will give to the identity case somehow the same speed
of the equality case...

And further anygui shows also a possible need for a WeakKeyIdentityDict...
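The `id(obj) in source_stack` idiom can be wrapped in a small container. The Python 3 sketch below is not anygui code; IdentitySet is a hypothetical class, and it relies on id() being unique among live objects, which is exactly the guarantee the message says Jython did not give at the time:

```python
class IdentitySet:
    """Membership by identity, implemented with id() as the dictionary key."""

    def __init__(self):
        # Keeping a reference to each object prevents its id from being
        # reused by a different object while it is in the set.
        self._items = {}

    def add(self, obj):
        self._items[id(obj)] = obj

    def discard(self, obj):
        self._items.pop(id(obj), None)

    def __contains__(self, obj):
        return id(obj) in self._items

    def __len__(self):
        return len(self._items)


a, b = [], []            # equal but distinct, and unhashable
stack = IdentitySet()
stack.add(a)
print(a in stack, b in stack)
```

Unlike a plain set, this also accepts unhashable objects such as lists, since only the integer id is hashed.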


From  Fri Jan  4 01:07:04 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 02:07:04 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <020601c194b3$c85a4320$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil>
Message-ID: <>

>    The system provided scripting languages support wide character file
> names. 

Please understand that Python also supports wide character file
names. It just doesn't allow all the possible values that the system
would allow.

> For Each f1 in fso.GetFolder("C:\").Files

That, of course, is another important difference: Here you get the
directory contents as wide strings. Changing os.listdir to return
Unicode objects would be possible, but would likely introduce a number
of incompatibilities. Your script (e.g. the Python variant) is
prepared for .Files returning Unicode objects. Making the same change
in Python on all functions that return file names (i.e. listdir, glob,
etc) is difficult - most likely, you'll have to make the return type
a choice of the application.
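For comparison, present-day Python 3 does let the application choose: os.listdir returns text names for a text argument and byte names for a bytes argument. The sketch below only demonstrates that behaviour; list_both is a made-up helper and nothing here describes Python 2.2:

```python
import os
import tempfile


def list_both(directory):
    """Return (text names, byte names) for the entries of `directory`."""
    text_names = sorted(os.listdir(directory))
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "x.html"), "w").close()
    print(list_both(d))
```

The type of the argument selects the type of the result, so existing byte-oriented callers keep working.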


From  Fri Jan  4 01:39:23 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 02:39:23 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <003d01c194b9$c9f016a0$47fdbac3@newmexico> (
References: <021101c19475$d7251260$6d94fea9@newmexico> <> <003d01c194b9$c9f016a0$47fdbac3@newmexico>
Message-ID: <>

> An identity_list would make for a portable idiom with comparable
> overhead and will give to the identity case somehow the same speed
> of the equality case...
> And further anygui shows also a possible need for a WeakKeyIdentityDict...

Well, I'd say this is a clear indication that this has to go the path
that all library extensions go (or should go): They are implemented in
one project, then are used in other projects as well, until finally
somebody submits the implementation to the Python core.

In the case of anygui, I'd suggest including different
implementations of the identity_list, and any other specialised
container you may have:

- one implementation for CPython that works across all Python
  versions (in C)
- if useful, one implementation for Python 2.2 using type inheritance,
  in C, alternatively, one implementation in pure Python:

class identity_list(list):
    def __contains__(self, elem):
        for i in self:
            if i is elem:
                return 1
        return 0

    # need to implement count, index, remove

  It turns out that this class is as fast in my benchmark as the
  Python loop over a builtin list, which is not surprising, as it is
  the same loop.

- one implementation in Java for use with Jython.

- one implementation in pure Python which works across all Python

The configuration mechanics of anygui should then select an
appropriate version.
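The pure-Python class above leaves count, index and remove open. One possible way to fill them in, as a Python 3 sketch (the method bodies are this example's own choice, not code from the thread):

```python
class identity_list(list):
    """A list whose membership tests compare by identity instead of equality."""

    def __contains__(self, elem):
        return any(item is elem for item in self)

    def count(self, elem):
        return sum(1 for item in self if item is elem)

    def index(self, elem, start=0, stop=None):
        if stop is None:
            stop = len(self)
        # slice.indices() clamps negative and out-of-range bounds.
        for i in range(*slice(start, stop).indices(len(self))):
            if self[i] is elem:
                return i
        raise ValueError("object not in list")

    def remove(self, elem):
        del self[self.index(elem)]


a, b = [], []
lst = identity_list([a, b, a])
print(a in lst, [] in lst, lst.count(a), lst.index(b))
```

Comparison operators, sorting and slicing still behave as in a normal list; only the lookup methods change.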

Experience will tell which of those implementations are used in
practice, and which are of use to other packages. That will eventually
give a foundation for including one of them into the core
libraries. People tend to invent new containers all the time (and new
methods for existing containers), and I believe we have to resist the
temptation of including them into the language at first sight.

Just make sure that you do *not* put those containers into the
location where the Python library will eventually put them, as well;
instead, if the core provides them, have the configuration mechanics
choose the builtin type instead of the anygui-included
fallback implementation.
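A sketch of such configuration mechanics in Python 3. A builtin named identity_list is purely hypothetical (no such builtin exists), and select_identity_list is a made-up helper; the point is only the order of preference:

```python
import builtins


class fallback_identity_list(list):
    """Package-local fallback: list membership by identity."""

    def __contains__(self, elem):
        return any(item is elem for item in self)


def select_identity_list(fallback=fallback_identity_list):
    """Prefer a core-provided type if one exists, else use the fallback."""
    # Assumption: a future core type would be exposed as a builtin
    # called "identity_list".
    return getattr(builtins, "identity_list", fallback)


identity_list = select_identity_list()
print(identity_list.__name__)
```

Because the fallback lives under the package's own name, it never shadows a type the core might add later.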


From Samuele Pedroni" <  Fri Jan  4 02:10:50 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Fri, 4 Jan 2002 03:10:50 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
References: <021101c19475$d7251260$6d94fea9@newmexico> <> <003d01c194b9$c9f016a0$47fdbac3@newmexico> <>
Message-ID: <00eb01c194c5$0880a000$47fdbac3@newmexico>

> > An identity_list would make for a portable idiom with comparable
> > overhead and will give to the identity case somehow the same speed
> > of the equality case...
> > 
> > And further anygui shows also a possible need for a WeakKeyIdentityDict...
> Well, I'd say this is a clear indication that this has to go the path
> that all library extensions go (or should go): They are implemented in
> one project, then are used in other projects as well, until finally
> somebody submits the implementation to the Python core.
> In the case of anygui, I'd suggest to include different
> implementations of the identity_list, and any other specialised
> container you may have:
> - one implementation for C python that works across all Python
>   versions (in C)
> - if useful, one implementation for Python 2.2 using type inheritance,
>   in C, alternatively, one implementation in pure Python:
> class identity_list(list):
>     def __contains__(self, elem):
>         for i in self:
>             if i is elem:
>                 return 1
>         return 0
>     # need to implement count, index, remove
>   It turns out that this class is as fast in my benchmark than the
>   Python loop over a builtin list, which is not surprising, as it is
>   the same loop.
> - one implementation in Java for use with Jython.
> - one implementation in pure Python which works across all Python
>   versions.
> The configuration mechanics of anygui should then select an
> appropriate version.
> Experience will tell which of those implementations are used in
> practice, and which are of use to other packages. That will eventually
> give a foundation for including one of them into the core
> libraries. People tend to invent new containers all the time (and new
> methods for existing containers), and I believe we have to resist the
> tempation of including them into the language at first sight.

I won't argue about that.
> Just make sure that you do *not* put those containers into the
> location where the Python library will eventually put them, as well;
> instead if the core provides them, have the configuration mechanics
> figure to use the builtin type, instead of the anygui-included
> fallback implementation.
In this case the above "you" is fully undefined.
I will archive this discussion for better times when
I have spare-cycles.

Anygui people are committed to shipping just pure Python code,
and I'm not really a developer for the project, just a jython
So I will just workaround otherwise, I already
knew that, this was mostly a survey, a valuable one
and thanks for the answers. 

My band-width in the near future is for helping
with Jython 2.2 and other personal stuff ...

Thanks, Samuele.

From  Fri Jan  4 03:00:08 2002
From: (Tim Peters)
Date: Thu, 3 Jan 2002 22:00:08 -0500
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <003d01c194b9$c9f016a0$47fdbac3@newmexico>
Message-ID: <>

[Samuele Pedroni]
> ...
> but mmh ... id is broken under jython (that means
> different objects can get the same id :( ) , also in 2.1 final.
>  It is a long-standing bug and yes we are about to solve it but
> there is a trade-off and jython id will become precise but many times
> slower wrt to CPython version (we need to implement a weak-key-table
> :( ).

Mapping what to what?  A fine implementation of id() would be to hand each
new object a unique Java int from a global counter, incremented once per
Python object creation -- or a Java long if any JVM stays up long enough
that 32 bits is an issue <wink>.
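
A minimal sketch of the counter idea, in Python for brevity (the real
thing would live in Jython's Java runtime). The names CountedObject and
counted_id are hypothetical and exist only for illustration.

```python
import itertools

# One global counter, advanced once per object creation.
_next_ident = itertools.count(1)


class CountedObject:
    """Stand-in for a runtime object that is handed its id at creation."""

    def __init__(self):
        self._ident = next(_next_ident)


def counted_id(obj):
    # The id is whatever number the object was given when it was created.
    return obj._ident
```

The cost is one extra integer per object, paid whether or not anyone ever
asks for the id.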

From Samuele Pedroni" <  Fri Jan  4 03:11:00 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Fri, 4 Jan 2002 04:11:00 +0100
Subject: Re [Python-Dev] object equality vs identity, in and dicts idioms and speed
Message-ID: <004101c194cd$701deb20$47fdbac3@newmexico>

[Tim Peters] 
> Mapping what to what?  A fine implementation of id() would be to hand each
> new object a unique Java int from a global counter, incremented once per
> Python object creation -- or a Java long if any JVM stays up long enough
> that 32 bits is an issue <wink>.
The problem is Java class instances, sir,
we use non-unique wrappers for them and identity is simulated.
We could use a table to make the wrappers unique but we have
potentially lots of them as you can imagine, jython people
actually use java classes <wink>. So the workaround
is to keep a table just for the java instances for which
someone has asked the id. We cannot win-win so
we try not to lose-lose.  For simplicity we will
extend the table approach to everything.
If you have a win-win solution also in this case please ...
Our goal is compatibility but we will suggest 
to avoid id as far as possible for production jython
code ...
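
The table approach described above might look roughly like this. It is a
Python illustration under stated assumptions, not Jython's actual
implementation: the class name LazyIdTable is made up, the builtin id()
merely stands in for an identity hash such as Java's
System.identityHashCode, and weak references keep the table from pinning
objects in memory.

```python
import itertools
import weakref


class LazyIdTable:
    """Assign a stable number to an object the first time its id is asked for."""

    def __init__(self):
        self._counter = itertools.count(1)
        # Maps an identity hash to (weak reference, assigned number).
        self._entries = {}

    def get_id(self, obj):
        key = id(obj)  # stand-in for an identity hash
        entry = self._entries.get(key)
        if entry is not None and entry[0]() is obj:
            return entry[1]  # already registered: same number again

        assigned = next(self._counter)

        def _drop(ref, key=key, entries=self._entries):
            # Called when the object dies; forget its entry.
            current = entries.get(key)
            if current is not None and current[0] is ref:
                del entries[key]

        # Note: the object must support weak references.
        self._entries[key] = (weakref.ref(obj, _drop), assigned)
        return assigned
```

Only objects whose id has actually been requested pay for a table entry,
which is the trade-off the message describes.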

From Samuele Pedroni" <  Fri Jan  4 03:33:00 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Fri, 4 Jan 2002 04:33:00 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
References: <004101c194cd$701deb20$47fdbac3@newmexico>
Message-ID: <00fe01c194d0$82fb4be0$47fdbac3@newmexico>

I have been sloppy in the explanation (dangerous!)

> We could use a table to make the wrappers unique but we have
> potentially lots of them as you can imagine, jython people
> actually use java classes <wink>. 

The point is that we have potentially many java class instances
but not that much wrapper duplication for the same instance.
So it is not worth to pay the overhead and the complication
of making the wrappers unique. And it is still not worth
to pay it in order to implement a non-broken id.


From  Fri Jan  4 07:15:08 2002
From: (Anthony Baxter)
Date: Fri, 04 Jan 2002 18:15:08 +1100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
Message-ID: <>

[resend: sorry if you see it twice, but I can't see that the original
ever got through...]

Ok, I'd like to make the 2.1.2 release some time in the first
half of the week starting 7th Jan, assuming that's ok for the folks 
who'll need to do the work on the PC/Mac packaging.

I notice that pep 101 is pretty strongly focussed on the major releases, 
not the minor ones. Is it worth making a modified version of this PEP with 
the minor release steps?

the things to do:

   README file.
   NEWS file - should this have anything other than the socket.sendall()
               change?

I don't have access to, so someone else's going to
need to do this.

As far as 2.2.1 goes, I'm happy to keep on the patch czar role. Is
trying for a release before the conference too aggressive a timeframe?
There seem to be a number of niggles that'd be nice to have fixed...


From  Fri Jan  4 07:53:11 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 02:53:11 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <>
Message-ID: <>

>>>>> "AB" == Anthony Baxter <> writes:

    AB> [resend: sorry if you see it twice, but I can't see that the
    AB> original ever got through...]

    AB> Ok, I'd like to make the 2.1.2 release some time in the first
    AB> half of the week starting 7th Jan, assuming that's ok for the
    AB> folks who'll need to do the work on the PC/Mac packaging.

One of the things I'd really like to be sure works in 2.1.2 is
largefile support.  I've had some trouble along these lines on
filesystems that I know have largefile (because a Python 2.2 built on
the same platform works fine).

Do we expect that largefile support should work in Python 2.1.2?  I'm
okay that autoconf detection fails as long as the instructions in the
posix module work:

I've had some failures on 2.4.7 kernels w/ ext3 filesystems.

    AB> I notice that pep 101 is pretty strongly focussed on the major
    AB> releases, not the minor ones. Is it worth making a modified
    AB> version of this PEP with the minor release steps?

I'd be more inclined to clone PEP 101 into a PEP 102 with micro
release instructions.  The nice thing about 101 is that you can just
go down the list, checking things off in a linear fashion as you
complete each item.  I'd be loath to break up the linearity of that.

    AB> I don't have access to, so someone else's
    AB> going to need to do this.

I can certainly help with any fiddling necessary on creosote.  Then

    AB> As far as 2.2.1 goes, I'm happy to keep on the patch czar
    AB> role.

...if this is going to be a recurring role, we might just want to give
you access to the web cvs tree and creosote.
    AB> Is trying for a release before the conference too
    AB> aggressive a timeframe?  There seem to be a number of niggles
    AB> that'd be nice to have fixed...

Hey, if you're up for it!

dunno-about-you-but-i'm-planning-a-vacation-ly y'rs,

From  Fri Jan  4 09:21:09 2002
From: (M.-A. Lemburg)
Date: Fri, 04 Jan 2002 10:21:09 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil>
Message-ID: <>

[Skip wants open() to handle Unicode on all platforms]

As Martin and Neil have already explained, the handling of national
characters in file names is not standardized at all across 
platforms (not even file systems on one platform, e.g. on Linux).

The only option I see to make this situation less painful is
to write a filename subsystem which implements two generic
APIs:

1. file open using strings and Unicode

2. file listing using either Unicode or strings with a predefined
   encoding in the output list

Since this subsystem would be fairly complicated, I'd suggest 
that someone writes a PEP on the topic and then the various
experts try to come up with implementations which work on at
least some systems and a fallback implementation which gets
used if no other implementation fits.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan  4 10:08:51 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 11:08:51 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (message from
 Anthony Baxter on Fri, 04 Jan 2002 18:15:08 +1100)
References: <>
Message-ID: <>

> I notice that pep 101 is pretty strongly focussed on the major releases, 
> not the minor ones. Is it worth making a modified version of this PEP with 
> the minor release steps?

If you don't think you'd get it "right", adding a delta section might
be reasonable. Specifically:

  Don't create a release branch. Instead, just call a code freeze on
  the maintenance branch, and release from the maintenance branch
  (just putting on the release tag, i.e. r212)

As for the things still to be done, don't forget Include/patchlevel.h :-)

>    NEWS file - should this have anything other than the socket.sendall()
>                change?

If you can, producing a list of bugs fixed would be nice. It does not
need to be exhaustive.

> As far as 2.2.1 goes, I'm happy to keep on the patch czar role. Is
> trying for a release before the conference too aggressive a timeframe?

I'd very much encourage a release in that time frame.


From  Fri Jan  4 10:38:28 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 11:38:28 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (
References: <> <>
Message-ID: <>

> Do we expect that largefile support should work in Python 2.1.2?  I'm
> okay that autoconf detection fails as long as the instructions in the
> posix module work:

I don't think we can get autoconf detection to work on 2.1. The
instructions are right. Unfortunately, the code is wrong: It prefers
fgetpos in 2.1, but that returns not an integral type on some
systems.

I think the best approach is to copy the body of _portable_fseek and
_portable_ftell from 2.2. With that, I get a setup that at least looks
right (patch attached)

> I've had some failures on 2.4.7 kernels w/ ext3 filesystems.

Were these compilation failures, or runtime failures? For the
compilation failures, ext3 should be irrelevant, and 2.4.7 should be
irrelevant as well - the glibc version would matter (which defines
fpos_t to be a struct with an mbstate_t inside).


Index: fileobject.c
RCS file: /cvsroot/python/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.110
diff -u -r2.110 fileobject.c
--- fileobject.c	2001/04/14 17:49:40	2.110
+++ fileobject.c	2002/01/04 10:31:39
@@ -225,20 +225,28 @@
 static int
 _portable_fseek(FILE *fp, Py_off_t offset, int whence)
-#if defined(HAVE_FSEEKO)
+	return fseek(fp, offset, whence);
+#elif defined(HAVE_FSEEKO) && SIZEOF_OFF_T >= 8
 	return fseeko(fp, offset, whence);
 #elif defined(HAVE_FSEEK64)
 	return fseek64(fp, offset, whence);
 #elif defined(__BEOS__)
 	return _fseek(fp, offset, whence);
+#elif SIZEOF_FPOS_T >= 8
 	/* lacking a 64-bit capable fseek(), use a 64-bit capable fsetpos()
 	   and fgetpos() to implement fseek()*/
 	fpos_t pos;
 	switch (whence) {
 	case SEEK_END:
+#ifdef MS_WINDOWS
+		fflush(fp);
+		if (_lseeki64(fileno(fp), 0, 2) == -1)
+			return -1;
 		if (fseek(fp, 0, SEEK_END) != 0)
 			return -1;
 		/* fall through */
 	case SEEK_CUR:
 		if (fgetpos(fp, &pos) != 0)
@@ -249,7 +257,7 @@
 	return fsetpos(fp, &offset);
-	return fseek(fp, offset, whence);
+#error "Large file support, but no way to fseek."
@@ -260,17 +268,19 @@
 static Py_off_t
 _portable_ftell(FILE* fp)
+	return ftell(fp);
+#elif defined(HAVE_FTELLO) && SIZEOF_OFF_T >= 8
+	return ftello(fp);
+#elif defined(HAVE_FTELL64)
+	return ftell64(fp);
+#elif SIZEOF_FPOS_T >= 8
 	fpos_t pos;
 	if (fgetpos(fp, &pos) != 0)
 		return -1;
 	return pos;
-#elif defined(HAVE_FTELLO) && defined(HAVE_LARGEFILE_SUPPORT)
-	return ftello(fp);
-#elif defined(HAVE_FTELL64) && defined(HAVE_LARGEFILE_SUPPORT)
-	return ftell64(fp);
-	return ftell(fp);
+#error "Large file support, but no way to ftell."

From  Fri Jan  4 10:46:20 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 11:46:20 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <>
Message-ID: <>

> The only option I see to make this situation less painful is
> to write a filename subsystem which implements two generic
> APIs:
> 1. file open using strings and Unicode

I think this "pretty much" works in Python 2.2 already. It uses the
"mbcs" encoding on Windows, and the locale's encoding on Unix if
locale.setlocale has been called (and the C library is good enough).
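
Roughly, the selection described here can be pictured as follows. This is
a sketch, not the code in the interpreter: pick_filename_encoding is a
hypothetical helper, the locale codeset is passed in rather than queried
from the C library, and the final fallback value is an assumption made
for the example.

```python
def pick_filename_encoding(platform, locale_codeset, fallback="ascii"):
    """Choose the encoding used to turn a Unicode file name into bytes.

    platform       -- a sys.platform style string such as "win32" or "linux"
    locale_codeset -- the codeset reported by the locale, or None if
                      setlocale was never called or the C library cannot tell
    fallback       -- what to use when nothing better is known (assumed)
    """
    if platform.startswith("win"):
        return "mbcs"
    if locale_codeset:
        return locale_codeset
    return fallback
```

For instance, pick_filename_encoding("linux", "ISO-8859-1") yields
"ISO-8859-1", while pick_filename_encoding("linux", None) yields the
fallback.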

That might be still wrong if the file system expects UTF-8, or a fixed
encoding (e.g. on an NTFS or VFAT partition mounted on Linux), but I
don't think there is anything that can be done about this: It would be
a misconfigured system if then the user doesn't also use an UTF-8
locale.

> 2. file listing using either Unicode or strings with a predefined
>    encoding in the output list

That is something that certainly needs to be done. Having a PEP on
that would be useful.


From  Fri Jan  4 10:54:43 2002
From: (Neil Hodgson)
Date: Fri, 4 Jan 2002 21:54:43 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <>
Message-ID: <008c01c1950e$3775c3b0$0acc8490@neil>

Marc-Andre Lemburg:

> The only option I see to make this situation less painful is
> to write a filename subsystem which implements two generic
> APIs:
> 1. file open using strings and Unicode
> 2. file listing using either Unicode or strings with a predefined
>    encoding in the output list

   I started work on this in C++ for my SciTE editor a couple of months ago
but the design started to include stuff like 'are these two paths pointing
at one file', converting between OpenVMS and Unix paths, and handling URLs
(at least using ftp and http). My brain threatened to explode if it got any
more complex so it got moved to the 'future niceness' pile.


From  Fri Jan  4 11:11:12 2002
From: (M.-A. Lemburg)
Date: Fri, 04 Jan 2002 12:11:12 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > The only option I see to make this situation less painful is
> > to write a filename subsystem which implements two generic
> > APIs:
> >
> > 1. file open using strings and Unicode
> I think this "pretty much" works in Python 2.2 already. It uses the
> "mbcs" encoding on Windows, and the locale's encoding on Unix if
> locale.setlocale has been called (and the C library is good enough).
> That might be still wrong if the file system expects UTF-8, or a fixed
> encoding (e.g. on an NTFS or VFAT partition mounted on Linux), but I
> don't think there is anything that can be done about this: It would be
> a misconfigured system if then the user doesn't also use an UTF-8
> locale.

We'd still need to support other OSes as well, though, and I
don't think that putting all this code into fileobject.c is
a good idea -- after all opening files is needed by some other
parts of Python as well and may also be useful for extensions.

I'd suggest to implement something similar to the DLL loading
code which is also implemented as subsystem in Python.
> > 2. file listing using either Unicode or strings with a predefined
> >    encoding in the output list
> That is something that certainly needs to be done. Having a PEP on
> that would be useful.


Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan  4 11:14:21 2002
From: (Michael Hudson)
Date: 04 Jan 2002 11:14:21 +0000
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Anthony Baxter's message of "Fri, 04 Jan 2002 18:15:08 +1100"
References: <>
Message-ID: <>

Anthony Baxter <> writes:

> As far as 2.2.1 goes, I'm happy to keep on the patch czar role. 

Fine, so long as you get on with it :) I was going to merge this week's
bugfixes this morning...

> Is trying for a release before the conference too aggressive a
> timeframe?  There seem to be a number of niggles that'd be nice to
> have fixed...

That's probably a reasonable timeframe, so long as the niggles
actually do get fixed by then.  Picklability of the struct_seq
thingies is one that might be a bit of a pain.


31. Simplicity does not precede complexity, but follows it.
  -- Alan Perlis,

From  Fri Jan  4 11:20:12 2002
From: (M.-A. Lemburg)
Date: Fri, 04 Jan 2002 12:20:12 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <008c01c1950e$3775c3b0$0acc8490@neil>
Message-ID: <>

Neil Hodgson wrote:
> Marc-Andre Lemburg:
> > The only option I see to make this situation less painful is
> > to write a filename subsystem which implements two generic
> > APIs:
> >
> > 1. file open using strings and Unicode
> >
> > 2. file listing using either Unicode or strings with a predefined
> >    encoding in the output list
>    I started work on this in C++ for my SciTE editor a couple of months ago
> but the design started to include stuff like 'are these two paths pointing
> at one file', converting between OpenVMS and Unix paths, and handling URLs
> (at least using ftp and http). My brain threatened to explode if it got any
> more complex so it got moved to the 'future niceness' pile.

I believe that we could do well with the following assumptions:
a) strings passed to open() use whatever encoding is needed by the 
   file system
b) Unicode objects passed to open() are converted to whatever the file
   system needs by the open() API.
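
Assumptions a) and b) can be sketched in today's Python 3 terms, where
bytes plays the role of the 8-bit string and str that of the Unicode
object. The helper name to_fs_name is hypothetical; sys.getfilesystemencoding()
is used only as a convenient default for the example.

```python
import sys


def to_fs_name(name, fs_encoding=None):
    """Return the byte string that would be handed to the file system."""
    if isinstance(name, bytes):
        # a) byte strings are assumed to be in the right encoding already
        return name
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # b) Unicode is converted to whatever the file system needs
    return name.encode(fs_encoding)
```

A caller that already knows the on-disk encoding passes bytes and is left
alone; everyone else passes Unicode and lets the layer decide.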

This doesn't cover all the possibilities, but goes a long way. Joining
paths between file systems should really be left to the os.path

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan  4 12:22:47 2002
From: (Jack Jansen)
Date: Fri, 04 Jan 2002 13:22:47 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Fri, 4 Jan 2002 00:23:35 +0100 , <>
Message-ID: <>

Sigh, I let myself be drawn in again, despite my previous

Recently, "Martin v. Loewis" <> said:
> > For this it should be as backward-compatible as possible, i.e.  if
> > some API expects a unicode filename and I pass "a.out" it should
> > interpret it as u"a.out".
> That works fine with the current API.

No, it doesn't, that is the whole point of why I started this

If the Python wrapper around the API uses PyArg_Parse("u") then it
will barf on "a.out", if the wrapper uses "u#" it will not barf but
instead completely misinterpret the StringObject containing "a.out",
interpreting it as the binary representation of 3 unicode characters
or something far worse!
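
To make the misinterpretation concrete, here is a small Python 3
illustration. It assumes the six bytes of the C string, including its
trailing NUL, are read as three 16-bit little-endian code units; the
actual width and byte order depend on the build and the platform.

```python
# The 8-bit string "a.out" as it sits in memory, with its C terminator.
c_string = b"a.out\x00"

# Reading the same bytes as 16-bit little-endian Unicode code units.
misread = c_string.decode("utf-16-le")

# Each pair of bytes collapses into one unrelated character.
code_points = [hex(ord(ch)) for ch in misread]
```

Here code_points comes out as ['0x2e61', '0x756f', '0x74']: three
characters, none of which spell the intended file name.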

Yes, there is a workaround with the "O" format and three more function
calls, but I wouldn't call that "works fine"...

> > Using Python StringObjects as binary buffers is also far less common
> > than using StringObjects to store plain old strings, so if either of
> > these uses bites the other it's the binary buffer that needs to
> > suffer.
> This is a conclusion I cannot agree with. Most strings are really
> binary, if you look at them closely enough :-)

I'm not sure I understand this remark. If you made it just for the
smiley: never mind. If you really don't agree: please explain why.

Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++ | ++++ if you agree copy these lines to your sig ++++        | see 

From  Fri Jan  4 12:30:48 2002
From: (Michael Hudson)
Date: 04 Jan 2002 12:30:48 +0000
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.141,2.142
In-Reply-To: Neal Norwitz's message of "Tue, 01 Jan 2002 11:07:15 -0800"
References: <>
Message-ID: <>

Neal Norwitz <> writes:

> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv2511/Objects
> Modified Files:
> 	fileobject.c 
> Log Message:
> SF Patch #494863, file.xreadlines() should raise ValueError if file is closed
> This makes xreadlines behave like all other file methods
> (other than close() which just returns).

Does this qualify as a bugfix?


From  Fri Jan  4 12:39:00 2002
From: (Fred L. Drake, Jr.)
Date: Fri, 4 Jan 2002 07:39:00 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.141,2.142
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Hudson writes:
 > > SF Patch #494863, file.xreadlines() should raise ValueError if file is closed
 > > 
 > > This makes xreadlines behave like all other file methods
 > > (other than close() which just returns).
 > Does this qualify as a bugfix?

  I think so.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Fri Jan  4 12:47:45 2002
From: (Jack Jansen)
Date: Fri, 04 Jan 2002 13:47:45 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: Message by "M.-A. Lemburg" <> ,
 Fri, 04 Jan 2002 10:21:09 +0100 , <>
Message-ID: <>

Off on a slight tangent:
On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles
this fine and so does open(). The OS does all the hard work for you:
it knows that some mounted disks may be in other 8-bit encodings (such
as MacRoman or MacJapanese for old mac disks, or probably latin-1 for NFS
filesystems, or god-knows-what for SMB mounted disks) and handles the
conversion.

But in Python (unix-Python we're talking here, not MacPython),
unicode(filename) fails, because site.encoding is "ascii".
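
The failure is easy to reproduce in miniature. The sketch below uses
Python 3 spellings (bytes.decode rather than unicode()), and try_decode
is a hypothetical helper written for this example.

```python
# A file name as the OS hands it back: UTF-8 encoded bytes.
raw_name = "caf\u00e9.txt".encode("utf-8")


def try_decode(raw, encoding):
    """Decode raw bytes, returning None where unicode() would have raised."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None
```

With "ascii" the name cannot be decoded at all, while "utf-8" recovers
it; a plain ASCII name such as b"a.out" works under either setting.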

Would it be safe to set site.encoding to utf8 on Mac OS X by default? 
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++ | ++++ if you agree copy these lines to your sig ++++        | see 

From  Fri Jan  4 13:35:28 2002
From: (Guido van Rossum)
Date: Fri, 04 Jan 2002 08:35:28 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Your message of "Fri, 04 Jan 2002 18:15:08 +1100."
References: <>
Message-ID: <>

> Ok, I'd like to make the 2.1.2 release some time in the first
> half of the week starting 7th Jan, assuming that's ok for the folks 
> who'll need to do the work on the PC/Mac packaging.

Cool!  I can't speak for the Mac folks -- they may still be exhausted
from the 2.2 release effort -- but I can't imagine this would be much
of a problem.

> I notice that pep 101 is pretty strongly focussed on the major
> releases, not the minor ones. Is it worth making a modified version
> of this PEP with the minor release steps?

Great idea!

> the things to do:
>    README file.
>    NEWS file - should this have anything other than the socket.sendall()
>                change?

The 2.1.1 NEWS file had a SF reference of each and every bug that was
fixed.  Is this worth doing?  (If it were me, the answer would be an
emphatic "no".)

> I don't have access to, so someone else's going to
> need to do this.

Barry & I are at your service.  I'm guessing you'll also need Fred's
help to roll out the docs (are there going to be 2.1.2 docs?) and
Tim's for the windows installer (which may be a bit of a pain since
we've switched installer builders for 2.2).

> As far as 2.2.1 goes, I'm happy to keep on the patch czar role. Is
> trying for a release before the conference too aggressive a timeframe?
> There seem to be a number of niggles that'd be nice to have fixed...

That would be very cool!  Should be plenty of time if we aim low.

--Guido van Rossum (home page:

From  Fri Jan  4 16:23:27 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 11:23:27 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> I don't think we can get autoconf detection to work on
    MvL> 2.1.

I don't mind.
    MvL> The instructions are right. Unfortunately, the code is
    MvL> wrong: It prefers fgetpos in 2.1, but that returns not an
    MvL> integral type on some systems.


    MvL> I think the best approach is to copy the body of
    MvL> _portable_fseek and _portable_ftell from 2.2. With that, I
    MvL> get a setup that atleast looks right (patch attached)

Unfortunately that's not enough, I suspect.

    >> I've had some failures on 2.4.7 kernels w/ ext3 filesystems.

    MvL> Were these compilation failures, or runtime failures? For the
    MvL> compilation failures, ext3 should be irrelevant, and 2.4.7
    MvL> should be irrelevant as well - the glibc version would matter
    MvL> (which defines fpos_t to be a struct with an mbstate_t
    MvL> inside).

Vanilla release21-maint will give compilation failures, which go away
with the patch (essentially what I tried on other systems).  But even
with these patches, test_largefile fails on the seek(2**31L).
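
The essence of that check can be sketched as below. This is not the code
of test_largefile itself, only a reduced illustration for a current
Python 3 interpreter; can_seek_past_2gb is a hypothetical name, and
nothing is written, so no disk space is consumed.

```python
import tempfile


def can_seek_past_2gb():
    """Report whether a file position of 2**31 survives a seek/tell round trip."""
    offset = 2 ** 31  # first position that no longer fits a signed 32-bit int
    with tempfile.TemporaryFile() as f:
        try:
            f.seek(offset)
            return f.tell() == offset
        except (OverflowError, OSError):
            # A build without working large file support ends up here.
            return False
```

A False result on a file system known to handle large files points at the
interpreter build rather than at the disk.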

So something else too is going on.

FTR: this is a stock Mandrake 8.1 system w/ glibc 2.2.4.

I don't have much time to spend looking into this right now, but it
would be good to fix for 2.1.2.


From  Fri Jan  4 16:41:26 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 11:41:26 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

> I'm guessing you'll also need Fred's help to roll out the docs (are
> there going to be 2.1.2 docs?) and Tim's for the windows installer
> (which may be a bit of a pain since we've switched installer builders
> for 2.2).

Ouch.  More than a bit -- I'd have to find the old Wise floppy first (it's
not on my disk anymore).  But the 16-bit installer is itself "a bug" (often
doesn't work) on the recent MS high-end OSes (2000 and XP), so creating
another of those is a dubious exercise.  We were probably shipping different
versions of expat and/or zlib on Windows for 2.1 too (but at least I can
suck those binaries out of an installed 2.1 -- or was Windows 2.1 compiled
with a binary-incompatible MSVC 5?).  Etc.  If I do this, it's going to
consume at least a day to straighten out all the issues.

From  Fri Jan  4 16:58:47 2002
From: (Guido van Rossum)
Date: Fri, 04 Jan 2002 11:58:47 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Your message of "Fri, 04 Jan 2002 11:41:26 EST."
References: <>
Message-ID: <>

> [Guido]
> > I'm guessing you'll also need Fred's help to roll out the docs (are
> > there going to be 2.1.2 docs?) and Tim's for the windows installer
> > (which may be a bit of a pain since we've switched installer builders
> > for 2.2).
> Ouch.  More than a bit -- I'd have to find the old Wise floppy first (it's
> not on my disk anymore).  But the 16-bit installer is itself "a bug" (often
> doesn't work) on the recent MS high-end OSes (2000 and XP), so creating
> another of those is a dubious exercise.  We were probably shipping different
> versions of expat and/or zlib on Windows for 2.1 too (but at least I can
> suck those binaries out of an installed 2.1 -- or was Windows 2.1 compiled
> with a binary-incompatible MSVC 5?).  Etc.  If I do this, it's going to
> consume at least a day to straighten out all the issues.

2.1 was solidly MSVC 6, so I don't think there were any MSVC 5 issues.

Would it be a problem if we used the new installer for 2.1.2?  That
would be much easier on Tim.  There are still some issues (e.g. expat)
but I'm not qualified to rule on those.

--Guido van Rossum (home page:

From  Fri Jan  4 17:06:53 2002
From: (M.-A. Lemburg)
Date: Fri, 04 Jan 2002 18:06:53 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <>
Message-ID: <>

Jack Jansen wrote:
> Off on a slight tangent:
> On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles
> this fine and so does open(). The OS does all the hard work for you:
> it knows that some mounted disks may be in other 8-bit encodings (such
> as MacRoman or MacJapanese for old mac disks, or probably latin-1 for NFS
> filesystems, or god-knows-what for SMB mounted disks) and handles the
> conversion.

That's good news.
> But in Python (unix-Python we're talking here, not MacPython),
> unicode(filename) fails, because site.encoding is "ascii".
> Would it be safe to set site.encoding to utf8 on Mac OS X by default?

I'd rather suggest to use UTF-8 as default encoding in the
subsystem layer I was talking about. 

Making UTF-8 the default Python system encoding would have many other 
consequences -- and you'd probably lose a great deal of portability 
since UTF-8 conversion (nearly) always will succeed while ASCII can 
easily fail on other systems which use e.g. Latin-1 as native 

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan  4 17:23:47 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 12:23:47 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

> 2.1 was solidly MSVC 6, so I don't think there were any MSVC 5 issues.

That matches my recollection, but while I'd bet your life on it I wouldn't
bet mine <wink>.

> Would it be a problem if we used the new installer for 2.1.2?  That
> would be much easier on Tim.

The real reason to use the new installer is that the old one is, and
increasingly as 2000 and XP get more popular, itself "a bug".  Getting a
32-bit installer is increasingly necessary, and the old installer can't deal
with the Win2K privilege maze at all (usually spelling "insufficient
privilege" as "corrupt installation detected" before its first dialog box
even appears -- the old Wise 16-bit-installer builder was released when
Win95 was brand new).

> There are still some issues (e.g. expat) but I'm not qualified to rule
> on those.

I'll ask Fred about that one offline.  It's all doable, it's just going to
consume some time.

From  Fri Jan  4 18:17:03 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 19:17:03 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <>
Message-ID: <>

> We'd still need to support other OSes as well, though, and I
> don't think that putting all this code into fileobject.c is
> a good idea -- after all opening files is needed by some other
> parts of Python as well and may also be useful for extensions.

The stuff isn't in fileobject.c. Py_FileSystemDefaultEncoding
is defined in bltinmodule.c.

Also, on other OSes: You can pass Unicode object to open on all
systems. If Py_FileSystemDefaultEncoding is NULL, it will fall back to

Of course, if the system has an open function that expects wchar_t*,
we might want to use that instead of going through a codec. Off hand,
Win32 seems to be the only system where this might work, and even
there, it won't work on Win95.

> I'd suggest to implement something similiar to the DLL loading
> code which is also implemented as subsystem in Python.

I'd say this is over-designed. It is not that there are ten
alternative approaches to doing encodings in file names, and we only
support two of them, but it is rather that there are only two, and we
support all three of them :-)

Also, it is more difficult than threads: for threads, there is a fixed
set of API features that need to be represented. Doing Py_UNICODE*
opening alone is easy, but look at the number of posixmodule functions
that all expect file names of some sort.


From  Fri Jan  4 18:53:36 2002
From: (Chris McDonough)
Date: Fri, 4 Jan 2002 13:53:36 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <><><> <>
Message-ID: <01e801c19551$222145a0$c617a8c0@kurtz>

Hi folks,

I'm subscribed to the list, but I'm still not quite sure if I'm supposed to
be posting here... I suppose I should go read the charter.  Please flame me
if this list is for the "in crowd" only. ;-)

I tried to get the 21-maintbranch LFS working using the directions that are
provided in the current docs
(, but it fails
to compile for me as a result.  Someone has suggested that it's not the
instructions that are broken, but the code.  Can this be confirmed?

Because ZC is forced to stick with Python 2.1.X (as opposed to 2.2.X) for
the current crop of Zope releases, and because we often need large file
support under Zope, it's pretty important for us to get a 2.1.X release
under which LFS works.  A workaround is fine as well.

I don't think I have the knowhow to fix it, but if I can help in any way by
testing under various Linuxii, please let me know.


----- Original Message -----
From: "Barry A. Warsaw" <>
To: "Martin v. Loewis" <>
Cc: <>; <>
Sent: Friday, January 04, 2002 11:23 AM
Subject: Re: [Python-Dev] release for 2.1.2, plus 2.2.1...

> >>>>> "MvL" == Martin v Loewis <> writes:
>     MvL> I don't think we can get autoconf detection to work on
>     MvL> 2.1.
> I don't mind.
>     MvL> The instructions are right. Unfortunately, the code is
>     MvL> wrong: It prefers fgetpos in 2.1, but that returns not an
>     MvL> integral type on some systems.
> Right.
>     MvL> I think the best approach is to copy the body of
>     MvL> _portable_fseek and _portable_ftell from 2.2. With that, I
>     MvL> get a setup that at least looks right (patch attached)
> Unfortunately that's not enough, I suspect.
>     >> I've had some failures on 2.4.7 kernels w/ ext3 filesystems.
>     MvL> Were these compilation failures, or runtime failures? For the
>     MvL> compilation failures, ext3 should be irrelevant, and 2.4.7
>     MvL> should be irrelevant as well - the glibc version would matter
>     MvL> (which defines fpos_t to be a struct with an mbstate_t
>     MvL> inside).
> Vanilla release21-maint will give compilation failures, which go away
> with the patch (essentially what I tried on other systems).  But even
> with these patches, test_largefile fails on the seek(2**31L).
> So something else too is going on.
> FTR: this is a stock Mandrake 8.1 system w/ glibc 2.2.4.
> I don't have much time to spend looking into this right now, but it
> would be good to fix for 2.1.2.
> -Barry

From  Fri Jan  4 18:40:34 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 19:40:34 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (message from Jack
 Jansen on Fri, 04 Jan 2002 13:22:47 +0100)
References: <>
Message-ID: <>

> No, it doesn't, that is the whole point of why I started this
> thread!!!!

Oops, right. I was thinking the other way around: passing u"a.out"
where "a.out" is expected works fine; for this case, the memory
management issues come into play.

> > > Using Python StringObjects as binary buffers is also far less common
> > > than using StringObjects to store plain old strings, so if either of
> > > these uses bites the other it's the binary buffer that needs to
> > > suffer.
> > 
> > This is a conclusion I cannot agree with. Most strings are really
> > binary, if you look at them closely enough :-)
> I'm not sure I understand this remark. If you made it just for the
> smiley: never mind. If you really don't agree: please explain why.

When the discussion of tagging binary strings in source code came up,
I started to look into the standard library which string literals
would have to be tagged as byte strings, and which are really
character strings.

I found that the overwhelming majority of string literals in the
standard Python library really denotes byte strings, if you ignore doc
strings. Sometimes, it isn't obvious that they are binary strings,
hence the smiley. Look at

__all__ = ["HTTP", ...

Not sure: Are Python function names byte strings or character strings?
Probably doesn't matter either way. Python source code is definitely
byte-oriented, explicitly without any assumed encoding, so I'd lean
towards byte strings here.


Looks like a character string. However, it is used in

        self.version = _UNKNOWN # HTTP-Version

self.version is later sent on the byte-oriented HTTP protocol, so
_UNKNOWN *is* a byte string.

_CS_IDLE = 'Idle'

These are enumerators, let's say they are character strings.

        self.fp = sock.makefile('rb', 0)

Not sure. Could be a character string.

            print "reply:", repr(line)

Definitely a character string.

                version = "HTTP/0.9"
                status = "200"
                reason = ""

Protocol elements, thus byte string.

So, I'm arguing that byte strings are far more common than you may
think at first sight. In particular, everything passed to .read(),
either of a file, or of a socket, is a byte string, since files and
network connections are byte-oriented. In the particular case of
network connections, applying system conventions for narrow strings
would be foolish.
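
A modern illustration of the point about .read(), offered only as a
sketch: it uses the separate bytes type of Python 3, which did not exist
when this was written, and an in-memory io.BytesIO object as a stand-in
for a socket's file object. The header text is made up for the example.

```python
import io

# Stand-in for sock.makefile('rb'); a real connection hands back the
# same kind of data, namely raw bytes.
wire = io.BytesIO(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n")

status_line = wire.readline()            # bytes, not text
version, status, reason = status_line.rstrip(b"\r\n").split(b" ", 2)

# Turning protocol bytes into text is an explicit step with a named
# encoding, not the platform's convention for narrow strings.
reason_text = reason.decode("iso-8859-1")
```

The protocol elements stay byte strings until the program decides, with
a stated encoding, that it wants characters.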


From  Fri Jan  4 19:02:03 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 14:02:03 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <>
Message-ID: <>

>>>>> "CM" == Chris McDonough <> writes:

    CM> I'm subscribed to the list, but I'm still not quite sure if
    CM> I'm supposed to be posting here... I suppose I should go read
    CM> the charter.  Please flame me if this list is for the "in
    CM> crowd" only. ;-)

You did fine, Chris! :)  Welcome.

    CM> I tried to get the 21-maintbranch LFS working using the
    CM> directions that are provided in the current docs
    CM> (,
    CM> but it fails to compile for me as a result.  Someone has
    CM> suggested that it's not the instructions that are broken, but
    CM> the code.  Can this be confirmed?

Confirmed.  The compilation errors can be fixed with the patch that
Martin sent around earlier in this thread.  So that probably ought to
be added to Python 2.1.2.  But the patch + the posix-large-file
instructions still don't enable large file support for me on glibc
2.2.4.  So something more is needed.

    CM> Because ZC is forced to stick with Python 2.1.X (as opposed to
    CM> 2.2.X) for the current crop of Zope releases, and because we
    CM> often need large file support under Zope, it's pretty
    CM> important for us to get a 2.1.X release under which LFS works.
    CM> A workaround is fine as well.

    CM> I don't think I have the knowhow to fix it, but if I can help
    CM> in any way by testing under various Linuxii, please let me
    CM> know.

I do plan to get back to this if nobody else fixes it in the
meantime, but I've got a couple of higher priority things to deal with
right now.

I'd say LFS in Python 2.1.2 should be a high priority.


From  Fri Jan  4 19:04:45 2002
From: (Guido van Rossum)
Date: Fri, 04 Jan 2002 14:04:45 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Your message of "Fri, 04 Jan 2002 14:02:03 EST."
References: <> <> <> <> <01e801c19551$222145a0$c617a8c0@kurtz>
Message-ID: <>

> Confirmed.  The compilation errors can be fixed with the patch that
> Martin sent around earlier in this thread.  So that probably ought to
> be added to Python 2.1.2.  But the patch + the posix-large-file
> instructions still don't enable large file support for me on glibc
> 2.2.4.  So something more is needed.

Hm, is it possible that glibc 2.2.4 is too old to support large files?

> I'd say LFS in Python 2.1.2 should be a high priority.


--Guido van Rossum (home page:

From  Fri Jan  4 19:14:29 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 14:14:29 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

> I'd say LFS in Python 2.1.2 should be a high priority.

I'd say it's a show-stopper.  Zope isn't the only client for large files;
besides, we could just tell Zope customers to upgrade to Windows, where LFS
has been part of the Win32 API since before Linus learned how to spell Perl

From Andreas Jung" <  Fri Jan  4 19:17:13 2002
From: Andreas Jung" < (Andreas Jung)
Date: Fri, 4 Jan 2002 14:17:13 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <> <> <> <> <01e801c19551$222145a0$c617a8c0@kurtz>              <>  <>
Message-ID: <070701c19554$6a31bac0$9e17a8c0@suxlap>

----- Original Message -----
From: "Guido van Rossum" <>
To: "Barry A. Warsaw" <>
Cc: "Chris McDonough" <>; "Martin v. Loewis"
<>; <>; <>
Sent: Friday, January 04, 2002 14:04
Subject: Re: [Python-Dev] release for 2.1.2, plus 2.2.1...

> > Confirmed.  The compilation errors can be fixed with the patch that
> > Martin sent around earlier in this thread.  So that probably ought to
> > be added to Python 2.1.2.  But the patch + the posix-large-file
> > instructions still don't enable large file support for me on glibc
> > 2.2.4.  So something more is needed.
> Hm, is it possible that glibc 2.2.4 is too old to support large files?
I would be surprised if glibc 2.2.4 does not support LFS. Some months
ago I installed Python 2.1 on a "older" RH 7.1 system with LFS support.
The glibc version of RH7.1 is most likely  older than 2.2.4.


From  Fri Jan  4 19:29:43 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 14:29:43 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <>
Message-ID: <>

>>>>> "GvR" == Guido van Rossum <> writes:

    GvR> Hm, is it possible that glibc 2.2.4 is too old to support
    GvR> large files?

Doubtful.  This is the stock glibc that comes with Mandrake 8.1, which
is their latest offering.  And besides, Python 2.2 on the same box
supports LFS just fine!


From  Fri Jan  4 19:36:51 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:36:51 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

>     MvL> I think the best approach is to copy the body of
>     MvL> _portable_fseek and _portable_ftell from 2.2. With that, I
>     MvL> get a setup that at least looks right (patch attached)
> Unfortunately that's not enough, I suspect.

I can't see a problem.

> Vanilla release21-maint will give compilation failures, which go away
> with the patch (essentially what I tried on other systems).  But even
> with these patches, test_largefile fails on the seek(2**31L).

Not for me (i.e. it passes just fine). How exactly does it fail? What
version of the test?  Can you produce an strace?

> FTR: this is a stock Mandrake 8.1 system w/ glibc 2.2.4.

That should be good enough.

> I don't have much time to spend looking into this right now, but it
> would be good to fix for 2.1.2.

Somebody else should probably try this as well. I would not stop the
release for that: if it compiles fine when following the instructions,
and does the right thing for small files, I think the release should


From  Fri Jan  4 19:27:59 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:27:59 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (message from Jack
 Jansen on Fri, 04 Jan 2002 13:47:45 +0100)
References: <>
Message-ID: <>

> Would it be safe to set site.encoding to utf8 on Mac OS X by default? 

As MAL explains, no. Instead, you should extend the fragment

#if defined(MS_WIN32) && defined(HAVE_USABLE_WCHAR_T)
const char *Py_FileSystemDefaultEncoding = "mbcs";
#else
const char *Py_FileSystemDefaultEncoding = NULL; /* use default */
#endif

to cover OSX as well, setting the string to "utf-8". Then, Unicode
objects will be auto-converted to UTF-8 in open() and all posixmodule
calls; not sure whether OSX uses posixmodule, though...

Once you've done this, you should use es# specifiers with
Py_FileSystemDefaultEncoding wherever you retrieve a file or path
name from the application.
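
For readers on later Python versions, the same idea is visible from
Python code. The sketch below assumes Python 3.2 or newer, where
sys.getfilesystemencoding(), os.fsencode() and os.fsdecode() exist; none
of these were available at the time of this thread, and "a.out" is just
an example name.

```python
import os
import sys

# The encoding the interpreter applies to file names on this platform.
fs_encoding = sys.getfilesystemencoding()

name = "a.out"                    # example file name, pure ASCII
encoded = os.fsencode(name)       # str -> bytes using the file system encoding
decoded = os.fsdecode(encoded)    # bytes -> str, the reverse direction
```

Passing a text file name to open() performs the encoding step
implicitly, which is the auto-conversion described above.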

Returning file names to the user is a different story, though: it may
or may not be sensible to apply the file system encoding (if set)
whenever file names are returned to the application (mostly in


From  Fri Jan  4 19:21:07 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:21:07 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (message
 from Guido van Rossum on Fri, 04 Jan 2002 11:58:47 -0500)
References: <> <>
Message-ID: <>

> Would it be a problem if we used the new installer for 2.1.2?  That
> would be much easier on Tim.  There are still some issues (e.g. expat)
> but I'm not qualified to rule on those.

My guess is that 2.1.2 will compile fine with whatever expat
installation Tim currently has, if it does, pyexpat will certainly
work correctly (or: as good as it did in 2.1.1).


From  Fri Jan  4 18:42:42 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 19:42:42 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.141,2.142
In-Reply-To: <> (message from Michael Hudson
 on 04 Jan 2002 12:30:48 +0000)
References: <> <>
Message-ID: <>

> > This makes xreadlines behave like all other file methods
> > (other than close() which just returns).
> Does this qualify as a bugfix?

Yes. But it also tightens the behaviour, so it should not be applied
to maintenance branches: no correct program would work better with
this patch, but currently broken programs may stop working.


From  Fri Jan  4 19:49:24 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:49:24 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <01e801c19551$222145a0$c617a8c0@kurtz> (
References: <><><> <> <01e801c19551$222145a0$c617a8c0@kurtz>
Message-ID: <>

> I'm subscribed to the list, but I'm still not quite sure if I'm supposed to
> be posting here... I suppose I should go read the charter.  Please flame me
> if this list is for the "in crowd" only. ;-)

This list is for development *of* Python. Anybody is free to post
questions and comments on that topic, like you just did; I don't like
it when people post questions of the "how do I ... in Python" kind
that you typically see on python-list - this is not a list to get
better help :-)

> I tried to get the 21-maintbranch LFS working using the directions that are
> provided in the current docs
> (, but it fails
> to compile for me as a result.  Someone has suggested that it's not the
> instructions that are broken, but the code.  Can this be confirmed?

Well, you did not describe exactly how it fails to compile for
you. Assuming you got an error that something is not an integral type,
then that is clearly an error in the code. You might want to
investigate the error message you get more closely; please confirm
that it refers to the return value of fgetpos.

If you need further confirmation, I recommend that you invoke the gcc
line that fails adding --save-temps, and inspect the resulting
fileobject.i. You will likely find that fpos_t is a structure, and
that Python attempts to return it in a place where an integer is
needed (or vice versa).

> Because ZC is forced to stick with Python 2.1.X (as opposed to 2.2.X) for
> the current crop of Zope releases, and because we often need large file
> support under Zope, it's pretty important for us to get a 2.1.X release
> under which LFS works.  A workaround is fine as well.

Please try the patch I posted, and report whether test_largefile
passes or fails (or, if it fails to compile, what the exact error
messages are).
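
A quick stand-alone check of the failing operation, as a sketch only: it
is not the real test_largefile, it is written for a current Python 3
interpreter, and it merely confirms that an offset of 2**31 can be
passed to seek() and read back with tell().

```python
import tempfile

LARGE = 2 ** 31   # first offset that no longer fits in a signed 32-bit integer

with tempfile.TemporaryFile() as f:
    f.seek(LARGE)          # seeking past the end writes nothing to disk
    position = f.tell()

large_offsets_ok = (position == LARGE)
```

On a build without working large file support, the seek() call is where
an error would be expected to surface.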


From  Fri Jan  4 19:52:50 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:52:50 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (
References: <>
 <01e801c19551$222145a0$c617a8c0@kurtz> <>
Message-ID: <>

> Confirmed.  The compilation errors can be fixed with the patch that
> Martin sent around earlier in this thread.  So that probably ought to
> be added to Python 2.1.2.  But the patch + the posix-large-file
> instructions still don't enable large file support for me on glibc
> 2.2.4.  So something more is needed.

One possible difference between your and my installation is that you
probably followed the Linux instructions, whereas I followed the
Solaris instructions (even though my system is Linux). I did so
because of

martin@mira:~/work/python/dist/src> getconf LFS_CFLAGS

So getconf works fine on Linux, as well, and DTRT. Could you please
recompile your installation using the getconf approach alone?

> I do plan to get back to this if nobody else fixes it in the
> meantime, but I've got a couple of higher priority things to deal with
> right now.
> I'd say LFS in Python 2.1.2 should be a high priority.

I'd like to see an independent confirmation first that there still is
a problem to solve.


From  Fri Jan  4 19:53:34 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 20:53:34 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (message
 from Guido van Rossum on Fri, 04 Jan 2002 14:04:45 -0500)
References: <> <> <> <> <01e801c19551$222145a0$c617a8c0@kurtz>
 <> <>
Message-ID: <>

> Hm, is it possible that glibc 2.2.4 is too old to support large files?

No, it is the current release.


From  Fri Jan  4 19:58:15 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 14:58:15 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

[Martin v. Loewis]
> My guess is that 2.1.2 will compile fine with whatever expat
> installation Tim currently has, if it does, pyexpat will certainly
> work correctly (or: as good as it did in 2.1.1).

It changes the structure of the distribution, though:  2.1 Windows Python
shipped with xmlparse.dll and xmltok.dll, 2.2 with neither of those but with
a single expat.dll instead.  Regardless of whether "it works" for Python, I
don't think a bugfix release is the time to change the *set* of DLLs we
ship.  The MSVC project files on the 2.1 branch also have no idea what to do
with the current expat setup, and last-second changes just multiply if I
fight that too (the 2.1 PCbuild README would also need to be changed; etc).
2.2 is better here, but the old expat setup wasn't "a bug"; people who want
the new setup should upgrade to 2.2, where it was first introduced.

From  Fri Jan  4 20:04:29 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 21:04:29 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
References: <>
Message-ID: <>

> It changes the structure of the distribution, though:  2.1 Windows Python
> shipped with xmlparse.dll and xmltok.dll, 2.2 with neither of those but with
> a single expat.dll instead.  Regardless of whether "it works" for Python, I
> don't think a bugfix release is the time to change the *set* of DLLs we
> ship. 

Right, I agree on all accounts. Do whatever is most convenient for you.

On that topic, is there anybody else in this list who has the
necessary software to build a Python Windows release? I feel quite
uncomfortable thinking that, if your PC crashes, Windows people would
be without Python (actually, *that* uncomfortable is that thought not

This would probably be the time to step forward offering to build the
official 2.1.2 binary distribution.


From  Fri Jan  4 20:07:26 2002
From: (Skip Montanaro)
Date: Fri, 4 Jan 2002 14:07:26 -0600
Subject: [Python-Dev] To post or not to post, that is the question...
In-Reply-To: <>
Message-ID: <>

    Martin> I don't like it when people post questions of the "how do I
    Martin> ... in Python" kind that you typically see on python-list - this
    Martin> is not a list to get better help :-)

Somewhat au contraire from this neck of the woods...  In my Unicode filename
thread I decided it would be best to post here for a couple reasons:

    * I figured most good answers would come from Martin and Marc-Andrè.

    * It's not clear that the "right way" to do this stuff appears to be
      settled, which I think has been proven out somewhat by the extended
      thread and the long thread Jack started about Unicode and getargs.c.

From  Fri Jan  4 20:08:57 2002
From: (Fred L. Drake, Jr.)
Date: Fri, 4 Jan 2002 15:08:57 -0500 (EST)
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
References: <>
Message-ID: <>

Martin v. Loewis writes:
 > My guess is that 2.1.2 will compile fine with whatever expat
 > installation Tim currently has, if it does, pyexpat will certainly
 > work correctly (or: as good as it did in 2.1.1).

  I like that answer.  ;-)  The catch is that I think Python 2.1.1
includes Expat 1.2, and the Python API changes slightly based on the
Expat version.  So I think it best to use the Expat shipped with
Python 2.1.1.  The pyexpat extension should need no changes.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Fri Jan  4 20:11:01 2002
From: (Guido van Rossum)
Date: Fri, 04 Jan 2002 15:11:01 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Your message of "Fri, 04 Jan 2002 21:04:29 +0100."
References: <>
Message-ID: <>

> On that topic, is there anybody else in this list who has the
> necessary software to build a Python Windows release? I feel quite
> uncomfortable thinking that, if your PC crashes, Windows people would
> be without Python (actually, *that* uncomfortable is that thought not
> :-)

I have the whole suite working on my laptop too.

> This would probably be the time to step forward offering to build the
> official 2.1.2 binary distribution.

But I'm not volunteering.

--Guido van Rossum (home page:

From  Fri Jan  4 20:25:43 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 15:25:43 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

[Martin v. Loewis]
> Right, I agree on all accounts. Do whatever is most convenient for you.

I don't care what's convenient, I want to do what's *right* for 2.1.2.  I
think this got confused because Guido sold "use the new installer-builder"
on the grounds that it would be easier for me, but, while that's true, it's
also true that the old installer-builder produces broken installers (useless
for many Win2K/XP users).  The latter is the real reason I want to use the
new installer-builder; the old one is quite arguably "a bug".

> On that topic, is there anybody else in this list who has the
> necessary software to build a Python Windows release? I feel quite
> uncomfortable thinking that, if your PC crashes, Windows people would
> be without Python (actually, *that* uncomfortable is that thought not
> :-)

The only pieces you can't get for free over the Web are MSVC 6 and Wise
8.14; the MSVC and Wise project files are in CVS, so it's only the MS and
Wise executables someone would have to obtain.  PythonLabs has the physical
CDs for those, so it doesn't matter much if my box crashes; I also have at
least 3 copies of them on backup tapes, and two other copies on two other
machines.  Hmm.  I probably violated the license agreements at least 4 times
there <wink/sigh>.

> This would probably be the time to step forward offering to build the
> official 2.1.2 binary distribution.

Don't I wish.  I would like to see us move to a free installer.  I built
(and checked in) an Inno Setup project file that does "almost all" the good
stuff, and advertised on for volunteers to take it over.  Alas,
nobody bit, and I can't justify spending more of my time on it (I could at
the time because I wasn't making any progress then getting Wise to let us
use a new version of their stuff, and Inno Setup was the only feasible

From  Fri Jan  4 20:26:51 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 15:26:51 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

> I have the whole suite working on my laptop too.

I wasn't going to admit that:  that one's a serious license violation

From  Fri Jan  4 20:32:26 2002
From: (M.-A. Lemburg)
Date: Fri, 04 Jan 2002 21:32:26 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <>
Message-ID: <>

Tim Peters wrote:
> [Guido]
> > I have the whole suite working on my laptop too.
> I wasn't going to admit that:  that one's a serious license violation
> <wink>.

Is it really ? Most desktop apps nowadays allow one additional laptop

Totally off-topic, of course,
Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan  4 20:34:34 2002
From: (Tim Peters)
Date: Fri, 4 Jan 2002 15:34:34 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <>
Message-ID: <>

> I have the whole suite working on my laptop too.

> I wasn't going to admit that:  that one's a serious license violation
> <wink>.

> Is it really ? Most desktop apps nowadays allow one additional laptop
> installation.

Yes, and my laptop != Guido's laptop.  Look at all the trouble you're
getting us into here ...

From  Fri Jan  4 21:00:24 2002
From: (Aahz Maruch)
Date: Fri, 4 Jan 2002 13:00:24 -0800 (PST)
Subject: Re [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <004101c194cd$701deb20$47fdbac3@newmexico> from "Samuele Pedroni" at Jan 04, 2002 04:11:00 AM
Message-ID: <>

Samuele Pedroni wrote:
> [Tim Peters] 
>> Mapping what to what?  A fine implementation of id() would be to hand each
>> new object a unique Java int from a global counter, incremented once per
>> Python object creation -- or a Java long if any JVM stays up long enough
>> that 32 bits is an issue <wink>.
> The problem are java class instances, sir, we use non-unique wrappers
> for them and identity is simulated.  We could use a table to make
> the wrappers unique but we have potentially lots of them as you can
> imagine, jython people actually use java classes <wink>. So the
> workaround is to keep a table just for the java instances for which
> someone has asked the id.

I'm slightly confuzzled here (no surprise given how little Java I know).
How does Jython know which Java class instance to refer to if there's
no mapping?  If there is a mapping, how does it slow things down to
create an id every time a map gets created?  (Yes, it'll chew up memory,
but Java uses so much memory already... ;-)
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Fri Jan  4 21:21:35 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 22:21:35 +0100
Subject: [Python-Dev] To post or not to post, that is the question...
In-Reply-To: <> (message from Skip
 Montanaro on Fri, 4 Jan 2002 14:07:26 -0600)
References: <>
Message-ID: <>

>     * I figured most good answers would come from Martin and Marc-Andrè.

That alone is not a good reason. python-dev is not a place to get free
consulting (which, of course, is more bothersome to whoever gives the
consulting, than to who receives it).

>     * It's not clear that the "right way" to do this stuff appears to be
>       settled, which I think has been proven out somewhat by the extended
>       thread and the long thread Jack started about Unicode and getargs.c.

Well, I was mostly referring to things that are either documented, or
can be found easily through source inspection. IOW, I expect that
python-dev posters do their homework before posting. Things like "is
it really that you cannot do X, but that it should be possible to do
so", or "what is the exact rationale for Y happening" are definitely
python-dev issues.


From  Fri Jan  4 21:39:49 2002
From: (Martin v. Loewis)
Date: Fri, 4 Jan 2002 22:39:49 +0100
Subject: Re [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <> (
References: <>
Message-ID: <>

> I'm slightly confuzzled here (no surprise given how little Java I know).
> How does Jython know which Java class instance to refer to if there's
> no mapping?  

I understand Samuele was talking about mapping in the Python sense
(existence of dictionary-style containers); he also mentioned that
Jython creates a Python wrapper object for each "foreign" Java object.

Creating wrapper objects immediately raises the issue of identity: If
you get the very same Java objects two times, do you want to use the
same wrapper object? If yes, how do you find out that you already have
a wrapper object. This is where the mapping comes into play.

If no, how do you implement "is"? Well, that's easy:

  def is(o1, o2):
    if o1 instanceof wrapper:
      if not o2 instanceof wrapper: return false
      return o1.wrapped identical_to o2.wrapped
    return o1 identical_to o2

Now, how do you implement id()? More tricky: Tim suggests you bump a
counter every time you create a Python object. Works fine for "true"
python objects:

  def id(o):
    return o.countervalue

Doesn't work as well for wrapper objects: When should you bump the
counter? When you create the wrapper? But then there may be two
wrappers with different ids referring to the very same object, so you'd
have 'o1 is o2 and id(o1) <> id(o2)' which clearly is a no-no.
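
The problem can be reproduced in plain Python. This is a sketch under
stated assumptions: Wrapper and same() are hypothetical stand-ins for
Jython's wrapper class and its "is" test, and an ordinary Python object
plays the role of the foreign Java object.

```python
import itertools

_counter = itertools.count(1)

class Wrapper:
    """Hypothetical stand-in for a wrapper around a foreign object."""
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.countervalue = next(_counter)   # bumped when the wrapper is created

def same(o1, o2):
    """Identity test: two wrappers are 'the same' if they wrap the same object."""
    if isinstance(o1, Wrapper):
        return isinstance(o2, Wrapper) and o1.wrapped is o2.wrapped
    return o1 is o2

java_object = object()                # stands in for one Java object
w1 = Wrapper(java_object)
w2 = Wrapper(java_object)

identical = same(w1, w2)                           # True
ids_differ = w1.countervalue != w2.countervalue    # True: the no-no
```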

> If there is a mapping, how does it slow things down to create an id
> every time a map gets created?

You can do it like this:

map = {}

def wrap(java.lang.Object o):
  try:
    return map[o]
  except KeyError:
    map[o] = res = wrapper(o, new_id())
    return res

That requires a map lookup every time a wrapper is created; clearly
undesirable. I think Samuele had something in mind like:

map = {}

def wrap(java.lang.Object o):
  return wrapper(o, None)

def id(o):
  if not o instanceof wrapper:
    return o.countervalue
  if o.countervalue:
    return o.countervalue
  try:
    o.countervalue = map[o.wrapped]
  except KeyError:
    o.countervalue = map[o.wrapped] = new_id()
  return o.countervalue

So you'd take the cost of a map lookup only if somebody accesses the
id of an object.

That would still mean that all Java objects whose Python id() was ever
computed would live in the dictionary forever; there you need a


From Samuele Pedroni" <  Fri Jan  4 21:55:14 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Fri, 4 Jan 2002 22:55:14 +0100
Subject: [Python-Dev] object equality vs identity, in and dicts idioms and speed
References: <> <>
Message-ID: <003701c1956a$7e296a80$6d94fea9@newmexico>

Martin's explanation is correct.

[Martin v. Loewis]
> You can do it like this:
> map = {}
> def wrap(java.lang.Object o):
>   try:
>     return map[o]
>   except KeyError:
>     map[o] = res = wrapper(o, new_id())
>     return res
> That requires a map lookup every time a wrapper is created; clearly
> undesirable. I think Samuele had something in mind like:

With this approach you could use less memory if there is
much wrapper duplication, but typically a Java object
does not get many long-lived different wrappers.

This "wrap" is quite a core operation, and the map needs
to be weak, otherwise you leak badly.
That means that you can implement it only
with java >1.2, and anyway weak dictionaries in java
require dealing with polling queues of reset weak-refs.
This means complication and slowdown where we
would prefer to avoid it.
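
To show what "weak" buys, here is a sketch using CPython's
weakref.WeakKeyDictionary. It is not the Java mechanism discussed above
(no reference queues are involved), JavaObject and id_for are
hypothetical names, and the immediate cleanup shown relies on CPython's
reference counting.

```python
import gc
import itertools
import weakref

class JavaObject:
    """Hypothetical foreign object; a class instance so it can be weakly referenced."""

_new_id = itertools.count(1)
_ids = weakref.WeakKeyDictionary()    # entries vanish when the key object dies

def id_for(obj):
    """Look up or assign an id without keeping obj alive."""
    try:
        return _ids[obj]
    except KeyError:
        _ids[obj] = result = next(_new_id)
        return result

obj = JavaObject()
first = id_for(obj)
second = id_for(obj)
entries_while_alive = len(_ids)

del obj            # drop the only strong reference
gc.collect()       # not strictly needed on CPython, kept for clarity
entries_after = len(_ids)
```

With an ordinary dictionary the entry would stay forever, which is the
leak described above.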


From  Fri Jan  4 22:11:29 2002
From: (Finn Bock)
Date: Fri, 04 Jan 2002 22:11:29 GMT
Subject: Re [Python-Dev] object equality vs identity, in and dicts idioms and speed
In-Reply-To: <>
References: <> <>
Message-ID: <>

[Martin v. Loewis]

>I understand Samuele was talking about mapping in the Python sense
>(existence of dictionary-style containers); he also mentioned that
>Jython creates a Python wrapper object for each "foreign" Java object.

Your summary is quite accurate.

When this was discussed on jython-dev, I said I preferred a solution
where all objects were inserted in your "map" dictionary when id() was
called on them. Not just the wrapped java instances. I picked that
preference because I think using id() is a relatively uncommon operation.
In the Lib modules, id() is used to detect cycles in,, and xmlrpclib. I would rather have a slow id() operation on
python objects too, than burden all python objects with an additional
int or long.

Is that a wrong call?

In the repr() of a lot of internal objects, the id() is used in the
return string. Would anyone rightly expect that hex number to match the
id() value of the object? In our discussions we agreed that the repr()
string does not have to match the value returned by id().


From  Fri Jan  4 22:27:12 2002
From: (Chris McDonough)
Date: Fri, 04 Jan 2002 17:27:12 -0500
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
References: <><><> <> <01e801c19551$222145a0$c617a8c0@kurtz> <>
Message-ID: <>

 > Well, you did not describe exactly how it fails to compile for
 > you. Assuming you got an error that something is not an integral type,
 > then that is clearly an error in the code. You might want to
 > investigate the error message you get more closely; please confirm
 > that it refers to the return value of fgetpos.

Yes, apologies.  I should have provided more details.

I'm using a stock Red Hat Linux 7.2, which has glibc 2.2.4 (Linux
kernel version 2.4.7).

With a Python built successfully from the 21-maintbranch without any
additional compiler flags indicating that I want large file support, I
get this when attempting to run the test_largefile test:

[chrism@kurtz tmp]$ python /usr/local/lib/python2.1/test/
Traceback (most recent call last):
   File "/usr/local/lib/python2.1/test/", line 22, in ?
     raise test_support.TestSkipped, \
test_support.TestSkipped: platform does not have largefile support

What's going on "under the hood" here is that a bit of code like this:

open('foo', 'w').seek(2147483649L)

.. raises an IOError 22, (invalid argument) out of the seek.
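
The same probe can be written as a small helper in present-day Python 3 (the original uses a Python 2 long literal). The function name has_largefile_support is an assumption made for this sketch; it only seeks and never writes, so no large file is created:

```python
import tempfile


def has_largefile_support(offset=2**31 + 1):
    """Return True if seeking to `offset` works on this platform."""
    with tempfile.TemporaryFile() as f:
        try:
            f.seek(offset)
            return f.tell() == offset
        except (OverflowError, OSError):
            # Without large file support the seek is rejected.
            return False
```

The default offset is the same 2147483649 used above, one byte past the 2**31 boundary.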

When I attempt to compile the code from the same branch using the
instructions for Solaris from, it craps
out during a successive make:

gcc -c -g -O2  -I. -I./Include -DHAVE_CONFIG_H  -o Objects/fileobject.o 
Objects/fileobject.c: In function `_portable_ftell':
Objects/fileobject.c:267: incompatible types in return
make: *** [Objects/fileobject.o] Error 1
[chrism@kurtz src]$

As a result, I am not able to compile successfully.  (Note: FYI, the
same thing happens when following the slightly different current doc
instructions for Linux.)

So be it.  With the patch you supplied earlier (and providing *either* 
the "Solaris" or "Linux" largefile support flags to configure) I am able
to compile successfully and when invoking the resulting executable
against, I get what looks like success, e.g.:

create large file via seek (may be sparse file) ...
2500000001L =?= 2500000001L ... yes
check file size with os.fstat
check file size with os.stat
2500000001L =?= 2500000001L ... yes
play around with seek() and read() with the built largefile
0L =?= 0 ... yes
..<and so on, successfully>..

So the question is this: is there reason to disbelieve test_largefile?
There seems to be some disbelief from Barry that your patch is "enough", 
but it appears to work at least enough to fool test_largefile.  ;-)

- C

From  Fri Jan  4 21:48:49 2002
From: (eric)
Date: Fri, 4 Jan 2002 16:48:49 -0500
Subject: [Python-Dev] weave -- inline C/C++ in Python, an implementation
Message-ID: <062601c19569$984f67d0$777ba8c0@ericlaptop>

Hello group,

I'm pretty close to releasing weave 0.2, a tool that helps in combining
C/C++ with Python code.  There are basically three ways to use it. inline()
offers C/C++ in Python.  blitz() converts Python Numeric expressions to C++
for fast
execution.  And, ext_tools offer a couple of classes that build extension

If you have a few cycles to spare, I'd appreciate a few eyeballs on the
documentation page and source.  Also, if people could download the
files and let me know of any failures that would be helpful.  The website
has info on how to test (it's very simple).  Also, success reports on platforms
would be good.  W2K, RH 7.1, Debian are about all that has been tested.

Here is the link:


From  Fri Jan  4 23:03:07 2002
From: (Martin v. Loewis)
Date: Sat, 5 Jan 2002 00:03:07 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: <> (message from Chris McDonough on Fri,
 04 Jan 2002 17:27:12 -0500)
References: <><><> <> <01e801c19551$222145a0$c617a8c0@kurtz> <> <>
Message-ID: <>

> create large file via seek (may be sparse file) ...
> 2500000001L =?= 2500000001L ... yes
> check file size with os.fstat
> check file size with os.stat
> 2500000001L =?= 2500000001L ... yes
> play around with seek() and read() with the built largefile
> 0L =?= 0 ... yes
> ..<and so on, successfully>..

I have taken this as success also. I don't know how Barry found
that the tests fail, but most likely, one of the expect calls failed,
resulting in a TestFailed exception - which would have been clearly
visible.

From  Fri Jan  4 23:15:38 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 18:15:38 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> I don't know how Barry found that the tests fail, but most
    MvL> likely, one of the expect calls failed, resulting in a
    MvL> TestFailed exception - which would have been clearly visible.

Okay, it's a build problem.  For whatever reason, the -D flags set in
configure weren't getting passed to gcc during the make.  If I add
that explicitly, everything works.  So Py2.1.2 is fine with Martin's
patch, which should be committed to the maint branch.

If I come up with a better recipe for posix-large-files I'll submit
it as a doc-fix.


From  Sat Jan  5 00:28:09 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 19:28:09 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> I don't know how Barry found that the tests fail, but most
    MvL> likely, one of the expect calls failed, resulting in a
    MvL> TestFailed exception - which would have been clearly visible.

>>>>> "BAW" == Barry A Warsaw <> writes:

    BAW> Okay, it's a build problem.  For whatever reason, the -D
    BAW> flags set in configure weren't getting passed to gcc during
    BAW> the make.  If I add that explicitly, everything works.  So
    BAW> Py2.1.2 is fine with Martin's patch, which should be
    BAW> committed to the maint branch.

    BAW> If I come up with a better recipe for posix-large-files I'll
    BAW> submit it as a doc-fix.

I think the following is a better suggestion:

% CC='gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' ./configure

CC is propagated to the Makefile so that just "make" is necessary, but
OPT and CFLAGS are not.  (Although, I seem to vaguely remember that OPT
/used/ to propagate -- I must be misremembering.)


From  Sat Jan  5 00:32:44 2002
From: (Fred L. Drake, Jr.)
Date: Fri, 4 Jan 2002 19:32:44 -0500 (EST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <>
References: <>
Message-ID: <>

Barry A. Warsaw writes:
 > CC is propagated to the Makefile so that just "make" is necessary, but
 > OPT and CFLAGS are not.  (Although, I seem to vaguely remember that OPT
 > /used/ to propagate -- I must be misremembering.)

  Or it used to work -- that's how I remember it as well.  Perhaps we
should fix this.  Feel free to file a bug report and assign it to me.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Sat Jan  5 00:39:49 2002
From: (Barry A. Warsaw)
Date: Fri, 4 Jan 2002 19:39:49 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "Fred" == Fred L Drake, Jr <> writes:

    Fred>   Or it used to work -- that's how I remember it as well.
    Fred> Perhaps we should fix this.  

It looks like OPT propagates in Python2.2-cvs, e.g. try:

    % OPT=-g ./configure

So maybe it's just a bug in release21-maint.

    Fred> Feel free to file a bug report and assign it to me.


From Anthony Baxter <>  Sat Jan  5 02:50:52 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Sat, 05 Jan 2002 13:50:52 +1100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects fileobject.c,2.141,2.142
In-Reply-To: Message from "Martin v. Loewis" <>
 of "Fri, 04 Jan 2002 19:42:42 BST." <>
Message-ID: <>

>>> "Martin v. Loewis" wrote
> Yes. But it also tightens the behaviour, so it should not be applied
> to maintenance branches: no correct program would work better with
> this patch, but currently broken programs may stop working.

Yep, what he said. It's a bug, but it doesn't cause programs that are 
broken to work. It does potentially change how they break, though - this
has been one of the criteria I've been using to say "nope".

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Sat Jan  5 05:43:46 2002
From: (Martin v. Loewis)
Date: Sat, 5 Jan 2002 06:43:46 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> I think the following is a better suggestion:
> % CC='gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' ./configure
> CC is propagated to the Makefile so that just "make" is necessary, but
> OPT and CFLAGS are not.  (Although, I seem to vaguely remember that OPT
> /used/ to propagate -- I must be misremembering.)

What version of configure are you using? On my system, with configure, doing 


will result in a line


in the Makefile. This, in turn, will result in a compilation line

gcc -c -g -O2 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I. -I./Include -DHAVE_CONFIG_H  -o Objects/fileobject.o Objects/fileobject.c

Something else is going on on your system. Did you remove config.cache
before running configure?


From  Sat Jan  5 05:53:50 2002
From: (Martin v. Loewis)
Date: Sat, 5 Jan 2002 06:53:50 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> >>>>> "Fred" == Fred L Drake, Jr <> writes:
>     Fred>   Or it used to work -- that's how I remember it as well.
>     Fred> Perhaps we should fix this.  
> It looks like OPT propagates in Python2.2-cvs, e.g. try:
>     % OPT=-g ./configure
> So maybe it's just a bug in release21-maint.
>     Fred> Feel free to file a bug report and assign it to me.
> Done.

Before changing the documentation, I'd like to understand the problem
Barry is seeing first, or I'd like to hear independent confirmation
that the docs have a bug. Chris' report, in

is contradictory: on one hand, he says that, following the
instructions, he got an interpreter that does LFS correctly; but he
also says that the compilation line is just

gcc -c -g -O2  -I. -I./Include -DHAVE_CONFIG_H  -o Objects/fileobject.o

which cannot possibly have the desired effect, AFAICT.


From  Sat Jan  5 16:42:35 2002
From: (Barry A. Warsaw)
Date: Sat, 5 Jan 2002 11:42:35 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> What version of configure are you using? On my system, with
    MvL> configure, doing

is at the head of the release21-maint branch, so that's
definitely the version I'm using.

    MvL> -O2 $CFLAGS" ./configure

    MvL> will result in a line


    MvL> in the Makefile.

Not for me.  In fact the -D symbols never make it into the Makefile at
all.

    MvL> This, in turn, will result in a compilation
    MvL> line

    MvL> -I. -I./Include -DHAVE_CONFIG_H -o Objects/fileobject.o
    MvL> Objects/fileobject.c

    MvL> Something else is going on on your system. Did you remove
    MvL> config.cache before running configure?

Of course!  I always "make distclean" before running configure again.


From  Sat Jan  5 17:02:34 2002
From: (Marek =?iso-8859-13?Q?P=E6tlicki?=)
Date: 05 Jan 2002 18:02:34 +0100
Subject: [Python-Dev] RPM *.spec file
Message-ID: <1010250155.2251.5.camel@marek.almaran.home>


Could someone point me to the RPM *.spec file with which was built the
Python 2.2 (

I've already installed from source, but I would like make some order in
my system, and currently I have 3 versions of Python in various dirs :-)

Q: why is the specfile not distributed with the sources? I've found some
outdated BeOpen specfile, but it doesn't work.
Yes, I can fix it, but what for when somewhere there seems to exist a
fixed one, since *.rpm binaries are supported on


Marek Pętlicki <>
Linux User ID=162988

From  Sat Jan  5 17:29:01 2002
From: (Guido van Rossum)
Date: Sat, 05 Jan 2002 12:29:01 -0500
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: Your message of "05 Jan 2002 18:02:34 +0100."
References: <1010250155.2251.5.camel@marek.almaran.home>
Message-ID: <>

> Hi!
> Could someone point me to the RPM *.spec file with which was built the
> Python 2.2 (

I'm cc'ing Sean Reifschneider, who created them.  I'm sure he has them

> I've already installed from source, but I would like make some order in
> my system, and currently I have 3 versions of Python in various dirs :-)
> Q: why is the specfile not distributed with the sources? I've found some
> outdated BeOpen specfile, but it doesn't work. 
> Yes, I can fix it, but what for when somewhere there seems to exist a
> fixed one, since *.rpm binaries are supported on

I've asked Sean to contribute his specfile, but he hasn't given them
to me yet.

You did read I hope?

--Guido van Rossum (home page:

From  Sat Jan  5 17:48:47 2002
From: (Martin v. Loewis)
Date: Sat, 5 Jan 2002 18:48:47 +0100
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <1010250155.2251.5.camel@marek.almaran.home> (message from Marek	=?iso-8859-13?Q?P=E6tlicki?= on 05 Jan 2002 18:02:34 +0100)
References: <1010250155.2251.5.camel@marek.almaran.home>
Message-ID: <>

> Could someone point me to the RPM *.spec file with which was built the
> Python 2.2 (
> I've already installed from source, but I would like make some order in
> my system, and currently I have 3 versions of Python in various dirs :-)

Did you look at the src.rpm? That should definitely include a spec
file (didn't check, though). Just do rpm -i of the src.rpm, then look
into your packages/SPECS directory.

You may also look at Misc/RPM, but Guido suggests that this is likely
*not* the spec file that was used.

> Q: why is the specfile not distributed with the sources? 

Because they have been contributed, and because the contributor did
not contribute a stand-alone spec file (although he implicitly did so
through the src.rpm).

> I've found some outdated BeOpen specfile, but it doesn't work.  Yes,
> I can fix it, but what for when somewhere there seems to exist a
> fixed one, since *.rpm binaries are supported on

They are available on They are supported only if their
creator supports them.


From  Sat Jan  5 18:02:30 2002
From: (Martin v. Loewis)
Date: Sat, 5 Jan 2002 19:02:30 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

>     MvL> in the Makefile.
> Not for me.  In fact the -D symbols never make it into the Makefile at
> all.

That is very puzzling. I just did a fresh checkout on (the
debian installation), using

cvs -z9 co -d py21 -rrelease21-maint python/dist/src

cd py21



The earliest indication that it was accepted correctly is in

checking whether the C compiler (gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 ) works... yes

In the end, Makefile will have


Can you please try the same sequence of actions on the SF compile
farm, and report what it does for you? Alternatively, can you spot the
error in the commands I used?
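
Since the disagreement comes down to whether the -D flags reach the generated Makefile, the check can be made mechanical. The sketch below is an assumption-laden illustration: missing_flags is a hypothetical helper, and the variable name OPT is taken from the discussion, not from a verified Makefile.

```python
def missing_flags(makefile_text, wanted, variable="OPT"):
    """Return the flags from `wanted` that are absent from the `variable=` line."""
    for line in makefile_text.splitlines():
        # Split at the first "=" only, so -D_FILE_OFFSET_BITS=64 stays intact.
        name, sep, value = line.partition("=")
        if sep and name.strip() == variable:
            present = value.split()
            return [flag for flag in wanted if flag not in present]
    # No such variable at all: every flag is missing.
    return list(wanted)
```

Running it over the Makefile text from both machines with the two large-file flags would show at a glance which system drops them.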


From  Sat Jan  5 21:00:58 2002
From: (Marek =?iso-8859-13?Q?P=E6tlicki?=)
Date: 05 Jan 2002 22:00:58 +0100
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <>
References: <1010250155.2251.5.camel@marek.almaran.home>
Message-ID: <1010264459.3051.9.camel@marek.almaran.home>

In a message of Sat, 05-01-2002, 18:48, Martin v. Loewis wrote:
> > Could someone point me to the RPM *.spec file with which was built the
> > Python 2.2 (
> >
> > I've already installed from source, but I would like make some order in
> > my system, and currently I have 3 versions of Python in various dirs :-)
> Did you look at the src.rpm? That should definitely include a spec
> file (didn't check, though). Just do rpm -i of the src.rpm, then look
> into your packages/SPECS directory.

I'm sure src.rpm will have the correct version :-)
I only hoped to get only a few kilo heavy specfile instead of a few
mega src.rpm :-) Sorry if I disturb you hackers with such a silly
> You may also look at Misc/RPM, but Guido suggests that this is likely
> *not* the spec file that was used.


> > I've found some outdated BeOpen specfile, but it doesn't work.  Yes,
> > I can fix it, but what for when somewhere there seems to exist a
> > fixed one, since *.rpm binaries are supported on
> They are available on They are supported only if their
> creator supports them.

I understand that, it is the same as with Windows installers (you don't
_have to_ supply the installer creator of course), but since they _are_
present in SRPMs they _could_ be present in *.tgz, couldn't they? (well,
at least you could erase those outdated ones from Misc/RPM, they are
useless anyway :-)


Marek Pętlicki <>
Linux User ID=162988

From  Sat Jan  5 21:01:01 2002
From: (Marek =?iso-8859-13?Q?P=E6tlicki?=)
Date: 05 Jan 2002 22:01:01 +0100
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <>
References: <1010250155.2251.5.camel@marek.almaran.home>
Message-ID: <1010264088.3050.8.camel@marek.almaran.home>

In a message of Sat, 05-01-2002, 18:48, Martin v. Loewis wrote:
> > Could someone point me to the RPM *.spec file with which was built the
> > Python 2.2 (
> >
> > I've already installed from source, but I would like make some order in
> > my system, and currently I have 3 versions of Python in various dirs :-)
> Did you look at the src.rpm? That should definitely include a spec
> file (didn't check, though). Just do rpm -i of the src.rpm, then look
> into your packages/SPECS directory.

well, I hoped to get only a few kilo heavy specfile instead of a few
mega src.rpm :-)
> You may also look at Misc/RPM, but Guido suggests that this is likely
> *not* the spec file that was used.


> > I've found some outdated BeOpen specfile, but it doesn't work.  Yes,
> > I can fix it, but what for when somewhere there seems to exist a
> > fixed one, since *.rpm binaries are supported on
> They are available on They are supported only if their
> creator supports them.

I understand that, it is the same as with Windows installers (you don't
_have to_ supply the installer creator of course), but since they _are_
in SRPMs they _could_ be present in *.tgz, couldn't they? :-)

And: where can I find those SRPMs anyway?


Marek Pętlicki <>
Linux User ID=162988

From  Sat Jan  5 21:20:19 2002
From: (Marek =?iso-8859-13?Q?P=E6tlicki?=)
Date: 05 Jan 2002 22:20:19 +0100
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <1010264088.3050.8.camel@marek.almaran.home>
References: <1010250155.2251.5.camel@marek.almaran.home>
Message-ID: <1010265603.3049.14.camel@marek.almaran.home>

sorry for this one, must've sent it out of the trash :-(

Marek Pętlicki <>
Linux User ID=162988

From  Sat Jan  5 21:53:40 2002
From: (Barry A. Warsaw)
Date: Sat, 5 Jan 2002 16:53:40 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> That is very puzzling. I just did a fresh checkout on
    MvL> (the debian installation), using

    MvL> Can you please try the same sequence of actions on the SF
    MvL> compile farm, and report what it does for you? Alternatively,
    MvL> can you spot the error in the commands I used?

Actually, let's do something different.  My laptop is a pretty stock
Mandrake 8.1 installation, while my desktop is a RH 6.1-ish system
(with a few kernel and other package updates).

On a fresh checkout of the release21-maint branch on the RH system,
everything works fine; the posix-large-file recipe does indeed
propagate the extended OPT macro into the Makefile.  On a fresh checkout
on the Mandrake system it does not.  Weird.

What's different?  On both systems we're using autoconf 2.13, and m4
version 1.4.  On the RH system I've got GNU make version 3.77, but on
Mandrake it's 3.79.1 so that's one obvious difference.  But I'd be
surprised if this is a make bug because I didn't think make was
invoked during the configure phase.

I've gotta run, but I'll try to look into this some more later on.


From  Sat Jan  5 22:37:39 2002
From: (Neil Hodgson)
Date: Sun, 6 Jan 2002 09:37:39 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <>
Message-ID: <016e01c19639$94c909b0$0acc8490@neil>

   Explored the possibility of detecting Unicode arguments to open and using
_wfopen on Windows NT. This led to trying to store Unicode strings in the
f_name and f_mode fields of the file object which started to escalate into
complexity making Mark's mbcs choice more understandable.

   Another approach is to use utf-8 as the Py_FileSystemDefaultEncoding and
then convert to and from in each file system access function. The core file
open function from fileobject.c changed to work with utf-8 is at the end of
this message with the important lines in the #ifdef MS_WIN32 section. Along
with that change goes a change in Py_FileSystemDefaultEncoding to be "utf-8"
rather than "mbcs".

   This change works for me on Windows 2000 and allows access to all files
no matter what the current code page is set to. On Windows 9x (not yet
tested), the _wfopen call should fail causing a fallback to fopen. Possibly
the OS should be detected instead and _wfopen not attempted on 9x. On 9x,
mbcs may be a better choice of encoding although it may also be possible to
ask the file system to find the wide character file name and return the
mangled short name that can then be used by fopen.

   The best approach to me seems to be to make Py_FileSystemDefaultEncoding
settable by the user, at least allowing the choice between 'utf-8' and
'mbcs' with a default of 'utf-8' on NT and 'mbcs' on 9x.

   This approach can be extended to other file system calls with, for
example, os.listdir and glob.glob upon detecting a utf-8 default encoding,
using wide character system calls and converting to utf-8.

   Please criticise any stylistic or correctness issues in the code as it is
my first modification to the Python sources.


static PyObject *
open_the_file(PyFileObject *f, char *name, char *mode)
{
    assert(f != NULL);
    assert(name != NULL);
    assert(mode != NULL);
    assert(f->f_fp == NULL);

    /* can't stop a user from getting the file() constructor --
       all they have to do is get *any* file object f, and then do
       type(f).  Here we prevent them from doing damage with it. */
    if (PyEval_GetRestricted()) {
        PyErr_SetString(PyExc_IOError,
            "file() constructor not accessible in restricted mode");
        return NULL;
    }
    errno = 0;
#ifdef HAVE_FOPENRF
    if (*mode == '*') {
        FILE *fopenRF();
        f->f_fp = fopenRF(name, mode+1);
    }
    else
#endif
    {
#ifdef MS_WIN32
        if (strcmp(Py_FileSystemDefaultEncoding, "utf-8") == 0) {
            PyObject *wname;
            PyObject *wmode;
            wname = PyUnicode_DecodeUTF8(name, strlen(name), "strict");
            wmode = PyUnicode_DecodeUTF8(mode, strlen(mode), "strict");
            if (wname && wmode) {
                f->f_fp = _wfopen(PyUnicode_AS_UNICODE(wname),
                                  PyUnicode_AS_UNICODE(wmode));
            }
            /* release the temporary Unicode objects */
            Py_XDECREF(wname);
            Py_XDECREF(wmode);
        }
        /* fall back to the narrow-character call if _wfopen failed */
        if (NULL == f->f_fp) {
            f->f_fp = fopen(name, mode);
        }
#else
        f->f_fp = fopen(name, mode);
#endif
    }
    if (f->f_fp == NULL) {
#ifdef NO_FOPEN_ERRNO
        /* Metrowerks only, which does not always set errno */
        if (errno == 0) {
            PyObject *v;
            v = Py_BuildValue("(is)", 0, "Cannot open file");
            if (v != NULL) {
                PyErr_SetObject(PyExc_IOError, v);
                Py_DECREF(v);
            }
            return NULL;
        }
#endif
        if (errno == EINVAL)
            PyErr_Format(PyExc_IOError, "invalid argument: %s",
                         mode);
        else
            PyErr_SetFromErrnoWithFilename(PyExc_IOError, name);
        f = NULL;
    }
    return (PyObject *)f;
}

From  Sat Jan  5 23:05:57 2002
From: (Jack Jansen)
Date: Sun, 06 Jan 2002 00:05:57 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: Message by "M.-A. Lemburg" <> ,
 Fri, 04 Jan 2002 18:06:53 +0100 , <>
Message-ID: <>

Recently, "M.-A. Lemburg" <> said:
> Jack Jansen wrote:
> > 
> > Off on a slight tangent:
> > On Mac OS X the default 8-bit encoding is UTF8. os.listdir() handles
> > this fine and so does open(). The OS does all the hard work for
> > you [...]
> > But in Python (unix-Python we're talking here, not MacPython),
> > unicode(filename) fails, because site.encoding is "ascii".
> > 
> > Would it be safe to set site.encoding to utf8 on Mac OS X by default?
> I'd rather suggest to use UTF-8 as default encoding in the
> subsystem layer I was talking about. 

Uhm... Do you mean Py_FileSystemDefaultEncoding? Otherwise: what do
you mean? And, if you do mean Py_FSDE, would that also work for
listdir()? No, I guess it can't because listdir() returns simple
strings, so by the time I pass them to unicode() all knowledge that
they came from listdir is gone...

Hmm, shouldn't StringObjects themselves carry an encoding field
(defaulting to sys.encoding)? That would solve quite a few
issues. read() from a binary file would return the special encoding
"binary", for instance, and then the "u" and "u#" formats could make a
distinction between character strings (which would be converted to
unicode using the encoding they carry) and binary strings (which would
be interpreted as 16-bit chars). But interning may be a showstopper,
now that I think of it...

> Making UTF-8 the default Python system encoding would have many other 
> consequences -- and you'd probably lose a great deal of portability 
> since UTF-8 conversion (nearly) always will succeed while ASCII can 
> easily fail on other systems which use e.g. Latin-1 as native 
> encoding.

What are your reasons for asserting this? If I read this correctly
this would make Python compatible to the least common denominator of
all platforms, while I think I would prefer it to allow access to all
the niceties a platform gives. On Unix you really don't have a good
guess for the encoding, but on MacOS and Windows you do...

Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++ | ++++ if you agree copy these lines to your sig ++++        | see 

From  Sat Jan  5 23:18:36 2002
From: (Jack Jansen)
Date: Sun, 06 Jan 2002 00:18:36 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Fri, 4 Jan 2002 19:40:34 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> When the discussion of tagging binary strings in source code came up,
> I started to look into the standard library which string literals
> would have to be tagged as byte strings, and which are really
> character strings.
> I found that the overwhelming majority of string literals in the
> standard Python library really denotes byte strings, if you ignore doc
> strings. Sometimes, it isn't obvious that they are binary strings,
> hence the smiley.
[leaving only one example in:]
>                 version = "HTTP/0.9"
>                 status = "200"
>                 reason = ""
> Protocol elements, thus byte string.

I think you're taking it too far now. I think we should assume that
ASCII survives. If Python runs on an EBCDIC machine (does it?) I
assume that at some point the conversion of EBCDIC<->ASCII is handled

Also, as these things are readable they should be treated as such. It
should be possible to do
>>> print u"Funny reply to my "+unicode(version)+u" message"
especially when the "funny reply" bit is in Japanese.

What I would agree with, I think, is if we tag these strings as
"ascii". And that is also what the BDFL pronounced at some point:
Python sourcecode is ASCII, and if you put 8 bit characters in there
you're living dangerously.
Only when octal or hex escapes appear in a sourcecode string can it be
anything other than ascii.
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++ | ++++ if you agree copy these lines to your sig ++++        | see 

From  Sat Jan  5 23:58:13 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 00:58:13 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> What's different?  On both systems we're using autoconf 2.13, and m4
> version 1.4.  

Should be irrelevant, since you are using the generated configure (I

> On the RH system I've got GNU make version 3.77, but on
> Mandrake it's 3.79.1 so that's one obvious difference.  But I'd be
> surprised if this is a make bug because I didn't think make was
> invoked during the configure phase.

Right. That leaves it to /bin/sh.


From  Sun Jan  6 00:10:42 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 01:10:42 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (message from Jack
 Jansen on Sun, 06 Jan 2002 00:18:36 +0100)
References: <>
Message-ID: <>

> [leaving only one example in:]
> >                 version = "HTTP/0.9"
> >                 status = "200"
> >                 reason = ""
> > 
> > Protocol elements, thus byte string.
> I think you're taking it too far now. I think we should assume that
> ASCII survives. 

That is not the issue. That string *is* a byte string. The HTTP
protocol is not defined in terms of character sequences, but in terms
of byte sequences, or else interoperability would be lost.

If those strings were converted to character strings (i.e. Unicode
strings), it would still work, but it wouldn't be correct anymore. That's
just like giving a file size as a double: it would probably work, but
it wouldn't be correct.

> Also, as these things are readable they should be treated as such. It
> should be possible to do
> >>> print u"Funny reply to my "+unicode(version)+u" message"
> especially when the "funny reply" bit is in Japanese.

That is a nice property of so-called "text" protocols. That still
doesn't make it a character-oriented protocol; HTTP *is* a byte
oriented protocol. If you have a binary protocol, there is likely also
a version field in it, but you'd have to write

print u"Funny reply to my "+XDRversion2string(version)+u" message"

> What I would agree with, I think, is if we tag these strings as
> "ascii". 

That is pointless. Having strings tagged with their encoding is also a
possible architecture for a programming language, but not one that Python
has chosen to take. Instead, Python has selected to have only a single
data type for character data, namely Unicode.

> Python sourcecode is ASCII, and if you put 8 bit characters in there
> you're living dangerously.
> Only when octal or hex escapes appear in a sourcecode string can it be
> anything other than ascii.

The octal escapes, in themselves, are also ASCII, or else you could
not put them into source code. The traditional string type in Python
really is a byte string type first of all. It can be used as a
character string type only if you imply a character set and an
encoding. The source being ASCII just gives you a guarantee about the
bytes you get at runtime.
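
A small sketch of that last point, assuming Python 3 syntax (a b"..."
literal stands in for the traditional string type): the source text of
the literal is pure ASCII, yet the value it produces contains a
non-ASCII byte.

```python
import ast

# The literal exactly as it would appear in an ASCII-only source file:
# the characters b, ", c, a, f, \, 3, 5, 1, "
source_literal = 'b"caf\\351"'
source_is_ascii = all(ord(ch) < 128 for ch in source_literal)

# Evaluating the literal yields a byte string whose last byte is 0xE9.
value = ast.literal_eval(source_literal)
print(source_is_ascii, list(value))
```

Which character byte 0xE9 stands for is not decided by the source file;
that depends on the character set and encoding you imply afterwards.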


From  Sun Jan  6 00:20:27 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 01:20:27 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (message from Jack
 Jansen on Sun, 06 Jan 2002 00:05:57 +0100)
References: <>
Message-ID: <>

> Hmm, shouldn't StringObjects themselves carry an encoding field
> (defaulting to sys.encoding)? 

That approach has been discussed during the design phase of the
Unicode API; Bill Janssen was the first to propose this in response
to my talk

During the Unicode design, this idea came up sometimes, but it always
turned out that proposers could not give a coherent semantics to such
tags. Just explain what happens if you add two strings that have
different encodings.
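
A minimal sketch of the problem, assuming Python 3 bytes objects stand in
for encoding-tagged strings (decodes_as is a hypothetical helper): the
concatenation of a Latin-1 piece and a UTF-8 piece is valid in neither
sense, so there is no sensible tag for the result.

```python
# The same character, e-acute, under two different encodings.
latin1_part = "\u00e9".encode("latin-1")   # one byte: 0xE9
utf8_part = "\u00e9".encode("utf-8")       # two bytes: 0xC3 0xA9

# Adding the raw bytes: which encoding tag should the sum carry?
combined = latin1_part + utf8_part


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(combined, "utf-8"))      # the mixed bytes are not UTF-8
print(ascii(combined.decode("latin-1")))  # Latin-1 "works" but gives 3 chars

# Decoding each piece with its own encoding first gives a well-defined sum.
text = latin1_part.decode("latin-1") + utf8_part.decode("utf-8")
print(ascii(text))
```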

> That would solve quite a few issues.

And introduce many new ones.

> > Making UTF-8 the default Python system encoding would have many other 
> > consequences -- and you'd probably lose a great deal of portability 
> > since UTF-8 conversion (nearly) always will succeed while ASCII can 
> > easily fail on other systems which use e.g. Latin-1 as native 
> > encoding.
> What are your reasons for asserting this? 

If I understand this claim correctly, he means:

"Currently, if auto-conversion (to ASCII) succeeds, the result is
 likely correct. If the default encoding was UTF-8, conversion would
 succeed for all Unicode objects, but give incorrect results for many
 users, e.g. if they use Latin-1 on their terminal"

This is actually a frequent problem since the introduction of UTF-8:
Some applications display the bytes that make up a UTF-8 string as if
it was a Latin-1 string, rendering it completely unreadable (although
I can already recognize my name if I run into such an application).

This problem may go unnoticed during testing, whereas an exception
is likely noticed.
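
The difference between the two failure modes can be sketched as follows,
assuming Python 3 and using a name with a non-ASCII letter as sample
data. The wrong-encoding path succeeds silently with garbage; the ASCII
path raises.

```python
name = "L\u00f6wis"  # the name spelled with o-umlaut

# UTF-8 bytes displayed by an application that assumes Latin-1:
# no error is raised, the text is just wrong.
utf8_bytes = name.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))

# With ASCII as the default, the same conversion fails loudly instead.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)
```

The garbled result has one more character than the original name, which
is the typical signature of UTF-8 read as Latin-1.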

> If I read this correctly this would make Python compatible to the
> least common denominator of all platforms, while I think I would
> prefer it to allow access to all the niceties a platform gives.

It does no such thing. The application has full control over all
conversions, if it initiates them explicitly. Explicit is better than
implicit.


From  Sun Jan  6 00:33:08 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 01:33:08 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <016e01c19639$94c909b0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil>
Message-ID: <>

>    This change works for me on Windows 2000 and allows access to all files
> no matter what the current code page is set to. On Windows 9x (not yet
> tested), the _wfopen call should fail causing a fallback to fopen. Possibly
> the OS should be detected instead and _wfopen not attempted on 9x. 

Now that you have that change, please try to extend it to
posixmodule.c. This is where I gave up. Notice that, with changing
Py_FileSystemDefaultEncoding and open() alone, you have worsened the
situation: os.stat will now fail on files with non-ASCII names on
which it works under the mbcs encoding, because windows won't find the
file (correct me if I'm wrong).

> On 9x, mbcs may be a better choice of encoding although it may also
> be possible to ask the file system to find the wide character file
> name and return the mangled short name that can then be used by
> fopen.

It is not just 9x: if you have ten (*) different APIs to open a file, ten
different APIs to stat a file, and so on, and have to select some of
them at compile time, and some of them at run-time, it gets messy very
quickly.

(*) I'd expect that other systems may also have proprietary system
calls to do these things, using either wchar_t* or a proprietary
Unicode type.

>    The best approach to me seems to be to make
> Py_FileSystemDefaultEncoding settable by the user, at least allowing
> the choice between 'utf-8' and 'mbcs' with a default of 'utf-8' on
> NT and 'mbcs' on 9x.

By the user, or by the application? How can the application make a
more educated guess than Python proper? Alternatively, how can the
user (or her Administrator) know what value to put in there?

On Windows, probably neither is a good idea; if the file system
default encoding is used in the future, fixing it at mbcs is the best
I can think of.

>    Please criticise any stylistic or correctness issues in the code
> as it is my first modification to the Python sources.

The code looks fine. I'd encourage you to continue on that topic; just
expect that it will need many more rounds for completion.


From  Sun Jan  6 01:16:58 2002
From: (Sean Reifschneider)
Date: Sat, 5 Jan 2002 18:16:58 -0700
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <>; from on Sat, Jan 05, 2002 at 05:38:17PM -0700
References: <>
Message-ID: <>

>> Could someone point me to the RPM *.spec file with which was built the
>> Python 2.2 (

The nice thing about a source RPM file (.src.rpm) is that it includes *ALL*
things necessary to reproduce the build of the software.  This includes the
pristine source, any modifications required to build, shell commands to
build/install it, and the meta-data.

So, pick up the .src.rpm file from the URL you mention above.  It's got
everything you need...  You can either install it and find the .spec file
in /usr/src/redhat/SPECS, or you can extract the .src.rpm using "rpm2cpio"
and use cpio to get just the files you want.

>I've asked Sean to contribute his specfile, but he hasn't given them
>to me yet.

Well, at some point we were talking about whether to eliminate the patches
and how to do it for expat and ...  I was actually thinking last night of
putting them into CVS (I made some modifications that I thought would let
Zope 2.4.3 build, but didn't and I backed them out).

How would you like to deal with the .spec file and patches?  I can easily
enough turn the patches into sed commands in the setup, which would mean
you could build the RPMs from the Python tar file directly, if included

Do you want me to just mail the new .spec file to you when I ask you to
upload the new RPMs, or do you want a script that would check out the
latest .spec file from my CVS into your tree, update the version number in
it, and go from there, or can I get access to your CVS for checking in new
.specs?  Or, I guess we could get it updated into the current CVS and I
could submit patches through the Sourceforge.  Except I always sourceFORGET
to click the "click here to attach file" button.  <sigh>


 "McGuyver stole all his tricks from Dr. Who."
Sean Reifschneider, Inimitably Superfluous <> - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

From  Sun Jan  6 01:26:48 2002
From: (Barry A. Warsaw)
Date: Sat, 5 Jan 2002 20:26:48 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> Right. That leaves it to /bin/sh.

Yup.  A bash bug?

/bin/sh (aka bash) version 2.03.8 on RH6.1 vs. 2.05.1 on MD8.1.  It
isn't sed, which is at version 3.02 on both.

Hmm, a bash bug?


From  Sun Jan  6 01:47:43 2002
From: (Neil Hodgson)
Date: Sun, 6 Jan 2002 12:47:43 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <>
Message-ID: <021901c19654$21f2e3f0$0acc8490@neil>

Martin v. Loewis:

> Now that you have that change, please try to extend it to
> posixmodule.c. This is where I gave up.

   OK., os.stat, and os.listdir now work. Placed temporarily at

   os.stat is ugly because the posix_do_stat function is parameterised over
a stat function pointer but it is always _stati64 on Windows so the patch
just assumes _wstati64 is right. os.listdir returns Unicode objects rather
than strings. This makes glob.glob work as well so my earlier script that
finds the *.html files and opens them works. Unfortunately, I expect most
callers of glob() will be expecting narrow strings.

> Notice that, with changing
> Py_FileSystemDefaultEncoding and open() alone, you have worsened the
> situation: os.stat will now fail on files with non-ASCII names on
> which it works under the mbcs encoding, because windows won't find the
> file (correct me if I'm wrong).

   If you give it a file name encoded in the current code page then it may
fail where it did not before.


From  Sun Jan  6 03:26:23 2002
From: (Guido van Rossum)
Date: Sat, 05 Jan 2002 22:26:23 -0500
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: Your message of "Sat, 05 Jan 2002 18:16:58 MST."
References: <>
Message-ID: <>

FYI, I've checked in Sean's RPM spec file and the patches under
Misc/RPM/, replacing the previous (outdated) contents there.

--Guido van Rossum (home page:

From  Sun Jan  6 11:50:21 2002
From: (Marek Pętlicki)
Date: 06 Jan 2002 12:50:21 +0100
Subject: [Python-Dev] RPM *.spec file
In-Reply-To: <>
References: <>
Message-ID: <1010317823.1347.0.camel@marek.almaran.home>

In a message of Sun, 06-01-2002, at 04:26, Guido van Rossum writes:
> FYI, I've checked in Sean's RPM spec file and the patches under
> Misc/RPM/, replacing the previous (outdated) contents there.

thank you very much: this is what suits me best - I prefer the specfile
in the main tarfile (AND cvs). In this way if building RPM-s doesn't
work I can always build it in the 'classic' way but still I don't have
to wait for the src.rpms to appear (don't want to say that this is the
case with Python releases, but generally I prefer original tarballs
_with_ rpm specs to src.rpms with nobody-knows-what-changes-applied).

thanks and best regards

Marek Pętlicki <>
Linux User ID=162988

From  Sun Jan  6 12:14:55 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 13:14:55 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <021901c19654$21f2e3f0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil>
Message-ID: <>

> > Now that you have that change, please try to extend it to
> > posixmodule.c. This is where I gave up.
>    OK., os.stat, and os.listdir now work. Placed temporarily at

Looks good. The posix_do_stat changes contain an error; you have put
Python API calls inside the BEGIN_ALLOW_THREADS block. That is wrong:
you must always hold the interpreter lock when calling Python
API. Also, when calling _wstati64, you might want to assert that the
function pointer is _stati64. Likewise, the code inside posix_open
should hold the interpreter lock.

> os.listdir returns Unicode objects rather than strings. This makes
> glob.glob work as well so my earlier script that finds the *.html
> files and opens them works. Unfortunately, I expect most callers of
> glob() will be expecting narrow strings.

That is not that much of a problem; we could try to define API where
it is the caller's choice.

However, the size of your changes is really disturbing here. There
used to be already four versions of listing a directory; now you've
added a fifth one. And it isn't even clear whether this code works on
W9x, is it?

There must be a way to fold the different Windows versions into a
single one; perhaps it is acceptable to drop Win16 support. I think
three different versions should be offered to the end user:
- path is plain string, result is list of plain strings
- path is Unicode string, result is list of Unicode strings
- path is Unicode string, result is list of plain strings

Perhaps one could argue that the third version isn't really needed:
anybody passing Unicode strings to listdir should be expected to get
them back also. That would leave us with two functional features on
windows. I envision a fragment that looks like this

#ifdef windows
  if (argument is unicode string) {
#define strings wide
#include "listdir_win.h"
#undef strings
  } else {
    convert argument to string
#define strings narrow
#include "listdir_win.h"
#undef strings
  }
#endif

If you provide a similar listdir_posix and listdir_os2, it should be
possible to get a uniform implementation.
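
For comparison, the "result type follows argument type" behaviour can be
sketched from the Python side. This assumes modern Python 3, where
os.listdir returns str names for a str path and bytes names for a bytes
path; it illustrates the two-variant idea and is not the 2002 patch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the directory listing is not empty.
    with open(os.path.join(tmp, "page.html"), "w") as f:
        f.write("<html></html>")

    # Text path in, list of text names out.
    text_names = os.listdir(tmp)

    # Byte-string path in, list of byte-string names out.
    byte_names = os.listdir(os.fsencode(tmp))

print(text_names, byte_names)
```

A caller who wants the third combination (text path, narrow results) can
encode the names explicitly, which keeps the conversion visible.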

> > Notice that, with changing
> > Py_FileSystemDefaultEncoding and open() alone, you have worsened the
> > situation: os.stat will now fail on files with non-ASCII names on
> > which it works under the mbcs encoding, because windows won't find the
> > file (correct me if I'm wrong).
>    If you give it a file name encoded in the current code page then it may
> fail where it did not before.

I was actually talking about stat as a function that you haven't
touched, yet. Now, os.rename will fail if you pass two Unicode strings
referring to non-ASCII file names. posix_1str and posix_2str are like
the stat implementation, except that you cannot know statically what
the function pointer is.


From  Sun Jan  6 11:37:00 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 12:37:00 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> Yup.  A bash bug?
> /bin/sh (aka bash) version 2.03.8 on RH6.1 vs. 2.05.1 on MD8.1.  It
> isn't sed, which is at version 3.02 on both.
> Hmm, a bash bug?

Could be a test problem as well. Line 1451 in configure currently reads

if test -z "$OPT"

My guess is that this is where the environment setting is
overwritten. Just put

echo "Current value of OPT is x${OPT}x"

before this test, and 

echo "New value of OPT is x${OPT}x"

after the if statement.

Actually, after re-reading the autoconf documentation, I think I see
what's happening. $OPT starts with a - (HYPHEN MINUS), so test treats
it as an option. Please try replacing the test with

if test ${OPT+set} != set


From  Sun Jan  6 16:39:19 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 17:39:19 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <> <>
Message-ID: <>

Martin v. Loewis wrote:

>>We'd still need to support other OSes as well, though, and I
>>don't think that putting all this code into fileobject.c is
>>a good idea -- after all opening files is needed by some other
>>parts of Python as well and may also be useful for extensions.
> The stuff isn't in fileobject.c. Py_FileSystemDefaultEncoding
> is defined in bltinmodule.c.

That's the global, sure but the code using it is scattered
across fileobject.c and the posix module. I think it would be
a good idea to put all this file naming code into some
Python/fileapi.c file which then also provides C APIs for
extensions to use. These APIs should then take the file name
as PyObject* rather than char* to enable them to handle
Unicode directly.

> Also, on other OSes: You can pass Unicode object to open on all
> systems. If Py_FileSystemDefaultEncoding is NULL, it will fall back to
> site.encoding.
> Of course, if the system has an open function that expects wchar_t*,
> we might want to use that instead of going through a codec. Off hand,
> Win32 seems to be the only system where this might work, and even
> there, it won't work on Win95.

I expect this to become a standard in the next few years.

>>I'd suggest to implement something similar to the DLL loading
>>code which is also implemented as subsystem in Python.
> I'd say this is over-designed. It is not that there are ten
> alternative approaches to doing encodings in file names, and we only
> support two of them, but it is rather that there are only two, and we
> support all three of them :-)
> Also, it is more difficult than threads: for threads, there is a fixed
> set of API features that need to be represented. Doing Py_UNICODE*
> opening alone is easy, but look at the number of posixmodule functions
> that all expect file names of some sort.

Doesn't that support the idea of having a small subsystem
in Python which exposes the Unicode aware APIs to Python
and its extensions ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan  6 16:58:41 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 17:58:41 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <>
Message-ID: <>

Jack Jansen wrote:

> I'm going to jump out of this discussion for a while. Martin and Mark have 
> a completely different view on Unicode than I do, apparently, and I think 
> I should first try and see if I can use the current implementation.


> For the record: my view of Unicode is really "ascii done right", i.e. a 
> datatype that allows you to get richer characters than what 1960s ascii 
> gives you. For this it should be as backward-compatible as possible, i.e. 
> if some API expects a unicode filename and I pass "a.out" it should 
> interpret it as u"a.out". All the converting to different charsets is 
> icing on the cake, the number one priority should be that unicode is as 
> compatible as possible with the 8-bit convention used on the platform 
> (whatever it may be). No, make that the number 2 priority: the number one 
> priority is compatibility with 7-bit ascii. Using Python StringObjects as 
> binary buffers is also far less common than using StringObjects to store 
> plain old strings, so if either of these uses bites the other it's the 
> binary buffer that needs to suffer. UnicodeObjects and StringObjects 
> should behave pretty orthogonal to how FloatObjects and IntObjects behave.

It would be nice if Unicode could be made to behave that way,
but unfortunately, the 8-bit world is so differentiated with
lots of different encodings that not even Harry Potter would
have much luck finding the right magic to apply.
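
A minimal sketch of why the 7-bit case is easy and the 8-bit case is
not, assuming Python 3 and a hypothetical to_text helper: "a.out"
converts unambiguously, while a single byte above 127 means a different
character under each candidate encoding.

```python
def to_text(value, encoding="ascii"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, str):
        return value
    return value.decode(encoding)


# 7-bit data: the conversion Jack asks for is unambiguous.
print(to_text(b"a.out"))

# 8-bit data: ASCII cannot decode it at all...
raw = b"caf\xe9"
try:
    to_text(raw)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# ...and each 8-bit encoding gives a different answer for the same bytes.
candidates = {enc: raw.decode(enc) for enc in ("latin-1", "cp1251", "koi8-r")}
for enc, text in candidates.items():
    print(enc, ascii(text))
```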

Another problem is that of the getargs.c API itself: since it returns
pointers to data buffers, auto-conversions (if at all possible)
which involve temporary objects must be handled differently than
normal Python string objects.

Now, the question is whether you are willing to pay for the
comfort of getting direct access to a Py_UNICODE buffer (or char
buffer) with extra copy-action and additional PyMem_Free() cleanup
overhead or not. The "O" parser marker doesn't provide any
magic on its own, but also reduces the need for copying data
and handling memory management in your APIs.

In my last message on this thread, I proposed to add "eu#" which
returns a Py_UNICODE buffer, possibly decoding a string object
using the given encoding first. As Martin noted, this option
requires extra copying but simplifies the C coding somewhat.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan  6 17:16:31 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 18:16:31 +0100
Subject: [Python-Dev] Re: [XML-SIG] printing Unicode xml to StringIO
References: <> <> <> <> <> <> <>              <> <>
Message-ID: <>

Guido van Rossum wrote:

>>>- Since we added a note to the docs that StringIO supports Unicode, we
>>>  clearly should continue to support that, and it's a bug if it
>>>  doesn't.
>>I still believe that the docs are wrong, but nevermind. I'll fix
>> to continue to support Unicode in addition to strings
>>and buffer objects. It's basically only about special casing
>>Unicode in the .write() method.
> Thanks.
>>BTW, I was never aware of the doc changes in this area and the 
>>test suite didn't bring up the issues either.
> Can you please add something to the test suite that makes sure this
> feature works?
>>>- OTOH, Unicode for cStringIO should be considered at best a feature
>>>  request.  I don't mind if cStringIO doesn't support Unicode -- it
>>>  never has, AFAIK, so it won't break much code.  I don't believe it's
>>>  much faster than StringIO, unless you use the C API (like cPickle
>>>  does).
>>Unicode support in cStringIO would require a new implementation
>>since the machinery uses raw byte buffers.
> That's why I don't care much about it. :-)
>>>- Of course, when Unicode is supported, mixing ASCII and Unicode
>>>  should be supported too.  (But not necessarily mixing 8-bit strings
>>>  containing characters in the range \200-\377, since there's no
>>>  default encoding for this range.)
>>In this is not much of a problem since it uses
>>a list of snippets. Note that this is also why "supported"
>>Unicode in the first place (and that's why I think it was more an
>>artifact of the implementation than true intent).
> But it was useful! :-)
>>>- Since this changed from 2.1 to 2.2, we should restore this
>>>  capability in 2.2.1; I would say that 2.2.1 can't go out until this
>>>  is fixed.
> Try to mark the checkin messages as "2.2.1 bugfix", for the 2.2.1
> patch czar.

Checked in.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan  6 17:30:17 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 18:30:17 +0100
Subject: [Python-Dev] Add to the standard lib ?!
Message-ID: <>

This is a multi-part message in MIME format.
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Should I go ahead and checkin into the Python 2.2
tree together with some docs ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

Content-Type: message/rfc822;
 name="[Python-bugs-list] [ python-Feature Requests-494854 ] add"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="[Python-bugs-list] [ python-Feature Requests-494854 ] add"

Message-Id: <>
Subject: [Python-bugs-list] [ python-Feature Requests-494854 ] add
Date: Wed, 19 Dec 2001 08:53:20 -0800
MIME-Version: 1.0

Feature Requests item #494854, was opened at 2001-12-18 17:16
You can respond by visiting:

Category: Python Library
Group: None
Status: Open
Priority: 5
Submitted By: Jason R. Mastaler (jasonrm)
>Assigned to: M.-A. Lemburg (lemburg)
Summary: add

Initial Comment:
Here's a request to add Marc-Andre Lemburg's to the Python standard library.

It provides more complete platform information
than either sys.platform or

For more info, see:


Comment By: M.-A. Lemburg (lemburg)
Date: 2001-12-19 01:27

Logged In: YES 

No problem from here :-)


You can respond by visiting:

Python-bugs-list maillist  -


From  Sun Jan  6 17:41:32 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 18:41:32 +0100
Subject: [Python-Dev] Add to the standard lib ?!
References: <>
Message-ID: <>

M.-A. Lemburg wrote:

> Should I go ahead and checkin into the Python 2.2
> tree together with some docs ?

I meant CVS tree... of course.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan  6 18:10:06 2002
From: (Jason Orendorff)
Date: Sun, 6 Jan 2002 12:10:06 -0600
Subject: [Python-Dev] Add to the standard lib ?!
In-Reply-To: <>
Message-ID: <>

> Should I go ahead and checkin into the Python 2.2
> tree together with some docs ?

I noticed that the regular expressions in this module, throughout,
don't use raw strings.  Don't know if that's intentional.

## Jason Orendorff

From  Sun Jan  6 18:54:03 2002
From: (Guido van Rossum)
Date: Sun, 06 Jan 2002 13:54:03 -0500
Subject: [Python-Dev] Add to the standard lib ?!
In-Reply-To: Your message of "Sun, 06 Jan 2002 18:30:17 +0100."
References: <>
Message-ID: <>

> Should I go ahead and checkin into the Python 2.2
> tree together with some docs ?

There is no Python 2.2 tree.  Maybe you mean the 2.3 tree?  No problem
for me.

--Guido van Rossum (home page:

From  Sun Jan  6 19:44:45 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 20:44:45 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <> <> <>
Message-ID: <>

> That's the global, sure but the code using it is scattered
> across fileobject.c and the posix module. I think it would be
> a good idea to put all this file naming code into some
> Python/fileapi.c file which then also provides C APIs for
> extensions to use. These APIs should then take the file name
> as PyObject* rather than char* to enable them to handle
> Unicode directly.

What do you gain by that? Most of the posixmodule functions that take
filenames are direct wrappers around the system call. Using another
level of indirection is only useful if the fileapi.c functions are
used in different places. Notice that each function (open, access,
stat, etc) is used exactly *once* currently, so putting this all into
a single place just makes the code more complex.

The extensions module argument is a red herring: I don't think there
are many extension modules out there which want to call access(2) but
would like to do so using a PyObject* as the first argument, but
numbers as the other arguments.

> > Of course, if the system has an open function that expects wchar_t*,
> > we might want to use that instead of going through a codec. Off hand,
> > Win32 seems to be the only system where this might work, and even
> > there, it won't work on Win95.
> I expect this to become a standard in the next few years.

I doubt that. Posix people (including developers of various posixish
systems) have frequently rejected that idea in recent years. Even for
the most recent system in this respect (OS X), we hear that they still
open files with a char*, where char is byte - the only advancement is
that there is a guarantee that those bytes are UTF-8. 

It turns out that this is all you need: with that guarantee, there is
no need for an additional set of APIs. UTF-8 was originally invented
precisely to represent file names (and was called FSS-UTF at that time);
it is more likely that more systems will follow this convention. If
so, a global per-system file system encoding is all that's needed.
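
The "one global encoding at the boundary" idea can be sketched like
this, assuming Python 3. FILESYSTEM_ENCODING, encode_filename and
decode_filename are hypothetical names chosen for illustration; the
value "utf-8" is an assumption matching the OS X guarantee described
above, not something every system provides.

```python
import sys

# Assumption for this sketch: the system guarantees UTF-8 file names.
FILESYSTEM_ENCODING = "utf-8"


def encode_filename(name):
    """Convert a text file name to the bytes handed to a char* system call."""
    if isinstance(name, bytes):
        return name  # already in the byte form the system call expects
    return name.encode(FILESYSTEM_ENCODING)


def decode_filename(raw):
    """Convert bytes returned by the system back into a text file name."""
    return raw.decode(FILESYSTEM_ENCODING)


encoded = encode_filename("r\u00e9sum\u00e9.txt")
print(encoded, ascii(decode_filename(encoded)))

# For comparison, the encoding the running interpreter uses for file names.
print(sys.getfilesystemencoding())
```

With such a guarantee no second set of wide-character calls is needed:
every function that takes a file name converts at one point and then
calls the ordinary byte-oriented API.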

The only problem is that on Windows, MS has already decided that the
APIs are in CP_ANSI, so they cannot change it to UTF-8 now; that's why
Windows will need special casing if people are unhappy with the "mbcs"
approach (which some apparently are).

> > Also, it is more difficult than threads: for threads, there is a fixed
> > set of API features that need to be represented. Doing Py_UNICODE*
> > opening alone is easy, but look at the number of posixmodule functions
> > that all expect file names of some sort.
> Doesn't that support the idea of having a small subsystem
> in Python which exposes the Unicode aware APIs to Python
> and its extensions ?

No. It is a lot of work, and an additional layer of indirection, with
no apparent advantage. Feel free to write a PEP, though.


From  Sun Jan  6 19:48:41 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 20:48:41 +0100
Subject: [Python-Dev] Unicode support in getargs.c
In-Reply-To: <> (
References: <> <>
Message-ID: <>

> In my last message on this thread, I proposed to add "eu#" which
> returns a Py_UNICODE buffer, possibly decoding a string object
> using the given encoding first. As Martin noted, this option
> requires extra copying but simplifies the C coding somewhat.

Also, while it simplifies processing compared to "O", I cannot see any
simplification compared to "O&". So I'd be more in favor of offering
standard conversion functions for O& instead of inventing new getargs
modifiers all the time. This would also simplify creation of
cross-version extension modules: people could just incorporate the
code of the conversion function into their code base, trusting that O&
had been available for ages.


From  Sun Jan  6 21:36:45 2002
From: (Jack Jansen)
Date: Sun, 06 Jan 2002 22:36:45 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Sun, 6 Jan 2002 01:33:08 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> >    This change works for me on Windows 2000 and allows access to all files
> > no matter what the current code page is set to. On Windows 9x (not yet
> > tested), the _wfopen call should fail causing a fallback to fopen. Possibly
> > the OS should be detected instead and _wfopen not attempted on 9x. 
> Now that you have that change, please try to extend it to
> posixmodule.c. This is where I gave up. Notice that, with changing
> Py_FileSystemDefaultEncoding and open() alone, you have worsened the
> situation: os.stat will now fail on files with non-ASCII names on
> which it works under the mbcs encoding, because windows won't find the
> file (correct me if I'm wrong).

Could someone who really understands this issue (Martin?) perhaps
write a test case for this? I think something like creating a file
with some nonascii chars in the name, and verifying that open(),
readdir(), os.stat() and various others work as expected is what would
be needed (but I'm not sure I fully understand it:-).
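
A rough sketch of such a test, assuming Python 3, a file system that can
store the chosen name, and a hypothetical function name
check_non_ascii_filename. Names are compared after NFC normalization
because some file systems return them in decomposed form.

```python
import os
import tempfile
import unicodedata


def check_non_ascii_filename(name="caf\u00e9.txt"):
    """Create a file with a non-ASCII name and exercise the basic calls."""
    def nfc(s):
        return unicodedata.normalize("NFC", s)

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)

        # open() must accept the name for writing.
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello")

        # The directory listing must report the same name back.
        listed_ok = [nfc(n) for n in os.listdir(tmp)] == [nfc(name)]

        # os.stat must find the file and see its five bytes.
        stat_ok = os.stat(path).st_size == 5

        # os.rename takes two names and must handle both.
        renamed = os.path.join(tmp, "renamed-" + name)
        os.rename(path, renamed)
        rename_ok = os.path.exists(renamed) and not os.path.exists(path)

        return listed_ok, stat_ok, rename_ok


print(check_non_ascii_filename())
```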
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Sun Jan  6 21:50:55 2002
From: (Jack Jansen)
Date: Sun, 06 Jan 2002 22:50:55 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Sun, 6 Jan 2002 20:44:45 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> > That's the global, sure but the code using it is scattered
> > across fileobject.c and the posix module. I think it would be
> > a good idea to put all this file naming code into some
> > Python/fileapi.c file which then also provides C APIs for
> > extensions to use. These APIs should then take the file name
> > as PyObject* rather than char* to enable them to handle
> > Unicode directly.
> What do you gain by that? Most of the posixmodule functions that take
> filenames are direct wrappers around the system call. Using another
> level of indirection is only useful if the fileapi.c functions are
> used in different places.

Well, I only know about the Mac and (to a lesser extent) about
Windows, but there's lots of methods that are not in
{posix,mac,nt}module.c there that want filenames. And I think mmap
also uses filenames, no? All in all I'm in favor of a single place
where file name encoding magic is handled. Whether a fileapi.c is
needed or something simpler can do the trick (a PyArg_Parse fmt that
returns two items: the filename to use plus a routine you're expected
to call on it before you return?) I'm not sure.

- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Sun Jan  6 22:15:51 2002
From: (M.-A. Lemburg)
Date: Sun, 06 Jan 2002 23:15:51 +0100
Subject: [Python-Dev] Add to the standard lib ?!
References: <>
Message-ID: <>

Jason Orendorff wrote:
> > Should I go ahead and checkin into the Python 2.2
> > tree together with some docs ?
> I noticed that the regular expressions in this module, throughout,
> don't use raw strings.  Don't know if that's intentional.

It's not necessary since the escapes used in the module are
not unescaped by the Python parser, but you're probably right:
better safe than sorry...
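
A small illustration of why raw strings are the safer habit, written for present-day Python 3 (which warns about unrecognized escapes in ordinary literals, so the comparison below uses the explicitly escaped form). The sample text is made up for the sketch.

```python
import re

# r"\d" is a backslash followed by "d": the same two characters as the
# explicitly escaped spelling "\\d".
assert "\\d" == r"\d" and len(r"\d") == 2

# "\b" is different: the parser turns it into a single backspace character,
# while r"\b" keeps the backslash and reaches the regex engine as a
# word-boundary assertion.
assert len("\b") == 1 and len(r"\b") == 2

text = "cat concat cats"
raw_matches = re.findall(r"\bcat\b", text)    # matches the whole word only
cooked_matches = re.findall("\bcat\b", text)  # searches for backspace characters
print(raw_matches, cooked_matches)
```

A pattern that happens to use only escapes the parser leaves alone works either way; the trouble starts the day someone adds a `\b` or `\n` to it.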

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan  6 22:19:04 2002
From: (Jack Jansen)
Date: Sun, 06 Jan 2002 23:19:04 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
Message-ID: <>

Something I've wanted for a long time, and maybe I should drop the
idea now.

There's a lot of Python objects that are really little more than
wrappers around an opaque C pointer (plus all the methods to operate
on it, etc). These objects usually have accompanying Parse() and
Build() functions, that you pass to the O& format for PyArg_Parse and

This all works fine, but I think we can do one better. If we have
slots in the type structure to store the Parse and Build functions we
could add a new format specifier O@ (or whatever other character is
free:-) that has a typeobject parameter and a C pointer
parameter. One advantage is that this would fit a lot better with the
new class inheritance scheme. Moreover, and more importantly, this
would give us a handle to use from Python code, so structmodule could
(un)pack structures that contain pointers to objects that are
python-wrappable, calldll could neatly wrap functions that have
python-wrappable objects, etc. 
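
To make the proposal a bit more concrete, here is a rough Python model of the idea. It is not an implementation of any existing getargs feature: each wrapper type carries its own `parse` and `build` converters (standing in for the proposed type slots), so generic code can convert values given only the type object. All names (`Wrapper`, `WindowRef`, `MenuRef`, `unpack_pointers`) are hypothetical.

```python
class Wrapper:
    """Base for objects that merely wrap an opaque handle (an int here)."""

    def __init__(self, handle):
        self.handle = handle

    @classmethod
    def build(cls, handle):
        # Counterpart of a per-type Build() function: raw handle -> object.
        return cls(handle)

    @classmethod
    def parse(cls, obj):
        # Counterpart of a per-type Parse() function: object -> raw handle.
        if not isinstance(obj, cls):
            raise TypeError(
                "expected %s, got %s" % (cls.__name__, type(obj).__name__)
            )
        return obj.handle


class WindowRef(Wrapper):
    pass


class MenuRef(Wrapper):
    pass


def unpack_pointers(types, handles):
    """Generic code in the spirit of struct unpacking: wrap each raw handle
    using nothing but the corresponding type object."""
    return [t.build(h) for t, h in zip(types, handles)]


window, menu = unpack_pointers([WindowRef, MenuRef], [0x1000, 0x2000])
print(type(window).__name__, WindowRef.parse(window))
```

The point of the model is that `unpack_pointers` never needs to know about `WindowRef` or `MenuRef` in advance; the type object alone is enough.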

Is this a good idea?
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Sun Jan  6 22:42:34 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 23:42:34 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (message from Jack
 Jansen on Sun, 06 Jan 2002 22:36:45 +0100)
References: <>
Message-ID: <>

> Could someone who really understands this issue (Martin?) perhaps
> write a test case for this? I think something like creating a file
> with some nonascii chars in the name, and verifying that open(),
> readdir(), os.stat() and various others work as expected is what would
> be needed (but I'm not sure I fully understand it:-).

I'll attach a script below. It contains UTF-8 encoded data, so to
prevent transmission errors, it comes base-64 attached. Running it
creates three additional files in the current directory; I recommend
running it in an empty directory.

In case you cannot view the source code properly, I attach a
screenshot of my editor.

[Attachment: application/octet-stream, base64]
[Attachment: uni.png (image/png), base64]

From  Sun Jan  6 22:47:45 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 23:47:45 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <> (message from Jack
 Jansen on Sun, 06 Jan 2002 22:50:55 +0100)
References: <>
Message-ID: <>

> Well, I only know about the Mac and (to a lesser extent) about
> Windows, but there are lots of methods outside
> {posix,mac,nt}module.c that want filenames. And I think mmap
> also uses filenames, no? All in all I'm in favor of a single place
> where file name encoding magic is handled.

I think Marc is not only thinking about encoding; he also wants the
single place to actually perform the system calls. So if you want to
support mmap, or an additional system call that expects or returns a
file name, you cannot put it into your module; instead, you must put
it in fileapi.c first, and *then* call the function in fileapi.c from
your module.

It may be necessary to call different routines depending on whether
you have a byte or a character string; this is not something a getargs
converter can do. It also may be that, depending on which system
routine you call, the system will *return* either wide or narrow
strings to you. Every time you find another use of file names, Marc
suggests you put that into fileapi.c.


From  Sun Jan  6 22:51:37 2002
From: (Martin v. Loewis)
Date: Sun, 6 Jan 2002 23:51:37 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: <> (message from Jack
 Jansen on Sun, 06 Jan 2002 23:19:04 +0100)
References: <>
Message-ID: <>

> There's a lot of Python objects that are really little more than
> wrappers around an opaque C pointer (plus all the methods to operate
> on it, etc).

Can you give a few examples? I'm not aware of any such types, off-hand.


From  Sun Jan  6 23:19:11 2002
From: (Neil Hodgson)
Date: Mon, 7 Jan 2002 10:19:11 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <>
Message-ID: <024101c19708$8c64e6c0$0acc8490@neil>

Martin v. Loewis:
> I'll attach a script below. It contains UTF-8 encoded data, so to
> prevent transmission errors, it comes base-64 attached. Running it
> creates three additional files in the current directory; I recommend
> running it in an empty directory.

   I have added some more cases to your example Martin, in Hebrew, Chinese
and Japanese and a combination. The combination is an interesting case as it
will not work with mbcs with a particular code page, as no code page (to my
knowledge) contains all the characters.

   This works using my modifications except for the calls to os.rename.


# -*- coding: utf-8 -*-
import locale, os
locale.setlocale(locale.LC_ALL, "")
filenames = [
for name in filenames:
    print repr(name)
    f = open(name, 'w')
print os.listdir(".")
for name in filenames:


From  Mon Jan  7 00:05:08 2002
From: (Barry A. Warsaw)
Date: Sun, 6 Jan 2002 19:05:08 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

Okay, I'm totally confuggled now.  Let's boil this down.  Take this
simple program:

-------------------- snip snip --------------------/tmp/
#! /bin/sh
echo "OPT   = x${OPT}x"
echo "CFLAGS= x${CFLAGS}x"
-------------------- snip snip --------------------

and invoke it like:

% CFLAGS='one' OPT="two $CFLAGS" /tmp/

What do you get?  What do you *expect* to get?  Am I boiling things
down correctly?

On every system I've tested, the following output is what I get:

% CFLAGS='one' OPT="two $CFLAGS" /tmp/
OPT   = xtwo x
CFLAGS= xonex

So, why should any of this work anywhere?  Should we ever expect $OPT
to get the right value?

i-must-be-missing-something-really-obvious,-obvious-ly y'rs,

From  Mon Jan  7 00:20:32 2002
From: (Guido van Rossum)
Date: Sun, 06 Jan 2002 19:20:32 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: Your message of "Sun, 06 Jan 2002 19:05:08 EST."
References: <> <> <> <> <01e801c19551$222145a0$c617a8c0@kurtz> <> <> <> <> <> <> <> <> <> <> <> <>
Message-ID: <>

> Okay, I'm totally confuggled now.  Let's boil this down.  Take this
> simple program:
> -------------------- snip snip --------------------/tmp/
> #! /bin/sh
> echo "OPT   = x${OPT}x"
> echo "CFLAGS= x${CFLAGS}x"
> -------------------- snip snip --------------------
> and invoke it like:
> % CFLAGS='one' OPT="two $CFLAGS" /tmp/
> What do you get?  What do you *expect* to get?  Am I boiling things
> down correctly?
> On every system I've tested, the following output is what I get:
> % CFLAGS='one' OPT="two $CFLAGS" /tmp/
> OPT   = xtwo x
> CFLAGS= xonex
> So, why should any of this work anywhere?  Should we ever expect $OPT
> to get the right value?

I haven't followed this, but from the above it appears that if you use
the form

VAR1=val1 VAR2=val2 ... program args

then all of val1, val2, ... are evaluated simultaneously using the
previous values of VAR1, VAR2, ... rather than left-to-right.

That's mildly surprising but not really upsetting to me.
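
The two evaluation orders can be contrasted with a toy model in Python. This is only a sketch of the ordering question and does not reproduce real shell parsing; the function names are made up, and unset variables are assumed to expand to the empty string.

```python
from collections import defaultdict
from string import Template


def expand(value, env):
    # Unset variables expand to the empty string, as in the shell.
    return Template(value).substitute(defaultdict(str, env))


def assign_simultaneously(assignments, env):
    """Expand every value against the *previous* environment."""
    result = dict(env)
    result.update({name: expand(value, env) for name, value in assignments})
    return result


def assign_left_to_right(assignments, env):
    """Expand each value against the environment as updated so far."""
    result = dict(env)
    for name, value in assignments:
        result[name] = expand(value, result)
    return result


assignments = [("CFLAGS", "one"), ("OPT", "two $CFLAGS")]
print(assign_simultaneously(assignments, {}))
print(assign_left_to_right(assignments, {}))
```

With an initially empty environment, the first strategy leaves OPT as "two " (the output quoted above), while the second yields "two one".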

--Guido van Rossum (home page:

From  Mon Jan  7 00:27:31 2002
From: (Neal Norwitz)
Date: Sun, 06 Jan 2002 19:27:31 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for
 2.1.2, plus 2.2.1...)
References: <>
 <> <>
Message-ID: <>

"Barry A. Warsaw" wrote:
> Okay, I'm totally confuggled now.  Let's boil this down.  Take this
> simple program:
> -------------------- snip snip --------------------/tmp/
> #! /bin/sh
> echo "OPT   = x${OPT}x"
> echo "CFLAGS= x${CFLAGS}x"
> -------------------- snip snip --------------------
> and invoke it like:
> % CFLAGS='one' OPT="two $CFLAGS" /tmp/

I think the intent was to use single quotes for OPT='two $CFLAGS'.
(You could also do OPT="two \$CFLAGS".)  This will pass the string
"$CFLAGS" in OPT, not the value of the shell variable $CFLAGS.

While your shell script will print out: OPT   = xtwo $CFLAGSx
This is ok since it will/should get expanded properly in the Makefile.

Or I've totally missed the point too. :-)


From  Mon Jan  7 02:01:46 2002
From: (Mark Hammond)
Date: Mon, 7 Jan 2002 13:01:46 +1100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
Message-ID: <>

> It may be necessary to call different routines depending on whether
> you have a byte or a character string; this is not something a getargs
> converter can do. It also may be that, depending on which system
> routine you call, the system will *return* either wide or narrow
> strings to you. Every time you find another use of file names, Marc
> suggests you put that into fileapi.c.

I'm sure that is not what Marc meant.  I think he simply meant a conversion
function that would return the filename as either byte or Unicode.  Get your
arg from PyArg_ParseTuple, and convert it with this function.

Have I missed it all these years, or should we define a PyArg_ParseTuple
format that takes a "void **" and a function pointer to a type conversion
function?


From  Mon Jan  7 02:50:10 2002
From: (Mark Hammond)
Date: Mon, 7 Jan 2002 13:50:10 +1100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <024101c19708$8c64e6c0$0acc8490@neil>
Message-ID: <>

>    I have added some more cases to your example Martin, in Hebrew, Chinese
> and Japanese and a combination. The combination is an interesting
> case as it will not work with mbcs with a particular code page, as no
> code page (to my knowledge) contains all the characters.
>    This works using my modifications except for the calls to os.rename.

This looks interesting :)  Any chance of putting all this together in a
patch at source-forge?  Ultimately should be rolled into
test/, and it is unclear if is the latest with Martin's
comments - and it appears posix_open may leave 'fd' uninitialized before
comparing < 0.



From  Mon Jan  7 03:46:14 2002
From: (Neil Hodgson)
Date: Mon, 7 Jan 2002 14:46:14 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <>
Message-ID: <036d01c1972d$db4ccad0$0acc8490@neil>

Mark Hammond:

> This looks interesting :)  Any chance of putting all this together in a
> patch at source-forge?

   Eventually although I'm not yet sure the direction is sound. It does
expand the code horribly. Also not sure if I'll have the determination to
push this through to completion - there are still plenty of issues to be
resolved. For me, just having open work is the most important bit - all the
others are far less used.

> Ultimately should be rolled into
> test/,

   Directory tests added to

> and it is unclear if
> is the latest with
> comments - and it appears posix_open may leave 'fd' uninitialized before
> comparing < 0.

   New version just uploaded fixing that at


From  Mon Jan  7 03:52:30 2002
From: (Neil Hodgson)
Date: Mon, 7 Jan 2002 14:52:30 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <>
Message-ID: <038a01c1972e$bb18f7b0$0acc8490@neil>


> Looks good. The posix_do_stat changes contain an error; you have put
> Python API calls inside the BEGIN_ALLOW_THREADS block. That is wrong:
> you must always hold the interpreter lock when calling Python
> API.

   OK, moved the thread stuff so no API calls are inside. However,
PyUnicode_AS_UNICODE left in as that is just a macro for accessing a field
that should be stable over the call. Or is it? Other methods don't seem to
worry that GC will move buffers during calls.

> Also, when calling _wstati64, you might want to assert that the
> function pointer is _stati64. Likewise, the code inside posix_open
> should hold the interpreter lock.

   OK, assert in for stat.

> However, the size of your changes is really disturbing here. There
> used to be already four versions of listing a directory; now you've
> added a fifth one. And it isn't even clear whether this code works on
> W9x, is it?

   Currently it won't work on Windows 9x. That is more work and code bulk.

> There must be a way to fold the different Windows versions into a
> single one; perhaps it is acceptable to drop Win16 support. I think
> three different versions should be offered to the end user:

   Windows does this with the preprocessor - you are either building a
Unicode version or an ANSI version.

> - path is plain string, result is list of plain strings
> - path is Unicode string, result is list of Unicode strings
> - path is Unicode string, result is list of plain strings
> Perhaps one could argue that the third version isn't really needed:

   Sounds good to me. I'm moving back towards not using the 'utf-8' system
encoding but rather checking of Unicode arguments and handling them
explicitly even at the cost of code expansion.

> Now, os.rename will fail if you pass two Unicode strings
> referring to non-ASCII file names. posix_1str and posix_2str are like
> the stat implementation, except that you cannot know statically what
> the function pointer is.

   The code now passes both narrow and wide functions to posix_nstr and
there are two null functions to make this compile on non-Windows. Added
mkdir to allow testing the chdir and rmdir functions.
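
The dispatch idea can be modelled in Python roughly as follows. This is a sketch of the approach, not the actual posixmodule.c code: `call_nstr`, `narrow` and `wide` are hypothetical names, and `None` stands in for a missing wide function on platforms that have none.

```python
import os


def call_nstr(path, narrow_func, wide_func=None):
    """Call wide_func for text paths when one is available; otherwise
    fall back to narrow_func with the path encoded to bytes."""
    if isinstance(path, str):
        if wide_func is not None:
            return wide_func(path)
        # No wide variant: encode with the file system encoding.
        return narrow_func(os.fsencode(path))
    if isinstance(path, bytes):
        return narrow_func(path)
    raise TypeError("path must be str or bytes, not " + type(path).__name__)


# Stand-ins that only record which variant was chosen.
def narrow(path):
    return ("narrow", path)


def wide(path):
    return ("wide", path)


print(call_nstr("spam.txt", narrow, wide))   # text path, wide call available
print(call_nstr(b"spam.txt", narrow, wide))  # byte path, narrow call
print(call_nstr("spam.txt", narrow))         # text path, no wide call
```

Passing `None` instead of a do-nothing function keeps the "no wide variant here" case explicit at the call site.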

   Now handled are open,, os.stat, os.listdir, os.rename, os.remove,
os.mkdir, os.chdir, os.rmdir.

   Updated files available from


From  Mon Jan  7 05:30:52 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 00:30:52 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for
 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "NN" == Neal Norwitz <> writes:

    NN> I think the intent was to use single quotes for OPT='two
    NN> $CFLAGS'.  (You could also do OPT="two \$CFLAGS".)  This will
    NN> pass the string "$CFLAGS" in OPT, not the value of the shell
    NN> variable $CFLAGS.

    NN> While your shell script will print out: OPT = xtwo $CFLAGSx
    NN> This is ok since it will/should get expanded properly in the
    NN> Makefile.

Unfortunately, none of this really helps.  Getting $(CFLAGS) into $OPT
just results in this:

Makefile:737: *** Recursive variable `CFLAGS' references itself (eventually).  Stop.

Let me suggest the following, and then I'm going to stop here.
Martin's patch to fileobject.c should be applied -- that's a given.
As for configure:

    CC='gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' ./configure

works for me.  I'll leave it up to others to decide what to change,
although IMHO posix-large-file is broken (and also because those
instructions shouldn't be necessary for Python 2.2).


From Anthony Baxter <>  Mon Jan  7 05:43:39 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Mon, 07 Jan 2002 16:43:39 +1100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: Message from (Barry A. Warsaw)
 of "Mon, 07 Jan 2002 00:30:52 CDT." <>
Message-ID: <>

>>> Barry A. Warsaw wrote
> Let me suggest the following, and then I'm going to stop here.
> Martin's patch to fileobject.c should be applied -- that's a given.
> As for configure:
>     CC='gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' ./configure
> works for me.  

That's good enough for me - I'll test it on the boxes I can find...

> I'll leave it up to others to decide what to change,

Documentation?


> although IMHO posix-large-file is broken 

You mean, even with these new build instructions?

> (and also because those
> instructions shouldn't be necessary for Python 2.2).

They are still going to be necessary for 2.1.2 - I don't want to try and
play the game of getting this change in and turned on by default at this
stage of the game... :/


From  Mon Jan  7 06:52:20 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 07:52:20 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> What do you get?  

martin@mira:~> CFLAGS='one' OPT="two $CFLAGS" ./
OPT   = xtwo onex
CFLAGS= xonex
martin@mira:~> echo $BASH_VERSION

> What do you *expect* to get?  

What I get, both in zsh and bash. I'd expect environment variable
assignments to be evaluated from left to right, one-by-one. The bash
documentation says

# The order of expansions is: brace expansion, tilde expansion,
# parameter, variable and arithmetic expansion and command
# substitution (done in a left-to-right fashion), word splitting, and
# pathname expansion.

The only way I can produce an error is by

martin@mira:~> env CFLAGS='one' OPT="two $CFLAGS" ./
OPT   = xtwo x
CFLAGS= xonex

This is the result of the exact procedure used by bash:

# When a simple command is executed, the shell performs the following
# expansions, assignments, and redirections, from left to right.
# 1.  The words that the parser has marked as variable assignments
#     (those preceding the command name) and redirections are
#     saved for later processing.
# 2.  The words that are not variable assignments or redirections are
#     expanded.  If any words remain after expansion, the first word
#     is taken to be the name of the command and the remaining words
#     are the arguments.
# 3.  Redirections are performed as described above under REDIRECTION.
# 4.  The text after the = in each variable assignment undergoes tilde
#     expansion, parameter expansion, command substitution, arithmetic
#     expansion, and quote removal before being assigned to the
#     variable.

So assignments further to the left affect assignments further to the
right, but not any other words in the command line.

> Am I boiling things down correctly?

I would say so. That also indicates the right change to the
documentation: Just put each assignment in an individual export

export CFLAGS OPT;CFLAGS='one';OPT="two $CFLAGS";./
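
The same ordering can be made explicit when a build is driven from a script instead of an interactive shell. A sketch, assuming Python 3.7 or later and using a small inline child process as a stand-in for the real configure run:

```python
import os
import subprocess
import sys

# Build the environment step by step, so OPT can safely refer to CFLAGS.
env = dict(os.environ)
env["CFLAGS"] = "one"
env["OPT"] = "two " + env["CFLAGS"]

# Stand-in for the test script: print both variables as the child sees them.
child = (
    "import os\n"
    "print('OPT   = x%sx' % os.environ['OPT'])\n"
    "print('CFLAGS= x%sx' % os.environ['CFLAGS'])\n"
)
result = subprocess.run(
    [sys.executable, "-c", child],
    env=env, capture_output=True, text=True, check=True,
)
print(result.stdout, end="")
```

Because each value is computed before the child starts, the result does not depend on how any particular shell orders its assignments.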

I'm still surprised that it fails on your bash; I get the same (IMO
correct) behaviour with bash 2.03 on Solaris. I get failures with bash
2.02, and with /bin/sh on Solaris. /bin/ksh and /usr/xpg4/bin/sh work
fine (/usr/xpg4/bin/sh actually is ksh).

> So, why should any of this work anywhere?  Should we ever expect $OPT
> to get the right value?
> i-must-be-missing-something-really-obvious,-obvious-ly y'rs,

I'd say (without further research) that this was unspecified for
Bourne Shell, and got clarified for POSIX Shell - so both recent 
Bash versions, and the Solaris ksh work fine.


From  Mon Jan  7 06:56:16 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 07:56:16 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (message
 from Guido van Rossum on Sun, 06 Jan 2002 19:20:32 -0500)
References: <> <> <> <> <01e801c19551$222145a0$c617a8c0@kurtz> <> <> <> <> <> <> <> <> <> <> <> <>
 <> <>
Message-ID: <>

> I haven't followed this, but from the above it appears that if you use
> the form
> VAR1=val1 VAR2=val2 ... program args
> then all of val1, val2, ... are evaluated simultaneously using the
> previous values of VAR1, VAR2, ... rather than left-to-right.
> That's mildly surprising but not really upsetting to me.

What *is* upsetting is that different shells behave differently; or
else the current documentation would not have been written the way it
is now (and Barry and I would not have spent the week-end researching
that). Recent bash versions, and Korn shell evaluate from left to
right (bash now documents that assignments occur *after* args have been 


From  Mon Jan  7 07:00:20 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 08:00:20 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for
 2.1.2, plus 2.2.1...)
In-Reply-To: <> (message from Neal Norwitz on
 Sun, 06 Jan 2002 19:27:31 -0500)
References: <>
 <> <> <>
Message-ID: <>

> I think the intent was to use single quotes for OPT='two $CFLAGS'.
> (You could also do OPT="two \$CFLAGS".)  This will pass the string
> "$CFLAGS" in OPT, not the value of the shell variable $CFLAGS.
> While your shell script will print out: OPT   = xtwo $CFLAGSx
> This is ok since it will/should get expanded properly in the Makefile.
> Or I've totally missed the point too. :-)

The intent really was that the later assignment takes into account the
earlier one, by means of shell expansion. Setting OPT to a value that
depends on CFLAGS would give you a cyclic expansion in the Makefile
- so that clearly was not the intent.

You need to set both because one ends up in the Makefile (OPT) whereas
the other (CFLAGS) is needed to convince configure that HAVE_LARGEFILE
should be turned on.


From  Mon Jan  7 07:07:05 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 08:07:05 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
References: <>
Message-ID: <>

> I'm sure that is not what Marc meant.  I think he simply meant a
> conversion function that would return the filename as either byte or
> Unicode.  Get your arg from PyArg_ParseTuple, and convert it with
> this function.

If you have this, how do you know whether to call fopen or wfopen? If
it was a byte string, you need to pass it to fopen; if it was a
Unicode string, you pass it to wfopen.

Maybe that's what MAL meant, but then it won't work.

> Have I missed it all these years, or should we define a PyArg_ParseTuple
> format that takes a "void **" and a function pointer to a type conversion
> function?

This is what O& does. Unless it fills a PyObject*, you have a hard
time telling what it is that you got. It works for the void** case
only if it always fills in the same type (e.g. Py_UNICODE*); filling
in Py_UNICODE* in some cases and char* in others, without telling
you, is useless.
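
The point about needing to know what was filled in can be illustrated with a small Python model. It is not the getargs.c API; `convert_filename`, `choose_open` and the tag strings are made up for this sketch. A converter is only useful if it also reports which kind of string it produced, so the caller can pick the narrow or the wide call.

```python
def convert_filename(arg):
    """Return (kind, value), where kind says which representation was produced."""
    if isinstance(arg, bytes):
        return ("narrow", arg)
    if isinstance(arg, str):
        return ("wide", arg)
    raise TypeError("expected bytes or str, got " + type(arg).__name__)


def choose_open(arg):
    kind, _value = convert_filename(arg)
    # Without the tag, the caller could not tell which C function applies.
    return "fopen" if kind == "narrow" else "_wfopen"


print(choose_open(b"data.txt"), choose_open("data.txt"))
```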


From  Mon Jan  7 07:09:27 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 08:09:27 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <>
References: <>
Message-ID: <>

> This looks interesting :)  Any chance of putting all this together in a
> patch at source-forge?  

I do hope Neil will create a patch eventually; so far, it seems to be
more convenient to him to post snippets. This is fine with me, since
this project still has some way to go for completion.


From  Mon Jan  7 07:33:47 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 08:33:47 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (message from
 Anthony Baxter on Mon, 07 Jan 2002 16:43:39 +1100)
References: <>
Message-ID: <>

> >     CC='gcc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' ./configure
> That's good enough for me - I'll test it on the boxes I can find...

I'd strongly advise against putting that into the documentation. There
are numerous assignments to CC inside, which would
override this setting. Setting OPT and CFLAGS is the right way to pass
these configuration options.

> > I'll leave it up to others to decide what to change,
> Documentation?

Please, not the way Barry proposes.


From  Mon Jan  7 09:22:50 2002
From: (M.-A. Lemburg)
Date: Mon, 07 Jan 2002 10:22:50 +0100
Subject: [Python-Dev] Unicode strings as filenames
References: <>
Message-ID: <>

Mark Hammond wrote:
> > It may be necessary to call different routines depending on whether
> > you have a byte or a character string; this is not something a getargs
> > converter can do. It also may be that, depending on which system
> > routine you call, the system will *return* either wide or narrow
> > strings to you. Every time you find another use of file names, Marc
> > suggests you put that into fileapi.c.
> I'm sure that is not what Marc meant.  I think he simply meant a conversion
> function that would return the filename as either byte or Unicode.  Get your
> arg from PyArg_ParseTuple, and convert it with this function.

What I meant is to move all the file name code from fileobject.c
and posixmodule.c to a new file fileapi.c which lives in the 
Python/ subdir and then let it expose C APIs which the other two files
then use in their machinery.

It's basically about cleaning up the various bits and pieces
in the source code; note that this does not only involve APIs
which work on file names, but also other APIs which take
filenames as arguments and/or return filenames (even though
starting out with a file name mapping API would already
go a long way).

The benefits of such an approach would be two-fold:

1. You centralize the need for #ifdefs and other platform specific
   quirks in one file. As a result future fixes will only involve
   this one file. (Py_FileSystemDefaultEncoding should also live in this
   file, BTW)

2. The C APIs can well be used by other parts of the Python interpreter
   which need to open and handle files. Extensions would also benefit
   from this, e.g. they could use the API functions to open files
   with Unicode names in a cross-platform way using a single API.

> Have I missed it all these years, or should we define a PyArg_ParseTuple
> format that takes a "void **" and a function pointer to a type conversion
> function?

Isn't this what "O&" is meant for ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Mon Jan  7 09:30:11 2002
From: (Neil Hodgson)
Date: Mon, 7 Jan 2002 20:30:11 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <>
Message-ID: <003b01c1975d$e7dd3070$0acc8490@neil>

[Replacing the other mail destinations as I didn't do a reply all last time
so python-dev dropped off. You may want to resend your last mail to

> I don't think we can drop W9x support for Python 2.3, although I'm
> still waiting for comments on dropping W3.1 support...

   I wouldn't want to drop either.

> >    Sounds good to me. I'm moving back towards not using the 'utf-8'
> > encoding but rather checking of Unicode arguments and handling them
> > explicitly even at the cost of code expansion.
> That is very good. I don't know what is best for the file name;
> perhaps it is acceptable to encode it with the file system default
> encoding (even if it ends up having question marks in it). Programs
> relying on the file name to be correct are broken, IMO.

   My thinking now is that there are two modules here, fileobject and
posixmodule which should be handled differently.

   posixmodule is just a library with calls and no state. IIRC there used to
be multiple modules, one per OS, and the correct one was chosen and called
os. I think it is perfectly reasonable for there to be an extra 'ntos'
module that just works on NT that treats all arguments as Unicode (coercing
up using the current locale when given narrow strings) and always calling
the wide APIs. It would contain the same methods (when available) as os. NT
specific code can use it directly and sufficiently interested portable
client code could say something like

if nt:
  filesys = ntos
else:
  filesys = os

   This hides away all the code bloat from posix code, ensures there are no
regressions in posix while developing and debugging ntos, and allows ntos to
just convert all arguments into wide strings without worrying about 9x.
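
A rough Python model of such a module, assuming present-day Python 3 where text paths already reach the wide APIs on Windows. `widen` and `WideOS` are hypothetical names for this sketch, and only two of the calls are shown.

```python
import locale
import os


def widen(path):
    """Coerce a narrow (bytes) path to text using the current locale's encoding."""
    if isinstance(path, bytes):
        return path.decode(locale.getpreferredencoding(False))
    return path


class WideOS:
    """Stand-in for the proposed module: same calls as os, but every path
    argument is widened before the real call is made."""

    @staticmethod
    def stat(path):
        return os.stat(widen(path))

    @staticmethod
    def listdir(path):
        return os.listdir(widen(path))


filesys = WideOS
# A narrow argument is widened, so the result is a list of text names.
print(all(isinstance(name, str) for name in filesys.listdir(b".")))
```

Client code then only talks to `filesys` and never has to care which string type a caller passed in.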

   Maybe call the module osu if there may be implementations on other OS's
like OS X. Could have an enquiry method in the module

if osu.working:
  filesys = osu
else:
  filesys = os

   fileobject is more complex because it holds two strings as state. The
mode can probably be assumed to be ASCII so can be left as a narrow string
(although it does have to be widened to call _wfopen) but the name is more
complex as some client code may just know that it is always a narrow string
and thus die if given a file with a wide name.

> Looks very good indeed. When producing patches, you might want to
> check line endings: currently, your files are a mix of LF only (which
> was there before) and CRLF.


> In open_the_file, you are still checking for utf-8; that should be
> removed also. It seems that open_the_file will always get an
> initialized filed, so passing name does not seem to be necessary: one
> could look at f_name.

   OK. So why are the name and mode passed when they are already available?

> I suggest that f_name stays as a byte string for the moment, and
> open_the_file gets an optional "original name" or "unicode name"
> argument, whatever is more convenient. If that is given, open_the_file
> should consider it, else it should fall back to f_name.

   If this is done then the unicode name should also be available as a field
of the object as those mangled "z??.html" strings are totally useless.

   I'm feeling more like making f_name be wide now but I'd expect some
opposition now from backwards compatibility advocates.

> In posixmodule, I cannot see the move towards passing Unicode objects
> directly, either - I guess you were talking about a future plan,
> above.

   Yes, I'm thinking ahead of the coding. Seeing where I'm already going or
about to go wrong.

> I cannot see the rationale for wfuncNull - wouldn't passing
> NULL as a function pointer be sufficient as well?

   Yes, must get used to thinking in C again. I don't think I have written C
for 8 years. WTF can't I declare variables just when I need them <incoherent
cursing and mumbling...>


From  Mon Jan  7 11:55:39 2002
From: (Jack Jansen)
Date: Mon, 07 Jan 2002 12:55:39 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Sun, 6 Jan 2002 23:51:37 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> > There's a lot of Python objects that are really little more than
> > wrappers around an opaque C pointer (plus all the methods to operate
> > on it, etc).
> Can you give a few examples? I'm not aware of any such types, off-hand.

All the Mac toolbox objects (Windows, Dialogs, Controls, Menus and a
zillion more), All the Windows HANDLEs, all the MFC objects (although
they might be a bit more difficult), the objects in the X11 and Motif
modules, the pyexpat parser object, *dbm objects, dlmodule objects,
mpz objects, zlib objects, SGI cl and al objects....

Enough examples? :-)
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Mon Jan  7 12:08:17 2002
From: (M.-A. Lemburg)
Date: Mon, 07 Jan 2002 13:08:17 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <>
Message-ID: <>

Jack Jansen wrote:
> Recently, "Martin v. Loewis" <> said:
> > > There's a lot of Python objects that are really little more than
> > > wrappers around an opaque C pointer (plus all the methods to operate
> > > on it, etc).
> >
> > Can you give a few examples? I'm not aware of any such types, off-hand.
> All the Mac toolbox objects (Windows, Dialogs, Controls, Menus and a
> zillion more), All the Windows HANDLEs, all the MFC objects (although
> they might be a bit more difficult), the objects in the X11 and Motif
> modules, the pyexpat parser object, *dbm objects, dlmodule objects,
> mpz objects, zlib objects, SGI cl and al objects....
> Enough examples? :-)

Sounds like you want to introduce a "buffer" interface for these
objects. If that's the case, please write a PEP for it -- I don't
think anyone on this list wants to see a second can of worms
like the buffer interface in Python :-/

Marc-Andre Lemburg
CEO Software GmbH

From  Mon Jan  7 12:12:16 2002
From: (Jack Jansen)
Date: Mon, 07 Jan 2002 13:12:16 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "M.-A. Lemburg" <> ,
 Mon, 07 Jan 2002 13:08:17 +0100 , <>
Message-ID: <>

Recently, "M.-A. Lemburg" <> said:
> Jack Jansen wrote:
> > 
> > Recently, "Martin v. Loewis" <> said:
> > > > There's a lot of Python objects that are really little more than
> > > > wrappers around an opaque C pointer (plus all the methods to operate
> > > > on it, etc).
> > >
> > > Can you give a few examples? I'm not aware of any such types, off-hand.
> > 
> > All the Mac toolbox objects (Windows, Dialogs, Controls, Menus and a
> > zillion more), All the Windows HANDLEs, all the MFC objects (although
> > they might be a bit more difficult), the objects in the X11 and Motif
> > modules, the pyexpat parser object, *dbm objects, dlmodule objects,
> > mpz objects, zlib objects, SGI cl and al objects....
> > 
> > Enough examples? :-)
> Sounds like you want to introduce a "buffer" interface for these
> objects.

No, that is something completely different. I want a replacement for
PyArg_Parse("O&", funcptr, void**) that has the form
PyArg_Parse("O@", typeobject, void**) and similarly for Py_BuildValue.

Because the typeobject has a Python representation (whereas the
function pointer does not) this would allow modules like struct and
calldll to support objects that have this interface, because these
modules are driven from specifications in Python. There is currently
no way to get from the typeobject to the function pointer needed for
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Mon Jan  7 12:25:48 2002
From: (Fredrik Lundh)
Date: Mon, 7 Jan 2002 13:25:48 +0100
Subject: [Python-Dev] Unicode support in getargs.c
References: <>
Message-ID: <042701c19776$71cb2b80$0900a8c0@spiff>

jack wrote:
> If Python runs on an EBCDIC machine (does it?)

(2.2 on as/400) (1.4 on os/390)


From  Mon Jan  7 13:19:27 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 08:19:27 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "AB" == Anthony Baxter <> writes:

    >> I'll leave it up to others to decide what to change,

    AB> Documentation?

    >> although IMHO posix-large-file is broken

    AB> You mean, even with these new build instructions?

Oops, sorry, I meant: I think the instructions on that page are
broken!  LFS support seems to work just fine w/ Martin's patch and the
new instructions.

    >> (and also because those instructions shouldn't be necessary for
    >> Python 2.2).

    AB> They are still going to be necessary for 2.1.2 - I don't want
    AB> to try and play the game of getting this change in and turned
    AB> on by default at this stage of the game... :/

I agree completely!  The 2.2 docs should probably say that those
instructions aren't necessary, but in the 2.1.2 branch it should say
they /are/ needed to turn on LFS.


From  Mon Jan  7 13:45:00 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 08:45:00 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> I'd strongly advise against putting that into the
    MvL> documentation. There are numerous assignments to CC inside
    MvL>, which would override this setting. Setting OPT
    MvL> and CFLAGS is the right way to pass these configuration
    MvL> options.

    >> I'll leave it up to others to decide what to change,
    >>  Documentation?

    MvL> Please, not the way Barry proposes.

Here's another suggestion: add a make variable that isn't used for
anything else, has a default empty value, and is used to create the
compilation command.  Let's say $LARGEFILE.

Then the configure command would be


and that should work on all shells, and without having to permanently
export a variable to the environment, which I think we should avoid
recommending.


From  Mon Jan  7 14:51:37 2002
From: (Fred L. Drake, Jr.)
Date: Mon, 7 Jan 2002 09:51:37 -0500 (EST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <>
References: <>
Message-ID: <>

Barry A. Warsaw writes:
 > Here's another suggestion: add a make variable that isn't used for
 > anything else, has a default empty value, and is used to create the
 > compilation command.  Let's say $LARGEFILE.

  This seems tolerable.  We should probably look for getconf in the
configure script, and make the default value the result of
"getconf LFS_CFLAGS" if available.  This seems like it would do "the
right thing" more often without user intervention and is safe when
getconf is not available.
  If LARGEFILE were set in the environment (by a command line such as
you suggest), we'd just use that instead.
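
The lookup order described above can be sketched in Python. This is only an illustration of the decision logic, not configure code: the function name `default_largefile_flags` and its injectable `which`/`run` parameters are hypothetical, and treating an "undefined" reply as empty is an assumption.

```python
import shutil
import subprocess


def default_largefile_flags(env, which=shutil.which, run=subprocess.run):
    """Pick large-file compiler flags (hypothetical helper).

    Order of preference:
    1. LARGEFILE from the given environment mapping, if set;
    2. the output of "getconf LFS_CFLAGS", if getconf exists and succeeds;
    3. the empty string, which is the safe default.
    """
    if "LARGEFILE" in env:
        return env["LARGEFILE"]
    if which("getconf") is None:
        return ""
    try:
        result = run(["getconf", "LFS_CFLAGS"],
                     capture_output=True, text=True, check=False)
    except OSError:
        return ""
    flags = result.stdout.strip()
    # Assumption: a failing exit status or an "undefined" reply means no flags.
    if result.returncode != 0 or flags == "undefined":
        return ""
    return flags
```

Passing `os.environ` as `env` gives the behaviour for the current shell; an explicit LARGEFILE value always wins over whatever getconf reports.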


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Mon Jan  7 15:00:48 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 10:00:48 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "Fred" == Fred L Drake, Jr <> writes:

    Fred>   This seems tolerable.  We should probably look for getconf
    Fred> in the configure script, and make the default value the
    Fred> result of "getconf LFS_CFLAGS" if available.  This seems
    Fred> like it would do "the right thing" more often without user
    Fred> intervention and is safe when getconf is not available.  If
    Fred> LARGEFILE were set in the environment (by a command line
    Fred> such as you suggest), we'd just use that instead.


BTW, does anybody have a manpage for getconf?


From  Mon Jan  7 15:06:13 2002
From: (Fred L. Drake, Jr.)
Date: Mon, 7 Jan 2002 10:06:13 -0500 (EST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <>
References: <>
Message-ID: <>

Barry A. Warsaw writes:
 > +1
 > BTW, does anybody have a manpage for getconf?

  Not for Linux, but you already have the command line we care about.
I've attached a Solaris manpage.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

User Commands                                          getconf(1)

     getconf - get configuration values

     getconf [ -v specification ]  system_var

     getconf [ -v specification ]  path_var  pathname

     getconf -a

     In the first synopsis form, the getconf utility  will  write
     to  the  standard output the value of the variable specified
     by system_var, in accordance with specification  if  the  -v
     option is used.

     In the second synopsis form, getconf will write to the
     standard output the value of the variable specified by path_var
     for the path specified by pathname, in accordance with
     specification if the -v option is used.

     In the third synopsis form, getconf will write to the
     standard output the names of the current system configuration
     variables.

     The value of each configuration variable will be  determined
     as if it were obtained by calling the function from which it
     is defined to be available. The value  will  reflect  condi-
     tions in the current operating environment.

     The following options are supported:

     -a    Writes the names of the current  system  configuration
           variables to the standard output.

     -v specification
           Gives the specification which governs the selection of
           values for configuration variables.

     The following operands are supported:

           A name of a  configuration  variable  whose  value  is
           available  from  the  pathconf(2) function. All of the
           values in the following table are supported:

     LINK_MAX             NAME_MAX              POSIX_CHOWN_RESTRICTED
     MAX_CANON            PATH_MAX              POSIX_NO_TRUNC
     MAX_INPUT            PIPE_BUF              POSIX_VDISABLE

SunOS 5.8           Last change: 30 Jan 1998                    1

           A path  name  for  which  the  variable  specified  by
           path_var is to be determined.

           A name of a  configuration  variable  whose  value  is
           available  from confstr(3C) or sysconf(3C). All of the
           values in the following table are supported:

     ARG_MAX                       BC_BASE_MAX
     BC_DIM_MAX                    BC_SCALE_MAX
     BC_STRING_MAX                 CHAR_BIT
     CHAR_MIN                      CHILD_MAX
     CLK_TCK                       COLL_WEIGHTS_MAX
     CS_PATH                       EXPR_NEST_MAX
     INT_MAX                       INT_MIN
     LFS64_CFLAGS                  LFS64_LDFLAGS
     LFS64_LIBS                    LFS64_LINTFLAGS
     LFS_CFLAGS                    LFS_LDFLAGS
     LFS_LIBS                      LFS_LINTFLAGS
     LINE_MAX                      LONG_BIT
     LONG_MAX                      LONG_MIN
     MB_LEN_MAX                    NGROUPS_MAX
     NL_ARGMAX                     NL_LANGMAX
     NL_MSGMAX                     NL_NMAX
     NL_SETMAX                     NL_TEXTMAX
     NZERO                         OPEN_MAX
     POSIX2_C_BIND                 POSIX2_C_DEV
     POSIX2_FORT_DEV               POSIX2_FORT_RUN
     POSIX2_RE_DUP_MAX             POSIX2_SW_DEV
     POSIX2_UPE                    POSIX2_VERSION
     _POSIX_ARG_MAX                _POSIX_CHILD_MAX
     _POSIX_OPEN_MAX               _POSIX_PATH_MAX
     _POSIX_PIPE_BUF               _POSIX_SAVED_IDS
     RE_DUP_MAX                    SCHAR_MAX
     SCHAR_MIN                     SHRT_MAX
     SHRT_MIN                      SSIZE_MAX
     STREAM_MAX                    TMP_MAX
     TZNAME_MAX                    UCHAR_MAX
     UINT_MAX                      ULONG_MAX
     USHRT_MAX                     WORD_BIT

     XBS5_ILP32_OFF32              XBS5_ILP32_OFF32_CFLAGS
     XBS5_LP64_OFF64               XBS5_LP64_OFF64_CFLAGS
     XBS5_LP64_OFF64_LDFLAGS       XBS5_LP64_OFF64_LIBS
     _XOPEN_CRYPT                  _XOPEN_ENH_I18N
     _XOPEN_LEGACY                 _XOPEN_SHM
     _XOPEN_XPG2                   _XOPEN_XPG3

     The symbol PATH also is recognized, yielding the same  value
     as the confstr() name value CS_PATH.

     See largefile(5) for the  description  of  the  behavior  of
     getconf  when  encountering files greater than or equal to 2
     Gbyte ( 2**31 bytes).

     Example 1: Writing the value of a variable

     This example illustrates the value of {NGROUPS_MAX}:

     example% getconf NGROUPS_MAX

     Example 2: Writing the value of a variable  for  a  specific
     directory

     This  example  illustrates  the  value  of  NAME_MAX  for  a
     specific directory:

     example% getconf NAME_MAX /usr

     Example 3: Dealing with unspecified results

     This example shows how to deal more carefully  with  results
     that might be unspecified:

     if value=$(getconf PATH_MAX /usr); then
         if [ "$value" = "undefined" ]; then
             echo PATH_MAX in /usr is infinite.
         else
             echo PATH_MAX in /usr is $value.
         fi
     else
         echo Error in getconf.
     fi

     Note that



     system("getconf POSIX2_C_BIND");

     in a C program could give  different  answers.  The  sysconf
     call  supplies  a  value  that corresponds to the conditions
     when the program was either compiled or executed,  depending
     on  the  implementation;  the  system call to getconf always
     supplies a value corresponding to conditions when  the  pro-
     gram is executed.

     See environ(5) for descriptions of the following environment
     variables  that  affect  the execution of getconf: LC_CTYPE,

     The following exit values are returned:

     0     The specified variable is valid and information  about
           its current state was written successfully.

     >0    An error occurred.

     See attributes(5) for descriptions of the following attributes:

    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    | Availability                | SUNWcsu                     |

     pathconf(2),   confstr(3C),   sysconf(3C),    attributes(5),
     environ(5), largefile(5)


From  Mon Jan  7 17:10:45 2002
From: (Andrew Kuchling)
Date: Mon, 07 Jan 2002 12:10:45 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
Message-ID: <>

[CC'ed to python-dev, Barbara Mattson]

Barbara's encountered an apparent problem with test_longexp in Python
2.2 on MacOS X.  test_longexp creates a big list expression and
eval()'s it.  The problem is that it takes an exceedingly long time to
run, at least more than half an hour (at which point she interrupted
it).

The two curious things are that 1) while test_longexp requires a lot
of memory and often thrashes on a low-memory machine (I found there
are 2 or 3 bugs in the SF bugtracker to this effect), the MacOS box in
question has a gigabyte of RAM, and 2) Python 2.1.1 *doesn't* show the
problem.  Quoting from her report:

	I tried the test_longexp by hand:

	l = eval("[" + "2," * REPS + "]")
	print len(l)

	changing REPS from 1000 to 50000.  1000 and 10000 ran fairly
	quickly - under a minute.  However, 25000 took about 5 minutes
	and 50000 took 23 minutes.  I'm not about to try 65580 (I need
	to get some real work done today, after all :-).  BTW, out of
	curiosity, I tried the same thing under 2.1.1, and even for
	REPS = 70000 it took less than a minute.

Any clues?  
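
For anyone who wants to repeat the measurement, here is a minimal timing sketch. It is written for a current Python 3 interpreter rather than the 2.1/2.2 builds discussed here, and the helper name `time_long_list_eval` is made up for illustration; it only reproduces the shape of the experiment, not the reported slowdown.

```python
import time


def time_long_list_eval(reps):
    """Build "[2,2,...,]" with `reps` elements, eval() it, and time the eval."""
    source = "[" + "2," * reps + "]"
    start = time.perf_counter()
    result = eval(source)
    elapsed = time.perf_counter() - start
    return len(result), elapsed


if __name__ == "__main__":
    # Same REPS values as in the quoted report, minus the largest ones.
    for reps in (1000, 10000, 25000):
        length, seconds = time_long_list_eval(reps)
        print(f"REPS={reps}: len={length}, {seconds:.3f}s")
```

If the time grows much faster than linearly in REPS, that points at the compiler or allocator rather than at the size of the resulting list.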

--amk
  "Peri, how would you like to meet a genius?"
  "I thought I already have."
    -- The Doctor and Peri, in "Mark of the Rani"

From  Mon Jan  7 17:27:17 2002
From: (Skip Montanaro)
Date: Mon, 7 Jan 2002 11:27:17 -0600
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
References: <>
Message-ID: <>

    amk> test_longexp creates a big list expression and eval()'s it.  The
    amk> problem is that it takes an exceedingly long time to run, at least
    amk> more than half an hour (at which point she interrupted it).
    amk> changing REPS from 1000 to 50000.  1000 and 10000 ran fairly
    amk> quickly - under a minute.  However, 25000 took about 5 minutes and
    amk> 50000 took 23 minutes.
    amk> Any clues?

Try configuring using --with-pymalloc to see if Vladimir's Python-specific
object allocator helps.  Even with a gigabyte of RAM, perhaps the malloc
free list is getting badly fragmented, causing it to churn forever trying to
coalesce memory blocks.

Skip Montanaro

From  Mon Jan  7 22:50:47 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 23:50:47 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: <> (message from Jack
 Jansen on Mon, 07 Jan 2002 12:55:39 +0100)
References: <>
Message-ID: <>

> All the Mac toolbox objects (Windows, Dialogs, Controls, Menus and a
> zillion more), All the Windows HANDLEs, all the MFC objects (although
> they might be a bit more difficult), the objects in the X11 and Motif
> modules, the pyexpat parser object, *dbm objects, dlmodule objects,
> mpz objects, zlib objects, SGI cl and al objects....

Could you please try once more, being serious this time? AFAICT, I was
asking for examples of types that are parsed by means of O& currently,
and do so just to get a void** from the python object.

Looking at pyexpat.c, I find a few uses of O&, none related to the
pyexpat parser object. In zlibmodule.c, I find not a single mention
of O&, likewise in dlmodule.c, clmodule.c, almodule.c, dbmmodule.c,
and now I'm losing interest in verifying more of your examples.

AFAICT, you are trying to replace O& with something. Where, exactly
(specific source file and line number), would you want to do that?


From  Mon Jan  7 22:55:47 2002
From: (Martin v. Loewis)
Date: Mon, 7 Jan 2002 23:55:47 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> Here's another suggestion: add a make variable that isn't used for
> anything else, has a default empty value, and is used to create the
> compilation command.  Let's say $LARGEFILE.
> Then the configure command would be
> and that should work on all shells, and without having to permanently
> export a variable to the environment, which I think we should avoid
> recommending.

"is used to create the compilation command" may be tricky to implement.
Anyway, what is wrong with my earlier suggestion

export CFLAGS OPT
OPT="-g -O2 $CFLAGS"
./configure



From  Mon Jan  7 23:03:41 2002
From: (Skip Montanaro)
Date: Mon, 7 Jan 2002 17:03:41 -0600
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <>
References: <>
Message-ID: <>

(trimming the cc list... i think everyone on it is a p-dev'er)

    Martin> Anyway, what is wrong with my earlier suggestion

    Martin> export CFLAGS OPT
    Martin> OPT="-g -O2 $CFLAGS"
    Martin> ./configure

I know I'm coming into this discussion late, but why even involve CFLAGS?

    export OPT


From  Mon Jan  7 23:17:14 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 00:17:14 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <003b01c1975d$e7dd3070$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil>
Message-ID: <>

>    posixmodule is just a library with calls and no state. IIRC there used to
> be multiple modules, one per OS, and the correct one was chosen and called
> os. I think it is perfectly reasonable for there to be an extra 'ntos'
> module that just works on NT that treats all arguments as Unicode (coercing
> up using the current locale when given narrow strings) and always calling
> the wide APIs. It would contain the same methods (when available) as os. 

I'd be all in favour of bringing ntmodule back into life, especially
if that is to become a module that does not need to work on
Win9x. Perhaps it can be compiled twice, once into w9x.pyd and once
into nt.pyd, or the common code can be shared by means of #include.

I'd also be in favour of killing all 16-bit Windows support in Python
for 2.3; not sure whether 16-bit DOS needs to stay.

>    If this is done then the unicode name should also be available as a field
> of the object as those mangled "z??.html" strings are totally useless.

It is not totally useless. Most users will never see the problem,
because their file names represent well in mbcs. In cases where you do
get replacement characters, it is still useful, since you may roughly
recognize what file it is in debugging output (e.g. the file extension
will be ASCII-representable in most applications, perhaps you get a
meaningful path in there also).
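
A small illustration of how such a mangled name arises. The mbcs codec exists only on Windows, so this sketch uses ASCII with replacement as a stand-in (an assumption made so that it runs anywhere), and `lossy_name` is a hypothetical helper name.

```python
def lossy_name(name, encoding="ascii"):
    """Encode a Unicode file name, replacing unencodable characters with '?'."""
    return name.encode(encoding, errors="replace")


# A name starting with "z", then two CJK characters, then an ASCII extension.
mangled = lossy_name("z\u65e5\u672c.html")
print(mangled)
```

The two characters that cannot be encoded each collapse to "?", while the leading letter and the extension survive, which is the part that stays recognizable in debugging output.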

>    I'm feeling more like making f_name be wide now but I'd expect some
> opposition now from backwards compatibility advocates.

I think the major problem is that performing repr on a file should
work. If that turns out to use the repr of the string (can't check
right now), instead of raising UnicodeErrors, my opposition to putting
Unicode objects into file names is not that strong anymore.

>    Yes, I'm thinking ahead of the coding. Seeing where I'm already
> going or about to go wrong.

That looks very good indeed. I was worried about using UTF-8 as file
system default encoding, because I believe that this encoding should
be mandated by the system API, instead of being our choice.


From  Mon Jan  7 23:20:15 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 00:20:15 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> I know I'm coming into this discussion late, but why even involve CFLAGS?

Because without it, autoconf won't detect that large file support is
available, and fail to define HAVE_LARGEFILE. This is because
configure uses CFLAGS on its own for the test scripts, but won't use
OPT. I don't think anything in the configure machinery should change
for 2.1.2, since 2.2 does it all in a different and better way.


From  Tue Jan  8 00:16:28 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 01:16:28 +0100
Subject: [Python-Dev] Including BSDDB3
Message-ID: <>

What do people think about including bsddb3 in Python 2.3, along with
deprecating the existing bsddb module? You'll find the package at

It would come as a bsddb3 package, which acts interface-compatible
with the current bsddb module. Various submodules give access to more
advanced features.

The main rationale for dropping bsddb is that it still relies on the
db_185.h interface, which will be phased out sooner or
later. Existence of this interface, in turn, results in problems with
anydbm:

There are multiple versions of the database files available in the
world, and any BSDDB installation can only handle so many of these
versions. Now, on Linux, it is common that bsddb3 is installed, but
that glibc offers bsddb2 simultaneously. For anydbm to analyse this
situation properly, it would need some of the more advanced bsddb
facilities.

While this is the rationale for dropping the existing bsddb module
sooner or later, there are, of course, numerous advantages in exposing
the additional BSDDB features, like concurrency, transactions, and
cursors.

Any opinions?


From  Tue Jan  8 00:47:44 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 19:47:44 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> "is used to create the compilation command" may be tricky to
    MvL> implement.  Anyway, what is wrong with my earlier suggestion

    | export CFLAGS OPT
    | OPT="-g -O2 $CFLAGS" 
    | ./configure

Two problems:

1) This requires you to export two variables into the outer shell's
   environment.  As a general rule, I think this is a bad idea for
   tricking configure.  What else might you be affecting?  Others
   might not care as much.

2) Any time you overload a make variable that has existing semantics,
   you have to worry about losing the original value.  Personally, I
   think it's easier to get CC overloading right than get OPT or
   CFLAGS overloading (and easier than getting them both right).  But
   maybe that's just me.
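
To illustrate the first point: variables can be handed to a single child process without touching the caller's own environment. A minimal sketch follows, in which `run_with_extra_env` is a hypothetical helper and a child Python interpreter stands in for ./configure.

```python
import os
import subprocess
import sys


def run_with_extra_env(command, extra):
    """Run `command` with `extra` variables added to a copy of the environment."""
    child_env = dict(os.environ)   # copy, so the parent stays untouched
    child_env.update(extra)
    return subprocess.run(command, env=child_env,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for "./configure": a child process that just echoes OPT.
    echo_opt = [sys.executable, "-c", "import os; print(os.environ['OPT'])"]
    result = run_with_extra_env(echo_opt, {"OPT": "-g -O2"})
    print(result.stdout.strip())
```

The child sees OPT, but nothing is exported into the shell that launched the script, which is also why the objection weakens if the commands live in a shell script of their own.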


From  Tue Jan  8 01:07:07 2002
From: (Matthias Klose)
Date: Tue, 8 Jan 2002 02:07:07 +0100
Subject: [Python-Dev] building python info documentation
Message-ID: <15418.17979.711869.453474@gargle.gargle.HOWL>

The info docs cannot be built with the current 2.1/2.2 and HEAD

I found updated versions of the conversion scripts at:

with from
the same site I get a step further ... but get:

emacs -batch api.texi --eval '(progn (goto-char (point-min)) (while
(re-search-forward "\\(@setfilename \\)\\([-a-z]*\\)\n" nil t)
(replace-match "\\1python-\\\n")) (while (search-forward "@node
Front Matter\n@chapter Abstract\n" nil t) (replace-match "@node
Abstract\n@section Abstract\n" nil t)) (progn (mark-whole-buffer)
(texinfo-master-menu (quote update-all-nodes))) (save-buffer))'
End of file during parsing

Is there an updated version available, which works for the python2.2
info files as well?

btw, who is pdm/pdm, who builds the info tarballs for download?

From  Tue Jan  8 01:38:33 2002
From: (Aahz Maruch)
Date: Mon, 7 Jan 2002 17:38:33 -0800 (PST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> from "Barry A. Warsaw" at Jan 07, 2002 07:47:44 PM
Message-ID: <>

Barry A. Warsaw wrote:
> >>>>> "MvL" == Martin v Loewis <> writes:
>     MvL> "is used to create the compilation command" may be tricky to
>     MvL> implement.  Anyway, what is wrong with my earlier suggestion
>     | export CFLAGS OPT
>     | OPT="-g -O2 $CFLAGS" 
>     | ./configure
> Two problems:
> 1) This requires you to export two variables into the outer shell's
>    environment.  As a general rule, I think this is a bad idea for
>    tricking configure.  What else might you be affecting?  Others
>    might not care as much.

OTOH, if MvL's code is in a shell script, this objection doesn't apply.
                      --- Aahz

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Tue Jan  8 01:44:45 2002
From: (Guido van Rossum)
Date: Mon, 07 Jan 2002 20:44:45 -0500
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: Your message of "Tue, 08 Jan 2002 00:17:14 +0100."
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil>
Message-ID: <>

> I'd also be in favour of killing all 16-bit Windows support in Python
> for 2.3; not sure whether 16-bit DOS needs to stay.

I think both can be killed.  Hans Novak has long stopped supporting
his DOS version of Python.

--Guido van Rossum

From  Tue Jan  8 01:54:09 2002
From: (Guido van Rossum)
Date: Mon, 07 Jan 2002 20:54:09 -0500
Subject: [Python-Dev] Including BSDDB3
In-Reply-To: Your message of "Tue, 08 Jan 2002 01:16:28 +0100."
References: <>
Message-ID: <>

> What do people think about including bsddb3 in Python 2.3, along with
> deprecating the existing bsddb module? You'll find the package at
> It would come as a bsddb3 package, which acts interface-compatible
> with the current bsddb module. Various submodules give access to more
> advanced features.
> The main rationale for dropping bsddb is that it still relies on the
> db_185.h interface, which will be phased out sooner or
> later. Existence of this interface, in turn, results in problems with
> anydbm:
> There are multiple versions of the database files available in the
> world, and any BSDDB installation can only handle so many of these
> versions. Now, on Linux, it is common that bsddb3 is installed, but
> that glibc offers bsddb2 simultaneously. For anydbm to analyse this
> situation properly, it would need some of the more advanced bsddb
> facilities.
> While this is the rationale for dropping the existing bsddb module
> sooner or later, there are, of course, numerous advantages in exposing
> the additional BSDDB features, like concurrency, transactions, and
> cursors.
> Any opinions?

Sounds like a good plan, but we should make sure it can all be
re-released under the PSF license.  For the Zope Corp. portions of the
code I promise that's no problem :-) -- but there are so many other
contributors that it's getting a little tangled...

--Guido van Rossum

From  Tue Jan  8 03:04:01 2002
From: (Neil Hodgson)
Date: Tue, 8 Jan 2002 14:04:01 +1100
Subject: [Python-Dev] os.listdir("") bug on Windows
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil>              <>  <>
Message-ID: <04be01c197f1$1ff30fa0$0acc8490@neil>

   There is an out-of-bounds error on Windows when using os.listdir("")
which could result in indeterminate behaviour. After parsing the args, it
executes
 ch = namebuf[len-1];
   which indexes before the start of the array, since len = 0.
   Possibly change this to
 ch = (len > 0) ? namebuf[len-1] : '\0';
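
The same guard, modelled in Python purely to show the logic. The real fix belongs in the C source; `last_char` is a hypothetical name, and Python would raise IndexError on the empty case rather than read out of bounds.

```python
def last_char(namebuf):
    """Return the last character of namebuf, or NUL when it is empty."""
    length = len(namebuf)
    # Mirrors: ch = (len > 0) ? namebuf[len-1] : '\0';
    return namebuf[length - 1] if length > 0 else "\0"
```

With the guard in place, the empty path yields a NUL character instead of indexing at position -1.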


From  Tue Jan  8 03:19:20 2002
From: (Guido van Rossum)
Date: Mon, 07 Jan 2002 22:19:20 -0500
Subject: [Python-Dev] os.listdir("") bug on Windows
In-Reply-To: Your message of "Tue, 08 Jan 2002 14:04:01 +1100."
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <>
Message-ID: <>

Neil, thanks for the bug report, but can you please submit it to
SourceForge?  We don't regularly scan the archives of python-dev
looking for bugs we haven't fixed yet -- but we do use SF as a
reminder (and triage) system.

--Guido van Rossum

From  Tue Jan  8 03:41:50 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 22:41:50 -0500
Subject: [Python-Dev] Including BSDDB3
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    MvL> What do people think about including bsddb3 in Python 2.3,
    MvL> along with deprecating the existing bsddb module? You'll find
    MvL> the package at


+1, for several reasons.

- Robin's done a great job with the module.  It feels quite solid and
  reliable.  I've used it quite a bit working on Berkeley storage for

- Berkeley support in 2.2 is broken -- at least the rules
  are.  On my stock, but stocked Mandrake 8.1 system, bsddbmodule
  never links right and the standard always deletes it
  because of link problems.  Fixing this is on My List, although I'd
  prefer to work with pybsddb.

- I've talked to the Sleepycat guys, and if we wanted to, we could
  provide the Berkeley libraries with our distros with no licensing
  problems.  Using Berkeley through the pybsddb binding is perfectly
  legal for any programs using them through Python.

- It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and
  bsddb4?) modules which compile against the older libraries so
  databases written with any version could be accessed in Python.
  Maybe that's not exactly the right way to do it, but I don't think
  Python should be limited to just one version of Berkeley db.  I've
  no idea what the default ought to be -- there's no clear winner.

    MvL> It would come as a bsddb3 package, which acts
    MvL> interface-compatible with the current bsddb module. Various
    MvL> submodules give access to more advanced features.

I often "import bsddb3 as bsddb".
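
A minimal sketch of that import trick in present-day Python 3 syntax,
extended so that it falls back when a module is missing.  The helper
name first_available and the candidate names are illustrative
assumptions, not part of pybsddb or the standard library.

```python
import importlib


def first_available(names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper: `names` is a list of candidate module names,
    tried in order, for example ["bsddb3", "bsddb"].
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (names,))


# "json" stands in for a module that is certainly present; the first
# name is deliberately one that should not exist.
db = first_available(["no_such_bsddb_module", "json"])
```

The rest of the program then only refers to `db`, whichever binding
was actually found.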

    MvL> The main rationale for dropping bsddb is that it still relies
    MvL> on the db_185.h interface, which will be phased out sooner or
    MvL> later. Existence of this interface, in turn, results in
    MvL> problems with anydbm:

As mentioned above, I can see reasons for wanting to access any
version of Berkeley db.


From  Tue Jan  8 04:06:23 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 23:06:23 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "AM" == Aahz Maruch <> writes:

    AM> OTOH, if MvL's code is in a shell script, this objection
    AM> doesn't apply.

I must have missed that.  Was Martin suggesting a shell script, like


From  Tue Jan  8 04:10:00 2002
From: (Barry A. Warsaw)
Date: Mon, 7 Jan 2002 23:10:00 -0500
Subject: [Python-Dev] Including BSDDB3
References: <>
Message-ID: <>

>>>>> "GvR" == Guido van Rossum <> writes:

    GvR> Sounds like a good plan, but we should make sure it can all
    GvR> be re-released under the PSF license.  For the Zope
    GvR> Corp. portions of the code I promise that's no problem :-) --
    GvR> but there are so many other contributors that it's getting a
    GvR> little tangled...

I /think/ we're just talking mostly about Robin Dunn and Andrew
Kuchling.  From the description on the page, I can't quite tell
whether any of Gregory P. Smith's original code remains.

i'm-sure-andrew-won't-mind-either-ly y'rs,

From  Tue Jan  8 04:09:38 2002
From: (Fred L. Drake, Jr.)
Date: Mon, 7 Jan 2002 23:09:38 -0500 (EST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <>
References: <>
Message-ID: <>

Barry A. Warsaw writes:
 > I must have missed that.  Was Martin suggesting a shell script, like
 > "configure-lfs"?

  As long as configure captures the values to the Makefile, it doesn't
matter whether the user types



CFLAGS=... OPT=... ./configure

is a matter of syntax, not functionality.  We should not rely on any
special environment variables being set after configure has been run.
I think we're wasting time arguing about syntax at this point, and not
making any progress.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Tue Jan  8 05:15:13 2002
From: (Aahz Maruch)
Date: Mon, 7 Jan 2002 21:15:13 -0800 (PST)
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> from "Barry A. Warsaw" at Jan 07, 2002 11:06:23 PM
Message-ID: <>

Barry A. Warsaw wrote:
> >>>>> "AM" == Aahz Maruch <> writes:
>     AM> OTOH, if MvL's code is in a shell script, this objection
>     AM> doesn't apply.
> I must have missed that.  Was Martin suggesting a shell script, like
> "configure-lfs"?

Martin didn't, but it answers your objection.  ;-)
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Tue Jan  8 06:01:51 2002
From: (Barry A. Warsaw)
Date: Tue, 8 Jan 2002 01:01:51 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "AM" == Aahz Maruch <> writes:

    AM> Martin didn't, but it answers your objection.  ;-)

Yes, it would.


From  Tue Jan  8 07:08:27 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 08:08:27 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

> 2) Any time you overload a make variable that has existing semantics,
>    you have to worry about losing the original value.  Personally, I
>    think it's easier to get CC overloading right than get OPT or
>    CFLAGS overloading (and easier than getting them both right).  But
>    maybe that's just me.

Ok. For Solaris and Linux, the instruction about setting CC is about
right, so I'm no longer objecting to changing the documentation in
that direction. It is just that if you specify --without-gcc, or are
on SGI or BSD/OS, that your environment setting of CC will be ignored.


From  Tue Jan  8 07:20:12 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 08:20:12 +0100
Subject: [Python-Dev] Including BSDDB3
In-Reply-To: <> (message
 from Guido van Rossum on Mon, 07 Jan 2002 20:54:09 -0500)
References: <> <>
Message-ID: <>

> Sounds like a good plan, but we should make sure it can all be
> re-released under the PSF license.  For the Zope Corp. portions of the
> code I promise that's no problem :-) -- but there are so many other
> contributors that it's getting a little tangled...

Ok, I'll investigate.


From  Tue Jan  8 07:24:20 2002
From: (Barry A. Warsaw)
Date: Tue, 8 Jan 2002 02:24:20 -0500
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    >> 2) Any time you overload a make variable that has existing
    >> semantics, you have to worry about losing the original value.
    >> Personally, I think it's easier to get CC overloading right
    >> than get OPT or CFLAGS overloading (and easier than getting
    >> them both right).  But maybe that's just me.

    MvL> Ok. For Solaris and Linux, the instruction about setting CC
    MvL> is about right, so I'm no longer objecting to changing the
    MvL> documentation in that direction. It is just that if you
    MvL> specify --without-gcc, or are on SGI or BSD/OS, that your
    MvL> environment setting of CC will be ignored.

Good point.  This should definitely be mentioned in the docs.

    -thread-ly y'rs,

From  Tue Jan  8 07:33:53 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 08:33:53 +0100
Subject: [Python-Dev] Including BSDDB3
In-Reply-To: <> (
References: <> <>
Message-ID: <>

> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and
>   bsddb4?) modules which compile against the older libraries so
>   databases written with any version could be accessed in Python.
>   Maybe that's not exactly the right way to do it, but I don't think
>   Python should be limited to just one version of Berkeley db.  I've
>   no idea what the default ought to be -- there's no clear winner.

I'm not sure how that would work, though. Are you thinking of
different code bases for the modules, or just compiling the same
module multiple times? If the latter, how do you deal with features
that are available only in later versions? E.g. I doubt that the
current _db.c compiles with bsddb2 (not sure it even compiles with
3.0; it may be that 3.1 is required as a minimum).

This *could* be solved with lots of #ifdefs in _db.c, but that sounds
difficult to get right (who has so many versions installed to actually
test that?).

Also, I think it is rare that multiple versions are installed on a
single system: I doubt BSDDB even supports simultaneous installation
of multiple header file sets, on Unix. So even while you can have
multiple versions of the shared library installed, compiling it for
use with these libraries may be tricky.

About the only case where I know about different systems is on Linux,
where glibc incorporates a version of BSDDB2, so you might find
database file of that version that the more recent BSDDB3 cannot open,
anymore. For any other scenario, users are to blame for forgetting to
update their database files when updating the libraries.


From  Tue Jan  8 07:39:28 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 08:39:28 +0100
Subject: Large file system support in 2.1.2 (was Re: [Python-Dev] release for 2.1.2, plus 2.2.1...)
In-Reply-To: <> (
References: <>
 <> <>
Message-ID: <>

>     AM> OTOH, if MvL's code is in a shell script, this objection
>     AM> doesn't apply.
> I must have missed that.  Was Martin suggesting a shell script, like
> "configure-lfs"?

No, I was really talking about the instructions in the manual, which
would then indeed result in OPT being in the environment after
configure has completed. If that is considered unacceptable, I'm fine
with documenting that CC should be set in the environment - even
though such instruction may also break on some systems. Enhancing
configure to take into account more environment variables is worse:
the risk of introducing new errors is just too high.


From  Tue Jan  8 07:46:01 2002
From: (Neil Hodgson)
Date: Tue, 8 Jan 2002 18:46:01 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <>
Message-ID: <060d01c19818$8476cda0$0acc8490@neil>


> I'd be all in favour of bringing ntmodule back into life, especially
> if that is to become a module that does not need to work on
> Win9x. Perhaps it can be compiled twice, once into w9x.pyd and once
> into nt.pyd, or the common code can be shared by means of #include.

   I reversed again, posixmodule now detects Unicode arguments and handles
them in UCS-2 rather than converting to UTF-8 and back again. This now looks
like the right way to me. The total amount of code bloat is about 8K over a
150K file and this doesn't appear to be too much for me.

   A check is made to see if the platform supports Unicode file names and if
it does not then the old conversion to Py_FileSystemDefaultEncoding is done.
This means that Windows 9x should work the same as it currently does. This
check is exposed as os.unicodefilenames() so that client code can decide
whether to use Unicode.
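
   A rough sketch of how client code might use such a check, in present-day
Python 3 syntax. The function name filename_for_open and its parameters are
hypothetical; unicode_ok stands in for the result of the proposed
os.unicodefilenames() and is passed in explicitly because that call is only
a proposal at this point.

```python
def filename_for_open(name, unicode_ok, fs_encoding="ascii"):
    """Pick the form of `name` to hand to the file system.

    `unicode_ok` stands in for the proposed os.unicodefilenames() result
    (an assumption; it is not called here).  When it is false, the name
    is converted with `fs_encoding`, which raises UnicodeEncodeError if
    the name cannot be represented.
    """
    if unicode_ok:
        return name
    return name.encode(fs_encoding)


wide = filename_for_open("caf\u00e9.txt", unicode_ok=True)
narrow = filename_for_open("cafe.txt", unicode_ok=False)
```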

   For other OSs that can support Unicode file names, additional cases can
be added into posixmodule. The other platforms (OS X for example) may not
provide these functions as taking UCS-2 arguments but instead UTF-8
arguments. They should still work similarly to the NT code but encode into
UTF-8 before making system calls.

   The basic idea is that if you use a Unicode string for a file or path
name in a call then returned information is in Unicode strings.
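
   The same rule can be observed in present-day Python 3, where os.listdir
returns str names for a str argument and bytes names for a bytes argument.
This is an analogue for illustration, not the patch discussed here; the
helper name listing_types is made up.

```python
import os
import tempfile


def listing_types(directory):
    """Return the element types seen when listing with str and with bytes."""
    text_names = os.listdir(directory)               # str in, str out
    byte_names = os.listdir(os.fsencode(directory))  # bytes in, bytes out
    return ({type(n) for n in text_names}, {type(n) for n in byte_names})


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so both listings are non-empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_types, byte_types = listing_types(tmp)
```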

> >    I'm feeling more like making f_name be wide now but I'd expect some
> > opposition now from backwards compatibility advocates.

   This is now done.

> I think the major problem is that performing repr on a file should
> work. If that turns out to use the repr of the string (can't check
> right now), instead of raising UnicodeErrors, my opposition to putting
> Unicode objects into file names is not that strong anymore.

   Changed the repr to display Unicode names using escapes so it does not
raise errors.
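
   As an illustration of what escape-based display looks like, here is a
small sketch using the ascii() built-in of present-day Python 3. It is not
the repr code from the patch; safe_name_repr is a hypothetical name.

```python
def safe_name_repr(name):
    """Return an ASCII-only, printable representation of a file name.

    Non-ASCII characters come out as backslash escapes, so the result
    can be written to any terminal or log without an encoding error.
    """
    return ascii(name)


shown = safe_name_repr("caf\u00e9.txt")
print(shown)
```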

   _getfullpathname which is available from nt and is used in ntpath now
accepts a Unicode argument and then returns a Unicode path. Haven't checked
ntpath to see if it will work with Unicode.

   New code at

   After waiting a while for comments, I'll package this up as a patch.


From  Tue Jan  8 09:56:58 2002
From: (M.-A. Lemburg)
Date: Tue, 08 Jan 2002 10:56:58 +0100
Subject: [Python-Dev] PEP-time ? (Unicode strings as filenames)
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <>
Message-ID: <>

[Martin and Neil discussing various ways to add Unicode support 
to posixmodule]

Guys, this discussion is getting somewhat out of hand. I believe 
that no-one on python-dev is seriously following this anymore,
yet OTOH you are working on a rather important part of the Python
file API.

I'd suggest to write up the problem and your conclusions as a
PEP for everyone to understand before actually starting to
checkin anything.

One thing I'd like to note (again) is that the code base is
getting somewhat confusing in this area. It may be better to
rip out the various bits and pieces for each supported platform
and put the implementations into separate files -- much like
what Greg has done for the DLL import machinery. This will reduce 
the levels of #ifdefs and make the whole API much more readable 
and understandable.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Tue Jan  8 15:10:25 2002
From: (Skip Montanaro)
Date: Tue, 8 Jan 2002 09:10:25 -0600
Subject: [Python-Dev] Including BSDDB3
In-Reply-To: <>
References: <>
Message-ID: <>

    >> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3 (and
    >> bsddb4?) modules which compile against the older libraries so
    >> databases written with any version could be accessed in Python.

    Martin> I'm not sure how that would work, though. 

Agreed.  I think trying to use multiple versions of libdb-generated files
simultaneously is a disaster waiting to happen.  It's unfortunate that the
folks at Sleepycat haven't been able to provide a more consistent data
format, but I understand that stuff is internal details and can change.
They have been pretty good about providing update tools.

What would be useful is if whatever bsddb module is installed could be more
intelligent about file version errors.  Instead of reporting something
inscrutable like

    >>> db = bsddb.hashopen("tour.db")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    bsddb.error: (-30990, 'Unknown error 4294936306')

I'd like it to realize that it was asked to open an old format file and give
a useful error message like:

    bsddb.error: (-30990, 'Attempt to open old format file - see db_upgrade(1)')

Sleepycat's tools can do this in the face of old files:

    % file tour.db
    tour.db: Berkeley DB (Hash, version 5, native byte-order)
    % db_dump tour.db > tour.txt
    db_dump: tour.db: hash version 5 requires a version upgrade
    db_dump: open: tour.db: DB_OLDVERSION: Database requires a version upgrade
    % db_upgrade tour.db
    % file tour.db
    tour.db: Berkeley DB (Hash, version 7, native byte-order)
    % db_dump tour.db > tour.txt
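
A minimal sketch of the kind of translation I have in mind, in present-day
Python 3 syntax.  The table and the function name describe_db_error are
hypothetical; only the -30990 entry comes from the session above, and the
numeric value may well differ between library versions.

```python
# Hypothetical lookup table.  The single entry pairs the code seen in the
# traceback above with the friendlier wording suggested for it.
FRIENDLY_MESSAGES = {
    -30990: "Attempt to open old format file - see db_upgrade(1)",
}


def describe_db_error(code, raw_message):
    """Return (code, message), preferring a friendly message when known."""
    return (code, FRIENDLY_MESSAGES.get(code, raw_message))


old_format = describe_db_error(-30990, "Unknown error 4294936306")
unknown = describe_db_error(-1, "Unknown error")
```

Codes that are not in the table pass through unchanged, so nothing is
hidden from the user.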

    Martin> Also, I think it is rare that multiple versions are installed on
    Martin> a single system: I doubt BSDDB even supports simultaneous
    Martin> installation of multiple header file sets, on Unix. 

Actually, RedHat & Mandrake do.  This leads to as many problems as it
solves.  Take a look at the code in

    dblib = []
    if self.compiler.find_library_file(lib_dirs, 'db-3.2'):
        dblib = ['db-3.2']
    elif self.compiler.find_library_file(lib_dirs, 'db-3.1'):
        dblib = ['db-3.1']
    elif self.compiler.find_library_file(lib_dirs, 'db3'):
        dblib = ['db3']
    elif self.compiler.find_library_file(lib_dirs, 'db2'):
        dblib = ['db2']
    elif self.compiler.find_library_file(lib_dirs, 'db1'):
        dblib = ['db1']
    elif self.compiler.find_library_file(lib_dirs, 'db'):
        dblib = ['db']

    db185_incs = find_file('db_185.h', inc_dirs,
                           ['/usr/include/db3', '/usr/include/db2'])
    db_inc = find_file('db.h', inc_dirs, ['/usr/include/db1'])

And it's still not correct, as Barry indicated yesterday.  For example,
suppose that even though db3 is installed on your system you want to only
manipulate db2 databases (perhaps for compatibility with another machine).
You're stuck and have to edit or use Modules/Setup to build bsddb.
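
One way to express "first match wins unless the user says otherwise" is
sketched below in present-day Python 3 syntax.  This is not the build
code; choose_dblib, its parameters and the preferred override are all
hypothetical, and the set of available libraries is passed in rather
than probed with find_library_file.

```python
# Same search order as the chain of elif branches quoted above.
CANDIDATES = ["db-3.2", "db-3.1", "db3", "db2", "db1", "db"]


def choose_dblib(available, candidates=CANDIDATES, preferred=None):
    """Pick the library to link against.

    available  -- set of library names found on the system (assumed input)
    candidates -- names to try, newest first
    preferred  -- optional explicit choice that bypasses the search order
    """
    if preferred is not None:
        if preferred in available:
            return [preferred]
        raise LookupError("requested library %r not found" % preferred)
    for name in candidates:
        if name in available:
            return [name]
    return []


default_choice = choose_dblib({"db3", "db2"})
forced_choice = choose_dblib({"db3", "db2"}, preferred="db2")
```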

    Martin> So even while you can have multiple versions of the shared
    Martin> library installed, compiling it for use with these libraries may
    Martin> be tricky.

Got that right... ;-)

    Martin> For any other scenario, users are to blame for forgetting to
    Martin> update their database files when updating the libraries.

In the presence of anydbm, it's not obvious that users should know what file
format their underlying databases are.


From  Tue Jan  8 16:47:41 2002
From: (Barry A. Warsaw)
Date: Tue, 8 Jan 2002 11:47:41 -0500
Subject: [Python-Dev] Including BSDDB3
References: <>
Message-ID: <>

>>>>> "SM" == Skip Montanaro <> writes:

    >> - It'd be great if we actually provided bsddb1, bsddb2, bsddb3
    >> (and bsddb4?) modules which compile against the older libraries
    >> so databases written with any version could be accessed in
    >> Python.

    Martin> I'm not sure how that would work, though.

    SM> Agreed.

Oops.  I thought I had read that pybsddb could be compiled against
older APIs.  But on a re-read of the pages, that's obviously wrong, so
forget this dumb idea.

    SM> What would be useful is if whatever bsddb module is installed
    SM> could be more intelligent about file version errors.


    Martin> Also, I think it is rare that multiple versions are
    Martin> installed on a single system: I doubt BSDDB even supports
    Martin> simultaneous installation of multiple header file sets, on
    Martin> Unix.

    SM> Actually, RedHat & Mandrake do.  This leads to as many
    SM> problems as it solves.

Indeed, this is broken on Mandrake.  I was trying to get Postfix and
Python to at least agree on the BDB version they were going to use and
it wasn't until I installed pybsddb from source, and rebuilt Postfix
against the separately downloaded Berkeley 3.3.11 libs/API that I got
it all to work.

    SM>   Take a look at the code in

BTW, I think this a large part of the problem when building Py2.2 on
Mandrake 8.1.  Maybe these lines in the setup are /too/ smart?  I seem
to remember having no problems w/ Py2.1.1.

But that's excusable I suppose since pybsddb's has its own
problems!  It should at least recognize a default from-source install
of Sleepycat's libs w/o lots of cryptic command line options.  And
getting "python clean -a" to work right would be a bonus. :)

Also note that pybsddb should now (or soon) work with Berkeley DB 4 so
calling it bsddb3 isn't right either.  I don't think there's a db
format change from BDB 3 -> BDB 4.

bsddb-ng? :)

Okay, I'm rambling.  Let's add pybsddb (under a better name) and keep
bsddbmodule around and /try/ to fix some of the worst installation
problems.  The state of Berkeley DB on various distros doesn't make
our lives easy here, but let's not add to the problems, if at all

I'm willing to help out with all this.  We should also get buy-in from
Robin since we also don't want to fork development or have to keep the
two in sync.


From  Tue Jan  8 18:41:37 2002
From: (Tim Peters)
Date: Tue, 8 Jan 2002 13:41:37 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
Message-ID: <>

[Andrew Kuchling]
> [CC'ed to python-dev, Barbara Mattson]
> Barbara's encountered an apparent problem with test_longexp in Python
> 2.2 on MacOS X.  test_longexp creates a big list expression and
> eval()'s it.  The problem is that it takes an exceedingly long time to
> run, at least more than half an hour (at which point she interrupted
> it).
> The two curious things are that 1) while test_longexp requires a lot
> of memory and often thrashes on a low-memory machine (I found there
> are 2 or 3 bugs in the SF bugtracker to this effect), the MacOS box in
> question has a gigabyte of RAM, and 2) Python 2.1.1 *doesn't* show the
> problem.

The test takes about 2 seconds on my box (Win98SE, 256MB, 866MHz), in 2.2 or
2.1.1, and I don't know of any Mac-specific code that might get touched here
except for the C library.  So Skip's suggestion to try pymalloc is a good
one -- although it's hard to see in advance why that would make a difference
in this specific case.

> Quoting from her report:
> 	I tried the test_longexp by hand:
> 	l = eval("[" + "2," * REPS + "]")
> 	print len(l)

Break it into smaller steps so we can narrow down possible causes:

REPS = 50000

print "building list guts"
guts = "2," * REPS

print "building input string"
input = "[" + guts + "]"

print "compiling the input string"
code = compile(input, "<input string>", "eval")

print "executing"
thelist = eval(code)

print len(thelist)

When REPS is large, what's the last thing that gets printed before the huge
delay starts?
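
If the prints alone are not conclusive, the same steps can be timed
individually.  The following is a sketch in present-day Python 3 syntax
(the script above is Python 2); the function name timed_steps and the
smaller repetition count are choices made for the example.

```python
import time


def timed_steps(reps):
    """Run the build, compile and eval steps, timing each one separately."""
    timings = {}

    start = time.perf_counter()
    source = "[" + "2," * reps + "]"
    timings["build"] = time.perf_counter() - start

    start = time.perf_counter()
    code = compile(source, "<input string>", "eval")
    timings["compile"] = time.perf_counter() - start

    start = time.perf_counter()
    thelist = eval(code)
    timings["eval"] = time.perf_counter() - start

    return timings, len(thelist)


timings, length = timed_steps(5000)
for step in ("build", "compile", "eval"):
    print("%-8s %.4f s" % (step, timings[step]))
print(length)
```

Whichever step dominates as reps grows is the one to look at first.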

From  Tue Jan  8 19:23:09 2002
From: (Thomas Heller)
Date: Tue, 8 Jan 2002 20:23:09 +0100
Subject: [Python-Dev] unicode/string asymmetries
Message-ID: <012501c1987a$0622caa0$e000a8c0@thomasnotebook>

I noticed several unicode/string asymmetries:

1. No support for unicode in the struct and array modules.
Is this an oversight?

2. What would be the corresponding unicode format character for 'z'
in the struct module (string or None)?

3. There does not seem to be an equivalent to the 's' format character
for PyArg_Parse() or Py_BuildValue().


From  Tue Jan  8 19:52:29 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 20:52:29 +0100
Subject: [Python-Dev] PEP-time ? (Unicode strings as filenames)
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <>
Message-ID: <>

> I'd suggest to write up the problem and your conclusions as a
> PEP for everyone to understand before actually starting to
> checkin anything.

We certainly would, if we had achieved any conclusions yet. If you
want, we can continue discussion in private.


From  Tue Jan  8 20:21:27 2002
From: (Gregory P. Smith)
Date: Tue, 8 Jan 2002 12:21:27 -0800
Subject: [pybsddb] Re: [Python-Dev] Including BSDDB3
In-Reply-To: <>; from on Mon, Jan 07, 2002 at 11:10:00PM -0500
References: <> <> <>
Message-ID: <>

On Mon, Jan 07, 2002 at 11:10:00PM -0500, Barry A. Warsaw wrote:
> >>>>> "GvR" == Guido van Rossum <> writes:
>     GvR> Sounds like a good plan, but we should make sure it can all
>     GvR> be re-released under the PSF license.  For the Zope
>     GvR> Corp. portions of the code I promise that's no problem :-) --
>     GvR> but there are so many other contributors that it's getting a
>     GvR> little tangled...
> I /think/ we're just talking mostly about Robin Dunn and Andrew
> Kuchling.  From the description on the page, I can't quite tell
> whether any of Gregory P. Smith's original code remains.
> i'm-sure-andrew-won't-mind-either-ly y'rs,
> -Barry

Consider any of my pybsddb/bsddb3 code that remains [some does i'm sure]
placed under whatever open source license is needed, (PSF license, etc).
(I prefer the code to be used, not bickered about :).


Gregory P. Smith

From  Tue Jan  8 20:24:57 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 21:24:57 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <012501c1987a$0622caa0$e000a8c0@thomasnotebook>
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook>
Message-ID: <>

> I noticed several unicode/string asymmetries:
> 1. No support for unicode in the struct and array modules.
> Is this an oversight?

I'd call it intentional. What exactly would you like to happen?

> 2. What would be the corresponding unicode format character for 'z'
> in the struct module (string or None)?

You mean, in getargs? There is no corresponding thing.

I'd recommend against adding new formats. Instead, I'd propose to add
new conversion functions:

  Py_UNICODE *str;
  PyArg_ParseTuple(args, "O&", PyArg_UnicodeZ, &str);

int PyArg_UnicodeZ(PyObject *o, void *d){
  Py_UNICODE **dest = (Py_UNICODE**)d;
  if (o == Py_None) {
    *dest = NULL;
    return 1;
  }
  if (PyUnicode_Check(o)){
    *dest = PyUnicode_AS_UNICODE(o);
    return 1;
  }
  PyErr_SetString(PyExc_TypeError, "unicode or None expected");
  return 0;
}

It may be desirable to allow passing of : or ; strings to conversion
functions, and helper API to format the errors.

> 3. There does not seem to be an equivalent to the 's' format character
> for PyArg_Parse() or Py_BuildValue().

That would be 'u'. However, is this really needed? PyArg_Parse is
deprecated, and I doubt you have Py_UNICODE* often enough to need
it to pass to Py_BuildValue.


From  Tue Jan  8 20:52:27 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 21:52:27 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <060d01c19818$8476cda0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <060d01c19818$8476cda0$0acc8490@neil>
Message-ID: <>

>    I reversed again, posixmodule now detects Unicode arguments and handles
> them in UCS-2 rather than converting to UTF-8 and back again. This now looks
> like the right way to me. The total amount of code bloat is about 8K over a
> 150K file and this doesn't appear to be too much for me.

I agree. We still should keep "mbcs", so extension modules that don't
want to go through the troubles of special-casing Windows will be able
to get it right most of the time.

>    A check is made to see if the platform supports Unicode file names and if
> it does not then the old conversion to Py_FileSystemDefaultEncoding is done.
> This means that Windows 9x should work the same as it currently does. This
> check is exposed as os.unicodefilenames() so that client code can decide
> whether to use Unicode.

That has unclear semantics for me. It sounds like "if true, you can
pass Unicode strings to open etc." However, then it should return 1 on
all systems, since you always can - the default encoding may apply,
and restrict file names to ASCII. Or, it may mean "if true, you can
pass all Unicode strings to open". This is not true, either, because
there are always reserved characters (such as the path delimiter).

>    For other OSs that can support Unicode file names, additional cases can
> be added into posixmodule. The other platforms (OS X for example) may not
> provide these functions as taking UCS-2 arguments but instead UTF-8
> arguments. They should still work similarly to the NT code but encode into
> UTF-8 before making system calls.

I think this is not needed. Instead, setting the file system
encoding to UTF-8 should be sufficient.

> After waiting a while for comments, I'll package this up as a patch.

Very good. Would you also write the PEP? If not, I will, but that may
take some time.


From  Tue Jan  8 21:24:55 2002
From: (Thomas Heller)
Date: Tue, 8 Jan 2002 22:24:55 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <>
Message-ID: <01f601c1988b$03b00d30$e000a8c0@thomasnotebook>

> > I noticed several unicode/string asymmetries:
> > 
> > 1. No support for unicode in the struct and array modules.
> > Is this an oversight?
> I'd call it intentional. What exactly would you like to happen?

I would like to create struct's containing unicode characters
(be gentle with me, maybe I mean wide characters, or mbcs, but I'm really
not sure)


From  Tue Jan  8 21:55:14 2002
From: (Neil Hodgson)
Date: Wed, 9 Jan 2002 08:55:14 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <060d01c19818$8476cda0$0acc8490@neil> <>
Message-ID: <01f601c1988f$27222920$0acc8490@neil>


> That has unclear semantics for me. It sounds like "if true, you can
> pass Unicode strings to open etc." However, then it should return 1 on
> all systems, since you always can - the default encoding may apply,
> and restrict file names to ASCII. Or, it may mean "if true, you can
> pass all Unicode strings to open". This is not true, either, because
> there are always reserved characters (such as the path delimiter).

   OK, it means:

   If true, the underlying system supports file names containing most
Unicode characters and any valid file name may be passed to open as a
Unicode string.

   Yes, the "most" is fuzzy but just as with normal strings, the file system
gets to put special meaning on delimiters, restrict file name length, and
disallow characters such as  \u0000.

> > After waiting a while for comments, I'll package this up as a patch.
> Very good. Would you also write the PEP? If not, I will, but that may
> take some time.

   I'll try in the next day or so but may bail if not able to work on it
much as I have some backlog from spending time on this rather than other


From  Tue Jan  8 22:17:53 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 23:17:53 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <01f601c1988b$03b00d30$e000a8c0@thomasnotebook>
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook>
Message-ID: <>

> > > 1. No support for unicode in the struct and array modules.
> > > Is this an oversight?
> > 
> > I'd call it intentional. What exactly would you like to happen?
> I would like to create struct's containing unicode characters
> (be gentle with me, maybe I mean wide characters, or mbcs, but I'm really
> not sure)

Well, that is precisely the problem: When putting a Unicode object
into a C structure, there are too many alternatives to pick a sensible
default. It is not even clear what a "wide character" is: it might be a
value of wchar_t, or it might be a value of Py_UNICODE (those differ
on Unix, in the default installation). 

For "MBCS", the most reasonable default might be "utf-8", since this is
capable of encoding all characters. On Windows, "mbcs" is also a good
choice, since it uses the encoding that all character API uses.
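
To make the ambiguity concrete, here is a small sketch in present-day
Python 3 syntax that packs the same text with two different encodings.
The length-prefixed layout and the name pack_text are assumptions made
for the example, not a proposal for the struct module.

```python
import struct


def pack_text(text, encoding):
    """Pack `text` as a 2-byte little-endian length plus the encoded bytes.

    The caller has to name the encoding; nothing here picks a default.
    """
    data = text.encode(encoding)
    return struct.pack("<H%ds" % len(data), len(data), data)


as_utf8 = pack_text("h\u00e9llo", "utf-8")       # 2-byte length + 6 bytes
as_utf16 = pack_text("h\u00e9llo", "utf-16-le")  # 2-byte length + 10 bytes
```

The two results differ in size and content even though the text is the
same, which is why an explicit encoding step seems unavoidable.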

Why are you asking? Do you have a specific implementation in mind, or
are you just worried that Unicode objects cannot be put into
structures? Don't worry, file objects cannot be put into structures,
either :-)


From  Tue Jan  8 22:19:40 2002
From: (Martin v. Loewis)
Date: Tue, 8 Jan 2002 23:19:40 +0100
Subject: [Python-Dev] Unicode strings as filenames
In-Reply-To: <01f601c1988f$27222920$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <060d01c19818$8476cda0$0acc8490@neil> <> <01f601c1988f$27222920$0acc8490@neil>
Message-ID: <>

>    If true, the underlying system supports file names containing most
> Unicode characters and any valid file name may be passed to open as a
> Unicode string.

So what is the value of exposing this to Python? It seems to be
Windows-specific, so I doubt it should be generalized.


From  Tue Jan  8 23:35:29 2002
From: (Neil Hodgson)
Date: Wed, 9 Jan 2002 10:35:29 +1100
Subject: [Python-Dev] Unicode strings as filenames
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <060d01c19818$8476cda0$0acc8490@neil> <> <01f601c1988f$27222920$0acc8490@neil> <>
Message-ID: <035c01c1989d$28710ef0$0acc8490@neil>


> >    If true, the underlying system supports file names containing most
> > Unicode characters and any valid file name may be passed to open as a
> > Unicode string.
> So what is the value of exposing this to Python? It seems to be
> Windows-specific, so I doubt it should be generalized.

   It differentiates between those systems where open decodes Unicode file
names into a particular locale (possibly losing information) and those
systems that preserve Unicode file names. The set of systems where this is
true could change in the future. A sufficiently motivated Windows 9x user
could make it work there, possibly by looking for the long names in the
directory data and converting them to short names.

   When this is false, client code may be prepared to offer a more
reasonable error message indicating that the locale may be set incorrectly or
even try multiple locales in order to open a file. Mmm, there is a Japanese
character in that file name so I'll try temporarily changing the locale to
Japanese to open the file.
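
A minimal sketch of the check described above, written in present-day Python 3 rather than the Python 2.2 of this thread: before opening a file, find out which candidate locale encodings can represent a Unicode file name without losing information. The helper name and the candidate list are illustrative assumptions, not part of any proposed API.

```python
def encodings_that_preserve(name, candidates):
    """Return the candidate encodings that round-trip `name` without loss."""
    usable = []
    for encoding in candidates:
        try:
            if name.encode(encoding).decode(encoding) == name:
                usable.append(encoding)
        except UnicodeError:
            # This encoding cannot represent at least one character.
            pass
    return usable


# Two Japanese characters followed by an ASCII suffix.
name = "\u65e5\u672c.txt"
usable = encodings_that_preserve(name, ["latin-1", "shift_jis", "utf-8"])
print(usable)
```

Client code could use an empty result to report that the current locale is probably set incorrectly, instead of failing with a generic "file not found".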


From  Wed Jan  9 01:48:28 2002
From: (Guido van Rossum)
Date: Tue, 08 Jan 2002 20:48:28 -0500
Subject: [Python-Dev] Please help making the Python track at OSCON 2002 a success!
Message-ID: <>

July 22-26 is the date for O'Reilly's Open Source Convention.  San
Diego is the location.

I've been enlisted by O'Reilly to try and make the Python track a
success.  But I can't do it by myself: I need people to help rustle up
speakers and review proposals for presentations and tutorials.  If you
think you'll be able to make it to the conference this year, please
consider helping out!

See here for more info:

If you want to help, please let me know!

--Guido van Rossum (home page:

From  Wed Jan  9 07:51:15 2002
From: (Thomas Heller)
Date: Wed, 9 Jan 2002 08:51:15 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <>
Message-ID: <024a01c198e2$823d2280$e000a8c0@thomasnotebook>

> > I would like to create struct's containing unicode characters
> > (be gentle with me, maybe I mean wide characters, or mbcs, but I'm really
> > not sure)
> Well, that is precisely the problem: When putting a Unicode object
> into a C structure, there are too many alternatives to pick a sensible
> default. It is not even clear what a "wide character" is: it might be a
> value of wchar_t, or it might be a value of Py_UNICODE (those differ
> on Unix, in the default installation).
> For "MBCS", the most reasonable default might be "utf-8", since this is
> capable of encoding all characters. On Windows, "mbcs" is also a good
> choice, since it uses the encoding that all character APIs use.
> Why are you asking? Do you have a specific implementation in mind, or
> are you just worried that Unicode objects cannot be put into
> structures? Don't worry, file objects cannot be put into structures,
> either :-)
Hehe, I don't want to put objects in structures, I just want to build
structures containing "Unicode strings".

Actually, in this case I'm trying to build a win32 VS_VERSIONINFO
structure, which contains a field WCHAR szKey[].
MSDN says: 
  Contains the Unicode string "VS_VERSION_INFO".

Currently I use something like the following code to access the
raw buffer:

  struct.pack("32s", str(buffer(u"VS_VERSION_INFO")))

Looks strange but works:

>>> print repr(struct.pack("32s", (str(buffer(u"VS_VERSION_INFO")))))


From  Wed Jan  9 09:02:54 2002
From: (M.-A. Lemburg)
Date: Wed, 09 Jan 2002 10:02:54 +0100
Subject: [Python-Dev] PEP-time ? (Unicode strings as filenames)
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > I'd suggest to write up the problem and your conclusions as a
> > PEP for everyone to understand before actually starting to
> > checkin anything.
> We certainly would, if we had achieved any conclusions yet. If you
> want, we can continue discussion in private.

No, please keep it on python-dev; at least then the arguments 
will be kept in the archives.

Still, I don't expect anyone here to closely follow the discussion
and with most of the PythonLabs team being busy on other tasks
you'll have to find some way to summarize the discussion for them
and others to review at some later point in time. PEPs are the right
method for this, IMHO.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan  9 09:02:12 2002
From: (Fredrik Lundh)
Date: Wed, 9 Jan 2002 10:02:12 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook>
Message-ID: <077201c198ec$56b9b740$0900a8c0@spiff>

thomas wrote:

> Hehe, I don't want to put objects in structures, I just want to build
> structures containing "Unicode strings".

there is no such thing.

what you want is a binary buffer with an *encoded*
unicode string.

to get one, figure out what encoding you need (probably
utf-16-le), convert the string to a byte string using the
encode method, and store that byte string in your struct.

def wu(str):
    # encode unicode string for win32 apis
    return str.encode("utf-16-le")

struct.pack("32s", wu(u"VS_VERSION_INFO"))

> struct.pack("32s", str(buffer(u"VS_VERSION_INFO")))

that's evil: you're assuming that Python will always use the
same internal representation for unicode strings.  that's not
the case.
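
As a sketch of the advice above in present-day Python 3 (where every `str` is already a Unicode string, so the `u` prefix is dropped): encode explicitly, then pack the resulting bytes. The observation that `"32s"` supplies the terminating NUL through padding follows from how `struct` pads short strings; whether the surrounding VS_VERSIONINFO layout needs further alignment is not covered here.

```python
import struct


def wu(text):
    # Encode text as little-endian UTF-16, the layout the win32 wide APIs expect.
    return text.encode("utf-16-le")


key = wu("VS_VERSION_INFO")       # 15 characters -> 30 bytes
packed = struct.pack("32s", key)  # "32s" pads the remaining 2 bytes with NULs

print(len(key), len(packed))
```

Unlike the `buffer()` trick, the result does not depend on how the interpreter stores Unicode strings internally.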


From  Wed Jan  9 09:33:19 2002
From: (M.-A. Lemburg)
Date: Wed, 09 Jan 2002 10:33:19 +0100
Subject: [Python-Dev] parser markers vs. conversion functions (unicode/string
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > 2. What would be the corresponding unicode format character for 'z'
> > in the struct module (string or None)?
> You mean, in getargs? There is no corresponding thing.
> I'd recommend against adding new formats. Instead, I'd propose to add
> new conversion functions:
>   Py_UNICODE *str;
>   PyArg_ParseTuple(args, "O&", &str, PyArg_UnicodeZ);
> int PyArg_UnicodeZ(PyObject *o, void *d){
> ...
> }

Why do you think that adding the conversion functions to getargs.c
would be any different from adding new parser markers ? 

As I understand "O&", it is meant for user-space conversion functions, 
not system provided ones. The latter can easily be integrated as 
parser markers or options to parser markers. Unless, of course, you 
want to start shifting from parser markers to conversion functions 
completely (which I doubt).

Note that "O&" doesn't really buy you anything much: you could
just as well use "O" and then switch on the returned object
type or call a converter (with all the extra error handling
or other extra information needed for your particular case).
> It may be desirable to allow passing of : or ; strings to conversion
> functions, and helper API to format the errors.

You'd need a new parser marker option to support this new 

In the end, I don't believe we gain much from beefing up the
"O&" interface. I'd rather like to see the Unicode parser
markers extended to be more useful (I'll checkin a patch for
"u#" later today).

Marc-Andre Lemburg
CEO Software GmbH

From  Wed Jan  9 10:17:30 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 11:17:30 +0100
Subject: [Python-Dev] parser markers vs. conversion functions (unicode/string
In-Reply-To: <> (
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <>
Message-ID: <>

> Why do you think that adding the conversion functions to getargs.c
> would be any different from adding new parser markers ? 

For two reasons:

- people who want portability across Python versions can better
  maintain their source code. They just need to provide a definition
  of the conversion function for older Python versions, which they
  can copy literally from the more recent version.
- the code becomes more readable, since function names are more
  self-documenting than single letter codes.

> As I understand "O&", it is meant for user-space conversion functions, 
> not system provided ones. 

It may have been originally defined for that purpose. I believe it 
would be useful to provide a standard library of such functions.

> Unless, of course, you want to start shifting from parser markers to
> conversion functions completely (which I doubt).

I would, in fact, prefer if the set of conversion codes is frozen, and
extended only for cases that are likely to get wide applicability. I
believe many of the codes invented for Unicode have never been used in
any module, it seems that some have been invented just for an abstract
notion of "symmetry".

> Note that "O&" doesn't really buy you anything much: you could
> just as well use "O" and then switch on the returned object
> type or call a converter (with all the extra error handling
> or other extra information needed for your particular case).

People are apparently fond of a single function that simultaneously
checks the validity of all arguments. If it fails, it will completely
clean up.

That makes me wonder about the existing converters and their cleanup
capabilities: Suppose I do

  char *buffer = NULL;
  int i;
  if (!PyArg_ParseTuple(args, "eti", NULL, &buffer, &i))
    return NULL;

Now suppose I pass a Unicode object for the first argument, and a list
for the second. Is it true that this code will leak? The first argument
has already been converted by the time the second leads to an error, so
the encoded string has already been produced.

> In the end, I don't believe we gain much from beefing up the
> "O&" interface. I'd rather like to see the Unicode parser
> markers extended to be more useful (I'll checkin a patch for
> "u#" later today).

How will that deal with string objects?


From  Wed Jan  9 11:55:12 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 12:55:12 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper"
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Mon, 7 Jan 2002 23:50:47 +0100 , <>
Message-ID: <>

> > All the Mac toolbox objects (Windows, Dialogs, Controls, Menus and a
> > zillion more), All the Windows HANDLEs, all the MFC objects (although
> > they might be a bit more difficult), the objects in the X11 and Motif
> > modules, the pyexpat parser object, *dbm objects, dlmodule objects,
> > mpz objects, zlib objects, SGI cl and al objects....
> Could you please try once more, being serious this time? AFAICT, I was
> asking for examples of types that are parsed by means of O& currently,
> and do so just to get a void** from the python object.

Shall we try to keep this civil, please? I *am* being serious, and I'm getting 
slightly upset that with this subject (again) you appear to start shooting 
away without trying very hard to understand the issue I'm raising.

> Looking at pyexpat.c, I find a few uses of O&, none related to the
> pyexpat parser object. In zlibmodule.c, I find not a single mentioning
> of O&, likewise in dlmodule.c, clmodule.c, almodule.c, dbmmodule.c,
> and now I'm losing interest into verifying more of your examples.

Ok, let me rephrase my list then. The first five items in my list, which you 
carefully ignored, are examples of objects that now already make heavy use of 
O&. The rest are examples of other objects that wrap a C pointer, and which 
could potentially also be opened up to use in struct or calldll.

And to give a complete example of how useful this would be consider the 
following. I'll give a mac-centric example, because I don't know enough about 
calldll on windows (and I don't think there's a unix version yet).

Assume you're using Python to extend Photoshop. Assume Photoshop has an API to 
allow the plugin to get at the screen. Let's assume that there's a C call
extern GrafPtr ps_GetDrawableSurface(void);
to get at the datastructure you need to draw to.
These GrafPtr's are (in Mac/Modules/qd/_Quickdraw.c) wrapped in 
Carbon.Qd.GrafPortType objects in Python.

In the current situation, if you would want to wrap this ps_GetDrawableSurface 
function you would need to write a C wrapper (which means you would need a C 
compiler, etc etc) because you would need to convert the return value with 
("O&", GrafObj_new). If we had something like ("O@", typeobject) calldll could 
be extended so you could do something like
psapilib = calldll.getlibrary(....)
ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
                                        Carbon.Qd.GrafPortType)

(newcall() arguments are funcpointer, return value type, arg1 type, ...)

You cannot do this currently, because there is no way to get from the type 
object (which is the only thing you have available in Python) to the functions 
you need to pass to O&.

- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 12:12:56 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 13:12:56 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Tue, 8 Jan 2002 21:24:57 +0100 , <>
Message-ID: <>

> > 3. There does not seem to be an equivalent to the 's' format character
> > for PyArg_Parse() or Py_BuildValue().
> That would be 'u'. However, is this really needed? PyArg_Parse is
> deprecated, 

Huh, what did I miss? Why is PyArg_Parse deprecated, and by what should it be
replaced?

> and I doubt you have Py_UNICODE* often enough to need
> it to pass to Py_BuildValue.

Martin, have you ever wrapped any Unicode API's? (As opposed to using unicode 
as a purely internal datatype, which you clearly know a lot about). Thomas' 
question are similar to mine from last week, and Neil's are related too. All 
the niceties we have for strings (optional ones with z, autoconversion from 
unicode, s# to get the size) are missing for unicode, and that's a pain when 
you're wrapping an existing C api.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 12:25:19 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 13:25:19 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "Fredrik Lundh" <> ,
 Wed, 9 Jan 2002 10:02:12 +0100 , <077201c198ec$56b9b740$0900a8c0@spiff>
Message-ID: <>

> thomas wrote:
> > Hehe, I don't want to put objects in structures, I just want to build
> > structures containing "Unicode strings".
> there is no such thing.
> what you want is a binary buffer with an *encoded*
> unicode string.

It becomes more and more clear to me that there are two groups of people on 
this list: those who understand unicode (and may or may not actually use it) 
and those who want to use unicode (but apparently don't understand it). I'm in 
the second group:-)

> to get one, figure out what encoding you need (probably
> utf-16-le), convert the string to a byte string using the
> encode method, and store that byte string in your struct.
> def wu(str):
>     # encode unicode string for win32 apis
>     return str.encode("utf-16-le")
> struct.pack("32s", wu(u"VS_VERSION_INFO"))

Why would you have to specify the encoding if what you want is the normal, 
standard encoding? Or, to rephrase the question, why do C programmers only 
have to s/char/wchar_t/, add a "w" to the front of the routine names and a u 
in front of the string constants, whereas Python programmers are now suddenly 
expected to learn all this mumbo-jumbo about encodings and such?
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 13:16:30 2002
From: (Fredrik Lundh)
Date: Wed, 9 Jan 2002 14:16:30 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <>
Message-ID: <08b701c1990f$dbdb0e10$0900a8c0@spiff>

jack wrote:
> > struct.pack("32s", wu(u"VS_VERSION_INFO"))
> Why would you have to specify the encoding if what you want is the normal,
> standard encoding?

because there is no such thing as a "normal, standard
encoding" for a unicode character, just like there's no
"normal, standard encoding" for an integer (big endian,
little endian?), a floating point number (ieee, vax, etc),
a screen coordinate, etc.

as soon as something gets too large to store in a byte,
there's always more than one obvious way to store it ;-)
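
To make the analogy concrete, here is a small Python 3 sketch (the EURO SIGN is just an arbitrary example character): the same 16-bit integer and the same character each have several equally plausible byte layouts.

```python
import struct

code_point = 0x20AC  # EURO SIGN, chosen only as an example

# One integer, two byte orders.
little = struct.pack("<H", code_point)
big = struct.pack(">H", code_point)

# One character, several encodings.
char = chr(code_point)
layouts = {codec: char.encode(codec) for codec in ("utf-16-le", "utf-16-be", "utf-8")}

print(little, big)
for codec, data in layouts.items():
    print(codec, data)
```

None of these layouts is "the" standard one; the consumer of the bytes decides which is right.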

> Or, to rephrase the question, why do C programmers only
> have to s/char/wchar_t/

because they tend to prefer to quickly get the wrong
result? ;-)

C makes no guarantees about wchar_t, so Python's Unicode
type doesn't rely on it (it can use it, though: you can check
the HAVE_USABLE_WCHAR_T macro to see if it's the same
thing; see PyUnicode_FromWideChar for an example).

in the Mac case, it might be easiest to configure things so
that HAVE_USABLE_WCHAR_T is always true, and assume
that Py_UNICODE is the same thing as wchar_t.  (checking
this in the module init function won't hurt, of course)

but you cannot rely on that if you're writing truly portable


From  Wed Jan  9 14:00:48 2002
From: (Thomas Heller)
Date: Wed, 9 Jan 2002 15:00:48 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper"  objects
References: <>
Message-ID: <04bc01c19916$22437120$e000a8c0@thomasnotebook>

From: "Jack Jansen" <>
> And to give a complete example of how useful this would be consider the 
> following. I'll give a mac-centric example, because I don't know enough about 
> calldll on windows (and I don't think there's a unix version yet).
> Assume you're using Python to extend Photoshop. Assume Photoshop has an API to 
> allow the plugin to get at the screen. Let's assume that there's a C call
> extern GrafPtr ps_GetDrawableSurface(void);
> to get at the datastructure you need to draw to.
> These GrafPtr's are (in Mac/Modules/qd/_Quickdraw.c) wrapped in 
> Carbon.Qd.GrafPortType objects in Python.
> In the current situation, if you would want to wrap this ps_GetDrawableSurface 
> function you would need to write a C wrapper (which means you would need a C 
> compiler, etc etc) because you would need to convert the return value with 
> ("O&", GrafObj_new). If we had something like ("O@", typeobject) calldll could 
> be extended so you could do something like
> psapilib = calldll.getlibrary(....)
> ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> Carbon.Qd.GrafPortType)
> (newcall() arguments are funcpointer, return value type, arg1 type, ...)
> You cannot do this currently, because there is no way to get from the type 
> object (which is the only thing you have available in Python) to the functions 
> you need to pass to O&.

In Python 2.2, the type object can itself be an instance, and you could call
classmethods on it...
I'm doing something similar on windows.


From  Wed Jan  9 14:07:57 2002
From: (Thomas Heller)
Date: Wed, 9 Jan 2002 15:07:57 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff>
Message-ID: <04ca01c19917$2229b220$e000a8c0@thomasnotebook>

From: "Fredrik Lundh" <>
> thomas wrote:
> > Hehe, I don't want to put objects in structures, I just want to build
> > structures containing "Unicode strings".
> there is no such thing.
> what you want is a binary buffer with an *encoded*
> unicode string.
> to get one, figure out what encoding you need (probably
> utf-16-le), convert the string to a byte string using the
> encode method, and store that byte string in your struct.
> def wu(str):
>     # encode unicode string for win32 apis
>     return str.encode("utf-16-le")
> struct.pack("32s", wu(u"VS_VERSION_INFO"))

Thanks, works great. And utf-16-le *seems* to be what I want...

Next question ;-), sorry for being off-topic for python-dev:

How can I do the equivalent of
  u"some string"
in terms of
  unicode("some string", encoding)
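
One possible reading of this question, sketched in Python 3 spelling (in the Python 2 of this thread the first call would be `unicode("some string", "ascii")`). It assumes the literal is pure ASCII; the second half assumes that literal-style `\uXXXX` escapes are wanted, which the `unicode_escape` codec handles.

```python
# Decode a byte string with an explicitly named codec.
raw = b"some string"
text = str(raw, "ascii")
assert text == "some string"

# Backslash escapes as written in a literal need "unicode_escape" instead.
escaped = b"price: \\u20ac5"
decoded = escaped.decode("unicode_escape")
assert decoded == "price: \u20ac5"
```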



From  Wed Jan  9 14:52:39 2002
From: (Fred L. Drake, Jr.)
Date: Wed, 9 Jan 2002 09:52:39 -0500 (EST)
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <>
References: <>
Message-ID: <>

Jack Jansen writes:
 > Huh, what did I miss? Why is PyArg_Parse deprecated, and by what
 > should it be replaced?

  I think it is only recommended to avoid this as the argument-parsing
function for an extension function/method; PyArg_ParseTuple() should
be used instead since it can give better error messages using the
:funcname syntax for the format string (which is strongly
recommended!).

  -Fred

Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Wed Jan  9 14:55:11 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 15:55:11 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "Fredrik Lundh" <> ,
 Wed, 9 Jan 2002 14:16:30 +0100 , <08b701c1990f$dbdb0e10$0900a8c0@spiff>
Message-ID: <>

> jack wrote:
> > > struct.pack("32s", wu(u"VS_VERSION_INFO"))
> >
> > Why would you have to specify the encoding if what you want is the normal,
> > standard encoding?
> because there is no such thing as a "normal, standard
> encoding" for a unicode character, just like there's no
> "normal, standard encoding" for an integer (big endian,
> little endian?), a floating point number (ieee, vax, etc),
> a screen coordinate, etc.

What I here call the "normal, standard encoding" is what the C library 
supports. Your analogy of integers and floats is exactly the right one: even 
though there are many ways to represent an integer what you get back from 
PyArg_Parse("l") is a standard C "long".

Maybe the confusion is that wherever I have said "unicode" in the past I 
should have said "wchar_t". I know there are, in theory, many encodings of 
Unicode but in practice there is only one that I'm interested in most of the 
time and that's wchar_t, because that's what all my APIs want.

So, I would like PyArg_Parse/Py_BuildValue formats that are symmetric to "s", 
"s#" and "z" but that return wchar_t strings and that work with both 
UnicodeObjects and StringObjects.
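
A rough Python 3 illustration of why "wchar_t" alone does not pin down a byte layout. It uses the standard `ctypes` module (which did not exist when this thread was written) and assumes, as is usual but not guaranteed by C, that a 2-byte `wchar_t` holds native-endian UTF-16 and a 4-byte one native-endian UTF-32. The helper name is hypothetical.

```python
import ctypes
import sys

# 2 on Windows, 4 on most Unix systems.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)
_ORDER = "le" if sys.byteorder == "little" else "be"
WCHAR_CODEC = ("utf-16-" if WCHAR_SIZE == 2 else "utf-32-") + _ORDER


def to_wchar_bytes(text):
    """Encode text plus a terminating NUL the way this platform lays out wchar_t."""
    return (text + "\0").encode(WCHAR_CODEC)


# create_unicode_buffer("abc") allocates a wchar_t[4] holding "abc" and a NUL.
buf = ctypes.create_unicode_buffer("abc")
raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))

print(WCHAR_CODEC, raw == to_wchar_bytes("abc"))
```

So a "wchar_t codec" would be a platform-dependent alias for one of several real encodings, which is the point being argued on both sides here.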
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 15:00:32 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 16:00:32 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper"
In-Reply-To: Message by "Thomas Heller" <> ,
 Wed, 9 Jan 2002 15:00:48 +0100 , <04bc01c19916$22437120$e000a8c0@thomasnotebook>
Message-ID: <>

> > You cannot do this currently, because there is no way to get from the type 
> > object (which is the only thing you have available in Python) to the functions 
> > you need to pass to O&.
> In Python 2.2, the type object can itself be an instance, and you could call
> classmethods on it...
> I'm doing something similar on windows.

Could you explain how you do this? If I have the typeobject, how would I get 
to the address of the "int (*converter)(PyObject *, void *)" function?

- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 14:56:15 2002
From: (Guido van Rossum)
Date: Wed, 09 Jan 2002 09:56:15 -0500
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Your message of "Wed, 09 Jan 2002 09:52:39 EST."
References: <> <> <>
Message-ID: <>

> Jack Jansen writes:
>  > Huh, what did I miss? Why is PyArg_Parse deprecated, and by what
>  > should it be replaced?
>   I think it is only recommended to avoid this as the argument-parsing
> function for an extension function/method; PyArg_ParseTuple() should
> be used instead since it can give better error messages using the
> :funcname syntax for the format string (which is strongly
> recommended!).
>   -Fred

The other problem with PyArg_Parse that PyArg_ParseTuple avoids is
that a function declared as taking N arguments can also be called with
a single tuple of N items.  This is not supposed to happen (you should
use apply or the *args call notation for that).
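
The intended behaviour can be seen from the Python side with an ordinary function (a sketch with made-up names; a Python-level function behaves here the way a `PyArg_ParseTuple`-based C function is supposed to):

```python
def move(x, y):
    return (x, y)


point = (3, 4)

# Explicit unpacking with the *args call notation is the supported spelling.
unpacked = move(*point)

# Passing the tuple itself counts as one argument and is rejected.
try:
    move(point)
    accepted_tuple = True
except TypeError:
    accepted_tuple = False

print(unpacked, accepted_tuple)
```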

--Guido van Rossum (home page:

From  Wed Jan  9 15:03:45 2002
From: (Jack Jansen)
Date: Wed, 09 Jan 2002 16:03:45 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "Fred L. Drake, Jr." <> ,
 Wed, 9 Jan 2002 09:52:39 -0500 (EST) , <>
Message-ID: <>

> Jack Jansen writes:
>  > Huh, what did I miss? Why is PyArg_Parse deprecated, and by what
>  > should it be replaced?
>   I think it is only recommended to avoid this as the argument-parsing
> function for an extension function/method; PyArg_ParseTuple() should
> be used instead [...]

Ow, ok, I knew about that one. Silly me:-)
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Wed Jan  9 15:11:13 2002
From: (Thomas Heller)
Date: Wed, 9 Jan 2002 16:11:13 +0100
Subject: Was Re: [Python-Dev] unicode/string asymmetries
References: <><><> <>
Message-ID: <006a01c1991f$f8949ed0$e000a8c0@thomasnotebook>

From: "Fred L. Drake, Jr." <>
> Jack Jansen writes:
>  > Huh, what did I miss? Why is PyArg_Parse deprecated, and by what
>  > should it be replaced?
>   I think it is only recommended to avoid this as the argument-parsing
> function for an extension function/method; PyArg_ParseTuple() should
> be used instead since it can give better error messages using the
> :funcname syntax for the format string (which is strongly
> recommended!).

Offtopic again: PyArg_ParseTuple() is also nice for parsing a tuple
in C code, which you for example receive as a result from calling a method.
IIRC the only problem here is that it may throw weird error
messages if the object is not a tuple.
Instead of 'TypeError: unpack non-sequence' you get a
'SystemError: new style getargs format but argument is not a tuple'.

Should this be changed?


From  Wed Jan  9 15:22:05 2002
From: (Just van Rossum)
Date: Wed,  9 Jan 2002 16:22:05 +0100
Subject: Was Re: [Python-Dev] unicode/string asymmetries
In-Reply-To: <006a01c1991f$f8949ed0$e000a8c0@thomasnotebook>
Message-ID: <20020109162207-r01010800-f5d854de-0920-010c@>

Thomas Heller wrote:

> Offtopic again: PyArg_ParseTuple() is also nice for parsing a tuple
> in C code, which you for example receive as a result from calling a method.
> IIRC the only problem here is that it may throw weird error
> messages if the object is not a tuple.
> Instead of 'TypeError: unpack non-sequence' you get a
> 'SystemError: new style getargs format but argument is not a tuple'.

You can do that with PyArg_Parse(), too, if you put parens around your format
string, as in this converter function:

    int
    CGPoint_Convert(PyObject *v, CGPoint *p_itself)
    {
        /* "(ff)" unpacks any length-2 sequence into two C floats */
        if( !PyArg_Parse(v, "(ff)",
                &p_itself->x,
                &p_itself->y) )
            return 0;
        return 1;
    }

The nice thing is that this will accept _any_ (length 2) sequence, not just
tuples! So this seems to be a case where PyArg_Parse() is actually better than
PyArg_ParseTuple().


From  Wed Jan  9 15:30:50 2002
From: (Guido van Rossum)
Date: Wed, 09 Jan 2002 10:30:50 -0500
Subject: [Python-Dev] Re: PyArg_ParseTuple
In-Reply-To: Your message of "Wed, 09 Jan 2002 16:11:13 +0100."
References: <><><> <>
Message-ID: <>

> Offtopic again: PyArg_ParseTuple() is also nice for parsing a tuple
> in C code, which you for example receive as a result from calling a method.
> IIRC the only problem here is that it may throw weird error
> messages if the object is not a tuple.
> Instead of 'TypeError: unpack non-sequence' you get a
> 'SystemError: new style getargs format but argument is not a tuple'.
> Should this be changed?

No, you should test for PyTuple_Check before calling
PyArg_ParseTuple.  Why do you think it's called that?

The other problem with this use, alas, is that when it catches a
legitimate error, the error it reports is confusing if you don't
change it.  Example:

>>> from socket import *
>>> s = socket(AF_INET, SOCK_STREAM)
>>> s.bind(())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: getsockaddrarg() takes exactly 2 arguments (0 given)

--Guido van Rossum (home page:

From  Wed Jan  9 16:02:13 2002
From: (Walter =?ISO-8859-15?Q?D=F6rwald?=)
Date: Wed, 09 Jan 2002 17:02:13 +0100
Subject: [Python-Dev] Re: PyArg_ParseTuple
References: <><><> <>              <006a01c1991f$f8949ed0$e000a8c0@thomasnotebook> <>
Message-ID: <>

Guido van Rossum wrote:

> [...]
> No, you should test for PyTuple_Check before calling
> PyArg_ParseTuple.  Why do you think it's called that?
> The other problem with this use, alas, is that when it catches a
> legitimate error, the error it reports is confusing if you don't
> change it.  Example:
> >>> from socket import *
> >>> s = socket(AF_INET, SOCK_STREAM)
> >>> s.bind(())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: getsockaddrarg() takes exactly 2 arguments (0 given)

This should be fixed by using ;error message in the format string.

    Walter Dörwald

From  Wed Jan  9 16:45:19 2002
From: (M.-A. Lemburg)
Date: Wed, 09 Jan 2002 17:45:19 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <>
Message-ID: <>

Jack Jansen wrote:
> ...
> So, I would like PyArg_Parse/Py_BuildValue formats that are symmetric to "s",
> "s#" and "z" but that return wchar_t strings and that work with both
> UnicodeObjects and StringObjects.

How about this: we add a wchar_t codec to Python and the "eu#" parser
marker. Then you could write:

	wchar_t *value = NULL;
	int len = 0;
	if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
		return NULL;

	return ...

or, for 8-bit strings:

	char *value = NULL;
	int len = 0;
	if (!PyArg_ParseTuple(tuple, "es#", "latin-1", &value, &len))
		return NULL;

	return ...

Is that symmetric enough ?

Marc-Andre Lemburg
CEO Software GmbH

From  Wed Jan  9 16:50:32 2002
From: (M.-A. Lemburg)
Date: Wed, 09 Jan 2002 17:50:32 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <>
Message-ID: <>

Jack Jansen wrote:
> > > 3. There does not seem to be an equivalent to the 's' format character
> > > for PyArg_Parse() or Py_BuildValue().
> Martin:
> > and I doubt you have Py_UNICODE* often enough to need
> > it to pass to Py_BuildValue.
> Martin, have you ever wrapped any Unicode API's? (As opposed to using unicode
> as a purely internal datatype, which you clearly know a lot about). Thomas'
> questions are similar to mine from last week, and Neil's are related too. All
> the niceties we have for strings (optional ones with z, autoconversion from
> unicode, s# to get the size) are missing for unicode, and that's a pain when
> you're wrapping an existing C api.

Jack, please take a look at the very complete C API we have for Unicode.
AFAICT, the Unicode API has more to offer than even the string C API.

BTW, the "u" and "u#" build markers are available too, so there should
be no problem using them for Py_BuildValue().

Marc-Andre Lemburg
CEO Software GmbH

From  Wed Jan  9 16:56:47 2002
From: (Thomas Heller)
Date: Wed, 9 Jan 2002 17:56:47 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper"  objects
References: <>
Message-ID: <015a01c1992e$b8557c40$e000a8c0@thomasnotebook>

> > > You cannot do this currently, because there is no way to get from the type 
> > > object (which is the only thing you have available in Python) to the functions 
> > > you need to pass to O&.
> > 
> > In Python 2.2, the type object can itself be an instance, and you could call
> > classmethods on it...
> > I'm doing something similar on windows.
> Could you explain how you do this? If I have the typeobject, how would I get 
> to the address of the "int (*converter)(PyObject *, void *)" function?

Jack, it seems I misunderstood you (slightly?).
I was talking about the other direction (constructing Python objects
from C pointers or handles).
I had to invent a special convention: I use O& with a function which
calls obj->as_parameter() to convert from Python to C, but of course
this gives you no typechecks as your O@ proposal does.

I've reread your original O@ proposal, and I like it very much.
Aren't there really any other positive responses?


From  Wed Jan  9 18:42:11 2002
From: (M.-A. Lemburg)
Date: Wed, 09 Jan 2002 19:42:11 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <>
Message-ID: <>

Jack Jansen wrote:
> Recently, "M.-A. Lemburg" <> said:
> > Sounds like you want to introduce a "buffer" interface for these
> > objects.
> No, that is something completely different. I want a replacement for
> PyArg_Parse("O&", funcptr, void**) that has the form
> PyArg_Parse("O@", typeobject, void**) and similarly for Py_BuildValue.
> Because the typeobject has a Python representation (whereas the
> function pointer does not) this would allow modules like struct and
> calldll to support objects that have this interface, because these
> modules are driven from specifications in Python. There is currently
> no way to get from the typeobject to the function pointer needed for
> O&.

If I'm not mistaken this looks like an interface which resembles
the copyreg registry where you ask an object for a way to
pickle itself and a way to restore itself from the pickle.
(I think one of the ways pickle supports this more directly
is by looking for a reduce method.)

That would be nice to have indeed. For the simple objects you
have in mind, the void* could be wrapped into a PyCObject, BTW.

Could you write this up as a short PEP ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan  9 19:36:58 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 20:36:58 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (message from Jack
 Jansen on Wed, 09 Jan 2002 13:25:19 +0100)
References: <>
Message-ID: <>

> Why would you have to specify the encoding if what you want is the normal, 
> standard encoding? 

Well, because utf-16-le definitely is *not* the normal, standard
encoding. It is only the right thing if the C type is WCHAR[], which
is a Microsoft invention.

> Or, to rephrase the question, why do C programmers only have to
> s/char/wchar_t/, add a "w" to the front of the routine names and a u
> in front of the string constants, whereas Python programmers are now
> suddenly expected to learn all this mumbo-jumbo about encodings and
> such?

That is definitely not the only thing that C programmers have to
do. They need to invoke conversion functions all the time. Plus, they
are faced with the problem that, when integrating different
Unicode-supporting libraries, they have to convert back and forth
between different Unicode types.
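
To make the point about differing Unicode representations concrete, here is a minimal sketch in present-day Python 3 (an assumption, since this thread predates it; the variable names are illustrative only). It shows that the same text has several equally valid byte layouts, so the consumer has to be told which one is in use:

```python
# Sketch only: Python 3 spelling; all names here are illustrative.
text = "a.out"

utf16_le = text.encode("utf-16-le")  # 2 bytes per character, low byte first
utf16_be = text.encode("utf-16-be")  # 2 bytes per character, high byte first
utf32_le = text.encode("utf-32-le")  # 4 bytes per character

# Reading little-endian data as big-endian does not fail here;
# it quietly produces different characters.
misread = utf16_le.decode("utf-16-be")
```

None of these is "the" standard encoding; utf-16-le merely happens to match what a WCHAR[] holds on Windows.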


From  Wed Jan  9 21:14:26 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 22:14:26 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (message from Jack
 Jansen on Wed, 09 Jan 2002 15:55:11 +0100)
References: <>
Message-ID: <>

> So, I would like PyArg_Parse/Py_BuildValue formats that are
> symmetric to "s", "s#" and "z" but that return wchar_t strings and
> that work with both UnicodeObjects and StringObjects.

Unfortunately, that is quite difficult. Python does not guarantee that
the internal representation of Unicode strings uses wchar_t, so such a
conversion definitely requires explicit memory management. This is
unlike plain strings, which do guarantee that the internal
representation is char[].
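
A rough illustration of why a copy is needed, using the ctypes module of present-day Python 3 (an assumption; it did not exist when this was written, and `to_wchar_buffer` is a hypothetical helper). The wchar_t array is a separate allocation whose element size depends on the platform:

```python
import ctypes

def to_wchar_buffer(text):
    """Copy text into a freshly allocated, NUL-terminated wchar_t array."""
    # Hypothetical helper: the array is owned by the returned object,
    # independently of the string it was copied from.
    return ctypes.create_unicode_buffer(text)

buf = to_wchar_buffer("a.out")
wchar_size = ctypes.sizeof(ctypes.c_wchar)  # platform dependent, commonly 2 or 4
```

Somebody has to own that second allocation, which is exactly the memory-management burden described above.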


From  Wed Jan  9 21:24:28 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 22:24:28 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (
References: <> <>
Message-ID: <>

> How about this: we add a wchar_t codec to Python and the "eu#" parser
> marker. Then you could write:
> 	wchar_t *value = NULL;
> 	int len = 0;
> 	if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
>                 return NULL;

Wouldn't that code be incorrect if there are further format arguments
whose conversion could fail also?

I think format specifiers that require explicit memory management are
so difficult to use that they must be avoided. I'd be in favour of
extending the argtuple type to include additional slots for objects
that go away when the tuple goes away.


From  Wed Jan  9 21:11:50 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 22:11:50 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (message from Jack
 Jansen on Wed, 09 Jan 2002 13:12:56 +0100)
References: <>
Message-ID: <>

> Huh, what did I miss? Why is PyArg_Parse deprecated, and by what
> should it be replaced?

Not precisely; METH_OLDARGS and its combination with PyArg_Parse is
deprecated, use PyArg_ParseTuple instead. That still leaves a few uses
of PyArg_Parse, but these are really too special to worry about.

> > and I doubt you have Py_UNICODE* often enough to need
> > it to pass to Py_BuildValue.

> Martin, have you ever wrapped any Unicode API's? (As opposed to
> using unicode as a purely internal datatype, which you clearly know
> a lot about).

Certainly, I've tried providing libiconv interfacing. I was strongly
pushing the notion that Py_UNICODE is equal to wchar_t on all
platforms, that notion was unfortunately rejected.

As a result, using wchar_t together with Python Unicode objects is
difficult. No existing C library reliably accepts Py_UNICODE*, if
anything, they accept wchar_t* (although Microsoft, and apparently
also Apple, manages to use yet another type, further complicating

There are exceptions: on some platforms, Py_UNICODE currently is equal
to wchar_t, like Windows. That may change in the future, if people
request full Unicode support (i.e. a 4-byte Unicode type) - then
Py_UNICODE might differ from WCHAR even on Windows. At that time, any
code that currently assumes they are equal will break. So I'd rather
educate people about the issues now than having to come up with
work-arounds when they eventually run into them.

> Thomas' questions are similar to mine from last week, and Neil's are
> related too. All the niceties we have for strings (optional ones
> with z, autoconversion from unicode, s# to get the size) are missing
> for unicode, and that's a pain when you're wrapping an existing C
> api.

These problems are inherent in the subject matter: the C support of
Unicode, and its relationship to the char type is inherently

If Python would offer a struct code that translates into wchar_t, he'd
get away with that on Windows. However, it seemed to me that the
specific structure was primarily used in files, so code that tries to
fill it should use formats that are platform-independent. For the
integer types, that means you cannot just use the "i" format, but you
need to know what the integer range is (i.e. 8, 16, 32, or 64
bits). Likewise, for strings, you need to know what the width of each
character and the endianness are.

Furthermore, apart from Windows, I doubt *anybody* puts wide strings
in platform encoding into files. I'd hope anybody else is smart enough to
clearly define the encoding used when representing Unicode strings in
byte-oriented files, streams, and structures.
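
As a sketch of what "platform-independent formats" could look like with the struct module, here is a small example in present-day Python 3 (an assumption). The record layout and the helper names `pack_record` and `unpack_record` are made up for illustration and are not the structure under discussion:

```python
import struct

# Hypothetical layout: little-endian 32-bit int, then a 16-bit byte count,
# followed by the name encoded as UTF-16-LE.
HEADER = "<iH"

def pack_record(number, name):
    encoded = name.encode("utf-16-le")  # width and byte order stated explicitly
    return struct.pack(HEADER, number, len(encoded)) + encoded

def unpack_record(data):
    number, size = struct.unpack_from(HEADER, data, 0)
    start = struct.calcsize(HEADER)
    return number, data[start:start + size].decode("utf-16-le")
```

Because both the integer width and the string encoding are spelled out, the bytes come out the same on every platform.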


From  Wed Jan  9 21:41:52 2002
From: (Neil Hodgson)
Date: Thu, 10 Jan 2002 08:41:52 +1100
Subject: [Python-Dev] unicode/string asymmetries
References: <> <>
Message-ID: <008f01c19956$73c4f790$0acc8490@neil>


> Unfortunately, that is quite difficult. Python does not guarantee that
> the internal representation of Unicode strings uses wchar_t, so such a
> conversion definitely requires explicit memory management.

   This could be a problem with my file patches as I have been using
PyUnicode_AS_UNICODE which will return 4 byte strings if Py_UNICODE_WIDE is
defined. 4 byte strings can not be passed to the Windows API. So it looks
like PyUnicode_AsWideChar has to be used instead with a wrapper to allocate
enough memory to hold the resulting string.


From  Wed Jan  9 22:12:59 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 23:12:59 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper"
In-Reply-To: <> (message from Jack
 Jansen on Wed, 09 Jan 2002 12:55:12 +0100)
References: <>
Message-ID: <>

> > Could you please try once more, being serious this time? AFAICT, I was
> > asking for examples of types that are parsed by means of O& currently,
> > and do so just to get a void** from the python object.
> Shall we try to keep this civil, please? I *am* being serious

Please accept my apologies. I was expecting a single specific example,
and was somewhat surprised to get a list of unspecific ones.

> Ok, let me rephrase my list then. The first five items in my list,
> which you carefully ignored

I have ignored the Mac toolbox objects, since I don't know what they
are, and where to find their source code. I have ignored Windows
HANDLEs, since I don't have PythonWin sources readily available; I
don't know what the X11 and Motif modules are.

Now I've looked somewhat through the Python source, and found
Mac/Modules/Win/_Winmodule.c:WinObj_SetWindowModality (taking an
arbitrary one that seemed to match your description of "Windows"). Is that
one of the examples you were referring to? If so, I still cannot
understand the example. It reads

	if (!PyArg_ParseTuple(_args, "lO&",
	                      &inModalKind,
	                      WinObj_Convert, &inUnavailableWindow))

so it appears that you would like to rewrite this as

	if (!PyArg_ParseTuple(_args, "lO@",
	                      &inModalKind,
	                      WinObj_Type, &inUnavailableWindow))

Now, if that is how it is supposed to look: How exactly would it
work? WinObj_Convert accepts None, integers, and WinObjs. It seems
that the rewritten version would only accept WinObj objects.

> extern GrafPtr ps_GetDrawableSurface(void);

> If we had something like ("O@", typeobject) calldll could 
> be extended so you could do something like
> psapilib = calldll.getlibrary(....)
> ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> 	Carbon.Qd.GrafPortType)
> (newcall() arguments are funcpointer, return value type, arg1 type, ...)
> You cannot do this currently

Please let me try to summarize what this is doing: Given a type object
and a long, create an instance of that type. Is that a correct
analysis of what has to be done?

I see two ways to do that currently:

1. Arrange that it is possible to construct GrafPortType objects from
   integers. Then you do

   curarg = PyObject_CallFunction(returntype, "l", c_rv);

   inside calldll.c:cdc_call

2. Extend the type object to, say, MacType, which offers special support
   for calldll, to allow creation of instances given a long value.

> because there is no way to get from the type object (which is the
> only thing you have available in Python) to the functions you need
> to pass to O&.

I completely fail to see how O& fits into the puzzle. AFAICT,
conversion of the return value occurs inside cdc_call. There is no
tuple to parse anywhere nearby.


From  Wed Jan  9 22:15:24 2002
From: (Martin v. Loewis)
Date: Wed, 9 Jan 2002 23:15:24 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <04ca01c19917$2229b220$e000a8c0@thomasnotebook>
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook>
Message-ID: <>

> How can I do the equivalent of
>   u"some string"
> in terms of
>   unicode("some string", encoding)

Again, what do you need that for? If there won't be any escape
sequences or non-ASCII characters inside, then

   unicode("some string", "ascii")

will work fine. In the general case,

   unicode("some string", "unicode-escape")

should work.
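
For readers on present-day Python 3 (an assumption; `unicode()` no longer exists there), the same two decodings can be sketched with `str(bytes, encoding)`. The helper name `decode_literal` is hypothetical:

```python
def decode_literal(raw, has_escapes=False):
    """Decode the bytes of a would-be u"..." literal.

    Plain ASCII text uses the "ascii" codec; text containing backslash
    escapes such as \\xe9 uses "unicode-escape".
    """
    codec = "unicode-escape" if has_escapes else "ascii"
    return str(raw, codec)
```

With the "ascii" codec, any byte above 0x7f raises UnicodeDecodeError instead of being guessed at.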


From  Wed Jan  9 23:14:25 2002
From: (M.-A. Lemburg)
Date: Thu, 10 Jan 2002 00:14:25 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > How about this: we add a wchar_t codec to Python and the "eu#" parser
> > marker. Then you could write:
> >
> >       wchar_t *value = NULL;
> >       int len = 0;
> >       if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
> >                 return NULL;
> Wouldn't that code be incorrect if there are further format argument
> whose conversion could fail also?

Yes; you'd currently have to write:

      wchar_t *value = NULL;
      int len = 0;
      if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
          goto onError;
      ...

  onError:
      if (value)
          PyMem_Free(value);
      return NULL;

> I think format specifiers that require explicit memory management are
> so difficult to use that they must be avoided. I'd be in favour of
> extending the argtuple type to include additional slots for objects
> that go away when the tuple goes away.

I don't understand that last comment.

Anyway, you've got a point there: allocated buffers should be
freed in case the PyArg_ParseTuple() API fails (and then
reset the *buffer pointer to NULL).

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan  9 23:57:15 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 00:57:15 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Wed, 9 Jan 2002 23:12:59 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> Now I've looked somewhat through the Python source, and found
> Mac/Modules/Win/_Winmodule.c:WinObj_SetWindowModality (taking an
> arbitrary one that seemed to match your description of "Windows"). Is that
> one of the examples you were referring to? If so, I still cannot
> understand the example. It reads
> 	if (!PyArg_ParseTuple(_args, "lO&",
> 	                      &inModalKind,
> 	                      WinObj_Convert, &inUnavailableWindow))
> so it appears that you would like to rewrite this as
> 	if (!PyArg_ParseTuple(_args, "lO@",
> 	                      &inModalKind,
> 	                      WinObj_Type, &inUnavailableWindow))
> Now, if that is how it is supposed to look like: How exactly would it
> work? WinObj_Convert accepts None, integers, and WinObjs. It seems
> that the rewritten version would only accept WinObj objects.

Basically correct, but there is no reason why the rewritten version
would accept only WinObj's. ("O@", typeobj, ptr) would call
typeobj->tp_convert(arg[i], ptr)
and the semantics of tp_convert would be '"cast" arg PyObject to
whatever your type is and store the C pointer value for that thing in
ptr'. Or, to make things clearer, WinObj_Type->tp_convert would simply
point to the current WinObj_Convert function.

> > If we had something like ("O@", typeobject) calldll could 
> > be extended so you could do something like
> > psapilib = calldll.getlibrary(....)
> > ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> > 	Carbon.Qd.GrafPortType)
> > 
> > (newcall() arguments are funcpointer, return value type, arg1 type, ...)
> > 
> > You cannot do this currently
> Please let me try to summarize what this is doing: Given a type object
> and a long, create an instance of that type. Is that a correct
> analysis of what has to be done?

That would allow you to do the same thing, but rather more error prone
(i.e. I think it is much more of a hack than what I'm trying to get
at). As you noted above WinObj's unfortunately need such a hack, but I
would expect to get rid of it as soon as possible. I really don't like
passing C pointers around in Python integers.

> I completely fail to see how O& fits into the puzzle. AFAICT,
> conversion of the return value occurs inside cdc_call. There is no
> tuple to parse anyway nearby.

Not at the moment, but in calldll version 2 there would be. Instead
of passing types as "l" or "h" you would pass type objects to
newcall(). Newcall() would probably special-case the various ints but
for all other types simply call PyArg_Parse(arg, "O@", typeobj,
&voidptr).
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 00:17:57 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 01:17:57 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "M.-A. Lemburg" <> ,
 Wed, 09 Jan 2002 17:45:19 +0100 , <>
Message-ID: <>

Recently, "M.-A. Lemburg" <> said:
> How about this: we add a wchar_t codec to Python and the "eu#" parser
> marker. Then you could write:
> 	wchar_t *value = NULL;
> 	int len = 0;
> 	if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
>                 return NULL;

I like it! Even though I have to do the memory management myself (and
have to think of the error case) it at least looks reasonable. I'm
assuming here that if I pass a StringObject it will be unicode-encoded
using the default encoding, and that unicode value will then be
converted to wchar_t and put in value, right? Or, in other words,
passing "a.out" will do the same as passing u"a.out"...

One minor misgiving is that this call will *always* copy the string,
even if the internal coding of unicode objects is wchar_t. That's a
bit of a nuisance, but we can try to fix that later.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 00:18:09 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 01:18:09 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <008f01c19956$73c4f790$0acc8490@neil> (
References: <> <> <008f01c19956$73c4f790$0acc8490@neil>
Message-ID: <>

>    This could be a problem with my file patches as I have been using
> PyUnicode_AS_UNICODE which will return 4 byte strings if Py_UNICODE_WIDE is
> defined. 4 byte strings can not be passed to the Windows API. So it looks
> like PyUnicode_AsWideChar has to be used instead with a wrapper to allocate
> enough memory to hold the resulting string.

Yes. Unfortunately, that would be much more inefficient. So I'd
suggest you just put an assertion into the code that Py_UNICODE is the
same size as WCHAR (that can be even done through a preprocessor
#error, using the _SIZE #defines). I expect people will resist
changing Py_UNICODE on Windows for quite some time, even if other
platforms move on.


From  Thu Jan 10 00:21:14 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 01:21:14 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "Thomas Heller" <> ,
 Wed, 9 Jan 2002 17:56:47 +0100 , <015a01c1992e$b8557c40$e000a8c0@thomasnotebook>
Message-ID: <>

Recently, "Thomas Heller" <> said:
> I've reread your original O@ proposal, and I like it very much.
> Aren't there really any other positive responses?

You and Marc-Andre, so far.

I'll write a PEP, as MAL suggested. Sigh, two PEPs on my plate:-)
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 07:17:30 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 08:17:30 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (
References: <> <> <> <>
Message-ID: <>

> > I think format specifiers that require explicit memory management are
> > so difficult to use that they must be avoided. I'd be in favour of
> > extending the argtuple type to include additional slots for objects
> > that go away when the tuple goes away.
> I don't understand that last comment.

I was suggesting that the tuple passed to C API should not be of <type
'tuple'>, but of <type 'argtuple'>, which should have a method
add_object(o), which puts a reference to o into the tuple. Then,
whenever you want to return memory to the user, you create a string
object whose contents is that memory, and you put a reference to the
string into the argument tuple. 

The author of the C function then does not need to worry about memory
management: the memory will be deallocated when the argument tuple is
deallocated.

Unfortunately, that approach cannot be used for the existing
conversion codes that return memory, since it is the extension's job
to release the memory; changing that would break extensions which do
properly release memory.
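
The ownership idea can be modelled in pure Python 3 (an assumption; the real thing would live in C). The classes `ArgTuple` and `Buffer` below are hypothetical stand-ins, not an existing API. The tuple keeps a reference to every helper object handed to `add_object()`, so those objects live exactly as long as the tuple does:

```python
import gc
import weakref

class Buffer:
    """Hypothetical stand-in for a block of converted memory."""
    def __init__(self, data):
        self.data = data

class ArgTuple(tuple):
    """Argument tuple that also owns objects created while parsing it."""
    def __new__(cls, items=()):
        self = super().__new__(cls, items)
        self._owned = []  # extra references, released together with the tuple
        return self

    def add_object(self, obj):
        """Keep obj alive until this tuple goes away; return it for convenience."""
        self._owned.append(obj)
        return obj
```

A caller would hold only a borrowed pointer into the buffer and never free it; the cleanup happens when the argument tuple is released.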


From  Thu Jan 10 07:32:20 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 08:32:20 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <> (message from Jack
 Jansen on Thu, 10 Jan 2002 01:17:57 +0100)
References: <>
Message-ID: <>

> One minor misgiving is that this call will *always* copy the string,
> even if the internal coding of unicode objects is wchar_t. That's a
> bit of a nuisance, but we can try to fix that later.

Not sure what you mean by "later". Once this is being used, you cannot
fix it anymore. Extensions *will* have to call PyMem_Free, and when
they do so, changing the format specifier to do something better won't
be possible, anymore, since the call to PyMem_Free will be in the way.


From  Thu Jan 10 07:27:39 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 08:27:39 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: <> (message from Jack
 Jansen on Thu, 10 Jan 2002 00:57:15 +0100)
References: <>
Message-ID: <>

> Or, to make things clearer, WinObj_Type->tp_convert would simply
> point to the current WinObj_Convert function.

So what do you gain with that extension? It seems all that is done is
you can replace _Convert by _Type everywhere, with no additional
change to the semantics.

> > > ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> > > 	Carbon.Qd.GrafPortType)
> Not at the moment, but in calldll version 2 there would be. In stead
> of passing types as "l" or "h" you would pass type objects to
> newcall(). Newcall() would probably special-case the various ints but
> for all other types simply call PyArg_Parse(arg, "O@", typeobj,
> &voidptr). 

I still don't understand. In your example, GrafPortType is a return
type, not an argument type. So you *have* an anything, and you *want*
the GrafPortType. How exactly do you use PyArg_Parse in that scenario?

Also, why would you use this extension inside newcall()? I'd rather
expect it in ps_GetDrawableSurface.__call__ instead (i.e. when you
deal with a specific call, not when you create the callable instance).


From Anthony Baxter <>  Thu Jan 10 08:17:02 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Thu, 10 Jan 2002 19:17:02 +1100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Message from (Barry A. Warsaw)
 of "Fri, 04 Jan 2002 02:53:11 CDT." <>
Message-ID: <>

>>> Barry A. Warsaw wrote
>     AB> Ok, I'd like to make the 2.1.2 release some time in the first
>     AB> half of the week starting 7th Jan, assuming that's ok for the
>     AB> folks who'll need to do the work on the PC/Mac packaging.

I'm doing this this evening; i.e. now.

> I'd be more inclined to clone PEP 101 into a PEP 102 with micro
> release instructions.  The nice thing about 101 is that you can just
> go down the list, checking things off in a linear fashion as you
> complete each item.  I'd be loath to break up the linearity of that.

Ok. I'm doing this as I go. Should I just check in PEP 102 directly, or
is that Not The Done Thing?

>     AB> I don't have access to, so someone else's
>     AB> going to need to do this.
> I can certainly help with any fiddling necessary on creosote.  Then
> again...
> ...if this is going to be a recurring role, we might just want to give
> you access to the web cvs tree and creosote.

Whichever works for you.

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Thu Jan 10 08:49:32 2002
From: (M.-A. Lemburg)
Date: Thu, 10 Jan 2002 09:49:32 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <>
Message-ID: <>

Jack Jansen wrote:
> Recently, "M.-A. Lemburg" <> said:
> > How about this: we add a wchar_t codec to Python and the "eu#" parser
> > marker. Then you could write:
> >
> >       wchar_t *value = NULL;
> >       int len = 0;
> >       if (!PyArg_ParseTuple(tuple, "eu#", "wchar_t", &value, &len))
> >                 return NULL;
> I like it! Even though I have to do the memory management myself (and
> have to think of the error case) it at least looks reasonable. 

Good :-)

> I'm
> assuming here that if I pass a StringObject it will be unicode-encoded
> using the default encoding, and that unicode value will then be
> converted to wchar_t and put in value, right? Or, in other words,
> passing "a.out" will do the same as passing u"a.out"...

> One minor misgiving is that this call will *always* copy the string,
> even if the internal coding of unicode objects is wchar_t. That's a
> bit of a nuisance, but we can try to fix that later.

Copying will always take place (either into a preallocated buffer
or one which the PyArg_ParseTuple() API allocates), but then: 
that's the cost you have to pay for the simplicity of the approach.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 10 09:10:58 2002
From: (Thomas Heller)
Date: Thu, 10 Jan 2002 10:10:58 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <>
Message-ID: <03be01c199b6$cfcfe350$e000a8c0@thomasnotebook>

> > > If we had something like ("O@", typeobject) calldll could 
> > > be extended so you could do something like
> > > psapilib = calldll.getlibrary(....)
> > > ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> > > Carbon.Qd.GrafPortType)
> > > 
> > > (newcall() arguments are funcpointer, return value type, arg1 type, ...)
> > > 
> > > You cannot do this currently
> > 
> > Please let me try to summarize what this is doing: Given a type object
> > and a long, create an instance of that type. Is that a correct
> > analysis of what has to be done?
> That would allow you to do the same thing, but rather more error prone
> (i.e. I think it is much more of a hack than what I'm trying to get
> at). As you noted above WinObj's unfortunately need such a hack, but I
> would expect to get rid of it as soon as possible. I really don't like
> passing C pointers around in Python integers.
> > I completely fail to see how O& fits into the puzzle. AFAICT,
> > conversion of the return value occurs inside cdc_call. There is no
> > tuple to parse anyway nearby.
> Not at the moment, but in calldll version 2 there would be. In stead
> of passing types as "l" or "h" you would pass type objects to
> newcall(). Newcall() would probably special-case the various ints but
> for all other types simply call PyArg_Parse(arg, "O@", typeobj,
> &voidptr). 

Here's an outline which could work in 2.2:

Create a subtype of type, having a tp_convert slot:

typedef int (*convert_func)(PyObject *, void *);

typedef struct {
    PyTypeObject type;
    convert_func tp_convert;
} WrapperTypeType;

and use it as metaclass (metatype?) for your WindowObj:

class WindowObj(...):
    __metaclass__ = WrapperTypeType

Write a function to return a conversion function:

convert_func get_converter(PyTypeObject *type)
{
    if (WrapperTypeType_Check(type))
        return ((WrapperTypeType *)type)->tp_convert;
    /* code to check additional types and return their converters */
    return NULL;
}

and then

if (!PyArg_ParseTuple(args, "O&", get_converter(WinObj_Type), &Window))

How does this sound?
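
The shape of this outline can be tried in pure Python 3 (an assumption; the C-level slot does not exist). `WrapperType`, `WindowObj` and `get_converter` are hypothetical names that mirror the outline, and the converter accepts None, integers and window objects, as WinObj_Convert is described as doing earlier in the thread:

```python
class WrapperType(type):
    """Metatype marking types that carry a tp_convert function."""

class WindowObj(metaclass=WrapperType):
    def __init__(self, handle):
        self.handle = handle

    @staticmethod
    def tp_convert(obj):
        """Return the C-level handle (modelled here as an int) for obj."""
        if obj is None:
            return 0
        if isinstance(obj, int):
            return obj
        if isinstance(obj, WindowObj):
            return obj.handle
        raise TypeError("expected None, int or WindowObj")

def get_converter(typeobj):
    """Go from a type object to its converter, the step O& cannot do."""
    if isinstance(typeobj, WrapperType):
        return typeobj.tp_convert
    raise TypeError(f"{typeobj.__name__} has no converter")
```

The point of the metatype is that code driven from Python, such as calldll, only needs the type object to find the conversion routine.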


From  Thu Jan 10 09:22:29 2002
From: (Thomas Heller)
Date: Thu, 10 Jan 2002 10:22:29 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <>
Message-ID: <03d001c199b8$6c70d290$e000a8c0@thomasnotebook>

> > How can I do the equivalent of
> >   u"some string"
> > in terms of
> >   unicode("some string", encoding)
> Again, what do you need that for? If there won't be any escape
> sequences or non-ASCII characters inside, then
>    unicode("some string", "ascii")
> will work fine. In the general case,
>    unicode("some string", "unicode-escape")
> should work.

In the case of pure ASCII, unicode("some string") also works.

Here's what I'm trying to do:
I have a string variable containing some non-ascii characters (from
a characterset which was previously called 'ansi' instead of 'oem'
on windows).
For example the copyright symbol "©" (repr("©") gives "\xa9").
Now I want to convert this string to unicode.
u"©" works fine, unicode(variable) gives an ASCII decoding error.


From  Thu Jan 10 11:14:31 2002
From: (M.-A. Lemburg)
Date: Thu, 10 Jan 2002 12:14:31 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook>
Message-ID: <>

Thomas Heller wrote:
> > > How can I do the equivalent of
> > >   u"some string"
> > > in terms of
> > >   unicode("some string", encoding)
> For example the copyright symbol "©" (repr("©") gives "\xa9").
> Now I want to convert this string to unicode.
> u"©" works fine, unicode(variable) gives an ASCII decoding error.

u"something" maps to unicode("something", "latin-1"). This is because
Unicode literals in Python are interpreted as being Latin-1.

See the source code encoding PEP (0263) for details on what could be
done to make this user-configurable.
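
In present-day Python 3 terms (an assumption; `to_text` is a hypothetical helper), the difference between the implicit ASCII default and an explicit Latin-1 decode looks roughly like this:

```python
raw = b"\xa9"  # the copyright sign as a single Latin-1 byte

def to_text(data, encoding="latin-1"):
    """Decode bytes with an explicitly named encoding (Latin-1 by default)."""
    return data.decode(encoding)
```

Here `to_text(raw)` yields the copyright sign, and so does `to_text(raw, "cp1252")`, the Windows "ansi" code page, because both encodings map 0xA9 to it. With "ascii" the call raises UnicodeDecodeError, which matches the error reported above.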

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 10 12:24:29 2002
From: (Anthony Baxter)
Date: Thu, 10 Jan 2002 23:24:29 +1100
Subject: [Python-Dev] 2.1.2 testing.
Message-ID: <>

Has anyone had a chance to test that 2.1.2 builds and works correctly 
on anything? I'm testing on the following systems. sourceforge compile
farm boxes are marked as such, compaq testdrive boxes to arrive as well[1].

For each, a fresh cvs export, followed by ./configure ; make ; make test.

Are there additional useful tests that could be run?

Linux/x86               Redhat 6.2                     PASSED
Linux/x86               Redhat 7.1                     PASSED
Linux/x86               Redhat 7.2                     PASSED
Solaris/sparc           2.7 (gcc-2.95.2)               PASSED
Linux/x86               Debian 2.2 (         PASSED 
Linux/PPC [RS/6000]     Debian 2.2 (         PASSED
Linux/alpha             Debian 2.2 (         PASSED
FreeBSD                 4.4 (                PASSED
Solaris/sparc           2.8 ( (gcc-2.95.2)   PASSED
Tru64/Alpha             4.0 (compaq)                   ... still building ...
Tru64/Alpha             5.1 (compaq)                   ... to be done ...

Linux/sparc             Debian 2.2 (         FAILED
This is scary. I don't know why this one alone fails - it fails the
test_math test.

Running the test by hand:
    anthonybaxter@usf-cf-sparc-linux-1:~/python212_linxsparc$ PYTHONPATH= ./python  ./Lib/test/
    math module, testing with eps 1e-05
    Traceback (most recent call last):
      File "./Lib/test/", line 21, in ?
        testit('acos(-1)', math.acos(-1), math.pi)
    OverflowError: math range error

Running math.acos(-1) gives the correct answer. Anyone got any idea?

I couldn't get py212 to build on our remaining solaris/x86 box, but then
I can't get 2.1.1 to build on it either, without a whole lot of manual 
hackery - so I don't care about that. It's just a stuffed machine. :)

I was hoping to test on MacOS X, but the boxes aren't
answering... anyone else want to give it a go?


[1] sheesh. had to install telnet for the compaq boxes. first time I've not
had ssh access somewhere for a while. . . (plus, they don't have cvs. sigh.)

From  Thu Jan 10 13:33:43 2002
From: (Skip Montanaro)
Date: Thu, 10 Jan 2002 07:33:43 -0600
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: <>
References: <>
Message-ID: <>

    Anthony> Has anyone had a chance to test that 2.1.2 builds and works
    Anthony> correctly on anything? 

I will give it a quick try on my Mandrake 8.1 system.  What's the relevant
CVS branch?  I didn't see anything obvious like "r212".


From Anthony Baxter <>  Thu Jan 10 13:34:56 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Fri, 11 Jan 2002 00:34:56 +1100
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: Message from "Skip Montanaro" <>
 of "Thu, 10 Jan 2002 07:33:43 MDT." <>
Message-ID: <>

It's still release21-maint. I'm waiting til I've finished my testing before
making the tag. (As it's a bugfix release, I'm not making a release branch
off the existing maintenance branch (that path leads to madness))

>>> "Skip Montanaro" wrote
>     Anthony> Has anyone had a chance to test that 2.1.2 builds and works
>     Anthony> correctly on anything? 
> I will give it a quick try on my Mandrake 8.1 system.  What's the relevant
> CVS branch?  I didn't see anything obvious like "r212".
> Skip

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Thu Jan 10 14:04:18 2002
From: (Thomas Heller)
Date: Thu, 10 Jan 2002 15:04:18 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook> <>
Message-ID: <06e901c199df$cb1ff330$e000a8c0@thomasnotebook>

My problem is solved. I'm now using

  unicode(some_string, "latin-1").encode("utf-16-le")

or

  unicode(some_string, "unicode-escape").encode("utf-16-le")

to pack "unicode strings" (not sure about the terminology)
into my structures.
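
A sketch of what such packing might look like, in Python 3 syntax. The
length-prefixed layout and the name pack_wide_string are hypothetical,
chosen only to show the UTF-16-LE bytes ending up in a binary record;
the structures referred to above are not shown in this thread.

```python
import struct


def pack_wide_string(text):
    """Return a length-prefixed UTF-16-LE field (hypothetical layout)."""
    payload = text.encode("utf-16-le")
    # "<H" is a little-endian unsigned 16-bit count of UTF-16 code units.
    return struct.pack("<H", len(payload) // 2) + payload


packed = pack_wide_string("\u00a9 2002")
```

Each character in the Latin-1 range becomes two bytes, low byte first,
so the copyright sign is stored as the byte pair a9 00.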

It seems PEP 100 and the Unicode standard (link in PEP 100)
should be required reading for everyone using unicode.

Thanks again, MaL, Martin, /F.


From  Thu Jan 10 14:31:58 2002
From: (Guido van Rossum)
Date: Thu, 10 Jan 2002 09:31:58 -0500
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
Message-ID: <>

Do we need a beta for the 2.1.2 release?  I think it might be prudent
-- Anthony's last-minute checking of a critical fix to a bug that
prevented compilation on one platform points this out again.

The alternative is to be optimistic, and to quickly release 2.1.3 if
2.1.2 has a problem that we discover after its release.

Opinions?  I think a beta is prudent, and it shouldn't cost too much
more in effort -- if we're lucky, nothing changes and we just fiddle
some version numbers.  If it turns out to be needed, it's better than
having to wear a brown bag over your head. :-)

--Guido van Rossum (home page:

From Anthony Baxter <>  Thu Jan 10 14:44:44 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Fri, 11 Jan 2002 01:44:44 +1100
Subject: [Python-Dev] Re: 2.1.2 release -- do we need a beta?
In-Reply-To: Message from Guido van Rossum <>
 of "Thu, 10 Jan 2002 09:31:58 CDT." <>
Message-ID: <>

>>> Guido van Rossum wrote
> Do we need a beta for the 2.1.2 release?  I think it might be prudent
> -- Anthony's last-minute checking of a critical fix to a bug that
> prevented compilation on one platform points this out again.

Maybe. But on the other hand, I've also done a bunch of different builds
on as many platforms as I could find. 
[The oopsie I found was actually probably the most complex merge of the
lot, and that's not saying much. put it down to too many CVS checkouts 
and not enough brain :)]

The ugliness potential is from either those platforms that are an offense
against nature that no-one thinks to try, or from some sort of weird 
compilation options. I don't think that there's many of the fixes in 
the 2.1.2 code that are going to break something that worked before - 
with the list of platforms I've hit tonight, I think I've got most of
the new code exercised. (One of the minor-ish constraints I put on 
candidate fixes was whether or not I could easily test it.) 

The other question I have to ask is whether people will actually download
and test a beta/release candidate of a bugfix release. 

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Thu Jan 10 15:01:52 2002
From: (Skip Montanaro)
Date: Thu, 10 Jan 2002 09:01:52 -0600
Subject: [Python-Dev] 2.1.2 build on Mandrake
Message-ID: <>

I got the usual output on my Mandrake 8.1 system when building the
release21-maint branch:

    126 tests OK.
    1 test failed: test_linuxaudiodev
    13 tests skipped: test_al test_cd test_cl test_dbm test_dl test_gl
    test_imgfile test_largefile test_nis test_socketserver test_sunaudiodev
    test_winreg test_winsound


From  Thu Jan 10 15:07:49 2002
From: (Thomas Heller)
Date: Thu, 10 Jan 2002 16:07:49 +0100
Subject: [Python-Dev] Re: 2.1.2 release -- do we need a beta?
References: <>
Message-ID: <075701c199e8$afb9b410$e000a8c0@thomasnotebook>

> >>> Guido van Rossum wrote
> > Do we need a beta for the 2.1.2 release?  I think it might be prudent
> > -- Anthony's last-minute checking of a critical fix to a bug that
> > prevented compilation on one platform points this out again.
> Maybe. But on the other hand, I've also done a bunch of different builds
> on as many platforms as I could find. 
> [The oopsie I found was actually probably the most complex merge of the
> lot, and that's not saying much. put it down to too many CVS checkouts 
> and not enough brain :)]
> The ugliness potential is from either those platforms that are an offense
> against nature that no-one thinks to try, or from some sort of weird 
> compilation options. I don't think that there's many of the fixes in 
> the 2.1.2 code that are going to break something that worked before - 
> with the list of platforms I've hit tonight, I think I've got most of
> the new code exercised. (One of the minor-ish constraints I put on 
> candidate fixes was whether or not I could easily test it.) 
> The other question I have to ask is whether people will actually download
> and test a beta/release candidate of a bugfix release. 

Given my current schedule I cannot afford to build 2.1.2 from CVS and test
it, but I would certainly try out a beta or rc on win2000. Been burned too
often by a buggy bdist_wininst ;-(


From  Thu Jan 10 15:08:25 2002
From: (Fred L. Drake, Jr.)
Date: Thu, 10 Jan 2002 10:08:25 -0500 (EST)
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: <>
References: <>
Message-ID: <>

[Sending to python-dev so people know the results for Solaris 2.8.]

Anthony Baxter writes:
 > > I'm exporting onto Solaris 2.8 now; will report the results.
 > Great. If you get a chance to try it with some non-standard build
 > args, that would also be appreciated...

  Specific suggestions please!  I've not looked at those in a while,
so I don't know which would be most useful.  A note summarizing
desired alternate builds would be good.
  Results for Solaris 2.8, gcc 2.95.2:

117 tests OK.
1 test failed: test_sunaudiodev
22 tests skipped: test_al test_bsddb test_cd test_cl test_dl test_gdbm test_gl test_gzip test_imgfile test_largefile test_linuxaudiodev test_minidom test_nis test_openpty test_pyexpat test_sax test_socketserver test_sundry test_winreg test_winsound test_zipfile test_zlib

  The sunaudiodev failure is "permission denied", which is not a real
failure; treat this as skipped.  (The machine is not local to me, so
there's no way for me to know if the test worked anyway.)
  Note that many of the optional modules don't get built on that
machine, but I can't do much (and nothing quickly) to change the
availability of additional libraries.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Thu Jan 10 15:20:51 2002
From: (Sjoerd Mullender)
Date: Thu, 10 Jan 2002 16:20:51 +0100
Subject: [Python-Dev] 2.1.2 testing.
References: <>
Message-ID: <>

I tried it on IRIX 6.5.13m (SGI) using gcc, and I saw two problems in the test
set.  One was in test_locale and can be written off as a bug in the IRIX
environment.  The other was in test_pty for which there is a fix.  Just get
the latest version of test_pty (the bug is in the test).

One more problem I saw was that test_sundry was skipped with the message that
there was an unresolvable symbol in by the name of dbopen.  I don't
quite understand why this is.

Anthony Baxter wrote:
> Has anyone had a chance to test that 2.1.2 builds and works correctly
> on anything? I'm testing on the following systems. sourceforge compile
> farm boxes are marked as such, compaq testdrive boxes to arrive as well[1].
> For each, a fresh cvs export, followed by ./configure ; make ; make test.
> Are there additional useful tests that could be run?
> Linux/x86               Redhat 6.2                     PASSED
> Linux/x86               Redhat 7.1                     PASSED
> Linux/x86               Redhat 7.2                     PASSED
> Solaris/sparc           2.7 (gcc-2.95.2)               PASSED
> Linux/x86               Debian 2.2 (         PASSED
> Linux/PPC [RS/6000]     Debian 2.2 (         PASSED
> Linux/alpha             Debian 2.2 (         PASSED
> FreeBSD                 4.4 (                PASSED
> Solaris/sparc           2.8 ( (gcc-2.95.2)   PASSED
> Tru64/Alpha             4.0 (compaq)                   ... still building ...
> Tru64/Alpha             5.1 (compaq)                   ... to be done ...
> Linux/sparc             Debian 2.2 (         FAILED
> This is scary. I don't know why this one alone fails - it fails the
> test_math test.
> Running the test by hand:
>     anthonybaxter@usf-cf-sparc-linux-1:~/python212_linxsparc$ PYTHONPATH= ./python  ./Lib/test/
>     math module, testing with eps 1e-05
>     constants
>     acos
>     Traceback (most recent call last):
>       File "./Lib/test/", line 21, in ?
>         testit('acos(-1)', math.acos(-1), math.pi)
>     OverflowError: math range error
> Running math.acos(-1) gives the correct answer. Anyone got any idea?
> I couldn't get py212 to build on our remaining solaris/x86 box, but then
> I can't get 2.1.1 to build on it either, without a whole lot of manual
> hackery - so I don't care about that. It's just a stuffed machine. :)
> I was hoping to test on MacOS X, but the boxes aren't
> answering... anyone else want to give it a go?
> Anthony
> [1] sheesh. had to install telnet for the compaq boxes. first time I've not
> had ssh access somewhere for a while. . . (plus, they don't have cvs. sigh.)

From  Thu Jan 10 15:29:57 2002
From: (Barry A. Warsaw)
Date: Thu, 10 Jan 2002 10:29:57 -0500
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
References: <>
Message-ID: <>

>>>>> "GvR" == Guido van Rossum <> writes:

    GvR> Do we need a beta for the 2.1.2 release?  I think it might be
    GvR> prudent -- Anthony's last-minute checking of a critical fix
    GvR> to a bug that prevented compilation on one platform points
    GvR> this out again.

    GvR> The alternative is to be optimistic, and to quickly release
    GvR> 2.1.3 if 2.1.2 has a problem that we discover after its
    GvR> release.

We should be prepared for this in any case.

    GvR> Opinions?  I think a beta is prudent, and it shouldn't cost
    GvR> too much more in effort -- if we're lucky, nothing changes
    GvR> and we just fiddle some version numbers.  If it turns out to
    GvR> be needed, it's better than having to wear a brown bag over
    GvR> your head. :-)

I think micro releases should be as lightweight as possible so we
/can/ quickly get a new one out if a small, but important fix becomes

I'd say do a release candidate (which will probably not get much
testing beyond those who test cvs anyway), and then get 2.1.2 final
out.


From Anthony Baxter <>  Thu Jan 10 15:31:54 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Fri, 11 Jan 2002 02:31:54 +1100
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
In-Reply-To: Message from (Barry A. Warsaw)
 of "Thu, 10 Jan 2002 10:29:57 CDT." <>
Message-ID: <>

>>> Barry A. Warsaw wrote
> I'd say do a release candidate (which will probably not get much
> testing beyond those who test cvs anyway), and then get 2.1.2 final
> out.

Ok. In that case I'll put the version back to rc1, and will start rolling
the tarball? 

(Not going to do it immediately - but soonish...)


From  Thu Jan 10 15:34:24 2002
From: (Barry A. Warsaw)
Date: Thu, 10 Jan 2002 10:34:24 -0500
Subject: [Python-Dev] 2.1.2 testing.
References: <>
Message-ID: <>

>>>>> "SM" == Sjoerd Mullender <> writes:

    SM> One more problem I saw was that test_sundry was skipped with
    SM> the message that there was an unresolvable symbol in
    SM> by the name of dbopen.  I don't quite understand why this is.

Hmm, if this was 2.2.1 I'd say it's the known brokenness of
w.r.t. bsddbmodule on some systems.  I think the is okay in
2.1.x but I'm doing a build on Mandrake 8.1 now...


From  Thu Jan 10 15:42:35 2002
From: (Guido van Rossum)
Date: Thu, 10 Jan 2002 10:42:35 -0500
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: Your message of "Thu, 10 Jan 2002 16:20:51 +0100."
References: <>
Message-ID: <>

> One more problem I saw was that test_sundry was skipped with the
> message that there was an unresolvable symbol in by the
> name of dbopen.  I don't quite understand why this is.

test_sundry shouldn't import dbhash.

--Guido van Rossum (home page:

From  Thu Jan 10 15:43:39 2002
From: (Guido van Rossum)
Date: Thu, 10 Jan 2002 10:43:39 -0500
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
In-Reply-To: Your message of "Fri, 11 Jan 2002 02:31:54 +1100."
References: <>
Message-ID: <>

> > I'd say do a release candidate (which will probably not get much
> > testing beyond those who test cvs anyway), and then get 2.1.2 final
> > out.


> Ok. In that case I'll put the version back to rc1, and will start rolling
> the tarball? 
> (Not going to do it immediately - but soonish...)

But how about the Windows installer?  A release isn't done without it.

--Guido van Rossum (home page:

From Anthony Baxter <>  Thu Jan 10 15:51:57 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Fri, 11 Jan 2002 02:51:57 +1100
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
In-Reply-To: Message from Guido van Rossum <>
 of "Thu, 10 Jan 2002 10:43:39 CDT." <>
Message-ID: <>

>>> Guido van Rossum wrote
> [Anthony]
> > Ok. In that case I'll put the version back to rc1, and will start rolling
> > the tarball? 
> > 
> > (Not going to do it immediately - but soonish...)
> But how about the Windows installer?  A release isn't done without it.

That, I can't help you with. I don't have access to MSVC, and I don't have
the requisite level of windows knowledge to do the build, anyway. (PEP-0101
refers to 'Windows Magic'.) What's Tim's time like?


Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Thu Jan 10 16:04:00 2002
From: (Fred L. Drake, Jr.)
Date: Thu, 10 Jan 2002 11:04:00 -0500 (EST)
Subject: [Python-Dev] Python 2.1.2c1 docs
Message-ID: <>

  The documentation for Python 2.1.2c1 is online at:

  Please report any real problems with these docs to me or Anthony
with a level 7 priority and set the "Group" to "Python 2.1.2".
  To file a bug, log into SourceForge and then visit:



Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Thu Jan 10 16:13:28 2002
From: (Barry A. Warsaw)
Date: Thu, 10 Jan 2002 11:13:28 -0500
Subject: [Python-Dev] 2.1.2 testing.
References: <>
Message-ID: <>

All the tests pass on Mandrake 8.1, including LFS with the CC='...'
configure instruction.


From Anthony Baxter <>  Thu Jan 10 16:15:15 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Fri, 11 Jan 2002 03:15:15 +1100
Subject: [Python-Dev] Re: [Zope.Com Geeks] 2.1.2 testing.
In-Reply-To: Message from Jens Vagelpohl <>
 of "Thu, 10 Jan 2002 11:13:54 CDT." <>
Message-ID: <>

Jens tested MacOS X. 

Results below:

>>> Jens Vagelpohl wrote
> anthony,
> here is what i found:
> as a "basic" build with the minimum configure options required to compile 
> (--with-dyld --with-suffix) i get the following test results (after upping 
> the stacksize to allow the re and sre-tests to succeed):
> 119 tests OK.
> 1 test failed: test_largefile
> 20 tests skipped: test_al test_cd test_cl test_dl test_fcntl test_gdbm 
> test_gl test_imgfile test_linuxaudiodev test_locale test_minidom test_nis 
> test_poll test_pty test_pyexpat test_sax test_socketserver 
> test_sunaudiodev test_winreg test_winsound
> make: *** [test] Error 1
> this is the same result as with 2.1.1. then, as a last test, i built and 
> tested zope with it. all unit tests run except for 3 ZODB tests which are 
> most likely not due to python misbehaving.
> lookin' good!
> jens

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Thu Jan 10 16:38:27 2002
From: (Fredrik Lundh)
Date: Thu, 10 Jan 2002 17:38:27 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook>
Message-ID: <0da701c199f5$3bf4c9e0$0900a8c0@spiff>

thomas wrote:
> I have a string variable containing some non-ascii characters (from
> a characterset which was previously called 'ansi' instead of 'oem'
> on windows).

short answer: "iso-8859-1" should work


longer answer:

windows "ansi" is an alias for the encoding you get from

    import locale
    language, encoding = locale.getdefaultlocale()

for people in western europe/north america, that's usually
"cp1252", which is a microsoft version of latin-1:

(characters 0x80-0x9f aren't part of iso-8859-1, aka latin-1)
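
A small Python 3 sketch of where the two encodings part ways; the
variable names are illustrative only. Bytes from 0xa0 upwards decode
identically, while the 0x80-0x9f block does not.

```python
byte = b"\x80"

in_cp1252 = byte.decode("cp1252")    # EURO SIGN in the Windows code page
in_latin1 = byte.decode("latin-1")   # U+0080 when decoded as Latin-1

# From 0xa0 upwards the two codecs agree, e.g. for the copyright sign.
same_above_9f = b"\xa9".decode("cp1252") == b"\xa9".decode("latin-1")
```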

cheers /F

From  Thu Jan 10 18:04:52 2002
From: (Greg Ward)
Date: Thu, 10 Jan 2002 13:04:52 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
Message-ID: <>

I have an application (Grouch) that has to do a lot of trickery at
pickle-time and unpickle-time, and as a result it happens to be
sensitive to the order of unpickling.

(The reason for the pickle-time intervention is that Grouch stores type
objects in its data structure, and you can't pickle type objects.  So it
hangs on to a representative value of the type for pickling -- eg. for the
"integer" type, it keeps both IntType and 0 in memory, but only pickles
0, and uses type(0) to get IntType back at unpickle time.)
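
The representative-value trick might be sketched like this in Python 3.
The table and function names are hypothetical and are not Grouch's
actual code; note also that modern Python can pickle built-in types by
reference, so the workaround is shown only to illustrate the idea.

```python
import pickle

# Hypothetical table: one simple, picklable value per supported type.
REPRESENTATIVES = {int: 0, float: 0.0, str: ""}


def dump_type(tp):
    """Pickle a stand-in value instead of the type object itself."""
    return pickle.dumps(REPRESENTATIVES[tp])


def load_type(data):
    """Recover the type from the unpickled stand-in value."""
    return type(pickle.loads(data))


restored = load_type(dump_type(int))
```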

The reason that Grouch is sensitive to the order of unpickling is
because its data structure is a gnarly, incestuous knot of mutually
interdependent classes, and I stopped tinkering with the pickle code as
soon as I got something that worked with Python 2.0 and 2.1.  Now it
fails under 2.2.  Under 2.1, it appears that certain more-deeply nested
objects were unpickled first; under 2.2, that is no longer the case, and
that screws up Grouch's test suite.

Anyone got a vague, hand-waving explanation for my vague, hand-waving
complaint?  Or should I try to come up with a test case?

Thanks --

Greg Ward - software developer      
MEMS Exchange                  

From  Thu Jan 10 18:24:13 2002
From: (Tim Peters)
Date: Thu, 10 Jan 2002 13:24:13 -0500
Subject: [Python-Dev] 2.1.2 release -- do we need a beta?
In-Reply-To: <>
Message-ID: <>

> Do we need a beta for the 2.1.2 release?

Yes, and whether or not I do a Windows release.  We could call it a "release
candidate" (i.e., 2.1.2c1).

From  Thu Jan 10 19:02:02 2002
From: (M.-A. Lemburg)
Date: Thu, 10 Jan 2002 20:02:02 +0100
Subject: [Python-Dev] Change in unpickle order in 2.2?
References: <>
Message-ID: <>

Greg Ward wrote:
> I have an application (Grouch) that has to do a lot of trickery at
> pickle-time and unpickle-time, and as a result it happens to be
> sensitive to the order of unpickling.

What's Grouch ?
> (The reason for the pickle-time intervention is that Grouch stores type
> objects in its data structure, and you can't pickle type objects.  So it
> hangs on to a representative value of the type for pickling -- eg. for the
> "integer" type, it keeps both IntType and 0 in memory, but only pickles
> 0, and uses type(0) to get IntType back at unpickle time.)

Why don't you use a special reduce function which takes the
tp_name as index into the types module ? Storing strings should
avoid all complicated type object saving.
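
One way to read this suggestion, as a Python 3 sketch: store only the
type's name and look it up again on load. The lookup goes through the
builtins module here, which is an assumption standing in for the types
module of Python 2; the function names are hypothetical.

```python
import builtins
import pickle


def dump_type_by_name(tp):
    """Pickle only the type's name, a plain string."""
    return pickle.dumps(tp.__name__)


def load_type_by_name(data):
    """Look the name up again; builtins stands in for the types module."""
    return getattr(builtins, pickle.loads(data))


restored = load_type_by_name(dump_type_by_name(int))
```
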
> The reason that Grouch is sensitive to the order of unpickling is
> because its data structure is a gnarly, incestuous knot of mutually
> interdependent classes, and I stopped tinkering with the pickle code as
> soon as I got something that worked with Python 2.0 and 2.1.  Now it
> fails under 2.2.  Under 2.1, it appears that certain more-deeply nested
> objects were unpickled first; under 2.2, that is no longer the case, and
> that screws up Grouch's test suite.
> Anyone got a vague, hand-waving explanation for my vague, hand-waving
> complaint?  Or should I try to come up with a test case?

You should probably first check whether the pickle string is
identical in 2.1 and 2.2 and then go on from there.
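
With a current Python, one way to make such a comparison readable is to
disassemble the pickle and diff the listings. This is only a sketch:
pickletools is not available in the 2.1 and 2.2 releases discussed
here, and pickle_listing is a hypothetical helper name.

```python
import io
import pickle
import pickletools


def pickle_listing(obj, protocol=0):
    """Return a textual opcode listing of obj's pickle."""
    data = pickle.dumps(obj, protocol)
    out = io.StringIO()
    pickletools.dis(data, out=out)
    return out.getvalue()


# The listing shows the order in which nested objects are written,
# which is also the order in which they are rebuilt on load.
listing = pickle_listing({"nested": [1, 2, 3]})
```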

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 10 19:42:18 2002
From: (Andrew Kuchling)
Date: Thu, 10 Jan 2002 14:42:18 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Tue, Jan 08, 2002 at 01:41:37PM -0500, Tim Peters wrote:
>Break it into smaller steps so we can narrow down possible causes:

You should have cc'ed Barbara on that.  I forwarded your message to her 
and she wrote back (eventually):

>BTW, I forgot to pass this on yesterday, but I tried the code in Tim Peters'
>e-mail yesterday and the delay happens during the code = compile(...)

She's going to install sshd on her machine, so maybe this weekend I'll
be able to log in, compile Python from source, and poke around in an
effort to figure out what's going on.

--amk                                                  (
Our lives are different from anybody else's. That's the exciting
thing. Nobody in the universe can do what we're doing!
    -- The Doctor, in "Tomb of the Cybermen"

From  Thu Jan 10 20:26:32 2002
From: (Tim Peters)
Date: Thu, 10 Jan 2002 15:26:32 -0500
Subject: [Python-Dev] Ouch -- CVS troubles with 2.1.2c1
Message-ID: <>

Trying to add a new file to the release21-maint branch caused CVS commit to
die with an assertion error:

C:\Code\python\dist\src\PCbuild>cvs commit uninstal.wse
RCS file: /cvsroot/python/python/dist/src/PCbuild/Attic/uninstal.wse,v
cvs: commit.c:2104: checkaddfile: Assertion `*rcsnode == ((void *)0)'
Terminated with fatal signal 6
CVS.EXE commit: saving log message in c:\windows\TEMP\3

Trying again finds a stale lock:

C:\Code\python\dist\src\PCbuild>cvs commit uninstal.wse
cvs server: [12:22:36] waiting for tim_one's lock in
cvs server: [12:23:06] waiting for tim_one's lock in

Anyone got a clue?

From  Thu Jan 10 20:27:46 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 21:27:46 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <0da701c199f5$3bf4c9e0$0900a8c0@spiff> (
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook> <0da701c199f5$3bf4c9e0$0900a8c0@spiff>
Message-ID: <>

> windows "ansi" is an alias for the encoding you get from
>     import locale
>     language, encoding = locale.getdefaultlocale()
> for people in western europe/north america

Isn't that also known as "mbcs" in Python? And it is different from
"oem", which is not exposed to Python, right?

> "cp1252", which is a microsoft version of latin-1:
> (characters 0x80-0x9f aren't part of iso-8859-1, aka latin-1)

Strictly speaking, the characters 0x80-0x9f *are* assigned in latin-1,
to control characters - so these assignments differ in CP 1252.
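
This can be checked with a short Python 3 sketch using unicodedata;
the variable names are illustrative. Every byte in the block decodes
under Latin-1 to a code point of category Cc (control), whereas CP 1252
assigns printable characters to most of the same bytes.

```python
import unicodedata

# Decode the whole 0x80-0x9f block as Latin-1.
c1_range = bytes(range(0x80, 0xA0)).decode("latin-1")

# Every resulting code point is a control character (category "Cc").
all_controls = all(unicodedata.category(ch) == "Cc" for ch in c1_range)

# CP 1252 reuses the same byte values, e.g. 0x85 for HORIZONTAL ELLIPSIS.
ellipsis = b"\x85".decode("cp1252")
```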


From  Thu Jan 10 20:37:46 2002
From: (Tim Peters)
Date: Thu, 10 Jan 2002 15:37:46 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
Message-ID: <>

>> Break it into smaller steps so we can narrow down possible causes:

[Andrew Kuchling]
> You should have cc'ed Barbara on that.

I did a Reply-All.  In the copy I got back from Python-Dev,

was in the cc list.  If that didn't reach her, sorry, but I don't think I
could have done more than I did.

> I forwarded your message to her and she wrote back (eventually):
>> BTW, I forgot to pass this on yesterday, but I tried the code in
>> Tim Peters' e-mail yesterday and the delay happens during the code =
>> compile(...) statement.

So it's somewhere in the front end -- that's a real help <wink>.

> She's going to install sshd on her machine, so maybe this weekend I'll
> be able to log in, compile Python from source, and poke around in an
> effort to figure out what's going on.

Did she try Skip's suggestion to try pymalloc?  Given that we believe there
is no Mac-specific code here outside libc, the first suggestion was (and
remains) the best.  The front end will be doing a whale of a lot of mallocs.
If it's like "the usual" malloc disease under glibc, the delays would appear
during the free()s.

From  Thu Jan 10 20:54:56 2002
From: (Tim Peters)
Date: Thu, 10 Jan 2002 15:54:56 -0500
Subject: [Python-Dev] Ouch -- CVS troubles with 2.1.2c1
In-Reply-To: <>
Message-ID: <>

This looks hopeless.  I submitted an SF support request to get the stale
lock removed:

In the meantime, you should expect this:

> cvs server: [12:22:36] waiting for tim_one's lock in
> /cvsroot/python/python/dist/src/PCbuild
> cvs server: [12:23:06] waiting for tim_one's lock in
> /cvsroot/python/python/dist/src/PCbuild
> ...

Perhaps you can arrange to skip the PCbuild directory?

From  Thu Jan 10 21:04:00 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 22:04:00 +0100
Subject: [Python-Dev] Ouch -- CVS troubles with 2.1.2c1
In-Reply-To: <>
References: <>
Message-ID: <>

[Tim Peters]
> This looks hopeless.  I submitted an SF support request to get the stale
> lock removed:

Well, Jacob Moorman is *really* quick with this kind of stuff these
days.

Thanks, Jacob!


From  Thu Jan 10 21:12:31 2002
From: (Skip Montanaro)
Date: Thu, 10 Jan 2002 15:12:31 -0600
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
References: <>
Message-ID: <>

    >> BTW, I forgot to pass this on yesterday, but I tried the code in Tim
    >> Peters' e-mail yesterday and the delay happens during the code =
    >> compile(...)  statement.

I saw the same effect on my Linux laptop (with a mere 128MB).  The disk went
nuts when it tried compiling

    "[" + "2," * 200000 + "]"

VM size as reported by top went to 98.5MB.  This does not appear to be
exclusively a 2.2 issue, as I got this with the fresh 2.1.2 I built this

If you consider what this compiles to:

     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     LOAD_CONST               1 (2)
     BUILD_LIST               200000

To generate that it has to generate and parse a pretty deep abstract syntax
tree.  It looks like symtable_node gets called once for each list element.
There are probably other functions that are called once per list element as


From  Thu Jan 10 19:44:26 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 20:44:26 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <03d001c199b8$6c70d290$e000a8c0@thomasnotebook>
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook>
Message-ID: <>

> >    unicode("some string", "unicode-escape")
> For example the copyright symbol "©" (repr("©") gives "\xa9").
> Now I want to convert this string to unicode.
> u"©" works fine, unicode(variable) gives an ASCII decoding error.

As I said: unicode-escape is the precise encoding that is used to
parse Unicode strings from source files. It interprets all bytes above
128 as Latin-1.
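
In Python 3 terms the codec is spelled "unicode_escape" and operates on
bytes, but the behaviour described can still be sketched. The sample
input below is made up for illustration.

```python
# One raw non-ASCII byte plus one backslash escape, as in a source file.
raw = b"\xa9 and \\u20ac"

# The raw byte 0xa9 is read as Latin-1; the escape is expanded.
decoded = raw.decode("unicode_escape")
```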


From  Thu Jan 10 21:21:27 2002
From: (Thomas Heller)
Date: Thu, 10 Jan 2002 22:21:27 +0100
Subject: [Python-Dev] unicode/string asymmetries
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook> <>
Message-ID: <039701c19a1c$db9f3350$e000a8c0@thomasnotebook>

From: "Martin v. Loewis" <>
> > >    unicode("some string", "unicode-escape")
> [...]
> > For example the copyright symbol "©" (repr("©") gives "\xa9").
> > Now I want to convert this string to unicode.
> > u"©" works fine, unicode(variable) gives an ASCII decoding error.
> As I said: unicode-escape is the precise encoding that is used to
> parse Unicode strings from source files. It interprets all bytes above
> 128 as Latin-1.
I must apologize, because first it didn't seem to work:

>>> print unicode("\xa9", "unicode-escape")

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

but then I found out that the result simply cannot be printed out,
while the repr of it can be:

>>> unicode("\xa9", "unicode-escape")
u'\xa9'
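
The distinction between printing and taking the repr comes down to
encoding. As a Python 3 sketch, assuming an output stream that can only
carry ASCII: encoding the character fails, while the escaped
representation is plain ASCII and always displayable.

```python
text = "\u00a9"

# Writing to an ASCII-only stream requires this encode step to succeed.
try:
    text.encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# ascii() produces an escaped, ASCII-only representation instead.
escaped = ascii(text)
```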



From  Thu Jan 10 21:29:31 2002
From: (Andrew Kuchling)
Date: Thu, 10 Jan 2002 16:29:31 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Thu, Jan 10, 2002 at 03:37:46PM -0500, Tim Peters wrote:
>I did a Reply-All.  In the copy I got back from Python-Dev,  ...

Oops, I misread the headers of the original mail.  Sorry!

>Did she try Skip's suggestion to try pymalloc?  Given that we believe there
>is no Mac-specific code here outside libc, the first suggestion was (and

She hasn't compiled it herself yet, but that's the first thing I'll try.


From  Thu Jan 10 21:08:15 2002
From: (Jacob Moorman)
Date: 10 Jan 2002 16:08:15 -0500
Subject: [Python-Dev] Ouch -- CVS troubles with 2.1.2c1
In-Reply-To: <>
Message-ID: <>

On Thu, 2002-01-10 at 16:04, Martin v. Loewis wrote:
> [Tim Peters]
> > This looks hopeless.  I submitted an SF support request to get the stale
> > lock removed:
> > 
> >
> Well, Jacob Moorman is *really* quick with this kind of stuff these
> days.
> Thanks, Jacob!

As always, we are glad to assist :-)  If ever in the future you or any
other member of your team has support concerns which do not appear to be
receiving the level of response they deserve, feel free to contact me
directly (please include the support request number of the issue in
question) via e-mail at

Issues related to CVS stale locks, repository manipulation, etc. are
treated with our highest priority.  Our stated response time is 'two
business days' (roughly 48-72 hours), however we tend to respond to
these issues much faster than that.

Once again, thanks for the feedback; and do let me know if we may be of
further assistance in the future.

Jacob Moorman
Quality of Service Manager,

From  Thu Jan 10 21:31:06 2002
From: (Martin v. Loewis)
Date: Thu, 10 Jan 2002 22:31:06 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <039701c19a1c$db9f3350$e000a8c0@thomasnotebook>
References: <012501c1987a$0622caa0$e000a8c0@thomasnotebook> <> <01f601c1988b$03b00d30$e000a8c0@thomasnotebook> <> <024a01c198e2$823d2280$e000a8c0@thomasnotebook> <077201c198ec$56b9b740$0900a8c0@spiff> <04ca01c19917$2229b220$e000a8c0@thomasnotebook> <> <03d001c199b8$6c70d290$e000a8c0@thomasnotebook> <> <039701c19a1c$db9f3350$e000a8c0@thomasnotebook>
Message-ID: <>

> >>> unicode("\xa9", "unicode-escape")
> u'\xa9'

As a follow up, in source code, you might want to write


instead, for better readability.
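
The decoding behaviour discussed in this thread can be sketched in
present-day Python 3 (an assumption: the thread itself uses the Python 2
unicode() built-in, which no longer exists). The named escape at the end is
only one readable spelling, shown as an illustration; it is not necessarily
the one suggested above, which is missing from the archive.

```python
# Python 3 sketch; the thread itself uses the Python 2 unicode() built-in.
raw = b"\xa9"  # a single byte above 127

# "unicode_escape" maps bytes above 127 straight to Latin-1 code points.
text = raw.decode("unicode_escape")

# The repr survives an ASCII-only channel ...
ascii_safe = ascii(text)

# ... but the character itself cannot be encoded as ASCII.
try:
    text.encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# One readable way to spell the character in source (illustrative only).
named = "\N{COPYRIGHT SIGN}"
```

This mirrors the observation that the repr of the result can be shown even
when an ASCII-only output channel rejects the character itself.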


From  Thu Jan 10 22:14:44 2002
From: (Skip Montanaro)
Date: Thu, 10 Jan 2002 16:14:44 -0600
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
References: <>
Message-ID: <>

    >> Did she try Skip's suggestion to try pymalloc?  

    amk> She hasn't compiled it herself yet, but that's the first thing I'll
    amk> try.

I did try that when the problem was first raised.  I just tried it again.
It did have a positive effect:

w/ threads and w/o pymalloc:

    user     system    elapsed     CPU
    7.64     0.72      0:09.73     85%
    7.86     0.45      0:08.66     95%
    7.66     0.66      0:08.32     99%

w/o threads and w/ pymalloc:

    user     system    elapsed     CPU
    5.44     0.58      0:06.85     87%
    5.57     0.46      0:06.02    100%
    5.57     0.45      0:06.02     99%

The above was with my memory usage trimmed about as far down as I could get
it (turned off X, for example).  My apologies that both sets of numbers
don't have threads disabled.  It's just what I happened to have lying
around on the disk.
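
For a rough sense of scale, the user-time columns above can be averaged
with a few lines of Python 3. This is only arithmetic on the posted numbers;
as noted, the two runs also differ in thread support, so the difference
cannot be attributed to pymalloc alone.

```python
from statistics import mean

# User times (seconds) copied from the two tables above.
without_pymalloc = [7.64, 7.86, 7.66]  # w/ threads, w/o pymalloc
with_pymalloc = [5.44, 5.57, 5.57]     # w/o threads, w/ pymalloc

mean_without = mean(without_pymalloc)
mean_with = mean(with_pymalloc)

# Fraction of user time saved, relative to the slower configuration.
reduction = (mean_without - mean_with) / mean_without

print(f"mean user time: {mean_without:.2f}s vs {mean_with:.2f}s")
print(f"relative reduction: {reduction:.0%}")
```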


From  Thu Jan 10 22:47:58 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 23:47:58 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Thu, 10 Jan 2002 08:32:20 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> > One minor misgiving is that this call will *always* copy the string,
> > even if the internal coding of unicode objects is wchar_t. That's a
> > bit of a nuisance, but we can try to fix that later.
> Not sure what you mean by "later". Once this is being used, you cannot
> fix it anymore.

By "later" I meant "when your argtuple idea has been accepted":-)

Remember: most of my code is generated anyway, so fixing things like
this is a minor effort.

In case it wasn't clear yet: this is a firm +1 for the argtuple idea.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 22:52:26 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 23:52:26 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Thu, 10 Jan 2002 08:27:39 +0100 , <>
Message-ID: <>

Recently, "Martin v. Loewis" <> said:
> > Or, to make things clearer, WinObj_Type->tp_convert would simply
> > point to the current WinObj_Convert function.
> So what do you gain with that extension? It seems all that is done is
> you can replace _Convert by _Type everywhere, with no additional
> change to the semantics.

Because you can refer to the _Type from Python, that is the whole
point of this exercise. And because you can refer to it from Python
you can pass it to calldll.newcall and such.

> > > > ps_GetDrawableSurface = calldll.newcall(psapilib.ps_GetDrawableSurface,
> > > > 	Carbon.Qd.GrafPortType)
> [...]
> > Not at the moment, but in calldll version 2 there would be. Instead
> > of passing types as "l" or "h" you would pass type objects to
> > newcall(). Newcall() would probably special-case the various ints but
> > for all other types simply call PyArg_Parse(arg, "O@", typeobj,
> > &voidptr). 
> I still don't understand. In your example, GrafPortType is a return
> type, not an argument type. So you *have* an anything, and you *want*
> the GrafPortType. How exactly do you use PyArg_Parse in that scenario?

Sorry, you're right. My example was for a return value, so we're
talking Py_BuildValue here. But this situation is equivalent to a
GrafPort argument, where PyArg_Parse would be used.

> Also, why would you use this extension inside newcall()? I'd rather
> expect it in ps_GetDrawableSurface.__call__ instead (i.e. when you
> deal with a specific call, not when you create the callable instance).

Absolutely right, sloppy typing on my part.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 22:55:00 2002
From: (Jack Jansen)
Date: Thu, 10 Jan 2002 23:55:00 +0100
Subject: [Python-Dev] release for 2.1.2, plus 2.2.1...
In-Reply-To: Message by Anthony Baxter <> ,
 Thu, 10 Jan 2002 19:17:02 +1100 , <>
Message-ID: <>

Recently, Anthony Baxter <> said:
> >>> Barry A. Warsaw wrote
> >     AB> Ok, I'd like to make the 2.1.2 release some time in the first
> >     AB> half of the week starting 7th Jan, assuming that's ok for the
> >     AB> folks who'll need to do the work on the PC/Mac packaging.
> I'm doing this this evening; i.e. now.

And I'm not going to do a MacPython 2.1.2. The effort needed is too
much, and people seem to be happy enough with 2.1.1 (most have
switched to 2.2 anyway).

Oh yes, Anthony: I tried the current 2.1.2 CVS on Mac OS X
(unix-Python), and all problems appear to be solved.

- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 23:02:17 2002
From: (Jack Jansen)
Date: Fri, 11 Jan 2002 00:02:17 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: Message by "Thomas Heller" <> ,
 Thu, 10 Jan 2002 10:10:58 +0100 , <03be01c199b6$cfcfe350$e000a8c0@thomasnotebook>
Message-ID: <>

Recently, "Thomas Heller" <> said:
> Here's an outline which could work in 2.2:

This sounds very good! There's only one thing you'll have to explain
to me: how would this work from C? My types are all in C, not in
Python, so I'd need to do the magic in C. Where do I find examples of
using metatypes from C?

I could then put all this wrapper stuff in a file WrapperObject.c and
it would be reusable by any object that wanted this functionality.

> Create a subtype of type, having a tp_convert slot:
> typedef int (*convert_func)(PyTypeObject *, void **);
> typedef struct {
>     PyTypeObject type;
>     convert_func tp_convert;
> } WrapperTypeType;
> and use it as metaclass (metatype?) for your WindowObj:
> class WindowObj(...):
>     __metaclass__ = WrapperTypeType
> Write a function to return a conversion function:
> convert_func get_converter(PyTypeObject *type)
> {
>     if (WrapperTypeType_Check(type))
>         return ((WrapperTypeType *)type)->tp_convert;
>     /* code to check additional types and return their converters */
>     ....
> }
> and then
> if (!PyArg_ParseTuple(args, "O&", get_converter(WinObj_Type), &Window))
> How does this sound?
> Thomas
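
The shape of this outline can be sketched at the Python level. This is a
Python 3 illustration only, not the C implementation asked about above: the
names WrapperType, tp_convert and get_converter are borrowed from the
outline, while the handle attribute and the error handling are assumptions
made for the example.

```python
class WrapperType(type):
    """Hypothetical metatype: its instances are classes with a converter."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Every class created with this metatype must expose tp_convert.
        if not callable(getattr(cls, "tp_convert", None)):
            raise TypeError(f"{name} must define tp_convert")


class WindowObj(metaclass=WrapperType):
    def __init__(self, handle):
        self.handle = handle  # stand-in for the wrapped C pointer

    @staticmethod
    def tp_convert(obj):
        """Return the raw handle wrapped by obj, or raise TypeError."""
        if not isinstance(obj, WindowObj):
            raise TypeError("WindowObj expected")
        return obj.handle


def get_converter(tp):
    """Look up the conversion function from the type object itself."""
    if isinstance(tp, WrapperType):
        return tp.tp_convert
    raise TypeError(f"{tp.__name__} has no converter")


window_handle = get_converter(WindowObj)(WindowObj(42))
```

The point of the design is that the type object alone is enough to find the
converter, so a type can be passed around from Python and still be used for
argument conversion.
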
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Thu Jan 10 23:06:06 2002
From: (Neil Hodgson)
Date: Fri, 11 Jan 2002 10:06:06 +1100
Subject: [Python-Dev] unicode/string asymmetries
References: <> <> <008f01c19956$73c4f790$0acc8490@neil> <>
Message-ID: <040501c19a2b$6237f470$0acc8490@neil>


> ... So I'd
> suggest you just put an assertion into the code that Py_UNICODE is the
> same size as WCHAR (that can be even done through a preprocessor
> #error, using the _SIZE #defines). I'll expect people will resist
> changing Py_UNICODE on Windows for quite some time, even if other
> platforms move on.

   OK, I've turned off the wide character functions when Py_UNICODE_WIDE is
defined. It even compiles in wide mode, although with a lot of (about 30)
warnings. The warnings are because I'm avoiding the wide char functions with
a runtime check rather than a compile time check as the preprocessor checks
would get messy with the extra case. The wide mode settings I used were:
#define PY_UNICODE_TYPE unsigned long
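
The runtime check described above has a simple analogue that can be tried
from Python 3 with ctypes. This is a sketch under assumptions: it inspects
the platform's wchar_t rather than Py_UNICODE, and can_pass_buffer_directly
is a hypothetical helper name, not part of any Python API.

```python
import ctypes

# Size of the platform's wchar_t in bytes (2 on Windows, usually 4 elsewhere).
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)


def can_pass_buffer_directly(char_size):
    """True if a buffer of char_size-byte characters matches wchar_t.

    Only in that case could the buffer be handed to a wide-character
    API without first being copied and converted.
    """
    return char_size == WCHAR_SIZE


use_wide_api = can_pass_buffer_directly(2)  # a 16-bit character type
```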

   Why isn't Py_UNICODE_SIZE defined as


   Changes at


From  Thu Jan 10 23:11:56 2002
From: (Mark Hammond)
Date: Fri, 11 Jan 2002 10:11:56 +1100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <>
Message-ID: <>

> > windows "ansi" is an alias for the encoding you get from
> >
> >     import locale
> >     language, encoding = locale.getdefaultlocale()
> >
> > for people in western europe/north america
> Isn't that also known as "mbcs" in Python? And it is different from
> "oem", which is not exposed to Python, right?

<gulp> My turn to speak of that which I do not really understand :)

mbcs is an "encoding", but a strange encoding in that it depends on the
character set.  The character set itself determines what bytes are lead
bytes.

Thus, the same mbcs string may be interpreted differently depending on the
current character set/code page.  Thus "ansi" and "oem" are code pages,
where mbcs is an encoding.

This is why Neil demonstrated problems referencing (say) a Japanese filename
when the current code-page is not Japanese - there is only a valid mbcs
representation in supported code pages.
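
The filename problem can be reproduced without Windows by naming code pages
explicitly. A minimal Python 3 sketch, assuming cp932 as the Japanese code
page and cp1252 as a Western European one; the filename and the helper
representable() are made up for the illustration.

```python
# A Japanese filename, written with escapes to keep this text ASCII.
name = "\u65e5\u672c\u8a9e.txt"


def representable(text, code_page):
    """True if text has a byte representation in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


in_japanese_page = representable(name, "cp932")   # Japanese code page
in_western_page = representable(name, "cp1252")   # Western European
```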


From  Fri Jan 11 04:01:13 2002
From: (Guido van Rossum)
Date: Thu, 10 Jan 2002 23:01:13 -0500
Subject: [Python-Dev] RELEASED - Python 2.1.2c1
Message-ID: <>

We've issued a release candidate of Python 2.1.2:

Our thanks go out to Anthony Baxter, who almost singlehandedly
produced this release.  We're planning a final release of 2.1.2 early
next week, probably Tuesday night (Wednesday morning for Anthony :-).

Please report any bugs you find to the bug tracker:

This being a bugfix release, there are no exciting new features -- we
just fixed a lot of bugs; a few are outlined below.  For a complete
list, please see:

- The socket object gained a new method, 'sendall()'. This method 
  is guaranteed to send all data - this is not guaranteed by the
  'send()' method. See also SF patch #474307. The standard library
  has been updated to use this method where appropriate.

- Fix for incorrectly swapped arguments to PyFrame_BlockSetup in ceval.c.
  This bug could cause python to crash. It was related to using a 'continue'
  inside a 'try' block.

- The Python compiler package was updated to correctly calculate stack
  depth in some cases. This was affecting Zope Python Scripts rather badly.

- Largefile support was added (but not on by default, you'll need to follow
  the instructions in the documentation of the posix module).
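
The difference between send() and sendall() in the first item can be
sketched as follows. This is a Python 3 illustration using a local socket
pair; send_all is a hypothetical helper that spells out the loop sendall()
performs, not code from the release.

```python
import socket


def send_all(sock, data):
    """Call send() repeatedly until every byte has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # send() may accept only part of the data; it returns the count.
        total += sock.send(view[total:])
    return total


a, b = socket.socketpair()
try:
    payload = b"x" * 1000
    sent_count = send_all(a, payload)  # explicit loop
    a.sendall(payload)                 # built-in equivalent
    received = b""
    while len(received) < 2 * len(payload):
        received += b.recv(4096)
finally:
    a.close()
    b.close()
```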

--Guido van Rossum (home page:

From  Fri Jan 11 05:08:43 2002
From: (Guido van Rossum)
Date: Fri, 11 Jan 2002 00:08:43 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: Your message of "Thu, 10 Jan 2002 13:04:52 EST."
References: <>
Message-ID: <>

> I have an application (Grouch) that has to do a lot of trickery at
> pickle-time and unpickle-time, and as a result it happens to be
> sensitive to the order of unpickling.
> (The reason for the pickle-time intervention is that Grouch stores type
> objects in its data structure, and you can't pickle type objects.  So it
> hangs on to a representative value of the type for pickling -- eg. for the
> "integer" type, it keeps both IntType and 0 in memory, but only pickles
> 0, and uses type(0) to get IntType back at unpickle time.)
> The reason that Grouch is sensitive to the order of unpickling is
> because its data structure is a gnarly, incestuous knot of mutually
> interdependent classes, and I stopped tinkering with the pickle code as
> soon as I got something that worked with Python 2.0 and 2.1.  Now it
> fails under 2.2.  Under 2.1, it appears that certain more-deeply nested
> objects were unpickled first; under 2.2, that is no longer the case, and
> that screws up Grouch's test suite.
> Anyone got a vague, hand-waving explanation for my vague, hand-waving
> complaint?  Or should I try to come up with a test case?

Yes please, and post it to SourceForge.  There aren't that many
changes in the source of since release 2.1.  (Or are you
using cPickle?  If so, please say so.  The two aren't 100%
equivalent.)

I see changes related to unicode, and type objects are pickled
differently in 2.2.  There's also a change that refuses to pickle a
"global" (a reference by module and object name, used for classes,
types and functions) when the name that the object claims to have
doesn't refer to the same object.  There's a new test on

Hm, I think you must be using cPickle, I don't know enough about it to

--Guido van Rossum (home page:

From  Fri Jan 11 05:21:47 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 06:21:47 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
In-Reply-To: <> (message from Jack
 Jansen on Thu, 10 Jan 2002 23:52:26 +0100)
References: <>
Message-ID: <>

> Because you can refer to the _Type from Python, that is the whole
> point of this exercise. And because you can refer to it from Python
> you can pass it to calldll.newcall and such.

I still fail to see why you need additional ParseTuple support in

> Sorry, you're right. My example was for a return value, so we're
> talking Py_BuildValue here. But this situation is equivalent to a
> GrafPort argument, where PyArg_Parse would be used.

In cdc_call, there is a loop over all arguments, rather than a
ParseTuple call. I don't see how this could change: all arguments are
processed uniformly. Precisely how would you use O@ in there?

Actually, it may be worthwhile to get rid of the PyArg_ParseTuple call
in call_newcall also: for performance reasons, to soften the
dependency on MAXARG, and to give better diagnostics in case of user
errors. There is a loop over argconv, anyway; this loop could have run
over args in the first place.

All you might want to have is additional slots in type objects; as
Thomas explains, you can have that using just the 2.2 facilities. 

For the specific case of calldll, it seems that a generic mechanism
would be harmful: You want to be absolutely sure that an object is
convertible to a long *for the purposes of API calls*. So I'd even
encourage creating a PyCallDll_RegisterTypeConverter function;
extension types that want to support calldll should register a
conventry and a rvconventry. That approach works for any Python


From  Fri Jan 11 05:26:27 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 06:26:27 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <040501c19a2b$6237f470$0acc8490@neil> (
References: <> <> <008f01c19956$73c4f790$0acc8490@neil> <> <040501c19a2b$6237f470$0acc8490@neil>
Message-ID: <>

>    ?

Because you cannot use that in preprocessor tests. If you do


then the preprocessor is not supposed to do this properly unless you
have an integral number on each side.


From  Fri Jan 11 05:31:12 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 00:31:12 -0500
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
Message-ID: <>

[Skip Montanaro]
> I did try that when the problem was first raised.  I just tried it
> again.  It did have a positive effect:
> w/ threads and w/o pymalloc:
>     user     system    elapsed     CPU
>     7.64     0.72      0:09.73     85%
>     7.86     0.45      0:08.66     95%
>     7.66     0.66      0:08.32     99%
> w/o threads and w/ pymalloc:
>     user     system    elapsed     CPU
>     5.44     0.58      0:06.85     87%
>     5.57     0.46      0:06.02    100%
>     5.57     0.45      0:06.02     99%

Skip, I think this is irrelevant to the OP's problem.  You're telling us you
can save a few seconds running test_longexp on a box with barely enough
memory to run it at all.  Barbara is worried about shaving *hours* off it on
a box with gobs of memory to spare.

Still, I expect pymalloc will fix her problem (since malloc is the only
suspect on the list, it better <wink>).

From  Fri Jan 11 05:47:14 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 06:47:14 +0100
Subject: [Python-Dev] unicode/string asymmetries
In-Reply-To: <>
References: <>
Message-ID: <>

> > > windows "ansi" is an alias for the encoding you get from
> > Isn't that also known as "mbcs" in Python? And it is different from
> > "oem", which is not exposed to Python, right?
> mbcs is an "encoding", but a strange encoding in that it depends on the
> character set.  The character set itself determines what bytes are lead
> bytes.

That is my understanding also.

> Thus, the same mbcs string may be interpreted differently depending on the
> current character set/code page.  Thus "ansi" and "oem" are code pages,
> where mbcs is an encoding.

That is not really true, is it: "ansi" and "oem" are not code pages,
are they? At least, not constant code pages, but code pages that depend
on the national version, right?

"mbcs" uses MultiByteToWideChar with CP_ACP, so "mbcs" *is* CP_ACP,
where ACP stands for "ANSI Code Page", right? CP_ACP is the code page
that the "ANSI" functions, i.e. the *A functions, expect. It might be
code page 1252, or it might be something else.

Likewise, the OEM code page is not a fixed thing, either. Instead, it
is what DOS would have used in this locale. So, CP_OEMCP might be 437,
or it might be something else, again, e.g. 850.

I think it might have been less confusing to call the "mbcs" encoding
"ansi", and to expose the "oem" encoding (which can still be done).

Please correct me if I'm wrong.
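
The point that "ANSI" and "OEM" name different code pages depending on the
system can be illustrated by decoding one byte under several of them. A
Python 3 sketch; the three code pages are the examples named above, and the
"mbcs" codec is only tried on Windows, where it exists.

```python
import sys

raw = b"\xa9"

# The same byte under one typical ANSI page and two typical OEM pages.
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp437", "cp850")}

# On Windows, "mbcs" decodes with whatever the current ANSI code page is.
if sys.platform == "win32":
    decoded["mbcs"] = raw.decode("mbcs", errors="replace")

for code_page, char in decoded.items():
    print(code_page, ascii(char))
```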


From  Fri Jan 11 06:13:26 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 01:13:26 -0500
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: <>
Message-ID: <>

[Anthony Baxter]
> ...
> Linux/sparc             Debian 2.2 (         FAILED
> This is scary. I don't know why this one alone fails - it fails the
> test_math test.
> Running the test by hand:
>     anthonybaxter@usf-cf-sparc-linux-1:~/python212_linxsparc$
> PYTHONPATH= ./python  ./Lib/test/
>     math module, testing with eps 1e-05
>     constants
>     acos
>     Traceback (most recent call last):
>       File "./Lib/test/", line 21, in ?
>         testit('acos(-1)', math.acos(-1), math.pi)
>     OverflowError: math range error
> Running math.acos(-1) gives the correct answer. Anyone got any idea?

Sorry, not short of stepping into mathmodule.c under a debugger.  The only
interesting thing about that test is that math.acos(-1) is the very first
call makes to the platform libm.  Perhaps if you commented it
out, you'd get a bogus OverflowError from

    testit('acos(0)', math.acos(0), math.pi/2)

on the following line.

From  Fri Jan 11 07:02:31 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 08:02:31 +0100
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: <> (message
 from Guido van Rossum on Fri, 11 Jan 2002 00:08:43 -0500)
References: <> <>
Message-ID: <>

> Yes please, and post it to SourceForge.  There aren't that many
> changes in the source of since release 2.1. 

I think there have been changes to the order in which things come out
of a dictionary, which could affect pickling order.

Unpickling order, of course, should strictly follow the order in which
things are in the file.


From  Fri Jan 11 07:33:23 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 08:33:23 +0100
Subject: [Python-Dev] 2.1.2 testing.
In-Reply-To: <>
References: <>
Message-ID: <>

> > Linux/sparc             Debian 2.2 (         FAILED
> > This is scary. I don't know why this one alone fails - it fails the
> > test_math test.
> Sorry, not short of stepping into mathmodule.c under a debugger.  The only
> interesting thing about that test is that math.acos(-1) is the very first
> call makes to the platform libm.  Perhaps if you commented it
> out, you'd get a bogus OverflowError from
>     testit('acos(0)', math.acos(0), math.pi/2)
> on the following line.

Seems to be a Sparclinux bug. If mathmodule is statically linked into
python (via Modules/Setup), the test passes fine. Without further
analysis, I'd say that assigning to errno does not work well when done
in a shared library.

I'd say this is bug #459464. Last time, I incorrectly diagnosed this
as a sparc64 gcc issue, which it isn't: Even though 'uname -m' reports
'sparc64', all userland code is 32-bit. I'm probably wrong with my
current guess as well.


From  Fri Jan 11 00:49:08 2002
From: (Gustavo Niemeyer)
Date: Thu, 10 Jan 2002 22:49:08 -0200
Subject: [ Re: [Python-Dev] Python's footprint]
Message-ID: <20020110224908.C884@ibook.distro.conectiva>

Hi everyone!

Now that 2.2 is history (well, kind of ;-), would it be the time to
think about this again?

Thank you!

----- Forwarded message from Gustavo Niemeyer <> -----

Date: Wed, 14 Nov 2001 20:07:03 -0200
From: Gustavo Niemeyer <>
Subject: Re: [Python-Dev] Python's footprint
In-Reply-To: <>
User-Agent: Mutt/1.3.23i

> > It means that about 10% of python's executable is documentation.
> Anyways, that sounds like a useful idea.  It would probably be a big
> patch that touches lots of files, so it's unlikely to get into Python
> 2.2.  You might consider whipping up a patch now to get it under
> consideration early in 2.3's life-cycle.

Ok. The patch is ready (attached). It's very simple. Just introducing
two new macros: Py_DOCSTR() to be used in usual doc strings, and
WITH_DOC_STRINGS, for more complex ones (sys module's doc string
comes into my mind).
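
To get a feel for how much text such macros could drop, a rough Python 3
sketch can add up the doc strings a module exposes. Note the assumptions:
this counts doc strings visible at runtime, not bytes in the executable,
and docstring_bytes is a hypothetical helper, not part of the patch.

```python
import inspect
import types


def docstring_bytes(module):
    """Sum the UTF-8 size of a module's doc string and its members' ones."""
    docs = [module.__doc__]
    for _, obj in inspect.getmembers(module):
        # Only routines and classes carry doc strings of their own.
        if inspect.isroutine(obj) or inspect.isclass(obj):
            docs.append(obj.__doc__)
    return sum(len(d.encode("utf-8")) for d in docs if isinstance(d, str))


# Tiny stand-in module so the count is easy to verify by hand.
demo = types.ModuleType("demo", "ab")


def f():
    "xyz"


demo.f = f
demo_total = docstring_bytes(demo)  # 2 bytes (module) + 3 bytes (f)
```

The same function can be pointed at a real module, for example
docstring_bytes(math) after importing math.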

I'd just like to know the moment when it is going to be applied, so I
can change every documentation string accordingly and submit the patch.
I could do this right now, for sure. But if it's going to be applied
just for 2.3, the patch will certainly be broken at that time.


Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]

--- Python-2.2.orig/	Wed Nov 14 17:54:31 2001
+++ Python-2.2/	Wed Nov 14 19:08:08 2001
@@ -765,3 +765,13 @@
 #define STRICT_SYSV_CURSES /* Don't use ncurses extensions */
+/* Define if you want to have inline documentation. */
+/* Define macro for inline documentation. */
+#define Py_DOCSTR(x) x
+#define Py_DOCSTR(x) ""
--- Python-2.2.orig/	Wed Nov 14 17:54:31 2001
+++ Python-2.2/	Wed Nov 14 19:20:07 2001
@@ -1305,6 +1305,20 @@
+# Check for --with-doc-strings
+AC_MSG_CHECKING(for --with-doc-strings)
+[  --with(out)-doc-strings         disable/enable documentation strings])
+if test -z "$with_doc_strings"
+then with_doc_strings="yes"
+if test "$with_doc_strings" != "no"
 # Check for Python-specific malloc support
 AC_MSG_CHECKING(for --with-pymalloc)

----- End forwarded message -----

Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]


From  Fri Jan 11 13:34:21 2002
From: (Martin v. Loewis)
Date: Fri, 11 Jan 2002 14:34:21 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <20020110224908.C884@ibook.distro.conectiva> (message from
 Gustavo Niemeyer on Thu, 10 Jan 2002 22:49:08 -0200)
References: <20020110224908.C884@ibook.distro.conectiva>
Message-ID: <>

> Now that 2.2 is history (well, kind of ;-), would it be the time to
> think about this again?

By "consideration early in 2.3's life cycle", the OP probably meant
that a patch should be posted to SF. Are you willing to implement the
complete change (i.e. create a patch that changes each and every
source file)? If so, please post one to SF. You may want to start this
slowly, first creating only the infrastructure and touching a single
file (say, stringobject.c)

I'd personally like to see opportunities for more magic used. E.g. in
a compiler that uses sections, putting all doc strings into a single
section might be desirable. They will be a contiguous fragment of the
python executable, which helps on demand-paged systems to reduce the
startup time. Going further, it might be possible to strip off "unused
sections" from the binary after it has been linked, deferring the
choice of doc string presence to the installation time.

For that to work, we'd first need to know what compilers offer what
syntax to implement such magic, then generalize it to the right macro.
If that is a desirable goal, I'd be willing to investigate how to
achieve things with gcc, on ELF systems.


From  Fri Jan 11 14:15:27 2002
From: (Skip Montanaro)
Date: Fri, 11 Jan 2002 08:15:27 -0600
Subject: [Python-Dev] PEP 100 references & wording
Message-ID: <>

I just noticed that PEP 100 (Python/Unicode integration) references

as the latest version.  Sure enough, I visited that and found that it's
newer than the PEP (1.8 v. 1.7).

Shouldn't the PEP be the most up-to-date public document?  The comment right
after that suggests this should be so:

     [ed. note: new revisions should be made to this PEP document, while the
     historical record previous to version 1.7 should be retrieved from
     MAL's url, or Misc/unicode.txt]

Since this is now an informational PEP, I believe the wording should change
to reflect functionality that has already been implemented.  For instance,
instead of

    Python should provide a built-in constructor for Unicode strings which
    is available through __builtins__:

it should read

    Python provides a built-in constructor for Unicode strings which is
    available through __builtins__:


From  Fri Jan 11 14:21:05 2002
From: (Gustavo Niemeyer)
Date: Fri, 11 Jan 2002 12:21:05 -0200
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
References: <20020110224908.C884@ibook.distro.conectiva> <>
Message-ID: <20020111122105.B1808@ibook.distro.conectiva>

Hi Martin!

> > Now that 2.2 is history (well, kind of ;-), would it be the time to
> > think about this again?
> By "consideration early in 2.3's life cycle", the OP probably meant
> that a patch should be posted to SF. Are you willing to implement the
> complete change (i.e. create a patch that changes each and every
> source file)? If so, please post one to SF. You may want to start this
> slowly, first creating only the infrastructure and touching a single
> file (say, stringobject.c)

Yes, I'm going to implement it. I'd just like to know if there was
interest in the patch. Implementing it slowly looks like a nice idea
as well. I'll post a patch there. Thanks!

> I'd personally like to see opportunities for more magic used. E.g. in
> a compiler that uses sections, putting all doc strings into a single
> section might be desirable. They will be a contiguous fragment of the
> python executable, which helps on demand-paged systems to reduce the
> startup time. Going further, it might be possible to strip off "unused
> sections" from the binary after it has been linked, deferring the
> choice of doc string presence to the installation time.


I know it's possible to discard a section. OTOH, I don't know what happens
if somebody refers to discarded data. I'll have a look at this.

> For that to work, we'd first need to know what compilers offer what
> syntax to implement such magic, then generalize it to the right macro.
> If that is a desirable goal, I'd be willing to investigate how to
> achieve things with gcc, on ELF systems.

This is something pretty easy with gcc. When reading your email, I
remembered that the kernel uses this magic to discard a section with
code used just when initializing. Looking in the kernel code, I found
this in include/linux/init.h:

/*
 * Mark functions and data as being only used at initialization
 * or exit time.
 */
#define __init      __attribute__ ((__section__ (".text.init")))
#define __exit      __attribute__ ((unused, __section__(".text.exit")))
#define __initdata  __attribute__ ((__section__ (".data.init")))
#define __exitdata  __attribute__ ((unused, __section__ (".data.exit")))
#define __initsetup __attribute__ ((unused,__section__ (".setup.init")))
#define __init_call __attribute__ ((unused,__section__ (".initcall.init")))
#define __exit_call __attribute__ ((unused,__section__ (".exitcall.exit")))

After surrounding doc strings with a macro, this will be easy to achieve.


Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]


From  Fri Jan 11 14:49:29 2002
From: (Greg Ward)
Date: Fri, 11 Jan 2002 09:49:29 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 10 January 2002, M.-A. Lemburg said:
> What's Grouch ?

Grouch is a system for 1) describing a Python object schema, and 2)
traversing an existing object graph (eg. a pickle or ZODB) to ensure
that it conforms to that object schema.

An object schema is a collection of classes (including the attributes in
each class and the type of each attribute), atomic types, and type

An atomic type is a type with no sub-types; by default every Grouch
schema has five atomic types: int, string, long, complex, and float.
You can easily add new atomic types, eg. the MEMS Exchange virtual fab
has mxDateTime as an atomic type.

A type alias is just what it sounds like, eg. "Foo" might be an alias
for "foo.Foo" (a fully qualified class name representing a Grouch
instance type), and "real" might be an alias for "int|long|float" (a
Grouch union type).
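
The ideas of atomic types, aliases and union types can be sketched in a few
lines of Python 3. This is not Grouch's API: ATOMIC_TYPES, ALIASES,
resolve() and conforms() are hypothetical names, and "long" is left out
because Python 3 has a single int type.

```python
# Atomic types: names that map directly to a Python type.
ATOMIC_TYPES = {"int": int, "string": str, "complex": complex, "float": float}

# Aliases: names that stand for another type specification.
ALIASES = {"real": "int|float"}


def resolve(spec):
    """Expand aliases and split a union spec into a tuple of types."""
    found = []
    for part in spec.split("|"):
        if part in ALIASES:
            found.extend(resolve(ALIASES[part]))
        else:
            found.append(ATOMIC_TYPES[part])
    return tuple(found)


def conforms(value, spec):
    """True if value is an instance of any type named by spec."""
    return isinstance(value, resolve(spec))


checks = [conforms(3, "real"), conforms(2.5, "real"), conforms("x", "real")]
```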


Anyways, that's not terribly relevant, but it gives me an excuse to plug
my most arcane and (IMHO) interesting Python hack.

> (The reason for the pickle-time intervention is that Grouch stores type
> objects in its data structure, and you can't pickle type objects.  So it
> hangs on to a representative value of the type for pickling -- eg. for the
> "integer" type, it keeps both IntType and 0 in memory, but only pickles
> 0, and uses type(0) to get IntType back at unpickle time.)

> Why don't you use a special reduce function which takes the
> tp_name as index into the types module ? Storing strings should
> avoid all complicated type object saving.

I'm not sure I understand what you're saying.  Are you just suggesting
that, when I need to pickle IntType, I pickle the string "int" instead
of the integer 0?  I don't see how that makes any difference: I still
need to intercede at pickle/unpickle time to make this happen.

Also, the fact that type(x).__name__ is not consistent across Python
versions or implementations (Jython) screws this up.  Grouch now has its
own canonical set of type names because of this, and I could easily
reverse that dictionary to make a typename->typeobject mapping.  But I
don't see how pickling "int" is a win over pickling 0, when what I
*really* want to do is pickle IntType.
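
Both variants under discussion can be sketched side by side in Python 3.
This is an illustration only: SAMPLES, NAMES and the four helpers are
hypothetical, and current Python can pickle built-in type objects directly,
which was not the case in the versions discussed here.

```python
import pickle

# Variant 1: pickle a representative value, recover the type with type().
SAMPLES = {int: 0, float: 0.0, str: "", complex: 0j}


def dump_type_by_sample(type_object):
    return pickle.dumps(SAMPLES[type_object])


def load_type_by_sample(data):
    return type(pickle.loads(data))


# Variant 2: pickle a canonical name, recover the type from a mapping.
NAMES = {"int": int, "float": float, "string": str, "complex": complex}
TYPE_TO_NAME = {tp: name for name, tp in NAMES.items()}


def dump_type_by_name(type_object):
    return pickle.dumps(TYPE_TO_NAME[type_object])


def load_type_by_name(data):
    return NAMES[pickle.loads(data)]


by_sample = load_type_by_sample(dump_type_by_sample(int))
by_name = load_type_by_name(dump_type_by_name(int))
```

Either way the translation has to happen at pickle and unpickle time, which
is the point made above.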

> You should probably first check whether the pickle string is
> identical in 2.1 and 2.2 and then go on from there.

Excellent idea -- thanks!

Greg Ward - nerd                              
"Eine volk, eine reich, eine führer" --Hitler
"One world, one web, one program" --Microsoft

From  Fri Jan 11 14:49:33 2002
From: (Skip Montanaro)
Date: Fri, 11 Jan 2002 08:49:33 -0600
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <20020111122105.B1808@ibook.distro.conectiva>
References: <20020110224908.C884@ibook.distro.conectiva>
Message-ID: <>

    Gustavo> Yes, I'm going to implement it. I'd just like to know if there
    Gustavo> was interest in the patch. Implementing it slowly looks like a
    Gustavo> nice idea as well. I'll post a patch there. Thanks!


I recommend you do the whole patch thing through SourceForge.  Just post a
link to your patch to python-dev.


From  Fri Jan 11 14:52:06 2002
From: (Greg Ward)
Date: Fri, 11 Jan 2002 09:52:06 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: <>
References: <> <>
Message-ID: <>

> I have an application (Grouch) that has to do a lot of trickery at
> pickle-time and unpickle-time, and as a result it happens to be
> sensitive to the order of unpickling.
> Anyone got a vague, hand-waving explanation for my vague, hand-waving
> complaint?  Or should I try to come up with a test case?

> Yes please, and post it to SourceForge.  There aren't that many
> changes in the source of since release 2.1.  (Or are you
> using cPickle?  If so, please say so.  The two aren't 100%
> equivalent.)

Tried it with both pickle and cPickle, with the same result (ie. one of
my test cases failed with the exact same traceback, apparently for the
same reason).

I'll see if I can't reduce this to something that doesn't rely on 1500
hairy lines of Grouch code.  (Only fitting that something named for
Oscar the Grouch is hairy, eh?)

Greg Ward - Linux weenie                      
A man without religion is like a fish without a bicycle.

From  Fri Jan 11 15:03:27 2002
From: (M.-A. Lemburg)
Date: Fri, 11 Jan 2002 16:03:27 +0100
Subject: [Python-Dev] PEP 100 references & wording
References: <>
Message-ID: <>

Skip Montanaro wrote:
> I just noticed that PEP 100 (Python/Unicode integration) references
> as the latest version.  Sure enough, I visited that and found that it's
> newer than the PEP (1.8 v. 1.7).

True. I'm not sure why the above file is 1.8 and the CVS PEP at 1.7.
I guess I forgot to update the PEP.

FYI, here's a diff between the 1.7 and 1.8 version:
--- unicode-proposal-1.7.txt    Tue Oct 17 17:38:40 2000
+++ unicode-proposal.txt        Tue Oct 17 17:38:40 2000
@@ -1,7 +1,7 @@
- Python Unicode Integration                            Proposal Version: 1.7
+ Python Unicode Integration                            Proposal Version: 1.8

@@ -612,11 +612,11 @@ Case Conversion:

 Case conversion is rather complicated with Unicode data, since there
 are many different conditions to respect. See


 for some guidelines on implementing case conversion.

 For Python, we should only implement the 1-1 conversions included in
 Unicode. Locale dependent and other special case conversions (see the
@@ -631,11 +631,15 @@ possible.
 Line Breaks:

 Line breaking should be done for all Unicode characters having the B
 property as well as the combinations CRLF, CR, LF (interpreted in that
-order) and other special line separators defined by the standard.
+order) and other special line separators defined by the standard. See
+for some guidelines on implementing line breaks and newline handling.

 The Unicode type should provide a .splitlines() method which returns a
 list of lines according to the above specification. See Unicode

@@ -1010,11 +1014,11 @@ Unicode 3.0:



 Introduction to Unicode (a little outdated by still nice to read):

 For comparison:
@@ -1047,10 +1051,11 @@ Encodings:

 History of this Proposal:
+1.8: Fixed some URLs to the site.
 1.7: Added note about the changed behaviour of "s#".
 1.6: Changed <defencstr> to <defenc> since this is the name used in the
      implementation. Added notes about the usage of <defenc> in the
      buffer protocol implementation.
 1.5: Added notes about setting the <default encoding>. Fixed some

> Shouldn't the PEP be the most up-to-date public document?  The comment right
> after that suggests this should be so:
>      [ed. note: new revisions should be made to this PEP document, while the
>      historical record previous to version 1.7 should be retrieved from
>      MAL's url, or Misc/unicode.txt]
> Since this is now an informational PEP, I believe the wording should change
> to reflect functionality that has already been implemented.  For instance,
> instead of
>     Python should provide a built-in constructor for Unicode strings which
>     is available through __builtins__:
> it should read
>     Python provides a built-in constructor for Unicode strings which is
>     available through __builtins__:

True again; I just didn't find time to rewrite these bits. The PEP
is basically a reformatted proposal. That's where the "should" wording
originates from.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 11 15:11:59 2002
From: (M.-A. Lemburg)
Date: Fri, 11 Jan 2002 16:11:59 +0100
Subject: [Python-Dev] Change in unpickle order in 2.2?
References: <> <> <>
Message-ID: <>

Greg Ward wrote:
> On 10 January 2002, M.-A. Lemburg said:
> > What's Grouch ?
> [Grouch is a system for 1) describing a Python object schema, and 2)
> traversing an existing object graph (eg. a pickle or ZODB) to ensure
> that it conforms to that object schema.]

Sounds very interesting :-)

> [me]
> > (The reason for the pickle-time intervention is that Grouch stores type
> > objects in its data structure, and you can't pickle type objects.  So it
> > hangs on to a representative value of the type for pickling -- eg. for the
> > "integer" type, it keeps both IntType and 0 in memory, but only pickles
> > 0, and uses type(0) to get IntType back at unpickle time.)
> [MAL]
> > Why don't you use a special reduce function which takes the
> > tp_name as index into the types module ? Storing strings should
> > avoid all complicated type object saving.
> I'm not sure I understand what you're saying.  Are you just suggesting
> that, when I need to pickle IntType, I pickle the string "int" instead
> of the integer 0? 

Right. It needn't be 'int', any string will do as long as you
have a mapping from strings to type objects.

> I don't see how that makes any difference: I still
> need to intercede at pickle/unpickle time to make this happen.

Well, I suppose with the new Python 2.2 version you could add a
special __reduce__ method to type objects which takes care of this
for you.

For older versions, you should probably register a pickle handler
for type objects which does the same. Pickle should then use this
handler for pickling the type object.

> Also, the fact that type(x).__name__ is not consistent across Python
> versions or implementations (Jython) screws this up.  Grouch now has its
> own canonical set of type names because of this, and I could easily
> reverse that dictionary to make a typename->typeobject mapping.  But I
> don't see how pickling "int" is a win over pickling 0, when what I
> *really* want to do is pickle IntType.

True, but it saves you the trouble of storing global references
to the type constructors in the pickle. Your system will do the
mapping using the above hooks.
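
A rough sketch of the name-to-type mapping idea in current Python 3. It
swaps in a different hook from the ones named above: pickle's
persistent_id/persistent_load pair rather than a __reduce__ method or a
copy_reg handler. The table of canonical names and the class and function
names are invented for illustration and are not Grouch's.

```python
import io
import pickle

# Hypothetical canonical names, standing in for Grouch's own table.
NAME_TO_TYPE = {"integer": int, "string": str, "float": float}
TYPE_TO_NAME = {t: name for name, t in NAME_TO_TYPE.items()}


class TypeNamePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Known type objects are written as their canonical name;
        # returning None tells pickle to handle the object as usual.
        if isinstance(obj, type):
            return TYPE_TO_NAME.get(obj)
        return None


class TypeNameUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Map the stored name back to the type object.
        try:
            return NAME_TO_TYPE[pid]
        except KeyError:
            raise pickle.UnpicklingError("unknown type name: %r" % (pid,))


def dumps(obj):
    buf = io.BytesIO()
    TypeNamePickler(buf).dump(obj)
    return buf.getvalue()


def loads(data):
    return TypeNameUnpickler(io.BytesIO(data)).load()


schema = {"age": int, "name": str, "tags": [str]}
restored = loads(dumps(schema))
```

Only the strings "integer", "string" and "float" reach the pickle, so the
result does not depend on how a given interpreter spells
type(x).__name__.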

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 11 15:54:17 2002
From: (Guido van Rossum)
Date: Fri, 11 Jan 2002 10:54:17 -0500
Subject: [Python-Dev] PEP 100 references & wording
In-Reply-To: Your message of "Fri, 11 Jan 2002 16:03:27 +0100."
References: <>
Message-ID: <>

Marc, can you update PEP 100?

You might want to retire the starship URL and use the PEP URL as the
official location.

--Guido van Rossum (home page:

From  Fri Jan 11 16:40:46 2002
From: (Thomas Heller)
Date: Fri, 11 Jan 2002 17:40:46 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <>
Message-ID: <08f501c19abe$d6785a80$e000a8c0@thomasnotebook>

From: "Jack Jansen" <>
> Recently, "Thomas Heller" <> said:
> > Here's an outline which could work in 2.2:
> This sounds very good! There's only one thing you'll have to explain
> to me: how would this work from C? My types are all in C, not in
> Python, so I'd need to do the magic in C. Where do I find examples of
> using metatypes from C?
I don't know of any, well, except ceval.c build_class():

 result = PyObject_CallFunction(metaclass, "OOO", name, bases, methods);

I had no need for this, because I'm very happy to write base classes/types
in C, extend them by deriving subtypes from them in Python, and plugging
everything together in Python.
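
Seen from Python, that C call amounts to calling the metatype with three
arguments. A small sketch follows; build_class here is a plain Python
stand-in and Traced is a made-up metatype, neither taken from ceval.c.

```python
def build_class(metaclass, name, bases, methods):
    """Python-level equivalent of the C call quoted above."""
    return metaclass(name, bases, methods)


class Traced(type):
    """Hypothetical metatype that records every type it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls


Handle = build_class(Traced, "Handle", (object,), {"kind": "window"})
print(type(Handle).__name__, Handle.kind, Traced.created)
```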


From  Fri Jan 11 17:46:06 2002
From: (M.-A. Lemburg)
Date: Fri, 11 Jan 2002 18:46:06 +0100
Subject: [Python-Dev] PEP 100 references & wording
References: <>
 <> <>
Message-ID: <>

Guido van Rossum wrote:
> Marc, can you update PEP 100?
> You might want to retire the starship URL and use the PEP URL as the
> official location.

Will do, but it might take a week or two.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 11 18:06:04 2002
From: (M.-A. Lemburg)
Date: Fri, 11 Jan 2002 19:06:04 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <>
Message-ID: <>

[Metatypes, callbacks, etc.]

Wouldn't it be *much* easier to just use the copyreg/pickle 
API/protocol for dealing with all this ? 

AFAICTL, the actions needed by Jack are very similar to what 
pickle et al. do, and we already have all that in Python -- 
it's just not exposed too well at C level.

Example:

PyArg_ParseTuple(args, "O@", &factory, &tuple) would
return a factory function and a tuple storing the data of
the object passed to the function

while

Py_BuildValue("O@", factory, tuple) would simply call factory
with tuple and use the return value as object.

(Note that void* can be wrapped into PyCObjects for "use" in
Python.)

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 11 09:32:32 2002
From: (Andrew MacIntyre)
Date: Fri, 11 Jan 2002 20:32:32 +1100 (EDT)
Subject: [Python-Dev] eval() slowdown in 2.2 on MacOS X?
In-Reply-To: <>
Message-ID: <>

On Thu, 10 Jan 2002, Andrew Kuchling wrote:

> On Tue, Jan 08, 2002 at 01:41:37PM -0500, Tim Peters wrote:
> >Break it into smaller steps so we can narrow down possible causes:
> You should have cc'ed Barbara on that.  I forwarded your message to her
> and she wrote back (eventually):
> >BTW, I forgot to pass this on yesterday, but I tried the code in Tim Peters'
> >e-mail yesterday and the delay happens during the code = compile(...)
> >statement.
> She's going to install sshd on her machine, so maybe this weekend I'll
> be able to log in, compile Python from source, and poke around in an
> effort to figure out what's going on.

IMHO, Barbara's problem is almost certainly related to the system
malloc(), and if that is the case the only effective antidote is

However, just enabling WITH_PYMALLOC isn't enough as it's currently only
configured to be used for object allocation and doesn't help the parser.

I did attach a patch enabling pymalloc for all interpreter memory
management to a long post to python-dev (which AMK might
recall from his python-dev summaries) about my research into
test_longexp problems with the OS/2+EMX port. My research revealed that
test_longexp causes the parser to go ballistic with small mallocs.  While
pymalloc solved the test_longexp problem, using it for all interpreter
memory management caused about a 60% performance hit (on OS/2 + EMX).

On OS/2 the problem appeared to be overallocation (allocating 3-4x as much
memory as actually requested), but I recall reading a thread on
python-list wherein people reported system malloc()s that attempt to
coalesce blocks which had a similar slowdown effect in another set of
circumstances (I don't recall the details - might have been related to

Although OS/X's BSD heritage goes back to FreeBSD 3.2, I wouldn't have
expected either sort of problem from that source as I've had none of
these problems with my own FreeBSD systems - in fact last night I ran the
test suite on a CVS derived build on a 486/100 FreeBSD 4.4 system with
only 32MB of RAM and 128MB of swap and test_longexp passed & in only
minutes (the whole test suite took ~25-30 mins for the first pass, ie
without .pyc files). OS/X may have acquired a malloc() of different
heritage though.

I did keep a copy of the instrumented malloc() output from my OS/2
research if you're interested, although it probably isn't as helpful as it
could be (Python 2.0 vintage)... Likewise my crude debug malloc
wrapper... and I might be able to dig up my pymalloc_for_all patch...

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Fri Jan 11 20:40:56 2002
From: (Thomas Heller)
Date: Fri, 11 Jan 2002 21:40:56 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <> <>
Message-ID: <0a6201c19ae0$63d379c0$e000a8c0@thomasnotebook>

From: "M.-A. Lemburg" <>
> [Metatypes, callbacks, etc.]
> Wouldn't it be *much* easier to just use the copyreg/pickle 
> API/protocol for dealing with all this ? 
I *don't* think it's complicated (once you get used to metatypes).

> AFAICTL, the actions needed by Jack are very similar to what 
> pickle et al. do, and we already have all that in Python -- 
> it's just not exposed too well at C level. 
> Example:
> PyArg_ParseTuple(args, "O@", &factory, &tuple) would
> return a factory function and a tuple storing the data of
> the object passed to the function
> while
> Py_BuildValue("O@", factory, tuple) would simply call factory
> with tuple and use the return value as object.
> (Note that void* can be wrapped into PyCObjects for "use" in
> Python.)
I'm not sure we talk about the same thing: we (at least me) do not
want to serialize and reconstruct objects (what pickle does),
we want to convert objects from Python to C (convert them to parameters
usable in C API-calls), and back (convert them from handles, pointers,
whatever into Python objects) having only the Python *type* object
available in the latter case.

Or am I missing something?


From  Fri Jan 11 20:58:45 2002
From: (Thomas Heller)
Date: Fri, 11 Jan 2002 21:58:45 +0100
Subject: [Python-Dev] 2.2c1 test on windows - ok
Message-ID: <0ace01c19ae2$da29e670$e000a8c0@thomasnotebook>

installed from the windows installer - everything seems to be ok:

117 tests OK.
23 tests skipped: test_al test_cd test_cl test_crypt test_dbm test_dl test_fcntl test_fork1 test_gdbm test_gl test_grp t
est_imgfile test_largefile test_linuxaudiodev test_nis test_openpty test_poll test_pty test_pwd test_signal test_sockets
erver test_sunaudiodev test_timing


From  Fri Jan 11 21:05:42 2002
From: (Fred L. Drake, Jr.)
Date: Fri, 11 Jan 2002 16:05:42 -0500 (EST)
Subject: [Python-Dev] 2.2c1 test on windows - ok
In-Reply-To: <0ace01c19ae2$da29e670$e000a8c0@thomasnotebook>
References: <0ace01c19ae2$da29e670$e000a8c0@thomasnotebook>
Message-ID: <>

Thomas Heller writes:
 > installed from the windows installer - everything seems to be ok:

  I presume you meant 2.1.2c1 ???  ;-)


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Fri Jan 11 21:14:27 2002
From: (Thomas Heller)
Date: Fri, 11 Jan 2002 22:14:27 +0100
Subject: [Python-Dev] 2.2c1 test on windows - ok
References: <0ace01c19ae2$da29e670$e000a8c0@thomasnotebook> <>
Message-ID: <0b3201c19ae5$11be0a60$e000a8c0@thomasnotebook>

From: "Fred L. Drake, Jr." <>
> Thomas Heller writes:
>  > installed from the windows installer - everything seems to be ok:
>   I presume you meant 2.1.2c1 ???  ;-)

Of course. Sorry.


From  Fri Jan 11 21:25:04 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 16:25:04 -0500
Subject: [Python-Dev] 2.2c1 test on windows - ok
In-Reply-To: <0ace01c19ae2$da29e670$e000a8c0@thomasnotebook>
Message-ID: <>

> installed from the windows installer - everything seems to be ok:

Which flavor of Windows?  If NT or 2000 or XP, did you install with or
without admin privs?

> 117 tests OK.
> 23 tests skipped: test_al test_cd test_cl test_crypt test_dbm
> test_dl test_fcntl test_fork1 test_gdbm test_gl test_grp t
> est_imgfile test_largefile test_linuxaudiodev test_nis
> test_openpty test_poll test_pty test_pwd test_signal test_sockets
> erver test_sunaudiodev test_timing

Unfortunately, the code in to format this list to fit screen
width, and to say which skips are *expected* on win32, was new in 2.2, so
ineligible for inclusion in a 2.1 bugfix release.

From  Fri Jan 11 22:08:54 2002
From: (M.-A. Lemburg)
Date: Fri, 11 Jan 2002 23:08:54 +0100
Subject: [Python-Dev] Feature request: better support for "wrapper" objects
References: <> <> <0a6201c19ae0$63d379c0$e000a8c0@thomasnotebook>
Message-ID: <>

Thomas Heller wrote:
> From: "M.-A. Lemburg" <>
> > [Metatypes, callbacks, etc.]
> >
> > Wouldn't it be *much* easier to just use the copyreg/pickle
> > API/protocol for dealing with all this ?
> I *don't* think it's complicated (once you get used to metatypes).

I hear heads exploding already :-)

> > AFAICTL, the actions needed by Jack are very similar to what
> > pickle et al. do, and we already have all that in Python --
> > it's just not exposed too well at C level.
> >
> > Example:
> >
> > PyArg_ParseTuple(args, "O@", &factory, &tuple) would
> > return a factory function and a tuple storing the data of
> > the object passed to the function
> >
> > while
> >
> > Py_BuildValue("O@", factory, tuple) would simply call factory
> > with tuple and use the return value as object.
> >
> > (Note that void* can be wrapped into PyCObjects for "use" in
> > Python.)
> >
> I'm not sure we talk about the same thing: we (at least me) do not
> want to serialize and reconstruct objects (what pickle does),
> we want to convert objects from Python to C (convert them to parameters
> usable in C API-calls), and back (convert them from handles, pointers,
> whatever into Python objects) having only the Python *type* object
> available in the latter case.
> Or am I missing something?

I'm not really talking about serializing in the pickle sense (with
the intent of storing the data as string), it's more about
providing a way to recreate an object within the same process:
given an object x, provide a factory function f and a tuple args
such that x == apply(f, args).

Now, the objects Jack has in mind wrap C pointers, so the
args would be a tuple containing one PyCObject. Getting
the pointer out of a PyCObject is really easy and by using
a tuple as intermediate storage form, you can also support
more complex objects, e.g. objects wrapping more than one
pointer or value.

After you have accessed the internal values, possibly
calculating new ones, you can then construct a tuple,
pass it to the factory and return the same type of
object as you received on input.

Since the API would be fixed, helper functions could be
added to make all this really easy at C level.

Since a registry similar to copyreg or a new
method on the input object is used to construct the
factory function and the tuple, this mechanism can
easily be extended in Python as well as C.

Furthermore, the existing pickle mechanisms could be reused 
for the existing objects, since most of these use very
reasonable state tuples for storing the object state.
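
The factory-plus-tuple idea can be sketched at the Python level as
follows. This is only an illustration under assumptions: the registry and
the names register, decompose and rebuild are hypothetical, and the
proposed "O@" format code does not exist.

```python
from fractions import Fraction

# Hypothetical registry in the spirit of copyreg: type -> function
# returning (factory, args) for an instance of that type.
_decomposers = {}


def register(cls, decomposer):
    _decomposers[cls] = decomposer


def decompose(obj):
    """Return (factory, args) such that factory(*args) == obj."""
    return _decomposers[type(obj)](obj)


def rebuild(factory, args):
    """Modern spelling of apply(factory, args)."""
    return factory(*args)


register(Fraction, lambda f: (Fraction, (f.numerator, f.denominator)))
register(complex, lambda c: (complex, (c.real, c.imag)))

x = Fraction(3, 4)
factory, args = decompose(x)
# Work on the plain values, then build an object of the same type.
doubled = rebuild(factory, (args[0] * 2, args[1]))
```

A wrapper around a single C pointer would register a decomposer that
returns a one-element tuple; richer objects simply return longer tuples.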

I'm just suggesting this to make the whole wrapper
idea more flexible. One C void* pointer is really
only useful for very simple objects. The above easily
extends to complex objects such as e.g. mxDateTime

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 11 22:08:44 2002
From: (Greg Ward)
Date: Fri, 11 Jan 2002 17:08:44 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: <>
References: <> <> <>
Message-ID: <>

> I have an application (Grouch) that has to do a lot of trickery at
> pickle-time and unpickle-time, and as a result it happens to be
> sensitive to the order of unpickling.
> Anyone got a vague, hand-waving explanation for my vague, hand-waving
> complaint?  Or should I try to come up with a test case?
> Yes please, and post it to SourceForge.  There aren't that many
> changes in the source of since release 2.1.  (Or are you
> using cPickle?  If so, please say so.  The two aren't 100%
> equivalent.)

False alarm.  It appears that a change in dictionary order bit me; I was
lucky that pickling Grouch objects ever worked at all.

Lesson: when the code to support pickling is too complex to understand,
it's too complex.  Hmmm, that might have broader application.  ;-)
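
A small illustration of the trap, in current Python 3, where dicts keep
insertion order (in the 2.x interpreters discussed here the order was
arbitrary and could change between releases). The helper name
stable_items is made up; the point is only that order-sensitive code
should impose its own order.

```python
def stable_items(mapping):
    """Return the items of mapping sorted by key, independent of dict order."""
    return sorted(mapping.items())


first = {"alpha": 1, "beta": 2}
second = {"beta": 2, "alpha": 1}

# Equal dicts may still iterate in different orders ...
print(list(first), list(second))
# ... so code that cares about order should sort before iterating.
print(stable_items(first) == stable_items(second))
```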

Greg Ward - Linux geek                        
Time flies like an arrow; fruit flies like a banana.

From  Fri Jan 11 22:46:34 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 17:46:34 -0500
Subject: [Python-Dev] Change in unpickle order in 2.2?
In-Reply-To: <>
Message-ID: <>

[Greg Ward]
> False alarm.  It appears that a change in dictionary order bit me; I was
> lucky that pickling Grouch objects ever worked at all.

You were luckier we changed dict iteration order for your own good <wink>.

> Lesson: when the code to support pickling is too complex to understand,
> it's too complex.  Hmmm, that might have broader application.  ;-)

No, I'm sure Zope Corporation would officially deny, denounce and decry any
intimation that convolution in support of pickling is a vice.  The true
problem is more likely that you haven't yet added enough layers of
abstraction around your pickling code.  I'm especially suspicious of that
because you were able to figure out the cause of the problem in less than a
week ...

From  Fri Jan 11 22:54:10 2002
From: (Jason R. Mastaler)
Date: Fri, 11 Jan 2002 15:54:10 -0700
Subject: [Python-Dev] sourceforge: where should feature requests go?
Message-ID: <>

I noticed that the sourceforge tracker has a "Feature Requests"
category, but that "Bugs" also has a "Feature Request" group.

Which is the right place to submit new feature requests?  

From  Fri Jan 11 23:04:11 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 18:04:11 -0500
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <>
Message-ID: <>

[Jason R. Mastaler]
> I noticed that the sourceforge tracker has a "Feature Requests"
> category, but that "Bugs" also has a "Feature Request" group.
> Which is the right place to submit new feature requests?

To the FR tracker.  That didn't always exist, and all FRs ended up in the
Bug tracker instead, so we added a FR group to Bugs to try to keep track of
them.  Unfortunately, once you add a group to an SF tracker, it can never be
removed, so this confusion won't go away.

Thanks for asking!

From  Fri Jan 11 23:10:09 2002
From: (Jason R. Mastaler)
Date: Fri, 11 Jan 2002 16:10:09 -0700
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <> ("Tim Peters"'s
 message of "Fri, 11 Jan 2002 18:04:11 -0500")
References: <>
Message-ID: <>

"Tim Peters" <> writes:

> To the FR tracker.  That didn't always exist, and all FRs ended up
> in the Bug tracker instead, so we added a FR group to Bugs to try to
> keep track of them.

OK.  My next question is: when a new item is submitted to the FR
tracker, does anyone get notice of it, or does it lie until one of
you happens to stumble across it?  In other words, should a new
request be accompanied by an e-mail somewhere?


(TMDA ( 
(UCE intrusion prevention in Python)

From  Fri Jan 11 23:21:48 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 18:21:48 -0500
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <>
Message-ID: <>

[Jason R. Mastaler]
> OK.  My next question is: when a new item is submitted to the FR
> tracker, does anyone get notice of it, or does it lie until one of
> you happens to stumble across it?  In other words, should a new
> request be accompanied by an e-mail somewhere?

All new items and changes to the FR tracker are automatically emailed to  Whether anyone besides me is subscribed to
that list is a question I can't answer <wink>.

Fair warning:  feature requests usually don't go anywhere unless somebody
volunteers a patch (comprising code, doc, and test suite changes)
implementing the request.  If you're not in a position to do that yourself,
it can be helpful to discuss what you want on comp.lang.python too (in the
hope that somebody else gets inspired to do it).

From  Fri Jan 11 23:32:22 2002
From: (Jason R. Mastaler)
Date: Fri, 11 Jan 2002 16:32:22 -0700
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <> ("Tim Peters"'s
 message of "Fri, 11 Jan 2002 18:21:48 -0500")
References: <>
Message-ID: <>

"Tim Peters" <> writes:

> Fair warning: feature requests usually don't go anywhere unless
> somebody volunteers a patch (comprising code, doc, and test suite
> changes) implementing the request.


> If you're not in a position to do that yourself, it can be helpful
> to discuss what you want on comp.lang.python too (in the hope that
> somebody else gets inspired to do it).

I do have a FR (#499529) that hasn't gone anywhere, but I did
volunteer a patch, I just had a question about method naming that
needed answering, so I haven't attached anything yet.  And of course,
you might not even be interested in including such a feature which was
another reason to hold off.

(TMDA ( 
(UCE intrusion prevention in Python)

From  Fri Jan 11 23:47:32 2002
From: (Martin v. Loewis)
Date: Sat, 12 Jan 2002 00:47:32 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <20020111122105.B1808@ibook.distro.conectiva> (message from
 Gustavo Niemeyer on Fri, 11 Jan 2002 12:21:05 -0200)
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva>
Message-ID: <>

> #define __init      __attribute__ ((__section__ (".text.init")))
> After surrounding doc strings with a macro, this will be easy to achieve.

Unfortunately, not with the doc string you propose. Apparently, your
macro is going to be used as

char foo__doc__[] = Py_DocString("this is foo");

However, with the attribute, the resulting code should read

char foo__doc__[] __attribute__((__section__("docstring"))) =
  "this is foo";

You cannot define the macro so that it comes out as expanding to
__attribute__, at least not with that specific macro.


From  Sat Jan 12 00:26:28 2002
From: (Martin v. Loewis)
Date: Sat, 12 Jan 2002 01:26:28 +0100
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <>
References: <>
Message-ID: <>

> I noticed that the sourceforge tracker has a "Feature Requests"
> category, but that "Bugs" also has a "Feature Request" group.
> Which is the right place to submit new feature requests?  

Please use the separate tracker. The Bugs category predates the
separate tracker, but it cannot be removed (unfortunately).


From  Sat Jan 12 00:29:19 2002
From: (Martin v. Loewis)
Date: Sat, 12 Jan 2002 01:29:19 +0100
Subject: [Python-Dev] sourceforge: where should feature requests go?
In-Reply-To: <>
References: <>
Message-ID: <>

> All new items and changes to the FR tracker are automatically emailed to
>  Whether anyone besides me is subscribed to
> that list is a question I can't answer <wink>.

Let's assume you really don't know, for a moment: the subscriber list
is at


From Anthony Baxter <>  Sat Jan 12 00:59:46 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Sat, 12 Jan 2002 11:59:46 +1100
Subject: [Python-Dev] Re: Q: Testing Python 2.1.2 on cygwin
In-Reply-To: Message from Paul Everitt <>
 of "Fri, 11 Jan 2002 06:42:18 CDT." <>
Message-ID: <>

Paul Everitt tested on cygwin, and make test got:

> test_fork1
> test test_fork1 crashed -- exceptions.OSError: [Errno 11] Resource temporarily unavailable
> test_popen2
> test test_popen2 crashed -- exceptions.OSError: [Errno 11] Resource temporarily unavailable

Looks to me like cygwin's fork() support is busted.

In addition, the build of curses failed:

> build/temp.cygwin_nt-5.0-1.3.6-i686-2.1/_cursesmodule.o:/cygdrive/c/data/tmp/Python-2.1.2c1/Modules/_cursesmodule.c:1808: more undefined references to `acs_map' follow
> collect2: ld returned 1 exit status
> c:\data\tmp\Python-2.1.2c1\python.exe: *** unable to remap C:\apps\cygwin\bin\cygssl.dll to same address as parent -- 0x1A2E0000
>       0 [main] python 964 sync_with_child: child 304(0x170) died before initialization with status code 0x1
>   38927 [main] python 964 sync_with_child: *** child state child loading dlls
> c:\data\tmp\Python-2.1.2c1\python.exe: *** unable to remap C:\apps\cygwin\bin\cygssl.dll to same address as parent -- 0x1A2E0000
> 41469381 [main] python 964 sync_with_child: child 940(0x1E0) died before initialization with status code 0x1
> 41526514 [main] python 964 sync_with_child: *** child state child loading dll
> c:\data\tmp\Python-2.1.2c1\python.exe: *** unable to remap C:\apps\cygwin\bin\cygssl.dll to same address as parent -- 0x1A2E0000
> 113407858 [main] python 964 sync_with_child: child 1396(0x2F0) died before initialization with status code 0x1
> 113447685 [main] python 964 sync_with_child: *** child state child loading dl
> make: [test] Error 58 (ignored)

Anthony Baxter     <>   
It's never too late to have a happy childhood.

From  Sat Jan 12 02:32:19 2002
From: (Tim Peters)
Date: Fri, 11 Jan 2002 21:32:19 -0500
Subject: [Python-Dev] Re: Q: Testing Python 2.1.2 on cygwin
In-Reply-To: <>
Message-ID: <>

[Anthony Baxter]
> Paul Everitt tested on cygwin, and make test got:

There's a long section about known Cygwin problems in the main README file,
mostly written by Cygwin developers.  Make sure Paul followed the special
Cygwin build instructions before worrying too much; the failure of curses to
build won't go away regardless (see README).  I believe Michael Hudson is
(or was) paying more attention to Cygwin than other Python-Dev'ers.

From  Sat Jan 12 12:27:38 2002
From: (M.-A. Lemburg)
Date: Sat, 12 Jan 2002 13:27:38 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <>
Message-ID: <>

Martin v. Loewis wrote:

>>#define __init      __attribute__ ((__section__ (".text.init")))
> [...]
>>After surrounding doc strings with a macro, this will be easy to achieve.
> Unfortunately, not with the doc string you propose. Apparently, your
> macro is going to be used as
> char foo__doc__[] = Py_DocString("this is foo");
> However, with the attribute, the resulting code should read
> char foo__doc__[] __attribute__((__section__("docstring"))) =
>   "this is foo";
> You cannot define the macro so that it comes out as expanding to
> __attribute__, at least not with that specific macro.

Why don't you use a macro which only takes the name of the
static array and the doc-string itself as argument ? This
could then be expanded to whatever needs to be done for
a particular case/platform, e.g.

Py_DefineDocString(foo__doc__, "foo does bar");

(I use such an approach in the mx stuff and it works great.)

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sat Jan 12 16:24:14 2002
From: (Jason Tishler)
Date: Sat, 12 Jan 2002 11:24:14 -0500
Subject: [Python-Dev] Re: Q: Testing Python 2.1.2 on cygwin
In-Reply-To: <>
References: <> <>
Message-ID: <>


The problems with fork() and _curses that you reported are already known.
In fact, the _curses build problem is already solved.  Please read
the Python README or the latest Cygwin Python 2.2 README which is
available at:

On Fri, Jan 11, 2002 at 09:32:19PM -0500, Tim Peters wrote:
> [Anthony Baxter]
> > Paul Everitt tested on cygwin, and make test got:
> There's a long section about known Cygwin problems in the main README file,
> mostly written by Cygwin developers.  Make sure Paul followed the special
> Cygwin build instructions before worrying too much;

The fork() problem can be worked around by building the _socket module

> the failure of curses to build won't go away regardless (see README).

This problem has been solved in the latest Cygwin ncurses release:

Note that you will have to explicitly ask Cygwin's setup.exe to install
this ncurses version because it is still marked as test.

FYI, a pre-built Python 2.2 is part of the standard Cygwin distribution:

BTW, if you are trying to build other Python versions (e.g., 2.1.2), you
may find the Cygwin specific patch (i.e., CYGWIN-PATCHES/python.patch)
in the Python source tarballs on the Cygwin mirrors useful to review.


From  Sat Jan 12 11:28:46 2002
From: (Andrew MacIntyre)
Date: Sat, 12 Jan 2002 22:28:46 +1100 (EDT)
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
Message-ID: <>

In preparing a set of patches intended to bring the OS/2 EMX port into
CVS, I have a dilemma as to how best to integrate some changes to standard
library modules.

As background to this request I note that EMX and Cygwin have similar
philosophies and attributes, being Posix/Unixish runtime environments on
OSes with PC-DOS ancestry. Both rely on the GNU toolchain for software

As a result of feedback on the previous set of patches, I am pruning
cosmetic changes and attempting to minimise the footprint of the necessary

The particular changes I am looking for guidance on (or BDFL
pronouncement on, as the case may be) involve and the functionality

The approach used in the port as released in binary form was to create a
module called (probably should really be called,
which replicates the functionality of with OS2/EMX specific

Most of the changes have to do with using different path separator
characters, with a few other changes reflecting slightly different
behaviour under EMX.  EMX promotes the use of '/' as the path separator
rather than '\', though it works with the latter.  I don't know if Cygwin
promotes the same convention.

If I were to merge into (which I incline towards
instinctively) I believe that using references to os.sep and os.altsep
rather than explicit '\\' and '/' strings would significantly reduce the
extent of conditionalisation required, but in the process introduce
significant source changes into (although the logical changes
would be much less significant).

If rationalising the use of separator characters (by moving away from
hard-coded strings) in is unattractive, then I think I'd prefer
to keep (renamed to as is, rather than revert
to the DOS standard path separators.

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Sat Jan 12 20:11:15 2002
From: (Guido van Rossum)
Date: Sat, 12 Jan 2002 15:11:15 -0500
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: Your message of "Sat, 12 Jan 2002 22:28:46 +1100."
References: <>
Message-ID: <>

> The particular changes I am looking for guidance on (or BDFL
> pronouncement on, as the case may be) involve and the functionality
> in
> The approach used in the port as released in binary form was to create a
> module called (probably should really be called,
> which replicates the functionality of with OS2/EMX specific
> changes.
> Most of the changes have to do with using different path separator
> characters, with a few other changes reflecting slightly different
> behaviour under EMX.  EMX promotes the use of '/' as the path separator
> rather than '\', though it works with the latter.  I don't know if Cygwin
> promotes the same convention.
> If I were to merge into (which I incline towards
> instinctively) I believe that using references to os.sep and os.altsep
> rather than explicit '\\' and '/' strings would significantly reduce the
> extent of conditionalisation required, but in the process introduce
> significant source changes into (although the logical changes
> would be much less significant).
> If rationalising the use of separator characters (by moving away from
> hard-coded strings) in is unattractive, then I think I'd prefer
> to keep (renamed to as is, rather than revert
> to the DOS standard path separators.

The various modules ntpath, posixpath, macpath etc. are not just there
to support their own platform on itself.  They are also there to
support foreign pathname twiddling.  E.g. On Windows I might have a
need to munge posix paths -- I can do that by explicitly importing
posixpath.  Likewise the reverse.
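
As a minimal sketch of this kind of foreign pathname twiddling, assuming
only the standard ntpath and posixpath modules (the sample paths are made
up for illustration):

```python
import ntpath
import posixpath

# Both modules can be imported on any platform, whatever the host OS is.
win_parts = ntpath.split(r"C:\spam\eggs.txt")
unix_parts = posixpath.split("/usr/local/lib/python")

# Each module applies its own separator rules when joining.
win_joined = ntpath.join("C:\\spam", "eggs.txt")
unix_joined = posixpath.join("/usr/local", "lib")
```

Because each module hard-codes its own separators, the results do not
depend on the value of os.sep on the machine running the code.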

So I think changing to use os.sep etc. would be wrong, and
creating a new file is the right thing to do -- despite
the endless cloning of the same code. :-(  (Maybe a different way to
share more code between the XXXpath modules could be devised.)

--Guido van Rossum (home page:

From  Sat Jan 12 20:36:44 2002
From: (Martin v. Loewis)
Date: Sat, 12 Jan 2002 21:36:44 +0100
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: <>
 (message from Andrew MacIntyre on Sat, 12 Jan 2002 22:28:46 +1100
References: <>
Message-ID: <>

> If rationalising the use of separator characters (by moving away from
> hard-coded strings) in is unattractive, then I think I'd prefer
> to keep (renamed to as is, rather than revert
> to the DOS standard path separators.

I think replacing hard-coded separators by os.sep is a good thing to
do. However, if you find that you cannot achieve re-use of ntpath for
OS/2 by existing customization alone, please do not add conditional
code into ntpath. It would be very confusing if, in, there
is a test whether the system is OS/2.


From  Sun Jan 13 07:39:17 2002
From: (Tim Peters)
Date: Sun, 13 Jan 2002 02:39:17 -0500
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: <>
Message-ID: <>

> The various modules ntpath, posixpath, macpath etc. are not just there
> to support their own platform on itself.  They are also there to
> support foreign pathname twiddling.  E.g. On Windows I might have a
> need to munge posix paths -- I can do that by explicitly importing
> posixpath.  Likewise the reverse.


> So I think changing to use os.sep etc. would be wrong, and
> creating a new file is the right thing to do -- despite
> the endless cloning of the same code. :-(  (Maybe a different way to
> share more code between the XXXpath modules could be devised.)

Create and put shared routines there.  Then a can

from _commonpath import f, g, h

to re-export them.

An excellent candidate for inclusion would be expandvars():  the different
routines for that now have radically different behaviors in endcases, it's
impossible to say which are bugs or features, yet the routine *should* be
wholly platform-independent (no, the Mac doesn't need its own version --
when an envar isn't found, the $envar token is retained literally).  Having
different versions of walk() is also silly, ditto isdir(), etc.
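
A rough sketch of what a shared expandvars() could look like, assuming the
behaviour described above (a variable that isn't found is kept literally).
The regular expression and the optional environ parameter are illustrative
assumptions, not existing library code:

```python
import os
import re

# Matches $NAME or ${NAME}; the exact token syntax is an assumption.
_VAR = re.compile(r"\$(\w+|\{[^}]*\})")


def expandvars(path, environ=None):
    """Expand $NAME and ${NAME}; leave unknown variables untouched."""
    if environ is None:
        environ = os.environ

    def replace(match):
        name = match.group(1)
        if name.startswith("{"):
            name = name[1:-1]
        # Unknown variable: keep the original token literally.
        return environ.get(name, match.group(0))

    return _VAR.sub(replace, path)
```

Each XXXpath module could then re-export the one function instead of
carrying its own copy with its own endcase behaviour.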

From  Mon Jan 14 06:20:32 2002
From: (Neil Hodgson)
Date: Mon, 14 Jan 2002 17:20:32 +1100
Subject: [Python-Dev] PEP 277: Unicode file name support for Windows NT, was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <>
Message-ID: <02ff01c19cc3$92514540$0acc8490@neil>

M.-A. Lemburg:

> Guys, this discussion is getting somewhat out of hand. I believe 
> that no-one on python-dev is seriously following this anymore,
> yet OTOH you are working on a rather important part of the Python
> file API.
> I'd suggest to write up the problem and your conclusions as a
> PEP for everyone to understand before actually starting to
> checkin anything.

   OK, PEP 277 is now available from:


From  Mon Jan 14 07:11:54 2002
From: (Martin v. Loewis)
Date: Mon, 14 Jan 2002 08:11:54 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT, was PEP-time ? ...
In-Reply-To: <02ff01c19cc3$92514540$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil>
Message-ID: <>

>    OK, PEP 277 is now available from:

Looks very good to me, except that the listdir approach (unicode in,
unicode out) should apply uniformly to all platforms; I'll provide an
add-on patch to your implementation once the PEP is approved.


From  Mon Jan 14 11:30:53 2002
From: (Gustavo Niemeyer)
Date: Mon, 14 Jan 2002 09:30:53 -0200
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <>
Message-ID: <20020114093053.C1325@ibook.distro.conectiva>

> Why don't you use macro which only takes the name of the
> static array and the doc-string itself as argument ? This
> could then be expanded to whatever needs to be done for
> a particular case/platform, e.g.
> Py_DefineDocString(foo__doc__, "foo does bar");
> (I use such an approach in the mx stuff and it works great.)

Yes, it's a nice idea!

I'm looking for some way to "discard" the string using a macro. Let me
explain with code:

[...]
#define Py_DOCSTR(name, str) static char *name = str
#define Py_DOCSTR_START(name) Py_DOCSTR(name,)
#define Py_DOCSTR_END ;
#else
#define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what
                                                      follows somehow */
#define Py_DOCSTR_END /* Stop discarding */
#endif
[...]

This would make it possible to do something like this:

Py_DOCSTR(simple_doc, "This is a simple doc string.");

...and also...

Py_DOCSTR_START(complex_doc)
"This is a complex doc string"
#ifndef MS_WIN16
"like the one in sysmodule.c"
#endif
"Something else"
Py_DOCSTR_END

This seems to be the most elegant way to allow these complex strings.
But unfortunately, I haven't found any way so far to do this "discarding
thing", besides including another "#if" in the documentation itself.

Any good ideas?

Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]


From  Mon Jan 14 12:05:57 2002
From: (M.-A. Lemburg)
Date: Mon, 14 Jan 2002 13:05:57 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva>
Message-ID: <>

Gustavo Niemeyer wrote:
> > Why don't you use macro which only takes the name of the
> > static array and the doc-string itself as argument ? This
> > could then be expanded to whatever needs to be done for
> > a particular case/platform, e.g.
> >
> > Py_DefineDocString(foo__doc__, "foo does bar");
> >
> > (I use such an approach in the mx stuff and it works great.)
> Yes, it's a nice idea!
> I'm looking for some way to "discard" the string using a macro. Let me
> explain with code:
> [...]
> #define Py_DOCSTR(name, str) static char *name = str
> #define Py_DOCSTR_START(name) Py_DOCSTR(name,)
> #define Py_DOCSTR_END ;
> #else
> #define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what
>                                                       follows somehow */
> #define Py_DOCSTR_END /* Stop discarding */
> #endif
> [...]
> This would make it possible to do something like this:
> Py_DOCSTR(simple_doc, "This is a simple doc string.");
> ...and also...
> Py_DOCSTR_START(complex_doc)
> "This is a complex doc string"
> #ifndef MS_WIN16
> "like the one in sysmodule.c"
> #endif
> "Something else"
> This seems to be the most elegant way to allow these complex strings.
> But unfortunately, I haven't found any way so far to do this "discarding
> thing", besides including another "#if" in the documentation itself.
> Any good ideas?

Wouldn't it be much simpler to wrap the complete Py_DOCSTR() 
into #ifdefs ?

BTW, I don't think we'll ever need to #ifdef doc-strings for platforms;
you can just as well put the information for all platforms into
the doc-string -- after all, the recipient is a human with enough
non-AI to parse the doc-string into meaningful sections ;-)

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Mon Jan 14 12:41:46 2002
From: (Gustavo Niemeyer)
Date: Mon, 14 Jan 2002 10:41:46 -0200
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <>
Message-ID: <20020114104146.A2607@ibook.distro.conectiva>

> Wouldn't it be much simpler to wrap the complete Py_DOCSTR()
> into #ifdefs ?

Yes, it's going to be wrapped! I took this code out of a file I was
using to show the #ifdef problem.

> BTW, I don't think we'll ever need to #ifdef doc-strings for platforms;

This would make things pretty easy, but note that we are *already*
#ifdef'ing doc-strings for platforms. Python/sysmodule.c is an example
of such.

> you can just as well put the information for all platforms into
> the doc-string -- after all, the recipient is a human with enough
> non-AI to parse the doc-string into meaningful sections ;-)

Cool! Are we going to change the existing doc strings then?

Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5  60E2 2253 B29A 6664 3A0C ]


From  Mon Jan 14 13:41:57 2002
From: (
Date: Mon, 14 Jan 2002 07:41:57 -0600
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <20020114093053.C1325@ibook.distro.conectiva>
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva>
Message-ID: <>

The following is the solution that comes to mind for me.  My other idea was
creating a static char* or a static function with the char* inside it, in
the hopes it would be discarded as unused, but gcc doesn't seem to do that.

Seems to me that, compared to this, rewriting the docstrings that are
already victims of preprocessor conditionals would certainly be better
for the readability of the docstrings in the source code...

Jeff Epler
On Mon, Jan 14, 2002 at 09:30:53AM -0200, Gustavo Niemeyer wrote:
> I'm looking for some way to "discard" the string using a macro. Let me
> explain with code:
> [...]
> #define Py_DOCSTR(name, str) static char *name = str
> #define Py_DOCSTR_START(name) Py_DOCSTR(name,)
> #define Py_DOCSTR_END ;
  #define Py_DOCSTR_PART(s) s
> #else
> #define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what
>                                                       follows somehow */
> #define Py_DOCSTR_END /* Stop discarding */
  #define Py_DOCSTR_PART(s) /* (nothing) */
> #endif
> [...]
> This would make it possible to do something like this:
> Py_DOCSTR(simple_doc, "This is a simple doc string.");
> ...and also...
> Py_DOCSTR_START(complex_doc)
  Py_DOCSTR_PART(              "This is a complex doc string")
> #ifndef MS_WIN16
  Py_DOCSTR_PART(              "like the one in sysmodule.c")
> #endif
  Py_DOCSTR_PART(              "Something else")

From  Mon Jan 14 14:25:46 2002
From: (Fred L. Drake, Jr.)
Date: Mon, 14 Jan 2002 09:25:46 -0500 (EST)
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum writes:
 > The various modules ntpath, posixpath, macpath etc. are not just there
 > to support their own platform on itself.  They are also there to

Note that ntpath.abspath() relies on nt._getfullpathname().  It is not
unreasonable for this particular function to require that it actually
be running on NT, so I'm not going to suggest changing this.  On the
other hand, it means the portable portions of the module are (mostly)
not tested when the regression test is run on a platform other than
Windows; the ntpath.abspath() test raises an ImportError since
ntpath.abspath() imports the "nt" module within the function, and the
resulting ImportError causes the rest of the unit test to be skipped
and reports that the test is skipped.

I'd like to change the test so that the abspath() test is only run
if the "nt" module is available:

try:
    import nt
except ImportError:
    pass
else:
    tester('ntpath.abspath("C:\\")', "C:\\")

Any objections?


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From sdm7g@Virginia.EDU  Mon Jan 14 17:22:26 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 12:22:26 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
Message-ID: <Pine.OSX.4.43.0201141213040.974-100000@localhost>

Since PEP 216 on string interpolation is still active, I'd appreciate
it if some of its supporters would comment on my revised alternative
solution (posted on comp.lang.python and at google thru):


I didn't get any feedback on the first version that was posted --


particularly whether the syntax was acceptable, or if a 'magic string'
solution was still preferred.

-- Steve Majewski

From  Mon Jan 14 09:03:24 2002
From: (Andrew MacIntyre)
Date: Mon, 14 Jan 2002 20:03:24 +1100 (EDT)
Subject: [Python-Dev] guidance sought: merging port related changes to
 Library modules
In-Reply-To: <>
Message-ID: <>

On Sat, 12 Jan 2002, Guido van Rossum wrote:

> The various modules ntpath, posixpath, macpath etc. are not just there
> to support their own platform on itself.  They are also there to
> support foreign pathname twiddling.  E.g. On Windows I might have a
> need to munge posix paths -- I can do that by explicitly importing
> posixpath.  Likewise the reverse.
> So I think changing to use os.sep etc. would be wrong, and
> creating a new file is the right thing to do -- despite
> the endless cloning of the same code. :-(  (Maybe a different way to
> share more code between the XXXpath modules could be devised.)

I'd not considered the foreign path munging use, which I agree justifies
retaining hardcoded path separators.

I'll proceed on the basis of the approach.  I'll pass for
the time being on Tim's suggested rationalisation approach though...

Thanks for the enlightenment.

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Mon Jan 14 18:43:30 2002
From: (M.-A. Lemburg)
Date: Mon, 14 Jan 2002 19:43:30 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva>
Message-ID: <>

Gustavo Niemeyer wrote:
> > Wouldn't it be much simpler to wrap the complete Py_DOCSTR()
> > into #ifdefs ?
> Yes, it's going to be wrapped! I took this code out of a file I was
> using to show the #ifdef problem.
> > BTW, I don't think we'll ever need to #ifdef doc-strings for platforms;
> This would make things pretty easy, but note that we are *already*
> #ifdef'ing doc-strings for platforms. Python/sysmodule.c is an example
> of such.

Hmm, I wasn't aware of such doc-strings.
> > you can just as well put the information for all platforms into
> > the doc-string -- after all, the recipient is a human with enough
> > non-AI to parse the doc-string into meaningful sections ;-)
> Cool! Are we going to change the existing doc strings then?

Well, can't speak for PythonLabs, but I don't see any benefit
from making doc-strings complicated by introducing #ifdefs. It
doesn't buy us anything, IMHO. Even worse: it makes translating
the doc-strings harder.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Mon Jan 14 18:47:05 2002
From: (Guido van Rossum)
Date: Mon, 14 Jan 2002 13:47:05 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: Your message of "Mon, 14 Jan 2002 19:43:30 +0100."
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva>
Message-ID: <>

> Well, can't speak for PythonLabs, but I don't see any benefit
> from making doc-string complicated by introducing #ifdefs. It
> doesn't buy us anything, IMHO. Even worse: it makes translating
> the doc-strings harder.

If there is platform-specific functionality, the docstring should
document that only on the platform where it applies.

--Guido van Rossum (home page:

From  Mon Jan 14 19:56:05 2002
From: (M.-A. Lemburg)
Date: Mon, 14 Jan 2002 20:56:05 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva>
 <> <>
Message-ID: <>

Guido van Rossum wrote:
> > Well, can't speak for PythonLabs, but I don't see any benefit
> > from making doc-string complicated by introducing #ifdefs. It
> > doesn't buy us anything, IMHO. Even worse: it makes translating
> > the doc-strings harder.
> If there is platform-specific functionality, the docstring should
> document that only on the platform where it applies.

Just to make sure... I was talking about something like:

open__doc__ = \
    "Open the file. On Windows, the MBCS encoding is assumed, "\
    "on all other systems, the file name must be given in ASCII.";

vs.

#ifdef MS_WINDOWS
open__doc__ = \
    "Open the file, assuming the filename is given in the MBCS "\
    "encoding.";
#else
open__doc__ = \
    "Open the file, assuming the filename is given in ASCII.";
#endif

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Mon Jan 14 19:51:21 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 11:51:21 -0800
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
References: <Pine.OSX.4.43.0201141213040.974-100000@localhost>
Message-ID: <>

Steven Majewski wrote:
> particularly whether the syntax was acceptable, or if a 'magic string'
> solution was still preferred.

IMHO, string interpolation should be one of the easiest things in the
language. It should be something you learn in the first half of your
first day learning Python. Any extra level of logical indirection seems
misplaced to me.

 Paul Prescod

From  Mon Jan 14 20:12:24 2002
From: (Skip Montanaro)
Date: Mon, 14 Jan 2002 14:12:24 -0600
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
References: <20020110224908.C884@ibook.distro.conectiva>
Message-ID: <>

    >> If there is platform-specific functionality, the docstring should
    >> document that only on the platform where it applies.

    mal> Just to make sure... I was talking about something like:

    mal> open__doc__ = \
    mal>     "Open the file. On Windows, the MBCS encoding is assumed, "\
    mal>     "on all other systems, the file name must be given in ASCII.";


    mal> vs.

    mal> #ifdef MS_WINDOWS
    mal> open__doc__ = \
    mal>     "Open the file, assuming the filename is given in the MBCS "\
    mal>     "encoding.";
    mal> #else
    mal> open__doc__ = \
    mal>     "Open the file, assuming the filename is given in ASCII.";
    mal> #endif


I agree w/ MAL.  I happen to be developing an application on Linux right
now, but I'm interested in where I might encounter problems when it migrates
to Windows.  I would much prefer the documentation make it eas(y|ier) to
identify platform differences.  This holds true for docstrings, because they
are the most readily available documentation format.


From  Mon Jan 14 20:06:52 2002
From: (Guido van Rossum)
Date: Mon, 14 Jan 2002 15:06:52 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: Your message of "Mon, 14 Jan 2002 20:56:05 +0100."
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <>
Message-ID: <>

> Just to make sure... I was talking about something like:
> open__doc__ = \
>     "Open the file. On Windows, the MBCS encoding is assumed, "\
>     "on all other systems, the file name must be given in ASCII.";
> vs.
> #ifdef MS_WINDOWS
> open__doc__ = \
>     "Open the file, assuming the filename is given in the MBCS "\
>     "encoding.";
> #else
> open__doc__ = \
>     "Open the file, assuming the filename is given in ASCII.";
> #endif

Given the main use case for docstrings, I'd prefer the latter.  The
library manual should contain the "all-platforms" documentation.

--Guido van Rossum (home page:

From  Mon Jan 14 20:17:19 2002
From: (Guido van Rossum)
Date: Mon, 14 Jan 2002 15:17:19 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: Your message of "Mon, 14 Jan 2002 14:12:24 CST."
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <> <>
Message-ID: <>

> I agree w/ MAL.  I happen to be developing an application on Linux
> right now, but I'm interested in where I might encounter problems
> when it migrates to Windows.  I would much prefer the documentation
> make it eas(y|ier) to identify platform differences.  This holds
> true for docstrings, because they are the most readily available
> documentation format.

But what about optional features that are only available on platform
X?  Do you really want those to clutter up the docstring on platforms
where they aren't available?  On the platform where they *are*, their
docstring should have a "(Platform X only)" note.

--Guido van Rossum (home page:

From  Mon Jan 14 20:19:13 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 14:19:13 -0600
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

Paul Prescod wrote:
> IMHO, string interpolation should be one of the easiest things in the
> language. It should be something you learn in the first half of your
> first day learning Python. Any extra level of logical indirection seems
> misplaced to me.

+1


## Jason Orendorff

From sdm7g@Virginia.EDU  Mon Jan 14 20:44:05 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 15:44:05 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Paul Prescod wrote:

> > particularly whether the syntax was acceptable, or if a 'magic string'
> > solution was still preferred.
> IMHO, string interpolation should be one of the easiest things in the
> language. It should be something you learn in the first half of your
> first day learning Python. Any extra level of logical indirection seems
> misplaced to me.

Do you have any comments or suggestions about a substitution syntax, Paul?

 I think anything except PEP 216's magic initial u" for strings is
able to be done with an object extension rather than a syntax change,
including the substitution syntax within the magic string.

 I kept '%' rather than '$' because I assumed that particular char
choice was a rather arbitrary part of the design patterned after Tcl
or Perl, and that by keeping '%' I could do it with a dict. If a
different syntax is desired, then it can be done by extending string
to a magic format string object (rather than a magic string syntax).

 I'm not sure what you mean by logical indirection here: is that
a comment on the syntax, or do you object to the idea of not implementing
substitution by a language syntax change? (But if what you mean is
you want fewer chars for a double substitution, that's something that
can be fixed.)

 One reason I would prefer a "magic object" implementation, rather than
a 'magic syntax' one, is that, after playing around with this for a bit,
I can see that there are a lot of possibilities for various substitution
and template languages. A language syntax change, once accepted, is cast
in stone (and a new revised proposal is much less likely to be
considered) while we can muck about and experiment with object extensions
both before and after they get put into the standard lib.

-- Steve Majewski

From sdm7g@Virginia.EDU  Mon Jan 14 20:49:17 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 15:49:17 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Jason Orendorff wrote:

> Paul Prescod wrote:
> > IMHO, string interpolation should be one of the easiest things in the
> > language. It should be something you learn in the first half of your
> > first day learning Python. Any extra level of logical indirection seems
> > misplaced to me.
> +1

Was that +1 for PEP 216, my alternative proposal, or Paul's comments?

I think I agree with his comment above, but I'm not sure whether it
was intended as a comment on the syntax (which is probably justified),
or objecting to solving the problem other than by changing the language
syntax (which I don't agree with), or just a statement of principles.

-- Steve

From  Mon Jan 14 21:04:17 2002
From: (Skip Montanaro)
Date: Mon, 14 Jan 2002 15:04:17 -0600
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
References: <20020110224908.C884@ibook.distro.conectiva>
Message-ID: <>

    >> I would much prefer the documentation make it eas(y|ier) to identify
    >> platform differences.  This holds true for docstrings, because they
    >> are the most readily available documentation format.

    Guido> But what about optional features that are only available on
    Guido> platform X?  Do you really want those to clutter up the docstring
    Guido> on platforms where they aren't available?  On the platform where
    Guido> they *are*, their docstring should have a "(Platform X only)"
    Guido> note.

Perhaps I should take a half-step back under Guido's withering stare.
That's probably why I've been feeling a chill all day... ;-)

I don't think it's necessary for the docstring to contain all the
excruciating detail available in the library reference manual, but I think a


at the interpreter prompt or a docstring popped up in PyCrust or other
IDE-like thing should give you an indication that there are semantic
differences for that function across platforms.  Ideally, these differences
would only be documented at the highest level they can come into play.  For
example, if a class or module exhibits some platform-dependency, its
docstring would indicate that, not the docstring of every one of its
methods.  Also, consider time.strptime.  It's not always available, so the
time module's docstring should mention its possible absence depending on
platform.  On platforms where it's not supported, putting a "platform x
only" note in strptime's docstring won't be much help to the confused
programmer wondering where it disappeared to.

Of course, it's easy for me to spout platitudes here.  Adjusting to such a
convention will probably add a fair amount of work to somebody's already
full schedule.


From  Mon Jan 14 21:32:01 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 15:32:01 -0600
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

> > +1
> Was that +1 for PEP 216?, my alternative proposal? or Paul's comments?

It was a +1 for Paul's comments, both as principles and as maybe
a -0.3 criticism of your alternative.  No opinion on PEP 215.

## Jason Orendorff

From  Mon Jan 14 21:34:41 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 13:34:41 -0800
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
References: <>
Message-ID: <>

Steven Majewski wrote:
>  I'm not sure what you mean by logical indirection here: is that
> a comment on the syntax, or do you object to the idea of not implementing
> substitution by a language syntax change. 

Sorry I wasn't clear. Let's say it's the second hour of our Perl/Python
class.

Here's Perl:

$a = 5;
$b = 6;
print "$a $b";

Lots of yucky extra chars in that code but you can't find much negative
stuff to say about the complexity of the string interpolation!

Here's Python:

a = 5;
b = 6;
print "%(a)s %(b)s" % vars()

Extra indirection: What does % do? What does vars() do? What does the
"s" mean? How does this use of % relate to the traditional meanings of
either percentage or modulus? 

This is one of the two problems I would like PEP 215 to solve. The other
one is to allow simple function calls and array lookups etc. to be done
"inline" to avoid setting up trivial vars or building unnecessary
dictionaries. If I understand your proposal correctly, I could only get
the evaluation behaviour by making the "indirection" problem even worse,
adding in yet another function call (well, class constructor call),
tentatively called EvalDict.

Another benefit of the PEP 215 model is that something hard-coded in the
syntax is much more amenable to compile time analysis. String
interpolation is actually quite compatible with standard compilation
techniques. You just rip the expressions out of the string, compile them
to byte-code and replace them with pointers to the evaluated results. As
PEP 215 mentions, this also has advantages for reasoning about security.
If I tell a new programmer to avoid the use of "eval" unless they
consult with me, I'll have to tell them to avoid EvalDict also. My usual
approach is to consider eval and exec to be advanced (and rarely used)
features that I don't even teach new programmers.

I don't know that Jython allows me today to ship a JAR without the
Python parser and evaluator but I could imagine a future version that
would give me that option. Widespread use of EvalDict would render that
option useless.

Re: $ versus %. $ is "the standard" in other languages and shells. % is
the current standard in Python. $ has the advantage that it doesn't have
to work around Python's current C-inspired syntax. So I guess I
reluctantly favor $.

Also, EvalDict should be called evaldict to match the other constructors
in __builtins__.

So while I understand the advantage of non-syntactic solutions, in this
case I am still in favor of the syntax.

 Paul Prescod

From sdm7g@Virginia.EDU  Mon Jan 14 22:06:43 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 17:06:43 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

[ Oops. Initial subject line said incorrectly PEP 216]

On Mon, 14 Jan 2002, Paul Prescod wrote:

> [...] As
> PEP 215 mentions, this also has advantages for reasoning about security.
> If I tell a new programmer to avoid the use of "eval" unless they
> consult with me, I'll have to tell them to avoid EvalDict also. My usual
> approach is to consider eval and exec to be advanced (and rarely used)
> features that I don't even teach new programmers.

 But if you're going to allow interpolation of the results of arbitrary
function into a string, it's going to be a security problem whether
or not you use 'eval' to do it. My code hides the eval in the object's
python code. u" strings would hide the eval in the C code. How is one
more or less secure than the other.
 The security issue seems to be an argument for a non-language-syntax
implementation, as it means that: the hidden eval's could be controlled
with a restricted execution environment. ( Also the same advantages I
cited to easily experiment with alternatives -- we could roll out a
solution without having to tackle the security issue right away.)
 Also, although I agree with most of your other comments on making it
simple and easy, the security issue argues against making it TOO simple.
For example, I was considering making the current namespace of the
call a default, so you wouldn't need globals() -- but I was worried
that because of security and other issues, maybe that was too much
"magic" . I think maybe how much magic is enough and how much is too
much is one of the issues to discuss.

Thanks for expanding on your initial comment.
I think you're right that it needs to be simpler.
But, for several reasons, security among them, I'm still -1 on
PEP 215.

-- Steve

From sdm7g@Virginia.EDU  Mon Jan 14 22:18:25 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 17:18:25 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Steven Majewski wrote:

> [...]  I think maybe how much magic is enough and how much is too
> much is one of the issues to discuss.
> Thanks for expanding on your initial comment.
> I think you're right that it needs to be simpler.
> But, for several reasons, security among them, I'm still -1 on
> PEP 215.

In fact, I think "too much magic" is my main objection to PEP 215.
Having a magic string, which looks like it's a constant, with no operators
or function calls associated with it being the implicit source of a whole
series of function calls and possibly unbounded computations is just
hiding too much magic for me to swallow.  u"$$main()" ?

-- Steve

From  Mon Jan 14 22:20:25 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 14:20:25 -0800
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
References: <>
Message-ID: <>

Steven Majewski wrote:
>  But if you're going to allow interpolation of the results of arbitrary
> function into a string, it's going to be a security problem whether
> or not you use 'eval' to do it. My code hides the eval in the object's
> python code. u" strings would hide the eval in the C code. How is one
> more or less secure than the other.

I think you mean $" strings, not u" strings. Given:

a = $" $, 5)"

I can translate that *at compile time* to:

a = $" %s" %, 5)

No runtime evaluation is necessary. So I see no security issues here. On
the other hand, evaldict really does have the same semantics as an eval,
right? Probably it is no more or less dangerous if you only do a single
level of EvalDict-ing. But once you get into multiple levels you could
get into a situation where user-provided code is being evaluated. The
first level of EvalDict incorporates the user-provided code into the
string and the second level evaluates it.

Ping's current runtime implementation does use "eval" but you could
imagine an alternate implementation that actually parses the relevant
parts of the string according to the Python grammar, and merely applies
the appropriate semantics. It would use "." to trigger getattr, "()" to
trigger apply, "[]" to trigger getitem and so forth. Then there would be
no eval and thus no way to eval user-provided code.
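
A minimal sketch of that eval-free idea, written for a current Python 3.
It is not Ping's Itpl code: the names interpolate, _REF and _STEP are
made up here, and the grammar is an assumption that covers only $name
followed by .attr or [integer] steps (no calls, no general subscripts).

```python
import re

# Hypothetical grammar for illustration only: $name followed by any number
# of .attr or [integer] steps. PEP 215 itself allows more than this.
_REF = re.compile(r"\$([A-Za-z_]\w*)((?:\.[A-Za-z_]\w*|\[\d+\])*)")
_STEP = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")


def interpolate(template, namespace):
    """Expand $name.attr[0] references by walking objects, never calling eval."""
    def expand(match):
        value = namespace[match.group(1)]
        for attr, index in _STEP.findall(match.group(2)):
            if attr:
                value = getattr(value, attr)   # "." triggers getattr
            else:
                value = value[int(index)]      # "[]" triggers getitem
        return str(value)
    return _REF.sub(expand, template)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


ns = {"p": Point(3, 4), "items": ["spam", "eggs"], "x": "$items[0]"}
print(interpolate("p is ($p.x, $p.y); second item is $items[1]", ns))
print(interpolate("input was: $x", ns))
```

The second call shows that text substituted in for $x is not scanned
again, so a "$items[0]" that arrives as data stays as data.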

 Paul Prescod

From  Mon Jan 14 22:39:16 2002
From: (Skip Montanaro)
Date: Mon, 14 Jan 2002 16:39:16 -0600
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
References: <>
Message-ID: <>

    Paul> Sorry I wasn't clear. Let's say it's the second hour of our
    Paul> Perl/Python class.

    Paul> Here's Perl:

    Paul> $a = 5;
    Paul> $b = 6;
    Paul> print "$a $b";


    Paul> Here's Python:

    Paul> a = 5;
    Paul> b = 6;
    Paul> print "%(a)s %(b)s" % vars()

So?  There are some things Perl does better than Python, some things Python
does better than Perl.  Maybe this is a (small) notch in Perl's gun.  It
just doesn't seem significantly better enough to me to warrant a language
change.  I would have written the Python example as

    print a, b

For the simple examples that would normally arise in an introductory
programming class, I think Python's print statement works just fine.  For
more hairy cases, Perl probably wins.  That's life.

but-that's-just-me-ly, y'rs,

Skip Montanaro

From sdm7g@Virginia.EDU  Mon Jan 14 22:59:08 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 17:59:08 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Paul Prescod wrote:

> Steven Majewski wrote:
> >
> >....
> >
> >  But if you're going to allow interpolation of the results of arbitrary
> > function into a string, it's going to be a security problem whether
> > or not you use 'eval' to do it. My code hides the eval in the object's
> > python code. u" strings would hide the eval in the C code. How is one
> > more or less secure than the other.
> I think you mean $" strings, not u" strings. Given:

Oops. Yes.

> a = $" $, 5)"
> I can translate that *at compile time* to:
> a = $" %s" %, 5)
> No runtime evaluation is necessary. So I see no security issues here. On
> the other hand, evaldict really does have the same semantics as an eval,
> right? Probably it is no more or less dangerous if you only do a single
> level of EvalDict-ing. But once you get into multiple levels you could
> get into a situation where user-provided code is being evaluated. The
> first level of EvalDict incorporates the user-provided code into the
> string and the second level evaluates it.

The multiple level was an addition to the last version because that
was what some people expressed a desire for in the earlier string
interpolation discussion. EvalDict2 does a single level eval.
( Again: that seems to me to be an argument for several alternative
  object versions rather than one builtin syntax change. )

> Ping's current runtime implementation does use "eval" but you could
> imagine an alternate implementation that actually parses the relevant
> parts of the string according to the Python grammar, and merely applies
> the appropriate semantics. It would use "." to trigger getattr, "()" to
> trigger apply, "[]" to trigger getitem and so forth. Then there would be
> no eval and thus no way to eval user-provided code.

 The same thing holds for an object implementation. eval isn't required
for an implementation. But EVERY implementation of that semantics allows
implicit function calls. ( I was going to say 'hidden' function calls,
but I'll admit that may be provocative/argumentative.)
 Your point about compile time optimization holds here: yes, the
builtin syntax version allows much of that analysis to be done at
compile time, while the object version would need to do all of the
analysis on the fly at execution. However, as I noted -- the object
implementation would allow customizing a restricted environment
( which is a simpler security implementation than code analysis.)
And having an explicit argument for the namespace allows more control,
as well as reminding you of the magic going on behind the curtains.
At least if there's a security problem, you have somewhere to look
for holes other than the Python C source code.

 If I keep an eval based implementation, I probably ought to make
a restricted __builtin__ the default.
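
For readers without the earlier post to hand, here is a rough sketch of
what an eval-based EvalDict with a restricted builtins table might look
like, written for a current Python 3. It is a guess at the shape of the
idea, not the code that was posted; the constructor arguments are
assumptions. Emptying __builtins__ only hides the obvious names and
should not be read as a real security boundary.

```python
class EvalDict:
    """Mapping whose keys are expressions evaluated on lookup (illustrative)."""

    def __init__(self, namespace, builtins=None):
        self.namespace = namespace
        # An empty builtins table by default, so names like open() or
        # __import__() are not reachable by bare name from an expression.
        self.globals = {"__builtins__": builtins if builtins is not None else {}}

    def __getitem__(self, expression):
        # The % operator hands us whatever text sits between "%(" and ")".
        return eval(expression, self.globals, self.namespace)


a, b = 5, 6
values = EvalDict({"a": a, "b": b, "words": ["spam", "eggs"]})
print("%(a)s + %(b)s = %(a + b)s, first word: %(words[0])s" % values)
```

The explicit namespace argument is the "somewhere to look" mentioned
above: everything the format string can reach is named at the call site.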

-- Steve

From  Mon Jan 14 23:04:49 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 17:04:49 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

> But if you're going to allow interpolation of the results of arbitrary
> function into a string, it's going to be a security problem whether
> or not you use 'eval' to do it. My code hides the eval in the object's
> python code. u" strings would hide the eval in the C code. How is one
> more or less secure than the other.

There is no security issue with PEP 215.

$"$a and $b make $c"   <==>  ("%s and %s make %s" % (a, b, c))

These two are completely equivalent under PEP 215, and therefore
equally secure.
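
The equivalence can be sketched as a purely mechanical rewrite, here in
a current Python 3. The translate helper below is hypothetical (it is
not part of PEP 215 or Itpl) and assumes the simplest case only: plain
$name references, plus "$$" for a literal dollar sign.

```python
import re

# Hypothetical helper, for illustration: handles only plain $name
# references and "$$" for a literal dollar sign.
_TOKEN = re.compile(r"\$(\$|[A-Za-z_][A-Za-z0-9_]*)")


def translate(template):
    """Split a $-template into a %-format string and the names it uses."""
    names = []

    def replace(match):
        token = match.group(1)
        if token == "$":
            return "$"
        names.append(token)
        return "%s"

    # Escape literal percent signs first so they survive the later % step.
    fmt = _TOKEN.sub(replace, template.replace("%", "%%"))
    return fmt, names


fmt, names = translate("$a and $b make $c")
namespace = {"a": 1, "b": 2, "c": 3}
result = fmt % tuple(namespace[name] for name in names)
print(fmt, names, result)
```

Nothing in the rewrite looks at the values being substituted, only at
the literal template, which is the property the equivalence relies on.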

## Jason Orendorff

From Samuele Pedroni" <  Mon Jan 14 23:06:39 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Tue, 15 Jan 2002 00:06:39 +0100
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
References: <>
Message-ID: <00f601c19d50$1fa472a0$5154ca3e@newmexico>

The Jython 2cts. 

An eval implementation means that for Jython
a code using it cannot be run in a Java sand-box 
context, eval does not work there.

>  If I keep an eval based implementation, I probably ought to make
> a restricted __builtin__ the default.

Jython does not support CPython restricted execution. Probably
never will.

For what it counts I don't care having string interpolation a la Perl
in Python.

Samuele Pedroni.

From sdm7g@Virginia.EDU  Mon Jan 14 23:11:30 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 18:11:30 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

>     Paul> Sorry I wasn't clear. Let's say it's the second hour of our
>     Paul> Perl/Python class.
>     Paul> Here's Perl:
>     Paul> $a = 5;
>     Paul> $b = 6;
>     Paul> print "$a $b";
>     ...
>     Paul> Here's Python:
>     Paul> a = 5;
>     Paul> b = 6;
>     Paul> print "%(a)s %(b)s" % vars()

How does Perl handle it if the tokens aren't whitespace separated?
Is there an optional enclosing bracket as in shell syntax ?

How do you do:  "%(word)sly yours" %  vocabulary ?

(Sorry-- I stopped Perling somewhere around version 4.)

-- Steve Majewski

From  Mon Jan 14 23:10:37 2002
From: (Fred L. Drake, Jr.)
Date: Mon, 14 Jan 2002 18:10:37 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Majewski writes:
 > How does Perl handle it if the tokens aren't whitespace separated?
 > Is there an optional enclosing bracket as in shell syntax ?


 > How do you do:  "%(word)sly yours" %  vocabulary ?

  I've not a clue... manually scan the format string, perhaps?


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From sdm7g@Virginia.EDU  Mon Jan 14 23:19:21 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 18:19:21 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Jason Orendorff wrote:

> > But if you're going to allow interpolation of the results of arbitrary
> > function into a string, it's going to be a security problem whether
> > or not you use 'eval' to do it. My code hides the eval in the object's
> > python code. u" strings would hide the eval in the C code. How is one
> > more or less secure than the other.
> There is no security issue with PEP 215.
> $"$a and $b make $c"   <==>  ("%s and %s make %s" % (a, b, c))
> These two are completely equivalent under PEP 215, and therefore
> equally secure.

You're right. I'm confusing PEP 215 with the discussion on PEP 215,
where that feature was requested.

However, if you allow array and member access as well, which Paul
suggests, then you open the security problem back up unless you
do some code analysis (as he also suggests) to make sure that
[index] or .member doesn't perform a hidden function call
( A virus infected __getitem__ for example. )

-- Steve

From  Mon Jan 14 23:16:39 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 17:16:39 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

Would someone please explain to me what is seen as a "possible
security issue" in PEP 215?  Can anyone propose some real-life
situation where PEP 215 causes a vulnerability, and the
corresponding % syntax doesn't?

## Jason Orendorff

From  Mon Jan 14 23:42:13 2002
From: (Neil Schemenauer)
Date: Mon, 14 Jan 2002 15:42:13 -0800
Subject: [Python-Dev] Re: [Python-iterators] Python generators and try/finally..
In-Reply-To: <>; from on Sun, Jan 13, 2002 at 12:20:12PM -0800
References: <>
Message-ID: <>

[Cross-posted to python-dev, I'm not sure how many people are still on
the python-iterators list]

David Jeske wrote:
> Hello,
> I just read PEP255 about Python Generators. It's a very interesting
> and elegant solution to a tricky problem. 
> I have a thought about allowing try/finally with some reasonable
> semantics. 

This is definitely a wart.  This problem is one of the major reasons why
Kent Pitman did not want continuations in Common Lisp (Scheme predates CL
and in CL try/finally is called unwind-protect).  It's a hard problem.

However, I think try/finally is less of a problem with generators than
it is for continuations.  Generators only allow you to temporarily jump
up one level in the stack frame while continuations allow you to jump to
essentially arbitrary stack frames.  We disallow try/finally inside a
generator since there is no guarantee that the finally clause will ever
be executed.  The problem is localized.  With continuations the problem
spreads.  Any try/finally block could be affected.

In practice, I think the current restriction is not a big problem.
try/finally is allowed in code that calls generators as well as code
called by generators.  It is only disallowed in the body of generator
itself.

> The PEP says that there is no guarantee that next() will be called
> again. However, there is a guarantee that either next() will be called,
> or the Generator will be cleaned up. It seems reasonable to me to
> build a mechanism by which, on __del__ cleanup of the Generator, an
> exception is raised from the Yield point "UnfinishedGenerator" (and
> also caught by the cleanup function). This exception would trigger any
> finally exception clauses which exist above the yield. This also has
> the added advantage that code can detect when a Generator does not run
> to completion.
> It might even be useful to be able to flag the generator such that it
> does not catch the UnfinishedGenerator exception. Although this
> probably wouldn't be used often.

I'm pretty sure something like this could be done but I'm not sure it's
a good idea.  The handling of exceptions in __del__ methods is ugly,
IMHO.  We should not propagate that behavior without some careful
thought.  I would like to see some compelling arguments as to why
try/finally should be supported inside generators.


From  Mon Jan 14 23:49:18 2002
From: (Neil Schemenauer)
Date: Mon, 14 Jan 2002 15:49:18 -0800
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>; from on Mon, Jan 14, 2002 at 05:04:49PM -0600
References: <> <>
Message-ID: <>

Jason Orendorff wrote:
> There is no security issue with PEP 215.
> $"$a and $b make $c"   <==>  ("%s and %s make %s" % (a, b, c))
> These two are completely equivalent under PEP 215, and therefore
> equally secure.

Not exactly.  Say you have the code:

    secret_key = "spam"
    x = raw_input()
    print $"You entered $x"

Imagine that the user enters "I'm 3l337, give me the $secret_key" as the
input.


From sdm7g@Virginia.EDU  Mon Jan 14 23:52:13 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 18:52:13 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Jason Orendorff wrote:

> Would someone please explain to me what is seen as a "possible
> security issue" in PEP 215?  Can anyone propose some real-life
> situation where PEP 215 causes a vulnerability, and the
> corresponding % syntax doesn't?

Do you mean the current '%' or my expanded example ?
Any expanded version -- mine or PEP 215 introduces possible
security holes. ( And I'm not even sure that the current "%"
doesn't have a hole if it's used "the wrong way" ) But, as
Paul said, it depends on the implementation.

I said in an earlier post that I confused PEP 215 with the discussion
of PEP 215, where some expanded capabilities were suggested.
However, on looking at it again closer, I would say that the
examples in PEP 215 contradict the Security Considerations
paragraph. It has expressions in it that can't be evaluated
at compile time, and any list index or member reference can,
in Python, invoke a hidden function call. Any implementation
is going to require some run time checks.

But just in case I'm seeing it all wrong: could you explain
to me how PEP 215 *doesn't* have the potential of introducing
a security hole ? If the current proof-of-concept implementation
does use eval (as Paul stated), then there is (I believe) a security
problem with that implementation. Paul has proposed some other
implementation tricks, but I'm not convinced that you can get
the same semantics suggested in PEP 215's examples without
requiring runtime checks. Since eval is a known security hole,
I think the burden of proof is on the proponents. ( And I'm
not even demanding proof -- just a convincing argument without
too much hand waving and we-have-ways-of-dealing-with-that! )

-- Steve Majewski

From  Mon Jan 14 23:55:37 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 17:55:37 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

Neil Schemenauer wrote:
> Jason Orendorff wrote:
> > There is no security issue with PEP 215.
> > 
> > $"$a and $b make $c"   <==>  ("%s and %s make %s" % (a, b, c))
> > 
> > These two are completely equivalent under PEP 215, and therefore
> > equally secure.
> Not exactly.  Say you have the code:
>     secret_key = "spam"
>     x = raw_input()
>     print $"You entered $x"
> Imagine that the user enters "I'm 3l337, give me the $secret_key" as the
> input.

>>> import Itpl
>>> import sys
>>> sys.stdout = Itpl.filter()
>>> secret_key = "spam"
>>> x = raw_input()
I'm 3l337, give me the $secret_key
>>> print "You entered $x"
You entered I'm 3l337, give me the $secret_key

The substitution only happens once.
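
The same single-pass behaviour can be reproduced in a current Python 3
with string.Template from the standard library. This is a stand-in used
here for illustration, not the Itpl module from the session above.

```python
from string import Template

secret_key = "spam"
x = "I'm 3l337, give me the $secret_key"

# One pass over the literal template; the substituted text is not rescanned.
once = Template("You entered $x").substitute(x=x)
print(once)

# Only a second, explicit pass over the *result* would leak the secret.
twice = Template(once).substitute(secret_key=secret_key)
print(twice)
```

The leak in the second call comes from treating user-supplied text as a
template, which is a separate decision the programmer has to make.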

## Jason Orendorff

From  Tue Jan 15 00:18:40 2002
From: (Tim Peters)
Date: Mon, 14 Jan 2002 19:18:40 -0500
Subject: [Python-Dev] Re: [Python-iterators] Python generators and try/finally..
In-Reply-To: <>
Message-ID: <>

[Neil Schemenauer]
> ...
> In practice, I think the current restriction is not a big problem.
> try/finally is allowed in code that calls generators as well as code
> called by generators.  It is only disallowed in the body of generator
> itself.

It's not that severe, Neil:  the only restriction is that yield cannot
appear in the try clause of a try/finally construct.  try/finally can
otherwise be used freely inside generators, and yield can be used anywhere
inside a generator inside try/except/else, and even in a finally clause
(these latter assuming the yield is not also in the try clause of an
*enclosing* try/finally construct) -- just not in a try/finally's try
clause.

Here's the example from PEP 255 (also embedded in a
doctest, so we know for sure it works as advertised <wink>):

    >>> def f():
    ...     try:
    ...         yield 1
    ...         try:
    ...             yield 2
    ...             1//0
    ...             yield 3  # never get here
    ...         except ZeroDivisionError:
    ...             yield 4
    ...             yield 5
    ...             raise
    ...         except:
    ...             yield 6
    ...         yield 7     # the "raise" above stops this
    ...     except:
    ...         yield 8
    ...     yield 9
    ...     try:
    ...         x = 12
    ...     finally:
    ...         yield 10
    ...     yield 11
    >>> print list(f())
    [1, 2, 4, 5, 8, 9, 10, 11]

[David Jeske]
>> The PEP says that there is no guarantee that next() will be called
>> again. However, there is a guarantee that either next() will be called,
>> or the Generator will be cleaned up.

Not so:  Python doesn't guarantee destructors will get called by magic (see
the discussion of __del__ in the Python Reference Manual).  So best practice
is to use explicit (e.g.) close() calls anyway, and if you make your
generator a method of an object, its critical resources can (conveniently,
even!) be exposed to other methods for explicit cleanup (or its __del__, if
you absolutely must).
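
A sketch of that arrangement, written for a current Python 3. The
LineReader class and its method names are invented for the example, and
a plain list stands in for whatever critical resource is being guarded.

```python
class LineReader:
    """Owns a resource; the generator method only borrows it."""

    def __init__(self, source):
        self.source = list(source)   # stand-in for a file or socket
        self.closed = False

    def iter_lines(self):
        for line in self.source:
            if self.closed:
                return
            yield line.strip()

    def close(self):
        # Explicit cleanup, callable whether or not the generator finished.
        self.closed = True
        self.source = []


reader = LineReader(["one\n", "two\n", "three\n"])
gen = reader.iter_lines()
try:
    first = next(gen)        # consume part of the generator, then stop
finally:
    reader.close()           # try/finally lives in the caller
remaining = list(gen)
print(first, remaining)
```

The cleanup no longer depends on the generator being resumed or
collected, because the object that owns the resource exposes close()
directly.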

In practice (and I've had a lot <wink>), I have yet to be so much as mildly
annoyed by this restriction.  So I have to echo Neil:

> We should not propagate that behavior without some careful thought.  I
> would like to see some compelling arguments as to why try/finally
> should be supported inside generators.

And especially with bizarre "and 'finally' will probably get executed, but
no guarantee that it will, and there's no predicting when-- or even in which
thread --if it does, and if it does and 'finally' itself goes boom, we may
also ignore the error" semantics.  As the PEP says, all of that is too much
a violation of finally's pre-generators contract to bear.

you-broke-it-you-fix-it<wink>-ly y'rs  - tim

From  Mon Jan 14 23:22:00 2002
From: (David Jeske)
Date: Mon, 14 Jan 2002 15:22:00 -0800
Subject: [Python-Dev] Re: [Python-iterators] Python generators and try/finally..
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Mon, Jan 14, 2002 at 07:18:40PM -0500, Tim Peters wrote:
> It's not that severe, Neil: the only restriction is that yield
> cannot appear in the try clause of a try/finally construct.
> try/finally can otherwise be used freely inside generators, and
> yield can be used anywhere inside a generator inside
> try/except/else, and even in a finally clause (these latter assuming
> the yield is not also in the try clause of an *enclosing*
> try/finally construct) -- just not in a try/finally's try clause.

Thanks for your thoughts, I'll defer this discussion until I do run
across a programming problem or two where the lack of finally cleanup
hurts the Generator, and then I'll bring that example to the table.

Thanks again for spending the time on Generators, it looks like a
truly neat and orthogonal feature.

David Jeske (N9LCA) + +

From  Tue Jan 15 01:33:07 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 17:33:07 -0800
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
References: <>
Message-ID: <>

Steven Majewski wrote:
> You're right. I'm confusing PEP 215 with the discussion on PEP 215,
> where that feature was requested.
> However, if you allow array and member access as well, which Paul
> suggests, then you open the security problem back up unless you
> do some code analysis (as he also suggests) to make sure that
> [index] or .member doesn't perform a hidden function call
> ( A virus infected __getitem__ for example. )

If you have a virus-infected __getitem__ you are screwed regardless. We
can't defend against that.

The whole point is that we are never evaluating code provided by the
user. "Safe" programmer-supplied literal strings are differentiated at
compile time from arbitrary strings. The interpolation engine only works
on safe strings. Calling an overridden __getitem__ or .member is as safe
as if they had done it in the way they would today:

"%s" %

Think of it as pure, compile-time syntactic sugar. If you want it to act
like eval, I guess you would do this:


which would compile to:

"%s" % eval('....')

 Paul Prescod

From  Tue Jan 15 01:38:42 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 19:38:42 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

> But just in case I'm seeing it all wrong: could you explain
> to me how PEP 215 *doesn't* have the potential of introducing
> a security hole ?

Gladly.

Every $-string can be converted to equivalent code that uses only:

  a)  whatever code the programmer explicitly typed
      in the $-string;
  b)  str() or unicode(); and
  c)  the + operator applied to strings.

Therefore $ is exactly as secure or insecure as those three
things.

All three of these things are just as safe as the non-PEP-215
features that we're already using.

Therefore $-strings do not introduce any new security hole.

## Jason Orendorff

From  Tue Jan 15 01:46:03 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 19:46:03 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

Steven Majewski wrote:
> On Mon, 14 Jan 2002, Jason Orendorff wrote:
> > Would someone please explain to me what is seen as a "possible
> > security issue" in PEP 215?  Can anyone propose some real-life
> > situation where PEP 215 causes a vulnerability, and the
> > corresponding % syntax doesn't?
> Do you mean the current '%' or my expanded example ?

I mean the current %.


## Jason Orendorff

From  Tue Jan 15 01:54:53 2002
From: (Neil Schemenauer)
Date: Mon, 14 Jan 2002 17:54:53 -0800
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>; from on Mon, Jan 14, 2002 at 05:55:37PM -0600
References: <> <>
Message-ID: <>

Jason Orendorff wrote:
> The substitution only happens once.

My example was not well thought out.  I was thinking something more
like:

    secret_key = "spam"
    user = "joe"
    x = "$user said: " + raw_input()
    print $x

That wouldn't work either since $ only evaluates literals.  Amazing what
you learn by actually reading the PEP.  Yes, I'm an idiot.

After reading PEP 215 I like it a lot.  The fact that $ can only apply
to literals completely solves this issue.  Has Guido weighed in on it
yet?  I didn't find anything in the mail archives from him.


From  Tue Jan 15 02:01:52 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 18:01:52 -0800
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
References: <>
 <> <>
Message-ID: <>

Skip Montanaro wrote:
> So?  There are some things Perl does better than Python, some things Python
> does better than Perl.  

It doesn't have anything to do with competing with Perl. It is just
about learning from things that other languages do better (in this case
simpler) than Python. This feature came from the Bourne shell and is
also present in DOS batch, TCL, Ruby, PHP. 

Python's "%" is much better than nothing (which is what Javascript has)
but it is still a pain. First you use it with positional arguments and
then realize that is getting confusing so you switch to dictionary
arguments and then that gets unwieldy because you're just declaring new
names for existing variables so you use vars(). But then you want to
interpolate the result of a function call or expression. So you have to
set up a one-time-use variable.
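
That progression, spelled out in a current Python 3 (the variable names
are only for illustration):

```python
a, b = 5, 6

# Step 1: positional arguments; the order must match the string.
positional = "%s %s" % (a, b)

# Step 2: dictionary arguments; new names declared for existing variables.
by_name = "%(first)s %(second)s" % {"first": a, "second": b}

# Step 3: vars() reuses the existing names.
from_vars = "%(a)s %(b)s" % vars()

# Step 4: an expression needs a one-time-use variable first.
total = a + b
with_expr = "%(a)s + %(b)s = %(total)s" % vars()

print(positional, by_name, from_vars, with_expr, sep="\n")
```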

PEP 215 (which I did not write!) unifies all of the use cases into one
syntax that can be taught in ten minutes. The % syntax is fine for
totally different use cases: printf-style formatting and interpolation
of strings that might be generated at runtime.

 Paul Prescod

From sdm7g@Virginia.EDU  Tue Jan 15 02:07:24 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 21:07:24 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <Pine.OSX.4.43.0201142103460.286-100000@localhost>

On Mon, 14 Jan 2002, Jason Orendorff wrote:

> > But just in case I'm seeing it all wrong: could you explain
> > to me how PEP 215 *doesn't* have the potential of introducing
> > a security hole ?
> Gladly.
> Every $-string can be converted to equivalent code that uses only:
>   a)  whatever code the programmer explicitly typed
>       in the $-string;
>   b)  str() or unicode(); and
>   c)  the + operator applied to strings.

But the examples in PEP 215 don't follow those restrictions.

That may be the source of the confusion.

Maybe someone should revise the PEP for consistency before it's
considered further.

-- Steve.

From  Tue Jan 15 02:10:55 2002
From: (Neal Norwitz)
Date: Mon, 14 Jan 2002 21:10:55 -0500
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
References: <> <> <>
Message-ID: <>

Neil Schemenauer wrote:
> Jason Orendorff wrote:
> > The substitution only happens once.
> My example was not well thought out.  I was thinking something more
> like:
>     secret_key = "spam"
>     user = "joe"
>     x = "$user said: " + raw_input()
>     print $x
> That wouldn't work either since $ only evaluates literals.  Amazing what
> you learn by actually reading the PEP.  Yes, I'm an idiot.

Sorry, I haven't followed this thread real closely, but I thought
someone said eval() was used under the covers.

If x is eval'ed and the string is as above, I get the following in 2.1:

	>>> secret_key = 'spam'
	>>> x = raw_input('? ')
	? eval("secret_key")
	# Is the following commented print equivalent to the line below it?
	### print "You entered $x"
	>>> print "You entered", eval(x)
	You entered spam
	>>> print "You entered %(x)s" % locals()
	You entered eval("secret_key")

Not sure if that's the same as what you are talking about though.
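
A minimal sketch of the distinction being probed in the session above,
written for present-day Python 3 rather than the 2.1 interpreter shown.
The helper names show_with_eval and show_with_percent are made up for
illustration, and user_text stands in for whatever raw_input() returned.
It only shows that eval() runs the contents of a string as code, while
% formatting treats the same string as plain data.

```python
secret_key = "spam"
user_text = 'eval("secret_key")'   # stand-in for hostile user input


def show_with_eval(text, namespace):
    # eval() executes the *contents* of the string as an expression.
    return "You entered %s" % eval(text, namespace)


def show_with_percent(text):
    # %-formatting only converts the value to text; nothing is evaluated.
    return "You entered %(x)s" % {"x": text}


print(show_with_eval(user_text, {"secret_key": secret_key}))
print(show_with_percent(user_text))
```

Only the first helper ever sees the value of secret_key; the second one
prints the input back unchanged.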


From sdm7g@Virginia.EDU  Tue Jan 15 02:15:34 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Mon, 14 Jan 2002 21:15:34 -0500 (EST)
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <Pine.OSX.4.43.0201142108410.286-100000@localhost>

On Mon, 14 Jan 2002, Paul Prescod wrote:
> ...
> then realize that is getting confusing so you switch to dictionary
> arguments and then that gets unwieldy because you're just declaring new
> names for existing variables so you use vars(). But then you want to
> interpolate the result of a function call or expression. So you have to
> set up a one-time-use variable.
> PEP 215 (which I did not write!) unifies all of the use cases into one
> syntax that can be taught in ten minutes. The % syntax is fine for
> totally different use cases: printf-style formatting and interpolation
> of strings that might be generated at runtime.

But Jason just said that function calls are not allowed.
( We -- actually, he listed what was allowed, and function calls
  were definitely not among them. )

PEP 215's examples don't agree with the limitations in its
security section, and the proposal being discussed seems to
be shifting under our feet. That's the reason I got the proposals
given in the previous discussion of PEP 215 and PEP 215 itself

-- Steve

From  Tue Jan 15 02:25:18 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 20:25:18 -0600
Subject: [Python-Dev] Suggested changes to PEP 215
Message-ID: <>

One of the examples in PEP 215 is a bit wrong, I think.

        >>> print $'\$a'
        5

This should output a backslash before the 5, because the
string '\$a' has a backslash character in it.

Also, for clarity, PEP 215 should explicitly specify
that the substitution only occurs once.  For example:

        # Existing examples
        >>> a, b = 5, 6
        >>> print $'a = $a, b = $b'
        a = 5, b = 6

        >>> x = "$a"
        >>> print $'x = $x'
        x = $a

Maybe there should also be examples demonstrating that $-strings
adopt the local namespace.

Also, the PEP says:

]       $'a = $a, b = $b'
]   could be compiled as though it were the expression
]       ('a = ' + str(a) + ', b = ' + str(b))

Consider:

    def f(str):
        # The argument 'str' masks the builtin str() function.
        a, b = find_stuff(str)
        print $'a = $a, b = $b'
        return a, b

It should be specified that $-strings do not use the local
"str" and "unicode" names to find str() and unicode(); nor
do they look in the current __builtins__ or the __builtin__
module.  They should use the actual python C implementations
of str() and unicode().  This can be implemented by putting
a direct reference to str or unicode in the co_consts tuple
of the code object; I don't know how else the author plans
to deal with this.

## Jason Orendorff
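
A small sketch of the name-shadowing hazard described above, written
for present-day Python 3. The function names concat_form and
percent_form are hypothetical; they stand for the two possible
expansions of $'a = $a, b = $b' inside a function whose parameter is
called str.

```python
def concat_form(str):
    # The parameter 'str' shadows the builtin, so str(a) below tries to
    # call the argument (a plain string) and fails.
    a, b = 5, 6
    return 'a = ' + str(a) + ', b = ' + str(b)


def percent_form(str):
    # '%s' converts through the interpreter's own machinery, so the
    # local name 'str' is never looked up.
    a, b = 5, 6
    return 'a = %s, b = %s' % (a, b)


try:
    concat_form("some text")
except TypeError as exc:
    print("concat form failed:", exc)

print(percent_form("some text"))
```

A literal source-level expansion into str() calls breaks here, which is
why the expansion has to bind to the real conversion rather than to
whatever the name str means locally.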

From  Tue Jan 15 02:40:20 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 18:40:20 -0800
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
Message-ID: <>

Steven Majewski wrote:
> But Jason just said that function calls are not allowed.
> ( We -- actually, he listed what was allowed, and function calls
>   were definitely not among them. )

I misread Jason's list at first myself. Jason was describing the
*output* of the transformation. He said that the output of the
transformation would be no more and no less than directly typed code
with

a)  whatever code the programmer explicitly typed
       in the $-string;
b)  str() or unicode(); and
c)  the + operator applied to strings.

"a)" embodies a whole host of things listed in the PEP:

    "A Python identifier optionally followed by any number of
        trailers, where a trailer consists of:
            - a dot and an identifier,
            - an expression enclosed in square brackets, or
            - an argument list enclosed in parentheses
        (This is exactly the pattern expressed in the Python grammar
        by "NAME trailer*", using the definitions in Grammar/Grammar.)"

The PEP also has examples:

>>> print $'References to $a: $sys.getrefcount(a)'
References to 5: 15

> PEP 215's examples don't agree with the limitations in its
> security section, 

To summarize the security section, it says: *All of the text that is
ever processed by this mechanism is textually present in the Python
program at compile time*. In other words, users of the program can never
submit information and have it be evaluated by this mechanism.

 Paul Prescod

From  Tue Jan 15 02:40:33 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 18:40:33 -0800
Subject: [Python-Dev] Suggested changes to PEP 215
References: <>
Message-ID: <>

Jason Orendorff wrote:
> ...
> It should be specified that $-strings do not use the local
> "str" and "unicode" names to find str() and unicode(); nor
> do they look in the current __builtins__ or the __builtin__
> module.  They should use the actual python C implementations
> of str() and unicode(). 

Why? Wouldn't it be better to look in __builtin__? If someone overrides
str() or unicode() they may well want that behaviour to be respected in
interpolations.

 Paul Prescod

From  Tue Jan 15 02:46:49 2002
From: (Ka-Ping Yee)
Date: Mon, 14 Jan 2002 20:46:49 -0600 (CST)
Subject: [Python-Dev] PEP 215 does not introduce security issues
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Neil Schemenauer wrote:
> Amazing what you learn by actually reading the PEP.

May i quote you on that?  :)

Just kidding.  More seriously: there is no security issue introduced
by PEP 215.  I saw the concerns being raised in the previous e-mail
messages on this topic, but every time i was about to compose a
reply, i found that Jason Orendorff had already provided exactly
the explanation i was about to give, or better.

So, thank you, Jason. :)

In short: PEP 215 suggests a syntactic transformation that turns

    $'the $quick brown $fox()'

into the fully equivalent

    'the %s brown %s' % (quick, fox())

The '$' prefix only applies to literals, and cannot be used as
an operator in front of other expressions or variables.  This
issue is pointed out specifically in the PEP:

     '$' works like an operator and could be implemented as an
     operator, but that prevents the compile-time optimization
     and presents security issues.  So, it is only allowed as a
     string prefix.

Therefore, this transformation executes *only* code that was
literally present in the original program.  (An example of this
transformation is given at the end of PEP 215 in the
"Implementation" section.)

(By the way, i myself am not yet fully convinced that a string
interpolation feature is something that Python desperately needs.
I do see some considerable potential for good, and so the purpose
of PEP 215 was to put a concrete and plausible proposal on the
table for discussion.  Given that proposal, which i believe to be
about as good as one could reasonably expect, we can hope to save
ourselves the expense of re-arguing the same issues repeatedly,
and make an informed decision about whether to add the feature.

Among the possible drawbacks/complaints i see are: more work for
automated source code tools, tougher editor syntax highlighting,
too many messy string prefix characters, and the addition of yet
one more Python feature to teach and document.  Security, however,
is not among them.)

-- ?!ng
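
A rough sketch of the syntactic transformation described above, in
present-day Python 3. It is not the PEP's implementation: the names
match_bracket and to_percent_form are invented here, the scanner only
approximates "NAME trailer*" by balancing brackets, and it does not
understand brackets inside string literals. It splits a $-template into
a %-format string plus the expression sources that would be evaluated.

```python
import re

NAME = re.compile(r'[A-Za-z_]\w*')
CLOSERS = {'(': ')', '[': ']', '{': '}'}


def match_bracket(text, start):
    """Return the index just past the bracket that closes text[start]."""
    stack = []
    for i in range(start, len(text)):
        ch = text[i]
        if ch in CLOSERS:
            stack.append(CLOSERS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                return i + 1
    raise ValueError('unbalanced bracket at index %d' % start)


def to_percent_form(template):
    """Split a $-template into (format string, list of expression sources)."""
    fmt, exprs, pos = [], [], 0
    while pos < len(template):
        ch = template[pos]
        if ch != '$':
            fmt.append('%%' if ch == '%' else ch)   # keep literal % safe
            pos += 1
        elif template.startswith('$$', pos):
            fmt.append('$')                         # $$ is a literal dollar
            pos += 2
        elif template.startswith('${', pos):
            end = match_bracket(template, pos + 1)
            exprs.append(template[pos + 2:end - 1])
            fmt.append('%s')
            pos = end
        else:
            m = NAME.match(template, pos + 1)
            if m is None:
                raise ValueError('bad $ at index %d' % pos)
            end = m.end()
            while end < len(template):              # consume trailers
                nxt = template[end]
                if nxt == '.':
                    attr = NAME.match(template, end + 1)
                    if attr is None:
                        break
                    end = attr.end()
                elif nxt in '([':
                    end = match_bracket(template, end)
                else:
                    break
            exprs.append(template[pos + 1:end])
            fmt.append('%s')
            pos = end
    return ''.join(fmt), exprs


fmt, exprs = to_percent_form('the $quick brown $fox()')
print(fmt, exprs)
```

Every expression source returned is a substring of the template itself,
which is the point being made: nothing is evaluated that was not
textually present in the literal.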

From  Tue Jan 15 02:51:43 2002
From: (Ka-Ping Yee)
Date: Mon, 14 Jan 2002 20:51:43 -0600 (CST)
Subject: [Python-Dev] Re: Suggested changes to PEP 215
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Jason Orendorff wrote:
> One of the examples in PEP 215 is a bit wrong, I think.
>         >>> print $'\$a'
>         5
> This should output a backslash before the 5, because the
> string '\$a' has a backslash character in it.

You are correct.  I'll make this change.

> Also, for clarity, PEP 215 should explicitly specify
> that the substitution only occurs once.
> Maybe there should also be examples demonstrating that $-strings
> adopt the local namespace.

Sure, that wouldn't hurt.  More examples are a good idea.

> Consider:
>     def f(str):
>         # The argument 'str' masks the builtin str() function.
>         a, b = find_stuff(str)
>         print $'a = $a, b = $b'
>         return a, b
> It should be specified that $-strings do not use the local
> "str" and "unicode" names to find str() and unicode()

Good point.  Perhaps it is better to simply describe a
transformation using '%s' and '%' instead of 'str' and '+'
to avoid this potential confusion altogether.

-- ?!ng

From  Tue Jan 15 03:01:24 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 21:01:24 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <Pine.OSX.4.43.0201142103460.286-100000@localhost>
Message-ID: <>

Steven Majewski wrote:
> On Mon, 14 Jan 2002, Jason Orendorff wrote:
> > > But just in case I'm seeing it all wrong: could you explain
> > > to me how PEP 215 *doesn't* have the potential of introducing
> > > a security hole ?
> >
> > Gladly.
> >
> > Every $-string can be converted to equivalent code that uses only:
> >
> >   a)  whatever code the programmer explicitly typed
> >       in the $-string;
> >   b)  str() or unicode(); and
> >   c)  the + operator applied to strings.
> But the examples in PEP 215 don't follow those restrictions.

I dunno, it looks like they do to me.

$'a = $a, b = $b'
    ---> ('a = ' + str(a) + ', b = ' + str(b))
$u'uni${a}ode'
    ---> (u'uni' + unicode(a) + u'ode')
$'\$a'
    ---> ('\\' + str(a))
$r'\$a'
    ---> ('\\' + str(a))
$'$$$a.$b'
    ---> ('$' + str(a) + '.' + str(b))
$'a + b = ${a + b}'
    ---> ('a + b = ' + str(a + b))
$'References to $a: $sys.getrefcount(a)'
    ---> ('References to ' + str(a) + ': ' + str(sys.getrefcount(a)))
$"sys = $sys, sys = $sys.modules['sys']"
    ---> ('sys = ' + str(sys) + ', sys = ' + str(sys.modules['sys']))
$'BDFL = $sys.copyright.split()[4].upper()'
    ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))

In every case, the equivalent uses
  a)  some bits of code that the programmer explicitly typed
      in the $-string;
  b)  str() or unicode();
  c)  and the + operator (to join the resulting strings).

I guess you're thinking "but those bits of code are invoking other
functions that aren't in your list".  My point is, the equivalent
print statement, or % expression (the existing %, not your
proposed %) does the exact same thing.

  print $'here we go: $y maps to $x[y]'
  print 'here we go: %s maps to %s' % (y, x[y])
  print 'here we go:', y, 'maps to', x[y]
  print 'here we go: ' + str(y) + ' maps to ' + str(x[y])

Is one of these less secure than the others somehow?

There is no new security hole here.

## Jason Orendorff

From  Tue Jan 15 03:30:47 2002
From: (Ka-Ping Yee)
Date: Mon, 14 Jan 2002 21:30:47 -0600 (CST)
Subject: [Python-Dev] Re: Suggested changes to PEP 215
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Ka-Ping Yee wrote:
> Good point.  Perhaps it is better to simply describe a
> transformation using '%s' and '%' instead of 'str' and '+'
> to avoid this potential confusion altogether.

I have just realized, upon careful thought, that it would be better
to make this syntactic transformation the official specification of
the feature, rather than simply an implementation suggestion.

The current specification is incomplete because it does not adequately
handle certain corner cases:

                             (current PEP)
                             \ then $        $ then \    what i want

    >>> x = 'x41'
    >>> print $'\$x'
    ???                      \x41            A           \x41

    >>> print $'\x24x'
    ???                      x41             $x          $x

    >>> y = '41'
    >>> print $'\x$y'
    ???                      ???             A           SyntaxError

The issue is whether backslash-interpretation happens first, or
dollar-interpretation happens first.  The current PEP says \ first.

I hope you see why i want the first case *not* to do \x interpretation
and why i want the second case not to do $ interpretation.  (The
programmer shouldn't have to look for \x24 in her code!)  The third
case is a mess and should definitely be a syntax error.

I'll write a new PEP.

-- ?!ng
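
The ordering question in the table above can be played with using a toy
model, sketched here in present-day Python 3. Both passes are heavily
simplified assumptions: expand_hex_escapes only knows \xHH, and
expand_dollars only knows plain $name. Neither is the real tokenizer or
the PEP's parser, and the function names are invented for illustration.

```python
import re


def expand_hex_escapes(text):
    # Simplified backslash pass: only \xHH is handled.
    return re.sub(r'\\x([0-9A-Fa-f]{2})',
                  lambda m: chr(int(m.group(1), 16)), text)


def expand_dollars(text, namespace):
    # Simplified dollar pass: only plain $name is handled.
    return re.sub(r'\$([A-Za-z_]\w*)',
                  lambda m: str(namespace[m.group(1)]), text)


def backslash_then_dollar(text, namespace):
    return expand_dollars(expand_hex_escapes(text), namespace)


def dollar_then_backslash(text, namespace):
    return expand_hex_escapes(expand_dollars(text, namespace))


names = {'x': 'x41', 'y': '41'}
for source in (r'\$x', r'\x24x', r'\x$y'):
    # The toy escape pass silently skips the malformed \x in the third
    # case, where real Python would complain about an invalid escape.
    print(source,
          backslash_then_dollar(source, names),
          dollar_then_backslash(source, names))
```

The first two rows reproduce the "\ then $" and "$ then \" columns of
the table; the third row shows how the second ordering lets a
substituted value complete an escape sequence.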

From  Tue Jan 15 03:36:57 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 21:36:57 -0600
Subject: [Python-Dev] Suggested changes to PEP 215
In-Reply-To: <>
Message-ID: <>

Paul Prescod wrote:
> Jason Orendorff wrote:
> > ...
> > It should be specified that $-strings do not use the local
> > "str" and "unicode" names to find str() and unicode(); nor
> > do they look in the current __builtins__ or the __builtin__
> > module.  They should use the actual python C implementations
> > of str() and unicode(). 
> Why? Wouldn't it be better to look in __builtin__? If someone overrides
> str() or unicode() they may well want that behaviour to be respected in
> interpolations.

I was thinking it should parallel what the other similar
features already do:

>>> import __builtin__
>>> __builtin__.str = 'a suffusion of yellow'
>>> str
'a suffusion of yellow'
>>> print 32
32
>>> print "xyz %s 123" % 4.5
xyz 4.5 123

## Jason Orendorff
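
The same experiment can be sketched in present-day Python 3, where the
module is called builtins instead of __builtin__. The helper name
format_while_str_is_replaced is made up, and the builtin is restored
afterwards so the prank does not leak into the rest of the program.

```python
import builtins


def format_while_str_is_replaced(value):
    original = builtins.str
    builtins.str = 'a suffusion of yellow'   # same prank as above
    try:
        # '%s' does not look up the name 'str', so this still works.
        return "xyz %s 123" % value
    finally:
        builtins.str = original              # always restore the builtin


print(format_while_str_is_replaced(4.5))
```

The existing % operator converts values without consulting the name
str at all, which is the behaviour being proposed for $-strings too.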

From  Tue Jan 15 03:46:38 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 21:46:38 -0600
Subject: [Python-Dev] Re: Suggested changes to PEP 215
In-Reply-To: <>
Message-ID: <>

Ping wrote:
> > Consider:
> >
> >     def f(str):
> >         # The argument 'str' masks the builtin str() function.
> >         a, b = find_stuff(str)
> >         print $'a = $a, b = $b'
> >         return a, b
> >
> > It should be specified that $-strings do not use the local
> > "str" and "unicode" names to find str() and unicode()
> Good point.  Perhaps it is better to simply describe a
> transformation using '%s' and '%' instead of 'str' and '+'
> to avoid this potential confusion altogether.

I thought about this; but I don't know if there's a '%'
equivalent for the unicode handling.

$u'uni${a}ode'
    ---> (u'uni' + unicode(a) + u'ode')
    ---> u'uni%???ode' % (a,)

I don't think %s does it.  Maybe there's some format spec
flag that I'm forgetting.

## Jason Orendorff

From  Tue Jan 15 03:55:37 2002
From: (Neil Hodgson)
Date: Tue, 15 Jan 2002 14:55:37 +1100
Subject: [Python-Dev] PEP 215 does not introduce security issues
References: <>
Message-ID: <003601c19d78$7e220680$0acc8490@neil>

The PEP:
>      '$' works like an operator and could be implemented as an
>      operator, but that prevents the compile-time optimization
>      and presents security issues.  So, it is only allowed as a
>      string prefix.

   I'd like to see the '$' prefix replaced with an ordinary character such
as 'i'. '$' is currently unused in Python and so can be used for future
extension either as a new operator or as the basis for new operators.
Interpolation strings consume this character so it can no longer be chosen
as a new operator.


From  Tue Jan 15 03:56:47 2002
From: (Skip Montanaro)
Date: Mon, 14 Jan 2002 21:56:47 -0600
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <>
References: <>
Message-ID: <>

    Paul> But then you want to interpolate the result of a function call or
    Paul> expression. So you have to set up a one-time-use variable.

As has been demonstrated, there are several ways to tackle this problem.  I
first saw something headed in this direction with Zope's (actually
DocumentTemplate's) MultiMapping class several years ago.  It only aimed to
make it easy to interpolate named parameters from several dictionaries
simultaneously.  Steve Majewski and others have shown how you can do this
with an EvalDict type of class, so it's not like you can't do this today.
The point is for something to be really worth modifying the syntax of the
language I think it has to demonstrate that it's significantly better than
the alternatives.  The security argument is a red herring.  There are enough
other ways programmers can blow their feet off.  If someone is naive enough
to execute the moral equivalent of

    print raw_input() % EvalDict3()

in their programs they will probably learn fairly quickly that it's a
questionable programming practice.

    Paul> PEP 215 (which I did not write!) unifies all of the use cases into
    Paul> one syntax that can be taught in ten minutes. 

It unifies all the use cases into *two* syntaxes.  The preexisting
%-formatted strings aren't going away anytime soon.  They are suitable for
most applications, so new users would have to contend with at least being
able to read, if not write, both forms of string interpolation for the
foreseeable future if PEP 215 is adopted.

It hasn't been demonstrated to me that Steve's EvalDict or something similar
couldn't be taught in a similar amount of time.  It has the added advantage
that it's essentially the same syntax as the current % syntax.  You can use
expressions where before you had to restrict yourself to names.  It requires
no change to the language.  Just drop it into a module in the std library
and away you go.  In fact, coded properly (which Steve is eminently capable
of doing) it would be 100% backward compatible.  People running essentially
any version of Python could use it. (I believe Pythonware still makes a 1.4
installer available for Windows.)

    Paul> The % syntax is fine for totally different use cases: printf-style
    Paul> formatting and interpolation of strings that might be generated at
    Paul> runtime.

What do you mean by "totally different"?  Most examples I've seen so far
have looked pretty much like

    print $"$a $b"

which probably covers about 90% of common usage anyway.  The examples in
PEP-215 don't look any more different than an EvalDict-like class could
comfortably handle today either.

Skip Montanaro
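
For readers who have not seen the EvalDict idea, here is a minimal
sketch in present-day Python 3. It is an assumption about the general
shape of such a class, not Steve's actual code: the constructor taking
an explicit namespace and the use of eval() in __getitem__ are choices
made for this illustration only.

```python
class EvalDict:
    """Mapping whose keys are expressions, evaluated on lookup."""

    def __init__(self, namespace):
        # The caller decides exactly which names expressions may see.
        self.namespace = namespace

    def __getitem__(self, expression):
        # Each %(...)s key is evaluated as an expression. This is not a
        # sandbox: never feed it a format string from an untrusted source.
        return eval(expression, {}, self.namespace)


values = {'a': 5, 'b': 6, 'names': ['x', 'y']}
print("a + b = %(a + b)s, last = %(names[-1])s" % EvalDict(values))
```

This works with the existing % operator and needs no new syntax. The
format string here is an ordinary runtime value, which is both the
flexibility and the hazard debated in this thread.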

From  Tue Jan 15 04:03:17 2002
From: (Skip Montanaro)
Date: Mon, 14 Jan 2002 22:03:17 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
References: <Pine.OSX.4.43.0201142103460.286-100000@localhost>
Message-ID: <>

    $'BDFL = $sys.copyright.split()[4].upper()'
    ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))

How do you know when to stop gobbling after seeing a dollar sign in the
string?

Skip Montanaro

From  Tue Jan 15 04:04:47 2002
From: (Ka-Ping Yee)
Date: Mon, 14 Jan 2002 22:04:47 -0600 (CST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

On Mon, 14 Jan 2002, Skip Montanaro wrote:
>     $'BDFL = $sys.copyright.split()[4].upper()'
>     ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))
> How do you know when to stop gobbling after seeing a dollar sign in the
> string?

Parse using the "NAME trailer*" production in Grammar/Grammar.

-- ?!ng

From  Tue Jan 15 04:16:11 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 22:16:11 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <>

> On Mon, 14 Jan 2002, Skip Montanaro wrote:
> >     $'BDFL = $sys.copyright.split()[4].upper()'
> >     ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))
> >
> > How do you know when to stop gobbling after seeing a dollar sign in the
> > string?
> Parse using the "NAME trailer*" production in Grammar/Grammar.

Except that whitespace is significant, at least in the sample
implementation.

>>> i = Itpl.itpl
>>> x=4
>>> y=3
>>> i("This is x: $x.  This is y: $y.")  # doesn't grab (x.This)
'This is x: 4.  This is y: 3.'
>>> i("This is x: $x.This is y: $y.")    # does grab (x.This)
AttributeError: 'int' object has no attribute 'This'

This doesn't seem to be mentioned in the PEP.

## Jason Orendorff
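
The whitespace sensitivity shown above can be reproduced with a toy
scanner, sketched here in present-day Python 3. This is not Itpl's
code; the pattern DOLLAR_NAME and the helper referenced_names are
assumptions made for illustration, and only dotted names are handled
(no calls or subscripts).

```python
import re

# $name followed by any number of ".attribute" parts. A dot is taken as
# a trailer only when an identifier follows it immediately.
DOLLAR_NAME = re.compile(r'\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)')


def referenced_names(template):
    """Return the dotted names a template would try to look up."""
    return DOLLAR_NAME.findall(template)


print(referenced_names("This is x: $x.  This is y: $y."))
print(referenced_names("This is x: $x.This is y: $y."))
```

With two spaces after the period the scanner stops at x; with none it
swallows "x.This", which matches the AttributeError in the session
above.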

From  Tue Jan 15 04:56:56 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 22:56:56 -0600
Subject: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict
In-Reply-To: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
Message-ID: <>

Steve Majewski wrote:
> But Jason just said that function calls are not allowed.
> ( We -- actually, he listed what was allowed, and function calls
>   were definitely not among them. ) [...]

Well, when the $-string explicitly contains the name of the
function to be called, then that falls into category (a).

I wrote:
>   a)  whatever code the programmer explicitly typed
>       in the $-string;

I hope this makes things clearer and not worse.  :-)

## Jason Orendorff

From sdm7g@Virginia.EDU  Tue Jan 15 05:15:23 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Tue, 15 Jan 2002 00:15:23 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <Pine.OSX.4.43.0201142350140.286-100000@localhost>

On Mon, 14 Jan 2002, Jason Orendorff wrote:

> Steven Majewski wrote:
> > On Mon, 14 Jan 2002, Jason Orendorff wrote:
> > > Would someone please explain to me what is seen as a "possible
> > > security issue" in PEP 215?  Can anyone propose some real-life
> > > situation where PEP 215 causes a vulnerability, and the
> > > corresponding % syntax doesn't?
> >
> > Do you mean the current '%' or my expanded example ?
> I mean the current %.
> Well?

Paul is the one who (rightly) brought up the issue of security
with respect to double evaluated strings. But in addition, he
seemed to be saying that you can do more with a compile time
test than you can with a runtime test. I disagree with that.

I think, for the same semantics, you get the same security
issues. I think it's very similar to the compile time type
checking vs. dynamic typing problem. (In fact, I think it
reduces to the same problem.)

There are clearly some advantages to doing things compile time,
but you don't get more security without more restriction.

-- Steve

From  Tue Jan 15 05:33:24 2002
From: (Jason Orendorff)
Date: Mon, 14 Jan 2002 23:33:24 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <Pine.OSX.4.43.0201142350140.286-100000@localhost>
Message-ID: <>

Steven Majewski wrote:
> On Mon, 14 Jan 2002, Jason Orendorff wrote:
> > Steven Majewski wrote:
> > > On Mon, 14 Jan 2002, Jason Orendorff wrote:
> > > > Would someone please explain to me what is seen as a "possible
> > > > security issue" in PEP 215?  Can anyone propose some real-life
> > > > situation where PEP 215 causes a vulnerability, and the
> > > > corresponding % syntax doesn't?
> > >
> > > Do you mean the current '%' or my expanded example ?
> >
> > I mean the current %.
> >
> > Well?
> >
> Paul is the one who (rightly) brought up the issue of security
> with respect to double evaluated strings. But in addition, he
> seemed to be saying that you can do more with a compile time
> test than you can with a runtime test. I disagree with that.
> I think, for the same semantics, you get the same security
> issues. I think it's very similar to the compile time type
> checking vs. dynamic typing problem. (In fact, I think it
> reduces to the same problem.)
> There are clearly some advantages to doing things compile time,
> but you don't get more security without more restriction.

As long as this "security issue" thread dies, I'm happy.

## Jason Orendorff

From  Tue Jan 15 05:49:19 2002
From: (Paul Prescod)
Date: Mon, 14 Jan 2002 21:49:19 -0800
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
References: <Pine.OSX.4.43.0201142350140.286-100000@localhost>
Message-ID: <>

Steven Majewski wrote:
> Paul is the one who (rightly) brought up the issue of security
> with respect to double evaluated strings. But in addition, he
> seemed to be saying that you can do more with a compile time
> test than you can with a runtime test. I disagree with that.
> I think, for the same semantics, you get the same security
> issues. 

Sure, for the same semantics. But EvalDict doesn't have the same
semantics. Even if we ignore double interpolation there is the issue of
code like this:

>>> def double():
...    user_val = raw_input("Please enter a number:")
...    print "%(2*user_val)" % EvalDict

>>> double()
Please enter a number: 3 + (os.system("rm -rm *"))

For EvalDict to have the same semantics as PEP 215 it would have to
disallow interpolations on strings that were not string literals. This
would make the EvalDict object somewhat different than any other object
in the Python library. Plus it would require compiler support which
would break compatibility with older Pythons.

 Paul Prescod

From sdm7g@Virginia.EDU  Tue Jan 15 05:56:18 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Tue, 15 Jan 2002 00:56:18 -0500 (EST)
Subject: [Python-Dev] PEP 215 does not introduce security issues
In-Reply-To: <>
Message-ID: <Pine.OSX.4.43.0201150016490.286-100000@localhost>

On Mon, 14 Jan 2002, Ka-Ping Yee wrote:

> The '$' prefix only applies to literals, and cannot be used as
> an operator in front of other expressions or variables.  This
> issue is pointed out specifically in the PEP:

I think the term "the '$' prefix" was one of the sources of my
confusion, as '$' is both a string prefix and a symbol prefix within
the string. I think I read "the '$' prefix"  as referring to the
second kind where you meant the first. The same goes for discussion
of '$' as an operator. (This misreading was the source of the
inconsistency I thought I saw between the examples and other

> Therefore, this transformation executes *only* code that was
> literally present in the original program.  (An example of this
> transformation is given at the end of PEP 215 in the
> "Implementation" section.)

O.K. Jason's explanation finally got thru to me: it's more clear
if I think of it as a preprocessor that really doesn't add any
capabilities to the language. I should think of it more like
the 'r' string prefix, which is just a syntactic convenience,
rather than like the 'u' string prefix, which creates a special
kind of (unicode) string. ( Well, it *does* create a special kind
of string in the runtime, but you can't access that string to
do anything strange in Python, because as soon as it's assigned,
it gets transformed into a 'normal string' . Thinking of it as
a preprocessor makes that more obvious.)

> (By the way, i myself am not yet fully convinced that a string
> interpolation feature is something that Python desperately needs.
> I do see some considerable potential for good, and so the purpose
> of PEP 215 was to put a concrete and plausible proposal on the
> table for discussion.  Given that proposal, which i believe to be
> about as good as one could reasonably expect, we can hope to save
> ourselves the expense of re-arguing the same issues repeatedly,
> and make an informed decision about whether to add the feature.
> Among the possible drawbacks/complaints i see are: more work for
> automated source code tools, tougher editor syntax highlighting,
> too many messy string prefix characters, and the addition of yet
> one more Python feature to teach and document.  Security, however,
> is not among them.)

 I'm not wild about more string prefixes, but we've already started
down that road, so I can't complain too much. But, as you've already
noted: it doesn't add any new capability, just new syntax. ( But it is
probably as justifiable as the raw string syntax. )
 Although I've knocked the idea in the past, I'd almost rather see
some sort of 'macro' facility for python, than to see a bunch of
special case syntax added to the language for every feature.

-- Steve

From  Tue Jan 15 06:01:36 2002
From: (Jason Orendorff)
Date: Tue, 15 Jan 2002 00:01:36 -0600
Subject: [Python-Dev] PEP 215 does not introduce security issues
In-Reply-To: <Pine.OSX.4.43.0201150016490.286-100000@localhost>
Message-ID: <>

Steve Majewski wrote:
> [...] it's more clear
> if I think of it as a preprocessor that really doesn't add any
> capabilities to the language. I should think of it more like
> the 'r' string prefix, which is just a syntactic convenience,
> rather than like the 'u' string prefix, which creates a special
> kind of (unicode) string. ( Well, it *does* create a special kind
> of string in the runtime, but you can't access that string to
> do anything strange in Python, because as soon as it's assigned,
> it gets transformed into a 'normal string' . Thinking of it as
> a preprocessor makes that more obvious.)

Yep, I agree, and I'm glad we're all at least seeing PEP 215
the same way now. :-)

However, I don't think it would need a special kind of string
in the runtime.  Thinking of it as a preprocessor, I believe
it would only need to generate some Python bytecode that uses
the existing str or unicode types.

Now I can go back to being neutral on PEP 215.  :-)

## Jason Orendorff

From sdm7g@Virginia.EDU  Tue Jan 15 06:27:19 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Tue, 15 Jan 2002 01:27:19 -0500 (EST)
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <>
Message-ID: <Pine.OSX.4.43.0201150120550.286-100000@localhost>

On Mon, 14 Jan 2002, Paul Prescod wrote:

> Sure, for the same semantics. But EvalDict doesn't have the same
> semantics. Even if we ignore double interpolation there is the issue of
> code like this:
> >>> def double():
> ...    user_val = raw_input("Please enter a number:")
> ...    print "%(2*user_val)" % EvalDict
> >>> double()
> Please enter a number: 3 + (os.system("rm -rm *"))

But in EvalDict you have to explicitly pass it a namespace dict.
You just don't pass it one with access to os.system ( or most
other os calls. ) That's why I disliked an implicit namespace.

But your example suggests to me:

>>> input('?: ')
?: r'raw string'
'raw string'

>>> input('?: ')
?: u'unicode string'
u'unicode string'

>>> input('?: ')
?: $'$os.system("rm -rm *" )'

I guess you need to special case that out of the compiler also.
( Are there any others lurking about ? )

-- Steve

From  Tue Jan 15 07:04:10 2002
From: (Barry A. Warsaw)
Date: Tue, 15 Jan 2002 02:04:10 -0500
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict)
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
Message-ID: <>

>>>>> "PP" == Paul Prescod <> writes:

    PP> He said that the output of the transformation would be no more
    PP> and no less than directly typed code with

    | a)  whatever code the programmer explicitly typed
    |        in the $-string;
    | b)  str() or unicode(); and
    | c) the + operator applied to strings.

    PP> "a)" embodies a whole host of things listed in the PEP:

    PP>     "A Python identifier optionally followed by any number of
    PP> trailers, where a trailer consists of: - a dot and an
    PP> identifier, - an expression enclosed in square brackets, or -
    PP> an argument list enclosed in parentheses (This is exactly the
    PP> pattern expressed in the Python grammar by "NAME trailer*",
    PP> using the definitions in Grammar/Grammar.)"

Not to pick on Paul, but I'm having a hard time imagining how a newbie
Python user being taught this new feature in his second hour will
actually understand any of these rules.  And how will you later answer
their questions about why Python has both $'' literals and '' % dict
interpolation when it seems like you can do basically the same task
using either of them?

>>>>> "KY" == Ka-Ping Yee <> writes:

    KY> In short: PEP 215 suggests a syntactic transformation that
    KY> turns

    KY>     $'the $quick brown $fox()'

    KY> into the fully equivalent

    KY>     'the %s brown %s' % (quick, fox())

    KY> The '$' prefix only applies to literals, and cannot be used as
    KY> an operator in front of other expressions or variables.  This
    KY> issue is pointed out specifically in the PEP:


    KY> Good point.  Perhaps it is better to simply describe a
    KY> transformation using '%s' and '%' instead of 'str' and '+'
    KY> to avoid this potential confusion altogether.

That would help <wink>.
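
The '%s' and '%' description of the transformation can be sketched as below. This is a simplified illustration in modern Python, not the PEP's reference implementation: the `transform` helper is hypothetical, and its regular expression only accepts `$name`, dotted attributes and an empty call `()`, whereas the PEP's "NAME trailer*" grammar also allows arguments and subscripts.

```python
import re

# Simplified placeholder: $name, optional .attr chain, optional empty call "()".
_PLACEHOLDER = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*(?:\(\))?)")


def transform(dollar_literal):
    """Return (format_string, expression_sources) for the body of a $-literal."""
    expressions = []

    def replace(match):
        # Remember the expression text and leave a %s slot in its place.
        expressions.append(match.group(1))
        return "%s"

    # Double any literal % first so it survives the later % formatting.
    escaped = dollar_literal.replace("%", "%%")
    return _PLACEHOLDER.sub(replace, escaped), expressions


fmt, exprs = transform("the $quick brown $fox()")

# Evaluate the collected expressions against an explicit, made-up namespace.
namespace = {"quick": "speedy", "fox": lambda: "vixen"}
result = fmt % tuple(eval(source, namespace) for source in exprs)
```

The pair `(fmt, exprs)` corresponds to the two halves of the `'...' % (...)` form quoted above; a compiler would emit the expressions as code rather than calling eval at run time.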

    KY> (By the way, i myself am not yet fully convinced that a string
    KY> interpolation feature is something that Python desperately
    KY> needs.

I am definitely not convinced that Python desperately needs PEP 215.
I wonder if the same folks clamoring for it will be the same folks who
raise their hands next month when asked again if they think Python is
changing too fast (naw, that won't happen :).

How many of you use Itpl regularly?  If Python were so deficient in
this regard, I would expect to see a lot of hands.  It's certainly
easy enough to define in today's Python, a simple function call that
adds only two characters to the proposal, so I don't buy that this
/only/ has utility if it were to apply to literals.  I'm willing to
accept that as applied only to literals it doesn't raise more security
concerns, but it also isn't nearly as useful then IMO.

And BTW, as I've told Ka-Ping before, I /am/ sympathetic to many of
the ideas in this PEP and in Itpl.  In fact, I have something very
similar in Mailman that I use all the time[1].  Instead of $'...' I
spell it _('...') which actually stands out better to me, and is only
two extra characters.  It's not as feature rich as PEP 215, but then
about the /only/ thing I'd add would be attribute access.  As it is,

    _('You owe me %(num)d dollars for that %(adj)s parrot')

gets me there 9 times out of 10, while for the 10th

    bird = cage.bird
    state = bird.wake_up()
    days = int(time.time() - bird.lastmodtime) / 86400
    _('That %(bird)s has been %(state)s for %(days)s')

is really not much more onerous, and certainly less jarring to my eye
than all those $ signs.



[1] I use _() ostensibly to mark translatable strings, but it has a
side benefit in that it interpolates into the string named variables
from the locals and globals of the calling context.  It does this by
using sys._getframe(1) in Python 2.1 and try/except hackery in older
versions of Python.  I find it quite handy, and admittedly magical,
but then I'm not suggesting it become a standard Python feature. :)

From  Tue Jan 15 07:26:19 2002
From: (Barry A. Warsaw)
Date: Tue, 15 Jan 2002 02:26:19 -0500
Subject: [Python-Dev] PEP 215 and EvalDict, yet another alternative
References: <Pine.OSX.4.43.0201141213040.974-100000@localhost>
Message-ID: <>

>>>>> "SM" == Steven Majewski <sdm7g@Virginia.EDU> writes:

    SM> Since PEP 216 on string interpolation is still active, I'd
    SM> appreciate it if some of its supporters would comment on my
    SM> revised alternative solution (posted on comp.lang.python and
    SM> at google thru):

[Steve's EvalDict]

For completeness, here's a simplified version of Mailman's _()
function which does auto-interpolation from locals and globals of the
calling context.  This version works in Python 2.1 or beyond and has
the i18n translation stuff stripped out.  For the full deal, see*checkout*/mailman/mailman/Mailman/


-------------------- snip snip --------------------
import sys
from UserDict import UserDict
from types import StringType

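class SafeDict(UserDict):
    """Dictionary which returns a default value for unknown keys."""
    def __getitem__(self, key):
        try:
            return self.data[key]
        except KeyError:
            if isinstance(key, StringType):
                # Leave unknown names in place as their own placeholder.
                return '%('+key+')s'
            else:
                return '<Missing key: %s>' % `key`

def _(s):
    # Interpolate from the caller's globals, overridden by its locals.
    frame = sys._getframe(1)
    d = SafeDict(frame.f_globals.copy())
    d.update(frame.f_locals)
    return s % d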

BIRD = 'parrot'

def examples(thing):
    bird = 'dead ' + BIRD
    print _('It used to be a %(BIRD)s')
    print _('But now it is a %(bird)s')
    print _('%(BIRD)s or %(bird)s?')
    print _('You are not %(morg)s, you are not %(imorg)s')
    print _('%(thing)s, %(thing)s, what is %(thing)s?')

if __name__ == '__main__':
    examples(sys.argv[1])

-------------------- snip snip --------------------

% python /tmp/ brain
It used to be a parrot
But now it is a dead parrot
parrot or dead parrot?
You are not %(morg)s, you are not %(imorg)s
brain, brain, what is brain?
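
For readers on current Python, the same caller-frame trick can be sketched as follows. This is an adaptation under stated assumptions, not Mailman's code: it uses a plain dict subclass with `__missing__` instead of UserDict, and `examples` returns its lines instead of printing them. `sys._getframe` remains a CPython implementation detail.

```python
import sys


class SafeDict(dict):
    """dict that echoes an unknown key back as its own %(key)s placeholder."""

    def __missing__(self, key):
        return "%(" + key + ")s"


def _(template):
    # Frame of whoever called _(); relies on CPython's sys._getframe.
    frame = sys._getframe(1)
    names = SafeDict(frame.f_globals)
    names.update(frame.f_locals)  # locals shadow globals
    return template % names


BIRD = "parrot"


def examples(thing):
    bird = "dead " + BIRD
    return [
        _("It used to be a %(BIRD)s"),
        _("But now it is a %(bird)s"),
        _("You are not %(morg)s"),
        _("%(thing)s, what is %(thing)s?"),
    ]


lines = examples("brain")
```

As in the run above, a name that is neither local nor global (`morg`) is left in the output as its own placeholder rather than raising KeyError.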

From  Tue Jan 15 09:58:31 2002
From: (Paul Prescod)
Date: Tue, 15 Jan 2002 01:58:31 -0800
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation)
 alternative EvalDict)
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
 <> <>
Message-ID: <>

"Barry A. Warsaw" wrote:
> Not to pick on Paul, but I'm having a hard time imagining how a newbie
> Python user being taught this new feature in his second hour will
> actually understand any of these rules.  

It's relatively simple. "You can do attribute access and function or
method calls. You can wrap things in parens to do more complicated

I would also be interested in a version of PEP 215 that merely required
parens all of the time. 

$"$(foo) $(5 + bar)"

I have always been nervous when I start new languages about how the
interpolation strings figure out where they end.
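
With parens required everywhere, finding where a field ends reduces to matching parentheses. The sketch below is a hypothetical run-time illustration in modern Python (the `interpolate` name and the use of eval are assumptions, and PEP 215 itself describes a compile-time transformation). It does not handle parentheses inside string literals within an expression.

```python
def interpolate(template, namespace):
    """Expand every $(expr) in template, matching parentheses to find each end."""
    out = []
    i = 0
    while i < len(template):
        if template.startswith("$(", i):
            depth = 1
            j = i + 2
            # Walk forward until the paren opened by "$(" is closed again.
            while j < len(template) and depth:
                if template[j] == "(":
                    depth += 1
                elif template[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced $( starting at index %d" % i)
            expression = template[i + 2:j - 1]
            # Evaluate against the given names only; not a security boundary.
            out.append(str(eval(expression, {"__builtins__": {}}, namespace)))
            i = j
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


simple = interpolate("$(foo) $(5 + bar)", {"foo": "spam", "bar": 2})
nested = interpolate("total: $((1 + 2) * n)", {"n": 3})
```

Because the closing paren is found by counting, nested parentheses inside an expression do not end the field early.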

> ... And how will you later answer
> their questions about why Python has both $'' literals and '' % dict
> interpolation when it seems like you can do basically the same task
> using either of them?

One is for working with literals and the other for working with computed
strings that arise in your code. It's one of those things where you use
the simple way you are taught in class until you find a case where you
can't use it any more and then you'll understand why you need the
advanced way. Today's situation is that you are probably taught about
three or four ways in class because none of them is really particularly

> I am definitely not convinced that Python desperately needs PEP 215.

I don't think anybody is convinced that Python desperately needs PEP
AFAIK, it hasn't been touched since July 2000. How could a 10 year old
language desperately need ANY syntactic sugar? If we survived until now
without something then we could probably survive another few years.

> I wonder if the same folks clamoring for it will be the same folks who
> raise their hands next month when asked again if they think Python is
> change too fast (naw, that won't happen :).

Ummm. Who is clamoring for this feature? We were presented with a newer
proposal to be compared with PEP 215. Some of us came to the conclusion
that PEP 215 is better than the new proposal. Nobody has, AFAIK,
proposed to complete or implement the PEP.

> How many of you use Itpl regularly?  If Python were so deficient in
> this regard, I would expect to see a lot of hands.  ....

The hassle of an extra dependency is without a doubt greater than the
hassle of working around Python in this regard. But then there are many
features in today's Python that fell into that category originally. Like
you could get a form of type/class unification from ExtensionClass. But
who would bother to install ExtensionClass just for that?

Anyhow, Mailman's code demonstrates that when the feature is provided at
low cost (i.e. no dependency), people use it. 

> is really not much more onerous, and certainly less jarring to my eye
> than all those $ signs.

This from mister print >>? ;)

 Paul Prescod

From  Tue Jan 15 10:34:04 2002
From: (M.-A. Lemburg)
Date: Tue, 15 Jan 2002 11:34:04 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> >    OK, PEP 277 is now available from:
> >
> Looks very good to me, except that the listdir approach (unicode in,
> unicode out) should apply uniformly to all platforms; I'll provide an
> add-on patch to your implementation once the PEP is approved.


Some nits:

The restriction when compiling Python in wide mode on Windows
should be lifted: The PyUnicode_AsWideChar() API should be used
to convert 4-byte Unicode to wchar_t (which is 2-byte on Windows).

Why is "unicodefilenames" a function and not a constant ?

I'm still in favour of a file API abstraction layer in Python,
but that can be done at some later point (moving the code from
the various platform specific modules into a Python/fileapi.c

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Tue Jan 15 13:20:15 2002
From: (Jack Jansen)
Date: Tue, 15 Jan 2002 14:20:15 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT, was PEP-time ?
In-Reply-To: Message by "Martin v. Loewis" <> ,
 Mon, 14 Jan 2002 08:11:54 +0100 , <>
Message-ID: <>

> >    OK, PEP 277 is now available from:
> >
> Looks very good to me, except that the listdir approach (unicode in,
> unicode out) should apply uniformly to all platforms; I'll provide an
> add-on patch to your implementation once the PEP is approved.

Yes, I would like this. On Mac OS X I don't have wide API's, but all calls use 
and return utf8 filenames. If listdir() could return Unicode I could convert 
the utf8 results to Unicode without setting sys.encoding.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Tue Jan 15 15:29:26 2002
From: (Jack Jansen)
Date: Tue, 15 Jan 2002 16:29:26 +0100
Subject: [Python-Dev] Name clash with typedefs in object.h
Message-ID: <>

Object.h declares various typedefs for routine pointers, and their names are 
not adorned with some sort of Py_ prefix.

Suddenly this has started to be a problem for me on OSX (not sure why: either 
object.h changed or I got a new version of the OSX devtools): object.h 
declares a typedef "destructor", and if that is in scope when <pthread.h> is 
included, compilation fails, because <pthread.h> uses the name "destructor" as 
an argument name (for a routine pointer), and the parser gets confused.

I think it's GCC that's to blame here, but still: shouldn't these names have 
some sort of a prefix?

Alternatively I can apply a quick fix by defining "destructor" as something 
else just before including <pthread.h>...
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From  Tue Jan 15 15:35:11 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 10:35:11 -0500
Subject: [Python-Dev] Name clash with typedefs in object.h
In-Reply-To: Your message of "Tue, 15 Jan 2002 16:29:26 +0100."
References: <>
Message-ID: <>

> Object.h declares various typedefs for routine pointers, and their names are 
> not adorned with some sort of Py_ prefix.
> Suddenly this has started to be a problem for me on OSX (not sure
> why: either object.h changed or because I got a new version of the
> OSX devtools): object.h declares a typedef "destructor", and if that
> is in scope when <pthread.h> is included this fails, which uses the
> name "destructor" as an argument name (for a routine pointer), and
> the parser gets confused.

destructor is a very old typedef, so OSX must've changed. :-)

> I think it's GCC that's to blame here, but still: shouldn't these
> names have some sort of a prefix?

Looking back, yes, definitely.  They were overlooked by the "grand
renaming" because they aren't visible to the loader.  But hard to fix
-- these typedefs are used in 3rd party extensions all over the place.

> Alternatively I can apply a quick fix by defining "destructor" as
> something else just before including <pthread.h>...

That sounds like the right fix, but please do it inside a platform
#ifdef.  I believe typedef names are exported as gdb symbols, but CPP
#defines are not.

--Guido van Rossum (home page:

From  Tue Jan 15 17:43:56 2002
From: (Paul Prescod)
Date: Tue, 15 Jan 2002 09:43:56 -0800
Subject: [Python-Dev] Utopian String Interpolation
Message-ID: <>

I think that if we're going to do string interpolation we might as well go
all of the way and have one unified string interpolation model.

 1. There should be no string-prefix. Instead the string \$ should be
magical in all non-raw literal strings as \x, \n etc. are. (if you want
to do string interpolation on a raw string, you could do it using the
method version below)

>>> from __future__ import string_interp

>>> a = "acos(.5) = \$(acos(.5))"

Embrace the __future__!

 2. There should be a transition period where literal strings containing
"\$" are flagged. This is likely rare but may occur here and there. And
by the way, unused \-sequences should probably be proactively reserved
now instead of silently "failing" as they do today. What's the use of
making "\" special if sometimes it isn't special?

 3. I think that it would be clearest if any expression other than a
simple variable name required "\$(". But that's a
minor decision.

 4. Between the $-sign and the opening paren, it should be possible to
put a C-style formatting specification. 

"pi = \$5.3f(math.pi)". 

There is no reason to force people to switch to a totally different
language feature to get that functionality. I never use it myself but
presume that scientists do!

 5. The interpolation functionality is useful enough to be available for
use on runtime-generated strings. But at runtime it should have a
totally different syntax. Now that Python has string methods it is clear
that "%" could (and IMO should) have been implemented that way:

newstr = mystr.interp(variabledict, evaluate_expressions=0)

By default evaluate_expressions is turned off. That means that all it
does is look up variables in the dictionary and insert them into the
string where it sees \$. If you want full interpretation behaviour you
would flip the evaluate_expressions switch. May Guido have mercy on your

 6. People should be discouraged from using the "%" version. Some day
far in the future it could be officially deprecated. We'll tell our
children stories about the days when we modulo'd strings, tuples and
dictionaries in weird and wonderful ways.

Once the (admittedly long) transition period is over, we would simply
have a better way to do everything we can do today. Code using the new
model will be easier to read, more concise, more consistent, more like
other scripting languages, abuse syntax less and use fewer logical
concepts. Arguably, functions like vars(), locals() and globals() could
be relegated to an "introspection" module where no newbie will ever look
at them again. (okay, now I'm over-reaching)

There will undoubtedly be language-change backlash. Guido will take the
heat, not me. He would have to decide if it was worth the pain. I think,
however, that the resulting language would be an improvement for experts
and newbies alike. And as with other changes -- sooner is better than
later. The year after next year is going to be the Year of Python so
let's get our changes in before then!

 Paul Prescod
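
The run-time method in point 5 might behave roughly like the sketch below, written as a free function in modern Python. Everything here is an assumption for illustration: the `interp` function, the regular expression, and in particular the choice to leave `\$(...)` fields untouched when evaluate_expressions is off, which the proposal does not spell out. Nested parentheses are not supported.

```python
import re

# Matches \$name, or \$( ... ) without nested parentheses (a simplification).
_FIELD = re.compile(r"\\\$(?:(\w+)|\(([^()]*)\))")


def interp(template, variables, evaluate_expressions=False):
    """Hypothetical free-function version of the proposed mystr.interp()."""

    def replace(match):
        name, expression = match.group(1), match.group(2)
        if name is not None:
            # Plain variable: a dictionary lookup and nothing more.
            return str(variables[name])
        if not evaluate_expressions:
            # Assumption: expressions are left as written unless switched on.
            return match.group(0)
        return str(eval(expression, {"__builtins__": {}}, variables))

    return _FIELD.sub(replace, template)


# Raw strings are used because "\$" is not a recognized escape in current Python.
template = r"\$who owes \$(num * 2) dollars"
values = {"who": "Eric", "num": 3}

lookup_only = interp(template, values)
evaluated = interp(template, values, evaluate_expressions=True)
```

With the flag off only `\$who` is substituted; with it on, the parenthesized expression is evaluated against the same dictionary.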

From  Tue Jan 15 20:04:52 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 15:04:52 -0500
Subject: [Python-Dev]
Message-ID: <>

In the most recent CVS checkout on the trunk, test_unicode_file has
started to fail.  Traceback:

Traceback (most recent call last):
  File "../Lib/test/", line 61, in ?
    if base not in os.listdir(path):
UnicodeError: ASCII decoding error: ordinal not in range(128)

This is on Linux (Red Hat 6.2, still).

--Guido van Rossum (home page:

From  Tue Jan 15 21:15:18 2002
From: (Paul Svensson)
Date: Tue, 15 Jan 2002 16:15:18 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Tue, 15 Jan 2002, Paul Prescod wrote:

>I think that if we're going to do string interpolation we might as go
>all of the way and have one unified string interpolation model.

Nice pie in the sky; my comments inserted below.

> 1. There should be no string-prefix. Instead the string \$ should be
>magical in all non-raw literal strings as \x, \n etc. are. (if you want
>to do string interpolation on a raw string, you could do it using the
>method version below)

+1 on no prefix, -0 on \$.
To my eyes, \(whatever) looks much cleaner, tho I'm not sure how
that would work with the evaluate_expressions flag in (5).

> 2. There should be a transition period where literal strings containing
>"\$" are flagged. This is likely rare but may occur here and there. And
>by the way, unused \-sequences should probably be proactively reserved
>now instead of silently "failing" as they do today. What's the use of
>making "\" special if sometimes it isn't special?

+1 on making undefined \-sequences raise SyntaxError.

> 3. I think that it would be clearest if any expression other than a
>simple variable name required "\$(". But that's a
>minor decision.

+1 on parens, but see my comments to (1).

> 4. Between the $-sign and the opening paren, it should be possible to
>put a C-style formatting specification. 
>"pi = \$5.3f(math.pi)". 
>There is no reason to force people to switch to a totally different
>language feature to get that functionality. I never use it myself but
>presume that scientists do!

Eek -- feeping creaturism.  -2.
The only reason to add this here is to be able to remove the % operator
on strings, and I'm not convinced that is the right way to go.
Anyways, this just begs to be spelled something like \%5.3f(math.pi).
Printf-like format specifications without a %-character seems just weird.
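
Either spelling can be mapped onto the existing % operator, as in this sketch in modern Python. The `expand` helper and its regular expression are hypothetical and accept both `\$5.3f(expr)` and `\%5.3f(expr)`; nested parentheses inside the expression are not handled.

```python
import math
import re

# Backslash, then $ or %, then a printf-style spec, then (expression).
_FORMATTED = re.compile(
    r"\\[$%]([-+ #0]*\d*(?:\.\d+)?[diouxXeEfFgGrs])\(([^()]*)\)"
)


def expand(template, namespace):
    """Replace each formatted field with '%<spec>' applied to the evaluated expression."""

    def replace(match):
        spec, expression = match.groups()
        value = eval(expression, {"__builtins__": {}}, namespace)
        # Wrap in a 1-tuple so tuple values are formatted as a single argument.
        return ("%" + spec) % (value,)

    return _FORMATTED.sub(replace, template)


dollar_form = expand(r"pi = \$5.3f(math.pi)", {"math": math})
percent_form = expand(r"pi = \%5.3f(math.pi)", {"math": math})
```

Both forms delegate to the same printf-style formatting, so the difference between them is purely one of spelling.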

> 5. The interpolation functionality is useful enough to be available for
>use on runtime-generated strings. But at runtime it should have a
>totally different syntax. Now that Python has string methods it is clear
>that "%" could (and IMO should) have been implemented that way:
>newstr = mystr.interp(variabledict, evaluate_expressions=0)
>By default evaluate_expressions is turned off. That means that all it
>does is look up variables in the dictionary and insert them into the
>string where it seems \$. If you want full interpretation behaviour you
>would flip the evaluate_expressions switch. May Guido have mercy on your

-0.  Here I think is a good place to draw the line before the returns
diminish too far.  I see the major part of the usefulness of string
interpolation coming from compile time usage, and that also nicely matches
how all other \-sequences are handled.


From  Tue Jan 15 21:13:16 2002
From: (Martin v. Loewis)
Date: Tue, 15 Jan 2002 22:13:16 +0100
Subject: [Python-Dev]
In-Reply-To: <> (message
 from Guido van Rossum on Tue, 15 Jan 2002 15:04:52 -0500)
References: <>
Message-ID: <>

> In the most recent CVS checkout on the trunk, test_unicode_file has
> started to fail.  Traceback:
> Traceback (most recent call last):
>   File "../Lib/test/", line 61, in ?
>     if base not in os.listdir(path):
> UnicodeError: ASCII decoding error: ordinal not in range(128)

Until PEP 277 is approved, the test that Mark recently added is
bogus: The return value of os.listdir is (currently) a list of byte
strings, and you cannot (portably) compare those to a Unicode string
if the byte strings contain non-ASCII characters.
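
The portability problem can be sketched in modern Python, where byte strings and text never compare equal implicitly, so the encoding has to be named. The `name_in_listing` helper and the sample file names are made up for the illustration.

```python
def name_in_listing(name, raw_listing, encoding):
    """True if the text `name` equals some byte-string entry decoded with `encoding`."""
    for raw in raw_listing:
        try:
            if raw.decode(encoding) == name:
                return True
        except UnicodeDecodeError:
            # Entry is not valid in this encoding, so it cannot be compared reliably.
            continue
    return False


# A directory listing as byte strings, one entry containing a non-ASCII character.
listing = ["caf\u00e9.txt".encode("utf-8"), b"plain.txt"]

found_with_utf8 = name_in_listing("caf\u00e9.txt", listing, "utf-8")
found_with_ascii = name_in_listing("caf\u00e9.txt", listing, "ascii")
```

The same listing matches or fails to match depending on the encoding assumed, which is why a comparison that relies on the default encoding is not portable.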

I'm surprised the test passed for Mark; he either has Neil's patches
installed, or has set the default encoding to "mbcs" on his system.

I recommend applying the attached patch.


RCS file: /cvsroot/python/python/dist/src/Lib/test/,v
retrieving revision 1.3
diff -u -r1.3
---	2002/01/07 02:11:43	1.3
+++	2002/01/15 21:06:24
@@ -55,11 +55,12 @@
     print "File doesn't exist after creating it"
 path, base = os.path.split(os.path.abspath(TESTFN_ENCODED))
-if base not in os.listdir(path):
-    print "Filename did not appear in os.listdir()"
-path, base = os.path.split(os.path.abspath(TESTFN_UNICODE))
-if base not in os.listdir(path):
-    print "Unicode filename did not appear in os.listdir()"
+# Until PEP 277 is adopted, this test is not portable
+#  if base not in os.listdir(path):
+#      print "Filename did not appear in os.listdir()"
+#  path, base = os.path.split(os.path.abspath(TESTFN_UNICODE))
+#  if base not in os.listdir(path):
+#      print "Unicode filename did not appear in os.listdir()"
 if os.path.abspath(TESTFN_ENCODED) != os.path.abspath(glob.glob(TESTFN_ENCODED)[0]):
     print "Filename did not appear in glob.glob()"

From  Tue Jan 15 21:21:04 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 16:21:04 -0500
Subject: [Python-Dev]
In-Reply-To: Your message of "Tue, 15 Jan 2002 22:13:16 +0100."
References: <>
Message-ID: <>

> Until PEP 277 is approved, the tests that Mark recently added is
> bogus: The return value of os.listdir is (currently) a list of byte
> strings, and you cannot (portably) compare those to a Unicode string
> if the byte strings contain non-ASCII characters.
> I'm surprised the test passed for Mark; he either has Neil's patches
> installed, or has set the default encoding to "mbcs" on his system.
> I recommend to apply the attached patch.

Thanks.  Done.

--Guido van Rossum (home page:

From  Tue Jan 15 21:24:31 2002
From: (Martin v. Loewis)
Date: Tue, 15 Jan 2002 22:24:31 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <>
Message-ID: <>

> The restriction when compiling Python in wide mode on Windows
> should be lifted: The PyUnicode_AsWideChar() API should be used
> to convert 4-byte Unicode to wchar_t (which is 2-byte on Windows).

While I agree that this restriction ought to be removed eventually, I
doubt that Python will be usable on Windows with a four-byte Unicode
type in any foreseeable future. 

Just have a look at unicodeobject.c:PyUnicode_DecodeMBCS; it makes the
assumption that a Py_UNICODE* is the same thing as a WCHAR*. That
means that the "mbcs" encoding goes away on Windows if
HAVE_USABLE_WCHAR_T does not hold anymore.

Also, I believe most of PythonWin also assumes HAVE_USABLE_WCHAR_T
(didn't check, though).

> Why is "unicodefilenames" a function and not a constant ?

In the Windows binary, you need a run-time check to see whether this
is DOS/W9x, or NT/W2k/XP; on DOS, the Unicode API is not available
(you still can pass Unicode file names to open and listdir, but they
will get converted through the MBCS encoding). So it clearly is not a
compile time constant.
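
A run-time check of this kind could look like the sketch below in modern Python. `sys.getwindowsversion()` is a real standard-library call available on Windows only; the function body and the choice to return False on every other platform are assumptions, and that fallback sidesteps the Unix question entirely.

```python
import sys


def unicodefilenames():
    """Hypothetical run-time check: is the wide-character file API usable?"""
    if sys.platform != "win32":
        # Assumption: report False elsewhere rather than guess from the locale.
        return False
    # platform == 2 is VER_PLATFORM_WIN32_NT (the NT family), as opposed to W9x.
    return sys.getwindowsversion().platform == 2


supported = unicodefilenames()
```

The same binary gives different answers on different Windows families, which is what rules out a compile-time constant.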

I'm still not certain what the meaning of this function is; if it
means "Unicode file names are only restricted by the file system
conventions", then on Unix, it may change at run-time, if a user or
the application sets a UTF-8 locale, switching from the original "C"


From  Tue Jan 15 22:09:44 2002
From: (Neil Hodgson)
Date: Wed, 16 Jan 2002 09:09:44 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT, was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <>
Message-ID: <01e701c19e11$567f71f0$0acc8490@neil>

Martin v. Loewis:
> >    OK, PEP 277 is now available from:
> >
> Looks very good to me, except that the listdir approach (unicode in,
> unicode out) should apply uniformly to all platforms; I'll provide an
> add-on patch to your implementation once the PEP is approved.

   Won't this lead to a less useful result? Py_FileSystemDefaultEncoding
will be NULL on, for example, Linux, so if there are names containing
non-ASCII characters then it will either raise an exception or stick '?'s in
the names. It would be better to use narrow strings there, as that will
pass through all file names.

   You have probably already realised, but Windows 9x will also need a
Unicode preserving listdir but it will have to encode using mbcs.


From  Tue Jan 15 22:21:03 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 17:21:03 -0500
Subject: [Python-Dev] Starting 2.1.2 final release
Message-ID: <>

We're going to cut a 2.1.2 final release tonight.  Anthony had to bow
out for personal reasons, so it's the PythonLabs crew who are doing
the actual release for him.  In honor of Anthony's timezone (and
because we're all night owls here :-), the official release date will
be January 16.

Please no more checkins to the release21-maint branch, except from

--Guido van Rossum (home page:

From Anthony Baxter <>  Tue Jan 15 22:39:14 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Wed, 16 Jan 2002 09:39:14 +1100
Subject: [Python-Dev] Re: Starting 2.1.2 final release
In-Reply-To: Message from Guido van Rossum <>
 of "Tue, 15 Jan 2002 17:21:03 CDT." <>
Message-ID: <>

>>> Guido van Rossum wrote
> We're going to cut a 2.1.2 final release tonight.  Anthony had to bow
> out for personal reasons, so it's the PythonLabs crew who are doing
> the actual release for him.  In honor of Anthony's timezone (and
> because we're all night owls here :-), the official release date will
> be January 16.

Thanks for doing this - my most sincere apologies for the last minute
drop-out on this - I need to find a new place to live before I head
over for the python conference.


From  Tue Jan 15 23:12:21 2002
From: (Paul Prescod)
Date: Tue, 15 Jan 2002 15:12:21 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <>
Message-ID: <>

Paul Svensson wrote:
> +1 on no prefix, -0 on \$.
> To my eyes, \(whatever) looks much cleaner, tho I'm not sure how
> that would work with the evaluate_expressions flag in (5).

An offline correspondent suggested that and also suggested perhaps \`. \`
is nicely reminiscent of `abc` and it does basically the same thing, only
in strings, so I kind of like it. 

>>> `5+3`
>>> "\`5 + 3` is enough"
8 is enough

The downside is that larger characters like $ and % are much more clear
to my eyes. Plus there is the whole apos-backtick confusion.

The problem with \( is that that is likely to already be a popular
string in regular expressions.

> > 4. Between the $-sign and the opening paren, it should be possible to
> >put a C-style formatting specification.
> >
> >"pi = \$5.3f(math.pi)".
> >
> >There is no reason to force people to switch to a totally different
> >language feature to get that functionality. I never use it myself but
> >presume that scientists do!
> Eek -- feeping creaturism.  -2.

The feature is already there and sometimes used. We either keep two
different ways to spell interpolation or we incorporate it.

> The only reason to add this here is to be able to remove the % operator
> on strings, and I'm not convinced that is the right way to go.
> Anyways, this just begs to be spelled something like \%5.3f(math.pi).
> Printf-like format specifications without a %-character seems just weird.

The offline correspondent also had this idea and I'm coming around to

> -0.  Here I think is a good place to draw the line before the returns
> diminish too far.  I see the major part of the usefulness of string
> interpolation coming from compile time usage, and that also nicely matches
> how all other \-sequences are handled.

And do what to do templating at runtime? Modulo? string.replace? Or just
don't provide that feature? Also, how to handle interpolation in raw

 Paul Prescod

From  Wed Jan 16 00:37:56 2002
From: (Paul Svensson)
Date: Tue, 15 Jan 2002 19:37:56 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Tue, 15 Jan 2002, Paul Prescod wrote:

>Paul Svensson wrote:
>> +1 on no prefix, -0 on \$.
>> To my eyes, \(whatever) looks much cleaner, tho I'm not sure how
>> that would work with the evaluate_expressions flag in (5).
>An offline correspond suggested that and also suggested perhaps \`. \`
>is nicely reminicent of `abc` and it does basically the same thing, only
>in strings, so I kind of like it. 
>>>> `5+3`
>>>> "\`5 + 3` is enough"
>8 is enough
>The downside is that larger characters like $ and % are much more clear
>to my eyes. Plus there is the whole apos-backtick confusion.

I thought of \` as well, but didn't suggest it, mainly for those reasons.

>The problem with \( is that that is likely to already be a popular
>string in regular expressions.

In which case it should either be a raw string, or spelled \\(.
(We _really_ need to issue syntax errors on undefined \-sequences)

>> > 4. Between the $-sign and the opening paren, it should be possible to
>> >put a C-style formatting specification.
>> >
>> >"pi = \$5.3f(math.pi)".
>> >
>> >There is no reason to force people to switch to a totally different
>> >language feature to get that functionality. I never use it myself but
>> >presume that scientists do!
>> Eek -- feeping creaturism.  -2.
>The feature is already there and sometimes used. We either keep two
>different ways to spell interpolation or we incorporate it.

I don't think interpolation and variable formatting are similar enough to
conflate in a single notation -- wasn't it the ungainliness of using the
existing variable formatting to interpolate that started this thread ?

>> The only reason to add this here is to be able to remove the % operator
>> on strings, and I'm not convinced that is the right way to go.
>> Anyways, this just begs to be spelled something like \%5.3f(math.pi).
>> Printf-like format specifications without a %-character seems just weird.
>The offline correspondent also had this idea and I'm coming around to

I'm not particularly happy with that idea; simply mimicking the syntax
it was supposed to replace, for little gain.
I also think there could be some cause for confusion between \%(foo)s
looking in vars() and %(foo)s using the other side of the % operator.

>> -0.  Here I think is a good place to draw the line before the returns
>> diminish too far.  I see the major part of the usefulness of string
>> interpolation coming from compile time usage, and that also nicely matches
>> how all other \-sequences are handled.
>And do what to do templating at runtime? Modulo? string.replace? Or just
>don't provide that feature? Also, how to handle interpolation in raw

Since the whole point of raw strings is to _not_ touch what's inside
the quotes, I don't see how string interpolation makes much sense there.

As for runtime templating, a string method to replace \-sequences
seems like a very straightforward idea, that shouldn't need much
discussion.  Call it "".eval([globals, [locals]]), to get some
educational synergy from teaching all the newbies not to give
unchecked user input to eval().

I still think compile-time templating would be the more common use,
and thus should be the driving issue behind the design.
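
A minimal sketch of the runtime variant, under two assumptions made only
for illustration: it is a plain function rather than a string method, and
it uses a $name marker rather than a \-sequence. The helper name
interpolate() is hypothetical. Names are looked up in a mapping the caller
passes in explicitly, so no user-supplied expression is ever evaluated.

```python
import re

# Hypothetical helper, not an existing string method: substitutes $name
# markers from an explicit mapping instead of calling eval().
_NAME = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

def interpolate(template, namespace):
    """Replace each $name in template with str(namespace[name])."""
    def lookup(match):
        return str(namespace[match.group(1)])
    return _NAME.sub(lookup, template)

print(interpolate("hi there, $yourname", {"yourname": "Guido"}))
```

Restricting the markers to bare names is a deliberate simplification; full
expressions would need eval() and bring its risks with them.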


From  Wed Jan 16 01:08:34 2002
From: (Neil Hodgson)
Date: Wed, 16 Jan 2002 12:08:34 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,  was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <>
Message-ID: <057101c19e2a$5217efc0$0acc8490@neil>

Martin v. Loewis:

> I'm still not certain what the meaning of this function is, if it
> means "Unicode file names are only restricted by the file system
> conventions", then on Unix, it may change at run-time, if a user or
> the application sets an UTF-8 locale, switching from the original "C"
> locale.

   The underlying motivation of the function is for code to be able to ask
"Is it better to pass Unicode strings to file operations"? For me the main
criterion for "better" is whether all files are accessible. It is best to
determine this through a test that does not require writing or that is
dependent on the user's setup, such as having a "C:" drive.

   Switching to a UTF-8 locale on Unix will make files inaccessible where
their names contain illegal UTF-8 sequences.


From  Wed Jan 16 02:53:08 2002
From: (Jason Orendorff)
Date: Tue, 15 Jan 2002 20:53:08 -0600
Subject: [Python-Dev] PEP_215_ (string interpolation) alternative EvalDict
In-Reply-To: <Pine.OSX.4.43.0201150120550.286-100000@localhost>
Message-ID: <>

> But your example suggests to me:
> >>> input('?: ')
> ?: $'$os.system("rm -rm *" )'
> I guess you need to special case that out of the compiler also.
> ( Are there any others lurking about ? )

The user could just as well type
  ?: os.system("rm -rf *")
and save some keystrokes.

input() is totally insecure.  Always has been.  Nothing new here.
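
To make the point concrete: the old input() behaves like
eval(raw_input(prompt)). The sketch below imitates that with a hypothetical
legacy_input() helper and a canned "user", so it can be run without a
keyboard; whatever text comes back is executed as an expression.

```python
def legacy_input(prompt, read_line):
    """Mimic the old input(): evaluate whatever the user typed as an expression."""
    return eval(read_line(prompt))

# A canned "user" stands in for the keyboard so the example is repeatable.
typed = lambda prompt: "__import__('os').getcwd()"
result = legacy_input("?: ", typed)
print(type(result).__name__)   # the typed text ran as code, not as data
```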

## Jason Orendorff

From  Wed Jan 16 03:05:49 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 22:05:49 -0500
Subject: [Python-Dev] RELEASED - Python 2.1.2 (final)
Message-ID: <>

I've released the final version of Python 2.1.2 - a bugfix release for
Python 2.1.  I recommend everyone who is using Python 2.1 or
2.1.1 to upgrade to 2.1.2 -- this release fixes a few crashes.
Read about it and download it here:

My special thanks go out to Anthony Baxter, the relentless 2.1.2
releasemeister (and for the use of his timezone so I can call this a
January 16 release without having to stay up until after midnight :-).

--Guido van Rossum (home page:

From  Wed Jan 16 03:26:51 2002
From: (Russ Cox)
Date: Tue, 15 Jan 2002 22:26:51 -0500
Subject: [Python-Dev] thread_foobar.h routines
Message-ID: <>

I'm writing thread routines for the Plan 9 port of Python.

Is it correct that:

	PyThread_acquire_lock returns 1 on success, 0 on failure.
	PyThread_down_sema returns 0 on success, -1 on failure.

It appears that way, but the inconsistency bothers me.


From  Wed Jan 16 03:33:08 2002
From: (Guido van Rossum)
Date: Tue, 15 Jan 2002 22:33:08 -0500
Subject: [Python-Dev] thread_foobar.h routines
In-Reply-To: Your message of "Tue, 15 Jan 2002 22:26:51 EST."
References: <>
Message-ID: <>

> I'm writing thread routines for the Plan 9 port of Python.
> Is it correct that:
> 	PyThread_acquire_lock returns 1 on success, 0 on failure.
> 	PyThread_down_sema returns 0 on success, -1 on failure.
> It appears that way, but the inconsistency bothers me.

Me too.  The PyThread_*_sema routines are not used, and I would
recommend that you not bother implementing them at all.  (If anyone
used them, we would have heard a complaint -- in some thread
implementations these return -1 for failure, in others 0.  :-)

We should cut these out of the sources.

--Guido van Rossum (home page:

From  Wed Jan 16 04:00:38 2002
From: (Fred L. Drake, Jr.)
Date: Tue, 15 Jan 2002 23:00:38 -0500 (EST)
Subject: [Python-Dev] thread_foobar.h routines
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum writes:
 > Me too.  The PyThread_*_sema routines are not used, and I would
 > recommend that you not bother implementing them at all.  (If anyone
 > used them, we would have heard a complaint -- in some thread
 > implementations these return -1 for failure, in others 0.  :-)
 > We should cut these out of the sources.

I'll be glad to do this.  A quick grep seems to show that this really
does apply to *all* PyThread_*_sema() routines.

If there are no objections, I'll have this done quickly.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Wed Jan 16 04:21:16 2002
From: (
Date: Tue, 15 Jan 2002 22:21:16 -0600
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict)
In-Reply-To: <>
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost> <> <>
Message-ID: <>

On Tue, Jan 15, 2002 at 02:04:10AM -0500, Barry A. Warsaw wrote:
> [1] I use _() ostensibly to mark translatable strings, but it has a
> side benefit in that it interpolates into the string named variables
> from the locals and globals of the calling context.  It does this by
> using sys._getframe(1) in Python 2.1 and try/except hackery in older
> versions of Python.  I find it quite handy, and admittedly magical,
> but then I'm not suggesting it become a standard Python feature. :)

This caught my eye.

How will programs that use PEP215 for string interpolation be translatable?
All translation systems use some method of identifying the strings in
source code, then permitting mapping from the string identifiers to the
real strings at runtime.  With "gettext", the "string identifier" is
typically the original-language string, and the marker/mapper is spelled
_("string literal").

Given that short introduction, it's obvious how 
	_("hi there, %s") % yourname
works, and why
	_("hi there, %s" % yourname)
doesn't work, but how will I use a similar scheme to translate
	$"hi there, $yourname"
?  Obviously, 
	_($"hi there, $yourname")
won't work, because it's equivalent to the second, non-working translation

Well, I guess we could add _ and $_ strings to Python, right?

grumble-grumble'ly yours,

From  Wed Jan 16 05:00:44 2002
From: (Tim Peters)
Date: Wed, 16 Jan 2002 00:00:44 -0500
Subject: [Python-Dev] thread_foobar.h routines
In-Reply-To: <>
Message-ID: <>

[Fred L. Drake, Jr., on removing the unused PyThread_*_sema routines]
> If there are no objections, I'll have this done quickly.

+1, and your patch looks fine from a skim (and I'd rather fix it
retroactively if necessary than bother to apply it first -- live a little,
check it in, we're still pre-alpha-1 for 2.3 <wink>).

From  Wed Jan 16 05:17:36 2002
From: (Fred L. Drake, Jr.)
Date: Wed, 16 Jan 2002 00:17:36 -0500 (EST)
Subject: [Python-Dev] thread_foobar.h routines
In-Reply-To: <>
References: <>
Message-ID: <>

Tim Peters writes:
 > +1, and your patch looks fine from a skim (and I'd rather fix it
 > retroactively if necessary than bother to apply it first -- live a little,
 > check it in, we're still pre-alpha-1 for 2.3 <wink>).

Guido asked me to wait a day in case any legitimate reasons to keep
those routines popped up from python-dev, otherwise it would be in


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Wed Jan 16 05:22:18 2002
From: (Fred L. Drake, Jr.)
Date: Wed, 16 Jan 2002 00:22:18 -0500 (EST)
Subject: [Python-Dev] Intel C/C++ compiler evaluation version
Message-ID: <>

Has anyone tried the evaluation version of the Intel C/C++ compiler
for Linux 32-bit platforms?  They distributed a CD in the most recent
version of Linux Magazine, and it appears to be available for download
as well.

I had trouble getting it going; the evaluation license file they sent
me didn't work out of the box with the license manager that got
installed.  If anyone has gotten it to work, please send instructions


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Wed Jan 16 05:40:03 2002
From: (Mark Hammond)
Date: Wed, 16 Jan 2002 16:40:03 +1100
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: <>
Message-ID: <>

Fred writes:
> Guido van Rossum writes:
>  > The various modules ntpath, posixpath, macpath etc. are not just there
>  > to support their own platform on itself.  They are also there to
> Note that ntpath.abspath() relies on nt._getfullpathname().  It is not
> unreasonable for this particular function to require that it actually
> be running on NT, so I'm not going to suggest changing this.  On the
> other hand, it means the portable portions of the module are (mostly)
> not tested when the regression test is run on a platform other than
> Windows; the ntpath.abspath() test raises an ImportError since
> ntpath.abspath() imports the "nt" module within the function, and the
> resulting ImportError causes the rest of the unit test to be skipped
> and reports that the test is skipped.
> I'd like to change the test so that the abspath() test is only run
> if the "nt" module is available:

Sigh - this too would be my fault :(

Before _getfullpathname() was added to the 'nt' module, there was an attempt
to import 'win32api', and if OK, use the equivalent function from that.
When I added the new function to 'nt', I removed that import check, in the
belief it would now always succeed.  This was obviously a bad call ;)  (FYI,
that was rev 1.35 of

A patch that reinstates the code would be:

RCS file: /cvsroot/python/python/dist/src/Lib/,v
retrieving revision 1.44
diff -u -r1.44
---	2001/11/05 21:25:02	1.44
+++	2002/01/16 05:35:19
@@ -457,8 +457,18 @@
 # Return an absolute path.
 def abspath(path):
     """Return the absolute version of a path"""
-    if path: # Empty path must return current working directory.
+    try:
         from nt import _getfullpathname
+    except ImportError: # Not running on Windows - mock up something
+        global abspath
+        def _abspath(path):
+            if not isabs(path):
+                path = join(os.getcwd(), path)
+            return normpath(path)
+        abspath = _abspath
+        return _abspath(path)
+    if path: # Empty path must return current working directory.
             path = _getfullpathname(path)
         except WindowsError:

This should also solve the test case problem.
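
For readers who find the diff hard to follow, here is a self-contained
sketch of the same idea: use nt._getfullpathname when the "nt" module can
be imported, and fall back to portable path arithmetic otherwise. The name
abspath_sketch is hypothetical, and unlike the patch above it does not
rebind a module-level abspath on first use.

```python
import ntpath
import os

def abspath_sketch(path):
    """Return an absolute, normalized NT-style path on any platform."""
    try:
        from nt import _getfullpathname  # only importable on Windows
    except ImportError:
        # Not on Windows: build the result from portable ntpath helpers.
        if not ntpath.isabs(path):
            path = ntpath.join(os.getcwd(), path)
        return ntpath.normpath(path)
    if path:
        try:
            path = _getfullpathname(path)
        except OSError:
            pass  # keep the path as given and just normalize it
    else:
        path = os.getcwd()  # empty path means the current directory
    return ntpath.normpath(path)

print(abspath_sketch("C:\\spam\\..\\eggs"))
```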



From  Wed Jan 16 05:53:46 2002
From: (Fred L. Drake, Jr.)
Date: Wed, 16 Jan 2002 00:53:46 -0500 (EST)
Subject: [Python-Dev] guidance sought: merging port related changes to Library modules
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Hammond writes:
 > Before _getfullpathname() was added to the 'nt' module, there was an attempt
 > to import 'win32api', and if OK, use the equivalent function from that.
 > When I added the new function to 'nt', I removed that import check, in the
 > belief it would now always succeed.  This was obviously a bad call ;)  (FYI,
 > that was rev 1.35 of
 > A patch that reinstates the code would be:
 > This should also solve the test case problem.

I haven't tested this, but it looks OK to me.  Feel free to check it
in.  Thanks!


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Wed Jan 16 07:08:33 2002
From: (Martin v. Loewis)
Date: Wed, 16 Jan 2002 08:08:33 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT, was PEP-time ? ...
In-Reply-To: <01e701c19e11$567f71f0$0acc8490@neil> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <01e701c19e11$567f71f0$0acc8490@neil>
Message-ID: <>

>    Won't this lead to a less useful result as Py_FileSystemDefaultEncoding
> will be NULL on, for example, Linux, so if there are names containing
> non-ASCII characters then it will either raise an exception or stick '?'s in
> the names. So it would be better to use narrow strings there as that will
> pass through all file names.

On Linux, if the user has set LANG to a reasonable value, and the
Python application has invoked setlocale(),
Py_FileSystemDefaultEncoding will not be NULL.

It still might happen that an individual file name cannot be decoded
from the file system encoding, e.g. if the locale is set to UTF-8, but
you have a Latin-1 file name (created by a different user). In that
exceptional case, I would neither expect an exception, nor expect
replacement characters in the Unicode string, but instead use a byte
string *for this specific file name*.
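
A rough sketch of that per-name fallback, assuming the directory listing
starts out as byte strings; decode_names is a hypothetical helper, not the
proposed listdir implementation.

```python
def decode_names(raw_names, encoding):
    """Decode each byte-string name; keep the bytes when decoding fails."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            result.append(raw)  # undecodable: hand back this one name as bytes
    return result

# The second name is Latin-1 encoded, so it is not valid UTF-8.
names = decode_names([b"readme.txt", b"caf\xe9.txt"], "utf-8")
print(names)
```

Callers then get a mixed list and have to be prepared for both types, which
is the price of not raising and not inserting replacement characters.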

Just because there is the rare chance that you cannot
meaningfully interpret a certain file name does not mean that all
other installations have to suffer.

>    You have probably already realised, but Windows 9x will also need a
> Unicode preserving listdir but it will have to encode using mbcs.

Exactly. Unfortunately, we cannot do anything to avoid replacement
characters here, since it is already Windows who will introduce
them. In turn, we know that decoding from "mbcs" will always succeed.


From  Wed Jan 16 07:34:12 2002
From: (Martin v. Loewis)
Date: Wed, 16 Jan 2002 08:34:12 +0100
Subject: [Python-Dev] Intel C/C++ compiler evaluation version
In-Reply-To: <>
References: <>
Message-ID: <>

> Has anyone tried the evaluation version of the Intel C/C++ compiler
> for Linux 32-bit platforms?  They distributed a CD in the most recent
> version of Linux Magazine, and it appears to be available for download
> as well.
> I had trouble getting it going; the evaluation license file they sent
> me didn't work out of the box with the license manager that got
> installed.  If anyone has gotten it to work, please send instructions
> around!

We had no problems installing it. The compiler goes into
/opt/intel/compiler50/ia32/*, the license into

On the Debian system with an alien RPM installation, the RPM
postinstall scripts did not execute properly, so we adjusted the
configuration files ourselves (in particular, the postinstall script
would have created a broken .csh file, anyway). Looking at the
iccvars.csh script, make sure the following settings are correct:

setenv IA32ROOT /opt/intel/compiler50/ia32
setenv INTEL_FLEXLM_LICENSE /opt/intel/licenses

( accordingly). I don't think we run flexlm; sourcing the
appropriate settings is enough.


From  Wed Jan 16 11:38:54 2002
From: (Neil Hodgson)
Date: Wed, 16 Jan 2002 22:38:54 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,  was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <>
Message-ID: <018101c19e82$60c42950$0acc8490@neil>

M.-A. Lemburg:

> The restriction when compiling Python in wide mode on Windows
> should be lifted: The PyUnicode_AsWideChar() API should be used
> to convert 4-byte Unicode to wchar_t (which is 2-byte on Windows).

   I'd prefer not to include this as it adds complexity for little benefit
but am prepared to do the implementation if it is required.


From  Wed Jan 16 13:17:54 2002
From: (Barry A. Warsaw)
Date: Wed, 16 Jan 2002 08:17:54 -0500
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict)
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
Message-ID: <>

>>>>> "jepler" ==   <> writes:

    jepler> Well, I guess we could add _ and $_ strings to Python,
    jepler> right?

Ug.  t'' strings have been discussed before w.r.t. i18n markup, but I
don't like it.  I think it's a mistake to proliferate string
prefixes.  But search the i18n-sig for more discussion on the topic.


From  Wed Jan 16 14:22:41 2002
From: (Skip Montanaro)
Date: Wed, 16 Jan 2002 08:22:41 -0600
Subject: [Python-Dev] deprecate input()?
Message-ID: <>

I just responded to a question on a user had about feeding empty
strings to input().  While he didn't say why he called input(), I suspect he
thought the semantics were more like raw_input().

In these days of widespread Internet nastiness, shouldn't input() be

Skip Montanaro ( -

From  Wed Jan 16 18:22:37 2002
From: (M.-A. Lemburg)
Date: Wed, 16 Jan 2002 19:22:37 +0100
Subject: [Python-Dev] Utopian String Interpolation
References: <>
Message-ID: <>

Paul Prescod wrote:
> I think that if we're going to do string interpolation we might as well go
> all of the way and have one unified string interpolation model.
>  1. There should be no string-prefix. Instead the string \$ should be
> magical in all non-raw literal strings as \x, \n etc. are. (if you want
> to do string interpolation on a raw string, you could do it using the
> method version below)
> >>> from __future__ import string_interp
> >>> a = "acos(.5) = \$(acos(.5))"
> Embrace the __future__!


Too dangerous. If string interpolation makes it into the core,
then please use a *new* construct. '\$' is currently interpreted
as '\$' and this should not be changed (heck, just think what would
happen to all the shell script snippets encoded in Python strings).
BTW, why don't you wrap all this interpolation stuff into
a module and then call a function to have it apply all the
magic you want. If I remember correctly, someone
else has already written such a module for Python.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan 16 18:48:49 2002
From: (M.-A. Lemburg)
Date: Wed, 16 Jan 2002 19:48:49 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > The restriction when compiling Python in wide mode on Windows
> > should be lifted: The PyUnicode_AsWideChar() API should be used
> > to convert 4-byte Unicode to wchar_t (which is 2-byte on Windows).
> While I agree that this restriction ought to be removed eventually, I
> doubt that Python will be usable on Windows with a four-byte Unicode
> type in any foreseeable future.

Perhaps Neil ought to copy your notes to the PEP, so that we
don't forget about this issue.
> Just have a look at unicodeobject.c:PyUnicode_DecodeMBCS; it makes the
> assumption that a Py_UNICODE* is the same thing as a WCHAR*. That
> means that the "mbcs" encoding goes away on Windows if
> HAVE_USABLE_WCHAR_T does not hold anymore.
> Also, I believe most of PythonWin also assumes HAVE_USABLE_WCHAR_T
> (didn't check, though).
> > Why is "unicodefilenames" a function and not a constant ?
> In the Windows binary, you need a run-time check to see whether this
> is DOS/W9x, or NT/W2k/XP; on DOS, the Unicode API is not available
> (you still can pass Unicode file names to open and listdir, but they
> will get converted through the MBCS encoding). So it clearly is not a
> compile time constant.

I see.
> I'm still not certain what the meaning of this function is, if it
> means "Unicode file names are only restricted by the file system
> conventions", then on Unix, it may change at run-time, if a user or
> the application sets an UTF-8 locale, switching from the original "C"
> locale.

Doesn't it mean: "posix functions and file() can accept Unicode file 
names" ? 

That's what I thought, at least; whether they succeed or not 
is another question and could well be handled by run-time errors
(e.g. on Unix it is not at all clear whether NFS, Samba or some
other more exotic file system can handle the encoding chosen by 
Python or the program).

Perhaps we ought to drop that function altogether and let the
various file IO functions raise run-time errors instead ?!

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan 16 18:54:00 2002
From: (M.-A. Lemburg)
Date: Wed, 16 Jan 2002 19:54:00 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <018101c19e82$60c42950$0acc8490@neil>
Message-ID: <>

Neil Hodgson wrote:
> M.-A. Lemburg:
> > The restriction when compiling Python in wide mode on Windows
> > should be lifted: The PyUnicode_AsWideChar() API should be used
> > to convert 4-byte Unicode to wchar_t (which is 2-byte on Windows).
>    I'd prefer not to include this as it adds complexity for little benefit
> but am prepared to do the implementation if it is required.

Point taken, but please mention this in the PEP.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Wed Jan 16 19:09:24 2002
From: (Martin v. Loewis)
Date: Wed, 16 Jan 2002 20:09:24 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
In-Reply-To: <> (
References: <> <006e01c1949c$7631d1b0$0acc8490@neil> <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <>
Message-ID: <>

> > I'm still not certain what the meaning of this function is, if it
> > means "Unicode file names are only restricted by the file system
> > conventions", then on Unix, it may change at run-time, if a user or
> > the application sets an UTF-8 locale, switching from the original "C"
> > locale.
> Doesn't it mean: "posix functions and file() can accept Unicode file 
> names" ? 

Neil has given his own interpretation (return true if it is *better*
to pass Unicode strings than to pass byte strings).

Your property (accepts Unicode) is true on all Python installations
since 2.2: if you pass a Unicode string, it will try the file system
encoding; if that is NULL, it will try the system encoding. So on all
Python systems, 


currently succeeds everywhere (unless Unicode was completely disabled
in the port).
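
The fallback rule described above can be sketched as follows.
encode_filename is a hypothetical helper, and passing None for fs_encoding
stands in for Py_FileSystemDefaultEncoding being NULL.

```python
import sys

def encode_filename(name, fs_encoding=None):
    """Encode a Unicode file name using the fallback rule described above."""
    # Use the file system encoding when one is set, else the default encoding.
    encoding = fs_encoding or sys.getdefaultencoding()
    return name.encode(encoding)

print(encode_filename("caf\u00e9", "latin-1"))
```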

> That's what I thought, at least; whether they succeed or not 
> is another question and could well be handled by run-time errors
> (e.g. on Unix it is not at all clear whether NFS, Samba or some
> other more exotic file system can handle the encoding chosen by 
> Python or the program).

For NFS, it is clear - file names are null-terminated byte strings
(AFAIK). For Samba, I believe it depends on the installation,
specifically whether the encoding of Samba matches the one of the
user. For more exotic file systems, it is not all that clear.

> Perhaps we ought to drop that function altogether and let the
> various file IO functions raise run-time errors instead ?!

That was my suggestion as well. However, Neil points out that, on
Windows, passing Unicode is sometimes better: For some files, there is
no byte string file name to identify the file (if the file name is not
representable in MBCS). OTOH, on Unix, some files cannot be accessed
with a Unicode string, if the file name is invalid in the user's

It turns out that only OS X really got it right: For each file, there
is both a byte string name, and a Unicode name.


From  Wed Jan 16 19:43:49 2002
From: (Paul Prescod)
Date: Wed, 16 Jan 2002 11:43:49 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <> <>
Message-ID: <>

"M.-A. Lemburg" wrote:
> > Embrace the __future__!
> -1.
> Too dangerous. 

It isn't dangerous. That's precisely what __future__ is for! It is no
more dangerous than any other feature that uses __future__.

> ... If string interpolation makes it into the core,
> then please use a *new* construct. '\$' is currently interpreted
> as '\$' and this should not be changed (heck, just think what would
> happen to all the shell script snippets encoded in Python strings).

No, this should be changed. Completely ignoring string interpolation, I
am strongly in favour of changing the behaviour of the literal string
parser so that unknown \-combinations raise a SyntaxError. If you don't
want a backslash to be interpreted as an escape sequence start, you
should use a raw string.

The Python documentation and grammar already say:

escapeseq  ::=  "\" <any ASCII character> 

The documentation says:

"Unlike Standard C, all unrecognized escape sequences are left in the
string unchanged, i.e., the backslash is left in the string. (This
behavior is useful when debugging: if an escape sequence is mistyped,
the resulting output is more easily recognized as broken.)"

That's a weird thing to say. What could be more helpful for debugging
than a good old SyntaxError???
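
As a sketch of what a stricter check might flag, the function below scans
the source text of a literal for backslash sequences outside a known set.
Both the helper name unknown_escapes and the set itself are assumptions for
illustration (the set covers the escapes of ordinary, non-Unicode literals);
this is not how the compiler does it.

```python
import re

# Escapes assumed recognized in ordinary (non-Unicode) string literals:
# \newline \\ \' \" \a \b \f \n \r \t \v, octal digits, and \x.
_ESCAPE = re.compile(r"\\(.)", re.DOTALL)
_KNOWN = set("\n\\'\"abfnrtvx01234567")

def unknown_escapes(literal_body):
    """Return the unrecognized backslash sequences in a literal's source text."""
    return ["\\" + m.group(1) for m in _ESCAPE.finditer(literal_body)
            if m.group(1) not in _KNOWN]

print(unknown_escapes(r"cost: \$5\n and \(group\)"))
```

A doubled backslash is consumed as one pair, so a sequence such as \\$ is
not reported.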

> BTW, why don't you wrap all this interpolation stuff into
> a module and then call a function to have it apply all the
> magic you want. 

We've been through that in this discussion already. In fact, that's how
the discussion started.

 Paul Prescod

From  Wed Jan 16 20:26:23 2002
From: (Paul Svensson)
Date: Wed, 16 Jan 2002 15:26:23 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Wed, 16 Jan 2002, Paul Prescod wrote:

>The documentation says:
>"Unlike Standard C, all unrecognized escape sequences are left in the
>string unchanged, i.e., the backslash is left in the string. (This
>behavior is useful when debugging: if an escape sequence is mistyped,
>the resulting output is more easily recognized as broken.)"
>That's a weird thing to say. What could be more helpful for debugging
>than a good old SyntaxError???

The usefulness is relative; it's arguably easier to find the
problem and fix it if the \ remains in the string than if it's
simply removed (as C does, tho most compilers issue a warning).

It could also be argued that you get more nutritional value by eating only
the black raisins from the cake than by eating just the golden raisins...


From  Wed Jan 16 20:40:39 2002
From: (Paul Prescod)
Date: Wed, 16 Jan 2002 12:40:39 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <>
Message-ID: <>

Paul Svensson wrote:
> The usefulness is relative; it's arguably easier to find the
> problem and fix it if the \ remains in the string than if it's
> simply removed (as C does, tho most compilers issue a warning).

Yeah, I understood that. I just don't understand why it isn't like most
other things in Python. Python tends to be strict about things that are
likely mistakes, rather than helping you "debug them" after passing them
through silently.

 Paul Prescod

From  Wed Jan 16 21:31:02 2002
From: (Guido van Rossum)
Date: Wed, 16 Jan 2002 16:31:02 -0500
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: Your message of "Wed, 16 Jan 2002 12:40:39 PST."
References: <>
Message-ID: <>

> Yeah, I understood that. I just don't understand why it isn't like most
> other things in Python. Python tends to be strict about things that are
> likely mistakes, rather than helping you "debug them" after passing them
> through silently.
>  Paul Prescod

The "why" is that long ago Python didn't have raw strings but it did
have regular expressions.  I thought it would be painful to have to
double all backslashes used for the regex syntax.

It would be hard to change this policy now.

--Guido van Rossum (home page:

From  Wed Jan 16 21:59:17 2002
From: (Paul Prescod)
Date: Wed, 16 Jan 2002 13:59:17 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <>
 <> <>
Message-ID: <>

Guido van Rossum wrote:
> The "why" is that long ago Python didn't have raw strings but it did
> have regular expressions.  I thought it would be painful to have to
> double all backslashes used for the regex syntax.


> It would be hard to change this policy now.

How about an optional warning which, after a year or so, would be turned
on by default, and then a year or so after that would be an error? 

This same issue may affect some eventual merging of literal strings and
Unicode literals because \N, \u etc. are treated differently in strings
than in Unicode literals. And even if literal strings and Unicode
strings are never merged, \N could be useful in ordinary strings.

 Paul Prescod

From  Wed Jan 16 22:09:29 2002
From: (Jeff Epler)
Date: Wed, 16 Jan 2002 16:09:29 -0600
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict)
In-Reply-To: <>
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost> <> <> <> <>
Message-ID: <>

On Wed, Jan 16, 2002 at 08:17:54AM -0500, Barry A. Warsaw wrote:
> >>>>> "jepler" ==   <> writes:
>     jepler> Well, I guess we could add _ and $_ strings to Python,
>     jepler> right?
> Ug.  t'' strings have been discussed before w.r.t. i18n markup, but I
> don't like it.

... and you like $'' strings?

That suggestion was intended to bring a bad taste to *everybody*'s mouth,
as much as t'' alone does to yours.

(Hmm, and then I might need a raw unicode interpolated translated string
... is that spelled $_ur'' or r_$u'' ?)


From  Wed Jan 16 22:19:53 2002
From: (Guido van Rossum)
Date: Wed, 16 Jan 2002 17:19:53 -0500
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: Your message of "Wed, 16 Jan 2002 13:59:17 PST."
References: <> <> <>
Message-ID: <>

> How about an optional warning which, after a year or so, would be turned
> on by default, and then a year or so after that would be an error? 
> This same issue may affect some eventual merging of literal strings and
> Unicode literals because \N, \u etc. are treated differently in strings
> than in Unicode literals. And even if literal strings and Unicode
> strings are never merged, \N could be useful in ordinary strings.


I don't find this enough of a problem to invoke the heavy gun of a
language change.

--Guido van Rossum (home page:

From  Wed Jan 16 22:28:18 2002
From: (Barry A. Warsaw)
Date: Wed, 16 Jan 2002 17:28:18 -0500
Subject: PEP 215 (was Re: [Python-Dev] PEP 216 (string interpolation) alternative EvalDict)
References: <Pine.OSX.4.43.0201142108410.286-100000@localhost>
Message-ID: <>

On Wed, Jan 16, 2002 at 08:17:54AM -0500, Barry A. Warsaw wrote:
> Ug.  t'' strings have been discussed before w.r.t. i18n markup, but I
> don't like it.

>>>>> "JE" == Jeff Epler <> writes:

    JE> ... and you like $'' strings?

No! :)

    JE> That suggestion was intended to bring a bad taste to
    JE> *everybody*'s mouth, as much as t'' alone does to yours.

Ah, no wonder I've had to drink 3 sodas today.  I wondered what that
foul flavor was, especially since I made sure to brush my teeth this

    JE> (Hmm, and then I might need a raw unicode interpolated
    JE> translated string ... is that spelled $_ur'' or r_$u'' ?)

Exactly why I'm against adding more string prefixes.  Remember that
the _ thingie we currently recommend for gettext /isn't/ prefix
proliferation.  E.g.:

    _(u'translate this')
    _(ru'and this')

It's just a function call with a convenient name (and even that's just
a convention, of course).


From  Wed Jan 16 22:29:13 2002
From: (Paul Svensson)
Date: Wed, 16 Jan 2002 17:29:13 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Wed, 16 Jan 2002, Guido van Rossum wrote:

>> Yeah, I understood that. I just don't understand why it isn't like most
>> other things in Python. Python tends to be strict about things that are
>> likely mistakes, rather than helping you "debug them" after passing them
>> through silently.
>>  Paul Prescod
>The "why" is that long ago Python didn't have raw strings but it did
>have regular expressions.  I thought it would be painful to have to
>double all backslashes used for the regex syntax.
>It would be hard to change this policy now.

Yeah, it would be like, say, changing the semantics of integer division.
Sometimes it's better to do what's right than what's easy.


From  Wed Jan 16 22:43:48 2002
From: (Paul Svensson)
Date: Wed, 16 Jan 2002 17:43:48 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Wed, 16 Jan 2002, Paul Prescod wrote:

>Guido van Rossum wrote:
>> The "why" is that long ago Python didn't have raw strings but it did
>> have regular expressions.  I thought it would be painful to have to
>> double all backslashes used for the regex syntax.
>> It would be hard to change this policy now.
>How about an optional warning which, after a year or so, would be turned
>on by default, and then a year or so after that would be an error? 

Such a warning might prove to be a useful debugging tool,
even if the language never changed.
Maybe it would be a useful addition to PyChecker or some similar tool ?
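
A rough sketch of what such a check could look like. This is not PyChecker code: the function name and the set of recognized escapes are assumptions made for illustration, and the input is the body of a non-raw string literal as it appears in source.

```python
import re

# Characters that may legitimately follow a backslash in an ordinary
# string literal (assumed set: line continuation, quotes, the C-style
# letters, octal digits, \x, and the Unicode escapes \N, \u, \U).
KNOWN_ESCAPE_CHARS = set("\n\\'\"abfnrtvxNuU01234567")

def find_unknown_escapes(literal_body):
    """Return each backslash pair whose second character is not recognized."""
    return [
        m.group(0)
        for m in re.finditer(r"\\(.)", literal_body, re.DOTALL)
        if m.group(1) not in KNOWN_ESCAPE_CHARS
    ]

# Source text of two literals, written here as raw strings.
regex_like = find_unknown_escapes(r"\d+\.\d*\n")
windows_path = find_unknown_escapes(r"C:\\new")
```

Because the scan consumes a backslash together with the character after it, a doubled backslash is treated as one recognized escape and never misread as the start of another.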



From  Wed Jan 16 22:56:23 2002
From: (Jack Jansen)
Date: Wed, 16 Jan 2002 23:56:23 +0100
Subject: [Python-Dev] Extending types in C - help needed
Message-ID: <>

In the discussion on my request for an ("O@", typeobject, 
void **) format for PyArg_Parse and Py_BuildValue MAL suggested 
that I could get the same functionality by creating a type 
WrapperTypeObject, which would be a subtype of TypeObject with 
extra fields pointing to the _New() and _Convert() routines to 
convert Python objects from/to C pointers. This would be good 
enough for me, because then types wanting to participate in the 
wrapper protocol would subtype WrapperTypeObject instead of 
TypeObject, and two global routines could return the _New and 
_Convert routines given the type object, and we wouldn't need 
yet another PyArg_Parse format specifier.

However, after digging high and low I haven't been able to 
deduce how I would then use this WrapperType in C as the type 
for my extension module objects. Are there any examples? If not, 
could someone who understands the new inheritance scheme give me 
some clues as to how to do this?
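
For readers following along, the shape of the idea can be sketched at the Python level. This is only an analogy and does not answer the C-level question: a metaclass stands in for WrapperTypeObject, and every name below (WrapperType, Handle, new_routine, convert_routine) is hypothetical.

```python
class WrapperType(type):
    """Stand-in for the proposed WrapperTypeObject: a subtype of type
    that requires its instances (classes) to carry two extra hooks."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        for hook in ("_new", "_convert"):
            if not callable(getattr(cls, hook, None)):
                raise TypeError(f"{name} must define {hook}()")


def new_routine(tp):
    """Global lookup: the routine that wraps a raw pointer value."""
    if not isinstance(tp, WrapperType):
        raise TypeError("type does not take part in the wrapper protocol")
    return tp._new


def convert_routine(tp):
    """Global lookup: the routine that unwraps an object to a pointer value."""
    if not isinstance(tp, WrapperType):
        raise TypeError("type does not take part in the wrapper protocol")
    return tp._convert


class Handle(metaclass=WrapperType):
    """Hypothetical extension type wrapping an integer 'pointer'."""

    def __init__(self, pointer):
        self.pointer = pointer

    @classmethod
    def _new(cls, pointer):
        return cls(pointer)

    @staticmethod
    def _convert(obj):
        return obj.pointer


handle = new_routine(Handle)(0x1234)
pointer = convert_routine(Handle)(handle)
```

Types that do not use the metaclass are rejected by the lookup functions, which mirrors the point that no extra PyArg_Parse format specifier would be needed.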
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Thu Jan 17 00:53:37 2002
From: (Mark Hammond)
Date: Thu, 17 Jan 2002 11:53:37 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,  was PEP-time ? ...
In-Reply-To: <>
Message-ID: <>

> Also, I believe most of PythonWin also assumes HAVE_USABLE_WCHAR_T
> (didn't check, though).

FYI, all the win32 extensions use their own Unicode API.  These extensions
had Unicode before Python did!  These wrapper functions are abstract enough
that they should be able to withstand any changes to Python's Unicode
implementation quite simply - probably at the cost of extra copies and
transformations in those wrappers.


From  Thu Jan 17 06:28:03 2002
From: (Guido van Rossum)
Date: Thu, 17 Jan 2002 01:28:03 -0500
Subject: [Python-Dev] deprecate input()?
In-Reply-To: Your message of "Wed, 16 Jan 2002 08:22:41 CST."
References: <>
Message-ID: <>

> I just responded to a question on a user had about feeding empty
> strings to input().  While he didn't say why he called input(), I suspect he
> thought the semantics were more like raw_input().
> In these days of widespread Internet nastiness, shouldn't input() be
> deprecated?

Why?  I imagine this is only used for interactive input, and then it's
the computer's owner who is typing.

--Guido van Rossum (home page:

From  Thu Jan 17 10:11:08 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 11:11:08 +0100
Subject: [Python-Dev] Utopian String Interpolation
References: <> <> <>
Message-ID: <>

Paul Prescod wrote:
> "M.-A. Lemburg" wrote:
> >
> >...
> > > Embrace the __future__!
> >
> > -1.
> >
> > Too dangerous.
> It isn't dangerous. That's precisely what __future__ is for! It is no
> more dangerous than any other feature that uses __future__.

It is. Currently Python strings are just that: immutable strings.
Now, you suddenly add dynamics to them. This will cause nightmares
in terms of security. Note that Python hasn't really had a need
for Perl's "taint" because of this. I wouldn't want to see that
change in any way.

If you really need this, either use a string prefix or call a
specific function which implements string interpolation. At
least then things are obvious and explicit.
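
A sketch of the "explicit function" route, using string.Template. Note the assumption: that class only entered the standard library later (Python 2.4), so it was not available when this was written; the template text is invented.

```python
from string import Template

# Interpolation only happens when substitute() is called; the literal
# itself stays an ordinary, inert string. "$$" yields a literal dollar.
template = Template("echo $$HOME belongs to $user")
command = template.substitute(user="root")

# safe_substitute() leaves placeholders it has no value for untouched
# instead of raising KeyError.
partial = Template("$known and $unknown").safe_substitute(known="yes")
```

Nothing changes for existing literals containing '$', which is the property the paragraph above asks for.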
> > ... If string interpolation makes it into the core,
> > then please use a *new* construct. '\$' is currently interpreted
> > as '\$' and this should not be changed (heck, just think what would
> > happen to all the shell script snippets encoded in Python strings).
> No, this should be changed. 

Huh ? I bet RedHat and thousands of sysadmins who have switched
from shell or Perl to Python would have strong objections.

> Completely ignoring string interpolation, I
> am strongly in favour of changing the behaviour of the literal string
> parser so that unknown \-combinations raise a SyntaxError. If you don't
> want a backslash to be interpreted as an escape sequence start, you
> should use a raw string.
> The Python documentation and grammar already says:
> escapeseq  ::=  "\" <any ASCII character>
> The documentation says:
> "Unlike Standard C, all unrecognized escape sequences are left in the
> string unchanged, i.e., the backslash is left in the string. (This
> behavior is useful when debugging: if an escape sequence is mistyped,
> the resulting output is more easily recognized as broken.)"
> That's a weird thing to say. What could be more helpful for debugging
> than a good old SyntaxError???

If there's nothing wrong with the escape why raise a 
SyntaxError ?
> > BTW, why don't you wrap all this interpolation stuff into
> > a module and then call a function to have it apply all the
> > magic you want.
> We've been through that in this discussion already. In fact, that's how
> the discussion started.

I've jumped in at a rather late point. Perhaps you ought to rewind
the discussion then and start discussing in a different 
direction :-) E.g. about the syntax to be used in the 
interpolation and where, when and in which context to 
evaluate the strings.

There are so many options that I can't really see any benefit
from choosing only one and hard-coding it into the language.
Other users will have other requirements which are likely
not to combine well with the one implementation you have in

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 10:19:33 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 11:19:33 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > [unicodefilenames()]
> > Perhaps we ought to drop that function altogether and let the
> > various file IO functions raise run-time errors instead ?!
> That was my suggestion as well. However, Neil points out that, on
> Windows, passing Unicode is sometimes better: For some files, there is
> no byte string file name to identify the file (if the file name is not
> representable in MBCS). OTOH, on Unix, some files cannot be accessed
> with a Unicode string, if the file name is invalid in the user's
> encoding.

Sounds like the run-time error solution would at least "solve"
the issue in terms of making it depend on the used file name
and underlying OS or file system.

I'd say: let the different file name based APIs try hard enough
and then have them bail out if they can't handle the particular

> It turns out that only OS X really got it right: For each file, there
> is both a byte string name, and a Unicode name.

I suppose this is due to the fact that Mac file systems store
extended attributes (much like what OS/2 does too) along with the
file -- that's a really nice way of being able to extend file
system semantics on a per-file basis; much better than the Windows
Registry or the MIME guess-by-extension mechanisms.

Oh well.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 10:29:45 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 11:29:45 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <>
Message-ID: <>

Jack Jansen wrote:
> In the discussion on my request for an ("O@", typeobject,
> void **) format for PyArg_Parse and Py_BuildValue MAL suggested

Thomas Heller suggested this.

I am more in favour of
exposing the pickle reduce API through "O@", that is
have PyArg_ParseTuple() call the .__reduce__() method
of the object. This will then return (factory, state_tuple)
and these could then be exposed to the C function via two

Note that there's no need for any type object magic. If this
becomes a common case, it may be worthwhile to add a tp_reduce
slot to type objects though.
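
To make the (factory, state_tuple) shape concrete, here is a small Python-level sketch. The Point class is hypothetical, and the C-side "O@" handling proposed above is not shown.

```python
class Point:
    """Hypothetical object exposing its state through __reduce__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __reduce__(self):
        # Same contract pickle relies on: a callable plus the argument
        # tuple needed to rebuild an equivalent object.
        return (Point, (self.x, self.y))


original = Point(3, 4)
factory, state = original.__reduce__()

# A consumer needs nothing but the two returned values.
rebuilt = factory(*state)
```

A C function receiving the two values could unpack the state tuple with the usual argument-parsing machinery, without knowing anything about the type object itself.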

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 11:04:53 2002
From: (Neil Hodgson)
Date: Thu, 17 Jan 2002 22:04:53 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,  was PEP-time ? ...
References: <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <>
Message-ID: <08e201c19f46$cad5f070$0acc8490@neil>

M.-A. Lemburg, regarding unicodefilenames():

> Sounds like the run-time error solution would at least "solve"
> the issue in terms of making it depend on the used file name
> and underlying OS or file system.

   It is much better to choose a technique that will always work rather than
try to recover from a technique that may fail.

   unicodefilenames() can be dropped in favour of explicit OS and version
checks but this is replacing a simple robust check with a more fragile one.
unicodefilenames() will allow other environments to declare that client code
will be more robust by choosing to use Unicode strings as file name
arguments. This could include UTF-8 based systems such as OS X and BeOS, as
well as Windows variants like CE.


From  Thu Jan 17 11:36:21 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 12:36:21 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil>
Message-ID: <>

Neil Hodgson wrote:
> M.-A. Lemburg, regarding unicodefilenames():
> > Sounds like the run-time error solution would at least "solve"
> > the issue in terms of making it depend on the used file name
> > and underlying OS or file system.
>    It is much better to choose a technique that will always work rather than
> try to recover from a technique that may fail.

Is it really ? The problem is that under some OSes it is possible
to work with multiple very different file systems from within a
single Python program. In those cases, the unicodefilename()
API wouldn't really help all that much.
>    unicodefilenames() can be dropped in favour of explicit OS and version
> checks but this is replacing a simple robust check with a more fragile one.

What kind of checks do you have in mind then ? If possible, it should
be possible to pass unicodefilenames() a path to check for Unicode-
capability, since on Unix (and probably Mac OS X as well), the path
decides which file system gets the ioctl calls.

> unicodefilenames() will allow other environments to declare that client code
> will be more robust by choosing to use Unicode strings as file name
> arguments. This could include UTF-8 based systems such as OS X and BeOS, as
> well as Windows variants like CE.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 11:42:21 2002
From: (Martin v. Loewis)
Date: Thu, 17 Jan 2002 12:42:21 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
In-Reply-To: <> (
References: <> <01ab01c194a9$237b6dc0$0acc8490@neil> <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <>
Message-ID: <>

> Sounds like the run-time error solution would at least "solve"
> the issue in terms of making it depend on the used file name
> and underlying OS or file system.

Such a solution is impossible to implement in some cases. E.g. on
Windows, if you use the ANSI (*A) APIs to list the directory contents,
Windows will *silently* (AFAIK) give you incorrect file names, i.e. it
will replace unrepresentable characters with the replacement char

OTOH, on Unix, there is a better approach for listdir and
unconvertible names: just return the byte strings to the user.
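
The difference between the two behaviours can be sketched with plain codec calls. The file name is invented, and the "surrogateescape" error handler is an assumption in the sense that it was only added to Python much later (PEP 383); it is used here to show what keeping the original bytes recoverable means.

```python
# A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 on its own.
raw_name = b"caf\xe9.txt"

# Silent replacement: the unconvertible byte becomes U+FFFD and the
# original name can no longer be recovered from the result.
lossy = raw_name.decode("utf-8", "replace")

# Byte-preserving decode: the offending byte is smuggled through as a
# lone surrogate and can be turned back into the exact original bytes.
preserved = raw_name.decode("utf-8", "surrogateescape")
restored = preserved.encode("utf-8", "surrogateescape")
```

With replacement, a later attempt to open the listed name fails because the name handed back no longer identifies the file.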

> I'd say: let the different file name based APIs try hard enough
> and then have them bail out if they can't handle the particular
> case.

That is a good idea. However, in case of the WinNT replacement
strategy, the application may still want to know.

Passing *in* Unicode objects is no issue at all: If they cannot be
converted to a reasonable file name, you clearly get an exception.

> > It turns out that only OS X really got it right: For each file, there
> > is both a byte string name, and a Unicode name.
> I suppose this is due to the fact that Mac file systems store
> extended attributes (much like what OS/2 does too) along with the
> file -- that's a really nice way of being able to extend file
> system semantics on a per-file basis; much better than the Windows
> Registry or the MIME guess-by-extension mechanisms.

I'd assume it is different: They just *define* that all local file
systems they have control over use UTF-8 on disk, at least for BSD ufs;
for HFS, it might be that they 'just know' what encoding is used on an
HFS partition. I doubt they use extended attributes for this, as they
reportedly return UTF-8 even for file systems they've never seen
before; this may be either due to static knowledge (e.g. that VFAT is
UCS-2LE), or through guessing.

It may be that there are also limitations and restrictions, but
at least they remove the burden from the application.


From  Thu Jan 17 12:06:54 2002
From: (Martin v. Loewis)
Date: Thu, 17 Jan 2002 13:06:54 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
In-Reply-To: <> (
References: <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil> <>
Message-ID: <>

> Is it really ? The problem is that under some OSes it is possible
> to work with multiple very different file systems from within a
> single Python program. In those cases, the unicodefilename()
> API wouldn't really help all that much.

If you are thinking of Unix: It seems unicodefilename has to return 0
on Unix, meaning that you need to use byte-oriented file names if you
want to access all files (not that you will be able to display all
file names to the user, though ... there is nothing we can do to
achieve *that*).

> >    unicodefilenames() can be dropped in favour of explicit OS and version
> > checks but this is replacing a simple robust check with a more fragile one.
> What kind of checks do you have in mind then ? If possible, it should
> be possible to pass unicodefilenames() a path to check for Unicode-
> capability, since on Unix (and probably Mac OS X as well), the path
> decides which file system gets the ioctl calls.

I think you are missing the point that unicodefilenames, as defined,
does not take any parameters. It says either yay or nay. So it could
be replaced in application code with

if sys.platform == "win32":
  use_unicode_for_filenames = windowsversion in ['nt','w2k','xp']
elif sys.platform.startswith("darwin"):
  use_unicode_for_filenames = 1
else:
  use_unicode_for_filenames = 0

I would not use such code in my applications, nor would I ever use
unicodefilenames. Instead, I would just use Unicode file names all the
time, and risk that some users have problems with some files. Those
users I would tell to fix their systems (i.e. use NT instead of
Windows, or use a UTF-8 locale on Unix). Most users will never notice
any problem (except for Neil, who likes to put funny file names on his
disk :-), so this is a typical 80-20 problem here (or maybe rather
99-1).


From  Thu Jan 17 12:29:54 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 13:29:54 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > Is it really ? The problem is that under some OSes it is possible
> > to work with multiple very different file systems from within a
> > single Python program. In those cases, the unicodefilename()
> > API wouldn't really help all that much.
> If you are thinking of Unix: It seems unicodefilename has to return 0
> on Unix, meaning that you need to use byte-oriented file names if you
> want to access all files (not that you will be able to display all
> file names to the user, though ... there is nothing we can do to
> achieve *that*).

Right. I am starting to believe that unicodefilenames() doesn't really 
provide enough information to make it useful for cross-platform 
> > >    unicodefilenames() can be dropped in favour of explicit OS and version
> > > checks but this is replacing a simple robust check with a more fragile one.
> >
> > What kind of checks do you have in mind then ? If possible, it should
> > be possible to pass unicodefilenames() a path to check for Unicode-
> > capability, since on Unix (and probably Mac OS X as well), the path
> > decides which file system gets the ioctl calls.
> I think you are missing the point that unicodefilenames, as defined,
> does not take any parameters. It says either yay or nay. So it could
> be replaced in application code with
> if sys.platform == "win32":
>   use_unicode_for_filenames = windowsversion in ['nt','w2k','xp']
> elif sys.platform.startswith("darwin"):
>   use_unicode_for_filenames = 1
> else:
>   use_unicode_for_filenames = 0

Sounds like this would be a good candidate for which I'll
check into CVS soon. With its many platform querying APIs it should
easily be possible to add a function which returns the above
information based on the platform Python is running on.
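
A sketch of such a function, under stated assumptions: the function name and its nt_based parameter (standing in for the Windows version test in the quoted code) are hypothetical. The os.path.supports_unicode_filenames attribute read at the end is a real flag, but one that Python only gained after this discussion.

```python
import os.path
import sys

def use_unicode_for_filenames(platform=None, nt_based=True):
    """Guess whether Unicode file names are the better choice.

    platform defaults to sys.platform; nt_based is a hypothetical flag
    replacing the 'windowsversion' test from the quoted snippet.
    """
    if platform is None:
        platform = sys.platform
    if platform == "win32":
        return bool(nt_based)
    if platform.startswith("darwin"):
        return True
    return False

# What the running interpreter itself reports (a single yes/no answer).
interpreter_says = os.path.supports_unicode_filenames
```

Like unicodefilenames(), this gives one answer per platform and says nothing about individual file systems mounted below it.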
> I would not use such code in my applications, nor would I ever use
> unicodefilenames. Instead, I would just use Unicode file names all the
> time, and risk that some users have problems with some files. Those
> users I would tell to fix their systems (i.e. use NT instead of
> Windows, or use a UTF-8 locale on Unix). Most users will never notice
> any problem (except for Neil, who likes to put funny file names on his
> disk :-), so this is a typical 80-20 problem here (or maybe rather
> 99-1).

I doubt that you'll have any luck in trying to convince a user
to switch OSes just because Python applications don't cope
with native file names. The UTF-8 locale on Unix is also hard to
push: e.g. existing latin-1 file names will probably stop working the 
minute you switch to that locale. (I always leave the setting to "C" 
and simply don't use locale based file names -- that way I don't 
run into problems; non-[a-zA-Z0-9\-\._]+ file names are a no-go 
for cross-platform-code if you ask me...)
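
Both points can be sketched briefly. The helper names and sample file names are invented; the character class is the one given above.

```python
import re

# The conservative character set from the paragraph above.
PORTABLE_NAME = re.compile(r"[a-zA-Z0-9\-\._]+")

def is_portable(name):
    """True if the whole name uses only the conservative character set."""
    return PORTABLE_NAME.fullmatch(name) is not None

def decodes_as_utf8(raw_name):
    """True if the on-disk bytes form valid UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# A name written under a Latin-1 locale versus the same name in UTF-8.
latin1_bytes = "caf\xe9.txt".encode("latin-1")
utf8_bytes = "caf\xe9.txt".encode("utf-8")
```

The Latin-1 bytes fail the UTF-8 check, which is the "stops working when you switch locale" problem, while a name restricted to the conservative set encodes identically in both.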

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 12:36:27 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 13:36:27 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,
 was PEP-time ? ...
References: <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > Sounds like the run-time error solution would at least "solve"
> > the issue in terms of making it depend on the used file name
> > and underlying OS or file system.
> Such a solution is impossible to implement in some cases. E.g. on
> Windows, if you use the ANSI (*A) APIs to list the directory contents,
> Windows will *silently* (AFAIK) give you incorrect file names, i.e. it
> will replace unrepresentable characters with the replacement char

Samba does the same for mounted Windows shares, BTW.
> OTOH, on Unix, there is a better approach for listdir and
> unconvertible names: just return the byte strings to the user.

> > I'd say: let the different file name based APIs try hard enough
> > and then have them bail out if they can't handle the particular
> > case.
> That is a good idea. However, in case of the WinNT replacement
> strategy, the application may still want to know.
> Passing *in* Unicode objects is no issue at all: If they cannot be
> converted to a reasonable file name, you clearly get an exception.

True and that's good :-)
Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 13:43:02 2002
From: (Paul Svensson)
Date: Thu, 17 Jan 2002 08:43:02 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Thu, 17 Jan 2002, M.-A. Lemburg wrote:

>Paul Prescod wrote:
>> The documentation says:
>> "Unlike Standard C, all unrecognized escape sequences are left in the
>> string unchanged, i.e., the backslash is left in the string. (This
>> behavior is useful when debugging: if an escape sequence is mistyped,
>> the resulting output is more easily recognized as broken.)"
>> That's a weird thing to say. What could be more helpful for debugging
>> than a good old SyntaxError???
>If there's nothing wrong with the escape why raise a 
>SyntaxError ?

I would certainly claim that an unrecognized escape sequence _is_ wrong.


From  Thu Jan 17 14:02:11 2002
From: (M.-A. Lemburg)
Date: Thu, 17 Jan 2002 15:02:11 +0100
Subject: [Python-Dev] Utopian String Interpolation
References: <>
Message-ID: <>

Paul Svensson wrote:
> On Thu, 17 Jan 2002, M.-A. Lemburg wrote:
> >Paul Prescod wrote:
> >>
> >> The documentation says:
> >>
> >> "Unlike Standard C, all unrecognized escape sequences are left in the
> >> string unchanged, i.e., the backslash is left in the string. (This
> >> behavior is useful when debugging: if an escape sequence is mistyped,
> >> the resulting output is more easily recognized as broken.)"
> >>
> >> That's a weird thing to say. What could be more helpful for debugging
> >> than a good old SyntaxError???
> >
> >If there's nothing wrong with the escape why raise a
> >SyntaxError ?
> I would certainly claim that an unrecognized escape sequence _is_ wrong.

Depending on how you see it, an "unrecognized escape sequence" is 
not an escape sequence to begin with :-)
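
The behaviour under discussion can be observed directly. This is a sketch, and it rests on an assumption about the interpreter: later Python versions emit a warning for unrecognized escapes when compiling a literal (silenced here), and a future version may reject them outright. The helper name is made up.

```python
import ast
import warnings

def evaluate_literal(source):
    """Evaluate the source text of a string literal, ignoring warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

# Unrecognized: the backslash stays in the resulting string.
kept = evaluate_literal(r"'\$HOME'")

# Recognized: the two source characters become a single tab.
processed = evaluate_literal(r"'\t'")
```

Seen this way, '\$' in a literal is simply two ordinary characters, which is the reading argued for above.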

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 17 14:15:00 2002
From: (Guido van Rossum)
Date: Thu, 17 Jan 2002 09:15:00 -0500
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: Your message of "Thu, 17 Jan 2002 08:43:02 EST."
References: <>
Message-ID: <>

> I would certainly claim that an unrecognized escape sequence _is_ wrong.

Then you are wrong.  Go away and design your own language.

--Guido van Rossum (home page:

From  Thu Jan 17 15:39:30 2002
From: (Barry A. Warsaw)
Date: Thu, 17 Jan 2002 10:39:30 -0500
Subject: [Python-Dev] Utopian String Interpolation
References: <>
Message-ID: <>

>>>>> "MAL" == M  <> writes:

    MAL> It is. Currently Python strings are just that: immutable
    MAL> strings.  Now, you suddenly add dynamics to them. This will
    MAL> cause nightmares in terms of security. Note that Python
    MAL> hasn't really had a need for Perl's "taint" because of
    MAL> this. I wouldn't want to see that change in any way.


    MAL> I've jumped in at a rather late point. Perhaps you ought to
    MAL> rewind the discussion then and start discussing in a
    MAL> different direction :-) E.g. about the syntax to be used in
    MAL> the interpolation and where, when and in which context to
    MAL> evaluate the strings.

Proponents of this feature can start by updating the PEP.


From  Thu Jan 17 16:32:11 2002
From: (Paul Svensson)
Date: Thu, 17 Jan 2002 11:32:11 -0500 (EST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

On Thu, 17 Jan 2002, Guido van Rossum wrote:

>> I would certainly claim that an unrecognized escape sequence _is_ wrong.
>Then you are wrong.  (---)

Then maybe the Python Reference Manual (2.4.1) needs to be updated,
since the paragraph concerning unrecognized escape sequences
doesn't mention them other than being "mistyped" or "broken".
(Do "mistyped" and "broken" qualify as "wrong" ?)


From  Thu Jan 17 17:40:19 2002
From: (Skip Montanaro)
Date: Thu, 17 Jan 2002 11:40:19 -0600
Subject: [Python-Dev] deprecate input()?
In-Reply-To: <>
References: <>
Message-ID: <>

    >> I just responded to a question on a user had about feeding
    >> empty strings to input().  While he didn't say why he called input(),
    >> I suspect he thought the semantics were more like raw_input().
    >> In these days of widespread Internet nastiness, shouldn't input() be
    >> deprecated?

    Guido> Why?  I imagine this is only used for interactive input, and then
    Guido> it's the computer's owner who is typing.

Yes, but what if the program containing calls to input() gets shipped to
someone else's computer?  It just seems to me that a) input is almost never
what you want to call and that b) it would seem to a naive programmer to be
the correct way to ask the user for a line of input.


From  Thu Jan 17 17:49:26 2002
From: (Guido van Rossum)
Date: Thu, 17 Jan 2002 12:49:26 -0500
Subject: [Python-Dev] deprecate input()?
In-Reply-To: Your message of "Thu, 17 Jan 2002 11:40:19 CST."
References: <> <>
Message-ID: <>

>     Guido> Why?  I imagine this is only used for interactive input,
>     Guido> and then it's the computer's owner who is typing.
> Yes, but what if the program containing calls to input() gets shipped
> to someone else's computer?  It just seems to me that a) input is
> almost never what you want to call and that b) it would seem to a
> naive programmer to be the correct way to ask the user for a line of
> input.

I don't see the security problem.  Can you explain a scenario where
this causes a security risk?  If the user of the program types
something evil in the input box they screw themselves!

--Guido van Rossum (home page:

From  Thu Jan 17 17:56:46 2002
From: (Aahz Maruch)
Date: Thu, 17 Jan 2002 09:56:46 -0800 (PST)
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <> from "Guido van Rossum" at Jan 17, 2002 09:15:00 AM
Message-ID: <>

Guido van Rossum wrote:
>Paul Svensson:
>> I would certainly claim that an unrecognized escape sequence _is_ wrong.
> Then you are wrong.  Go away and design your own language.

Hey!  That's a bit harsh.  I'm not going to campaign to make
unrecognized escape sequences a syntax error, but not raising a syntax
error does seem to be against Python's principles.
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Thu Jan 17 18:01:21 2002
From: (Russ Cox)
Date: Thu, 17 Jan 2002 13:01:21 -0500
Subject: [Python-Dev] deprecate input()?
Message-ID: <>

> Yes, but what if the program containing calls to input() gets shipped to
> someone else's computer?  It just seems to me that a) input is almost never
> what you want to call and that b) it would seem to a naive programmer to be
> the correct way to ask the user for a line of input.

Since most arbitrary lines of input generate syntax errors,
wouldn't the naive programmer quickly figure out that input
isn't the "read a line" function?  (Unless you're trying to
input numbers, I suppose.)
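
A sketch of the behaviour being discussed. The helper names are hypothetical: old_input emulates the evaluating input() of the time by calling eval() on the typed line, and read_literal shows a stricter alternative built on ast.literal_eval, which is an assumption insofar as that function was added to Python later.

```python
import ast

def old_input(line):
    """Emulate the evaluating input(): the typed line is run through eval()."""
    return eval(line)

def fails_to_parse(line):
    """True if the emulated input() would reject the line with SyntaxError."""
    try:
        old_input(line)
    except SyntaxError:
        return True
    return False

def read_literal(line):
    """Accept only Python literals (numbers, strings, tuples, ...)."""
    return ast.literal_eval(line.strip())

number = old_input("6 * 7")
prose_rejected = fails_to_parse("hello world")
empty_rejected = fails_to_parse("")
```

Typed numbers work, free text and empty lines raise SyntaxError, and anything that is a valid expression is executed, which is the part the deprecation question is about.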


From  Thu Jan 17 18:25:46 2002
From: (Andres Tuells)
Date: Thu, 17 Jan 2002 19:25:46 +0100
Subject: [Python-Dev] Re: Stackless Python is DEAD! Long live Stackless Python
References: <>
Message-ID: <006f01c19f84$620dde20$9d76393e@integralabzenon>

That's great!!!

----- Original Message -----
From: "Christian Tismer" <>
To: <>; <>;
Sent: Thursday, January 17, 2002 6:09 PM
Subject: Ann: Stackless Python is DEAD! Long live Stackless Python

> #######################################
>              Announcement:
> #######################################
> The end of an era has come:
> ---------------------------
> Stackless Python, in the form provided up to Python 2.0, is DEAD.
> I am abandoning the whole implementation.
> A new era has begun:
> --------------------
> A completely new implementation is in development for
> Python 2.2 and up which gives you the following features:
> - There are no restrictions any longer for uthread/coroutine
>    switching. Switching is possible at *any* time, in *any*
>    context.
> - There are no significant changes to the Python core any
>    longer. The new patches are of minimum size, and they
>    will probably survive unchanged until Python 3.0 .
> - Maintenance work for Stackless Python is reduced to the
>    bare minimum. There is no longer a need to incorporate
>    Stackless into the standard, since there is no work to
>    be shared.
> - Stackless breaks its major axiom now. It is no longer
>    platform independent, since it *does* modify the C stack.
>    I will support all Intel platforms by myself. For other
>    platforms, I'm asking for volunteers.
> * The basic elements of Stackless are now switchable chains
>    of frames. We have to define an interface that turns these
>    chains into microthreads and coroutines.
> Everybody is invited to come to the Stackless mailing
> list and discuss the layout of this new design.
> Especially we need to decide about (*).
> see you there - chris
> --
> Christian Tismer             :^)   <>
> Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
> Kaunstr. 26                  :    *Starship*
> 14163 Berlin                 :     PGP key ->
> PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
>       where do you want to jump today?
> --

From  Thu Jan 17 18:25:46 2002
From: (Guido van Rossum)
Date: Thu, 17 Jan 2002 13:25:46 -0500
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: Your message of "Thu, 17 Jan 2002 09:56:46 PST."
References: <>
Message-ID: <>

> >Paul Svensson:
> >> 
> >> I would certainly claim that an unrecognized escape sequence _is_ wrong.
> > 
> Guido van Rossum wrote:
> > Then you are wrong.  Go away and design your own language.
> Hey!  That's a bit harsh.  I'm not going to campaign to make
> unrecognized escape sequences a syntax error, but not raising a syntax
> error does seem to be against Python's principles.

Whatever.  Who is Paul Svensson and what is he doing in python-dev?

--Guido van Rossum (home page:

From  Thu Jan 17 18:36:59 2002
From: (Skip Montanaro)
Date: Thu, 17 Jan 2002 12:36:59 -0600
Subject: [Python-Dev] deprecate input()?
In-Reply-To: <>
References: <>
Message-ID: <>

    Guido> Why?  I imagine this is only used for interactive input, and then
    Guido> it's the computer's owner who is typing.

    >> Yes, but what if the program containing calls to input() gets shipped
    >> to someone else's computer?  It just seems to me that a) input is
    >> almost never what you want to call and that b) it would seem to a
    >> naive programmer to be the correct way to ask the user for a line of
    >> input.

    Guido> I don't see the security problem.  Can you explain a scenario
    Guido> where this causes a security risk?  If the user of the program
    Guido> types something evil in the input box they screw themselves!

Fine.  Let's drop it.


From  Thu Jan 17 19:31:36 2002
From: (Neil Hodgson)
Date: Fri, 18 Jan 2002 06:31:36 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,   was PEP-time ? ...
References: <> <020601c194b3$c85a4320$0acc8490@neil> <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil> <>
Message-ID: <00c401c19f8d$941e3fa0$0acc8490@neil>

M.-A. Lemburg:

> Is it really ? The problem is that under some OSes it is possible
> to work with multiple very different file systems from within a
> single Python program. In those cases, the unicodefilename()
> API wouldn't really help all that much.

   On NT the core file system calls are Unicode based with the narrow string
calls being shims on top of this. When mounting non-native file systems, NT
may perform name mapping, but that name mapping is 'complete and consistent'
in that it is not possible to do anything with the narrow APIs that cannot
be achieved with the Unicode APIs.

> >    unicodefilenames() can be dropped in favour of explicit OS and
> > checks but this is replacing a simple robust check with a more fragile
> What kind of checks do you have in mind then ? If possible, it should
> be possible to pass unicodefilenames() a path to check for Unicode-
> capability, since on Unix (and probably Mac OS X as well), the path
> decides which file system gets the ioctl calls.

   Any platform experts know how this works on MacOS X or BeOS? Do
non-native file systems get mapped to Unicode names so that UTF-8 will
always work?
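
For readers who want to probe a platform themselves, here is a minimal sketch
using present-day Python 3 calls (none of them existed in this form in
January 2002). It reports the interpreter's view of Unicode file name support
and shows that a directory can be listed through either a str path or a bytes
path. The function names describe_filename_support and list_both_ways are
invented for the example.

```python
import os
import sys
import tempfile


def describe_filename_support():
    """Collect what the running interpreter reports about file names."""
    return {
        # True where arbitrary Unicode strings can be used as file names.
        "supports_unicode_filenames": os.path.supports_unicode_filenames,
        # Encoding used to convert between str and bytes file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


def list_both_ways(directory):
    """List a directory via a str path and via a bytes path."""
    as_text = os.listdir(directory)                # str in, str names out
    as_bytes = os.listdir(os.fsencode(directory))  # bytes in, bytes names out
    return as_text, as_bytes


if __name__ == "__main__":
    print(describe_filename_support())
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "example.txt"), "w").close()
        print(list_both_ways(tmp))
```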


From  Thu Jan 17 19:23:00 2002
From: (Thomas Heller)
Date: Thu, 17 Jan 2002 20:23:00 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <>
Message-ID: <08c501c19f8c$72631b20$e000a8c0@thomasnotebook>

From: "Jack Jansen" <>
> In the discussion on my request for an ("O@", typeobject, 
> void **) format for PyArg_Parse and Py_BuildValue MAL suggested 
(as MAL already explained, that was suggested by me)
> that I could get the same functionality by creating a type 
> WrapperTypeObject, which would be a subtype of TypeObject with 
> extra fields pointing to the _New() and _Convert() routines to 
> convert Python objects from/to C pointers. This would be good 
> enough for me, because then types wanting to participate in the 
> wrapper protocol would subtype WrapperTypeObject instead of 
> TypeObject, and two global routines could return the _New and 
> _Convert routines given the type object, and we wouldn't need 
> yet another PyArg_Parse format specifier.
> However, after digging high and low I haven't been able to 
> deduce how I would then use this WrapperType in C as the type 
> for my extension module objects. Are there any examples? If not, 
> could someone who understands the new inheritance scheme give me 
> some clues as to how to do this?

Currently (after quite some time) I have the impression that you
cannot create a subtype of PyType_Type in C because PyType_Type
ends in a variable sized array, at least not in this way:

struct {
    PyTypeObject type;
    ...additional fields...
} WrapperType_Type;

Can someone confirm this?

(I have to find out what to do with the tp_members slot, which seems to
correspond to the Python-level __slots__ class variable)


From  Thu Jan 17 19:51:36 2002
From: (Guido van Rossum)
Date: Thu, 17 Jan 2002 14:51:36 -0500
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: Your message of "Thu, 17 Jan 2002 20:23:00 +0100."
References: <>
Message-ID: <>

> Currently (after quite some time) I have the impression that you
> cannot create a subtype of PyType_Type in C because PyType_Type
> ends in a variable sized array, at least not in this way:
> struct {
>     PyTypeObject type;
>     ...additional fields...
> } WrapperType_Type;
> Can someone confirm this?

Yes, alas.  The type you would have to declare is 'etype', a private
type in typeobject.c.
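
At the Python level the same idea is straightforward, since a metaclass is a
subtype of type. The sketch below is only a Python analogue of the
WrapperTypeObject being discussed, not a solution to the C layout problem;
the names WrapperType, Window, new_for and convert_for are hypothetical.

```python
class WrapperType(type):
    """Hypothetical metatype: a type object carrying two converter hooks."""

    def __new__(mcls, name, bases, namespace, new, convert):
        cls = super().__new__(mcls, name, bases, namespace)
        # staticmethod keeps the hooks from turning into bound methods.
        cls._new = staticmethod(new)          # raw handle -> Python object
        cls._convert = staticmethod(convert)  # Python object -> raw handle
        return cls

    def __init__(cls, name, bases, namespace, new, convert):
        # Swallow the extra keyword arguments before calling type.__init__.
        super().__init__(name, bases, namespace)


class Window(metaclass=WrapperType,
             new=lambda handle: Window(handle),
             convert=lambda obj: obj.handle):
    """Stand-in for an extension type wrapping an integer handle."""

    def __init__(self, handle):
        self.handle = handle


def new_for(tp, handle):
    """Global routine: look up the _new hook on the type object."""
    return tp._new(handle)


def convert_for(obj):
    """Global routine: look up the _convert hook on the object's type."""
    return type(obj)._convert(obj)


if __name__ == "__main__":
    w = new_for(Window, 7)
    print(type(w).__name__, convert_for(w))
```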

--Guido van Rossum (home page:

From  Thu Jan 17 20:07:30 2002
From: (Neil Hodgson)
Date: Fri, 18 Jan 2002 07:07:30 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,   was PEP-time ? ...
References: <>   <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil> <> <>
Message-ID: <016601c19f92$9a049e00$0acc8490@neil>

Martin v. Loewis:

> Most users will never notice
> any problem (except for Neil, who likes to put funny file names on his
> disk :-), so this is a typical 80-20 problem here (or maybe rather
> 99-1).

    While Martin is referring to the rarity of having non-native file names
on Windows 9x, the problem addressed by PEP 277 is real. Already this year,
there have been two enquiries [from Michael Ebert and Guenter Radestock] to
comp.lang.python about Unicode file name use on NT.


From  Thu Jan 17 20:10:27 2002
From: (Paul Prescod)
Date: Thu, 17 Jan 2002 12:10:27 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <> <> <> <>
Message-ID: <>

"M.-A. Lemburg" wrote:
> It is. Currently Python strings are just that: immutable strings.
> Now, you suddenly add dynamics to them. 

I don't want to go through this whole thread from the beginning again.
PEP 215 does not add "dynamics" to anything. In fact, PEP 215 is a more
static mechanism than the current idiom. Even if we make PEP 215's
behaviour the default for strings, it is still NOT DYNAMIC.

>... This will cause nightmares
> in terms of security. 

There is a thread called "PEP 215 does not introduce security issues".
Please read it. Everyone involved who initially thought that PEP 215 had
security issues backed down and agreed that it did not. Once again,
whether there is a string prefix or not is irrelevant to this question.
PEP 215's semantics are *not dynamic*.

> ... Note that Python hasn't really had a need
> for Perl's "taint" because of this. I wouldn't want to see that
> change in any way.

I am certainly not a Perl programmer but Python is also attackable
through the sorts of holes that "taint" is intended to avoid.

username = raw_input()
os.system("cp %s %s.old" % (username, username))

Perl considers this "dangerous" and so it has taint. It has *nothing* to
do with interpolation syntax.
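
The hazard in the snippet above is that the typed text reaches the shell
unmodified. As a sketch of how a current Python program might avoid that
(using shlex.quote, which did not exist in 2002), the command can either be
quoted or built as an argument list that never passes through a shell. The
function names backup_command and backup_argv are made up for the example,
and nothing here actually runs cp.

```python
import shlex


def backup_command(filename):
    """Build a shell command line that copies `filename` to `filename`.old."""
    quoted_src = shlex.quote(filename)
    quoted_dst = shlex.quote(filename + ".old")
    return "cp %s %s" % (quoted_src, quoted_dst)


def backup_argv(filename):
    """Argument-list form: no shell parses it, so no quoting is needed."""
    return ["cp", filename, filename + ".old"]


if __name__ == "__main__":
    hostile = "x; rm -rf ~"
    print(backup_command(hostile))
    print(backup_argv(hostile))
```

Either way the interpolation syntax is unchanged; only the handling of the
untrusted value differs.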

> Huh ? I bet RedHat and thousands of sysadmins who have switched
> from shell or Perl to Python would have strong objections.

Python has a construct called a "raw string" which is perfect for when
you don't want backslashes treated specially.
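
A two-line illustration of the difference, for anyone unfamiliar with the
r"" prefix:

```python
cooked = "a\nb"   # backslash-n becomes a single newline character
raw = r"a\nb"     # raw string: the backslash and the 'n' are both kept

assert len(cooked) == 3
assert len(raw) == 4
assert raw == "a\\nb"
```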

 Paul Prescod

From  Thu Jan 17 17:46:23 2002
From: (David Ascher)
Date: Thu, 17 Jan 2002 09:46:23 -0800
Subject: [Python-Dev] deprecate input()?
References: <> <>
Message-ID: <>

Guido van Rossum wrote:
> > I just responded to a question on a user had about feeding empty
> > strings to input().  While he didn't say why he called input(), I suspect he
> > thought the semantics were more like raw_input().
> >
> > In these days of widespread Internet nastiness, shouldn't input() be
> > deprecated?
> Why?  I imagine this is only used for interactive input, and then it's
> the computer's owner who is typing.

input() can also be used effectively in interactive apps (calculators,
scripting engines for GUI apps) in contexts where the users can be

Not _everything_ is on the web, luckily, and not everything needs to be

That doesn't mean that I think the naming choices for input() and
raw_input() have withstood the test of hindsight, but few things do...


From  Thu Jan 17 20:59:13 2002
From: (Jack Jansen)
Date: Thu, 17 Jan 2002 21:59:13 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <>
Message-ID: <>

On Thursday, January 17, 2002, at 11:29  AM, M.-A. Lemburg wrote:

> Jack Jansen wrote:
>> In the discussion on my request for an ("O@", typeobject,
>> void **) format for PyArg_Parse and Py_BuildValue MAL suggested
> Thomas Heller suggested this.

Oops, you're right. I should be careful not to mix up my Germans;-)

> I am more in favour of
> exposing the pickle reduce API through "O@", that is
> have PyArgTuple_Parse() call the .__reduce__() method
> of the object. This will then return (factory, state_tuple)
> and these could then be exposed to the C function via two
> PyObject*.

You've suggested this before, but at that time I ignored it 
because it made absolutely no sense to me. "pickle" triggers one 
set of ideas for me, "reduce" triggers a different set, "factory 
function" yet another different set. None of these sets of ideas 
have the least resemblance to what I'm trying to do:-)

I gave a fairly complete example (using calldll from Python to 
wrap a function that returns a Mac WindowObject) last week, 
could you explain how you would implement this with pickle, 
reduce and factory functions?
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Thu Jan 17 21:03:25 2002
From: (Jack Jansen)
Date: Thu, 17 Jan 2002 22:03:25 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,  was PEP-time ? ...
In-Reply-To: <>
Message-ID: <>

On Thursday, January 17, 2002, at 12:42  PM, Martin v. Loewis wrote:

>> I suppose this is due to the fact that Mac file systems store
>> extended attributes (much like what OS/2 does too) along with the
>> file -- that's a really nice way of being able to extend file
>> system semantics on a per-file basis; much better than the Windows
>> Registry or the MIME guess-by-extension mechanisms.
> I'd assume it is different: They just *define* that all local file
> systems they have control over use UTF-8 on disk, at least for BSD ufs;
> for HFS, it might be that they 'just know' what encoding is used on an
> HFS partition. I doubt they use extended attributes for this, as they
> reportedly return UTF-8 even for file systems they've never seen
> before; this may be either due to static knowledge (e.g. that VFAT is
> UCS-2LE), or through guessing.

It's actually a whole lot simpler: for filesystems with an 
encoding that is open to interpretation the user specifies it 
during mount:-)
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Thu Jan 17 21:09:20 2002
From: (Jack Jansen)
Date: Thu, 17 Jan 2002 22:09:20 +0100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,   was PEP-time ? ...
In-Reply-To: <00c401c19f8d$941e3fa0$0acc8490@neil>
Message-ID: <>

On Thursday, January 17, 2002, at 08:31  PM, Neil Hodgson wrote:

>> What kind of checks do you have in mind then ? If possible, it should
>> be possible to pass unicodefilenames() a path to check for Unicode-
>> capability, since on Unix (and probably Mac OS X as well), the path
>> decides which file system gets the ioctl calls.
>    Any platform experts know how this works on MacOS X or BeOS? Do
> non-native file systems get mapped to Unicode names so that UTF-8 will
> always work?

For Mac OS X: yes, that is how it is supposed to work.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Fri Jan 18 09:47:03 2002
From: (M.-A. Lemburg)
Date: Fri, 18 Jan 2002 10:47:03 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <>
Message-ID: <>

Jack Jansen wrote:
> On Thursday, January 17, 2002, at 11:29  AM, M.-A. Lemburg wrote:
> > I am more in favour of
> > exposing the pickle reduce API through "O@", that is
> > have PyArgTuple_Parse() call the .__reduce__() method
> > of the object. This will then return (factory, state_tuple)
> > and these could then be exposed to the C function via two
> > PyObject*.
> You've suggested this before, but at that time I ignored it
> because it made absolutely no sense to me. "pickle" triggers one
> set of ideas for me, "reduce" triggers a different set, "factory
> function" yet another different set. None of these sets of ideas
> have the least resemblance to what I'm trying to do:-)

The idea is simple but extends what you are trying to
achieve (I gave an example on how to use this somewhere
in the "wrapper" thread). Basically, you'll just want to
use the state tuple to access the underlying void* C pointer
via a PyCObject which does the wrapping of the pointer.
The "pickle" mechanism would store the PyCObject in the
state tuple which you could then access to get at the
C pointer.
This may sound complicated at first, but it provides much
more flexibility w/r to more complex objects, e.g. the method
you have in mind only supports wrapping a single C pointer;
the "pickle" mechanism can potentially handle any serializable 

> I gave a fairly complete example (using calldll from Python to
> wrap a function that returns a Mac WindowObject) last week,
> could you explain how you would implement this with pickle,
> reduce and factory functions?

Sorry, no time for that ... I've got an important business
trip next week which needs to be prepared. Please bring this
up again after next week.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 18 10:24:01 2002
From: (Jack Jansen)
Date: Fri, 18 Jan 2002 11:24:01 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <>
Message-ID: <>

On Friday, January 18, 2002, at 10:47 , M.-A. Lemburg wrote:

> Jack Jansen wrote:
>> On Thursday, January 17, 2002, at 11:29  AM, M.-A. Lemburg wrote:
>>> I am more in favour of
>>> exposing the pickle reduce API through "O@", that is
>>> have PyArgTuple_Parse() call the .__reduce__() method
>>> of the object. This will then return (factory, state_tuple)
>>> and these could then be exposed to the C function via two
>>> PyObject*.
>> You've suggested this before, but at that time I ignored it
>> because it made absolutely no sense to me. "pickle" triggers one
>> set of ideas for me, "reduce" triggers a different set, "factory
>> function" yet another different set. None of these sets of ideas
>> have the least resemblance to what I'm trying to do:-)
> The idea is simple but extends what you are trying to
> achieve (I gave an example on how to use this somewhere
> in the "wrapper" thread). Basically, you'll just want to
> use the state tuple to access the underlying void* C pointer
> via a PyCObject which does the wrapping of the pointer.
> The "pickle" mechanism would store the PyCObject in the
> state tuple which you could then access to get at the
> C pointer.
I think you're missing a few points here. First of all, my objects 
aren't PyCObjects but other extension objects. While the main pointer in 
the object could be wrapped in a PyCObject there may be other 
information in my objects that is important, such as a pointer to the 
dispose routine to call on the c-pointer when the Python object reaches 
refcount zero (and this pointer may change over time as ownership of, 
say, a button is passed from Python to the system). The _New and 
_Convert routines will know how to get from the C pointer to the 
*correct* object, i.e. normally there will be only one Python object for 
every C object.

Also, the method seems rather complicated for doing a simple thing. The 
only thing I really want is a way to refer to an _New or _Convert method 
from Python code. The most reasonable way to do that seems to be by 
creating a way to get from the type object (which is available in Python) 
to those routines. Thomas' suggestion looked very promising, and simple 
too, until Guido said that unfortunately it couldn't be done. Your 
suggestion, as far as I understand it, looks complicated and probably 
inefficient too (remember the code will have to go through all these 
hoops every time it needs to convert an object from Python to C or vice 
versa).

Correct me if I'm wrong,

From  Fri Jan 18 11:27:11 2002
From: (M.-A. Lemburg)
Date: Fri, 18 Jan 2002 12:27:11 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <>
Message-ID: <>

Jack Jansen wrote:
> >> On Thursday, January 17, 2002, at 11:29  AM, M.-A. Lemburg wrote:
> >>
> >>> I am more in favour of
> >>> exposing the pickle reduce API through "O@", that is
> >>> have PyArgTuple_Parse() call the .__reduce__() method
> >>> of the object. This will then return (factory, state_tuple)
> >>> and these could then be exposed to the C function via two
> >>> PyObject*.
> >>
> >> You've suggested this before, but at that time I ignored it
> >> because it made absolutely no sense to me. "pickle" triggers one
> >> set of ideas for me, "reduce" triggers a different set, "factory
> >> function" yet another different set. None of these sets of ideas
> >> have the least resemblance to what I'm trying to do:-)
> >
> > The idea is simple but extends what you are trying to
> > achieve (I gave an example on how to use this somewhere
> > in the "wrapper" thread). Basically, you'll just want to
> > use the state tuple to access the underlying void* C pointer
> > via a PyCObject which does the wrapping of the pointer.
> > The "pickle" mechanism would store the PyCObject in the
> > state tuple which you could then access to get at the
> > C pointer.
> >
> I think you're missing a few points here. First of all, my objects
> aren't PyCObjects but other extension objects. 

I know. The idea is that either you add a .__reduce__ method
to the extension objects or register their types with a registry
comparable to copyreg.
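
Both options can be sketched in pure Python. This is an illustration of the
general (factory, state_tuple) protocol only, with an integer standing in for
the C pointer and the PyCObject; the class names Handle and Legacy and the
helpers reduce_legacy and rebuild are hypothetical.

```python
import copyreg


class Handle:
    """Hypothetical wrapper; `address` stands in for a C pointer."""

    def __init__(self, address):
        self.address = address

    def __reduce__(self):
        # (factory, state_tuple): factory(*state_tuple) rebuilds the object.
        return (Handle, (self.address,))


class Legacy:
    """A type we pretend we cannot modify, so it is registered instead."""

    def __init__(self, address):
        self.address = address


def reduce_legacy(obj):
    """Reduction function kept outside the Legacy type."""
    return (Legacy, (obj.address,))


# Option two: register the reduction function in the copyreg registry.
copyreg.pickle(Legacy, reduce_legacy)


def rebuild(factory, state):
    """Recreate an object from the pair returned by a reduction."""
    return factory(*state)


if __name__ == "__main__":
    factory, state = Handle(0x1234).__reduce__()
    print(rebuild(factory, state).address)
    reducer = copyreg.dispatch_table[Legacy]
    print(rebuild(*reducer(Legacy(5))).address)
```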

> While the main pointer in
> the object could be wrapped in a PyCObject there may be other
> information in my objects that is important, such as a pointer to the
> dispose routine to call on the c-pointer when the Python object reaches
> refcount zero (and this pointer may change over time as ownership of,
> say, a button is passed from Python to the system). 

Note that PyCObjects support all of this. It's not important in this
context, though.  The PyCObject is only used to wrap the raw 
pointer; the factory function then takes this pointer and creates
one of your extension objects out of it.

> The _New and
> _Convert routines will know how to get from the C pointer to the
> *correct* object, i.e. normally there will be only one Python object for
> every C object.

That's also possible using the "pickle" approach.
> Also, the method seems rather complicated for doing a simple thing. The
> only thing I really want is a way to refer to an _New or _Convert method
> from Python code. The most reasonable way to do that seems to be by
> creating a way to get from the type object (which is available in Python)
> to those routines. Thomas' suggestion looked very promising, and simple
> too, until Guido said that unfortunately it couldn't be done. Your
> suggestion, as far as I understand it, looks complicated and probably
> inefficient too (remember the code will have to go through all these
> hoops every time it needs to convert an object from Python to C or vice
> versa).

It is more complicated, but also more flexible. Plus it builds on
techniques which are already applied in Python's pickle 

Note that by adding a tp_reduce slot, the overhead of calling
a Python function could be kept reasonable. Helper functions
could aid in accessing the C pointer which is stored in
the state tuple.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 18 15:09:30 2002
From: (Guido van Rossum)
Date: Fri, 18 Jan 2002 10:09:30 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: Your message of "Mon, 14 Jan 2002 15:04:17 CST."
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <> <> <> <>
Message-ID: <>

What's the current thinking about making docstrings optional?

Does everybody agree on Gustavo's patch?

--Guido van Rossum (home page:

From  Fri Jan 18 15:15:54 2002
From: (Neil Schemenauer)
Date: Fri, 18 Jan 2002 07:15:54 -0800
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>; from on Fri, Jan 18, 2002 at 10:09:30AM -0500
References: <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <> <> <> <> <> <>
Message-ID: <>

Guido van Rossum wrote:
> What's the current thinking about making docstrings optional?
> Does everybody agree on Gustavo's patch?

10% space saving?  That doesn't seem to be worth the effort.  OTOH,
I'm not dealing with any platforms that are memory constrained right
now.


From  Fri Jan 18 15:23:23 2002
From: (M.-A. Lemburg)
Date: Fri, 18 Jan 2002 16:23:23 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <> <> <> <>
 <> <>
Message-ID: <>

Guido van Rossum wrote:
> What's the current thinking about making docstrings optional?
> Does everybody agree on Gustavo's patch?


+1.
This will help Python embedders and porters to embedded systems
a lot.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 18 15:26:06 2002
From: (Barry A. Warsaw)
Date: Fri, 18 Jan 2002 10:26:06 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020114093053.C1325@ibook.distro.conectiva>
Message-ID: <>

>>>>> "NS" == Neil Schemenauer <> writes:

    >> What's the current thinking about making docstrings optional?
    >> Does everybody agree on Gustavo's patch?

    NS> 10% space saving?  That doesn't seem to be worth the effort.
    NS> OTOH, I'm not dealing with any platforms that are memory
    NS> constrained right now.

Personally I don't care either for the same reasons.  I'll just note
that what Emacs used to do (maybe it still does, I dunno), is extract
all its inlined docstrings into a separate file which could be thrown
away if you didn't want to pay for the bloat.  All that complexity was
built in a time when 300KB or so of docstrings really could make a
huge difference for download times or storage resources.
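
The patch under discussion concerns docstrings compiled into the C
interpreter itself. As a loosely related Python-level illustration (an
analogue, not the patch), current interpreters can already drop docstrings
from Python code when it is compiled at optimization level 2, the same level
selected by the -OO switch. SOURCE and load_greet are names invented for this
sketch.

```python
SOURCE = '''
def greet():
    """Return a greeting."""
    return "hello"
'''


def load_greet(optimize):
    """Compile SOURCE at the given optimization level and return greet."""
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    namespace = {}
    exec(code, namespace)
    return namespace["greet"]


if __name__ == "__main__":
    print(repr(load_greet(0).__doc__))  # docstring kept
    print(repr(load_greet(2).__doc__))  # docstring stripped
```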


From  Fri Jan 18 15:42:09 2002
From: (M.-A. Lemburg)
Date: Fri, 18 Jan 2002 16:42:09 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020114093053.C1325@ibook.distro.conectiva>
 <> <>
Message-ID: <>

"Barry A. Warsaw" wrote:
> >>>>> "NS" == Neil Schemenauer <> writes:
>     >> What's the current thinking about making docstrings optional?
>     >> Does everybody agree on Gustavo's patch?
>     NS> 10% space saving?  That doesn't seem to be worth the effort.
>     NS> OTOH, I'm not dealing with any platforms that are memory
>     NS> constrained right now.
> Personally I don't care either for the same reasons.  I'll just note
> that what Emacs used to do (maybe it still does, I dunno), is extract
> all its inlined docstrings into a separate file which could be thrown
> away if you didn't want to pay for the bloat.  All that complexity was
> built in a time when 300KB or so of docstrings really could make a
> huge difference for download times or storage resources.

You should also consider the possibility of using the macros
for translating the docs-strings. They are a form of markup.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 18 15:46:48 2002
From: (Barry A. Warsaw)
Date: Fri, 18 Jan 2002 10:46:48 -0500
Subject: [ Re: [Python-Dev] Python's footprint]
References: <20020114093053.C1325@ibook.distro.conectiva>
Message-ID: <>

>>>>> "MAL" == M  <> writes:

    MAL> You should also consider the possibility of using the macros
    MAL> for translating the docs-strings. They are a form of markup.

Good point!

From  Fri Jan 18 16:23:30 2002
From: (Jack Jansen)
Date: Fri, 18 Jan 2002 17:23:30 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <>
Message-ID: <>

On Friday, January 18, 2002, at 04:23 , M.-A. Lemburg wrote:

> Guido van Rossum wrote:
>> What's the current thinking about making docstrings optional?
>> Does everybody agree on Gustavo's patch?
>> 5470
> +1.
> This will help Python embedders and porters to embedded systems
> a lot.

+1. Same reasoning.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -

From  Fri Jan 18 18:38:05 2002
From: (Paul Prescod)
Date: Fri, 18 Jan 2002 10:38:05 -0800
Subject: [Python-Dev] Utopian String Interpolation
References: <> <>
Message-ID: <>

I think that something in particular that Paul S. said got under your
skin (and there was something he said that could certainly get under a
person's skin). I'm pretty sure it isn't now a policy to rudely reject
suggestions from people you haven't heard of! Until I went back through
the thread I felt as Aahz did that your rejection was somewhat severe in
tone. I think you (still) agree that people should not be afraid of
(politely) stating their opinions in python-dev, even when those
opinions disagree with yours. Or if there is an unspoken rule that
unproven developers shouldn't be in python-dev then maybe we should just
make it a spoken rule. But I'm most confident of the theory that you
snapped at one person in particular because of something he said.

 Paul Prescod

Guido van Rossum wrote:
> > >Paul Svensson:
> > >>
> > >> I would certainly claim that an unrecognized escape sequence _is_ wrong.
> > >
> > Guido van Rossum wrote:
> > > Then you are wrong.  Go away and design your own language.
> >
> Aahz:
> > Hey!  That's a bit harsh.  I'm not going to campaign to make
> > unrecognized escape sequences a syntax error, but not raising a syntax
> > error does seem to be against Python's principles.
> Whatever.  Who is Paul Svensson and what is he doing in python-dev?
> --Guido van Rossum (home page:

From  Fri Jan 18 18:42:22 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 19:42:22 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <> (message from
 Jack Jansen on Fri, 18 Jan 2002 11:24:01 +0100)
References: <>
Message-ID: <>

From  Fri Jan 18 18:56:45 2002
From: (Thomas Heller)
Date: Fri, 18 Jan 2002 19:56:45 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <> <>
Message-ID: <05fc01c1a051$f1c4db90$e000a8c0@thomasnotebook>

From: "Martin v. Loewis" <>
To: <>
Cc: <>; <>; <>
Sent: Friday, January 18, 2002 7:42 PM
Subject: Re: [Python-Dev] Extending types in C - help needed


Hmm, not very much help ;-)


From  Fri Jan 18 19:01:16 2002
From: (Thomas Heller)
Date: Fri, 18 Jan 2002 20:01:16 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <>              <08c501c19f8c$72631b20$e000a8c0@thomasnotebook>  <>
Message-ID: <060801c1a052$93d5a860$e000a8c0@thomasnotebook>

> > Currently (after quite some time) I have the impression that you
> > cannot create a subtype of PyType_Type in C because PyType_Type
> > ends in a variable sized array, at least not in this way:
> > 
> > struct {
> >     PyTypeObject type;
> >     ...additional fields...
> > } WrapperType_Type;
> > 
> > Can someone confirm this?
> Yes, alas.  The type you would have to declare is 'etype', a private
> type in typeobject.c.

Does this mean this is the wrong route, or is it absolutely impossible
to create a subtype of PyType_Type in C with additional slots?

Any tips about the route to take?



From  Fri Jan 18 19:36:20 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 20:36:20 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <> (message from
 Jack Jansen on Fri, 18 Jan 2002 11:24:01 +0100)
References: <>
Message-ID: <>

> Also, the method seems rather complicated for doing a simple thing. The 
> only thing I really want is a way to refer to an _New or _Convert method 
> from Python code.

I believe the attached code implements your requirements. In
particular, see PyArg_GenericCopy for an application that extracts a
void* from an object through a type-safe protocol, then creates a
clone of the original object through the same protocol. Both extractor
and creator function are associated with the type object.

To see this work in Python, run

>>> import handle
>>> x
<handle.Handle object at 0x81683a0>
>>> y=handle.copy(x)
>>> y
<handle.Handle object at 0x819f270>


#include "Python.h"

/************* Generic Converters ***************/

struct converters{
	PyObject* (*create)(void*);
	int (*extract)(PyObject*, void**);
};

char descr_string[] = "calldll converter structure";

/* Wrap the converters in a PyCObject and store it in the type's
   dictionary under "__calldll__". */
void PyArg_AddConverters(PyTypeObject *type, struct converters* convs)
{
	PyObject *cobj = PyCObject_FromVoidPtrAndDesc(convs, 
						      descr_string,
						      NULL);
	/* tp_dict is only filled in once the type is ready */
	PyType_Ready(type);
	PyDict_SetItemString(type->tp_dict, "__calldll__", cobj);
	Py_DECREF(cobj);
}

/* Look up the converters of a type; the description pointer is
   compared to make sure the PyCObject is one of ours. */
struct converters* PyArg_GetConverters(PyTypeObject *type)
{
	PyObject *cobj;
	void *descr;
	cobj = PyObject_GetAttrString((PyObject*)type, "__calldll__");
	if (!cobj)
		return NULL;
	descr = PyCObject_GetDesc(cobj);
	if (!descr)
		return NULL;
	if (descr != descr_string){
		PyErr_SetString(PyExc_TypeError, "invalid cobj");
		return NULL;
	}
	return (struct converters*)PyCObject_AsVoidPtr(cobj);
}

PyObject *PyArg_Create(PyTypeObject* type, void * value)
{
	struct converters *convs = PyArg_GetConverters(type);
	if (!convs)
		return NULL;
	return convs->create(value);
}

int PyArg_Extract(PyObject* obj, void** value)
{
	struct converters *convs = PyArg_GetConverters(obj->ob_type);
	if (!convs)
		return -1;
	convs->extract(obj, value);
	return 0;
}

PyObject* PyArg_GenericCopy(PyObject* obj)
{
	void *tmp;
	if (PyArg_Extract(obj, &tmp))
		return NULL;
	return PyArg_Create(obj->ob_type, tmp);
}

/************* End Generic Converters ***************/

typedef struct {
	PyObject_HEAD
	int handle;
} HandleObject;

staticforward PyTypeObject Handle_Type;

#define HandleObject_Check(v)	((v)->ob_type == &Handle_Type)

static HandleObject *
newHandleObject(int i)
{
	HandleObject *self;
	self = PyObject_New(HandleObject, &Handle_Type);
	if (self == NULL)
		return NULL;
	self->handle = i;
	return self;
}

/* Handle methods */

static void
Handle_dealloc(HandleObject *self)
{
	PyObject_Del(self);
}

/**************** Generic Converters: Handle support ***************/

static PyObject*
handle_conv_new(void *s){
	return (PyObject*)newHandleObject((int)s);
}

static int
handle_conv_extract(PyObject *o, void **dest){
	HandleObject *h = (HandleObject*)o;
	*dest = (void*)h->handle;
	return 0;
}

struct converters HandleConvs = {
	handle_conv_new,
	handle_conv_extract
};

/**************** Generic Converters: Handle support ***************/	

statichere PyTypeObject Handle_Type = {
	/* The ob_type field must be initialized in the module init function
	 * to be portable to Windows without using C++. */
	PyObject_HEAD_INIT(NULL)
	0,			/*ob_size*/
	"handle.Handle",		/*tp_name*/
	sizeof(HandleObject),	/*tp_basicsize*/
	0,			/*tp_itemsize*/
	/* methods */
	(destructor)Handle_dealloc, /*tp_dealloc*/
	0,			/*tp_print*/
	0,			/*tp_getattr*/
	0,			/*tp_setattr*/
	0,			/*tp_compare*/
	0,			/*tp_repr*/
	0,			/*tp_as_number*/
	0,			/*tp_as_sequence*/
	0,			/*tp_as_mapping*/
	0,			/*tp_hash*/
	0,			/*tp_call*/
	0,			/*tp_str*/
	0,			/*tp_getattro*/
	0,			/*tp_setattro*/
	0,			/*tp_as_buffer*/
	Py_TPFLAGS_DEFAULT,	/*tp_flags*/
	/* remaining slots are zero-initialized */
};
/* --------------------------------------------------------------------- */

static PyObject *
xx_new(PyObject *self, PyObject *args)
{
	HandleObject *rv;
	int h;
	if (!PyArg_ParseTuple(args, "i:new", &h))
		return NULL;
	rv = newHandleObject(h);
	if ( rv == NULL )
	    return NULL;
	return (PyObject *)rv;
}

static PyObject *
xx_copy(PyObject *self, PyObject *args)
{
	PyObject *obj;

	if (!PyArg_ParseTuple(args, "O:copy", &obj))
		return NULL;
	return PyArg_GenericCopy(obj);
}

static PyMethodDef xx_methods[] = {
	{"new",		xx_new,		METH_VARARGS},
	{"copy",		xx_copy,		METH_VARARGS},
	{NULL,		NULL}		/* sentinel */
};

/* Module init function; the interpreter looks for init<modulename> */
DL_EXPORT(void)
inithandle(void)
{
	PyObject *m;

	Handle_Type.ob_type = &PyType_Type;
	PyArg_AddConverters(&Handle_Type, &HandleConvs);

	/* Create the module and add the functions */
	m = Py_InitModule("handle", xx_methods);
}
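
The same protocol can be sketched in a few lines of Python, which may
make the C above easier to follow. This is only an illustration,
assuming a current Python 3 interpreter rather than the 2.2 of this
thread. Converters, add_converters, get_converters and generic_copy
are hypothetical names that mirror the C functions; they are not an
existing API.

```python
class Converters:
    """Pair of functions, standing in for the C 'struct converters'."""

    def __init__(self, create, extract):
        self.create = create      # raw value -> new object
        self.extract = extract    # object -> raw value


def add_converters(tp, convs):
    # Store the pair on the type, as the C code does through tp_dict.
    tp.__calldll__ = convs


def get_converters(tp):
    convs = getattr(tp, "__calldll__")   # AttributeError if unsupported
    if not isinstance(convs, Converters):
        raise TypeError("invalid converters object")
    return convs


def generic_copy(obj):
    # Extract the raw value, then build a new object of the same type.
    convs = get_converters(type(obj))
    return convs.create(convs.extract(obj))


class Handle:
    def __init__(self, handle):
        self.handle = handle


add_converters(Handle, Converters(create=Handle, extract=lambda h: h.handle))

x = Handle(42)
y = generic_copy(x)
print(y is not x, y.handle)
```

A type that never registered converters simply raises AttributeError
in get_converters, which is the "expect an exception" behaviour the
C version gets from PyObject_GetAttrString.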

From sdm7g@Virginia.EDU  Fri Jan 18 19:52:18 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Fri, 18 Jan 2002 14:52:18 -0500 (EST)
Subject: [Python-Dev] (PyMapping|PyDict|PyObject)_DelItemString [was: [Pythonmac-SIG] ]
In-Reply-To: <>
Message-ID: <>

[ Background note for cc: to python-dev: builds under both python2.1.2 and python2.2.
  It works under 2.1.2, but under 2.2, it gives a
  'Failure linking new module' error. ]

Added a call to NSLinkEditError to get back more info from
the error (I'll submit this as a patch to SF after I clean
it up a bit.):

>>> import pyobjc
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: dyld: /usr/local/src/Python-2.2/python.exe Undefined symbols:
_PyObject_DelItemString
Failure linking new module

grepping for this in 2.1.2 finds nothing.

In 2.2, there seems to be one occurrence:

 grep PyObject_DelItemString */*.[ch]
Include/abstract.h:#define PyMapping_DelItemString(O,K)  PyObject_DelItemString((O),(K))

Searching for PyMapping_DelItemString, it looks like this changed from
PyDict_DelItemString() in 2.1.2 to PyObject_DelItemString() in 2.2:

dm7g% grep PyMapping_DelItemString Python-2.*/*/*.[ch]
Python-2.1.2/Include/abstract.h:     int PyMapping_DelItemString(PyObject *o, char *key);
Python-2.1.2/Include/abstract.h:#define PyMapping_DelItemString(O,K) PyDict_DelItemString((O),(K))
Python-2.2/Include/abstract.h:     int PyMapping_DelItemString(PyObject *o, char *key);
Python-2.2/Include/abstract.h:#define PyMapping_DelItemString(O,K) PyObject_DelItemString((O),(K))

Is this change of name an inadvertent bug, or is it something that
was intentionally changed, but incompletely?

-- Steve

From  Fri Jan 18 20:06:23 2002
From: (Thomas Heller)
Date: Fri, 18 Jan 2002 21:06:23 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <> <>
Message-ID: <072001c1a05b$ac8a08c0$e000a8c0@thomasnotebook>

> > Also, the method seems rather complicated for doing a simple thing. The 
> > only thing I really want is a way to refer to an _New or _Convert method 
> > from Python code.
> I believe the attached code implements your requirements.

Yes, this looks very much like what I had in mind, except that you
demonstrate how to store and retrieve a C structure in the type's tp_dict.
Nice intro into PyCObject!



From  Fri Jan 18 20:24:43 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 21:24:43 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <> (message
 from Guido van Rossum on Fri, 18 Jan 2002 10:09:30 -0500)
References: <20020110224908.C884@ibook.distro.conectiva> <> <20020111122105.B1808@ibook.distro.conectiva> <> <> <20020114093053.C1325@ibook.distro.conectiva> <> <20020114104146.A2607@ibook.distro.conectiva> <> <> <> <> <>
 <> <>
Message-ID: <>

> What's the current thinking about making docstrings optional?
> Does everybody agree on Gustavo's patch?

Looks good to me.


From  Fri Jan 18 20:27:24 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 21:27:24 +0100
Subject: [ Re: [Python-Dev] Python's footprint]
In-Reply-To: <> (
References: <20020114093053.C1325@ibook.distro.conectiva>
 <> <> <>
Message-ID: <>

> You should also consider the possibility of using the macros
> for translating the docs-strings. They are a form of markup.

While that is true, most of the current strings are marked-up already,
by means of having an __doc__ suffix. I have an extractor that
understands this form of markup, and the Python .pot file in CVS has
those strings extracted.


From  Fri Jan 18 20:53:30 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 21:53:30 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <072001c1a05b$ac8a08c0$e000a8c0@thomasnotebook>
References: <> <> <072001c1a05b$ac8a08c0$e000a8c0@thomasnotebook>
Message-ID: <>

> Yes, this looks very much like what I had in mind, except that you
> demonstrate how to store and retrieve a C structure in the type's
> tp_dict.

Indeed. I also think it is more appropriate than either a new metatype
or a ParseTuple extension for the problem at hand (supporting
arbitrary types in calldll), for the following reasons:

- There may be different ways in which an object converts to a "native"
  type. In particular, in some cases, ParseTuple may need to return
  (fill out) something more complex than a void*, something that
  calldll cannot support by nature.

- A type may need to provide various independent extensions to the
  standard protocols, e.g. it may provide "give me a Unicode doc
  string" in addition to "give me a conversion function to void*".
  In this case, you'd need multiple inheritance on the metatype
  level, something that does not reflect well in C.
  For Python, it is much more common not to care at all about
  inheritance. Instead, just access the protocol, and expect an
  exception if it is not supported.
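
The "just access the protocol" style from the second point could look
like this in Python. It is a sketch under assumptions: unicode_doc is
a made-up protocol method, and the code targets a current Python 3
interpreter.

```python
def describe(obj):
    """Use the optional protocol if the object offers it."""
    try:
        get_doc = obj.unicode_doc        # hypothetical protocol method
    except AttributeError:
        return "<no unicode doc protocol>"
    return get_doc()


class WithDoc:
    def unicode_doc(self):
        return "r\u00e9sum\u00e9 of WithDoc"


class Plain:
    pass


print(describe(WithDoc()))
print(describe(Plain()))
```

WithDoc and Plain share no base class beyond object; the caller only
cares whether the attribute lookup succeeds.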

Also notice that this *does* make use of new-style classes: In 2.1,
types did not have a tp_dict slot. Of course, the PyType_Ready call
should go immediately before the place where tp_dict is accessed, and
a check should be added whether tp_flags contains


From  Fri Jan 18 20:57:21 2002
From: (Guido van Rossum)
Date: Fri, 18 Jan 2002 15:57:21 -0500
Subject: [Python-Dev] (PyMapping|PyDict|PyObject)_DelItemString [was: [Pythonmac-SIG] ]
In-Reply-To: Your message of "Fri, 18 Jan 2002 14:52:18 EST."
References: <>
Message-ID: <>

> >>> import pyobjc
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ImportError: dyld: /usr/local/src/Python-2.2/python.exe Undefined symbols:
> _PyObject_DelItemString
> Failure linking new module
> >>>
> grepping for this in 2.1.2 finds nothing.
> In 2.2, there seems to be one occurrence:
>  grep PyObject_DelItemString */*.[ch]
> Include/abstract.h:#define PyMapping_DelItemString(O,K)  PyObject_DelItemString((O),(K))
> Searching for PyMapping_DelItemString, it looks like this changed from
> PyDict_DelItemString() in 2.1.2 to PyObject_DelItemString() in 2.2:
> dm7g% grep PyMapping_DelItemString Python-2.*/*/*.[ch]
> Python-2.1.2/Include/abstract.h:     int PyMapping_DelItemString(PyObject *o, char *key);
> Python-2.1.2/Include/abstract.h:#define PyMapping_DelItemString(O,K) PyDict_DelItemString((O),(K))
> Python-2.2/Include/abstract.h:     int PyMapping_DelItemString(PyObject *o, char *key);
> Python-2.2/Include/abstract.h:#define PyMapping_DelItemString(O,K) PyObject_DelItemString((O),(K))
> Is this change of name an inadvertent bug, or is it something that
> was intentionally changed, but incompletely?

The latter.  See:

--Guido van Rossum (home page:

From  Fri Jan 18 20:57:48 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 21:57:48 +0100
Subject: [Python-Dev] (PyMapping|PyDict|PyObject)_DelItemString [was: [Pythonmac-SIG] ]
In-Reply-To: <>
 (message from Steven Majewski on Fri, 18 Jan 2002 14:52:18 -0500
References: <>
Message-ID: <>

> Is this change of name an inadvertent bug, or is it something that
> was intentionally changed, but incompletely?

This is bug #498915, fixed in abstract.h 2.43 and,
abstract.c 2.94 and


From  Fri Jan 18 21:21:59 2002
From: (Thomas Heller)
Date: Fri, 18 Jan 2002 22:21:59 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <> <> <072001c1a05b$ac8a08c0$e000a8c0@thomasnotebook> <>
Message-ID: <094901c1a066$3c54e6f0$e000a8c0@thomasnotebook>

> Also notice that this *does* make use of new-style classes: In 2.1,
> types did not have a tp_dict slot. Of course, the PyType_Ready call
> should go immediately before the place where tp_dict is accessed, and
> a check should be added whether tp_flags contains
Wouldn't it suffice to check for tp_dict != NULL (after the call
to PyType_Ready of course)?

Hm. What does Py_TPFLAGS_HAVE_CLASS mean exactly?
Or, better, since TPFLAGS_DEFAULT contains TPFLAGS_HAVE_CLASS,
what does it mean when Py_TPFLAGS_HAVE_CLASS is NOT in tp_flags?
Does it mean that this is a 'new style' type object?


From  Fri Jan 18 21:32:23 2002
From: (Martin v. Loewis)
Date: Fri, 18 Jan 2002 22:32:23 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <094901c1a066$3c54e6f0$e000a8c0@thomasnotebook>
References: <> <> <072001c1a05b$ac8a08c0$e000a8c0@thomasnotebook> <> <094901c1a066$3c54e6f0$e000a8c0@thomasnotebook>
Message-ID: <>

> Wouldn't it suffice to check for tp_dict != NULL (after the call
> to PyType_Ready of course)?

No, see below (although I must admit that I wrote "Right" here first

> Hm. What does Py_TPFLAGS_HAVE_CLASS mean exactly?

According to the documentation, it means that the underlying
TypeObject structure has the necessary fields in its C declaration.

> Or, better, since TPFLAGS_DEFAULT contains TPFLAGS_HAVE_CLASS,
> what does it mean when Py_TPFLAGS_HAVE_CLASS is NOT in tp_flags?

It means you have been loading a module from an earlier Python
version, which had a different setting for TPFLAGS_DEFAULT, and a
shorter definition of the TypeObject.

If you try to access tp_dict in such an object, you are accessing
random memory. This may immediately crash, or only crash when you pass
the pointer you got to the dictionary functions.
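
The "check the flag before touching the field" rule can be modelled in
Python. This is only a model with invented names (OldLayoutType,
NewLayoutType, type_dict, and the bit value of HAVE_CLASS are all
assumptions). In C, skipping the check reads memory past the end of
the shorter structure; in this model the mistake would merely raise
AttributeError.

```python
HAVE_CLASS = 1 << 8   # stand-in bit for Py_TPFLAGS_HAVE_CLASS (value assumed)


class OldLayoutType:
    """Type record compiled against the older, shorter layout: no tp_dict."""

    def __init__(self, name, flags=0):
        self.name = name
        self.flags = flags


class NewLayoutType(OldLayoutType):
    """Type record with the longer layout; its flags advertise that."""

    def __init__(self, name, flags=0):
        super().__init__(name, flags | HAVE_CLASS)
        self.tp_dict = {}


def type_dict(record):
    """Return tp_dict, but only after the flag says the field exists."""
    if not record.flags & HAVE_CLASS:
        raise TypeError("%s predates tp_dict" % record.name)
    return record.tp_dict


new = NewLayoutType("handle.Handle")
old = OldLayoutType("legacy.Thing")
type_dict(new)["__calldll__"] = "converters"
print(type_dict(new))
```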


From  Fri Jan 18 22:12:29 2002
From: (Guido van Rossum)
Date: Fri, 18 Jan 2002 17:12:29 -0500
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: Your message of "Fri, 18 Jan 2002 10:38:05 PST."
References: <> <>
Message-ID: <>

> I think that something in particular that Paul S. said got under your
> skin (and there was something he said that could certainly get under a
> person's skin). I'm pretty sure it isn't now a policy to rudely reject
> suggestions from people you haven't heard of! Until I went back through
> the thread I felt as Aahz did that your rejection was somewhat severe in
> tone. I think you (still) agree that people should not be afraid of
> (politely) stating their opinions in python-dev, even when those
> opinions disagree with yours. Or if there is an unspoken rule that
> unproven developers shouldn't be in python-dev then maybe we should just
> make it a spoken rule. But I'm most confident of the theory that you
> snapped at one person in particular because of something he said.
>  Paul Prescod

He harped at the same issue in three consecutive messages without
explaining his position.

--Guido van Rossum (home page:

From sdm7g@Virginia.EDU  Fri Jan 18 22:49:38 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Fri, 18 Jan 2002 17:49:38 -0500 (EST)
Subject: [Python-Dev] Re: several messages
In-Reply-To: <>
Message-ID: <>

On Fri, 18 Jan 2002, Guido van Rossum wrote:


On Fri, 18 Jan 2002, Martin v. Loewis wrote:

> This is bug #498915, fixed in abstract.h 2.43 and,
> abstract.c 2.94 and

I changed it back to  PyDict_...
With that patch, pyobjc seems to build and work with Python-2.2
as well as 2.1.2.

-- Steve.

From  Fri Jan 18 23:18:27 2002
From: (Jason Orendorff)
Date: Fri, 18 Jan 2002 17:18:27 -0600
Subject: [Python-Dev] Utopian String Interpolation
In-Reply-To: <>
Message-ID: <>

Paul Prescod:
> > [...] But I'm most confident of the theory that you
> > snapped at one person in particular because of something he said.

> He harped at the same issue in three consecutive messages without
> explaining his position.

Actually I was quite happy with the thread.

At runtime, Python tends to complain about iffy situations,
even situations that other languages might silently accept.
For example:

  print 50 + " percent"             # TypeError
  x = [1, 2, 3]; x.remove(4)        # ValueError
  x = {}; print x[3]                # KeyError
  a, b = "x,y,z,z,y".split()        # ValueError
  x.append(1, 2)                    # TypeError, recently
  print u"\N{EURO SIGN}"            # UnicodeError
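
For readers who want to try these, here is a rough equivalent for a
current Python 3 interpreter (an assumption; the list above is Python
2 syntax). The helper outcome and the function names are invented for
the sketch, and the last case encodes to ASCII explicitly, since
printing the euro sign no longer fails on a Unicode-capable terminal.

```python
def outcome(action):
    """Run action() and return the exception class it raises, or None."""
    try:
        action()
    except Exception as exc:
        return type(exc)
    return None


def add_int_and_str():
    return 50 + " percent"


def remove_missing():
    [1, 2, 3].remove(4)


def missing_key():
    return {}[3]


def unpack_mismatch():
    a, b = "x,y,z,z,y".split()   # split() yields a single element here


def append_two():
    [].append(1, 2)


def encode_euro_ascii():
    return "\N{EURO SIGN}".encode("ascii")


for check in (add_int_and_str, remove_missing, missing_key,
              unpack_mismatch, append_two, encode_euro_ascii):
    print(check.__name__, outcome(check).__name__)
```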

I'm not complaining.  I like the pickiness.
But the Python compiler (that is, Python's syntax) tends to be
more forgiving.  Examples:

  - Inconsistent use of tabs and spaces.  (Originally handled
    by; now an optional warning in Python itself.)
  - Useless or probably-useless expressions, like these:
      def g(f):
          os.environ['EDITOR']      # does nothing with value
          f.write(xx), f.write(yy)  # should be ; not ,
          f.close                   # obvious mistake
    (PyChecker catches the last one.)
  - Non-escaping backslashes in strings (there is a well-known
    reason for this one; but the reason no longer exists, in new
    code anyway, since 1.5.)

So we catch things like this with static analysis tools like, or lately PyChecker.  If Guido finds any of these
syntax-checks compelling enough, he can always incorporate them
into Python whenever (but don't hold your breath).
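
As a rough idea of what such a check involves, the sketch below uses
the ast module of a current Python 3 (an assumption; it did not exist
in this form in 2.2) to flag expression statements that contain no
call at all. useless_expressions is a made-up helper, not PyChecker's
algorithm, and it would also flag docstrings, which a real tool must
exempt.

```python
import ast

SAMPLE = '''\
def g(f):
    os.environ['EDITOR']
    f.write(xx), f.write(yy)
    f.close
'''


def useless_expressions(source):
    """Return line numbers of expression statements that contain no call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Expr):
            continue
        # Calls, yields and awaits may have side effects; keep those.
        has_effect = any(
            isinstance(child, (ast.Call, ast.Yield, ast.YieldFrom, ast.Await))
            for child in ast.walk(node.value)
        )
        if not has_effect:
            hits.append(node.lineno)
    return sorted(hits)


print(useless_expressions(SAMPLE))
```

The subscript and the bare f.close are reported; the tuple of two
write calls is not, because each element is a call. Catching that one
needs a more specific rule.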

Again, you'll get no complaints from me on this.  But I am
curious.  Is this apparent difference in pickiness a design
choice?  Or is it just harder to write picky compilers than
picky libraries?  Or am I seeing something that's not really
there?

## Jason Orendorff

From  Sat Jan 19 00:07:56 2002
From: (Jack Jansen)
Date: Sat, 19 Jan 2002 01:07:56 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <>
Message-ID: <>

On Friday, January 18, 2002, at 08:36  PM, Martin v. Loewis wrote:

>> Also, the method seems rather complicated for doing a simple 
>> thing. The
>> only thing I really want is a way to refer to an _New or 
>> _Convert method
>> from Python code.
> I believe the attached code implements your requirements.

Martin, hats off!

This does exactly what I want, and it does so in a pretty 
generalized way. Actually in _such_ a generalized way that I 
think this should be documented loud and clear.

Looking at it a bit more, how about storing each function 
pointer in a separate PyCObject, and adding general APIs 
somewhere in the core
void PyType_SetAnnotation(PyTypeObject *tp, char *name, char *descr, void *);
void *PyType_GetAnnotation(PyTypeObject *tp, char *name, char *descr);

(I've picked the name annotation here, because it sort-of feels 
like that, another name may bring the idea across better).
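
To make the proposal concrete, a Python model of the two calls might
look as follows. Everything here is hypothetical: set_annotation and
get_annotation only mimic the proposed C signatures, a side table
replaces the per-type PyCObject, and the lookup walks the MRO to
imitate attribute lookup on the type.

```python
_annotations = {}   # maps (type, name) -> (descr, value)


def set_annotation(tp, name, descr, value):
    """Attach a named value to a type, tagged with a description."""
    _annotations[(tp, name)] = (descr, value)


def get_annotation(tp, name, descr):
    """Fetch a named value, checking the description tag first."""
    for base in tp.__mro__:
        if (base, name) in _annotations:
            stored_descr, value = _annotations[(base, name)]
            if stored_descr != descr:
                raise TypeError("annotation %r has another description" % name)
            return value
    raise LookupError("no annotation %r on %s" % (name, tp.__name__))


class Handle:
    def __init__(self, handle):
        self.handle = handle


class SubHandle(Handle):
    pass


set_annotation(Handle, "convert", "calldll", lambda h: h.handle)
convert = get_annotation(SubHandle, "convert", "calldll")
print(convert(SubHandle(7)))
```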
> --
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Sat Jan 19 00:10:26 2002
From: (Tim Peters)
Date: Fri, 18 Jan 2002 19:10:26 -0500
Subject: [Python-Dev] deprecate input()?
In-Reply-To: <>
Message-ID: <>

[Skip Montanaro]
> Yes, but what if the program containing calls to input() gets shipped to
> someone else's computer?  It just seems to me that a) input is almost
> never what you want to call and that b) it would seem to a naive
> programmer to be the correct way to ask the user for a line of input.

One of my favorite papers for the upcoming Python Conference describes the
use of Python in a CAD system for chip design.  The authors had indeed used
input(), and didn't know that it eval'ed expressions.  The program's users
discovered it first, succumbing to a natural urge to type expressions in the
input fields.  One of the things that made this paper a favorite is that the
authors didn't whine about this:  to the contrary, they were delighted to
get the kudos for Guido's good intuition about what a kick-ass input()
function should do.
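
The behaviour in question is that input() evaluates what is typed,
roughly eval(raw_input()). A small sketch of the difference, written
for a current Python 3 interpreter (where input() returns the plain
string instead); legacy_input and raw_line are invented names, and
calling eval on untrusted text is exactly the risk Skip describes.

```python
def legacy_input(typed):
    """Mimic the old input(): evaluate what the user typed."""
    return eval(typed)


def raw_line(typed):
    """Mimic raw_input(): hand back the text unchanged."""
    return typed


for typed in ("42", "6 * 7", "max(3, 9)"):
    print(repr(raw_line(typed)), "->", repr(legacy_input(typed)))
```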

guido-never-drives-before-a-few-stiff-drinks-either<wink>-ly y'rs  - tim

From  Sat Jan 19 00:28:23 2002
From: (Martin v. Loewis)
Date: Sat, 19 Jan 2002 01:28:23 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <> (message from
 Jack Jansen on Sat, 19 Jan 2002 01:07:56 +0100)
References: <>
Message-ID: <>

> Martin, hats off!
> This does exactly what I want, and it does so in a pretty 
> generalized way. Actually in _such_ a generalized way that I 
> think this should be documented loud and clear.


> Looking at it a bit more, how about storing each function 
> pointer in a separate PyCObject, and adding general APIs 
> somewhere in the core
> void PyType_SetAnnotation(PyTypeObject *tp, char *name, char *descr, void *);
> void *PyType_GetAnnotation(PyTypeObject *tp, char *name, char *descr);

I'll happily add that to some recipe collection. However, before
generalizing it, I'd like to see more use cases. There should,
at least, be a *second* application beyond calldll (or, perhaps even
beyond MacPython). Generalizing from a single use case is not good.


From  Sat Jan 19 03:38:56 2002
From: (Guido van Rossum)
Date: Fri, 18 Jan 2002 22:38:56 -0500
Subject: [Python-Dev] When to signal an error
In-Reply-To: Your message of "Fri, 18 Jan 2002 17:18:27 CST."
References: <>
Message-ID: <>

(I'm changing the topic :-)

> At runtime, Python tends to complain about iffy situations,
> even situations that other languages might silently accept.

"Other languages" being Perl or JavaScript?  The situations you show
here would all be errors in most languages that are compiled to
machine code.

> For example:
>   print 50 + " percent"             # TypeError
>   x = [1, 2, 3]; x.remove(4)        # ValueError
>   x = {}; print x[3]                # KeyError
>   a, b = "x,y,z,z,y".split()        # ValueError
>   x.append(1, 2)                    # TypeError, recently
>   print u"\N{EURO SIGN}"            # UnicodeError
> I'm not complaining.  I like the pickiness.

That's why you're using Python. :-)

> But the Python compiler (that is, Python's syntax) tends to be
> more forgiving.  Examples:
>   - Inconsistent use of tabs and spaces.  (Originally handled
>     by; now an optional warning in Python itself.)
>   - Useless or probably-useless expressions, like these:
>       def g(f):
>           os.environ['EDITOR']      # does nothing with value
>           f.write(xx), f.write(yy)  # should be ; not ,
>           f.close                   # obvious mistake
>     (PyChecker catches the last one.)
>   - Non-escaping backslashes in strings (there is a well-known
>     reason for this one; but the reason no longer exists, in new
>     code anyway, since 1.5.)
> So we catch things like this with static analysis tools like
>, or lately PyChecker.  If Guido finds any of these
> syntax-checks compelling enough, he can always incorporate them
> into Python whenever (but don't hold your breath).
> Again, you'll get no complaints from me on this.  But I am
> curious.  Is this apparent difference in pickiness a design
> choice?  Or is it just harder to write picky compilers than
> picky libraries?  Or am I seeing something that's not really
> there?

There's no unifying reason why these examples are not errors.  The
first and last can be considered historical raisins -- the tabs/spaces
mix was considered a good thing in the days when Python only ran on
Unixoid systems where nobody would think about changing the display
size for tabs; we know the reason for the last.  But it's hard to
change these without inconveniencing users, and there are other ways
to deal with them (like picky tools).

The three examples in the second item have in common that they are
syntactically expressions but are used in a statement context.  The
problem here is one that any language designer faces: you would want
to allow expressions with an obvious side-effect, but you would want
to disallow expressions that obviously have no side-effects.  But
where to draw the line?  Traditional parsing technology such as used
in Python makes it hard to be very differentiating here; a good
analysis of which expressions "make sense" and which ones don't can
only be done during a later pass of the compiler.

I believe that eventually some PyChecker-like technology will be
incorporated in the Python compiler.  The same happened to C
compilers: the lint program became useless once GCC incorporated the
same technology.

But these warnings will always have a different status than purely
syntactical errors: there are often cases where the user knows better
(for example, sometimes an attribute reference can have a desirable
side effect).
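
A small illustration of that last case, assuming a current Python 3
interpreter and an invented LazyConnection class: reading the
attribute is the whole point of the statement, so a checker that
rejected it outright would be wrong.

```python
class LazyConnection:
    def __init__(self):
        self.opened = 0

    @property
    def handle(self):
        # Reading the attribute opens the connection on first use.
        if not self.opened:
            self.opened += 1
        return "handle-%d" % self.opened


conn = LazyConnection()
conn.handle          # bare expression statement, kept for its side effect
print(conn.opened)
```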

--Guido van Rossum (home page:

From  Sat Jan 19 19:25:18 2002
From: (Neal Norwitz)
Date: Sat, 19 Jan 2002 14:25:18 -0500
Subject: [Python-Dev] When to signal an error
References: <> <>
Message-ID: <>

Guido van Rossum wrote:

> I believe that eventually some PyChecker-like technology will be
> incorporated in the Python compiler.  The same happened to C
> compilers: the lint program became useless once GCC incorporated the
> same technology.

pychecker was (and still is) an experiment to me.  But I think 
it would be great if the lessons from pychecker could be integrated
into the compiler.

Currently, I think there are 2 or 3 warnings which definitely fit this class:
No global found, using ++/--, and expressions with no effect as Jason
described.  I have posted a patch on SF to demonstrate the feasibility
of expressions with no effect:

It should be pretty easy to warn about ++ and --.  No global found
would probably require another pass of the code after compilation.
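
To show why the ++/-- case is the easy one: Python parses ++x as two
nested unary plus operators, so a tree walk can spot it. The sketch
below relies on the ast module of a current Python 3 (an assumption;
the compiler of the time had no such interface), and double_unary is
a made-up helper rather than pychecker code.

```python
import ast


def double_unary(source):
    """Return (line, '++' or '--') for each doubled unary plus or minus."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.UnaryOp)
                and isinstance(node.op, (ast.UAdd, ast.USub))
                and isinstance(node.operand, ast.UnaryOp)
                and type(node.operand.op) is type(node.op)):
            symbol = "++" if isinstance(node.op, ast.UAdd) else "--"
            hits.append((node.lineno, symbol))
    return sorted(hits)


print(double_unary("x = 1\n++x\ny = --x\n"))
```

Mixed forms such as -(+x) are left alone, since they are unlikely to
be a misplaced increment.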

I'd be happy to help the process of integrating warnings into the compiler,
however, I'm not sure how to proceed.  Should pychecker be put into the
standard library (users can now do:  import pychecker.checker and all
modules imported are checked by installing an __import__)?  Should
pychecker be added as a tool?  Should a PEP be written?  etc.
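
For readers unfamiliar with the mechanism, replacing __import__ looks
roughly like this. The sketch targets a current Python 3 interpreter
(where the hook lives in the builtins module) and only records module
names; install_import_logger is an invented helper, and what pychecker
does with each imported module is not shown.

```python
import builtins


def install_import_logger(log):
    """Wrap builtins.__import__ so every import is recorded in log."""
    original = builtins.__import__

    def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
        module = original(name, globals, locals, fromlist, level)
        log.append(name)      # a real checker would inspect `module` here
        return module

    builtins.__import__ = logging_import
    return original


seen = []
saved = install_import_logger(seen)
try:
    import json
finally:
    builtins.__import__ = saved   # always restore the real hook

print("json" in seen)
```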

> But these warnings will always have a different status than purely
> syntactical errors: there are often cases where the user knows better
> (for example, sometimes an attribute reference can have a desirable
> side effect).

I agree.


From  Sat Jan 19 23:16:42 2002
From: (Jason Orendorff)
Date: Sat, 19 Jan 2002 17:16:42 -0600
Subject: [Python-Dev] When to signal an error
In-Reply-To: <>
Message-ID: <>

Neal Norwitz:
> Guido van Rossum:
> > But these warnings will always have a different status than purely
> > syntactical errors: there are often cases where the user knows better
> > (for example, sometimes an attribute reference can have a desirable
> > side effect).
> I agree.

Here's what Pychecker finds in the standard library (as of 2.2).
In each case, the expression is intended to raise an exception if
the named variable or attribute doesn't exist.

Each one could be rewritten (I'm curious as to the prevailing
stylistic opinions on this):

=== (lines 217 and 221)
        try:
            sys.ps1
        except AttributeError:
            sys.ps1 = ">>> "
        try:
            sys.ps2
        except AttributeError:
            sys.ps2 = "... "

Could be rewritten:
        if not hasattr(sys, 'ps1'):
            sys.ps1 = ">>> "
        if not hasattr(sys, 'ps2'):
            sys.ps2 = "... "

=== (line 721)

Could be rewritten:
if globals().has_key("LC_MESSAGES"):
    __all__.append("LC_MESSAGES")

=== (line 58)
try:
    UnicodeType
except NameError:
    UnicodeType = None

Could be rewritten:
globals().setdefault('UnicodeType', None)
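
The two spellings of the first case behave the same, which can be
checked on a stand-in object. This sketch assumes a current Python 3
interpreter; the ensure_prompt_* helpers are invented, and
types.SimpleNamespace replaces the sys module so the example does not
disturb a real interactive session.

```python
import types


def ensure_prompt_try(obj):
    try:
        obj.ps1                    # evaluated only to provoke the exception
    except AttributeError:
        obj.ps1 = ">>> "


def ensure_prompt_hasattr(obj):
    if not hasattr(obj, "ps1"):
        obj.ps1 = ">>> "


fresh_a = types.SimpleNamespace()
fresh_b = types.SimpleNamespace()
custom = types.SimpleNamespace(ps1="py> ")

ensure_prompt_try(fresh_a)
ensure_prompt_hasattr(fresh_b)
ensure_prompt_try(custom)
ensure_prompt_hasattr(custom)
print(repr(fresh_a.ps1), repr(fresh_b.ps1), repr(custom.ps1))
```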

## Jason Orendorff

From  Sat Jan 19 23:34:12 2002
From: (Jason Orendorff)
Date: Sat, 19 Jan 2002 17:34:12 -0600
Subject: [Python-Dev] When to signal an error
In-Reply-To: <>
Message-ID: <>

Guido van Rossum wrote:
> Jason Orendorff wrote:
> > At runtime, Python tends to complain about iffy situations,
> > even situations that other languages might silently accept.
> "Other languages" being Perl or JavaScript?  The situations you show
> here would all be errors in most languages that are compiled to
> machine code.
> > For example:
> >   print 50 + " percent"             # TypeError
> >   x = [1, 2, 3]; x.remove(4)        # ValueError
> >   x = {}; print x[3]                # KeyError
> >   a, b = "x,y,z,z,y".split()        # ValueError
> >   x.append(1, 2)                    # TypeError, recently
> >   print u"\N{EURO SIGN}"            # UnicodeError

Not to bicker, but Java only manages to reject 2 of the 6,
both at compile time.  The other 4 silently pass through the
standard library without complaint.  None cause exceptions
during execution.

ML makes no distinction between append(1, 2) and append((1, 2)),
but that's a syntax thing...  C++ STL remove() doesn't complain
if it doesn't find anything to remove; nor does the C++
map<>::operator[]() complain if no entry exists.

> > I'm not complaining.  I like the pickiness.
> That's why you're using Python. :-)

(laugh) You sell yourself short, Guido.  :)  I would still use
Python even if (50 + " percent") started evaluating to
"50 percent" tomorrow.

## Jason Orendorff

From  Sun Jan 20 00:02:10 2002
From: (Martin v. Loewis)
Date: Sun, 20 Jan 2002 01:02:10 +0100
Subject: [Python-Dev] When to signal an error
In-Reply-To: <>
References: <>
Message-ID: <>

> Each one could be rewritten (I'm curious as to the prevailing
> stylistic opinions on this):

I think those rewrites do not improve the code, see detailed comments
below.

> Could be rewritten:
>         if not hasattr(sys, 'ps1'):
>             sys.ps1 = ">>> "
>         if not hasattr(sys, 'ps2'):
>             sys.ps2 = "... "

Using string literals when you mean attribute names is bad style. It
just helps to trick the checker. Sometimes, you cannot avoid this
style, but if you can, you should.

> if globals().has_key("LC_MESSAGES"):
>     __all__.append("LC_MESSAGES")

This combines the previous issue with the usage of globals(). I find
it confusing to perform function calls to check for the presence of

> try:
>     UnicodeType
> except NameError:
>     UnicodeType = None
> Could be rewritten:
> globals().setdefault('UnicodeType', None)

Same issue here. If this needs to be rewritten, I'd prefer

try:
    from types import UnicodeType
except ImportError:
    UnicodeType = None

Somebody might also change the "from types import *" to explicitly
list the set of names that are requested, when changing this fragment.


From  Sun Jan 20 00:53:41 2002
From: (Guido van Rossum)
Date: Sat, 19 Jan 2002 19:53:41 -0500
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: Your message of "Fri, 18 Jan 2002 20:01:16 +0100."
References: <> <08c501c19f8c$72631b20$e000a8c0@thomasnotebook> <>
Message-ID: <>

> > Yes, alas.  The type you would have to declare is 'etype', a private
> > type in typeobject.c.
> Does this mean this is the wrong route, or is it absolutely impossible
> to create a subtype of PyType_Type in C with additional slots?

I wish I had time to explain this, but I don't.  For now, you'll have
to read how types are initialized in typeobject.c -- maybe there's a
way, maybe there isn't.

> Any tips about the route to take?

It can be done easily dynamically.

--Guido van Rossum (home page:

From  Sun Jan 20 12:11:57 2002
From: (M.-A. Lemburg)
Date: Sun, 20 Jan 2002 13:11:57 +0100
Subject: [Python-Dev] Extending types in C - help needed
References: <> <>
Message-ID: <>

[Martin's PyCObject based Handle object]

This seems to be very close to the __reduce__ idea I posted
on this thread a couple of days ago. Why not extend it to
fully support this standard Python protocol ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sun Jan 20 19:13:22 2002
From: (Martin v. Loewis)
Date: Sun, 20 Jan 2002 20:13:22 +0100
Subject: [Python-Dev] Extending types in C - help needed
In-Reply-To: <> (
References: <> <> <>
Message-ID: <>

> This seems to be very close to the __reduce__ idea I posted
> on this thread a couple of days ago. Why not extend it to
> fully support this standard Python protocol ?

Because it is not clear, to me, what specifically the semantics of
this protocol is. I wrote it to support MacOS calldll. I cannot see
applicability beyond this API.

One of the strengths of OO and polymorphism is precisely that users can
freely extend the protocols that their objects support, without
requiring *all* objects to support the protocol. A standard protocol
should be clearly useful cross-platform, for many different types, in
different applications.


From  Sun Jan 20 21:48:24 2002
From: (Joshua 'The List' S.)
Date: Sun, 20 Jan 2002 13:48:24 -0800
Subject: [Python-Dev] (no subject)
Message-ID: <>

From  Sun Jan 20 22:23:15 2002
From: (Ka-Ping Yee)
Date: Sun, 20 Jan 2002 16:23:15 -0600 (CST)
Subject: [Python-Dev] Python and Security
In-Reply-To: <>
Message-ID: <>

"M.-A. Lemburg" wrote:
> ... Note that Python hasn't really had a need
> for Perl's "taint" because of this. I wouldn't want to see that
> change in any way.

On Thu, 17 Jan 2002, Paul Prescod wrote:
> I am certainly not a Perl programmer but Python is also attackable
> through the sorts of holes that "taint" is intended to avoid.

Paul is right on the money.  Tainting is a completely separate issue.

That said, however, i wonder why security rarely comes up as an
issue for Python.  Is it because nobody expects security properties
from the language?  Does anyone know how much the restricted
execution feature gets used?  Is there anyone here that would use
a tainting feature if it existed?

It would be interesting to explore the possibilities for safe
distributed programming in Python.  Restricted execution mode and the
ability to hook __import__ seem like a pretty strong starting point,
and given a suitable cryptographic comm library, it might be feasible
to get from there to capability-style distributed programming.

IMHO, simplicity and readability are extremely important for a secure
programming language, so that gives Python a great head start.

(By the way, i'm planning to be at Python 10, and hope to see many
of you there.  As i'm looking for ways to keep costs down, would
anyone be interested in splitting the cost of a hotel room in
exchange for a roommate with a strange hairstyle?  I'll be there
Feb 4 to 7, three nights.)

-- ?!ng

From  Sun Jan 20 22:37:11 2002
From: (Martin v. Loewis)
Date: Sun, 20 Jan 2002 23:37:11 +0100
Subject: [Python-Dev] Python and Security
In-Reply-To: <>
 (message from Ka-Ping Yee on Sun, 20 Jan 2002 16:23:15 -0600 (CST))
References: <>
Message-ID: <>

> That said, however, i wonder why security rarely comes up as an
> issue for Python.  Is it because nobody expects security properties
> from the language?  Does anyone know how much the restricted
> execution feature gets used?  Is there anyone here that would use
> a tainting feature if it existed?

In my understanding, tainting is needed if you allow data received
from remote to invoke arbitrary operations. In Python, there is only a
short list where this might cause a problem:

- invoking exec or eval on a string of unknown origin
- unpickling an arbitrary string
- performing getattr with a parameter of unknown origin.

Because there are so few places where tainted data may cause problems,
it never is an issue: people just intuitively know to avoid them.
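
A minimal sketch of how the operations listed above can be kept away from untrusted data in present-day Python. ast.literal_eval postdates this thread, and ALLOWED_ATTRS and the helper names are hypothetical. Unpickling has no comparable safe mode, so the usual advice is to accept a data-only format instead.

```python
import ast

# Hypothetical allowlist of attribute names a remote caller may request.
ALLOWED_ATTRS = {"upper", "lower", "title"}


def parse_literal(text):
    """Parse a Python literal without executing code (unlike eval)."""
    return ast.literal_eval(text)


def call_string_method(value, attr_name):
    """Call attr_name on value only if the name is on the allowlist."""
    if attr_name not in ALLOWED_ATTRS:
        raise ValueError("attribute not allowed: %r" % (attr_name,))
    return getattr(value, attr_name)()


def is_rejected(text):
    """Return True when parse_literal refuses text as a non-literal."""
    try:
        parse_literal(text)
    except (ValueError, SyntaxError):
        return True
    return False
```

The point is the one made above: the dangerous spots are few enough that each can be wrapped individually.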

> It would be interesting to explore the possibilities for safe
> distributed programming in Python.  

Not sure what this has to do with tainting, though: if you want to
execute code you receive from untrusted sources, a sandbox is closer
to what you need.


From  Sun Jan 20 23:01:44 2002
From: (Barry A. Warsaw)
Date: Sun, 20 Jan 2002 18:01:44 -0500
Subject: [Python-Dev] Python and Security
References: <>
Message-ID: <>

>>>>> "MvL" == Martin v Loewis <> writes:

    | - invoking exec or eval on a string of unknown origin
    | - unpickling an arbitrary string
    | - performing getattr with a parameter of unknown origin.

Don't forget os.system(), popen(), and friends, i.e. passing
unsanitized strings to the shell.  In my long rusty Perl
experience, this was the most common reason to use taint strings.

Python OTOH really has very little need to call out to the shell;
almost everything you'd want to do that way can be done in pure
Python.  There are some opportunities for improving string sanitization
for the few instances where os.system() is necessary.
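
A minimal sketch of the two usual ways to keep an unsanitized string from reaching the shell, using the subprocess and shlex modules of current Python (neither call existed in this form when this was written). The function names are hypothetical.

```python
import shlex
import subprocess


def list_directory(path):
    """Run ls with path as one argument; no shell parses the string."""
    result = subprocess.run(
        ["ls", "--", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def shell_command_for(path):
    """Build a command line for a POSIX shell with path safely quoted."""
    return "ls -- " + shlex.quote(path)
```

Passing an argument list avoids the shell entirely; quoting is the fallback when a shell command string is unavoidable.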

Most of the security issues I've had to deal with in Mailman have been
in library modules -- or the use thereof, not in the language itself.
Things like vulnerabilities in or pickle/marshal, or
cross-site scripting exploits, that kind of thing.  There are also
more subtle issues that would be interesting to explore, like DoS
attacks with thru-the-web regular expression searching, deliberate
form confuddling, and some of the ttw code execution stuff that
e.g. Zope gets into.  Rexec is an incomplete solution to the latter.


From  Sun Jan 20 23:49:58 2002
From: (Paul Prescod)
Date: Sun, 20 Jan 2002 15:49:58 -0800
Subject: [Python-Dev] Re: Python and Security
References: <>
Message-ID: <>

Ka-Ping Yee wrote:
> That said, however, i wonder why security rarely comes up as an
> issue for Python. 

I guess you didn't read comp.lang.python this week. ;)

> ... Is it because nobody expects security properties
> from the language?  

Remember that people for a long time thought of Perl as a "CGI
language". And early uses of CGI would probably have depended heavily on
the Perl equivalents of "popen" and "system". Plus, those features are
so easy to get at in the language. Compare:

print `ls`


import os

print os.popen("ls").read()

If you were a newbie in each of these languages, what is the percentage
chance of you using either of these features versus the list-dir
equivalent?  List-dir is available in each language.
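
For comparison, a minimal sketch of the list-dir equivalent in Python, which never starts a subshell. The function name list_names is hypothetical.

```python
import os


def list_names(directory="."):
    """Return the entries of directory, sorted, without invoking a shell."""
    return sorted(os.listdir(directory))
```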

> ... Does anyone know how much the restricted
> execution feature gets used?  

I personally would not trust it because I don't know if anyone is
following its progress from one version of Python to another. I also
know that even languages that are designed from scratch to be safe (Java
and JavaScript) have had leaky implementations so I don't really hold out
much hope for Python until I hear that someone is actively researching

> ... Is there anyone here that would use
> a tainting feature if it existed?

I'd like to think I've internalized taints rules by osmosis...

> (By the way, i'm planning to be at Python 10, and hope to see many
> of you there.  As i'm looking for ways to keep costs down, would
> anyone be interested in splitting the cost of a hotel room in
> exchange for a roommate with a strange hairstyle?  I'll be there
> Feb 4 to 7, three nights.)

Maybe there should be a bulletin board or something for people to find
each other. I think one of the Python conferences had something like
that...for hotels and also to share cabs from the airport.

 Paul Prescod

From  Mon Jan 21 00:11:27 2002
From: (Simon Cozens)
Date: Mon, 21 Jan 2002 00:11:27 +0000
Subject: [Python-Dev] Python and Security
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Sun, Jan 20, 2002 at 11:37:11PM +0100, Martin v. Loewis wrote:
> In my understanding, tainting is needed if you allow data received
> from remote to invoke arbitrary operations. In Python, there is only a
> short list where this might cause a problem:
> - invoking exec or eval on a string of unknown origin
> - unpickling an arbitrary string
> - performing getattr with a parameter of unknown origin.

From a Perl point of view, tainting is there to stop data received from
outside from doing *anything* related to the system. This includes what you say,
but goes further:
    - open
    - os.popen (in fact, most of os.*)
    - socket (no, really) and everything that depends on it (urllib, etc.)

Since Python has rexec for this sort of thing, tainting may not be so
important, but I think rexec goes too far. The idea of tainting is not
to *disallow* using, say, arbitrary user input from CGI scripts as
filenames - it's to help the programmer segregate which pieces of data need
special treatment before being passed to these kinds of functions.

Rule the Empire through force.
		-- Shogun Tokugawa

From  Mon Jan 21 01:38:59 2002
From: (Aahz Maruch)
Date: Sun, 20 Jan 2002 17:38:59 -0800 (PST)
Subject: [Python-Dev] Python and Security
In-Reply-To: <> from "Barry A. Warsaw" at Jan 20, 2002 06:01:44 PM
Message-ID: <>

Barry A. Warsaw wrote:
> >>>>> "MvL" == Martin v Loewis <> writes:
>     | - invoking exec or eval on a string of unknown origin
>     | - unpickling an arbitrary string
>     | - performing getattr with a parameter of unknown origin.
> Don't forget os.system(), popen(), and friends, i.e. passing
> unsanitized strings to the shell.  In my long rusty Perl
> experience, this was the most common reason to use taint strings.

More precisely, because Perl culture developed as a superset of shell
scripts, it used to be all-too-common for Perl scripts to get their data
by parsing the output of a Unix utility (instead of calling a library
function directly).  This necessarily spawned a subshell where malicious
input could be a security problem.  (When I was learning Perl, the
available books often taught this programming style.)

I've heard that Perl culture has changed, but the taint capability is
still there because too many Perlers stick to their trusty poor habits.

Pythonistas, of course, never learned bad habits.  ;-)
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Mon Jan 21 02:06:53 2002
From: (Simon Cozens)
Date: Mon, 21 Jan 2002 02:06:53 +0000
Subject: [Python-Dev] Python and Security
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Sun, Jan 20, 2002 at 05:38:59PM -0800, Aahz Maruch wrote:
> More precisely, because Perl culture developed as a superset of shell
> scripts, it used to be all-too-common for Perl scripts to get their data
> by parsing the output of a Unix utility (instead of calling a library
> function directly).  This necessarily spawned a subshell where malicious
> input could be a security problem.

Not so.

This is what taint is: Taint tells you where there's some shit you want
to clean up. 

If you ask the user for a filename to write to, taint tells you that
you'd better check for leading slashes, double dots and the like before
writing to it. If you're about to run an external program, taint tells
you that you might not want to believe the user's idea of what $PATH
ought to be. If you're getting a URL from somewhere, taint tells you
that you should probably think twice before happily passing back
file:///etc/shadow. And so on and so forth. None of these examples are
about input to a subshell.
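
A minimal sketch of the filename check described above (leading slashes, double dots), written in Python since that is the language under discussion. This is not a taint mechanism, only the kind of cleanup that taint would remind a programmer to do. The names safe_join and is_safe are hypothetical.

```python
import os


def safe_join(base_dir, user_name):
    """Join user_name onto base_dir, refusing results outside base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_name))
    # An absolute name or a run of ".." moves candidate out of base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("unsafe filename: %r" % (user_name,))
    return candidate


def is_safe(base_dir, user_name):
    """Return True when user_name stays inside base_dir."""
    try:
        safe_join(base_dir, user_name)
    except ValueError:
        return False
    return True
```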

I'm not in a position to say whether or not Python needs taint; if it
had it, I probably wouldn't use the feature. But let's not misunderstand
what it's for.

Thermodynamics in a nutshell:
1st Law:  You can't win.  (Energy is conserved)
2nd Law:  You can't break even.  (Entropy)
0th Law:  You can't even quit the game.  (Closed systems) -- Taki Kogoma

From  Mon Jan 21 02:27:59 2002
From: (Paul Prescod)
Date: Sun, 20 Jan 2002 18:27:59 -0800
Subject: [Python-Dev] When to signal an error
References: <> <>
Message-ID: <>

"Martin v. Loewis" wrote:
> > Could be rewritten:
> >         if not hasattr(sys, 'ps1'):
> >             sys.ps1 = ">>> "
> >         if not hasattr(sys, 'ps2'):
> >             sys.ps2 = "... "
> Using string literals when you mean attribute names is bad style. It
> just helps to trick the checker. 

Just for the record, I think that Jason's rewrites were clearer in every
case because they said exactly what he was trying to do.

"If the sys module has the attribute ps1 then ..."

This is much clearer than "Get the ps1 attribute from the sys module and
throw it away.".

Python has functions specifically for checking for the existence of
attributes and keys. Why not use them?

Plus, I think that exceptions should be (as far as possible) reserved
for exceptional situations. Using them as tests is not as compact,
not as readable and not as runtime efficient.

But more to the point, any of these could have been rewritten as:

_junk = sys.ps1

That would shut up compiler messages without forcing you to use the
haskey/hasattr style.
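
A minimal sketch that puts the styles under discussion side by side. The third form, getattr with a default, is an addition not mentioned in the thread.

```python
import sys

# Style 1: ask whether the attribute exists.
if not hasattr(sys, "ps1"):
    sys.ps1 = ">>> "

# Style 2: attempt the access and handle the failure.
try:
    sys.ps2
except AttributeError:
    sys.ps2 = "... "

# Style 3: a single expression with a fallback value.
prompt = getattr(sys, "ps1", ">>> ")
```

All three leave sys.ps1 and sys.ps2 defined; they differ only in how the intent reads.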

 Paul Prescod

From  Mon Jan 21 10:24:54 2002
From: (Michael Hudson)
Date: 21 Jan 2002 10:24:54 +0000
Subject: [Python-Dev] When to signal an error
In-Reply-To: Neal Norwitz's message of "Sat, 19 Jan 2002 14:25:18 -0500"
References: <> <> <>
Message-ID: <>

Neal Norwitz <> writes:

> Currently, I think there are 2 or 3 warnings which definitely fit this class:
> No global found, using ++/--, and expressions with no effect as Jason
> described.

It would sure be nice if using a variable before assignment produced a
warning at compile time.  However I think this needs flow analysis and
you won't catch me trying to add that to compile.c.
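
A minimal sketch of why this warning needs flow analysis: whether the name is bound depends on which branch ran, so the problem only shows up at run time. The function names are hypothetical.

```python
def pick(flag):
    if flag:
        value = 1
    # 'value' is bound only on the path where flag was true.
    return value


def fails_unbound(flag):
    """Return True when pick(flag) hits an unbound local variable."""
    try:
        pick(flag)
    except UnboundLocalError:
        return True
    return False
```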


  MGM will not get your whites whiter or your colors brighter.
  It will, however, sit there and look spiffy while sucking down
  a major honking wad of RAM.              --

From Samuele Pedroni" <  Mon Jan 21 13:20:15 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Mon, 21 Jan 2002 14:20:15 +0100
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
Message-ID: <001201c1a27e$5d8ec060$6d94fea9@newmexico>

Hi. Thanks to

I landed in

Peter Norvig is about to supply
Python versions of the algorithms with
the 2nd edition of his AI: A Modern Approach.

So far, so good. In the section about
coding conventions he says:

¦In general, follow Guido's style conventions,
¦but I have some quirks that I prefer (although I could be talked out of them):
¦* _ instead of self as first argument to methods: def f(_, x):

I'm perfectly aware that the 'self' thing is just a convention,
OTOH much of the cross-programmer readability
of code relies on such convention.

Is it good, bad or irrelevant to have such
an authoritative book (although about AI not
Python directly) adopting such a line-noisy
convention?

Maybe nobody cares, but I preferred not to
let this go unnoticed. Someone who cares
could try to discuss the issue or make it
apparent to Mr. Norvig.

Opinions?


regards, Samuele Pedroni.

From  Sun Jan 20 22:43:59 2002
From: (Jeremy Hylton)
Date: Sun, 20 Jan 2002 17:43:59 -0500
Subject: [Python-Dev] When to signal an error
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "NN" == Neal Norwitz <> writes:

  NN> Guido van Rossum wrote:
  >> I believe that eventually some PyChecker-like technology will be
  >> incorporated in the Python compiler.  The same happened to C
  >> compilers: the lint program became useless once GCC incorporated
  >> the same technology.

  NN> pychecker was (and still is) an experiment to me.  But I think
  NN> it would be great if the lessons from pychecker could be
  NN> integrated into the compiler.

Me, too.

  NN> I'd be happy to help the process of integrating warnings into
  NN> the compiler, however, I'm not sure how to proceed.  Should
  NN> pychecker be put into the standard library (users can now do:
  NN> import pychecker.checker and all modules imported are checked by
  NN> installing an __import__)?  Should pychecker be added as a tool?
  NN> Should a PEP be written?  etc.

How much of pychecker's work could be done by the compiler itself?
I'd like to see more of the warnings generated during compilation, but
agree with Michael Hudson that extending it is a lot of work.  Perhaps
it's time to redesign the compiler.

A PEP is probably good for more than one reason.  One reason is to
document the warnings that are generated and the rationale for them.
If you integrate it into the compiler, the PEP is a good place to
capture some design info.


From  Sun Jan 20 22:44:39 2002
From: (Jeremy Hylton)
Date: Sun, 20 Jan 2002 17:44:39 -0500
Subject: [Python-Dev] When to signal an error
In-Reply-To: <>
References: <>
Message-ID: <>

We could talk about this at the conference.


From  Mon Jan 21 16:07:59 2002
From: (Guido van Rossum)
Date: Mon, 21 Jan 2002 11:07:59 -0500
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: Your message of "Mon, 21 Jan 2002 14:20:15 +0100."
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico>
Message-ID: <>

> Peter Norvig is about to supply
> Python versions of the algorithms with
> the 2nd edition of his AI: A Modern Approach.
> So far, so good. In the section about
> coding conventions he says:
> ¦In general, follow Guido's style conventions,
> ¦but I have some quirks that I prefer (although I could be talked out of them):
> ...
> ¦* _ instead of self as first argument to methods: def f(_, x):
> ...
> I'm perfectly aware that the 'self' thing is just a convention,
> OTOH much of the cross-programmer readability
> of code relies on such convention.
> Is it good, bad or irrelevant to have such
> an authoritative book (although about AI not
> Python directly) adopting such a line-noisy
> convention?
> Maybe nobody cares, but I preferred not to
> let this go unnoticed. Someone who cares
> could try to discuss the issue or make it
> apparent to Mr. Norvig.
> Opinions?
> regards, Samuele Pedroni.


Peter:

My apologies for butting in here without doing full research.  I don't
know how you reached this set of conventions, so maybe you've got a
very good reason; but I don't see it on your webpage.

Two of those coding conventions look really ugly to me: 2-space
indents and _ for self.  I think the code will look horrible!

I think everyone should be able to make their own style choices, but I
ask you to reconsider.  If you have to reconsider one, I would beg you
to use 'self' like everybody else.  The _ name is already overloaded
with multiple meanings in the Python community: it's a shorthand for
the last evaluated expression in interactive mode, and some people use
it as a dummy variable to assign uninteresting results to.

Almost the entire Python community is happy with 4-space indents; if
you're worried about your lines getting too long, that's usually a
hint that your code can be restructured in a way that's easier on the
reader's eye/mind anyway.

--Guido van Rossum (home page:

From  Mon Jan 21 16:10:04 2002
From: (Barry A. Warsaw)
Date: Mon, 21 Jan 2002 11:10:04 -0500
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico>
Message-ID: <>

>>>>> "GvR" == Guido van Rossum <> writes:

    GvR> The _ name is already overloaded with multiple meanings in
    GvR> the Python community: it's a shorthand for the last evaluated
    GvR> expression in interactive mode, and some people use it as a
    GvR> dummy variable to assign uninteresting results to.

It's also the common name of a function in internationalized Python
applications (mostly inherited from established conventions in the C


From  Mon Jan 21 18:16:51 2002
From: (Peter Norvig)
Date: Mon, 21 Jan 2002 10:16:51 -0800
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico> <>
Message-ID: <>

Wow; I didn't expect this to generate such a response.  But I did post
the code far before it was ready and put the "I could be talked out of
it" there for a reason. So, thank you for your feedback!  My reactions:

4 spaces: OK.

I have no strong feelings on that, and I think it's just an accident of
the way my emacs was configured that I started using 2 spaces. I agree
that I should make it easier for other people to edit my code, so I'll
switch to the default.

self: OK, I'll try it.

My rationale was: I'm used to Java, where self is usually spelled '',
and I figured '_' was the next best thing. I find it much nicer to read
because 'self' is too intrusive; I want something that disappears.

Compare:

	_.x, _.y, _.z = x, y, z
	self.x, self.y, self.z = x, y, z

Besides saving 9 characters, I find that the first line I can read at a
glance, ignoring the _, while the second I have to look at more
carefully. I also like the symmetry of _._ in _._private_slot.  However,
I recognize I'm doing this as an outsider to the language without much
experience reading/writing it. If it is really true that using '_' would
be seen as a change to the language and not a personal quirk, then I
agree that I shouldn't do it.  The first hint I had of this was when I
saw something on comp.lang.python (I forget the details) suggesting that
an automated tool look for methods with first argument 'self'. So I'll
try 'self' for a while, and hope I learn to like it (and learn to read
the second sample line above in one glance).  If I don't, I'll write
here and give you all another chance to inundate me with reasons why I
should.

-Peter

PS - Getting a personal request from Guido reminds me of the time I was
at a conference and John McCarthy walked up to the booth of one of the
Lisp vendors and said in his usual direct fashion "I hear you have a new
version. You should send me one".  The booth bimbo had no idea who
McCarthy was and politely suggested he pay for a copy.  Then someone in
the booth with a little more experience came over and said "That's ok --
it's his language, he can have whatever he wants."

Guido van Rossum wrote:
> >
> >
> > Peter Norvig is about to supply
> > Python versions of the algorithms with
> > the 2nd edition of his AI: A Modern Approach.
> >
> > So far, so good. In the section about
> > coding conventions he says:
> >
> > ¦In general, follow Guido's style conventions,
> > ¦but I have some quirks that I prefer (although I could be talked out of them):
> > ...
> > ¦* _ instead of self as first argument to methods: def f(_, x):
> > ...
> >
> > I'm perfectly aware that the 'self' thing is just a convention,
> > OTOH much of the cross-programmer readability
> > of code relies on such convention.
> >
> > Is it good, bad or irrelevant to have such
> > an authoritative book (although about AI not
> > Python directly) adopting such a line-noisy
> > convention?
> >
> > Maybe nobody cares, but I preferred not to
> > let this go unnoticed. Someone who cares
> > could try to discuss the issue or make it
> > apparent to Mr. Norvig.
> >
> > Opinions?
> >
> > regards, Samuele Pedroni.
> Peter:
> My apologies for butting in here without doing full research.  I don't
> know how you reached this set of conventions, so maybe you've got a
> very good reason; but I don't see it on your webpage.
> Two of those coding conventions look really ugly to me: 2-space
> indents and _ for self.  I think the code will look horrible!
> I think everyone should be able to make their own style choices, but I
> ask you to reconsider.  If you have to reconsider one, I would beg you
> to use 'self' like everybody else.  The _ name is already overloaded
> with multiple meanings in the Python community: it's a shorthand for
> the last evaluated expression in interactive mode, and some people use
> it as a dummy variable to assign uninteresting results to.
> Almost the entire Python community is happy with 4-space indents; if
> you're worried about your lines getting too long, that's usually a
> hint that your code can be restructured in a way that's easier on the
> reader's eye/mind anyway.
> --Guido van Rossum (home page:

Peter Norvig, Director of Machine Learning, Google,,  Voice:650-330-0100 x1248,  Fax:650-618-1499

From  Mon Jan 21 19:02:49 2002
From: (Guido van Rossum)
Date: Mon, 21 Jan 2002 14:02:49 -0500
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: Your message of "Mon, 21 Jan 2002 10:16:51 PST."
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico> <>
Message-ID: <>

> Wow; I didn't expect this to generate such a response.  But I did post
> the code far before it was ready and put the "I could be talked out of
> it" there for a reason. So, thank you for your feedback!  My reactions:

You're welcome.  I'm always there to save a straying stranger. :-)

> self: OK, I'll try it. 
> My rationale was: I'm used to Java, where self is usually spelled '',
> and I figured '_' was the next best thing. I find it much nicer to read
> because 'self' is too intrusive; I want something that disappears. 

I hear that in the Lisp world, when someone complains about the
parentheses, the standard response is "once you're used to it, the
parentheses disappear".  So it is for Python's 'self'.  :-)

> Compare:
> 	_.x, _.y, _.z = x, y, z
> 	self.x, self.y, self.z = x, y, z
> Besides saving 9 characters, I find that the first line I can read at a
> glance, ignoring the _, while the second I have to look at more
> carefully. I also like the symmetry of _._ in _._private_slot.  However,
> I recognize I'm doing this as an outsider to the language without much
> experience reading/writing it. If it is really true that using '_' would
> be seen as a change to the language and not a personal quirk, then I
> agree that I shouldn't do it.  The first hint I had of this was when I
> saw something on comp.lang.python (I forget the details) suggesting that
> an automated tool look for methods with first argument 'self'. So I'll
> try 'self' for a while, and hope I learn to like it (and learn to read
> the second sample line above in one glance).  If I don't, I'll write
> here and give you all another chance to innundate me with reasons why I
> should.


> -Peter
> PS - Getting a personal request from Guido reminds me of the time I was
> at a conference and John McCarthy walked up to the booth of one of the
> Lisp vendors and said in his usual direct fashion "I hear you have a new
> version. You should send me one".  The booth bimbo had no idea who
> McCarthy was and politely suggested he pay for a copy.  Then someone in
> the booth with a little more experience came over and said "That's ok --
> it's his language, he can have whatever he wants."

What's a booth bimbo? :-)

--Guido van Rossum (home page:

From  Mon Jan 21 20:47:59 2002
From: (Jason Orendorff)
Date: Mon, 21 Jan 2002 14:47:59 -0600
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: <001201c1a27e$5d8ec060$6d94fea9@newmexico>
Message-ID: <>

> ] In general, follow Guido's style conventions,
> ] but I have some quirks that I prefer (although I could be talked 
> ] out of them):
> ...
> ] * _ instead of self as first argument to methods: def f(_, x):
> ...

I dunno; I think sample code should (a) stick rather conservatively
to typical usage, apart from the concept being illustrated of course;
and (b) strive for maximum readability.

For Python, both principles demand that one should write:

    def foo(bar):
        if is_list(bar):
            return sum(map(foo, bar))
        else:
            return [bar]

instead of:

    def foo(bar):
      if is_list(bar):  return sum(map(foo, bar))
      else:  return [bar]

This may be one of those things that only makes sense if you're
not a Lisp programmer.  (wink)

To stray from the topic:  I find I only disagree with three points
in Peter Norvig's enlightening table of Lisp vs. Python features.

1.  That "x.slot = y" is not user-extensible.
    The __setattr__() method does this.

2.  That Python's relative lack of control structures is
    necessarily worse than Lisp's abundance of them.
    Especially for students, I think this:

      if is_list(n):
          return foo_l(n)
      elif is_str(n) or is_int(n):
          return foo_a(n)
      else:
          raise TypeError

    is at least as clear, though not as brief, as this:

      (etypecase n
                 (list (foo-l n))
                 ((or string integer) (foo-a n)))

    with the obligatory note in the text to the effect that
    "'Etypecase' is a form similar to 'case' which selects
    a clause based on the type..." and so on.

3.  That Python doesn't support generic programming.
    Generic algorithms are expressed as naturally in Python
    as in any language I know:

      from operator import add
      def sum(items):
          return reduce(add, items)

      >>> sum([3, 4, 5])
      12
      >>> sum([3, 4j, 4-2j])
      (7+2j)
      >>> sum(["py", "th", "o", "n"])
      'python'

    Likewise it's natural to write functions that can operate
    on "any sequence", not just lists or tuples, "any file-like
    object", not just a real file, "any function-like object",

    Perhaps something more specific is meant by "generic

## Jason Orendorff

From  Mon Jan 21 21:57:40 2002
From: (Martin v. Loewis)
Date: Mon, 21 Jan 2002 22:57:40 +0100
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: <001201c1a27e$5d8ec060$6d94fea9@newmexico> (
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico>
Message-ID: <>

> Opinions?

I dislike it, because _ is already taken for two things: for the last
expression in interactive mode, and as a markup of translatable


From  Mon Jan 21 23:22:32 2002
From: (Peter Norvig)
Date: Mon, 21 Jan 2002 15:22:32 -0800
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
References: <001201c1a27e$5d8ec060$6d94fea9@newmexico> <>
 <> <>
Message-ID: <>

Guido van Rossum wrote:
> I hear that in the Lisp world, when someone complains about the
> parentheses, the standard response is "once you're used to it, the
> parentheses disappear".  So it is for Python's 'self'.  :-)

That may be a good analogy, and as I said, I'm willing to try.  But I
still think one character is easier to ignore than four, and that there
is no compelling argument for 'self' over '_', while there is a positive
reason for parens (ease of automated parsing tools). 

> What's a booth bimbo? :-)

"It's not a sexist phenomenon as such, applying equally to the pretty
young men and women who work as scenery at various booths. Universally,
these people have no clue about the products they represent; instead
they hand out buttons and propaganda, smile nicely, and act as props for
the larger show that goes on around them."

> --Guido van Rossum (home page:

From  Tue Jan 22 00:06:51 2002
From: (Jason Orendorff)
Date: Mon, 21 Jan 2002 18:06:51 -0600
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: <>
Message-ID: <>

Peter Norvig wrote:
> Guido van Rossum wrote:
> > I hear that in the Lisp world, when someone complains about the
> > parentheses, the standard response is "once you're used to it, the
> > parentheses disappear".  So it is for Python's 'self'.  :-)
> That may be a good analogy, and as I said, I'm willing to try.

It's an excellent analogy:  both statements are about 1/3 true
in my experience.  :-)

> But I still think one character is easier to ignore than four,
> and that there is no compelling argument for 'self' over '_',
> while there is a positive reason for parens (ease of automated
> parsing tools).

There is no especially compelling reason for Python to have
'self' over '_' or 'me' or '@' or ''.

However, there is a compelling reason for you to choose 'self':
"Prefer the standard to the offbeat."  --Strunk and White

## Jason Orendorff

From Anthony Baxter <>  Tue Jan 22 00:14:37 2002
From: Anthony Baxter <> (Anthony Baxter)
Date: Tue, 22 Jan 2002 11:14:37 +1100
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
In-Reply-To: Message from Peter Norvig <>
 of "Mon, 21 Jan 2002 15:22:32 -0800." <>
Message-ID: <>

>>> Peter Norvig wrote
> That may be a good analogy, and as I said, I'm willing to try.  But I
> still think one character is easier to ignore than four, and that there
> is no compelling argument for 'self' over '_', while there is a positive
> reason for parens (ease of automated parsing tools). 
The primary arguments against '_' are that it already has meaning. I can 
think of three, off the top of my head.
  Interactive mode uses this as "result of last expression".
  The i18n code uses it as a function _('translate me').
  Zope uses it in DTML (python) expressions as the default namespace.

I'd also add the subjective argument that it's ugly, and looks far too
magical and perl-like. I don't _want_ it to disappear into the background,
as it's going to cause me pain if I miss it.
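
The i18n clash can be sketched in today's Python roughly as follows. The class names are hypothetical, and `gettext.NullTranslations` stands in for a real translation catalog (an assumption made only so the snippet runs without any message files).

```python
import gettext

# Module-level '_' in its conventional i18n role. NullTranslations returns
# the message unchanged, so the sketch does not depend on any catalog.
_ = gettext.NullTranslations().gettext


class WithSelf:
    def label(self):
        # '_' still refers to the module-level translation function.
        return _("translate me")


class WithUnderscore:
    def label(_):
        # Here '_' is the instance itself, so this tries to call the instance.
        return _("translate me")


print(WithSelf().label())
try:
    WithUnderscore().label()
except TypeError as exc:
    print("shadowed:", exc)
```

Inside any method that names its first argument '_', the translation function is no longer reachable under that name.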


From  Tue Jan 22 00:27:04 2002
From: (Peter Norvig)
Date: Mon, 21 Jan 2002 16:27:04 -0800
Subject: [Python-Dev] OT: style convention: self vs. _ in new Norvig's book
References: <>
Message-ID: <>

OK, OK; When both Guido and E. B. team up against me, I know I'm licked.


Jason Orendorff wrote:
> There is no especially compelling reason for Python to have
> 'self' over '_' or 'me' or '@' or ''.
> However, there is a compelling reason for you to choose 'self':
> "Prefer the standard to the offbeat."  --Strunk and White

From  Tue Jan 22 14:35:15 2002
From: (
Date: Tue, 22 Jan 2002 08:35:15 -0600
Subject: [Python-Dev] Bug? is Tkinter+no threads+Windows supported?
Message-ID: <15437.30883.138962.301012@dynamic2.tttech1.ttt>

My client is trying to build a version of Python on Windows with Tkinter and
pymalloc enabled, and threads disabled (in part because pymalloc is not
thread-safe).  There appears to be a bug in _tkinter.c:EventHook.  It has
this code:

    #if defined(WITH_THREAD) || defined(MS_WINDOWS)
                    PyThread_acquire_lock(tcl_lock, 1);
                    tcl_tstate = event_tstate;

                    result = Tcl_DoOneEvent(TCL_DONT_WAIT);

                    tcl_tstate = NULL;
                    if (result == 0)
                    result = Tcl_DoOneEvent(0);

It seems on the surface that the "|| defined(MS_WINDOWS)" bit should be
deleted.  This code dates from 1998 and comes with this log text:

    revision 1.72
    date: 1998/06/13 13:56:28;  author: guido;  state: Exp;  lines: +26 -6
    Fixed the EventHook() code so that it also works on Windows, sort of.
    (The "sort of" is because it uses kbhit() to detect that the user
    starts typing, and then no events are processed until they hit

    Also fixed a nasty locking bug: EventHook() is called without the Tcl
    lock set, so it can't use the ENTER_PYTHON and LEAVE_PYTHON macros,
    which manipulate both the Python and the Tcl lock.  I now only acquire
    and release the Python lock.

    (Haven't tested this on Unix yet...)

This suggests that Guido was (rightly) worried about the case of threading
on Windows.  What about a non-threaded interpreter on Windows?


From  Tue Jan 22 15:02:12 2002
From: (Neil Schemenauer)
Date: Tue, 22 Jan 2002 07:02:12 -0800
Subject: [Python-Dev] Bug? is Tkinter+no threads+Windows supported?
In-Reply-To: <15437.30883.138962.301012@dynamic2.tttech1.ttt>; from on Tue, Jan 22, 2002 at 08:35:15AM -0600
References: <15437.30883.138962.301012@dynamic2.tttech1.ttt>
Message-ID: <> wrote:
> My client is trying to build a version of Python on Windows with Tkinter and
> pymalloc enabled, and threads disabled (in part because pymalloc is not
> thread-safe).

Using pymalloc with threads should be safe as long as you don't have
extensions that call pymalloc without the big lock held.


From  Wed Jan 23 08:24:27 2002
From: (
Date: Wed, 23 Jan 2002 02:24:27 -0600
Subject: [Python-Dev] "This document is locked" message from Sourceforge?
Message-ID: <15438.29499.711766.816065@dynamic2.tttech1.ttt>

I just logged into Sourceforge.  Now every time I visit a page, although
that page displays, I also get username/password popup saying the document
is locked and giving a server message of "foo".  Any idea where this came
from?  Perhaps a test on SF they forgot to undo before putting some pages
into production?


From  Wed Jan 23 10:57:23 2002
From: (Michael Hudson)
Date: 23 Jan 2002 10:57:23 +0000
Subject: [Python-Dev] "This document is locked" message from Sourceforge?
In-Reply-To:'s message of "Wed, 23 Jan 2002 02:24:27 -0600"
References: <15438.29499.711766.816065@dynamic2.tttech1.ttt>
Message-ID: <> writes:

> I just logged into Sourceforge.  Now every time I visit a page, although
> that page displays, I also get username/password popup saying the document
> is locked and giving a server message of "foo".  Any idea where this came
> from?  Perhaps a test on SF they forgot to undo before putting some pages
> into production?

Haven't noticed that, but sf is being nice and snappy this morning,
isn't it?  It seems to take five minutes for a bug report to finish
displaying.  Argh! <thump> <thump> <thump>


  $ head -n 2 src/bash/bash-2.04/unwind_prot.c
   /* I can't stand it anymore!  Please can't we just write the
      whole Unix system in lisp or something? */
                                       -- spotted by Rich van der Hoff

From sdm7g@Virginia.EDU  Wed Jan 23 16:26:09 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 23 Jan 2002 11:26:09 -0500 (EST)
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
Message-ID: <>

Py_DebugFlag is used for debugging the Python parser.
Py_VerboseFlag is used for debugging and tracing imports.
 (and in some places it wants Py_VerboseFlag > 1 (more than one "-v")
   for output)

Are there any conventions on which to use for other debugging output?
(Or did Guido have any particular conventions in mind when he added
them? )

Right now, I'm using Py_VerboseFlag to also trigger logging of message
sends in pyobjc. Stealing this flag for another use isn't a problem
here because [1] the logging goes to a /tmp file, so I don't have
to turn off import tracing -- the two logging streams don't get mixed
together, and [2] it only functions when you import pyobjc, so it's
not going to get in someone else's use.

But I may need to add other debug and log output to my module and
I'd like to do it in the least surprising manner if possible.

-- Steve Majewski

From  Wed Jan 23 17:51:58 2002
From: (Tim Peters)
Date: Wed, 23 Jan 2002 12:51:58 -0500
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: <>
Message-ID: <>

[Steven Majewski]
> Py_DebugFlag is used for debugging the Python parser.

If Guido had it to do over again, I suspect he'd put that code in #ifdef
Py_DEBUG blocks instead.

> Py_VerboseFlag is used for debugging and tracing imports.
>  (and in some places it wants Py_VerboseFlag > 1 (more than one "-v")
>    for output)
> Are there any conventions on which to use for other debugging output?

Py_VerboseFlag is for output about core activities every user of Python may
want to see sometimes, and in release builds.  It doesn't cover much beyond
tracing imports, printing stats about memory cleanup, and some highly
dubious fudging:

PyThreadState_Clear(PyThreadState *tstate)
	if (Py_VerboseFlag && tstate->frame != NULL)
		  "PyThreadState_Clear: warning: thread still has a frame\n");

(that should probably be an error instead -- or be officially blessed).

> (Or did Guido have any particular conventions in mind when he added
> them? )
> Right now, I'm using Py_VerboseFlag to also trigger logging of message
> sends in pyobjc. Stealing this flag for another use isn't a problem
> here because [1] the logging goes to a /tmp file, so I don't have
> to turn off import tracing -- the two logging streams don't get mixed
> together, and [2] it only functions when you import pyobjc, so it's
> not going to get in someone else's use.
> But I may need to add other debug and log output to my module and
> I'd like to do it in the least surprising manner if possible.

Supply a "set debug and log options" interface for your module, and then
call it <wink>.  Good example:  the gc module.
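
A minimal sketch of that suggestion, seen from the Python side. The first part uses the gc module's real `set_debug`/`get_debug` calls; the second part is a hypothetical module-level equivalent (the names `set_debug`, `get_debug` and `log` are illustrative assumptions, not pyobjc's API).

```python
import gc

# The gc module's own knobs: read the current flags, change them, restore them.
previous_flags = gc.get_debug()
gc.set_debug(gc.DEBUG_STATS)
stats_enabled = bool(gc.get_debug() & gc.DEBUG_STATS)
gc.set_debug(previous_flags)

# Hypothetical equivalent for another module: one place that holds the level.
_state = {"level": 0}


def set_debug(level):
    """Set the module's debug level; 0 turns logging off."""
    _state["level"] = int(level)


def get_debug():
    """Return the current debug level."""
    return _state["level"]


def log(level, message):
    """Print the message if the current level allows it; report whether it did."""
    if _state["level"] >= level:
        print(f"[debug {level}] {message}")
        return True
    return False
```

A caller would then write `set_debug(2)` once and leave Py_VerboseFlag to its import-tracing job.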

From  Wed Jan 23 18:01:14 2002
From: (Guido van Rossum)
Date: Wed, 23 Jan 2002 13:01:14 -0500
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: Your message of "Wed, 23 Jan 2002 12:51:58 EST."
References: <>
Message-ID: <>

> [Steven Majewski]
> > Py_DebugFlag is used for debugging the Python parser.
> If Guido had it to do over again, I suspect he'd put that code in #ifdef
> Py_DEBUG blocks instead.

Yes and no.  Some of it *is* already only inside #ifdef Py_DEBUG (see
parser.c); but it still requires a command line flag because the
output is too much to bear in a regular debugging run...

--Guido van Rossum (home page:

From sdm7g@Virginia.EDU  Wed Jan 23 18:06:30 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 23 Jan 2002 13:06:30 -0500 (EST)
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: <>
Message-ID: <>

On Wed, 23 Jan 2002, Tim Peters wrote:

> Supply a "set debug and log options" interface for your module, and then
> call it <wink>.  Good example:  the gc module.

Thanks. That mostly makes sense.
Except that I needed it to be in trace/debug mode when the module
initialization is being done, so I can't import the module and then
set it. I suppose I could just use another environment variable:
$PYOBJC_DEBUG -- then I could set debug levels.
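
The environment-variable idea might look like this in Python. Only the variable name PYOBJC_DEBUG comes from the message above; the function name and the rule that unset or malformed values mean level 0 are assumptions.

```python
import os


def debug_level_from_env(environ=None, name="PYOBJC_DEBUG"):
    """Return a non-negative debug level read from the environment.

    Unset, empty, or non-numeric values are treated as 0 (an assumption).
    """
    if environ is None:
        environ = os.environ
    raw = environ.get(name, "")
    try:
        return max(0, int(raw))
    except ValueError:
        return 0


# Read once, before any other initialization work is done.
DEBUG_LEVEL = debug_level_from_env()
```

Because the value is read at import time, tracing is already active while the module initializes.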

-- Steve

FYI: In case you're wondering why I don't just use gdb:
 It seems to be a meta level problem between the python runtime
and the objective-c runtime, and I suspect the objc extensions
in gdb must make use of the objc-runtime ( for 'po' - print object,
for example.) because I seem to be causing another objc runtime
exception in the act of examining things in the debugger.
 This is not very documented in the gdb manual, so unless I'm
going to wade thru the sources, I thought it would be easier just
to instrument the module. (and maybe Python.)

From  Wed Jan 23 18:26:02 2002
From: (Gordon McMillan)
Date: Wed, 23 Jan 2002 13:26:02 -0500
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: <>
References: <>
Message-ID: <3C4EB9EA.25681.2F95B9E4@localhost>

On 23 Jan 2002 at 13:06, Steven Majewski wrote:

> FYI: In case you're wondering why I don't just use gdb:
> It seems to be a meta level problem between the python
> runtime and the objective-c runtime, and I suspect the
> objc extensions in gdb must make use of the objc-runtime (
> for 'po' - print object, for example.) because I seem to be
> causing another objc runtime exception in the act of examining
> things in the debugger.

Or perhaps chip geometries are getting small enough
that simply the act of observing is enough.

running-on-stale-Doritos-ly y'rs

-- Gordon

From sdm7g@Virginia.EDU  Wed Jan 23 18:57:08 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 23 Jan 2002 13:57:08 -0500 (EST)
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: <3C4EB9EA.25681.2F95B9E4@localhost>
Message-ID: <>

On Wed, 23 Jan 2002, Gordon McMillan wrote:

> Or perhaps chip geometries are getting small enough
> that simply the act of observing is enough.

Well: the effect is being magnified by Class Object self reference,
which I could probably avoid if objective-C had actual metaclasses.

> running-on-stale-Doritos-ly y'rs

Gosh. I'm impressed.  I'm still running on coffee and donuts here.
I usually don't start on stale Doritos until much later in the day!
( Unless I've wrapped around on an all nighter and I'm still on
  last night's Doritos. )

From  Wed Jan 23 23:02:32 2002
From: (David Ascher)
Date: Wed, 23 Jan 2002 15:02:32 -0800
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
Message-ID: <>

As mentioned in:

largeint.h is gone from the VisualStudio compiler as of the
VisualStudio.NET release.

Python's build currently fails without the workaround mentioned in that
posting.

Furthermore, the file "ver.h" used in python_nt.rc appears to be gone as
well.  Not sure why we needed it.  Gettinr dir fo it seems to have no
ill effect =).  Anyone remember what it's for?

I'm having sre problems in the test suite though, which have pretty
wide-ranging effects.  

Is someone else looking at the patches needed for VS.NET, or should I
keep digging?


From  Wed Jan 23 23:06:50 2002
From: (David Ascher)
Date: Wed, 23 Jan 2002 15:06:50 -0800
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
References: <>
Message-ID: <>


test_longexp seems to be causing the python_d process to bloat to almost
80Megs.  This is with the VS.NET build.

I guess I really have to get a VC6 build going now =).

From  Wed Jan 23 23:07:49 2002
From: (Jack Jansen)
Date: Thu, 24 Jan 2002 00:07:49 +0100
Subject: [Python-Dev] PEP 278 - Universal newline support
Message-ID: <>

there's a new PEP 278 plus an accompanying patch available on 
the subject of universal newline support (the ability to read 
and import files that use a different newline convention than 
what the current platform uses).
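
For readers who want to see the behaviour in question, here is a small sketch using the `newline=None` argument of `open()` in current Python 3. It illustrates the idea only and is not the interface of the patch announced here; the helper name is hypothetical.

```python
import os
import tempfile


def read_lines_universal(data):
    """Write raw bytes to a temporary file and read them back as text lines
    with universal newline translation enabled (newline=None)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "r", newline=None, encoding="ascii") as f:
            return f.readlines()
    finally:
        os.remove(path)


# Unix (\n), Windows (\r\n) and classic Mac (\r) line endings in one file.
lines = read_lines_universal(b"unix\nwindows\r\nold mac\rlast")
print(lines)
```

All three conventions come back as lines ending in "\n", whatever the current platform uses.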

Please read, apply, try, provide feedback and put me back to work:-)
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Wed Jan 23 23:14:09 2002
From: (Guido van Rossum)
Date: Wed, 23 Jan 2002 18:14:09 -0500
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
In-Reply-To: Your message of "Wed, 23 Jan 2002 15:06:50 PST."
References: <>
Message-ID: <>

> test_longexp seems to be causing the python_d process to bloat to almost
> 80Megs.  This is with the VS.NET build.

I think that's expected -- test_longexp is very memory intensive,
we've seen complaints about this on feeble platforms before. :-)

--Guido van Rossum (home page:

From  Wed Jan 23 23:48:47 2002
From: (Tim Peters)
Date: Wed, 23 Jan 2002 18:48:47 -0500
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
In-Reply-To: <>
Message-ID: <>

[David Ascher]
> As mentioned in:
> A.2060%40tkmsftngp07&rnum=1
> largeint.h is gone from the VisualStudio compiler as of the
> VisualStudio.NET release.
> Python's build currently fails without the workaround mentioned in that
> posting.

Did they also, e.g., change the signature of QueryPerformanceCounter(), so
that largeint.h isn't needed to get at the MS-specific LARGE_INTEGER
typedef?  Note that the workaround doesn't work unless these files are on
MS's list of redistributable files (which always takes me an hour to find,
and no time for that now).

> Furthermore, the file "ver.h" used in python_nt.rc appears to be gone as
> well.  Not sure why we needed it.  Gettinr dir fo it seems to have no
> ill effect =).  Anyone remember what it's for?

Mark Hammond created all the code in question (here and above), so
ActiveState should know who to hire to maintain it <wink>.

Here's ver.h in its entirety (as of VC6):

#ifndef RC_INVOKED
#pragma message ("VER.H obsolete, including WINVER.H instead")
#include <winver.h>

gettinr-dir-fo-it-indeed-ly y'rs  - tim

From  Thu Jan 24 00:45:03 2002
From: (David Ascher)
Date: Wed, 23 Jan 2002 16:45:03 -0800
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
References: <>
Message-ID: <>

> Did they also, e.g., change the signature of QueryPerformanceCounter(), so
> that largeint.h isn't needed to get at the MS-specific LARGE_INTEGER
> typedef?  Note that the workaround doesn't work unless these files are on
> MS's list of redistributable files (which always takes me an hour to find,
> and no time for that now).

I did not intend that the workaround would be the right way to do it
long term.

LARGE_INTEGER is now defined in winnt.h, which is included by
windows.h.  However, the current code does need more than just the
typedef, such as LargeIntegerEqualToZero, LargeIntegerSubtract, etc.

> > Furthermore, the file "ver.h" used in python_nt.rc appears to be gone as
> > well.  Not sure why we needed it.  Gettinr dir fo it seems to have no
> > ill effect =).  Anyone remember what it's for?
> Mark Hammond created all the code in question (here and above), so
> ActiveState should know who to hire to maintain it <wink>.

Sigh.  I'm not doing this on behalf of ActiveState -- there's no real
need for us to move to VS.NET for most of our builds right now.  I'm
just playing with my new toy.


From  Thu Jan 24 01:02:53 2002
From: (Tim Peters)
Date: Wed, 23 Jan 2002 20:02:53 -0500
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
In-Reply-To: <>
Message-ID: <>

[David Ascher]
> I did not intend that the workaround would be the right way to do it
> long term.
> LARGE_INTEGER is now defined in winnt.h, which is included by
> windows.h.  However, the current code does need more than just the
> typedef, such as LargeIntegerEqualToZero, LargeIntegerSubtract, etc.

Those can be replaced with "== 0" and "-" etc -- the obvious things, at
least under VC6.  Don't know about .NET.

>> Mark Hammond created all the code in question (here and above), so
>> ActiveState should know who to hire to maintain it <wink>.

> Sigh.  I'm not doing this on behalf of ActiveState

Neither am I <wink>.

> -- there's no real need for us to move to VS.NET for most of our builds
> right now.  I'm just playing with my new toy.

Well, then *you* know who to hire -- same thing.

BTW, the #include of ver.h is gone in current CVS now.  Mucking with
LARGE_INTEGER awaits a volunteer.

From  Thu Jan 24 09:10:05 2002
From: (Fredrik Lundh)
Date: Thu, 24 Jan 2002 10:10:05 +0100
Subject: [Python-Dev] largeint.h and ver.h gone from VS.NET
References: <>
Message-ID: <00db01c1a4b6$eb4008d0$0900a8c0@spiff>

david wrote:

> I'm having sre problems in the test suite though, which have pretty
> wide-ranging effects.

SRE uses aggressive inlining under MSVC.  maybe their new optimizer
is slightly broken? (not the first time, in a X.0 release)

as a temporary workaround, try changing

    #if defined(_MSC_VER)

to

    #if 0 && defined(_MSC_VER)

if SRE works after this change, try switching on

if you find a combination that works, change the MSC_VER
clause to:

    #if defined(_MSC_VER) && _MSC_VER >= SOMETHING
    ... configuration
    #elif defined(_MSC_VER)
    ... msvc 5/6 configuration
    #elif defined(USE_INLINE)

and mail me the patch.

cheers /F

From  Thu Jan 24 15:07:34 2002
From: (Guido van Rossum)
Date: Thu, 24 Jan 2002 10:07:34 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Mac/scripts,1.18,1.19
In-Reply-To: Your message of "Thu, 24 Jan 2002 09:34:40 EST."
References: <>
Message-ID: <>

> The keyword module has an undocumented data object kwlist which is a
> list of keywords.  Perhaps this should be documented and made part of
> the public API?  I'd want to change the list to a tuple, but that
> seems harmless since it isn't already part of the API.

Why make it a tuple?  Out of fear someone changes it?  Let them change
it, and learn about sharing of object references!

Agree it should be documented of course.
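
The "sharing of object references" point can be sketched with the keyword module itself. The variable names are illustrative, and the snippet undoes its own change.

```python
import keyword

alias = keyword.kwlist            # a second reference to the same list object
snapshot = tuple(keyword.kwlist)  # an independent, immutable copy

alias.append("not_a_keyword")     # mutating through the alias...
visible_everywhere = "not_a_keyword" in keyword.kwlist  # ...shows up for every holder
alias.pop()                       # undo the change

unchanged_snapshot = "not_a_keyword" not in snapshot
print(alias is keyword.kwlist, visible_everywhere, unchanged_snapshot)
```

Anyone who wants a private list can call `list(keyword.kwlist)` and change that copy instead.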

--Guido van Rossum (home page:

From  Thu Jan 24 16:07:41 2002
From: (Fred L. Drake, Jr.)
Date: Thu, 24 Jan 2002 11:07:41 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Mac/scripts,1.18,1.19
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum writes:
 > Why make it a tuple?  Out of fear someone changes it?  Let them change
 > it, and learn about sharing of object references!

Partly, and partly because it's something that should be changed
anyway.  Do you seriously object to changing it to a tuple???

 > Agree it should be documented of course.



Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Thu Jan 24 16:11:23 2002
From: (Guido van Rossum)
Date: Thu, 24 Jan 2002 11:11:23 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Mac/scripts,1.18,1.19
In-Reply-To: Your message of "Thu, 24 Jan 2002 11:07:41 EST."
References: <> <> <>
Message-ID: <>

> Do you seriously object to changing it to a tuple???

Yes, I don't want to create any more show code examples that use
tuples for (conceptually) arbitrary-length arrays of homogeneous
data.  The data type to use for those is lists.

--Guido van Rossum (home page:

From  Thu Jan 24 16:56:38 2002
From: (Aahz Maruch)
Date: Thu, 24 Jan 2002 08:56:38 -0800 (PST)
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: <> from "Guido van Rossum" at Jan 24, 2002 11:11:23 AM
Message-ID: <>

Guido van Rossum wrote:
>> Do you seriously object to changing it to a tuple???
> Yes, I don't want to create any more show code examples that use
> tuples for (conceptually) arbitrary-length arrays of homogeneous
> data.  The data type to use for those is lists.

Hrm.  Even when it's something that's supposed to be immutable?  I'm
asking because I'm currently using a tuple for the digit list in my BCD
module, and I'd like a clearer explanation of why you think that it
should be a list (assuming you do).

From my viewpoint, the BCD digit string should be handled like a string;
I'm only using a tuple for efficiency of storing numbers instead of
characters.
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Thu Jan 24 17:01:51 2002
From: (Guido van Rossum)
Date: Thu, 24 Jan 2002 12:01:51 -0500
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: Your message of "Thu, 24 Jan 2002 08:56:38 PST."
References: <>
Message-ID: <>

> Hrm.  Even when it's something that's supposed to be immutable?  I'm
> asking because I'm currently using a tuple for the digit list in my BCD
> module, and I'd like a clearer explanation of why you think that it
> should be a list (assuming you do).
> From my viewpoint, the BCD digit string should be handled like a string;
> I'm only using a tuple for efficiency of storing numbers instead of
> characters.

Can't you trust your users not to change it?

--Guido van Rossum (home page:

From  Thu Jan 24 18:03:35 2002
From: (Aahz Maruch)
Date: Thu, 24 Jan 2002 10:03:35 -0800 (PST)
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: <> from "Guido van Rossum" at Jan 24, 2002 12:01:51 PM
Message-ID: <>

Guido van Rossum wrote:
> Aahz:
>> Hrm.  Even when it's something that's supposed to be immutable?  I'm
>> asking because I'm currently using a tuple for the digit list in my BCD
>> module, and I'd like a clearer explanation of why you think that it
>> should be a list (assuming you do).
>> From my viewpoint, the BCD digit string should be handled like a string;
>> I'm only using a tuple for efficiency of storing numbers instead of
>> characters.
> Can't you trust your users not to change it?

Sure, but then I can't just copy references to the tuple when creating a
copy of an instance, I'd have to copy the entire list.  That's what I
meant by efficiency.  There are important semantic differences coming
from the fact that tuples are immutable and lists are mutable, and I
think that a strict heterogeneous/homogenous distinction loses that.
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.
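
The efficiency argument above can be sketched like this. Both classes are hypothetical stand-ins, not the actual BCD module: one keeps its digits in a tuple and shares the reference on copy, the other keeps a list and has to duplicate it to stay safe.

```python
class TupleDigits:
    """Digits stored in a tuple: copies can share the same object."""

    def __init__(self, digits):
        self.digits = tuple(digits)

    def copy(self):
        new = TupleDigits.__new__(TupleDigits)
        new.digits = self.digits  # sharing is safe, a tuple cannot be modified
        return new


class ListDigits:
    """Digits stored in a list: a safe copy needs its own list."""

    def __init__(self, digits):
        self.digits = list(digits)

    def copy(self):
        return ListDigits(self.digits)  # list() builds a new list object


a = TupleDigits([1, 2, 3])
b = a.copy()
c = ListDigits([1, 2, 3])
d = c.copy()
d.digits.append(4)  # does not affect c, because the list was duplicated
print(b.digits is a.digits, d.digits is c.digits, c.digits)
```

With a list, sharing the reference would also work, but only for as long as nobody mutates it, which is the promise discussed in this thread.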

From  Thu Jan 24 18:07:24 2002
From: (Guido van Rossum)
Date: Thu, 24 Jan 2002 13:07:24 -0500
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: Your message of "Thu, 24 Jan 2002 10:03:35 PST."
References: <>
Message-ID: <>

> Sure, but then I can't just copy references to the tuple when creating a
> copy of an instance, I'd have to copy the entire list.  That's what I
> meant by efficiency.  There are important semantic differences coming
> from the fact that tuples are immutable and lists are mutable, and I
> think that a strict heterogeneous/homogenous distinction loses that.

Well, as long as you promise not to change it, you *can* copy a
reference, right?  I guess I don't understand your application
enough -- do you intend this to be a starting point that is modified
during the program's execution, or is this a constant array?

--Guido van Rossum (home page:

From  Fri Jan 25 01:22:47 2002
From: (Tim Peters)
Date: Thu, 24 Jan 2002 20:22:47 -0500
Subject: [Python-Dev] VERBOSE and DEBUG conventions.
In-Reply-To: <>
Message-ID: <>

> Supply a "set debug and log options" interface for your module, and then
> call it <wink>.  Good example:  the gc module.

[Steven Majewski]
> Thanks. That mostly makes sense.
> Except that I needed it to be in trace/debug mode when the module
> initialization is being done, so I can't import the module and then
> set it. I suppose I could just use another environment variable:
> $PYOBJC_DEBUG -- then I could set debug levels.

Sure.  Or split out option/logging knobs into a distinct module.

> FYI: In case you're wondering why I don't just use gdb:

Nope <wink>.

>  It seems to be a meta level problem between the python runtime
> and the objective-c runtime, and I suspect the objc extensions
> in gdb must make use of the objc-runtime ( for 'po' - print object,
> for example.) because I seem to be causing another objc runtime
> exception in the act of examining things in the debugger.
>  This is not very documented in the gdb manual, so unless I'm
> going to wade thru the sources, I thought it would be easier just
> to instrument the module. (and maybe Python.)

Upgrade your OS to Windows and all these time-consuming *choices* go away.
Got a bug?  Great!  There's nowhere to report it that isn't a black hole,
and you can't even think about patching the sources, so you just live with
it and buy another OS next year.  Except for all the bugs you have to learn
to endure, it makes life much simpler <wink>.

From  Fri Jan 25 05:02:12 2002
From: (Neil Hodgson)
Date: Fri, 25 Jan 2002 16:02:12 +1100
Subject: [Python-Dev] Re: PEP 277: Unicode file name support for Windows NT,   was PEP-time ? ...
References: <> <> <016e01c19639$94c909b0$0acc8490@neil> <> <021901c19654$21f2e3f0$0acc8490@neil> <> <036e01c1972d$dbc88a80$0acc8490@neil> <> <003b01c1975d$e7dd3070$0acc8490@neil> <> <> <02ff01c19cc3$92514540$0acc8490@neil> <> <> <> <> <> <> <08e201c19f46$cad5f070$0acc8490@neil> <> <> <>
Message-ID: <06c901c1a55d$73987f40$0acc8490@neil>

M.-A. Lemburg:
> "Martin v. Loewis" wrote:
> > ...
> > if sys.platform == "win32":
> >   use_unicode_for_filenames = windowsversion in ['nt','w2k','xp']
> > elif sys.platform.startswith("darwin"):
> >   use_unicode_for_filenames = 1
> > else:
> >   use_unicode_for_filenames = 0
> Sounds like this would be a good candidate for which I'll
> check into CVS soon. With its many platform querying APIs it should
> easily be possible to add a function which returns the above
> information based on the platform Python is running on.

   OK. I'll remove unicodefilenames() from the PEP and my patch.
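
A runnable rendering of the quoted check, as a sketch only. The `windows_version` parameter is hypothetical, since the quoted snippet does not say where `windowsversion` comes from. Current Python also exposes `os.path.supports_unicode_filenames`, printed here for comparison.

```python
import os.path
import sys


def use_unicode_for_filenames(platform=sys.platform, windows_version=None):
    """Mirror the quoted decision table for Unicode file names.

    windows_version is a hypothetical caller-supplied string such as
    'nt', 'w2k' or 'xp'; how to obtain it is left open here.
    """
    if platform == "win32":
        return windows_version in ("nt", "w2k", "xp")
    if platform.startswith("darwin"):
        return True
    return False


print(use_unicode_for_filenames())
print(os.path.supports_unicode_filenames)
```

Keeping the platform as a parameter makes the decision easy to exercise without running on each system.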


From  Thu Jan 17 17:09:38 2002
From: (Christian Tismer)
Date: Thu, 17 Jan 2002 18:09:38 +0100
Subject: [Python-Dev] Ann: Stackless Python is DEAD! Long live Stackless Python
Message-ID: <>




The end of an era has come:
Stackless Python, in the form provided up to Python 2.0, is DEAD.

I am abandoning the whole implementation.

A new era has begun:
A completely new implementation is in development for
Python 2.2 and up which gives you the following features:

- There are no restrictions any longer for uthread/coroutine
   switching. Switching is possible at *any* time, in *any*

- There are no significant changes to the Python core any
   longer. The new patches are of minimum size, and they
   will probably survive unchanged until Python 3.0 .

- Maintenance work for Stackless Python is reduced to the
   bare minimum. There is no longer a need to incorporate
   Stackless into the standard, since there is no work to
   be shared.

- Stackless breaks its major axiom now. It is no longer
   platform independent, since it *does* modify the C stack.
   I will support all Intel platforms by myself. For other
   platforms, I'm asking for volunteers.

* The basic elements of Stackless are now switchable chains
   of frames. We have to define an interface that turns these
   chains into microthreads and coroutines.

Everybody is invited to come to the Stackless mailing
list and discuss the layout of this new design.
Especially we need to decide about (*).

see you there - chris

Christian Tismer             :^)   <>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship*
14163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
      where do you want to jump today?

From  Fri Jan 25 20:46:29 2002
From: (M.-A. Lemburg)
Date: Fri, 25 Jan 2002 21:46:29 +0100
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
Message-ID: <>

Browsing the Mozilla web-site I came across a nice utility which
enables cross-referenced source code browsing: LXR

For example, see e.g.

I suppose setting this up on would ease referencing
Python C sources a lot and also provide a nice tool for learning
to understand the internal structures of the interpreter.

What do you think ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 25 20:49:27 2002
From: (Guido van Rossum)
Date: Fri, 25 Jan 2002 15:49:27 -0500
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
In-Reply-To: Your message of "Fri, 25 Jan 2002 21:46:29 +0100."
References: <>
Message-ID: <>

> Browsing the Mozilla web-site I came across a nice utility which
> enables cross-referenced source code browsing: LXR
> For example, see e.g.
> I suppose setting this up on would ease referencing
> Python C sources a lot and also provide a nice tool for learning
> to understand the internal structures of the interpreter.
> What do you think ?


+1

Do you want access to the website and CVS so you can
install this yourself?

--Guido van Rossum (home page:

From  Fri Jan 25 21:16:19 2002
From: (M.-A. Lemburg)
Date: Fri, 25 Jan 2002 22:16:19 +0100
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
References: <> <>
Message-ID: <>

Guido van Rossum wrote:
> > Browsing the Mozilla web-site I came across a nice utility which
> > enables cross-referenced source code browsing: LXR
> >
> >
> >
> > For example, see e.g.
> >
> >
> >
> > I suppose setting this up on would ease referencing
> > Python C sources a lot and also provide a nice tool for learning
> > to understand the internal structures of the interpreter.
> >
> > What do you think ?
> +1
> Do you want access to the website and CVS so you can
> install this yourself?

I could do that, but would need some help from the admins
since LXR requires Perl 5+ and Glimpse to be installed. I'll
also need to modify the Apache config files and will probably
have to set up a cron job which updates the indexes once a
day.

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Fri Jan 25 21:26:04 2002
From: (David Ascher)
Date: Fri, 25 Jan 2002 13:26:04 -0800
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
References: <> <> <>
Message-ID: <>

Not wishing to make a science project out of it, but you might consider
the newer lxr, which uses a real database (mysql, IIRC).

We've used lxr in-house for a while, it's an absolutely wonderful tool. 
It is quite hard to setup multiple lxr's on a single machine (at least
with the 'old' lxr), be forewarned.

Also, lxr doesn't really deal especially well with Python code - but for
C/C++ code, it rocks.


"M.-A. Lemburg" wrote:
> Guido van Rossum wrote:
> >
> > > Browsing the Mozilla web-site I came across a nice utility which
> > > enables cross-referenced source code browsing: LXR
> > >
> > >
> > >
> > > For example, see e.g.
> > >
> > >
> > >
> > > I suppose setting this up on would ease referencing
> > > Python C sources a lot and also provide a nice tool for learning
> > > to understand the internal structures of the interpreter.
> > >
> > > What do you think ?
> >
> > +1
> >
> > Do you want access to the website and CVS so you can
> > install this yourself?
> I could do that, but would need some help from the admins
> since LXR requires Perl 5+ and Glimpse to be installed. I'll
> also need to modify the Apache config files and will probably
> have to setup a cron job which updates the indexes once a
> day.
> --
> Marc-Andre Lemburg
> CEO Software GmbH
> ______________________________________________________________________
> Company & Consulting:                 
> Python Software:         

From  Fri Jan 25 22:57:31 2002
From: (M.-A. Lemburg)
Date: Fri, 25 Jan 2002 23:57:31 +0100
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
References: <> <> <> <>
Message-ID: <>

David Ascher wrote:
> Not wishing to make a science project out of it, but you might consider
> the newer lxr, which uses a real database (mysql, IIRC).
> We've used lxr in-house for a while, it's an absolutely wonderful tool.
> It is quite hard to setup multiple lxr's on a single machine (at least
> with the 'old' lxr), be forewarned.
> Also, lxr doesn't really deal especially well with Python code - but for
> C/C++ code, it rocks.

Hmm, I was planning to install the Mozilla version of LXR. I'll also
look at the latest LXR version 0.9. If it does indeed use MySQL, I'd
rather not go down that road -- setting up and maintaining MySQL is 
not exactly fun...

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Sat Jan 26 02:24:31 2002
From: (
Date: Sat, 26 Jan 2002 03:24:31 +0100
Subject: [Python-Dev] Nouvelles acquisitions / New online
Message-ID: <>

Dear bibliophiles,

This week we present our new acquisitions online (more than 500). Among them, for example:

First edition (rare). Work containing a 36-page catalogue of Michl Lévy, bookseller and publisher. Paperback, in-8. LEVY Paris 1870
Ref.: 18337 (107,00 €)

or again:

Novel. First edition (rare). DOMAT Paris 1947
Ref.: 18313 (45,00 €)

Wishing you good reading,

AaZbooks' Team

 ---------------------------------------------------------- - BP N°1 - La grande Bruyère - F72320 St-Maixent
Tel.: +33 (0)2 43 71 00 70  - Fax: +33 (0)2 43 71 29 16
To unsubscribe, click below\lnews\desinscription.php

From  Sat Jan 26 22:17:26 2002
From: (Skip Montanaro)
Date: Sat, 26 Jan 2002 16:17:26 -0600
Subject: [Python-Dev] Using LXR for Python CVS Source Code ?
In-Reply-To: <>
References: <>
Message-ID: <15443.10998.673581.778224@localhost.localdomain>

    mal> Hmm, I was planning to install the Mozilla version of LXR. I'll
    mal> also look at the latest LXR version 0.9. If it does indeed use
    mal> MySQL, I'd rather not go down that road -- setting up and
    mal> maintaining MySQL is not exactly fun...

I find MySQL fairly straightforward to work with.  (I use it on the Mojam &
Musi-Cal sites.)  If there's a functional difference between the new version
and the old, I'd be willing to help out administering the database.


From  Sun Jan 27 10:48:59 2002
From: (Andrew MacIntyre)
Date: Sun, 27 Jan 2002 21:48:59 +1100 (EDT)
Subject: [Python-Dev] updated patches for OS/2 EMX port
Message-ID: <>

It's taken longer than I'd hoped, but they're finally up for review.

The updated bits have been attached to the previous patch entries in the
patch manager:

435381:  distutils changes

450265:  build files - self contained subdirectory in PC/

450266:  library changes - 3 patch files covering:-
         - Lib/ (included as previously discussed here)
         - Lib/plat-os2emx/ (new subdirectory)
         - Lib/test/ (cope with 2 EMX limitations)

450267:  core changes - 4 patch files covering:-
         - Include/
         - Modules/ (lots of changes; see below for more info)
         - Objects/ (see below for more info)
         - Python/

I hope that I got the patch links right...

Particular notes wrt #450267:
- the patch to Modules/import.c supports VACPP in addition to EMX.
Michael Muller has trialled this patch with a VACPP build successfully.
It is messy, but OS/2 isn't going to lose the 8.3 naming limit on DLLs
anytime soon :-(  Although truncating the DLL (PYD) name to 8 characters
increases the chances of a name clash, the case-sensitive import support
in the same patch alleviates it somewhat, and the fact that the
"init<module>" entrypoint is maintained will result in an import failure
when there is an actual name clash.
- Modules/unicodedata.c is affected by a name clash between the internally
defined _getname() and an EMX routine of the same name defined in
<stdlib.h>.  The patch renames the internal routine to _getucname() to
avoid this, but this change may not be acceptable - advice please.
- Objects/stringobject.c and Objects/unicodeobject.c contain changes to
handle the EMX runtime library returning "0x" as the prefix for output
formatted with a "%X" format.
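
To see why cutting DLL (PYD) names to 8 characters raises the chance of a
clash, here is a small Python sketch.  It assumes the on-disk base name is
simply the first 8 characters of the module name, which may not be exactly
what the patch to Modules/import.c does; dll_basename and find_clashes are
hypothetical helper names, not part of the patch.

```python
def dll_basename(module_name, limit=8):
    # Assumption: the on-disk name is the module name cut to `limit` characters.
    return module_name[:limit]


def find_clashes(module_names, limit=8):
    """Group module names that would map to the same truncated DLL name."""
    groups = {}
    for name in module_names:
        groups.setdefault(dll_basename(name, limit), []).append(name)
    # Keep only the truncated names shared by more than one module.
    return dict((base, sorted(names)) for base, names in groups.items()
                if len(names) > 1)
```

With a made-up second module "unicodedatb" next to "unicodedata", both map
to the base name "unicoded".  In that situation the "init<module>"
entrypoint check described above is what turns the clash into an import
failure rather than a silent mix-up.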

I have tried to keep the changes in these patches to the minimum
needed for the port to function, ie I've tried to eradicate the cosmetic
changes in the earlier patches, and avoid picking up unwanted files (such
as Modules/Setup).  Please let me know if you find any such changes I
have missed.

The patches uploaded apply cleanly to a copy of an anonymously checked
out CVS tree as of 0527 AEST this morning (Jan 27), and have been built
and regression tested on both OS/2 EMX and FreeBSD 4.4R with no unexpected
test failures.

If there are no unresolvable objections, and approval to apply these
patches is granted, I propose that the patches be applied as follows:-

Stage 1:  the build patch (creates+populates PC/os2emx/)
Stage 2:  the Lib/plat-os2emx/ patch
Stage 3:  the Lib/ and Lib/test/ patches
Stage 4:  the distutils patch
Stage 5:  the Include/, Objects/ and Python/ patches
Stage 6:  the Modules/ patch

I would expect to allow at least 48 hours between stages.

Comments/advice on this proposal also appreciated.

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Sun Jan 27 20:32:44 2002
From: (Martin v. Loewis)
Date: 27 Jan 2002 21:32:44 +0100
Subject: [Python-Dev] updated patches for OS/2 EMX port
In-Reply-To: <>
References: <>
Message-ID: <>

Andrew MacIntyre <> writes:

> - Modules/unicodedata.c is affected by a name clash between the internally
> defined _getname() and an EMX routine of the same name defined in
> <stdlib.h>.  The patch renames the internal routine to _getucname() to
> avoid this, but this change may not be acceptable - advice please.

My advice for renaming things because of name clashes: Always rename
in a way that solves this particular problem for good, by using the Py
prefix (or _Py to further indicate that this is not public API; it's a
static function, anyway). Somebody may have a function _getucname
somewhere, whereas it is really unlikely that people add a Py prefix
to their functions (if they have been following the last 30 years of C

> - Objects/stringobject.c and Objects/unicodeobject.c contain changes to
> handle the EMX runtime library returning "0x" as the prefix for output
> formatted with a "%X" format.

I'd suggest a different approach here, which does not use #ifdefs:
Instead of testing for the system, test for the bug. Then, if the bug
goes away, or appears on other systems as well, the code will be good.

Once formatting is complete, see whether it put in the right letter,
and fix that in the result buffer if the native sprintf got it wrong.
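
The "test for the bug, not the system" idea can be sketched in Python,
although the real fix would live in the C formatting code.  This assumes
the symptom is exactly the one reported (a lowercase "0x" prefix where
"0X" was requested); fix_hex_prefix is a hypothetical helper name, and
padded or otherwise decorated output is not handled.

```python
def fix_hex_prefix(formatted, conversion):
    """Repair the case of a '0x'/'0X' prefix after formatting.

    `formatted` is what the native formatter produced; `conversion` is the
    conversion letter that was requested ('x' or 'X').
    """
    sign = ""
    body = formatted
    # Set a leading sign aside so the prefix check sees "0x..." directly.
    if body[:1] in ("-", "+"):
        sign, body = body[0], body[1:]
    # Only touch the result when the prefix letter is the wrong case.
    if body[:2].lower() == "0x" and body[1] != conversion:
        body = "0" + conversion + body[2:]
    return sign + body
```

On a runtime that already gets the prefix right the function returns its
input unchanged, so no platform #ifdef is needed around the call.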

If you follow this strategy, you should still add a comment indicating
that this was added for OS/2, to give people an idea where that came
from.

Another approach would be to autoconfiscate this particular issue. I'm
in general in favour of autoconf'ed bug tests instead of runtime bug
tests, but people on systems without /bin/sh might feel differently.

> If there are no unresolvable objections, and approval to apply these
> patches is granted, I propose that the patches be applied as follows:-
> Stage 1:  the build patch (creates+populates PC/os2emx/)
> Stage 2:  the Lib/plat-os2emx/ patch
> Stage 3:  the Lib/ and Lib/test/ patches
> Stage 4:  the distutils patch
> Stage 5:  the Include/, Objects/ and Python/ patches
> Stage 6:  the Modules/ patch
> I would expect to allow at least 48 hours between stages.
> Comments/advice on this proposal also appreciated.

Sounds good to me (although I'd probably process the "uncritical",
i.e. truly platform-specific parts much more quickly). Who's going to
work with Andrew to integrate this stuff?


From  Mon Jan 28 21:21:58 2002
From: (Aahz Maruch)
Date: Mon, 28 Jan 2002 13:21:58 -0800 (PST)
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: <> from "Guido van Rossum" at Jan 24, 2002 01:07:24 PM
Message-ID: <>

Guido van Rossum wrote:
> Aahz:
>> Sure, but then I can't just copy references to the tuple when creating a
>> copy of an instance, I'd have to copy the entire list.  That's what I
>> meant by efficiency.  There are important semantic differences coming
>> from the fact that tuples are immutable and lists are mutable, and I
>> think that a strict heterogeneous/homogenous distinction loses that.
> Well, as long as you promise not to change it, you *can* copy a
> reference, right?  I guess I don't understand your application
> enough -- do you intend this to be a starting point that is modified
> during the program's execution, or is this a constant array?

It's a constant.  The BCD module is Binary Coded Decimal; instances are
intended to be as immutable as strings and numbers (well, it *is* a
number type).  Modifying an instance is guaranteed to produce a new
instance.  To a large extent, I guess I feel that if a class is intended
to be immutable, each of its underlying data attributes should also be
immutable.
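
A minimal sketch of the point about copying references.  The Digits
class below is a hypothetical stand-in, not the actual BCD module; it only
shows why a tuple attribute can be shared between instances while
"modification" still yields a new instance.

```python
class Digits(object):
    """Hypothetical immutable value; a stand-in, not the real BCD class."""

    def __init__(self, digits):
        # A tuple cannot be changed in place, so instances may share it.
        self._digits = tuple(digits)

    def digits(self):
        return self._digits

    def copy(self):
        clone = Digits(())
        clone._digits = self._digits   # copy the reference, not the elements
        return clone

    def appended(self, digit):
        # "Modifying" returns a new instance and leaves self untouched.
        return Digits(self._digits + (digit,))
```

With a list in place of the tuple, copy() would have to duplicate the
elements to stay safe, or rely on every caller promising not to mutate
the shared list.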
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Mon Jan 28 21:26:23 2002
From: (Guido van Rossum)
Date: Mon, 28 Jan 2002 16:26:23 -0500
Subject: [Python-Dev] Tuples vs. lists
In-Reply-To: Your message of "Mon, 28 Jan 2002 13:21:58 PST."
References: <>
Message-ID: <>

> It's a constant.  The BCD module is Binary Coded Decimal; instances are
> intended to be as immutable as strings and numbers (well, it *is* a
> number type).  Modifying an instance is guaranteed to produce a new
> instance.  To a large extent, I guess I feel that if a class is intended
> to be immutable, each of its underlying data attributes should also be
> immutable.

Or you could assign it to a private variable.

--Guido van Rossum (home page:

From  Tue Jan 29 00:58:09 2002
From: (Christian Tismer)
Date: Tue, 29 Jan 2002 01:58:09 +0100
Subject: [Python-Dev] Ann: Stackless 2.2 pre-alpha is ready!
Message-ID: <>

Dear Python community,

Stackless Python 2.2 is alive!

This is the first alpha version.
It does not have any relevant changes to the interpreter.
It does not have any limitation on switching.
Support code for uthreads and coroutines is already implemented.

And as announced, it is completely platform dependent.
This version works on MS Win32 only.
I'm going to support other platforms if I can find some sponsors.

Let me say, it works great! There is no single problem. This
technique can be applied to any software, any interpreter,
provided I can support the platform.

*** This is a critical phase for Stackless! ***
*** I Am Asking For Corporate Sponsorships. ***

I don't know how things should go on.
I could turn it into a commercial product, Stackless is
enabled enough for this. Or I could continue to keep it
open-sourced, provided there is enough sponsorship.
This decision has to be discussed in the next two weeks,
after that I will decide.


Please check it out of CVS and have a look,
it is sooo small code now.
cvs -d co stackless/src

You might want to add -z9 since this is a full Python 2.2 checkout.

In this state, I don't prepare a distribution.
You can build Stackless from CVS. I also put a copy
of my python22.dll here for testing:

It is only about 2 percent slower on my W2k machine.
The trick is to avoid stack switching as much as possible.
I do it only on every 8th recursion level, which is
more than what's usual.

>>> def f(n):
...     if n: f(n-1)
...
>>> import sys
>>> sys.setrecursionlimit(100000+10)
>>> f(100000)

ciao - chris

Christian Tismer             :^)   <>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship*
14163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
        where do you want to jump today?


From  Tue Jan 29 05:09:10 2002
From: (Greg Ewing)
Date: Tue, 29 Jan 2002 18:09:10 +1300 (NZDT)
Subject: [Python-Dev] Ann: Stackless 2.2 pre-alpha is ready!
In-Reply-To: <>
Message-ID: <>

Christian Tismer <>:

> I could turn it into a commercial product, Stackless is
> enabled enough for this. Or I could continue to keep it
> open-sourced, provided there is enough sponsorship.

It would be disappointing if it ceased being open-source!
I hope enough volunteers can be found to work on ports to
other platforms.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
                                   +--------------------------------------+

From  Tue Jan 29 12:12:16 2002
From: (Christian Tismer)
Date: Tue, 29 Jan 2002 13:12:16 +0100
Subject: [Python-Dev] Ann: Stackless 2.2 pre-alpha is ready!
References: <>
Message-ID: <>

Greg Ewing wrote:

> Christian Tismer <>:
>>I could turn it into a commercial product, Stackless is
>>enabled enough for this. Or I could continue to keep it
>>open-sourced, provided there is enough sponsorship.
> It would be disappointing if it ceased being open-source!
> I hope enough volunteers can be found to work on ports to
> other platforms.

No problem, I was just trying to get more sponsors,
which in fact already exist (but not enough for a living).

Stackless will stay open source, especially after it has
become so few source :-)

Christian Tismer             :^)   <>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship*
14163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
      where do you want to jump today?

From  Tue Jan 29 16:36:16 2002
From: (Fred L. Drake, Jr.)
Date: Tue, 29 Jan 2002 11:36:16 -0500
Subject: [Python-Dev] release22-maint branch strangeness
Message-ID: <>

I've noticed some strangeness with the release22-maint branch.  I made
a documentation change there this morning, and CVS gave the change a
really weird version number when I checked it in.  Looking further, it
looks like the previous checkin for that file (Doc/tut/tut.tex) has
some strangeness as well.  The branching tags are also pretty
whacked.  This is an excerpt of the "cvs log" for the file:

RCS file: /cvsroot/python/python/dist/src/Doc/tut/tut.tex,v
Working file: tut.tex
head: 1.158
locks: strict
access list:
symbolic names:
        release22-fork: 1.156

Note the revision number for release22-maint; it looks like it's a
branch created from a branch created from a tag on a branch(!).  All
the while, I've been thinking that branches, once created, are
independent (identified by the third component of the revision number
for any given file).  I still think they're supposed to be.
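
As a rough illustration of how the number of components encodes branch
nesting, here is a small Python sketch.  It assumes the usual CVS
convention (two components on the trunk, two more for each level of
branching, an odd count for a branch number as shown by "cvs status");
classify is a hypothetical helper, the deeper inputs in the usage note
are made up, and the "magic" branch numbers with an embedded 0 that
"cvs log" prints are not handled.

```python
def classify(revision):
    """Return (kind, depth) for a dotted CVS revision or branch number.

    depth is the number of branch levels away from the trunk.
    """
    parts = revision.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError("not a CVS revision number: %r" % revision)
    if len(parts) % 2 == 0:
        # Even number of components: a revision (1.158, 1.156.4.1, ...).
        return ("revision", (len(parts) - 2) // 2)
    # Odd number of components: a branch number (1.156.4, ...).
    return ("branch", (len(parts) - 1) // 2)
```

classify("1.158") is a trunk revision at depth 0 and classify("1.156.4")
a branch at depth 1.  A made-up number such as "1.5.2.3.4.1" comes out at
depth 2, i.e. a revision on a branch that was itself created from a
revision on a branch.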

Using a checkout created with the "-r release22-maint" options, I made
two checkins, and the revision numbers & other metadata seem seriously

date: 2001/12/21 03:48:33;  author: fdrake;  state: Exp;  lines: +2 -2
Fix up some examples in the tutorial so we don't contradict our own
advice on docstrings.
This fixes SF bug #495601.
date: 2002/01/29 14:54:18;  author: fdrake;  state: Exp;  lines: +8 -1
Revise cheeseshop example so that the order of the keyword output is
completely determined by the example; dict insertion order and the string
hash algorithm no longer affect the output.
This fixes SF bug #509281.

For revision, note the strange branch number ( --
too many components), and for revision (too many
components again!).  The strange branch number on the first indicates
that a branch was created from that revision (itself part of the
release22-branch branch).

Does anyone remember who created these branches?  Or what commands
were used to create them (using which branch/tag as the source of the
working copy being used?)?  This pretty much has Barry & me stumped at
the moment, and we'd like to get this straightened out.

The suspect branches are release22-maint, release22-mac.



Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Tue Jan 29 17:25:05 2002
From: (Michael Hudson)
Date: 29 Jan 2002 17:25:05 +0000
Subject: [Python-Dev] release22-maint branch strangeness
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 29 Jan 2002 11:36:16 -0500"
References: <>
Message-ID: <>

"Fred L. Drake, Jr." <> writes:

> I've noticed some strangeness with the release22-maint branch.  I made
> a documentation change there this morning, and CVS gave the change a
> really weird version number when I checked it in.  Looking further, it
> looks like the previous checkin for that file (Doc/tut/tut.tex) has
> some strangeness as well.  The branching tags are also pretty
> whacked.  This is an excerpt of the "cvs log" for the file:

Looks to me like the release22 tag for Doc/tut/tut.tex was set on the
release22-branch, not the trunk.

This is not what happened for, e.g.

Quite how this happened, or what (if anything) we should do about it,
is another question entirely.

cvs status -v is quite handy here.

$ cvs status -v Doc/tut/tut.tex | head -n 20
File: tut.tex           Status: Needs Patch

   Working revision:    1.157
   Repository revision: 1.158   /cvsroot/python/python/dist/src/Doc/tut/tut.tex,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

   Existing Tags:
        r212                            (revision:
        r212c1                          (revision:
        release22-mac                   (revision:
        release22-maint                 (branch:
        release22                       (revision:
        release22-branch                (branch: 1.156.4)
        release22-fork                  (revision: 1.156)
        r22c1-mac                       (revision: 1.156)
        r22c1                           (revision: 1.156)
        r22rc1-branch                   (branch: 1.156.2)
$ cvs status -v | head -n 20
File:      Status: Up-to-date

   Working revision:    1.289
   Repository revision: 1.289   /cvsroot/python/python/dist/src/,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)

   Existing Tags:
        r212                            (revision:
        r212c1                          (revision:
        release22-mac                   (revision: 1.288)
        release22-maint                 (branch: 1.288.6)
        release22                       (revision: 1.288)
        release22-branch                (branch: 1.288.4)
        release22-fork                  (revision: 1.288)
        r22c1-mac                       (revision: 1.288)
        r22c1                           (revision: 1.288)
        r22rc1-branch                   (branch: 1.288.2)

Did different people create the release22 tags in different bits of the tree?


  The "of course, while I have no problem with this at all, it's
  surely too much for a lesser being" flavor of argument always
  rings hollow to me.                       -- Tim Peters, 29 Apr 1998

From  Tue Jan 29 19:15:48 2002
From: (Fred L. Drake, Jr.)
Date: Tue, 29 Jan 2002 14:15:48 -0500
Subject: [Python-Dev] release22-maint branch strangeness
In-Reply-To: <>
References: <>
Message-ID: <>

Michael Hudson writes:
 > Looks to me like the release22 tag for Doc/tut/tut.tex was set on the
 > release22-branch, not the trunk.

Should not the branches be independent, once created?

 > This is not what happened for, e.g.

was not changed on the release22-branch.  Take a look at
Include/patchlevel.h.  It doesn't look as messed up as
Doc/tut/tut.tex, but something is definitely wrong here as well.

 > Did different people create the release22 tags in different bits of
 > the tree?

I'm not quite sure how things were handled with the -maint and -mac
branches; I wonder if a branch tag was used somewhere a normal tag
could have been used.  I don't see it, though.


Fred L. Drake, Jr.  <fdrake at>
PythonLabs at Zope Corporation

From  Tue Jan 29 19:41:45 2002
From: (Christian Tismer)
Date: Tue, 29 Jan 2002 20:41:45 +0100
Subject: [Python-Dev] Thread questionlet
Message-ID: <>

Dear developers,

I'm still a little ignorant of real threads.
In order to do the implementation of hard-wired microthreads
right, I tried to understand how real threads work.

My question, which I could not easily answer by reading
the source is:
What happens when the main thread ends? Do all threads run
until they are ready too, or are they just killed away?
And if they are killed, are they just removed, or do
they all get an exception for cleanup?

I would guess the latter, but I'm not sure.
When a thread ends, it may contain several levels of other
C calls which might need to finalize, so I thought of
a special exception for this, but didn't find such.

Many thanks and sorry about my ignorance - chris

Christian Tismer             :^)   <>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship*
14163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
      where do you want to jump today?

From  Tue Jan 29 20:29:41 2002
From: (Tim Peters)
Date: Tue, 29 Jan 2002 15:29:41 -0500
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <>
Message-ID: <>

[Christian Tismer]
> ...
> My question, which I could not easily answer by reading
> the source is:
> What happens when the main thread ends? Do all threads run
> until they are ready too, or are they just killed away?

You're walking near the edge of a very steep cliff.  There are jagged rocks
a kilometer below, so don't slip <wink>.

It varies by OS, and even by exactly how the main thread exits.  Reading OS
docs doesn't really help either, because the version of threads exposed by
the C libraries may differ from native OS facilities in subtle but crucial
ways.

> And if they are killed, are they just removed, or do
> they all get an exception for cleanup?

Can only be answered one platform at a time.  They're not going to get a
*Python*-level exception, no.  Here's a simple test program:

import thread
import time

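def f(i):
    while 1:
        print "thread %d about to sleep" % i
        time.sleep(0.5)   # assumed value; the archived copy lost this call

for i in range(3):
    thread.start_new_thread(f, (i,))

time.sleep(3)             # assumed value; lets the threads run before main ends
print "main is done"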

and a typical run on Windows:

thread 0 about to sleep
thread 1 about to sleep
thread 2 about to sleep
thread 0 about to sleep
thread 1 about to sleep
thread 2 about to sleep
thread 0 about to sleep
thread 1 about to sleep
thread 2 about to sleep
thread 0 about to sleep
thread 1 about to sleep
thread 2 about to sleep
thread 1 about to sleep
thread 0 about to sleep
thread 2 about to sleep
thread 1 about to sleep
thread 0 about to sleep
thread 2 about to sleep
thread 1 about to sleep
main is done


I expect much the same on Linux (all threads die, no exceptions raised).
But, IIRC, the threads would keep going on SGI despite that the main thread
is history.

> ...
> When a thread ends, it may contain several levels of other
> C calls which might need to finalize, so I thought of
> a special exception for this, but didn't find such.

Closing threads cleanly is the programmer's responsibility across all OSes.
It can be very difficult.  Python doesn't really help (or hinder).
Microsoft helps in that DLLs can define a "call on thread detach" function
that's automatically called when a thread detaches from the DLL, but Python
doesn't exploit that.  The DLL hook may not get called even if it did,
depending on exactly how a thread detaches (the Big Hammer last-chance Win32
TerminateProcess/TerminateThread functions generally leave things a mess --
"TerminateThread is a dangerous function that should only be used in the
most extreme cases", etc).
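
One common way to take on that responsibility at the Python level is to
have each thread poll a shared flag and do its own cleanup before
returning.  The following is only a sketch of that pattern using the
threading module; worker and run_and_stop are hypothetical names, and it
does nothing for C-level state held by extension code.

```python
import threading


def worker(stop, log, i):
    # Loop until asked to stop; wait() doubles as an interruptible sleep.
    while not stop.is_set():
        stop.wait(0.05)
    # Cleanup runs in the thread itself, before it returns.
    log.append("thread %d cleaned up" % i)


def run_and_stop(count=3):
    stop = threading.Event()
    log = []
    threads = [threading.Thread(target=worker, args=(stop, log, i))
               for i in range(count)]
    for t in threads:
        t.start()
    stop.set()          # ask every worker to finish
    for t in threads:
        t.join()        # wait until each one has done its cleanup
    return sorted(log)
```

The main thread only exits after join() has returned for every worker, so
what the platform does to threads still running at exit never comes into
play.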

From  Tue Jan 29 21:46:41 2002
From: (Guido van Rossum)
Date: Tue, 29 Jan 2002 16:46:41 -0500
Subject: [Python-Dev] Thread questionlet
In-Reply-To: Your message of "Tue, 29 Jan 2002 20:41:45 +0100."
References: <>
Message-ID: <>

> My question, which I could not easily answer by reading
> the source is:
> What happens when the main thread ends? Do all threads run
> until they are ready too, or are they just killed away?
> And if they are killed, are they just removed, or do
> they all get an exception for cleanup?

If you're talking about the thread module, they are killed without
being given notice.  The threading module however waits for all
non-daemon threads, using the atexit mechanism built on top of
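
The difference between daemon and non-daemon threads in the threading
module can be observed with a sketch like the one below.  It runs a small
script in a child interpreter so that interpreter exit can be watched
from outside; CHILD and run_child are hypothetical names, and the 0.5
second sleep is an arbitrary choice.

```python
import subprocess
import sys
import textwrap

# Script run in a fresh interpreter: one worker thread outlives the main code.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def worker():
        time.sleep(0.5)
        print("worker finished")

    t = threading.Thread(target=worker)
    t.daemon = (sys.argv[1] == "daemon")
    t.start()
    print("main is done")
""")


def run_child(mode):
    # mode is "daemon" or anything else (meaning non-daemon).
    out = subprocess.check_output([sys.executable, "-c", CHILD, mode])
    return out.decode().splitlines()
```

Called with "wait", the child prints both lines because the interpreter
waits for the non-daemon worker; called with "daemon", only the main
thread's line is expected, since the worker is abandoned when the
interpreter exits.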

--Guido van Rossum (home page:

From  Tue Jan 29 21:51:35 2002
From: (Guido van Rossum)
Date: Tue, 29 Jan 2002 16:51:35 -0500
Subject: [Python-Dev] release22-maint branch strangeness
In-Reply-To: Your message of "Tue, 29 Jan 2002 11:36:16 EST."
References: <>
Message-ID: <>

> I've noticed some strangeness with the release22-maint branch.  I made
> a documentation change there this morning, and CVS gave the change a
> really weird version number when I checked it in.  Looking further, it
> looks like the previous checkin for that file (Doc/tut/tut.tex) has
> some strangeness as well.  The branching tags are also pretty
> whacked.  This is an excerpt of the "cvs log" for the file:
> ------------------------------------------------------------------------
> RCS file: /cvsroot/python/python/dist/src/Doc/tut/tut.tex,v
> Working file: tut.tex
> head: 1.158
> branch:
> locks: strict
> access list:
> symbolic names:
>         r212:
>         r212c1:
>         release22-mac:
>         release22-maint:
>         release22:
>         release22-branch:
>         release22-fork: 1.156
> ------------------------------------------------------------------------
> Note the revision number for release22-maint; it looks like it's a
> branch created from a branch created from a tag on a branch(!).  All
> the while, I've been thinking that branches, once created, are
> independent (identified by the third component of the revision number
> for any given file).  I still think they're supposed to be.

I think you must've used a tag to base your branch, and that tag was
already on a branch.

--Guido van Rossum (home page:

From  Tue Jan 29 21:51:29 2002
From: (Skip Montanaro)
Date: Tue, 29 Jan 2002 15:51:29 -0600
Subject: [Python-Dev] Stevens - still best for Unix system call programming?
Message-ID: <>

Sorry for the off-topic post.  I'm starting in on a little project to create
an analog to fopen(3) and friends that provides the illusion of large file
support even on systems that don't support large files, so I'm doing more
fiddling with Unix system calls than I've done in a while, and am looking for
a little hardcover help.  Is Richard Stevens' "Advanced Programming in the
UNIX Environment" still the _sine qua non_ in this area?



P.S. OPN: it will have a Python binding...

From  Tue Jan 29 22:08:00 2002
From: (Barry A. Warsaw)
Date: Tue, 29 Jan 2002 17:08:00 -0500
Subject: [Python-Dev] Stevens - still best for Unix system call programming?
References: <>
Message-ID: <>

>>>>> "SM" == Skip Montanaro <> writes:

    SM> Is Richard Stevens' "Advanced Programming in the UNIX
    SM> Environment" still the _sine qua non_ in this area?

Indeed!  It always seems to answer my questions accurately and in


From  Tue Jan 29 23:47:37 2002
From: (Aahz Maruch)
Date: Tue, 29 Jan 2002 15:47:37 -0800 (PST)
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <> from "Christian Tismer" at Jan 29, 2002 08:41:45 PM
Message-ID: <>

Christian Tismer wrote:
> I'm still a little ignorant to real threads.
> In order to do the implementation of hard-wired microthreads
> right, I tried to understand how real threads work.

Can't answer your specific question, but you might want to look at my
Starship pages if you want to increase your general understanding of
Python threads (there probably won't be much new to you; OTOH, it
shouldn't take you long to read).
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Wed Jan 30 00:21:40 2002
From: (Christian Tismer)
Date: Wed, 30 Jan 2002 01:21:40 +0100
Subject: [Python-Dev] Thread questionlet
References: <>
Message-ID: <>

Aahz Maruch wrote:

> Christian Tismer wrote:
>>I'm still a little ignorant to real threads.
>>In order to do the implementation of hard-wired microthreads
>>right, I tried to understand how real threads work.
> Can't answer your specific question, but you might want to look at my
> Starship pages if you want to increase your general understanding of
> Python threads (there probably won't be much new to you; OTOH, it
> shouldn't take you long to read).

Oh well, 1024 thanks, very helpful. I'm again the
clueless implementor.

It still feels warm and fuzzy here, although I think
there is no rule that I failed to break since -dev
knows about me, and now my final sacrilege...
...after all, this is kinda piecemaker, since
Stackless is now orthogonal, in a way. I gave
up some academic POV, in favor of something
pragmatic, and finally we all get rid of a problem.
Hey, I want to become a productive contributor
(again?) :)

thanks - chris

Christian Tismer             :^)   <>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship*
14163 Berlin                 :     PGP key ->
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
      where do you want to jump today?

From  Wed Jan 30 06:29:39 2002
From: (Tim Peters)
Date: Wed, 30 Jan 2002 01:29:39 -0500
Subject: [Python-Dev] Stale CVS lock
Message-ID: <>

There appears to be a stale anoncvs lock in the Misc directory preventing
checkins there (like NEWS); have opened an SF support request:

From  Wed Jan 30 10:54:14 2002
From: (Christian Tismer)
Date: Wed, 30 Jan 2002 11:54:14 +0100
Subject: [Python-Dev] Thread questionlet
References: <>
Message-ID: <>

Tim Peters wrote:

> [Christian Tismer]
>>My question, which I could not easily answer by reading
>>the source is:
>>What happens when the main thread ends? Do all threads run
>>until they are ready too, or are they just killed away?
> You're walking near the edge of a very steep cliff.  There are jagged rocks
> a kilometer below, so don't slip <wink>.

Uhmm -- I really didn't want to poke into something
problematic, but obviously I have no more simple questions
left. ;-)

> It varies by OS, and even by exactly how the main thread exits.  Reading OS
> docs doesn't really help either, because the version of threads exposed by
> the C libraries may differ from native OS facilities in subtle but crucial
> ways.

That doesn't sound as if it was designed that way; more like
just finding some way through these subtleties, without trying
to solve every platform's problems.

I don't try to solve this, either. But since I'm writing
some kind of platform-independent threads (isn't it funny?
by using non-portable tricks, I get some portable threads),
I'd like to think about what this world *could* look like.
Maybe I have a chance to provide a (u)thread implementation
which is really what people would want for real threads?

>>And if they are killed, are they just removed, or do
>>they all get an exception for cleanup?
> Can only be answered one platform at a time.  They're not going to get a
> *Python*-level exception, no.  Here's a simple test program:

[thanks for the test code]

> I expect much the same on Linux (all threads die, no exceptions raised).
> But, IIRC, the threads would keep going on SGI despite that the main thread
> is history.

So threads do force the programmer to write platform-dependent
Python code. Surely that is nothing Python wants;
it just happens.

>>When a thread ends, it may contain several levels of other
>>C calls which might need to finalize, so I thought of
>>a special exception for this, but didn't find such.
> Closing threads cleanly is the programmer's responsibility across all OSes.
> It can be very difficult.  Python doesn't really help (or hinder).

Ok with me, this is really not trivial. (I guessed that from reading
the source, but it really was not obvious. So I asked a naive
question, but you know me better...)
Maybe Python could try to help through an API?

> Microsoft helps in that DLLs can define a "call on thread detach" function
> that's automatically called when a thread detaches from the DLL, but Python
> doesn't exploit that.  The DLL hook may not get called even if it did,
> depending on exactly how a thread detaches (the Big Hammer last-chance Win32
> TerminateProcess/TerminateThread functions generally leave things a mess --
> "TerminateThread is a dangerous function that should only be used in the
> most extreme cases", etc).

Now the real question:
Suppose you had the opportunity I have: to define threads
which (mis)behave equally (un)well on every supported
platform, once and for all.

Would you try to mimic the median behavior of real threads as
they work today? Or would you try to build something consistent
and cross-platform that makes sense, and that would even make
sense for new revisions of the real thread modules?

I think there is a chance here to do a reference implementation
of (u)threads: since there are absolutely no OS-dictated
restrictions or doubtful MS-added features, we can just
do it right. Given that there is a suitable definition
of "right", of course.

The problem is that I'm not a specialist on threading,
therefore I'm asking for suggestions.
Please, what do you all think would be "right", given that
you have full control over your "virtual OS"?

constructively-but-trying-not-to-overdo - ly y'rs - chris


From  Wed Jan 30 14:54:30 2002
From: (Michael Hudson)
Date: 30 Jan 2002 14:54:30 +0000
Subject: [Python-Dev] can someone with purify run test_curses
Message-ID: <>

Please?  It segfaults for me, in a confusing way.

It only segfaults if I run it under regrtest, not directly.


  CLiki pages can be edited by anybody at any time. Imagine the most
  fearsomely comprehensive legal disclaimer you have ever seen, and
  double it                        --

From  Wed Jan 30 14:56:04 2002
From: (Aahz Maruch)
Date: Wed, 30 Jan 2002 06:56:04 -0800 (PST)
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <> from "Christian Tismer" at Jan 30, 2002 11:54:14 AM
Message-ID: <>

Christian Tismer wrote:
> I don't try to solve this, either. But since I'm writing some kind of
> platform-independent threads (isn't it funny? by using non-portable
> tricks, I get some portable threads), I'd like to think about what
> this world *could* look like.  Maybe I have a chance to provide a
> (u)thread implementation which is really what people would want for
> real threads?

No, you don't.  Real threads have one killer advantage you just can't
emulate: they can parallelize I/O operations (and theoretically
parallelize computations on multiple CPUs).  The advantage of
microthreads has been that they're lightweight, so they're good for
applications that require *lots* of threads, such as simulations.  I
think keeping this advantage would be a Good Idea.

You might want to look at Ruby, though, because it does what you're
wanting to do.  (I think -- I haven't touched Ruby myself.)
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Wed Jan 30 15:36:23 2002
From: (Neal Norwitz)
Date: Wed, 30 Jan 2002 10:36:23 -0500
Subject: [Python-Dev] can someone with purify run test_curses
References: <>
Message-ID: <>

Michael Hudson wrote:
> Please?  It segfaults for me, in a confusing way.
> It only segfaults if I run it under regrtest, not directly.

What version?  Current CVS?  Any other special instructions?
When do you need it done?


From  Wed Jan 30 15:43:19 2002
From: (Michael Hudson)
Date: 30 Jan 2002 15:43:19 +0000
Subject: [Python-Dev] can someone with purify run test_curses
In-Reply-To: Neal Norwitz's message of "Wed, 30 Jan 2002 10:36:23 -0500"
References: <> <>
Message-ID: <>

Neal Norwitz <> writes:

> Michael Hudson wrote:
> > 
> > Please?  It segfaults for me, in a confusing way.
> > 
> > It only segfaults if I run it under regrtest, not directly.
> What version?  Current CVS?  Any other special instructions?
> When do you need it done?

I don't think I do, now (see checkins).

But it might be worth running test_curses --with-pymalloc, if it's not
too much hassle.  I'll have a look to see if there are any other
Object/Mem mismatches.


  Our lecture theatre has just crashed. It will currently only
  silently display an unexplained line-drawing of a large dog
  accompanied by spookily flickering lights.
     -- Dan Sheppard, (from Owen Dunn's summary of the year)

From  Wed Jan 30 16:28:07 2002
From: (Neil Schemenauer)
Date: Wed, 30 Jan 2002 08:28:07 -0800
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: <>; from on Wed, Jan 30, 2002 at 07:47:36AM -0800
References: <>
Message-ID: <>

Michael Hudson wrote:
> Modified Files:
> 	_curses_panel.c 
> Log Message:
> Oh look, another one.
> 2.2.1 candiate (he says, largely talking to himself :)

> *** 192,196 ****
>       Py_DECREF(po->wo);
>       remove_lop(po);
> !     PyMem_DEL(po);
>   }
> --- 192,196 ----
>       Py_DECREF(po->wo);
>       remove_lop(po);
> !     PyObject_DEL(po);
>   }

I think we have to break down and do what Tim suggests.  Ie make:

    free == PyMem_DEL == PyObject_DEL == PyObject_FREE == ...

pymalloc needs to use a completely new set of APIs.  The only problem I
see is coming up with names.  NEW, MALLOC, REALLOC, RESIZE, and DEL are
all taken.  Any suggestions?


From  Wed Jan 30 17:01:17 2002
From: (Michael Hudson)
Date: 30 Jan 2002 17:01:17 +0000
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: Neil Schemenauer's message of "Wed, 30 Jan 2002 08:28:07 -0800"
References: <> <>
Message-ID: <>

Neil Schemenauer <> writes:

> Michael Hudson wrote:
> > Modified Files:
> > 	_curses_panel.c 
> > Log Message:
> > Oh look, another one.
> > 
> > 2.2.1 candiate (he says, largely talking to himself :)
> > *** 192,196 ****
> >       Py_DECREF(po->wo);
> >       remove_lop(po);
> > !     PyMem_DEL(po);
> >   }
> >   
> > --- 192,196 ----
> >       Py_DECREF(po->wo);
> >       remove_lop(po);
> > !     PyObject_DEL(po);
> >   }
> I think we have to break down and do what Tim suggests.  Ie make:
>     free == PyMem_DEL == PyObject_DEL == PyObject_FREE == ...
> pymalloc needs to use a completely new set of APIs.  The only problem I
> see is coming up with names.  NEW, MALLOC, REALLOC, RESIZE, and DEL are
> all taken.  Any suggestions?

And then change all the current uses of PyObject_Del to the new API?
What would that buy us?  Unless I misunderstand we *have* to do
something different to remove an object as opposed to freeing raw
storage (GC, for example).

I agree we have too many preprocessor macros, but I don't think we can
have free == PyObject_DEL.

I don't think what we have is so bad; a helpful tip is that if you're using
the _Free/_FREE/_Malloc/_REALLOC/etc interfaces, stop.  That gets rid
of half the problem.


  We've had a lot of problems going from glibc 2.0 to glibc 2.1.
  People claim binary compatibility.  Except for functions they
  don't like.                       -- Peter Van Eynde, comp.lang.lisp

From  Wed Jan 30 17:03:56 2002
From: (Michael Hudson)
Date: 30 Jan 2002 17:03:56 +0000
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: Michael Hudson's message of "30 Jan 2002 17:01:17 +0000"
References: <> <> <>
Message-ID: <>

Michael Hudson <> writes:

> Neil Schemenauer <> writes:
> > I think we have to break down and do what Tim suggests.  Ie make:
> > 
> >     free == PyMem_DEL == PyObject_DEL == PyObject_FREE == ...
> > 
> > pymalloc needs to use a completely new set of APIs.  The only problem I
> > see is coming up with names.  NEW, MALLOC, REALLOC, RESIZE, and DEL are
> > all taken.  Any suggestions?
> And then change all the current uses of PyObject_Del to the new API?
> What would that buy us?  Unless I misunderstand we *have* to do
> something different to remove an object as opposed to freeing raw
> storage (GC, for example).
> I agree we have too many preprocessor macros, but I don't think we can
> have free == PyObject_DEL.

No, I take that back...

  Just point your web browser at and
  look for "program", "doesn't", "work", or "my". Whenever you find
  someone else whose program didn't work, don't do what they
  did. Repeat as needed.    -- Tim Peters, on python-help, 16 Jun 1998

From  Wed Jan 30 17:13:48 2002
From: (Aahz Maruch)
Date: Wed, 30 Jan 2002 09:13:48 -0800 (PST)
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: <> from "Neil Schemenauer" at Jan 30, 2002 08:28:07 AM
Message-ID: <>

Neil Schemenauer wrote:
> pymalloc needs to use a completely new set of APIs.  The only problem I
> see is coming up with names.  NEW, MALLOC, REALLOC, RESIZE, and DEL are
> all taken.  Any suggestions?

From the Department of Redundancy Department, how about: PyMalloc_New,
PyMalloc_Malloc, ....
                      --- Aahz (

From  Wed Jan 30 17:40:50 2002
From: (Neil Schemenauer)
Date: Wed, 30 Jan 2002 09:40:50 -0800
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: <>; from on Wed, Jan 30, 2002 at 05:01:17PM +0000
References: <> <> <>
Message-ID: <>

Michael Hudson wrote:
> I agree we have too many preprocessor macros, but I don't think we can
> have free == PyObject_DEL.

If we don't then many extension modules will break.  The example module
Modules/xxmodule.c used to allocate using PyObject_New and deallocate
using free().  I believe there are many modules out there that do the
same (or use PyMem_Del, etc).
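
The mismatch described above can be pictured with a toy model.  This is
only a sketch in Python, not the C API: ToyAllocator, system_heap and
object_pool are hypothetical names, and a real mismatched free() would
corrupt the heap silently instead of raising an exception.

```python
class ToyAllocator:
    """Toy model of an allocator that can only free blocks it handed out."""

    def __init__(self, name):
        self.name = name
        self.live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        block = (self.name, self._next)
        self.live.add(block)
        return block

    def free(self, block):
        if block not in self.live:
            raise RuntimeError(
                "%s asked to free foreign block %r" % (self.name, block))
        self.live.remove(block)


system_heap = ToyAllocator("malloc")     # stands in for malloc()/free()
object_pool = ToyAllocator("pymalloc")   # stands in for an object allocator

# Matching pair: allocate and release through the same allocator.
obj = object_pool.alloc()
object_pool.free(obj)

# Mismatched pair, in the spirit of "PyObject_New, then free()".
obj = object_pool.alloc()
try:
    system_heap.free(obj)
    outcome = "no error"
except RuntimeError as exc:
    outcome = str(exc)

print(outcome)
```

If both spellings were wired to one and the same allocator, which is what
the free == PyObject_DEL proposal amounts to, the second case would
succeed just like the first.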


From  Wed Jan 30 17:39:45 2002
From: (Paul Dubois)
Date: Wed, 30 Jan 2002 09:39:45 -0800
Subject: [Python-Dev] Odd errors when catching ImportError
Message-ID: <000701c1a9b5$1bccb5e0$09860cc0@CLENHAM>

Please excuse me if this is in the bug list; I looked through it but the
list is too long for old people.

I have been running into a number of odd errors caused by code like the
following. The behavior seems to be machine dependent.

fooflag = 0
try:
    import foo
except ImportError:
    fooflag = 1

I have had this result in a seg fault upon exit, and also when something
like this was in file inside a package, and the did

from xxx import fooflag

I've had it tell me xxx had no attribute fooflag. I added "print fooflag" at
the bottom of the file and it fixed it. That was on a DEC. On Linux it worked.

I suppose I should be testing for the ability to import foo some other way
but I don't know what it is.

From sdm7g@Virginia.EDU  Wed Jan 30 18:00:48 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 30 Jan 2002 13:00:48 -0500 (EST)
Subject: [Python-Dev] next vs darwin
Message-ID: <>

I recall having the discussion but I don't quite recall the
resolution: Is Next support now officially dropped from the
distribution ?

I have a revised dynamic loading module that strips out all
of the dead branches ( as well as better error reporting ):
I was going to call it dynload_darwin.c and add support to
configure, but grepping thru configure I only saw darwin
as triggering dynload_next.c -- it *looks* like the Next
has been dropped.

Should we rename the file anyway ? ( to make it easier for
folks to know where to look. )

There has also been some discussion on the pythonmac-sig list
about dynamic loading. There are some other problems that
this module doesn't fix yet. If someone wants to submit a
better one, that's fine by me, but we REALLY need to get
the better error reporting in there so we can at least
find the problem.

The other thing that's been discussed is adding configure
support to build with the dlopen compatibility libs if
that is available. ( doing config with --without-dyld
doesn't seem to change anything. )

-- Steve

From  Wed Jan 30 19:04:59 2002
From: (Martin v. Loewis)
Date: 30 Jan 2002 20:04:59 +0100
Subject: [Python-Dev] next vs darwin
In-Reply-To: <>
References: <>
Message-ID: <>

Steven Majewski <sdm7g@Virginia.EDU> writes:

> I recall having the discussion but I don't quite recall the
> resolution: Is Next support now officially dropped from the
> distribution ?

AFAIR, yes.

> Should we rename the file anyway ? ( to make it easier for
> folks to know where to look. )


> The other thing that's been discussed is adding configure
> support to build with the dlopen compatibility libs if
> that is available.

Can you please explain what that would provide to module users or end
users? Would there be additional modules available that otherwise
wouldn't be available? If not, I don't think that this option should
be provided.


From  Wed Jan 30 19:15:33 2002
From: (M.-A. Lemburg)
Date: Wed, 30 Jan 2002 20:15:33 +0100
Subject: [Python-Dev] Mixing memory management APIs
References: <> <> <> <>
Message-ID: <>

Neil Schemenauer wrote:
> Michael Hudson wrote:
> > I agree we have too many preprocessor macros, but I don't think we can
> > have free == PyObject_DEL.
> If we don't then many extension modules will break.  The example module
> Modules/xxmodule.c used to allocate using PyObject_New and deallocate
> using free().  I believe there are many modules out there that do the
> same (or use PyMem_Del, etc).

Breaking extensions is not a good idea. After all, these make
Python so much fun to work with (since most of the work is usually
already done ;-).

I do think that we should keep the differentiation between
allocating raw memory buffers and space for Python objects.

Even though this is not currently used, it clarifies the
code somewhat, e.g. to free memory allocated for a Python
object you write PyObject_FREE(), for a raw buffer you
write PyMem_FREE(). 

Who knows... perhaps we might want to
handle Python object memory blocks differently in the
future (e.g. build pymalloc support right into 
PyObject_NEW() and PyObject_DEL()) while leaving user space
memory in the realm of malloc() et al.

Marc-Andre Lemburg
CEO Software GmbH

From  Wed Jan 30 20:41:25 2002
From: (Rebecca Schaefer)
Date: Wed, 30 Jan 2002 15:41:25 -0500
Subject: [Python-Dev] Python/Web developer
Message-ID: <>

TEKsystems in Appleton, WI has an opening for a web developer with
Python, Zope, html, SQL, UNIX, and Perl experience.  It is a long term
contract opportunity. Any interested candidates should email Rebecca
Schaefer at
Thank you,


From sdm7g@Virginia.EDU  Wed Jan 30 20:42:53 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 30 Jan 2002 15:42:53 -0500 (EST)
Subject: [Python-Dev] next vs darwin
In-Reply-To: <>
Message-ID: <>

On 30 Jan 2002, Martin v. Loewis wrote:

> > The other thing that's been discussed is adding configure
> > support to build with the dlopen compatibility libs if
> > that is available.
> Can you please explain what that would provide to module users or end
> users? Would there be additional modules available that otherwise
> wouldn't be available? If not, I don't think that this option should
> be provided.

dlcompat libs are used by Apple to build Apache and some other programs.
The libs are not included in Mac OSX, although the sources are available
in the Darwin CVS, and an improved version is distributed on Fink and
maybe other places. Since additional libs are required, I would not
make that the default. ( unless, since there's already a check for
libdl in config, we make it dependent on that. )

The problem is that the current dynload_next is broken, and we've
had trouble replicating tests and solutions: because of, among other
problems, the very poor error reporting in dynload_next, everyone
is starting from a differently hacked version of the 2.2 distribution.
(The other variable is which modules and packages people are loading.)

Reportedly, using the dlcompat libs fixes some problems for some people.

Obviously, the best solution would be an even better dynload_darwin
that fixes all of the problems. But in the interim, I'd like to
at least get everyone debugging from the same baseline.

If there's a strong objection to adding optional libdl support,
I can live without that. Adding it would just make it easier for
folks to test that configuration and build.

Getting a less broken dynload module is probably more important.

-- Steve.

From sdm7g@Virginia.EDU  Wed Jan 30 23:23:57 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Wed, 30 Jan 2002 18:23:57 -0500 (EST)
Subject: [Python-Dev] Apple PublicSource license [was: next vs darwin]
In-Reply-To: <>
Message-ID: <>

I did another version of dynload_darwin that took a >10 line
function from the dlcompat/dlopen.c code which uses an undocumented
(at least in the man pages -- there's probably comments in the
Darwin source code) non-public way around the public/private
namespace problem we were having with the previous version.

I'm waiting for some folks on pythonmac-sig to test it and report
back. I'm guessing that this solves the problem without requiring
libdl. However it gets into the possible problem of including
another license.

Could someone who understands these issues a bit more than I, look
at this:

Apple Public Source License:

-- Steve

BTW: Here's the magic code I added from dlcompat/dlopen.c:
( On the one hand, it's fairly short and trivial. On the other,
  I wouldn't have had a clue about this without reading the code! )

/*
 * NSMakePrivateModulePublic() is not part of the public dyld API so we define
 * it here.  The internal dyld function pointer for
 * __dyld_NSMakePrivateModulePublic is returned so that's all that matters to get
 * the functionality needed to implement the dlopen() interfaces.
 */
enum bool
NSMakePrivateModulePublic(
                          NSModule module)
{
  static enum bool (*p)(NSModule module) = NULL;

  if(p == NULL)
    _dyld_func_lookup("__dyld_NSMakePrivateModulePublic",
                      (unsigned long *)&p);
  if(p == NULL){
#ifdef DEBUG
    printf("_dyld_func_lookup of __dyld_NSMakePrivateModulePublic "

From  Wed Jan 30 23:49:19 2002
From: (Tim Peters)
Date: Wed, 30 Jan 2002 18:49:19 -0500
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: <>
Message-ID: <>

[NeilS, growing older but wiser, embraces the wisdom of giving up <wink>]

[Michael Hudson]
> And then change all the current uses of PyObject_Del to the new API?

It would mean changing all uses of all memory macros in the core to use new

> What would that buy us?

The possibility to move to Vladimir's malloc implementation without breaking
any extension modules (none:  no breakage at all).  I want the Python core
to use Vladimir's malloc/free, and until the fabled free-threading gets
implemented, to use a version that *exploits* the GIL to eliminate
malloc/free lock overhead too.  We know for a fact that some major extension
modules misused the existing memory API (via mismatching "get memory"->"free
memory" pairs), and it's so Byzantine and ill-documented that this shouldn't
be a surprise.

Beyond that, I don't believe we've ever said anything about thread safety
wrt the existing memory API, simply because we relied on the platform
malloc/free to provide thread safety even in the worst of cases.  But if the
Python core switches to a gimmick that relies on the GIL, then even
extensions that use the current API properly (wrt correct matching pairs)
may get into huge trouble if the underlying allocator stops doing its own
layer of locking.

The intent with new macros would be to spell out the rules.  Extensions that
wanted to play along could switch, while extensions that ignored the issues
would continue to work with the existing seven ways to spell "malloc" (and
seven to spell "free").

> ...
> I don't think what we have is so bad; a helpful tip is that if you're using
> the _Free/_FREE/_Malloc/_REALLOC/etc interfaces, stop.  That gets rid
> of half the problem.

But only for extensions that are actively maintained by people who are keen
to dig into how they've abused the current API.  It's likely easier for the
core to move to new macros than to fully debug even one large extension
module that's been working so far by luck.

a-big-hammer-wouldn't-be-called-that-if-it-weren't-big<wink>-ly y'rs  - tim

From  Wed Jan 30 23:51:56 2002
From: (Tim Peters)
Date: Wed, 30 Jan 2002 18:51:56 -0500
Subject: [Python-Dev] Mixing memory management APIs
In-Reply-To: <>
Message-ID: <>

> pymalloc needs to use a completely new set of APIs.  The only problem I
> see is coming up with names.  NEW, MALLOC, REALLOC, RESIZE, and DEL are
> all taken.  Any suggestions?

I liked Aahz's suggestion to start them with "PyMalloc_" well enough.  Most
of us use editors with word-completion anyway <wink>.

From  Thu Jan 31 00:13:58 2002
From: (Neal Norwitz)
Date: Wed, 30 Jan 2002 19:13:58 -0500
Subject: [Python-Dev] Mixing memory management APIs
References: <>
Message-ID: <>

At Michael Hudson's request, I tried running Purify with
--with-pymalloc enabled.  The results were a bit surprising: 13664 errors!

All the errors were in unicodeobject.c.  There were 3 types of errors:
Free Memory Reads, Array Bounds Reads, and Uninitialized Memory Reads.
The line #s were in strange places (e.g., in a function declaration
and accessing self->length in an if clause, after it was accessed w/o error).
The line #s are primarily:  unicodeobject.c:2875, and unicodeobject.c:2214.

Has anyone else used Purify and/or Insure --with-pymalloc?

BTW, test_curses fails:
	test test_curses crashed -- _curses.error: curs_set() returned ERR

Solaris 2.8, Purify 2002.

Problems (error lines begin with =>)

            PyUnicode_TranslateCharmap [unicodeobject.c:2214]
               PyObject *PyUnicode_EncodeASCII(const Py_UNICODE *p,
                                               int size,
            =>                                 const char *errors)
                   PyObject *repr;
                   char *s, *start;

            split_char     [unicodeobject.c:2875]
                   if (end > self->length)
                       end = self->length;
                   if (end < 0)
            =>         end += self->length;
                   if (end < 0)
                       end = 0;

From  Thu Jan 31 00:33:48 2002
From: (Tim Peters)
Date: Wed, 30 Jan 2002 19:33:48 -0500
Subject: [Python-Dev] Odd errors when catching ImportError
In-Reply-To: <000701c1a9b5$1bccb5e0$09860cc0@CLENHAM>
Message-ID: <>

[Paul Dubois]
> ...
> I have been running into a number of odd errors caused by code like the
> following. The behavior seems to be machine dependent.

Which version(s) of Python?  (Released, current CVS, all, ...?)

> fooflag = 0
> try:
>     import foo
> except ImportError:
>     fooflag = 1
> I have had this result in a seg fault upon exit,

Does or does not "foo" exist?  Or does it segfault both ways?  Either way,
run Python -vv to get a trace of what it's trying during the import attempt.
The last line displayed before the segfault may be a useful clue.  You may
even discover you're really importing a compiled foo extension module with a
hardcoded segfault in module init <wink>.

> and also when something like this was in file inside a package,
> and the did
> from xxx import fooflag
> I've had it tell me xxx had no attribute fooflag. I added "print
> fooflag" at the bottom of the file and it fixed it. That was on a DEC.
> On Linux it worked.
> I suppose I should be testing for the ability to import foo some other
> way but I don't know what it is.

That's "the usual" way to check imports; if it were a widespread problem
under any version of Python, I expect we would have heard about it before.
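
A minimal sketch of that idiom, runnable as-is.  The module name
_surely_missing_module_ is hypothetical, picked only so that the import
fails on an ordinary installation:

```python
# Probe for an optional module and record whether it is available.
# `_surely_missing_module_` is a made-up name that should not exist.
fooflag = 0
try:
    import _surely_missing_module_
except ImportError:
    # Reached when the module cannot be found or fails to import.
    fooflag = 1

print(fooflag)
```

After the block runs, fooflag is bound at module level on both paths, so
a later "from xxx import fooflag" should find it.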

If you have useful followups, you should record them in a bug report on
SourceForge (Python-Dev is a black hole for bug reports).

From  Wed Jan 30 21:15:11 2002
From: (Andrew MacIntyre)
Date: Thu, 31 Jan 2002 08:15:11 +1100 (EDT)
Subject: [Python-Dev] updated patches for OS/2 EMX port
In-Reply-To: <>
Message-ID: <>

I've let this lie for a few days to see whether any other comments were
forthcoming, but nothing's turned up...

On 27 Jan 2002, Martin v. Loewis wrote:

> Andrew MacIntyre <> writes:
> > - Modules/unicodedata.c is affected by a name clash between the internally
> > defined _getname() and an EMX routine of the same name defined in
> > <stdlib.h>.  The patch renames the internal routine to _getucname() to
> > avoid this, but this change may not be acceptable - advice please.
> My advice for renaming things because of name clashes: Always rename
> in a way that solves this particular problem for good, by using the Py
> prefix (or _Py to further indicate that this is not public API; it's a
> static function, anyway). Somebody may have a function _getucname
> somewhere, whereas it is really unlikely that people add a Py prefix
> to their functions (if they have been following the last 30 years of C
> programming).

Fair enough.  I was trying to minimise stylistic differences in the fix,
but if using _Py_getname is the canonical solution, that's easy fixed.

> > - Objects/stringobject.c and Objects/unicodeobject.c contain changes to
> > handle the EMX runtime library returning "0x" as the prefix for output
> > formatted with a "%X" format.
> I'd suggest a different approach here, which does not use #ifdefs:
> Instead of testing for the system, test for the bug. Then, if the bug
> goes away, or appears on other systems as well, the code will be good.

I did it the way I did because there's already code dealing with other
brokenness in this area which doesn't solve the EMX issue, and the #ifdef
solution minimises the risk of EMX fixes breaking something else which I
can't test.  At this stage I can't see this bug being fixed in EMX :-(

> Once formatting is complete, see whether it put in the right letter,
> and fix that in the result buffer if the native sprintf got it wrong.
> If you follow this strategy, you should still add a comment indicating
> that this was added for OS/2, to give people an idea where that came
> from.

Definitely a more general approach, which I'll look at in detail.
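
The test-for-the-bug strategy quoted above can be sketched in Python.
This is only an illustration under assumptions: fix_hex_prefix and
buggy_format are hypothetical names, the bug is assumed to be a lowercase
"0x" prefix where "%#X" should give "0X", and the real fix would live in
the C formatting code, not in Python.

```python
def fix_hex_prefix(formatted, conversion):
    """Return `formatted` with its hex prefix case matching `conversion`.

    `formatted` is what the native formatter produced for an alternate-form
    hex conversion; `conversion` is the conversion letter, "x" or "X".
    """
    # Skip one optional sign or blank before looking for the prefix.
    start = 1 if formatted[:1] in ("+", "-", " ") else 0
    prefix = formatted[start:start + 2]
    if prefix.lower() == "0x" and prefix[1] != conversion:
        # Test for the bug, not the platform: only wrong output is touched.
        formatted = formatted[:start + 1] + conversion + formatted[start + 2:]
    return formatted


def buggy_format(value):
    """Hypothetical stand-in for a C library that emits "0x" for "%#X"."""
    return "0x%X" % value


print(fix_hex_prefix(buggy_format(255), "X"))   # 0XFF
print(fix_hex_prefix("%#X" % 255, "X"))         # 0XFF (already correct)
print(fix_hex_prefix("%#x" % 255, "x"))         # 0xff (already correct)
```

Because the check looks at the output and not at the platform, correct
output passes through unchanged, and the repair would also apply to any
other runtime showing the same bug.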

> Another approach would be to autoconfiscate this particular issue. I'm
> in general in favour of autoconf'ed bug tests instead of runtime bug
> tests, but people on systems without /bin/sh might feel differently.

While there are sh/bash shells for OS/2, they're not standard equipment.
Autoconf also has a very spotty record on OS/2, although there are people
trying to improve that.

> > If there are no unresolvable objections, and approval to apply these
> > patches is granted, I propose that the patches be applied as follows:-
> >
> > Stage 1:  the build patch (creates+populates PC/os2emx/)
> > Stage 2:  the Lib/plat-os2emx/ patch
> > Stage 3:  the Lib/ and Lib/test/ patches
> > Stage 4:  the distutils patch
> > Stage 5:  the Include/, Objects/ and Python/ patches
> > Stage 6:  the Modules/ patch
> >
> > I would expect to allow at least 48 hours between stages.
> >
> > Comments/advice on this proposal also appreciated.
> Sounds good to me (although I'd probably process the "uncritical",
> i.e. truly platform-specific parts much more quickly). Who's going to
> work with Andrew to integrate this stuff?

The last I heard, Guido expected I was going to commit my own patches
(after review), so I was allowing time for my initial attempts to commit
to be checked by regular builders/testers of the tree before getting to
changes that affect non-EMX specific parts of Python.

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Thu Jan 31 04:41:13 2002
From: (Tim Peters)
Date: Wed, 30 Jan 2002 23:41:13 -0500
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <>
Message-ID: <>

If you ask Guido, the only reason to use threads is to do overlapped I/O.
And if you come up with a good counter-example, he'll find a way to *call*
it overlapped I/O, so there's no opposing him on this <wink>.

That's clearly a huge reason in practice to use threads, and that reason
requires using platform threads (to get true overlap).  Another huge reason
is to play nice with threaded libraries written in other languages, and
again that requires playing along with platform threads.

So what most thread users want is what Python gives them:  a thin wrapper
around native threads, complete with platform quirks.

The module adds *some* sanity to that, providing portable APIs
for some important synch primitives, and uniform thread shutdown semantics
(as Guido pointed out, when you use's thread wrappers, when the
main thread exits it waits for all (non-daemon) threads to quit).
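
Assuming the module meant here is the standard library's threading (the
name is missing from the archived text), the shutdown behaviour can be
observed with a short script.  The child program below is hypothetical;
its main thread finishes at once while a non-daemon worker is still
sleeping:

```python
import subprocess
import sys
import textwrap

# Program run in a separate interpreter so that its main thread really ends.
child_source = textwrap.dedent("""
    import threading, time

    def worker():
        time.sleep(0.2)
        print("worker finished")

    threading.Thread(target=worker).start()   # non-daemon by default
    print("main thread done")
""")

result = subprocess.run(
    [sys.executable, "-c", child_source],
    capture_output=True, text=True, timeout=30,
)
print(result.stdout)
```

The worker's line still appears in the output, because the interpreter
waits for the non-daemon thread before exiting.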

What people seem to ask for most often now is a way for one thread to tell
another thread to stop.  In practice I've always done this by having each
thread poll a "time to stop?" variable from time to time.  That's not what
people want, though.  They want a way to *force* another thread to stop,
even if (e.g.) the target thread is stuck in a blocking read, or in the
middle of doing an extraordinarily expensive regexp search.  There simply
isn't a portable way to do that.  Java initially spec'ed a way to do it in
its thread model, but declared that deprecated after obtaining experience
with it:  not only did it prove impossible to implement in all cases, but
even when it worked, the thread that got killed had no way to leave global
invariants in a sane state (e.g., the thread may have had any number of
synch gimmicks-- like locks --in various states, and global invariants for
synch gimmicks can't tolerate a participant vanishing without both extreme
care and a way for a thread to declare itself unstoppable at times).

So that's a mess, but that's still what people want.  OTOH, they won't want
it for long if they get it (just as Java ran screaming from it).
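
The polling approach described above might look like this in outline,
assuming threading.Event as the shared "time to stop?" flag (all names
are illustrative):

```python
import threading
import time

stop_requested = threading.Event()   # the shared "time to stop?" variable
units_done = []

def worker():
    # Work in small units and poll the flag between them, so the thread
    # only stops at a point where its own invariants are intact.
    while not stop_requested.is_set():
        units_done.append(len(units_done))
        time.sleep(0.01)

t = threading.Thread(target=worker)
t.start()

# Let the worker make some progress before asking it to stop.
while not units_done:
    time.sleep(0.01)

stop_requested.set()     # ask, rather than force, the thread to stop
t.join(timeout=5)
print(t.is_alive(), len(units_done) > 0)
```

A thread stuck in a blocking read never gets back to the check, which is
the limitation that makes people ask for a forced stop.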

I'm not sure the audience for coroutine-style threads even overlaps.  You
could try to marry both models, by running coroutine-style threads in a pool
of OS threads.  Then, e.g., provided you knew a potentially blocking I/O
call when you saw one, you could farm it out to one of the real threads.  If
you can't do that, then I doubt the "real thread" crowd will have any
interest in uthreads (or will, but treat them as an entirely distinct
facility -- which, for their purposes, they would be).

For purposes of computational parallelism (more my background than
Guido's -- the idea that you might want to use a thread to avoid blocking on
I/O was novel to me <wink>), the global interpreter lock renders Python
useless except for prototyping, so there's not much point digging into the
hundreds of higher-level parallelism models that have been developed.

IOW, uthreads are their own universe.  I haven't used them, so don't know
what would be useful.  What do the current uthread users ask for?  That's
where I'd start.

From  Thu Jan 31 09:45:24 2002
From: (M.-A. Lemburg)
Date: Thu, 31 Jan 2002 10:45:24 +0100
Subject: [Python-Dev] updated patches for OS/2 EMX port
References: <>
Message-ID: <>

Andrew MacIntyre wrote:
> On 27 Jan 2002, Martin v. Loewis wrote:
> > Andrew MacIntyre <> writes:
> >
> > > - Modules/unicodedata.c is affected by a name clash between the internally
> > > defined _getname() and an EMX routine of the same name defined in
> > > <stdlib.h>.  The patch renames the internal routine to _getucname() to
> > > avoid this, but this change may not be acceptable - advice please.
> >
> > My advice for renaming things because of name clashes: Always rename
> > in a way that solves this particular problem for good, by using the Py
> > prefix (or _Py to further indicate that this is not public API; it's a
> > static function, anyway). Somebody may have a function _getucname
> > somewhere, whereas it is really unlikely that people add a Py prefix
> > to their functions (if they have been following the last 30 years of C
> > programming).
> Fair enough.  I was trying to minimise stylistic differences in the fix,
> but if using _Py_getname is the canonical solution, that's easy fixed.

> > > - Objects/stringobject.c and Objects/unicodeobject.c contain changes to
> > > handle the EMX runtime library returning "0x" as the prefix for output
> > > formatted with a "%X" format.
> >
> > I'd suggest a different approach here, which does not use #ifdefs:
> > Instead of testing for the system, test for the bug. Then, if the bug
> > goes away, or appears on other systems as well, the code will be good.
> I did it the way I did because there's already code dealing with other
> brokenness in this area which doesn't solve the EMX issue, and the #ifdef
> solution minimises the risk of EMX fixes breaking something else which I
> can't test.  At this stage I can't see this bug being fixed in EMX :-(

I'd go with Martin's suggestion here: there already is code in 
formatint() which tests for '%#X' adding '0x' or not. This code
should be made to handle the special case by testing for it --
who knows: there may be other platforms where this doesn't work
as expected either.
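
To illustrate "test for the bug, not for the system": the real fix would be
C code in formatint(), so the following is only a Python sketch of the idea.
The helper name normalize_hex_prefix is hypothetical; it inspects whatever
the platform produced and repairs the prefix, without asking which platform
it is running on.

```python
def normalize_hex_prefix(formatted, uppercase):
    """Return formatted with exactly one hex prefix of the wanted case.

    formatted is the text a possibly quirky C library produced for a
    '%#x' or '%#X' conversion; uppercase says which prefix is wanted.
    """
    sign = ""
    body = formatted
    if body.startswith(("+", "-")):
        sign, body = body[0], body[1:]
    if body[:2] in ("0x", "0X"):
        body = body[2:]             # drop whatever prefix the library chose
    prefix = "0X" if uppercase else "0x"
    return sign + prefix + body
```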

BTW, could you point me to your patch for this ?

Marc-Andre Lemburg
CEO Software GmbH
Company & Consulting:                 
Python Software:         

From  Thu Jan 31 10:32:29 2002
From: (Michael Hudson)
Date: 31 Jan 2002 10:32:29 +0000
Subject: [Python-Dev] test_curses
In-Reply-To: Neal Norwitz's message of "Wed, 30 Jan 2002 19:13:58 -0500"
References: <> <>
Message-ID: <>

Neal Norwitz <> writes:

> BTW, test_curses fails:  
> 	test test_curses crashed -- _curses.error: curs_set() returned ERR

Hmm, yes I get that too (I'd commented it out, because I thought it
had something to do with the crashes, but it didn't AFAICT).

It's very strange.  I tried wrestling with gdb to find out how it was
failing, but didn't get very far.

It's hard to see how curs_set can fail.  Maybe I need to build ncurses
from source and link to that.


  ARTHUR:  Yes.  It was on display in the bottom of a locked filing
           cabinet stuck in a disused lavatory with a sign on the door
           saying "Beware of the Leopard".
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 1

From  Thu Jan 31 10:44:39 2002
From: (Aahz Maruch)
Date: Thu, 31 Jan 2002 02:44:39 -0800 (PST)
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <> from "Tim Peters" at Jan 30, 2002 11:41:13 PM
Message-ID: <>

Tim Peters wrote:
> For purposes of computational parallelism (more my background than
> Guido's -- the idea that you might want to use a thread to avoid
> blocking on I/O was novel to me <wink>), the global interpreter lock
> renders Python useless except for prototyping, so there's not much
> point digging into the hundreds of higher-level parallelism models
> that have been developed.

Well, maybe.  I'm still hoping to prove you at least partly wrong one of
these years.  ;-)

(The long-term plan for my BCD module is to turn it into a C extension
that releases the GIL.  If that's successful, I'll start working on ways
to have Numeric release the GIL.)
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Thu Jan 31 11:18:30 2002
From: (Michael Hudson)
Date: 31 Jan 2002 11:18:30 +0000
Subject: [Python-Dev] test_curses
In-Reply-To: Michael Hudson's message of "31 Jan 2002 10:32:29 +0000"
References: <> <> <>
Message-ID: <>

Michael Hudson <> writes:

> Neal Norwitz <> writes:
> > BTW, test_curses fails:  
> > 	test test_curses crashed -- _curses.error: curs_set() returned ERR
> Hmm, yes I get that too (I'd commented it out, because I thought it
> had something to do with the crashes, but it didn't AFAICT).
> It's very strange.  I tried wrestling with gdb to find out how it was
> failing, but didn't get very far.
> It's hard to see how curs_set can fail.  Maybe I need to build ncurses
> from source and link to that.

Heh, well I worked that one out.  The terminfo database didn't contain
entries for cursor visibility for the $TERM I had (xterm-color, for
some forgotten reason).  Setting it to xterm made test_curses pass
for me.

much-python-dev-noise-not-much-content-from-me-ly y'rs

  That's why the smartest companies use Common Lisp, but lie about it
  so all their competitors think Lisp is slow and C++ is fast.  (This
  rumor has, however, gotten a little out of hand. :)
                                        -- Erik Naggum, comp.lang.lisp

From  Thu Jan 31 08:25:54 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 03:25:54 -0500
Subject: [Python-Dev] opcode performance measurements
Message-ID: <>

I've made some simple measurements of how long opcodes take to execute
and how long it takes to go around the mainloop, using the Pentium
timestamp counter, which measures processor cycles.

The results aren't particularly surprising, but they provide some
empirical validation of what we've believed all along.  I don't have
time to go into all the gory details here, though I plan to at
Spam 10 developers day next week.

I put together a few Web pages that summarize the data I've collected
on some simple benchmarks:

Comments and questions are welcome.  I've got a little time to do more
measurement and analysis before devday.


From  Thu Jan 31 18:37:16 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 12:37:16 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

    Jeremy> I've made some simple measurements of how long opcodes take to
    Jeremy> execute and how long it takes to go around the mainloop ...

    Jeremy> Comments and questions are welcome.  I've got a little time to
    Jeremy> do more measurement and analysis before devday.

Interesting results.  I've been working on my {TRACK,UNTRACK}_GLOBAL opcode
implementations.  I have an optimizer filter that sets up tracking for all
LOAD_GLOBAL,{LOAD_ATTR}* combinations.  It's still not quite working and
will only be a proof of concept by devday if I do get it working, but I
expect most of these expensive opcode combinations to collapse into a
LOAD_FAST, with the addition of a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed
at function start and end, respectively.


From  Thu Jan 31 19:48:03 2002
From: (Jeff Epler)
Date: Thu, 31 Jan 2002 13:48:03 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <> <>
Message-ID: <>

On Thu, Jan 31, 2002 at 12:37:16PM -0600, Skip Montanaro wrote:
> Interesting results.  I've been working on my {TRACK,UNTRACK}_GLOBAL opcode
> implementations.  I have an optimizer filter that sets up tracking for all
> LOAD_GLOBAL,{LOAD_ATTR}* combinations.  It's still not quite working and
> will only be a proof of concept by devday if I do get it working, but I
> expect most of these expensive opcode combinations to collapse into a
> LOAD_FAST, with the addition of a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed
> at function start and end, respectively.

Won't there be code that this slows down?  For instance, the code
generated by
    print "f = lambda: 0"
    print "def g():"
    print "\tif f():"  # prevent optimization of 'if 0:'
    print "\t\tx = []"
    for i in range(10000):
	print "\t\tx.append(global_%d)" % i
    print "\t\treturn x"
    print "\treturn []"

not to mention, will it even work?  TRACK_GLOBAL will have to make
special note of globals that didn't exist yet when the function prologue
is executed, and either not subsequently execute the load as a LOAD_FAST
or else have a special value that causes the same NameError "global name
'global_666' is not defined" message, not an UnboundLocalError...
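
The two errors being distinguished here can be seen side by side in a small
sketch (present-day Python; the function names are made up for the example):

```python
def read_missing_global():
    return global_666  # never defined at module level


def read_unbound_local():
    if False:
        x = []  # makes x a local, but the assignment never runs
    return x


def error_type(func):
    """Return the name of the exact exception class func raises."""
    try:
        func()
    except NameError as exc:  # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
    return None
```

An optimizer that turns a global load into a local-style load has to keep
reporting the first kind of error, not the second.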

The latter sounds easy enough to solve, but how do you make sure that
this optimization is never a pessimization (aside from sending
programmers such as myself to the retraining camps of the PSU)?

PS Hey, that's remarkable .. usually people get unexpectedly cut off
when they try to mentio

From  Thu Jan 31 10:14:28 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 05:14:28 -0500
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "SM" == Skip Montanaro <> writes:

  SM> Interesting results.  I've been working on my
  SM> {TRACK,UNTRACK}_GLOBAL opcode implementations.  I have an
  SM> optimizer filter that sets up tracking for all
  SM> LOAD_GLOBAL,{LOAD_ATTR}* combinations.  It's still not quite
  SM> working and will only be a proof of concept by devday if I do
  SM> get it working, but I expect most of these expensive opcode
  SM> combinations to collapse into a LOAD_FAST, with the addition of
  SM> a TRACK_GLOBAL/UNTRACK_GLOBAL pair executed at function start
  SM> and end, respectively.

I won't have any implementation done at all, but should have finished
the design for LOAD_FAST-style access to globals and module
attributes.  I also have some ideas about a Python bytecode specializer
that would work essentially like a JIT but generate specialized
bytecode instead of machine code.


PS Skip-- Sorry the PEP isn't clear, but the only dictionary lookups
that need to occur are at function creation time.  MAKE_FUNCTION would
need to lookup the offsets of the globals used by the functions, so
that a LOAD_FAST_GLOBAL opcode would take an int argument.

From  Thu Jan 31 20:17:25 2002
From: (Jeff Epler)
Date: Thu, 31 Jan 2002 14:17:25 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <> <> <>
Message-ID: <>

On Thu, Jan 31, 2002 at 05:14:28AM -0500, Jeremy Hylton wrote:
> PS Skip-- Sorry the PEP isn't clear, but the only dictionary lookups
> that need to occur are at function creation time.  MAKE_FUNCTION would
> need to lookup the offsets of the globals used by the functions, so
> that a LOAD_FAST_GLOBAL opcode would take an int argument.

So how does this work for mutually-recursive functions?
    def f(x):
	if x==1: return 1
	return g(x)

    def g(x): return x * f(x-1)

can f not optimize the load of the global g into a LOAD_FAST_GLOBAL?

PS Which PEP?  I only see 266

From  Thu Jan 31 20:27:59 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 14:27:59 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

    Jeff> Won't there be code that this slows down?  For instance, the code
    Jeff> generated by ...

Sure, there will be code that slows down.  That's why I said what I am
working on is a proof of concept.  Right now, each function the optimizer
operates on converts it to the equivalent of

        using, and z

There are no checks for obvious potential problems at the moment.  Safeguards
that would address them include (but are not limited to):

    * Only track globals that are accessed in loops.  This would eliminate
      your corner case and should be easily handled (only work between
      SETUP_LOOP and its jump target).

    * Only track globals when there are <= 256 globals (half an oparg - the
      other half being an index into the fastlocals array).  This would also
      cure your problem.

    * Only track globals that are valid at the start of function execution,
      or defer tracking setup until they are.  This can generally be avoided
      by not tracking globals that are written during the function's
      execution, but other safeguards will probably be necessary to ensure
      that it works properly.

    Jeff> ... how do you make sure that this optimization is never a
    Jeff> pessimization ...

I expect in the majority of cases either my idea or Jeremy's will be a net
win, especially after seeing his timing data.  I'm willing to accept that in
some situations the code will run slower.  I'm confident they will be a
small minority.

Tim Peters can construct cases where dicts perform badly.  Does that mean
Python shouldn't have dicts? ;-)


From  Thu Jan 31 20:29:47 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 14:29:47 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

    Jeff> PS Which PEP?  I only see 266

Mine is 266.  Jeremy's is 267.


From  Thu Jan 31 20:39:10 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 14:39:10 -0600
Subject: [Python-Dev] What's up w/ CVS?
Message-ID: <>

Anyone have any idea what's going on w/ SF CVS?  I tried to "cvs up" earlier
and it just hung.  I just tried again now and got the dreaded "WARNING:


From Samuele Pedroni" <  Thu Jan 31 20:30:48 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Thu, 31 Jan 2002 21:30:48 +0100
Subject: [Python-Dev] Re: opcode performance measurements
References: <><> <>
Message-ID: <00c801c1aa96$2b521320$6d94fea9@newmexico>

Hi. Q about PEP 267

Does the PEP mechanism address only

import a

use a.x

cases. How does it handle things like

import a.b

use a.b.x

Thanks, Samuele Pedroni.

From  Thu Jan 31 11:02:17 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 06:02:17 -0500
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "JE" == Jeff Epler <> writes:

  JE> On Thu, Jan 31, 2002 at 05:14:28AM -0500, Jeremy Hylton wrote:
  >> PS Skip-- Sorry the PEP isn't clear, but the only dictionary
  >> lookups that need to occur are at function creation time.
  >> MAKE_FUNCTION would need to lookup the offsets of the globals
  >> used by the functions, so that a LOAD_FAST_GLOBAL opcode would
  >> take an int argument.

  JE> So how does this work for mutually-recursive functions?
  JE>     def f(x):
  JE> 	if x==1: return 1
  JE> 	return g(x)

  JE>     def g(x): return x * f(x-1)

  JE> can f not optimize the load of the global g into a
  JE> LOAD_FAST_GLOBAL?

  JE> PS Which PEP?  I only see 266

PEP 267.  (They gesture at each other.)

So you've got a module with two globals f() and g().  They're stored
in slots 0 and 1 of the module globals array.  When f() and g() are
compiled, the symbol table for the module can note the location of f()
and g() and that f() and g() contain references to globals.  Instead
of emitting LOAD_GLOBAL "f" in g(), you can emit LOAD_GLOBAL 0 ("f").

The complication here is that a code object isn't tied to a single
module.  It would be possible to exec f.func_code in some other
environment where "g" was not stored in the module global array.  The
dictionary lookups may occur in MAKE_FUNCTION in order to verify that
the code object and the module object agree on the layout of the
globals array.
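
A toy model of the slot idea, purely as a sketch: the class SlottedGlobals
and the function bind_code are hypothetical names, and nothing here is taken
from the PEP's actual design.  It shows a name being resolved to a slot once,
with later loads going by index.

```python
class SlottedGlobals:
    """Toy module namespace: each name owns a fixed slot in a list."""

    _UNBOUND = object()

    def __init__(self, names):
        self.slot_of = {name: i for i, name in enumerate(names)}
        self.values = [self._UNBOUND] * len(names)

    def store(self, name, value):
        self.values[self.slot_of[name]] = value

    def load_slot(self, slot, name):
        """Index-based load; name is only needed for the error message."""
        value = self.values[slot]
        if value is self._UNBOUND:
            raise NameError("global name %r is not defined" % name)
        return value


def bind_code(globals_used, namespace):
    """Stand-in for the check at function creation: one lookup per name.

    Raises KeyError if the namespace has no slot for a name, i.e. the
    code and the module disagree about the layout.
    """
    return [namespace.slot_of[name] for name in globals_used]
```

In this model a slot can be reserved before the name has a value, so the
order in which f and g get bound does not matter to the lookup by index.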


From  Thu Jan 31 20:52:47 2002
From: (Tim Peters)
Date: Thu, 31 Jan 2002 15:52:47 -0500
Subject: [Python-Dev] What's up w/ CVS?
In-Reply-To: <>
Message-ID: <>

> Anyone have any idea what's going on w/ SF CVS?  I tried to "cvs
> up" earlier and it just hung.  I just tried again now and got the

I get this too.  SF is also showing other signs of flakiness today.  Take a
nap <wink>.

From  Thu Jan 31 20:59:02 2002
From: (Barry A. Warsaw)
Date: Thu, 31 Jan 2002 15:59:02 -0500
Subject: [Python-Dev] What's up w/ CVS?
References: <>
Message-ID: <>

>>>>> "SM" == Skip Montanaro <> writes:

    SM> Anyone have any idea what's going on w/ SF CVS?  I tried to
    SM> "cvs up" earlier and it just hung.  I just tried again now and
    SM> NASTY!" from ssh.

Same here.  No word from the SF web pages.

napping-ly y'rs,

From  Thu Jan 31 11:12:26 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 06:12:26 -0500
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <00c801c1aa96$2b521320$6d94fea9@newmexico>
References: <>
Message-ID: <>

>>>>> "SP" == Samuele Pedroni <> writes:

  SP> Hi. Q about PEP 267 Does the PEP mechanism address only
  SP> import a
  SP> use a.x

  SP> cases. How does it handle things like
  SP> import a.b
  SP> use a.b.x

You're a smart guy, can you tell me?  :-).  Seriously, I haven't
gotten that far.

import mod.sub
creates a binding for "mod" in the global namespace

The compiler can detect that the import statement is a package import
-- and mark "mod.sub" as a candidate for optimization.  A use of
"mod.sub.attr" in a function should be treated just as "mod.attr".

The globals array (dict-list hybrid, technically) has the publicly
visible binding for "mod" but also has an internal binding for
"mod.sub" and "mod.sub.attr".  Every module or submodule attribute in
a function gets an internal slot in the globals.  The internal slot
gets initialized the first time it is used and then shared by all the
functions in the module.

So I think this case isn't special enough to need a special case.
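
The "initialized the first time it is used and then shared" behaviour can be
sketched with a small cache keyed by the dotted name.  This is an assumption
about one way it could look, not the PEP's mechanism; DottedSlotCache is a
made-up name, and the sketch ignores the question of what happens when the
attribute is later rebound.

```python
import importlib


class DottedSlotCache:
    """One internal slot per dotted name, filled on first use."""

    def __init__(self):
        self._slots = {}
        self.fills = 0      # how many times a slot had to be initialized

    def load(self, dotted_name):
        try:
            return self._slots[dotted_name]
        except KeyError:
            pass
        module_name, _, attr = dotted_name.rpartition(".")
        value = getattr(importlib.import_module(module_name), attr)
        self._slots[dotted_name] = value
        self.fills += 1
        return value
```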


From  Thu Jan 31 21:03:51 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 15:03:51 -0600
Subject: [Python-Dev] distutils & stderr
Message-ID: <>

If I could "cvs up" I would submit a patch, but in the meantime, is there
any good reason that distutils shouldn't write its output to stderr?  I'm
using PyInline to execute a little bit of C code that returns some
information about the system to the calling Python code.  This code then
sends some output to stdout.

I've patched my local directory tree so that distutils sends its output to
sys.stderr.  Is there some overriding reason distutils messages should go to

BTW, Python + PyInline makes for a hell of a lot easier to understand configure
script... ;-)


From  Thu Jan 31 21:07:48 2002
From: (Barry A. Warsaw)
Date: Thu, 31 Jan 2002 16:07:48 -0500
Subject: [Python-Dev] distutils & stderr
References: <>
Message-ID: <>

>>>>> "SM" == Skip Montanaro <> writes:

    SM> If I could "cvs up" I would submit a patch

SF seems happy again.

From  Thu Jan 31 21:17:25 2002
From: (Tim Peters)
Date: Thu, 31 Jan 2002 16:17:25 -0500
Subject: [Python-Dev] distutils & stderr
In-Reply-To: <>
Message-ID: <>

[Skip Montanaro]
> If I could "cvs up" I would submit a patch, but in the meantime, is there
> any good reason that distutils shouldn't write its output to stderr?

Win9X users can't redirect stderr, and the DOS box there has a
50-line maximum output history.  So stuff going to stderr is often lost
forever.  stdout can be redirected.  I don't know whether distutils had that
in mind, but it is "a reason" to leave it alone.

> I'm using PyInline to execute a little bit of C code that returns some
> information about the system to the calling Python code.  This code then
> sends some output to stdout.

If there's a connection between this and distutils, it's not apparent from
what you wrote.

From  Thu Jan 31 21:16:27 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 15:16:27 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

    SP> cases. How does it handle things like
    SP> import a.b
    SP> use a.b.x

    Jeremy> You're a smart guy, can you tell me?  :-).  Seriously, I haven't
    Jeremy> gotten that far.

My stuff does handle this, as long as the first name is global.  It just
gobbles up all LOAD_GLOBALS and any immediately following LOAD_ATTRs.  For
instance, this trivial function:

    def f():
       return distutils.core.setup

compiles to:

      0 LOAD_GLOBAL              0 (distutils)
      3 LOAD_ATTR                1 (core)
      6 LOAD_ATTR                2 (setup)
      9 RETURN_VALUE        
     10 LOAD_CONST               0 (None)
     13 RETURN_VALUE        

My TrackGlobalOptimizer class currently transforms this to

      0 SETUP_FINALLY           11 (to 14)
      3 TRACK_GLOBAL             3 (distutils.core.setup, distutils.core.setup)
      6 POP_BLOCK           
      7 LOAD_CONST               0 (None)
     10 LOAD_FAST                0 (distutils.core.setup)
     13 RETURN_VALUE        
>>   14 UNTRACK_GLOBAL           3 (distutils.core.setup, distutils.core.setup)
     17 END_FINALLY         
     18 LOAD_CONST               0 (None)
     21 RETURN_VALUE        

which is obviously not an improvement because distutils.core.setup is only
accessed once.  As people make more use of packages, such multiple attribute
loads might become more common.
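
The pattern being gobbled up (a LOAD_GLOBAL followed by any number of
LOAD_ATTRs) can be located with the dis module.  This sketch assumes a
present-day CPython where dis.get_instructions exists and still reports
those opcode names; it only finds the chains and does not rewrite anything.
The helper name global_attr_chains is made up for the example.

```python
import dis


def global_attr_chains(func):
    """Return dotted names formed by LOAD_GLOBAL plus following LOAD_ATTRs."""
    chains = []
    current = None
    for instr in dis.get_instructions(func):
        if instr.opname == "LOAD_GLOBAL":
            if current:
                chains.append(".".join(current))
            current = [instr.argval]
        elif instr.opname == "LOAD_ATTR" and current:
            current.append(instr.argval)
        else:
            if current:
                chains.append(".".join(current))
            current = None
    if current:
        chains.append(".".join(current))
    return chains


def sample():
    return distutils.core.setup  # only disassembled here, never executed
```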


From  Thu Jan 31 21:33:37 2002
From: (Skip Montanaro)
Date: Thu, 31 Jan 2002 15:33:37 -0600
Subject: [Python-Dev] distutils & stderr
In-Reply-To: <>
References: <>
Message-ID: <>

    Tim> Win9X users can't redirect stderr, and the DOS box
    Tim> there has a 50-line maximum output history.  So stuff going to
    Tim> stderr is often lost forever.  stdout can be redirected.  I don't
    Tim> know whether distutils had that in mind, but it is "a reason" to
    Tim> leave it alone.

Perhaps it would be friendlier if all distutils messages were hidden in "if
verbose:" statements (many already are).  PyInline could then dial down the
verbosity before calling distutils.

    >> I'm using PyInline to execute a little bit of C code that returns
    >> some information about the system to the calling Python code.  This
    >> code then sends some output to stdout.

    Tim> If there's a connection between this and distutils, it's not
    Tim> apparent from what you wrote.

Sorry about the missing link.  PyInline uses distutils to compile the C
code.  How PyInline does its thing doesn't really matter to me, so I'm not
going to be interested in distutils' messages.
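
Short of changing the verbosity, a caller can also divert Python-level
output around the noisy call.  The sketch below assumes a present-day Python
with contextlib.redirect_stdout; noisy_build is a hypothetical stand-in for
the build step, not a distutils function.

```python
import contextlib
import io
import sys


def noisy_build():
    """Hypothetical stand-in for a build step that prints progress."""
    print("running build_ext")
    return 42


def call_quietly(func, sink=None):
    """Run func with its stdout chatter diverted to sink (default: stderr)."""
    target = sys.stderr if sink is None else sink
    with contextlib.redirect_stdout(target):
        return func()


def call_capturing(func):
    """Run func and return (result, chatter) instead of printing the chatter."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink):
        result = func()
    return result, sink.getvalue()
```

This only affects writes that go through sys.stdout; output written
directly by a child process such as a compiler is not captured this way.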


From  Thu Jan 31 21:31:37 2002
From: (Tim Peters)
Date: Thu, 31 Jan 2002 16:31:37 -0500
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
Message-ID: <>

[Skip Montanaro]
> ...
> Tim Peters can construct cases where dicts perform badly.  Does that mean
> Python shouldn't have dicts? ;-)

I've thought about that hard over the years.  The answer is no <wink>.

From  Thu Jan 31 21:54:00 2002
From: (Tim Peters)
Date: Thu, 31 Jan 2002 16:54:00 -0500
Subject: [Python-Dev] Thread questionlet
In-Reply-To: <>
Message-ID: <>

> For purposes of computational parallelism ... the global interpreter
> lock renders Python useless except for prototyping, so there's not much
> point digging into the hundreds of higher-level parallelism models
> that have been developed.

> Well, maybe.  I'm still hoping to prove you at least partly wrong one of
> these years.  ;-)

WRT higher-level parallelism models, you already have in a small way, by
your good championing of the Queue module.  Queue-based approaches are a
step above the morass of low-level home-grown locking protocols people
routinely screw up; it's almost *hard* to screw up a Queue-based approach.
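
A minimal sketch of the Queue-based style, written for present-day Python
(where the module is spelled queue rather than Queue).  The function name
square_all and the use of None as a "no more work" sentinel are choices made
for the example.

```python
import queue
import threading


def square_all(numbers, n_workers=3):
    """Square each number using worker threads that share two queues."""
    numbers = list(numbers)
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:        # sentinel: no more work for this worker
                break
            results.put((item, item * item))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return dict(results.get() for _ in range(len(numbers)))
```

No explicit locks appear anywhere; all sharing goes through the two queues.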

The GIL issue is distinct, and it plainly stops computational parallelism
from doing any good so long as we're talking about Python code.

> (The long-term plan for my BCD module is to turn it into a C extension
> that releases the GIL.

Well, that's not Python code.  It's unclear whether it will actually help:
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS aren't free, and a typical
BCD calculation may be so cheap that it's a net loss to release and
reacquire the GIL across one.  Effective use of fine-grained parallelism
usually requires something cheaper to build on, like very lightweight
critical sections mediating otherwise free-running threads.

> If that's successful, I'll start working on ways to have Numeric release
> the GIL.)

I expect that's more promising because matrix ops are much coarser-grained,
but also much harder to do safely:  BCD objects are immutable (IIRC), so a
routine crunching one doesn't have to worry about another thread mutating it
midstream if the GIL is released.  A Numeric array probably does have to
worry about that.

From  Thu Jan 31 21:55:20 2002
From: (Jack Jansen)
Date: Thu, 31 Jan 2002 22:55:20 +0100
Subject: [Python-Dev] Fwd: [Pythonmac-SIG] sys.exit() functionality
Message-ID: <>

This discussion started on pythonmac-SIG, but someone suggested 
that it isn't really a MacPython-specific issue (even though the 
implementation will be different for MacPython from unix-Python).

Any opinions?

Begin forwarded message:

> From: Martin Miller <>
> Date: Wed Jan 30, 2002  08:14:13  PM Europe/Amsterdam
> To:
> Subject: Re: [Pythonmac-SIG] sys.exit() functionality
> On Wed, 30 Jan 2002 15:29:21 +0100, Jack Jansen wrote:
>> On Tuesday, January 29, 2002, at 08:54 , Jon Bradley wrote:
>>> hey all,
>>> In embedded Python - why does sys.exit() quit out of the application
>>> that's
>>> embedding the interpreter?  Is there any way to trap or 
>>> disregard this?
>>> If a user creates an application with Python and runs it through the
>>> embedded interpreter, calling quit or exit on the Python application
>>> itself
>>> is more than ok, but allowing it to force out of the parent 
>>> application
>>> isn't.
>> Sounds reasonable. How about a routine PyMac_SetExitFunc() that you
>> could call to set your own exit function, (similar to
>> PyMac_SetConsoleHandler())? MacPython would then do all its normal
>> cleanup, but at the very end call your routine instead of exit().
> With an approach like the above, wouldn't it be better to have a
> platform-independent way of defining a custom exit function, 
> rather than
> calling a Mac-only system function -- or is this whole thing only an
> issue with MacPython embedding?
> Martin
> _______________________________________________
> Pythonmac-SIG maillist  -
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -
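
At the Python level, sys.exit() simply raises SystemExit, so code that runs
user scripts can trap it.  The sketch below shows only that; it is not the
proposed PyMac_SetExitFunc() and says nothing about the C embedding side.
The name run_embedded is made up for the example.

```python
def run_embedded(source, namespace=None):
    """Run user code; report sys.exit() instead of leaving the host."""
    namespace = {} if namespace is None else namespace
    try:
        exec(source, namespace)
    except SystemExit as exc:
        return exc.code      # the host decides what to do with the status
    return None
```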

From  Thu Jan 31 22:07:30 2002
From: (Aahz Maruch)
Date: Thu, 31 Jan 2002 14:07:30 -0800 (PST)
Subject: [Python-Dev] opcode performance measurements
In-Reply-To: <> from "Jeremy Hylton" at Jan 31, 2002 03:25:54 AM
Message-ID: <>

Jeremy Hylton wrote:
> I've made some simple measurements of how long opcodes take to execute
> and how long it takes to go around the mainloop, using the Pentium
> timestamp counter, which measures processor cycles.
> The results aren't particularly surprising, but they provide some
> empirical validation of what we've believed all along.  I don't have
> time to go into all the gory details here, though I plan to at
> Spam 10 developers day next week.

My suggestion WRT SET_LINENO is to encourage the use of python -O and
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From  Thu Jan 31 12:34:28 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 07:34:28 -0500
Subject: [Python-Dev] opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "AM" == Aahz Maruch <> writes:

  AM> My suggestion WRT SET_LINENO is to encourage the use of python

Vladimir submitted a patch long ago to dynamically recompile bytecode
to add/remove SET_LINENO as needed.  I find that approach much more
appealing, because you don't have to pay the SET_LINENO penalty just
because there's some chance you'd want to connect with a debugger.  A
long running server process is the prime use case; it benefits from -O
but may need to be debugged.


From  Thu Jan 31 22:30:22 2002
From: (Tim Peters)
Date: Thu, 31 Jan 2002 17:30:22 -0500
Subject: [Python-Dev] opcode performance measurements
In-Reply-To: <>
Message-ID: <>

> My suggestion WRT SET_LINENO is to encourage the use of python -O and

What SET_LINENO does isn't even used in normal Python operation anymore
(line numbers in tracebacks are obtained via a different means, the
co_lnotab member of PyCodeObjects).  They're needed now only to call back to
user-supplied tracing routines, and that's rarely needed.  The Python
debugger is the most visible example of a tool that uses the line tracing
hook.  There are other ways to get that to work, but they require real
thought and effort to implement.  There's a patch on SourceForge (IIRC, from
Vladimir) that may have worked at one time, but nobody has picked it up (I
tried to for 2.2, but couldn't make time for it then; I don't expect to have
time for it for 2.3 either, alas).
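
For readers who have not used the line tracing hook: the sketch below shows
a user-supplied tracing routine receiving "line" events through
sys.settrace, as it works in present-day Python.  The helper traced_lines
and the function sample are made up for the example.

```python
import sys


def traced_lines(func, *args):
    """Call func and return executed line offsets relative to its def line."""
    code = func.__code__
    offsets = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                 # do not trace other frames
        if event == "line":
            offsets.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)          # restore whatever hook was active
    return offsets


def sample(n):
    total = 0
    for i in range(n):
        total += i
    return total
```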

From  Thu Jan 31 22:35:54 2002
From: (Jack Jansen)
Date: Thu, 31 Jan 2002 23:35:54 +0100
Subject: [Python-Dev] next vs darwin
In-Reply-To: <>
Message-ID: <>

On Wednesday, January 30, 2002, at 09:42  PM, Steven Majewski wrote:
> dlcompat libs are used by Apple to build Apache and some other 
> programs.
> The libs are not included in Mac OSX, although the sources are 
> available
> in the Darwin CVS, and an improved version is distributed on Fink and
> maybe other places. Since additional libs are required, I would not
> make that the default. ( unless, since there's already a check for
> libdl in config, we make it dependent on that. )
> The problem is that  the current dynload_next is broken, and we've
> had some problems replicating tests and solutions because, among other
> problems, of the very poor error reporting in dynload_next, everyone
> is starting from a differently hacked version of the 2.2 distribution.
> (The other variable is which modules and packages people are loading.)
> Reportedly, using the dlcompat libs fixes some problems for 
> some people.

I'm not too thrilled with dlcompat. First and foremost, it fixes 
some problems for some people but may introduce problems for 
others (if I understand correctly). And then there's the issue 
of it not being part of the base MacOSX distribution.

I now have a dynload_next.c (that I'll check in tomorrow) that 
can behave in two ways based on a #define.

With the define off it loads every extension module in a 
separate namespace, i.e. two independent modules can never break 
each other by supplying external symbols the other module 
expected to load from a completely different place.

With the define on it loads all extension modules into the 
application namespace. Some people want this (despite the 
problems sketched above) because they have modules that refer to 
external symbols defined in modules that have been loaded 
earlier (and I assume there's magic that ensures their modules 
are loaded in the right order).

While I think this is an accident waiting to happen [*] the 
latter behaviour is more-or-less the standard unix behaviour, so 
it should probably be supportable in some way. I prefer the new 
(OSX 10.1) preferred Apple way of linking plugins (which is also 
the common way to do so on all other non-unix platforms) where 
the plugin has to be linked against the application and dynamic 
libraries it is going to be plugged into, so none of this 
dynamic behaviour goes on.

[*] I know of two cases where this already happened: both the 
curses library and the SGI gl library defined a function 
clear(), so you were hosed when you used both in the same Python 
script. And the SGI compression library contains a private 
version of libjpeg with no symbol renaming, so if you used the 
cl module and a module which linked against the normal libjpeg 
you were also hosed.
- Jack Jansen        <> -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -

From  Thu Jan 31 22:54:52 2002
From: (
Date: Thu, 31 Jan 2002 16:54:52 -0600
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <> <> <> <> <>
Message-ID: <>

On Thu, Jan 31, 2002 at 06:02:17AM -0500, Jeremy Hylton wrote:
>   JE> can f not optimize the load of the global g into a
> So you've got a module with two globals f() and g().  They're stored
> in slots 0 and 1 of the module globals array.  When f() and g() are
> compiled, the symbol table for the module can note the location of f()
> and g() and that f() and g() contain references to globals.  Instead
> of emitting LOAD_GLOBAL "f" in g(), you can emit LOAD_GLOBAL 0 ("f").

But isn't what happens in this module something like
    LOAD_CONST <code1>
    STORE_GLOBAL 0 (f)

    LOAD_CONST <code2>
    STORE_GLOBAL 1 (g)

so if you convert LOAD_GLOBAL into LOAD_FAST_GLOBAL when you MAKE_FUNCTION
on code1, there is not yet a "g" in the dlict.
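
A minimal sketch of the ordering concern above, assuming a modern Python 3 interpreter. The names SOURCE and g_known_when_f_defined are made up for illustration. It only shows that module-level definitions execute in order, so "g" is not yet bound when f is created; it says nothing about how a LOAD_FAST_GLOBAL opcode would actually be implemented.

```python
# Hypothetical module source: f refers to g before g has been defined.
SOURCE = """
def f():
    return g()

# Record whether 'g' is already bound right after f was created.
g_known_when_f_defined = 'g' in globals()

def g():
    return 42
"""

namespace = {}
exec(compile(SOURCE, "<module>", "exec"), namespace)

# f was created while 'g' was still missing from the module namespace ...
print(namespace["g_known_when_f_defined"])
# ... yet calling f afterwards works, because the lookup happens at call time.
print(namespace["f"]())
```

Any scheme that resolves the slot for "g" when f is created therefore has to reserve that slot before the module body has finished running.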

Are you populating the "names" part of the dlict as an earlier "pass" of
module compilation, then?  So the optimization doesn't apply if I create
the globals from within a function?  (Of course, in that case it would work
if I set the attributes to 'None' in the module scope, right?):
    def make_fg():
        global f, g
        def f(x): pass
        def g(x): pass


From  Thu Jan 31 22:52:43 2002
From: (Barry A. Warsaw)
Date: Thu, 31 Jan 2002 17:52:43 -0500
Subject: [Python-Dev] Attention Mailman list administrators
Message-ID: <>

You will soon notice (if you haven't already) that your list admin
passwords on are broken.  This happened due to an
upgrade of the version of Python running on that system.  The old list
passwords can't be recovered, so they have to be reset.

List administrators can contact me to get this done.  If you know the
old password, send it to me and I'll reset the list to it.  Otherwise,
let me know and I'll generate a new password for you.

Sorry for the inconvenience,

From sdm7g@Virginia.EDU  Thu Jan 31 22:55:22 2002
From: sdm7g@Virginia.EDU (Steven Majewski)
Date: Thu, 31 Jan 2002 17:55:22 -0500 (EST)
Subject: [Python-Dev] next vs darwin
In-Reply-To: <>
Message-ID: <>

On Thu, 31 Jan 2002, Jack Jansen wrote:

> I'm not too thrilled with dlcompat. First and foremost, it fixes
> some problems for some people but may introduce problems for
> others (if I understand correctly). And then there's the issue
> of it not being part of the base MacOSX distribution.
> I now have a dynload_next.c (that I'll check in tomorrow) that
> can behave in two ways based on a #define.
> With the define off it loads every extension module in a
> separate namespace, i.e. two independent modules can never break
> each other by supplying external symbols the other module
> expected to load from a completely different place.
> With the define on it loads all extension modules into the
> application namespace. [...]

Did you see the version I posted a day or two ago:

If I fixed up the #ifdef macros, you could compile that three
ways (at least): Global public symbols, Private Symbols, or
the dlcompat trick. ( But it uses that magic hook into the
non-public API from dlcompat. )

My main requirement is the better error reporting.

-- Steve

From  Thu Jan 31 13:09:59 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 08:09:59 -0500
Subject: [Python-Dev] Re: opcode performance measurements
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "JE" == jepler  <> writes:

  JE> But isn't what happens in this module something like


  JE> so if you convert LOAD_GLOBAL into LOAD_FAST_GLOBAL when you
  JE> MAKE_FUNCTION on code1, there is not yet a "g" in the dlict.

  JE> Are you populating the "names" part of the dlict as an earlier
  JE> "pass" of module compilation, then?

Yes.  The compiler can do a pretty good job of establishing all the
globals in a module at compile time.  When a module is loaded, the
interpreter would allocate space for all the expected globals.

  JE> "pass" of module compilation, then?  So the optimization doesn't
  JE> apply if I create the globals from within a function?

It still applies.  The function is compiled at the same time as the
module, so the module symbol table can account for globals assigned to
only in functions.
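
As a rough illustration of that claim, the standard symtable module in a modern Python 3 can already report globals that are assigned only inside functions. This is a sketch under that assumption, not the mechanism the PEP would use; module_globals and SRC are hypothetical names.

```python
import symtable

# Module source where the globals f and g are only created inside a function.
SRC = '''
def make_fg():
    global f, g
    def f(x): pass
    def g(x): pass
'''

def module_globals(source):
    """Collect names the compiler can see as module globals at compile time."""
    top = symtable.symtable(source, "<example>", "exec")
    names = set(top.get_identifiers())      # names seen at module level
    pending = list(top.get_children())      # nested function/class scopes
    while pending:
        table = pending.pop()
        for sym in table.get_symbols():
            # 'global x' plus a binding of x inside the nested scope
            if sym.is_declared_global() and sym.is_assigned():
                names.add(sym.get_name())
        pending.extend(table.get_children())
    return names

print(sorted(module_globals(SRC)))
```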

The harder cases are much more dynamic -- exec using a module's
globals, assignment to create new attributes on an imported module,
etc.  Example:

    import foo
    assert not hasattr(foo, 'bar') # just to illustrate the example
    foo.bar = 12

There's no way for the compiler to know that foo will have a bar
attribute when it compiles foo.
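
A small sketch of the dynamic case, assuming a modern Python 3. The module foo is built by hand with types.ModuleType so the example is self-contained, and spam is a made-up global standing in for whatever foo defines itself.

```python
import types

# Build a stand-in for module foo from source the compiler can inspect.
foo = types.ModuleType("foo")
code = compile("spam = 1\n", "foo.py", "exec")
exec(code, foo.__dict__)

# Names the compiler saw while compiling foo's own source.
compile_time_names = set(code.co_names)

# Another module adds an attribute at run time.
assert not hasattr(foo, "bar")
foo.bar = 12

# Globals that exist now but were invisible when foo was compiled.
public = {name for name in vars(foo) if not name.startswith("__")}
runtime_only = public - compile_time_names
print(runtime_only)
```

Any slot-based scheme needs a fallback path, such as an ordinary dictionary lookup, for names like bar that only appear after compilation.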


From  Thu Jan 31 23:00:34 2002
From: (Martin v. Loewis)
Date: 01 Feb 2002 00:00:34 +0100
Subject: [Python-Dev] Fwd: [Pythonmac-SIG] sys.exit() functionality
In-Reply-To: <>
References: <>
Message-ID: <>

Jack Jansen <> writes:

> This discussion started on pythonmac-SIG, but someone suggested that
> it isn't really a MacPython-specific issue (even though the
> implementation will be different for MacPython from unix-Python).
> Any opinions?

I think allowing Py_Exit to be replaced is the right way to go. Make it a
function pointer, initialized to _Py_Exit, and let the embedding
context change its value (through a setter, or through direct
assignment). Double-check that all callers of Py_Exit behave well when
it actually does return (which currently is not the case), and don't
forget to bump the API version.


From  Thu Jan 31 23:09:55 2002
From: (Martin v. Loewis)
Date: 01 Feb 2002 00:09:55 +0100
Subject: [Python-Dev] next vs darwin
In-Reply-To: <>
References: <>
Message-ID: <>

Jack Jansen <> writes:

> With the define on it loads all extension modules into the application
> namespace. Some people want this (despite the problems sketched above)
> because they have modules that refer to external symbols defined in
> modules that have been loaded earlier (and I assume there's magic that
> ensures their modules are loaded in the right order).

On Unix, this is a runtime option via sys.setdlopenflags (RTLD_GLOBAL
turns on import into application namespace). Do you think you could
emulate this API?

> While I think this is an accident waiting to happen [*] the latter
> behaviour is more-or-less the standard unix behaviour, so it should
> probably be supportable in some way. 

It is not at all standard unix behaviour. Since Python 1.5.2, Python
loads extensions with RTLD_LOCAL on <dlfcn.h> systems, so that each
module has its own namespace. People often requested that this be
changed, but we successfully managed to turn down all these
requests. Eventually, somebody came up with sys.setdlopenflags; this
was good enough for me.
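
The runtime switch described above can be sketched as follows. This assumes a modern Python 3 on a <dlfcn.h> system, where the RTLD_* constants live in the os module. import_with_global_symbols is a hypothetical helper, not an existing API.

```python
import os
import sys

def import_with_global_symbols(import_fn):
    """Call import_fn() with RTLD_GLOBAL added to the dlopen flags.

    The previous flags are restored afterwards, so only extensions loaded
    inside import_fn() export their symbols to the application namespace.
    """
    if not hasattr(sys, "setdlopenflags"):
        # Platform without dlopen flags: nothing to switch.
        return import_fn()
    saved = sys.getdlopenflags()
    sys.setdlopenflags(saved | os.RTLD_GLOBAL)
    try:
        return import_fn()
    finally:
        sys.setdlopenflags(saved)
```

A caller would pass something like `lambda: __import__("some_extension")`, where some_extension is a placeholder. Every other import keeps the per-module namespace.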

> I prefer the new (OSX 10.1) preferred Apple way of linking plugins
> (which is also the common way to do so on all other non-unix
> platforms) where the plugin has to be linked against the application
> and dynamic libraries it is going to be plugged into, so none of
> this dynamic behaviour goes on.

I'm not sure linking with a is desirable, I'm quite fond
of the approach to let the executable export symbols to the
extensions. If that is possible on OS X, I'd encourage you to follow
such a strategy (in unix gcc/ld, this is enabled through

> [*] I know of two cases where this already happened: both the curses
> library and the SGI gl library defined a function clear(), so you
> were hosed when you used both in the same Python script.

On Unix, the original trigger might have been the problem with
initsocket, which was also exported in an Oracle library, thus
breaking Oracle (the Python symbol is now init_socket, but that does
not change the principle).


From  Thu Jan 31 14:43:22 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 09:43:22 -0500
Subject: [Python-Dev] distutils & stderr
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "TP" == Tim Peters <> writes:

  TP> [Skip]
  >> Sorry about the missing link.  PyInline uses distutils to compile
  >> the C code.  How PyInline does its thing doesn't really matter to
  >> me, so I'm not going to be interested in distutils' messages.

  TP> If distutils output isn't interesting to PyInline users,
  TP> shouldn't PyInline be changed to run with its
  TP> -q/--quiet option?

I started a thread on similar issues on the distutils-sig mailing list
a week or two ago.  There's agreement that output is a problem.  The
code has no consistent way of generating messages or of interpreting
the notion of verbose or quiet.  I think the right solution is to have
several levels of verbosity and have a single function or method to
use for output.  (Perhaps a print statement with appropriate >>.)
This makes it easier to control the amount of information you get and
where it gets printed to.  Michael Hudson has signed up to implement
it and whatever else we can pile on when he's not looking.
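
A minimal sketch of what "several levels of verbosity and a single function for output" could look like, written for a modern Python 3. The level names and the Output class are assumptions made for illustration, not the design agreed on the sig.

```python
import io
import sys

# Hypothetical verbosity levels, lowest to highest priority.
DEBUG, INFO, WARN, ERROR = 1, 2, 3, 4

class Output:
    """Single choke point for messages: one threshold, one destination."""

    def __init__(self, threshold=INFO, stream=None):
        self.threshold = threshold
        self.stream = stream if stream is not None else sys.stdout

    def log(self, level, msg, *args):
        # Messages below the threshold are dropped; everything else goes
        # to the one configured stream (the "print >>stream" idea).
        if level >= self.threshold:
            print(msg % args if args else msg, file=self.stream)

buf = io.StringIO()
out = Output(threshold=INFO, stream=buf)
out.log(DEBUG, "hidden detail")
out.log(INFO, "compiling %s", "foo.c")
out.log(WARN, "careful")
```

With this shape, a quiet run only needs to raise the threshold to WARN, and redirecting output means swapping the stream in one place.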

Further discussion should probably go to the sig.


From  Thu Jan 31 21:13:52 2002
From: (Andrew MacIntyre)
Date: Fri, 1 Feb 2002 08:13:52 +1100 (EDT)
Subject: [Python-Dev] updated patches for OS/2 EMX port
In-Reply-To: <>
Message-ID: <>

On Thu, 31 Jan 2002, M.-A. Lemburg wrote:

> > > > - Objects/stringobject.c and Objects/unicodeobject.c contain changes to
> > > > handle the EMX runtime library returning "0x" as the prefix for output
> > > > formatted with a "%X" format.
> > >
> > > I'd suggest a different approach here, which does not use #ifdefs:
> > > Instead of testing for the system, test for the bug. Then, if the bug
> > > goes away, or appears on other systems as well, the code will be good.
> >
> > I did it the way I did because there's already code dealing with other
> > brokenness in this area which doesn't solve the EMX issue, and the #ifdef
> > solution minimises the risk of EMX fixes breaking something else which I
> > can't test.  At this stage I can't see this bug being fixed in EMX :-(
> I'd go with Martin's suggestion here: there already is code in
> formatint() which tests for '%#X' adding '0x' or not. This code
> should be made to handle the special case by testing for it --
> who knows: there may be other platforms where this doesn't work
> as expected either.

There are sure to be other platforms that have this bogosity.  I'll look
into this some more.
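
The "test for the bug, not the system" idea can be sketched in Python, even though the real fix would be C code in formatint(). Here raw_format is a hypothetical stand-in for the platform's sprintf with "%#X", and the sketch only covers non-negative values.

```python
def format_alt_hex(value, raw_format):
    """Return value as upper-case hex with a '0X' prefix.

    The output of raw_format is inspected instead of checking which
    platform we are on, so any runtime with the same quirk is handled.
    """
    text = raw_format(value)
    # Strip whatever prefix the runtime produced, if any ...
    digits = text[2:] if text[:2] in ("0x", "0X") else text
    # ... and put back the one we want.
    return "0X" + digits

# Three simulated runtimes: correct, lower-case prefix, no prefix at all.
correct = lambda v: "0X%X" % v
lower_prefix = lambda v: "0x%X" % v
no_prefix = lambda v: "%X" % v

print(format_alt_hex(255, correct))
print(format_alt_hex(255, lower_prefix))
print(format_alt_hex(255, no_prefix))
```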

> BTW, could you point me to your patch for this ?

The Objects patch in patch #450267, at

Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail:  | Snail: PO Box 370            |        Belconnen  ACT  2616
Web:        |        Australia

From  Thu Jan 31 22:36:37 2002
From: (Jeremy Hylton)
Date: Thu, 31 Jan 2002 17:36:37 -0500
Subject: [Python-Dev] distutils & stderr
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> "GW" == Greg Ward <> writes:

  GW> Oh wait: most of the low-level worker code in the Distutils
  GW> falls outside the main class hierarchy, so the verbose flag
  GW> isn't *quite* so readily available; it gets passed in to a heck
  GW> of a lot of functions.  Crap.

I wish it were so clean and simple, Greg <wink>.  In a lot of places,
the binary verbose flag that is stored in the main class hierarchy is
compared to a variable named "level".  The result of that comparison
is passed to functions, which ignore it and just use print.  At least


From  Wed Jan 30 23:51:30 2002
From: (Aahz Maruch)
Date: Wed, 30 Jan 2002 15:51:30 -0800 (PST)
Subject: [Python-Dev] Python/Web developer
In-Reply-To: <> from "Rebecca Schaefer" at Jan 30, 2002 03:41:25 PM
Message-ID: <>

Rebecca Schaefer wrote:
> TEKsystems in Appleton, WI has an opening for a web developer with
> Python, Zope, html, SQL, UNIX, and Perl experience.  It is a long term
> contract opportunity. Any interested candidates should email Rebecca
> Schaefer at

python-dev is the wrong place for job ads.  Please use either
python-list or send a message to
                      --- Aahz (

Hugs and backrubs -- I break Rule 6       <*>
Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.

From Samuele Pedroni" <  Thu Jan 31 21:02:23 2002
From: Samuele Pedroni" < (Samuele Pedroni)
Date: Thu, 31 Jan 2002 22:02:23 +0100
Subject: [Python-Dev] Re: opcode performance measurements
References: <><><><00c801c1aa96$2b521320$6d94fea9@newmexico> <>
Message-ID: <014001c1aa9a$94b8a960$6d94fea9@newmexico>

From: Jeremy Hylton <>
> >>>>> "SP" == Samuele Pedroni <> writes:
>   SP> Hi. Q about PEP 267. Does the PEP mechanism address only
>   SP> import a
>   SP> use a.x
>   SP> cases. How does it handle things like
>   SP> import a.b
>   SP> use a.b.x
> You're a smart guy, can you tell me?  :-).  Seriously, I haven't
> gotten that far.
> import mod.sub
> creates a binding for "mod" in the global namespace
> The compiler can detect that the import statement is a package import
> -- and mark "mod.sub" as a candidate for optimization.  A use of
> "mod.sub.attr" in function should be treated just as "mod.attr".
> The globals array (dict-list hybrid, technically) has the publicly
> visible binding for "mod" but also has an internal binding for
> "mod.sub" and "mod.sub.attr".  Every module or submodule attribute in
> a function gets an internal slot in the globals.  The internal slot
> gets initialized the first time it is used and then shared by all the
> functions in the module.
> So I think this case isn't special enough to need a special case.

 OK, I stated the wrong question.  What happens if I do the following:

import a.b

def f():
   print a.b.x
   print a.b.x


Now a.g() changes a.b from a submodule to an object with an x attribute.
Maybe this case does not make sense, but the point is that the PEP
is quite vague about imported stuff.

Samuele (more puzzled than smart).