From greg@cosc.canterbury.ac.nz Tue Apr 1 00:43:05 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 01 Apr 2003 12:43:05 +1200 (NZST) Subject: [Python-Dev] Distutils documentation amputated in 2.2 docs? Message-ID: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> I was looking at the Distributing Python Modules section of the distutils docs for 2.2 the other day, and it mentioned a section about extending the distutils, but there did not appear to be any such section. Further investigation revealed that the 1.6 version of the docs *does* have this section, as section 8, but somewhere between the 1.6 and 2.2 docs, this section has disappeared, along with almost all of section 9, "Reference", which now appears as section 7, but with only a small part of what it should contain. What's the proper way of submitting a bug report about this? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paul@prescod.net Tue Apr 1 00:52:06 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 31 Mar 2003 16:52:06 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <3E88E2B6.1080409@prescod.net> Ka-Ping Yee wrote: > Hmm, i'm not sure you understood what i meant. The code example i posted > is a solution to the design challenge: "provide read-only access to a > directory and its subdirectories, but no access to the rest of the filesystem". > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding of > what Python would need to do to make secure programming easier. 
Okay, how about allowing a piece of untrusted code to import modules from a selected subset of all modules? For instance, you probably want to allow untrusted code to get access to regular expressions and codecs (after taming!) but not os or socket. Speaking of sockets, web browsers often allow connections to sockets only at a particular domain. In a capabilities world, I guess the domain would be an object that you could request sockets from. Are DOS issues in scope? How do we prevent untrusted code from just bringing the interpreter to a halt? A smart enough attacker could even block all threads in the current process by finding a task that is usually not time-sliced and making it go on for a very long time. Without looking at the Python implementation, I can't remember an example off the top of my head, but perhaps a large multiplication or search-and-replace in a string. Paul Prescod From paul@prescod.net Tue Apr 1 01:08:40 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 31 Mar 2003 17:08:40 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E88E698.7000503@prescod.net> Guido van Rossum wrote: >... > >>In many classes, __init__ exercises authority. An obvious C type with >>the same problem is the "file" type (being able to ask a file object >>for its type gets you the ability to open any file on the filesystem). >>But many Python classes are in the same position -- they acquire >>authority upon initialization. > > > What do you mean exactly by "exercise authority"? Again, I understand > this for C code, but it would seem that all authority ultimately comes > from C code, so I don't understand what authority __init__() can > exercise.
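[Editor's sketch: Paul's first challenge above (let untrusted code import re and codecs but not os or socket) could be approximated with a guarded __import__. This is illustrative only, not a hardened sandbox: replacing __builtins__ narrows what the code can name, but CPython makes no hard security guarantee here, and the allowed modules would still need taming. The names ALLOWED and guarded_import are invented for the example.]

```python
# Illustrative only: a whitelisting __import__ for untrusted code.
ALLOWED = {"re", "codecs"}          # the "selected subset of all modules"

def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    # refuse anything outside the whitelist (including dotted submodules)
    if name.split(".")[0] not in ALLOWED:
        raise ImportError("module %r not available to untrusted code" % name)
    return __import__(name, globals, locals, fromlist, level)

safe_builtins = {"__import__": guarded_import, "len": len}
namespace = {"__builtins__": safe_builtins}

exec("import re\npattern = re.compile('a+')", namespace)   # allowed
denied = False
try:
    exec("import os", namespace)                            # refused
except ImportError:
    denied = True
```

Even with imports gated like this, the taming point stands: re and codecs would themselves need auditing before being exposed.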
Given that ZipFile("/tmp/foo.zip") can read a zipfile, the ZipFile class clearly has the ability to open files. It derives this ability from the fact that it can get at open(), os.open, etc. In a capabilities world, it should not have access to that stuff unless the caller specifically gave it access. And the logical way for the caller to give it that access is like this: ZipFile(already_opened_file) But in restricted code > ... > But is it really ZipFile.__init__ that exercises the authority? Isn't > its authority derived from that of the open() function that it calls? I think that's the problem. The zipfile module has a back-door "capability" that is incredibly powerful. In a library designed for capabilities, its only access to the outside world would be via data passed to it explicitly. > In what sense is the ZipFile class an entity by itself, rather than > just a pile of Python statements that derive any and all authority > from its caller? In the sense that it can import "open" or "os.open" rather than being forced to only communicate with the world through objects provided by the caller. If we imagine a world where it has no access to those back-doors then I can't see why Ping's complaint about access to classes would be a problem. Paul Prescod From jriehl@spaceship.com Tue Apr 1 01:50:39 2003 From: jriehl@spaceship.com (Jonathan Riehl) Date: Mon, 31 Mar 2003 19:50:39 -0600 (CST) Subject: [Python-Dev] PEP 269 once more. Message-ID: <Pine.BSF.4.33.0303311945410.8285-100000@localhost> Hey all, FYI, Guido closed the patch I had on SourceForge (599331), but I have just put an updated patch there. I have added some documentation on how my pgen module may be used, and the interface is much more consistent and useful than the prior upload. If anyone is interested in playing with pgen from Python, check it out and let me know what you think. Thanks! -Jon From martin@v.loewis.de Tue Apr 1 06:12:17 2003 From: martin@v.loewis.de (Martin v.
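[Editor's sketch: the ZipFile(already_opened_file) construction Paul describes is in fact supported, since zipfile accepts any file-like object; the caller decides exactly which file authority to hand over. A self-contained illustration using an in-memory buffer instead of a real path:]

```python
import io
import zipfile

# Trusted code creates an archive entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello, capabilities")

# Capability style: the reader receives an already-opened file object
# and never needs the authority to open paths itself.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    data = zf.read("hello.txt")
```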
Löwis) Date: 01 Apr 2003 08:12:17 +0200 Subject: [Python-Dev] Distutils documentation amputated in 2.2 docs? In-Reply-To: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> References: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> Message-ID: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > What's the proper way of submitting a bug report about this? It would be best if you would provide a patch. Try to locate the primary source of the missing documentation (i.e. a TeX snippet), and integrate this into the current CVS, then do a cvs diff. If you find that the text is still there in the primary source, and just not rendered in the HTML version, submit a bug report pointing to the precise file that does not get rendered. Regards, Martin From joel@boost-consulting.com Tue Apr 1 08:56:34 2003 From: joel@boost-consulting.com (Joel de Guzman) Date: Tue, 1 Apr 2003 16:56:34 +0800 Subject: [Python-Dev] How to suppress instance __dict__? References: <ur88zougj.fsf@boost-consulting.com> <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <uof42i1ey.fsf@boost-consulting.com> <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net> <uvfyayr0y.fsf@boost-consulting.com> <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <021d01c2f82c$9b6d3470$4ee1afca@kim> Dave Abrahams wrote: >> I am generating extension types derived from a type which is derived >> from 'int' by calling the metaclass; in order to prevent instances >> of the most-derived type from getting an instance __dict__ I am >> putting an empty tuple in the class __dict__ as '__slots__'. The >> problem with this hack is that it disables pickling of these babies: >> >> "a class that defines __slots__ without defining __getstate__ >> cannot be pickled" >> Guido van Rossum wrote: > Yes. I was assuming you'd do this at the C level.
To do what I > suggested in Python, I think you'd have to write this:

> class M(type):
>     def __new__(cls, name, bases, dict):
>         C = type.__new__(cls, name, bases, dict)
>         del C.__getstate__
>         return C

Hi, Ok, I'm lost. Please be easy with me, I'm still learning the C API interfacing with Python :) Here's what I have so far. Emulating the desired behavior in Python, I can do:

class EnumMeta(type):
    def __new__(cls, name, bases, dict):
        C = type.__new__(cls, name, bases, dict)
        del C.__getstate__
        return C

class Enum(int):
    __metaclass__ = EnumMeta
    __slots__ = ()

x = Enum(1964)
print x

import pickle
print "SAVING"
out_x = pickle.dumps(x)
print "LOADING"
xl = pickle.loads(out_x)
print xl

I'm trying to rewrite this in C/C++ with the intent to patch Boost.Python to allow pickling on enums. I took on this task to learn more about the low level details of Python C interfacing. So far, I have implemented EnumMeta in C that does not override anything yet and installed that as the metaclass of Enum. I was wondering... Is there some C code somewhere that I can see that implements some sort of meta-stuff? I read PEPs 252 and 253 and "Unifying Types and Classes in Python 2.2". The examples there (specifically the class autoprop) are written in Python. I tried searching for examples in C from the current CVS snapshot of 2.3 but I failed in doing so. I'm sure it's there, but I don't know where to find it. To be specific, I'm lost in trying to implement tp_new of PyTypeObject. How do I call the default tp_new for metaclasses? TIA, -- Joel de Guzman joel at boost-consulting.com http://www.boost-consulting.com http://spirit.sf.net From zooko@zooko.com Tue Apr 1 16:47:56 2003 From: zooko@zooko.com (Zooko) Date: Tue, 01 Apr 2003 11:47:56 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: Message from Guido van Rossum <guido@python.org> of "Mon, 31 Mar 2003 17:43:09 EST."
<200303312243.h2VMhCC24639@odiug.zope.com> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> Message-ID: <E190OvU-0002KN-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Guido wrote: > > Yes. That may be why the demand for capabilities has been met with > resistance: to quote the French in "Monty Python and the Holy Grail", > "we already got one!" :-) ;-) Such skepticism is of course perfectly appropriate for proposed changes to your beautiful language. More on the one you already got below. (I agree: you already got one.) > > Here's a two sentence definition of capabilities: > > I've heard too many of these. They are all too abstract. There may have been a terminological problem. The word "capabilities" has been used for three different systems -- "capabilities-as-rows-of-the-Lampson-access-control-matrix", "capabilities-as-keys", and "capabilities-as-references". Unfortunately, the distinction is rarely made explicit, so people often assert things about "capabilities" which are untrue of capabilities-as-references. (Ping has just written a paper about this.) The former two kinds of capabilities have major problems and are disliked by almost everybody. The last one is the one that Ping, Ben Laurie and I are advocating, and the one that you already got. Anyway, if someone gave a definition of capabilities-as-references and it didn't match with the two-sentence definition I gave (and with the diagram), then it was wrong. Here's the two-sentence definition again: > > Authority originates in C code (in the interpreter or C extension > > modules), and is passed from thing to thing. > > This part I like.
> > > A given thing "X" -- an instance of ZipFile, for example -- has the > > authority to use a given authority -- to invoke the real open(), for > > example -- if and only if some thing "Y" previously held both the > > "open()" authority and the "authority to extend authorities to X" > > authority, and chose to extend the "open()" authority to X. > > But the instance of ZipFile is not really a protection domain. > Methods on the instance may have different authority. Okay, ZipFile was the wrong example. Here it is without examples: Abstract version: A given thing "X" can use a given authority "S" if and only if some thing "Y" has previously held both the authority and the "authority to extend authorities to X" and chose to extend "S" to X. To make it concrete, I will use the word "object" to mean "anything referenced by a Python reference". This includes class instances, closures, bound methods, stack frames, etc. When I mean Python's instance-of-a-class "object", I'll say "instance" instead of "object". So the concrete version is: Concrete version: An object "X" can use an object "S" if and only if some object "Y" has previously held references to both S and X, and chose to give a reference to S to X. (Quoting out of order:) > > Hm. Reviewing the rexec docs, I begin to suspect that the "access > > control system with unified designation and authority" *is* how > > Python does access control in restricted mode, and that rexec itself > > is just to manage module import and certain dangerous builtins. > > Yes. [...] > Sure. The question is, what exactly are Alice, Bob and Carol? I > claim that they are not specific class instances but they are each a > "workspace" as I tried to explain before. A workspace is more or less > the contents of a particular "sys.modules" dictionary. I believe I understand the motivation for rexec now.
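[Editor's sketch: Zooko's concrete version reads well as a few lines of Python. All names here (Log, Worker) are invented for illustration.]

```python
class Log:                          # S: an authority-bearing object
    def __init__(self):
        self.lines = []
    def write(self, msg):
        self.lines.append(msg)

class Worker:                       # X: holds no authority of its own...
    def __init__(self, log):        # ...until a reference to S is given to it
        self._log = log
    def run(self):
        self._log.write("did some work")

log = Log()                         # Y (this scope) holds references to S...
worker = Worker(log)                # ...and to X, and chooses to give S to X
worker.run()
```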
I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's suggestion [1]), Python objects have encapsulation -- one can't access their private data without their permission. Once this is done, Python references are capabilities. So if you have a Python object such as a wxWindow instance, and you want to control access to it, the natural way to do that is to control how references to it are passed around. This is why you've already got one. The natural and Pythonic way to control access to Python objects is with capabilities, and that's what you've been doing all along. However, you don't use the same technique to control access to Python *modules* such as the zipfile module, because the "import zipfile" statement will give the current scope access to the zipfile module even if nobody has granted such access to the current scope. This is a violation of the two-sentence definition and of the graph: the current scope just gained authority ex nihilo. So your solution to this, to prevent code from grabbing privileges willy nilly via "import" and builtins, is rexec, which creates a scope in which code executes (now called a "workspace"), and allows you to control which builtins and modules are available for code executing in that "workspace". Now access to modules conforms to the definition of capabilities: an object X can access a module S if and only if some object Y previously had access to X's workspace and to S, and Y chose to give X access to S. So unless I've missed something, rexec conforms to the definition of capabilities as well. (Of course, one can always build other access-control mechanisms on top of capabilities. In particular, the rexec "hooks" mechanism seems intended for that.) 
Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://mail.python.org/pipermail/python-dev/2003-March/034311.html From jeremy@zope.com Tue Apr 1 17:10:16 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 01 Apr 2003 12:10:16 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <E190OvU-0002KN-00@localhost> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> <E190OvU-0002KN-00@localhost> Message-ID: <1049217016.14149.12.camel@slothrop.zope.com> On Tue, 2003-04-01 at 11:47, Zooko wrote: > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > suggestion [1]), Python objects have encapsulation -- one can't access their > private data without their permission. > > Once this is done, Python references are capabilities. REM does not provide object encapsulation, but it disables enough introspection that it is possible to provide encapsulation. The REM implementation provides a Bastion function that creates private state by storing the state in func_defaults, which is inaccessible in REM. Jeremy From paul@prescod.net Tue Apr 1 18:29:37 2003 From: paul@prescod.net (Paul Prescod) Date: Tue, 01 Apr 2003 10:29:37 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <200303312243.h2VMhCC24639@odiug.zope.com> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> Message-ID: <3E89DA91.9040001@prescod.net> Guido van Rossum wrote: >>How is the implementation of "open" provided by the trusted code to >>the untrusted code? 
Is it possible to provide a different "open" >>implementation to different "instances" of the zipfile module? (I >>think not, as there is no such thing as "a different instance of a >>module", but perhaps you could have two rexec "workspaces" each of >>which has a zipfile module with a different "open"?) > > > To the contrary, it is very easy to provide code with a different > version of open(). E.g.:

> # this executes as trusted code
> def my_open(...):
>     "open() variant that only allows reading"
>
> my_builtins = {"len": len, "open": my_open, "range": range, ...}
> namespace = {"__builtins__": my_builtins}
> exec "..." in namespace

That's fair enough, but why is it better for the "protection domain" to be an invoked "workspace" instead of an object? Think of it from a software engineering point of view: you're proposing that the right way to manage security is to override more-or-less global variables. Zooko is proposing that you pass the capabilities each method needs to that method. i.e. standard structured programming. Let's say that untrusted code wants access to the socket module. The surrounding code wants to tame it to prevent socket connections to certain IP addresses. I think that in the rexec model, the surrounding application would have to go in and poke "safe" versions of the constructor into the module. Or they would have to disallow access to the module altogether and provide an object that tames the module appropriately. The first approach is kind of error-prone. The second approach requires the untrusted code to use a model of programming that is very different from "standard Python." If we imagined a Python in which capabilities were built in deeply, the socket module would be designed to be tamed. By default it would have no authority at all except that which is passed in. The authority to contact the outside world would be separate from all of the other useful stuff in the socket module and socket class.
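[Editor's sketch: in modern syntax (exec() as a function rather than the 2.x statement), Guido's workspace idea above might look like the following. The usual caveat applies: swapping __builtins__ limits what the code can name, but CPython does not promise this is a security boundary. read_only_open is an invented name standing in for the "open() variant that only allows reading".]

```python
def read_only_open(path, mode="r"):
    "open() variant that only allows reading"
    if any(c in mode for c in "wax+"):
        raise IOError("write access denied: %r" % path)
    return open(path, mode)

# the "workspace": untrusted code sees only the builtins we pass in
my_builtins = {"len": len, "open": read_only_open, "range": range}
namespace = {"__builtins__": my_builtins}

exec("n = len(range(10))", namespace)
```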
I'm not necessarily advocating this kind of a change to the Python library... Paul Prescod From pje@telecommunity.com Tue Apr 1 18:01:54 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 01 Apr 2003 13:01:54 -0500 Subject: [Python-Dev] Capabilities (we already got one) Message-ID: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> >However, you don't use the same technique to control access to Python *modules* >such as the zipfile module, because the "import zipfile" statement will give the >current scope access to the zipfile module even if nobody has granted such >access to the current scope. >... >So your solution to this, to prevent code from grabbing privileges willy nilly >via "import" and builtins, is rexec, which creates a scope in which code >executes (now called a "workspace"), and allows you to control which builtins >and modules are available for code executing in that "workspace". Almost. I think you may be confusing module *code* and module *objects*. Guido pointed this out earlier. A Python module object is populated by executing a body of *code* against the module *object* dictionary. The module object dictionary contains a '__builtins__' entry that gives it its "base" capabilities. Module *objects* possess capabilities, which are in their dictionary or reachable from it. *Code* doesn't possess capabilities except to constants used in the code. So access to *code* only grants you capabilities to the code and its constants. So, in order to provide a capability-safe environment, you need only provide a custom __import__ which uses a different 'sys.modules' that is specific to that environment. At that point, a "workspace" consists of an object graph rooted in the supplied '__builtins__', locals(), globals(), and initially executing code. We can then see that the standard Python environment is in fact a capability system, wherein everything is reachable from everything else. The "holes" in this capability system, then, are: 1. 
introspective abilities that allow "breaking out" of the workspace (such as the ability to 'sys._getframe()' or examine tracebacks to "reach up" to higher-level stack frames) 2. the structuring of the library in ways that equate creating an instance of a class with an "unsafe" capability. (E.g., creating instances of 'file()') coupled with instance->class introspection 3. Lack of true "privacy" for objects. (Proxies are a useful way to address this issue, because they allow more than one "capability" to exist for the same object.) From ping@zesty.ca Tue Apr 1 20:12:49 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Tue, 1 Apr 2003 14:12:49 -0600 (CST) Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <E190OvU-0002KN-00@localhost> Message-ID: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> On Tue, 1 Apr 2003, Zooko wrote: > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > suggestion [1]), Python objects have encapsulation -- one can't access their > private data without their permission. > > Once this is done, Python references are capabilities. Aaack! I wish you would *stop* saying that! There is no criterion by which a reference is or is not a capability. To talk in such terms only confuses the issue. It is possible to program in a capability style in any Turing-complete programming language, just as it is possible to program in an object style or a functional style or a procedural style. The question is: what does programming in a capability style look like, and how might Python facilitate (or even encourage) that style? To say that activating restricted execution mode causes things to "become" capabilities is as meaningless as saying that adding a feature to the C language would suddenly turn an arbitrary C program into an object-oriented program. 
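[Editor's sketch: one small, concrete answer to Ping's question of what capability-style code looks like is attenuation by proxy (Phillip Eby's point 3 above). The wrapper below grants read access and nothing else; note that CPython's underscore privacy is convention rather than enforcement, which is exactly the encapsulation gap this thread keeps circling. ReadOnly is an invented name.]

```python
import io

class ReadOnly:
    """Proxy granting only read access to a file-like object."""
    def __init__(self, f):
        self._read = f.read       # capture just the one method we grant
    def read(self, n=-1):
        return self._read(n)

f = io.StringIO("secret data")
r = ReadOnly(f)
text = r.read()                   # allowed
has_write = hasattr(r, "write")   # the write authority was never granted
```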
-- ?!ng From ehuss@netmeridian.com Tue Apr 1 21:41:54 2003 From: ehuss@netmeridian.com (Eric Huss) Date: Tue, 1 Apr 2003 13:41:54 -0800 (PST) Subject: [Python-Dev] Minor issue with PyErr_NormalizeException Message-ID: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> We had a bug in one of our extension modules that caused a core dump in PyErr_NormalizeException(). At the very top of the function (line 133) it checks for a NULL type. I think it should have a "return" here so that the code does not continue and thus dump core on line 153 when it calls PyClass_Check(type). This should also make the comment not lie about dumping core. ;) Just thought I'd pass it on.. -Eric From klm@zope.com Tue Apr 1 22:35:10 2003 From: klm@zope.com (Ken Manheimer) Date: Tue, 1 Apr 2003 17:35:10 -0500 (EST) Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> Message-ID: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> On Tue, 1 Apr 2003, Ka-Ping Yee wrote: > On Tue, 1 Apr 2003, Zooko wrote: > > I think that in restricted-execution-mode (hereafter: "REM", as > > per Greg Ewing's suggestion [1]), Python objects have > > encapsulation -- one can't access their private data without their > > permission. > > > > Once this is done, Python references are capabilities. > > Aaack! I wish you would *stop* saying that! > > There is no criterion by which a reference is or is not a capability. > To talk in such terms only confuses the issue. I take the above, with a bit of license, to mean that REM enables encapsulation for python objects, so they are closer to being safe to use as capabilities. Subsequent posts suggest that encapsulation isn't actually achieved, but that's not the issue here - the issue, as i understand it, is how to talk about enabling capability-based safety in python code. 
> It is possible to program in a capability style in any Turing-complete > programming language, just as it is possible to program in an object > style or a functional style or a procedural style. The question is: > what does programming in a capability style look like, and how might > Python facilitate (or even encourage) that style? I think the last part is, more specifically, "what measures need to be taken to enable safe use of python objects for capability style programming?" > To say that activating restricted execution mode causes things to > "become" capabilities is as meaningless as saying that adding a feature > to the C language would suddenly turn an arbitrary C program into an > object-oriented program. I'm not near as clear about all this as you seem to be, but i have the feeling the statements are not as meaningless as you're suggesting. I *do* think that getting more clear about what the questions are that we're trying to answer would be helpful, here. One big one seems to be: "What needs to be done to enable effective ("safe"?) use of python object (references) as capabilities?" I've seen answers to this roll by several times - i think we need to settle them, and collect the conclusions in a PEP. And we need to identify what other questions there are. One more probably is, "how do we use python objects as capabilities, once we can ensure their safety?" And maybe it'd be helpful to elaborate what "safety" means. -- Ken klm@zope.com Alan Turing thought about criteria to settle the question of whether machines can think, a question of which we now know that it is about as relevant as the question of whether submarines can swim. -- Edsger Dijkstra From beau@nyc-search.com Tue Apr 1 23:15:44 2003 From: beau@nyc-search.com (beau@nyc-search.com) Date: Tue, 01 Apr 2003 18:15:44 -0500 Subject: [Python-Dev] Python Programmers, NYC Message-ID: <3E8A1DA0.5E202C45@nyc-search.com>
Python Programmers, NYC http://www.nyc-search.com/jobs/python.html

We are seeking an experienced and highly-talented programmer/scripter/analyst to fill the position of Technical Lead for our quality control group. The successful candidate will collaborate with engineering, QC, and clients, and shall be responsible for developing and executing testing scripts to ensure all aspects of client data, as transformed to reports, meet stringent quality standards.

Job Requirements:
- Solid experience programming with Python and Java, preferably in a UNIX environment.
- Strong knowledge of databases (Oracle) and SQL - knowledge of PL/SQL preferred.
- Strong analytical skills (mathematics or statistics background preferred).
- Demonstrated business knowledge of public education systems in the United States helpful.
- We are using Python to: prototype and simulate key product functionality, as well as test the client data for consistency and test product subsystems for correctness.
- Candidates who elaborate on their knowledge of the above *key* requirements will get the best response.

My client hires on a contract basis first and then it becomes full time if both parties are happy. Candidates MUST be permanent and local tri-state (NY, NJ, CT) residents. Please submit Word resume and hourly/salary requirements to python@nyc-search.com

From greg@cosc.canterbury.ac.nz Wed Apr 2 01:58:31 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 Apr 2003 13:58:31 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> > It would be best if you would provide a patch. Try to locate the > primary source of the missing documentation (i.e. a TeX snippet), > and integrate this into the current CVS, then do a cvs diff. I'd rather not get involved in all that right now. I just want to draw this to the attention of whoever is maintaining the documentation. > submit a bug report That's what I *want* to do, but I can't figure out how. Following the obvious links leads me to the SourceForge Bug Tracker page, but I can't find anything there for submitting a new bug report, only browsing existing ones. Can someone please tell me how to submit a bug report? Thanks, Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From lalo@laranja.org Wed Apr 2 02:40:11 2003 From: lalo@laranja.org (Lalo Martins) Date: Tue, 1 Apr 2003 23:40:11 -0300 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?)
In-Reply-To: <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> References: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> Message-ID: <20030402024010.GG6887@laranja.org> On Wed, Apr 02, 2003 at 01:58:31PM +1200, Greg Ewing wrote: > > That's what I *want* to do, but I can't figure out how. > Following the obvious links leads me to the SourceForge > Bug Tracker page, but I can't find anything there for > submitting a new bug report, only browsing existing ones. > > Can someone please tell me how to submit a bug report? You need to login to sourceforge. Once you do that you should see a bar that looks like Submit New | Browse | Reporting | Admin the link you want is "Submit New". []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/ From tim.one@comcast.net Wed Apr 2 03:03:45 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 01 Apr 2003 22:03:45 -0500 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> [Eric Huss] > We had a bug in one of our extension modules that caused a core dump in > PyErr_NormalizeException(). At the very top of the function (line 133) it > checks for a NULL type. I think it should have a "return" here so that > the code does not continue and thus dump core on line 153 when it calls > PyClass_Check(type). This should also make the comment not lie about > dumping core. ;) > > Just thought I'd pass it on.. I agree the code doesn't make sense, but the comment doesn't either. I'm in favor of replacing the guts of the if (type == NULL) { block with a call to Py_FatalError(). 
From barry@python.org Wed Apr 2 04:06:32 2003 From: barry@python.org (Barry Warsaw) Date: 01 Apr 2003 23:06:32 -0500 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> Message-ID: <1049256392.3057.3.camel@geddy> On Tue, 2003-04-01 at 22:03, Tim Peters wrote: > [Eric Huss] > > I agree the code doesn't make sense, but the comment doesn't either. I'm in > favor of replacing the guts of the > > if (type == NULL) { > > block with a call to Py_FatalError(). +1 -Barry From drifty@alum.berkeley.edu Wed Apr 2 04:52:22 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 1 Apr 2003 20:52:22 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 Message-ID: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> You guys have 24 hours to correct my usual bunch of mistakes. Also give me feedback on the new format for the Quickies section. ----------- +++++++++++++++++++++++++++++++++++++++++++++++++++++ python-dev Summary for 2003-03-16 through 2003-03-31 +++++++++++++++++++++++++++++++++++++++++++++++++++++ .. _last summary: http://www.python.org/dev/summary/2003-03-01_2003-03-15.html ====================== Summary Announcements ====================== PyCon is now over! It was a wonderful experience. Getting to meet people from python-dev in person was great. The sprint was fun and productive (work on the AST branch, caching where something is found in an inheritance tree, and a new CALL_ATTR opcode were all worked on). Definitely was worth it. I am trying a new way of formatting the Quickies_ section, using non-inline implicit links instead of inlined ones. I am hoping this will read better in the text version of the summary. If you have an opinion on whether the new or old version is better let me know.
And remember, the last time I asked for an opinion, Michael Chermside was the only person to respond and thus ended up making an executive decision. .. _PyCon: http://www.python.org/pycon/ ======================== `Re: lists v. tuples`__ ======================== __ http://mail.python.org/pipermail/python-dev/2003-March/034029.html Splinter threads: - `Re: Re: lists v. tuples <http://mail.python.org/pipermail/python-dev/2003-March/034070.html>`__ This developed from a thread covered in the `last summary`_ that discussed the different uses of lists and tuples. By the start date for this summary, though, it had turned into a discussion on comparisons. This occurred when sorting heterogeneous objects came up. Guido commented that having anything beyond equality and non-equality tests for non-related objects does not make sense. This also led Guido to comment that "TOOWTDI makes me want to get rid of __cmp__" (TOOWTDI is "There is Only One Way to Do It"). Now before people start screaming bloody murder over the possible future loss of __cmp__() (which probably won't happen until Python 3), realize that all comparisons can be done using the six rich comparison methods (__lt__(), __eq__(), etc.). There is some possible code elegance lost if you have to use two rich comparisons instead of a single __cmp__() comparison, but nothing that you could do before becomes impossible. This all led Guido to suggest introducing the function before(). This would be used for arbitrary ordering of objects. Alex Martelli said it would "be very nice if before(x,y) were the same as x<y whenever the latter doesn't raise an exception, if feasible". He also said that it should probably "define a total ordering, i.e. the implied equivalence being equality".
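The point that __cmp__()-style code can always be rewritten with the rich comparison methods can be sketched concretely. The class below is invented purely for illustration; it uses ``functools.total_ordering`` (a stdlib addition from well after this summary) to derive the remaining comparisons from __eq__() and __lt__():

```python
# Illustrative only: a class ordered entirely through rich comparisons,
# with no __cmp__(). functools.total_ordering fills in __le__, __gt__,
# and __ge__ from the __eq__ and __lt__ given here.
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)
```

With this, ``Version(2, 2) < Version(2, 3)`` and ``Version(2, 3) >= Version(2, 2)`` both hold without any __cmp__() in sight; the elegance cost mentioned above is just writing two methods instead of one.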
================================ `Fast access to __builtins__`__ ================================ __ http://mail.python.org/pipermail/python-dev/2003-March/034243.html There have been rumblings on the list as of late of disallowing shadowing of built-ins. Specifically, the idea of someone injecting something into a module's namespace that shadows a built-in (by doing something like ``socket.len = lambda x: 42``, overriding len() inside the socket module) is slightly nasty, rarely done, and prevents the core from optimizing for built-ins. Raymond Hettinger, in an effort to see how to speed up built-in access, came up with the idea of replacing LOAD_GLOBAL opcodes with LOAD_CONST after putting the built-in being called into the constants table. This would leave shadowing of built-ins locally unaffected but prevent shadowing at the module level. Raymond suggested turning on this behavior when running Python with -O. The idea of turning this on when running with the -O option was shot down. The main argument was that this changes semantics, which is not acceptable for the -O flag. It was mentioned that -OO can change semantics, but even that is questionable. So this led to some suggestions of how to turn this kind of feature on. Someone suggested something like a pragma (think Perl) or some other mechanism at the module level. Guido didn't like this idea since he does not want modules to be riddled with code to turn on module-level optimizations. But all of this was partially shot down when Guido stepped in and reiterated he just wanted to prevent outside code from shadowing built-ins for a module. The idea is that if it can be proven that a module does not shadow a built-in, the compiler can output an opcode specific for that built-in, e.g. a call to len() could compile to an opcode that calls PyObject_Size() if the compiler can prove that len() is not shadowed in the module at any point. Neil Schemenauer suggested adding a warning for when this kind of shadowing is done.
Guido said fine as long as extension modules are exempt. Now no matter how well the warning is coded, it would be *extremely* difficult to catch something like ``import X; d = X.__dict__; d["len"] = lambda x: 42``. How do you deal with this? Guido said he has no issue declaring that something like this "is always prohibited". He said you could still do ``setattr(X, "len", lambda x: 42)``, though, and that might give you a warning. ================================ `capability-mediated modules`__ ================================ __ http://mail.python.org/pipermail/python-dev/2003-March/034149.html Splinter threads: - `Capabilities <http://mail.python.org/pipermail/python-dev/2003-March/034152.html>`__ The thread that will not die (nor does it look like it will in the near future); Guido asked to postpone discussing it until he gets back from `Python UK`_, which will carry the discussion into the next summary. I am ending up an expert at capabilities against my will. =) In case you have not been following all of this, capabilities as discussed here are the idea that security is based on passing around references to objects. If you have a reference you can use it with no restrictions. Security comes in by controlling whom you give references to. So I might ask for a reference to file(), but I won't necessarily get it. I could, instead, be handed a reference to a restrictive version of file() that only opens files in the OS's temporary file directory. If that is not clear, read the `last summary`_ on this thread. And now, on to the new stuff... One point made about capabilities is that they partially go against the Pythonic grain. Since you have to pass capabilities specifically and shouldn't allow them to be inherited, it does not go with the way you tend to write Python code. There were also suggestions to add arguments to import statements to give more fine-grained control over them. But it was pointed out that classes fit this bill.
The idea of limiting what modules are accessible by some code by not using a universally global scope (i.e., not using sys.modules) but by having a specific scope for each function was suggested. As Greg Ewing put it, "it would be dynamic scoping of the import namespace". While trying to clarify things (which had come up at PyCon thanks to the Open Space discussion held there on this subject), Guido made a good distinction between a rexec_ world (as in the module) and a capabilities world. In capabilities, security is based on passing around references that have the amount of power you are willing for them to have. In a rexec world, it is based on what powers the built-ins give you; there is no worry about passing around code. Also, in the rexec world, you can have the idea of a "workspace" where __builtin__ has very specific definitions of built-ins that are used when executing untrusted code. Ka-Ping Yee wrote up an example of what it would be like to code with capabilities (can be found at XXX ). .. _Python UK: http://www.python-uk.org/ .. _rexec: http://www.python.org/dev/doc/devel/lib/module-rexec.html ========= Quickies ========= `tzset`__ time.tzset() is going to be kept in Python, but only on UNIX. The testing suite was also loosened so as not to throw as many false negatives. __ http://mail.python.org/pipermail/python-dev/2003-March/034062.html `Windows IO`__ stdin and stdout on Windows are TTYs. You can use 3rd-party modules to get more control over the TTY. __ http://mail.python.org/pipermail/python-dev/2003-March/034102.html `Who approved PyObject_GenericGetIter()???`__ Splinter threads: `Re: [Python-checkins] python/dist/src/Modules _hotshot.c,...`__; `PyObject_GenericGetIter()`__ Raymond Hettinger wrote a function called PyObject_GenericGetIter() that returned self for objects that were iterators themselves.
Thomas Wouters didn't like the name and neither did Guido since it was not generic at all; it worked specifically with objects that were iterators themselves. Thus the function was renamed to PyObject_SelfIter(). __ http://mail.python.org/pipermail/python-dev/2003-March/034107.html __ http://mail.python.org/pipermail/python-dev/2003-March/034103.html __ http://mail.python.org/pipermail/python-dev/2003-March/034110.html `test_posix failures?`__ A test for posix.getlogin() was failing for Barry Warsaw under XEmacs (that is what he gets for not using Vim_ =). Thomas Wouters pointed out it only works when there is a utmp file somewhere. Basically it was agreed the test that was failing should be removed. __ http://mail.python.org/pipermail/python-dev/2003-March/034120.html .. _Vim: http://www.vim.org/ `Shortcut bugfix`__ Raymond Hettinger reported that a change in `_tkinter.c`_ for a function led to it returning strings or ints which broke PMW_ (although having a function return two different things was disputed in the thread; I think it used to return a string and now returns an int). The suggestion of making string.atoi() more lenient on its accepted arguments was made but shot down since it changes semantics. If you want to keep the old way of having everything in Tkinter return strings instead of more proper object types (such as ints where appropriate), you can put the line ``Tkinter.wantobjects = 0`` before the first creation of a tkapp object. __ http://mail.python.org/pipermail/python-dev/2003-March/034138.html .. _`_tkinter.c`: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_tkinter.c .. _PMW: http://pmw.sourceforge.net/ `csv package ready for prime-time?`__ Related: `csv package stitched into CVS hierarchy`__ Skip Montanaro: Okay to move csv_ package from the sandbox into the stdlib? Guido van Rossum: Yes.
__ http://mail.python.org/pipermail/python-dev/2003-March/034162.html __ http://mail.python.org/pipermail/python-dev/2003-March/034179.html .. _csv: http://www.python.org/dev/doc/devel/lib/module-csv.html `string.strip doc vs code mismatch`__ Neal Norwitz asked for someone to look at http://python.org/sf/697220 which updates string.strip() from the string_ module to take an optional second argument. The patch is still open. __ http://mail.python.org/pipermail/python-dev/2003-March/034167.html .. _string: http://www.python.org/dev/doc/devel/lib/module-string.html `Re: More int/long integration issues`__ The point was made that it would be nice if the statement ``if num in range(...): ...`` could be optimized by the compiler, when range() is known to be the built-in, by substituting it with something like xrange() and thus skipping the creation of a huge list. This would allow the removal of xrange() without issue. Guido suggested a restartable iterator (a generator would work wonderfully if you could just get everything else to make what range() returns look like the list it should be). __ http://mail.python.org/pipermail/python-dev/2003-March/034019.html `socket timeouts fail w/ makefile()`__ Skip Montanaro discovered that using the makefile() method on a socket caused the file-like object not to observe the new timeout facility introduced in Python 2.3. He has since patched it so that it works properly and that sockets always have a makefile() (which wasn't always the case before). __ http://mail.python.org/pipermail/python-dev/2003-March/034177.html `New Module? Tiger Hashsum`__ Tino Lange implemented a wrapper for the `Tiger hash sum`_ for Python and asked how he could get it added to the stdlib. He was told that he would need community backing before his module could be added in order to make sure that there is enough demand to warrant the addition. __ http://mail.python.org/pipermail/python-dev/2003-March/034191.html ..
_Tiger hash sum: http://www.cs.technion.ac.il/~biham/Reports/Tiger/ `Icon for Python RSS Feed?`__ Tino Lange asked if an XML RSS feed icon could be added at http://www.python.org/ for http://www.python.org/channews.rdf . It has been added. __ http://mail.python.org/pipermail/python-dev/2003-March/034196.html `How to suppress instance __dict__?`__ David Abrahams asked if there was an easy way to suppress an instance __dict__'s creation from a metaclass. The answer turned out to be no. __ http://mail.python.org/pipermail/python-dev/2003-March/034197.html `Weekly Python Bug/Patch Summary`__ Another summary can be found at http://mail.python.org/pipermail/python-dev/2003-March/034286.html Skip Montanaro's weekly reminder of how Python ain't perfect. __ http://mail.python.org/pipermail/python-dev/2003-March/034200.html `[ot] offline`__ Samuele Pedroni is off relaxing and is going to be offline for two weeks starting March 23. __ http://mail.python.org/pipermail/python-dev/2003-March/034204.html `funny leak`__ Christian Tismer discovered a memory leak in a funky def statement he came up with. The leak has since been squashed (done at PyCon_ during the sprint, actually). __ http://mail.python.org/pipermail/python-dev/2003-March/034212.html `Checkins to Attic?`__ CVS_ uses something called the Attic to put files that are only in a branch but not the HEAD of a tree. __ http://mail.python.org/pipermail/python-dev/2003-March/034230.html .. _CVS: http://www.cvshome.org/ `ossaudiodev tweak needs testing`__ Greg Ward asked people who are running Linux or FreeBSD to execute ``Lib/test/regrtest.py -uaudio test_ossaudiodev`` so as to test his latest change to ossaudiodev_. __ http://mail.python.org/pipermail/python-dev/2003-March/034233.html ..
_ossaudiodev: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/ossaudiodev.c `cvs.python.sourceforge.net fouled up`__ Apparently when you get that nice message from SourceForge_ telling you that recv() has aborted because of server overloading, you can rest assured that people with checkin rights get to continue to connect since they get priority. __ http://mail.python.org/pipermail/python-dev/2003-March/034234.html .. _SF: .. _SourceForge: http://www.sf.net/ `Doc strings for typeslots?`__ You can't add custom docstrings to things stored in typeobject slots at the C level. __ http://mail.python.org/pipermail/python-dev/2003-March/034239.html `Compiler treats None both as a constant and variable`__ As of now the compiler outputs opcode that treats None as both a global and a constant. That will change at some point when assigning to None becomes an error instead of a warning as it is in Python 2.3; the change will possibly be made in 2.4. __ http://mail.python.org/pipermail/python-dev/2003-March/034281.html `iconv codec`__ M.A. Lemburg questioned whether the iconv codec was ready for prime-time. There have been multiple issues with it and most seem to stem from a platform's codec and not ones that come with Python. This affects all u"".encode() calls when the codec does not come with Python. Hye-Shik Chang said he would get his iconv codec NG patch up on SF in the next few days and that would be applied. __ http://mail.python.org/pipermail/python-dev/2003-March/034300.html From beau@nyc-search.com Wed Apr 2 04:52:26 2003 From: beau@nyc-search.com (beau@nyc-search.com) Date: Tue, 01 Apr 2003 23:52:26 -0500 Subject: [Python-Dev] Python Technical Lead, New York, NY Message-ID: <3E8A6C8A.223A19FC@nyc-search.com>
http://www.nyc-search.com/jobs/python.html

Python Technical Lead, New York, NY

We are seeking an experienced and highly-talented programmer/scripter/analyst to fill the position of Technical Lead for our quality control group. The successful candidate will collaborate with engineering, QC, and clients, and shall be responsible for developing and executing testing scripts to ensure all aspects of client data, as transformed to reports, meet stringent quality standards.

Job Requirements:

- Solid experience programming with Python and Java, preferably in a UNIX environment.
- Strong knowledge of databases (Oracle) and SQL - knowledge of PL/SQL preferred.
- Strong analytical skills (mathematics or statistics background preferred).
- Demonstrated business knowledge of public education systems in the United States helpful.
- We are using Python to: prototype and simulate key product functionality, as well as test the client data for consistency and test product subsystems for correctness.
- Candidates who elaborate on their knowledge of the above *key* requirements will get the best response.

My client hires on a contract basis first and then it becomes full time if both parties are happy.

Candidates MUST be permanent and local tri-state (NY, NJ, CT) residents.

Please submit Word resume and hourly/salary requirements to python@nyc-search.com

From Jack.Jansen@cwi.nl Wed Apr 2 09:21:17 2003 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Wed, 2 Apr 2003 11:21:17 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <20030402024010.GG6887@laranja.org> Message-ID: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> On Wednesday, Apr 2, 2003, at 04:40 Europe/Amsterdam, Lalo Martins wrote: >> Can someone please tell me how to submit a bug report? > > You need to login to sourceforge. > > Once you do that you should see a bar that looks like > Submit New | Browse | Reporting | Admin > the link you want is "Submit New". Aargh, this is very bad! I'm always logged in when I visit sourceforge (and I assume that most of us are), I wasn't aware of the fact that if you are not logged in you get no indication whatsoever that it is possible to submit bugs. Do we have control over what is on that page, i.e.
could we add a note to the top saying "If you want to submit a new bug please log in first"? Otherwise I think the "bugs" link on www.python.org should go to a local page which explains this before sending people off to the sourceforge tracker. -- Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From dave@boost-consulting.com Wed Apr 2 12:57:34 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 02 Apr 2003 07:57:34 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: <021d01c2f82c$9b6d3470$4ee1afca@kim> ("Joel de Guzman"'s message of "Tue, 1 Apr 2003 16:56:34 +0800") References: <ur88zougj.fsf@boost-consulting.com> <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <uof42i1ey.fsf@boost-consulting.com> <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net> <uvfyayr0y.fsf@boost-consulting.com> <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net> <021d01c2f82c$9b6d3470$4ee1afca@kim> Message-ID: <uvfxxys3l.fsf@boost-consulting.com> Hi, Joel -- I don't think this is more than marginally appropriate for python-dev, and probably we shouldn't bother Guido about it until I've failed to help you first. Everybody else can ignore the rest of this message unless they have a sick fascination with Boost.Python... "Joel de Guzman" <joel@boost-consulting.com> writes: > Ok, I'm lost. Please be easy with me, I'm still learning the C API > interfacing with Python :) Here's what I have so far. 
Emulating the > desired behavior in Python, I can do: > > class EnumMeta(type): > def __new__(cls, name, bases, dict): > C = type.__new__(cls, name, bases, dict) > del C.__getstate__ > return C > > class Enum(int): > __metaclass__ = EnumMeta > __slots__ = () > > > x = Enum(1964) > print x > > import pickle > print "SAVING" > out_x = pickle.dumps(x) > > print "LOADING" > xl = pickle.loads(out_x) > print xl > > I'm trying to rewrite this in C/C++ with the intent to patch > Boost.Python to allow pickling on enums. I took on this task to > learn more about the low level details of Python C interfacing. > So far, I have implemented EnumMeta in C that does not override > anything yet and installed that as the metaclass of Enum. > > I was wondering... Is there some C code somewhere that I can see > that implements some sort of meta-stuff? We have some in Boost.Python already, and I'm about to check in some more to implement static data members. > I read PEP 252 and 253 and "Unifying Types and Classes in Python > 2.2". The examples there (specifically the class autoprop) are > written in Python. I tried searching for examples in C from the > current CVS snapshot of 2.3 but I failed in doing so. I'm sure it's > there, but I don't know where to find it. Actually there are very few metaclasses in Python proper. AFAIK, PyType_Type is the only metaclass in the core. > To be specific, I'm lost in trying to implement tp_new of > PyTypeObject. How do I call the default tp_new for metaclasses? PyTypeObject.tp_new( /*args here*/ ) should work. HTH, -- Dave Abrahams Boost Consulting www.boost-consulting.com From zooko@zooko.com Wed Apr 2 13:39:33 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 08:39:33 -0500 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: Message from Brett Cannon <bac@OCF.Berkeley.EDU> of "Tue, 01 Apr 2003 20:52:22 PST."
<Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> Message-ID: <E190iSj-0007S7-00@localhost> Brett Cannon <bac@OCF.Berkeley.EDU> wrote: > > One point made about capabilities is that they partially go against the > Pythonic grain. Since you have to pass capabilities specifically and > shouldn't allow them to be inherited, it does not go with the way you tend > to write Python code. This doesn't make sense to me, and I don't recall a message which asserted it. If capabilities were implemented as Python references, you could inherit capabilities (== references) from superclasses, just as you can currently do. The rest looks like a good summary! Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From nas@python.ca Wed Apr 2 14:35:53 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 2 Apr 2003 06:35:53 -0800 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> Message-ID: <20030402143553.GA6801@glacier.arctrix.com> Brett Cannon wrote: > Neil Schemenauer suggested adding a warning for when this kind of > shadowing is done. There is a patch on SF (http://www.python.org/sf/711448) that adds a warning. It probably needs a bit of polish but I think it could go into 2.3. Neil From op73418@mail.telepac.pt Wed Apr 2 14:42:41 2003 From: op73418@mail.telepac.pt (=?iso-8859-1?Q?Gon=E7alo_Rodrigues?=) Date: Wed, 2 Apr 2003 15:42:41 +0100 Subject: [Python-Dev] Super and properties Message-ID: <001401c2f926$1d32d7e0$a8130dd5@violante> Hi all, Since this is my first post here, let me first introduce myself. I'm Gonçalo Rodrigues. I work in mathematics, mathematical physics to be more precise. I am a self-taught hobbyist programmer and fell in love with Python a year and a half ago.
And of interesting personal details this is about all so let me get down to business. My problem has to do with super that does not seem to work well with properties. I posted to comp.lang.python a while ago and there I was advised to post here. So, suppose I override a property in a subclass, e.g. >>> class test(object): ... def __init__(self, n): ... self.__n = n ... def __get_n(self): ... return self.__n ... def __set_n(self, n): ... self.__n = n ... n = property(__get_n, __set_n) ... >>> a = test(8) >>> a.n 8 >>> class test2(test): ... def __init__(self, n): ... super(test2, self).__init__(n) ... def __get_n(self): ... return "Got ya!" ... n = property(__get_n) ... >>> b = test2(8) >>> b.n 'Got ya!' Now, since I'm overriding a property, it is only normal that I may want to call the property implementation in the super class. But the obvious way (to me at least) does not work: >>> print super(test2, b).n Traceback (most recent call last): File "<interactive input>", line 1, in ? AttributeError: 'super' object has no attribute 'n' I know I can get at the property via the class, e.g. do >>> test.n.__get__(b) 8 >>> Or, not hardcoding the test class, >>> b.__class__.__mro__[1].n.__get__(b) 8 But this is ugly at best. To add to the puzzle, the following works, albeit not in the way I expected >>> super(test2, b).__getattribute__('n') 'Got ya!' Since I do not know if this is a bug in super or a feature request for it, I thought I'd better post here and leave it to your consideration. With my best regards, G. 
Rodrigues From lkcl@samba-tng.org Wed Apr 2 09:07:26 2003 From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton) Date: Wed, 2 Apr 2003 09:07:26 +0000 Subject: [Python-Dev] [PEP] += on return of function call result Message-ID: <20030402090726.GN1048@localhost> example code: log = {} for t in range(5): for r in range(10): log.setdefault(r, '') += "test %d\n" % t pprint(log) instead, as the above is not possible, the following must be used: from operator import add ... ... ... add(log.setdefault(r, ''), "test %d\n" % t) ... ARGH! just checked - NOPE! add doesn't work. and there's no function "radd" or "__radd__" in the operator module. unless there are really good reasons, can i recommend allowing += on return result of function calls. i cannot honestly think of or believe that there is a reasonable justification for restricting the += operator. append() on the return result of setdefault works absolutely fine, which is GREAT because you have no idea how long i have been fed up of not being able to do this in one line: log = {} log.setdefault(99, []).append("test %d\n" % t) l. From ark@research.att.com Wed Apr 2 14:54:35 2003 From: ark@research.att.com (Andrew Koenig) Date: 02 Apr 2003 09:54:35 -0500 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <20030402090726.GN1048@localhost> References: <20030402090726.GN1048@localhost> Message-ID: <yu99n0j9gdas.fsf@europa.research.att.com> Luke> example code: Luke> log = {} Luke> for t in range(5): Luke> for r in range(10): Luke> log.setdefault(r, '') += "test %d\n" % t Luke> pprint(log) Luke> instead, as the above is not possible, the following must be used: Luke> from operator import add Luke> ... Luke> ... Luke> ... Luke> add(log.setdefault(r, ''), "test %d\n" % t) Luke> ... ARGH! just checked - NOPE! add doesn't work. Luke> and there's no function "radd" or "__radd__" in the Luke> operator module. Why can't you do this? 
for t in range(5): for r in range(10): foo = log.setdefault(r,'') foo += "test %d\n" % t -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From lkcl@samba-tng.org Wed Apr 2 15:12:33 2003 From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton) Date: Wed, 2 Apr 2003 15:12:33 +0000 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <yu99n0j9gdas.fsf@europa.research.att.com> References: <20030402090726.GN1048@localhost> <yu99n0j9gdas.fsf@europa.research.att.com> Message-ID: <20030402151232.GX1048@localhost> On Wed, Apr 02, 2003 at 09:54:35AM -0500, Andrew Koenig wrote: > Why can't you do this? > > for t in range(5): > for r in range(10): > foo = log.setdefault(r,'') > foo += "test %d\n" % t because i am thick? ... now why didn't that occur to me :) thanks andrew, l. p.s. so it's on the "would be nice to have" From ben@algroup.co.uk Wed Apr 2 16:22:09 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Wed, 02 Apr 2003 17:22:09 +0100 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> References: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> Message-ID: <3E8B0E31.5060001@algroup.co.uk> This message came unglued from the rest of the thread, so I'm going to unglue my response from my catching up with the rest of the thread (which I am partway through at the moment) ;-) Phillip J. Eby wrote: > >However, you don't use the same technique to control access to Python > *modules* > >such as the zipfile module, because the "import zipfile" statement > will give the > >current scope access to the zipfile module even if nobody has granted > such > >access to the current scope. > >... 
> >So your solution to this, to prevent code from grabbing privileges > willy nilly > >via "import" and builtins, is rexec, which creates a scope in which code > >executes (now called a "workspace"), and allows you to control which > builtins > >and modules are available for code executing in that "workspace". > > Almost. I think you may be confusing module *code* and module > *objects*. Guido pointed this out earlier. > > A Python module object is populated by executing a body of *code* > against the module *object* dictionary. The module object dictionary > contains a '__builtins__' entry that gives it its "base" capabilities. > > Module *objects* possess capabilities, which are in their dictionary or > reachable from it. *Code* doesn't possess capabilities except to > constants used in the code. So access to *code* only grants you > capabilities to the code and its constants. > > So, in order to provide a capability-safe environment, you need only > provide a custom __import__ which uses a different 'sys.modules' that is > specific to that environment. At that point, a "workspace" consists of > an object graph rooted in the supplied '__builtins__', locals(), > globals(), and initially executing code. > > We can then see that the standard Python environment is in fact a > capability system, wherein everything is reachable from everything else. I'm not quite sure what you mean by this. Of course, the fact that Python doesn't seem to be all that far from a capability system is one of the attractions, but until the holes you mention (and perhaps others) are plugged, it isn't a capability system. > > The "holes" in this capability system, then, are: > > 1. introspective abilities that allow "breaking out" of the workspace > (such as the ability to 'sys._getframe()' or examine tracebacks to > "reach up" to higher-level stack frames) > > 2. the structuring of the library in ways that equate creating an > instance of a class with an "unsafe" capability. 
(E.g., creating > instances of 'file()') coupled with instance->class introspection > > 3. Lack of true "privacy" for objects. (Proxies are a useful way to > address this issue, because they allow more than one "capability" to > exist for the same object.) Of course, once you have a capability system, you get the effect of more than one capability for the same object for free, as it were, simply by, err, proxying with other objects. The objection to doing it the other way round is that for capability languages to be truly usable the capability functionality needs to be automatic, not something that is painfully added to each class or object (at least, that is the claim we capability mavens are making). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From aahz@pythoncraft.com Wed Apr 2 17:55:48 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 2 Apr 2003 12:55:48 -0500 Subject: [Python-Dev] Security challenge (was Re: Capabilities) In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <3E8768BE.8010603@prescod.net> <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <20030402175548.GA25135@panix.com> On Mon, Mar 31, 2003, Ka-Ping Yee wrote: > > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding > of what Python would need to do to make secure programming easier. Okay, how about using LDAP to secure access to a database and give each user appropriate privileges? I'm just throwing this in as an example of mediated access that's required to be effective in the Real World [tm]; I'm sure you can think of simpler examples if you want. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. 
We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From drifty@alum.berkeley.edu Wed Apr 2 20:36:38 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Wed, 2 Apr 2003 12:36:38 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <E190iSj-0007S7-00@localhost> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> Message-ID: <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> [Zooko]
> > Brett Cannon <bac@OCF.Berkeley.EDU> wrote:
> > >
> > One point made about capabilities is that they partially go against the
> > Pythonic grain. Since you have to pass capabilities specifically and
> > shouldn't allow them to be inherited, it does not go with the way you tend
> > to write Python code.
>
> This doesn't make sense to me, and I don't recall a message which asserted it.
>
It was said in an email. I don't remember who off the top of my head, but someone stated something along these lines.
> If capabilities were implemented as Python references, you could inherit
> capabilities (== references) from superclasses, just as you can currently do.
>
That's why it says "shouldn't" instead of "couldn't". I could re-word this to go more along the way Ping phrased it in how the class statement does not make perfect sense for capabilities but it can be used.
> The rest looks like a good summary!
>
Thanks. -Brett From martin@v.loewis.de Wed Apr 2 21:24:32 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 02 Apr 2003 23:24:32 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> References: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> Message-ID: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Jack Jansen <Jack.Jansen@cwi.nl> writes: > Do we have control over what is on that page, i.e.
could we add a > note to the top saying "If you want to submit a new bug please log > in first"? Please have a look at the page now. Look ok? Is that needed for patches as well? Regards, Martin From fdrake@acm.org Wed Apr 2 21:34:24 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 2 Apr 2003 16:34:24 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> References: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Message-ID: <16011.22368.351593.284577@grendel.zope.com> Martin v. Löwis writes:
> Please have a look at the page now. Look ok? Is that needed for
> patches as well?
Yes; that tracker has the same requirement for submission. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From zooko@zooko.com Wed Apr 2 22:53:31 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 17:53:31 -0500 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: Message from Brett Cannon <bac@OCF.Berkeley.EDU> of "Wed, 02 Apr 2003 12:36:38 PST." <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> Message-ID: <E190r6p-0002Yx-00@localhost>
> > > One point made about capabilities is that they partially go against the
> > > Pythonic grain. ...
> > If capabilities were implemented as Python references, you could inherit
> > capabilities (== references) from superclasses, just as you can currently do.
> > That's why it says "shouldn't" instead of "couldn't". I could re-word
> this to go more along the way Ping phrased it in how the class statement
> does not make perfect sense for capabilities but it can be used.
I can't speak for Ping, but I would be quite surprised if he thought that capabilities were un-Pythonic. (I wouldn't be surprised if he disapproved of the notion of classes in a programming language, regardless of security considerations...) Speaking for myself, capabilities have two main advantages: they fit with the Zen of Python, they enable higher-order least-privilege, and they fit with the principle of unifying designation and authority. But seriously, I feel that capabilities fit with normal Python programming as it is currently practiced. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From zooko@zooko.com Wed Apr 2 23:08:12 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 18:08:12 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: Message from Ka-Ping Yee <ping@zesty.ca> of "Tue, 01 Apr 2003 14:12:49 CST." <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> References: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> Message-ID: <E190rL2-0002lv-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Ping wrote: > > > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > > suggestion [1]), Python objects have encapsulation -- one can't access their > > private data without their permission. > > > > Once this is done, Python references are capabilities. > > Aaack! I wish you would *stop* saying that! > > There is no criterion by which a reference is or is not a capability. > To talk in such terms only confuses the issue. Let me be a little more precise. Once Python objects are encapsulated, then possession of a reference is constrained in the following way: you can have a reference only if another object that had it chose to give it to you (or if you create something yourself, in which case you get the first-ever reference to it). 
This constraint happens to be the same constraint that the rule of capabilities imposes on the transmission of capabilities: you can have a capability only if someone else who had it chose to give it to you (or if you create something yourself, in which case you get the first-ever capability to it). Therefore, if you wish to use capability access control to manage access to resources in Python you can use the following technique:

1. Encapsulate the resource that you wish to control in a Python object.
2. Say to yourself "References are capabilities!".
3. Control the way references to that object are shared.

Doing it this way will yield the advantages that capability access control enjoys over alternative access control models. It also has the advantage that your skills at Python programming can be applied directly to the problem of managing access control, without requiring you to learn any new policy language or new concepts. You are quite right, Ping, that capability access control could be enforced in other ways in Python. I didn't mean to say "capabilities are Python references", which would imply that capability access control could not be implemented in any other way. I'm deliberately refraining from posting about the issue of controlling import of modules and builtins in an attempt to "slow down" the discussion until Guido returns from Python UK. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From greg@cosc.canterbury.ac.nz Thu Apr 3 01:07:52 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 13:07:52 +1200 (NZST) Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <20030402151232.GX1048@localhost> Message-ID: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> Andrew Koenig wrote:
> Why can't you do this?
> foo = log.setdefault(r,'')
> foo += "test %d\n" % t
You can do it, but it's useless!
>>> d = {}
>>> foo = d.setdefault(42, "buckle")
>>> foo += " my shoe"
>>> d
{42: 'buckle'}

What Mr. Leighton wanted is *impossible* when the value concerned is immutable, because by the time you get to the += operator, there's no information left about where the value came from, and thus no way to update the dict with the new value. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 3 02:19:51 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 14:19:51 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304030219.h332Jp223291@oma.cosc.canterbury.ac.nz> Martin: > Please have a look at the page now. Look ok? What page are you talking about, exactly? I just tried the "Bug Tracker" link in the sidebar of www.python.org, and it still goes straight to a sourceforge page, which looks just the same as before as far as I can tell. What am I supposed to be seeing? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Apr 3 02:31:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 02 Apr 2003 21:31:43 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <200304030219.h332Jp223291@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCEEGCECAB.tim.one@comcast.net> [Greg Ewing]
> What page are you talking about, exactly? I just tried
> the "Bug Tracker" link in the sidebar of www.python.org,
> and it still goes straight to a sourceforge page, which
> looks just the same as before as far as I can tell.
>
> What am I supposed to be seeing?
I expect he wants you to see the line that says

    Please log into SourceForge to submit a new report.

below the filter boxes and above the 1-line bug summaries. From ark@research.att.com Thu Apr 3 02:38:48 2003 From: ark@research.att.com (Andrew Koenig) Date: 02 Apr 2003 21:38:48 -0500 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> References: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> Message-ID: <yu99of3otidj.fsf@europa.research.att.com> Greg> Andrew Koenig wrote:
>> Why can't you do this?
>> foo = log.setdefault(r,'')
>> foo += "test %d\n" % t
Greg> You can do it, but it's useless!
>>>> d = {}
>>>> foo = d.setdefault(42, "buckle")
>>>> foo += " my shoe"
>>>> d
Greg> {42: 'buckle'}
Greg> What Mr. Leighton wanted is *impossible* when the value
Greg> concerned is immutable, because by the time you get to
Greg> the += operator, there's no information left about where
Greg> the value came from, and thus no way to update the
Greg> dict with the new value.
Of course it's impossible when the value is immutable, because += can't mutate it :-) However, consider this:

    foo = []
    foo += ["my shoe"]

No problem, right? So the behavior of

    foo = d.setdefault(r,'')
    foo += "test %d\n" % t

depends on what type foo has, and the OP didn't say. But whatever type foo might have, the behavior of the two statements above ought logically to be the same as the theoretical behavior of

    d.setdefault(r,'') += "test %d\n" % t

which is what the OP was trying to achieve.
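The distinction being drawn here is easy to see in a short sketch (the dict, keys and values below are illustrative, not taken from the thread): += after dict.setdefault() only "sticks" when the stored value is mutable, because only then does += mutate the shared object in place instead of rebinding a local name.

```python
log = {}

# Immutable str: foo gets the stored value, but += rebinds the name
# foo to a brand-new string; the dict never sees the change.
foo = log.setdefault(0, '')
foo += "test\n"
assert log[0] == ''

# Mutable list: bar aliases the list stored in the dict, and += calls
# list.__iadd__, which extends that same list in place.
bar = log.setdefault(1, [])
bar += ["test\n"]
assert log[1] == ["test\n"]
```

This is the same reason the mutable-value idiom `log.setdefault(r, []).append(...)` works without any assignment at all.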
-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From greg@cosc.canterbury.ac.nz Thu Apr 3 02:56:43 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 14:56:43 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEGCECAB.tim.one@comcast.net> Message-ID: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> Tim Peters <tim.one@comcast.net>:
> I expect he wants you to see the line that says
>
> Please log into SourceForge to submit a new report.
>
> below the filter boxes and above the 1-line bug summaries.
Hmmm, okay, I can see it now, but it would be easy to miss if I weren't looking for it. Perhaps it could be made a little larger and set off from the items above and below it? Ideally, of course, the Submit New button should always be there, and lead to a page telling you to log in if you're not already. But presumably you don't have that much control over the page? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 3 03:04:27 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 15:04:27 +1200 (NZST) Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <yu99of3otidj.fsf@europa.research.att.com> Message-ID: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Andrew Koenig <ark@research.att.com>:
> So the behavior of
>
>     foo = d.setdefault(r,'')
>     foo += "test %d\n" % t
>
> depends on what type foo has, and the OP didn't say.
I assumed that the code snippet was from his actual application, in which case he *did* want it to work on strings, in which case, even if he had the feature he wanted, it wouldn't have helped him.
I think the fact that this would only work when the value was mutable is a good reason to disallow it. Too big a source of surprises, otherwise. Being forced to find another way to update the value in this case is a feature, because the absence of such a way when the value is immutable makes it clear that there's no way to do what you're trying to do! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Apr 3 03:09:25 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 02 Apr 2003 22:09:25 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> [Greg Ewing] > Hmmm, okay, I can see it now, but it would be easy to > miss if I weren't looking for it. > > Perhaps it could be made a little larger and set off from > the items above and below it? We have no control over either -- SF lets us put words there, but that's all. I added another paragraph: Please log into SourceForge to submit a new report. SourceForge will not allow you to submit a new bug report unless you're logged in. It's not as invisible now. > Ideally, of course, the Submit New button should always > be there, and lead to a page telling you to log in > if you're not already. But presumably you don't have > that much control over the page? That's right. From fdrake@acm.org Thu Apr 3 03:57:38 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 2 Apr 2003 22:57:38 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) 
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> Message-ID: <16011.45362.723995.488848@grendel.zope.com> Tim Peters writes:
> We have no control over either -- SF lets us put words there, but that's
> all. I added another paragraph:
We can do a little more; see the Expat tracker's "Submit New" page for an example that enhances the presentation a bit: http://sourceforge.net/tracker/?func=add&group_id=10127&atid=110127 One catch, of course, is that the extra blurb is always shown, even for people that are already logged in (I suspect the majority of use is by the development team); the farther down the page we push the actual bug information, the harder it is for developers to use. We need to think about the tradeoff; it is important to encourage good reports from people interested in providing them and willing to do so. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From martin@v.loewis.de Thu Apr 3 04:33:31 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 03 Apr 2003 06:33:31 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <16011.45362.723995.488848@grendel.zope.com> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> Message-ID: <m37kac19pg.fsf@mira.informatik.hu-berlin.de> "Fred L. Drake, Jr." <fdrake@acm.org> writes:
> One catch, of course, is that the extra blurb is always shown, even
> for people that are already logged in (I suspect the majority of use
> is by the development team); the farther down the page we push the
> actual bug information, the harder it is for developers to use.
I have now boldified parts of it; this doesn't take more space, but should increase visibility.
I hope it's not considered annoying - feel free to undo that. If they would allow us to put PHP into that box, we could even suppress the text if the user was logged in. Regards, Martin From boris.boutillier@arteris.net Thu Apr 3 06:09:11 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 03 Apr 2003 08:09:11 +0200 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> References: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Message-ID: <1049350152.23533.20.camel@elevedelix> There is a way to do it, even with immutable objects, it is a little bit heavier:

>>> x = {}
>>> x.setdefault(42,'buckle')
'buckle'
>>> x[42] += '3'
>>> x
{42: 'buckle3'}

Boris Boutillier, - ARTERIS - Artwork Interconnecting System 6, Parc Ariane 78284 Guyancourt (FRANCE) On Thu, 2003-04-03 at 05:04, Greg Ewing wrote:
> Andrew Koenig <ark@research.att.com>:
>
> > So the behavior of
> >
> >     foo = d.setdefault(r,'')
> >     foo += "test %d\n" % t
> >
> > depends on what type foo has, and the OP didn't say.
>
> I assumed that the code snippet was from his actual application, in
> which case he *did* want it to work on strings, in which case, even if
> he had the feature he wanted, it wouldn't have helped him.
>
> I think the fact that this would only work when the value was mutable
> is a good reason to disallow it. Too big a source of surprises,
> otherwise.
>
> Being forced to find another way to update the value in this case is a
> feature, because the absence of such a way when the value is immutable
> makes it clear that there's no way to do what you're trying to do!
>
> Greg Ewing, Computer Science Dept, +--------------------------------------+
> University of Canterbury, | A citizen of NewZealandCorp, a |
> Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| > greg@cosc.canterbury.ac.nz +--------------------------------------+ > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From walter@livinglogic.de Thu Apr 3 08:53:17 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Thu, 03 Apr 2003 10:53:17 +0200 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> References: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Message-ID: <3E8BF67D.4060807@livinglogic.de> Greg Ewing wrote:
> Andrew Koenig <ark@research.att.com>:
>
>>So the behavior of
>>
>> foo = d.setdefault(r,'')
>> foo += "test %d\n" % t
>>
>>depends on what type foo has, and the OP didn't say.
>
> I assumed that the code snippet was from his actual application, in
> which case he *did* want it to work on strings, in which case, even if
> he had the feature he wanted, it wouldn't have helped him.
> [...]
> Being forced to find another way to update the value in this case is a
> feature, because the absence of such a way when the value is immutable
> makes it clear that there's no way to do what you're trying to do!
Mutable (or at least appendable) strings should probably be done with StringIO/cStringIO. How about adding support for __iadd__ and __str__ (and __unicode__) to both?
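The StringIO idea can be sketched as follows. This is an illustration, not code from the thread: it uses io.StringIO (the modern successor to the StringIO/cStringIO modules mentioned above) and its existing write() method in place of the proposed __iadd__ support, applied to the loop that started the thread.

```python
import io

# Store an appendable buffer in the dict; write() mutates the stored
# buffer in place, so no rebinding trick is needed at all.
log = {}
for t in range(5):
    for r in range(10):
        buf = log.setdefault(r, io.StringIO())
        buf.write("test %d\n" % t)

# Each slot has accumulated one line per outer iteration.
assert log[0].getvalue() == "test 0\ntest 1\ntest 2\ntest 3\ntest 4\n"
```

Calling str() on the buffers does not yield the text, which is presumably why __str__ support was being suggested; getvalue() is the existing spelling.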
Bye, Walter Dörwald From ben@algroup.co.uk Thu Apr 3 10:43:10 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 11:43:10 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <E1903R1-0005sc-00@localhost> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> Message-ID: <3E8C103E.90201@algroup.co.uk> Zooko wrote:
> In the capability way of life, it is still the case that access to the ZipFile
> class gives you the ability to open files anywhere in the system! (That is: I'm
> assuming for now that we implement capabilities without re-writing every
> dangerous class in the Library.) In this scheme, there are no flags, and when
> you run code that you think might misuse this feature, you simply don't give
> that code a reference to the ZipFile class. (Also, we have to arrange that it
> can't acquire a reference by "import zipfile".)
It would probably be helpful to explain what you (or, at least, I) would do if you (I) were writing from scratch, rather than "taming" the existing libraries. In this case, Zipfile would require a file capability to be passed to it at construction time, and so would become non-dangerous, which is, I think, where Guido is coming from. The risk only occurs because we want to not rewrite the whole library, just to wrap it, and it's important to understand that this isn't really the "proper" way to do it (though, of course, the ZipFile class is not unlike any of the other non-capability things we'd have to wrap anyway, given a non-capability OS underneath, it just happens to be one that _can_ be rewritten if we want to rewrite it). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit."
- Robert Woodruff From ben@algroup.co.uk Thu Apr 3 10:52:08 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 11:52:08 +0100 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> References: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> Message-ID: <3E8C1258.3070906@algroup.co.uk> Ken Manheimer wrote:
> On Tue, 1 Apr 2003, Ka-Ping Yee wrote:
> One big one seems to be: "What needs to be done to enable effective
> ("safe"?) use of python object (references) as capabilities?" I've
> seen answers to this roll by several times - i think we need to settle
> them, and collect the conclusions in a PEP. And we need to identify
> what other questions there are.
I am in the process of writing a PEP, and it is being informed by this discussion. Unfortunately, I have several day jobs and it's going somewhat slowly. I've also been bogged down somewhat in a theoretical discussion with a bunch of capability experts over globals and how they should work. However, we do appear to have reached closure on that issue: globals have to be at least transitively immutable - unfortunately, I have demonstrated that this requirement is not sufficient to make them safe, but it is (we believe) necessary. So, now I've sorted that one out I can complete my first pass on the PEP, which I expect to do in the next few days. At that point, I'm slightly unsure how best to proceed. The most obvious way is, of course, to follow the standard PEP procedure, but are there people who would like to comment before I submit the first draft? It is still going to be full of unanswered questions, but I do think we are near to the stage where we can start nailing down the answers. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit."
- Robert Woodruff From mcherm@mcherm.com Thu Apr 3 13:09:31 2003 From: mcherm@mcherm.com (Michael Chermside) Date: Thu, 3 Apr 2003 05:09:31 -0800 Subject: [Python-Dev] Re: Capabilities (we already got one) Message-ID: <1049375371.3e8c328be581d@mcherm.com> > The objection to doing it the other way round is that for capability > languages to be truly usable the capability functionality needs to be > automatic, not something that is painfully added to each class or object > (at least, that is the claim we capability mavens are making). Just how strong a claim are you making here? It seems to me that the need for security (via capabilities or any other mechanism) is an UNUSUAL need. Most programs don't need it at all, others need it in only a few places. Now don't get me wrong... when you DO need it, you really need it, and just throwing something together without explicit language support is somewhere between impossible and terrifically-difficult-and-error-prone. So supporting secure execution (via capabilities or whatever) in the language is a great idea. And I like the capabilities-as-references approach... it's simple, elegant, and not error prone. But if you're going so far as to imply that capability functionality needs to be present ALWAYS, and supported (and considered) in every class or object, then that's going too far. A random module should, for instance, be able to open arbitrary files in the file system without being passed any special objects, UNLESS we do something special when we load it to indicate that we want it to run in a restricted mode. I think that zipfile is a good example here. As a library developer, I should be able to write and distribute a zipfile module without thinking about capabilities or security at all. 
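The design contrast under discussion (library code that opens paths itself versus code that only uses the objects it is handed) can be sketched with two toy classes. `PathArchive` and `StreamArchive` are hypothetical names invented for this illustration, not the real zipfile API.

```python
import io

class PathArchive:
    """Ambient-authority style: merely importing this class grants the
    power to open any path on the filesystem."""
    def __init__(self, path):
        self.fp = open(path, 'rb')

class StreamArchive:
    """Capability style: the caller must already hold an open file-like
    object; the class itself can reach nothing else."""
    def __init__(self, fp):
        self.fp = fp

# Untrusted code given only StreamArchive can read just what it is handed.
arc = StreamArchive(io.BytesIO(b'PK\x03\x04'))
assert arc.fp.read(2) == b'PK'
```

Wrapping after the fact means turning every PathArchive-shaped library class into a StreamArchive-shaped one from the outside, which is the "painful" part being debated here.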
Of course, when others go to use it in a secure or restricted mode, they may find that it isn't as useful as they'd like, but (I believe) we shouldn't say NO ONE can have a zipfile module unless the module author is willing to address security issues. Someone can write securezipfile when they get the itch. Now, if we really built security (via capabilities) into the language from the ground up, then ALL modules would work by being passed appropriate capability objects, and only the starting script would possess all capabilities. There would be no "file" builtin, just file objects (and ReadOnlyFile objects, and DirectorySubTree objects, and so forth) which got passed around. So OF COURSE the original author of zipfile would write it to accept a file at construction rather than allowing it to open files... that would be the natural way to do things. But that language isn't python... and I don't think it's worth changing Python enough to get there. So if you're proposing this drastic a change (which I doubt), then I think it's too drastic. But if you're NOT, then you have to realize that there will be lots of library modules like zipfile, which were written by people who didn't give any thought to security (since it's a rarely-used feature of the language). So we need workarounds (like wrappers or proxies) that can be applied after-the-fact to modules and classes that weren't written with security in mind. If that's "painfully adding something to each class or object", then I don't see how it's to be avoided. -- Michael Chermside From zooko@zooko.com Thu Apr 3 13:29:57 2003 From: zooko@zooko.com (Zooko) Date: Thu, 03 Apr 2003 08:29:57 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Message from Ben Laurie <ben@algroup.co.uk> of "Thu, 03 Apr 2003 11:43:10 +0100." 
<3E8C103E.90201@algroup.co.uk> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <3E8C103E.90201@algroup.co.uk> Message-ID: <E1914mz-0005SN-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Ben Laurie wrote: > > > In the capability way of life, it is still the case that access to the ZipFile > > class gives you the ability to open files anywhere in the system! (That is: I'm > > assuming for now that we implement capabilities without re-writing every > > dangerous class in the Library.) ... > It would probably be helpful to explain what you (or, at least, I) would > do if you (I) were writing from scratch, rather then "taming" the > existing libraries. In this case, Zipfile would require a file > capability to be passed to it at construction time, and so would become > non-dangerous, which is, I think, where Guido is coming from. Thank you. You are right about how I would do it, and I think you are right that this fits with Guido's approach, too. I would make the constructor of the ZipFile class take a file object, and hide (at least from unprivileged code) the option of passing a filename to the constructor. This would make it so that no authority is gained by importing the zipfile module. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From ben@algroup.co.uk Thu Apr 3 14:04:27 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 15:04:27 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <3E88E2B6.1080409@prescod.net> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> <3E88E2B6.1080409@prescod.net> Message-ID: <3E8C3F6B.8000000@algroup.co.uk> Paul Prescod wrote: > Are DOS issues in scope? How do we prevent untrusted code from just > bringing the interpreter to a halt? 
A smart enough attacker could even > block all threads in the current process by finding a task that is > usually not time-sliced and making it go on for a very long time. > without looking at the Python implementation, I can't remember an > example off of the top of my head, but perhaps a large multiplication or > search-and-replace in a string. It seems to me that this is an issue orthogonal to capabilities (though access to mechanisms that regulate it might well be capability-based). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From ben@algroup.co.uk Thu Apr 3 14:05:45 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 15:05:45 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <3E8C3FB9.50101@algroup.co.uk> Ka-Ping Yee wrote: > Hmm, i'm not sure you understood what i meant. The code example i posted > is a solution to the design challenge: "provide read-only access to a > directory and its subdirectories, but no access to the rest of the filesystem". > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding of > what Python would need to do to make secure programming easier. Well, one of the favourites is to create a file selection dialog that will only give access (optionally readonly) to the file designated by the user. This may be rather more than you want to bite off as a working system at this stage, though! It might be a useful thought experiment, though. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From fdrake@acm.org Thu Apr 3 14:40:21 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 3 Apr 2003 09:40:21 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m37kac19pg.fsf@mira.informatik.hu-berlin.de> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> <m37kac19pg.fsf@mira.informatik.hu-berlin.de> Message-ID: <16012.18389.659720.951267@grendel.zope.com> Martin v. Löwis writes: > I have now boldified parts of it; this doesn't take much space, but > should increase visibility. I hope it's not considered annoying - feel > free to undo that. Nice! I've made the boldified text a hyperlink to the login page, and copied the text to the patch tracker as well. > If they would allow us to put PHP into that box, we could even > suppress the text if the user was logged in. Hmm. I don't know that they won't, I just don't know the incantation to determine if a user is logged on. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From drifty@alum.berkeley.edu Thu Apr 3 19:05:56 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 3 Apr 2003 11:05:56 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <E190r6p-0002Yx-00@localhost> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> <E190r6p-0002Yx-00@localhost> Message-ID: <Pine.SOL.4.53.0304031105190.11078@death.OCF.Berkeley.EDU> [Zooko] > But seriously, I feel that capabilities fit with normal Python programming as it > is currently practiced. > The paragraph is gone, so no need to worry about this anymore. 
-Brett From altis@semi-retired.com Thu Apr 3 19:42:09 2003 From: altis@semi-retired.com (Kevin Altis) Date: Thu, 3 Apr 2003 11:42:09 -0800 Subject: [Python-Dev] fwd: Dan Sugalski on continuations and closures Message-ID: <KJEOLDOPMIDKCMJDCNDPAEHLDDAA.altis@semi-retired.com> via Simon Willison's blog: http://simon.incutio.com/archive/2003/04/03/#closuresAndContinuations " Thanks to Dan Sugalski (designer of Parrot, the next generation Perl VM) I finally understand what continuations and closures actually are. He explains them as part of a comparison between the forthcoming Parrot and two popular virtual machines already in existence: * (Perl|python|Ruby) on (.NET|JVM) leads in to the explanation. http://www.sidhe.org/~dan/blog/archives/000151.html * The reason for Parrot, part 2 explains closures. http://www.sidhe.org/~dan/blog/archives/000152.html * Continuations and VMs explains continuations. http://www.sidhe.org/~dan/blog/archives/000156.html * Continuations and VMs, part 2 rounds things off by explaining why the JVM and the CLR are unsuitable environments for supporting these language features. http://www.sidhe.org/~dan/blog/archives/000157.html " ka ps. In order to focus on Python promotion and site-redesign efforts I've suspended delivery of python-dev email in the short-term and will only be scanning the archives as time permits. If you need to flame me, please address your emails to me directly or /dev/null, your choice ;-) From martin@v.loewis.de Thu Apr 3 22:36:49 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 04 Apr 2003 00:36:49 +0200 Subject: [Python-Dev] Re: How do I report a bug? 
In-Reply-To: <16012.18389.659720.951267@grendel.zope.com> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> <m37kac19pg.fsf@mira.informatik.hu-berlin.de> <16012.18389.659720.951267@grendel.zope.com> Message-ID: <m365pv8aym.fsf_-_@mira.informatik.hu-berlin.de> "Fred L. Drake, Jr." <fdrake@acm.org> writes: > > If they would allow us to put PHP into that box, we could even > > suppress the text if the user was logged in. > > Hmm. I don't know that they won't, I just don't know the incantation > to determine if a user is logged on. If it's still the same code as in SF 2.5, it is "user_isloggedin()": http://phpxref.sourceforge.net/sourceforge/include/User.class.source.html#l555 As an example usage, see http://phpxref.sourceforge.net/sourceforge/patch/add_patch.php.source.html#l49 Regards, Martin From tim.one@comcast.net Fri Apr 4 04:08:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 03 Apr 2003 23:08:54 -0500 Subject: [Python-Dev] Boom Message-ID: <LNBBLJKPBEHFEDALKOLCEEIAECAB.tim.one@comcast.net> While enduring dental implant surgery earlier today, I thought to myself "oops -- I bet this program will crash Python". Turns out it does, in current CVS, and almost certainly in every version of Python since cyclic gc was added: """ import gc class C: def __getattr__(self, attr): del self.attr raise AttributeError a = C() b = C() a.attr = b b.attr = a del a, b gc.collect() """ Short course: a and b are in a trash cycle. gcmodule's move_finalizers() finds one of them and calls has_finalizer() to see whether it's collectible. Say it's b. has_finalizer() calls (in effect) hasattr(b, "__del__"), and b.__getattr__() deletes b.attr as a side effect before saying b.__del__ doesn't exist. That drops the refcount on a to 0, which in turn drops the refcount on a.__dict__ to 0. 
Those two are the killers: a and a.__dict__ become untracked (by gc) as part of cleaning them up, but the move_finalizers() "next" local still points to one of them (to the __dict__, in the run I happened to step thru). As a result, the next trip around the move_finalizers() loop calls has_finalizer() on memory that's already been free()ed. Hilarity ensues. The anesthesia is wearing off and I won't speculate about solutions now. I suspect it's easy, or close to intractable. PLabs folks, I'm unsure whether this relates to the ZODB test failure we've been bashing away at. All, ZODB is a persistent database, and at one point in this test gc determines that "a ghost" is unreachable. When gc's has_finalizer() asks whether the ghost has a __del__ method, the persistence machinery kicks in, sucking the ghost's state off of disk, and executing a lot of Python code as a result. Part of the Python code executed does appear (if hazy memory serves) to delete some previously unreachable objects that were also in (or hanging off of) the ghost's cycle, and so in the unreachable list gc's move_finalizers() is crawling over. The kind of blowup above could be one bad effect, and Jeremy was seeing blowups with move_finalizers() in the traceback. Unfortunately, the test doesn't blow up under CVS Python, and 2.2.2 doesn't have the telltale 0xdbdbdbdb filler 2.3's debug PyMalloc sprays into free()ed memory. From tim.one@comcast.net Fri Apr 4 04:37:47 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 03 Apr 2003 23:37:47 -0500 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <E191Dj8-00070O-00@sc8-pr-cvs1.sourceforge.net> Message-ID: <LNBBLJKPBEHFEDALKOLCOEIBECAB.tim.one@comcast.net> [jhylton@users.sourceforge.net] > Modified Files: > Tag: release22-maint > gcmodule.c > Log Message: > Fix memory corruption in garbage collection. > ... 
> The problem with the previous revision is that it followed > gc->gc.gc_next before calling has_finalizer(). If has_finalizer() > happened to deallocate the object FROM_GC(gc->gc.gc_next), then > the next time through the loop gc would point to freed memory. The > fix is to always follow the next pointer after calling > has_finalizer(). Oops! I didn't see this before posting my "Boom" msg. > Note that Python 2.3 does not have this problem, because > has_finalizer() checks the tp_del slot and never runs Python code. That part isn't so, alas: the program I posted in the "Boom" msg crashes 2.3, via the same mechanism: return PyInstance_Check(op) ? PyObject_HasAttr(op, delstr) : PyType_HasFeature(op->ob_type, Py_TPFLAGS_HEAPTYPE) ? op->ob_type->tp_del != NULL : 0; It's the PyInstance_Check(op) path there that's still vulnerable. I'll poke at that. > Tim, Barry, and I peed away the better part of two days tracking this > down. > ! next = gc->gc.gc_next; > if (has_finalizer(op)) { > gc_list_remove(gc); > gc_list_append(gc, finalizers); > gc->gc.gc_refs = GC_MOVED; > } > } > } > --- 277,290 ---- > for (; gc != unreachable; gc=next) { > PyObject *op = FROM_GC(gc); > ! /* has_finalizer() may result in arbitrary Python > ! code being run. */ > if (has_finalizer(op)) { > + next = gc->gc.gc_next; > gc_list_remove(gc); > gc_list_append(gc, finalizers); > gc->gc.gc_refs = GC_MOVED; > } > + else > + next = gc->gc.gc_next; > } > } Are we certain that has_finalizer() can't unlink gc itself from the unreachable list? If it can, then > + else > + next = gc->gc.gc_next; will set next to the content of free()ed memory. In fact, I believe the Boom program will suffer this fate ... yup, it does. "The problem" isn't yet really fixed in any version of Python, although I agree it's a lot better with the change above. 
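Tim's worry here, a cached "next" pointer outliving the node it points at, can be dodged by never holding a pointer across the dangerous call at all. A rough Python analogy (editor's sketch, not the actual C in gcmodule.c; `partition` and `nasty_check` are invented names):

```python
# Sketch: never cache a "next" reference across a call that can run
# arbitrary code. Always take the current head of the work list, so the
# loop makes progress no matter which nodes the callback unlinks.
def partition(unreachable, has_finalizer, collectable, finalizers):
    while unreachable:
        obj = unreachable[0]            # always restart from the head
        found = has_finalizer(obj)      # may mutate `unreachable`!
        if obj in unreachable:          # callback may have unlinked obj
            unreachable.remove(obj)
            (finalizers if found else collectable).append(obj)

work, col, fin = ["a", "b", "c"], [], []

def nasty_check(obj):
    # Simulates has_finalizer() running arbitrary Python code:
    # asking about "a" unlinks its neighbor "b" from the list.
    if obj == "a":
        work.remove("b")
    return obj == "c"

partition(work, nasty_check, col, fin)
# work is now empty; "a" went to col, "c" to fin, "b" simply vanished.
```

The membership check before the remove() covers the nastiest case of all, the callback unlinking the very object being asked about.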
From ben@algroup.co.uk Fri Apr 4 10:41:43 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Fri, 04 Apr 2003 11:41:43 +0100 Subject: [Python-Dev] Re: Capabilities (we already got one) In-Reply-To: <1049375371.3e8c328be581d@mcherm.com> References: <1049375371.3e8c328be581d@mcherm.com> Message-ID: <3E8D6167.4020804@algroup.co.uk> Michael Chermside wrote: >>The objection to doing it the other way round is that for capability >>languages to be truly usable the capability functionality needs to be >>automatic, not something that is painfully added to each class or object >>(at least, that is the claim we capability mavens are making). > > > Just how strong a claim are you making here? > > It seems to me that the need for security (via capabilities or any other > mechanism) is an UNUSUAL need. Most programs don't need it at all, > others need it in only a few places. Now don't get me wrong... when you > DO need it, you really need it, and just throwing something together > without explicit language support is somewhere between impossible and > terrifically-difficult-and-error-prone. So supporting secure execution > (via capabilities or whatever) in the language is a great idea. And I > like the capabilities-as-references approach... it's simple, elegant, > and not error prone. > > But if you're going so far as to imply that capability functionality > needs to be present ALWAYS, and supported (and considered) in every class > or object, then that's going too far. A random module should, for > instance, be able to open arbitrary files in the file system without > being passed any special objects, UNLESS we do something special when we > load it to indicate that we want it to run in a restricted mode. > > I think that zipfile is a good example here. As a library developer, I > should be able to write and distribute a zipfile module without thinking > about capabilities or security at all. 
Of course, when others go to use > it in a secure or restricted mode, they may find that it isn't as useful > as they'd like, but (I believe) we shouldn't say NO ONE can have a > zipfile module unless the module author is willing to address security > issues. Someone can write securezipfile when they get the itch. > > Now, if we really built security (via capabilities) into the language > from the ground up, then ALL modules would work by being passed > appropriate capability objects, and only the starting script would > possess all capabilities. There would be no "file" builtin, just file > objects (and ReadOnlyFile objects, and DirectorySubTree objects, and > so forth) which got passed around. So OF COURSE the original author > of zipfile would write it to accept a file at construction rather than > allowing it to open files... that would be the natural way to do things. > But that language isn't python... and I don't think it's worth changing > Python enough to get there. > > So if you're proposing this drastic a change (which I doubt), then I > think it's too drastic. But if you're NOT, then you have to realize > that there will be lots of library modules like zipfile, which were > written by people who didn't give any thought to security (since it's > a rarely-used feature of the language). So we need workarounds (like > wrappers or proxies) that can be applied after-the-fact to modules and > classes that weren't written with security in mind. If that's > "painfully adding something to each class or object", then I don't see > how it's to be avoided. I am completely in agreement. Taming of existing modules is inevitably going to be somewhat painful - and, in some cases, it may be less painful to simply rewrite them. As you suspect, what I am proposing is that _when_ a programmer wishes to use capabilities as a security mechanism, it is desirable to make that as easy to use as possible. 
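The "taming" under discussion can at least be approximated for zipfile without any language changes, because ZipFile already accepts an open file-like object. A rough sketch (editor's illustration; `zip_reader` is an invented name):

```python
# Capability-style use of zipfile: the reader is handed an already-open
# file object (the capability) instead of a filename, so importing this
# code confers no authority to open paths on the filesystem.
import io
import zipfile

def zip_reader(fileobj):
    """Build a read-only zip view over a file capability."""
    return zipfile.ZipFile(fileobj, mode="r")

# Only the holder of the file object decides what may be read:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
buf.seek(0)
data = zip_reader(buf).read("hello.txt")
```

Hiding the filename-accepting path from unprivileged code, as Zooko suggests earlier in the thread, is the part that still needs support from the language or the loader.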
I'm not sure I agree that the need for security is particularly unusual but I don't think it's worth having a big argument about. I certainly do agree that crippling Python in order to get capabilities is not a desirable outcome. Not that I have that option anyway :-) Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From ping@zesty.ca Fri Apr 4 12:28:18 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Fri, 4 Apr 2003 06:28:18 -0600 (CST) Subject: [Python-Dev] Re: Capabilities (we already got one) In-Reply-To: <3E8D6167.4020804@algroup.co.uk> Message-ID: <Pine.LNX.4.33.0304040616370.1082-100000@server1.lfw.org> Michael Chermside wrote: > It seems to me that the need for security (via capabilities or any other > mechanism) is an UNUSUAL need. Most programs don't need it at all, > others need it in only a few places. I think you are missing the point somewhat. Security is about making sure your program will do what you expect. So it is just as much about avoiding bugs as about thwarting malicious agents. Programming in a capability style makes programs more reliable and bugs less damaging. Colleagues of mine have established the habit of programming in a capability style in Java -- not because Java supports capabilities, and not because they need security at all, but just because programming *as if* the language had capabilities leads to a better modular design. On Fri, 4 Apr 2003, Ben Laurie wrote: > I'm not sure I agree that the need for security is particularly unusual > but I don't think it's worth having a big argument about. I certainly do > agree that crippling Python in order to get capabilities is not a > desirable outcome. Not that I have that option anyway :-) I also prefer to avoid loaded language. No one is talking about "crippling" anything. 
The essence of a capability model is simply to be explicit when authority is transferred. Explicit is better than implicit. -- ?!ng From jeremy@zope.com Fri Apr 4 16:46:32 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 11:46:32 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <list-1424542@digicool.com> References: <list-1424542@digicool.com> Message-ID: <1049474792.14151.85.camel@slothrop.zope.com> On Thu, 2003-04-03 at 23:37, Tim Peters wrote: > > ! next = gc->gc.gc_next; > > if (has_finalizer(op)) { > > gc_list_remove(gc); > > gc_list_append(gc, finalizers); > > gc->gc.gc_refs = GC_MOVED; > > } > > } > > } > > --- 277,290 ---- > > for (; gc != unreachable; gc=next) { > > PyObject *op = FROM_GC(gc); > > ! /* has_finalizer() may result in arbitrary Python > > ! code being run. */ > > if (has_finalizer(op)) { > > + next = gc->gc.gc_next; > > gc_list_remove(gc); > > gc_list_append(gc, finalizers); > > gc->gc.gc_refs = GC_MOVED; > > } > > + else > > + next = gc->gc.gc_next; > > } > > } > > Are we certain that has_finalizer() can't unlink gc itself from the > unreachable list? If it can, then > > > + else > > + next = gc->gc.gc_next; > > will set next to the content of free()ed memory. In fact, I believe the > Boom program will suffer this fate ... yup, it does. "The problem" isn't > yet really fixed in any version of Python, although I agree it's a lot > better with the change above. It looks like it's hard to find a place to stand. Since arbitrary Python code can run, then an arbitrary set of objects in the unreachable list can suddenly become unlinked. The previous, current, and next objects are all suspect. I think a safe approach would be to move everything out of unreachable and into either "collectable" or "finalizers". That way, we can do a while (!gc_list_is_empty(unreachable)) loop and always deal with the head of the unreachable list. 
Each time through the loop, the head of the list can be moved to collectable or finalizers or become unlinked, so we always make progress. Sound plausible? Jeremy From jeremy@zope.com Fri Apr 4 17:39:16 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 12:39:16 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049474792.14151.85.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> Message-ID: <1049477956.14152.93.camel@slothrop.zope.com> On Fri, 2003-04-04 at 11:46, Jeremy Hylton wrote: > I think a safe approach would be to move everything out of unreachable > and into either "collectable" or "finalizers". That way, we can do a > while (!gc_list_is_empty(unreachable)) loop and always deal with the > head of the unreachable list. Each time through the loop, the head of > the list can be moved to collectable or finalizers or become unlinked, > so we always make progress. > > Sound plausible? Yes. I've got a patch that fixes the boom case, but I'm not sure I've handled the case where the object becomes reachable as a result of running PyObject_HasAttr(). I'll post after testing that. Jeremy From jeremy@zope.com Fri Apr 4 18:26:11 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 13:26:11 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049477956.14152.93.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> <1049477956.14152.93.camel@slothrop.zope.com> Message-ID: <1049480770.14146.95.camel@slothrop.zope.com> On Fri, 2003-04-04 at 12:39, Jeremy Hylton wrote: > On Fri, 2003-04-04 at 11:46, Jeremy Hylton wrote: > > I think a safe approach would be to move everything out of unreachable > > and into either "collectable" or "finalizers". 
That way, we can do a > > while (!gc_list_is_empty(unreachable)) loop and always deal with the > > head of the unreachable list. Each time through the loop, the head of > > the list can be moved to collectable or finalizers or become unlinked, > > so we always make progress. > > > > Sound plausible? > > Yes. I've got a patch that fixes the boom case, but I'm not sure I've > handled the case where the object becomes reachable as a result of > running PyObject_HasAttr(). I'll post after testing that. It's SF patch 715446. There's a lingering problem with test_gc, but I hope it's tractable. Jeremy From jeremy@zope.com Fri Apr 4 20:15:51 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 15:15:51 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049480770.14146.95.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> <1049477956.14152.93.camel@slothrop.zope.com> <1049480770.14146.95.camel@slothrop.zope.com> Message-ID: <1049487350.14146.101.camel@slothrop.zope.com> We've got the first version of boom nailed, but we've got the same problem in handle_finalizers(). The version of boom below doesn't blow up until the second time the has_finalizer() is called. I don't understand the logic in handle_finalizers(), though. If the objects are all in the finalizers list, why do we call has_finalizer() a second time? Shouldn't everything have a finalizer at that point? 
Jeremy import gc class C: def __init__(self): self.x = 0 def delete(self): print "never called" def __getattr__(self, attr): self.x += 1 print self.x if self.x > 1: del self.attr else: return self.delete raise AttributeError a = C() b = C() a.attr = b b.attr = a del a, b print gc.collect() From tim_one@email.msn.com Sat Apr 5 08:15:40 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 5 Apr 2003 03:15:40 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCIENBEEAB.tim_one@email.msn.com> [Jeremy Hylton] > We've got the first version of boom nailed, but we've got the same > problem in handle_finalizers(). The version of boom below doesn't blow > up until the second time the has_finalizer() is called. > > I don't understand the logic in handle_finalizers(), though. If the > objects are all in the finalizers list, why do we call has_finalizer() a > second time? Shouldn't everything has a finalizer at that point? Nope -- the parenthetical /* Handle uncollectable garbage (cycles with finalizers). */ comment is incomplete. The earlier call to move_finalizer_reachable() also put everything reachable only *from* trash cycles with finalizers into the list. So, e.g., if the trash graph is like A<->B->C and A has a finalizer but B and C don't, they're all in the finalizers list (at this point) regardless. But B and C aren't stopping the blob from getting collected, and we're trying to do the user a favor by putting only A (the troublemaker) into gc.garbage. It's an approximation, though. For example, if A and C both had finalizers, A and C would both be put into gc.garbage, despite that C's finalizer isn't stopping anything from getting collected. 
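The A <-> B -> C shape is easy to reproduce (editor's sketch; note the behavior described above is 2.3-era, and CPython 3.4+ simply collects such cycles, __del__ and all):

```python
# A carries a finalizer; B and C do not. In the 2.3-era collector only
# A (the troublemaker) was parked in gc.garbage, while B and C merely
# rode along. Modern CPython collects the whole blob.
import gc

class WithDel:
    def __del__(self):
        pass

class Plain:
    pass

a, b, c = WithDel(), Plain(), Plain()
a.peer = b       # A -> B
b.peer = a       # B -> A (the cycle)
b.child = c      # C hangs off the cycle
del a, b, c
found = gc.collect()   # unreachable objects seen this pass
```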
The comments are apparently a bit out of synch with the code, because 17 months ago all instance objects in the finalizers list were put into gc.garbage (regardless of whether they had __del__). The checkin comment for rev 2.28 sez the __del__ change was needed to fix a bug; but I'm too groggy to dig more now. From tim_one@email.msn.com Sat Apr 5 19:34:36 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 5 Apr 2003 14:34:36 -0500 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> I checked in some more changes (2.3 head only). This kind of program may be intractable: """ class C: def __getattr__(self, attribute): global alist if 'attr' in self.__dict__: alist.append(self.attr) del self.attr raise AttributeError import gc gc.collect() a = C() b = C() alist = [] a.attr = b b.attr = a a.x = 1 b.x = 2 del a, b # Oops. This prints 4: it's collecting # a, b, and their dicts. print gc.collect() # Despite that __getattr__ resurrected them. print alist # But gc cleared their dicts. print alist[0].__dict__ print alist[1].__dict__ # So a.x and b.x fail. print alist[0].x, alist[1].x """ While a __getattr__ side effect may resurrect an object in gc's unreachable list, gc has no way to know that an object has been resurrected short of starting over again. In the absence of that, the object remains in gc's unreachable list, and its tp_clear slot eventually gets called. The internal C stuff remains self-consistent, so this won't cause a segfault (etc), but it may (as above) be surprising. I don't see a sane way to fix this so long as asking whether __del__ exists can execute arbitrary mounds of Python code. 
From exarkun@intarweb.us Sat Apr 5 19:35:31 2003 From: exarkun@intarweb.us (Jp Calderone) Date: Sat, 5 Apr 2003 14:35:31 -0500 Subject: [Python-Dev] Placement of os.fdopen functionality Message-ID: <20030405193531.GA23455@meson.dyndns.org> It occurred to me this afternoon (after answering a question about creating file objects from file descriptors) that perhaps os.fdopen would be more logically placed someplace else - of course it could also remain as os.fdopen() for whatever deprecation period is warranted. Perhaps as a class method of the file type, file.fromfd()? Should I file a feature request for this on sf, or would it be considered too much of a mindless twiddle to bother with? Jp -- http://catandgirl.com/view.cgi?44 -- up 16 days, 16:00, 5 users, load average: 1.13, 0.93, 0.85 From martin@v.loewis.de Sat Apr 5 20:34:13 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 05 Apr 2003 22:34:13 +0200 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <20030405193531.GA23455@meson.dyndns.org> References: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <m37ka81y62.fsf@mira.informatik.hu-berlin.de> Jp Calderone <exarkun@intarweb.us> writes: > Perhaps as a class method of the file type, file.fromfd()? > > Should I file a feature request for this on sf, or would it be considered > too much of a mindless twiddle to bother with? Feel free to file a feature request, but I'd predict that it might sit there for some years until it is closed because of no action. 
OTOH, if you would produce a patch implementing the feature, it might get attention. Regards, Martin From tim.one@comcast.net Sun Apr 6 00:05:21 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 05 Apr 2003 19:05:21 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCGEKJECAB.tim.one@comcast.net> [Jeremy Hylton] > We've got the first version of boom nailed, but we've got the same > problem in handle_finalizers(). The version of boom below doesn't blow > up until the second time the has_finalizer() is called. It isn't really necessary to call has_finalizer() a second time, and I'll check in changes so that it doesn't anymore (assuming the test suite passes -- it's running as I type this). > I don't understand the logic in handle_finalizers(), though. If the > objects are all in the finalizers list, why do we call has_finalizer() a > second time? Shouldn't everything has a finalizer at that point? I tried to explain that last night. The essence of the changes I have pending is to make move_finalizer_reachable() move the tentatively unreachable objects reachable only from finalizers into a new & distinct list, reachable_from_finalizers. After that, everything in finalizers has a finalizer and nothing in reachable_from_finalizers does, so we don't have to call has_finalizer() again. Before, finalizers contained everything in both (finalizers and reachable_from_finalizers) lists, so another has_finalizer() call on each object was needed to distinguish the two kinds (has a finalizer, doesn't have a finalizer) of objects again. 
> import gc > > class C: > > def __init__(self): > self.x = 0 > > def delete(self): > print "never called" > > def __getattr__(self, attr): > self.x += 1 > print self.x > if self.x > 1: > del self.attr > else: > return self.delete > raise AttributeError > > a = C() > b = C() > a.attr = b > b.attr = a > > del a, b > print gc.collect() I also added a non-printing variant of this to test_gc. In the new world, the "del self.attr" bits never get called, so this is just a vanilla trash cycle now. From jeremy@alum.mit.edu Sun Apr 6 03:02:04 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 05 Apr 2003 21:02:04 -0500 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> Message-ID: <1049594522.24643.57.camel@localhost.localdomain> On Sat, 2003-04-05 at 14:34, Tim Peters wrote: > While a __getattr__ side effect may resurrect an object in gc's unreachable > list, gc has no way to know that an object has been resurrected short of > starting over again. In the absence of that, the object remains in gc's > unreachable list, and its tp_clear slot eventually gets called. The > internal C stuff remains self-consistent, so this won't cause a segfault > (etc), but it may (as above) be surprising. I don't see a sane way to fix > this so long as asking whether __del__ exists can execute arbitrary mounds > of Python code. I think I'll second the thought that there are no satisfactory answers here. We've made a big step forward by fixing the core dumps. If we want to document the current behavior, we would say that garbage collection may leave reachable objects in an "invalid state" in the presence of "problematic objects." A "problematic object" is an instance of a classic class that defines a getattr hook (__getattr__) but not a finalizer (__del__). 
An object in an "invalid state" has had its tp_clear slot executed; in the case of instances, this means the __dict__ will be empty. Specifically, if a problematic object is part of an unreachable cycle, the garbage collector will execute the code in its getattr hook; if executing that code makes any object in the cycle reachable again, it will be left in an invalid state. If we document this for 2.2, it's more complicated because instances of new-style classes are also affected. What's worse, a new-style class with a __getattribute__ hook is affected regardless of whether it has a finalizer. Here are a couple of thoughts about how to avoid leaving objects in an invalid state. It's pretty unlikely for it to happen, but speaking from experience <wink> it's baffling when it does. #1. (I think this was Fred's suggestion on Friday.) Don't do a hasattr() check on the object, do it on the class. This is what happens with new-style classes in Python 2.3: If a new-style class doesn't define an __del__ method, then its instances don't have a finalizer. It doesn't matter whether the specific instance has an __del__ attribute. Limitations: This is a change in semantics, although it only covers a nearly insane corner case. The other limitation is that things could still go wrong, although only in the presence of a classic metaclass! #2. If an object has a getattr hook and it's involved in a cycle, just put it in gc.garbage. Forget about checking for a finalizer. That seems fine for 2.3, since we're only talking about classic classes with getattr hooks. But it doesn't sound very pleasant for 2.2, since it covers any class instance with a getattr hook. I think #1 is pretty reasonable. I'd like to see something fixed for 2.2.3, but I worry that the semantic change may be unacceptable for a bug fix release. (But maybe not, the semantics are pretty insane right now :-). 
Jeremy From jim@zope.com Sun Apr 6 12:07:44 2003 From: jim@zope.com (Jim Fulton) Date: Sun, 06 Apr 2003 07:07:44 -0400 Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <list-1431082@digicool.com> References: <list-1431082@digicool.com> Message-ID: <3E900A80.3010802@zope.com> Tim Peters wrote: ... > While a __getattr__ side effect may resurrect an object in gc's unreachable > list, gc has no way to know that an object has been resurrected short of > starting over again. In the absence of that, the object remains in gc's > unreachable list, and its tp_clear slot eventually gets called. The > internal C stuff remains self-consistent, so this won't cause a segfault > (etc), but it may (as above) be surprising. I don't see a sane way to fix > this so long as asking whether __del__ exists can execute arbitrary mounds > of Python code. If I understand the problem, it can be avoided by avoiding old-style classes. Maybe it's time to, at least optionally, cause a warning when old-style classes are used. :) I'm not kidding for Zope. I think it might be worth-while to issue such a warning in Zope. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (703) 361-1714
http://www.python.org
Zope Corporation
http://www.zope.com http://www.zope.org

From skip@mojam.com Sun Apr 6 13:00:22 2003
From: skip@mojam.com (Skip Montanaro)
Date: Sun, 6 Apr 2003 07:00:22 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200304061200.h36C0MU07870@manatee.mojam.com>

Bug/Patch Summary
-----------------

384 open / 3510 total bugs (+7)
136 open / 2062 total patches (no change)

New Bugs
--------

test_zipimport failing on ia64 (at least) (2003-03-30)
    http://python.org/sf/712322
Cannot change the class of a list (2003-03-31)
    http://python.org/sf/712975
test_pty fails on HP-UX and AIX when run after test_openpty (2003-03-31)
    http://python.org/sf/713169
site.py breaks if prefix is empty (2003-04-01)
    http://python.org/sf/713601
Distutils documentation amputated (2003-04-01)
    http://python.org/sf/713722
cPickle fails to pickle inf (2003-04-03)
    http://python.org/sf/714733
bsddb.first()/next() raise undocumented exception (2003-04-03)
    http://python.org/sf/715063
pydoc support for keywords (2003-04-05)
    http://python.org/sf/715782
Minor nested scopes doc issues (2003-04-06)
    http://python.org/sf/716168

New Patches
-----------

Bug fix 548176: urlparse('http://foo?blah') errs (2003-03-30)
    http://python.org/sf/712317
sre fixes for lastindex and minimizing repeats+assertions (2003-03-31)
    http://python.org/sf/712900
Fixes for 'commands' module on win32 (2003-04-01)
    http://python.org/sf/713428
rfc822.parsedate returns a tuple (2003-04-01)
    http://python.org/sf/713599
freeze fails when extensions_win32.ini is missing (2003-04-01)
    http://python.org/sf/713645
iconv_codec NG (2003-04-02)
    http://python.org/sf/713820
Unicode Codecs for CJK Encodings (2003-04-02)
    http://python.org/sf/713824
Guard against segfaults in debug code (2003-04-02)
    http://python.org/sf/714348
timeouts for FTP connect (and other supported ops) (2003-04-03)
    http://python.org/sf/714592
Document freeze process in PC/config.c (2003-04-03)
    http://python.org/sf/714957

Closed Bugs
-----------

locale.getpreferredencoding fails on AIX (2003-01-31)
    http://python.org/sf/678259
configure option --enable-shared make problems (2003-03-11)
    http://python.org/sf/701823
-i -u options give SyntaxError on Windows (2003-03-21)
    http://python.org/sf/707576

Closed Patches
--------------

sgmllib support for additional tag forms (2002-04-17)
    http://python.org/sf/545300
posixfy some things (2002-12-08)
    http://python.org/sf/650412
Add missing constants for IRIX al module (2003-01-13)
    http://python.org/sf/667548
Py_Main() removal of exit() calls. Return value instead (2003-01-21)
    http://python.org/sf/672053
fix for bug 672614 :) (2003-02-28)
    http://python.org/sf/695250
Wrong prototype for PyUnicode_Splitlines on documentation (2003-03-11)
    http://python.org/sf/701395
more apply removals (2003-03-11)
    http://python.org/sf/701494
Fix a few broken links in pydoc (2003-03-19)
    http://python.org/sf/706338
Adds Mock Object support to unittest.TestCase (2003-03-19)
    http://python.org/sf/706590
Make "%c" % u"a" work (2003-03-26)
    http://python.org/sf/710127
Backport to 2.2.2 of codec registry fix (2003-03-27)
    http://python.org/sf/710576
Obsolete comment in urlparse.py (2003-03-30)
    http://python.org/sf/712124

From nas@python.ca Sun Apr 6 19:43:21 2003
From: nas@python.ca (Neil Schemenauer)
Date: Sun, 6 Apr 2003 11:43:21 -0700
Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <3E900A80.3010802@zope.com>
References: <list-1431082@digicool.com> <3E900A80.3010802@zope.com>
Message-ID: <20030406184320.GA14894@glacier.arctrix.com>

Jim Fulton wrote:
> Maybe it's time to, at least optionally, cause a warning when
> old-style classes are used. :) I'm not kidding. For Zope, I think it
> might be worthwhile to issue such a warning.

A command line option that enabled new-style classes by default may be a good idea (suggested to me by AMK at PyCon).
Neil

From barry@python.org Sun Apr 6 23:03:32 2003
From: barry@python.org (Barry Warsaw)
Date: 06 Apr 2003 18:03:32 -0400
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <1049594522.24643.57.camel@localhost.localdomain>
References: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> <1049594522.24643.57.camel@localhost.localdomain>
Message-ID: <1049666611.9026.3.camel@geddy>

On Sat, 2003-04-05 at 21:02, Jeremy Hylton wrote:
> #1. (I think this was Fred's suggestion on Friday.)  Don't do a
> hasattr() check on the object, do it on the class.  This is what happens
> with new-style classes in Python 2.3: If a new-style class doesn't
> define an __del__ method, then its instances don't have a finalizer.  It
> doesn't matter whether the specific instance has an __del__ attribute.

FWIW, IIRC Jython does something vaguely like this. Actually, the existence of __del__ is checked at class creation time, because it's expensive to call __del__ when the object is Java gc'd, and we use two different Java classes for classic class instances depending on whether it had a __del__ or not. This means you can't add __del__ to the class or the instance after the class is defined. Personally I think this is reasonable, and I don't recall this biting anyone when I was working on Jython.

-Barry

From tim_one@email.msn.com Mon Apr 7 01:47:53 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Sun, 6 Apr 2003 20:47:53 -0400
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modulesgcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <1049594522.24643.57.camel@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com>

[Jeremy Hylton]
> I think I'll second the thought that there are no satisfactory answers
> here.  We've made a big step forward by fixing the core dumps.
> If we want to document the current behavior, we would say that garbage
> collection may leave reachable objects in an "invalid state" in the
> presence of "problematic objects."  A "problematic object" is an
> instance of a classic class that defines a getattr hook (__getattr__)
> but not a finalizer (__del__).  An object in an "invalid state" has had
> its tp_clear slot executed; in the case of instances, this means the
> __dict__ will be empty.  Specifically, if a problematic object is part
> of an unreachable cycle, the garbage collector will execute the code in
> its getattr hook; if executing that code makes any object in the cycle
> reachable again, it will be left in an invalid state.

I expect that documenting it comprehensibly is impossible. For example, the referent of "it" in your last sentence is unclear, and hard to flesh out. A problematic object doesn't need to be part of a cycle to cause problems, and when it does cause problems the things that end up in an unexpected state needn't be part of cycles either. It's more that the problematic object needs to be reachable only from an unreachable cycle (the unreachable cycle needn't contain problematic objects), and then it's all the objects reachable only from the unreachable cycle and from the problematic object that may be in trouble (and regardless of whether they're in cycles).
Here's a concrete example, where the instance of the problematic D isn't in a cycle, and neither are the list or the dict that get magically cleared (.mylist and .mydict) despite being resurrected:

"""
class C: pass

class D:
    def __init__(self):
        self.mydict = {'a': 1, 'b': 2}
        self.mylist = range(100)

    def __getattr__(self, attribute):
        global alist
        if attribute == "__del__":
            alist.append(self.mydict)
            alist.append(self.mylist)
        raise AttributeError

import gc
gc.collect()
a = C()
a.loop = a            # make a cycle
a.d_instance = D()    # an instance of D hangs *off* the cycle
alist = []
del a
print gc.collect()    # 6: a, a.d_instance, their __dicts__, and D()'s
                      #    mydict and mylist
print alist           # [{}, []]
"""

If we had enough words to explain that, it still wouldn't be enough, because the effect of calling tp_clear isn't defined by the language for any type. If, for example, D also defined a .mytuple attr and resurrected it in __getattr__, the user would see that *that* one survived OK (tuples happen to have a NULL tp_clear slot).

> If we document this for 2.2, it's more complicated because instances of
> new-style classes are also affected.  What's worse, a new-style class
> with a __getattribute__ hook is affected regardless of whether it has a
> finalizer.

In 2.2 but not 2.3, right? I haven't tried anything with __getattribute__. For that matter, in my own Python programming, I've never even defined a __getattr__ method -- I spend most of my life tracking down bugs in things I don't use <wink>.

> Here are a couple of thoughts about how to avoid leaving objects in an
> invalid state.

I'd much rather pursue that than write docs nobody will understand.

> It's pretty unlikely for it to happen, but speaking from
> experience <wink> it's baffling when it does.
>
> #1. (I think this was Fred's suggestion on Friday.)  Don't do a
> hasattr() check on the object, do it on the class.
> This is what happens
> with new-style classes in Python 2.3: If a new-style class doesn't
> define an __del__ method, then its instances don't have a finalizer.  It
> doesn't matter whether the specific instance has an __del__ attribute.
>
> Limitations: This is a change in semantics, although it only covers a
> nearly insane corner case.  The other limitation is that things could
> still go wrong, although only in the presence of a classic metaclass!

I'm not sure I followed the last sentence. If I did, screw calling hasattr() -- do a string lookup for "__del__" in the classic class's __dict__, and that's it. Anything that ends up executing arbitrary Python code is going to leave holes.

> #2. If an object has a getattr hook and it's involved in a cycle, just
> put it in gc.garbage.  Forget about checking for a finalizer.  That
> seems fine for 2.3, since we're only talking about classic classes with
> getattr hooks.  But it doesn't sound very pleasant for 2.2, since it
> covers any class instance with a getattr hook.

I'd like to avoid expanding the definition of what ends up in gc.garbage. The relationship to __del__ and unreachable cycles is explainable now, modulo the __getattr__ insanity. Getting rid of the latter is a lot more attractive than folding it into the former.

> I think #1 is pretty reasonable.  I'd like to see something fixed for
> 2.2.3, but I worry that the semantic change may be unacceptable for a
> bug fix release.  (But maybe not, the semantics are pretty insane right
> now :-).

I have no problem with changing this for 2.2.3. I doubt any Python app will be affected, except possibly to rid 1 in 10,000 of a subtle bug. There's certainly no defensible app that relied on Python segfaulting here <wink>, and I can't imagine any relying on containers getting magically cleared at unpredictable times.

BTW, I'm still wondering why the ZODB thread test failed the way it did for Tres and Barry and me: you saw corrupt gc lists, but the rest of us never did.
We saw a Connection instance with a mysteriously cleared __dict__. That's consistent with the __getattr__-hook-resurrects-an-object-reachable-only-from-an-unreachable-cycle examples I posted, but did you guys figure out on Friday whether that's what was actually happening? The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting unreachable objects while gc was still crawling over them, and that's a different (albeit related) problem than __dicts__ getting cleared by magic.

From greg@cosc.canterbury.ac.nz Mon Apr 7 01:54:20 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 07 Apr 2003 12:54:20 +1200 (NZST)
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com>
Message-ID: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz>

> I don't see a sane way to fix this so long as asking whether __del__
> exists can execute arbitrary mounds of Python code.

This further confirms my opinion that __del__ methods are evil, and the language would be the better for their complete removal.

Failing that, perhaps they should be made a bit less dynamic, so that the GC can make reasonable assumptions about their existence without having to execute Python code.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Apr 7 01:56:35 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 07 Apr 2003 12:56:35 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <200304070056.h370uZc14935@oma.cosc.canterbury.ac.nz> Jp Calderone <exarkun@intarweb.us>: > perhaps os.fdopen would be more logically placed someplace else - > Perhaps as a class method of the file type, file.fromfd()? Not all OSes have the notion of a file descriptor, which is probably why it's in the os module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Apr 7 02:04:39 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 07 Apr 2003 13:04:39 +1200 (NZST) Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <3E900A80.3010802@zope.com> Message-ID: <200304070104.h3714df15005@oma.cosc.canterbury.ac.nz> > Maybe it's time to, at least optionally, cause a warning when > old-style classes are used. :) You might want to, er, make an exception for subclasses of Exception (you still don't get any choice there, right?) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From tim_one@email.msn.com Mon Apr 7 02:11:10 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Sun, 6 Apr 2003 21:11:10 -0400
Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <list-1431942@digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBGEFAB.tim_one@email.msn.com>

[Jim Fulton]
> If I understand the problem, it can be avoided by avoiding
> old-style classes.

In Python 2.3, that appears to be true. In Python 2.2.2, not true. The problems are caused by __getattr__ hooks that resurrect unreachable objects, and/or remove the last reference to an unreachable object, when such a hook is on an instance reachable only from an unreachable cycle, and the class doesn't explicitly define a __del__ method, and the class has a getattr hook, and the getattr hook does extreme things instead of just saying "no, there's no __del__ here".

Python 2.3 introduced new machinery for new-style classes specifically aimed at answering the "does it support __del__?" question without invoking getattr hooks, and that's why it's not a problem for new-style classes in 2.3. New-style classes still go thru getattr hooks to answer this question in 2.2.2.

There were problems in Python and problems in Zope here. Jeremy fixed the Zope problems under 2.2 by breaking the "the getattr hook does extreme things instead of just saying 'no, there's no __del__ here'" link of the chain for persistent objects.

> Maybe it's time to, at least optionally, cause a warning when
> old-style classes are used. :) I'm not kidding. For Zope, I think it
> might be worthwhile to issue such a warning.

There may be good reasons for wanting that, but none raised in this thread so far are relevant (unless 2.3 is mandated for Zope, which I'm sure we don't want to do).
From tim_one@email.msn.com Mon Apr 7 02:30:56 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 6 Apr 2003 21:30:56 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBHEFAB.tim_one@email.msn.com> [Greg Ewing] > This further confirms my opinion that __del__ methods are evil, and > the language would be the better for their complete removal. They sure create more than their share of implementation headaches, so don't fare well on the "if the implementation is hard to explain, it's a bad idea" scale. > Failing that, perhaps they should be made a bit less dynamic, so that > the GC can make reasonable assumptions about their existence without > having to execute Python code. Guido already did so for new-style classes in Python 2.3. That machinery doesn't exist in 2.2.2, and old-style classes remain a problem under 2.3 too. Backward compatibility constrains how much we can get away with, of course. From jeremy@alum.mit.edu Mon Apr 7 04:45:05 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 06 Apr 2003 23:45:05 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modulesgcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com> Message-ID: <1049687104.1383.27.camel@localhost.localdomain> On Sun, 2003-04-06 at 20:47, Tim Peters wrote: > BTW, I'm still wondering why the ZODB thread test failed the way it did for > Tres and Barry and me: you saw corrupt gc lists, but the rest of us never > did. We saw a Connection instance with a mysteriously cleared __dict__. > That's consistent with the __getattr__-hook-resurrects-an- > object-reachable-only-from-an-unreachable-cycle examples I posted, but did > you guys figure out on Friday whether that's what was actually happening? 
> The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting > unreachable objects while gc was still crawling over them, and that's a > different (albeit related) problem than __dicts__ getting cleared by magic. [Note to everyone else, there's a lot of ZODB-specific detail in the answer. It might not be that interesting beyond ZODB developers.] The __getattr__ code in ZODB made a large cycle of objects reachable again. The __getattr__ hook called a method on a ZODB Connection and the Connection registered itself with the current transaction (basically, a global resource). Then the Connection got tp_cleared by the garbage collector. Now the Connection is a zombie but it's also registered with a transaction. When the transaction commits or aborts, the code failed because the Connection didn't have any attributes. I got particularly lucky with my compiler/platform/Python version/whatever. Part of the code in __getattr__ deleted a key-value pair from a dictionary. I think that was partly chance; there was nothing about the code that guaranteed the key was in the dict, but it deleted it if it was. The value in the dict was a weakref. The weakref decrefed and deallocated its callback function. Just by luck, the callback function was the next thing in the unreachable gc list. So I got a segfault when I dereferenced the now-freed GC header of the callback object. 
Jeremy

From oren-py-d@hishome.net Mon Apr 7 07:16:30 2003
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 7 Apr 2003 02:16:30 -0400
Subject: [Python-Dev] Placement of os.fdopen functionality
In-Reply-To: <20030405193531.GA23455@meson.dyndns.org>
References: <20030405193531.GA23455@meson.dyndns.org>
Message-ID: <20030407061630.GA12658@hishome.net>

On Sat, Apr 05, 2003 at 02:35:31PM -0500, Jp Calderone wrote:
> It occurred to me this afternoon (after answering a question about creating
> file objects from file descriptors) that perhaps os.fdopen would be more
> logically placed someplace else - of course it could also remain as
> os.fdopen() for whatever deprecation period is warranted.
>
> Perhaps as a class method of the file type, file.fromfd()?

I don't see much point in moving it around just because the place doesn't seem right, but the fact that it's a function rather than a method means that some things cannot be done in pure Python. I can create an uninitialized instance of a subclass of 'file' using file.__new__(filesubclass), but the only way to open it is by name using file.__init__(filesubclassinstance, 'filename'). A file subclass cannot be opened from a file descriptor because fdopen always returns a new instance of 'file'.

If there was some way to open an uninitialized file object from a file descriptor it would be possible, for example, to write a version of popen that returns a subclass of file. It could add a method for retrieving the exit code of the process, do something interesting on __del__, etc.

Here are some alternatives of where this could be implemented, followed by what a Python implementation of os.fdopen would look like:

1. New form of file.__new__ with more arguments:

    def fdopen(fd, mode='r', buffering=-1):
        return file.__new__('(fdopen)', mode, buffering, fd)

2. Optional argument to file.__init__:

    def fdopen(fd, mode='r', buffering=-1):
        return file('(fdopen)', mode, buffering, fd)

3.
Instance method (NOT a class method):

    def fdopen(fd, mode='r', buffering=-1):
        f = file.__new__()
        f.fdopen(fd, mode, buffering, '(fdopen)')
        return f

Oren

From theller@python.net Mon Apr 7 07:56:38 2003
From: theller@python.net (Thomas Heller)
Date: 07 Apr 2003 08:56:38 +0200
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704)
In-Reply-To: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net>
References: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net>
Message-ID: <brziu76h.fsf@python.net>

loewis@users.sourceforge.net writes:
> Update of /cvsroot/python/python/dist/src/Misc
> In directory sc8-pr-cvs1:/tmp/cvs-serv28757/Misc
>
> Modified Files:
> 	NEWS
> Log Message:
> Rename LONG_LONG to PY_LONG_LONG. Fixes #710285.
>

What is the recommended way to port code like this to Python 2.3, and still remain compatible with 2.2?

Thanks,
Thomas

typedef struct {
    PyObject_HEAD
    char tag;
    union {
        char c;
        char b;
        short h;
        int i;
        long l;
#ifdef HAVE_LONG_LONG
        LONG_LONG q;
#endif
        double d;
        float f;
        void *p;
    } value;
    PyObject *obj;
} PyCArgObject;

From mhammond@skippinet.com.au Mon Apr 7 12:23:02 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Mon, 07 Apr 2003 21:23:02 +1000
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704)
In-Reply-To: <brziu76h.fsf@python.net>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au>

> > Rename LONG_LONG to PY_LONG_LONG. Fixes #710285.
>
> What is the recommended way to port code like this to Python 2.3,
> and still remain compatible with 2.2?

#if defined(PY_LONG_LONG) && !defined(LONG_LONG)
#define LONG_LONG PY_LONG_LONG /* grrr :( */
#endif

? <wink>

This change does break things.

Mark.
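Going back to the fdopen thread: the file.fromfd() classmethod that Jp proposed would give Oren his subclass-aware constructor essentially for free, because a classmethod receives the class it was invoked on. A sketch with ordinary Python classes — FDFile and ProcessFile are invented names for illustration, not real APIs:

```python
import os

class FDFile:
    """Toy stand-in for the 'file' type, wrapping an existing descriptor."""
    def __init__(self, fd, mode='r'):
        self.fd = fd
        self.mode = mode

    @classmethod
    def fromfd(cls, fd, mode='r'):
        # cls is whatever class the call was made on, so a subclass
        # gets an instance of itself -- unlike a plain fdopen() function,
        # which always constructs the base type
        return cls(fd, mode)

class ProcessFile(FDFile):
    """A subclass that could, e.g., track a child process's exit code."""
    def exit_code(self):
        return None  # placeholder; a real version would call os.waitpid()

r, w = os.pipe()
f = ProcessFile.fromfd(r)
print(type(f).__name__)   # ProcessFile
os.close(r)
os.close(w)
```

This is exactly the property a module-level os.fdopen() cannot provide: it has no way to know which subclass the caller wanted.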
From skip@pobox.com Mon Apr 7 15:56:41 2003
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 7 Apr 2003 09:56:41 -0500
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS, 1.703, 1.704)
In-Reply-To: <brziu76h.fsf@python.net>
References: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net> <brziu76h.fsf@python.net>
Message-ID: <16017.37289.216513.120081@montanaro.dyndns.org>

    Thomas> What is the recommended way to port code like this to Python
    Thomas> 2.3, and still remain compatible with 2.2?

    Thomas> #ifdef HAVE_LONG_LONG
    Thomas>     LONG_LONG q;
    Thomas> #endif

Wouldn't this work?

    #ifdef HAVE_LONG_LONG
    #  ifdef PY_LONG_LONG
        PY_LONG_LONG q;
    #  else
        LONG_LONG q;
    #  endif
    #endif

As MarkH pointed out, this change is going to break some code, but there's probably no way around it. Obviously, some other package defines a LONG_LONG macro or there wouldn't have been a bug report. Better to bite the bullet sooner than later.

Skip

From msg_2222@yahoo.com Mon Apr 7 18:16:53 2003
From: msg_2222@yahoo.com (Rick Y)
Date: Mon, 7 Apr 2003 10:16:53 -0700 (PDT)
Subject: [Python-Dev] socket question
Message-ID: <20030407171653.41362.qmail@web20711.mail.yahoo.com>

How can I enable the _socket module in my Solaris Python? I did not build it; downloaded it from sunfreeware.

./viewcvs-install
Traceback (most recent call last):
  File "./viewcvs-install", line 35, in ?
    import compat
  File "./lib/compat.py", line 20, in ?
    import urllib
  File "/usr/local/lib/python2.1/urllib.py", line 26, in ?
    import socket
  File "/usr/local/lib/python2.1/socket.py", line 41, in ?
    from _socket import *
ImportError: ld.so.1: python: fatal: libssl.so.0.9.6: open failed: No such file or directory

__________________________________________________
Do you Yahoo!?
Yahoo!
Tax Center - File online, calculators, forms, and more http://tax.yahoo.com From aahz@pythoncraft.com Mon Apr 7 18:28:37 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 7 Apr 2003 13:28:37 -0400 Subject: [Python-Dev] socket question In-Reply-To: <20030407171653.41362.qmail@web20711.mail.yahoo.com> References: <20030407171653.41362.qmail@web20711.mail.yahoo.com> Message-ID: <20030407172837.GA18682@panix.com> On Mon, Apr 07, 2003, Rick Y wrote: > > how can i enable _sockt module in my solaris python?. python-dev is for discussions about developing the language, not for questions about using Python. You'll probably get better advice by subscribing to the newsgroup comp.lang.python (or python-list). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From jeremy@zope.com Mon Apr 7 18:43:28 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 07 Apr 2003 13:43:28 -0400 Subject: [Python-Dev] socket question In-Reply-To: <20030407171653.41362.qmail@web20711.mail.yahoo.com> References: <20030407171653.41362.qmail@web20711.mail.yahoo.com> Message-ID: <1049737408.23331.19.camel@slothrop.zope.com> Rick, This question would be more appropriate on python-list. The python-dev list is for discussion among people who work on the Python implementation, rather than for end-user questions. But don't sweat it; you probably didn't know that. On Mon, 2003-04-07 at 13:16, Rick Y wrote: > how can i enable _sockt module in my solaris python?. > i did not build it. Downloaded it from sunfreeware. > > ./viewcvs-install > Traceback (most recent call last): > File "./viewcvs-install", line 35, in ? > import compat > File "./lib/compat.py", line 20, in ? > import urllib > File "/usr/local/lib/python2.1/urllib.py", line 26, > in ? > import socket > File "/usr/local/lib/python2.1/socket.py", line 41, > in ? 
> from _socket import * > ImportError: ld.so.1: python: fatal: libssl.so.0.9.6: > open failed: No such file or directory The version of Python you are using has been linked against OpenSSL. The import of _socket is failing because the libssl.so can't be found at run-time. You either need to tell your linker where to find the file or install OpenSSL. I'm sure you can find more help on the details on the other list. Jeremy From martin@v.loewis.de Mon Apr 7 22:29:14 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 07 Apr 2003 23:29:14 +0200 Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704) In-Reply-To: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au> References: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au> Message-ID: <m3el4edmj9.fsf@mira.informatik.hu-berlin.de> Mark Hammond <mhammond@skippinet.com.au> writes: > #if defined(PY_LONG_LONG) && !defined(LONG_LONG) > #define LONG_LONG PY_LONG_LONG /* grrr :( */ > #endif That works; perhaps one would remove the comment... > This change does break things. Most certainly. However, it was broken before, as it failed to be renamed in the grand renaming. Regards, Martin From marcus.h.mendenhall@vanderbilt.edu Tue Apr 8 15:38:57 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Tue, 8 Apr 2003 09:38:57 -0500 Subject: [Python-Dev] _socket efficiencies ideas Message-ID: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> I have been in discussion recently with Martin v. Loewis about an idea I have been thinking about for a while to improve the efficiency of the connect method in the _socket module. I posted the original suggestion to the python suggestions tracker on sourceforge as item 706392. A bit of history and justification: I am doing a lot of work using python to develop almost-real-time distributed data acquisition and control systems from running laboratory apparatus. 
In this environment, I do a lot of sun-rpc calls as part of the vxi-11 protocol to allow TCP/IP access to gpib-like devices. As a part of this, I do a lot of socket.connect() calls, often with the connections being quite transient. The problem is that the current python _socket module makes a DNS call to try to resolve each address before connect is called, which if I am connecting/disconnecting many times a second results in pathological and gratuitous network activity. Incidentally, I am in the process of creating a sourceforge project, pythonlabtools (just approved this morning), in which I will start maintaining a repository of the tools I have been working on.

My first solution to this, for which I submitted a patch to the tracker system (with guidance from Martin), was to create a wrapper for the sockaddr object, which one can create in advance, and when _socket.connect() is called (actually when getsockaddrarg() is called by connect), results in an immediate connection without any DNS activity.

This solution solves part of the problem, but may not be the right final one. After writing this patch and verifying its functionality, I tried it in the real world. Then, I realized that for sun-rpc work, it wasn't quite what I needed, since the socket number may be changing each time the rpc request is made, resulting in a new address wrapper being needed, and thus DNS activity again.

After thinking about what I have done with this patch, I would also like to suggest another change (for which I am also willing to submit the patch, which is quite simple): Consistent with some of the already extant glue in _socket to handle addresses like <broadcast>, would there be any reason not to modify setipaddr() and getaddrinfo() so that if an address is prefixed with <numeric> (e.g. <numeric>127.0.0.1) the PASSIVE and NUMERIC flags are always set, so these routines reject any non-numeric address but handle numeric ones very efficiently?
I have already implemented a predecessor to this which I am experimentally running at home in python 2.2.2, in which I made it so that prefixing the address with an exclamation point provided this functionality. Given the somewhat more legible approach the team has already chosen for special addresses, I see no reason why using a <numeric> (or some such) prefix isn't reasonable. Do any members of the development team have commentary on this? Would such a change be likely to be accepted into the system? Any reasons which it might break something? The actual patch would be only about 10 lines of code, (plus some documentation), a few in each of the routines mentioned above. Thanks for any suggestions. Marcus Mendenhall From guido@python.org Tue Apr 8 15:50:50 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 10:50:50 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Tue, 08 Apr 2003 09:38:57 CDT." <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> Message-ID: <200304081450.h38EoqE20178@odiug.zope.com> > I have been in discussion recently with Martin v. Loewis about an idea > I have been thinking about for a while to improve the efficiency of the > connect method in the _socket module. I posted the original suggestion > to the python suggestions tracker on sourceforge as item 706392. > > A bit of history and justification: > I am doing a lot of work using python to develop almost-real-time > distributed data acquisition and control systems from running > laboratory apparatus. In this environment, I do a lot of sun-rpc calls > as part of the vxi-11 protocol to allow TCP/IP access to gpib-like > devices. As a part of this, I do a lot sock socket.connect() calls, > often with the connections being quite transient. 
The problem is that > the current python _socket module makes a DNS call to try to resolve > each address before connect is called, which if I am > connecting/disconnecting many times a second results in pathological > and gratuitous network activity. Incidentally, I am in the process of > creating a sourceforge project, pythonlabtools (just approved this > morning), in which I will start maintaining a repository of the tools I > have been working on. Are you sure that it tries to make a DNS call even when the address is pure numeric? That seems a mistake, and if that's really happening, I think that is the part that should be fixed. Maybe in the _socket module, maybe in getaddrinfo(). > My first solution to this, for which I submitted a patch to the tracker > system (with guidance from Martin), was to create a wrapper for the > sockaddr object, which one can create in advance, and when > _socket.connect() is called (actually when getsockaddrarg() is called > by connect), results in an immediate connection without any DNS > activity. > > This solution solves part of the problem, but may not be the right > final one. After writing this patch and verifying its functionality, I > tried it in the real world. Then, I realized that for sun-rpc work, it > wasn't quite what I needed, since the socket number may be changing > each time the rpc request is made, resulting in a new address wrapper > being needed, and thus DNS activity again. > > After thinking about what I have done with this patch, I would also > like to suggest another change (for which I am also willing to submit > the patch, which is quite simple): Consistent with some of the already > extant glue in _socket to handle addresses like <broadcast>, would > there be any reason no to modify > setipaddr() and getaddrinfo() so that if an address is prefixed with > <numeric> (e.g.
<numeric>127.0.0.1) that the PASSIVE and NUMERIC flags > are always set so these routines reject any non-numeric address, but > handle numeric ones very efficiently? > > I have already implemented a predecessor to this which I am > experimentally running at home in python 2.2.2, in which I made it so > that prefixing the address with an exclamation point provided this > functionality. Given the somewhat more legible approach the team has > already chosen for special addresses, I see no reason why using a > <numeric> (or some such) prefix isn't reasonable. > > Do any members of the development team have commentary on this? Would > such a change be likely to be accepted into the system? Any reasons > which it might break something? The actual patch would be only about > 10 lines of code, (plus some documentation), a few in each of the > routines mentioned above. I don't see why we would have to add the <numeric> flag to the address when the form of the address itself is already a perfect clue that the address is purely numeric. I'd be happy to see a patch that intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those without calling getaddrinfo(). --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Tue Apr 8 16:59:27 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Tue, 8 Apr 2003 10:59:27 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304081450.h38EoqE20178@odiug.zope.com> Message-ID: <138CDF38-69DB-11D7-A8D4-003065A81A70@vanderbilt.edu> Thanks for your prompt reply! On Tuesday, April 8, 2003, at 09:50 AM, Guido van Rossum wrote: >> I have been in discussion recently with Martin v. Loewis about an idea >> I have been thinking about for a while to improve the efficiency of >> the >> connect method in the _socket module. I posted the original >> suggestion >> to the python suggestions tracker on sourceforge as item 706392. 
>> >> A bit of history and justification: >> I am doing a lot of work using python to develop almost-real-time >> distributed data acquisition and control systems from running >> laboratory apparatus. In this environment, I do a lot of sun-rpc >> calls >> as part of the vxi-11 protocol to allow TCP/IP access to gpib-like >> devices. As a part of this, I do a lot sock socket.connect() calls, >> often with the connections being quite transient. The problem is that >> the current python _socket module makes a DNS call to try to resolve >> each address before connect is called, which if I am >> connecting/disconnecting many times a second results in pathological >> and gratuitous network activity. Incidentally, I am in the process of >> creating a sourceforge project, pythonlabtools (just approved this >> morning), in which I will start maintaining a repository of the tools >> I >> have been working on. > > Are you sure that it tries make a DNS call even when the address is > pure numeric? That seems a mistake, and if that's really happening, I > think that is the part that should be fixed. Maybe in the _socket > module, maybe in getaddrinfo(). > Yes, it seems to do this. It sets the PASSIVE flags, but that doesn't seem to be quite enough to prevent DNS activity, although the NUMERIC flag does the job. This is true, at least, in 2.3.x on MacOSX, and since the socket stuff is all the same, I suspect it is true on many Unixes. Note that this doesn't happen on the MacOS9 version, which provides its own socket interface through GUSI, which apparently is smart enough to handle it. >> My first solution to this, for which I submitted a patch to the >> tracker >> system (with guidance from Martin), was to create a wrapper for the >> sockaddr object, which one can create in advance, and when >> _socket.connect() is called (actually when getsockaddrarg() is called >> by connect), results in an immediate connection without any DNS >> activity. 
>> >> This solution solves part of the problem, but may not be the right >> final one. After writing this patch and verifying its functionality, >> I >> tried it in the real world. Then, I realized that for sun-rpc work, >> it >> wasn't quite what I needed, since the socket number may be changing >> each time the rpc request is made, resulting in a new address wrapper >> being needed, and thus DNS activity again. >> >> After thinking about what I have done with this patch, I would also >> like to suggest another change (for which I am also willing to submit >> the patch, which is quite simple): Consistent with some of the >> already >> extant glue in _socket to handle addresses like <broadcast>, would >> there be any reason no to modify >> setipaddr() and getaddrinfo() so that if an address is prefixed with >> <numeric> (e.g. <numeric>127.0.0.1) that the PASSIVE and NUMERIC flags >> are always set so these routines reject any non-numeric address, but >> handle numeric ones very efficiently? >> >> I have already implemented a predecessor to this which I am >> experimentally running at home in python 2.2.2, in which I made it so >> that prefixing the address with an exclamation point provided this >> functionality. Given the somewhat more legible approach the team has >> already chosen for special addresses, I see no reason why using a >> <numeric> (or some such) prefix isn't reasonable. >> >> Do any members of the development team have commentary on this? Would >> such a change be likely to be accepted into the system? Any reasons >> which it might break something? The actual patch would be only about >> 10 lines of code, (plus some documentation), a few in each of the >> routines mentioned above. > > I don't see why we would have to add the <numeric> flag to the address > when the form of the address itself is already a perfect clue that the > address is purely numeric. 
I'd be happy to see a patch that > intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those > without calling getaddrinfo(). > Do we want this? The parser would then also have to be modified to handle numeric INET6 addresses when they become popular. I actually did implement one of my trial versions this way, and it worked fine. There is one minor issue, too. In urllib, there are some calls to getaddrinfo to get (for maybe no good reason) CNAMEs of addresses. I would like some way to tag an address with a very strong comment that it is what it is, and I would like all further processing disabled. Also, a 'trial' parsing of an address for matching an a.b.c.d pattern each time is a lot more processor intensive than checking for <numeric> at the beginning. I am perfectly happy to implement it either way. > --Guido van Rossum (home page: http://www.python.org/~guido/) > From guido@python.org Tue Apr 8 19:01:24 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 14:01:24 -0400 Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: Your message of "Sun, 06 Apr 2003 11:43:21 PDT." <20030406184320.GA14894@glacier.arctrix.com> References: <list-1431082@digicool.com> <3E900A80.3010802@zope.com> <20030406184320.GA14894@glacier.arctrix.com> Message-ID: <200304081801.h38I1QL22691@odiug.zope.com> > A command line option that enabled new-style classes by default may be a > good idea (suggested to me by AMK at PyCon). I expect lots of things to break; such an option would have to be at least as well-hidden as -U. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 8 19:06:47 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 14:06:47 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: Your message of "Mon, 07 Apr 2003 12:54:20 +1200."
<200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> References: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> Message-ID: <200304081806.h38I6v822730@odiug.zope.com> > This further confirms my opinion that __del__ methods are evil, and > the language would be the better for their complete removal. No can do. There must be a way to force e.g. calling os.close() for an integer file descriptor returned by os.open() without writing C code. But this should be exceedingly rare. A quick inspection of the standard library found one other case: flushing buffered data out. I think that's also a valid use of __del__. > Failing that, perhaps they should be made a bit less dynamic, so > that the GC can make reasonable assumptions about their existence > without having to execute Python code. +1 --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 13:48:48 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 06:48:48 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304081450.h38EoqE20178@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> Message-ID: <20030409124848.GB15649@tummy.com> On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: >Are you sure that it tries make a DNS call even when the address is >pure numeric? That seems a mistake, and if that's really happening, I My first thought is that there should be a local DNS cache on the machine that is running these apps. My second thought is that Python could benefit from caching some lookup information... >address is purely numeric. I'd be happy to see a patch that >intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those >without calling getaddrinfo(). It's not quite that easy. Beyond the IPV6 issues mentioned elsewhere, you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". 
IP addresses will fill in missing ".0"s, which is particularly handy for accessing "127.1", which gets expanded to "127.0.0.1". Sean -- Rocky: "Do you know what an A-Bomb is?" Bullwinkle: "Of course. ``A Bomb'' is what some people call our show." Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From hbl@st-andrews.ac.uk Wed Apr 9 14:35:46 2003 From: hbl@st-andrews.ac.uk (Hamish Lawson) Date: Wed, 09 Apr 2003 14:35:46 +0100 Subject: [Python-Dev] PEP305 csv package: from csv import csv? Message-ID: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> [Please excuse my posting this message here after initially posting it to python-list, but I realised afterwards that this might be the more appropriate forum (it hasn't so far had any responses on python-list anyway).] According to the documentation in progress at http://www.python.org/dev/doc/devel/whatsnew/node14.html use of the forthcoming csv module (as described in PEP305) requires it to be imported from the csv package:

    from csv import csv

    input = open('datafile', 'rb')
    reader = csv.reader(input)
    for line in reader:
        print line

Is there some reason why the csv package's __init__.py doesn't import the required names from csv.py, so allowing the shorter form below?

    import csv

    input = open('datafile', 'rb')
    reader = csv.reader(input)
    for line in reader:
        print line

Hamish Lawson From skip@pobox.com Wed Apr 9 14:43:11 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 9 Apr 2003 08:43:11 -0500 Subject: [Python-Dev] PEP305 csv package: from csv import csv?
In-Reply-To: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> References: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> Message-ID: <16020.9071.801846.936864@montanaro.dyndns.org> >>>>> "Hamish" == Hamish Lawson <hbl@st-andrews.ac.uk> writes: Hamish> [Please excuse my posting this message here after initially Hamish> posting it to python-list, but I realised afterwards that this Hamish> might be the more appropriate forum (it hasn't so far had any Hamish> responses on python-list anyway).] ... Actually, I forwarded your note to the csv mailing list: csv@mail.mojam.com. That'd be the best place to discuss the topic. ;-) I'll probably get around to changing things in the next day or two, but please feel free to submit a patch so I don't forget. Skip From guido@python.org Wed Apr 9 14:51:26 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 09:51:26 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 06:48:48 MDT." <20030409124848.GB15649@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> Message-ID: <200304091351.h39DpSq24961@odiug.zope.com> > On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: > >Are you sure that it tries make a DNS call even when the address is > >pure numeric? That seems a mistake, and if that's really happening, I > > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... I don't want to build a cache into Python, it should already be part of libresolv. > >address is purely numeric. I'd be happy to see a patch that > >intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those > >without calling getaddrinfo(). > > It's not quite that easy. 
Beyond the IPV6 issues mentioned elsewhere, The IPv6 folks can add their own cache. > you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". IP addresses > will fill in missing ".0"s, which is particularly handy for accessing > "127.1", which gets expanded to "127.0.0.1". I didn't even know this, and I think it's bad style to use something that obscure (most people would probably guess that 127.1 means 0.0.127.1 or 127.1.0.0). But since you seem to know about this stuff, perhaps you can submit a patch? --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 15:20:50 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 09:20:50 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> OK, I'll chime back in on the thread I started... I mostly have a question for Sean, since he seems to know the networking stuff well. Do you know of any reason why my original proposal (which is to allow IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution is attempted, which basically causes parsing with no real resolution at all) would break any known or plausible networking standards? The current Python socket module basically hides this part of the BSD socket API, and I find it quite useful to be able to suppress DNS activity absolutely for some addresses. And for Guido: since this type of tag has already been used in Python (as <broadcast>), is there any reason why this solution is inelegant? Thanks. Marcus On Wednesday, April 9, 2003, at 08:51 AM, Guido van Rossum wrote: >> On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: >>> Are you sure that it tries make a DNS call even when the address is >>> pure numeric?
That seems a mistake, and if that's really happening, >>> I >> >> My first thought is that there should be a local DNS cache on the >> machine that is running these apps. My second thought is that Python >> could benefit from caching some lookup information... > > I don't want to build a cache into Python, it should already be part > of libresolv. > >>> address is purely numeric. I'd be happy to see a patch that >>> intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those >>> without calling getaddrinfo(). >> >> It's not quite that easy. Beyond the IPV6 issues mentioned elsewhere, > > The IPv6 folks can add their own cache. > >> you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". IP addresses >> will fill in missing ".0"s, which is particularly handy for accessing >> "127.1", which gets expanded to "127.0.0.1". > > I didn't even know this, and I think it's bad style to use something > that obscure (most people would probably guess that 127.1 means > 0.0.127.1 or 127.1.0.0). > > But since you seem to know about this stuff, perhaps you can submit a > patch? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > From Anthony Baxter <anthony@interlink.com.au> Wed Apr 9 15:24:45 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Thu, 10 Apr 2003 00:24:45 +1000 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409124848.GB15649@tummy.com> Message-ID: <200304091424.h39EOje08304@localhost.localdomain> >>> Sean Reifschneider wrote > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... Ick ick. This is putting a bunch of code for a stub resolver into python. This stuff is hard to get right - I implemented this on top of pydns, and it was a lot of work to get (what I think is) correct, for not very much gain. 
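[Editor's note: a minimal sketch of the kind of fixed-TTL lookup cache being debated here — every name and the TTL policy below are invented for illustration, and as Anthony notes, a real stub resolver with SOA-honoring expiry is much harder to get right.]

```python
import socket
import time

# Maps (host, port) -> (expiry timestamp, cached getaddrinfo result).
_cache = {}

def cached_getaddrinfo(host, port, ttl=30.0):
    """getaddrinfo() with results reused for a fixed number of seconds.

    This deliberately ignores DNS TTLs; it only illustrates the
    'cache for some small configurable number of seconds' idea.
    """
    key = (host, port)
    hit = _cache.get(key)
    if hit is not None:
        expires, infos = hit
        if time.monotonic() < expires:
            return infos          # cache hit: no resolver traffic
    infos = socket.getaddrinfo(host, port, socket.AF_INET,
                               socket.SOCK_STREAM)
    _cache[key] = (time.monotonic() + ttl, infos)
    return infos
```

An application connecting and disconnecting many times a second would hit the resolver at most once per TTL window per address, at the cost of briefly serving stale data after a DNS change.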
The idea of either suppressing DNS lookups for all-numeric addresses, or some sort of extended API for suppressing DNS lookups might be better, but really, isn't this the job of the stub resolver? Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 15:32:00 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 09:32:00 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <069761E4-6A98-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 09:24 AM, Anthony Baxter wrote: > >>>> Sean Reifschneider wrote >> My first thought is that there should be a local DNS cache on the >> machine that is running these apps. My second thought is that Python >> could benefit from caching some lookup information... > > Ick ick. This is putting a bunch of code for a stub resolver into > python. > This stuff is hard to get right - I implemented this on top of pydns, > and > it was a lot of work to get (what I think is) correct, for not very > much > gain. > > The idea of either suppressing DNS lookups for all-numeric addresses, > or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? > This is part of the resolver API, via the AI_NUMERIC flags. I am just trying to expose that API to the top level of python. Marcus > Anthony > > -- > Anthony Baxter <anthony@interlink.com.au> > It's never too late to have a happy childhood. > From guido@python.org Wed Apr 9 15:37:35 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:37:35 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 09:20:50 CDT." 
<77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> References: <77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091437.h39Ebc125316@odiug.zope.com> > OK, I'll chime back in on the thread I started... I mostly have a > question for Sean, since he seems to know the networking stuff well. I'll chime in nevertheless. > Do you know of any reason why my original proposal (which is to allows > IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause > both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution > is attempted, which basically causes parsing with not real resolution > at all) would break any known or plausible networking standards? What are those flags? Which API uses them? I still don't understand why intercepting the all-numeric syntax isn't good enough, and why you want a <numeric> prefix. > The current Python socket module basically hides this part of the > BSD socket API, and I find it quite useful to be able to suppress > DNS activity absolutely for some addresses. And for Guido: since > this type of tag has already been used in Python (as <broadcast>), > is there any reason why this solution is inelegant? The reason I'm reluctant to add a new notation is that AFAIK it would be unique to Python. It's better to stick to standard notations IMO. <broadcast> was probably a mistake, since it seems to mean the same as 0.0.0.0 (for IPv4). --Guido van Rossum (home page: http://www.python.org/~guido/) From neal@metaslash.com Wed Apr 9 15:38:03 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 09 Apr 2003 10:38:03 -0400 Subject: [Python-Dev] SF file uploads work now Message-ID: <20030409143803.GE17847@epoch.metaslash.com> SF has fixed the problem which prevented a file from being uploaded when submitting a new patch. I just tested this and it worked. 
Neal From jafo@tummy.com Wed Apr 9 15:40:37 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 08:40:37 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> References: <20030409124848.GB15649@tummy.com> <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <20030409144037.GL1756@tummy.com> On Thu, Apr 10, 2003 at 12:24:45AM +1000, Anthony Baxter wrote: >Ick ick. This is putting a bunch of code for a stub resolver into python. >This stuff is hard to get right - I implemented this on top of pydns, and >it was a lot of work to get (what I think is) correct, for not very much >gain. Well, ideally you'd cache the data for as long as the SOA says to cache it. However, it sounds like in the situation that started this thread, even caching that data for some small but configurable number of seconds might help out. >The idea of either suppressing DNS lookups for all-numeric addresses, or >some sort of extended API for suppressing DNS lookups might be better, >but really, isn't this the job of the stub resolver? Definitely, on both counts... I like the idea of the "<numeric>127.0.0.1" or otherwise somehow specifying that the address shouldn't be resolved. I wouldn't think that it'd be good to do lookups of purely IP addresses, but there is probably some obscure part of some spec that says it should happen. Contrary to popular belief, just because I know that IP addresses get padded with 0s, I'm not a networking lawyer. ;-) I learned that trick because it can help make dealing with IPV6 addresses much easier, but I've found it most useful with 127.1. Sean -- This message is REALLY offensive, so I ROT-13d it TWICE. -- Sean Reifschneider being silly on #python, 2000 Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. 
Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 15:41:37 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:41:37 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Thu, 10 Apr 2003 00:24:45 +1000." <200304091424.h39EOje08304@localhost.localdomain> References: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <200304091441.h39EfnU25347@odiug.zope.com> > Ick ick. This is putting a bunch of code for a stub resolver into python. > This stuff is hard to get right - I implemented this on top of pydns, and > it was a lot of work to get (what I think is) correct, for not very much > gain. What I said. > The idea of either suppressing DNS lookups for all-numeric addresses, or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? Hey, I just figured it out. The old socket module (Python 2.1 and before) *did* special-case \d+\.\d+\.\d+\.\d+! This code was somehow lost when the IPv6 support was added. I propose to put it back in, at least for IPv4 (AF_INET). Patch anyone? --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 15:48:04 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 08:48:04 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <20030409144803.GM1756@tummy.com> On Wed, Apr 09, 2003 at 09:51:26AM -0400, Guido van Rossum wrote: >I didn't even know this, and I think it's bad style to use something >that obscure Perhaps... It's also bad style to break the obscure cases that are defined by the specifications... ;-) >(most people would probably guess that 127.1 means >0.0.127.1 or 127.1.0.0). 
Yeah, unfortunately it's one of those cases that it doesn't really make sense until you actually know the padding happens, and then think about it... It really only makes sense to pad within the address because you are rarely going to have leading or trailing 0s in a network address. So, it pads before the trailing specified octet:

    10.1   => 10.0.0.1
    10.9.1 => 10.9.0.1

>But since you seem to know about this stuff, perhaps you can submit a >patch? I've updated my local CVS repository, I'll see if I can get a change done on the airplane today. Sean -- The structure of a system reflects the structure of the organization that built it. -- Richard E. Fairley Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 15:50:11 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:50:11 -0400 Subject: [Python-Dev] SF file uploads work now In-Reply-To: Your message of "Wed, 09 Apr 2003 10:38:03 EDT." <20030409143803.GE17847@epoch.metaslash.com> References: <20030409143803.GE17847@epoch.metaslash.com> Message-ID: <200304091450.h39EoDP25441@odiug.zope.com> > SF has fixed the problem which prevented a file from being uploaded > when submitting a new patch. I just tested this and it worked. Thanks! I've removed the big red warning about this from the "submit new" page. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 9 15:54:18 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:54:18 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 08:48:04 MDT."
<20030409144803.GM1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> <20030409144803.GM1756@tummy.com> Message-ID: <200304091454.h39EsPr25477@odiug.zope.com> > On Wed, Apr 09, 2003 at 09:51:26AM -0400, Guido van Rossum wrote: > >I didn't even know this, and I think it's bad style to use something > >that obscure > > Perhaps... It's also bad style to break the obscure cases that are > defined by the specifications... ;-) Sure. I propose to special-case only what we *absolutely* *know* we can handle, and if on closer inspection we can't (e.g. someone writes 999.999.999.999) we pass it on to the official code. Here's the 2.1 code, which takes that approach:

    if (sscanf(name, "%d.%d.%d.%d%c", &d1, &d2, &d3, &d4, &ch) == 4 &&
        0 <= d1 && d1 <= 255 && 0 <= d2 && d2 <= 255 &&
        0 <= d3 && d3 <= 255 && 0 <= d4 && d4 <= 255) {
            addr_ret->sin_addr.s_addr = htonl(
                ((long) d1 << 24) | ((long) d2 << 16) |
                ((long) d3 << 8) | ((long) d4 << 0));
            return 4;
    }

> >But since you seem to know about this stuff, perhaps you can submit a > >patch? > > I've updated my local CVS repository, I'll see if I can get a change > done on the airplane today. Great! --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 16:07:51 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 10:07:51 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091437.h39Ebc125316@odiug.zope.com> Message-ID: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 09:37 AM, Guido van Rossum wrote: >> OK, I'll chime back in on the thread I started... I mostly have a >> question for Sean, since he seems to know the networking stuff well. > > I'll chime in nevertheless.
> >> Do you know of any reason why my original proposal (which is to allows >> IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause >> both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution >> is attempted, which basically causes parsing with not real resolution >> at all) would break any known or plausible networking standards? > > What are those flags? Which API uses them? > The getsockaddr call uses them (actually the correct name for one of the flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), and its part of the BSD sockets library, which is basically what the python socketmodule wraps. > I still don't understand why intercepting the all-numeric syntax isn't > good enough, and why you want a <numeric> prefix. > I guess intercepting all numeric is OK, it is just less efficient (since it requires a trial parsing of an address, which is wasted if it is not all numeric), and because it is so easy to implement <numeric>. However, all my operational goals are achieved if the old check for pure numeric is reinstated at the lowest level (probably in getsockaddrarg in socketmodule.c), so it is used everywhere. >> The current Python socket module basically hides this part of the >> BSD socket API, and I find it quite useful to be able to suppress >> DNS activity absolutely for some addresses. And for Guido: since >> this type of tag has already been used in Python (as <broadcast>), >> is there any reason why this solution is inelegant? > > The reason I'm reluctant to add a new notation is that AFAIK it would > be unique to Python. It's better to stick to standard notations IMO. > <broadcast> was probably a mistake, since it seems to mean the same as > 0.0.0.0 (for IPv4). I accept this logic. However, python is hiding a very useful (for efficiency) piece of the API, or depending on guessing whether you want it or not by looking at the format of an address. 
There are times in higher-level (python) code where getaddrinfo is called to get a CNAME, where I would also like to cause the raw IP to be returned by force, instead of attempting to get a CNAME, since I already know, by the IP I chose, that one doesn't exist. If we make the same check for numeric IPs in getaddrinfo, then it becomes impossible to resolve numeric names back to real ones. There is no way for getaddrinfo to know which way we want it, since in this case both ways might be needed. From guido@python.org Wed Apr 9 16:20:39 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 11:20:39 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 10:07:51 CDT." <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> References: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091521.h39FL5425595@odiug.zope.com> > > I still don't understand why intercepting the all-numeric syntax > > isn't good enough, and why you want a <numeric> prefix. > > > I guess intercepting all numeric is OK, it is just less efficient > (since it requires a trial parsing of an address, which is wasted if > it is not all numeric), and because it is so easy to implement > <numeric>. The performance loss will be unmeasurable (parsing a string of at most 11 bytes against a very simple pattern). Compare that to the true cost of adding <numeric>: documentation has to be added (and dozens of books updated), and code that wants to use numeric addresses has to be changed. > However, all my operational goals are achieved if the > old check for pure numeric is reinstated at the lowest level > (probably in getsockaddrarg in socketmodule.c), so it is used > everywhere. Right. > > The reason I'm reluctant to add a new notation is that AFAIK it would > > be unique to Python. It's better to stick to standard notations IMO.
> > <broadcast> was probably a mistake, since it seems to mean the same as > > 0.0.0.0 (for IPv4). > I accept this logic. However, python is hiding a very useful (for > efficiency) piece of the API, or depending on guessing whether you want > it or not by looking at the format of an address. There are times in > higher-level (python) code where getaddrinfo is called to get a CNAME, > where I would also like to cause the raw IP to be returned by force, > instead of attempting to get a CNAME, since I already know, by the IP I > chose, that one doesn't exist. If we make the same check for numeric > IPs in getaddrinfo, then it becomes impossible to resolve numeric names > back to real ones. There is no way for getaddrinfo to know which way > we want it, since in this case both ways might be needed. You're right, this functionality should be made available. IMO the right solution is to make it a separate API in the socket module, not to add more syntax to the existing address parsing code. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:36:17 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:36:17 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409124848.GB15649@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> Message-ID: <3E946821.6010208@v.loewis.de> Sean Reifschneider wrote: > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... I disagree. Python should expose the resolver library, and leave caching to it; many such libraries do caching already, in some form. The issue is different: In some cases the application just *knows* that an address is numeric, and that DNS lookup will fail.
In these cases, lookup should be avoided - whether by explicit request from the application or by Python implicitly just knowing is a different issue. It turns out that Python doesn't need to 100% detect numeric addresses, as long as it would not classify addresses as numeric which aren't. Perhaps it is even possible to leave the "is numeric" test to the implementation of getaddrinfo, i.e. calling it twice (try numeric first, then try resolving the name)? Regards, Martin From martin@v.loewis.de Wed Apr 9 19:38:32 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:38:32 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <3E9468A8.8050407@v.loewis.de> Guido van Rossum wrote: > I didn't even know this, and I think it's bad style to use something > that obscure (most people would probably guess that 127.1 means > 0.0.127.1 or 127.1.0.0). > > But since you seem to know about this stuff, perhaps you can submit a > patch? I think the OP is willing to create a patch if guided into a direction. The basic question is: should Python automatically recognize numeric addresses, or should the application have a way to indicate a numeric address? 
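The two-call getaddrinfo idea floated above maps directly onto flags the socket module already exposes; a minimal sketch, assuming a platform with an RFC 2553-style getaddrinfo (resolve_host is an invented helper name):

```python
import socket

def resolve_host(host, port):
    # First call: AI_NUMERICHOST forbids any resolver activity, so
    # this either parses a numeric address or fails immediately.
    try:
        return socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM,
                                  0, socket.AI_NUMERICHOST)
    except socket.gaierror:
        pass
    # Second call: a normal lookup, which may consult DNS.
    return socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
```

Only the first error is swallowed; a failure in the second call propagates to the caller as usual.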
Regards, Martin From skip@pobox.com Wed Apr 9 19:44:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 9 Apr 2003 13:44:51 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <16020.27171.834878.631470@montanaro.dyndns.org> Martin> It turns out that Python doesn't need to 100% detect numeric Martin> addresses, as long as it would not classify addresses as numeric Martin> which aren't. Perhaps it is even possible to leave the "is Martin> numeric" test to the implementation of getaddrinfo, i.e. calling Martin> it twice (try numeric first, then try resolving the name)? Can a top-level domain be all digits? If not, why not assume numeric if re.search(r"\.\d+$", addr) is not None? Skip From guido@python.org Wed Apr 9 19:45:49 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 14:45:49 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 20:36:17 +0200." <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <200304091845.h39Ijor31915@odiug.zope.com> > Sean Reifschneider wrote: > > > My first thought is that there should be a local DNS cache on the > > machine that is running these apps. My second thought is that Python > > could benefit from caching some lookup information... [MvL] > I disagree. Python should expose the resolver library, and leave > caching to it; many such libraries do caching already, in some form. Right. > The issue is different: In some cases the application just *knows* > that an address is numeric, and that DNS lookup will fail. 
In fact, I've often written code that passes a numeric address, and I've always assumed that in that case the code would take a shortcut because there's nothing to look up (only to parse). > In these cases, lookup should be avoided - whether by explicit > request from the application or by Python implicitly just knowing is > a different issue. > > It turns out that Python doesn't need to 100% detect numeric > addresses, as long as it would not classify addresses as numeric > which aren't. Perhaps it is even possible to leave the "is numeric" > test to the implementation of getaddrinfo, i.e. calling it twice > (try numeric first, then try resolving the name)? Perhaps, as long as we can safely ignore the first error. This would probably be a little slower, but probably not slow enough to matter, and it sounds like a very general solution. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:49:54 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:49:54 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> References: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <3E946B52.7090708@v.loewis.de> Marcus Mendenhall wrote: > The getsockaddr call uses them (actually the correct name for one of the > flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), and it's > part of the BSD sockets library, which is basically what the python > socketmodule wraps. More importantly, it is part of RFC 2553, which Python uses; it is also part of Winsock2. > I guess intercepting all numeric is OK, it is just less efficient (since > it requires a trial parsing of an address, which is wasted if it is not > all numeric), and because it is so easy to implement <numeric>. But isn't the same trial parsing needed to determine presence of the "<numeric>" flag?
The trial parsing Guido proposes usually stops with the first letter in a non-numeric address, and accesses up to 16 letters for a numeric address. Regards, Martin From guido@python.org Wed Apr 9 19:47:41 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 14:47:41 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 20:38:32 +0200." <3E9468A8.8050407@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> <3E9468A8.8050407@v.loewis.de> Message-ID: <200304091848.h39IlpW31935@odiug.zope.com> > The basic question is: should Python automatically recognize numeric > addresses, or should the application have a way to indicate a numeric > address? It should be automatically recognized. Python has always done this (until 2.1 at least). I don't think there is any ambiguity; AFAIK it's not possible to put something in the DNS so that an all-numeric address gets remapped (that would be a nasty security problem waiting to happen). --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:59:56 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:59:56 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <16020.27171.834878.631470@montanaro.dyndns.org> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> Message-ID: <3E946DAC.8010909@v.loewis.de> Skip Montanaro wrote: > Can a top-level domain be all digits? It appears nobody here can answer this question with certainty. 
If the answer is "no", it is surprising that getaddrinfo implementations still make resolver calls in this case even if they are sure that those resolver calls fail. One would hope that people writing socket libraries should know the answer. Regards, Martin From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 20:14:16 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 14:14:16 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946B52.7090708@v.loewis.de> Message-ID: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 01:49 PM, Martin v. Löwis wrote: > Marcus Mendenhall wrote: > >> The getsockaddr call uses them (actually the correct name for one of >> the flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), >> and it's part of the BSD sockets library, which is basically what the >> python socketmodule wraps. > > More importantly, it is part of RFC 2553, which Python uses; it is also > part of Winsock2. > >> I guess intercepting all numeric is OK, it is just less efficient >> (since it requires a trial parsing of an address, which is wasted if >> it is not all numeric), and because it is so easy to implement >> <numeric>. > > But isn't the same trial parsing needed to determine presence of the > "<numeric>" flag? The trial parsing Guido proposes usually stops with > the first letter in a non-numeric address, and accesses up to 16 > letters > for a numeric address. Yes, but a compare of the head of a string to a constant is probably something which requires 1% of the cpu time of a sscanf. Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) {whatever} the first compare avoids even a subroutine call in the most likely case (string does not begin with <numeric>) but then checks extremely quickly if it is right after that. Even though cpu time is cheap, we should save it for useful work.
Marcus From nas@python.ca Wed Apr 9 20:31:22 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 12:31:22 -0700 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> References: <3E946B52.7090708@v.loewis.de> <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <20030409193122.GA20230@glacier.arctrix.com> Marcus Mendenhall wrote: > Even though cpu time is cheap, we should save it for useful work. Saving a few cycles while complicating the interface is not the Python way. +1 on restoring the old sscanf code (or something similar to it). ObTrivia: IP addresses can be written as a single number (at least for many IP implementations). Try "ping 2130706433". Neil From jeremy@zope.com Wed Apr 9 20:33:47 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 09 Apr 2003 15:33:47 -0400 Subject: [Python-Dev] tp_clear return value Message-ID: <1049916827.4961.64.camel@slothrop.zope.com> Why does tp_clear have a return value? All the code I've seen returns 0, but the only place that clear is called doesn't inspect its return value. Jeremy From guido@python.org Wed Apr 9 20:40:56 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 15:40:56 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 14:14:16 CDT." <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> References: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091941.h39Jf7A00697@odiug.zope.com> > Even though cpu time is cheap, we should save it for useful work. With that attitude, I'm surprised you're using Python at all.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Wed Apr 9 20:48:10 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 15:48:10 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <1049916827.4961.64.camel@slothrop.zope.com> References: <1049916827.4961.64.camel@slothrop.zope.com> Message-ID: <20030409194810.GA27070@mems-exchange.org> On Wed, Apr 09, 2003 at 03:33:47PM -0400, Jeremy Hylton wrote: > Why does tp_clear have a return value? All the code I've seen returns > 0, but the only place that clear is called doesn't inspect its return > value. I guess I would have to say overdesign. I was thinking that tp_clear and tp_traverse could somehow be used by things other than the GC. In retrospect that doesn't seem likely or even possible. The GC has pretty specific requirements. In retrospect, I think both tp_traverse and tp_clear should have returned "void". That would have made implementing those methods easier. Testing for errors in tp_traverse methods is silly since nothing returns an error, and, even if it did, the GC couldn't handle it. :-( How do we sort this out? I suppose one step would be to document that the return values of tp_traverse and tp_clear are ignored. If we agree on that, I volunteer to go through the code and remove the useless tests for errors in the tp_traverse methods. Neil From guido@python.org Wed Apr 9 20:52:03 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 15:52:03 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: Your message of "Wed, 09 Apr 2003 15:48:10 EDT." <20030409194810.GA27070@mems-exchange.org> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> Message-ID: <200304091952.h39Jq6Y01468@odiug.zope.com> > On Wed, Apr 09, 2003 at 03:33:47PM -0400, Jeremy Hylton wrote: > > Why does tp_clear have a return value? 
All the code I've seen returns > > 0, but the only place that clear is called doesn't inspect its return > > value. [In response, Neil admitted] > I guess I would have to say overdesign. I was thinking that tp_clear > and tp_traverse could somehow be used by things other than the GC. In > retrospect that doesn't seem likely or even possible. The GC has pretty > specific requirements. > > In retrospect, I think both tp_traverse and tp_clear should have > returned "void". That would have made implementing those methods > easier. Testing for errors in tp_traverse methods is silly since > nothing returns an error, and, even if it did, the GC couldn't handle > it. > > :-( > > How do we sort this out? I suppose one step would be to document that > the return values of tp_traverse and tp_clear are ignored. If we agree > on that, I volunteer to go through the code and remove the useless tests > for errors in the tp_traverse methods. That's a good first step. Unfortunately changing the declaration to void will break 3rd party extensions so that will be too painful. --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 20:22:48 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 13:22:48 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <20030409192248.GQ1756@tummy.com> On Wed, Apr 09, 2003 at 08:36:17PM +0200, "Martin v. Löwis" wrote: >I disagree. Python should expose the resolver library, and leave caching >to it; many such libraries do caching already, in some form. Why don't we carry it to the logical conclusion and say that the resolver should also avoid doing a forward lookup on an already numeric IP?
I've noticed that before the Red Hat 8.0 release, doing a "telnet <ip>" would usually be very fast on the initial connection, and since 8.0 it's been slow as if doing a lookup... To me that indicates that the resolver used to do this and has been changed to not, which makes me wonder why that was... Perhaps we're being too clever and it's going to come back to bite us? The "<numeric>" syntax would allow us to leave resolution as it is and let the user override it when they deem necessary. If we try to auto-detect (which I'm usually all for), we should probably implement a "<forcedns>" or similar? Sean -- Geek English Rule #7: To reduce redundancy, the word "scary" can be left out of any statement containing the phrase "scary java applet". Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 21:05:50 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 16:05:50 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 13:22:48 MDT." <20030409192248.GQ1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <20030409192248.GQ1756@tummy.com> Message-ID: <200304092005.h39K5pd01600@odiug.zope.com> > Why don't we carry it to the logical conclusion and say that the > resolver should also avoid doing a forward lookup on an already numeric > IP? > > I've noticed that before the Red Hat 8.0 release, doing a "telnet <ip>" > would usually be very fast on the initial connection, and since 8.0 it's > been slow as if doing a lookup... To me that indicates that the > resolver used to do this and has been changed to not, which makes me > wonder why that was... > > Perhaps we're being too clever and it's going to come back to bite us? I think it's the other way around. 
The resolver lost some perfectly good caching in the upgrade to support IPv6. The designers probably didn't notice the difference because in their own setup, DNS is fast. I expect the caching will come back eventually. > The "<numeric>" syntax would allow us to leave > resolution as it is and let the user override it when they deem > necessary. If we try to auto-detect (which I'm usually all for), we > should probably implement a "<forcedns>" or similar? YAGNI. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 21:27:01 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 22:27:01 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030409194810.GA27070@mems-exchange.org> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> Message-ID: <3E948215.8050504@v.loewis.de> Neil Schemenauer wrote: > I guess I would have to say overdesign. I was thinking that tp_clear > and tp_traverse could somehow be used by things other than the GC. In > retrospect that doesn't seem likely or even possible. The GC has pretty > specific requirements. > > In retrospect, I think both tp_traverse and tp_clear should have > returned "void". While this is true for tp_clear, tp_traverse is actually more general. gc.get_referrers uses tp_traverse, for something other than collection. > That would have made implementing those methods > easier. Testing for errors in tp_traverse methods is silly since > nothing returns an error, and, even if it did, the GC couldn't handle > it. Again, gc.get_referrers "uses" this feature. If extending the list fails, traversal is aborted. Whether this is useful is questionable, as the entire notion of "out of memory exception handling" is questionable. 
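Martin's point that tp_traverse serves more than collection can be watched from pure Python; a minimal illustration of the two gc functions built on top of it:

```python
import gc

payload = ["shared"]
holder_list = [payload]    # refers to payload
holder_tuple = (payload,)  # also refers to payload

# get_referrers runs the tp_traverse of every tracked container,
# collecting those whose traversal visits `payload`.
refs = gc.get_referrers(payload)
assert any(r is holder_list for r in refs)
assert any(r is holder_tuple for r in refs)

# get_referents goes the other direction: it reports exactly the
# objects a container's tp_traverse visits.
assert any(o is payload for o in gc.get_referents(holder_list))
```

Note that the module's globals dict also shows up among the referrers, which is why the checks filter by identity rather than comparing whole lists.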
Regards, Martin From jafo@tummy.com Wed Apr 9 21:33:19 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 14:33:19 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <16020.27171.834878.631470@montanaro.dyndns.org> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> Message-ID: <20030409203319.GS1756@tummy.com> On Wed, Apr 09, 2003 at 01:44:51PM -0500, Skip Montanaro wrote: >Can a top-level domain be all digits? If not, why not assume numeric if >re.search(r"\.\d+$", addr) is not None? I don't think anyone sane would create a top-level that's digits, particularly in the range of 0 to 255. That probably means that somebody is going to do it... ;-/ I think checking for 2 to 4 dotted octets in the range of 0 to 255 would be safest... Yes, you can probably get away with using the regex above, but I wouldn't want to. Sean -- Sucking all the marrow out of life doesn't mean choking on the bone. -- _Dead_Poet's_Society_ Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From tim.one@comcast.net Wed Apr 9 22:33:07 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 17:33:07 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E948215.8050504@v.loewis.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKMFEAA.tim.one@comcast.net> [Neil Schemenauer] >> I was thinking that tp_clear and tp_traverse could somehow be used by >> things other than the GC. In retrospect that doesn't seem likely or even >> possible. The GC has pretty specific requirements. >> In retrospect, I think both tp_traverse and tp_clear should have >> returned "void". [Martin v. Lowis] > While this is true for tp_clear, tp_traverse is actually more general. 
> gc.get_referrers uses tp_traverse, for something other than collection. >> That would have made implementing those methods >> easier. Testing for errors in tp_traverse methods is silly since >> nothing returns an error, and, even if it did, the GC couldn't handle >> it. > Again, gc.get_referrers "uses" this feature. If extending the list > fails, traversal is aborted. Whether this is useful is questionable, > as the entire notion of "out of memory exception handling" is > questionable. The brand new gc.get_referents uses the return value of tp_traverse to abort on out-of-memory, but gc.get_referrers uses it for a different purpose (its traversal function returns true if the visited object is in the tuple of objects passed in, else returns false). The internal gc.get_referrers_for is what aborts on out-of-memory in the get_referrers subsystem. tp_traverse is fine as-is. The return value of tp_clear does indeed appear without plausible use. >> If we agree that, I volunteer to go through the code and remove the >> useless tests for errors in the tp_traverse methods. That would make get_referents press on after memory is exhausted. It would also change the semantics of get_referrers, in a subtle way (if object A has 25 references to object B, gc.get_referrers(B) contains only 1 instance of A today, but would contain 25 instances of A if tp_traverse methods ignored visit() return values). truth-isn't-necessarily-an-error-ly y'rs - tim From Jack.Jansen@oratrix.com Wed Apr 9 22:33:14 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Wed, 9 Apr 2003 23:33:14 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409144037.GL1756@tummy.com> Message-ID: <DF2120F8-6AD2-11D7-846E-000A27B19B96@oratrix.com> On woensdag, apr 9, 2003, at 16:40 Europe/Amsterdam, Sean Reifschneider wrote: > On Thu, Apr 10, 2003 at 12:24:45AM +1000, Anthony Baxter wrote: >> Ick ick. This is putting a bunch of code for a stub resolver into >> python. 
>> This stuff is hard to get right - I implemented this on top of pydns, >> and >> it was a lot of work to get (what I think is) correct, for not very >> much >> gain. > > Well, ideally you'd cache the data for as long as the SOA says to cache > it. However, it sounds like in the situation that started this thread, > even caching that data for some small but configurable number of > seconds > might help out. I wouldn't touch caching with a ten foot pole here: Python cannot know what happens under the hood of the network. For example, if I move my WiFi-equipped laptop from one location to another I don't want to be forced to restart my Python applications just to clear some silly cache, knowing that the OS and libc layers have handled the switch fine. (And, yes, Windoze-users are probably required to reboot anyway, but my Mac handles changing IP addresses just nicely:-) -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From nas@python.ca Wed Apr 9 22:41:04 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 14:41:04 -0700 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E948215.8050504@v.loewis.de> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> <3E948215.8050504@v.loewis.de> Message-ID: <20030409214104.GA20544@glacier.arctrix.com> "Martin v. Löwis" wrote: > Neil Schemenauer wrote: > >In retrospect, I think both tp_traverse and tp_clear should have > >returned "void". > > While this is true for tp_clear, tp_traverse is actually more general. > gc.get_referrers uses tp_traverse, for something other than collection. Could the visit procedure keep track of errors?
Something like: struct result { int error; /* true if an error occured while traversing */ /* other results */ } static void myvisit(PyObject* obj, struct result *r) { if (!r->error) { <do stuff, set r->error of error occurs> } } From martin@v.loewis.de Wed Apr 9 22:47:52 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 23:47:52 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030409214104.GA20544@glacier.arctrix.com> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> <3E948215.8050504@v.loewis.de> <20030409214104.GA20544@glacier.arctrix.com> Message-ID: <3E949508.1030902@v.loewis.de> Neil Schemenauer wrote: > Could the visit procedure keep track of errors? No. For get_referrers (as Tim explains), it might be acceptable but less efficient (since traversal should stop when a the object is found to be a referrer). For get_referents, an error in the callback should really abort traversal as the system just went out of memory. Regards, Martin From db3l@fitlinxx.com Wed Apr 9 23:11:10 2003 From: db3l@fitlinxx.com (David Bolen) Date: 09 Apr 2003 18:11:10 -0400 Subject: [Python-Dev] Re: _socket efficiencies ideas References: <3E946B52.7090708@v.loewis.de> <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> <20030409193122.GA20230@glacier.arctrix.com> Message-ID: <u65pniao1.fsf@fitlinxx.com> Neil Schemenauer <nas@python.ca> writes: > Marcus Mendenhall wrote: > > Even though cpu time is cheap, we should save it for useful work. > > Saving a few cycles while having the complicate the interface is not the > Python way. +1 on restoring the old sscanf code (or something similar > to it). For what it's worth, whenever I had network code that I wanted to accept names or addresses, I always distinguished them through an attempt using the platform inet_addr() system call. 
If that returns an error (-1), then I go ahead and process it as a name, otherwise I use the address it returns. inet_addr() will itself take care of validating that the address is legal (e.g., no octet over 255 and only up to 4 octets), padding values as necessary (e.g., x.y.z is processed as if z was a 16-bit value, x.z as if z was a 24-bit value, x as a 32-bit value), and permits decimal, octal or hexadecimal forms of the individual octets. I believe this behavior is portable and well defined. If you wanted the same code to work for IPv4 and IPv6, you'd probably want to use inet_pton() instead since inet_addr() only does IPv4, although that would lose the hex/octal options. You'd probably have to conditionalize that anyway since it might not be available on IPv4 only configurations, so I could see using inet_addr() for IPv4 and inet_pton() for IPv6. > ObTrivia: IP addresses can be written as a single number (at least for > many IP implementations). Try "ping 2130706433". That's part of the inet_addr() definition. When a single value is given as the string, it is assumed to be the complete 32-bit address value, and is stored directly without any byte rearrangement. So, 2130706433 is (127*2^24) + 1, or "127.0.0.1" - but then obviously you knew that :-) -- David From greg@cosc.canterbury.ac.nz Thu Apr 10 01:31:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 12:31:34 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> Anthony Baxter <anthony@interlink.com.au>: > The idea of either suppressing DNS lookups for all-numeric addresses, or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? Seems to me the basic problem is that we're representing two completely different things -- a DNS name and a raw IP address -- the same way, i.e.
as a string. A raw IP address should (at least optionally) be represented by something different, such as a tuple of ints. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Thu Apr 10 01:37:58 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 20:37:58 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: "Your message of Thu, 10 Apr 2003 12:31:34 +1200." <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> References: <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> Message-ID: <200304100037.h3A0bwt01972@pcp02138704pcs.reston01.va.comcast.net> > Seems to me the basic problem is that we're representing > two completely different things -- a DNS name and a raw > IP address -- the same way, i.e. as a string. > > A raw IP address should (at least optionally) be represented > by something different, such as a tuple of ints. Why? There's never any ambiguity about which kind is intended. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Thu Apr 10 02:10:44 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 13:10:44 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091848.h39IlpW31935@odiug.zope.com> Message-ID: <200304100110.h3A1Aij25025@oma.cosc.canterbury.ac.nz> Guido van Rossum <guido@python.org>: > AFAIK it's not possible to put something in the DNS so that an > all-numeric address gets remapped In that case, there's no problem at all, and I withdraw my suggestion about using tuples for numeric addresses.
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 10 02:15:05 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 13:15:05 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Marcus Mendenhall <marcus.h.mendenhall@vanderbilt.edu>: > Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) > {whatever} By the same token, checking whether the first char is a digit ought to weed out about 99.999% of all non-numeric domain name addresses. If this is even a problem, which I doubt. We're talking about something called from Python, for goodness sake... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From andrew@acooke.org Thu Apr 10 02:27:35 2003 From: andrew@acooke.org (andrew cooke) Date: Wed, 9 Apr 2003 21:27:35 -0400 (CLT) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> References: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Message-ID: <40894.127.0.0.1.1049938055.squirrel@127.0.0.1> this is a fragment from RFC 1034 (DOMAIN NAMES - CONCEPTS AND FACILITIES) http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1034.html i'm not 100% sure that this is the "normative" definition, but if it is then it clearly requires a non-numeric initial character for each label. 
(sorry if someone has already mentioned this!) andrew 3.5 Preferred name syntax The DNS specifications attempt to be as general as possible in the rules for constructing domain names. The idea is that the name of any existing object can be expressed as a domain name with minimal changes. However, when assigning a domain name for an object, the prudent user will select a name which satisfies both the rules of the domain system and any existing rules for the object, whether these rules are published or implied by existing programs. For example, when naming a mail domain, the user should satisfy both the rules of this memo and those in RFC-822. When creating a new host name, the old rules for HOSTS.TXT should be followed. This avoids problems when old software is converted to use domain names. The following syntax will result in fewer problems with many applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical. The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.
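The strict <label> rule in the grammar above is easy to transcribe into a quick check. The sketch below is my own illustration, not anything from the thread; and as later replies in this thread point out, real registrations such as 3m.com violate the leading-letter rule, so this captures the RFC's preferred syntax rather than current practice.

```python
import re

# RFC 1034 <label>: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits or hyphens, 63 chars max.
LABEL = re.compile(r"^[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_rfc1034_label(label):
    return len(label) <= 63 and LABEL.match(label) is not None

assert is_rfc1034_label("python")
assert not is_rfc1034_label("3m")      # leading digit fails the strict grammar
assert not is_rfc1034_label("foo-")    # trailing hyphen fails too
```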
-- http://www.acooke.org/andrew From tim.one@comcast.net Thu Apr 10 03:29:21 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 22:29:21 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBHEFAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCMEOEECAB.tim.one@comcast.net> [Greg Ewing] >> Failing that, perhaps they should be made a bit less dynamic, so that >> the GC can make reasonable assumptions about their existence without >> having to execute Python code. [Tim] > Guido already did so for new-style classes in Python 2.3. That machinery > doesn't exist in 2.2.2, and old-style classes remain a problem under 2.3 > too. Backward compatibility constrains how much we can get away with, of > course. FYI, those who study the checkin comments know how this ended. It ended well! gc no longer does anything except string-keyed dict lookups when determining whether a finalizer exists, for old- & new- style classes, and in 2.3 CVS & the 2.2 maintenance branch. The only incompatibilities appear to be genuine bug fixes. The hasattr() method was actually incorrect in two mondo obscure cases (one where hasattr said "yes, __del__ exists" when a finalizer couldn't actually be run, and the other where hasattr said "no, __del__ doesn't exist" when arbitrary Python code actually could be invoked by destructing an object). A new private API function _PyInstance_Lookup was added in 2.2 and 2.3, which does for old-style class instances what _PyType_Lookup does for new-style classes (determines whether an attribute exists via pure C string-keyed dict lookups). 
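For readers on current Python versions, here is a small sketch of the kind of mismatch described above: an attribute check that goes through hasattr() can disagree with a pure dict lookup. The class is invented for illustration (gc's actual fix used C-level dict lookups, not Python code).

```python
class Weird:
    # Invented example: fabricates a __del__ that no dict lookup can see.
    def __getattr__(self, name):
        if name == "__del__":
            return lambda: None   # arbitrary Python code runs here
        raise AttributeError(name)

w = Weird()
assert hasattr(w, "__del__")               # hasattr() triggers __getattr__
assert "__del__" not in type(w).__dict__   # a dict lookup sees no finalizer
```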
From gisle@ActiveState.com Thu Apr 10 04:45:52 2003 From: gisle@ActiveState.com (Gisle Aas) Date: 09 Apr 2003 20:45:52 -0700 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> References: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Message-ID: <lrvfxnm2vj.fsf@caliper.activestate.com> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > Marcus Mendenhall <marcus.h.mendenhall@vanderbilt.edu>: > > > Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) > > {whatever} > > By the same token, checking whether the first char is > a digit ought to weed out about 99.999% of all > non-numeric domain name addresses. 3m.com is a registered domain name. Regards, Gisle Aas, ActiveState From huey_jiang@yahoo.com Thu Apr 10 05:57:03 2003 From: huey_jiang@yahoo.com (Huey Jiang) Date: Wed, 9 Apr 2003 21:57:03 -0700 (PDT) Subject: [Python-Dev] Unicode Message-ID: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Hi There, I wonder how can I get python to support Chinese language? I noticed python has Unicode feature in version 2.2.2, but as I tried: >>> str = " a_char_in_chinese_lan" I encountered UnicodeError. How can I make this to work? Thanks! Huey __________________________________________________ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://tax.yahoo.com From Anthony Baxter <anthony@interlink.com.au> Thu Apr 10 05:58:20 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Thu, 10 Apr 2003 14:58:20 +1000 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <lrvfxnm2vj.fsf@caliper.activestate.com> Message-ID: <200304100458.h3A4wK816653@localhost.localdomain> >>> Gisle Aas wrote > Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > By the same token, checking whether the first char is > > a digit ought to weed out about 99.999% of all > > non-numeric domain name addresses. > > 3m.com is a registered domain name. 
As is 3com.com, and, for a more python-related example, 4suite.org. The latter also has an A record. 411.com and 911.com are both valid domains, as is 123.com. With the appropriate resolv.conf search path (ie including '.com'), you could enter '123' and expect to get back the address 64.186.10.158. Isn't the DNS fun. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From tim_one@email.msn.com Thu Apr 10 06:03:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 10 Apr 2003 01:03:42 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E949508.1030902@v.loewis.de> Message-ID: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> [Neil Schemenauer] >> Could the visit procedure keep track of errors? [Martin v. Löwis] > No. For get_referrers (as Tim explains), it might be acceptable but > less efficient (since traversal should stop when a the object is found > to be a referrer). For get_referents, an error in the callback should > really abort traversal as the system just went out of memory. Still, I expect both could be handled by setjmp in the gc module get_ref* driver functions and longjmp (as needed) in the gc module visitor functions. IOW, the tp_traverse slot functions don't really need to cooperate, or even know anything about "early returns". Why this may be more than just idly interesting: the tp_traverse functions are called a lot by gc. The get_ref* functions are never called except when explicitly asked for, and their speed just doesn't matter. Burdening them with funky control flow would be a real win if eliminating almost-always-useless test/branch constructs in often-called tp_traverse slots sped the latter. 
From drifty@alum.berkeley.edu Thu Apr 10 06:28:09 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Wed, 9 Apr 2003 22:28:09 -0700 (PDT) Subject: [Python-Dev] Unicode In-Reply-To: <20030410045703.14754.qmail@web20007.mail.yahoo.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Message-ID: <Pine.SOL.4.53.0304092226280.29205@death.OCF.Berkeley.EDU> [Huey Jiang] > Hi There, > > > I wonder how can I get python to support Chinese > language? This is the wrong place to ask this question. python-dev is meant to discuss the development of Python. Try emailing your question to python-list@python.org; you should be able to get some help there. -Brett From martin@v.loewis.de Thu Apr 10 06:30:19 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 10 Apr 2003 07:30:19 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409203319.GS1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> <20030409203319.GS1756@tummy.com> Message-ID: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Sean Reifschneider <jafo@tummy.com> writes: > I don't think anyone sane would create a top-level that's digits, > particularly in the range of 0 to 255. That probably means that > somebody is going to do it... ;-/ Indeed, Anthony brought the example of 911.com, which has been registered despite being illegal. Regards, Martin From martin@v.loewis.de Thu Apr 10 06:33:22 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 10 Apr 2003 07:33:22 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> Message-ID: <m3brzevrvh.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > Still, I expect both could be handled by setjmp in the gc module get_ref* > driver functions and longjmp (as needed) in the gc module visitor functions. > IOW, the tp_traverse slot functions don't really need to cooperate, or even > know anything about "early returns". That would require that tp_traverse does not modify any refcount while iterating, right? It seems unpythonish to use setjmp/longjmp for exceptions. Regards, Martin From martin@v.loewis.de Thu Apr 10 06:37:39 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 10 Apr 2003 07:37:39 +0200 Subject: [Python-Dev] Unicode In-Reply-To: <20030410045703.14754.qmail@web20007.mail.yahoo.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Message-ID: <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> Huey Jiang <huey_jiang@yahoo.com> writes: > I wonder how can I get python to support Chinese > language? I noticed python has Unicode feature in > version 2.2.2, but as I tried: > > >>> str = " a_char_in_chinese_lan" > > I encountered UnicodeError. How can I make this to > work? Hi Huey, This is a mailing list for the development *of* Python, not for questions about development *with* Python or for asking for help. In the specific example, you should do some more research on your own. For example, does it matter whether you use IDLE or the command line Python? Does it matter whether you use Unix or Windows? Does it matter whether you put the string into a source file or enter it in interactive mode?
[quick answer: all these things matter; in the cases where it doesn't work as you expect, causes vary widely] Regards, Martin From greg@cosc.canterbury.ac.nz Thu Apr 10 06:49:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 17:49:19 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304100549.h3A5nJp26297@oma.cosc.canterbury.ac.nz> > Indeed, Anthony brought the example of 911.com, which has been > registered despite being illegal. At least 911 is greater than 255, which unfortunately isn't the case for 123. But all these would be caught by requiring a full 4-number address before deciding it's numeric. (I don't think it's worth allowing for 0-padding if there are fewer than 4 numbers.) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From egg@ign.com Thu Apr 10 12:38:49 2003 From: egg@ign.com (Ponce Dubuque) Date: Thu, 10 Apr 2003 04:38:49 -0700 Subject: [Python-Dev] Unicode References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> Message-ID: <017b01c2ff55$c156a7e0$02fea8c0@HP> Whatever the rules prescribe, poor Mr Jiang has nonetheless done 'development' in Python. Perhaps you ought to consider re-naming the list. I am sure somewhere, someone has mislabeled a link saying that this is where one posts, when one does development in Python. However, I do not wish for this suggestion to be the source of some great controversy. Everyone knows that getting trapped in such trifles is the reason why open source most often gets nowhere.
From aahz@pythoncraft.com Thu Apr 10 13:22:43 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 10 Apr 2003 08:22:43 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <017b01c2ff55$c156a7e0$02fea8c0@HP> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> Message-ID: <20030410122243.GA17289@panix.com> On Thu, Apr 10, 2003, Ponce Dubuque wrote: > > Whatever the rules prescribe, poor Mr Jiang has nonetheless done > 'development' in Python. Perhaps you ought to consider re-naming the > list. I am sure somewhere, someone has mislabeled a link saying that > this is where one posts, when one does development in Python. What name would we pick to clearly indicate this? I am starting to think that a better idea would be to make python-dev a closed list (only subscribers may post), and have the subscription process include a challenge/response with a code word embedded in the list rules. If this capability isn't already in Mailman, I can think of several mailing lists that could use this capability. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From zooko@zooko.com Thu Apr 10 13:51:13 2003 From: zooko@zooko.com (Zooko) Date: Thu, 10 Apr 2003 08:51:13 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: Message from Aahz <aahz@pythoncraft.com> of "Thu, 10 Apr 2003 08:22:43 EDT."
<20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <E193bWL-0000Ij-00@localhost> Aahz wrote: > > I am starting to > think that a better idea would be to make python-dev a closed list (only > subscribers may post), and have the subscription process include a > challenge/response with a code word embedded in the list rules. This is how we run p2p-hackers [1] with Mailman and it works quite well to quell off-topic posts without, as far as I can tell, deterring any valuable posts. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://zgp.org/mailman/listinfo/p2p-hackers From barry@python.org Thu Apr 10 14:12:07 2003 From: barry@python.org (Barry Warsaw) Date: 10 Apr 2003 09:12:07 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <1049980327.28969.7.camel@anthem> On Thu, 2003-04-10 at 08:22, Aahz wrote: > What name would we pick to clearly indicate this? I am starting to > think that a better idea would be to make python-dev a closed list (only > subscribers may post), and have the subscription process include a > challenge/response with a code word embedded in the list rules. > > If this capability isn't already in Mailman, I can think of several > mailing lists that could use this capability. It's nearly there. You could set up an autoreply with the list guidelines and send that on the first post. What isn't there is a challenge/response subscription auto-enable, although I plan on adding something like this for Mailman 2.2. I'd rather not discuss this further on this list though. 
FWIW, python-dev /was/ a closed list at one point, with subscriptions requiring admin approval. At some point we didn't feel the overhead was worth it so we "quietly" changed the policy to allow mail-back confirmation subscriptions. I don't think we need to change things personally. IMO, we're already on the verge of spending more time discussing list policy than in simply handling the odd off-topic post <wink>. -Barry From vladimir.marangozov@optimay.com Thu Apr 10 14:40:44 2003 From: vladimir.marangozov@optimay.com (Marangozov, Vladimir (Vladimir)) Date: Thu, 10 Apr 2003 09:40:44 -0400 Subject: [Python-Dev] Re: _socket efficiency ideas Message-ID: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> Hi, About the DNS discussion, I'll chime in with some info. (I don't know what Python does about this and have no time to figure it out). The format of an Internet (IPv4) address is:

a.b.c.d - with all parts treated as 8 bits
a.b.c - with 'c' treated as 16 bits
a.b - with 'b' treated as 24 bits
a - with 'a' treated as 32 bits

You can try this out with ping 127.1; ping 127, etc. Any decent DNS resolver first tries to figure out whether the requested name string is an IP address. If it is, it doesn't send a query and immediately returns the numeric value of the string representation of the IP address. How a DNS resolver detects whether it should launch a query for the name 'name' varies from resolver to resolver, but basically, it does the following:

1. check for local resolution of 'name' (ex. if 'name' == 'localhost', return 127.0.0.1)
2. if inet_aton('name') succeeds, 'name' is an IP address; return the result from inet_aton.
3. if caching is enabled, check the cache for 'name'

If 1, 2 and 3 don't hold, send a DNS query. Caching is a separate/complementary issue and I agree that it should be left to the underlying resolver.
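The shorthand forms listed above can be seen from Python, since socket.inet_aton() is a thin wrapper over the C routine. This sketch is my own illustration and assumes a platform with a real inet_aton (e.g. glibc); all four spellings below name 127.0.0.1.

```python
import socket
import struct

# Every shorthand accepted by inet_aton() packs to the same 4 bytes:
# the final part is widened to fill the remaining bits of the address.
for form in ("127.0.0.1", "127.0.1", "127.1", "2130706433"):
    packed = socket.inet_aton(form)
    assert struct.unpack("!I", packed)[0] == 0x7F000001, form
```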
Cheers, Vladimir From paul-python@svensson.org Thu Apr 10 14:46:25 2003 From: paul-python@svensson.org (Paul Svensson) Date: Thu, 10 Apr 2003 09:46:25 -0400 (EDT) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030410094133.W71996-100000@familjen.svensson.org> On 10 Apr 2003, Martin v. Löwis wrote: >Sean Reifschneider <jafo@tummy.com> writes: > >> I don't think anyone sane would create a top-level that's digits, >> particularly in the range of 0 to 255. That probably means that >> somebody is going to do it... ;-/ > >Indeed, Anthony brought the example of 911.com, which has been >registered despite being illegal. Most of the alternate roots also carry the top-level domains .800 and .411. /Paul From guido@python.org Thu Apr 10 14:47:10 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 09:47:10 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: Your message of "Thu, 10 Apr 2003 08:51:13 EDT." <E193bWL-0000Ij-00@localhost> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> <E193bWL-0000Ij-00@localhost> Message-ID: <200304101347.h3ADlH603332@odiug.zope.com> Isn't it easier to just ignore the occasional off-topic post rather than trying to invent elaborate technological solutions to deal with what is essentially a social problem? I don't think there's much of a misunderstanding in the world about what python-dev is; it's probably more that some people want to get answers from the "smart crowd", which is known to hang out here. If we simply ignore inappropriate posts, or send polite redirections, we're doing the best we can. I'm for avoiding how-to discussions here. I'm against trying to keep people out of this list for any other reason than insistent obnoxiousness. Python-dev needs to be open.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Apr 10 19:09:57 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 14:09:57 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3brzevrvh.fsf@mira.informatik.hu-berlin.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> [Tim] > Still, I expect both could be handled by setjmp in the gc > module get_ref* driver functions and longjmp (as needed) in the > gc module visitor functions. IOW, the tp_traverse slot functions don't > really need to cooperate, or even know anything about "early returns". [martin@v.loewis.de] > That would require that tp_traverse does not modify any refcount while > iterating, right? Or do anything else that relies on calls to visit() returning. I've looked at every traverse slot in the core, and there's no problem with those. I don't think that's an accident -- the only purpose of an object's tp_traverse is to invoke the visit callback on the non-NULL PyObject* pointers the object has. So, e.g., there isn't an incref or decref in any of 'em now; at worst there's an int loop counter. > It seems unpythonish to use setjmp/longjmp for exceptions. I'm not suggesting adding setjmp/longjmp to the Python language <0.9 wink>. I'm suggesting using them for two specific and obscure gc module callbacks that aren't normally used (*most* of the gc module callbacks wouldn't use setjmp/longjmp); in return, mounds of frequently executed code like

static int
func_traverse(PyFunctionObject *f, visitproc visit, void *arg)
{
    int err;

    if (f->func_code) {
        err = visit(f->func_code, arg);
        if (err)
            return err;
    }
    if (f->func_globals) {
        err = visit(f->func_globals, arg);
        if (err)
            return err;
    }
    if (f->func_module) {
        err = visit(f->func_module, arg);
        if (err)
            return err;
    }
    if (f->func_defaults) {
        err = visit(f->func_defaults, arg);
        if (err)
            return err;
    }
    if (f->func_doc) {
        err = visit(f->func_doc, arg);
        if (err)
            return err;
    }
    ...
    return 0;
}

could become the simpler and faster

static int
func_traverse(PyFunctionObject *f, visitproc visit, void *arg)
{
    if (f->func_code) visit(f->func_code, arg);
    if (f->func_globals) visit(f->func_globals, arg);
    if (f->func_module) visit(f->func_module, arg);
    if (f->func_defaults) visit(f->func_defaults, arg);
    if (f->func_doc) visit(f->func_doc, arg);
    ...
    return 0;
}

(I kept the final return 0 so that the signature wouldn't change.) From nati@ai.mit.edu Thu Apr 10 19:55:43 2003 From: nati@ai.mit.edu (Nathan Srebro) Date: Thu, 10 Apr 2003 14:55:43 -0400 Subject: [Python-Dev] Super and properties In-Reply-To: <001401c2f926$1d32d7e0$a8130dd5@violante> References: <001401c2f926$1d32d7e0$a8130dd5@violante> Message-ID: <3E95BE2F.8010900@ai.mit.edu> Gonçalo Rodrigues wrote: > My problem has to do with super that does not seem to work well with > properties. I encountered similar problems, and wrote a class, 'duper', which behaves like 'super', but handles attributes defined by arbitrary descriptors cooperatively. It is available from http://www.ai.mit.edu/~nati/Python/ Nati From mal@lemburg.com Thu Apr 10 20:09:07 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 10 Apr 2003 21:09:07 +0200 Subject: [Python-Dev] Unicode In-Reply-To: <017b01c2ff55$c156a7e0$02fea8c0@HP> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> Message-ID: <3E95C153.8040104@lemburg.com> Ponce Dubuque wrote: > Whatever the rules prescribe, poor Mr Jiang has nonetheless done 'development' > in Python. Perhaps you ought to consider re-naming the list. I am sure > somewhere, someone has mislabeled a link saying that this is where one > posts, when one does development in Python. Perhaps you could find these links and suggest fixing them ? Python-Dev has always been a Python developer mailing list where Python language development is discussed and managed.
There are many other lists out there which deal with development using Python. > However, I do not wish for this suggestion to be the source of some great > controversy. Everyone knows that getting trapped in such trifles is the > reason why open source most often gets nowhere. I believe we've gone a looong way with Python :-) (even though these discussions come up every now and then). W/r to the subject, I suggest starting the Unicode discovery tour with the Python PEP 100: http://www.python.org/peps/pep-0100.html It has a list of references near the bottom which you can use to bootstrap the quest. Have fun, -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 10 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 75 days left From jeremy@zope.com Thu Apr 10 20:13:12 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Apr 2003 15:13:12 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <1050001991.4473.103.camel@slothrop.zope.com> On Thu, 2003-04-10 at 14:09, Tim Peters wrote: > I'm not suggesting adding setjmp/longjmp to the Python language <0.9 wink>. > I'm suggesting using them for two specific and obscure gc module callbacks > that aren't normally used (*most* of the gc module callbacks wouldn't use > setjmp/longjmp); in return, mounds of frequently executed code like ... > could become the simpler and faster ... Sure sounds good to me. If traverse worked this way, the traverse and clear slots and a part of the dealloc slot become almost identical. They take all PyObject * members in the struct and perform some action on them if they are non-NULL. dealloc performs a DECREF.
clear performs a DECREF + assign NULL. traverse calls visit. It sure makes it easy to verify that each is implemented correctly. It would be cool if there were a way to automate some of the boilerplate. Jeremy From misa@redhat.com Thu Apr 10 20:29:20 2003 From: misa@redhat.com (Mihai Ibanescu) Date: Thu, 10 Apr 2003 15:29:20 -0400 (EDT) Subject: [Python-Dev] More socket questions Message-ID: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Hello, Since somebody mentioned inet_addr, here's something else that I can attempt to fix if we agree on it. In python 2.2.2:

socket.inet_aton("255.255.255.255")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
socket.error: illegal IP address string passed to inet_aton

Implementation:

static PyObject*
PySocket_inet_aton(PyObject *self, PyObject *args)
{
#ifndef INADDR_NONE
#define INADDR_NONE (-1)
#endif
    /* Have to use inet_addr() instead */
    char *ip_addr;
    unsigned long packed_addr;

    if (!PyArg_ParseTuple(args, "s:inet_aton", &ip_addr)) {
        return NULL;
    }
#ifdef USE_GUSI1
    packed_addr = inet_addr(ip_addr).s_addr;
#else
    packed_addr = inet_addr(ip_addr);
#endif
    if (packed_addr == INADDR_NONE) {    /* invalid address */
        PyErr_SetString(PySocket_Error,
            "illegal IP address string passed to inet_aton");

The reason for this behaviour can be found in the man page for inet_addr: The inet_addr() function converts the Internet host address cp from numbers-and-dots notation into binary data in network byte order. If the input is invalid, INADDR_NONE (usually -1) is returned. This is an obsolete interface to inet_aton, described immediately above; it is obsolete because -1 is a valid address (255.255.255.255), and inet_aton provides a cleaner way to indicate error return. I propose that we use inet_aton to implement PySocket_inet_aton (am I clever or what). The part that I don't know: how portable is this function? Does it exist on Mac and Windows?
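The INADDR_NONE ambiguity described above is easy to demonstrate from the Python side. This sketch is my own illustration; note that modern CPython did switch to inet_aton() where available, so the call succeeds today.

```python
import socket
import struct

# 255.255.255.255 packs to 0xFFFFFFFF -- the very bit pattern that
# inet_addr() reuses as its error value, so an inet_addr()-based
# wrapper cannot tell this legal broadcast address from a failure.
packed = socket.inet_aton("255.255.255.255")
assert len(packed) == 4
assert struct.unpack("!I", packed)[0] == 0xFFFFFFFF
```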
Thanks, Misa From shane.holloway@ieee.org Thu Apr 10 20:47:46 2003 From: shane.holloway@ieee.org (Shane Holloway (IEEE)) Date: Thu, 10 Apr 2003 13:47:46 -0600 Subject: [Python-Dev] Why is spawn*p* not available on Windows? Message-ID: <3E95CA62.4040904@ieee.org> Ok, so here's my story. I got curious as to why the various spawn*p* were not available on Windows. The conclusion I came to is that only "spawnv" and "spawnve" are exported by posixmodule.c, and os.py creates the other variants in terms of these functions. However, the "spawnvp" and "spawnvpe" python implementations are dependent upon the availability of "fork". So, after all that, I looked in the standard library header file process.h and found function prototypes for the various _spawn functions. Would it make sense to add support for "spawnvp" and "spawnvpe" to posixmodule.c? Should it be guarded by the existing HAVE_SPAWNV, new HAVE_SPAWNVP, or by MS_WINDOWS definitions? Or, has someone already tried this with lessons learned? Thanks, -Shane From guido@python.org Thu Apr 10 20:57:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 15:57:30 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: Your message of "Sat, 05 Apr 2003 14:35:31 EST." <20030405193531.GA23455@meson.dyndns.org> References: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <200304101957.h3AJvUe04626@odiug.zope.com> > It occurred to me this afternoon (after answering a question about creating > file objects from file descriptors) that perhaps os.fdopen would be more > logically placed someplace else - of course it could also remain as > os.fdopen() for whatever deprecation period is warranted. > > Perhaps as a class method of the file type, file.fromfd()? > > Should I file a feature request for this on sf, or would it be considered > too much of a mindless twiddle to bother with? The latter.
If I had to do it over again, your suggestion would make sense; class methods are a good way to provide alternative constructors, and we're doing this e.g. for the new datetime class/module. But having this in the os module, which deals with such low-level file descriptors, still strikes me as a pretty decent place to put it as well, and I don't think it's worth the bother of updating documentation and so on. The social cost of deprecating a feature is pretty high. In general, I'm open to fixing design bugs if keeping the buggy design means forever having to explain a wart to new users, or forever having to debug bad code written because of a misunderstanding perpetuated by the buggy design (like int division). But in this case, I see no compelling reason; explaining how to do this isn't particularly easier or harder one way or the other. Responses to other messages in this thread: [Greg Ewing] > Not all OSes have the notion of a file descriptor, which is probably > why it's in the os module. Perhaps, but note that file objects have a method fileno(), which returns a file descriptor. Its implementation is not #ifdefed in any way -- the C stdio library requires fileno() to exist! Even if fdopen() really did need an #ifdef, it would be just as simple only to have the file.fdopen() class method when the C library defines fdopen() as it is to only have os.fdopen() under those conditions. [Oren Tirosh] > I don't see much point in moving it around just because the place > doesn't seem right but the fact that it's a function rather than a > method means that some things cannot be done in pure Python. > > I can create an uninitialized instance of a subclass of 'file' using > file.__new__(filesubclass) but the only way to open it is by name > using file.__init__(filesubclassinstance, 'filename'). A file > subclass cannot be opened from a file descriptor because fdopen > always returns a new instance of 'file'. 
> > If there was some way to open an uninitialized file object from a > file descriptor it would be possible, for example, to write a > version of popen that returns a subclass of file. It could add a > method for retrieving the exit code of the process, do something > interesting on __del__, etc. You have a point, but it's mostly theoretical: anything involving popen() should be done in C anyway, and this is no problem in C. > Here are some alternatives of where this could be implemented, > followed by what a Python implementation of os.fdopen would look > like: > > 1. New form of file.__new__ with more arguments: > > def fdopen(fd, mode='r', buffering=-1): > return file.__new__('(fdopen)', mode, buffering, fd) This violates the current invariant that __new__ doesn't initialize the file with a C-level FILE *. > 2. Optional argument to file.__init__: > > def fdopen(fd, mode='r', buffering=-1): > return file('(fdopen)', mode, buffering, fd) > > 3. Instance method (NOT a class method): > > def fdopen(fd, mode='r', buffering=-1): > f = file.__new__() > f.fdopen(fd, mode, buffering, '(fdopen)') > return f Hm, you seem to be implying that it should not be a class method because it should be possible to first create an uninitialized instance with __new__ (possibly of a subclass) and then initialize it separately. Perhaps. But since class methods also work for subclasses, I'm not sure I see the use case for this distinction. In any case I think this should wait until a future redesign of the stdio library, which will probably do some other refactoring (while staying compatible with the existing API). I've checked in some rough ideas in nondist/sandbox/sio/. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 10 21:21:32 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 16:21:32 -0400 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: Your message of "Tue, 01 Apr 2003 13:41:54 PST." 
<Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> References: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> Message-ID: <200304102021.h3AKLXl05134@odiug.zope.com> > We had a bug in one of our extension modules that caused a core dump in > PyErr_NormalizeException(). At the very top of the function (line 133) it > checks for a NULL type. I think it should have a "return" here so that > the code does not continue and thus dump core on line 153 when it calls > PyClass_Check(type). This should also make the comment not lie about > dumping core. ;) > > Just thought I'd pass it on.. Thanks! You're right, the comment is misleading and the call to PyErr_SetString() was bogus. Tim and Barry suggested to replace it with a call to Py_FatalError(), but I think that's wrong too: I found several places where PyErr_NormalizeException() is used and a few lines later a check is made whether the exception type is NULL, so I think ignoring this call is safer. I'll fix this in CVS, and backport to 2.2. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Thu Apr 10 21:26:05 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:26:05 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > could become the simpler and faster How much faster, and for what example? Beautiful is better than ugly. Regards, Martin From martin@v.loewis.de Thu Apr 10 21:30:43 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:30:43 +0200 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <m3istmhz7w.fsf@mira.informatik.hu-berlin.de> Aahz <aahz@pythoncraft.com> writes: > I am starting to think that a better idea would be to make > python-dev a closed list (only subscribers may post), and have the > subscription process include a challenge/response with a code word > embedded in the list rules. I agree with Guido that an occasional indication of the list's charter is not that annoying, and helps "silent" readers to focus their first posting on python-dev-related issues. Regards, Martin From jeremy@zope.com Thu Apr 10 21:31:27 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Apr 2003 16:31:27 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050006687.20054.108.camel@slothrop.zope.com> On Thu, 2003-04-10 at 16:26, Martin v. Löwis wrote: > Tim Peters <tim.one@comcast.net> writes: > > > could become the simpler and faster > > How much faster, and for what example? Beautiful is better than ugly. Doesn't "beautiful is better than ugly" mean that a little ugliness in the gcmodule allows all the client code to be beautiful? Jeremy From martin@v.loewis.de Thu Apr 10 21:33:13 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:33:13 +0200 Subject: [Python-Dev] Re: _socket efficiency ideas In-Reply-To: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> References: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> Message-ID: <m3el4ahz3q.fsf@mira.informatik.hu-berlin.de> "Marangozov, Vladimir (Vladimir)" <vladimir.marangozov@optimay.com> writes: > Any decent DNS resolver first tries to figure out whether > the requested name string is an IP address. So how come that the *very* recent netdb libraries do DNS lookups for "apparently numeric" addresses, whereas somewhat older libraries don't? Regards, Martin From zen@shangri-la.dropbear.id.au Thu Apr 10 21:35:07 2003 From: zen@shangri-la.dropbear.id.au (Stuart Bishop) Date: Fri, 11 Apr 2003 06:35:07 +1000 Subject: [Python-Dev] tzset In-Reply-To: <057832A9-5A91-11D7-8A30-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> On Thursday, March 20, 2003, at 04:01 PM, Stuart Bishop wrote: > I've submitted an update to SF: > http://www.python.org/sf/706707 > > This version should only build time.tzset if it accepts the TZ > environment > variable formats documented at: > http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html > So it shouldn't build under Windows. > > The last alternative would be to expose time.tzset if it exists at all, > and the test suite would simply check to make sure it doesn't raise > an exception. This would leave behaviour totally up to the OS, and the > corresponding lack of documentation in the Python library reference. The time.tzset patch is running fine. The outstanding issue is the test suite. I can happily run the existing tests on OS X, Redhat 7.2 and Solaris 2.8, but there are reports of odd behaviour that can only be attributed (as far as I can see) to broken time libraries. 
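[The behaviour the test suite exercises can be sketched as follows -- POSIX-only, and assuming a platform whose C library honours the standard TZ format, which is exactly the assumption in question:]

```python
import os
import time

# time.tzset() re-reads the TZ environment variable and updates
# time.timezone, time.tzname, etc.
os.environ["TZ"] = "UTC0"
time.tzset()
assert time.timezone == 0

os.environ["TZ"] = "EST5EDT"
time.tzset()
assert time.timezone == 5 * 3600   # seconds west of UTC
assert time.tzname[0] == "EST"
```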
Broken time libraries are fine - time.tzset() is at a basic level just a wrapper around the C library call and we can't take responsibility for the operating system's bad behavior. However, if the C library doesn't work as documented, we have no way of testing if the various time.* values are being updated correctly. I think these are the options: - Use the test suite as it stands at the moment, which may cause the test to fail on broken platforms. - Use the test suite as it stands at the moment, flagging this test as an expected failure on broken platforms. - Don't test - just make sure time.tzset() doesn't raise an exception or core dump. The code that populated time.tzname etc. has never had unit tests before, so its not like we are going backwards. This option means tzset(3) could be exposed on Windows (which I can't presently do, not having a Windows dev box available). - Make the checks for a sane tzset(3) in configure.in more paranoid, so time.tzset() is only built if your OS correctly parses the standard TZ environment variable format *and* can correctly do daylight savings time calculations in the southern hemisphere etc. -- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/ From martin@v.loewis.de Thu Apr 10 21:35:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:35:17 +0200 Subject: [Python-Dev] More socket questions In-Reply-To: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Message-ID: <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> Mihai Ibanescu <misa@redhat.com> writes: > I propose that we use inet_aton to implement PySocket_inet_aton (am I > clever or what). The part that I don't know, how portable is this > function? Does it exist on Mac and Windows? 
This is the tricky part of any such change: Nobody knows, and you have to test it on a wide variety of platforms before it is acceptable. That *at least* includes Windows, OS X, and one or two other flavours of Unix (Linux libc6 typically being one of them). Regards, Martin From martin@v.loewis.de Thu Apr 10 21:36:54 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:36:54 +0200 Subject: [Python-Dev] Why is spawn*p* not available on Windows? In-Reply-To: <3E95CA62.4040904@ieee.org> References: <3E95CA62.4040904@ieee.org> Message-ID: <m365pmhyxl.fsf@mira.informatik.hu-berlin.de> "Shane Holloway (IEEE)" <shane.holloway@ieee.org> writes: > So, after all that, I looked in standard library header file for > process.h and found function prototypes for the various _spawn > functions. Would it make sense to add support for "spawnvp" and > "spawnvpe" to posixmodule.c? Should it be guarded by the existing > HAVE_SPAWNV, new HAVE_SPAWNVP, or by MS_WINDOWS definitions? Adding a HAVE_SPAWNVP would be most appropriate, IMO. Regards, Martin From tim.one@comcast.net Thu Apr 10 21:33:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 16:33:05 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> [Tim] >> could become the simpler and faster [martin@v.loewis.de] > How much faster, Won't know until it's tried. > and for what example? Code that spends significant time in tp_traverse, presumably. > Beautiful is better than ugly. Which is another reason <wink> it would be nice to get rid of the endlessly repeated masses of ugly if (err) return err; incantations out of the many tp_traverse slots, in return for putting a little bit of setjmp/longjmp ugliness in exactly four functions hiding in a single module.
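[At the Python level, the tp_traverse/visit protocol discussed here is what powers gc.get_referents(): every pointer a traverse slot hands to the visit callback comes back as an entry in the result. A quick sketch:]

```python
import gc

# gc.get_referents() invokes an object's tp_traverse slot and records
# each object passed to the visit callback.
a, b = object(), object()
container = [a, b]
refs = gc.get_referents(container)
assert a in refs and b in refs
```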
From guido@python.org Thu Apr 10 21:38:58 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 16:38:58 -0400 Subject: [Python-Dev] More socket questions In-Reply-To: Your message of "Thu, 10 Apr 2003 15:29:20 EDT." <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Message-ID: <200304102039.h3AKd0c06207@odiug.zope.com> > Since somebody mention inet_addr, here's something else that I can attempt > to fix if we agree on it. > > In python 2.2.2: > > socket.inet_aton("255.255.255.255") > Traceback (most recent call last): > File "<stdin>", line 1, in ? > socket.error: illegal IP address string passed to inet_aton Check out Python 2.3, it's been fixed there. Unfortunately Windows only has inet_addr(), so it's still broken there. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 10 22:35:26 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 17:35:26 -0400 Subject: [Python-Dev] tzset In-Reply-To: Your message of "Fri, 11 Apr 2003 06:35:07 +1000." <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> References: <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: <200304102135.h3ALZQ613146@odiug.zope.com> > > I've submitted an update to SF: > > http://www.python.org/sf/706707 > > > > This version should only build time.tzset if it accepts the TZ > > environment > > variable formats documented at: > > http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html > > So it shouldn't build under Windows. > > > > The last alternative would be to expose time.tzset if it exists at all, > > and the test suite would simply check to make sure it doesn't raise > > an exception. This would leave behaviour totally up to the OS, and the > > corresponding lack of documentation in the Python library reference. > > The time.tzset patch is running fine. 
The outstanding issue is the > test suite. I can happily run the existing tests on OS X, Redhat 7.2 > and Solaris 2.8, but there are reports of odd behaviour that can > only be attributed (as far as I can see) to broken time libraries. The test passes for me on Red Hat 7.3. I tried it on Windows, and if I add "#define HAVE_WORKING_TZSET 1" to PC/pyconfig.h, timemodule.c compiles, but the tzset test fails with the error AssertionError: 69 != 1. This is on the line self.failUnlessEqual(time.daylight,1) That *could* be construed as a bug in the test, because the C library docs only promise that the daylight variable is nonzero. But if I fix that in the test by using bool(time.daylight), I get other failures, so I conclude that tzset() doesn't work the same way on Windows as the test expects. A simple solution would be to not provide tzset() on Windows. Time on Windows is managed sufficiently different that this might be okay. > Broken time libraries are fine - time.tzset() is at a basic level > just a wrapper around the C library call and we can't take > responsibility for the operating system's bad behavior. But is the observed behavior on Windows broken or not? I don't know. > However, if the C library doesn't work as documented, we have no way > of testing if the various time.* values are being updated correctly. Right. > I think these are the options: > - Use the test suite as it stands at the moment, which may cause the > test to fail on broken platforms. But we're not sure if the platform is broken or the test too stringent! > - Use the test suite as it stands at the moment, flagging this test > as an expected failure on broken platforms. Can't do that -- can flag only *skipped* tests as expected. > - Don't test - just make sure time.tzset() doesn't raise an > exception or core dump. The code that populated time.tzname > etc. has never had unit tests before, so its not like we are > going backwards. 
This option means tzset(3) could be exposed > on Windows (which I can't presently do, not having a Windows > dev box available). That would be acceptable to me. Since all we want is a wrapper around the C library tzset(), all we need to test for is that it does that. > - Make the checks for a sane tzset(3) in configure.in more > paranoid, so time.tzset() is only built if your OS correctly > parses the standard TZ environment variable format *and* can > correctly do daylight savings time calculations in the > southern hemisphere etc. Sounds like overprotective. I think that in those cases the tzset() function works fine, it's just the database of timezones that's different. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Thu Apr 10 22:04:55 2003 From: nas@python.ca (Neil Schemenauer) Date: Thu, 10 Apr 2003 14:04:55 -0700 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> References: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> Message-ID: <20030410210455.GA22300@glacier.arctrix.com> Tim Peters wrote: > [martin@v.loewis.de] > > Beautiful is better than ugly. > > Whish is another reason <wink> it would be nice to get rid of the endlessly > repeated masses of ugly > > if (err) > return err; > > incantations out of the many tp_traverse slots, in return for putting a > little bit of setjmp/longjmp ugliness in exactly four functions hiding in a > single module. I agree that concentrating the ugliness is good. However, how portable is setjmp/longjmp? The manual page I have says C99. Can we rely on it being available? If not, could we just disable the gcmodule functions that depend on it? 
Neil From tim.one@comcast.net Thu Apr 10 23:12:41 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 18:12:41 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030410210455.GA22300@glacier.arctrix.com> Message-ID: <BIEJKCLHCIOIHAGOKOLHEEOKFEAA.tim.one@comcast.net> [Neil Schemenauer] > I agree that concentrating the ugliness is good. However, how portable > is setjmp/longjmp? The manual page I have says C99. It's also C89, i.e. "ANSI C". > Can we rely on it being available? I think so. Note that we have three modules that use them now, although they're not compiled everywhere (readline, pcre, fpectl). > If not, could we just disable the gcmodule functions that depend on it? Jeremy and I have spent a lot of time tracking down leaks (in Python and in Zope) recently, and get_refer{rers, ents} have been invaluable. If we found a platform where {set,long}jmp didn't work, I'd be OK with disabling those two gc functions on that platform. Those functions aren't needed for normal gc operation, and it's not any platform I'm going to be using anyway <wink>. From skip@pobox.com Thu Apr 10 22:28:55 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 10 Apr 2003 16:28:55 -0500 Subject: [Python-Dev] More socket questions In-Reply-To: <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> Message-ID: <16021.57879.498864.472222@montanaro.dyndns.org> Martin> This is the tricky part of any such change: Nobody knows, and Martin> you have to test it on a wide variety of platforms before it is Martin> acceptable. That *atleast* includes Windows, OS X, and one or Martin> two other flavours of Unix (Linux libc6 typically being one of Martin> them). I can check Mac OS X off your list. 
Here's the start of the inet_aton man page:

    INET(3)               System Library Functions Manual               INET(3)

    NAME
         inet_aton, inet_addr, inet_network, inet_ntoa, inet_ntop, inet_pton,
         inet_makeaddr, inet_lnaof, inet_netof - Internet address manipulation
         routines
    ...

And here's the check from distutils:

    >>> import distutils.ccompiler
    >>> cc = distutils.ccompiler.new_compiler()
    >>> cc.has_function("inet_aton")
    True
    >>> cc.has_function("blecherous")
    ld: Undefined symbols: _blecherous
    False

(Note that has_function() isn't in cvs yet.) Skip From neal@metaslash.com Thu Apr 10 23:50:01 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 10 Apr 2003 18:50:01 -0400 Subject: [Python-Dev] backporting string changes to 2.2.3 Message-ID: <20030410225001.GN17847@epoch.metaslash.com> Just in case anybody missed it the first several times, there were several inconsistencies in the string methods/functions. This checkin should make everything consistent for 2.3. I'm planning to backport these string changes to 2.2.3. The reason is that methods on string objects already have the changes, only doc is being updated. The string module has the change for strip, but not lstrip/rstrip, and UserString doesn't have any.

    Modified Files:
        Doc/lib/libstring.tex: 1.49
        Lib/UserString.py: 1.17
        Lib/string.py: 1.68
        Lib/test/string_tests.py: 1.31
        Objects/stringobject.c: 2.208
        Objects/unicodeobject.c: 2.187

    Log Message:
    Attempt to make all the various string *strip methods the same.
    * Doc - add doc for when functions were added
    * UserString
    * string object methods
    * string module functions

'chars' is used for the last parameter everywhere. These changes will be backported, since part of the changes have already been made, but they were inconsistent.
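[The behaviour being made consistent -- the optional 'chars' argument accepted by all three strip variants:]

```python
s = "xxhelloxx"
# 'chars' names the set of characters to remove from the ends.
assert s.strip("x") == "hello"
assert s.lstrip("x") == "helloxx"
assert s.rstrip("x") == "xxhello"
# With no argument, whitespace is stripped.
assert "  hi  ".strip() == "hi"
```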
Neal From greg@cosc.canterbury.ac.nz Fri Apr 11 01:29:37 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:29:37 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <200304110029.h3B0TbA09063@oma.cosc.canterbury.ac.nz> > I've looked at every traverse slot in the core, and there's no problem > with those. I don't think that's an accident -- the only purpose of > an object's tp_traverse is to invoke the visit callback on the > non-NULL PyObject* pointers the object has. So, e.g., there isn't an > incref or decref in any of 'em now; But what about the *visit function*? You need to take account of what it might do as well. And if it's ever used for something beside GC, it could do anything. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:32:20 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:32:20 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <1050001991.4473.103.camel@slothrop.zope.com> Message-ID: <200304110032.h3B0WKM09071@oma.cosc.canterbury.ac.nz> > If traverse worked this way, the traverse and clear slots and a part > of the dealloc slot become almost identical. ... It would be cool if > there were a way to automate some of the boilerplate. There is... use Pyrex. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:39:53 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:39:53 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304101957.h3AJvUe04626@odiug.zope.com> Message-ID: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> > If I had to do it over again, your suggestion would make sense; > > But having this in the os module, which deals with such low-level file > descriptors, still strikes me as a pretty decent place to put it as > well, and I don't think it's worth the bother of updating > documentation and so on. I can think of another reason for making it a class method: so that custom subclasses of file, or other file-like objects, can override it to create objects of the appropriate type. But since it is an os-dependent feature, the implementation of it probably does belong in the os module. So how about providing a file.fromfd() which calls os.fdopen()? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:46:40 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:46:40 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304101957.h3AJvUe04626@odiug.zope.com> Message-ID: <200304110046.h3B0kex09103@oma.cosc.canterbury.ac.nz> > but note that file objects have a method fileno(), which > returns a file descriptor. Its implementation is not #ifdefed in any > way -- the C stdio library requires fileno() to exist! Hmmm, I wasn't sure whether fileno() was a required part of stdio, or whether it only existed on unix-like systems. 
If it really is required, I guess it doesn't have to be in the os module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Fri Apr 11 01:48:19 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 20:48:19 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: "Your message of Fri, 11 Apr 2003 12:39:53 +1200." <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> References: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> Message-ID: <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> > > If I had to do it over again, your suggestion would make sense; > > > > But having this in the os module, which deals with such low-level > > file descriptors, still strikes me as a pretty decent place to put > > it as well, and I don't think it's worth the bother of updating > > documentation and so on. > > I can think of another reason for making it a class > method: so that custom subclasses of file, or other > file-like objects, can override it to create objects > of the appropriate type. Yeah, this was the gist of Oren's post (if I understood it correctly). > But since it is an os-dependent feature, the implementation > of it probably does belong in the os module. > > So how about providing a file.fromfd() which calls > os.fdopen()? I've never seen anyone code a file subclass yet, let alone one that needed this. YAGNI? 
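[A sketch of the classmethod-as-alternative-constructor pattern under discussion, using a hypothetical file-like wrapper (FDFile is invented for illustration; it is not the real 'file' type):]

```python
import os

class FDFile:
    """Hypothetical file-like wrapper, for illustration only."""

    def __init__(self, name):
        self._fd = os.open(name, os.O_RDONLY)

    @classmethod
    def fromfd(cls, fd):
        # Alternative constructor: because cls is whatever class the
        # method was called on, subclasses get instances of themselves.
        self = object.__new__(cls)
        self._fd = fd
        return self

    def read(self, n=4096):
        return os.read(self._fd, n)

class PipeFile(FDFile):
    pass

r, w = os.pipe()
os.write(w, b"data")
os.close(w)
f = PipeFile.fromfd(r)          # a PipeFile, not a base FDFile
assert isinstance(f, PipeFile)
assert f.read() == b"data"
```

This is the property a module-level os.fdopen() cannot provide: it always returns an instance of the base type.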
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri Apr 11 01:54:01 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:54:01 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> Message-ID: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> > it would be nice to get rid of the endlessly repeated ... ugly > incantations out of the many tp_traverse slots, in return for putting > a little bit of setjmp/longjmp ugliness in exactly four functions > hiding in a single module. I'd be pretty nervous about having any longjmps anywhere near anything Python. If you do this, you'll have to make it very clear that tp_traverse implementations MUST NOT alter any Python ref counts, or rely in any other way on running to completion. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Apr 11 03:49:01 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 22:49:01 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110029.h3B0TbA09063@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCIEPOECAB.tim.one@comcast.net> [Greg Ewing] > But what about the *visit function*? You need to take > account of what it might do as well. And if it's ever > used for something beside GC, it could do anything. I don't see the relevance. The visit functions are where the longjmps would go, if a visit function felt like using one. Two visit functions in gcmodule.c would use them, the other visit functions in gcmodule.c would not. 
I don't know of any visit functions not in gcmodule.c (where they all have static scope), nor do I expect to see any outside of gcmodule.c -- visit functions are Python internals. tp_clear and tp_traverse functions must be supplied by extension authors who want their types to play with the gc system, but extension authors are never required (or even asked) to write a visit function. From tim.one@comcast.net Fri Apr 11 03:51:27 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 22:51:27 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCCEPPECAB.tim.one@comcast.net> [Greg Ewing] > I'd be pretty nervous about having any longjmps anywhere > near anything Python. Why? > If you do this, you'll have to make it very clear that > tp_traverse implementations MUST NOT alter any Python > ref counts, or rely in any other way on running to > completion. That's so. For reasons explained earlier, it would be quite surprising to see a tp_traverse function play with anything's refcount (their purpose is to pass an object's PyObject* pointers on to the callback argument, and that's all; manipulating refcounts during this wouldn't make sense). From tim.one@comcast.net Fri Apr 11 04:02:06 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:02:06 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304110046.h3B0kex09103@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCCEAAEDAB.tim.one@comcast.net> [Greg Ewing] > Hmmm, I wasn't sure whether fileno() was a required part of stdio, or > whether it only existed on unix-like systems. If it really is > required, I guess it doesn't have to be in the os module. It's not required by standard C -- standard C has only streams, not file descriptors. 
Nevertheless, POSIX requires them, and uses of fileno() in Python are unconditional (aren't conditionally compiled depending on config symbols), so they're on every platform Python links on today. From greg@cosc.canterbury.ac.nz Fri Apr 11 04:04:35 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 15:04:35 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEPPECAB.tim.one@comcast.net> Message-ID: <200304110304.h3B34ZZ12973@oma.cosc.canterbury.ac.nz> > it would be quite surprising to see a tp_traverse function play with > anything's refcount (their purpose is to pass an object's PyObject* > pointers on to the callback argument, and that's all A thought -- maybe tp_visit and tp_clear could be unified by having a tp_visit that passed pointers to pointers to objects to the callback? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Apr 11 04:41:03 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:41:03 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110304.h3B34ZZ12973@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> [Greg Ewing] > A thought -- maybe tp_visit and tp_clear could be unified > by having a tp_visit that passed pointers to pointers to > objects to the callback? I think Jeremy suggested something like that earlier today. I don't think it would fly now. 
tuples are the simplest example of a gc container object whose tp_clear and tp_traverse slot functions do radically different things (the tuple tp_clear is NULL!); type objects may be the most complex example (see the long comment block in typeobject.c's type_clear for an explanation of why only tp_mro is-- or needs to be --cleared). In general, tp_traverse needs to reveal every PyObject* that may be part of a cycle, but tp_clear only needs to nuke the subset of those necessary to guarantee that all cycles will be broken. OTOH, I suspect Guido thought too hard about this. Like the tp_clear comment: tp_dict: It is a dict, so the collector will call its tp_clear. If type_clear decrefed tp_dict, and the refcount fell to 0 thereby, the usual refcount mechanism would nuke the dict on its own, and the collector would *not* in fact call the dict's tp_clear slot (the dict object would get unlinked from the gc list it was in, and the collector would never see the dict again). So I'm unclear on what we're trying to optimize when a tp_clear nukes less than the corresponding tp_traverse visits. I suppose "code space" is one decent answer to that. From tim.one@comcast.net Fri Apr 11 04:52:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:52:43 -0400 Subject: [Python-Dev] tzset In-Reply-To: <200304102135.h3ALZQ613146@odiug.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> [Guido] > The test passes for me on Red Hat 7.3. > > I tried it on Windows, and if I add "#define HAVE_WORKING_TZSET 1" to > PC/pyconfig.h, timemodule.c compiles, but the tzset test fails with > the error AssertionError: 69 != 1. This is on the line > > self.failUnlessEqual(time.daylight,1) > > That *could* be construed as a bug in the test, because the C library > docs only promise that the daylight variable is nonzero. That's all the MS docs promise too. You're actually getting ord("E"), the first letter in "EDT". 
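The 69 in the failing assertion is no mystery once spelled out; a quick sketch of the point, and of the portable check (truthiness rather than equality with 1):

```python
import time

# C only promises that the daylight variable is *nonzero* when DST
# rules apply; on the Windows build described above it happened to
# equal ord("E") == 69, the first byte of "EDT".
assert ord("E") == 69

# The portable spelling tests truthiness, not equality with 1:
has_dst = bool(time.daylight)
```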
> But if I fix that in the test by using bool(time.daylight), I get other > failures, so I conclude that tzset() doesn't work the same way on Windows as the > test expects. You can read the docs. It doesn't work on Windows the way anyone expects <0.5 wink>: http://tinyurl.com/9a2n > ... > But is the observed behavior on Windows broken or not? I don't know. It probably works as documented, but Real Windows Weenies use the native Win32 time zone functions. > ... > That would be acceptable to me. Since all we want is a wrapper around > the C library tzset(), all we need to test for is that it does that. It's not really what I want. When we expose highly platform-dependent functions, we create a lot of confusion along with them. Perhaps that's because we're not always careful to emphasize that the behavior is a cross-platform crapshoot, and users are rarely careful to heed such warnings. From martin@v.loewis.de Fri Apr 11 06:08:56 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 11 Apr 2003 07:08:56 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> Message-ID: <m3ptnty61j.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > So I'm unclear on what we're trying to optimize when a tp_clear nukes less > than the corresponding tp_traverse visits. I suppose "code space" is one > decent answer to that. In the case of type objects, it's not a matter of optimization but of correctness. If you were clearing all slots of a type object, you'd lose state that is still needed later on; see the comment for typeobject.c:2.150. 
Regards, Martin From boris.boutillier@arteris.net Fri Apr 11 09:25:21 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 11 Apr 2003 10:25:21 +0200 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: <20030410225001.GN17847@epoch.metaslash.com> References: <20030410225001.GN17847@epoch.metaslash.com> Message-ID: <1050049521.1751.16.camel@elevedelix> Hi everybody, As this is my first message on this development list I'll introduce myself: I am a hardware designer at a new French startup, Arteris, which is developing microelectronics IP cores. I'm responsible for development of EDA tools, i.e. software to design and validate hardware designs. For this purpose we've been developing an EDA design platform written entirely in Python (with Python-C parts for the core database) for about 14 months. This platform has been in active use for about three months and is working well. Now I'd like to give some help in developing Python, using my own experience to try to improve this great language. I'll start simple here (we've got other ideas, but I'll bring them up once there is some kind of first draft). On string objects there are find and rfind, and lstrip and rstrip, but there is no rsplit function. Is there a reason why not, or is it only because nobody has implemented it? (In that case I'll propose a patch in a few days.) I'm mainly using it for 'toto.titi.tata'.rsplit('.',1) -> 'toto.titi','tata', as our internal database representation is quite like a logical filesystem. -- Boris Boutillier - boris.boutillier@arteris.net On Fri, 2003-04-11 at 00:50, Neal Norwitz wrote: > > Just in case anybody missed it the first several times, > there were several inconsistencies in the string methods/functions. > This checkin should make everything consistent for 2.3. > > I'm planning to backport these string changes to 2.2.3. > The reason is that methods on string objects already have the > changes, only doc is being updated.
The string module has > the change for strip, but not lstrip/rstrip, and UserString > doesn't have any. > > Modified Files: > > Doc/lib/libstring.tex: 1.49 > Lib/UserString.py: 1.17 > Lib/string.py: 1.68 > Lib/test/string_tests.py: 1.31 > Objects/stringobject.c: 2.208 > Objects/unicodeobject.c: 2.187 > > Log Message: > > Attempt to make all the various string *strip methods the same. > * Doc - add doc for when functions were added > * UserString > * string object methods > * string module functions > 'chars' is used for the last parameter everywhere. > > These changes will be backported, since part of the changes > have already been made, but they were inconsistent. > > Neal > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From vladimir.marangozov@optimay.com Fri Apr 11 09:38:12 2003 From: vladimir.marangozov@optimay.com (Marangozov, Vladimir (Vladimir)) Date: Fri, 11 Apr 2003 04:38:12 -0400 Subject: [Python-Dev] Re: More socket questions Message-ID: <58C1D0B500F92D469E0D73994AB66D040107EC29@GE0005EXCUAG01.ags.agere.com> Hi, inet_aton() is a pretty simple parser of an IP address string, but it is not available on all setups. Libraries relying on it usually provide a local version. So do the same. Search the Web for "inet_aton.c" and you'll hit a standard implementation, with all the niceties about the base encoding of each part of the IP address which follows the C convention: 0x - hex, 0 - octal, other - decimal. And thus, BTW, "ping 192.30.20.10" is not the same as "ping 192.030.020.010". So take that code, stuff it in my_inet_aton() and case closed. You could use my_inet_aton() before calling gethostbyname('name') to see whether 'name' is an IP address and return immediately, but as I said, decent resolvers should do that for you. After all, their job is to give you an IP address in return. If you feed an IP address as an input, you should get it as a reply. 
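The base conventions Vladimir describes can be sketched in Python as well (my_inet_aton is the hypothetical local helper he names, simplified here to the four-part dotted form only):

```python
import struct

def my_inet_aton(s):
    """Parse a dotted IP string using the C base conventions:
    0x... is hex, a leading 0 is octal, anything else is decimal.
    Returns 4 packed bytes, or None if the string is not a valid
    address.  (A simplified sketch -- the real inet_aton() also
    accepts 1-, 2- and 3-part forms.)"""
    parts = s.split(".")
    if len(parts) != 4:
        return None
    octets = []
    for p in parts:
        try:
            if p[:2].lower() == "0x":
                v = int(p[2:], 16)
            elif p.startswith("0") and len(p) > 1:
                v = int(p, 8)
            else:
                v = int(p, 10)
        except ValueError:
            return None
        if not 0 <= v <= 255:
            return None
        octets.append(v)
    return struct.pack("4B", *octets)

# The octal pitfall from the post: these name different hosts.
assert my_inet_aton("192.30.20.10") != my_inet_aton("192.030.020.010")
```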
Not all resolvers are decent, though. On top of that, some have bugs :-). I can't answer the question about netdb's status quo. Cheers, Vladimir From mwh@python.net Fri Apr 11 13:03:56 2003 From: mwh@python.net (Michael Hudson) Date: Fri, 11 Apr 2003 13:03:56 +0100 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Fri, 11 Apr 2003 12:54:01 +1200 (NZST)") References: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> Message-ID: <2mwui16y1f.fsf@starship.python.net> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > I'd be pretty nervous about having any longjmps anywhere > near anything Python. Too late, if you use readline and ever press ^C. Cheers, M. -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From harri@labs.trema.com Fri Apr 11 13:16:47 2003 From: harri@labs.trema.com (Harri Pasanen) Date: Fri, 11 Apr 2003 14:16:47 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures Message-ID: <200304111416.47006.harri.pasanen@trema.com> Hello, In a few hours old CVS checkout, I'm having problems getting the embedded python to work. 
---------8<------------8<-------------8<-------------8<----------------- #include <Python.h> char* cmd = "import sys; print sys.path\n" "import re; print dir(re)\n"; int main() { Py_Initialize(); printf("Initialize done\n"); PyRun_SimpleString(cmd); Py_Finalize(); return 0; } ---------8<------------8<-------------8<-------------8<----------------- The import of re seems to succeed only halfway; the output is: Initialize done ['f:\\trema\\fk-dev\\tools\\python\\PCbuild\\python23.zip', 'f:\\trema\\fk-dev\\tools\\python\\DLLs', 'f:\\trema\\fk-dev\\tools\\python\\lib', 'f:\\trema\\fk-dev\\tools\\python\\lib\\plat-win', 'f:\\trema\\fk-dev\\tools\\python\\lib\\lib-tk', 'f:\\trema\\fk-dev\\tools\\python\\Demo\\embed', 'f:\\trema\\fk-dev\\tools\\python', 'f:\\trema\\fk-dev\\tools\\python\\lib\\site-packages'] ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] So the re namespace is lacking everything from sre. On Linux it works both embedded and from the interactive interpreter. On Win2K the interactive interpreter seems to work fine. On Win2K, I have this working ok using Python 2.2.2. What gives? Harri From guido@python.org Fri Apr 11 15:27:50 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:27:50 -0400 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: Your message of "11 Apr 2003 10:25:21 +0200." <1050049521.1751.16.camel@elevedelix> References: <20030410225001.GN17847@epoch.metaslash.com> <1050049521.1751.16.camel@elevedelix> Message-ID: <200304111428.h3BERvm14364@odiug.zope.com> > On string objects there are find and rfind, and lstrip and rstrip, but > there is no rsplit function. Is there a reason why not, or is it only > because nobody has implemented it? (In that case I'll propose a patch > in a few days.) I'm mainly using it for > 'toto.titi.tata'.rsplit('.',1) -> 'toto.titi','tata', as our internal > database representation is quite like a logical filesystem.
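For reference, the one-level right split requested above is easy to hand-code with rfind() (rsplit1 is a hypothetical helper, not a string method in 2.2/2.3):

```python
def rsplit1(s, sep):
    """Split s on the *last* occurrence of sep -- the special
    case asked for above, built from rfind()."""
    i = s.rfind(sep)
    if i < 0:
        return [s]
    return [s[:i], s[i + len(sep):]]

assert rsplit1("toto.titi.tata", ".") == ["toto.titi", "tata"]
```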
I think the reason is that there isn't enough need for it. The special case of s.rsplit(c, 1) can be coded so easily by using rfind() that I don't see the need to add it. Our Swiss Army Knife string type is beginning to be so loaded with features that I am reluctant to add more. The cost of a new feature these days is measured in the number of books that need to be updated, not the number of lines of code needed to implement it. For your amusement only (! :-), I offer this implementation of rsplit(), which works in Python 2.3: def rsplit(string, sep, count=-1): L = [part[::-1] for part in string[::-1].split(sep[::-1], count)] L.reverse() return L --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 11 15:50:36 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:50:36 -0400 Subject: [Python-Dev] tzset In-Reply-To: Your message of "Thu, 10 Apr 2003 23:52:43 EDT." <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> Message-ID: <200304111450.h3BEocd14466@odiug.zope.com> > > That would be acceptable to me. Since all we want is a wrapper > > around the C library tzset(), all we need to test for is that it > > does that. > > It's not really what I want. When we expose highly > platform-dependent functions, we create a lot of confusion along > with them. Perhaps that's because we're not always careful to > emphasize that the behavior is a cross-platform crapshoot, and users > are rarely careful to heed such warnings. I guess we shouldn't expose the Windows version of tzset() at all. The syntax it accepts and the rules it applies (always following US DST rules) make it pretty useless. OTOH I think tzset() is useful on most Unix and Linux platforms, and there's no easy alternative (short of wrapping the tz library, which would be a huge task), so there we should expose it. I believe this means that Stuart's patch can be checked in as is. 
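The Unix-side behavior being wrapped (which became time.tzset() in 2.3) looks roughly like this; the hasattr guard is the point, since the function only exists where the platform tzset(3) is usable:

```python
import os
import time

# Guarded sketch: time.tzset() is absent on platforms (notably
# Windows, per the discussion above) whose C tzset() is unusable.
if hasattr(time, "tzset"):
    os.environ["TZ"] = "UTC"
    time.tzset()
    assert time.timezone == 0   # UTC is zero seconds west of UTC
    assert time.daylight == 0   # and has no DST rules
```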
We can tweak it based on reports during the beta cycle. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 11 15:53:39 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:53:39 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: Your message of "Thu, 10 Apr 2003 23:41:03 EDT." <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> Message-ID: <200304111453.h3BErjk14482@odiug.zope.com> > So I'm unclear on what we're trying to optimize when a tp_clear > nukes less than the corresponding tp_traverse visits. I suppose > "code space" is one decent answer to that. Yes. Though the type object example shows there are other differences (thanks Martin). --Guido van Rossum (home page: http://www.python.org/~guido/) From boris.boutillier@arteris.net Fri Apr 11 16:04:45 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 11 Apr 2003 17:04:45 +0200 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: <200304111428.h3BERvm14364@odiug.zope.com> References: <20030410225001.GN17847@epoch.metaslash.com> <1050049521.1751.16.camel@elevedelix> <200304111428.h3BERvm14364@odiug.zope.com> Message-ID: <1050073485.1828.23.camel@elevedelix> I see, I didn't think about all the documentation that needs updating, and I should have, as I've got the same problem in my own project :). > I think the reason is that there isn't enough need for it. The > special case of s.rsplit(c, 1) can be coded so easily by using rfind() > that I don't see the need to add it. Our Swiss Army Knife string type > is beginning to be so loaded with features that I am reluctant to add > more. The cost of a new feature these days is measured in the number > of books that need to be updated, not the number of lines of code > needed to implement it. > > For your amusement only (!
:-), I offer this implementation of > rsplit(), which works in Python 2.3: > > def rsplit(string, sep, count=-1): > L = [part[::-1] for part in string[::-1].split(sep[::-1], count)] > L.reverse() > return L I hadn't thought of this one; tricky and amusing. -- Boris Boutillier - Boris.Boutillier@arteris.net From barry@python.org Fri Apr 11 18:51:56 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 13:51:56 -0400 Subject: [Python-Dev] Changes to gettext.py for Python 2.3 Message-ID: <1050083516.11172.40.camel@barry> Hi I18n-ers, I plan on checking in the following changes to the gettext.py module for Python 2.3, based on feedback from the Zope and Mailman i18n work. Here's a summary of the changes; hopefully there aren't too many controversies <wink>. I'll update the tests and the docs at the same time. - Expose NullTranslations and GNUTranslations in __all__ - Set the default charset to iso-8859-1. It used to be None, which would cause problems with .ugettext() if the file had no charset parameter. Arguably, the po/mo file would be broken, but I still think iso-8859-1 is a reasonable default. - Add a "coerce" default argument to GNUTranslations's constructor. The reason for this is that in Zope, we want all msgids and msgstrs to be Unicode. For the latter, we could use .ugettext(), but there isn't currently a mechanism for Unicode-ifying msgids. The plan then is that the charset parameter specifies the encoding for both the msgids and msgstrs, and both are decoded to Unicode when read. For example, we might encode po files with utf-8. I think the GNU gettext tools don't care. Since this could potentially break code [*] that wants to use the encoded interface .gettext(), the constructor flag is added, defaulting to False. Most code, I suspect, will want to set this to True and use .ugettext().
- A few other minor changes from the Zope project, including asserting that a zero-length msgid must have a Project-ID-Version header for it to be counted as the metadata record. -Barry [*] I've come to the opinion that using anything other than Unicode msgids and msgstrs just won't work well for Python, and thus you really should be using the .ugettext() method everywhere. It's also insane to mix .gettext() and .ugettext(). In Zope, all human-readable messages will be Unicode strings internally, so we definitely want Unicode msgids. From exarkun@intarweb.us Fri Apr 11 19:11:57 2003 From: exarkun@intarweb.us (Jp Calderone) Date: Fri, 11 Apr 2003 14:11:57 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> References: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030411181157.GA32603@meson.dyndns.org> On Thu, Apr 10, 2003 at 08:48:19PM -0400, Guido van Rossum wrote: > [Greg Ewing] > > But since it is an os-dependent feature, the implementation > > of it probably does belong in the os module. > > > > So how about providing a file.fromfd() which calls > > os.fdopen()? > > I've never seen anyone code a file subclass yet, let alone one that > needed this. YAGNI? > codecs.EncodedFile seems almost like it should (but it's just a factory function). Other than that I can't think of anything that does or that would benefit from doing so. Jp -- Lowery's Law: If it jams -- force it. If it breaks, it needed replacing anyway.
-- up 22 days, 15:01, 3 users, load average: 1.05, 1.11, 1.16 From martin@v.loewis.de Fri Apr 11 20:54:50 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 11 Apr 2003 21:54:50 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050083516.11172.40.camel@barry> References: <1050083516.11172.40.camel@barry> Message-ID: <3E971D8A.5020006@v.loewis.de> Barry Warsaw wrote: > - Set the default charset to iso-8859-1. It used to be None, which > would cause problems with .ugettext() if the file had no charset > parameter. Arguably, the po/mo file would be broken, but I still think > iso-8859-1 is a reasonable default. I'm -1 here. Why do you think it is a reasonable default? Errors should never pass silently. Unless explicitly silenced. While iso-8859-1 might be a reasonable default in other application domains, in the context of non-English text (which it typically is), assuming Latin-1 is bound to create mojibake. If your application can accept creating mojibake, I suggest a method setdefaultencoding on the catalog, which has no effect if an encoding was found in the catalog. > - Add a "coerce" default argument to GNUTranslations's constructor. The > reason for this is that in Zope, we want all msgids and msgstrs to be > Unicode. For the latter, we could use .ugettext() but there isn't > currently a mechanism for Unicode-ifying msgids. Could you please explain in what context this is needed? msgids are ASCII, and you can pass a Unicode string to ugettext just fine. > The plan then is that the charset parameter specifies the encoding for both the msgids and msgstrs, and both are decoded to Unicode when read.
> For example, we might encode po files with utf-8. I think the GNU > gettext tools don't care. They complain loudly if they find bytes > 127 in the msgid. > Since this could potentially break code [*] that wants to use the > encoded interface .gettext(), the constructor flag is added, defaulting > to False. Most code I suspect will want to set this to True and use > .ugettext(). To avoid breakage, you could define ugettext as def ugettext(self, message): if isinstance(message, unicode): tmsg = self._catalog.get(message.encode(self._charset)) if tmsg is None: return message else: tmsg = self._catalog.get(message, message) return unicode(tmsg, self._charset) > - A few other minor changes from the Zope project, including asserting > that a zero-length msgid must have a Project-ID-Version header for it to > be counted as the metadata record. That test was there, and removed on request of Bruno Haible, the GNU gettext maintainer, as he points out that Project-ID-Version is not mandatory for the metadata (see Patch #700839). Regards, Martin From barry@python.org Fri Apr 11 21:26:59 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 16:26:59 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E971D8A.5020006@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> Message-ID: <1050092819.11172.89.camel@barry> On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote: > Barry Warsaw wrote: > > > - Set the default charset to iso-8859-1. It used to be None, which > > would cause problems with .ugettext() if the file had no charset > > parameter. Arguably, the po/mo file would be broken, but I still think > > iso-8859-1 is a reasonable default. > > I'm -1 here. Why do you think it is a reasonable default? > > Errors should never pass silently. > Unless explicitly silenced. 
> > While iso-8859-1 might be a reasonable default in other application > domains, in the context of non-English text (which it typically is), > assuming Latin-1 is bound to create mojibake. Okay, never mind, I'll back this one out. The problem was caused by my other patch to unicode-ify on read (see below) without first having a charset. I have a different fix for this. > > - Add a "coerce" default argument to GNUTranslations's constructor. The > > reason for this is that in Zope, we want all msgids and msgstrs to be > > Unicode. For the latter, we could use .ugettext() but there isn't > > currently a mechanism for Unicode-ifying msgids. > > Could you please in what context this is needed? msgids are ASCII, and > you can pass a Unicode string to ugettext just fine. In Zope, all strings are Unicode and the catalog may include messages that are extracted from places other than Python source code, e.g. XML-based files. Message ids can contain non-ASCII characters if they are written by a non-English coder. I think in that case, we'd want to do something like encode the strings possibly with utf-8 for the .po/.mo files, but we want them decoded in time to look the Unicode strings up in the catalog. Similarly, what happens if a non-English coder writes an i18n'd Python module with native strings, possibly using a Python 2.3 coding cookie. We'd want their message ids to be extracted into the .mo/.po files, right? > > The plan then is that the charset parameter specifies the encoding for > > both the msgids and msgstrs, and both are decoded to Unicode when read. > > For example, we might encode po files with utf-8. I think the GNU > > gettext tools don't care. > > They complain loudly if they find bytes > 127 in the msgid. Really? Ok, I'm still confused because I tried the following example: I wrote a .mo file (charset=utf-8) with the following record: #: nofile:0 msgid "ab\xc3\x9e" msgstr "\xc2\xa4yz" I used standard msgfmt to turn that into a .mo file. 
Then created a GNUTranslation(fp, coerce=True) and called >>> t.ugettext(u'ab\xde') u'\xa4yz' This is what I should expect, right? ;) > > - A few other minor changes from the Zope project, including asserting > > that a zero-length msgid must have a Project-ID-Version header for it to > > be counted as the metadata record. > > That test was there, and removed on request of Bruno Haible, the GNU > gettext maintainer, as he points out that Project-ID-Version is not > mandatory for the metadata (see Patch #700839). Ah, I read the diff backwards in this case. I'll back this one out too. -Barry From barry@python.org Fri Apr 11 21:37:56 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 16:37:56 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E971D8A.5020006@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> Message-ID: <1050093475.11200.96.camel@barry> On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote: > To avoid breakage, you could define ugettext as > > def ugettext(self, message): > if isinstance(message, unicode): > tmsg = self._catalog.get(message.encode(self._charset)) > if tmsg is None: > return message > else: > tmsg = self._catalog.get(message, message) > return unicode(tmsg, self._charset) I suppose we could cache the conversion to make the next lookup more efficient. Alternatively, if we always convert internally to Unicode we could encode on .gettext(). Then we could just pick One Way and do away with the coerce flag. -Barry From guido@python.org Fri Apr 11 21:32:51 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 16:32:51 -0400 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: Your message of "21 Mar 2003 14:42:07 PST." 
<1048286527.651.29.camel@sayge.arc.nasa.gov> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> Message-ID: <200304112033.h3BKWw703999@odiug.zope.com> > On Fri, 2003-03-21 at 06:55, Guido van Rossum wrote: > > > > > Hm, maybe range() shouldn't be an iterator but an iterator > > > > generator. No time to explain; see the discussion about > > > > restartable iterators. [Chad Netzer] > Hmmm. Now that I've uploaded my patch extending range() to longs, (And now that I've checked it in. :-) > I'd like to work on this. I've already written a C range() iterator > (incorporating PyLongs), and it would be very nice to have it > automatically be a lazy range() when used in a loop. > > In any case, assuming you are quite busy, but would consider this for > the 2.4 timeframe, I will do some work on it. If it is already being > covered, I'll gladly stay away from it. :) range() can't be changed from returning a list until at least Python 3.0. xrange() already is an iterator as well. So I'm not sure there's much to do, especially since I think making xrange() support large longs goes against the design goal for xrange(), which is to be a lightweight alternative for range() when speed is important. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Apr 9 16:23:51 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 11:23:51 -0400 Subject: [Python-Dev] RE: Adding item in front of a list In-Reply-To: <yu99k7e3wxcf.fsf@europa.research.att.com> Message-ID: <BIEJKCLHCIOIHAGOKOLHGEIJFEAA.tim.one@comcast.net> [Andrew Koenig, starting with l=[2, 3, 4]] > ...
> I would have thought that after l.insert(-1, 1), l would be > [2, 3, 1, 4], but it doesn't work that way. Alas, list.insert() existed before sequence indices were generalized to give a "count from the right end" meaning to negative index values. When the generalization happened, it appears that list.insert() was just overlooked. I'd like to change this. If I did, how loudly would people scream? Guido says he also wishes list.insert() had been defined with the arguments in the opposite order, so that list.insert(object) could have a natural default index argument of 0. I'd like to change that too, but it's clearly too late for that one. From nas@python.ca Fri Apr 11 23:14:31 2003 From: nas@python.ca (Neil Schemenauer) Date: Fri, 11 Apr 2003 15:14:31 -0700 Subject: [Python-Dev] new bytecode results In-Reply-To: <b3kooi$gaj$1@main.gmane.org> References: <b3kooi$gaj$1@main.gmane.org> Message-ID: <20030411221431.GA25548@glacier.arctrix.com> Damien Morton wrote: > I tried adding a variety of new instructions to the PVM, initially with a > code compression goal for the bytecodes, and later with a performance goal. Hi Damiem, It's good to see your enthusiasm for optimization. However, I can't help but think your efforts could be better directed. Have you looked at the CALL_ATTR work that was done at PyCon? There was also some work done on optimizing descriptors. I think working on global and builtin namespace optimizations could payoff big. There was talk about disallowing shadowing builtin names. That would allow getting rid of runtime lookups in dictionaries and even inlining of builtin functions. I have a patch on SF that could use some polish. Also, working on the new AST compiler would help us. It will be much easier to add new optimization passes after that work is completed. > begin 666 source.zip > M4$L#!!0````(`.0E6RZ%[DUZ.%X``)9\`0`'````8V5V86PN8^Q]?5<;.;+W [...] Yikes. Next time you should just upload a patch to Source Forge. 
Neil From skip@pobox.com Fri Apr 11 23:58:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 11 Apr 2003 17:58:23 -0500 Subject: [Python-Dev] new bytecode results In-Reply-To: <20030411221431.GA25548@glacier.arctrix.com> References: <b3kooi$gaj$1@main.gmane.org> <20030411221431.GA25548@glacier.arctrix.com> Message-ID: <16023.18575.48444.491279@montanaro.dyndns.org> Neil> Damien Morton wrote: >> I tried adding a variety of new instructions to the PVM, initially >> with a code compression goal for the bytecodes, and later with a >> performance goal. Neil> Hi Damiem, Neil> It's good to see your enthusiasm for optimization. However, I Neil> can't help but think your efforts could be better directed. Have Neil> you looked at the CALL_ATTR work that was done at PyCon? There Neil> was also some work done on optimizing descriptors. I think that message got stuck on mail.python.org on Feb 27 and was just released from purgatory today. Maybe it was the size? Skip From paul@prescod.net Sat Apr 12 00:59:29 2003 From: paul@prescod.net (Paul Prescod) Date: Fri, 11 Apr 2003 16:59:29 -0700 Subject: [Python-Dev] Garbage collecting closures Message-ID: <3E9756E1.10503@prescod.net> Does this bug look familiar to anyone? import gc def bar(a): def foo(): return None x = a foo() class C:pass a = C() for i in range(20): print len(gc.get_referrers(a)) x = bar(a) On my Python, it just counts up. "a" gets more and more referrers and they are "cell" objects. If it is unknown, I'll submit a bug report unless someone fixes it before I get to it. ;) Paul Prescod From guido@python.org Sat Apr 12 01:45:40 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 20:45:40 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: "Your message of Fri, 11 Apr 2003 16:59:29 PDT." <3E9756E1.10503@prescod.net> References: <3E9756E1.10503@prescod.net> Message-ID: <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> > Does this bug look familiar to anyone? 
> > import gc > > def bar(a): > def foo(): > return None > x = a > foo() > > class C:pass > a = C() > > for i in range(20): > print len(gc.get_referrers(a)) > x = bar(a) > > On my Python, it just counts up. "a" gets more and more referrers and > they are "cell" objects. If it is unknown, I'll submit a bug report > unless someone fixes it before I get to it. ;) If I use a "while 1" loop, the count never goes above 225. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul@prescod.net Sat Apr 12 03:03:40 2003 From: paul@prescod.net (Paul Prescod) Date: Fri, 11 Apr 2003 19:03:40 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> References: <3E9756E1.10503@prescod.net> <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E9773FC.5020908@prescod.net> Guido van Rossum wrote: >... >> >>On my Python, it just counts up. "a" gets more and more referrers and >>they are "cell" objects. If it is unknown, I'll submit a bug report >>unless someone fixes it before I get to it. ;) > > > If I use a "while 1" loop, the count never goes above 225. Just FYI, even if it wouldn't have leaked forever, it caused me serious pain because it kept a reference to a COM object. The process wouldn't die until the object died and all of my usual techniques for breaking circular references were of no avail. I even tried nasty hacks like globals.clear() and self.__dict__.clear(). But there was no circular reference to be broken. Paul Prescod From mhammond@skippinet.com.au Sat Apr 12 02:41:14 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 12 Apr 2003 11:41:14 +1000 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <200304111416.47006.harri.pasanen@trema.com> Message-ID: <000001c30099$711a6f60$530f8490@eden> [Harri] > Hello, > > In a few hours old CVS checkout, I'm having problems getting the > embedded python to work. 
This is true even in non-embedded Python. Move away "_sre.pyd", and the interactive session shows: 'import site' failed; use -v for traceback >>> import re >>> dir(re) ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] Running with "-v" shows: 'import site' failed; traceback: Traceback (most recent call last): File "E:\src\python-cvs\lib\site.py", line 298, in ? encodings._cache[enc] = encodings._unknown AttributeError: 'module' object has no attribute '_unknown' So, my speculation at this point is that for some reason, site.py now depends on re, which depends on _sre - but somehow a "stale" import is left hanging around. Another strange point - executing "python", then typing "import re" is completely silent, as we have noted. However, executing "python -c "import re" dumps an exception: python -c "import re" 'import site' failed; use -v for traceback Traceback (most recent call last): File "E:\src\python-cvs\lib\warnings.py", line 270, in ? filterwarnings("ignore", category=OverflowWarning, append=1) File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings item = (action, re.compile(message, re.I), category, AttributeError: 'module' object has no attribute 'compile' I'm really not sure what is going on here. I'd suggest creating a bug at sf. Mark. From jeremy@alum.mit.edu Sat Apr 12 04:38:14 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 11 Apr 2003 23:38:14 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9756E1.10503@prescod.net> References: <3E9756E1.10503@prescod.net> Message-ID: <1050118693.10278.17.camel@localhost.localdomain> On Fri, 2003-04-11 at 19:59, Paul Prescod wrote: > Does this bug look familiar to anyone? No bug here. > import gc > > def bar(a): > def foo(): > return None > x = a > foo() > > class C:pass > a = C() > > for i in range(20): > print len(gc.get_referrers(a)) > x = bar(a) > > On my Python, it just counts up. "a" gets more and more referrers and > they are "cell" objects. 
If it is unknown, I'll submit a bug report > unless someone fixes it before I get to it. ;) Nested recursive functions create circular references, which are only collected when the garbage collector runs. Add a gc.collect() call to the end of your loop and the number of referrers stays at 1. Jeremy From tim_one@email.msn.com Sat Apr 12 09:56:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 12 Apr 2003 04:56:42 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9773FC.5020908@prescod.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> [Paul Prescod] > Just FYI, even if it wouldn't have leaked forever, It wouldn't. > it caused me serious pain because it kept a reference to a COM object. > The process wouldn't die until the object died and all of my usual > techniques for breaking circular references were of no avail. I even > tried nasty hacks like globals.clear() and self.__dict__.clear(). But > there was no circular reference to be broken. There is, but I don't think you *can* break it. Stick print foo, foo.func_closure[1] inside your bar() function, after foo's definition. foo.func_closure is a 2-tuple here, and you'll see that its last element is a cell, which in turn points back to foo. That's the cycle. Since func_closure is a readonly attr, and tuples and cells are immutable, there shouldn't be anything you can do to break this cycle. Calling gc.collect() will reclaim it, provided it has become unreachable. Hiding critical resources in closures is a Bad Idea, of course -- that's why nobody has used Scheme since 1993 <wink>. From martin@v.loewis.de Sat Apr 12 11:34:05 2003 From: martin@v.loewis.de (Martin v. 
Löwis) Date: 12 Apr 2003 12:34:05 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050093475.11200.96.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050093475.11200.96.camel@barry> Message-ID: <m38yug57j6.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I suppose we could cache the conversion to make the next lookup more > efficient. Alternatively, if we always convert internally to Unicode we > could encode on .gettext(). Then we could just pick One Way and do away > with the coerce flag. If you are concerned about efficiency, I guess there is no way to avoid converting the file to Unicode on loading. I would then encourage a change where this flag is available, but has an effect only on performance, not on the behaviour. Alternatively, you could subclass GNUTranslation. Regards, Martin From mal@lemburg.com Sat Apr 12 11:58:33 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 12 Apr 2003 12:58:33 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <000001c30099$711a6f60$530f8490@eden> References: <000001c30099$711a6f60$530f8490@eden> Message-ID: <3E97F159.20909@lemburg.com> Mark Hammond wrote: > [Harri] > >>Hello, >> >>In a few hours old CVS checkout, I'm having problems getting the >>embedded python to work. > > > This is true even in non-embedded Python. Move away "_sre.pyd", and the > interactive session shows: > > 'import site' failed; use -v for traceback > >>>>import re >>>>dir(re) > > ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] > > Running with "-v" shows: > > 'import site' failed; traceback: > Traceback (most recent call last): > File "E:\src\python-cvs\lib\site.py", line 298, in ? > encodings._cache[enc] = encodings._unknown > AttributeError: 'module' object has no attribute '_unknown' This looks like a modified site.py. Where did you get this from ?
BTW, hacking encodings._cache is generally a *bad* idea. There's no guarantee that such code will continue to work in future releases since you are touching undocumented internals there. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 12:17:35 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 12 Apr 2003 13:17:35 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <3E97F159.20909@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> Message-ID: <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > This looks like a modified site.py. Where did you get this from ? Perhaps from the Python CVS? if sys.platform == 'win32': import locale, codecs enc = locale.getdefaultlocale()[1] if enc.startswith('cp'): # "cp***" ? try: codecs.lookup(enc) except LookupError: import encodings encodings._cache[enc] = encodings._unknown encodings.aliases.aliases[enc] = 'mbcs' Regards, Martin From mal@lemburg.com Sat Apr 12 12:23:47 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 12 Apr 2003 13:23:47 +0200 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: <200304112033.h3BKWw703999@odiug.zope.com> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> Message-ID: <3E97F743.4070301@lemburg.com> Guido van Rossum wrote: >>I'd like to work on this. I've already written a C range() iterator >>(incorporating PyLongs), and it would be very nice to have it >>automatically be a lazy range() when used in a loop. >> >>In any case, assuming you are quite busy, but would consider this for >>the 2.4 timeframe, I will do some work on it. If it is already being >>covered, I'll gladly stay away from it. :) > > range() can't be changed from returning a list until at least Python > 3.0. Is this change really necessary ? Instead of changing the semantics of range() why not have the byte code compiler optimize its typical usage: for i in range(10): pass In the above case, changing the byte code compiler output would not introduce any change in semantics. Even better, the compiler could get rid of the function call altogether. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 12:43:28 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 12 Apr 2003 13:43:28 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050092819.11172.89.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> Message-ID: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I used standard msgfmt to turn that into a .mo file. Then created a > GNUTranslation(fp, coerce=True) and called > > >>> t.ugettext(u'ab\xde') > u'\xa4yz' > > This is what I should expect, right? ;) More or less, yes. Now, what happens if you put "real" non-ASCII (i.e. bytes above 127) into the message id, like so: msgid "abö" msgstr "\xc2\xa4yz" msgfmt will still accept that, but msgunfmt will complain: msgunfmt: warning: The following msgid contains non-ASCII characters. This will cause problems to translators who use a character encoding different from yours. Consider using a pure ASCII msgid instead. If you think about this, this is really bad: If you mean to apply the charset= to both msgid and msgstr, then translators using a different charset from yours are in big trouble. They are faced with three problems: 1. They don't know what the charset of the msgids is. The PO files do have a charset declaration, the POT files typically don't. 2. They need to convert the msgids from the POT encoding to their native encoding. There are no tools available to support that readily; tools like iconv might correctly convert the msgids, but won't update the charset= in the POT file (if the charset was filled out). 3. By converting the msgids, they are also changing them. That means the msgids are not really suitable as keys anymore. Regards, Martin From mal@lemburg.com Sat Apr 12 12:49:11 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Sat, 12 Apr 2003 13:49:11 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E97FD37.9040100@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > > >>This looks like a modified site.py. Where did you get this from ? > > Perhaps from the Python CVS? Hmm, I don't have that in my CVS checkout... I guess a cleanup is due. > if sys.platform == 'win32': > import locale, codecs > enc = locale.getdefaultlocale()[1] > if enc.startswith('cp'): # "cp***" ? > try: > codecs.lookup(enc) > except LookupError: > import encodings > encodings._cache[enc] = encodings._unknown > encodings.aliases.aliases[enc] = 'mbcs' That's the wrong way to do it. This code should live in encodings/__init__.py, not site.py, and it should be done lazily, ie. Python startup time should not suffer from this in general, only when Unicode and cpXXX encodings are being requested and not found. The codec machinery was carefully designed not to introduce extra overhead when not using Unicode in programs. The above approach pretty much kills this effort :-) -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 13:31:07 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 12 Apr 2003 14:31:07 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <3E97FD37.9040100@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> Message-ID: <m3el475244.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > The codec machinery was carefully designed not to introduce > extra overhead when not using Unicode in programs. The above > approach pretty much kills this effort :-) This effort is dead already. For example, on Unix, the file system default encoding is initialized from the user's preference; to verify that the encoding really exists, a codec lookup is performed. Regards, Martin From guido@python.org Sat Apr 12 14:25:15 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 09:25:15 -0400 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: "Your message of Sat, 12 Apr 2003 13:23:47 +0200." <3E97F743.4070301@lemburg.com> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> <3E97F743.4070301@lemburg.com> Message-ID: <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> [Chad Netzer] > >>I'd like to work on this. I've already written a C range() iterator > >>(incorporating PyLongs), and it would be very nice to have it > >>automatically be a lazy range() when used in a loop. > >> > >>In any case, assuming you are quite busy, but would consider this for > >>the 2.4 timeframe, I will do some work on it.
If it is already being > >>covered, I'll gladly stay away from it. :) [Guido] > > range() can't be changed from returning a list until at least Python > > 3.0. [MAL] > Is this change really necessary ? Instead of changing the semantics > of range() why not have the byte code compiler optimize it's typical > usage: > > for i in range(10): > pass > > In the above case, changing the byte code compiler output would > not introduce any change in semantics. Even better, the compiler > could get rid off the function call altogether. Right. That's nice, and can be done before 3.0 (as soon as we change the rules so that adding a 'range' attribute to a module object is illegal). My musing about making range() an iterator or iterator well comes from the observation that if I had had iterators as a concept from day one, I would have made several things iterators that currently return lists, e.g. map(), filter(), and range(). The need for the concrete list returned by range() (outside the tutorial :-) is rare; in those rare cases you could say list(range(...)). Whether this is indeed worth changing in 3.0 isn't clear, that depends on the scope of 3.0, which isn't defined yet (because I haven't had time to work on it, really). I certainly plan to eradicate xrange() in 3.0 one way or another: TOOWTDI. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 12 14:43:52 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 09:43:52 -0400 Subject: [Python-Dev] Evil setattr hack Message-ID: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Someone accidentally discovered a way to set attributes of built-in types, even though the implementation tries to prevent this. For example, you cannot modify the str type to add a new method. Let's define the method first: >>> def reverse(self): ... return self[::-1] ... 
>>> Using direct attribute assignment doesn't work: >>> str.reverse = reverse Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't set attributes of built-in/extension type 'str' >>> Using the dictionary doesn't work either: >>> str.__dict__['reverse'] = reverse Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: object does not support item assignment >>> But here's a trick that *does* work: >>> object.__setattr__(str, 'reverse', reverse) >>> Proof that it worked: >>> "hello".reverse() 'olleh' >>> What to do about this? I *really* don't want changes to built-in types to become a standard "hack", because there are all sorts of things that could go wrong. (For one, built-in type objects are static C variables, which are shared between multiple interpreter contexts in the same process.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 12 17:00:13 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 12:00:13 -0400 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: "Your message of Sat, 12 Apr 2003 11:41:14 +1000." <000001c30099$711a6f60$530f8490@eden> References: <000001c30099$711a6f60$530f8490@eden> Message-ID: <200304121600.h3CG0DU01994@pcp02138704pcs.reston01.va.comcast.net> > [Harri] > > Hello, > > > > In a few hours old CVS checkout, I'm having problems getting the > > embedded python to work. [Mark] > This is true even in non-embedded Python. Move away "_sre.pyd", and the > interactive session shows: > > 'import site' failed; use -v for traceback > >>> import re > >>> dir(re) > ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] > > Running with "-v" shows: > > 'import site' failed; traceback: > Traceback (most recent call last): > File "E:\src\python-cvs\lib\site.py", line 298, in ? 
> encodings._cache[enc] = encodings._unknown > AttributeError: 'module' object has no attribute '_unknown' > > So, my speculation at this point is that for some reason, site.py > now depends on re, which depends on _sre site.py sometimes imports distutils.util which imports re which imports _sre. But this is only when run from the build directory. But there's another path that imports re, and that's from warnings, which is imported as soon as a warning is issued (even if nothing is printed). > - but somehow a "stale" import is left hanging around. That's a standard problem when module A imports B and B fails -- a semi-complete A stays around. Proposals to fix it have been made, but it's tricky because deleting A isn't always the right thing to do (and makes the failure harder to debug). > Another strange point - executing "python", then typing "import re" is > completely silent, as we have noted. However, executing "python -c "import > re" dumps an exception: > > python -c "import re" > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "E:\src\python-cvs\lib\warnings.py", line 270, in ? > filterwarnings("ignore", category=OverflowWarning, append=1) > File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings > item = (action, re.compile(message, re.I), category, > AttributeError: 'module' object has no attribute 'compile' > > I'm really not sure what is going on here. I'd suggest creating a bug at > sf. Have you got a $PYTHONSTARTUP? That doesn't get executed in the second case.
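[The "semi-complete A stays around" effect Guido describes is easy to reproduce. A small sketch in modern Python syntax: the module name half_baked is invented, and the failure is simulated with exec because today's import machinery removes a genuinely failed import from sys.modules, whereas the interpreter of this era did not.]

```python
import sys
import types

# Simulate "module A imports B and B fails": run a module body that
# raises partway through, leaving the half-initialized module object
# registered in sys.modules.
body = "x = 1\nraise ImportError('simulated failing sub-import')\ny = 2"
mod = types.ModuleType("half_baked")  # hypothetical module name
sys.modules["half_baked"] = mod
try:
    exec(body, mod.__dict__)
except ImportError:
    pass

# A later import silently returns the stale, partially populated module:
import half_baked
print(hasattr(half_baked, "x"))  # True  -- set before the failure
print(hasattr(half_baked, "y"))  # False -- never reached
```

[Attribute errors like the "'module' object has no attribute 'compile'" traceback above are exactly what such a half-populated module produces.]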
--Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Sat Apr 12 20:37:09 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 12 Apr 2003 12:37:09 -0700 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> <3E97F743.4070301@lemburg.com> <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050176229.601.15.camel@sayge.arc.nasa.gov> On Sat, 2003-04-12 at 06:25, Guido van Rossum wrote: > [MAL] > > Is this change really necessary ? Instead of changing the semantics > > of range() why not have the byte code compiler optimize it's typical > > usage: > > Right. That's nice, and can be done before 3.0 (as soon as we change > the rules so that adding a 'range' attribute to a module object is > illegal). Well, I plan to look into doing this, just because I think it is an interesting problem and tickles my fancy. I'll report back when I have failed. But at least I'll try to get the ball rolling. :) Chad Netzer From drifty@alum.berkeley.edu Sat Apr 12 23:20:35 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 12 Apr 2003 15:20:35 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests Message-ID: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> For the regression tests for the stdlib, is it okay to create temporary files (using tempfile) and connect to the Internet (when the network resource is enabled)? 
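[For the tempfile half of Brett's question, the general pattern — shown here with the modern tempfile API rather than the 2003 test_support helpers — is to do all scratch I/O inside a temporary directory so a test run leaves nothing behind:]

```python
import os
import tempfile

# All scratch files live in a TemporaryDirectory; it is removed,
# contents and all, when the with-block exits.
with tempfile.TemporaryDirectory() as scratch_dir:
    path = os.path.join(scratch_dir, "scratch.txt")
    with open(path, "w") as f:
        f.write("hello")
    with open(path) as f:
        content = f.read()

print(content)               # hello
print(os.path.exists(path))  # False -- cleaned up automatically
```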
-Brett From guido@python.org Sun Apr 13 01:08:04 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 20:08:04 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sat, 12 Apr 2003 15:20:35 PDT." <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> Message-ID: <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> > For the regression tests for the stdlib, is it okay to create temporary > files (using tempfile) and connect to the Internet (when the network > resource is enabled)? Tempfiles: definitely; though if you need a single temporary file, you can use test_support.TESTFN. Connecting to the Internet: only if the network resource is enabled. Then it is up to the tester to make sure that connection to the Internet is possible. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Sun Apr 13 02:55:50 2003 From: skip@pobox.com (Skip Montanaro) Date: Sat, 12 Apr 2003 20:55:50 -0500 Subject: [Python-Dev] migration away from SourceForge? Message-ID: <16024.50086.748997.76318@montanaro.dyndns.org> Is it time to think seriously about moving away from SourceForge? Their cvs performance seems to be getting worse by the day. Bug updates also seem to fail periodically in a fashion that suggests system overload. I presume the parent company (is that still VA Linux?) is in dire enough financial straits that it can't afford to upgrade its infrastructure enough to meet the increased demand. It seems we mostly need a CVS repository and a bug tracker. Is RoundUp close enough to fill the bug tracking bill? What options are available for CVS hosting? OTOH, maybe we should try to convince Google to buy SF. ;-) Skip From barry@python.org Sun Apr 13 05:15:55 2003 From: barry@python.org (Barry Warsaw) Date: Sun, 13 Apr 2003 00:15:55 -0400 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <16024.50086.748997.76318@montanaro.dyndns.org> Message-ID: <9EF4B80D-6D66-11D7-8848-003065EEFAC8@python.org> On Saturday, April 12, 2003, at 09:55 PM, Skip Montanaro wrote: > > Is it time to think seriously about moving away from SourceForge? > Their cvs > performance seems to be getting worse by the day. Bug updates also > seem to > fail periodically in a fashion that suggests system overload. I > presume the > parent company (is that still VA Linux?) is in dire enough financial > straits > that it can't afford to upgrade its infrastructure enough to meet the > increased demand. > > It seems we mostly need a CVS repository and a bug tracker. Is RoundUp > close enough to fill the bug tracking bill? What options are > available for > CVS hosting? Perhaps we should look into running the GForge code on a python.org machine? http://gforge.org -Barry From tim.one@comcast.net Sun Apr 13 06:09:29 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 13 Apr 2003 01:09:29 -0400 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <000001c30099$711a6f60$530f8490@eden> Message-ID: <LNBBLJKPBEHFEDALKOLCGEBLEDAB.tim.one@comcast.net> [Mark Hammond] > ... > Another strange point - executing "python", then typing "import re" is > completely silent, as we have noted. However, executing > "python -c "import re" dumps an exception: > > python -c "import re" > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "E:\src\python-cvs\lib\warnings.py", line 270, in ? > filterwarnings("ignore", category=OverflowWarning, append=1) > File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings > item = (action, re.compile(message, re.I), category, > AttributeError: 'module' object has no attribute 'compile' > > I'm really not sure what is going on here. I'd suggest creating a bug at > sf. Does this fail for anyone else? 
Works for me, here on Win98SE: C:\Code\python\PCbuild>python -c "import re" C:\Code\python\PCbuild> Did you try -v, as > 'import site' failed; use -v for traceback suggested? Here's the import info I get: C:\Code\python\PCbuild>python -vc "import re" # installing zipimport hook import zipimport # builtin # installed zipimport hook # C:\CODE\PYTHON\lib\site.pyc matches C:\CODE\PYTHON\lib\site.py import site # precompiled from C:\CODE\PYTHON\lib\site.pyc # C:\CODE\PYTHON\lib\os.pyc matches C:\CODE\PYTHON\lib\os.py import os # precompiled from C:\CODE\PYTHON\lib\os.pyc import nt # builtin # C:\CODE\PYTHON\lib\ntpath.pyc matches C:\CODE\PYTHON\lib\ntpath.py import ntpath # precompiled from C:\CODE\PYTHON\lib\ntpath.pyc # C:\CODE\PYTHON\lib\stat.pyc matches C:\CODE\PYTHON\lib\stat.py import stat # precompiled from C:\CODE\PYTHON\lib\stat.pyc # C:\CODE\PYTHON\lib\UserDict.pyc matches C:\CODE\PYTHON\lib\UserDict.py import UserDict # precompiled from C:\CODE\PYTHON\lib\UserDict.pyc # C:\CODE\PYTHON\lib\copy_reg.pyc matches C:\CODE\PYTHON\lib\copy_reg.py import copy_reg # precompiled from C:\CODE\PYTHON\lib\copy_reg.pyc # C:\CODE\PYTHON\lib\types.pyc matches C:\CODE\PYTHON\lib\types.py import types # precompiled from C:\CODE\PYTHON\lib\types.pyc # C:\CODE\PYTHON\lib\locale.pyc matches C:\CODE\PYTHON\lib\locale.py import locale # precompiled from C:\CODE\PYTHON\lib\locale.pyc import _locale # builtin # C:\CODE\PYTHON\lib\codecs.pyc matches C:\CODE\PYTHON\lib\codecs.py import codecs # precompiled from C:\CODE\PYTHON\lib\codecs.pyc import _codecs # builtin import encodings # directory C:\CODE\PYTHON\lib\encodings # C:\CODE\PYTHON\lib\encodings\__init__.pyc matches C:\CODE\PYTHON\lib\encodings\__init__.py import encodings # precompiled from C:\CODE\PYTHON\lib\encodings\__init__.pyc # C:\CODE\PYTHON\lib\re.pyc matches C:\CODE\PYTHON\lib\re.py import re # precompiled from C:\CODE\PYTHON\lib\re.pyc # C:\CODE\PYTHON\lib\sre.pyc matches C:\CODE\PYTHON\lib\sre.py import sre # 
precompiled from C:\CODE\PYTHON\lib\sre.pyc # C:\CODE\PYTHON\lib\sre_compile.pyc matches C:\CODE\PYTHON\lib\sre_compile.py import sre_compile # precompiled from C:\CODE\PYTHON\lib\sre_compile.pyc import _sre # dynamically loaded from C:\Code\python\PCbuild\_sre.pyd # C:\CODE\PYTHON\lib\sre_constants.pyc matches C:\CODE\PYTHON\lib\sre_constants.py import sre_constants # precompiled from C:\CODE\PYTHON\lib\sre_constants.pyc # C:\CODE\PYTHON\lib\sre_parse.pyc matches C:\CODE\PYTHON\lib\sre_parse.py import sre_parse # precompiled from C:\CODE\PYTHON\lib\sre_parse.pyc # C:\CODE\PYTHON\lib\string.pyc matches C:\CODE\PYTHON\lib\string.py import string # precompiled from C:\CODE\PYTHON\lib\string.pyc import strop # builtin # C:\CODE\PYTHON\lib\encodings\cp1252.pyc matches C:\CODE\PYTHON\lib\encodings\cp1252.py import encodings.cp1252 # precompiled from C:\CODE\PYTHON\lib\encodings\cp1252.pyc # C:\CODE\PYTHON\lib\warnings.pyc matches C:\CODE\PYTHON\lib\warnings.py import warnings # precompiled from C:\CODE\PYTHON\lib\warnings.pyc # C:\CODE\PYTHON\lib\linecache.pyc matches C:\CODE\PYTHON\lib\linecache.py import linecache # precompiled from C:\CODE\PYTHON\lib\linecache.pyc From drifty@alum.berkeley.edu Sun Apr 13 07:58:32 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 12 Apr 2003 23:58:32 -0700 (PDT) Subject: [Python-Dev] RE: How should time.strptime() handle UTC? In-Reply-To: <LNBBLJKPBEHFEDALKOLCOECAEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCOECAEDAB.tim.one@comcast.net> Message-ID: <Pine.SOL.4.53.0304122344080.3660@death.OCF.Berkeley.EDU> I am cc'ing python-dev at Tim's suggestion. You can read the replies in the email, but the gist is whether time.strptime() should accept UTC and GMT for the %Z directive. [Tim Peters] > [Brett Cannon] > > I was writing a script using strptime and I rediscovered that strptime (at > > least the pure Python version) does not accept UTC for the %Z directive as > > an acceptable timezone.
> > Is it that it specifically didn't accept UTC, or that it generally doesn't > accept anything for %Z? > Doesn't accept anything beyond what the computer's timezone is (if the computer is in PDT, it only picks that up and nothing else; quick test I did failed on PST). So trying anything that is not directly known gets rejected as a format error. > > Now I just checked an install on a Solaris machine under Python 2.2 and it > > doesn't accept UTC as a timezone either so I know of at least one C > version that > > doesn't take it either. > > > > Do you two think that I should modify strptime to accept UTC and GMT and > > then set tm_isdst (DST flag) to 0? Or should it just stay as-is and not > > accept it? Should I change it so that it accepts any 3-letter entry for > > %Z and then just see if I know what the timezone is; if I know set > > tm_isdst appropriately, otherwise set it to -1? > > > > I say yes to adding UTC and no to blindly accepting possible timezones. > > My feeling is that this should act as closely to a naive datetime > > representation as possible. > > > > And don't let having to deal with a patch hold you up on wanting to change > > it; I have to patch a "feature" of strptime anyway. =) > > I think you should debate this in public. So if you guys don't like hearing about this stuff blame Tim. =) > %Z isn't allowed in POSIX strptime(). GNU docs say glibc supports it as > an extension to POSIX, and that GNU "parses" for it (whatever that > means), but that "no field in tm is changed" as a result. A number of > other strptime man pages on the web say: > > %Z > timezone name or no characters if no time zone information exists > > which suggests they carelessly copied the format part of their strftime() > man page. > > So there's no clear prior art to follow here, and inventing new art takes > more time than I have (hence "debate this in public" -- please <wink>). > Well, that man page line is pretty much what the Python docs say. 
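[For reference, the behaviour under debate — accepting "UTC" and "GMT" for %Z regardless of the machine's own timezone, with tm_isdst set to 0 — is what the pure Python strptime eventually adopted; in a current CPython:]

```python
import time

# %Z accepts "UTC" and "GMT" independent of the local timezone,
# and maps both to tm_isdst == 0.
for zone in ("UTC", "GMT"):
    t = time.strptime(zone, "%Z")
    print(zone, t.tm_isdst)
# UTC 0
# GMT 0
```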
Personally I would love to not have to support it, but it has been in the docs so I don't know if we can yank it without upsetting someone (although strptime has always been questionable, so maybe we can rip it out). So, "public", should strptime be able to handle UTC and GMT as a timezone no matter what? How about taking in any 3-character timezone so that an error isn't raised but only set the DST flag if it is a known timezone? Perhaps %Z should accept 42 since it is the answer to everything? -Brett From tim_one@email.msn.com Sun Apr 13 08:05:54 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 03:05:54 -0400 Subject: [Python-Dev] Big trouble in CVS Python Message-ID: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Under current CVS, release build, running regrtest.py crashes very soon after entering test___all__.py for me, on two different machines (but both Windows). The C stack has gotten lost by this point, and the program counter is pointing into static data(!), about a dozen bytes beyond the start of Python's static PyFloat_Type type. Alas, there is no problem in a debug build. There's also no problem under the release build if I run the debug build first and leave the .pyc files behind. Removing the .pyc files and then running the release build dies every time. So maybe it's something to do with compiling Python programs, or maybe with a vagary of when cyclic gc triggers. The latter is high on my suspect list, because the location of the death is affected by regrtest's -t option, and the release build runs the tests to completion with -t0. If it's in gc, I probably caused it. So I'm not asking you to fix it <wink>. It would help to know if anyone is having problems under Linux, and especially if you are and the debugger there is more helpful in a release build. From martin@v.loewis.de Sun Apr 13 08:37:31 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 09:37:31 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <16024.50086.748997.76318@montanaro.dyndns.org> References: <16024.50086.748997.76318@montanaro.dyndns.org> Message-ID: <m3znmux2yr.fsf@mira.informatik.hu-berlin.de> Skip Montanaro <skip@pobox.com> writes: > Is it time to think seriously about moving away from SourceForge? Any proposal to move away from SourceForge should include a proposal where to move *to*. I highly admire SourceForge operators for their quality of service, and challenge anybody to provide the same quality service. Be prepared to find yourself in a full-time job if you want to take over. SourceForge performance was *much* worse in the past, and we didn't consider moving away, and SF fixed it by buying new hardware. Give them some time. Regards, Martin From drifty@alum.berkeley.edu Sun Apr 13 08:51:18 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 00:51:18 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> [Guido van Rossum] > > For the regression tests for the stdlib, is it okay to create temporary > > files (using tempfile) and connect to the Internet (when the network > > resource is enabled)? > > Tempfiles: definitely; though if you need a single temporary file, you > can use test_support.TESTFN. > Perfect. Exactly what I was looking for. > Connecting to the Internet: only if the network resource is enabled. > Then it is up to the tester to make sure that connection to the > Internet is possible. > Any suggestions on how to go about this? 
An initial connection to python.org after setting socket.setdefaulttimeout() to something reasonable (like 10 seconds?) and raising test_support.TestSkipped if it times out? -Brett From martin@v.loewis.de Sun Apr 13 08:58:27 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 09:58:27 +0200 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Message-ID: <m3n0iux1zw.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > If it's in gc, I probably caused it. So I'm not asking you to fix it > <wink>. It would help to know if anyone is having problems under Linux, and > especially if you are and the debugger there is more helpful in a release > build. It crashes for me as well, in test_builtin, with the backtrace #0 0x40340019 in main_arena () from /lib/libc.so.6 #1 0x080edad6 in visit_decref (op=0x8343fa4, data=0x80eda90) at Modules/gcmodule.c:236 #2 0x08097a70 in tupletraverse (o=0x40351e64, visit=0x80eda90 <visit_decref>, arg=0x0) at Objects/tupleobject.c:398 #3 0x080ed152 in collect (generation=2) at Modules/gcmodule.c:250 #4 0x080ed764 in gc_collect (self=0x0, noargs=0x0) at Modules/gcmodule.c:731 #5 0x080be763 in call_function (pp_stack=0xbfffee9c, oparg=24) at Python/ceval.c:3400 #6 0x080bcb9e in eval_frame (f=0x834013c) at Python/ceval.c:2091 #7 0x080bd685 in PyEval_EvalCodeEx (co=0x403aae60, globals=0x18, locals=0x0, args=0x834013c, argcount=0, kws=0x82fb2dc, kwcount=0, defs=0x403bd470, defcount=11, closure=0x0) at Python/ceval.c:2638 #8 0x080be81e in fast_function (func=0x40351e64, pp_stack=0xbffff02c, n=0, na=0, nk=0) at Python/ceval.c:3504 #9 0x080be671 in call_function (pp_stack=0xbffff02c, oparg=24) at Python/ceval.c:3433 #10 0x080bcb9e in eval_frame (f=0x82fb18c) at Python/ceval.c:2091 #11 0x080bd685 in PyEval_EvalCodeEx (co=0x4045a220, globals=0x18, locals=0x4036279c, 
args=0x82fb18c, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2638 The tuple being traversed has 19 elements, of types: NoneType, int, int, int, int, int, int, int, int, int, int, int, int, int, int, long, int, float, <NULL> It crashes on the last tuple element, which is a garbage pointer. Regards, Martin From guido@python.org Sun Apr 13 13:54:37 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 08:54:37 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 00:51:18 PDT." <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> Message-ID: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> > > Connecting to the Internet: only if the network resource is enabled. > > Then it is up to the tester to make sure that connection to the > > Internet is possible. > > Any suggestions on how to go about this? An initial connection to > python.org after setting socket.setdefaulttimeout() to something > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > times out? No, you check whether the 'network' resource name is enabled in test_support. Use test_support.is_resource_enabled('network'). --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sun Apr 13 14:05:01 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 13 Apr 2003 23:05:01 +1000 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBLEDAB.tim.one@comcast.net> Message-ID: <022e01c301bd$4b7f5a70$530f8490@eden> > Did you try -v, as > > > 'import site' failed; use -v for traceback > > suggested? Yep. 
as I said: > > Running with "-v" shows: Note that as I mentioned, this is only if you move away _sre.pyd. The original report was almost certainly a simple import error. Mark. From guido@python.org Sun Apr 13 14:22:35 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 09:22:35 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: Your message of "Sun, 13 Apr 2003 08:54:37 EDT." Message-ID: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> > > > Connecting to the Internet: only if the network resource is enabled. > > > Then it is up to the tester to make sure that connection to the > > > Internet is possible. > > > > Any suggestions on how to go about this? An initial connection to > > python.org after setting socket.setdefaulttimeout() to something > > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > > times out? > > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). I realize that you might not know how to run such tests either. The magic words are regrtest.py -u network BTW, this isn't described in Lib/test/README -- perhaps you or someone else can add it? (Both the -u option and the is_resource_enabled() function.) Hm, maybe these docs shouldn't be so hidden and there should be a standard library chapter on the test package and its submodules and the conventions for writing and running tests? 
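The resource convention itself is only a few lines. A standalone sketch using the same names as test_support (a model of the mechanism, not the module itself):

```python
class TestSkipped(Exception):
    """Raised by a test that cannot run in the current configuration."""

# regrtest.py -u network would put 'network' in here before running tests
use_resources = []

def is_resource_enabled(resource):
    return resource in use_resources

def requires(resource):
    """Skip the calling test unless `resource` was enabled with -u."""
    if not is_resource_enabled(resource):
        raise TestSkipped("use of the %r resource not enabled" % resource)
```

A network-using test calls requires('network') up front; the -u option is what populates use_resources, so an un-flagged run skips the test instead of failing it.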
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Sun Apr 13 19:13:13 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 13 Apr 2003 14:13:13 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Message-ID: <1050257590.10278.19.camel@localhost.localdomain> On Sun, 2003-04-13 at 03:05, Tim Peters wrote: > Under current CVS, release build, running regrtest.py crashes very soon > after entering test___all__.py for me, on two different machines (but both > Windows). The C stack has gotten lost by this point, and the program > counter is pointing into static data(!), about a dozen bytes beyond the > start of Python's static PyFloat_Type type. Unfortunately, I don't see any problem at all in a release build on my Linux box. Jeremy From tim_one@email.msn.com Sun Apr 13 19:29:59 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 14:29:59 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <m3n0iux1zw.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCGEPCEGAB.tim_one@email.msn.com> [martin@v.loewis.de] > It crashes for me as well, in test_builtin, with the backtrace Wow! It took me hours to get there. 
Noting that Anthony and Jeremy report no problems, but Martin's symptom appears identical to mine: > #0 0x40340019 in main_arena () from /lib/libc.so.6 > #1 0x080edad6 in visit_decref (op=0x8343fa4, data=0x80eda90) at > Modules/gcmodule.c:236 > #2 0x08097a70 in tupletraverse (o=0x40351e64, visit=0x80eda90 > <visit_decref>, arg=0x0) > at Objects/tupleobject.c:398 > #3 0x080ed152 in collect (generation=2) at Modules/gcmodule.c:250 > #4 0x080ed764 in gc_collect (self=0x0, noargs=0x0) at > Modules/gcmodule.c:731 > #5 0x080be763 in call_function (pp_stack=0xbfffee9c, oparg=24) > at Python/ceval.c:3400 > #6 0x080bcb9e in eval_frame (f=0x834013c) at Python/ceval.c:2091 > #7 0x080bd685 in PyEval_EvalCodeEx (co=0x403aae60, globals=0x18, > locals=0x0, > args=0x834013c, argcount=0, kws=0x82fb2dc, kwcount=0, > defs=0x403bd470, defcount=11, > closure=0x0) at Python/ceval.c:2638 > #8 0x080be81e in fast_function (func=0x40351e64, > pp_stack=0xbffff02c, n=0, na=0, nk=0) > at Python/ceval.c:3504 > #9 0x080be671 in call_function (pp_stack=0xbffff02c, oparg=24) > at Python/ceval.c:3433 > #10 0x080bcb9e in eval_frame (f=0x82fb18c) at Python/ceval.c:2091 > #11 0x080bd685 in PyEval_EvalCodeEx (co=0x4045a220, globals=0x18, > locals=0x4036279c, > args=0x82fb18c, argcount=0, kws=0x0, kwcount=0, defs=0x0, > defcount=0, closure=0x0) > at Python/ceval.c:2638 > > The tuple being traversed has 19 elements, of types: > > NoneType, int, int, int, int, int, int, int, int, int, int, int, > int, int, int, long, int, float, <NULL> > > It crashes on the last tuple element, which is a garbage pointer. Exactly the same here. The tuple is the co_consts belonging to test_builtin's test_range. It's the 11th tuple of size 19 created <wink/sigh>. At the time compile.c's jcompile created the tuple: consts = PyList_AsTuple(sc.c_consts); the last element was fine, a float with value 1.e101, from test_range's self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) Alas, none of that helps. 
At the time of the crash, the last tuple entry still points to the memory for that floatobject, but the memory has been scribbled over. The first 18 tuple elements appear still to be intact. My suspicion that it's a gc problem has gotten weaker to the point of thinking that's unlikely. It looks more like gc is suffering the effects of something else scribbling over memory it ought not to be poking. From drifty@alum.berkeley.edu Sun Apr 13 20:50:39 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 12:50:39 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > Connecting to the Internet: only if the network resource is enabled. > > > Then it is up to the tester to make sure that connection to the > > > Internet is possible. > > > > Any suggestions on how to go about this? An initial connection to > > python.org after setting socket.setdefaulttimeout() to something > > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > > times out? > > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). > Actually I knew that. What I was wondering about was how "to make sure that connection to Internet is possible". 
-Brett From drifty@alum.berkeley.edu Sun Apr 13 20:53:55 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 12:53:55 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> [Guido van Rossum] <snip - Me trying to find out whether it's OK to use the Net in tests> > > No, you check whether the 'network' resource name is enabled in > > test_support. Use test_support.is_resource_enabled('network'). > > I realize that you might not know how to run such tests either. The > magic words are > > regrtest.py -u network > > BTW, this isn't described in Lib/test/README -- perhaps you or someone > else can add it? (Both the -u option and the is_resource_enabled() > function.) > I can write some basic instructions on how to use regrtest and test_support; someone will just have to check them in. > Hm, maybe these docs shouldn't be so hidden and there should be a > standard library chapter on the test package and its submodules and > the conventions for writing and running tests? > That definitely wouldn't hurt. It might also get people to write tests more often and maybe help with improving our code if they knew about regrtest and test_support. -Brett From tim_one@email.msn.com Sun Apr 13 20:54:05 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 15:54:05 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEPCEGAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> >> The tuple being traversed has 19 elements, of types: >> >> NoneType, int, int, int, int, int, int, int, int, int, int, int, >> int, int, int, long, int, float, <NULL> >> >> It crashes on the last tuple element, which is a garbage pointer. 
> Exactly the same here. The tuple is the co_consts belonging to > test_builtin's test_range. It's the 11th tuple of size 19 created > <wink/sigh>. At the time compile.c's jcompile created the tuple: > > consts = PyList_AsTuple(sc.c_consts); > > the last element was fine, a float with value 1.e101, from test_range's > > self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > > Alas, none of that helps. At the time of the crash, the last tuple > entry still points to the memory for that floatobject, but the memory > has been scribbled over. The first 18 tuple elements appear still to > be intact. > > My suspicion that it's a gc problem has gotten weaker to the point of > thinking that's unlikely. It looks more like gc is suffering the > effects of something else scribbling over memory it ought not to be > poking. Next clue: the damaged float object was earlier (much earlier) deallocated. Its refcount (in co_consts) started as 1, and it fell to 0 via the tail end of call_function(): /* What does this do? */ while ((*pp_stack) > pfunc) { w = EXT_POP(*pp_stack); Py_DECREF(w); PCALL(PCALL_POP); } However, co_consts is still alive and still points to it, so this deallocation is erroneous. float_dealloc abuses the ob_type field to maintain a free list: op->ob_type = (struct _typeobject *)free_list; free_list is a file static. This explains why the tp_traverse slot ends up pointing into static data in floatobject.c. Given this, there's approximately no chance gc *caused* it. Who's been mucking with function calls (or maybe the eval loop) recently? 
From jeremy@alum.mit.edu Sun Apr 13 21:33:38 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 13 Apr 2003 16:33:38 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <1050266017.10278.24.camel@localhost.localdomain> On Sun, 2003-04-13 at 15:54, Tim Peters wrote: > Given this, there's approximately no chance gc *caused* it. Who's been > mucking with function calls (or maybe the eval loop) recently? > We've had a lot of changes to the function call implementation over the last couple of months. What's the chance that this is just the first time we've noticed the problem? Seems pretty plausible that the recent GC changes just exposed an earlier bug. Jeremy From tim_one@email.msn.com Sun Apr 13 21:28:13 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 16:28:13 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> >> self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > ... > Given this, there's approximately no chance gc *caused* it. Who's been > mucking with function calls (or maybe the eval loop) recently? It appears to be a refcount error in recently-added C code that tries to generalize the builtin range() function, specifically here: Fail: Py_XDECREF(curnum); Py_XDECREF(istep); <- here Py_XDECREF(zero); Word to the wise: don't ever try to reuse a variable whose address is passed to PyArg_ParseTuple for anything other than holding what PyArg_ParseTuple does or doesn't store into it. You'll never get the decrefs straight (and even if you manage to at first, the next person to modify your code will break it). only-consumed-eight-hours-this-time<wink>-ly y'rs - tim From martin@v.loewis.de Sun Apr 13 21:29:28 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 22:29:28 +0200 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <m3brza2lav.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > However, co_consts is still alive and still points to it, so this > deallocation is erroneous. Notice, however, that the float object is not *directly* deallocated. Instead, it is deallocated as a consequence of deallocating a one-element tuple which is the argument tuple for "round", in PyObject *callargs; callargs = load_args(pp_stack, na); x = PyCFunction_Call(func, callargs, NULL); Py_XDECREF(callargs); load_args copies the argument from the stack into the tuple, transferring the reference. So apparently, the float const gets on the stack without its reference being bumped... That's as far as I can get tonight. Regards, Martin From mwh@python.net Sun Apr 13 21:49:17 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 13 Apr 2003 21:49:17 +0100 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sat, 12 Apr 2003 09:43:52 -0400") References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mbrzap1gy.fsf@starship.python.net> Guido van Rossum <guido@python.org> writes: > Someone accidentally discovered a way to set attributes of built-in > types, even though the implementation tries to prevent this. [snip] > What to do about this? Well, one approach would be special cases in PyObject_GenericSetAttr, I guess. Cheers, M. -- > So what does "abc" / "ab" equal?
cheese -- Steve Holden defends obscure semantics on comp.lang.python From tim_one@email.msn.com Sun Apr 13 22:44:57 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 17:44:57 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <m3brza2lav.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> [martin@v.loewis.de] > Notice, however, that the float object is not *directly* deallocated. > Instead, it is deallocated as a consequence of deallocating a > one-element tuple which is the argument tuple for "round", in > > PyObject *callargs; > callargs = load_args(pp_stack, na); > x = PyCFunction_Call(func, callargs, NULL); > Py_XDECREF(callargs); > > load_args copies the argument from the stack into the tuple, > transferring the refence. So apparently, the float const gets on the > stack without its reference being bumped... That was my excited guess, until I looked at LOAD_CONST <wink>. Calls are such an elaborate dance that the refcount on this puppy gets as high as 7. The problem actually occurred when the refcount was at its peak, due to an erroneous decref in handle_range_longs(). At that point the refcount fell to 6, and the remaining 6(!) decrefs all looked correct. > That's as far as I can get tonight. Thanks for sharing the pain! 
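The class of bug being chased -- a shared cleanup label that decrefs a variable still holding the *borrowed* reference PyArg_ParseTuple stored into it -- can be modeled in a few lines (toy refcounting and hypothetical function names, not the real handle_range_longs):

```python
class Ref:
    """Stand-in for a PyObject with an explicit reference count."""
    def __init__(self):
        self.refcnt = 1                 # the caller's single owned reference

def py_xdecref(obj):
    if obj is not None:
        obj.refcnt -= 1

def range_longs_buggy(istep):
    # istep was filled in by PyArg_ParseTuple("...O...") -> borrowed.
    # Something fails before istep is replaced by an owned reference,
    # and the shared Fail label decrefs everything indiscriminately:
    py_xdecref(istep)                   # BUG: releases a reference never owned

def range_longs_fixed(istep):
    # Correct cleanup leaves the borrowed reference alone.
    pass
```

In the buggy path the caller's reference silently disappears, so the object is freed while co_consts still points at it -- exactly the damage seen above.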
From drifty@alum.berkeley.edu Sun Apr 13 22:51:14 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 14:51:14 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> [Guido van Rossum] <snip - question about tests using the Internet> > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). > Another thought that has come to mind; should we be diligent about creating new objects like good testers? Or should we minimize it since net connections are expensive to make and can hold things up. -Brett From tim_one@email.msn.com Sun Apr 13 23:07:05 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 18:07:05 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <1050266017.10278.24.camel@localhost.localdomain> Message-ID: <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> [Jeremy Hylton] > We've had a lot of changes to the function call implementation over the > last couple of months. What's the chance that this is just the first > time we've noticed the problem? Slim, I think -- anything systematically screwing up refcounts on calls would have lots of opportunities to create trouble. This one was unique and shy. > Seems pretty plausible that the recent GC changes just exposed an > earlier bug. For all the code changes, the only intended semantic difference was in has_finalizer's implementation details. So that didn't seem likely either. 
Turned out that the damaged co_consts was attached to the test that exercised the new C code at fault. The code was compiled gazillions of cycles before the test was executed, though, and gazillions more cycles passed before GC bumped into the damage. If gc hadn't bumped into it, the memory would have gotten allocated to some other float, and then would have been decref'ed incorrectly when the original co_consts got deallocated. So it *could* have been much harder to track down <shudder>. What I still don't grasp is why a debug run never failed with a negative-refcount error. Attaching the prematurely-freed float to the float free list doesn't change its refcount field -- that remains 0. So if it was still in the free list when the original co_consts got reclaimed, we should have had a negrefcnt death. OTOH, if the memory was handed out to another float, then when the original co_consts got reclaimed it would have knocked that float's refcount down too, which should lead to a negrefcnt death later. Maybe co_consts never did get reclaimed? I'm not clear on how much we let slide at shutdown. From skip@pobox.com Sun Apr 13 23:15:07 2003 From: skip@pobox.com (Skip Montanaro) Date: Sun, 13 Apr 2003 17:15:07 -0500 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> Message-ID: <16025.57707.389009.819692@montanaro.dyndns.org> Brett> Actually I knew that. What I was wondering about was how "to Brett> make sure that connection to Internet is possible". If s/he runs ./python Lib/test/regrtest.py -u network you believe the user. 
;-) Skip From aahz@pythoncraft.com Mon Apr 14 00:21:39 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 13 Apr 2003 19:21:39 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> References: <1050266017.10278.24.camel@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> Message-ID: <20030413232138.GA6811@panix.com> On Sun, Apr 13, 2003, Tim Peters wrote: > > What I still don't grasp is why a debug run never failed with a > negative-refcount error. Attaching the prematurely-freed float to the > float free list doesn't change its refcount field -- that remains 0. > So if it was still in the free list when the original co_consts got > reclaimed, we should have had a negrefcnt death. OTOH, if the memory > was handed out to another float, then when the original co_consts got > reclaimed it would have knocked that float's refcount down too, which > should lead to a negrefcnt death later. Maybe co_consts never did get > reclaimed? I'm not clear on how much we let slide at shutdown. Maybe debug runs should walk through "the universe" to make sure it's in a valid state before exiting? I remember being confused that gc doesn't run when Python exits. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From guido@python.org Mon Apr 14 01:55:18 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 20:55:18 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 14:51:14 PDT." 
<Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> Message-ID: <200304140055.h3E0tIP26895@pcp02138704pcs.reston01.va.comcast.net> > > No, you check whether the 'network' resource name is enabled in > > test_support. Use test_support.is_resource_enabled('network'). > > Another thought that has come to mind; should we be diligent about > creating new objects like good testers? Or should we minimize it since > net connections are expensive to make and can hold things up. Net connections aren't that expensive; you can happily create a new net connection for each individual test. Of course, tests that hold things up should be minimized, but in my experience, tests containing waits (even sleep(0.1)) hold things up much more than opening and closing sockets. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 01:59:52 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 20:59:52 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sun, 13 Apr 2003 21:49:17 BST." <2mbrzap1gy.fsf@starship.python.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <2mbrzap1gy.fsf@starship.python.net> Message-ID: <200304140059.h3E0xqH26915@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum <guido@python.org> writes: > > > Someone accidentally discovered a way to set attributes of built-in > > types, even though the implementation tries to prevent this. > > [snip] > > > What to do about this? Michael Hudson: > Well, one approach would be special cases in PyObject_GenericSetAttr, > I guess. 
That's not quite enough, because PyObject_GenericSetAttr also gets called by code that should be allowed; I don't want to move all of the special processing from type_setattro() to PyObject_GenericSetAttr. But, having thought some more about this, I think adding a check to wrap_setattr() might be the thing to do. That gets called when you call object.__setattr__(x, "foo", value), but not when you do x.foo = value, so it's okay if it slows it down a tad. The test should make sure that self->ob_type->tp_setattro equals func, or something like that (haven't thought enough about the exact test which allows calling object.__setattr__ from a subclass that extends __setattr__ but not in the offending case). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 02:01:46 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 21:01:46 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: "Your message of Sun, 13 Apr 2003 16:28:13 EDT." <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <200304140101.h3E11kg26948@pcp02138704pcs.reston01.va.comcast.net> > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). It's possible that I introduced that bug when I reworked the patch to use a single label rather than one for each variable. 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 02:02:46 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 21:02:46 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 12:53:55 PDT." <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> Message-ID: <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> > I can write some basic instructions on how to use regrtest and > test_support; someone will just have to check them in. That would be great. Do you have a SF userid yet? Then we can give you commit privs! > > > Hm, maybe these docs shouldn't be so hidden and there should be a > > standard library chapter on the test package and its submodules and > > the conventions for writing and running tests? > > That definitely wouldn't hurt. It might also get people to write > tests more often and maybe help with improving our code if they knew > about regrtest and test_support. And I think regrtest and test_support are useful for testing 3rd party code as well. Wanna make this a project? --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Mon Apr 14 02:17:14 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 14 Apr 2003 13:17:14 +1200 (NZST) Subject: [Python-Dev] Evil setattr hack In-Reply-To: <2mbrzap1gy.fsf@starship.python.net> Message-ID: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> Michael Hudson: > one approach would be special cases in PyObject_GenericSetAttr, > I guess. Before using a hack like that, it might be better to think about what the real problem is.
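The back door itself is a single call: invoking the generic slot directly on a built-in type, bypassing type.__setattr__'s guard. With the wrap_setattr check in place (as it is in released CPython), the call is rejected outright:

```python
# Sketch of the "evil setattr hack" being discussed: object.__setattr__
# called directly on a built-in type.  With the wrap_setattr sanity
# check installed, this raises TypeError instead of mutating int.
blocked = False
try:
    object.__setattr__(int, "evil", 42)
except TypeError:
    blocked = True
```

The check compares the type's own tp_setattro against the slot being invoked, which is what distinguishes the offending call from a legitimate super-call in a subclass.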
Seems to me the problem in general is that there's no way to prevent a class which overrides a method from having a superclass version of that method called through a back door. Which means you can't rely on method overriding to *restrict* what can be done to an object. So a proper fix would require either: (1) Providing some way for objects to prevent superclass methods from being called on them when they're not looking or (2) Fixing the typeobject not to rely on that for its security -- by hiding the real dict more deeply somehow? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From drifty@alum.berkeley.edu Mon Apr 14 03:13:30 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 19:13:30 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> [Guido van Rossum] > > I can write some basic instructions on how to use regrtest and > > test_support; someone will just have to check them in. > > That would be great. Do you have a SF userid yet? Then we can give > you commit privs! > bcannon is my username. I was going to wait to ask for commit privs until I had done more patches (specifically C stuff), but if you think I am ready for it then it would be extremely cool to get commit privs (and not have to wait for anonymous CVS updates when the servers get overloaded or bug people to commit _strptime patches =). 
> > > Hm, maybe these docs shouldn't be so hidden and there should be a > > > standard library chapter on the test package and its submodules and > > > the conventions for writing and running tests? > > > > That definitely wouldn't hurt. It might also get people to write > > tests more often and maybe help with improving our code if they knew > > about regrtest and test_support. > > And I think regrtest and test_support are useful for testing 3rd party > code as well. Wanna make this a project? > I could. Going to have to learn more LaTeX (and the special extensions). So I can take this on, but I can't make any promises on when this will get done (I would be personally horrified if I can't get this done before 2.3 final gets out the door, but you never know). Should there be a testing SIG? It could keep a list of tests that could stand to be rewritten or added (I know I was surprised to discover test_urllib was so lacking). It could also start by hashing out these docs and making sure regrtest and test_support stay updated and relevant. Personally, I think writing regression tests is a good way to get new people to help with Python. They are simple to write and allow someone to get involved beyond just filing a bug. It was a thrill for me the first time I got code checked in; easing the entry point by getting more people to write regression tests for the libraries might give someone else that rush and draw them further in. Or maybe I am just bonkers.
=) -Brett From mwh@python.net Mon Apr 14 07:33:25 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 14 Apr 2003 07:33:25 +0100 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Mon, 14 Apr 2003 13:17:14 +1200 (NZST)") References: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> Message-ID: <2m65phpozu.fsf@starship.python.net> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > Guido: Er, this was me. >> one approach would be special cases in PyObject_GenericSetAttr, >> I guess. > > Before using a hack like that, it might be better to think about what > the real problem is. Aww :-) > Seems to me the problem in general is that there's no way to prevent a > class which overrides a method from having a superclass version of > that method called through a back door. Which means you can't rely on > method overriding to *restrict* what can be done to an object. > > So a proper fix would require either: > > (1) Providing some way for objects to prevent superclass > methods from being called on them when they're not looking > > or > > (2) Fixing the typeobject not to rely on that for its security -- > by hiding the real dict more deeply somehow? Yeah, another option would be to make _PyObject_GetDictPtr respect __dict__ descriptors. But that's probably the Wrong Answer, too. Maybe just PyObject_GenericSetAttr should do that -- call PyObject_GetAttr(ob, '__dict__'), basically. bad-answers-on-demand-ly y'rs M. -- We did requirements and task analysis, iterative design, and user testing. You'd almost think programming languages were an interface between people and computers. 
-- Steven Pemberton (one of the designers of Python's direct ancestor ABC) From mwh@python.net Mon Apr 14 07:36:59 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 14 Apr 2003 07:36:59 +0100 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> ("Tim Peters"'s message of "Sun, 13 Apr 2003 17:44:57 -0400") References: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> Message-ID: <2mznmtoa9g.fsf@starship.python.net> "Tim Peters" <tim_one@email.msn.com> writes: > That was my excited guess, until I looked at LOAD_CONST <wink>. Calls are > such an elaborate dance that the refcount on this puppy gets as high as 7. > The problem actually occurred when the refcount was at its peak, due to an > erroneous decref in handle_range_longs(). At that point the refcount fell > to 6, and the remaining 6(!) decrefs all looked correct. It seems to me that this would have been found much more easily if floats didn't have a free list anymore... Cheers, M. -- I don't remember any dirty green trousers. -- Ian Jackson, ucam.chat From guido@python.org Mon Apr 14 12:52:29 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 07:52:29 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 19:13:30 PDT." <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> Message-ID: <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> > > That would be great. Do you have a SF userid yet? Then we can give > > you commit privs! > > bcannon is my username. 
I was going to wait to ask for commit privs > until I had done more patches (specifically C stuff), but if you > think I am ready for it then it would be extremely cool to get > commit privs (and not have to wait for anonymous CVS updates when > the servers get overloaded or bug people to commit _strptime patches > =). OK, you're on. > I could. Going to have to learn more LaTeX (and the special > extensions). So I can take this on, but I can't make any promises > on when this will get done (I would be personally horrified if I > can't get this done before 2.3 final gets out the door, but you > never know). With LaTeX, the monkey-see-monkey-do approach works pretty well, combined with the Fred-will-fix-my-LaTeX-bugs approach. :-) > Should there be a testing SIG? Could keep a list of tests that > could stand to be rewritten or added (I know I was surprised to > discover test_urllib was so lacking). Could also start by hashing > out these docs and making sure regrtest and test_support stay > updated and relevant. I don't know about a SIG. Testing of what's in the core is fair game for python-dev. 3rd party testing, ask around. > Personally, I think writing regression tests is a good way to get > new people to help with Python. They are simple to write and allows > someone to be able to get involved beyond just filing a bug. I know > it was a thrill for me the first time I got code checked in and > maybe making the entry point easier by trying to get more people to > write more regression tests for the libraries will help give someone > else that rush and thus become more involved. > > Or maybe I am just bonkers. =) Writing a good regression test requires excellent knowledge about the code you're testing while not touching it, so that's indeed a good way to learn. 
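A minimal example of what such a test module looks like may help here. Everything below is made up for illustration (ReplaceTest is not a real file from Lib/test); under the layout being discussed, the module would finish by handing the class to test_support.run_unittest() so that regrtest.py could drive it, but this sketch just runs itself with plain unittest:

```python
import unittest

class ReplaceTest(unittest.TestCase):
    """Regression-style checks for str.replace -- small, focused
    assertions of the kind discussed above (a made-up example)."""

    def test_simple(self):
        self.assertEqual("spam".replace("a", "o"), "spom")

    def test_count_limits_replacements(self):
        # The optional third argument caps the number of substitutions.
        self.assertEqual("aaa".replace("a", "b", 2), "bba")

# In the Lib/test layout of the time, this would instead be
# test_support.run_unittest(ReplaceTest), letting regrtest.py drive it.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReplaceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point of the conventions is exactly that a module this small can be picked up and run by the shared driver without any extra scaffolding.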
--Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Mon Apr 14 13:06:39 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 14:06:39 +0200 Subject: [Python-Dev] GIL vs thread state Message-ID: <r885xoz4.fsf@python.net> The docs for PyThreadState_Clear() state that the interpreter lock must be held. I had this code in ctypes to delete the thread state and release the lock:

static void LeavePython(char *msg)
{
    PyThreadState *pts = PyThreadState_Swap(NULL);
    if (!pts)
        Py_FatalError("wincall (LeavePython): ThreadState is NULL?");
    PyThreadState_Clear(pts);
    PyThreadState_Delete(pts);
    PyEval_ReleaseLock();
}

and (under certain conditions, when ptr->frame was not NULL), got "Fatal Python error: PyThreadState_Get: no current thread" in the call to PyThreadState_Clear(). The GIL is held while this code is executed, although there is no thread state. Changing the code to the following fixes the problem; it seems that holding the GIL is not enough:

static void LeavePython(char *msg)
{
    PyThreadState *pts = PyThreadState_Get();
    if (!pts)
        Py_FatalError("wincall (LeavePython): ThreadState is NULL?");
    PyThreadState_Clear(pts);
    pts = PyThreadState_Swap(NULL);
    PyThreadState_Delete(pts);
    PyEval_ReleaseLock();
}

Is this a documentation problem, or a misunderstanding on my side? And, while we're on it, does the second version look ok? Thomas From uche.ogbuji@fourthought.com Mon Apr 14 15:10:12 2003 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 14 Apr 2003 08:10:12 -0600 Subject: [Python-Dev] List wisdom In-Reply-To: Message from "Tim Peters" <tim_one@email.msn.com> of "Sun, 13 Apr 2003 16:28:13 EDT." <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <E1954ey-0007KU-00@borgia.local> > >> self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > > ... > > Given this, there's approximately no chance gc *caused* it. Who's been > > mucking with function calls (or maybe the eval loop) recently?
> > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). This snippet sparked a little chain of events for me. I'm sure I've violated the principle before (foolishly trying to avoid declaring yet more C variables: I've always known it's bad style, but never thought it dangerous). I wanted to know whether this wisdom could be found anywhere a Python/C programmer would be likely to browse. So I dug through the Python Wiki, and found no such page of gems (just a lot of whimsical quotes from #python and a code-sharing page with some odd trinkets). I also checked to see if #python had a chump (opt-in log) on which I could put the quote. No dice. I did chump it on the #4suite log: http://uche.ogbuji.net/tech/akara/?xslt=irc.xslt&date=2003-04-14#14:03:38 I also created a Python Wiki page for useful notes and code snippets from this mailing list: http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom Please feel free to use it if anything here seems especially important to highlight (in addition to Brett Cannon's tireless work, of course). Thanks. hoping-to-save-others-an-eight-hour-odyssey-ly y'rs -- Uche Ogbuji Fourthought, Inc. 
http://uche.ogbuji.net http://4Suite.org http://fourthought.com Gems From the [Python/XML] Archives - http://www.xml.com/pub/a/2003/04/09/py-xml.html Introducing N-Triples - http://www-106.ibm.com/developerworks/xml/library/x-think17/index.html Use internal references in XML vocabularies - http://www-106.ibm.com/developerworks/xml/library/x-tipvocab.html EXSLT by example - http://www-106.ibm.com/developerworks/library/x-exslt.html The worry about program wizards - http://www.adtmag.com/article.asp?id=7238 Use rdf:about and rdf:ID effectively in RDF/XML - http://www-106.ibm.com/developerworks/xml/library/x-tiprdfai.html From guido@python.org Mon Apr 14 15:10:28 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 10:10:28 -0400 Subject: [Python-Dev] GIL vs thread state In-Reply-To: Your message of "14 Apr 2003 14:06:39 +0200." <r885xoz4.fsf@python.net> References: <r885xoz4.fsf@python.net> Message-ID: <200304141410.h3EEAeZ14896@odiug.zope.com> > The docs for PyThreadState_Clear() state that the interpreter lock must > be held. > > I had this code in ctypes to delete the thread state and release the lock: > > static void LeavePython(char *msg) > { > PyThreadState *pts = PyThreadState_Swap(NULL); > if (!pts) > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > PyThreadState_Clear(pts); > PyThreadState_Delete(pts); > PyEval_ReleaseLock(); > } > > and (under certain conditions, when ptr->frame was not NULL), got What is ptr->frame? A typo for pts->frame? If pts->frame is not NULL, I'd expect a warning from PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has a frame\n". > "Fatal Python error: PyThreadState_Get: no current thread" in the call > to PyThreadState_Clear(). That's strange, because I cannot trace the code in there to such a call. (Unless it is in a destructor. Can you tell more about where the PyThreadState_Get() call was?) > The GIL is held while this code is executed, although there is no thread state.
Changing the code to the following fixes the problem, it seems > holding the GIL is not enough: > > static void LeavePython(char *msg) > { > PyThreadState *pts = PyThreadState_Get(); > if (!pts) > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > PyThreadState_Clear(pts); > pts = PyThreadState_Swap(NULL); > PyThreadState_Delete(pts); > PyEval_ReleaseLock(); > } > > Is this a documentation problem, or a misunderstanding on my side? > And, while we're on it, does the second version look ok? > > Thomas > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From paul@prescod.net Mon Apr 14 15:34:22 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 07:34:22 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> Message-ID: <3E9AC6EE.8010900@prescod.net> Tim Peters wrote: > ... > > Hiding critical resources in closures is a Bad Idea, of course -- that's why > nobody has used Scheme since 1993 <wink> Just to be clear, I didn't really intend to create a closure (i.e. a package of code and data). I just defined a function in a function because the inner function wasn't needed elsewhere. I don't know what the solution is, but it seems quite serious to me that there is another special case to remember when reasoning about when destructors get called. Roughly, Python's cleanup model is "things get destroyed when nothing refers to them." Then, that gets clarified to "unless they have reference cycles, in which case they may get destroyed arbitrarily later" and now "or they are used in a function containing another function, which will cause a circular reference involving all local variables." 
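The special case Paul describes is easy to reproduce. The sketch below uses made-up names and assumes CPython, where plain reference counting is distinct from the cycle collector: a recursive nested function survives the call that created it, and only a collector pass reclaims it.

```python
import gc
import weakref

def outer():
    def inner(n):
        if n:
            inner(n - 1)  # 'inner' is resolved through a closure cell,
                          # so the function object references itself
    inner(2)
    return weakref.ref(inner)

gc.collect()                         # start from a clean slate
ref = outer()
alive_before_gc = ref() is not None  # the cell <-> function cycle keeps
                                     # inner alive after outer() returns
gc.collect()
alive_after_gc = ref() is not None   # the cycle collector has now freed it
print(alive_before_gc, alive_after_gc)
```

Anything else held in inner's closure would stay alive along with it for the same interval, which is exactly the deferred-destructor behavior being complained about.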
Paul Prescod From guido@python.org Mon Apr 14 15:50:10 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 10:50:10 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 07:34:22 PDT." <3E9AC6EE.8010900@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> Message-ID: <200304141450.h3EEoAx15118@odiug.zope.com> > From: Paul Prescod <paul@prescod.net> > > Roughly, Python's cleanup model is "things get destroyed when > nothing refers to them." This hasn't been the mantra since Jython was introduced. Since then, the rule has always been "some arbitrary time after nothing refers to them." And the corollary is "always explicitly close your external resources." --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Mon Apr 14 15:52:42 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 10:52:42 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9AC6EE.8010900@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> Message-ID: <1050331961.28028.4.camel@slothrop.zope.com> On Mon, 2003-04-14 at 10:34, Paul Prescod wrote: > I don't know what the solution is, but it seems quite serious to me that > there is another special case to remember when reasoning about when > destructors get called. Roughly, Python's cleanup model is "things get > destroyed when nothing refers to them." Then, that gets clarified to > "unless they have reference cycles, in which case they may get destroyed > arbitrarily later" and now "or they are used in a function containing > another function, which will cause a circular reference involving all > local variables." The details of when finalizers are called are an implementation detail rather than a language property.
You should add to your list of worries: An object is not finalized when it is reachable from a cycle of objects involving finalizers. They don't get destroyed at all. Finalizers seem useful in general, but I would still worry about any specific program that managed critical resources using finalizers. Jeremy From theller@python.net Mon Apr 14 15:58:50 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 16:58:50 +0200 Subject: [Python-Dev] GIL vs thread state In-Reply-To: <200304141410.h3EEAeZ14896@odiug.zope.com> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> Message-ID: <llydxh05.fsf@python.net> Guido van Rossum <guido@python.org> writes: > > The docs for PyThreadState_Clear() state that the interpreter lock must > > be held. > > > > I had this code in ctypes to delete the thread state and release the lock: > > > > static void LeavePython(char *msg) > > { > > PyThreadState *pts = PyThreadState_Swap(NULL); > > if (!pts) > > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > > PyThreadState_Clear(pts); > > PyThreadState_Delete(pts); > > PyEval_ReleaseLock(); > > } > > > > and (under certain conditions, when ptr->frame was not NULL), got > > What is ptr->frame? A typo for pts->frame? Right, sorry. > > If pts->frame is not NULL, I'd expect a warning from > PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has > a frame\n". You mean this code, from Python/pystate.c?

void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
            "PyThreadState_Clear: warning: thread still has a frame\n");
    ZAP(tstate->frame);
    ZAP(tstate->dict);
    ...
}

Py_VerboseFlag is 0 in my case, so no warning is printed. > > > "Fatal Python error: PyThreadState_Get: no current thread" in the call > > to PyThreadState_Clear(). > > That's strange, because I cannot trace the code in there to such a > call. (Unless it is in a destructor.
It is in a destructor: frame_dealloc, called from ZAP(tstate->frame). > Can you tell more about where > the PyThreadState_Get() call was?) This function allocates the threadstate for me:

static void EnterPython(char *msg)
{
    PyThreadState *pts;
    PyEval_AcquireLock();
    pts = PyThreadState_New(g_interp);
    if (!pts)
        Py_FatalError("wincall: Could not allocate ThreadState");
    if (NULL != PyThreadState_Swap(pts))
        Py_FatalError("wincall (EnterPython): thread state not == NULL?");
}

To explain the picture a little better, here is the sequence of calls: Python calls into a C extension. The C extension does

Py_BEGIN_ALLOW_THREADS
call_a_C_function()
Py_END_ALLOW_THREADS

The call_a_C_function calls back into C code like this:

void MyCallback(void)
{
    EnterPython();  /* acquire the lock, and create a thread state */
    execute_some_python_code();
    LeavePython();  /* destroy the thread state, and release the lock */
}

Now, the execute_some_python_code() section is enclosed in a win32 structured exception handling block, and it may return still with a frame in the threadstate, as it seems. Oops, I just tried the code in CVS python, and the problem goes away. Same for 2.3a2. But my code has to run in 2.2.2 as well... Thomas Here's the stack from python 2.2.2: NTDLL!
77f6f570() PyThreadState_Get() line 246 + 10 bytes PyErr_Fetch(_object * * 0x0012f944, _object * * 0x0012f954, _object * * 0x0012f948) line 215 + 5 bytes call_finalizer(_object * 0x0095ef20) line 382 + 17 bytes subtype_dealloc(_object * 0x0095ef20) line 434 + 9 bytes _Py_Dealloc(_object * 0x0095ef20) line 1837 + 7 bytes frame_dealloc(_frame * 0x00890c20) line 82 + 79 bytes _Py_Dealloc(_object * 0x00890c20) line 1837 + 7 bytes PyThreadState_Clear(_ts * 0x0095d2a0) line 174 + 86 bytes LeavePython(char * 0x1001125c) line 41 + 10 bytes From guido@python.org Mon Apr 14 16:18:07 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 11:18:07 -0400 Subject: [Python-Dev] GIL vs thread state In-Reply-To: Your message of "14 Apr 2003 16:58:50 +0200." <llydxh05.fsf@python.net> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> <llydxh05.fsf@python.net> Message-ID: <200304141518.h3EFI7w16583@odiug.zope.com> > > If pts->frame is not NULL, I'd expect a warning from > > PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has > > a frame\n". > You mean this code, from Python/pystate.h? > > void > PyThreadState_Clear(PyThreadState *tstate) > { > if (Py_VerboseFlag && tstate->frame != NULL) > fprintf(stderr, > "PyThreadState_Clear: warning: thread still has a frame\n"); > > ZAP(tstate->frame); > > ZAP(tstate->dict); > ... > } Yes. > Py_VerboseFlag is 0 set in my case, so no warning is printed. OK. > > > "Fatal Python error: PyThreadState_Get: no current thread" in the call > > > to PyThreadState_Clear(). > > > > That's strange, because I cannot trace the code in there to such a > > call. (Unless it is in a destructor. > > It is in a destructor: frame_dealloc, called from ZAP(tstate->frame). Aha. That wasn't obvious from your description. > > Can you tell more about where the PyThreadState_Get() call was?) 
> > This function allocates the threadstate for me: > > static void EnterPython(char *msg) > { > PyThreadState *pts; > PyEval_AcquireLock(); > pts = PyThreadState_New(g_interp); > if (!pts) > Py_FatalError("wincall: Could not allocate ThreadState"); > if (NULL != PyThreadState_Swap(pts)) > Py_FatalError("wincall (EnterPython): thread state not == NULL?"); > } Maybe you should have a look at Mark Hammond's PEP 311. It describes the problem and proposes a better solution. (I think it requires you to always use the existing thread state for the thread, rather than making up a temporary thread state as is currently the idiom.) > To explain the picture a little better, here is the sequence of calls: > > Python calls into a C extension. > The C extension does > > Py_BEGIN_ALLOW_THREADS > call_a_C_function() > Py_END_ALLOW_THREADS > > The call_a_C_function calls back into C code like this: > > void MyCallback(void) > { > EnterPython(); /* acquire the lock, and create a thread state */ > execute_some_python_code(); > LeavePython(); /* destroy the thread state, and release the lock */ > } > > Now, the execute_some_python_code() section is enclosed in a win32 > structured exception handling block, and it may return still with a > frame in the threadstate, as it seems. Ouch! I don't know what structured exception handling is, but this looks like it would be as bad as using setjmp/longjmp to get back to right after execute_some_python_code(). That code could leak arbitrary Python references!!! > Oops, I just tried the code in CVS python, and the problem goes away. > Same for 2.3a2. I vaguely recall that someone fixed some things in this area... :-( > But my code has to run in 2.2.2 as well... If the docs are lying, they have to be fixed. This is no longer my prime area of expertise... 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Mon Apr 14 16:35:34 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 17:35:34 +0200 Subject: [Python-Dev] GIL vs thread state In-Reply-To: <200304141518.h3EFI7w16583@odiug.zope.com> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> <llydxh05.fsf@python.net> <200304141518.h3EFI7w16583@odiug.zope.com> Message-ID: <adetxfax.fsf@python.net> Guido van Rossum <guido@python.org> writes: > Maybe you should have a look at Mark Hammond's PEP 311. It describes > the problem and proposes a better solution. (I think it requires you > to always use the existing thread state for the thread, rather than > making up a temporary thread state as is currently the idiom.) I have only briefly skimmed the PEP, but I have the impression that it proposes a new API, which may appear in 2.3 or 2.4. > Ouch! I don't know what structured exception handling is, but this > looks like it would be as bad as using setjmp/longjmp to get back to > right after execute_some_python_code(). Exactly. It basically does a longjmp() instead of crashing the process with an access violation, for example. > That code could leak > arbitrary Python references!!! I consider access violations programming errors, so leaking references would be ok. But I want to print a traceback instead of crashing (or at least before crashing). > If the docs are lying, they have to be fixed. This is no longer my > prime area of expertise... :-( That's why I have been asking. I can submit a bug pointing to this thread. Thomas From pje@telecommunity.com Mon Apr 14 16:52:16 2003 From: pje@telecommunity.com (Phillip J.
Eby) Date: Mon, 14 Apr 2003 11:52:16 -0400 Subject: [Python-Dev] Garbage collecting closures Message-ID: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> >Then, that gets clarified to >"unless they have reference cycles, in which case they may get destroyed >arbitrarily later" and now "or they are used in a function containing >another function, which will cause a circular reference involving all >local variables." Actually, the issue is that *recursive* nested functions create a circular reference. Note that the body of function 'foo' contains a reference to 'foo'. *That* is the circular reference. If I understand correctly, it should also be breakable by deleting 'foo' from the outer function when you're done with it. E.g.:

def bar(a):
    def foo():
        return None
    x = a
    foo()

    del foo  # clears the cell and breaks the cycle

Strangely, I could have sworn that there was documentation that came out when nested scopes were introduced that discussed this issue. But I just looked at PEP 227 and the related "What's New" document, and neither explicitly mentions that defining recursive nested functions creates a circular reference. I think I just "knew" that it would do so, from what *is* said in those documents and what little I knew about how the cells mechanism was supposed to work. Since both PEP 227 and the What's New document mention recursive nested functions as a motivating example for nested scopes, perhaps they should mention the circular reference consequence of doing so.
Eby wrote: > If I understand correctly, it should also be breakable by deleting 'foo' > from the outer function when you're done with it. E.g.: > > def bar(a): > def foo(): > return None > x = a > foo() > > del foo # clears the cell and breaks the cycle > You haven't tried this, have you? ;-) SyntaxError: can not delete variable 'foo' referenced in nested scope Since foo() could escape bar, i.e. become reachable outside of bar(), we don't allow you to unbind foo. Jeremy From pje@telecommunity.com Mon Apr 14 17:08:38 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 14 Apr 2003 12:08:38 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050335915.28028.10.camel@slothrop.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> Message-ID: <5.1.1.6.0.20030414120333.01d59220@telecommunity.com> At 11:58 AM 4/14/03 -0400, Jeremy Hylton wrote: >On Mon, 2003-04-14 at 11:52, Phillip J. Eby wrote: > > If I understand correctly, it should also be breakable by deleting 'foo' > > from the outer function when you're done with it. E.g.: > > > > def bar(a): > > def foo(): > > return None > > x = a > > foo() > > > > del foo # clears the cell and breaks the cycle > > > >You haven't tried this, have you? ;-) Well, I did say, "If I understand correctly". :) What's funny is, I could've sworn I've used 'del' under similar circumstances before. It must not have been to delete a cell, just deleting something else in a function that defined a function. Ah well. >SyntaxError: can not delete variable 'foo' referenced in nested scope Interestingly, it gives me a different error in IDLE: "unsupported operand type(s) for -: 'NoneType' and 'int'" >Since foo() could escape bar, i.e. become reachable outside of bar(), we >don't allow you to unbind foo. 
So do this instead: foo = None From guido@python.org Mon Apr 14 17:08:04 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 12:08:04 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "14 Apr 2003 11:58:35 EDT." <1050335915.28028.10.camel@slothrop.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <1050335915.28028.10.camel@slothrop.zope.com> Message-ID: <200304141608.h3EG84V17588@odiug.zope.com> > On Mon, 2003-04-14 at 11:52, Phillip J. Eby wrote: > > If I understand correctly, it should also be breakable by deleting 'foo' > > from the outer function when you're done with it. E.g.: > > > > def bar(a): > > def foo(): > > return None > > x = a > > foo() > > > > del foo # clears the cell and breaks the cycle > From: Jeremy Hylton <jeremy@zope.com> > > You haven't tried this, have you? ;-) > > SyntaxError: can not delete variable 'foo' referenced in nested scope > > Since foo() could escape bar, i.e. become reachable outside of bar(), we > don't allow you to unbind foo. I don't see the reason for this semantic restriction. IMO it could just as well be a runtime error (e.g. raising UnboundLocalError). --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Mon Apr 14 17:16:59 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 12:16:59 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304141608.h3EG84V17588@odiug.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <1050335915.28028.10.camel@slothrop.zope.com> <200304141608.h3EG84V17588@odiug.zope.com> Message-ID: <1050337018.28028.19.camel@slothrop.zope.com> On Mon, 2003-04-14 at 12:08, Guido van Rossum wrote: > I don't see the reason for this semantic restriction. IMO it could > just as well be a runtime error (e.g. raising UnboundLocalError). I can't recall why I thought this restriction was necessary. 
Very little code and one new opcode is required to change the compile-time error to a runtime error. Jeremy From paul@prescod.net Mon Apr 14 20:32:06 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 12:32:06 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304141450.h3EEoAx15118@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> Message-ID: <3E9B0CB6.4030101@prescod.net> Jeremy Hylton wrote: > The details of when finalizers are called is an implementation detail > rather than a language property. and Guido van Rossum wrote: > ... Since then, > the rule has always been "some arbitrary time after nothing refers to > them." And the corollary is "always explicitly close your external > resources." I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some background so you can understand my use case. Then you can decide for yourself whether it is worth supporting. When you're dealing with COM objects, you do stuff like: foo = a.b.c.d b, c and d are all temporary, reference counted objects: reference counted on both the COM and Python sides. It is quite inconvenient to treat them as "resources" like database handles or something. a = Dispatch("xxx.yyy") b = a.b c = b.c d = c.d a.release() b.release() c.release() 80% of the variables in my code are COM objects! I'm not a big win32com programmer, but it is my impression that this is NOT the typical programming style. COM is specifically designed to use reference counting so that programmers (even C++ programmers!) don't have to do explicit deallocation. COM and CPython have roughly the same garbage collection model (reference counted) so there is no need to treat them as special external resources. 
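[Editor's sketch: the refcounting behaviour Paul relies on can be shown with a toy stand-in for a COM wrapper. The `Wrapper` class and its `child` property are hypothetical illustrations, not win32com API; the point is that CPython's reference counting frees each chained temporary as soon as the expression is done with it, with no explicit release() calls.]

```python
# Toy stand-in for a refcounted COM-style object (hypothetical class).
# Each attribute access returns a fresh intermediate object; CPython's
# refcounting runs each temporary's __del__ the moment the chained
# expression no longer needs it.
log = []

class Wrapper:
    def __init__(self, name):
        self.name = name

    @property
    def child(self):
        return Wrapper(self.name + ".child")

    def __del__(self):
        log.append(self.name)

a = Wrapper("a")
d = a.child.child.child  # the two intermediate wrappers are temporaries
```

On a non-refcounting implementation (such as Jython at the time), the intermediates would instead linger until the next GC pass, which is exactly the divergence this thread is about.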
(nowadays, Python cleans up circular references and COM doesn't, so there is a minor divergence there) The truth is that even after having been bitten, I'd rather deal with the 3 or 4 exceptional garbage collection cases (circular references with finalizers, closures, etc.) than uglify and complicate my Python code! I'll explicitly run GC in a shutdown() method. Even though it is easy to work around, this particular special case really feels pathological to me. Simple transformations set it off, and they can be quite non-local. From: Safe: def a(): if something: a() def b(): a() ... # ten thousand lines of code x = com_object to Buggy: def b(): def a(): if something: a() a() ... # ten thousand lines of code x = com_object OR Safe: def b(): def a(): if something: a() a() ... # ten thousand lines of code com_object.do_something() to Buggy: def b(): def a(): if something: a() a() # ten thousand lines of code junk = com_object.do_something() If I'm the first and last person to have this problem, then I guess it won't be a big deal, but it sure was confusing for me to debug. The containing app wouldn't shut down while Python owned a reference. Paul Prescod From paul@prescod.net Mon Apr 14 20:43:58 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 12:43:58 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050331961.28028.4.camel@slothrop.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> Message-ID: <3E9B0F7E.1080407@prescod.net> > > Finalizers seem useful in general, but I would still worry about any > specific program that managed critical resources using finalizers. > > Jeremy Finalizer behaviour is implementation specific. Fair enough. Therefore, portable programs don't use finalizers. Okay, fine. Not all Python programs are designed to be portable. 
Finalizers tend to be used to deal with non-portable resources (COM objects, database handles) anyhow. This suggests to me that each implementation should document in detail how finalizers work in that implementation. After all, if you can't depend on them to work predictably even within a single implementation, what is their value at all? A totally unpredictable feature is of little more value than no feature at all. I propose to collect the various garbage collection special cases we've described in this discussion and write a tutorial for the CPython documentation. Does anyone know of any more special cases? Probably any library or language feature that can create non-obvious circular references should be listed. Paul Prescod From martin@v.loewis.de Mon Apr 14 21:00:49 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 14 Apr 2003 22:00:49 +0200 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B0CB6.4030101@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> Message-ID: <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> Paul Prescod <paul@prescod.net> writes: > I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some > background so you can understand my use case. Then you can decide for > yourself whether it is worth supporting. I think demonstrating use cases is futile, as people believe that what you want is unimplementable. Instead, if you would come forward with an implementation strategy, that would be more convincing. Regards, Martin From guido@python.org Mon Apr 14 21:03:33 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 16:03:33 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 12:43:58 PDT." 
<3E9B0F7E.1080407@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> Message-ID: <200304142003.h3EK3XH21345@odiug.zope.com> Paul, would finalizers have been run if you had included an explicit gc.collect() call? If so, I'd say that a sufficiently portable rule is that you can't trust finalizers to run until GC is run (in Jython, gc.collect() isn't how it is invoked though). If gc.collect() didn't solve your problem, full documentation of cycles would indeed be required. However, I'm reluctant to do so because this reveals a lot of information about the implementation that I don't want to have to guarantee for future versions. --Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Mon Apr 14 21:09:04 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 14 Apr 2003 13:09:04 -0700 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <1050350944.605.28.camel@sayge.arc.nasa.gov> On Sun, 2003-04-13 at 13:28, Tim Peters wrote: > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. Hmm, then this is my fault. I did exactly that. My approach was to Py_INCREF an optional argument if it was given (ie. not NULL), otherwise to create it from scratch, and Py_DECREF when I was done.
I believe this was a not uncommon idiom (I can't recall the specifics, but being my first submitted patch, I made sure to try to look for existing idioms for argument and error handling). I apologize if I erred. I assume a better approach, then, is to get the optional istep argument, and copy it into a variable for your own use (or create it if it didn't exist)? ie. Never increment or decrement the optional argument object, returned from PyArg_ParseTuple, at all? > You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). Bingo! Guido took a slightly different approach (and ultimately a better one, I think), in adapting my patch. Perhaps I unknowingly left a time bomb for him. I'll submit a patch to fix this all up tonight, if it hasn't already been addressed by then. > only-consumed-eight-hours-this-time<wink>-ly y'rs - tim Oh, ow! Now that pains me. I am very sorry to hear this wasted so much time. Chad Netzer From guido@python.org Mon Apr 14 21:13:31 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 16:13:31 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: Your message of "14 Apr 2003 13:09:04 PDT." <1050350944.605.28.camel@sayge.arc.nasa.gov> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> <1050350944.605.28.camel@sayge.arc.nasa.gov> Message-ID: <200304142013.h3EKDVo21434@odiug.zope.com> > On Sun, 2003-04-13 at 13:28, Tim Peters wrote: > > > It appears to be a refcount error in recently-added C code that tries to > > generalize the builtin range() function, specifically here: > > > > Fail: > > Py_XDECREF(curnum); > > Py_XDECREF(istep); <- here > > Py_XDECREF(zero); > > > > Word to the wise: don't ever try to reuse a variable whose address is > > passed to PyArg_ParseTuple for anything other than holding what > > PyArg_ParseTuple does or doesn't store into it. > > Hmm, then this is my fault. I did exactly that.
My approach was to > Py_INCREF an optional argument it if it was given (ie. not NULL), > otherwise to create it from scratch, and Py_DECREF when I was done. I > believe this was a not uncommon idiom (I can't recal the specifics, but > being my first submitted patch, I made sure to try to look for existing > idioms for argument and error handling). I apologize if I erred. > > I assume a better approach, then is to get the optional istep > argument, and copy it into a variable for your own use (or create it if > it didn't exist)? ie. Never increment or decrement the optional > argument object, returned from PyArg_ParseTuple, at all? > > > You'll never get the > > decrefs straight (and even if you manage to at first, the next person to > > modify your code will break it). > > Bingo! Guido took a slightly different approach (and ultimately a > better one, I think), in adapting my patch. Perhaps I unknowingly left > a time bomb for him. Sort of. Your code didn't have the refcount bug; I moved the initialization of 'zero' up, and changed a few 'return NULL' lines into 'goto Fail', but I didn't move the 'INCREF(istep)' up. > I'll submit a patch to fix this all up tonight, if it hasn't already > been addressed by then. Tim fixed it already. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Apr 14 21:20:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 16:20:58 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <1050350944.605.28.camel@sayge.arc.nasa.gov> Message-ID: <BIEJKCLHCIOIHAGOKOLHKEHCFFAA.tim.one@comcast.net> [Chad Netzer] > Hmm, then this is my fault. I did exactly that. Guido thinks he broke it when he updated the patch. It doesn't really matter to me -- I hate everyone anyway <wink>. > My approach was to Py_INCREF an optional argument it if it was given (ie. > not NULL), otherwise to create it from scratch, and Py_DECREF when I was > done. 
I believe this was a not uncommon idiom (I can't recal the > specifics, but being my first submitted patch, I made sure to try to look > for existing idioms for argument and error handling). I apologize if I > erred. I don't know -- and it doesn't matter. I ended up (perhaps) restoring your original intent. I think Guido was provoked into fiddling it to begin with because of the large number of exit labels in the original. > I assume a better approach, then is to get the optional istep > argument, and copy it into a variable for your own use (or create it if > it didn't exist)? ie. Never increment or decrement the optional > argument object, returned from PyArg_ParseTuple, at all? That's usually safest. This was an unusual function, though (range's signature is messy, and the extension to long required defaults that couldn't be expressed as native C types). > ... > I'll submit a patch to fix this all up tonight, if it hasn't already > been addressed by then. It's all been checked in. Nothing left to do. >> only-consumed-eight-hours-this-time<wink>-ly y'rs - tim > Oh, ow! Now that pains me. I am very sorry to hear this wasted so much > time. Well, what do you think weekends are for <wink>? From tim.one@comcast.net Mon Apr 14 21:38:39 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 16:38:39 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <2mznmtoa9g.fsf@starship.python.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHKEHEFFAA.tim.one@comcast.net> [Michael Hudson] > It seems to me that this would have been found much more easily if > floats didn't have a free list anymore... Hard to guess. It appears that the prematurely released float storage wasn't allocated again by the time the error occurred, so if floats used pymalloc a debug run would have sprayed 0xdb bytes into the memory, and that would have made it obvious that the memory had been freed. 
OTOH, if another float object had gotten allocated between the premature-free and the error, pymalloc and the free-list strategy are both likely to have handed out the same storage again, and we'd be staring at the same symptoms either way. It's hard to love the unbounded & immortal free list for floats regardless. OTOH, I have no doubt that it *is* faster than pymalloc (the latter has more overheads due to recycling whole pools when possible, and for determining who (pymalloc or system malloc) owns the memory getting freed; invoking pymalloc is also another layer of function call). From pedronis@bluewin.ch Mon Apr 14 21:37:34 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 14 Apr 2003 22:37:34 +0200 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B0CB6.4030101@prescod.net> References: <200304141450.h3EEoAx15118@odiug.zope.com> <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> Message-ID: <5.2.1.1.0.20030414223253.02a459c0@localhost> At 12:32 14.04.03 -0700, Paul Prescod wrote: >Buggy: > >def b(): > def a(): > if something: > a() > a() > # ten thousand lines of code > junk = com_object.do_something() a should refer and close over junk otherwise nothing bad happens. >>> class X: ... def __del__(self): ... print "dying" ... >>> def b(x): ... def a(n): ... if n: a(n-1) ... a(1) ... junk = x ... >>> b(X()) dying vs. >>> def b(x): ... def a(n): ... if n: a(n-1) ... junk ... junk = x ... 
>>> b(X()) >>> gc.collect() dying 10 regards From paul@prescod.net Mon Apr 14 22:41:31 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 14:41:31 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304142003.h3EK3XH21345@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> <200304142003.h3EK3XH21345@odiug.zope.com> Message-ID: <3E9B2B0B.60101@prescod.net> Guido van Rossum wrote: > Paul, would finalizers have been run if you had included an explicit > gc.collect() call? > > If so, I'd say that a sufficiently portable rule is that you can't > trust finalizers to run until GC is run (in Jython, gc.collect() isn't > how it is invoked though). Yes, now that I know to try it, gc.collect() would have fixed the problem. But I'm not sure where I would have learned to do so. The documentation for __del__ is out of date. * http://www.python.org/doc/2.3a2/ref/customization.html The documentation lists a variety of reasons that __del__ might not get called (it doesn't claim to be exhaustive but it does list some cases that I consider pretty obscure). It doesn't list nested recursive functions. One strategy is to update the __del__ and gc documentation to add this case. Another strategy is to update the __del__ documentation to say: "if you want this to be executed deterministically in CPython, call gc.collect()". Or both. 
Paul Prescod From paul@prescod.net Mon Apr 14 22:45:19 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 14:45:19 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E9B2BEF.5000907@prescod.net> Martin v. Löwis wrote: > Paul Prescod <paul@prescod.net> writes: > > >>I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some >>background so you can understand my use case. Then you can decide for >>yourself whether it is worth supporting. > > > I think demonstrating use cases is futile, as people believe that what > you want is unimplementable. Instead, if you would come forward with > an implementation strategy, that would be more convincing. I'm not going to advocate a particular strategy because I don't know enough of the performance and implementation costs. But you asked for a strategy so I'll at least suggest one. Python could run gc.collect() after returning from functions containing nested recursive functions. Perhaps an opcode flags these functions. Arguably this happens rarely enough that predictability is more important than performance in this case. (I admit again that it is arguable!) Perhaps there would be some more precise way to tell gc.collect to only inspect graphs containing the offending nested function...or maybe you could be even more precise: if a function is known to be a nested function and it has a single reference count then could you say that the only reference is to itself recursively? Of course if the function returned a closure and the closure depended on a variable referencing an object then the object should live as long as the closure. That's both expected and necessary.
Paul Prescod From jeremy@zope.com Mon Apr 14 22:51:23 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 17:51:23 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2B0B.60101@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> <200304142003.h3EK3XH21345@odiug.zope.com> <3E9B2B0B.60101@prescod.net> Message-ID: <1050357083.28025.42.camel@slothrop.zope.com> On Mon, 2003-04-14 at 17:41, Paul Prescod wrote: > Yes, now that I know to try it, gc.collect() would have fixed the problem. > > But I'm not sure where I would have learned to do so. The documentation > for __del__ is out of date. > > * http://www.python.org/doc/2.3a2/ref/customization.html > > The documentation lists a variety of reasons that __del__ might not get > called (it doesn't claim to be exhaustive but it does list some cases > that I consider pretty obscure). It doesn't list nested recursive functions. The first one on the list is "circular references between objects." (Now that should be "among" objects, but that's not your complaint.) Nested recursive functions are an example of data structure involving circular references. Jeremy From nas@python.ca Mon Apr 14 23:09:14 2003 From: nas@python.ca (Neil Schemenauer) Date: Mon, 14 Apr 2003 15:09:14 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2BEF.5000907@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> <3E9B2BEF.5000907@prescod.net> Message-ID: <20030414220914.GA1208@glacier.arctrix.com> Paul Prescod wrote: > I'm no going to advocate a particular strategy because I don't know > enough of the performance and implementation costs. 
But you asked for a > strategy so I'll at least suggest one. Python could run gc.collect() > after returning from functions containing nested recursive functions. gc.collect() is too expensive for that to be feasible. Neil From tim.one@comcast.net Mon Apr 14 23:15:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 18:15:58 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2B0B.60101@prescod.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEHOFFAA.tim.one@comcast.net> [Paul Prescod] > ... > But I'm not sure where I would have learned to do so. The documentation > for __del__ is out of date. > > * http://www.python.org/doc/2.3a2/ref/customization.html > > The documentation lists a variety of reasons that __del__ might not get > called (it doesn't claim to be exhaustive but it does list some cases > that I consider pretty obscure). It doesn't list nested recursive > functions. __del__ isn't relevant to your test case, though: if the cycles in question contained any object with a __del__ method, gc would never have reclaimed them (and gc.collect() would have had no effect on them either, other than to move the trash cycles into gc.garbage). You had __del__-free cycles, and then there is indeed no way to predict when they'll get reclaimed. I think that's just life; you wouldn't be any better off in Java or Scheme or anything else. It's always been difficult to guess when the implementation of a thing may involve a cycle under the covers, and closures, generators and new-style classes have created many new opportunities for cycles to appear. I don't expect users to know when they're going to happen! I can't keep them all straight myself. I try to write code that doesn't care, though (avoid __del__ methods; avoid "hiding" critical resources in side effects of what look like simple expressions; arrange for subsystems that can be explicitly told to release critical resources). Ain't always easy. 
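[Editor's sketch: the "explicitly told to release critical resources" style Tim ends with, using a hypothetical resource class. The try/finally form is what was available in 2003; the later `with` statement (Python 2.5) packages the same pairing.]

```python
class TrackedResource:
    """Hypothetical resource: close() is the contract, not __del__."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True   # explicit, idempotent release

    # Context-manager protocol: 'with' automates the try/finally pairing.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False

# try/finally form: release happens deterministically, cycles or not
r1 = TrackedResource()
try:
    pass  # ... use r1 ...
finally:
    r1.close()

# equivalent with-statement form
with TrackedResource() as r2:
    pass  # ... use r2 ...
```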
From drifty@alum.berkeley.edu Mon Apr 14 23:59:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 14 Apr 2003 15:59:46 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > That would be great. Do you have a SF userid yet? Then we can give > > > you commit privs! > > > > bcannon is my username. I was going to wait to ask for commit privs > > until I had done more patches <snip> > OK, you're on. > Cool! Thanks, Guido! No more recv() resets from SF! Woohoo! > > I could. Going to have to learn more LaTeX (and the special > > extensions). So I can take this on, but I can't make any promises > > on when this will get done (I would be personally horrified if I > > can't get this done before 2.3 final gets out the door, but you > > never know). > > With LaTeX, the monkey-see-monkey-do approach works pretty well, > combined with the Fred-will-fix-my-LaTeX-bugs approach. :-) > =) Works for me. > > Should there be a testing SIG? Could keep a list of tests that > > could stand to be rewritten or added (I know I was surprised to > > discover test_urllib was so lacking). Could also start by hashing > > out these docs and making sure regrtest and test_support stay > > updated and relevant. > > I don't know about a SIG. Testing of what's in the core is fair game > for python-dev. 3rd party testing, ask around. > OK, no SIG then. > > Personally, I think writing regression tests is a good way to get > > new people to help with Python. 
They are simple to write and allows > > someone to be able to get involved beyond just filing a bug. I know > > it was a thrill for me the first time I got code checked in and > > maybe making the entry point easier by trying to get more people to > > write more regression tests for the libraries will help give someone > > else that rush and thus become more involved. > > > > Or maybe I am just bonkers. =) > > Writing a good regression test requires excellent knowledge about the > code you're testing while not touching it, so that's indeed a good way > to learn. > One of these days I am going to put together an "Intro to python-dev" page that discusses the basic etiquette on the list and how to slowly get more and more involved. But it looks like I have some LaTeX docs to write first. -Brett From guido@python.org Tue Apr 15 01:08:08 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:08:08 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Mon, 14 Apr 2003 15:59:46 PDT." <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> Message-ID: <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> > One of these days I am going to put together an "Intro to > python-dev" page that discusses the basic etiquette on the list and > how to slowly get more and more involved. There's already quite a bit of that at http://www.python.org/dev/ (follow the links to "Development Process" and "Culture"). 
Since you already have access to the CVS repository for the website, you could simply augment what's already there... --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Tue Apr 15 01:32:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 12:32:34 +1200 (NZST) Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050331961.28028.4.camel@slothrop.zope.com> Message-ID: <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> Jeremy: > Finalizers seem useful in general, but I would still worry about any > specific program that managed critical resources using finalizers. What *are* they useful for, then? Or are they only useful "in general", and never in any particular case? :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Tue Apr 15 01:35:34 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:35:34 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sat, 12 Apr 2003 09:43:52 EDT." <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304150035.h3F0ZYc03122@pcp02138704pcs.reston01.va.comcast.net> > Someone accidentally discovered a way to set attributes of built-in > types, even though the implementation tries to prevent this. I've checked in what I believe is an adequate block for at least this particular hack. wrap_setattr(), which is called in response to <type>.__setattr__(), now compares if the C function it is about to call is the same as the C function in the built-in base class closest to the object's class. 
This means that if B is a built-in class and P is a Python class derived from B, P.__setattr__ can call B.__setattr__, but not A.__setattr__ where A is an (also built-in) base class of B (unless B inherits A.__setattr__). The following session shows that object.__setattr__ can no longer be used to set a type's attributes: Remind us that 'str' is an instance of 'type': >>> isinstance(str, type) True 'type' has a __setattr__ method that forbids setting all attributes. Try type.__setattr__; nothing new here: >>> type.__setattr__(str, "foo", 42) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't set attributes of built-in/extension type 'str' Remind us that 'object' is a base class of 'type': >>> issubclass(type, object) True Now try object.__setattr__. This used to work; now it shows the new error message: >>> object.__setattr__(str, "foo", 42) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __setattr__ to type object __delattr__ has the same restriction, or else you would be able to remove existing str methods -- not good: >>> object.__delattr__(str, "foo") Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __delattr__ to type object In other (normal) circumstances object.__setattr__ still works: >>> class C(object): ... pass ... >>> x = C() >>> object.__setattr__(x, "foo", 42) >>> object.__delattr__(x, "foo") I'll backport this to Python 2.2 as well. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Tue Apr 15 01:45:53 2003 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 14 Apr 2003 20:45:53 -0400 Subject: [Python-Dev] RE: List wisdom In-Reply-To: <E1954ey-0007KU-00@borgia.local> Message-ID: <LNBBLJKPBEHFEDALKOLCEEDIEDAB.tim_one@email.msn.com> [Uche Ogbuji] > ... 
> So I dug through the Python Wiki, and found no such page of gems > (just a lot of whimsical quotes from #python and a code-sharing page with some > odd trinkets). I also checked to see if #python had a chump (opt-in > log) on which I could put the quote. No dice. I did chump it on the #4suite > log: > > http://uche.ogbuji.net/tech/akara/?xslt=irc.xslt&date=2003-04-14#14:03:38 I didn't understand a word of that -- young people <0.9 wink>. > I also created a Python Wiki page for useful notes and code > snippets from this mailing list: > > http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom > > Please feel free to use it if anything here seems especially important to > highlight (in addition to Brett Cannon's tireless work, of course). Excellent idea! The Python Wiki seems severely underused. I tried to help it along by fleshing out the snippet. Unlike chumping, typing is something an old bot knows how to do ... From guido@python.org Tue Apr 15 01:55:21 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:55:21 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: "Your message of Tue, 15 Apr 2003 12:32:34 +1200." <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> References: <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> Message-ID: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> > > Finalizers seem useful in general, but I would still worry about any > > specific program that managed critical resources using finalizers. > > What *are* they useful for, then? Or are they only useful "in > general", and never in any particular case? :-) Finalizers are a necessary evil. For example, when I create a Python file type that encapsulates an external resource like a file descriptor as returned by os.open(), together with a buffer, I really want to be able to specify a finalizer that flushes the write buffer and closes the file descriptor. But I also really want the application not to rely on that finalizer! 
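[Editor's sketch of the file type Guido describes: `BufferedFile` is a hypothetical class with real buffering and error handling omitted. close() is the documented contract; __del__ is only a refcounting safety net whose timing is not guaranteed and which never fires if the object is caught in a cycle.]

```python
import os
import tempfile

class BufferedFile:
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.buffer = []

    def write(self, data):
        self.buffer.append(data)

    def flush(self):
        if self.fd is not None and self.buffer:
            os.write(self.fd, b"".join(self.buffer))
            self.buffer = []

    def close(self):
        if self.fd is not None:   # idempotent: safe to call twice
            self.flush()
            os.close(self.fd)
            self.fd = None

    def __del__(self):
        # Safety net only: may run late, or not at all if the object is
        # part of a cycle -- applications should still call close().
        self.close()

path = tempfile.mktemp()
f = BufferedFile(path)
f.write(b"hello, ")
f.write(b"world")
f.close()   # the explicit close flushes deterministically
```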
Note that as a library developer, I can write the file type carefully to avoid being part of any cycles, so the restriction on finalizers that are part of cycles doesn't bother me too much: I'm doing all I can, and if a file is nevertheless kept alive by a cycle in the application's code, the application has to deal with this (same as with a file type implemented in C, for which the restriction on finalizers in cycles doesn't hold). Why do I, as library developer, want the finalizer? Because I don't want to rely on the application to keep track of when a file must be closed. But then why do I (still as library developer) recommend that the application closes files explicitly? Because there's no guarantee *when* finalizers are run, and it's easy for the application to create a cycle unknowingly (as we've seen in Paul's case). Basically, the dual requirement is there to keep the application and the library from pointing fingers at each other when there's a problem with leaking file descriptors. This makes me think that Python should run the garbage collector before exiting, so that finalizers on objects that were previously kept alive by cycles are called (even if finalizers on objects that are *part* of a cycle still won't be called). I also think that if a strongly connected component (a stronger concept than cycle) has exactly one object with a finalizer in it, that finalizer should be called, and then the object should somehow be marked as having been finalized (maybe a separate GC queue could be used for this) in case it is resurrected by its finalizer. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Tue Apr 15 02:18:14 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 21:18:14 -0400 Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <m3znmux2yr.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> [Skip Montanaro] >> Is it time to think seriously about moving away from SourceForge? [Martin v. Löwis] > Any proposal to move away from SourceForge should include a proposal > where to move *to*. I highly admire SourceForge operators for their > quality of service, and challenge anybody to provide the same quality > service. Be prepared to find yourself in a full-time job if you want > to take over. I'm not sure that better alternatives for *some* of what SF does couldn't be gotten with reasonable effort. For example, on a quiet machine, I just did a cvs up on a fully up-to-date Python, via SF. That took 147 seconds. I also did a cvs up on a fully up-to-date Zope3, via Zope Corp's CVS setup. That took 9 seconds. I expect at least as many (probably more) people hit Zope's CVS as hit Python's CVS, and ZC appears to put minimal effort into maintaining its public CVS servers. A crucial difference is that SF CVS has to serve hundreds of thousands of people, and ZC's more like just hundreds. > SourceForge performance was *much* worse in the past, and we didn't > consider moving away, and SF fixed it by buying new hardware. Give > them some time. There have been times over the past few weeks when cvsup time via SF was as bad as it's ever been, meaning > half an hour to finish. There have also been times when it's been quite zippy. I think they've made tremendous strides in cutting response time for the trackers, though (that was indeed very much worse in the past).
From greg@cosc.canterbury.ac.nz Tue Apr 15 02:41:15 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 13:41:15 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> > Why do I, as library developer, want the finalizer? Because I don't > want to rely on the application to keep track of when a file must be > closed. > > But then why do I (still as library developer) recommend that the > application closes files explicitly? Because there's no guarantee > *when* finalizers are run, and it's easy for the application to create > a cycle unknowingly (as we've seen in Paul's case). Okay, I think I see what you're saying. Finalizers are needed to make sure that resources are *eventually* reclaimed, and if that's not good enough for the application, it needs to make its own arrangements. Fair enough. What bothers me, though, is that even with finalizers, the library writer *still* can't guarantee eventual reclamation. The application can unwittingly stuff it all up by creating cycles, and there's nothing the library writer can do about it. It seems to me that giving up on finalization altogether in the presence of cycles is too harsh. In most cases, the cycle isn't actually going to make any difference. With a cycle of your abovementioned file-descriptor-holding objects, for example, they could be finalized in an arbitrary order, because the *finalizers* don't depend on any other objects in the cycle. So maybe there should be some way of classifying the references held by an object into those that are relied upon by its finalizer, and those that aren't. The algorithm would then be to first go through and clear all the references that *aren't* needed by finalizers, and then... Actually, that's all you would need to do, I think.
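The two-step idea can be acted out by hand with a toy cycle whose finalizers don't depend on the cycle-forming references (all names here are invented for illustration; a real collector would have to discover which references are safe to clear):

```python
class Node:
    def __init__(self, name, log):
        self.name = name
        self.log = log      # the finalizer depends only on .name and .log
        self.other = None   # cycle-forming reference, NOT used by the finalizer

    def __del__(self):
        self.log.append(self.name)

log = []
a, b = Node("a", log), Node("b", log)
a.other, b.other = b, a     # a <-> b: a reference cycle

# Step 1: clear the references the finalizers don't need.
a.other = b.other = None

# Step 2: with the cycle broken, plain reference counting reclaims both
# objects and runs their finalizers in the normal order.
del a, b
```

After the two dels, log holds both names: once the unneeded references are gone, refcounting alone takes care of the finalization.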
If there is an unambiguous order of finalization, that means there must be no cycles amongst the references needed by finalizers. And if that's the case, once you've cleared all the other references, normal ref counting will take care of the rest and call their finalizers in the proper order. If there's anything left after that, then you have a genuinely difficult case and are entitled to give up! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Tue Apr 15 03:16:31 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 22:16:31 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> [Guido] > ... > This makes me think that Python should run the garbage collector > before exiting, so that finalizers on objects that were previously > kept alive by cycles are called (even if finalizers on objects that > are *part* of a cycle still won't be called). What about finalizers on objects that are alive at exit because they're still reachable? We seem to leave a lot of stuff alive at the end. For example, here are the pymalloc stats at the end under current CVS, after opening an interactive shell then exiting immediately; this is produced at the end of Py_Finalize(), and only call_ll_exitfuncs() is done after this (and that probably shouldn't free anything): Small block threshold = 256, in 32 size classes. 
class   size   num pools   blocks in use   avail blocks
-----   ----   ---------   -------------   ------------
    2     24           1               1            168
    5     48           1               2             82
    6     56          13             170            766
    7     64          13             445            374
    8     72           5              25            255
    9     80           1               1             49
   15    128           1               2             29
   20    168           5              25             95
   23    192           1               1             20
   25    208           1               2             17
   29    240           1               2             14
   31    256           1               1             14

# times object malloc called    =  17,119
3 arenas * 262144 bytes/arena   = 786,432
# bytes in allocated blocks     =  45,800
# bytes in available blocks     = 131,072
145 unused pools * 4096 bytes   = 593,920
# bytes lost to pool headers    =   1,408
# bytes lost to quantization    =   1,944
# bytes lost to arena alignment =  12,288
Total                           = 786,432

"size" here is 16 bytes larger than in a release build, because of the 8-byte padding added by PYMALLOC_DEBUG on each end of each block requested. So, e.g., there's one (true size) 8-byte object still living at the end, and 445 48-byte objects. Unreclaimed ints and floats aren't counted here (they've got their own free lists, and don't go thru pymalloc). I don't know what all that stuff is, but I bet there are about 25 dicts still alive at the end. > I also think that if a strongly connected component (a stronger > concept than cycle) has exactly one object with a finalizer in it, > that finalizer should be called, and then the object should somehow be > marked as having been finalized (maybe a separate GC queue could be > used for this) in case it is resurrected by its finalizer. With the addition of gc.get_referents() in 2.3, it's easy to compute SCCs via Python code now; it's a PITA in C. OTOH, figuring out which finalizers to call seems a PITA in Python: A<->F1 -> F2<->B F1 and F2 have finalizers; A and B don't. Python code can easily determine that there are 2 SCCs here, each with 1 finalizer (I suppose gc's has_finalizer() would need to be exposed, to determine correctly whether __del__ exists).
A tricky bit then is that running F1.__del__ may end up deleting F2 by magic (this is *possible* since F2 is reachable from F1, and F1.__del__ may break the link to F2), but it's hard for pure-Python code to know that. So that part seems easier done in C, and creating new gc lists in C is very easy thanks to the nice doubly-linked-list C API Neil coded in gcmodule. Note a subtlety: the finalizers in SCCs should be run in a topsort ordering of the derived SCC graph (since F1.__del__ can ask F2 to do stuff, despite that F1 and F2 are in different SCCs, F1 should be finalized before F2). Finding a topsort order is also easy in Python (and also a PITA in C). So I picture computing a topsorted list of suitable objects (those that have a finalizer, and have the only finalizer in their SCC) in Python, and passing that on to a new gcmodule entry point. The latter can link those objects into a doubly-linked C list in the same order, and then run finalizers "left to right". It's a nice property of the gc lists that, e.g., if F1.__del__ does end up deleting F2, F2 simply vanishes from the list. Another subtlety: suppose F1.__del__ resurrects F1, and doesn't delete F2. Should F2.__del__ be called anyway? Probably not, since if F1 is alive, everything reachable from it is also alive, and F1 -> F2. I've read that Java can get into a state where it's only able to reclaim 1 object per full gc collection due to headaches like this, despite that everything is trash. There's really no way to tell whether F1.__del__ resurrects F1 short of starting gc over again (in particular, looking at F1's refcount before and after running F1.__del__ isn't reliable evidence for either conclusion, unless the "after" refcount is 0). 
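The "easy in Python" parts above can be sketched together: Tarjan's algorithm finds the SCCs, and it happens to emit them in reverse topological order of the condensation, so the topsort comes almost for free. The adjacency dict below just encodes the A<->F1 -> F2<->B example by hand (in real use the edges would come from gc.get_referents()):

```python
def strongly_connected_components(graph):
    # Tarjan's algorithm (recursive version, for brevity) over an
    # adjacency dict {node: [successor, ...]}.  SCCs are emitted in
    # reverse topological order of the condensation graph.
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:        # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# Tim's example: A<->F1 -> F2<->B, where F1 and F2 carry finalizers.
graph = {"A": ["F1"], "F1": ["A", "F2"], "F2": ["B"], "B": ["F2"]}
sccs = strongly_connected_components(graph)
# Reversing gives a topsort of the SCCs: {A, F1} before {F2, B},
# i.e. F1 should be finalized before F2.
order = list(reversed(sccs))
```

This only covers the analysis half; actually running the finalizers safely is the part argued above to belong in C.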
From drifty@alum.berkeley.edu Tue Apr 15 04:24:30 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 14 Apr 2003 20:24:30 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304142024150.15575@death.OCF.Berkeley.EDU> [Guido van Rossum] > > One of these days I am going to put together an "Intro to > > python-dev" page that discusses the basic etiquette on the list and > > how to slowly get more and more involved. > > There's already quite a bit of that at http://www.python.org/dev/ > (follow the links to "Development Process" and "Culture"). Since you > already have access to the CVS repository for the website, you could > simply augment what's already there... > That's what I had in mind. -Brett From tim_one@email.msn.com Tue Apr 15 05:02:02 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 00:02:02 -0400 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCMEAKEHAB.tim_one@email.msn.com> [Greg Ewing] > ... > What bothers me, though, is that even with finalizers, the library > writer *still* can't guarantee eventual reclamation. The application > can unwittingly stuff it all up by creating cycles, and there's > nothing the library writer can do about it. 
They're not trying very hard, then -- and, admittedly, most don't. For example, every time the library grabs a resource that needs finalization, it can plant a weakref to it in a singleton private module object with a __del__ method. When the module is torn down at shutdown, that object's __del__ gets called via refcount-falls-from-1-to-0 (it's a private object -- the library author can surely guarantee *it* isn't in a cycle), and frees whichever resources still exist then. The library could instead register a cleanup function via atexit(). Or it could avoid weakrefs by setting up a thread that wakes up every now and again, to scan gc.garbage for instances of the objects it passed out. Finding one, it could finalize the resources held by the object, mark the object as no longer needing resource finalization, and let the object leak. And so on -- Python supplies lots of ways to get what you want even here. > It seems to me that giving up on finalization altogether in the > presence of cycles is too harsh. In most cases, the cycle isn't > actually going to make any difference. With a cycle of your > abovementioned file-descriptor-holding objects, for example, they could be > finalized in an arbitrary order, because the *finalizers* don't depend > on any other objects in the cycle. I expect that's usually so, but that detecting that it's so is intractable. Even if we relied on programmers declaring their beliefs explicitly, Python still has to be paranoid enough to avoid crashing if the stated beliefs aren't really true. For example, if you fight your way through the details of Java's Byzantine finalization scheme, you'll find that the hairiest parts of it exist just to ensure that Java's gc internals never end up dereferencing dangling pointers. This has the added benefit that most experienced Java programmers appear to testify that Java's finalizers are useless <wink>.
> So maybe there should be some way of classifying the references held > by an object into those that are relied upon by its finalizer, and > those that aren't. How? I believe this is beyond realistic automated analysis for Python source. > The algorithm would then be to first go through and clear all the > references that *aren't* needed by finalizers, and then... > [assuming there's no problem leads to the conclusion there's no > problem <wink>] You probably need also to detect that the finalizer can't resurrect the object either, else clearing references that aren't needed specifically for finalization would leave the resurrected object in a damaged state. From greg@cosc.canterbury.ac.nz Tue Apr 15 05:51:23 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 16:51:23 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEAKEHAB.tim_one@email.msn.com> Message-ID: <200304150451.h3F4pN621203@oma.cosc.canterbury.ac.nz> > They're not trying very hard, then -- and, admittedly, most don't. > For example, every time the library grabs a resource that needs > finalization, it can plant a weakref to it in a singleton private > module object with a __del__ method... If you have to go through such convolutions to make __del__ methods reliable, perhaps some other mechanism should be provided in the first place. What you're describing sounds a lot like a scheme used in a Smalltalk system that I encountered once. Objects didn't have finalizing methods themselves; instead, an object could register another object as an "executor" to carry out its "last will and testament". This was done *after* the object in question had been deallocated, and after all other GC activity had finished, so there was no risk of resurrecting anything or getting the GC into a knot. 
Using weakrefs, it might be possible to implement something like this in pure Python, for use as an alternative to __del__ methods. > How? I believe this is beyond realistic automated analysis for Python > source. I wasn't suggesting that it be automated, I was suggesting that it be done explicitly. Suppose, e.g. there were a special attribute __dontclear__ which can be given a list of names of attributes that the GC shouldn't clear. The author of a __del__ method would then have to make sure that everything it needs is mentioned in that list, or risk having it disappear. > Even if we relied on programmers declaring their beliefs explicitly, > Python still has to be paranoid enough to avoid crashing if the > stated beliefs aren't really true. I can't see how a crash could result -- the worst that might happen is a __del__ method throws an exception because some attribute that it relies on has been cleared. That's then a programming error in that class -- the attribute should have been listed in __dontclear__. > You probably need also to detect that the finalizer can't resurrect > the object either, else clearing references that aren't needed > specifically for finalization would leave the resurrected object in > a damaged state. Or just refrain from writing __del__ methods that are silly enough to resurrect their objects. Or if resurrection really is necessary, put all their vital attributes in __dontclear__. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin@v.loewis.de Tue Apr 15 06:13:02 2003 From: martin@v.loewis.de (Martin v. 
Löwis) Date: 15 Apr 2003 07:13:02 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> References: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> Message-ID: <m3k7dwbaxt.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > What bothers me, though, is that even with finalizers, the library > writer *still* can't guarantee eventual reclamation. The application > can unwittingly stuff it all up by creating cycles, and there's > nothing the library writer can do about it. That is not so. If the object having a finalizer doesn't support references to arbitrary other objects, then the application cannot make this object be part of a cycle. This is why file objects will be eventually closed: they cannot be part of a cycle. Being-referred-to from a cycle is fine: If the cycle itself has no objects with finalizers, GC will break the cycle at an arbitrary point and thus release all objects in the cycle, which will then release the object with a finalizer, which will run the finalizer. So my usage guideline is this: If you need a finalizer, always make two objects. One carries the resource being encapsulated, and nothing else. The other one is the object exposed to applications, which has a reference to the resource. Regards, Martin From martin@v.loewis.de Tue Apr 15 06:24:48 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 15 Apr 2003 07:24:48 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> Message-ID: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > I'm not sure that better alternatives for *some* of what SF does couldn't be > gotten with reasonable effort.
For example, on a quiet machine, I just did > a cvs up on a fully up-to-date Python, via SF. It is probably possible to find somebody to host the Python CVS and offer enough connectivity to give more performant service than SF. However, there is more to hosting such a service: You need user management, email notifications, backups, and occasional hand-editing of the CVS repository. I would expect that it might consume significant time (several hours a week) to host the Python CVS. (Time per project reduces if you host several projects) So from your message, I still don't see who could be taking over the Python CVS. Skip, did you have anybody specific in mind? Regards, Martin From greg@cosc.canterbury.ac.nz Tue Apr 15 06:41:40 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 17:41:40 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <m3k7dwbaxt.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> > If the object having a finalizer doesn't support references to > arbitrary other objects, then the application cannot make this object > be part of a cycle. It could make a subclass, though... > If you need a finalizer, always make two objects. One carries the > resource being encapsulated, and nothing else. The other one is the > object exposed to applications, which has a reference to the resource. That actually sounds like a reasonable solution. I was thinking that __del__ methods on anything referenced from the cycle would prevent collection, not just in the cycle itself, but as you point out, that's not the case. Given that, many of my objections go away. I still may write that Executors module, though, it could be fun... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From walter@livinglogic.de Tue Apr 15 12:53:02 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 15 Apr 2003 13:53:02 +0200 Subject: [Python-Dev] ValueErrors in range() Message-ID: <3E9BF29E.6060807@livinglogic.de> Current CVS raises ValueErrors for range() arguments of the wrong type:

>>> range(0, "spam")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: integer end argument expected, got str.

Shouldn't these be TypeErrors? Bye, Walter Dörwald From barry@python.org Tue Apr 15 12:56:48 2003 From: barry@python.org (Barry Warsaw) Date: 15 Apr 2003 07:56:48 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050407808.9401.8.camel@anthem> On Tue, 2003-04-15 at 01:24, Martin v. Löwis wrote: > It is probably possible to find somebody to host the Python CVS and > offer enough connectivity to give more performant service than > SF. However, there is more to hosting such a service: You need user > management, email notifications, backups, and occasional hand-editing > of the CVS repository. This would actually be a big advantage over the present situation. CVS repository surgery is (sadly) necessary sometimes, but it's something we currently can't do without a lot of pain. > I would expect that it might consume > significant time (several hours a week) to host the Python CVS. (Time > per project reduces if you host several projects) I can think of at least 3 projects we could host. :). But even if GForge was our panacea, it would still take a real commitment to run and maintain. I suspect the current crop of volunteers is already stretched pretty far. OTOH, if we could roll Zope into the mix, we'd have more resources to draw from, maybe.
-Barry From skip@pobox.com Tue Apr 15 13:36:17 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 15 Apr 2003 07:36:17 -0500 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Message-ID: <16027.64705.970817.546379@montanaro.dyndns.org> Martin> So from your message, I still don't see who could be taking over Martin> the Python CVS. Skip, did you have anybody specific in mind? Nope. I was just tossing out an idea based on my growing frustration with SF's poor performance. I see the abysmal CVS performance Tim referred to and also find the bug tracker performance to be problematic (web access, submissions and updates are often very slow and sometimes fail, forcing me to sit around waiting for them to complete and then going back to check that my submission/change actually worked). Skip From guido@python.org Tue Apr 15 13:42:43 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 08:42:43 -0400 Subject: [Python-Dev] ValueErrors in range() In-Reply-To: "Your message of Tue, 15 Apr 2003 13:53:02 +0200." <3E9BF29E.6060807@livinglogic.de> References: <3E9BF29E.6060807@livinglogic.de> Message-ID: <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> > Current CVS raises ValueErrors for range() arguments > of the wrong type: > > >>> range(0, "spam") > Traceback (most recent call last): > File "<stdin>", line 1, in ? > ValueError: integer end argument expected, got str. > > Shouldn't these be TypeErrors? Right! I did not review this code enough. :-( Fixing now... --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Tue Apr 15 15:15:53 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 15 Apr 2003 15:15:53 +0100 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <1050407808.9401.8.camel@anthem> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> Message-ID: <3E9C1419.6090908@algroup.co.uk> Barry Warsaw wrote: > On Tue, 2003-04-15 at 01:24, Martin v. Löwis wrote: > > >>It is probably possible to find somebody to host the Python CVS and >>offer enough connectivity to give more performant service than >>SF. However, there is more to hosting such a service: You need user >>management, email notifications, backups, and occasional hand-editing >>of the CVS repository. > > > This would actually be a big advantage over the present situation. CVS > repository surgery is (sadly) necessary sometimes, but it's something we > currently can't do without a lot of pain. > > >>I would expect that it might consume >>significant time (several hours a week) to host the Python CVS. (Time >>per project reduces if you host several projects) > > > I can think of at least 3 projects we could host. :). But even if > GForge was our panacea, it would still take a real commitment to run and > maintain. I suspect the current crop of volunteers is already stretched > pretty far. OTOH, if we could roll Zope into the mix, we'd have more > resources to draw from, maybe. My company would be happy to host it in The Bunker (http://www.thebunker.net/). We do have to figure out some way to get compensated for the bandwidth we'd have to pay for (does anyone know how much that is?), but I'm leaving that to those that worry about such things. Presumably they'd want a link to us somewhere, or something of that nature. We have plenty of experience running CVS and we have 24x7 support. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From blunck@gst.com Tue Apr 15 15:11:42 2003 From: blunck@gst.com (Christopher Blunck) Date: Tue, 15 Apr 2003 10:11:42 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <16027.64705.970817.546379@montanaro.dyndns.org> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> Message-ID: <20030415141142.GA6011@homer.gst.com> On Tue, Apr 15, 2003 at 07:36:17AM -0500, Skip Montanaro wrote: > Nope. I was just tossing out an idea based on my growing frustration with > SF's poor performance. I see the abysmal CVS performance Tim referred to > and also find the bug tracker performance to be problematic (web access, > submissions and updates are often very slow and sometimes fail, forcing me > to sit around waiting for them to complete and then going back to check that > my submission/change actually worked). ... Not to mention file uploads that don't actually upload, erroneous error messages when posting patches and/or bugs, and an inability to map bugs to patches as a built in feature. -c -- 10:10am up 176 days, 1:08, 4 users, load average: 1.18, 1.40, 1.62 From guido@python.org Tue Apr 15 15:23:16 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 10:23:16 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: Your message of "Tue, 15 Apr 2003 15:15:53 BST." <3E9C1419.6090908@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> Message-ID: <200304151424.h3FENGS26701@odiug.zope.com> > My company would be happy to host it in The Bunker > (http://www.thebunker.net/). We do have to figure out some way to get > compensated for the bandwidth we'd have to pay for (does anyone know how > much that is?), but I'm leaving that to those that worry about such > things. 
Presumably they'd want a link to us somewhere, or something of > that nature. > > We have plenty of experience running CVS and we have 24x7 support. I'd like to pursue this, but I don't have time myself. A sponsorship link to TheBunker would definitely be a possibility (we have a link to XS4ALL at the top of www.python.org). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 15 15:26:16 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 10:26:16 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: Your message of "Tue, 15 Apr 2003 10:11:42 EDT." <20030415141142.GA6011@homer.gst.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> Message-ID: <200304151426.h3FEQGx26716@odiug.zope.com> > Not to mention file uploads that don't actually upload, erroneous error > messages when posting patches and/or bugs, and an inability to map bugs to > patches as a built in feature. Right. Some of these have (finally) been fixed. But my meta-complaint about SF is that it's impossible to get things fixed at our schedule. I'm still hoping to revive the effort of moving the tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Tue Apr 15 15:45:23 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 15 Apr 2003 15:45:23 +0100 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <200304151424.h3FENGS26701@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> Message-ID: <3E9C1B03.1070803@algroup.co.uk> Guido van Rossum wrote: >>My company would be happy to host it in The Bunker >>(http://www.thebunker.net/). We do have to figure out some way to get >>compensated for the bandwidth we'd have to pay for (does anyone know how >>much that is?), but I'm leaving that to those that worry about such >>things. Presumably they'd want a link to us somewhere, or something of >>that nature. >> >>We have plenty of experience running CVS and we have 24x7 support. > > > I'd like to pursue this, but I don't have time myself. A sponsorship > link to TheBunker would definitely be a possibility (we have a link to > XS4ALL at the top of www.python.org). Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Tue Apr 15 16:18:02 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 11:18:02 -0400 Subject: [Python-Dev] test_pwd failing Message-ID: <200304151518.h3FFI2S27822@odiug.zope.com> Somebody just changed the pwd module. 
I now get these errors when running test_pwd:

[guido@odiug linux]$ ./python ../Lib/test/regrtest.py test_pwd
test_pwd
test test_pwd failed -- Traceback (most recent call last):
  File "/mnt/home/guido/projects/python/dist/src/Lib/test/test_pwd.py", line 29, in test_values
    self.assertEqual(pwd.getpwuid(e.pw_uid), e)
  File "/mnt/home/guido/projects/python/dist/src/Lib/unittest.py", line 292, in failUnlessEqual
    raise self.failureException, \
AssertionError: ('guido', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido', '/bin/bash') != ('guido1', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido1', '/bin/bash')

1 test failed:
test_pwd
[guido@odiug linux]$

The last two lines of my /etc/passwd file are:

guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash
guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash

--Guido van Rossum (home page: http://www.python.org/~guido/) From walter@livinglogic.de Tue Apr 15 16:31:05 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 15 Apr 2003 17:31:05 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <200304151518.h3FFI2S27822@odiug.zope.com> References: <200304151518.h3FFI2S27822@odiug.zope.com> Message-ID: <3E9C25B9.7020308@livinglogic.de> Guido van Rossum wrote: > Somebody just changed the pwd module.
I now get these errors when > running test_pwd: > > [guido@odiug linux]$ ./python ../Lib/test/regrtest.py test_pwd > test_pwd > test test_pwd failed -- Traceback (most recent call last): > File "/mnt/home/guido/projects/python/dist/src/Lib/test/test_pwd.py", line 29, in test_values > self.assertEqual(pwd.getpwuid(e.pw_uid), e) > File "/mnt/home/guido/projects/python/dist/src/Lib/unittest.py", line 292, in failUnlessEqual > raise self.failureException, \ > AssertionError: ('guido', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido', '/bin/bash') != ('guido1', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido1', '/bin/bash') > > 1 test failed: > test_pwd > [guido@odiug linux]$ > > The last two lines of my /etc/passwd file are: > > guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash > guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash That's my fault. The duplicate entry for the uid 4102 makes the test fail. I'll think of an alternate test for this case. Bye, Walter Dörwald From walter@livinglogic.de Tue Apr 15 16:41:28 2003 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Tue, 15 Apr 2003 17:41:28 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <3E9C25B9.7020308@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> Message-ID: <3E9C2828.4040803@livinglogic.de> Walter Dörwald wrote: > Guido van Rossum wrote: > >> Somebody just changed the pwd module. I now get these errors when >> running test_pwd: >> >> [...] >> guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash >> guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash > > That's my fault. > > The duplicate entry for the uid 4102 makes the test fail. > > I'll think of an alternate test for this case. Fixed! Should the same change be done for the pwd module, i.e. are duplicate gid's allowed in /etc/group? Bye, Walter Dörwald From fdrake@acm.org Tue Apr 15 16:41:21 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Tue, 15 Apr 2003 11:41:21 -0400
Subject: [Python-Dev] test_pwd failing
In-Reply-To: <3E9C25B9.7020308@livinglogic.de>
References: <200304151518.h3FFI2S27822@odiug.zope.com>
	<3E9C25B9.7020308@livinglogic.de>
Message-ID: <16028.10273.709530.833600@grendel.zope.com>

Walter Dörwald writes:
 > The duplicate entry for the uid 4102 makes the test fail.
 >
 > I'll think of an alternate test for this case.

Since the duplicate entry is perfectly legal, I think the test can
really only check that the uid of the retrieved record match the
requested uid.  I don't see what else can be reasonably checked since
everything else for the two entries could differ.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From fdrake@acm.org  Tue Apr 15 16:47:09 2003
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 Apr 2003 11:47:09 -0400
Subject: [Python-Dev] test_pwd failing
In-Reply-To: <3E9C2828.4040803@livinglogic.de>
References: <200304151518.h3FFI2S27822@odiug.zope.com>
	<3E9C25B9.7020308@livinglogic.de>
	<3E9C2828.4040803@livinglogic.de>
Message-ID: <16028.10621.958603.27070@grendel.zope.com>

Walter Dörwald writes:
 > Fixed!

And well!  Thanks.

 > Should the same change be done for the pwd module, i.e.
 > are duplicate gid's allowed in /etc/group?

I think they are, but I'm less certain of that.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From gh@ghaering.de  Tue Apr 15 16:49:33 2003
From: gh@ghaering.de (Gerhard =?iso-8859-1?Q?H=E4ring?=)
Date: Tue, 15 Apr 2003 17:49:33 +0200
Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <3E9C1B03.1070803@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> Message-ID: <20030415154933.GA6030@mephisto.ghaering.test> * Ben Laurie <ben@algroup.co.uk> [2003-04-15 15:45 +0100]: > Guido van Rossum wrote: > >>My company would be happy to host it in The Bunker > >>(http://www.thebunker.net/). [...] > >>We have plenty of experience running CVS and we have 24x7 support. > > > > I'd like to pursue this, but I don't have time myself. A sponsorship > > link to TheBunker would definitely be a possibility (we have a link to > > XS4ALL at the top of www.python.org). > > Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? Probably only Sourceforge staff. But maybe we can avoid asking them ... My CVS documentation has to say this: CVS can keep a history file that tracks each use of the `checkout', `commit', `rtag', `update', and `release' commands. You can use `history' to display this information in various formats. So maybe somebody CVS savvy can make the needed changes to Python's CVSROOT at Sourceforge so we can collect the needed data for a week or so in order to produce a statistic? Gerhard -- mail: gh@ghaering.de web: http://ghaering.de/ From guido@python.org Tue Apr 15 16:49:27 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 11:49:27 -0400 Subject: [Python-Dev] test_pwd failing In-Reply-To: Your message of "Tue, 15 Apr 2003 17:41:28 +0200." <3E9C2828.4040803@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> Message-ID: <200304151549.h3FFnRR28753@odiug.zope.com> > Should the same change be done for the pwd module, i.e. ^^^grp > are duplicate gid's allowed in /etc/group? 
I guess group aliases are theoretically possible, so if you can easily fix the test, go ahead. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Tue Apr 15 17:45:36 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 15 Apr 2003 12:45:36 -0400 Subject: [Python-Dev] Evil setattr hack Message-ID: <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> >I've checked in what I believe is an adequate block for at least this >particular hack. wrap_setattr(), which is called in response to ><type>.__setattr__(), now compares if the C function it is about to >call is the same as the C function in the built-in base class closest >to the object's class. This means that if B is a built-in class and P >is a Python class derived from B, P.__setattr__ can call >B.__setattr__, but not A.__setattr__ where A is an (also built-in) >base class of B (unless B inherits A.__setattr__). Does this follow __mro__ or __base__? I'm specifically wondering about the implications of multiple inheritance from more than one C base class; this sort of thing (safety checks relating to heap vs. non-heap types and the "closest" method of a particular kind) has bitten me before in relation to ZODB4's Persistence package. In that context, mixing 'type' and 'PersistentMetaClass' makes it impossible to instantiate the resulting metaclass, because neither type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to execute. My "evil hack" to fix that was to add an extra PyObject * to PersistentMetaClass so that it has a larger tp_basicsize than 'type' and Python then considers it the '__base__' type, thus causing its '__new__' method to be accepted as legitimate. From martin@v.loewis.de Tue Apr 15 18:17:30 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 19:17:30 +0200 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <1050407808.9401.8.camel@anthem> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> Message-ID: <m3llybel3o.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I can think of at least 3 projects we could host. :). "We" being "ZC", "PythonLabs", or pluralis majestatis? Regards, Martin From martin@v.loewis.de Tue Apr 15 18:22:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 19:22:17 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <3E9C1B03.1070803@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> Message-ID: <m3he8zekvq.fsf@mira.informatik.hu-berlin.de> Ben Laurie <ben@algroup.co.uk> writes: > Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? To get some estimate, try to guess how many full downloads of the entire Python tree you will get per day. As Gerhard explains, only SF would know the numbers, but my guess is that incremental updates are negligible compared to full downloads. To draw some random number, I guess you should accomodate 20 full downloads per day, with a complete download being 50MB (i.e. only the dist/src part). Whether this number is close to reality, I don't know. Regards, Martin From guido@python.org Tue Apr 15 19:33:48 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 14:33:48 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: Your message of "Tue, 15 Apr 2003 12:45:36 EDT." 
<5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> References: <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> Message-ID: <200304151833.h3FIXmU29036@odiug.zope.com> [Guido] > >I've checked in what I believe is an adequate block for at least > >this particular hack. wrap_setattr(), which is called in response > >to <type>.__setattr__(), now compares if the C function it is > >about to call is the same as the C function in the built-in base > >class closest to the object's class. This means that if B is a > >built-in class and P is a Python class derived from B, > >P.__setattr__ can call B.__setattr__, but not A.__setattr__ where > >A is an (also built-in) base class of B (unless B inherits > >A.__setattr__). > From: "Phillip J. Eby" <pje@telecommunity.com> > Does this follow __mro__ or __base__? It follows __base__, like everything concerned about C level instance lay-out. > I'm specifically wondering about the implications of multiple > inheritance from more than one C base class; this sort of thing > (safety checks relating to heap vs. non-heap types and the "closest" > method of a particular kind) has bitten me before in relation to > ZODB4's Persistence package. It is usually impossible to inherit from more than one C base class, unless all but one are mix-in classes, meaning they add nothing to the instance lay-out of a common base class. > In that context, mixing 'type' and 'PersistentMetaClass' makes it > impossible to instantiate the resulting metaclass, because neither > type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to > execute. You're referring to this error message from tp_new_wrapper(), right: "%s.__new__(%s) is not safe, use %s.__new__()" > My "evil hack" to fix that was to add an extra PyObject * > to PersistentMetaClass so that it has a larger tp_basicsize than > 'type' and Python then considers it the '__base__' type, thus > causing its '__new__' method to be accepted as legitimate. 
Is this because the algorithm in best_base() picks the wrong base otherwise? --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Tue Apr 15 19:45:43 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 15 Apr 2003 14:45:43 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304151833.h3FIXmU29036@odiug.zope.com> References: <Your message of "Tue, 15 Apr 2003 12:45:36 EDT." <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> Message-ID: <5.1.1.6.0.20030415143437.02e62ae0@telecommunity.com> At 02:33 PM 4/15/03 -0400, Guido van Rossum wrote: >[Guido] > > >I've checked in what I believe is an adequate block for at least > > >this particular hack. wrap_setattr(), which is called in response > > >to <type>.__setattr__(), now compares if the C function it is > > >about to call is the same as the C function in the built-in base > > >class closest to the object's class. This means that if B is a > > >built-in class and P is a Python class derived from B, > > >P.__setattr__ can call B.__setattr__, but not A.__setattr__ where > > >A is an (also built-in) base class of B (unless B inherits > > >A.__setattr__). > > > From: "Phillip J. Eby" <pje@telecommunity.com> > > > Does this follow __mro__ or __base__? > >It follows __base__, like everything concerned about C level instance >lay-out. > > > I'm specifically wondering about the implications of multiple > > inheritance from more than one C base class; this sort of thing > > (safety checks relating to heap vs. non-heap types and the "closest" > > method of a particular kind) has bitten me before in relation to > > ZODB4's Persistence package. > >It is usually impossible to inherit from more than one C base class, >unless all but one are mix-in classes, meaning they add nothing to the >instance lay-out of a common base class. 
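[Editorial note: the layout rule Guido restates above is easy to see directly in a current CPython. The sketch below uses modern Python 3 syntax rather than the 2.2-era code under discussion, and the class names are made up; it shows why inheriting from two C-level bases only works when all but one are pure mix-ins.]

```python
# Both str and int extend object's C-level instance layout, so
# best_base() cannot find a single "solid" base for a subclass of both.
try:
    class Mixed(str, int):
        pass
except TypeError as exc:
    print(exc)  # layout-conflict TypeError from class creation

# A pure mix-in (which adds nothing to the instance layout) is fine
# alongside a single C base:
class MixIn:
    def shout(self):
        return str(self).upper()

class LoudStr(MixIn, str):
    pass

print(LoudStr("abc").shout())
```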
> > > In that context, mixing 'type' and 'PersistentMetaClass' makes it > > impossible to instantiate the resulting metaclass, because neither > > type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to > > execute. > >You're referring to this error message from tp_new_wrapper(), right: > > "%s.__new__(%s) is not safe, use %s.__new__()" Yep, that's the one. > > My "evil hack" to fix that was to add an extra PyObject * > > to PersistentMetaClass so that it has a larger tp_basicsize than > > 'type' and Python then considers it the '__base__' type, thus > > causing its '__new__' method to be accepted as legitimate. > >Is this because the algorithm in best_base() picks the wrong base >otherwise? Yes, at least for Python 2.2. However, the problem with ZODB4 was only an issue on 2.2; on 2.3, PersistentMetaClass *is* 'type', because it is there to workaround C layout issues in 2.2 that don't exist in 2.3. So this is probably all moot. Anyway... if I recall correctly, even if you got best_base() to pick the right base by changing the order of mixing in the classes, you got a *different* safety error message; I think it might have been in the resulting class, though, rather than in the metaclass. This was all back in November, so my memory is a little hazy. I think there might have been more details in the Zope3-Dev collector issue (#86), but I think Jeremy showed that info to you previously and said that it wasn't enough for you to understand what the problem was. I think part of the complexity had to do with the fact that one of the types (my subclass of 'type') was a "heap type", and PersistentMetaClass was not. But as you pointed out, subclassing from multiple C bases is a rarity, so I don't see any point to following this up further, unless you have some perverse desire to have yet another new-style class layout algorithm change for Python 2.2.3. 
:) It's probably better just to leave my "make it bigger" hack in ZODB4, since PersistentMetaClass itself is one big Python 2.2 backward compatibility hack anyway. <wink> From martin@v.loewis.de Tue Apr 15 19:50:32 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 20:50:32 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> Message-ID: <m38yubegsn.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > If the object having a finalizer doesn't support references to > > arbitrary other objects, then the application cannot make this object > > be part of a cycle. > > It could make a subclass, though... If the type is carefully designed, it can't... Regards, Martin From guido@python.org Tue Apr 15 20:06:15 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 15:06:15 -0400 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: Your message of "15 Apr 2003 20:50:32 +0200." <m38yubegsn.fsf@mira.informatik.hu-berlin.de> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> <m38yubegsn.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304151906.h3FJ6FP29320@odiug.zope.com> > > > If the object having a finalizer doesn't support references to > > > arbitrary other objects, then the application cannot make this > > > object be part of a cycle. > Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > > It could make a subclass, though... > From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) > > If the type is carefully designed, it can't... I suppose you have something in mind like this (which is the only way I can come up with to implement something like a 'final' class in pure Python): >>> class C(object): ... def __new__(cls): ... 
if cls is not C: raise TypeError, "haha" ... return object.__new__(cls) ... >>> class D(C): pass ... >>> a = D() Traceback (most recent call last): File "<stdin>", line 1, in ? File "<stdin>", line 3, in __new__ TypeError: haha >>> But how would you prevent this? >>> a = C() >>> a.__class__ = D >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Tue Apr 15 20:11:38 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 15 Apr 2003 12:11:38 -0700 Subject: [Python-Dev] ValueErrors in range() In-Reply-To: <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> References: <3E9BF29E.6060807@livinglogic.de> <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050433898.607.35.camel@sayge.arc.nasa.gov> On Tue, 2003-04-15 at 05:42, Guido van Rossum wrote: > > Shouldn't these be TypeErrors? > > Right! I did not review this code enough. :-( Fixing now... My fault again. I misremembered Guido wishing that range() returned ValueError on floats (which I thought was strange at the time). Going over a previous email, I see that he did say TypeError. In the meantime, the test_builtins.py needs to be updated to check against TypeError rather than ValueError. (maybe it'll be done by now; ah, just checked, it has) Chad Netzer From barry@python.org Tue Apr 15 20:26:20 2003 From: barry@python.org (Barry Warsaw) Date: 15 Apr 2003 15:26:20 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3llybel3o.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <m3llybel3o.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050434780.501.32.camel@barry> On Tue, 2003-04-15 at 13:17, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > I can think of at least 3 projects we could host. :). > > "We" being "ZC", "PythonLabs", or pluralis majestatis? 
"We" being me. :) Python, Mailman, and mimelib to name 3. PyBSDDB perhaps, and I'm sure others. Maybe even open it up to (most? all?) Python projects with the proper PSF, er, wheel grease. :) -Barry From guido@python.org Tue Apr 15 20:23:48 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 15:23:48 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 22:16:31 EDT." <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> Message-ID: <200304151923.h3FJNmG29436@odiug.zope.com> > [Guido] > > ... > > This makes me think that Python should run the garbage collector > > before exiting, so that finalizers on objects that were previously > > kept alive by cycles are called (even if finalizers on objects that > > are *part* of a cycle still won't be called). [Tim] > What about finalizers on objects that are alive at exit because > they're still reachable? We seem to leave a lot of stuff alive at > the end. For example, here are the pymalloc stats at the end under > current CVS, after opening an interactive shell then exiting > immediately; this is produced at the end of Py_Finalize(), and only > call_ll_exitfuncs() is done after this (and that probably shouldn't > free anything): > > Small block threshold = 256, in 32 size classes. 
> > class size num pools blocks in use avail blocks > ----- ---- --------- ------------- ------------ > 2 24 1 1 168 > 5 48 1 2 82 > 6 56 13 170 766 > 7 64 13 445 374 > 8 72 5 25 255 > 9 80 1 1 49 > 15 128 1 2 29 > 20 168 5 25 95 > 23 192 1 1 20 > 25 208 1 2 17 > 29 240 1 2 14 > 31 256 1 1 14 > > # times object malloc called = 17,119 > 3 arenas * 262144 bytes/arena = 786,432 > > # bytes in allocated blocks = 45,800 > # bytes in available blocks = 131,072 > 145 unused pools * 4096 bytes = 593,920 > # bytes lost to pool headers = 1,408 > # bytes lost to quantization = 1,944 > # bytes lost to arena alignment = 12,288 > Total = 786,432 > > "size" here is 16 bytes larger than in a release build, because of > the 8-byte padding added by PYMALLOC_DEBUG on each end of each block > requested. So, e.g., there's one (true size) 8-byte object still > living at the end, and 445 48-byte objects. Unreclaimed ints and > floats aren't counted here (they've got their own free lists, and > don't go thru pymalloc). > > I don't know what all that stuff is, but I bet there are about 25 > dicts still alive at the end. Close! I moved the debugging code that can print the list of all objects still alive at the end around so that it is now next to the code that prints the above malloc stats. (If you're following CVS email you might have noticed this. :-) The full output is way too large to post; you can see for yourself by creating a debug build and running this (on Unix; windows users use their imagination or upgrade their OS): PYTHONDUMPREFS= ./python -S -c pass When I run this, I see 23 dictionaries. One is the dict of interned strings that are still alive; the others are the tp_dicts of the various built-in type objects. Some interned strings appear to be kept alive by various static globals holding names for faster name lookup; there isn't much we can do about that. I also don't think we should bother un-initializing the built-in types. 
Apart from that, I don't think I see anything that looks suspect. Of course, running a larger program with the same setup might reveal real leaks. > > I also think that if a strongly connected component (a stronger > > concept than cycle) has exactly one object with a finalizer in it, > > that finalizer should be called, and then the object should > > somehow be marked as having been finalized (maybe a separate GC > > queue could be used for this) in case it is resurrected by its > > finalizer. > > With the addition of gc.get_referents() in 2.3, it's easy to compute > SCCs via Python code now; it's a PITA in C. OTOH, figuring out > which finalizers to call seems a PITA in Python: > > A<->F1 -> F2<->B > > F1 and F2 have finalizers; A and B don't. Python code can easily > determine that there are 2 SCCs here, each with 1 finalizer (I > suppose gc's has_finalizer() would need to be exposed, to determine > whether __del__ exists correctly). A tricky bit then is that > running F1.__del__ may end up deleting F2 by magic (this is > *possible* since F2 is reachable from F1, and F1.__del__ may break > the link to F2), but it's hard for pure-Python code to know that. > So that part seems easier done in C, and creating new gc lists in C > is very easy thanks to the nice doubly-linked-list C API Neil coded > in gcmodule. > > Note a subtlety: the finalizers in SCCs should be run in a topsort > ordering of the derived SCC graph (since F1.__del__ can ask F2 to do > stuff, despite that F1 and F2 are in different SCCs, F1 should be > finalized before F2). Finding a topsort order is also easy in > Python (and also a PITA in C). > > So I picture computing a topsorted list of suitable objects (those > that have a finalizer, and have the only finalizer in their SCC) in > Python, and passing that on to a new gcmodule entry point. The > latter can link those objects into a doubly-linked C list in the > same order, and then run finalizers "left to right". 
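[Editorial note: Tim's recipe above, SCCs computed over gc.get_referents() edges, can be sketched in pure Python. The Tarjan traversal below is purely illustrative (the function name is made up, not anything in the gc module); it emits SCCs children-first, so reversing the result gives the topologically sorted order Tim describes for running finalizers.]

```python
import gc

def strongly_connected_components(objs):
    """Tarjan's algorithm over the object graph restricted to `objs`,
    with edges taken from gc.get_referents().  SCCs come out
    children-first, i.e. in reverse topological order."""
    pool = {id(o): o for o in objs}
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def visit(o):
        oid = id(o)
        index[oid] = low[oid] = counter[0]
        counter[0] += 1
        stack.append(o)
        on_stack.add(oid)
        for ref in gc.get_referents(o):
            rid = id(ref)
            if rid not in pool:
                continue            # ignore edges leaving the pool
            if rid not in index:
                visit(ref)
                low[oid] = min(low[oid], low[rid])
            elif rid in on_stack:
                low[oid] = min(low[oid], index[rid])
        if low[oid] == index[oid]:  # o is the root of an SCC
            scc = []
            while True:
                member = stack.pop()
                on_stack.discard(id(member))
                scc.append(member)
                if member is o:
                    break
            sccs.append(scc)

    for o in list(pool.values()):
        if id(o) not in index:
            visit(o)
    return sccs
```

For example, with a two-element cycle a <-> b plus a third list c that merely references b, this reports one SCC of size two and one of size one.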
It's a nice > property of the gc lists that, e.g., if F1.__del__ does end up > deleting F2, F2 simply vanishes from the list. > > Another subtlety: suppose F1.__del__ resurrects F1, and doesn't > delete F2. Should F2.__del__ be called anyway? Probably not, since > if F1 is alive, everything reachable from it is also alive, and F1 > -> F2. I've read that Java can get into a state where it's only > able to reclaim 1 object per full gc collection due to headaches > like this, despite that everything is trash. There's really no way > to tell whether F1.__del__ resurrects F1 short of starting gc over > again (in particular, looking at F1's refcount before and after > running F1.__del__ isn't reliable evidence for either conclusion, > unless the "after" refcount is 0). I'm glazing over the details now, but there seems to be a kernel of useful cleanup in here somehow; I hope that someone will be able to contribute a prototype of such code at least! --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Tue Apr 15 19:27:55 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 20:27:55 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <200304151426.h3FEQGx26716@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> <200304151426.h3FEQGx26716@odiug.zope.com> Message-ID: <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > Right. Some of these have (finally) been fixed. But my > meta-complaint about SF is that it's impossible to get things fixed at > our schedule. 
I'm still hoping to revive the effort of moving the > tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ However, I take the fact that it has been sitting in that state for many months now as an indication that our schedule might not outpace SF. This stuff consumes a lot of time, and I'm willing to accept worse-than-optimal quality of service if it doesn't consume my time. Regards, Martin From martin@v.loewis.de Tue Apr 15 20:50:11 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 21:50:11 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304151906.h3FJ6FP29320@odiug.zope.com> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> <m38yubegsn.fsf@mira.informatik.hu-berlin.de> <200304151906.h3FJ6FP29320@odiug.zope.com> Message-ID: <m3r883czgs.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > > > It could make a subclass, though... > > > If the type is carefully designed, it can't... > > I suppose you have something in mind like this (which is the only way > I can come up with to implement something like a 'final' class in pure > Python): I was actually thinking about impure Python, i.e. by means of omitting Py_TPFLAGS_BASETYPE. > But how would you prevent this? > > >>> a = C() > >>> a.__class__ = D > >>> For the issue at hand: Assigning __class__ won't change the object layout, so if the object didn't have an __dict__ before, it won't have an __dict__ afterwards. Of course, if there are writable slots, the application could corrupt the underlying resource reference, making __del__ meaningless, anyway. Here I need to bring up Python's "we are all consenting adults" attitude... Regards, Martin From guido@python.org Tue Apr 15 21:08:25 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 16:08:25 -0400 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: Your message of "15 Apr 2003 20:27:55 +0200." <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> <200304151426.h3FEQGx26716@odiug.zope.com> <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304152008.h3FK8Pg29754@odiug.zope.com> > > Right. Some of these have (finally) been fixed. But my > > meta-complaint about SF is that it's impossible to get things fixed at > > our schedule. I'm still hoping to revive the effort of moving the > > tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ > > However, I take the fact that it has been sitting in that state for > many months now as an indication that our schedule might not outpace > SF. This stuff consumes a lot of time, and I'm willing to accept > worse-than-optimal quality of service if it doesn't consume my time. Right -- but someone might volunteer and the problem might go away. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Apr 15 22:29:24 2003 From: ark@research.att.com (Andrew Koenig) Date: Tue, 15 Apr 2003 17:29:24 -0400 (EDT) Subject: [Python-Dev] Re: Re: lists v. 
tuples In-Reply-To: <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Sun, 16 Mar 2003 07:32:04 -0500) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304152129.h3FLTOL05240@europa.research.att.com> >> Moreover, for some data structures, the __cmp__ approach can be >> expensive. For example, if you're comparing sequences of any kind, >> and you know that the comparison is for == or !=, you have your answer >> immediately if the sequences differ in length. If you don't know >> what's being tested, as you wouldn't inside __cmp__, you may spend a >> lot more time to obtain a result that will be thrown away. Guido> Yes. OTOH, as long as cmp() is in the language, these same situations Guido> are more efficiently done by a __cmp__ implementation than by calling Guido> __lt__ and then __eq__ or similar (it's hard to decide which order is Guido> best). So cmp() should be removed at the same time as __cmp__. Yes. Guido> And then we should also change list.sort(), as Tim points out. Maybe Guido> we can start introducing this earlier by using keyword arguments: Guido> list.sort(lt=function) sorts using a < implementation Guido> list.sort(cmp=function) sorts using a __cmp__ implementation The keyword argument might not be necessary: It is always possible for a function such as sort to figure out whether a comparison function is 2-way or 3-way (assuming it matters) by doing only one extra comparison. 
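[Editorial note: nothing like the lt= spelling was ever added to list.sort() (Python 2.4 instead grew key= and reverse=, and Python 3 dropped the cmp argument entirely), but the interface Guido floats above is easy to sketch as a wrapper. The name sort_with and its signature are hypothetical.]

```python
import functools

def sort_with(seq, lt=None, cmp=None):
    """Hypothetical sketch of Guido's proposal: sort with either a
    two-way "<" predicate (lt=) or a three-way comparator (cmp=)."""
    if lt is not None:
        class _Key:
            def __init__(self, value):
                self.value = value
            def __lt__(self, other):
                # sorted() only needs "<" on its key objects
                return lt(self.value, other.value)
        return sorted(seq, key=_Key)
    if cmp is not None:
        # cmp_to_key adapts a 3-way comparator to a key object
        return sorted(seq, key=functools.cmp_to_key(cmp))
    return sorted(seq)

print(sort_with([3, 1, 2], lt=lambda a, b: a < b))
print(sort_with([3, 1, 2], cmp=lambda a, b: (a > b) - (a < b)))
```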
From duanev@io.com Tue Apr 15 22:37:28 2003 From: duanev@io.com (duane voth) Date: Tue, 15 Apr 2003 16:37:28 -0500 Subject: [Python-Dev] LynxOS 4 port Message-ID: <20030415163728.A22630@io.com> I'd like to get 2.2.2 up on LynxOS 4 for PowerPC. I am very interested in finding others who have worked toward this, and also the person in charge of Python's configure scripts (as it seems LynxOS 4 is a bit of a hybrid). Thanks in advance! -- Duane Voth duanev@io.com -- duanev@atlantis.io.com From guido@python.org Wed Apr 16 00:49:50 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 19:49:50 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: "Your message of Tue, 15 Apr 2003 17:29:24 EDT." <200304152129.h3FLTOL05240@europa.research.att.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> Message-ID: <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> > Guido> And then we should also change list.sort(), as Tim points > Guido> out. Maybe we can start introducing this earlier by using > Guido> keyword arguments: > > Guido> list.sort(lt=function) sorts using a < implementation > Guido> list.sort(cmp=function) sorts using a __cmp__ implementation [Andrew Koenig] > The keyword argument might not be necessary: It is always possible > for a function such as sort to figure out whether a comparison > function is 2-way or 3-way (assuming it matters) by doing only one > extra comparison. That's cute, but a bit too magical for my taste... 
It's not immediately obvious how this would be done (I know how, but it would require a lot of explaining). Plus, -1 is a perfectly valid truth value. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Wed Apr 16 01:41:31 2003 From: ark@research.att.com (Andrew Koenig) Date: Tue, 15 Apr 2003 20:41:31 -0400 (EDT) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Tue, 15 Apr 2003 19:49:50 -0400) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304160041.h3G0fVI06215@europa.research.att.com> Guido> That's cute, but a bit too magical for my taste... It's not Guido> immediately obvious how this would be done (I know how, but it Guido> would require a lot of explaining). Plus, -1 is a perfectly Guido> valid truth value. Yes, I know that -1 is a valid truth value. Here's the trick. The object of the game is to figure out whether f is < or __cmp__. Suppose you call f(x, y) and it returns 0. Then you don't care which one f is, because x<y is false either way. So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. 
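Andrew's trick can be sketched in a few lines; classify() is a hypothetical helper for illustration, not anything proposed for the stdlib:

```python
def classify(f, x, y):
    """Decide whether f is a 2-way '<' test or a 3-way cmp-style
    function, given a pair for which f(x, y) is already nonzero."""
    assert f(x, y)          # caller waits for the first nonzero result
    if f(y, x):
        return "3-way"      # both directions nonzero: cmp-style
    return "2-way"          # x < y true but y < x false: '<'-style

print(classify(lambda a, b: a < b, 1, 2))              # 2-way
print(classify(lambda a, b: (a > b) - (a < b), 1, 2))  # 3-way
```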
From greg@cosc.canterbury.ac.nz Wed Apr 16 02:11:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Apr 2003 13:11:34 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160041.h3G0fVI06215@europa.research.att.com> Message-ID: <200304160111.h3G1BYd03439@oma.cosc.canterbury.ac.nz> > Yes, I know that -1 is a valid truth value. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. I think the worry is that the function might be saying "true" to both of these, but just happen to spell it 1 the first time and -1 the second. Probably fairly unlikely, though... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Wed Apr 16 02:57:49 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 15 Apr 2003 21:57:49 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160111.h3G1BYd03439@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCEEEKEDAB.tim.one@comcast.net> [Andrew Koenig] > Yes, I know that -1 is a valid truth value. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. [Greg Ewing] > I think the worry is that the function might be saying > "true" to both of these, but just happen to spell it > 1 the first time and -1 the second. Then it's answering true to both x < y ? and y < x ? 
The comparison function is insane, then, so it doesn't matter what list.sort() does in that case (the algorithm is robust against insane comparison functions now, but doesn't define what will happen then beyond that the output list will contain a permutation of its input state). I've ignored this scheme for two reasons: anti-Pythonicity (having Python guess which kind of comparison function you wrote is anti-Pythonic on the face of it), and inefficiency. list.sort() is so bloody highly tuned now that adding even one test-&-branch per comparison, in C, on native C ints, gives a measurable slowdown, even when the user passes an expensive comparison function. In the case that no comparison function is passed, we're able to skip a layer of function call now by calling PyObject_RichCompareBool(X, Y, Py_LT) directly (no cmp-to-LT conversion is needed then). Against that, it could be natural to play Andrew's trick only in count_run() (the part of the code that identifies natural runs). That would be confined to 2 textual comparison sites, and does no more than len(list)-1 comparisons total now. From greg@cosc.canterbury.ac.nz Wed Apr 16 03:31:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Apr 2003 14:31:19 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEEKEDAB.tim.one@comcast.net> Message-ID: <200304160231.h3G2VJs03574@oma.cosc.canterbury.ac.nz> > Then it's answering true to both > > x < y ? > and > y < x ? > > The comparison function is insane, then No, I'm the one that's insane, I think. You're right, this is impossible. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From jack@performancedrivers.com Wed Apr 16 04:00:36 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 15 Apr 2003 23:00:36 -0400 Subject: [Python-Dev] sre.c and sre_match() Message-ID: <20030415230036.L1039@localhost.localdomain> I can't find sre_match() anywhere in the source and it doesn't have a man page. Usage is sprinkled throughout sre.c but it doesn't seem to be defined anywhere I can find. Would someone in the know tell me where it is? I was actually poking around to see how hard it would be to allow pure-python string classes to work with the re modules. Much slower than base strings, but nice for odd cases (like doing regexp matches on ternary trees). -jackdied From tim_one@email.msn.com Wed Apr 16 04:12:51 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 23:12:51 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160231.h3G2VJs03574@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCIEAOEHAB.tim_one@email.msn.com> >> Then it's answering true to both >> >> x < y ? >> and >> y < x ? >> >> The comparison function is insane, then [Greg Ewing] > No, I'm the one that's insane, I think. You're right, > this is impossible. For a sane comparison function, yes. Python can't enforce that user-supplied functions are sane, though, and-- as always --it's Python's job to ensure that nothing catastrophic happens when users go bad. One of the reasons Python had to grow its own sort implementation is that various platform qsort() implementations weren't robust against ill-behaved cmp functions. For example, a typical quicksort partitioning phase searches right for the next element >= key, and left for the next <= key. Some are tempted to save inner-loop index comparisons by ensuring that the leftmost slice element is <= key, and the rightmost >= key, before partitioning begins. 
Then the left and right inner searches are "guaranteed" not to go too far, and by element comparisons alone. But if the comparison function is inconsistent, that can lead to the inner loops reading outside the slice bounds, and so cause segfaults. Python's post-platform-qsort sorts all protect against that kind of crud, but can't give a useful specification of the result in such cases (beyond that the list is *some* permutation of its input state -- no element is lost or duplicated -- and guaranteeing just that much in the worst cases causes some pain in the implementation). From tim_one@email.msn.com Wed Apr 16 04:22:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 23:22:42 -0400 Subject: [Python-Dev] sre.c and sre_match() In-Reply-To: <20030415230036.L1039@localhost.localdomain> Message-ID: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com> [Jack Diederich] > I can't find sre_match() anywhere in the source It's in _sre.c, here: LOCAL(int) SRE_MATCH(SRE_STATE* state, SRE_CODE* pattern, int level) SRE_MATCH is a macro, and expands to either sre_match or sre_umatch, depending on whether Unicode support is enabled. Note that _sre.c arranges to compile itself *twice*, via its #define SRE_RECURSIVE #include "_sre.c" #undef SRE_RECURSIVE This is to get both 8-bit and Unicode versions of the basic routines when Unicode support is enabled. > and it doesn't have a man page. Heh. Does *any* Python source code have a man page <wink>? > ... > I was actually poking around to see how hard it would be to allow > pure-python string classes to work with the re modules. Sorry, no idea. Note that sre works on any object supporting the ill-fated buffer interface. You may have a hard time figuring out that too. But, e.g., it implies that re can search directly over an mmap'ed file (you don't need to read the file into a string first). 
From tim_one@email.msn.com Wed Apr 16 05:51:27 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Wed, 16 Apr 2003 00:51:27 -0400
Subject: [Python-Dev] Garbage collecting closures
In-Reply-To: <200304151923.h3FJNmG29436@odiug.zope.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEBCEHAB.tim_one@email.msn.com>

This is a multi-part message in MIME format.

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

[Guido]
> I'm glazing over the details now, but there seems to be a kernel of
> useful cleanup in here somehow; I hope that someone will be able to
> contribute a prototype of such code at least!

I'll attach a head start, a general implementation of Tarjan's SCC
algorithm that produces a list of SCCs already in a topsort order.  I
haven't tested this enough, and Tarjan's algorithm is subtle -- user beware.

The trygc() function at the end is an example application that appears to
work, busting all the objects gc knows about into SCCs and displaying them.
This requires Python CVS (for the new gc.get_referents function).

Note that you'll get a very large SCC at the start.  This isn't an error!
Each module that imports sys ends up in this SCC, because the module has
the sys module in its module dict, and sys has the module in its
sys.modules dict.  From there, modules have their top-level functions in
their dict, while the top-level functions point back to the module dict via
func_globals.  Etc.  Everything in this giant blob is reachable from
everything else.

For the gc application, it would probably be better (run faster and consume
less memory) if dfs() simply ignored objects with no successors.
Correctness shouldn't be harmed if dfs() started with

    succs = successors(v)
    if not succs:
        return

except that objects with no successors would no longer be considered
singleton SCCs, and the recursive call to dfs() would need to be fiddled to
skip trying to update id2lowest[v_id] then (so dfs should be changed to
return a bool saying whether it took the early return).  This would save
the current work of trying to chase pointless things like ints and strings.
Still, it's pretty zippy as-is!

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain; name="scc.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="scc.py"

# This implements Tarjan's linear-time algorithm for finding the maximal
# strongly connected components.  It takes time proportional to the sum
# of the number of nodes and arcs.
#
# Two functions must be passed to the constructor:
#     node2id      graph node -> a unique integer
#     successors   graph node -> sequence of immediate successor graph nodes
#
# Call method getsccs() with an iterable producing the root nodes of the graph.
# The result is a list of SCCs, each of which is a list of graph nodes.
# This is a partitioning of all graph nodes reachable from the roots,
# where each SCC is a maximal subset such that each node in an SCC is
# reachable from all other nodes in the SCC.  Note that the derived graph
# where each SCC is a single "supernode" is necessarily acyclic (else if
# SCC1 and SCC2 were in a cycle, each node in SCC1 would be reachable from
# each node in SCC1 and SCC2, contradicting that SCC1 is a maximal subset).
# The list of SCCs returned by getsccs() is in a topological sort order wrt
# this derived DAG.
class SCC(object):
    def __init__(self, node2id, successors):
        self.node2id = node2id
        self.successors = successors

    def getsccs(self, roots):
        import sys
        node2id, successors = self.node2id, self.successors
        get_dfsnum = iter(xrange(sys.maxint)).next
        id2dfsnum = {}
        id2lowest = {}
        stack = []
        id2stacki = {}
        sccs = []

        def dfs(v, v_id):
            id2dfsnum[v_id] = id2lowest[v_id] = v_dfsnum = get_dfsnum()
            id2stacki[v_id] = len(stack)
            stack.append((v, v_id))
            for w in successors(v):
                w_id = node2id(w)
                if w_id not in id2dfsnum:   # first time we saw w
                    dfs(w, w_id)
                    id2lowest[v_id] = min(id2lowest[v_id], id2lowest[w_id])
                else:
                    w_dfsnum = id2dfsnum[w_id]
                    if w_dfsnum < v_dfsnum and w_id in id2stacki:
                        id2lowest[v_id] = min(id2lowest[v_id], w_dfsnum)
            if id2lowest[v_id] == v_dfsnum:
                i = id2stacki[v_id]
                scc = []
                for w, w_id in stack[i:]:
                    del id2stacki[w_id]
                    scc.append(w)
                del stack[i:]
                sccs.append(scc)

        for v in roots:
            v_id = node2id(v)
            if v_id not in id2dfsnum:
                dfs(v, v_id)
        sccs.reverse()
        return sccs

_basic_tests = """
>>> succs = {1: [2], 2: []}
>>> s = SCC(int, lambda i: succs[i])

The order in which the roots are listed doesn't matter: we get the unique
topsort regardless.

>>> s.getsccs([1])
[[1], [2]]
>>> s.getsccs([1, 2])
[[1], [2]]
>>> s.getsccs([2, 1])
[[1], [2]]

But note that 1 isn't reachable from 2, so giving 2 as the only root won't
find 1.

>>> s.getsccs([2])
[[2]]

>>> succs = {1: [2],
...          2: [3, 5],
...          3: [2, 4],
...          4: [3],
...          5: [2]}
>>> s = SCC(int, lambda i: succs[i])
>>> s.getsccs([1])
[[1], [2, 3, 4, 5]]
>>> s.getsccs(range(1, 6))
[[1], [2, 3, 4, 5]]

Break the link from 4 back to 2.
>>> succs[4] = []
>>> s.getsccs([1])
[[1], [2, 3, 5], [4]]
"""

__test__ = {'basic': _basic_tests}

def _test():
    import doctest
    doctest.testmod()

if __name__ == '__main__':
    _test()

def trygc():
    import gc
    gc.collect()
    s = SCC(id, gc.get_referents)
    for scc in s.getsccs(gc.get_objects()):
        if len(scc) == 1:
            continue
        print "SCC w/", len(scc), "objects"
        for x in scc:
            print "   ", hex(id(x)), type(x),
            if hasattr(x, "__name__"):
                print x.__name__,
            print

------=_NextPart_000_0006_01C303B2.4FA6CF60--

From martin@v.loewis.de Wed Apr 16 06:19:59 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 16 Apr 2003 07:19:59 +0200
Subject: [Python-Dev] LynxOS 4 port
In-Reply-To: <20030415163728.A22630@io.com>
References: <20030415163728.A22630@io.com>
Message-ID: <m365pf9fy8.fsf@mira.informatik.hu-berlin.de>

duane voth <duanev@io.com> writes:

> I'd like to get 2.2.2 up on LynxOS 4 for PowerPC. I am very interested
> in finding others who have worked toward this, and also the person in
> charge of Python's configure scripts (as it seems LynxOS 4 is a bit of
> a hybrid).

There isn't really a single person "in charge" of it. If you have
specific suggestions or questions, don't hesitate to ask; specific
patches best go to SF.

Regards,
Martin

From python@rcn.com Wed Apr 16 06:56:45 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 16 Apr 2003 01:56:45 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
Message-ID: <000801c303e0$df6c9a20$125ffea9@oemcomputer>

The docs for PyObject_IsTrue() promise that the "function always
succeeds".  But in reality it can return an error result if an
underlying method returns an error.

The calls in ceval.c and elsewhere are cluttered and slowed by trying to
handle all three possibilities.
Instead of fixing the docs, do you guys think there may be merit in returning False whenever explicit Truth isn't found? Favoring practicality over silent error passage? This would simplify the use of the function, honor the promise in the docs, and match usage in code that had not considered an error result. The function and its callers will end-up a little smaller, a little faster, and a little more consistent. Also, reasoning about truth values will be a tad simpler. Note, similar thoughts also apply to PyObject_Not(). Raymond Hettinger Pythonistas Against Three Valued Predicates From ben@algroup.co.uk Wed Apr 16 11:20:48 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Wed, 16 Apr 2003 11:20:48 +0100 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <20030415154933.GA6030@mephisto.ghaering.test> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> <20030415154933.GA6030@mephisto.ghaering.test> Message-ID: <3E9D2E80.30902@algroup.co.uk> Gerhard Häring wrote: > * Ben Laurie <ben@algroup.co.uk> [2003-04-15 15:45 +0100]: > >>Guido van Rossum wrote: >> >>>>My company would be happy to host it in The Bunker >>>>(http://www.thebunker.net/). [...] >>>>We have plenty of experience running CVS and we have 24x7 support. >>> >>>I'd like to pursue this, but I don't have time myself. A sponsorship >>>link to TheBunker would definitely be a possibility (we have a link to >>>XS4ALL at the top of www.python.org). >> >>Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? > > > Probably only Sourceforge staff. But maybe we can avoid asking them ... Is there any particular reason to avoid asking them? This is a public list, after all! Cheers, Ben. 
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From gh@ghaering.de Wed Apr 16 11:53:07 2003 From: gh@ghaering.de (Gerhard Haering) Date: Wed, 16 Apr 2003 12:53:07 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <3E9D2E80.30902@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> <20030415154933.GA6030@mephisto.ghaering.test> <3E9D2E80.30902@algroup.co.uk> Message-ID: <3E9D3613.8070100@ghaering.de> Ben Laurie wrote: > Gerhard Häring wrote: >>>Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? >> >>Probably only Sourceforge staff. But maybe we can avoid asking them ... > > Is there any particular reason to avoid asking them? This is a public > list, after all! No. It's just that from what I see, we can collect the necessary data ourselves and can get a timely and detailed answer by doing so. -- Gerhard From mal@lemburg.com Wed Apr 16 12:26:22 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 Apr 2003 13:26:22 +0200 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: <000801c303e0$df6c9a20$125ffea9@oemcomputer> References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> Message-ID: <3E9D3DDE.4090409@lemburg.com> Raymond Hettinger wrote: > The docs for PyObject_IsTrue() promise that the "function > always succeeds". But in reality it can return an error > result if an underlying method returns an error. > > The calls in ceval.c and elsewhere are cluttered and slowed > by trying to handle all three possibilities. 
In other places > (like bltinmodule.c and pyexpat.c), the result is used directly > in an "if(result)" clause that ignores the possibility of an > error return. > > Instead of fixing the docs, do you guys think there may > be merit in returning False whenever explicit Truth isn't > found? Favoring practicality over silent error passage? Hmm, I've checked my sources and found that I am assuming the documented behaviour, ie. the function never fails. The Zope sources also assume this behaviour and many other extensions probably do too... (we really need a repository of available open source code for Python which makes grepping these things easier, oh well). > This would simplify the use of the function, honor the > promise in the docs, and match usage in code that had not > considered an error result. The function and its callers will > end-up a little smaller, a little faster, and a little more consistent. > Also, reasoning about truth values will be a tad simpler. > > Note, similar thoughts also apply to PyObject_Not(). -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 16 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 69 days left From mhammond@skippinet.com.au Wed Apr 16 13:11:03 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 16 Apr 2003 22:11:03 +1000 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: <3E9D3DDE.4090409@lemburg.com> Message-ID: <00ec01c30411$4117a690$530f8490@eden> MAL: > (we really need a repository > of available open source code for Python which makes grepping > these things easier, oh well). Isn't this just a list of CVS roots (and passwords for anonymous on that server <wink/frown>)? 
Members of the Python foundry at source-forge wouldn't be a bad place to
start.  Except see that other thread <wink>.

Mark.

From skip@pobox.com Wed Apr 16 13:54:51 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 07:54:51 -0500
Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <3E9D3613.8070100@ghaering.de>
References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net>
 <m3fzokbae7.fsf@mira.informatik.hu-berlin.de>
 <1050407808.9401.8.camel@anthem>
 <3E9C1419.6090908@algroup.co.uk>
 <200304151424.h3FENGS26701@odiug.zope.com>
 <3E9C1B03.1070803@algroup.co.uk>
 <20030415154933.GA6030@mephisto.ghaering.test>
 <3E9D2E80.30902@algroup.co.uk>
 <3E9D3613.8070100@ghaering.de>
Message-ID: <16029.21147.256535.724317@montanaro.dyndns.org>

    >> Gerhard Häring wrote:
    >>>> Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews?
    >>>
    >>> Probably only Sourceforge staff. But maybe we can avoid asking them ...
    >>
    >> Is there any particular reason to avoid asking them? This is a public
    >> list, after all!

    Gerhard> No. It's just that from what I see, we can collect the
    Gerhard> necessary data ourselves and can get a timely and detailed
    Gerhard> answer by doing so.

"Timely" being the operative word here, I think.

Skip

From guido@python.org Wed Apr 16 14:30:37 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 09:30:37 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: Your message of "Tue, 15 Apr 2003 20:41:31 EDT."
<200304160041.h3G0fVI06215@europa.research.att.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> Message-ID: <200304161330.h3GDUbd07889@odiug.zope.com> > Guido> That's cute, but a bit too magical for my taste... It's not > Guido> immediately obvious how this would be done (I know how, but it > Guido> would require a lot of explaining). Plus, -1 is a perfectly > Guido> valid truth value. > > Yes, I know that -1 is a valid truth value. > > Here's the trick. The object of the game is to figure out whether > f is < or __cmp__. > > Suppose you call f(x, y) and it returns 0. Then you don't care > which one f is, because x<y is false either way. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. Right. There's no flaw in this logic, but I'd hate to have to explain it over and over... I don't want people to believe that Python can somehow magically sniff the difference between two functions; they might expect it in other contexts. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 16 14:40:53 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 09:40:53 -0400 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: Your message of "Wed, 16 Apr 2003 01:56:45 EDT." 
<000801c303e0$df6c9a20$125ffea9@oemcomputer> References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> Message-ID: <200304161340.h3GDerM07941@odiug.zope.com> > The docs for PyObject_IsTrue() promise that the "function > always succeeds". But in reality it can return an error > result if an underlying method returns an error. Then the docs need to be repaired! > The calls in ceval.c and elsewhere are cluttered and slowed > by trying to handle all three possibilities. In other places > (like bltinmodule.c and pyexpat.c), the result is used directly > in an "if(result)" clause that ignores the possibility of an > error return. Code that ignores the error return possibility is an accident waiting to happen and should be fixed. > Instead of fixing the docs, do you guys think there may > be merit in returning False whenever explicit Truth isn't > found? Favoring practicality over silent error passage? -1000. This function may invoke arbitrary Python code; exceptions in such code should never be silenced. > This would simplify the use of the function, honor the > promise in the docs, and match usage in code that had not > considered an error result. The function and its callers will > end-up a little smaller, a little faster, and a little more consistent. > Also, reasoning about truth values will be a tad simpler. > > Note, similar thoughts also apply to PyObject_Not(). And a ditto response. Background: once upon a time the code honored the docs. This was way long ago, when comparisons also were not allowed to fail. This was found out to be a real bad idea when these operations could be overloaded in Python, and gradually most code was fixed. Unfortunately the docs weren't fixed. 
:-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From sismex01@hebmex.com Wed Apr 16 14:38:14 2003
From: sismex01@hebmex.com (sismex01@hebmex.com)
Date: Wed, 16 Apr 2003 08:38:14 -0500
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
Message-ID: <F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com>

I've found something very, very strange: the interpreter dies on me when
printing a UTF-8 encoded unicode object, when the terminal has a unicode
codepage.  Before anyone asks, I'm running on Windows NT 4.

First, I read this message on Python-List from Ben Hutchings:

> UTF-8 is code page 65001.
>
> Strangely, though, I get 'permission denied' when I run "chcp 65001" and
> then try to print a UTF-8-encoded Euro sign. I don't know what could be
> going wrong there.

So, promptly, I opened a console window, changed the codepage using the
above command and started Python.  When executing the following:

>>> print u"hòlá".encode("utf-8")

[in case it doesn't print out correctly, using html entities, it's
"hòlá".encode("utf-8")]

the interpreter simply exits without any message, exception, peep,
anything; it simply quits without printing anything.

Any suggestions?

-gustavo

P.S.: Before anybody mentions adding a bug report in SF, I must warn that
I don't have web access, only email access.
From jack@performancedrivers.com Wed Apr 16 14:55:00 2003
From: jack@performancedrivers.com (Jack Diederich)
Date: Wed, 16 Apr 2003 09:55:00 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>; from tim_one@email.msn.com on Tue, Apr 15, 2003 at 11:22:42PM -0400
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>
Message-ID: <20030416095500.M1039@localhost.localdomain>

On Tue, Apr 15, 2003 at 11:22:42PM -0400, Tim Peters wrote:
> [Jack Diederich]
> > I can't find sre_match() anywhere in the source
>
> It's in _sre.c, here:
>
> LOCAL(int)
> SRE_MATCH(SRE_STATE* state, SRE_CODE* pattern, int level)
>
> SRE_MATCH is a macro, and expands to either sre_match or sre_umatch,
> depending on whether Unicode support is enabled.  Note that _sre.c
> arranges to compile itself *twice*, via its
>
> #define SRE_RECURSIVE
> #include "_sre.c"
> #undef SRE_RECURSIVE
>
> This is to get both 8-bit and Unicode versions of the basic routines when
> Unicode support is enabled.

My god, it's full of stars.

Ah, that explains how both sre_match() and sre_umatch() get defined and
make the if (state.charsize == 1) switches possible.  The SRE_RECURSIVE
trick isn't hard to understand once you know it is there, but might it be
tidier to break out the stuff parsed twice into another file?

The current layout of _sre.c is

  <stuff done once, setup stuff>
  <stuff done twice, via #include "_sre.c">
  <stuff done once, object stuff>

mv <stuff done twice> to _sre_twice.c

  #define SRE_MATCH sre_match
  #include "_sre_twice.c"  /* defines the symbols sre_match, sre_search .. */
  #define SRE_MATCH sre_umatch
  #include "_sre_twice.c"  /* defines the symbols sre_umatch, sre_usearch .. */
  <stuff done once>

You probably don't get random people walking around _sre.c much, but it
would have gotten me where I need to go (or at least a better chance).
thanks,
-jack

From duncan@rcp.co.uk Wed Apr 16 15:22:06 2003
From: duncan@rcp.co.uk (Duncan Booth)
Date: Wed, 16 Apr 2003 15:22:06 +0100
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
References: <F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com>
Message-ID: <Xns935F9C2237892duncanrcpcouk@127.0.0.1>

sismex01@hebmex.com wrote in
news:F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com:

> the interpreter simply exits without any message, exception,
> peep, anything; it simply quits without printing anything.
>
> Any suggestions?

I think it's a problem with windows, or with the C runtime, rather than
Python.  The line editing is handled by the system and is obviously screwy.
Python is interpreting what you entered as signalling end of file.  Call
raw_input and type your text there and you will get an EOFError.

Try typing any non-ascii character at Python's prompt (e.g. euro symbol)
while the selected codepage is 65001, now move the cursor back to anywhere
earlier in the input line and enter some more text.  The non-ascii
character displayed will change.  If you restart the interpreter and recall
the line you entered you won't get the characters you thought you typed.

Now write a C program:

#include <stdio.h>
int main()
{
    char s[256];
    gets(s);
    return 0;
}

Compile and run it and you get exactly the same behaviour.

--
Duncan Booth                                          duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

From Paul.Moore@atosorigin.com Wed Apr 16 15:53:24 2003
From: Paul.Moore@atosorigin.com (Moore, Paul)
Date: Wed, 16 Apr 2003 15:53:24 +0100
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A40@UKDCX001.uk.int.atosorigin.com>

From: Duncan Booth [mailto:duncan@rcp.co.uk]
> I think it's a problem with windows, or with the C runtime rather than
> Python.
> The line editing is handled by the system and is obviously screwy.
> Python is interpreting what you entered as signalling end of file. Call
> raw_input and type your text there and you will get an EOFError.

Too right something's screwy. But it's not just in the interactive interpreter. It goes wrong when run from a file, with no non-ascii characters in the script, as well. See the attached transcript.

I don't doubt that it's some sort of Windows/CRT problem, but maybe it's fixable within Python...?

Paul

--- session transcript ---
C:\Data >chcp
Active code page: 65001

C:\Data >testutf8.py
hòlá
Traceback (most recent call last):
  File "C:\Data\testutf8.py", line 1, in ?
    print u'h\xf2l\xe1'.encode("utf-8")
IOError: [Errno 2] No such file or directory

C:\Data >type testutf8.py
print u'h\xf2l\xe1'.encode("utf-8")

From niemeyer@conectiva.com Wed Apr 16 15:56:03 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 11:56:03 -0300
Subject: [Python-Dev] shellwords
Message-ID: <20030416145602.GA27447@localhost.distro.conectiva>

Good morning/afternoon!

Is there any chance of getting shellwords[1] into Python 2.3? It's a very small module with some pretty interesting functionality:

[niemeyer@localhost ..-shellwords-0.2]% python
Python 2.2.2 (#1, Apr 10 2003, 13:50:16)
[GCC 3.2.2] on linux-ppc
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellwords
>>> shellwords.shellwords('arg "arg arg" arg "arg" -o="arg arg"')
['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']
>>>

[1] http://www.crazy-compilers.com/py-lib/shellwords.html

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From skip@pobox.com Wed Apr 16 16:12:35 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 10:12:35 -0500
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416145602.GA27447@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva>
Message-ID: <16029.29411.430501.744446@montanaro.dyndns.org>

Gustavo> Is there any chance of getting shellwords[1] into Python 2.3?

Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but didn't quite understand it.)

Skip

From ark@research.att.com Wed Apr 16 16:20:52 2003
From: ark@research.att.com (Andrew Koenig)
Date: 16 Apr 2003 11:20:52 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200304161330.h3GDUbd07889@odiug.zope.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> <200304161330.h3GDUbd07889@odiug.zope.com>
Message-ID: <yu99znmqa2p7.fsf@europa.research.att.com>

>> So the first time you care is the first time f(x, y) returns nonzero.
>> Now you can find out what kind of function f is by calling f(y, x).
>> If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.

Guido> Right.
There's no flaw in this logic, but I'd hate to have to
Guido> explain it over and over... I don't want people to believe
Guido> that Python can somehow magically sniff the difference between
Guido> two functions; they might expect it in other contexts.

I can understand your reluctance -- I was just pointing out that it's possible.

However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f) technique because it doesn't generalize terribly well. If I want to write a function that takes a comparison function as an argument, and eventually passes that function to sort, what do I do? Something like this?

def myfun(foo, bar, lt=None, cmp=None):
    # ...
    x.sort(lt=lt, cmp=cmp)
    # ...

and assume that sort will use None as its defaults also? Or must I write

if lt is None:
    x.sort(cmp=cmp)
else:
    x.sort(lt=lt)

Either way it's inconvenient. So I wonder if it might be better, as a way of allowing sort to take two different types of comparison functions, to distinguish between them by making them different types.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

From niemeyer@conectiva.com Wed Apr 16 16:22:56 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:22:56 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.29411.430501.744446@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <16029.29411.430501.744446@montanaro.dyndns.org>
Message-ID: <20030416152255.GA27792@localhost.distro.conectiva>

> Gustavo> Is there any chance of getting shellwords[1] into Python 2.3?
>
> Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but
> didn't quite understand it.)

I haven't tried, but it surely can, by subclassing and rewriting portions of it. OTOH, shellwords is about half the size of shlex, and shlex looks overly complex for something simple like

args = shellwords(line)

Btw, it wasn't *my* Q/A; I haven't written shellwords.
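The quote-joining semantics being discussed, where adjacent quoted and unquoted fragments fuse into one word as the Bourne shell does, can be sketched in a few lines. This is a toy splitter, not the actual shellwords implementation; it assumes balanced quotes and ignores backslash escapes.

```python
def split_words(line):
    # Toy shell-style word splitting: whitespace separates words, and
    # adjacent quoted/unquoted fragments join into a single word.
    words, current, in_word = [], [], False
    i = 0
    while i < len(line):
        c = line[i]
        if c in " \t":
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        elif c in "'\"":
            end = line.index(c, i + 1)  # assumes balanced quotes
            current.append(line[i + 1:end])
            in_word = True
            i = end + 1
        else:
            current.append(c)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words

print(split_words("foo 'bar'asd'foo'"))  # -> ['foo', 'barasdfoo']
```

This reproduces the `echo foo 'bar'asd'foo'` behaviour quoted later in the thread, in contrast to shlex's treatment of each quoted run as its own token.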
-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From guido@python.org Wed Apr 16 16:29:15 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:29:15 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: Your message of "Wed, 16 Apr 2003 12:16:29 -0300." <20030416151629.GA27707@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva>
Message-ID: <200304161529.h3GFTFr09409@odiug.zope.com>

> > > [1] http://www.crazy-compilers.com/py-lib/shellwords.html
> >
> > Hm, couldn't this be easily done with shlex?
>
> >From the homepage:
>
> """
> Frequently Asked Questions
>
> Q: Hey, there is 'shlex' coming with Python. Why there is a need for
> this module? A: I know 'shlex' and I gave it a try. But 'shlex' takes
> quotes as word-delemiters which divers from the shell-semantic (see
> above). And even if 'shlex' would parse strings as needed, I would have
> written a (very, very) thin layer above, since 'shlex' is simple but
> seldomly used for this kind of job.
> """

I saw that after posting. :-( The argument "'shlex' is simple but seldomly used for this kind of job" seems circular though: "I'm not using shlex because it's rarely used"???

> I agree with him. Even disregarding the fact of the syntax
> divergence, shellwords is about half the size of shlex, and it's
> much more comfortable, allowing one-liners like "for opt in
> shellwords(line):".

I know I've wished for this once or twice, but not badly enough to bother solving the problem right. I'm worried that having too many ways to do mostly the same thing adds code bloat. Couldn't adding something even smaller on top of shlex provide the same interface and solve the syntactic divergence?
--Guido van Rossum (home page: http://www.python.org/~guido/)

From niemeyer@conectiva.com Wed Apr 16 16:29:44 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:29:44 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.29411.430501.744446@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <16029.29411.430501.744446@montanaro.dyndns.org>
Message-ID: <20030416152944.GA27900@localhost.distro.conectiva>

> Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but
> didn't quite understand it.)

Oh, sorry. Just now I noticed that you didn't *understand* it. He was talking about this:

>>> s = StringIO.StringIO("foo 'bar'asd'foo'")
>>> l = shlex.shlex(s)
>>> l.
l.__class__      l.error_leader   l.pop_source     l.source
l.__doc__        l.filestack      l.push_source    l.sourcehook
l.__init__       l.get_token      l.push_token     l.state
l.__module__     l.infile         l.pushback       l.token
l.commenters     l.instream       l.quotes         l.whitespace
l.debug          l.lineno         l.read_token     l.wordchars
>>> l.read_token()
'foo'
>>> l.read_token()
"'bar'"
>>> l.read_token()
"asd'foo'"
>>>

In contrast to:

>>> shellwords.shellwords("foo 'bar'asd'foo'")
['foo', 'barasdfoo']

And also:

[niemeyer@localhost ~/src]% echo foo 'bar'asd'foo'
foo barasdfoo

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From niemeyer@conectiva.com Wed Apr 16 16:30:56 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:30:56 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <200304161529.h3GFTFr09409@odiug.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com>
Message-ID: <20030416153056.GB27900@localhost.distro.conectiva>

[...]
> Couldn't adding something even smaller on top of shlex provide the
> same interface and solve the syntactic divergence?
Ok, I'll check if there's an easy way to "turn" shlex into shellwords.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From guido@python.org Wed Apr 16 16:32:28 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:32:28 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: Your message of "16 Apr 2003 11:20:52 EDT." <yu99znmqa2p7.fsf@europa.research.att.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> <200304161330.h3GDUbd07889@odiug.zope.com> <yu99znmqa2p7.fsf@europa.research.att.com>
Message-ID: <200304161532.h3GFWSU09441@odiug.zope.com>

> However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f)
> technique because it doesn't generalize terribly well.
>
> If I want to write a function that takes a comparison function as an
> argument, and eventually passes that function to sort, what do I do?
> Something like this?
>
> def myfun(foo, bar, lt=None, cmp=None):
>     # ...
>     x.sort(lt=lt, cmp=cmp)
>     # ...
>
> and assume that sort will use None as its defaults also? Or must I
> write
>
> if lt is None:
>     x.sort(cmp=cmp)
> else:
>     x.sort(lt=lt)
>
> Either way it's inconvenient.

Given that (if we add this) the cmp argument will be deprecated, myfun() should take a 'lt' comparison only.
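In later-Python terms, the advice above might look like this. The myfun here is hypothetical, and functools.cmp_to_key postdates the 2.3 discussion, but it shows a function accepting only an lt-style predicate and adapting it internally, so callers never juggle lt= vs cmp= themselves.

```python
import functools

def myfun(items, lt):
    # Accept only an 'lt' predicate (a < b -> bool) and build the
    # 3-way comparison internally when handing it to sort.
    def three_way(a, b):
        if lt(a, b):
            return -1
        if lt(b, a):
            return 1
        return 0
    items.sort(key=functools.cmp_to_key(three_way))
    return items

print(myfun([3, 1, 2], lambda a, b: a < b))  # -> [1, 2, 3]
```

Wrapping at one chokepoint sidesteps the "which keyword do I forward?" problem Andrew raises.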
> So I wonder if it might be better, as a way of allowing sort to take
> two different types of comparison functions, to distinguish between
> them by making them different types.

But Python doesn't do types that way.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com Wed Apr 16 16:40:14 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 10:40:14 -0500
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416153056.GB27900@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva>
Message-ID: <16029.31070.687527.821448@montanaro.dyndns.org>

Gustavo> Ok, I'll check if there's an easy way to "turn" shlex into
Gustavo> shellwords.

Cool. Based on this thread and an experiment I tried, some obvious (to me) things come to mind:

* get_token() needs to be fixed to handle the 'bar'asd'foo' case

* the shlex class should handle strings as input, not just file-like objects

* get_word() or get_words() methods in the shlex class could implement the shellwords functionality

Skip

From fdrake@acm.org Wed Apr 16 16:41:22 2003
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 16 Apr 2003 11:41:22 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416153056.GB27900@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva>
Message-ID: <16029.31138.988795.672854@grendel.zope.com>

Gustavo Niemeyer writes:
> Ok, I'll check if there's an easy way to "turn" shlex into shellwords.

Is there any real objection to simply fixing shlex to get it right?
I'm guessing that the divergence from shell quoting was more a matter of implementation expedience and a feeling that it was "good enough" for whatever original application it was written for.

-Fred

-- 
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation

From guido@python.org Wed Apr 16 16:45:12 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:45:12 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: Your message of "Wed, 16 Apr 2003 10:40:14 CDT." <16029.31070.687527.821448@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org>
Message-ID: <200304161545.h3GFjC710136@odiug.zope.com>

> Gustavo> Ok, I'll check if there's an easy way to "turn" shlex into
> Gustavo> shellwords.
>
> Cool. Based on this thread and an experiment I tried, some obvious (to me)
> things come to mind:
>
> * get_token() needs to be fixed to handle the 'bar'asd'foo' case
>
> * the shlex class should handle strings as input, not just file-like
>   objects
>
> * get_word() or get_words() methods in the shlex class could implement
>   the shellwords functionality

I'd be happy to see this done. You might submit the changes to ESR for review, but he may be busy, so don't wait for him.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Wed Apr 16 17:11:23 2003
From: esr@thyrsus.com (Eric S.
 Raymond)
Date: Wed, 16 Apr 2003 12:11:23 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.31138.988795.672854@grendel.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31138.988795.672854@grendel.zope.com>
Message-ID: <20030416161123.GA13046@thyrsus.com>

Fred L. Drake, Jr. <fdrake@acm.org>:
> Gustavo Niemeyer writes:
> > Ok, I'll check if there's an easy way to "turn" shlex into shellwords.
>
> Is there any real objection to simply fixing shlex to get it right?
> I'm guessing that the divergence from shell quoting was more a matter
> of implementation expedience and a feeling that it was "good enough"
> for whatever original application it was written for.

That is correct. I originally wrote shlex as the parser logic for a .netrc module. I would have no intrinsic objection to having this behavior fixed, though there is of course the general problem of how much we value not breaking old code.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

From guido@python.org Wed Apr 16 16:52:10 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:52:10 -0400
Subject: [Python-Dev] 2.3b1 release
Message-ID: <200304161552.h3GFqAQ10181@odiug.zope.com>

I'd like to do a 2.3b1 release someday. Maybe at the end of next week; that would be Friday, April 25. If anyone has something that needs to be done before this release goes out, please let me know! Assigning a SF bug or patch to me and setting the priority to 7 is a good way to get my attention.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From niemeyer@conectiva.com Wed Apr 16 17:43:14 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 13:43:14 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <200304161545.h3GFjC710136@odiug.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> <200304161545.h3GFjC710136@odiug.zope.com>
Message-ID: <20030416164314.GA28085@localhost.distro.conectiva>

> > Cool. Based on this thread and an experiment I tried, some obvious (to me)
> > things come to mind:
> >
> > * get_token() needs to be fixed to handle the 'bar'asd'foo' case
> >
> > * the shlex class should handle strings as input, not just file-like
> >   objects
> >
> > * get_word() or get_words() methods in the shlex class could implement
> >   the shellwords functionality
>
> I'd be happy to see this done. You might submit the changes to ESR
> for review but he may be busy so don't wait for him.

Great! I'll work on it.

What should we do to avoid compatibility problems? Some solutions that come to mind are:

- Forget about compatibility completely and fix the syntax handling to be POSIX-compliant.

- Create a subclass of shlex, or a completely different class (shlex_posix?), depending on how much can be reused.

- Add a flag to the constructor.

Suggestions?
-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From barry@python.org Wed Apr 16 17:52:06 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 12:52:06 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050511925.9818.78.camel@barry>

On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:

> More or less, yes. Now, what happens if you put "real" non-ASCII
> (i.e. bytes above 127) into the message id, like so:

But I don't think you'd ever want to do that. In fact, I think in general you're probably talking about ascii msgids or utf-8 encoded Unicode msgids. I'm not sure what else would make sense.

> msgfmt will still accept that, but msgunfmt will complain:

Didn't even know about msgunfmt. :)

> msgunfmt: warning: The following msgid contains non-ASCII characters.
>           This will cause problems to translators who use a
>           character encoding different from yours. Consider
>           using a pure ASCII msgid instead.
>
> If you think about this, this is really bad: If you mean to apply the
> charset= to both msgid and msgstr, then translators using a different
> charset from yours are in big trouble.

Right, but see above. E.g. if your string literals are all Spanish and you want a Turkish translation, then utf-8 is the only common encoding you could possibly use in a .po file, right?

> They are faced with three problems:
> 1. They don't know what the charset of the msgids is. The PO files do
> have a charset declaration, the POT files typically don't.

Yep, although it would be easy for the extractor to add a charset=utf-8 to the pot file.

> 2. They need to convert the msgids from the POT encoding to their
There are no tools available to support that readily; > tools like iconv might correctly convert the msgids, but won't update > the charset= in the POT file (if the charset was filled out). > 3. By converting the msgids, they are also changing them. That means > the msgids are not really suitable as keys anymore. Is this still a problem for when charset=utf-8? -Barry From barry@python.org Wed Apr 16 17:53:53 2003 From: barry@python.org (Barry Warsaw) Date: 16 Apr 2003 12:53:53 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m38yug57j6.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050093475.11200.96.camel@barry> <m38yug57j6.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050512032.9818.81.camel@barry> On Sat, 2003-04-12 at 06:34, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > I suppose we could cache the conversion to make the next lookup more > > efficient. Alternatively, if we always convert internally to Unicode we > > could encode on .gettext(). Then we could just pick One Way and do away > > with the coerce flag. > > If you are concerned about efficiency, I guess there is no way to > avoid converting the file to Unicode on loading. I would then > encourage a change where this flag is available, but has an effect > only on performance, not on the behaviour. > > Alternatively, you could subclass GNUTranslation. It would take some refactoring, unless you implemented a second pass over the catalog. I'd rather not do either, so I'm happy to include this right in GNUTranslations. 
-Barry

From niemeyer@conectiva.com Wed Apr 16 18:03:35 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 14:03:35 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416164314.GA28085@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> <200304161545.h3GFjC710136@odiug.zope.com> <20030416164314.GA28085@localhost.distro.conectiva>
Message-ID: <20030416170335.GA28540@localhost.distro.conectiva>

> Great! I'll work on it.
>
> What should we do to avoid compatibility problems? Some solutions that
> come to mind are:
>
> - Forget about compatibility completely and fix the syntax handling to
>   be POSIX-compliant.
>
> - Create a subclass of shlex, or a completely different class
>   (shlex_posix?), depending on how much can be reused.
>
> - Add a flag to the constructor.

Thinking further about this, I believe there's a better solution. I'll write different functions (probably read_word()/get_word()) with the new behavior.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From drifty@alum.berkeley.edu Wed Apr 16 18:31:04 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Wed, 16 Apr 2003 10:31:04 -0700 (PDT)
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
Message-ID: <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>

[Guido van Rossum]
> I'd like to do a 2.3b1 release someday. Maybe at the end of next
> week, that would be Friday April 25. If anyone has something that
> needs to be done before this release goes out, please let me know!
Just to make sure, since this is the first release for which I have CVS commit access: we can apply patches to fix bugs without having to worry about it being beta, right? How about new tests?

-Brett

From jack@performancedrivers.com Wed Apr 16 18:33:58 2003
From: jack@performancedrivers.com (Jack Diederich)
Date: Wed, 16 Apr 2003 13:33:58 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>; from tim_one@email.msn.com on Tue, Apr 15, 2003 at 11:22:42PM -0400
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>
Message-ID: <20030416133358.A1553@localhost.localdomain>

> [Jack Diederich]
> > ...
> > I was actually poking around to see how hard it would be to allow
> > pure-python string classes to work with the re modules.

[Tim Peters]
> Sorry, no idea. Note that sre works on any object supporting the ill-fated
> buffer interface. You may have a hard time figuring out that too. But,
> e.g., it implies that re can search directly over an mmap'ed file (you don't
> need to read the file into a string first).

Poking around some more in _sre.c, it looks like user-defined strings could be supported via the same #include hack as unicode, with some extra defines.

// ascii/unicode
#define STATE_NEXT_CHAR(state) state->ptr++
// user strings
#define STATE_NEXT_CHAR(state) PyEval_CallObject(state->string_nextmethod)

Similar defines would cover STATE_PREV_CHAR, plus something to ask if we're at the end:

// ascii
#define STATE_ISEND(state) (state->ptr == state->end)
// user strings
#define STATE_ISEND(state) PyEval_CallObject(state->string_endmethod)

Is there a speed reason why all the SRE_MATCH-type functions do

ptr = state->ptr;
ptr++;
ptr--;
// lots more stuff with ptr
state->ptr = ptr;

or is it just convenience? If just convenience, it would make writing the #defines easier.
The PyEval_CallObject calls are just pseudo-code; each would be wrapped in something that tested the appropriateness of the return value and did other bookkeeping. Could this be done without hurting the speed of regular regexps?

-jackdied

From guido@python.org Wed Apr 16 18:36:19 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 13:36:19 -0400
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: Your message of "Wed, 16 Apr 2003 10:31:04 PDT." <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>
Message-ID: <200304161736.h3GHaJB10928@odiug.zope.com>

> > I'd like to do a 2.3b1 release someday. Maybe at the end of next
> > week, that would be Friday April 25. If anyone has something that
> > needs to be done before this release goes out, please let me know!
>
> Just to make sure, since this is the first release for which I have CVS
> commit access: we can apply patches to fix bugs without having to worry
> about it being beta, right?

Right. Fix away.

> How about new tests?

Feel free to add new unit tests, as long as the whole unit test suite passes when you commit.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 16 18:39:50 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 13:39:50 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: Your message of "Wed, 16 Apr 2003 13:33:58 EDT." <20030416133358.A1553@localhost.localdomain>
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com> <20030416133358.A1553@localhost.localdomain>
Message-ID: <200304161739.h3GHdoP10981@odiug.zope.com>

There are few people here who understand the _sre code, so I'm not sure you'll get answers.
Given how critical this code is, and given that Fredrik is adamant that the code needs to continue to run with all versions of Python starting with 1.5.2, I'd rather not mess with it much in terms of adding new features. Maybe you can create your own code fork for now?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller@python.net Wed Apr 16 18:47:57 2003
From: theller@python.net (Thomas Heller)
Date: 16 Apr 2003 19:47:57 +0200
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
Message-ID: <4r4yqqpe.fsf@python.net>

Guido van Rossum <guido@python.org> writes:

> I'd like to do a 2.3b1 release someday. Maybe at the end of next
> week, that would be Friday April 25. If anyone has something that
> needs to be done before this release goes out, please let me know!

I would still like to work on http://www.python.org/sf/595026, support for masks in getargs.c. Jack requested that this change be implemented shortly after the release of 2.3a2, but it seems this is too late now ;-)

What to do? Implement it now and commit it after 2.3b1 is released, or delay it until 2.3 final is released? I have to admit that I'm sure I can implement it for 32-bit Windows, but it would have to be tested (and maybe completed) on other, especially 64-bit, platforms as well. And it introduces incompatibilities.

BTW: Since you want to release a beta version, what's the state of the FutureWarning about hex/oct constants: will this stay the way it is?
Thomas

From python@rcn.com Wed Apr 16 18:50:57 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 16 Apr 2003 13:50:57 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> <200304161340.h3GDerM07941@odiug.zope.com>
Message-ID: <00d701c30440$bc766680$125ffea9@oemcomputer>

> > The docs for PyObject_IsTrue() promise that the "function
> > always succeeds". But in reality it can return an error
> > result if an underlying method returns an error.
>
> Then the docs need to be repaired!

Done.

From guido@python.org Wed Apr 16 19:00:54 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 14:00:54 -0400
Subject: [Python-Dev] Masks in getargs.c (was: 2.3b1 release)
In-Reply-To: Your message of "16 Apr 2003 19:47:57 +0200." <4r4yqqpe.fsf@python.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net>
Message-ID: <200304161800.h3GI0sP11085@odiug.zope.com>

> Guido van Rossum <guido@python.org> writes:
>
> > I'd like to do a 2.3b1 release someday. Maybe at the end of next
> > week, that would be Friday April 25. If anyone has something that
> > needs to be done before this release goes out, please let me know!

> From: Thomas Heller <theller@python.net>
>
> I would still like to work on http://www.python.org/sf/595026,
> support for masks in getargs.c.

Great!

> Jack requested that this change be implemented shortly after the
> release of 2.3a2, but it seems this is too late now ;-)
>
> What to do?

Do it ASAP.

> Implement it now and commit it after 2.3b1 is released, or delay it
> until 2.3 final is released? I have to admit that I'm sure I can
> implement it for 32-bit Windows, but it would have to be tested (and
> maybe completed) on other, especially 64-bit, platforms as well.

If you can get something rough into 2.3b1, it can be improved while 2.3b2 is cooking.

> And it introduces incompatibilities.

What kind?
I thought it would be a new format code?

> BTW: Since you want to release a beta version, what's the state of the
> FutureWarning about hex/oct constants: will this stay the way it is?

Probably, unless you have a better idea. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 16 19:01:46 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 14:01:46 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
In-Reply-To: Your message of "Wed, 16 Apr 2003 13:50:57 EDT." <00d701c30440$bc766680$125ffea9@oemcomputer>
References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> <200304161340.h3GDerM07941@odiug.zope.com> <00d701c30440$bc766680$125ffea9@oemcomputer>
Message-ID: <200304161801.h3GI1kW11105@odiug.zope.com>

> > > The docs for PyObject_IsTrue() promise that the "function
> > > always succeeds". But in reality it can return an error
> > > result if an underlying method returns an error.
> >
> > Then the docs need to be repaired!
>
> Done.

Thanks! But didn't you say that you had found code (in core Python) that didn't account for failures? Shouldn't that be fixed too?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller@python.net Wed Apr 16 19:11:27 2003
From: theller@python.net (Thomas Heller)
Date: 16 Apr 2003 20:11:27 +0200
Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release)
In-Reply-To: <200304161800.h3GI0sP11085@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com>
Message-ID: <vfxepb1s.fsf@python.net>

> > I would still like to work on http://www.python.org/sf/595026,
> > support for masks in getargs.c.
>
> Great!
>
> > Jack requested that this change be implemented shortly after the
> > release of 2.3a2, but it seems this is too late now ;-)
> >
> > What to do?
>
> Do it ASAP.

Ok, working on it.
> > > Implement it now and commit it after 2.3b1 is released, or delay this > > until 2.3 final is released. I have to admit that I'm sure I can > > implement it for 32-bit Windows, but it would have to be tested (and > > maybe completed) on other, especially 64-bit platforms as well. > > If you can get something rough into 2.3b1, it can be improved while > 2.3b2 is cooking. > > > And it introduces incompatibilities. > > What kind? I thought it would be a new format code? Two new format codes ('k' and 'K'), and changes to existing format codes - per your request:

| How about the following counterproposal. This also changes some of the
| other format codes to be a little more regular.
|
| Code   C type              Range check
|
| b      unsigned char       0..UCHAR_MAX
| B      unsigned char       none **
| h      unsigned short      0..USHRT_MAX
| H      unsigned short      none **
| i      int                 INT_MIN..INT_MAX
| I *    unsigned int        0..UINT_MAX
| l      long                LONG_MIN..LONG_MAX
| k *    unsigned long       none
| L      long long           LLONG_MIN..LLONG_MAX
| K *    unsigned long long  none
|
| Notes:
|
| * New format codes.
|
| ** Changed from previous "range-and-a-half" to "none"; the
| range-and-a-half checking wasn't particularly useful.

> > BTW: Since you want to release a beta version, what's the state of the > > FutureWarning about hex/oct constants: will this stay the way it is? > > Probably, unless you have a better idea. :-( I haven't used warnings very much, but is there a possibility to disable them per module? You get a lot of them if you 'import win32con' for example. Thomas From guido@python.org Wed Apr 16 19:17:10 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 14:17:10 -0400 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: Your message of "16 Apr 2003 20:11:27 +0200."
<vfxepb1s.fsf@python.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com> <vfxepb1s.fsf@python.net> Message-ID: <200304161817.h3GIHA111307@odiug.zope.com> > > > And it introduces incompatibilities. > > > > What kind? I thought it would be a new format code? > > Two new format codes ('k' and 'K'), and changes to existing format > codes - per your request:

> | How about the following counterproposal. This also changes some of the
> | other format codes to be a little more regular.
> |
> | Code   C type              Range check
> |
> | b      unsigned char       0..UCHAR_MAX
> | B      unsigned char       none **
> | h      unsigned short      0..USHRT_MAX
> | H      unsigned short      none **
> | i      int                 INT_MIN..INT_MAX
> | I *    unsigned int        0..UINT_MAX
> | l      long                LONG_MIN..LONG_MAX
> | k *    unsigned long       none
> | L      long long           LLONG_MIN..LLONG_MAX
> | K *    unsigned long long  none
> |
> | Notes:
> |
> | * New format codes.
> |
> | ** Changed from previous "range-and-a-half" to "none"; the
> | range-and-a-half checking wasn't particularly useful.

Oh of course. None to worry about IMO. > > > BTW: Since you want to release a beta version, what's the state > > > of the FutureWarning about hex/oct constants: will this stay the > > > way it is? > > > > Probably, unless you have a better idea. :-( > > I haven't used warnings very much, but is there a possibility to > disable them per module? You get a lot of them if you 'import > win32con' for example. Yes, you can suppress warnings per module. Please read the docs.
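Per-module suppression works roughly as follows; a minimal sketch using today's `warnings` API, where `warn_explicit` merely simulates a warning attributed to a module (the `win32con`/`mymod` names are Thomas's example and a made-up placeholder, not real imports here):

```python
import warnings

def demo():
    # Record everything, then ignore FutureWarnings from one module only.
    # The filter's "module" argument is a regex matched against the name
    # of the module the warning is attributed to.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", category=FutureWarning,
                                module="win32con")
        # Simulate a warning coming from win32con (filtered out) ...
        warnings.warn_explicit("hex/oct constant", FutureWarning,
                               filename="win32con.py", lineno=1,
                               module="win32con")
        # ... and one coming from some other module (recorded).
        warnings.warn_explicit("hex/oct constant", FutureWarning,
                               filename="mymod.py", lineno=1,
                               module="mymod")
        return [w.filename for w in caught]

print(demo())  # → ['mymod.py']
```

Only the warning attributed to the other module survives; the filter added with `filterwarnings` is consulted before the blanket `"always"` rule because new filters are inserted at the front of the list.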
--Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Wed Apr 16 19:38:33 2003 From: theller@python.net (Thomas Heller) Date: 16 Apr 2003 20:38:33 +0200 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: <200304161817.h3GIHA111307@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com> <vfxepb1s.fsf@python.net> <200304161817.h3GIHA111307@odiug.zope.com> Message-ID: <ptnmp9sm.fsf@python.net> Guido van Rossum <guido@python.org> writes: > > > > And it introduces incompatibilities. > > > > > > What kind? I thought it would be a new format code? > > > > Two new format codes ('k' and 'K'), and changes to existing format > > codes - per your request:

> > | How about the following counterproposal. This also changes some of the
> > | other format codes to be a little more regular.
> > |
> > | Code   C type              Range check
> > |
> > | b      unsigned char       0..UCHAR_MAX
> > | B      unsigned char       none **
> > | h      unsigned short      0..USHRT_MAX
> > | H      unsigned short      none **
> > | i      int                 INT_MIN..INT_MAX
> > | I *    unsigned int        0..UINT_MAX
> > | l      long                LONG_MIN..LONG_MAX
> > | k *    unsigned long       none
> > | L      long long           LLONG_MIN..LLONG_MAX
> > | K *    unsigned long long  none
> > |
> > | Notes:
> > |
> > | * New format codes.
> > |
> > | ** Changed from previous "range-and-a-half" to "none"; the
> > | range-and-a-half checking wasn't particularly useful.

> Oh of course. None to worry about IMO. Well, implementing (and testing) these is the main part of the work, and I'm at least halfway through. Thomas From martin@v.loewis.de Wed Apr 16 20:14:27 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 16 Apr 2003 21:14:27 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <m3y92a9rvw.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! I'd like to install the modifications for internationalized domain names before the beta release is made. Regards, Martin From martin@v.loewis.de Wed Apr 16 20:20:34 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 16 Apr 2003 21:20:34 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050511925.9818.78.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> Message-ID: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > Right, but see above. E.g. if your string literals are all Spanish and > you want a Turkish translation, then utf-8 is the only common encoding > you could possibly use in a .po file, right? That's why your string literals should never be all Spanish. If you have Spanish string literals and use escape codes in the msgid, reading the Spanish msgid becomes difficult, anyway. > > 3. By converting the msgids, they are also changing them. That means > > the msgids are not really suitable as keys anymore. > > Is this still a problem for when charset=utf-8? If the msgids are UTF-8, with non-ASCII characters C-escaped, translators will *still* put non-UTF-8 encodings into the catalogs. This will then be a problem: The catalog encoding won't be UTF-8, and you can't process the msgids.
Regards, Martin From niemeyer@conectiva.com Wed Apr 16 20:23:27 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Wed, 16 Apr 2003 16:23:27 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <16029.31070.687527.821448@montanaro.dyndns.org> References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> Message-ID: <20030416192326.GA29785@localhost.distro.conectiva> > Cool. Based on this thread and an experiment I tried, some obvious (to me) > things come to mind: > > * get_token() needs to be fixed to handle the 'bar'asd'foo' case > > * the shlex class should handle strings as input, not just file-like > objects > > * get_word() or get_words() methods in the shlex class could implement > the shellwords functionality Ok, it was easier than I imagined. Here's an example of the new shlex. Maintaining the old behavior (notice that now strings are accepted as arguments):

>>> import shlex
>>> l = shlex.shlex("'foo'a'bar'")
>>> l.get_token()
"'foo'"
>>> l.get_token()
"a'bar'"

New behavior:

>>> l = shlex.shlex("'foo'a'bar'", posix=1)
>>> l.get_token()
'fooabar'

Introduced iterator interface:

>>> for i in shlex.shlex("'foo'a'bar'"):
...     print i
...
'foo'
a'bar'

New function, mimicking shellwords:

>>> shlex.split_args("'foo'a'bar' -o='foo bar'")
['fooabar', '-o=foo bar']

I'm not sure if "posix" and "split_args" are the best names for these features. Suggestions? I've just committed patch #722686 (and assigned to Guido, as he suggested recently ;-).
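For reference, the behaviour sketched above survives into the shipped `shlex` module essentially unchanged, with the proposed `split_args` renamed to `shlex.split`; a minimal sketch against the module as it eventually shipped (so the names differ from the patch under discussion):

```python
import shlex

# posix-mode lexing concatenates adjacent quoted and unquoted pieces
# into a single token, as in the 'fooabar' example above.
lex = shlex.shlex("'foo'a'bar'", posix=True)
print(list(lex))  # → ['fooabar']

# The proposed split_args() became shlex.split(), which also enables
# whitespace_split so option-like words stay in one piece.
print(shlex.split("'foo'a'bar' -o='foo bar'"))  # → ['fooabar', '-o=foo bar']
```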
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From barry@python.org Wed Apr 16 20:36:08 2003 From: barry@python.org (Barry Warsaw) Date: 16 Apr 2003 15:36:08 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050521768.14112.15.camel@barry> On Wed, 2003-04-16 at 15:20, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > Right, but see above. E.g. if your string literals are all Spanish and > > you want a Turkish translation, then utf-8 is the only common encoding > > you could possibly use in a .po file, right? > > That's why your string literals should never be all Spanish. If you > have Spanish string literals and use escape codes in the msgid, > reading the Spanish msgid becomes difficult, anyway. So why isn't the English/US-ASCII bias for msgids considered a liability for gettext? Do non-English programmers not want to use native literals in their source code? If we adhere to this limitation instead of extending gettext then it seems like Zope will be forced to use something else, and that seems like a waste. Its msgids come from sources other than program source code and such sources may indeed be written in non-English. It seems like gettext is so close and all the machinery is almost there, that this small enhancement should be harmless and helpful. BTW, I believe that if all your msgids /are/ us-ascii, you should be able to ignore this change and have it work backwards compatibly. Also, this change ought to visibly affect only .ugettext(), which isn't part of the traditional gettext API anyway. > > > 3. By converting the msgids, they are also changing them.
That means > > > the msgids are not really suitable as keys anymore. > > > > Is this still a problem for when charset=utf-8? > > If the msgids are UTF-8, with non-ASCII characters C-escaped, > translators will *still* put non-UTF-8 encodings into the catalogs. > This will then be a problem: The catalog encoding won't be UTF-8, > and you can't process the msgids. Isn't this just another validation step to run on the .po files? There are already several ways translators can (and do!) make mistakes, so we already have to validate the files anyway. -Barry From guido@python.org Wed Apr 16 20:31:48 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 15:31:48 -0400 Subject: [Python-Dev] Super and properties In-Reply-To: Your message of "Wed, 02 Apr 2003 15:42:41 +0100." <001401c2f926$1d32d7e0$a8130dd5@violante> References: <001401c2f926$1d32d7e0$a8130dd5@violante> Message-ID: <200304161931.h3GJVmp19275@odiug.zope.com> (I'm quoting the whole message below since this has been two weeks by now.) > From: Gonçalo Rodrigues <op73418@mail.telepac.pt> > > Hi all, > > Since this is my first post here, let me first introduce myself. I'm Gonçalo > Rodrigues. I work in mathematics, mathematical physics to be more precise. I > am a self-taught hobbyist programmer and fell in love with Python a year and > half ago. And of interesting personal details this is about all so let me > get down to business. > > My problem has to do with super that does not seem to work well with > properties. I posted to comp.lang.python a while ago and there I was advised > to post here. So, suppose I override a property in a subclass, e.g.
>
> >>> class test(object):
> ...     def __init__(self, n):
> ...         self.__n = n
> ...     def __get_n(self):
> ...         return self.__n
> ...     def __set_n(self, n):
> ...         self.__n = n
> ...     n = property(__get_n, __set_n)
> ...
> >>> a = test(8)
> >>> a.n
> 8
> >>> class test2(test):
> ...     def __init__(self, n):
> ...         super(test2, self).__init__(n)
> ...     def __get_n(self):
> ...         return "Got ya!"
> ...     n = property(__get_n)
> ...
> >>> b = test2(8)
> >>> b.n
> 'Got ya!'
>
> Now, since I'm overriding a property, it is only normal that I may want to > call the property implementation in the super class. But the obvious way (to > me at least) does not work:
>
> >>> print super(test2, b).n
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> AttributeError: 'super' object has no attribute 'n'
>
> I know I can get at the property via the class, e.g. do
>
> >>> test.n.__get__(b)
> 8
> >>>
>
> Or, not hardcoding the test class,
>
> >>> b.__class__.__mro__[1].n.__get__(b)
> 8
>
> But this is ugly at best. To add to the puzzle, the following works, albeit > not in the way I expected
>
> >>> super(test2, b).__getattribute__('n')
> 'Got ya!'
>
> Since I do not know if this is a bug in super or a feature request for it, I > thought I'd better post here and leave it to your consideration. > > With my best regards, > G. Rodrigues

Hah! I think I've resolved this, and I *still* don't know if it's a bug report or a feature request. :-) The crux of the matter is that super() has a specific exception for data descriptors, of which properties are an example. This means that when looking for attribute 'x', if it finds a hit which is a data descriptor, it ignores it and keeps looking. It took me a while to understand why. When I disabled the test, exactly *one* unit test failed, and it wasn't immediately clear what was going on. It turns out that this test was asking for the __class__ attribute of the super object itself, but it was getting the __class__ of the instance. Simplified:

    class C(object): pass
    print super(C, C()).__class__

This should print <type 'super'> and not <class '__main__.C'>, because it would be really confusing if the super object, when inquired about its class, masqueraded as another class. How does skipping data descriptors accomplish this goal?
When super does its search, the last class it looks at is 'object', at the end of the MRO chain. And this has a data descriptor for '__class__', which describes the __class__ attribute of all objects. If super were to give this descriptor the usual treatment, it would call its __get__ method, and that would (in the above example) return the class C. The CVS history mentions (for typeobject.c rev 2.120, shortly before the final 2.2.0 release): super(C, C()).__class__ would return the __class__ attribute of C() rather than the __class__ attribute of the super object. This is confusing. To fix this, I decided to change the semantics of super so that it only applies to code attributes, not to data attributes. After all, overriding data attributes is not supported anyway. Your message above makes a good case for overriding data attributes, so I have to retract this. But I don't want __class__ to return C, I want it to return super. So I'll change this back, and make an explicit exception only for __class__. And ok, I'm deciding now that this is a feature, which means that I'm changing it in Python 2.3, but not backporting the change to Python 2.2.x. Hope this helps! --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Wed Apr 16 21:18:52 2003 From: python@rcn.com (Raymond Hettinger) Date: Wed, 16 Apr 2003 16:18:52 -0400 Subject: [Python-Dev] 2.3b1 release References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <004301c30455$78230c80$125ffea9@oemcomputer> > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! I have a couple of small patches and bugs to review. Should be no problem getting these in this weekend. Am working on a more cache friendly dict lookup strategy. If it is not ready for prime time in the next few days, it will have to wait for Py2.4. 
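With the super()/property change described earlier in this digest in place (Python 2.3 and later, including all of Python 3), the obvious spelling from the original bug report works; a minimal sketch in modern syntax (class names kept from the report, the mangled `__n` slot simplified to `_n` for brevity):

```python
class test:
    def __init__(self, n):
        self._n = n

    @property
    def n(self):
        return self._n

class test2(test):
    @property
    def n(self):
        return "Got ya!"

b = test2(8)
print(b.n)                # → Got ya!
# super no longer skips data descriptors (except __class__),
# so the base property can be reached directly:
print(super(test2, b).n)  # → 8
```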
A couple of bytecode optimizations may also have to wait for Py2.4. For some reason, Basicblock(nop, jump_if_true) is not always directly substitutable for Basicblock(unary_not, jump_if_false). I suspect the three-way return value for PyObject_IsTrue() but it could be something else. Raymond Hettinger From martin@v.loewis.de Wed Apr 16 23:07:15 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Thu, 17 Apr 2003 00:07:15 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050521768.14112.15.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> Message-ID: <3E9DD413.8030002@v.loewis.de> Barry Warsaw wrote: > So why isn't the English/US-ASCII bias for msgids considered a liability > for gettext? Do non-English programmers not want to use native literals > in their source code? Using English for msgids is about the only way to get translation. Finding a Turkish speaker who can translate from Spanish is *significantly* more difficult than starting from English; starting from, say, Chinese and going to Hebrew might just be impossible. So any programmer who seriously wants to have his software translated will put English texts into the source code. Non-English literals are only used if l10n is not an issue. > If we adhere to this limitation instead of extending gettext then it > seems like Zope will be forced to use something else, and that seems > like a waste. It's not a limitation of gettext, but a usage guideline: gettext can map arbitrary byte strings to arbitrary other byte strings. > BTW, I believe that if all your msgids /are/ us-ascii, you should be > able to ignore this change and have it work backwards compatibly.
"This" change being addition of the "coerce" argument? If you think you will need it, we can leave it in. >>If the msgids are UTF-8, with non-ASCII characters C-escaped, >>translators will *still* put non-UTF-8 encodings into the catalogs. >>This will then be a problem: The catalog encoding won't be UTF-8, >>and you can't process the msgids. > > > Isn't this just another validation step to run on the .po files? There > are already several ways translators can (and do!) make mistakes, so we > already have to validate the files anyway. I'm not sure how exactly a validation step would be executed. Would that step simply verify that the encoding of a catalog is UTF-8? That validation step would fail for catalogs that legally use other charsets. Regards, Martin From mhammond@skippinet.com.au Thu Apr 17 02:58:01 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 17 Apr 2003 11:58:01 +1000 Subject: [Python-Dev] Final PEP 311 run Message-ID: <023a01c30484$c782ad10$530f8490@eden> Hi all, I'd like to get PEP311 in for the Python 2.3b1 - http://www.python.org/peps/pep-0311.html (or even if I miss, very soon after!) There appears to be no issues with the technical aspects of the PEP (please correct me now if I am wrong). The only issue is the name of the API. To save re-reading the PEP just to understand the names, I will summarize here (see the PEP for the full version): There are 2 new functions, called as a pair. The first function sets up the Python thread state, along with the GIL, so that the current thread can safely call the Python API. The function makes no assumptions about the current state of the GIL etc - it works out the current state, and does the "right thing". The second function is the reverse of the first, to indicate that the thread has finished with the thread state for the time being. 
The PEP calls these functions PyAutoThreadState_Ensure() and PyAutoThreadState_Release() Reasons for the names in the PEP: "Auto" reflects that the current thread-state need not be known (whereas the other APIs do). "Ensure()" reflects that nothing may actually be *created* - all we are doing is "ensuring" we have the resources, creating only if necessary. On the down-side - "Auto" will look strange in the future when this is the standard way of managing the lock. "ThreadState" does not reflect that the function does more than manage the PyThreadState - it also manages the locks (which while an implementation detail, are currently discrete) Other Proposals: Just: PyGIL_Ensure(), PyGIL_Release(): shorter to type, conveys the meaning. David Abrahams: Prefers SubjectVerbObject, so would prefer "PyEnsureGIL" - but likes PyAcquireInterpreter() and PyReleaseInterpreter() best. Dropping "Auto" from the PEP gives PyThreadState_Ensure() and PyThreadState_Release(). I admit to liking "PyAcquireInterpreter()" best, but it does not match the existing API structure. For the sake of typing, I would be happy to go with Just's PyGIL_Ensure(), but maybe PyInterpreter_Ensure() is a good compromise. Other opinions or pronouncements welcome :) Mark. From tim_one@email.msn.com Thu Apr 17 03:46:28 2003 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 16 Apr 2003 22:46:28 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <LNBBLJKPBEHFEDALKOLCGEBMEHAB.tim_one@email.msn.com> [Mark Hammond] > I'd like to get PEP311 in for the Python 2.3b1 - > http://www.python.org/peps/pep-0311.html (or even if I miss, very soon > after!) I hope so! It seems important that the specific projects mentioned in the PEP test drive this before 2.3 final. > There appears to be no issues with the technical aspects of the > PEP (please correct me now if I am wrong). 
Some questions occurred while reading the PEP again, primarily are there any restrictions on which parts of the Python API can be called between an ensure and its matching release? For example, is it OK if the thread does a Py_BEGIN_ALLOW_THREADS whatever Py_END_ALLOW_THREADS pair while an ensure is active in the thread? Is it OK if the thread does a nested PyAutoThreadState_Ensure() whatever PyAutoThreadState_Release() likewise (I'm sure that one is OK, but am not sure the PEP really says so)? If that is OK, must the nested call use the same PyAutoThreadState_State handle returned by the outer ensure -- or must it avoid using the same handle? > The only issue is the name of the API. If that's the only issue, check it in yesterday <0.9 wink>. > To save re-reading the PEP just to understand the names, I will summarize > here (see the PEP for the full version): > > There are 2 new functions, called as a pair. The first function > sets up the Python thread state, along with the GIL, so that the current > thread can safely call the Python API. The function makes no assumptions > about the current state of the GIL etc - it works out the current state, > and does the "right thing". The second function is the reverse of the > first, to indicate that the thread has finished with the thread state for > the time being. > > The PEP calls these functions PyAutoThreadState_Ensure() and > PyAutoThreadState_Release() I can live with that. > Reasons for the names in the PEP: > "Auto" reflects that the current thread-state need not be known > (whereas the other APIs do). "Ensure()" reflects that nothing may > actually be *created* - all we are doing is "ensuring" we have the > resources, creating only if necessary. On the down-side - "Auto" will > look strange in the future when this is the standard way of managing > the lock. 
"ThreadState" does not reflect that the function does more > than manage the PyThreadState - it also manages the locks (which while > an implementation detail, are currently discrete) Please put this paragraph of rationale in the docs (leaving out the down side, and maybe in a footnote)! Once it's explained, there's nothing mysterious about the names, and there's no point making future readers guess at the reasons. > Other Proposals: > Just: PyGIL_Ensure(), PyGIL_Release(): shorter to type, conveys the > meaning. Could live with that too. > David Abrahams: Prefers SubjectVerbObject, so would prefer > "PyEnsureGIL" - but likes Ditto, except grates some against existing naming conventions (generally "Py" Subsystem "_" Detail ). > PyAcquireInterpreter() and PyReleaseInterpreter() best. I first read those as having something to do with an interpreter state, which isn't a good sign. > Dropping "Auto" from the PEP gives PyThreadState_Ensure() and > PyThreadState_Release(). What do you get if you drop the Thread <wink>? > I admit to liking "PyAcquireInterpreter()" best, but it does not match > the existing API structure. PyInterpreter_{Acquire,Release} would, though. > For the sake of typing, I would be happy to go with Just's > PyGIL_Ensure(), but maybe PyInterpreter_Ensure() is a good > compromise. > > Other opinions or pronouncements welcome :) Flip a coin, check it in, have a smoke, don't look back. I'll join you. 
From greg@cosc.canterbury.ac.nz Thu Apr 17 03:57:06 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 17 Apr 2003 14:57:06 +1200 (NZST) Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <200304170257.h3H2v6v16015@oma.cosc.canterbury.ac.nz> Mark Hammond <mhammond@skippinet.com.au>: > The PEP calls these functions PyAutoThreadState_Ensure() and > PyAutoThreadState_Release() > > Other opinions or pronouncements welcome :) How about: PyEnvironment_Ensure PyEnvironment_Release where the "Environment" bit means "everything that's needed, whatever it might be". Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tismer@tismer.com Thu Apr 17 05:40:32 2003 From: tismer@tismer.com (Christian Tismer) Date: Thu, 17 Apr 2003 06:40:32 +0200 Subject: [Python-Dev] Stackless 3.0 alpha 1 at blinding speed Message-ID: <3E9E3040.5040409@tismer.com> Dear community, dear Stackless addicts, dear friends, Ich habe Euch wirklich was zu erzählen, liebe Freunde, I really have to tell you a story! During the last four months, I have been struggling with Stackless Python, and especially with myself and how to get re-focused on my major project which you know very well. Some of you might know quite well too how hard this was for me, especially in the context of my parents' endangered health. This particular problem seems to be solved, for the moment, so let's celebrate the moment, celebrate the moment! Without going into details, I would like to tell you about the current status of Stackless Python. For short, like an abstract, Stackless 3.0 is something like an or-merge of Stackless 1.0 and 2.0 technology.
Guido, Tim, you both will probably remember my lengthy approaches to introduce those continuations, years ago, you both convinced me to drop them, and I did what I was supposed to do. I'm hopefully a proper citizen, right now. Anyway, you know I'll never really be... After a long period of depression, I re-invented Stackless in early 2002, with a version number of 2.0, denoting that I had dropped all the 1.0 paradigms (as there are: (1) try to keep compatible, (2) do minimal changes only, (3) absolutely avoid assembly code at all) At the same time, I dismissed all of my Stackless 1.0 code, which was continuation-based, an absolute no-no in Guido's eyes. I still do think that TimP wasn't that conformant to this "nono"-statement, after I read a lot of his comments, especially side-notes on the thread-sig, but this time Guido's veto was clearly stronger than Tim's arguing, a thing that doesn't happen so often, but I'm proactively respecting this, positively. Now, after all that rubbish, let's go into facts, which are quite interesting. ------------------------------------------------------------- Today, I finished Stackless Python 3.0, alpha 3.0.1! First of all, I would like to talk about the new principles. Yes, no, there are no longer continuations in that sense. I'm meanwhile convinced that we don't want to support them, any longer, although I'm happy that Stackless allowed me to learn *all* and much more about them than is available on the wor(th|ld) w(h)i(d|l)e net!! Q&A: Q: What is it about that Stackless 3.0, will this guy never shut up??? A: No, he most probably never will, unless he's dead, and this is another 40 or more years in advance, for heaven's sake. Q: So, what is it about the Stackless 3.0 hype that has been around for months? A: Simple! Stackless 3.0 has all the hardware switching stuff in it that Stackless 2.0 had. Stackless 3.0 also incorporates 80% of the soft switching protocol that Stackless 1.0 had.
But there are a lot of new features: Stackless has again shown how to marry the impossible with the unbelievable, and this is the new concept of Stackless 3.0: There is a merge between (1.0) soft context switching and (2.0) hard context switching, which always does the most reasonable thing. There are a lot of benefits which stem from this hybrid solution, which will appear in one of my forthcoming papers, pretty soon. -------------------------------------------------------------- BLURB Let me simply end this pamphlet with some simple sentences: Stackless Python is more capable of tasklet switching than any other light-weight threading software package. If anyone disagrees, please give me a runnable counter-example. Here are some impressive site-specific time measurements, which especially show that 20.000.000 cframe tasklet switches per second are really, really hard to beat. Python on Win32:

D:\slpdev\src\2.2\src\Stackless\test>..\..\pcbuild\python taskspeed.py
10000000 frame switches       took 3.83061 seconds, rate = 2610551/s
10000000 frame softswitches   took 2.40112 seconds, rate = 4164718/s
10000000 cfunction calls      took 2.13033 seconds, rate = 4694098/s
10000000 cframe softswitches  took 0.49296 seconds, rate = 20285627/s
10000000 cframe switches      took 1.98907 seconds, rate = 5027486/s
10000000 cframe 100 words     took 3.93737 seconds, rate = 2539768/s
The penalty per stack word is about 0.980 percent of raw switching.
Stack size of initial stub   = 14
Stack size of frame tasklet  = 58
Stack size of cframe tasklet = 35
D:\slpdev\src\2.2\src\Stackless\test>

Python on Debian -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido@python.org Thu Apr 17 14:47:56 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 09:47:56 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: Your message of "Thu, 17 Apr 2003 11:58:01 +1000." <023a01c30484$c782ad10$530f8490@eden> References: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <200304171347.h3HDluh22368@odiug.zope.com> > I'd like to get PEP311 in for the Python 2.3b1 - > http://www.python.org/peps/pep-0311.html (or even if I miss, very soon > after!) Great! > There appears to be no issues with the technical aspects of the PEP (please > correct me now if I am wrong). The only issue is the name of the API. How about PyGILState_Ensure() and PyGILState_Restore()? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Apr 17 16:27:22 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 17:27:22 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <20030417152722.GA9493@xs4all.nl> On Wed, Apr 16, 2003 at 11:52:10AM -0400, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! Well, there is the CALL_ATTR patch (http://www.python.org/sf/709744) that Brett and I worked on at the PyCon sprints. It's finished (barring tests) for classic classes, and writing tests is not very inspiring because all functionality is already tested in the standard test suite. 
However, it doesn't do anything with newstyle classes at all, yet. I've had surprisingly little time since PyCon (it's amazing how not being at the office for two weeks makes people shove work your way -- I guess they realized I couldn't object :) but I'm in the process of grokking newstyle classes. So far, I've been alternating from 'Wow!' to 'Au!', and I'll send another email after this one for clarification of a few issues :) Anyway, if anyone has straightforward ideas about how CALL_ATTR should deal with newstyle classes (if at all), please inform me (preferably via SF) or just grab the patch and run with it. I'm still confused about descrgets and where they come from. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Thu Apr 17 16:47:39 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 17:47:39 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <20030417152722.GA9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> Message-ID: <20030417154739.GB9493@xs4all.nl> On Thu, Apr 17, 2003 at 05:27:22PM +0200, Thomas Wouters wrote: > So far, I've been alternating from 'Wow!' to 'Au!', and I'll send > another email after this one for clarification of a few issues :) Nevermind that. A "D'oh" slipped into the stream, and I think I get it now. At least the stuff that wasn't working is working now. I wouldn't mind if someone pointed me to a xxtype.c (newstyle class in C) like we have xxobject and xxsubclass though... So far, it's been so simple I fear I'm missing something. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
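Thomas's confusion about descrgets (the tp_descrget slot that attribute lookup invokes) can be made concrete with a small pure-Python sketch. This is a hedged illustration, not CPython's C code; the `Constant` class and the method names are invented for the example. Guido's reply that follows describes the same mechanism from the C side.

```python
# Pure-Python sketch of what a descriptor's __get__ (tp_descrget in C)
# does during attribute lookup.  Toy classes, not CPython internals.

class Constant:
    """A toy non-data descriptor: it defines __get__ but no __set__."""
    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        # instance is None when the lookup is on the class itself
        return self.value

class C:
    answer = Constant(42)

    def method(self):
        return 'bound'

print(C.answer)      # class lookup calls Constant.__get__(None, C)
print(C().answer)    # instance lookup calls Constant.__get__(obj, C)

# Plain functions are descriptors too: calling __get__ by hand performs
# the same "binding" that turns a function into a bound method.
c = C()
m = C.__dict__['method'].__get__(c, C)
print(m())           # the manually bound method works like c.method()
```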
From guido@python.org Thu Apr 17 16:53:01 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 11:53:01 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: Your message of "Thu, 17 Apr 2003 17:27:22 +0200." <20030417152722.GA9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> Message-ID: <200304171553.h3HFr1023445@odiug.zope.com> [Thomas] > Well, there is the CALL_ATTR patch (http://www.python.org/sf/709744) > that Brett and I worked on at the PyCon sprints. It's finished > (barring tests) for classic classes, and writing tests is not very > inspiring because all functionality is already tested in the > standard test suite. However, it doesn't do anything with newstyle > classes at all, yet. And I want the new-style classes version! > I've had surprisingly little time since PyCon (it's amazing how not > being at the office for two weeks makes people shove work your way > -- I guess they realized I couldn't object :) Even without so much of that problem here, I was buried in email for a week after returning from Python UK. :-) > but I'm in the process of grokking newstyle classes. So far, I've > been alternating from 'Wow!' to 'Au!', and I'll send another email > after this one for clarification of a few issues :) Anyway, if > anyone has straightforward ideas about how CALL_ATTR should deal > with newstyle classes (if at all), please inform me (preferably via > SF) or just grab the patch and run with it. I'm still confused about > descrgets and where they come from. Yes, please. Here's a quick explanation of descriptors: A descriptor is something that lives in a class' __dict__, and primarily affects instance attribute lookup. A descriptor has a __get__ method (in C this is the tp_descrget function in its type object) and the instance attribute lookup calls this to "bind" the descriptor to a specific instance.
This is what turns a function into a bound method object in Python 2.2. In earlier versions, functions were special-cased by the instance getattr code; the special case has been subsumed by looking for a __get__ method. Yes, this means that a plain Python function object is a descriptor! Because the instance getattr code returns whatever __get__ returns as the result of the attribute lookup, this is also how properties work: they have a __get__ method that calls the "property-get" function. A descriptor's __get__ method is also called for class attribute lookup (with the instance argument set to NULL or None). And a descriptor's __set__ method is called for instance attribute assignment; but not for class attribute assignment. Hope this helps! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Apr 17 18:28:42 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 Apr 2003 19:28:42 +0200 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <3E9EE44A.6060904@lemburg.com> Guido van Rossum wrote: > Yes, please. Here's a quick explanation of descriptors: > > A descriptor is something that lives in a class' __dict__, and > primarily affects instance attribute lookup. A descriptor has a > __get__ method (in C this is the tp_descrget function in its type > object) and the instance attribute lookup calls this to "bind" the > descriptor to a specific instance. This is what turns a function into > a bound method object in Python 2.2. In earlier versions, functions > were special-cased by the instance getattr code; the special case has > been subsumed by looking for a __get__ method. Yes, this means that a > plain Python function object is a descriptor!
Because the instance > getattr code returns whatever __get__ returns as the result of the > attribute lookup, this is also how properties work: they have a > __get__ method that calls the "property-get" function. > > A descriptor's __get__ method is also called for class attribute > lookup (with the instance argument set to NULL or None). And a > descriptor's __set__ method is called for instance attribute > assignment; but not for class attribute assignment. > > Hope this helps! Could you put such short overviews somewhere on the Python Wiki ? They sure help in understanding what is going on behind the scenes without having to grep through tons of source code :-) Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 17 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 68 days left From guido@python.org Thu Apr 17 18:34:31 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 13:34:31 -0400 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: Your message of "Thu, 17 Apr 2003 19:28:42 +0200." <3E9EE44A.6060904@lemburg.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <3E9EE44A.6060904@lemburg.com> Message-ID: <200304171734.h3HHYVU03250@odiug.zope.com> > Could you put such short overviews somewhere on the Python Wiki ? I don't have the time for that. When I want to publish stuff like this somewhere, I need to spend time to make it all correct, complete etc.
> They sure help in understanding what is going on behind the > scenes without having to grep through tons of source code :-) You should start by reading http://www.python.org/2.2.2/descrintro.html If you still have questions about descriptors after reading that, grepping the source is an option. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From whisper@oz.net Thu Apr 17 20:51:08 2003 From: whisper@oz.net (David LeBlanc) Date: Thu, 17 Apr 2003 12:51:08 -0700 Subject: [Python-Dev] Wrappers and keywords Message-ID: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> (You'll excuse me I hope if this is deemed inappropriate. I'm posting this here rather than in the general list since it's about the language and not its application.) I am curious to know why the, what seems to me kludgy, "def x(): pass x = (static|class)method(x)" syntax was chosen over a simple "staticdef x ():..." or "classdef x ():..." def specialization syntax? Either method adds keywords to the language, but a direct declaration seems clearer and less error prone to me compared to the "call->assignment magically makes a wrapper" method. Is it hard to do the needed special wrapping directly? Regards, David LeBlanc Seattle, WA USA From skip@pobox.com Thu Apr 17 20:59:29 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 17 Apr 2003 14:59:29 -0500 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> Message-ID: <16031.1953.359534.974127@montanaro.dyndns.org> David> I am curious to know why the, what seems to me kludgy, "..." David> syntax was chosen over a simple "staticdef" or "classdef" def David> specialization syntax? It was felt that it was more important in the short term to explore/add the functionality and settle details of syntax later.
Skip From Jack.Jansen@oratrix.com Thu Apr 17 21:11:39 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 22:11:39 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> On woensdag, apr 16, 2003, at 17:52 Europe/Amsterdam, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! The getargs mods got checked in just this morning, even though I explicitly and rather strongly asked that if these mods be made they be checked in *long* before a release was due:-( This means that all the Mac modules are now 100% dead. The same is probably true for PyObjC. And PyObjC has the added problem that it needs to be compatible with both 2.3b1 and 2.2 (notice that that is "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that Apple ships, which is 2.2 at the moment). I assume there are format codes that will convert 16 bit and 32 bit integer quantities without any checks on both 2.2 and 2.3, but I haven't investigated yet. And there may be problems with other wrapper packages (win32, wxPython, PyOpenGL) too. I will start fixing things, but there are only 4 real days left before April 25, given easter, so I would strongly urge for postponing the release date for another two weeks or so. 
-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From martin@v.loewis.de Thu Apr 17 21:12:19 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Thu, 17 Apr 2003 22:12:19 +0200 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> Message-ID: <3E9F0AA3.7000907@v.loewis.de> David LeBlanc wrote: > I am curious to know why the, what seems to me kludgy, "def x(): pass x = > (static|class)method(x)" syntax was chosen over a simple "staticdef x > ():..." or "classdef x ():..." def specialization syntax? That syntax hasn't been chosen yet; syntactic sugar for static and class methods, properties, slots, and other object types is still an area of ongoing research. The current implementation was created since it did not need an extension to the syntax: x=staticmethod(x) was syntactically correct even in Python 1.2 (which is the oldest Python version I remember). There have been numerous proposals on what the syntactic sugar should look like, which is one reason why no specific solution has been implemented yet. Proposals usually get discredited if they require the introduction of new keywords, like "staticdef". The current favorite proposal is to write def x() [static]: pass or perhaps def x() [staticmethod]: pass In that proposal, static(method) would *not* be a keyword, but would be an identifier (denoting the same thing that staticmethod currently denotes). This syntax nicely extends to def x() [threading.synchronized, xmlrpclib.webmethod]: pass The syntax has the disadvantage of not applying nicely to slots.
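For concreteness, here is the post-definition rebinding idiom the thread is discussing, using the real staticmethod and classmethod builtins; the `Math` class itself is a made-up example:

```python
# The 2.2-era idiom: define the function, then rebind the name through a
# wrapper.  The proposed "def x() [static]" syntax would be sugar for this.

class Math:
    def square(x):
        return x * x
    square = staticmethod(square)   # no instance argument when called

    def name(cls):
        return cls.__name__
    name = classmethod(name)        # receives the class, not an instance

print(Math.square(5))   # callable without an instance: 25
print(Math().name())    # 'Math'
```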
Regards, Martin From guido@python.org Thu Apr 17 21:17:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 16:17:30 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: Your message of "Thu, 17 Apr 2003 22:11:39 +0200." <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> References: <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> Message-ID: <200304172017.h3HKHUO05664@odiug.zope.com> > > I'd like to do a 2.3b1 release someday. Maybe at the end of next > > week, that would be Friday April 25. If anyone has something that > > needs to be done before this release go out, please let me know! > > The getargs mods got checked in just this morning, even though I > explicitly and rather strongly asked that if these mods be made they > be checked in *long* before a release was due:-( Sorry, I forgot. Did you make a note of that on the SF patch? > This means that all the Mac modules are now 100% dead. The same is > probably true for PyObjC. And PyObjC has the added problem that it > needs to be compatible with both 2.3b1 and 2.2 (notice that that is > "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > Apple ships, which is 2.2 at the moment). I assume there are format > codes that will convert 16 bit and 32 bit integer quantities without > any checks on both 2.2 and 2.3, but I haven't investigated yet. Maybe we should retract the changes to existing format codes that make them more restrictive? That should revive any code that's currently dead, right? > And there may be problems with other wrapper packages (win32, > wxPython, PyOpenGL) too. > > I will start fixing things, but there are only 4 real days left > before April 25, given Easter, so I would strongly urge for > postponing the release date for another two weeks or so. That would endanger the entire release schedule to the point of pushing the 2.3 release past the OSCON conference (July 7-11).
--Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Thu Apr 17 21:30:54 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 22:30:54 +0200 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: <vfxepb1s.fsf@python.net> Message-ID: <7D10D627-7113-11D7-AE99-000A27B19B96@oratrix.com> On woensdag, apr 16, 2003, at 20:11 Europe/Amsterdam, Thomas Heller wrote: > | How about the following counterproposal. This also changes some of > the > | other format codes to be a little more regular. > | > | Code C type Range check > | > | b unsigned char 0..UCHAR_MAX > | B unsigned char none ** > | h unsigned short 0..USHRT_MAX > | H unsigned short none ** > | i int INT_MIN..INT_MAX > | I * unsigned int 0..UINT_MAX > | l long LONG_MIN..LONG_MAX > | k * unsigned long none > | L long long LLONG_MIN..LLONG_MAX > | K * unsigned long long none > | > | Notes: > | > | * New format codes. > | > | ** Changed from previous "range-and-a-half" to "none"; the > | range-and-a-half checking wasn't particularly useful. Do I understand correctly that there is no format code that works on both 2.2 and 2.3 that converts 32 bit quantities without complaining (B and H will work for 8 and 16 bit quantities)? That may be a serious problem for PyObjC.... 
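Jack's worry is about the range checks in Thomas's table. As a hedged sketch, the difference between a checked code like 'b' and its unchecked counterpart 'B' can be modelled in Python; `convert_b` and `convert_B` are toy functions illustrating the proposed semantics, not the getargs.c implementation:

```python
# Toy model of the proposed getargs semantics for unsigned char:
# 'b' range-checks 0..UCHAR_MAX, while 'B' does no range check and
# simply keeps the low 8 bits.  Illustrative only; getargs.c is C code.

UCHAR_MAX = 0xFF

def convert_b(value):
    # 'b': reject anything outside the unsigned char range
    if not 0 <= value <= UCHAR_MAX:
        raise OverflowError("value out of range for format 'b'")
    return value

def convert_B(value):
    # 'B': accept anything, truncate to 8 bits as C would
    return value & UCHAR_MAX

print(convert_b(200))   # in range: 200
print(convert_B(300))   # truncated: 44 (300 & 0xFF)
```

A code with 'B'-style behaviour on both 2.2 and 2.3 is exactly what Jack is asking for in the 32-bit case.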
-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From thomas@xs4all.net Thu Apr 17 21:59:56 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 22:59:56 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <20030417205956.GC9493@xs4all.nl> On Thu, Apr 17, 2003 at 11:53:01AM -0400, Guido van Rossum wrote: > > Anyway, if anyone has straightforward ideas about how CALL_ATTR should > > deal with newstyle classes (if at all), please inform me (preferably via > > SF) or just grab the patch and run with it. I'm still confused about > > descrgets and where they come from. > Yes, please. Here's a quick explanation of descriptors: [ the descriptor describes descriptors ] > Hope this helps! Well, yes, in that it reminded me to stop looking for how functions get turned into methods. That part is the same for old-style classes, though, and not quite what I'm confused about. What the call_attr patch does is shortcut the instance_getattr functions in a new function, to do just what is necessary (and no more.) _Py_instance_getmethod() returns NULL for anything that isn't a method, too, letting the slow case handle it. When it does find a would-be method, it returns the unwrapped function. The call_attr function basically does a PyInstance_Check() and a _Py_instance_getmethod(), and calls the returned function. The problem I have with newstyle classes is where to shortcut what. I understand now how to detect a would-be method, but I'm not sure how to get unwrapped attributes. As far as I understand, types can provide their own getattr function with complete control over descriptors, so there isn't much to shortcut.
Unless I should make the shortcut depend on the actual value of tp_getattro, as in shortcut only if it actually is PyObject_GenericGetAttr ? In that case, I'm somewhat sceptical about the speed benefit's cost in maintenance, as it would require a near copy of PyObject_GenericGetAttr (which is already a near-copy of a few other functions :) It's also very hard to control any nested getattrs (possible, I think, because the process goes over all bases' dicts and the instance dict.) Or can we reduce the number of steps PyObject_GenericGetAttr goes through if we know we are just looking for a method ? I don't believe so, but I'm not sure. (Looking at PyObject_GenericGetAttr with that in mind, I wonder if there isn't a possible crash there. In the first MRO lookup, looking for descr's, if a non-data-descr is found, it is kept around but not INCREF'd until later, after the instance-dict is searched. Am I wrong in believing the PyDict_GetItem of the instance dict can call Python code ? There isn't even as much as an assert(PyDict_Check(dict)) there.) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From whisper@oz.net Thu Apr 17 22:03:40 2003 From: whisper@oz.net (David LeBlanc) Date: Thu, 17 Apr 2003 14:03:40 -0700 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <3E9F0AA3.7000907@v.loewis.de> Message-ID: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> <snip> > There have been numerous proposals on what the syntactic sugar should > look like, which is one reason why no specific solution has been > implemented yet. Proposals get usually discredit if they require > introduction of new keywords, like "staticdef". The current favorite > proposals is to write > > def x() [static]: > pass > > or perhaps > > def x() [staticmethod]: > pass > > In that proposal, static(method) would *not* be a keyword, but would > be an identifier (denoting the same thing that staticmethod currently > denotes). 
This syntax nicely extends to > > def x() [threading.synchronized, xmlrpclib.webmethod]: > pass I'm not sure what you're suggesting here semantically...? > The syntax has the disadvantage of not applying nicely to slots. > > Regards, > Martin It also has the disadvantage of adding a new syntactical construct to the language does it not (which seems like more pain than a couple of keywords)? I don't recall any other place in the language that uses [] as a way to specify a variable (oops, excepting list comprehensions sort of, and that's not quite the same thing IMO), especially in that position in a statement? It seems like it would open the door to uses (abuses?) like: class foo [abstract]: pass (although, this particular one might satisfy the group that wants interfaces in Python) Is there any real difference between what amounts to a reserved constant identifier (with semantic meaning rather than value) compared to a keyword statement sentinel? Are there any other language-level uses like that (reserved constant identifier), or does this introduce something new as well? Speaking of slots, is their primary purpose to have classes whose instances are not morphable? If so, one might default to all classes being non-morphable by default and having something like: class foo [morphable]: pass as identifying those which are (an obviously python-3000 feature if implemented thusly). Regards, Dave LeBlanc Seattle, WA USA From Jack.Jansen@oratrix.com Thu Apr 17 22:29:24 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 23:29:24 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172017.h3HKHUO05664@odiug.zope.com> Message-ID: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van Rossum wrote: >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next >>> week, that would be Friday April 25.
If anyone has something that >>> needs to be done before this release go out, please let me know! >> >> The getargs mods got checked in just this morning, even though I >> explicitly and rather strongly asked that if these mods be made they >> be checked in *long* before a release was due:-( > > Sorry, I forgot. Did you make a note of that on the SF patch? Yes, I'm pretty sure I did. Thomas also seems to refer to it... >> This means that all the Mac modules are now 100% dead. The same is >> probably true for PyObjC. And PyObjC has the added problem that it >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that >> Apple ships, which is 2.2 at the moment). I assume there are format >> codes that will convert 16 bit and 32 bit integer quantities without >> any checks on both 2.2 and 2.3, but I haven't investigated yet. > > Maybe we should retract the changes to existing format codes that make > them more restrictive? That should revive any code that's currently > dead, right? That would be much better. If "l" (lower case ell) would continue to accept anything I wouldn't have to change anything. Of course I've been busy all night fixing code, but apart from a couple of hand-crafted modules I haven't checked anything in yet. I will check it in on a branch later tonight, and then I'll either forget about the branch or merge it, depending on the resolution of this. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From gherron@islandtraining.com Thu Apr 17 22:39:16 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 17 Apr 2003 14:39:16 -0700 Subject: [Python-Dev] Build errors under RH9 Message-ID: <200304171439.17504.gherron@islandtraining.com> I just upgraded my development system to RedHat 9, and now I get two compilation errors on the Python CVS tree.
I'll have time to examine them tonight, but I thought I'd get a notice out now on the chance that someone else has already resolved them. 1. Compilation of _tkinter comes up with #error "unsupported Tcl configuration" The failing test was changed just yesterday, but the previous version gives the same results: In revision 1.155: #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) and in revision 1.154: #if TCL_UTF_MAX != 3 And yet, if I remove the test, I get a (very minimally tested) working version of Tkinter, so the test should probably be modified to pass in whatever circumstances RH 9 presents. 2. Compilation of _ssl.c fails to find, through a chain of includes, file krb5.h. Then things rapidly go to hell. Defining #define OPENSSL_NO_KRB5 gets through the compilation, but I don't yet know how to test it. (How do I get past the "Use of the `network' resource not enabled" result of running test_socket_ssl.py?) Gary Herron From skip@pobox.com Thu Apr 17 22:43:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 17 Apr 2003 16:43:23 -0500 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> References: <3E9F0AA3.7000907@v.loewis.de> <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> Message-ID: <16031.8187.420786.801944@montanaro.dyndns.org> David> It also has the disadvantage of adding a new syntactical David> construct to the language does it not (which seems like more pain David> than a couple of keywords)? I don't recall any other place in David> the language that uses [] as a way to specify a variable (oops, David> excepting list comprehensions sort of, and that's not quite the David> same thing IMO), especially in that position in a statement?
Adding new syntactic sugar is less of a problem than adding keywords for two reasons: * old code may have used the new keyword as a variable (because it wasn't a keyword) * old code won't have used the new syntactic sugar (because it wasn't proper syntax) Combined, it means there is a higher probability that old code will continue to run with a new bit of syntax than with a new keyword. You can think of [mod1, mod2, ...] as precisely a list of modifiers to normal functions, so it is very much like existing list construction syntax in that regard. Also "[...]" often means "optional" in many grammar specifications or documentation, so there's an added hint as to the meaning. Skip From theller@python.net Thu Apr 17 22:54:08 2003 From: theller@python.net (Thomas Heller) Date: 17 Apr 2003 23:54:08 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> References: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> Message-ID: <ptnk94e7.fsf@python.net> Jack Jansen <Jack.Jansen@oratrix.com> writes: > On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van > Rossum wrote: > > > >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next > >>> week, that would be Friday April 25. If anyone has something that > >>> needs to be done before this release go out, please let me know! > >> > >> The getargs mods got checked in just this morning, even though I > >> explicitly and rather strongly asked that if these mods be made they > >> be checked in *long* before a release was due:-( > > > > Sorry, I forgot. Did you make a note of that on the SF patch? > > Yes, I'm pretty sure I did. Thomas also seems to refer to it... He did, and I also mentioned it yesterday. OTOH, I had a first version of the patch sitting on SF for a rather long time (shortly after the alpha2 release), asking for feedback, but didn't get any. > > >> This means that all the Mac modules are now 100% dead. The same is > >> probably true for PyObjC.
And PyObjC has the added problem that it > >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is > >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > >> Apple ships, which is 2.2 at the moment). I assume there are format > >> codes that will convert 16 bit and 32 bit integer quantities without > >> any checks on both 2.2 and 2.3, but I haven't investigated yet. > > > > Maybe we should retract the changes to existing format codes that make > > them more restrictive? That should revive any code that's currently > > dead, right? > > That would be much better. If "l" (lower case ell) would continue to > accept anything I wouldn't have to change anything. > Guido has also suggested to keep another code without changes, I cannot remember which one it was, but there is a comment on SF. I have the impression that the new test_getargs2.py test makes it easy to change the behaviour and verify it to anything we want. In case it is too much trouble, why not back out all this again (although someone else would have to do it, I'm basically offline until Tuesday), and check in after the b1 release. Sorry, Thomas From klm@zope.com Thu Apr 17 23:02:20 2003 From: klm@zope.com (Ken Manheimer) Date: Thu, 17 Apr 2003 18:02:20 -0400 (EDT) Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <16031.8187.420786.801944@montanaro.dyndns.org> Message-ID: <Pine.LNX.4.44.0304171745490.963-100000@korak.zope.com> On Thu, 17 Apr 2003, Skip Montanaro wrote: > Adding new syntactic sugar is less of a problem than adding keywords for two > reasons: > > * old code may have used the new keyword as a variable (because it > wasn't a keyword) > > * old code won't have used the new syntactic sugar (because it wasn't > proper syntax) (If I recall correctly, minting new keywords is particularly onerous in Python because of its simple parser. Specifically, you can't use keywords for variable names anywhere, even outside the syntactic construct which involves the keyword.
Hence the need to use 'klass' instead of 'class' for parameter names, no variables named 'from', etc. 'import's recent aliasing refinement - import x as y was implemented without making "as" a keyword specifically to avoid this drawback. "as" gets its role there purely by virtue of the import syntax, not as a new keyword - and so you can use "as" as a variable name, etc. The scheme for qualifying function definitions with [...] would have the same virtue - not requiring the qualifiers to be new keywords...) -- Ken klm@zope.com From martin@v.loewis.de Thu Apr 17 23:29:06 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Fri, 18 Apr 2003 00:29:06 +0200 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> Message-ID: <3E9F2AB2.4010708@v.loewis.de> David LeBlanc wrote: >>In that proposal, static(method) would *not* be a keyword, but would >>be an identifier (denoting the same thing that staticmethod currently >>denotes). This syntax nicely extends to >> >> def x() [threading.synchronized, xmlrpclib.webmethod]: >> pass > > > I'm not sure what you're suggesting here semantically...? That is part of the point: You could add arbitrary annotations to function definitions, to indicate that they are static methods, to indicate that multiple calls to them should be synchronized, or to indicate that the method should be available via SOAP (the Simple Object Access Protocol). The language would not associate any inherent semantics. Instead, the identifiers in the square brackets would be callable (or have some other interface) that modifies the function-under-construction, to integrate additional aspects.
The disadvantage of adding keywords is that it breaks backwards compatibility: Somebody might be using that identifier already. When it becomes a keyword, existing code that works now would stop working. With the extension of sqare brackets after the parameter list, nothing breaks, as you can't currently put brackets in that place. > I don't recall any other place in the language that uses [] as a way to > specify a variable (oops, excepting list comprehensions sort of, and that's > not quite the same thing IMO), especially in that position in a statement? > It seems like it would open the door to uses (abuses?) like: > class foo [abstract]: > pass The syntax is inspired by DCOM IDL, and by C#, both allowing to annotate declarations with square brackets. > Is there any real difference between what amounts to a reserved constant > identifier (with semantic meaning rather than value) compared to a keyword > statement sentinal? What is a keyword statement sentinal, and what alternatives are you comparing here? > Are there any other language-level uses like that > (reserved constant identifier), or does this introduce something new as > well? If you are referring the the def foo()[static] proposal: "static" would not be reserved nor constant. Instead, writing def foo()[bar1, bar2]: body would be a short-hand for writing def foo(): body foo = bar1(foo) foo = bar2(foo) bar1 and bar2 could be arbitrary expressions - nothing reserved at all. > Speaking of slots, is their primary purpose to have classes whose instances > are not morphable? No. 
Regards, Martin From neal@metaslash.com Thu Apr 17 23:23:59 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 17 Apr 2003 18:23:59 -0400 Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <200304171439.17504.gherron@islandtraining.com> References: <200304171439.17504.gherron@islandtraining.com> Message-ID: <20030417222359.GB28630@epoch.metaslash.com> On Thu, Apr 17, 2003 at 02:39:16PM -0700, Gary Herron wrote: > I just upgraded my development system to RedHat 9, and now I get two > compilation errors on the Python CVS tree. I'll have time to examine > them tonight, but I thought I'd get a notice out now on the chance > that someone else has already resolved them. > > 1. Compilation of _tkinter comes up with > #error "unsupported Tcl configuration" > > The failing test was changed just yesterday, but the previous > version gives the same results: > In revision 1.155: > #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) > and in revision 1.154: > #if TCL_UTF_MAX != 3 > > And yet, if I remove the test, I get a (very minimally tested) > working version of Tkinter, so the test should probably be modified > to pass in whatever circumstances RH 9 presents. I believe Martin von Loewis already checked in a fix for this. http://python.org/sf/719880 > 2. Compilation of _ssl.c fails to find, through a chain of includes, > file krb5.h. Then things rapidly go to hell. > > Defining > #define OPENSSL_NO_KRB5 > gets through the compilation, but I don't yet know how to test it. > > (How do I get past the > "Use of the `network' resource not enabled" > result of running test_socket_ssl.py?) I just checked in a fix for Feature Request #719429 which fixes this problem. It finds the header file. To enable resources: ./python -E -tt ./Lib/test/regrtest.py -u network I have a couple of failures. I think they may have occurred before upgrading. Is anybody else seeing this?
test_array OverflowError: unsigned short integer is less than minimum test_logging - (I think this is the old test sensitivity) test_trace - AssertionError: events did not match expectation Neal From martin@v.loewis.de Thu Apr 17 23:35:15 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 18 Apr 2003 00:35:15 +0200 Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <200304171439.17504.gherron@islandtraining.com> References: <200304171439.17504.gherron@islandtraining.com> Message-ID: <3E9F2C23.70809@v.loewis.de> Gary Herron wrote: > I just upgraded my development system to RedHat 9, and now I get two > compilation errors on the Python CVS tree. I'll have time to examine > them tonight, but I thought I'd get a notice out now on the chance > that someone else has already resolved them. > > 1. Compilation of _tkinter comes up with > #error "unsupported Tcl configuration" > > The failing test was changed just yesterday, but the previous > version gives the same results: > In revision 1.155: > #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) > and in revision 1.154: > #if TCL_UTF_MAX != 3 > > And yet, if I remove the test, I get a (very minimally tested) > working version of Tkinter, so the test should probably be modified > to pass in whatever circumstances RH 9 presents. That change is indeed intended to fix the problem. You need to configure Python with --enable-unicode=ucs4 on Redhat 9; compiling in UCS-2 support is not supported if you want Tkinter to work with the Tk provided by Redhat. Before this change, --enable-unicode=ucs4 would not work, either. Outright removing the test is incorrect as well. > (How do I get past the > "Use of the `network' resource not enabled" > result of running test_socket_ssl.py?) Pass "-u network" to regrtest.
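For readers checking their own build: the narrow/wide (UCS-2 vs. UCS-4) distinction that --enable-unicode=ucs4 controls is visible from Python itself through sys.maxunicode. This is an editorial sketch, not part of the thread; note that since PEP 393 (Python 3.3) every build reports the wide value.

```python
import sys

# Narrow (UCS-2) builds reported 0xFFFF; wide (UCS-4) builds report 0x10FFFF.
# Since PEP 393 (Python 3.3) all builds are effectively wide.
wide_build = sys.maxunicode > 0xFFFF
print("wide build" if wide_build else "narrow build")
```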
Regards, Martin From guido@python.org Fri Apr 18 00:14:50 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 19:14:50 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of 17 Apr 2003 23:54:08 +0200." <ptnk94e7.fsf@python.net> References: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> <ptnk94e7.fsf@python.net> Message-ID: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> > Jack Jansen <Jack.Jansen@oratrix.com> writes: > > > On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van > > Rossum wrote: > > > > > > >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next > > >>> week, that would be Friday April 25. If anyone has something that > > >>> needs to be done before this release go out, please let me know! > > >> > > >> The getargs mods got checked in just this morning, even though I > > >> explicitly and rather strongly asked that if these mods be made they > > >> be checked in *long* before a release was due:-( > > > > > > Sorry, I forgot. Did you make a note of that on the SF patch? > > > > Yes, I'm pretty sure I did. Thomas also seems to refer to it... > > He did, and I also mentioned it yesterday. > OTOH, I had sitting a first version of the patch on SF for a rather long > time (shortly after the alpha2 release), asking for feedback, but > didn't get any. That was my fault -- I was too busy. :-( > > >> This means that all the Mac modules are now 100% dead. The same is > > >> probably true for PyObjC. And PyObjC has the added problem that it > > >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is > > >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > > >> Apple ships, which is 2.2 at the moment). I assume there are format > > >> codes that will convert 16 bit and 32 bit integer quantities without > > >> any checks on both 2.2 and 2.3, but I haven't investigated yet. 
> > > Maybe we should retract the changes to existing format codes that make > > > them more restrictive? That should revive any code that's currently > > > dead, right? > > > > That would be much better. If "l" (lower case ell) would continue to > > accept anything I wouldn't have to change anything. > > Guido has also suggested to keep another code without changes, I cannot > remember which one it was, but there is a comment on SF. That was 'h'. > I have the impression that the new test_getargs2.py test makes it easy > to change the behaviour and verify it to anything we want. > > In case it is too much trouble, why not back out all this again (although > someone else would have to do it, I'm basically offline until Tuesday), and > check in after the b1 release. I'll back out the change to 'h', which is the only incompatible change I can see (unless you consider accepting *more* than before an error). Thomas made no changes to 'l', so I'm not sure what that is about -- maybe the problem is with unsigned hex constants? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri Apr 18 01:06:50 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 Apr 2003 02:06:50 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030417205956.GC9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> Message-ID: <20030418000650.GD9493@xs4all.nl> On Thu, Apr 17, 2003 at 10:59:56PM +0200, Thomas Wouters wrote: > Unless I should make the shortcut depend on the actual value of > tp_getattro, as in shortcut only if it actually is > PyObject_GenericGetAttr ? Well, I went ahead and did that, and uploaded the new patch to SF.
The result is somewhat annoying, but explainable: The patch is now 3% _slower_ than an unmodified Python, whereas the patch without support for newstyle classes was a good 5% _faster_ than unmodified. This is both according to PyBench (which doesn't use newstyle classes) and according to 'time timeit.py pass' (which does use newstyle classes.) Timing just 'x.foo()' where 'x' is a newstyle class instance is about 20% faster, against 25-30% for oldstyle classes. The overall slowdown is caused by the fact that the patch only treats PyFunctions (functions written in Python) specially, and not PyMethodDescrs (PyCFunctions wrapped in PyMethodDefs wrapped in a descriptor.) This is because it would still need to instantiate a PyCFunctionObject (a PyObject wrapper for a PyCFunction, which is just a C function-pointer) OR it would need to do all interpretation of METH_* arguments and a bunch of argument-preparing itself. Another possible cause for the slowdown (but almost certainly not as substantial as the type-with-C-function one) is calling an almost-method on a newstyle class; a callable object that is an attribute of a type (or instance of the type) but is not a PyFunction or PyMethodDescr. The way the current mechanism works, it would have to traverse the MRO and (possibly) check the instance dict twice; first to determine that it's not a PyFunction in _PyObject_Generic_getmethod() and then again in the regular run through PyObject_GenericGetAttr(). Examples of this case would be staticmethods, classmethods, and other callable objects as attributes. I do not believe this is a substantial part, though. The slowdown can be fixed in two ways: handling PyMethodDescrs as well, in _PyObject_Generic_getmethod(), or removing the double lookups. Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... hrm... I'll look at it tomorrow, it's time for bed. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From guido@python.org Fri Apr 18 01:22:56 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 20:22:56 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Thu, 17 Apr 2003 22:59:56 +0200." <20030417205956.GC9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> Message-ID: <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> > (Looking at PyObject_GenericGetAttr with that in mind, I wonder if > there isn't a possible crash there. In the first MRO lookup, looking > for descr's, if a non-data-descr is found, it is kept around but not > INCREF'd until later, after the instance-dict is searched. Am I > wrong in believing the PyDict_GetItem of the instance dict can call > Python code ? It can, if there's a key whose type has a custom __eq__ or __cmp__. So indeed, if this custom __eq__ is evil enough to delete the corresponding key from the class dict, it could cause descr to point to freed memory. I won't try to construct a case, but it's not impossible. :-( Fixing this would make the code even hairier though... :-( > There isn't even as much as an assert(PyDict_Check(dict)) there.) All over the place it is assumed and ensured that a type's tp_dict and an instance's __dict__ are always real dicts. The only way this could be violated would be by C code defining a type that violates this.
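Guido's point that a plain dict lookup can execute arbitrary Python is easy to demonstrate from the Python side. A small sketch (the class name is illustrative, not from the thread): whenever hashes match but the key objects are not identical, the dict must fall back to __eq__.

```python
calls = []

class Key:
    """Illustrative key type whose __eq__ records every invocation."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        calls.append("__eq__ ran")
        return isinstance(other, Key) and self.value == other.value

d = {Key("spam"): 1}

# A distinct-but-equal key object forces the dict to call __eq__.
assert d[Key("spam")] == 1
assert calls  # Python-level code ran inside the C-level lookup
```

As Guido notes, an evil __eq__ could mutate the very dict being searched, which is why the C code cannot safely hold a borrowed reference across the lookup.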
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri Apr 18 01:34:31 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 Apr 2003 02:34:31 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030418000650.GD9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> Message-ID: <20030418003431.GE9493@xs4all.nl> On Fri, Apr 18, 2003 at 02:06:50AM +0200, Thomas Wouters wrote: > Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... > hrm... I'll look at it tomorrow, it's time for bed. I did a quick hack to the same effect, and it still came out a 1% loss (so about 6% against the no-newstyle patch) in PyBench and a few timeit tests. Sigh. I guess the non-method overhead is just too large, or there are more almost-methods than I figured. I'll start work on a more lookup-saving _PyObject_Generic_getmethod tomorrow or this weekend (and will probably do _Py_instance_getmethod that way too, while I'm at it.) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg@cosc.canterbury.ac.nz Fri Apr 18 03:15:11 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 14:15:11 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> > In earlier versions, functions were special-cased by the instance > getattr code; the special case has been subsumed by looking for a > __get__ method. Yes, this means that a plain Python function object > is a descriptor! While we're on the topic -- Guido, how would you feel about the idea of giving built-in function objects the same instance-binding behaviour as interpreted functions?
This would help Pyrex considerably, because currently I have to resort to a kludge to make Pyrex-defined functions work as methods. It mostly works, but it has some side effects, such as breaking the most common idiomatic usage of staticmethod() and classmethod(). If built-in functions were more like interpreted functions in this regard, all these problems would go away. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 18 03:21:06 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 14:21:06 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030417205956.GC9493@xs4all.nl> Message-ID: <200304180221.h3I2L6911441@oma.cosc.canterbury.ac.nz> Thomas Wouters <thomas@xs4all.net>: > The problem I have with newstyle classes is where to shortcut what. It sounds to me like descriptor objects will need to have a __callattr__ slot added. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Fri Apr 18 03:27:09 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 22:27:09 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Fri, 18 Apr 2003 14:15:11 +1200." 
<200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> References: <200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> Message-ID: <200304180227.h3I2R9R14494@pcp02138704pcs.reston01.va.comcast.net> > While we're on the topic -- Guido, how would you feel about the > idea of giving built-in function objects the same > instance-binding behaviour as interpreted functions? > > This would help Pyrex considerably, because currently I > have to resort to a kludge to make Pyrex-defined functions > work as methods. It mostly works, but it has some side > effects, such as breaking the most common idiomatic > usage of staticmethod() and classmethod(). > > If built-in functions were more like interpreted functions > in this regard, all these problems would go away. There are two ways to "bind" a built-in function to an object. One would be to do what happens for Python functions, which is in effect a currying: f.__get__(obj) yields a function g that when called as g(arg1, ...) calls f(obj, arg1, ...). (In fact, I've recently checked in a change that makes instancemethod a general currying function on the first argument. :-) But the other interpretation, which might be more appropriate for C functions, is that the bound instance is passed to the first argument at the *C* level, usually called self: PyObject * my_c_function(PyObject *self, PyObject *args) { ... } Which one would you like? I think we could do each rather easily (perhaps the first more easily because the type needed to represent the bound method already exists; for the second I think we'd have to introduce a new helper object type).
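The currying Guido describes is directly observable, because a plain function's __get__ is what implements method binding. A quick sketch (the names are illustrative):

```python
def greet(self, name):
    # An ordinary function; 'self' is just its first parameter.
    return (id(self), name)

class Thing:
    pass

obj = Thing()

# A plain Python function is a descriptor: __get__ curries the instance
# into the first argument, producing a bound method g.
g = greet.__get__(obj)
assert g("Guido") == greet(obj, "Guido")
assert g.__self__ is obj
```

This is also why merely placing a function in a class's __dict__ makes it a method: attribute lookup on an instance triggers the function's __get__.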
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri Apr 18 04:38:30 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 15:38:30 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304180227.h3I2R9R14494@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> > There are two ways to "bind" a built-in function to an object. > > One would be to do what happens for Python functions, which is in > effect a currying: f.__get__(obj) yields a function g that when called > as g(arg1, ...) calls f(obj, arg1, ...). That's the one I'm talking about. I forgot to explain that the problem occurs when I'm creating a *Python* class object and populating it with functions that are supposed to be methods. Currently I have to manually wrap each function in an unbound method object before putting it in the class's __dict__. If that happened automatically on access, I would be able to create Python classes that behave more like the real thing. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From Jack.Jansen@oratrix.com Fri Apr 18 09:19:24 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 10:19:24 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum wrote: >>>> Maybe we should retract the changes to existing format codes that >>>> make >>>> them more restrictive? That should revive any code that's currently >>>> dead, right? >>> >>> That would be much better.
if "l" (lower case ell) would continue to >>> accept anything I wouldn't have to change anything. >> >> Guido has also suggested to keep another code without changes, I >> cannot >> remember which one it was, but there is a comment on SF. > > That was 'h'. Right, 'h' turns out to be the problem. I changed a lot of 'l's to 'k's, but it seems this one is the real killer. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From Jack.Jansen@oratrix.com Fri Apr 18 09:48:35 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 10:48:35 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum wrote: > I'll back out the change to 'h', which is the only incompatible change > I can see (unless you consider accepting *more* than before an error). > Thomas made no changes to 'l', so I'm not sure what that is about -- > maybe the problem is with unsigned hex constants? Okay, great!! Is this a temporary measure, i.e. is the semantic change to 'h' going to come back after 2.3 is out? -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From g_a_l_l_a@mail333.com Fri Apr 18 10:19:17 2003 From: g_a_l_l_a@mail333.com (g_a_l_l_a@mail333.com) Date: 18 Apr 2003 13:19:17 +0400 Subject: [Python-Dev] 0400058546-ID: We are glad to inform you about the changes in our website Message-ID: <2003.04.18.03847F4F401D71F0@mail333.com> Dear Sir or Madam, We are glad to inform you about the changes in our website http://www.gallery-a.ru. Now you can get to know the price for the paintings in our gallery without filling the order form. 
From mal@lemburg.com Fri Apr 18 11:51:23 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 18 Apr 2003 12:51:23 +0200 Subject: [Python-Dev] Startup overhead due to codec usage In-Reply-To: <m3el475244.fsf@mira.informatik.hu-berlin.de> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> <m3el475244.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E9FD8AB.8040400@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > >>The codec machinery was carefully designed not to introduce >>extra overhead when not using Unicode in programs. The above >>approach pretty much kills this effort :-) > > This effort is dead already. For example, on Unix, the file system > default encoding is initialized from the user's preference; to verify > that the encoding really exists, a codec lookup is performed. Hmm, then we should fix this and the site.py lookup you introduced. I don't see the point in increasing startup time for all scripts just because a seldom used feature needs initialization. BTW, I wonder what happens if you run a Python version with Unicode disabled in the current scenario. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 18 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 67 days left From martin@v.loewis.de Fri Apr 18 12:33:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 18 Apr 2003 13:33:17 +0200 Subject: [Python-Dev] Startup overhead due to codec usage In-Reply-To: <3E9FD8AB.8040400@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> <m3el475244.fsf@mira.informatik.hu-berlin.de> <3E9FD8AB.8040400@lemburg.com> Message-ID: <m3n0io82gy.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > Hmm, then we should fix this and the site.py lookup you > introduced. I don't see the point in increasing startup > time for all scripts just because a seldom used feature needs > initialization. I really don't see the need to fix anything here. I wouldn't mind somebody else fixing something, as long as none of the features break. > BTW, I wonder what happens if you run a Python version with > Unicode disabled in the current scenario. The nl_langinfo code in Python/pythonrun.c is disabled when unicode is disabled. In turn, it won't be executed, and Py_FileSystemDefaultEncoding stays at NULL. This is no problem, as it is never used, anyway. For the code in site.py (I think), finding the codec will fail with an exception, which will be caught, and the "mbcs" alias will be added. 
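The fallback Martin describes for site.py — catch the failed codec lookup and install an alias — can be sketched with the public codecs.register() hook. This is an illustration of the pattern only, not the actual site.py code (which manipulated the encodings alias table); the codec name below is deliberately bogus:

```python
import codecs

def fallback_search(name):
    """Illustrative search function: map unknown 'x-...' names to ASCII."""
    if name.replace("_", "-").startswith("x-"):
        return codecs.lookup("ascii")
    return None  # decline; let other search functions try

try:
    codecs.lookup("x-bogus")          # no such codec: raises LookupError
except LookupError:
    codecs.register(fallback_search)  # install the fallback, then retry

info = codecs.lookup("x-bogus")
assert info.name == "ascii"
```

Failed lookups are not cached by the codec registry itself, which is what makes the catch-and-retry pattern work.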
Regards, Martin From dave@boost-consulting.com Fri Apr 18 14:06:30 2003 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 18 Apr 2003 09:06:30 -0400 Subject: [Python-Dev] Re: CALL_ATTR patch References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <3E9EE44A.6060904@lemburg.com> <200304171734.h3HHYVU03250@odiug.zope.com> Message-ID: <uvfxc550p.fsf@boost-consulting.com> Guido van Rossum <guido@python.org> writes: >> Could you put such short overviews somewhere on the Python Wiki ? > > I don't have the time for that. When I want to publish stuff like > this somewhere, I need to spend time to make it all correct, complete > etc. Besides which, it's already in the docs. Correct, complete, and all that ;-) http://www.python.org/dev/doc/devel/ref/descriptors.html -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Fri Apr 18 14:21:02 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:21:02 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Fri, 18 Apr 2003 15:38:30 +1200." <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> References: <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> Message-ID: <200304181321.h3IDL2922688@pcp02138704pcs.reston01.va.comcast.net> > > There are two ways to "bind" a built-in function to an object. > > > > One would be to do what happens for Python functions, which is in > > effect a currying: f.__get__(obj) yields a function g that when called > > as g(arg1, ...) calls f(obj, arg1, ...). > > That's the one I'm talking about. I forgot to explain that the problem > occurs when I'm creating a *Python* class object and populating it > with functions that are supposed to be methods. Currently I have to > manually wrap each function in an unbound method object before putting > it in the class's __dict__. 
If that happened automatically on access, > I would be able to create Python classes that behave more like the > real thing. OK, are you up for submitting a patch? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 18 14:25:23 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:25:23 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of Fri, 18 Apr 2003 10:19:24 +0200." <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> References: <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> Message-ID: <200304181325.h3IDPNl22760@pcp02138704pcs.reston01.va.comcast.net> > Right, 'h' turns out to be the problem. I changed a lot of 'l's to > 'k's, but it seems this one is the real killer. So now that I rolled back 'h', is there any reason not to keep the rest of these changes? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 18 14:41:30 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:41:30 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of Fri, 18 Apr 2003 10:48:35 +0200." <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> References: <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> Message-ID: <200304181341.h3IDfUx22802@pcp02138704pcs.reston01.va.comcast.net> > On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum > wrote: > > I'll back out the change to 'h', which is the only incompatible change > > I can see (unless you consider accepting *more* than before an error). > > Thomas made no changes to 'l', so I'm not sure what that is about -- > > maybe the problem is with unsigned hex constants? > > Okay, great!! > > Is this a temporary measure, i.e. is the semantic change to 'h' > going to come back after 2.3 is out? I don't see why -- it always was a signed short, let it stay that way. 
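For readers following the format-code details: 'h' in PyArg_ParseTuple() means C signed short, and the struct module happens to use the same letter, so it gives a convenient view of the range check at issue. The analogy is editorial, not from the thread; note that modern struct raises on overflow, which very old versions did not always do.

```python
import struct

SHRT_MIN, SHRT_MAX = -32768, 32767  # range of a 16-bit C signed short

# In-range values pack fine (two shorts here).
packed = struct.pack("hh", SHRT_MIN, SHRT_MAX)

# Out-of-range values trip the same kind of check a strict 'h' in
# PyArg_ParseTuple() performs.
overflowed = False
try:
    struct.pack("h", SHRT_MAX + 1)
except struct.error:
    overflowed = True
assert overflowed
```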
--Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Fri Apr 18 19:26:22 2003 From: mwh@python.net (Michael Hudson) Date: Fri, 18 Apr 2003 19:26:22 +0100 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Thu, 17 Apr 2003 20:22:56 -0400") References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2m3ckfr7ap.fsf@starship.python.net> Guido van Rossum <guido@python.org> writes: >> (Looking at PyObject_GenericGetAttr with that in mind, I wonder if >> there isn't a possible crash there. In the first MRO lookup, looking >> for descr's, if a non-data-descr is found, it is kept around but not >> INCREF'd until later, after the instance-dict is searched. Am I >> wrong in believing the PyDict_GetItem of the instance dict can call >> Python code ? > > It can, if there's a key whose type has a custom __eq__ or __cmp__. > So indeed, if this custom __eq__ is evil enough to delete the > corresponding key from the class dict, it could cause descr to point > to freed memory. I won't try to construct a case, but it's not > impossible. :-( Indeed, there are several examples of this sort of thing already in Lib/test/test_mutants.py. Cheers, M. -- If comp.lang.lisp *is* what vendors are relying on to make or break Lisp sales, that's more likely the problem than is the effect of any one of us on such a flimsy marketing strategy... -- Kent M Pitman, comp.lang.lisp From pje@telecommunity.com Fri Apr 18 20:02:14 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Fri, 18 Apr 2003 15:02:14 -0400 Subject: [Python-Dev] Built-in functions as methods Message-ID: <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> Hi guys. 
Greg, you were asking about making built-in functions act like methods. This could break code, if it applies to all built-in functions. In more than one Python version, I have stuck a built-in type or function into a class, under the assumption that it would behave as a 'staticmethod' now does. If all built-in functions start acting like unbound methods, existing code will break. I'm not positive, but I think there's even code like this in the standard library. I'm all for anything that makes Pyrex easier for Greg to maintain <wink>, but perhaps there is a flag that could be used to request the behavior so that existing code won't break? From andymac@bullseye.apana.org.au Fri Apr 18 14:24:30 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 19 Apr 2003 00:24:30 +1100 (edt) Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <20030417222359.GB28630@epoch.metaslash.com> Message-ID: <Pine.OS2.4.44.0304190015530.1480-100000@tenring.andymac.org> On Thu, 17 Apr 2003, Neal Norwitz wrote: > I have a couple of failures. I think they may have occurred > before upgrading. Is anybody else seeing this? > > test_array OverflowError: unsigned short integer is less than minimum On FreeBSD 4.4, I'm seeing this one... > test_logging - (I think this is the old test sensitivity) > test_trace - AssertionError: events did not match expectation ... but not these 2. I've also stumbled across a compiler optimisation issue with the changes Guido checked in to SRE on April 14 - on FreeBSD 4.4 at least. gcc -O3 produces an _sre.o that gives rise to a bus error; -O2 works (gcc is v2.95.3). -- Andrew I MacIntyre "These thoughts are mine alone..." 
E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Fri Apr 18 20:28:02 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 15:28:02 -0400 Subject: [Python-Dev] Built-in functions as methods In-Reply-To: "Your message of Fri, 18 Apr 2003 15:02:14 EDT." <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> References: <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> Message-ID: <200304181928.h3IJS2m01321@pcp02138704pcs.reston01.va.comcast.net> > Hi guys. Greg, you were asking about making built-in functions act like > methods. This could break code, if it applies to all built-in > functions. In more than one Python version, I have stuck a built-in type > or function into a class, under the assumption that it would behave as a > 'staticmethod' now does. If all built-in functions start acting like > unbound methods, existing code will break. > > I'm not positive, but I think there's even code like this in the standard > library. > > I'm all for anything that makes Pyrex easier for Greg to maintain <wink>, > but perhaps there is a flag that could be used to request the behavior so > that existing code won't break? Good point! I suppose Greg could use something very similar to the standard built-in object type but with a __get__ method, or he could define a flag you have to set in the ml_flags before __get__ returns a bound function. --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Fri Apr 18 21:33:23 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 22:33:23 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304181325.h3IDPNl22760@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <00662E75-71DD-11D7-B18D-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 15:25 Europe/Amsterdam, Guido van Rossum wrote: >> Right, 'h' turns out to be the problem. 
I changed a lot of 'l's to >> 'k's, but it seems this one is the real killer. > > So now that I rolled back 'h', is there any reason not to keep the > rest of these changes? No, everything is fine as it is now. I'm happy again! -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From drifty@alum.berkeley.edu Fri Apr 18 22:39:15 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 18 Apr 2003 14:39:15 -0700 (PDT) Subject: [Python-Dev] python-dev Summary for 2003-04-01 through 2003-04-15 Message-ID: <Pine.SOL.4.55.0304181437030.378@death.OCF.Berkeley.EDU> Sorry this is later than normal, but I got sucked into helping with elections at UC Berkeley by providing IT support (first time elections are all on computer). Anyway, you guys have until Monday night to reply with corrections for the summary. +++++++++++++++++++++++++++++++++++++++++++++++++++++ python-dev Summary for 2003-04-01 through 2003-04-15 +++++++++++++++++++++++++++++++++++++++++++++++++++++ This is a summary of traffic on the `python-dev mailing list`_ from April 1, 2003 through April 15, 2003. It is intended to inform the wider Python community of on-going developments on the list and to have an archived summary of each thread started on the list. To comment on anything mentioned here, just post to python-list@python.org or `comp.lang.python`_ with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join `python-dev`_! This is the fifteenth summary written by Brett Cannon (<voice of Comic Book Guy from "The Simpsons">Most summaries written by a single person *ever* </voice>). All summaries are archived at http://www.python.org/dev/summary/ .
Please note that this summary is written using reStructuredText_ which can be found at http://docutils.sf.net/rst.html . Any unfamiliar punctuation is probably markup for reST_ (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it, although I suggest learning reST; it's simple and is accepted for `PEP markup`__. Also, because of the wonders of programs that like to reformat text, I cannot guarantee you will be able to run the text version of this summary through Docutils_ as-is unless it is from the original text file. __ http://www.python.org/peps/pep-0012.html .. _python-dev: http://www.python.org/dev/ .. _python-dev mailing list: http://mail.python.org/mailman/listinfo/python-dev .. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python .. _Docutils: http://docutils.sf.net/ .. _reST: .. _reStructuredText: http://docutils.sf.net/rst.html .. contents:: .. _last summary: http://www.python.org/dev/summary/2003-03-16_2003-03-31.html ====================== Summary Announcements ====================== So all three people who expressed an opinion about the new Quickies_ format liked it, so it stays. Do you guys actually like the links to the CVS in the Summaries? Putting in the various links for every single file mentioned that does not have direct documentation is time-consuming. But if you find it useful it can stay. Please let me know whether you actually use it (this means telling me even if you don't!). ========= `Boom`__ ========= __ http://mail.python.org/pipermail/python-dev/2003-April/034370.html Splinter threads: - `RE: [Python-checkins] python/dist/src/Modules gcmodule.c <http://mail.python.org/pipermail/python-dev/2003-April/034371.html>`__ Related threads: - `Garbage collecting closures <http://mail.python.org/pipermail/python-dev/2003-April/034521.html>`__ - `Algorithm for finalizing cycles <http://mail.python.org/pipermail/python-dev/2003-April/034609.html>`__ Do you want to know what dedication is?
Thinking of Python code that will cause Python to crash during dental surgery. Well, Tim Peters is that dedicated and managed to come up with some code that crashed Python when it attempted to garbage-collect some objects. This begins the joy that is garbage collection and finalizer functions. Gather around, children, as we learn about how Python tries to keep you from having to worry about keeping track of your trash. When Python executes the garbage collector, it looks to see what objects are unreachable based on reference counts; when something has a reference count of 0 nothing is referencing it, so it is just floating out in the middle of nowhere with no one giving a hoot about whether it is there or not (children: "Awww! Poor, lonely object!"). But some of these lonely objects have what we call a finalizer (children: "What's that?!?"; isn't it cute when children are still inquisitive? Good for you for still being inquisitive! Have to be supportive, you know). A finalizer is either an instance that has a __del__ method or an object that has something in its tp_del slot (children: <nod>). Does anyone know why we have to take care of these lonely objects that are somewhat special? (children: <look around for someone to be brave enough to actually try to answer; no one does>) Well, since these objects have something that must be called before they are garbage-collected, we have to be careful: a finalizer might reference some other lonely object and want to keep that object's reference count above 0, and thus keep it from being collected (children: "Oh..."). What makes them especially difficult to handle is that some objects that don't seem like finalizers end up acting like they do by defining a __getattr__ method that does very rude things (children: "That's not nice!").
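[Summary aside: the trap described above can be sketched in a few lines. This is a hypothetical illustration, not code from the thread; note that the 2003-era collector parked cycles of finalizer-bearing objects in ``gc.garbage``, whereas modern CPython (since the 3.4 finalization changes) simply collects them.]

```python
import gc

class Noisy(object):
    def __del__(self):
        # The finalizer: its mere presence made a cycle of such
        # objects "uncollectable" to the 2003-era garbage collector.
        pass

def make_cycle():
    a, b = Noisy(), Noisy()
    a.partner = b  # a and b now reference each other, so their
    b.partner = a  # reference counts can never drop to zero

gc.disable()            # keep automatic collection from racing the demo
make_cycle()
found = gc.collect()    # the cycle detector finds the orphaned pair
print(found >= 2)       # True under modern CPython: the cycle is freed
print(len(gc.garbage))  # 0: nothing is left stranded
gc.enable()
```

Running the same sketch under a 2003-vintage interpreter would instead leave the two ``Noisy`` objects stranded in ``gc.garbage``, which is exactly the situation the thread is wrestling with.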
And since it is really hard to tell whether they are real finalizers or just act like one, we just let them be out there forever so that they don't make the great Interpreter have to deal with them (some rabble-rouser: "My dad says that there is no great Interpreter and that everything came from the Compiler and Linker when they got together and did something my dad won't tell me about. Something about the bits and the bees..."; rest of children: "Nu-uh! The Interpreter is real! We've seen it! It's all-powerful and knowing! Take it back, take it back!"). Luckily, though, it is only an issue with something old-timers call classic classes; things that started to decay away long, long ago (children: "Yay! No more cruft!"). And thanks to the diligent work of some very important people, it has been dealt with (children: "Yay! Thank you important people!"). There is a lesson to be learned here, children; do not put old things to pasture too early since you can make stubborn old people mad, which is bad since they make up the majority of voters in America (children: "Yes, teacher!"; smart-ass in the back of the room: "How old are you, teacher?" <snicker>). Guido said that Python's cleanup model could be summed up as "things get destroyed at some arbitrary time after nothing refers to them." And the corollary is "always explicitly close your external resources." Tim Peters gave several suggestions in regards to how to make sure things get cleaned up; from registering cleanup code with atexit.register() to keeping a weakref in a module with a finalizer to be executed when the module is collected. ========= Quickies ========= `Distutils documentation amputated in 2.2 docs?`__ Splinter threads: - `How do I report a bug? <http://mail.python.org/pipermail/python-dev/2003-April/034328.html>`__ Greg Ewing noticed that two sections from the Distutils docs disappeared between Python 1.6 and 2.2.
Sections are still missing and will stay so until someone comes up with a patch to add in the missing sections. There was also a discussion on making it more obvious how to report a bug on SF_. __ http://mail.python.org/pipermail/python-dev/2003-April/034314.html .. _SF: .. _SourceForge: http://www.sf.net/ `PEP 269 once more.`__ Jonathan Riehl posted his patch implementing `PEP 269`_ ("Pgen Module for Python") and then uploaded a newer, improved version. __ http://mail.python.org/pipermail/python-dev/2003-April/034317.html .. _PEP 269: http://www.python.org/peps/pep-0269.html `Minor issue with PyErr_NormalizeException`__ It was discovered that PyErr_NormalizeException could dump core because it forgot to return on possible errors. It's been fixed and back-ported. __ http://mail.python.org/pipermail/python-dev/2003-April/034325.html `Capabilities (we already got one)`__ Splinter threads: - `Capabilities <http://mail.python.org/pipermail/python-dev/2003-April/034315.html>`__ - `Security challenge <http://mail.python.org/pipermail/python-dev/2003-April/034343.html>`__ The thread that refuses to die continued into this month. Nothing ground-breaking was said, though. Ben Laurie did say, however, that he is working on a PEP_, so hopefully that will make this whole discussion clear. __ http://mail.python.org/pipermail/python-dev/2003-April/034323.html .. _PEP: http://www.python.org/peps/ `[PEP] += on return of function call result`__ Someone wanted to do ``log.setdefault(r, '') += "test %d\n" % t``, which does not work. But it was pointed out you can just do ``log[r] = log.setdefault(r, '') + "test %d\n" % t`` (assigning to a temporary name and using ``+=`` on it would not update the dictionary, since strings are immutable). __ http://mail.python.org/pipermail/python-dev/2003-April/034339.html `How to suppress instance __dict__?`__ This is only of interest to people who use `Boost.Python`_ (which I don't use so I am not going to summarize it; although if you use C++ you will want to look at Boost.Python). __ http://mail.python.org/pipermail/python-dev/2003-April/034319.html ..
_Boost.Python: http://www.boost.org/libs/python/doc/ `Super and properties`__ Someone got bit by properties not working nicely with super(). Nathan Srebro subsequently posted a `link <http://www.ai.mit.edu/~nati/Python/>`__ to his own version of super() which handles this problem. __ `fwd: Dan Sugalski on continuations and closures`__ Kevin Altis forwarded some posts by Dan Sugalski (the guy heading the Parrot_ project and who Guido will throw a pie at at OSCON 2004 =) about closures and continuations that he found at http://simon.incutio.com/archive/2003/04/03/#closuresAndContinuations . Very well-written and might clarify things for people if they care to know more about closures, continuations, and why Lisp folks claim they are so damn important. __ http://mail.python.org/pipermail/python-dev/2003-April/034368.html .. _Parrot: http://www.parrotcode.org/ `LONG_LONG`__ Python 2.3 renames the C API's LONG_LONG definition to PY_LONG_LONG, as it should have been named all along. Yes, this will break things, but it was incorrect to have not renamed it. If you need to keep compatibility with code before Python 2.3, just use the following code (contributed by Mark Hammond)::

    #if defined(PY_LONG_LONG) && !defined(LONG_LONG)
    #define LONG_LONG PY_LONG_LONG
    #endif

__ http://mail.python.org/pipermail/python-dev/2003-April/034396.html `socket question`__ Someone asked why something didn't build under Solaris and was subsequently redirected to python-list@python.org . __ http://mail.python.org/pipermail/python-dev/2003-April/034399.html `PEP305 csv package: from csv import csv?`__ Why does one have to do ``from csv import csv``? Wouldn't it be more reasonable to just do some magic in __init__.py for the csv_ package to do this properly? Well, Skip Montanaro forwarded the question to the csv development list at csv@mail.mojam.com and said he probably will make the change in the near future. __ http://mail.python.org/pipermail/python-dev/2003-April/034409.html ..
_csv: http://www.python.org/dev/doc/devel/lib/module-csv.html `SF file uploads work now`__ Yes, hell must have frozen over since you can now upload a file when you start a new patch or bug report on SourceForge_. __ http://mail.python.org/pipermail/python-dev/2003-April/034416.html `Unicode`__ Splinter threads: - `OT: Signal/noise ratio <http://mail.python.org/pipermail/python-dev/2003-April/034462.html>`__ Once again another question on python-dev that is not appropriate for the list. But this one spawned questions of whether the mailing list should be renamed (answer: no, since it is fairly well-known what python-dev is for) or go back to having the list closed and requiring moderator approval for posts from people off the list (answer: no, because the amount of work was just too much of a pain and the amount of off-topic email has not been enough to justify the filtering work done previously). __ http://mail.python.org/pipermail/python-dev/2003-April/034453.html `Placement of os.fdopen functionality`__ It was suggested to make the fdopen function of the os_ module a class method of 'file'. That was determined to be YAGNI and thus won't happen. __ http://mail.python.org/pipermail/python-dev/2003-April/034380.html .. _os: http://www.python.org/dev/doc/devel/lib/os-newstreams.html `Adding item in front of a list`__ Tim Peters wonders how many people would be made upset if list.insert() supported a negative index argument. __ http://mail.python.org/pipermail/python-dev/2003-April/034518.html `Why is spawn*p* not available on Windows?`__ Shane Halloway might add one of the os.spawn*p*() functions to Windows. __ http://mail.python.org/pipermail/python-dev/2003-April/034473.html `tzset`__ time.tzset() is no longer available on Windows because it is broken (and I will behave and not make a joke about how it would be just as broken as the OS or anything because I am unbiased).
__ http://mail.python.org/pipermail/python-dev/2003-April/034480.html `backporting string changes to 2.2.3`__ Neal Norwitz updated docs and back-ported changes to the string_ module to bring it in sync with the actual string object. __ http://mail.python.org/pipermail/python-dev/2003-April/034489.html .. _string: http://www.python.org/dev/doc/devel/lib/module-string.html `List wisdom`__ http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom is a wiki page created to contain the random nuggets of wisdom that come up on python-dev. __ http://mail.python.org/pipermail/python-dev/2003-April/034575.html `ValueErrors in range()`__ Fixed the error where range() raised ValueError when it should raise TypeError. __ http://mail.python.org/pipermail/python-dev/2003-April/034617.html `_socket efficiencies ideas`__ Marcus Mendenhall wanted to get a patch applied that would allow you to create a socket that could skip a DNS lookup. He also wanted to add the ability to include a '<numeric>' prefix to IP addresses to make sure that DNS lookup was skipped. Various ways of trying to cut back on time wasted on unneeded DNS lookups were discussed, but no solution was found acceptable. __ http://mail.python.org/pipermail/python-dev/2003-April/034403.html `tp_clear return value`__ tp_clear could stand to return void, but can't be changed because of backwards-compatibility. The docs will most likely end up saying to ignore the return value of whatever is put into tp_clear. __ http://mail.python.org/pipermail/python-dev/2003-April/034433.html `More socket questions`__ Someone suggested fixing something that had already been solved in Python 2.3. __ http://mail.python.org/pipermail/python-dev/2003-April/034472.html `Embedded python on Win2K, import failures`__ Someone had errors embedding Python on Windows. No real conclusion came out of it.
__ http://mail.python.org/pipermail/python-dev/2003-April/034506.html `More int/long integration issues`__ Splintered threads: - `range() as iterator <http://mail.python.org/pipermail/python-dev/2003-April/034530.html>`__ Before Python 3.0 (when xrange() will disappear), there is a good chance that the idiom ``for x in range(n): ...`` will be caught by the compiler and compiled into a lazy iterator (probably a generator). __ http://mail.python.org/pipermail/python-dev/2003-April/034516.html `Changes to gettext.py for Python 2.3`__ Barry Warsaw suggested some changes to gettext_ but none of them seemed to catch on. __ http://mail.python.org/pipermail/python-dev/2003-April/034511.html .. _gettext: `Evil setattr hack`__ Someone discovered how to set attributes on built-in types. Guido checked in code to prevent it. __ http://mail.python.org/pipermail/python-dev/2003-April/034535.html `Using temp files and the Internet in regression tests`__ I asked if it was okay to use temporary files in regression tests (answer: yes, and if you only need one, use test.test_support.TESTFN) or sockets (answer: yes, as long as test.test_support.is_resource_enabled("network") is True). It led to me being unofficially assigned the task of coming up with documentation for test_support and regrtest for both the library documentation and Lib/test/README. I also got CVS commit privileges on Python itself! I became an official Python developer! Woohoo! __ http://mail.python.org/pipermail/python-dev/2003-April/034538.html `migration away from SourceForge?`__ It was suggested we revisit the idea of moving Python development off of SourceForge_ because of the usual crappy CVS performance and underwhelming tracker performance. There is also the issue that problems with the setup cannot be fixed on our schedule. GForge_ and Roundup_ were both suggested as alternatives. Roundup specifically has gotten a decent amount of support since it is in Python and thus we could get things fixed quickly.
Trouble is that it is not polished enough yet and we would need to furnish our own CVS (but Ben Laurie might be gracious enough to help us out on that front with possible hosting at http://www.thebunker.net/ ). __ http://mail.python.org/pipermail/python-dev/2003-April/034540.html .. _GForge: http://gforge.org/ .. _Roundup: http://roundup.sf.net/ `How should time.strptime() handle UTC?`__ I asked if anyone thought time.strptime() should recognize UTC and GMT timezones by default for setting whether or not daylight savings was being used. No one has given their opinion yet (do you have one?). __ http://mail.python.org/pipermail/python-dev/2003-April/034543.html `Big trouble in CVS Python`__ CVS Python was crashing on the regression tests. It turned out to be from the reuse of a variable in the code that implements range(). Tim Peters offered a "Word to the wise: don't ever try to reuse a variable whose address is passed to PyArg_ParseTuple for anything other than holding what PyArg_ParseTuple does or doesn't store into it". __ http://mail.python.org/pipermail/python-dev/2003-April/034544.html `GIL vs thread state`__ It was discovered that the docs for PyThreadState_Clear() are incorrect (or at least not very clear). __ http://mail.python.org/pipermail/python-dev/2003-April/034574.html `test_pwd failing`__ test_pwd was failing and now it isn't. __ http://mail.python.org/pipermail/python-dev/2003-April/034626.html `lists v. tuples`__ You can tell whether a comparison function does a 3-way or a 2-way (using <) comparison, but that kind of sniffing is not Pythonic and thus won't be done; so don't expect to be able to pass either a 2-way or a 3-way comparison function to list.sort() and have the method figure out which type it got. __ http://mail.python.org/pipermail/python-dev/2003-April/034646.html `LynxOS 4 port`__ Duane Voth wants to get Python ported to LynxOS on PPC. __ http://mail.python.org/pipermail/python-dev/2003-April/034647.html `sre.c and sre_match()`__ The C code for the re module is not simple.
=) __ http://mail.python.org/pipermail/python-dev/2003-April/034653.html From thomas@xs4all.net Fri Apr 18 23:50:55 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 19 Apr 2003 00:50:55 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030418003431.GE9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> <20030418003431.GE9493@xs4all.nl> Message-ID: <20030418225055.GF9493@xs4all.nl> On Fri, Apr 18, 2003 at 02:34:31AM +0200, Thomas Wouters wrote: > On Fri, Apr 18, 2003 at 02:06:50AM +0200, Thomas Wouters wrote: > > Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... > > hrm... I'll look at it tomorrow, it's time for bed. > I did a quick hack to the same effect, and it still came out a 1% loss (so > about 6% against the no-newstyle patch) in PyBench and a few timeit tests. > Sigh. I guess the non-method overhead is just too large, or there are more > almost-methods than I figured. I'll start work on a more lookup-saving > _PyObject_Generic_getmethod tomorrow or this weekend (and will probably do > _Py_instance_getmethod that way too, while I'm at it.) Okay, for those who care about this but aren't on Patches, I just uploaded a new CALL_ATTR patch, version 4. It's actually two separate versions (3 and 4): maintainable, and fast. See the SF patch comment for more details :) However, I spent most of tonight trying to clock the patch, only to come to the conclusion that benchmarks suck. Which I already knew :) PyBench did a reasonable job pointing me towards slowness, but the main slowdowns I see with PyBench I cannot reproduce with timeit.py. I think I stopped trusting PyBench when it reported the patch was 2% slower, but did so 5% faster -- consistently. So, if anyone has any *real* programs they can test the patch with, I would be much obliged. 
Otherwise we'll have to check it in claiming it gives a 20% performance boost on ... methods of newstyle classes ,,, and 30% on ... methods of old-style classes. :) The patch is here: http://www.python.org/sf/709744 Where-...-reads-"empty-no-argument"-and-,,,-reads-"that-use- -PyObject_GenericGetAttr"-in-very-very-small-letters'ly y'rs, -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mhammond@skippinet.com.au Sat Apr 19 03:34:37 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 19 Apr 2003 12:34:37 +1000 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBMEHAB.tim_one@email.msn.com> Message-ID: <001101c3061c$395b6dd0$530f8490@eden> [Tim] > Some questions occurred while reading the PEP again, > primarily are there any To save space, the answer to all your questions is "it is ok", and "it must avoid using the same handle - it must use its own". I have updated the PEP and the patch (primarily in the comments for the new functions) to hopefully clarify this. > > The only issue is the name of the API. > > If that's the only issue, check it in yesterday <0.9 wink>. OK, I'm gunna hold that against you <wink> Guido: > How about PyGILState_Ensure() and PyGILState_Restore()? Done! I have checked in a new pep-311, and a new patch (http://www.python.org/sf/684256). So if Guido can formally pronounce on pep-311, I will use those words against Tim and check it in! Thanks, Mark.
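[Editorial aside on the CALL_ATTR benchmarking above: the kind of timeit.py spot-check Thomas mentions can be sketched as follows. This is a hypothetical no-argument-method micro-benchmark, not the actual measurements from the thread; the class and method names are made up for illustration.]

```python
import timeit

# Hypothetical micro-benchmark in the spirit of the CALL_ATTR thread:
# time an empty, no-argument method call on a new-style class.
setup = """
class New(object):
    def meth(self):
        pass
obj = New()
"""

timer = timeit.Timer("obj.meth()", setup=setup)
# Best of three runs of 100,000 calls each, in seconds:
best = min(timer.repeat(repeat=3, number=100000))
print(best)
```

Taking the minimum over several repeats, rather than an average, is the usual way to damp the scheduler noise that made Thomas distrust single benchmark numbers.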
From python@rcn.com Sat Apr 19 05:14:07 2003 From: python@rcn.com (Raymond Hettinger) Date: Sat, 19 Apr 2003 00:14:07 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> <20030418003431.GE9493@xs4all.nl> <20030418225055.GF9493@xs4all.nl> Message-ID: <001501c3062a$1f07bde0$060ea044@oemcomputer> > So, if anyone has any *real* programs they can test the patch > with, I would be much obliged. Otherwise we'll have to check it in claiming > it gives a 20% performance boost on ... methods of newstyle classes ,,, and > 30% on ... methods of old-style classes. :) I tried it on some of my apps which moderately exercise both new and old style classes. None of the apps improved and one ran 1% worse. Both pybench and pystone were worse by 1%. Also, line 767 in classobject.c has an unreferenced variable, f. Raymond Hettinger From guido@python.org Sat Apr 19 14:19:48 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 09:19:48 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: "Your message of Sat, 19 Apr 2003 12:34:37 +1000." <001101c3061c$395b6dd0$530f8490@eden> References: <001101c3061c$395b6dd0$530f8490@eden> Message-ID: <200304191319.h3JDJmF05210@pcp02138704pcs.reston01.va.comcast.net> > Guido: > > How about PyGILState_Ensure() and PyGILState_Restore()? > [Mark] > Done! I have checked in a new pep-311, and a new patch > (http://www.python.org/sf/684256). So if Guido can formally pronounce on > pep-311, I will use those words against Tim and check it in! OK, check it in, Mark!
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward@python.net Sat Apr 19 17:07:54 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 12:07:54 -0400 Subject: [Python-Dev] test_pwd failing In-Reply-To: <3E9C2828.4040803@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> Message-ID: <20030419160754.GA847@cthulhu.gerg.ca> On 15 April 2003, Walter Dörwald said: > Should the same change be done for the pwd module, i.e. > are duplicate gid's allowed in /etc/group? Yes. I got a test failure from test_grp the other night, but I didn't report it because I hadn't investigated it thoroughly yet. I'm guessing it's the same as the test_pwd failure... and yes, it stems from a duplicate GID in the /etc/group file on that system. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark to read. From gward@python.net Sat Apr 19 17:15:18 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 12:15:18 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <20030416145602.GA27447@localhost.distro.conectiva> References: <20030416145602.GA27447@localhost.distro.conectiva> Message-ID: <20030419161518.GB847@cthulhu.gerg.ca> On 16 April 2003, Gustavo Niemeyer said: > Is there any chance of getting shellwords[1] into Python 2.3? It's very > small module with a pretty interesting functionality: It's already there (and has been since Python 1.6), albeit with a different name and implementation:

    >>> import distutils.util
    >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'd like some JUNK FOOD ...
and then I want to be ALONE -- From barry@python.org Sat Apr 19 17:26:15 2003 From: barry@python.org (Barry Warsaw) Date: 19 Apr 2003 12:26:15 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <20030419161518.GB847@cthulhu.gerg.ca> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> Message-ID: <1050769575.29001.28.camel@anthem> On Sat, 2003-04-19 at 12:15, Greg Ward wrote: > On 16 April 2003, Gustavo Niemeyer said: > > Is there any chance of getting shellwords[1] into Python 2.3? It's very > > small module with a pretty interesting functionality: > > It's already there (and has been since Python 1.6), albeit with a > different name and implementation:

    > >>> import distutils.util
    > >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    > ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it makes sense to start promoting some of the more generally useful stuff up into library modules of their own? -Barry From mwh@python.net Sat Apr 19 18:04:20 2003 From: mwh@python.net (Michael Hudson) Date: Sat, 19 Apr 2003 18:04:20 +0100 Subject: [Python-Dev] shellwords In-Reply-To: <1050769575.29001.28.camel@anthem> (Barry Warsaw's message of "19 Apr 2003 12:26:15 -0400") References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> <1050769575.29001.28.camel@anthem> Message-ID: <2mlly6pgff.fsf@starship.python.net> Barry Warsaw <barry@python.org> writes: > On Sat, 2003-04-19 at 12:15, Greg Ward wrote: >> On 16 April 2003, Gustavo Niemeyer said: >> > Is there any chance of getting shellwords[1] into Python 2.3?
It's very >> > small module with a pretty interesting functionality: >> >> It's already there (and has been since Python 1.6), albeit with a >> different name and implementation:

    >> >>> import distutils.util
    >> >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    >> ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

> > Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it > makes sense to start promoting some of the more generally useful stuff > up into library modules of their own? Yes. Particularly the file-manipulation stuff... shutil tends to lose somewhat x-platform. I probably first said this two or more years ago... still haven't done anything about it :-/ Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From aahz@pythoncraft.com Sat Apr 19 18:07:00 2003 From: aahz@pythoncraft.com (Aahz) Date: Sat, 19 Apr 2003 13:07:00 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030419170700.GA21744@panix.com> On Sat, Apr 12, 2003, Guido van Rossum wrote:
>
> Using the dictionary doesn't work either:
>
> >>> str.__dict__['reverse'] = reverse
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: object does not support item assignment
> >>>
>
> But here's a trick that *does* work:
>
> >>> object.__setattr__(str, 'reverse', reverse)
> >>>
>
> Proof that it worked:
>
> >>> "hello".reverse()
> 'olleh'
> >>>

This post inspired me to check the way new-style class instances work with properties. Running the following code will demonstrate that although the __setattr__ hack is blocked, you can still access the instance's dict. This can obviously be fixed by using __slots__, but that seems unwieldy.
Should we do anything?

class C(object):
    def _getx(self):
        print "getting x:", self._x
        return self._x
    def _setx(self, value):
        print "setting x with:", value
        self._x = value
    x = property(_getx, _setx)

a = C()
a.x = 1
a.x
object.__setattr__(a, 'x', 'foo')
a.__dict__['x'] = 'spam'
print a.__dict__['x']

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From guido@python.org Sat Apr 19 18:22:57 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 13:22:57 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sat, 19 Apr 2003 13:07:00 EDT." <20030419170700.GA21744@panix.com> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> Message-ID: <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> > This post inspired me to check the way new-style class instances work > with properties. Running the following code will demonstrate that > although the __setattr__ hack is blocked, you can still access the > instance's dict. This can obviously be fixed by using __slots__, but > that seems unwieldy. Should we do anything?
>
> class C(object):
>     def _getx(self):
>         print "getting x:", self._x
>         return self._x
>     def _setx(self, value):
>         print "setting x with:", value
>         self._x = value
>     x = property(_getx, _setx)
>
> a = C()
> a.x = 1
> a.x
> object.__setattr__(a, 'x', 'foo')
> a.__dict__['x'] = 'spam'
> print a.__dict__['x']

I see nothing wrong with that. It falls in the category "don't do that", but I don't see why we should try to make it impossible. The thing with attributes of built-in types was different. This can affect multiple interpreters, which is evil. It also is too attractive to expect people not to use it if it works (since many people *think* they have a need to modify built-in types). That's why I go to extra lengths to make it impossible, not just hard.
--Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 19 20:26:18 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 19 Apr 2003 12:26:18 -0700 (PDT) Subject: [Python-Dev] test_pwd failing In-Reply-To: <20030419160754.GA847@cthulhu.gerg.ca> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> <20030419160754.GA847@cthulhu.gerg.ca> Message-ID: <Pine.SOL.4.55.0304191225130.19123@death.OCF.Berkeley.EDU> [Greg Ward] > On 15 April 2003, Walter D=F6rwald said: > > Should the same change be done for the pwd module, i.e. > > are duplicate gid's allowed in /etc/group? > > Yes. I got a test failure from test_grp the other night, but I didn't > report it because I hadn't investigated it thoroughly yet. I'm guessing > it's the same as the test_pwd failure... and yes, it stems from a > duplicate GID in the /etc/group file on that system. > I got it, too. Also got a test_getargs2 failure. Haven't looked into it thoroughly yet, though, especially since I don't know the status of the new arg codes. -Brett From theller@python.net Sat Apr 19 21:10:14 2003 From: theller@python.net (Thomas Heller) Date: 19 Apr 2003 22:10:14 +0200 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <4r4uuu3d.fsf@python.net> Guido van Rossum <guido@python.org> writes: > The thing with attributes of built-in types was different. This can > affect multiple interpreters, which is evil. You seem to care about multiple interpreters in the same process. Any chance to move the frozen modules pointer PyImport_FrozenModules to a interpreter private variable (part of the PyInterpreterState)? 
Thomas From gward@python.net Sat Apr 19 22:31:23 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 17:31:23 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <1050769575.29001.28.camel@anthem> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> <1050769575.29001.28.camel@anthem> Message-ID: <20030419213123.GA681@cthulhu.gerg.ca> On 19 April 2003, Barry Warsaw said: > Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it > makes sense to start promoting some of the more generally useful stuff > up into library modules of their own? Probably. All the generally-useful stuff is documented in clear, concise docstrings, so any enterprising hacker could take this on. I still don't have enough round tuits to look at the Distutils again. (Let's see ... my distutils-sig folder has 937 unread messages right now... sigh...) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Question authority! From aleax@aleax.it Sat Apr 19 22:43:48 2003 From: aleax@aleax.it (Alex Martelli) Date: Sat, 19 Apr 2003 23:43:48 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") Message-ID: <200304192343.48211.aleax@aleax.it> Sorry to distract python-dev's august collective attention from its usual exalted concerns down to a mundane issue;-), but... we may be able to strike a tiny blow for simplicity, clarity, power, AND performance at once... For the Nth time, today somebody asked in c.l.py about how best to sum a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), others reduce(int.__add__,L), others reduce(operator.add,L), etc, and some (me included) a simple total = 0 for x in L: total = total + x The usual performance measurements were unchained (easier than ever thanks to timeit.py of course;-), and the paladins of reduce were once again dismayed by the fact that the best reduce can do (that best is obtained with operator.add) is mediocre (e.g.
on my box with L=range(999), reduce takes 330 usec, and the simple for loop takes 247). Discussion proceeded on whether "reduce(operator.add, L)" was abstruse for most people, or not, and on whether the loop was or wasn't "too low level", as the Pythonic approach to such a common task. It then struck me that Python doesn't HAVE "one single obvious way" to do what IS after all a rather common task in everyday programming, namely, "sum up this bunch of things" (typically numbers, occasionally strings -- and when they're strings the "obvious" loop above is terribly slow, a typical newbie trap...). Somebody proposed having operator.add take any number of arguments -- not quite satisfactory, AND dog-slow, it turned out to be (when I tried a quick experimental mod to operator.c), due to the need to turn a sequence (typically a list) into a tuple with *. Now, I think the obvious approach would be to have a function sum, callable with any non-empty homogeneous sequence (sequence of items such that + can apply between them), returning the sequence's summation -- now THAT might help for simplicity, clarity AND power. So, I tried coding that up -- just 40 lines of C... it runs twice as fast as the plain loop, for summing up range(999), and just as fast as ''.join for summing up map(str, range(999)) [for the simple reason that I special-case this -- when the first item is a PyBaseString_Type, I delegate to ''.join]. Discussing this with newbie-to-moderately experienced Pythonistas, the uniform reaction was on the order of "you mean Python doesn't HAVE a sum function already?!" -- most everybody seemed to feel that such a function WOULD be "the obvious way to do it" and that it should definitely be there. So -- by this time I'm biased, having invested a bit of time in this -- what do y'all think... any interest in this? Should I submit it? 
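A rough pure-Python rendering of the proposed behaviour (Alex's actual implementation is about 40 lines of C; the name sum_sketch and the details below are reconstructed guesses written in later-Python syntax, not his code):

```python
# Hypothetical pure-Python sketch of the proposed sum(): sums any
# non-empty homogeneous sequence, and delegates to ''.join when the
# first item is a string, mirroring the special case described above.
def sum_sketch(seq):
    it = iter(seq)
    first = next(it)              # the proposal assumes a non-empty sequence
    if isinstance(first, str):
        return first + ''.join(it)
    total = first
    for item in it:
        total = total + item
    return total
```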
I'm not quite sure where it should go -- a builtin seems most natural (to keep company with min and max, for example), but maybe that would be too ambitious, and it should be in math or operator instead... Alex From agthorr@barsoom.org Sat Apr 19 23:41:11 2003 From: agthorr@barsoom.org (Agthorr) Date: Sat, 19 Apr 2003 15:41:11 -0700 Subject: [Python-Dev] heapq Message-ID: <20030419224110.GB2460@barsoom.org> Hello, I'm new to this list, so I will begin by introducing myself. I'm a graduate student at the University of Oregon working towards my PhD. My primary area of research is in peer-to-peer networks. I use Python for a variety of purposes such as constructing web pages, rapid prototyping, and building test frameworks. I have been a Python user for at least two years. I must confess that I have not lurked on the list much before making this post. I did search back in the list though, so in theory I won't be bringing up a rehashed topic... Recently, I had need of a heap in Python. I didn't see one in the 2.2 distribution, so I went and implemented one. Afterwards, I wondered if this might be useful to others, so I decided to investigate if any work had been done to add a heap to Python's standard library. Lo and behold, in CVS there is a module called "heapq". I compared my implementation with heapq, and I see some important differences. I'm not going to unilaterally state that mine is better, but I thought it would be worthwhile to raise the differences in this forum, so that an informed decision is made about The Best Way To Do Things. Hopefully, it will not be too terribly controversial :) The algorithms used are more or less identical; I'm primarily concerned with the differences in interface. As written, heapq provides some functions to maintain the heap priority on a Python list. By contrast, I implemented the heap as an opaque class that maintains a list internally.
By creating this layer of abstraction, it is possible to completely change the heap implementation later, without worrying about affecting user programs. For example, it would be possible to switch to using Fibonacci Heaps or the Pairing Heaps mentioned by Tim Peters in this message: http://mail.python.org/pipermail/python-dev/2002-August/027531.html Another key difference is that my implementation supports the decrease_key() operation that is important for algorithms such as Dijkstra's. This requires a little extra bookkeeping, but it's just a small constant factor ;) For the API, my insert() function returns an opaque key that can later be used as a parameter to the adjust_key() function. For those who like looking at source code, my implementation is here: http://www.cs.uoregon.edu/~agthorr/heap.py -- Dan Stutzbach From niemeyer@conectiva.com Sat Apr 19 23:51:08 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sat, 19 Apr 2003 19:51:08 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <20030419161518.GB847@cthulhu.gerg.ca> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> Message-ID: <20030419225108.GA2469@localhost.distro.conectiva> > It's already there (and has been since Python 1.6), albeit with a > different name and implementation: > > >>> import distutils.util > >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"') > ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg'] I wasn't aware of it. While it should be enough for most uses, it's still not POSIX-compliant. Single and double quotes are treated the same way (single quotes shouldn't allow escaping), and escaping is done differently (r'"\""' results in r'\"' instead of '"', and r'"\\"' results in r'\\' instead of r'\', for example). As others have said, it'd be nice to have these utilities somewhere outside distutils.
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From Jack.Jansen@oratrix.com Sun Apr 20 00:03:40 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 20 Apr 2003 01:03:40 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> Message-ID: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> On zaterdag, apr 19, 2003, at 23:43 Europe/Amsterdam, Alex Martelli wrote: > For the Nth time, today somebody asked in c.l.py about how best to sum > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and > some > (me included) a simple > total = 0 > for x in L: > total = total + x > > The usual performance measurements were unchained (easier than ever > thanks to timeit.py of course;-), and the paladins of reduce were once > again > dismayed by the fact that the best reduce can do (that best is > obtained with > operator.add) is mediocre (e.g. on my box with L=range(999), reduce > takes > 330 usec, and the simple for loop takes 247). > [...] > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. > > So, I tried coding that up -- just 40 lines of C... it runs twice as > fast > as the plain loop, for summing up range(999), and just as fast as > ''.join > for summing up map(str, range(999)) [for the simple reason that I > special-case this -- when the first item is a PyBaseString_Type, I > delegate to ''.join]. Do you have any idea why your sum function is, uhm, three times faster than the reduce(operator.add) version? Is the implementation of reduce doing something silly, or are there shortcuts you can take that reduce() can't? 
I'm asking because I think I would prefer reduce to give the speed you want. That way, we won't have people come asking for a prod() function to match sum(), etc. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From eppstein@ics.uci.edu Sun Apr 20 00:06:19 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sat, 19 Apr 2003 16:06:19 -0700 Subject: [Python-Dev] Re: heapq References: <20030419224110.GB2460@barsoom.org> Message-ID: <eppstein-34C3B0.16061719042003@main.gmane.org> In article <20030419224110.GB2460@barsoom.org>, Agthorr <agthorr@barsoom.org> wrote: > The algorithms used are more or less identical, I'm primarily > concerned with the differences in interface. It seems relevant to point out my own experiment with an interface to priority queue data structures, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117228 The algorithm itself is an uninteresting binary heap with lazy deletion, I am interested here more in the API. My feeling is that "queue" is the wrong metaphor to use for a heap, since it maintains not just a sequence of objects (as in a queue) but a more general mapping of objects to priorities. In many algorithms (e.g. Dijkstra), you want to be able to change these priorities, not just add and remove items the way you would in a queue. So, anyway, I called it a "priority dictionary" and gave it a dictionary-like API: pd[item] = priority adds a new item to the heap with the given priority, or updates the priority of an existing item, no need for a separate decrease_key method as you suggest. There is an additional method for finding the highest-priority item since that's not a normal dictionary operation. I also implemented an iterator method that repeatedly finds and removes the highest priority item, so that "for item in priorityDictionary" loops through the items in priority order. 
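A compressed sketch of that dictionary-style API (the lazy-deletion bookkeeping below is reconstructed for illustration, not copied from the ASPN recipe):

```python
import heapq

# Sketch of a "priority dictionary": d[item] = priority inserts or
# updates an item, smallest() finds the best one.  Heap entries whose
# priority no longer matches the dict are discarded lazily.
class PriorityDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._heap = []

    def __setitem__(self, item, priority):
        dict.__setitem__(self, item, priority)
        heapq.heappush(self._heap, (priority, item))

    def smallest(self):
        priority, item = self._heap[0]
        while item not in self or self[item] != priority:
            heapq.heappop(self._heap)        # stale entry, drop it
            priority, item = self._heap[0]
        return item

pd = PriorityDict()
pd['walk'] = 3
pd['run'] = 1
pd['run'] = 5        # update: the old (1, 'run') heap entry goes stale
```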
Maybe it would have been better to give this method a different name, though, since it's quite different from the usual not-very-useful dictionary iterator. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From mwh@python.net Sun Apr 20 00:10:48 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 00:10:48 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> (Jack Jansen's message of "Sun, 20 Apr 2003 01:03:40 +0200") References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> Message-ID: <2mademozgn.fsf@starship.python.net> Jack Jansen <Jack.Jansen@oratrix.com> writes: > Do you have any idea why your sum function is, uhm, three times > faster than the reduce(operator.add) version? Is the implementation > of reduce doing something silly, or are there shortcuts you can take > that reduce() can't? I imagine it's the function calls; a trip through the call machinery, time packing and unpacking arguments, etc. I haven't checked, though. > I'm asking because I think I would prefer reduce to give the speed > you want. That way, we won't have people come asking for a prod() > function to match sum(), etc. I can't think of one. I'm not sure this is worth the effort, though. Cheers, M. -- Any form of evilness that can be detected without *too* much effort is worth it... I have no idea what kind of evil we're looking for here or how to detect is, so I can't answer yes or no. 
-- Guido Van Rossum, python-dev From jack@performancedrivers.com Sun Apr 20 00:57:20 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sat, 19 Apr 2003 19:57:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <2mademozgn.fsf@starship.python.net>; from mwh@python.net on Sun, Apr 20, 2003 at 12:10:48AM +0100 References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> <2mademozgn.fsf@starship.python.net> Message-ID: <20030419195720.D1553@localhost.localdomain> On Sun, Apr 20, 2003 at 12:10:48AM +0100, Michael Hudson wrote: > Jack Jansen <Jack.Jansen@oratrix.com> writes: > > > Do you have any idea why your sum function is, uhm, three times > > faster than the reduce(operator.add) version? Is the implementation > > of reduce doing something silly, or are there shortcuts you can take > > that reduce() can't? > > I imagine it's the function calls; a trip through the call machinery, > time packing and unpacking arguments, etc. I haven't checked, though. Browsing through bltinmodule.c (was 'builtinmodule.c' too long?) it is mainly the overhead of calling PyEval_CallObject lots of times, which would include parsing args, etc. It tries to avoid creating the argument tuple more than once by checking the refcount on every loop (I would think the tuple would generally be unpacked by the receiving function, but better safe than sorry). > > I'm asking because I think I would prefer reduce to give the speed > > you want. That way, we won't have people come asking for a prod() > > function to match sum(), etc. > I think reduce/filter/map could be improved by checking if their operative function is in builtin or operator modules and calling the C directly. operator.c is just a litany of macros. This would add tiny one-time overhead for non builtins/operators. Don't mistake my comments as volunteering ;) I'm still plodding through _sre.c (Did you know re's are used in setup.py?
break _sre.c and you can't compile the source without using two trees, fun!) -jackdied From drifty@alum.berkeley.edu Sun Apr 20 01:01:21 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 19 Apr 2003 17:01:21 -0700 (PDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> [Alex Martelli] > For the Nth time, today somebody asked in c.l.py about how best to sum > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and some > (me included) a simple > total = 0 > for x in L: > total = total + x <snip> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. > <snip> > Discussing this with newbie-to-moderately experienced Pythonistas, > the uniform reaction was on the order of "you mean Python doesn't > HAVE a sum function already?!" -- most everybody seemed to feel > that such a function WOULD be "the obvious way to do it" and that > it should definitely be there. > So I have no fundamental issue with the proposed function, but I don't find a huge need for it personally; I always do the looping solution (jaded against the functional stuff from school =). I do see how it could be useful, though. I don't necessarily see this as a built-in (although it wouldn't kill me if it became one). I don't see it going into either the math or operator modules since it doesn't quite fit what is already there. 
I initially thought itertools since it is basically working on an iterator, but I don't know if we want to change itertools from a module the provides functionality for outputting special iterators compared to working with iterators. And as for the argument that other people are shocked it isn't already there... I just don't agree with that. Just because people want it does not mean it is a good solution to a problem. Tyranny of the majority and such. =) So I am currently +0 on having the function, -0 on sticking it in math or operator, +0 on built-in. And now I go back to PHP grunt work, wishing I was actually writing docs for test_support and regrtest instead (and that says something about what I am having to work on). -Brett From guido@python.org Sun Apr 20 01:40:39 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 20:40:39 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of 19 Apr 2003 22:10:14 +0200." <4r4uuu3d.fsf@python.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> <4r4uuu3d.fsf@python.net> Message-ID: <200304200040.h3K0edh10106@pcp02138704pcs.reston01.va.comcast.net> > You seem to care about multiple interpreters in the same process. > Any chance to move the frozen modules pointer PyImport_FrozenModules > to a interpreter private variable (part of the PyInterpreterState)? Why would you want that? Since it is just statically initialized data, I don't see the point. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Sun Apr 20 02:46:22 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 19 Apr 2003 21:46:22 -0400 Subject: [Python-Dev] New re failures on Windows Message-ID: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Sorry, I can't make time for this. 
test_re is failing today: """ C:\Code\python\PCbuild>python ../lib/test/test_re.py Running tests on re.search and re.match Running tests on re.sub Running tests on symbolic references Running tests on re.subn Running tests on re.split Running tests on re.findall Running tests on re.match Running tests on re.escape Pickling a RegexObject instance Test engine limitations maximum recursion limit exceeded Running re_tests test suite === grouping error ('^((a)c)?(ab)$', 'ab', 0, 'g1+"-"+g2+"-"+g3', 'None-None-ab' ) 'None-a-ab' should be 'None-None-ab' """ test_sre is dying with a segfault: """ C:\Code\python\PCbuild>python ../lib/test/test_sre.py Running tests on character literals Running tests on sre.search and sre.match sre.match(r'(a)?a','a').lastindex FAILED expected None got result 1 sre.match(r'(a)(b)?b','ab').lastindex FAILED expected 1 got result 2 sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED expected 'a' got result 'b' Running tests on sre.sub Running tests on symbolic references Running tests on sre.subn Running tests on sre.split Running tests on sre.findall Running tests on sre.finditer Running tests on sre.match Running tests on sre.escape Running tests on sre.Scanner Pickling a SRE_Pattern instance Test engine limitations """ and it dies with a segfault there. Unfortunately, test_sre doesn't die in a debug build. From niemeyer@conectiva.com Sun Apr 20 03:27:23 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sat, 19 Apr 2003 23:27:23 -0300 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Message-ID: <20030420022723.GA5905@localhost.distro.conectiva> I have backed out some changes introduced in _sre.c:2.84 so that its behavior was compliant with the old behavior. More information in bug #672491. [...] 
> Running re_tests test suite > === grouping error ('^((a)c)?(ab)$', 'ab', 0, 'g1+"-"+g2+"-"+g3', > 'None-None-ab' > ) 'None-a-ab' should be 'None-None-ab' > """ Hummm.. my changes shouldn't affect this. I'll check that out as well. > test_sre is dying with a segfault: This shouldn't happen with my changes either. I've just backed out some changes, returning to the original code. > """ > C:\Code\python\PCbuild>python ../lib/test/test_sre.py > Running tests on character literals > Running tests on sre.search and sre.match > sre.match(r'(a)?a','a').lastindex FAILED > expected None > got result 1 > sre.match(r'(a)(b)?b','ab').lastindex FAILED > expected 1 > got result 2 > sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED > expected 'a' > got result 'b' These were the tests I've implemented when the patch was introduced. Unfortunately, the documentation wasn't clear about the expected behavior, and it was implemented wrongly the first time. Now I backed out the changes, and it returned to the original behavior. OTOH, it looks like part of the original problem is still there. I'll work on it. Greg Chapman also has some ideas about this in the patch #712900. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From python@rcn.com Sun Apr 20 06:59:03 2003 From: python@rcn.com (Raymond Hettinger) Date: Sun, 20 Apr 2003 01:59:03 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> Message-ID: <004101c30701$f2bde9c0$0a11a044@oemcomputer> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. +1 -- this comes-up all the time. 
> I'm not > quite sure where it should go -- a builtin seems most natural (to keep > company with min and max, for example), but maybe that would be > too ambitious, and it should be in math or operator instead... __builtin__ is already too fat. math is for floats. operator is mostly for operators. Perhaps make a separate module for vector-to-scalar operations like min, max, product, average, moment, and dotproduct. Raymond Hettinger From uche.ogbuji@fourthought.com Sun Apr 20 04:51:38 2003 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 19 Apr 2003 21:51:38 -0600 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: Message from Alex Martelli <aleax@aleax.it> of "Sat, 19 Apr 2003 23:43:48 +0200." <200304192343.48211.aleax@aleax.it> Message-ID: <E1975rf-0002kf-00@borgia.local> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. +1. I agree that this is a natural additon to min() and max(), and a common enough case to clarify and optimize. > I'm not > quite sure where it should go -- a builtin seems most natural (to keep > company with min and max, for example), but maybe that would be > too ambitious, and it should be in math or operator instead... +1 on builtins, but I'd be OK with math or op as well. -- Uche Ogbuji Fourthought, Inc. 
http://uche.ogbuji.net http://4Suite.org http://fourthought.com Gems From the [Python/XML] Archives - http://www.xml.com/pub/a/2003/04/09/py-xml.html From prabhu@aero.iitm.ernet.in Sun Apr 20 07:09:23 2003 From: prabhu@aero.iitm.ernet.in (Prabhu Ramachandran) Date: Sun, 20 Apr 2003 11:39:23 +0530 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> Message-ID: <16034.14739.83749.505815@monster.linux.in> >>>>> "AM" == Alex Martelli <aleax@aleax.it> writes: [summing a sequence] AM> Now, I think the obvious approach would be to have a function AM> sum, callable with any non-empty homogeneous sequence AM> (sequence of items such that + can apply between them), AM> returning the sequence's summation -- now THAT might help for AM> simplicity, clarity AND power. FWIW, Numeric provides a sum function that mostly does what you want: >>> from Numeric import * >>> sum(range(999)) 498501 >>> sum(['a', 'b', 'c']) 'abc' # this one produces a slightly surprising result >>> sum(['aaa', 'b', 'c']) array([abc , a , a ],'O') # but is easily explained in the context of multi-dimensional arrays. Anyway, my point is most Numeric users are already comfortable with the idea of a sum function. However, as someone already said, if you argue that sum is necessary, what about product (which again Numeric provides along with a host of other useful functions)? cheers, prabhu From aleax@aleax.it Sun Apr 20 07:29:52 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:29:52 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <16034.14739.83749.505815@monster.linux.in> References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> Message-ID: <200304200829.52477.aleax@aleax.it> On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote: ...
> Anyway, my point is most Numeric users are already comfortable with > the idea of a sum function. However, as someone already said, if you Oh yes, Numeric.sum is excellent, by all means. But I think sum is quite helpful even for programs not using Numeric. > argue that sum is necessary, what about product (which again Numeric > provides along with a host of other useful functions)? In the context of Numeric use, it's quite appropriate to have sum, prod, and the other ufuncs' reduce AND accumulate methods. In everyday programming in other fields, the demand for the functionality given by sum is FAR higher than that given by prod. For example, googling on c.l.py shows 165 posts mentioning "reduce(operator.add" versus 39 mentioning "reduce(operator.mul". This reflects the need of typical computations -- indeed, even the English language shows indications about the prevalence of summing as a bulk operation. In everyday life, we often have to sum a set of numbers of varying cardinality -- we even have the word "total" to indicate the result of this operation. We rarely have to multiply such a set of numbers -- most multiplications we do involve two, at most three numbers, while every time we check a restaurant bill or other itemized bill we're summing up a varying number of numbers, for example. I think that, in this case, practicality beats purity, and we should have a sum function somewhere in Python's standard library (or builtins, though as someone mentioned they ARE quite fat already), leaving reduce for all other, less frequently used cases of bulk operations. 
Alex From aleax@aleax.it Sun Apr 20 07:38:20 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:38:20 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <004101c30701$f2bde9c0$0a11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> Message-ID: <200304200838.20191.aleax@aleax.it> On Sunday 20 April 2003 07:59 am, Raymond Hettinger wrote: > > Now, I think the obvious approach would be to have a function sum, > > callable with any non-empty homogeneous sequence (sequence of > > items such that + can apply between them), returning the sequence's > > summation -- now THAT might help for simplicity, clarity AND power. > > +1 -- this comes-up all the time. Yes, I agree it does -- both in discussions (c.l.py, python-help -- dunno 'bout tutor, as I'm not following it) AND in practical use. > > I'm not > > quite sure where it should go -- a builtin seems most natural (to keep > > company with min and max, for example), but maybe that would be > > too ambitious, and it should be in math or operator instead... > > __builtin__ is already too fat. math is for floats. operator is mostly > for operators. Perhaps make a separate module for vector-to-scalar > operations like min, max, product, average, moment, and dotproduct. __builtin__ has 123 entries. ls Lib/*.py | wc finds 183 toplevel modules (without even mentioning those modules that are already grouped into packages). So, making new modules should be roughly as much of a "fatness" problem as adding new builtins, at least, shouldn't it? min and max are already built-ins. Computing average(x) as sum(x)/len(x) does not seem too much of a problem. product, moment and dotproduct appear to be "nice to have" rather than real needs. True, math deals only with float stuff. 
But operator doesn't seem too bad -- sure, it mostly exposes stuff that's already elsewhere in the internals (operators AND others, such as countOf), but that could be considered an implementation detail. Alex From aleax@aleax.it Sun Apr 20 07:52:29 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:52:29 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> References: <200304192343.48211.aleax@aleax.it> <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> Message-ID: <200304200852.29672.aleax@aleax.it> On Sunday 20 April 2003 02:01 am, Brett Cannon wrote: ... > So I have no fundamental issue with the proposed function, but I don't > find a huge need for it personally; I always do the looping solution > (jaded against the functional stuff from school =). Looping is what I'm doing these days, but while fastest it's not terribly convenient. And it took me a while to learn to avoid reduce for that... > I do see how it could be useful, though. I don't necessarily see this as > a built-in (although it wouldn't kill me if it became one). I don't see > it going into either the math or operator modules since it doesn't quite > fit what is already there. I initially thought itertools since it is > basically working on an iterator, but I don't know if we want to change > itertools from a module the provides functionality for outputting special > iterators compared to working with iterators. Agreed on collocation -- itertools or math would be inappropriate, and builtins best, but since there are already so many builtins many are understandably reacting badly to the idea of adding anything there. So, if builtins are to be considered untouchable, I'd rather have sum in operator (where it does sort of fit, I think) than do without it. > And as for the argument that other people are shocked it isn't already > there... I just don't agree with that. 
> Just because people want it does not mean it is a good solution to a
> problem. Tyranny of the majority and such. =)

I must have expressed myself badly -- sorry. What I meant to illustrate is that sum (particularly as a built-in) would feel perfectly natural to typical Python beginners -- it would instantly become "the one obvious way" to deal with the common task of "sum these several numbers", as well as the slightly less common one of "concatenate these many strings" [many still balk at ''.join(manystrings); sum(manystrings) as I coded it delegates to ''.join, so it's almost equally fast] and the like.

So, let's see if I can express this more clearly...:

It's not a question of tyranny of anybody -- it's a question of the degree of abstraction required to find "reduce(operator.add, L)" the ``one obvious way'' to sum numbers being quite a bit above everyday thought habits. If we say that "the one obvious way" is a loop, it becomes hard to justify why the one obvious way to find a maximum is max(L) rather than a perfectly similar loop -- after all, "sum these numbers" and "find the largest one of these numbers" are tasks with perfectly comparable frequency of applicability in everyday programming tasks and perceived complexity. (My implementation of sum is a small copy-paste-edit job on that of max/min, removing the special case the latter have when called with >1 argument and adding one to delegate to ''.join for the specific purpose of summing instances of PyBaseString_Type -- the structure is really very similar.)

Alex

From aleax@aleax.it Sun Apr 20 08:16:18 2003
From: aleax@aleax.it (Alex Martelli)
Date: Sun, 20 Apr 2003 09:16:18 +0200
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com>
References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com>
Message-ID: <200304200916.18283.aleax@aleax.it>

On Sunday 20 April 2003 01:03 am, Jack Jansen wrote: ...
> > For the Nth time, today somebody asked in c.l.py about how best to sum > > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and ... > > Now, I think the obvious approach would be to have a function sum, > > callable with any non-empty homogeneous sequence (sequence of ... > Do you have any idea why your sum function is, uhm, three times faster > than the reduce(operator.add) version? Is the implementation of reduce > doing something silly, or are there shortcuts you can take that reduce() > can't? I see this has already been answered. The only important shortcut in the sum I coded is to delegate to ''.join if it turns out the first item is an instance of PyBaseString_Type -- this way we get excellent performance for "concatenate up this bunch of strings" in a way that would surely be rather problematic for a function as general as reduce (the latter would need to specialcase on its function argument, singling out the special case in which it's operator.add for optimization). > I'm asking because I think I would prefer reduce to give the speed you > want. > That way, we won't have people come asking for a prod() function to > match sum(), etc. I see I must have expressed myself badly, and I apologize. Raw speed in summing up many numbers is NOT the #1 motivation for my proposal. Whether it takes 100+ microseconds, or 300+ microseconds, to sum up a thousand integers (with O(N) scaling in either case), is not all that crucial. I think the importance of speed here is mainly *psychological* -- an issue of "marketing" at some level, if you will. What I'm mostly after is to have "one obvious way" to sum up a bunch of numbers. 
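To make the behaviour under discussion concrete, here is a rough pure-Python sketch of the proposed function (hypothetical illustration only -- Alex's actual patch is C code modeled on the min/max implementation; the trailing underscore in the name just avoids shadowing anything):

```python
def sum_(seq):
    # Hypothetical pure-Python sketch of the proposed sum() builtin,
    # including the shortcut that delegates string summation to ''.join.
    it = iter(seq)
    try:
        total = next(it)
    except StopIteration:
        # Mirror min()/max(), which reject an empty sequence.
        raise ValueError("sum of an empty sequence")
    if isinstance(total, str):
        # ''.join is linear time; repeated '+' on strings is quadratic.
        return total + ''.join(it)
    for item in it:
        total = total + item
    return total
```

With this sketch, summing numbers, strings, or even lists all fall out of the single definition, since + does the right thing for each type.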
I think a HOF such as reduce would not be "the one obvious way" to most people, and having to code an explicit loop maintaining the total has its own problems -- some people object that it's too low-level (so not obvious enough for THEM), and it also leads the beginner right into a performance trap when what's being summed is strings rather than numbers. If the one obvious way to sum (concatenate) a bunch of strings is ''.join(bunch), how can we say that, when the task is summing a bunch of numbers instead, the one obvious way becomes a HOF or an explicit loop? Having sum(bunch) would give the "one obvious way", and a speedup of 2 or 3 times wrt looping or using reduce would psychologically help make it "THE obvious way", I think.

I must also have been unclear on why I think sum is important in a way that prod and other reduce operations aren't: summing a bunch of numbers is *quite a common task* in many kinds of everyday programming (and in fact everyday life) in a way that multiplying a bunch of numbers (much less any other such bulk operation) just isn't. "prod" isn't even an English word (well it IS, but not in the meaning of "product":-) and when people talk about "the product" they're hardly ever talking about multiplication, while "the sum" or also commonly "the total" are words that do indicate the result of repeatedly applying addition (even when you say "the sum" to indicate an amount of money, the history of that word does come from addition -- addition of the values of the coins and notes of varying denominations making up "the sum" -- while "the product" as normally used in English has nothing to do with multiplying).

I think I understand the worry that introducing 'sum' would be the start of a slippery slope leading to requests for 'prod' (I can't think of other bulk operations that would be at all popular -- perhaps bulk and/or, but I think that's stretching it). But I think it's a misplaced worry in this case.
"Adding up a bunch of numbers" is just SO much more common than "Multiplying them up" (indeed the latter's hardly idiomatic English, while "adding up" sure is), that I believe normal users (as opposed to advanced programmers with a keenness on generalization) wouldn't have any problem at all with 'sum' being there and 'prod' missing... Alex From prabhu@aero.iitm.ernet.in Sun Apr 20 08:51:13 2003 From: prabhu@aero.iitm.ernet.in (Prabhu Ramachandran) Date: Sun, 20 Apr 2003 13:21:13 +0530 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304200829.52477.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it> Message-ID: <16034.20849.214752.656523@monster.linux.in> >>>>> "AM" == Alex Martelli <aleax@aleax.it> writes: >> argue that sum is necessary, what about product (which again >> Numeric provides along with a host of other useful functions)? AM> In the context of Numeric use, it's quite appropriate to have AM> sum, prod, and the other ufuncs' reduce AND accumulate AM> methods. In everyday programming in other fields, the demand AM> for the functionality given by sum is FAR higher than that AM> given by prod. For example, googling on c.l.py shows 165 AM> posts mentioning "reduce(operator.add" versus 39 mentioning AM> "reduce(operator.mul". This reflects the need of typical AM> computations -- indeed, even the English language shows AM> indications about the prevalence of summing as a bulk AM> operation. I agree that sum will be used far more than product. I can't remember when I needed to use product myself! Anyway here are the arguments I've seen so far. Pros: 1. One obvious, fairly efficient, and easy way to sum sequences. 2. Google and experience suggest sum is used more often than product and other functions. 3. Easy on newbies? 4. Will hopefully prevent the N+1'th thread on how to sum lists on c.l.py. Cons: 1. __builtins__ is already fat. 
   Will one more function make that much difference?
2. Will future requests be made for product and friends?
3. Why not simply speed up reduce/operator.add and train more people to use that?

cheers,
prabhu

From niemeyer@conectiva.com Sun Apr 20 08:54:53 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Sun, 20 Apr 2003 04:54:53 -0300
Subject: [Python-Dev] New re failures on Windows
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net>
References: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net>
Message-ID: <20030420075453.GA9504@localhost.distro.conectiva>

> test_re is failing today: [...]

Should be working now. Sorry about the trouble. I should have fixed that before submitting the first version.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From fincher.8@osu.edu Sun Apr 20 09:56:46 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 04:56:46 -0400
Subject: [Python-Dev] heapq
In-Reply-To: <20030419224110.GB2460@barsoom.org>
References: <20030419224110.GB2460@barsoom.org>
Message-ID: <200304200456.52084.fincher.8@osu.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I must agree with Agthorr that the interface to heapq could use some work. I wrote a simple wrapper around the heapq functions to give myself a heapq object (with methods like push, pop, etc.). Here's that object:

    from heapq import heapify, heappush, heapreplace, _siftup

    class heap(list):
        __slots__ = ()

        def __init__(self, seq):
            list.__init__(self, seq)
            heapify(self)

        def pop(self):
            lastelt = list.pop(self)
            if self:
                returnitem = self[0]
                self[0] = lastelt
                _siftup(self, 0)
            else:
                returnitem = lastelt
            return returnitem

        replace = heapreplace
        push = heappush

Is there any possibility of such an interface going into the heapq module? I find it much cleaner and easier to read than the "functions operating on sequences" interface heapq currently offers. I've got unit tests for the object written, if it is something that will possibly go into the standard library.
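For comparison, the bare functions-on-lists interface that such a wrapper would hide looks like the following (standard heapq usage, shown here only to contrast with the object style):

```python
import heapq

data = [5, 1, 4, 2, 3]
heapq.heapify(data)              # rearrange the list, in place, into heap order
assert data[0] == 1              # the smallest element always sits at index 0
heapq.heappush(data, 0)          # push maintains the heap invariant
assert heapq.heappop(data) == 0  # pop returns the current minimum
assert heapq.heappop(data) == 1
```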
Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+omDOqkDiu+Bs+JIRAu0wAJ9xpx+7nH0fNiZzJhl34tWUbHN4HgCfZx9G 38IgV4lSY6adYyLEufWG6mk= =NVlh -----END PGP SIGNATURE----- From aleax@aleax.it Sun Apr 20 09:01:12 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 10:01:12 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <16034.20849.214752.656523@monster.linux.in> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <16034.20849.214752.656523@monster.linux.in> Message-ID: <200304201001.12643.aleax@aleax.it> On Sunday 20 April 2003 09:51 am, Prabhu Ramachandran wrote: VERY good summary -- repeated here: > Pros: > 1. One obvious, fairly efficient, and easy way to sum sequences. > 2. Google and experience suggest sum is used more often than > product and other functions. > 3. Easy on newbies? > 4. Will hopefully prevent the N+1'th thread on how to sum lists on > c.l.py. > > Cons: > 1. __builtins__ is already fat. Will one more function make that > much difference? > 2. Will future requests be made for product and friends? > 3. Why not simply speed up reduce/operator.add and train more people > to use that? I think Pro #2 answers Con #2, and dittos for #3's (in addition to some implementation issues with speeding up reduce that way). But anyway, yes, these _are_ the considerations made pro & con on this thread. Anyway, whence now -- a PEP? (Seems a bit too small for that). Or, do I just submit the patch (to where -- builtins?) and let Guido pronounce? 
Alex From drifty@alum.berkeley.edu Sun Apr 20 09:07:54 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 20 Apr 2003 01:07:54 -0700 (PDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304200852.29672.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> <200304200852.29672.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304200102340.27731@death.OCF.Berkeley.EDU> [Alex Martelli] > On Sunday 20 April 2003 02:01 am, Brett Cannon wrote: > ... > > So I have no fundamental issue with the proposed function, but I don't > > find a huge need for it personally; I always do the looping solution > > (jaded against the functional stuff from school =). > > Looping is what I'm doing these days, but while fastest it's not terribly > convenient. And it took me a while to learn to avoid reduce for that... > True, but "explicit is better than implicit". But don't take this to mean that I don't think that your proposed function is not good; I do think it has merit. <snip> > Agreed on collocation -- itertools or math would be inappropriate, and > builtins best, but since there are already so many builtins many are > understandably reacting badly to the idea of adding anything there. > So, if builtins are to be considered untouchable, I'd rather have sum > in operator (where it does sort of fit, I think) than do without it. > Fair enough. Either that or a new module. <snip> > So, let's see if I can express this more clearly...: > > It's not a question of tyranny of anybody -- it's a question of the degree > of abstraction required to find "reduce(operator.add, L)" the ``one > obvious way'' to sum numbers being quite a bit above everyday thought > habits. 
> If we say that "the one obvious way" is a loop it becomes hard
> to justify why the one obvious way to find a maximum is max(L) rather
> than a perfectly similar loop -- after all "sum these numbers" and "find
> the largest one of these numbers" are tasks with perfectly comparable
> frequency of applicability in everyday programming tasks and perceived
> complexity. (My implementation for sum is a small copy-paste-edit job
> on that of max/min, removing the special-case the latter have when
> called with >1 argument and adding one to delegate to ''.join for the
> specific purpose of summing instances of PyBaseString_Type -- the
> structure is really very similar).

That's better. =) Comes off less as "let's add this to make newcomers happy" and more as "this will simplify good code". The latter is always a good thing.

I have an idea to respond to the whole "everyone will want prod() next" idea, but I will put that in another email to try to keep this thread coherent.

-Brett

From fincher.8@osu.edu Sun Apr 20 10:16:01 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 05:16:01 -0400
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <20030419224110.GB2460@barsoom.org>
References: <20030419224110.GB2460@barsoom.org>
Message-ID: <200304200516.02382.fincher.8@osu.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

2.3 seems to focus somewhat on adding a wider variety of data structures to Python -- well, sets and heapq, at least :) One thing I've found lacking, though, is a nice O(1) FIFO queue -- even the standard Queue module uses a list as a queue underneath, which means the dequeue operation is O(N) in the size of the queue. I'm curious what the possibility of getting a queue module (which would probably have to be named "fifo", since Queue is already taken and some operating systems use case-insensitive filesystems) added to the standard library would be.
If it is a possibility, I have a pure-Python implementation using the mechanism described at <http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=a23cjl%24dps%241%40serv1.iunet.it>. The module itself is at <http://www.cis.ohio-state.edu/fifo.py>; the tests are at <http://www.cis.ohio-state.edu/test_fifo.py>.

Jeremy

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+omVRqkDiu+Bs+JIRAilOAKCWe7CfZqyBboi/zGZ5jHxnKSiS5ACfTBEt
D2Hz+k7dzXTW3HjXByzlA2M=
=juHN
-----END PGP SIGNATURE-----

From drifty@alum.berkeley.edu Sun Apr 20 09:18:08 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Sun, 20 Apr 2003 01:18:08 -0700 (PDT)
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304200829.52477.aleax@aleax.it>
References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it>
Message-ID: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>

[Alex Martelli]
> On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote:
<snip> ...
> In the context of Numeric use, it's quite appropriate to have sum, prod,
> and the other ufuncs' reduce AND accumulate methods. In everyday
> programming in other fields, the demand for the functionality given by
> sum is FAR higher than that given by prod.
<snip>

I think part of the trouble here is the name. The word "sum" just automatically causes one to think math. This leads to thinking of multiplication, division, and subtraction. But Alex's proposed function does more than a summation by special-casing the concatenation of strings.

Perhaps renaming it to something like "combine()" would help do away with the worry of people wanting a complementary version for multiplication, since it does more than just sum numbers; it also combines strings in a very efficient manner. I mean, we could extend this to all built-in types where there is a reasonable operation for them (but this is jumping the gun).
And as for the worry about this being a built-in, we do have divmod, for goodness' sake. I mean, divmod() is nothing more than glorified division and remainder for the sake of clean code; ``divmod(3,2) == (3/2, 3%2)``. This function serves the same purpose in the end: to allow for cleaner code, with some improved performance, for a function that people use on a regular enough basis to ask for it constantly on c.l.py.

-Brett

From noah@noah.org Sun Apr 20 09:20:57 2003
From: noah@noah.org (Noah Spurrier)
Date: Sun, 20 Apr 2003 01:20:57 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
Message-ID: <3EA25869.6070404@noah.org>

Hello,

Recently I realized that there is no easy way to walk a directory tree and rename each directory and file. The standard os.path.walk() function does a breadth first walk. This makes it hard to write scripts that modify directory names as they walk the tree, because you need to visit subdirectories before you rename their parents. What is really needed is a depth first walk. For example, this naive code would not work with a breadth first walk:

    """Renames all directories and files to lower case."""
    import os.path

    def visit(arg, dirname, names):
        for name in names:
            print os.path.join(dirname, name)
            oldname = os.path.join(dirname, name)
            newname = os.path.join(dirname, name.lower())
            os.rename(oldname, newname)

    os.path.walk('.', visit, None)

The library source posixpath.py defines os.path.walk on my system. A comment in that file mentions that the visit function may modify the filenames list to impose a different order of visiting, but this is not possible as far as I can tell. Perhaps future versions of Python could include an option to do a depth first walk instead of the default breadth first. Modifying os.path.walk() to allow for optional depth first walking is simple. I have attached a patch to posixpath.py that demonstrates this. It adds an if conditional at the beginning and end of the walk() function.
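The same idea can also be sketched as a standalone helper, independent of the posixpath patch (hypothetical illustration; modern os.path spellings rather than posixpath internals):

```python
import os

def walk_depthfirst(top, func, arg):
    # Hypothetical sketch of a depth-first (post-order) walk:
    # subdirectories are visited before their parent, so func can
    # safely rename directories without invalidating unvisited paths.
    try:
        names = os.listdir(top)
    except OSError:
        return
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            walk_depthfirst(path, func, arg)
    func(arg, top, names)  # parent is visited last
```

With this ordering, the lower-casing visit() callback above works, because by the time a directory itself is renamed, everything inside it has already been handled.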
I have not checked to see if other platforms share the posixpath.py implementation of the walk() function, but if there is interest then I'd be happy to cross-reference this.

Yours,
Noah

*** posixpath.py	2003-04-19 22:26:08.000000000 -0700
--- posixpath_walk_depthfirst.py	2003-04-19 22:12:48.000000000 -0700
***************
*** 259,265 ****
  # The func may modify the filenames list, to implement a filter,
  # or to impose a different order of visiting.
  
! def walk(top, func, arg):
      """Directory tree walk with callback function.
  
      For each directory in the directory tree rooted at top (including top
--- 259,265 ----
  # The func may modify the filenames list, to implement a filter,
  # or to impose a different order of visiting.
  
! def walk(top, func, arg, depthfirst=False):
      """Directory tree walk with callback function.
  
      For each directory in the directory tree rooted at top (including top
***************
*** 272,284 ****
      order of visiting. No semantics are defined for, or required of, arg,
      beyond that arg is always passed to func. It can be used, e.g., to pass
      a filename pattern, or a mutable object designed to accumulate
!     statistics. Passing None for arg is common."""
      try:
          names = os.listdir(top)
      except os.error:
          return
!     func(arg, top, names)
      for name in names:
          name = join(top, name)
          try:
--- 272,287 ----
      order of visiting. No semantics are defined for, or required of, arg,
      beyond that arg is always passed to func. It can be used, e.g., to pass
      a filename pattern, or a mutable object designed to accumulate
!     statistics. Passing None for arg is common. The optional depthfirst
!     argument may be set to True to walk the directory tree depth first.
!     The default is False (walk breadth first)."""
      try:
          names = os.listdir(top)
      except os.error:
          return
!     if not depthfirst:
!         func(arg, top, names)
      for name in names:
          name = join(top, name)
          try:
***************
*** 287,293 ****
              continue
          if stat.S_ISDIR(st.st_mode):
              walk(name, func, arg)
! 
  # Expand paths beginning with '~' or '~user'.
  # '~' means $HOME; '~user' means that user's home directory.
--- 290,297 ----
              continue
          if stat.S_ISDIR(st.st_mode):
              walk(name, func, arg)
!     if depthfirst:
!         func(arg, top, names)
  
  # Expand paths beginning with '~' or '~user'.
  # '~' means $HOME; '~user' means that user's home directory.
***************
*** 416,420 ****
      return filename
  
  supports_unicode_filenames = False
- 
- 
--- 420,422 ----

From oren-py-l@hishome.net Sun Apr 20 09:37:18 2003
From: oren-py-l@hishome.net (Oren Tirosh)
Date: Sun, 20 Apr 2003 04:37:18 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <20030420083718.GA65548@hishome.net>

On Sun, Apr 20, 2003 at 01:18:08AM -0700, Brett Cannon wrote:
> [Alex Martelli]
> > On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote:
> <snip> ...
> > In the context of Numeric use, it's quite appropriate to have sum, prod,
> > and the other ufuncs' reduce AND accumulate methods. In everyday
> > programming in other fields, the demand for the functionality given by
> > sum is FAR higher than that given by prod.
> <snip>
>
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.

The special case is just a performance optimization. Without it the sum function would still return the same result. The sum function should work for any object that defines a + operator. I agree that the name 'sum' isn't 100% intuitive for use with strings, but I can't think of any name that would be really natural for both.
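The "any object that defines a + operator" point is easy to demonstrate with a user-defined type (hypothetical illustration, using reduce(operator.add, ...) as a stand-in for the proposed builtin; in 2003 reduce was itself a builtin):

```python
from functools import reduce
import operator

class Vec:
    # Hypothetical two-component vector: any type defining __add__
    # sums naturally, with no special-casing needed.
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

total = reduce(operator.add, [Vec(1, 2), Vec(3, 4), Vec(5, 6)])
assert (total.x, total.y) == (9, 12)
```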
Oren

From ping@zesty.ca Sun Apr 20 09:42:26 2003
From: ping@zesty.ca (Ka-Ping Yee)
Date: Sun, 20 Apr 2003 03:42:26 -0500 (CDT)
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org>

On Sun, 20 Apr 2003, Brett Cannon wrote:
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.
>
> Perhaps renaming it to something like "combine()" would help do away with
> the worry of people wanting a complementary version for multiplication
> since it does more than just sum numbers; it also combines strings in a
> very efficient manner.

Why not simply call it "add()", if it's going to be in the built-ins? That seems like the most straightforward and accurate name. It would have the same argument spec as min() and max(): it accepts a single list argument, or multiple arguments to be added together. Thus, no serious confusion with operator.add -- builtin add() would work anywhere that operator.add works now.

    >>> help(add)
    add(...)
        add(sequence) -> value
        add(a, b, c, ...) -> value

        With a single sequence argument, add together all the elements.
        With two or more arguments, add together all the arguments.

New question: what is add([])? If add() is really polymorphic, then this should probably raise an exception (just like min() and max() do). That would lead to idioms such as

    add(numberlist + [0])
    add(stringlist + [''])

I suppose those don't look too bad. Nothing vastly better springs to mind.
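The empty-sequence question is exactly what an explicit start value answers; with reduce as a hypothetical stand-in for the proposed add() (functools.reduce in modern Python, a builtin in 2003):

```python
from functools import reduce
import operator

# An explicit third argument seeds reduce, so the empty case is
# well defined instead of raising the way min()/max() do.
assert reduce(operator.add, [], 0) == 0
assert reduce(operator.add, [1, 2, 3], 0) == 6
assert reduce(operator.add, [], '') == ''
assert reduce(operator.add, ['spam', 'eggs'], '') == 'spameggs'
```

The `list + [seed]` idioms above achieve the same effect by making the sequence non-empty before summing it.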
-- ?!ng

From aleax@aleax.it Sun Apr 20 11:10:07 2003
From: aleax@aleax.it (Alex Martelli)
Date: Sun, 20 Apr 2003 12:10:07 +0200
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <200304201210.07054.aleax@aleax.it>

On Sunday 20 April 2003 10:18 am, Brett Cannon wrote: ...
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.

Actually it does more than summation because, in Python, + does more than summation. E.g., my function needs absolutely no special-casing whatsoever to produce

    >>> sum([[1,2],[3,4],[5,6]])
    [1, 2, 3, 4, 5, 6]

I special-cased the "sum of strings" just for performance issues, nothing more!

> Perhaps renaming it to something like "combine()" would help do away with
> the worry of people wanting a complementary version for multiplication
> since it does more than just sum numbers; it also combines strings in a
> very efficient manner. I mean we could extend this to all built-in types
> where there is a reasonable operation for them (but this is jumping the
> gun).

sum already works on all types, built-in or not, for which + and operator.add work -- thus, 'combine' sounds too vague to me, and the natural way to "extend this" to any other type would be to have that type support + (by defining __add__ or in the equivalent C-coded way).

> And as for the worry about this being a built-in, we do have divmod for
> goodness sakes. I mean divmod() is nothing more than glorifying division
> and remainder for the sake of clean code; ``divmod(3,2) == (3/2, 3%2)``.
> This function serves the same purpose in the end; to allow for cleaner
> code with some improved performance for a function that people use on a
> regular enough basis to ask for it constantly on c.l.py.

I think this is a very good point. The worries come from the fact that we already have many built-ins (44 functions at last count -- counting such things as exception classes doesn't seem sensible) -- but is it a good idea to exclude 'sum' because we have more exotic built-ins such as 'divmod', or semi-obsolete ones such as 'apply'? I've only seen Raymond objecting to 'sum' as a built-in (naming it 'add' might be just as fine, and having it accept the same argument patterns as max/min probably useful) -- though I may have missed other voices speaking to this issue -- so perhaps it's best if he clarifies his objection.

Alex

From andrew@acooke.org Sun Apr 20 11:53:41 2003
From: andrew@acooke.org (andrew cooke)
Date: Sun, 20 Apr 2003 06:53:41 -0400 (CLT)
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <200304200516.02382.fincher.8@osu.edu>
References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu>
Message-ID: <34897.127.0.0.1.1050836021.squirrel@127.0.0.1>

hi,

i haven't looked at the code, but when you mention lists are you referring to standard python structures? i understood that the thing in python that looks like a list is actually an array (a simple one, not a vlist), so access to indexed elements is done in constant time.

however, that doesn't necessarily alter your argument, as using an array for a fifo queue in a naive manner is going to cause problems too (unless the implementation explicitly implements a circular buffer, say, or the array implementation is clever enough to drop leading elements which contain nulls - note that circular buffers are a bit tricky to extend in size). see eg http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52246

(the links you gave don't work for me.)
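One standard way to get amortized O(1) at both ends with plain Python lists (a hypothetical sketch, not necessarily the mechanism in Jeremy's module) is the two-list trick: list.append and list.pop from the tail are the cheap operations, so push onto one list and pop from the reversed other.

```python
class Fifo:
    # Amortized-O(1) FIFO built from two lists: enqueue appends to one
    # list; dequeue pops from the reversed other, refilling it only
    # when it runs dry. Each element is moved at most once.
    def __init__(self):
        self._back = []   # newest items, in arrival order
        self._front = []  # oldest items, in reversed order

    def enqueue(self, item):
        self._back.append(item)

    def dequeue(self):
        if not self._front:
            if not self._back:
                raise IndexError("dequeue from empty FIFO")
            self._back.reverse()
            self._front, self._back = self._back, []
        return self._front.pop()
```

This sidesteps both the O(N) pop-from-the-front of a naive list queue and the resizing awkwardness of circular buffers mentioned above.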
cheers,
andrew

Jeremy Fincher said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> 2.3 seems to focus somewhat on adding a wider variety of data structures to
> Python -- well, sets and heapq, at least :) One thing I've found lacking,
> though, is a nice O(1) FIFO queue -- even the standard Queue module
> underlying uses a list as a queue, which means the dequeue operation is O(N)
> in the size of the queue. I'm curious what the possiblity of getting a queue
> module (which would probably have to be named "fifo", since Queue is already
> taken and some operating systems use case-insensitive filesystems) added to
> the standard library would be.
>
> If it is a possibility, I have a pure-Python implementation using the
> mechanism described at
> <http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=a23cjl%24dps%241%40serv1.iunet.it>.
> The module itself is at <http://www.cis.ohio-state.edu/fifo.py>; the tests
> are at <http://www.cis.ohio-state.edu/test_fifo.py>.
>
> Jeremy
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (FreeBSD)
>
> iD8DBQE+omVRqkDiu+Bs+JIRAilOAKCWe7CfZqyBboi/zGZ5jHxnKSiS5ACfTBEt
> D2Hz+k7dzXTW3HjXByzlA2M=
> =juHN
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- 
http://www.acooke.org/andrew

From skip@mojam.com Sun Apr 20 13:00:32 2003
From: skip@mojam.com (Skip Montanaro)
Date: Sun, 20 Apr 2003 07:00:32 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200304201200.h3KC0WE16154@manatee.mojam.com>

Bug/Patch Summary
-----------------

396 open / 3541 total bugs (+13)
130 open / 2092 total patches (-4)

New Bugs
--------

profile.run makes assumption regarding namespace (2003-04-06)
    http://python.org/sf/716587
"build_ext" "libraries" subcommand not split into values (2003-04-07)
    http://python.org/sf/716634
Uthread problem - Pipe left open (2003-04-08)
    http://python.org/sf/717614
inspect, class instances and __getattr__ (2003-04-09)
    http://python.org/sf/718532
sys.path on MacOSX (2003-04-10)
    http://python.org/sf/719297
pimp needs to do download and untar itself (2003-04-10)
    http://python.org/sf/719300
Icon on applets is wrong (2003-04-10)
    http://python.org/sf/719303
string exceptions are deprecated (2003-04-10)
    http://python.org/sf/719367
Mac OS X painless compilation (2003-04-11)
    http://python.org/sf/719549
tokenize module w/ coding cookie (2003-04-11)
    http://python.org/sf/719888
datetime types don't work as bases (2003-04-13)
    http://python.org/sf/720908
Building lib.pdf fails on MacOSX (2003-04-14)
    http://python.org/sf/721157
Acrobat Reader 5 compatibility (2003-04-14)
    http://python.org/sf/721160
tarfile gets filenames wrong (2003-04-15)
    http://python.org/sf/721871
_winreg doesn't handle NULL bytes in value names (2003-04-16)
    http://python.org/sf/722413
weakref: proxy_print and proxy_repr incons. (2003-04-16)
    http://python.org/sf/722763
Put a reference to print in the Library Reference, please.
(2003-04-17)
    http://python.org/sf/723136
PyThreadState_Clear() docs incorrect (2003-04-17)
    http://python.org/sf/723205
add timeout support in socket using modules (2003-04-17)
    http://python.org/sf/723287
runtime_library_dirs broken under OS X (2003-04-17)
    http://python.org/sf/723495
__slots__ broken in 2.3a with ("__dict__", ) (2003-04-18)
    http://python.org/sf/723540
app-building with Bundlebuilder for framework builds (2003-04-18)
    http://python.org/sf/723562
logging.setLoggerClass() doesn't support new-style classes (2003-04-18)
    http://python.org/sf/723801
overintelligent slice() behavior on integers (2003-04-18)
    http://python.org/sf/723806
urlopen(url_to_a_non-existing-domain) raises gaierror (2003-04-18)
    http://python.org/sf/723831
imaplib should convert line endings to be rfc2822 complient (2003-04-18)
    http://python.org/sf/723962

New Patches
-----------

PEP 269 Implementation (2002-08-23)
    http://python.org/sf/599331
has_function() method for CCompiler (2003-04-07)
    http://python.org/sf/717152
allow timeit to see your globals() (2003-04-08)
    http://python.org/sf/717575
Patch to distutils doc for metadata explanation (2003-04-09)
    http://python.org/sf/718027
DESTDIR variable patch (2003-04-09)
    http://python.org/sf/718286
fix test_long failure on OSF/1 (2003-04-10)
    http://python.org/sf/719359
Remove __file__ after running $PYTHONSTARTUP (2003-04-11)
    http://python.org/sf/719777
proposed patch for posixpath.py: getctime() (2003-04-12)
    http://python.org/sf/720188
Patch to make shlex accept escaped quotes in strings. (2003-04-12)
    http://python.org/sf/720329
iconv_codec 3rd generation (2003-04-13)
    http://python.org/sf/720585
Some bug fixes for regular ex code. (2003-04-14)
    http://python.org/sf/720991
Add copyrange method to array.
(2003-04-14) http://python.org/sf/721061 Remote debugging with pdb.py (2003-04-14) http://python.org/sf/721464 Better output for unittest (2003-04-16) http://python.org/sf/722638 PyArg_ParseTuple problem with 'L' format (2003-04-17) http://python.org/sf/723201 __del__ in dumbdbm fails under some circumstances (2003-04-17) http://python.org/sf/723231 ability to pass a timeout to underlying socket (2003-04-17) http://python.org/sf/723312 terminal type option subnegotiation in telnetlib (2003-04-17) http://python.org/sf/723364 Backport of recent sre fixes. (2003-04-18) http://python.org/sf/723940 Closed Bugs ----------- urllib needs 303/307 handlers (2002-06-12) http://python.org/sf/568068 Support for masks in getargs.c (2002-08-14) http://python.org/sf/595026 Cannot compile escaped unicode character (2002-09-20) http://python.org/sf/612074 Numerous defunct threads left behind (2002-10-10) http://python.org/sf/621548 no docs for HTMLParser.handle_pi (2002-12-27) http://python.org/sf/659188 2.3a1 computes lastindex incorrectly (2003-01-22) http://python.org/sf/672491 re.LOCALE, umlaut and \w (2003-02-21) http://python.org/sf/690974 gensuitemodule overhaul (2003-03-04) http://python.org/sf/697179 string.strip implementation/doc mismatch (2003-03-04) http://python.org/sf/697220 builtin type inconsistency (2003-03-07) http://python.org/sf/699312 Obscure error message (2003-03-08) http://python.org/sf/699934 gensuitemodule needs to be documented (2003-03-29) http://python.org/sf/711986 test_zipimport failing on ia64 (at least) (2003-03-30) http://python.org/sf/712322 Cannot change the class of a list (2003-03-31) http://python.org/sf/712975 Closed Patches -------------- SimpleXMLRPCServer auto-docing subclass (2002-03-29) http://python.org/sf/536883 optionally make shelve less surprising (2002-05-07) http://python.org/sf/553171 GC: untrack simple objects (2002-05-21) http://python.org/sf/558745 gettext module charset changes (2002-06-13) http://python.org/sf/568669 Shadow 
Password Support Module (2002-07-09) http://python.org/sf/579435 Add popen2 like functionality to pty.py. (2002-08-03) http://python.org/sf/590513 Refactoring of difflib.Differ (2002-08-27) http://python.org/sf/600984 Punycode encoding (2002-11-02) http://python.org/sf/632643 refactoring and documenting ModuleFinder (2002-11-25) http://python.org/sf/643711 Complementary patch for OpenVMS (2002-12-07) http://python.org/sf/649997 659188: no docs for HTMLParser (2003-01-04) http://python.org/sf/662464 xmlrpclib: better string encoding in responce package (2003-02-03) http://python.org/sf/679383 Tiny patch for bug 612074: sre unicode escapes (2003-02-05) http://python.org/sf/681152 AutoThreadState implementation (2003-02-10) http://python.org/sf/684256 optparse OptionGroup docs (2003-03-05) http://python.org/sf/697941 docs for hotshot module (2003-03-05) http://python.org/sf/698505 time.tzset standards compliance update (2003-03-19) http://python.org/sf/706707 Allow range() to return long integer values (2003-03-21) http://python.org/sf/707427 remove -static option from cygwinccompiler (2003-03-24) http://python.org/sf/709178 new test_urllib and patch for found urllib bug (2003-03-27) http://python.org/sf/711002 Removing unnecessary lock operations (2003-03-29) http://python.org/sf/711835 iconv_codec NG (2003-04-02) http://python.org/sf/713820 Unicode Codecs for CJK Encodings (2003-04-02) http://python.org/sf/713824 Guard against segfaults in debug code (2003-04-02) http://python.org/sf/714348 Document freeze process in PC/config.c (2003-04-03) http://python.org/sf/714957 From lists@morpheus.demon.co.uk Sun Apr 20 14:39:25 2003 From: lists@morpheus.demon.co.uk (Paul Moore) Date: Sun, 20 Apr 2003 14:39:25 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <16034.20849.214752.656523@monster.linux.in> <200304201001.12643.aleax@aleax.it> Message-ID: 
<n2m-g.he8tffua.fsf@morpheus.demon.co.uk> Alex Martelli <aleax@aleax.it> writes: > Anyway, whence now -- a PEP? (Seems a bit too small for that). Or, do > I just submit the patch (to where -- builtins?) and let Guido pronounce? Not that I have much to say, but I'd say submit a patch, and either assign it to Guido for pronouncement, or just wait for his view. You may hit the "no new features for 2.3" rule, and have to wait for 2.4 - personally, I think it's small enough for that not to matter, but Guido's been pretty strict with that one so far... Paul. -- This signature intentionally left blank From aahz@pythoncraft.com Sun Apr 20 15:00:22 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 10:00:22 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <004101c30701$f2bde9c0$0a11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> Message-ID: <20030420140022.GA6462@panix.com> On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > __builtin__ is already too fat. math is for floats. operator is mostly > for operators. Perhaps make a separate module for vector-to-scalar > operations like min, max, product, average, moment, and dotproduct. Call it "statistics". Yes, I've seen the comments about using add()/sum() for strings, but I think numeric usage will be by far the most common. I also think that max() and min() should be removed from the builtins. Having a good, simple statistics library standard would be a Good Thing. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
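[For concreteness: the sum() being debated in this thread is essentially a one-line alias for the reduce() idiom that keeps recurring. A minimal sketch follows -- the name sum_ is made up to avoid clashing with the eventual builtin, and note that reduce() was a builtin in the Python of 2003 but now lives in functools:]

```python
import operator
from functools import reduce  # a builtin in the Python of this thread

def sum_(seq, start=0):
    # The proposed function, spelled as the reduce() idiom from the thread.
    # The explicit start value is what makes sum_([]) well-defined.
    return reduce(operator.add, seq, start)

print(sum_([1, 2, 3]))           # the common numeric case -> 6
print(sum_(["py", "thon"], ""))  # the disputed string case -> 'python'
```

[Whether the string case should work at all is exactly the point of contention above; the reduce() spelling is agnostic about it.]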
From jack@performancedrivers.com Sun Apr 20 15:22:46 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:22:46 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030420140022.GA6462@panix.com>; from aahz@pythoncraft.com on Sun, Apr 20, 2003 at 10:00:22AM -0400 References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> <20030420140022.GA6462@panix.com> Message-ID: <20030420102245.A15881@localhost.localdomain> On Sun, Apr 20, 2003 at 10:00:22AM -0400, Aahz wrote: > On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > > > __builtin__ is already too fat. math is for floats. operator is mostly > > for operators. Perhaps make a separate module for vector-to-scalar > > operations like min, max, product, average, moment, and dotproduct. > > Call it "statistics". Yes, I've seen the comments about using add()/sum() > for strings, but I think numeric usage will be by far the most common. > I also think that max() and min() should be removed from the builtins. > Having a good, simple statistics library standard would be a Good Thing. Would operations performed on sets go in there too, like combinatorics[1] that are also frequently golfed on c.l.py? I'm also not sure that add() means '+' or 'plus' to everyday people. I read strvar += 'foo' as concatenate or 'plus' at a stretch but not 'add'. -jack [1] my 'probstat' module does these in C for lists/tuples. 
probstat.sf.net From jack@performancedrivers.com Sun Apr 20 15:29:04 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:29:04 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org>; from ping@zesty.ca on Sun, Apr 20, 2003 at 03:42:26AM -0500 References: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org> Message-ID: <20030420102904.B15881@localhost.localdomain> On Sun, Apr 20, 2003 at 03:42:26AM -0500, Ka-Ping Yee wrote: > New question: what is add([])? If add() is really polymorphic, then > this should probably raise an exception (just like min() and max() do). > That would lead to idioms such as
>
>     add(numberlist + [0])
>     add(stringlist + [''])
>
> I suppose those don't look too bad. Nothing vastly better springs > to mind. For a large numberlist this is a problem, it causes a copy of the whole list. Not to mention it looks like a perl coercion hack. The third argument to reduce is there to avoid the hack. so now we have

    from newmodule import add
    answer = add(numberlist, 0)

why don't we just write it as

    from operator import add
    answer = reduce(add, numberlist, 0)

-jack From andrew-pydev@lexical.org.uk Sun Apr 20 15:38:17 2003 From: andrew-pydev@lexical.org.uk (Andrew Walkingshaw) Date: Sun, 20 Apr 2003 15:38:17 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030420140022.GA6462@panix.com> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> <20030420140022.GA6462@panix.com> Message-ID: <20030420143817.GA43283@colon.colondot.net> On Sun, Apr 20, 2003 at 10:00:22AM -0400, Aahz wrote: > On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > > > __builtin__ is already too fat. math is for floats. operator is mostly > > for operators.
Perhaps make a separate module for vector-to-scalar > > operations like min, max, product, average, moment, and dotproduct. > > Call it "statistics". Yes, I've seen the comments about using add()/sum() > for strings, but I think numeric usage will be by far the most common. A lightweight vector class would be very useful; it's something I've had to roll my own of for a lot of scientific code I'm writing (the problem being that it's often impractical to build Numeric everywhere, so you can't rely on having it whereas you probably can rely on at least having Python.) A good example is in processing of output from solid-state physics codes (a subject very close to my heart); you want vectors to store (eg) positions of and forces on atoms, but you don't need the performance of Numeric - and the distribution overhead of same. As such, this is something I've got lying around; I'd be more than willing to distribute this (~100 line) class to whoever wants it under whatever license they care for. It should be easily extensible to do whatever else people want in this regard, as well. 
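[A class of the kind described needs only a handful of special methods. A minimal sketch -- this is illustrative only, not Andrew's actual ~100-line class, and all names here are invented:]

```python
class Vec3:
    """Minimal lightweight 3-vector for positions/forces -- a sketch of
    the sort of class described above, not the author's actual code."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = float(x), float(y), float(z)

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, k):  # scalar multiplication only
        return Vec3(self.x * k, self.y * k, self.z * k)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def __repr__(self):
        return "Vec3(%g, %g, %g)" % (self.x, self.y, self.z)
```

[Nothing here needs Numeric; plain attribute arithmetic is fast enough when you are reading a few hundred atoms out of a calculation's output.]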
- Andrew -- email: andrew@lexical.org.uk http://www.lexical.org.uk/ Earth Sciences, University of Cambridge http://www.esc.cam.ac.uk/ CUR1350, 1350 MW Cambridgeshire and online http://www.cur1350.co.uk/ From jack@performancedrivers.com Sun Apr 20 15:58:07 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:58:07 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304201210.07054.aleax@aleax.it>; from aleax@aleax.it on Sun, Apr 20, 2003 at 12:10:07PM +0200 References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> Message-ID: <20030420105807.C15881@localhost.localdomain> I see two points:

1 - it isn't obvious to many people to write
    reduce(operator.add, mylist, 0) # where '0' is just an appropriate default
2 - reduce() is slower than a special purpose function

#2 is fixable (see my earlier posts) and isn't the main argument of proponents. To #1 I would argue for education about reduce(). We already have minor style wars about map/filter versus list comps. This would just add one more. People would still have to learn about reduce() when they wanted the first argument to be anything other than operator.add. Aliasing reduce(operator.add, mylist, 0) to sum(mylist, 0) is a solution looking for a problem, IMO. I know I would have to learn what the to-be-named module of aliases does if people start to use them. I'll be selfish here, I don't want to learn em. The proposed patch would be equivalent to a one line alias, even if it is written more verbosely in C. A one line alias for existing functionality sounds like TMTOWTDI to me. I also don't want people having patch fights every time they see sum() or reduce() in code (re-submitting whichever version they prefer).
A possible solution could be a 'newbie' module that defined things like 'sum' with the canonical solution listed in the documentation. It would be a nice clear flag to readers of the code while allowing the noob to skip reading the reduce() manpage. -jack From dave@boost-consulting.com Sun Apr 20 16:07:24 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 11:07:24 -0400 Subject: [Python-Dev] Hook Extension Module Import? Message-ID: <847k9pp5qr.fsf@boost-consulting.com> Hi, I think I need a way to temporarily (from 'C'), arrange to be notified just before and just after a new extension module is loaded. Is this possible? I didn't see anything obvious in the source. BTW, I'd be just as happy if it were possible to do the same thing for any module (i.e., not discriminating between extension and pure python modules). Thanks in advance, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From fincher.8@osu.edu Sun Apr 20 17:21:49 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 12:21:49 -0400 Subject: [Python-Dev] FIFO data structure? In-Reply-To: <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> Message-ID: <200304201221.51200.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 06:53 am, andrew cooke wrote: > i haven't looked at the code, but when you mention lists are you referring > to standard python structures? i understood that the thing in python that > looks like a list is actually an array (a simple one, not a vlist), so > access to index elements is done in constant time. But deleting an element from the beginning is O(n), because all the elements have to be moved back to replace it. So queues implemented via list.append and list.pop(0) are O(N) in dequeue. > (the links you gave don't work for me.) 
Ah, shoot, I always forget the ~fincher. New links: <http://www.cis.ohio-state.edu/~fincher/fifo.py> <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+oskdqkDiu+Bs+JIRAu/tAJ0dPjbD65e5Kw1XctbrWwYGt4jAZgCaAgF3 2+3nzAjeswigkg9697bx38Y= =E8ES -----END PGP SIGNATURE----- From aahz@pythoncraft.com Sun Apr 20 17:35:33 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 12:35:33 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <847k9pp5qr.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> Message-ID: <20030420163533.GA1885@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > > I think I need a way to temporarily (from 'C'), arrange to be notified > just before and just after a new extension module is loaded. Is this > possible? I didn't see anything obvious in the source. BTW, I'd be > just as happy if it were possible to do the same thing for any module > (i.e., not discriminating between extension and pure python modules). http://www.python.org/peps/pep-0302.html -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From dave@boost-consulting.com Sun Apr 20 17:58:53 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 12:58:53 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <20030420163533.GA1885@panix.com> (aahz@pythoncraft.com's message of "Sun, 20 Apr 2003 12:35:33 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> Message-ID: <84wuhpnm0i.fsf@boost-consulting.com> Aahz <aahz@pythoncraft.com> writes: > On Sun, Apr 20, 2003, David Abrahams wrote: >> >> I think I need a way to temporarily (from 'C'), arrange to be notified >> just before and just after a new extension module is loaded. Is this >> possible? I didn't see anything obvious in the source. 
BTW, I'd be >> just as happy if it were possible to do the same thing for any module >> (i.e., not discriminating between extension and pure python modules). > > http://www.python.org/peps/pep-0302.html I guess I should take that to mean "you can't do that yet" (?) -- Dave Abrahams Boost Consulting www.boost-consulting.com From aahz@pythoncraft.com Sun Apr 20 18:04:09 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 13:04:09 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <84wuhpnm0i.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> Message-ID: <20030420170408.GA6705@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > Aahz <aahz@pythoncraft.com> writes: >> On Sun, Apr 20, 2003, David Abrahams wrote: >>> >>> I think I need a way to temporarily (from 'C'), arrange to be notified >>> just before and just after a new extension module is loaded. Is this >>> possible? I didn't see anything obvious in the source. BTW, I'd be >>> just as happy if it were possible to do the same thing for any module >>> (i.e., not discriminating between extension and pure python modules). >> >> http://www.python.org/peps/pep-0302.html > > I guess I should take that to mean "you can't do that yet" (?) As the PEP says, you *could* define an __import__ hook, but that would likely be more effort than you want. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From dave@boost-consulting.com Sun Apr 20 18:41:41 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 13:41:41 -0400 Subject: [Python-Dev] Hook Extension Module Import? 
In-Reply-To: <20030420170408.GA6705@panix.com> (aahz@pythoncraft.com's message of "Sun, 20 Apr 2003 13:04:09 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> Message-ID: <84r87xnk16.fsf@boost-consulting.com> Aahz <aahz@pythoncraft.com> writes: > On Sun, Apr 20, 2003, David Abrahams wrote: >> Aahz <aahz@pythoncraft.com> writes: >>> On Sun, Apr 20, 2003, David Abrahams wrote: >>>> >>>> I think I need a way to temporarily (from 'C'), arrange to be notified >>>> just before and just after a new extension module is loaded. Is this >>>> possible? I didn't see anything obvious in the source. BTW, I'd be >>>> just as happy if it were possible to do the same thing for any module >>>> (i.e., not discriminating between extension and pure python modules). >>> >>> http://www.python.org/peps/pep-0302.html >> >> I guess I should take that to mean "you can't do that yet" (?) > > As the PEP says, you *could* define an __import__ hook, but that would > likely be more effort than you want. It also says: The situation gets worse when you need to extend the import mechanism from C: it's currently impossible, apart from hacking Python's import.c or reimplementing much of import.c from scratch. OTOH, it's not obvious to me why this should be so. Can't I access/replace builtins.__import__ from C/C++? That said, if I could do that, it doesn't seem like much trouble at all to get the behavior I want. -- Dave Abrahams Boost Consulting www.boost-consulting.com From mwh@python.net Sun Apr 20 19:15:45 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 19:15:45 +0100 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <200304201221.51200.fincher.8@osu.edu> (Jeremy Fincher's message of "Sun, 20 Apr 2003 12:21:49 -0400") References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> Message-ID: <2mu1ctnige.fsf@starship.python.net> Jeremy Fincher <fincher.8@osu.edu> writes: > <http://www.cis.ohio-state.edu/~fincher/fifo.py> > <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> What do you gain from inheriting from dict? It seems to me that merely containing one would do. Cheers, M. -- ARTHUR: Ford, you're turning into a penguin, stop it. -- The Hitch-Hikers Guide to the Galaxy, Episode 2 From agthorr@barsoom.org Sun Apr 20 19:24:19 2003 From: agthorr@barsoom.org (Agthorr) Date: Sun, 20 Apr 2003 11:24:19 -0700 Subject: [Python-Dev] heapq In-Reply-To: <200304200456.52084.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <200304200456.52084.fincher.8@osu.edu> Message-ID: <20030420182419.GA8449@barsoom.org> On Sun, Apr 20, 2003 at 04:56:46AM -0400, Jeremy Fincher wrote: > I've got unit tests for the object written, if it is something that will > possibly go into the standard library. FWIW, I have unit tests written for my heap implementation as well. -- Agthorr From agthorr@barsoom.org Sun Apr 20 19:30:06 2003 From: agthorr@barsoom.org (Agthorr) Date: Sun, 20 Apr 2003 11:30:06 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <200304200516.02382.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> Message-ID: <20030420183005.GB8449@barsoom.org> On Sun, Apr 20, 2003 at 05:16:01AM -0400, Jeremy Fincher wrote: > 2.3 seems to focus somewhat on adding a wider variety of data structures to > Python -- well, sets and heapq, at least :) One thing I've found lacking, > though, is a nice O(1) FIFO queue -- even the standard Queue module I actually just wrote a modification to Queue that is O(1). There's no change to the interface, so it doesn't require adding a new data structure. I have the code here: http://www.cs.uoregon.edu/~agthorr/Queue.py The only changes are near the bottom of the file, beginning with the _init() function. My implementation uses Python lists, but it uses them in a smarter way than the existing Queue implementation. I'll submit a patch to SourceForge in a day or two. -- Agthorr From aahz@pythoncraft.com Sun Apr 20 19:31:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 14:31:05 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <84r87xnk16.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> <84r87xnk16.fsf@boost-consulting.com> Message-ID: <20030420183105.GA18929@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > Aahz <aahz@pythoncraft.com> writes: >> On Sun, Apr 20, 2003, David Abrahams wrote: >>> Aahz <aahz@pythoncraft.com> writes: >>>> On Sun, Apr 20, 2003, David Abrahams wrote: >>>>> >>>>> I think I need a way to temporarily (from 'C'), arrange to be notified >>>>> just before and just after a new extension module is loaded. Is this >>>>> possible? I didn't see anything obvious in the source. 
BTW, I'd be >>>>> just as happy if it were possible to do the same thing for any module >>>>> (i.e., not discriminating between extension and pure python modules). >>>> >>>> http://www.python.org/peps/pep-0302.html >>> >>> I guess I should take that to mean "you can't do that yet" (?) >> >> As the PEP says, you *could* define an __import__ hook, but that would >> likely be more effort than you want. > > It also says: > > The situation gets worse when you need to extend the import > mechanism from C: it's currently impossible, apart from hacking > Python's import.c or reimplementing much of import.c from scratch. > > OTOH, it's not obvious to me why this should be so. Can't I > access/replace builtins.__import__ from C/C++? Sure, but then you need to replace import.c, just as it says. I'd be inclined to do the heavy lifting in Python with a callback into C code (after all, you're not calling it so frequently as to make it a performance issue). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From eppstein@ics.uci.edu Sun Apr 20 20:04:35 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sun, 20 Apr 2003 12:04:35 -0700 Subject: [Python-Dev] Re: FIFO data structure? References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> <2mu1ctnige.fsf@starship.python.net> Message-ID: <eppstein-C048B2.12043520042003@main.gmane.org> In article <2mu1ctnige.fsf@starship.python.net>, Michael Hudson <mwh@python.net> wrote: > > <http://www.cis.ohio-state.edu/~fincher/fifo.py> > > <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> > > What do you gain from inheriting from dict? It seems to me that > merely containing one would do. See <http://tinyurl.com/9x6d> for some tests indicating that using dict for fifo is a slow way to go. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From mwh@python.net Sun Apr 20 21:32:43 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 21:32:43 +0100 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <20030420183105.GA18929@panix.com> (Aahz's message of "Sun, 20 Apr 2003 14:31:05 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> <84r87xnk16.fsf@boost-consulting.com> <20030420183105.GA18929@panix.com> Message-ID: <2mr87woqok.fsf@starship.python.net> Aahz <aahz@pythoncraft.com> writes: >> OTOH, it's not obvious to me why this should be so. Can't I >> access/replace builtins.__import__ from C/C++? > > Sure, but then you need to replace import.c, just as it says. Not in this case: if all you want is notification, surely you can call the original __import__ to do the work... Cheers, M. -- In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it. -- Tim Peters, 16 Sep 93 From fincher.8@osu.edu Sun Apr 20 22:53:15 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 17:53:15 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: <eppstein-C048B2.12043520042003@main.gmane.org> References: <20030419224110.GB2460@barsoom.org> <2mu1ctnige.fsf@starship.python.net> <eppstein-C048B2.12043520042003@main.gmane.org> Message-ID: <200304201753.18059.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 03:04 pm, David Eppstein wrote: > See <http://tinyurl.com/9x6d> for some tests indicating that using dict for > fifo is a slow way to go. That's definitely an inadequate test. 
First,if I read correctly, the test function doesn't test the plain list or array.array('i') as fifos, it tests them as a lifos (using simple .append(elt) and .pop()). Second, it never allows the fifo to have a size greater than 1, which completely negates the O(N) disadvantage of simple list-based implementations. Change the test function's for loops to this: for i in xrange(iterations): fifo.append(i) for i in xrange(iterations): j = fifo.pop() And you'll have a much more accurate comparison of the relative speed of the queues, taking into account naive list implementations' O(N) dequeue. I've written my own speed comparison using timeit.py. It's available at <http://www.cis.ohio-state.edu/~fincher/fifo_comparison.py>. Interestingly enough, the amortized-time 2-list approach is faster than all the other approaches for n elements somewhere between 100 and 1000. Here are my results with Python 2.2: 1 ListSubclassFifo 0.000233 1 DictSubclassFifo 0.000419 1 O1ListSubclassFifo 0.000350 10 ListSubclassFifo 0.001200 10 DictSubclassFifo 0.002814 10 O1ListSubclassFifo 0.001546 100 ListSubclassFifo 0.010613 100 DictSubclassFifo 0.028463 100 O1ListSubclassFifo 0.012658 1000 ListSubclassFifo 0.174211 1000 DictSubclassFifo 0.294973 1000 O1ListSubclassFifo 0.121407 10000 ListSubclassFifo 8.536460 10000 DictSubclassFifo 3.056266 10000 O1ListSubclassFifo 1.224752 (The O1ListSubclassFifo uses the standard (at least standard in functional programming :)) implementation technique of using two singly-linked lists, one for the front of the queue and another for the back of the queue.) Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+oxbLqkDiu+Bs+JIRAgdZAJ9xiAkwpjDylj8aiAqDFL8Jm5zNTgCfU7nU kMThW2eItzfr5pXjMf2P0Y8= =9Tu7 -----END PGP SIGNATURE----- From guido@python.org Sun Apr 20 21:54:16 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 16:54:16 -0400 Subject: [Python-Dev] Re: FIFO data structure? 
In-Reply-To: "Your message of Sun, 20 Apr 2003 12:04:35 PDT." <eppstein-C048B2.12043520042003@main.gmane.org> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> <2mu1ctnige.fsf@starship.python.net> <eppstein-C048B2.12043520042003@main.gmane.org> Message-ID: <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> [David Eppstein] > See <http://tinyurl.com/9x6d> for some tests indicating that using > dict for fifo is a slow way to go. I was just going to say that I was disappointed that there was discussion about O(1) vs. O(N) but no actual performance measurements. But a comment on David's measurements: they assume the queue is empty. What happens if the queue has an average of N elements, for various N? At what point does the dict version overtake the list version? Also ask yourself the following questions. How much time are you paying for the overhead of using a class vs. using a list directly? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 20 21:59:30 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 16:59:30 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Sun, 20 Apr 2003 01:20:57 PDT." <3EA25869.6070404@noah.org> References: <3EA25869.6070404@noah.org> Message-ID: <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> > Recently I realized that there is no easy way to > walk a directory tree and rename each directory and file. > The standard os.path.walk() function does a breadth first walk. This idea has merit, although I'm not sure I'd call this depth first; it's more a matter of pre-order vs. post-order, isn't it? But I ask two questions: - How often does one need this? - When needed, how hard is it to hand-code a directory walk? It's not like the body of the walk() function is rocket science. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 20 22:20:03 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 17:20:03 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: "Your message of Sun, 20 Apr 2003 11:07:24 EDT." <847k9pp5qr.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> Message-ID: <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> > I think I need a way to temporarily (from 'C'), arrange to be notified > just before and just after a new extension module is loaded. Is this > possible? I didn't see anything obvious in the source. BTW, I'd be > just as happy if it were possible to do the same thing for any module > (i.e., not discriminating between extension and pure python modules). I think Aahz is slowly leading you in the right direction: you can override __import__ with something that calls your pre-hook, then the original __import__, then your post_hook. I see no problem with doing this from C except that it's a bit verbose. --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8@osu.edu Sun Apr 20 23:21:02 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 18:21:02 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> References: <20030419224110.GB2460@barsoom.org> <eppstein-C048B2.12043520042003@main.gmane.org> <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304201821.03771.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 04:54 pm, Guido van Rossum wrote: > Also ask yourself the following questions. How much time are you > paying for the overhead of using a class vs. using a list directly? 
I imagine the object would eventually be written in C (probably by someone more experienced than myself, but I could do it if need be), when that overhead shouldn't matter. But even with a pure-Python implementation, as noted in my other email, the fastest O(1) implementation outran the naive list implementation (granted it was wrapped in a class to maintain the same interface) somewhere between 100 and 1000 elements. I could find out the average place at which the O(1) implementation becomes faster, if you're interested. Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+ox1OqkDiu+Bs+JIRAhvvAJ9gHSRpZmf8F2tCsqK40uSPqIoCMACeM5lY k7FInBxUdA3MF/q/Hl4U45U= =lb0T -----END PGP SIGNATURE----- From guido@python.org Sun Apr 20 22:31:03 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 17:31:03 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: "Your message of Sun, 20 Apr 2003 18:21:02 EDT." <200304201821.03771.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <eppstein-C048B2.12043520042003@main.gmane.org> <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> <200304201821.03771.fincher.8@osu.edu> Message-ID: <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net> [Guido] > > Also ask yourself the following questions. How much time are you > > paying for the overhead of using a class vs. using a list directly? [Jeremy Fincher] > I imagine the object would eventually be written in C (probably by someone > more experienced than myself, but I could do it if need be), when that > overhead shouldn't matter. But even with a pure-Python implementation, as > noted in my other email, the fastest O(1) implementation outran the naive > list implementation (granted it was wrapped in a class to maintain the same > interface) somewhere between 100 and 1000 elements. I could find out the > average place at which the O(1) implementation becomes faster, if you're > interested. 
I have to think about this more. ATM I'm inclined to say that this is relatively uncommon, and it's not that hard to come up with an efficient implementation. Python's philosophy about data types is that a few versatile data types (list, dict) get most of the attention because they are re-usable in so many places. When you get to other algorithms, there is such a variety that it's hard to imagine putting them all in the standard library; instead, it's easy to roll your own built out of the standard ones.

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (FreeBSD)
>
> iD8DBQE+ox1OqkDiu+Bs+JIRAhvvAJ9gHSRpZmf8F2tCsqK40uSPqIoCMACeM5lY
> k7FInBxUdA3MF/q/Hl4U45U=
> =lb0T
> -----END PGP SIGNATURE-----

I know what this is, but I don't see the point. I don't know who you are (don't think we've ever met) and I respond based on your words, not on who wrote them. So what's the point?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fincher.8@osu.edu Mon Apr 21 00:01:26 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 19:01:26 -0400
Subject: [Python-Dev] Re: FIFO data structure?
In-Reply-To: <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net>
References: <20030419224110.GB2460@barsoom.org> <200304201821.03771.fincher.8@osu.edu> <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200304201901.26269.fincher.8@osu.edu>

On Sunday 20 April 2003 05:31 pm, Guido van Rossum wrote:
> I have to think about this more. ATM I'm inclined to say that this is
> relatively uncommon, and it's not that hard to come up with an
> efficient implementation. Python's philosophy about data types is
> that a few versatile data types (list, dict) get most of the attention
> because they are re-usable in so many places. When you get to other
> algorithms, there is such a variety that it's hard to imagine putting
> them all in the standard library; instead, it's easy to roll your own
> built out of the standard ones.
Aside from the efficiency improvements, I like the self-documenting nature of using .enqueue and .dequeue methods instead of .append and .pop(0). But I see your point.

> I know what this is, but I don't see the point. I don't know who you
> are (don't think we've ever met) and I respond based on your words,
> not on who wrote them. So what's the point?

I just had my client set up to sign messages automatically; I'll disable it :)

Jeremy

From DavidA@ActiveState.com Mon Apr 21 01:49:56 2003
From: DavidA@ActiveState.com (David Ascher)
Date: Sun, 20 Apr 2003 17:49:56 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3EA34034.9060109@ActiveState.com>

Guido van Rossum wrote:
>> Recently I realized that there is no easy way to
>> walk a directory tree and rename each directory and file.
>> The standard os.path.walk() function does a breadth first walk.
>
> This idea has merit, although I'm not sure I'd call this depth first;
> it's more a matter of pre-order vs. post-order, isn't it?
>
> But I ask two questions:
>
> - How often does one need this?
>
> - When needed, how hard is it to hand-code a directory walk? It's not
> like the body of the walk() function is rocket science.

That's hardly the point of improving the standard library, though, is it? I'm all for putting the kitchen sink in there, especially if it originates with a use case ("I had some dishes to wash..." ;-)

--david

From guido@python.org Mon Apr 21 02:01:53 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 20 Apr 2003 21:01:53 -0400
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: "Your message of Sun, 20 Apr 2003 17:49:56 PDT."
<3EA34034.9060109@ActiveState.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> Message-ID: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> > > - When needed, how hard is it to hand-code a directory walk? It's not > > like the body of the walk() function is rocket science. > > That's hardly the point of improving the standard library, though, is > it? I'm all for putting the kitchen sink in there, especially if it > originates with a use case ("I had some dishes to wash..." ;-) But if I had to do it over again, I wouldn't have added walk() in the current form. I often find it harder to fit a particular program's needs in the API offered by walk() than it is to reimplement the walk myself. That's why I'm concerned about adding to it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 01:58:06 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 20:58:06 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: "Your message of Sun, 20 Apr 2003 10:58:07 EDT." <20030420105807.C15881@localhost.localdomain> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> Message-ID: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Thanks to all for a good and quick discussion! I'm swayed by Alex's argument that a simple sum() builtin answers a lot of recurring questions, so I'd like to add it. I've never liked reduce() -- in its full generality it causes hard to understand code, and I'm glad to see sum() remove probably 80% of the need for it. I like sum() best as the name -- that's what it's called in other systems. 
I'm not too concerned about the number of builtins (we should deprecate some anyway to make room for new ones). I'm not too worried that people will ask for prod() as well. And if they do, maybe we can give them that too; there's not much else along the same lines (bitwise or/and; ha ha ha) so even if the slope may be a bit slippery, I'm not worried about sliding too far.

I don't think the signature should be extended to match min() and max() -- min(a, b) serves a real purpose, but sum(a, b) is just a redundant way of saying a+b, and ditto for sum(a,b,c) etc.

There's a bunch of statistics functions (avg or mean, sdev etc.) that should go in a statistics package or module together with more advanced statistics stuff -- it would be a good idea to form a working group or SIG to design such a thing with an eye towards usability, power, and avoiding traps for newbies.

Finally, there's the question of what sum() of an empty sequence should be. There are several ways to force it: you can write sum(L or [0]) (which avoids the cost of copying in sum(L + [0])), or we can give sum() an optional second argument. But still, what should sum([]) do? I'm sure that the newbies who are asking for it would be surprised by anything except sum([]) == 0, since they probably want to sum a list of numbers, and occasionally (albeit through a bug in their program :-) the list will be empty. But that means that summing a sequence of strings ends up with a strange end case. So perhaps raising an exception for an empty sequence, like min() and max(), is better: "In the face of ambiguity, refuse the temptation to guess." An optional second argument can then be used to specify a starting point for the summation.
The semantics of this argument should be the same as for reduce():

    sum(S, x) == sum([x] + list(S))

and hence

    sum(["a", "b"], "x") == "xab"

(A minority view that I can't quite shake off: since the name sum() strongly suggests it's summing up numbers, sum([]) should be 0 and no second argument is allowed. I find using sum() for a sequence of strings a bit weird anyway, and will probably continue to write "".join(S) for that case.)

Alex, care to send in your patch?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Mon Apr 21 02:10:19 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 21:10:19 -0400
Subject: [Python-Dev] Re: FIFO data structure?
In-Reply-To: <200304201753.18059.fincher.8@osu.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEKBEDAB.tim.one@comcast.net>

[Jeremy Fincher, on <http://tinyurl.com/9x6d> ]
> That's definitely an inadequate test. First, if I read correctly,
> the test function doesn't test the plain list or array.array('i') as
> fifos, it tests them as lifos (using simple .append(elt) and .pop()).

That's right, alas. Mr. Delaney was implementing stacks there, and just calling them fifos.

> Second, it never allows the fifo to have a size greater than 1, which
> completely negates the O(N) disadvantage of simple list-based
> implementations.

Yup.

> ...
> <http://www.cis.ohio-state.edu/~fincher/fifo_comparison.py>.
> ...
> (The O1ListSubclassFifo uses the standard (at least standard in
> functional programming :)) implementation technique of using two
> singly-linked lists, one for the front of the queue and another for the
> back of the queue.)
The Dark Force has seduced you there:

    class O1ListSubclassFifo(list):
        __slots__ = ('back',)

        def __init__(self):
            self.back = []

        def enqueue(self, elt):
            self.back.append(elt)

        def dequeue(self):
            if self:
                return self.pop()
            else:
                self.back.reverse()
                self[:] = self.back
                self.back = []
                return self.pop()

That is, you're subclassing merely to reuse implementation, not because you want to claim that O1ListSubclassFifo is-a list. It's better not to subclass list, and use two lists via has-a instead, say self.front and self.back. Then the O(N)

    self[:] = self.back

can be replaced by the O(1) (for example)

    self.front = self.back

Of course, this is Python <wink>, so it may not actually be faster that way: you save some C-speed list copies, but at the cost of more-expensive Python-speed dereferencing ("self.pop" vs "self.front.pop"). But even if it's slower, it's better not to pretend this flavor of FIFO is-a list (e.g., someone doing len(), pop(), append() on one of these instances is going to get a bizarre result).

From tim.one@comcast.net Mon Apr 21 02:31:04 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 21:31:04 -0400
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <20030420183005.GB8449@barsoom.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>

[Agthorr]
> I actually just wrote a modification to Queue that is O(1). There's
> no change to the interface, so it doesn't require adding a new data
> structure.
>
> I have the code here:
> http://www.cs.uoregon.edu/~agthorr/Queue.py
>
> The only changes are near the bottom of the file, beginning with the
> _init() function. My implementation uses Python lists, but it uses
> them in a smarter way than the existing Queue implementation.
>
> I'll submit a patch to SourceForge in a day or two.

I'm opposed to this. The purpose of Queue is to mediate communication among threads, and a Queue.Queue rarely gets large because of its intended applications.
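The has-a variant Tim describes above -- two plain lists held as attributes rather than subclassing list -- works out like this. A minimal sketch with illustrative names (`front`/`back` follow Tim's suggestion; the class name is invented):

```python
class TwoListFifo:
    """O(1) amortized FIFO built from two plain lists (has-a, not is-a)."""

    def __init__(self):
        self.front = []  # items ready to dequeue, in reverse arrival order
        self.back = []   # newly enqueued items, in arrival order

    def enqueue(self, elt):
        self.back.append(elt)

    def dequeue(self):
        if not self.front:
            # O(1) pointer swap instead of the O(N) self[:] = self.back copy
            self.back.reverse()
            self.front = self.back
            self.back = []
        return self.front.pop()

    def __len__(self):
        return len(self.front) + len(self.back)

f = TwoListFifo()
for i in range(5):
    f.enqueue(i)
order = [f.dequeue() for _ in range(5)]
assert order == [0, 1, 2, 3, 4]  # FIFO order preserved
```

Each element is appended once, reversed once, and popped once, so the amortized cost per operation is O(1), without pretending the queue is-a list.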
As other recent timing posts have shown, you simply can't beat the list.append + list.pop(0) approach until a queue gets quite large (relative to the intended purpose of a Queue.Queue). If you have an unusual application for a Queue.Queue where it's actually faster to do a circular-buffer gimmick (and don't believe that you do before you time it), then, as the comments say, you're invited to *subclass* Queue.Queue, and override as many of the six queue-implementation methods at the bottom of the class as you believe will be helpful. It's not helpful to change the *base* implementation of Queue.Queue for an O() advantage swamped by increased overhead at typical queue sizes. From nas@python.ca Mon Apr 21 02:48:51 2003 From: nas@python.ca (Neil Schemenauer) Date: Sun, 20 Apr 2003 18:48:51 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030421014851.GB18971@glacier.arctrix.com> Guido van Rossum wrote: > But if I had to do it over again, I wouldn't have added walk() in the > current form. I think it's the perfect place for a generator. 
Neil From pinard@iro.umontreal.ca Mon Apr 21 03:14:02 2003 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois_Pinard?=) Date: 20 Apr 2003 22:14:02 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <oqd6jgfvh1.fsf@titan.progiciels-bpi.ca> [Guido van Rossum] > But if I had to do it over again, I wouldn't have added walk() in the > current form. I often find it harder to fit a particular program's > needs in the API offered by walk() than it is to reimplement the walk > myself. I do not much use `os.path.walk' myself. It is so simple to write a small walking loop with a stack of unseen directories, and in practice, there is a wide range of ways and reasons to walk a directory hierarchy, some of which do not fit nicely in the current `os.path.walk' specifications. > That's why I'm concerned about adding to it. The addition of generators to Python also changed the picture somewhat, in this area. It is often convenient to use a generator for a particular walk. -- François Pinard http://www.iro.umontreal.ca/~pinard From tim.one@comcast.net Mon Apr 21 03:12:42 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 22:12:42 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> [Guido] > But if I had to do it over again, I wouldn't have added walk() in the > current form. I often find it harder to fit a particular program's > needs in the API offered by walk() than it is to reimplement the walk > myself. That's why I'm concerned about adding to it. 
We also have another possibility now: a pathname generator. Then the funky callback and mystery-arg ("what's the purpose of the 'arg' arg?" is a semi-FAQ on c.l.py) bits can go away, and client code could look like:

    for path in walk(root):
        # filter, if you like, via 'if whatever: continue'
        # accumulate state, if you like, in local vars

Or it could look like

    for top, names in walk(root):

or

    for top, dirnames, nondirnames in walk(root):

Here's an implementation of the last flavor. Besides the more-or-less obvious topdown argument, note a subtlety: when topdown is True, the caller can prune the search by mutating the dirs list yielded to it. For example,

    for top, dirs, nondirs in walk('C:/code/python'):
        print top, dirs, len(nondirs)
        if 'CVS' in dirs:
            dirs.remove('CVS')

doesn't descend into CVS subdirectories.

    def walk(top, topdown=True):
        import os
        try:
            names = os.listdir(top)
        except os.error:
            return
        exceptions = ('.', '..')
        dirs, nondirs = [], []
        for name in names:
            if name in exceptions:
                continue
            fullname = os.path.join(top, name)
            if os.path.isdir(fullname):
                dirs.append(name)
            else:
                nondirs.append(name)
        if topdown:
            yield top, dirs, nondirs
        for name in dirs:
            # pass topdown along, so a post-order walk recurses post-order too
            for x in walk(os.path.join(top, name), topdown):
                yield x
        if not topdown:
            yield top, dirs, nondirs

From barry@python.org Mon Apr 21 03:23:47 2003
From: barry@python.org (Barry Warsaw)
Date: 20 Apr 2003 22:23:47 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net>
References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <1050891827.26667.1.camel@geddy>

On Sun, 2003-04-20 at 20:58, Guido van Rossum wrote:
> (A minority view that I can't quite shake off: since the name sum()
> strongly suggests it's summing up numbers, sum([]) should be 0 and no
> second argument is allowed. I find using sum() for a sequence of
> strings a bit weird anyway, and will probably continue to write
> "".join(S) for that case.)

I agree. I'd rather see sum() constrain itself to numbers and sum([]) == 0. Then I don't see a need for a second argument. "Summing" a list of strings doesn't make much sense to me.

-Barry

From tim.one@comcast.net Mon Apr 21 03:24:52 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 22:24:52 -0400
Subject: [Python-Dev] New re failures on Windows
In-Reply-To: <20030420075453.GA9504@localhost.distro.conectiva>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKHEDAB.tim.one@comcast.net>

[Gustavo Niemeyer]
> Should be working now. Sorry about the trouble. I should have fixed that
> before submitting the first version.

Confirming that my problems went away. Thank you!

From gward@python.net Mon Apr 21 03:47:43 2003
From: gward@python.net (Greg Ward)
Date: Sun, 20 Apr 2003 22:47:43 -0400
Subject: [Python-Dev] Bug/feature/patch policy for optparse.py
Message-ID: <20030421024743.GA3911@cthulhu.gerg.ca>

Hi all --

I've just thrown together Optik 1.4.1, and in turn checked in rev 1.3 of Lib/optparse.py. From the optparse docstring:

"""
If you have problems with this module, please do not file bugs,
patches, or feature requests with Python; instead, use Optik's
SourceForge project page:
http://sourceforge.net/projects/optik
For support, use the optik-users@lists.sourceforge.net mailing list
(http://lists.sourceforge.net/lists/listinfo/optik-users).
"""

and from a comment right after the docstring:

# Python developers: please do not make changes to this file, since
# it is automatically generated from the Optik source code.

Does this policy seem reasonable to everyone? And, more importantly, can you all please try to respect it when you find bugs in or want to add features to optparse.py? Thanks!
Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Never try to outstubborn a cat. From aahz@pythoncraft.com Mon Apr 21 03:51:54 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 22:51:54 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <1050891827.26667.1.camel@geddy> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <1050891827.26667.1.camel@geddy> Message-ID: <20030421025154.GA4542@panix.com> On Sun, Apr 20, 2003, Barry Warsaw wrote: > On Sun, 2003-04-20 at 20:58, Guido van Rossum wrote: >> >> (A minority view that I can't quite shake off: since the name sum() >> strongly suggests it's summing up numbers, sum([]) should be 0 and no >> second argument is allowed. I find using sum() for a sequence of >> strings a bit weird anyway, and will probably continue to write >> "".join(S) for that case.) > > I agree. I'd rather see sum() constrain itself to numbers and sum([]) > == 0. Then I don't see a need for second argument. "Summing" a list of > strings doesn't make much sense to me. Problem is, what *kind* of number? While ints are in general easily promotable (especially int 0), I'd prefer to make things explicit. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From tim.one@comcast.net Mon Apr 21 03:48:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 22:48:43 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> [Guido, making right decisions again, and I'll correct the details <wink>] > ... 
> I've never liked reduce() -- in its full generality it causes hard to > understand code, and I'm glad to see sum() remove probably 80% of the > need for it. 97.2%, actually. > ... > I'm not too worried that people will ask for prod() as well. And if > they do, maybe we can give them that too; They will ask, but let's resist that one. > there's not much else along the same lines (bitwise or/and; ha ha ha) xor reduction is the key to the winning strategy for the game of Nim, so expect intense pressure from the computer Nim camp. > ... > There's a bunch of statistics functions (avg or mean, sdev etc.) that > should go in a statistics package or module together with more > advanced statistics stuff -- it would be a good idea to form a working > group or SIG to design such a thing with an eye towards usability, > power, and avoiding traps for newbies. Very big job, unless you leave the "advanced" stuff out. Note that there are many stats packages available for Python already, although some build on NumPy. > ... > (A minority view that I can't quite shake off: since the name sum() > strongly suggests it's summing up numbers, sum([]) should be 0 and no > second argument is allowed. That's my view, so it's quite possibly the correct view <wink>. Numbers is numbers. sum(sequence_of_strings) hurts my brain, just as much as if we had a builtin concat() function for pasting together a sequence of strings, and someone argued that concat(sequence_of_numbers) should return their sum "because they're both related to the '+' glyph in a syntactical way" (that they both relate to methods named __add__ is beyond instant explanation to a newbie). 
From davida@ActiveState.com Mon Apr 21 04:28:13 2003 From: davida@ActiveState.com (David Ascher) Date: Sun, 20 Apr 2003 20:28:13 -0700 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> Message-ID: <3EA3654D.3070402@activestate.com> Tim Peters wrote: >>There's a bunch of statistics functions (avg or mean, sdev etc.) that >>should go in a statistics package or module together with more >>advanced statistics stuff -- it would be a good idea to form a working >>group or SIG to design such a thing with an eye towards usability, >>power, and avoiding traps for newbies. >> >> > >Very big job, unless you leave the "advanced" stuff out. Note that there >are many stats packages available for Python already, although some build on >NumPy. > > Scipy's stats package is more complete than many people expect. I would argue strongly against putting a 'cheap stats' package in the core, since building one such packages takes a huge amount of work, doing it twice is silly. At least the first version of the stats package now in chaco used to not require numeric, although I think that requirement is a red herring in practice. >That's my view, so it's quite possibly the correct view <wink>. Numbers is >numbers. sum(sequence_of_strings) hurts my brain, just as much as if we had >a builtin concat() function for pasting together a sequence of strings, and >someone argued that concat(sequence_of_numbers) should return their sum >"because they're both related to the '+' glyph in a syntactical way" (that >they both relate to methods named __add__ is beyond instant explanation to a >newbie). > > +1. Concatenation using + always seemed too Perlish for me, and Perl doesn't even do it! 
=) From tim.one@comcast.net Mon Apr 21 04:29:35 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 23:29:35 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030421025154.GA4542@panix.com> Message-ID: <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> [Aahz] > Problem is, what *kind* of number? While ints are in general easily > promotable (especially int 0), I'd prefer to make things explicit. I'd be OK with changing the signature to sum(iterable, empty=0) as 0 cannot in fact be auto-promoted to some reasonable number-like objects. For example, summing a list of datetime.timedelta objects seems a quite natural application (e.g., picture a timesheet app), but supplying int 0 generally blows up when a timedelta is expected. From aahz@pythoncraft.com Mon Apr 21 04:34:38 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 23:34:38 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> References: <20030421025154.GA4542@panix.com> <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> Message-ID: <20030421033438.GA7942@panix.com> On Sun, Apr 20, 2003, Tim Peters wrote: > [Aahz] >> >> Problem is, what *kind* of number? While ints are in general easily >> promotable (especially int 0), I'd prefer to make things explicit. > > I'd be OK with changing the signature to > > sum(iterable, empty=0) > > as 0 cannot in fact be auto-promoted to some reasonable number-like > objects. For example, summing a list of datetime.timedelta objects > seems a quite natural application (e.g., picture a timesheet app), but > supplying int 0 generally blows up when a timedelta is expected. +1 That makes the canonical usage clear, while not preventing people from doing stupid things. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
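Tim's timedelta case above is easy to make concrete. A small sketch (the shift values are invented; the second argument plays the role of Tim's `empty`, and int 0 indeed blows up when a timedelta is expected):

```python
from datetime import timedelta

# A timesheet-style sum: int 0 cannot be auto-promoted to a timedelta,
# so the caller supplies the empty/start value explicitly.
shifts = [timedelta(hours=8), timedelta(hours=7, minutes=30), timedelta(hours=9)]
total = sum(shifts, timedelta(0))
assert total == timedelta(hours=24, minutes=30)
```

Without the explicit timedelta start, the summation begins with int 0 and the first `0 + timedelta(...)` addition raises TypeError -- exactly the promotion problem Aahz raises.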
From eppstein@ics.uci.edu Mon Apr 21 04:35:24 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sun, 20 Apr 2003 20:35:24 -0700 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <1050891827.26667.1.camel@geddy> <20030421025154.GA4542@panix.com> Message-ID: <eppstein-B5BDB5.20352420042003@main.gmane.org> In article <20030421025154.GA4542@panix.com>, Aahz <aahz@pythoncraft.com> wrote: > > I agree. I'd rather see sum() constrain itself to numbers and sum([]) > > == 0. Then I don't see a need for second argument. "Summing" a list of > > strings doesn't make much sense to me. > > Problem is, what *kind* of number? While ints are in general easily > promotable (especially int 0), I'd prefer to make things explicit. Maybe make sum(L) always equivalent to reduce(operator.add,L,0)? Then "number" here would mean something that can be added to 0, allowing any kind of user-defined number type to work (e.g. I recently wanted a sum function for Keith Briggs' "xr" package for exact computations over computable reals). This would mean that attempts to abuse sum to concatenate strings would raise TypeError. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From cnetzer@mail.arc.nasa.gov Mon Apr 21 05:33:32 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 20 Apr 2003 21:33:32 -0700 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: <3EA3654D.3070402@activestate.com> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> Message-ID: <1050899612.591.21.camel@sayge.arc.nasa.gov> On Sun, 2003-04-20 at 20:28, David Ascher wrote: > Tim Peters wrote: > > >>There's a bunch of statistics functions (avg or mean, sdev etc.) that > >>should go in a statistics package or module together with more > >>advanced statistics stuff -- it would be a good idea to form a working > >>group or SIG to design such a thing with an eye towards usability, > >>power, and avoiding traps for newbies. +1 > >Very big job, unless you leave the "advanced" stuff out. Note that there > >are many stats packages available for Python already, although some build on > >NumPy. > > > Scipy's stats package is more complete than many people expect. I was going to suggest that we consider adopting Gary Strangman's stats.py package as the foundation for inclusion. This is the package that SciPy chose to include (with modifications of the namespace and API to fit the SciPy scheme of things). I've used it, and it is a very full featured package. I was actually kind of saddened that Gary had done all the work, since after getting my Master's degree, I had considered implementing such a module myself (for reasons of learning). But Gary's work is quite comprehensive, and well written, IMO (well tested, few external dependencies, etc. I just drop it in a working directory when I need it on a new system.) 
http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python.html Gary allowed SciPy to adopt his package under the BSD license, so I'm sure he would be amenable to discussing any licensing issues that may arise (the original package is GPL). It works on Python lists, as well as Numeric arrays. I'd be happy to take up the efforts of approaching Gary about whether he would consider "donating" his module for the standard lib, after any changes a working group or SIG might suggest (or require). Possibly there are some namespace issues (actually, he has a companion "pstat" module, that is a standard library module name conflict I'd wanted fixed). Other than ensuring it works on the normal python sequences, and removing any dependencies on NumPy or Numeric (while hopefully allowing it to integrate well with either), and possibly trying to reconcile name issues with SciPy (if at all feasible), it may be definitely doable by 2.4. I'm happy to volunteer some time to the effort. I think it would be quite worthwhile. 
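For scale, the "avg or mean, sdev etc." basics named in this thread are tiny; a hand-rolled sketch, with invented names and a sample/population switch added for illustration (today's standard library covers this ground in the statistics module):

```python
import math

def mean(data):
    data = list(data)
    if not data:
        raise ValueError("mean of empty sequence")
    return sum(data) / len(data)

def sdev(data, sample=True):
    # Sample standard deviation by default (divide by n-1), as most
    # stats packages do; sample=False gives the population form.
    data = list(data)
    m = mean(data)
    n = len(data) - 1 if sample else len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / n)

assert mean([1, 2, 3, 4]) == 2.5
assert abs(sdev([2, 4, 4, 4, 5, 5, 7, 9], sample=False) - 2.0) < 1e-9
```

The hard part a working group would face is everything beyond these: distributions, tests, and the newbie traps (sample vs. population, numerical stability) that a comprehensive package like Gary's already handles.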
-- Chad Netzer (any opinion expressed is my own and not NASA's or my employer's) From andymac@bullseye.apana.org.au Mon Apr 21 02:01:34 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Mon, 21 Apr 2003 12:01:34 +1100 (edt) Subject: [Python-Dev] New re failures on Windows In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Message-ID: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> On Sat, 19 Apr 2003, Tim Peters wrote: > test_sre is dying with a segfault: > > """ > C:\Code\python\PCbuild>python ../lib/test/test_sre.py > Running tests on character literals > Running tests on sre.search and sre.match > sre.match(r'(a)?a','a').lastindex FAILED > expected None > got result 1 > sre.match(r'(a)(b)?b','ab').lastindex FAILED > expected 1 > got result 2 > sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED > expected 'a' > got result 'b' > Running tests on sre.sub > Running tests on symbolic references > Running tests on sre.subn > Running tests on sre.split > Running tests on sre.findall > Running tests on sre.finditer > Running tests on sre.match > Running tests on sre.escape > Running tests on sre.Scanner > Pickling a SRE_Pattern instance > Test engine limitations > """ > > and it dies with a segfault there. Unfortunately, test_sre doesn't die in a > debug build. Compiler optimisation? I've been trying to get a handle on this for the last couple of days, with various versions of gcc on FreeBSD and OS/2 not liking _sre since Guido checked patch #720991 in on April 14. The failures all occur after the "Running tests on sre.search and sre.match" phase of test_sre. 
What I've been able to delineate thus far:

test_sre on FreeBSD 4.[47]:
    gcc 2.95.[34]:  -O3 => bus error,  -O2 => Ok
    gcc 3.2.2:      -O[023] => bus error,  -Os => Ok

test_sre on OS/2:
    gcc 2.8.1:      -O2 => Ok
    pgcc 2.95.2:    -O3 => Ok
    gcc 3.2.1:      -O[23] => SYS3171,  -O[0s] => Ok
    OpenWatcom 1.0 with all optimisations enabled => Ok

Now, the docs for SYS3171 on OS/2 say "EXPLANATION: The process was terminated without running exception handlers because there was not enough room left on the stack to dispatch the exception. This is typically caused by exceptions occurring in exception handlers." I did bump the stack from 1M to 2M with no effect. I'm not concerned by the failures on OS/2 as I'm not using autoconf there, and I can special-case _sre.c easily. I am concerned about the failures on FreeBSD. It looks to me as though the only viable option is to just special case FreeBSD/gcc in configure.in and use -Os instead of -O3. I've been assuming that test_sre has passed with gcc 3.2.x -O3 on Linux since that checkin. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From martin@v.loewis.de Mon Apr 21 08:06:29 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 21 Apr 2003 09:06:29 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <m3y9242utm.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > But still, what should sum([]) do? It should raise a ValueError("no values to sum"). 
In practice, I expect it won't matter, because users will typically have values to sum. If they don't, telling them to write sum(L or [0]) is easy enough. There should be preferably only one obvious way to do it. Regards, Martin From martin@v.loewis.de Mon Apr 21 08:21:42 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 21 Apr 2003 09:21:42 +0200 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> References: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> Message-ID: <m3u1cs2u49.fsf@mira.informatik.hu-berlin.de> Andrew MacIntyre <andymac@bullseye.apana.org.au> writes: > The failures all occur after the "Running tests on sre.search and > sre.match" phase of test_sre. Instead of trying various compilers hoping that the problem goes away, I recommend that you try to narrow down the test case that fails. Regards, Martin From aleax@aleax.it Mon Apr 21 09:29:28 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 10:29:28 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <m3y9242utm.fsf@mira.informatik.hu-berlin.de> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <m3y9242utm.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304211029.28802.aleax@aleax.it> On Monday 21 April 2003 09:06 am, Martin v. Löwis wrote: > Guido van Rossum <guido@python.org> writes: > > But still, what should sum([]) do? > > It should raise a ValueError("no values to sum"). In practice, I > expect it won't matter, because users will typically have values to > sum. If they don't, telling them to write sum(L or [0]) is easy > enough. There should be preferably only one obvious way to do it. 
I like this a lot -- particularly because it's exactly what I teach people now for max and min (except that in the cases of max and min there's the extra complication for the user of choosing WHAT he or she wants as the result for an empty list, while in the case of sum the user's life will be easier). Alex From aleax@aleax.it Mon Apr 21 09:52:55 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 10:52:55 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304211052.55557.aleax@aleax.it> On Monday 21 April 2003 02:58 am, Guido van Rossum wrote: ... > anything except sum([]) == 0, since they probably want to sum a list > of numbers, and occasionally (albeit through a bug in their program > :-) the list will be empty. > But that means that summing a sequence of Errors should never pass silently, unless explicitly silenced. I thus think that the sum of an empty sequence should raise a ValueError (just like the max or min of an empty sequence) and the idiom sum(L or [0]) should be taught to "sum up a list of numbers that might be empty". > strings ends up with a strange end case. So perhaps raising an > exception for an empty sequence, like min() and max(), is better: "In > the face of ambiguity, refuse the temptation to guess." > An optional Yes! > Alex, care to send in your patch? Aye aye, cap'n -- now that you've crossed the i's and dotted the t's I'll arrange the complete patch with tests and docs and submit it forthwith. 
Alex From aleax@aleax.it Mon Apr 21 11:52:32 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 12:52:32 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211052.55557.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> Message-ID: <200304211252.32948.aleax@aleax.it> On Monday 21 April 2003 10:52 am, Alex Martelli wrote: ... > Aye aye, cap'n -- now that you've crossed the i's and dotted the t's > I'll arrange the complete patch with tests and docs and submit it > forthwith. Done -- patch 724936 on SF, assigned to gvanrossum with priority 7 as you said to do for patches meant for 2.3beta1. As I've remarked in the patch's comments, there's something of a performance hit now with sum(manystrings) wrt ''.join(manystrings):

[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'sum(L)'
10000 loops, best of 3: 174 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' '"".join(L)'
10000 loops, best of 3: 75 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'reduce(operator.add,L)'
1000 loops, best of 3: 1.35e+03 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'tot=""' 'for it in L: tot+=it'
1000 loops, best of 3: 1.33e+03 usec per loop

Nowhere near as bad as the unbounded slowdown with operator.add or the equivalent loop, but still, a solid slowdown of a factor of two. 
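The same comparison can be reproduced in-process with the timeit module; a minimal sketch of just the join-vs-loop pair (absolute numbers vary by machine and interpreter version):

```python
import timeit

# Rebuild the list of 999 short strings inside each timing run's setup,
# as in the shell measurements quoted above.
setup = "L = list(map(str, range(999)))"

join_usec = timeit.timeit('"".join(L)', setup=setup, number=1000) * 1000
loop_usec = timeit.timeit('tot = ""\nfor it in L: tot += it',
                          setup=setup, number=1000) * 1000

print("''.join(L): %.0f usec per loop" % join_usec)
print("+= loop   : %.0f usec per loop" % loop_usec)
```

The relative gap, not the absolute figures, is the point of the measurement.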
Problem is that the argument to sum MIGHT be an iterator (not a "normal" sequence) so sum must save the first item and concat the _PyString_Join of the OTHER items after the first (my unit tests were originally lax and only exercised sum with the argument being a list -- fortunately I beefed up the unit tests as part of preparing the patch for submission, so they caught this, as well as the issue with sum of a sequence mixing unicode and plain string items, which forces sum to use different concatenation code depending on the exact type of the first item...). Reasoning on this, and on "If the implementation is hard to explain, it's a bad idea", I'm starting to doubt my original intuition that "of course" sum should be polymorphic over sequences of any type supporting "+" -- maybe Tim Peters' concept that sum should be restricted to sequences of numbers is sounder -- it's irksome that, of sum's 50 lines of C, 14 should deal with the special case of "sequence of strings" and STILL involve a factor-of-2 performance hit wrt ''.join! However, HOW do we catch attempts to use sum on a sequence of strings while still allowing the use case of a sequence of timedeltas? [maybe a timedelta SHOULD be summable to 0 and we could take the 'summable to 0' as a test of numberhood?-)] I don't know, so, I've submitted the patch as it stands, and I hope somebody can suggest a better solution - I just PRAY that sum won't accept a sequence of strings AND sum them up with + , thus perpetuating the "newbie performance trap" of that idiom!-) Alex From dave@boost-consulting.com Mon Apr 21 12:14:57 2003 From: dave@boost-consulting.com (David Abrahams) Date: Mon, 21 Apr 2003 07:14:57 -0400 Subject: [Python-Dev] Hook Extension Module Import? 
In-Reply-To: <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sun, 20 Apr 2003 17:20:03 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <84brz0nlu6.fsf@boost-consulting.com> Guido van Rossum <guido@python.org> writes: >> I think I need a way to temporarily (from 'C'), arrange to be notified >> just before and just after a new extension module is loaded. Is this >> possible? I didn't see anything obvious in the source. BTW, I'd be >> just as happy if it were possible to do the same thing for any module >> (i.e., not discriminating between extension and pure python modules). > > I think Aahz is slowly leading you in the right direction: you can > override __import__ with something that calls your pre-hook, then the > original __import__, then your post_hook. I see no problem with doing > this from C except that it's a bit verbose. So I take it a doc patch is in order. That section which claims it's impossible is certainly misleading... -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Mon Apr 21 13:04:38 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:04:38 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Sun, 20 Apr 2003 18:48:51 PDT." <20030421014851.GB18971@glacier.arctrix.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> <20030421014851.GB18971@glacier.arctrix.com> Message-ID: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum wrote: > > But if I had to do it over again, I wouldn't have added walk() in the > > current form. > > I think it's the perfect place for a generator. Absolutely! 
So let's try to write something new based on generators, make it flexible enough so that it can handle pre-order or post-order visits, and then phase out os.path.walk(). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:08:07 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:08:07 -0400 Subject: [Python-Dev] Bug/feature/patch policy for optparse.py In-Reply-To: "Your message of Sun, 20 Apr 2003 22:47:43 EDT." <20030421024743.GA3911@cthulhu.gerg.ca> References: <20030421024743.GA3911@cthulhu.gerg.ca> Message-ID: <200304211208.h3LC87O20882@pcp02138704pcs.reston01.va.comcast.net> > Hi all -- I've just thrown together Optik 1.4.1, and in turn checked in > rev 1.3 of Lib/optparse.py. From the optparse docstring:
>
> """
> If you have problems with this module, please do not file bugs,
> patches, or feature requests with Python; instead, use Optik's
> SourceForge project page:
> http://sourceforge.net/projects/optik
>
> For support, use the optik-users@lists.sourceforge.net mailing list
> (http://lists.sourceforge.net/lists/listinfo/optik-users).
> """
>
> and from a comment right after the docstring:
>
> # Python developers: please do not make changes to this file, since
> # it is automatically generated from the Optik source code.
>
> Does this policy seem reasonable to everyone? And, more importantly, > can you all please try to respect it when you find bugs in or want to > add features to optparse.py? Thanks!

Works for me. I expect that occasionally someone will forget this and check in a fix; they will surely be corrected quickly (and without *too* much embarrassment) by other developers. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From jacobs@penguin.theopalgroup.com Mon Apr 21 13:12:25 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 21 Apr 2003 08:12:25 -0400 (EDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211252.32948.aleax@aleax.it> Message-ID: <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> On Mon, 21 Apr 2003, Alex Martelli wrote: > On Monday 21 April 2003 10:52 am, Alex Martelli wrote: > ... > > Aye aye, cap'n -- now that you've crossed the i's and dotted the t's > > I'll arrange the complete patch with tests and docs and submit it > > forthwith. > > Done -- patch 724936 on SF, assigned to gvanrossum with priority 7 > as you said to do for patches meant for 2.3beta1. Just to make sure I understand the desired semantics, is this Python implementation of sum() accurate:

def sum(l):
    '''sum(sequence) -> value

    Returns the sum of a non-empty sequence of numbers (or other objects
    that can be added to each other, such as strings, lists, tuples...).'''

    it = iter(l)
    next = it.next

    try:
        first = next()
    except StopIteration:
        raise ValueError, 'sum() arg is an empty sequence'

    # Special-case sequences of strings, for speed
    if isinstance(first, str):
        try:
            return first + ''.join(it)
        except:
            pass

    try:
        while 1:
            first += next()
    except StopIteration:
        return first

The speed optimization for string sequences is slightly different, but exposes the same fast-path for the vast majority of likely inputs. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From guido@python.org Mon Apr 21 13:26:41 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:26:41 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: "Your message of Mon, 21 Apr 2003 07:14:57 EDT." 
<84brz0nlu6.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> <84brz0nlu6.fsf@boost-consulting.com> Message-ID: <200304211226.h3LCQfc22713@pcp02138704pcs.reston01.va.comcast.net> > So I take it a doc patch is in order. That section which claims it's > impossible is certainly misleading... I have no idea where it says that, so yes, please submit a patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:30:28 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:30:28 -0400 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: "Your message of 20 Apr 2003 21:33:32 PDT." <1050899612.591.21.camel@sayge.arc.nasa.gov> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> Message-ID: <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> Since it already exists as a 3rd party package, we should definitely not try to duplicate the effort. Then the question is, is it enough to point to the 3rd party package or does it deserve to be incorporated into the core? We can't go and incorporate every useful 3rd party package into the core (that's the job of the SUMO distribution project -- which unfortunately seems to have stalled). OTOH, having it in the core, with decent documentation, might prevent naive wannabe-statisticians like myself from misremembering how standard deviation is implemented, or when to use it. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:48:58 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:48:58 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: "Your message of Mon, 21 Apr 2003 12:52:32 +0200." 
<200304211252.32948.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> Message-ID: <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> OK, let me summarize and pronounce. sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I still think it's a matter of getting used to, like indentation), we could add join(seq, delim) as a built-in. VB has one. :-) sum([]) could either return 0 or raise ValueError. I lean towards 0 because that is occasionally useful and reinforces the numeric intention. I think making it return 0 will prevent end-case bugs where a newbie sums a list that is occasionally empty. If we made it an error, I expect that in 99% of the cases the response to that error would be to change the program to make it return 0 if the list is empty, and I can't imagine many bugs caused by choosing 0 over some other numerical zero. Having to teach the idiom sum(S or [0]) is ugly, and this doesn't work if S is an iterator. I appreciate Tim's point of wanting to sum "number-like" objects that can't be added to 0. OTOH if we provide *any* way of providing a different starting point, some creative newbie is going to use sum(list_of_strings, "") instead of "".join(), and be hurt by the performance months later. If we add an optional argument for Tim's use case, it could be used in two different ways: (1) only when the sequence is empty, (2) always used as a starting point. IMO (2) is more useful and more consistent. Here's one suggestion to deal with the sequence_of_strings issue (though maybe too pedantic): explicitly check whether the second argument is a string or unicode object, and in that case raise a TypeError indicating that a numeric value is required and suggesting to use "".join() for summing a sequence of strings. 
So here's a strawman implementation:

def sum(seq, start=0):
    if isinstance(start, basestring):
        raise TypeError, "can't sum strings; use ''.join(seq) instead"
    return reduce(operator.add, seq, start)

Alex, go ahead and implement this! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 14:43:09 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 09:43:09 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: Your message of "Mon, 21 Apr 2003 08:12:25 EDT." <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> References: <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> Message-ID: <200304211343.h3LDh9W21923@odiug.zope.com> > Just to make sure I understand the desired semantics, is this Python > implementation of sum() accurate: We're no longer aiming for this, but let me point out the fatal flaw in this approach:

> def sum(l):
>     '''sum(sequence) -> value
>
>     Returns the sum of a non-empty sequence of numbers (or other objects
>     that can be added to each other, such as strings, lists, tuples...).'''
>
>     it = iter(l)
>     next = it.next
>
>     try:
>         first = next()
>     except StopIteration:
>         raise ValueError, 'sum() arg is an empty sequence'
>
>     # Special-case sequences of strings, for speed
>     if isinstance(first, str):
>         try:
>             return first + ''.join(it)
>         except:
>             pass

Suppose the iterator was iter(["a", "b", "c", 1, 2, 3]). The "a" is held in the variable 'first'. The "".join() code consumes "b", "c" and 1, and then raises an exception. At this point, there's no way to recover the values swallowed by "".join(), so there's no way to continue. 
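The unrecoverable-consumption problem is easy to demonstrate; fragile_sum below is a hypothetical stand-in for the flawed fast path, not the code from the message:

```python
def fragile_sum(seq):
    # Sketch of the flawed optimization: hand the rest of the iterator
    # to "".join() and try to fall back to += if the join fails.
    it = iter(seq)
    first = next(it)
    if isinstance(first, str):
        try:
            return first + "".join(it)
        except TypeError:
            pass  # too late: "".join() already consumed items from it
    total = first
    for item in it:
        total += item
    return total

# "b", "c" and 1 were swallowed by the failed join, so the fallback
# loop sees an exhausted iterator and silently returns just "a".
print(fragile_sum(["a", "b", "c", 1]))  # prints: a
```

Either outcome is wrong: swallowing the items silently loses data, and re-raising reports an error the unoptimized path would not have raised.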
But letting the exception raised by "".join() propagate isn't right either: suppose that instead of [1, 2, 3] the sequence ended with some instances of a class that knows how to add itself to a string: the optimization attempt would cause an error to be thrown that wouldn't have been thrown without the optimization, a big no-no for optimizations.

>     try:
>         while 1:
>             first += next()
>
>     except StopIteration:
>         return first
>
> The speed optimization for string sequences is slightly different, but > exposes the same fast-path for the vast majority of likely inputs.

Of course, it might have been okay to only invoke "".join() if the argument was a *list* of strings. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax@aleax.it Mon Apr 21 16:03:24 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 17:03:24 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304211703.24685.aleax@aleax.it> On Monday 21 April 2003 02:48 pm, Guido van Rossum wrote: > OK, let me summarize and pronounce. > > sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I > still think it's a matter of getting used to, like indentation), we I entirely agree on this. Differently from reduce(operator.add, XX), ''.join(XX) *CAN* be taught quite reasonably to bright beginners without any special math/CS background, in my experience. The noise against ''.join IMHO comes mostly from a crowd of "OO purists" who just don't see WHY it's RIGHT for it to be that way!-) > could add join(seq, delim) as a built-in. VB has one. :-) VB has lots of stuff, but we don't need this one. Please. One obvious way to do it (at least if you are Dutch...!). > sum([]) could either return 0 or raise ValueError. 
> I lean towards 0 > because that is occasionally useful and reinforces the numeric > intention. I think making it return 0 will prevent end-case bugs > where a newbie sums a list that is occasionally empty. If we made it > an error, I expect that in 99% of the cases the response to that error > would be to change the program to make it return 0 if the list is > empty, and I can't imagine many bugs caused by choosing 0 over some > other numerical zero. Having to teach the idiom sum(S or [0]) is > ugly, and this doesn't work if S is an iterator. You're right that S or [0] doesn't work for iterators, AND that bright beginners expect 0 rather than an error (fortunately I have some of those at hand to check with;-). So, sum([])==0 it is. > I appreciate Tim's point of wanting to sum "number-like" objects that > can't be added to 0. OTOH if we provide *any* way of providing a > different starting point, some creative newbie is going to use > sum(list_of_strings, "") instead of "".join(), and be hurt by the > performance months later. Yes yes yes! > If we add an optional argument for Tim's use case, it could be used in > two different ways: (1) only when the sequence is empty, (2) always > used as a starting point. IMO (2) is more useful and more consistent. > > Here's one suggestion to deal with the sequence_of_strings issue > (though maybe too pedantic): explicitly check whether the second > argument is a string or unicode object, and in that case raise a > TypeError indicating that a numeric value is required and suggesting > to use "".join() for summing a sequence of strings. I like this!!! > So here's a strawman implementation:
>
> def sum(seq, start=0):
>     if isinstance(start, basestring):
>         raise TypeError, "can't sum strings; use ''.join(seq) instead"
>     return reduce(operator.add, seq, start)
>
> Alex, go ahead and implement this! Coming right up! 
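The intended semantics can be sketched with a pure-Python stand-in (sum_ is a hypothetical name; the actual built-in ended up as C code):

```python
import functools
import operator

def sum_(seq, start=0):
    # Strawman semantics: refuse string starting points outright,
    # otherwise fold the whole sequence onto start with +.
    if isinstance(start, str):
        raise TypeError("can't sum strings; use ''.join(seq) instead")
    return functools.reduce(operator.add, seq, start)

print(sum_([]))              # 0 -- the empty sum is the numeric zero
print(sum_([1, 2, 3]))       # 6
print(sum_([[1], [2]], []))  # [1, 2] -- any addable start value works
```

Note how the start argument covers the "number-like objects that can't be added to 0" use case: passing an appropriate zero-ish value (an empty list, a zero timedelta) makes the fold well-typed from the first addition.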
Alex From fincher.8@osu.edu Mon Apr 21 17:56:42 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Mon, 21 Apr 2003 12:56:42 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> Message-ID: <200304211256.42839.fincher.8@osu.edu> On Sunday 20 April 2003 10:12 pm, Tim Peters wrote: > if 'CVS' in dirs: > dirs.remove('CVS') This code brought up an interesting question to me: if sets have a .discard method that removes an element without raising KeyError if the element isn't in the set, should lists perhaps have that same method? On another related front, sets (in my Python 2.3a2) raise KeyError on a .remove(elt) when elt isn't in the set. Since sets aren't mappings, should that be a ValueError (like list raises) instead? Jeremy From aahz@pythoncraft.com Mon Apr 21 17:05:19 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 21 Apr 2003 12:05:19 -0400 Subject: [Python-Dev] ''.join() again In-Reply-To: <200304211703.24685.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> Message-ID: <20030421160515.GA26557@panix.com> On Mon, Apr 21, 2003, Alex Martelli wrote: > > I entirely agree on this. Differently from reduce(operator.add, XX), > ''.join(XX) *CAN* be taught quite reasonably to bright beginners > without any special math/CS background, in my experience. The > noise against ''.join IMHO comes mostly from a crowd of "OO > purists" who just don't see WHY it's RIGHT for it to be that way!-) Well, this means it's time for my regular reminder that I'm very far from an OO purist and I still hate ''.join(). 
OTOH, I've been using it recently for some of my own code, and while I'll never change my mind about its visual ugliness, I've got to admit that it has one cardinal virtue: you can never forget what order its arguments belong in. So I'll stop ranting about ''.join() except when people like Alex make sneers about OO purists. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From barry@python.org Mon Apr 21 17:23:39 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 12:23:39 -0400 Subject: [Python-Dev] ''.join() again In-Reply-To: <20030421160515.GA26557@panix.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> Message-ID: <1050942219.30896.11.camel@barry> On Mon, 2003-04-21 at 12:05, Aahz wrote: > Well, this means it's time for my regular reminder that I'm very far > from an OO purist and I still hate ''.join(). OTOH, I've been using it > recently for some of my own code, and while I'll never change my mind > about its visual ugliness, I've got to admit that it has one cardinal > virtue: you can never forget what order its arguments belong in. And I'll do my semi-regular rant that COMMASPACE.join(seq) looks a lot nicer than ', '.join(seq) even to the point of starting to /like/ this idiom. 
:) -Barry From pinard@iro.umontreal.ca Mon Apr 21 18:44:12 2003 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois_Pinard?=) Date: 21 Apr 2003 13:44:12 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <20030421160515.GA26557@panix.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> Message-ID: <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> [Aahz] > Well, [...] I still hate ''.join(). [...] I'll never change my mind > about its visual ugliness, Same here. I'm getting used to it like children get used to cigars: they vomit for some time, and after a while, learn to like them. Cigars, like the construct above, still destroy taste, and are not ideal for health! :-) > I've got to admit that it has one cardinal virtue: you can never forget > what order its arguments belong in. But yet, it is so unnatural and brain damaging that it sometimes induces me into using the wrong order of arguments for `A.split(B)'. I tried complaining as loud as I could, while staying civilised, before the above was put into Python, but nobody seemed interested to listen. As much as I appreciate most additions to Python from 1.6 and on, that particular one has been and will stay a long lasting mistake. I still love Python! :-) 
:-) -- François Pinard http://www.iro.umontreal.ca/~pinard From barry@python.org Mon Apr 21 19:01:17 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 14:01:17 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> Message-ID: <1050948076.30896.33.camel@barry> On Mon, 2003-04-21 at 13:44, François Pinard wrote: > > And I'll do my semi-regular rant that > > COMMASPACE.join(seq) > > looks a lot nicer than > > ', '.join(seq) > > even to the point of starting to /like/ this idiom. :) > > Having to name simple string constants like a single space looks overkill. > It hardly salvages the original ugliness. Admit it: you're stuck! :-) Never! And though I don't smoke, some of my fondest childhood memories are walking around the block with my grandfather while he smoked his cigars. :) 'Course, you don't /have/ to name your string constants, though I usually do because it improves readability, and because I invariably find several uses for the same string constant in a single module. OTOH, I wouldn't object too strenuously to a join() builtin, but I'd probably never use it -- I'm sure I'd rarely remember the argument order and hate having to look it up much more than writing out the current spelling. Admit it: there is no natural unforgettable order! 
:) -Barry From zack@codesourcery.com Mon Apr 21 19:08:56 2003 From: zack@codesourcery.com (Zack Weinberg) Date: Mon, 21 Apr 2003 11:08:56 -0700 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <1050948076.30896.33.camel@barry> (Barry Warsaw's message of "21 Apr 2003 14:01:17 -0400") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> <1050948076.30896.33.camel@barry> Message-ID: <87el3v67uv.fsf@egil.codesourcery.com> Barry Warsaw <barry@python.org> writes: > OTOH, I wouldn't object too strenuously to a join() builtin, but I'd > probably never use it -- I'm sure I'd rarely remember the argument order > and hate having too look it up much more than writing out the current > spelling. Admit it: there is no natural unforgetable order! :) It occurs to me that one may put join = str.join at the top of one's module, and thereafter use join('str', sequence) (For 2.1 backward compatibility, use type('') instead of str.) Possibly this is a counterargument to accusations that a join builtin would be bloat, since the same implementation could be used for both. 
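A quick check of the unbound-method spelling (nothing here beyond what str.join already provides):

```python
# str.join called as a plain function: the delimiter comes first,
# giving the argument order join(delimiter, sequence_of_strings).
join = str.join

print(join(", ", ["a", "b", "c"]))  # prints: a, b, c
print(join("", ["py", "thon"]))     # prints: python
```

The delimiter-first order is exactly the one a join(seq, delim) builtin would have had to pick, modulo which argument leads.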
zw From barry@python.org Mon Apr 21 19:18:34 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 14:18:34 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <87el3v67uv.fsf@egil.codesourcery.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> <1050948076.30896.33.camel@barry> <87el3v67uv.fsf@egil.codesourcery.com> Message-ID: <1050949114.30943.41.camel@barry> On Mon, 2003-04-21 at 14:08, Zack Weinberg wrote: > Possibly this is a counterargument to accusations that a join builtin > would be bloat Or necessary <wink>. -Barry From tjreedy@udel.edu Mon Apr 21 20:01:32 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Mon, 21 Apr 2003 15:01:32 -0400 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <b81ebo$cpq$1@main.gmane.org> "Guido van Rossum" <guido@python.org> wrote in message news:200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net... > sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I > still think it's a matter of getting used to, like indentation), we > could add join(seq, delim) as a built-in. VB has one. :-) Given that we already have the 'less ugly' alternative str.join(delim, strseq), both sum(strseq) and a hypothetical builtin seem unnecessary. And, an explicit udelim.join(sseq) or unicode.join(udelim, sseq) nicely handles mixed seqs without type guessing.

>>> str.join('', ['a','b','c'])
'abc'
>>> unicode.join(u'', ['a',u'b'])
u'ab'

Terry J. 
Reedy From noah@noah.org Mon Apr 21 19:55:31 2003 From: noah@noah.org (Noah Spurrier) Date: Mon, 21 Apr 2003 11:55:31 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA34034.9060109@ActiveState.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> Message-ID: <3EA43EA3.1030903@noah.org> Guido>> This idea has merit, although I'm not sure I'd call this depth first; Guido>> it's more a matter of pre-order vs. post-order, isn't it? I thought the names were synonymous, but a quick look on Google showed that post-order seems more specific to binary trees whereas depth first is more general, but I didn't look very hard and all my college text books are in storage :-) Depth first is more intuitive, but post order is more descriptive of what the algorithm does. If I were writing documentation (or reading it) I would prefer "depth first". Guido>> - How often does one need this? I write these little directory/file filters quite often. I have come across this problem before, of renaming the directories you are traversing. In the past the trees were small, so I just renamed the directories by hand and used os.path.walk() to handle the files. Recently I had to rename a very large tree which prompted me to look for a better solution. Guido>> - When needed, how hard is it to hand-code a directory walk? It's not Guido>> like the body of the walk() function is rocket science. True, it is easy to write. It would make a good exercise for a beginner, but I think it's better to have it than to not have it since I think a big part of the appeal of Python is the "little" algorithms. It also fits with the Python Batteries Included philosophy and benefits the "casual" Python user. Finally, I just find it generally useful. I use directory walkers a lot. david>> That's hardly the point of improving the standard library, though, is david>> it? 
I'm all for putting the kitchen sink in there, especially if it david>> originates with a use case ("I had some dishes to wash..." ;-) Guido> Guido>But if I had to do it over again, I wouldn't have added walk() in the Guido>current form. I often find it harder to fit a particular program's Guido>needs in the API offered by walk() than it is to reimplement the walk Guido>myself. That's why I'm concerned about adding to it. The change is small and the interface is backward compatible, but if you are actually trying to discourage people from using os.path.walk() in the future then I would vote for deprecating it and replacing it with a generator where the default is depthfirst ;-) Below is a sample tree walker using a generator. I was delighted to find that generators work in recursive functions, but it gave me a headache to think about for the first time. Perhaps it could be prettier, but this demonstrates the basic idea. Yours, Noah # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os import stat def walktree (top = ".", depthfirst = True): """This walks a directory tree, starting from the 'top' directory. This is somewhat like os.path.walk, but using generators instead of a visit function. One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. 
""" names = os.listdir(top) if not depthfirst: yield top, names for name in names: try: st = os.lstat(os.path.join(top, name)) except os.error: continue if stat.S_ISDIR(st.st_mode): for (newtop, children) in walktree (os.path.join(top, name), depthfirst): yield newtop, children if depthfirst: yield top, names def test(): for (basepath, children) in walktree(): for child in children: print os.path.join(basepath, child) if __name__ == '__main__': test() From drifty@alum.berkeley.edu Mon Apr 21 19:57:22 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 21 Apr 2003 11:57:22 -0700 (PDT) Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <b81ebo$cpq$1@main.gmane.org> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <b81ebo$cpq$1@main.gmane.org> Message-ID: <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> [Terry Reedy] > Given that we already have the 'less ugly' alternative str.join(delim, > strseq), Yes, but the string module will go away *someday*, so having it now does not matter much. -Brett From aleax@aleax.it Mon Apr 21 20:07:31 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 21:07:31 +0200 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> References: <200304192343.48211.aleax@aleax.it> <b81ebo$cpq$1@main.gmane.org> <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> Message-ID: <200304212107.31876.aleax@aleax.it> On Monday 21 April 2003 08:57 pm, Brett Cannon wrote: > [Terry Reedy] > > > Given that we already have the 'less ugly' alternative str.join(delim, > > strseq), > > Yes, but the string module will go away *someday*, so having it now does > not matter much. 
Terry mentioned the type (str), not the module (string). The type's not gonna go away anytime soon... Alex From tim.one@comcast.net Mon Apr 21 20:15:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 15:15:54 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> This is a multi-part message in MIME format. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT [Guido] >>> But if I had to do it over again, I wouldn't have added walk() in the >>> current form. [Neil Schemenauer] >> I think it's the perfect place for a generator. [Guido] > Absolutely! So let's try to write something new based on generators, > make it flexible enough so that it can handle pre-order or post-order > visits, and then phase out os.walk(). I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally. What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>. It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measurable difference when walking my Python tree, although because recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, removing recursion could be much more efficient on an extremely deep tree. 
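[Archive note: for readers curious what removing the internal recursion would look like, here is a hedged sketch -- not Tim's attachment -- driving a preorder walk from an explicit stack, so resuming the generator costs the same at any depth. Written against the modern os API as an assumption; the function name is invented:]

```python
import os
from os.path import join, isdir

def walk_nonrecursive(top):
    """Preorder directory walk driven by an explicit stack instead of
    recursive generator calls; yielding is O(1) regardless of depth."""
    stack = [top]
    while stack:
        current = stack.pop()
        try:
            names = os.listdir(current)
        except OSError:
            continue
        dirs = [name for name in names if isdir(join(current, name))]
        nondirs = [name for name in names if not isdir(join(current, name))]
        yield current, dirs, nondirs
        # Push in reverse sorted order so the first subdirectory pops first,
        # preserving left-to-right preorder.
        for name in reversed(sorted(dirs)):
            stack.append(join(current, name))
```

Postorder ("bottom up") is less natural in this style, which is one reason the recursive formulation won out.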
The biggest speedup I could find on Windows was via using os.chdir() liberally, so that os.path.join() calls weren't needed, and os.path.isdir() calls worked directly on one-component names. I suspect this has to do with the fact that Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; name=walk.py Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=walk.py def walk(top, topdown=True): """Directory tree generator. For each directory in the directory tree rooted at top (including top itself, but excluding '.' and '..'), yields a 3-tuple dirpath, dirnames, filenames dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. If optional arg 'topdown' is true or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top down). If topdown is false, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom up). When topdown is true, the caller can modify the dirnames list in-place (e.g. via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, or to impose a specific order of visiting. Modifying dirnames when topdown is false is ineffective, since the directories in dirnames have already been generated by the time dirnames itself is generated. Caution: if you pass a relative pathname for top, don't change the current working directory between resumptions of walk. 
Example: from os.path import join, getsize for root, dirs, files in walk('python/Lib/email'): print root, "consumes", print sum([getsize(join(root, name)) for name in files]), print "bytes in", len(files), "non-directory files" if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories """ import os from os.path import join, isdir try: names = os.listdir(top) except os.error: return exceptions = ('.', '..') dirs, nondirs = [], [] for name in names: if name not in exceptions: if isdir(join(top, name)): dirs.append(name) else: nondirs.append(name) if topdown: yield top, dirs, nondirs for name in dirs: for x in walk(join(top, name), topdown): yield x if not topdown: yield top, dirs, nondirs --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ)-- From andymac@bullseye.apana.org.au Mon Apr 21 13:59:45 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Mon, 21 Apr 2003 23:59:45 +1100 (edt) Subject: [Python-Dev] New re failures on Windows In-Reply-To: <m3u1cs2u49.fsf@mira.informatik.hu-berlin.de> Message-ID: <Pine.OS2.4.44.0304212340550.27154-100000@tenring.andymac.org> On 21 Apr 2003, Martin v. Löwis wrote: > > The failures all occur after the "Running tests on sre.search and > > sre.match" phase of test_sre. > Instead of trying various compilers hoping that the problem goes away, > I recommend that you try to narrow down the test case that fails. I never had any hope the problem would "go away". I've been trying to quantify the extent of the problem, by finding out which compilers exhibit the failure with what optimisation settings, so that the autoconf configurations generated don't result in interpreters that blow up unexpectedly. As it appears the issue is confined to gcc, and so far only on FreeBSD and OS/2, I've got bugger all chance of resolving this in the gcc context. I'm sure that others would have screamed by now if gcc on Linux was similarly failing, which would have given more scope for resolving the issue. 
For all I know, it could be binutils related, as I seem to recall Andrew Koenig encountering something along these lines. I have a patch to configure.in which I'll upload to SF shortly which lowers the optimisation for FreeBSD. Not my preferred outcome, but all I'm able to offer in my current circumstances. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Mon Apr 21 20:30:29 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 15:30:29 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Mon, 21 Apr 2003 15:15:54 EDT." <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> Message-ID: <200304211930.h3LJUTk14894@odiug.zope.com> > Here's that again, with the bug repaired, sped up some, and with a > docstring. Double duty: the example in the docstring shows why we > don't want to make a special case out of sum([]): empty lists can > arise naturally. > > What else would people like in this? I really like separating the > directory names from the plain-file names, so don't bother griping > about that <wink>. Good enough for me. :-) > It's at least as fast as the current os.path.walk() (it's generally > faster for me, but times for this are extremely variable on Win98). > Removing the internal recursion doesn't appear to make a measureable > difference when walking my Python tree, although because recursive > generators require time proportional to the current stack depth to > deliver a result to the caller, and to resume again, removing > recursion could be much more efficient on an extremely deep tree. 
> The biggest speedup I could find on Windows was via using os.chdir() > liberally, so that os.path.join() calls weren't needed, and > os.path.isdir() calls worked directly on one-component names. I > suspect this has to do with the fact that Win98 doesn't have an > effective way to cache directory lookups under the covers. Even so, > it only amounted to a 10% speedup: directory walking is plain slow > on Win98 no matter how you do it. The attached doesn't play any > gross speed tricks. Please don't use chdir(), no matter how much it speeds things up. It's a disaster in a multi-threaded program. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Mon Apr 21 20:31:18 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Mon, 21 Apr 2003 21:31:18 +0200 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA43EA3.1030903@noah.org> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <3EA43EA3.1030903@noah.org> Message-ID: <3EA44706.3010806@v.loewis.de> Noah Spurrier wrote: > I thought the names were synonymous, but a quick look on Google > showed that post-order seems more specific to binary trees whereas > depth first is more general, but I didn't look very hard and all my > college text books are in storage :-) Depth first is more intuitive, but > post order is more descriptive of what the algorithm does. > If I were writing documentation (or reading it) I would prefer "depth > first". I'm tempted to declare this off-topic: depth-first means "traverse children before traversing siblings". Depth-first comes in three variations: pre-order (traverse node first, then children, then siblings), in-order (only for binary trees: traverse left child first, then node, then right child, then sibling), post-order (traverse children first, then node, then siblings). 
There is also breadth-first: traverse siblings first, then children. > I write these little directory/file filters quite often. I have come across > this problem of renaming the directories you are traversing before. I still can't understand why you can't use os.path.walk for that. Did you know that you can modify the list that is passed to the callback, and that walk will continue to visit the elements in the list? Regards, Martin From drifty@alum.berkeley.edu Mon Apr 21 21:53:40 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 21 Apr 2003 13:53:40 -0700 (PDT) Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304212107.31876.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <b81ebo$cpq$1@main.gmane.org> <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> <200304212107.31876.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304211353090.3640@death.OCF.Berkeley.EDU> [Alex Martelli] > On Monday 21 April 2003 08:57 pm, Brett Cannon wrote: > > [Terry Reedy] > > > > > Given that we already have the 'less ugly' alternative str.join(delim, > > > strseq), > > > > Yes, but the string module will go away *someday*, so having it now does > > not matter much. > > Terry mentioned the type (str), not the module (string). The type's not > gonna go away anytime soon... > Oops. =) Sorry about that mix-up. 
-Brett From python@rcn.com Mon Apr 21 21:58:54 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 21 Apr 2003 16:58:54 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> Message-ID: <003d01c30848$ebcc2d00$ec11a044@oemcomputer> > > So here's a strawman implementation: > > > > def sum(seq, start=0): > > if isinstance(start, basestring): > > raise TypeError, "can't sum strings; use ''.join(seq) instead" > > return reduce(operator.add, seq, start) > > > > Alex, go ahead and implement this! > > Coming right up! For the C implementation, consider bypassing operator.add and calling the nb_add slot directly. It's faster and fulfills the intention to avoid the alternative call to sq_concat. Also, think about whether you want to match the two argument styles of min() and max(): >>> max(1,2,3) 3 >>> max([1,2,3]) 3 When the patch is ready, feel free to assign it to me for the code review. Raymond Hettinger P.S. Your new builtin works great with itertools. def dotproduct(vec1, vec2): return sum(itertools.imap(operator.mul, vec1, vec2)) From python@rcn.com Mon Apr 21 22:23:25 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 21 Apr 2003 17:23:25 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <003d01c30848$ebcc2d00$ec11a044@oemcomputer> Message-ID: <005e01c3084c$3fe7d300$ec11a044@oemcomputer> [RH] > For the C implementation, consider bypassing operator.add > and calling the nb_add slot directly. It's faster and fulfills > the intention to avoid the alternative call to sq_concat. 
Forget I said that, you still need PyNumber_Add() to handle coercion and such. Though without some special casing it's going to be darned difficult to match the performance of a pure python for-loop (especially for a sequence of integers). Raymond Hettinger From tim.one@comcast.net Mon Apr 21 22:31:20 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 17:31:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <003d01c30848$ebcc2d00$ec11a044@oemcomputer> Message-ID: <LNBBLJKPBEHFEDALKOLCGENCEDAB.tim.one@comcast.net> [Raymond Hettinger] > For the C implementation, consider bypassing operator.add > and calling the nb_add slot directly. It's faster and fulfills > the intention to avoid the alternative call to sq_concat. Checking for the existence of a (non-NULL) nb_add slot may be slicker than special-casing strings, but I'm not sure it's ever going to work if we try to call nb_add directly. In the end, I expect we'd have to duplicate all the logic in abstract.c's private binary_op1() to get all the endcases straight: /* Calling scheme used for binary operations: v w Action ------------------------------------------------------------------- new new w.op(v,w)[*], v.op(v,w), w.op(v,w) new old v.op(v,w), coerce(v,w), v.op(v,w) old new w.op(v,w), coerce(v,w), v.op(v,w) old old coerce(v,w), v.op(v,w) [*] only when v->ob_type != w->ob_type && w->ob_type is a subclass of v->ob_type Legend: ------- * new == new style number * old == old style number * Action indicates the order in which operations are tried until either a valid result is produced or an error occurs. 
*/ OTOH, when the nb_add slot isn't NULL, the public PyNumber_Add (the same as operator.add) will do no more than invoke binary_op1 (unless the nb_add slot returns NotImplemented, which is another endcase you have to consider when calling nb_add directly -- I believe the Python core calls nb_add directly in only one place, when it already knows that both operands are ints, and that their sum overflows an int, so wants long.__add__ to handle it). > Also, think about whether you want to match to two argument > styles for min() and max(): > >>> max(1,2,3) > 3 > >>> max([1,2,3]) > 3 Guido already Pronounced on that -- max(x, y) is the clearest way to perform that operation, but there's no point to making sum(x, y) an obscure way to spell x+y (I suppose you want it as a builtin synonym for operator.add, though <wink>). > ... > P.S. Your new builtin works great with itertools. > def dotproduct(vec1, vec2): > return sum(itertools.imap(operator.mul, vec1, vec2)) Cool! From cnetzer@mail.arc.nasa.gov Mon Apr 21 23:07:28 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 21 Apr 2003 15:07:28 -0700 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050962848.584.9.camel@sayge.arc.nasa.gov> On Mon, 2003-04-21 at 05:30, Guido van Rossum wrote: > Since it already exists as a 3rd party package, we should definitely > not try to duplicate the effort. Then the question is, is it enough > to point to the 3rd party package or does it deserve to be > incorporated into the core? We can't go and incorporate every useful > 3rd party package into the core True. 
I just happen to be of the opinion that a statistics package is the single most practical and useful mathematics package that can be added to a language, after basic linear algebra (which isn't in the core... Hmmm) > OTOH, having it in the core, with decent documentation, might prevent > naive wannabe-statisticians like myself from misremembering how > standard deviation is implemented, or when to use it. :-) My concern, like yours, is that this kind of thing is probably reimplemented a LOT (at least the simple stats functions). If we adopted it, I would actually favor keeping it fairly lightweight (although t-tests and even ANOVA should go in). Heavyweight users could always download a separate add on package. Of course, the stats.py package (and its SciPy cousin) DOES seem to be well maintained, so perhaps the issue is just making sure those that might need it can easily download and install it (ie. promote it to distributions, give it proper promotion on Vaults of Parnassus, mirror it, etc.). My personal preference would be to make it standard (as I probably would like NumPy to become). I like to use it in my unittests. :) That may not be the consensus, though. -- Chad Netzer (any opinion expressed is my own and not NASA's or my employer's) From noah@noah.org Mon Apr 21 23:15:09 2003 From: noah@noah.org (Noah Spurrier) Date: Mon, 21 Apr 2003 15:15:09 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option Message-ID: <3EA46D6D.8070606@noah.org> I like your version, although I used a different name to avoid confusion with os.path.walk. Note that os.listdir does not include the special entries '.' and '..' even if they are present in the directory, so there is no need to remove them. 
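[Archive note: the point about os.listdir is easy to confirm with a throwaway check in a scratch directory:]

```python
import os
import tempfile

# os.listdir reports only the real entries of a directory -- the special
# '.' and '..' names never appear in its result.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, 'data.txt'), 'w').close()
entries = os.listdir(scratch)  # ['data.txt'] and nothing else
```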
Tim Peters> for path in walk(root): Tim Peters>Or it could look like Tim Peters> for top, names in walk(root): Tim Peters>or Tim Peters> for top, dirnames, nondirnames in walk(root): I like the idea of yielding (top, dirs, nondirs), but often I want to perform the same operations on both dirs and nondirs so separating them doesn't help that case. This seems to be a situation where there is no typical case, so my preference is for the simpler interface. It also eliminates the need to build two new lists from the list you get from os.listdir()... In fact, I prefer your first suggestion (for path in walk(root)), but that would require building a new list by prepending the basepath to each element of children because os.listdir does not return full path. So finally in this example, I just went with returning the basepath and the children (both files and directories). Following Tom Good's example I added an option to ignore symbolic links. It would be better to detect cycles or at least prevent going higher up in the tree. Tim Peters>obvious topdown argument, note a subtlety: when topdown is True, the caller Tim Peters>can prune the search by mutating the dirs list yielded to it. For example, This example still allows you to prune the search in Breadth first mode by removing elements from the children list. That is cool. for top, children in walk('C:/code/python', depthfirst=False): print top, children if 'CVS' in children: children.remove('CVS') Yours, Noah from __future__ import generators # needed for Python 2.2 # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os def walktree (basepath=".", depthfirst=True, ignorelinks=True): """This walks a directory tree, starting from the basepath directory. This is somewhat like os.path.walk, but using generators instead of a visit function. 
One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. The ignorelinks option determines whether to follow symbolic links. Some symbolic links can lead to recursive traversal cycles. A better way would be to detect and prune cycles. """ children = os.listdir(basepath) if not depthfirst: yield basepath, children for name in children: fullpath = os.path.join(basepath, name) if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)): for (next_basepath, next_children) in walktree (fullpath, depthfirst, ignorelinks): yield next_basepath, next_children if depthfirst: yield basepath, children def test(): for (basepath, children) in walktree(): for name in children: print os.path.join(basepath, name) if __name__ == '__main__': test() From tim.one@comcast.net Tue Apr 22 00:33:03 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 19:33:03 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA46D6D.8070606@noah.org> Message-ID: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net> [Noah Spurrier] > I like your version; although I used a different name to avoid > confusion with os.path.walk. Who's confused <wink>? I agree it needs some other name if something like this gets checked in. > Note that os.listdir does not include the special entries '.' and '..' > even if they are present in the directory, so there is no need > to remove them. Oops -- that's right! This is a code divergence problem. There's more than one implementation of os.path.walk in the core, and the version in ntpath.py (which I started from) still special-cases '.' and '..'. I don't think it needs to. 
> Tim Peters> for path in walk(root): > Tim Peters>Or it could look like > Tim Peters> for top, names in walk(root): > Tim Peters>or > Tim Peters> for top, dirnames, nondirnames in walk(root): > > I like the idea of yielding (top, dirs, nondirs), but > often I want to perform the same operations on both > dirs and nondirs so separating them doesn't help that case. I think (a) that's unusual, and (b) it doesn't hurt that case either. You can do, e.g., for root, dirs, files in walk(...): for name in dirs + files: to squash them together again. > This seems to be a situation where there is no typical case, > so my preference is for the simpler interface. > It also eliminates the need to build two new lists from > the list you get from os.listdir()... Sorry, I'm unmovable on this point. My typical uses for this function do have to separate dirs from non-dirs, walk() has to make the distinction *anyway* (for its internal use), and it's expensive for the client to do the join() and isdir() bits all over again (isdir() is a filesystem op, and at least on my box repeated isdir() is overwhelmingly more costly than partitioning or joining a Python list). > In fact, I prefer your first suggestion (for path in walk(root)), but > that would require building a new list by prepending the > basepath to each element of children because os.listdir does not > return full path. What about that worries you? I don't like it because I have some directories with many thousands of files, and stuffing a long redundant path at the start of each is wasteful in the abstract. I'm not sure it really matters, though -- e.g., 10K files in a directory * 200 redundant chars each = a measly 2 megabytes wasted <wink>. > So finally in this example, I just went with returning the basepath > and the children (both files and directories). > > Following Tom Good's example I added an option to > ignore symbolic links. Not all Python platforms have symlinks, of course. 
The traditional answer to this one was that if a user wanted to avoid chasing those on a platform that supports them, they should prune the symlink names out of the fnames list passed to walk's func callback. The same kind of trick is still available in the generator version, although it was and remains painful. Separating the dirs from the non-dirs for the caller at least reduces the expense of it. > It would be better to detect cycles or at least prevent going > higher up in the tree. > ... > This example still allows you to prune the search > in Breadth first mode by removing elements from > the children list. That is cool. > for top, children in walk('C:/code/python', depthfirst=False): > print top, children > if 'CVS' in children: > children.remove('CVS') I'm finding you too hard to follow here, because your use of "depthfirst" and "breadthfirst" doesn't follow normal usage of the terms. Here's normal usage: consider this tree (A's kids are B, C, D; B's kids are E, F; C's are G, H, I; D's are J, K): A B C D E F G H I J K A depth-first left-to-right traversal is what you get out of a natural recursive routine. It sees the nodes internally in this order: A B E F C G H I D J K In a preorder DFS (depth first search), you deliver ("do something with" -- print it, yield it, whatever) the node before delivering its children. Preorder DFS in the tree above delivers the nodes in order A B E F C G H I D J K which is the same order in which nodes are first seen. This is what I called "top down". In a postorder DFS, you deliver the node *after* delivering its children, although you still first see nodes in the same order. Postorder left-to-right DFS in the tree above delivers nodes in this order: E F B G H I C J K D A This is what I called "bottom up". A breadth-first search can't be done naturally using recursion; you need to maintain an explicit queue for that (or write convoluted recursive code). 
A BFS on the tree above would see the nodes in this order: A B C D E F G H I J K It can be programmed like so, given a suitable queue implementation: queue = SuitableQueueImplementation() queue.enqueue(root) while queue: node = queue.dequeue() for child in node.children(): queue.enqueue(child) Nobody has written a breadth-first traverser in this thread. If someone wants to, there are again preorder and postorder variations, although only preorder BFS falls naturally out of the code just above. The current os.path.walk() delivers directories in preorder depth-first left-to-right order, BTW. > for name in children: > fullpath = os.path.join(basepath, name) > if os.path.isdir (fullpath) and not (ignorelinks and > os.path.islink(fullpath)): Despite what I said above <wink>, I expect the ignorelinks argument is a good idea. > for (next_basepath, next_children) in walktree > (fullpath, depthfirst, ignorelinks): > yield next_basepath, next_children Note that there's no need to pull apart 2-tuples and paste them together again here; for x in walktree(...): yield x does the same thing. From duanev@io.com Tue Apr 22 01:27:38 2003 From: duanev@io.com (duane voth) Date: Mon, 21 Apr 2003 19:27:38 -0500 Subject: [Python-Dev] LynxOS4 dynamic loading with dlopen() and -ldl Message-ID: <20030421192738.A23585@io.com> I'm unable to get the dynamic Python modules to import/load correctly on LynxOS4 (a realtime OS that has gcc, shared libs, and many other UNIXisms). Make excerpt: ... running build running build_ext platform = lynxos4 (my comment in setup.py) building 'struct' extension creating build creating build/temp.lynxos-4.0.0-PowerPC-2.2 gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fvec -fPIC -mshared -mthreads -I. 
-I/usr/local/src/Python-2.2.2/./Include -I/usr/local/include -I/usr/local/src/Python-2.2.2/Include -I/usr/local/src/Python-2.2.2 -c /usr/local/src/Python-2.2.2/Modules/structmodule.c -o build/temp.lynxos-4.0.0-PowerPC-2.2/structmodule.o creating build/lib.lynxos-4.0.0-PowerPC-2.2 gcc -shared -mshared -mthreads build/temp.lynxos-4.0.0-PowerPC-2.2/structmodule.o -L/usr/local/lib -o build/lib.lynxos-4.0.0-PowerPC-2.2/struct.so WARNING: removing "struct" since importing it failed ... (all the other modules fail the same way) I hacked setup.py to stop "removing" the bad module files and brought up the python interpreter to try the import by hand: bash-2.02# ./python Python 2.2.2 (#4, Apr 21 2003, 16:39:51) [GCC 2.95.3 20010323 (Lynx)] on lynxos4 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path += ['/usr/local/src/Python-2.2.2/build/lib.lynxos-4.0.0-PowerPC-2.2'] >>> import struct Traceback (most recent call last): File "<stdin>", line 1, in ? ImportError: Symbol not found: "PyInt_Type" >>> (btw, it would be nice if 'ImportError: Symbol not found: "PyInt_Type"' was emitted without all the debugging by hand -- actually it would be nice if many python exceptions (IndexError: list index out of range comes to mind) were rather more helpful about what is wrong, all this debugging via divination is a bit hard on us newbies!) PyInt_Type is declared in Objects/intobject.o and is visible in the python binary (the one doing the dlopen()). I'm not that familiar with dlopen() but shouldn't references from the .so being loaded to the loading program be resolved by dlopen during load? Running nm on 'python' gives '004d2d3c D PyInt_Type' so all the python symbols are being exported properly. Any ideas on how to resolve this run-time symbol lookup error? Nagging thoughts: LynxOS seems to shy away from shared libraries (they live in a special nonstandard directory and not all libraries have shared versions). 
Should I be thinking about doing a static python? If so, I will need to abandon dlopen() completely right? But I also want to use tkinter and the X11 libs too, so I don't think static is really what I want!

--
Duane Voth
duanev@io.com
--
duanev@atlantis.io.com

From tdelaney@avaya.com Tue Apr 22 01:31:19 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 22 Apr 2003 10:31:19 +1000
Subject: [Python-Dev] Re: FIFO data structure?
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4ABCB3@au3010avexu1.global.avaya.com>

> From: David Eppstein [mailto:eppstein@ics.uci.edu]
>
> See <http://tinyurl.com/9x6d> for some tests indicating that
> using dict for fifo is a slow way to go.

Arrgh! That's an extremely broken test. Do not link to that test! I even admitted it when I realised ...

http://tinyurl.com/a0f4

Tim Delaney

From python@rcn.com Tue Apr 22 01:29:20 2003
From: python@rcn.com (Raymond Hettinger)
Date: Mon, 21 Apr 2003 20:29:20 -0400
Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ')
References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> <1050962848.584.9.camel@sayge.arc.nasa.gov>
Message-ID: <009c01c30866$383cc920$ec11a044@oemcomputer>

[GvR]
> > Since it already exists as a 3rd party package, we should definitely
> > not try to duplicate the effort. Then the question is, is it enough
> > to point to the 3rd party package or does it deserve to be
> > incorporated into the core? We can't go and incorporate every useful
> > 3rd party package into the core

Why not? From a user's point of view, that is the best place for it. Of course, not *every* useful third-party package is a candidate, but if it applies to several different categories of users, then maybe.
For instance, the DNA search packages are somewhat tightly targeted, but basic statistics come up in many different types of work.

[Chad]
> True. I just happen to be of the opinion that a statistics package is
> the single most practical and useful mathematics package that can be
> added to a language, after basic linear algebra (which isn't in the
> core... Hmmm)

I've maintained a pure python linear algebra package for several years. It gives the basics plus QR decomposition, complex matrices, and eigenvalues. Still, I haven't felt the slightest need to request that it be put in the core, nor have any of my users requested it.

> I would actually favor keeping it fairly lightweight
> (although t-tests and even ANOVA should go in).
> Heavyweight users could
> always download a separate add on package.

To keep it lightweight, it should be kept in pure python. Heavyweight users can download the binaries when needed.

reinventing-the-wheel-is-fun-educational-and-non-productive-ly yours,

Raymond Hettinger

From guido@python.org Tue Apr 22 01:57:42 2003
From: guido@python.org (Guido van Rossum)
Date: Mon, 21 Apr 2003 20:57:42 -0400
Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ')
In-Reply-To: "Your message of Mon, 21 Apr 2003 20:29:20 EDT." <009c01c30866$383cc920$ec11a044@oemcomputer>
References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> <1050962848.584.9.camel@sayge.arc.nasa.gov> <009c01c30866$383cc920$ec11a044@oemcomputer>
Message-ID: <200304220057.h3M0vge23381@pcp02138704pcs.reston01.va.comcast.net>

[Guido]
> > > We can't go and incorporate every useful
> > > 3rd party package into the core

[Raymond]
> Why not?
Because of the costs associated with code in the core:

- Once it's in the core, you can't take it away; if the original maintainer goes away, we have to somehow keep up maintenance; there's no such thing as maintenance-free code. E.g. see the pain it takes to get SRE bugs fixed now that Effbot is too busy.

- The core needs to build and run on a large variety of platforms. Some 3rd party package authors don't have that goal, and maintain their solution for one or two platforms only. But what's in the core should (unless it is *inherently* platform specific) run on all platforms. The extra portability work must be done by *someone*.

- If it's actively maintained by the original author(s), their release cycle may not coincide with Python's; given Python's size, Python releases are typically less frequent than other package releases. There's not much point in having an outdated version of something in the core. E.g. see the painful situation with the xml package and the PyXML distribution. This is one reason why win32all is still separately maintained.

- Coding standards. I don't care what naming and other coding conventions are used in a 3rd party package, but there are certain minimum standards for core code (see PEP 7 and 8). This is another reason why win32all is still separately maintained.

- Documentation style. For core packages it is expected that their documentation is maintained in our special LaTeX dialect.

- At some point the download size simply gets too big, and we have to break things up again. This has happened to Emacs, for example.

- For some areas (I'm not saying that this is the case for the stats package, for all I know it's "best of breed") there is considerable disagreement among (potential and existing) users which package providing certain functionality is "right". E.g. Twisted vs. Zope. We can't put every approach in the core, but putting one package in the core may damage the viability of another, possibly better (for some users) solution.
To some extent this has happened with GUI toolkits: the presence of Tkinter in the core makes it harder for other GUI toolkits to compete (leaving aside whether Tkinter is better or not -- it's just not a level playing field). Feel free to enter this in the FAQ; I've got a feeling this is a generally useful response. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@multitalents.net Tue Apr 22 02:55:38 2003 From: tim@multitalents.net (Tim Rice) Date: Mon, 21 Apr 2003 18:55:38 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> On Wed, 16 Apr 2003, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! The UnixWare build is way dead right now. (today's CVS) cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I. -I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select gmake: *** [Modules/python.o] Error 1 > > Assigning a SF bug or patch to me and setting the priority to 7 is a > good way to get my attention. 
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > -- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From gward@python.net Tue Apr 22 02:57:29 2003 From: gward@python.net (Greg Ward) Date: Mon, 21 Apr 2003 21:57:29 -0400 Subject: [Python-Dev] Bug/feature/patch policy for optparse.py In-Reply-To: <003d01c3081f$e5232540$410ea044@oemcomputer> References: <20030421024743.GA3911@cthulhu.gerg.ca> <003d01c3081f$e5232540$410ea044@oemcomputer> Message-ID: <20030422015729.GA966@cthulhu.gerg.ca> [Raymond, I'm assuming you did not mean to send your reply to me privately, so I'm cc'ing python-dev!] On 21 April 2003, Raymond Hettinger said: > Why is it important to keep two separate implementations? > Also, if you have to have two, why not have the python cvs > as the primary (to the remove the restriction, to take advantage > of the snake farm, to let third party users have a single place > to file a bug report, to have more developer eyes and fingers > to work a problem, etc)? Hmmm, good question. Probably it's mostly for ego gratification -- one of *my* SF projects was above the 50th percentile in activity last week, just because of my flurry of checkins on Sunday! ;-> But seriously: if people using Python < 2.3 are to be able to use Optik (aka optparse), then there needs to be somewhere for the setup script, tarball etc. to live. optik.sourceforge.net is as good a place as any. Perhaps in due course, the code in Lib/optparse.py (and Lib/test/test_optparse.py) will become the definitive copy, but for now it's not. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'm on a strict vegetarian diet -- I only eat vegetarians. 
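[Editorial note: for readers who haven't met the module Greg maintains, here is a minimal Optik/optparse sketch. The option names and values are invented for illustration; only the API shapes come from the library.]

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options] args")
parser.add_option("-v", "--verbose", action="store_true", default=False,
                  help="chatty output")
parser.add_option("-n", "--count", type="int", default=1,
                  help="how many times [default: %default]")

# Parsing an explicit argv-style list (normally you'd omit it and
# optparse would use sys.argv[1:]):
opts, args = parser.parse_args(["-v", "-n", "3", "input.txt"])
print(opts.verbose, opts.count, args)   # True 3 ['input.txt']
```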
From gward@python.net Tue Apr 22 03:26:07 2003 From: gward@python.net (Greg Ward) Date: Mon, 21 Apr 2003 22:26:07 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030422022607.GA1107@cthulhu.gerg.ca> On 20 April 2003, Guido van Rossum said: > I'm not too worried that people will ask for prod() as well. And if > they do, maybe we can give them that too; there's not much else along > the same lines (bitwise or/and; ha ha ha) so even if the slope may be > a bit slippery, I'm not worried about sliding too far. I can't count the number of times sum() would have been useful to me. I can count the number of times prod() would have been: zero. Bitwise and/or en masse seems unnecessary (although I remember being quite tickled by the fact that you can do bitwise operations on strings in Perl -- whee, fun! -- when I was young and naive). However, there have been a number of occasions where I wanted *logical* and/or en masse: are any/all elements of this list true/false? On several occasions I tried to do it in one super-clever line of code using reduce(), and I think I even succeeded once. But usually I give up and make it a loop. IMHO *this* is likely to be the feature people start asking for after they decide sum() is handy. Greg PS. my nominations for removal in Python 3.0: reduce() and filter(). 
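[Editorial note: the en-masse logical and/or Greg asks for can be written as two short loops -- essentially what later became the builtins any() and all() in Python 2.5. The reduce() spelling is shown for contrast; note it loses short-circuiting.]

```python
def alltrue(seq):
    # en-masse logical "and": stops at the first false element
    for x in seq:
        if not x:
            return False
    return True

def anytrue(seq):
    # en-masse logical "or": stops at the first true element
    for x in seq:
        if x:
            return True
    return False

# The "super-clever one line of code using reduce()" -- it works, but
# always scans the whole sequence.
from functools import reduce   # reduce was still a builtin in 2003
alltrue_reduce = lambda seq: reduce(lambda acc, x: acc and bool(x), seq, True)
```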
--
Greg Ward <gward@python.net>                         http://www.gerg.ca/
What happens if you touch these two wires tog--

From tim.one@comcast.net Tue Apr 22 03:44:58 2003
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 21 Apr 2003 22:44:58 -0400
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <3EA44706.3010806@v.loewis.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOCEDAB.tim.one@comcast.net>

[Noah Spurrier]
>> I write these little directory/file filters quite often. I have
>> come across this problem of renaming the directories you are
>> traversing before.

[Martin v. Löwis]
> I still can't understand why you can't use os.path.walk for that.
> Did you know that you can modify the list that is passed to the
> callback, and that walk will continue to visit the elements in the list?

Let's spell it out. Say the directory structure is like so:

    a/
        b/
            c/
            d/
        e/

and we want to stick "x" at the end of each directory name. The first thing the callback sees is

    arg, "a", ["b", "e"]

The callback can rename b and e, and change the contents of the fnames list to ["bx", "ex"] so that walk will find the renamed directories. Etc. This works:

"""
import os

def renamer(arg, dirname, fnames):
    for i, name in enumerate(fnames):
        if os.path.isdir(os.path.join(dirname, name)):
            newname = name + "x"
            os.rename(os.path.join(dirname, name),
                      os.path.join(dirname, newname))
            fnames[i] = newname  # crucial!

os.path.walk('a', renamer, None)
"""

It's certainly less bother renaming bottom-up; this works too (given the last walk() generator implementation I posted):

"""
import os

for root, dirs, files in walk('a', topdown=False):
    for d in dirs:
        os.rename(os.path.join(root, d), os.path.join(root, d + 'x'))
"""

A possible surprise is that neither of these renames 'a'.
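[Editorial note: with the generator that eventually shipped as os.walk() in 2.3, the bottom-up rename plus an explicit rename of the starting directory can be sketched like this. The throwaway tree under a temp directory is invented for the demonstration; it is not from the post.]

```python
import os
import tempfile

# Build a throwaway tree: a/b/c and a/e under a fresh temp directory.
root = tempfile.mkdtemp()
a = os.path.join(root, 'a')
os.makedirs(os.path.join(a, 'b', 'c'))
os.makedirs(os.path.join(a, 'e'))

# Bottom-up: each directory is renamed only after its children were
# visited, so the paths the walk hands out are never stale.
for top, dirs, files in os.walk(a, topdown=False):
    for d in dirs:
        os.rename(os.path.join(top, d), os.path.join(top, d + 'x'))

# The starting point is never yielded as anyone's child, so rename it
# by hand afterwards -- this is the "surprise" Tim mentions.
os.rename(a, a + 'x')

print(sorted(os.listdir(a + 'x')))  # ['bx', 'ex']
```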
From tim.one@comcast.net Tue Apr 22 04:03:20 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 23:03:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030422022607.GA1107@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> [Greg Ward] > I can't count the number of times sum() would have been useful to me. I > can count the number of times prod() would have been: zero. Two correct answers. Good for you, Greg! > Bitwise and/or en masse seems unnecessary (although I remember being > quite tickled by the fact that you can do bitwise operations on strings > in Perl -- whee, fun! -- when I was young and naive). > > However, there have been a number of occasions where I wanted *logical* > and/or en masse: are any/all elements of this list true/false? On > several occasions I tried to do it in one super-clever line of code > using reduce(), and I think I even succeeded once. But usually I give > up and make it a loop. IMHO *this* is likely to be the feature people > start asking for after they decide sum() is handy. def alltrue(seq): return sum(map(bool, seq)) == len(seq) def atleastonetrue(seq): return sum(map(bool, seq)) > 0 > ... > PS. my nominations for removal in Python 3.0: reduce() and filter(). reduce() is still in Python?! Brrrr. filter() is hard to get rid of because the bizarre filter(None, seq) special case is supernaturally fast. Indeed, time the above against def alltrue(seq): return len(filter(None, seq)) == len(seq) def atleastonetrue(seq): return bool(filter(None, seq)) Let me know which wins <wink>. From tim.one@comcast.net Tue Apr 22 04:33:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 23:33:05 -0400 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <Pine.OS2.4.44.0304212340550.27154-100000@tenring.andymac.org> Message-ID: <LNBBLJKPBEHFEDALKOLCAEOHEDAB.tim.one@comcast.net> [Martin v. 
Löwis]
>> Instead of trying various compilers hoping that the problem goes away,
>> I recommend that you try to narrow down the test case that fails.

[Andrew MacIntyre]
> I never had any hope the problem would "go away". I've been trying to
> quantify the extent of the problem, by finding out which compilers
> exhibit the failure with what optimisation settings, so that the
> autoconf configurations generated don't result in interpreters that
> blow up unexpectedly.

Narrowing it down to the specific C code that's at fault is still the best hope. There are two reasons for that:

1. It's very easy to write ill-defined code in C, and for all we know now some part of _sre is depending on undefined, or implementation defined (but apparently likely), behavior.

2. If that's not the problem, optimization bugs are usually easy to sidestep via minor code changes. You have to know which code is getting screwed first, though.

> ...
> I have a patch to configure.in which I'll upload to SF shortly which
> lowers the optimisation for FreeBSD. Not my preferred outcome, but all
> I'm able to offer in my current circumstances.

Narrowing it down is indeed A Project.

From tim.one@comcast.net Tue Apr 22 04:43:28 2003
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 21 Apr 2003 23:43:28 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <3EA3654D.3070402@activestate.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEOIEDAB.tim.one@comcast.net>

[David Ascher]
> Scipy's stats package is more complete than many people expect. I
> would argue strongly against putting a 'cheap stats' package in the
> core, since building one such package takes a huge amount of work,
> doing it twice is silly. At least the first version of the stats
> package now in chaco used to not require numeric, although I think that
> requirement is a red herring in practice.
I expect that when Guido is thinking about a simple stats package, he's not picturing more than median, mean, sdev, variance, and maybe percentile points, all limited to one dimension. Just about anyone can take over maintenance of those if need be, although how to code a numerically robust sdev isn't well known outside of people who've been burned by "good enough, it can't be *that* hard <wink>" initial attempts. From tim.one@comcast.net Tue Apr 22 05:06:10 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 22 Apr 2003 00:06:10 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211256.42839.fincher.8@osu.edu> Message-ID: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net> [Jeremy Fincher] > This code brought up an interesting question to me: if sets have > a .discard method that removes an element without raising KeyError > if the element isn't in the set, should lists perhaps have that same > method? I don't think list.remove(x) is used enough to care, when the presence of x in the list is unknown. Adding methods for purity alone is neither Pythonic nor Perlish <wink>. > On another related front, sets (in my Python 2.3a2) raise KeyError on a > .remove(elt) when elt isn't in the set. Since sets aren't mappings, > should that be a ValueError (like list raises) instead? Since sets aren't sequences either, why should sets raise the same exception lists raise? It's up to the type to use whichever fool exceptions it chooses. This doesn't always make life easy for users, alas -- there's not much consistency in exception behavior across packages. In this case, a user would be wise to avoid expecting IndexError or KeyError, and catch their common base class (LookupError) instead. The distinction between IndexError and KeyError isn't really useful (IMO; LookupError was injected as a base class recently in Python's life). From martin@v.loewis.de Tue Apr 22 06:28:03 2003 From: martin@v.loewis.de (Martin v. 
Löwis)
Date: 22 Apr 2003 07:28:03 +0200
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
Message-ID: <m31xzvf6e4.fsf@mira.informatik.hu-berlin.de>

Tim Rice <tim@multitalents.net> writes:

> The UnixWare build is way dead right now. (today's CVS)

Any volunteers to fix it?

Regards,
Martin

From martin@v.loewis.de Tue Apr 22 06:40:49 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 22 Apr 2003 07:40:49 +0200
Subject: [Python-Dev] LynxOS4 dynamic loading with dlopen() and -ldl
In-Reply-To: <20030421192738.A23585@io.com>
References: <20030421192738.A23585@io.com>
Message-ID: <m3wuhndr8e.fsf@mira.informatik.hu-berlin.de>

duane voth <duanev@io.com> writes:

> I hacked setup.py to stop "removing" the bad module files and brought
> up the python interpreter to try the import by hand:
[...]
> (btw, it would be nice if 'ImportError: Symbol not found: "PyInt_Type"'
> was emitted without all the debugging by hand

I *strongly* recommend using the Python CVS (to become 2.3) as a baseline for your port. Among other things, it does this already.

> PyInt_Type is declared in Objects/intobject.o and is visible in
> the python binary (the one doing the dlopen()). I'm not that familiar
> with dlopen() but shouldn't references from the .so being loaded to
> the loading program be resolved by dlopen during load?

For executables, this is highly platform dependent - they never consider the case of somebody linking with an *executable*; they expect that symbols normally come from shared libraries. On ELF systems, it is supported, but still depends on the linker. For example, the GNU linker wants --export-dynamic as a linker option in order to expose symbols from the executable.
You can use "nm -D --defined-only" (for GNU nm) to find out whether the executable exports symbols dynamically. > Running nm on 'python' gives '004d2d3c D PyInt_Type' so all the > python symbols are being exported properly. You are looking into the wrong section :-( Try strip on the binary and see the symbols go away. On ELF systems, you need the .dynsym/.dynstr sections on the binary. > LynxOS seems to shy away from shared libraries (they live in > a special nonstandard directory and not all libraries have shared > versions). Should I be thinking about doing a static python? If > so, I will need to abandon dlopen() completely right? But I also > want to use tkinter and the X11 libs to so I don't think static is > really what I want! It depends. I have a strong dislike towards shared libraries, myself. They are hard to use and somewhat inefficient, both in terms of start-up time, and in terms of memory usage. OTOH, for Python extension modules, they simplify the build and deployment process, and help to cut dependencies to other libraries. So if you can make it work, you should. You can then *still* consider integrating as many modules as reasonable into your python interpreter image, by means of Setup, and, for an embedded system, you definitely should also do that. Add demand paging to the picture: If the system has demand-paging, the size of the binary is irrelevant, as the system will swap in only what is needed. If the system needs to read the entire image into RAM, you want it as small as possible, though. Regards, Martin From agthorr@barsoom.org Tue Apr 22 06:42:18 2003 From: agthorr@barsoom.org (Agthorr) Date: Mon, 21 Apr 2003 22:42:18 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>
References: <20030420183005.GB8449@barsoom.org> <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>
Message-ID: <20030422054218.GA18642@barsoom.org>

On Sun, Apr 20, 2003 at 09:31:04PM -0400, Tim Peters wrote:
> I'm opposed to this. The purpose of Queue is to mediate communication among
> threads, and a Queue.Queue rarely gets large because of its intended
> applications. As other recent timing posts have shown, you simply can't
> beat the list.append + list.pop(0) approach until a queue gets quite large
> (relative to the intended purpose of a Queue.Queue).

Out of curiosity, I ran some tests, comparing:

    list.append + list.pop(0)
    Queue.Queue
    my modified Queue.Queue

The test adds n integers to the Queue, then removes them. I use the timeit module to perform the measurements, and do not count the loading of the module or creating of the list/queue object (since presumably the user will do this extremely infrequently).

What I found was that for small n, list.append/pop is much faster than either Queue implementation. I assume this means that the bulk of the time is spent dealing with thread synchronization issues and with the overhead of using a class. It takes around one twentieth the time to complete the list.append/pop compared to either Queue implementation.

For small n, the two Queue implementations were at least in the same ballpark. Mine was roughly 25% slower for n < 10, and around 10% slower for 10 < n < 100. After that, the difference gradually declined until the circular array took the lead somewhere in the vicinity of n=2000. The performance difference didn't become large until n=10000 where the O(n^2) growth finally began to kill the list.append/pop.

Disappointed with these results, I spent some time tweaking my modified Queue.Queue to improve the performance.
I create local variables in a few places, perform a bitwise-AND instead of a modulus, and initialize the circular buffer with 8 elements instead of just 1. I also now grow the circular buffer more efficiently. This made a huge difference. My implementation now outperforms the current Queue.Queue for n > 1! It does around 1% to 4% better up until around n=500, then the advantage starts to slowly ramp up.

My updated Queue implementation is here:
http://www.cs.uoregon.edu/~agthorr/QueueNew.py
and my test program is here:
http://www.cs.uoregon.edu/~agthorr/test.py

> If you have an unusual application for a Queue.Queue where it's actually
> faster to do a circular-buffer gimmick (and don't believe that you do before
> you time it),

My application is a little program that sends simulation jobs to a small server farm. I have one thread per server that grabs jobs off the Queue and starts the remote simulation. I have a fair number of simulation parameters, and this translates into thousands of jobs getting added to the Queue. So, yes, for my particular application, the O(n^2) behavior really is a genuine problem ;)

If circular-array Queue.Queue was significantly slower for low n, I'd agree with you that the current implementation should not be changed. It doesn't appear to be a problem, though.

However, speaking of subclassing Queue: is it likely there are many user applications that subclass it in a way that would break? (i.e., they override some, but not all, of the functions intended for overriding).
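[Editorial note: the circular-array tricks described above -- a power-of-two buffer, bitwise AND in place of modulus, doubling on growth -- can be sketched in a few lines. This is an illustrative single-threaded toy, not the QueueNew.py from the link, which adds locking on top.]

```python
class RingFIFO:
    """Toy FIFO over a circular buffer whose length is a power of two,
    so 'index % size' can be written as 'index & (size - 1)'."""

    def __init__(self):
        self.buf = [None] * 8   # start with 8 slots, as in the post
        self.mask = 7           # len(buf) - 1
        self.head = 0           # index of the next item to pop
        self.n = 0              # number of queued items

    def put(self, item):
        if self.n == len(self.buf):         # full: double the buffer
            old, oldmask = self.buf, self.mask
            self.buf = [None] * (2 * len(old))
            # copy the live items out in FIFO order, starting at head
            for i in range(self.n):
                self.buf[i] = old[(self.head + i) & oldmask]
            self.head = 0
            self.mask = len(self.buf) - 1
        self.buf[(self.head + self.n) & self.mask] = item
        self.n += 1

    def get(self):
        if self.n == 0:
            raise IndexError("get from empty FIFO")
        item = self.buf[self.head]
        self.buf[self.head] = None          # drop the reference
        self.head = (self.head + 1) & self.mask
        self.n -= 1
        return item

q = RingFIFO()
for i in range(20):          # forces one doubling, 8 -> 16 -> 32 slots
    q.put(i)
print([q.get() for _ in range(5)])   # [0, 1, 2, 3, 4]
```

Both put() and get() are O(1) amortized, which is the whole point versus list.pop(0)'s O(n) shuffle.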
-- Agthorr

From noah@noah.org Tue Apr 22 09:32:21 2003
From: noah@noah.org (Noah Spurrier)
Date: Tue, 22 Apr 2003 01:32:21 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
References: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
Message-ID: <3EA4FE15.1070803@noah.org>

Tim>The callback can rename b and e, and change the contents of the fnames list
Tim>to ["bx", "ex"] so that walk will find the renamed directories. Etc.

Ha! This is sweet, but I would call this solution "nonobvious". But perhaps it is a good argument for not modifying os.path.walk(), yet should a walktree generator be included in Python's future I hope that it will have the explicit option for postorder depth first.

Tim> Sorry, I'm unmovable on this point. My typical uses for this function do
Tim> have to separate dirs from non-dirs, walk() has to make the distinction
Tim> *anyway* (for its internal use), and it's expensive for the client to do the
Tim> join() and isdir() bits all over again (isdir() is a filesystem op, and at
Tim> least on my box repeated isdir() is overwhelmingly more costly than
Tim> partitioning or joining a Python list).

I'm probably less adamant on this point than you :-) And you are right, it's cheaper for me to simply run through both lists than it would be to loop over a conditional based on isdir().

Tim> What about that worries you? I don't like it because I have some
Tim> directories with many thousands of files, and stuffing a long redundant path
Tim> at the start of each is wasteful in the abstract. I'm not sure it really
Tim> matters, though -- e.g., 10K files in a directory * 200 redundant chars each
Tim> = a measly 2 megabytes wasted <wink>.

That was also what bothered me ;-) I guess it's more of a habit than necessity.

Tim> Not all Python platforms have symlinks, of course.
Tim> The traditional answer

True, but checking a file with os.path.islink() should be safe even on platforms without links -- if the docs are to be believed. The docs say that platforms that don't support links will always return False for islink(). The Python docs are a little inconsistent on links.

1. os.path.islink(path) claims to only check for links on UNIX and to always be false if symbolic links are not supported.
2. os.readlink(path) is only available on UNIX and is not defined on Windows.
3. os.path.realpath(path) claims to be only available on UNIX, but it is actually defined and returns the given path if you call it on Windows.

Tim> I'm finding you too hard to follow here, because your use of "depthfirst"
Tim> and "breadthfirst" doesn't follow normal usage of the terms. Here's normal

You are right. I will stop calling it Breadth First now. Feel free to dope slap me. This confusion on my part was due to the apparent order when one prints the elements of the names list when the visit function is called. It would print B, C, D, E, F, G, H, I, J, K, but that's the parent printing the children, not the children printing themselves as they are visited. Oh... (a small, dim light clicks on.)

Still, walktree should have the option to hit the bottom of a branch and then process on its way back up (post-order). OK, how is the following version?

Yours,
Noah

from __future__ import generators # needed for Python 2.2
import os

def walktree (basepath=".", postorder=True, ignorelinks=True):
    """This walks a directory tree, starting from the basepath directory.
    This is somewhat like os.path.walk, but using generators instead of a
    visit function. One important difference is that walktree() defaults
    to postorder with optional preorder, whereas the os.path.walk function
    allows only preorder. Postorder was made the default because it is
    safer if you are going to be modifying the directory names you visit.
    This avoids the problem of renaming a directory before visiting the
    children of that directory. The ignorelinks option determines whether
    to follow symbolic links. Some symbolic links can lead to recursive
    traversal cycles. A better way would be to detect and prune cycles.
    """
    children = os.listdir(basepath)
    dirs, nondirs = [], []
    for name in children:
        fullpath = os.path.join (basepath, name)
        if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)):
            dirs.append(name)
        else:
            nondirs.append(name)
    if not postorder:
        yield basepath, dirs, nondirs
    for name in dirs:
        for next_branch in walktree (os.path.join(basepath, name), postorder, ignorelinks):
            yield next_branch
    if postorder:
        yield basepath, dirs, nondirs

def test():
    for basepath, dirs, nondirs in walktree():
        for name in dirs:
            print os.path.join(basepath, name)
        for name in nondirs:
            print os.path.join(basepath, name)

if __name__ == '__main__':
    test()

From mwh@python.net Tue Apr 22 09:34:29 2003
From: mwh@python.net (Michael Hudson)
Date: Tue, 22 Apr 2003 09:34:29 +0100
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net> (Tim Peters's message of "Tue, 22 Apr 2003 00:06:10 -0400")
References: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net>
Message-ID: <2mist7nd62.fsf@starship.python.net>

Tim Peters <tim.one@comcast.net> writes:

> [Jeremy Fincher]
>> This code brought up an interesting question to me: if sets have
>> a .discard method that removes an element without raising KeyError
>> if the element isn't in the set, should lists perhaps have that same
>> method?
>
> I don't think list.remove(x) is used enough to care, when the presence of x
> in the list is unknown.
>> On another related front, sets (in my Python 2.3a2) raise KeyError on a >> .remove(elt) when elt isn't in the set. Since sets aren't mappings, >> should that be a ValueError (like list raises) instead? > > Since sets aren't sequences either, why should sets raise the same exception > lists raise? It's up to the type to use whichever fool exceptions it > chooses. This doesn't always make life easy for users, alas -- there's not > much consistency in exception behavior across packages. In this case, a > user would be wise to avoid expecting IndexError or KeyError, and catch > their common base class (LookupError) instead. The distinction between > IndexError and KeyError isn't really useful (IMO; LookupError was injected > as a base class recently in Python's life). Without me noticing, too! Well, I knew there was a lookup error that you get when failing to find a codec, but I didn't know IndexError and KeyError derived from it... Also note that Jeremy was suggesting *ValueError*, not IndexError... that any kind of index-or-key-ing is going on is trivia of the implementation, surely? Cheers, M. -- First of all, email me your AOL password as a security measure. You may find that won't be able to connect to the 'net for a while. This is normal. The next thing to do is turn your computer upside down and shake it to reboot it. -- Darren Tucker, asr From andymac@bullseye.apana.org.au Tue Apr 22 09:27:01 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Tue, 22 Apr 2003 19:27:01 +1100 (edt) Subject: [Python-Dev] sre vs gcc (was: New re failures on Windows) In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEOHEDAB.tim.one@comcast.net> Message-ID: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> [redirected to people apparently working on SRE] On Mon, 21 Apr 2003, Tim Peters wrote: > Narrowing it down to the specific C code that's at fault is still the best > hope. There are two reasons for that: > > 1. 
It's very easy to write ill-defined code in C, and for all we know > now some part of _sre is depending on undefined, or implementation > defined (but apparently likely), behavior. > > 2. If that's not the problem, optimization bugs are usually easy to > sidestep via minor code changes. You have to know which code is > getting screwed first, though. Seeing that Gustavo had checked in some changes to _sre.c on Sunday, I CVS up'ed and now find that a gcc 2.95.4 build survives test_sre with -O3. A gcc 3.2.2 build still gets a bus error with either -O3 or -O2. The actual test case from test_sre that fails is: ---8<---8<--- # non-simple '*?' still recurses and hits the recursion limit test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None, RuntimeError) ---8<---8<--- For the moment, the FreeBSD 5.x (ie gcc 3.2.x) element of my configure.in patch (SF #725024) is still valid. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From aleax@aleax.it Tue Apr 22 11:54:10 2003 From: aleax@aleax.it (Alex Martelli) Date: Tue, 22 Apr 2003 12:54:10 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> Message-ID: <200304221254.10510.aleax@aleax.it> On Tuesday 22 April 2003 05:03 am, Tim Peters wrote: ... > filter() is hard to get rid of because the bizarre filter(None, seq) > special case is supernaturally fast. Indeed, time the above against > > def alltrue(seq): > return len(filter(None, seq)) == len(seq) > > def atleastonetrue(seq): > return bool(filter(None, seq)) > > Let me know which wins <wink>. Hmmm, I think I must be missing something here. 
Surely in many application cases a loop exploiting short-circuiting behavior will have better expected performance than anything that's going all the way through the sequence no matter what? Far greater variance, sure, and if the probability of true items gets extreme enough then the gain from short-circuiting will evaporate, but...:

[alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> def any(x):
>     for xx in x:
>         if xx: return True
>     return False
> ''' 'any(seq)'
1000000 loops, best of 3: 1.42 usec per loop
[alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
def any(x):
    return bool(filter(None,x))
''' 'any(seq)'
1000 loops, best of 3: 679 usec per loop

...i.e., despite filter's amazing performance, looping over 10k items still takes a bit more than short-circuiting out at once;-). If Python ever gains such C-coded functions as any, all, etc (hopefully in some library module, not in builtins!) I do hope and imagine they'd short-circuit, of course.

BTW, I think any should return the first true item (or the last one if all false, or False for an empty sequence) and all should return the first false item (or the last one if all true, or True for an empty seq) by analogy with the behavior of operators and/or.

Alex

From guido@python.org Tue Apr 22 13:03:15 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 08:03:15 -0400
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: "Your message of Mon, 21 Apr 2003 18:55:38 PDT." <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
Message-ID: <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net>

> The UnixWare build is way dead right now. (today's CVS)
>
> cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select > gmake: *** [Modules/python.o] Error 1 That doesn't look like a *new* problem to me; if sys/select.h is being included twice, that probably was so for a long time. You may be the only person with access to this platform. Can you find the problem? Was this present in 2.3a2? --Guido van Rossum (home page: http://www.python.org/~guido/) From harri.pasanen@trema.com Tue Apr 22 13:47:02 2003 From: harri.pasanen@trema.com (Harri Pasanen) Date: Tue, 22 Apr 2003 14:47:02 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <022e01c301bd$4b7f5a70$530f8490@eden> References: <022e01c301bd$4b7f5a70$530f8490@eden> Message-ID: <200304221447.02812.harri.pasanen@trema.com> On Sunday 13 April 2003 15:05, Mark Hammond wrote: > > Did you try -v, as > > > > > 'import site' failed; use -v for traceback > > > > suggested? > > Yep. as I said: > > > Running with "-v" shows: > > Note that as I mentioned, this is only if you move away _sre.pyd. > The original report was almost certainly a simple import error. I was away the past week, so excuse my delayed response. OK, I found the problem: it is just a difference in the way the Linux and Windows versions are built, but the failure mode could arguably be a bug. _sre.pyd is a separate module on Windows, while on Linux it is part of the whole lib (libpython23.a, libpython23.so). I was running the python from the build tree, and PCbuild was not part of the sys.path for the embedded python. When running with the interactive python, in the style ../../PCbuild/python.exe, the sys.path implicitly gets the PCbuild directory, _sre.pyd is found and everything works.
So when everything is configured and installed properly, everything works. The bug here is that when _sre.pyd is not found on sys.path and I'm running the embedded python, I'm not seeing any import errors; things just silently fail. At runtime "import re" goes through without a problem, but the resulting module is invalid, which is only noticed when the module is first used. So I had no idea that the _sre module was not being found, or that it was even required. Does this merit a bug at sf? Another thing: assuming I would get an import error from the embedded python, how do I enable the "use -v for traceback" for it? Is there a function call I can add for the same effect? Regards, Harri From bkc@murkworks.com Tue Apr 22 14:41:45 2003 From: bkc@murkworks.com (Brad Clements) Date: Tue, 22 Apr 2003 09:41:45 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <m31xzvf6e4.fsf@mira.informatik.hu-berlin.de> References: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> Message-ID: <3EA50E59.29235.1E8CFCCA@localhost> On 22 Apr 2003 at 7:28, Martin v. Löwis wrote: > Tim Rice <tim@multitalents.net> writes: > > > The UnixWare build is way dead right now. (today's CVS) > > Any volunteers to fix it? > > Regards, > Martin I'm sorry I'm not in a position to fix it, but I do have an un-opened Unixware Advanced Server 2.01 box set (docs and media) if anyone wants them. Personally, I think Unixware is dead. Novell dropped it ages ago.
-- Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax http://www.wecanstopspam.org/ AOL-IM: BKClements From tim@multitalents.net Tue Apr 22 15:03:19 2003 From: tim@multitalents.net (Tim Rice) Date: Tue, 22 Apr 2003 07:03:19 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <3EA50E59.29235.1E8CFCCA@localhost> References: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <3EA50E59.29235.1E8CFCCA@localhost> Message-ID: <Pine.UW2.4.53.0304220701050.25189@ou8.int.multitalents.net> On Tue, 22 Apr 2003, Brad Clements wrote: > On 22 Apr 2003 at 7:28, Martin v. Löwis wrote: > > > Tim Rice <tim@multitalents.net> writes: > > > > > The UnixWare build is way dead right now. (today's CVS) > > > > Any volunteers to fix it? > > > > Regards, > > Martin > > I'm sorry I'm not in a position to fix it, but I do have an un-opened Unixware Advanced > Server 2.01 box set (docs and media) if anyone wants them. > > Personally, I think Unixware is dead. Novell dropped it ages ago. "Dropped it" isn't quite correct. They sold it to SCO. -- Tim Rice    Multitalents    (707) 887-1469 tim@multitalents.net From tim@zope.com Tue Apr 22 16:36:24 2003 From: tim@zope.com (Tim Peters) Date: Tue, 22 Apr 2003 11:36:24 -0400 Subject: [Python-Dev] New thread death in test_bsddb3 Message-ID: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com> test_bsddb3.py fails quickly today under a debug build, with a thread state error, on Win2K, every time. Linux?
I assume this is a bad interaction between Mark Hammond's new auto-thread-state code and _bsddb.c's custom thread-manipulation macros:

C:\Code\python\PCbuild>python_d ../lib/test/test_bsddb3.py
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)
bsddb.db.version():   (4, 1, 25)
bsddb.db.__version__: 4.1.5
bsddb.db.cvsid:       $Id: _bsddb.c,v 1.11 2003/03/31 19:51:29 bwarsaw Exp $
python version:       2.3a2+ (#39, Apr 22 2003, 10:48:23) [MSC v.1200 32 bit (Intel)]
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Fatal Python error: Invalid thread state for this thread

C:\Code\python\PCbuild>

It's dying in _db_associateCallback, here:

static int
_db_associateCallback(DB* db, const DBT* priKey, const DBT* priData, DBT* secKey)
{
    int       retval = DB_DONOTINDEX;
    DBObject* secondaryDB = (DBObject*)db->app_private;
    PyObject* callback = secondaryDB->associateCallback;
    int       type = secondaryDB->primaryDBType;
    PyObject* key;
    PyObject* data;
    PyObject* args;
    PyObject* result;

    if (callback != NULL) {
        MYDB_BEGIN_BLOCK_THREADS;       ************ HERE *************

The macro is defined like so:

#define MYDB_BEGIN_BLOCK_THREADS { \
        PyThreadState* prevState; \
        PyThreadState* newState; \
        PyEval_AcquireLock(); \
        newState  = PyThreadState_New(_db_interpreterState); \
        prevState = PyThreadState_Swap(newState);

PyThreadState_Swap is complaining here:

#if defined(Py_DEBUG)
    if (new) {
        PyThreadState *check = PyGILState_GetThisThreadState();
        if (check && check != new)
            Py_FatalError("Invalid thread state for this thread");
    }
#endif

This is a new check, I believe it's an intentional check, and I doubt _bsddb.c *should* pass it as-is.
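For context, Mark's new API (PEP 311) is aimed at exactly this pattern; a rough sketch of what the callback's thread dance could look like on top of it -- hypothetical, not the actual _bsddb.c fix:

```c
/* Hypothetical sketch only: the MYDB_BEGIN/END_BLOCK_THREADS pair
 * replaced by the PEP 311 PyGILState API.  PyGILState_Ensure() creates
 * or reuses the per-thread state that PyGILState_GetThisThreadState()
 * tracks, so the Py_DEBUG check in PyThreadState_Swap() has nothing to
 * complain about. */
#include <Python.h>

static int
associate_callback_sketch(void)
{
    PyGILState_STATE gstate = PyGILState_Ensure(); /* was MYDB_BEGIN_BLOCK_THREADS */

    /* ... build the key/data objects and invoke the Python callback ... */

    PyGILState_Release(gstate);                    /* was MYDB_END_BLOCK_THREADS */
    return 0;
}
```

The point of the API is that extensions stop second-guessing which thread state the thread already has.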
From gherron@islandtraining.com Tue Apr 22 16:49:42 2003 From: gherron@islandtraining.com (Gary Herron) Date: Tue, 22 Apr 2003 08:49:42 -0700 Subject: [Python-Dev] Re: sre vs gcc (was: New re failures on Windows) In-Reply-To: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> References: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> Message-ID: <200304220849.43411.gherron@islandtraining.com> On Tuesday 22 April 2003 01:27 am, Andrew MacIntyre wrote: > [redirected to people apparently working on SRE] > > On Mon, 21 Apr 2003, Tim Peters wrote: > > Narrowing it down to the specific C code that's at fault is still the > > best hope. There are two reasons for that: > > > > 1. It's very easy to write ill-defined code in C, and for all we know > > now some part of _sre is depending on undefined, or implementation > > defined (but apparently likely), behavior. > > > > 2. If that's not the problem, optimization bugs are usually easy to > > sidestep via minor code changes. You have to know which code is > > getting screwed first, though. > > Seeing that Gustavo had checked in some changes to _sre.c on Sunday, I CVS > up'ed and now find that a gcc 2.95.4 build survives test_sre with -O3. > A gcc 3.2.2 build still gets a bus error with either -O3 or -O2. > > The actual test case from test_sre that fails is: > ---8<---8<--- > # non-simple '*?' still recurses and hits the recursion limit > test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None, > RuntimeError) ---8<---8<--- > > For the moment, the FreeBSD 5.x (ie gcc 3.2.x) element of my configure.in > patch (SF #725024) is still valid. Ah. Good clue! Here's a very likely fix to that problem. Around line 3102 of _sre.c find the line that sets USE_RECURSION_LIMIT. Depending on your platform it will be set to either 10000 or 7500. As a test, lower that value to 1000 or even 100. If all the tests pass, then we know the culprit.
The sre code uses that value to prevent run-away recursion from overflowing the stack. Its value must be large enough to allow for *reasonable* levels of recursion, but small enough to catch a run-away recursion before it actually overflows the stack. On at least one class of machines, a value of 10000 was determined to be too high (i.e., the stack overflowed before that many levels of recursion were hit), and so the limit for them was lowered to 7500. Perhaps such is needed for your platform. You have a lot of leeway here in your tests. None of the tests in test_sre recurse more than 100 levels except for that one test which is expressly designed to blow past any limit, thereby testing that excessive recursion is caught correctly. (And on your system, it is not being caught correctly, perhaps because the stack is overflowing before the USE_RECURSION_LIMIT is hit.) Let me know the results of the test please. Thank you, Gary Herron From tim@multitalents.net Tue Apr 22 16:54:40 2003 From: tim@multitalents.net (Tim Rice) Date: Tue, 22 Apr 2003 08:54:40 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net> On Tue, 22 Apr 2003, Guido van Rossum wrote: > > The UnixWare build is way dead right now. (today's CVS) > > > > cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select > > gmake: *** [Modules/python.o] Error 1 > > That doesn't look like a *new* problem to me; if sys/select.h is being > included twice, that probably was so for a long time. You may be the > only person with access to this platform. Can you find the problem? > > Was this present in 2.3a2? > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think it was in 2.3a1 and probably before. It looks like the problem is having both sys/time.h and sys/select.h included when both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED are defined. SYS_SELECT_WITH_SYS_TIME is not defined in pyconfig.h so configure is detecting the problem. It's just that SYS_SELECT_WITH_SYS_TIME is not used anywhere in the code. Something like this will get things a lot farther.
------------------------
--- pyport.h.old	2003-04-17 13:17:24.000000000 -0700
+++ pyport.h	2003-04-22 08:51:43.230240009 -0700
@@ -115,7 +115,9 @@

 #ifdef HAVE_SYS_SELECT_H
+#ifdef SYS_SELECT_WITH_SYS_TIME
 #include <sys/select.h>
+#endif
 #endif /* !HAVE_SYS_SELECT_H */
------------------------

-- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From walter@livinglogic.de Tue Apr 22 16:57:12 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 22 Apr 2003 17:57:12 +0200 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> <20030421014851.GB18971@glacier.arctrix.com> <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EA56658.10408@livinglogic.de> Guido van Rossum wrote: >>Guido van Rossum wrote: >> >>>But if I had to do it over again, I wouldn't have added walk() in the >>>current form. >> >>I think it's the perfect place for a generator. Has anybody considered Jason Orendorff's path module (http://www.jorendorff.com/articles/python/path/) for inclusion in the standard library? It has a path walking generator and much, much more. > Absolutely! So let's try to write something new based on generators, > make it flexible enough so that it can handle pre-order or post-order > visits, and then phase out os.walk(). This new generator should probably support callbacks that determine whether directories should be entered or not.
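To make that concrete, here is one hypothetical shape such a hook could take -- a top-down generator that consults a caller-supplied callback before descending (the name `descend` and the signature are illustrative only, not a concrete proposal):

```python
import os

def walk(top, descend=None):
    """Yield (dirpath, dirnames, filenames) tuples, top-down.

    If `descend` is given, it is called with each subdirectory path and
    the whole subtree is skipped when it returns a false value.
    (Illustrative sketch; `descend` is a hypothetical parameter.)
    """
    names = os.listdir(top)
    names.sort()
    dirs, nondirs = [], []
    for name in names:
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
    yield top, dirs, nondirs
    for name in dirs:
        full = os.path.join(top, name)
        if descend is None or descend(full):
            for result in walk(full, descend):
                yield result
```

A caller could then prune, say, CVS administrative directories with `walk(top, descend=lambda d: not d.endswith('CVS'))`.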
Bye, Walter Dörwald From guido@python.org Tue Apr 22 17:01:34 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 22 Apr 2003 12:01:34 -0400 Subject: [Python-Dev] Magic number needs upgrade Message-ID: <200304221601.h3MG1Yo32750@odiug.zope.com> Now that we have new bytecode optimizations, the pyc file magic number needs to be changed. --Guido van Rossum (home page: http://www.python.org/~guido/) From walter@livinglogic.de Tue Apr 22 17:08:52 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 22 Apr 2003 18:08:52 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <20030419160754.GA847@cthulhu.gerg.ca> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> <20030419160754.GA847@cthulhu.gerg.ca> Message-ID: <3EA56914.2040803@livinglogic.de> Greg Ward wrote: > On 15 April 2003, Walter Dörwald said: > >>Should the same change be done for the pwd module, i.e. >>are duplicate gid's allowed in /etc/group? > > Yes. I got a test failure from test_grp the other night, but I didn't > report it because I hadn't investigated it thoroughly yet. I'm guessing > it's the same as the test_pwd failure... and yes, it stems from a > duplicate GID in the /etc/group file on that system. This (and duplicate user or group names) should be fixed now. Bye, Walter Dörwald From guido@python.org Tue Apr 22 17:40:21 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 22 Apr 2003 12:40:21 -0400 Subject: [Python-Dev] Re: Magic number needs upgrade In-Reply-To: Your message of "Tue, 22 Apr 2003 12:01:34 EDT." Message-ID: <200304221640.h3MGeLP05887@odiug.zope.com> > Now that we have new bytecode optimizations, the pyc file magic > number needs to be changed. Of course we might also consider turning back Raymond's bytecode optimizations. Given that I can't discern any speedup, I wonder what the wisdom is of adding more code complexity.
We're still holding off on Ping and Aahz's changes (see the cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations, for similar reasons (inconclusive evidence of speedups in real programs). What makes Raymond's changes different? I also wonder why this is done unconditionally, rather than only with -O. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Tue Apr 22 20:53:25 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 15:53:25 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E9DD413.8030002@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> Message-ID: <1051041205.32490.51.camel@barry> On Wed, 2003-04-16 at 18:07, "Martin v. Löwis" wrote: > > So why isn't the English/US-ASCII bias for msgids considered a liability > > for gettext? Do non-English programmers not want to use native literals > > in their source code? > > Using English for msgids is about the only way to get translation. > Finding a Turkish speaker who can translate from Spanish is > *significantly* more difficult than starting from English; if you were > starting from, say, Chinese, going to Hebrew might just be impossible. > > So any programmer who seriously wants to have his software translated > will put English texts into the source code. Non-English literals are > only used if l10n is not an issue. That's probably true. I'm just not sure Zope wants to make that a requirement. > > BTW, I believe that if all your msgids /are/ us-ascii, you should be > > able to ignore this change and have it work backwards compatibly. > > "This" change being addition of the "coerce" argument? If you think > you will need it, we can leave it in.
Actually, thinking about this more, we probably don't even need the coerce flag. If all your msgids are us-ascii, you don't care whether they've been coerced to Unicode or not because they'll still compare equal. So I propose to remove the coerce flag, but still Unicode-ify both msgids and msgstrs. Then .ugettext() will just return the Unicode msgstr in the catalog, while .gettext() will encode it to an 8-bit string based on the charset. Personally, I think most i18n Python apps are going to want to use .ugettext() anyway, so for the average program this will just work as expected. I have the tests passing for this change. Any objections? > >>If the msgids are UTF-8, with non-ASCII characters C-escaped, > >>translators will *still* put non-UTF-8 encodings into the catalogs. > >>This will then be a problem: The catalog encoding won't be UTF-8, > >>and you can't process the msgids. > > > > Isn't this just another validation step to run on the .po files? There > > are already several ways translators can (and do!) make mistakes, so we > > already have to validate the files anyway. > > I'm not sure how exactly a validation step would be executed. Would that > step simply verify that the encoding of a catalog is UTF-8? That > validation step would fail for catalogs that legally use other charsets. The validation step would make sure that all the msgids and msgstrs could be decoded using the encoding claimed in the headers. If msgids are us-ascii then (just about) any other encoding for msgstrs should work just fine. If there are non-ascii in both msgids and msgstrs, then some common encoding would have to be used (what other than utf-8?). It's a choice left up to the application and its translators. 
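The "still compare equal" point is easy to verify; in 2.x terms, a pure-ASCII byte-string msgid and its Unicode coercion are interchangeable as dictionary keys (a toy catalog for illustration, not the gettext.py internals):

```python
# Toy catalog keyed by Unicode msgids, as gettext.py would hold after
# coercion.  A us-ascii 8-bit msgid still finds its entry, because an
# ASCII byte string and its Unicode equivalent compare and hash equal
# in 2.x (in 3.x both literals are simply str, so this is trivially so).
catalog = {u"Hello": u"Bonjour"}

assert "Hello" == u"Hello"
assert hash("Hello") == hash(u"Hello")
assert catalog["Hello"] == u"Bonjour"
```

This is exactly why the coerce flag buys nothing for us-ascii msgids: lookups can't tell the difference.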
-Barry From jeremy@zope.com Tue Apr 22 20:47:27 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 22 Apr 2003 15:47:27 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads Message-ID: <1051040847.12834.32.camel@slothrop.zope.com> I've been working a little on the trace module lately, trying to get it to work correctly with Zope. One issue that remains open is how to handle multi-threaded programs. The PEP below proposes a solution. Jeremy PEP: XXX Title: Trace and Profile Support for Threads Version: $Revision: 1.1 $ Last-Modified: $Date: 2002/08/30 04:11:20 $ Author: Jeremy Hylton <jeremy@alum.mit.edu> Status: Active Type: Standards Track Content-Type: text/x-rst Created: 22-Apr-2003 Post-History: 22-Apr-2003 Abstract ======== This PEP describes a mechanism for attaching profile and trace functions to a thread when it is created. This mechanism allows existing tools, like the profiler, to work with multi-threaded programs. The new functionality is exposed via a new event type for trace functions. Rationale ========= The Python interpreter provides profile and trace hooks to support tools like debuggers and profilers. The hooks are associated with a single thread, which makes them harder to use in a multi-threaded environment. For example, the profiler will only collect data for a single thread. If the profiled application spawns new threads, the new threads will not be profiled. This PEP describes a mechanism that allows tools using profile and trace hooks to hook thread creation events. This mechanism would allow tools like the profiler to automatically instrument new threads as soon as they are created. The ability to hook thread creation makes a variety of tools more useful. It should allow them to work seamlessly with multi-threaded applications. The best alternative given the current interpreter support is to edit a multi-threaded application to manually insert calls to enable tracing or profiling. 
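The manual alternative described above can be sketched as follows -- a wrapper that installs the trace function from inside each new thread, which is precisely the boilerplate this PEP wants the interpreter to take over (the helper names are made up for illustration):

```python
import sys
import threading

def traced(func, tracefunc):
    """Wrap func so the new thread installs tracefunc before running."""
    def bootstrap(*args, **kwargs):
        sys.settrace(tracefunc)   # must happen *inside* the new thread
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(None)
    return bootstrap

events = []
def tracer(frame, event, arg):
    events.append(event)          # records "call", "line", "return", ...
    return tracer

def work():
    return 1 + 1

t = threading.Thread(target=traced(work, tracer))
t.start()
t.join()
```

The threading module's settrace()/setprofile() helpers (new in 2.3) automate part of this for threads started via threading, but threads created behind the tool's back still go untraced, which is the gap this PEP addresses.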
Background ========== There are two different hooks provided by the interpreter, one for tracing and one for profiling. The hooks are basically the same, except that the trace hook is called for each line that is executed but the profile hook is only called for each function. The hooks are exposed by the C API [1] and at the Python level by the sys module [2]. For simplicity, the rest of the section just talks about the trace function. A trace function [3] is called with three arguments: a frame, an event, and an event-dependent argument. The event is one of the following strings: "call," "line," "return," or "exception." The C API defines trace function that takes an int instead of a string to define the trace event. The sys.settrace() function sets the global trace function. A global trace function is called whenever a new local scope is entered. If the global trace function returns a value, it is used as the local trace function. If it returns None, no local tracing occurs. Thread creation event ===================== The proposed mechanism is to add a thread creation event called "thread" and PyTrace_THREAD. When thread.start_new_thread() is called, the calling thread's trace function is called with a thread event. The frame passed is None or NULL and the argument is the callable argument passed to start_new_thread(). If the trace function returns a value from the thread event, it is used as the global trace function for the newly created thread. Implementation ============== The bootstrap code in the thread module (Modules/threadmodule.c) must be extended to take trace functions into account. A thread's bootstate must be extended to include pointers to the trace function and its state object. The t_bootstrap() code must call the trace function before executing the boot function. Compatibility and Limitations ============================= An existing trace or profile function may be unprepared for the new event type. 
This may cause them to treat the thread event as some other kind of event. The thread event does not pass a valid frame object, because the frame isn't available before the thread starts running. Once the thread starts running, it is too late to generate the thread event. The hook is only available when a thread is created using the Python thread module. If a custom C extension calls PyThread_start_new_thread() directly, the trace function will not be called for that thread. It's hard to judge whether this behavior is good or bad. It is driven partly by implementation details. The implementation of PyThread_start_new_thread() can not tell when or if Python code will be executed by the thread.

References
==========

.. [1] Section 8.2, Profiling and Tracing, Python/C API Reference Manual (http://www.python.org/dev/doc/devel/api/profiling.html)
.. [2] Section 3.1, sys, Python Library Reference (http://www.python.org/dev/doc/devel/lib/module-sys.html)
.. [3] Section 9.2, How It Works (Python Debugger), Python Library Reference (http://www.python.org/dev/doc/devel/lib/debugger-hooks.html)

Copyright
=========

This document has been placed in the public domain.

From dave@boost-consulting.com Tue Apr 22 22:58:23 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Tue, 22 Apr 2003 17:58:23 -0400
Subject: [Python-Dev] Metatype conflict among bases?
Message-ID: <84lly2i48w.fsf@boost-consulting.com>

Consider:

class A(object):
    class __metaclass__(type):
        pass

class B(A):  # TypeError: metatype conflict among bases
    class __metaclass__(type):
        pass

Now that's a weird error message at least! There's only one base (A), and I'm telling Python explicitly to use the nested __metaclass__ instead of A's __metaclass__! Should I not be surprised that Python won't let me set the metatype explicitly?
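For what it's worth, the rule the error enforces is that a class's metatype must be a (non-strict) subclass of each base's metatype -- the explicit __metaclass__ isn't ignored, it is checked against A's and loses. A sketch, spelled with explicit type calls so it works independently of the __metaclass__ hook:

```python
class MetaA(type):
    pass

# Give A a nontrivial metatype, as the nested __metaclass__ does.
A = MetaA('A', (object,), {})

class MetaB(type):          # unrelated to MetaA
    pass

conflict = False
try:
    B = MetaB('B', (A,), {})    # same failure as the class statement
except TypeError:
    conflict = True             # "metaclass conflict" / "metatype conflict among bases"

class MetaB2(MetaA):        # a subclass of A's metatype is acceptable
    pass

B = MetaB2('B', (A,), {})   # works: MetaB2 is more derived than MetaA
```

So the way to override the metatype explicitly is to derive the new one from the bases' metatypes.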
-- Dave Abrahams Boost Consulting www.boost-consulting.com From drifty@alum.berkeley.edu Tue Apr 22 22:58:10 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 22 Apr 2003 14:58:10 -0700 (PDT) Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> [Jeremy Hylton] > I've been working a little on the trace module lately, trying to get it > to work correctly with Zope. One issue that remains open is how to > handle multi-threaded programs. The PEP below proposes a solution. > Seems reasonable to me. Now if we just got rid of threads altogether we wouldn't have to worry about this. =) <snip - a lot of stuff> > A trace function [3] is called with three arguments: a frame, an > event, and an event-dependent argument. The event is one of the > following strings: "call," "line," "return," or "exception." The C > API defines trace function that takes an int instead of a string to ^ > define the trace event. > Need "a" here? Only one grammatical mistake?!? Wish I could pull that off once in the summaries. =) -Brett From python@rcn.com Tue Apr 22 23:01:05 2003 From: python@rcn.com (Raymond Hettinger) Date: Tue, 22 Apr 2003 18:01:05 -0400 Subject: [Python-Dev] Re: Magic number needs upgrade References: <200304221640.h3MGeLP05887@odiug.zope.com> Message-ID: <002101c3091a$ac2dfac0$1a10a044@oemcomputer> > > Now that we have new bytecode optimizations, the pyc file magic > > number needs to be changed. We have several options: 1. change the magic number to accommodate NOP. 2. install an additional step that eliminates the NOPs from the bytecode (they are not strictly necessary). this will make the code even shorter and faster without a need to change the magic number. i've got this in my hip pocket if we decide that this is the way to go.
   the generated code is beautiful.

3. eliminate the last two optimizations which were the only ones that
   needed a NOP:

   a) compare_op (is, in, is not, not in) unary_not
          --> compare_op (is not, not in, is, in) nop

   b) unary_not jump_if_false (tgt)
          --> nop jump_if_true (tgt)

> I wonder what
> the wisdom is of adding more code complexity.

Part of the benefit is that there will no longer be any need to
re-arrange branches and conditionals in order to avoid 'not'.  As of
now, it has near-zero cost in most situations (except when used with
and/or).

> We're still holding off on Ping and Aahz's changes (see the
> cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations, for
> similar reasons (inconclusive evidence of speedups in real programs).
>
> What makes Raymond's changes different?

* They are thoroughly tested.

* They are decoupled from the surrounding code and
  will survive changes to ceval.c and newcompile.c.

* They provide some benefits without hurting anything else.

* They provide a framework for others to build upon.
  The scanning loop and basic block tester make it
  a piece of cake to add/change/remove new code transformations.

CALL_ATTR ought to go in when it is ready.  It certainly provides
measurable speed-up in the targeted behavior.  It just needs more
polish so that it doesn't slow down other pathways.  The benefit is
real, but in real programs it is being offset by reduced performance in
non-targeted behavior.  With some more work, it ought to be a real gem.
Unfortunately, it is tightly coupled to the implementation of new and
old-style classes.  Still, it looks like a winner.

What we're seeing is a consequence of Amdahl's law and Python's broad
scope.  Instead of a single hotspot, Python exercises many different
types of code and each needs to be optimized separately.  People have
taken on many of these and collectively they are having a great effect.
The proposals by Ping, Aahz, Brett, and Thomas are important steps to
address untouched areas.
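The not-elimination transform in option 3 can be sketched as a tiny peephole pass over a symbolic instruction list. This is illustrative only: the opcode tuples are a stand-in for real CPython bytecode, and jump-target fixup (which real NOP elimination must do) is ignored here.

```python
# Illustrative peephole pass, not CPython's actual optimizer.
# Pass 1: rewrite UNARY_NOT + JUMP_IF_FALSE into NOP + JUMP_IF_TRUE.
# Pass 2: strip the NOPs (the "additional step" of option 2), so the
# magic number would not need to change.  Real code must also retarget
# jumps whose offsets shift when NOPs are removed; that is omitted.
def peephole(instructions):
    out = list(instructions)
    for i in range(len(out) - 1):
        op, _ = out[i]
        nxt_op, nxt_arg = out[i + 1]
        if op == "UNARY_NOT" and nxt_op == "JUMP_IF_FALSE":
            out[i] = ("NOP", None)
            out[i + 1] = ("JUMP_IF_TRUE", nxt_arg)
    return [ins for ins in out if ins[0] != "NOP"]

code = [("LOAD_FAST", "x"), ("UNARY_NOT", None), ("JUMP_IF_FALSE", 12)]
print(peephole(code))  # [('LOAD_FAST', 'x'), ('JUMP_IF_TRUE', 12)]
```

The two-pass shape (scan and rewrite, then strip) mirrors the scanning-loop framework described in the email.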
I took on the task of making sure that the basic pure python code slithers along quickly. The basics like "while", "for", "if", "not" have all been improved. Lowering the cost of those constructs will result in less effort towards by-passing them with vectorized code (map, etc). Code in something like sets.py won't show much benefit because so much effort had been directed at using filter, map, dict.update, and other high volume c-coded functions and methods. Any one person's optimizations will likely help by a few percent at most. But, taken together, they will be a big win. > I also wonder why this is done unconditionally, rather than only with > -O. Neal, Brett, and I had discussed this a bit and I came to the conclusion that these code transformations are like the ones already built into the compiler -- they have some benefit, but cost almost nothing (two passes over the code string at compile time). The -O option makes sense for optimizations that have a high time overhead, throw-away debugging information, change semantics, or reduce feature access. IOW, -O is for when you're trading something away in return for a bit of speed in production code. There is essentially no benefit to not using the optimized bytecode. Raymond Hettinger From jeremy@zope.com Tue Apr 22 23:02:07 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 22 Apr 2003 18:02:07 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> References: <1051040847.12834.32.camel@slothrop.zope.com> <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> Message-ID: <1051048927.12834.47.camel@slothrop.zope.com> On Tue, 2003-04-22 at 17:58, Brett Cannon wrote: > <snip - a lot of stuff> > > A trace function [3] is called with three arguments: a frame, an > > event, and an event-dependent argument. The event is one of the > > following strings: "call," "line," "return," or "exception." 
The C > > API defines trace function that takes an int instead of a string to > ^ > > define the trace event. > > > > Need "a" here? One one grammatical mistake?!? Wish I could pull that off > once in the summaries. =) The PEP was short. Just write shorter summaries <wink>. Jeremy From martin@v.loewis.de Tue Apr 22 23:15:08 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 23 Apr 2003 00:15:08 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1051041205.32490.51.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> <1051041205.32490.51.camel@barry> Message-ID: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > So I propose to remove the coerce flag, but still Unicode-ify both > msgids and msgstrs. Then .ugettext() will just return the Unicode > msgstr in the catalog, while .gettext() will encode it to an 8-bit > string based on the charset. Personally, I think most i18n Python apps > are going to want to use .ugettext() anyway, so for the average program > this will just work as expected. > > I have the tests passing for this change. Any objections? For safety, I'd recommend that you use byte string msgids if conversion to Unicode fails. Otherwise, I'm fine with automatically coercing everything to Unicode. I do know about catalogs that use Latin-1 in msgids (to represent accented characters in the names of authors). That should not cause failures. 
Regards,
Martin

From mhammond@skippinet.com.au Tue Apr 22 23:27:44 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 08:27:44 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com>
Message-ID: <000a01c3091e$65a978a0$530f8490@eden>

> test_bsddb3.py fails quickly today under a debug build, with a thread
> state error, on Win2K, every time.  Linux?
>
> I assume this is a bad interaction between Mark Hammond's new
> auto-thread-state code and _bsddb.c's custom
> thread-manipulation macros:

Yes, this is my fault.  The assertion is detecting the fact that bsddb
is creating and using its own interpreter/thread states rather than
using the thread-state already seen for that thread.

As Tim says, the assertion is new, but the check it makes is valid.  I
believe that removing the assertion would allow it to work, but the
right thing to do is fix bsddb to use the new PyGILState_ API, and
therefore share the threadstate with the rest of Python.

I will do this very shortly (ie, within a couple of hours)

Mark.

From pje@telecommunity.com Tue Apr 22 23:31:24 2003
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 22 Apr 2003 18:31:24 -0400
Subject: [Python-Dev] Metatype conflict among bases?
Message-ID: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>

David Abrahams <dave@boost-consulting.com> wrote:
>
>Consider:
>
>class A(object):
>    class __metaclass__(type):
>        pass
>
>class B(A):   # TypeError: metatype conflict among bases
>    class __metaclass__(type):
>        pass
>
>Now that's a weird error message at least!  There's only one base (A),
>and I'm telling Python explicitly to use the nested __metaclass__
>instead of A's __metaclass__!
>
>Should I not be surprised that Python won't let me set the metatype
>explicitly?

The problem here is that B.__metaclass__ *must* be the same as, or a
subclass of, A.__metaclass__, or vice versa.
It doesn't matter whether the metaclass is specified implicitly or
explicitly, this constraint must be met.  Your code doesn't meet this
constraint.  Here's a revised example that does:

class A(object):
    class __metaclass__(type):
        pass

class B(A):
    class __metaclass__(A.__class__):
        pass

B.__metaclass__ will now meet the "metaclass inheritance" constraint.

See the "descrintro" document for some more info about this, and the
"Putting Metaclasses To Work" book for even more info about it than you
would ever want to know.  :)  Here's a short statement of the
constraint, though:

A class X's metaclass (X.__class__) must be identical to, or a subclass
of, the metaclass of *every* class in X.__bases__.  That is:

for b in X.__bases__:
    assert X.__class__ is b.__class__ or \
        issubclass(X.__class__, b.__class__), \
        "metatype conflict among bases"

From tim@multitalents.net Tue Apr 22 23:35:52 2003
From: tim@multitalents.net (Tim Rice)
Date: Tue, 22 Apr 2003 15:35:52 -0700 (PDT)
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
 <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
 <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net>
 <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net>
Message-ID: <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net>

On Tue, 22 Apr 2003, Tim Rice wrote:

> On Tue, 22 Apr 2003, Guido van Rossum wrote:
>
> > > The UnixWare build is way dead right now. (today's CVS)
> > >
> > > cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c
> > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set
> > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select
> > > gmake: *** [Modules/python.o] Error 1
> >
> > That doesn't look like a *new* problem to me; if sys/select.h is being
> > included twice, that probably was so for a long time.  You may be the
> > only person with access to this platform.  Can you find the problem?
> >
> > Was this present in 2.3a2?
> >
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> I think it was in 2.3a1 and probably before.
>
> It looks like the problem is having both sys/time.h and sys/select.h
> included when both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED are
> defined.
>
> SYS_SELECT_WITH_SYS_TIME is not defined in pyconfig.h so configure
> is detecting the problem.  It's just that SYS_SELECT_WITH_SYS_TIME is
> not used anywhere in the code.
>
> Something like this will get things a lot farther.
> ------------------------
> --- pyport.h.old	2003-04-17 13:17:24.000000000 -0700
> +++ pyport.h	2003-04-22 08:51:43.230240009 -0700
> @@ -115,7 +115,9 @@
>
>  #ifdef HAVE_SYS_SELECT_H
>
> +#ifdef SYS_SELECT_WITH_SYS_TIME
>  #include <sys/select.h>
> +#endif
>
>  #endif /* !HAVE_SYS_SELECT_H */
>
> ------------------------

Well after patching pyport.h for the sys/select problem, I had errors
because of missing u_int and u_long data types.  Patch configure.in,
pyconfig.h.in, pyport.h.  Now u_char and u_short.  Patch configure.in,
pyconfig.h.in, pyport.h some more.  Now missing defines of NI_MAXHOST,
NI_NUMERICHOST, & NI_MAXSERV.

At that point I said to myself "This is nuts, 2.2.2 worked fine".
So I backed out all my other patches and added this one.
--------------------------
--- configure.in.old	2003-04-17 13:16:42.000000000 -0700
+++ configure.in	2003-04-22 15:26:13.450080095 -0700
@@ -124,6 +124,8 @@
   # of union __?sigval. Reported by Stuart Bishop.
   SunOS/5.6)
     define_xopen_source=no;;
+  OpenUNIX/8.* | UnixWare/7.*)
+    define_xopen_source=no;;
 esac
 
 if test $define_xopen_source = yes
--------------------------

Builds fine now.

-- 
Tim Rice			Multitalents	(707) 887-1469
tim@multitalents.net

From mhammond@skippinet.com.au Wed Apr 23 00:05:26 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 09:05:26 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com>
Message-ID: <000001c30923$aa162830$530f8490@eden>

> test_bsddb3.py fails quickly today under a debug build, with a thread
> state error, on Win2K, every time.  Linux?

Actually, some guidance would be nice here.  Is this code (_bsddb.c)
ever expected to again build under pre-trunk versions of Python, or can
I remove the old thread-state management code?

ie, should my changes be of the style:

#if defined(NEW_PYGILSTATE_API_EXISTS)
// new 1 line of code
#else
// existing many lines of code
#endif

Or just stick with the new code?

Nothing-is-finished-until-there-is-nothing-left-to-remove ly,

Mark.

From tim.one@comcast.net Wed Apr 23 00:18:22 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 22 Apr 2003 19:18:22 -0400
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <000001c30923$aa162830$530f8490@eden>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net>

[Mark Hammond]
> Actually, some guidance would be nice here.

It's easy this time.  BTW, I agree your new check is the right thing to
do!  If another case like this pops up, though, we/you should probably
add a section to the PEP explaining what to do about it.

> Is this code (_bsddb.c) ever expected to again build under pre-trunk
> versions of Python, or can I remove the old thread-state management code?
The former: the pybsddb project still exists and is used with older
versions of Python.  Barry mumbled something today at the office about
wanting to keep the C code in synch.

> ie, should my changes be of the style:
>
> #if defined(NEW_PYGILSTATE_API_EXISTS)
> // new 1 line of code
> #else
> // existing many lines of code
> #endif

Yes, that would be great.

From mhammond@skippinet.com.au Wed Apr 23 00:41:44 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 09:41:44 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net>
Message-ID: <000301c30928$bc311160$530f8490@eden>

> Yes, that would be great.

Cool - all checked in.  Thanks.

Mark.

From guido@python.org Wed Apr 23 01:23:47 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 20:23:47 -0400
Subject: [Python-Dev] Metatype conflict among bases?
In-Reply-To: "Your message of Tue, 22 Apr 2003 17:58:23 EDT."
 <84lly2i48w.fsf@boost-consulting.com>
References: <84lly2i48w.fsf@boost-consulting.com>
Message-ID: <200304230023.h3N0Nlf26157@pcp02138704pcs.reston01.va.comcast.net>

> Consider:
>
> class A(object):
>     class __metaclass__(type):
>         pass
>
> class B(A):   # TypeError: metatype conflict among bases
>     class __metaclass__(type):
>         pass
>
> Now that's a weird error message at least!  There's only one base (A),
> and I'm telling Python explicitly to use the nested __metaclass__
> instead of A's __metaclass__!
>
> Should I not be surprised that Python won't let me set the metatype
> explicitly?

The metaclass must be a subclass of the metaclass of all the bases.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 23 01:49:03 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 20:49:03 -0400
Subject: [Python-Dev] Re: Magic number needs upgrade
In-Reply-To: "Your message of Tue, 22 Apr 2003 18:01:05 EDT."
 <002101c3091a$ac2dfac0$1a10a044@oemcomputer>
References: <200304221640.h3MGeLP05887@odiug.zope.com>
 <002101c3091a$ac2dfac0$1a10a044@oemcomputer>
Message-ID: <200304230049.h3N0n3Q26957@pcp02138704pcs.reston01.va.comcast.net>

> > What makes Raymond's changes different?
>
> * They are thoroughly tested.
>
> * They are decoupled from the surrounding code and
>   will survive changes to ceval.c and newcompile.c.
>
> * They provide some benefits without hurting anything else.

What are the benefits?  I see zero improvement.  And more code hurts.

> * They provide a framework for others to build upon.
>   The scanning loop and basic block tester make it
>   a piece of cake to add/change/remove new code transformations.
>   CALL_ATTR ought to go in when it is ready.

No, only if it really makes a difference.  We can't expect to beat
Parrot by accumulating an endless string of theoretical improvements
that each contribute 0.1% speedup to the average application.

> It certainly provides measurable speed-up in the targeted behavior.
> It just needs more polish so that it doesn't slow down other
> pathways.  The benefit is real, but in real programs it is being
> offset by reduced performance in non-targeted behavior.  With some
> more work, it ought to be a real gem.  Unfortunately, it is tightly
> coupled to the implementation of new and old-style class.  Still, it
> looks like a winner.

That's what I thought, until I benchmarked it.  It's possible that it
can be saved.  It's also possible that we've pretty much reached a
point where any optimization we think of is somehow undone by the
effect of more code and hence less code locality.

> What we're seeing is a consequence of Amdahl's law and Python's
> broad scope.  Instead of a single hotspot, Python exercises many
> different types of code and each needs to be optimized separately.
> People have taken on many of these and collectively they are having
> a great effect.
> The proposals by Ping, Aahz, Brett, and Thomas
> are important steps to address untouched areas.

Possibly.  Or possibly we need to step back and redesign the
interpreter from scratch.  Or put more effort in e.g. Psyco.

> I took on the task of making sure that the basic pure python code
> slithers along quickly.  The basics like "while", "for", "if", "not"
> have all been improved.  Lowering the cost of those constructs
> will result in less effort towards by-passing them with vectorized
> code (map, etc).  Code in something like sets.py won't show much
> benefit because so much effort had been directed at using filter,
> map, dict.update, and other high volume c-coded functions and
> methods.

And I'm happy that Python 2.3 is significantly faster than 2.2 (15% in
my benchmark!).

> Any one person's optimizations will likely help by a few percent
> at most.  But, taken together, they will be a big win.

Yet, I expect that we're reaching a limit, or at least crawling up ever
slower.

> > I also wonder why this is done unconditionally, rather than only with
> > -O.
>
> Neal, Brett, and I had discussed this a bit and I came to the conclusion
> that these code transformations are like the ones already built into the
> compiler -- they have some benefit, but cost almost nothing (two passes
> over the code string at compile time).  The -O option makes sense for
> optimizations that have a high time overhead, throw-away debugging
> information, change semantics, or reduce feature access.  IOW, -O is
> for when you're trading something away in return for a bit of speed
> in production code.

Yeah, but right now -O does *nothing* except remove asserts.  We might
as well get rid of it.

> There is essentially no benefit to not using the optimized bytecode.

Of course not, if you keep putting all optimizations in the default
case.  If we had only optimized unary minus followed by a constant in
-O mode, the (several!) bugs in that optimization would have been
caught much sooner.
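The assert-stripping effect of -O discussed above is easy to see directly. In today's CPython the same switch is also exposed from within a program via compile()'s optimize argument (a later addition, not 2.3-era code; optimize=1 corresponds to the -O command-line flag):

```python
# compile()'s `optimize` argument (Python 3.2+) mirrors the -O flag:
# with optimize=1, assert statements are simply dropped from the
# generated bytecode.
src = "assert False, 'stripped under -O'"

plain = compile(src, "<demo>", "exec", optimize=0)
optimized = compile(src, "<demo>", "exec", optimize=1)

try:
    exec(plain)
    raised = False
except AssertionError:
    raised = True

exec(optimized)   # no AssertionError: the assert was removed
print(raised)     # True
```

The names `plain`/`optimized` are illustrative only; the point is that assert removal is the whole observable difference between the two code objects here.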
PS, Raymond, can I ask you to look at the following bugs and patches
that are assigned to you: bugs 549151 (!), 557704 (!), 665835, 678519,
patches 708374, 685051, 658316, 562501.  The (!) ones have priority.
It's okay if you don't have time, but in that case say so, so I can
find another way to get them addressed.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From andrew@acooke.org Wed Apr 23 02:12:05 2003
From: andrew@acooke.org (andrew cooke)
Date: Tue, 22 Apr 2003 21:12:05 -0400 (CLT)
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <3EA4FE15.1070803@noah.org>
References: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
 <3EA4FE15.1070803@noah.org>
Message-ID: <53569.127.0.0.1.1051060325.squirrel@127.0.0.1>

I hesitate to post this because I'm out of my depth - I've never used
generators before, and I'm not 100% certain that this strange
compromise between imperative (the usual breadth/depth switch using
queues) and functional (the usual pre/post switch using the call stack)
algorithms is ok.  However, it appears to work and may be useful - it's
a simple extension to Noah's code that allows the user to choose
between breadth- and depth-first traversal.  It is more expensive,
using a list as either fifo or lifo queue (depending on breadth/depth
selection).

[Noah - I decided to post this rather than bother you again - hope
that's OK]

#!/usr/bin/python2.2
from __future__ import generators  # needed for Python 2.2
import os

def walktree(basepath=".", postorder=True, depthfirst=True,
             ignorelinks=True):
    """Noah Spurrier's code, modified to allow depth/breadth-first
    traversal.  The recursion is there *only* to allow postorder
    processing as the stack rolls back - the rest of the algorithm is
    imperative and queue would be declared outside helper if I knew
    how."""
    def helper(queue):
        if queue:
            if depthfirst:
                dir = queue.pop(-1)
            else:
                dir = queue.pop(0)
            children = os.listdir(dir)
            dirs, nondirs = [], []
            for name in children:
                fullpath = os.path.join(dir, name)
                if os.path.isdir(fullpath) and not \
                   (ignorelinks and os.path.islink(fullpath)):
                    dirs.append(name)
                    queue.append(fullpath)
                else:
                    nondirs.append(name)
            if not postorder:
                yield dir, dirs, nondirs
            for rest in helper(queue):
                yield rest
            if postorder:
                yield dir, dirs, nondirs
    return helper([basepath])

def test():
    for basepath, dirs, nondirs in \
            walktree(postorder=True, depthfirst=False):
        for name in dirs:
            print os.path.join(basepath, name)
        for name in nondirs:
            print os.path.join(basepath, name)

if __name__ == '__main__':
    test()

-- 
http://www.acooke.org/andrew

From andymac@bullseye.apana.org.au Wed Apr 23 00:52:19 2003
From: andymac@bullseye.apana.org.au (Andrew MacIntyre)
Date: Wed, 23 Apr 2003 10:52:19 +1100 (edt)
Subject: [Python-Dev] Re: sre vs gcc (was: New re failures on Windows)
In-Reply-To: <200304220849.43411.gherron@islandtraining.com>
Message-ID: <Pine.OS2.4.44.0304231025270.28508-100000@tenring.andymac.org>

On Tue, 22 Apr 2003, Gary Herron wrote:

> On Tuesday 22 April 2003 01:27 am, Andrew MacIntyre wrote:
{...}
> > The actual test case from test_sre that fails is:
> > ---8<---8<---
> > # non-simple '*?' still recurses and hits the recursion limit
> > test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None,
> >      RuntimeError)
> > ---8<---8<---
{...}
> Ah.  Good clue!  Here's a very likely fix to that problem.  Around
> line 3102 of _sre.c find the line that sets USE_RECURSION_LIMIT.
> Depending on your platform it will be set to either 10000 or 7500.  As
> a test, lower that value to 1000 or even 100.  If all the tests pass,
> then we know the culprit.
The magic number for USE_RECURSION_LIMIT is between 9250 & 9500.  Note
that this is for gcc 3.2.2 on FreeBSD 4.7.  For gcc 3.2.1 on OS/2, 9250
is too high, but 7500 lets test_sre complete.

If the above test case is commented out, the "Test engine limitations"
test case section fails at the same USE_RECURSION_LIMIT settings as the
above test case.

I'll prepare a patch to supersede 725024 which sets USE_RECURSION_LIMIT
to 7500 on FreeBSD & OS/2 with gcc 3.x, but I won't get to it for a day
or two.  I'll assign it to Gustavo.

-- 
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au  | Snail: PO Box 370
        andymac@pcug.org.au            |        Belconnen  ACT  2616
Web:    http://www.andymac.org/        |        Australia

From gward@python.net Wed Apr 23 02:35:06 2003
From: gward@python.net (Greg Ward)
Date: Tue, 22 Apr 2003 21:35:06 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net>
References: <20030422022607.GA1107@cthulhu.gerg.ca>
 <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net>
Message-ID: <20030423013506.GA2547@cthulhu.gerg.ca>

On 21 April 2003, Tim Peters said:
> filter() is hard to get rid of because the bizarre filter(None, seq)
> special case is supernaturally fast.  Indeed, time the above against

Hmmm, a random idea: has filter() ever been used for anything else?
I didn't think so.  So why not remove everything *except* that handy
special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and
that's *all* filter() does.

Just a random thought...

-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
Dyslexics of the world, untie!

From guido@python.org Wed Apr 23 02:37:52 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 21:37:52 -0400
Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads
In-Reply-To: "Your message of 22 Apr 2003 15:47:27 EDT."
<1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <200304230137.h3N1bqh27095@pcp02138704pcs.reston01.va.comcast.net> > PEP: XXX > Title: Trace and Profile Support for Threads > Author: Jeremy Hylton <jeremy@alum.mit.edu> Nice idea, Jeremy! I have some more worries to add to the compatibility section. It seems reasonable for a trace implementation to implement a state machine that assumes that events come in certain orders, e.g. CALL, LINE, LINE, ..., RAISE or RETURN, and it might assume without checking that all these apply to the same frame. Calls from multiple threads would confuse such a tracer! If we can limit ourselves to threads started with the higher-level (and recommended) threading module, we could provide a different mechanism: you give the threading module a "tracer factory function" which is invoked when a thread is started and passed to sys.settrace(). Since sys.settrace() manipulates per-thread state, this should work. Since the API is new, there is no compatibility problem. The API could be super simple: threading.settrace(factory) This would cause the following to be executed when a new thread is started: sys.settrace(factory(frame, "thread", thread)) (An end-thread event should probably also be passed to the factory.) By giving the factory the same signature as the regular trace function, it is still possible to use the same tracer function if it doesn't get confused by events from multiple threads, but it's also possible to implement something different. No C code would have to be written. What do you think? Or does the dependency on the threading module kill this idea? (Then we should think of adding this to the thread module instead. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Wed Apr 23 03:33:21 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 22:33:21 -0400 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <000a01c3091e$65a978a0$530f8490@eden> References: <000a01c3091e$65a978a0$530f8490@eden> Message-ID: <1051065201.19699.2.camel@anthem> On Tue, 2003-04-22 at 18:27, Mark Hammond wrote: > Yes, this is my fault. The assertion is detecting the fact that bsddb is > creating and using its own interpreter/thread states than using the > thread-state already seen for that thread. > > As Tim says, the assertion is new, but the check it makes is valid. I > believe that removing the assertion would allow it to work, but the right > thing to do is fix bsddb to use the new PyGILState_ API, and therefore share > the threadstate with the rest of Python. > > I will do this very shortly (ie, within a couple of hours) Thanks for taking care of this Mark! Yes, as PEP 291 states, bsddb.c has to be compatible with Python 2.1. At some point we may want to re-evaluate that, but for now, if it's easy to do, we should keep compatibility. -Barry From jack@performancedrivers.com Wed Apr 23 03:38:23 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 22 Apr 2003 22:38:23 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca>; from gward@python.net on Tue, Apr 22, 2003 at 09:35:06PM -0400 References: <20030422022607.GA1107@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <20030422223823.D15881@localhost.localdomain> On Tue, Apr 22, 2003 at 09:35:06PM -0400, Greg Ward wrote: > On 21 April 2003, Tim Peters said: > > filter() is hard to get rid of because the bizarre filter(None, seq) special > > case is supernaturally fast. 
Indeed, time the above against
>
> Hmmm, a random idea: has filter() ever been used for anything else?
> I didn't think so.  So why not remove everything *except* that handy
> special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and
> that's *all* filter() does.

Most frequently I test truth of a member of a tuple or list,

    newl = filter(lambda x:x[-2], l)

secondly just plain truth, but here are some other examples.

    sql_obs = filter(lambda x:isinstance(x, SQL), l)
    words = filter(lambda x: x[-1] != ':', words)  # filter out group: related: etc
    pad_these = filter(lambda x:len(x) < maxlen, lists)
    files = filter(lambda x:dir_matches(sid, x),
                   os.listdir(libConst.STATE_DIR + '/'))
    delete_these = map(lambda x:x[0][2:],
                       filter(lambda x: x[1], d.iteritems()))
    files = filter(lambda x:x.endswith('.state'), os.listdir(base_dir))

Go ahead, ask why we don't yank out lambda too, nobody uses that *wink*

-jack

From dave@boost-consulting.com Wed Apr 23 03:39:11 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Tue, 22 Apr 2003 22:39:11 -0400
Subject: [Python-Dev] Metatype conflict among bases?
In-Reply-To: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>
 (Phillip J. Eby's message of "Tue, 22 Apr 2003 18:31:24 -0400")
References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>
Message-ID: <844r4qgcog.fsf@boost-consulting.com>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> The problem here is that B.__metaclass__ *must* be the same as, or a
> subclass of, A.__metaclass__, or vice versa.  It doesn't matter
> whether the metaclass is specified implicitly or explicitly, this
> constraint must be met.  Your code doesn't meet this constraint.
> Here's a revised example that does:
>
> class A(object):
>     class __metaclass__(type):
>         pass
>
> class B(A):
>     class __metaclass__(A.__class__):
>         pass
>
> B.__metaclass__ will now meet the "metaclass inheritance" constraint.
> See the "descrintro" document for some more info about this, and the > "Putting Metaclasses To Work" book for even more info about it than > you would ever want to know. :) I knew all that once, and have since forgotten more than I knew :(. I actually already managed to make the code work by doing what you did above, so it couldn't have been buried too deeply in the caves of my brain. > Here's a short statement of the constraint, though: > > A class X's metaclass (X.__class__) must be identical to, or a > subclass of, the metaclass of *every* class in X.__bases__. That is: > > for b in X.__bases__: > assert X.__class__ is b.__class__ or issubclass(X.__class, b.__class__),\ > "metatype conflict among bases" Still, the message is misleading. There's only one base class, so the metatype conflict is not "among bases". -- Dave Abrahams Boost Consulting www.boost-consulting.com From barry@python.org Wed Apr 23 03:42:01 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 22:42:01 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca> References: <20030422022607.GA1107@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <1051065721.19699.9.camel@anthem> On Tue, 2003-04-22 at 21:35, Greg Ward wrote: > On 21 April 2003, Tim Peters said: > > filter() is hard to get rid of because the bizarre filter(None, seq) special > > case is supernaturally fast. Indeed, time the above against > > Hmmm, a random idea: has filter() ever been used for anything else? > I didn't think so. So why not remove everything *except* that handy > special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and > that's *all* filter() does. I've never used it for anything else, but I'm also just as happy to use [x for x in seq if x] Although it's a bit verbose, TOOWTDI. 
-Barry

From tim.one@comcast.net Wed Apr 23 04:50:58 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 22 Apr 2003 23:50:58 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304221254.10510.aleax@aleax.it>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEADEEAB.tim.one@comcast.net>

[Alex Martelli]
> Hmmm, I think I must be missing something here.  Surely in many
> application cases a loop exploiting short-circuiting behavior will have
> better expected performance than anything that's going all the way
> through the sequence no matter what?

No, you're only missing that I seem rarely to have apps where it
actually matters.

> Far greater variance, sure, and if the probability of true items gets
> extreme enough then the gain from short-circuiting will evaporate,

Or, more likely, become a pessimization (liability).

> but...:
>
> [alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> > def any(x):
> >     for xx in x:
> >         if xx: return True
> >     return False
> > ''' 'any(seq)'
> 1000000 loops, best of 3: 1.42 usec per loop
>
> [alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> def any(x):
>     return bool(filter(None,x))
> ''' 'any(seq)'
> 1000 loops, best of 3: 679 usec per loop
>
> ...i.e., despite filter's amazing performance, looping over 10k
> items still takes a bit more than shortcircuiting out at once;-).

It's only because Guido sped up loops for 2.3 <wink>.

> If Python ever gains such C-coded functions as any, all, etc (hopefully
> in some library module, not in builtins!) I do hope and imagine they'd
> short-circuit, of course.  BTW, I think any should return the first
> true item (or the last one if all false, or False for an empty sequence)
> and all should return the first false item (or the last one if all true,
> or True for an empty seq) by analogy with the behavior of operators
> and/or.
I agree that getting the first witness (for "any") or counterexample (for "all") can be useful. I'm not sure I care what it returns if all are false for "any", or all true for "all". If I don't care, they're easy to write with itertools now: """ import itertools def all(seq): for x in itertools.ifilterfalse(None, seq): return x # return first false value return True def any(seq): for x in itertools.ifilter(None, seq): return x # return first true value return False print all([1, 2, 3]) # True print all([1, 2, 3, 0, 4, 5]) # 0, the first counterexample print any([0, 0, 0, 0]) # False print any([0, 42, 0, 0]) # 42, the first witness """ I liked ABC's quantified boolean expressions: SOME x IN collection HAS bool_expression_presumably_referencing_x EACH x IN collection HAS bool_expression_presumably_referencing_x NO x IN collection HAS bool_expression_presumably_referencing_x The first left x bound to the first witness when true. ABC didn't have boolean data values -- these expressions could only be used in control-flow statements (like IF). x was then a block-local binding in the block controlled by the truth of the expression, so there was no question about what to do with x when the expression was false (you didn't enter the block then, so couldn't reference the block-local x). The second and third left x bound to the first counterexample when the expression was false, and in those cases x was local to the ELSE clause. I viewed that as finessing around a question that shouldn't be asked, via the simple expedient of making the question unaskable <wink>. The exact rules were pretty complicated, though. 
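ABC's block-local binding of x can't be reproduced directly in Python, but a rough approximation of the three quantifiers -- returning the witness or counterexample alongside the truth value -- might look like this (a sketch, not ABC's actual scoping rules):

```python
def some(seq, pred):
    """SOME x IN seq HAS pred(x): (True, first witness) or (False, None)."""
    for x in seq:
        if pred(x):
            return True, x
    return False, None

def each(seq, pred):
    """EACH x IN seq HAS pred(x): (True, None) or (False, first counterexample)."""
    for x in seq:
        if not pred(x):
            return False, x
    return True, None

def no(seq, pred):
    """NO x IN seq HAS pred(x): (True, None) or (False, first witness)."""
    found, x = some(seq, pred)
    return not found, x

ok, witness = some([4, 6, 7, 10], lambda x: x % 2)
assert ok and witness == 7           # first odd number is the witness

ok, counter = each([4, 6, 7, 10], lambda x: x % 2 == 0)
assert not ok and counter == 7       # first odd number is the counterexample
```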
From tim.one@comcast.net Wed Apr 23 04:59:27 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 22 Apr 2003 23:59:27 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> [Greg Ward] > Hmmm, a random idea: has filter() ever been used for anything else? > I didn't think so. So why not remove everything *except* that handy > special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and > that's *all* filter() does. > > Just a random thought... It's been used for lots of other stuff, but I'm not sure if any other use wouldn't read better as a listcomp. For example, from spambayes: def textparts(msg): """Return a set of all msg parts with content maintype 'text'.""" return Set(filter(lambda part: part.get_content_maintype() == 'text', msg.walk())) I think that reads better as: return Set([part for part in msg.walk() if part.get_content_maintype() == 'text']) In Python 3.0 that will become a set comprehension <wink>: return {part for part in msg.walk() if part.get_content_maintype() == 'text'} From martin@v.loewis.de Wed Apr 23 06:09:28 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 23 Apr 2003 07:09:28 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net> <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net> Message-ID: <m34r4per5j.fsf@mira.informatik.hu-berlin.de> Tim Rice <tim@multitalents.net> writes: > Well after patching pyport.h for the sys/select problem, I had > errors because of missing u_int and u_long data types. In this form, I consider the patch unacceptable. 
Setting define_xopen_source should be the last resort, to be used only if the operating system is broken in the sense of not working at all as an X/Open system, for compiling software. If it is indeed the case that OpenUnix cannot work with _XOPEN_SOURCE defined, give one instance of an unsolvable problem in a comment that explains why it should be disabled. See the comments for other systems as to how to explain such problems. Saying there are "errors" is too unspecific; saying that u_int is not defined but needed for the signature of the foo_bar function would be ok. Please post your updated patch to SF. Regards, Martin From Ludovic.Aubry@logilab.fr Wed Apr 23 09:44:19 2003 From: Ludovic.Aubry@logilab.fr (Ludovic Aubry) Date: Wed, 23 Apr 2003 10:44:19 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <005e01c3084c$3fe7d300$ec11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <003d01c30848$ebcc2d00$ec11a044@oemcomputer> <005e01c3084c$3fe7d300$ec11a044@oemcomputer> Message-ID: <20030423084419.GC567@logilab.fr> On Mon, Apr 21, 2003 at 05:23:25PM -0400, Raymond Hettinger wrote: > [RH] > > For the C implementation, consider bypassing operator.add > > and calling the nb_add slot directly. It's faster and fulfills > > the intention to avoid the alternative call to sq_concat. > > Forget I said that, you still need PyNumber_Add() to > handle coercion and such. Though without some > special casing it's going to be darned difficult to match > the performance of a pure python for-loop (especially > for a sequence of integers). Why not move the integer add optimization from ceval.c into PyNumber_Add? 
Granted you have an extra call on the fast path, but on the other hand * more code could benefit from this optimization * you don't have code related to the same operation spread in several files * the ceval loop has a reduced footprint -- Ludovic Aubry LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From dave@boost-consulting.com Wed Apr 23 10:50:59 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 23 Apr 2003 05:50:59 -0400 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <20030423013506.GA2547@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> Message-ID: <847k9lbkzg.fsf@boost-consulting.com> Tim Peters <tim.one@comcast.net> writes: > [Greg Ward] >> Hmmm, a random idea: has filter() ever been used for anything else? >> I didn't think so. So why not remove everything *except* that handy >> special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and >> that's *all* filter() does. >> >> Just a random thought... > > It's been used for lots of other stuff, but I'm not sure if any other use > wouldn't read better as a listcomp. For example, from spambayes: > > def textparts(msg): > """Return a set of all msg parts with content maintype 'text'.""" > return Set(filter(lambda part: part.get_content_maintype() == 'text', > msg.walk())) > > I think that reads better as: > > return Set([part for part in msg.walk() > if part.get_content_maintype() == 'text']) IMO this one's much nicer than either of those: return Set( filter_(msg.walk(), _1.get_content_maintype() == 'text') ) with filter_ = lambda x, y: filter(y, x) and _N for N in 0..9 left as an exercise to the reader. It helps my brain a lot to be able to write the sequence before the filtering function, and for the kind of simple lambdas that Python is restricted to, having to name the arguments is just syntactic deadweight. 
python = best_language([pound for pound in the_world]) but-list-comprehensions-always-read-like-strange-english-to-me-ly y'rs, -- Dave Abrahams Boost Consulting www.boost-consulting.com From mwh@python.net Wed Apr 23 11:23:31 2003 From: mwh@python.net (Michael Hudson) Date: Wed, 23 Apr 2003 11:23:31 +0100 Subject: [Python-Dev] Metatype conflict among bases? In-Reply-To: <844r4qgcog.fsf@boost-consulting.com> (David Abrahams's message of "Tue, 22 Apr 2003 22:39:11 -0400") References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <844r4qgcog.fsf@boost-consulting.com> Message-ID: <2m3ck9wlzw.fsf@starship.python.net> David Abrahams <dave@boost-consulting.com> writes: > Still, the message is misleading. There's only one base class, so > the metatype conflict is not "among bases". Not arguing with that, but: what would you suggest instead? I'm agin the idea of having small essays in tracebacks... Cheers, M. -- People think I'm a nice guy, and the fact is that I'm a scheming, conniving bastard who doesn't care for any hurt feelings or lost hours of work if it just results in what I consider to be a better system. -- Linus Torvalds From barry@python.org Wed Apr 23 12:19:16 2003 From: barry@python.org (Barry Warsaw) Date: 23 Apr 2003 07:19:16 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> Message-ID: <1051096756.19699.14.camel@anthem> On Tue, 2003-04-22 at 23:59, Tim Peters wrote: > In Python 3.0 that will become a set comprehension <wink>: PEP 274 lives! -Barry From dave@boost-consulting.com Wed Apr 23 13:17:15 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 23 Apr 2003 08:17:15 -0400 Subject: [Python-Dev] Re: Metatype conflict among bases? 
References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <844r4qgcog.fsf@boost-consulting.com> <2m3ck9wlzw.fsf@starship.python.net> Message-ID: <znmhxvas.fsf@boost-consulting.com> Michael Hudson <mwh@python.net> writes: > David Abrahams <dave@boost-consulting.com> writes: > >> Still, the message is misleading. There's only one base class, so >> the metatype conflict is not "among bases". > > Not arguing with that, but: what would you suggest instead? I'm agin > the idea of having small essays in tracebacks... metatype conflict: metatype of derived class B must be a (non-strict) subclass of the metatypes of its bases I don't think that's too verbose. Too many traceback messages from Python give no indication of what the actual problem was or how to fix it, so I don't mind getting a bit more essay-like. Just today on python-list I saw this >>> range(map(lambda x:x+1, [0, 100, 3])) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: an integer is required come up as a problem for someone. -- Dave Abrahams Boost Consulting www.boost-consulting.com From skip@pobox.com Wed Apr 23 14:27:57 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 23 Apr 2003 08:27:57 -0500 Subject: [Python-Dev] okay to beef up tests on the maintenance branch? Message-ID: <16038.38109.39294.770440@montanaro.dyndns.org> Is it okay to beef up the test harness code a little on the 2.2 maintenance branch? I'm installing 2.2.2 on a Solaris 8 machine at the moment and notice two small warts: * -u all isn't accepted * there are no sunos5 expected skips Any problem adding them for 2.2.3 if they aren't already in CVS (they may already be there)? More generally, is improving the test harness okay (not strictly a bug fix) since it doesn't directly affect the performance of the interpreter? 
Thx, Skip From aleax@aleax.it Wed Apr 23 14:49:54 2003 From: aleax@aleax.it (Alex Martelli) Date: Wed, 23 Apr 2003 15:49:54 +0200 Subject: [Python-Dev] Re: Metatype conflict among bases? In-Reply-To: <znmhxvas.fsf@boost-consulting.com> References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <2m3ck9wlzw.fsf@starship.python.net> <znmhxvas.fsf@boost-consulting.com> Message-ID: <200304231549.54063.aleax@aleax.it> On Wednesday 23 April 2003 02:17 pm, David Abrahams wrote: ... > it, so I don't mind getting a bit more essay-like. Just today on > python-list I saw this > > >>> range(map(lambda x:x+1, [0, 100, 3])) > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: an integer is required > > come up as a problem for someone. It's a bit better in the current CVS Python -- essentially all error messages from built-ins now identify which built-in is involved, and many give extra, pertinent information -- e.g.: [alex@lancelot src]$ ./python -c 'range(map(str,[1,2,3]))' Traceback (most recent call last): File "<string>", line 1, in ? TypeError: range() integer end argument expected, got list. As long as the message still typically fits within one line, I think there can be no substantial objection to making it clearer and more informative. Alex From fdrake@acm.org Wed Apr 23 15:52:03 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 23 Apr 2003 10:52:03 -0400 Subject: [Python-Dev] okay to beef up tests on the maintenance branch? In-Reply-To: <16038.38109.39294.770440@montanaro.dyndns.org> References: <16038.38109.39294.770440@montanaro.dyndns.org> Message-ID: <16038.43155.910242.470533@grendel.zope.com> Skip Montanaro writes: > * -u all isn't accepted I think "all" and the "-<feature>" syntax should both be added; I don't see any problem with backporting enhancements to the maintenance tools. > * there are no sunos5 expected skips The expected skips information should certainly be maintained on the maintenance branch. Feel free! 
-Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From neal@metaslash.com Wed Apr 23 18:21:10 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 23 Apr 2003 13:21:10 -0400 Subject: [Python-Dev] vacation Message-ID: <20030423172110.GO12836@epoch.metaslash.com> I'm going on vacation from Apr 26 - May 6. I will probably not be available during this period. Sometime in the next month or so, I plan to run valgrind and pychecker over everything. I should be done before beta2. Also, the snake farm still has some issues. I will try to improve the snake farm status in May or June. But if anybody wants to volunteer to fix any of the issues, feel free. :-) http://www.lysator.liu.se/xenofarm/python/latest.html Some test failures are: test_logging Solaris 8, RedHat 9 test_getargs2 Solaris 8, Mac OS X test_time RedHat 9, Linux ia64 For a hack which seems to fix test_logging problem, see my comment here: http://python.org/sf/725904 Neal From theller@python.net Wed Apr 23 18:48:06 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 19:48:06 +0200 Subject: [Python-Dev] vacation In-Reply-To: <20030423172110.GO12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> Message-ID: <3ck95cmh.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > Some test failures are: > > test_getargs2 Solaris 8, Mac OS X It seems test_getargs2 fails on big endian platforms. Is the solaris 8 such a machine? See also the comments I added to http://www.python.org/sf/724774. I have the impression that the test is broken. 
Should I try to fix it (difficult, without access to neither Mac or Solaris), or should it simply be deleted ;-) Thomas From neal@metaslash.com Wed Apr 23 19:02:21 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 23 Apr 2003 14:02:21 -0400 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <3ck95cmh.fsf@python.net> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> Message-ID: <20030423180221.GP12836@epoch.metaslash.com> On Wed, Apr 23, 2003 at 07:48:06PM +0200, Thomas Heller wrote: > Neal Norwitz <neal@metaslash.com> writes: > > > Some test failures are: > > > > test_getargs2 Solaris 8, Mac OS X > > It seems test_getargs2 fails on big endian platforms. Is the solaris 8 > such a machine? I believe so. > See also the comments I added to http://www.python.org/sf/724774. > > I have the impression that the test is broken. Should I try to fix it > (difficult, without access to neither Mac or Solaris), or should it > simply be deleted ;-) I think getargs_ul() is broken. For example, if the user passes more than a single char as the format, memory will be scribbled on. The format should be checked to make sure it contains acceptable values for getargs_ul() to be safe. I fixed a similar problem in revision 1.23 of _testcapimodule.c. See comment and code around line 330. I'm not really sure of the purpose of _testcapimodule, so perhaps the lack of error checking is acceptable? I can fix the problems, but not before the beta will go out. 
Neal From theller@python.net Wed Apr 23 19:08:47 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 20:08:47 +0200 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <20030423180221.GP12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <20030423180221.GP12836@epoch.metaslash.com> Message-ID: <ist53x3k.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > On Wed, Apr 23, 2003 at 07:48:06PM +0200, Thomas Heller wrote: > > Neal Norwitz <neal@metaslash.com> writes: > > > > > Some test failures are: > > > > > > test_getargs2 Solaris 8, Mac OS X > > > > It seems test_getargs2 fails on big endian platforms. Is the solaris 8 > > such a machine? > > I believe so. > > > See also the comments I added to http://www.python.org/sf/724774. > > > > I have the impression that the test is broken. Should I try to fix it > > (difficult, without access to neither Mac or Solaris), or should it > > simply be deleted ;-) > > I think getargs_ul() is broken. That was what I meant. > For example, if the user passes more > than a single char as the format, memory will be scribbled on. The > format should be checked to make sure it contains acceptable values > for getargs_ul() to be safe. It is even broken if only single character formats are passed, because it always uses an unsigned long * as the third parameter, which is wrong for 'B' and 'H' format codes. > > I fixed a similar problem in revision 1.23 of _testcapimodule.c. > See comment and code around line 330. I will take a look. > > I'm not really sure of the purpose of _testcapimodule, so perhaps > the lack of error checking is acceptable? I can fix the problems, > but not before the beta will go out. 
> > Neal Thomas From guido@python.org Wed Apr 23 19:31:53 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 14:31:53 -0400 Subject: [Python-Dev] Democracy Message-ID: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> I read this interview in ACM's *Ubiquity* which reminded me of the Python developer community. Seems we are doing some things right. Maybe we can learn from it in cases where we aren't. http://www.acm.org/ubiquity/interviews/b_manville_1.html --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Apr 23 20:24:01 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 23 Apr 2003 14:24:01 -0500 Subject: [Python-Dev] vacation In-Reply-To: <3ck95cmh.fsf@python.net> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> Message-ID: <16038.59473.700903.98765@montanaro.dyndns.org> Thomas> I have the impression that the test is broken. Should I try to Thomas> fix it (difficult, without access to neither Mac or Solaris), or Thomas> should it simply be deleted ;-) I have access to both Mac OS X and Solaris 8. I routinely build from CVS on my Mac Laptop (my default Python interpreter there is built from CVS). I can set up a CVS tree on a Solaris 8 machine and test anything you need. Skip From theller@python.net Wed Apr 23 20:37:43 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 21:37:43 +0200 Subject: [Python-Dev] vacation In-Reply-To: <16038.59473.700903.98765@montanaro.dyndns.org> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <16038.59473.700903.98765@montanaro.dyndns.org> Message-ID: <sms92eew.fsf@python.net> Skip Montanaro <skip@pobox.com> writes: > Thomas> I have the impression that the test is broken. Should I try to > Thomas> fix it (difficult, without access to neither Mac or Solaris), or > Thomas> should it simply be deleted ;-) > > I have access to both Mac OS X and Solaris 8. 
I routinely build from CVS on > my Mac Laptop (my default Python interpreter there is built from CVS). I > can set up a CVS tree on a Solaris 8 machine and test anything you need. In this case I'll try to fix it tomorrow. Thanks, Thomas From aahz@pythoncraft.com Wed Apr 23 20:46:40 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 15:46:40 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <20030423194638.GA19312@panix.com> On Tue, Apr 22, 2003, Jeremy Hylton wrote: > > Abstract > ======== > > This PEP describes a mechanism for attaching profile and trace > functions to a thread when it is created. This mechanism allows > existing tools, like the profiler, to work with multi-threaded > programs. The new functionality is exposed via a new event type for > trace functions. Hrm. While I don't want to overload what looks like a simple PEP, I'd like some thoughts about how this ought to interact with thread-local storage (if at all). There are some modules (notably the BCD module) that need to keep track of state on a per-thread basis, but without requiring a user of the module to do the work. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From guido@python.org Wed Apr 23 21:58:09 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 16:58:09 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Wed, 23 Apr 2003 15:46:40 EDT." <20030423194638.GA19312@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> Message-ID: <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> > Hrm. 
While I don't want to overload what looks like a simple PEP, I'd > like some thoughts about how this ought to interact with thread-local > storage (if at all). There are some modules (notably the BCD module) > that need to keep track of state on a per-thread basis, but without > requiring a user of the module to do the work. IMO you can do thread-local storage just fine by attaching private attributes to threading.currentThread(). --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@performancedrivers.com Wed Apr 23 22:53:11 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Wed, 23 Apr 2003 17:53:11 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Apr 23, 2003 at 02:31:53PM -0400 References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030423175310.F15881@localhost.localdomain> On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > I read this interview in ACM's *Ubiquity* which reminded me of the > Python developer community. Seems we are doing some things right. > Maybe we can learn from it in cases where we aren't. He seems to be talking more about Governments (and treating companies as governments b/c the people can't or don't want to leave) and knowledge workers broadly. A better comparison would be Habitat for Humanity (and voluntary associations in general). Habitat has some fixed overhead for the organization. They get free labor from anyone that wants to contribute it and agrees with the scope of work. The amount of product they can churn out (houses) is greatly increased by private donations that can hire full-time labor and marginal supplies. Most of the voluntary labor is from the local community who want to see the area improved. Would be home owners have to contribute large amounts of time in exchange for an inexpensive house built mostly by others. 
It wouldn't go away if there was no funding, it would just be a local fixup club (which do exist). If there is a large group of people that think they should be building differently, they will form their own association (fork) which will take some or all of the patrons and volunteers with it. It maintains its character because the bulk of the labor and all the contributions are voluntary. If they paid everyone and sold the houses at a profit they would be a regular company. The building houses vs building code analogy is not perfect. Houses have fixed costs per deployment a portion of which is paid by the new owner. Software costs are extremely low per copy, so that wouldn't work. People get real but widely varying benefits from a copy of python (personal site v commercial product). In closing, if there is something to be learned by looking at others, specific purpose voluntary associations seem to be the better place to look than governments. -jack From lalo@laranja.org Wed Apr 23 22:54:13 2003 From: lalo@laranja.org (Lalo Martins) Date: Wed, 23 Apr 2003 18:54:13 -0300 Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <20030423215413.GD8197@laranja.org> On Wed, Apr 23, 2003 at 05:53:11PM -0400, Jack Diederich wrote: > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. > > He seems to be talking more about Governments (and treating companies as > governments b/c the people can't or don't want to leave) and knowledge workers > broadly. In fact he mentions in the text that the open source community (he uses the term "open software") is a good example of this model. 
[]s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/ From aahz@pythoncraft.com Wed Apr 23 23:40:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 18:40:05 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030423224005.GA6089@panix.com> On Wed, Apr 23, 2003, Guido van Rossum wrote: > Aahz: >> >> Hrm. While I don't want to overload what looks like a simple PEP, I'd >> like some thoughts about how this ought to interact with thread-local >> storage (if at all). There are some modules (notably the BCD module) >> that need to keep track of state on a per-thread basis, but without >> requiring a user of the module to do the work. > > IMO you can do thread-local storage just fine by attaching private > attributes to threading.currentThread(). Agreed -- *if* Jeremy goes for your threading-only solution. If this PEP hooks in at a lower level, that's going to require that everything else built on top of threads work at a lower level, too. Seems to me that this is a good argument for module-level properties, BTW, or we require that all module attributes be set only through functions. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
From aahz@pythoncraft.com Wed Apr 23 23:59:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 18:59:05 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <20030423225905.GA11217@panix.com> On Wed, Apr 23, 2003, Jack Diederich wrote: > > In closing, if there is something to be learned by looking at others, > specific purpose voluntary associations seem to be the better place to > look than governments. Excellent post! Another community that I often mention along those lines is science fiction fandom. It's particularly relevant because fandom has many of the same social issues as the programming community (people who are True Believers, unbelievable amounts of politics, people with marginal social skills, and so on). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From pje@telecommunity.com Thu Apr 24 01:14:52 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 23 Apr 2003 20:14:52 -0400 Subject: [Python-Dev] Updating PEP 246 for type/class unification, 2.2+, etc. Message-ID: <5.1.1.6.0.20030423191448.00a30e20@mail.rapidsite.net> I'd like to propose some revisions to PEP 246 based on experience trying to implement a prototype of it for use in PEAK and Zope (perhaps Twisted as well). The issues I see are as follows: 1. PEP 246 allows TypeError in __conform__ and __adapt__ methods to pass silently. (After considerable work and thought, I was able to reverse-engineer *why*, but that rationale should at least be explicitly documented in the PEP, even if the limitation is unavoidable.) 2. 
The reference implementation in the PEP has fancy extra features that are not specified by the main body of the PEP, and in some cases raise more questions than they answer about what a valid PEP 246 implementation should do. (adaptRaiseTypeException, adaptForceFailException, _check, etc.) 3. The PEP 246 examples do not illustrate Python 2.2+ idioms for creating usable __conform__ and __adapt__ methods. For example, a class instance with a __call__ method gets stuck in another class in order to (presumably) work around the absence of staticmethod or classmethod in Python prior to version 2.2. The reference implementation also uses string exceptions, which were a no-no even before version 2.2. 4. PEP 246 does not cover implementation issues for developers in the cases where 'obj' is a class or 'protocol' is an instance. The former is particularly important in the context of adapting metaclass instances, and the latter is relevant for using Zope 'Interface' objects (for example) as protocols. None of these issues are unresolvable; in fact I have proposals to address them all. If the PEP authors agree with my assessments, perhaps they will undertake to update the PEP. My goal is not to get a PEP 246 'adapt()' blessed for the Python core or distro in the immediate future, but rather to have a usable reference standard for framework developers to build implementations on. Even more important... I would like framework users to be able to write __conform__ and __adapt__ methods that will be in principle usable by any framework that uses PEP 246 as a standard for adaptation. In this sense, we may view the role of PEP 246 as being similar to the Python DBAPI. 
So, without further ado, my proposals for revisions to PEP 246 are as follows: Issue #1: My reverse-engineering leads me to the conclusion that PEP 246 specifies that TypeError be ignored because of issue #4 above: using a class as 'obj' or an instance for 'protocol' may lead to a TypeError caused by using a class method as an instance method or vice versa. While the creator of the objects being supplied to 'adapt()' can work around these issues with descriptors, the casual user should not be expected to. Thus, such TypeErrors should be ignored. To resolve this dilemma, I propose that 'adapt()' use the following pseudocode to verify whether a TypeError has arisen from invocation of a method, or the execution of a method: try: # note: real implementation needs to catch AttributeError! result = obj.__conform__(protocol) if result is not None: return result except TypeError: if sys.exc_info()[2].tb_frame is not sys._getframe(): raise In other words, if the exception was raised in the calling frame, it is assumed to be an invocation error rather than an execution error, and can thus be safely ignored. The only "exception" to this pattern is if the targeted method is written in C and thus does not create a separate frame for execution. (Note that C code generated by Pyrex creates dummy execution frames before returning an exception to Python, so this is only an issue for hand-written C code.) The worst case scenario here is that authors of '__conform__' and '__adapt__' methods written in C must 1) guarantee that TypeError will not be raised, 2) accept silent loss of internal TypeErrors, or 3) write code to create a dummy frame when raising an error. As far as Jython impact, the mechanism by which TypeErrors are raised is different, so I do not know if it is possible for the Java or Python levels to cleanly make this differentiation. If Jython simulates Python frames and tracebacks, including only Python-level frames, then this would work more or less directly. 
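A runnable sketch of the discrimination described above -- note that this version tests tb_next rather than comparing tb_frame, on the assumption that the first traceback entry is always the frame containing the try, so an exception raised by the invocation itself is the one whose traceback has no deeper entry; all class and function names here are illustrative only:

```python
import sys

class Conforming(object):
    def __conform__(self, protocol):
        return self                    # claims to support any protocol

class Buggy(object):
    def __conform__(self, protocol):
        raise TypeError('bug inside __conform__')   # a genuine internal error

class Unusable(object):
    __conform__ = None                 # calling this raises TypeError
                                       # in the *caller's* frame

def try_conform(obj, protocol):
    try:
        # real code would also catch AttributeError (no __conform__ at all)
        return obj.__conform__(protocol)
    except TypeError:
        if sys.exc_info()[2].tb_next is None:
            return None   # raised by the invocation itself: ignore it
        raise             # raised inside __conform__: a real bug, propagate

assert try_conform(Conforming(), 'P') is not None
assert try_conform(Unusable(), 'P') is None
try:
    try_conform(Buggy(), 'P')
except TypeError:
    pass                  # the internal TypeError propagated, as intended
else:
    raise AssertionError('expected the internal TypeError to propagate')
```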
I confess I do not understand enough about Jython's implementation at present to know how practical it is under Jython. An alternative might be to recognize the text of the Python exception values for unbound methods, missing arguments, etc., applying to the method being called. This might actually be more complex to implement correctly, though. Issue #2: I propose that the PEP 246 reference implementation be pared down to remove extraneous features. Specifically, I believe that the signature of adapt should be: _marker = object() def adapt(obj, protocol, default=_marker): # ... attempt to return adapted result if default is _marker: raise NotImplementedError(...) 'adaptForceFailException' looks to me like a YAGNI, since an object shouldn't veto its being used for a protocol if the protocol knows how to adapt it. And the protocol doesn't need to force failure; it can return failure. Raising NotImplementedError here is also preferable to raising a TypeError for adaptation failure, which would "raise" even further confusion regarding the proper handling of TypeError. Finally, the '_check()' function should be dropped. Its presence simply makes it harder to evaluate or consider PEP 246 for inclusion in Python or a framework, because it is left unspecified what '_check()' should do. We are given many examples of what it *could* do, but not what it *should* do. In any event, I think it's a YAGNI because if the object claims it can conform or the protocol claims it can adapt, then what business is it of 'adapt()' to question the consent of the objects involved? Issue #3: Examples should use 'classmethod' for '__adapt__' rather than simulated or real 'staticmethod', and include the case where a subclass delegates to a superclass '__adapt__' method. And string exceptions are right out. Issue #4: Illustrate the issues that arise for adapting classes or metaclass instances, and using instances rather than types as protocols. Ideally, examples of descriptors that work around the issues should be included.
(And as soon as I've figured out how to write them, I'll be happy to supply source!) Thoughts, anyone? From guido@python.org Thu Apr 24 01:33:19 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 20:33:19 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Wed, 23 Apr 2003 18:40:05 EDT." <20030423224005.GA6089@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> Message-ID: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> > > Aahz: > >> Hrm. While I don't want to overload what looks like a simple PEP, I'd > >> like some thoughts about how this ought to interact with thread-local > >> storage (if at all). There are some modules (notably the BCD module) > >> that need to keep track of state on a per-thread basis, but without > >> requiring a user of the module to do the work. > On Wed, Apr 23, 2003, Guido van Rossum wrote: > > IMO you can do thread-local storage just fine by attaching private > > attributes to threading.currentThread(). Aahz: > Agreed -- *if* Jeremy goes for your threading-only solution. If this > PEP hooks in at a lower level, that's going to require that everything > else built on top of threads work at a lower level, too. Well, I think it's fair to say that you should use the higher-level threading module if you want higher-level concepts like thread-local storage. (A poor name IMO; it would be better to call it "per-thread data".) > Seems to me that this is a good argument for module-level properties, > BTW, or we require that all module attributes be set only through > functions. I'm not following. What do you mean by module-level properties? --Guido van Rossum (home page: http://www.python.org/~guido/) From amk@amk.ca Wed Apr 23 17:39:47 2003 From: amk@amk.ca (A.M. 
Kuchling) Date: Wed, 23 Apr 2003 12:39:47 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 Message-ID: <20030423163947.GA24541@nyman.amk.ca> A while ago Paul Rubin proposed adding a Rijndael/AES module to 2.3. (AES = Advanced Encryption Standard, a block cipher that's likely to be around for a long time). Rubin wanted to come up with a nice interface for the module, and has posted some notes toward it. I have an existing implementation that's 2212 lines of code; I like the interface, but opinions may vary. :) Do we want to do anything about this for 2.3? A benefit is that AES is useful, and likely to remain so for the next 20 years; a drawback is that it might entangle the PSF in export-control legalities. I vaguely recall the PSF getting some legal advice on this point; am I misremembering? What was the outcome? If AES gets added, rotor can be deprecated to encourage people to use something better; patch is at <URL:http://www.python.org/sf/679505>. --amk (www.amk.ca) Cerebral circuits in order. Physiognomy dubious. -- K9 assesses the Doctor's condition, in "The Invasion of Time" From guido@python.org Thu Apr 24 02:14:26 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 21:14:26 -0400 Subject: [Python-Dev] Democracy In-Reply-To: "Your message of Wed, 23 Apr 2003 17:53:11 EDT." <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. 
> > He seems to be talking more about Governments (and treating > companies as governments b/c the people can't or don't want to > leave) and knowledge workers broadly. Well, he specifically points out that the US government is an inappropriate model, and suggests instead to use the government of ancient Athens as a model. Then he goes on to point out several properties of that community that I think match our community pretty well: (1) Shared communal values, including moral reciprocity; you get professional or personal growth in return for your contributions. I think many developers contribute and learn something from the review of their code by others. (2) Structure, a body for debate, dialogue, and decision-making. "The organization is the people." In our case: mailing lists, PEPs, SourceForge, CVS. (3) Specific practices: the right and expectation of *participation*; *consequence* or *accountability*: if you decide something, you have to do the work; *deliberation*: resist partisanship; *merit* as the basis for decisions; and *closure*: debates shouldn't go on forever and once a decision is made, everyone is supposed to get on board. I think all those things match our way of working pretty well! > A better comparison would be Habitat for Humanity (and voluntary > associations in general). [...] Maybe. I get lots of junk mail asking for contributions from HforH and frankly I've always thought of them as yet another charity: there are lots of these, and most of them are so much larger than our community that comparison is difficult. IMO these large charities in general (maybe not HforH, I don't know anything about them because on principle I never open unsolicited mail) are too much like modern-day massive governments already: they typically have a leadership who, like politicians, would do anything to keep or improve their personal position. I hope that's not true for the Python developer community. 
Certainly my own motivation is the fun I have here and not personal gain!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 24 02:17:08 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 21:17:08 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of Wed, 23 Apr 2003 12:39:47 EDT." <20030423163947.GA24541@nyman.amk.ca> References: <20030423163947.GA24541@nyman.amk.ca> Message-ID: <200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net> > A while ago Paul Rubin proposed adding a Rijndael/AES module to 2.3. > (AES = Advanced Encryption Standard, a block cipher that's likely to > be around for a long time). Rubin wanted to come up with a nice > interface for the module, and has posted some notes toward it. I have > an existing implementation that's 2212 lines of code; I like the > interface, but opinions may vary. :) > > Do we want to do anything about this for 2.3? A benefit is that AES > is useful, and likely to remain so for the next 20 years; a drawback > is that it might entangle the PSF in export-control legalities. I > vaguely recall the PSF getting some legal advice on this point; am I > misremembering? What was the outcome? I don't recall; I think Jeremy knows most about these issues. Personally, I expect that even if we could get certification, it would be much easier if there was no encryption code at all in Python, and if people had to get it from a 3rd party site. > If AES gets added, rotor can be deprecated to encourage people to use > something better; patch is at <URL:http://www.python.org/sf/679505>. Rotor should be deprecated regardless; I've never heard of someone using it. --Guido van Rossum (home page: http://www.python.org/~guido/) From agthorr@barsoom.org Thu Apr 24 04:46:57 2003 From: agthorr@barsoom.org (Agthorr) Date: Wed, 23 Apr 2003 20:46:57 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <20030422054218.GA18642@barsoom.org> References: <20030420183005.GB8449@barsoom.org> <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net> <20030422054218.GA18642@barsoom.org> Message-ID: <20030424034656.GF12507@barsoom.org> On Mon, Apr 21, 2003 at 10:42:18PM -0700, Agthorr wrote: > However, speaking of subclassing Queue: is it likely there are many > user applications that subclass it in a way that would break? (i.e., > they override some, but not all, of the functions intended for > overriding). Answering myself, I notice that the bisect class documents this use of the Queue class: ------------------------------------------------------------------------ The bisect module can be used with the Queue module to implement a priority queue (example courtesy of Fredrik Lundh): \index{Priority Queue} \begin{verbatim} import Queue, bisect class PriorityQueue(Queue.Queue): def _put(self, item): bisect.insort(self.queue, item) ------------------------------------------------------------------------ This example relies on the behavior of the other internal functions of the Queue class. Since my faster Queue class changes the internal structure, it breaks this example. Strangely, the internal functions are not actually mentioned in the documentation for Queue, so this example is somewhat anomalous. However, the comments inside Queue.py *do* suggest subclassing Queue to create non-FIFO queues. The example was not present in 2.2, so removing it may not hurt too many people. I confess I'm new to the Python development process. Who makes decisions about whether this type of change should go in, or not? Do I just submit a patch and cross my fingers? ;) -- Agthorr From tim_one@email.msn.com Thu Apr 24 05:35:39 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 00:35:39 -0400 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <20030424034656.GF12507@barsoom.org> Message-ID: <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> [Agthorr] >> However, speaking of subclassing Queue: is it likely there are many >> user applications that subclass it in a way that would break? (i.e., >> they override some, but not all, of the functions intended for >> overriding). [Agthorr] > Answering myself, I notice that the bisect class documents this use of > the Queue class: > ------------------------------------------------------------------------ > The bisect module can be used with the Queue module to implement > a priority queue (example courtesy of Fredrik Lundh): \index{Priority > Queue} > > \begin{verbatim} > import Queue, bisect > > class PriorityQueue(Queue.Queue): > def _put(self, item): > bisect.insort(self.queue, item) > ------------------------------------------------------------------------ > > This example relies on the behavior of the other internal functions of > the Queue class. Since my faster Queue class changes the internal > structure, it breaks this example. Strangely, the internal functions > are not actually mentioned in the documentation for Queue, so this > example is somewhat anomalous. However, the comments inside Queue.py > *do* suggest subclassing Queue to create non-FIFO queues. > > The example was not present in 2.2, so removing it may not hurt too > many people. I'm sorry I had to let this thread drop. I had lots of time to type on the weekend, and on Monday because I took that day off from work sick. My time is gone now, though. As a delayed answer to your question, yes, people do this. I expect the most common subclass does just this: def _get(self): return self.queue.pop() That is, for many apps, the first-in part of FIFO isn't needed, and a stack of work is just as good. I'm not sure it wouldn't be just as good for your simulation app, either! 
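That override, fleshed out into a runnable snippet (spelled with the modern lower-case queue module name so it runs as-is; in the Python 2.x of this thread the module is Queue):

```python
import queue  # the module this thread calls Queue

class StackQueue(queue.Queue):
    """Queue subclass that overrides only the _get hook, turning the
    underlying buffer into a stack: last item put is first item got."""
    def _get(self):
        return self.queue.pop()  # pop the right end instead of the left

sq = StackQueue()
for item in (1, 2, 3):
    sq.put(item)
print(sq.get(), sq.get(), sq.get())  # -> 3 2 1
```

The bisect-based PriorityQueue quoted above leans on the same _put/_get hook pair, which is why a reimplementation that changes the internal structure breaks it.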
People aren't "supposed to" muck with private names, and a single underscore at the front is a convention for saying "please don't muck with this". I don't believe you made a strong enough case to break code that cheats, though: the code as it is now is obviously correct at first glance. The best that can be said for the much hairier circular-buffer business is that it's not obviously incorrect at first glance, and Python isn't immune to the rule that ongoing maintenance is more expensive than initial development. I also think your use of (presumably many) thousands of Queue items is unusual. A subclass may be welcome, and doc clarifications would certainly be welcome. > I confess I'm new to the Python development process. Who makes > decisions about whether this type of change should go in, or not? Do > I just submit a patch and cross my fingers? ;) There aren't enough volunteers to review patches, and "Guido's team" doesn't spend work hours on Python anymore except as it happens to intersect with important Zope needs, so I'm afraid it may sit there forever. Talking about it on Python-Dev was/is a good thing. If you haven't already, you should devour the developer material at: http://www.python.org/dev/ Right now we're trying to conserve our "spare time" for resolving issues necessary to release 2.3b1 on Friday, so it's hard to keep a conversation going. Don't let any of this discourage you! To become a Python developer requires an almost supernatural love of discouragement -- you'll know what I mean when you meet any of us <wink>. From tim_one@email.msn.com Thu Apr 24 05:42:35 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 00:42:35 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEPIEHAB.tim_one@email.msn.com> [Guido] > ... > Certainly my own motivation is the fun I have here and not personal > gain!!! It's good to hear that.
I've been worrying that if your goal had been riches and power all along, you must be incompetent <wink>. don't-worry-*our*-goal-is-your-personal-gain-ly y'rs - tim From martin@v.loewis.de Thu Apr 24 06:19:55 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 24 Apr 2003 07:19:55 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <20030423163947.GA24541@nyman.amk.ca> References: <20030423163947.GA24541@nyman.amk.ca> Message-ID: <m3lly079qc.fsf@mira.informatik.hu-berlin.de> "A.M. Kuchling" <amk@amk.ca> writes: > Do we want to do anything about this for 2.3? A benefit is that AES > is useful, and likely to remain so for the next 20 years; a drawback > is that it might entangle the PSF in export-control legalities. I > vaguely recall the PSF getting some legal advice on this point; am I > misremembering? What was the outcome? I think we now formally meet all US export requirements. The requirement is that we inform some agency that we do export cryptographic software. Jeremy did that. I don't recall the exact details of that registration, but I think it would be easy to update it to also report that we export an AES implementation (or, perhaps, our registration was generic to cover all future additions to the SF CVS tree). So I'm all in favour of adding AES to the Python standard library. Regards, Martin From agthorr@barsoom.org Thu Apr 24 06:26:41 2003 From: agthorr@barsoom.org (Agthorr) Date: Wed, 23 Apr 2003 22:26:41 -0700 Subject: [Python-Dev] FIFO data structure? In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> References: <20030424034656.GF12507@barsoom.org> <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> Message-ID: <20030424052640.GG12507@barsoom.org> On Thu, Apr 24, 2003 at 12:35:39AM -0400, Tim Peters wrote: > I'm sorry I had to let this thread drop. I had lots of time to type on the > weekend, and on Monday because I took that day off from work sick. My time > is gone now, though.
Quite alright; I understand entirely. I appreciate your responding now :-) > That is, for many apps, the first-in part of FIFO isn't needed, and a stack > of work is just as good. I'm not sure it wouldn't be just as good for your > simulation app, either! It might be. When I originally wrote my simulation-dispatcher, it needed to work with a small FIFO Queue. I'm currently using it for a different project where a large stack would work fine. There's a good chance that sometime in the future I'll need the large FIFO, though. > People aren't "supposed to" muck with private names, and a single underscore > at the front is a convention for saying "please don't muck with > this". I've been thinking about what the "right way" for the Queue to expose its interface would be. It doesn't seem quite right for those functions to be "public" names either, since they should never actually be called directly by a user program. Is there a convention for member functions that are meant to be overridden, but not (externally) called? > I don't believe you made a strong enough case to break code that cheats, > though: the code as it is now is obviously correct at first glance. The > best that can be said for the much hairier circular-buffer business is that > it's not obviously incorrect at first glance, and Python isn't immune to > the rule that ongoing maintenance is more expensive than initial development. I also > think your use of (presumably many) thousands of Queue items is unusual. A > subclass may be welcome, and doc clarifications would certainly be > welcome. That's fair. My other primary programming language is C, where the standard libraries tend to be tightly optimized for performance. Hence, my expectations tend to be biased in that direction.
That doesn't mean that my expectations are the right way to do things though ;) "Premature optimization is the root of much evil" > There aren't enough volunteers to review patches, and "Guido's team" doesn't > spend work hours on Python anymore except as it happens to intersect with > important Zope needs, so I'm afraid it may sit there forever. Talking about > it on Python-Dev was/is a good thing. If you haven't already, you should > devour the developer material at: > > http://www.python.org/dev/ I have, indeed, already devoured it. :) > Right now we're trying to conserve our "spare time" for resolving issues > necessary to release 2.3b1 on Friday, so it's hard to keep a conversation > going. Okay, in that case I'll drop the Queue issue for now, and revisit the thread on heaps. That's something I feel needs to be done right for 2.3, or a bunch of user code will come to depend on the heap implementation rather than the heap interface. > Don't let any of this discourage you! To become a Python developer requires > an almost supernatural love of discouragement -- you'll know what I mean > when you meet any of us <wink>. Thanks :) -- Agthorr From ji@mit.jyu.fi Thu Apr 24 07:12:05 2003 From: ji@mit.jyu.fi (Jonne Itkonen) Date: Thu, 24 Apr 2003 09:12:05 +0300 (EETDST) Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> Message-ID: <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi> On Wed, 23 Apr 2003, Jack Diederich wrote: > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. > > He seems to be talking more about Governments (and treating companies as > governments b/c the people can't or don't want to leave) and knowledge > workers broadly. ... > The building houses vs building code analogy is not perfect. ... 
> In closing, if there is something to be learned by looking at others, There always is... The article at Ubiquity, Jack's writings, and the appearance of ancient Greeks here and there... I'd like to point you to http://www.dreamsongs.org/MobSoftware.html Is the resemblance in my eyes, or do we get a glimpse of a paradigm shift approaching? Jonne From jack@performancedrivers.com Thu Apr 24 09:05:43 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Thu, 24 Apr 2003 04:05:43 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi>; from ji@mit.jyu.fi on Thu, Apr 24, 2003 at 09:12:05AM +0300 References: <20030423175310.F15881@localhost.localdomain> <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi> Message-ID: <20030424040543.I15881@localhost.localdomain> On Thu, Apr 24, 2003 at 09:12:05AM +0300, Jonne Itkonen wrote: > On Wed, 23 Apr 2003, Jack Diederich wrote: > > > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > > I read this interview in ACM's *Ubiquity* which reminded me of the > > > Python developer community. Seems we are doing some things right. > > > Maybe we can learn from it in cases where we aren't. > > > > He seems to be talking more about Governments (and treating companies as > > governments b/c the people can't or don't want to leave) and knowledge > > workers broadly. > ... > > The building houses vs building code analogy is not perfect. > ... > > In closing, if there is something to be learned by looking at others, > > There always is... > > http://www.dreamsongs.org/MobSoftware.html > Before we go too far afield, does anyone know of a Wiki where this kind of thing is discussed, or is anyone willing to host one? This is a worthwhile conversation, but is a runaway favorite for off topic thread of tomorrow. -jack From mal@lemburg.com Thu Apr 24 09:32:27 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 24 Apr 2003 10:32:27 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3lly079qc.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EA7A11B.8090202@lemburg.com> Martin v. Löwis wrote: > "A.M. Kuchling" <amk@amk.ca> writes: > > >>Do we want to do anything about this for 2.3? A benefit is that AES >>is useful, and likely to remain so for the next 20 years; a drawback >>is that it might entangle the PSF in export-control legalities. I >>vaguely recall the PSF getting some legal advice on this point; am I >>misremembering? What was the outcome? > > I think we now formally meet all US export requirements. The > requirement is that we inform some agency that we do export > cryptographic software. Jeremy did that. I don't recall the exact > details of that registration, but I think it would be easy to update > it to also report that we export an AES implementation (or, perhaps, > our registration was generic to cover all future additions to the SF > CVS tree). > > So I'm all in favour of adding AES to the Python standard library. -1. Why do you only look at US export rules when discussing crypto code in Python? There are plenty of other countries where importing/exporting and/or using such code is illegal: http://rechten.kub.nl/koops/cryptolaw/cls2.htm Please keep the crypto code separate from the core Python distribution. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 24 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 61 days left From mal@lemburg.com Thu Apr 24 09:36:56 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 24 Apr 2003 10:36:56 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A11B.8090202@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> Message-ID: <3EA7A228.2010705@lemburg.com> M.-A. Lemburg wrote: > Why do you only look at US export rules when discussing crypto > code in Python ? There are plenty of other countries where > importing/exporting and/or using such code is illegal: > > http://rechten.kub.nl/koops/cryptolaw/cls2.htm > > Please keep the crypto code separate from the core Python > distribution. Here's a really nice graphical overview: http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 24 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 61 days left From nramchandani@harveynash.com Thu Apr 24 12:23:22 2003 From: nramchandani@harveynash.com (Neeta Ramchandani) Date: Thu, 24 Apr 2003 12:23:22 +0100 Subject: [Python-Dev] Python Developers Message-ID: <sea7d753.075@lon_nw_9.harveynash.com> Hi, I know there aren't many of you guys, but I have an Investment Bank that is looking for an OO Scriptor, with at least 2 years Java experience with Unix and proper Python development skills. Anyone know anyone....or anyone interested in this 3-6 contract?
Neeta Ramchandani Key Account Manager Harvey Nash IT Investment Banking / Finance Team DD: 020 73331518 Fax: 020 73332657 E-mail: nramchandani@harveynash.com Website: www.harveynash.com From guido@python.org Thu Apr 24 13:20:53 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 08:20:53 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of Thu, 24 Apr 2003 10:36:56 +0200." <3EA7A228.2010705@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> Message-ID: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> > M.-A.
Lemburg wrote: > > Why do you only look at US export rules when discussing crypto > > code in Python ? There are plenty of other countries where > > importing/exporting and/or using such code is illegal: > > > > http://rechten.kub.nl/koops/cryptolaw/cls2.htm > > > > Please keep the crypto code separate from the core Python > > distribution. > > Here's a really nice graphical overview: > > http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm Thanks for the URLs! Another good reason to avoid tying up Python with crypto. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Apr 24 13:38:02 2003 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 24 Apr 2003 08:38:02 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A228.2010705@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> Message-ID: <20030424123802.GA32257@ute.mems-exchange.org> On Thu, Apr 24, 2003 at 10:36:56AM +0200, M.-A. Lemburg wrote: >Here's a really nice graphical overview: > http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm Thanks for posting this link; very nice! Guido wrote: >Rotor should be deprecated regardless; I've never heard of someone >using it. Actually, back when Zope was Principia, products could be shipped as encrypted .pyc's, and the rotor module was used to encrypt them. It's not relevant now, though. I'll mark the deprecation patch as accepted and check it in. --amk (www.amk.ca) "Generic identifier" -- think about it too much and your head explodes. 
-- Sean McGrath at IPC7, discussing SGML terminology From aahz@pythoncraft.com Thu Apr 24 14:31:52 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 24 Apr 2003 09:31:52 -0400 Subject: [Python-Dev] Python Developers In-Reply-To: <sea7d753.075@lon_nw_9.harveynash.com> References: <sea7d753.075@lon_nw_9.harveynash.com> Message-ID: <20030424133152.GC12899@panix.com> On Thu, Apr 24, 2003, Neeta Ramchandani wrote: > > I know there aren't many of you guys, but I have an Investment Bank > that is looking for an OO Scriptor, with at least 2 years Java > experience with Unix and proper Python development skills. Anyone > know anyone....or anyone interested in this 3-6 contract? Please send this to jobs@python.org; that's where to advertise for Python jobs. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From skip@pobox.com Thu Apr 24 15:10:17 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 09:10:17 -0500 Subject: [Python-Dev] why is test_socketserver in expected skips? Message-ID: <16039.61513.240914.807445@montanaro.dyndns.org> test_socketserver seems to be in all the expected skip lists except for (oddly enough) os2emx. It correctly bails if the network resource isn't set and the 2.2 branch version seems to complete for me on my Mac OS X system. When run like: % ./python.exe ../Lib/test/test_socketserver.py the 2.3 branch version fails because the network resource isn't enabled: Traceback (most recent call last): File "../Lib/test/test_socketserver.py", line 5, in ? 
test_support.requires('network') File "/Users/skip/src/python/head/dist/src/Lib/test/test_support.py", line 68, in requires raise ResourceDenied(msg) test.test_support.ResourceDenied: Use of the `network' resource not enabled [5953 refs] Seems like a fairly simple change to test_support.requires() would correct things: def requires(resource, msg=None): # see if the caller's module is __main__ - if so, treat as if # the resource was set if sys._getframe().f_back.f_globals.get("__name__") == "__main__": return if not is_resource_enabled(resource): if msg is None: msg = "Use of the `%s' resource not enabled" % resource raise ResourceDenied(msg) Someone please shout if the above not-quite-obvious code doesn't look correct. Thx, Skip From guido@python.org Thu Apr 24 15:18:26 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 10:18:26 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: Your message of "Thu, 24 Apr 2003 09:10:17 CDT." <16039.61513.240914.807445@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> Message-ID: <200304241418.h3OEIQA11173@odiug.zope.com> > test_socketserver seems to be in all the expected skip lists except > for (oddly enough) os2emx. Probably because the os2emx port hasn't been updated in a while. > It correctly bails if the network resource isn't set and the 2.2 > branch version seems to complete for me on my Mac OS X system. When > run like: > > % ./python.exe ../Lib/test/test_socketserver.py > > the 2.3 branch version fails because the network resource isn't enabled: > > Traceback (most recent call last): > File "../Lib/test/test_socketserver.py", line 5, in ? 
> test_support.requires('network')
> File "/Users/skip/src/python/head/dist/src/Lib/test/test_support.py", line 68, in requires
> raise ResourceDenied(msg)
> test.test_support.ResourceDenied: Use of the `network' resource not enabled
> [5953 refs]
>
> Seems like a fairly simple change to test_support.requires() would
> correct things:
>
> def requires(resource, msg=None):
>     # see if the caller's module is __main__ - if so, treat as if
>     # the resource was set
>     if sys._getframe().f_back.f_globals.get("__name__") == "__main__":
>         return
>     if not is_resource_enabled(resource):
>         if msg is None:
>             msg = "Use of the `%s' resource not enabled" % resource
>         raise ResourceDenied(msg)
>
> Someone please shout if the above not-quite-obvious code doesn't look
> correct.

Looks good to me; I've thought of this myself occasionally. Please also update the README file for testing to mention this detail! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Thu Apr 24 15:58:36 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 10:58:36 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> <1051041205.32490.51.camel@barry> <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> Message-ID: <1051196316.22909.13.camel@barry> On Tue, 2003-04-22 at 18:15, Martin v. Löwis wrote: > For safety, I'd recommend that you use byte string msgids if > conversion to Unicode fails. Otherwise, I'm fine with automatically > coercing everything to Unicode.
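The fallback Martin recommends could be sketched like this (a hypothetical helper in modern Python syntax; the name coerce_msgid and its signature are illustrative, not the actual gettext.py change):

```python
def coerce_msgid(msgid, charset):
    """Hypothetical sketch of the suggested fallback: try to decode a
    catalog msgid to Unicode, and keep the raw byte string when the
    declared charset cannot decode it."""
    try:
        return msgid.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return msgid
```

A Latin-1 msgid inside a catalog declared as UTF-8 would then survive as a byte string instead of raising.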
For now, I'll add a comment to the code at the point of conversion since I'm not sure whether it's better to throw an exception or attempt to carry on with 8-bit strings. I'll update the docs too. > I do know about catalogs that use Latin-1 in msgids (to represent > accented characters in the names of authors). That should not cause > failures. Cool, thanks for the feedback Martin! -Barry From skip@pobox.com Thu Apr 24 16:34:13 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 10:34:13 -0500 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: <200304241418.h3OEIQA11173@odiug.zope.com> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> Message-ID: <16040.1013.400199.534299@montanaro.dyndns.org> >>>>> "Guido" == Guido van Rossum <guido@python.org> writes: >> test_socketserver seems to be in all the expected skip lists except >> for (oddly enough) os2emx. Guido> Probably because the os2emx port hasn't been updated in a while. I guess I should have phrased my question differently. Why is it on any expected skip lists at all? It seems to me that the 'network' resource requirement is sufficient to keep it from being run inappropriately.

>> def requires(resource, msg=None):
>>     # see if the caller's module is __main__ - if so, treat as if
>>     # the resource was set
...
>> Someone please shout if the above not-quite-obvious code doesn't look
>> correct.

Guido> Looks good to me; I've thought of this myself occasionally. Guido> Please also update the README file for testing to mention this Guido> detail! Thanks, I'll tuck it into CVS later today. Skip From guido@python.org Thu Apr 24 16:48:54 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 11:48:54 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: Your message of "Thu, 24 Apr 2003 10:34:13 CDT."
<16040.1013.400199.534299@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> <16040.1013.400199.534299@montanaro.dyndns.org> Message-ID: <200304241548.h3OFms411960@odiug.zope.com> > I guess I should have phrased my question differently. Why is it on > any expected skip lists at all? It seems to me that the 'network' > resource requirement is sufficient to keep it from being run > inappropriately. It seems to me too. It looks like such tests are still added to the "skipped" lists by regrtest.main(). Maybe they shouldn't be? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Thu Apr 24 16:52:56 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 24 Apr 2003 11:52:56 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: <16040.1013.400199.534299@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> <16040.1013.400199.534299@montanaro.dyndns.org> Message-ID: <16040.2136.405300.211588@grendel.zope.com> Skip Montanaro writes: > I guess I should have phrased my question differently. Why is it on any > expected skip lists at all? It seems to me that the 'network' resource > requirement is sufficient to keep it from being run inappropriately. Being on the expected skip lists doesn't keep it from running; the resource requirement handles that, and causes it to be skipped when the resource isn't enabled. Until fairly recently, a test that was skipped due to resource denial was still reported as an unexpected skip if it wasn't listed. That was fixed in Lib/test/regrtest.py revision 1.122. -Fred -- Fred L. Drake, Jr.
<fdrake at acm.org> PythonLabs at Zope Corporation From theller@python.net Thu Apr 24 17:24:48 2003 From: theller@python.net (Thomas Heller) Date: 24 Apr 2003 18:24:48 +0200 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <20030423180221.GP12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <20030423180221.GP12836@epoch.metaslash.com> Message-ID: <8ytz3ltb.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > I think getargs_ul() is broken. For example, if the user passes more > than a single char as the format, memory will be scribbled on. The > format should be checked to make sure it contains acceptable values > for getargs_ul() to be safe. > > I fixed a similar problem in revision 1.23 of _testcapimodule.c. > See comment and code around line 330. > I've replaced the getargs_ul() function and friends with new getargs_X() functions for all the tested format codes. I've also adapted test_getargs2 to use these new functions. Skip and Jack have offered to test this, anyone else is welcome as well to report crashes. Thomas From mcherm@mcherm.com Thu Apr 24 17:44:09 2003 From: mcherm@mcherm.com (Michael Chermside) Date: Thu, 24 Apr 2003 09:44:09 -0700 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option Message-ID: <1051202649.3ea814599f6fa@mcherm.com> Tim: Don't get a swelled head or anything ;-), but your generator-based version of walk() is a beautiful piece of work. I don't mean the code (although that's clean and readable), but the design. Using a generator is clearly good, having it return (path,names) tuples is a nice way to work, and having it return (path,dirnames,filenames) tuples is inspired. (If you want them lumped together, just add the lists!) Allowing the consumer to control the flow by modifying dirnames is very nice. And the fact that it's so simple to code (22 short lines) is a testament to the power of generators.
I'm +2 on putting this in immediately and deprecating os.path.walk(). -- Michael Chermside From guido@python.org Thu Apr 24 17:50:25 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 12:50:25 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Thu, 24 Apr 2003 09:44:09 PDT." <1051202649.3ea814599f6fa@mcherm.com> References: <1051202649.3ea814599f6fa@mcherm.com> Message-ID: <200304241650.h3OGoPM15432@odiug.zope.com> > From: Michael Chermside <mcherm@mcherm.com> > Tim: > > Don't get a swelled head or anything ;-), but your generator-based > version of walk() is beautiful piece of work. I don't mean the code > (although that's clean and readable), but the design. Using a > generator is clearly good, having it return (path,names) tuples is a > nice way to work, and having it return (path,dirnames,filenames) > tuples is inspired. (If you want them lumped together, just add the > lists!) Allowing the consumer to modify control the flow by > modifying dirnames is very nice. And the fact that it's so simple to > code (22 short lines) is a testament to the power of generators. > > I'm +2 on putting this in immediately and deprecating os.path.walk(). Agreed. How about naming it os.walk()? I think it's not OS specific -- all the OS specific stuff is part of os.path. So we only need one implementation. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Thu Apr 24 18:04:57 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 13:04:57 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option References: <1051202649.3ea814599f6fa@mcherm.com> <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <001e01c30a83$a2cf6440$b6b8958d@oemcomputer> > > I'm +2 on putting this in immediately and deprecating os.path.walk(). > > Agreed. How about naming it os.walk()? I think it's not OS specific > -- all the OS specific stuff is part of os.path. 
So we only need one implementation. Double check on SF. Someone had posted a patch for this, and Martin v. Löwis had some reasons for rejecting it or something else that should have been done at the same time. Raymond Hettinger From python@rcn.com Thu Apr 24 18:48:09 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 13:48:09 -0400 Subject: [Python-Dev] netrc.py Message-ID: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Bram Moolenaar > > Please at least do not produce the NetrcParseError when the > > "login" field is omitted. This can be done by changing the > > "else:" above "malformed %s entry" to "elif not password:". > > That is the minimal change to make this module work on my > > system. Bram is requesting netrc.py be modified to exclude entries without a login field. An example use case is for mail servers:

    machine mail password fruit

If the change is made, the line won't be handled at all. It would be silently skipped. Currently it raises a NetrcParseError. Do you guys think this is appropriate? On the one hand, it's a bummer that netrc.py cannot currently be used with files containing these lines. On the other hand, silently skipping over them doesn't seem quite right either. Raymond Hettinger P.S. He would also like (but does not have to have) this backported. From martin@v.loewis.de Thu Apr 24 19:21:20 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 20:21:20 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A11B.8090202@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> Message-ID: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > Why do you only look at US export rules when discussing crypto > code in Python ? Because only exporting matters. Importing is no problem: You can easily *remove* stuff from the distribution, by creating a copy of the package that doesn't have the code that cannot be imported. That would be the job of whoever wants to import it. Exporting also only matters from the servers which host the Python distribution, i.e. the US and the Netherlands. Regards, Martin From martin@v.loewis.de Thu Apr 24 19:22:41 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 20:22:41 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > Thanks for the URLs! Another good reason to avoid tying up Python > with crypto. I don't consider that a good reason. Including batteries is one of the strengths of Python, and if there are useful libraries, we should attempt to include them. Regards, Martin From esr@thyrsus.com Thu Apr 24 19:26:22 2003 From: esr@thyrsus.com (Eric S.
Raymond) Date: Thu, 24 Apr 2003 14:26:22 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <004601c30a89$c4459e40$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Message-ID: <20030424182622.GA21500@thyrsus.com> Raymond Hettinger <raymond.hettinger@verizon.net>: > Bram Moolenaar > > > Please at least do not produce the NetrcParseError when the > > > "login" field is omitted. This can be done by changing the > > > "else:" above "malformed %s entry" to "elif not password:". > > > That is the minimal change to make this module work on my > > > system. > > Bram is requesting netrc.py be modified to exclude entries > without a login field. An example use case is for mail servers: > > machine mail password fruit > > If the change is made, the line won't be handled at all. It > would be silently skipped. Currently it raises a NetrcParseError. > > Do you guys think this is appropriate? On the one hand, > it's a bummer that netrc.py cannot currently be used with > files containing these lines. On the other hand, silently > skipping over them doesn't seem quite right either. As the original designer, I say -1. It's not clear to me when or how entries of this kind have value. But I'm willing to be convinced otherwise by a good argument. -- <a href="http://www.catb.org/~esr/">Eric S. Raymond</a> From guido@python.org Thu Apr 24 19:30:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 14:30:30 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 20:22:41 +0200." <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304241830.h3OIUUj22372@odiug.zope.com> > > Thanks for the URLs!
Another good reason to avoid tying up Python > > with crypto. > > I don't consider that a good reason. Including batteries is one of the > strengths of Python, and if there are useful libraries, we should > attempt to include them. IMO there are more important batteries to include before we deal with the hassle of registering for crypto stuff. Even if it's harmless, the inclusion of any crypto at all causes some people to have to go through a lot of corporate red tape. I just dealt with questions from someone who was re-exporting Python and needed answers for his corporate lawyer. If I had to say "yes, Python contains an AES implementation" his red tape amount would have multiplied. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Thu Apr 24 19:37:20 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 13:37:20 -0500 Subject: [Python-Dev] netrc.py In-Reply-To: <004601c30a89$c4459e40$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Message-ID: <16040.12000.700605.215458@montanaro.dyndns.org> >>>>> "Raymond" == Raymond Hettinger <raymond.hettinger@verizon.net> writes: Raymond> Bram Moolenaar >> > Please at least do not produce the NetrcParseError when the >> > "login" field is omitted. This can be done by changing the >> > "else:" above "malformed %s entry" to "elif not password:". >> > That is the minimal change to make this module work on my >> > system. Raymond> Bram is requesting netrc.py be modified to exclude entries Raymond> without a login field. An example use case is for mail Raymond> servers: Raymond> machine mail password fruit Raymond> If the change is made, the line won't be handled at all. It Raymond> would be silently skipped. Currently is raises a Raymond> NetrcParseError. Why not have it add an entry to self.hosts with an empty string associated with the 'login' key? Skip From fdrake@acm.org Thu Apr 24 19:43:06 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 24 Apr 2003 14:43:06 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <20030424182622.GA21500@thyrsus.com> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> Message-ID: <16040.12346.526703.651003@grendel.zope.com> Eric S. Raymond writes: > As the original designer, I say -1. It's not clear to me when or > how entries of this kind have value. But I'm willing to be > convinced otherwise by a good argument. Looking at the netrc(5) manpage on my RedHat 7.3 box, I'd say it's clear that a machine entry without a login should specifically suppress autologin for that machine. For example, this .netrc file:

    machine ftp.example.com
    default login anonymous password fred@example.com

should cause autologin on every machine except for ftp.example.com. If the ftp.example.com entry is simply dropped, the default could be used, and that would be wrong. So I think the entry should be retained, with a login value of None. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From wesleyhenwood@hotmail.com Thu Apr 24 20:41:48 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Thu, 24 Apr 2003 19:41:48 +0000 Subject: [Python-Dev] PyRun_* functions Message-ID: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Quote from py docs: "Note also that several of these functions take FILE* parameters. One particular issue which needs to be handled carefully is that the FILE structure for different C libraries can be different and incompatible. Under Windows (at least), it is possible for dynamically linked extensions to actually use different libraries, so care should be taken that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using." How does one do this - make sure that they were created with the same lib?
It seems that it would be a good enhancement to remove the FILE pointer parameter from these functions and just use the file name. For example, change PyRun_SimpleFile(FILE *fp, char *filename) to PyRun_SimpleFile(char *filename). Then no one would have to worry about the incompatibility. From python@rcn.com Thu Apr 24 20:49:12 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 15:49:12 -0400 Subject: [Python-Dev] netrc.py References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> Message-ID: <005201c30a9a$94e23580$b6b8958d@oemcomputer> [Fred L. Drake, Jr.] > Looking at the netrc(5) manpage on my RedHat 7.3 box, I'd say it's > clear that a machine entry without a login should specifically > suppress autologin for that machine. For example, this .netrc file: > > machine ftp.example.com > default login anonymous password fred@example.com > > should cause autologin on every machine except for ftp.example.com. > If the ftp.example.com entry is simply dropped, the default could be > used, and that would be wrong. > > So I think the entry should be retained, with a login value of None. [Skip Montanaro] > Why not have it add an entry to self.hosts with an empty string associated > with the 'login' key? Since existing apps expect a string, the empty string approach may be preferable. Raymond Hettinger From fdrake@acm.org Thu Apr 24 20:52:26 2003 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 24 Apr 2003 15:52:26 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <005201c30a9a$94e23580$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> <005201c30a9a$94e23580$b6b8958d@oemcomputer> Message-ID: <16040.16506.722640.409224@grendel.zope.com> Raymond Hettinger writes: > Since existing apps expect a string, the empty string approach may > be preferable. I could live with that. The real point is that it's wrong to drop records without a login on the floor. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From python@rcn.com Thu Apr 24 20:57:31 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 15:57:31 -0400 Subject: [Python-Dev] netrc.py References: <004601c30a89$c4459e40$b6b8958d@oemcomputer><20030424182622.GA21500@thyrsus.com><16040.12346.526703.651003@grendel.zope.com><005201c30a9a$94e23580$b6b8958d@oemcomputer> <16040.16506.722640.409224@grendel.zope.com> Message-ID: <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> > > Since existing apps expect a string, the empty string approach may > > be preferable. [Fred] > I could live with that. The real point is that it's wrong to drop > records without a login on the floor. Since that solution is friendly to existing apps, do you think it is reasonable to backport it? Raymond From fdrake@acm.org Thu Apr 24 21:04:50 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 24 Apr 2003 16:04:50 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> <005201c30a9a$94e23580$b6b8958d@oemcomputer> <16040.16506.722640.409224@grendel.zope.com> <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> Message-ID: <16040.17250.119938.342267@grendel.zope.com> Raymond Hettinger writes: > Since that solution is friendly to existing apps, do you think > it is reasonable to backport it? I'd be happy with that; not handling those entries is a bug in my book (using the netrc(5) manpage as my critical reference), so it's very reasonable to backport. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From drifty@alum.berkeley.edu Thu Apr 24 21:09:26 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 13:09:26 -0700 (PDT) Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304241830.h3OIUUj22372@odiug.zope.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> Message-ID: <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > Thanks for the URLs! Another good reason to avoid tying up Python > > > with crypto. > > > > I don't consider that a good reason. Including batteries is one of the > > strengths of Python, and if there are useful libraries, we should > > attempt to include them. > > IMO there are more important batteries to include before we deal with > the hassle of registering for crypto stuff. 
Even if it's harmless, > the inclusion of any crypto at all causes some people to have to go > through a lot of corporate red tape. <snip> Good point. I admit I think it would be cool to have an AES implementation in the stdlib, but I don't see it as crucial. I think it does make sense, though, to have a package that is maintained separately that python-dev pseudo endorses (like PyXML and win32all) that contains all of this crypto stuff. -Brett From fdrake@acm.org Thu Apr 24 21:09:41 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 24 Apr 2003 16:09:41 -0400 Subject: [Python-Dev] PyRun_* functions In-Reply-To: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> References: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Message-ID: <16040.17541.672978.719267@grendel.zope.com> wesley henwood writes: > How does one do this - make sure that they were created with the same lib? Exactly. This tends not to be a problem on Unix (though possible), but isn't so rare on Windows. > Its seems that it would be a good enhancement to remove the FILE pointer > parameter from these functions, and just use the file name. For example, > change PyRun_SimpleFile( FILE *fp, char *filename) to PyRun_SimpleFile(char > *filename). Then no one would have to worry about the incompatibility. That would be a loss of functionality -- these can currently work with, for example, standard input. That's currently required by the interpreter's main program. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From guido@python.org Thu Apr 24 21:12:25 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:12:25 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "Thu, 24 Apr 2003 13:09:26 PDT."
<Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <200304242012.h3OKCP325878@odiug.zope.com> > I think does make sense, though, to have a package that is maintained > separately that python-dev pseudo endorses (like PyXML and win32all) that > contains all of this crypto stuff. Right. --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Thu Apr 24 21:22:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 13:22:46 -0700 (PDT) Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <1051215797.1847.6.camel@barry> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> Message-ID: <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> [Barry Warsaw] > On Thu, 2003-04-24 at 16:09, Brett Cannon wrote: > > > I think does make sense, though, to have a package that is maintained > > separately that python-dev pseudo endorses (like PyXML and win32all) that > > contains all of this crypto stuff. > > Where do we draw the line? Do we delete the ssl stuff? What about the > crypto hashes? hmac? md5? mpz? All of Chapter 15 in the library > reference manual? > Anything that causes export issues should be separate. From my understanding hash functions are not regulated. 
I believe SSL is okay because the encryption is not high enough (this all from memory, so don't take this as hard fact). But you are right, Barry, there is no hard line that can easily be drawn; joys of laws in the US. =) -Brett From martin@v.loewis.de Thu Apr 24 21:29:10 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 22:29:10 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> Message-ID: <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> Brett Cannon <bac@OCF.Berkeley.EDU> writes: > Anything that causes export issues should be separate. From my > understanding hash functions are not regulated. I believe SSL is okay > because the encryption is not high enough (this all from memory, so don't > take this as hard fact). It is probably pointless to discuss this among non-lawyers, however, I do believe that a strict "no crypto" policy would cause the removal of all the modules that Barry mentioned. For the specific case of OpenSSL, it seems pretty clear that it *cannot* be exported from the US without telling the respective agency. When I studied their rules, I came to the conclusion that even the *wrapper* around it needs to be declared (so both the Windows binary release and the source release cannot be exported without being declared in advance). Of course, if one considers crypto stuff as useless and a waste of time, then probably https is not interesting, either. 
Regards, Martin From guido@python.org Thu Apr 24 21:34:10 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:34:10 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 22:29:10 +0200." <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304242034.h3OKYAt26069@odiug.zope.com> > Of course, if one considers crypto stuff as useless and a waste of > time, then probably https is not interesting, either. Except that some URLs are *only* accessible through https -- this was the push for supporting https. I don't see the same kind of push for AES yet. It is true that we should report the inclusion of openssl and its wrappers to the authorities. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Thu Apr 24 21:37:03 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 22:37:03 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304242034.h3OKYAt26069@odiug.zope.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> Message-ID: <m3el3reiog.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > It is true that we should report the inclusiong of openssl and its > wrappers to the authorities. I think we did already; Jeremy should know the details. Regards, Martin From python@rcn.com Thu Apr 24 21:14:43 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 16:14:43 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <00a901c30a9e$253a63c0$b6b8958d@oemcomputer> > > > I don't consider that a good reason. Including batteries is one of the > > > strengths of Python, and if there are useful libraries, we should > > > attempt to include them. > > > > IMO there are more important batteries to include before we deal with > > the hassle of registering for crypto stuff. 
Even if it's harmless, > > the inclusion of any crypto at all causes some people to have to go > > through a lot of corporate red tape. Just sneak it through by labeling it as a Python-to-Perl conversion tool ;) Raymond From barry@python.org Thu Apr 24 21:23:17 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 16:23:17 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <1051215797.1847.6.camel@barry> On Thu, 2003-04-24 at 16:09, Brett Cannon wrote: > > IMO there are more important batteries to include before we deal with > > the hassle of registering for crypto stuff. Even if it's harmless, > > the inclusion of any crypto at all causes some people to have to go > > through a lot of corporate red tape. > <snip> > > Good point. I admit I think it would be cool to have an AES > implementation in the stdlib, but I don't see it as crucial. > > I think it does make sense, though, to have a package that is maintained > separately that python-dev pseudo endorses (like PyXML and win32all) that > contains all of this crypto stuff. Where do we draw the line? Do we delete the ssl stuff? What about the crypto hashes? hmac? md5? mpz? All of Chapter 15 in the library reference manual? -Barry From agthorr@barsoom.org Thu Apr 24 21:48:12 2003 From: agthorr@barsoom.org (Agthorr) Date: Thu, 24 Apr 2003 13:48:12 -0700 Subject: [Python-Dev] heaps Message-ID: <20030424204812.GD24838@barsoom.org> I brought up heapq last week, but there was only brief discussion before the issue got sidetracked into a discussion of FIFO queues.
I'd like to revisit heapq. The two people who responded seemed to agree that the existing heapq interface was lacking, and this seemed to be the sentiment many months ago when heapq was added. I'll summarize some of the heap interfaces that have been proposed:

- the heapq currently in CVS:
  - Provides functions to manipulate a list organized as a binary heap
  - Advantages:
    - Internal binary heap structure is transparent to user, useful for educational purposes
    - Low overhead
    - Already in CVS

- My MinHeap/MaxHeap classes:
  - Provides a class with heap access routines, using a list internally
  - Advantages:
    - Implementation is opaque, so it can be replaced later with Fibonacci heaps or Paired heaps without breaking user programs
    - Provides an adjust_key() command needed by some applications (e.g. Dijkstra's Algorithm)

- David Eppstein's priorityDictionary class:
  - Provides a class with a dictionary-style interface (ex: heap['cat'] = 5 would give 'cat' a priority of 5 in the heap)
  - Advantages:
    - Implementation is opaque, so it can be replaced later with Fibonacci heaps or Paired heaps without breaking user programs
    - A dictionary interface may be more intuitive for certain applications
  - Limitation:
    - Objects with the same value may only have a single instance in the heap.

I'd very much like to see the current heapq replaced with a different interface in time for 2.3. I believe that an opaque object is better, since it allows more flexibility later. If the current heapq is released, user programs will start to use it, and then it will be much more difficult to switch to a different heap algorithm later, should that become desirable. Also, decrease-key is an important feature that many users will expect from a heap; this operation is notably missing from heapq. I'm willing to do whatever work is necessary to get a more flexible heap interface into 2.3.
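[The MinHeap/MaxHeap code referred to above is not reproduced in this thread. Purely as an illustration of the opaque interface style being argued for, here is a minimal sketch of a heap class with an adjust_key() (decrease/increase key) operation; the method names and the list-plus-index-map implementation are assumptions for illustration, not the actual proposed code.]

```python
class MinHeap:
    """Illustrative opaque min-heap supporting adjust_key().

    Items must be hashable and unique; each item carries a key.
    Backed by a binary heap in a list plus an item->index map, but
    callers never see that, so the implementation could later be
    swapped for a pairing or Fibonacci heap without breaking them.
    """

    def __init__(self):
        self._heap = []   # list of [key, item] pairs, heap-ordered
        self._pos = {}    # item -> index of its pair in self._heap

    def __len__(self):
        return len(self._heap)

    def insert(self, item, key):
        self._heap.append([key, item])
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def adjust_key(self, item, key):
        # Restore the heap invariant after changing one item's key.
        i = self._pos[item]
        old = self._heap[i][0]
        self._heap[i][0] = key
        if key < old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def extract_min(self):
        # Swap the root with the last pair, pop it, and re-heapify.
        self._swap(0, len(self._heap) - 1)
        key, item = self._heap.pop()
        del self._pos[item]
        if self._heap:
            self._sift_down(0)
        return item, key

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self._heap[i][0] < self._heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._heap[c][0] < self._heap[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

# Decrease-key in action: 'c' jumps to the front of the queue.
h = MinHeap()
h.insert('a', 5)
h.insert('b', 3)
h.insert('c', 7)
h.adjust_key('c', 1)          # decrease-key, O(log n)
print(h.extract_min())        # -> ('c', 1)
print(h.extract_min())        # -> ('b', 3)
```

The point of the sketch is the shape of the interface, not the list-based internals: adjust_key() is exactly the operation a Dijkstra implementation needs and the function-based heapq cannot offer without exposing its list layout.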
If the consensus prefers my MinHeap (or something similar), I'll gladly write documentation (and have already written rather brutal tests). Somebody with authority, just tell me where to pour my energy in this matter :) -- Agthorr From guido@python.org Thu Apr 24 21:50:23 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:50:23 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 22:37:03 +0200." <m3el3reiog.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304242050.h3OKoNx26182@odiug.zope.com> > > It is true that we should report the inclusion of openssl and its > > wrappers to the authorities. > > I think we did already; Jeremy should know the details. Jeremy sits next to me, and he tells me he did not. However it is on his TODO list.
--Guido van Rossum (home page: http://www.python.org/~guido/) From neal@metaslash.com Thu Apr 24 22:02:29 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 24 Apr 2003 17:02:29 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304242050.h3OKoNx26182@odiug.zope.com> References: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> <200304242050.h3OKoNx26182@odiug.zope.com> Message-ID: <20030424210229.GT12836@epoch.metaslash.com> On Thu, Apr 24, 2003 at 04:50:23PM -0400, Guido van Rossum wrote: > > > It is true that we should report the inclusion of openssl and its > > > wrappers to the authorities. > > > > I think we did already; Jeremy should know the details. > > Jeremy sits next to me, and he tells me he did not. However it is on > his TODO list. I contacted the BXA which is part of the US Dept. of Commerce: <http://www.bxa.doc.gov/Encryption/PubAvailEncSourceCodeNofify.html>. I think I notified them that Python contains Rotor and then forwarded the info to Jeremy. I'm not sure if there's anything else that needs to be done. It was unclear whether this was required for each release. My memory is fuzzy. I remember talking to Martin and Jeremy, but this was probably at least 6 months ago. Neal From martin@v.loewis.de Thu Apr 24 22:17:19 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 24 Apr 2003 23:17:19 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <20030424210229.GT12836@epoch.metaslash.com> References: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> <200304242050.h3OKoNx26182@odiug.zope.com> <20030424210229.GT12836@epoch.metaslash.com> Message-ID: <m3ptnbd28w.fsf@mira.informatik.hu-berlin.de> Neal Norwitz <neal@metaslash.com> writes: > My memory is fuzzy. I remember talking to Martin and Jeremy, but > this was probably at least 6 months ago. You first sent a letter that I include below; you then edited the NOTIFICATION at http://mail.python.org/pipermail/python-dev/2002-March/021785.html It appears that you then didn't actually send the notification to BXA, but that the PSF board passed a motion in http://www.python.org/psf/records/board/minutes-2002-04-09.html charging Jeremy with contacting BXA; it appears that this did not happen, either. Regards, Martin > > I work on an open source project called Python. It is a programming > language which is publicly available at http://www.python.org/. > The current version is 2.2. This software is provided free of charge. > > We would like to comply with US export regulations, however, > we are not sure what, if anything, needs to be done. > > There is an encryption technique used in the rotormodule.c file > (which is attached). This apparently uses 80 bits. > > Do we need to send a NOTIFICATION? Is there anything else we > need to do?
> > Thank you, > Neal From python@rcn.com Thu Apr 24 22:52:00 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 17:52:00 -0400 Subject: [Python-Dev] heaps References: <20030424204812.GD24838@barsoom.org> Message-ID: <001901c30aab$bf31c060$b6b8958d@oemcomputer> > I'd very much like to see the current heapq replaced with a different > interface in time for 2.3. I believe that an opaque object is better, > since it allows more flexibility later. I'm quite pleased with the version already in CVS. It is a small masterpiece of exposition, sophistication, simplicity, and speed. A class based interface is not necessary for every algorithm. For the other approaches, what might be useful is to define an API and leave it at that. The various implementations can be maintained on Cookbook pages, the Vaults of Parnassus, or an SF project. The min/max heap and fibonacci heaps are a great idea. Nice work. Raymond Hettinger From tim.one@comcast.net Thu Apr 24 23:01:42 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 24 Apr 2003 18:01:42 -0400 Subject: [Python-Dev] New test failure on Windows Message-ID: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Last-second re changes don't appear to be going in the right direction <wink>: C:\Code\python\PCbuild>python ../lib/test/test_re.py Running re_tests test suite test_basic_re_sub (__main__.ReTests) ... ok test_constants (__main__.ReTests) ... ok test_escaped_re_sub (__main__.ReTests) ... ok test_flags (__main__.ReTests) ... ok test_limitations (__main__.ReTests) ... ERROR test_pickling (__main__.ReTests) ... ok test_qualified_re_split (__main__.ReTests) ... ok test_qualified_re_sub (__main__.ReTests) ... ok test_re_escape (__main__.ReTests) ... ok test_re_findall (__main__.ReTests) ... ok test_re_match (__main__.ReTests) ... ok test_re_split (__main__.ReTests) ... ok test_re_subn (__main__.ReTests) ... ok test_search_star_plus (__main__.ReTests) ... ok test_symbolic_refs (__main__.ReTests) ...
ok ====================================================================== ERROR: test_limitations (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 182, in test_limitations self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) File "C:\Code\python\lib\sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded ---------------------------------------------------------------------- From gherron@islandtraining.com Thu Apr 24 23:38:42 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 15:38:42 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Message-ID: <200304241538.43480.gherron@islandtraining.com> On Thursday 24 April 2003 03:01 pm, Tim Peters wrote: > Last-second re changes don't appear to be going in the right direction > <wink>: > > C:\Code\python\PCbuild>python ../lib/test/test_re.py > Running re_tests test suite > test_basic_re_sub (__main__.ReTests) ... ok > test_constants (__main__.ReTests) ... ok > test_escaped_re_sub (__main__.ReTests) ... ok > test_flags (__main__.ReTests) ... ok > test_limitations (__main__.ReTests) ... ERROR > test_pickling (__main__.ReTests) ... ok > test_qualified_re_split (__main__.ReTests) ... ok > test_qualified_re_sub (__main__.ReTests) ... ok > test_re_escape (__main__.ReTests) ... ok > test_re_findall (__main__.ReTests) ... ok > test_re_match (__main__.ReTests) ... ok > test_re_split (__main__.ReTests) ... ok > test_re_subn (__main__.ReTests) ... ok > test_search_star_plus (__main__.ReTests) ... ok > test_symbolic_refs (__main__.ReTests) ... 
ok > > ====================================================================== > ERROR: test_limitations (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 182, in test_limitations > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > File "C:\Code\python\lib\sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded > Today's change to test_re (rather than a change to any of the sre code) is the problem. It appears that Skip was attempting to translate the tests to use the unittest module. One test (and perhaps others) were translated incorrectly. The original test was:

    try:
        verify(re.match('(x)*', 50000*'x').span() == (0, 50000))
    except RuntimeError, v:
        print v

Since this is *supposed* to cause a RuntimeError, it should be translated something like

    self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x')

but definitely not as

    self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000))

Here's the CVS log entry:

    ----------------------------
    revision 1.34
    date: 2003/04/24 19:43:18; author: montanaro; state: Exp; lines: +294 -371
    first cut at unittest version of re tests
    ----------------------------

Gary Herron From drifty@alum.berkeley.edu Thu Apr 24 23:43:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 15:43:34 -0700 (PDT) Subject: [Python-Dev] When is it okay to ``cvs remove``? Message-ID: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> I am rewriting test_urllib.py from scratch since the current version is very lacking (and out of date; the thing tests against UserDict for some odd reason). Since I have written it from scratch, I figure I should do a ``cvs remove`` on the current test_urllib.py and then add my new version to get a fresh version numbering?
Also, my rewrite is not finished (have some more things I want to test), but what I have so far passes and seems good. Should I bother to check in what I have so far to have it in b1, or hold off until the suite is completely finished? I am assuming since these are unit tests that are passing I don't need to bother with an SF patch to get a code review from someone. -Brett From thomas@xs4all.net Thu Apr 24 23:59:14 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 25 Apr 2003 00:59:14 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> Message-ID: <20030424225914.GA26254@xs4all.nl> On Thu, Apr 24, 2003 at 03:43:34PM -0700, Brett Cannon wrote: > I am rewriting test_urllib.py from scratch since the current version is > very lacking (and out of date; the thing tests against UserDict for some odd > reason). Since I have written it from scratch, I figure I should do a ``cvs > remove`` on the current test_urllib.py and then add my new version to > get a fresh version numbering? That's not particularly useful. The only thing that does is create a period in time (or rather, 'history' -- CVS history) in which test_urllib.py doesn't exist. Re-adding the file won't give you a fresh version numbering either, it'll just give you a lot of headaches, especially when there are branches involved (right, Barry ? :-) Just commit your new test_urllib.py directly, when it's all done, using something like

    cvs commit -r2.0 test_urllib.py

But you probably want to discuss the version number you want to force, Guido might like to reserve 2.0 for something (although I think he should use '3000' instead :) CVS is very 4-dimensional; it only allows for one file to exist at any given spot in the entire timeline. It can leave and come back, but it's still the same file. (And, for example, it can never become a directory.)
And a file can have only one 1.1 revision. If you have direct access to the CVS repository (which is actually RCS) you can remove the RCS file and start really afresh, but that means you lose history. It nullifies the file (and is also about as drastic as Galactus' Nullifier ;-) > Also, my rewrite is not finished (have some more things I want to test), > but what I have so far passes and seems good. Should I bother to check in > what I have so far to have it in b1, or hold off until the suite is > completely finished? I am assuming since these are unit tests that are > passing I don't need to bother with an SF patch to get a code review from > someone. It might at least make sense to have some differing platforms run the test before you check it in. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From eppstein@ics.uci.edu Fri Apr 25 00:20:50 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Thu, 24 Apr 2003 16:20:50 -0700 Subject: [Python-Dev] Re: heaps References: <20030424204812.GD24838@barsoom.org> <001901c30aab$bf31c060$b6b8958d@oemcomputer> Message-ID: <eppstein-C9853A.16204924042003@main.gmane.org> In article <001901c30aab$bf31c060$b6b8958d@oemcomputer>, "Raymond Hettinger" <python@rcn.com> wrote: > > I'd very much like to see the current heapq replaced with a different > > interface in time for 2.3. I believe that an opaque object is better, > > since it allows more flexibility later. > > I'm quite pleased with the version already in CVS. It is a small > masterpiece of exposition, sophistication, simplicity, and speed. > A class based interface is not necessary for every algorithm. It has some elegance, but omits basic operations that are necessary for many heap-based algorithms.
Specifically, the three algorithms that use heaps in my upper-division undergraduate algorithms classes are heapsort (for which heapq works fine, but you would generally want to use L.sort() instead), Dijkstra's algorithm (and its relatives such as A* and Prim), which needs the ability to decrease keys, and event-queue-based plane sweep algorithms (e.g. for finding all crossing pairs in a set of line segments) which need the ability to delete items from other than the top. To see how important the lack of these operations is, I decided to compare two implementations of Dijkstra's algorithm. The priority-dict implementation from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/119466 takes as input a graph, coded as nested dicts {vertex: {neighbor: edge length}}. This is a variation of a graph coding suggested in one of Guido's essays that, as Raymond suggests, avoids using a separate class based interface. Here's a simplification of my dictionary-based Dijkstra implementation:

    def Dijkstra(G,start,end=None):
        D = {}                      # dictionary of final distances
        P = {}                      # dictionary of predecessors
        Q = priorityDictionary()    # est.dist. of non-final vert.
        Q[start] = 0
        for v in Q:
            D[v] = Q[v]
            for w in G[v]:
                vwLength = D[v] + G[v][w]
                if w not in D and (w not in Q or vwLength < Q[w]):
                    Q[w] = vwLength
                    P[w] = v
        return (D,P)

Here's a translation of the same implementation to heapq (untested since I'm not running 2.3). Since there is no decrease-key in heapq, nor any way to find and remove old keys, I changed the algorithm to add new tuples for each new key, leaving the old tuples in place until they bubble up to the top of the heap.

    def Dijkstra(G,start,end=None):
        D = {}                   # dictionary of final distances
        P = {}                   # dictionary of predecessors
        Q = [(0,None,start)]     # heap of (est.dist., pred., vert.)
        while Q:
            dist,pred,v = heappop(Q)
            if v in D: continue  # tuple outdated by decrease-key, ignore
            D[v] = dist
            P[v] = pred
            for w in G[v]:
                heappush(Q, (D[v] + G[v][w], v, w))
        return (D,P)

My analysis of the differences between the two implementations:

- The heapq version is slightly complicated (the two lines if...continue) by the need to explicitly ignore tuples with outdated priorities. This need for inserting low-level data structure maintenance code into higher-level algorithms is intrinsic to using heapq, since its data is not structured in a way that can support efficient decrease key operations.

- Since the heap version had no way to determine when a new key was smaller than an old one, the heapq implementation needed two separate data structures to maintain predecessors (middle elements of tuples for items in queue, dictionary P for items already removed from queue). In the dictionary implementation, both types of items stored their predecessors in P, so there was no need to transfer this information from one structure to another.

- The dictionary version is slightly complicated by the need to look up old heap keys and compare them with the new ones instead of just blasting new tuples onto the heap. So despite the more-flexible heap structure of the dictionary implementation, the overall code complexity of both implementations ends up being about the same.

- Heapq forced me to build tuples of keys and items, while the dictionary based heap did not have the same object-creation overhead (unless it's hidden inside the creation of dictionary entries). On the other hand, since I was already building tuples, it was convenient to also store predecessors in them instead of in some other structure.

- The heapq version uses significantly more storage than the dictionary: proportional to the number of edges instead of the number of vertices.
- The changes I made to Dijkstra's algorithm in order to use heapq might not have been obvious to a non-expert; more generally I think this lack of flexibility would make it more difficult to use heapq for cookbook-type implementation of textbook algorithms.

- In Dijkstra's algorithm, it was easy to identify and ignore outdated heap entries, sidestepping the inability to decrease keys. I'm not convinced that this would be as easy in other applications of heaps.

- One of the reasons to separate data structures from the algorithms that use them is that the data structures can be replaced by ones with equivalent behavior, without changing any of the algorithm code. The heapq Dijkstra implementation is forced to include code based on the internal details of heapq (specifically, the line initializing the heap to be a one element list), making it less flexible for some uses. The usual reason one might want to replace a data structure is for efficiency, but there are others: for instance, I teach various algorithms classes and might want to use an implementation of Dijkstra's algorithm as a testbed for learning about different priority queue data structures. I could do that with the dictionary-based implementation (since it shows nothing of the heap details) but not the heapq one.

Overall, while heapq was usable for implementing Dijkstra, I think it has significant shortcomings that could be avoided by a more well-thought-out interface that provided a little more functionality and a little clearer separation between interface and implementation. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ.
of California, Irvine, School of Information & Computer Science From aahz@pythoncraft.com Fri Apr 25 00:22:48 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 24 Apr 2003 19:22:48 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030424232248.GA25695@panix.com> On Wed, Apr 23, 2003, Guido van Rossum wrote: > Aahz: >> >> Seems to me that this is a good argument for module-level properties, >> BTW, or we require that all module attributes be set only through >> functions. > > I'm not following. What do you mean by module-level properties? Data descriptors on module objects. Let's suppose we have, say, a BCD module. For example, we want to set the "global" rounding state on a per-thread basis. By definition, modules are singletons, so there needs to be a container within the module to hold the per-thread rounding state. Question is, how/when do we update that container? Currently, the only option is to require a user to call a function with the new setting as a parameter; I can imagine cases where it would be convenient to be able to simply set the module attribute, exactly the way we now permit with new-style classes. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
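[The module-level data descriptors Aahz asks for did not exist in the Python of this thread, but later Pythons can approximate them by assigning a ModuleType subclass to a module's __class__, so that plain attribute assignment on the module runs descriptor code. The sketch below uses that later facility; the module name and rounding modes are invented for illustration, not a real BCD module.]

```python
import types

class _BCDModule(types.ModuleType):
    # A data descriptor on the module's *class*: plain attribute
    # assignment on the module object now goes through this setter.
    @property
    def rounding(self):
        return self._rounding

    @rounding.setter
    def rounding(self, mode):
        if mode not in ("floor", "ceil", "nearest"):
            raise ValueError("unknown rounding mode: %r" % (mode,))
        self._rounding = mode

bcd = types.ModuleType("bcd")  # stand-in for a real imported module
bcd.__class__ = _BCDModule     # legal: _BCDModule subclasses ModuleType
bcd._rounding = "nearest"      # backing attribute, set directly once

bcd.rounding = "floor"         # runs the validating setter
print(bcd.rounding)            # -> floor
```

Setting bcd.rounding to an unknown mode raises ValueError from the setter, which is exactly the "simply set the module attribute" ergonomics Aahz describes; per-thread state would additionally need thread-local storage behind the property.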
From gherron@islandtraining.com Fri Apr 25 00:33:50 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 16:33:50 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241538.43480.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> Message-ID: <200304241633.50247.gherron@islandtraining.com> There's a bit more to this problem. It has to do with the *sre* test versus the *re* tests. When test_sre is run, it claims to run all its own tests as well as all of test_re. However any failed tests in test_re are not reported by test_sre. (Neither the one found by Tim nor any others I just purposely introduced into test_re.) This is clearly a problem with test_sre. Only if you run test_re directly rather than through test_sre do you see Tim's error. I'm hoping that Skip, who made these changes, can fix them. (BTW, I like the idea of putting all these tests into unittest -- the old test code looked like a cancer of multiple test methods grown on top of each other.) Gary Herron On Thursday 24 April 2003 03:38 pm, Gary Herron wrote: > On Thursday 24 April 2003 03:01 pm, Tim Peters wrote: > > Last-second re changes don't appear to be going in the right direction > > <wink>: > > > > C:\Code\python\PCbuild>python ../lib/test/test_re.py > > Running re_tests test suite > > test_basic_re_sub (__main__.ReTests) ... ok > > test_constants (__main__.ReTests) ... ok > > test_escaped_re_sub (__main__.ReTests) ... ok > > test_flags (__main__.ReTests) ... ok > > test_limitations (__main__.ReTests) ... ERROR > > test_pickling (__main__.ReTests) ... ok > > test_qualified_re_split (__main__.ReTests) ... ok > > test_qualified_re_sub (__main__.ReTests) ... ok > > test_re_escape (__main__.ReTests) ... ok > > test_re_findall (__main__.ReTests) ... ok > > test_re_match (__main__.ReTests) ... ok > > test_re_split (__main__.ReTests) ... 
ok > > test_re_subn (__main__.ReTests) ... ok > > test_search_star_plus (__main__.ReTests) ... ok > > test_symbolic_refs (__main__.ReTests) ... ok > > > > ====================================================================== > > ERROR: test_limitations (__main__.ReTests) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "../lib/test/test_re.py", line 182, in test_limitations > > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > > File "C:\Code\python\lib\sre.py", line 132, in match > > return _compile(pattern, flags).match(string) > > RuntimeError: maximum recursion limit exceeded > Today's change to test_re (rather than a change to any of the sre > code) is the problem. It appears that Skip was attempting to translate > the tests to use the unittest module. One test (and perhaps others) > were translated incorrectly. > > > The original test was: > > try: > verify(re.match('(x)*', 50000*'x').span() == (0, 50000)) > except RuntimeError, v: > print v > > Since this is *supposed* to cause a RuntimeError, it should be > translated something like > > self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x') > > but definitely not as > > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > > > Here's the CVS log entry: > ---------------------------- > revision 1.34 > date: 2003/04/24 19:43:18; author: montanaro; state: Exp; lines: +294 > -371 first cut at unittest version of re tests > ---------------------------- > > Gary Herron > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From thomas@xs4all.net Fri Apr 25 00:03:16 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 25 Apr 2003 01:03:16 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> References:
<20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030424230316.GB26254@xs4all.nl> On Thu, Apr 24, 2003 at 08:21:20PM +0200, Martin v. Löwis wrote: > Exporting also only matters from the servers which host the Python > distribution, i.e. the US and the Netherlands. Good point. Not only will Guido be exporting crypto software tomorrow when he uploads 2.3b1, he will also be importing it... Especially since he's still a Dutch citizen. I can't figure out if that's a good thing or not. :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@comcast.net Fri Apr 25 01:13:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 24 Apr 2003 20:13:05 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <20030424225914.GA26254@xs4all.nl> Message-ID: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> [Thomas Wouters, dispensing good CVS advice] > ... > Just commit your new test_urllib.py directly, when it's all done, using > something like > > cvs commit -r2.0 test_urllib.py > > But you probably want to discuss the version number you want to > force, Guido might like to reserve 2.0 for something (although I > think he should use '3000' instead :) That part I didn't grok: why force an artificial version number? I can't imagine a use for that. The "Rewrote from scratch." checkin comment Brett will surely make is milestone enough in the CVS log.
From pedronis@bluewin.ch Fri Apr 25 01:17:28 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 25 Apr 2003 02:17:28 +0200 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <20030424232248.GA25695@panix.com> References: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.2.1.1.0.20030425021640.0230e0d0@pop.bluewin.ch> At 19:22 24.04.03 -0400, Aahz wrote: >On Wed, Apr 23, 2003, Guido van Rossum wrote: > > Aahz: > >> > >> Seems to me that this is a good argument for module-level properties, > >> BTW, or we require that all module attributes be set only through > >> functions. > > > > I'm not following. What do you mean by module-level properties? > >Data descriptors on module objects. Let's suppose we have, say, a BCD >module. For example, we want to set the "global" rounding state on a >per-thread basis. By definition, modules are singletons, so there needs >to be a container within the module to hold the per-thread rounding >state. Question is, how/when do we update that container? Currently, >the only option is to require a user to call a function with the new >setting as a parameter; I can imagine cases where it would be convenient >to be able to simply set the module attribute, exactly the way we now >permit with new-style classes. 
see the following thread http://aspn.activestate.com/ASPN/Mail/Message/1497615 From gherron@islandtraining.com Fri Apr 25 01:47:28 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 17:47:28 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241633.50247.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> <200304241633.50247.gherron@islandtraining.com> Message-ID: <200304241747.29059.gherron@islandtraining.com> On Thursday 24 April 2003 04:33 pm, Gary Herron wrote: > There's a bit more to this problem. It has to do with the *sre* test > versus the *re* tests. When test_sre is run, it claims to run all its > own tests as well as all of test_re. However any failed tests in > test_re are not reported by test_sre. (Neither the one found by Tim > nor any others I just purposely introduced into test_re.) This is > clearly a problem with test_sre. Only if you run test_re directly > rather than through test_sre do you see Tim's error. Sigh... I find I was confused and must correct that last paragraph... Test test_sre imports re_test not test_re, and test_re also imports re_test -- perhaps you'll understand my confusion. Running test_sre should *not* find Tim's bug, but running test_re should. Test_sre has no problem, test_re needs to be fixed, and re_test, used by both, is fine. Sigh... Gary Herron From drifty@alum.berkeley.edu Fri Apr 25 01:56:39 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 17:56:39 -0700 (PDT) Subject: [Python-Dev] When is it okay to ``cvs remove``? 
In-Reply-To: <20030424225914.GA26254@xs4all.nl> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> Message-ID: <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> [Thomas Wouters] > On Thu, Apr 24, 2003 at 03:43:34PM -0700, Brett Cannon wrote: > > > Also, my rewrite is not finished (have some more things I want to test), > > but what I have so far passes and seems good. Should I bother to check in > > what I have so far to have it in b1, or hold off until the suite is > > completely finished? I am assuming since these are unit tests that are > > passing I don't need to bother with an SF patch to get a code review from > > someone. > > It might at least make sense to have some differing platforms run the test > before you check it in. > OK, I will finish the code then first. Just to double-check, creating a tracker item and initially assigning it to myself will not cause people to ignore it since everyone who cares will see the new item when it gets mailed to Patches and sees I am asking for other people on other platforms beyond OS X to give the code a run, right? And is having new testing suites peer-reviewed a common thing? Or should I only worry about it when there is a slight chance cross-platform issues might sprout up from the tests? I already know to have any questionable code and massive code changes checked, but I also don't want to hold up code I think is good and safe on SF and bug other people to check it for me. -Brett From andymac@bullseye.apana.org.au Thu Apr 24 23:43:55 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Fri, 25 Apr 2003 09:43:55 +1100 (edt) Subject: [Python-Dev] why is test_socketserver in expected skips? 
In-Reply-To: <200304241418.h3OEIQA11173@odiug.zope.com> Message-ID: <Pine.OS2.4.44.0304250934160.28662-100000@tenring.andymac.org> On Thu, 24 Apr 2003, Guido van Rossum wrote: > > test_socketserver seems to be in all the expected skip lists except > > for (oddly enough) os2emx. > > Probably because the os2emx port hasn't been updated in a while. As it happens, I routinely test with the network resource enabled, and the EMX port Makefile explicitly enables it for the test target. So I never considered test_socketserver an expected skip... -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Fri Apr 25 02:22:57 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:22:57 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of Thu, 24 Apr 2003 15:43:34 PDT." <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> Message-ID: <200304250122.h3P1MvJ01176@pcp02138704pcs.reston01.va.comcast.net> > I am rewriting test_urllib.py from scratch since the current version > is very lacking (and out of date; the thing tests against UserDict > for some odd reason). Since I have written it from scratch I figure > doing a ``cvs remove`` on the current test_urllib.py and then adding > my new version to get a fresh version numbering? No, just copy it on top and check it in. We don't do fresh version numbering. :-) > Also, my rewrite is not finished (have some more things I want to > test), but what I have so far passes and seems good. Should I > bother to check in what I have so far to have it in b1, or hold off > until the suite is completely finished? I am assuming since these > are unit tests that are passing I don't need to bother with an SF > patch to get a code review from someone.
I'd say check it in and keep working on it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 02:32:14 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:32:14 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Thu, 24 Apr 2003 19:22:48 EDT." <20030424232248.GA25695@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <20030424232248.GA25695@panix.com> Message-ID: <200304250132.h3P1WEc01920@pcp02138704pcs.reston01.va.comcast.net> > >> Seems to me that this is a good argument for module-level properties, > >> BTW, or we require that all module attributes be set only through > >> functions. > > > > I'm not following. What do you mean by module-level properties? > > Data descriptors on module objects. I promise you will never get these. Modules are supposed to be robust and simple. If you want fancy, you can use classes and instances. > Let's suppose we have, say, a BCD module. For example, we want to > set the "global" rounding state on a per-thread basis. By > definition, modules are singletons, so there needs to be a container > within the module to hold the per-thread rounding state. Question > is, how/when do we update that container? Currently, the only > option is to require a user to call a function with the new setting > as a parameter; I can imagine cases where it would be convenient to > be able to simply set the module attribute, exactly the way we now > permit with new-style classes. Hm, why hide the mechanism? 
I'd say let the BCD module get an options object by explicitly asking for the current thread (or using a higher-level per-thread data facility), and let the user make a function call to set the state -- the function can request the per-thread options object and update it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 02:51:34 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:51:34 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: "Your message of Thu, 24 Apr 2003 17:47:28 PDT." <200304241747.29059.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> <200304241633.50247.gherron@islandtraining.com> <200304241747.29059.gherron@islandtraining.com> Message-ID: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> I think I understand the problem, and I've checked something in that makes the test pass, by insisting that the match raise RuntimeError with a specific error message. This is what was tested before; that particular error message was part of the expected output in Lib/test/output/test_re, which is now no longer needed and which I have hence deleted. (Hmm, I wonder if there are any other files in Lib/test/output that are no longer needed? All those files should eventually disappear...) --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Fri Apr 25 03:01:52 2003 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Thu, 24 Apr 2003 22:01:52 -0400 Subject: [Python-Dev] Data Descriptors on module objects (was Re: draft PEP: Trace and Profile Support for Threads) In-Reply-To: <20030424232248.GA25695@panix.com> References: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20030424215447.0220a3b0@mail.telecommunity.com> At 07:22 PM 4/24/03 -0400, Aahz wrote: >Data descriptors on module objects. If you *really* need them, you can have them. from types import ModuleType import time, sys class ModuleWithDescriptor(ModuleType): bar = property(lambda self: time.time()) moduleFoo = ModuleWithDescriptor() # named module must be importable, but not yet imported; # parent package must be in sys.modules moduleFoo.__name__ = "mypackage.foo" sys.modules['mypacakge.foo'] = reload(moduleFoo) import mypackage.foo # watch the time change... print mypackage.foo.bar print mypackage.foo.bar I *love* new-style classes. I use the trick above for lazy module importation; a subclass of ModuleType that doesn't import itself until the first time a __getattribute__ occurs. From pje@telecommunity.com Fri Apr 25 03:50:49 2003 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Thu, 24 Apr 2003 22:50:49 -0400 Subject: [Python-Dev] Data Descriptors on module objects (was Re: draft PEP: Trace and Profile Support for Threads) In-Reply-To: <5.1.0.14.0.20030424215447.0220a3b0@mail.telecommunity.com> References: <20030424232248.GA25695@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20030424224918.02ca5010@mail.telecommunity.com> At 10:01 PM 4/24/03 -0400, Phillip J. Eby wrote: ># named module must be importable, but not yet imported; ># parent package must be in sys.modules >moduleFoo.__name__ = "mypackage.foo" >sys.modules['mypacakge.foo'] = >reload(moduleFoo) Oops, that was supposed to read: sys.modules['mypackage.foo'] = moduleFoo From gherron@islandtraining.com Fri Apr 25 03:56:49 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 19:56:49 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304241956.49764.gherron@islandtraining.com> On Thursday 24 April 2003 06:51 pm, Guido van Rossum wrote: > I think I understand the problem, and I've checked something in that > makes the test pass, by insisting that the match raise RuntimeError > with a specific error message. This is what was tested before; that > particular error message was part of the expected output in > Lib/test/output/test_re, which is now no longer needed and which I > have hence deleted. Looks good. Perhaps test_re should be (or should have been) phased out. 
Test_sre makes many of the same tests (including today's offending one), as well as many new ones, and both run all the many old tests from re_test. It must be a (historical) quirk that they both exist. It's mostly a waste to run both, and having two is a maintenance hassle, underscored by the fact that Skip has chosen the less important one of the two (IMHO) to modernize. It's not a high priority, but perhaps I'll look at straightening things out in the (somewhat distant) future. Gary Herron From skip@pobox.com Fri Apr 25 04:00:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 22:00:51 -0500 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Message-ID: <16040.42211.529453.408981@montanaro.dyndns.org> Tim> ====================================================================== Tim> ERROR: test_limitations (__main__.ReTests) Tim> ---------------------------------------------------------------------- Tim> Traceback (most recent call last): Tim> File "../lib/test/test_re.py", line 182, in test_limitations Tim> self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) Tim> File "C:\Code\python\lib\sre.py", line 132, in match Tim> return _compile(pattern, flags).match(string) Tim> RuntimeError: maximum recursion limit exceeded My apologies. I made most of these changes a couple months ago. test_re has been failing with the stack limit problem all this time. I thought it was related to the usual Mac OS X stack limit problem. Thanks to Guido and Gary also for elucidating and fixing the problem while I was at my son's hockey game. (I know, I'll get my priorities straight one of these days...)
Skip From skip@pobox.com Fri Apr 25 04:05:28 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 22:05:28 -0500 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241956.49764.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> <200304241956.49764.gherron@islandtraining.com> Message-ID: <16040.42488.834447.899129@montanaro.dyndns.org> Gary> It's mostly a waste to run both, and having two is a maintenance Gary> hassle, underscored by the fact that Skip has choosen the less Gary> important one of the two (IMHO) to modernize. I think it would be better to fold missing tests in from test_sre to test_re, not so much because I've partly converted test_re to use unittest, but because "re" is what people generally import. It never even occurred to me to look for "test_sre" when I was looking for a candidate test suite to convert to unittest. I'll keep working at completing the conversion. Skip From tim_one@email.msn.com Fri Apr 25 04:17:56 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 23:17:56 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCCEAOEIAB.tim_one@email.msn.com> [Guido] > I think I understand the problem, and I've checked something in that > makes the test pass, by insisting that the match raise RuntimeError > with a specific error message. This is what was tested before; that > particular error message was part of the expected output in > Lib/test/output/test_re, which is now no longer needed and which I > have hence deleted. That's all exactly right. Thanks! > (Hmm, I wonder if there are any other files in Lib/test/output that > are no longer needed? All those files should eventually disappear...) 
Fred used to keep good track of this, so I doubt there's a big backlog. I expect the best candidates are those (like the re tests) recently converted to unittest. Getting rid of expected-output files should be part of such a conversion (or of a conversion to doctest). OTOH, the expected-output kind of test remains fine by me! It used to be very painful to see what went wrong when things failed, but quite some time ago that mechanism was reworked to save all the output and display a diff instead. From barry@python.org Fri Apr 25 04:19:57 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 23:19:57 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <20030424225914.GA26254@xs4all.nl> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> Message-ID: <1051240796.11580.4.camel@geddy> On Thu, 2003-04-24 at 18:59, Thomas Wouters wrote: > That's not particularly useful. The only thing that does is create a period > in time (or rather, 'history' -- CVS history) in which test_urllib.py > doesn't exist. Re-adding the file won't give you a fresh version numbering > either, it'll just give you a lot of headaches, especially when there are > branches involved (right, Barry ? :-) And one thing we do /not/ need is more headaches with cvs. :) The specific problem I've been fighting with (in Mailman's cvs) is that I've cvs rm'd some binary files, but both a cvs checkout and a cvs export continue to resurrect the files when I provide -r on the initial command. If I do a checkout of the trunk, then cvs up to the tag, the file goes away as intended. Sigh. 
> Just commit your new test_urllib.py directly, when it's all done, using > something like > > cvs commit -r2.0 test_urllib.py > > But you probably want to discuss the version number you want to force, Guido > might like to reserve 2.0 for something (although I think he should use > '3000' instead :) I know Guido doesn't care, but I like to have the file major revision numbers match the s/w's major rev number. Really, I just hate to see huge minor revision numbers on files. I hate it as much as I hate to hear Tim's tummy rumbling, right around noon. -Barry From tim_one@email.msn.com Fri Apr 25 04:32:07 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 23:32:07 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <1051240796.11580.4.camel@geddy> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBAEIAB.tim_one@email.msn.com> [Barry Warsaw] > ... > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Really, I just hate to see > huge minor revision numbers on files. Good news: I'm living proof that you can learn to ignore that files *have* CVS revision numbers. If you need a milestone marker, apply a tag. > I hate it as much as I hate to hear Tim's tummy rumbling, right around > noon. Lucky for both of us that my lunch admin almost never lets that happen anymore. If I could remember his name, I'd recommend him to you. 
From gherron@islandtraining.com Fri Apr 25 04:48:12 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 20:48:12 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <16040.42488.834447.899129@montanaro.dyndns.org> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241956.49764.gherron@islandtraining.com> <16040.42488.834447.899129@montanaro.dyndns.org> Message-ID: <200304242048.12864.gherron@islandtraining.com> On Thursday 24 April 2003 08:05 pm, Skip Montanaro wrote: > Gary> It's mostly a waste to run both, and having two is a maintenance > Gary> hassle, underscored by the fact that Skip has choosen the less > Gary> important one of the two (IMHO) to modernize. > > I think it would be better to fold missing tests in from test_sre to > test_re, not so much because I've partly converted test_re to use unittest, > but because "re" is what people generally import. It never even occurred > to me to look for "test_sre" when I was looking for a candidate test suite > to convert to unittest. I'll keep working at completing the conversion. Sure. This is sensible. Gary Herron From DavidA@ActiveState.com Fri Apr 25 05:10:22 2003 From: DavidA@ActiveState.com (David Ascher) Date: Thu, 24 Apr 2003 21:10:22 -0700 Subject: [Python-Dev] Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <200304242012.h3OKCP325878@odiug.zope.com> Message-ID: <3EA8B52E.7090505@ActiveState.com> Guido van Rossum wrote: >>I think does make sense, though, to have a package that is maintained >>separately that python-dev pseudo endorses (like PyXML and win32all) that >>contains all of this crypto stuff. 
> > > Right. Although of course then the crypto requirements would impact that package, and MAL's point applies to it. --david From Anthony Baxter <anthony@interlink.com.au> Fri Apr 25 06:02:24 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Fri, 25 Apr 2003 15:02:24 +1000 Subject: [Python-Dev] shellwords In-Reply-To: <2mlly6pgff.fsf@starship.python.net> Message-ID: <200304250502.h3P52PH25342@localhost.localdomain> >>> Michael Hudson wrote > Particularly the file-manipulation stuff... shutil tends to lose > somewhat x-platform. The other file manipulation thingy that would be good would be to abstract out the bits of tarfile and zipfile and make a standard interface to the two. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From martin@v.loewis.de Fri Apr 25 06:06:01 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 25 Apr 2003 07:06:01 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> Message-ID: <m3y91zb1za.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > That part I didn't grok: why force an artificial version number? I can't > imagine a use for that. The "Rewrote from scratch." checkin comment Brett > will surely make is milestone enough in the CVS log. Bumping the major number makes a more visible change. There is no technical reason to do that, nor one to avoid doing so if you like the visible change. A number of files in the Python CVS do have a 2.x version number; I always wondered why that is. Regards, Martin From guido@python.org Fri Apr 25 06:16:29 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:16:29 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: "Your message of Thu, 24 Apr 2003 19:56:49 PDT."
<200304241956.49764.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> <200304241956.49764.gherron@islandtraining.com> Message-ID: <200304250516.h3P5GTI02992@pcp02138704pcs.reston01.va.comcast.net> > Perhaps test_re should be (or should have been) phased out. Test_sre > makes many of the same tests (including today's offending one), as > well as many new ones, and both run all the many old tests from > re_test. It must be a (historical) quirk that they both exist. It's > mostly a waste to run both, and having two is a maintenance hassle, > underscored by the fact that Skip has chosen the less important one > of the two (IMHO) to modernize. > > It's not a high priority, but perhaps I'll look at straightening > things out in the (somewhat distant) future. Yes, this is mostly a historical artefact from the time when SRE was one of the two provided RE implementations. If you can straighten this one out, be my guest. I see no reason to stop working on the tests while Python is in beta. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 06:14:51 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 25 Apr 2003 07:14:51 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> Message-ID: <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> Brett Cannon <bac@OCF.Berkeley.EDU> writes: > OK, I will finish the code then first.
Just to double-check, creating a > tracker item and initially assigning it to myself will not cause people to > ignore it since everyone who cares will see the new item when it gets > mailed to Patches and sees I am asking for other people on other platforms > beyond OS X to give the code a run, right? Wrong; I do ignore patches that are assigned. > And is having new testing suites peer-reviewed a common thing? Or should > I only worry about it when there is a slight chance cross-platform issues > might sprout up from the tests? I believe the general policy is that if you are certain that a certain patch is useful and correct, you don't need to post it on SF; that is the case in particular if you are *the* maintainer of that piece of code. So if you have doubts, post on SF - but do ask yourself whether there is anybody who you think could eliminate those doubts; if there is no true expert around, the patch will stay unreviewed forever. The policy about beta releases is (or should be) stricter. No new features, and perhaps no new tests unless they test for a bug that gets fixed. So in the beta cycle, patches are posted to SF just to store them there until after the release of Python 2.3. In the specific case, ask yourself what the cost would be if you produce a test failure under conditions that you consider obscure. Will enough people test the test before 2.3 is released? Does the new test suite behave differently enough from the old one to make false positives a possibility? Regards, Martin From guido@python.org Fri Apr 25 06:19:08 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:19:08 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of 24 Apr 2003 23:19:57 EDT." 
<1051240796.11580.4.camel@geddy> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <1051240796.11580.4.camel@geddy> Message-ID: <200304250519.h3P5J8x03015@pcp02138704pcs.reston01.va.comcast.net> > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Really, I just hate to see > huge minor revision numbers on files. Well, some files already have a 2.x revno, others don't. The 2.x revnos were introduced in ancient times. I'm all for switching to 3.x when we're doing Python 3.0, but until then, I see no reason to play with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 06:24:45 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:24:45 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of 25 Apr 2003 07:14:51 +0200." <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304250524.h3P5OjH03063@pcp02138704pcs.reston01.va.comcast.net> > The policy about beta releases is (or should be) stricter. No new > features, and perhaps no new tests unless they test for a bug that > gets fixed. So in the beta cycle, patches are posted to SF just to > store them there until after the release of Python 2.3. I don't see much of a reason to be so strict about no new tests. > In the specific case, ask yourself what the cost would be if you > produce a test failure under conditions that you consider obscure. > Will enough people test the test before 2.3 is released? Does the new > test suite behave differently enough from the old one to make false > positives a possibility? This is always a good set of questions to ask yourself. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Apr 25 07:42:11 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 Apr 2003 08:42:11 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EA8D8C3.40503@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > >>Why do you only look at US export rules when discussing crypto >>code in Python ? > > Because only exporting matters. Importing is no problem: You can > easily *remove* stuff from the distribution, by creating a copy of > package that doesn't have the code that cannot be imported. That would > be the job of whoever wants to import it. > > Exporting also only matters from the servers which host the Python > distribution, i.e. the US and the Netherlands. That's really optimistic. Every CD vendor, mirror site, etc. in the world hosting the Python distribution would have to go through the business of evaluating whether it's legal to distribute Python or not in their particular case. Even better: users who download Python from some web-site/CD would have to trace back the path the Python version took to be sure that they are using a legally exported and imported version. Crypto is just too much (legal) work if you're serious about it. I also don't really see a problem here: there are plenty good crypto packages out there ready to be used. Not having them in the core distribution raises the awareness bar just a little to make people think about whether it's legal to use them in their particular case. So again: why put the whole Python distribution at risk just because you want to make life easier for the small share of people actually using such code ?
-- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 25 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 60 days left From martin@v.loewis.de Fri Apr 25 08:01:18 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Fri, 25 Apr 2003 09:01:18 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8D8C3.40503@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> Message-ID: <3EA8DD3E.8090201@v.loewis.de> M.-A. Lemburg wrote: > That's really optimistic. Every CD vendor, mirror site, etc. in the > world hosting the Python distribution would have to go through the > business of evaluating whether it's legal to distribute Python or not > in their particular case. Every CD vendor, mirror site, etc. would have to perform a risk analysis, yes. That goes beyond analysing the legal status only - people will usually also take into account what the risk of prosecution is. They already do that for all other software they distribute, and apparently come to the conclusion that the risk of being prosecuted is nearly zero. > Crypto is just too much (legal) work if you're serious about it. So then you would advise to remove the OpenSSL support from the Windows distribution, and from Python altogether? Because if not, why would it be bad to add more cryptographic packages to the standard Python distribution? Either you violate some law in some country already by distributing Python from A to B, or you don't. Adding another package doesn't change anything here.
> I also don't really see a problem here: there are plenty of good > crypto packages out there ready to be used. And it may indeed be the case that authors of such packages fear the loss of reputation if competing packages were included into the Python distribution :-( Regards, Martin From tim_one@email.msn.com Fri Apr 25 08:12:47 2003 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 25 Apr 2003 03:12:47 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> [Guido] > Agreed. How about naming it os.walk()? I think it's not OS specific > -- all the OS specific stuff is part of os.path. So we only need one > implementation. I've checked this in, modified to treat symlinks the same way os.path.walk() treated them, and with docs and test cases. It wasn't my intent to cut off people who want fancier stuff, but available time is finite, and at least now they can demonstrate their sincerity by supplying code, doc, and test suite patches <wink>. From mal@lemburg.com Fri Apr 25 09:02:26 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 Apr 2003 10:02:26 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8DD3E.8090201@v.loewis.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> <3EA8DD3E.8090201@v.loewis.de> Message-ID: <3EA8EB92.4070606@lemburg.com> Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >> That's really optimistic. Every CD vendor, mirror site, etc. in the >> world hosting the Python distribution would have to go through the >> business of evaluating whether it's legal to distribute Python or not >> in their particular case. > > Every CD vendor, mirror site, etc. would have to perform a risk > analysis, yes. 
That goes beyond analysing the legal status only - people > will usually also take into account what the risk of prosecution is. > They already do that for all other software they distribute, and > apparently come to the conclusion that the risk of being prosecuted is > nearly zero. In reality it probably is for most parts of the world. But why put this burden on the casual user ? >> Crypto is just too much (legal) work if you're serious about it. > > So then you would advise to remove the OpenSSL support from the Windows > distribution, and from Python altogether? Hmm, I didn't know that the Windows installer comes with an SSL module that includes OpenSSL. I'd strongly advise to make that a separate download. At the very least, there should be a Windows installer without that module and a note on the web-site mentioning the problem and maybe linking to the URL I gave in my other mail. In any case, the download page should have a note about the use of crypto code and interfaces to crypto code to make things safer for both the PSF and the user downloading the distribution. > Because if not, why would it be bad to add more cryptographic packages > to the standard Python distribution? Either you violate some law in some > country already by distributing Python from A to B, or you don't. Adding > another package doesn't change anything here. I can't follow your argument. This is like "you've robbed one bank; it doesn't get worse if you rob another two". I also don't understand your position in the light of the PSF's intentions. The PSF is meant to protect the IP in Python -- how does that fit with being careless about breaking the law ? >> I also don't really see a problem here: there are plenty of good >> crypto packages out there ready to be used. > > And it may indeed be the case that authors of such packages fear the loss > of reputation if competing packages were included into the Python > distribution :-( Is there ? 
pycrypto is all you need if you're into deep crypto. The standard SSL support is enough crypto for most people and that's already included in the distribution. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 25 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 60 days left From Paul.Moore@atosorigin.com Fri Apr 25 09:59:52 2003 From: Paul.Moore@atosorigin.com (Moore, Paul) Date: Fri, 25 Apr 2003 09:59:52 +0100 Subject: [Python-Dev] Cryptographic stuff for 2.3 Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> From: M.-A. Lemburg [mailto:mal@lemburg.com] > In reality it probably is for most parts of the world. But > why put this burden on the casual user ? Speaking as a "casual user", I very rarely need or use crypto software. However, when I do need it, having it "built in" is a major benefit - most of the crypto packages either have dependencies I'm not familiar with or don't have, or go far too deep into crypto theory for me to follow. At the end of the day, all I want is simple stuff, like for urllib to get a "https" web page for me, "just like my browser does" (ie, with no thought on my part...) >>> Crypto is just too much (legal) work if you're serious >>> about it. >> >> So then you would advise to remove the OpenSSL support >> from the Windows distribution, and from Python altogether? > > Hmm, I didn't know that the Windows installer comes with an SSL > module that includes OpenSSL. I'd strongly advise to make that > a separate download. If you did, I'd expect that 99% of Windows users would perceive that as "Python can't handle https URLs". Having a separate download might be enough, as long as it was utterly trivial - download the package, click to install, done. 
All dependencies included, no extra work. > Is there ? pycrypto is all you need if you're into deep crypto. But pycrypto (at least when I've looked into it) definitely *isn't* just a 1-click install, and a quick Google search reveals no way of getting a prebuilt Windows binary. Of course, you say "if you're into deep crypto", so maybe you'd say that expecting users to build their own isn't unreasonable at that level. Actually, m2crypto is another candidate, and it does include Windows binaries (but they are a bit fiddly to install)... > The standard SSL support is enough crypto for most people and > that's already included in the distribution. But you were arguing to take it out... Personally, I'd like the existing stuff to stay as-is. I don't particularly see the need for more crypto stuff in the core, but I'd like to see a well-maintained, easy to install, "sanctioned" crypto package for people who want to either use crypto "for real", or just investigate it. Paul. From andrew@acooke.org Fri Apr 25 12:47:05 2003 From: andrew@acooke.org (andrew cooke) Date: Fri, 25 Apr 2003 07:47:05 -0400 (CLT) Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> References: <200304241650.h3OGoPM15432@odiug.zope.com> <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> Message-ID: <41193.127.0.0.1.1051271225.squirrel@127.0.0.1> Tim Peters said: > I've checked this in, modified to treat symlinks the same way > os.path.walk() > treated them, and with docs and test cases. It wasn't my intent to cut > off > people who want fancier stuff, but available time is finite, and at least > now they can demonstrate their sincerity by supplying code, doc, and test > suite patches <wink>. For the record - the version I posted (with breadth-first as an option) wasn't reliable (it runs out of stack space on reasonable directory structures). 
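[Editorial note: the stack-space problem Andrew mentions goes away if the breadth-first walk is driven by an explicit queue instead of recursion. The sketch below is hypothetical — it is not the code that was posted, and the function name is invented — but it shows the technique:]

```python
# Sketch: an iterative breadth-first directory walk.  An explicit deque
# replaces recursion, so directory depth can never exhaust the call stack.
import os
from collections import deque

def walk_breadth_first(top):
    """Yield (dirpath, dirnames, filenames) tuples in breadth-first order."""
    queue = deque([top])
    while queue:
        dirpath = queue.popleft()
        dirnames, filenames = [], []
        try:
            entries = os.listdir(dirpath)
        except OSError:
            continue  # unreadable directory: skip it silently
        for name in sorted(entries):
            full = os.path.join(dirpath, name)
            if os.path.isdir(full):
                dirnames.append(name)
                queue.append(full)  # visit after all siblings at this level
            else:
                filenames.append(name)
        yield dirpath, dirnames, filenames
```

All parents at one depth are yielded before any of their children, which is exactly the breadth-first ordering the depth-limited recursive version tried to provide.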
Andrew -- http://www.acooke.org/andrew From tim@zope.com Fri Apr 25 16:14:31 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 11:14:31 -0400 Subject: [Python-Dev] More new Windows test failures Message-ID: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> test_urllib and test_socket fail on Win2K today. test_socket (I think Guido already knows about these): ====================================================================== ERROR: testIPv4toString (__main__.GeneralModuleTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_socket.py", line 322, in testIPv4toString from socket import inet_aton as f, inet_pton, AF_INET ImportError: cannot import name inet_pton ====================================================================== ERROR: testStringToIPv4 (__main__.GeneralModuleTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_socket.py", line 352, in testStringToIPv4 from socket import inet_ntoa as f, inet_ntop, AF_INET ImportError: cannot import name inet_ntop ---------------------------------------------------------------------- Ran 46 tests in 3.555s FAILED (errors=2) test_urllib (these may all be bad line-end assumptions): ====================================================================== FAIL: test_fileno (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 68, in test_fileno "Reading on the file descriptor returned by fileno() " File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: Reading on the file descriptor returned by fileno() did not return the expected text ====================================================================== FAIL: test_iter (__main__.urlopen_FileTests) 
---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 88, in test_iter self.assertEqual(line, self.text) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\r\n' != 'test_urllib: urlopen_FileTests\n' ====================================================================== FAIL: test_read (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 48, in test_read self.assertEqual(self.text, self.returned_obj.read()) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib: urlopen_FileTests\r\n' ====================================================================== FAIL: test_readline (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 51, in test_readline self.assertEqual(self.text, self.returned_obj.readline()) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib: urlopen_FileTests\r\n' ====================================================================== FAIL: test_readlines (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 61, in test_readlines "readlines() returned improper text") File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: readlines() returned improper text ---------------------------------------------------------------------- Ran 23 
tests in 0.280s FAILED (failures=5) From guido@python.org Fri Apr 25 16:21:55 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 11:21:55 -0400 Subject: [Python-Dev] Failing tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 06:40:27 EDT." <005d01c30b17$28840580$1a3cc797@oemcomputer> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> Message-ID: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> > test_urllib.py is crashing on my fresh WinMe build: > > test_fileno (__main__.urlopen_FileTests) ... FAIL > test_iter (__main__.urlopen_FileTests) ... FAIL > test_read (__main__.urlopen_FileTests) ... FAIL > test_readline (__main__.urlopen_FileTests) ... FAIL > test_readlines (__main__.urlopen_FileTests) ... FAIL Should be fixed now -- I'm writing the file with test data in binary mode. I think that it would be preferable if the socket._fileobject class would actually interpret the mode argument, but it's never done that, so I'm not in a hurry to add this feature to this already hairy class. (Better wait until the new "sio" class -- see sandbox.) > Two of the test cases are failing in test_socket.py > on a fresh build for WinMe: > > testIPv4toString (__main__.GeneralModuleTests) ... ERROR > testStringToIPv4 (__main__.GeneralModuleTests) ... ERROR Fixed too, by skipping the tests of inet_ntop() and inet_pton() when they don't exist. All tests now pass for me, both on Linux (Red Hat 7.8) and Windows (Win 98 second edition). 
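[Editorial note: the line-ending property behind this fix is easy to demonstrate. A small sketch (file name hypothetical, modern Python syntax): text mode may translate "\n" to the platform line ending on write — "\r\n" on Windows, which is exactly what the AssertionErrors above show — while binary mode writes the bytes through untouched:]

```python
# Sketch: binary mode sidesteps platform newline translation entirely.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test_data.txt")

# Binary write: the bytes given are the bytes stored, on every platform.
with open(path, "wb") as f:
    f.write(b"test_urllib: urlopen_FileTests\n")

# Binary read: see exactly what is on disk, with no translation on the
# way back either.
with open(path, "rb") as f:
    data = f.read()

assert data == b"test_urllib: urlopen_FileTests\n"  # never b"...\r\n"
```

Had the file been written in text mode on Windows, the stored bytes would end in b"\r\n" and the comparison against the expected text would fail, which is the failure mode the tests above exhibited.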
--Guido van Rossum (home page: http://www.python.org/~guido/) From wesleyhenwood@hotmail.com Fri Apr 25 16:33:57 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Fri, 25 Apr 2003 15:33:57 +0000 Subject: [Python-Dev] Re: PyRun_* functions Message-ID: <BAY7-F110q50K8paSxf00007dd9@hotmail.com> How do I make certain that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using? _________________________________________________________________ From jeremy@zope.com Fri Apr 25 16:34:22 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 25 Apr 2003 11:34:22 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8D8C3.40503@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> Message-ID: <1051284862.1009.6.camel@slothrop.zope.com> On Fri, 2003-04-25 at 02:42, M.-A. Lemburg wrote: > That's really optimistic. Every CD vendor, mirror site, etc. in the > world hosting the Python distribution would have to go through the > business of evaluating whether it's legal to distribute Python or not > in their particular case. I haven't had time to follow this thread closely, but I think I saw a message from Martin where he explained that the OpenSSL wrapper we already have is probably covered by US export regulations. I think it's a matter of interpretation, but I agree with that interpretation. So everyone who distributes Python already needs to do that analysis. I think it's unlikely we would remove the crypto code we already have, so I'm all for adding more crypto code that makes the library more useful. 
Jeremy From guido@python.org Fri Apr 25 16:38:35 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 11:38:35 -0400 Subject: [Python-Dev] Re: PyRun_* functions In-Reply-To: "Your message of Fri, 25 Apr 2003 15:33:57 -0000." <BAY7-F110q50K8paSxf00007dd9@hotmail.com> References: <BAY7-F110q50K8paSxf00007dd9@hotmail.com> Message-ID: <200304251538.h3PFcZ119642@pcp02138704pcs.reston01.va.comcast.net> > How do I make certain that FILE* parameters are only passed to these > functions if it is certain that they were created by the same library that > the Python runtime is using? On which platform? --Guido van Rossum (home page: http://www.python.org/~guido/) From duanev@io.com Fri Apr 25 16:47:48 2003 From: duanev@io.com (Duane Voth) Date: Fri, 25 Apr 2003 10:47:48 -0500 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! Message-ID: <20030425104748.A26488@io.com> First, Martin, muchas gracias! --export-dynamic was exactly the ticket. Next hurdle: Lynx is clearly hoping curses will go the way of the condor; their implementation is pre-ncurses! Comments at the top of Python-2.2.2/Modules/_cursesmodule.c suggest that there was a prior version of Python curses that should be much closer to what LynxOS4 supports. Does anyone have an archived copy of the old _cursesmodule.c? Modules/_cursesmodule.c comments: * Based on prior work by Lance Ellinghaus and Oliver Andrich * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, * Cathedral City, California Republic, United States of America. * * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. so I guess I'm looking for version 1.2 of _cursesmodule.c. 
-- Duane Voth duanev@io.com -- duanev@atlantis.io.com From james.kew@btinternet.com Fri Apr 25 16:56:58 2003 From: james.kew@btinternet.com (James Kew) Date: Fri, 25 Apr 2003 16:56:58 +0100 Subject: [Python-Dev] Re: Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <b8blqg$96u$1@main.gmane.org> "Guido van Rossum" <guido@python.org> wrote in message news:200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net... > Rotor should be deprecated regardless; I've never heard of someone > using it. I have seen it mentioned occasionally on c.l.py, usually with a followup of "don't use rotor, it's not secure": http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&th=5a655073e0b632ea&rnum=4 http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&th=7b945db40cf892fd&rnum=5 James From python@rcn.com Fri Apr 25 17:00:15 2003 From: python@rcn.com (Raymond Hettinger) Date: Fri, 25 Apr 2003 12:00:15 -0400 Subject: [Python-Dev] Re: Failing tests on Windows References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <000701c30b43$c30f9ac0$125ffea9@oemcomputer> > All tests now pass for me, both on Linux (Red Hat 7.8) and Windows > (Win 98 second edition). On a fresh WinME build, all tests pass for me also :-) Raymond From guido@python.org Fri Apr 25 17:00:27 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 12:00:27 -0400 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: "Your message of Fri, 25 Apr 2003 10:47:48 CDT." 
<20030425104748.A26488@io.com> References: <20030425104748.A26488@io.com> Message-ID: <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net> > Modules/_cursesmodule.c comments: > * Based on prior work by Lance Ellinghaus and Oliver Andrich > * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, > * Cathedral City, California Republic, United States of America. > * > * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: > * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. > > so I guess I'm looking for version 1.2 of _cursesmodule.c. You should be able to get that out of CVS. The oldest version at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_cursesmodule.c is labeled 2.1, but the CVS version numbers don't match author's versions. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Fri Apr 25 17:16:46 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 25 Apr 2003 12:16:46 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again Message-ID: <1051287405.1009.66.camel@slothrop.zope.com> I thought I'd report that test_ossaudiodev is back to hanging on my RH 7.2 box. It's been a while since I ran the test suite with the audio resource enabled, so I don't know when it started to hang. Jeremy From guido@python.org Fri Apr 25 17:39:46 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 12:39:46 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: "Your message of 25 Apr 2003 12:16:46 EDT." <1051287405.1009.66.camel@slothrop.zope.com> References: <1051287405.1009.66.camel@slothrop.zope.com> Message-ID: <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> > I thought I'd report that test_ossaudiodev is back to hanging on my RH > 7.2 box. It's been a while since I ran the test suite with the audio > resource enabled, so I don't know when it started to hang. It probably never stopped hanging. 
It only runs when you pass "-u audio" to regrtest though. I note that it passes for me with Red Hat 7.3, so you might want to upgrade. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 17:58:40 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: Fri, 25 Apr 2003 18:58:40 +0200 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: <20030425104748.A26488@io.com> References: <20030425104748.A26488@io.com> Message-ID: <3EA96940.4060501@v.loewis.de> Duane Voth wrote: > Next hurdle: Lynx is clearly hoping curses will go the way of the condor, > their implementation is pre ncurses! Comments at the top of > Python-2.2.2/Modules/_cursesmodule.c suggest that there was a prior > version of Python curses that should be much closer to what LynxOS4 > supports. Does anyone have an archived copy of the old _cursesmodule.c? You can get old versions of all source code from the CVS. > * Based on prior work by Lance Ellinghaus and Oliver Andrich > * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, > * Cathedral City, California Republic, United States of America. > * > * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: > * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. > > so I guess I'm looking for version 1.2 of _cursesmodule.c. I think your guess is wrong. The extensions are used only if available, and the curses module works with pre-ncurses implementations of curses just fine. 
Regards, Martin From barry@python.org Fri Apr 25 18:10:57 2003 From: barry@python.org (Barry Warsaw) Date: 25 Apr 2003 13:10:57 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> Message-ID: <1051290657.1500.6.camel@barry> On Fri, 2003-04-25 at 04:59, Moore, Paul wrote: > Personally, I'd like the existing stuff to stay as-is. I'd hate to see sha removed from the standard distro. -Barry From tim.one@comcast.net Fri Apr 25 18:47:50 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 25 Apr 2003 13:47:50 -0400 Subject: [Python-Dev] Failing tests on Windows In-Reply-To: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHOEGIFHAA.tim.one@comcast.net> [Guido] > ... > All tests now pass for me, both on Linux (Red Hat 7.8) and Windows > (Win 98 second edition). It looks good on Win2K now too, both release and debug builds. I saw one failure in test_queue, but believe that's due to a pre-existing race condition in the test code (recall that we've both seen test_queue fail before). From theller@python.net Fri Apr 25 19:10:33 2003 From: theller@python.net (Thomas Heller) Date: 25 Apr 2003 20:10:33 +0200 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net> Message-ID: <y91ycusm.fsf@python.net> Tim Peters <tim.one@comcast.net> writes: > [Mark Hammond] > > Actually, some guidance would be nice here. > > It's easy this time. BTW, I agree your new check is the right thing to do! > If another case like this pops up, though, we/you should probably add a > section to the PEP explaining what to do about it. > ctypes ;-) is another case (and more cases will pop up as soon as the beta is released, and people try their extensions under it). 
I agree it is easy to fix, but usually when Python crashes with an invalid thread state I'm very anxious at first. So is the policy now that it is no longer *allowed* to create another thread state, while in previous versions there wasn't any choice, because there existed no way to get the existing one? IMO a fatal error is very harsh, especially as there's no problem with continuing execution - exactly what happens in a release build. Not that I am misunderstood: I very much appreciate the work Mark has done, and look forward to using it to its fullest extent. Thomas From niemeyer@conectiva.com Fri Apr 25 19:11:57 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Fri, 25 Apr 2003 15:11:57 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <200304250502.h3P52PH25342@localhost.localdomain> References: <2mlly6pgff.fsf@starship.python.net> <200304250502.h3P52PH25342@localhost.localdomain> Message-ID: <20030425181157.GB6591@localhost.distro.conectiva> > > Particularly the file-manipulation stuff... shutil tends to lose > > somewhat x-platform. > > The other file manipulation thingy that would be good would be to > abstract out the bits of tarfile and zipfile and make a standard > interface to the two. IIRC, tarfile has a wrapper which makes it compatible with zipfile. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From guido@python.org Fri Apr 25 19:26:25 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 14:26:25 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of 25 Apr 2003 13:10:57 EDT." <1051290657.1500.6.camel@barry> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> <1051290657.1500.6.camel@barry> Message-ID: <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> > I'd hate to see sha removed from the standard distro. Me too; I don't see sha or md5 as crypto. I'm only against adding new *crypto* capability. 
I'm also for isolating existing crypto capability so it's easy to remove for anyone who has a need for a crypto-free distribution. I think we're already doing that, given that even on Windows, the SSL module is a separate DLL. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Fri Apr 25 19:36:45 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 Apr 2003 14:36:45 -0400 Subject: [Python-Dev] Python 2.3b1 documentation Message-ID: <16041.32829.612385.536757@grendel.zope.com> I've already formatted the documentation for Python 2.3b1; please don't touch the Doc directory until the final release has been announced. Thanks! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From skip@pobox.com Fri Apr 25 19:22:43 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 13:22:43 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 Message-ID: <16041.31987.943313.278329@montanaro.dyndns.org> Using the latest version from CVS, on Solaris 8 test_logging hangs. Lots of output, then: ... INFO:root:Info index = 99 -- logging 100 at INFO, messages should be seen every 10 events -- -- logging 101 at INFO, messages should be seen every 10 events -- INFO:root:Info index = 100 INFO:root:Info index = 101 -- log_test2 end --------------------------------------------------- -- log_test3 begin --------------------------------------------------- Unfiltered... INFO:a:Info 1 INFO:a.b:Info 2 INFO:a.c:Info 3 INFO:a.b.c:Info 4 INFO:a.b.c.d:Info 5 INFO:a.bb.c:Info 6 INFO:b:Info 7 INFO:b.a:Info 8 INFO:c.a.b:Info 9 INFO:a.bb:Info 10 Filtered with 'a.b'... INFO:a.b:Info 2 INFO:a.b.c:Info 4 INFO:a.b.c.d:Info 5 -- log_test3 end --------------------------------------------------- and it just sits there. ^C doesn't terminate it. I have to stop it w/ ^Z, then "kill %1" it. I have the very latest source checked out. Any ideas? 
Skip From skip@pobox.com Fri Apr 25 16:38:03 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 10:38:03 -0500 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? Message-ID: <16041.22107.533893.743928@montanaro.dyndns.org> While moving tests from test_sre to test_re I stumbled upon a simple test for sre.Scanner. This looks fairly cool. Should it be exposed through re and documented? Skip From skip@pobox.com Fri Apr 25 17:14:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 11:14:51 -0500 Subject: [Python-Dev] test_s?re merge Message-ID: <16041.24315.500827.370963@montanaro.dyndns.org> For those of you who don't read python-checkins, the merge of test_re.py and test_sre.py has been completed and test_sre.py is no longer in the repository. Future test cases should be added to test_re.py, even if it's a test specifically of Fredrik's sre module. The sre.Scanner object is the only thing imported directly from sre, and only because it is not sucked in by re.py. I may also assimilate Tim's re_tests.py at some point, but probably not real soon, so if someone feels like tackling that, be my guest. ;-) Skip From skip@pobox.com Fri Apr 25 17:27:22 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 11:27:22 -0500 Subject: [Python-Dev] bz2 module fails to compile on Solaris 8 Message-ID: <16041.25066.559968.451868@montanaro.dyndns.org> The bz2 module isn't compiling for me on Solaris 8: building 'bz2' extension gcc -g -Wall -Wstrict-prototypes -fPIC -I. 
-I/export/home/python/dist/src/./Include -I/usr/local/include -I/export/home/python/dist/src/Include -I/export/home/python/dist/src -c /export/home/python/dist/src/Modules/bz2module.c -o build/temp.solaris-2.8-sun4u-2.3/bz2module.o cc1: warning: changing search order for system directory "/usr/local/include" cc1: warning: as it has already been specified as a non-system directory /export/home/python/dist/src/Modules/bz2module.c: In function `Util_CatchBZ2Error': /export/home/python/dist/src/Modules/bz2module.c:120: `BZ_CONFIG_ERROR' undeclared (first use in this function) /export/home/python/dist/src/Modules/bz2module.c:120: (Each undeclared identifier is reported only once ... This particular machine has a /usr/include/bzlib.h file with a copyright date of 1998. There are several other BZ_*_ERROR defines, but not BZ_CONFIG_ERROR. Adding a conditional define for that macro isn't sufficient to get it to compile. I get lots of "structure has no ..." errors: Modules/bz2module.c:1521: structure has no member named `total_out_hi32' Modules/bz2module.c:1521: structure has no member named `total_out_lo32' Perhaps this version of bz2 lib is too old to use with Gustavo's module? Skip From guido@python.org Fri Apr 25 20:24:42 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:24:42 -0400 Subject: [Python-Dev] Tagging the tree Message-ID: <200304251924.h3PJOgw25941@pcp02138704pcs.reston01.va.comcast.net> I'm tagging the CVS tree now. Please no more checkins until the release is announced or unless I specifically ask you! --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 20:32:26 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: Fri, 25 Apr 2003 21:32:26 +0200 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? 
In-Reply-To: <16041.22107.533893.743928@montanaro.dyndns.org> References: <16041.22107.533893.743928@montanaro.dyndns.org> Message-ID: <3EA98D4A.40107@v.loewis.de> Skip Montanaro wrote: > While moving tests from test_sre to test_re I stumbled upon a simple test > for sre.Scanner. This looks fairly cool. Should it be exposed through re > and documented? I think /F did not consider it ready for general consumption. I believe the approach is cool, but the API would still leave features to be desired. In practical compiler construction, I usually copy the approach, and duplicate it - that gives a very efficient and readily comprehensible scanner. IOW, I would leave it where it is: As a masterpiece of work to get inspiration from, but not as a tool to give out to anybody. Regards, Martin From guido@python.org Fri Apr 25 20:38:36 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:38:36 -0400 Subject: [Python-Dev] bz2 module fails to compile on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 11:27:22 CDT." <16041.25066.559968.451868@montanaro.dyndns.org> References: <16041.25066.559968.451868@montanaro.dyndns.org> Message-ID: <200304251938.h3PJcap26135@pcp02138704pcs.reston01.va.comcast.net> > The bz2 module isn't compiling for me on Solaris 8: > > building 'bz2' extension > gcc -g -Wall -Wstrict-prototypes -fPIC -I. 
-I/export/home/python/dist/src/./Include -I/usr/local/include -I/export/home/python/dist/src/Include -I/export/home/python/dist/src -c /export/home/python/dist/src/Modules/bz2module.c -o build/temp.solaris-2.8-sun4u-2.3/bz2module.o
> cc1: warning: changing search order for system directory "/usr/local/include"
> cc1: warning: as it has already been specified as a non-system directory
> /export/home/python/dist/src/Modules/bz2module.c: In function `Util_CatchBZ2Error':
> /export/home/python/dist/src/Modules/bz2module.c:120: `BZ_CONFIG_ERROR' undeclared (first use in this function)
> /export/home/python/dist/src/Modules/bz2module.c:120: (Each undeclared identifier is reported only once
> ...
>
> This particular machine has a /usr/include/bzlib.h file with a copyright
> date of 1998.  There are several other BZ_*_ERROR defines, but not
> BZ_CONFIG_ERROR.  Adding a conditional define for that macro isn't
> sufficient to get it to compile.  I get lots of "structure has no ..."
> errors:
>
> Modules/bz2module.c:1521: structure has no member named `total_out_hi32'
> Modules/bz2module.c:1521: structure has no member named `total_out_lo32'
>
> Perhaps this version of bz2 lib is too old to use with Gustavo's module?

Again, maybe we should just give up on Solaris. :-(

Please work with Gustavo to fix this after the b1 release.  (I'm still waiting for the "cvs tag" command to finish...)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Apr 25 20:37:17 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:37:17 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 13:22:43 CDT." <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <200304251937.h3PJbHr26118@pcp02138704pcs.reston01.va.comcast.net>

> Using the latest version from CVS, on Solaris 8 test_logging hangs.  Lots of
> output, then:
>
> ...
> INFO:root:Info index = 99
> -- logging 100 at INFO, messages should be seen every 10 events --
> -- logging 101 at INFO, messages should be seen every 10 events --
> INFO:root:Info index = 100
> INFO:root:Info index = 101
> -- log_test2 end ---------------------------------------------------
> -- log_test3 begin ---------------------------------------------------
> Unfiltered...
> INFO:a:Info 1
> INFO:a.b:Info 2
> INFO:a.c:Info 3
> INFO:a.b.c:Info 4
> INFO:a.b.c.d:Info 5
> INFO:a.bb.c:Info 6
> INFO:b:Info 7
> INFO:b.a:Info 8
> INFO:c.a.b:Info 9
> INFO:a.bb:Info 10
> Filtered with 'a.b'...
> INFO:a.b:Info 2
> INFO:a.b.c:Info 4
> INFO:a.b.c.d:Info 5
> -- log_test3 end ---------------------------------------------------
>
> and it just sits there.  ^C doesn't terminate it.  I have to stop it w/ ^Z,
> then "kill %1" it.  I have the very latest source checked out.  Any ideas?

Let's eradicate Solaris from the universe. :-)

Seriously, this will have to wait until after the b1 release today.  Someone else reported success on Solaris 8 IIRC.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Apr 25 20:38:55 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:38:55 -0400 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? In-Reply-To: "Your message of Fri, 25 Apr 2003 10:38:03 CDT." <16041.22107.533893.743928@montanaro.dyndns.org> References: <16041.22107.533893.743928@montanaro.dyndns.org> Message-ID: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net>

> While moving tests from test_sre to test_re I stumbled upon a simple test
> for sre.Scanner.  This looks fairly cool.  Should it be exposed through re
> and documented?

What's Scanner?
--Guido van Rossum (home page: http://www.python.org/~guido/)

From neal@metaslash.com Fri Apr 25 20:41:48 2003 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 25 Apr 2003 15:41:48 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <20030425194147.GG12173@epoch.metaslash.com>

On Fri, Apr 25, 2003 at 01:22:43PM -0500, Skip Montanaro wrote:
> Using the latest version from CVS, on Solaris 8 test_logging hangs.  Lots of
> output, then:
>
> ...
>
> and it just sits there.  ^C doesn't terminate it.  I have to stop it w/ ^Z,
> then "kill %1" it.  I have the very latest source checked out.  Any ideas?

On Solaris 8, I've had the test pass, hang, and crash the interpreter (actually it was test_threaded_import when running all the tests).  The problem may be related to Mark's changes, but I'm not sure.

Anyway, the tests passed when run many times with the change below, perhaps it will work for you.  I'm running the entire suite on Solaris now.  The change works on Linux.

Neal
--
--- Lib/test/test_logging.py.save	2003-04-25 15:30:23.000000000 -0400
+++ Lib/test/test_logging.py	2003-04-25 15:30:52.000000000 -0400
@@ -470,6 +470,8 @@
     socketDataProcessed.acquire()
     socketDataProcessed.wait()
     socketDataProcessed.release()
+    for thread in threads:
+        thread.join()
     banner("logrecv output", "begin")
     sys.stdout.write(sockOut.getvalue())
     sockOut.close()

From gherron@islandtraining.com Fri Apr 25 20:49:33 2003 From: gherron@islandtraining.com (Gary Herron) Date: Fri, 25 Apr 2003 12:49:33 -0700 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented?
In-Reply-To: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> References: <16041.22107.533893.743928@montanaro.dyndns.org> <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304251249.33780.gherron@islandtraining.com>

On Friday 25 April 2003 12:38 pm, Guido van Rossum wrote:
> > While moving tests from test_sre to test_re I stumbled upon a simple test
> > for sre.Scanner.  This looks fairly cool.  Should it be exposed through
> > re and documented?
>
> What's Scanner?

You create a Scanner instance with a list of re's and associated functions, then you use it to scan a string, returning a list of parts which match the given re's.  (Actually the matches are run through the associated functions, and their output is what forms the returned list.)  Here's the single test case Skip referred to:

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = sre.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
    ])

# sanity check
test('scanner.scan("sum = 3*foo + 312.50 + bar")',
     (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], ''))

Gary Herron

From skip@pobox.com Fri Apr 25 20:58:34 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 14:58:34 -0500 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? In-Reply-To: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> References: <16041.22107.533893.743928@montanaro.dyndns.org> <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <16041.37738.788965.865040@montanaro.dyndns.org>

    Guido> What's Scanner?

Gary already posted the example I was going to (damn phone!)... ;-)

I defer to Martin's judgement on this.  (I presume his response has passed through your mailbox by now.)
I still think it would be nice to demonstrate it somewhere. I'll look and see if there's somewhere some toy script can be squeezed into the Demo directory. Skip From skip@pobox.com Fri Apr 25 21:04:46 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 15:04:46 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: <20030425194147.GG12173@epoch.metaslash.com> References: <16041.31987.943313.278329@montanaro.dyndns.org> <20030425194147.GG12173@epoch.metaslash.com> Message-ID: <16041.38110.284173.399590@montanaro.dyndns.org> Neal> Anyway, the tests passed when run many times with the change Neal> below, perhaps it will work for you. I'm running the entire suite Neal> on Solaris now. The change works on Linux. Thanks. Alas, it didn't seem to help. Skip From guido@python.org Fri Apr 25 21:07:43 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 16:07:43 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 15:41:48 EDT." <20030425194147.GG12173@epoch.metaslash.com> References: <16041.31987.943313.278329@montanaro.dyndns.org> <20030425194147.GG12173@epoch.metaslash.com> Message-ID: <200304252007.h3PK7h426521@pcp02138704pcs.reston01.va.comcast.net> > From: Neal Norwitz <neal@metaslash.com> > On Fri, Apr 25, 2003 at 01:22:43PM -0500, Skip Montanaro wrote: > > Using the latest version from CVS, on Solaris 8 test_logging hangs. Lots of > > output, then: > > > > ... > > > > and it just sits there. ^C doesn't terminate it. I have to stop it w/ ^Z, > > then "kill %1" it. I have the very latest source checked out. Any ideas? > > On Solaris 8, I've had the test pass, hang, and crash the interpreter > (actually it was test_threaded_import when running all the tests). > The problem may be related to Mark's changes, but I'm not sure. > > Anyway, the tests passed when run many times with the change below, > perhaps it will work for you. 
I'm running the entire suite on
> Solaris now.  The change works on Linux.
>
> Neal
> --
> --- Lib/test/test_logging.py.save	2003-04-25 15:30:23.000000000 -0400
> +++ Lib/test/test_logging.py	2003-04-25 15:30:52.000000000 -0400
> @@ -470,6 +470,8 @@
>      socketDataProcessed.acquire()
>      socketDataProcessed.wait()
>      socketDataProcessed.release()
> +    for thread in threads:
> +        thread.join()
>      banner("logrecv output", "begin")
>      sys.stdout.write(sockOut.getvalue())
>      sockOut.close()

OK, I'll add that to the release branch, so I can scratch at least one of the "known bugs" we start out with...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From wesleyhenwood@hotmail.com Fri Apr 25 21:23:53 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Fri, 25 Apr 2003 20:23:53 +0000 Subject: [Python-Dev] Re: PyRun_* functions Message-ID: <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com>

>>How do I make certain that FILE* parameters are only passed to these
>>functions if it is certain that they were created by the same library
>>that the Python runtime is using?

>On which platform?

Windows.

From guido@python.org Fri Apr 25 22:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 17:02:53 -0400 Subject: [Python-Dev] Re: PyRun_* functions In-Reply-To: "Your message of Fri, 25 Apr 2003 20:23:53 -0000." <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com> References: <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com> Message-ID: <200304252102.h3PL2r426956@pcp02138704pcs.reston01.va.comcast.net>

> >>How do I make certain that FILE* parameters are only passed to these
> >>functions if it is certain that they were created by the same library
> >>that the Python runtime is using?
>
> >On which platform?
>
> Windows.

Link your application with MSVCRT.DLL.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim@zope.com Fri Apr 25 22:21:28 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 17:21:28 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <41193.127.0.0.1.1051271225.squirrel@127.0.0.1> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEIJFHAA.tim@zope.com> [andrew cooke] > For the record - the version I posted (with breadth-first as an option) > wasn't reliable (it runs out of stack space on reasonable directory > structures). Apart from that, did you have a use case for breadth-first directory traversal? Because it's clumsier, you usually find BFS only used on search trees that are too deep/expensive to traverse exhaustively (e.g., a tree of chess moves), or that have infinite paths (so that DFS can't terminate even in theory). Directory trees aren't usually <wink> of that nature. From drifty@alum.berkeley.edu Fri Apr 25 23:17:52 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 15:17:52 -0700 (PDT) Subject: [Python-Dev] Failin tests on Windows In-Reply-To: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> [Guido van Rossum] > > test_urllib.py is crashing on my fresh WinMe build: > > > > test_fileno (__main__.urlopen_FileTests) ... FAIL > > test_iter (__main__.urlopen_FileTests) ... FAIL > > test_read (__main__.urlopen_FileTests) ... FAIL > > test_readline (__main__.urlopen_FileTests) ... FAIL > > test_readlines (__main__.urlopen_FileTests) ... FAIL > > Should be fixed now -- I'm writing the file with test data in binary > mode. > Didn't even think of that problem when I wrote the tests. Should I patch the docs for urllib (again =) to say that files are open in binary? 
I know I wasn't expecting urllib to open in binary mode for a local text file.

Thanks for fixing this, Guido.  I think I am going to do a self-imposed "no checkins within 24 hours of a planned release" rule.

-Brett

From drifty@alum.berkeley.edu Fri Apr 25 23:41:17 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 15:41:17 -0700 (PDT) Subject: [Python-Dev] More new Windos test failures In-Reply-To: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> References: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> Message-ID: <Pine.SOL.4.55.0304251538420.25263@death.OCF.Berkeley.EDU>

[Tim Peters]
> test_urllib (these may all be bad line-end assumptions):
>

Yep, it looks like it is line-ending issues.  Is this still happening even after Guido changed the test to open the files in binary?  If it is I will change the tests after Guido gives the all clear for CVS checkins again and strip all text before comparing.

> ======================================================================
> FAIL: test_fileno (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 68, in test_fileno
>     "Reading on the file descriptor returned by fileno() "
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: Reading on the file descriptor returned by fileno() did not
> return the expected text
>
> ======================================================================
> FAIL: test_iter (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 88, in test_iter
>     self.assertEqual(line, self.text)
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\r\n' != 'test_urllib:
> urlopen_FileTests\n'
>
> ======================================================================
> FAIL: test_read (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 48, in test_read
>     self.assertEqual(self.text, self.returned_obj.read())
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib:
> urlopen_FileTests\r\n'
>
> ======================================================================
> FAIL: test_readline (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 51, in test_readline
>     self.assertEqual(self.text, self.returned_obj.readline())
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib:
> urlopen_FileTests\r\n'
>
> ======================================================================
> FAIL: test_readlines (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 61, in test_readlines
>     "readlines() returned improper text")
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: readlines() returned improper text
>
> ----------------------------------------------------------------------
> Ran 23 tests in 0.280s
>
> FAILED (failures=5)

From duanev@io.com Fri Apr 25 23:44:05 2003 From: duanev@io.com (Duane Voth) Date: Fri, 25 Apr 2003 17:44:05 -0500 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses!
In-Reply-To: <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Fri, Apr 25, 2003 at 12:00:27PM -0400 References: <20030425104748.A26488@io.com> <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030425174405.A31111@io.com>

On Fri, Apr 25, 2003 at 12:00:27PM -0400, Guido van Rossum wrote:
> You should be able to get that out of CVS.  The oldest version
> ...
> is labeled 2.1, but the CVS version numbers don't match author's
> versions.

After taking a closer look, not even _cursesmodule.c 2.1 is going to help me.  The copyright in the /usr/include/curses.h on this box is:

/*
 * Copyright (c) 1980 Regents of the University of California.
 * All rights reserved.  The Berkeley software License Agreement
 * specifies the terms and conditions for redistribution.
 *
 *	@(#)curses.h	5.1 (Berkeley) 6/7/85
 */

There is no support for either attributes or colors!  Taking the current source and cutting out everything unsupported will at least keep the python-curses API intact.  That's probably my best route.

--
Duane Voth
duanev@io.com
--
duanev@atlantis.io.com

From Jack.Jansen@oratrix.com Fri Apr 25 23:47:43 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sat, 26 Apr 2003 00:47:43 +0200 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: <3EA96940.4060501@v.loewis.de> Message-ID: <ED18587C-776F-11D7-B113-000A27B19B96@oratrix.com>

On Friday, Apr 25, 2003, at 18:58 Europe/Amsterdam, Martin v. Löwis wrote:
>> * Based on prior work by Lance Ellinghaus and Oliver Andrich
>> * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse,
>> * Cathedral City, California Republic, United States of America.
>> *
>> * Version 1.5b1, heavily extended for ncurses by Oliver Andrich:
>> * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany.
>> so I guess I'm looking for version 1.2 of _cursesmodule.c.
>
> I think your guess is wrong.
The extensions are used only if available,
> and the curses module works with pre-ncurses implementations of curses
> just fine.

Not in all cases.  Before MacOSX had ncurses (MacOSX 10.1 and earlier had an ancient BSD curses) the only solution was to disable building curses, as the module didn't compile, and fixing it was far from obvious.
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From fincher.8@osu.edu Sat Apr 26 00:48:26 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Fri, 25 Apr 2003 19:48:26 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304241650.h3OGoPM15432@odiug.zope.com> References: <1051202649.3ea814599f6fa@mcherm.com> <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <200304251948.26774.fincher.8@osu.edu>

On Thursday 24 April 2003 12:50 pm, Guido van Rossum wrote:
> Agreed.  How about naming it os.walk()?  I think it's not OS specific
> -- all the OS specific stuff is part of os.path.  So we only need one
> implementation.

It's a minor quibble to be sure, but os.walk doesn't really describe what exactly it's doing.  I'd suggest os.pathwalk, but that'd be too error-prone, being os.path.walk without a dot.  Perhaps os.pathwalker?

Just a (likely ill-informed :)) opinion :)

Jeremy

From tim@zope.com Sat Apr 26 00:45:33 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 19:45:33 -0400 Subject: [Python-Dev] More new Windos test failures In-Reply-To: <Pine.SOL.4.55.0304251538420.25263@death.OCF.Berkeley.EDU> Message-ID: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com>

>> test_urllib (these may all be bad line-end assumptions):

[Brett]
> Yep, it looks like it is line-ending issues.  Is this still happening
> even after Guido changed the test to open the files in binary?

No, all is well now.  That's why you didn't see a sequence of increasingly vicious msgs from me <wink>.
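[Editor's note: the os.path.walk()/os.walk() thread above contrasts depth-first and breadth-first directory traversal. A minimal sketch of both orders, assuming only the standard library — the os.walk() generator discussed in the thread did land in 2.3; the helper names and the use of os.scandir (which would have been os.listdir plus os.path.isdir in 2003) are this editor's own:]

```python
import os
from collections import deque

def depth_first_dirs(top):
    """Depth-first, bottom-up: os.walk handles this natively via its
    topdown flag, yielding children before their parent directory."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        yield dirpath

def breadth_first_dirs(top):
    """Breadth-first: os.walk has no BFS mode, so drive an explicit
    FIFO queue of directories by hand, as Tim notes one must."""
    queue = deque([top])
    while queue:
        dirpath = queue.popleft()
        yield dirpath
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    queue.append(entry.path)
```

[The BFS variant's queue never holds more than one level's worth of sibling directories, which is why BFS is usually reserved for trees too deep to exhaust — exactly Tim's point above.]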
From guido@python.org Sat Apr 26 00:51:30 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 19:51:30 -0400 Subject: [Python-Dev] Failin tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 15:17:52 PDT." <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> Message-ID: <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> > [Guido van Rossum] > > > > test_urllib.py is crashing on my fresh WinMe build: > > > > > > test_fileno (__main__.urlopen_FileTests) ... FAIL > > > test_iter (__main__.urlopen_FileTests) ... FAIL > > > test_read (__main__.urlopen_FileTests) ... FAIL > > > test_readline (__main__.urlopen_FileTests) ... FAIL > > > test_readlines (__main__.urlopen_FileTests) ... FAIL > > > > Should be fixed now -- I'm writing the file with test data in binary > > mode. > > > > Didn't even think of that problem when I wrote the tests. Should I > patch the docs for urllib (again =) to say that files are open in > binary? I know I wasn't expecting urllib to open in binary mode for > a local text file. It's a good idea to document that urllib (currently!) never does newline translation. Given that URLs often point to binary files, that's probably a good idea! > Thanks for fixing this, Guido. I think I am going to do a self-imposed > "no checkins within 24 hours of a planned release" rule. Yeah, me too. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 26 00:54:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 16:54:34 -0700 (PDT) Subject: [Python-Dev] Failin tests on Windows In-Reply-To: <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > [Guido van Rossum] > > Didn't even think of that problem when I wrote the tests. Should I > > patch the docs for urllib (again =) to say that files are open in > > binary? I know I wasn't expecting urllib to open in binary mode for > > a local text file. > > It's a good idea to document that urllib (currently!) never does > newline translation. Given that URLs often point to binary files, > that's probably a good idea! > OK. I will patch the docs and the docstrings (and backport it as necessary) after you raise the commit moratorium. > > Thanks for fixing this, Guido. I think I am going to do a self-imposed > > "no checkins within 24 hours of a planned release" rule. > > Yeah, me too. :-) > Perhaps this should be in the FAQ? -Brett From drifty@alum.berkeley.edu Sat Apr 26 00:57:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 16:57:46 -0700 (PDT) Subject: [Python-Dev] More new Windos test failures In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> Message-ID: <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> [Tim Peters] > >> test_urllib (these may all be bad line-end assumptions): > > [Brett] > > Yep, it looks like it is line-ending issues. 
Is this still happening > > even after Guido changed the test to open the files in binary? > > No, all is well now. That's why you didn't see a sequence of increasingly > vicious msgs from me <wink>. > =) I have fixed my copy, though, to rstrip all the text that is compared in case Guido's quick fix is removed later. I will commit it when Guido gives the all-clear. -Brett From guido@python.org Sat Apr 26 01:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:02:53 -0400 Subject: [Python-Dev] Failin tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 16:54:34 PDT." <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> > > It's a good idea to document that urllib (currently!) never does > > newline translation. Given that URLs often point to binary files, > > that's probably a good idea! > > OK. I will patch the docs and the docstrings (and backport it as > necessary) after you raise the commit moratorium. Consider it raised. Python 2.3b1 is officially released! > > > Thanks for fixing this, Guido. I think I am going to do a > > > self-imposed "no checkins within 24 hours of a planned release" > > > rule. > > > > Yeah, me too. :-) > > Perhaps this should be in the FAQ? But then releases would be so *boring*! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 26 01:07:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 17:07:34 -0700 (PDT) Subject: [Python-Dev] Rules of a beta release? 
In-Reply-To: <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > It's a good idea to document that urllib (currently!) never does > > > newline translation. Given that URLs often point to binary files, > > > that's probably a good idea! > > > > OK. I will patch the docs and the docstrings (and backport it as > > necessary) after you raise the commit moratorium. > > Consider it raised. Python 2.3b1 is officially released! > Wonderful! > > > > Thanks for fixing this, Guido. I think I am going to do a > > > > self-imposed "no checkins within 24 hours of a planned release" > > > > rule. > > > > > > Yeah, me too. :-) > > > > Perhaps this should be in the FAQ? > > But then releases would be so *boring*! :-) > I think Raymond should add something about this in his next bit of "Hard Knocks"-type writing. =) Now that we are officially in a beta release, I want to clarify what the ground rules are in terms of commits are. Obviously no new functionality such as new modules or built-ins. But what about small features? Specifically, since I have CVS commit I can finally apply my patch to regrtest.py to allow the use of a skips.txt file listing tests to skip (unless people don't want it anymore). Now that is a new feature, but it is minor *and* it is on an undocumented module (for now; I will get those docs done before 2.3 final is reached). Is this reasonable to commit now? Anything else I should know so I don't run a muck in CVS? 
=) -Brett From guido@python.org Sat Apr 26 01:12:41 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:12:41 -0400 Subject: [Python-Dev] RELEASED: Python 2.3b1 Message-ID: <200304260012.h3Q0CgE03802@pcp02138704pcs.reston01.va.comcast.net> Python 2.3b1 is the first beta release of Python 2.3. Much improved since the last alpha, chockfull of things you'd like to check out: http://www.python.org/2.3/ Some highlights of what's new since 2.3a2: - sum() builtin, adds a sequence of numbers, beats reduce(). - csv module, reads comma-separated-value files (and more). - timeit module, times code snippets. - os.walk(), a generator slated to replace os.path.walk(). - platform module, by Marc-Andre Lemburg, returns detailed platform information. For more highlights, see http://www.python.org/2.3/highlights.html New since Python 2.2: - Many new and improved library modules, e.g. sets, heapq, datetime, textwrap, optparse, logging, bsddb, bz2, tarfile, ossaudiodev, and a new random number generator based on the highly acclaimed Mersenne Twister algorithm (with a period of 2**19937-1!). - New builtin enumerate(): an iterator yielding (index, item) pairs. - Extended slices, e.g. "hello"[::-1] returns "olleh". - Universal newlines mode for reading files (converts \r, \n and \r\n all into \n). - Source code encoding declarations. (PEP 263) - Import from zip files. (PEP 273 and PEP 302) - FutureWarning issued for "unsigned" operations on ints. (PEP 237) - Faster list.sort() is now stable. - Unicode filenames on Windows. - Karatsuba long multiplication (running time O(N**1.58) instead of O(N**2)). See also http://www.python.org/doc/2.3b1/whatsnew/ - Andrew Kuchling's description of all important changes since 2.2. We request widespread testing of this release but don't recommend using it for production situations yet. Beta releases contain bugs. New APIs are expected to be stable, and may be changed only if serious deficiencies are found. 
No new APIs or modules will be added after the first beta release. If you have an important Python application, we strongly recommend that you try it out with a beta release and report any incompatibilities or other problems you may encounter, so that they can be fixed before the final release. To report problems, use the SourceForge bug tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470 Enjoy! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sat Apr 26 01:16:50 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 26 Apr 2003 10:16:50 +1000 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <y91ycusm.fsf@python.net> Message-ID: <015c01c30b89$22b46a60$530f8490@eden> > So is the policy now that it is no longer *allowed* to create another > thread state, while in previous versions there wasn't any choice, > because there existed no way to get the existing one? Only not allowed under debug builds <wink>. I would be more than happy to have this code print a warning, or take some alternative action - but I would hate to see the message dropped. Would a PyErr_Warning call be more appropriate? The only issue here is that literally *thousands* may be generated. Mark. From guido@python.org Sat Apr 26 01:19:14 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:19:14 -0400 Subject: [Python-Dev] Rules of a beta release? In-Reply-To: "Your message of Fri, 25 Apr 2003 17:07:34 PDT." 
<Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260019.h3Q0JEV03824@pcp02138704pcs.reston01.va.comcast.net> > Now that we are officially in a beta release, I want to clarify what > the ground rules are in terms of commits. Obviously no new > functionality such as new modules or built-ins. But what about > small features? Specifically, since I have CVS commit I can finally > apply my patch to regrtest.py to allow the use of a skips.txt file > listing tests to skip (unless people don't want it anymore). Now > that is a new feature, but it is minor *and* it is on an > undocumented module (for now; I will get those docs done before 2.3 > final is reached). IMO in general fiddling with the test suite during beta is okay. There should be guidelines for this (for all I know there's already a PEP :-) but I'm too tired to write any more about it. Use common sense. It should be the case that if someone tested their application with 2.3b1 and they tweaked everything to work with that version, they shouldn't have to tweak anything to work with 2.3final. I plan to make an exception for IDLE: a brand new copy of IDLEfork will replace the current IDLE 0.8. I was very tempted to include it today, but there wasn't time to get all the loose ends tied up: it has a C extension now, and the Windows installer would have to be changed; plus, Kurt has some improvements that he hasn't even checked in. So he'll do an independent IDLEfork beta, and then it'll be incorporated into Python.
Hopefully that will all be done two weeks from now. > Is this reasonable to commit now? Anything else I should know so I > don't run amok in CVS? =) Don't be too fearful -- if you really commit an atrocity, the nice thing about CVS is that it's easy to roll back. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 26 01:21:37 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:21:37 -0400 Subject: [Python-Dev] More new Windows test failures In-Reply-To: "Your message of Fri, 25 Apr 2003 16:57:46 PDT." <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> > =) I have fixed my copy, though, to rstrip all the text that is > compared in case Guido's quick fix is removed later. I will commit > it when Guido gives the all-clear. I just realized that this would be *wrong* -- URLs may point to binary files and there's no reliable way to know whether this is the case. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sat Apr 26 01:27:48 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 26 Apr 2003 10:27:48 +1000 Subject: [Python-Dev] PyRun_* functions In-Reply-To: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Message-ID: <017501c30b8a$aaa22010$530f8490@eden> > It seems that it would be a good enhancement to remove the > FILE pointer > parameter from these functions, and just use the file name. > For example, > change PyRun_SimpleFile( FILE *fp, char *filename) to > PyRun_SimpleFile(char > *filename). Then no one would have to worry about the incompatibility. Or simply a PyFile_Open/Close pair - exactly mirroring fopen(), but inside the Python DLL, so guaranteed to use the same library.
I believe the only reason this hasn't come up before as a patch is that PyRun_() functions that take file objects are great "getting started" functions, but tend not to be used in real apps - in that case the requirements start getting trickier, so you tend to drop down to the lower-level Python APIs. If it really did worry you, I would expect a patch at sourceforge with these 2 new functions would have a good chance of getting in. Mark. From drifty@alum.berkeley.edu Sat Apr 26 01:32:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 17:32:34 -0700 (PDT) Subject: [Python-Dev] More new Windows test failures In-Reply-To: <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251731590.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > =) I have fixed my copy, though, to rstrip all the text that is > > compared in case Guido's quick fix is removed later. I will commit > > it when Guido gives the all-clear. > > I just realized that this would be *wrong* -- URLs may point to binary > files and there's no reliable way to know whether this is the case. > OK, so then I won't commit my changes and let them stand as they are in CVS right now. -Brett From python@rcn.com Sat Apr 26 01:33:00 2003 From: python@rcn.com (Raymond Hettinger) Date: Fri, 25 Apr 2003 20:33:00 -0400 Subject: [Python-Dev] Curiousity Message-ID: <003d01c30b8b$64697b60$125ffea9@oemcomputer> Do we have download statistics for the various releases including alpha and betas?
Raymond Hettinger From mark@ned.dem.csiro.au Sat Apr 26 03:17:58 2003 From: mark@ned.dem.csiro.au (Mark Favas) Date: Sat, 26 Apr 2003 10:17:58 +0800 (WST) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> Just confirming Skip's observation - 2.3b1 test_logging (with Neal's patch) passed once on Solaris 9 (gcc 3.2.2) but failed thereafter. No other test failures. Mark Favas From skip@pobox.com Sat Apr 26 03:56:06 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 21:56:06 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) In-Reply-To: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> References: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> Message-ID: <16041.62790.619502.562615@montanaro.dyndns.org> Mark> Just confirming Skip's observation - 2.3b1 test_logging (with Mark> Neal's patch) passed once on Solaris 9 (gcc 3.2.2) but failed Mark> thereafter. No other test failures. Failed (completed with one or more failures or errors) or hung? Skip From mark@ned.dem.csiro.au Sat Apr 26 11:11:16 2003 From: mark@ned.dem.csiro.au (Mark Favas) Date: Sat, 26 Apr 2003 18:11:16 +0800 (WST) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <200304261011.h3QABGka004908@solo.ned.dem.csiro.au> [Skip] Failed (completed with one or more failures or errors) or hung? Sorry - hung, couldn't ^C it, had to ^Z and "kill %1" the "make test" process. 
Mark Favas From martin@v.loewis.de Sat Apr 26 14:10:59 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 26 Apr 2003 15:10:59 +0200 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EAA8563.6060404@v.loewis.de> Guido van Rossum wrote: > Please don't spread these around; I've made this response > non-archivable by including an "X-Archive: No" header. (It's okay IMO > for folks receiving python-dev to see this, but not to spread it > around.) What is creating accesses to URLs like /doc/2.3a2//////////////////////////////about.html ??? Regards, Martin From pje@telecommunity.com Sat Apr 26 14:37:48 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Sat, 26 Apr 2003 09:37:48 -0400 Subject: [Python-Dev] Accepted PEPs? Message-ID: <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> I was going over the PEP index this morning, and I noticed a large number of PEPs listed under the "open" list that would seem to me to be "accepted", if not "done" in some cases, according to the criteria described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, 301, 302, 305, and 307.) Others under "open" I would guess are in fact "rejected", notably 294 (the patch was closed rejected) and 313 (presumably tongue-in-cheek). Should I submit a patch for PEP 0? From guido@python.org Sat Apr 26 17:11:23 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 12:11:23 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: "Your message of Sat, 26 Apr 2003 09:37:48 EDT." 
<5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> References: <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> Message-ID: <200304261611.h3QGBNF05043@pcp02138704pcs.reston01.va.comcast.net> > I was going over the PEP index this morning, and I noticed a large number > of PEPs listed under the "open" list that would seem to me to be > "accepted", if not "done" in some cases, according to the criteria > described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, > 301, 302, 305, and 307.) Some of those (e.g. 237) have multiple stages and ought to remain open until the last stage is implemented. 283 ought to remain open until Python 2.3 final is released. Some others need to be brought in line with what ended up being implemented. Authors with commit privileges can update their own PEPs; others can send patches or new versions to the PEP editors. > Others under "open" I would guess are in fact "rejected", notably > 294 (the patch was closed rejected) Correct -- this *issue* is still open, but the solution from the PEP is rejected. > and 313 (presumably tongue-in-cheek). I think it's appropriate for April Fool's PEPs to be in limbo forever. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 26 17:48:41 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 12:48:41 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sat, 26 Apr 2003 15:10:59 +0200." <3EAA8563.6060404@v.loewis.de> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> <3EAA8563.6060404@v.loewis.de> Message-ID: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> > What is creating accesses to URLs like > > /doc/2.3a2//////////////////////////////about.html > > ??? I see these too (not that exact one though) and always in the form /dev/doc/devel//////lib/<something>. 
Grepping through today's access log (/usr/local/log/httpd.access on creosote) suggests that these come from Ultraseek. This suggests that the spider on search.python.org perhaps generates these. It appears to generate such URLs with any number of slashes between 1 and 6. But I can't find any clues like relative URLs using an extra / anywhere in those files. It might be a bug in Ultraseek's url joining algorithm. A while ago some people were interested in upgrading our Ultraseek setup, but that initiative seems to have fallen by the wayside. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Sat Apr 26 21:51:58 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Sat, 26 Apr 2003 16:51:58 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: <200304261611.h3QGBNF05043@pcp02138704pcs.reston01.va.comca st.net> References: <"Your message of Sat, 26 Apr 2003 09:37:48 EDT." <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20030426164409.034ae2b0@mail.telecommunity.com> At 12:11 PM 4/26/03 -0400, Guido van Rossum wrote: > > I was going over the PEP index this morning, and I noticed a large number > > of PEPs listed under the "open" list that would seem to me to be > > "accepted", if not "done" in some cases, according to the criteria > > described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, > > 301, 302, 305, and 307.) > >Some of those (e.g. 237) have multiple stages and ought to remain open >until the last stage is implemented. 283 ought to remain open until >Python 2.3 final is released. Some others need to be brought in line >with what ended up being implemented. Authors with commit privileges >can update their own PEPs; others can send patches or new versions to >the PEP editors. The PEP list has an additional heading called "Accepted"; currently only 252 and 253 are in that category. 
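One way a crawler could manufacture such URLs is a join step that unconditionally inserts a separator and is then re-applied to its own output. This is purely a speculative sketch (the host and path are made up, and it is not based on an inspection of Ultraseek's actual code):

```python
def naive_join(base, rel):
    # Hypothetical buggy join: always inserts a slash, even when
    # `base` already ends with one.
    return base + "/" + rel

url = "http://host/doc/2.3a2/about.html"
for _ in range(3):
    # Take everything up to and including the last "/" as the
    # "directory", without normalizing, then re-join the same link.
    directory = url[:url.rfind("/") + 1]
    url = naive_join(directory, "about.html")

print(url)  # http://host/doc/2.3a2////about.html
```

Each pass adds exactly one extra slash, which would match the observed range of one to six slashes if pages get re-spidered a handful of times.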
I would've thought that the ones like 237 and 283 that are not "Done" but are definitely "accepted" would go under that "Accepted" heading. It's not a big deal, but it's very hard to see from the list which things are "in progress", "need revisions", or are "unlikely to make it". So since I already took the trouble to work out the answers for myself, I thought I'd offer to help the next person who came along. :) From goodger@python.org Sat Apr 26 23:20:53 2003 From: goodger@python.org (David Goodger) Date: Sat, 26 Apr 2003 18:20:53 -0400 Subject: [Python-Dev] Accepted PEPs? Message-ID: <3EAB0645.7040306@python.org> [Phillip J. Eby] >>> I was going over the PEP index this morning, and I noticed a large >>> number of PEPs listed under the "open" list that would seem to me >>> to be "accepted", if not "done" in some cases, according to the >>> criteria described by the headings. (Specifically, PEPs 218, 237, >>> 273, 282, 283, 301, 302, 305, and 307.) Wearing my PEP Editor hat, I recently performed a similar exercise. I even got Guido's OK on suggested changes to Final and Approved on those specific PEPs (all but 305, which I'd missed). On further reflection however, I'm not sure that we should go forward without at least giving the authors notice, and a chance to make changes (especially, changes that bring PEPs in line with current reality). PEP 1 states: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and accepted by the BDFL, the status will be changed to "Final". It's unclear whether the BDFL should even be able to review a PEP without the author's review request (I'm pretty sure everyone would agree that it's OK, but it's not clear from the wording). 
So as not to upset PEP authors unnecessarily ;-), I think we ought to follow the formal process. It's not too onerous; a simple note (stating "PEP X is ready for review") to <peps@python.org> would be sufficient: I'll send out reminders. > It's not a big deal, but it's very hard to see from the list which > things are "in progress", "need revisions", or are "unlikely to make > it". So since I already took the trouble to work out the answers for > myself, I thought I'd offer to help the next person who came along. > :) Everyone needs a good kick in the pants once in a while, thanks. >>> Others under "open" I would guess are in fact "rejected", notably >>> 294 (the patch was closed rejected) [Guido] >> Correct -- this *issue* is still open, but the solution from the >> PEP is rejected. So is PEP 294 itself rejected? Or should we await a formal review request (as per the above)? [Phillip] >>> Should I submit a patch for PEP 0? Don't bother; I'll update it as required. -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From goodger@python.org Sat Apr 26 23:25:18 2003 From: goodger@python.org (David Goodger) Date: Sat, 26 Apr 2003 18:25:18 -0400 Subject: [Python-Dev] Reminder to PEP authors Message-ID: <3EAB074E.9040102@python.org> There are several PEPs with "Draft" status which are ripe for review. PEP 1 states: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and accepted by the BDFL, the status will be changed to "Final". PEP authors, please keep your PEPs up to date. 
When you think the PEP is ready for review, please send a note to <peps@python.org> stating "PEP X is ready for review". Otherwise the PEP may remain in "Draft" limbo indefinitely. It is the PEP author's responsibility to move the process forward: Each PEP must have a champion -- someone who writes the PEP using the style and format described below, shepherds the discussions in the appropriate forums, and attempts to build community consensus around the idea. Authors with CVS check-in privileges are welcome to check in their own content changes. Others should send updates to <peps@python.org> (please make updates to the latest text from CVS). -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From barry@python.org Sun Apr 27 02:28:12 2003 From: barry@python.org (Barry Warsaw) Date: 26 Apr 2003 21:28:12 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> <3EAA8563.6060404@v.loewis.de> <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1051406872.20524.2.camel@geddy> On Sat, 2003-04-26 at 12:48, Guido van Rossum wrote: > A while ago some people were interested in upgrading our Ultraseek > setup, but that initiative seems to have fallen by the wayside. :-( Last time I talked to Thomas about this, I think he mentioned that the machine he had earmarked for the upgrade got appropriated by others. IIRC, he was expecting more machines to become available soon though.
-Barry From gward@python.net Sun Apr 27 02:35:42 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:35:42 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427013542.GA919@cthulhu.gerg.ca> On 23 April 2003, Guido van Rossum said: > > A better comparison would be Habitat for Humanity (and voluntary > > associations in general). [...] > > Maybe. I get lots of junk mail asking for contributions from HforH > and frankly I've always thought of them as yet another charity: there > are lots of these, and most of them are so much larger than our > community that comparison is difficult. Don't forget, the PSF is gunning for charity status too. That's just the most obvious way to state legally, "We are a community with shared values, etc. etc.". I think there are a lot of parallels between open source development and other volunteer organizations. Heck, I like to justify the occasional weekend spent hunkered down in front of the computer by saying I'm doing volunteer work. IMHO, hacking on Python is the moral equivalent of helping to maintain public-access hiking trails. (Although the latter is better exercise, it's nice not to have to jump in the shower after a day spent hacking on Python. ;-) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Reality is for people who can't handle science fiction. 
From gward@python.net Sun Apr 27 02:44:07 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:44:07 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304251948.26774.fincher.8@osu.edu> References: <200304241650.h3OGoPM15432@odiug.zope.com> <200304251948.26774.fincher.8@osu.edu> Message-ID: <20030427014407.GC919@cthulhu.gerg.ca> On 25 April 2003, Jeremy Fincher said: > It's a minor quibble to be sure, but os.walk doesn't really describe what > exactly it's doing. I'd suggest os.pathwalk, but that'd be too error-prone, > being os.path.walk without a dot. Perhaps os.pathwalker? os.walktree? os.walkdirs? os.walkpath? (On reflection, the latter two are pretty dumb. walktree is the right name, undoubtedly. ;-) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ God is omnipotent, omniscient, and omnibenevolent ---it says so right here on the label. From gward@python.net Sun Apr 27 02:55:55 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:55:55 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <1051240796.11580.4.camel@geddy> References: <20030424225914.GA26254@xs4all.nl> <1051240796.11580.4.camel@geddy> Message-ID: <20030427015555.GD919@cthulhu.gerg.ca> On 24 April 2003, Barry Warsaw said: > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Blecchh! Evil! Wrong! Bad! Naughty, naughty! Software versions have nothing to do with file revisions. Some obscure little file might change very little (or not at all) in going from MyGreatBigProduct 1.4 to 2.0. Its revision number should not be artificially bumped just because a lot of other files in the same project got bumped too. > Really, I just hate to see > huge minor revision numbers on files. Then you'll just love Subversion: when Neil S.
converted the MEMS Exchange CVS repository to Subversion back in January, all of a sudden every file we knew and loved had a revision number around 21000. Yow! I'm not convinced Subversion's model is exactly right, but it's certainly no worse than CVS'. Probably better. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'd rather have a bottle in front of me than have to have a frontal lobotomy. From gward@python.net Sun Apr 27 03:05:41 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 22:05:41 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> References: <1051287405.1009.66.camel@slothrop.zope.com> <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427020541.GE919@cthulhu.gerg.ca> On 25 April 2003, Guido van Rossum said: > It probably never stopped hanging. It only runs when you pass > "-u audio" to regrtest though. > > I note that it passes for me with Red Hat 7.3, so you might want to > upgrade. :-) Could be hardware, or it could be the device driver in the kernel. Jeremy, what audio software do you use regularly -- xmms? play? anything? ossaudiodev currently goes to great pains to open the audio device in what *seems* to be the right way, but I have no idea if it really is. (Oh yeah: it opens with O_NONBLOCK, to avoid hanging on the open() call. Then it uses fcntl() to put the device back in blocking mode, so that write() acts sanely. If you really want to do non-blocking audio I/O, you use the nonblock() method, which uses an OSS-specific ioctl(). O_NONBLOCK has no documented meaning with OSS; using it at open() time was just a lucky guess on my part. It does seem to affect write(), at least with one of my audio devices. 
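That open-then-restore dance looks roughly like this in Python; the sketch below substitutes an ordinary temp file for /dev/dsp (which may not exist on a given box), since the fcntl flag handling is the same for any descriptor:

```python
import os
import fcntl
import tempfile

# Stand-in for /dev/dsp: an ordinary temp file.  On a real OSS device,
# O_NONBLOCK keeps the open() call itself from hanging if the device
# is already held open by another process.
fd, path = tempfile.mkstemp()
os.close(fd)
fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)

# Now flip the descriptor back to blocking mode so write() waits for
# the device instead of failing with EAGAIN.
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

print(bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK))  # False
os.close(fd)
os.remove(path)
```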
[I have a sound card and an external USB audio device, which makes things interesting at times.]) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mhammond@skippinet.com.au Sun Apr 27 04:00:29 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 27 Apr 2003 13:00:29 +1000 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <20030427020541.GE919@cthulhu.gerg.ca> Message-ID: <02df01c30c69$29cf59a0$530f8490@eden> [Greg] > On 25 April 2003, Guido van Rossum said: > > It probably never stopped hanging. It only runs when you pass > > "-u audio" to regrtest though. > > > > I note that it passes for me with Red Hat 7.3, so you might want to > > upgrade. :-) > > Could be hardware, or it could be the device driver in the kernel. > Jeremy, what audio software do you use regularly -- xmms? play? > anything? ossaudiodev currently goes to great pains to open the audio > device in what *seems* to be the right way, but I have no idea if it > really is. It fails for me too, RH8: Linux bobcat 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux Install is pure vanilla on an asus laptop. As far as I can tell, there is no sound driver installed (but I'm not sure :) Gnome desktop is not starting the "sound server". I have never heard a sound through these speakers under Linux (so the fact Python can't play a sound isn't a problem, but the fact write() hangs is) Note that as mentioned this only fails/hangs when the audio resource is enabled, so in general I don't have a problem but thought the data point may be interesting. Mark. From guido@python.org Sun Apr 27 04:28:59 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 23:28:59 -0400 Subject: [Python-Dev] Democracy In-Reply-To: "Your message of Sat, 26 Apr 2003 21:35:42 EDT." 
<20030427013542.GA919@cthulhu.gerg.ca> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> <20030427013542.GA919@cthulhu.gerg.ca> Message-ID: <200304270328.h3R3Sxd05643@pcp02138704pcs.reston01.va.comcast.net> > > > A better comparison would be Habitat for Humanity (and voluntary > > > associations in general). [...] > > > On 23 April 2003, Guido van Rossum said: > > Maybe. I get lots of junk mail asking for contributions from > > HforH and frankly I've always thought of them as yet another > > charity: there are lots of these, and most of them are so much > > larger than our community that comparison is difficult. [Greg Ward] > Don't forget, the PSF is gunning for charity status too. That's > just the most obvious way to state legally, "We are a community with > shared values, etc. etc.". I think there are a lot of parallels > between open source development and other volunteer organizations. > Heck, I like to justify the occasional weekend spent hunkered down > in front of the computer by saying I'm doing volunteer work. IMHO, > hacking on Python is the moral equivalent of helping to maintain > public-access hiking trails. (Although the latter is better > exercise, it's nice not to have to jump in the shower after a day > spent hacking on Python. ;-) Sure (although I hope you jump in the shower anyway :). But I don't want the PSF to grow to the point where we have to send junk mail to people who haven't heard about us. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Sun Apr 27 04:28:37 2003 From: barry@python.org (Barry Warsaw) Date: 26 Apr 2003 23:28:37 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <02df01c30c69$29cf59a0$530f8490@eden> References: <02df01c30c69$29cf59a0$530f8490@eden> Message-ID: <1051414116.20524.98.camel@geddy> On Sat, 2003-04-26 at 23:00, Mark Hammond wrote: > It fails for me too, RH8: I just upgraded my RH7.3 laptop to RH9. test_ossaudiodev passes for me, even though I turn off the speaker on this laptop due to unbearable feedback (can you say "poor Dell design"?). In fact Python 2.3 cvs looks pretty good on RH9. Python 2.2 maint is another story, but I'm still investigating some things and will send a separate email about that later. -Barry From guido@python.org Sun Apr 27 04:37:21 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 23:37:21 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: "Your message of Sat, 26 Apr 2003 18:20:53 EDT." <3EAB0645.7040306@python.org> References: <3EAB0645.7040306@python.org> Message-ID: <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> > So is PEP 294 itself rejected? Or should we await a formal review > request (as per the above)? I suggest to reject it without further ado. It seems there are two kinds of PEPs: those aimed primarily at public review, and those aimed primarily at the BDFL. 294 seems to be of the latter kind; it's 10 months old now and has never been posted (at least according to its Post-History). I wonder if the language in PEP 1 about this needs firming up? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From goodger@python.org Sun Apr 27 05:59:19 2003 From: goodger@python.org (David Goodger) Date: Sun, 27 Apr 2003 00:59:19 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <20030427013542.GA919@cthulhu.gerg.ca> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> <20030427013542.GA919@cthulhu.gerg.ca> Message-ID: <3EAB63A7.6050801@python.org> Greg Ward wrote: > IMHO, hacking on Python is > the moral equivalent of helping to maintain public-access hiking > trails. (Although the latter is better exercise, it's nice not to have > to jump in the shower after a day spent hacking on Python. ;-) You must not be practising Extreme Programming properly. (Apologies for the obvious... but what an opening!) -- David Goodger From barry@python.org Sun Apr 27 06:02:27 2003 From: barry@python.org (Barry Warsaw) Date: 27 Apr 2003 01:02:27 -0400 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 Message-ID: <1051419746.20524.193.camel@geddy> --=-3sCy95DzaAuHt1Ovtt10 Content-Type: text/plain Content-Transfer-Encoding: 7bit I've been upgrading a few machines to Redhat 9 from 7.3 and I've run into a few minor problems with Python on the 2.2 maint branch. Both dbmmodule and _socket fail to build properly. Neither problems exist in 2.3 cvs. The socket problem is fairly shallow I think: including ssl.h eventually includes krb5.h. Python 2.3's setup.py has a couple of lines of code to deal with this, and that just needs to go into 2.2 maint's setup.py, so I checked this in. The dbm problem is just a bit deeper. dbm ends up linking against gdbm, so the library has to be specified in setup.py. I'm nervous about adding the stuff to setup.py because I don't want to break other platforms. Looking at 2.3's setup.py shows this section to be more complicated and I'm too tired to tease everything out tonight. 
I'll attach a diff to this message in case anybody else feels like mucking with it in the meantime. -Barry --=-3sCy95DzaAuHt1Ovtt10 Content-Description: Content-Disposition: inline; filename=setup.py-patch.txt Content-Type: text/x-patch; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Index: setup.py =================================================================== RCS file: /cvsroot/python/python/dist/src/setup.py,v retrieving revision 1.73.4.16 diff -u -r1.73.4.16 setup.py --- setup.py 27 Apr 2003 04:00:01 -0000 1.73.4.16 +++ setup.py 27 Apr 2003 05:00:15 -0000 @@ -406,7 +406,8 @@ exts.append( Extension('dbm', ['dbmmodule.c'], libraries = ['db1'] ) ) else: - exts.append( Extension('dbm', ['dbmmodule.c']) ) + exts.append( Extension('dbm', ['dbmmodule.c'], + libraries = ['gdbm']) ) # Anthony Baxter's gdbm module. GNU dbm(3) will require -lgdbm: if (self.compiler.find_library_file(lib_dirs, 'gdbm')): --=-3sCy95DzaAuHt1Ovtt10-- From Anthony Baxter <anthony@interlink.com.au> Sun Apr 27 07:05:51 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Sun, 27 Apr 2003 16:05:51 +1000 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304270605.h3R65qK21094@localhost.localdomain> >>> Guido van Rossum wrote > A while ago some people were interested in upgrading our Ultraseek > setup, but that initiative seems to have fallen by the wayside. :-( Not exactly. 
I'm still waiting for a) the new linux-based creosote b) the ultraseek license key Anthony From niemeyer@conectiva.com Sun Apr 27 07:44:46 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sun, 27 Apr 2003 03:44:46 -0300 Subject: [Python-Dev] test_s?re merge In-Reply-To: <16041.24315.500827.370963@montanaro.dyndns.org> References: <16041.24315.500827.370963@montanaro.dyndns.org> Message-ID: <20030427064444.GA30981@localhost> > For those of you who don't read python-checkins, the merge of test_re.py and > test_sre.py has been completed and test_sre.py is no longer in the [...] Great work! Thanks. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From martin@v.loewis.de Sun Apr 27 09:22:53 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 27 Apr 2003 10:22:53 +0200 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 In-Reply-To: <1051419746.20524.193.camel@geddy> References: <1051419746.20524.193.camel@geddy> Message-ID: <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > - exts.append( Extension('dbm', ['dbmmodule.c']) ) > + exts.append( Extension('dbm', ['dbmmodule.c'], > + libraries = ['gdbm']) ) I think this was an alternative for platforms where a dbm library is part of the C library. Your patch would kill those platforms - but it may be that we are talking about the empty set here. Regards, Martin From oren-py-d@hishome.net Sun Apr 27 10:58:22 2003 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sun, 27 Apr 2003 05:58:22 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> References: <3EAB0645.7040306@python.org> <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427095822.GA66695@hishome.net> On Sat, Apr 26, 2003 at 11:37:21PM -0400, Guido van Rossum wrote: > > So is PEP 294 itself rejected? 
Or should we await a formal review > > request (as per the above)? > > I suggest to reject it without further ado. Go ahead. I still consider this an open issue (though of pretty low priority). If anyone else here feels that it's redundant to refer to built in types by two different names and has a better idea of where to put names that match the __name__ attribute of types, please go ahead and write a proposal. > It seems there are two > kinds of PEPs: those aimed primarily at public review, and those aimed > primarily at the BDFL. 294 seems to be of the latter kind; it's 10 > months old now and has never been posted (at least according to its > Post-History). I wonder if the language in PEP 1 about this needs > firming up? Mea culpa. I never realized that I forgot to actually post it. Oren From skip@mojam.com Sun Apr 27 13:01:17 2003 From: skip@mojam.com (Skip Montanaro) Date: Sun, 27 Apr 2003 07:01:17 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200304271201.h3RC1He01081@manatee.mojam.com> Bug/Patch Summary ----------------- 407 open / 3569 total bugs (+12) 129 open / 2106 total patches (+1) New Bugs -------- socketmodule doesn't compile on strict POSIX systems (2003-04-20) http://python.org/sf/724588 email/quopriMIME.py exception on int (lstrip) (2003-04-20) http://python.org/sf/724621 Minor /Tools/Scripts/crlf.py bugs (2003-04-20) http://python.org/sf/724767 Possible OSX module location bug (2003-04-21) http://python.org/sf/725026 SRE bug with capturing groups in alternatives in repeats (2003-04-21) http://python.org/sf/725106 SRE bugs with capturing groups in negative assertions (2003-04-21) http://python.org/sf/725149 urlopen object's read() doesn't read to EOF (2003-04-21) http://python.org/sf/725265 Broken links (2003-04-23) http://python.org/sf/726150 textwrap.wrap infinite loop (2003-04-23) http://python.org/sf/726446 platform module needs docs (LaTeX) (2003-04-24) http://python.org/sf/726911 valgrind python fails (2003-04-24) 
http://python.org/sf/727051 use bsddb185 if necessary in dbhash (2003-04-24) http://python.org/sf/727137 Core Dumps : Python2.2.2 (2003-04-24) http://python.org/sf/727241 test_bsddb3 fails (2003-04-25) http://python.org/sf/727571 Documentation formatting bugs (2003-04-25) http://python.org/sf/727692 email parsedate still wrong (PATCH) (2003-04-25) http://python.org/sf/727719 getpath.c-generated prefix wrong for Tru64 scripts (2003-04-25) http://python.org/sf/727732 Test failures on Linux, Python 2.3b1 tarball (2003-04-26) http://python.org/sf/728051 tmpnam problems on windows 2.3b, breaks test.test_os (2003-04-26) http://python.org/sf/728097 Tools/msgfmt.py results in two warnings under Python 2.3b1 (2003-04-26) http://python.org/sf/728277 setup.py breaks during build of Python-2.3b1 (2003-04-27) http://python.org/sf/728322 IRIX, 2.3b1, socketmodule.c compilation errors (2003-04-27) http://python.org/sf/728330 New Patches ----------- Improved output for unittest failUnlessEqual (2003-04-22) http://python.org/sf/725569 Modules/addrinfo.h patch (2003-04-22) http://python.org/sf/725942 help() with readline support (2003-04-23) http://python.org/sf/726204 Clarify docs for except target assignment (2003-04-24) http://python.org/sf/726751 AUTH_TYPE and REMOTE_USER for CGIHTTPServer.py:run_cgi() (2003-04-25) http://python.org/sf/727483 Editing of __str__ and __repr__ docs (2003-04-25) http://python.org/sf/727789 Remove extra line ending in CGI XML-RPC responses (2003-04-25) http://python.org/sf/727805 Multiple webbrowser.py bug fixes / improvements (2003-04-26) http://python.org/sf/728278 Closed Bugs ----------- netrc module can't handle all passwords (2002-05-18) http://python.org/sf/557704 netrc & special chars in passwords (2002-12-01) http://python.org/sf/646592 optparse store_true uses 1 and 0 (2002-12-28) http://python.org/sf/659604 filter() treatment of str and tuple inconsistent (2003-01-10) http://python.org/sf/665835 StringIO self-iterator (2003-01-31) 
http://python.org/sf/678519 repr() of large array objects takes quadratic time (2003-02-05) http://python.org/sf/680789 math.log(0) differs from math.log(0L) (2003-03-27) http://python.org/sf/711019 sys.path on MacOSX (2003-04-10) http://python.org/sf/719297 Icon on applets is wrong (2003-04-10) http://python.org/sf/719303 tarfile gets filenames wrong (2003-04-15) http://python.org/sf/721871 logging.setLoggerClass() doesn't support new-style classes (2003-04-18) http://python.org/sf/723801 Closed Patches -------------- Fix for seg fault on test_re on mac osx (2002-07-12) http://python.org/sf/580869 [mingw patches] alloca and posixmodule (2002-10-04) http://python.org/sf/618791 Generator form of os.path.walk (2002-12-12) http://python.org/sf/652980 Add inet_pton and inet_ntop to socket (2002-12-24) http://python.org/sf/658327 Deprecate rotor module (2003-02-03) http://python.org/sf/679505 fix for 680789: reprs in arraymodule (2003-02-11) http://python.org/sf/685051 fix bug 678519: cStringIO self iterator (2003-03-01) http://python.org/sf/695710 allow timeit to see your globals() (2003-04-08) http://python.org/sf/717575 Patch to distutils doc for metadata explanation (2003-04-09) http://python.org/sf/718027 From thomas@xs4all.net Sun Apr 27 15:03:46 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 16:03:46 +0200 Subject: [Python-Dev] Curiousity In-Reply-To: <200304270605.h3R65qK21094@localhost.localdomain> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> Message-ID: <20030427140346.GC26254@xs4all.nl> On Sun, Apr 27, 2003 at 04:05:51PM +1000, Anthony Baxter wrote: > Not exactly. I'm still waiting for > a) the new linux-based creosote No, it's not a new creosote. It's a new machine, it doesn't replace creosote. 
It's taking a bit longer than I expected, partly because of my workload, partly because that of others and partly because of unforeseen events, but it's still on its way. > b) the ultraseek license key Last time Barry and I looked (during PyCon), we couldn't find the Linux version... Are we certain it still exists? :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Sun Apr 27 15:11:01 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 16:11:01 +0200 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 In-Reply-To: <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> References: <1051419746.20524.193.camel@geddy> <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030427141101.GD26254@xs4all.nl> On Sun, Apr 27, 2003 at 10:22:53AM +0200, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > - exts.append( Extension('dbm', ['dbmmodule.c']) ) > > + exts.append( Extension('dbm', ['dbmmodule.c'], > > + libraries = ['gdbm']) ) > I think this was an alternative for platforms where a dbm library is > part of the C library. Your patch would kill those platforms - but it > may be that we are talking about the empty set here. In either case, it should only link to gdbm if gdbm exists -- which is checked for right below the patch. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
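[Editor's note: the guarded linking Thomas describes -- naming gdbm in the Extension only when the compiler can actually find it, and preferring the platform's own dbm library where one exists -- can be sketched as a small decision helper. This is an illustration, not the actual setup.py code; the `find_library_file` callable and the `dbm_link_libraries` name are assumptions standing in for `self.compiler.find_library_file` inside distutils' `detect_modules()`.]

```python
def dbm_link_libraries(find_library_file, lib_dirs):
    """Pick the libraries the dbm extension should link against.

    find_library_file mimics distutils' compiler.find_library_file():
    it returns a path if the named library exists in lib_dirs, else None.
    """
    if find_library_file(lib_dirs, 'db1'):
        # BSD db 1.85 compatibility library (e.g. older glibc systems)
        return ['db1']
    if find_library_file(lib_dirs, 'gdbm'):
        # Link gdbm only when it is actually present, so platforms
        # whose C library provides (n)dbm itself are left alone.
        return ['gdbm']
    # Assume libc itself supplies dbm -- Martin's "empty set" case.
    return []
```

In real build code this would be called as `dbm_link_libraries(self.compiler.find_library_file, lib_dirs)` and the result passed as the `libraries` argument of the `Extension`.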
From Anthony Baxter <anthony@interlink.com.au> Sun Apr 27 15:48:41 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Mon, 28 Apr 2003 00:48:41 +1000 Subject: [Python-Dev] shellwords In-Reply-To: <20030425181157.GB6591@localhost.distro.conectiva> Message-ID: <200304271448.h3REmfK23350@localhost.localdomain> >>> Gustavo Niemeyer wrote > > The other file manipulation thingy that would be good would be to > > abstract out the bits of tarfile and zipfile and make a standard > > interface to the two. > > IIRC, tarfile has a wrapper which makes it compatible with zipfile. Yah, but tarfile's interface is much nicer. I was talking about a mode that makes zipfile like tarfile. Anthony From dberlin@dberlin.org Sun Apr 27 16:49:02 2003 From: dberlin@dberlin.org (Daniel Berlin) Date: Sun, 27 Apr 2003 11:49:02 -0400 Subject: [Python-Dev] Why doesn't the uu module give you the filename? Message-ID: <C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> While it's simple enough to get the uu module to uudecode a string (using StringIO), it's impossible to get it to hand you the filename the uuencoded thing specifies. I.e., given

    begin 644 a.ii.gz
    <whatever>
    end

there is no way to get the decode function to tell you the thing is named a.ii.gz. Of course, it uses this filename itself in creating an output file if you don't specify one. It just won't tell *you* what the filename is. I could just give it no output file, and let it create it, then determine the name of the file it created, but this seems like a very large kludge. Besides, I am decoding from/to a string, in memory. I don't want to have it start writing things to the disk for no reason. The context of all of this is that I have a program that is converting text that possibly contains uuencoded attachments into a bunch of SQL statements to insert into a database (It's converting a GNATS bug database to a Bugzilla one.
It's a rewrite of an incredibly ugly, slow, barely functional perl script that spews errors at random and leaks memory for no reason :P). I had to cut/paste the decode function from the uu module into a new module and make it return the filename, just so that I could get access to it. This seems a bit silly. The decode function has no return value right now, so giving it one shouldn't break existing applications (since none of them should be expecting it to return anything). I believe it should return the filename specified in the begin line. As an added bonus, it would be even nicer if it also returned the start and end position of the decoded portion inside the input text. That way, if one wants to replace the entire uuencoded text with something like, say, "See bug attachments for <filename>", you can do it easily. :P As I said, I've got a version of uu.decode that does all of this; I'll happily submit it as a patch if people agree I'm right. --Dan From ANTIGEN@netsys.co.za Sun Apr 27 16:50:07 2003 From: ANTIGEN@netsys.co.za (ANTIGEN_NETSYS-NT-SERV) Date: Sun, 27 Apr 2003 17:50:07 +0200 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus Message-ID: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> Antigen for Exchange found Unknown infected with CorruptedCompressedUuencodeFile virus. The file is currently Removed. The message, "Why doesn't the uu module give you the filename?", was sent from python-list-admin@python.org and was discovered in IMC Queues\Inbound located at Netsys International/NETSYS/NETSYS-NT-SERV.
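[Editor's note: the return-value idea from Daniel's message above can be sketched in a few lines. `uu_info` is a hypothetical helper, not the stdlib `uu.decode` API: it only parses the "begin" header and reports the filename plus the start/end offsets of the block, leaving the actual body decoding to the uu/binascii modules.]

```python
def uu_info(text):
    """Locate the first uuencoded block in text.

    Returns (filename, start, end), where text[start:end] covers the
    block from its "begin" line through its "end" line.  A sketch of
    the proposed return value -- not the stdlib uu API.
    """
    pos = 0
    start = None
    name = None
    for line in text.splitlines(True):
        stripped = line.strip()
        if start is None and stripped.startswith('begin '):
            # e.g. "begin 644 a.ii.gz" -> ['begin', '644', 'a.ii.gz']
            fields = stripped.split(' ', 2)
            if len(fields) == 3:
                name = fields[2]
                start = pos
        elif start is not None and stripped == 'end':
            return name, start, pos + len(line)
        pos += len(line)
    raise ValueError('no complete uuencoded block found')
```

With the offsets in hand, `text[:start] + replacement + text[end:]` gives the "See bug attachments for <filename>" substitution Daniel mentions.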
From dberlin@dberlin.org Sun Apr 27 17:25:43 2003 From: dberlin@dberlin.org (Daniel Berlin) Date: Sun, 27 Apr 2003 12:25:43 -0400 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus In-Reply-To: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> Message-ID: <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> Yes, it's the incredible "even if it was valid uuencoded text, it would be a very dangerous empty file" virus. Who designs this shite? What's next? "Antigen for Exchange found Unknown infected with YourMailerCantDoMIMEProperly virus" or maybe "Antigen for Exchange found Unknown infected with YouQuotedTooMuchText virus" or more likely "Antigen for Exchange found our entire organization infected with WeUseBrainDeadAntiVirusSoftware virus" --Dan On Sunday, April 27, 2003, at 11:50 AM, ANTIGEN_NETSYS-NT-SERV wrote: > Antigen for Exchange found Unknown infected with > CorruptedCompressedUuencodeFile virus. > The file is currently Removed. The message, "Why doesn't the uu > module give > you the filename?", was > sent from python-list-admin@python.org and was discovered in IMC > Queues\Inbound > located at Netsys International/NETSYS/NETSYS-NT-SERV. > From itamar@itamarst.org Sun Apr 27 19:53:16 2003 From: itamar@itamarst.org (Itamar Shtull-Trauring) Date: Sun, 27 Apr 2003 14:53:16 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? Message-ID: <20030427145316.475c3cf5.itamar@itamarst.org> The "we always wrap socket objects with python class" change seems to have slowed down networking on Linux (and presumably other platforms where socket objects used to be unwrapped.) Moshe Zadka ran some benchmarks on Linux (2.4.9 - a redhat machine at work probably) with 2.2 and 2.3b1 using Demos/sockets/throughput.py. For count of 1000: 2.3 server, 2.3 client: Throughput: 13556.811 K/sec. 2.3 server, 2.2 client: Throughput: 24917.862 K/sec. 2.2 server, 2.2 client: Throughput: 29838.491 K/sec. 
10,000: 2.3 server, 2.3 client: Throughput: 35994.749 K/sec. 2.3 server, 2.2 client: Throughput: 34398.085 K/sec. 2.2 server, 2.2 client: Throughput: 49488.916 K/sec. 50,000: 2.3 server, 2.3 client: Throughput: 39002.538 K/sec. 2.3 server, 2.2 client: Throughput: 48064.785 K/sec. 2.2 server, 2.2 client: Throughput: 59799.672 K/sec. On a 2.3a2 I have I did "socket.socket = socket._socketobject", and got a 20% slowdown compared to 2.2 on throughput. (2.3a2 without this change is the same speed as 2.2). Can other people do some tests to verify these numbers? If this slowdown is confirmed, it is really not acceptable, since the change seems to have been made only to support making timeout sockets slightly easier to use. Why should everyone have to pay a speed penalty just so a minority of people can skip calling a "socket.installtimeoutsupport()" at the beginning of their program? it's just one line of code they'd need to add. In real programs the speed drop would probably be much less pronounced, although I bet this slows down e.g. Anthony Baxter's portforwarder quite a bit. If Python 2.3 is released without fixing this Twisted will probably monkeypatch the socket module so that we can get full performance, since we have our own (unavoidable) layers of Python indirection :) -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting From guido@python.org Sun Apr 27 20:19:17 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 15:19:17 -0400 Subject: [Python-Dev] Why doesn't the uu module give you the filename? In-Reply-To: "Your message of Sun, 27 Apr 2003 11:49:02 EDT." 
<C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> References: <C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> Message-ID: <200304271919.h3RJJHs15021@pcp02138704pcs.reston01.va.comcast.net> > While it's simple enough to get the uu module to uudecode a string > (using StringIO), it's impossible to get it to handle you the filename > the uuencoded thing specifies. [...] > As i said, i've got a version of uu.decode that does all of this, i'll > happily submit it as a patch if people agree i'm right. Sure, as long as your patch is backwards compatible. Send it to SourceForge. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 27 20:22:17 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 15:22:17 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sun, 27 Apr 2003 16:03:46 +0200." <20030427140346.GC26254@xs4all.nl> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> Message-ID: <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> > > b) the ultraseek license key > > Last Barry and I looked (during PyCon), we couldn't find the Linux > version... Are we certain it still exists ? :) Googling for "ultraseek download" found thispage, which seems to have it: http://downloadcenter.verity.com/dlc/index.jsp --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sun Apr 27 20:53:08 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 27 Apr 2003 12:53:08 -0700 (PDT) Subject: [Python-Dev] test_logging hangs on OS X (was: ... Solaris 8) In-Reply-To: <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <Pine.SOL.4.55.0304271250320.17451@death.OCF.Berkeley.EDU> [Skip Montanaro] > Using the latest version from CVS, on Solaris 8 test_logging hangs. 
I am getting this hang on OS X as well. Anyone else? > Lots of output, then: > > ... > -- log_test3 end --------------------------------------------------- > I am getting the hang at the same place; log_test3 ends and then nothing happens afterwards. At least this has convinced me to go forward with my skips.txt patch. -Brett From martin@v.loewis.de Sun Apr 27 21:19:44 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 27 Apr 2003 22:19:44 +0200 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> Itamar Shtull-Trauring <itamar@itamarst.org> writes: > Can other people do some tests to verify these numbers? For that, it would be good if Moshe's test procedure was published. Regards, Martin From itamar@itamarst.org Sun Apr 27 21:23:27 2003 From: itamar@itamarst.org (Itamar Shtull-Trauring) Date: Sun, 27 Apr 2003 16:23:27 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> References: <20030427145316.475c3cf5.itamar@itamarst.org> <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030427162327.428c974f.itamar@itamarst.org> On 27 Apr 2003 22:19:44 +0200 martin@v.loewis.de (Martin v. Löwis) wrote: > > Can other people do some tests to verify these numbers? > > For that, it would be good if Moshe's test procedure was published. On Debian, you can do:

    cd /usr/share/doc/python2.2/examples/Demo/sockets
    python2.2 throughput.py -s &
    python2.2 throughput.py -c 10000 localhost

and try with python2.3 and different numbers other than 10000. On non-Debian platforms/packages it's wherever you have the python examples installed.
-- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting From drifty@alum.berkeley.edu Sun Apr 27 21:32:58 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 27 Apr 2003 13:32:58 -0700 (PDT) Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427162327.428c974f.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> <20030427162327.428c974f.itamar@itamarst.org> Message-ID: <Pine.SOL.4.55.0304271328270.17451@death.OCF.Berkeley.EDU> [Itamar Shtull-Trauring] > On 27 Apr 2003 22:19:44 +0200 > martin@v.loewis.de (Martin v. Löwis) wrote: > > > > Can other people do some tests to verify these numbers? > > > > non-Debian platforms/packages it's wherever you have the python examples > installed. So running Demo/sockets/throughput.py with the -c 10000 argument I get under OS X:

* Python 2.2.2: 7976.756 K/sec
* CVS Python (compiled on April 18): 2772.97 K/sec

Now I put no great effort into sterilizing my system so that nothing else was running, so take these numbers with a grain of salt. -Brett From tim.one@comcast.net Sun Apr 27 22:22:00 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 27 Apr 2003 17:22:00 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <20030427014407.GC919@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCGEEOEEAB.tim.one@comcast.net> [Jeremy Fincher] > It's a minor quibble to be sure, but os.walk doesn't really > describe what exactly it's doing. I'd suggest os.pathwalk, but > that'd be too error-prone, being os.path.walk without a dot. Perhaps > os.pathwalker? [Greg Ward] > os.walktree? os.walkdirs? os.walkpath? > > (On reflection, the latter two are pretty dumb. walktree is the right > name, undoubtedly. ;-) I don't expect any short name to describe exactly what a thing does, and don't worry about it.
math.sin() isn't about lust in your heart, or math.tan() about practicing safe sunning either. Guido has his own inscrutable criteria for picking names. Mine is whether, *after* I know what a thing does, it's hard to forget what the name means. "walk" passed that test for me, and better than Python or Java did <wink>. From thomas@xs4all.net Sun Apr 27 22:40:11 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 23:40:11 +0200 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus In-Reply-To: <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> References: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> Message-ID: <20030427214011.GE26254@xs4all.nl> On Sun, Apr 27, 2003 at 12:25:43PM -0400, Daniel Berlin wrote: > Yes, it's the incredible "even if it was valid uuencoded text, it would > be a very dangerous empty file" virus. > Who designs this shite? Someone who was paying attention to the incredibly numerous problems with braindead mailprograms (oddly enough by far the most common of them on one particular platform, and from one particular vendor ;) If there is a problem with some kind of pattern, somewhere someone will write a program to block the pattern, and lots of people will use/buy it. It beats getting infected. > What's next? > "Antigen for Exchange found Unknown infected with > YourMailerCantDoMIMEProperly virus" > or maybe > "Antigen for Exchange found Unknown infected with YouQuotedTooMuchText > virus" If those viruses actually exist, yes, I'm certain you will see them. > or more likely > "Antigen for Exchange found our entire organization infected with > WeUseBrainDeadAntiVirusSoftware virus" You mean 'WeUseBrainDeadMailClientsAndMailServers'. 
No worries, though, it's not python-dev itself that checks for viruses, and whoever has his or her viruschecker on the unpopular 'warn everyone on the CC list too' setting will probably be frantically trying to fix it now :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From aahz@pythoncraft.com Mon Apr 28 00:25:39 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 27 Apr 2003 19:25:39 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427232539.GA25650@panix.com> On Sun, Apr 27, 2003, Guido van Rossum wrote: > > Googling for "ultraseek download" found this page, which seems to have > it: > > http://downloadcenter.verity.com/dlc/index.jsp Did you try accessing the URL? It seems to be down right now. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 93 From guido@python.org Mon Apr 28 02:26:00 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 21:26:00 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sun, 27 Apr 2003 19:25:39 EDT."
<20030427232539.GA25650@panix.com> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> <20030427232539.GA25650@panix.com> Message-ID: <200304280126.h3S1Q0i15475@pcp02138704pcs.reston01.va.comcast.net> > On Sun, Apr 27, 2003, Guido van Rossum wrote: > > > > Googling for "ultraseek download" found this page, which seems to have > > it: > > > > http://downloadcenter.verity.com/dlc/index.jsp > > Did you try accessing the URL? It seems to be down right now. Yes, I even downloaded the Linux tarball (to my Windows laptop :-). It's up right now for me. --Guido van Rossum (home page: http://www.python.org/~guido/) From jepler@unpythonic.net Mon Apr 28 03:04:22 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Sun, 27 Apr 2003 21:04:22 -0500 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <20030428020421.GA31496@unpythonic.net> I can also reproduce the slowdown. Measured on a Redhat 9 machine, python-2.2.2-26.i386.rpm vs python 2.3b1 compiled with default options. 700MHz Pentium III in a laptop. best of 3 runs. Count of 100000. Running over the loopback device. Sentence fragments.

    Server  Client  Throughput  Speed
    2.2     2.2     53520.4     100.00%
    2.2     2.3b1   43726.28     81.70%
    2.3b1   2.2     43032.06     80.40%
    2.3b1   2.3b1   38283.78     71.53%

System load was low at the time, though I had various apps running. I also ran the test over my 802.11b wireless setup:

    Server  Client  Throughput  Speed
    2.2     2.2     639.16      100.00%
    2.3b1   2.2     639.07       99.98%

(client was a 350MHz machine with various programs running) That is, when running over a relatively slow link (theoretically, 11mbps) the slowdown is not measurable.
However, I don't think that this really decreases the importance of this performance regression. Jeff From mal@lemburg.com Mon Apr 28 08:53:35 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 28 Apr 2003 09:53:35 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> Message-ID: <3EACDDFF.3060304@lemburg.com> Moore, Paul wrote: > From: M.-A. Lemburg [mailto:mal@lemburg.com] > >>In reality it probably is for most parts of the world. But >>why put this burden on the casual user ? > > Speaking as a "casual user", I very rarely need or use crypto > software. However, when I do need it, having it "built in" is > a major benefit - most of the crypto packages either have > dependencies I'm not familiar with or don't have, or go far > too deep into crypto theory for me to follow. At the end of > the day, all I want is simple stuff, like for urllib to get a > "https" web page for me, "just like my browser does" (ie, with > no thought on my part...) Paul, that's the wrong approach to the problem. Crypto code causes legal problems not ones which have to do with how to wrap up distributions. There's hardly anything to argue about here, unfortunately. >>>>Crypto is just too much (legal) work if you're serious >>>>about it. >>> >>>So then you would advise to remove the OpenSSL support >>>from the Windows distribution, and from Python altogether? >> >>Hmm, I didn't know that the Windows installer comes with an SSL >>module that includes OpenSSL. I'd strongly advise to make that >>a separate download. > > If you did, I'd expect that 99% of Windows users would perceive > that as "Python can't handle https URLs". Having a separate > download might be enough, as long as it was utterly trivial - > download the package, click to install, done. All dependencies > included, no extra work. 
Right; and that would be possible... not only for Windows, but for most supported platforms via distutils. >>Is there ? pycrypto is all you need if you're into deep crypto. > But pycrypto (at least when I've looked into it) definitely *isn't* > just a 1-click install, and a quick Google search reveals no way > of getting a prebuilt Windows binary. Of course, you say "if you're > into deep crypto", so maybe you'd say that expecting users to build > their own isn't unreasonable at that level. > > Actually, m2crypto is another candidate, and it does include > Windows binaries (but they are a bit fiddly to install)... Both packages are maintained outside the Python distribution, so there's nothing much we can do to change that situation. I was talking about the code currently integrated in Python itself. >>The standard SSL support is enough crypto for most people and >>that's already included in the distribution. > But you were arguing to take it out... I am arguing to take out the OpenSSL code currently shipped with the Windows installer, not the wrapper code in the _ssl module. > Personally, I'd like the existing stuff to stay as-is. I can understand your point, but we have to do something about the current situation, unless we want to put the whole Python distribution at risk of being illegally exported/imported/used in some parts of the world. Making the crypto part of the distribution a separate download would solve the problem and only introduce a mild inconvenience for casual users. > I don't > particularly see the need for more crypto stuff in the core, but I'd > like to see a well-maintained, easy to install, "sanctioned" crypto > package for people who want to either use crypto "for real", or just > investigate it. That's a feature request :-) -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 57 days left From mal@lemburg.com Mon Apr 28 11:00:29 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 28 Apr 2003 12:00:29 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> <1051290657.1500.6.camel@barry> <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EACFBBD.7000909@lemburg.com> Guido van Rossum wrote: >>I'd hate to see sha removed from the standard distro. > > > Me too; I don't see sha or md5 as crypto. I'm only against adding new > *crypto* capability. Hash algorithms are usually not regulated as crypto code -- even though they can be used for such purposes; see e.g. chaffing and winnowing: http://theory.lcs.mit.edu/~rivest/chaffing.txt > I'm also for isolating existing crypto capability so it's easy to > remove for anyone who has a need for a crypto-free distribution. I > think we're already doing that, given that even on Windows, the SSL > module is a separate DLL. We could wrap up the following set:

a. installer with crypto code + notice that downloading and using this version is illegal in some countries and that downloading and/or reexporting the installer to certain countries is not legal
b. installer without crypto
c. crypto package as distutils installer with the same notice as for the combined package

Or we just do b and c and leave a to companies like ActiveState. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 57 days left From cjohns@cybertec.com.au Mon Apr 28 14:31:36 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Mon, 28 Apr 2003 23:31:36 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled Message-ID: <3EAD2D38.3030906@cybertec.com.au> Hello, Porting Python to the open source realtime OS called RTEMS, I get a compile error on line 2797 of socketmodule.c. This is from CVS and I suspect a result of the SF patch #658327. More problems exist on lines 2814, 2835 and 2850. Should this code check ENABLE_IPV6, as IPV6 is not supported on RTEMS yet. Also, where is INET_ADDRSTRLEN supposed to be defined? Regards -- Chris Johns, cjohns at cybertec.com.au From guido@python.org Mon Apr 28 15:29:55 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 10:29:55 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: Your message of "Sun, 27 Apr 2003 21:04:22 CDT." <20030428020421.GA31496@unpythonic.net> References: <20030427145316.475c3cf5.itamar@itamarst.org> <20030428020421.GA31496@unpythonic.net> Message-ID: <200304281429.h3SETtq06555@odiug.zope.com> I'm guessing that the slowdown comes from the fact that calling a method like recv() on the wrapper object is now a Python method which calls the C method on the wrapped object. I wonder if the slowdown can't be easily repaired by changing the wrapper class to copy the relevant methods to instance variables. It would be even nicer to use subclassing instead of a wrapper object. I vaguely recall that I tried this before but couldn't figure out how to do it, but I've got a feeling that it ought to be doable -- after all the C socket object has separate __new__ and __init__ methods. 
I hope someone can take this ball and submit a patch -- it would indeed be a shame to have to live with the slowdown (even if it only shows up when using the loopback device) or to have a practice of monkey patching socket.py. (BTW instead of monkey-patching socket.py, it might be easier to write "import _socket as socket".) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Mon Apr 28 15:32:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Mon, 28 Apr 2003 09:32:23 -0500 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <16045.15223.224661.442041@montanaro.dyndns.org> Itamar> If this slowdown is confirmed, it is really not acceptable, Itamar> since the change seems to have been made only to support making Itamar> timeout sockets slightly easier to use. It was done to support making timeout sockets work properly. As they existed previously, timeout sockets wouldn't work with protocols which would most likely use them: higher level modules such as httplib, which call sock.makefile(), then call readlines?() on the resulting file object. Itamar> Why should everyone have to pay a speed penalty just so a Itamar> minority of people can skip calling a Itamar> "socket.installtimeoutsupport()" at the beginning of their Itamar> program? it's just one line of code they'd need to add. I think it would be easier for the minority of programs that care about the 20% performance loss to simply set import socket, _socket socket.socket = socket.SocketType = _socket.socket I don't know about you, but fast and incorrect don't help me much. Feel free to submit a patch which improves performance but maintains proper behavior in the face of timeouts (that is, allows test_urllibnet to still work correctly). 
Skip From hemanexp@yahoo.com Mon Apr 28 15:40:41 2003 From: hemanexp@yahoo.com (perl lover) Date: Mon, 28 Apr 2003 07:40:41 -0700 (PDT) Subject: [Python-Dev] Getting mouse position in terms of canvas unit. Message-ID: <20030428144041.1817.qmail@web41709.mail.yahoo.com> hi, I am new to python and tkinter. I have created a canvas of size 300m * 300m (in millimeters). I bind a mouse move method to the canvas. When I move the mouse over the canvas the mouse position gets printed in pixel units. But I want to get mouse position values in terms of canvas units (i.e., millimeters). How can I get mouse position values in terms of canvas units? My program is given below.

*****************************
from Tkinter import *
root = Tk()
c = Canvas(root, width="300m", height="300m", background='gray')
c.pack()

def mouseMove(event):
    print c.canvasx(event.x), c.canvasy(event.y)

c.create_rectangle('16m', '10.5m', '21m', '15.5m', fill='blue')
c.bind('<Motion>', mouseMove)
root.mainloop()

Thanks From tjreedy@udel.edu Mon Apr 28 18:22:32 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Mon, 28 Apr 2003 13:22:32 -0400 Subject: [Python-Dev] Re: Getting mouse position in terms of canvas unit. References: <20030428144041.1817.qmail@web41709.mail.yahoo.com> Message-ID: <b8jntb$7qd$1@main.gmane.org> "perl lover" <hemanexp@yahoo.com> wrote in message news:20030428144041.1817.qmail@web41709.mail.yahoo.com... > hi, > I am new to python and tkinter. Hi. This is the Python *developers* list. The following is a Python *usage* question rather than a Python *development* question. Please submit such to comp.lang.python or the regular Python mailing list (see www.python.org). >How can I get mouse position values in terms of canvas units? If nothing else, figure out the ratio between pixel and canvas (mm) units and multiply. Terry J. 
Reedy From python@rcn.com Mon Apr 28 20:51:36 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 15:51:36 -0400 Subject: [Python-Dev] Dictionary tuning Message-ID: <001b01c30dbf$94363140$125ffea9@oemcomputer> I've experimented with about a dozen ways to improve dictionary performance and found one that benefits some programs by up to 5% without hurting the performance of other programs by more than a single percentage point. It entails a one line change to dictobject.c resulting in a new schedule of dictionary sizes for a given number of entries:

Number of            Current size    Proposed size
Filled Entries       of dictionary   of dictionary
--------------       -------------   -------------
[--   0 to   5 --]         8               8
[--   6 to  10 --]        16              32
[--  11 to  21 --]        32              32
[--  22 to  42 --]        64             128
[--  43 to  85 --]       128             128
[--  86 to 170 --]       256             512
[-- 171 to 341 --]       512             512

The idea is to lower the average sparseness of dictionaries (by 0% to 50% of their current sparseness). This results in fewer collisions, faster collision resolution, fewer memory accesses, and better cache performance. A small side-benefit is halving the number of resize operations as the dictionary grows. The above table of dictionary sizes shows that odd numbered steps have the same size as the current approach while even numbered steps are twice as large. As a result, small dicts keep their current size and the amortized size of large dicts remains the same. Along the way, some dicts will be a little larger and will benefit from the increased sparseness. I would like to know what you guys think about the idea and would appreciate your verifying the performance on your various processors and operating systems. Raymond Hettinger P.S. The one line patch is:

*** dictobject.c 17 Mar 2003 19:46:09 -0000 2.143
--- dictobject.c 25 Apr 2003 22:33:24 -0000
***************
*** 532,538 ****
  	 * deleted).
  	 */
  	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! 		if (dictresize(mp, mp->ma_used*2) != 0)
  			return -1;
  	}
  	return 0;
--- 532,538 ----
  	 * deleted).
  	 */
  	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! 		if (dictresize(mp, mp->ma_used*4) != 0)
  			return -1;
  	}

From martin@v.loewis.de Mon Apr 28 21:07:08 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 28 Apr 2003 22:07:08 +0200 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EAD2D38.3030906@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> Message-ID: <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> Chris Johns <cjohns@cybertec.com.au> writes: > Porting Python to the open source realtime OS called RTEMS I get a > compile error on line 2797 of socketmodule.c. In my copy, this is the line char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))]; Can you report more on the nature of the compile error (such as its *message*)? > Should this code check ENABLE_IPV6 as IPV6 is not supported on RTEMS yet. (assuming this is a question): I'm unsure. It should not cause a compile time failure, period. > Also where is INET_ADDRSTRLEN supposed to be defined? <netinet/in.h> Regards, Martin From glyph@twistedmatrix.com Mon Apr 28 21:49:27 2003 From: glyph@twistedmatrix.com (Glyph Lefkowitz) Date: Mon, 28 Apr 2003 15:49:27 -0500 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: <20030428160006.2359.60528.Mailman@mail.python.org> Message-ID: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> On Monday, April 28, 2003, at 11:00 AM, python-dev-request@python.org wrote: > Itamar> If this slowdown is confirmed, it is really not acceptable, > Itamar> since the change seems to have been made only to support > making > Itamar> timeout sockets slightly easier to use. > > It was done to support making timeout sockets work properly. 
As they > existed previously, timeout sockets wouldn't work with protocols which > would > most likely use them: higher level modules such as httplib, which call > sock.makefile(), then call readlines?() on the resulting file object. Clearly this is a flaw in httplib's design. Perhaps one should be able to pass in a socket or file factory? That would allow speaking HTTP over non-TCP transports or through something like a SOCKS proxy, which is arguably a good thing. Do you want to add SOCKS support by adding another wrapper around the socket module as well? How about a Python software firewall? Pretty soon our "correct" socket module will have 20 performance-destroying wrappers around it in order to work around deficiencies in the interfaces of some programs which use sockets. httplib imports a module where passing in a factory function would be the correct thing to do. At first it looks like you can parameterize it by hacking up a module, but you can only do that once or twice before the design problem really becomes pressing. The socket module is not a high-level interface to networking. Attempting to make it into one will harm its utility as a low-level interface that good high-level interfaces can be built on top of. > Itamar> Why should everyone have to pay a speed penalty just so a > Itamar> minority of people can skip calling a > Itamar> "socket.installtimeoutsupport()" at the beginning of their > Itamar> program? it's just one line of code they'd need to add. > > I think it would be easier for the minority of programs that care > about the > 20% performance loss to simply set I think this should be in the release notes for 2.3. "Python is 10% faster, unless you use sockets, in which case it is much, much slower. Do the following in order to regain lost performance and retain the same semantics:" I anticipate that more than just Twisted will want to monkey-patch the module. (A 20% drop in throughput is a significant issue to more than an eclectic audience.) 
If you're not going to fix this bug, maybe we could have a "socket.monkeypatch()" method which would prevent different systems from stepping on each other when they do it? > I don't know about you, but fast and incorrect don't help me much. Since when is the behavior of the socket module "incorrect"? If anything the interface to "timeout sockets" is incorrect, because BSD sockets do not in fact support timeouts. The interface is doing a bunch of things behind the user's back which would be better done another way, for example, with actually asynchronous networking. It's pretty likely that there is some obscure corner-case that the select() in timeout sockets doesn't catch. From a brief glance, internal_select ignores error return values, and nothing checks its errno before making another socket call. If I remember correctly, that means that if select gets an EINTR, the following call to accept() or recv() or what-have-you may very well block. Of course, since the socket is in non-blocking mode at this point, that means that Python will raise an exception on the EAGAIN EWOULDBLOCK error. This is pretty hard to write a test for. I could be wrong about this particular error, but in general if one wishes to be pedantic about "correctness", one must first check the result codes from one's C system calls. > Feel free to submit a patch which improves performance but maintains > proper behavior in the face of timeouts (that is, allows > test_urllibnet to still work correctly). Why is the Python development team introducing bugs into Python and then expecting the user community to fix things that used to work? I could understand not wanting to put a lot of effort into correcting obscure or difficult-to-find performance problems that only a few people care about, but the obvious thing to do in this case is simply to change the default behavior. 
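[The EINTR point above can be made concrete with a retry loop around select() that recomputes the remaining timeout on each pass. This is a sketch in Python of what the C-level internal_select could do; wait_readable is a made-up name, not anything in socketmodule.c.]

```python
import errno
import select
import time

def wait_readable(fd, timeout):
    """Wait until fd is readable, retrying select() if a signal
    interrupts it (EINTR), and shrinking the timeout on each retry
    so an interrupted wait cannot extend the total deadline.
    Returns True if readable, False on timeout. Sketch only."""
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        try:
            r, _, _ = select.select([fd], [], [], remaining)
            return bool(r)
        except (OSError, select.error) as e:
            # Only an interrupted call is retried; real errors propagate.
            if e.args[0] != errno.EINTR:
                raise
```

Without the errno check and the deadline recomputation, an EINTR from select() either raises a spurious exception or lets the following recv()/accept() run on a socket that is not actually ready, which is exactly the failure mode described above.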
From guido@python.org Mon Apr 28 21:58:54 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 16:58:54 -0400 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: Your message of "Mon, 28 Apr 2003 15:49:27 CDT." <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> References: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> Message-ID: <200304282058.h3SKwsj18824@odiug.zope.com> > I think this should be in the release notes for 2.3. "Python is 10% > faster, unless you use sockets, in which case it is much, much slower. > Do the following in order to regain lost performance and retain the > same semantics:" That is total bullshit, Glyph, and you know it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 28 22:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 17:02:53 -0400 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: Your message of "Mon, 28 Apr 2003 15:49:27 CDT." <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> References: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> Message-ID: <200304282102.h3SL2rW18842@odiug.zope.com> > Why is the Python development team introducing bugs into Python and > then expecting the user community to fix things that used to work? I resent your rhetoric, Glyph. Had you read the rest of this thread, you would have seen that the performance regression only happens for sending data at maximum speed over the loopback device, and is negligible when receiving e.g. data over a LAN. You would also have seen that I have already suggested two different simple fixes. 
> I could understand not wanting to put a lot of effort into > correcting obscure or difficult-to-find performance problems that > only a few people care about, but the obvious thing to do in this > case is simply to change the default behavior. It can and will be fixed. I just don't have the time to fix it myself. The functionality (of having timeouts work properly for streams created by socket.makefile()) is useful to have. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 28 23:06:48 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 18:06:48 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: Your message of "Mon, 28 Apr 2003 15:51:36 EDT." <001b01c30dbf$94363140$125ffea9@oemcomputer> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> Message-ID: <200304282206.h3SM6md20118@odiug.zope.com> > I've experimented with about a dozen ways to improve dictionary > performance and found one that benefits some programs by up to > 5% without hurting the performance of other programs by more > than a single percentage point. > > It entails a one line change to dictobject.c resulting in a new > schedule of dictionary sizes for a given number of entries: > > Number of Current size Proposed size > Filled Entries of dictionary of dictionary > -------------- ------------- ------------- > [-- 0 to 5 --] 8 8 > [-- 6 to 10 --] 16 32 > [-- 11 to 21 --] 32 32 > [-- 22 to 42 --] 64 128 > [-- 43 to 85 --] 128 128 > [-- 86 to 170 --] 256 512 > [-- 171 to 341 --] 512 512 I suppose there's an "and so on" here, right? I wonder if for *really* large dicts the space sacrifice isn't worth the time saved? > The idea is to lower the average sparseness of dictionaries (by > 0% to 50% of their current sparsenes). This results in fewer > collisions, faster collision resolution, fewer memory accesses, > and better cache performance. A small side-benefit is halving > the number of resize operations as the dictionary grows. 
I think you mean "raise the average sparseness" don't you? (The more sparse something is, the more gaps it has.) I tried the patch with my new favorite benchmark, startup time for Zope (which surely populates a lot of dicts :-). It did give about 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% speedup. --Guido van Rossum (home page: http://www.python.org/~guido/) From cjohns@cybertec.com.au Mon Apr 28 22:54:32 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Tue, 29 Apr 2003 07:54:32 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EADA318.5010602@cybertec.com.au> Martin v. Löwis wrote: > Chris Johns <cjohns@cybertec.com.au> writes: > > >>Porting Python to the open source realtime OS called RTEMS I get a >>compile error on line 2797 of socketmodule.c. > > > In my copy, this is the line > > char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))]; ^^ I would assume the marked code is for IPV6 so needs to be protected by ENABLE_IPV6, for example:

#ifdef ENABLE_IPV6
	char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))];
#else
	char packed[sizeof(struct in_addr)];
#endif

> > Can you report more on the nature of the compile error (such as its > *message*)? > (I do not use the Python build system as I have to cross-compile and so use an automake makefile in a RISCOS type layout) Sure. The output is from gcc-3.2.3: m68k-rtems-gcc -DHAVE_CONFIG_H -I. 
-I../python-cvs/dist/src/RTEMS -I. -I../python-cvs/dist/src/RTEMS/../Include -I../python-cvs/dist/src/RTEMS/../Python -I/opt/rtems/m68k-rtems/lib/include -m5200 -O4 -g -DPLATFORM="\"RTEMS (m5200)\"" -c -o socketmodule.o `test -f '../python-cvs/dist/src/RTEMS/../Modules/socketmodule.c' || echo '../python-cvs/dist/src/RTEMS/'`../python-cvs/dist/src/RTEMS/../Modules/socketmodule.c

../python-cvs/dist/src/Modules/socketmodule.c: In function `socket_inet_pton':
../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c:2816: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c: In function `socket_inet_ntop':
../python-cvs/dist/src/Modules/socketmodule.c:2835: `INET_ADDRSTRLEN' undeclared (first use in this function)
../python-cvs/dist/src/Modules/socketmodule.c:2835: (Each undeclared identifier is reported only once
../python-cvs/dist/src/Modules/socketmodule.c:2835: for each function it appears in.)
../python-cvs/dist/src/Modules/socketmodule.c:2835: `INET6_ADDRSTRLEN' undeclared (first use in this function)
../python-cvs/dist/src/Modules/socketmodule.c:2851: sizeof applied to an incomplete type

> >>Should this code check ENABLE_IPV6 as IPV6 is not supported on RTEMS yet. > > (assuming this is a question): I'm unsure. It should not cause a > compile time failure, period. > Sorry, it was a question. See above. > >>Also where is INET_ADDRSTRLEN supposed to be defined? > <netinet/in.h> > Thanks. The RTEMS TCP/IP stack is an old port of the FreeBSD stack and does not have this. The current FreeBSD does so I will fix RTEMS. I will not add INET6_ADDRSTRLEN as no other IPV6 support is currently provided. 
-- Chris Johns, cjohns at cybertec.com.au From martin@v.loewis.de Mon Apr 28 23:17:42 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Tue, 29 Apr 2003 00:17:42 +0200 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADA318.5010602@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> Message-ID: <3EADA886.9020605@v.loewis.de> Chris Johns wrote: > ../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an > incomplete type I see. And the system does have inet_pton? *That* sounds like a bug to me - there should be no inet_pton if the IPv6 API is unsupported. So I think the configure test should be changed to define HAVE_PTON only if all prerequisites of its usage are met (or the entire function should be hidden if IPv6 is disabled). Regards, Martin From jack@performancedrivers.com Mon Apr 28 23:19:20 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Mon, 28 Apr 2003 18:19:20 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <200304282206.h3SM6md20118@odiug.zope.com>; from guido@python.org on Mon, Apr 28, 2003 at 06:06:48PM -0400 References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> Message-ID: <20030428181920.O15881@localhost.localdomain> > > I've experimented with about a dozen ways to improve dictionary > > performance and found one that benefits some programs by up to > > 5% without hurting the performance of other programs by more > > than a single percentage point. You wouldn't have created some handy tables of 'typical' dictionary usage, would you? They would be interesting in general, but very nice for the PEPs doing dict optimizations for symbol tables in particular. 
-jack From cjohns@cybertec.com.au Mon Apr 28 23:33:35 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Tue, 29 Apr 2003 08:33:35 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADA886.9020605@v.loewis.de> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> Message-ID: <3EADAC3F.6020802@cybertec.com.au> Martin v. Löwis wrote: > > I see. And the system does have inet_pton? *That* sounds like a bug to > me - there should be no inet_pton if the IPv6 API is unsupported. Agreed. I will disable them. > > So I think the configure test should be changed to define HAVE_PTON only > if all prerequisites of its usage are met (or the entire function should > be hidden if IPv6 is disabled). > It would make Python more robust, but this is a mistake on my part. Thanks for the help. -- Chris Johns, cjohns at cybertec.com.au From goodger@python.org Tue Apr 29 00:13:14 2003 From: goodger@python.org (David Goodger) Date: Mon, 28 Apr 2003 19:13:14 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 Message-ID: <3EADB58A.2030607@python.org> The following paragraph is from PEP 1, "PEP Work Flow" section: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. I propose adding the following text: ... The BDFL may also initiate a PEP review, first notifying the PEP author(s). In addition, I think it would be useful to add some text describing the PEP acceptance criteria. Something like the following: For a PEP to be accepted it must meet certain minimum criteria. It must be a clear description of the proposed enhancement. The enhancement must represent a net improvement. The implementation, if applicable, must be solid and must not complicate the interpreter unduly. 
Finally, a proposed enhancement must be "pythonic" in order to be accepted by the BDFL. (However, "pythonic" is an imprecise term; it may be defined as whatever is acceptable to the BDFL. This logic is intentionally circular.) See PEP 2 for standard library module acceptance criteria. Please comment. -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From bkelly@sourcereview.net Tue Apr 29 00:25:57 2003 From: bkelly@sourcereview.net (Brett Kelly) Date: Mon, 28 Apr 2003 16:25:57 -0700 Subject: [Python-Dev] Introduction :) Message-ID: <20030428232557.GE21953@inkedmn.homelinux.org> Howdy folks, i'm new to this mailing list, thought i'd say hello and introduce myself. i'm Brett, i've been using python for about 2 years now, it was the first language i learned. I hope to learn more about python's advanced features (and gain a deeper understanding of OOP), as well as contribute in whatever small way i can. Anyway, hello! -- Brett Kelly bkelly@sourcereview.net This message was created using the Mutt mail agent and digitally signed using GnuPG. 
From tdelaney@avaya.com Tue Apr 29 00:45:21 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 09:45:21 +1000 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC128@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > Why is the Python development team introducing bugs into Python and > > then expecting the user community to fix things that used to work? > > I resent your rhetoric, Glyph. Had you read the rest of this thread, > you would have seen that the performance regression only happens for > sending data at maximum speed over the loopback device, and is > negligible when receiving e.g. data over a LAN. You would also have > seen that I have already suggested two different simple fixes. Indeed - the primary purpose of a beta is IMO to discover these issues by use in as great a number of scenarios as possible before the final release is made. I would be extremely surprised if this cannot be fixed before 2.3 final (in fact, I would be extremely surprised if such a known regression were allowed in 2.3 final). A beta should (excluding implementation bugs) have *correct* behaviour. Performance is not the #1 priority for a beta. 
Tim Delaney From tdelaney@avaya.com Tue Apr 29 00:51:47 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 09:51:47 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC12E@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > I've experimented with about a dozen ways to improve dictionary > > performance and found one that benefits some programs by up to > > 5% without hurting the performance of other programs by more > > than a single percentage point. > > > > It entails a one line change to dictobject.c resulting in a new > > schedule of dictionary sizes for a given number of entries:

> > Number of            Current size    Proposed size
> > Filled Entries       of dictionary   of dictionary
> > --------------       -------------   -------------
> > [--   0 to   5 --]         8               8
> > [--   6 to  10 --]        16              32
> > [--  11 to  21 --]        32              32
> > [--  22 to  42 --]        64             128
> > [--  43 to  85 --]       128             128
> > [--  86 to 170 --]       256             512
> > [-- 171 to 341 --]       512             512

> I suppose there's an "and so on" here, right? I wonder if for > *really* large dicts the space sacrifice isn't worth the time saved? What is the effect on peak memory usage over "average" programs? This might be a worthwhile speedup on small dicts (up to a TBD number of entries) but not worthwhile for large dicts. However, to add this capability in would of course add more code to a very common code path (additional test for current size to decide what factor to increase by). > I tried the patch with my new favorite benchmark, startup time for > Zope (which surely populates a lot of dicts :-). It did give about > 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% > speedup. Nice (in relative, not absolute terms). Do we have any numbers on memory usage during and after that period? 
Tim Delaney From tdelaney@avaya.com Tue Apr 29 01:10:39 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 10:10:39 +1000 Subject: [Python-Dev] Thoughts on -O Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> Was doing some thinking in the shower this morning, and came up with some ideas for specifying optimisation. These are currently quite nebulous thoughts ... We have the current situation: -O only removes asserts, -OO removes asserts and docstrings. I think this is an ideal time to revisit the purpose of -O for 2.4 or later. IMO the "vanilla" mode should be a "release" mode. Users should not have to use a command-line option to gain "release" optimisations such as asserts. I would propose that we have the following modes for python to work in.

1. Release/Production mode (no command-line switch)
 - asserts are turned off
 - well-tested/stable optimisations are included
 - possibly additional things, such as not calling trace functions

2. Optimised mode (-O)
 - more experimental optimisations are included i.e. those that may have performance improvements in some cases, but penalties in others, etc
 - may possibly split this up so individual optimisations can be turned on and off as required
 - this would leave -O by itself as a no-op

3. Docstring elimination mode (-OO)
 - may be specified in addition to optimised mode - it does not imply optimised mode

4. Debug mode (-D?)
 - will be the slowest mode - no optimisations - cannot be called with either -O or -OO
 - turns on asserts
 - turns on trace functions

I would see Debug mode being used by developers in unit tests, code coverage, etc. .pyc and .pyo files would need to know which optimisations they were compiled with so that if they would be loaded again with the "wrong" optimisations they would be re-compiled. Anyway, any thoughts, rebuttals, etc would be of interest. I'd like to get some discussion before I create a PEP. Cheers. 
Tim Delaney From drifty@alum.berkeley.edu Tue Apr 29 01:13:31 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 28 Apr 2003 17:13:31 -0700 (PDT) Subject: [Python-Dev] Introduction :) In-Reply-To: <20030428232557.GE21953@inkedmn.homelinux.org> References: <20030428232557.GE21953@inkedmn.homelinux.org> Message-ID: <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> [Brett Kelly] > Howdy folks, i'm new to this mailing list, thought i'd say hello and > introduce myself. > > i'm Brett, <snip> Does this mean I have to append my last initial to my name to differentiate? Or since I was here first can I just become the default "Brett" and make Brett K. have to use his initial? =) -Brett From python@rcn.com Tue Apr 29 00:50:14 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 19:50:14 -0400 Subject: [Python-Dev] Dictionary tuning References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <20030428181920.O15881@localhost.localdomain> Message-ID: <000201c30de7$01b5bd40$125ffea9@oemcomputer> [jack, master of the mac] > You wouldn't have created some handy tables of 'typical' dictionary > usage, would you? They would be interesting in general, but very nice > for the PEPs doing dict optimizations for symbol tables in particular. That path proved fruitless. I studied the usage patterns in about a dozen of my apps and found that there is no such thing as typical. Instead there are many categories of dictionary usage:

* attribute/method look-up in many small dictionaries
* uniquification apps with many redundant stores and few lookups
* membership testing with few stores and many lookups into small or large dicts
* database style lookups following Zipf's law for key access in large dicts
* graph explorers that access a few keys frequently and then move onto another set of related nodes
* global/builtin variable access following a failed search of locals
Almost every dictionary tune-up that helped one app would end up hurting another. The only way to test the effectiveness of a change was to time a whole suite of applications. The standard benchmarks were useless in this regard. Worse still, contrived test programs would not predict the results for real apps. There were several reasons for this: * there is a special case for handling dicts that only have string keys * real apps exhibit key access patterns that pull the most frequently accessed entries into the cache. This thwarted attempts to improve cache performance at the expense of more collisions. * any small, fixed set of test keys may have atypical collision anomalies, non-representative access frequencies, or not be characteristic of other dicts with a slightly different number of keys. * some sets of keys have non-random hash patterns but if you rely on this, it victimizes other sets of keys. * the results are platform dependent (ratio of processor speed to memory speed; size of cache; size of a cache line; cache associativity; write-back vs. write-through; etc). I had done some experiments that focused on symbol tables and had some luck with sequential searches into a self-organizing list. Using a list eliminated the holes and allowed more of the entries to fit in a single cache line. No placeholders were needed for deleted entries and that saves a test in the search loop. The self-organizing property kept the most frequently accessed entries at the head of the list. Using a sequential search had slightly less overhead than the hash table search pattern. Except for the builtin dictionary, most of the symbol tables in my apps have only a handful of entries.
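The self-organizing list described above can be sketched in Python (illustrative only; the real experiment was against the C symbol-table code, and `MoveToFrontTable` is a made-up name):

```python
class MoveToFrontTable:
    """Sequential-search symbol table. A successful lookup moves the
    entry to the head of the list, so frequently accessed entries stay
    near the front; a flat list also has no holes and needs no
    placeholders for deleted entries."""

    def __init__(self):
        self._entries = []  # list of [key, value] pairs

    def __setitem__(self, key, value):
        for pair in self._entries:
            if pair[0] == key:
                pair[1] = value
                return
        self._entries.insert(0, [key, value])

    def __getitem__(self, key):
        for i, pair in enumerate(self._entries):
            if pair[0] == key:
                if i:  # move-to-front on a hit
                    self._entries.insert(0, self._entries.pop(i))
                return pair[1]
        raise KeyError(key)
```

With a handful of entries and skewed access frequencies, the hot keys settle at the head and most searches terminate after one or two comparisons.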
if-only-i-had-had-a-single-valid-dict-performance-predictor-ly yours, Raymond Hettinger From python@rcn.com Tue Apr 29 01:09:15 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 20:09:15 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 References: <3EADB58A.2030607@python.org> Message-ID: <000301c30de7$021a7280$125ffea9@oemcomputer> > I propose adding the following text: > ... The BDFL may also initiate a PEP review, first notifying the > PEP author(s). Periodic updates to the parade-of-peps serve equally well. > For a PEP to be accepted it must meet certain minimum criteria. > It must be a clear description of the proposed enhancement. The > enhancement must represent a net improvement. The implementation, > if applicable, must be solid and must not complicate the > interpreter unduly. Finally, a proposed enhancement must be > "pythonic" in order to be accepted by the BDFL. (However, > "pythonic" is an imprecise term; it may be defined as whatever is > acceptable to the BDFL. This logic is intentionally circular.) Peps can go through a lot of stages before they get to this point. That can include having other peps explore other options; refinements to the idea, etc. From these proposals and the announcement earlier this week, I sense a desire to have fewer peps and to more rapidly get them out of the draft status. In general, I don't think this is a good idea. If someone wants to do a write-up and weather the ensuing firestorm, that is enough for me. If it has to sit for a few years before becoming obviously good or bad, that's fine too. Also, some ideas need time. My generator attributes idea had no chance for Py2.3. After people spend a year or so using generators, they might collectively begin to see a need for it. Also, someone may be able to help express the rationale more clearly. As written, the rationale would result in instant death for the pep.
After a pep dies, it becomes a permanent impediment for similar ideas even if someone comes up with better use cases or a slightly improved implementation. The first time I proposed something like a DictMixin class, it was violently shot down. A few months later, I had an improved version and those with a long memory immediately pointed out, "hey, that was shot down". After one more round, it was accepted, the alpha reviewers loved it, and it got applied throughout the library. Early rejection of peps will doom some useful ideas before they have a fighting chance. The authors can read the parade of peps and adapt or withdraw as appropriate. IOW, I like the process as it stands and am -1 on the amendment. It should be up to the pep author to decide when to stick his head in the guillotine to see what happens :) Raymond Hettinger "Theories have four stages of acceptance: i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view. iii) this is true but quite unimportant. iv) I always said so." - J.B.S. Haldane, 1963 "All great truths began as blasphemies" - George Bernard Shaw From python@rcn.com Tue Apr 29 01:20:35 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 20:20:35 -0400 Subject: [Python-Dev] Dictionary tuning References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC12E@au3010avexu1.global.avaya.com> Message-ID: <000501c30de7$029f32e0$125ffea9@oemcomputer> > What is the effect on peak memory usage over "average" programs? Since the amortized growth is the same, the effect is nil on average. Special cases can be contrived with a specific number of records where the memory use is doubled, but in general it is nearly unchanged for average programs. > This might be a worthwhile speedup on small dicts (up to a TBD > number of entries) but not worthwhile for large dicts. Actually, it helps large dictionaries even more than small dictionaries.
Collisions in large dicts are resolved through other memory probes which are almost certain not to be in the current cache line. > However, to add this capability in would of course add more code > to a very common code path (additional test for current size to > decide what factor to increase by). Your intuition is exactly correct. All experiments to special-case various sizes resulted in decreased performance because they added a tiny amount to some of the most heavily exercised code in Python. Further, it results in an unpredictable branch which is also not a good thing. [GvR] > I tried the patch with my new favorite benchmark, startup time for > Zope (which surely populates a lot of dicts :-). It did give about > 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% > speedup. [Tim] > Nice (in relative, not absolute terms). Do we have any numbers on > memory usage during and after that period? I found out that timing dict performance was hard. Capturing memory usage was harder. Of entry space, space plus unused space, calls to PyMalloc, and calls to the OS malloc, only the last is important, but it depends on all kinds of things that are not easily controlled. From guido@python.org Tue Apr 29 01:43:00 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 20:43:00 -0400 Subject: [Python-Dev] Thoughts on -O In-Reply-To: "Your message of Tue, 29 Apr 2003 10:10:39 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> Message-ID: <200304290043.h3T0h0R16884@pcp02138704pcs.reston01.va.comcast.net> > Was doing some thinking in the shower this morning, and came up with > some ideas for specifying optimisation. These are currently quite > nebulous thoughts ...
> > We have the current situation: > > -O only removes asserts It may do some more, but not much more now that SET_LINENO is never generated. > -OO removes asserts and docstrings. > > I think this is an ideal time to revisit the purpose of -O for 2.4 or later. Hm, I would think we can wait until after 2.3 is released, lest we be tempted to "push one more feature into 2.3". > IMO the "vanilla" mode should be a "release" mode. Users should not > have to use a command-line option to gain "release" optimisations > such as asserts. I strongly disagree, and I expect most Python users would. I think this idea of a default harks back to the time when computers were slow and you would put on your special debugging hat only when you had a problem you couldn't solve by thinking about it. These days, often you don't care about the small gain in speed that -O or even -OO offers, because the program runs fast enough; but often you *do* care about the extra checks that assert offers. (I know I do.) > I would propose that we have the following modes for python to work in. > > 1. Release/Production mode (no command-line switch) > > - asserts are turned off > - well-tested/stable optimisations are included > - possibly additional things, such as not calling trace functions > > 2. Optimised mode (-O) > > - more experimental optimisations are included i.e. those that may > have performance improvements in some cases, but penalties in > others, etc > > - may possibly split this up so individual optimisations can be > turned on and off as required - this would leave -O by itself as > a no-op > > 3. Docstring elimination mode (-OO) > > - may be specified in addition to optimised mode - it does not > imply optimised mode > > 4. Debug mode (-D?) > > - will be the slowest mode - no optimisations - cannot be called > with either -O or -OO > - turns on asserts > - turns on trace functions > > I would see Debug mode being used by developers in unit tests, code > coverage, etc. 
If I'm right about how Python is used, most Python users are in debug mode most of the time. So this ought to be the default. > .pyc and .pyo files would need to know which optimisations they were > compiled with so that if they would be loaded again with the "wrong" > optimisations they would be re-compiled. That's what the difference between .pyc and .pyo was intended to convey; IMO this was a mistake. > Anyway, any thoughts, rebuttals, etc would be of interest. I'd like > to get some discussion before I create a PEP. I'm not convinced that we need anything, given the minimal effect of most currently available optimizations. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Tue Apr 29 01:57:50 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 28 Apr 2003 20:57:50 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 Message-ID: <20030429005750.GA17963@panix.com> On Mon, Apr 28, 2003, Raymond Hettinger wrote: > > From these proposals and the announcement earlier this week, > I sense a desire to have fewer peps and to more rapidly get > them out of the draft status. There's some truth to that.
OTOH, until the BDFL declares something to be an ex-PEP, I don't think BDFL rejection of a PEP means that it is forever dead -- it just requires substantial revision to resurrect it. The point of PEPs is to prevent rehashing of old subjects in the same way, not to prevent new ideas from restarting discussions. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 93 From aahz@pythoncraft.com Tue Apr 29 01:58:37 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 28 Apr 2003 20:58:37 -0400 Subject: [Python-Dev] Introduction :) In-Reply-To: <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> References: <20030428232557.GE21953@inkedmn.homelinux.org> <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> Message-ID: <20030429005837.GB17963@panix.com> On Mon, Apr 28, 2003, Brett Cannon wrote: > [Brett Kelly] >> >> Howdy folks, i'm new to this mailing list, thought i'd say hello and >> introduce myself. >> >> i'm Brett, > > Does this mean I have to append my last initial to my name to > differentiate? Or since I was here first can I just become the default > "Brett" and make Brett K. have to use his initial? =) "Explicit is better than implicit." ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it."
--Tim Peters on Python, 16 Sep 93 From tdelaney@avaya.com Tue Apr 29 01:59:41 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 10:59:41 +1000 Subject: [Python-Dev] Thoughts on -O Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC167@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > -OO removes asserts and docstrings. > > > > I think this is an ideal time to revisit the purpose of -O > > for 2.4 or later. > > Hm, I would think we can wait until after 2.3 is released, lest we be > tempted to "push one more feature into 2.3". I have absolutely *no* intention of pushing any of this for 2.3. Good lord no. For a start, these would be major feature changes ... > > IMO the "vanilla" mode should be a "release" mode. Users should not > > have to use a command-line option to gain "release" optimisations > > such as asserts. > > I strongly disagree, and I expect most Python users would. I think > this idea of a default harks back to the time when computers were slow > and you would put on your special debugging hat only when you had a > problem you couldn't solve by thinking about it. These days, often > you don't care about the small gain in speed that -O or even -OO > offers, because the program runs fast enough; but often you *do* care > about the extra checks that assert offers. (I know I do.) True. I'm ambivalent about that myself. But in that case, I would argue instead that there should not be any option to remove asserts. > > .pyc and .pyo files would need to know which optimisations they were > > compiled with so that if they would be loaded again with the "wrong" > > optimisations they would be re-compiled. > > That's what the difference between .pyc and .pyo was intended to > convey; IMO this was a mistake. Yep - I know this. I would actually suggest removing .pyo and simply have the info held in the .pyc. > > Anyway, any thoughts, rebuttals, etc would be of interest.
I'd like > > to get some discussion before I create a PEP. > > I'm not convinced that we need anything, given the minimal effect of > most currently available optimizations. One of my options is to create a PEP specifically to have it rejected. However, I think there are definitely a couple of useful things in here. In particular, it provides a path for introducing optimisations. One of the complaints I have seen recently is that all optimisations are being added to both paths. Perhaps this could be reduced to a process PEP with the following major points: 1. Any new optimisation must be introduced on the optimised path. 2. Optimisations may be promoted from the optimised path to the vanilla path at BDFL discretion. 3. Experimental optimisations in general will require at least one complete release before being promoted from the optimised path to the vanilla path. Tim Delaney From tim.one@comcast.net Tue Apr 29 02:49:52 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 28 Apr 2003 21:49:52 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <000301c30de8$5815bfe0$125ffea9@oemcomputer> Message-ID: <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> [Tim Delaney] >> What is the effect on peak memory usage over "average" programs? [Raymond Hettinger] > Since the amortized growth is the same, the effect is nil on average. > Special cases can be contrived with a specific number of records > where the memory use is doubled, but in general it is nearly unchanged > for average programs. That doesn't make sense. Dicts can be larger after the patch, but never smaller, so there's nothing opposing the "can be larger" part: on average, allocated address space must be strictly larger than before. Whether that *matters* on average to the average user is something we can answer rigorously just as soon as we find an average user with an average program <wink>. I'm not inclined to worry much about it.
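Tim's point that the patch can only ever allocate at least as much address space can be checked with a quick simulation (a sketch under assumed 2.3-era dictobject.c constants: minimum table size 8, resize when 2/3 full, new size the smallest power of two greater than used*factor):

```python
MINSIZE = 8

def allocated(n, factor):
    """Table slots allocated after n sequential inserts."""
    size, used = MINSIZE, 0
    for _ in range(n):
        used += 1
        if used * 3 >= size * 2:          # the 2/3 fill threshold
            size = MINSIZE
            while size <= used * factor:  # smallest power of two > used*factor
                size <<= 1
    return size

# Quadrupling the target (the patch) never allocates fewer slots
# than doubling (the status quo), for any number of inserts.
assert all(allocated(n, 4) >= allocated(n, 2) for n in range(1, 2000))
```

The two growth policies re-synchronize periodically (both land on the same power of two), so the quadrupled table is sometimes equal to and sometimes double the old one, but never smaller.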
>> This might be a worthwhile speedup on small dicts (up to a TBD >> number of entries) but not worthwhile for large dicts. > Actually, it helps large dictionaries even more than small dictionaries. > Collisions in large dicts are resolved through other memory probes > which are almost certain not to be in the current cache line. That part makes sense. Resizing a large dict is an expensive operation too. >> However, to add this capability in would of course add more code >> to a very common code path (additional test for current size to >> decide what factor to increase by). > Your intuition is exactly correct. All experiments to special-case > various sizes resulted in decreased performance because they added > a tiny amount to some of the most heavily exercised code in > Python. This part isn't clear: the changed code is in the body of an if() block that normally *isn't* entered (over an ever-growing dict's life, it's entered O(log(len(dict))) times, and independent of the number of dict lookups). The change cuts the number of times it's entered by approximately a factor of 2, but it isn't entered often even now. > Further, it results in an unpredictable branch which is > also not a good thing. Since the body of the loop isn't entered often, unpredictable one-shot branches within the body shouldn't have a measurable effect. The unpredictable branches when physically resizing the dict will swamp them regardless. The surrounding if-test continues to be predictable in the "branch taken" direction. What could be much worse is that stuffing code into the if-block bloats the code so much as to frustrate lookahead I-stream caching of the normal "branch taken and return 0" path: if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) { if (dictresize(mp, mp->ma_used*2) != 0) return -1; } return 0; Rewriting as if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) return 0; return dictresize(mp, mp->ma_used*2) ?
-1 : 0; would help some compilers generate better code for the expected path, and especially if the blob after "return 0;" got hairier. IOW, if fiddling with different growth factors at different sizes slowed things down, we have to look for something that affected the *normal* paths; it's hard to imagine that the guts of the if-block execute often enough to matter (discounting its call to dictresize(), which is an expensive routine). > I found out that timing dict performance was hard. > Capturing memory usage was harder. Of entry > space, space plus unused space, calls to PyMalloc, and > calls to the OS malloc, only the last is important, but > it depends on all kinds of things that are not easily > controlled. In my early Cray days, the Cray boxes were batch one-job-at-a-time, and all memory was real. If you had a CPU-bound program, it took the same number of nanoseconds each time you ran it. Benchmarking was hard then too <0.5 wink>. From tdelaney@avaya.com Tue Apr 29 03:40:41 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 12:40:41 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com> > From: Tim Peters [mailto:tim.one@comcast.net] > > That doesn't make sense. Dicts can be larger after the > patch, but never > smaller, so there's nothing opposing the "can be larger" > part: on average, > allocated address space must be strictly larger than before. > Whether that > *matters* on average to the average user is something we can answer > rigorously just as soon as we find an average user with an > average program > <wink>. I'm not inclined to worry much about it. That's what I was getting at. I know that (for example) most classes I create have less than 16 entries in their __dict__. With this change, each class instance would take (approx) twice as much memory for its __dict__.
I suspect that class instance __dict__ is the most common dictionary I use. > >> This might be a worthwhile speedup on small dicts (up to a TBD > >> number of entries) but not worthwhile for large dicts. > > > Actually, it helps large dictionaries even more than small > dictionaries. > > Collisions in large dicts are resolved through other memory probes > > which are almost certain not to be in the current cache line. > > That part makes sense. Resizing a large dict is an expensive > operation too. That's not what I meant. Most dictionaries are fairly small. Large dictionaries are common, but I doubt they are common enough to offset the potential memory loss from this patch. Currently if you go one over a threshold you have a capacity of 2*len(d)-1. With the patch this would change to 4*len(d)-1 - very significant for large dictionaries. Thus my consideration that it might be worthwhile for smaller dictionaries (depending on memory characteristics) but not for large dictionaries. Perhaps we need to add some internal profiling, so that "quickly-growing" dictionaries get larger reallocations ;) > Since the body of the loop isn't entered often, unpredictable one-shot > branches within the body shouldn't have a measurable effect. The > unpredictable branches when physically resizing the dict will > swamp them > regardless. The surrounding if-test continues to be > predictable in the > "branch taken" direction. I didn't look at the surrounding code (bad Tim D - thwack!) but in this case I would not expect an appreciable performance loss from this. However, the fact that we're getting an appreciable performance *gain* from changes on this branch suggests that it might be slightly more vulnerable than expected (but should still be swamped by the resize).
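The capacity arithmetic above can be made concrete (a sketch; `size_after` is a made-up helper assuming the smallest-power-of-two-above-`used*factor` policy of dictresize(), with minimum size 8):

```python
MINSIZE = 8

def size_after(used, factor):
    """Table size chosen when a dict with `used` keys crosses the
    2/3-fill threshold and is resized to hold used*factor entries."""
    size = MINSIZE
    while size <= used * factor:
        size <<= 1
    return size

# A small "class instance" dict that has just crossed its first
# threshold (6 keys in an 8-slot table): doubling grows it to 16
# slots, quadrupling to 32 -- roughly the 2*len(d) vs 4*len(d)
# capacities being debated.
assert size_after(6, 2) == 16
assert size_after(6, 4) == 32
```

For a large dict the ratio is the same, which is why the absolute number of slots at stake grows with the table.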
> What could be much worse is that stuffing code into the > if-block bloats the > code so much as to frustrate lookahead I-stream caching of the normal > "branch taken and return 0" path: > > if (mp->ma_used > n_used && mp->ma_fill*3 >= > (mp->ma_mask+1)*2) { > if (dictresize(mp, mp->ma_used*2) != 0) > return -1; > } > return 0; > > Rewriting as > > if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) > return 0; > > return dictresize(mp, mp->ma_used*2) ? -1 : 0; > > would help some compilers generate better code for the > expected path, and > especially if the blob after "return 0;" got hairier. I find that considerably easier to read in any case ;) Cheers. Tim Delaney From python@rcn.com Tue Apr 29 04:15:50 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 23:15:50 -0400 Subject: [Python-Dev] Dictionary tuning References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com> Message-ID: <003b01c30dfd$a2d28b20$920aa044@oemcomputer> [Tim Peters] > Dicts can be larger after the > patch, but never > smaller, so there's nothing opposing the "can be larger" > part: on average, allocated address space must be strictly larger than before. I think of the resize intervals as steps on a staircase. My patch eliminates the even numbered stairs. The average logarithmic slope of the staircase doesn't change, there are just fewer discrete steps. Also, the height of the staircase doesn't change unless the top stair was even, in which case, another half step is added. [Tim Peters] > Resizing a large dict is an expensive operation too. Not only are there fewer resizes, but the cost of the operation becomes cheaper because it takes less time to load a sparse dictionary than one that is more dense. [Tim Peters] > Whether that *matters* on average to the average user is something > we can answer > rigorously just as soon as we find an average user with an > average program > <wink>.
I'm not inclined to worry much about it. Me either; I suspect that it is rare to find a stable application that is functioning just fine and consuming nearly all memory. Sooner or later, some change in data, hardware, OS, or script would push it over the edge. [Timothy Delaney] > That's what I was getting at. I know that (for example) most > classes I create have less than 16 entries in their __dict__. > With this change, each class instance would take (approx) twice > as much memory for its __dict__. I suspect that class instance > __dict__ is the most common dictionary I use. Those dicts would also be the ones benefitting from the patch. Their density would be halved; resulting in fewer collisions, improved search times, and better cache performance. [Timothy Delaney] > Perhaps we need to add some internal profiling, so that > "quickly-growing" dictionaries get larger reallocations ;) I came up with this patch a couple of months ago and have since tried every tweak I could think of (apply to this size dict but not that one, etc) but found nothing that survived a battery of application benchmarks. Have you guys tried out the patch? I'm very interested in getting results from different benchmarks, processors, cache sizes, and various operating systems. sparse-is-better-than-dense-ly yours, Raymond (currently, the only one.
unlike two Tims, two Bretts, two Jacks and a Fredrik distinct from Fred) ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# From tdelaney@avaya.com Tue Apr 29 04:34:06 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 13:34:06 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1C5@au3010avexu1.global.avaya.com> > From: Raymond Hettinger [mailto:python@rcn.com] >=20 > [Timothy Delaney] > > That's what I was getting at. I know that (for example) most > > classes I create have less that 16 entries in their __dict__. > > With this change, each class instance would take (approx) twice > > as much memory for its __dict__. I suspect that class instance > > __dict__ is the most common dictionary I use. >=20 > Those dicts would also be the ones benefitting from the patch. > Their density would be halved; resulting in fewer collisions, > improved search times, and better cache performance. No question that they would benefit. The question is whether the benefit outweighs the possible penalties. Of course, we can't evaluate that until we've got some data ... > [Timothy Delaney] > > Perhaps we need to add some internal profiling, so that > > "quickly-growing" dictionaries get larger reallocations ;) >=20 > I came up with this patch a couple of months ago and have > since tried every tweak I could think of (apply to this size > dict but not that one, etc) but found nothing that survived > a battery of application benchmarks. Note the smiley. 
Not at all intended seriously - the extra complication would almost certainly eliminate any possible performance gains.

> Have you guys tried out the patch? I'm very interested in
> getting results from different benchmarks, processors,
> cache sizes, and various operating systems.

If I can find the time I will. We're in crunch time on my project at the moment ... I'm somewhat over-allocated :(

In case I forgot to mention it - I like the ideas in the patch, and really like the performance improvement. But with the things I'm doing at the moment, memory is proving more of a bottleneck than performance ... once we start hitting virtual memory, you can forget about a 5% performance improvement ...

Tim Delaney

From goodger@python.org Tue Apr 29 04:53:22 2003
From: goodger@python.org (David Goodger)
Date: Mon, 28 Apr 2003 23:53:22 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
In-Reply-To: <000101c30de8$57758840$125ffea9@oemcomputer>
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer>
Message-ID: <3EADF732.7020300@python.org>

[David Goodger]
>> I propose adding the following text:
>>
>> ... The BDFL may also initiate a PEP review, first notifying the
>> PEP author(s).

[Raymond Hettinger]
> Periodic updates to the parade-of-peps serve equally well.

Except that Guido doesn't have time to update the PEP Parade. He told me so when I asked a few days ago.

> From these proposals and the announcement earlier this week,
> I sense a desire to have fewer PEPs and to more rapidly get
> them out of the draft status.

Not quite. The desire is not to cull the weak, but to promote the strong. The desire is to change already-implemented and implicitly-accepted PEPs from "Status: Draft" to "Status: Accepted" or "Status: Final". See the "Accepted PEPs?" thread from a few days ago; 9 "Draft" but already-implemented-for-2.3 PEPs were identified.
Their status lines ought to be changed, but the wording as written implies that Guido and the PEP editors have to wait for authors to ask for a review. We should be able to be more proactive. New proposed addition:

... For PEPs that are pre-determined to be acceptable (e.g., their implementation has already been checked in), the BDFL may also initiate a PEP review, first notifying the PEP author(s) and giving them a chance to make revisions.

It is implied that Guido himself doesn't necessarily do all the notifying or initiating, but may delegate to his loyal serfs. ;-)

> If someone wants
> to do a write-up and weather the ensuing firestorm, that is
> enough for me. If it has to sit for a few years before becoming
> obviously good or bad, that's fine too.
>
> Also, some ideas need time.

Good points; I agree completely. I have no problem leaving doomed (or currently perceived as doomed) PEPs to remain in limbo until the author(s) choose to seal their fate.

>> For a PEP to be accepted it must meet certain minimum criteria. It
>> must be a clear description of the proposed enhancement. The
>> enhancement must represent a net improvement. The implementation,
>> if applicable, must be solid and must not complicate the
>> interpreter unduly. Finally, a proposed enhancement must be
>> "pythonic" in order to be accepted by the BDFL. (However,
>> "pythonic" is an imprecise term; it may be defined as whatever is
>> acceptable to the BDFL. This logic is intentionally circular.)

Clarification: this paragraph addresses a completely separate issue from the proposed addition above. I have sensed some confusion as to what constitutes an acceptable PEP, and a hand-waving blurb giving a vague definition seems useful. Of course, it would be great if we could make the text more precise, but vagueness may have value here. Comments on the wording are welcome.

> IOW, I like the process as it stands and am -1 on the
> amendment.
> It should be up to the PEP author to
> decide when to stick his head in the guillotine to
> see what happens :)

What's your opinion now, post-clarifications? Please treat the two parts separately.

--
David Goodger

From python@rcn.com Tue Apr 29 05:03:43 2003
From: python@rcn.com (Raymond Hettinger)
Date: Tue, 29 Apr 2003 00:03:43 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org>
Message-ID: <000b01c30e04$53ae5d60$920aa044@oemcomputer>

> [David Goodger]
> The desire is not to cull the weak, but to promote the
> strong. The desire is to change already-implemented and
> implicitly-accepted PEPs from "Status: Draft" to "Status: Accepted"
> or "Status: Final".

That's a good goal.

> Good points; I agree completely. I have no problem leaving doomed (or
> currently perceived as doomed) PEPs to remain in limbo until the
> author(s) choose to seal their fate.

Great. I have one of those ;)

> >> For a PEP to be accepted it must meet certain minimum criteria. It
> >> must be a clear description of the proposed enhancement. The
> >> enhancement must represent a net improvement. The implementation,
> >> if applicable, must be solid and must not complicate the
> >> interpreter unduly. Finally, a proposed enhancement must be
> >> "pythonic" in order to be accepted by the BDFL. (However,
> >> "pythonic" is an imprecise term; it may be defined as whatever is
> >> acceptable to the BDFL. This logic is intentionally circular.)
>
> Clarification: this paragraph addresses a completely separate issue from
> the proposed addition above. I have sensed some confusion as to what
> constitutes an acceptable PEP, and a hand-waving blurb giving a vague
> definition seems useful.

That's reasonable. I'm not sure it would have filtered out anything except an April Fools PEP.

> What's your opinion now, post-clarifications? Please treat the two
> parts separately.
+1

+0

BTW, thanks for your work as PEP editor. Keep it up,

Raymond Hettinger

From goodger@python.org Tue Apr 29 05:14:29 2003
From: goodger@python.org (David Goodger)
Date: Tue, 29 Apr 2003 00:14:29 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
In-Reply-To: <000b01c30e04$53ae5d60$920aa044@oemcomputer>
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> <000b01c30e04$53ae5d60$920aa044@oemcomputer>
Message-ID: <3EADFC25.7000906@python.org>

Raymond Hettinger wrote:
> I'm not sure it would have filtered out anything
> except an April Fools PEP.

That one was its own reward. :-)

--
David Goodger

From tim.one@comcast.net Tue Apr 29 05:22:26 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 29 Apr 2003 00:22:26 -0400
Subject: [Python-Dev] Dictionary tuning
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHDEEAB.tim.one@comcast.net>

[Delaney, Timothy C]
> That's what I was getting at. I know that (for example) most
> classes I create have less than 16 entries in their __dict__.
> With this change, each class instance would take (approx) twice
> as much memory for its __dict__. I suspect that class instance
> __dict__ is the most common dictionary I use.

Do they have fewer than 6 entries? Dicts with 5 or fewer entries don't change size at all (an "empty dict" comes with room for 5 entries).

Surprise <wink>: in many apps, the most frequent use is dicts created to hold keyword arguments at call sites. This is under the covers so you're not normally aware of it. Those almost always hold less than 6 entries; except in apps where they don't. But they're usually short-lived too (not surviving the function call they're created for).

> That's not what I meant. Most dictionaries are fairly small.
> Large dictionaries are common, but I doubt they are common enough
> to offset the potential memory loss from this patch.
> Currently if
> you go one over a threshold you have a capacity of 2*len(d)-1.

Two-thirds of which is empty space right after resizing, BTW.

> With the patch this would change to 4*len(d)-1 - very significant
> for large dictionaries.

I don't know that it is. One dict slot consumes 12 bytes on 32-bit boxes, and slots are allocated contiguously so there's no hidden malloc overhead per slot. I hope a dict with a million slots counts as large, but that's "only" 12MB for slot space. When it gets too large to fit in RAM, that's deadly to performance; I've reached that point many times in experimental code, but those were lazy algorithms to an extreme. So I'm more worried about apps with several large dicts than about apps with one huge dict.

> ...
> I didn't look at the surrounding code (bad Tim D - thwack!) but
> in this case I would not expect an appreciable performance loss
> from this. However, the fact that we're getting an appreciable
> performance *gain* from changes on this branch suggests that it
> might be slightly more vulnerable than expected (but should still be
> swamped by the resize).

There's always more than one effect from a change. Raymond explained that large dict performance is boosted due to fewer collisions, and that makes perfect sense (every probe in a large dict is likely to be a cache miss). It doesn't make sense that fiddling the code inside the if-block slows anything, unless perhaps it's an unfortunate I-stream cache effect slowing the normal (if-block not entered) case. When you're looking at out-of-cache code, second- and third-order causes are often the whole story.
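Tim's 12MB figure is quick arithmetic to verify. A sketch (the 12 bytes assume the 32-bit slot layout he describes; the helper name is made up):

```python
SLOT_BYTES = 12  # one dict slot on a 32-bit box: hash + key + value pointers

def slot_space_mb(n_slots):
    """Slot-table size in MB for a dict with n_slots slots."""
    return n_slots * SLOT_BYTES / float(2 ** 20)

# 2**20 (about a million) slots cost 12 MB of slot space; if the *4
# schedule leaves such a dict with twice the slots, that is 24 MB.
print(slot_space_mb(2 ** 20), slot_space_mb(2 ** 21))
```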
From greg@cosc.canterbury.ac.nz Tue Apr 29 05:37:15 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Apr 2003 16:37:15 +1200 (NZST) Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304251948.26774.fincher.8@osu.edu> Message-ID: <200304290437.h3T4bFl09594@oma.cosc.canterbury.ac.nz> Jeremy Fincher <fincher.8@osu.edu>: > It's a minor quibble to be sure, but os.walk doesn't really describe what > exactly it's doing. How about os.walkdir (by analogy with os.listdir). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From jack@performancedrivers.com Tue Apr 29 05:36:36 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 29 Apr 2003 00:36:36 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <000001c30de8$57219be0$125ffea9@oemcomputer>; from python@rcn.com on Mon, Apr 28, 2003 at 07:50:14PM -0400 References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <20030428181920.O15881@localhost.localdomain> <000001c30de8$57219be0$125ffea9@oemcomputer> Message-ID: <20030429003636.Q15881@localhost.localdomain> On Mon, Apr 28, 2003 at 07:50:14PM -0400, Raymond Hettinger wrote: > [jackdiederich] > > You wouldn't have some created some handy tables of 'typical' dictionary > > usage, would you? They would be interesting in general, but very nice > > for the PEPs doing dict optimizations for symbol tables in particular. > > That path proved fruitless. I studied the usage patterns in about > a dozen of my apps and found that there is no such thing as typical. > Instead there are many categories of dictionary usage. 
[symbol table amongst them] A good proj would be breaking out the particular cases of dictionary usage and using the right dict for the right job. Module symbol tables are dicts that have a different 'typical' usage than dicts in general. They are likely even regular enough in usage to actually _have_ a typical usage (no finger quotes). I've looked at aliasing dicts used in symbol (builtin, module, local) tables so they could be specialized from generic dicts in the source and I get lost in the nuances (esp frame stuff). If someone who did know the code well enough would make the effort it would allow those of us who are familiar but not intimate with the source to take a shot at optimizing a particular use case (symbol table dicts). Alas, people who are that familiar aren't likely to do it, they have more important things to do. -jackdied ps, I've always wanted to try ternary trees as symbol tables. They have worse than O(1) lookup, but in real life are probably OK for symbol tables. They nest beautifully and do cascading caching decently. From tim.one@comcast.net Tue Apr 29 05:57:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 29 Apr 2003 00:57:58 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304290437.h3T4bFl09594@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> [Greg Ewing] > How about os.walkdir (by analogy with os.listdir). I'm -0 on bothering to change the name, but, if we have to, I'm +1 on walkdir (for the reason Greg gives there). From martin@v.loewis.de Tue Apr 29 06:56:51 2003 From: martin@v.loewis.de (Martin v. 
Löwis)
Date: 29 Apr 2003 07:56:51 +0200
Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled
In-Reply-To: <3EADAC3F.6020802@cybertec.com.au>
References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> <3EADAC3F.6020802@cybertec.com.au>
Message-ID: <m3ade9alss.fsf@mira.informatik.hu-berlin.de>

Chris Johns <cjohns@cybertec.com.au> writes:

> > So I think the configure test should be changed to define HAVE_PTON
> > only if all prerequisites of its usage are met (or the entire
> > function should be hidden if IPv6 is disabled).
>
> It would make Python more robust, but this is a mistake on my part.

It's a trade-off between maintainability and robustness, and in this specific case, we favoured maintainability over robustness: We simply assume that the code ought to compile on all systems that have pton(3). It might be that this assumption is wrong. If so, we need to consider whether we want to support the systems for which it is wrong, in which case my proposal would be to strengthen the pton test (thus ignoring the buggy pton from the platform).

In this case, I read your message that it really is your fault and not the system's (for hand-editing pyconfig.h); if you did indeed run autoconf to determine presence of pton, I'd encourage you to contribute a patch that analyses pton in more detail.

Regards,
Martin

From dberlin@dberlin.org Tue Apr 29 07:48:01 2003
From: dberlin@dberlin.org (Daniel Berlin)
Date: Tue, 29 Apr 2003 02:48:01 -0400
Subject: [Python-Dev] Thoughts on -O
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC167@au3010avexu1.global.avaya.com>
Message-ID: <854D4DF4-7A0E-11D7-9180-000A95A34564@dberlin.org>

>
> Yep - I know this. I would actually suggest removing .pyo and simply
> have the info held in the .pyc.
>
> >>> Anyway, any thoughts, rebuttals, etc would be of interest.
I'd like >>> to get some discussion before I create a PEP. >> >> I'm not convinced that we need anything, given the minimal effect of >> most currently available optimizations. > > One of my options is to create a PEP specifically to have it rejected. > > However, I think there are definitely a couple of useful things in > here. In particular, it provides a path for introducing optimisations. > One of the complaints I have seen recently is that all optimisations > are being added to both paths. > > Perhaps this could be reduced to a process PEP with the following > major points: > > 1. Any new optimisation must be introduced on the optimised path. > > 2. Optimisations may be promoted from the optimised path to the > vanilla path at BDFL discretion. > > 3. Experimental optimisations in general will required at least one > complete release before being promoted from the optimised path to the > vanilla path. Before everyone gets too far, are there actually concrete separate optimizations we are talking about here? Or is this just "in case someone comes up with an optimization that helps" I'm a compiler hacker by hobby and job (Technically, i'm a 2nd year law student by trade, who works for IBM's TJ Watson Research Center as a GCC Hacker), and i've looked at most optimizing python compilers that have existed in the past 4-5 years (geez, have i been lurking on python-dev that long. Wow. I used to actively contribute now and then, stopped for a few years). The only one that makes any appreciable difference is Psyco (unsurprising, actually), and measurements i did (and i think this was the idea behind it) show this is because of two things 1. Removal of python overhead (ie bytecode execution vs direct machine code) 2. Removal of temporary objects (which is more powerful than it sounds, because of how it's done. Psyco simply doesn't emit code to compute something at runtime until forced. it does as much as it can at compile time, when possible. 
In this way, one can view it as a very powerful symbolic execution engine)

In terms of improvements, starting with Psyco as your base (to be honest, doing something completely different isn't a smart idea. He's got the right idea, there's no other real way you are going to get more speed), the best you can do are the following:

1. Improve the generated machine code (IE better register allocation, better scheduling, a peephole optimizer). As for register allocation, I've never measured how often Psyco spills right now. Some platforms are all about spill code generation (x86), others are more about coalescing registers.

2. Teach it how to execute more operations at compile time (IE improve the symbolic execution engine)

3. Improve the profiling done at runtime.

That's about all you can do. I've lumped all classical compiler optimizations into "improve generated machine code", since that is where you'd be able to do them (unless you want to introduce a new middle IR, which will complicate matters greatly, and probably not significantly speed things up).

Number 1 can become expensive quickly for a JIT, for rapidly diminishing gains. Number 2 has the natural limit that once you've taught it how to virtualize every base python object and operation, it should be able to compute everything not in a c module given the input, and your limit becomes how good at profiling you are to choose what to specialize. Number 3 doesn't become important until you start hitting negative gains due to choosing the wrong functions to specialize.

Any useful thing not involving specialization is some combination of

1. Not going to be applicable without specialization and compilation to machine code (I can think of no useful optimization that will make a significant difference at the python code level, that wouldn't be easier and faster to do at the machine code level. Python does not give enough guarantees to make it better to optimize Python bytecode).

2.
Already covered by the way it does compilation. 3. Too expensive. Couple all of this with the fact that there are a limited number of operations performed at the python level already that aren't taken care of by making a better symbolic execution engine. In short, I believe if you want to seriously talk about "adding this optimization", or "adding that optimization", that time would be better served doing something like psyco (if it's not acceptable or can't be made acceptable), where your main thing was specialization of functions, and compilation to machine code of the specialized functions. These are your only real options for speeding up python code. Diddling around at the python source or bytecode level will buy you *less* (since you still have the interpreter overhead), and be just as difficult (since you will still need to specialize to be able to know the types involved). If you want something to look at besides Psyco, see LLVM's runtime abilities (http://llvm.cs.uiuc.edu). It might also make a good backend machine code optimizer replacement for Psyco's hard-coded x86 output, because it can exploit type information. To put all of this in context, i'm assuming you aren't looking for 5-10% gains, total. Instead, i'm assuming you are looking for very significant speedups (100% or greater). If you only want 5-10%, that's easy to do at just the bytecode level, but you eventually hit the limit of the speed of bytecode execution, and from experience, you will hit it rather quickly. --Dan From mal@lemburg.com Tue Apr 29 08:08:13 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 29 Apr 2003 09:08:13 +0200 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <200304282206.h3SM6md20118@odiug.zope.com> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> Message-ID: <3EAE24DD.2070409@lemburg.com> Guido van Rossum wrote: >>I've experimented with about a dozen ways to improve dictionary >>performance and found one that benefits some programs by up to >>5% without hurting the performance of other programs by more >>than a single percentage point. >> >>It entails a one line change to dictobject.c resulting in a new >>schedule of dictionary sizes for a given number of entries: Perhaps you could share that change ? Or is it on SF somewhere ? >>Number of Current size Proposed size >>Filled Entries of dictionary of dictionary >>-------------- ------------- ------------- >>[-- 0 to 5 --] 8 8 >>[-- 6 to 10 --] 16 32 >>[-- 11 to 21 --] 32 32 >>[-- 22 to 42 --] 64 128 >>[-- 43 to 85 --] 128 128 >>[-- 86 to 170 --] 256 512 >>[-- 171 to 341 --] 512 512 > > > I suppose there's an "and so on" here, right? I wonder if for > *really* large dicts the space sacrifice isn't worth the time saved? Once upon a time, when I was playing with inlining dictionary tables (now part of the dictionary implementation thanks to Tim), I found that optimizing dictionaries to have a table size 8 gave the best results. Most dictionaries in a Python application have very few entries (and most of them were instance dictionaries at the time -- not sure whether that's changed). Another result of my experiments was that reducing the number of resizes made a big difference. To get some more useful numbers, I suggest to instrument Python to display the table size of dictionaries and the number of resizes necessary to make them that big. You should also keep a good eye on the overall process size. 
I believe that the reason for the speedups you see is that cache sizes and processor optimizations have changed since the time the current resizing implementation was chosen, so maybe we ought to rethink the parameters:

* minimum table size
* first three resize steps

I don't think that large dictionaries should become more sparse -- that's just a waste of memory.

>> The idea is to lower the average sparseness of dictionaries (by
>> 0% to 50% of their current sparseness). This results in fewer
>> collisions, faster collision resolution, fewer memory accesses,
>> and better cache performance. A small side-benefit is halving
>> the number of resize operations as the dictionary grows.
>
> I think you mean "raise the average sparseness" don't you?
> (The more sparse something is, the more gaps it has.)
>
> I tried the patch with my new favorite benchmark, startup time for
> Zope (which surely populates a lot of dicts :-). It did give about
> 0.13 seconds speedup on a total of around 3.5 seconds, or almost 4%
> speedup.

--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium: 56 days left

From mal@lemburg.com Tue Apr 29 08:11:21 2003
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 29 Apr 2003 09:11:21 +0200
Subject: [Python-Dev] Thoughts on -O
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com>
Message-ID: <3EAE2599.8020702@lemburg.com>

Delaney, Timothy C (Timothy) wrote:
> Was doing some thinking in the shower this morning, and came up with
> some ideas for specifying optimisation. These are currently quite
> nebulous thoughts ...
> We have the current situation:
>
> -O only removes asserts
> -OO removes asserts and docstrings.

That's true, but not what they actually mean:

-O ... optimize the byte code without changing semantics
-OO ... optimize even further, slight changes in semantics are allowed
(note that some tools rely on the availability of doc-strings)

Rather than adding more options, we should think about more optimizations to add ;-)

--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium: 56 days left

From tdelaney@avaya.com Tue Apr 29 08:16:01 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 29 Apr 2003 17:16:01 +1000
Subject: [Python-Dev] Thoughts on -O
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com>

> From: Daniel Berlin [mailto:dberlin@dberlin.org]
>
> > 1. Any new optimisation must be introduced on the optimised path.
> >
> > 2. Optimisations may be promoted from the optimised path to the
> > vanilla path at BDFL discretion.
> >
> > 3. Experimental optimisations in general will require at least one
> > complete release before being promoted from the optimised path to the
> > vanilla path.
>
> Before everyone gets too far, are there actually concrete separate
> optimizations we are talking about here?
> Or is this just "in case someone comes up with an optimization that
> helps"?

One I had in mind would be the CALL_ATTR patch, which Guido explicitly mentioned as having been implemented on the main path, not on the optimised path, and pointed out that if it had been implemented only on the optimised path a number of issues with it would have been discovered much earlier.
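The -O/-OO behaviour being debated above is easy to see from code. A sketch (the function is made up for illustration):

```python
def checked_add(a, b):
    """Add two integers, with a sanity check that vanishes under -O."""
    # Under "python -O" the assert below is not compiled in at all,
    # and under "python -OO" the docstring above is stripped as well.
    assert isinstance(a, int) and isinstance(b, int), "ints only"
    return a + b

print(checked_add(2, 3))  # 5 at any optimisation level
```

Run normally, the assert fires on bad input; run with -O, `checked_add("x", "y")` would happily concatenate strings instead.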
> The only one that makes any appreciable difference is Psyco

Indeed. I would love Psyco to eventually be part of Python, but suspect it will only be so in the PyPy implementation.

> To put all of this in context, i'm assuming you aren't looking for
> 5-10% gains, total. Instead, i'm assuming you are looking for very
> significant speedups (100% or greater).

Many of the recent optimisation patches have involved 5% speedups in some cases. If they all worked without impacting each other (cache effects, etc.) we could probably approach 50% improvement in some cases.

I have no problem if someone can get a 5% speedup across the board without introducing incredibly hairy code. I would like such optimisations to eventually become part of the main path - but I would prefer that they not become part of the main path until they have been exposed to many different environments - assuming the implementor or someone else can't come up with one or more cases where they become a pessimisation.

> If you only want 5-10%, that's easy to do at just the bytecode level,
> but you eventually hit the limit of the speed of bytecode execution,
> and from experience, you will hit it rather quickly.

Indeed. Every attempt so far has yielded a 5% improvement or less standalone, and most have resulted in worse performance when combined.

Tim Delaney

From tdelaney@avaya.com Tue Apr 29 08:25:50 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 29 Apr 2003 17:25:50 +1000
Subject: [Python-Dev] Dictionary tuning
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com>

> From: Tim Peters [mailto:tim.one@comcast.net]
>
> [Delaney, Timothy C]
> > That's what I was getting at. I know that (for example) most
> > classes I create have less than 16 entries in their __dict__.
> > With this change, each class instance would take (approx) twice
> > as much memory for its __dict__.
> > I suspect that class instance
> > __dict__ is the most common dictionary I use.
>
> Do they have fewer than 6 entries? Dicts with 5 or fewer entries don't
> change size at all (an "empty dict" comes with room for 5 entries).

No hard and fast data here. That would require grovelling through code ;)

I was making a quick estimate. Off the top of my head, most classes I create have ...

__init__
3-5 other methods
3-5 instance attributes

Hmm - that would only be 3-5 instance __dict__ entries, with 4-6 class __dict__ entries, correct? I was forgetting that methods are put into the class __dict__, not the instance __dict__.

Bah - it's too late. It's the end of the day, and I've barely managed to get 2 hours of real work done.

Tim Delaney

From python@rcn.com Tue Apr 29 09:12:52 2003
From: python@rcn.com (Raymond Hettinger)
Date: Tue, 29 Apr 2003 04:12:52 -0400
Subject: [Python-Dev] Dictionary tuning up to 100,000 entries
References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <3EAE24DD.2070409@lemburg.com>
Message-ID: <000201c30e27$a89bc4c0$125ffea9@oemcomputer>

[Raymond]
> >> I've experimented with about a dozen ways to improve dictionary
> >> performance and found one that benefits some programs by up to
> >> 5% without hurting the performance of other programs by more
> >> than a single percentage point.
> >>
> >> It entails a one line change to dictobject.c resulting in a new
> >> schedule of dictionary sizes for a given number of entries:

[Marc-Andre Lemburg]
> Perhaps you could share that change ? Or is it on SF somewhere ?

It was in the original post. But SF is better, so I just loaded it to the patch manager: www.python.org/sf/729395

[GvR]
> > I suppose there's an "and so on" here, right? I wonder if for
> > *really* large dicts the space sacrifice isn't worth the time saved?

Due to the concerns raised about massive dictionaries, I revised the patch to switch back to the old growth schedule for sizes above 100,000 entries (approx 1.2 Mb).
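The revised growth rule can be sketched in one line. This is a sketch of the updated patch's intent, not its literal C, and the exact boundary handling is an assumption:

```python
def resize_target(used):
    """Sketch of the revised resize target: quadruple small and medium
    dicts, but fall back to the old doubling schedule above the
    100,000-entry cutoff so massive dicts avoid the extra memory cost."""
    return used * (4 if used <= 100000 else 2)

print(resize_target(50), resize_target(200000))
```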
[Mark Lemburg] > I believe that the reason for the speedups you see is > that cache sizes and processor optimizations have changed > since the time the current resizing implementation was chosen, > so maybe we ought to rethink the parameters: > > * minimum table size > * first three resize steps I've done dozens of experiments with changing these parameters and changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7) but found that what helped some applications would hurt others. The current tuning remains fairly effective. Changing the resize step from *2 to *4 was the only alteration that yielded across-the-board improvements. Raymond Hettinger From mal@lemburg.com Tue Apr 29 10:48:03 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 29 Apr 2003 11:48:03 +0200 Subject: [Python-Dev] Dictionary tuning upto 100,000 entries In-Reply-To: <000201c30e27$a89bc4c0$125ffea9@oemcomputer> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <3EAE24DD.2070409@lemburg.com> <000201c30e27$a89bc4c0$125ffea9@oemcomputer> Message-ID: <3EAE4A53.2030005@lemburg.com> Raymond Hettinger wrote: >>I believe that the reason for the speedups you see is >>that cache sizes and processor optimizations have changed >>since the time the current resizing implementation was chosen, >>so maybe we ought to rethink the parameters: >> >>* minimum table size >>* first three resize steps > > > I've done dozens of experiments with changing these parameters > and changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7) > but found that what helped some applications would hurt others. > The current tuning remains fairly effective. Changing the resize > step from *2 to *4 was the only alteration that yielded across > the board improvements. Ok, but I still fear that using *4 will cause too much memory bloat for dicts which have more than 10-30 entries.
If you instrument Python you'll find that for typical applications, most dictionaries will have only a few entries. Tuning the implementation to those findings is what you really want to do :-) If you take e.g. Zope, what difference in memory consumption does your patch make? -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 56 days left From guido@python.org Tue Apr 29 11:19:18 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:19:18 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: "Your message of Mon, 28 Apr 2003 21:49:52 EDT." <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> Message-ID: <200304291019.h3TAJI517748@pcp02138704pcs.reston01.va.comcast.net> > What could be much worse is that stuffing code into the if-block > bloats the code so much as to frustrate lookahead I-stream caching > of the normal "branch taken and return 0" path: > > if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) { > if (dictresize(mp, mp->ma_used*2) != 0) > return -1; > } > return 0; > > Rewriting as > > if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) > return 0; > > return dictresize(mp, mp->ma_used*2) ? -1 : 0; That last line might as well be return dictresize(mp, mp->ma_used*2); /* Or *4, per Raymond */ Which reminds me, there are two other places where dictresize() is called; shouldn't those be changed to the new fill factor? All in all I think I'm mildly in favor of this change.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 11:36:12 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:36:12 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Tue, 29 Apr 2003 00:57:58 EDT." <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> Message-ID: <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> > [Greg Ewing] > > How about os.walkdir (by analogy with os.listdir). [Tim] > I'm -0 on bothering to change the name, but, if we have to, I'm +1 on > walkdir (for the reason Greg gives there). I'm -1 on changing the name. os.walk() it is. Short-n-sweet, --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 11:46:58 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:46:58 -0400 Subject: [Python-Dev] Thoughts on -O In-Reply-To: "Your message of Tue, 29 Apr 2003 17:16:01 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com> Message-ID: <200304291046.h3TAkwt17905@pcp02138704pcs.reston01.va.comcast.net> [Tim Delaney] > One I had in mind would be the CALL_ATTR patch, which Guido > explicitly mentioned as having been implemented on the main > path, not on the optimised path, and pointed out that if it > had been implemented only on the optimised path a number of > issues with it would have been discovered much earlier. Correction: I meant to say that about the optimization of expressions of the form '-' NUMBER # e.g. -1 This was buggy for years. I'm not aware of problems with CALL_ATTR (which exists only as a patch on SF) except that it's not always a speedup.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 12:04:27 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 07:04:27 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: "Your message of Tue, 29 Apr 2003 17:25:50 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com> Message-ID: <200304291104.h3TB4R618388@pcp02138704pcs.reston01.va.comcast.net> [Tim Delaney] > Off the top of my head, most classes I create have ... > > __init__ > 3-5 other methods > 3-5 instance attributes > > Hmm - that would only be 3-5 instance __dict__ entries, with > 4-6 class __dict__ entries, correct? > > I was forgetting that methods are put into the instance __dict__. No, they're not. > Bah - it's too late. It's the end of the day, and I've barely > managed to get 2 hours real work done. That might explain your recent goofs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 12:54:23 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 07:54:23 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: "Your message of Mon, 28 Apr 2003 20:57:50 EDT." <20030429005750.GA17963@panix.com> References: <20030429005750.GA17963@panix.com> Message-ID: <200304291154.h3TBsNc18815@pcp02138704pcs.reston01.va.comcast.net> > There's some truth to that. OTOH, until the BDFL declares something > to be an ex-PEP, I don't think BDFL rejection of a PEP means that it > is forever dead -- it just requires substantial revision to > resurrect it. The point of PEPs is to prevent rehashing of old > subjects in the same way, not to prevent new ideas from restarting > discussions. In general, it's better to create a new PEP if you have a new idea. 
The only reason to revive a rejected PEP would be if the reason for rejecting the specific idea put forth in the PEP becomes invalid. A PEP should propose a specific solution. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Tue Apr 29 14:38:05 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 08:38:05 -0500 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: <3EADF732.7020300@python.org> References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> Message-ID: <16046.32829.920029.296191@montanaro.dyndns.org> I'd like to move PEP 305 (CSV) along and intend to bring the text up-to-date w.r.t. the current implementation, however the code which implements CSV reading and writing doesn't currently handle Unicode. Given that there is a module checked into CSV, what should the PEP's status be, "draft" or "accepted" or something else? Skip From goodger@python.org Tue Apr 29 14:43:46 2003 From: goodger@python.org (David Goodger) Date: Tue, 29 Apr 2003 09:43:46 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: <16046.32829.920029.296191@montanaro.dyndns.org> References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> <16046.32829.920029.296191@montanaro.dyndns.org> Message-ID: <3EAE8192.4030803@python.org> Skip Montanaro wrote: > I'd like to move PEP 305 (CSV) along and intend to bring the text up-to-date > w.r.t. the current implementation, however the code which implements CSV > reading and writing doesn't currently handle Unicode. Given that there is a > module checked into CSV, CVS? > what should the PEP's status be, "draft" or > "accepted" or something else? "Accepted" for now, becoming "Final" when the implementation is finished. Assuming my first proposed PEP 1 amendment is okayed, Guido has already indicated that PEP 305 is to be accepted. 
-- David Goodger From skip@pobox.com Tue Apr 29 16:11:09 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 10:11:09 -0500 Subject: [Python-Dev] Dictionary tuning Message-ID: <16046.38413.487331.327698@montanaro.dyndns.org> >> Have you guys tried out the patch? I'm very interested in getting >> results from different benchmarks, processors, cache sizes, and >> various operating systems. Tim> If I can find the time I will. We're in crunch time on my project Tim> at the moment ... I'm somewhat over-allocated :( Can't you just head over to Dunkin' Donuts and resize? ;-) Skip From fdrake@acm.org Tue Apr 29 16:15:54 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 29 Apr 2003 11:15:54 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <16046.38413.487331.327698@montanaro.dyndns.org> References: <16046.38413.487331.327698@montanaro.dyndns.org> Message-ID: <16046.38698.308785.590565@grendel.zope.com> Skip Montanaro writes: > Can't you just head over to Dunkin' Donuts and resize? ;-) Ooh, ooh! Count me in! ...er, oh, I guess I've done that too many times already. Never mind. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From tdelaney@avaya.com Tue Apr 29 22:51:31 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 07:51:31 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC252@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > [Tim Delaney] > > Off the top of my head, most classes I create have ... > > > > __init__ > > 3-5 other methods > > 3-5 instance attributes > > > > Hmm - that would only be 3-5 instance __dict__ entries, with > > 4-6 class __dict__ entries, correct? > > > > I was forgetting that methods are put into the instance __dict__. > > No, they're not. Bah - I meant to say __class__.__dict__ - if you look at the numbers above they add up that way. > > Bah - it's too late.
It's the end of the day, and I've barely > > managed to get 2 hours real work done. > > That might explain your recent goofs. :-) See above ;) Well, it's a whole new day ... I've got an 8am phone call to the US (10 minutes away) ... maybe I can do better today ... Tim Delaney From tdelaney@avaya.com Tue Apr 29 22:52:52 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 07:52:52 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> > From: Skip Montanaro [mailto:skip@pobox.com] > > >> Have you guys tried out the patch? I'm very interested > in getting > >> results from different benchmarks, processors, cache sizes, and > >> various operating systems. > > Tim> If I can find the time I will. We're in crunch time > on my project > Tim> at the moment ... I'm somewhat over-allocated :( > > Can't you just head over to Dunkin' Donuts and resize? ;-) Umm ... I'm parsing this OK ... seems syntactically correct ... but I'm not sure about the semantics ... Tim Delaney From skip@pobox.com Tue Apr 29 23:05:06 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 17:05:06 -0500 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> Message-ID: <16046.63250.915898.339768@montanaro.dyndns.org> Tim> at the moment ... I'm somewhat over-allocated :( Skip> Can't you just head over to Dunkin' Donuts and resize? ;-) Tim> Umm ... I'm parsing this OK ... seems syntactically correct ... but Tim> I'm not sure about the semantics ... Well, when a dictionary is over-allocated, we make it bigger to create more space. I was thinking maybe you could try a similar sort of approach using donuts...
Skip From tdelaney@avaya.com Wed Apr 30 00:54:35 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 09:54:35 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC291@au3010avexu1.global.avaya.com> > From: Skip Montanaro [mailto:skip@pobox.com] > > Tim> at the moment ... I'm somewhat over-allocated :( > > Skip> Can't you just head over to Dunkin' Donuts and resize? ;-) > > Tim> Umm ... I'm parsing this OK ... seems syntactically > correct ... but > Tim> I'm not sure about the semantics ... > > Well, when a dictionary is over-allocated, we make it bigger > to create more > space. I was thinking maybe you could try a similar sort of > approach using > donuts... Making me bigger won't help anything (I'm trying to make myself smaller). Now, if Dunkin' Donuts can make more of me, that's another matter ... Tim Delaney From gward@python.net Wed Apr 30 03:07:44 2003 From: gward@python.net (Greg Ward) Date: Tue, 29 Apr 2003 22:07:44 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030430020743.GA6541@cthulhu.gerg.ca> On 29 April 2003, Guido van Rossum said: > I'm -1 on changing the name. os.walk() it is. Sheesh, it's like my undeniably brilliant suggestion of os.walktree() disappeared into thin air. Ah well, back to wasting my vast talents elsewhere... ;-> Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Jesus Saves -- and you can too, by redeeming these valuable coupons!
From laotzu@pobox.com Wed Apr 30 03:16:18 2003 From: laotzu@pobox.com (Mathieu Fenniak) Date: Tue, 29 Apr 2003 20:16:18 -0600 Subject: [Python-Dev] 2.3b1, and object() Message-ID: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> I've been testing Python 2.3b1 since its release. I've tested it with a number of applications I've written myself, as well as testing most of the new language features and modules out. I've encountered no problems, and everything is happy and working. On an unrelated note, I'm curious, what's the difference between an instance of an object, and an instance of an empty class? Calling the object builtin returns an <object object at ...>, which I would expect would function the same as a 'class blah(object): pass', but they do not function similarly at all. >>> class A(object): pass >>> a = A() >>> a.i = 5 >>> a.i 5 >>> >>> a = object() >>> a.i = 5 Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'object' object has no attribute 'i' -- Random words of the day: Who does not trust enough will not be trusted. Lao-Tzu Mathieu Fenniak <laotzu@pobox.com> PGP Key ID 0x2459092A http://www.stompstompstomp.com/ From drifty@alum.berkeley.edu Wed Apr 30 04:47:37 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 29 Apr 2003 20:47:37 -0700 (PDT) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <Pine.SOL.4.55.0304292042280.9903@death.OCF.Berkeley.EDU> (sorry for messing up people's threading of this thread but I deleted the original emails since I summarized it already in my rough draft of the next summary) I just created patch #729988 that I think fixes any possible hanging issues with test_logging in regards to it hanging after completing test 3 (its last test). I just switched the lock used from a Condition lock (which I think was sending its 'notify' faster than it took to reach the 'wait' call in the main thread) to an Event lock. It solves the hanging on my OS X box. 
The reason I didn't apply it is that I don't have much threading experience and I would rather be safe than sorry. I just need someone to sign off on it; I will apply it myself. -Brett From tim_one@email.msn.com Wed Apr 30 05:32:23 2003 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 30 Apr 2003 00:32:23 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <3EAE24DD.2070409@lemburg.com> Message-ID: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> [M.-A. Lemburg] > ... > Once upon a time, when I was playing with inlining dictionary > tables (now part of the dictionary implementation thanks to Tim), Thank you! > ... > I don't think that large dictionaries should become more > sparse -- that's just a waste of memory. Collision resolution is very fast if the dict slots happen to live in cache. When they're out of cache, the apparent speed of the C code is irrelevant, the time is virtually all consumed by the HW (or even OS) resolving the cache misses, and every collision probe is very likely to be a cache miss then (the probe sequence-- by design --jumps all over the slots in (pseudo-)random order). So when Raymond explained that increasing sparseness helped *most* for large dicts, it made great sense to me. We can likely resolve dozens of collisions in a small dict in the time it takes for one extra probe in a large dict. Jeremy had a possibly happy idea wrt this: make the collision probe sequence start in a slot adjacent to the colliding slot. That's likely to get sucked into cache "for free", tagging along with the slot that collided. If that's effective, it could buy much of the "large dict" speed gains Raymond saw without increasing the dict size. If someone wants to experiment with that in lookdict_string(), stick a new ++i; before the for loop, and move the existing i = (i << 2) + i + perturb + 1; to the bottom of that loop. Likewise for lookdict(). 
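For readers following along, the probe order Tim is describing can be sketched in Python (a toy model of the slot sequence only - not the real C lookup, which also handles dummies and key comparison; the function name and the adjacent_first flag are invented here to model Jeremy's suggestion):

```python
PERTURB_SHIFT = 5  # shift used by the dictobject.c recurrence

def probe_sequence(h, mask, nprobes=6, adjacent_first=False):
    # First nprobes table slots inspected for hash h, using the
    # i = (i << 2) + i + perturb + 1 recurrence.  With
    # adjacent_first=True, the neighbouring slot (the suggested ++i)
    # is checked before the first pseudo-random jump.
    i, perturb = h & mask, h
    slots = [i & mask]
    if adjacent_first:
        i += 1
        slots.append(i & mask)
    while len(slots) < nprobes:
        i = (i << 2) + i + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)
    return slots

print(probe_sequence(0, 7))  # [0, 1, 6, 7, 4, 5] - jumps around the table
print(probe_sequence(10, 7, adjacent_first=True))  # starts [2, 3]: slot 3 is adjacent
```

The pseudo-random jumps are what defeat the cache; the adjacent_first variant trades one cache-friendly probe for a slightly less scattered sequence.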
From cjohns@cybertec.com.au Wed Apr 30 06:10:50 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Wed, 30 Apr 2003 15:10:50 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADAC3F.6020802@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> <3EADAC3F.6020802@cybertec.com.au> Message-ID: <3EAF5ADA.2010006@cybertec.com.au> Chris Johns wrote: > Martin v. Löwis wrote: > >> >> I see. And the system does have inet_pton? *That* sounds like a bug to >> me - there should be no inet_pton if the IPv6 API is unsupported. > > > Agreed. I will disable them. > I disabled HAVE_INET_PTON in the pyconfig.h as suggested, although the functions are present in the RTEMS header files. This throws up another error. When it is disabled, the inet_pton and inet_ntop functions in socketmodule.c are built. The RTEMS prototypes and the ones provided in socketmodule.c are not exactly the same, giving a compile-time error. The RTEMS history is that the IP stack is a port of the FreeBSD stack from a while ago. It must have some IPv6 things, however as far as I know IPv6 is not working on RTEMS. I suspect it is not complete/current. I feel the best solution is to define HAVE_INET_PTON in pyconfig.h. >> >> So I think the configure test should be changed to define HAVE_PTON >> only if all prerequisites of its usage are met (or the entire function >> should be hidden if IPv6 is disabled). >> > > It would make Python more robust, but this is a mistake on my part. > I wrapped 'socket_inet_pton' and friends with ENABLE_IPV6 and sockets under RTEMS work. -- Chris Johns, cjohns at cybertec.com.au From martin@v.loewis.de Wed Apr 30 06:14:44 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 30 Apr 2003 07:14:44 +0200 Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> References: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <m3d6j4efcr.fsf@mira.informatik.hu-berlin.de> Mathieu Fenniak <laotzu@pobox.com> writes: > On an unrelated note, I'm curious, what's the difference between an > instance of an object, and an instance of an empty class? On python-dev, you are supposed to study the Python source code to answer such questions (or find other means to investigate the answer yourself). Regards, Martin From greg@cosc.canterbury.ac.nz Wed Apr 30 06:34:54 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Apr 2003 17:34:54 +1200 (NZST) Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <200304300534.h3U5Ysa15757@oma.cosc.canterbury.ac.nz> Mathieu Fenniak <laotzu@pobox.com>: > >>> class A(object): pass > >>> a = A() > >>> a.i = 5 > >>> a.i > 5 > >>> > > >>> a = object() > >>> a.i = 5 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'object' object has no attribute 'i' I think this is because object is a built-in type, and as such doesn't allow attributes to be added, unless you create a Python subclass of it. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From Anthony Baxter <anthony@interlink.com.au> Wed Apr 30 08:29:08 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Wed, 30 Apr 2003 17:29:08 +1000 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking?
In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <200304300729.h3U7T9O05308@localhost.localdomain> >>> Itamar Shtull-Trauring wrote > In real programs the speed drop would probably be much less pronounced, > although I bet this slows down e.g. Anthony Baxter's portforwarder quite > a bit. If Python 2.3 is released without fixing this Twisted will > probably monkeypatch the socket module so that we can get full > performance, since we have our own (unavoidable) layers of Python > indirection :) For whatever reason, it actually doesn't seem to matter. Python2.2 seems to clock in about 10% slower (in throughput and connections/second) than the same code running under 2.3a1. Upgrading to current-CVS, I see almost no difference between 2.3a1 and current-CVS (maybe 5% improvement). (FWIW, python2.1 is almost 25% slower than current-cvs!) The code in question is pythondirector, a pure-python TCP loadbalancer, http://pythondirector.sf.net/. In this case all the above were run with Twisted 1.0.3. All tests were run on my laptop via the loopback interface. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From guido@python.org Wed Apr 30 14:49:51 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 30 Apr 2003 09:49:51 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Tue, 29 Apr 2003 22:07:44 EDT." <20030430020743.GA6541@cthulhu.gerg.ca> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> <20030430020743.GA6541@cthulhu.gerg.ca> Message-ID: <200304301349.h3UDnpJ28834@odiug.zope.com> > On 29 April 2003, Guido van Rossum said: > > I'm -1 on changing the name. os.walk() it is. > > Sheesh, it's like my undeniably brilliant suggestion of os.walktree() > disappeared into thin air. Ah well, back to wasting my vast talents > elsewhere...
;-> > > Greg Sorry, I didn't see your suggestion until after I'd released 2.3b1. The difference is not significant enough to rename things again after the beta release. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 30 14:59:46 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 30 Apr 2003 09:59:46 -0400 Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: Your message of "Tue, 29 Apr 2003 20:16:18 MDT." <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> References: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <200304301359.h3UDxku28868@odiug.zope.com> > On an unrelated note, I'm curious, what's the difference between an > instance of an object, and an instance of an empty class? Calling the > object builtin returns an <object object at ...>, which I would expect > would function the same as a 'class blah(object): pass', but they do > not function similarly at all. > > >>> class A(object): pass > >>> a = A() > >>> a.i = 5 > >>> a.i > 5 > >>> > > >>> a = object() > >>> a.i = 5 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'object' object has no attribute 'i' Instances of 'object' don't have an instance dict, so they are incapable of having instance variables. When you use a class statement, instances of the subclass get an instance dict, unless __slots__ is used in that class statement. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Wed Apr 30 16:06:51 2003 From: python@rcn.com (Raymond Hettinger) Date: Wed, 30 Apr 2003 11:06:51 -0400 Subject: [Python-Dev] Dictionary tuning References: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> Message-ID: <001101c30f2a$216954a0$b1b3958d@oemcomputer> > Jeremy had a possibly happy idea wrt this: make the collision probe > sequence start in a slot adjacent to the colliding slot. That's likely to > get sucked into cache "for free", tagging along with the slot that collided.
> If that's effective, it could buy much of the "large dict" speed gains > Raymond saw without increasing the dict size. I worked on similar approaches last month and found them wanting. The concept was that a 64-byte cache line held 5.3 dict entries and that probing those was much less expensive than making a random probe into memory outside of the cache. The first thing I learned was that the random probes were necessary to reduce collisions. Checking the adjacent space is like a single step of linear chaining; it increases the number of collisions. That would be fine if the cost were offset by decreased memory access time; however, for small dicts, the whole dict is already in cache and having more collisions degrades performance with no compensating gain. The next bright idea was to have a separate lookup function for small dicts and for larger dictionaries. I set the large dict lookup to search adjacent entries. The good news is that an artificial test of big dicts showed a substantial improvement (around 25%). The bad news is that real programs were worse off than before. A day of investigation showed the cause. The artificial test accessed keys randomly and showed the anticipated benefit. However, real programs access some keys more frequently than others (I believe Zipf's law applies.) Those keys *and* their collision chains are likely already in the cache. So, big dicts had the same limitation as small dicts: You always lose when you accept more collisions in return for exploiting cache locality. The conclusion was clear: the best way to gain performance was to have fewer collisions in the first place. Hence, I resumed experiments on sparsification. > > If someone wants to experiment with that in lookdict_string(), stick a new > > ++i; > > before the for loop, and move the existing > > i = (i << 2) + i + perturb + 1; > > to the bottom of that loop. Likewise for lookdict().
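The sparseness argument can be checked with a toy simulation (a hypothetical sketch making several simplifying assumptions - random integer hashes, no dummy entries, no string specialisation - so it only shows the collision-count side of the trade-off, not cache behaviour):

```python
import random

def avg_probes_per_insert(n_keys, table_size, trials=100, seed=42):
    # Insert n_keys random hashes into an open-addressed table using
    # the same i = (i << 2) + i + perturb + 1 recurrence, and return
    # the mean number of slots inspected per insertion.
    rng = random.Random(seed)
    mask = table_size - 1
    total = 0
    for _ in range(trials):
        table = [None] * table_size
        for _ in range(n_keys):
            h = rng.getrandbits(32)
            i, perturb, probes = h & mask, h, 1
            while table[i & mask] is not None:
                i = (i << 2) + i + perturb + 1
                perturb >>= 5
                probes += 1
            table[i & mask] = h
            total += probes
    return total / float(trials * n_keys)

dense = avg_probes_per_insert(170, 256)   # ends up about 2/3 full
sparse = avg_probes_per_insert(170, 512)  # ends up about 1/3 full
print(dense, sparse)  # the sparser table needs fewer probes per insert
```

Doubling the table for the same keys measurably shortens the average probe chain, which is the effect the *4 resize step is buying.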
timecell gains 2% (spreadsheet benchmark) timemat loses 2% (pure python matrix package benchmark) timepuzzle loses 1% (class based graph traverser) Raymond Hettinger P.S. There is one other way to improve cache behavior but it involves touching code throughout dictobject.c. Move the entry values into a separate array from the key/hash pairs. That way, you get 8 entries per cache line. P.P.S. One other idea is to use a different search pattern for small dictionaries. Store entries in a self-organizing list with no holes. Dummy fields aren't needed which saves a test in the linear search loop. When an entry is found, move it one closer to the head of the list so that the most common entries get found instantly. Since there are no holes, all eight cells can be used instead of the current maximum of five. Like the current arrangement, the whole small dict fits into just two cache lines. ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# From jepler@unpythonic.net Wed Apr 30 17:16:16 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Wed, 30 Apr 2003 11:16:16 -0500 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> References: 
<3EAE24DD.2070409@lemburg.com> <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> Message-ID: <20030430161615.GB22792@unpythonic.net> On Wed, Apr 30, 2003 at 12:32:23AM -0400, Tim Peters wrote: > If someone wants to experiment with that in lookdict_string(), stick a new > > ++i; > > before the for loop, and move the existing > > i = (i << 2) + i + perturb + 1; > > to the bottom of that loop. Likewise for lookdict(). You might also investigate making PyDictEntry a power-of-two bytes big (it's currently 12 bytes) so that they align nicely in the cache, and then use i ^= 1; instead of ++i; so that the second key checked is always in the same (32-byte or bigger) cache line. Of course, increasing the size of PyDictEntry would also increase the size of all dicts by 33%, so the speed payoff would have to be big. It's also not obvious that ma_smalltable will be 32-byte aligned (since no special effort was made, it's unlikely to be). If it's not, then this optimization would still not pay (compared to i++) for <= MINSIZE dictionaries (which are the important case?). A little program indicates that if the table has an 8-byte or better alignment, the xor approach gives same-cache-line results more frequently than the increment approach even with a 12-byte PyDictEntry. This doesn't quite make sense to me. It also indicates that if the alignment is not 32 bytes but the dict is 16 bytes that xor is a loss, which does make sense. The results (for a 32-byte cache line):

algorithm  sizeof()  alignment  % in same cache line
i^=1       12        4           62.5
i^=1       12        8           75.0
i^=1       12        16          75.0
i^=1       12        32          75.0
i^=1       16        4           50.0
i^=1       16        8           50.0
i^=1       16        16          50.0
i^=1       16        32         100.0
++i        12        4           62.5
++i        12        8           62.5
++i        12        16          62.5
++i        12        32          62.5
++i        16        4           50.0
++i        16        8           50.0
++i        16        16          50.0
++i        16        32          50.0

so using i^=1 and adding 4 bytes to each dict (if necessary) to get 8-alignment of ma_smalltable would give a 12.5% increase in the hit rate of the second probe compared to i++. Ouch.
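The "little program" was not posted, but something like the following reproduces the first table (a guess at the computation, not Jeff's actual code: it checks whether slot i and its second-probe partner start in the same 32-byte line, averaged over the table offsets a given alignment allows):

```python
def pct_same_line(entry_size, align, xor=True, line=32, nslots=256):
    # Percentage of slots whose second-probe partner (i^1 for the xor
    # trick, i+1 for ++i) begins in the same cache line as slot i.
    hits = total = 0
    for offset in range(0, line, align):   # table start mod line size
        for i in range(nslots):
            j = i ^ 1 if xor else i + 1
            hits += (offset + i * entry_size) // line == \
                    (offset + j * entry_size) // line
            total += 1
    return 100.0 * hits / total

print(pct_same_line(12, 8, xor=True))    # 75.0
print(pct_same_line(16, 32, xor=True))   # 100.0
print(pct_same_line(12, 4, xor=False))   # 62.5
```

This model matches the figures above for 12- and 16-byte entries at each alignment; the second table (which also accounts for me_key accesses) would need a refinement of the same idea.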
When I take into account that each probe accesses me_key (not just
me_hash) the results change:

i^=1       16        4           37.5
++i        16        4           37.5
i^=1       12        16          50.0
i^=1       12        32          50.0
i^=1       12        4           50.0
i^=1       12        8           50.0
i^=1       16        16          50.0
i^=1       16        8           50.0
++i        12        16          50.0
++i        12        32          50.0
++i        12        4           50.0
++i        12        8           50.0
++i        16        16          50.0
++i        16        32          50.0
++i        16        8           50.0
i^=1       16        32         100.0

You don't beat i++ unless you go to size 16 with alignment 32.

Looking at the # of cache lines accessed on average, the numbers are
unsurprising.  For the 37.5% items, 1.625 cache lines are accessed for
the two probes, 1.5 for the 50% items, and 1.0 for the 100% items.

Looking at the number of cache lines accessed for a single probe,
8-or-better alignment gives 1.0 cache lines accessed for 16-byte
structures, and 1.125 for all other cases (4-byte alignment or 12-byte
structure).

If the "more than 3 probes" case bears optimizing (and I doubt it does),
the for(perturb) loop could be unrolled once, with even iterations using
++i or i^=1 and odd iterations using
    i = (i << 2) + i + perturb + 1;
so that the same-cache-line property is used as often as possible.  Of
course, the code duplication of the rest of the loop body will increase
i-cache pressure a bit.

And I'm surprised if you read this far.  Summary: i^=1 is not likely to
win compared to ++i, unless we increase dict size 33%.

Jeff

From itamar@itamarst.org  Wed Apr 30 17:41:54 2003
From: itamar@itamarst.org (Itamar Shtull-Trauring)
Date: Wed, 30 Apr 2003 12:41:54 -0400
Subject: [Python-Dev] Python 2.3b1 has 20% slower networking?
In-Reply-To: <200304300729.h3U7T9O05308@localhost.localdomain>
References: <20030427145316.475c3cf5.itamar@itamarst.org>
	<200304300729.h3U7T9O05308@localhost.localdomain>
Message-ID: <20030430124154.2da91bfe.itamar@itamarst.org>

On Wed, 30 Apr 2003 17:29:08 +1000
Anthony Baxter <anthony@interlink.com.au> wrote:

> For whatever reason, it actually doesn't seem to matter.

OK, great.  And thanks to the python-dev team for fixing the issue in
CVS so quickly.
-- 
Itamar Shtull-Trauring    http://itamarst.org/
http://www.zoteca.com -- Python & Twisted consulting

From python@rcn.com  Wed Apr 30 18:30:22 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 30 Apr 2003 13:30:22 -0400
Subject: [Python-Dev] Dictionary tuning
References: <3EAE24DD.2070409@lemburg.com> <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> <20030430161615.GB22792@unpythonic.net>
Message-ID: <000901c30f3e$2e31a3e0$b1b3958d@oemcomputer>

> And I'm surprised if you read this far.  Summary: i^=1 is not likely to
> win compared to ++i, unless we increase dict size 33%.

Right!  I had tried i^=1 and it had near zero or slightly negative
effects on performance.  It resulted in more collisions, though the
collisions were resolved relatively cheaply.

I had also experimented with changing alignment, but nothing helped.
Everything is already word aligned and that takes care of the HW issues.
The only benefit to the alignment is that i^=1 guarantees a cache hit.
Without alignment, the odds are that 4 out of 5.3 will have a hit (since
there are 5.3 entries to a line).

Increasing the dict size 33% with unused space doesn't help sparseness
and negatively impacts the chance of cache hits you already have with
smaller dictionaries.

heyhey-mymy-there's-more-to-the-picture-than-meets-the-eye,

Raymond Hettinger

From tim.one@comcast.net  Wed Apr 30 19:13:43 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 30 Apr 2003 14:13:43 -0400
Subject: [Python-Dev] Dictionary tuning
In-Reply-To: <20030430161615.GB22792@unpythonic.net>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEFEFIAA.tim.one@comcast.net>

FYI, for years the dict code had some #ifdef'ed preprocessor gimmick to
force cache alignment.  I ripped that out a while back because nobody
ever reported an improvement when using it.
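[As an aside on this thread: the P.P.S. idea from earlier (store
small-dict entries in a hole-free, self-organizing list and move a found
entry one step toward the head) is easy to model in pure Python.  This
is a hypothetical illustration of the policy, not the proposed C change;
the class name, CAPACITY, and the overflow behavior are illustrative
assumptions only.]

```python
# Hole-free, fixed-capacity mapping with a "move ahead one" policy:
# each successful lookup swaps the entry one slot toward the front,
# so frequently-used keys drift toward the head of the linear scan.

class SmallDict:
    CAPACITY = 8                      # eight cells, as in ma_smalltable

    def __init__(self):
        self._items = []              # (key, value) pairs, no dummy holes

    def __setitem__(self, key, value):
        for idx, (k, _) in enumerate(self._items):
            if k == key:
                self._items[idx] = (key, value)
                return
        if len(self._items) >= self.CAPACITY:
            # a real implementation would resize to an open-addressed table
            raise OverflowError("small table full")
        self._items.append((key, value))

    def __getitem__(self, key):
        for idx, (k, v) in enumerate(self._items):
            if k == key:
                if idx:               # move one closer to the head
                    self._items[idx - 1], self._items[idx] = \
                        self._items[idx], self._items[idx - 1]
                return v
        raise KeyError(key)

d = SmallDict()
d["a"] = 1
d["b"] = 2
d["c"] = 3
d["c"]                                # "c" swaps with "b"
d["c"]                                # "c" swaps with "a": now at the head
print(d._items[0])                    # ('c', 3)
```

Repeated hits on a hot key cost one swap each but shorten every later
scan for it, which is the whole point of the self-organizing layout.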
From tim.one@comcast.net  Wed Apr 30 20:43:45 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 30 Apr 2003 15:43:45 -0400
Subject: [Python-Dev] RE: os.path.walk() lacks 'depth first' option
In-Reply-To: <1051202649.3ea814599f6fa@mcherm.com>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEFMFIAA.tim.one@comcast.net>

[Michael Chermside]
> Don't get a swelled head or anything ;-), but your generator-based
> version of walk() is a beautiful piece of work.  I don't mean the code
> (although that's clean and readable), but the design.
> ...

Thanks for the nudge!  If you hadn't reminded us, I bet this would have
been forgotten.  (I would have replied earlier, except my head got so
heavy it took this long to peel my lips off the floor.)

From python@rcn.com  Wed Apr 30 21:14:44 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 30 Apr 2003 16:14:44 -0400
Subject: [Python-Dev] Dictionary tuning
References: <BIEJKCLHCIOIHAGOKOLHKEFEFIAA.tim.one@comcast.net>
Message-ID: <002301c30f55$245394c0$125ffea9@oemcomputer>

[Timbot]
> FYI, for years the dict code had some #ifdef'ed preprocessor gimmick to
> force cache alignment.  I ripped that out a while back because nobody
> ever reported an improvement when using it.

Gee, you mean we're not the first ones to have ever thought up
dictionary optimizations that didn't pan out?  I've tried square wheels,
pentagonal wheels, and gotten even better results with octagonal wheels.
Each further subdivision seems to have less-and-less payoff so I'm
confident that octagonal is close to optimum ;-)

I'm going to write up an informational PEP to summarize the results of
research to date.  After the first draft, I'm sure the other
experimenters will each have lessons to share.  In addition, I'll attach
a benchmarking suite and dictionary simulator (fully instrumented).
That way, future generations can reproduce the results and pick up where
we left off.
I've decided that this new process should have a name, something pithy,
yet magical sounding, so it shall be dubbed SCIENCE.

Raymond Hettinger

From patmiller@llnl.gov  Wed Apr 30 23:15:31 2003
From: patmiller@llnl.gov (Patrick J. Miller)
Date: Wed, 30 Apr 2003 15:15:31 -0700
Subject: [Python-Dev] Initialization hook for extenders
Message-ID: <3EB04B03.887CDF7B@llnl.gov>

This is a multi-part message in MIME format.
--------------42714FBF141F967516679964
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I work on several projects that have initialization requirements that
need to grab control after Py_Initialize(), but before any user code
runs (via input, script, -c, etc...).  Note that these are Python
clones that take advantage of an installed python (using its
$prefix/lib/pythonx.x/*.py and site-packages/*).

We could use

    PyImport_AppendInittab("sitecustomize", initsitecustomize);

But if there already IS customization in sitecustomize.py, I've blown
it away (and have to look it up and force an import).  And if someone
uses the -S flag, I'm screwed.

I propose a hook styled after Py_AtExit(func) called Py_AtInit(func)
which maintains a list of functions that are called in Py_Initialize
right after main and site initializations.  If the hook isn't used,
then the cost is a single extra function call at initialization.

Here's a spurious example:  A customer wants a version of python that
has all the math functions and his extensions to act like builtins...
I would write (without refcnt or error checks ;-):

#include "Python.h"

static void after_init(void)
{
    PyObject *builtin, *builtin_dict, *math, *math_dict, *user, *user_dict;

    builtin = PyImport_ImportModule("__builtin__");
    builtin_dict = PyModule_GetDict(builtin);

    math = PyImport_ImportModule("math");
    math_dict = PyModule_GetDict(math);

    user = PyImport_ImportModule("user");
    user_dict = PyModule_GetDict(user);

    PyDict_Update(builtin_dict, math_dict);
    PyDict_Update(builtin_dict, user_dict);
}

int main(int argc, char** argv)
{
    PyImport_AppendInittab("user", inituser);
    Py_AtInit(after_init);
    return Py_Main(argc, argv);
}

voila!  An extended Python with new builtins.

I actually want this to do some MPI initialization to setup a single
user prompt with broadcast which has to run after Py_Initialize() but
before the import of readline.

I've attached a copy of the patch (also going to patches at sf.net)

Pat

-- 
Patrick Miller | (925) 423-0309 | http://www.llnl.gov/CASC/people/pmiller

Son, when you grow up you will know who I really am.  I am just
a child like you who has been forced to act responsibly.
    -- Rod Byrnes

--------------42714FBF141F967516679964
Content-Type: text/plain; charset=us-ascii; name="Py_AtInit.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Py_AtInit.diff"

Index: dist/src/Include/pythonrun.h
===================================================================
RCS file: /cvsroot/python/python/dist/src/Include/pythonrun.h,v
retrieving revision 2.62
diff -c -r2.62 pythonrun.h
*** dist/src/Include/pythonrun.h	13 Feb 2003 22:07:52 -0000	2.62
--- dist/src/Include/pythonrun.h	30 Apr 2003 22:04:13 -0000
***************
*** 75,80 ****
--- 75,81 ----
  PyAPI_FUNC(void) PyErr_Display(PyObject *, PyObject *, PyObject *);

  PyAPI_FUNC(int) Py_AtExit(void (*func)(void));
+ PyAPI_FUNC(int) Py_AtInit(void (*func)(void));

  PyAPI_FUNC(void) Py_Exit(int);

Index: dist/src/Python/pythonrun.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/pythonrun.c,v
retrieving revision 2.193
diff -c -r2.193 pythonrun.c
*** dist/src/Python/pythonrun.c	22 Apr 2003 11:18:00 -0000	2.193
--- dist/src/Python/pythonrun.c	30 Apr 2003 22:04:16 -0000
***************
*** 106,111 ****
--- 106,135 ----
  	return flag;
  }

+ #define NINITFUNCS 32
+ static void (*initfuncs[NINITFUNCS])(void);
+ static int ninitfuncs = 0;
+
+ int Py_AtInit(void (*func)(void))
+ {
+ 	if (ninitfuncs >= NINITFUNCS)
+ 		return -1;
+ 	if (!func)
+ 		return -1;
+ 	initfuncs[ninitfuncs++] = func;
+ 	return 0;
+ }
+
+ static void initinitialize(void)
+ {
+ 	int i;
+ 	for (i = 0; i < ninitfuncs; ++i) {
+ 		initfuncs[i]();
+ 		if (PyErr_Occurred())
+ 			Py_FatalError("Py_AtInit: initialization error");
+ 	}
+ }
+
  void
  Py_Initialize(void)
  {

***************
*** 182,190 ****
--- 206,217 ----
  	initsigs(); /* Signal handling stuff, including initintr() */

  	initmain(); /* Module __main__ */
+
  	if (!Py_NoSiteFlag)
  		initsite(); /* Module site */
+ 	initinitialize(); /* Extension hooks */
+
  	/* auto-thread-state API, if available */
  #ifdef WITH_THREAD
 	_PyGILState_Init(interp, tstate);

***************
*** 1418,1423 ****
--- 1445,1451 ----
  #endif /* MS_WINDOWS */
  	abort();
  }
+
  /* Clean up and exit */

--------------42714FBF141F967516679964--
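[The registry in the Py_AtInit patch is small enough to model in pure
Python.  This hypothetical sketch only mirrors its semantics -- a
bounded list of callbacks with C-style 0/-1 return codes, run once in
registration order after interpreter setup; the function names and the
two example hooks are illustrative, not part of the patch.]

```python
# Model of the patch's initfuncs[] array: register up to NINITFUNCS
# callbacks, then run them all once during startup.

NINITFUNCS = 32
_initfuncs = []

def at_init(func):
    """Register func; return 0 on success, -1 on failure (C-style)."""
    if func is None or len(_initfuncs) >= NINITFUNCS:
        return -1
    _initfuncs.append(func)
    return 0

def run_init_hooks():
    """Run every registered hook in order; in the C patch an error here
    is fatal (Py_FatalError)."""
    for func in _initfuncs:
        func()

calls = []
at_init(lambda: calls.append("mpi_setup"))      # e.g. MPI initialization
at_init(lambda: calls.append("readline_prep"))  # before importing readline
run_init_hooks()
print(calls)   # ['mpi_setup', 'readline_prep']
```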