From greg@cosc.canterbury.ac.nz Tue Apr 1 00:43:05 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 01 Apr 2003 12:43:05 +1200 (NZST) Subject: [Python-Dev] Distutils documentation amputated in 2.2 docs? Message-ID: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> I was looking at the Distributing Python Modules section of the distutils docs for 2.2 the other day, and it mentioned a section about extending the distutils, but there did not appear to be any such section. Further investigation revealed that the 1.6 version of the docs *does* have this section, as section 8, but somewhere between the 1.6 and 2.2 docs, this section has disappeared, along with almost all of section 9, "Reference", which now appears as section 7, but with only a small part of what it should contain. What's the proper way of submitting a bug report about this? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From paul@prescod.net Tue Apr 1 00:52:06 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 31 Mar 2003 16:52:06 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <3E88E2B6.1080409@prescod.net> Ka-Ping Yee wrote: > Hmm, i'm not sure you understood what i meant. The code example i posted > is a solution to the design challenge: "provide read-only access to a > directory and its subdirectories, but no access to the rest of the filesystem". > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding of > what Python would need to do to make secure programming easier. 
Okay, how about allowing a piece of untrusted code to import modules from a selected subset of all modules? For instance, you probably want to allow untrusted code to get access to regular expressions and codecs (after taming!) but not os or socket. Speaking of sockets, web browsers often allow connections to sockets only at a particular domain. In a capabilities world, I guess the domain would be an object that you could request sockets from. Are DOS issues in scope? How do we prevent untrusted code from just bringing the interpreter to a halt? A smart enough attacker could even block all threads in the current process by finding a task that is usually not time-sliced and making it go on for a very long time. Without looking at the Python implementation, I can't remember an example off the top of my head, but perhaps a large multiplication or search-and-replace in a string. Paul Prescod From paul@prescod.net Tue Apr 1 01:08:40 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 31 Mar 2003 17:08:40 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E88E698.7000503@prescod.net> Guido van Rossum wrote: >... > >>In many classes, __init__ exercises authority. An obvious C type with >>the same problem is the "file" type (being able to ask a file object >>for its type gets you the ability to open any file on the filesystem). >>But many Python classes are in the same position -- they acquire >>authority upon initialization. > > > What do you mean exactly by "exercise authority"? Again, I understand > this for C code, but it would seem that all authority ultimately comes > from C code, so I don't understand what authority __init__() can > exercise.
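[Editor's sketch: Paul's first challenge above (let untrusted code import re and codecs but not os or socket) could be approximated with a guarded __import__. This is illustrative only, not a hardened sandbox: replacing __builtins__ narrows what the code can name, but CPython makes no hard security guarantee here, and the allowed modules would still need taming. The names ALLOWED and guarded_import are invented for the example.]

```python
# Illustrative only: a whitelisting __import__ for untrusted code.
ALLOWED = {"re", "codecs"}          # the "selected subset of all modules"

def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    # refuse anything outside the whitelist (including dotted submodules)
    if name.split(".")[0] not in ALLOWED:
        raise ImportError("module %r not available to untrusted code" % name)
    return __import__(name, globals, locals, fromlist, level)

safe_builtins = {"__import__": guarded_import, "len": len}
namespace = {"__builtins__": safe_builtins}

exec("import re\npattern = re.compile('a+')", namespace)   # allowed
denied = False
try:
    exec("import os", namespace)                            # refused
except ImportError:
    denied = True
```

Even with imports gated like this, the taming point stands: re and codecs would themselves need auditing before being exposed.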
Given that ZipFile("/tmp/foo.zip") can read a zipfile, the ZipFile class clearly has the ability to open files. It derives this ability from the fact that it can get at open(), os.open, etc. In a capabilities world, it should not have access to that stuff unless the caller specifically gave it access. And the logical way for the caller to give it that access is like this: ZipFile(already_opened_file) But in restricted code > ... > But is it really ZipFile.__init__ that exercises the authority? Isn't > its authority derived from that of the open() function that it calls? I think that's the problem. The zipfile module has a back-door "capability" that is incredibly powerful. In a library designed for capabilities, its only access to the outside world would be via data passed to it explicitly. > In what sense is the ZipFile class an entity by itself, rather than > just a pile of Python statements that derive any and all authority > from its caller? In the sense that it can import "open" or "os.open" rather than being forced to only communicate with the world through objects provided by the caller. If we imagine a world where it has no access to those back-doors then I can't see why Ping's complaint about access to classes would be a problem. Paul Prescod From jriehl@spaceship.com Tue Apr 1 01:50:39 2003 From: jriehl@spaceship.com (Jonathan Riehl) Date: Mon, 31 Mar 2003 19:50:39 -0600 (CST) Subject: [Python-Dev] PEP 269 once more. Message-ID: <Pine.BSF.4.33.0303311945410.8285-100000@localhost> Hey all, FYI, Guido closed the patch I had on SourceForge (599331), but I have just put an updated patch there. I have added some documentation on how my pgen module may be used, and the interface is much more consistent and useful than the prior upload. If anyone is interested in playing with pgen from Python, check it out and let me know what you think. Thanks! -Jon From martin@v.loewis.de Tue Apr 1 06:12:17 2003 From: martin@v.loewis.de (Martin v.
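[Editor's sketch: the ZipFile(already_opened_file) construction Paul describes is in fact supported, since zipfile accepts any file-like object; the caller decides exactly which file authority to hand over. A self-contained illustration using an in-memory buffer instead of a real path:]

```python
import io
import zipfile

# Trusted code creates an archive entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello, capabilities")

# Capability style: the reader receives an already-opened file object
# and never needs the authority to open paths itself.
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    data = zf.read("hello.txt")
```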
Löwis) Date: 01 Apr 2003 08:12:17 +0200 Subject: [Python-Dev] Distutils documentation amputated in 2.2 docs? In-Reply-To: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> References: <200304010043.h310h5M17556@oma.cosc.canterbury.ac.nz> Message-ID: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > What's the proper way of submitting a bug report about this? It would be best if you would provide a patch. Try to locate the primary source of the missing documentation (i.e. a TeX snippet), and integrate this into the current CVS, then do a cvs diff. If you find that the text is still there in the primary source, and just not rendered in the HTML version, submit a bug report pointing to the precise file that does not get rendered. Regards, Martin From joel@boost-consulting.com Tue Apr 1 08:56:34 2003 From: joel@boost-consulting.com (Joel de Guzman) Date: Tue, 1 Apr 2003 16:56:34 +0800 Subject: [Python-Dev] How to suppress instance __dict__? References: <ur88zougj.fsf@boost-consulting.com> <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <uof42i1ey.fsf@boost-consulting.com> <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net> <uvfyayr0y.fsf@boost-consulting.com> <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <021d01c2f82c$9b6d3470$4ee1afca@kim> Dave Abrahams wrote: >> I am generating extension types derived from a type which is derived >> from 'int' by calling the metaclass; in order to prevent instances >> of the most-derived type from getting an instance __dict__ I am >> putting an empty tuple in the class __dict__ as '__slots__'. The >> problem with this hack is that it disables pickling of these babies: >> >> "a class that defines __slots__ without defining __getstate__ >> cannot be pickled" >> Guido van Rossum wrote: > Yes. I was assuming you'd do this at the C level.
To do what I > suggested in Python, I think you'd have to write this:

> class M(type):
>     def __new__(cls, name, bases, dict):
>         C = type.__new__(cls, name, bases, dict)
>         del C.__getstate__
>         return C

Hi, Ok, I'm lost. Please be easy with me, I'm still learning the C API interfacing with Python :) Here's what I have so far. Emulating the desired behavior in Python, I can do:

class EnumMeta(type):
    def __new__(cls, name, bases, dict):
        C = type.__new__(cls, name, bases, dict)
        del C.__getstate__
        return C

class Enum(int):
    __metaclass__ = EnumMeta
    __slots__ = ()

x = Enum(1964)
print x

import pickle
print "SAVING"
out_x = pickle.dumps(x)
print "LOADING"
xl = pickle.loads(out_x)
print xl

I'm trying to rewrite this in C/C++ with the intent to patch Boost.Python to allow pickling on enums. I took on this task to learn more about the low level details of Python C interfacing. So far, I have implemented EnumMeta in C that does not override anything yet and installed that as the metaclass of Enum. I was wondering... Is there some C code somewhere that I can see that implements some sort of meta-stuff? I read PEPs 252 and 253 and "Unifying Types and Classes in Python 2.2". The examples there (specifically the class autoprop) are written in Python. I tried searching for examples in C from the current CVS snapshot of 2.3 but I failed in doing so. I'm sure it's there, but I don't know where to find it. To be specific, I'm lost in trying to implement tp_new of PyTypeObject. How do I call the default tp_new for metaclasses? TIA, -- Joel de Guzman joel at boost-consulting.com http://www.boost-consulting.com http://spirit.sf.net From zooko@zooko.com Tue Apr 1 16:47:56 2003 From: zooko@zooko.com (Zooko) Date: Tue, 01 Apr 2003 11:47:56 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: Message from Guido van Rossum <guido@python.org> of "Mon, 31 Mar 2003 17:43:09 EST."
<200303312243.h2VMhCC24639@odiug.zope.com> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> Message-ID: <E190OvU-0002KN-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Guido wrote: > > Yes. That may be why the demand for capabilities has been met with > resistance: to quote the French in "Monty Python and the Holy Grail", > "we already got one!" :-) ;-) Such skepticism is of course perfectly appropriate for proposed changes to your beautiful language. More on the one you already got below. (I agree: you already got one.) > > Here's a two sentence definition of capabilities: > > I've heard too many of these. They are all too abstract. There may have been a terminological problem. The word "capabilities" has been used for three different systems -- "capabilities-as-rows-of-the-Lampson-access-control-matrix", "capabilities-as-keys", and "capabilities-as-references". Unfortunately, the distinction is rarely made explicit, so people often assert things about "capabilities" which are untrue of capabilities-as-references. (Ping has just written a paper about this.) The former two kinds of capabilities have major problems and are disliked by almost everybody. The last one is the one that Ping, Ben Laurie and I are advocating, and the one that you already got. Anyway, if someone gave a definition of capabilities-as-references and it didn't match with the two-sentence definition I gave (and with the diagram), then it was wrong. Here's the two-sentence definition again: > > Authority originates in C code (in the interpreter or C extension > > modules), and is passed from thing to thing. > > This part I like.
> > > A given thing "X" -- an instance of ZipFile, for example -- has the > > authority to use a given authority -- to invoke the real open(), for > > example -- if and only if some thing "Y" previously held both the > > "open()" authority and the "authority to extend authorities to X" > > authority, and chose to extend the "open()" authority to X. > > But the instance of ZipFile is not really a protection domain. > Methods on the instance may have different authority. Okay, ZipFile was the wrong example. Here it is without examples: Abstract version: A given thing "X" can use a given authority "S" if and only if some thing "Y" has previously held both the authority and the "authority to extend authorities to X" and chose to extend "S" to X. To make it concrete, I will use the word "object" to mean "anything referenced by a Python reference". This includes class instances, closures, bound methods, stack frames, etc. When I mean Python's instance-of-a-class "object", I'll say "instance" instead of "object". So the concrete version is: Concrete version: An object "X" can use an object "S" if and only if some object "Y" has previously held references to both S and X, and chose to give a reference to S to X. (Quoting out of order:) > > Hm. Reviewing the rexec docs, I begin to suspect that the "access > > control system with unified designation and authority" *is* how > > Python does access control in restricted mode, and that rexec itself > > is just to manage module import and certain dangerous builtins. > > Yes. [...] > Sure. The question is, what exactly are Alice, Bob and Carol? I > claim that they are not specific class instances but they are each a > "workspace" as I tried to explain before. A workspace is more or less > the contents of a particular "sys.modules" dictionary. I believe I understand the motivation for rexec now.
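[Editor's sketch: Zooko's concrete version reads well as a few lines of Python. All names here (Log, Worker) are invented for illustration.]

```python
class Log:                          # S: an authority-bearing object
    def __init__(self):
        self.lines = []
    def write(self, msg):
        self.lines.append(msg)

class Worker:                       # X: holds no authority of its own...
    def __init__(self, log):        # ...until a reference to S is given to it
        self._log = log
    def run(self):
        self._log.write("did some work")

log = Log()                         # Y (this scope) holds references to S...
worker = Worker(log)                # ...and to X, and chooses to give S to X
worker.run()
```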
I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's suggestion [1]), Python objects have encapsulation -- one can't access their private data without their permission. Once this is done, Python references are capabilities. So if you have a Python object such as a wxWindow instance, and you want to control access to it, the natural way to do that is to control how references to it are passed around. This is why you've already got one. The natural and Pythonic way to control access to Python objects is with capabilities, and that's what you've been doing all along. However, you don't use the same technique to control access to Python *modules* such as the zipfile module, because the "import zipfile" statement will give the current scope access to the zipfile module even if nobody has granted such access to the current scope. This is a violation of the two-sentence definition and of the graph: the current scope just gained authority ex nihilo. So your solution to this, to prevent code from grabbing privileges willy nilly via "import" and builtins, is rexec, which creates a scope in which code executes (now called a "workspace"), and allows you to control which builtins and modules are available for code executing in that "workspace". Now access to modules conforms to the definition of capabilities: an object X can access a module S if and only if some object Y previously had access to X's workspace and to S, and Y chose to give X access to S. So unless I've missed something, rexec conforms to the definition of capabilities as well. (Of course, one can always build other access-control mechanisms on top of capabilities. In particular, the rexec "hooks" mechanism seems intended for that.) 
Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://mail.python.org/pipermail/python-dev/2003-March/034311.html From jeremy@zope.com Tue Apr 1 17:10:16 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 01 Apr 2003 12:10:16 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <E190OvU-0002KN-00@localhost> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> <E190OvU-0002KN-00@localhost> Message-ID: <1049217016.14149.12.camel@slothrop.zope.com> On Tue, 2003-04-01 at 11:47, Zooko wrote: > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > suggestion [1]), Python objects have encapsulation -- one can't access their > private data without their permission. > > Once this is done, Python references are capabilities. REM does not provide object encapsulation, but it disables enough introspection that it is possible to provide encapsulation. The REM implementation provides a Bastion function that creates private state by storing the state in func_defaults, which is inaccessible in REM. Jeremy From paul@prescod.net Tue Apr 1 18:29:37 2003 From: paul@prescod.net (Paul Prescod) Date: Tue, 01 Apr 2003 10:29:37 -0800 Subject: [Python-Dev] Capabilities In-Reply-To: <200303312243.h2VMhCC24639@odiug.zope.com> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <200303311944.h2VJhsA16638@odiug.zope.com> <E1907fu-0007r9-00@localhost> <200303312243.h2VMhCC24639@odiug.zope.com> Message-ID: <3E89DA91.9040001@prescod.net> Guido van Rossum wrote: >>How is the implementation of "open" provided by the trusted code to >>the untrusted code? 
Is it possible to provide a different "open" >>implementation to different "instances" of the zipfile module? (I >>think not, as there is no such thing as "a different instance of a >>module", but perhaps you could have two rexec "workspaces" each of >>which has a zipfile module with a different "open"?) > > > To the contrary, it is very easy to provide code with a different > version of open(). E.g.:

> # this executes as trusted code
> def my_open(...):
>     "open() variant that only allows reading"
>
> my_builtins = {"len": len, "open": my_open, "range": range, ...}
> namespace = {"__builtins__": my_builtins}
> exec "..." in namespace

That's fair enough, but why is it better for the "protection domain" to be an invoked "workspace" instead of an object? Think of it from a software engineering point of view: you're proposing that the right way to manage security is to override more-or-less global variables. Zooko is proposing that you pass the capabilities each method needs to that method. i.e. standard structured programming. Let's say that untrusted code wants access to the socket module. The surrounding code wants to tame it to prevent socket connections to certain IP addresses. I think that in the rexec model, the surrounding application would have to go in and poke "safe" versions of the constructor into the module. Or they would have to disallow access to the module altogether and provide an object that tames the module appropriately. The first approach is kind of error-prone. The second approach requires the untrusted code to use a model of programming that is very different from "standard Python." If we imagined a Python in which capabilities were built in deeply, the socket module would be designed to be tamed. By default it would have no authority at all except that which is passed in. The authority to contact the outside world would be separate from all of the other useful stuff in the socket module and socket class.
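[Editor's sketch: in modern syntax (exec() as a function rather than the 2.x statement), Guido's workspace idea above might look like the following. The usual caveat applies: swapping __builtins__ limits what the code can name, but CPython does not promise this is a security boundary. read_only_open is an invented name standing in for the "open() variant that only allows reading".]

```python
def read_only_open(path, mode="r"):
    "open() variant that only allows reading"
    if any(c in mode for c in "wax+"):
        raise IOError("write access denied: %r" % path)
    return open(path, mode)

# the "workspace": untrusted code sees only the builtins we pass in
my_builtins = {"len": len, "open": read_only_open, "range": range}
namespace = {"__builtins__": my_builtins}

exec("n = len(range(10))", namespace)
```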
I'm not necessarily advocating this kind of a change to the Python library... Paul Prescod From pje@telecommunity.com Tue Apr 1 18:01:54 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 01 Apr 2003 13:01:54 -0500 Subject: [Python-Dev] Capabilities (we already got one) Message-ID: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> >However, you don't use the same technique to control access to Python *modules* >such as the zipfile module, because the "import zipfile" statement will give the >current scope access to the zipfile module even if nobody has granted such >access to the current scope. >... >So your solution to this, to prevent code from grabbing privileges willy nilly >via "import" and builtins, is rexec, which creates a scope in which code >executes (now called a "workspace"), and allows you to control which builtins >and modules are available for code executing in that "workspace". Almost. I think you may be confusing module *code* and module *objects*. Guido pointed this out earlier. A Python module object is populated by executing a body of *code* against the module *object* dictionary. The module object dictionary contains a '__builtins__' entry that gives it its "base" capabilities. Module *objects* possess capabilities, which are in their dictionary or reachable from it. *Code* doesn't possess capabilities except to constants used in the code. So access to *code* only grants you capabilities to the code and its constants. So, in order to provide a capability-safe environment, you need only provide a custom __import__ which uses a different 'sys.modules' that is specific to that environment. At that point, a "workspace" consists of an object graph rooted in the supplied '__builtins__', locals(), globals(), and initially executing code. We can then see that the standard Python environment is in fact a capability system, wherein everything is reachable from everything else. The "holes" in this capability system, then, are: 1. 
introspective abilities that allow "breaking out" of the workspace (such as the ability to 'sys._getframe()' or examine tracebacks to "reach up" to higher-level stack frames) 2. the structuring of the library in ways that equate creating an instance of a class with an "unsafe" capability. (E.g., creating instances of 'file()') coupled with instance->class introspection 3. Lack of true "privacy" for objects. (Proxies are a useful way to address this issue, because they allow more than one "capability" to exist for the same object.) From ping@zesty.ca Tue Apr 1 20:12:49 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Tue, 1 Apr 2003 14:12:49 -0600 (CST) Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <E190OvU-0002KN-00@localhost> Message-ID: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> On Tue, 1 Apr 2003, Zooko wrote: > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > suggestion [1]), Python objects have encapsulation -- one can't access their > private data without their permission. > > Once this is done, Python references are capabilities. Aaack! I wish you would *stop* saying that! There is no criterion by which a reference is or is not a capability. To talk in such terms only confuses the issue. It is possible to program in a capability style in any Turing-complete programming language, just as it is possible to program in an object style or a functional style or a procedural style. The question is: what does programming in a capability style look like, and how might Python facilitate (or even encourage) that style? To say that activating restricted execution mode causes things to "become" capabilities is as meaningless as saying that adding a feature to the C language would suddenly turn an arbitrary C program into an object-oriented program. 
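[Editor's sketch: one small, concrete answer to Ping's question of what capability-style code looks like is attenuation by proxy (Phillip Eby's point 3 above). The wrapper below grants read access and nothing else; note that CPython's underscore privacy is convention rather than enforcement, which is exactly the encapsulation gap this thread keeps circling. ReadOnly is an invented name.]

```python
import io

class ReadOnly:
    """Proxy granting only read access to a file-like object."""
    def __init__(self, f):
        self._read = f.read       # capture just the one method we grant
    def read(self, n=-1):
        return self._read(n)

f = io.StringIO("secret data")
r = ReadOnly(f)
text = r.read()                   # allowed
has_write = hasattr(r, "write")   # the write authority was never granted
```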
-- ?!ng From ehuss@netmeridian.com Tue Apr 1 21:41:54 2003 From: ehuss@netmeridian.com (Eric Huss) Date: Tue, 1 Apr 2003 13:41:54 -0800 (PST) Subject: [Python-Dev] Minor issue with PyErr_NormalizeException Message-ID: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> We had a bug in one of our extension modules that caused a core dump in PyErr_NormalizeException(). At the very top of the function (line 133) it checks for a NULL type. I think it should have a "return" here so that the code does not continue and thus dump core on line 153 when it calls PyClass_Check(type). This should also make the comment not lie about dumping core. ;) Just thought I'd pass it on.. -Eric From klm@zope.com Tue Apr 1 22:35:10 2003 From: klm@zope.com (Ken Manheimer) Date: Tue, 1 Apr 2003 17:35:10 -0500 (EST) Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> Message-ID: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> On Tue, 1 Apr 2003, Ka-Ping Yee wrote: > On Tue, 1 Apr 2003, Zooko wrote: > > I think that in restricted-execution-mode (hereafter: "REM", as > > per Greg Ewing's suggestion [1]), Python objects have > > encapsulation -- one can't access their private data without their > > permission. > > > > Once this is done, Python references are capabilities. > > Aaack! I wish you would *stop* saying that! > > There is no criterion by which a reference is or is not a capability. > To talk in such terms only confuses the issue. I take the above, with a bit of license, to mean that REM enables encapsulation for python objects, so they are closer to being safe to use as capabilities. Subsequent posts suggest that encapsulation isn't actually achieved, but that's not the issue here - the issue, as i understand it, is how to talk about enabling capability-based safety in python code. 
> It is possible to program in a capability style in any Turing-complete > programming language, just as it is possible to program in an object > style or a functional style or a procedural style. The question is: > what does programming in a capability style look like, and how might > Python facilitate (or even encourage) that style? I think the last part is, more specifically, "what measures need to be taken to enable safe use of python objects for capability style programming?" > To say that activating restricted execution mode causes things to > "become" capabilities is as meaningless as saying that adding a feature > to the C language would suddenly turn an arbitrary C program into an > object-oriented program. I'm not near as clear about all this as you seem to be, but i have the feeling the statements are not as meaningless as you're suggesting. I *do* think that getting more clear about what the questions are that we're trying to answer would be helpful, here. One big one seems to be: "What needs to be done to enable effective ("safe"?) use of python object (references) as capabilities?" I've seen answers to this roll by several times - i think we need to settle them, and collect the conclusions in a PEP. And we need to identify what other questions there are. One more probably is, "how do we use python objects as capabilities, once we can ensure their safety?" And maybe it'd be helpful to elaborate what "safety" means. -- Ken klm@zope.com Alan Turing thought about criteria to settle the question of whether machines can think, a question of which we now know that it is about as relevant as the question of whether submarines can swim. -- Edsger Dijkstra From beau@nyc-search.com Tue Apr 1 23:15:44 2003 From: beau@nyc-search.com (beau@nyc-search.com) Date: Tue, 01 Apr 2003 18:15:44 -0500 Subject: [Python-Dev] Python Programmers, NYC Message-ID: <3E8A1DA0.5E202C45@nyc-search.com>
Python Programmers, NYC http://www.nyc-search.com/jobs/python.html

We are seeking an experienced and highly-talented programmer/scripter/analyst to fill the position of Technical Lead for our quality control group. The successful candidate will collaborate with engineering, QC, and clients, and shall be responsible for developing and executing testing scripts to ensure all aspects of client data, as transformed to reports, meet stringent quality standards.

Job Requirements:
- Solid experience programming with Python and Java, preferably in a UNIX environment.
- Strong knowledge of databases (Oracle) and SQL - knowledge of PL/SQL preferred.
- Strong analytical skills (mathematics or statistics background preferred).
- Demonstrated business knowledge of public education systems in the United States helpful.
- We are using Python to: prototype and simulate key product functionality, as well as test the client data for consistency and test product subsystems for correctness.
- Candidates who elaborate on their knowledge of the above *key* requirements will get the best response.

My client hires on a contract basis first and then it becomes full time if both parties are happy. Candidates MUST be permanent and local tri-state (NY, NJ, CT) residents. Please submit Word resume and hourly/salary requirements to python@nyc-search.com

From greg@cosc.canterbury.ac.nz Wed Apr 2 01:58:31 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 Apr 2003 13:58:31 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> > It would be best if you would provide a patch. Try to locate the > primary source of the missing documentation (i.e. a TeX snippet), > and integrate this into the current CVS, then do a cvs diff. I'd rather not get involved in all that right now. I just want to draw this to the attention of whoever is maintaining the documentation. > submit a bug report That's what I *want* to do, but I can't figure out how. Following the obvious links leads me to the SourceForge Bug Tracker page, but I can't find anything there for submitting a new bug report, only browsing existing ones. Can someone please tell me how to submit a bug report? Thanks, Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From lalo@laranja.org Wed Apr 2 02:40:11 2003 From: lalo@laranja.org (Lalo Martins) Date: Tue, 1 Apr 2003 23:40:11 -0300 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?)
In-Reply-To: <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> References: <m34r5ipwzi.fsf@mira.informatik.hu-berlin.de> <200304020158.h321wVY02357@oma.cosc.canterbury.ac.nz> Message-ID: <20030402024010.GG6887@laranja.org> On Wed, Apr 02, 2003 at 01:58:31PM +1200, Greg Ewing wrote: > > That's what I *want* to do, but I can't figure out how. > Following the obvious links leads me to the SourceForge > Bug Tracker page, but I can't find anything there for > submitting a new bug report, only browsing existing ones. > > Can someone please tell me how to submit a bug report? You need to login to sourceforge. Once you do that you should see a bar that looks like Submit New | Browse | Reporting | Admin the link you want is "Submit New". []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/ From tim.one@comcast.net Wed Apr 2 03:03:45 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 01 Apr 2003 22:03:45 -0500 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> [Eric Huss] > We had a bug in one of our extension modules that caused a core dump in > PyErr_NormalizeException(). At the very top of the function (line 133) it > checks for a NULL type. I think it should have a "return" here so that > the code does not continue and thus dump core on line 153 when it calls > PyClass_Check(type). This should also make the comment not lie about > dumping core. ;) > > Just thought I'd pass it on.. I agree the code doesn't make sense, but the comment doesn't either. I'm in favor of replacing the guts of the if (type == NULL) { block with a call to Py_FatalError(). 
From barry@python.org Wed Apr 2 04:06:32 2003 From: barry@python.org (Barry Warsaw) Date: 01 Apr 2003 23:06:32 -0500 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEEOECAB.tim.one@comcast.net> Message-ID: <1049256392.3057.3.camel@geddy> On Tue, 2003-04-01 at 22:03, Tim Peters wrote: > [Eric Huss] > > I agree the code doesn't make sense, but the comment doesn't either. I'm in > favor of replacing the guts of the > > if (type == NULL) { > > block with a call to Py_FatalError(). +1 -Barry From drifty@alum.berkeley.edu Wed Apr 2 04:52:22 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 1 Apr 2003 20:52:22 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 Message-ID: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> You guys have 24 hours to correct my usual bunch of mistakes. Also give me feedback on the new format for the Quickies section. ----------- +++++++++++++++++++++++++++++++++++++++++++++++++++++ python-dev Summary for 2003-03-16 through 2003-03-31 +++++++++++++++++++++++++++++++++++++++++++++++++++++ .. _last summary: http://www.python.org/dev/summary/2003-03-01_2003-03-15.html ====================== Summary Announcements ====================== PyCon is now over! It was a wonderful experience. Getting to meet people from python-dev in person was great. The sprint was fun and productive (work on the AST branch, caching where something is found in an inheritance tree, and a new CALL_ATTR opcode were all worked on). Definitely was worth it. I am trying a new way of formatting the Quickies_ section, using non-inline implicit links instead of inlined ones. I am hoping this will read better in the text version of the summary. If you have an opinion on whether the new or old version is better let me know.
And remember, the last time I asked for an opinion, Michael Chermside was the only person to respond and thus ended up making an executive decision. .. _PyCon: http://www.python.org/pycon/ ======================== `Re: lists v. tuples`__ ======================== __ http://mail.python.org/pipermail/python-dev/2003-March/034029.html Splinter threads: - `Re: Re: lists v. tuples <http://mail.python.org/pipermail/python-dev/2003-March/034070.html>`__ This developed from a thread covered in the `last summary`_ that discussed the different uses of lists and tuples. By the start date for this summary, though, it had turned into a discussion on comparisons. This occurred when sorting heterogeneous objects came up. Guido commented that having anything beyond equality and non-equality tests for non-related objects does not make sense. This also led Guido to comment that "TOOWTDI makes me want to get rid of __cmp__" (TOOWTDI is "There is Only One Way to Do It"). Now before people start screaming bloody murder over the possible future loss of __cmp__() (which probably won't happen until Python 3), realize that all comparisons can be done using the six rich comparison methods (__lt__(), __eq__(), etc.). There is some possible code elegance lost if you have to use two rich comparisons instead of a single __cmp__() comparison, but nothing that you could do before becomes impossible. This all led Guido to suggest introducing the function before(). This would be used for arbitrary ordering of objects. Alex Martelli said it would "be very nice if before(x,y) were the same as x<y whenever the latter doesn't raise an exception, if feasible". He also said that it should probably "define a total ordering, i.e. the implied equivalence being equality".
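The point that __cmp__()-style code can always be rewritten with the rich comparison methods can be sketched concretely. The class below is invented purely for illustration; it uses ``functools.total_ordering`` (a stdlib addition from well after this summary) to derive the remaining comparisons from __eq__() and __lt__():

```python
# Illustrative only: a class ordered entirely through rich comparisons,
# with no __cmp__(). functools.total_ordering fills in __le__, __gt__,
# and __ge__ from the __eq__ and __lt__ given here.
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)
```

With this, ``Version(2, 2) < Version(2, 3)`` and ``Version(2, 3) >= Version(2, 2)`` both hold without any __cmp__() in sight; the elegance cost mentioned above is just writing two methods instead of one.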
================================ `Fast access to __builtins__`__ ================================ __ http://mail.python.org/pipermail/python-dev/2003-March/034243.html There have been rumblings on the list as of late of disallowing shadowing of built-ins. Specifically, the idea of someone injecting something into a module's namespace that shadows a built-in (by doing something like ``socket.len = lambda x: 42``, overriding len() inside the socket module) is slightly nasty, rarely done, and prevents the core from optimizing for built-ins. Raymond Hettinger, in an effort to see how to speed up built-in access, came up with the idea of replacing LOAD_GLOBAL opcodes with LOAD_CONST after putting the built-in being called into the constants table. This would leave shadowing of built-ins locally unaffected but prevent shadowing at the module level. Raymond suggested turning on this behavior when running Python with -O. The idea of turning this on when running with the -O option was shot down. The main argument was that this changes semantics, which is not acceptable for the -O flag. It was mentioned that -OO can change semantics, but even that is questionable. So this led to some suggestions of how to turn this kind of feature on. Someone suggested something like a pragma (think Perl) or some other mechanism at the module level. Guido didn't like this idea since he does not want modules to be riddled with code to turn on module-level optimizations. But all of this was partially shot down when Guido stepped in and reiterated he just wanted to prevent outside code from shadowing built-ins for a module. The idea is that if it can be proven that a module does not shadow a built-in, the compiler can output an opcode specific for that built-in, e.g. a call to len() could compile to an opcode that calls PyObject_Size() if the compiler can prove that len() is not shadowed in the module at any point. Neil Schemenauer suggested adding a warning for when this kind of shadowing is done.
Guido said fine as long as extension modules are exempt. Now no matter how well the warning is coded, it would be *extremely* difficult to catch something like ``import X; d = X.__dict__; d["len"] = lambda x: 42``. How do you deal with this? Guido said he has no issue declaring that something like this "is always prohibited". He said you could still do ``setattr(X, "len", lambda x: 42)``, though, and that might give you a warning. ================================ `capability-mediated modules`__ ================================ __ http://mail.python.org/pipermail/python-dev/2003-March/034149.html Splinter threads: - `Capabilities <http://mail.python.org/pipermail/python-dev/2003-March/034152.html>`__ The thread that will not die (nor does it look like it will in the near future); Guido asked to postpone discussing it until he gets back from `Python UK`_, which will carry the discussion into the next summary. I am ending up an expert at capabilities against my will. =) In case you have not been following all of this, capabilities as discussed here are the idea that security is based on passing around references to objects. If you have a reference you can use it with no restrictions. Security comes in by controlling whom you give references to. So I might ask for a reference to file(), but I won't necessarily get it. I could, instead, be handed a reference to a restrictive version of file() that only opens files in the OS's temporary file directory. If that is not clear, read the `last summary`_ on this thread. And now, on to the new stuff... One point made about capabilities is that they partially go against the Pythonic grain. Since you have to pass capabilities specifically and shouldn't allow them to be inherited, it does not go with the way you tend to write Python code. There were also suggestions to add arguments to import statements to give more fine-grained control over them. But it was pointed out that classes fit this bill.
The idea of limiting what modules are accessible by some code by not using a universally global scope (i.e., not using sys.modules) but by having a specific scope for each function was suggested. As Greg Ewing put it, "it would be dynamic scoping of the import namespace". While trying to clarify things (which had come up at PyCon thanks to the Open Space discussion held there on this subject), Guido made a good distinction between a rexec_ world (as in the module) and a capabilities world. In capabilities, security is based on passing around references that have the amount of power you are willing for them to have. In a rexec world, it is based on what powers the built-ins give you; there is no worry about passing around code. Also, in the rexec world, you can have the idea of a "workspace" where __builtin__ has very specific definitions of built-ins that are used when executing untrusted code. Ka-Ping Yee wrote up an example of what it would be like to code with capabilities (can be found at XXX ). .. _Python UK: http://www.python-uk.org/ .. _rexec: http://www.python.org/dev/doc/devel/lib/module-rexec.html ========= Quickies ========= `tzset`__ time.tzset() is going to be kept in Python, but only on UNIX. The testing suite was also loosened so as not to throw as many false negatives. __ http://mail.python.org/pipermail/python-dev/2003-March/034062.html `Windows IO`__ stdin and stdout on Windows are TTYs. You can use 3rd-party modules to get more control over the TTY. __ http://mail.python.org/pipermail/python-dev/2003-March/034102.html `Who approved PyObject_GenericGetIter()???`__ Splinter threads: `Re: [Python-checkins] python/dist/src/Modules _hotshot.c,...`__; `PyObject_GenericGetIter()`__ Raymond Hettinger wrote a function called PyObject_GenericGetIter() that returned self for objects that were iterators themselves.
Thomas Wouters didn't like the name and neither did Guido since it was not generic at all; it worked specifically with objects that were iterators themselves. Thus the function was renamed to PyObject_SelfIter(). __ http://mail.python.org/pipermail/python-dev/2003-March/034107.html __ http://mail.python.org/pipermail/python-dev/2003-March/034103.html __ http://mail.python.org/pipermail/python-dev/2003-March/034110.html `test_posix failures?`__ A test for posix.getlogin() was failing for Barry Warsaw under XEmacs (that is what he gets for not using Vim_ =). Thomas Wouters pointed out it only works when there is a utmp file somewhere. Basically it was agreed the test that was failing should be removed. __ http://mail.python.org/pipermail/python-dev/2003-March/034120.html .. _Vim: http://www.vim.org/ `Shortcut bugfix`__ Raymond Hettinger reported that a change in `_tkinter.c`_ for a function led to it returning strings or ints which broke PMW_ (although having a function return two different things was disputed in the thread; I think it used to return a string and now returns an int). The suggestion of making string.atoi() more lenient on its accepted arguments was made but shot down since it changes semantics. If you want to keep the old way of having everything in Tkinter return strings instead of more proper object types (such as ints where appropriate), you can put the line ``Tkinter.wantobjects = 0`` before the first creation of a tkapp object. __ http://mail.python.org/pipermail/python-dev/2003-March/034138.html .. _`_tkinter.c`: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_tkinter.c .. _PMW: http://pmw.sourceforge.net/ `csv package ready for prime-time?`__ Related: `csv package stitched into CVS hierarchy`__ Skip Montanaro: Okay to move csv_ package from the sandbox into the stdlib? Guido van Rossum: Yes.
__ http://mail.python.org/pipermail/python-dev/2003-March/034162.html __ http://mail.python.org/pipermail/python-dev/2003-March/034179.html .. _csv: http://www.python.org/dev/doc/devel/lib/module-csv.html `string.strip doc vs code mismatch`__ Neal Norwitz asked for someone to look at http://python.org/sf/697220 which updates string.strip() from the string_ module to take an optional second argument. The patch is still open. __ http://mail.python.org/pipermail/python-dev/2003-March/034167.html .. _string: http://www.python.org/dev/doc/devel/lib/module-string.html `Re: More int/long integration issues`__ The point was made that it would be nice if the statement ``if num in range(...): ...`` could be optimized by the compiler, when range() is known to be the built-in, by substituting it with something like xrange() and thus skipping the creation of a huge list. This would allow the removal of xrange() without issue. Guido suggested a restartable iterator (a generator would work wonderfully if you could just get everything else to make what range() returns look like the list it should be). __ http://mail.python.org/pipermail/python-dev/2003-March/034019.html `socket timeouts fail w/ makefile()`__ Skip Montanaro discovered that using the makefile() method on a socket caused the file-like object not to observe the new timeout facility introduced in Python 2.3. He has since patched it so that it works properly and that sockets always have a makefile() (which wasn't always the case before). __ http://mail.python.org/pipermail/python-dev/2003-March/034177.html `New Module? Tiger Hashsum`__ Tino Lange implemented a wrapper for the `Tiger hash sum`_ for Python and asked how he could get it added to the stdlib. He was told that he would need community backing before his module could be added in order to make sure that there is enough demand to warrant the addition. __ http://mail.python.org/pipermail/python-dev/2003-March/034191.html ..
_Tiger hash sum: http://www.cs.technion.ac.il/~biham/Reports/Tiger/ `Icon for Python RSS Feed?`__ Tino Lange asked if an XML RSS feed icon could be added at http://www.python.org/ for http://www.python.org/channews.rdf . It has been added. __ http://mail.python.org/pipermail/python-dev/2003-March/034196.html `How to suppress instance __dict__?`__ David Abrahams asked if there was an easy way to suppress an instance __dict__'s creation from a metaclass. The answer turned out to be no. __ http://mail.python.org/pipermail/python-dev/2003-March/034197.html `Weekly Python Bug/Patch Summary`__ Another summary can be found at http://mail.python.org/pipermail/python-dev/2003-March/034286.html Skip Montanaro's weekly reminder of how Python ain't perfect. __ http://mail.python.org/pipermail/python-dev/2003-March/034200.html `[ot] offline`__ Samuele Pedroni is off relaxing and is going to be offline for two weeks starting March 23. __ http://mail.python.org/pipermail/python-dev/2003-March/034204.html `funny leak`__ Christian Tismer discovered a memory leak in a funky def statement he came up with. The leak has since been squashed (done at PyCon_ during the sprint, actually). __ http://mail.python.org/pipermail/python-dev/2003-March/034212.html `Checkins to Attic?`__ CVS_ uses something called the Attic to put files that are only in a branch but not the HEAD of a tree. __ http://mail.python.org/pipermail/python-dev/2003-March/034230.html .. _CVS: http://www.cvshome.org/ `ossaudiodev tweak needs testing`__ Greg Ward asked people who are running Linux or FreeBSD to execute ``Lib/test/regrtest.py -uaudio test_ossaudiodev`` so as to test his latest change to ossaudiodev_. __ http://mail.python.org/pipermail/python-dev/2003-March/034233.html ..
_ossaudiodev: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/ossaudiodev.c `cvs.python.sourceforge.net fouled up`__ Apparently when you get that nice message from SourceForge_ telling you that recv() has aborted because of server overloading, you can rest assured that people with checkin rights get to continue to connect since they get priority. __ http://mail.python.org/pipermail/python-dev/2003-March/034234.html .. _SF: .. _SourceForge: http://www.sf.net/ `Doc strings for typeslots?`__ You can't add custom docstrings to things stored in typeobject slots at the C level. __ http://mail.python.org/pipermail/python-dev/2003-March/034239.html `Compiler treats None both as a constant and variable`__ As of now the compiler outputs opcode that treats None as both a global and a constant. That will change at some point when assigning to None becomes an error instead of a warning as it is in Python 2.3; the change will possibly be made in 2.4. __ http://mail.python.org/pipermail/python-dev/2003-March/034281.html `iconv codec`__ M.A. Lemburg questioned whether the iconv codec was ready for prime-time. There have been multiple issues with it and most seem to stem from a platform's codec and not ones that come with Python. This affects all u"".encode() calls when the codec does not come with Python. Hye-Shik Chang said he would get his iconv codec NG patch up on SF in the next few days and that would be applied. __ http://mail.python.org/pipermail/python-dev/2003-March/034300.html From beau@nyc-search.com Wed Apr 2 04:52:26 2003 From: beau@nyc-search.com (beau@nyc-search.com) Date: Tue, 01 Apr 2003 23:52:26 -0500 Subject: [Python-Dev] Python Technical Lead, New York, NY Message-ID: <3E8A6C8A.223A19FC@nyc-search.com>
http://www.nyc-search.com/jobs/python.html

Python Technical Lead, New York, NY

We are seeking an experienced and highly-talented programmer/scripter/analyst to fill the position of Technical Lead for our quality control group. The successful candidate will collaborate with engineering, QC, and clients, and shall be responsible for developing and executing testing scripts to ensure all aspects of client data, as transformed to reports, meet stringent quality standards.

Job Requirements:

- Solid experience programming with Python and Java, preferably in a UNIX environment.
- Strong knowledge of databases (Oracle) and SQL - knowledge of PL/SQL preferred.
- Strong analytical skills (mathematics or statistics background preferred).
- Demonstrated business knowledge of public education systems in the United States helpful.
- We are using Python to: prototype and simulate key product functionality, as well as test the client data for consistency and test product subsystems for correctness.
- Candidates who elaborate on their knowledge of the above *key* requirements will get the best response.

My client hires on a contract basis first and then it becomes full time if both parties are happy.

Candidates MUST be permanent and local tri-state (NY, NJ, CT) residents.

Please submit Word resume and hourly/salary requirements to python@nyc-search.com

From Jack.Jansen@cwi.nl Wed Apr 2 09:21:17 2003 From: Jack.Jansen@cwi.nl (Jack Jansen) Date: Wed, 2 Apr 2003 11:21:17 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <20030402024010.GG6887@laranja.org> Message-ID: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> On Wednesday, Apr 2, 2003, at 04:40 Europe/Amsterdam, Lalo Martins wrote: >> Can someone please tell me how to submit a bug report? > > You need to login to sourceforge. > > Once you do that you should see a bar that looks like > Submit New | Browse | Reporting | Admin > the link you want is "Submit New". Aargh, this is very bad! I'm always logged in when I visit sourceforge (and I assume that most of us are), I wasn't aware of the fact that if you are not logged in you get no indication whatsoever that it is possible to submit bugs. Do we have control over what is on that page, i.e.
could we add a note to the top saying "If you want to submit a new bug please log in first"? Otherwise I think the "bugs" link on www.python.org should go to a local page which explains this before sending people off to the sourceforge tracker. -- Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From dave@boost-consulting.com Wed Apr 2 12:57:34 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 02 Apr 2003 07:57:34 -0500 Subject: [Python-Dev] How to suppress instance __dict__? In-Reply-To: <021d01c2f82c$9b6d3470$4ee1afca@kim> ("Joel de Guzman"'s message of "Tue, 1 Apr 2003 16:56:34 +0800") References: <ur88zougj.fsf@boost-consulting.com> <200303231321.h2NDLCF04208@pcp02138704pcs.reston01.va.comcast.net> <uof42i1ey.fsf@boost-consulting.com> <200303231546.h2NFkex04473@pcp02138704pcs.reston01.va.comcast.net> <uvfyayr0y.fsf@boost-consulting.com> <200303232104.h2NL4GQ04819@pcp02138704pcs.reston01.va.comcast.net> <021d01c2f82c$9b6d3470$4ee1afca@kim> Message-ID: <uvfxxys3l.fsf@boost-consulting.com> Hi, Joel -- I don't think this is more than marginally appropriate for python-dev, and probably we shouldn't bother Guido about it until I've failed to help you first. Everybody else can ignore the rest of this message unless they have a sick fascination with Boost.Python... "Joel de Guzman" <joel@boost-consulting.com> writes: > Ok, I'm lost. Please be easy with me, I'm still learning the C API > interfacing with Python :) Here's what I have so far. 
Emulating the > desired behavior in Python, I can do: > > class EnumMeta(type): > def __new__(cls, name, bases, dict): > C = type.__new__(cls, name, bases, dict) > del C.__getstate__ > return C > > class Enum(int): > __metaclass__ = EnumMeta > __slots__ = () > > > x = Enum(1964) > print x > > import pickle > print "SAVING" > out_x = pickle.dumps(x) > > print "LOADING" > xl = pickle.loads(out_x) > print xl > > I'm trying to rewrite this in C/C++ with the intent to patch > Boost.Python to allow pickling on enums. I took on this task to > learn more about the low level details of Python C interfacing. > So far, I have implemented EnumMeta in C that does not override > anything yet and installed that as the metaclass of Enum. > > I was wondering... Is there some C code somewhere that I can see > that implements some sort of meta-stuff? We have some in Boost.Python already, and I'm about to check in some more to implement static data members. > I read PEP 252 and 253 and "Unifying Types and Classes in Python > 2.2". The examples there (specifically the class autoprop) are > written in Python. I tried searching for examples in C from the > current CVS snapshot of 2.3 but I failed in doing so. I'm sure it's > there, but I don't know where to find it. Actually there are very few metaclasses in Python proper. AFAIK, PyType_Type is the only metaclass in the core. > To be specific, I'm lost in trying to implement tp_new of > PyTypeObject. How do I call the default tp_new for metaclasses? PyTypeObject.tp_new( /*args here*/ ) should work. HTH, -- Dave Abrahams Boost Consulting www.boost-consulting.com From zooko@zooko.com Wed Apr 2 13:39:33 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 08:39:33 -0500 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: Message from Brett Cannon <bac@OCF.Berkeley.EDU> of "Tue, 01 Apr 2003 20:52:22 PST."
<Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> Message-ID: <E190iSj-0007S7-00@localhost> Brett Cannon <bac@OCF.Berkeley.EDU> wrote: > > One point made about capabilities is that they partially go against the > Pythonic grain. Since you have to pass capabilities specifically and > shouldn't allow them to be inherited, it does not go with the way you tend > to write Python code. This doesn't make sense to me, and I don't recall a message which asserted it. If capabilities were implemented as Python references, you could inherit capabilities (== references) from superclasses, just as you can currently do. The rest looks like a good summary! Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From nas@python.ca Wed Apr 2 14:35:53 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 2 Apr 2003 06:35:53 -0800 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> Message-ID: <20030402143553.GA6801@glacier.arctrix.com> Brett Cannon wrote: > Neil Schemenauer suggested adding a warning for when this kind of > shadowing is done. There is a patch on SF (http://www.python.org/sf/711448) that adds a warning. It probably needs a bit of polish but I think it could go into 2.3. Neil From op73418@mail.telepac.pt Wed Apr 2 14:42:41 2003 From: op73418@mail.telepac.pt (=?iso-8859-1?Q?Gon=E7alo_Rodrigues?=) Date: Wed, 2 Apr 2003 15:42:41 +0100 Subject: [Python-Dev] Super and properties Message-ID: <001401c2f926$1d32d7e0$a8130dd5@violante> Hi all, Since this is my first post here, let me first introduce myself. I'm Gonçalo Rodrigues. I work in mathematics, mathematical physics to be more precise. I am a self-taught hobbyist programmer and fell in love with Python a year and a half ago.
And of interesting personal details this is about all so let me get down to business. My problem has to do with super that does not seem to work well with properties. I posted to comp.lang.python a while ago and there I was advised to post here. So, suppose I override a property in a subclass, e.g. >>> class test(object): ... def __init__(self, n): ... self.__n = n ... def __get_n(self): ... return self.__n ... def __set_n(self, n): ... self.__n = n ... n = property(__get_n, __set_n) ... >>> a = test(8) >>> a.n 8 >>> class test2(test): ... def __init__(self, n): ... super(test2, self).__init__(n) ... def __get_n(self): ... return "Got ya!" ... n = property(__get_n) ... >>> b = test2(8) >>> b.n 'Got ya!' Now, since I'm overriding a property, it is only normal that I may want to call the property implementation in the super class. But the obvious way (to me at least) does not work: >>> print super(test2, b).n Traceback (most recent call last): File "<interactive input>", line 1, in ? AttributeError: 'super' object has no attribute 'n' I know I can get at the property via the class, e.g. do >>> test.n.__get__(b) 8 >>> Or, not hardcoding the test class, >>> b.__class__.__mro__[1].n.__get__(b) 8 But this is ugly at best. To add to the puzzle, the following works, albeit not in the way I expected >>> super(test2, b).__getattribute__('n') 'Got ya!' Since I do not know if this is a bug in super or a feature request for it, I thought I'd better post here and leave it to your consideration. With my best regards, G. 
Rodrigues From lkcl@samba-tng.org Wed Apr 2 09:07:26 2003 From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton) Date: Wed, 2 Apr 2003 09:07:26 +0000 Subject: [Python-Dev] [PEP] += on return of function call result Message-ID: <20030402090726.GN1048@localhost> example code: log = {} for t in range(5): for r in range(10): log.setdefault(r, '') += "test %d\n" % t pprint(log) instead, as the above is not possible, the following must be used: from operator import add ... ... ... add(log.setdefault(r, ''), "test %d\n" % t) ... ARGH! just checked - NOPE! add doesn't work. and there's no function "radd" or "__radd__" in the operator module. unless there are really good reasons, can i recommend allowing += on return result of function calls. i cannot honestly think of or believe that there is a reasonable justification for restricting the += operator. append() on the return result of setdefault works absolutely fine, which is GREAT because you have no idea how long i have been fed up of not being able to do this in one line: log = {} log.setdefault(99, []).append("test %d\n" % t) l. From ark@research.att.com Wed Apr 2 14:54:35 2003 From: ark@research.att.com (Andrew Koenig) Date: 02 Apr 2003 09:54:35 -0500 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <20030402090726.GN1048@localhost> References: <20030402090726.GN1048@localhost> Message-ID: <yu99n0j9gdas.fsf@europa.research.att.com> Luke> example code: Luke> log = {} Luke> for t in range(5): Luke> for r in range(10): Luke> log.setdefault(r, '') += "test %d\n" % t Luke> pprint(log) Luke> instead, as the above is not possible, the following must be used: Luke> from operator import add Luke> ... Luke> ... Luke> ... Luke> add(log.setdefault(r, ''), "test %d\n" % t) Luke> ... ARGH! just checked - NOPE! add doesn't work. Luke> and there's no function "radd" or "__radd__" in the Luke> operator module. Why can't you do this? 
for t in range(5): for r in range(10): foo = log.setdefault(r,'') foo += "test %d\n" % t -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From lkcl@samba-tng.org Wed Apr 2 15:12:33 2003 From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton) Date: Wed, 2 Apr 2003 15:12:33 +0000 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <yu99n0j9gdas.fsf@europa.research.att.com> References: <20030402090726.GN1048@localhost> <yu99n0j9gdas.fsf@europa.research.att.com> Message-ID: <20030402151232.GX1048@localhost> On Wed, Apr 02, 2003 at 09:54:35AM -0500, Andrew Koenig wrote: > Why can't you do this? > > for t in range(5): > for r in range(10): > foo = log.setdefault(r,'') > foo += "test %d\n" % t because i am thick? ... now why didn't that occur to me :) thanks andrew, l. p.s. so it's on the "would be nice to have" From ben@algroup.co.uk Wed Apr 2 16:22:09 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Wed, 02 Apr 2003 17:22:09 +0100 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> References: <5.1.1.6.0.20030401124212.01e03670@mail.rapidsite.net> Message-ID: <3E8B0E31.5060001@algroup.co.uk> This message came unglued from the rest of the thread, so I'm going to unglue my response from my catching up with the rest of the thread (which I am partway through at the moment) ;-) Phillip J. Eby wrote: > >However, you don't use the same technique to control access to Python > *modules* > >such as the zipfile module, because the "import zipfile" statement > will give the > >current scope access to the zipfile module even if nobody has granted > such > >access to the current scope. > >... 
> >So your solution to this, to prevent code from grabbing privileges > willy nilly > >via "import" and builtins, is rexec, which creates a scope in which code > >executes (now called a "workspace"), and allows you to control which > builtins > >and modules are available for code executing in that "workspace". > > Almost. I think you may be confusing module *code* and module > *objects*. Guido pointed this out earlier. > > A Python module object is populated by executing a body of *code* > against the module *object* dictionary. The module object dictionary > contains a '__builtins__' entry that gives it its "base" capabilities. > > Module *objects* possess capabilities, which are in their dictionary or > reachable from it. *Code* doesn't possess capabilities except to > constants used in the code. So access to *code* only grants you > capabilities to the code and its constants. > > So, in order to provide a capability-safe environment, you need only > provide a custom __import__ which uses a different 'sys.modules' that is > specific to that environment. At that point, a "workspace" consists of > an object graph rooted in the supplied '__builtins__', locals(), > globals(), and initially executing code. > > We can then see that the standard Python environment is in fact a > capability system, wherein everything is reachable from everything else. I'm not quite sure what you mean by this. Of course, the fact that Python doesn't seem to be all that far from a capability system is one of the attractions, but until the holes you mention (and perhaps others) are plugged, it isn't a capability system. > > The "holes" in this capability system, then, are: > > 1. introspective abilities that allow "breaking out" of the workspace > (such as the ability to 'sys._getframe()' or examine tracebacks to > "reach up" to higher-level stack frames) > > 2. the structuring of the library in ways that equate creating an > instance of a class with an "unsafe" capability. 
(E.g., creating > instances of 'file()') coupled with instance->class introspection > > 3. Lack of true "privacy" for objects. (Proxies are a useful way to > address this issue, because they allow more than one "capability" to > exist for the same object.) Of course, once you have a capability system, you get the effect of more than one capability for the same object for free, as it were, simply by, err, proxying with other objects. The objection to doing it the other way round is that for capability languages to be truly usable the capability functionality needs to be automatic, not something that is painfully added to each class or object (at least, that is the claim we capability mavens are making). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From aahz@pythoncraft.com Wed Apr 2 17:55:48 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 2 Apr 2003 12:55:48 -0500 Subject: [Python-Dev] Security challenge (was Re: Capabilities) In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <3E8768BE.8010603@prescod.net> <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <20030402175548.GA25135@panix.com> On Mon, Mar 31, 2003, Ka-Ping Yee wrote: > > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding > of what Python would need to do to make secure programming easier. Okay, how about using LDAP to secure access to a database and give each user appropriate privileges? I'm just throwing this in as an example of mediated access that's required to be effective in the Real World [tm]; I'm sure you can think of simpler examples if you want. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. 
We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From drifty@alum.berkeley.edu Wed Apr 2 20:36:38 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Wed, 2 Apr 2003 12:36:38 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <E190iSj-0007S7-00@localhost> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> Message-ID: <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> [Zooko]
> > Brett Cannon <bac@OCF.Berkeley.EDU> wrote:
> > >
> > One point made about capabilities is that they partially go against the
> > Pythonic grain. Since you have to pass capabilities specifically and
> > shouldn't allow them to be inherited, it does not go with the way you tend
> > to write Python code.
>
> This doesn't make sense to me, and I don't recall a message which asserted it.
>
It was said in an email. I don't remember who off the top of my head, but someone stated something along these lines.
> If capabilities were implemented as Python references, you could inherit
> capabilities (== references) from superclasses, just as you can currently do.
>
That's why it says "shouldn't" instead of "couldn't". I could re-word this to go more along the way Ping phrased it in how the class statement does not make perfect sense for capabilities but it can be used.
> The rest looks like a good summary!
>
Thanks. -Brett From martin@v.loewis.de Wed Apr 2 21:24:32 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 02 Apr 2003 23:24:32 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> References: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> Message-ID: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Jack Jansen <Jack.Jansen@cwi.nl> writes: > Do we have control over what is on that page, i.e.
could we add a > note to the top saying "If you want to submit a new bug please log > in first"? Please have a look at the page now. Look ok? Is that needed for patches as well? Regards, Martin From fdrake@acm.org Wed Apr 2 21:34:24 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 2 Apr 2003 16:34:24 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> References: <7574B507-64EC-11D7-80C3-0030655234CE@cwi.nl> <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Message-ID: <16011.22368.351593.284577@grendel.zope.com> Martin v. Löwis writes:
> Please have a look at the page now. Look ok? Is that needed for
> patches as well?
Yes; that tracker has the same requirement for submission. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From zooko@zooko.com Wed Apr 2 22:53:31 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 17:53:31 -0500 Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: Message from Brett Cannon <bac@OCF.Berkeley.EDU> of "Wed, 02 Apr 2003 12:36:38 PST." <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> Message-ID: <E190r6p-0002Yx-00@localhost>
> > > One point made about capabilities is that they partially go against the
> > > Pythonic grain. ...
> > If capabilities were implemented as Python references, you could inherit
> > capabilities (== references) from superclasses, just as you can currently do.
> > That's why it says "shouldn't" instead of "couldn't". I could re-word
> this to go more along the way Ping phrased it in how the class statement
> does not make perfect sense for capabilities but it can be used.
I can't speak for Ping, but I would be quite surprised if he thought that capabilities were un-Pythonic. (I wouldn't be surprised if he disapproved of the notion of classes in a programming language, regardless of security considerations...) Speaking for myself, capabilities have two main advantages: they fit with the Zen of Python, they enable higher-order least-privilege, and they fit with the principle of unifying designation and authority. But seriously, I feel that capabilities fit with normal Python programming as it is currently practiced. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From zooko@zooko.com Wed Apr 2 23:08:12 2003 From: zooko@zooko.com (Zooko) Date: Wed, 02 Apr 2003 18:08:12 -0500 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: Message from Ka-Ping Yee <ping@zesty.ca> of "Tue, 01 Apr 2003 14:12:49 CST." <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> References: <Pine.LNX.4.33.0304011407390.4222-100000@server1.lfw.org> Message-ID: <E190rL2-0002lv-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Ping wrote: > > > I think that in restricted-execution-mode (hereafter: "REM", as per Greg Ewing's > > suggestion [1]), Python objects have encapsulation -- one can't access their > > private data without their permission. > > > > Once this is done, Python references are capabilities. > > Aaack! I wish you would *stop* saying that! > > There is no criterion by which a reference is or is not a capability. > To talk in such terms only confuses the issue. Let me be a little more precise. Once Python objects are encapsulated, then possession of a reference is constrained in the following way: you can have a reference only if another object that had it chose to give it to you (or if you create something yourself, in which case you get the first-ever reference to it). 
This constraint happens to be the same constraint that the rule of capabilities imposes on the transmission of capabilities: you can have a capability only if someone else who had it chose to give it to you (or if you create something yourself, in which case you get the first-ever capability to it). Therefore, if you wish to use capability access control to manage access to resources in Python you can use the following technique:

1. Encapsulate the resource that you wish to control in a Python object.
2. Say to yourself "References are capabilities!".
3. Control the way references to that object are shared.

Doing it this way will yield the advantages that capability access control enjoys over alternative access control models. It also has the advantage that your skills at Python programming can be applied directly to the problem of managing access control, without requiring you to learn any new policy language or new concepts. You are quite right, Ping, that capability access control could be enforced in other ways in Python. I didn't mean to say "capabilities are Python references", which would imply that capability access control could not be implemented in any other way. I'm deliberately refraining from posting about the issue of controlling import of modules and builtins in an attempt to "slow down" the discussion until Guido returns from Python UK. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From greg@cosc.canterbury.ac.nz Thu Apr 3 01:07:52 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 13:07:52 +1200 (NZST) Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <20030402151232.GX1048@localhost> Message-ID: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> Andrew Koenig wrote:
> Why can't you do this?
> foo = log.setdefault(r,'')
> foo += "test %d\n" % t
You can do it, but it's useless!
>>> d = {}
>>> foo = d.setdefault(42, "buckle")
>>> foo += " my shoe"
>>> d
{42: 'buckle'}

What Mr. Leighton wanted is *impossible* when the value concerned is immutable, because by the time you get to the += operator, there's no information left about where the value came from, and thus no way to update the dict with the new value. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 3 02:19:51 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 14:19:51 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m38yuslhin.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304030219.h332Jp223291@oma.cosc.canterbury.ac.nz> Martin: > Please have a look at the page now. Look ok? What page are you talking about, exactly? I just tried the "Bug Tracker" link in the sidebar of www.python.org, and it still goes straight to a sourceforge page, which looks just the same as before as far as I can tell. What am I supposed to be seeing? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Apr 3 02:31:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 02 Apr 2003 21:31:43 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <200304030219.h332Jp223291@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCEEGCECAB.tim.one@comcast.net> [Greg Ewing]
> What page are you talking about, exactly? I just tried
> the "Bug Tracker" link in the sidebar of www.python.org,
> and it still goes straight to a sourceforge page, which
> looks just the same as before as far as I can tell.
>
> What am I supposed to be seeing?
I expect he wants you to see the line that says

    Please log into SourceForge to submit a new report.

below the filter boxes and above the 1-line bug summaries. From ark@research.att.com Thu Apr 3 02:38:48 2003 From: ark@research.att.com (Andrew Koenig) Date: 02 Apr 2003 21:38:48 -0500 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> References: <200304030107.h3317qq20982@oma.cosc.canterbury.ac.nz> Message-ID: <yu99of3otidj.fsf@europa.research.att.com> Greg> Andrew Koenig wrote:
>> Why can't you do this?
>> foo = log.setdefault(r,'')
>> foo += "test %d\n" % t
Greg> You can do it, but it's useless!
>>>> d = {}
>>>> foo = d.setdefault(42, "buckle")
>>>> foo += " my shoe"
>>>> d
Greg> {42: 'buckle'}
Greg> What Mr. Leighton wanted is *impossible* when the value
Greg> concerned is immutable, because by the time you get to
Greg> the += operator, there's no information left about where
Greg> the value came from, and thus no way to update the
Greg> dict with the new value.
Of course it's impossible when the value is immutable, because += can't mutate it :-) However, consider this:

    foo = []
    foo += ["my shoe"]

No problem, right? So the behavior of

    foo = d.setdefault(r,'')
    foo += "test %d\n" % t

depends on what type foo has, and the OP didn't say. But whatever type foo might have, the behavior of the two statements above ought logically to be the same as the theoretical behavior of

    d.setdefault(r,'') += "test %d\n" % t

which is what the OP was trying to achieve.
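The distinction being drawn here is easy to see in a short sketch (the dict, keys and values below are illustrative, not taken from the thread): += after dict.setdefault() only "sticks" when the stored value is mutable, because only then does += mutate the shared object in place instead of rebinding a local name.

```python
log = {}

# Immutable str: foo gets the stored value, but += rebinds the name
# foo to a brand-new string; the dict never sees the change.
foo = log.setdefault(0, '')
foo += "test\n"
assert log[0] == ''

# Mutable list: bar aliases the list stored in the dict, and += calls
# list.__iadd__, which extends that same list in place.
bar = log.setdefault(1, [])
bar += ["test\n"]
assert log[1] == ["test\n"]
```

This is the same reason the mutable-value idiom `log.setdefault(r, []).append(...)` works without any assignment at all.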
-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From greg@cosc.canterbury.ac.nz Thu Apr 3 02:56:43 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 14:56:43 +1200 (NZST) Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEGCECAB.tim.one@comcast.net> Message-ID: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> Tim Peters <tim.one@comcast.net>:
> I expect he wants you to see the line that says
>
> Please log into SourceForge to submit a new report.
>
> below the filter boxes and above the 1-line bug summaries.
Hmmm, okay, I can see it now, but it would be easy to miss if I weren't looking for it. Perhaps it could be made a little larger and set off from the items above and below it? Ideally, of course, the Submit New button should always be there, and lead to a page telling you to log in if you're not already. But presumably you don't have that much control over the page? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 3 03:04:27 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Apr 2003 15:04:27 +1200 (NZST) Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <yu99of3otidj.fsf@europa.research.att.com> Message-ID: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Andrew Koenig <ark@research.att.com>:
> So the behavior of
>
>     foo = d.setdefault(r,'')
>     foo += "test %d\n" % t
>
> depends on what type foo has, and the OP didn't say.
I assumed that the code snippet was from his actual application, in which case he *did* want it to work on strings, in which case, even if he had the feature he wanted, it wouldn't have helped him.
I think the fact that this would only work when the value was mutable is a good reason to disallow it. Too big a source of surprises, otherwise. Being forced to find another way to update the value in this case is a feature, because the absence of such a way when the value is immutable makes it clear that there's no way to do what you're trying to do! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Apr 3 03:09:25 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 02 Apr 2003 22:09:25 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> [Greg Ewing] > Hmmm, okay, I can see it now, but it would be easy to > miss if I weren't looking for it. > > Perhaps it could be made a little larger and set off from > the items above and below it? We have no control over either -- SF lets us put words there, but that's all. I added another paragraph: Please log into SourceForge to submit a new report. SourceForge will not allow you to submit a new bug report unless you're logged in. It's not as invisible now. > Ideally, of course, the Submit New button should always > be there, and lead to a page telling you to log in > if you're not already. But presumably you don't have > that much control over the page? That's right. From fdrake@acm.org Thu Apr 3 03:57:38 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 2 Apr 2003 22:57:38 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) 
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> Message-ID: <16011.45362.723995.488848@grendel.zope.com> Tim Peters writes:
> We have no control over either -- SF lets us put words there, but that's
> all. I added another paragraph:
We can do a little more; see the Expat tracker's "Submit New" page for an example that enhances the presentation a bit: http://sourceforge.net/tracker/?func=add&group_id=10127&atid=110127 One catch, of course, is that the extra blurb is always shown, even for people that are already logged in (I suspect the majority of use is by the development team); the farther down the page we push the actual bug information, the harder it is for developers to use. We need to think about the tradeoff; it is important to encourage good reports from people interested in providing them and willing to do so. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From martin@v.loewis.de Thu Apr 3 04:33:31 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 03 Apr 2003 06:33:31 +0200 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <16011.45362.723995.488848@grendel.zope.com> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> Message-ID: <m37kac19pg.fsf@mira.informatik.hu-berlin.de> "Fred L. Drake, Jr." <fdrake@acm.org> writes:
> One catch, of course, is that the extra blurb is always shown, even
> for people that are already logged in (I suspect the majority of use
> is by the development team); the farther down the page we push the
> actual bug information, the harder it is for developers to use.
I have now boldified parts of it; this doesn't take more space, but should increase visibility.
I hope it's not considered annoying - feel free to undo that. If they would allow us to put PHP into that box, we could even suppress the text if the user was logged in. Regards, Martin From boris.boutillier@arteris.net Thu Apr 3 06:09:11 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 03 Apr 2003 08:09:11 +0200 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> References: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Message-ID: <1049350152.23533.20.camel@elevedelix> There is a way to do it, even with immutable objects, it is a little bit heavier:

>>> x = {}
>>> x.setdefault(42,'buckle')
'buckle'
>>> x[42] += '3'
>>> x
{42: 'buckle3'}

Boris Boutillier, - ARTERIS - Artwork Interconnecting System 6, Parc Ariane 78284 Guyancourt (FRANCE) On Thu, 2003-04-03 at 05:04, Greg Ewing wrote:
> Andrew Koenig <ark@research.att.com>:
>
> > So the behavior of
> >
> >     foo = d.setdefault(r,'')
> >     foo += "test %d\n" % t
> >
> > depends on what type foo has, and the OP didn't say.
>
> I assumed that the code snippet was from his actual application, in
> which case he *did* want it to work on strings, in which case, even if
> he had the feature he wanted, it wouldn't have helped him.
>
> I think the fact that this would only work when the value was mutable
> is a good reason to disallow it. Too big a source of surprises,
> otherwise.
>
> Being forced to find another way to update the value in this case is a
> feature, because the absence of such a way when the value is immutable
> makes it clear that there's no way to do what you're trying to do!
>
> Greg Ewing, Computer Science Dept, +--------------------------------------+
> University of Canterbury, | A citizen of NewZealandCorp, a |
> Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| > greg@cosc.canterbury.ac.nz +--------------------------------------+ > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From walter@livinglogic.de Thu Apr 3 08:53:17 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Thu, 03 Apr 2003 10:53:17 +0200 Subject: [Python-Dev] [PEP] += on return of function call result In-Reply-To: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> References: <200304030304.h3334Rc23393@oma.cosc.canterbury.ac.nz> Message-ID: <3E8BF67D.4060807@livinglogic.de> Greg Ewing wrote:
> Andrew Koenig <ark@research.att.com>:
>
>>So the behavior of
>>
>> foo = d.setdefault(r,'')
>> foo += "test %d\n" % t
>>
>>depends on what type foo has, and the OP didn't say.
>
> I assumed that the code snippet was from his actual application, in
> which case he *did* want it to work on strings, in which case, even if
> he had the feature he wanted, it wouldn't have helped him.
> [...]
> Being forced to find another way to update the value in this case is a
> feature, because the absence of such a way when the value is immutable
> makes it clear that there's no way to do what you're trying to do!
Mutable (or at least appendable) strings should probably be done with StringIO/cStringIO. How about adding support for __iadd__ and __str__ (and __unicode__) to both?
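The StringIO idea can be sketched as follows. This is an illustration, not code from the thread: it uses io.StringIO (the modern successor to the StringIO/cStringIO modules mentioned above) and its existing write() method in place of the proposed __iadd__ support, applied to the loop that started the thread.

```python
import io

# Store an appendable buffer in the dict; write() mutates the stored
# buffer in place, so no rebinding trick is needed at all.
log = {}
for t in range(5):
    for r in range(10):
        buf = log.setdefault(r, io.StringIO())
        buf.write("test %d\n" % t)

# Each slot has accumulated one line per outer iteration.
assert log[0].getvalue() == "test 0\ntest 1\ntest 2\ntest 3\ntest 4\n"
```

Calling str() on the buffers does not yield the text, which is presumably why __str__ support was being suggested; getvalue() is the existing spelling.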
Bye, Walter Dörwald From ben@algroup.co.uk Thu Apr 3 10:43:10 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 11:43:10 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <E1903R1-0005sc-00@localhost> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> Message-ID: <3E8C103E.90201@algroup.co.uk> Zooko wrote:
> In the capability way of life, it is still the case that access to the ZipFile
> class gives you the ability to open files anywhere in the system! (That is: I'm
> assuming for now that we implement capabilities without re-writing every
> dangerous class in the Library.) In this scheme, there are no flags, and when
> you run code that you think might misuse this feature, you simply don't give
> that code a reference to the ZipFile class. (Also, we have to arrange that it
> can't acquire a reference by "import zipfile".)
It would probably be helpful to explain what you (or, at least, I) would do if you (I) were writing from scratch, rather than "taming" the existing libraries. In this case, Zipfile would require a file capability to be passed to it at construction time, and so would become non-dangerous, which is, I think, where Guido is coming from. The risk only occurs because we want to not rewrite the whole library, just to wrap it, and it's important to understand that this isn't really the "proper" way to do it (though, of course, the ZipFile class is not unlike any of the other non-capability things we'd have to wrap anyway, given a non-capability OS underneath, it just happens to be one that _can_ be rewritten if we want to rewrite it). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit."
- Robert Woodruff From ben@algroup.co.uk Thu Apr 3 10:52:08 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 11:52:08 +0100 Subject: [Python-Dev] Capabilities (we already got one) In-Reply-To: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> References: <Pine.LNX.4.44.0304011713230.32508-100000@korak.zope.com> Message-ID: <3E8C1258.3070906@algroup.co.uk> Ken Manheimer wrote:
> On Tue, 1 Apr 2003, Ka-Ping Yee wrote:
> One big one seems to be: "What needs to be done to enable effective
> ("safe"?) use of python object (references) as capabilities?" I've
> seen answers to this roll by several times - i think we need to settle
> them, and collect the conclusions in a PEP. And we need to identify
> what other questions there are.
I am in the process of writing a PEP, and it is being informed by this discussion. Unfortunately, I have several day jobs and it's going somewhat slowly. I've also been bogged down somewhat in a theoretical discussion with a bunch of capability experts over globals and how they should work. However, we do appear to have reached closure on that issue: globals have to be at least transitively immutable - unfortunately, I have demonstrated that this requirement is not sufficient to make them safe, but it is (we believe) necessary. So, now I've sorted that one out I can complete my first pass on the PEP, which I expect to do in the next few days. At that point, I'm slightly unsure how best to proceed. The most obvious way is, of course, to follow the standard PEP procedure, but are there people who would like to comment before I submit the first draft? It is still going to be full of unanswered questions, but I do think we are near to the stage where we can start nailing down the answers. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit."
- Robert Woodruff From mcherm@mcherm.com Thu Apr 3 13:09:31 2003 From: mcherm@mcherm.com (Michael Chermside) Date: Thu, 3 Apr 2003 05:09:31 -0800 Subject: [Python-Dev] Re: Capabilities (we already got one) Message-ID: <1049375371.3e8c328be581d@mcherm.com> > The objection to doing it the other way round is that for capability > languages to be truly usable the capability functionality needs to be > automatic, not something that is painfully added to each class or object > (at least, that is the claim we capability mavens are making). Just how strong a claim are you making here? It seems to me that the need for security (via capabilities or any other mechanism) is an UNUSUAL need. Most programs don't need it at all, others need it in only a few places. Now don't get me wrong... when you DO need it, you really need it, and just throwing something together without explicit language support is somewhere between impossible and terrifically-difficult-and-error-prone. So supporting secure execution (via capabilities or whatever) in the language is a great idea. And I like the capabilities-as-references approach... it's simple, elegant, and not error prone. But if you're going so far as to imply that capability functionality needs to be present ALWAYS, and supported (and considered) in every class or object, then that's going too far. A random module should, for instance, be able to open arbitrary files in the file system without being passed any special objects, UNLESS we do something special when we load it to indicate that we want it to run in a restricted mode. I think that zipfile is a good example here. As a library developer, I should be able to write and distribute a zipfile module without thinking about capabilities or security at all. 
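The design contrast under discussion (library code that opens paths itself versus code that only uses the objects it is handed) can be sketched with two toy classes. `PathArchive` and `StreamArchive` are hypothetical names invented for this illustration, not the real zipfile API.

```python
import io

class PathArchive:
    """Ambient-authority style: merely importing this class grants the
    power to open any path on the filesystem."""
    def __init__(self, path):
        self.fp = open(path, 'rb')

class StreamArchive:
    """Capability style: the caller must already hold an open file-like
    object; the class itself can reach nothing else."""
    def __init__(self, fp):
        self.fp = fp

# Untrusted code given only StreamArchive can read just what it is handed.
arc = StreamArchive(io.BytesIO(b'PK\x03\x04'))
assert arc.fp.read(2) == b'PK'
```

Wrapping after the fact means turning every PathArchive-shaped library class into a StreamArchive-shaped one from the outside, which is the "painful" part being debated here.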
Of course, when others go to use it in a secure or restricted mode, they may find that it isn't as useful as they'd like, but (I believe) we shouldn't say NO ONE can have a zipfile module unless the module author is willing to address security issues. Someone can write securezipfile when they get the itch. Now, if we really built security (via capabilities) into the language from the ground up, then ALL modules would work by being passed appropriate capability objects, and only the starting script would possess all capabilities. There would be no "file" builtin, just file objects (and ReadOnlyFile objects, and DirectorySubTree objects, and so forth) which got passed around. So OF COURSE the original author of zipfile would write it to accept a file at construction rather than allowing it to open files... that would be the natural way to do things. But that language isn't python... and I don't think it's worth changing Python enough to get there. So if you're proposing this drastic a change (which I doubt), then I think it's too drastic. But if you're NOT, then you have to realize that there will be lots of library modules like zipfile, which were written by people who didn't give any thought to security (since it's a rarely-used feature of the language). So we need workarounds (like wrappers or proxies) that can be applied after-the-fact to modules and classes that weren't written with security in mind. If that's "painfully adding something to each class or object", then I don't see how it's to be avoided. -- Michael Chermside From zooko@zooko.com Thu Apr 3 13:29:57 2003 From: zooko@zooko.com (Zooko) Date: Thu, 03 Apr 2003 08:29:57 -0500 Subject: [Python-Dev] Capabilities In-Reply-To: Message from Ben Laurie <ben@algroup.co.uk> of "Thu, 03 Apr 2003 11:43:10 +0100." 
<3E8C103E.90201@algroup.co.uk> References: <Pine.LNX.4.33.0303301445260.22036-100000@server1.lfw.org> <200303310009.h2V09qx01754@pcp02138704pcs.reston01.va.comcast.net> <E1903R1-0005sc-00@localhost> <3E8C103E.90201@algroup.co.uk> Message-ID: <E1914mz-0005SN-00@localhost> (I, Zooko, wrote the lines prepended with "> > ".) Ben Laurie wrote: > > > In the capability way of life, it is still the case that access to the ZipFile > > class gives you the ability to open files anywhere in the system! (That is: I'm > > assuming for now that we implement capabilities without re-writing every > > dangerous class in the Library.) ... > It would probably be helpful to explain what you (or, at least, I) would > do if you (I) were writing from scratch, rather then "taming" the > existing libraries. In this case, Zipfile would require a file > capability to be passed to it at construction time, and so would become > non-dangerous, which is, I think, where Guido is coming from. Thank you. You are right about how I would do it, and I think you are right that this fits with Guido's approach, too. I would make the constructor of the ZipFile class take a file object, and hide (at least from unprivileged code) the option of passing a filename to the constructor. This would make it so that no authority is gained by importing the zipfile module. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From ben@algroup.co.uk Thu Apr 3 14:04:27 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 15:04:27 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <3E88E2B6.1080409@prescod.net> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> <3E88E2B6.1080409@prescod.net> Message-ID: <3E8C3F6B.8000000@algroup.co.uk> Paul Prescod wrote: > Are DOS issues in scope? How do we prevent untrusted code from just > bringing the interpreter to a halt? 
A smart enough attacker could even > block all threads in the current process by finding a task that is > usually not time-sliced and making it go on for a very long time. > without looking at the Python implementation, I can't remember an > example off of the top of my head, but perhaps a large multiplication or > search-and-replace in a string. It seems to me that this is an issue orthogonal to capabilities (though access to mechanisms that regulate it might well be capability-based). Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From ben@algroup.co.uk Thu Apr 3 14:05:45 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Thu, 03 Apr 2003 15:05:45 +0100 Subject: [Python-Dev] Capabilities In-Reply-To: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> References: <Pine.LNX.4.33.0303302001350.326-100000@server1.lfw.org> Message-ID: <3E8C3FB9.50101@algroup.co.uk> Ka-Ping Yee wrote: > Hmm, i'm not sure you understood what i meant. The code example i posted > is a solution to the design challenge: "provide read-only access to a > directory and its subdirectories, but no access to the rest of the filesystem". > I'm looking for other security design challenges to tackle in Python. > Once enough of them have been tried, we'll have a better understanding of > what Python would need to do to make secure programming easier. Well, one of the favourites is to create a file selection dialog that will only give access (optionally readonly) to the file designated by the user. This may be rather more than you want to bite off as a working system at this stage, though! It might be a useful thought experiment, though. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From fdrake@acm.org Thu Apr 3 14:40:21 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 3 Apr 2003 09:40:21 -0500 Subject: How do I report a bug? (Re: [Python-Dev] Distutils documentation amputated in 2.2 docs?) In-Reply-To: <m37kac19pg.fsf@mira.informatik.hu-berlin.de> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> <m37kac19pg.fsf@mira.informatik.hu-berlin.de> Message-ID: <16012.18389.659720.951267@grendel.zope.com> Martin v. Löwis writes: > I have now boldified parts of it; this doesn't take much space, but > should increase visibility. I hope it's not considered annoying - feel > free to undo that. Nice! I've made the boldified text a hyperlink to the login page, and copied the text to the patch tracker as well. > If they would allow us to put PHP into that box, we could even > suppress the text if the user was logged in. Hmm. I don't know that they won't, I just don't know the incantation to determine if a user is logged on. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From drifty@alum.berkeley.edu Thu Apr 3 19:05:56 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 3 Apr 2003 11:05:56 -0800 (PST) Subject: [Python-Dev] python-dev Summary for 2003-03-16 through 2003-03-31 In-Reply-To: <E190r6p-0002Yx-00@localhost> References: <Pine.SOL.4.53.0304012049230.14447@death.OCF.Berkeley.EDU> <E190iSj-0007S7-00@localhost> <Pine.SOL.4.53.0304021234000.11234@death.OCF.Berkeley.EDU> <E190r6p-0002Yx-00@localhost> Message-ID: <Pine.SOL.4.53.0304031105190.11078@death.OCF.Berkeley.EDU> [Zooko] > But seriously, I feel that capabilities fit with normal Python programming as it > is currently practiced. > The paragraph is gone, so no need to worry about this anymore. 
-Brett From altis@semi-retired.com Thu Apr 3 19:42:09 2003 From: altis@semi-retired.com (Kevin Altis) Date: Thu, 3 Apr 2003 11:42:09 -0800 Subject: [Python-Dev] fwd: Dan Sugalski on continuations and closures Message-ID: <KJEOLDOPMIDKCMJDCNDPAEHLDDAA.altis@semi-retired.com> via Simon Willison's blog: http://simon.incutio.com/archive/2003/04/03/#closuresAndContinuations " Thanks to Dan Sugalski (designer of Parrot, the next generation Perl VM) I finally understand what continuations and closures actually are. He explains them as part of a comparison between the forthcoming Parrot and two popular virtual machines already in existence: * (Perl|python|Ruby) on (.NET|JVM) leads in to the explanation. http://www.sidhe.org/~dan/blog/archives/000151.html * The reason for Parrot, part 2 explains closures. http://www.sidhe.org/~dan/blog/archives/000152.html * Continuations and VMs explains continuations. http://www.sidhe.org/~dan/blog/archives/000156.html * Continuations and VMs, part 2 rounds things off by explaining why the JVM and the CLR are unsuitable environments for supporting these language features. http://www.sidhe.org/~dan/blog/archives/000157.html " ka ps. In order to focus on Python promotion and site-redesign efforts I've suspended delivery of python-dev email in the short-term and will only be scanning the archives as time permits. If you need to flame me, please address your emails to me directly or /dev/null, your choice ;-) From martin@v.loewis.de Thu Apr 3 22:36:49 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 04 Apr 2003 00:36:49 +0200 Subject: [Python-Dev] Re: How do I report a bug? 
In-Reply-To: <16012.18389.659720.951267@grendel.zope.com> References: <200304030256.h332uha23381@oma.cosc.canterbury.ac.nz> <LNBBLJKPBEHFEDALKOLCKEGFECAB.tim.one@comcast.net> <16011.45362.723995.488848@grendel.zope.com> <m37kac19pg.fsf@mira.informatik.hu-berlin.de> <16012.18389.659720.951267@grendel.zope.com> Message-ID: <m365pv8aym.fsf_-_@mira.informatik.hu-berlin.de> "Fred L. Drake, Jr." <fdrake@acm.org> writes: > > If they would allow us to put PHP into that box, we could even > > suppress the text if the user was logged in. > > Hmm. I don't know that they won't, I just don't know the incantation > to determine if a user is logged on. If it's still the same code as in SF 2.5, it is "user_isloggedin()": http://phpxref.sourceforge.net/sourceforge/include/User.class.source.html#l555 As an example usage, see http://phpxref.sourceforge.net/sourceforge/patch/add_patch.php.source.html#l49 Regards, Martin From tim.one@comcast.net Fri Apr 4 04:08:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 03 Apr 2003 23:08:54 -0500 Subject: [Python-Dev] Boom Message-ID: <LNBBLJKPBEHFEDALKOLCEEIAECAB.tim.one@comcast.net> While enduring dental implant surgery earlier today, I thought to myself "oops -- I bet this program will crash Python". Turns out it does, in current CVS, and almost certainly in every version of Python since cyclic gc was added: """ import gc class C: def __getattr__(self, attr): del self.attr raise AttributeError a = C() b = C() a.attr = b b.attr = a del a, b gc.collect() """ Short course: a and b are in a trash cycle. gcmodule's move_finalizers() finds one of them and calls has_finalizer() to see whether it's collectible. Say it's b. has_finalizer() calls (in effect) hasattr(b, "__del__"), and b.__getattr__() deletes b.attr as a side effect before saying b.__del__ doesn't exist. That drops the refcount on a to 0, which in turn drops the refcount on a.__dict__ to 0. 
Those two are the killers: a and a.__dict__ become untracked (by gc) as part of cleaning them up, but the move_finalizers() "next" local still points to one of them (to the __dict__, in the run I happened to step thru). As a result, the next trip around the move_finalizers() loop calls has_finalizer() on memory that's already been free()ed. Hilarity ensues. The anesthesia is wearing off and I won't speculate about solutions now. I suspect it's easy, or close to intractable. PLabs folks, I'm unsure whether this relates to the ZODB test failure we've been bashing away at. All, ZODB is a persistent database, and at one point in this test gc determines that "a ghost" is unreachable. When gc's has_finalizer() asks whether the ghost has a __del__ method, the persistence machinery kicks in, sucking the ghost's state off of disk, and executing a lot of Python code as a result. Part of the Python code executed does appear (if hazy memory serves) to delete some previously unreachable objects that were also in (or hanging off of) the ghost's cycle, and so in the unreachable list gc's move_finalizers() is crawling over. The kind of blowup above could be one bad effect, and Jeremy was seeing blowups with move_finalizers() in the traceback. Unfortunately, the test doesn't blow up under CVS Python, and 2.2.2 doesn't have the telltale 0xdbdbdbdb filler 2.3's debug PyMalloc sprays into free()ed memory. From tim.one@comcast.net Fri Apr 4 04:37:47 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 03 Apr 2003 23:37:47 -0500 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <E191Dj8-00070O-00@sc8-pr-cvs1.sourceforge.net> Message-ID: <LNBBLJKPBEHFEDALKOLCOEIBECAB.tim.one@comcast.net> [jhylton@users.sourceforge.net] > Modified Files: > Tag: release22-maint > gcmodule.c > Log Message: > Fix memory corruption in garbage collection. > ... 
> The problem with the previous revision is that it followed > gc->gc.gc_next before calling has_finalizer(). If has_finalizer() > happened to deallocate the object FROM_GC(gc->gc.gc_next), then > the next time through the loop gc would point to freed memory. The > fix is to always follow the next pointer after calling > has_finalizer(). Oops! I didn't see this before posting my "Boom" msg. > Note that Python 2.3 does not have this problem, because > has_finalizer() checks the tp_del slot and never runs Python code. That part isn't so, alas: the program I posted in the "Boom" msg crashes 2.3, via the same mechanism: return PyInstance_Check(op) ? PyObject_HasAttr(op, delstr) : PyType_HasFeature(op->ob_type, Py_TPFLAGS_HEAPTYPE) ? op->ob_type->tp_del != NULL : 0; It's the PyInstance_Check(op) path there that's still vulnerable. I'll poke at that. > Tim, Barry, and I peed away the better part of two days tracking this > down. > ! next = gc->gc.gc_next; > if (has_finalizer(op)) { > gc_list_remove(gc); > gc_list_append(gc, finalizers); > gc->gc.gc_refs = GC_MOVED; > } > } > } > --- 277,290 ---- > for (; gc != unreachable; gc=next) { > PyObject *op = FROM_GC(gc); > ! /* has_finalizer() may result in arbitrary Python > ! code being run. */ > if (has_finalizer(op)) { > + next = gc->gc.gc_next; > gc_list_remove(gc); > gc_list_append(gc, finalizers); > gc->gc.gc_refs = GC_MOVED; > } > + else > + next = gc->gc.gc_next; > } > } Are we certain that has_finalizer() can't unlink gc itself from the unreachable list? If it can, then > + else > + next = gc->gc.gc_next; will set next to the content of free()ed memory. In fact, I believe the Boom program will suffer this fate ... yup, it does. "The problem" isn't yet really fixed in any version of Python, although I agree it's a lot better with the change above. 
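Tim's worry here, a cached "next" pointer outliving the node it points at, can be dodged by never holding a pointer across the dangerous call at all. A rough Python analogy (editor's sketch, not the actual C in gcmodule.c; `partition` and `nasty_check` are invented names):

```python
# Sketch: never cache a "next" reference across a call that can run
# arbitrary code. Always take the current head of the work list, so the
# loop makes progress no matter which nodes the callback unlinks.
def partition(unreachable, has_finalizer, collectable, finalizers):
    while unreachable:
        obj = unreachable[0]            # always restart from the head
        found = has_finalizer(obj)      # may mutate `unreachable`!
        if obj in unreachable:          # callback may have unlinked obj
            unreachable.remove(obj)
            (finalizers if found else collectable).append(obj)

work, col, fin = ["a", "b", "c"], [], []

def nasty_check(obj):
    # Simulates has_finalizer() running arbitrary Python code:
    # asking about "a" unlinks its neighbor "b" from the list.
    if obj == "a":
        work.remove("b")
    return obj == "c"

partition(work, nasty_check, col, fin)
# work is now empty; "a" went to col, "c" to fin, "b" simply vanished.
```

The membership check before the remove() covers the nastiest case of all, the callback unlinking the very object being asked about.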
From ben@algroup.co.uk Fri Apr 4 10:41:43 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Fri, 04 Apr 2003 11:41:43 +0100 Subject: [Python-Dev] Re: Capabilities (we already got one) In-Reply-To: <1049375371.3e8c328be581d@mcherm.com> References: <1049375371.3e8c328be581d@mcherm.com> Message-ID: <3E8D6167.4020804@algroup.co.uk> Michael Chermside wrote: >>The objection to doing it the other way round is that for capability >>languages to be truly usable the capability functionality needs to be >>automatic, not something that is painfully added to each class or object >>(at least, that is the claim we capability mavens are making). > > > Just how strong a claim are you making here? > > It seems to me that the need for security (via capabilities or any other > mechanism) is an UNUSUAL need. Most programs don't need it at all, > others need it in only a few places. Now don't get me wrong... when you > DO need it, you really need it, and just throwing something together > without explicit language support is somewhere between impossible and > terrifically-difficult-and-error-prone. So supporting secure execution > (via capabilities or whatever) in the language is a great idea. And I > like the capabilities-as-references approach... it's simple, elegant, > and not error prone. > > But if you're going so far as to imply that capability functionality > needs to be present ALWAYS, and supported (and considered) in every class > or object, then that's going too far. A random module should, for > instance, be able to open arbitrary files in the file system without > being passed any special objects, UNLESS we do something special when we > load it to indicate that we want it to run in a restricted mode. > > I think that zipfile is a good example here. As a library developer, I > should be able to write and distribute a zipfile module without thinking > about capabilities or security at all. 
Of course, when others go to use > it in a secure or restricted mode, they may find that it isn't as useful > as they'd like, but (I believe) we shouldn't say NO ONE can have a > zipfile module unless the module author is willing to address security > issues. Someone can write securezipfile when they get the itch. > > Now, if we really built security (via capabilities) into the language > from the ground up, then ALL modules would work by being passed > appropriate capability objects, and only the starting script would > possess all capabilities. There would be no "file" builtin, just file > objects (and ReadOnlyFile objects, and DirectorySubTree objects, and > so forth) which got passed around. So OF COURSE the original author > of zipfile would write it to accept a file at construction rather than > allowing it to open files... that would be the natural way to do things. > But that language isn't python... and I don't think it's worth changing > Python enough to get there. > > So if you're proposing this drastic a change (which I doubt), then I > think it's too drastic. But if you're NOT, then you have to realize > that there will be lots of library modules like zipfile, which were > written by people who didn't give any thought to security (since it's > a rarely-used feature of the language). So we need workarounds (like > wrappers or proxies) that can be applied after-the-fact to modules and > classes that weren't written with security in mind. If that's > "painfully adding something to each class or object", then I don't see > how it's to be avoided. I am completely in agreement. Taming of existing modules is inevitably going to be somewhat painful - and, in some cases, it may be less painful to simply rewrite them. As you suspect, what I am proposing is that _when_ a programmer wishes to use capabilities as a security mechanism, it is desirable to make that as easy to use as possible. 
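The "taming" under discussion can at least be approximated for zipfile without any language changes, because ZipFile already accepts an open file-like object. A rough sketch (editor's illustration; `zip_reader` is an invented name):

```python
# Capability-style use of zipfile: the reader is handed an already-open
# file object (the capability) instead of a filename, so importing this
# code confers no authority to open paths on the filesystem.
import io
import zipfile

def zip_reader(fileobj):
    """Build a read-only zip view over a file capability."""
    return zipfile.ZipFile(fileobj, mode="r")

# Only the holder of the file object decides what may be read:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
buf.seek(0)
data = zip_reader(buf).read("hello.txt")
```

Hiding the filename-accepting path from unprivileged code, as Zooko suggests earlier in the thread, is the part that still needs support from the language or the loader.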
I'm not sure I agree that the need for security is particularly unusual but I don't think it's worth having a big argument about. I certainly do agree that crippling Python in order to get capabilities is not a desirable outcome. Not that I have that option anyway :-) Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From ping@zesty.ca Fri Apr 4 12:28:18 2003 From: ping@zesty.ca (Ka-Ping Yee) Date: Fri, 4 Apr 2003 06:28:18 -0600 (CST) Subject: [Python-Dev] Re: Capabilities (we already got one) In-Reply-To: <3E8D6167.4020804@algroup.co.uk> Message-ID: <Pine.LNX.4.33.0304040616370.1082-100000@server1.lfw.org> Michael Chermside wrote: > It seems to me that the need for security (via capabilities or any other > mechanism) is an UNUSUAL need. Most programs don't need it at all, > others need it in only a few places. I think you are missing the point somewhat. Security is about making sure your program will do what you expect. So it is just as much about avoiding bugs as about thwarting malicious agents. Programming in a capability style makes programs more reliable and bugs less damaging. Colleagues of mine have established the habit of programming in a capability style in Java -- not because Java supports capabilities, and not because they need security at all, but just because programming *as if* the language had capabilities leads to a better modular design. On Fri, 4 Apr 2003, Ben Laurie wrote: > I'm not sure I agree that the need for security is particularly unusual > but I don't think it's worth having a big argument about. I certainly do > agree that crippling Python in order to get capabilities is not a > desirable outcome. Not that I have that option anyway :-) I also prefer to avoid loaded language. No one is talking about "crippling" anything. 
The essence of a capability model is simply to be explicit when authority is transferred. Explicit is better than implicit. -- ?!ng From jeremy@zope.com Fri Apr 4 16:46:32 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 11:46:32 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <list-1424542@digicool.com> References: <list-1424542@digicool.com> Message-ID: <1049474792.14151.85.camel@slothrop.zope.com> On Thu, 2003-04-03 at 23:37, Tim Peters wrote: > > ! next = gc->gc.gc_next; > > if (has_finalizer(op)) { > > gc_list_remove(gc); > > gc_list_append(gc, finalizers); > > gc->gc.gc_refs = GC_MOVED; > > } > > } > > } > > --- 277,290 ---- > > for (; gc != unreachable; gc=next) { > > PyObject *op = FROM_GC(gc); > > ! /* has_finalizer() may result in arbitrary Python > > ! code being run. */ > > if (has_finalizer(op)) { > > + next = gc->gc.gc_next; > > gc_list_remove(gc); > > gc_list_append(gc, finalizers); > > gc->gc.gc_refs = GC_MOVED; > > } > > + else > > + next = gc->gc.gc_next; > > } > > } > > Are we certain that has_finalizer() can't unlink gc itself from the > unreachable list? If it can, then > > > + else > > + next = gc->gc.gc_next; > > will set next to the content of free()ed memory. In fact, I believe the > Boom program will suffer this fate ... yup, it does. "The problem" isn't > yet really fixed in any version of Python, although I agree it's a lot > better with the change above. It looks like it's hard to find a place to stand. Since arbitrary Python code can run, then an arbitrary set of objects in the unreachable list can suddenly become unlinked. The previous, current, and next objects are all suspect. I think a safe approach would be to move everything out of unreachable and into either "collectable" or "finalizers". That way, we can do a while (!gc_list_is_empty(unreachable)) loop and always deal with the head of the unreachable list. 
Each time through the loop, the head of the list can be moved to collectable or finalizers or become unlinked, so we always make progress. Sound plausible? Jeremy From jeremy@zope.com Fri Apr 4 17:39:16 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 12:39:16 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049474792.14151.85.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> Message-ID: <1049477956.14152.93.camel@slothrop.zope.com> On Fri, 2003-04-04 at 11:46, Jeremy Hylton wrote: > I think a safe approach would be to move everything out of unreachable > and into either "collectable" or "finalizers". That way, we can do a > while (!gc_list_is_empty(unreachable)) loop and always deal with the > head of the unreachable list. Each time through the loop, the head of > the list can be moved to collectable or finalizers or become unlinked, > so we always make progress. > > Sound plausible? Yes. I've got a patch that fixes the boom case, but I'm not sure I've handled the case where the object becomes reachable as a result of running PyObject_HasAttr(). I'll post after testing that. Jeremy From jeremy@zope.com Fri Apr 4 18:26:11 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 13:26:11 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049477956.14152.93.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> <1049477956.14152.93.camel@slothrop.zope.com> Message-ID: <1049480770.14146.95.camel@slothrop.zope.com> On Fri, 2003-04-04 at 12:39, Jeremy Hylton wrote: > On Fri, 2003-04-04 at 11:46, Jeremy Hylton wrote: > > I think a safe approach would be to move everything out of unreachable > > and into either "collectable" or "finalizers". 
That way, we can do a > > while (!gc_list_is_empty(unreachable)) loop and always deal with the > > head of the unreachable list. Each time through the loop, the head of > > the list can be moved to collectable or finalizers or become unlinked, > > so we always make progress. > > > > Sound plausible? > > Yes. I've got a patch that fixes the boom case, but I'm not sure I've > handled the case where the object becomes reachable as a result of > running PyObject_HasAttr(). I'll post after testing that. It's SF patch 715446. There's a lingering problem with test_gc, but I hope it's tractable. Jeremy From jeremy@zope.com Fri Apr 4 20:15:51 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 04 Apr 2003 15:15:51 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049480770.14146.95.camel@slothrop.zope.com> References: <list-1424542@digicool.com> <1049474792.14151.85.camel@slothrop.zope.com> <1049477956.14152.93.camel@slothrop.zope.com> <1049480770.14146.95.camel@slothrop.zope.com> Message-ID: <1049487350.14146.101.camel@slothrop.zope.com> We've got the first version of boom nailed, but we've got the same problem in handle_finalizers(). The version of boom below doesn't blow up until the second time the has_finalizer() is called. I don't understand the logic in handle_finalizers(), though. If the objects are all in the finalizers list, why do we call has_finalizer() a second time? Shouldn't everything have a finalizer at that point? 
Jeremy import gc class C: def __init__(self): self.x = 0 def delete(self): print "never called" def __getattr__(self, attr): self.x += 1 print self.x if self.x > 1: del self.attr else: return self.delete raise AttributeError a = C() b = C() a.attr = b b.attr = a del a, b print gc.collect() From tim_one@email.msn.com Sat Apr 5 08:15:40 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 5 Apr 2003 03:15:40 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCIENBEEAB.tim_one@email.msn.com> [Jeremy Hylton] > We've got the first version of boom nailed, but we've got the same > problem in handle_finalizers(). The version of boom below doesn't blow > up until the second time the has_finalizer() is called. > > I don't understand the logic in handle_finalizers(), though. If the > objects are all in the finalizers list, why do we call has_finalizer() a > second time? Shouldn't everything has a finalizer at that point? Nope -- the parenthetical /* Handle uncollectable garbage (cycles with finalizers). */ comment is incomplete. The earlier call to move_finalizer_reachable() also put everything reachable only *from* trash cycles with finalizers into the list. So, e.g., if the trash graph is like A<->B->C and A has a finalizer but B and C don't, they're all in the finalizers list (at this point) regardless. But B and C aren't stopping the blob from getting collected, and we're trying to do the user a favor by putting only A (the troublemaker) into gc.garbage. It's an approximation, though. For example, if A and C both had finalizers, A and C would both be put into gc.garbage, despite that C's finalizer isn't stopping anything from getting collected. 
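The A <-> B -> C shape is easy to reproduce (editor's sketch; note the behavior described above is 2.3-era, and CPython 3.4+ simply collects such cycles, __del__ and all):

```python
# A carries a finalizer; B and C do not. In the 2.3-era collector only
# A (the troublemaker) was parked in gc.garbage, while B and C merely
# rode along. Modern CPython collects the whole blob.
import gc

class WithDel:
    def __del__(self):
        pass

class Plain:
    pass

a, b, c = WithDel(), Plain(), Plain()
a.peer = b       # A -> B
b.peer = a       # B -> A (the cycle)
b.child = c      # C hangs off the cycle
del a, b, c
found = gc.collect()   # unreachable objects seen this pass
```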
The comments are apparently a bit out of synch with the code, because 17 months ago all instance objects in the finalizers list were put into gc.garbage (regardless of whether they had __del__). The checkin comment for rev 2.28 sez the __del__ change was needed to fix a bug; but I'm too groggy to dig more now. From tim_one@email.msn.com Sat Apr 5 19:34:36 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 5 Apr 2003 14:34:36 -0500 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> I checked in some more changes (2.3 head only). This kind of program may be intractable: """ class C: def __getattr__(self, attribute): global alist if 'attr' in self.__dict__: alist.append(self.attr) del self.attr raise AttributeError import gc gc.collect() a = C() b = C() alist = [] a.attr = b b.attr = a a.x = 1 b.x = 2 del a, b # Oops. This prints 4: it's collecting # a, b, and their dicts. print gc.collect() # Despite that __getattr__ resurrected them. print alist # But gc cleared their dicts. print alist[0].__dict__ print alist[1].__dict__ # So a.x and b.x fail. print alist[0].x, alist[1].x """ While a __getattr__ side effect may resurrect an object in gc's unreachable list, gc has no way to know that an object has been resurrected short of starting over again. In the absence of that, the object remains in gc's unreachable list, and its tp_clear slot eventually gets called. The internal C stuff remains self-consistent, so this won't cause a segfault (etc), but it may (as above) be surprising. I don't see a sane way to fix this so long as asking whether __del__ exists can execute arbitrary mounds of Python code. 
From exarkun@intarweb.us Sat Apr 5 19:35:31 2003 From: exarkun@intarweb.us (Jp Calderone) Date: Sat, 5 Apr 2003 14:35:31 -0500 Subject: [Python-Dev] Placement of os.fdopen functionality Message-ID: <20030405193531.GA23455@meson.dyndns.org> It occurred to me this afternoon (after answering a question about creating file objects from file descriptors) that perhaps os.fdopen would be more logically placed someplace else - of course it could also remain as os.fdopen() for whatever deprecation period is warranted. Perhaps as a class method of the file type, file.fromfd()? Should I file a feature request for this on sf, or would it be considered too much of a mindless twiddle to bother with? Jp -- http://catandgirl.com/view.cgi?44 -- up 16 days, 16:00, 5 users, load average: 1.13, 0.93, 0.85 From martin@v.loewis.de Sat Apr 5 20:34:13 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 05 Apr 2003 22:34:13 +0200 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <20030405193531.GA23455@meson.dyndns.org> References: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <m37ka81y62.fsf@mira.informatik.hu-berlin.de> Jp Calderone <exarkun@intarweb.us> writes: > Perhaps as a class method of the file type, file.fromfd()? > > Should I file a feature request for this on sf, or would it be considered > too much of a mindless twiddle to bother with? Feel free to file a feature request, but I'd predict that it might sit there for some years until it is closed because of no action. 
OTOH, if you would produce a patch implementing the feature, it might get attention. Regards, Martin From tim.one@comcast.net Sun Apr 6 00:05:21 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 05 Apr 2003 19:05:21 -0500 Subject: [Python-Dev] Re: [PythonLabs] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <1049487350.14146.101.camel@slothrop.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCGEKJECAB.tim.one@comcast.net> [Jeremy Hylton] > We've got the first version of boom nailed, but we've got the same > problem in handle_finalizers(). The version of boom below doesn't blow > up until the second time the has_finalizer() is called. It isn't really necessary to call has_finalizer() a second time, and I'll check in changes so that it doesn't anymore (assuming the test suite passes -- it's running as I type this). > I don't understand the logic in handle_finalizers(), though. If the > objects are all in the finalizers list, why do we call has_finalizer() a > second time? Shouldn't everything has a finalizer at that point? I tried to explain that last night. The essence of the changes I have pending is to make move_finalizer_reachable() move the tentatively unreachable objects reachable only from finalizers into a new & distinct list, reachable_from_finalizers. After that, everything in finalizers has a finalizer and nothing in reachable_from_finalizers does, so we don't have to call has_finalizer() again. Before, finalizers contained everything in both (finalizers and reachable_from_finalizers) lists, so another has_finalizer() call on each object was needed to distinguish the two kinds (has a finalizer, doesn't have a finalizer) of objects again. 
> import gc > > class C: > > def __init__(self): > self.x = 0 > > def delete(self): > print "never called" > > def __getattr__(self, attr): > self.x += 1 > print self.x > if self.x > 1: > del self.attr > else: > return self.delete > raise AttributeError > > a = C() > b = C() > a.attr = b > b.attr = a > > del a, b > print gc.collect() I also added a non-printing variant of this to test_gc. In the new world, the "del self.attr" bits never get called, so this is just a vanilla trash cycle now. From jeremy@alum.mit.edu Sun Apr 6 03:02:04 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 05 Apr 2003 21:02:04 -0500 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> Message-ID: <1049594522.24643.57.camel@localhost.localdomain> On Sat, 2003-04-05 at 14:34, Tim Peters wrote: > While a __getattr__ side effect may resurrect an object in gc's unreachable > list, gc has no way to know that an object has been resurrected short of > starting over again. In the absence of that, the object remains in gc's > unreachable list, and its tp_clear slot eventually gets called. The > internal C stuff remains self-consistent, so this won't cause a segfault > (etc), but it may (as above) be surprising. I don't see a sane way to fix > this so long as asking whether __del__ exists can execute arbitrary mounds > of Python code. I think I'll second the thought that there are no satisfactory answers here. We've made a big step forward by fixing the core dumps. If we want to document the current behavior, we would say that garbage collection may leave reachable objects in an "invalid state" in the presence of "problematic objects." A "problematic object" is an instance of a classic class that defines a getattr hook (__getattr__) but not a finalizer (__del__). 
An object in an "invalid state" has had its tp_clear slot executed; in the case of instances, this means the __dict__ will be empty. Specifically, if a problematic object is part of an unreachable cycle, the garbage collector will execute the code in its getattr hook; if executing that code makes any object in the cycle reachable again, it will be left in an invalid state. If we document this for 2.2, it's more complicated because instances of new-style classes are also affected. What's worse, a new-style class with a __getattribute__ hook is affected regardless of whether it has a finalizer. Here are a couple of thoughts about how to avoid leaving objects in an invalid state. It's pretty unlikely for it to happen, but speaking from experience <wink> it's baffling when it does. #1. (I think this was Fred's suggestion on Friday.) Don't do a hasattr() check on the object, do it on the class. This is what happens with new-style classes in Python 2.3: If a new-style class doesn't define an __del__ method, then its instances don't have a finalizer. It doesn't matter whether the specific instance has an __del__ attribute. Limitations: This is a change in semantics, although it only covers a nearly insane corner case. The other limitation is that things could still go wrong, although only in the presence of a classic metaclass! #2. If an object has a getattr hook and it's involved in a cycle, just put it in gc.garbage. Forget about checking for a finalizer. That seems fine for 2.3, since we're only talking about classic classes with getattr hooks. But it doesn't sound very pleasant for 2.2, since it covers any class instance with a getattr hook. I think #1 is pretty reasonable. I'd like to see something fixed for 2.2.3, but I worry that the semantic change may be unacceptable for a bug fix release. (But maybe not, the semantics are pretty insane right now :-). 
Jeremy From jim@zope.com Sun Apr 6 12:07:44 2003 From: jim@zope.com (Jim Fulton) Date: Sun, 06 Apr 2003 07:07:44 -0400 Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <list-1431082@digicool.com> References: <list-1431082@digicool.com> Message-ID: <3E900A80.3010802@zope.com> Tim Peters wrote: ... > While a __getattr__ side effect may resurrect an object in gc's unreachable > list, gc has no way to know that an object has been resurrected short of > starting over again. In the absence of that, the object remains in gc's > unreachable list, and its tp_clear slot eventually gets called. The > internal C stuff remains self-consistent, so this won't cause a segfault > (etc), but it may (as above) be surprising. I don't see a sane way to fix > this so long as asking whether __del__ exists can execute arbitrary mounds > of Python code. If I understand the problem, it can be avoided by avoiding old-style classes. Maybe it's time to, at least optionally, cause a warning when old-style classes are used. :) I'm not kidding for Zope. I think it might be worth-while to issue such a warning in Zope. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! 
CTO (703) 361-1714
http://www.python.org
Zope Corporation
http://www.zope.com http://www.zope.org

From skip@mojam.com Sun Apr 6 13:00:22 2003
From: skip@mojam.com (Skip Montanaro)
Date: Sun, 6 Apr 2003 07:00:22 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200304061200.h36C0MU07870@manatee.mojam.com>

Bug/Patch Summary
-----------------

384 open / 3510 total bugs (+7)
136 open / 2062 total patches (no change)

New Bugs
--------

test_zipimport failing on ia64 (at least) (2003-03-30)
    http://python.org/sf/712322
Cannot change the class of a list (2003-03-31)
    http://python.org/sf/712975
test_pty fails on HP-UX and AIX when run after test_openpty (2003-03-31)
    http://python.org/sf/713169
site.py breaks if prefix is empty (2003-04-01)
    http://python.org/sf/713601
Distutils documentation amputated (2003-04-01)
    http://python.org/sf/713722
cPickle fails to pickle inf (2003-04-03)
    http://python.org/sf/714733
bsddb.first()/next() raise undocumented exception (2003-04-03)
    http://python.org/sf/715063
pydoc support for keywords (2003-04-05)
    http://python.org/sf/715782
Minor nested scopes doc issues (2003-04-06)
    http://python.org/sf/716168

New Patches
-----------

Bug fix 548176: urlparse('http://foo?blah') errs (2003-03-30)
    http://python.org/sf/712317
sre fixes for lastindex and minimizing repeats+assertions (2003-03-31)
    http://python.org/sf/712900
Fixes for 'commands' module on win32 (2003-04-01)
    http://python.org/sf/713428
rfc822.parsedate returns a tuple (2003-04-01)
    http://python.org/sf/713599
freeze fails when extensions_win32.ini is missing (2003-04-01)
    http://python.org/sf/713645
iconv_codec NG (2003-04-02)
    http://python.org/sf/713820
Unicode Codecs for CJK Encodings (2003-04-02)
    http://python.org/sf/713824
Guard against segfaults in debug code (2003-04-02)
    http://python.org/sf/714348
timeouts for FTP connect (and other supported ops) (2003-04-03)
    http://python.org/sf/714592
Document freeze process in PC/config.c (2003-04-03)
    http://python.org/sf/714957

Closed Bugs
-----------

locale.getpreferredencoding fails on AIX (2003-01-31)
    http://python.org/sf/678259
configure option --enable-shared make problems (2003-03-11)
    http://python.org/sf/701823
-i -u options give SyntaxError on Windows (2003-03-21)
    http://python.org/sf/707576

Closed Patches
--------------

sgmllib support for additional tag forms (2002-04-17)
    http://python.org/sf/545300
posixfy some things (2002-12-08)
    http://python.org/sf/650412
Add missing constants for IRIX al module (2003-01-13)
    http://python.org/sf/667548
Py_Main() removal of exit() calls. Return value instead (2003-01-21)
    http://python.org/sf/672053
fix for bug 672614 :) (2003-02-28)
    http://python.org/sf/695250
Wrong prototype for PyUnicode_Splitlines on documentation (2003-03-11)
    http://python.org/sf/701395
more apply removals (2003-03-11)
    http://python.org/sf/701494
Fix a few broken links in pydoc (2003-03-19)
    http://python.org/sf/706338
Adds Mock Object support to unittest.TestCase (2003-03-19)
    http://python.org/sf/706590
Make "%c" % u"a" work (2003-03-26)
    http://python.org/sf/710127
Backport to 2.2.2 of codec registry fix (2003-03-27)
    http://python.org/sf/710576
Obsolete comment in urlparse.py (2003-03-30)
    http://python.org/sf/712124

From nas@python.ca Sun Apr 6 19:43:21 2003
From: nas@python.ca (Neil Schemenauer)
Date: Sun, 6 Apr 2003 11:43:21 -0700
Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <3E900A80.3010802@zope.com>
References: <list-1431082@digicool.com> <3E900A80.3010802@zope.com>
Message-ID: <20030406184320.GA14894@glacier.arctrix.com>

Jim Fulton wrote:
> Maybe it's time to, at least optionally, cause a warning when
> old-style classes are used. :) I'm not kidding. For Zope, I think it
> might be worthwhile to issue such a warning.

A command line option that enabled new-style classes by default may be a good idea (suggested to me by AMK at PyCon).
Neil

From barry@python.org Sun Apr 6 23:03:32 2003
From: barry@python.org (Barry Warsaw)
Date: 06 Apr 2003 18:03:32 -0400
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <1049594522.24643.57.camel@localhost.localdomain>
References: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com> <1049594522.24643.57.camel@localhost.localdomain>
Message-ID: <1049666611.9026.3.camel@geddy>

On Sat, 2003-04-05 at 21:02, Jeremy Hylton wrote:
> #1. (I think this was Fred's suggestion on Friday.)  Don't do a
> hasattr() check on the object, do it on the class.  This is what happens
> with new-style classes in Python 2.3: If a new-style class doesn't
> define an __del__ method, then its instances don't have a finalizer.  It
> doesn't matter whether the specific instance has an __del__ attribute.

FWIW, IIRC Jython does something vaguely like this. Actually, the existence of __del__ is checked at class creation time, because it's expensive to call __del__ when the object is Java gc'd, and we use two different Java classes for classic class instances depending on whether it had a __del__ or not. This means you can't add __del__ to the class or the instance after the class is defined. Personally I think this is reasonable, and I don't recall this biting anyone when I was working on Jython.

-Barry

From tim_one@email.msn.com Mon Apr 7 01:47:53 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Sun, 6 Apr 2003 20:47:53 -0400
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modulesgcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <1049594522.24643.57.camel@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com>

[Jeremy Hylton]
> I think I'll second the thought that there are no satisfactory answers
> here.  We've made a big step forward by fixing the core dumps.
> If we want to document the current behavior, we would say that garbage
> collection may leave reachable objects in an "invalid state" in the
> presence of "problematic objects."  A "problematic object" is an
> instance of a classic class that defines a getattr hook (__getattr__)
> but not a finalizer (__del__).  An object in an "invalid state" has had
> its tp_clear slot executed; in the case of instances, this means the
> __dict__ will be empty.  Specifically, if a problematic object is part
> of an unreachable cycle, the garbage collector will execute the code in
> its getattr hook; if executing that code makes any object in the cycle
> reachable again, it will be left in an invalid state.

I expect that documenting it comprehensibly is impossible. For example, the referent of "it" in your last sentence is unclear, and hard to flesh out. A problematic object doesn't need to be part of a cycle to cause problems, and when it does cause problems the things that end up in an unexpected state needn't be part of cycles either. It's more that the problematic object needs to be reachable only from an unreachable cycle (the unreachable cycle needn't contain problematic objects), and then it's all the objects reachable only from the unreachable cycle and from the problematic object that may be in trouble (and regardless of whether they're in cycles).
Here's a concrete example, where the instance of the problematic D isn't in a cycle, and neither are the list or the dict that get magically cleared (.mylist and .mydict) despite being resurrected:

"""
class C: pass

class D:
    def __init__(self):
        self.mydict = {'a': 1, 'b': 2}
        self.mylist = range(100)

    def __getattr__(self, attribute):
        global alist
        if attribute == "__del__":
            alist.append(self.mydict)
            alist.append(self.mylist)
        raise AttributeError

import gc
gc.collect()
a = C()
a.loop = a            # make a cycle
a.d_instance = D()    # an instance of D hangs *off* the cycle
alist = []
del a
print gc.collect()    # 6: a, a.d_instance, their __dicts__, and D()'s
                      #    mydict and mylist
print alist           # [{}, []]
"""

If we had enough words to explain that, it still wouldn't be enough, because the effect of calling tp_clear isn't defined by the language for any type. If, for example, D also defined a .mytuple attr and resurrected it in __getattr__, the user would see that *that* one survived OK (tuples happen to have a NULL tp_clear slot).

> If we document this for 2.2, it's more complicated because instances of
> new-style classes are also affected.  What's worse, a new-style class
> with a __getattribute__ hook is affected regardless of whether it has a
> finalizer.

In 2.2 but not 2.3, right? I haven't tried anything with __getattribute__. For that matter, in my own Python programming, I've never even defined a __getattr__ method -- I spend most of my life tracking down bugs in things I don't use <wink>.

> Here are a couple of thoughts about how to avoid leaving objects in an
> invalid state.

I'd much rather pursue that than write docs nobody will understand.

> It's pretty unlikely for it to happen, but speaking from
> experience <wink> it's baffling when it does.
>
> #1. (I think this was Fred's suggestion on Friday.)  Don't do a
> hasattr() check on the object, do it on the class.
> This is what happens
> with new-style classes in Python 2.3: If a new-style class doesn't
> define an __del__ method, then its instances don't have a finalizer.  It
> doesn't matter whether the specific instance has an __del__ attribute.
>
> Limitations: This is a change in semantics, although it only covers a
> nearly insane corner case.  The other limitation is that things could
> still go wrong, although only in the presence of a classic metaclass!

I'm not sure I followed the last sentence. If I did, screw calling hasattr() -- do a string lookup for "__del__" in the classic class's __dict__, and that's it. Anything that ends up executing arbitrary Python code is going to leave holes.

> #2. If an object has a getattr hook and it's involved in a cycle, just
> put it in gc.garbage.  Forget about checking for a finalizer.  That
> seems fine for 2.3, since we're only talking about classic classes with
> getattr hooks.  But it doesn't sound very pleasant for 2.2, since it
> covers any class instance with a getattr hook.

I'd like to avoid expanding the definition of what ends up in gc.garbage. The relationship to __del__ and unreachable cycles is explainable now, modulo the __getattr__ insanity. Getting rid of the latter is a lot more attractive than folding it into the former.

> I think #1 is pretty reasonable.  I'd like to see something fixed for
> 2.2.3, but I worry that the semantic change may be unacceptable for a
> bug fix release.  (But maybe not, the semantics are pretty insane right
> now :-).

I have no problem with changing this for 2.2.3. I doubt any Python app will be affected, except possibly to rid 1 in 10,000 of a subtle bug. There's certainly no defensible app that relied on Python segfaulting here <wink>, and I can't imagine any relying on containers getting magically cleared at unpredictable times.

BTW, I'm still wondering why the ZODB thread test failed the way it did for Tres and Barry and me: you saw corrupt gc lists, but the rest of us never did.
We saw a Connection instance with a mysteriously cleared __dict__. That's consistent with the __getattr__-hook-resurrects-an-object-reachable-only-from-an-unreachable-cycle examples I posted, but did you guys figure out on Friday whether that's what was actually happening? The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting unreachable objects while gc was still crawling over them, and that's a different (albeit related) problem than __dicts__ getting cleared by magic.

From greg@cosc.canterbury.ac.nz Mon Apr 7 01:54:20 2003
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 07 Apr 2003 12:54:20 +1200 (NZST)
Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOFEEAB.tim_one@email.msn.com>
Message-ID: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz>

> I don't see a sane way to fix this so long as asking whether __del__
> exists can execute arbitrary mounds of Python code.

This further confirms my opinion that __del__ methods are evil, and the language would be the better for their complete removal.

Failing that, perhaps they should be made a bit less dynamic, so that the GC can make reasonable assumptions about their existence without having to execute Python code.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Apr 7 01:56:35 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 07 Apr 2003 12:56:35 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <200304070056.h370uZc14935@oma.cosc.canterbury.ac.nz> Jp Calderone <exarkun@intarweb.us>: > perhaps os.fdopen would be more logically placed someplace else - > Perhaps as a class method of the file type, file.fromfd()? Not all OSes have the notion of a file descriptor, which is probably why it's in the os module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Apr 7 02:04:39 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 07 Apr 2003 13:04:39 +1200 (NZST) Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <3E900A80.3010802@zope.com> Message-ID: <200304070104.h3714df15005@oma.cosc.canterbury.ac.nz> > Maybe it's time to, at least optionally, cause a warning when > old-style classes are used. :) You might want to, er, make an exception for subclasses of Exception (you still don't get any choice there, right?) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+

From tim_one@email.msn.com Mon Apr 7 02:11:10 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Sun, 6 Apr 2003 21:11:10 -0400
Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
In-Reply-To: <list-1431942@digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBGEFAB.tim_one@email.msn.com>

[Jim Fulton]
> If I understand the problem, it can be avoided by avoiding
> old-style classes.

In Python 2.3, that appears to be true. In Python 2.2.2, not true. The problems are caused by __getattr__ hooks that resurrect unreachable objects, and/or remove the last reference to an unreachable object, when such a hook is on an instance reachable only from an unreachable cycle, and the class doesn't explicitly define a __del__ method, and the class has a getattr hook, and the getattr hook does extreme things instead of just saying "no, there's no __del__ here".

Python 2.3 introduced new machinery for new-style classes specifically aimed at answering the "does it support __del__?" question without invoking getattr hooks, and that's why it's not a problem for new-style classes in 2.3. New-style classes still go thru getattr hooks to answer this question in 2.2.2.

There were problems in Python and problems in Zope here. Jeremy fixed the Zope problems under 2.2 by breaking the "the getattr hook does extreme things instead of just saying 'no, there's no __del__ here'" link of the chain for persistent objects.

> Maybe it's time to, at least optionally, cause a warning when
> old-style classes are used. :) I'm not kidding. For Zope, I think it
> might be worthwhile to issue such a warning.

There may be good reasons for wanting that, but none raised in this thread so far are relevant (unless 2.3 is mandated for Zope, which I'm sure we don't want to do).
From tim_one@email.msn.com Mon Apr 7 02:30:56 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 6 Apr 2003 21:30:56 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBHEFAB.tim_one@email.msn.com> [Greg Ewing] > This further confirms my opinion that __del__ methods are evil, and > the language would be the better for their complete removal. They sure create more than their share of implementation headaches, so don't fare well on the "if the implementation is hard to explain, it's a bad idea" scale. > Failing that, perhaps they should be made a bit less dynamic, so that > the GC can make reasonable assumptions about their existence without > having to execute Python code. Guido already did so for new-style classes in Python 2.3. That machinery doesn't exist in 2.2.2, and old-style classes remain a problem under 2.3 too. Backward compatibility constrains how much we can get away with, of course. From jeremy@alum.mit.edu Mon Apr 7 04:45:05 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 06 Apr 2003 23:45:05 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modulesgcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCMEBDEFAB.tim_one@email.msn.com> Message-ID: <1049687104.1383.27.camel@localhost.localdomain> On Sun, 2003-04-06 at 20:47, Tim Peters wrote: > BTW, I'm still wondering why the ZODB thread test failed the way it did for > Tres and Barry and me: you saw corrupt gc lists, but the rest of us never > did. We saw a Connection instance with a mysteriously cleared __dict__. > That's consistent with the __getattr__-hook-resurrects-an- > object-reachable-only-from-an-unreachable-cycle examples I posted, but did > you guys figure out on Friday whether that's what was actually happening? 
> The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting > unreachable objects while gc was still crawling over them, and that's a > different (albeit related) problem than __dicts__ getting cleared by magic. [Note to everyone else, there's a lot of ZODB-specific detail in the answer. It might not be that interesting beyond ZODB developers.] The __getattr__ code in ZODB made a large cycle of objects reachable again. The __getattr__ hook called a method on a ZODB Connection and the Connection registered itself with the current transaction (basically, a global resource). Then the Connection got tp_cleared by the garbage collector. Now the Connection is a zombie but it's also registered with a transaction. When the transaction commits or aborts, the code failed because the Connection didn't have any attributes. I got particularly lucky with my compiler/platform/Python version/whatever. Part of the code in __getattr__ deleted a key-value pair from a dictionary. I think that was partly chance; there was nothing about the code that guaranteed the key was in the dict, but it deleted it if it was. The value in the dict was a weakref. The weakref decrefed and deallocated its callback function. Just by luck, the callback function was the next thing in the unreachable gc list. So I got a segfault when I dereferenced the now-freed GC header of the callback object. 
Jeremy

From oren-py-d@hishome.net Mon Apr 7 07:16:30 2003
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 7 Apr 2003 02:16:30 -0400
Subject: [Python-Dev] Placement of os.fdopen functionality
In-Reply-To: <20030405193531.GA23455@meson.dyndns.org>
References: <20030405193531.GA23455@meson.dyndns.org>
Message-ID: <20030407061630.GA12658@hishome.net>

On Sat, Apr 05, 2003 at 02:35:31PM -0500, Jp Calderone wrote:
> It occurred to me this afternoon (after answering a question about creating
> file objects from file descriptors) that perhaps os.fdopen would be more
> logically placed someplace else - of course it could also remain as
> os.fdopen() for whatever deprecation period is warranted.
>
> Perhaps as a class method of the file type, file.fromfd()?

I don't see much point in moving it around just because the place doesn't seem right, but the fact that it's a function rather than a method means that some things cannot be done in pure Python. I can create an uninitialized instance of a subclass of 'file' using file.__new__(filesubclass), but the only way to open it is by name using file.__init__(filesubclassinstance, 'filename'). A file subclass cannot be opened from a file descriptor because fdopen always returns a new instance of 'file'.

If there was some way to open an uninitialized file object from a file descriptor it would be possible, for example, to write a version of popen that returns a subclass of file. It could add a method for retrieving the exit code of the process, do something interesting on __del__, etc.

Here are some alternatives of where this could be implemented, followed by what a Python implementation of os.fdopen would look like:

1. New form of file.__new__ with more arguments:

    def fdopen(fd, mode='r', buffering=-1):
        return file.__new__('(fdopen)', mode, buffering, fd)

2. Optional argument to file.__init__:

    def fdopen(fd, mode='r', buffering=-1):
        return file('(fdopen)', mode, buffering, fd)

3.
Instance method (NOT a class method):

    def fdopen(fd, mode='r', buffering=-1):
        f = file.__new__()
        f.fdopen(fd, mode, buffering, '(fdopen)')
        return f

Oren

From theller@python.net Mon Apr 7 07:56:38 2003
From: theller@python.net (Thomas Heller)
Date: 07 Apr 2003 08:56:38 +0200
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704)
In-Reply-To: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net>
References: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net>
Message-ID: <brziu76h.fsf@python.net>

loewis@users.sourceforge.net writes:
> Update of /cvsroot/python/python/dist/src/Misc
> In directory sc8-pr-cvs1:/tmp/cvs-serv28757/Misc
>
> Modified Files:
> 	NEWS
> Log Message:
> Rename LONG_LONG to PY_LONG_LONG. Fixes #710285.
>

What is the recommended way to port code like this to Python 2.3, and still remain compatible with 2.2?

Thanks,
Thomas

typedef struct {
    PyObject_HEAD
    char tag;
    union {
        char c;
        char b;
        short h;
        int i;
        long l;
#ifdef HAVE_LONG_LONG
        LONG_LONG q;
#endif
        double d;
        float f;
        void *p;
    } value;
    PyObject *obj;
} PyCArgObject;

From mhammond@skippinet.com.au Mon Apr 7 12:23:02 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Mon, 07 Apr 2003 21:23:02 +1000
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704)
In-Reply-To: <brziu76h.fsf@python.net>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au>

> > Rename LONG_LONG to PY_LONG_LONG. Fixes #710285.
>
> What is the recommended way to port code like this to Python 2.3,
> and still remain compatible with 2.2?

#if defined(PY_LONG_LONG) && !defined(LONG_LONG)
#define LONG_LONG PY_LONG_LONG /* grrr :( */
#endif

? <wink>

This change does break things.

Mark.
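Going back to the fdopen thread: the file.fromfd() classmethod that Jp proposed would give Oren his subclass-aware constructor essentially for free, because a classmethod receives the class it was invoked on. A sketch with ordinary Python classes — FDFile and ProcessFile are invented names for illustration, not real APIs:

```python
import os

class FDFile:
    """Toy stand-in for the 'file' type, wrapping an existing descriptor."""
    def __init__(self, fd, mode='r'):
        self.fd = fd
        self.mode = mode

    @classmethod
    def fromfd(cls, fd, mode='r'):
        # cls is whatever class the call was made on, so a subclass
        # gets an instance of itself -- unlike a plain fdopen() function,
        # which always constructs the base type
        return cls(fd, mode)

class ProcessFile(FDFile):
    """A subclass that could, e.g., track a child process's exit code."""
    def exit_code(self):
        return None  # placeholder; a real version would call os.waitpid()

r, w = os.pipe()
f = ProcessFile.fromfd(r)
print(type(f).__name__)   # ProcessFile
os.close(r)
os.close(w)
```

This is exactly the property a module-level os.fdopen() cannot provide: it has no way to know which subclass the caller wanted.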
From skip@pobox.com Mon Apr 7 15:56:41 2003
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 7 Apr 2003 09:56:41 -0500
Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS, 1.703, 1.704)
In-Reply-To: <brziu76h.fsf@python.net>
References: <E18zDEE-0007Ww-00@sc8-pr-cvs1.sourceforge.net> <brziu76h.fsf@python.net>
Message-ID: <16017.37289.216513.120081@montanaro.dyndns.org>

    Thomas> What is the recommended way to port code like this to Python
    Thomas> 2.3, and still remain compatible with 2.2?

    Thomas> #ifdef HAVE_LONG_LONG
    Thomas>     LONG_LONG q;
    Thomas> #endif

Wouldn't this work?

    #ifdef HAVE_LONG_LONG
    #  ifdef PY_LONG_LONG
        PY_LONG_LONG q;
    #  else
        LONG_LONG q;
    #  endif
    #endif

As MarkH pointed out, this change is going to break some code, but there's probably no way around it. Obviously, some other package defines a LONG_LONG macro or there wouldn't have been a bug report. Better to bite the bullet sooner than later.

Skip

From msg_2222@yahoo.com Mon Apr 7 18:16:53 2003
From: msg_2222@yahoo.com (Rick Y)
Date: Mon, 7 Apr 2003 10:16:53 -0700 (PDT)
Subject: [Python-Dev] socket question
Message-ID: <20030407171653.41362.qmail@web20711.mail.yahoo.com>

How can I enable the _socket module in my Solaris Python? I did not build it; downloaded it from sunfreeware.

./viewcvs-install
Traceback (most recent call last):
  File "./viewcvs-install", line 35, in ?
    import compat
  File "./lib/compat.py", line 20, in ?
    import urllib
  File "/usr/local/lib/python2.1/urllib.py", line 26, in ?
    import socket
  File "/usr/local/lib/python2.1/socket.py", line 41, in ?
    from _socket import *
ImportError: ld.so.1: python: fatal: libssl.so.0.9.6: open failed: No such file or directory

__________________________________________________
Do you Yahoo!?
Yahoo!
Tax Center - File online, calculators, forms, and more http://tax.yahoo.com From aahz@pythoncraft.com Mon Apr 7 18:28:37 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 7 Apr 2003 13:28:37 -0400 Subject: [Python-Dev] socket question In-Reply-To: <20030407171653.41362.qmail@web20711.mail.yahoo.com> References: <20030407171653.41362.qmail@web20711.mail.yahoo.com> Message-ID: <20030407172837.GA18682@panix.com> On Mon, Apr 07, 2003, Rick Y wrote: > > how can i enable _sockt module in my solaris python?. python-dev is for discussions about developing the language, not for questions about using Python. You'll probably get better advice by subscribing to the newsgroup comp.lang.python (or python-list). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From jeremy@zope.com Mon Apr 7 18:43:28 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 07 Apr 2003 13:43:28 -0400 Subject: [Python-Dev] socket question In-Reply-To: <20030407171653.41362.qmail@web20711.mail.yahoo.com> References: <20030407171653.41362.qmail@web20711.mail.yahoo.com> Message-ID: <1049737408.23331.19.camel@slothrop.zope.com> Rick, This question would be more appropriate on python-list. The python-dev list is for discussion among people who work on the Python implementation, rather than for end-user questions. But don't sweat it; you probably didn't know that. On Mon, 2003-04-07 at 13:16, Rick Y wrote: > how can i enable _sockt module in my solaris python?. > i did not build it. Downloaded it from sunfreeware. > > ./viewcvs-install > Traceback (most recent call last): > File "./viewcvs-install", line 35, in ? > import compat > File "./lib/compat.py", line 20, in ? > import urllib > File "/usr/local/lib/python2.1/urllib.py", line 26, > in ? > import socket > File "/usr/local/lib/python2.1/socket.py", line 41, > in ? 
> from _socket import * > ImportError: ld.so.1: python: fatal: libssl.so.0.9.6: > open failed: No such file or directory The version of Python you are using has been linked against OpenSSL. The import of _socket is failing because the libssl.so can't be found at run-time. You either need to tell your linker where to find the file or install OpenSSL. I'm sure you can find more help on the details on the other list. Jeremy From martin@v.loewis.de Mon Apr 7 22:29:14 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 07 Apr 2003 23:29:14 +0200 Subject: [Python-Dev] LONG_LONG (Was: [Python-checkins] python/dist/src/Misc NEWS,1.703,1.704) In-Reply-To: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au> References: <LCEPIIGDJPKCOIHOBJEPKEBHOMAA.mhammond@skippinet.com.au> Message-ID: <m3el4edmj9.fsf@mira.informatik.hu-berlin.de> Mark Hammond <mhammond@skippinet.com.au> writes: > #if defined(PY_LONG_LONG) && !defined(LONG_LONG) > #define LONG_LONG PY_LONG_LONG /* grrr :( */ > #endif That works; perhaps one would remove the comment... > This change does break things. Most certainly. However, it was broken before, as it failed to be renamed in the grand renaming. Regards, Martin From marcus.h.mendenhall@vanderbilt.edu Tue Apr 8 15:38:57 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Tue, 8 Apr 2003 09:38:57 -0500 Subject: [Python-Dev] _socket efficiencies ideas Message-ID: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> I have been in discussion recently with Martin v. Loewis about an idea I have been thinking about for a while to improve the efficiency of the connect method in the _socket module. I posted the original suggestion to the python suggestions tracker on sourceforge as item 706392. A bit of history and justification: I am doing a lot of work using python to develop almost-real-time distributed data acquisition and control systems from running laboratory apparatus. 
In this environment, I do a lot of sun-rpc calls as part of the vxi-11 protocol to allow TCP/IP access to gpib-like devices. As a part of this, I do a lot of socket.connect() calls, often with the connections being quite transient. The problem is that the current python _socket module makes a DNS call to try to resolve each address before connect is called, which if I am connecting/disconnecting many times a second results in pathological and gratuitous network activity. Incidentally, I am in the process of creating a sourceforge project, pythonlabtools (just approved this morning), in which I will start maintaining a repository of the tools I have been working on.

My first solution to this, for which I submitted a patch to the tracker system (with guidance from Martin), was to create a wrapper for the sockaddr object, which one can create in advance, and when _socket.connect() is called (actually when getsockaddrarg() is called by connect), results in an immediate connection without any DNS activity.

This solution solves part of the problem, but may not be the right final one. After writing this patch and verifying its functionality, I tried it in the real world. Then, I realized that for sun-rpc work, it wasn't quite what I needed, since the socket number may be changing each time the rpc request is made, resulting in a new address wrapper being needed, and thus DNS activity again.

After thinking about what I have done with this patch, I would also like to suggest another change (for which I am also willing to submit the patch, which is quite simple): Consistent with some of the already extant glue in _socket to handle addresses like <broadcast>, would there be any reason not to modify setipaddr() and getaddrinfo() so that if an address is prefixed with <numeric> (e.g. <numeric>127.0.0.1) the PASSIVE and NUMERIC flags are always set, so these routines reject any non-numeric address but handle numeric ones very efficiently?
I have already implemented a predecessor to this which I am experimentally running at home in python 2.2.2, in which I made it so that prefixing the address with an exclamation point provided this functionality. Given the somewhat more legible approach the team has already chosen for special addresses, I see no reason why using a <numeric> (or some such) prefix isn't reasonable. Do any members of the development team have commentary on this? Would such a change be likely to be accepted into the system? Any reasons which it might break something? The actual patch would be only about 10 lines of code, (plus some documentation), a few in each of the routines mentioned above. Thanks for any suggestions. Marcus Mendenhall From guido@python.org Tue Apr 8 15:50:50 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 10:50:50 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Tue, 08 Apr 2003 09:38:57 CDT." <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> Message-ID: <200304081450.h38EoqE20178@odiug.zope.com> > I have been in discussion recently with Martin v. Loewis about an idea > I have been thinking about for a while to improve the efficiency of the > connect method in the _socket module. I posted the original suggestion > to the python suggestions tracker on sourceforge as item 706392. > > A bit of history and justification: > I am doing a lot of work using python to develop almost-real-time > distributed data acquisition and control systems from running > laboratory apparatus. In this environment, I do a lot of sun-rpc calls > as part of the vxi-11 protocol to allow TCP/IP access to gpib-like > devices. As a part of this, I do a lot sock socket.connect() calls, > often with the connections being quite transient. 
The problem is that > the current python _socket module makes a DNS call to try to resolve > each address before connect is called, which if I am > connecting/disconnecting many times a second results in pathological > and gratuitous network activity. Incidentally, I am in the process of > creating a sourceforge project, pythonlabtools (just approved this > morning), in which I will start maintaining a repository of the tools I > have been working on. Are you sure that it tries to make a DNS call even when the address is pure numeric? That seems a mistake, and if that's really happening, I think that is the part that should be fixed. Maybe in the _socket module, maybe in getaddrinfo(). > My first solution to this, for which I submitted a patch to the tracker > system (with guidance from Martin), was to create a wrapper for the > sockaddr object, which one can create in advance, and when > _socket.connect() is called (actually when getsockaddrarg() is called > by connect), results in an immediate connection without any DNS > activity. > > This solution solves part of the problem, but may not be the right > final one. After writing this patch and verifying its functionality, I > tried it in the real world. Then, I realized that for sun-rpc work, it > wasn't quite what I needed, since the socket number may be changing > each time the rpc request is made, resulting in a new address wrapper > being needed, and thus DNS activity again. > > After thinking about what I have done with this patch, I would also > like to suggest another change (for which I am also willing to submit > the patch, which is quite simple): Consistent with some of the already > extant glue in _socket to handle addresses like <broadcast>, would > there be any reason no to modify > setipaddr() and getaddrinfo() so that if an address is prefixed with > <numeric> (e.g.
<numeric>127.0.0.1) that the PASSIVE and NUMERIC flags > are always set so these routines reject any non-numeric address, but > handle numeric ones very efficiently? > > I have already implemented a predecessor to this which I am > experimentally running at home in python 2.2.2, in which I made it so > that prefixing the address with an exclamation point provided this > functionality. Given the somewhat more legible approach the team has > already chosen for special addresses, I see no reason why using a > <numeric> (or some such) prefix isn't reasonable. > > Do any members of the development team have commentary on this? Would > such a change be likely to be accepted into the system? Any reasons > which it might break something? The actual patch would be only about > 10 lines of code, (plus some documentation), a few in each of the > routines mentioned above. I don't see why we would have to add the <numeric> flag to the address when the form of the address itself is already a perfect clue that the address is purely numeric. I'd be happy to see a patch that intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those without calling getaddrinfo(). --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Tue Apr 8 16:59:27 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Tue, 8 Apr 2003 10:59:27 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304081450.h38EoqE20178@odiug.zope.com> Message-ID: <138CDF38-69DB-11D7-A8D4-003065A81A70@vanderbilt.edu> Thanks for your prompt reply! On Tuesday, April 8, 2003, at 09:50 AM, Guido van Rossum wrote: >> I have been in discussion recently with Martin v. Loewis about an idea >> I have been thinking about for a while to improve the efficiency of >> the >> connect method in the _socket module. I posted the original >> suggestion >> to the python suggestions tracker on sourceforge as item 706392. 
>> >> A bit of history and justification: >> I am doing a lot of work using python to develop almost-real-time >> distributed data acquisition and control systems from running >> laboratory apparatus. In this environment, I do a lot of sun-rpc >> calls >> as part of the vxi-11 protocol to allow TCP/IP access to gpib-like >> devices. As a part of this, I do a lot sock socket.connect() calls, >> often with the connections being quite transient. The problem is that >> the current python _socket module makes a DNS call to try to resolve >> each address before connect is called, which if I am >> connecting/disconnecting many times a second results in pathological >> and gratuitous network activity. Incidentally, I am in the process of >> creating a sourceforge project, pythonlabtools (just approved this >> morning), in which I will start maintaining a repository of the tools >> I >> have been working on. > > Are you sure that it tries make a DNS call even when the address is > pure numeric? That seems a mistake, and if that's really happening, I > think that is the part that should be fixed. Maybe in the _socket > module, maybe in getaddrinfo(). > Yes, it seems to do this. It sets the PASSIVE flags, but that doesn't seem to be quite enough to prevent DNS activity, although the NUMERIC flag does the job. This is true, at least, in 2.3.x on MacOSX, and since the socket stuff is all the same, I suspect it is true on many Unixes. Note that this doesn't happen on the MacOS9 version, which provides its own socket interface through GUSI, which apparently is smart enough to handle it. >> My first solution to this, for which I submitted a patch to the >> tracker >> system (with guidance from Martin), was to create a wrapper for the >> sockaddr object, which one can create in advance, and when >> _socket.connect() is called (actually when getsockaddrarg() is called >> by connect), results in an immediate connection without any DNS >> activity. 
>> >> This solution solves part of the problem, but may not be the right >> final one. After writing this patch and verifying its functionality, >> I >> tried it in the real world. Then, I realized that for sun-rpc work, >> it >> wasn't quite what I needed, since the socket number may be changing >> each time the rpc request is made, resulting in a new address wrapper >> being needed, and thus DNS activity again. >> >> After thinking about what I have done with this patch, I would also >> like to suggest another change (for which I am also willing to submit >> the patch, which is quite simple): Consistent with some of the >> already >> extant glue in _socket to handle addresses like <broadcast>, would >> there be any reason no to modify >> setipaddr() and getaddrinfo() so that if an address is prefixed with >> <numeric> (e.g. <numeric>127.0.0.1) that the PASSIVE and NUMERIC flags >> are always set so these routines reject any non-numeric address, but >> handle numeric ones very efficiently? >> >> I have already implemented a predecessor to this which I am >> experimentally running at home in python 2.2.2, in which I made it so >> that prefixing the address with an exclamation point provided this >> functionality. Given the somewhat more legible approach the team has >> already chosen for special addresses, I see no reason why using a >> <numeric> (or some such) prefix isn't reasonable. >> >> Do any members of the development team have commentary on this? Would >> such a change be likely to be accepted into the system? Any reasons >> which it might break something? The actual patch would be only about >> 10 lines of code, (plus some documentation), a few in each of the >> routines mentioned above. > > I don't see why we would have to add the <numeric> flag to the address > when the form of the address itself is already a perfect clue that the > address is purely numeric. 
I'd be happy to see a patch that > intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those > without calling getaddrinfo(). > Do we want this? The parser would then also have to be modified to handle numeric INET6 addresses when they become popular. I actually did implement one of my trial versions this way, and it worked fine. There is one minor issue, too. In urllib, there are some calls to getaddrinfo to get (for maybe no good reason) CNAMEs of addresses. I would like some way to tag an address with a very strong comment that it is what it is, and I would like all further processing disabled. Also, a 'trial' parsing of an address for matching an a.b.c.d pattern each time is a lot more processor intensive than checking for <numeric> at the beginning. I am perfectly happy to implement it either way. > --Guido van Rossum (home page: http://www.python.org/~guido/) > From guido@python.org Tue Apr 8 19:01:24 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 14:01:24 -0400 Subject: [PythonLabs] Re: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: Your message of "Sun, 06 Apr 2003 11:43:21 PDT." <20030406184320.GA14894@glacier.arctrix.com> References: <list-1431082@digicool.com> <3E900A80.3010802@zope.com> <20030406184320.GA14894@glacier.arctrix.com> Message-ID: <200304081801.h38I1QL22691@odiug.zope.com> > A command line option that enabled new-style classes by default may be a > good idea (suggested to me by AMK at PyCon). I expect lots of things to break; such an option would have to be at least as well-hidden as -U. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 8 19:06:47 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 08 Apr 2003 14:06:47 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: Your message of "Mon, 07 Apr 2003 12:54:20 +1200."
<200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> References: <200304070054.h370sK814932@oma.cosc.canterbury.ac.nz> Message-ID: <200304081806.h38I6v822730@odiug.zope.com> > This further confirms my opinion that __del__ methods are evil, and > the language would be the better for their complete removal. No can do. There must be a way to force e.g. calling os.close() for an integer file descriptor returned by os.open() without writing C code. But this should be exceedingly rare. A quick inspection of the standard library found one other case: flushing buffered data out. I think that's also a valid use of __del__. > Failing that, perhaps they should be made a bit less dynamic, so > that the GC can make reasonable assumptions about their existence > without having to execute Python code. +1 --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 13:48:48 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 06:48:48 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304081450.h38EoqE20178@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> Message-ID: <20030409124848.GB15649@tummy.com> On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: >Are you sure that it tries make a DNS call even when the address is >pure numeric? That seems a mistake, and if that's really happening, I My first thought is that there should be a local DNS cache on the machine that is running these apps. My second thought is that Python could benefit from caching some lookup information... >address is purely numeric. I'd be happy to see a patch that >intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those >without calling getaddrinfo(). It's not quite that easy. Beyond the IPV6 issues mentioned elsewhere, you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". 
IP addresses will fill in missing ".0"s, which is particularly handy for accessing "127.1", which gets expanded to "127.0.0.1". Sean -- Rocky: "Do you know what an A-Bomb is?" Bullwinkle: "Of course. ``A Bomb'' is what some people call our show." Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From hbl@st-andrews.ac.uk Wed Apr 9 14:35:46 2003 From: hbl@st-andrews.ac.uk (Hamish Lawson) Date: Wed, 09 Apr 2003 14:35:46 +0100 Subject: [Python-Dev] PEP305 csv package: from csv import csv? Message-ID: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> [Please excuse my posting this message here after initially posting it to python-list, but I realised afterwards that this might be the more appropriate forum (it hasn't so far had any responses on python-list anyway).] According to the documentation in progress at http://www.python.org/dev/doc/devel/whatsnew/node14.html use of the forthcoming csv module (as described in PEP305) requires it to be imported from the csv package:

    from csv import csv

    input = open('datafile', 'rb')
    reader = csv.reader(input)
    for line in reader:
        print line

Is there some reason why the csv package's __init__.py doesn't import the required names from csv.py, so allowing the shorter form below?

    import csv

    input = open('datafile', 'rb')
    reader = csv.reader(input)
    for line in reader:
        print line

Hamish Lawson From skip@pobox.com Wed Apr 9 14:43:11 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 9 Apr 2003 08:43:11 -0500 Subject: [Python-Dev] PEP305 csv package: from csv import csv?
In-Reply-To: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> References: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk> Message-ID: <16020.9071.801846.936864@montanaro.dyndns.org> >>>>> "Hamish" == Hamish Lawson <hbl@st-andrews.ac.uk> writes: Hamish> [Please excuse my posting this message here after initially Hamish> posting it to python-list, but I realised afterwards that this Hamish> might be the more appropriate forum (it hasn't so far had any Hamish> responses on python-list anyway).] ... Actually, I forwarded your note to the csv mailing list: csv@mail.mojam.com. That'd be the best place to discuss the topic. ;-) I'll probably get around to changing things in the next day or two, but please feel free to submit a patch so I don't forget. Skip From guido@python.org Wed Apr 9 14:51:26 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 09:51:26 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 06:48:48 MDT." <20030409124848.GB15649@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> Message-ID: <200304091351.h39DpSq24961@odiug.zope.com> > On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: > >Are you sure that it tries make a DNS call even when the address is > >pure numeric? That seems a mistake, and if that's really happening, I > > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... I don't want to build a cache into Python, it should already be part of libresolv. > >address is purely numeric. I'd be happy to see a patch that > >intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those > >without calling getaddrinfo(). > > It's not quite that easy. 
Beyond the IPV6 issues mentioned elsewhere, The IPv6 folks can add their own cache. > you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". IP addresses > will fill in missing ".0"s, which is particularly handy for accessing > "127.1", which gets expanded to "127.0.0.1". I didn't even know this, and I think it's bad style to use something that obscure (most people would probably guess that 127.1 means 0.0.127.1 or 127.1.0.0). But since you seem to know about this stuff, perhaps you can submit a patch? --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 15:20:50 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 09:20:50 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> OK, I'll chime back in on the thread I started... I mostly have a question for Sean, since he seems to know the networking stuff well. Do you know of any reason why my original proposal (which is to allow IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution is attempted, which basically causes parsing with no real resolution at all) would break any known or plausible networking standards? The current Python socket module basically hides this part of the BSD socket API, and I find it quite useful to be able to suppress DNS activity absolutely for some addresses. And for Guido: since this type of tag has already been used in Python (as <broadcast>), is there any reason why this solution is inelegant? Thanks. Marcus On Wednesday, April 9, 2003, at 08:51 AM, Guido van Rossum wrote: >> On Tue, Apr 08, 2003 at 10:50:50AM -0400, Guido van Rossum wrote: >>> Are you sure that it tries make a DNS call even when the address is >>> pure numeric?
That seems a mistake, and if that's really happening, >>> I >> >> My first thought is that there should be a local DNS cache on the >> machine that is running these apps. My second thought is that Python >> could benefit from caching some lookup information... > > I don't want to build a cache into Python, it should already be part > of libresolv. > >>> address is purely numeric. I'd be happy to see a patch that >>> intercepts addresses of the form \d+\.\d+\.\d+\.\d+ and parses those >>> without calling getaddrinfo(). >> >> It's not quite that easy. Beyond the IPV6 issues mentioned elsewhere, > > The IPv6 folks can add their own cache. > >> you'd also want to check "\d+.\d+" and "\d+\.\d+\.\d+". IP addresses >> will fill in missing ".0"s, which is particularly handy for accessing >> "127.1", which gets expanded to "127.0.0.1". > > I didn't even know this, and I think it's bad style to use something > that obscure (most people would probably guess that 127.1 means > 0.0.127.1 or 127.1.0.0). > > But since you seem to know about this stuff, perhaps you can submit a > patch? > > --Guido van Rossum (home page: http://www.python.org/~guido/) > From Anthony Baxter <anthony@interlink.com.au> Wed Apr 9 15:24:45 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Thu, 10 Apr 2003 00:24:45 +1000 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409124848.GB15649@tummy.com> Message-ID: <200304091424.h39EOje08304@localhost.localdomain> >>> Sean Reifschneider wrote > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... Ick ick. This is putting a bunch of code for a stub resolver into python. This stuff is hard to get right - I implemented this on top of pydns, and it was a lot of work to get (what I think is) correct, for not very much gain. 
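[Editor's note: a minimal sketch of the kind of fixed-TTL lookup cache being debated here — every name and the TTL policy below are invented for illustration, and as Anthony notes, a real stub resolver with SOA-honoring expiry is much harder to get right.]

```python
import socket
import time

# Maps (host, port) -> (expiry timestamp, cached getaddrinfo result).
_cache = {}

def cached_getaddrinfo(host, port, ttl=30.0):
    """getaddrinfo() with results reused for a fixed number of seconds.

    This deliberately ignores DNS TTLs; it only illustrates the
    'cache for some small configurable number of seconds' idea.
    """
    key = (host, port)
    hit = _cache.get(key)
    if hit is not None:
        expires, infos = hit
        if time.monotonic() < expires:
            return infos          # cache hit: no resolver traffic
    infos = socket.getaddrinfo(host, port, socket.AF_INET,
                               socket.SOCK_STREAM)
    _cache[key] = (time.monotonic() + ttl, infos)
    return infos
```

An application connecting and disconnecting many times a second would hit the resolver at most once per TTL window per address, at the cost of briefly serving stale data after a DNS change.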
The idea of either suppressing DNS lookups for all-numeric addresses, or some sort of extended API for suppressing DNS lookups might be better, but really, isn't this the job of the stub resolver? Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 15:32:00 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 09:32:00 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <069761E4-6A98-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 09:24 AM, Anthony Baxter wrote: > >>>> Sean Reifschneider wrote >> My first thought is that there should be a local DNS cache on the >> machine that is running these apps. My second thought is that Python >> could benefit from caching some lookup information... > > Ick ick. This is putting a bunch of code for a stub resolver into > python. > This stuff is hard to get right - I implemented this on top of pydns, > and > it was a lot of work to get (what I think is) correct, for not very > much > gain. > > The idea of either suppressing DNS lookups for all-numeric addresses, > or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? > This is part of the resolver API, via the AI_NUMERIC flags. I am just trying to expose that API to the top level of python. Marcus > Anthony > > -- > Anthony Baxter <anthony@interlink.com.au> > It's never too late to have a happy childhood. > From guido@python.org Wed Apr 9 15:37:35 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:37:35 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 09:20:50 CDT." 
<77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> References: <77018B84-6A96-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091437.h39Ebc125316@odiug.zope.com> > OK, I'll chime back in on the thread I started... I mostly have a > question for Sean, since he seems to know the networking stuff well. I'll chime in nevertheless. > Do you know of any reason why my original proposal (which is to allows > IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause > both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution > is attempted, which basically causes parsing with not real resolution > at all) would break any known or plausible networking standards? What are those flags? Which API uses them? I still don't understand why intercepting the all-numeric syntax isn't good enough, and why you want a <numeric> prefix. > The current Python socket module basically hides this part of the > BSD socket API, and I find it quite useful to be able to suppress > DNS activity absolutely for some addresses. And for Guido: since > this type of tag has already been used in Python (as <broadcast>), > is there any reason why this solution is inelegant? The reason I'm reluctant to add a new notation is that AFAIK it would be unique to Python. It's better to stick to standard notations IMO. <broadcast> was probably a mistake, since it seems to mean the same as 0.0.0.0 (for IPv4). --Guido van Rossum (home page: http://www.python.org/~guido/) From neal@metaslash.com Wed Apr 9 15:38:03 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 09 Apr 2003 10:38:03 -0400 Subject: [Python-Dev] SF file uploads work now Message-ID: <20030409143803.GE17847@epoch.metaslash.com> SF has fixed the problem which prevented a file from being uploaded when submitting a new patch. I just tested this and it worked. 
Neal From jafo@tummy.com Wed Apr 9 15:40:37 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 08:40:37 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> References: <20030409124848.GB15649@tummy.com> <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <20030409144037.GL1756@tummy.com> On Thu, Apr 10, 2003 at 12:24:45AM +1000, Anthony Baxter wrote: >Ick ick. This is putting a bunch of code for a stub resolver into python. >This stuff is hard to get right - I implemented this on top of pydns, and >it was a lot of work to get (what I think is) correct, for not very much >gain. Well, ideally you'd cache the data for as long as the SOA says to cache it. However, it sounds like in the situation that started this thread, even caching that data for some small but configurable number of seconds might help out. >The idea of either suppressing DNS lookups for all-numeric addresses, or >some sort of extended API for suppressing DNS lookups might be better, >but really, isn't this the job of the stub resolver? Definitely, on both counts... I like the idea of the "<numeric>127.0.0.1" or otherwise somehow specifying that the address shouldn't be resolved. I wouldn't think that it'd be good to do lookups of purely IP addresses, but there is probably some obscure part of some spec that says it should happen. Contrary to popular belief, just because I know that IP addresses get padded with 0s, I'm not a networking lawyer. ;-) I learned that trick because it can help make dealing with IPV6 addresses much easier, but I've found it most useful with 127.1. Sean -- This message is REALLY offensive, so I ROT-13d it TWICE. -- Sean Reifschneider being silly on #python, 2000 Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. 
Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 15:41:37 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:41:37 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Thu, 10 Apr 2003 00:24:45 +1000." <200304091424.h39EOje08304@localhost.localdomain> References: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <200304091441.h39EfnU25347@odiug.zope.com> > Ick ick. This is putting a bunch of code for a stub resolver into python. > This stuff is hard to get right - I implemented this on top of pydns, and > it was a lot of work to get (what I think is) correct, for not very much > gain. What I said. > The idea of either suppressing DNS lookups for all-numeric addresses, or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? Hey, I just figured it out. The old socket module (Python 2.1 and before) *did* special-case \d+\.\d+\.\d+\.\d+! This code was somehow lost when the IPv6 support was added. I propose to put it back in, at least for IPv4 (AF_INET). Patch anyone? --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 15:48:04 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 08:48:04 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <20030409144803.GM1756@tummy.com> On Wed, Apr 09, 2003 at 09:51:26AM -0400, Guido van Rossum wrote: >I didn't even know this, and I think it's bad style to use something >that obscure Perhaps... It's also bad style to break the obscure cases that are defined by the specifications... ;-) >(most people would probably guess that 127.1 means >0.0.127.1 or 127.1.0.0). 
Yeah, unfortunately it's one of those cases that it doesn't really make sense until you actually know the padding happens, and then think about it... It really only makes sense to pad within the address because you are rarely going to have leading or trailing 0s in a network address. So, it pads before the trailing specified octet:

    10.1   => 10.0.0.1
    10.9.1 => 10.9.0.1

>But since you seem to know about this stuff, perhaps you can submit a >patch? I've updated my local CVS repository, I'll see if I can get a change done on the airplane today. Sean -- The structure of a system reflects the structure of the organization that built it. -- Richard E. Fairley Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 15:50:11 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:50:11 -0400 Subject: [Python-Dev] SF file uploads work now In-Reply-To: Your message of "Wed, 09 Apr 2003 10:38:03 EDT." <20030409143803.GE17847@epoch.metaslash.com> References: <20030409143803.GE17847@epoch.metaslash.com> Message-ID: <200304091450.h39EoDP25441@odiug.zope.com> > SF has fixed the problem which prevented a file from being uploaded > when submitting a new patch. I just tested this and it worked. Thanks! I've removed the big red warning about this from the "submit new" page. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 9 15:54:18 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 10:54:18 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 08:48:04 MDT."
<20030409144803.GM1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> <20030409144803.GM1756@tummy.com> Message-ID: <200304091454.h39EsPr25477@odiug.zope.com> > On Wed, Apr 09, 2003 at 09:51:26AM -0400, Guido van Rossum wrote: > >I didn't even know this, and I think it's bad style to use something > >that obscure > > Perhaps... It's also bad style to break the obscure cases that are > defined by the specifications... ;-) Sure. I propose to special-case only what we *absolutely* *know* we can handle, and if on closer inspection we can't (e.g. someone writes 999.999.999.999) we pass it on to the official code. Here's the 2.1 code, which takes that approach:

    if (sscanf(name, "%d.%d.%d.%d%c", &d1, &d2, &d3, &d4, &ch) == 4 &&
        0 <= d1 && d1 <= 255 && 0 <= d2 && d2 <= 255 &&
        0 <= d3 && d3 <= 255 && 0 <= d4 && d4 <= 255) {
            addr_ret->sin_addr.s_addr = htonl(
                ((long) d1 << 24) | ((long) d2 << 16) |
                ((long) d3 << 8) | ((long) d4 << 0));
            return 4;
    }

> >But since you seem to know about this stuff, perhaps you can submit a > >patch? > > I've updated my local CVS repository, I'll see if I can get a change > done on the airplane today. Great! --Guido van Rossum (home page: http://www.python.org/~guido/) From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 16:07:51 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 10:07:51 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091437.h39Ebc125316@odiug.zope.com> Message-ID: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 09:37 AM, Guido van Rossum wrote: >> OK, I'll chime back in on the thread I started... I mostly have a >> question for Sean, since he seems to know the networking stuff well. > > I'll chime in nevertheless.
> >> Do you know of any reason why my original proposal (which is to allows >> IP addresses prefixed with <numeric> e.g. <numeric>127.0.0.1 to cause >> both the AI_PASSIVE _and_ AI_NUMERIC flags to get set when resolution >> is attempted, which basically causes parsing with not real resolution >> at all) would break any known or plausible networking standards? > > What are those flags? Which API uses them? > The getsockaddr call uses them (actually the correct name for one of the flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), and its part of the BSD sockets library, which is basically what the python socketmodule wraps. > I still don't understand why intercepting the all-numeric syntax isn't > good enough, and why you want a <numeric> prefix. > I guess intercepting all numeric is OK, it is just less efficient (since it requires a trial parsing of an address, which is wasted if it is not all numeric), and because it is so easy to implement <numeric>. However, all my operational goals are achieved if the old check for pure numeric is reinstated at the lowest level (probably in getsockaddrarg in socketmodule.c), so it is used everywhere. >> The current Python socket module basically hides this part of the >> BSD socket API, and I find it quite useful to be able to suppress >> DNS activity absolutely for some addresses. And for Guido: since >> this type of tag has already been used in Python (as <broadcast>), >> is there any reason why this solution is inelegant? > > The reason I'm reluctant to add a new notation is that AFAIK it would > be unique to Python. It's better to stick to standard notations IMO. > <broadcast> was probably a mistake, since it seems to mean the same as > 0.0.0.0 (for IPv4). I accept this logic. However, python is hiding a very useful (for efficiency) piece of the API, or depending on guessing whether you want it or not by looking at the format of an address. 
There are times in higher-level (python) code where getaddrinfo is called to get a CNAME, where I would also like to cause the raw IP to be returned by force, instead of attempting to get a CNAME, since I already know, by the IP I chose, that one doesn't exist. If we make the same check for numeric IPs in getaddrinfo, then it becomes impossible to resolve numeric names back to real ones. There is no way for getaddrinfo to know which way we want it, since in this case both ways might be needed. From guido@python.org Wed Apr 9 16:20:39 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 11:20:39 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 10:07:51 CDT." <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> References: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091521.h39FL5425595@odiug.zope.com> > > I still don't understand why intercepting the all-numeric syntax > > isn't good enough, and why you want a <numeric> prefix. > > > I guess intercepting all numeric is OK, it is just less efficient > (since it requires a trial parsing of an address, which is wasted if > it is not all numeric), and because it is so easy to implement > <numeric>. The performance loss will be unmeasurable (parsing a string of at most 11 bytes against a very simple pattern). Compare that to the true cost of adding <numeric>: documentation has to be added (and dozens of books updated), and code that wants to use numeric addresses has to be changed. > However, all my operational goals are achieved if the > old check for pure numeric is reinstated at the lowest level > (probably in getsockaddrarg in socketmodule.c), so it is used > everywhere. Right. > > The reason I'm reluctant to add a new notation is that AFAIK it would > > be unique to Python. It's better to stick to standard notations IMO.
> > <broadcast> was probably a mistake, since it seems to mean the same as > > 0.0.0.0 (for IPv4). > I accept this logic. However, python is hiding a very useful (for > efficiency) piece of the API, or depending on guessing whether you want > it or not by looking at the format of an address. There are times in > higher-level (python) code where getaddrinfo is called to get a CNAME, > where I would also like to cause the raw IP to be returned by force, > instead of attempting to get a CNAME, since I already know, by the IP I > chose, that one doesn't exist. If we make the same check for numeric > IPs in getaddrinfo, then it becomes impossible to resolve numeric names > back to real ones. There is no way for getaddrinfo to know which way > we want it, since in this case both ways might be needed. You're right, this functionality should be made available. IMO the right solution is to make it a separate API in the socket module, not to add more syntax to the existing address parsing code. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:36:17 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:36:17 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409124848.GB15649@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> Message-ID: <3E946821.6010208@v.loewis.de> Sean Reifschneider wrote: > My first thought is that there should be a local DNS cache on the > machine that is running these apps. My second thought is that Python > could benefit from caching some lookup information... I disagree. Python should expose the resolver library, and leave caching to it; many such libraries do caching already, in some form. The issue is different: In some cases the application just *knows* that an address is numeric, and that DNS lookup will fail.
In these cases, lookup should be avoided - whether by explicit request from the application or by Python implicitly just knowing is a different issue. It turns out that Python doesn't need to 100% detect numeric addresses, as long as it would not classify addresses as numeric which aren't. Perhaps it is even possible to leave the "is numeric" test to the implementation of getaddrinfo, i.e. calling it twice (try numeric first, then try resolving the name)? Regards, Martin From martin@v.loewis.de Wed Apr 9 19:38:32 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:38:32 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091351.h39DpSq24961@odiug.zope.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> Message-ID: <3E9468A8.8050407@v.loewis.de> Guido van Rossum wrote: > I didn't even know this, and I think it's bad style to use something > that obscure (most people would probably guess that 127.1 means > 0.0.127.1 or 127.1.0.0). > > But since you seem to know about this stuff, perhaps you can submit a > patch? I think the OP is willing to create a patch if guided into a direction. The basic question is: should Python automatically recognize numeric addresses, or should the application have a way to indicate a numeric address? 
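The two-call getaddrinfo idea floated above maps directly onto flags the socket module already exposes; a minimal sketch, assuming a platform with an RFC 2553-style getaddrinfo (resolve_host is an invented helper name):

```python
import socket

def resolve_host(host, port):
    # First call: AI_NUMERICHOST forbids any resolver activity, so
    # this either parses a numeric address or fails immediately.
    try:
        return socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM,
                                  0, socket.AI_NUMERICHOST)
    except socket.gaierror:
        pass
    # Second call: a normal lookup, which may consult DNS.
    return socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
```

Only the first error is swallowed; a failure in the second call propagates to the caller as usual.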
Regards, Martin From skip@pobox.com Wed Apr 9 19:44:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 9 Apr 2003 13:44:51 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <16020.27171.834878.631470@montanaro.dyndns.org> Martin> It turns out that Python doesn't need to 100% detect numeric Martin> addresses, as long as it would not classify addresses as numeric Martin> which aren't. Perhaps it is even possible to leave the "is Martin> numeric" test to the implementation of getaddrinfo, i.e. calling Martin> it twice (try numeric first, then try resolving the name)? Can a top-level domain be all digits? If not, why not assume numeric if re.search(r"\.\d+$", addr) is not None? Skip From guido@python.org Wed Apr 9 19:45:49 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 14:45:49 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 20:36:17 +0200." <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <200304091845.h39Ijor31915@odiug.zope.com> > Sean Reifschneider wrote: > > > My first thought is that there should be a local DNS cache on the > > machine that is running these apps. My second thought is that Python > > could benefit from caching some lookup information... [MvL] > I disagree. Python should expose the resolver library, and leave > caching to it; many such libraries do caching already, in some form. Right. > The issue is different: In some cases the application just *knows* > that an address is numeric, and that DNS lookup will fail. 
In fact, I've often written code that passes a numeric address, and I've always assumed that in that case the code would take a shortcut because there's nothing to look up (only to parse). > In these cases, lookup should be avoided - whether by explicit > request from the application or by Python implicitly just knowing is > a different issue. > > It turns out that Python doesn't need to 100% detect numeric > addresses, as long as it would not classify addresses as numeric > which aren't. Perhaps it is even possible to leave the "is numeric" > test to the implementation of getaddrinfo, i.e. calling it twice > (try numeric first, then try resolving the name)? Perhaps, as long as we can safely ignore the first error. This would probably be a little slower, but probably not slow enough to matter, and it sounds like a very general solution. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:49:54 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:49:54 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> References: <0836E287-6A9D-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <3E946B52.7090708@v.loewis.de> Marcus Mendenhall wrote: > The getsockaddr call uses them (actually the correct name for one of the > flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), and it's > part of the BSD sockets library, which is basically what the python > socketmodule wraps. More importantly, it is part of RFC 2553, which Python uses; it is also part of Winsock2. > I guess intercepting all numeric is OK, it is just less efficient (since > it requires a trial parsing of an address, which is wasted if it is not > all numeric), and because it is so easy to implement <numeric>. But isn't the same trial parsing needed to determine presence of the "<numeric>" flag?
The trial parsing Guido proposes usually stops with the first letter in a non-numeric address, and accesses up to 16 letters for a numeric address. Regards, Martin From guido@python.org Wed Apr 9 19:47:41 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 14:47:41 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 20:38:32 +0200." <3E9468A8.8050407@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <200304091351.h39DpSq24961@odiug.zope.com> <3E9468A8.8050407@v.loewis.de> Message-ID: <200304091848.h39IlpW31935@odiug.zope.com> > The basic question is: should Python automatically recognize numeric > addresses, or should the application have a way to indicate a numeric > address? It should be automatically recognized. Python has always done this (until 2.1 at least). I don't think there is any ambiguity; AFAIK it's not possible to put something in the DNS so that an all-numeric address gets remapped (that would be a nasty security problem waiting to happen). --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 19:59:56 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 20:59:56 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <16020.27171.834878.631470@montanaro.dyndns.org> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> Message-ID: <3E946DAC.8010909@v.loewis.de> Skip Montanaro wrote: > Can a top-level domain be all digits? It appears nobody here can answer this question with certainty. 
If the answer is "no", it is surprising that getaddrinfo implementations still make resolver calls in this case even if they are sure that those resolver calls fail. One would hope that people writing socket libraries should know the answer. Regards, Martin From marcus.h.mendenhall@vanderbilt.edu Wed Apr 9 20:14:16 2003 From: marcus.h.mendenhall@vanderbilt.edu (Marcus Mendenhall) Date: Wed, 9 Apr 2003 14:14:16 -0500 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946B52.7090708@v.loewis.de> Message-ID: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> On Wednesday, April 9, 2003, at 01:49 PM, Martin v. Löwis wrote: > Marcus Mendenhall wrote: > >> The getsockaddr call uses them (actually the correct name for one of >> the flags is AI_NUMERICHOST, not AI_NUMERIC as I originally stated), >> and it's part of the BSD sockets library, which is basically what the >> python socketmodule wraps. > > More importantly, it is part of RFC 2553, which Python uses; it is also > part of Winsock2. > >> I guess intercepting all numeric is OK, it is just less efficient >> (since it requires a trial parsing of an address, which is wasted if >> it is not all numeric), and because it is so easy to implement >> <numeric>. > > But isn't the same trial parsing needed to determine presence of the > "<numeric>" flag? The trial parsing Guido proposes usually stops with > the first letter in a non-numeric address, and accesses up to 16 > letters > for a numeric address. Yes, but a compare of the head of a string to a constant is probably something which requires 1% of the cpu time of a sscanf. Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) {whatever} the first compare avoids even a subroutine call in the most likely case (string does not begin with <numeric>) but then checks extremely quickly if it is right after that. Even though cpu time is cheap, we should save it for useful work.
Marcus From nas@python.ca Wed Apr 9 20:31:22 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 12:31:22 -0700 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> References: <3E946B52.7090708@v.loewis.de> <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <20030409193122.GA20230@glacier.arctrix.com> Marcus Mendenhall wrote: > Even though cpu time is cheap, we should save it for useful work. Saving a few cycles while complicating the interface is not the Python way. +1 on restoring the old sscanf code (or something similar to it). ObTrivia: IP addresses can be written as a single number (at least for many IP implementations). Try "ping 2130706433". Neil From jeremy@zope.com Wed Apr 9 20:33:47 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 09 Apr 2003 15:33:47 -0400 Subject: [Python-Dev] tp_clear return value Message-ID: <1049916827.4961.64.camel@slothrop.zope.com> Why does tp_clear have a return value? All the code I've seen returns 0, but the only place that clear is called doesn't inspect its return value. Jeremy From guido@python.org Wed Apr 9 20:40:56 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 15:40:56 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 14:14:16 CDT." <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> References: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304091941.h39Jf7A00697@odiug.zope.com> > Even though cpu time is cheap, we should save it for useful work. With that attitude, I'm surprised you're using Python at all.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Wed Apr 9 20:48:10 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 15:48:10 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <1049916827.4961.64.camel@slothrop.zope.com> References: <1049916827.4961.64.camel@slothrop.zope.com> Message-ID: <20030409194810.GA27070@mems-exchange.org> On Wed, Apr 09, 2003 at 03:33:47PM -0400, Jeremy Hylton wrote: > Why does tp_clear have a return value? All the code I've seen returns > 0, but the only place that clear is called doesn't inspect its return > value. I guess I would have to say overdesign. I was thinking that tp_clear and tp_traverse could somehow be used by things other than the GC. In retrospect that doesn't seem likely or even possible. The GC has pretty specific requirements. In retrospect, I think both tp_traverse and tp_clear should have returned "void". That would have made implementing those methods easier. Testing for errors in tp_traverse methods is silly since nothing returns an error, and, even if it did, the GC couldn't handle it. :-( How do we sort this out? I suppose one step would be to document that the return values of tp_traverse and tp_clear are ignored. If we agree on that, I volunteer to go through the code and remove the useless tests for errors in the tp_traverse methods. Neil From guido@python.org Wed Apr 9 20:52:03 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 15:52:03 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: Your message of "Wed, 09 Apr 2003 15:48:10 EDT." <20030409194810.GA27070@mems-exchange.org> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> Message-ID: <200304091952.h39Jq6Y01468@odiug.zope.com> > On Wed, Apr 09, 2003 at 03:33:47PM -0400, Jeremy Hylton wrote: > > Why does tp_clear have a return value? 
All the code I've seen returns > > 0, but the only place that clear is called doesn't inspect its return > > value. [In response, Neil admitted] > I guess I would have to say overdesign. I was thinking that tp_clear > and tp_traverse could somehow be used by things other than the GC. In > retrospect that doesn't seem likely or even possible. The GC has pretty > specific requirements. > > In retrospect, I think both tp_traverse and tp_clear should have > returned "void". That would have made implementing those methods > easier. Testing for errors in tp_traverse methods is silly since > nothing returns an error, and, even if it did, the GC couldn't handle > it. > > :-( > > How do we sort this out? I suppose one step would be to document that > the return values of tp_traverse and tp_clear are ignored. If we agree > on that, I volunteer to go through the code and remove the useless tests > for errors in the tp_traverse methods. That's a good first step. Unfortunately changing the declaration to void will break 3rd party extensions so that will be too painful. --Guido van Rossum (home page: http://www.python.org/~guido/) From jafo@tummy.com Wed Apr 9 20:22:48 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 13:22:48 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <3E946821.6010208@v.loewis.de> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> Message-ID: <20030409192248.GQ1756@tummy.com> On Wed, Apr 09, 2003 at 08:36:17PM +0200, "Martin v. Löwis" wrote: >I disagree. Python should expose the resolver library, and leave caching >to it; many such libraries do caching already, in some form. Why don't we carry it to the logical conclusion and say that the resolver should also avoid doing a forward lookup on an already numeric IP?
I've noticed that before the Red Hat 8.0 release, doing a "telnet <ip>" would usually be very fast on the initial connection, and since 8.0 it's been slow as if doing a lookup... To me that indicates that the resolver used to do this and has been changed to not, which makes me wonder why that was... Perhaps we're being too clever and it's going to come back to bite us? The "<numeric>" syntax would allow us to leave resolution as it is and let the user override it when they deem necessary. If we try to auto-detect (which I'm usually all for), we should probably implement a "<forcedns>" or similar? Sean -- Geek English Rule #7: To reduce redundancy, the word "scary" can be left out of any statement containing the phrase "scary java applet". Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From guido@python.org Wed Apr 9 21:05:50 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 16:05:50 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: Your message of "Wed, 09 Apr 2003 13:22:48 MDT." <20030409192248.GQ1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <20030409192248.GQ1756@tummy.com> Message-ID: <200304092005.h39K5pd01600@odiug.zope.com> > Why don't we carry it to the logical conclusion and say that the > resolver should also avoid doing a forward lookup on an already numeric > IP? > > I've noticed that before the Red Hat 8.0 release, doing a "telnet <ip>" > would usually be very fast on the initial connection, and since 8.0 it's > been slow as if doing a lookup... To me that indicates that the > resolver used to do this and has been changed to not, which makes me > wonder why that was... > > Perhaps we're being too clever and it's going to come back to bite us? I think it's the other way around. 
The resolver lost some perfectly good caching in the upgrade to support IPv6. The designers probably didn't notice the difference because in their own setup, DNS is fast. I expect the caching will come back eventually. > The "<numeric>" syntax would allow us to leave > resolution as it is and let the user override it when they deem > necessary. If we try to auto-detect (which I'm usually all for), we > should probably implement a "<forcedns>" or similar? YAGNI. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Wed Apr 9 21:27:01 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 22:27:01 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030409194810.GA27070@mems-exchange.org> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> Message-ID: <3E948215.8050504@v.loewis.de> Neil Schemenauer wrote: > I guess I would have to say overdesign. I was thinking that tp_clear > and tp_traverse could somehow be used by things other than the GC. In > retrospect that doesn't seem likely or even possible. The GC has pretty > specific requirements. > > In retrospect, I think both tp_traverse and tp_clear should have > returned "void". While this is true for tp_clear, tp_traverse is actually more general. gc.get_referrers uses tp_traverse, for something other than collection. > That would have made implementing those methods > easier. Testing for errors in tp_traverse methods is silly since > nothing returns an error, and, even if it did, the GC couldn't handle > it. Again, gc.get_referrers "uses" this feature. If extending the list fails, traversal is aborted. Whether this is useful is questionable, as the entire notion of "out of memory exception handling" is questionable. 
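Martin's point that tp_traverse serves more than collection can be watched from pure Python; a minimal illustration of the two gc functions built on top of it:

```python
import gc

payload = ["shared"]
holder_list = [payload]    # refers to payload
holder_tuple = (payload,)  # also refers to payload

# get_referrers runs the tp_traverse of every tracked container,
# collecting those whose traversal visits `payload`.
refs = gc.get_referrers(payload)
assert any(r is holder_list for r in refs)
assert any(r is holder_tuple for r in refs)

# get_referents goes the other direction: it reports exactly the
# objects a container's tp_traverse visits.
assert any(o is payload for o in gc.get_referents(holder_list))
```

Note that the module's globals dict also shows up among the referrers, which is why the checks filter by identity rather than comparing whole lists.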
Regards, Martin From jafo@tummy.com Wed Apr 9 21:33:19 2003 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 9 Apr 2003 14:33:19 -0600 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <16020.27171.834878.631470@montanaro.dyndns.org> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> Message-ID: <20030409203319.GS1756@tummy.com> On Wed, Apr 09, 2003 at 01:44:51PM -0500, Skip Montanaro wrote: >Can a top-level domain be all digits? If not, why not assume numeric if >re.search(r"\.\d+$", addr) is not None? I don't think anyone sane would create a top-level that's digits, particularly in the range of 0 to 255. That probably means that somebody is going to do it... ;-/ I think checking for 2 to 4 dotted octets in the range of 0 to 255 would be safest... Yes, you can probably get away with using the regex above, but I wouldn't want to. Sean -- Sucking all the marrow out of life doesn't mean choking on the bone. -- _Dead_Poet's_Society_ Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com> tummy.com, ltd. - Linux Consulting since 1995. Qmail, Python, SysAdmin From tim.one@comcast.net Wed Apr 9 22:33:07 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 17:33:07 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E948215.8050504@v.loewis.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKMFEAA.tim.one@comcast.net> [Neil Schemenauer] >> I was thinking that tp_clear and tp_traverse could somehow be used by >> things other than the GC. In retrospect that doesn't seem likely or even >> possible. The GC has pretty specific requirements. >> In retrospect, I think both tp_traverse and tp_clear should have >> returned "void". [Martin v. Lowis] > While this is true for tp_clear, tp_traverse is actually more general. 
> gc.get_referrers uses tp_traverse, for something other than collection. >> That would have made implementing those methods >> easier. Testing for errors in tp_traverse methods is silly since >> nothing returns an error, and, even if it did, the GC couldn't handle >> it. > Again, gc.get_referrers "uses" this feature. If extending the list > fails, traversal is aborted. Whether this is useful is questionable, > as the entire notion of "out of memory exception handling" is > questionable. The brand new gc.get_referents uses the return value of tp_traverse to abort on out-of-memory, but gc.get_referrers uses it for a different purpose (its traversal function returns true if the visited object is in the tuple of objects passed in, else returns false). The internal gc.get_referrers_for is what aborts on out-of-memory in the get_referrers subsystem. tp_traverse is fine as-is. The return value of tp_clear does indeed appear without plausible use. >> If we agree that, I volunteer to go through the code and remove the >> useless tests for errors in the tp_traverse methods. That would make get_referents press on after memory is exhausted. It would also change the semantics of get_referrers, in a subtle way (if object A has 25 references to object B, gc.get_referrers(B) contains only 1 instance of A today, but would contain 25 instances of A if tp_traverse methods ignored visit() return values). truth-isn't-necessarily-an-error-ly y'rs - tim From Jack.Jansen@oratrix.com Wed Apr 9 22:33:14 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Wed, 9 Apr 2003 23:33:14 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409144037.GL1756@tummy.com> Message-ID: <DF2120F8-6AD2-11D7-846E-000A27B19B96@oratrix.com> On woensdag, apr 9, 2003, at 16:40 Europe/Amsterdam, Sean Reifschneider wrote: > On Thu, Apr 10, 2003 at 12:24:45AM +1000, Anthony Baxter wrote: >> Ick ick. This is putting a bunch of code for a stub resolver into >> python. 
>> This stuff is hard to get right - I implemented this on top of pydns, >> and >> it was a lot of work to get (what I think is) correct, for not very >> much >> gain. > > Well, ideally you'd cache the data for as long as the SOA says to cache > it. However, it sounds like in the situation that started this thread, > even caching that data for some small but configurable number of > seconds > might help out. I wouldn't touch caching with a ten foot pole here: Python cannot know what happens under the hood of the network. For example, if I move my WiFi-equipped laptop from one location to another I don't want to be forced to restart my Python applications just to clear some silly cache, knowing that the OS and libc layers have handled the switch fine. (And, yes, Windoze-users are probably required to reboot anyway, but my Mac handles changing IP addresses just nicely:-) -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From nas@python.ca Wed Apr 9 22:41:04 2003 From: nas@python.ca (Neil Schemenauer) Date: Wed, 9 Apr 2003 14:41:04 -0700 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E948215.8050504@v.loewis.de> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> <3E948215.8050504@v.loewis.de> Message-ID: <20030409214104.GA20544@glacier.arctrix.com> "Martin v. Löwis" wrote: > Neil Schemenauer wrote: > >In retrospect, I think both tp_traverse and tp_clear should have > >returned "void". > > While this is true for tp_clear, tp_traverse is actually more general. > gc.get_referrers uses tp_traverse, for something other than collection. Could the visit procedure keep track of errors?
Something like: struct result { int error; /* true if an error occured while traversing */ /* other results */ } static void myvisit(PyObject* obj, struct result *r) { if (!r->error) { <do stuff, set r->error of error occurs> } } From martin@v.loewis.de Wed Apr 9 22:47:52 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Apr 2003 23:47:52 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030409214104.GA20544@glacier.arctrix.com> References: <1049916827.4961.64.camel@slothrop.zope.com> <20030409194810.GA27070@mems-exchange.org> <3E948215.8050504@v.loewis.de> <20030409214104.GA20544@glacier.arctrix.com> Message-ID: <3E949508.1030902@v.loewis.de> Neil Schemenauer wrote: > Could the visit procedure keep track of errors? No. For get_referrers (as Tim explains), it might be acceptable but less efficient (since traversal should stop when a the object is found to be a referrer). For get_referents, an error in the callback should really abort traversal as the system just went out of memory. Regards, Martin From db3l@fitlinxx.com Wed Apr 9 23:11:10 2003 From: db3l@fitlinxx.com (David Bolen) Date: 09 Apr 2003 18:11:10 -0400 Subject: [Python-Dev] Re: _socket efficiencies ideas References: <3E946B52.7090708@v.loewis.de> <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> <20030409193122.GA20230@glacier.arctrix.com> Message-ID: <u65pniao1.fsf@fitlinxx.com> Neil Schemenauer <nas@python.ca> writes: > Marcus Mendenhall wrote: > > Even though cpu time is cheap, we should save it for useful work. > > Saving a few cycles while having the complicate the interface is not the > Python way. +1 on restoring the old sscanf code (or something similar > to it). For what it's worth, whenever I had network code that I wanted to accept names or addresses, I always distinguished them through an attempt using the platform inet_addr() system call. 
If that returns an error (-1), then I go ahead and process it as a name, otherwise I use the address it returns. inet_addr() will itself take care of validating that the address is legal (e.g., no octet over 255 and only up to 4 octets), padding values as necessary (e.g., x.y.z is processed as if z was a 16-bit value, x.z as if z was a 24-bit value, x as a 32-bit value), and permits decimal, octal or hexadecimal forms of the individual octets. I believe this behavior is portable and well defined. If you wanted the same code to work for IPv4 and IPv6, you'd probably want to use inet_pton() instead since inet_addr() only does IPv4, although that would lose the hex/octal options. You'd probably have to conditionalize that anyway since it might not be available on IPv4 only configurations, so I could see using inet_addr() for IPv4 and inet_pton() for IPv6. > ObTrivia: IP addresses can be written as a single number (at least for > many IP implementations). Try "ping 2130706433". That's part of the inet_addr() definition. When a single value is given as the string, it is assumed to be the complete 32-bit address value, and is stored directly without any byte rearrangement. So, 2130706433 is (127*2^24) + 1, or "127.0.0.1" - but then obviously you knew that :-) -- David From greg@cosc.canterbury.ac.nz Thu Apr 10 01:31:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 12:31:34 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091424.h39EOje08304@localhost.localdomain> Message-ID: <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> Anthony Baxter <anthony@interlink.com.au>: > The idea of either suppressing DNS lookups for all-numeric addresses, or > some sort of extended API for suppressing DNS lookups might be better, > but really, isn't this the job of the stub resolver? Seems to me the basic problem is that we're representing two completely different things -- a DNS name and a raw IP address -- the same way, i.e.
as a string. A raw IP address should (at least optionally) be represented by something different, such as a tuple of ints. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Thu Apr 10 01:37:58 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 09 Apr 2003 20:37:58 -0400 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: "Your message of Thu, 10 Apr 2003 12:31:34 +1200." <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> References: <200304100031.h3A0VYV24951@oma.cosc.canterbury.ac.nz> Message-ID: <200304100037.h3A0bwt01972@pcp02138704pcs.reston01.va.comcast.net> > Seems to me the basic problem is that we're representing > two completely different things -- a DNS name and a raw > IP address -- the same way, i.e. as a string. > > A raw IP address should (at least optionally) be represented > by something different, such as a tuple of ints. Why? There's never any ambiguity about which kind is intended. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Thu Apr 10 02:10:44 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 13:10:44 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304091848.h39IlpW31935@odiug.zope.com> Message-ID: <200304100110.h3A1Aij25025@oma.cosc.canterbury.ac.nz> Guido van Rossum <guido@python.org>: > AFAIK it's not possible to put something in the DNS so that an > all-numeric address gets remapped In that case, there's no problem at all, and I withdraw my suggestion about using tuples for numeric addresses.
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Apr 10 02:15:05 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 13:15:05 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> Message-ID: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Marcus Mendenhall <marcus.h.mendenhall@vanderbilt.edu>: > Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) > {whatever} By the same token, checking whether the first char is a digit ought to weed out about 99.999% of all non-numeric domain name addresses. If this is even a problem, which I doubt. We're talking about something called from Python, for goodness sake... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From andrew@acooke.org Thu Apr 10 02:27:35 2003 From: andrew@acooke.org (andrew cooke) Date: Wed, 9 Apr 2003 21:27:35 -0400 (CLT) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> References: <750D46CE-6ABF-11D7-87F7-003065A81A70@vanderbilt.edu> <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Message-ID: <40894.127.0.0.1.1049938055.squirrel@127.0.0.1> this is a fragment from RFC 1034 (DOMAIN NAMES - CONCEPTS AND FACILITIES) http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1034.html i'm not 100% sure that this is the "normative" definition, but if it is then it clearly requires a non-numeric initial character for each label. 
(sorry if someone has already mentioned this!) andrew 3.5 Preferred name syntax The DNS specifications attempt to be as general as possible in the rules for constructing domain names. The idea is that the name of any existing object can be expressed as a domain name with minimal changes. However, when assigning a domain name for an object, the prudent user will select a name which satisfies both the rules of the domain system and any existing rules for the object, whether these rules are published or implied by existing programs. For example, when naming a mail domain, the user should satisfy both the rules of this memo and those in RFC-822. When creating a new host name, the old rules for HOSTS.TXT should be followed. This avoids problems when old software is converted to use domain names. The following syntax will result in fewer problems with many applications that use domain names (e.g., mail, TELNET).

<domain> ::= <subdomain> | " "
<subdomain> ::= <label> | <subdomain> "." <label>
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9

Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical. The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.
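The strict <label> rule in the grammar above is easy to transcribe into a quick check. The sketch below is my own illustration, not anything from the thread; and as later replies in this thread point out, real registrations such as 3m.com violate the leading-letter rule, so this captures the RFC's preferred syntax rather than current practice.

```python
import re

# RFC 1034 <label>: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits or hyphens, 63 chars max.
LABEL = re.compile(r"^[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_rfc1034_label(label):
    return len(label) <= 63 and LABEL.match(label) is not None

assert is_rfc1034_label("python")
assert not is_rfc1034_label("3m")      # leading digit fails the strict grammar
assert not is_rfc1034_label("foo-")    # trailing hyphen fails too
```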
-- http://www.acooke.org/andrew From tim.one@comcast.net Thu Apr 10 03:29:21 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 22:29:21 -0400 Subject: [Python-Dev] Re: [Python-checkins]python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBHEFAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCMEOEECAB.tim.one@comcast.net> [Greg Ewing] >> Failing that, perhaps they should be made a bit less dynamic, so that >> the GC can make reasonable assumptions about their existence without >> having to execute Python code. [Tim] > Guido already did so for new-style classes in Python 2.3. That machinery > doesn't exist in 2.2.2, and old-style classes remain a problem under 2.3 > too. Backward compatibility constrains how much we can get away with, of > course. FYI, those who study the checkin comments know how this ended. It ended well! gc no longer does anything except string-keyed dict lookups when determining whether a finalizer exists, for old- & new- style classes, and in 2.3 CVS & the 2.2 maintenance branch. The only incompatibilities appear to be genuine bug fixes. The hasattr() method was actually incorrect in two mondo obscure cases (one where hasattr said "yes, __del__ exists" when a finalizer couldn't actually be run, and the other where hasattr said "no, __del__ doesn't exist" when arbitrary Python code actually could be invoked by destructing an object). A new private API function _PyInstance_Lookup was added in 2.2 and 2.3, which does for old-style class instances what _PyType_Lookup does for new-style classes (determines whether an attribute exists via pure C string-keyed dict lookups). 
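For readers on current Python versions, here is a small sketch of the kind of mismatch described above: an attribute check that goes through hasattr() can disagree with a pure dict lookup. The class is invented for illustration (gc's actual fix used C-level dict lookups, not Python code).

```python
class Weird:
    # Invented example: fabricates a __del__ that no dict lookup can see.
    def __getattr__(self, name):
        if name == "__del__":
            return lambda: None   # arbitrary Python code runs here
        raise AttributeError(name)

w = Weird()
assert hasattr(w, "__del__")               # hasattr() triggers __getattr__
assert "__del__" not in type(w).__dict__   # a dict lookup sees no finalizer
```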
From gisle@ActiveState.com Thu Apr 10 04:45:52 2003 From: gisle@ActiveState.com (Gisle Aas) Date: 09 Apr 2003 20:45:52 -0700 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> References: <200304100115.h3A1F5425035@oma.cosc.canterbury.ac.nz> Message-ID: <lrvfxnm2vj.fsf@caliper.activestate.com> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > Marcus Mendenhall <marcus.h.mendenhall@vanderbilt.edu>: > > > Just: if (string[0]=='<' && not strncmp(string,"<numeric>",9)) > > {whatever} > > By the same token, checking whether the first char is > a digit ought to weed out about 99.999% of all > non-numeric domain name addresses. 3m.com is a registered domain name. Regards, Gisle Aas, ActiveState From huey_jiang@yahoo.com Thu Apr 10 05:57:03 2003 From: huey_jiang@yahoo.com (Huey Jiang) Date: Wed, 9 Apr 2003 21:57:03 -0700 (PDT) Subject: [Python-Dev] Unicode Message-ID: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Hi There, I wonder how can I get python to support Chinese language? I noticed python has Unicode feature in version 2.2.2, but as I tried: >>> str = " a_char_in_chinese_lan" I encountered UnicodeError. How can I make this to work? Thanks! Huey __________________________________________________ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://tax.yahoo.com From Anthony Baxter <anthony@interlink.com.au> Thu Apr 10 05:58:20 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Thu, 10 Apr 2003 14:58:20 +1000 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <lrvfxnm2vj.fsf@caliper.activestate.com> Message-ID: <200304100458.h3A4wK816653@localhost.localdomain> >>> Gisle Aas wrote > Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > By the same token, checking whether the first char is > > a digit ought to weed out about 99.999% of all > > non-numeric domain name addresses. > > 3m.com is a registered domain name. 
As is 3com.com, and, for a more python-related example, 4suite.org. The latter also has an A record. 411.com and 911.com are both valid domains, as is 123.com. With the appropriate resolv.conf search path (ie including '.com'), you could enter '123' and expect to get back the address 64.186.10.158. Isn't the DNS fun. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From tim_one@email.msn.com Thu Apr 10 06:03:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 10 Apr 2003 01:03:42 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <3E949508.1030902@v.loewis.de> Message-ID: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> [Neil Schemenauer] >> Could the visit procedure keep track of errors? [Martin v. Löwis] > No. For get_referrers (as Tim explains), it might be acceptable but > less efficient (since traversal should stop when a the object is found > to be a referrer). For get_referents, an error in the callback should > really abort traversal as the system just went out of memory. Still, I expect both could be handled by setjmp in the gc module get_ref* driver functions and longjmp (as needed) in the gc module visitor functions. IOW, the tp_traverse slot functions don't really need to cooperate, or even know anything about "early returns". Why this may be more than just idly interesting: the tp_traverse functions are called a lot by gc. The get_ref* functions are never called except when explicitly asked for, and their speed just doesn't matter. Burdening them with funky control flow would be a real win if eliminating almost-always-useless test/branch constructs in often-called tp_traverse slots sped the latter. 
From drifty@alum.berkeley.edu Thu Apr 10 06:28:09 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Wed, 9 Apr 2003 22:28:09 -0700 (PDT) Subject: [Python-Dev] Unicode In-Reply-To: <20030410045703.14754.qmail@web20007.mail.yahoo.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Message-ID: <Pine.SOL.4.53.0304092226280.29205@death.OCF.Berkeley.EDU> [Huey Jiang] > Hi There, > > > I wonder how can I get python to support Chinese > language? This is the wrong place to ask this question. python-dev is meant to discuss the development of Python. Try emailing your question to python-list@python.org; you should be able to get some help there. -Brett From martin@v.loewis.de Thu Apr 10 06:30:19 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 10 Apr 2003 07:30:19 +0200 Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <20030409203319.GS1756@tummy.com> References: <D4B8AB0D-69CF-11D7-A8D4-003065A81A70@vanderbilt.edu> <200304081450.h38EoqE20178@odiug.zope.com> <20030409124848.GB15649@tummy.com> <3E946821.6010208@v.loewis.de> <16020.27171.834878.631470@montanaro.dyndns.org> <20030409203319.GS1756@tummy.com> Message-ID: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Sean Reifschneider <jafo@tummy.com> writes: > I don't think anyone sane would create a top-level that's digits, > particularly in the range of 0 to 255. That probably means that > somebody is going to do it... ;-/ Indeed, Anthony brought the example of 911.com, which has been registered despite being illegal. Regards, Martin From martin@v.loewis.de Thu Apr 10 06:33:22 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 10 Apr 2003 07:33:22 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEIOEGAB.tim_one@email.msn.com> Message-ID: <m3brzevrvh.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > Still, I expect both could be handled by setjmp in the gc module get_ref* > driver functions and longjmp (as needed) in the gc module visitor functions. > IOW, the tp_traverse slot functions don't really need to cooperate, or even > know anything about "early returns". That would require that tp_traverse does not modify any refcount while iterating, right? It seems unpythonish to use setjmp/longjmp for exceptions. Regards, Martin From martin@v.loewis.de Thu Apr 10 06:37:39 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 10 Apr 2003 07:37:39 +0200 Subject: [Python-Dev] Unicode In-Reply-To: <20030410045703.14754.qmail@web20007.mail.yahoo.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> Message-ID: <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> Huey Jiang <huey_jiang@yahoo.com> writes: > I wonder how can I get python to support Chinese > language? I noticed python has Unicode feature in > version 2.2.2, but as I tried: > > >>> str = " a_char_in_chinese_lan" > > I encountered UnicodeError. How can I make this to > work? Hi Huey, This is a mailing list for the development *of* Python, not for questions about development *with* Python or for asking for help. In the specific example, you should do some more research on your own. For example, does it matter whether you use IDLE or the command line Python? Does it matter whether you use Unix or Windows? Does it matter whether you put the string into a source file or enter it in interactive mode?
[quick answer: all these things matter; in the cases where it doesn't work as you expect, causes vary widely] Regards, Martin From greg@cosc.canterbury.ac.nz Thu Apr 10 06:49:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Apr 2003 17:49:19 +1200 (NZST) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304100549.h3A5nJp26297@oma.cosc.canterbury.ac.nz> > Indeed, Anthony brought the example of 911.com, which has been > registered despite being illegal. At least 911 is greater than 255, which unfortunately isn't the case for 123. But all these would be caught by requiring a full 4-number address before deciding it's numeric. (I don't think it's worth allowing for 0-padding if there are fewer than 4 numbers.) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From egg@ign.com Thu Apr 10 12:38:49 2003 From: egg@ign.com (Ponce Dubuque) Date: Thu, 10 Apr 2003 04:38:49 -0700 Subject: [Python-Dev] Unicode References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> Message-ID: <017b01c2ff55$c156a7e0$02fea8c0@HP> Whatever the rules prescribe, poor Mr Jiang has nonetheless done 'development' in Python. Perhaps you ought to consider re-naming the list. I am sure somewhere, someone has mislabeled a link saying that this is where one posts, when one does development in Python. However, I do not wish for this suggestion to be the source of some great controversy. Everyone knows that getting trapped in such trifles is the reason why open source most often gets nowhere.
From aahz@pythoncraft.com Thu Apr 10 13:22:43 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 10 Apr 2003 08:22:43 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <017b01c2ff55$c156a7e0$02fea8c0@HP> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> Message-ID: <20030410122243.GA17289@panix.com> On Thu, Apr 10, 2003, Ponce Dubuque wrote: > > Whatever the rules prescribe, poor Mr Jiang has nonetheless done > 'development' in Python. Perhaps you ought to consider re-naming the > list. I am sure somewhere, someone has mislabeled a link saying that > this is where one posts, when one does development in Python. What name would we pick to clearly indicate this? I am starting to think that a better idea would be to make python-dev a closed list (only subscribers may post), and have the subscription process include a challenge/response with a code word embedded in the list rules. If this capability isn't already in Mailman, I can think of several mailing lists that could use this capability. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From zooko@zooko.com Thu Apr 10 13:51:13 2003 From: zooko@zooko.com (Zooko) Date: Thu, 10 Apr 2003 08:51:13 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: Message from Aahz <aahz@pythoncraft.com> of "Thu, 10 Apr 2003 08:22:43 EDT."
<20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <E193bWL-0000Ij-00@localhost> Aahz wrote: > > I am starting to > think that a better idea would be to make python-dev a closed list (only > subscribers may post), and have the subscription process include a > challenge/response with a code word embedded in the list rules. This is how we run p2p-hackers [1] with Mailman and it works quite well to quell off-topic posts without, as far as I can tell, deterring any valuable posts. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links [1] http://zgp.org/mailman/listinfo/p2p-hackers From barry@python.org Thu Apr 10 14:12:07 2003 From: barry@python.org (Barry Warsaw) Date: 10 Apr 2003 09:12:07 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <1049980327.28969.7.camel@anthem> On Thu, 2003-04-10 at 08:22, Aahz wrote: > What name would we pick to clearly indicate this? I am starting to > think that a better idea would be to make python-dev a closed list (only > subscribers may post), and have the subscription process include a > challenge/response with a code word embedded in the list rules. > > If this capability isn't already in Mailman, I can think of several > mailing lists that could use this capability. It's nearly there. You could set up an autoreply with the list guidelines and send that on the first post. What isn't there is a challenge/response subscription auto-enable, although I plan on adding something like this for Mailman 2.2. I'd rather not discuss this further on this list though. 
FWIW, python-dev /was/ a closed list at one point, with subscriptions requiring admin approval. At some point we didn't feel the overhead was worth it so we "quietly" changed the policy to allow mail-back confirmation subscriptions. I don't think we need to change things personally. IMO, we're already on the verge of spending more time discussing list policy than in simply handling the odd off-topic post <wink>. -Barry From vladimir.marangozov@optimay.com Thu Apr 10 14:40:44 2003 From: vladimir.marangozov@optimay.com (Marangozov, Vladimir (Vladimir)) Date: Thu, 10 Apr 2003 09:40:44 -0400 Subject: [Python-Dev] Re: _socket efficiency ideas Message-ID: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> Hi, About the DNS discussion, I'll chime in with some info. (I don't know what Python does about this and have no time to figure it out). The format of an Internet (IPv4) address is:

a.b.c.d - with all parts treated as 8 bits
a.b.c - with 'c' treated as 16 bits
a.b - with 'b' treated as 24 bits
a - with 'a' treated as 32 bits

You can try this out with ping 127.1; ping 127, etc. Any decent DNS resolver first tries to figure out whether the requested name string is an IP address. If it is, it doesn't send a query and immediately returns the numeric value of the string representation of the IP address. How a DNS resolver detects whether it should launch a query for the name 'name' varies from resolver to resolver, but basically, it does the following:

1. check for local resolution of 'name' (ex. if 'name' == 'localhost', return 127.0.0.1)
2. if inet_aton('name') succeeds, 'name' is an IP address; return the result from inet_aton.
3. if caching is enabled, check the cache for 'name'

If 1, 2 and 3 don't hold, send a DNS query. Caching is a separate/complementary issue and I agree that it should be left to the underlying resolver.
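The shorthand forms listed above can be seen from Python, since socket.inet_aton() is a thin wrapper over the C routine. This sketch is my own illustration and assumes a platform with a real inet_aton (e.g. glibc); all four spellings below name 127.0.0.1.

```python
import socket
import struct

# Every shorthand accepted by inet_aton() packs to the same 4 bytes:
# the final part is widened to fill the remaining bits of the address.
for form in ("127.0.0.1", "127.0.1", "127.1", "2130706433"):
    packed = socket.inet_aton(form)
    assert struct.unpack("!I", packed)[0] == 0x7F000001, form
```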
Cheers, Vladimir From paul-python@svensson.org Thu Apr 10 14:46:25 2003 From: paul-python@svensson.org (Paul Svensson) Date: Thu, 10 Apr 2003 09:46:25 -0400 (EDT) Subject: [Python-Dev] _socket efficiencies ideas In-Reply-To: <m3fzoqvs0k.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030410094133.W71996-100000@familjen.svensson.org> On 10 Apr 2003, Martin v. Löwis wrote: >Sean Reifschneider <jafo@tummy.com> writes: > >> I don't think anyone sane would create a top-level that's digits, >> particularly in the range of 0 to 255. That probably means that >> somebody is going to do it... ;-/ > >Indeed, Anthony brought the example of 911.com, which has been >registered despite being illegal. Most of the alternate roots also carry the top-level domains .800 and .411. /Paul From guido@python.org Thu Apr 10 14:47:10 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 09:47:10 -0400 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: Your message of "Thu, 10 Apr 2003 08:51:13 EDT." <E193bWL-0000Ij-00@localhost> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> <E193bWL-0000Ij-00@localhost> Message-ID: <200304101347.h3ADlH603332@odiug.zope.com> Isn't it easier to just ignore the occasional off-topic post rather than trying to invent elaborate technological solutions to deal with what is essentially a social problem? I don't think there's much of a misunderstanding in the world about what python-dev is; it's probably more that some people want to get answers from the "smart crowd", which is known to hang out here. If we simply ignore inappropriate posts, or send polite redirections, we're doing the best we can. I'm for avoiding how-to discussions here. I'm against trying to keep people out of this list for any other reason than insistent obnoxiousness. Python-dev needs to be open.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Apr 10 19:09:57 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 14:09:57 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3brzevrvh.fsf@mira.informatik.hu-berlin.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> [Tim] > Still, I expect both could be handled by setjmp in the gc > module get_ref* driver functions and longjmp (as needed) in the > gc module visitor functions. IOW, the tp_traverse slot functions don't > really need to cooperate, or even know anything about "early returns". [martin@v.loewis.de] > That would require that tp_traverse does not modify any refcount while > iterating, right? Or do anything else that relies on calls to visit() returning. I've looked at every traverse slot in the core, and there's no problem with those. I don't think that's an accident -- the only purpose of an object's tp_traverse is to invoke the visit callback on the non-NULL PyObject* pointers the object has. So, e.g., there isn't an incref or decref in any of 'em now; at worst there's an int loop counter. > It seems unpythonish to use setjmp/longjmp for exceptions. I'm not suggesting adding setjmp/longjmp to the Python language <0.9 wink>. I'm suggesting using them for two specific and obscure gc module callbacks that aren't normally used (*most* of the gc module callbacks wouldn't use setjmp/longjmp); in return, mounds of frequently executed code like

static int
func_traverse(PyFunctionObject *f, visitproc visit, void *arg)
{
    int err;

    if (f->func_code) {
        err = visit(f->func_code, arg);
        if (err)
            return err;
    }
    if (f->func_globals) {
        err = visit(f->func_globals, arg);
        if (err)
            return err;
    }
    if (f->func_module) {
        err = visit(f->func_module, arg);
        if (err)
            return err;
    }
    if (f->func_defaults) {
        err = visit(f->func_defaults, arg);
        if (err)
            return err;
    }
    if (f->func_doc) {
        err = visit(f->func_doc, arg);
        if (err)
            return err;
    }
    ...
    return 0;
}

could become the simpler and faster

static int
func_traverse(PyFunctionObject *f, visitproc visit, void *arg)
{
    if (f->func_code) visit(f->func_code, arg);
    if (f->func_globals) visit(f->func_globals, arg);
    if (f->func_module) visit(f->func_module, arg);
    if (f->func_defaults) visit(f->func_defaults, arg);
    if (f->func_doc) visit(f->func_doc, arg);
    ...
    return 0;
}

(I kept the final return 0 so that the signature wouldn't change.) From nati@ai.mit.edu Thu Apr 10 19:55:43 2003 From: nati@ai.mit.edu (Nathan Srebro) Date: Thu, 10 Apr 2003 14:55:43 -0400 Subject: [Python-Dev] Super and properties In-Reply-To: <001401c2f926$1d32d7e0$a8130dd5@violante> References: <001401c2f926$1d32d7e0$a8130dd5@violante> Message-ID: <3E95BE2F.8010900@ai.mit.edu> Gonçalo Rodrigues wrote: > My problem has to do with super that does not seem to work well with > properties. I encountered similar problems, and wrote a class, 'duper', which behaves like 'super', but handles attributes defined by arbitrary descriptors cooperatively. It is available from http://www.ai.mit.edu/~nati/Python/ Nati From mal@lemburg.com Thu Apr 10 20:09:07 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 10 Apr 2003 21:09:07 +0200 Subject: [Python-Dev] Unicode In-Reply-To: <017b01c2ff55$c156a7e0$02fea8c0@HP> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> Message-ID: <3E95C153.8040104@lemburg.com> Ponce Dubuque wrote: > Whatever the rules prescribe, poor Mr Jiang has nonetheless done 'development' > in Python. Perhaps you ought to consider re-naming the list. I am sure > somewhere, someone has mislabeled a link saying that this is where one > posts, when one does development in Python. Perhaps you could find these links and suggest fixing them ? Python-Dev has always been a Python developer mailing list where Python language development is discussed and managed.
There are many other lists out there which deal with development using Python. > However, I do not wish for this suggestion to be the source of some great > controversy. Everyone knows that getting trapped in such trifles is the > reason why open source most often gets nowhere. I believe we've gone a looong way with Python :-) (even though these discussions come up every now and then). W/r to the subject, I suggest starting the Unicode discovery tour with the Python PEP 100: http://www.python.org/peps/pep-0100.html It has a list of references near the bottom which you can use to bootstrap the quest. Have fun, -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 10 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 75 days left From jeremy@zope.com Thu Apr 10 20:13:12 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Apr 2003 15:13:12 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <1050001991.4473.103.camel@slothrop.zope.com> On Thu, 2003-04-10 at 14:09, Tim Peters wrote: > I'm not suggesting adding setjmp/longjmp to the Python language <0.9 wink>. > I'm suggesting using them for two specific and obscure gc module callbacks > that aren't normally used (*most* of the gc module callbacks wouldn't use > setjmp/longjmp); in return, mounds of frequently executed code like ... > could become the simpler and faster ... Sure sounds good to me. If traverse worked this way, the traverse and clear slots and a part of the dealloc slot become almost identical. They take all PyObject * members in the struct and perform some action on them if they are non-NULL. dealloc performs a DECREF.
clear performs a DECREF + assign NULL. traverse calls visit. It sure makes it easy to verify that each is implemented correctly. It would be cool if there were a way to automate some of the boilerplate. Jeremy From misa@redhat.com Thu Apr 10 20:29:20 2003 From: misa@redhat.com (Mihai Ibanescu) Date: Thu, 10 Apr 2003 15:29:20 -0400 (EDT) Subject: [Python-Dev] More socket questions Message-ID: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Hello, Since somebody mentioned inet_addr, here's something else that I can attempt to fix if we agree on it. In python 2.2.2:

socket.inet_aton("255.255.255.255")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
socket.error: illegal IP address string passed to inet_aton

Implementation:

static PyObject*
PySocket_inet_aton(PyObject *self, PyObject *args)
{
#ifndef INADDR_NONE
#define INADDR_NONE (-1)
#endif
    /* Have to use inet_addr() instead */
    char *ip_addr;
    unsigned long packed_addr;

    if (!PyArg_ParseTuple(args, "s:inet_aton", &ip_addr)) {
        return NULL;
    }
#ifdef USE_GUSI1
    packed_addr = inet_addr(ip_addr).s_addr;
#else
    packed_addr = inet_addr(ip_addr);
#endif
    if (packed_addr == INADDR_NONE) {    /* invalid address */
        PyErr_SetString(PySocket_Error,
            "illegal IP address string passed to inet_aton");

The reason for this behaviour can be found in the man page for inet_addr: The inet_addr() function converts the Internet host address cp from numbers-and-dots notation into binary data in network byte order. If the input is invalid, INADDR_NONE (usually -1) is returned. This is an obsolete interface to inet_aton, described immediately above; it is obsolete because -1 is a valid address (255.255.255.255), and inet_aton provides a cleaner way to indicate error return. I propose that we use inet_aton to implement PySocket_inet_aton (am I clever or what). The part that I don't know: how portable is this function? Does it exist on Mac and Windows?
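The INADDR_NONE ambiguity described above is easy to demonstrate from the Python side. This sketch is my own illustration; note that modern CPython did switch to inet_aton() where available, so the call succeeds today.

```python
import socket
import struct

# 255.255.255.255 packs to 0xFFFFFFFF -- the very bit pattern that
# inet_addr() reuses as its error value, so an inet_addr()-based
# wrapper cannot tell this legal broadcast address from a failure.
packed = socket.inet_aton("255.255.255.255")
assert len(packed) == 4
assert struct.unpack("!I", packed)[0] == 0xFFFFFFFF
```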
Thanks, Misa From shane.holloway@ieee.org Thu Apr 10 20:47:46 2003 From: shane.holloway@ieee.org (Shane Holloway (IEEE)) Date: Thu, 10 Apr 2003 13:47:46 -0600 Subject: [Python-Dev] Why is spawn*p* not available on Windows? Message-ID: <3E95CA62.4040904@ieee.org> Ok, so here's my story. I got curious as to why the various spawn*p* were not available on Windows. The conclusion I came to is that only "spawnv" and "spawnve" are exported by posixmodule.c, and os.py creates the other variants in terms of these functions. However, the "spawnvp" and "spawnvpe" python implementations are dependent upon the availability of "fork". So, after all that, I looked in the standard library header file process.h and found function prototypes for the various _spawn functions. Would it make sense to add support for "spawnvp" and "spawnvpe" to posixmodule.c? Should it be guarded by the existing HAVE_SPAWNV, new HAVE_SPAWNVP, or by MS_WINDOWS definitions? Or, has someone already tried this with lessons learned? Thanks, -Shane From guido@python.org Thu Apr 10 20:57:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 15:57:30 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: Your message of "Sat, 05 Apr 2003 14:35:31 EST." <20030405193531.GA23455@meson.dyndns.org> References: <20030405193531.GA23455@meson.dyndns.org> Message-ID: <200304101957.h3AJvUe04626@odiug.zope.com> > It occurred to me this afternoon (after answering a question about creating > file objects from file descriptors) that perhaps os.fdopen would be more > logically placed someplace else - of course it could also remain as > os.fdopen() for whatever deprecation period is warranted. > > Perhaps as a class method of the file type, file.fromfd()? > > Should I file a feature request for this on sf, or would it be considered > too much of a mindless twiddle to bother with? The latter.
If I had to do it over again, your suggestion would make sense; class methods are a good way to provide alternative constructors, and we're doing this e.g. for the new datetime class/module. But having this in the os module, which deals with such low-level file descriptors, still strikes me as a pretty decent place to put it as well, and I don't think it's worth the bother of updating documentation and so on. The social cost of deprecating a feature is pretty high. In general, I'm open to fixing design bugs if keeping the buggy design means forever having to explain a wart to new users, or forever having to debug bad code written because of a misunderstanding perpetuated by the buggy design (like int division). But in this case, I see no compelling reason; explaining how to do this isn't particularly easier or harder one way or the other. Responses to other messages in this thread: [Greg Ewing] > Not all OSes have the notion of a file descriptor, which is probably > why it's in the os module. Perhaps, but note that file objects have a method fileno(), which returns a file descriptor. Its implementation is not #ifdefed in any way -- the C stdio library requires fileno() to exist! Even if fdopen() really did need an #ifdef, it would be just as simple only to have the file.fdopen() class method when the C library defines fdopen() as it is to only have os.fdopen() under those conditions. [Oren Tirosh] > I don't see much point in moving it around just because the place > doesn't seem right but the fact that it's a function rather than a > method means that some things cannot be done in pure Python. > > I can create an uninitialized instance of a subclass of 'file' using > file.__new__(filesubclass) but the only way to open it is by name > using file.__init__(filesubclassinstance, 'filename'). A file > subclass cannot be opened from a file descriptor because fdopen > always returns a new instance of 'file'. 
> > If there was some way to open an uninitialized file object from a > file descriptor it would be possible, for example, to write a > version of popen that returns a subclass of file. It could add a > method for retrieving the exit code of the process, do something > interesting on __del__, etc. You have a point, but it's mostly theoretical: anything involving popen() should be done in C anyway, and this is no problem in C. > Here are some alternatives of where this could be implemented, > followed by what a Python implementation of os.fdopen would look > like: > > 1. New form of file.__new__ with more arguments: > > def fdopen(fd, mode='r', buffering=-1): > return file.__new__('(fdopen)', mode, buffering, fd) This violates the current invariant that __new__ doesn't initialize the file with a C-level FILE *. > 2. Optional argument to file.__init__: > > def fdopen(fd, mode='r', buffering=-1): > return file('(fdopen)', mode, buffering, fd) > > 3. Instance method (NOT a class method): > > def fdopen(fd, mode='r', buffering=-1): > f = file.__new__() > f.fdopen(fd, mode, buffering, '(fdopen)') > return f Hm, you seem to be implying that it should not be a class method because it should be possible to first create an uninitialized instance with __new__ (possibly of a subclass) and then initialize it separately. Perhaps. But since class methods also work for subclasses, I'm not sure I see the use case for this distinction. In any case I think this should wait until a future redesign of the stdio library, which will probably do some other refactoring (while staying compatible with the existing API). I've checked in some rough ideas in nondist/sandbox/sio/. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 10 21:21:32 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 16:21:32 -0400 Subject: [Python-Dev] Minor issue with PyErr_NormalizeException In-Reply-To: Your message of "Tue, 01 Apr 2003 13:41:54 PST." 
<Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> References: <Pine.BSF.4.50.0304011338520.42302-100000@wintermute.sponsor.net> Message-ID: <200304102021.h3AKLXl05134@odiug.zope.com> > We had a bug in one of our extension modules that caused a core dump in > PyErr_NormalizeException(). At the very top of the function (line 133) it > checks for a NULL type. I think it should have a "return" here so that > the code does not continue and thus dump core on line 153 when it calls > PyClass_Check(type). This should also make the comment not lie about > dumping core. ;) > > Just thought I'd pass it on.. Thanks! You're right, the comment is misleading and the call to PyErr_SetString() was bogus. Tim and Barry suggested to replace it with a call to Py_FatalError(), but I think that's wrong too: I found several places where PyErr_NormalizeException() is used and a few lines later a check is made whether the exception type is NULL, so I think ignoring this call is safer. I'll fix this in CVS, and backport to 2.2. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Thu Apr 10 21:26:05 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:26:05 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > could become the simpler and faster How much faster, and for what example? Beautiful is better than ugly. Regards, Martin From martin@v.loewis.de Thu Apr 10 21:30:43 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:30:43 +0200 Subject: [Python-Dev] OT: Signal/noise ratio In-Reply-To: <20030410122243.GA17289@panix.com> References: <20030410045703.14754.qmail@web20007.mail.yahoo.com> <m37ka2vroc.fsf@mira.informatik.hu-berlin.de> <017b01c2ff55$c156a7e0$02fea8c0@HP> <20030410122243.GA17289@panix.com> Message-ID: <m3istmhz7w.fsf@mira.informatik.hu-berlin.de> Aahz <aahz@pythoncraft.com> writes: > I am starting to think that a better idea would be to make > python-dev a closed list (only subscribers may post), and have the > subscription process include a challenge/response with a code word > embedded in the list rules. I agree with Guido that an occasional indication of the list's charter is not that annoying, and helps "silent" readers to focus their first posting on python-dev-related issues. Regards, Martin From jeremy@zope.com Thu Apr 10 21:31:27 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 10 Apr 2003 16:31:27 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> References: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050006687.20054.108.camel@slothrop.zope.com> On Thu, 2003-04-10 at 16:26, Martin v. Löwis wrote: > Tim Peters <tim.one@comcast.net> writes: > > > could become the simpler and faster > > How much faster, and for what example? Beautiful is better than ugly. Doesn't "beautiful is better than ugly" mean that a little ugliness in the gcmodule allows all the client code to be beautiful? Jeremy From martin@v.loewis.de Thu Apr 10 21:33:13 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:33:13 +0200 Subject: [Python-Dev] Re: _socket efficiency ideas In-Reply-To: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> References: <58C1D0B500F92D469E0D73994AB66D040107EC26@GE0005EXCUAG01.ags.agere.com> Message-ID: <m3el4ahz3q.fsf@mira.informatik.hu-berlin.de> "Marangozov, Vladimir (Vladimir)" <vladimir.marangozov@optimay.com> writes: > Any decent DNS resolver first tries to figure out whether > the requested name string is an IP address. So how come that the *very* recent netdb libraries do DNS lookups for "apparently numeric" addresses, whereas somewhat older libraries don't? Regards, Martin From zen@shangri-la.dropbear.id.au Thu Apr 10 21:35:07 2003 From: zen@shangri-la.dropbear.id.au (Stuart Bishop) Date: Fri, 11 Apr 2003 06:35:07 +1000 Subject: [Python-Dev] tzset In-Reply-To: <057832A9-5A91-11D7-8A30-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> On Thursday, March 20, 2003, at 04:01 PM, Stuart Bishop wrote: > I've submitted an update to SF: > http://www.python.org/sf/706707 > > This version should only build time.tzset if it accepts the TZ > environment > variable formats documented at: > http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html > So it shouldn't build under Windows. > > The last alternative would be to expose time.tzset if it exists at all, > and the test suite would simply check to make sure it doesn't raise > an exception. This would leave behaviour totally up to the OS, and the > corresponding lack of documentation in the Python library reference. The time.tzset patch is running fine. The outstanding issue is the test suite. I can happily run the existing tests on OS X, Redhat 7.2 and Solaris 2.8, but there are reports of odd behaviour that can only be attributed (as far as I can see) to broken time libraries. 
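[The behaviour the test suite exercises can be sketched as follows -- POSIX-only, and assuming a platform whose C library honours the standard TZ format, which is exactly the assumption in question:]

```python
import os
import time

# time.tzset() re-reads the TZ environment variable and updates
# time.timezone, time.tzname, etc.
os.environ["TZ"] = "UTC0"
time.tzset()
assert time.timezone == 0

os.environ["TZ"] = "EST5EDT"
time.tzset()
assert time.timezone == 5 * 3600   # seconds west of UTC
assert time.tzname[0] == "EST"
```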
Broken time libraries are fine - time.tzset() is at a basic level just a wrapper around the C library call and we can't take responsibility for the operating system's bad behavior. However, if the C library doesn't work as documented, we have no way of testing if the various time.* values are being updated correctly. I think these are the options: - Use the test suite as it stands at the moment, which may cause the test to fail on broken platforms. - Use the test suite as it stands at the moment, flagging this test as an expected failure on broken platforms. - Don't test - just make sure time.tzset() doesn't raise an exception or core dump. The code that populated time.tzname etc. has never had unit tests before, so its not like we are going backwards. This option means tzset(3) could be exposed on Windows (which I can't presently do, not having a Windows dev box available). - Make the checks for a sane tzset(3) in configure.in more paranoid, so time.tzset() is only built if your OS correctly parses the standard TZ environment variable format *and* can correctly do daylight savings time calculations in the southern hemisphere etc. -- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/ From martin@v.loewis.de Thu Apr 10 21:35:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:35:17 +0200 Subject: [Python-Dev] More socket questions In-Reply-To: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Message-ID: <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> Mihai Ibanescu <misa@redhat.com> writes: > I propose that we use inet_aton to implement PySocket_inet_aton (am I > clever or what). The part that I don't know, how portable is this > function? Does it exist on Mac and Windows? 
This is the tricky part of any such change: Nobody knows, and you have to test it on a wide variety of platforms before it is acceptable. That *at least* includes Windows, OS X, and one or two other flavours of Unix (Linux libc6 typically being one of them). Regards, Martin From martin@v.loewis.de Thu Apr 10 21:36:54 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 10 Apr 2003 22:36:54 +0200 Subject: [Python-Dev] Why is spawn*p* not available on Windows? In-Reply-To: <3E95CA62.4040904@ieee.org> References: <3E95CA62.4040904@ieee.org> Message-ID: <m365pmhyxl.fsf@mira.informatik.hu-berlin.de> "Shane Holloway (IEEE)" <shane.holloway@ieee.org> writes: > So, after all that, I looked in standard library header file for > process.h and found function prototypes for the various _spawn > functions. Would it make sense to add support for "spawnvp" and > "spawnvpe" to posixmodule.c? Should it be guarded by the existing > HAVE_SPAWNV, new HAVE_SPAWNVP, or by MS_WINDOWS definitions? Adding a HAVE_SPAWNVP would be most appropriate, IMO. Regards, Martin From tim.one@comcast.net Thu Apr 10 21:33:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 16:33:05 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> [Tim] >> could become the simpler and faster [martin@v.loewis.de] > How much faster, Won't know until it's tried. > and for what example? Code that spends significant time in tp_traverse, presumably. > Beautiful is better than ugly. Which is another reason <wink> it would be nice to get rid of the endlessly repeated masses of ugly if (err) return err; incantations out of the many tp_traverse slots, in return for putting a little bit of setjmp/longjmp ugliness in exactly four functions hiding in a single module.
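[At the Python level, the tp_traverse/visit protocol discussed here is what powers gc.get_referents(): every pointer a traverse slot hands to the visit callback comes back as an entry in the result. A quick sketch:]

```python
import gc

# gc.get_referents() invokes an object's tp_traverse slot and records
# each object passed to the visit callback.
a, b = object(), object()
container = [a, b]
refs = gc.get_referents(container)
assert a in refs and b in refs
```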
From guido@python.org Thu Apr 10 21:38:58 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 16:38:58 -0400 Subject: [Python-Dev] More socket questions In-Reply-To: Your message of "Thu, 10 Apr 2003 15:29:20 EDT." <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> Message-ID: <200304102039.h3AKd0c06207@odiug.zope.com> > Since somebody mention inet_addr, here's something else that I can attempt > to fix if we agree on it. > > In python 2.2.2: > > socket.inet_aton("255.255.255.255") > Traceback (most recent call last): > File "<stdin>", line 1, in ? > socket.error: illegal IP address string passed to inet_aton Check out Python 2.3, it's been fixed there. Unfortunately Windows only has inet_addr(), so it's still broken there. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 10 22:35:26 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 17:35:26 -0400 Subject: [Python-Dev] tzset In-Reply-To: Your message of "Fri, 11 Apr 2003 06:35:07 +1000." <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> References: <EAE04F94-6B93-11D7-8A32-000393B63DDC@shangri-la.dropbear.id.au> Message-ID: <200304102135.h3ALZQ613146@odiug.zope.com> > > I've submitted an update to SF: > > http://www.python.org/sf/706707 > > > > This version should only build time.tzset if it accepts the TZ > > environment > > variable formats documented at: > > http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html > > So it shouldn't build under Windows. > > > > The last alternative would be to expose time.tzset if it exists at all, > > and the test suite would simply check to make sure it doesn't raise > > an exception. This would leave behaviour totally up to the OS, and the > > corresponding lack of documentation in the Python library reference. > > The time.tzset patch is running fine. 
The outstanding issue is the > test suite. I can happily run the existing tests on OS X, Redhat 7.2 > and Solaris 2.8, but there are reports of odd behaviour that can > only be attributed (as far as I can see) to broken time libraries. The test passes for me on Red Hat 7.3. I tried it on Windows, and if I add "#define HAVE_WORKING_TZSET 1" to PC/pyconfig.h, timemodule.c compiles, but the tzset test fails with the error AssertionError: 69 != 1. This is on the line self.failUnlessEqual(time.daylight,1) That *could* be construed as a bug in the test, because the C library docs only promise that the daylight variable is nonzero. But if I fix that in the test by using bool(time.daylight), I get other failures, so I conclude that tzset() doesn't work the same way on Windows as the test expects. A simple solution would be to not provide tzset() on Windows. Time on Windows is managed sufficiently different that this might be okay. > Broken time libraries are fine - time.tzset() is at a basic level > just a wrapper around the C library call and we can't take > responsibility for the operating system's bad behavior. But is the observed behavior on Windows broken or not? I don't know. > However, if the C library doesn't work as documented, we have no way > of testing if the various time.* values are being updated correctly. Right. > I think these are the options: > - Use the test suite as it stands at the moment, which may cause the > test to fail on broken platforms. But we're not sure if the platform is broken or the test too stringent! > - Use the test suite as it stands at the moment, flagging this test > as an expected failure on broken platforms. Can't do that -- can flag only *skipped* tests as expected. > - Don't test - just make sure time.tzset() doesn't raise an > exception or core dump. The code that populated time.tzname > etc. has never had unit tests before, so its not like we are > going backwards. 
This option means tzset(3) could be exposed > on Windows (which I can't presently do, not having a Windows > dev box available). That would be acceptable to me. Since all we want is a wrapper around the C library tzset(), all we need to test for is that it does that. > - Make the checks for a sane tzset(3) in configure.in more > paranoid, so time.tzset() is only built if your OS correctly > parses the standard TZ environment variable format *and* can > correctly do daylight savings time calculations in the > southern hemisphere etc. Sounds like overprotective. I think that in those cases the tzset() function works fine, it's just the database of timezones that's different. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Thu Apr 10 22:04:55 2003 From: nas@python.ca (Neil Schemenauer) Date: Thu, 10 Apr 2003 14:04:55 -0700 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> References: <m3n0iyhzfm.fsf@mira.informatik.hu-berlin.de> <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> Message-ID: <20030410210455.GA22300@glacier.arctrix.com> Tim Peters wrote: > [martin@v.loewis.de] > > Beautiful is better than ugly. > > Whish is another reason <wink> it would be nice to get rid of the endlessly > repeated masses of ugly > > if (err) > return err; > > incantations out of the many tp_traverse slots, in return for putting a > little bit of setjmp/longjmp ugliness in exactly four functions hiding in a > single module. I agree that concentrating the ugliness is good. However, how portable is setjmp/longjmp? The manual page I have says C99. Can we rely on it being available? If not, could we just disable the gcmodule functions that depend on it? 
Neil From tim.one@comcast.net Thu Apr 10 23:12:41 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 18:12:41 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <20030410210455.GA22300@glacier.arctrix.com> Message-ID: <BIEJKCLHCIOIHAGOKOLHEEOKFEAA.tim.one@comcast.net> [Neil Schemenauer] > I agree that concentrating the ugliness is good. However, how portable > is setjmp/longjmp? The manual page I have says C99. It's also C89, i.e. "ANSI C". > Can we rely on it being available? I think so. Note that we have three modules that use them now, although they're not compiled everywhere (readline, pcre, fpectl). > If not, could we just disable the gcmodule functions that depend on it? Jeremy and I have spent a lot of time tracking down leaks (in Python and in Zope) recently, and get_refer{rers, ents} have been invaluable. If we found a platform where {set,long}jmp didn't work, I'd be OK with disabling those two gc functions on that platform. Those functions aren't needed for normal gc operation, and it's not any platform I'm going to be using anyway <wink>. From skip@pobox.com Thu Apr 10 22:28:55 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 10 Apr 2003 16:28:55 -0500 Subject: [Python-Dev] More socket questions In-Reply-To: <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> References: <Pine.LNX.4.44.0304101525210.20778-100000@coyote.devel.redhat.com> <m3adeyhz0a.fsf@mira.informatik.hu-berlin.de> Message-ID: <16021.57879.498864.472222@montanaro.dyndns.org> Martin> This is the tricky part of any such change: Nobody knows, and Martin> you have to test it on a wide variety of platforms before it is Martin> acceptable. That *atleast* includes Windows, OS X, and one or Martin> two other flavours of Unix (Linux libc6 typically being one of Martin> them). I can check Mac OS X off your list. 
Here's the start of the inet_aton man page:

    INET(3)               System Library Functions Manual               INET(3)

    NAME
         inet_aton, inet_addr, inet_network, inet_ntoa, inet_ntop, inet_pton,
         inet_makeaddr, inet_lnaof, inet_netof - Internet address manipulation
         routines
    ...

And here's the check from distutils:

    >>> import distutils.ccompiler
    >>> cc = distutils.ccompiler.new_compiler()
    >>> cc.has_function("inet_aton")
    True
    >>> cc.has_function("blecherous")
    ld: Undefined symbols: _blecherous
    False

(Note that has_function() isn't in cvs yet.) Skip From neal@metaslash.com Thu Apr 10 23:50:01 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 10 Apr 2003 18:50:01 -0400 Subject: [Python-Dev] backporting string changes to 2.2.3 Message-ID: <20030410225001.GN17847@epoch.metaslash.com> Just in case anybody missed it the first several times, there were several inconsistencies in the string methods/functions. This checkin should make everything consistent for 2.3. I'm planning to backport these string changes to 2.2.3. The reason is that methods on string objects already have the changes, only doc is being updated. The string module has the change for strip, but not lstrip/rstrip, and UserString doesn't have any.

    Modified Files:
        Doc/lib/libstring.tex: 1.49
        Lib/UserString.py: 1.17
        Lib/string.py: 1.68
        Lib/test/string_tests.py: 1.31
        Objects/stringobject.c: 2.208
        Objects/unicodeobject.c: 2.187

    Log Message:
    Attempt to make all the various string *strip methods the same.
    * Doc - add doc for when functions were added
    * UserString
    * string object methods
    * string module functions

'chars' is used for the last parameter everywhere. These changes will be backported, since part of the changes have already been made, but they were inconsistent.
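[The behaviour being made consistent -- the optional 'chars' argument accepted by all three strip variants:]

```python
s = "xxhelloxx"
# 'chars' names the set of characters to remove from the ends.
assert s.strip("x") == "hello"
assert s.lstrip("x") == "helloxx"
assert s.rstrip("x") == "xxhello"
# With no argument, whitespace is stripped.
assert "  hi  ".strip() == "hi"
```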
Neal From greg@cosc.canterbury.ac.nz Fri Apr 11 01:29:37 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:29:37 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEENJFEAA.tim.one@comcast.net> Message-ID: <200304110029.h3B0TbA09063@oma.cosc.canterbury.ac.nz> > I've looked at every traverse slot in the core, and there's no problem > with those. I don't think that's an accident -- the only purpose of > an object's tp_traverse is to invoke the visit callback on the > non-NULL PyObject* pointers the object has. So, e.g., there isn't an > incref or decref in any of 'em now; But what about the *visit function*? You need to take account of what it might do as well. And if it's ever used for something beside GC, it could do anything. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:32:20 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:32:20 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <1050001991.4473.103.camel@slothrop.zope.com> Message-ID: <200304110032.h3B0WKM09071@oma.cosc.canterbury.ac.nz> > If traverse worked this way, the traverse and clear slots and a part > of the dealloc slot become almost identical. ... It would be cool if > there were a way to automate some of the boilerplate. There is... use Pyrex. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:39:53 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:39:53 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304101957.h3AJvUe04626@odiug.zope.com> Message-ID: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> > If I had to do it over again, your suggestion would make sense; > > But having this in the os module, which deals with such low-level file > descriptors, still strikes me as a pretty decent place to put it as > well, and I don't think it's worth the bother of updating > documentation and so on. I can think of another reason for making it a class method: so that custom subclasses of file, or other file-like objects, can override it to create objects of the appropriate type. But since it is an os-dependent feature, the implementation of it probably does belong in the os module. So how about providing a file.fromfd() which calls os.fdopen()? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 11 01:46:40 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:46:40 +1200 (NZST) Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304101957.h3AJvUe04626@odiug.zope.com> Message-ID: <200304110046.h3B0kex09103@oma.cosc.canterbury.ac.nz> > but note that file objects have a method fileno(), which > returns a file descriptor. Its implementation is not #ifdefed in any > way -- the C stdio library requires fileno() to exist! Hmmm, I wasn't sure whether fileno() was a required part of stdio, or whether it only existed on unix-like systems. 
If it really is required, I guess it doesn't have to be in the os module. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Fri Apr 11 01:48:19 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 10 Apr 2003 20:48:19 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: "Your message of Fri, 11 Apr 2003 12:39:53 +1200." <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> References: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> Message-ID: <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> > > If I had to do it over again, your suggestion would make sense; > > > > But having this in the os module, which deals with such low-level > > file descriptors, still strikes me as a pretty decent place to put > > it as well, and I don't think it's worth the bother of updating > > documentation and so on. > > I can think of another reason for making it a class > method: so that custom subclasses of file, or other > file-like objects, can override it to create objects > of the appropriate type. Yeah, this was the gist of Oren's post (if I understood it correctly). > But since it is an os-dependent feature, the implementation > of it probably does belong in the os module. > > So how about providing a file.fromfd() which calls > os.fdopen()? I've never seen anyone code a file subclass yet, let alone one that needed this. YAGNI? 
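[A sketch of the classmethod-as-alternative-constructor pattern under discussion, using a hypothetical file-like wrapper (FDFile is invented for illustration; it is not the real 'file' type):]

```python
import os

class FDFile:
    """Hypothetical file-like wrapper, for illustration only."""

    def __init__(self, name):
        self._fd = os.open(name, os.O_RDONLY)

    @classmethod
    def fromfd(cls, fd):
        # Alternative constructor: because cls is whatever class the
        # method was called on, subclasses get instances of themselves.
        self = object.__new__(cls)
        self._fd = fd
        return self

    def read(self, n=4096):
        return os.read(self._fd, n)

class PipeFile(FDFile):
    pass

r, w = os.pipe()
os.write(w, b"data")
os.close(w)
f = PipeFile.fromfd(r)          # a PipeFile, not a base FDFile
assert isinstance(f, PipeFile)
assert f.read() == b"data"
```

This is the property a module-level os.fdopen() cannot provide: it always returns an instance of the base type.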
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri Apr 11 01:54:01 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 12:54:01 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <BIEJKCLHCIOIHAGOKOLHAEODFEAA.tim.one@comcast.net> Message-ID: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> > it would be nice to get rid of the endlessly repeated ... ugly > incantations out of the many tp_traverse slots, in return for putting > a little bit of setjmp/longjmp ugliness in exactly four functions > hiding in a single module. I'd be pretty nervous about having any longjmps anywhere near anything Python. If you do this, you'll have to make it very clear that tp_traverse implementations MUST NOT alter any Python ref counts, or rely in any other way on running to completion. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Apr 11 03:49:01 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 22:49:01 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110029.h3B0TbA09063@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCIEPOECAB.tim.one@comcast.net> [Greg Ewing] > But what about the *visit function*? You need to take > account of what it might do as well. And if it's ever > used for something beside GC, it could do anything. I don't see the relevance. The visit functions are where the longjmps would go, if a visit function felt like using one. Two visit functions in gcmodule.c would use them, the other visit functions in gcmodule.c would not. 
I don't know of any visit functions not in gcmodule.c (where they all have static scope), nor do I expect to see any outside of gcmodule.c -- visit functions are Python internals. tp_clear and tp_traverse functions must be supplied by extension authors who want their types to play with the gc system, but extension authors are never required (or even asked) to write a visit function. From tim.one@comcast.net Fri Apr 11 03:51:27 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 22:51:27 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCCEPPECAB.tim.one@comcast.net> [Greg Ewing] > I'd be pretty nervous about having any longjmps anywhere > near anything Python. Why? > If you do this, you'll have to make it very clear that > tp_traverse implementations MUST NOT alter any Python > ref counts, or rely in any other way on running to > completion. That's so. For reasons explained earlier, it would be quite surprising to see a tp_traverse function play with anything's refcount (their purpose is to pass an object's PyObject* pointers on to the callback argument, and that's all; manipulating refcounts during this wouldn't make sense). From tim.one@comcast.net Fri Apr 11 04:02:06 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:02:06 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304110046.h3B0kex09103@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCCEAAEDAB.tim.one@comcast.net> [Greg Ewing] > Hmmm, I wasn't sure whether fileno() was a required part of stdio, or > whether it only existed on unix-like systems. If it really is > required, I guess it doesn't have to be in the os module. It's not required by standard C -- standard C has only streams, not file descriptors. 
Nevertheless, POSIX requires them, and uses of fileno() in Python are unconditional (aren't conditionally compiled depending on config symbols), so they're on every platform Python links on today. From greg@cosc.canterbury.ac.nz Fri Apr 11 04:04:35 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Apr 2003 15:04:35 +1200 (NZST) Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEPPECAB.tim.one@comcast.net> Message-ID: <200304110304.h3B34ZZ12973@oma.cosc.canterbury.ac.nz> > it would be quite surprising to see a tp_traverse function play with > anything's refcount (their purpose is to pass an object's PyObject* > pointers on to the callback argument, and that's all A thought -- maybe tp_visit and tp_clear could be unified by having a tp_visit that passed pointers to pointers to objects to the callback? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Apr 11 04:41:03 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:41:03 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110304.h3B34ZZ12973@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> [Greg Ewing] > A thought -- maybe tp_visit and tp_clear could be unified > by having a tp_visit that passed pointers to pointers to > objects to the callback? I think Jeremy suggested something like that earlier today. I don't think it would fly now. 
tuples are the simplest example of a gc container object whose tp_clear and tp_traverse slot functions do radically different things (the tuple tp_clear is NULL!); type objects may be the most complex example (see the long comment block in typeobject.c's type_clear for an explanation of why only tp_mro is-- or needs to be --cleared). In general, tp_traverse needs to reveal every PyObject* that may be part of a cycle, but tp_clear only needs to nuke the subset of those necessary to guarantee that all cycles will be broken. OTOH, I suspect Guido thought too hard about this. Like the tp_clear comment: tp_dict: It is a dict, so the collector will call its tp_clear. If type_clear decrefed tp_dict, and the refcount fell to 0 thereby, the usual refcount mechanism would nuke the dict on its own, and the collector would *not* in fact call the dict's tp_clear slot (the dict object would get unlinked from the gc list it was in, and the collector would never see the dict again). So I'm unclear on what we're trying to optimize when a tp_clear nukes less than the corresponding tp_traverse visits. I suppose "code space" is one decent answer to that. From tim.one@comcast.net Fri Apr 11 04:52:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 10 Apr 2003 23:52:43 -0400 Subject: [Python-Dev] tzset In-Reply-To: <200304102135.h3ALZQ613146@odiug.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> [Guido] > The test passes for me on Red Hat 7.3. > > I tried it on Windows, and if I add "#define HAVE_WORKING_TZSET 1" to > PC/pyconfig.h, timemodule.c compiles, but the tzset test fails with > the error AssertionError: 69 != 1. This is on the line > > self.failUnlessEqual(time.daylight,1) > > That *could* be construed as a bug in the test, because the C library > docs only promise that the daylight variable is nonzero. That's all the MS docs promise too. You're actually getting ord("E"), the first letter in "EDT". 
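The 69 in the failing assertion is no mystery once spelled out; a quick sketch of the point, and of the portable check (truthiness rather than equality with 1):

```python
import time

# C only promises that the daylight variable is *nonzero* when DST
# rules apply; on the Windows build described above it happened to
# equal ord("E") == 69, the first byte of "EDT".
assert ord("E") == 69

# The portable spelling tests truthiness, not equality with 1:
has_dst = bool(time.daylight)
```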
> But if I fix that in the test by using bool(time.daylight), I get other > failures, so I conclude that tzset() doesn't work the same way on Windows as the > test expects. You can read the docs. It doesn't work on Windows the way anyone expects <0.5 wink>: http://tinyurl.com/9a2n > ... > But is the observed behavior on Windows broken or not? I don't know. It probably works as documented, but Real Windows Weenies use the native Win32 time zone functions. > ... > That would be acceptable to me. Since all we want is a wrapper around > the C library tzset(), all we need to test for is that it does that. It's not really what I want. When we expose highly platform-dependent functions, we create a lot of confusion along with them. Perhaps that's because we're not always careful to emphasize that the behavior is a cross-platform crapshoot, and users are rarely careful to heed such warnings. From martin@v.loewis.de Fri Apr 11 06:08:56 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 11 Apr 2003 07:08:56 +0200 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> Message-ID: <m3ptnty61j.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > So I'm unclear on what we're trying to optimize when a tp_clear nukes less > than the corresponding tp_traverse visits. I suppose "code space" is one > decent answer to that. In the case of type objects, it's not a matter of optimization but of correctness. If you were clearing all slots of a type object, you'd lose state that is still needed later on; see the comment for typeobject.c:2.150. 
Regards, Martin From boris.boutillier@arteris.net Fri Apr 11 09:25:21 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 11 Apr 2003 10:25:21 +0200 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: <20030410225001.GN17847@epoch.metaslash.com> References: <20030410225001.GN17847@epoch.metaslash.com> Message-ID: <1050049521.1751.16.camel@elevedelix> Hi everybody, As this is my first message on this development list I'll introduce myself: I am a hardware designer at a new French startup, Arteris, which is developing microelectronics IP cores. I'm responsible for development of EDA tools, i.e. software to design and validate hardware designs. For this purpose we've been developing an EDA design platform written entirely in Python (with Python-C parts for the core database) for about 14 months. This platform has been in active use for about three months and is working well. Now I'd like to give some help in developing Python, using my own experience to try to improve this great language. I'll start simple here (we've got other ideas, but I'll bring them up once there is some kind of first draft). On string objects there are find and rfind, and lstrip and rstrip, but there is no rsplit function. Is there a reason why not, or is it only because nobody has implemented it? (In that case I'll propose a patch in a few days.) I'm mainly using it for 'toto.titi.tata'.rsplit('.',1) -> 'toto.titi','tata', as our internal database representation is quite like a logical filesystem. -- Boris Boutillier - boris.boutillier@arteris.net On Fri, 2003-04-11 at 00:50, Neal Norwitz wrote: > > Just in case anybody missed it the first several times, > there were several inconsistencies in the string methods/functions. > This checkin should make everything consistent for 2.3. > > I'm planning to backport these string changes to 2.2.3. > The reason is that methods on string objects already have the > changes, only doc is being updated.
The string module has > the change for strip, but not lstrip/rstrip, and UserString > doesn't have any. > > Modified Files: > > Doc/lib/libstring.tex: 1.49 > Lib/UserString.py: 1.17 > Lib/string.py: 1.68 > Lib/test/string_tests.py: 1.31 > Objects/stringobject.c: 2.208 > Objects/unicodeobject.c: 2.187 > > Log Message: > > Attempt to make all the various string *strip methods the same. > * Doc - add doc for when functions were added > * UserString > * string object methods > * string module functions > 'chars' is used for the last parameter everywhere. > > These changes will be backported, since part of the changes > have already been made, but they were inconsistent. > > Neal > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From vladimir.marangozov@optimay.com Fri Apr 11 09:38:12 2003 From: vladimir.marangozov@optimay.com (Marangozov, Vladimir (Vladimir)) Date: Fri, 11 Apr 2003 04:38:12 -0400 Subject: [Python-Dev] Re: More socket questions Message-ID: <58C1D0B500F92D469E0D73994AB66D040107EC29@GE0005EXCUAG01.ags.agere.com> Hi, inet_aton() is a pretty simple parser of an IP address string, but it is not available on all setups. Libraries relying on it usually provide a local version. So do the same. Search the Web for "inet_aton.c" and you'll hit a standard implementation, with all the niceties about the base encoding of each part of the IP address which follows the C convention: 0x - hex, 0 - octal, other - decimal. And thus, BTW, "ping 192.30.20.10" is not the same as "ping 192.030.020.010". So take that code, stuff it in my_inet_aton() and case closed. You could use my_inet_aton() before calling gethostbyname('name') to see whether 'name' is an IP address and return immediately, but as I said, decent resolvers should do that for you. After all, their job is to give you an IP address in return. If you feed an IP address as an input, you should get it as a reply. 
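The base conventions Vladimir describes can be sketched in Python as well (my_inet_aton is the hypothetical local helper he names, simplified here to the four-part dotted form only):

```python
import struct

def my_inet_aton(s):
    """Parse a dotted IP string using the C base conventions:
    0x... is hex, a leading 0 is octal, anything else is decimal.
    Returns 4 packed bytes, or None if the string is not a valid
    address.  (A simplified sketch -- the real inet_aton() also
    accepts 1-, 2- and 3-part forms.)"""
    parts = s.split(".")
    if len(parts) != 4:
        return None
    octets = []
    for p in parts:
        try:
            if p[:2].lower() == "0x":
                v = int(p[2:], 16)
            elif p.startswith("0") and len(p) > 1:
                v = int(p, 8)
            else:
                v = int(p, 10)
        except ValueError:
            return None
        if not 0 <= v <= 255:
            return None
        octets.append(v)
    return struct.pack("4B", *octets)

# The octal pitfall from the post: these name different hosts.
assert my_inet_aton("192.30.20.10") != my_inet_aton("192.030.020.010")
```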
Not all resolvers are decent, though. On top of that, some have bugs :-). I can't answer the question about netdb's status quo. Cheers, Vladimir From mwh@python.net Fri Apr 11 13:03:56 2003 From: mwh@python.net (Michael Hudson) Date: Fri, 11 Apr 2003 13:03:56 +0100 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Fri, 11 Apr 2003 12:54:01 +1200 (NZST)") References: <200304110054.h3B0s1809124@oma.cosc.canterbury.ac.nz> Message-ID: <2mwui16y1f.fsf@starship.python.net> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > I'd be pretty nervous about having any longjmps anywhere > near anything Python. Too late, if you use readline and ever press ^C. Cheers, M. -- Presumably pronging in the wrong place zogs it. -- Aldabra Stoddart, ucam.chat From harri@labs.trema.com Fri Apr 11 13:16:47 2003 From: harri@labs.trema.com (Harri Pasanen) Date: Fri, 11 Apr 2003 14:16:47 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures Message-ID: <200304111416.47006.harri.pasanen@trema.com> Hello, In a few hours old CVS checkout, I'm having problems getting the embedded python to work. 
---------8<------------8<-------------8<-------------8<----------------- #include <Python.h> char* cmd = "import sys; print sys.path\n" "import re; print dir(re)\n"; int main() { Py_Initialize(); printf("Initialize done\n"); PyRun_SimpleString(cmd); Py_Finalize(); return 0; } ---------8<------------8<-------------8<-------------8<----------------- The import of re seems to succeed only halfway; the output is: Initialize done ['f:\\trema\\fk-dev\\tools\\python\\PCbuild\\python23.zip', 'f:\\trema\\fk-dev\\tools\\python\\DLLs', 'f:\\trema\\fk-dev\\tools\\python\\lib', 'f:\\trema\\fk-dev\\tools\\python\\lib\\plat-win', 'f:\\trema\\fk-dev\\tools\\python\\lib\\lib-tk', 'f:\\trema\\fk-dev\\tools\\python\\Demo\\embed', 'f:\\trema\\fk-dev\\tools\\python', 'f:\\trema\\fk-dev\\tools\\python\\lib\\site-packages'] ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] So the re namespace is lacking everything from sre. On Linux it works both embedded and from the interactive interpreter. On Win2K the interactive interpreter seems to work fine. On Win2K, I have this working ok using Python 2.2.2. What gives? Harri From guido@python.org Fri Apr 11 15:27:50 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:27:50 -0400 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: Your message of "11 Apr 2003 10:25:21 +0200." <1050049521.1751.16.camel@elevedelix> References: <20030410225001.GN17847@epoch.metaslash.com> <1050049521.1751.16.camel@elevedelix> Message-ID: <200304111428.h3BERvm14364@odiug.zope.com> > On string objects there are find and rfind, and lstrip and rstrip, but > there is no rsplit function. Is there a reason why not, or is it only > because nobody has implemented it? (In that case I'll propose a patch > in a few days.) I'm mainly using it for > 'toto.titi.tata'.rsplit('.',1) -> 'toto.titi','tata', as our internal > database representation is quite like a logical filesystem.
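For reference, the one-level right split requested above is easy to hand-code with rfind() (rsplit1 is a hypothetical helper, not a string method in 2.2/2.3):

```python
def rsplit1(s, sep):
    """Split s on the *last* occurrence of sep -- the special
    case asked for above, built from rfind()."""
    i = s.rfind(sep)
    if i < 0:
        return [s]
    return [s[:i], s[i + len(sep):]]

assert rsplit1("toto.titi.tata", ".") == ["toto.titi", "tata"]
```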
I think the reason is that there isn't enough need for it. The special case of s.rsplit(c, 1) can be coded so easily by using rfind() that I don't see the need to add it. Our Swiss Army Knife string type is beginning to be so loaded with features that I am reluctant to add more. The cost of a new feature these days is measured in the number of books that need to be updated, not the number of lines of code needed to implement it. For your amusement only (! :-), I offer this implementation of rsplit(), which works in Python 2.3: def rsplit(string, sep, count=-1): L = [part[::-1] for part in string[::-1].split(sep[::-1], count)] L.reverse() return L --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 11 15:50:36 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:50:36 -0400 Subject: [Python-Dev] tzset In-Reply-To: Your message of "Thu, 10 Apr 2003 23:52:43 EDT." <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEAEEDAB.tim.one@comcast.net> Message-ID: <200304111450.h3BEocd14466@odiug.zope.com> > > That would be acceptable to me. Since all we want is a wrapper > > around the C library tzset(), all we need to test for is that it > > does that. > > It's not really what I want. When we expose highly > platform-dependent functions, we create a lot of confusion along > with them. Perhaps that's because we're not always careful to > emphasize that the behavior is a cross-platform crapshoot, and users > are rarely careful to heed such warnings. I guess we shouldn't expose the Windows version of tzset() at all. The syntax it accepts and the rules it applies (always following US DST rules) make it pretty useless. OTOH I think tzset() is useful on most Unix and Linux platforms, and there's no easy alternative (short of wrapping the tz library, which would be a huge task), so there we should expose it. I believe this means that Stuart's patch can be checked in as is. 
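The Unix-side behavior being wrapped (which became time.tzset() in 2.3) looks roughly like this; the hasattr guard is the point, since the function only exists where the platform tzset(3) is usable:

```python
import os
import time

# Guarded sketch: time.tzset() is absent on platforms (notably
# Windows, per the discussion above) whose C tzset() is unusable.
if hasattr(time, "tzset"):
    os.environ["TZ"] = "UTC"
    time.tzset()
    assert time.timezone == 0   # UTC is zero seconds west of UTC
    assert time.daylight == 0   # and has no DST rules
```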
We can tweak it based on reports during the beta cycle. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 11 15:53:39 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 10:53:39 -0400 Subject: [Python-Dev] Re: tp_clear return value In-Reply-To: Your message of "Thu, 10 Apr 2003 23:41:03 EDT." <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEADEDAB.tim.one@comcast.net> Message-ID: <200304111453.h3BErjk14482@odiug.zope.com> > So I'm unclear on what we're trying to optimize when a tp_clear > nukes less than the corresponding tp_traverse visits. I suppose > "code space" is one decent answer to that. Yes. Though the type object example shows there are other differences (thanks Martin). --Guido van Rossum (home page: http://www.python.org/~guido/) From boris.boutillier@arteris.net Fri Apr 11 16:04:45 2003 From: boris.boutillier@arteris.net (Boris Boutillier) Date: 11 Apr 2003 17:04:45 +0200 Subject: [Python-Dev] backporting string changes to 2.2.3 In-Reply-To: <200304111428.h3BERvm14364@odiug.zope.com> References: <20030410225001.GN17847@epoch.metaslash.com> <1050049521.1751.16.camel@elevedelix> <200304111428.h3BERvm14364@odiug.zope.com> Message-ID: <1050073485.1828.23.camel@elevedelix> I see, I didn't think about all the documentation that needs updating, and I should have, as I've got the same problem in my own project :). > I think the reason is that there isn't enough need for it. The > special case of s.rsplit(c, 1) can be coded so easily by using rfind() > that I don't see the need to add it. Our Swiss Army Knife string type > is beginning to be so loaded with features that I am reluctant to add > more. The cost of a new feature these days is measured in the number > of books that need to be updated, not the number of lines of code > needed to implement it. > > For your amusement only (!
:-), I offer this implementation of > rsplit(), which works in Python 2.3: > > def rsplit(string, sep, count=-1): > L = [part[::-1] for part in string[::-1].split(sep[::-1], count)] > L.reverse() > return L I hadn't thought of this one; tricky and amusing. -- Boris Boutillier - Boris.Boutillier@arteris.net From barry@python.org Fri Apr 11 18:51:56 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 13:51:56 -0400 Subject: [Python-Dev] Changes to gettext.py for Python 2.3 Message-ID: <1050083516.11172.40.camel@barry> Hi I18n-ers, I plan on checking in the following changes to the gettext.py module for Python 2.3, based on feedback from the Zope and Mailman i18n work. Here's a summary of the changes; hopefully there aren't too many controversies <wink>. I'll update the tests and the docs at the same time. - Expose NullTranslations and GNUTranslations in __all__ - Set the default charset to iso-8859-1. It used to be None, which would cause problems with .ugettext() if the file had no charset parameter. Arguably, the po/mo file would be broken, but I still think iso-8859-1 is a reasonable default. - Add a "coerce" default argument to GNUTranslations's constructor. The reason for this is that in Zope, we want all msgids and msgstrs to be Unicode. For the latter, we could use .ugettext(), but there isn't currently a mechanism for Unicode-ifying msgids. The plan then is that the charset parameter specifies the encoding for both the msgids and msgstrs, and both are decoded to Unicode when read. For example, we might encode po files with utf-8. I think the GNU gettext tools don't care. Since this could potentially break code [*] that wants to use the encoded interface .gettext(), the constructor flag is added, defaulting to False. Most code, I suspect, will want to set this to True and use .ugettext().
- A few other minor changes from the Zope project, including asserting that a zero-length msgid must have a Project-ID-Version header for it to be counted as the metadata record. -Barry [*] I've come to the opinion that using anything other than Unicode msgids and msgstrs just won't work well for Python, and thus you really should be using the .ugettext() method everywhere. It's also insane to mix .gettext() and .ugettext(). In Zope, all human-readable messages will be Unicode strings internally, so we definitely want Unicode msgids. From exarkun@intarweb.us Fri Apr 11 19:11:57 2003 From: exarkun@intarweb.us (Jp Calderone) Date: Fri, 11 Apr 2003 14:11:57 -0400 Subject: [Python-Dev] Placement of os.fdopen functionality In-Reply-To: <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> References: <200304110039.h3B0dqD09084@oma.cosc.canterbury.ac.nz> <200304110048.h3B0mJ803809@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030411181157.GA32603@meson.dyndns.org> On Thu, Apr 10, 2003 at 08:48:19PM -0400, Guido van Rossum wrote: > [Greg Ewing] > > But since it is an os-dependent feature, the implementation > > of it probably does belong in the os module. > > > > So how about providing a file.fromfd() which calls > > os.fdopen()? > > I've never seen anyone code a file subclass yet, let alone one that > needed this. YAGNI? > codecs.EncodedFile seems almost like it should (but it's just a factory function). Other than that I can't think of anything that does or that would benefit from doing so. Jp -- Lowery's Law: If it jams -- force it. If it breaks, it needed replacing anyway.
-- up 22 days, 15:01, 3 users, load average: 1.05, 1.11, 1.16 From martin@v.loewis.de Fri Apr 11 20:54:50 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 11 Apr 2003 21:54:50 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050083516.11172.40.camel@barry> References: <1050083516.11172.40.camel@barry> Message-ID: <3E971D8A.5020006@v.loewis.de> Barry Warsaw wrote: > - Set the default charset to iso-8859-1. It used to be None, which > would cause problems with .ugettext() if the file had no charset > parameter. Arguably, the po/mo file would be broken, but I still think > iso-8859-1 is a reasonable default. I'm -1 here. Why do you think it is a reasonable default? Errors should never pass silently. Unless explicitly silenced. While iso-8859-1 might be a reasonable default in other application domains, in the context of non-English text (which it typically is), assuming Latin-1 is bound to create mojibake. If your application can accept creating mojibake, I suggest a method setdefaultencoding on the catalog, which has no effect if an encoding was found in the catalog. > - Add a "coerce" default argument to GNUTranslations's constructor. The > reason for this is that in Zope, we want all msgids and msgstrs to be > Unicode. For the latter, we could use .ugettext() but there isn't > currently a mechanism for Unicode-ifying msgids. Could you please explain in what context this is needed? msgids are ASCII, and you can pass a Unicode string to ugettext just fine. > The plan then is that the charset parameter specifies the encoding for both the msgids and msgstrs, and both are decoded to Unicode when read.
> For example, we might encode po files with utf-8. I think the GNU > gettext tools don't care. They complain loudly if they find bytes > 127 in the msgid. > Since this could potentially break code [*] that wants to use the > encoded interface .gettext(), the constructor flag is added, defaulting > to False. Most code I suspect will want to set this to True and use > .ugettext(). To avoid breakage, you could define ugettext as def ugettext(self, message): if isinstance(message, unicode): tmsg = self._catalog.get(message.encode(self._charset)) if tmsg is None: return message else: tmsg = self._catalog.get(message, message) return unicode(tmsg, self._charset) > - A few other minor changes from the Zope project, including asserting > that a zero-length msgid must have a Project-ID-Version header for it to > be counted as the metadata record. That test was there, and removed on request of Bruno Haible, the GNU gettext maintainer, as he points out that Project-ID-Version is not mandatory for the metadata (see Patch #700839). Regards, Martin From barry@python.org Fri Apr 11 21:26:59 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 16:26:59 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E971D8A.5020006@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> Message-ID: <1050092819.11172.89.camel@barry> On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote: > Barry Warsaw wrote: > > > - Set the default charset to iso-8859-1. It used to be None, which > > would cause problems with .ugettext() if the file had no charset > > parameter. Arguably, the po/mo file would be broken, but I still think > > iso-8859-1 is a reasonable default. > > I'm -1 here. Why do you think it is a reasonable default? > > Errors should never pass silently. > Unless explicitly silenced. 
> > While iso-8859-1 might be a reasonable default in other application > domains, in the context of non-English text (which it typically is), > assuming Latin-1 is bound to create mojibake. Okay, never mind, I'll back this one out. The problem was caused by my other patch to unicode-ify on read (see below) without first having a charset. I have a different fix for this. > > - Add a "coerce" default argument to GNUTranslations's constructor. The > > reason for this is that in Zope, we want all msgids and msgstrs to be > > Unicode. For the latter, we could use .ugettext() but there isn't > > currently a mechanism for Unicode-ifying msgids. > > Could you please in what context this is needed? msgids are ASCII, and > you can pass a Unicode string to ugettext just fine. In Zope, all strings are Unicode and the catalog may include messages that are extracted from places other than Python source code, e.g. XML-based files. Message ids can contain non-ASCII characters if they are written by a non-English coder. I think in that case, we'd want to do something like encode the strings possibly with utf-8 for the .po/.mo files, but we want them decoded in time to look the Unicode strings up in the catalog. Similarly, what happens if a non-English coder writes an i18n'd Python module with native strings, possibly using a Python 2.3 coding cookie. We'd want their message ids to be extracted into the .mo/.po files, right? > > The plan then is that the charset parameter specifies the encoding for > > both the msgids and msgstrs, and both are decoded to Unicode when read. > > For example, we might encode po files with utf-8. I think the GNU > > gettext tools don't care. > > They complain loudly if they find bytes > 127 in the msgid. Really? Ok, I'm still confused because I tried the following example: I wrote a .mo file (charset=utf-8) with the following record: #: nofile:0 msgid "ab\xc3\x9e" msgstr "\xc2\xa4yz" I used standard msgfmt to turn that into a .mo file. 
Then created a GNUTranslation(fp, coerce=True) and called >>> t.ugettext(u'ab\xde') u'\xa4yz' This is what I should expect, right? ;) > > - A few other minor changes from the Zope project, including asserting > > that a zero-length msgid must have a Project-ID-Version header for it to > > be counted as the metadata record. > > That test was there, and removed on request of Bruno Haible, the GNU > gettext maintainer, as he points out that Project-ID-Version is not > mandatory for the metadata (see Patch #700839). Ah, I read the diff backwards in this case. I'll back this one out too. -Barry From barry@python.org Fri Apr 11 21:37:56 2003 From: barry@python.org (Barry Warsaw) Date: 11 Apr 2003 16:37:56 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E971D8A.5020006@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> Message-ID: <1050093475.11200.96.camel@barry> On Fri, 2003-04-11 at 15:54, "Martin v. Löwis" wrote: > To avoid breakage, you could define ugettext as > > def ugettext(self, message): > if isinstance(message, unicode): > tmsg = self._catalog.get(message.encode(self._charset)) > if tmsg is None: > return message > else: > tmsg = self._catalog.get(message, message) > return unicode(tmsg, self._charset) I suppose we could cache the conversion to make the next lookup more efficient. Alternatively, if we always convert internally to Unicode we could encode on .gettext(). Then we could just pick One Way and do away with the coerce flag. -Barry From guido@python.org Fri Apr 11 21:32:51 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 16:32:51 -0400 Subject: [Python-Dev] Re: More int/long integration issues In-Reply-To: Your message of "21 Mar 2003 14:42:07 PST." 
<1048286527.651.29.camel@sayge.arc.nasa.gov> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> Message-ID: <200304112033.h3BKWw703999@odiug.zope.com> > On Fri, 2003-03-21 at 06:55, Guido van Rossum wrote: > > > > > Hm, maybe range() shouldn't be an iterator but an iterator > > > > generator. No time to explain; see the discussion about > > > > restartable iterators. [Chad Netzer] > Hmmm. Now that I've uploaded my patch extending range() to longs, (And now that I've checked it in. :-) > I'd like to work on this. I've already written a C range() iterator > (incorporating PyLongs), and it would be very nice to have it > automatically be a lazy range() when used in a loop. > > In any case, assuming you are quite busy, but would consider this for > the 2.4 timeframe, I will do some work on it. If it is already being > covered, I'll gladly stay away from it. :) range() can't be changed from returning a list until at least Python 3.0. xrange() already is an iterator as well. So I'm not sure there's much to do, especially since I think making xrange() support large longs goes against the design goal for xrange(), which is to be a lightweight alternative for range() when speed is important. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Wed Apr 9 16:23:51 2003 From: tim.one@comcast.net (Tim Peters) Date: Wed, 09 Apr 2003 11:23:51 -0400 Subject: [Python-Dev] RE: Adding item in front of a list In-Reply-To: <yu99k7e3wxcf.fsf@europa.research.att.com> Message-ID: <BIEJKCLHCIOIHAGOKOLHGEIJFEAA.tim.one@comcast.net> [Andrew Koenig, starting with l=[2, 3, 4]] > ...
> I would have thought that after l.insert(-1, 1), l would be > [2, 3, 1, 4], but it doesn't work that way. Alas, list.insert() existed before sequence indices were generalized to give a "count from the right end" meaning to negative index values. When the generalization happened, it appears that list.insert() was just overlooked. I'd like to change this. If I did, how loudly would people scream? Guido says he also wishes list.insert() had been defined with the arguments in the opposite order, so that list.insert(object) could have a natural default index argument of 0. I'd like to change that too, but it's clearly too late for that one. From nas@python.ca Fri Apr 11 23:14:31 2003 From: nas@python.ca (Neil Schemenauer) Date: Fri, 11 Apr 2003 15:14:31 -0700 Subject: [Python-Dev] new bytecode results In-Reply-To: <b3kooi$gaj$1@main.gmane.org> References: <b3kooi$gaj$1@main.gmane.org> Message-ID: <20030411221431.GA25548@glacier.arctrix.com> Damien Morton wrote: > I tried adding a variety of new instructions to the PVM, initially with a > code compression goal for the bytecodes, and later with a performance goal. Hi Damiem, It's good to see your enthusiasm for optimization. However, I can't help but think your efforts could be better directed. Have you looked at the CALL_ATTR work that was done at PyCon? There was also some work done on optimizing descriptors. I think working on global and builtin namespace optimizations could payoff big. There was talk about disallowing shadowing builtin names. That would allow getting rid of runtime lookups in dictionaries and even inlining of builtin functions. I have a patch on SF that could use some polish. Also, working on the new AST compiler would help us. It will be much easier to add new optimization passes after that work is completed. > begin 666 source.zip > M4$L#!!0````(`.0E6RZ%[DUZ.%X``)9\`0`'````8V5V86PN8^Q]?5<;.;+W [...] Yikes. Next time you should just upload a patch to Source Forge. 
Neil From skip@pobox.com Fri Apr 11 23:58:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 11 Apr 2003 17:58:23 -0500 Subject: [Python-Dev] new bytecode results In-Reply-To: <20030411221431.GA25548@glacier.arctrix.com> References: <b3kooi$gaj$1@main.gmane.org> <20030411221431.GA25548@glacier.arctrix.com> Message-ID: <16023.18575.48444.491279@montanaro.dyndns.org> Neil> Damien Morton wrote: >> I tried adding a variety of new instructions to the PVM, initially >> with a code compression goal for the bytecodes, and later with a >> performance goal. Neil> Hi Damiem, Neil> It's good to see your enthusiasm for optimization. However, I Neil> can't help but think your efforts could be better directed. Have Neil> you looked at the CALL_ATTR work that was done at PyCon? There Neil> was also some work done on optimizing descriptors. I think that message got stuck on mail.python.org on Feb 27 and was just released from purgatory today. Maybe it was the size? Skip From paul@prescod.net Sat Apr 12 00:59:29 2003 From: paul@prescod.net (Paul Prescod) Date: Fri, 11 Apr 2003 16:59:29 -0700 Subject: [Python-Dev] Garbage collecting closures Message-ID: <3E9756E1.10503@prescod.net> Does this bug look familiar to anyone? import gc def bar(a): def foo(): return None x = a foo() class C:pass a = C() for i in range(20): print len(gc.get_referrers(a)) x = bar(a) On my Python, it just counts up. "a" gets more and more referrers and they are "cell" objects. If it is unknown, I'll submit a bug report unless someone fixes it before I get to it. ;) Paul Prescod From guido@python.org Sat Apr 12 01:45:40 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 11 Apr 2003 20:45:40 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: "Your message of Fri, 11 Apr 2003 16:59:29 PDT." <3E9756E1.10503@prescod.net> References: <3E9756E1.10503@prescod.net> Message-ID: <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> > Does this bug look familiar to anyone? 
> > import gc > > def bar(a): > def foo(): > return None > x = a > foo() > > class C:pass > a = C() > > for i in range(20): > print len(gc.get_referrers(a)) > x = bar(a) > > On my Python, it just counts up. "a" gets more and more referrers and > they are "cell" objects. If it is unknown, I'll submit a bug report > unless someone fixes it before I get to it. ;) If I use a "while 1" loop, the count never goes above 225. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul@prescod.net Sat Apr 12 03:03:40 2003 From: paul@prescod.net (Paul Prescod) Date: Fri, 11 Apr 2003 19:03:40 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> References: <3E9756E1.10503@prescod.net> <200304120045.h3C0jep05603@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3E9773FC.5020908@prescod.net> Guido van Rossum wrote: >... >> >>On my Python, it just counts up. "a" gets more and more referrers and >>they are "cell" objects. If it is unknown, I'll submit a bug report >>unless someone fixes it before I get to it. ;) > > > If I use a "while 1" loop, the count never goes above 225. Just FYI, even if it wouldn't have leaked forever, it caused me serious pain because it kept a reference to a COM object. The process wouldn't die until the object died and all of my usual techniques for breaking circular references were of no avail. I even tried nasty hacks like globals.clear() and self.__dict__.clear(). But there was no circular reference to be broken. Paul Prescod From mhammond@skippinet.com.au Sat Apr 12 02:41:14 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 12 Apr 2003 11:41:14 +1000 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <200304111416.47006.harri.pasanen@trema.com> Message-ID: <000001c30099$711a6f60$530f8490@eden> [Harri] > Hello, > > In a few hours old CVS checkout, I'm having problems getting the > embedded python to work. 
This is true even in non-embedded Python. Move away "_sre.pyd", and the interactive session shows: 'import site' failed; use -v for traceback >>> import re >>> dir(re) ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] Running with "-v" shows: 'import site' failed; traceback: Traceback (most recent call last): File "E:\src\python-cvs\lib\site.py", line 298, in ? encodings._cache[enc] = encodings._unknown AttributeError: 'module' object has no attribute '_unknown' So, my speculation at this point is that for some reason, site.py now depends on re, which depends on _sre - but somehow a "stale" import is left hanging around. Another strange point - executing "python", then typing "import re" is completely silent, as we have noted. However, executing "python -c "import re" dumps an exception: python -c "import re" 'import site' failed; use -v for traceback Traceback (most recent call last): File "E:\src\python-cvs\lib\warnings.py", line 270, in ? filterwarnings("ignore", category=OverflowWarning, append=1) File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings item = (action, re.compile(message, re.I), category, AttributeError: 'module' object has no attribute 'compile' I'm really not sure what is going on here. I'd suggest creating a bug at sf. Mark. From jeremy@alum.mit.edu Sat Apr 12 04:38:14 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 11 Apr 2003 23:38:14 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9756E1.10503@prescod.net> References: <3E9756E1.10503@prescod.net> Message-ID: <1050118693.10278.17.camel@localhost.localdomain> On Fri, 2003-04-11 at 19:59, Paul Prescod wrote: > Does this bug look familiar to anyone? No bug here. > import gc > > def bar(a): > def foo(): > return None > x = a > foo() > > class C:pass > a = C() > > for i in range(20): > print len(gc.get_referrers(a)) > x = bar(a) > > On my Python, it just counts up. "a" gets more and more referrers and > they are "cell" objects. 
If it is unknown, I'll submit a bug report > unless someone fixes it before I get to it. ;) Nested recursive functions create circular references, which are only collected when the garbage collector runs. Add a gc.collect() call to the end of your loop and the number of referrers stays at 1. Jeremy From tim_one@email.msn.com Sat Apr 12 09:56:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 12 Apr 2003 04:56:42 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9773FC.5020908@prescod.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> [Paul Prescod] > Just FYI, even if it wouldn't have leaked forever, It wouldn't. > it caused me serious pain because it kept a reference to a COM object. > The process wouldn't die until the object died and all of my usual > techniques for breaking circular references were of no avail. I even > tried nasty hacks like globals.clear() and self.__dict__.clear(). But > there was no circular reference to be broken. There is, but I don't think you *can* break it. Stick print foo, foo.func_closure[1] inside your bar() function, after foo's definition. foo.func_closure is a 2-tuple here, and you'll see that its last element is a cell, which in turn points back to foo. That's the cycle. Since func_closure is a readonly attr, and tuples and cells are immutable, there shouldn't be anything you can do to break this cycle. Calling gc.collect() will reclaim it, provided it has become unreachable. Hiding critical resources in closures is a Bad Idea, of course -- that's why nobody has used Scheme since 1993 <wink>. From martin@v.loewis.de Sat Apr 12 11:34:05 2003 From: martin@v.loewis.de (Martin v. 
Löwis) Date: 12 Apr 2003 12:34:05 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050093475.11200.96.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050093475.11200.96.camel@barry> Message-ID: <m38yug57j6.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I suppose we could cache the conversion to make the next lookup more > efficient. Alternatively, if we always convert internally to Unicode we > could encode on .gettext(). Then we could just pick One Way and do away > with the coerce flag. If you are concerned about efficiency, I guess there is no way to avoid converting the file to Unicode on loading. I would then encourage a change where this flag is available, but has an effect only on performance, not on the behaviour. Alternatively, you could subclass GNUTranslation. Regards, Martin From mal@lemburg.com Sat Apr 12 11:58:33 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 12 Apr 2003 12:58:33 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <000001c30099$711a6f60$530f8490@eden> References: <000001c30099$711a6f60$530f8490@eden> Message-ID: <3E97F159.20909@lemburg.com> Mark Hammond wrote: > [Harri] > >>Hello, >> >>In a few hours old CVS checkout, I'm having problems getting the >>embedded python to work. > > > This is true even in non-embedded Python. Move away "_sre.pyd", and the > interactive session shows: > > 'import site' failed; use -v for traceback > >>>>import re >>>>dir(re) > > ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] > > Running with "-v" shows: > > 'import site' failed; traceback: > Traceback (most recent call last): > File "E:\src\python-cvs\lib\site.py", line 298, in ? > encodings._cache[enc] = encodings._unknown > AttributeError: 'module' object has no attribute '_unknown' This looks like a modified site.py. Where did you get this from ?
BTW, hacking encodings._cache is generally a *bad* idea. There's no guarantee that such code will continue to work in future releases since you are touching undocumented internals there. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 12:17:35 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 12 Apr 2003 13:17:35 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <3E97F159.20909@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> Message-ID: <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > This looks like a modified site.py. Where did you get this from ? Perhaps from the Python CVS? if sys.platform == 'win32': import locale, codecs enc = locale.getdefaultlocale()[1] if enc.startswith('cp'): # "cp***" ? try: codecs.lookup(enc) except LookupError: import encodings encodings._cache[enc] = encodings._unknown encodings.aliases.aliases[enc] = 'mbcs' Regards, Martin From mal@lemburg.com Sat Apr 12 12:23:47 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 12 Apr 2003 13:23:47 +0200 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: <200304112033.h3BKWw703999@odiug.zope.com> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> Message-ID: <3E97F743.4070301@lemburg.com> Guido van Rossum wrote: >>I'd like to work on this. I've already written a C range() iterator >>(incorporating PyLongs), and it would be very nice to have it >>automatically be a lazy range() when used in a loop. >> >>In any case, assuming you are quite busy, but would consider this for >>the 2.4 timeframe, I will do some work on it. If it is already being >>covered, I'll gladly stay away from it. :) > > range() can't be changed from returning a list until at least Python > 3.0. Is this change really necessary ? Instead of changing the semantics of range() why not have the byte code compiler optimize its typical usage: for i in range(10): pass In the above case, changing the byte code compiler output would not introduce any change in semantics. Even better, the compiler could get rid of the function call altogether. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 12:43:28 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 12 Apr 2003 13:43:28 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050092819.11172.89.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> Message-ID: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I used standard msgfmt to turn that into a .mo file. Then created a > GNUTranslation(fp, coerce=True) and called > > >>> t.ugettext(u'ab\xde') > u'\xa4yz' > > This is what I should expect, right? ;) More or less, yes. Now, what happens if you put "real" non-ASCII (i.e. bytes above 127) into the message id, like so: msgid "abö" msgstr "\xc2\xa4yz" msgfmt will still accept that, but msgunfmt will complain: msgunfmt: warning: The following msgid contains non-ASCII characters. This will cause problems to translators who use a character encoding different from yours. Consider using a pure ASCII msgid instead. If you think about this, this is really bad: If you mean to apply the charset= to both msgid and msgstr, then translators using a different charset from yours are in big trouble. They are faced with three problems: 1. They don't know what the charset of the msgids is. The PO files do have a charset declaration, the POT files typically don't. 2. They need to convert the msgids from the POT encoding to their native encoding. There are no tools available to support that readily; tools like iconv might correctly convert the msgids, but won't update the charset= in the POT file (if the charset was filled out). 3. By converting the msgids, they are also changing them. That means the msgids are not really suitable as keys anymore. Regards, Martin From mal@lemburg.com Sat Apr 12 12:49:11 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Sat, 12 Apr 2003 13:49:11 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E97FD37.9040100@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > > >>This looks like a modified site.py. Where did you get this from ? > > Perhaps from the Python CVS? Hmm, I don't have that in my CVS checkout... I guess a cleanup is due. > if sys.platform == 'win32': > import locale, codecs > enc = locale.getdefaultlocale()[1] > if enc.startswith('cp'): # "cp***" ? > try: > codecs.lookup(enc) > except LookupError: > import encodings > encodings._cache[enc] = encodings._unknown > encodings.aliases.aliases[enc] = 'mbcs' That's the wrong way to do it. This code should live in encodings/__init__.py, not site.py, and it should be done lazily, ie. Python startup time should not suffer from this in general, only when Unicode and cpXXX encodings are being requested and not found. The codec machinery was carefully designed not to introduce extra overhead when not using Unicode in programs. The above approach pretty much kills this effort :-) -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 12 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 73 days left From martin@v.loewis.de Sat Apr 12 13:31:07 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 12 Apr 2003 14:31:07 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <3E97FD37.9040100@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> Message-ID: <m3el475244.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > The codec machinery was carefully designed not to introduce > extra overhead when not using Unicode in programs. The above > approach pretty much kills this effort :-) This effort is dead already. For example, on Unix, the file system default encoding is initialized from the user's preference; to verify that the encoding really exists, a codec lookup is performed. Regards, Martin From guido@python.org Sat Apr 12 14:25:15 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 09:25:15 -0400 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: "Your message of Sat, 12 Apr 2003 13:23:47 +0200." <3E97F743.4070301@lemburg.com> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> <3E97F743.4070301@lemburg.com> Message-ID: <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> [Chad Netzer] > >>I'd like to work on this. I've already written a C range() iterator > >>(incorporating PyLongs), and it would be very nice to have it > >>automatically be a lazy range() when used in a loop. > >> > >>In any case, assuming you are quite busy, but would consider this for > >>the 2.4 timeframe, I will do some work on it.
If it is already being > >>covered, I'll gladly stay away from it. :) [Guido] > > range() can't be changed from returning a list until at least Python > > 3.0. [MAL] > Is this change really necessary ? Instead of changing the semantics > of range() why not have the byte code compiler optimize it's typical > usage: > > for i in range(10): > pass > > In the above case, changing the byte code compiler output would > not introduce any change in semantics. Even better, the compiler > could get rid off the function call altogether. Right. That's nice, and can be done before 3.0 (as soon as we change the rules so that adding a 'range' attribute to a module object is illegal). My musing about making range() an iterator or iterator well comes from the observation that if I had had iterators as a concept from day one, I would have made several things iterators that currently return lists, e.g. map(), filter(), and range(). The need for the concrete list returned by range() (outside the tutorial :-) is rare; in those rare cases you could say list(range(...)). Whether this is indeed worth changing in 3.0 isn't clear, that depends on the scope of 3.0, which isn't defined yet (because I haven't had time to work on it, really). I certainly plan to eradicate xrange() in 3.0 one way or another: TOOWTDI. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 12 14:43:52 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 09:43:52 -0400 Subject: [Python-Dev] Evil setattr hack Message-ID: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Someone accidentally discovered a way to set attributes of built-in types, even though the implementation tries to prevent this. For example, you cannot modify the str type to add a new method. Let's define the method first: >>> def reverse(self): ... return self[::-1] ... 
>>> Using direct attribute assignment doesn't work: >>> str.reverse = reverse Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't set attributes of built-in/extension type 'str' >>> Using the dictionary doesn't work either: >>> str.__dict__['reverse'] = reverse Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: object does not support item assignment >>> But here's a trick that *does* work: >>> object.__setattr__(str, 'reverse', reverse) >>> Proof that it worked: >>> "hello".reverse() 'olleh' >>> What to do about this? I *really* don't want changes to built-in types to become a standard "hack", because there are all sorts of things that could go wrong. (For one, built-in type objects are static C variables, which are shared between multiple interpreter contexts in the same process.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 12 17:00:13 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 12:00:13 -0400 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: "Your message of Sat, 12 Apr 2003 11:41:14 +1000." <000001c30099$711a6f60$530f8490@eden> References: <000001c30099$711a6f60$530f8490@eden> Message-ID: <200304121600.h3CG0DU01994@pcp02138704pcs.reston01.va.comcast.net> > [Harri] > > Hello, > > > > In a few hours old CVS checkout, I'm having problems getting the > > embedded python to work. [Mark] > This is true even in non-embedded Python. Move away "_sre.pyd", and the > interactive session shows: > > 'import site' failed; use -v for traceback > >>> import re > >>> dir(re) > ['__builtins__', '__doc__', '__file__', '__name__', 'engine'] > > Running with "-v" shows: > > 'import site' failed; traceback: > Traceback (most recent call last): > File "E:\src\python-cvs\lib\site.py", line 298, in ? 
> encodings._cache[enc] = encodings._unknown > AttributeError: 'module' object has no attribute '_unknown' > > So, my speculation at this point is that for some reason, site.py > now depends on re, which depends on _sre site.py sometimes imports distutils.util which imports re which imports _sre. But this is only when run from the build directory. But there's another path that imports re, and that's from warnings, which is imported as soon as a warning is issued (even if nothing is printed). > - but somehow a "stale" import is left hanging around. That's a standard problem when module A imports B and B fails -- a semi-complete A stays around. Proposals to fix it have been made, but it's tricky because deleting A isn't always the right thing to do (and makes the failure harder to debug). > Another strange point - executing "python", then typing "import re" is > completely silent, as we have noted. However, executing "python -c "import > re" dumps an exception: > > python -c "import re" > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "E:\src\python-cvs\lib\warnings.py", line 270, in ? > filterwarnings("ignore", category=OverflowWarning, append=1) > File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings > item = (action, re.compile(message, re.I), category, > AttributeError: 'module' object has no attribute 'compile' > > I'm really not sure what is going on here. I'd suggest creating a bug at > sf. Have you got a $PYTHONSTARTUP? That doesn't get executed in the second case.
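[The "semi-complete A stays around" effect Guido describes is easy to reproduce. A small sketch in modern Python syntax: the module name half_baked is invented, and the failure is simulated with exec because today's import machinery removes a genuinely failed import from sys.modules, whereas the interpreter of this era did not.]

```python
import sys
import types

# Simulate "module A imports B and B fails": run a module body that
# raises partway through, leaving the half-initialized module object
# registered in sys.modules.
body = "x = 1\nraise ImportError('simulated failing sub-import')\ny = 2"
mod = types.ModuleType("half_baked")  # hypothetical module name
sys.modules["half_baked"] = mod
try:
    exec(body, mod.__dict__)
except ImportError:
    pass

# A later import silently returns the stale, partially populated module:
import half_baked
print(hasattr(half_baked, "x"))  # True  -- set before the failure
print(hasattr(half_baked, "y"))  # False -- never reached
```

[Attribute errors like the "'module' object has no attribute 'compile'" traceback above are exactly what such a half-populated module produces.]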
--Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Sat Apr 12 20:37:09 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 12 Apr 2003 12:37:09 -0700 Subject: [Python-Dev] range() as iterator (Re: More int/long integration issues) In-Reply-To: <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> References: <7F171EB5E155544CAC4035F0182093F03CF792@INGDEXCHSANC1.ingdirect.com> <200303131903.h2DJ3Ug06240@odiug.zope.com> <uwuitaf3c.fsf@boost-consulting.com> <200303202233.h2KMXbG07782@odiug.zope.com> <uznnowzjb.fsf@boost-consulting.com> <200303211455.h2LEtGp24202@pcp02138704pcs.reston01.va.comcast.net> <1048286527.651.29.camel@sayge.arc.nasa.gov> <200304112033.h3BKWw703999@odiug.zope.com> <3E97F743.4070301@lemburg.com> <200304121325.h3CDPFW01806@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050176229.601.15.camel@sayge.arc.nasa.gov> On Sat, 2003-04-12 at 06:25, Guido van Rossum wrote: > [MAL] > > Is this change really necessary ? Instead of changing the semantics > > of range() why not have the byte code compiler optimize it's typical > > usage: > > Right. That's nice, and can be done before 3.0 (as soon as we change > the rules so that adding a 'range' attribute to a module object is > illegal). Well, I plan to look into doing this, just because I think it is an interesting problem and tickles my fancy. I'll report back when I have failed. But at least I'll try to get the ball rolling. :) Chad Netzer From drifty@alum.berkeley.edu Sat Apr 12 23:20:35 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 12 Apr 2003 15:20:35 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests Message-ID: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> For the regression tests for the stdlib, is it okay to create temporary files (using tempfile) and connect to the Internet (when the network resource is enabled)? 
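[For the tempfile half of Brett's question, the general pattern — shown here with the modern tempfile API rather than the 2003 test_support helpers — is to do all scratch I/O inside a temporary directory so a test run leaves nothing behind:]

```python
import os
import tempfile

# All scratch files live in a TemporaryDirectory; it is removed,
# contents and all, when the with-block exits.
with tempfile.TemporaryDirectory() as scratch_dir:
    path = os.path.join(scratch_dir, "scratch.txt")
    with open(path, "w") as f:
        f.write("hello")
    with open(path) as f:
        content = f.read()

print(content)               # hello
print(os.path.exists(path))  # False -- cleaned up automatically
```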
-Brett From guido@python.org Sun Apr 13 01:08:04 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 12 Apr 2003 20:08:04 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sat, 12 Apr 2003 15:20:35 PDT." <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> Message-ID: <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> > For the regression tests for the stdlib, is it okay to create temporary > files (using tempfile) and connect to the Internet (when the network > resource is enabled)? Tempfiles: definitely; though if you need a single temporary file, you can use test_support.TESTFN. Connecting to the Internet: only if the network resource is enabled. Then it is up to the tester to make sure that connection to the Internet is possible. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Sun Apr 13 02:55:50 2003 From: skip@pobox.com (Skip Montanaro) Date: Sat, 12 Apr 2003 20:55:50 -0500 Subject: [Python-Dev] migration away from SourceForge? Message-ID: <16024.50086.748997.76318@montanaro.dyndns.org> Is it time to think seriously about moving away from SourceForge? Their cvs performance seems to be getting worse by the day. Bug updates also seem to fail periodically in a fashion that suggests system overload. I presume the parent company (is that still VA Linux?) is in dire enough financial straits that it can't afford to upgrade its infrastructure enough to meet the increased demand. It seems we mostly need a CVS repository and a bug tracker. Is RoundUp close enough to fill the bug tracking bill? What options are available for CVS hosting? OTOH, maybe we should try to convince Google to buy SF. ;-) Skip From barry@python.org Sun Apr 13 05:15:55 2003 From: barry@python.org (Barry Warsaw) Date: Sun, 13 Apr 2003 00:15:55 -0400 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <16024.50086.748997.76318@montanaro.dyndns.org> Message-ID: <9EF4B80D-6D66-11D7-8848-003065EEFAC8@python.org> On Saturday, April 12, 2003, at 09:55 PM, Skip Montanaro wrote: > > Is it time to think seriously about moving away from SourceForge? > Their cvs > performance seems to be getting worse by the day. Bug updates also > seem to > fail periodically in a fashion that suggests system overload. I > presume the > parent company (is that still VA Linux?) is in dire enough financial > straits > that it can't afford to upgrade its infrastructure enough to meet the > increased demand. > > It seems we mostly need a CVS repository and a bug tracker. Is RoundUp > close enough to fill the bug tracking bill? What options are > available for > CVS hosting? Perhaps we should look into running the GForge code on a python.org machine? http://gforge.org -Barry From tim.one@comcast.net Sun Apr 13 06:09:29 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 13 Apr 2003 01:09:29 -0400 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <000001c30099$711a6f60$530f8490@eden> Message-ID: <LNBBLJKPBEHFEDALKOLCGEBLEDAB.tim.one@comcast.net> [Mark Hammond] > ... > Another strange point - executing "python", then typing "import re" is > completely silent, as we have noted. However, executing > "python -c "import re" dumps an exception: > > python -c "import re" > 'import site' failed; use -v for traceback > Traceback (most recent call last): > File "E:\src\python-cvs\lib\warnings.py", line 270, in ? > filterwarnings("ignore", category=OverflowWarning, append=1) > File "E:\src\python-cvs\lib\warnings.py", line 140, in filterwarnings > item = (action, re.compile(message, re.I), category, > AttributeError: 'module' object has no attribute 'compile' > > I'm really not sure what is going on here. I'd suggest creating a bug at > sf. Does this fail for anyone else? 
Works for me, here on Win98SE: C:\Code\python\PCbuild>python -c "import re" C:\Code\python\PCbuild> Did you try -v, as > 'import site' failed; use -v for traceback suggested? Here's the import info I get: C:\Code\python\PCbuild>python -vc "import re" # installing zipimport hook import zipimport # builtin # installed zipimport hook # C:\CODE\PYTHON\lib\site.pyc matches C:\CODE\PYTHON\lib\site.py import site # precompiled from C:\CODE\PYTHON\lib\site.pyc # C:\CODE\PYTHON\lib\os.pyc matches C:\CODE\PYTHON\lib\os.py import os # precompiled from C:\CODE\PYTHON\lib\os.pyc import nt # builtin # C:\CODE\PYTHON\lib\ntpath.pyc matches C:\CODE\PYTHON\lib\ntpath.py import ntpath # precompiled from C:\CODE\PYTHON\lib\ntpath.pyc # C:\CODE\PYTHON\lib\stat.pyc matches C:\CODE\PYTHON\lib\stat.py import stat # precompiled from C:\CODE\PYTHON\lib\stat.pyc # C:\CODE\PYTHON\lib\UserDict.pyc matches C:\CODE\PYTHON\lib\UserDict.py import UserDict # precompiled from C:\CODE\PYTHON\lib\UserDict.pyc # C:\CODE\PYTHON\lib\copy_reg.pyc matches C:\CODE\PYTHON\lib\copy_reg.py import copy_reg # precompiled from C:\CODE\PYTHON\lib\copy_reg.pyc # C:\CODE\PYTHON\lib\types.pyc matches C:\CODE\PYTHON\lib\types.py import types # precompiled from C:\CODE\PYTHON\lib\types.pyc # C:\CODE\PYTHON\lib\locale.pyc matches C:\CODE\PYTHON\lib\locale.py import locale # precompiled from C:\CODE\PYTHON\lib\locale.pyc import _locale # builtin # C:\CODE\PYTHON\lib\codecs.pyc matches C:\CODE\PYTHON\lib\codecs.py import codecs # precompiled from C:\CODE\PYTHON\lib\codecs.pyc import _codecs # builtin import encodings # directory C:\CODE\PYTHON\lib\encodings # C:\CODE\PYTHON\lib\encodings\__init__.pyc matches C:\CODE\PYTHON\lib\encodings\__init__.py import encodings # precompiled from C:\CODE\PYTHON\lib\encodings\__init__.pyc # C:\CODE\PYTHON\lib\re.pyc matches C:\CODE\PYTHON\lib\re.py import re # precompiled from C:\CODE\PYTHON\lib\re.pyc # C:\CODE\PYTHON\lib\sre.pyc matches C:\CODE\PYTHON\lib\sre.py import sre # 
precompiled from C:\CODE\PYTHON\lib\sre.pyc # C:\CODE\PYTHON\lib\sre_compile.pyc matches C:\CODE\PYTHON\lib\sre_compile.py import sre_compile # precompiled from C:\CODE\PYTHON\lib\sre_compile.pyc import _sre # dynamically loaded from C:\Code\python\PCbuild\_sre.pyd # C:\CODE\PYTHON\lib\sre_constants.pyc matches C:\CODE\PYTHON\lib\sre_constants.py import sre_constants # precompiled from C:\CODE\PYTHON\lib\sre_constants.pyc # C:\CODE\PYTHON\lib\sre_parse.pyc matches C:\CODE\PYTHON\lib\sre_parse.py import sre_parse # precompiled from C:\CODE\PYTHON\lib\sre_parse.pyc # C:\CODE\PYTHON\lib\string.pyc matches C:\CODE\PYTHON\lib\string.py import string # precompiled from C:\CODE\PYTHON\lib\string.pyc import strop # builtin # C:\CODE\PYTHON\lib\encodings\cp1252.pyc matches C:\CODE\PYTHON\lib\encodings\cp1252.py import encodings.cp1252 # precompiled from C:\CODE\PYTHON\lib\encodings\cp1252.pyc # C:\CODE\PYTHON\lib\warnings.pyc matches C:\CODE\PYTHON\lib\warnings.py import warnings # precompiled from C:\CODE\PYTHON\lib\warnings.pyc # C:\CODE\PYTHON\lib\linecache.pyc matches C:\CODE\PYTHON\lib\linecache.py import linecache # precompiled from C:\CODE\PYTHON\lib\linecache.pyc From drifty@alum.berkeley.edu Sun Apr 13 07:58:32 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 12 Apr 2003 23:58:32 -0700 (PDT) Subject: [Python-Dev] RE: How should time.strptime() handle UTC? In-Reply-To: <LNBBLJKPBEHFEDALKOLCOECAEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCOECAEDAB.tim.one@comcast.net> Message-ID: <Pine.SOL.4.53.0304122344080.3660@death.OCF.Berkeley.EDU> I am cc'ing python-dev at Tim's suggestion. You can read the replies in the email, but the gist is whether time.strptime() should accept UTC and GMT for the %Z directive. [Tim Peters] > [Brett Cannon] > > I was writing a script using strptime and I rediscovered that strptime (at > > least the pure Python version) does not accept UTC for the %Z directive as > > an acceptable timezone.
> > Is it that it specifically didn't accept UTC, or that it generally doesn't > accept anything for %Z? > Doesn't accept anything beyond what the computer's timezone is (if the computer is in PDT, it only picks that up and nothing else; quick test I did failed on PST). So trying anything that is not directly known gets rejected as a format error. > > Now I just checked an install on a Solaris machine under Python 2.2 and it > > doesn't accept UTC as a timezone either so I know of at least one C > version that > > doesn't take it either. > > > > Do you two think that I should modify strptime to accept UTC and GMT and > > then set tm_isdst (DST flag) to 0? Or should it just stay as-is and not > > accept it? Should I change it so that it accepts any 3-letter entry for > > %Z and then just see if I know what the timezone is; if I know set > > tm_isdst appropriately, otherwise set it to -1? > > > > I say yes to adding UTC and no to blindly accepting possible timezones. > > My feeling is that this should act as closely to a naive datetime > > representation as possible. > > > > And don't let having to deal with a patch hold you up on wanting to change > > it; I have to patch a "feature" of strptime anyway. =) > > I think you should debate this in public. So if you guys don't like hearing about this stuff blame Tim. =) > %Z isn't allowed in POSIX strptime(). GNU docs say glibc supports it as > an extension to POSIX, and that GNU "parses" for it (whatever that > means), but that "no field in tm is changed" as a result. A number of > other strptime man pages on the web say: > > %Z > timezone name or no characters if no time zone information exists > > which suggests they carelessly copied the format part of their strftime() > man page. > > So there's no clear prior art to follow here, and inventing new art takes > more time than I have (hence "debate this in public" -- please <wink>). > Well, that man page line is pretty much what the Python docs say. 
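[For reference, the behaviour under debate — accepting "UTC" and "GMT" for %Z regardless of the machine's own timezone, with tm_isdst set to 0 — is what the pure Python strptime eventually adopted; in a current CPython:]

```python
import time

# %Z accepts "UTC" and "GMT" independent of the local timezone,
# and maps both to tm_isdst == 0.
for zone in ("UTC", "GMT"):
    t = time.strptime(zone, "%Z")
    print(zone, t.tm_isdst)
# UTC 0
# GMT 0
```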
Personally I would love to not have to support it, but it has been in the docs so I don't know if we can yank it without upsetting someone (although strptime has always been questionable, so maybe we can rip it out). So, "public", should strptime be able to handle UTC and GMT as a timezone no matter what? How about taking in any 3-character timezone so that an error isn't raised but only set the DST flag if it is a known timezone? Perhaps %Z should accept 42 since it is the answer to everything? -Brett From tim_one@email.msn.com Sun Apr 13 08:05:54 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 03:05:54 -0400 Subject: [Python-Dev] Big trouble in CVS Python Message-ID: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Under current CVS, release build, running regrtest.py crashes very soon after entering test___all__.py for me, on two different machines (but both Windows). The C stack has gotten lost by this point, and the program counter is pointing into static data(!), about a dozen bytes beyond the start of Python's static PyFloat_Type type. Alas, there is no problem in a debug build. There's also no problem under the release build if I run the debug build first and leave the .pyc files behind. Removing the .pyc files and then running the release build dies every time. So maybe it's something to do with compiling Python programs, or maybe with a vagary of when cyclic gc triggers. The latter is high on my suspect list, because the location of the death is affected by regrtest's -t option, and the release build runs the tests to completion with -t0. If it's in gc, I probably caused it. So I'm not asking you to fix it <wink>. It would help to know if anyone is having problems under Linux, and especially if you are and the debugger there is more helpful in a release build. From martin@v.loewis.de Sun Apr 13 08:37:31 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 09:37:31 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <16024.50086.748997.76318@montanaro.dyndns.org> References: <16024.50086.748997.76318@montanaro.dyndns.org> Message-ID: <m3znmux2yr.fsf@mira.informatik.hu-berlin.de> Skip Montanaro <skip@pobox.com> writes: > Is it time to think seriously about moving away from SourceForge? Any proposal to move away from SourceForge should include a proposal where to move *to*. I highly admire SourceForge operators for their quality of service, and challenge anybody to provide the same quality service. Be prepared to find yourself in a full-time job if you want to take over. SourceForge performance was *much* worse in the past, and we didn't consider moving away, and SF fixed it by buying new hardware. Give them some time. Regards, Martin From drifty@alum.berkeley.edu Sun Apr 13 08:51:18 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 00:51:18 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> [Guido van Rossum] > > For the regression tests for the stdlib, is it okay to create temporary > > files (using tempfile) and connect to the Internet (when the network > > resource is enabled)? > > Tempfiles: definitely; though if you need a single temporary file, you > can use test_support.TESTFN. > Perfect. Exactly what I was looking for. > Connecting to the Internet: only if the network resource is enabled. > Then it is up to the tester to make sure that connection to the > Internet is possible. > Any suggestions on how to go about this? 
An initial connection to python.org after setting socket.setdefaulttimeout() to something reasonable (like 10 seconds?) and raising test_support.TestSkipped if it times out? -Brett From martin@v.loewis.de Sun Apr 13 08:58:27 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 09:58:27 +0200 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Message-ID: <m3n0iux1zw.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > If it's in gc, I probably caused it. So I'm not asking you to fix it > <wink>. It would help to know if anyone is having problems under Linux, and > especially if you are and the debugger there is more helpful in a release > build. It crashes for me as well, in test_builtin, with the backtrace #0 0x40340019 in main_arena () from /lib/libc.so.6 #1 0x080edad6 in visit_decref (op=0x8343fa4, data=0x80eda90) at Modules/gcmodule.c:236 #2 0x08097a70 in tupletraverse (o=0x40351e64, visit=0x80eda90 <visit_decref>, arg=0x0) at Objects/tupleobject.c:398 #3 0x080ed152 in collect (generation=2) at Modules/gcmodule.c:250 #4 0x080ed764 in gc_collect (self=0x0, noargs=0x0) at Modules/gcmodule.c:731 #5 0x080be763 in call_function (pp_stack=0xbfffee9c, oparg=24) at Python/ceval.c:3400 #6 0x080bcb9e in eval_frame (f=0x834013c) at Python/ceval.c:2091 #7 0x080bd685 in PyEval_EvalCodeEx (co=0x403aae60, globals=0x18, locals=0x0, args=0x834013c, argcount=0, kws=0x82fb2dc, kwcount=0, defs=0x403bd470, defcount=11, closure=0x0) at Python/ceval.c:2638 #8 0x080be81e in fast_function (func=0x40351e64, pp_stack=0xbffff02c, n=0, na=0, nk=0) at Python/ceval.c:3504 #9 0x080be671 in call_function (pp_stack=0xbffff02c, oparg=24) at Python/ceval.c:3433 #10 0x080bcb9e in eval_frame (f=0x82fb18c) at Python/ceval.c:2091 #11 0x080bd685 in PyEval_EvalCodeEx (co=0x4045a220, globals=0x18, locals=0x4036279c, 
args=0x82fb18c, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2638 The tuple being traversed has 19 elements, of types: NoneType, int, int, int, int, int, int, int, int, int, int, int, int, int, int, long, int, float, <NULL> It crashes on the last tuple element, which is a garbage pointer. Regards, Martin From guido@python.org Sun Apr 13 13:54:37 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 08:54:37 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 00:51:18 PDT." <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> Message-ID: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> > > Connecting to the Internet: only if the network resource is enabled. > > Then it is up to the tester to make sure that connection to the > > Internet is possible. > > Any suggestions on how to go about this? An initial connection to > python.org after setting socket.setdefaulttimeout() to something > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > times out? No, you check whether the 'network' resource name is enabled in test_support. Use test_support.is_resource_enabled('network'). --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sun Apr 13 14:05:01 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 13 Apr 2003 23:05:01 +1000 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBLEDAB.tim.one@comcast.net> Message-ID: <022e01c301bd$4b7f5a70$530f8490@eden> > Did you try -v, as > > > 'import site' failed; use -v for traceback > > suggested? Yep. 
as I said: > > Running with "-v" shows: Note that as I mentioned, this is only if you move away _sre.pyd. The original report was almost certainly a simple import error. Mark. From guido@python.org Sun Apr 13 14:22:35 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 09:22:35 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: Your message of "Sun, 13 Apr 2003 08:54:37 EDT." Message-ID: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> > > > Connecting to the Internet: only if the network resource is enabled. > > > Then it is up to the tester to make sure that connection to the > > > Internet is possible. > > > > Any suggestions on how to go about this? An initial connection to > > python.org after setting socket.setdefaulttimeout() to something > > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > > times out? > > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). I realize that you might not know how to run such tests either. The magic words are regrtest.py -u network BTW, this isn't described in Lib/test/README -- perhaps you or someone else can add it? (Both the -u option and the is_resource_enabled() function.) Hm, maybe these docs shouldn't be so hidden and there should be a standard library chapter on the test package and its submodules and the conventions for writing and running tests? 
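The resource convention itself is only a few lines. A standalone sketch using the same names as test_support (a model of the mechanism, not the module itself):

```python
class TestSkipped(Exception):
    """Raised by a test that cannot run in the current configuration."""

# regrtest.py -u network would put 'network' in here before running tests
use_resources = []

def is_resource_enabled(resource):
    return resource in use_resources

def requires(resource):
    """Skip the calling test unless `resource` was enabled with -u."""
    if not is_resource_enabled(resource):
        raise TestSkipped("use of the %r resource not enabled" % resource)
```

A network-using test calls requires('network') up front; the -u option is what populates use_resources, so an un-flagged run skips the test instead of failing it.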
--Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Sun Apr 13 19:13:13 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 13 Apr 2003 14:13:13 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCIENMEGAB.tim_one@email.msn.com> Message-ID: <1050257590.10278.19.camel@localhost.localdomain> On Sun, 2003-04-13 at 03:05, Tim Peters wrote: > Under current CVS, release build, running regrtest.py crashes very soon > after entering test___all__.py for me, on two different machines (but both > Windows). The C stack has gotten lost by this point, and the program > counter is pointing into static data(!), about a dozen bytes beyond the > start of Python's static PyFloat_Type type. Unfortunately, I don't see any problem at all in a release build on my Linux box. Jeremy From tim_one@email.msn.com Sun Apr 13 19:29:59 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 14:29:59 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <m3n0iux1zw.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCGEPCEGAB.tim_one@email.msn.com> [martin@v.loewis.de] > It crashes for me as well, in test_builtin, with the backtrace Wow! It took me hours to get there. 
Noting that Anthony and Jeremy report no problems, but Martin's symptom appears identical to mine: > #0 0x40340019 in main_arena () from /lib/libc.so.6 > #1 0x080edad6 in visit_decref (op=0x8343fa4, data=0x80eda90) at > Modules/gcmodule.c:236 > #2 0x08097a70 in tupletraverse (o=0x40351e64, visit=0x80eda90 > <visit_decref>, arg=0x0) > at Objects/tupleobject.c:398 > #3 0x080ed152 in collect (generation=2) at Modules/gcmodule.c:250 > #4 0x080ed764 in gc_collect (self=0x0, noargs=0x0) at > Modules/gcmodule.c:731 > #5 0x080be763 in call_function (pp_stack=0xbfffee9c, oparg=24) > at Python/ceval.c:3400 > #6 0x080bcb9e in eval_frame (f=0x834013c) at Python/ceval.c:2091 > #7 0x080bd685 in PyEval_EvalCodeEx (co=0x403aae60, globals=0x18, > locals=0x0, > args=0x834013c, argcount=0, kws=0x82fb2dc, kwcount=0, > defs=0x403bd470, defcount=11, > closure=0x0) at Python/ceval.c:2638 > #8 0x080be81e in fast_function (func=0x40351e64, > pp_stack=0xbffff02c, n=0, na=0, nk=0) > at Python/ceval.c:3504 > #9 0x080be671 in call_function (pp_stack=0xbffff02c, oparg=24) > at Python/ceval.c:3433 > #10 0x080bcb9e in eval_frame (f=0x82fb18c) at Python/ceval.c:2091 > #11 0x080bd685 in PyEval_EvalCodeEx (co=0x4045a220, globals=0x18, > locals=0x4036279c, > args=0x82fb18c, argcount=0, kws=0x0, kwcount=0, defs=0x0, > defcount=0, closure=0x0) > at Python/ceval.c:2638 > > The tuple being traversed has 19 elements, of types: > > NoneType, int, int, int, int, int, int, int, int, int, int, int, > int, int, int, long, int, float, <NULL> > > It crashes on the last tuple element, which is a garbage pointer. Exactly the same here. The tuple is the co_consts belonging to test_builtin's test_range. It's the 11th tuple of size 19 created <wink/sigh>. At the time compile.c's jcompile created the tuple: consts = PyList_AsTuple(sc.c_consts); the last element was fine, a float with value 1.e101, from test_range's self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) Alas, none of that helps. 
At the time of the crash, the last tuple entry still points to the memory for that floatobject, but the memory has been scribbled over. The first 18 tuple elements appear still to be intact. My suspicion that it's a gc problem has gotten weaker to the point of thinking that's unlikely. It looks more like gc is suffering the effects of something else scribbling over memory it ought not to be poking. From drifty@alum.berkeley.edu Sun Apr 13 20:50:39 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 12:50:39 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > Connecting to the Internet: only if the network resource is enabled. > > > Then it is up to the tester to make sure that connection to the > > > Internet is possible. > > > > Any suggestions on how to go about this? An initial connection to > > python.org after setting socket.setdefaulttimeout() to something > > reasonable (like 10 seconds?) and raising test_support.TestSkipped if it > > times out? > > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). > Actually I knew that. What I was wondering about was how "to make sure that connection to Internet is possible". 
-Brett From drifty@alum.berkeley.edu Sun Apr 13 20:53:55 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 12:53:55 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> [Guido van Rossum] <snip - Me trying to find out whether it's OK to use the Net in tests> > > No, you check whether the 'network' resource name is enabled in > > test_support. Use test_support.is_resource_enabled('network'). > > I realize that you might not know how to run such tests either. The > magic words are > > regrtest.py -u network > > BTW, this isn't described in Lib/test/README -- perhaps you or someone > else can add it? (Both the -u option and the is_resource_enabled() > function.) > I can write some basic instructions on how to use regrtest and test_support; someone will just have to check them in. > Hm, maybe these docs shouldn't be so hidden and there should be a > standard library chapter on the test package and its submodules and > the conventions for writing and running tests? > That definitely wouldn't hurt. It might also get people to write tests more often and maybe help with improving our code if they knew about regrtest and test_support. -Brett From tim_one@email.msn.com Sun Apr 13 20:54:05 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 15:54:05 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEPCEGAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> >> The tuple being traversed has 19 elements, of types: >> >> NoneType, int, int, int, int, int, int, int, int, int, int, int, >> int, int, int, long, int, float, <NULL> >> >> It crashes on the last tuple element, which is a garbage pointer. 
> Exactly the same here. The tuple is the co_consts belonging to > test_builtin's test_range. It's the 11th tuple of size 19 created > <wink/sigh>. At the time compile.c's jcompile created the tuple: > > consts = PyList_AsTuple(sc.c_consts); > > the last element was fine, a float with value 1.e101, from test_range's > > self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > > Alas, none of that helps. At the time of the crash, the last tuple > entry still points to the memory for that floatobject, but the memory > has been scribbled over. The first 18 tuple elements appear still to > be intact. > > My suspicion that it's a gc problem has gotten weaker to the point of > thinking that's unlikely. It looks more like gc is suffering the > effects of something else scribbling over memory it ought not to be > poking. Next clue: the damaged float object was earlier (much earlier) deallocated. Its refcount (in co_consts) started as 1, and it fell to 0 via the tail end of call_function(): /* What does this do? */ while ((*pp_stack) > pfunc) { w = EXT_POP(*pp_stack); Py_DECREF(w); PCALL(PCALL_POP); } However, co_consts is still alive and still points to it, so this deallocation is erroneous. float_dealloc abuses the ob_type field to maintain a free list: op->ob_type = (struct _typeobject *)free_list; free_list is a file static. This explains why the tp_traverse slot ends up pointing into static data in floatobject.c. Given this, there's approximately no chance gc *caused* it. Who's been mucking with function calls (or maybe the eval loop) recently? 
From jeremy@alum.mit.edu Sun Apr 13 21:33:38 2003 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: 13 Apr 2003 16:33:38 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <1050266017.10278.24.camel@localhost.localdomain> On Sun, 2003-04-13 at 15:54, Tim Peters wrote: > Given this, there's approximately no chance gc *caused* it. Who's been > mucking with function calls (or maybe the eval loop) recently? > We've had a lot of changes to the function call implementation over the last couple of months. What's the chance that this is just the first time we've noticed the problem? Seems pretty plausible that the recent GC changes just exposed an earlier bug. Jeremy From tim_one@email.msn.com Sun Apr 13 21:28:13 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 16:28:13 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> >> self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > ... > Given this, there's approximately no chance gc *caused* it. Who's been > mucking with function calls (or maybe the eval loop) recently? It appears to be a refcount error in recently-added C code that tries to generalize the builtin range() function, specifically here: Fail: Py_XDECREF(curnum); Py_XDECREF(istep); <- here Py_XDECREF(zero); Word to the wise: don't ever try to reuse a variable whose address is passed to PyArg_ParseTuple for anything other than holding what PyArg_ParseTuple does or doesn't store into it. You'll never get the decrefs straight (and even if you manage to at first, the next person to modify your code will break it). only-consumed-eight-hours-this-time<wink>-ly y'rs - tim From martin@v.loewis.de Sun Apr 13 21:29:28 2003 From: martin@v.loewis.de (Martin v. 
=?iso-8859-15?q?L=F6wis?=) Date: 13 Apr 2003 22:29:28 +0200 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCKEPHEGAB.tim_one@email.msn.com> Message-ID: <m3brza2lav.fsf@mira.informatik.hu-berlin.de> "Tim Peters" <tim_one@email.msn.com> writes: > However, co_consts is still alive and still points to it, so this > deallocation is erroneous. Notice, however, that the float object is not *directly* deallocated. Instead, it is deallocated as a consequence of deallocating a one-element tuple which is the argument tuple for "round", in PyObject *callargs; callargs = load_args(pp_stack, na); x = PyCFunction_Call(func, callargs, NULL); Py_XDECREF(callargs); load_args copies the argument from the stack into the tuple, transferring the reference. So apparently, the float const gets on the stack without its reference being bumped... That's as far as I can get tonight. Regards, Martin From mwh@python.net Sun Apr 13 21:49:17 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 13 Apr 2003 21:49:17 +0100 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sat, 12 Apr 2003 09:43:52 -0400") References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mbrzap1gy.fsf@starship.python.net> Guido van Rossum <guido@python.org> writes: > Someone accidentally discovered a way to set attributes of built-in > types, even though the implementation tries to prevent this. [snip] > What to do about this? Well, one approach would be special cases in PyObject_GenericSetAttr, I guess. Cheers, M. -- > So what does "abc" / "ab" equal?
cheese -- Steve Holden defends obscure semantics on comp.lang.python From tim_one@email.msn.com Sun Apr 13 22:44:57 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 17:44:57 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <m3brza2lav.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> [martin@v.loewis.de] > Notice, however, that the float object is not *directly* deallocated. > Instead, it is deallocated as a consequence of deallocating a > one-element tuple which is the argument tuple for "round", in > > PyObject *callargs; > callargs = load_args(pp_stack, na); > x = PyCFunction_Call(func, callargs, NULL); > Py_XDECREF(callargs); > > load_args copies the argument from the stack into the tuple, > transferring the refence. So apparently, the float const gets on the > stack without its reference being bumped... That was my excited guess, until I looked at LOAD_CONST <wink>. Calls are such an elaborate dance that the refcount on this puppy gets as high as 7. The problem actually occurred when the refcount was at its peak, due to an erroneous decref in handle_range_longs(). At that point the refcount fell to 6, and the remaining 6(!) decrefs all looked correct. > That's as far as I can get tonight. Thanks for sharing the pain! 
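The class of bug being chased -- a shared cleanup label that decrefs a variable still holding the *borrowed* reference PyArg_ParseTuple stored into it -- can be modeled in a few lines (toy refcounting and hypothetical function names, not the real handle_range_longs):

```python
class Ref:
    """Stand-in for a PyObject with an explicit reference count."""
    def __init__(self):
        self.refcnt = 1                 # the caller's single owned reference

def py_xdecref(obj):
    if obj is not None:
        obj.refcnt -= 1

def range_longs_buggy(istep):
    # istep was filled in by PyArg_ParseTuple("...O...") -> borrowed.
    # Something fails before istep is replaced by an owned reference,
    # and the shared Fail label decrefs everything indiscriminately:
    py_xdecref(istep)                   # BUG: releases a reference never owned

def range_longs_fixed(istep):
    # Correct cleanup leaves the borrowed reference alone.
    pass
```

In the buggy path the caller's reference silently disappears, so the object is freed while co_consts still points at it -- exactly the damage seen above.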
From drifty@alum.berkeley.edu Sun Apr 13 22:51:14 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 14:51:14 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> [Guido van Rossum] <snip - question about tests using the Internet> > No, you check whether the 'network' resource name is enabled in > test_support. Use test_support.is_resource_enabled('network'). > Another thought that has come to mind; should we be diligent about creating new objects like good testers? Or should we minimize it since net connections are expensive to make and can hold things up. -Brett From tim_one@email.msn.com Sun Apr 13 23:07:05 2003 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 13 Apr 2003 18:07:05 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <1050266017.10278.24.camel@localhost.localdomain> Message-ID: <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> [Jeremy Hylton] > We've had a lot of changes to the function call implementation over the > last couple of months. What's the chance that this is just the first > time we've noticed the problem? Slim, I think -- anything systematically screwing up refcounts on calls would have lots of opportunities to create trouble. This one was unique and shy. > Seems pretty plausible that the recent GC changes just exposed an > earlier bug. For all the code changes, the only intended semantic difference was in has_finalizer's implementation details. So that didn't seem likely either. 
Turned out that the damaged co_consts was attached to the test that exercised the new C code at fault. The code was compiled gazillions of cycles before the test was executed, though, and gazillions more cycles passed before GC bumped into the damage. If gc hadn't bumped into it, the memory would have gotten allocated to some other float, and then would have been decref'ed incorrectly when the original co_consts got deallocated. So it *could* have been much harder to track down <shudder>. What I still don't grasp is why a debug run never failed with a negative-refcount error. Attaching the prematurely-freed float to the float free list doesn't change its refcount field -- that remains 0. So if it was still in the free list when the original co_consts got reclaimed, we should have had a negrefcnt death. OTOH, if the memory was handed out to another float, then when the original co_consts got reclaimed it would have knocked that float's refcount down too, which should lead to a negrefcnt death later. Maybe co_consts never did get reclaimed? I'm not clear on how much we let slide at shutdown. From skip@pobox.com Sun Apr 13 23:15:07 2003 From: skip@pobox.com (Skip Montanaro) Date: Sun, 13 Apr 2003 17:15:07 -0500 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131249140.22203@death.OCF.Berkeley.EDU> Message-ID: <16025.57707.389009.819692@montanaro.dyndns.org> Brett> Actually I knew that. What I was wondering about was how "to Brett> make sure that connection to Internet is possible". If s/he runs ./python Lib/test/regrtest.py -u network you believe the user. 
;-) Skip From aahz@pythoncraft.com Mon Apr 14 00:21:39 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 13 Apr 2003 19:21:39 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> References: <1050266017.10278.24.camel@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCOEPPEGAB.tim_one@email.msn.com> Message-ID: <20030413232138.GA6811@panix.com> On Sun, Apr 13, 2003, Tim Peters wrote: > > What I still don't grasp is why a debug run never failed with a > negative-refcount error. Attaching the prematurely-freed float to the > float free list doesn't change its refcount field -- that remains 0. > So if it was still in the free list when the original co_consts got > reclaimed, we should have had a negrefcnt death. OTOH, if the memory > was handed out to another float, then when the original co_consts got > reclaimed it would have knocked that float's refcount down too, which > should lead to a negrefcnt death later. Maybe co_consts never did get > reclaimed? I'm not clear on how much we let slide at shutdown. Maybe debug runs should walk through "the universe" to make sure it's in a valid state before exiting? I remember being confused that gc doesn't run when Python exits. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ This is Python. We don't care much about theory, except where it intersects with useful practice. --Aahz, c.l.py, 2/4/2002 From guido@python.org Mon Apr 14 01:55:18 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 20:55:18 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 14:51:14 PDT." 
<Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.53.0304121518440.6356@death.OCF.Berkeley.EDU> <200304130008.h3D084v02375@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304130048580.8131@death.OCF.Berkeley.EDU> <200304131254.h3DCscF17625@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131449510.17015@death.OCF.Berkeley.EDU> Message-ID: <200304140055.h3E0tIP26895@pcp02138704pcs.reston01.va.comcast.net> > > No, you check whether the 'network' resource name is enabled in > > test_support. Use test_support.is_resource_enabled('network'). > > Another thought that has come to mind; should we be diligent about > creating new objects like good testers? Or should we minimize it since > net connections are expensive to make and can hold things up. Net connections aren't that expensive; you can happily create a new net connection for each individual test. Of course, tests that hold things up should be minimized, but in my experience, tests containing waits (even sleep(0.1)) hold things up much more than opening and closing sockets. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 01:59:52 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 20:59:52 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sun, 13 Apr 2003 21:49:17 BST." <2mbrzap1gy.fsf@starship.python.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <2mbrzap1gy.fsf@starship.python.net> Message-ID: <200304140059.h3E0xqH26915@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum <guido@python.org> writes: > > > Someone accidentally discovered a way to set attributes of built-in > > types, even though the implementation tries to prevent this. > > [snip] > > > What to do about this? Michael Hudson: > Well, one approach would be special cases in PyObject_GenericSetAttr, > I guess. 
That's not quite enough, because PyObject_GenericSetAttr also gets called by code that should be allowed; I don't want to move all of the special processing from type_setattro() to PyObject_GenericSetAttr. But, having thought some more about this, I think adding a check to wrap_setattr() might be the thing to do. That gets called when you call object.__setattr__(x, "foo", value), but not when you do x.foo = value, so it's okay if it slows it down a tad. The test should make sure that self->ob_type->tp_setattro equals func, or something like that (haven't thought enough about the exact test which allows calling object.__setattr__ from a subclass that extends __setattr__ but not in the offending case). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 02:01:46 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 21:01:46 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: "Your message of Sun, 13 Apr 2003 16:28:13 EDT." <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <200304140101.h3E11kg26948@pcp02138704pcs.reston01.va.comcast.net> > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). It's possible that I introduced that bug when I reworked the patch to use a single label rather than one for each variable. 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 14 02:02:46 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 13 Apr 2003 21:02:46 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 12:53:55 PDT." <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> Message-ID: <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> > I can write some basic instructions on how to use regrtest and > test_support; someone will just have to check them in. That would be great. Do you have a SF userid yet? Then we can give you commit privs! > > > Hm, maybe these docs shouldn't be so hidden and there should be a > > standard library chapter on the test package and its submodules and > > the conventions for writing and running tests? > > That definitely wouldn't hurt. It might also get people to write > tests more often and maybe help with improving our code if they knew > about regrtest and test_support. And I think regrtest and test_support are useful for testing 3rd party code as well. Wanna make this a project? --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Mon Apr 14 02:17:14 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 14 Apr 2003 13:17:14 +1200 (NZST) Subject: [Python-Dev] Evil setattr hack In-Reply-To: <2mbrzap1gy.fsf@starship.python.net> Message-ID: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> Michael Hudson: > one approach would be special cases in PyObject_GenericSetAttr, > I guess. Before using a hack like that, it might be better to think about what the real problem is.
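The back door itself is a single call: invoking the generic slot directly on a built-in type, bypassing type.__setattr__'s guard. With the wrap_setattr check in place (as it is in released CPython), the call is rejected outright:

```python
# Sketch of the "evil setattr hack" being discussed: object.__setattr__
# called directly on a built-in type.  With the wrap_setattr sanity
# check installed, this raises TypeError instead of mutating int.
blocked = False
try:
    object.__setattr__(int, "evil", 42)
except TypeError:
    blocked = True
```

The check compares the type's own tp_setattro against the slot being invoked, which is what distinguishes the offending call from a legitimate super-call in a subclass.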
Seems to me the problem in general is that there's no way to prevent a class which overrides a method from having a superclass version of that method called through a back door. Which means you can't rely on method overriding to *restrict* what can be done to an object. So a proper fix would require either: (1) Providing some way for objects to prevent superclass methods from being called on them when they're not looking or (2) Fixing the typeobject not to rely on that for its security -- by hiding the real dict more deeply somehow? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From drifty@alum.berkeley.edu Mon Apr 14 03:13:30 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 13 Apr 2003 19:13:30 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> [Guido van Rossum] > > I can write some basic instructions on how to use regrtest and > > test_support; someone will just have to check them in. > > That would be great. Do you have a SF userid yet? Then we can give > you commit privs! > bcannon is my username. I was going to wait to ask for commit privs until I had done more patches (specifically C stuff), but if you think I am ready for it then it would be extremely cool to get commit privs (and not have to wait for anonymous CVS updates when the servers get overloaded or bug people to commit _strptime patches =). 
> > > Hm, maybe these docs shouldn't be so hidden and there should be a > > > standard library chapter on the test package and its submodules and > > > the conventions for writing and running tests? > > > > That definitely wouldn't hurt. It might also get people to write > > tests more often and maybe help with improving our code if they knew > > about regrtest and test_support. > > And I think regrtest and test_support are useful for testing 3rd party > code as well. Wanna make this a project? > I could. Going to have to learn more LaTeX (and the special extensions). So I can take this on, but I can't make any promises on when this will get done (I would be personally horrified if I can't get this done before 2.3 final gets out the door, but you never know). Should there be a testing SIG? It could keep a list of tests that could stand to be rewritten or added (I know I was surprised to discover test_urllib was so lacking). It could also start by hashing out these docs and making sure regrtest and test_support stay updated and relevant. Personally, I think writing regression tests is a good way to get new people to help with Python. They are simple to write and allow someone to get involved beyond just filing a bug. It was a thrill for me the first time I got code checked in; easing the entry point by getting more people to write regression tests for the libraries might give someone else that rush and draw them further in. Or maybe I am just bonkers.
=) -Brett From mwh@python.net Mon Apr 14 07:33:25 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 14 Apr 2003 07:33:25 +0100 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> (Greg Ewing's message of "Mon, 14 Apr 2003 13:17:14 +1200 (NZST)") References: <200304140117.h3E1HEv08476@oma.cosc.canterbury.ac.nz> Message-ID: <2m65phpozu.fsf@starship.python.net> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > Guido: Er, this was me. >> one approach would be special cases in PyObject_GenericSetAttr, >> I guess. > > Before using a hack like that, it might be better to think about what > the real problem is. Aww :-) > Seems to me the problem in general is that there's no way to prevent a > class which overrides a method from having a superclass version of > that method called through a back door. Which means you can't rely on > method overriding to *restrict* what can be done to an object. > > So a proper fix would require either: > > (1) Providing some way for objects to prevent superclass > methods from being called on them when they're not looking > > or > > (2) Fixing the typeobject not to rely on that for its security -- > by hiding the real dict more deeply somehow? Yeah, another option would be to make _PyObject_GetDictPtr respect __dict__ descriptors. But that's probably the Wrong Answer, too. Maybe just PyObject_GenericSetAttr should do that -- call PyObject_GetAttr(ob, '__dict__'), basically. bad-answers-on-demand-ly y'rs M. -- We did requirements and task analysis, iterative design, and user testing. You'd almost think programming languages were an interface between people and computers. 
-- Steven Pemberton (one of the designers of Python's direct ancestor ABC) From mwh@python.net Mon Apr 14 07:36:59 2003 From: mwh@python.net (Michael Hudson) Date: Mon, 14 Apr 2003 07:36:59 +0100 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> ("Tim Peters"'s message of "Sun, 13 Apr 2003 17:44:57 -0400") References: <LNBBLJKPBEHFEDALKOLCMEPNEGAB.tim_one@email.msn.com> Message-ID: <2mznmtoa9g.fsf@starship.python.net> "Tim Peters" <tim_one@email.msn.com> writes: > That was my excited guess, until I looked at LOAD_CONST <wink>. Calls are > such an elaborate dance that the refcount on this puppy gets as high as 7. > The problem actually occurred when the refcount was at its peak, due to an > erroneous decref in handle_range_longs(). At that point the refcount fell > to 6, and the remaining 6(!) decrefs all looked correct. It seems to me that this would have been found much more easily if floats didn't have a free list anymore... Cheers, M. -- I don't remember any dirty green trousers. -- Ian Jackson, ucam.chat From guido@python.org Mon Apr 14 12:52:29 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 07:52:29 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Sun, 13 Apr 2003 19:13:30 PDT." <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> Message-ID: <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> > > That would be great. Do you have a SF userid yet? Then we can give > > you commit privs! > > bcannon is my username. 
I was going to wait to ask for commit privs > until I had done more patches (specifically C stuff), but if you > think I am ready for it then it would be extremely cool to get > commit privs (and not have to wait for anonymous CVS updates when > the servers get overloaded or bug people to commit _strptime patches > =). OK, you're on. > I could. Going to have to learn more LaTeX (and the special > extensions). So I can take this on, but I can't make any promises > on when this will get done (I would be personally horrified if I > can't get this done before 2.3 final gets out the door, but you > never know). With LaTeX, the monkey-see-monkey-do approach works pretty well, combined with the Fred-will-fix-my-LaTeX-bugs approach. :-) > Should there be a testing SIG? Could keep a list of tests that > could stand to be rewritten or added (I know I was surprised to > discover test_urllib was so lacking). Could also start by hashing > out these docs and making sure regrtest and test_support stay > updated and relevant. I don't know about a SIG. Testing of what's in the core is fair game for python-dev. 3rd party testing, ask around. > Personally, I think writing regression tests is a good way to get > new people to help with Python. They are simple to write and allows > someone to be able to get involved beyond just filing a bug. I know > it was a thrill for me the first time I got code checked in and > maybe making the entry point easier by trying to get more people to > write more regression tests for the libraries will help give someone > else that rush and thus become more involved. > > Or maybe I am just bonkers. =) Writing a good regression test requires excellent knowledge about the code you're testing while not touching it, so that's indeed a good way to learn. 
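A minimal example of what such a test module looks like may help here. Everything below is made up for illustration (ReplaceTest is not a real file from Lib/test); under the layout being discussed, the module would finish by handing the class to test_support.run_unittest() so that regrtest.py could drive it, but this sketch just runs itself with plain unittest:

```python
import unittest

class ReplaceTest(unittest.TestCase):
    """Regression-style checks for str.replace -- small, focused
    assertions of the kind discussed above (a made-up example)."""

    def test_simple(self):
        self.assertEqual("spam".replace("a", "o"), "spom")

    def test_count_limits_replacements(self):
        # The optional third argument caps the number of substitutions.
        self.assertEqual("aaa".replace("a", "b", 2), "bba")

# In the Lib/test layout of the time, this would instead be
# test_support.run_unittest(ReplaceTest), letting regrtest.py drive it.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReplaceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point of the conventions is exactly that a module this small can be picked up and run by the shared driver without any extra scaffolding.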
--Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Mon Apr 14 13:06:39 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 14:06:39 +0200 Subject: [Python-Dev] GIL vs thread state Message-ID: <r885xoz4.fsf@python.net> The docs for PyThreadState_Clear() state that the interpreter lock must be held. I had this code in ctypes to delete the thread state and release the lock:

static void LeavePython(char *msg)
{
    PyThreadState *pts = PyThreadState_Swap(NULL);
    if (!pts)
        Py_FatalError("wincall (LeavePython): ThreadState is NULL?");
    PyThreadState_Clear(pts);
    PyThreadState_Delete(pts);
    PyEval_ReleaseLock();
}

and (under certain conditions, when ptr->frame was not NULL), got "Fatal Python error: PyThreadState_Get: no current thread" in the call to PyThreadState_Clear(). The GIL is held while this code is executed, although there is no thread state. Changing the code to the following fixes the problem; it seems that holding the GIL is not enough:

static void LeavePython(char *msg)
{
    PyThreadState *pts = PyThreadState_Get();
    if (!pts)
        Py_FatalError("wincall (LeavePython): ThreadState is NULL?");
    PyThreadState_Clear(pts);
    pts = PyThreadState_Swap(NULL);
    PyThreadState_Delete(pts);
    PyEval_ReleaseLock();
}

Is this a documentation problem, or a misunderstanding on my side? And, while we're on it, does the second version look ok? Thomas From uche.ogbuji@fourthought.com Mon Apr 14 15:10:12 2003 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 14 Apr 2003 08:10:12 -0600 Subject: [Python-Dev] List wisdom In-Reply-To: Message from "Tim Peters" <tim_one@email.msn.com> of "Sun, 13 Apr 2003 16:28:13 EDT." <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <E1954ey-0007KU-00@borgia.local> > >> self.assertRaises(ValueError, range, 1e100, 1e101, 1e101) > > ... > > Given this, there's approximately no chance gc *caused* it. Who's been > > mucking with function calls (or maybe the eval loop) recently?
> > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). This snippet sparked a little chain of events for me. I'm sure I've violated the principle before (foolishly trying to avoid declaring yet more C variables: I've always known it's bad style, but never thought it dangerous). I wanted to know whether this wisdom could be found anywhere a Python/C programmer would be likely to browse. So I dug through the Python Wiki, and found no such page of gems (just a lot of whimsical quotes from #python and a code-sharing page with some odd trinkets). I also checked to see if #python had a chump (opt-in log) on which I could put the quote. No dice. I did chump it on the #4suite log: http://uche.ogbuji.net/tech/akara/?xslt=irc.xslt&date=2003-04-14#14:03:38 I also created a Python Wiki page for useful notes and code snippets from this mailing list: http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom Please feel free to use it if anything here seems especially important to highlight (in addition to Brett Cannon's tireless work, of course). Thanks. hoping-to-save-others-an-eight-hour-odyssey-ly y'rs -- Uche Ogbuji Fourthought, Inc. 
http://uche.ogbuji.net http://4Suite.org http://fourthought.com Gems From the [Python/XML] Archives - http://www.xml.com/pub/a/2003/04/09/py-xml.html Introducing N-Triples - http://www-106.ibm.com/developerworks/xml/library/x-think17/index.html Use internal references in XML vocabularies - http://www-106.ibm.com/developerworks/xml/library/x-tipvocab.html EXSLT by example - http://www-106.ibm.com/developerworks/library/x-exslt.html The worry about program wizards - http://www.adtmag.com/article.asp?id=7238 Use rdf:about and rdf:ID effectively in RDF/XML - http://www-106.ibm.com/developerworks/xml/library/x-tiprdfai.html From guido@python.org Mon Apr 14 15:10:28 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 10:10:28 -0400 Subject: [Python-Dev] GIL vs thread state In-Reply-To: Your message of "14 Apr 2003 14:06:39 +0200." <r885xoz4.fsf@python.net> References: <r885xoz4.fsf@python.net> Message-ID: <200304141410.h3EEAeZ14896@odiug.zope.com> > The docs for PyThreadState_Clear() state that the interpreter lock must > be held. > > I had this code in ctypes to delete the thread state and release the lock: > > static void LeavePython(char *msg) > { > PyThreadState *pts = PyThreadState_Swap(NULL); > if (!pts) > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > PyThreadState_Clear(pts); > PyThreadState_Delete(pts); > PyEval_ReleaseLock(); > } > > and (under certain conditions, when ptr->frame was not NULL), got What is ptr->frame? A typo for pts->frame? If pts->frame is not NULL, I'd expect a warning from PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has a frame\n". > "Fatal Python error: PyThreadState_Get: no current thread" in the call > to PyThreadState_Clear(). That's strange, because I cannot trace the code in there to such a call. (Unless it is in a destructor. Can you tell more about where the PyThreadState_Get() call was?) > The GIL is held while this code is executed, although there is no thread state.
Changing the code to the following fixes the problem, it seems > holding the GIL is not enough: > > static void LeavePython(char *msg) > { > PyThreadState *pts = PyThreadState_Get(); > if (!pts) > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > PyThreadState_Clear(pts); > pts = PyThreadState_Swap(NULL); > PyThreadState_Delete(pts); > PyEval_ReleaseLock(); > } > > Is this a documentation problem, or a misunderstanding on my side? > And, while we're on it, does the second version look ok? > > Thomas > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From paul@prescod.net Mon Apr 14 15:34:22 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 07:34:22 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> Message-ID: <3E9AC6EE.8010900@prescod.net> Tim Peters wrote: > ... > > Hiding critical resources in closures is a Bad Idea, of course -- that's why > nobody has used Scheme since 1993 <wink> Just to be clear, I didn't really intend to create a closure (i.e. a package of code and data). I just defined a function in a function because the inner function wasn't needed elsewhere. I don't know what the solution is, but it seems quite serious to me that there is another special case to remember when reasoning about when destructors get called. Roughly, Python's cleanup model is "things get destroyed when nothing refers to them." Then, that gets clarified to "unless they have reference cycles, in which case they may get destroyed arbitrarily later" and now "or they are used in a function containing another function, which will cause a circular reference involving all local variables." 
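The special case Paul describes is easy to reproduce. The sketch below uses made-up names and assumes CPython, where plain reference counting is distinct from the cycle collector: a recursive nested function survives the call that created it, and only a collector pass reclaims it.

```python
import gc
import weakref

def outer():
    def inner(n):
        if n:
            inner(n - 1)  # 'inner' is resolved through a closure cell,
                          # so the function object references itself
    inner(2)
    return weakref.ref(inner)

gc.collect()                         # start from a clean slate
ref = outer()
alive_before_gc = ref() is not None  # the cell <-> function cycle keeps
                                     # inner alive after outer() returns
gc.collect()
alive_after_gc = ref() is not None   # the cycle collector has now freed it
print(alive_before_gc, alive_after_gc)
```

Anything else held in inner's closure would stay alive along with it for the same interval, which is exactly the deferred-destructor behavior being complained about.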
Paul Prescod From guido@python.org Mon Apr 14 15:50:10 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 10:50:10 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 07:34:22 PDT." <3E9AC6EE.8010900@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> Message-ID: <200304141450.h3EEoAx15118@odiug.zope.com> > From: Paul Prescod <paul@prescod.net> > > Roughly, Python's cleanup model is "things get destroyed when > nothing refers to them." This hasn't been the mantra since Jython was introduced. Since then, the rule has always been "some arbitrary time after nothing refers to them." And the corollary is "always explicitly close your external resources." --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Mon Apr 14 15:52:42 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 10:52:42 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9AC6EE.8010900@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> Message-ID: <1050331961.28028.4.camel@slothrop.zope.com> On Mon, 2003-04-14 at 10:34, Paul Prescod wrote: > I don't know what the solution is, but it seems quite serious to me that > there is another special case to remember when reasoning about when > destructors get called. Roughly, Python's cleanup model is "things get > destroyed when nothing refers to them." Then, that gets clarified to > "unless they have reference cycles, in which case they may get destroyed > arbitrarily later" and now "or they are used in a function containing > another function, which will cause a circular reference involving all > local variables." The details of when finalizers are called are an implementation detail rather than a language property.
You should add to your list of worries: An object is not finalized when it is reachable from a cycle of objects involving finalizers. They don't get destroyed at all. Finalizers seem useful in general, but I would still worry about any specific program that managed critical resources using finalizers. Jeremy From theller@python.net Mon Apr 14 15:58:50 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 16:58:50 +0200 Subject: [Python-Dev] GIL vs thread state In-Reply-To: <200304141410.h3EEAeZ14896@odiug.zope.com> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> Message-ID: <llydxh05.fsf@python.net> Guido van Rossum <guido@python.org> writes: > > The docs for PyThreadState_Clear() state that the interpreter lock must > > be held. > > > > I had this code in ctypes to delete the thread state and release the lock: > > > > static void LeavePython(char *msg) > > { > > PyThreadState *pts = PyThreadState_Swap(NULL); > > if (!pts) > > Py_FatalError("wincall (LeavePython): ThreadState is NULL?"); > > PyThreadState_Clear(pts); > > PyThreadState_Delete(pts); > > PyEval_ReleaseLock(); > > } > > > > and (under certain conditions, when ptr->frame was not NULL), got > > What is ptr->frame? A typo for pts->frame? Right, sorry. > > If pts->frame is not NULL, I'd expect a warning from > PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has > a frame\n". You mean this code, from Python/pystate.c?

void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
            "PyThreadState_Clear: warning: thread still has a frame\n");
    ZAP(tstate->frame);
    ZAP(tstate->dict);
    ...
}

Py_VerboseFlag is 0 in my case, so no warning is printed. > > > "Fatal Python error: PyThreadState_Get: no current thread" in the call > > to PyThreadState_Clear(). > > That's strange, because I cannot trace the code in there to such a > call. (Unless it is in a destructor.
It is in a destructor: frame_dealloc, called from ZAP(tstate->frame). > Can you tell more about where > the PyThreadState_Get() call was?) This function allocates the threadstate for me:

static void EnterPython(char *msg)
{
    PyThreadState *pts;
    PyEval_AcquireLock();
    pts = PyThreadState_New(g_interp);
    if (!pts)
        Py_FatalError("wincall: Could not allocate ThreadState");
    if (NULL != PyThreadState_Swap(pts))
        Py_FatalError("wincall (EnterPython): thread state not == NULL?");
}

To explain the picture a little better, here is the sequence of calls: Python calls into a C extension. The C extension does

Py_BEGIN_ALLOW_THREADS
call_a_C_function()
Py_END_ALLOW_THREADS

The call_a_C_function calls back into C code like this:

void MyCallback(void)
{
    EnterPython();  /* acquire the lock, and create a thread state */
    execute_some_python_code();
    LeavePython();  /* destroy the thread state, and release the lock */
}

Now, the execute_some_python_code() section is enclosed in a win32 structured exception handling block, and it may return still with a frame in the threadstate, as it seems. Oops, I just tried the code in CVS python, and the problem goes away. Same for 2.3a2. But my code has to run in 2.2.2 as well... Thomas Here's the stack from python 2.2.2: NTDLL!
77f6f570() PyThreadState_Get() line 246 + 10 bytes PyErr_Fetch(_object * * 0x0012f944, _object * * 0x0012f954, _object * * 0x0012f948) line 215 + 5 bytes call_finalizer(_object * 0x0095ef20) line 382 + 17 bytes subtype_dealloc(_object * 0x0095ef20) line 434 + 9 bytes _Py_Dealloc(_object * 0x0095ef20) line 1837 + 7 bytes frame_dealloc(_frame * 0x00890c20) line 82 + 79 bytes _Py_Dealloc(_object * 0x00890c20) line 1837 + 7 bytes PyThreadState_Clear(_ts * 0x0095d2a0) line 174 + 86 bytes LeavePython(char * 0x1001125c) line 41 + 10 bytes From guido@python.org Mon Apr 14 16:18:07 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 11:18:07 -0400 Subject: [Python-Dev] GIL vs thread state In-Reply-To: Your message of "14 Apr 2003 16:58:50 +0200." <llydxh05.fsf@python.net> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> <llydxh05.fsf@python.net> Message-ID: <200304141518.h3EFI7w16583@odiug.zope.com> > > If pts->frame is not NULL, I'd expect a warning from > > PyThreadState_Clear(): "PyThreadState_Clear: warning: thread still has > > a frame\n". > You mean this code, from Python/pystate.h? > > void > PyThreadState_Clear(PyThreadState *tstate) > { > if (Py_VerboseFlag && tstate->frame != NULL) > fprintf(stderr, > "PyThreadState_Clear: warning: thread still has a frame\n"); > > ZAP(tstate->frame); > > ZAP(tstate->dict); > ... > } Yes. > Py_VerboseFlag is 0 set in my case, so no warning is printed. OK. > > > "Fatal Python error: PyThreadState_Get: no current thread" in the call > > > to PyThreadState_Clear(). > > > > That's strange, because I cannot trace the code in there to such a > > call. (Unless it is in a destructor. > > It is in a destructor: frame_dealloc, called from ZAP(tstate->frame). Aha. That wasn't obvious from your description. > > Can you tell more about where the PyThreadState_Get() call was?) 
> > This function allocates the threadstate for me: > > static void EnterPython(char *msg) > { > PyThreadState *pts; > PyEval_AcquireLock(); > pts = PyThreadState_New(g_interp); > if (!pts) > Py_FatalError("wincall: Could not allocate ThreadState"); > if (NULL != PyThreadState_Swap(pts)) > Py_FatalError("wincall (EnterPython): thread state not == NULL?"); > } Maybe you should have a look at Mark Hammond's PEP 311. It describes the problem and proposes a better solution. (I think it requires you to always use the existing thread state for the thread, rather than making up a temporary thread state as is currently the idiom.) > To explain the picture a little better, here is the sequence of calls: > > Python calls into a C extension. > The C extension does > > Py_BEGIN_ALLOW_THREADS > call_a_C_function() > Py_END_ALLOW_THREADS > > The call_a_C_function calls back into C code like this: > > void MyCallback(void) > { > EnterPython(); /* acquire the lock, and create a thread state */ > execute_some_python_code(); > LeavePython(); /* destroy the thread state, and release the lock */ > } > > Now, the execute_some_python_code() section is enclosed in a win32 > structured exception handling block, and it may return still with a > frame in the threadstate, as it seems. Ouch! I don't know what structured exception handling is, but this looks like it would be as bad as using setjmp/longjmp to get back to right after execute_some_python_code(). That code could leak arbitrary Python references!!! > Oops, I just tried the code in CVS python, and the problem goes away. > Same for 2.3a2. I vaguely recall that someone fixed some things in this area... :-( > But my code has to run in 2.2.2 as well... If the docs are lying, they have to be fixed. This is no longer my prime area of expertise... 
:-( --Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Mon Apr 14 16:35:34 2003 From: theller@python.net (Thomas Heller) Date: 14 Apr 2003 17:35:34 +0200 Subject: [Python-Dev] GIL vs thread state In-Reply-To: <200304141518.h3EFI7w16583@odiug.zope.com> References: <r885xoz4.fsf@python.net> <200304141410.h3EEAeZ14896@odiug.zope.com> <llydxh05.fsf@python.net> <200304141518.h3EFI7w16583@odiug.zope.com> Message-ID: <adetxfax.fsf@python.net> Guido van Rossum <guido@python.org> writes: > Maybe you should have a look at Mark Hammond's PEP 311. It describes > the problem and proposes a better solution. (I think it requires you > to always use the existing thread state for the thread, rather than > making up a temporary thread state as is currently the idiom.) I have only briefly skimmed the PEP, but I have the impression that it proposes a new API, which may appear in 2.3 or 2.4. > Ouch! I don't know what structured exception handling is, but this > looks like it would be as bad as using setjmp/longjmp to get back to > right after execute_some_python_code(). Exactly. It basically does a longjmp() instead of crashing the process with an access violation, for example. > That code could leak > arbitrary Python references!!! I consider access violations programming errors, so leaking references would be ok. But I want to print a traceback instead of crashing (or at least before crashing). > If the docs are lying, they have to be fixed. This is no longer my > prime area of expertise... :-( That's why I have been asking. I can submit a bug pointing to this thread. Thomas From pje@telecommunity.com Mon Apr 14 16:52:16 2003 From: pje@telecommunity.com (Phillip J.
Eby) Date: Mon, 14 Apr 2003 11:52:16 -0400 Subject: [Python-Dev] Garbage collecting closures Message-ID: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> >Then, that gets clarified to >"unless they have reference cycles, in which case they may get destroyed >arbitrarily later" and now "or they are used in a function containing >another function, which will cause a circular reference involving all >local variables." Actually, the issue is that *recursive* nested functions create a circular reference. Note that the body of function 'foo' contains a reference to 'foo'. *That* is the circular reference. If I understand correctly, it should also be breakable by deleting 'foo' from the outer function when you're done with it. E.g.:

def bar(a):
    def foo():
        return None
    x = a
    foo()

    del foo  # clears the cell and breaks the cycle

Strangely, I could have sworn that there was documentation that came out when nested scopes were introduced that discussed this issue. But I just looked at PEP 227 and the related "What's New" document, and neither explicitly mentions that defining recursive nested functions creates a circular reference. I think I just "knew" that it would do so, from what *is* said in those documents and what little I knew about how the cells mechanism was supposed to work. Since both PEP 227 and the What's New document mention recursive nested functions as a motivating example for nested scopes, perhaps they should mention the circular reference consequence of doing so.
Eby wrote: > If I understand correctly, it should also be breakable by deleting 'foo' > from the outer function when you're done with it. E.g.: > > def bar(a): > def foo(): > return None > x = a > foo() > > del foo # clears the cell and breaks the cycle > You haven't tried this, have you? ;-) SyntaxError: can not delete variable 'foo' referenced in nested scope Since foo() could escape bar, i.e. become reachable outside of bar(), we don't allow you to unbind foo. Jeremy From pje@telecommunity.com Mon Apr 14 17:08:38 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Mon, 14 Apr 2003 12:08:38 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050335915.28028.10.camel@slothrop.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> Message-ID: <5.1.1.6.0.20030414120333.01d59220@telecommunity.com> At 11:58 AM 4/14/03 -0400, Jeremy Hylton wrote: >On Mon, 2003-04-14 at 11:52, Phillip J. Eby wrote: > > If I understand correctly, it should also be breakable by deleting 'foo' > > from the outer function when you're done with it. E.g.: > > > > def bar(a): > > def foo(): > > return None > > x = a > > foo() > > > > del foo # clears the cell and breaks the cycle > > > >You haven't tried this, have you? ;-) Well, I did say, "If I understand correctly". :) What's funny is, I could've sworn I've used 'del' under similar circumstances before. It must not have been to delete a cell, just deleting something else in a function that defined a function. Ah well. >SyntaxError: can not delete variable 'foo' referenced in nested scope Interestingly, it gives me a different error in IDLE: "unsupported operand type(s) for -: 'NoneType' and 'int'" >Since foo() could escape bar, i.e. become reachable outside of bar(), we >don't allow you to unbind foo. 
So do this instead: foo = None From guido@python.org Mon Apr 14 17:08:04 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 12:08:04 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "14 Apr 2003 11:58:35 EDT." <1050335915.28028.10.camel@slothrop.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <1050335915.28028.10.camel@slothrop.zope.com> Message-ID: <200304141608.h3EG84V17588@odiug.zope.com> > On Mon, 2003-04-14 at 11:52, Phillip J. Eby wrote: > > If I understand correctly, it should also be breakable by deleting 'foo' > > from the outer function when you're done with it. E.g.: > > > > def bar(a): > > def foo(): > > return None > > x = a > > foo() > > > > del foo # clears the cell and breaks the cycle > From: Jeremy Hylton <jeremy@zope.com> > > You haven't tried this, have you? ;-) > > SyntaxError: can not delete variable 'foo' referenced in nested scope > > Since foo() could escape bar, i.e. become reachable outside of bar(), we > don't allow you to unbind foo. I don't see the reason for this semantic restriction. IMO it could just as well be a runtime error (e.g. raising UnboundLocalError). --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Mon Apr 14 17:16:59 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 12:16:59 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304141608.h3EG84V17588@odiug.zope.com> References: <5.1.1.6.0.20030414114006.00a28c90@mail.rapidsite.net> <1050335915.28028.10.camel@slothrop.zope.com> <200304141608.h3EG84V17588@odiug.zope.com> Message-ID: <1050337018.28028.19.camel@slothrop.zope.com> On Mon, 2003-04-14 at 12:08, Guido van Rossum wrote: > I don't see the reason for this semantic restriction. IMO it could > just as well be a runtime error (e.g. raising UnboundLocalError). I can't recall why I thought this restriction was necessary. 
Very little code and one new opcode is required to change the compile-time error to a runtime error. Jeremy From paul@prescod.net Mon Apr 14 20:32:06 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 12:32:06 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304141450.h3EEoAx15118@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> Message-ID: <3E9B0CB6.4030101@prescod.net> Jeremy Hylton wrote: > The details of when finalizers are called is an implementation detail > rather than a language property. and Guido van Rossum wrote: > ... Since then, > the rule has always been "some arbitrary time after nothing refers to > them." And the corollary is "always explicitly close your external > resources." I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some background so you can understand my use case. Then you can decide for yourself whether it is worth supporting. When you're dealing with COM objects, you do stuff like: foo = a.b.c.d b, c and d are all temporary, reference counted objects: reference counted on both the COM and Python sides. It is quite inconvenient to treat them as "resources" like database handles or something. a = Dispatch("xxx.yyy") b = a.b c = b.c d = c.d a.release() b.release() c.release() 80% of the variables in my code are COM objects! I'm not a big win32com programmer, but it is my impression that this is NOT the typical programming style. COM is specifically designed to use reference counting so that programmers (even C++ programmers!) don't have to do explicit deallocation. COM and CPython have roughly the same garbage collection model (reference counted) so there is no need to treat them as special external resources. 
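[Editor's sketch: the refcounting behaviour Paul relies on can be shown with a toy stand-in for a COM wrapper. The `Wrapper` class and its `child` property are hypothetical illustrations, not win32com API; the point is that CPython's reference counting frees each chained temporary as soon as the expression is done with it, with no explicit release() calls.]

```python
# Toy stand-in for a refcounted COM-style object (hypothetical class).
# Each attribute access returns a fresh intermediate object; CPython's
# refcounting runs each temporary's __del__ the moment the chained
# expression no longer needs it.
log = []

class Wrapper:
    def __init__(self, name):
        self.name = name

    @property
    def child(self):
        return Wrapper(self.name + ".child")

    def __del__(self):
        log.append(self.name)

a = Wrapper("a")
d = a.child.child.child  # the two intermediate wrappers are temporaries
```

On a non-refcounting implementation (such as Jython at the time), the intermediates would instead linger until the next GC pass, which is exactly the divergence this thread is about.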
(nowadays, Python cleans up circular references and COM doesn't, so there is a minor divergence there) The truth is that even after having been bitten, I'd rather deal with the 3 or 4 exceptional garbage collection cases (circular references with finalizers, closures, etc.) than uglify and complicate my Python code! I'll explicitly run GC in a shutdown() method. Even though it is easy to work around, this particular special case really feels pathological to me. Simple transformations set it off, and they can be quite non-local. From: Safe: def a(): if something: a() def b(): a() ... # ten thousand lines of code x = com_object to Buggy: def b(): def a(): if something: a() a() ... # ten thousand lines of code x = com_object OR Safe: def b(): def a(): if something: a() a() ... # ten thousand lines of code com_object.do_something() to Buggy: def b(): def a(): if something: a() a() # ten thousand lines of code junk = com_object.do_something() If I'm the first and last person to have this problem, then I guess it won't be a big deal, but it sure was confusing for me to debug. The containing app wouldn't shut down while Python owned a reference. Paul Prescod From paul@prescod.net Mon Apr 14 20:43:58 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 12:43:58 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050331961.28028.4.camel@slothrop.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> Message-ID: <3E9B0F7E.1080407@prescod.net> > > Finalizers seem useful in general, but I would still worry about any > specific program that managed critical resources using finalizers. > > Jeremy Finalizer behaviour is implementation specific. Fair enough. Therefore, portable programs don't use finalizers. Okay, fine. Not all Python programs are designed to be portable. 
Finalizers tend to be used to deal with non-portable resources (COM objects, database handles) anyhow. This suggests to me that each implementation should document in detail how finalizers work in that implementation. After all, if you can't depend on them to work predictably even within a single implementation, what is their value at all? A totally unpredictable feature is of little more value than no feature at all. I propose to collect the various garbage collection special cases we've described in this discussion and write a tutorial for the CPython documentation. Does anyone know of any more special cases? Probably any library or language feature that can create non-obvious circular references should be listed. Paul Prescod From martin@v.loewis.de Mon Apr 14 21:00:49 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 14 Apr 2003 22:00:49 +0200 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B0CB6.4030101@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> Message-ID: <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> Paul Prescod <paul@prescod.net> writes: > I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some > background so you can understand my use case. Then you can decide for > yourself whether it is worth supporting. I think demonstrating use cases is futile, as people believe that what you want is unimplementable. Instead, if you would come forward with an implementation strategy, that would be more convincing. Regards, Martin From guido@python.org Mon Apr 14 21:03:33 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 16:03:33 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 12:43:58 PDT." 
<3E9B0F7E.1080407@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> Message-ID: <200304142003.h3EK3XH21345@odiug.zope.com> Paul, would finalizers have been run if you had included an explicit gc.collect() call? If so, I'd say that a sufficiently portable rule is that you can't trust finalizers to run until GC is run (in Jython, gc.collect() isn't how it is invoked though). If gc.collect() didn't solve your problem, full documentation of cycles would indeed be required. However, I'm reluctant to do so because this reveals a lot of information about the implementation that I don't want to have to guarantee for future versions. --Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Mon Apr 14 21:09:04 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 14 Apr 2003 13:09:04 -0700 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> Message-ID: <1050350944.605.28.camel@sayge.arc.nasa.gov> On Sun, 2003-04-13 at 13:28, Tim Peters wrote: > It appears to be a refcount error in recently-added C code that tries to > generalize the builtin range() function, specifically here: > > Fail: > Py_XDECREF(curnum); > Py_XDECREF(istep); <- here > Py_XDECREF(zero); > > Word to the wise: don't ever try to reuse a variable whose address is > passed to PyArg_ParseTuple for anything other than holding what > PyArg_ParseTuple does or doesn't store into it. Hmm, then this is my fault. I did exactly that. My approach was to Py_INCREF an optional argument if it was given (ie. not NULL), otherwise to create it from scratch, and Py_DECREF when I was done.
I believe this was a not uncommon idiom (I can't recall the specifics, but being my first submitted patch, I made sure to try to look for existing idioms for argument and error handling). I apologize if I erred. I assume a better approach, then, is to get the optional istep argument, and copy it into a variable for your own use (or create it if it didn't exist)? ie. Never increment or decrement the optional argument object, returned from PyArg_ParseTuple, at all? > You'll never get the > decrefs straight (and even if you manage to at first, the next person to > modify your code will break it). Bingo! Guido took a slightly different approach (and ultimately a better one, I think), in adapting my patch. Perhaps I unknowingly left a time bomb for him. I'll submit a patch to fix this all up tonight, if it hasn't already been addressed by then. > only-consumed-eight-hours-this-time<wink>-ly y'rs - tim Oh, ow! Now that pains me. I am very sorry to hear this wasted so much time. Chad Netzer From guido@python.org Mon Apr 14 21:13:31 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 16:13:31 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: Your message of "14 Apr 2003 13:09:04 PDT." <1050350944.605.28.camel@sayge.arc.nasa.gov> References: <LNBBLJKPBEHFEDALKOLCAEPKEGAB.tim_one@email.msn.com> <1050350944.605.28.camel@sayge.arc.nasa.gov> Message-ID: <200304142013.h3EKDVo21434@odiug.zope.com> > On Sun, 2003-04-13 at 13:28, Tim Peters wrote: > > > It appears to be a refcount error in recently-added C code that tries to > > generalize the builtin range() function, specifically here: > > > > Fail: > > Py_XDECREF(curnum); > > Py_XDECREF(istep); <- here > > Py_XDECREF(zero); > > > > Word to the wise: don't ever try to reuse a variable whose address is > > passed to PyArg_ParseTuple for anything other than holding what > > PyArg_ParseTuple does or doesn't store into it. > > Hmm, then this is my fault. I did exactly that.
My approach was to > Py_INCREF an optional argument it if it was given (ie. not NULL), > otherwise to create it from scratch, and Py_DECREF when I was done. I > believe this was a not uncommon idiom (I can't recal the specifics, but > being my first submitted patch, I made sure to try to look for existing > idioms for argument and error handling). I apologize if I erred. > > I assume a better approach, then is to get the optional istep > argument, and copy it into a variable for your own use (or create it if > it didn't exist)? ie. Never increment or decrement the optional > argument object, returned from PyArg_ParseTuple, at all? > > > You'll never get the > > decrefs straight (and even if you manage to at first, the next person to > > modify your code will break it). > > Bingo! Guido took a slightly different approach (and ultimately a > better one, I think), in adapting my patch. Perhaps I unknowingly left > a time bomb for him. Sort of. Your code didn't have the refcount bug; I moved the initialization of 'zero' up, and changed a few 'return NULL' lines into 'goto Fail', but I didn't move the 'INCREF(istep)' up. > I'll submit a patch to fix this all up tonight, if it hasn't already > been addressed by then. Tim fixed it already. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Apr 14 21:20:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 16:20:58 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <1050350944.605.28.camel@sayge.arc.nasa.gov> Message-ID: <BIEJKCLHCIOIHAGOKOLHKEHCFFAA.tim.one@comcast.net> [Chad Netzer] > Hmm, then this is my fault. I did exactly that. Guido thinks he broke it when he updated the patch. It doesn't really matter to me -- I hate everyone anyway <wink>. > My approach was to Py_INCREF an optional argument it if it was given (ie. > not NULL), otherwise to create it from scratch, and Py_DECREF when I was > done. 
I believe this was a not uncommon idiom (I can't recal the > specifics, but being my first submitted patch, I made sure to try to look > for existing idioms for argument and error handling). I apologize if I > erred. I don't know -- and it doesn't matter. I ended up (perhaps) restoring your original intent. I think Guido was provoked into fiddling it to begin with because of the large number of exit labels in the original. > I assume a better approach, then is to get the optional istep > argument, and copy it into a variable for your own use (or create it if > it didn't exist)? ie. Never increment or decrement the optional > argument object, returned from PyArg_ParseTuple, at all? That's usually safest. This was an unusual function, though (range's signature is messy, and the extension to long required defaults that couldn't be expressed as native C types). > ... > I'll submit a patch to fix this all up tonight, if it hasn't already > been addressed by then. It's all been checked in. Nothing left to do. >> only-consumed-eight-hours-this-time<wink>-ly y'rs - tim > Oh, ow! Now that pains me. I am very sorry to hear this wasted so much > time. Well, what do you think weekends are for <wink>? From tim.one@comcast.net Mon Apr 14 21:38:39 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 16:38:39 -0400 Subject: [Python-Dev] Big trouble in CVS Python In-Reply-To: <2mznmtoa9g.fsf@starship.python.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHKEHEFFAA.tim.one@comcast.net> [Michael Hudson] > It seems to me that this would have been found much more easily if > floats didn't have a free list anymore... Hard to guess. It appears that the prematurely released float storage wasn't allocated again by the time the error occurred, so if floats used pymalloc a debug run would have sprayed 0xdb bytes into the memory, and that would have made it obvious that the memory had been freed. 
OTOH, if another float object had gotten allocated between the premature-free and the error, pymalloc and the free-list strategy are both likely to have handed out the same storage again, and we'd be staring at the same symptoms either way. It's hard to love the unbounded & immortal free list for floats regardless. OTOH, I have no doubt that it *is* faster than pymalloc (the latter has more overheads due to recycling whole pools when possible, and for determining who (pymalloc or system malloc) owns the memory getting freed; invoking pymalloc is also another layer of function call). From pedronis@bluewin.ch Mon Apr 14 21:37:34 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Mon, 14 Apr 2003 22:37:34 +0200 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B0CB6.4030101@prescod.net> References: <200304141450.h3EEoAx15118@odiug.zope.com> <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> Message-ID: <5.2.1.1.0.20030414223253.02a459c0@localhost> At 12:32 14.04.03 -0700, Paul Prescod wrote: >Buggy: > >def b(): > def a(): > if something: > a() > a() > # ten thousand lines of code > junk = com_object.do_something() a should refer and close over junk otherwise nothing bad happens. >>> class X: ... def __del__(self): ... print "dying" ... >>> def b(x): ... def a(n): ... if n: a(n-1) ... a(1) ... junk = x ... >>> b(X()) dying vs. >>> def b(x): ... def a(n): ... if n: a(n-1) ... junk ... junk = x ... 
>>> b(X()) >>> gc.collect() dying 10 regards From paul@prescod.net Mon Apr 14 22:41:31 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 14:41:31 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304142003.h3EK3XH21345@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> <200304142003.h3EK3XH21345@odiug.zope.com> Message-ID: <3E9B2B0B.60101@prescod.net> Guido van Rossum wrote: > Paul, would finalizers have been run if you had included an explicit > gc.collect() call? > > If so, I'd say that a sufficiently portable rule is that you can't > trust finalizers to run until GC is run (in Jython, gc.collect() isn't > how it is invoked though). Yes, now that I know to try it, gc.collect() would have fixed the problem. But I'm not sure where I would have learned to do so. The documentation for __del__ is out of date. * http://www.python.org/doc/2.3a2/ref/customization.html The documentation lists a variety of reasons that __del__ might not get called (it doesn't claim to be exhaustive but it does list some cases that I consider pretty obscure). It doesn't list nested recursive functions. One strategy is to update the __del__ and gc documentation to add this case. Another strategy is to update the __del__ documentation to say: "if you want this to be executed deterministically in CPython, call gc.collect()". Or both. 
Paul Prescod From paul@prescod.net Mon Apr 14 22:45:19 2003 From: paul@prescod.net (Paul Prescod) Date: Mon, 14 Apr 2003 14:45:19 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E9B2BEF.5000907@prescod.net> Martin v. Löwis wrote: > Paul Prescod <paul@prescod.net> writes: > > >>I knew I'd hear that. ;) Overall, I agree. Anyhow, I'll give you some >>background so you can understand my use case. Then you can decide for >>yourself whether it is worth supporting. > > > I think demonstrating use cases is futile, as people believe that what > you want is unimplementable. Instead, if you would come forward with > an implementation strategy, that would be more convincing. I'm not going to advocate a particular strategy because I don't know enough of the performance and implementation costs. But you asked for a strategy so I'll at least suggest one. Python could run gc.collect() after returning from functions containing nested recursive functions. Perhaps an opcode flags these functions. Arguably this happens rarely enough that predictability is more important than performance in this case. (I admit again that it is arguable!) Perhaps there would be some more precise way to tell gc.collect to only inspect graphs containing the offending nested function...or maybe you could be even more precise: if a function is known to be a nested function and it has a single reference count then could you say that the only reference is to itself recursively? Of course if the function returned a closure and the closure depended on a variable referencing an object then the object should live as long as the closure. That's both expected and necessary.
Paul Prescod From jeremy@zope.com Mon Apr 14 22:51:23 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 14 Apr 2003 17:51:23 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2B0B.60101@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <1050331961.28028.4.camel@slothrop.zope.com> <3E9B0F7E.1080407@prescod.net> <200304142003.h3EK3XH21345@odiug.zope.com> <3E9B2B0B.60101@prescod.net> Message-ID: <1050357083.28025.42.camel@slothrop.zope.com> On Mon, 2003-04-14 at 17:41, Paul Prescod wrote: > Yes, now that I know to try it, gc.collect() would have fixed the problem. > > But I'm not sure where I would have learned to do so. The documentation > for __del__ is out of date. > > * http://www.python.org/doc/2.3a2/ref/customization.html > > The documentation lists a variety of reasons that __del__ might not get > called (it doesn't claim to be exhaustive but it does list some cases > that I consider pretty obscure). It doesn't list nested recursive functions. The first one on the list is "circular references between objects." (Now that should be "among" objects, but that's not your complaint.) Nested recursive functions are an example of data structure involving circular references. Jeremy From nas@python.ca Mon Apr 14 23:09:14 2003 From: nas@python.ca (Neil Schemenauer) Date: Mon, 14 Apr 2003 15:09:14 -0700 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2BEF.5000907@prescod.net> References: <LNBBLJKPBEHFEDALKOLCAELDEGAB.tim_one@email.msn.com> <3E9AC6EE.8010900@prescod.net> <200304141450.h3EEoAx15118@odiug.zope.com> <3E9B0CB6.4030101@prescod.net> <m3u1d0voge.fsf@mira.informatik.hu-berlin.de> <3E9B2BEF.5000907@prescod.net> Message-ID: <20030414220914.GA1208@glacier.arctrix.com> Paul Prescod wrote: > I'm no going to advocate a particular strategy because I don't know > enough of the performance and implementation costs. 
But you asked for a > strategy so I'll at least suggest one. Python could run gc.collect() > after returning from functions containing nested recursive functions. gc.collect() is too expensive for that to be feasible. Neil From tim.one@comcast.net Mon Apr 14 23:15:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 18:15:58 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <3E9B2B0B.60101@prescod.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEHOFFAA.tim.one@comcast.net> [Paul Prescod] > ... > But I'm not sure where I would have learned to do so. The documentation > for __del__ is out of date. > > * http://www.python.org/doc/2.3a2/ref/customization.html > > The documentation lists a variety of reasons that __del__ might not get > called (it doesn't claim to be exhaustive but it does list some cases > that I consider pretty obscure). It doesn't list nested recursive > functions. __del__ isn't relevant to your test case, though: if the cycles in question contained any object with a __del__ method, gc would never have reclaimed them (and gc.collect() would have had no effect on them either, other than to move the trash cycles into gc.garbage). You had __del__-free cycles, and then there is indeed no way to predict when they'll get reclaimed. I think that's just life; you wouldn't be any better off in Java or Scheme or anything else. It's always been difficult to guess when the implementation of a thing may involve a cycle under the covers, and closures, generators and new-style classes have created many new opportunities for cycles to appear. I don't expect users to know when they're going to happen! I can't keep them all straight myself. I try to write code that doesn't care, though (avoid __del__ methods; avoid "hiding" critical resources in side effects of what look like simple expressions; arrange for subsystems that can be explicitly told to release critical resources). Ain't always easy. 
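[Editor's sketch: the "explicitly told to release critical resources" style Tim ends with, using a hypothetical resource class. The try/finally form is what was available in 2003; the later `with` statement (Python 2.5) packages the same pairing.]

```python
class TrackedResource:
    """Hypothetical resource: close() is the contract, not __del__."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True   # explicit, idempotent release

    # Context-manager protocol: 'with' automates the try/finally pairing.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False

# try/finally form: release happens deterministically, cycles or not
r1 = TrackedResource()
try:
    pass  # ... use r1 ...
finally:
    r1.close()

# equivalent with-statement form
with TrackedResource() as r2:
    pass  # ... use r2 ...
```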
From drifty@alum.berkeley.edu Mon Apr 14 23:59:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 14 Apr 2003 15:59:46 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > That would be great. Do you have a SF userid yet? Then we can give > > > you commit privs! > > > > bcannon is my username. I was going to wait to ask for commit privs > > until I had done more patches <snip> > OK, you're on. > Cool! Thanks, Guido! No more recv() resets from SF! Woohoo! > > I could. Going to have to learn more LaTeX (and the special > > extensions). So I can take this on, but I can't make any promises > > on when this will get done (I would be personally horrified if I > > can't get this done before 2.3 final gets out the door, but you > > never know). > > With LaTeX, the monkey-see-monkey-do approach works pretty well, > combined with the Fred-will-fix-my-LaTeX-bugs approach. :-) > =) Works for me. > > Should there be a testing SIG? Could keep a list of tests that > > could stand to be rewritten or added (I know I was surprised to > > discover test_urllib was so lacking). Could also start by hashing > > out these docs and making sure regrtest and test_support stay > > updated and relevant. > > I don't know about a SIG. Testing of what's in the core is fair game > for python-dev. 3rd party testing, ask around. > OK, no SIG then. > > Personally, I think writing regression tests is a good way to get > > new people to help with Python. 
They are simple to write and allows > > someone to be able to get involved beyond just filing a bug. I know > > it was a thrill for me the first time I got code checked in and > > maybe making the entry point easier by trying to get more people to > > write more regression tests for the libraries will help give someone > > else that rush and thus become more involved. > > > > Or maybe I am just bonkers. =) > > Writing a good regression test requires excellent knowledge about the > code you're testing while not touching it, so that's indeed a good way > to learn. > One of these days I am going to put together an "Intro to python-dev" page that discusses the basic etiquette on the list and how to slowly get more and more involved. But it looks like I have some LaTeX docs to write first. -Brett From guido@python.org Tue Apr 15 01:08:08 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:08:08 -0400 Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: "Your message of Mon, 14 Apr 2003 15:59:46 PDT." <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> Message-ID: <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> > One of these days I am going to put together an "Intro to > python-dev" page that discusses the basic etiquette on the list and > how to slowly get more and more involved. There's already quite a bit of that at http://www.python.org/dev/ (follow the links to "Development Process" and "Culture"). 
Since you already have access to the CVS repository for the website, you could simply augment what's already there... --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Tue Apr 15 01:32:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 12:32:34 +1200 (NZST) Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <1050331961.28028.4.camel@slothrop.zope.com> Message-ID: <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> Jeremy: > Finalizers seem useful in general, but I would still worry about any > specific program that managed critical resources using finalizers. What *are* they useful for, then? Or are they only useful "in general", and never in any particular case? :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Tue Apr 15 01:35:34 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:35:34 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sat, 12 Apr 2003 09:43:52 EDT." <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304150035.h3F0ZYc03122@pcp02138704pcs.reston01.va.comcast.net> > Someone accidentally discovered a way to set attributes of built-in > types, even though the implementation tries to prevent this. I've checked in what I believe is an adequate block for at least this particular hack. wrap_setattr(), which is called in response to <type>.__setattr__(), now compares if the C function it is about to call is the same as the C function in the built-in base class closest to the object's class. 
This means that if B is a built-in class and P is a Python class derived from B, P.__setattr__ can call B.__setattr__, but not A.__setattr__ where A is an (also built-in) base class of B (unless B inherits A.__setattr__). The following session shows that object.__setattr__ can no longer be used to set a type's attributes: Remind us that 'str' is an instance of 'type': >>> isinstance(str, type) True 'type' has a __setattr__ method that forbids setting all attributes. Try type.__setattr__; nothing new here: >>> type.__setattr__(str, "foo", 42) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't set attributes of built-in/extension type 'str' Remind us that 'object' is a base class of 'type': >>> issubclass(type, object) True Now try object.__setattr__. This used to work; now it shows the new error message: >>> object.__setattr__(str, "foo", 42) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __setattr__ to type object __delattr__ has the same restriction, or else you would be able to remove existing str methods -- not good: >>> object.__delattr__(str, "foo") Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: can't apply this __delattr__ to type object In other (normal) circumstances object.__setattr__ still works: >>> class C(object): ... pass ... >>> x = C() >>> object.__setattr__(x, "foo", 42) >>> object.__delattr__(x, "foo") I'll backport this to Python 2.2 as well. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Tue Apr 15 01:45:53 2003 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 14 Apr 2003 20:45:53 -0400 Subject: [Python-Dev] RE: List wisdom In-Reply-To: <E1954ey-0007KU-00@borgia.local> Message-ID: <LNBBLJKPBEHFEDALKOLCEEDIEDAB.tim_one@email.msn.com> [Uche Ogbuji] > ... 
> So I dug through the Python Wiki, and found no such page of gems > (just a lot of whimsical quotes from #python and a code-sharing page with some > odd trinkets). I also checked to see if #python had a chump (opt-in > log) on which I could put the quote. No dice. I did chump it on the #4suite > log: > > http://uche.ogbuji.net/tech/akara/?xslt=irc.xslt&date=2003-04-14#14:03:38 I didn't understand a word of that -- young people <0.9 wink>. > I also created a Python Wiki page for useful notes and code > snippets from this mailing list: > > http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom > > Please feel free to use it if anything here seems especially important to > highlight (in addition to Brett Cannon's tireless work, of course). Excellent idea! The Python Wiki seems severely underused. I tried to help it along by fleshing out the snippet. Unlike chumping, typing is something an old bot knows how to do ... From guido@python.org Tue Apr 15 01:55:21 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 14 Apr 2003 20:55:21 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: "Your message of Tue, 15 Apr 2003 12:32:34 +1200." <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> References: <200304150032.h3F0WY020654@oma.cosc.canterbury.ac.nz> Message-ID: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> > > Finalizers seem useful in general, but I would still worry about any > > specific program that managed critical resources using finalizers. > > What *are* they useful for, then? Or are they only useful "in > general", and never in any particular case? :-) Finalizers are a necessary evil. For example, when I create a Python file type that encapsulates an external resource like a file descriptor as returned by os.open(), together with a buffer, I really want to be able to specify a finalizer that flushes the write buffer and closes the file descriptor. But I also really want the application not to rely on that finalizer! 
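[Editor's sketch of the file type Guido describes: `BufferedFile` is a hypothetical class with real buffering and error handling omitted. close() is the documented contract; __del__ is only a refcounting safety net whose timing is not guaranteed and which never fires if the object is caught in a cycle.]

```python
import os
import tempfile

class BufferedFile:
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        self.buffer = []

    def write(self, data):
        self.buffer.append(data)

    def flush(self):
        if self.fd is not None and self.buffer:
            os.write(self.fd, b"".join(self.buffer))
            self.buffer = []

    def close(self):
        if self.fd is not None:   # idempotent: safe to call twice
            self.flush()
            os.close(self.fd)
            self.fd = None

    def __del__(self):
        # Safety net only: may run late, or not at all if the object is
        # part of a cycle -- applications should still call close().
        self.close()

path = tempfile.mktemp()
f = BufferedFile(path)
f.write(b"hello, ")
f.write(b"world")
f.close()   # the explicit close flushes deterministically
```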
Note that as a library developer, I can write the file type carefully to avoid being part of any cycles, so the restriction on finalizers that are part of cycles doesn't bother me too much: I'm doing all I can, and if a file is nevertheless kept alive by a cycle in the application's code, the application has to deal with this (same as with a file type implemented in C, for which the restriction on finalizers in cycles doesn't hold). Why do I, as library developer, want the finalizer? Because I don't want to rely on the application to keep track of when a file must be closed. But then why do I (still as library developer) recommend that the application closes files explicitly? Because there's no guarantee *when* finalizers are run, and it's easy for the application to create a cycle unknowingly (as we've seen in Paul's case). Basically, the dual requirement is there to keep the application and the library from pointing fingers at each other when there's a problem with leaking file descriptors. This makes me think that Python should run the garbage collector before exiting, so that finalizers on objects that were previously kept alive by cycles are called (even if finalizers on objects that are *part* of a cycle still won't be called). I also think that if a strongly connected component (a stronger concept than cycle) has exactly one object with a finalizer in it, that finalizer should be called, and then the object should somehow be marked as having been finalized (maybe a separate GC queue could be used for this) in case it is resurrected by its finalizer. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Tue Apr 15 02:18:14 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 21:18:14 -0400 Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <m3znmux2yr.fsf@mira.informatik.hu-berlin.de> Message-ID: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> [Skip Montanaro] >> Is it time to think seriously about moving away from SourceForge? [Martin v. Löwis] > Any proposal to move away from SourceForge should include a proposal > where to move *to*. I highly admire SourceForge operators for their > quality of service, and challenge anybody to provide the same quality > service. Be prepared to find yourself in a full-time job if you want > to take over. I'm not sure that better alternatives for *some* of what SF does couldn't be gotten with reasonable effort. For example, on a quiet machine, I just did a cvs up on a fully up-to-date Python, via SF. That took 147 seconds. I also did a cvs up on a fully up-to-date Zope3, via Zope Corp's CVS setup. That took 9 seconds. I expect at least as many (probably more) people hit Zope's CVS as hit Python's CVS, and ZC appears to put minimal effort into maintaining its public CVS servers. A crucial difference is that SF CVS has to serve hundreds of thousands of people, and ZC's more like just hundreds. > SourceForge performance was *much* worse in the past, and we didn't > consider moving away, and SF fixed it by buying new hardware. Give > them some time. There have been times over the past few weeks when cvsup time via SF was as bad as it's ever been, meaning > half an hour to finish. There have also been times when it's been quite zippy. I think they've made tremendous strides in cutting response time for the trackers, though (that was indeed very much worse in the past).
From greg@cosc.canterbury.ac.nz Tue Apr 15 02:41:15 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 13:41:15 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> > Why do I, as library developer, want the finalizer? Because I don't > want to rely on the application to keep track of when a file must be > closed. > > But then why do I (still as library developer) recommend that the > application closes files explicitly? Because there's no guarantee > *when* finalizers are run, and it's easy for the application to create > a cycle unknowingly (as we've seen in Paul's case). Okay, I think I see what you're saying. Finalizers are needed to make sure that resources are *eventually* reclaimed, and if that's not good enough for the application, it needs to make its own arrangements. Fair enough. What bothers me, though, is that even with finalizers, the library writer *still* can't guarantee eventual reclamation. The application can unwittingly stuff it all up by creating cycles, and there's nothing the library writer can do about it. It seems to me that giving up on finalization altogether in the presence of cycles is too harsh. In most cases, the cycle isn't actually going to make any difference. With a cycle of your abovementioned file-descriptor-holding objects, for example, they could be finalized in an arbitrary order, because the *finalizers* don't depend on any other objects in the cycle. So maybe there should be some way of classifying the references held by an object into those that are relied upon by its finalizer, and those that aren't. The algorithm would then be to first go through and clear all the references that *aren't* needed by finalizers, and then... Actually, that's all you would need to do, I think.
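The two-step idea can be acted out by hand with a toy cycle whose finalizers don't depend on the cycle-forming references (all names here are invented for illustration; a real collector would have to discover which references are safe to clear):

```python
class Node:
    def __init__(self, name, log):
        self.name = name
        self.log = log      # the finalizer depends only on .name and .log
        self.other = None   # cycle-forming reference, NOT used by the finalizer

    def __del__(self):
        self.log.append(self.name)

log = []
a, b = Node("a", log), Node("b", log)
a.other, b.other = b, a     # a <-> b: a reference cycle

# Step 1: clear the references the finalizers don't need.
a.other = b.other = None

# Step 2: with the cycle broken, plain reference counting reclaims both
# objects and runs their finalizers in the normal order.
del a, b
```

After the two dels, log holds both names: once the unneeded references are gone, refcounting alone takes care of the finalization.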
If there is an unambiguous order of finalization, that means there must be no cycles amongst the references needed by finalizers. And if that's the case, once you've cleared all the other references, normal ref counting will take care of the rest and call their finalizers in the proper order. If there's anything left after that, then you have a genuinely difficult case and are entitled to give up! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Tue Apr 15 03:16:31 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 14 Apr 2003 22:16:31 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: <200304150055.h3F0tLk05623@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> [Guido] > ... > This makes me think that Python should run the garbage collector > before exiting, so that finalizers on objects that were previously > kept alive by cycles are called (even if finalizers on objects that > are *part* of a cycle still won't be called). What about finalizers on objects that are alive at exit because they're still reachable? We seem to leave a lot of stuff alive at the end. For example, here are the pymalloc stats at the end under current CVS, after opening an interactive shell then exiting immediately; this is produced at the end of Py_Finalize(), and only call_ll_exitfuncs() is done after this (and that probably shouldn't free anything): Small block threshold = 256, in 32 size classes. 
class   size   num pools   blocks in use   avail blocks
-----   ----   ---------   -------------   ------------
    2     24           1               1            168
    5     48           1               2             82
    6     56          13             170            766
    7     64          13             445            374
    8     72           5              25            255
    9     80           1               1             49
   15    128           1               2             29
   20    168           5              25             95
   23    192           1               1             20
   25    208           1               2             17
   29    240           1               2             14
   31    256           1               1             14

# times object malloc called    =  17,119
3 arenas * 262144 bytes/arena   = 786,432
# bytes in allocated blocks     =  45,800
# bytes in available blocks     = 131,072
145 unused pools * 4096 bytes   = 593,920
# bytes lost to pool headers    =   1,408
# bytes lost to quantization    =   1,944
# bytes lost to arena alignment =  12,288
Total                           = 786,432

"size" here is 16 bytes larger than in a release build, because of the 8-byte padding added by PYMALLOC_DEBUG on each end of each block requested. So, e.g., there's one (true size) 8-byte object still living at the end, and 445 48-byte objects. Unreclaimed ints and floats aren't counted here (they've got their own free lists, and don't go thru pymalloc). I don't know what all that stuff is, but I bet there are about 25 dicts still alive at the end. > I also think that if a strongly connected component (a stronger > concept than cycle) has exactly one object with a finalizer in it, > that finalizer should be called, and then the object should somehow be > marked as having been finalized (maybe a separate GC queue could be > used for this) in case it is resurrected by its finalizer. With the addition of gc.get_referents() in 2.3, it's easy to compute SCCs via Python code now; it's a PITA in C. OTOH, figuring out which finalizers to call seems a PITA in Python: A<->F1 -> F2<->B F1 and F2 have finalizers; A and B don't. Python code can easily determine that there are 2 SCCs here, each with 1 finalizer (I suppose gc's has_finalizer() would need to be exposed, to determine correctly whether __del__ exists).
A tricky bit then is that running F1.__del__ may end up deleting F2 by magic (this is *possible* since F2 is reachable from F1, and F1.__del__ may break the link to F2), but it's hard for pure-Python code to know that. So that part seems easier done in C, and creating new gc lists in C is very easy thanks to the nice doubly-linked-list C API Neil coded in gcmodule. Note a subtlety: the finalizers in SCCs should be run in a topsort ordering of the derived SCC graph (since F1.__del__ can ask F2 to do stuff, despite that F1 and F2 are in different SCCs, F1 should be finalized before F2). Finding a topsort order is also easy in Python (and also a PITA in C). So I picture computing a topsorted list of suitable objects (those that have a finalizer, and have the only finalizer in their SCC) in Python, and passing that on to a new gcmodule entry point. The latter can link those objects into a doubly-linked C list in the same order, and then run finalizers "left to right". It's a nice property of the gc lists that, e.g., if F1.__del__ does end up deleting F2, F2 simply vanishes from the list. Another subtlety: suppose F1.__del__ resurrects F1, and doesn't delete F2. Should F2.__del__ be called anyway? Probably not, since if F1 is alive, everything reachable from it is also alive, and F1 -> F2. I've read that Java can get into a state where it's only able to reclaim 1 object per full gc collection due to headaches like this, despite that everything is trash. There's really no way to tell whether F1.__del__ resurrects F1 short of starting gc over again (in particular, looking at F1's refcount before and after running F1.__del__ isn't reliable evidence for either conclusion, unless the "after" refcount is 0). 
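The "easy in Python" parts above can be sketched together: Tarjan's algorithm finds the SCCs, and it happens to emit them in reverse topological order of the condensation, so the topsort comes almost for free. The adjacency dict below just encodes the A<->F1 -> F2<->B example by hand (in real use the edges would come from gc.get_referents()):

```python
def strongly_connected_components(graph):
    # Tarjan's algorithm (recursive version, for brevity) over an
    # adjacency dict {node: [successor, ...]}.  SCCs are emitted in
    # reverse topological order of the condensation graph.
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:        # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# Tim's example: A<->F1 -> F2<->B, where F1 and F2 carry finalizers.
graph = {"A": ["F1"], "F1": ["A", "F2"], "F2": ["B"], "B": ["F2"]}
sccs = strongly_connected_components(graph)
# Reversing gives a topsort of the SCCs: {A, F1} before {F2, B},
# i.e. F1 should be finalized before F2.
order = list(reversed(sccs))
```

This only covers the analysis half; actually running the finalizers safely is the part argued above to belong in C.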
From drifty@alum.berkeley.edu Tue Apr 15 04:24:30 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 14 Apr 2003 20:24:30 -0700 (PDT) Subject: [Python-Dev] Using temp files and the Internet in regression tests In-Reply-To: <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> References: <200304131322.h3DDMZ718822@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131250470.22203@death.OCF.Berkeley.EDU> <200304140102.h3E12kG26965@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304131901520.28443@death.OCF.Berkeley.EDU> <200304141152.h3EBqTW28000@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.53.0304141556270.16375@death.OCF.Berkeley.EDU> <200304150008.h3F088028745@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.53.0304142024150.15575@death.OCF.Berkeley.EDU> [Guido van Rossum] > > One of these days I am going to put together an "Intro to > > python-dev" page that discusses the basic etiquette on the list and > > how to slowly get more and more involved. > > There's already quite a bit of that at http://www.python.org/dev/ > (follow the links to "Development Process" and "Culture"). Since you > already have access to the CVS repository for the website, you could > simply augment what's already there... > That's what I had in mind. -Brett From tim_one@email.msn.com Tue Apr 15 05:02:02 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 00:02:02 -0400 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCMEAKEHAB.tim_one@email.msn.com> [Greg Ewing] > ... > What bothers me, though, is that even with finalizers, the library > writer *still* can't guarantee eventual reclamation. The application > can unwittingly stuff it all up by creating cycles, and there's > nothing the library writer can do about it. 
They're not trying very hard, then -- and, admittedly, most don't. For example, every time the library grabs a resource that needs finalization, it can plant a weakref to it in a singleton private module object with a __del__ method. When the module is torn down at shutdown, that object's __del__ gets called via refcount-falls-from-1-to-0 (it's a private object -- the library author can surely guarantee *it* isn't in a cycle), and frees whichever resources still exist then. The library could instead register a cleanup function via atexit(). Or it could avoid weakrefs by setting up a thread that wakes up every now and again, to scan gc.garbage for instances of the objects it passed out. Finding one, it could finalize the resources held by the object, mark the object as no longer needing resource finalization, and let the object leak. And so on -- Python supplies lots of ways to get what you want even here. > It seems to me that giving up on finalization altogether in the > presence of cycles is too harsh. In most cases, the cycle isn't > actually going to make any difference. With a cycle of your > abovementioned file-descriptor-holding objects, for example, they could be > finalized in an arbitrary order, because the *finalizers* don't depend > on any other objects in the cycle. I expect that's usually so, but that detecting that it's so is intractable. Even if we relied on programmers declaring their beliefs explicitly, Python still has to be paranoid enough to avoid crashing if the stated beliefs aren't really true. For example, if you fight your way through the details of Java's Byzantine finalization scheme, you'll find that the hairiest parts of it exist just to ensure that Java's gc internals never end up dereferencing dangling pointers. This has the added benefit that most experienced Java programmers appear to testify that Java's finalizers are useless <wink>.
> So maybe there should be some way of classifying the references held > by an object into those that are relied upon by its finalizer, and > those that aren't. How? I believe this is beyond realistic automated analysis for Python source. > The algorithm would then be to first go through and clear all the > references that *aren't* needed by finalizers, and then... > [assuming there's no problem leads to the conclusion there's no > problem <wink>] You probably need also to detect that the finalizer can't resurrect the object either, else clearing references that aren't needed specifically for finalization would leave the resurrected object in a damaged state. From greg@cosc.canterbury.ac.nz Tue Apr 15 05:51:23 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 16:51:23 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEAKEHAB.tim_one@email.msn.com> Message-ID: <200304150451.h3F4pN621203@oma.cosc.canterbury.ac.nz> > They're not trying very hard, then -- and, admittedly, most don't. > For example, every time the library grabs a resource that needs > finalization, it can plant a weakref to it in a singleton private > module object with a __del__ method... If you have to go through such convolutions to make __del__ methods reliable, perhaps some other mechanism should be provided in the first place. What you're describing sounds a lot like a scheme used in a Smalltalk system that I encountered once. Objects didn't have finalizing methods themselves; instead, an object could register another object as an "executor" to carry out its "last will and testament". This was done *after* the object in question had been deallocated, and after all other GC activity had finished, so there was no risk of resurrecting anything or getting the GC into a knot. 
Using weakrefs, it might be possible to implement something like this in pure Python, for use as an alternative to __del__ methods. > How? I believe this is beyond realistic automated analysis for Python > source. I wasn't suggesting that it be automated, I was suggesting that it be done explicitly. Suppose, e.g. there were a special attribute __dontclear__ which can be given a list of names of attributes that the GC shouldn't clear. The author of a __del__ method would then have to make sure that everything it needs is mentioned in that list, or risk having it disappear. > Even if we relied on programmers declaring their beliefs explicitly, > Python still has to be paranoid enough to avoid crashing if the > stated beliefs aren't really true. I can't see how a crash could result -- the worst that might happen is a __del__ method throws an exception because some attribute that it relies on has been cleared. That's then a programming error in that class -- the attribute should have been listed in __dontclear__. > You probably need also to detect that the finalizer can't resurrect > the object either, else clearing references that aren't needed > specifically for finalization would leave the resurrected object in > a damaged state. Or just refrain from writing __del__ methods that are silly enough to resurrect their objects. Or if resurrection really is necessary, put all their vital attributes in __dontclear__. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin@v.loewis.de Tue Apr 15 06:13:02 2003 From: martin@v.loewis.de (Martin v. 
Löwis) Date: 15 Apr 2003 07:13:02 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> References: <200304150141.h3F1fF720801@oma.cosc.canterbury.ac.nz> Message-ID: <m3k7dwbaxt.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > What bothers me, though, is that even with finalizers, the library > writer *still* can't guarantee eventual reclamation. The application > can unwittingly stuff it all up by creating cycles, and there's > nothing the library writer can do about it. That is not so. If the object having a finalizer doesn't support references to arbitrary other objects, then the application cannot make this object be part of a cycle. This is why file objects will be eventually closed: they cannot be part of a cycle. Being-referred-to from a cycle is fine: If the cycle itself has no objects with finalizers, GC will break the cycle at an arbitrary point and thus release all objects in the cycle, which will then release the object with a finalizer, which will run the finalizer. So my usage guideline is this: If you need a finalizer, always make two objects. One carries the resource being encapsulated, and nothing else. The other one is the object exposed to applications, which has a reference to the resource. Regards, Martin From martin@v.loewis.de Tue Apr 15 06:24:48 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 15 Apr 2003 07:24:48 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> Message-ID: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > I'm not sure that better alternatives for *some* of what SF does couldn't be > gotten with reasonable effort.
For example, on a quiet machine, I just did > a cvs up on a fully up-to-date Python, via SF. It is probably possible to find somebody to host the Python CVS and offer enough connectivity to give more performant service than SF. However, there is more to hosting such a service: You need user management, email notifications, backups, and occasional hand-editing of the CVS repository. I would expect that it might consume significant time (several hours a week) to host the Python CVS. (Time per project reduces if you host several projects) So from your message, I still don't see who could be taking over the Python CVS. Skip, did you have anybody specific in mind? Regards, Martin From greg@cosc.canterbury.ac.nz Tue Apr 15 06:41:40 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Apr 2003 17:41:40 +1200 (NZST) Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <m3k7dwbaxt.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> > If the object having a finalizer doesn't support references to > arbitrary other objects, then the application cannot make this object > be part of a cycle. It could make a subclass, though... > If you need a finalizer, always make two objects. One carries the > resource being encapsulated, and nothing else. The other one is the > object exposed to applications, which has a reference to the resource. That actually sounds like a reasonable solution. I was thinking that __del__ methods on anything referenced from the cycle would prevent collection, not just in the cycle itself, but as you point out, that's not the case. Given that, many of my objections go away. I still may write that Executors module, though, it could be fun... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From walter@livinglogic.de Tue Apr 15 12:53:02 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 15 Apr 2003 13:53:02 +0200 Subject: [Python-Dev] ValueErrors in range() Message-ID: <3E9BF29E.6060807@livinglogic.de> Current CVS raises ValueErrors for range() arguments of the wrong type:

>>> range(0, "spam")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: integer end argument expected, got str.

Shouldn't these be TypeErrors? Bye, Walter Dörwald From barry@python.org Tue Apr 15 12:56:48 2003 From: barry@python.org (Barry Warsaw) Date: 15 Apr 2003 07:56:48 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050407808.9401.8.camel@anthem> On Tue, 2003-04-15 at 01:24, Martin v. Löwis wrote: > It is probably possible to find somebody to host the Python CVS and > offer enough connectivity to give more performant service than > SF. However, there is more to hosting such a service: You need user > management, email notifications, backups, and occasional hand-editing > of the CVS repository. This would actually be a big advantage over the present situation. CVS repository surgery is (sadly) necessary sometimes, but it's something we currently can't do without a lot of pain. > I would expect that it might consume > significant time (several hours a week) to host the Python CVS. (Time > per project reduces if you host several projects) I can think of at least 3 projects we could host. :). But even if GForge was our panacea, it would still take a real commitment to run and maintain. I suspect the current crop of volunteers is already stretched pretty far. OTOH, if we could roll Zope into the mix, we'd have more resources to draw from, maybe.
-Barry From skip@pobox.com Tue Apr 15 13:36:17 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 15 Apr 2003 07:36:17 -0500 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> Message-ID: <16027.64705.970817.546379@montanaro.dyndns.org> Martin> So from your message, I still don't see who could be taking over Martin> the Python CVS. Skip, did you have anybody specific in mind? Nope. I was just tossing out an idea based on my growing frustration with SF's poor performance. I see the abysmal CVS performance Tim referred to and also find the bug tracker performance to be problematic (web access, submissions and updates are often very slow and sometimes fail, forcing me to sit around waiting for them to complete and then going back to check that my submission/change actually worked). Skip From guido@python.org Tue Apr 15 13:42:43 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 08:42:43 -0400 Subject: [Python-Dev] ValueErrors in range() In-Reply-To: "Your message of Tue, 15 Apr 2003 13:53:02 +0200." <3E9BF29E.6060807@livinglogic.de> References: <3E9BF29E.6060807@livinglogic.de> Message-ID: <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> > Current CVS raises ValueErrors for range() arguments > of the wrong type: > > >>> range(0, "spam") > Traceback (most recent call last): > File "<stdin>", line 1, in ? > ValueError: integer end argument expected, got str. > > Shouldn't these be TypeErrors? Right! I did not review this code enough. :-( Fixing now... --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Tue Apr 15 15:15:53 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 15 Apr 2003 15:15:53 +0100 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <1050407808.9401.8.camel@anthem> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> Message-ID: <3E9C1419.6090908@algroup.co.uk> Barry Warsaw wrote: > On Tue, 2003-04-15 at 01:24, Martin v. Löwis wrote: > > >>It is probably possible to find somebody to host the Python CVS and >>offer enough connectivity to give more performant service than >>SF. However, there is more to hosting such a service: You need user >>management, email notifications, backups, and occasional hand-editing >>of the CVS repository. > > > This would actually be a big advantage over the present situation. CVS > repository surgery is (sadly) necessary sometimes, but it's something we > currently can't do without a lot of pain. > > >>I would expect that it might consume >>significant time (several hours a week) to host the Python CVS. (Time >>per project reduces if you host several projects) > > > I can think of at least 3 projects we could host. :). But even if > GForge was our panacea, it would still take a real commitment to run and > maintain. I suspect the current crop of volunteers is already stretched > pretty far. OTOH, if we could roll Zope into the mix, we'd have more > resources to draw from, maybe. My company would be happy to host it in The Bunker (http://www.thebunker.net/). We do have to figure out some way to get compensated for the bandwidth we'd have to pay for (does anyone know how much that is?), but I'm leaving that to those that worry about such things. Presumably they'd want a link to us somewhere, or something of that nature. We have plenty of experience running CVS and we have 24x7 support. Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." 
- Robert Woodruff From blunck@gst.com Tue Apr 15 15:11:42 2003 From: blunck@gst.com (Christopher Blunck) Date: Tue, 15 Apr 2003 10:11:42 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <16027.64705.970817.546379@montanaro.dyndns.org> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> Message-ID: <20030415141142.GA6011@homer.gst.com> On Tue, Apr 15, 2003 at 07:36:17AM -0500, Skip Montanaro wrote: > Nope. I was just tossing out an idea based on my growing frustration with > SF's poor performance. I see the abysmal CVS performance Tim referred to > and also find the bug tracker performance to be problematic (web access, > submissions and updates are often very slow and sometimes fail, forcing me > to sit around waiting for them to complete and then going back to check that > my submission/change actually worked). ... Not to mention file uploads that don't actually upload, erroneous error messages when posting patches and/or bugs, and an inability to map bugs to patches as a built in feature. -c -- 10:10am up 176 days, 1:08, 4 users, load average: 1.18, 1.40, 1.62 From guido@python.org Tue Apr 15 15:23:16 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 10:23:16 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: Your message of "Tue, 15 Apr 2003 15:15:53 BST." <3E9C1419.6090908@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> Message-ID: <200304151424.h3FENGS26701@odiug.zope.com> > My company would be happy to host it in The Bunker > (http://www.thebunker.net/). We do have to figure out some way to get > compensated for the bandwidth we'd have to pay for (does anyone know how > much that is?), but I'm leaving that to those that worry about such > things. 
Presumably they'd want a link to us somewhere, or something of > that nature. > > We have plenty of experience running CVS and we have 24x7 support. I'd like to pursue this, but I don't have time myself. A sponsorship link to TheBunker would definitely be a possibility (we have a link to XS4ALL at the top of www.python.org). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 15 15:26:16 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 10:26:16 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: Your message of "Tue, 15 Apr 2003 10:11:42 EDT." <20030415141142.GA6011@homer.gst.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> Message-ID: <200304151426.h3FEQGx26716@odiug.zope.com> > Not to mention file uploads that don't actually upload, erroneous error > messages when posting patches and/or bugs, and an inability to map bugs to > patches as a built in feature. Right. Some of these have (finally) been fixed. But my meta-complaint about SF is that it's impossible to get things fixed at our schedule. I'm still hoping to revive the effort of moving the tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ --Guido van Rossum (home page: http://www.python.org/~guido/) From ben@algroup.co.uk Tue Apr 15 15:45:23 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Tue, 15 Apr 2003 15:45:23 +0100 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <200304151424.h3FENGS26701@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> Message-ID: <3E9C1B03.1070803@algroup.co.uk> Guido van Rossum wrote: >>My company would be happy to host it in The Bunker >>(http://www.thebunker.net/). We do have to figure out some way to get >>compensated for the bandwidth we'd have to pay for (does anyone know how >>much that is?), but I'm leaving that to those that worry about such >>things. Presumably they'd want a link to us somewhere, or something of >>that nature. >> >>We have plenty of experience running CVS and we have 24x7 support. > > > I'd like to pursue this, but I don't have time myself. A sponsorship > link to TheBunker would definitely be a possibility (we have a link to > XS4ALL at the top of www.python.org). Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? Cheers, Ben. -- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From guido@python.org Tue Apr 15 16:18:02 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 11:18:02 -0400 Subject: [Python-Dev] test_pwd failing Message-ID: <200304151518.h3FFI2S27822@odiug.zope.com> Somebody just changed the pwd module. 
I now get these errors when running test_pwd:

[guido@odiug linux]$ ./python ../Lib/test/regrtest.py test_pwd
test_pwd
test test_pwd failed -- Traceback (most recent call last):
  File "/mnt/home/guido/projects/python/dist/src/Lib/test/test_pwd.py", line 29, in test_values
    self.assertEqual(pwd.getpwuid(e.pw_uid), e)
  File "/mnt/home/guido/projects/python/dist/src/Lib/unittest.py", line 292, in failUnlessEqual
    raise self.failureException, \
AssertionError: ('guido', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido', '/bin/bash') != ('guido1', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido1', '/bin/bash')

1 test failed:
test_pwd
[guido@odiug linux]$

The last two lines of my /etc/passwd file are:

guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash
guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash

--Guido van Rossum (home page: http://www.python.org/~guido/) From walter@livinglogic.de Tue Apr 15 16:31:05 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 15 Apr 2003 17:31:05 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <200304151518.h3FFI2S27822@odiug.zope.com> References: <200304151518.h3FFI2S27822@odiug.zope.com> Message-ID: <3E9C25B9.7020308@livinglogic.de> Guido van Rossum wrote: > Somebody just changed the pwd module.
I now get these errors when > running test_pwd: > > [guido@odiug linux]$ ./python ../Lib/test/regrtest.py test_pwd > test_pwd > test test_pwd failed -- Traceback (most recent call last): > File "/mnt/home/guido/projects/python/dist/src/Lib/test/test_pwd.py", line 29, in test_values > self.assertEqual(pwd.getpwuid(e.pw_uid), e) > File "/mnt/home/guido/projects/python/dist/src/Lib/unittest.py", line 292, in failUnlessEqual > raise self.failureException, \ > AssertionError: ('guido', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido', '/bin/bash') != ('guido1', 'x', 4102, 4102, 'Guido van Rossum', '/home/guido1', '/bin/bash') > > 1 test failed: > test_pwd > [guido@odiug linux]$ > > The last two lines of my /etc/passwd file are: > > guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash > guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash That's my fault. The duplicate entry for the uid 4102 makes the test fail. I'll think of an alternate test for this case. Bye, Walter Dörwald From walter@livinglogic.de Tue Apr 15 16:41:28 2003 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Tue, 15 Apr 2003 17:41:28 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <3E9C25B9.7020308@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> Message-ID: <3E9C2828.4040803@livinglogic.de> Walter Dörwald wrote: > Guido van Rossum wrote: > >> Somebody just changed the pwd module. I now get these errors when >> running test_pwd: >> >> [...] >> guido:x:4102:4102:Guido van Rossum:/home/guido:/bin/bash >> guido1:x:4102:4102:Guido van Rossum:/home/guido1:/bin/bash > > That's my fault. > > The duplicate entry for the uid 4102 makes the test fail. > > I'll think of an alternate test for this case. Fixed! Should the same change be done for the pwd module, i.e. are duplicate gid's allowed in /etc/group? Bye, Walter Dörwald From fdrake@acm.org Tue Apr 15 16:41:21 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Tue, 15 Apr 2003 11:41:21 -0400
Subject: [Python-Dev] test_pwd failing
In-Reply-To: <3E9C25B9.7020308@livinglogic.de>
References: <200304151518.h3FFI2S27822@odiug.zope.com>
	<3E9C25B9.7020308@livinglogic.de>
Message-ID: <16028.10273.709530.833600@grendel.zope.com>

Walter Dörwald writes:
 > The duplicate entry for the uid 4102 makes the test fail.
 >
 > I'll think of an alternate test for this case.

Since the duplicate entry is perfectly legal, I think the test can
really only check that the uid of the retrieved record match the
requested uid.  I don't see what else can be reasonably checked since
everything else for the two entries could differ.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From fdrake@acm.org  Tue Apr 15 16:47:09 2003
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 Apr 2003 11:47:09 -0400
Subject: [Python-Dev] test_pwd failing
In-Reply-To: <3E9C2828.4040803@livinglogic.de>
References: <200304151518.h3FFI2S27822@odiug.zope.com>
	<3E9C25B9.7020308@livinglogic.de>
	<3E9C2828.4040803@livinglogic.de>
Message-ID: <16028.10621.958603.27070@grendel.zope.com>

Walter Dörwald writes:
 > Fixed!

And well!  Thanks.

 > Should the same change be done for the pwd module, i.e.
 > are duplicate gid's allowed in /etc/group?

I think they are, but I'm less certain of that.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

From gh@ghaering.de  Tue Apr 15 16:49:33 2003
From: gh@ghaering.de (Gerhard =?iso-8859-1?Q?H=E4ring?=)
Date: Tue, 15 Apr 2003 17:49:33 +0200
Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <3E9C1B03.1070803@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> Message-ID: <20030415154933.GA6030@mephisto.ghaering.test> * Ben Laurie <ben@algroup.co.uk> [2003-04-15 15:45 +0100]: > Guido van Rossum wrote: > >>My company would be happy to host it in The Bunker > >>(http://www.thebunker.net/). [...] > >>We have plenty of experience running CVS and we have 24x7 support. > > > > I'd like to pursue this, but I don't have time myself. A sponsorship > > link to TheBunker would definitely be a possibility (we have a link to > > XS4ALL at the top of www.python.org). > > Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? Probably only Sourceforge staff. But maybe we can avoid asking them ... My CVS documentation has to say this: CVS can keep a history file that tracks each use of the `checkout', `commit', `rtag', `update', and `release' commands. You can use `history' to display this information in various formats. So maybe somebody CVS savvy can make the needed changes to Python's CVSROOT at Sourceforge so we can collect the needed data for a week or so in order to produce a statistic? Gerhard -- mail: gh@ghaering.de web: http://ghaering.de/ From guido@python.org Tue Apr 15 16:49:27 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 11:49:27 -0400 Subject: [Python-Dev] test_pwd failing In-Reply-To: Your message of "Tue, 15 Apr 2003 17:41:28 +0200." <3E9C2828.4040803@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> Message-ID: <200304151549.h3FFnRR28753@odiug.zope.com> > Should the same change be done for the pwd module, i.e. ^^^grp > are duplicate gid's allowed in /etc/group? 
I guess group aliases are theoretically possible, so if you can easily fix the test, go ahead. --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Tue Apr 15 17:45:36 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 15 Apr 2003 12:45:36 -0400 Subject: [Python-Dev] Evil setattr hack Message-ID: <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> >I've checked in what I believe is an adequate block for at least this >particular hack. wrap_setattr(), which is called in response to ><type>.__setattr__(), now compares if the C function it is about to >call is the same as the C function in the built-in base class closest >to the object's class. This means that if B is a built-in class and P >is a Python class derived from B, P.__setattr__ can call >B.__setattr__, but not A.__setattr__ where A is an (also built-in) >base class of B (unless B inherits A.__setattr__). Does this follow __mro__ or __base__? I'm specifically wondering about the implications of multiple inheritance from more than one C base class; this sort of thing (safety checks relating to heap vs. non-heap types and the "closest" method of a particular kind) has bitten me before in relation to ZODB4's Persistence package. In that context, mixing 'type' and 'PersistentMetaClass' makes it impossible to instantiate the resulting metaclass, because neither type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to execute. My "evil hack" to fix that was to add an extra PyObject * to PersistentMetaClass so that it has a larger tp_basicsize than 'type' and Python then considers it the '__base__' type, thus causing its '__new__' method to be accepted as legitimate. From martin@v.loewis.de Tue Apr 15 18:17:30 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 19:17:30 +0200 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: <1050407808.9401.8.camel@anthem> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> Message-ID: <m3llybel3o.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > I can think of at least 3 projects we could host. :). "We" being "ZC", "PythonLabs", or pluralis majestatis? Regards, Martin From martin@v.loewis.de Tue Apr 15 18:22:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 19:22:17 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <3E9C1B03.1070803@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> Message-ID: <m3he8zekvq.fsf@mira.informatik.hu-berlin.de> Ben Laurie <ben@algroup.co.uk> writes: > Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? To get some estimate, try to guess how many full downloads of the entire Python tree you will get per day. As Gerhard explains, only SF would know the numbers, but my guess is that incremental updates are negligible compared to full downloads. To draw some random number, I guess you should accomodate 20 full downloads per day, with a complete download being 50MB (i.e. only the dist/src part). Whether this number is close to reality, I don't know. Regards, Martin From guido@python.org Tue Apr 15 19:33:48 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 14:33:48 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: Your message of "Tue, 15 Apr 2003 12:45:36 EDT." 
<5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> References: <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> Message-ID: <200304151833.h3FIXmU29036@odiug.zope.com> [Guido] > >I've checked in what I believe is an adequate block for at least > >this particular hack. wrap_setattr(), which is called in response > >to <type>.__setattr__(), now compares if the C function it is > >about to call is the same as the C function in the built-in base > >class closest to the object's class. This means that if B is a > >built-in class and P is a Python class derived from B, > >P.__setattr__ can call B.__setattr__, but not A.__setattr__ where > >A is an (also built-in) base class of B (unless B inherits > >A.__setattr__). > From: "Phillip J. Eby" <pje@telecommunity.com> > Does this follow __mro__ or __base__? It follows __base__, like everything concerned about C level instance lay-out. > I'm specifically wondering about the implications of multiple > inheritance from more than one C base class; this sort of thing > (safety checks relating to heap vs. non-heap types and the "closest" > method of a particular kind) has bitten me before in relation to > ZODB4's Persistence package. It is usually impossible to inherit from more than one C base class, unless all but one are mix-in classes, meaning they add nothing to the instance lay-out of a common base class. > In that context, mixing 'type' and 'PersistentMetaClass' makes it > impossible to instantiate the resulting metaclass, because neither > type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to > execute. You're referring to this error message from tp_new_wrapper(), right: "%s.__new__(%s) is not safe, use %s.__new__()" > My "evil hack" to fix that was to add an extra PyObject * > to PersistentMetaClass so that it has a larger tp_basicsize than > 'type' and Python then considers it the '__base__' type, thus > causing its '__new__' method to be accepted as legitimate. 
Is this because the algorithm in best_base() picks the wrong base otherwise? --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Tue Apr 15 19:45:43 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Tue, 15 Apr 2003 14:45:43 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304151833.h3FIXmU29036@odiug.zope.com> References: <Your message of "Tue, 15 Apr 2003 12:45:36 EDT." <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> <5.1.1.6.0.20030415123712.01d43700@mail.rapidsite.net> Message-ID: <5.1.1.6.0.20030415143437.02e62ae0@telecommunity.com> At 02:33 PM 4/15/03 -0400, Guido van Rossum wrote: >[Guido] > > >I've checked in what I believe is an adequate block for at least > > >this particular hack. wrap_setattr(), which is called in response > > >to <type>.__setattr__(), now compares if the C function it is > > >about to call is the same as the C function in the built-in base > > >class closest to the object's class. This means that if B is a > > >built-in class and P is a Python class derived from B, > > >P.__setattr__ can call B.__setattr__, but not A.__setattr__ where > > >A is an (also built-in) base class of B (unless B inherits > > >A.__setattr__). > > > From: "Phillip J. Eby" <pje@telecommunity.com> > > > Does this follow __mro__ or __base__? > >It follows __base__, like everything concerned about C level instance >lay-out. > > > I'm specifically wondering about the implications of multiple > > inheritance from more than one C base class; this sort of thing > > (safety checks relating to heap vs. non-heap types and the "closest" > > method of a particular kind) has bitten me before in relation to > > ZODB4's Persistence package. > >It is usually impossible to inherit from more than one C base class, >unless all but one are mix-in classes, meaning they add nothing to the >instance lay-out of a common base class. 
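[Editorial note: the layout rule Guido restates above is easy to see directly in a current CPython. The sketch below uses modern Python 3 syntax rather than the 2.2-era code under discussion, and the class names are made up; it shows why inheriting from two C-level bases only works when all but one are pure mix-ins.]

```python
# Both str and int extend object's C-level instance layout, so
# best_base() cannot find a single "solid" base for a subclass of both.
try:
    class Mixed(str, int):
        pass
except TypeError as exc:
    print(exc)  # layout-conflict TypeError from class creation

# A pure mix-in (which adds nothing to the instance layout) is fine
# alongside a single C base:
class MixIn:
    def shout(self):
        return str(self).upper()

class LoudStr(MixIn, str):
    pass

print(LoudStr("abc").shout())
```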
> > > In that context, mixing 'type' and 'PersistentMetaClass' makes it > > impossible to instantiate the resulting metaclass, because neither > > type.__new__ nor PersistentMetaClass.__new__ is considered "safe" to > > execute. > >You're referring to this error message from tp_new_wrapper(), right: > > "%s.__new__(%s) is not safe, use %s.__new__()" Yep, that's the one. > > My "evil hack" to fix that was to add an extra PyObject * > > to PersistentMetaClass so that it has a larger tp_basicsize than > > 'type' and Python then considers it the '__base__' type, thus > > causing its '__new__' method to be accepted as legitimate. > >Is this because the algorithm in best_base() picks the wrong base >otherwise? Yes, at least for Python 2.2. However, the problem with ZODB4 was only an issue on 2.2; on 2.3, PersistentMetaClass *is* 'type', because it is there to workaround C layout issues in 2.2 that don't exist in 2.3. So this is probably all moot. Anyway... if I recall correctly, even if you got best_base() to pick the right base by changing the order of mixing in the classes, you got a *different* safety error message; I think it might have been in the resulting class, though, rather than in the metaclass. This was all back in November, so my memory is a little hazy. I think there might have been more details in the Zope3-Dev collector issue (#86), but I think Jeremy showed that info to you previously and said that it wasn't enough for you to understand what the problem was. I think part of the complexity had to do with the fact that one of the types (my subclass of 'type') was a "heap type", and PersistentMetaClass was not. But as you pointed out, subclassing from multiple C bases is a rarity, so I don't see any point to following this up further, unless you have some perverse desire to have yet another new-style class layout algorithm change for Python 2.2.3. 
:) It's probably better just to leave my "make it bigger" hack in ZODB4, since PersistentMetaClass itself is one big Python 2.2 backward compatibility hack anyway. <wink> From martin@v.loewis.de Tue Apr 15 19:50:32 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 20:50:32 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> Message-ID: <m38yubegsn.fsf@mira.informatik.hu-berlin.de> Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > If the object having a finalizer doesn't support references to > > arbitrary other objects, then the application cannot make this object > > be part of a cycle. > > It could make a subclass, though... If the type is carefully designed, it can't... Regards, Martin From guido@python.org Tue Apr 15 20:06:15 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 15:06:15 -0400 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: Your message of "15 Apr 2003 20:50:32 +0200." <m38yubegsn.fsf@mira.informatik.hu-berlin.de> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> <m38yubegsn.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304151906.h3FJ6FP29320@odiug.zope.com> > > > If the object having a finalizer doesn't support references to > > > arbitrary other objects, then the application cannot make this > > > object be part of a cycle. > Greg Ewing <greg@cosc.canterbury.ac.nz> writes: > > > It could make a subclass, though... > From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) > > If the type is carefully designed, it can't... I suppose you have something in mind like this (which is the only way I can come up with to implement something like a 'final' class in pure Python): >>> class C(object): ... def __new__(cls): ... 
if cls is not C: raise TypeError, "haha" ... return object.__new__(cls) ... >>> class D(C): pass ... >>> a = D() Traceback (most recent call last): File "<stdin>", line 1, in ? File "<stdin>", line 3, in __new__ TypeError: haha >>> But how would you prevent this? >>> a = C() >>> a.__class__ = D >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From cnetzer@mail.arc.nasa.gov Tue Apr 15 20:11:38 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 15 Apr 2003 12:11:38 -0700 Subject: [Python-Dev] ValueErrors in range() In-Reply-To: <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> References: <3E9BF29E.6060807@livinglogic.de> <200304151242.h3FCgho06677@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050433898.607.35.camel@sayge.arc.nasa.gov> On Tue, 2003-04-15 at 05:42, Guido van Rossum wrote: > > Shouldn't these be TypeErrors? > > Right! I did not review this code enough. :-( Fixing now... My fault again. I misremembered Guido wishing that range() returned ValueError on floats (which I thought was strange at the time). Going over a previous email, I see that he did say TypeError. In the meantime, the test_builtins.py needs to be updated to check against TypeError rather than ValueError. (maybe it'll be done by now; ah, just checked, it has) Chad Netzer From barry@python.org Tue Apr 15 20:26:20 2003 From: barry@python.org (Barry Warsaw) Date: 15 Apr 2003 15:26:20 -0400 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <m3llybel3o.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <m3llybel3o.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050434780.501.32.camel@barry> On Tue, 2003-04-15 at 13:17, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > I can think of at least 3 projects we could host. :). > > "We" being "ZC", "PythonLabs", or pluralis majestatis? 
"We" being me. :) Python, Mailman, and mimelib to name 3. PyBSDDB perhaps, and I'm sure others. Maybe even open it up to (most? all?) Python projects with the proper PSF, er, wheel grease. :) -Barry From guido@python.org Tue Apr 15 20:23:48 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 15:23:48 -0400 Subject: [Python-Dev] Garbage collecting closures In-Reply-To: Your message of "Mon, 14 Apr 2003 22:16:31 EDT." <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEDMEDAB.tim.one@comcast.net> Message-ID: <200304151923.h3FJNmG29436@odiug.zope.com> > [Guido] > > ... > > This makes me think that Python should run the garbage collector > > before exiting, so that finalizers on objects that were previously > > kept alive by cycles are called (even if finalizers on objects that > > are *part* of a cycle still won't be called). [Tim] > What about finalizers on objects that are alive at exit because > they're still reachable? We seem to leave a lot of stuff alive at > the end. For example, here are the pymalloc stats at the end under > current CVS, after opening an interactive shell then exiting > immediately; this is produced at the end of Py_Finalize(), and only > call_ll_exitfuncs() is done after this (and that probably shouldn't > free anything): > > Small block threshold = 256, in 32 size classes. 
> > class size num pools blocks in use avail blocks > ----- ---- --------- ------------- ------------ > 2 24 1 1 168 > 5 48 1 2 82 > 6 56 13 170 766 > 7 64 13 445 374 > 8 72 5 25 255 > 9 80 1 1 49 > 15 128 1 2 29 > 20 168 5 25 95 > 23 192 1 1 20 > 25 208 1 2 17 > 29 240 1 2 14 > 31 256 1 1 14 > > # times object malloc called = 17,119 > 3 arenas * 262144 bytes/arena = 786,432 > > # bytes in allocated blocks = 45,800 > # bytes in available blocks = 131,072 > 145 unused pools * 4096 bytes = 593,920 > # bytes lost to pool headers = 1,408 > # bytes lost to quantization = 1,944 > # bytes lost to arena alignment = 12,288 > Total = 786,432 > > "size" here is 16 bytes larger than in a release build, because of > the 8-byte padding added by PYMALLOC_DEBUG on each end of each block > requested. So, e.g., there's one (true size) 8-byte object still > living at the end, and 445 48-byte objects. Unreclaimed ints and > floats aren't counted here (they've got their own free lists, and > don't go thru pymalloc). > > I don't know what all that stuff is, but I bet there are about 25 > dicts still alive at the end. Close! I moved the debugging code that can print the list of all objects still alive at the end around so that it is now next to the code that prints the above malloc stats. (If you're following CVS email you might have noticed this. :-) The full output is way too large to post; you can see for yourself by creating a debug build and running this (on Unix; windows users use their imagination or upgrade their OS): PYTHONDUMPREFS= ./python -S -c pass When I run this, I see 23 dictionaries. One is the dict of interned strings that are still alive; the others are the tp_dicts of the various built-in type objects. Some interned strings appear to be kept alive by various static globals holding names for faster name lookup; there isn't much we can do about that. I also don't think we should bother un-initializing the built-in types. 
Apart from that, I don't think I see anything that looks suspect. Of course, running a larger program with the same setup might reveal real leaks. > > I also think that if a strongly connected component (a stronger > > concept than cycle) has exactly one object with a finalizer in it, > > that finalizer should be called, and then the object should > > somehow be marked as having been finalized (maybe a separate GC > > queue could be used for this) in case it is resurrected by its > > finalizer. > > With the addition of gc.get_referents() in 2.3, it's easy to compute > SCCs via Python code now; it's a PITA in C. OTOH, figuring out > which finalizers to call seems a PITA in Python: > > A<->F1 -> F2<->B > > F1 and F2 have finalizers; A and B don't. Python code can easily > determine that there are 2 SCCs here, each with 1 finalizer (I > suppose gc's has_finalizer() would need to be exposed, to determine > whether __del__ exists correctly). A tricky bit then is that > running F1.__del__ may end up deleting F2 by magic (this is > *possible* since F2 is reachable from F1, and F1.__del__ may break > the link to F2), but it's hard for pure-Python code to know that. > So that part seems easier done in C, and creating new gc lists in C > is very easy thanks to the nice doubly-linked-list C API Neil coded > in gcmodule. > > Note a subtlety: the finalizers in SCCs should be run in a topsort > ordering of the derived SCC graph (since F1.__del__ can ask F2 to do > stuff, despite that F1 and F2 are in different SCCs, F1 should be > finalized before F2). Finding a topsort order is also easy in > Python (and also a PITA in C). > > So I picture computing a topsorted list of suitable objects (those > that have a finalizer, and have the only finalizer in their SCC) in > Python, and passing that on to a new gcmodule entry point. The > latter can link those objects into a doubly-linked C list in the > same order, and then run finalizers "left to right". 
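[Editorial note: Tim's recipe above, SCCs computed over gc.get_referents() edges, can be sketched in pure Python. The Tarjan traversal below is purely illustrative (the function name is made up, not anything in the gc module); it emits SCCs children-first, so reversing the result gives the topologically sorted order Tim describes for running finalizers.]

```python
import gc

def strongly_connected_components(objs):
    """Tarjan's algorithm over the object graph restricted to `objs`,
    with edges taken from gc.get_referents().  SCCs come out
    children-first, i.e. in reverse topological order."""
    pool = {id(o): o for o in objs}
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = [0]

    def visit(o):
        oid = id(o)
        index[oid] = low[oid] = counter[0]
        counter[0] += 1
        stack.append(o)
        on_stack.add(oid)
        for ref in gc.get_referents(o):
            rid = id(ref)
            if rid not in pool:
                continue            # ignore edges leaving the pool
            if rid not in index:
                visit(ref)
                low[oid] = min(low[oid], low[rid])
            elif rid in on_stack:
                low[oid] = min(low[oid], index[rid])
        if low[oid] == index[oid]:  # o is the root of an SCC
            scc = []
            while True:
                member = stack.pop()
                on_stack.discard(id(member))
                scc.append(member)
                if member is o:
                    break
            sccs.append(scc)

    for o in list(pool.values()):
        if id(o) not in index:
            visit(o)
    return sccs
```

For example, with a two-element cycle a <-> b plus a third list c that merely references b, this reports one SCC of size two and one of size one.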
It's a nice > property of the gc lists that, e.g., if F1.__del__ does end up > deleting F2, F2 simply vanishes from the list. > > Another subtlety: suppose F1.__del__ resurrects F1, and doesn't > delete F2. Should F2.__del__ be called anyway? Probably not, since > if F1 is alive, everything reachable from it is also alive, and F1 > -> F2. I've read that Java can get into a state where it's only > able to reclaim 1 object per full gc collection due to headaches > like this, despite that everything is trash. There's really no way > to tell whether F1.__del__ resurrects F1 short of starting gc over > again (in particular, looking at F1's refcount before and after > running F1.__del__ isn't reliable evidence for either conclusion, > unless the "after" refcount is 0). I'm glazing over the details now, but there seems to be a kernel of useful cleanup in here somehow; I hope that someone will be able to contribute a prototype of such code at least! --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Tue Apr 15 19:27:55 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 20:27:55 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <200304151426.h3FEQGx26716@odiug.zope.com> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> <200304151426.h3FEQGx26716@odiug.zope.com> Message-ID: <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > Right. Some of these have (finally) been fixed. But my > meta-complaint about SF is that it's impossible to get things fixed at > our schedule. 
I'm still hoping to revive the effort of moving the > tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ However, I take the fact that it has been sitting in that state for many months now as an indication that our schedule might not outpace SF. This stuff consumes a lot of time, and I'm willing to accept worse-than-optimal quality of service if it doesn't consume my time. Regards, Martin From martin@v.loewis.de Tue Apr 15 20:50:11 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 15 Apr 2003 21:50:11 +0200 Subject: Algorithm for finalizing cycles (Re: [Python-Dev] Garbage collecting closures) In-Reply-To: <200304151906.h3FJ6FP29320@odiug.zope.com> References: <200304150541.h3F5feO21318@oma.cosc.canterbury.ac.nz> <m38yubegsn.fsf@mira.informatik.hu-berlin.de> <200304151906.h3FJ6FP29320@odiug.zope.com> Message-ID: <m3r883czgs.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > > > It could make a subclass, though... > > > If the type is carefully designed, it can't... > > I suppose you have something in mind like this (which is the only way > I can come up with to implement something like a 'final' class in pure > Python): I was actually thinking about impure Python, i.e. by means of omitting Py_TPFLAGS_BASETYPE. > But how would you prevent this? > > >>> a = C() > >>> a.__class__ = D > >>> For the issue at hand: Assigning __class__ won't change the object layout, so if the object didn't have an __dict__ before, it won't have an __dict__ afterwards. Of course, if there are writable slots, the application could corrupt the underlying resource reference, making __del__ meaningless, anyway. Here I need to bring up Python's "we are all consenting adults" attitude... Regards, Martin From guido@python.org Tue Apr 15 21:08:25 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 16:08:25 -0400 Subject: [Python-Dev] migration away from SourceForge? 
In-Reply-To: Your message of "15 Apr 2003 20:27:55 +0200." <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <16027.64705.970817.546379@montanaro.dyndns.org> <20030415141142.GA6011@homer.gst.com> <200304151426.h3FEQGx26716@odiug.zope.com> <m3d6jnehuc.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304152008.h3FK8Pg29754@odiug.zope.com> > > Right. Some of these have (finally) been fixed. But my > > meta-complaint about SF is that it's impossible to get things fixed at > > our schedule. I'm still hoping to revive the effort of moving the > > tracker to RoundUp; it's 80% complete IMO: http://www.python.org:8080/ > > However, I take the fact that it has been sitting in that state for > many months now as an indication that our schedule might not outpace > SF. This stuff consumes a lot of time, and I'm willing to accept > worse-than-optimal quality of service if it doesn't consume my time. Right -- but someone might volunteer and the problem might go away. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Apr 15 22:29:24 2003 From: ark@research.att.com (Andrew Koenig) Date: Tue, 15 Apr 2003 17:29:24 -0400 (EDT) Subject: [Python-Dev] Re: Re: lists v. 
tuples In-Reply-To: <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Sun, 16 Mar 2003 07:32:04 -0500) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304152129.h3FLTOL05240@europa.research.att.com> >> Moreover, for some data structures, the __cmp__ approach can be >> expensive. For example, if you're comparing sequences of any kind, >> and you know that the comparison is for == or !=, you have your answer >> immediately if the sequences differ in length. If you don't know >> what's being tested, as you wouldn't inside __cmp__, you may spend a >> lot more time to obtain a result that will be thrown away. Guido> Yes. OTOH, as long as cmp() is in the language, these same situations Guido> are more efficiently done by a __cmp__ implementation than by calling Guido> __lt__ and then __eq__ or similar (it's hard to decide which order is Guido> best). So cmp() should be removed at the same time as __cmp__. Yes. Guido> And then we should also change list.sort(), as Tim points out. Maybe Guido> we can start introducing this earlier by using keyword arguments: Guido> list.sort(lt=function) sorts using a < implementation Guido> list.sort(cmp=function) sorts using a __cmp__ implementation The keyword argument might not be necessary: It is always possible for a function such as sort to figure out whether a comparison function is 2-way or 3-way (assuming it matters) by doing only one extra comparison. 
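[Editorial note: nothing like the lt= spelling was ever added to list.sort() (Python 2.4 instead grew key= and reverse=, and Python 3 dropped the cmp argument entirely), but the interface Guido floats above is easy to sketch as a wrapper. The name sort_with and its signature are hypothetical.]

```python
import functools

def sort_with(seq, lt=None, cmp=None):
    """Hypothetical sketch of Guido's proposal: sort with either a
    two-way "<" predicate (lt=) or a three-way comparator (cmp=)."""
    if lt is not None:
        class _Key:
            def __init__(self, value):
                self.value = value
            def __lt__(self, other):
                # sorted() only needs "<" on its key objects
                return lt(self.value, other.value)
        return sorted(seq, key=_Key)
    if cmp is not None:
        # cmp_to_key adapts a 3-way comparator to a key object
        return sorted(seq, key=functools.cmp_to_key(cmp))
    return sorted(seq)

print(sort_with([3, 1, 2], lt=lambda a, b: a < b))
print(sort_with([3, 1, 2], cmp=lambda a, b: (a > b) - (a < b)))
```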
From duanev@io.com Tue Apr 15 22:37:28 2003 From: duanev@io.com (duane voth) Date: Tue, 15 Apr 2003 16:37:28 -0500 Subject: [Python-Dev] LynxOS 4 port Message-ID: <20030415163728.A22630@io.com> I'd like to get 2.2.2 up on LynxOS 4 for PowerPC. I am very interested in finding others who have worked toward this, and also the person in charge of Python's configure scripts (as it seems LynxOS 4 is a bit of a hybrid). Thanks in advance! -- Duane Voth duanev@io.com -- duanev@atlantis.io.com From guido@python.org Wed Apr 16 00:49:50 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 15 Apr 2003 19:49:50 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: "Your message of Tue, 15 Apr 2003 17:29:24 EDT." <200304152129.h3FLTOL05240@europa.research.att.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> Message-ID: <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> > Guido> And then we should also change list.sort(), as Tim points > Guido> out. Maybe we can start introducing this earlier by using > Guido> keyword arguments: > > Guido> list.sort(lt=function) sorts using a < implementation > Guido> list.sort(cmp=function) sorts using a __cmp__ implementation [Andrew Koenig] > The keyword argument might not be necessary: It is always possible > for a function such as sort to figure out whether a comparison > function is 2-way or 3-way (assuming it matters) by doing only one > extra comparison. That's cute, but a bit too magical for my taste... 
It's not immediately obvious how this would be done (I know how, but it would require a lot of explaining). Plus, -1 is a perfectly valid truth value. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Wed Apr 16 01:41:31 2003 From: ark@research.att.com (Andrew Koenig) Date: Tue, 15 Apr 2003 20:41:31 -0400 (EDT) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Tue, 15 Apr 2003 19:49:50 -0400) References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304160041.h3G0fVI06215@europa.research.att.com> Guido> That's cute, but a bit too magical for my taste... It's not Guido> immediately obvious how this would be done (I know how, but it Guido> would require a lot of explaining). Plus, -1 is a perfectly Guido> valid truth value. Yes, I know that -1 is a valid truth value. Here's the trick. The object of the game is to figure out whether f is < or __cmp__. Suppose you call f(x, y) and it returns 0. Then you don't care which one f is, because x<y is false either way. So the first time you care is the first time f(x, y) returns nonzero. Now you can find out what kind of function f is by calling f(y, x). If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. 
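Andrew's trick can be sketched in a few lines; classify() is a hypothetical helper for illustration, not anything proposed for the stdlib:

```python
def classify(f, x, y):
    """Decide whether f is a 2-way '<' test or a 3-way cmp-style
    function, given a pair for which f(x, y) is already nonzero."""
    assert f(x, y)          # caller waits for the first nonzero result
    if f(y, x):
        return "3-way"      # both directions nonzero: cmp-style
    return "2-way"          # x < y true but y < x false: '<'-style

print(classify(lambda a, b: a < b, 1, 2))              # 2-way
print(classify(lambda a, b: (a > b) - (a < b), 1, 2))  # 3-way
```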
From greg@cosc.canterbury.ac.nz Wed Apr 16 02:11:34 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Apr 2003 13:11:34 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160041.h3G0fVI06215@europa.research.att.com> Message-ID: <200304160111.h3G1BYd03439@oma.cosc.canterbury.ac.nz> > Yes, I know that -1 is a valid truth value. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. I think the worry is that the function might be saying "true" to both of these, but just happen to spell it 1 the first time and -1 the second. Probably fairly unlikely, though... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Wed Apr 16 02:57:49 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 15 Apr 2003 21:57:49 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160111.h3G1BYd03439@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCEEEKEDAB.tim.one@comcast.net> [Andrew Koenig] > Yes, I know that -1 is a valid truth value. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. [Greg Ewing] > I think the worry is that the function might be saying > "true" to both of these, but just happen to spell it > 1 the first time and -1 the second. Then it's answering true to both x < y ? and y < x ? 
The comparison function is insane, then, so it doesn't matter what list.sort() does in that case (the algorithm is robust against insane comparison functions now, but doesn't define what will happen then beyond that the output list will contain a permutation of its input state). I've ignored this scheme for two reasons: anti-Pythonicity (having Python guess which kind of comparison function you wrote is anti-Pythonic on the face of it), and inefficiency. list.sort() is so bloody highly tuned now that adding even one test-&-branch per comparison, in C, on native C ints, gives a measurable slowdown, even when the user passes an expensive comparison function. In the case that no comparison function is passed, we're able to skip a layer of function call now by calling PyObject_RichCompareBool(X, Y, Py_LT) directly (no cmp-to-LT conversion is needed then). Against that, it could be natural to play Andrew's trick only in count_run() (the part of the code that identifies natural runs). That would be confined to 2 textual comparison sites, and does no more than len(list)-1 comparisons total now. From greg@cosc.canterbury.ac.nz Wed Apr 16 03:31:19 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 Apr 2003 14:31:19 +1200 (NZST) Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEEKEDAB.tim.one@comcast.net> Message-ID: <200304160231.h3G2VJs03574@oma.cosc.canterbury.ac.nz> > Then it's answering true to both > > x < y ? > and > y < x ? > > The comparison function is insane, then No, I'm the one that's insane, I think. You're right, this is impossible. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From jack@performancedrivers.com Wed Apr 16 04:00:36 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 15 Apr 2003 23:00:36 -0400 Subject: [Python-Dev] sre.c and sre_match() Message-ID: <20030415230036.L1039@localhost.localdomain> I can't find sre_match() anywhere in the source and it doesn't have a man page. Usage is sprinkled throughout sre.c but it doesn't seem to be defined anywhere I can find. Would someone in the know tell me where it is? I was actually poking around to see how hard it would be to allow pure-python string classes to work with the re modules. Much slower than base strings, but nice for odd cases (like doing regexp matches on ternary trees). -jackdied From tim_one@email.msn.com Wed Apr 16 04:12:51 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 23:12:51 -0400 Subject: [Python-Dev] Re: Re: lists v. tuples In-Reply-To: <200304160231.h3G2VJs03574@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCIEAOEHAB.tim_one@email.msn.com> >> Then it's answering true to both >> >> x < y ? >> and >> y < x ? >> >> The comparison function is insane, then [Greg Ewing] > No, I'm the one that's insane, I think. You're right, > this is impossible. For a sane comparison function, yes. Python can't enforce that user-supplied functions are sane, though, and-- as always --it's Python's job to ensure that nothing catastrophic happens when users go bad. One of the reasons Python had to grow its own sort implementation is that various platform qsort() implementations weren't robust against ill-behaved cmp functions. For example, a typical quicksort partitioning phase searches right for the next element >= key, and left for the next <= key. Some are tempted to save inner-loop index comparisons by ensuring that the leftmost slice element is <= key, and the rightmost >= key, before partitioning begins. 
Then the left and right inner searches are "guaranteed" not to go too far, and by element comparisons alone. But if the comparison function is inconsistent, that can lead to the inner loops reading outside the slice bounds, and so cause segfaults. Python's post-platform-qsort sorts all protect against that kind of crud, but can't give a useful specification of the result in such cases (beyond that the list is *some* permutation of its input state -- no element is lost or duplicated -- and guaranteeing just that much in the worst cases causes some pain in the implementation). From tim_one@email.msn.com Wed Apr 16 04:22:42 2003 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 15 Apr 2003 23:22:42 -0400 Subject: [Python-Dev] sre.c and sre_match() In-Reply-To: <20030415230036.L1039@localhost.localdomain> Message-ID: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com> [Jack Diederich] > I can't find sre_match() anywhere in the source It's in _sre.c, here: LOCAL(int) SRE_MATCH(SRE_STATE* state, SRE_CODE* pattern, int level) SRE_MATCH is a macro, and expands to either sre_match or sre_umatch, depending on whether Unicode support is enabled. Note that _sre.c arranges to compile itself *twice*, via its #define SRE_RECURSIVE #include "_sre.c" #undef SRE_RECURSIVE This is to get both 8-bit and Unicode versions of the basic routines when Unicode support is enabled. > and it doesn't have a man page. Heh. Does *any* Python source code have a man page <wink>? > ... > I was actually poking around to see how hard it would be to allow > pure-python string classes to work with the re modules. Sorry, no idea. Note that sre works on any object supporting the ill-fated buffer interface. You may have a hard time figuring out that too. But, e.g., it implies that re can search directly over an mmap'ed file (you don't need to read the file into a string first). 
From tim_one@email.msn.com Wed Apr 16 05:51:27 2003
From: tim_one@email.msn.com (Tim Peters)
Date: Wed, 16 Apr 2003 00:51:27 -0400
Subject: [Python-Dev] Garbage collecting closures
In-Reply-To: <200304151923.h3FJNmG29436@odiug.zope.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEBCEHAB.tim_one@email.msn.com>

This is a multi-part message in MIME format.

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

[Guido]
> I'm glazing over the details now, but there seems to be a kernel of
> useful cleanup in here somehow; I hope that someone will be able to
> contribute a prototype of such code at least!

I'll attach a head start, a general implementation of Tarjan's SCC
algorithm that produces a list of SCCs already in a topsort order.  I
haven't tested this enough, and Tarjan's algorithm is subtle -- user beware.

The trygc() function at the end is an example application that appears to
work, busting all the objects gc knows about into SCCs and displaying them.
This requires Python CVS (for the new gc.get_referents function).

Note that you'll get a very large SCC at the start.  This isn't an error!
Each module that imports sys ends up in this SCC, because the module has
the sys module in its module dict, and sys has the module in its
sys.modules dict.  From there, modules have their top-level functions in
their dict, while the top-level functions point back to the module dict via
func_globals.  Etc.  Everything in this giant blob is reachable from
everything else.

For the gc application, it would probably be better (run faster and consume
less memory) if dfs() simply ignored objects with no successors.
Correctness shouldn't be harmed if dfs() started with

    succs = successors(v)
    if not succs:
        return

except that objects with no successors would no longer be considered
singleton SCCs, and the recursive call to dfs() would need to be fiddled to
skip trying to update id2lowest[v_id] then (so dfs should be changed to
return a bool saying whether it took the early return).  This would save
the current work of trying to chase pointless things like ints and strings.
Still, it's pretty zippy as-is!

------=_NextPart_000_0006_01C303B2.4FA6CF60
Content-Type: text/plain; name="scc.py"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="scc.py"

# This implements Tarjan's linear-time algorithm for finding the maximal
# strongly connected components.  It takes time proportional to the sum
# of the number of nodes and arcs.
#
# Two functions must be passed to the constructor:
#     node2id      graph node -> a unique integer
#     successors   graph node -> sequence of immediate successor graph nodes
#
# Call method getsccs() with an iterable producing the root nodes of the graph.
# The result is a list of SCCs, each of which is a list of graph nodes.
# This is a partitioning of all graph nodes reachable from the roots,
# where each SCC is a maximal subset such that each node in an SCC is
# reachable from all other nodes in the SCC.  Note that the derived graph
# where each SCC is a single "supernode" is necessarily acyclic (else if
# SCC1 and SCC2 were in a cycle, each node in SCC1 would be reachable from
# each node in SCC1 and SCC2, contradicting that SCC1 is a maximal subset).
# The list of SCCs returned by getsccs() is in a topological sort order wrt
# this derived DAG.
class SCC(object):
    def __init__(self, node2id, successors):
        self.node2id = node2id
        self.successors = successors

    def getsccs(self, roots):
        import sys
        node2id, successors = self.node2id, self.successors
        get_dfsnum = iter(xrange(sys.maxint)).next
        id2dfsnum = {}
        id2lowest = {}
        stack = []
        id2stacki = {}
        sccs = []

        def dfs(v, v_id):
            id2dfsnum[v_id] = id2lowest[v_id] = v_dfsnum = get_dfsnum()
            id2stacki[v_id] = len(stack)
            stack.append((v, v_id))
            for w in successors(v):
                w_id = node2id(w)
                if w_id not in id2dfsnum:   # first time we saw w
                    dfs(w, w_id)
                    id2lowest[v_id] = min(id2lowest[v_id], id2lowest[w_id])
                else:
                    w_dfsnum = id2dfsnum[w_id]
                    if w_dfsnum < v_dfsnum and w_id in id2stacki:
                        id2lowest[v_id] = min(id2lowest[v_id], w_dfsnum)
            if id2lowest[v_id] == v_dfsnum:
                i = id2stacki[v_id]
                scc = []
                for w, w_id in stack[i:]:
                    del id2stacki[w_id]
                    scc.append(w)
                del stack[i:]
                sccs.append(scc)

        for v in roots:
            v_id = node2id(v)
            if v_id not in id2dfsnum:
                dfs(v, v_id)
        sccs.reverse()
        return sccs

_basic_tests = """
>>> succs = {1: [2], 2: []}
>>> s = SCC(int, lambda i: succs[i])

The order in which the roots are listed doesn't matter: we get the unique
topsort regardless.

>>> s.getsccs([1])
[[1], [2]]
>>> s.getsccs([1, 2])
[[1], [2]]
>>> s.getsccs([2, 1])
[[1], [2]]

But note that 1 isn't reachable from 2, so giving 2 as the only root won't
find 1.

>>> s.getsccs([2])
[[2]]

>>> succs = {1: [2],
...          2: [3, 5],
...          3: [2, 4],
...          4: [3],
...          5: [2]}
>>> s = SCC(int, lambda i: succs[i])
>>> s.getsccs([1])
[[1], [2, 3, 4, 5]]
>>> s.getsccs(range(1, 6))
[[1], [2, 3, 4, 5]]

Break the link from 4 back to 2.
>>> succs[4] = []
>>> s.getsccs([1])
[[1], [2, 3, 5], [4]]
"""

__test__ = {'basic': _basic_tests}

def _test():
    import doctest
    doctest.testmod()

if __name__ == '__main__':
    _test()

def trygc():
    import gc
    gc.collect()
    s = SCC(id, gc.get_referents)
    for scc in s.getsccs(gc.get_objects()):
        if len(scc) == 1:
            continue
        print "SCC w/", len(scc), "objects"
        for x in scc:
            print "   ", hex(id(x)), type(x),
            if hasattr(x, "__name__"):
                print x.__name__,
            print

------=_NextPart_000_0006_01C303B2.4FA6CF60--

From martin@v.loewis.de Wed Apr 16 06:19:59 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 16 Apr 2003 07:19:59 +0200
Subject: [Python-Dev] LynxOS 4 port
In-Reply-To: <20030415163728.A22630@io.com>
References: <20030415163728.A22630@io.com>
Message-ID: <m365pf9fy8.fsf@mira.informatik.hu-berlin.de>

duane voth <duanev@io.com> writes:

> I'd like to get 2.2.2 up on LynxOS 4 for PowerPC. I am very interested
> in finding others who have worked toward this, and also the person in
> charge of Python's configure scripts (as it seems LynxOS 4 is a bit of
> a hybrid).

There isn't really a single person "in charge" of it. If you have
specific suggestions or questions, don't hesitate to ask; specific
patches best go to SF.

Regards,
Martin

From python@rcn.com Wed Apr 16 06:56:45 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 16 Apr 2003 01:56:45 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
Message-ID: <000801c303e0$df6c9a20$125ffea9@oemcomputer>

The docs for PyObject_IsTrue() promise that the "function always
succeeds".  But in reality it can return an error result if an
underlying method returns an error.

The calls in ceval.c and elsewhere are cluttered and slowed by trying to
handle all three possibilities.
Instead of fixing the docs, do you guys think there may be merit in returning False whenever explicit Truth isn't found? Favoring practicality over silent error passage? This would simplify the use of the function, honor the promise in the docs, and match usage in code that had not considered an error result. The function and its callers will end-up a little smaller, a little faster, and a little more consistent. Also, reasoning about truth values will be a tad simpler. Note, similar thoughts also apply to PyObject_Not(). Raymond Hettinger Pythonistas Against Three Valued Predicates From ben@algroup.co.uk Wed Apr 16 11:20:48 2003 From: ben@algroup.co.uk (Ben Laurie) Date: Wed, 16 Apr 2003 11:20:48 +0100 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <20030415154933.GA6030@mephisto.ghaering.test> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> <20030415154933.GA6030@mephisto.ghaering.test> Message-ID: <3E9D2E80.30902@algroup.co.uk> Gerhard Häring wrote: > * Ben Laurie <ben@algroup.co.uk> [2003-04-15 15:45 +0100]: > >>Guido van Rossum wrote: >> >>>>My company would be happy to host it in The Bunker >>>>(http://www.thebunker.net/). [...] >>>>We have plenty of experience running CVS and we have 24x7 support. >>> >>>I'd like to pursue this, but I don't have time myself. A sponsorship >>>link to TheBunker would definitely be a possibility (we have a link to >>>XS4ALL at the top of www.python.org). >> >>Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? > > > Probably only Sourceforge staff. But maybe we can avoid asking them ... Is there any particular reason to avoid asking them? This is a public list, after all! Cheers, Ben. 
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/ "There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff From gh@ghaering.de Wed Apr 16 11:53:07 2003 From: gh@ghaering.de (Gerhard Haering) Date: Wed, 16 Apr 2003 12:53:07 +0200 Subject: [Python-Dev] migration away from SourceForge? In-Reply-To: <3E9D2E80.30902@algroup.co.uk> References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net> <m3fzokbae7.fsf@mira.informatik.hu-berlin.de> <1050407808.9401.8.camel@anthem> <3E9C1419.6090908@algroup.co.uk> <200304151424.h3FENGS26701@odiug.zope.com> <3E9C1B03.1070803@algroup.co.uk> <20030415154933.GA6030@mephisto.ghaering.test> <3E9D2E80.30902@algroup.co.uk> Message-ID: <3E9D3613.8070100@ghaering.de> Ben Laurie wrote: > Gerhard Häring wrote: >>>Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews? >> >>Probably only Sourceforge staff. But maybe we can avoid asking them ... > > Is there any particular reason to avoid asking them? This is a public > list, after all! No. It's just that from what I see, we can collect the necessary data ourselves and can get a timely and detailed answer by doing so. -- Gerhard From mal@lemburg.com Wed Apr 16 12:26:22 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 Apr 2003 13:26:22 +0200 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: <000801c303e0$df6c9a20$125ffea9@oemcomputer> References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> Message-ID: <3E9D3DDE.4090409@lemburg.com> Raymond Hettinger wrote: > The docs for PyObject_IsTrue() promise that the "function > always succeeds". But in reality it can return an error > result if an underlying method returns an error. > > The calls in ceval.c and elsewhere are cluttered and slowed > by trying to handle all three possibilities. 
In other places > (like bltinmodule.c and pyexpat.c), the result is used directly > in an "if(result)" clause that ignores the possibility of an > error return. > > Instead of fixing the docs, do you guys think there may > be merit in returning False whenever explicit Truth isn't > found? Favoring practicality over silent error passage? Hmm, I've checked my sources and found that I am assuming the documented behaviour, ie. the function never fails. The Zope sources also assume this behaviour and many other extensions probably do too... (we really need a repository of available open source code for Python which makes grepping these things easier, oh well). > This would simplify the use of the function, honor the > promise in the docs, and match usage in code that had not > considered an error result. The function and its callers will > end-up a little smaller, a little faster, and a little more consistent. > Also, reasoning about truth values will be a tad simpler. > > Note, similar thoughts also apply to PyObject_Not(). -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 16 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 69 days left From mhammond@skippinet.com.au Wed Apr 16 13:11:03 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 16 Apr 2003 22:11:03 +1000 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: <3E9D3DDE.4090409@lemburg.com> Message-ID: <00ec01c30411$4117a690$530f8490@eden> MAL: > (we really need a repository > of available open source code for Python which makes grepping > these things easier, oh well). Isn't this just a list of CVS roots (and passwords for anonymous on that server <wink/frown>)? 
Members of the Python foundry at source-forge wouldn't be a bad place to
start.  Except see that other thread <wink>.

Mark.

From skip@pobox.com Wed Apr 16 13:54:51 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 07:54:51 -0500
Subject: [Python-Dev] migration away from SourceForge?
In-Reply-To: <3E9D3613.8070100@ghaering.de>
References: <LNBBLJKPBEHFEDALKOLCOEDJEDAB.tim.one@comcast.net>
 <m3fzokbae7.fsf@mira.informatik.hu-berlin.de>
 <1050407808.9401.8.camel@anthem>
 <3E9C1419.6090908@algroup.co.uk>
 <200304151424.h3FENGS26701@odiug.zope.com>
 <3E9C1B03.1070803@algroup.co.uk>
 <20030415154933.GA6030@mephisto.ghaering.test>
 <3E9D2E80.30902@algroup.co.uk>
 <3E9D3613.8070100@ghaering.de>
Message-ID: <16029.21147.256535.724317@montanaro.dyndns.org>

    >> Gerhard Häring wrote:
    >>>> Groovy. _Does_ anyone have any idea how much bandwidth your CVS chews?
    >>>
    >>> Probably only Sourceforge staff. But maybe we can avoid asking them ...
    >>
    >> Is there any particular reason to avoid asking them? This is a public
    >> list, after all!

    Gerhard> No. It's just that from what I see, we can collect the
    Gerhard> necessary data ourselves and can get a timely and detailed
    Gerhard> answer by doing so.

"Timely" being the operative word here, I think.

Skip

From guido@python.org Wed Apr 16 14:30:37 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 09:30:37 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: Your message of "Tue, 15 Apr 2003 20:41:31 EDT."
<200304160041.h3G0fVI06215@europa.research.att.com> References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> Message-ID: <200304161330.h3GDUbd07889@odiug.zope.com> > Guido> That's cute, but a bit too magical for my taste... It's not > Guido> immediately obvious how this would be done (I know how, but it > Guido> would require a lot of explaining). Plus, -1 is a perfectly > Guido> valid truth value. > > Yes, I know that -1 is a valid truth value. > > Here's the trick. The object of the game is to figure out whether > f is < or __cmp__. > > Suppose you call f(x, y) and it returns 0. Then you don't care > which one f is, because x<y is false either way. > > So the first time you care is the first time f(x, y) returns nonzero. > Now you can find out what kind of function f is by calling f(y, x). > If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison. Right. There's no flaw in this logic, but I'd hate to have to explain it over and over... I don't want people to believe that Python can somehow magically sniff the difference between two functions; they might expect it in other contexts. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 16 14:40:53 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 09:40:53 -0400 Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA In-Reply-To: Your message of "Wed, 16 Apr 2003 01:56:45 EDT." 
<000801c303e0$df6c9a20$125ffea9@oemcomputer> References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> Message-ID: <200304161340.h3GDerM07941@odiug.zope.com> > The docs for PyObject_IsTrue() promise that the "function > always succeeds". But in reality it can return an error > result if an underlying method returns an error. Then the docs need to be repaired! > The calls in ceval.c and elsewhere are cluttered and slowed > by trying to handle all three possibilities. In other places > (like bltinmodule.c and pyexpat.c), the result is used directly > in an "if(result)" clause that ignores the possibility of an > error return. Code that ignores the error return possibility is an accident waiting to happen and should be fixed. > Instead of fixing the docs, do you guys think there may > be merit in returning False whenever explicit Truth isn't > found? Favoring practicality over silent error passage? -1000. This function may invoke arbitrary Python code; exceptions in such code should never be silenced. > This would simplify the use of the function, honor the > promise in the docs, and match usage in code that had not > considered an error result. The function and its callers will > end-up a little smaller, a little faster, and a little more consistent. > Also, reasoning about truth values will be a tad simpler. > > Note, similar thoughts also apply to PyObject_Not(). And a ditto response. Background: once upon a time the code honored the docs. This was way long ago, when comparisons also were not allowed to fail. This was found out to be a real bad idea when these operations could be overloaded in Python, and gradually most code was fixed. Unfortunately the docs weren't fixed. 
:-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From sismex01@hebmex.com Wed Apr 16 14:38:14 2003
From: sismex01@hebmex.com (sismex01@hebmex.com)
Date: Wed, 16 Apr 2003 08:38:14 -0500
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
Message-ID: <F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com>

I've found something very, very strange: the interpreter dies on me when
printing a UTF-8 encoded unicode object, when the terminal has a unicode
codepage.  Before anyone asks, I'm running on Windows NT 4.

First, I read this message on Python-List from Ben Hutchings:

> UTF-8 is code page 65001.
>
> Strangely, though, I get 'permission denied' when I run "chcp 65001" and
> then try to print a UTF-8-encoded Euro sign. I don't know what could be
> going wrong there.

So, promptly, I opened a console window, changed the codepage using the
above command and started Python.  When executing the following:

>>> print u"hòlá".encode("utf-8")

[in case it doesn't print out correctly, using html entities, it's
"hòlá".encode("utf-8")]

the interpreter simply exits without any message, exception, peep,
anything; it simply quits without printing anything.

Any suggestions?

-gustavo

P.S.: Before anybody mentions adding a bug report in SF, I must warn that
I don't have web access, only email access.
From jack@performancedrivers.com Wed Apr 16 14:55:00 2003
From: jack@performancedrivers.com (Jack Diederich)
Date: Wed, 16 Apr 2003 09:55:00 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>; from tim_one@email.msn.com on Tue, Apr 15, 2003 at 11:22:42PM -0400
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>
Message-ID: <20030416095500.M1039@localhost.localdomain>

On Tue, Apr 15, 2003 at 11:22:42PM -0400, Tim Peters wrote:
> [Jack Diederich]
> > I can't find sre_match() anywhere in the source
>
> It's in _sre.c, here:
>
> LOCAL(int)
> SRE_MATCH(SRE_STATE* state, SRE_CODE* pattern, int level)
>
> SRE_MATCH is a macro, and expands to either sre_match or sre_umatch,
> depending on whether Unicode support is enabled.  Note that _sre.c
> arranges to compile itself *twice*, via its
>
> #define SRE_RECURSIVE
> #include "_sre.c"
> #undef SRE_RECURSIVE
>
> This is to get both 8-bit and Unicode versions of the basic routines when
> Unicode support is enabled.

My god, it's full of stars.

Ah, that explains how both sre_match() and sre_umatch() get defined and
make the if (state.charsize == 1) switches possible.  The SRE_RECURSIVE
trick isn't hard to understand once you know it is there, but might it be
tidier to break out the stuff parsed twice into another file?

The current layout of _sre.c is

  <stuff done once, setup stuff>
  <stuff done twice, via #include "_sre.c">
  <stuff done once, object stuff>

mv <stuff done twice> to _sre_twice.c

  #define SRE_MATCH sre_match
  #include "_sre_twice.c"  /* defines the symbols sre_match, sre_search .. */
  #define SRE_MATCH sre_umatch
  #include "_sre_twice.c"  /* defines the symbols sre_umatch, sre_usearch .. */
  <stuff done once>

You probably don't get random people walking around _sre.c much, but it
would have gotten me where I need to go (or at least a better chance).
thanks,
-jack

From duncan@rcp.co.uk Wed Apr 16 15:22:06 2003
From: duncan@rcp.co.uk (Duncan Booth)
Date: Wed, 16 Apr 2003 15:22:06 +0100
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
References: <F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com>
Message-ID: <Xns935F9C2237892duncanrcpcouk@127.0.0.1>

sismex01@hebmex.com wrote in
news:F7DB8D13DB61D511B6FF00B0D0F06233045E4456@mail.hebmex.com:

> the interpreter simply exits without any message, exception,
> peep, anything; it simply quits without printing anything.
>
> Any suggestions?

I think it's a problem with windows, or with the C runtime, rather than
Python.  The line editing is handled by the system and is obviously screwy.
Python is interpreting what you entered as signalling end of file.  Call
raw_input and type your text there and you will get an EOFError.

Try typing any non-ascii character at Python's prompt (e.g. euro symbol)
while the selected codepage is 65001, now move the cursor back to anywhere
earlier in the input line and enter some more text.  The non-ascii
character displayed will change.  If you restart the interpreter and recall
the line you entered you won't get the characters you thought you typed.

Now write a C program:

#include <stdio.h>
int main()
{
    char s[256];
    gets(s);
    return 0;
}

Compile and run it and you get exactly the same behaviour.

--
Duncan Booth                                          duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

From Paul.Moore@atosorigin.com Wed Apr 16 15:53:24 2003
From: Paul.Moore@atosorigin.com (Moore, Paul)
Date: Wed, 16 Apr 2003 15:53:24 +0100
Subject: [Python-Dev] Python dies upon printing UNICODE using UTF-8
Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A40@UKDCX001.uk.int.atosorigin.com>

From: Duncan Booth [mailto:duncan@rcp.co.uk]
> I think it's a problem with windows, or with the C runtime rather than
> Python.
> The line editing is handled by the system and is obviously screwy.
> Python is interpreting what you entered as signalling end of file. Call
> raw_input and type your text there and you will get an EOFError.

Too right something's screwy. But it's not just in the interactive interpreter. It goes wrong when run from a file, with no non-ascii characters in the script, as well. See the attached transcript.

I don't doubt that it's some sort of Windows/CRT problem, but maybe it's fixable within Python...?

Paul

--- session transcript ---
C:\Data >chcp
Active code page: 65001

C:\Data >testutf8.py
hòlá
Traceback (most recent call last):
  File "C:\Data\testutf8.py", line 1, in ?
    print u'h\xf2l\xe1'.encode("utf-8")
IOError: [Errno 2] No such file or directory

C:\Data >type testutf8.py
print u'h\xf2l\xe1'.encode("utf-8")

From niemeyer@conectiva.com Wed Apr 16 15:56:03 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 11:56:03 -0300
Subject: [Python-Dev] shellwords
Message-ID: <20030416145602.GA27447@localhost.distro.conectiva>

Good morning/afternoon!

Is there any chance of getting shellwords[1] into Python 2.3? It's a very small module with some pretty interesting functionality:

[niemeyer@localhost ..-shellwords-0.2]% python
Python 2.2.2 (#1, Apr 10 2003, 13:50:16)
[GCC 3.2.2] on linux-ppc
Type "help", "copyright", "credits" or "license" for more information.
>>> import shellwords
>>> shellwords.shellwords('arg "arg arg" arg "arg" -o="arg arg"')
['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']
>>>

[1] http://www.crazy-compilers.com/py-lib/shellwords.html

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From skip@pobox.com Wed Apr 16 16:12:35 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 10:12:35 -0500
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416145602.GA27447@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva>
Message-ID: <16029.29411.430501.744446@montanaro.dyndns.org>

Gustavo> Is there any chance of getting shellwords[1] into Python 2.3?

Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but didn't quite understand it.)

Skip

From ark@research.att.com Wed Apr 16 16:20:52 2003
From: ark@research.att.com (Andrew Koenig)
Date: 16 Apr 2003 11:20:52 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: <200304161330.h3GDUbd07889@odiug.zope.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> <200304161330.h3GDUbd07889@odiug.zope.com>
Message-ID: <yu99znmqa2p7.fsf@europa.research.att.com>

>> So the first time you care is the first time f(x, y) returns nonzero.
>> Now you can find out what kind of function f is by calling f(y, x).
>> If f(y, x) returns zero, f is <. Otherwise, it's a 3-way comparison.

Guido> Right.
There's no flaw in this logic, but I'd hate to have to
Guido> explain it over and over... I don't want people to believe
Guido> that Python can somehow magically sniff the difference between
Guido> two functions; they might expect it in other contexts.

I can understand your reluctance -- I was just pointing out that it's possible.

However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f) technique because it doesn't generalize terribly well. If I want to write a function that takes a comparison function as an argument, and eventually passes that function to sort, what do I do? Something like this?

def myfun(foo, bar, lt=None, cmp=None):
    # ...
    x.sort(lt=lt, cmp=cmp)
    # ...

and assume that sort will use None as its defaults also? Or must I write

if lt is None:
    x.sort(cmp=cmp)
else:
    x.sort(lt=lt)

Either way it's inconvenient. So I wonder if it might be better, as a way of allowing sort to take two different types of comparison functions, to distinguish between them by making them different types.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

From niemeyer@conectiva.com Wed Apr 16 16:22:56 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:22:56 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.29411.430501.744446@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <16029.29411.430501.744446@montanaro.dyndns.org>
Message-ID: <20030416152255.GA27792@localhost.distro.conectiva>

> Gustavo> Is there any chance of getting shellwords[1] into Python 2.3?
>
> Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but
> didn't quite understand it.)

I haven't tried, but it surely can, by subclassing and rewriting portions of it. OTOH, shellwords is about half the size of shlex, and shlex looks overly complex for something simple like

args = shellwords(line)

Btw, it wasn't *my* Q/A; I haven't written shellwords.
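The quote-joining semantics being discussed, where adjacent quoted and unquoted fragments fuse into one word as the Bourne shell does, can be sketched in a few lines. This is a toy splitter, not the actual shellwords implementation; it assumes balanced quotes and ignores backslash escapes.

```python
def split_words(line):
    # Toy shell-style word splitting: whitespace separates words, and
    # adjacent quoted/unquoted fragments join into a single word.
    words, current, in_word = [], [], False
    i = 0
    while i < len(line):
        c = line[i]
        if c in " \t":
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        elif c in "'\"":
            end = line.index(c, i + 1)  # assumes balanced quotes
            current.append(line[i + 1:end])
            in_word = True
            i = end + 1
        else:
            current.append(c)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words

print(split_words("foo 'bar'asd'foo'"))  # -> ['foo', 'barasdfoo']
```

This reproduces the `echo foo 'bar'asd'foo'` behaviour quoted later in the thread, in contrast to shlex's treatment of each quoted run as its own token.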
-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From guido@python.org Wed Apr 16 16:29:15 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:29:15 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: Your message of "Wed, 16 Apr 2003 12:16:29 -0300." <20030416151629.GA27707@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva>
Message-ID: <200304161529.h3GFTFr09409@odiug.zope.com>

> > > [1] http://www.crazy-compilers.com/py-lib/shellwords.html
> >
> > Hm, couldn't this be easily done with shlex?
>
> >From the homepage:
>
> """
> Frequently Asked Questions
>
> Q: Hey, there is 'shlex' coming with Python. Why there is a need for
> this module? A: I know 'shlex' and I gave it a try. But 'shlex' takes
> quotes as word-delemiters which divers from the shell-semantic (see
> above). And even if 'shlex' would parse strings as needed, I would have
> written a (very, very) thin layer above, since 'shlex' is simple but
> seldomly used for this kind of job.
> """

I saw that after posting. :-( The argument "'shlex' is simple but seldomly used for this kind of job" seems circular though: "I'm not using shlex because it's rarely used"???

> I agree with him. Even disregarding the fact of the syntax
> divergence, shellwords is about half the size of shlex, and it's
> much more comfortable, allowing one-liners like "for opt in
> shellwords(line):".

I know I've wished for this once or twice, but not badly enough to bother solving the problem right. I'm worried that having too many ways to do mostly the same thing adds code bloat. Couldn't adding something even smaller on top of shlex provide the same interface and solve the syntactic divergence?
--Guido van Rossum (home page: http://www.python.org/~guido/)

From niemeyer@conectiva.com Wed Apr 16 16:29:44 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:29:44 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.29411.430501.744446@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <16029.29411.430501.744446@montanaro.dyndns.org>
Message-ID: <20030416152944.GA27900@localhost.distro.conectiva>

> Can shlex not be convinced to do what you want? (Yes, I saw your Q/A, but
> didn't quite understand it.)

Oh, sorry. Just now I noticed that you didn't *understand* it. He was talking about this:

>>> s = StringIO.StringIO("foo 'bar'asd'foo'")
>>> l = shlex.shlex(s)
>>> l.
l.__class__      l.error_leader   l.pop_source     l.source
l.__doc__        l.filestack      l.push_source    l.sourcehook
l.__init__       l.get_token      l.push_token     l.state
l.__module__     l.infile         l.pushback       l.token
l.commenters     l.instream       l.quotes         l.whitespace
l.debug          l.lineno         l.read_token     l.wordchars
>>> l.read_token()
'foo'
>>> l.read_token()
"'bar'"
>>> l.read_token()
"asd'foo'"
>>>

In contrast to:

>>> shellwords.shellwords("foo 'bar'asd'foo'")
['foo', 'barasdfoo']

And also:

[niemeyer@localhost ~/src]% echo foo 'bar'asd'foo'
foo barasdfoo

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From niemeyer@conectiva.com Wed Apr 16 16:30:56 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 12:30:56 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <200304161529.h3GFTFr09409@odiug.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com>
Message-ID: <20030416153056.GB27900@localhost.distro.conectiva>

[...]
> Couldn't adding something even smaller on top of shlex provide the
> same interface and solve the syntactic divergence?
Ok, I'll check if there's an easy way to "turn" shlex into shellwords.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From guido@python.org Wed Apr 16 16:32:28 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:32:28 -0400
Subject: [Python-Dev] Re: Re: lists v. tuples
In-Reply-To: Your message of "16 Apr 2003 11:20:52 EDT." <yu99znmqa2p7.fsf@europa.research.att.com>
References: <20030312164902.10494.64514.Mailman@mail.python.org> <200303140903.10045.aleax@aleax.it> <3E71F851.3030802@tismer.com> <200303150857.53214.aleax@aleax.it> <200303151236.h2FCaJP06038@pcp02138704pcs.reston01.va.comcast.net> <b4vp23$vec$1@main.gmane.org> <200303152245.h2FMjZx06571@pcp02138704pcs.reston01.va.comcast.net> <yu99adfw5h5n.fsf@europa.research.att.com> <200303161232.h2GCW4Q15556@pcp02138704pcs.reston01.va.comcast.net> <200304152129.h3FLTOL05240@europa.research.att.com> <200304152349.h3FNno407072@pcp02138704pcs.reston01.va.comcast.net> <200304160041.h3G0fVI06215@europa.research.att.com> <200304161330.h3GDUbd07889@odiug.zope.com> <yu99znmqa2p7.fsf@europa.research.att.com>
Message-ID: <200304161532.h3GFWSU09441@odiug.zope.com>

> However, I'm slightly dubious about the x.sort(lt=f) vs x.sort(cmp=f)
> technique because it doesn't generalize terribly well.
>
> If I want to write a function that takes a comparison function as an
> argument, and eventually passes that function to sort, what do I do?
> Something like this?
>
> def myfun(foo, bar, lt=None, cmp=None):
>     # ...
>     x.sort(lt=lt, cmp=cmp)
>     # ...
>
> and assume that sort will use None as its defaults also? Or must I
> write
>
> if lt is None:
>     x.sort(cmp=cmp)
> else:
>     x.sort(lt=lt)
>
> Either way it's inconvenient.

Given that (if we add this) the cmp argument will be deprecated, myfun() should take a 'lt' comparison only.
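In later-Python terms, the advice above might look like this. The myfun here is hypothetical, and functools.cmp_to_key postdates the 2.3 discussion, but it shows a function accepting only an lt-style predicate and adapting it internally, so callers never juggle lt= vs cmp= themselves.

```python
import functools

def myfun(items, lt):
    # Accept only an 'lt' predicate (a < b -> bool) and build the
    # 3-way comparison internally when handing it to sort.
    def three_way(a, b):
        if lt(a, b):
            return -1
        if lt(b, a):
            return 1
        return 0
    items.sort(key=functools.cmp_to_key(three_way))
    return items

print(myfun([3, 1, 2], lambda a, b: a < b))  # -> [1, 2, 3]
```

Wrapping at one chokepoint sidesteps the "which keyword do I forward?" problem Andrew raises.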
> So I wonder if it might be better, as a way of allowing sort to take
> two different types of comparison functions, to distinguish between
> them by making them different types.

But Python doesn't do types that way.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com Wed Apr 16 16:40:14 2003
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 10:40:14 -0500
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416153056.GB27900@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva>
Message-ID: <16029.31070.687527.821448@montanaro.dyndns.org>

Gustavo> Ok, I'll check if there's an easy way to "turn" shlex into
Gustavo> shellwords.

Cool. Based on this thread and an experiment I tried, some obvious (to me) things come to mind:

* get_token() needs to be fixed to handle the 'bar'asd'foo' case

* the shlex class should handle strings as input, not just file-like objects

* get_word() or get_words() methods in the shlex class could implement the shellwords functionality

Skip

From fdrake@acm.org Wed Apr 16 16:41:22 2003
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 16 Apr 2003 11:41:22 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416153056.GB27900@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva>
Message-ID: <16029.31138.988795.672854@grendel.zope.com>

Gustavo Niemeyer writes:
> Ok, I'll check if there's an easy way to "turn" shlex into shellwords.

Is there any real objection to simply fixing shlex to get it right?
I'm guessing that the divergence from shell quoting was more a matter of implementation expedience and a feeling that it was "good enough" for whatever original application it was written for.

-Fred

-- 
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation

From guido@python.org Wed Apr 16 16:45:12 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:45:12 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: Your message of "Wed, 16 Apr 2003 10:40:14 CDT." <16029.31070.687527.821448@montanaro.dyndns.org>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org>
Message-ID: <200304161545.h3GFjC710136@odiug.zope.com>

> Gustavo> Ok, I'll check if there's an easy way to "turn" shlex into
> Gustavo> shellwords.
>
> Cool. Based on this thread and an experiment I tried, some obvious (to me)
> things come to mind:
>
> * get_token() needs to be fixed to handle the 'bar'asd'foo' case
>
> * the shlex class should handle strings as input, not just file-like
>   objects
>
> * get_word() or get_words() methods in the shlex class could implement
>   the shellwords functionality

I'd be happy to see this done. You might submit the changes to ESR for review, but he may be busy, so don't wait for him.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Wed Apr 16 17:11:23 2003
From: esr@thyrsus.com (Eric S.
 Raymond)
Date: Wed, 16 Apr 2003 12:11:23 -0400
Subject: [Python-Dev] shellwords
In-Reply-To: <16029.31138.988795.672854@grendel.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31138.988795.672854@grendel.zope.com>
Message-ID: <20030416161123.GA13046@thyrsus.com>

Fred L. Drake, Jr. <fdrake@acm.org>:
> Gustavo Niemeyer writes:
> > Ok, I'll check if there's an easy way to "turn" shlex into shellwords.
>
> Is there any real objection to simply fixing shlex to get it right?
> I'm guessing that the divergence from shell quoting was more a matter
> of implementation expedience and a feeling that it was "good enough"
> for whatever original application it was written for.

That is correct. I originally wrote shlex as the parser logic for a .netrc module. I would have no intrinsic objection to having this behavior fixed, though there is of course the general problem of how much we value not breaking old code.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

From guido@python.org Wed Apr 16 16:52:10 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 11:52:10 -0400
Subject: [Python-Dev] 2.3b1 release
Message-ID: <200304161552.h3GFqAQ10181@odiug.zope.com>

I'd like to do a 2.3b1 release someday. Maybe at the end of next week; that would be Friday, April 25. If anyone has something that needs to be done before this release goes out, please let me know! Assigning a SF bug or patch to me and setting the priority to 7 is a good way to get my attention.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From niemeyer@conectiva.com Wed Apr 16 17:43:14 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 13:43:14 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <200304161545.h3GFjC710136@odiug.zope.com>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> <200304161545.h3GFjC710136@odiug.zope.com>
Message-ID: <20030416164314.GA28085@localhost.distro.conectiva>

> > Cool. Based on this thread and an experiment I tried, some obvious (to me)
> > things come to mind:
> >
> > * get_token() needs to be fixed to handle the 'bar'asd'foo' case
> >
> > * the shlex class should handle strings as input, not just file-like
> >   objects
> >
> > * get_word() or get_words() methods in the shlex class could implement
> >   the shellwords functionality
>
> I'd be happy to see this done. You might submit the changes to ESR
> for review but he may be busy so don't wait for him.

Great! I'll work on it.

What should we do to avoid compatibility problems? Some solutions that come to mind are:

- Forget about compatibility completely and fix the syntax handling to be POSIX-compliant.

- Create a subclass of shlex, or a completely different class (shlex_posix?), depending on how much can be reused.

- Add a flag to the constructor.

Suggestions?
-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From barry@python.org Wed Apr 16 17:52:06 2003
From: barry@python.org (Barry Warsaw)
Date: 16 Apr 2003 12:52:06 -0400
Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3
In-Reply-To: <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de>
Message-ID: <1050511925.9818.78.camel@barry>

On Sat, 2003-04-12 at 07:43, Martin v. Löwis wrote:

> More or less, yes. Now, what happens if you put "real" non-ASCII
> (i.e. bytes above 127) into the message id, like so:

But I don't think you'd ever want to do that. In fact, I think in general you're probably talking about ascii msgids or utf-8 encoded Unicode msgids. I'm not sure what else would make sense.

> msgfmt will still accept that, but msgunfmt will complain:

Didn't even know about msgunfmt. :)

> msgunfmt: warning: The following msgid contains non-ASCII characters.
>           This will cause problems to translators who use a
>           character encoding different from yours. Consider
>           using a pure ASCII msgid instead.
>
> If you think about this, this is really bad: If you mean to apply the
> charset= to both msgid and msgstr, then translators using a different
> charset from yours are in big trouble.

Right, but see above. E.g. if your string literals are all Spanish and you want a Turkish translation, then utf-8 is the only common encoding you could possibly use in a .po file, right?

> They are faced with three problems:
> 1. They don't know what the charset of the msgids is. The PO files do
> have a charset declaration, the POT files typically don't.

Yep, although it would be easy for the extractor to add a charset=utf-8 to the pot file.

> 2. They need to convert the msgids from the POT encoding to their
There are no tools available to support that readily; > tools like iconv might correctly convert the msgids, but won't update > the charset= in the POT file (if the charset was filled out). > 3. By converting the msgids, they are also changing them. That means > the msgids are not really suitable as keys anymore. Is this still a problem for when charset=utf-8? -Barry From barry@python.org Wed Apr 16 17:53:53 2003 From: barry@python.org (Barry Warsaw) Date: 16 Apr 2003 12:53:53 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m38yug57j6.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050093475.11200.96.camel@barry> <m38yug57j6.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050512032.9818.81.camel@barry> On Sat, 2003-04-12 at 06:34, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > I suppose we could cache the conversion to make the next lookup more > > efficient. Alternatively, if we always convert internally to Unicode we > > could encode on .gettext(). Then we could just pick One Way and do away > > with the coerce flag. > > If you are concerned about efficiency, I guess there is no way to > avoid converting the file to Unicode on loading. I would then > encourage a change where this flag is available, but has an effect > only on performance, not on the behaviour. > > Alternatively, you could subclass GNUTranslation. It would take some refactoring, unless you implemented a second pass over the catalog. I'd rather not do either, so I'm happy to include this right in GNUTranslations. 
-Barry

From niemeyer@conectiva.com Wed Apr 16 18:03:35 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Wed, 16 Apr 2003 14:03:35 -0300
Subject: [Python-Dev] shellwords
In-Reply-To: <20030416164314.GA28085@localhost.distro.conectiva>
References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> <200304161545.h3GFjC710136@odiug.zope.com> <20030416164314.GA28085@localhost.distro.conectiva>
Message-ID: <20030416170335.GA28540@localhost.distro.conectiva>

> Great! I'll work on it.
>
> What should we do to avoid compatibility problems? Some solutions that
> come to mind are:
>
> - Forget about compatibility completely and fix the syntax handling to
>   be POSIX-compliant.
>
> - Create a subclass of shlex, or a completely different class
>   (shlex_posix?), depending on how much can be reused.
>
> - Add a flag to the constructor.

Thinking further about this, I believe there's a better solution. I'll write different functions (probably read_word()/get_word()) with the new behavior.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From drifty@alum.berkeley.edu Wed Apr 16 18:31:04 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Wed, 16 Apr 2003 10:31:04 -0700 (PDT)
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
Message-ID: <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>

[Guido van Rossum]
> I'd like to do a 2.3b1 release someday. Maybe at the end of next
> week, that would be Friday April 25. If anyone has something that
> needs to be done before this release goes out, please let me know!
Just to make sure, since this is the first release for which I have CVS commit access: we can apply patches to fix bugs without having to worry about it being beta, right? How about new tests?

-Brett

From jack@performancedrivers.com Wed Apr 16 18:33:58 2003
From: jack@performancedrivers.com (Jack Diederich)
Date: Wed, 16 Apr 2003 13:33:58 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>; from tim_one@email.msn.com on Tue, Apr 15, 2003 at 11:22:42PM -0400
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com>
Message-ID: <20030416133358.A1553@localhost.localdomain>

> [Jack Diederich]
> > ...
> > I was actually poking around to see how hard it would be to allow
> > pure-python string classes to work with the re modules.

[Tim Peters]
> Sorry, no idea. Note that sre works on any object supporting the ill-fated
> buffer interface. You may have a hard time figuring out that too. But,
> e.g., it implies that re can search directly over an mmap'ed file (you don't
> need to read the file into a string first).

Poking around some more in _sre.c, it looks like user-defined strings could be supported via the same #include hack as unicode, with some extra defines.

// ascii/unicode
#define STATE_NEXT_CHAR(state) state->ptr++
// user strings
#define STATE_NEXT_CHAR(state) PyEval_CallObject(state->string_nextmethod)

Similar defines would cover STATE_PREV_CHAR, plus something to ask if we're at the end:

// ascii
#define STATE_ISEND(state) (state->ptr == state->end)
// user strings
#define STATE_ISEND(state) PyEval_CallObject(state->string_endmethod)

Is there a speed reason why all the SRE_MATCH-type functions do

ptr = state->ptr;
ptr++;
ptr--;
// lots more stuff with ptr
state->ptr = ptr;

or is it just convenience? If just convenience, it would make writing the #defines easier.
The PyEval_CallObject calls are just pseudo-code; each would be wrapped in something that tested the appropriateness of the return value and did other bookkeeping. Could this be done without hurting the speed of regular regexps?

-jackdied

From guido@python.org Wed Apr 16 18:36:19 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 13:36:19 -0400
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: Your message of "Wed, 16 Apr 2003 10:31:04 PDT." <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.SOL.4.53.0304161030001.26627@death.OCF.Berkeley.EDU>
Message-ID: <200304161736.h3GHaJB10928@odiug.zope.com>

> > I'd like to do a 2.3b1 release someday. Maybe at the end of next
> > week, that would be Friday April 25. If anyone has something that
> > needs to be done before this release goes out, please let me know!
>
> Just to make sure, since this is the first release for which I have CVS
> commit access: we can apply patches to fix bugs without having to worry
> about it being beta, right?

Right. Fix away.

> How about new tests?

Feel free to add new unit tests, as long as the whole unit test suite passes when you commit.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 16 18:39:50 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 13:39:50 -0400
Subject: [Python-Dev] sre.c and sre_match()
In-Reply-To: Your message of "Wed, 16 Apr 2003 13:33:58 EDT." <20030416133358.A1553@localhost.localdomain>
References: <20030415230036.L1039@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCKEAPEHAB.tim_one@email.msn.com> <20030416133358.A1553@localhost.localdomain>
Message-ID: <200304161739.h3GHdoP10981@odiug.zope.com>

There are few people here who understand the _sre code, so I'm not sure you'll get answers.
Given how critical this code is, and given that Fredrik is adamant that the code needs to continue to run with all versions of Python starting with 1.5.2, I'd rather not mess with it much in terms of adding new features. Maybe you can create your own code fork for now?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller@python.net Wed Apr 16 18:47:57 2003
From: theller@python.net (Thomas Heller)
Date: 16 Apr 2003 19:47:57 +0200
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
Message-ID: <4r4yqqpe.fsf@python.net>

Guido van Rossum <guido@python.org> writes:

> I'd like to do a 2.3b1 release someday. Maybe at the end of next
> week, that would be Friday April 25. If anyone has something that
> needs to be done before this release goes out, please let me know!

I would still like to work on http://www.python.org/sf/595026, support for masks in getargs.c. Jack requested that this change be implemented shortly after the release of 2.3a2, but it seems this is too late now ;-)

What to do? Implement it now and commit it after 2.3b1 is released, or delay it until 2.3 final is released? I have to admit that I'm sure I can implement it for 32-bit Windows, but it would have to be tested (and maybe completed) on other, especially 64-bit, platforms as well. And it introduces incompatibilities.

BTW: Since you want to release a beta version, what's the state of the FutureWarning about hex/oct constants: will this stay the way it is?
Thomas

From python@rcn.com Wed Apr 16 18:50:57 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 16 Apr 2003 13:50:57 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> <200304161340.h3GDerM07941@odiug.zope.com>
Message-ID: <00d701c30440$bc766680$125ffea9@oemcomputer>

> > The docs for PyObject_IsTrue() promise that the "function
> > always succeeds". But in reality it can return an error
> > result if an underlying method returns an error.
>
> Then the docs need to be repaired!

Done.

From guido@python.org Wed Apr 16 19:00:54 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 14:00:54 -0400
Subject: [Python-Dev] Masks in getargs.c (was: 2.3b1 release)
In-Reply-To: Your message of "16 Apr 2003 19:47:57 +0200." <4r4yqqpe.fsf@python.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net>
Message-ID: <200304161800.h3GI0sP11085@odiug.zope.com>

> Guido van Rossum <guido@python.org> writes:
>
> > I'd like to do a 2.3b1 release someday. Maybe at the end of next
> > week, that would be Friday April 25. If anyone has something that
> > needs to be done before this release goes out, please let me know!

> From: Thomas Heller <theller@python.net>
>
> I would still like to work on http://www.python.org/sf/595026,
> support for masks in getargs.c.

Great!

> Jack requested that this change be implemented shortly after the
> release of 2.3a2, but it seems this is too late now ;-)
>
> What to do?

Do it ASAP.

> Implement it now and commit it after 2.3b1 is released, or delay it
> until 2.3 final is released? I have to admit that I'm sure I can
> implement it for 32-bit Windows, but it would have to be tested (and
> maybe completed) on other, especially 64-bit, platforms as well.

If you can get something rough into 2.3b1, it can be improved while 2.3b2 is cooking.

> And it introduces incompatibilities.

What kind?
I thought it would be a new format code?

> BTW: Since you want to release a beta version, what's the state of the
> FutureWarning about hex/oct constants: will this stay the way it is?

Probably, unless you have a better idea. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 16 19:01:46 2003
From: guido@python.org (Guido van Rossum)
Date: Wed, 16 Apr 2003 14:01:46 -0400
Subject: [Python-Dev] 3-way result of PyObject_IsTrue() considered PITA
In-Reply-To: Your message of "Wed, 16 Apr 2003 13:50:57 EDT." <00d701c30440$bc766680$125ffea9@oemcomputer>
References: <000801c303e0$df6c9a20$125ffea9@oemcomputer> <200304161340.h3GDerM07941@odiug.zope.com> <00d701c30440$bc766680$125ffea9@oemcomputer>
Message-ID: <200304161801.h3GI1kW11105@odiug.zope.com>

> > > The docs for PyObject_IsTrue() promise that the "function
> > > always succeeds". But in reality it can return an error
> > > result if an underlying method returns an error.
> >
> > Then the docs need to be repaired!
>
> Done.

Thanks! But didn't you say that you had found code (in core Python) that didn't account for failures? Shouldn't that be fixed too?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller@python.net Wed Apr 16 19:11:27 2003
From: theller@python.net (Thomas Heller)
Date: 16 Apr 2003 20:11:27 +0200
Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release)
In-Reply-To: <200304161800.h3GI0sP11085@odiug.zope.com>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com>
Message-ID: <vfxepb1s.fsf@python.net>

> > I would still like to work on http://www.python.org/sf/595026,
> > support for masks in getargs.c.
>
> Great!
>
> > Jack requested that this change be implemented shortly after the
> > release of 2.3a2, but it seems this is too late now ;-)
> >
> > What to do?
>
> Do it ASAP.

Ok, working on it.
> > > Implement it now and commit it after 2.3b1 is released, or delay this > > until 2.3 final is released. I have to admit that I'm sure I can > > implement it for 32-bit Windows, but it would have to be tested (and > > maybe completed) on other, especially 64-bit platforms as well. > > If you can get something rough into 2.3b1, it can be improved while > 2.3b2 is cooking. > > > And it introduces incompatibilities. > > What kind? I thought it would be a new format code? Two new format codes ('k' and 'K'), and changes to existing format codes - per your request:

| How about the following counterproposal. This also changes some of the
| other format codes to be a little more regular.
|
| Code   C type              Range check
|
| b      unsigned char       0..UCHAR_MAX
| B      unsigned char       none **
| h      unsigned short      0..USHRT_MAX
| H      unsigned short      none **
| i      int                 INT_MIN..INT_MAX
| I *    unsigned int        0..UINT_MAX
| l      long                LONG_MIN..LONG_MAX
| k *    unsigned long       none
| L      long long           LLONG_MIN..LLONG_MAX
| K *    unsigned long long  none
|
| Notes:
|
| * New format codes.
|
| ** Changed from previous "range-and-a-half" to "none"; the
| range-and-a-half checking wasn't particularly useful.

> > BTW: Since you want to release a beta version, what's the state of the > > FutureWarning about hex/oct constants: will this stay the way it is? > > Probably, unless you have a better idea. :-( I haven't used warnings very much, but is there a possibility to disable them per module? You get a lot of them if you 'import win32con' for example. Thomas From guido@python.org Wed Apr 16 19:17:10 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 14:17:10 -0400 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: Your message of "16 Apr 2003 20:11:27 +0200."
<vfxepb1s.fsf@python.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com> <vfxepb1s.fsf@python.net> Message-ID: <200304161817.h3GIHA111307@odiug.zope.com> > > > And it introduces incompatibilities. > > > > What kind? I thought it would be a new format code? > > Two new format codes ('k' and 'K'), and changes to existing format > codes - per your request:

> | How about the following counterproposal. This also changes some of the
> | other format codes to be a little more regular.
> |
> | Code   C type              Range check
> |
> | b      unsigned char       0..UCHAR_MAX
> | B      unsigned char       none **
> | h      unsigned short      0..USHRT_MAX
> | H      unsigned short      none **
> | i      int                 INT_MIN..INT_MAX
> | I *    unsigned int        0..UINT_MAX
> | l      long                LONG_MIN..LONG_MAX
> | k *    unsigned long       none
> | L      long long           LLONG_MIN..LLONG_MAX
> | K *    unsigned long long  none
> |
> | Notes:
> |
> | * New format codes.
> |
> | ** Changed from previous "range-and-a-half" to "none"; the
> | range-and-a-half checking wasn't particularly useful.

Oh of course. None to worry about IMO. > > > BTW: Since you want to release a beta version, what's the state > > > of the FutureWarning about hex/oct constants: will this stay the > > > way it is? > > > > Probably, unless you have a better idea. :-( > > I haven't used warnings very much, but is there a possibility to > disable them per module? You get a lot of them if you 'import > win32con' for example. Yes, you can suppress warnings per module. Please read the docs.
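Per-module suppression works roughly as follows; a minimal sketch using today's `warnings` API, where `warn_explicit` merely simulates a warning attributed to a module (the `win32con`/`mymod` names are Thomas's example and a made-up placeholder, not real imports here):

```python
import warnings

def demo():
    # Record everything, then ignore FutureWarnings from one module only.
    # The filter's "module" argument is a regex matched against the name
    # of the module the warning is attributed to.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", category=FutureWarning,
                                module="win32con")
        # Simulate a warning coming from win32con (filtered out) ...
        warnings.warn_explicit("hex/oct constant", FutureWarning,
                               filename="win32con.py", lineno=1,
                               module="win32con")
        # ... and one coming from some other module (recorded).
        warnings.warn_explicit("hex/oct constant", FutureWarning,
                               filename="mymod.py", lineno=1,
                               module="mymod")
        return [w.filename for w in caught]

print(demo())  # → ['mymod.py']
```

Only the warning attributed to the other module survives; the filter added with `filterwarnings` is consulted before the blanket `"always"` rule because new filters are inserted at the front of the list.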
--Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Wed Apr 16 19:38:33 2003 From: theller@python.net (Thomas Heller) Date: 16 Apr 2003 20:38:33 +0200 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: <200304161817.h3GIHA111307@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <4r4yqqpe.fsf@python.net> <200304161800.h3GI0sP11085@odiug.zope.com> <vfxepb1s.fsf@python.net> <200304161817.h3GIHA111307@odiug.zope.com> Message-ID: <ptnmp9sm.fsf@python.net> Guido van Rossum <guido@python.org> writes: > > > > And it introduces incompatibilities. > > > > > > What kind? I thought it would be a new format code? > > > > Two new format codes ('k' and 'K'), and changes to existing format > > codes - per your request:

> > | How about the following counterproposal. This also changes some of the
> > | other format codes to be a little more regular.
> > |
> > | Code   C type              Range check
> > |
> > | b      unsigned char       0..UCHAR_MAX
> > | B      unsigned char       none **
> > | h      unsigned short      0..USHRT_MAX
> > | H      unsigned short      none **
> > | i      int                 INT_MIN..INT_MAX
> > | I *    unsigned int        0..UINT_MAX
> > | l      long                LONG_MIN..LONG_MAX
> > | k *    unsigned long       none
> > | L      long long           LLONG_MIN..LLONG_MAX
> > | K *    unsigned long long  none
> > |
> > | Notes:
> > |
> > | * New format codes.
> > |
> > | ** Changed from previous "range-and-a-half" to "none"; the
> > | range-and-a-half checking wasn't particularly useful.

> Oh of course. None to worry about IMO. Well, implementing (and testing) these is the main part of the work, and I'm at least halfway through. Thomas From martin@v.loewis.de Wed Apr 16 20:14:27 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 16 Apr 2003 21:14:27 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <m3y92a9rvw.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! I'd like to install the modifications for internationalized domain names before the beta release is made. Regards, Martin From martin@v.loewis.de Wed Apr 16 20:20:34 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 16 Apr 2003 21:20:34 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050511925.9818.78.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> Message-ID: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > Right, but see above. E.g. if your string literals are all Spanish and > you want a Turkish translation, then utf-8 is the only common encoding > you could possibly use in a .po file, right? That's why your string literals should never be all Spanish. If you have Spanish string literals and use escape codes in the msgid, reading the Spanish msgid becomes difficult, anyway. > > 3. By converting the msgids, they are also changing them. That means > > the msgids are not really suitable as keys anymore. > > Is this still a problem for when charset=utf-8? If the msgids are UTF-8, with non-ASCII characters C-escaped, translators will *still* put non-UTF-8 encodings into the catalogs. This will then be a problem: The catalog encoding won't be UTF-8, and you can't process the msgids.
Regards, Martin From niemeyer@conectiva.com Wed Apr 16 20:23:27 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Wed, 16 Apr 2003 16:23:27 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <16029.31070.687527.821448@montanaro.dyndns.org> References: <20030416145602.GA27447@localhost.distro.conectiva> <200304161503.h3GF3Eo08464@odiug.zope.com> <20030416151629.GA27707@localhost.distro.conectiva> <200304161529.h3GFTFr09409@odiug.zope.com> <20030416153056.GB27900@localhost.distro.conectiva> <16029.31070.687527.821448@montanaro.dyndns.org> Message-ID: <20030416192326.GA29785@localhost.distro.conectiva> > Cool. Based on this thread and an experiment I tried, some obvious (to me) > things come to mind: > > * get_token() needs to be fixed to handle the 'bar'asd'foo' case > > * the shlex class should handle strings as input, not just file-like > objects > > * get_word() or get_words() methods in the shlex class could implement > the shellwords functionality Ok, it was easier than I imagined. Here's an example of the new shlex. Maintaining the old behavior (notice that now strings are accepted as arguments):

>>> import shlex
>>> l = shlex.shlex("'foo'a'bar'")
>>> l.get_token()
"'foo'"
>>> l.get_token()
"a'bar'"

New behavior:

>>> l = shlex.shlex("'foo'a'bar'", posix=1)
>>> l.get_token()
'fooabar'

Introduced iterator interface:

>>> for i in shlex.shlex("'foo'a'bar'"):
...     print i
...
'foo'
a'bar'

New function, mimicking shellwords:

>>> shlex.split_args("'foo'a'bar' -o='foo bar'")
['fooabar', '-o=foo bar']

I'm not sure if "posix" and "split_args" are the best names for these features. Suggestions? I've just committed patch #722686 (and assigned to Guido, as he suggested recently ;-).
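For reference, the behaviour sketched above survives into the shipped `shlex` module essentially unchanged, with the proposed `split_args` renamed to `shlex.split`; a minimal sketch against the module as it eventually shipped (so the names differ from the patch under discussion):

```python
import shlex

# posix-mode lexing concatenates adjacent quoted and unquoted pieces
# into a single token, as in the 'fooabar' example above.
lex = shlex.shlex("'foo'a'bar'", posix=True)
print(list(lex))  # → ['fooabar']

# The proposed split_args() became shlex.split(), which also enables
# whitespace_split so option-like words stay in one piece.
print(shlex.split("'foo'a'bar' -o='foo bar'"))  # → ['fooabar', '-o=foo bar']
```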
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From barry@python.org Wed Apr 16 20:36:08 2003 From: barry@python.org (Barry Warsaw) Date: 16 Apr 2003 15:36:08 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> Message-ID: <1050521768.14112.15.camel@barry> On Wed, 2003-04-16 at 15:20, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > Right, but see above. E.g. if your string literals are all Spanish and > > you want a Turkish translation, then utf-8 is the only common encoding > > you could possibly use in a .po file, right? > > That's why your string literals should never be all Spanish. If you > have Spanish string literals and use escape codes in the msgid, > reading the Spanish msgid becomes difficult, anyway. So why isn't the English/US-ASCII bias for msgids considered a liability for gettext? Do non-English programmers not want to use native literals in their source code? If we adhere to this limitation instead of extending gettext then it seems like Zope will be forced to use something else, and that seems like a waste. Its msgids come from sources other than program source code and such sources may indeed be written in non-English. It seems like gettext is so close and all the machinery is almost there, that this small enhancement should be harmless and helpful. BTW, I believe that if all your msgids /are/ us-ascii, you should be able to ignore this change and have it work backwards compatibly. Also, this change ought to visibly affect only .ugettext(), which isn't part of the traditional gettext API anyway. > > > 3. By converting the msgids, they are also changing them.
That means > > > the msgids are not really suitable as keys anymore. > > > > Is this still a problem for when charset=utf-8? > > If the msgids are UTF-8, with non-ASCII characters C-escaped, > translators will *still* put non-UTF-8 encodings into the catalogs. > This will then be a problem: The catalog encoding won't be UTF-8, > and you can't process the msgids. Isn't this just another validation step to run on the .po files? There are already several ways translators can (and do!) make mistakes, so we already have to validate the files anyway. -Barry From guido@python.org Wed Apr 16 20:31:48 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 16 Apr 2003 15:31:48 -0400 Subject: [Python-Dev] Super and properties In-Reply-To: Your message of "Wed, 02 Apr 2003 15:42:41 +0100." <001401c2f926$1d32d7e0$a8130dd5@violante> References: <001401c2f926$1d32d7e0$a8130dd5@violante> Message-ID: <200304161931.h3GJVmp19275@odiug.zope.com> (I'm quoting the whole message below since this has been two weeks by now.) > From: Gonçalo Rodrigues <op73418@mail.telepac.pt> > > Hi all, > > Since this is my first post here, let me first introduce myself. I'm Gonçalo > Rodrigues. I work in mathematics, mathematical physics to be more precise. I > am a self-taught hobbyist programmer and fell in love with Python a year and > half ago. And of interesting personal details this is about all so let me > get down to business. > > My problem has to do with super that does not seem to work well with > properties. I posted to comp.lang.python a while ago and there I was advised > to post here. So, suppose I override a property in a subclass, e.g.
>
> >>> class test(object):
> ...     def __init__(self, n):
> ...         self.__n = n
> ...     def __get_n(self):
> ...         return self.__n
> ...     def __set_n(self, n):
> ...         self.__n = n
> ...     n = property(__get_n, __set_n)
> ...
> >>> a = test(8)
> >>> a.n
> 8
> >>> class test2(test):
> ...     def __init__(self, n):
> ...         super(test2, self).__init__(n)
> ...     def __get_n(self):
> ...         return "Got ya!"
> ...     n = property(__get_n)
> ...
> >>> b = test2(8)
> >>> b.n
> 'Got ya!'
>
> Now, since I'm overriding a property, it is only normal that I may want to > call the property implementation in the super class. But the obvious way (to > me at least) does not work:
>
> >>> print super(test2, b).n
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> AttributeError: 'super' object has no attribute 'n'
>
> I know I can get at the property via the class, e.g. do
>
> >>> test.n.__get__(b)
> 8
> >>>
>
> Or, not hardcoding the test class,
>
> >>> b.__class__.__mro__[1].n.__get__(b)
> 8
>
> But this is ugly at best. To add to the puzzle, the following works, albeit > not in the way I expected
>
> >>> super(test2, b).__getattribute__('n')
> 'Got ya!'
>
> Since I do not know if this is a bug in super or a feature request for it, I > thought I'd better post here and leave it to your consideration. > > With my best regards, > G. Rodrigues

Hah! I think I've resolved this, and I *still* don't know if it's a bug report or a feature request. :-) The crux of the matter is that super() has a specific exception for data descriptors, of which properties are an example. This means that when looking for attribute 'x', if it finds a hit which is a data descriptor, it ignores it and keeps looking. It took me a while to understand why. When I disabled the test, exactly *one* unit test failed, and it wasn't immediately clear what was going on. It turns out that this test was asking for the __class__ attribute of the super object itself, but it was getting the __class__ of the instance. Simplified:

    class C(object): pass
    print super(C, C()).__class__

This should print <type 'super'> and not <class '__main__.C'>, because it would be really confusing if the super object, when inquired about its class, masqueraded as another class. How does skipping data descriptors accomplish this goal?
When super does its search, the last class it looks at is 'object', at the end of the MRO chain. And this has a data descriptor for '__class__', which describes the __class__ attribute of all objects. If super were to give this descriptor the usual treatment, it would call its __get__ method, and that would (in the above example) return the class C. The CVS history mentions (for typeobject.c rev 2.120, shortly before the final 2.2.0 release): super(C, C()).__class__ would return the __class__ attribute of C() rather than the __class__ attribute of the super object. This is confusing. To fix this, I decided to change the semantics of super so that it only applies to code attributes, not to data attributes. After all, overriding data attributes is not supported anyway. Your message above makes a good case for overriding data attributes, so I have to retract this. But I don't want __class__ to return C, I want it to return super. So I'll change this back, and make an explicit exception only for __class__. And ok, I'm deciding now that this is a feature, which means that I'm changing it in Python 2.3, but not backporting the change to Python 2.2.x. Hope this helps! --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Wed Apr 16 21:18:52 2003 From: python@rcn.com (Raymond Hettinger) Date: Wed, 16 Apr 2003 16:18:52 -0400 Subject: [Python-Dev] 2.3b1 release References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <004301c30455$78230c80$125ffea9@oemcomputer> > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! I have a couple of small patches and bugs to review. Should be no problem getting these in this weekend. Am working on a more cache friendly dict lookup strategy. If it is not ready for prime time in the next few days, it will have to wait for Py2.4. 
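With the super()/property change described earlier in this digest in place (Python 2.3 and later, including all of Python 3), the obvious spelling from the original bug report works; a minimal sketch in modern syntax (class names kept from the report, the mangled `__n` slot simplified to `_n` for brevity):

```python
class test:
    def __init__(self, n):
        self._n = n

    @property
    def n(self):
        return self._n

class test2(test):
    @property
    def n(self):
        return "Got ya!"

b = test2(8)
print(b.n)                # → Got ya!
# super no longer skips data descriptors (except __class__),
# so the base property can be reached directly:
print(super(test2, b).n)  # → 8
```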
A couple of bytecode optimizations may also have to wait for Py2.4. For some reason, Basicblock(nop, jump_if_true) is not always directly substitutable for Basicblock(unary_not, jump_if_false). I suspect the three-way return value for PyObject_IsTrue() but it could be something else. Raymond Hettinger From martin@v.loewis.de Wed Apr 16 23:07:15 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Thu, 17 Apr 2003 00:07:15 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1050521768.14112.15.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> Message-ID: <3E9DD413.8030002@v.loewis.de> Barry Warsaw wrote: > So why isn't the English/US-ASCII bias for msgids considered a liability > for gettext? Do non-English programmers not want to use native literals > in their source code? Using English for msgids is about the only way to get translation. Finding a Turkish speaker who can translate from Spanish is *significantly* more difficult than starting from English; starting from, say, Chinese and going to Hebrew might just be impossible. So any programmer who seriously wants to have his software translated will put English texts into the source code. Non-English literals are only used if l10n is not an issue. > If we adhere to this limitation instead of extending gettext then it > seems like Zope will be forced to use something else, and that seems > like a waste. It's not a limitation of gettext, but a usage guideline: gettext can map arbitrary byte strings to arbitrary other byte strings. > BTW, I believe that if all your msgids /are/ us-ascii, you should be > able to ignore this change and have it work backwards compatibly.
"This" change being addition of the "coerce" argument? If you think you will need it, we can leave it in. >>If the msgids are UTF-8, with non-ASCII characters C-escaped, >>translators will *still* put non-UTF-8 encodings into the catalogs. >>This will then be a problem: The catalog encoding won't be UTF-8, >>and you can't process the msgids. > > > Isn't this just another validation step to run on the .po files? There > are already several ways translators can (and do!) make mistakes, so we > already have to validate the files anyway. I'm not sure how exactly a validation step would be executed. Would that step simply verify that the encoding of a catalog is UTF-8? That validation step would fail for catalogs that legally use other charsets. Regards, Martin From mhammond@skippinet.com.au Thu Apr 17 02:58:01 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 17 Apr 2003 11:58:01 +1000 Subject: [Python-Dev] Final PEP 311 run Message-ID: <023a01c30484$c782ad10$530f8490@eden> Hi all, I'd like to get PEP311 in for the Python 2.3b1 - http://www.python.org/peps/pep-0311.html (or even if I miss, very soon after!) There appears to be no issues with the technical aspects of the PEP (please correct me now if I am wrong). The only issue is the name of the API. To save re-reading the PEP just to understand the names, I will summarize here (see the PEP for the full version): There are 2 new functions, called as a pair. The first function sets up the Python thread state, along with the GIL, so that the current thread can safely call the Python API. The function makes no assumptions about the current state of the GIL etc - it works out the current state, and does the "right thing". The second function is the reverse of the first, to indicate that the thread has finished with the thread state for the time being. 
The PEP calls these functions PyAutoThreadState_Ensure() and PyAutoThreadState_Release() Reasons for the names in the PEP: "Auto" reflects that the current thread-state need not be known (whereas the other APIs do). "Ensure()" reflects that nothing may actually be *created* - all we are doing is "ensuring" we have the resources, creating only if necessary. On the down-side - "Auto" will look strange in the future when this is the standard way of managing the lock. "ThreadState" does not reflect that the function does more than manage the PyThreadState - it also manages the locks (which while an implementation detail, are currently discrete) Other Proposals: Just: PyGIL_Ensure(), PyGIL_Release(): shorter to type, conveys the meaning. David Abrahams: Prefers SubjectVerbObject, so would prefer "PyEnsureGIL" - but likes PyAcquireInterpreter() and PyReleaseInterpreter() best. Dropping "Auto" from the PEP gives PyThreadState_Ensure() and PyThreadState_Release(). I admit to liking "PyAcquireInterpreter()" best, but it does not match the existing API structure. For the sake of typing, I would be happy to go with Just's PyGIL_Ensure(), but maybe PyInterpreter_Ensure() is a good compromise. Other opinions or pronouncements welcome :) Mark. From tim_one@email.msn.com Thu Apr 17 03:46:28 2003 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 16 Apr 2003 22:46:28 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <LNBBLJKPBEHFEDALKOLCGEBMEHAB.tim_one@email.msn.com> [Mark Hammond] > I'd like to get PEP311 in for the Python 2.3b1 - > http://www.python.org/peps/pep-0311.html (or even if I miss, very soon > after!) I hope so! It seems important that the specific projects mentioned in the PEP test drive this before 2.3 final. > There appears to be no issues with the technical aspects of the > PEP (please correct me now if I am wrong). 
Some questions occurred while reading the PEP again, primarily are there any restrictions on which parts of the Python API can be called between an ensure and its matching release? For example, is it OK if the thread does a Py_BEGIN_ALLOW_THREADS whatever Py_END_ALLOW_THREADS pair while an ensure is active in the thread? Is it OK if the thread does a nested PyAutoThreadState_Ensure() whatever PyAutoThreadState_Release() likewise (I'm sure that one is OK, but am not sure the PEP really says so)? If that is OK, must the nested call use the same PyAutoThreadState_State handle returned by the outer ensure -- or must it avoid using the same handle? > The only issue is the name of the API. If that's the only issue, check it in yesterday <0.9 wink>. > To save re-reading the PEP just to understand the names, I will summarize > here (see the PEP for the full version): > > There are 2 new functions, called as a pair. The first function > sets up the Python thread state, along with the GIL, so that the current > thread can safely call the Python API. The function makes no assumptions > about the current state of the GIL etc - it works out the current state, > and does the "right thing". The second function is the reverse of the > first, to indicate that the thread has finished with the thread state for > the time being. > > The PEP calls these functions PyAutoThreadState_Ensure() and > PyAutoThreadState_Release() I can live with that. > Reasons for the names in the PEP: > "Auto" reflects that the current thread-state need not be known > (whereas the other APIs do). "Ensure()" reflects that nothing may > actually be *created* - all we are doing is "ensuring" we have the > resources, creating only if necessary. On the down-side - "Auto" will > look strange in the future when this is the standard way of managing > the lock. 
"ThreadState" does not reflect that the function does more > than manage the PyThreadState - it also manages the locks (which while > an implementation detail, are currently discrete) Please put this paragraph of rationale in the docs (leaving out the down side, and maybe in a footnote)! Once it's explained, there's nothing mysterious about the names, and there's no point making future readers guess at the reasons. > Other Proposals: > Just: PyGIL_Ensure(), PyGIL_Release(): shorter to type, conveys the > meaning. Could live with that too. > David Abrahams: Prefers SubjectVerbObject, so would prefer > "PyEnsureGIL" - but likes Ditto, except grates some against existing naming conventions (generally "Py" Subsystem "_" Detail ). > PyAcquireInterpreter() and PyReleaseInterpreter() best. I first read those as having something to do with an interpreter state, which isn't a good sign. > Dropping "Auto" from the PEP gives PyThreadState_Ensure() and > PyThreadState_Release(). What do you get if you drop the Thread <wink>? > I admit to liking "PyAcquireInterpreter()" best, but it does not match > the existing API structure. PyInterpreter_{Acquire,Release} would, though. > For the sake of typing, I would be happy to go with Just's > PyGIL_Ensure(), but maybe PyInterpreter_Ensure() is a good > compromise. > > Other opinions or pronouncements welcome :) Flip a coin, check it in, have a smoke, don't look back. I'll join you. 
From greg@cosc.canterbury.ac.nz Thu Apr 17 03:57:06 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 17 Apr 2003 14:57:06 +1200 (NZST) Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <200304170257.h3H2v6v16015@oma.cosc.canterbury.ac.nz> Mark Hammond <mhammond@skippinet.com.au>: > The PEP calls these functions PyAutoThreadState_Ensure() and > PyAutoThreadState_Release() > > Other opinions or pronouncements welcome :) How about: PyEnvironment_Ensure PyEnvironment_Release where the "Environment" bit means "everything that's needed, whatever it might be". Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tismer@tismer.com Thu Apr 17 05:40:32 2003 From: tismer@tismer.com (Christian Tismer) Date: Thu, 17 Apr 2003 06:40:32 +0200 Subject: [Python-Dev] Stackless 3.0 alpha 1 at blinding speed Message-ID: <3E9E3040.5040409@tismer.com> Dear community, dear Stackless addicts, dear friends, Ich habe Euch wirklich was zu erzählen, liebe Freunde, I really have to tell you a story! During the last four months, I have been struggling with Stackless Python, and especially with myself and how to get re-focused on my major project which you know very well. Some of you might know quite well too how hard this was for me, especially in the context of my parents' endangered health. This particular problem seems to be solved, for the moment, so let's celebrate the moment, celebrate the moment! Without going into details, I would like to tell you about the current status of Stackless Python. For short, like an abstract, Stackless 3.0 is something like an or-merge of Stackless 1.0 and 2.0 technology.
Guido, Tim, you both will probably remember my lengthy approaches to introduce those continuations, years ago, you both convinced me to drop them, and I did what I was supposed to do. I'm hopefully a proper citizen, right now. Anyway, you know I'll never really be... After a long period of depression, I re-invented Stackless in early 2002, with a version number of 2.0, denoting that I had dropped all the 1.0 paradigms (as there are: (1) try to keep compatible, (2) do minimal changes only, (3) absolutely avoid assembly code at all) At the same time, I dismissed all of my Stackless 1.0 code, which was continuation-based, an absolute no-no in Guido's eyes. I still do think that TimP wasn't that conformant to this "nono"-statement, after I read a lot of his comments, especially side-notes on the thread-sig, but this time Guido's veto was clearly stronger than Tim's arguing, a thing that doesn't happen so often, but I'm proactively respecting this, positively. Now, after all that rubbish, let's go into facts, which are quite interesting. ------------------------------------------------------------- Today, I finished Stackless Python 3.0, alpha 3.0.1! First of all, I would like to talk about the new principles. Yes, no, there are no longer continuations in that sense. I'm meanwhile convinced that we don't want to support them, any longer, although I'm happy that Stackless allowed me to learn *all* and much more about them than is available on the wor(th|ld) w(h)i(d|l)e net!! Q&A: Q: What is it about that Stackless 3.0, will this guy never shut up??? A: No, he most probably never will, unless he's dead, and this is another 40 or more years in advance, for heaven's sake. Q: So, what is it about the Stackless 3.0 hype that has been around for months? A: Simple! Stackless 3.0 has all the hardware switching stuff in it that Stackless 2.0 had. Stackless 3.0 also incorporates 80% of the soft switching protocol that Stackless 1.0 had.
But there are a lot of new features: Stackless has again shown how to marry the impossible with the unbelievable, and this is the new concept of Stackless 3.0: There is a merge between (1.0) soft context switching and (2.0) hard context switching, which always does the most reasonable thing. There are a lot of benefits which stem from this hybrid solution, which will appear in one of my forthcoming papers, pretty soon. -------------------------------------------------------------- BLURB Let me simply end this pamphlet with some simple sentences: Stackless Python is more capable of tasklet switching than any other light-weight threading software package. If anyone disagrees, please give me a runnable counter-example. Here are some impressive site-specific time measurements, which especially show that 20.000.000 cframe tasklet switches per second are really, really hard to beat. Python on Win32:

D:\slpdev\src\2.2\src\Stackless\test>..\..\pcbuild\python taskspeed.py
10000000 frame switches       took 3.83061 seconds, rate = 2610551/s
10000000 frame softswitches   took 2.40112 seconds, rate = 4164718/s
10000000 cfunction calls      took 2.13033 seconds, rate = 4694098/s
10000000 cframe softswitches  took 0.49296 seconds, rate = 20285627/s
10000000 cframe switches      took 1.98907 seconds, rate = 5027486/s
10000000 cframe 100 words     took 3.93737 seconds, rate = 2539768/s
The penalty per stack word is about 0.980 percent of raw switching.
Stack size of initial stub   = 14
Stack size of frame tasklet  = 58
Stack size of cframe tasklet = 35
D:\slpdev\src\2.2\src\Stackless\test>

Python on Debian -- Christian Tismer :^) <mailto:tismer@tismer.com> Mission Impossible 5oftware : Have a break!
Take a ride on Python's Johannes-Niemeyer-Weg 9a : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 89 09 53 34 home +49 30 802 86 56 pager +49 173 24 18 776 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido@python.org Thu Apr 17 14:47:56 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 09:47:56 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: Your message of "Thu, 17 Apr 2003 11:58:01 +1000." <023a01c30484$c782ad10$530f8490@eden> References: <023a01c30484$c782ad10$530f8490@eden> Message-ID: <200304171347.h3HDluh22368@odiug.zope.com> > I'd like to get PEP311 in for the Python 2.3b1 - > http://www.python.org/peps/pep-0311.html (or even if I miss, very soon > after!) Great! > There appears to be no issues with the technical aspects of the PEP (please > correct me now if I am wrong). The only issue is the name of the API. How about PyGILState_Ensure() and PyGILState_Restore()? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Thu Apr 17 16:27:22 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 17:27:22 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <20030417152722.GA9493@xs4all.nl> On Wed, Apr 16, 2003 at 11:52:10AM -0400, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! Well, there is the CALL_ATTR patch (http://www.python.org/sf/709744) that Brett and I worked on at the PyCon sprints. It's finished (barring tests) for classic classes, and writing tests is not very inspiring because all functionality is already tested in the standard test suite. 
However, it doesn't do anything with newstyle classes at all, yet. I've had surprisingly little time since PyCon (it's amazing how not being at the office for two weeks makes people shove work your way -- I guess they realized I couldn't object :) but I'm in the process of grokking newstyle classes. So far, I've been alternating from 'Wow!' to 'Au!', and I'll send another email after this one for clarification of a few issues :) Anyway, if anyone has straightforward ideas about how CALL_ATTR should deal with newstyle classes (if at all), please inform me (preferably via SF) or just grab the patch and run with it. I'm still confused about descrgets and where they come from. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Thu Apr 17 16:47:39 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 17:47:39 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <20030417152722.GA9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> Message-ID: <20030417154739.GB9493@xs4all.nl> On Thu, Apr 17, 2003 at 05:27:22PM +0200, Thomas Wouters wrote: > So far, I've been alternating from 'Wow!' to 'Au!', and I'll send > another email after this one for clarification of a few issues :) Nevermind that. A "D'oh" slipped into the stream, and I think I get it now. At least the stuff that wasn't working is working now. I wouldn't mind if someone pointed me to a xxtype.c (newstyle class in C) like we have xxobject and xxsubclass though... So far, it's been so simple I fear I'm missing something. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
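Thomas's confusion about descrgets (the tp_descrget slot that attribute lookup invokes) can be made concrete with a small pure-Python sketch. This is a hedged illustration, not CPython's C code; the `Constant` class and the method names are invented for the example. Guido's reply that follows describes the same mechanism from the C side.

```python
# Pure-Python sketch of what a descriptor's __get__ (tp_descrget in C)
# does during attribute lookup.  Toy classes, not CPython internals.

class Constant:
    """A toy non-data descriptor: it defines __get__ but no __set__."""
    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner=None):
        # instance is None when the lookup is on the class itself
        return self.value

class C:
    answer = Constant(42)

    def method(self):
        return 'bound'

print(C.answer)      # class lookup calls Constant.__get__(None, C)
print(C().answer)    # instance lookup calls Constant.__get__(obj, C)

# Plain functions are descriptors too: calling __get__ by hand performs
# the same "binding" that turns a function into a bound method.
c = C()
m = C.__dict__['method'].__get__(c, C)
print(m())           # the manually bound method works like c.method()
```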
From guido@python.org Thu Apr 17 16:53:01 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 11:53:01 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: Your message of "Thu, 17 Apr 2003 17:27:22 +0200." <20030417152722.GA9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> Message-ID: <200304171553.h3HFr1023445@odiug.zope.com> [Thomas] > Well, there is the CALL_ATTR patch (http://www.python.org/sf/709744) > that Brett and I worked on at the PyCon sprints. It's finished > (barring tests) for classic classes, and writing tests is not very > inspiring because all functionality is already tested in the > standard test suite. However, it doesn't do anything with newstyle > classes at all, yet. And I want the new-style classes version! > I've had surprisingly little time since PyCon (it's amazing how not > being at the office for two weeks makes people shove work your way > -- I guess they realized I couldn't object :) Even without so much of that problem here, I was buried in email for a week after returning from Python UK. :-) > but I'm in the process of grokking newstyle classes. So far, I've > been alternating from 'Wow!' to 'Au!', and I'll send another email > after this one for clarification of a few issues :) Anyway, if > anyone has straightforward ideas about how CALL_ATTR should deal > with newstyle classes (if at all), please inform me (preferably via > SF) or just grab the patch and run with it. I'm still confused about > descrgets and where they come from. Yes, please. Here's a quick explanation of descriptors: A descriptor is something that lives in a class' __dict__, and primarily affects instance attribute lookup. A descriptor has a __get__ method (in C this is the tp_descrget function in its type object) and the instance attribute lookup calls this to "bind" the descriptor to a specific instance.
This is what turns a function into a bound method object in Python 2.2. In earlier versions, functions were special-cased by the instance getattr code; the special case has been subsumed by looking for a __get__ method. Yes, this means that a plain Python function object is a descriptor! Because the instance getattr code returns whatever __get__ returns as the result of the attribute lookup, this is also how properties work: they have a __get__ method that calls the "property-get" function. A descriptor's __get__ method is also called for class attribute lookup (with the instance argument set to NULL or None). And a descriptor's __set__ method is called for instance attribute assignment; but not for class attribute assignment. Hope this helps! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Apr 17 18:28:42 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 Apr 2003 19:28:42 +0200 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <3E9EE44A.6060904@lemburg.com> Guido van Rossum wrote: > Yes, please. Here's a quick explanation of descriptors: > > A descriptor is something that lives in a class' __dict__, and > primarily affects instance attribute lookup. A descriptor has a > __get__ method (in C this is the tp_descrget function in its type > object) and the instance attribute lookup calls this to "bind" the > descriptor to a specific instance. This is what turns a function into > a bound method object in Python 2.2. In earlier versions, functions > were special-cased by the instance getattr code; the special case has > been subsumed by looking for a __get__ method. Yes, this means that a > plain Python function object is a descriptor!
Because the instance > getattr code returns whatever __get__ returns as the result of the > attribute lookup, this is also how properties work: they have a > __get__ method that calls the "property-get" function. > > A descriptor's __get__ method is also called for class attribute > lookup (with the instance argument set to NULL or None). And a > descriptor's __set__ method is called for instance attribute > assignment; but not for class attribute assignment. > > Hope this helps! Could you put such short overviews somewhere on the Python Wiki ? They sure help in understanding what is going on behind the scenes without having to grep through tons of source code :-) Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 17 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 68 days left From guido@python.org Thu Apr 17 18:34:31 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 13:34:31 -0400 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: Your message of "Thu, 17 Apr 2003 19:28:42 +0200." <3E9EE44A.6060904@lemburg.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <3E9EE44A.6060904@lemburg.com> Message-ID: <200304171734.h3HHYVU03250@odiug.zope.com> > Could you put such short overviews somewhere on the Python Wiki ? I don't have the time for that. When I want to publish stuff like this somewhere, I need to spend time to make it all correct, complete etc.
> They sure help in understanding what is going on behind the > scenes without having to grep through tons of source code :-) You should start by reading http://www.python.org/2.2.2/descrintro.html If you still have questions about descriptors after reading that, grepping the source is an option. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From whisper@oz.net Thu Apr 17 20:51:08 2003 From: whisper@oz.net (David LeBlanc) Date: Thu, 17 Apr 2003 12:51:08 -0700 Subject: [Python-Dev] Wrappers and keywords Message-ID: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> (You'll excuse me I hope if this is deemed inappropriate. I'm posting this here rather than in the general list since it's about the language and not its application.) I am curious to know why the, what seems to me kludgy, "def x(): pass x = (static|class)method(x)" syntax was chosen over a simple "staticdef x ():..." or "classdef x ():..." def specialization syntax? Either method adds keywords to the language, but a direct declaration seems clearer and less error prone to me compared to the "call->assignment magically makes a wrapper" method. Is it hard to do the needed special wrapping directly? Regards, David LeBlanc Seattle, WA USA From skip@pobox.com Thu Apr 17 20:59:29 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 17 Apr 2003 14:59:29 -0500 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> Message-ID: <16031.1953.359534.974127@montanaro.dyndns.org> David> I am curious to know why the, what seems to me kludgy, "..." David> syntax was chosen over a simple "staticdef" or "classdef" def David> specialization syntax? It was felt that it was more important in the short term to explore/add the functionality and settle details of syntax later.
Skip From Jack.Jansen@oratrix.com Thu Apr 17 21:11:39 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 22:11:39 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> On woensdag, apr 16, 2003, at 17:52 Europe/Amsterdam, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! The getargs mods got checked in just this morning, even though I explicitly and rather strongly asked that if these mods be made they be checked in *long* before a release was due:-( This means that all the Mac modules are now 100% dead. The same is probably true for PyObjC. And PyObjC has the added problem that it needs to be compatible with both 2.3b1 and 2.2 (notice that that is "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that Apple ships, which is 2.2 at the moment). I assume there are format codes that will convert 16 bit and 32 bit integer quantities without any checks on both 2.2 and 2.3, but I haven't investigated yet. And there may be problems with other wrapper packages (win32, wxPython, PyOpenGL) too. I will start fixing things, but there are only 4 real days left before April 25, given easter, so I would strongly urge for postponing the release date for another two weeks or so. 
-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From martin@v.loewis.de Thu Apr 17 21:12:19 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Thu, 17 Apr 2003 22:12:19 +0200 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEKEIBJJAA.whisper@oz.net> Message-ID: <3E9F0AA3.7000907@v.loewis.de> David LeBlanc wrote: > I am curious to know why the, what seems to me kludgy, "def x(): pass x = > (static|class)method(x)" syntax was chosen over a simple "staticdef x > ():..." or "classdef x ():..." def specialization syntax? That syntax hasn't been chosen yet; syntactic sugar for static and class methods, properties, slots, and other object types is still an area of ongoing research. The current implementation was created since it did not need an extension to the syntax: x=staticmethod(x) was syntactically correct even in Python 1.2 (which is the oldest Python version I remember). There have been numerous proposals on what the syntactic sugar should look like, which is one reason why no specific solution has been implemented yet. Proposals usually get discredited if they require the introduction of new keywords, like "staticdef". The current favorite proposal is to write def x() [static]: pass or perhaps def x() [staticmethod]: pass In that proposal, static(method) would *not* be a keyword, but would be an identifier (denoting the same thing that staticmethod currently denotes). This syntax nicely extends to def x() [threading.synchronized, xmlrpclib.webmethod]: pass The syntax has the disadvantage of not applying nicely to slots.
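For concreteness, here is the post-definition rebinding idiom the thread is discussing, using the real staticmethod and classmethod builtins; the `Math` class itself is a made-up example:

```python
# The 2.2-era idiom: define the function, then rebind the name through a
# wrapper.  The proposed "def x() [static]" syntax would be sugar for this.

class Math:
    def square(x):
        return x * x
    square = staticmethod(square)   # no instance argument when called

    def name(cls):
        return cls.__name__
    name = classmethod(name)        # receives the class, not an instance

print(Math.square(5))   # callable without an instance: 25
print(Math().name())    # 'Math'
```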
Regards, Martin From guido@python.org Thu Apr 17 21:17:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 16:17:30 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: Your message of "Thu, 17 Apr 2003 22:11:39 +0200." <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> References: <CCAB6251-7110-11D7-AE99-000A27B19B96@oratrix.com> Message-ID: <200304172017.h3HKHUO05664@odiug.zope.com> > > I'd like to do a 2.3b1 release someday. Maybe at the end of next > > week, that would be Friday April 25. If anyone has something that > > needs to be done before this release go out, please let me know! > > The getargs mods got checked in just this morning, even though I > explicitly and rather strongly asked that if these mods be made they > be checked in *long* before a release was due:-( Sorry, I forgot. Did you make a note of that on the SF patch? > This means that all the Mac modules are now 100% dead. The same is > probably true for PyObjC. And PyObjC has the added problem that it > needs to be compatible with both 2.3b1 and 2.2 (notice that that is > "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > Apple ships, which is 2.2 at the moment). I assume there are format > codes that will convert 16 bit and 32 bit integer quantities without > any checks on both 2.2 and 2.3, but I haven't investigated yet. Maybe we should retract the changes to existing format codes that make them more restrictive? That should revive any code that's currently dead, right? > And there may be problems with other wrapper packages (win32, > wxPython, PyOpenGL) too. > > I will start fixing things, but there are only 4 real days left > before April 25, given Easter, so I would strongly urge for > postponing the release date for another two weeks or so. That would endanger the entire release schedule to the point of pushing the 2.3 release past the OSCON conference (July 7-11).
--Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Thu Apr 17 21:30:54 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 22:30:54 +0200 Subject: [Python-Dev] Re: Masks in getargs.c (was: 2.3b1 release) In-Reply-To: <vfxepb1s.fsf@python.net> Message-ID: <7D10D627-7113-11D7-AE99-000A27B19B96@oratrix.com> On woensdag, apr 16, 2003, at 20:11 Europe/Amsterdam, Thomas Heller wrote: > | How about the following counterproposal. This also changes some of > the > | other format codes to be a little more regular. > | > | Code C type Range check > | > | b unsigned char 0..UCHAR_MAX > | B unsigned char none ** > | h unsigned short 0..USHRT_MAX > | H unsigned short none ** > | i int INT_MIN..INT_MAX > | I * unsigned int 0..UINT_MAX > | l long LONG_MIN..LONG_MAX > | k * unsigned long none > | L long long LLONG_MIN..LLONG_MAX > | K * unsigned long long none > | > | Notes: > | > | * New format codes. > | > | ** Changed from previous "range-and-a-half" to "none"; the > | range-and-a-half checking wasn't particularly useful. Do I understand correctly that there is no format code that works on both 2.2 and 2.3 that converts 32 bit quantities without complaining (B and H will work for 8 and 16 bit quantities)? That may be a serious problem for PyObjC.... 
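Jack's worry is about the range checks in Thomas's table. As a hedged sketch, the difference between a checked code like 'b' and its unchecked counterpart 'B' can be modelled in Python; `convert_b` and `convert_B` are toy functions illustrating the proposed semantics, not the getargs.c implementation:

```python
# Toy model of the proposed getargs semantics for unsigned char:
# 'b' range-checks 0..UCHAR_MAX, while 'B' does no range check and
# simply keeps the low 8 bits.  Illustrative only; getargs.c is C code.

UCHAR_MAX = 0xFF

def convert_b(value):
    # 'b': reject anything outside the unsigned char range
    if not 0 <= value <= UCHAR_MAX:
        raise OverflowError("value out of range for format 'b'")
    return value

def convert_B(value):
    # 'B': accept anything, truncate to 8 bits as C would
    return value & UCHAR_MAX

print(convert_b(200))   # in range: 200
print(convert_B(300))   # truncated: 44 (300 & 0xFF)
```

A code with 'B'-style behaviour on both 2.2 and 2.3 is exactly what Jack is asking for in the 32-bit case.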
-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From thomas@xs4all.net Thu Apr 17 21:59:56 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 17 Apr 2003 22:59:56 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <20030417205956.GC9493@xs4all.nl> On Thu, Apr 17, 2003 at 11:53:01AM -0400, Guido van Rossum wrote: > > Anyway, if anyone has straightforward ideas about how CALL_ATTR should > > deal with newstyle classes (if at all), please inform me (preferably via > > SF) or just grab the patch and run with it. I'm still confused about > > descrgets and where they come from. > Yes, please. Here's a quick explanation of descriptors: [ the descriptor describes descriptors ] > Hope this helps! Well, yes, in that it reminded me to stop looking for how functions get turned into methods. That part is the same for old-style classes, though, and not quite what I'm confused about. What the call_attr patch does is shortcut the instance_getattr functions in a new function, to do just what is necessary (and no more.) _Py_instance_getmethod() returns NULL for anything that isn't a method, too, letting the slow case handle it. When it does find a would-be method, it returns the unwrapped function. The call_attr function basically does a PyInstance_Check() and a _Py_instance_getmethod(), and calls the returned function. The problem I have with newstyle classes is where to shortcut what. I understand now how to detect a would-be method, but I'm not sure how to get unwrapped attributes. As far as I understand, types can provide their own getattr function with complete control over descriptors, so there isn't much to shortcut.
Unless I should make the shortcut depend on the actual value of tp_getattro, as in shortcut only if it actually is PyObject_GenericGetAttr ? In that case, I'm somewhat sceptical about the speed benefit's cost in maintenance, as it would require a near copy of PyObject_GenericGetAttr (which is already a near-copy of a few other functions :) It's also very hard to control any nested getattrs (possible, I think, because the process goes over all bases' dicts and the instance dict.) Or can we reduce the number of steps PyObject_GenericGetAttr goes through if we know we are just looking for a method ? I don't believe so, but I'm not sure. (Looking at PyObject_GenericGetAttr with that in mind, I wonder if there isn't a possible crash there. In the first MRO lookup, looking for descr's, if a non-data-descr is found, it is kept around but not INCREF'd until later, after the instance-dict is searched. Am I wrong in believing the PyDict_GetItem of the instance dict can call Python code ? There isn't even as much as an assert(PyDict_Check(dict)) there.) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From whisper@oz.net Thu Apr 17 22:03:40 2003 From: whisper@oz.net (David LeBlanc) Date: Thu, 17 Apr 2003 14:03:40 -0700 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <3E9F0AA3.7000907@v.loewis.de> Message-ID: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> <snip> > There have been numerous proposals on what the syntactic sugar should > look like, which is one reason why no specific solution has been > implemented yet. Proposals get usually discredit if they require > introduction of new keywords, like "staticdef". The current favorite > proposals is to write > > def x() [static]: > pass > > or perhaps > > def x() [staticmethod]: > pass > > In that proposal, static(method) would *not* be a keyword, but would > be an identifier (denoting the same thing that staticmethod currently > denotes). 
This syntax nicely extends to > > def x() [threading.synchronized, xmlrpclib.webmethod]: > pass I'm not sure what you're suggesting here semantically...? > The syntax has the disadvantage of not applying nicely to slots. > > Regards, > Martin It also has the disadvantage of adding a new syntactical construct to the language does it not (which seems like more pain than a couple of keywords)? I don't recall any other place in the language that uses [] as a way to specify a variable (oops, excepting list comprehensions sort of, and that's not quite the same thing IMO), especially in that position in a statement? It seems like it would open the door to uses (abuses?) like: class foo [abstract]: pass (although, this particular one might satisfy the group that wants interfaces in Python) Is there any real difference between what amounts to a reserved constant identifier (with semantic meaning rather than value) compared to a keyword statement sentinel? Are there any other language-level uses like that (reserved constant identifier), or does this introduce something new as well? Speaking of slots, is their primary purpose to have classes whose instances are not morphable? If so, one might default to all classes being non-morphable by default and having something like: class foo [morphable]: pass as identifying those which are (an obviously python-3000 feature if implemented thusly). Regards, Dave LeBlanc Seattle, WA USA From Jack.Jansen@oratrix.com Thu Apr 17 22:29:24 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 17 Apr 2003 23:29:24 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172017.h3HKHUO05664@odiug.zope.com> Message-ID: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van Rossum wrote: >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next >>> week, that would be Friday April 25.
If anyone has something that >>> needs to be done before this release go out, please let me know! >> >> The getargs mods got checked in just this morning, even though I >> explicitly and rather strongly asked that if these mods be made they >> be checked in *long* before a release was due:-( > > Sorry, I forgot. Did you make a note of that on the SF patch? Yes, I'm pretty sure I did. Thomas also seems to refer to it... >> This means that all the Mac modules are now 100% dead. The same is >> probably true for PyObjC. And PyObjC has the added problem that it >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that >> Apple ships, which is 2.2 at the moment). I assume there are format >> codes that will convert 16 bit and 32 bit integer quantities without >> any checks on both 2.2 and 2.3, but I haven't investigated yet. > > Maybe we should retract the changes to existing format codes that make > them more restrictive? That should revive any code that's currently > dead, right? That would be much better. If "l" (lower case ell) would continue to accept anything I wouldn't have to change anything. Of course I've been busy all night fixing code, but apart from a couple of hand-crafted modules I haven't checked anything in yet. I will check it in on a branch later tonight, and then I'll either forget about the branch or merge it, depending on the resolution of this. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From gherron@islandtraining.com Thu Apr 17 22:39:16 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 17 Apr 2003 14:39:16 -0700 Subject: [Python-Dev] Build errors under RH9 Message-ID: <200304171439.17504.gherron@islandtraining.com> I just upgraded my development system to RedHat 9, and now I get two compilation errors on the Python CVS tree.
I'll have time to examine them tonight, but I thought I'd get a notice out now on the chance that someone else has already resolved them. 1. Compilation of _tkinter comes up with #error "unsupported Tcl configuration" The failing test was changed just yesterday, but the previous version gives the same results: In revision 1.155: #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) and in revision 1.154: #if TCL_UTF_MAX != 3 And yet, if I remove the test, I get a (very minimally tested) working version of Tkinter, so the test should probably be modified to pass in whatever circumstances RH 9 presents. 2. Compilation of _ssl.c fails to find, through a chain of includes, file krb5.h. Then things rapidly go to hell. Defining #define OPENSSL_NO_KRB5 gets through the compilation, but I don't yet know how to test it. (How do I get past the "Use of the `network' resource not enabled" result of running test_socket_ssl.py?) Gary Herron From skip@pobox.com Thu Apr 17 22:43:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 17 Apr 2003 16:43:23 -0500 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> References: <3E9F0AA3.7000907@v.loewis.de> <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> Message-ID: <16031.8187.420786.801944@montanaro.dyndns.org> David> It also has the disadvantage of adding a new syntactical David> construct to the language does it not (which seems like more pain David> than a couple of keywords)? I don't recall any other place in David> the language that uses [] as a way to specify a variable (oops, David> excepting list comprehensions sort of, and that's not quite the David> same thing IMO), especially in that position in a statement?
Adding new syntactic sugar is less of a problem than adding keywords for two reasons: * old code may have used the new keyword as a variable (because it wasn't a keyword) * old code won't have used the new syntactic sugar (because it wasn't proper syntax) Combined, it means there is a higher probability that old code will continue to run with a new bit of syntax than with a new keyword. You can think of [mod1, mod2, ...] as precisely a list of modifiers to normal functions, so it is very much like existing list construction syntax in that regard. Also "[...]" often means "optional" in many grammar specifications or documentation, so there's an added hint as to the meaning. Skip From theller@python.net Thu Apr 17 22:54:08 2003 From: theller@python.net (Thomas Heller) Date: 17 Apr 2003 23:54:08 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> References: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> Message-ID: <ptnk94e7.fsf@python.net> Jack Jansen <Jack.Jansen@oratrix.com> writes: > On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van > Rossum wrote: > > > >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next > >>> week, that would be Friday April 25. If anyone has something that > >>> needs to be done before this release go out, please let me know! > >> > >> The getargs mods got checked in just this morning, even though I > >> explicitly and rather strongly asked that if these mods be made they > >> be checked in *long* before a release was due:-( > > > > Sorry, I forgot. Did you make a note of that on the SF patch? > > Yes, I'm pretty sure I did. Thomas also seems to refer to it... He did, and I also mentioned it yesterday. OTOH, I had a first version of the patch sitting on SF for a rather long time (shortly after the alpha2 release), asking for feedback, but didn't get any. > > >> This means that all the Mac modules are now 100% dead. The same is > >> probably true for PyObjC.
And PyObjC has the added problem that it > >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is > >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > >> Apple ships, which is 2.2 at the moment). I assume there are format > >> codes that will convert 16 bit and 32 bit integer quantities without > >> any checks on both 2.2 and 2.3, but I haven't investigated yet. > > > > Maybe we should retract the changes to existing format codes that make > > them more restrictive? That should revive any code that's currently > > dead, right? > > That would be much better. If "l" (lower case ell) would continue to > accept anything I wouldn't have to change anything. > Guido has also suggested to keep another code without changes, I cannot remember which one it was, but there is a comment on SF. I have the impression that the new test_getargs2.py test makes it easy to change the behaviour and verify it to anything we want. In case it is too much trouble, why not back out all this again (although someone else would have to do it, I'm basically offline until Tuesday), and check in after the b1 release. Sorry, Thomas From klm@zope.com Thu Apr 17 23:02:20 2003 From: klm@zope.com (Ken Manheimer) Date: Thu, 17 Apr 2003 18:02:20 -0400 (EDT) Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <16031.8187.420786.801944@montanaro.dyndns.org> Message-ID: <Pine.LNX.4.44.0304171745490.963-100000@korak.zope.com> On Thu, 17 Apr 2003, Skip Montanaro wrote: > Adding new syntactic sugar is less of a problem than adding keywords for two > reasons: > > * old code may have used the new keyword as a variable (because it > wasn't a keyword) > > * old code won't have used the new syntactic sugar (because it wasn't > proper syntax) (If I recall correctly, minting new keywords is particularly onerous in Python because of its simple parser. Specifically, you can't use keywords for variable names anywhere, even outside the syntactic construct which involves the keyword.
Hence the need to use 'klass' instead of 'class' for parameter names, no variables named 'from', etc. 'import's recent aliasing refinement - import x as y was implemented without making "as" a keyword specifically to avoid this drawback. "as" gets its role there purely by virtue of the import syntax, not as a new keyword - and so you can use "as" as a variable name, etc. The scheme for qualifying function definitions with [...] would have the same virtue - not requiring the qualifiers to be new keywords...) -- Ken klm@zope.com From martin@v.loewis.de Thu Apr 17 23:29:06 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Fri, 18 Apr 2003 00:29:06 +0200 Subject: [Python-Dev] Wrappers and keywords In-Reply-To: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> References: <GCEDKONBLEFPPADDJCOEIEIIJJAA.whisper@oz.net> Message-ID: <3E9F2AB2.4010708@v.loewis.de> David LeBlanc wrote: >>In that proposal, static(method) would *not* be a keyword, but would >>be an identifier (denoting the same thing that staticmethod currently >>denotes). This syntax nicely extends to >> >> def x() [threading.synchronized, xmlrpclib.webmethod]: >> pass > > > I'm not sure what you're suggesting here semantically...? That is part of the point: You could add arbitrary annotations to function definitions, to indicate that they are static methods, to indicate that multiple calls to them should be synchronized, or to indicate that the method should be available via SOAP (the Simple Object Access Protocol). The language would not associate any inherent semantics. Instead, the identifiers in the square brackets would be callable (or have some other interface) that modifies the function-under-construction, to integrate additional aspects.
The disadvantage of adding keywords is that it breaks backwards compatibility: Somebody might be using that identifier already. When it becomes a keyword, existing code that works now would stop working. With the extension of sqare brackets after the parameter list, nothing breaks, as you can't currently put brackets in that place. > I don't recall any other place in the language that uses [] as a way to > specify a variable (oops, excepting list comprehensions sort of, and that's > not quite the same thing IMO), especially in that position in a statement? > It seems like it would open the door to uses (abuses?) like: > class foo [abstract]: > pass The syntax is inspired by DCOM IDL, and by C#, both allowing to annotate declarations with square brackets. > Is there any real difference between what amounts to a reserved constant > identifier (with semantic meaning rather than value) compared to a keyword > statement sentinal? What is a keyword statement sentinal, and what alternatives are you comparing here? > Are there any other language-level uses like that > (reserved constant identifier), or does this introduce something new as > well? If you are referring the the def foo()[static] proposal: "static" would not be reserved nor constant. Instead, writing def foo()[bar1, bar2]: body would be a short-hand for writing def foo(): body foo = bar1(foo) foo = bar2(foo) bar1 and bar2 could be arbitrary expressions - nothing reserved at all. > Speaking of slots, is their primary purpose to have classes whose instances > are not morphable? No. 
Regards, Martin From neal@metaslash.com Thu Apr 17 23:23:59 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 17 Apr 2003 18:23:59 -0400 Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <200304171439.17504.gherron@islandtraining.com> References: <200304171439.17504.gherron@islandtraining.com> Message-ID: <20030417222359.GB28630@epoch.metaslash.com> On Thu, Apr 17, 2003 at 02:39:16PM -0700, Gary Herron wrote: > I just upgraded my development system to RedHat 9, and now I get two > compilation errors on the Python CVS tree. I'll have time to examine > them tonight, but I thought I'd get a notice out now on the chance > that someone else has already resolved them. > > 1. Compilation of _tkinter comes up with > #error "unsupported Tcl configuration" > > The failing test was changed just yesterday, but the previous > version gives the same results: > In revision 1.155: > #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) > and in revision 1.154: > #if TCL_UTF_MAX != 3 > > And yet, if I remove the test, I get a (very minimally tested) > working version of Tkinter, so the test should probably be modified > to pass in whatever circumstances RH 9 presents. I believe Martin von Loewis already checked in a fix for this. http://python.org/sf/719880 > 2. Compilation of _ssl.c fails to find, through a chain of includes, > file krb5.h. Then things rapidly go to hell. > > Defining > #define OPENSSL_NO_KRB5 > gets through the compilation, but I don't yet know how to test it. > > (How do I get past the > "Use of the `network' resource not enabled" > result of running test_socket_ssl.py?) I just checked in a fix for Feature Request #719429 which fixes this problem. It finds the header file. To enable resources: ./python -E -tt ./Lib/test/regrtest.py -u network I have a couple of failures. I think they may have occurred before upgrading. Is anybody else seeing this?
test_array OverflowError: unsigned short integer is less than minimum test_logging - (I think this is the old test sensitivity) test_trace - AssertionError: events did not match expectation Neal From martin@v.loewis.de Thu Apr 17 23:35:15 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 18 Apr 2003 00:35:15 +0200 Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <200304171439.17504.gherron@islandtraining.com> References: <200304171439.17504.gherron@islandtraining.com> Message-ID: <3E9F2C23.70809@v.loewis.de> Gary Herron wrote: > I just upgraded my development system to RedHat 9, and now I get two > compilation errors on the Python CVS tree. I'll have time to examine > them tonight, but I thought I'd get a notice out now on the chance > that someone else has already resolved them. > > 1. Compilation of _tkinter comes up with > #error "unsupported Tcl configuration" > > The failing test was changed just yesterday, but the previous > version gives the same results: > In revision 1.155: > #if TCL_UTF_MAX != 3 && !(defined(Py_UNICODE_WIDE) && TCL_UTF_MAX==6) > and in revision 1.154: > #if TCL_UTF_MAX != 3 > > And yet, if I remove the test, I get a (very minimally tested) > working version of Tkinter, so the test should probably be modified > to pass in whatever circumstances RH 9 presents. That change is indeed intended to fix the problem. You need to configure Python with --enable-unicode=ucs4 on Redhat 9; compiling in UCS-2 support is not supported if you want Tkinter to work with the Tk provided by Redhat. Before this change, --enable-unicode=ucs4 would not work, either. Outright removing the test is incorrect as well. > (How do I get past the > "Use of the `network' resource not enabled" > result of running test_socket_ssl.py?) Pass "-u network" to regrtest.
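For readers checking their own build: the narrow/wide (UCS-2 vs. UCS-4) distinction that --enable-unicode=ucs4 controls is visible from Python itself through sys.maxunicode. This is an editorial sketch, not part of the thread; note that since PEP 393 (Python 3.3) every build reports the wide value.

```python
import sys

# Narrow (UCS-2) builds reported 0xFFFF; wide (UCS-4) builds report 0x10FFFF.
# Since PEP 393 (Python 3.3) all builds are effectively wide.
wide_build = sys.maxunicode > 0xFFFF
print("wide build" if wide_build else "narrow build")
```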
Regards, Martin From guido@python.org Fri Apr 18 00:14:50 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 19:14:50 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of 17 Apr 2003 23:54:08 +0200." <ptnk94e7.fsf@python.net> References: <A8C78406-711B-11D7-AE99-000A27B19B96@oratrix.com> <ptnk94e7.fsf@python.net> Message-ID: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> > Jack Jansen <Jack.Jansen@oratrix.com> writes: > > > On donderdag, apr 17, 2003, at 22:17 Europe/Amsterdam, Guido van > > Rossum wrote: > > > > > > >>> I'd like to do a 2.3b1 release someday. Maybe at the end of next > > >>> week, that would be Friday April 25. If anyone has something that > > >>> needs to be done before this release go out, please let me know! > > >> > > >> The getargs mods got checked in just this morning, even though I > > >> explicitly and rather strongly asked that if these mods be made they > > >> be checked in *long* before a release was due:-( > > > > > > Sorry, I forgot. Did you make a note of that on the SF patch? > > > > Yes, I'm pretty sure I did. Thomas also seems to refer to it... > > He did, and I also mentioned it yesterday. > OTOH, I had sitting a first version of the patch on SF for a rather long > time (shortly after the alpha2 release), asking for feedback, but > didn't get any. That was my fault -- I was too busy. :-( > > >> This means that all the Mac modules are now 100% dead. The same is > > >> probably true for PyObjC. And PyObjC has the added problem that it > > >> needs to be compatible with both 2.3b1 and 2.2 (notice that that is > > >> "2.2", not "2.2.X": PyObjC has to work with /usr/bin/python that > > >> Apple ships, which is 2.2 at the moment). I assume there are format > > >> codes that will convert 16 bit and 32 bit integer quantities without > > >> any checks on both 2.2 and 2.3, but I haven't investigated yet. 
> > > Maybe we should retract the changes to existing format codes that make > > > them more restrictive? That should revive any code that's currently > > > dead, right? > > > > That would be much better. If "l" (lower case ell) would continue to > > accept anything I wouldn't have to change anything. > > Guido has also suggested to keep another code without changes, I cannot > remember which one it was, but there is a comment on SF. That was 'h'. > I have the impression that the new test_getargs2.py test makes it easy > to change the behaviour and verify it to anything we want. > > In case it is too much trouble, why not back out all this again (although > someone else would have to do it, I'm basically offline until Tuesday), and > check in after the b1 release. I'll back out the change to 'h', which is the only incompatible change I can see (unless you consider accepting *more* than before an error). Thomas made no changes to 'l', so I'm not sure what that is about -- maybe the problem is with unsigned hex constants? --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri Apr 18 01:06:50 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 Apr 2003 02:06:50 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030417205956.GC9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> Message-ID: <20030418000650.GD9493@xs4all.nl> On Thu, Apr 17, 2003 at 10:59:56PM +0200, Thomas Wouters wrote: > Unless I should make the shortcut depend on the actual value of > tp_getattro, as in shortcut only if it actually is > PyObject_GenericGetAttr ? Well, I went ahead and did that, and uploaded the new patch to SF.
The result is somewhat annoying, but explainable: The patch is now 3% _slower_ than an unmodified Python, whereas the patch without support for newstyle classes was a good 5% _faster_ than unmodified. This is both according to PyBench (which doesn't use newstyle classes) and according to 'time timeit.py pass' (which does use newstyle classes.) Timing just 'x.foo()' where 'x' is a newstyle class instance is about 20% faster, against 25-30% for oldstyle classes. The overall slowdown is caused by the fact that the patch only treats PyFunctions (functions written in Python) specially, and not PyMethodDescrs (PyCFunctions wrapped in PyMethodDefs wrapped in a descriptor.) This is because it would still need to instantiate a PyCFunctionObject (a PyObject wrapper for a PyCFunction, which is just a C function-pointer) OR it would need to do all interpretation of METH_* arguments and a bunch of argument-preparing itself. Another possible cause for the slowdown (but almost certainly not as substantial as the type-with-C-function one) is calling an almost-method on a newstyle class; a callable object that is an attribute of a type (or instance of the type) but is not a PyFunction or PyMethodDescr. The way the current mechanism works, it would have to traverse the MRO and (possibly) check the instance dict twice; first to determine that it's not a PyFunction in _PyObject_Generic_getmethod() and then again in the regular run through PyObject_GenericGetAttr(). Examples of this case would be staticmethods, classmethods, and other callable objects as attributes. I do not believe this is a substantial part, though. The slowdown can be fixed in two ways: handling PyMethodDescrs as well, in _PyObject_Generic_getmethod(), or removing the double lookups. Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... hrm... I'll look at it tomorrow, it's time for bed. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From guido@python.org Fri Apr 18 01:22:56 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 20:22:56 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Thu, 17 Apr 2003 22:59:56 +0200." <20030417205956.GC9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> Message-ID: <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> > (Looking at PyObject_GenericGetAttr with that in mind, I wonder if > there isn't a possible crash there. In the first MRO lookup, looking > for descr's, if a non-data-descr is found, it is kept around but not > INCREF'd until later, after the instance-dict is searched. Am I > wrong in believing the PyDict_GetItem of the instance dict can call > Python code ? It can, if there's a key whose type has a custom __eq__ or __cmp__. So indeed, if this custom __eq__ is evil enough to delete the corresponding key from the class dict, it could cause descr to point to freed memory. I won't try to construct a case, but it's not impossible. :-( Fixing this would make the code even hairier though... :-( > There isn't even as much as an assert(PyDict_Check(dict)) there.) All over the place it is assumed and ensured that a type's tp_dict and an instance's __dict__ are always real dicts. The only way this could be violated would be by C code defining a type that violates this.
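Guido's point that a plain dict lookup can execute arbitrary Python is easy to demonstrate from the Python side. A small sketch (the class name is illustrative, not from the thread): whenever hashes match but the key objects are not identical, the dict must fall back to __eq__.

```python
calls = []

class Key:
    """Illustrative key type whose __eq__ records every invocation."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        calls.append("__eq__ ran")
        return isinstance(other, Key) and self.value == other.value

d = {Key("spam"): 1}

# A distinct-but-equal key object forces the dict to call __eq__.
assert d[Key("spam")] == 1
assert calls  # Python-level code ran inside the C-level lookup
```

As Guido notes, an evil __eq__ could mutate the very dict being searched, which is why the C code cannot safely hold a borrowed reference across the lookup.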
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri Apr 18 01:34:31 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 Apr 2003 02:34:31 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030418000650.GD9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> Message-ID: <20030418003431.GE9493@xs4all.nl> On Fri, Apr 18, 2003 at 02:06:50AM +0200, Thomas Wouters wrote: > Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... > hrm... I'll look at it tomorrow, it's time for bed. I did a quick hack to the same effect, and it still came out a 1% loss (so about 6% against the no-newstyle patch) in PyBench and a few timeit tests. Sigh. I guess the non-method overhead is just too large, or there are more almost-methods than I figured. I'll start work on a more lookup-saving _PyObject_Generic_getmethod tomorrow or this weekend (and will probably do _Py_instance_getmethod that way too, while I'm at it.) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg@cosc.canterbury.ac.nz Fri Apr 18 03:15:11 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 14:15:11 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304171553.h3HFr1023445@odiug.zope.com> Message-ID: <200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> > In earlier versions, functions were special-cased by the instance > getattr code; the special case has been subsumed by looking for a > __get__ method. Yes, this means that a plain Python function object > is a descriptor! While we're on the topic -- Guido, how would you feel about the idea of giving built-in function objects the same instance-binding behaviour as interpreted functions?
This would help Pyrex considerably, because currently I have to resort to a kludge to make Pyrex-defined functions work as methods. It mostly works, but it has some side effects, such as breaking the most common idiomatic usage of staticmethod() and classmethod(). If built-in functions were more like interpreted functions in this regard, all these problems would go away. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Apr 18 03:21:06 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 14:21:06 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030417205956.GC9493@xs4all.nl> Message-ID: <200304180221.h3I2L6911441@oma.cosc.canterbury.ac.nz> Thomas Wouters <thomas@xs4all.net>: > The problem I have with newstyle classes is where to shortcut what. It sounds to me like descriptor objects will need to have a __callattr__ slot added. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Fri Apr 18 03:27:09 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 17 Apr 2003 22:27:09 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Fri, 18 Apr 2003 14:15:11 +1200." 
<200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> References: <200304180215.h3I2FBP11374@oma.cosc.canterbury.ac.nz> Message-ID: <200304180227.h3I2R9R14494@pcp02138704pcs.reston01.va.comcast.net> > While we're on the topic -- Guido, how would you feel about the > idea of giving built-in function objects the same > instance-binding behaviour as interpreted functions? > > This would help Pyrex considerably, because currently I > have to resort to a kludge to make Pyrex-defined functions > work as methods. It mostly works, but it has some side > effects, such as breaking the most common idiomatic > usage of staticmethod() and classmethod(). > > If built-in functions were more like interpreted functions > in this regard, all these problems would go away. There are two ways to "bind" a built-in function to an object. One would be to do what happens for Python functions, which is in effect a currying: f.__get__(obj) yields a function g that when called as g(arg1, ...) calls f(obj, arg1, ...). (In fact, I've recently checked in a change that makes instancemethod a general currying function on the first argument. :-) But the other interpretation, which might be more appropriate for C functions, is that the bound instance is passed to the first argument at the *C* level, usually called self: PyObject * my_c_function(PyObject *self, PyObject *args) { ... } Which one would you like? I think we could do each rather easily (perhaps the first more easily because the type needed to represent the bound method already exists; for the second I think we'd have to introduce a new helper object type).
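The currying Guido describes is directly observable, because a plain function's __get__ is what implements method binding. A quick sketch (the names are illustrative):

```python
def greet(self, name):
    # An ordinary function; 'self' is just its first parameter.
    return (id(self), name)

class Thing:
    pass

obj = Thing()

# A plain Python function is a descriptor: __get__ curries the instance
# into the first argument, producing a bound method g.
g = greet.__get__(obj)
assert g("Guido") == greet(obj, "Guido")
assert g.__self__ is obj
```

This is also why merely placing a function in a class's __dict__ makes it a method: attribute lookup on an instance triggers the function's __get__.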
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri Apr 18 04:38:30 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Apr 2003 15:38:30 +1200 (NZST) Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <200304180227.h3I2R9R14494@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> > There are two ways to "bind" a built-in function to an object. > > One would be to do what happens for Python functions, which is in > effect a currying: f.__get__(obj) yields a function g that when called > as g(arg1, ...) calls f(obj, arg1, ...). That's the one I'm talking about. I forgot to explain that the problem occurs when I'm creating a *Python* class object and populating it with functions that are supposed to be methods. Currently I have to manually wrap each function in an unbound method object before putting it in the class's __dict__. If that happened automatically on access, I would be able to create Python classes that behave more like the real thing. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From Jack.Jansen@oratrix.com Fri Apr 18 09:19:24 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 10:19:24 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum wrote: >>>> Maybe we should retract the changes to existing format codes that >>>> make >>>> them more restrictive? That should revive any code that's currently >>>> dead, right? >>> >>> That would be much better.
if "l" (lower case ell) would continue to >>> accept anything I wouldn't have to change anything. >> >> Guido has also suggested to keep another code without changes, I >> cannot >> remember which one it was, but there is a comment on SF. > > That was 'h'. Right, 'h' turns out to be the problem. I changed a lot of 'l's to 'k's, but it seems this one is the real killer. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From Jack.Jansen@oratrix.com Fri Apr 18 09:48:35 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 10:48:35 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304172314.h3HNEpg11408@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum wrote: > I'll back out the change to 'h', which is the only incompatible change > I can see (unless you consider accepting *more* than before an error). > Thomas made no changes to 'l', so I'm not sure what that is about -- > maybe the problem is with unsigned hex constants? Okay, great!! Is this a temporary measure, i.e. is the semantic change to 'h' going to come back after 2.3 is out? -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From g_a_l_l_a@mail333.com Fri Apr 18 10:19:17 2003 From: g_a_l_l_a@mail333.com (g_a_l_l_a@mail333.com) Date: 18 Apr 2003 13:19:17 +0400 Subject: [Python-Dev] 0400058546-ID: We are glad to inform you about the changes in our website Message-ID: <2003.04.18.03847F4F401D71F0@mail333.com> Dear Sir or Madam, We are glad to inform you about the changes in our website http://www.gallery-a.ru. Now you can get to know the price for the paintings in our gallery without filling the order form. 
From mal@lemburg.com Fri Apr 18 11:51:23 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 18 Apr 2003 12:51:23 +0200 Subject: [Python-Dev] Startup overhead due to codec usage In-Reply-To: <m3el475244.fsf@mira.informatik.hu-berlin.de> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> <m3el475244.fsf@mira.informatik.hu-berlin.de> Message-ID: <3E9FD8AB.8040400@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > >>The codec machinery was carefully designed not to introduce >>extra overhead when not using Unicode in programs. The above >>approach pretty much kills this effort :-) > > This effort is dead already. For example, on Unix, the file system > default encoding is initialized from the user's preference; to verify > that the encoding really exists, a codec lookup is performed. Hmm, then we should fix this and the site.py lookup you introduced. I don't see the point in increasing startup time for all scripts just because a seldom used feature needs initialization. BTW, I wonder what happens if you run a Python version with Unicode disabled in the current scenario. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 18 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 67 days left From martin@v.loewis.de Fri Apr 18 12:33:17 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 18 Apr 2003 13:33:17 +0200 Subject: [Python-Dev] Startup overhead due to codec usage In-Reply-To: <3E9FD8AB.8040400@lemburg.com> References: <000001c30099$711a6f60$530f8490@eden> <3E97F159.20909@lemburg.com> <m3r8883qy8.fsf@mira.informatik.hu-berlin.de> <3E97FD37.9040100@lemburg.com> <m3el475244.fsf@mira.informatik.hu-berlin.de> <3E9FD8AB.8040400@lemburg.com> Message-ID: <m3n0io82gy.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > Hmm, then we should fix this and the site.py lookup you > introduced. I don't see the point in increasing startup > time for all scripts just because a seldom used feature needs > initialization. I really don't see the need to fix anything here. I wouldn't mind somebody else fixing something, as long as none of the features break. > BTW, I wonder what happens if you run a Python version with > Unicode disabled in the current scenario. The nl_langinfo code in Python/pythonrun.c is disabled when unicode is disabled. In turn, it won't be executed, and Py_FileSystemDefaultEncoding stays at NULL. This is no problem, as it is never used, anyway. For the code in site.py (I think), finding the codec will fail with an exception, which will be caught, and the "mbcs" alias will be added. 
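The fallback Martin describes for site.py — catch the failed codec lookup and install an alias — can be sketched with the public codecs.register() hook. This is an illustration of the pattern only, not the actual site.py code (which manipulated the encodings alias table); the codec name below is deliberately bogus:

```python
import codecs

def fallback_search(name):
    """Illustrative search function: map unknown 'x-...' names to ASCII."""
    if name.replace("_", "-").startswith("x-"):
        return codecs.lookup("ascii")
    return None  # decline; let other search functions try

try:
    codecs.lookup("x-bogus")          # no such codec: raises LookupError
except LookupError:
    codecs.register(fallback_search)  # install the fallback, then retry

info = codecs.lookup("x-bogus")
assert info.name == "ascii"
```

Failed lookups are not cached by the codec registry itself, which is what makes the catch-and-retry pattern work.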
Regards, Martin From dave@boost-consulting.com Fri Apr 18 14:06:30 2003 From: dave@boost-consulting.com (David Abrahams) Date: Fri, 18 Apr 2003 09:06:30 -0400 Subject: [Python-Dev] Re: CALL_ATTR patch References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <3E9EE44A.6060904@lemburg.com> <200304171734.h3HHYVU03250@odiug.zope.com> Message-ID: <uvfxc550p.fsf@boost-consulting.com> Guido van Rossum <guido@python.org> writes: >> Could you put such short overviews somewhere on the Python Wiki ? > > I don't have the time for that. When I want to publish stuff like > this somewhere, I need to spend time to make it all correct, complete > etc. Besides which, it's already in the docs. Correct, complete, and all that ;-) http://www.python.org/dev/doc/devel/ref/descriptors.html -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Fri Apr 18 14:21:02 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:21:02 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: "Your message of Fri, 18 Apr 2003 15:38:30 +1200." <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> References: <200304180338.h3I3cUY12383@oma.cosc.canterbury.ac.nz> Message-ID: <200304181321.h3IDL2922688@pcp02138704pcs.reston01.va.comcast.net> > > There are two ways to "bind" a built-in function to an object. > > > > One would be to do what happens for Python functions, which is in > > effect a currying: f.__get__(obj) yields a function g that when called > > as g(arg1, ...) calls f(obj, arg1, ...). > > That's the one I'm talking about. I forgot to explain that the problem > occurs when I'm creating a *Python* class object and populating it > with functions that are supposed to be methods. Currently I have to > manually wrap each function in an unbound method object before putting > it in the class's __dict__. 
If that happened automatically on access, > I would be able to create Python classes that behave more like the > real thing. OK, are you up for submitting a patch? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 18 14:25:23 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:25:23 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of Fri, 18 Apr 2003 10:19:24 +0200." <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> References: <76AFD8BC-7176-11D7-9CB8-000A27B19B96@oratrix.com> Message-ID: <200304181325.h3IDPNl22760@pcp02138704pcs.reston01.va.comcast.net> > Right, 'h' turns out to be the problem. I changed a lot of 'l's to > 'k's, but it seems this one is the real killer. So now that I rolled back 'h', is there any reason not to keep the rest of these changes? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 18 14:41:30 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 09:41:30 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: "Your message of Fri, 18 Apr 2003 10:48:35 +0200." <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> References: <8A4705E2-717A-11D7-9CB8-000A27B19B96@oratrix.com> Message-ID: <200304181341.h3IDfUx22802@pcp02138704pcs.reston01.va.comcast.net> > On vrijdag, apr 18, 2003, at 01:14 Europe/Amsterdam, Guido van Rossum > wrote: > > I'll back out the change to 'h', which is the only incompatible change > > I can see (unless you consider accepting *more* than before an error). > > Thomas made no changes to 'l', so I'm not sure what that is about -- > > maybe the problem is with unsigned hex constants? > > Okay, great!! > > Is this a temporary measure, i.e. is the semantic change to 'h' > going to come back after 2.3 is out? I don't see why -- it always was a signed short, let it stay that way. 
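For readers following the format-code details: 'h' in PyArg_ParseTuple() means C signed short, and the struct module happens to use the same letter, so it gives a convenient view of the range check at issue. The analogy is editorial, not from the thread; note that modern struct raises on overflow, which very old versions did not always do.

```python
import struct

SHRT_MIN, SHRT_MAX = -32768, 32767  # range of a 16-bit C signed short

# In-range values pack fine (two shorts here).
packed = struct.pack("hh", SHRT_MIN, SHRT_MAX)

# Out-of-range values trip the same kind of check a strict 'h' in
# PyArg_ParseTuple() performs.
overflowed = False
try:
    struct.pack("h", SHRT_MAX + 1)
except struct.error:
    overflowed = True
assert overflowed
```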
--Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Fri Apr 18 19:26:22 2003 From: mwh@python.net (Michael Hudson) Date: Fri, 18 Apr 2003 19:26:22 +0100 Subject: [Python-Dev] CALL_ATTR patch In-Reply-To: <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Thu, 17 Apr 2003 20:22:56 -0400") References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <200304180022.h3I0Mu012443@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2m3ckfr7ap.fsf@starship.python.net> Guido van Rossum <guido@python.org> writes: >> (Looking at PyObject_GenericGetAttr with that in mind, I wonder if >> there isn't a possible crash there. In the first MRO lookup, looking >> for descr's, if a non-data-descr is found, it is kept around but not >> INCREF'd until later, after the instance-dict is searched. Am I >> wrong in believing the PyDict_GetItem of the instance dict can call >> Python code ? > > It can, if there's a key whose type has a custom __eq__ or __cmp__. > So indeed, if this custom __eq__ is evil enough to delete the > corresponding key from the class dict, it could cause descr to point > to freed memory. I won't try to construct a case, but it's not > impossible. :-( Indeed, there are several examples of this sort of thing already in Lib/test/test_mutants.py. Cheers, M. -- If comp.lang.lisp *is* what vendors are relying on to make or break Lisp sales, that's more likely the problem than is the effect of any one of us on such a flimsy marketing strategy... -- Kent M Pitman, comp.lang.lisp From pje@telecommunity.com Fri Apr 18 20:02:14 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Fri, 18 Apr 2003 15:02:14 -0400 Subject: [Python-Dev] Built-in functions as methods Message-ID: <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> Hi guys. 
Greg, you were asking about making built-in functions act like methods. This could break code, if it applies to all built-in functions. In more than one Python version, I have stuck a built-in type or function into a class, under the assumption that it would behave as a 'staticmethod' now does. If all built-in functions start acting like unbound methods, existing code will break. I'm not positive, but I think there's even code like this in the standard library. I'm all for anything that makes Pyrex easier for Greg to maintain <wink>, but perhaps there is a flag that could be used to request the behavior so that existing code won't break? From andymac@bullseye.apana.org.au Fri Apr 18 14:24:30 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 19 Apr 2003 00:24:30 +1100 (edt) Subject: [Python-Dev] Build errors under RH9 In-Reply-To: <20030417222359.GB28630@epoch.metaslash.com> Message-ID: <Pine.OS2.4.44.0304190015530.1480-100000@tenring.andymac.org> On Thu, 17 Apr 2003, Neal Norwitz wrote: > I have a couple of failures. I think they may have occurred > before upgrading. Is anybody else seeing this? > > test_array OverflowError: unsigned short integer is less than minimum On FreeBSD 4.4, I'm seeing this one... > test_logging - (I think this is the old test sensitivity) > test_trace - AssertionError: events did not match expectation ... but not these 2. I've also stumbled across a compiler optimisation issue with the changes Guido checked in to SRE on April 14 - on FreeBSD 4.4 at least. gcc -O3 produces an _sre.o that gives rise to a bus error; -O2 works (gcc is v2.95.3). -- Andrew I MacIntyre "These thoughts are mine alone..." 
E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Fri Apr 18 20:28:02 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 18 Apr 2003 15:28:02 -0400 Subject: [Python-Dev] Built-in functions as methods In-Reply-To: "Your message of Fri, 18 Apr 2003 15:02:14 EDT." <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> References: <5.1.1.6.0.20030418145652.02f59e70@mail.rapidsite.net> Message-ID: <200304181928.h3IJS2m01321@pcp02138704pcs.reston01.va.comcast.net> > Hi guys. Greg, you were asking about making built-in functions act like > methods. This could break code, if it applies to all built-in > functions. In more than one Python version, I have stuck a built-in type > or function into a class, under the assumption that it would behave as a > 'staticmethod' now does. If all built-in functions start acting like > unbound methods, existing code will break. > > I'm not positive, but I think there's even code like this in the standard > library. > > I'm all for anything that makes Pyrex easier for Greg to maintain <wink>, > but perhaps there is a flag that could be used to request the behavior so > that existing code won't break? Good point! I suppose Greg could use something very similar to the standard built-in object type but with a __get__ method, or he could define a flag you have to set in the ml_flags before __get__ returns a bound function. --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Fri Apr 18 21:33:23 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 18 Apr 2003 22:33:23 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304181325.h3IDPNl22760@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <00662E75-71DD-11D7-B18D-000A27B19B96@oratrix.com> On vrijdag, apr 18, 2003, at 15:25 Europe/Amsterdam, Guido van Rossum wrote: >> Right, 'h' turns out to be the problem. 
I changed a lot of 'l's to >> 'k's, but it seems this one is the real killer. > > So now that I rolled back 'h', is there any reason not to keep the > rest of these changes? No, everything is fine as it is now. I'm happy again! -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From drifty@alum.berkeley.edu Fri Apr 18 22:39:15 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 18 Apr 2003 14:39:15 -0700 (PDT) Subject: [Python-Dev] python-dev Summary for 2003-04-01 through 2003-04-15 Message-ID: <Pine.SOL.4.55.0304181437030.378@death.OCF.Berkeley.EDU> Sorry this is later than normal, but I got sucked into helping with elections at UC Berkeley by providing IT support (first time elections are all on computer). Anyway, you guys have until Monday night to reply with corrections for the summary. +++++++++++++++++++++++++++++++++++++++++++++++++++++ python-dev Summary for 2003-04-01 through 2003-04-15 +++++++++++++++++++++++++++++++++++++++++++++++++++++ This is a summary of traffic on the `python-dev mailing list`_ from April 1, 2003 through April 15, 2003. It is intended to inform the wider Python community of on-going developments on the list and to have an archived summary of each thread started on the list. To comment on anything mentioned here, just post to python-list@python.org or `comp.lang.python`_ with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join `python-dev`_! This is the fifteenth summary written by Brett Cannon (<voice of Comic Book Guy from "The Simpsons">Most summaries written by a single person *ever* </voice>). All summaries are archived at http://www.python.org/dev/summary/ .
Please note that this summary is written using reStructuredText_ which can be found at http://docutils.sf.net/rst.html . Any unfamiliar punctuation is probably markup for reST_ (otherwise it is probably regular expression syntax or a typo =); you can safely ignore it, although I suggest learning reST; it's simple and is accepted for `PEP markup`__. Also, because of the wonders of programs that like to reformat text, I cannot guarantee you will be able to run the text version of this summary through Docutils_ as-is unless it is from the original text file. __ http://www.python.org/peps/pep-0012.html .. _python-dev: http://www.python.org/dev/ .. _python-dev mailing list: http://mail.python.org/mailman/listinfo/python-dev .. _comp.lang.python: http://groups.google.com/groups?q=comp.lang.python .. _Docutils: http://docutils.sf.net/ .. _reST: .. _reStructuredText: http://docutils.sf.net/rst.html .. contents:: .. _last summary: http://www.python.org/dev/summary/2003-03-16_2003-03-31.html ====================== Summary Announcements ====================== So all three people who expressed an opinion about the new Quickies_ format liked it, so it stays. Do you guys actually like the links to the CVS in the Summaries? Putting in the various links for every single file mentioned that does not have direct documentation is time-consuming. But if you find it useful it can stay. Please let me know whether you actually use it (this means telling me even if you don't!). ========= `Boom`__ ========= __ http://mail.python.org/pipermail/python-dev/2003-April/034370.html Splinter threads: - `RE: [Python-checkins] python/dist/src/Modules gcmodule.c <http://mail.python.org/pipermail/python-dev/2003-April/034371.html>`__ Related threads: - `Garbage collecting closures <http://mail.python.org/pipermail/python-dev/2003-April/034521.html>`__ - `Algorithm for finalizing cycles <http://mail.python.org/pipermail/python-dev/2003-April/034609.html>`__ Do you want to know what dedication is?
Thinking of Python code that will cause Python to crash during dental surgery. Well, Tim Peters is that dedicated and managed to come up with some code that crashed Python when it attempted to garbage-collect some objects. This begins the joy that is garbage collection and finalizer functions. Gather around, children, as we learn about how Python tries to keep you from having to worry about keeping track of your trash. When Python executes the garbage collector, it looks to see what objects are unreachable based on reference counts; when something has a reference count of 0 nothing is referencing it, so it is just floating out in the middle of nowhere with no one giving a hoot about whether it is there or not (children: "Awww! Poor, lonely object!"). But some of these lonely objects have what we call a finalizer (children: "What's that?!?"; isn't it cute when children are still inquisitive? Good for you for still being inquisitive! Have to be supportive, you know). A finalizer is either an instance that has a __del__ method or an object that has something in its tp_del slot (children: <nod>). Does anyone know why we have to take care of these lonely objects that are somewhat special? (children: <look around for someone to be brave enough to actually try to answer; no one does>) Well, since these objects have something that must be called before they are garbage-collected, we have to be careful: a finalizer might reference some other lonely object and want to keep that object's reference count above 0, and thus keep it from being collected (children: "Oh..."). What makes them especially difficult to handle is that some objects that don't seem like finalizers end up acting like they do by defining a __getattr__ method that does very rude things (children: "That's not nice!").
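[Summary aside: the trap described above can be sketched in a few lines. This is a hypothetical illustration, not code from the thread; note that the 2003-era collector parked cycles of finalizer-bearing objects in ``gc.garbage``, whereas modern CPython (since the 3.4 finalization changes) simply collects them.]

```python
import gc

class Noisy(object):
    def __del__(self):
        # The finalizer: its mere presence made a cycle of such
        # objects "uncollectable" to the 2003-era garbage collector.
        pass

def make_cycle():
    a, b = Noisy(), Noisy()
    a.partner = b  # a and b now reference each other, so their
    b.partner = a  # reference counts can never drop to zero

gc.disable()            # keep automatic collection from racing the demo
make_cycle()
found = gc.collect()    # the cycle detector finds the orphaned pair
print(found >= 2)       # True under modern CPython: the cycle is freed
print(len(gc.garbage))  # 0: nothing is left stranded
gc.enable()
```

Running the same sketch under a 2003-vintage interpreter would instead leave the two ``Noisy`` objects stranded in ``gc.garbage``, which is exactly the situation the thread is wrestling with.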
And since it is really hard to tell whether they are real finalizers or just act like one, we just let them be out there forever so that they don't make the great Interpreter have to deal with them (some rabble-rouser: "My dad says that there is no great Interpreter and that everything came from the Compiler and Linker when they got together and did something my dad won't tell me about. Something about the bits and the bees..."; rest of children: "Nu-uh! The Interpreter is real! We've seen it! It's all-powerful and knowing! Take it back, take it back!"). Luckily, though, it is only an issue with something old-timers call classic classes; things that started to decay away long, long ago (children: "Yay! No more cruft!"). And thanks to the diligent work of some very important people, it has been dealt with (children: "Yay! Thank you important people!"). There is a lesson to be learned here, children; do not put old things to pasture too early since you can make stubborn old people mad, which is bad since they make up the majority of voters in America (children: "Yes, teacher!"; smart-ass in the back of the room: "How old are you, teacher?" <snicker>). Guido said that Python's cleanup model could be summed up as "things get destroyed at some arbitrary time after nothing refers to them." And the corollary is "always explicitly close your external resources." Tim Peters gave several suggestions in regards to how to make sure things get cleaned up; from registering cleanup code with atexit.register() to keeping a weakref in a module with a finalizer to be executed when the module is collected. ========= Quickies ========= `Distutils documentation amputated in 2.2 docs?`__ Splinter threads: - `How do I report a bug? <http://mail.python.org/pipermail/python-dev/2003-April/034328.html>`__ Greg Ewing noticed that two sections from the Distutils docs disappeared between Python 1.6 and 2.2.
Sections are still missing and will stay so until someone comes up with a patch to add in the missing sections. There was also a discussion on making it more obvious how to report a bug on SF_. __ http://mail.python.org/pipermail/python-dev/2003-April/034314.html .. _SF: .. _SourceForge: http://www.sf.net/ `PEP 269 once more.`__ Jonathan Riehl posted his patch implementing `PEP 269`_ ("Pgen Module for Python") and then uploaded a newer, improved version. __ http://mail.python.org/pipermail/python-dev/2003-April/034317.html .. _PEP 269: http://www.python.org/peps/pep-0269.html `Minor issue with PyErr_NormalizeException`__ It was discovered that PyErr_NormalizeException could dump core because it forgot to return on possible errors. It's been fixed and back-ported. __ http://mail.python.org/pipermail/python-dev/2003-April/034325.html `Capabilities (we already got one)`__ Splinter threads: - `Capabilities <http://mail.python.org/pipermail/python-dev/2003-April/034315.html>`__ - `Security challenge <http://mail.python.org/pipermail/python-dev/2003-April/034343.html>`__ The thread that refuses to die continued into this month. Nothing ground-breaking was said, though. Ben Laurie did say, however, that he is working on a PEP_, so hopefully that will make this whole discussion clear. __ http://mail.python.org/pipermail/python-dev/2003-April/034323.html .. _PEP: http://www.python.org/peps/ `[PEP] += on return of function call result`__ Someone wanted to do ``log.setdefault(r, '') += "test %d\n" % t``, which does not work. But it was pointed out you can just do ``log[r] = log.setdefault(r, '') + "test %d\n" % t`` (assigning to a temporary name and using ``+=`` on it would not update the dictionary, since strings are immutable). __ http://mail.python.org/pipermail/python-dev/2003-April/034339.html `How to suppress instance __dict__?`__ This is only of interest to people who use `Boost.Python`_ (which I don't use so I am not going to summarize it; although if you use C++ you will want to look at Boost.Python). __ http://mail.python.org/pipermail/python-dev/2003-April/034319.html ..
_Boost.Python: http://www.boost.org/libs/python/doc/ `Super and properties`__ Someone got bit by properties not working nicely with super(). Nathan Srebro subsequently posted a `link <http://www.ai.mit.edu/~nati/Python/>`__ to his own version of super() which handles this problem. __ `fwd: Dan Sugalski on continuations and closures`__ Kevin Altis forwarded some posts by Dan Sugalski (the guy heading the Parrot_ project and who Guido will throw a pie at at OSCON 2004 =) about closures and continuations that he found at http://simon.incutio.com/archive/2003/04/03/#closuresAndContinuations . Very well-written and might clarify things for people if they care to know more about closures, continuations, and why Lisp folks claim they are so damn important. __ http://mail.python.org/pipermail/python-dev/2003-April/034368.html .. _Parrot: http://www.parrotcode.org/ `LONG_LONG`__ Python 2.3 renames the C API's LONG_LONG definition to PY_LONG_LONG, as it should have been named all along. Yes, this will break things, but it was incorrect to have not renamed it. If you need to keep compatibility with code before Python 2.3, just use the following code (contributed by Mark Hammond)::

    #if defined(PY_LONG_LONG) && !defined(LONG_LONG)
    #define LONG_LONG PY_LONG_LONG
    #endif

__ http://mail.python.org/pipermail/python-dev/2003-April/034396.html `socket question`__ Someone asked why something didn't build under Solaris and was subsequently redirected to python-list@python.org . __ http://mail.python.org/pipermail/python-dev/2003-April/034399.html `PEP305 csv package: from csv import csv?`__ Why does one have to do ``from csv import csv``? Wouldn't it be more reasonable to just do some magic in __init__.py for the csv_ package to do this properly? Well, Skip Montanaro forwarded the question to the csv development list at csv@mail.mojam.com and said he probably will make the change in the near future. __ http://mail.python.org/pipermail/python-dev/2003-April/034409.html ..
_csv: http://www.python.org/dev/doc/devel/lib/module-csv.html `SF file uploads work now`__ Yes, hell must have frozen over since you can now upload a file when you start a new patch or bug report on SourceForge_. __ http://mail.python.org/pipermail/python-dev/2003-April/034416.html `Unicode`__ Splinter threads: - `OT: Signal/noise ratio <http://mail.python.org/pipermail/python-dev/2003-April/034462.html>`__ Once again another question on python-dev that is not appropriate for the list. But this one spawned questions of whether the mailing list should be renamed (answer: no, since it is fairly well-known what python-dev is for) or go back to having the list closed and requiring moderator approval for posts from people off the list (answer: no, because the amount of work was just too much of a pain and the amount of off-topic email has not been enough to justify the filtering work done previously). __ http://mail.python.org/pipermail/python-dev/2003-April/034453.html `Placement of os.fdopen functionality`__ It was suggested to make the fdopen function of the os_ module a class method of 'file'. That was determined to be YAGNI and thus won't happen. __ http://mail.python.org/pipermail/python-dev/2003-April/034380.html .. _os: http://www.python.org/dev/doc/devel/lib/os-newstreams.html `Adding item in front of a list`__ Tim Peters wonders how many people would be made upset if list.insert() supported a negative index argument. __ http://mail.python.org/pipermail/python-dev/2003-April/034518.html `Why is spawn*p* not available on Windows?`__ Shane Halloway might add one of the os.spawn*p*() functions to Windows. __ http://mail.python.org/pipermail/python-dev/2003-April/034473.html `tzset`__ time.tzset() is no longer available on Windows because it is broken (and I will behave and not make a joke about how it would be just as broken as the OS or anything because I am unbiased).
__ http://mail.python.org/pipermail/python-dev/2003-April/034480.html `backporting string changes to 2.2.3`__ Neal Norwitz updated docs and back-ported changes to the string_ module to bring it in sync with the actual string object. __ http://mail.python.org/pipermail/python-dev/2003-April/034489.html .. _string: http://www.python.org/dev/doc/devel/lib/module-string.html `List wisdom`__ http://www.python.org/cgi-bin/moinmoin/PythonDevWisdom is a wiki page created to contain the random nuggets of wisdom that come up on python-dev. __ http://mail.python.org/pipermail/python-dev/2003-April/034575.html `ValueErrors in range()`__ Fixed the error where range() raised ValueError when it should raise TypeError. __ http://mail.python.org/pipermail/python-dev/2003-April/034617.html `_socket efficiencies ideas`__ Marcus Mendenhall wanted to get a patch applied that would allow you to create a socket that could skip a DNS lookup. He also wanted to add the ability to include a '<numeric>' prefix to IP addresses to make sure that DNS lookup was skipped. Various ways of trying to cut back on time wasted on unneeded DNS lookups were discussed, but no solution was found acceptable. __ http://mail.python.org/pipermail/python-dev/2003-April/034403.html `tp_clear return value`__ tp_clear could stand to return void, but can't be changed because of backwards-compatibility. The docs will most likely end up saying to ignore the return value of whatever is put into tp_clear. __ http://mail.python.org/pipermail/python-dev/2003-April/034433.html `More socket questions`__ Someone suggested fixing something that had already been solved in Python 2.3. __ http://mail.python.org/pipermail/python-dev/2003-April/034472.html `Embedded python on Win2K, import failures`__ Someone had errors embedding Python on Windows. No real conclusion came out of it.
__ http://mail.python.org/pipermail/python-dev/2003-April/034506.html `More int/long integration issues`__ Splintered threads: - `range() as iterator <http://mail.python.org/pipermail/python-dev/2003-April/034530.html>`__ Before Python 3.0 (when xrange() will disappear), there is a good chance that the idiom ``for x in range(n): ...`` will be caught by the compiler and compiled into a lazy iterator (probably a generator). __ http://mail.python.org/pipermail/python-dev/2003-April/034516.html `Changes to gettext.py for Python 2.3`__ Barry Warsaw suggested some changes to gettext_ but none of them seemed to catch on. __ http://mail.python.org/pipermail/python-dev/2003-April/034511.html .. _gettext: `Evil setattr hack`__ Someone discovered how to set attributes on built-in types. Guido checked in code to prevent it. __ http://mail.python.org/pipermail/python-dev/2003-April/034535.html `Using temp files and the Internet in regression tests`__ I asked if it was okay to use temporary files in regression tests (answer: yes, and if you only need one, use test.test_support.TESTFN) or sockets (answer: yes, as long as test.test_support.is_resource_enabled("network") is True). It led to me being unofficially assigned the task of coming up with documentation for test_support and regrtest for both the library documentation and Lib/test/README. I also got CVS commit privileges on Python itself! I became an official Python developer! Woohoo! __ http://mail.python.org/pipermail/python-dev/2003-April/034538.html `migration away from SourceForge?`__ It was suggested we revisit the idea of moving Python development off of SourceForge_ because of the usual crappy CVS performance and underwhelming tracker performance. There is also the issue that problems with the setup cannot be fixed on our schedule. GForge_ and Roundup_ were both suggested as alternatives. Roundup specifically has gotten a decent amount of support since it is in Python and thus we could get things fixed quickly.
Trouble is that it is not polished enough yet and we would need to furnish our own CVS (but Ben Laurie might be gracious enough to help us out on that front with possible hosting at http://www.thebunker.net/ ). __ http://mail.python.org/pipermail/python-dev/2003-April/034540.html .. _GForge: http://gforge.org/ .. _Roundup: http://roundup.sf.net/ `How should time.strptime() handle UTC?`__ I asked if anyone thought time.strptime() should recognize UTC and GMT timezones by default for setting whether or not daylight savings was being used. No one has given their opinion yet (do you have one?). __ http://mail.python.org/pipermail/python-dev/2003-April/034543.html `Big trouble in CVS Python`__ CVS Python was crashing on the regression tests. It turned out to be from the reuse of a variable in the code that implements range(). Tim Peters offered a "Word to the wise: don't ever try to reuse a variable whose address is passed to PyArg_ParseTuple for anything other than holding what PyArg_ParseTuple does or doesn't store into it". __ http://mail.python.org/pipermail/python-dev/2003-April/034544.html `GIL vs thread state`__ It was discovered that the docs for PyThreadState_Clear() are incorrect (or at least not very clear). __ http://mail.python.org/pipermail/python-dev/2003-April/034574.html `test_pwd failing`__ test_pwd was failing and now it isn't. __ http://mail.python.org/pipermail/python-dev/2003-April/034626.html `lists v. tuples`__ You can tell whether a comparison function does a 3-way or a 2-way (using <) comparison, but that kind of sniffing is not Pythonic and thus won't be done; so don't expect to be able to pass either a 2-way or a 3-way comparison function to list.sort() and have the method figure out which type it got. __ http://mail.python.org/pipermail/python-dev/2003-April/034646.html `LynxOS 4 port`__ Duane Voth wants to get Python ported to LynxOS on PPC. __ http://mail.python.org/pipermail/python-dev/2003-April/034647.html `sre.c and sre_match()`__ The C code for the re module is not simple.
=) __ http://mail.python.org/pipermail/python-dev/2003-April/034653.html From thomas@xs4all.net Fri Apr 18 23:50:55 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sat, 19 Apr 2003 00:50:55 +0200 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) In-Reply-To: <20030418003431.GE9493@xs4all.nl> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> <20030418003431.GE9493@xs4all.nl> Message-ID: <20030418225055.GF9493@xs4all.nl> On Fri, Apr 18, 2003 at 02:34:31AM +0200, Thomas Wouters wrote: > On Fri, Apr 18, 2003 at 02:06:50AM +0200, Thomas Wouters wrote: > > Hm, wait, handling PyMethodDescrs may not be as tricky as I thought... > > hrm... I'll look at it tomorrow, it's time for bed. > I did a quick hack to the same effect, and it still came out a 1% loss (so > about 6% against the no-newstyle patch) in PyBench and a few timeit tests. > Sigh. I guess the non-method overhead is just too large, or there are more > almost-methods than I figured. I'll start work on a more lookup-saving > _PyObject_Generic_getmethod tomorrow or this weekend (and will probably do > _Py_instance_getmethod that way too, while I'm at it.) Okay, for those who care about this but aren't on Patches, I just uploaded a new CALL_ATTR patch, version 4. It's actually two separate versions (3 and 4): maintainable, and fast. See the SF patch comment for more details :) However, I spent most of tonight trying to clock the patch, only to come to the conclusion that benchmarks suck. Which I already knew :) PyBench did a reasonable job pointing me towards slowness, but the main slowdowns I see with PyBench I cannot reproduce with timeit.py. I think I stopped trusting PyBench when it reported the patch was 2% slower, but did so 5% faster -- consistently. So, if anyone has any *real* programs they can test the patch with, I would be much obliged. 
Otherwise we'll have to check it in claiming it gives a 20% performance boost on ... methods of newstyle classes ,,, and 30% on ... methods of old-style classes. :) The patch is here: http://www.python.org/sf/709744 Where-...-reads-"empty-no-argument"-and-,,,-reads-"that-use- -PyObject_GenericGetAttr"-in-very-very-small-letters'ly y'rs, -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mhammond@skippinet.com.au Sat Apr 19 03:34:37 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 19 Apr 2003 12:34:37 +1000 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBMEHAB.tim_one@email.msn.com> Message-ID: <001101c3061c$395b6dd0$530f8490@eden> [Tim] > Some questions occurred while reading the PEP again, > primarily are there any To save space, the answer to all your questions is "it is ok", and "it must avoid using the same handle - it must use its own". I have updated the PEP and the patch (primarily in the comments for the new functions) to hopefully clarify this. > > The only issue is the name of the API. > > If that's the only issue, check it in yesterday <0.9 wink>. OK, I'm gunna hold that against you <wink> Guido: > How about PyGILState_Ensure() and PyGILState_Restore()? Done! I have checked in a new pep-311, and a new patch (http://www.python.org/sf/684256). So if Guido can formally pronounce on pep-311, I will use those words against Tim and check it in! Thanks, Mark.
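[Editorial aside on the CALL_ATTR benchmarking above: the kind of timeit.py spot-check Thomas mentions can be sketched as follows. This is a hypothetical no-argument-method micro-benchmark, not the actual measurements from the thread; the class and method names are made up for illustration.]

```python
import timeit

# Hypothetical micro-benchmark in the spirit of the CALL_ATTR thread:
# time an empty, no-argument method call on a new-style class.
setup = """
class New(object):
    def meth(self):
        pass
obj = New()
"""

timer = timeit.Timer("obj.meth()", setup=setup)
# Best of three runs of 100,000 calls each, in seconds:
best = min(timer.repeat(repeat=3, number=100000))
print(best)
```

Taking the minimum over several repeats, rather than an average, is the usual way to damp the scheduler noise that made Thomas distrust single benchmark numbers.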
From python@rcn.com Sat Apr 19 05:14:07 2003 From: python@rcn.com (Raymond Hettinger) Date: Sat, 19 Apr 2003 00:14:07 -0400 Subject: [Python-Dev] CALL_ATTR patch (was: 2.3b1 release) References: <200304161552.h3GFqAQ10181@odiug.zope.com> <20030417152722.GA9493@xs4all.nl> <200304171553.h3HFr1023445@odiug.zope.com> <20030417205956.GC9493@xs4all.nl> <20030418000650.GD9493@xs4all.nl> <20030418003431.GE9493@xs4all.nl> <20030418225055.GF9493@xs4all.nl> Message-ID: <001501c3062a$1f07bde0$060ea044@oemcomputer> > So, if anyone has any *real* programs they can test the patch > with, I would be much obliged. Otherwise we'll have to check it in claiming > it gives a 20% performance boost on ... methods of newstyle classes ,,, and > 30% on ... methods of old-style classes. :) I tried it on some of my apps which moderately exercise both new and old style classes. None of the apps improved and one ran 1% worse. Both pybench and pystone were worse by 1%. Also, line 767 in classobject.c has an unreferenced variable, f. Raymond Hettinger From guido@python.org Sat Apr 19 14:19:48 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 09:19:48 -0400 Subject: [Python-Dev] Final PEP 311 run In-Reply-To: "Your message of Sat, 19 Apr 2003 12:34:37 +1000." <001101c3061c$395b6dd0$530f8490@eden> References: <001101c3061c$395b6dd0$530f8490@eden> Message-ID: <200304191319.h3JDJmF05210@pcp02138704pcs.reston01.va.comcast.net> > Guido: > > How about PyGILState_Ensure() and PyGILState_Restore()? > [Mark] > Done! I have checked in a new pep-311, and a new patch > (http://www.python.org/sf/684256). So if Guido can formally pronounce on > pep-311, I will use those words against Tim and check it in! OK, check it in, Mark!
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward@python.net Sat Apr 19 17:07:54 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 12:07:54 -0400 Subject: [Python-Dev] test_pwd failing In-Reply-To: <3E9C2828.4040803@livinglogic.de> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> Message-ID: <20030419160754.GA847@cthulhu.gerg.ca> On 15 April 2003, Walter Dörwald said: > Should the same change be done for the pwd module, i.e. > are duplicate gid's allowed in /etc/group? Yes. I got a test failure from test_grp the other night, but I didn't report it because I hadn't investigated it thoroughly yet. I'm guessing it's the same as the test_pwd failure... and yes, it stems from a duplicate GID in the /etc/group file on that system. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark to read. From gward@python.net Sat Apr 19 17:15:18 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 12:15:18 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <20030416145602.GA27447@localhost.distro.conectiva> References: <20030416145602.GA27447@localhost.distro.conectiva> Message-ID: <20030419161518.GB847@cthulhu.gerg.ca> On 16 April 2003, Gustavo Niemeyer said: > Is there any chance of getting shellwords[1] into Python 2.3? It's very > small module with a pretty interesting functionality: It's already there (and has been since Python 1.6), albeit with a different name and implementation:

    >>> import distutils.util
    >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'd like some JUNK FOOD ...
and then I want to be ALONE -- From barry@python.org Sat Apr 19 17:26:15 2003 From: barry@python.org (Barry Warsaw) Date: 19 Apr 2003 12:26:15 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <20030419161518.GB847@cthulhu.gerg.ca> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> Message-ID: <1050769575.29001.28.camel@anthem> On Sat, 2003-04-19 at 12:15, Greg Ward wrote: > On 16 April 2003, Gustavo Niemeyer said: > > Is there any chance of getting shellwords[1] into Python 2.3? It's very > > small module with a pretty interesting functionality: > > It's already there (and has been since Python 1.6), albeit with a > different name and implementation:

    > >>> import distutils.util
    > >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    > ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it makes sense to start promoting some of the more generally useful stuff up into library modules of their own? -Barry From mwh@python.net Sat Apr 19 18:04:20 2003 From: mwh@python.net (Michael Hudson) Date: Sat, 19 Apr 2003 18:04:20 +0100 Subject: [Python-Dev] shellwords In-Reply-To: <1050769575.29001.28.camel@anthem> (Barry Warsaw's message of "19 Apr 2003 12:26:15 -0400") References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> <1050769575.29001.28.camel@anthem> Message-ID: <2mlly6pgff.fsf@starship.python.net> Barry Warsaw <barry@python.org> writes: > On Sat, 2003-04-19 at 12:15, Greg Ward wrote: >> On 16 April 2003, Gustavo Niemeyer said: >> > Is there any chance of getting shellwords[1] into Python 2.3?
It's very >> > small module with a pretty interesting functionality: >> >> It's already there (and has been since Python 1.6), albeit with a >> different name and implementation:

    >> >>> import distutils.util
    >> >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"')
    >> ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg']

> > Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it > makes sense to start promoting some of the more generally useful stuff > up into library modules of their own? Yes. Particularly the file-manipulation stuff... shutil tends to lose somewhat x-platform. I probably first said this two or more years ago... still haven't done anything about it :-/ Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From aahz@pythoncraft.com Sat Apr 19 18:07:00 2003 From: aahz@pythoncraft.com (Aahz) Date: Sat, 19 Apr 2003 13:07:00 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030419170700.GA21744@panix.com> On Sat, Apr 12, 2003, Guido van Rossum wrote:
>
> Using the dictionary doesn't work either:
>
> >>> str.__dict__['reverse'] = reverse
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: object does not support item assignment
> >>>
>
> But here's a trick that *does* work:
>
> >>> object.__setattr__(str, 'reverse', reverse)
> >>>
>
> Proof that it worked:
>
> >>> "hello".reverse()
> 'olleh'
> >>>

This post inspired me to check the way new-style class instances work with properties. Running the following code will demonstrate that although the __setattr__ hack is blocked, you can still access the instance's dict. This can obviously be fixed by using __slots__, but that seems unwieldy.
Should we do anything?

class C(object):
    def _getx(self):
        print "getting x:", self._x
        return self._x
    def _setx(self, value):
        print "setting x with:", value
        self._x = value
    x = property(_getx, _setx)

a = C()
a.x = 1
a.x
object.__setattr__(a, 'x', 'foo')
a.__dict__['x'] = 'spam'
print a.__dict__['x']

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From guido@python.org Sat Apr 19 18:22:57 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 13:22:57 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of Sat, 19 Apr 2003 13:07:00 EDT." <20030419170700.GA21744@panix.com> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> Message-ID: <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> > This post inspired me to check the way new-style class instances work > with properties. Running the following code will demonstrate that > although the __setattr__ hack is blocked, you can still access the > instance's dict. This can obviously be fixed by using __slots__, but > that seems unwieldy. Should we do anything?
>
> class C(object):
>     def _getx(self):
>         print "getting x:", self._x
>         return self._x
>     def _setx(self, value):
>         print "setting x with:", value
>         self._x = value
>     x = property(_getx, _setx)
>
> a = C()
> a.x = 1
> a.x
> object.__setattr__(a, 'x', 'foo')
> a.__dict__['x'] = 'spam'
> print a.__dict__['x']

I see nothing wrong with that. It falls in the category "don't do that", but I don't see why we should try to make it impossible. The thing with attributes of built-in types was different. This can affect multiple interpreters, which is evil. It also is too attractive to expect people not to use it if it works (since many people *think* they have a need to modify built-in types). That's why I go to extra lengths to make it impossible, not just hard.
--Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 19 20:26:18 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 19 Apr 2003 12:26:18 -0700 (PDT) Subject: [Python-Dev] test_pwd failing In-Reply-To: <20030419160754.GA847@cthulhu.gerg.ca> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> <20030419160754.GA847@cthulhu.gerg.ca> Message-ID: <Pine.SOL.4.55.0304191225130.19123@death.OCF.Berkeley.EDU> [Greg Ward] > On 15 April 2003, Walter D=F6rwald said: > > Should the same change be done for the pwd module, i.e. > > are duplicate gid's allowed in /etc/group? > > Yes. I got a test failure from test_grp the other night, but I didn't > report it because I hadn't investigated it thoroughly yet. I'm guessing > it's the same as the test_pwd failure... and yes, it stems from a > duplicate GID in the /etc/group file on that system. > I got it, too. Also got a test_getargs2 failure. Haven't looked into it thoroughly yet, though, especially since I don't know the status of the new arg codes. -Brett From theller@python.net Sat Apr 19 21:10:14 2003 From: theller@python.net (Thomas Heller) Date: 19 Apr 2003 22:10:14 +0200 Subject: [Python-Dev] Evil setattr hack In-Reply-To: <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <4r4uuu3d.fsf@python.net> Guido van Rossum <guido@python.org> writes: > The thing with attributes of built-in types was different. This can > affect multiple interpreters, which is evil. You seem to care about multiple interpreters in the same process. Any chance to move the frozen modules pointer PyImport_FrozenModules to a interpreter private variable (part of the PyInterpreterState)? 
Thomas From gward@python.net Sat Apr 19 22:31:23 2003 From: gward@python.net (Greg Ward) Date: Sat, 19 Apr 2003 17:31:23 -0400 Subject: [Python-Dev] shellwords In-Reply-To: <1050769575.29001.28.camel@anthem> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> <1050769575.29001.28.camel@anthem> Message-ID: <20030419213123.GA681@cthulhu.gerg.ca> On 19 April 2003, Barry Warsaw said: > Distutils has a lot of neat (undocumented <wink>) stuff! I wonder if it > makes sense to start promoting some of the more generally useful stuff > up into library modules of their own? Probably. All the generally-useful stuff is documented in clear, concise docstrings, so any enterprising hacker could take this on. I still don't have enough round tuits to look at the Distutils again. (Let's see ... my distutils-sig folder has 937 unread messages right now... sigh...) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Question authority! From aleax@aleax.it Sat Apr 19 22:43:48 2003 From: aleax@aleax.it (Alex Martelli) Date: Sat, 19 Apr 2003 23:43:48 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") Message-ID: <200304192343.48211.aleax@aleax.it> Sorry to distract python-dev's august collective attention from its usual exalted concerns down to a mundane issue;-), but... we may be able to strike a tiny blow for simplicity, clarity, power, AND performance at once... For the Nth time, today somebody asked in c.l.py about how best to sum a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), others reduce(int.__add__,L), others reduce(operator.add,L), etc, and some (me included) a simple total = 0 for x in L: total = total + x The usual performance measurements were unchained (easier than ever thanks to timeit.py of course;-), and the paladins of reduce were once again dismayed by the fact that the best reduce can do (that best is obtained with operator.add) is mediocre (e.g.
on my box with L=range(999), reduce takes 330 usec, and the simple for loop takes 247). Discussion proceeded on whether "reduce(operator.add, L)" was abstruse for most people, or not, and on whether the loop was or wasn't "too low level", as the Pythonic approach to such a common task. It then struck me that Python doesn't HAVE "one single obvious way" to do what IS after all a rather common task in everyday programming, namely, "sum up this bunch of things" (typically numbers, occasionally strings -- and when they're strings the "obvious" loop above is terribly slow, a typical newbie trap...). Somebody proposed having operator.add take any number of arguments -- not quite satisfactory, AND dog-slow, it turned out to be (when I tried a quick experimental mod to operator.c), due to the need to turn a sequence (typically a list) into a tuple with *. Now, I think the obvious approach would be to have a function sum, callable with any non-empty homogeneous sequence (sequence of items such that + can apply between them), returning the sequence's summation -- now THAT might help for simplicity, clarity AND power. So, I tried coding that up -- just 40 lines of C... it runs twice as fast as the plain loop, for summing up range(999), and just as fast as ''.join for summing up map(str, range(999)) [for the simple reason that I special-case this -- when the first item is a PyBaseString_Type, I delegate to ''.join]. Discussing this with newbie-to-moderately experienced Pythonistas, the uniform reaction was on the order of "you mean Python doesn't HAVE a sum function already?!" -- most everybody seemed to feel that such a function WOULD be "the obvious way to do it" and that it should definitely be there. So -- by this time I'm biased, having invested a bit of time in this -- what do y'all think... any interest in this? Should I submit it? 
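A rough pure-Python rendering of the proposed behaviour (Alex's actual implementation is about 40 lines of C; the name sum_sketch and the details below are reconstructed guesses written in later-Python syntax, not his code):

```python
# Hypothetical pure-Python sketch of the proposed sum(): sums any
# non-empty homogeneous sequence, and delegates to ''.join when the
# first item is a string, mirroring the special case described above.
def sum_sketch(seq):
    it = iter(seq)
    first = next(it)              # the proposal assumes a non-empty sequence
    if isinstance(first, str):
        return first + ''.join(it)
    total = first
    for item in it:
        total = total + item
    return total
```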
I'm not quite sure where it should go -- a builtin seems most natural (to keep company with min and max, for example), but maybe that would be too ambitious, and it should be in math or operator instead... Alex From agthorr@barsoom.org Sat Apr 19 23:41:11 2003 From: agthorr@barsoom.org (Agthorr) Date: Sat, 19 Apr 2003 15:41:11 -0700 Subject: [Python-Dev] heapq Message-ID: <20030419224110.GB2460@barsoom.org> Hello, I'm new to this list, so I will begin by introducing myself. I'm a graduate student at the University of Oregon working towards my PhD. My primary area of research is in peer-to-peer networks. I use Python for a variety of purposes such as constructing web pages, rapid prototyping, and building test frameworks. I have been a Python user for at least two years. I must confess that I have not lurked on the list much before making this post. I did search back in the list though, so in theory I won't be bringing up a rehashed topic... Recently, I had need of a heap in Python. I didn't see one in the 2.2 distribution, so I went and implemented one. Afterwards, I wondered if this might be useful to others, so I decided to investigate if any work had been done to add a heap to Python's standard library. Lo and behold, in CVS there is a module called "heapq". I compared my implementation with heapq, and I see some important differences. I'm not going to unilaterally state that mine is better, but I thought it would be worthwhile to raise the differences in this forum, so that an informed decision is made about The Best Way To Do Things. Hopefully, it will not be too terribly controversial :) The algorithms used are more or less identical; I'm primarily concerned with the differences in interface. As written, heapq provides some functions to maintain the heap priority on a Python list. By contrast, I implemented the heap as an opaque class that maintains a list internally.
By creating this layer of abstraction, it is possible to completely change the heap implementation later, without worrying about affecting user programs. For example, it would be possible to switch to using Fibonacci Heaps or the Pairing Heaps mentioned by Tim Peters in this message: http://mail.python.org/pipermail/python-dev/2002-August/027531.html Another key difference is that my implementation supports the decrease_key() operation that is important for algorithms such as Dijkstra's. This requires a little extra bookkeeping, but it's just a small constant factor ;) For the API, my insert() function returns an opaque key that can later be used as a parameter to the adjust_key() function. For those who like looking at source code, my implementation is here: http://www.cs.uoregon.edu/~agthorr/heap.py -- Dan Stutzbach From niemeyer@conectiva.com Sat Apr 19 23:51:08 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sat, 19 Apr 2003 19:51:08 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <20030419161518.GB847@cthulhu.gerg.ca> References: <20030416145602.GA27447@localhost.distro.conectiva> <20030419161518.GB847@cthulhu.gerg.ca> Message-ID: <20030419225108.GA2469@localhost.distro.conectiva> > It's already there (and has been since Python 1.6), albeit with a > different name and implementation: > > >>> import distutils.util > >>> distutils.util.split_quoted('arg "arg arg" arg "arg" -o="arg arg"') > ['arg', 'arg arg', 'arg', 'arg', '-o=arg arg'] I wasn't aware of it. While it should be enough for most uses, it's still not POSIX-compliant. Single and double quotes are treated the same way (single quotes shouldn't allow escaping), and escaping is done differently (r'"\""' results in r'\"' instead of '"', and r'"\\"' results in r'\\' instead of r'\', for example). As others have said, it'd be nice to have these utilities somewhere outside distutils.
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From Jack.Jansen@oratrix.com Sun Apr 20 00:03:40 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sun, 20 Apr 2003 01:03:40 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> Message-ID: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> On zaterdag, apr 19, 2003, at 23:43 Europe/Amsterdam, Alex Martelli wrote: > For the Nth time, today somebody asked in c.l.py about how best to sum > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and > some > (me included) a simple > total = 0 > for x in L: > total = total + x > > The usual performance measurements were unchained (easier than ever > thanks to timeit.py of course;-), and the paladins of reduce were once > again > dismayed by the fact that the best reduce can do (that best is > obtained with > operator.add) is mediocre (e.g. on my box with L=range(999), reduce > takes > 330 usec, and the simple for loop takes 247). > [...] > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. > > So, I tried coding that up -- just 40 lines of C... it runs twice as > fast > as the plain loop, for summing up range(999), and just as fast as > ''.join > for summing up map(str, range(999)) [for the simple reason that I > special-case this -- when the first item is a PyBaseString_Type, I > delegate to ''.join]. Do you have any idea why your sum function is, uhm, three times faster than the reduce(operator.add) version? Is the implementation of reduce doing something silly, or are there shortcuts you can take that reduce() can't? 
I'm asking because I think I would prefer reduce to give the speed you want. That way, we won't have people come asking for a prod() function to match sum(), etc. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From eppstein@ics.uci.edu Sun Apr 20 00:06:19 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sat, 19 Apr 2003 16:06:19 -0700 Subject: [Python-Dev] Re: heapq References: <20030419224110.GB2460@barsoom.org> Message-ID: <eppstein-34C3B0.16061719042003@main.gmane.org> In article <20030419224110.GB2460@barsoom.org>, Agthorr <agthorr@barsoom.org> wrote: > The algorithms used are more or less identical, I'm primarily > concerned with the differences in interface. It seems relevant to point out my own experiment with an interface to priority queue data structures, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117228 The algorithm itself is an uninteresting binary heap with lazy deletion, I am interested here more in the API. My feeling is that "queue" is the wrong metaphor to use for a heap, since it maintains not just a sequence of objects (as in a queue) but a more general mapping of objects to priorities. In many algorithms (e.g. Dijkstra), you want to be able to change these priorities, not just add and remove items the way you would in a queue. So, anyway, I called it a "priority dictionary" and gave it a dictionary-like API: pd[item] = priority adds a new item to the heap with the given priority, or updates the priority of an existing item, no need for a separate decrease_key method as you suggest. There is an additional method for finding the highest-priority item since that's not a normal dictionary operation. I also implemented an iterator method that repeatedly finds and removes the highest priority item, so that "for item in priorityDictionary" loops through the items in priority order. 
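A compressed sketch of that dictionary-style API (the lazy-deletion bookkeeping below is reconstructed for illustration, not copied from the ASPN recipe):

```python
import heapq

# Sketch of a "priority dictionary": d[item] = priority inserts or
# updates an item, smallest() finds the best one.  Heap entries whose
# priority no longer matches the dict are discarded lazily.
class PriorityDict(dict):
    def __init__(self):
        dict.__init__(self)
        self._heap = []

    def __setitem__(self, item, priority):
        dict.__setitem__(self, item, priority)
        heapq.heappush(self._heap, (priority, item))

    def smallest(self):
        priority, item = self._heap[0]
        while item not in self or self[item] != priority:
            heapq.heappop(self._heap)        # stale entry, drop it
            priority, item = self._heap[0]
        return item

pd = PriorityDict()
pd['walk'] = 3
pd['run'] = 1
pd['run'] = 5        # update: the old (1, 'run') heap entry goes stale
```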
Maybe it would have been better to give this method a different name, though, since it's quite different from the usual not-very-useful dictionary iterator. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science From mwh@python.net Sun Apr 20 00:10:48 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 00:10:48 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> (Jack Jansen's message of "Sun, 20 Apr 2003 01:03:40 +0200") References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> Message-ID: <2mademozgn.fsf@starship.python.net> Jack Jansen <Jack.Jansen@oratrix.com> writes: > Do you have any idea why your sum function is, uhm, three times > faster than the reduce(operator.add) version? Is the implementation > of reduce doing something silly, or are there shortcuts you can take > that reduce() can't? I imagine it's the function calls; a trip through the call machinery, time packing and unpacking arguments, etc. I haven't checked, though. > I'm asking because I think I would prefer reduce to give the speed > you want. That way, we won't have people come asking for a prod() > function to match sum(), etc. I can't think of one. I'm not sure this is worth the effort, though. Cheers, M. -- Any form of evilness that can be detected without *too* much effort is worth it... I have no idea what kind of evil we're looking for here or how to detect is, so I can't answer yes or no. 
-- Guido Van Rossum, python-dev From jack@performancedrivers.com Sun Apr 20 00:57:20 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sat, 19 Apr 2003 19:57:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <2mademozgn.fsf@starship.python.net>; from mwh@python.net on Sun, Apr 20, 2003 at 12:10:48AM +0100 References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com> <2mademozgn.fsf@starship.python.net> Message-ID: <20030419195720.D1553@localhost.localdomain> On Sun, Apr 20, 2003 at 12:10:48AM +0100, Michael Hudson wrote: > Jack Jansen <Jack.Jansen@oratrix.com> writes: > > > Do you have any idea why your sum function is, uhm, three times > > faster than the reduce(operator.add) version? Is the implementation > > of reduce doing something silly, or are there shortcuts you can take > > that reduce() can't? > > I imagine it's the function calls; a trip through the call machinery, > time packing and unpacking arguments, etc. I haven't checked, though. Browsing through bltinmodule.c (was 'builtinmodule.c' too long?) it is mainly the overhead of calling PyEval_CallObject lots of times, which would include parsing args, etc. It tries to avoid creating the argument tuple more than once by checking the refcount on every loop (I would think the tuple would generally be unpacked by the receiving function, but better safe than sorry). > > I'm asking because I think I would prefer reduce to give the speed > > you want. That way, we won't have people come asking for a prod() > > function to match sum(), etc. > I think reduce/filter/map could be improved by checking if their operative function is in builtin or operator modules and calling the C directly. operator.c is just a litany of macros. This would add tiny one-time overhead for non builtins/operators. Don't mistake my comments as volunteering ;) I'm still plodding through _sre.c (Did you know re's are used in setup.py?
break _sre.c and you can't compile the source without using two trees, fun!) -jackdied From drifty@alum.berkeley.edu Sun Apr 20 01:01:21 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sat, 19 Apr 2003 17:01:21 -0700 (PDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> [Alex Martelli] > For the Nth time, today somebody asked in c.l.py about how best to sum > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and some > (me included) a simple > total = 0 > for x in L: > total = total + x <snip> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. > <snip> > Discussing this with newbie-to-moderately experienced Pythonistas, > the uniform reaction was on the order of "you mean Python doesn't > HAVE a sum function already?!" -- most everybody seemed to feel > that such a function WOULD be "the obvious way to do it" and that > it should definitely be there. > So I have no fundamental issue with the proposed function, but I don't find a huge need for it personally; I always do the looping solution (jaded against the functional stuff from school =). I do see how it could be useful, though. I don't necessarily see this as a built-in (although it wouldn't kill me if it became one). I don't see it going into either the math or operator modules since it doesn't quite fit what is already there. 
I initially thought itertools since it is basically working on an iterator, but I don't know if we want to change itertools from a module the provides functionality for outputting special iterators compared to working with iterators. And as for the argument that other people are shocked it isn't already there... I just don't agree with that. Just because people want it does not mean it is a good solution to a problem. Tyranny of the majority and such. =) So I am currently +0 on having the function, -0 on sticking it in math or operator, +0 on built-in. And now I go back to PHP grunt work, wishing I was actually writing docs for test_support and regrtest instead (and that says something about what I am having to work on). -Brett From guido@python.org Sun Apr 20 01:40:39 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 19 Apr 2003 20:40:39 -0400 Subject: [Python-Dev] Evil setattr hack In-Reply-To: "Your message of 19 Apr 2003 22:10:14 +0200." <4r4uuu3d.fsf@python.net> References: <200304121343.h3CDhqU01887@pcp02138704pcs.reston01.va.comcast.net> <20030419170700.GA21744@panix.com> <200304191722.h3JHMvh05538@pcp02138704pcs.reston01.va.comcast.net> <4r4uuu3d.fsf@python.net> Message-ID: <200304200040.h3K0edh10106@pcp02138704pcs.reston01.va.comcast.net> > You seem to care about multiple interpreters in the same process. > Any chance to move the frozen modules pointer PyImport_FrozenModules > to a interpreter private variable (part of the PyInterpreterState)? Why would you want that? Since it is just statically initialized data, I don't see the point. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Sun Apr 20 02:46:22 2003 From: tim.one@comcast.net (Tim Peters) Date: Sat, 19 Apr 2003 21:46:22 -0400 Subject: [Python-Dev] New re failures on Windows Message-ID: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Sorry, I can't make time for this. 
test_re is failing today: """ C:\Code\python\PCbuild>python ../lib/test/test_re.py Running tests on re.search and re.match Running tests on re.sub Running tests on symbolic references Running tests on re.subn Running tests on re.split Running tests on re.findall Running tests on re.match Running tests on re.escape Pickling a RegexObject instance Test engine limitations maximum recursion limit exceeded Running re_tests test suite === grouping error ('^((a)c)?(ab)$', 'ab', 0, 'g1+"-"+g2+"-"+g3', 'None-None-ab' ) 'None-a-ab' should be 'None-None-ab' """ test_sre is dying with a segfault: """ C:\Code\python\PCbuild>python ../lib/test/test_sre.py Running tests on character literals Running tests on sre.search and sre.match sre.match(r'(a)?a','a').lastindex FAILED expected None got result 1 sre.match(r'(a)(b)?b','ab').lastindex FAILED expected 1 got result 2 sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED expected 'a' got result 'b' Running tests on sre.sub Running tests on symbolic references Running tests on sre.subn Running tests on sre.split Running tests on sre.findall Running tests on sre.finditer Running tests on sre.match Running tests on sre.escape Running tests on sre.Scanner Pickling a SRE_Pattern instance Test engine limitations """ and it dies with a segfault there. Unfortunately, test_sre doesn't die in a debug build. From niemeyer@conectiva.com Sun Apr 20 03:27:23 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sat, 19 Apr 2003 23:27:23 -0300 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Message-ID: <20030420022723.GA5905@localhost.distro.conectiva> I have backed out some changes introduced in _sre.c:2.84 so that its behavior was compliant with the old behavior. More information in bug #672491. [...] 
> Running re_tests test suite > === grouping error ('^((a)c)?(ab)$', 'ab', 0, 'g1+"-"+g2+"-"+g3', > 'None-None-ab' > ) 'None-a-ab' should be 'None-None-ab' > """ Hummm.. my changes shouldn't affect this. I'll check that out as well. > test_sre is dying with a segfault: This shouldn't happen with my changes either. I've just backed out some changes, returning to the original code. > """ > C:\Code\python\PCbuild>python ../lib/test/test_sre.py > Running tests on character literals > Running tests on sre.search and sre.match > sre.match(r'(a)?a','a').lastindex FAILED > expected None > got result 1 > sre.match(r'(a)(b)?b','ab').lastindex FAILED > expected 1 > got result 2 > sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED > expected 'a' > got result 'b' These were the tests I've implemented when the patch was introduced. Unfortunately, the documentation wasn't clear about the expected behavior, and it was implemented wrongly the first time. Now I backed out the changes, and it returned to the original behavior. OTOH, it looks like part of the original problem is still there. I'll work on it. Greg Chapman also has some ideas about this in the patch #712900. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From python@rcn.com Sun Apr 20 06:59:03 2003 From: python@rcn.com (Raymond Hettinger) Date: Sun, 20 Apr 2003 01:59:03 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> Message-ID: <004101c30701$f2bde9c0$0a11a044@oemcomputer> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. +1 -- this comes-up all the time. 
> I'm not > quite sure where it should go -- a builtin seems most natural (to keep > company with min and max, for example), but maybe that would be > too ambitious, and it should be in math or operator instead... __builtin__ is already too fat. math is for floats. operator is mostly for operators. Perhaps make a separate module for vector-to-scalar operations like min, max, product, average, moment, and dotproduct. Raymond Hettinger From uche.ogbuji@fourthought.com Sun Apr 20 04:51:38 2003 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Sat, 19 Apr 2003 21:51:38 -0600 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: Message from Alex Martelli <aleax@aleax.it> of "Sat, 19 Apr 2003 23:43:48 +0200." <200304192343.48211.aleax@aleax.it> Message-ID: <E1975rf-0002kf-00@borgia.local> > Now, I think the obvious approach would be to have a function sum, > callable with any non-empty homogeneous sequence (sequence of > items such that + can apply between them), returning the sequence's > summation -- now THAT might help for simplicity, clarity AND power. +1. I agree that this is a natural additon to min() and max(), and a common enough case to clarify and optimize. > I'm not > quite sure where it should go -- a builtin seems most natural (to keep > company with min and max, for example), but maybe that would be > too ambitious, and it should be in math or operator instead... +1 on builtins, but I'd be OK with math or op as well. -- Uche Ogbuji Fourthought, Inc. 
http://uche.ogbuji.net http://4Suite.org http://fourthought.com Gems From the [Python/XML] Archives - http://www.xml.com/pub/a/2003/04/09/py-xml.html From prabhu@aero.iitm.ernet.in Sun Apr 20 07:09:23 2003 From: prabhu@aero.iitm.ernet.in (Prabhu Ramachandran) Date: Sun, 20 Apr 2003 11:39:23 +0530 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304192343.48211.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> Message-ID: <16034.14739.83749.505815@monster.linux.in> >>>>> "AM" == Alex Martelli <aleax@aleax.it> writes: [summing a sequence] AM> Now, I think the obvious approach would be to have a function AM> sum, callable with any non-empty homogeneous sequence AM> (sequence of items such that + can apply between them), AM> returning the sequence's summation -- now THAT might help for AM> simplicity, clarity AND power. FWIW, Numeric provides a sum function that mostly does what you want: >>> from Numeric import * >>> sum(range(999)) 498501 >>> sum(['a', 'b', 'c']) 'abc' # this one produces a slightly surprising result >>> sum(['aaa', 'b', 'c']) array([abc , a , a ],'O') # but is easily explained in the context of multi-dimensional arrays. Anyway, my point is most Numeric users are already comfortable with the idea of a sum function. However, as someone already said, if you argue that sum is necessary, what about product (which again Numeric provides along with a host of other useful functions)? cheers, prabhu From aleax@aleax.it Sun Apr 20 07:29:52 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:29:52 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <16034.14739.83749.505815@monster.linux.in> References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> Message-ID: <200304200829.52477.aleax@aleax.it> On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote: ...
> Anyway, my point is most Numeric users are already comfortable with > the idea of a sum function. However, as someone already said, if you Oh yes, Numeric.sum is excellent, by all means. But I think sum is quite helpful even for programs not using Numeric. > argue that sum is necessary, what about product (which again Numeric > provides along with a host of other useful functions)? In the context of Numeric use, it's quite appropriate to have sum, prod, and the other ufuncs' reduce AND accumulate methods. In everyday programming in other fields, the demand for the functionality given by sum is FAR higher than that given by prod. For example, googling on c.l.py shows 165 posts mentioning "reduce(operator.add" versus 39 mentioning "reduce(operator.mul". This reflects the need of typical computations -- indeed, even the English language shows indications about the prevalence of summing as a bulk operation. In everyday life, we often have to sum a set of numbers of varying cardinality -- we even have the word "total" to indicate the result of this operation. We rarely have to multiply such a set of numbers -- most multiplications we do involve two, at most three numbers, while every time we check a restaurant bill or other itemized bill we're summing up a varying number of numbers, for example. I think that, in this case, practicality beats purity, and we should have a sum function somewhere in Python's standard library (or builtins, though as someone mentioned they ARE quite fat already), leaving reduce for all other, less frequently used cases of bulk operations. 
Alex From aleax@aleax.it Sun Apr 20 07:38:20 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:38:20 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <004101c30701$f2bde9c0$0a11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> Message-ID: <200304200838.20191.aleax@aleax.it> On Sunday 20 April 2003 07:59 am, Raymond Hettinger wrote: > > Now, I think the obvious approach would be to have a function sum, > > callable with any non-empty homogeneous sequence (sequence of > > items such that + can apply between them), returning the sequence's > > summation -- now THAT might help for simplicity, clarity AND power. > > +1 -- this comes-up all the time. Yes, I agree it does -- both in discussions (c.l.py, python-help -- dunno 'bout tutor, as I'm not following it) AND in practical use. > > I'm not > > quite sure where it should go -- a builtin seems most natural (to keep > > company with min and max, for example), but maybe that would be > > too ambitious, and it should be in math or operator instead... > > __builtin__ is already too fat. math is for floats. operator is mostly > for operators. Perhaps make a separate module for vector-to-scalar > operations like min, max, product, average, moment, and dotproduct. __builtin__ has 123 entries. ls Lib/*.py | wc finds 183 toplevel modules (without even mentioning those modules that are already grouped into packages). So, making new modules should be roughly as much of a "fatness" problem as adding new builtins, at least, shouldn't it? min and max are already built-ins. Computing average(x) as sum(x)/len(x) does not seem too much of a problem. product, moment and dotproduct appear to be "nice to have" rather than real needs. True, math deals only with float stuff. 
But operator doesn't seem too bad -- sure, it mostly exposes stuff that's already elsewhere in the internals (operators AND others, such as countOf), but that could be considered an implementation detail. Alex From aleax@aleax.it Sun Apr 20 07:52:29 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 08:52:29 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> References: <200304192343.48211.aleax@aleax.it> <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> Message-ID: <200304200852.29672.aleax@aleax.it> On Sunday 20 April 2003 02:01 am, Brett Cannon wrote: ... > So I have no fundamental issue with the proposed function, but I don't > find a huge need for it personally; I always do the looping solution > (jaded against the functional stuff from school =). Looping is what I'm doing these days, but while fastest it's not terribly convenient. And it took me a while to learn to avoid reduce for that... > I do see how it could be useful, though. I don't necessarily see this as > a built-in (although it wouldn't kill me if it became one). I don't see > it going into either the math or operator modules since it doesn't quite > fit what is already there. I initially thought itertools since it is > basically working on an iterator, but I don't know if we want to change > itertools from a module the provides functionality for outputting special > iterators compared to working with iterators. Agreed on collocation -- itertools or math would be inappropriate, and builtins best, but since there are already so many builtins many are understandably reacting badly to the idea of adding anything there. So, if builtins are to be considered untouchable, I'd rather have sum in operator (where it does sort of fit, I think) than do without it. > And as for the argument that other people are shocked it isn't already > there... I just don't agree with that. 
> Just because people want it does not mean it is a good solution to a
> problem. Tyranny of the majority and such. =)

I must have expressed myself badly -- sorry. What I meant to illustrate is that sum (particularly as a built-in) would feel perfectly natural to typical Python beginners -- it would instantly become "the one obvious way" to deal with the common task of "sum these several numbers", as well as the slightly less common one of "concatenate these many strings" [many still balk at ''.join(manystrings); sum(manystrings) as I coded it delegates to ''.join, so it's almost equally fast] and the like.

So, let's see if I can express this more clearly...:

It's not a question of tyranny of anybody -- it's a question of the degree of abstraction required to find "reduce(operator.add, L)" the ``one obvious way'' to sum numbers being quite a bit above everyday thought habits. If we say that "the one obvious way" is a loop, it becomes hard to justify why the one obvious way to find a maximum is max(L) rather than a perfectly similar loop -- after all, "sum these numbers" and "find the largest one of these numbers" are tasks with perfectly comparable frequency of applicability in everyday programming tasks and perceived complexity. (My implementation of sum is a small copy-paste-edit job on that of max/min, removing the special case the latter have when called with >1 argument and adding one to delegate to ''.join for the specific purpose of summing instances of PyBaseString_Type -- the structure is really very similar.)

Alex

From aleax@aleax.it Sun Apr 20 08:16:18 2003
From: aleax@aleax.it (Alex Martelli)
Date: Sun, 20 Apr 2003 09:16:18 +0200
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com>
References: <2962C557-72BB-11D7-9743-000A27B19B96@oratrix.com>
Message-ID: <200304200916.18283.aleax@aleax.it>

On Sunday 20 April 2003 01:03 am, Jack Jansen wrote: ...
> > For the Nth time, today somebody asked in c.l.py about how best to sum > > a list of numbers. As usual, many suggested reduce(lambda x,y:x+y, L), > > others reduce(int.__add__,L), others reduce(operator.add,L), etc, and ... > > Now, I think the obvious approach would be to have a function sum, > > callable with any non-empty homogeneous sequence (sequence of ... > Do you have any idea why your sum function is, uhm, three times faster > than the reduce(operator.add) version? Is the implementation of reduce > doing something silly, or are there shortcuts you can take that reduce() > can't? I see this has already been answered. The only important shortcut in the sum I coded is to delegate to ''.join if it turns out the first item is an instance of PyBaseString_Type -- this way we get excellent performance for "concatenate up this bunch of strings" in a way that would surely be rather problematic for a function as general as reduce (the latter would need to specialcase on its function argument, singling out the special case in which it's operator.add for optimization). > I'm asking because I think I would prefer reduce to give the speed you > want. > That way, we won't have people come asking for a prod() function to > match sum(), etc. I see I must have expressed myself badly, and I apologize. Raw speed in summing up many numbers is NOT the #1 motivation for my proposal. Whether it takes 100+ microseconds, or 300+ microseconds, to sum up a thousand integers (with O(N) scaling in either case), is not all that crucial. I think the importance of speed here is mainly *psychological* -- an issue of "marketing" at some level, if you will. What I'm mostly after is to have "one obvious way" to sum up a bunch of numbers. 
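To make the behaviour under discussion concrete, here is a rough pure-Python sketch of the proposed function (hypothetical illustration only -- Alex's actual patch is C code modeled on the min/max implementation; the trailing underscore in the name just avoids shadowing anything):

```python
def sum_(seq):
    # Hypothetical pure-Python sketch of the proposed sum() builtin,
    # including the shortcut that delegates string summation to ''.join.
    it = iter(seq)
    try:
        total = next(it)
    except StopIteration:
        # Mirror min()/max(), which reject an empty sequence.
        raise ValueError("sum of an empty sequence")
    if isinstance(total, str):
        # ''.join is linear time; repeated '+' on strings is quadratic.
        return total + ''.join(it)
    for item in it:
        total = total + item
    return total
```

With this sketch, summing numbers, strings, or even lists all fall out of the single definition, since + does the right thing for each type.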
I think a HOF such as reduce would not be "the one obvious way" to most people, and having to code an explicit loop maintaining the total has its own problems -- some people object that it's too low-level (so not obvious enough for THEM), and it also leads the beginner right into a performance trap when what's being summed is strings rather than numbers. If the one obvious way to sum (concatenate) a bunch of strings is ''.join(bunch), how can we say that, when the task is summing a bunch of numbers instead, the one obvious way becomes a HOF or an explicit loop? Having sum(bunch) would give the "one obvious way", and a speedup of 2 or 3 times wrt looping or using reduce would psychologically help make it "THE obvious way", I think.

I must also have been unclear on why I think sum is important in a way that prod and other reduce operations aren't: summing a bunch of numbers is *quite a common task* in many kinds of everyday programming (and in fact everyday life) in a way that multiplying a bunch of numbers (much less any other such bulk operation) just isn't. "prod" isn't even an English word (well it IS, but not in the meaning of "product":-) and when people talk about "the product" they're hardly ever talking about multiplication, while "the sum" or also commonly "the total" are words that do indicate the result of repeatedly applying addition (even when you say "the sum" to indicate an amount of money, the history of that word does come from addition -- addition of the values of the coins and notes of varying denominations making up "the sum" -- while "the product" as normally used in English has nothing to do with multiplying).

I think I understand the worry that introducing 'sum' would be the start of a slippery slope leading to requests for 'prod' (I can't think of other bulk operations that would be at all popular -- perhaps bulk and/or, but I think that's stretching it). But I think it's a misplaced worry in this case.
"Adding up a bunch of numbers" is just SO much more common than "Multiplying them up" (indeed the latter's hardly idiomatic English, while "adding up" sure is), that I believe normal users (as opposed to advanced programmers with a keenness on generalization) wouldn't have any problem at all with 'sum' being there and 'prod' missing... Alex From prabhu@aero.iitm.ernet.in Sun Apr 20 08:51:13 2003 From: prabhu@aero.iitm.ernet.in (Prabhu Ramachandran) Date: Sun, 20 Apr 2003 13:21:13 +0530 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304200829.52477.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it> Message-ID: <16034.20849.214752.656523@monster.linux.in> >>>>> "AM" == Alex Martelli <aleax@aleax.it> writes: >> argue that sum is necessary, what about product (which again >> Numeric provides along with a host of other useful functions)? AM> In the context of Numeric use, it's quite appropriate to have AM> sum, prod, and the other ufuncs' reduce AND accumulate AM> methods. In everyday programming in other fields, the demand AM> for the functionality given by sum is FAR higher than that AM> given by prod. For example, googling on c.l.py shows 165 AM> posts mentioning "reduce(operator.add" versus 39 mentioning AM> "reduce(operator.mul". This reflects the need of typical AM> computations -- indeed, even the English language shows AM> indications about the prevalence of summing as a bulk AM> operation. I agree that sum will be used far more than product. I can't remember when I needed to use product myself! Anyway here are the arguments I've seen so far. Pros: 1. One obvious, fairly efficient, and easy way to sum sequences. 2. Google and experience suggest sum is used more often than product and other functions. 3. Easy on newbies? 4. Will hopefully prevent the N+1'th thread on how to sum lists on c.l.py. Cons: 1. __builtins__ is already fat. 
   Will one more function make that much difference?
2. Will future requests be made for product and friends?
3. Why not simply speed up reduce/operator.add and train more people to use that?

cheers,
prabhu

From niemeyer@conectiva.com Sun Apr 20 08:54:53 2003
From: niemeyer@conectiva.com (Gustavo Niemeyer)
Date: Sun, 20 Apr 2003 04:54:53 -0300
Subject: [Python-Dev] New re failures on Windows
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net>
References: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net>
Message-ID: <20030420075453.GA9504@localhost.distro.conectiva>

> test_re is failing today: [...]

Should be working now. Sorry about the trouble. I should have fixed that before submitting the first version.

-- 
Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

From fincher.8@osu.edu Sun Apr 20 09:56:46 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 04:56:46 -0400
Subject: [Python-Dev] heapq
In-Reply-To: <20030419224110.GB2460@barsoom.org>
References: <20030419224110.GB2460@barsoom.org>
Message-ID: <200304200456.52084.fincher.8@osu.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I must agree with Agthorr that the interface to heapq could use some work. I wrote a simple wrapper around the heapq functions to give myself a heapq object (with methods like push, pop, etc.). Here's that object:

    from heapq import heapify, heappush, heapreplace, _siftup

    class heap(list):
        __slots__ = ()

        def __init__(self, seq):
            list.__init__(self, seq)
            heapify(self)

        def pop(self):
            lastelt = list.pop(self)
            if self:
                returnitem = self[0]
                self[0] = lastelt
                _siftup(self, 0)
            else:
                returnitem = lastelt
            return returnitem

        replace = heapreplace
        push = heappush

Is there any possibility of such an interface going into the heapq module? I find it much cleaner and easier to read than the "functions operating on sequences" interface heapq currently offers. I've got unit tests for the object written, if it is something that will possibly go into the standard library.
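For comparison, the bare functions-on-lists interface that such a wrapper would hide looks like the following (standard heapq usage, shown here only to contrast with the object style):

```python
import heapq

data = [5, 1, 4, 2, 3]
heapq.heapify(data)              # rearrange the list, in place, into heap order
assert data[0] == 1              # the smallest element always sits at index 0
heapq.heappush(data, 0)          # push maintains the heap invariant
assert heapq.heappop(data) == 0  # pop returns the current minimum
assert heapq.heappop(data) == 1
```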
Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+omDOqkDiu+Bs+JIRAu0wAJ9xpx+7nH0fNiZzJhl34tWUbHN4HgCfZx9G 38IgV4lSY6adYyLEufWG6mk= =NVlh -----END PGP SIGNATURE----- From aleax@aleax.it Sun Apr 20 09:01:12 2003 From: aleax@aleax.it (Alex Martelli) Date: Sun, 20 Apr 2003 10:01:12 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <16034.20849.214752.656523@monster.linux.in> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <16034.20849.214752.656523@monster.linux.in> Message-ID: <200304201001.12643.aleax@aleax.it> On Sunday 20 April 2003 09:51 am, Prabhu Ramachandran wrote: VERY good summary -- repeated here: > Pros: > 1. One obvious, fairly efficient, and easy way to sum sequences. > 2. Google and experience suggest sum is used more often than > product and other functions. > 3. Easy on newbies? > 4. Will hopefully prevent the N+1'th thread on how to sum lists on > c.l.py. > > Cons: > 1. __builtins__ is already fat. Will one more function make that > much difference? > 2. Will future requests be made for product and friends? > 3. Why not simply speed up reduce/operator.add and train more people > to use that? I think Pro #2 answers Con #2, and dittos for #3's (in addition to some implementation issues with speeding up reduce that way). But anyway, yes, these _are_ the considerations made pro & con on this thread. Anyway, whence now -- a PEP? (Seems a bit too small for that). Or, do I just submit the patch (to where -- builtins?) and let Guido pronounce? 
Alex From drifty@alum.berkeley.edu Sun Apr 20 09:07:54 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 20 Apr 2003 01:07:54 -0700 (PDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304200852.29672.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <Pine.SOL.4.55.0304191648590.9716@death.OCF.Berkeley.EDU> <200304200852.29672.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304200102340.27731@death.OCF.Berkeley.EDU> [Alex Martelli] > On Sunday 20 April 2003 02:01 am, Brett Cannon wrote: > ... > > So I have no fundamental issue with the proposed function, but I don't > > find a huge need for it personally; I always do the looping solution > > (jaded against the functional stuff from school =). > > Looping is what I'm doing these days, but while fastest it's not terribly > convenient. And it took me a while to learn to avoid reduce for that... > True, but "explicit is better than implicit". But don't take this to mean that I don't think that your proposed function is not good; I do think it has merit. <snip> > Agreed on collocation -- itertools or math would be inappropriate, and > builtins best, but since there are already so many builtins many are > understandably reacting badly to the idea of adding anything there. > So, if builtins are to be considered untouchable, I'd rather have sum > in operator (where it does sort of fit, I think) than do without it. > Fair enough. Either that or a new module. <snip> > So, let's see if I can express this more clearly...: > > It's not a question of tyranny of anybody -- it's a question of the degree > of abstraction required to find "reduce(operator.add, L)" the ``one > obvious way'' to sum numbers being quite a bit above everyday thought > habits. 
> If we say that "the one obvious way" is a loop it becomes hard
> to justify why the one obvious way to find a maximum is max(L) rather
> than a perfectly similar loop -- after all "sum these numbers" and "find
> the largest one of these numbers" are tasks with perfectly comparable
> frequency of applicability in everyday programming tasks and perceived
> complexity. (My implementation for sum is a small copy-paste-edit job
> on that of max/min, removing the special-case the latter have when
> called with >1 argument and adding one to delegate to ''.join for the
> specific purpose of summing instances of PyBaseString_Type -- the
> structure is really very similar).

That's better. =) Comes off less as "let's add this to make newcomers happy" and more as "this will simplify good code". The latter is always a good thing.

I have an idea to respond to the whole "everyone will want prod() next" idea, but I will put that in another email to try to keep this thread coherent.

-Brett

From fincher.8@osu.edu Sun Apr 20 10:16:01 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 05:16:01 -0400
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <20030419224110.GB2460@barsoom.org>
References: <20030419224110.GB2460@barsoom.org>
Message-ID: <200304200516.02382.fincher.8@osu.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

2.3 seems to focus somewhat on adding a wider variety of data structures to Python -- well, sets and heapq, at least :) One thing I've found lacking, though, is a nice O(1) FIFO queue -- even the standard Queue module uses a list as a queue underneath, which means the dequeue operation is O(N) in the size of the queue. I'm curious what the possibility of getting a queue module (which would probably have to be named "fifo", since Queue is already taken and some operating systems use case-insensitive filesystems) added to the standard library would be.
If it is a possibility, I have a pure-Python implementation using the mechanism described at <http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=a23cjl%24dps%241%40serv1.iunet.it>. The module itself is at <http://www.cis.ohio-state.edu/fifo.py>; the tests are at <http://www.cis.ohio-state.edu/test_fifo.py>.

Jeremy

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (FreeBSD)

iD8DBQE+omVRqkDiu+Bs+JIRAilOAKCWe7CfZqyBboi/zGZ5jHxnKSiS5ACfTBEt
D2Hz+k7dzXTW3HjXByzlA2M=
=juHN
-----END PGP SIGNATURE-----

From drifty@alum.berkeley.edu Sun Apr 20 09:18:08 2003
From: drifty@alum.berkeley.edu (Brett Cannon)
Date: Sun, 20 Apr 2003 01:18:08 -0700 (PDT)
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304200829.52477.aleax@aleax.it>
References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it>
Message-ID: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>

[Alex Martelli]
> On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote:
<snip> ...
> In the context of Numeric use, it's quite appropriate to have sum, prod,
> and the other ufuncs' reduce AND accumulate methods. In everyday
> programming in other fields, the demand for the functionality given by
> sum is FAR higher than that given by prod.
<snip>

I think part of the trouble here is the name. The word "sum" just automatically causes one to think math. This leads to thinking of multiplication, division, and subtraction. But Alex's proposed function does more than a summation by special-casing the concatenation of strings.

Perhaps renaming it to something like "combine()" would help do away with the worry of people wanting a complementary version for multiplication, since it does more than just sum numbers; it also combines strings in a very efficient manner. I mean, we could extend this to all built-in types where there is a reasonable operation for them (but this is jumping the gun).
And as for the worry about this being a built-in, we do have divmod, for goodness' sake. I mean, divmod() is nothing more than glorified division and remainder for the sake of clean code; ``divmod(3,2) == (3/2, 3%2)``. This function serves the same purpose in the end: to allow for cleaner code, with some improved performance, for a function that people use on a regular enough basis to ask for it constantly on c.l.py.

-Brett

From noah@noah.org Sun Apr 20 09:20:57 2003
From: noah@noah.org (Noah Spurrier)
Date: Sun, 20 Apr 2003 01:20:57 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
Message-ID: <3EA25869.6070404@noah.org>

Hello,

Recently I realized that there is no easy way to walk a directory tree and rename each directory and file. The standard os.path.walk() function does a breadth first walk. This makes it hard to write scripts that modify directory names as they walk the tree, because you need to visit subdirectories before you rename their parents. What is really needed is a depth first walk. For example, this naive code would not work with a breadth first walk:

    """Renames all directories and files to lower case."""
    import os.path

    def visit(arg, dirname, names):
        for name in names:
            print os.path.join(dirname, name)
            oldname = os.path.join(dirname, name)
            newname = os.path.join(dirname, name.lower())
            os.rename(oldname, newname)

    os.path.walk('.', visit, None)

The library source posixpath.py defines os.path.walk on my system. A comment in that file mentions that the visit function may modify the filenames list to impose a different order of visiting, but this is not possible as far as I can tell. Perhaps future versions of Python could include an option to do a depth first walk instead of the default breadth first. Modifying os.path.walk() to allow for optional depth first walking is simple. I have attached a patch to posixpath.py that demonstrates this. It adds an if conditional at the beginning and end of the walk() function.
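The same idea can also be sketched as a standalone helper, independent of the posixpath patch (hypothetical illustration; modern os.path spellings rather than posixpath internals):

```python
import os

def walk_depthfirst(top, func, arg):
    # Hypothetical sketch of a depth-first (post-order) walk:
    # subdirectories are visited before their parent, so func can
    # safely rename directories without invalidating unvisited paths.
    try:
        names = os.listdir(top)
    except OSError:
        return
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            walk_depthfirst(path, func, arg)
    func(arg, top, names)  # parent is visited last
```

With this ordering, the lower-casing visit() callback above works, because by the time a directory itself is renamed, everything inside it has already been handled.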
I have not checked to see if other platforms share the posixpath.py implementation of the walk() function, but if there is interest then I'd be happy to cross-reference this.

Yours,
Noah

*** posixpath.py	2003-04-19 22:26:08.000000000 -0700
--- posixpath_walk_depthfirst.py	2003-04-19 22:12:48.000000000 -0700
***************
*** 259,265 ****
  # The func may modify the filenames list, to implement a filter,
  # or to impose a different order of visiting.
  
! def walk(top, func, arg):
      """Directory tree walk with callback function.
  
      For each directory in the directory tree rooted at top (including top
--- 259,265 ----
  # The func may modify the filenames list, to implement a filter,
  # or to impose a different order of visiting.
  
! def walk(top, func, arg, depthfirst=False):
      """Directory tree walk with callback function.
  
      For each directory in the directory tree rooted at top (including top
***************
*** 272,284 ****
      order of visiting. No semantics are defined for, or required of, arg,
      beyond that arg is always passed to func. It can be used, e.g., to pass
      a filename pattern, or a mutable object designed to accumulate
!     statistics. Passing None for arg is common."""
      try:
          names = os.listdir(top)
      except os.error:
          return
!     func(arg, top, names)
      for name in names:
          name = join(top, name)
          try:
--- 272,287 ----
      order of visiting. No semantics are defined for, or required of, arg,
      beyond that arg is always passed to func. It can be used, e.g., to pass
      a filename pattern, or a mutable object designed to accumulate
!     statistics. Passing None for arg is common. The optional depthfirst
!     argument may be set to True to walk the directory tree depth first.
!     The default is False (walk breadth first)."""
      try:
          names = os.listdir(top)
      except os.error:
          return
!     if not depthfirst:
!         func(arg, top, names)
      for name in names:
          name = join(top, name)
          try:
***************
*** 287,293 ****
              continue
          if stat.S_ISDIR(st.st_mode):
              walk(name, func, arg)
! 
  # Expand paths beginning with '~' or '~user'.
  # '~' means $HOME; '~user' means that user's home directory.
--- 290,297 ----
              continue
          if stat.S_ISDIR(st.st_mode):
              walk(name, func, arg)
!     if depthfirst:
!         func(arg, top, names)
  
  # Expand paths beginning with '~' or '~user'.
  # '~' means $HOME; '~user' means that user's home directory.
***************
*** 416,420 ****
      return filename
  
  supports_unicode_filenames = False
- 
- 
--- 420,422 ----

From oren-py-l@hishome.net Sun Apr 20 09:37:18 2003
From: oren-py-l@hishome.net (Oren Tirosh)
Date: Sun, 20 Apr 2003 04:37:18 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
References: <200304192343.48211.aleax@aleax.it> <16034.14739.83749.505815@monster.linux.in> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <20030420083718.GA65548@hishome.net>

On Sun, Apr 20, 2003 at 01:18:08AM -0700, Brett Cannon wrote:
> [Alex Martelli]
> > On Sunday 20 April 2003 08:09 am, Prabhu Ramachandran wrote:
> <snip> ...
> > In the context of Numeric use, it's quite appropriate to have sum, prod,
> > and the other ufuncs' reduce AND accumulate methods. In everyday
> > programming in other fields, the demand for the functionality given by
> > sum is FAR higher than that given by prod.
> <snip>
>
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.

The special case is just a performance optimization. Without it the sum function would still return the same result. The sum function should work for any object that defines a + operator. I agree that the name 'sum' isn't 100% intuitive for use with strings, but I can't think of any name that would be really natural for both.
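The "any object that defines a + operator" point is easy to demonstrate with a user-defined type (hypothetical illustration, using reduce(operator.add, ...) as a stand-in for the proposed builtin; in 2003 reduce was itself a builtin):

```python
from functools import reduce
import operator

class Vec:
    # Hypothetical two-component vector: any type defining __add__
    # sums naturally, with no special-casing needed.
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

total = reduce(operator.add, [Vec(1, 2), Vec(3, 4), Vec(5, 6)])
assert (total.x, total.y) == (9, 12)
```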
Oren

From ping@zesty.ca Sun Apr 20 09:42:26 2003
From: ping@zesty.ca (Ka-Ping Yee)
Date: Sun, 20 Apr 2003 03:42:26 -0500 (CDT)
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org>

On Sun, 20 Apr 2003, Brett Cannon wrote:
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.
>
> Perhaps renaming it to something like "combine()" would help do away with
> the worry of people wanting a complementary version for multiplication
> since it does more than just sum numbers; it also combines strings in a
> very efficient manner.

Why not simply call it "add()", if it's going to be in the built-ins? That seems like the most straightforward and accurate name. It would have the same argument spec as min() and max(): it accepts a single list argument, or multiple arguments to be added together. Thus, no serious confusion with operator.add -- builtin add() would work anywhere that operator.add works now.

    >>> help(add)
    add(...)
        add(sequence) -> value
        add(a, b, c, ...) -> value

        With a single sequence argument, add together all the elements.
        With two or more arguments, add together all the arguments.

New question: what is add([])? If add() is really polymorphic, then this should probably raise an exception (just like min() and max() do). That would lead to idioms such as

    add(numberlist + [0])
    add(stringlist + [''])

I suppose those don't look too bad. Nothing vastly better springs to mind.
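The empty-sequence question is exactly what an explicit start value answers; with reduce as a hypothetical stand-in for the proposed add() (functools.reduce in modern Python, a builtin in 2003):

```python
from functools import reduce
import operator

# An explicit third argument seeds reduce, so the empty case is
# well defined instead of raising the way min()/max() do.
assert reduce(operator.add, [], 0) == 0
assert reduce(operator.add, [1, 2, 3], 0) == 6
assert reduce(operator.add, [], '') == ''
assert reduce(operator.add, ['spam', 'eggs'], '') == 'spameggs'
```

The `list + [seed]` idioms above achieve the same effect by making the sequence non-empty before summing it.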
-- ?!ng

From aleax@aleax.it Sun Apr 20 11:10:07 2003
From: aleax@aleax.it (Alex Martelli)
Date: Sun, 20 Apr 2003 12:10:07 +0200
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU>
Message-ID: <200304201210.07054.aleax@aleax.it>

On Sunday 20 April 2003 10:18 am, Brett Cannon wrote: ...
> I think part of the trouble here is the name. The word "sum" just
> automatically causes one to think math. This leads to thinking of
> multiplication, division, and subtraction. But Alex's proposed function
> does more than a summation by special-casing the concatenation of
> strings.

Actually it does more than summation because, in Python, + does more than summation. E.g., my function needs absolutely no special-casing whatsoever to produce

    >>> sum([[1,2],[3,4],[5,6]])
    [1, 2, 3, 4, 5, 6]

I special-cased the "sum of strings" just for performance issues, nothing more!

> Perhaps renaming it to something like "combine()" would help do away with
> the worry of people wanting a complementary version for multiplication
> since it does more than just sum numbers; it also combines strings in a
> very efficient manner. I mean we could extend this to all built-in types
> where there is a reasonable operation for them (but this is jumping the
> gun).

sum already works on all types, built-in or not, for which + and operator.add work -- thus, 'combine' sounds too vague to me, and the natural way to "extend this" to any other type would be to have that type support + (by defining __add__ or in the equivalent C-coded way).

> And as for the worry about this being a built-in, we do have divmod for
> goodness sakes. I mean divmod() is nothing more than glorifying division
> and remainder for the sake of clean code; ``divmod(3,2) == (3/2, 3%2)``.
> This function serves the same purpose in the end; to allow for cleaner
> code with some improved performance for a function that people use on a
> regular enough basis to ask for it constantly on c.l.py.

I think this is a very good point. The worries come from the fact that we already have many built-ins (44 functions at last count -- counting such things as exception classes doesn't seem sensible) -- but is it a good idea to exclude 'sum' because we have more exotic built-ins such as 'divmod', or semi-obsolete ones such as 'apply'? I've only seen Raymond objecting to 'sum' as a built-in (naming it 'add' might be just as fine, and having it accept the same argument patterns as max/min probably useful) -- though I may have missed other voices speaking to this issue -- so perhaps it's best if he clarifies his objection.

Alex

From andrew@acooke.org Sun Apr 20 11:53:41 2003
From: andrew@acooke.org (andrew cooke)
Date: Sun, 20 Apr 2003 06:53:41 -0400 (CLT)
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <200304200516.02382.fincher.8@osu.edu>
References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu>
Message-ID: <34897.127.0.0.1.1050836021.squirrel@127.0.0.1>

hi,

i haven't looked at the code, but when you mention lists are you referring to standard python structures? i understood that the thing in python that looks like a list is actually an array (a simple one, not a vlist), so access to indexed elements is done in constant time.

however, that doesn't necessarily alter your argument, as using an array for a fifo queue in a naive manner is going to cause problems too (unless the implementation explicitly implements a circular buffer, say, or the array implementation is clever enough to drop leading elements which contain nulls - note that circular buffers are a bit tricky to extend in size). see eg http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52246

(the links you gave don't work for me.)
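One standard way to get amortized O(1) at both ends with plain Python lists (a hypothetical sketch, not necessarily the mechanism in Jeremy's module) is the two-list trick: list.append and list.pop from the tail are the cheap operations, so push onto one list and pop from the reversed other.

```python
class Fifo:
    # Amortized-O(1) FIFO built from two lists: enqueue appends to one
    # list; dequeue pops from the reversed other, refilling it only
    # when it runs dry. Each element is moved at most once.
    def __init__(self):
        self._back = []   # newest items, in arrival order
        self._front = []  # oldest items, in reversed order

    def enqueue(self, item):
        self._back.append(item)

    def dequeue(self):
        if not self._front:
            if not self._back:
                raise IndexError("dequeue from empty FIFO")
            self._back.reverse()
            self._front, self._back = self._back, []
        return self._front.pop()
```

This sidesteps both the O(N) pop-from-the-front of a naive list queue and the resizing awkwardness of circular buffers mentioned above.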
cheers,
andrew

Jeremy Fincher said:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> 2.3 seems to focus somewhat on adding a wider variety of data structures to
> Python -- well, sets and heapq, at least :) One thing I've found lacking,
> though, is a nice O(1) FIFO queue -- even the standard Queue module
> underlying uses a list as a queue, which means the dequeue operation is O(N)
> in the size of the queue. I'm curious what the possiblity of getting a queue
> module (which would probably have to be named "fifo", since Queue is already
> taken and some operating systems use case-insensitive filesystems) added to
> the standard library would be.
>
> If it is a possibility, I have a pure-Python implementation using the
> mechanism described at
> <http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=a23cjl%24dps%241%40serv1.iunet.it>.
> The module itself is at <http://www.cis.ohio-state.edu/fifo.py>; the tests
> are at <http://www.cis.ohio-state.edu/test_fifo.py>.
>
> Jeremy
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (FreeBSD)
>
> iD8DBQE+omVRqkDiu+Bs+JIRAilOAKCWe7CfZqyBboi/zGZ5jHxnKSiS5ACfTBEt
> D2Hz+k7dzXTW3HjXByzlA2M=
> =juHN
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- 
http://www.acooke.org/andrew

From skip@mojam.com Sun Apr 20 13:00:32 2003
From: skip@mojam.com (Skip Montanaro)
Date: Sun, 20 Apr 2003 07:00:32 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200304201200.h3KC0WE16154@manatee.mojam.com>

Bug/Patch Summary
-----------------

396 open / 3541 total bugs (+13)
130 open / 2092 total patches (-4)

New Bugs
--------

profile.run makes assumption regarding namespace (2003-04-06)
    http://python.org/sf/716587
"build_ext" "libraries" subcommand not split into values (2003-04-07)
    http://python.org/sf/716634
Uthread problem - Pipe left open (2003-04-08)
    http://python.org/sf/717614
inspect, class instances and __getattr__ (2003-04-09)
    http://python.org/sf/718532
sys.path on MacOSX (2003-04-10)
    http://python.org/sf/719297
pimp needs to do download and untar itself (2003-04-10)
    http://python.org/sf/719300
Icon on applets is wrong (2003-04-10)
    http://python.org/sf/719303
string exceptions are deprecated (2003-04-10)
    http://python.org/sf/719367
Mac OS X painless compilation (2003-04-11)
    http://python.org/sf/719549
tokenize module w/ coding cookie (2003-04-11)
    http://python.org/sf/719888
datetime types don't work as bases (2003-04-13)
    http://python.org/sf/720908
Building lib.pdf fails on MacOSX (2003-04-14)
    http://python.org/sf/721157
Acrobat Reader 5 compatibility (2003-04-14)
    http://python.org/sf/721160
tarfile gets filenames wrong (2003-04-15)
    http://python.org/sf/721871
_winreg doesn't handle NULL bytes in value names (2003-04-16)
    http://python.org/sf/722413
weakref: proxy_print and proxy_repr incons. (2003-04-16)
    http://python.org/sf/722763
Put a reference to print in the Library Reference, please.
(2003-04-17)
    http://python.org/sf/723136
PyThreadState_Clear() docs incorrect (2003-04-17)
    http://python.org/sf/723205
add timeout support in socket using modules (2003-04-17)
    http://python.org/sf/723287
runtime_library_dirs broken under OS X (2003-04-17)
    http://python.org/sf/723495
__slots__ broken in 2.3a with ("__dict__", ) (2003-04-18)
    http://python.org/sf/723540
app-building with Bundlebuilder for framework builds (2003-04-18)
    http://python.org/sf/723562
logging.setLoggerClass() doesn't support new-style classes (2003-04-18)
    http://python.org/sf/723801
overintelligent slice() behavior on integers (2003-04-18)
    http://python.org/sf/723806
urlopen(url_to_a_non-existing-domain) raises gaierror (2003-04-18)
    http://python.org/sf/723831
imaplib should convert line endings to be rfc2822 complient (2003-04-18)
    http://python.org/sf/723962

New Patches
-----------

PEP 269 Implementation (2002-08-23)
    http://python.org/sf/599331
has_function() method for CCompiler (2003-04-07)
    http://python.org/sf/717152
allow timeit to see your globals() (2003-04-08)
    http://python.org/sf/717575
Patch to distutils doc for metadata explanation (2003-04-09)
    http://python.org/sf/718027
DESTDIR variable patch (2003-04-09)
    http://python.org/sf/718286
fix test_long failure on OSF/1 (2003-04-10)
    http://python.org/sf/719359
Remove __file__ after running $PYTHONSTARTUP (2003-04-11)
    http://python.org/sf/719777
proposed patch for posixpath.py: getctime() (2003-04-12)
    http://python.org/sf/720188
Patch to make shlex accept escaped quotes in strings. (2003-04-12)
    http://python.org/sf/720329
iconv_codec 3rd generation (2003-04-13)
    http://python.org/sf/720585
Some bug fixes for regular ex code. (2003-04-14)
    http://python.org/sf/720991
Add copyrange method to array.
(2003-04-14) http://python.org/sf/721061 Remote debugging with pdb.py (2003-04-14) http://python.org/sf/721464 Better output for unittest (2003-04-16) http://python.org/sf/722638 PyArg_ParseTuple problem with 'L' format (2003-04-17) http://python.org/sf/723201 __del__ in dumbdbm fails under some circumstances (2003-04-17) http://python.org/sf/723231 ability to pass a timeout to underlying socket (2003-04-17) http://python.org/sf/723312 terminal type option subnegotiation in telnetlib (2003-04-17) http://python.org/sf/723364 Backport of recent sre fixes. (2003-04-18) http://python.org/sf/723940 Closed Bugs ----------- urllib needs 303/307 handlers (2002-06-12) http://python.org/sf/568068 Support for masks in getargs.c (2002-08-14) http://python.org/sf/595026 Cannot compile escaped unicode character (2002-09-20) http://python.org/sf/612074 Numerous defunct threads left behind (2002-10-10) http://python.org/sf/621548 no docs for HTMLParser.handle_pi (2002-12-27) http://python.org/sf/659188 2.3a1 computes lastindex incorrectly (2003-01-22) http://python.org/sf/672491 re.LOCALE, umlaut and \w (2003-02-21) http://python.org/sf/690974 gensuitemodule overhaul (2003-03-04) http://python.org/sf/697179 string.strip implementation/doc mismatch (2003-03-04) http://python.org/sf/697220 builtin type inconsistency (2003-03-07) http://python.org/sf/699312 Obscure error message (2003-03-08) http://python.org/sf/699934 gensuitemodule needs to be documented (2003-03-29) http://python.org/sf/711986 test_zipimport failing on ia64 (at least) (2003-03-30) http://python.org/sf/712322 Cannot change the class of a list (2003-03-31) http://python.org/sf/712975 Closed Patches -------------- SimpleXMLRPCServer auto-docing subclass (2002-03-29) http://python.org/sf/536883 optionally make shelve less surprising (2002-05-07) http://python.org/sf/553171 GC: untrack simple objects (2002-05-21) http://python.org/sf/558745 gettext module charset changes (2002-06-13) http://python.org/sf/568669 Shadow 
Password Support Module (2002-07-09) http://python.org/sf/579435 Add popen2 like functionality to pty.py. (2002-08-03) http://python.org/sf/590513 Refactoring of difflib.Differ (2002-08-27) http://python.org/sf/600984 Punycode encoding (2002-11-02) http://python.org/sf/632643 refactoring and documenting ModuleFinder (2002-11-25) http://python.org/sf/643711 Complementary patch for OpenVMS (2002-12-07) http://python.org/sf/649997 659188: no docs for HTMLParser (2003-01-04) http://python.org/sf/662464 xmlrpclib: better string encoding in responce package (2003-02-03) http://python.org/sf/679383 Tiny patch for bug 612074: sre unicode escapes (2003-02-05) http://python.org/sf/681152 AutoThreadState implementation (2003-02-10) http://python.org/sf/684256 optparse OptionGroup docs (2003-03-05) http://python.org/sf/697941 docs for hotshot module (2003-03-05) http://python.org/sf/698505 time.tzset standards compliance update (2003-03-19) http://python.org/sf/706707 Allow range() to return long integer values (2003-03-21) http://python.org/sf/707427 remove -static option from cygwinccompiler (2003-03-24) http://python.org/sf/709178 new test_urllib and patch for found urllib bug (2003-03-27) http://python.org/sf/711002 Removing unnecessary lock operations (2003-03-29) http://python.org/sf/711835 iconv_codec NG (2003-04-02) http://python.org/sf/713820 Unicode Codecs for CJK Encodings (2003-04-02) http://python.org/sf/713824 Guard against segfaults in debug code (2003-04-02) http://python.org/sf/714348 Document freeze process in PC/config.c (2003-04-03) http://python.org/sf/714957 From lists@morpheus.demon.co.uk Sun Apr 20 14:39:25 2003 From: lists@morpheus.demon.co.uk (Paul Moore) Date: Sun, 20 Apr 2003 14:39:25 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <16034.20849.214752.656523@monster.linux.in> <200304201001.12643.aleax@aleax.it> Message-ID: 
<n2m-g.he8tffua.fsf@morpheus.demon.co.uk> Alex Martelli <aleax@aleax.it> writes: > Anyway, whence now -- a PEP? (Seems a bit too small for that). Or, do > I just submit the patch (to where -- builtins?) and let Guido pronounce? Not that I have much to say, but I'd say submit a patch, and either assign it to Guido for pronouncement, or just wait for his view. You may hit the "no new features for 2.3" rule, and have to wait for 2.4 - personally, I think it's small enough for that not to matter, but Guido's been pretty strict with that one so far... Paul. -- This signature intentionally left blank From aahz@pythoncraft.com Sun Apr 20 15:00:22 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 10:00:22 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <004101c30701$f2bde9c0$0a11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> Message-ID: <20030420140022.GA6462@panix.com> On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > __builtin__ is already too fat. math is for floats. operator is mostly > for operators. Perhaps make a separate module for vector-to-scalar > operations like min, max, product, average, moment, and dotproduct. Call it "statistics". Yes, I've seen the comments about using add()/sum() for strings, but I think numeric usage will be by far the most common. I also think that max() and min() should be removed from the builtins. Having a good, simple statistics library standard would be a Good Thing. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
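[For concreteness: the sum() being debated in this thread is essentially a one-line alias for the reduce() idiom that keeps recurring. A minimal sketch follows -- the name sum_ is made up to avoid clashing with the eventual builtin, and note that reduce() was a builtin in the Python of 2003 but now lives in functools:]

```python
import operator
from functools import reduce  # a builtin in the Python of this thread

def sum_(seq, start=0):
    # The proposed function, spelled as the reduce() idiom from the thread.
    # The explicit start value is what makes sum_([]) well-defined.
    return reduce(operator.add, seq, start)

print(sum_([1, 2, 3]))           # the common numeric case -> 6
print(sum_(["py", "thon"], ""))  # the disputed string case -> 'python'
```

[Whether the string case should work at all is exactly the point of contention above; the reduce() spelling is agnostic about it.]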
From jack@performancedrivers.com Sun Apr 20 15:22:46 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:22:46 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030420140022.GA6462@panix.com>; from aahz@pythoncraft.com on Sun, Apr 20, 2003 at 10:00:22AM -0400 References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> <20030420140022.GA6462@panix.com> Message-ID: <20030420102245.A15881@localhost.localdomain> On Sun, Apr 20, 2003 at 10:00:22AM -0400, Aahz wrote: > On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > > > __builtin__ is already too fat. math is for floats. operator is mostly > > for operators. Perhaps make a separate module for vector-to-scalar > > operations like min, max, product, average, moment, and dotproduct. > > Call it "statistics". Yes, I've seen the comments about using add()/sum() > for strings, but I think numeric usage will be by far the most common. > I also think that max() and min() should be removed from the builtins. > Having a good, simple statistics library standard would be a Good Thing. Would operations performed on sets go in there too, like combinatorics[1] that are also frequently golfed on c.l.py? I'm also not sure that add() means '+' or 'plus' to everyday people. I read strvar += 'foo' as concatenate or 'plus' at a stretch but not 'add'. -jack [1] my 'probstat' module does these in C for lists/tuples. 
probstat.sf.net From jack@performancedrivers.com Sun Apr 20 15:29:04 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:29:04 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org>; from ping@zesty.ca on Sun, Apr 20, 2003 at 03:42:26AM -0500 References: <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <Pine.LNX.4.33.0304200330420.1715-100000@server1.lfw.org> Message-ID: <20030420102904.B15881@localhost.localdomain> On Sun, Apr 20, 2003 at 03:42:26AM -0500, Ka-Ping Yee wrote: > New question: what is add([])? If add() is really polymorphic, then > this should probably raise an exception (just like min() and max() do). > That would lead to idioms such as
>
>     add(numberlist + [0])
>     add(stringlist + [''])
>
> I suppose those don't look too bad. Nothing vastly better springs > to mind. For a large numberlist this is a problem, it causes a copy of the whole list. Not to mention it looks like a perl coercion hack. The third argument to reduce is there to avoid the hack. so now we have

    from newmodule import add
    answer = add(numberlist, 0)

why don't we just write it as

    from operator import add
    answer = reduce(add, numberlist, 0)

-jack From andrew-pydev@lexical.org.uk Sun Apr 20 15:38:17 2003 From: andrew-pydev@lexical.org.uk (Andrew Walkingshaw) Date: Sun, 20 Apr 2003 15:38:17 +0100 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030420140022.GA6462@panix.com> References: <200304192343.48211.aleax@aleax.it> <004101c30701$f2bde9c0$0a11a044@oemcomputer> <20030420140022.GA6462@panix.com> Message-ID: <20030420143817.GA43283@colon.colondot.net> On Sun, Apr 20, 2003 at 10:00:22AM -0400, Aahz wrote: > On Sun, Apr 20, 2003, Raymond Hettinger wrote: > > > > __builtin__ is already too fat. math is for floats. operator is mostly > > for operators.
Perhaps make a separate module for vector-to-scalar > > operations like min, max, product, average, moment, and dotproduct. > > Call it "statistics". Yes, I've seen the comments about using add()/sum() > for strings, but I think numeric usage will be by far the most common. A lightweight vector class would be very useful; it's something I've had to roll my own of for a lot of scientific code I'm writing (the problem being that it's often impractical to build Numeric everywhere, so you can't rely on having it whereas you probably can rely on at least having Python.) A good example is in processing of output from solid-state physics codes (a subject very close to my heart); you want vectors to store (eg) positions of and forces on atoms, but you don't need the performance of Numeric - and the distribution overhead of same. As such, this is something I've got lying around; I'd be more than willing to distribute this (~100 line) class to whoever wants it under whatever license they care for. It should be easily extensible to do whatever else people want in this regard, as well. 
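[A class of the kind described needs only a handful of special methods. A minimal sketch -- this is illustrative only, not Andrew's actual ~100-line class, and all names here are invented:]

```python
class Vec3:
    """Minimal lightweight 3-vector for positions/forces -- a sketch of
    the sort of class described above, not the author's actual code."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = float(x), float(y), float(z)

    def __add__(self, other):
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, k):  # scalar multiplication only
        return Vec3(self.x * k, self.y * k, self.z * k)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def __repr__(self):
        return "Vec3(%g, %g, %g)" % (self.x, self.y, self.z)
```

[Nothing here needs Numeric; plain attribute arithmetic is fast enough when you are reading a few hundred atoms out of a calculation's output.]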
- Andrew -- email: andrew@lexical.org.uk http://www.lexical.org.uk/ Earth Sciences, University of Cambridge http://www.esc.cam.ac.uk/ CUR1350, 1350 MW Cambridgeshire and online http://www.cur1350.co.uk/ From jack@performancedrivers.com Sun Apr 20 15:58:07 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Sun, 20 Apr 2003 10:58:07 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304201210.07054.aleax@aleax.it>; from aleax@aleax.it on Sun, Apr 20, 2003 at 12:10:07PM +0200 References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> Message-ID: <20030420105807.C15881@localhost.localdomain> I see two points:

1 - it isn't obvious to many people to write
    reduce(operator.add, mylist, 0) # where '0' is just an appropriate default
2 - reduce() is slower than a special purpose function

#2 is fixable (see my earlier posts) and isn't the main argument of proponents. To #1 I would argue for education about reduce(). We already have minor style wars about map/filter versus list comps. This would just add one more. People would still have to learn about reduce() when they wanted the first argument to be anything other than operator.add. Aliasing reduce(operator.add, mylist, 0) to sum(mylist, 0) is a solution looking for a problem, IMO. I know I would have to learn what the to-be-named module of aliases does if people start to use them. I'll be selfish here, I don't want to learn em. The proposed patch would be equivalent to a one line alias, even if it is written more verbosely in C. A one line alias for existing functionality sounds like TMTOWTDI to me. I also don't want people having patch fights every time they see sum() or reduce() in code (re-submitting whichever version they prefer).
A possible solution could be a 'newbie' module that defined things like 'sum' with the canonical solution listed in the documentation. It would be a nice clear flag to readers of the code while allowing the noob to skip reading the reduce() manpage. -jack From dave@boost-consulting.com Sun Apr 20 16:07:24 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 11:07:24 -0400 Subject: [Python-Dev] Hook Extension Module Import? Message-ID: <847k9pp5qr.fsf@boost-consulting.com> Hi, I think I need a way to temporarily (from 'C'), arrange to be notified just before and just after a new extension module is loaded. Is this possible? I didn't see anything obvious in the source. BTW, I'd be just as happy if it were possible to do the same thing for any module (i.e., not discriminating between extension and pure python modules). Thanks in advance, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From fincher.8@osu.edu Sun Apr 20 17:21:49 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 12:21:49 -0400 Subject: [Python-Dev] FIFO data structure? In-Reply-To: <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> Message-ID: <200304201221.51200.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 06:53 am, andrew cooke wrote: > i haven't looked at the code, but when you mention lists are you referring > to standard python structures? i understood that the thing in python that > looks like a list is actually an array (a simple one, not a vlist), so > access to index elements is done in constant time. But deleting an element from the beginning is O(n), because all the elements have to be moved back to replace it. So queues implemented via list.append and list.pop(0) are O(N) in dequeue. > (the links you gave don't work for me.) 
Ah, shoot, I always forget the ~fincher. New links: <http://www.cis.ohio-state.edu/~fincher/fifo.py> <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+oskdqkDiu+Bs+JIRAu/tAJ0dPjbD65e5Kw1XctbrWwYGt4jAZgCaAgF3 2+3nzAjeswigkg9697bx38Y= =E8ES -----END PGP SIGNATURE----- From aahz@pythoncraft.com Sun Apr 20 17:35:33 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 12:35:33 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <847k9pp5qr.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> Message-ID: <20030420163533.GA1885@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > > I think I need a way to temporarily (from 'C'), arrange to be notified > just before and just after a new extension module is loaded. Is this > possible? I didn't see anything obvious in the source. BTW, I'd be > just as happy if it were possible to do the same thing for any module > (i.e., not discriminating between extension and pure python modules). http://www.python.org/peps/pep-0302.html -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From dave@boost-consulting.com Sun Apr 20 17:58:53 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 12:58:53 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <20030420163533.GA1885@panix.com> (aahz@pythoncraft.com's message of "Sun, 20 Apr 2003 12:35:33 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> Message-ID: <84wuhpnm0i.fsf@boost-consulting.com> Aahz <aahz@pythoncraft.com> writes: > On Sun, Apr 20, 2003, David Abrahams wrote: >> >> I think I need a way to temporarily (from 'C'), arrange to be notified >> just before and just after a new extension module is loaded. Is this >> possible? I didn't see anything obvious in the source. 
BTW, I'd be >> just as happy if it were possible to do the same thing for any module >> (i.e., not discriminating between extension and pure python modules). > > http://www.python.org/peps/pep-0302.html I guess I should take that to mean "you can't do that yet" (?) -- Dave Abrahams Boost Consulting www.boost-consulting.com From aahz@pythoncraft.com Sun Apr 20 18:04:09 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 13:04:09 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <84wuhpnm0i.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> Message-ID: <20030420170408.GA6705@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > Aahz <aahz@pythoncraft.com> writes: >> On Sun, Apr 20, 2003, David Abrahams wrote: >>> >>> I think I need a way to temporarily (from 'C'), arrange to be notified >>> just before and just after a new extension module is loaded. Is this >>> possible? I didn't see anything obvious in the source. BTW, I'd be >>> just as happy if it were possible to do the same thing for any module >>> (i.e., not discriminating between extension and pure python modules). >> >> http://www.python.org/peps/pep-0302.html > > I guess I should take that to mean "you can't do that yet" (?) As the PEP says, you *could* define an __import__ hook, but that would likely be more effort than you want. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From dave@boost-consulting.com Sun Apr 20 18:41:41 2003 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 20 Apr 2003 13:41:41 -0400 Subject: [Python-Dev] Hook Extension Module Import? 
In-Reply-To: <20030420170408.GA6705@panix.com> (aahz@pythoncraft.com's message of "Sun, 20 Apr 2003 13:04:09 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> Message-ID: <84r87xnk16.fsf@boost-consulting.com> Aahz <aahz@pythoncraft.com> writes: > On Sun, Apr 20, 2003, David Abrahams wrote: >> Aahz <aahz@pythoncraft.com> writes: >>> On Sun, Apr 20, 2003, David Abrahams wrote: >>>> >>>> I think I need a way to temporarily (from 'C'), arrange to be notified >>>> just before and just after a new extension module is loaded. Is this >>>> possible? I didn't see anything obvious in the source. BTW, I'd be >>>> just as happy if it were possible to do the same thing for any module >>>> (i.e., not discriminating between extension and pure python modules). >>> >>> http://www.python.org/peps/pep-0302.html >> >> I guess I should take that to mean "you can't do that yet" (?) > > As the PEP says, you *could* define an __import__ hook, but that would > likely be more effort than you want. It also says: The situation gets worse when you need to extend the import mechanism from C: it's currently impossible, apart from hacking Python's import.c or reimplementing much of import.c from scratch. OTOH, it's not obvious to me why this should be so. Can't I access/replace builtins.__import__ from C/C++? That said, if I could do that, it doesn't seem like much trouble at all to get the behavior I want. -- Dave Abrahams Boost Consulting www.boost-consulting.com From mwh@python.net Sun Apr 20 19:15:45 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 19:15:45 +0100 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <200304201221.51200.fincher.8@osu.edu> (Jeremy Fincher's message of "Sun, 20 Apr 2003 12:21:49 -0400") References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> Message-ID: <2mu1ctnige.fsf@starship.python.net> Jeremy Fincher <fincher.8@osu.edu> writes: > <http://www.cis.ohio-state.edu/~fincher/fifo.py> > <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> What do you gain from inheriting from dict? It seems to me that merely containing one would do. Cheers, M. -- ARTHUR: Ford, you're turning into a penguin, stop it. -- The Hitch-Hikers Guide to the Galaxy, Episode 2 From agthorr@barsoom.org Sun Apr 20 19:24:19 2003 From: agthorr@barsoom.org (Agthorr) Date: Sun, 20 Apr 2003 11:24:19 -0700 Subject: [Python-Dev] heapq In-Reply-To: <200304200456.52084.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <200304200456.52084.fincher.8@osu.edu> Message-ID: <20030420182419.GA8449@barsoom.org> On Sun, Apr 20, 2003 at 04:56:46AM -0400, Jeremy Fincher wrote: > I've got unit tests for the object written, if it is something that will > possibly go into the standard library. FWIW, I have unit tests written for my heap implementation as well. -- Agthorr From agthorr@barsoom.org Sun Apr 20 19:30:06 2003 From: agthorr@barsoom.org (Agthorr) Date: Sun, 20 Apr 2003 11:30:06 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <200304200516.02382.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> Message-ID: <20030420183005.GB8449@barsoom.org> On Sun, Apr 20, 2003 at 05:16:01AM -0400, Jeremy Fincher wrote: > 2.3 seems to focus somewhat on adding a wider variety of data structures to > Python -- well, sets and heapq, at least :) One thing I've found lacking, > though, is a nice O(1) FIFO queue -- even the standard Queue module I actually just wrote a modification to Queue that is O(1). There's no change to the interface, so it doesn't require adding a new data structure. I have the code here: http://www.cs.uoregon.edu/~agthorr/Queue.py The only changes are near the bottom of the file, beginning with the _init() function. My implementation uses Python lists, but it uses them in a smarter way than the existing Queue implementation. I'll submit a patch to SourceForge in a day or two. -- Agthorr From aahz@pythoncraft.com Sun Apr 20 19:31:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 14:31:05 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <84r87xnk16.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> <84r87xnk16.fsf@boost-consulting.com> Message-ID: <20030420183105.GA18929@panix.com> On Sun, Apr 20, 2003, David Abrahams wrote: > Aahz <aahz@pythoncraft.com> writes: >> On Sun, Apr 20, 2003, David Abrahams wrote: >>> Aahz <aahz@pythoncraft.com> writes: >>>> On Sun, Apr 20, 2003, David Abrahams wrote: >>>>> >>>>> I think I need a way to temporarily (from 'C'), arrange to be notified >>>>> just before and just after a new extension module is loaded. Is this >>>>> possible? I didn't see anything obvious in the source. 
BTW, I'd be >>>>> just as happy if it were possible to do the same thing for any module >>>>> (i.e., not discriminating between extension and pure python modules). >>>> >>>> http://www.python.org/peps/pep-0302.html >>> >>> I guess I should take that to mean "you can't do that yet" (?) >> >> As the PEP says, you *could* define an __import__ hook, but that would >> likely be more effort than you want. > > It also says: > > The situation gets worse when you need to extend the import > mechanism from C: it's currently impossible, apart from hacking > Python's import.c or reimplementing much of import.c from scratch. > > OTOH, it's not obvious to me why this should be so. Can't I > access/replace builtins.__import__ from C/C++? Sure, but then you need to replace import.c, just as it says. I'd be inclined to do the heavy lifting in Python with a callback into C code (after all, you're not calling it so frequently as to make it a performance issue). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From eppstein@ics.uci.edu Sun Apr 20 20:04:35 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sun, 20 Apr 2003 12:04:35 -0700 Subject: [Python-Dev] Re: FIFO data structure? References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> <2mu1ctnige.fsf@starship.python.net> Message-ID: <eppstein-C048B2.12043520042003@main.gmane.org> In article <2mu1ctnige.fsf@starship.python.net>, Michael Hudson <mwh@python.net> wrote: > > <http://www.cis.ohio-state.edu/~fincher/fifo.py> > > <http://www.cis.ohio-state.edu/~fincher/test_fifo.py> > > What do you gain from inheriting from dict? It seems to me that > merely containing one would do. See <http://tinyurl.com/9x6d> for some tests indicating that using dict for fifo is a slow way to go. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From mwh@python.net Sun Apr 20 21:32:43 2003 From: mwh@python.net (Michael Hudson) Date: Sun, 20 Apr 2003 21:32:43 +0100 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: <20030420183105.GA18929@panix.com> (Aahz's message of "Sun, 20 Apr 2003 14:31:05 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <20030420163533.GA1885@panix.com> <84wuhpnm0i.fsf@boost-consulting.com> <20030420170408.GA6705@panix.com> <84r87xnk16.fsf@boost-consulting.com> <20030420183105.GA18929@panix.com> Message-ID: <2mr87woqok.fsf@starship.python.net> Aahz <aahz@pythoncraft.com> writes: >> OTOH, it's not obvious to me why this should be so. Can't I >> access/replace builtins.__import__ from C/C++? > > Sure, but then you need to replace import.c, just as it says. Not in this case: if all you want is notification, surely you can call the original __import__ to do the work... Cheers, M. -- In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it. -- Tim Peters, 16 Sep 93 From fincher.8@osu.edu Sun Apr 20 22:53:15 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 17:53:15 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: <eppstein-C048B2.12043520042003@main.gmane.org> References: <20030419224110.GB2460@barsoom.org> <2mu1ctnige.fsf@starship.python.net> <eppstein-C048B2.12043520042003@main.gmane.org> Message-ID: <200304201753.18059.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 03:04 pm, David Eppstein wrote: > See <http://tinyurl.com/9x6d> for some tests indicating that using dict for > fifo is a slow way to go. That's definitely an inadequate test. 
First,if I read correctly, the test function doesn't test the plain list or array.array('i') as fifos, it tests them as a lifos (using simple .append(elt) and .pop()). Second, it never allows the fifo to have a size greater than 1, which completely negates the O(N) disadvantage of simple list-based implementations. Change the test function's for loops to this: for i in xrange(iterations): fifo.append(i) for i in xrange(iterations): j = fifo.pop() And you'll have a much more accurate comparison of the relative speed of the queues, taking into account naive list implementations' O(N) dequeue. I've written my own speed comparison using timeit.py. It's available at <http://www.cis.ohio-state.edu/~fincher/fifo_comparison.py>. Interestingly enough, the amortized-time 2-list approach is faster than all the other approaches for n elements somewhere between 100 and 1000. Here are my results with Python 2.2: 1 ListSubclassFifo 0.000233 1 DictSubclassFifo 0.000419 1 O1ListSubclassFifo 0.000350 10 ListSubclassFifo 0.001200 10 DictSubclassFifo 0.002814 10 O1ListSubclassFifo 0.001546 100 ListSubclassFifo 0.010613 100 DictSubclassFifo 0.028463 100 O1ListSubclassFifo 0.012658 1000 ListSubclassFifo 0.174211 1000 DictSubclassFifo 0.294973 1000 O1ListSubclassFifo 0.121407 10000 ListSubclassFifo 8.536460 10000 DictSubclassFifo 3.056266 10000 O1ListSubclassFifo 1.224752 (The O1ListSubclassFifo uses the standard (at least standard in functional programming :)) implementation technique of using two singly-linked lists, one for the front of the queue and another for the back of the queue.) Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+oxbLqkDiu+Bs+JIRAgdZAJ9xiAkwpjDylj8aiAqDFL8Jm5zNTgCfU7nU kMThW2eItzfr5pXjMf2P0Y8= =9Tu7 -----END PGP SIGNATURE----- From guido@python.org Sun Apr 20 21:54:16 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 16:54:16 -0400 Subject: [Python-Dev] Re: FIFO data structure? 
In-Reply-To: "Your message of Sun, 20 Apr 2003 12:04:35 PDT." <eppstein-C048B2.12043520042003@main.gmane.org> References: <20030419224110.GB2460@barsoom.org> <200304200516.02382.fincher.8@osu.edu> <34897.127.0.0.1.1050836021.squirrel@127.0.0.1> <200304201221.51200.fincher.8@osu.edu> <2mu1ctnige.fsf@starship.python.net> <eppstein-C048B2.12043520042003@main.gmane.org> Message-ID: <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> [David Eppstein] > See <http://tinyurl.com/9x6d> for some tests indicating that using > dict for fifo is a slow way to go. I was just going to say that I was disappointed that there was discussion about O(1) vs. O(N) but no actual performance measurements. But a comment on David's measurements: they assume the queue is empty. What happens if the queue has an average of N elements, for various N? At what point does the dict version overtake the list version? Also ask yourself the following questions. How much time are you paying for the overhead of using a class vs. using a list directly? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 20 21:59:30 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 16:59:30 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Sun, 20 Apr 2003 01:20:57 PDT." <3EA25869.6070404@noah.org> References: <3EA25869.6070404@noah.org> Message-ID: <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> > Recently I realized that there is no easy way to > walk a directory tree and rename each directory and file. > The standard os.path.walk() function does a breadth first walk. This idea has merit, although I'm not sure I'd call this depth first; it's more a matter of pre-order vs. post-order, isn't it? But I ask two questions: - How often does one need this? - When needed, how hard is it to hand-code a directory walk? It's not like the body of the walk() function is rocket science. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 20 22:20:03 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 17:20:03 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: "Your message of Sun, 20 Apr 2003 11:07:24 EDT." <847k9pp5qr.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> Message-ID: <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> > I think I need a way to temporarily (from 'C'), arrange to be notified > just before and just after a new extension module is loaded. Is this > possible? I didn't see anything obvious in the source. BTW, I'd be > just as happy if it were possible to do the same thing for any module > (i.e., not discriminating between extension and pure python modules). I think Aahz is slowly leading you in the right direction: you can override __import__ with something that calls your pre-hook, then the original __import__, then your post_hook. I see no problem with doing this from C except that it's a bit verbose. --Guido van Rossum (home page: http://www.python.org/~guido/) From fincher.8@osu.edu Sun Apr 20 23:21:02 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Sun, 20 Apr 2003 18:21:02 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> References: <20030419224110.GB2460@barsoom.org> <eppstein-C048B2.12043520042003@main.gmane.org> <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304201821.03771.fincher.8@osu.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sunday 20 April 2003 04:54 pm, Guido van Rossum wrote: > Also ask yourself the following questions. How much time are you > paying for the overhead of using a class vs. using a list directly? 
I imagine the object would eventually be written in C (probably by someone more experienced than myself, but I could do it if need be), when that overhead shouldn't matter. But even with a pure-Python implementation, as noted in my other email, the fastest O(1) implementation outran the naive list implementation (granted it was wrapped in a class to maintain the same interface) somewhere between 100 and 1000 elements. I could find out the average place at which the O(1) implementation becomes faster, if you're interested. Jeremy -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+ox1OqkDiu+Bs+JIRAhvvAJ9gHSRpZmf8F2tCsqK40uSPqIoCMACeM5lY k7FInBxUdA3MF/q/Hl4U45U= =lb0T -----END PGP SIGNATURE----- From guido@python.org Sun Apr 20 22:31:03 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 17:31:03 -0400 Subject: [Python-Dev] Re: FIFO data structure? In-Reply-To: "Your message of Sun, 20 Apr 2003 18:21:02 EDT." <200304201821.03771.fincher.8@osu.edu> References: <20030419224110.GB2460@barsoom.org> <eppstein-C048B2.12043520042003@main.gmane.org> <200304202054.h3KKsGY19570@pcp02138704pcs.reston01.va.comcast.net> <200304201821.03771.fincher.8@osu.edu> Message-ID: <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net> [Guido] > > Also ask yourself the following questions. How much time are you > > paying for the overhead of using a class vs. using a list directly? [Jeremy Fincher] > I imagine the object would eventually be written in C (probably by someone > more experienced than myself, but I could do it if need be), when that > overhead shouldn't matter. But even with a pure-Python implementation, as > noted in my other email, the fastest O(1) implementation outran the naive > list implementation (granted it was wrapped in a class to maintain the same > interface) somewhere between 100 and 1000 elements. I could find out the > average place at which the O(1) implementation becomes faster, if you're > interested. 
I have to think about this more. ATM I'm inclined to say that this is relatively uncommon, and it's not that hard to come up with an efficient implementation. Python's philosophy about data types is that a few versatile data types (list, dict) get most of the attention because they are re-usable in so many places. When you get to other algorithms, there is such a variety that it's hard to imagine putting them all in the standard library; instead, it's easy to roll your own built out of the standard ones.

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (FreeBSD)
>
> iD8DBQE+ox1OqkDiu+Bs+JIRAhvvAJ9gHSRpZmf8F2tCsqK40uSPqIoCMACeM5lY
> k7FInBxUdA3MF/q/Hl4U45U=
> =lb0T
> -----END PGP SIGNATURE-----

I know what this is, but I don't see the point. I don't know who you are (don't think we've ever met) and I respond based on your words, not on who wrote them. So what's the point?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fincher.8@osu.edu Mon Apr 21 00:01:26 2003
From: fincher.8@osu.edu (Jeremy Fincher)
Date: Sun, 20 Apr 2003 19:01:26 -0400
Subject: [Python-Dev] Re: FIFO data structure?
In-Reply-To: <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net>
References: <20030419224110.GB2460@barsoom.org> <200304201821.03771.fincher.8@osu.edu> <200304202131.h3KLV3X19827@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200304201901.26269.fincher.8@osu.edu>

On Sunday 20 April 2003 05:31 pm, Guido van Rossum wrote:
> I have to think about this more. ATM I'm inclined to say that this is
> relatively uncommon, and it's not that hard to come up with an
> efficient implementation. Python's philosophy about data types is
> that a few versatile data types (list, dict) get most of the attention
> because they are re-usable in so many places. When you get to other
> algorithms, there is such a variety that it's hard to imagine putting
> them all in the standard library; instead, it's easy to roll your own
> built out of the standard ones.
Aside from the efficiency improvements, I like the self-documenting nature of using .enqueue and .dequeue methods instead of .append and .pop(0). But I see your point.

> I know what this is, but I don't see the point. I don't know who you
> are (don't think we've ever met) and I respond based on your words,
> not on who wrote them. So what's the point?

I just had my client set up to sign messages automatically; I'll disable it :)

Jeremy

From DavidA@ActiveState.com Mon Apr 21 01:49:56 2003
From: DavidA@ActiveState.com (David Ascher)
Date: Sun, 20 Apr 2003 17:49:56 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3EA34034.9060109@ActiveState.com>

Guido van Rossum wrote:
>> Recently I realized that there is no easy way to
>> walk a directory tree and rename each directory and file.
>> The standard os.path.walk() function does a breadth first walk.
>
> This idea has merit, although I'm not sure I'd call this depth first;
> it's more a matter of pre-order vs. post-order, isn't it?
>
> But I ask two questions:
>
> - How often does one need this?
>
> - When needed, how hard is it to hand-code a directory walk? It's not
> like the body of the walk() function is rocket science.

That's hardly the point of improving the standard library, though, is it? I'm all for putting the kitchen sink in there, especially if it originates with a use case ("I had some dishes to wash..." ;-)

--david

From guido@python.org Mon Apr 21 02:01:53 2003
From: guido@python.org (Guido van Rossum)
Date: Sun, 20 Apr 2003 21:01:53 -0400
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: "Your message of Sun, 20 Apr 2003 17:49:56 PDT."
<3EA34034.9060109@ActiveState.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> Message-ID: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> > > - When needed, how hard is it to hand-code a directory walk? It's not > > like the body of the walk() function is rocket science. > > That's hardly the point of improving the standard library, though, is > it? I'm all for putting the kitchen sink in there, especially if it > originates with a use case ("I had some dishes to wash..." ;-) But if I had to do it over again, I wouldn't have added walk() in the current form. I often find it harder to fit a particular program's needs in the API offered by walk() than it is to reimplement the walk myself. That's why I'm concerned about adding to it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 01:58:06 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 20 Apr 2003 20:58:06 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: "Your message of Sun, 20 Apr 2003 10:58:07 EDT." <20030420105807.C15881@localhost.localdomain> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> Message-ID: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Thanks to all for a good and quick discussion! I'm swayed by Alex's argument that a simple sum() builtin answers a lot of recurring questions, so I'd like to add it. I've never liked reduce() -- in its full generality it causes hard to understand code, and I'm glad to see sum() remove probably 80% of the need for it. I like sum() best as the name -- that's what it's called in other systems. 
I'm not too concerned about the number of builtins (we should deprecate some anyway to make room for new ones). I'm not too worried that people will ask for prod() as well. And if they do, maybe we can give them that too; there's not much else along the same lines (bitwise or/and; ha ha ha) so even if the slope may be a bit slippery, I'm not worried about sliding too far.

I don't think the signature should be extended to match min() and max() -- min(a, b) serves a real purpose, but sum(a, b) is just a redundant way of saying a+b, and ditto for sum(a,b,c) etc.

There's a bunch of statistics functions (avg or mean, sdev etc.) that should go in a statistics package or module together with more advanced statistics stuff -- it would be a good idea to form a working group or SIG to design such a thing with an eye towards usability, power, and avoiding traps for newbies.

Finally, there's the question of what sum() of an empty sequence should be. There are several ways to force it: you can write sum(L or [0]) (which avoids the cost of copying in sum(L + [0])), or we can give sum() an optional second argument. But still, what should sum([]) do? I'm sure that the newbies who are asking for it would be surprised by anything except sum([]) == 0, since they probably want to sum a list of numbers, and occasionally (albeit through a bug in their program :-) the list will be empty. But that means that summing a sequence of strings ends up with a strange end case. So perhaps raising an exception for an empty sequence, like min() and max(), is better: "In the face of ambiguity, refuse the temptation to guess." An optional second argument can then be used to specify a starting point for the summation.
The semantics of this argument should be the same as for reduce():

    sum(S, x) == sum([x] + list(S))

and hence

    sum(["a", "b"], "x") == "xab"

(A minority view that I can't quite shake off: since the name sum() strongly suggests it's summing up numbers, sum([]) should be 0 and no second argument is allowed. I find using sum() for a sequence of strings a bit weird anyway, and will probably continue to write "".join(S) for that case.)

Alex, care to send in your patch?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Mon Apr 21 02:10:19 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 21:10:19 -0400
Subject: [Python-Dev] Re: FIFO data structure?
In-Reply-To: <200304201753.18059.fincher.8@osu.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEKBEDAB.tim.one@comcast.net>

[Jeremy Fincher, on <http://tinyurl.com/9x6d> ]
> That's definitely an inadequate test. First, if I read correctly,
> the test function doesn't test the plain list or array.array('i') as
> fifos, it tests them as lifos (using simple .append(elt) and .pop()).

That's right, alas. Mr. Delaney was implementing stacks there, and just calling them fifos.

> Second, it never allows the fifo to have a size greater than 1, which
> completely negates the O(N) disadvantage of simple list-based
> implementations.

Yup.

> ...
> <http://www.cis.ohio-state.edu/~fincher/fifo_comparison.py>.
> ...
> (The O1ListSubclassFifo uses the standard (at least standard in
> functional programming :)) implementation technique of using two
> singly-linked lists, one for the front of the queue and another for the
> back of the queue.)
The Dark Force has seduced you there:

    class O1ListSubclassFifo(list):
        __slots__ = ('back',)

        def __init__(self):
            self.back = []

        def enqueue(self, elt):
            self.back.append(elt)

        def dequeue(self):
            if self:
                return self.pop()
            else:
                self.back.reverse()
                self[:] = self.back
                self.back = []
                return self.pop()

That is, you're subclassing merely to reuse implementation, not because you want to claim that O1ListSubclassFifo is-a list. It's better not to subclass list, and use two lists via has-a instead, say self.front and self.back. Then the O(N)

    self[:] = self.back

can be replaced by the O(1) (for example)

    self.front = self.back

Of course, this is Python <wink>, so it may not actually be faster that way: you save some C-speed list copies, but at the cost of more-expensive Python-speed dereferencing ("self.pop" vs "self.front.pop"). But even if it's slower, it's better not to pretend this flavor of FIFO is-a list (e.g., someone doing len(), pop(), append() on one of these instances is going to get a bizarre result).

From tim.one@comcast.net Mon Apr 21 02:31:04 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 21:31:04 -0400
Subject: [Python-Dev] FIFO data structure?
In-Reply-To: <20030420183005.GB8449@barsoom.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>

[Agthorr]
> I actually just wrote a modification to Queue that is O(1). There's
> no change to the interface, so it doesn't require adding a new data
> structure.
>
> I have the code here:
> http://www.cs.uoregon.edu/~agthorr/Queue.py
>
> The only changes are near the bottom of the file, beginning with the
> _init() function. My implementation uses Python lists, but it uses
> them in a smarter way than the existing Queue implementation.
>
> I'll submit a patch to SourceForge in a day or two.

I'm opposed to this. The purpose of Queue is to mediate communication among threads, and a Queue.Queue rarely gets large because of its intended applications.
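The has-a variant Tim describes above -- two plain lists held as attributes rather than subclassing list -- works out like this. A minimal sketch with illustrative names (`front`/`back` follow Tim's suggestion; the class name is invented):

```python
class TwoListFifo:
    """O(1) amortized FIFO built from two plain lists (has-a, not is-a)."""

    def __init__(self):
        self.front = []  # items ready to dequeue, in reverse arrival order
        self.back = []   # newly enqueued items, in arrival order

    def enqueue(self, elt):
        self.back.append(elt)

    def dequeue(self):
        if not self.front:
            # O(1) pointer swap instead of the O(N) self[:] = self.back copy
            self.back.reverse()
            self.front = self.back
            self.back = []
        return self.front.pop()

    def __len__(self):
        return len(self.front) + len(self.back)

f = TwoListFifo()
for i in range(5):
    f.enqueue(i)
order = [f.dequeue() for _ in range(5)]
assert order == [0, 1, 2, 3, 4]  # FIFO order preserved
```

Each element is appended once, reversed once, and popped once, so the amortized cost per operation is O(1), without pretending the queue is-a list.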
As other recent timing posts have shown, you simply can't beat the list.append + list.pop(0) approach until a queue gets quite large (relative to the intended purpose of a Queue.Queue). If you have an unusual application for a Queue.Queue where it's actually faster to do a circular-buffer gimmick (and don't believe that you do before you time it), then, as the comments say, you're invited to *subclass* Queue.Queue, and override as many of the six queue-implementation methods at the bottom of the class as you believe will be helpful. It's not helpful to change the *base* implementation of Queue.Queue for an O() advantage swamped by increased overhead at typical queue sizes. From nas@python.ca Mon Apr 21 02:48:51 2003 From: nas@python.ca (Neil Schemenauer) Date: Sun, 20 Apr 2003 18:48:51 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030421014851.GB18971@glacier.arctrix.com> Guido van Rossum wrote: > But if I had to do it over again, I wouldn't have added walk() in the > current form. I think it's the perfect place for a generator. 
Neil From pinard@iro.umontreal.ca Mon Apr 21 03:14:02 2003 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois_Pinard?=) Date: 20 Apr 2003 22:14:02 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <oqd6jgfvh1.fsf@titan.progiciels-bpi.ca> [Guido van Rossum] > But if I had to do it over again, I wouldn't have added walk() in the > current form. I often find it harder to fit a particular program's > needs in the API offered by walk() than it is to reimplement the walk > myself. I do not much use `os.path.walk' myself. It is so simple to write a small walking loop with a stack of unseen directories, and in practice, there is a wide range of ways and reasons to walk a directory hierarchy, some of which do not fit nicely in the current `os.path.walk' specifications. > That's why I'm concerned about adding to it. The addition of generators to Python also changed the picture somewhat, in this area. It is often convenient to use a generator for a particular walk. -- François Pinard http://www.iro.umontreal.ca/~pinard From tim.one@comcast.net Mon Apr 21 03:12:42 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 22:12:42 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> [Guido] > But if I had to do it over again, I wouldn't have added walk() in the > current form. I often find it harder to fit a particular program's > needs in the API offered by walk() than it is to reimplement the walk > myself. That's why I'm concerned about adding to it. 
We also have another possibility now: a pathname generator. Then the funky callback and mystery-arg ("what's the purpose of the 'arg' arg?" is a semi-FAQ on c.l.py) bits can go away, and client code could look like:

    for path in walk(root):
        # filter, if you like, via 'if whatever: continue'
        # accumulate state, if you like, in local vars

Or it could look like

    for top, names in walk(root):

or

    for top, dirnames, nondirnames in walk(root):

Here's an implementation of the last flavor. Besides the more-or-less obvious topdown argument, note a subtlety: when topdown is True, the caller can prune the search by mutating the dirs list yielded to it. For example,

    for top, dirs, nondirs in walk('C:/code/python'):
        print top, dirs, len(nondirs)
        if 'CVS' in dirs:
            dirs.remove('CVS')

doesn't descend into CVS subdirectories.

    def walk(top, topdown=True):
        import os
        try:
            names = os.listdir(top)
        except os.error:
            return
        exceptions = ('.', '..')
        dirs, nondirs = [], []
        for name in names:
            if name in exceptions:
                continue
            fullname = os.path.join(top, name)
            if os.path.isdir(fullname):
                dirs.append(name)
            else:
                nondirs.append(name)
        if topdown:
            yield top, dirs, nondirs
        for name in dirs:
            # pass topdown along, so a post-order walk recurses post-order too
            for x in walk(os.path.join(top, name), topdown):
                yield x
        if not topdown:
            yield top, dirs, nondirs

From barry@python.org Mon Apr 21 03:23:47 2003
From: barry@python.org (Barry Warsaw)
Date: 20 Apr 2003 22:23:47 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net>
References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <1050891827.26667.1.camel@geddy>

On Sun, 2003-04-20 at 20:58, Guido van Rossum wrote:
> (A minority view that I can't quite shake off: since the name sum()
> strongly suggests it's summing up numbers, sum([]) should be 0 and no
> second argument is allowed. I find using sum() for a sequence of
> strings a bit weird anyway, and will probably continue to write
> "".join(S) for that case.)

I agree. I'd rather see sum() constrain itself to numbers and sum([]) == 0. Then I don't see a need for a second argument. "Summing" a list of strings doesn't make much sense to me.

-Barry

From tim.one@comcast.net Mon Apr 21 03:24:52 2003
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 20 Apr 2003 22:24:52 -0400
Subject: [Python-Dev] New re failures on Windows
In-Reply-To: <20030420075453.GA9504@localhost.distro.conectiva>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKHEDAB.tim.one@comcast.net>

[Gustavo Niemeyer]
> Should be working now. Sorry about the trouble. I should have fixed that
> before submitting the first version.

Confirming that my problems went away. Thank you!

From gward@python.net Mon Apr 21 03:47:43 2003
From: gward@python.net (Greg Ward)
Date: Sun, 20 Apr 2003 22:47:43 -0400
Subject: [Python-Dev] Bug/feature/patch policy for optparse.py
Message-ID: <20030421024743.GA3911@cthulhu.gerg.ca>

Hi all --

I've just thrown together Optik 1.4.1, and in turn checked in rev 1.3 of Lib/optparse.py. From the optparse docstring:

"""
If you have problems with this module, please do not file bugs,
patches, or feature requests with Python; instead, use Optik's
SourceForge project page:
http://sourceforge.net/projects/optik
For support, use the optik-users@lists.sourceforge.net mailing list
(http://lists.sourceforge.net/lists/listinfo/optik-users).
"""

and from a comment right after the docstring:

# Python developers: please do not make changes to this file, since
# it is automatically generated from the Optik source code.

Does this policy seem reasonable to everyone? And, more importantly, can you all please try to respect it when you find bugs in or want to add features to optparse.py? Thanks!
Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Never try to outstubborn a cat. From aahz@pythoncraft.com Mon Apr 21 03:51:54 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 22:51:54 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <1050891827.26667.1.camel@geddy> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <1050891827.26667.1.camel@geddy> Message-ID: <20030421025154.GA4542@panix.com> On Sun, Apr 20, 2003, Barry Warsaw wrote: > On Sun, 2003-04-20 at 20:58, Guido van Rossum wrote: >> >> (A minority view that I can't quite shake off: since the name sum() >> strongly suggests it's summing up numbers, sum([]) should be 0 and no >> second argument is allowed. I find using sum() for a sequence of >> strings a bit weird anyway, and will probably continue to write >> "".join(S) for that case.) > > I agree. I'd rather see sum() constrain itself to numbers and sum([]) > == 0. Then I don't see a need for second argument. "Summing" a list of > strings doesn't make much sense to me. Problem is, what *kind* of number? While ints are in general easily promotable (especially int 0), I'd prefer to make things explicit. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From tim.one@comcast.net Mon Apr 21 03:48:43 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 22:48:43 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> [Guido, making right decisions again, and I'll correct the details <wink>] > ... 
> I've never liked reduce() -- in its full generality it causes hard to > understand code, and I'm glad to see sum() remove probably 80% of the > need for it. 97.2%, actually. > ... > I'm not too worried that people will ask for prod() as well. And if > they do, maybe we can give them that too; They will ask, but let's resist that one. > there's not much else along the same lines (bitwise or/and; ha ha ha) xor reduction is the key to the winning strategy for the game of Nim, so expect intense pressure from the computer Nim camp. > ... > There's a bunch of statistics functions (avg or mean, sdev etc.) that > should go in a statistics package or module together with more > advanced statistics stuff -- it would be a good idea to form a working > group or SIG to design such a thing with an eye towards usability, > power, and avoiding traps for newbies. Very big job, unless you leave the "advanced" stuff out. Note that there are many stats packages available for Python already, although some build on NumPy. > ... > (A minority view that I can't quite shake off: since the name sum() > strongly suggests it's summing up numbers, sum([]) should be 0 and no > second argument is allowed. That's my view, so it's quite possibly the correct view <wink>. Numbers is numbers. sum(sequence_of_strings) hurts my brain, just as much as if we had a builtin concat() function for pasting together a sequence of strings, and someone argued that concat(sequence_of_numbers) should return their sum "because they're both related to the '+' glyph in a syntactical way" (that they both relate to methods named __add__ is beyond instant explanation to a newbie). 
From davida@ActiveState.com Mon Apr 21 04:28:13 2003 From: davida@ActiveState.com (David Ascher) Date: Sun, 20 Apr 2003 20:28:13 -0700 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> Message-ID: <3EA3654D.3070402@activestate.com> Tim Peters wrote: >>There's a bunch of statistics functions (avg or mean, sdev etc.) that >>should go in a statistics package or module together with more >>advanced statistics stuff -- it would be a good idea to form a working >>group or SIG to design such a thing with an eye towards usability, >>power, and avoiding traps for newbies. >> >> > >Very big job, unless you leave the "advanced" stuff out. Note that there >are many stats packages available for Python already, although some build on >NumPy. > > Scipy's stats package is more complete than many people expect. I would argue strongly against putting a 'cheap stats' package in the core, since building one such packages takes a huge amount of work, doing it twice is silly. At least the first version of the stats package now in chaco used to not require numeric, although I think that requirement is a red herring in practice. >That's my view, so it's quite possibly the correct view <wink>. Numbers is >numbers. sum(sequence_of_strings) hurts my brain, just as much as if we had >a builtin concat() function for pasting together a sequence of strings, and >someone argued that concat(sequence_of_numbers) should return their sum >"because they're both related to the '+' glyph in a syntactical way" (that >they both relate to methods named __add__ is beyond instant explanation to a >newbie). > > +1. Concatenation using + always seemed too Perlish for me, and Perl doesn't even do it! 
=) From tim.one@comcast.net Mon Apr 21 04:29:35 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 20 Apr 2003 23:29:35 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030421025154.GA4542@panix.com> Message-ID: <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> [Aahz] > Problem is, what *kind* of number? While ints are in general easily > promotable (especially int 0), I'd prefer to make things explicit. I'd be OK with changing the signature to sum(iterable, empty=0) as 0 cannot in fact be auto-promoted to some reasonable number-like objects. For example, summing a list of datetime.timedelta objects seems a quite natural application (e.g., picture a timesheet app), but supplying int 0 generally blows up when a timedelta is expected. From aahz@pythoncraft.com Mon Apr 21 04:34:38 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 20 Apr 2003 23:34:38 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> References: <20030421025154.GA4542@panix.com> <LNBBLJKPBEHFEDALKOLCEELAEDAB.tim.one@comcast.net> Message-ID: <20030421033438.GA7942@panix.com> On Sun, Apr 20, 2003, Tim Peters wrote: > [Aahz] >> >> Problem is, what *kind* of number? While ints are in general easily >> promotable (especially int 0), I'd prefer to make things explicit. > > I'd be OK with changing the signature to > > sum(iterable, empty=0) > > as 0 cannot in fact be auto-promoted to some reasonable number-like > objects. For example, summing a list of datetime.timedelta objects > seems a quite natural application (e.g., picture a timesheet app), but > supplying int 0 generally blows up when a timedelta is expected. +1 That makes the canonical usage clear, while not preventing people from doing stupid things. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
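Tim's timedelta case above is easy to make concrete. A small sketch (the shift values are invented; the second argument plays the role of Tim's `empty`, and int 0 indeed blows up when a timedelta is expected):

```python
from datetime import timedelta

# A timesheet-style sum: int 0 cannot be auto-promoted to a timedelta,
# so the caller supplies the empty/start value explicitly.
shifts = [timedelta(hours=8), timedelta(hours=7, minutes=30), timedelta(hours=9)]
total = sum(shifts, timedelta(0))
assert total == timedelta(hours=24, minutes=30)
```

Without the explicit timedelta start, the summation begins with int 0 and the first `0 + timedelta(...)` addition raises TypeError -- exactly the promotion problem Aahz raises.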
From eppstein@ics.uci.edu Mon Apr 21 04:35:24 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Sun, 20 Apr 2003 20:35:24 -0700 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <1050891827.26667.1.camel@geddy> <20030421025154.GA4542@panix.com> Message-ID: <eppstein-B5BDB5.20352420042003@main.gmane.org> In article <20030421025154.GA4542@panix.com>, Aahz <aahz@pythoncraft.com> wrote: > > I agree. I'd rather see sum() constrain itself to numbers and sum([]) > > == 0. Then I don't see a need for second argument. "Summing" a list of > > strings doesn't make much sense to me. > > Problem is, what *kind* of number? While ints are in general easily > promotable (especially int 0), I'd prefer to make things explicit. Maybe make sum(L) always equivalent to reduce(operator.add,L,0)? Then "number" here would mean something that can be added to 0, allowing any kind of user-defined number type to work (e.g. I recently wanted a sum function for Keith Briggs' "xr" package for exact computations over computable reals). This would mean that attempts to abuse sum to concatenate strings would raise TypeError. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. 
of California, Irvine, School of Information & Computer Science From cnetzer@mail.arc.nasa.gov Mon Apr 21 05:33:32 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 20 Apr 2003 21:33:32 -0700 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: <3EA3654D.3070402@activestate.com> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> Message-ID: <1050899612.591.21.camel@sayge.arc.nasa.gov> On Sun, 2003-04-20 at 20:28, David Ascher wrote: > Tim Peters wrote: > > >>There's a bunch of statistics functions (avg or mean, sdev etc.) that > >>should go in a statistics package or module together with more > >>advanced statistics stuff -- it would be a good idea to form a working > >>group or SIG to design such a thing with an eye towards usability, > >>power, and avoiding traps for newbies. +1 > >Very big job, unless you leave the "advanced" stuff out. Note that there > >are many stats packages available for Python already, although some build on > >NumPy. > > > Scipy's stats package is more complete than many people expect. I was going to suggest that we consider adopting Gary Strangman's stats.py package as the foundation for inclusion. This is the package that SciPy chose to include (with modifications of the namespace and API to fit the SciPy scheme of things). I've used it, and it is a very full featured package. I was actually kind of saddened that Gary had done all the work, since after getting my Master's degree, I had considered implementing such a module myself (for reasons of learning). But Gary's work is quite comprehensive, and well written, IMO (well tested, few external dependencies, etc. I just drop it in a working directory when I need it on a new system.) 
http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python.html Gary allowed SciPy to adopt his package under the BSD license, so I'm sure he would be amenable to discussing any licensing issues that may arise (the original package is GPL). It works on Python lists, as well as Numeric arrays. I'd be happy to take up the efforts of approaching Gary about whether he would consider "donating" his module for the standard lib, after any changes a working group or SIG might suggest (or require). Possibly there are some namespace issues (actually, he has a companion "pstat" module, that is a standard library module name conflict I'd wanted fixed). Other than ensuring it works on the normal python sequences, and removing any dependencies on NumPy or Numeric (while hopefully allowing it to integrate well with either), and possibly trying to reconcile name issues with SciPy (if at all feasible), it may be definitely doable by 2.4. I'm happy to volunteer some time to the effort. I think it would be quite worthwhile. 
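For scale, the "avg or mean, sdev etc." basics named in this thread are tiny; a hand-rolled sketch, with invented names and a sample/population switch added for illustration (today's standard library covers this ground in the statistics module):

```python
import math

def mean(data):
    data = list(data)
    if not data:
        raise ValueError("mean of empty sequence")
    return sum(data) / len(data)

def sdev(data, sample=True):
    # Sample standard deviation by default (divide by n-1), as most
    # stats packages do; sample=False gives the population form.
    data = list(data)
    m = mean(data)
    n = len(data) - 1 if sample else len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / n)

assert mean([1, 2, 3, 4]) == 2.5
assert abs(sdev([2, 4, 4, 4, 5, 5, 7, 9], sample=False) - 2.0) < 1e-9
```

The hard part a working group would face is everything beyond these: distributions, tests, and the newbie traps (sample vs. population, numerical stability) that a comprehensive package like Gary's already handles.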
-- Chad Netzer (any opinion expressed is my own and not NASA's or my employer's) From andymac@bullseye.apana.org.au Mon Apr 21 02:01:34 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Mon, 21 Apr 2003 12:01:34 +1100 (edt) Subject: [Python-Dev] New re failures on Windows In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIJEDAB.tim.one@comcast.net> Message-ID: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> On Sat, 19 Apr 2003, Tim Peters wrote: > test_sre is dying with a segfault: > > """ > C:\Code\python\PCbuild>python ../lib/test/test_sre.py > Running tests on character literals > Running tests on sre.search and sre.match > sre.match(r'(a)?a','a').lastindex FAILED > expected None > got result 1 > sre.match(r'(a)(b)?b','ab').lastindex FAILED > expected 1 > got result 2 > sre.match(r'(?P<a>a)(?P<b>b)?b','ab').lastgroup FAILED > expected 'a' > got result 'b' > Running tests on sre.sub > Running tests on symbolic references > Running tests on sre.subn > Running tests on sre.split > Running tests on sre.findall > Running tests on sre.finditer > Running tests on sre.match > Running tests on sre.escape > Running tests on sre.Scanner > Pickling a SRE_Pattern instance > Test engine limitations > """ > > and it dies with a segfault there. Unfortunately, test_sre doesn't die in a > debug build. Compiler optimisation? I've been trying to get a handle on this for the last couple of days, with various versions of gcc on FreeBSD and OS/2 not liking _sre since Guido checked patch #720991 in on April 14. The failures all occur after the "Running tests on sre.search and sre.match" phase of test_sre. 
What I've been able to delineate thus far:

test_sre on FreeBSD 4.[47]:
    gcc 2.95.[34]:  -O3 => bus error,  -O2 => Ok
    gcc 3.2.2:      -O[023] => bus error,  -Os => Ok

test_sre on OS/2:
    gcc 2.8.1:      -O2 => Ok
    pgcc 2.95.2:    -O3 => Ok
    gcc 3.2.1:      -O[23] => SYS3171,  -O[0s] => Ok
    OpenWatcom 1.0 with all optimisations enabled => Ok

Now, the docs for SYS3171 on OS/2 say "EXPLANATION: The process was terminated without running exception handlers because there was not enough room left on the stack to dispatch the exception. This is typically caused by exceptions occurring in exception handlers." I did bump the stack from 1M to 2M with no effect. I'm not concerned by the failures on OS/2 as I'm not using autoconf there, and I can special-case _sre.c easily. I am concerned about the failures on FreeBSD. It looks to me as though the only viable option is to just special case FreeBSD/gcc in configure.in and use -Os instead of -O3. I've been assuming that test_sre has passed with gcc 3.2.x -O3 on Linux since that checkin. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From martin@v.loewis.de Mon Apr 21 08:06:29 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 21 Apr 2003 09:06:29 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <m3y9242utm.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > But still, what should sum([]) do? It should raise a ValueError("no values to sum"). 
In practice, I expect it won't matter, because users will typically have values to sum. If they don't, telling them to write sum(L or [0]) is easy enough. There should be preferably only one obvious way to do it. Regards, Martin From martin@v.loewis.de Mon Apr 21 08:21:42 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 21 Apr 2003 09:21:42 +0200 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> References: <Pine.OS2.4.44.0304211124410.22617-100000@tenring.andymac.org> Message-ID: <m3u1cs2u49.fsf@mira.informatik.hu-berlin.de> Andrew MacIntyre <andymac@bullseye.apana.org.au> writes: > The failures all occur after the "Running tests on sre.search and > sre.match" phase of test_sre. Instead of trying various compilers hoping that the problem goes away, I recommend that you try to narrow down the test case that fails. Regards, Martin From aleax@aleax.it Mon Apr 21 09:29:28 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 10:29:28 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <m3y9242utm.fsf@mira.informatik.hu-berlin.de> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <m3y9242utm.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304211029.28802.aleax@aleax.it> On Monday 21 April 2003 09:06 am, Martin v. Löwis wrote: > Guido van Rossum <guido@python.org> writes: > > But still, what should sum([]) do? > > It should raise a ValueError("no values to sum"). In practice, I > expect it won't matter, because users will typically have values to > sum. If they don't, telling them to write sum(L or [0]) is easy > enough. There should be preferably only one obvious way to do it. 
I like this a lot -- particularly because it's exactly what I teach people now for max and min (except that in the cases of max and min there's the extra complication for the user of choosing WHAT he or she wants as the result for an empty list, while in the case of sum the user's life will be easier). Alex From aleax@aleax.it Mon Apr 21 09:52:55 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 10:52:55 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304211052.55557.aleax@aleax.it> On Monday 21 April 2003 02:58 am, Guido van Rossum wrote: ... > anything except sum([]) == 0, since they probably want to sum a list > of numbers, and occasionally (albeit through a bug in their program > :-) the list will be empty. > But that means that summing a sequence of Errors should never pass silently, unless explicitly silenced. I thus think that the sum of an empty sequence should raise a ValueError (just like the max or min of an empty sequence) and the idiom sum(L or [0]) should be taught to "sum up a list of numbers that might be empty". > strings ends up with a strange end case. So perhaps raising an > exception for an empty sequence, like min() and max(), is better: "In > the face of ambiguity, refuse the temptation to guess." > An optional Yes! > Alex, care to send in your patch? Aye aye, cap'n -- now that you've crossed the i's and dotted the t's I'll arrange the complete patch with tests and docs and submit it forthwith. 
Alex From aleax@aleax.it Mon Apr 21 11:52:32 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 12:52:32 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211052.55557.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> Message-ID: <200304211252.32948.aleax@aleax.it> On Monday 21 April 2003 10:52 am, Alex Martelli wrote: ... > Aye aye, cap'n -- now that you've crossed the i's and dotted the t's > I'll arrange the complete patch with tests and docs and submit it > forthwith. Done -- patch 724936 on SF, assigned to gvanrossum with priority 7 as you said to do for patches meant for 2.3beta1. As I've remarked in the patch's comments, there's something of a performance hit now with sum(manystrings) wrt ''.join(manystrings):

[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'sum(L)'
10000 loops, best of 3: 174 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' '"".join(L)'
10000 loops, best of 3: 75 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'reduce(operator.add,L)'
1000 loops, best of 3: 1.35e+03 usec per loop
[alex@lancelot Lib]$ ../python -O timeit.py -s'L=map(str,range(999))' -s'import operator' 'tot=""' 'for it in L: tot+=it'
1000 loops, best of 3: 1.33e+03 usec per loop

Nowhere near as bad as the unbounded slowdown with operator.add or the equivalent loop, but still, a solid slowdown of a factor of two. 
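The same comparison can be reproduced in-process with the timeit module; a minimal sketch of just the join-vs-loop pair (absolute numbers vary by machine and interpreter version):

```python
import timeit

# Rebuild the list of 999 short strings inside each timing run's setup,
# as in the shell measurements quoted above.
setup = "L = list(map(str, range(999)))"

join_usec = timeit.timeit('"".join(L)', setup=setup, number=1000) * 1000
loop_usec = timeit.timeit('tot = ""\nfor it in L: tot += it',
                          setup=setup, number=1000) * 1000

print("''.join(L): %.0f usec per loop" % join_usec)
print("+= loop   : %.0f usec per loop" % loop_usec)
```

The relative gap, not the absolute figures, is the point of the measurement.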
Problem is that the argument to sum MIGHT be an iterator (not a "normal" sequence) so sum must save the first item and concat the _PyString_Join of the OTHER items after the first (my unit tests were originally lax and only exercised sum with the argument being a list -- fortunately I beefed up the unit tests as part of preparing the patch for submission, so they caught this, as well as the issue with sum of a sequence mixing unicode and plain string items, which forces sum to use different concatenation code depending on the exact type of the first item...). Reasoning on this, and on "If the implementation is hard to explain, it's a bad idea", I'm starting to doubt my original intuition that "of course" sum should be polymorphic over sequences of any type supporting "+" -- maybe Tim Peters' concept that sum should be restricted to sequences of numbers is sounder -- it's irksome that, of sum's 50 lines of C, 14 should deal with the special case of "sequence of strings" and STILL involve a factor-of-2 performance hit wrt ''.join! However, HOW do we catch attempts to use sum on a sequence of strings while still allowing the use case of a sequence of timedeltas? [maybe a timedelta SHOULD be summable to 0 and we could take the 'summable to 0' as a test of numberhood?-)] I don't know, so, I've submitted the patch as it stands, and I hope somebody can suggest a better solution - I just PRAY that sum won't accept a sequence of strings AND sum them up with + , thus perpetuating the "newbie performance trap" of that idiom!-) Alex From dave@boost-consulting.com Mon Apr 21 12:14:57 2003 From: dave@boost-consulting.com (David Abrahams) Date: Mon, 21 Apr 2003 07:14:57 -0400 Subject: [Python-Dev] Hook Extension Module Import? 
In-Reply-To: <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> (Guido van Rossum's message of "Sun, 20 Apr 2003 17:20:03 -0400") References: <847k9pp5qr.fsf@boost-consulting.com> <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <84brz0nlu6.fsf@boost-consulting.com> Guido van Rossum <guido@python.org> writes: >> I think I need a way to temporarily (from 'C'), arrange to be notified >> just before and just after a new extension module is loaded. Is this >> possible? I didn't see anything obvious in the source. BTW, I'd be >> just as happy if it were possible to do the same thing for any module >> (i.e., not discriminating between extension and pure python modules). > > I think Aahz is slowly leading you in the right direction: you can > override __import__ with something that calls your pre-hook, then the > original __import__, then your post_hook. I see no problem with doing > this from C except that it's a bit verbose. So I take it a doc patch is in order. That section which claims it's impossible is certainly misleading... -- Dave Abrahams Boost Consulting www.boost-consulting.com From guido@python.org Mon Apr 21 13:04:38 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:04:38 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Sun, 20 Apr 2003 18:48:51 PDT." <20030421014851.GB18971@glacier.arctrix.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> <20030421014851.GB18971@glacier.arctrix.com> Message-ID: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum wrote: > > But if I had to do it over again, I wouldn't have added walk() in the > > current form. > > I think it's the perfect place for a generator. Absolutely! 
So let's try to write something new based on generators, make it flexible enough so that it can handle pre-order or post-order visits, and then phase out os.path.walk(). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:08:07 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:08:07 -0400 Subject: [Python-Dev] Bug/feature/patch policy for optparse.py In-Reply-To: "Your message of Sun, 20 Apr 2003 22:47:43 EDT." <20030421024743.GA3911@cthulhu.gerg.ca> References: <20030421024743.GA3911@cthulhu.gerg.ca> Message-ID: <200304211208.h3LC87O20882@pcp02138704pcs.reston01.va.comcast.net> > Hi all -- I've just thrown together Optik 1.4.1, and in turn checked in > rev 1.3 of Lib/optparse.py. From the optparse docstring:
>
> """
> If you have problems with this module, please do not file bugs,
> patches, or feature requests with Python; instead, use Optik's
> SourceForge project page:
> http://sourceforge.net/projects/optik
>
> For support, use the optik-users@lists.sourceforge.net mailing list
> (http://lists.sourceforge.net/lists/listinfo/optik-users).
> """
>
> and from a comment right after the docstring:
>
> # Python developers: please do not make changes to this file, since
> # it is automatically generated from the Optik source code.
>
> Does this policy seem reasonable to everyone? And, more importantly, > can you all please try to respect it when you find bugs in or want to > add features to optparse.py? Thanks!

Works for me. I expect that occasionally someone will forget this and check in a fix; they will surely be corrected quickly (and without *too* much embarrassment) by other developers. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From jacobs@penguin.theopalgroup.com Mon Apr 21 13:12:25 2003 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 21 Apr 2003 08:12:25 -0400 (EDT) Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211252.32948.aleax@aleax.it> Message-ID: <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> On Mon, 21 Apr 2003, Alex Martelli wrote: > On Monday 21 April 2003 10:52 am, Alex Martelli wrote: > ... > > Aye aye, cap'n -- now that you've crossed the i's and dotted the t's > > I'll arrange the complete patch with tests and docs and submit it > > forthwith. > > Done -- patch 724936 on SF, assigned to gvanrossum with priority 7 > as you said to do for patches meant for 2.3beta1. Just to make sure I understand the desired semantics, is this Python implementation of sum() accurate:

def sum(l):
    '''sum(sequence) -> value

    Returns the sum of a non-empty sequence of numbers (or other objects
    that can be added to each other, such as strings, lists, tuples...).'''

    it = iter(l)
    next = it.next

    try:
        first = next()
    except StopIteration:
        raise ValueError, 'sum() arg is an empty sequence'

    # Special-case sequences of strings, for speed
    if isinstance(first, str):
        try:
            return first + ''.join(it)
        except:
            pass

    try:
        while 1:
            first += next()
    except StopIteration:
        return first

The speed optimization for string sequences is slightly different, but exposes the same fast-path for the vast majority of likely inputs. -Kevin -- -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From guido@python.org Mon Apr 21 13:26:41 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:26:41 -0400 Subject: [Python-Dev] Hook Extension Module Import? In-Reply-To: "Your message of Mon, 21 Apr 2003 07:14:57 EDT." 
<84brz0nlu6.fsf@boost-consulting.com> References: <847k9pp5qr.fsf@boost-consulting.com> <200304202120.h3KLK3w19764@pcp02138704pcs.reston01.va.comcast.net> <84brz0nlu6.fsf@boost-consulting.com> Message-ID: <200304211226.h3LCQfc22713@pcp02138704pcs.reston01.va.comcast.net> > So I take it a doc patch is in order. That section which claims it's > impossible is certainly misleading... I have no idea where it says that, so yes, please submit a patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:30:28 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:30:28 -0400 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: "Your message of 20 Apr 2003 21:33:32 PDT." <1050899612.591.21.camel@sayge.arc.nasa.gov> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> Message-ID: <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> Since it already exists as a 3rd party package, we should definitely not try to duplicate the effort. Then the question is, is it enough to point to the 3rd party package or does it deserve to be incorporated into the core? We can't go and incorporate every useful 3rd party package into the core (that's the job of the SUMO distribution project -- which unfortunately seems to have stalled). OTOH, having it in the core, with decent documentation, might prevent naive wannabe-statisticians like myself from misremembering how standard deviation is implemented, or when to use it. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 13:48:58 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 08:48:58 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: "Your message of Mon, 21 Apr 2003 12:52:32 +0200." 
<200304211252.32948.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> Message-ID: <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> OK, let me summarize and pronounce. sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I still think it's a matter of getting used to, like indentation), we could add join(seq, delim) as a built-in. VB has one. :-) sum([]) could either return 0 or raise ValueError. I lean towards 0 because that is occasionally useful and reinforces the numeric intention. I think making it return 0 will prevent end-case bugs where a newbie sums a list that is occasionally empty. If we made it an error, I expect that in 99% of the cases the response to that error would be to change the program to make it return 0 if the list is empty, and I can't imagine many bugs caused by choosing 0 over some other numerical zero. Having to teach the idiom sum(S or [0]) is ugly, and this doesn't work if S is an iterator. I appreciate Tim's point of wanting to sum "number-like" objects that can't be added to 0. OTOH if we provide *any* way of providing a different starting point, some creative newbie is going to use sum(list_of_strings, "") instead of "".join(), and be hurt by the performance months later. If we add an optional argument for Tim's use case, it could be used in two different ways: (1) only when the sequence is empty, (2) always used as a starting point. IMO (2) is more useful and more consistent. Here's one suggestion to deal with the sequence_of_strings issue (though maybe too pedantic): explicitly check whether the second argument is a string or unicode object, and in that case raise a TypeError indicating that a numeric value is required and suggesting to use "".join() for summing a sequence of strings. 
So here's a strawman implementation:

def sum(seq, start=0):
    if isinstance(start, basestring):
        raise TypeError, "can't sum strings; use ''.join(seq) instead"
    return reduce(operator.add, seq, start)

Alex, go ahead and implement this! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 21 14:43:09 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 09:43:09 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: Your message of "Mon, 21 Apr 2003 08:12:25 EDT." <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> References: <Pine.LNX.4.44.0304210803440.4107-100000@penguin.theopalgroup.com> Message-ID: <200304211343.h3LDh9W21923@odiug.zope.com> > Just to make sure I understand the desired semantics, is this Python > implementation of sum() accurate: We're no longer aiming for this, but let me point out the fatal flaw in this approach:

> def sum(l):
>     '''sum(sequence) -> value
>
>     Returns the sum of a non-empty sequence of numbers (or other objects
>     that can be added to each other, such as strings, lists, tuples...).'''
>
>     it = iter(l)
>     next = it.next
>
>     try:
>         first = next()
>     except StopIteration:
>         raise ValueError, 'sum() arg is an empty sequence'
>
>     # Special-case sequences of strings, for speed
>     if isinstance(first, str):
>         try:
>             return first + ''.join(it)
>         except:
>             pass

Suppose the iterator was iter(["a", "b", "c", 1, 2, 3]). The "a" is held in the variable 'first'. The "".join() code consumes "b", "c" and 1, and then raises an exception. At this point, there's no way to recover the values swallowed by "".join(), so there's no way to continue. 
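The unrecoverable-consumption problem is easy to demonstrate; fragile_sum below is a hypothetical stand-in for the flawed fast path, not the code from the message:

```python
def fragile_sum(seq):
    # Sketch of the flawed optimization: hand the rest of the iterator
    # to "".join() and try to fall back to += if the join fails.
    it = iter(seq)
    first = next(it)
    if isinstance(first, str):
        try:
            return first + "".join(it)
        except TypeError:
            pass  # too late: "".join() already consumed items from it
    total = first
    for item in it:
        total += item
    return total

# "b", "c" and 1 were swallowed by the failed join, so the fallback
# loop sees an exhausted iterator and silently returns just "a".
print(fragile_sum(["a", "b", "c", 1]))  # prints: a
```

Either outcome is wrong: swallowing the items silently loses data, and re-raising reports an error the unoptimized path would not have raised.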
But letting the exception raised by "".join() propagate isn't right either: suppose that instead of [1, 2, 3] the sequence ended with some instances of a class that knows how to add itself to a string: the optimization attempt would cause an error to be thrown that wouldn't have been thrown without the optimization, a big no-no for optimizations.

>     try:
>         while 1:
>             first += next()
>
>     except StopIteration:
>         return first
>
> The speed optimization for string sequences is slightly different, but > exposes the same fast-path for the vast majority of likely inputs.

Of course, it might have been okay to only invoke "".join() if the argument was a *list* of strings. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax@aleax.it Mon Apr 21 16:03:24 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 17:03:24 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304211703.24685.aleax@aleax.it> On Monday 21 April 2003 02:48 pm, Guido van Rossum wrote: > OK, let me summarize and pronounce. > > sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I > still think it's a matter of getting used to, like indentation), we I entirely agree on this. Differently from reduce(operator.add, XX), ''.join(XX) *CAN* be taught quite reasonably to bright beginners without any special math/CS background, in my experience. The noise against ''.join IMHO comes mostly from a crowd of "OO purists" who just don't see WHY it's RIGHT for it to be that way!-) > could add join(seq, delim) as a built-in. VB has one. :-) VB has lots of stuff, but we don't need this one. Please. One obvious way to do it (at least if you are Dutch...!). > sum([]) could either return 0 or raise ValueError. 
> I lean towards 0 > because that is occasionally useful and reinforces the numeric > intention. I think making it return 0 will prevent end-case bugs > where a newbie sums a list that is occasionally empty. If we made it > an error, I expect that in 99% of the cases the response to that error > would be to change the program to make it return 0 if the list is > empty, and I can't imagine many bugs caused by choosing 0 over some > other numerical zero. Having to teach the idiom sum(S or [0]) is > ugly, and this doesn't work if S is an iterator. You're right that S or [0] doesn't work for iterators, AND that bright beginners expect 0 rather than an error (fortunately I have some of those at hand to check with;-). So, sum([])==0 it is. > I appreciate Tim's point of wanting to sum "number-like" objects that > can't be added to 0. OTOH if we provide *any* way of providing a > different starting point, some creative newbie is going to use > sum(list_of_strings, "") instead of "".join(), and be hurt by the > performance months later. Yes yes yes! > If we add an optional argument for Tim's use case, it could be used in > two different ways: (1) only when the sequence is empty, (2) always > used as a starting point. IMO (2) is more useful and more consistent. > > Here's one suggestion to deal with the sequence_of_strings issue > (though maybe too pedantic): explicitly check whether the second > argument is a string or unicode object, and in that case raise a > TypeError indicating that a numeric value is required and suggesting > to use "".join() for summing a sequence of strings. I like this!!! > So here's a strawman implementation:
>
> def sum(seq, start=0):
>     if isinstance(start, basestring):
>         raise TypeError, "can't sum strings; use ''.join(seq) instead"
>     return reduce(operator.add, seq, start)
>
> Alex, go ahead and implement this! Coming right up! 
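The intended semantics can be sketched with a pure-Python stand-in (sum_ is a hypothetical name; the actual built-in ended up as C code):

```python
import functools
import operator

def sum_(seq, start=0):
    # Strawman semantics: refuse string starting points outright,
    # otherwise fold the whole sequence onto start with +.
    if isinstance(start, str):
        raise TypeError("can't sum strings; use ''.join(seq) instead")
    return functools.reduce(operator.add, seq, start)

print(sum_([]))              # 0 -- the empty sum is the numeric zero
print(sum_([1, 2, 3]))       # 6
print(sum_([[1], [2]], []))  # [1, 2] -- any addable start value works
```

Note how the start argument covers the "number-like objects that can't be added to 0" use case: passing an appropriate zero-ish value (an empty list, a zero timedelta) makes the fold well-typed from the first addition.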
Alex From fincher.8@osu.edu Mon Apr 21 17:56:42 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Mon, 21 Apr 2003 12:56:42 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKFEDAB.tim.one@comcast.net> Message-ID: <200304211256.42839.fincher.8@osu.edu> On Sunday 20 April 2003 10:12 pm, Tim Peters wrote: > if 'CVS' in dirs: > dirs.remove('CVS') This code brought up an interesting question to me: if sets have a .discard method that removes an element without raising KeyError if the element isn't in the set, should lists perhaps have that same method? On another related front, sets (in my Python 2.3a2) raise KeyError on a .remove(elt) when elt isn't in the set. Since sets aren't mappings, should that be a ValueError (like list raises) instead? Jeremy From aahz@pythoncraft.com Mon Apr 21 17:05:19 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 21 Apr 2003 12:05:19 -0400 Subject: [Python-Dev] ''.join() again In-Reply-To: <200304211703.24685.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> Message-ID: <20030421160515.GA26557@panix.com> On Mon, Apr 21, 2003, Alex Martelli wrote: > > I entirely agree on this. Differently from reduce(operator.add, XX), > ''.join(XX) *CAN* be taught quite reasonably to bright beginners > without any special math/CS background, in my experience. The > noise against ''.join IMHO comes mostly from a crowd of "OO > purists" who just don't see WHY it's RIGHT for it to be that way!-) Well, this means it's time for my regular reminder that I'm very far from an OO purist and I still hate ''.join(). 
OTOH, I've been using it recently for some of my own code, and while I'll never change my mind about its visual ugliness, I've got to admit that it has one cardinal virtue: you can never forget what order its arguments belong in. So I'll stop ranting about ''.join() except when people like Alex make sneers about OO purists. ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From barry@python.org Mon Apr 21 17:23:39 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 12:23:39 -0400 Subject: [Python-Dev] ''.join() again In-Reply-To: <20030421160515.GA26557@panix.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> Message-ID: <1050942219.30896.11.camel@barry> On Mon, 2003-04-21 at 12:05, Aahz wrote: > Well, this means it's time for my regular reminder that I'm very far > from an OO purist and I still hate ''.join(). OTOH, I've been using it > recently for some of my own code, and while I'll never change my mind > about its visual ugliness, I've got to admit that it has one cardinal > virtue: you can never forget what order its arguments belong in. And I'll do my semi-regular rant that COMMASPACE.join(seq) looks a lot nicer than ', '.join(seq) even to the point of starting to /like/ this idiom. 
:) -Barry From pinard@iro.umontreal.ca Mon Apr 21 18:44:12 2003 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois_Pinard?=) Date: 21 Apr 2003 13:44:12 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <20030421160515.GA26557@panix.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> Message-ID: <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> [Aahz] > Well, [...] I still hate ''.join(). [...] I'll never change my mind > about its visual ugliness, Same here. I'm getting used to it like children get used to cigars: they vomit for some time, and after a while, learn to like them. Cigars, like the construct above, still destroy taste, and are not ideal for health! :-) > I've got to admit that it has one cardinal virtue: you can never forget > what order its arguments belong in. But yet, it is so unnatural and brain damaging that it sometimes induces me into using the wrong order of arguments for `A.split(B)'. I tried complaining as loud as I could, while staying civilised, before the above was put into Python, but nobody seemed interested to listen. As much as I appreciate most additions to Python from 1.6 and on, that particular one has been and will stay a long lasting mistake. I still love Python! :-) 
:-) -- François Pinard http://www.iro.umontreal.ca/~pinard From barry@python.org Mon Apr 21 19:01:17 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 14:01:17 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> Message-ID: <1050948076.30896.33.camel@barry> On Mon, 2003-04-21 at 13:44, François Pinard wrote: > > And I'll do my semi-regular rant that > > COMMASPACE.join(seq) > > looks a lot nicer than > > ', '.join(seq) > > even to the point of starting to /like/ this idiom. :) > > Having to name simple string constants like a single space looks overkill. > It hardly salvages the original ugliness. Admit it: you're stuck! :-) Never! And though I don't smoke, some of my fondest childhood memories are walking around the block with my grandfather while he smoked his cigars. :) 'Course, you don't /have/ to name your string constants, though I usually do because it improves readability, and because I invariably find several uses for the same string constant in a single module. OTOH, I wouldn't object too strenuously to a join() builtin, but I'd probably never use it -- I'm sure I'd rarely remember the argument order and hate having to look it up much more than writing out the current spelling. Admit it: there is no natural unforgettable order! 
:) -Barry From zack@codesourcery.com Mon Apr 21 19:08:56 2003 From: zack@codesourcery.com (Zack Weinberg) Date: Mon, 21 Apr 2003 11:08:56 -0700 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <1050948076.30896.33.camel@barry> (Barry Warsaw's message of "21 Apr 2003 14:01:17 -0400") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> <1050948076.30896.33.camel@barry> Message-ID: <87el3v67uv.fsf@egil.codesourcery.com> Barry Warsaw <barry@python.org> writes: > OTOH, I wouldn't object too strenuously to a join() builtin, but I'd > probably never use it -- I'm sure I'd rarely remember the argument order > and hate having too look it up much more than writing out the current > spelling. Admit it: there is no natural unforgetable order! :) It occurs to me that one may put join = str.join at the top of one's module, and thereafter use join('str', sequence) (For 2.1 backward compatibility, use type('') instead of str.) Possibly this is a counterargument to accusations that a join builtin would be bloat, since the same implementation could be used for both. 
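A quick check of the unbound-method spelling (nothing here beyond what str.join already provides):

```python
# str.join called as a plain function: the delimiter comes first,
# giving the argument order join(delimiter, sequence_of_strings).
join = str.join

print(join(", ", ["a", "b", "c"]))  # prints: a, b, c
print(join("", ["py", "thon"]))     # prints: python
```

The delimiter-first order is exactly the one a join(seq, delim) builtin would have had to pick, modulo which argument leads.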
zw From barry@python.org Mon Apr 21 19:18:34 2003 From: barry@python.org (Barry Warsaw) Date: 21 Apr 2003 14:18:34 -0400 Subject: [Python-Dev] Re: ''.join() again In-Reply-To: <87el3v67uv.fsf@egil.codesourcery.com> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <20030421160515.GA26557@panix.com> <oqhe8rd9ub.fsf@titan.progiciels-bpi.ca> <1050948076.30896.33.camel@barry> <87el3v67uv.fsf@egil.codesourcery.com> Message-ID: <1050949114.30943.41.camel@barry> On Mon, 2003-04-21 at 14:08, Zack Weinberg wrote: > Possibly this is a counterargument to accusations that a join builtin > would be bloat Or necessary <wink>. -Barry From tjreedy@udel.edu Mon Apr 21 20:01:32 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Mon, 21 Apr 2003 15:01:32 -0400 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <b81ebo$cpq$1@main.gmane.org> "Guido van Rossum" <guido@python.org> wrote in message news:200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net... > sum(sequence_of_strings) is out. *If* "".join() is really too ugly (I > still think it's a matter of getting used to, like indentation), we > could add join(seq, delim) as a built-in. VB has one. :-) Given that we already have the 'less ugly' alternative str.join(delim, strseq), both sum(strseq) and a hypothetical builtin seem unnecessary. And, an explicit udelim.join(sseq) or unicode.join(udelim, sseq) nicely handles mixed seqs without type guessing.

>>> str.join('', ['a','b','c'])
'abc'
>>> unicode.join(u'', ['a',u'b'])
u'ab'

Terry J. 
Reedy From noah@noah.org Mon Apr 21 19:55:31 2003 From: noah@noah.org (Noah Spurrier) Date: Mon, 21 Apr 2003 11:55:31 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA34034.9060109@ActiveState.com> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> Message-ID: <3EA43EA3.1030903@noah.org> Guido>> This idea has merit, although I'm not sure I'd call this depth first; Guido>> it's more a matter of pre-order vs. post-order, isn't it? I thought the names were synonymous, but a quick look on Google showed that post-order seems more specific to binary trees whereas depth first is more general, but I didn't look very hard and all my college text books are in storage :-) Depth first is more intuitive, but post order is more descriptive of what the algorithm does. If I were writing documentation (or reading it) I would prefer "depth first". Guido>> - How often does one need this? I write these little directory/file filters quite often. I have come across this problem before, of renaming the directories you are traversing. In the past the trees were small, so I just renamed the directories by hand and used os.path.walk() to handle the files. Recently I had to rename a very large tree which prompted me to look for a better solution. Guido>> - When needed, how hard is it to hand-code a directory walk? It's not Guido>> like the body of the walk() function is rocket science. True, it is easy to write. It would make a good exercise for a beginner, but I think it's better to have it than to not have it since I think a big part of the appeal of Python is the "little" algorithms. It also fits with the Python Batteries Included philosophy and benefits the "casual" Python user. Finally, I just find it generally useful. I use directory walkers a lot. david>> That's hardly the point of improving the standard library, though, is david>> it? 
I'm all for putting the kitchen sink in there, especially if it david>> originates with a use case ("I had some dishes to wash..." ;-) Guido> Guido>But if I had to do it over again, I wouldn't have added walk() in the Guido>current form. I often find it harder to fit a particular program's Guido>needs in the API offered by walk() than it is to reimplement the walk Guido>myself. That's why I'm concerned about adding to it. The change is small and the interface is backward compatible, but if you are actually trying to discourage people from using os.path.walk() in the future then I would vote for deprecating it and replacing it with a generator where the default is depthfirst ;-) Below is a sample tree walker using a generator. I was delighted to find that generators work in recursive functions, but it gave me a headache to think about for the first time. Perhaps it could be prettier, but this demonstrates the basic idea. Yours, Noah # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os import stat def walktree (top = ".", depthfirst = True): """This walks a directory tree, starting from the 'top' directory. This is somewhat like os.path.walk, but using generators instead of a visit function. One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. 
""" names = os.listdir(top) if not depthfirst: yield top, names for name in names: try: st = os.lstat(os.path.join(top, name)) except os.error: continue if stat.S_ISDIR(st.st_mode): for (newtop, children) in walktree (os.path.join(top, name), depthfirst): yield newtop, children if depthfirst: yield top, names def test(): for (basepath, children) in walktree(): for child in children: print os.path.join(basepath, child) if __name__ == '__main__': test() From drifty@alum.berkeley.edu Mon Apr 21 19:57:22 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 21 Apr 2003 11:57:22 -0700 (PDT) Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <b81ebo$cpq$1@main.gmane.org> References: <200304192343.48211.aleax@aleax.it> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> <200304211052.55557.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <b81ebo$cpq$1@main.gmane.org> Message-ID: <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> [Terry Reedy] > Given that we already have the 'less ugly' alternative str.join(delim, > strseq), Yes, but the string module will go away *someday*, so having it now does not matter much. -Brett From aleax@aleax.it Mon Apr 21 20:07:31 2003 From: aleax@aleax.it (Alex Martelli) Date: Mon, 21 Apr 2003 21:07:31 +0200 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> References: <200304192343.48211.aleax@aleax.it> <b81ebo$cpq$1@main.gmane.org> <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> Message-ID: <200304212107.31876.aleax@aleax.it> On Monday 21 April 2003 08:57 pm, Brett Cannon wrote: > [Terry Reedy] > > > Given that we already have the 'less ugly' alternative str.join(delim, > > strseq), > > Yes, but the string module will go away *someday*, so having it now does > not matter much. 
Terry mentioned the type (str), not the module (string). The type's not gonna go away anytime soon... Alex From tim.one@comcast.net Mon Apr 21 20:15:54 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 15:15:54 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> This is a multi-part message in MIME format. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT [Guido] >>> But if I had to do it over again, I wouldn't have added walk() in the >>> current form. [Neil Schemenauer] >> I think it's the perfect place for a generator. [Guido] > Absolutely! So let's try to write something new based on generators, > make it flexible enough so that it can handle pre-order or post-order > visits, and then phase out os.walk(). I posted one last night, with a bug (it failed to pass the topdown flag through to recursive calls). Here's that again, with the bug repaired, sped up some, and with a docstring. Double duty: the example in the docstring shows why we don't want to make a special case out of sum([]): empty lists can arise naturally. What else would people like in this? I really like separating the directory names from the plain-file names, so don't bother griping about that <wink>. It's at least as fast as the current os.path.walk() (it's generally faster for me, but times for this are extremely variable on Win98). Removing the internal recursion doesn't appear to make a measurable difference when walking my Python tree, although because recursive generators require time proportional to the current stack depth to deliver a result to the caller, and to resume again, removing recursion could be much more efficient on an extremely deep tree. 
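[Archive note: for readers curious what removing the internal recursion would look like, here is a hedged sketch -- not Tim's attachment -- driving a preorder walk from an explicit stack, so resuming the generator costs the same at any depth. Written against the modern os API as an assumption; the function name is invented:]

```python
import os
from os.path import join, isdir

def walk_nonrecursive(top):
    """Preorder directory walk driven by an explicit stack instead of
    recursive generator calls; yielding is O(1) regardless of depth."""
    stack = [top]
    while stack:
        current = stack.pop()
        try:
            names = os.listdir(current)
        except OSError:
            continue
        dirs = [name for name in names if isdir(join(current, name))]
        nondirs = [name for name in names if not isdir(join(current, name))]
        yield current, dirs, nondirs
        # Push in reverse sorted order so the first subdirectory pops first,
        # preserving left-to-right preorder.
        for name in reversed(sorted(dirs)):
            stack.append(join(current, name))
```

Postorder ("bottom up") is less natural in this style, which is one reason the recursive formulation won out.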
The biggest speedup I could find on Windows was via using os.chdir() liberally, so that os.path.join() calls weren't needed, and os.path.isdir() calls worked directly on one-component names. I suspect this has to do with the fact that Win98 doesn't have an effective way to cache directory lookups under the covers. Even so, it only amounted to a 10% speedup: directory walking is plain slow on Win98 no matter how you do it. The attached doesn't play any gross speed tricks. --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ) Content-type: text/plain; name=walk.py Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=walk.py def walk(top, topdown=True): """Directory tree generator. For each directory in the directory tree rooted at top (including top itself, but excluding '.' and '..'), yields a 3-tuple dirpath, dirnames, filenames dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. If optional arg 'topdown' is true or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top down). If topdown is false, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom up). When topdown is true, the caller can modify the dirnames list in-place (e.g. via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, or to impose a specific order of visiting. Modifying dirnames when topdown is false is ineffective, since the directories in dirnames have already been generated by the time dirnames itself is generated. Caution: if you pass a relative pathname for top, don't change the current working directory between resumptions of walk. 
Example: from os.path import join, getsize for root, dirs, files in walk('python/Lib/email'): print root, "consumes", print sum([getsize(join(root, name)) for name in files]), print "bytes in", len(files), "non-directory files" if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories """ import os from os.path import join, isdir try: names = os.listdir(top) except os.error: return exceptions = ('.', '..') dirs, nondirs = [], [] for name in names: if name not in exceptions: if isdir(join(top, name)): dirs.append(name) else: nondirs.append(name) if topdown: yield top, dirs, nondirs for name in dirs: for x in walk(join(top, name), topdown): yield x if not topdown: yield top, dirs, nondirs --Boundary_(ID_obWR1ARDlxFPCnCmvHvojQ)-- From andymac@bullseye.apana.org.au Mon Apr 21 13:59:45 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Mon, 21 Apr 2003 23:59:45 +1100 (edt) Subject: [Python-Dev] New re failures on Windows In-Reply-To: <m3u1cs2u49.fsf@mira.informatik.hu-berlin.de> Message-ID: <Pine.OS2.4.44.0304212340550.27154-100000@tenring.andymac.org> On 21 Apr 2003, Martin v. Löwis wrote: > > The failures all occur after the "Running tests on sre.search and > > sre.match" phase of test_sre. > Instead of trying various compilers hoping that the problem goes away, > I recommend that you try to narrow down the test case that fails. I never had any hope the problem would "go away". I've been trying to quantify the extent of the problem, by finding out which compilers exhibit the failure with what optimisation settings, so that the autoconf configurations generated don't result in interpreters that blow up unexpectedly. As it appears the issue is confined to gcc, and so far only on FreeBSD and OS/2, I've got bugger all chance of resolving this in the gcc context. I'm sure that others would have screamed by now if gcc on Linux was similarly failing, which would have given more scope for resolving the issue. 
For all I know, it could be binutils related, as I seem to recall Andrew Koenig encountering something along these lines. I have a patch to configure.in which I'll upload to SF shortly which lowers the optimisation for FreeBSD. Not my preferred outcome, but all I'm able to offer in my current circumstances. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Mon Apr 21 20:30:29 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 21 Apr 2003 15:30:29 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Mon, 21 Apr 2003 15:15:54 EDT." <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEMDEDAB.tim.one@comcast.net> Message-ID: <200304211930.h3LJUTk14894@odiug.zope.com> > Here's that again, with the bug repaired, sped up some, and with a > docstring. Double duty: the example in the docstring shows why we > don't want to make a special case out of sum([]): empty lists can > arise naturally. > > What else would people like in this? I really like separating the > directory names from the plain-file names, so don't bother griping > about that <wink>. Good enough for me. :-) > It's at least as fast as the current os.path.walk() (it's generally > faster for me, but times for this are extremely variable on Win98). > Removing the internal recursion doesn't appear to make a measureable > difference when walking my Python tree, although because recursive > generators require time proportional to the current stack depth to > deliver a result to the caller, and to resume again, removing > recursion could be much more efficient on an extremely deep tree. 
> The biggest speedup I could find on Windows was via using os.chdir() > liberally, so that os.path.join() calls weren't needed, and > os.path.isdir() calls worked directly on one-component names. I > suspect this has to do with the fact that Win98 doesn't have an > effective way to cache directory lookups under the covers. Even so, > it only amounted to a 10% speedup: directory walking is plain slow > on Win98 no matter how you do it. The attached doesn't play any > gross speed tricks. Please don't use chdir(), no matter how much it speeds things up. It's a disaster in a multi-threaded program. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Mon Apr 21 20:31:18 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Mon, 21 Apr 2003 21:31:18 +0200 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA43EA3.1030903@noah.org> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <3EA43EA3.1030903@noah.org> Message-ID: <3EA44706.3010806@v.loewis.de> Noah Spurrier wrote: > I thought the names were synonymous, but a quick look on Google > showed that post-order seems more specific to binary trees whereas > depth first is more general, but I didn't look very hard and all my > college text books are in storage :-) Depth first is more intuitive, but > post order is more descriptive of what the algorithm does. > If I were writing documentation (or reading it) I would prefer "depth > first". I'm tempted to declare this off-topic: depth-first means "traverse children before traversing siblings". Depth-first comes in three variations: pre-order (traverse node first, then children, then siblings), in-order (only for binary trees: traverse left child first, then node, then right child, then sibling), post-order (traverse children first, then node, then siblings). 
There is also breadth-first: traverse siblings first, then children. > I write these little directory/file filters quite often. I have come across > this problem of renaming the directories you are traversing before. I still can't understand why you can't use os.path.walk for that. Did you know that you can modify the list that is passed to the callback, and that walk will continue to visit the elements in the list? Regards, Martin From drifty@alum.berkeley.edu Mon Apr 21 21:53:40 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 21 Apr 2003 13:53:40 -0700 (PDT) Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304212107.31876.aleax@aleax.it> References: <200304192343.48211.aleax@aleax.it> <b81ebo$cpq$1@main.gmane.org> <Pine.SOL.4.55.0304211156410.28903@death.OCF.Berkeley.EDU> <200304212107.31876.aleax@aleax.it> Message-ID: <Pine.SOL.4.55.0304211353090.3640@death.OCF.Berkeley.EDU> [Alex Martelli] > On Monday 21 April 2003 08:57 pm, Brett Cannon wrote: > > [Terry Reedy] > > > > > Given that we already have the 'less ugly' alternative str.join(delim, > > > strseq), > > > > Yes, but the string module will go away *someday*, so having it now does > > not matter much. > > Terry mentioned the type (str), not the module (string). The type's not > gonna go away anytime soon... > Oops. =) Sorry about that mix-up. 
-Brett From python@rcn.com Mon Apr 21 21:58:54 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 21 Apr 2003 16:58:54 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> Message-ID: <003d01c30848$ebcc2d00$ec11a044@oemcomputer> > > So here's a strawman implementation: > > > > def sum(seq, start=0): > > if isinstance(start, basestring): > > raise TypeError, "can't sum strings; use ''.join(seq) instead" > > return reduce(operator.add, seq, start) > > > > Alex, go ahead and implement this! > > Coming right up! For the C implementation, consider bypassing operator.add and calling the nb_add slot directly. It's faster and fulfills the intention to avoid the alternative call to sq_concat. Also, think about whether you want to match the two argument styles of min() and max(): >>> max(1,2,3) 3 >>> max([1,2,3]) 3 When the patch is ready, feel free to assign it to me for the code review. Raymond Hettinger P.S. Your new builtin works great with itertools. def dotproduct(vec1, vec2): return sum(itertools.imap(operator.mul, vec1, vec2)) From python@rcn.com Mon Apr 21 22:23:25 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 21 Apr 2003 17:23:25 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <003d01c30848$ebcc2d00$ec11a044@oemcomputer> Message-ID: <005e01c3084c$3fe7d300$ec11a044@oemcomputer> [RH] > For the C implementation, consider bypassing operator.add > and calling the nb_add slot directly. It's faster and fulfills > the intention to avoid the alternative call to sq_concat. 
Forget I said that, you still need PyNumber_Add() to handle coercion and such. Though without some special casing it's going to be darned difficult to match the performance of a pure python for-loop (especially for a sequence of integers). Raymond Hettinger From tim.one@comcast.net Mon Apr 21 22:31:20 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 17:31:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <003d01c30848$ebcc2d00$ec11a044@oemcomputer> Message-ID: <LNBBLJKPBEHFEDALKOLCGENCEDAB.tim.one@comcast.net> [Raymond Hettinger] > For the C implementation, consider bypassing operator.add > and calling the nb_add slot directly. It's faster and fulfills > the intention to avoid the alternative call to sq_concat. Checking for the existence of a (non-NULL) nb_add slot may be slicker than special-casing strings, but I'm not sure it's ever going to work if we try to call nb_add directly. In the end, I expect we'd have to duplicate all the logic in abstract.c's private binary_op1() to get all the endcases straight: /* Calling scheme used for binary operations: v w Action ------------------------------------------------------------------- new new w.op(v,w)[*], v.op(v,w), w.op(v,w) new old v.op(v,w), coerce(v,w), v.op(v,w) old new w.op(v,w), coerce(v,w), v.op(v,w) old old coerce(v,w), v.op(v,w) [*] only when v->ob_type != w->ob_type && w->ob_type is a subclass of v->ob_type Legend: ------- * new == new style number * old == old style number * Action indicates the order in which operations are tried until either a valid result is produced or an error occurs. 
*/ OTOH, when the nb_add slot isn't NULL, the public PyNumber_Add (the same as operator.add) will do no more than invoke binary_op1 (unless the nb_add slot returns NotImplemented, which is another endcase you have to consider when calling nb_add directly -- I believe the Python core calls nb_add directly in only one place, when it already knows that both operands are ints, and that their sum overflows an int, so wants long.__add__ to handle it). > Also, think about whether you want to match to two argument > styles for min() and max(): > >>> max(1,2,3) > 3 > >>> max([1,2,3]) > 3 Guido already Pronounced on that -- max(x, y) is the clearest way to perform that operation, but there's no point to making sum(x, y) an obscure way to spell x+y (I suppose you want it as a builtin synonym for operator.add, though <wink>). > ... > P.S. Your new builtin works great with itertools. > def dotproduct(vec1, vec2): > return sum(itertools.imap(operator.mul, vec1, vec2)) Cool! From cnetzer@mail.arc.nasa.gov Mon Apr 21 23:07:28 2003 From: cnetzer@mail.arc.nasa.gov (Chad Netzer) Date: 21 Apr 2003 15:07:28 -0700 Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ') In-Reply-To: <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1050962848.584.9.camel@sayge.arc.nasa.gov> On Mon, 2003-04-21 at 05:30, Guido van Rossum wrote: > Since it already exists as a 3rd party package, we should definitely > not try to duplicate the effort. Then the question is, is it enough > to point to the 3rd party package or does it deserve to be > incorporated into the core? We can't go and incorporate every useful > 3rd party package into the core True. 
I just happen to be of the opinion that a statistics package is the single most practical and useful mathematics package that can be added to a language, after basic linear algebra (which isn't in the core... Hmmm) > OTOH, having it in the core, with decent documentation, might prevent > naive wannabe-statisticians like myself from misremembering how > standard deviation is implemented, or when to use it. :-) My concern, like yours, is that this kind of thing is probably reimplemented a LOT (at least the simple stats functions). If we adopted it, I would actually favor keeping it fairly lightweight (although t-tests and even ANOVA should go in). Heavyweight users could always download a separate add on package. Of course, the stats.py package (and its SciPy cousin) DOES seem to be well maintained, so perhaps the issue is just making sure those that might need it can easily download and install it (ie. promote it to distributions, give it proper promotion on Vaults of Parnassus, mirror it, etc.). My personal preference would be to make it standard (as I probably would like NumPy to become). I like to use it in my unittests. :) That may not be the consensus, though. -- Chad Netzer (any opinion expressed is my own and not NASA's or my employer's) From noah@noah.org Mon Apr 21 23:15:09 2003 From: noah@noah.org (Noah Spurrier) Date: Mon, 21 Apr 2003 15:15:09 -0700 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option Message-ID: <3EA46D6D.8070606@noah.org> I like your version, although I used a different name to avoid confusion with os.path.walk. Note that os.listdir does not include the special entries '.' and '..' even if they are present in the directory, so there is no need to remove them. 
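[Archive note: the point about os.listdir is easy to confirm with a throwaway check in a scratch directory:]

```python
import os
import tempfile

# os.listdir reports only the real entries of a directory -- the special
# '.' and '..' names never appear in its result.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, 'data.txt'), 'w').close()
entries = os.listdir(scratch)  # ['data.txt'] and nothing else
```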
Tim Peters> for path in walk(root): Tim Peters>Or it could look like Tim Peters> for top, names in walk(root): Tim Peters>or Tim Peters> for top, dirnames, nondirnames in walk(root): I like the idea of yielding (top, dirs, nondirs), but often I want to perform the same operations on both dirs and nondirs so separating them doesn't help that case. This seems to be a situation where there is no typical case, so my preference is for the simpler interface. It also eliminates the need to build two new lists from the list you get from os.listdir()... In fact, I prefer your first suggestion (for path in walk(root)), but that would require building a new list by prepending the basepath to each element of children because os.listdir does not return full path. So finally in this example, I just went with returning the basepath and the children (both files and directories). Following Tom Good's example I added an option to ignore symbolic links. It would be better to detect cycles or at least prevent going higher up in the tree. Tim Peters>obvious topdown argument, note a subtlety: when topdown is True, the caller Tim Peters>can prune the search by mutating the dirs list yielded to it. For example, This example still allows you to prune the search in Breadth first mode by removing elements from the children list. That is cool. for top, children in walk('C:/code/python', depthfirst=False): print top, children if 'CVS' in children: children.remove('CVS') Yours, Noah from __future__ import generators # needed for Python 2.2 # Inspired by Doug Fort from an ActiveState Python recipe. # His version didn't use recursion and didn't do depth first. import os def walktree (basepath=".", depthfirst=True, ignorelinks=True): """This walks a directory tree, starting from the basepath directory. This is somewhat like os.path.walk, but using generators instead of a visit function. 
One important difference is that walktree() defaults to DEPTH first with optional BREADTH first, whereas the os.path.walk function allows only BREADTH first. Depth first was made the default because it is safer if you are going to be modifying the directory names you visit. This avoids the problem of renaming a directory before visiting the children of that directory. The ignorelinks option determines whether to follow symbolic links. Some symbolic links can lead to recursive traversal cycles. A better way would be to detect and prune cycles. """ children = os.listdir(basepath) if not depthfirst: yield basepath, children for name in children: fullpath = os.path.join(basepath, name) if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)): for (next_basepath, next_children) in walktree (fullpath, depthfirst, ignorelinks): yield next_basepath, next_children if depthfirst: yield basepath, children def test(): for (basepath, children) in walktree(): for name in children: print os.path.join(basepath, name) if __name__ == '__main__': test() From tim.one@comcast.net Tue Apr 22 00:33:03 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 19:33:03 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <3EA46D6D.8070606@noah.org> Message-ID: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net> [Noah Spurrier] > I like your version; although I used a different name to avoid > confusion with os.path.walk. Who's confused <wink>? I agree it needs some other name if something like this gets checked in. > Note that os.listdir does not include the special entries '.' and '..' > even if they are present in the directory, so there is no need > to remove them. Oops -- that's right! This is a code divergence problem. There's more than one implementation of os.path.walk in the core, and the version in ntpath.py (which I started from) still special-cases '.' and '..'. I don't think it needs to. 
> Tim Peters> for path in walk(root): > Tim Peters>Or it could look like > Tim Peters> for top, names in walk(root): > Tim Peters>or > Tim Peters> for top, dirnames, nondirnames in walk(root): > > I like the idea of yielding (top, dirs, nondirs), but > often I want to perform the same operations on both > dirs and nondirs so separating them doesn't help that case. I think (a) that's unusual, and (b) it doesn't hurt that case either. You can do, e.g., for root, dirs, files in walk(...): for name in dirs + files: to squash them together again. > This seems to be a situation where there is no typical case, > so my preference is for the simpler interface. > It also eliminates the need to build two new lists from > the list you get from os.listdir()... Sorry, I'm unmovable on this point. My typical uses for this function do have to separate dirs from non-dirs, walk() has to make the distinction *anyway* (for its internal use), and it's expensive for the client to do the join() and isdir() bits all over again (isdir() is a filesystem op, and at least on my box repeated isdir() is overwhelmingly more costly than partitioning or joining a Python list). > In fact, I prefer your first suggestion (for path in walk(root)), but > that would require building a new list by prepending the > basepath to each element of children because os.listdir does not > return full path. What about that worries you? I don't like it because I have some directories with many thousands of files, and stuffing a long redundant path at the start of each is wasteful in the abstract. I'm not sure it really matters, though -- e.g., 10K files in a directory * 200 redundant chars each = a measly 2 megabytes wasted <wink>. > So finally in this example, I just went with returning the basepath > and the children (both files and directories). > > Following Tom Good's example I added an option to > ignore symbolic links. Not all Python platforms have symlinks, of course. 
The traditional answer to this one was that if a user wanted to avoid chasing those on a platform that supports them, they should prune the symlink names out of the fnames list passed to walk's func callback. The same kind of trick is still available in the generator version, although it was and remains painful. Separating the dirs from the non-dirs for the caller at least reduces the expense of it. > It would be better to detect cycles or at least prevent going > higher up in the tree. > ... > This example still allows you to prune the search > in Breadth first mode by removing elements from > the children list. That is cool. > for top, children in walk('C:/code/python', depthfirst=False): > print top, children > if 'CVS' in children: > children.remove('CVS') I'm finding you too hard to follow here, because your use of "depthfirst" and "breadthfirst" doesn't follow normal usage of the terms. Here's normal usage: consider this tree (A's kids are B, C, D; B's kids are E, F; C's are G, H, I; D's are J, K): A B C D E F G H I J K A depth-first left-to-right traversal is what you get out of a natural recursive routine. It sees the nodes internally in this order: A B E F C G H I D J K In a preorder DFS (depth first search), you deliver ("do something with" -- print it, yield it, whatever) the node before delivering its children. Preorder DFS in the tree above delivers the nodes in order A B E F C G H I D J K which is the same order in which nodes are first seen. This is what I called "top down". In a postorder DFS, you deliver the node *after* delivering its children, although you still first see nodes in the same order. Postorder left-to-right DFS in the tree above delivers nodes in this order: E F B G H I C J K D A This is what I called "bottom up". A breadth-first search can't be done naturally using recursion; you need to maintain an explicit queue for that (or write convoluted recursive code). 
A BFS on the tree above would see the nodes in this order: A B C D E F G H I J K It can be programmed like so, given a suitable queue implementation: queue = SuitableQueueImplementation() queue.enqueue(root) while queue: node = queue.dequeue() for child in node.children(): queue.enqueue(child) Nobody has written a breadth-first traverser in this thread. If someone wants to, there are again preorder and postorder variations, although only preorder BFS falls naturally out of the code just above. The current os.path.walk() delivers directories in preorder depth-first left-to-right order, BTW. > for name in children: > fullpath = os.path.join(basepath, name) > if os.path.isdir (fullpath) and not (ignorelinks and > os.path.islink(fullpath)): Despite what I said above <wink>, I expect the ignorelinks argument is a good idea. > for (next_basepath, next_children) in walktree > (fullpath, depthfirst, ignorelinks): > yield next_basepath, next_children Note that there's no need to pull apart 2-tuples and paste them together again here; for x in walktree(...): yield x does the same thing. From duanev@io.com Tue Apr 22 01:27:38 2003 From: duanev@io.com (duane voth) Date: Mon, 21 Apr 2003 19:27:38 -0500 Subject: [Python-Dev] LynxOS4 dynamic loading with dlopen() and -ldl Message-ID: <20030421192738.A23585@io.com> I'm unable to get the dynamic Python modules to import/load correctly on LynxOS4 (a realtime OS that has gcc, shared libs, and many other UNIXisms). Make excerpt: ... running build running build_ext platform = lynxos4 (my comment in setup.py) building 'struct' extension creating build creating build/temp.lynxos-4.0.0-PowerPC-2.2 gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fvec -fPIC -mshared -mthreads -I. 
-I/usr/local/src/Python-2.2.2/./Include -I/usr/local/include -I/usr/local/src/Python-2.2.2/Include -I/usr/local/src/Python-2.2.2 -c /usr/local/src/Python-2.2.2/Modules/structmodule.c -o build/temp.lynxos-4.0.0-PowerPC-2.2/structmodule.o creating build/lib.lynxos-4.0.0-PowerPC-2.2 gcc -shared -mshared -mthreads build/temp.lynxos-4.0.0-PowerPC-2.2/structmodule.o -L/usr/local/lib -o build/lib.lynxos-4.0.0-PowerPC-2.2/struct.so WARNING: removing "struct" since importing it failed ... (all the other modules fail the same way) I hacked setup.py to stop "removing" the bad module files and brought up the python interpreter to try the import by hand: bash-2.02# ./python Python 2.2.2 (#4, Apr 21 2003, 16:39:51) [GCC 2.95.3 20010323 (Lynx)] on lynxos4 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path += ['/usr/local/src/Python-2.2.2/build/lib.lynxos-4.0.0-PowerPC-2.2'] >>> import struct Traceback (most recent call last): File "<stdin>", line 1, in ? ImportError: Symbol not found: "PyInt_Type" >>> (btw, it would be nice if 'ImportError: Symbol not found: "PyInt_Type"' was emitted without all the debugging by hand -- actually it would be nice if many python exceptions (IndexError: list index out of range comes to mind) were rather more helpful about what is wrong, all this debugging via divination is a bit hard on us newbies!) PyInt_Type is declared in Objects/intobject.o and is visible in the python binary (the one doing the dlopen()). I'm not that familiar with dlopen() but shouldn't references from the .so being loaded to the loading program be resolved by dlopen during load? Running nm on 'python' gives '004d2d3c D PyInt_Type' so all the python symbols are being exported properly. Any ideas on how to resolve this run-time symbol lookup error? Nagging thoughts: LynxOS seems to shy away from shared libraries (they live in a special nonstandard directory and not all libraries have shared versions). 
Should I be thinking about doing a static python? If so, I will need to abandon dlopen() completely right? But I also want to use tkinter and the X11 libs too, so I don't think static is really what I want!

--
Duane Voth
duanev@io.com
--
duanev@atlantis.io.com

From tdelaney@avaya.com Tue Apr 22 01:31:19 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 22 Apr 2003 10:31:19 +1000
Subject: [Python-Dev] Re: FIFO data structure?
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4ABCB3@au3010avexu1.global.avaya.com>

> From: David Eppstein [mailto:eppstein@ics.uci.edu]
>
> See <http://tinyurl.com/9x6d> for some tests indicating that
> using dict for fifo is a slow way to go.

Arrgh! That's an extremely broken test. Do not link to that test! I even admitted it when I realised ...

http://tinyurl.com/a0f4

Tim Delaney

From python@rcn.com Tue Apr 22 01:29:20 2003
From: python@rcn.com (Raymond Hettinger)
Date: Mon, 21 Apr 2003 20:29:20 -0400
Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ')
References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> <1050962848.584.9.camel@sayge.arc.nasa.gov>
Message-ID: <009c01c30866$383cc920$ec11a044@oemcomputer>

[GvR]
> > Since it already exists as a 3rd party package, we should definitely
> > not try to duplicate the effort. Then the question is, is it enough
> > to point to the 3rd party package or does it deserve to be
> > incorporated into the core? We can't go and incorporate every useful
> > 3rd party package into the core

Why not? From a user's point of view, that is the best place for it. Of course, not *every* useful third-party package is a candidate, but if it applies to several different categories of users, then maybe.
For instance, the DNA search packages are somewhat tightly targeted, but basic statistics come up in many different types of work.

[Chad]
> True. I just happen to be of the opinion that a statistics package is
> the single most practical and useful mathematics package that can be
> added to a language, after basic linear algebra (which isn't in the
> core... Hmmm)

I've maintained a pure python linear algebra package for several years. It gives the basics plus QR decomposition, complex matrices, and eigenvalues. Still, I haven't felt the slightest need to request that it be put in the core, nor have any of my users requested it.

> I would actually favor keeping it fairly lightweight
> (although t-tests and even ANOVA should go in).
> Heavyweight users could
> always download a separate add on package.

To keep it lightweight, it should be kept in pure python. Heavyweight users can download the binaries when needed.

reinventing-the-wheel-is-fun-educational-and-non-productive-ly yours,

Raymond Hettinger

From guido@python.org Tue Apr 22 01:57:42 2003
From: guido@python.org (Guido van Rossum)
Date: Mon, 21 Apr 2003 20:57:42 -0400
Subject: [Python-Dev] stats.py (was 'summing a bunch of numbers ')
In-Reply-To: "Your message of Mon, 21 Apr 2003 20:29:20 EDT." <009c01c30866$383cc920$ec11a044@oemcomputer>
References: <LNBBLJKPBEHFEDALKOLCIEKJEDAB.tim.one@comcast.net> <3EA3654D.3070402@activestate.com> <1050899612.591.21.camel@sayge.arc.nasa.gov> <200304211230.h3LCUSN22737@pcp02138704pcs.reston01.va.comcast.net> <1050962848.584.9.camel@sayge.arc.nasa.gov> <009c01c30866$383cc920$ec11a044@oemcomputer>
Message-ID: <200304220057.h3M0vge23381@pcp02138704pcs.reston01.va.comcast.net>

[Guido]
> > > We can't go and incorporate every useful
> > > 3rd party package into the core

[Raymond]
> Why not?
Because of the costs associated with code in the core:

- Once it's in the core, you can't take it away; if the original maintainer goes away, we have to somehow keep up maintenance; there's no such thing as maintenance-free code. E.g. see the pain it takes to get SRE bugs fixed now that Effbot is too busy.

- The core needs to build and run on a large variety of platforms. Some 3rd party package authors don't have that goal, and maintain their solution for one or two platforms only. But what's in the core should (unless it is *inherently* platform specific) run on all platforms. The extra portability work must be done by *someone*.

- If it's actively maintained by the original author(s), their release cycle may not coincide with Python's; given Python's size, Python releases are typically less frequent than other package releases. There's not much point in having an outdated version of something in the core. E.g. see the painful situation with the xml package and the PyXML distribution. This is one reason why win32all is still separately maintained.

- Coding standards. I don't care what naming and other coding conventions are used in a 3rd party package, but there are certain minimum standards for core code (see PEP 7 and 8). This is another reason why win32all is still separately maintained.

- Documentation style. For core packages it is expected that their documentation is maintained in our special LaTeX dialect.

- At some point the download size simply gets too big, and we have to break things up again. This has happened to Emacs, for example.

- For some areas (I'm not saying that this is the case for the stats package, for all I know it's "best of breed") there is considerable disagreement among (potential and existing) users which package providing certain functionality is "right". E.g. Twisted vs. Zope. We can't put every approach in the core, but putting one package in the core may damage the viability of another, possibly better (for some users) solution.
To some extent this has happened with GUI toolkits: the presence of Tkinter in the core makes it harder for other GUI toolkits to compete (leaving aside whether Tkinter is better or not -- it's just not a level playing field). Feel free to enter this in the FAQ; I've got a feeling this is a generally useful response. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@multitalents.net Tue Apr 22 02:55:38 2003 From: tim@multitalents.net (Tim Rice) Date: Mon, 21 Apr 2003 18:55:38 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304161552.h3GFqAQ10181@odiug.zope.com> References: <200304161552.h3GFqAQ10181@odiug.zope.com> Message-ID: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> On Wed, 16 Apr 2003, Guido van Rossum wrote: > I'd like to do a 2.3b1 release someday. Maybe at the end of next > week, that would be Friday April 25. If anyone has something that > needs to be done before this release go out, please let me know! The UnixWare build is way dead right now. (today's CVS) cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I. -I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select gmake: *** [Modules/python.o] Error 1 > > Assigning a SF bug or patch to me and setting the priority to 7 is a > good way to get my attention. 
> > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > -- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From gward@python.net Tue Apr 22 02:57:29 2003 From: gward@python.net (Greg Ward) Date: Mon, 21 Apr 2003 21:57:29 -0400 Subject: [Python-Dev] Bug/feature/patch policy for optparse.py In-Reply-To: <003d01c3081f$e5232540$410ea044@oemcomputer> References: <20030421024743.GA3911@cthulhu.gerg.ca> <003d01c3081f$e5232540$410ea044@oemcomputer> Message-ID: <20030422015729.GA966@cthulhu.gerg.ca> [Raymond, I'm assuming you did not mean to send your reply to me privately, so I'm cc'ing python-dev!] On 21 April 2003, Raymond Hettinger said: > Why is it important to keep two separate implementations? > Also, if you have to have two, why not have the python cvs > as the primary (to the remove the restriction, to take advantage > of the snake farm, to let third party users have a single place > to file a bug report, to have more developer eyes and fingers > to work a problem, etc)? Hmmm, good question. Probably it's mostly for ego gratification -- one of *my* SF projects was above the 50th percentile in activity last week, just because of my flurry of checkins on Sunday! ;-> But seriously: if people using Python < 2.3 are to be able to use Optik (aka optparse), then there needs to be somewhere for the setup script, tarball etc. to live. optik.sourceforge.net is as good a place as any. Perhaps in due course, the code in Lib/optparse.py (and Lib/test/test_optparse.py) will become the definitive copy, but for now it's not. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'm on a strict vegetarian diet -- I only eat vegetarians. 
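[Editorial note: for readers who haven't met the module Greg maintains, here is a minimal Optik/optparse sketch. The option names and values are invented for illustration; only the API shapes come from the library.]

```python
from optparse import OptionParser

parser = OptionParser(usage="usage: %prog [options] args")
parser.add_option("-v", "--verbose", action="store_true", default=False,
                  help="chatty output")
parser.add_option("-n", "--count", type="int", default=1,
                  help="how many times [default: %default]")

# Parsing an explicit argv-style list (normally you'd omit it and
# optparse would use sys.argv[1:]):
opts, args = parser.parse_args(["-v", "-n", "3", "input.txt"])
print(opts.verbose, opts.count, args)   # True 3 ['input.txt']
```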
From gward@python.net Tue Apr 22 03:26:07 2003 From: gward@python.net (Greg Ward) Date: Mon, 21 Apr 2003 22:26:07 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> References: <200304192343.48211.aleax@aleax.it> <200304200829.52477.aleax@aleax.it> <Pine.SOL.4.55.0304200108490.27731@death.OCF.Berkeley.EDU> <200304201210.07054.aleax@aleax.it> <20030420105807.C15881@localhost.localdomain> <200304210058.h3L0w6o19963@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030422022607.GA1107@cthulhu.gerg.ca> On 20 April 2003, Guido van Rossum said: > I'm not too worried that people will ask for prod() as well. And if > they do, maybe we can give them that too; there's not much else along > the same lines (bitwise or/and; ha ha ha) so even if the slope may be > a bit slippery, I'm not worried about sliding too far. I can't count the number of times sum() would have been useful to me. I can count the number of times prod() would have been: zero. Bitwise and/or en masse seems unnecessary (although I remember being quite tickled by the fact that you can do bitwise operations on strings in Perl -- whee, fun! -- when I was young and naive). However, there have been a number of occasions where I wanted *logical* and/or en masse: are any/all elements of this list true/false? On several occasions I tried to do it in one super-clever line of code using reduce(), and I think I even succeeded once. But usually I give up and make it a loop. IMHO *this* is likely to be the feature people start asking for after they decide sum() is handy. Greg PS. my nominations for removal in Python 3.0: reduce() and filter(). 
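[Editorial note: the en-masse logical and/or Greg asks for can be written as two short loops -- essentially what later became the builtins any() and all() in Python 2.5. The reduce() spelling is shown for contrast; note it loses short-circuiting.]

```python
def alltrue(seq):
    # en-masse logical "and": stops at the first false element
    for x in seq:
        if not x:
            return False
    return True

def anytrue(seq):
    # en-masse logical "or": stops at the first true element
    for x in seq:
        if x:
            return True
    return False

# The "super-clever one line of code using reduce()" -- it works, but
# always scans the whole sequence.
from functools import reduce   # reduce was still a builtin in 2003
alltrue_reduce = lambda seq: reduce(lambda acc, x: acc and bool(x), seq, True)
```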
--
Greg Ward <gward@python.net>                         http://www.gerg.ca/
What happens if you touch these two wires tog--

From tim.one@comcast.net Tue Apr 22 03:44:58 2003
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 21 Apr 2003 22:44:58 -0400
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <3EA44706.3010806@v.loewis.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOCEDAB.tim.one@comcast.net>

[Noah Spurrier]
>> I write these little directory/file filters quite often. I have
>> come across this problem of renaming the directories you are
>> traversing before.

[Martin v. Löwis]
> I still can't understand why you can't use os.path.walk for that.
> Did you know that you can modify the list that is passed to the
> callback, and that walk will continue to visit the elements in the list?

Let's spell it out. Say the directory structure is like so:

    a/
        b/
            c/
            d/
        e/

and we want to stick "x" at the end of each directory name. The first thing the callback sees is

    arg, "a", ["b", "e"]

The callback can rename b and e, and change the contents of the fnames list to ["bx", "ex"] so that walk will find the renamed directories. Etc. This works:

"""
import os

def renamer(arg, dirname, fnames):
    for i, name in enumerate(fnames):
        if os.path.isdir(os.path.join(dirname, name)):
            newname = name + "x"
            os.rename(os.path.join(dirname, name),
                      os.path.join(dirname, newname))
            fnames[i] = newname  # crucial!

os.path.walk('a', renamer, None)
"""

It's certainly less bother renaming bottom-up; this works too (given the last walk() generator implementation I posted):

"""
import os

for root, dirs, files in walk('a', topdown=False):
    for d in dirs:
        os.rename(os.path.join(root, d), os.path.join(root, d + 'x'))
"""

A possible surprise is that neither of these renames 'a'.
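[Editorial note: with the generator that eventually shipped as os.walk() in 2.3, the bottom-up rename plus an explicit rename of the starting directory can be sketched like this. The throwaway tree under a temp directory is invented for the demonstration; it is not from the post.]

```python
import os
import tempfile

# Build a throwaway tree: a/b/c and a/e under a fresh temp directory.
root = tempfile.mkdtemp()
a = os.path.join(root, 'a')
os.makedirs(os.path.join(a, 'b', 'c'))
os.makedirs(os.path.join(a, 'e'))

# Bottom-up: each directory is renamed only after its children were
# visited, so the paths the walk hands out are never stale.
for top, dirs, files in os.walk(a, topdown=False):
    for d in dirs:
        os.rename(os.path.join(top, d), os.path.join(top, d + 'x'))

# The starting point is never yielded as anyone's child, so rename it
# by hand afterwards -- this is the "surprise" Tim mentions.
os.rename(a, a + 'x')

print(sorted(os.listdir(a + 'x')))  # ['bx', 'ex']
```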
From tim.one@comcast.net Tue Apr 22 04:03:20 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 23:03:20 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030422022607.GA1107@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> [Greg Ward] > I can't count the number of times sum() would have been useful to me. I > can count the number of times prod() would have been: zero. Two correct answers. Good for you, Greg! > Bitwise and/or en masse seems unnecessary (although I remember being > quite tickled by the fact that you can do bitwise operations on strings > in Perl -- whee, fun! -- when I was young and naive). > > However, there have been a number of occasions where I wanted *logical* > and/or en masse: are any/all elements of this list true/false? On > several occasions I tried to do it in one super-clever line of code > using reduce(), and I think I even succeeded once. But usually I give > up and make it a loop. IMHO *this* is likely to be the feature people > start asking for after they decide sum() is handy. def alltrue(seq): return sum(map(bool, seq)) == len(seq) def atleastonetrue(seq): return sum(map(bool, seq)) > 0 > ... > PS. my nominations for removal in Python 3.0: reduce() and filter(). reduce() is still in Python?! Brrrr. filter() is hard to get rid of because the bizarre filter(None, seq) special case is supernaturally fast. Indeed, time the above against def alltrue(seq): return len(filter(None, seq)) == len(seq) def atleastonetrue(seq): return bool(filter(None, seq)) Let me know which wins <wink>. From tim.one@comcast.net Tue Apr 22 04:33:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 21 Apr 2003 23:33:05 -0400 Subject: [Python-Dev] New re failures on Windows In-Reply-To: <Pine.OS2.4.44.0304212340550.27154-100000@tenring.andymac.org> Message-ID: <LNBBLJKPBEHFEDALKOLCAEOHEDAB.tim.one@comcast.net> [Martin v. 
Löwis]
>> Instead of trying various compilers hoping that the problem goes away,
>> I recommend that you try to narrow down the test case that fails.

[Andrew MacIntyre]
> I never had any hope the problem would "go away". I've been trying to
> quantify the extent of the problem, by finding out which compilers
> exhibit the failure with what optimisation settings, so that the
> autoconf configurations generated don't result in interpreters that
> blow up unexpectedly.

Narrowing it down to the specific C code that's at fault is still the best hope. There are two reasons for that:

1. It's very easy to write ill-defined code in C, and for all we know now some part of _sre is depending on undefined, or implementation defined (but apparently likely), behavior.

2. If that's not the problem, optimization bugs are usually easy to sidestep via minor code changes. You have to know which code is getting screwed first, though.

> ...
> I have a patch to configure.in which I'll upload to SF shortly which
> lowers the optimisation for FreeBSD. Not my preferred outcome, but all
> I'm able to offer in my current circumstances.

Narrowing it down is indeed A Project.

From tim.one@comcast.net Tue Apr 22 04:43:28 2003
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 21 Apr 2003 23:43:28 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <3EA3654D.3070402@activestate.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEOIEDAB.tim.one@comcast.net>

[David Ascher]
> Scipy's stats package is more complete than many people expect. I
> would argue strongly against putting a 'cheap stats' package in the
> core, since building one such package takes a huge amount of work,
> doing it twice is silly. At least the first version of the stats
> package now in chaco used to not require numeric, although I think that
> requirement is a red herring in practice.
I expect that when Guido is thinking about a simple stats package, he's not picturing more than median, mean, sdev, variance, and maybe percentile points, all limited to one dimension. Just about anyone can take over maintenance of those if need be, although how to code a numerically robust sdev isn't well known outside of people who've been burned by "good enough, it can't be *that* hard <wink>" initial attempts. From tim.one@comcast.net Tue Apr 22 05:06:10 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 22 Apr 2003 00:06:10 -0400 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211256.42839.fincher.8@osu.edu> Message-ID: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net> [Jeremy Fincher] > This code brought up an interesting question to me: if sets have > a .discard method that removes an element without raising KeyError > if the element isn't in the set, should lists perhaps have that same > method? I don't think list.remove(x) is used enough to care, when the presence of x in the list is unknown. Adding methods for purity alone is neither Pythonic nor Perlish <wink>. > On another related front, sets (in my Python 2.3a2) raise KeyError on a > .remove(elt) when elt isn't in the set. Since sets aren't mappings, > should that be a ValueError (like list raises) instead? Since sets aren't sequences either, why should sets raise the same exception lists raise? It's up to the type to use whichever fool exceptions it chooses. This doesn't always make life easy for users, alas -- there's not much consistency in exception behavior across packages. In this case, a user would be wise to avoid expecting IndexError or KeyError, and catch their common base class (LookupError) instead. The distinction between IndexError and KeyError isn't really useful (IMO; LookupError was injected as a base class recently in Python's life). From martin@v.loewis.de Tue Apr 22 06:28:03 2003 From: martin@v.loewis.de (Martin v. 
Löwis)
Date: 22 Apr 2003 07:28:03 +0200
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
Message-ID: <m31xzvf6e4.fsf@mira.informatik.hu-berlin.de>

Tim Rice <tim@multitalents.net> writes:

> The UnixWare build is way dead right now. (today's CVS)

Any volunteers to fix it?

Regards,
Martin

From martin@v.loewis.de Tue Apr 22 06:40:49 2003
From: martin@v.loewis.de (Martin v. Löwis)
Date: 22 Apr 2003 07:40:49 +0200
Subject: [Python-Dev] LynxOS4 dynamic loading with dlopen() and -ldl
In-Reply-To: <20030421192738.A23585@io.com>
References: <20030421192738.A23585@io.com>
Message-ID: <m3wuhndr8e.fsf@mira.informatik.hu-berlin.de>

duane voth <duanev@io.com> writes:

> I hacked setup.py to stop "removing" the bad module files and brought
> up the python interpreter to try the import by hand:
[...]
> (btw, it would be nice if 'ImportError: Symbol not found: "PyInt_Type"'
> was emitted without all the debugging by hand

I *strongly* recommend using the Python CVS (to become 2.3) as a baseline for your port. Among other things, it does this already.

> PyInt_Type is declared in Objects/intobject.o and is visible in
> the python binary (the one doing the dlopen()). I'm not that familiar
> with dlopen() but shouldn't references from the .so being loaded to
> the loading program be resolved by dlopen during load?

For executables, this is highly platform dependent - they never consider the case of somebody linking with an *executable*; they expect that symbols normally come from shared libraries. On ELF systems, it is supported, but still depends on the linker. For example, the GNU linker wants --export-dynamic as a linker option in order to expose symbols from the executable.
You can use "nm -D --defined-only" (for GNU nm) to find out whether the executable exports symbols dynamically. > Running nm on 'python' gives '004d2d3c D PyInt_Type' so all the > python symbols are being exported properly. You are looking into the wrong section :-( Try strip on the binary and see the symbols go away. On ELF systems, you need the .dynsym/.dynstr sections on the binary. > LynxOS seems to shy away from shared libraries (they live in > a special nonstandard directory and not all libraries have shared > versions). Should I be thinking about doing a static python? If > so, I will need to abandon dlopen() completely right? But I also > want to use tkinter and the X11 libs to so I don't think static is > really what I want! It depends. I have a strong dislike towards shared libraries, myself. They are hard to use and somewhat inefficient, both in terms of start-up time, and in terms of memory usage. OTOH, for Python extension modules, they simplify the build and deployment process, and help to cut dependencies to other libraries. So if you can make it work, you should. You can then *still* consider integrating as many modules as reasonable into your python interpreter image, by means of Setup, and, for an embedded system, you definitely should also do that. Add demand paging to the picture: If the system has demand-paging, the size of the binary is irrelevant, as the system will swap in only what is needed. If the system needs to read the entire image into RAM, you want it as small as possible, though. Regards, Martin From agthorr@barsoom.org Tue Apr 22 06:42:18 2003 From: agthorr@barsoom.org (Agthorr) Date: Mon, 21 Apr 2003 22:42:18 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>
References: <20030420183005.GB8449@barsoom.org> <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net>
Message-ID: <20030422054218.GA18642@barsoom.org>

On Sun, Apr 20, 2003 at 09:31:04PM -0400, Tim Peters wrote:
> I'm opposed to this. The purpose of Queue is to mediate communication among
> threads, and a Queue.Queue rarely gets large because of its intended
> applications. As other recent timing posts have shown, you simply can't
> beat the list.append + list.pop(0) approach until a queue gets quite large
> (relative to the intended purpose of a Queue.Queue).

Out of curiosity, I ran some tests, comparing:

    list.append + list.pop(0)
    Queue.Queue
    my modified Queue.Queue

The test adds n integers to the Queue, then removes them. I use the timeit module to perform the measurements, and do not count the loading of the module or creating of the list/queue object (since presumably the user will do this extremely infrequently).

What I found was that for small n, list.append/pop is much faster than either Queue implementation. I assume this means that the bulk of the time is spent dealing with thread synchronization issues and with the overhead of using a class. It takes around one twentieth the time to complete the list.append/pop compared to either Queue implementation.

For small n, the two Queue implementations were at least in the same ballpark. Mine was roughly 25% slower for n < 10, and around 10% slower for 10 < n < 100. After that, the difference gradually declined until the circular array took the lead somewhere in the vicinity of n=2000. The performance difference didn't become large until n=10000 where the O(n^2) growth finally began to kill the list.append/pop.

Disappointed with these results, I spent some time tweaking my modified Queue.Queue to improve the performance.
I create local variables in a few places, perform a bitwise-AND instead of a modulus, and initialize the circular buffer with 8 elements instead of just 1. I also now grow the circular buffer more efficiently. This made a huge difference. My implementation now outperforms the current Queue.Queue for n > 1! It does around 1% to 4% better up until around n=500, then the advantage starts to slowly ramp up.

My updated Queue implementation is here:
http://www.cs.uoregon.edu/~agthorr/QueueNew.py
and my test program is here:
http://www.cs.uoregon.edu/~agthorr/test.py

> If you have an unusual application for a Queue.Queue where it's actually
> faster to do a circular-buffer gimmick (and don't believe that you do before
> you time it),

My application is a little program that sends simulation jobs to a small server farm. I have one thread per server that grabs jobs off the Queue and starts the remote simulation. I have a fair number of simulation parameters, and this translates into thousands of jobs getting added to the Queue. So, yes, for my particular application, the O(n^2) behavior really is a genuine problem ;)

If circular-array Queue.Queue was significantly slower for low n, I'd agree with you that the current implementation should not be changed. It doesn't appear to be a problem, though.

However, speaking of subclassing Queue: is it likely there are many user applications that subclass it in a way that would break? (i.e., they override some, but not all, of the functions intended for overriding).
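[Editorial note: the circular-array tricks described above -- a power-of-two buffer, bitwise AND in place of modulus, doubling on growth -- can be sketched in a few lines. This is an illustrative single-threaded toy, not the QueueNew.py from the link, which adds locking on top.]

```python
class RingFIFO:
    """Toy FIFO over a circular buffer whose length is a power of two,
    so 'index % size' can be written as 'index & (size - 1)'."""

    def __init__(self):
        self.buf = [None] * 8   # start with 8 slots, as in the post
        self.mask = 7           # len(buf) - 1
        self.head = 0           # index of the next item to pop
        self.n = 0              # number of queued items

    def put(self, item):
        if self.n == len(self.buf):         # full: double the buffer
            old, oldmask = self.buf, self.mask
            self.buf = [None] * (2 * len(old))
            # copy the live items out in FIFO order, starting at head
            for i in range(self.n):
                self.buf[i] = old[(self.head + i) & oldmask]
            self.head = 0
            self.mask = len(self.buf) - 1
        self.buf[(self.head + self.n) & self.mask] = item
        self.n += 1

    def get(self):
        if self.n == 0:
            raise IndexError("get from empty FIFO")
        item = self.buf[self.head]
        self.buf[self.head] = None          # drop the reference
        self.head = (self.head + 1) & self.mask
        self.n -= 1
        return item

q = RingFIFO()
for i in range(20):          # forces one doubling, 8 -> 16 -> 32 slots
    q.put(i)
print([q.get() for _ in range(5)])   # [0, 1, 2, 3, 4]
```

Both put() and get() are O(1) amortized, which is the whole point versus list.pop(0)'s O(n) shuffle.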
-- Agthorr

From noah@noah.org Tue Apr 22 09:32:21 2003
From: noah@noah.org (Noah Spurrier)
Date: Tue, 22 Apr 2003 01:32:21 -0700
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
References: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
Message-ID: <3EA4FE15.1070803@noah.org>

Tim>The callback can rename b and e, and change the contents of the fnames list
Tim>to ["bx", "ex"] so that walk will find the renamed directories. Etc.

Ha! This is sweet, but I would call this solution "nonobvious". But perhaps it is a good argument for not modifying os.path.walk(), yet should a walktree generator be included in Python's future I hope that it will have the explicit option for postorder depth first.

Tim> Sorry, I'm unmovable on this point. My typical uses for this function do
Tim> have to separate dirs from non-dirs, walk() has to make the distinction
Tim> *anyway* (for its internal use), and it's expensive for the client to do the
Tim> join() and isdir() bits all over again (isdir() is a filesystem op, and at
Tim> least on my box repeated isdir() is overwhelmingly more costly than
Tim> partitioning or joining a Python list).

I'm probably less adamant on this point than you :-) And you are right, it's cheaper for me to simply run through both lists than it would be to loop over a conditional based on isdir().

Tim> What about that worries you? I don't like it because I have some
Tim> directories with many thousands of files, and stuffing a long redundant path
Tim> at the start of each is wasteful in the abstract. I'm not sure it really
Tim> matters, though -- e.g., 10K files in a directory * 200 redundant chars each
Tim> = a measly 2 megabytes wasted <wink>.

That was also what bothered me ;-) I guess it's more of a habit than necessity.

Tim> Not all Python platforms have symlinks, of course.
Tim> The traditional answer

True, but checking a file with os.path.islink() should be safe even on platforms without links -- if the docs are to be believed. The docs say that platforms that don't support links will always return False for islink(). The Python docs are a little inconsistent on links.

1. os.path.islink(path) claims to only check for links on UNIX and to always be false if symbolic links are not supported.
2. os.readlink(path) is only available on UNIX and is not defined on Windows.
3. os.path.realpath(path) claims to be only available on UNIX, but it is actually defined and returns the given path if you call it on Windows.

Tim> I'm finding you too hard to follow here, because your use of "depthfirst"
Tim> and "breadthfirst" doesn't follow normal usage of the terms. Here's normal

You are right. I will stop calling it Breadth First now. Feel free to dope slap me. This confusion on my part was due to the apparent order when one prints the elements of the names list when the visit function is called. It would print B, C, D, E, F, G, H, I, J, K, but that's the parent printing the children, not the children printing themselves as they are visited. Oh... (a small, dim light clicks on.)

Still, walktree should have the option to hit the bottom of a branch and then process on its way back up (post-order). OK, how is the following version?

Yours,
Noah

from __future__ import generators # needed for Python 2.2
import os

def walktree (basepath=".", postorder=True, ignorelinks=True):
    """This walks a directory tree, starting from the basepath directory.
    This is somewhat like os.path.walk, but using generators instead of a
    visit function. One important difference is that walktree() defaults
    to postorder with optional preorder, whereas the os.path.walk function
    allows only preorder. Postorder was made the default because it is
    safer if you are going to be modifying the directory names you visit.
    This avoids the problem of renaming a directory before visiting the
    children of that directory. The ignorelinks option determines whether
    to follow symbolic links. Some symbolic links can lead to recursive
    traversal cycles. A better way would be to detect and prune cycles.
    """
    children = os.listdir(basepath)
    dirs, nondirs = [], []
    for name in children:
        fullpath = os.path.join (basepath, name)
        if os.path.isdir (fullpath) and not (ignorelinks and os.path.islink(fullpath)):
            dirs.append(name)
        else:
            nondirs.append(name)
    if not postorder:
        yield basepath, dirs, nondirs
    for name in dirs:
        for next_branch in walktree (os.path.join(basepath, name), postorder, ignorelinks):
            yield next_branch
    if postorder:
        yield basepath, dirs, nondirs

def test():
    for basepath, dirs, nondirs in walktree():
        for name in dirs:
            print os.path.join(basepath, name)
        for name in nondirs:
            print os.path.join(basepath, name)

if __name__ == '__main__':
    test()

From mwh@python.net Tue Apr 22 09:34:29 2003
From: mwh@python.net (Michael Hudson)
Date: Tue, 22 Apr 2003 09:34:29 +0100
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net> (Tim Peters's message of "Tue, 22 Apr 2003 00:06:10 -0400")
References: <LNBBLJKPBEHFEDALKOLCKEOJEDAB.tim.one@comcast.net>
Message-ID: <2mist7nd62.fsf@starship.python.net>

Tim Peters <tim.one@comcast.net> writes:

> [Jeremy Fincher]
>> This code brought up an interesting question to me: if sets have
>> a .discard method that removes an element without raising KeyError
>> if the element isn't in the set, should lists perhaps have that same
>> method?
>
> I don't think list.remove(x) is used enough to care, when the presence of x
> in the list is unknown.
>> On another related front, sets (in my Python 2.3a2) raise KeyError on a >> .remove(elt) when elt isn't in the set. Since sets aren't mappings, >> should that be a ValueError (like list raises) instead? > > Since sets aren't sequences either, why should sets raise the same exception > lists raise? It's up to the type to use whichever fool exceptions it > chooses. This doesn't always make life easy for users, alas -- there's not > much consistency in exception behavior across packages. In this case, a > user would be wise to avoid expecting IndexError or KeyError, and catch > their common base class (LookupError) instead. The distinction between > IndexError and KeyError isn't really useful (IMO; LookupError was injected > as a base class recently in Python's life). Without me noticing, too! Well, I knew there was a lookup error that you get when failing to find a codec, but I didn't know IndexError and KeyError derived from it... Also note that Jeremy was suggesting *ValueError*, not IndexError... that any kind of index-or-key-ing is going on is trivia of the implementation, surely? Cheers, M. -- First of all, email me your AOL password as a security measure. You may find that won't be able to connect to the 'net for a while. This is normal. The next thing to do is turn your computer upside down and shake it to reboot it. -- Darren Tucker, asr From andymac@bullseye.apana.org.au Tue Apr 22 09:27:01 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Tue, 22 Apr 2003 19:27:01 +1100 (edt) Subject: [Python-Dev] sre vs gcc (was: New re failures on Windows) In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEOHEDAB.tim.one@comcast.net> Message-ID: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> [redirected to people apparently working on SRE] On Mon, 21 Apr 2003, Tim Peters wrote: > Narrowing it down to the specific C code that's at fault is still the best > hope. There are two reasons for that: > > 1. 
It's very easy to write ill-defined code in C, and for all we know > now some part of _sre is depending on undefined, or implementation > defined (but apparently likely), behavior. > > 2. If that's not the problem, optimization bugs are usually easy to > sidestep via minor code changes. You have to know which code is > getting screwed first, though. Seeing that Gustavo had checked in some changes to _sre.c on Sunday, I CVS up'ed and now find that a gcc 2.95.4 build survives test_sre with -O3. A gcc 3.2.2 build still gets a bus error with either -O3 or -O2. The actual test case from test_sre that fails is: ---8<---8<--- # non-simple '*?' still recurses and hits the recursion limit test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None, RuntimeError) ---8<---8<--- For the moment, the FreeBSD 5.x (ie gcc 3.2.x) element of my configure.in patch (SF #725024) is still valid. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From aleax@aleax.it Tue Apr 22 11:54:10 2003 From: aleax@aleax.it (Alex Martelli) Date: Tue, 22 Apr 2003 12:54:10 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> Message-ID: <200304221254.10510.aleax@aleax.it> On Tuesday 22 April 2003 05:03 am, Tim Peters wrote: ... > filter() is hard to get rid of because the bizarre filter(None, seq) > special case is supernaturally fast. Indeed, time the above against > > def alltrue(seq): > return len(filter(None, seq)) == len(seq) > > def atleastonetrue(seq): > return bool(filter(None, seq)) > > Let me know which wins <wink>. Hmmm, I think I must be missing something here. 
Surely in many application cases a loop exploiting short-circuiting behavior will have better expected performance than anything that's going all the way through the sequence no matter what? Far greater variance, sure, and if the probability of true items gets extreme enough then the gain from short-circuiting will evaporate, but...:

[alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> def any(x):
>     for xx in x:
>         if xx: return True
>     return False
> ''' 'any(seq)'
1000000 loops, best of 3: 1.42 usec per loop
[alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
def any(x):
    return bool(filter(None,x))
''' 'any(seq)'
1000 loops, best of 3: 679 usec per loop

...i.e., despite filter's amazing performance, looping over 10k items still takes a bit more than short-circuiting out at once;-). If Python ever gains such C-coded functions as any, all, etc (hopefully in some library module, not in builtins!) I do hope and imagine they'd short-circuit, of course.

BTW, I think any should return the first true item (or the last one if all false, or False for an empty sequence) and all should return the first false item (or the last one if all true, or True for an empty seq) by analogy with the behavior of operators and/or.

Alex

From guido@python.org Tue Apr 22 13:03:15 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 08:03:15 -0400
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: "Your message of Mon, 21 Apr 2003 18:55:38 PDT." <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
Message-ID: <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net>

> The UnixWare build is way dead right now. (today's CVS)
>
> cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select > gmake: *** [Modules/python.o] Error 1 That doesn't look like a *new* problem to me; if sys/select.h is being included twice, that probably was so for a long time. You may be the only person with access to this platform. Can you find the problem? Was this present in 2.3a2? --Guido van Rossum (home page: http://www.python.org/~guido/) From harri.pasanen@trema.com Tue Apr 22 13:47:02 2003 From: harri.pasanen@trema.com (Harri Pasanen) Date: Tue, 22 Apr 2003 14:47:02 +0200 Subject: [Python-Dev] Embedded python on Win2K, import failures In-Reply-To: <022e01c301bd$4b7f5a70$530f8490@eden> References: <022e01c301bd$4b7f5a70$530f8490@eden> Message-ID: <200304221447.02812.harri.pasanen@trema.com> On Sunday 13 April 2003 15:05, Mark Hammond wrote: > > Did you try -v, as > > > > > 'import site' failed; use -v for traceback > > > > suggested? > > Yep. as I said: > > > Running with "-v" shows: > > Note that as I mentioned, this is only if you move away _sre.pyd. > The original report was almost certainly a simple import error. I was away the past week, so excuse my delayed response. OK, I found the problem: it is just a difference in the way the Linux and Windows versions are built, but the failure mode could arguably be a bug. _sre.pyd is a separate module on Windows, while on Linux it is part of the whole lib (libpython23.a, libpython23.so). I was running the python from the build tree, and PCbuild was not part of the sys.path for the embedded python. When running with the interactive python, in the style ../../PCbuild/python.exe, the sys.path implicitly gets the PCbuild directory, _sre.pyd is found and everything works.
So when everything is configured and installed properly, everything works. The bug here is that when _sre.pyd is not found on sys.path and I'm running the embedded python, I'm not seeing any import errors; things just silently fail. At runtime "import re" goes through without a problem, but the resulting module is invalid, which is only noticed when the module is first used. So I had no idea that the _sre module was not being found, or that it was even required. Does this merit a bug at sf? Another thing: assuming I would get an import error from the embedded python, how do I enable the "use -v for traceback" for it? Is there a function call I can add for the same effect? Regards, Harri From bkc@murkworks.com Tue Apr 22 14:41:45 2003 From: bkc@murkworks.com (Brad Clements) Date: Tue, 22 Apr 2003 09:41:45 -0400 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <m31xzvf6e4.fsf@mira.informatik.hu-berlin.de> References: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> Message-ID: <3EA50E59.29235.1E8CFCCA@localhost> On 22 Apr 2003 at 7:28, Martin v. Löwis wrote: > Tim Rice <tim@multitalents.net> writes: > > > The UnixWare build is way dead right now. (today's CVS) > > Any volunteers to fix it? > > Regards, > Martin I'm sorry I'm not in a position to fix it, but I do have an un-opened Unixware Advanced Server 2.01 box set (docs and media) if anyone wants them. Personally, I think Unixware is dead. Novell dropped it ages ago.
-- Brad Clements, bkc@murkworks.com (315)268-1000 http://www.murkworks.com (315)268-9812 Fax http://www.wecanstopspam.org/ AOL-IM: BKClements From tim@multitalents.net Tue Apr 22 15:03:19 2003 From: tim@multitalents.net (Tim Rice) Date: Tue, 22 Apr 2003 07:03:19 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <3EA50E59.29235.1E8CFCCA@localhost> References: <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <3EA50E59.29235.1E8CFCCA@localhost> Message-ID: <Pine.UW2.4.53.0304220701050.25189@ou8.int.multitalents.net> On Tue, 22 Apr 2003, Brad Clements wrote: > On 22 Apr 2003 at 7:28, Martin v. Löwis wrote: > > > Tim Rice <tim@multitalents.net> writes: > > > > > The UnixWare build is way dead right now. (today's CVS) > > > > Any volunteers to fix it? > > > > Regards, > > Martin > > I'm sorry I'm not in a position to fix it, but I do have an un-opened Unixware Advanced > Server 2.01 box set (docs and media) if anyone wants them. > > Personally, I think Unixware is dead. Novell dropped it ages ago. "Dropped it" isn't quite correct. They sold it to SCO. -- Tim Rice    Multitalents    (707) 887-1469 tim@multitalents.net From tim@zope.com Tue Apr 22 16:36:24 2003 From: tim@zope.com (Tim Peters) Date: Tue, 22 Apr 2003 11:36:24 -0400 Subject: [Python-Dev] New thread death in test_bsddb3 Message-ID: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com> test_bsddb3.py fails quickly today under a debug build, with a thread state error, on Win2K, every time. Linux?
I assume this is a bad interaction between Mark Hammond's new auto-thread-state code and _bsddb.c's custom thread-manipulation macros:

C:\Code\python\PCbuild>python_d ../lib/test/test_bsddb3.py
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)
bsddb.db.version():   (4, 1, 25)
bsddb.db.__version__: 4.1.5
bsddb.db.cvsid:       $Id: _bsddb.c,v 1.11 2003/03/31 19:51:29 bwarsaw Exp $
python version:       2.3a2+ (#39, Apr 22 2003, 10:48:23) [MSC v.1200 32 bit (Intel)]
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Fatal Python error: Invalid thread state for this thread

C:\Code\python\PCbuild>

It's dying in _db_associateCallback, here:

static int
_db_associateCallback(DB* db, const DBT* priKey, const DBT* priData, DBT* secKey)
{
    int       retval = DB_DONOTINDEX;
    DBObject* secondaryDB = (DBObject*)db->app_private;
    PyObject* callback = secondaryDB->associateCallback;
    int       type = secondaryDB->primaryDBType;
    PyObject* key;
    PyObject* data;
    PyObject* args;
    PyObject* result;

    if (callback != NULL) {
        MYDB_BEGIN_BLOCK_THREADS;       ************ HERE *************

The macro is defined like so:

#define MYDB_BEGIN_BLOCK_THREADS { \
        PyThreadState* prevState; \
        PyThreadState* newState; \
        PyEval_AcquireLock(); \
        newState  = PyThreadState_New(_db_interpreterState); \
        prevState = PyThreadState_Swap(newState);

PyThreadState_Swap is complaining here:

#if defined(Py_DEBUG)
    if (new) {
        PyThreadState *check = PyGILState_GetThisThreadState();
        if (check && check != new)
            Py_FatalError("Invalid thread state for this thread");
    }
#endif

This is a new check, I believe it's an intentional check, and I doubt _bsddb.c *should* pass it as-is.
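For context, Mark's new API (PEP 311) is aimed at exactly this pattern; a rough sketch of what the callback's thread dance could look like on top of it -- hypothetical, not the actual _bsddb.c fix:

```c
/* Hypothetical sketch only: the MYDB_BEGIN/END_BLOCK_THREADS pair
 * replaced by the PEP 311 PyGILState API.  PyGILState_Ensure() creates
 * or reuses the per-thread state that PyGILState_GetThisThreadState()
 * tracks, so the Py_DEBUG check in PyThreadState_Swap() has nothing to
 * complain about. */
#include <Python.h>

static int
associate_callback_sketch(void)
{
    PyGILState_STATE gstate = PyGILState_Ensure(); /* was MYDB_BEGIN_BLOCK_THREADS */

    /* ... build the key/data objects and invoke the Python callback ... */

    PyGILState_Release(gstate);                    /* was MYDB_END_BLOCK_THREADS */
    return 0;
}
```

The point of the API is that extensions stop second-guessing which thread state the thread already has.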
From gherron@islandtraining.com Tue Apr 22 16:49:42 2003 From: gherron@islandtraining.com (Gary Herron) Date: Tue, 22 Apr 2003 08:49:42 -0700 Subject: [Python-Dev] Re: sre vs gcc (was: New re failures on Windows) In-Reply-To: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> References: <Pine.OS2.4.44.0304221839050.27170-100000@tenring.andymac.org> Message-ID: <200304220849.43411.gherron@islandtraining.com> On Tuesday 22 April 2003 01:27 am, Andrew MacIntyre wrote: > [redirected to people apparently working on SRE] > > On Mon, 21 Apr 2003, Tim Peters wrote: > > Narrowing it down to the specific C code that's at fault is still the > > best hope. There are two reasons for that: > > > > 1. It's very easy to write ill-defined code in C, and for all we know > > now some part of _sre is depending on undefined, or implementation > > defined (but apparently likely), behavior. > > > > 2. If that's not the problem, optimization bugs are usually easy to > > sidestep via minor code changes. You have to know which code is > > getting screwed first, though. > > Seeing that Gustavo had checked in some changes to _sre.c on Sunday, I CVS > up'ed and now find that a gcc 2.95.4 build survives test_sre with -O3. > A gcc 3.2.2 build still gets a bus error with either -O3 or -O2. > > The actual test case from test_sre that fails is: > ---8<---8<--- > # non-simple '*?' still recurses and hits the recursion limit > test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None, > RuntimeError) ---8<---8<--- > > For the moment, the FreeBSD 5.x (ie gcc 3.2.x) element of my configure.in > patch (SF #725024) is still valid. Ah. Good clue! Here's a very likely fix to that problem. Around line 3102 of _sre.c find the line that sets USE_RECURSION_LIMIT. Depending on your platform it will be set to either 10000 or 7500. As a test, lower that value to 1000 or even 100. If all the tests pass, then we know the culprit.
The sre code uses that value to prevent run-away recursion from overflowing the stack. Its value must be large enough to allow for *reasonable* levels of recursion, but small enough to catch a run-away recursion before it actually overflows the stack. On at least one class of machines, a value of 10000 was determined to be too high (i.e., the stack overflowed before that many levels of recursion were hit), and so the limit for them was lowered to 7500. Perhaps such is needed for your platform. You have a lot of leeway here in your tests. None of the tests in test_sre recurse more than 100 levels except for that one test which is expressly designed to blow past any limit, thereby testing that excessive recursion is caught correctly. (And on your system, it is not being caught correctly, perhaps because the stack is overflowing before the USE_RECURSION_LIMIT is hit.) Let me know the results of the test please. Thank you, Gary Herron From tim@multitalents.net Tue Apr 22 16:54:40 2003 From: tim@multitalents.net (Tim Rice) Date: Tue, 22 Apr 2003 08:54:40 -0700 (PDT) Subject: [Python-Dev] 2.3b1 release In-Reply-To: <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net> On Tue, 22 Apr 2003, Guido van Rossum wrote: > > The UnixWare build is way dead right now. (today's CVS) > > > > cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select > > gmake: *** [Modules/python.o] Error 1 > > That doesn't look like a *new* problem to me; if sys/select.h is being > included twice, that probably was so for a long time. You may be the > only person with access to this platform. Can you find the problem? > > Was this present in 2.3a2? > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think it was in 2.3a1 and probably before. It looks like the problem is having both sys/time.h and sys/select.h included when both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED are defined. SYS_SELECT_WITH_SYS_TIME is not defined in pyconfig.h so configure is detecting the problem. It's just that SYS_SELECT_WITH_SYS_TIME is not used anywhere in the code. Something like this will get things a lot farther.
------------------------
--- pyport.h.old	2003-04-17 13:17:24.000000000 -0700
+++ pyport.h	2003-04-22 08:51:43.230240009 -0700
@@ -115,7 +115,9 @@

 #ifdef HAVE_SYS_SELECT_H
+#ifdef SYS_SELECT_WITH_SYS_TIME
 #include <sys/select.h>
+#endif
 #endif /* !HAVE_SYS_SELECT_H */
------------------------

-- Tim Rice Multitalents (707) 887-1469 tim@multitalents.net From walter@livinglogic.de Tue Apr 22 16:57:12 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 22 Apr 2003 17:57:12 +0200 Subject: [Python-Dev] os.path.walk() lacks 'depth first' option In-Reply-To: <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> References: <3EA25869.6070404@noah.org> <200304202059.h3KKxUQ19593@pcp02138704pcs.reston01.va.comcast.net> <3EA34034.9060109@ActiveState.com> <200304210101.h3L11rv20026@pcp02138704pcs.reston01.va.comcast.net> <20030421014851.GB18971@glacier.arctrix.com> <200304211204.h3LC4cv20855@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EA56658.10408@livinglogic.de> Guido van Rossum wrote: >>Guido van Rossum wrote: >> >>>But if I had to do it over again, I wouldn't have added walk() in the >>>current form. >> >>I think it's the perfect place for a generator. Has anybody considered Jason Orendorff's path module (http://www.jorendorff.com/articles/python/path/) for inclusion in the standard library? It has a path walking generator and much, much more. > Absolutely! So let's try to write something new based on generators, > make it flexible enough so that it can handle pre-order or post-order > visits, and then phase out os.walk(). This new generator should probably support callbacks that determine whether directories should be entered or not.
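To make that concrete, here is one hypothetical shape such a hook could take -- a top-down generator that consults a caller-supplied callback before descending (the name `descend` and the signature are illustrative only, not a concrete proposal):

```python
import os

def walk(top, descend=None):
    """Yield (dirpath, dirnames, filenames) tuples, top-down.

    If `descend` is given, it is called with each subdirectory path and
    the whole subtree is skipped when it returns a false value.
    (Illustrative sketch; `descend` is a hypothetical parameter.)
    """
    names = os.listdir(top)
    names.sort()
    dirs, nondirs = [], []
    for name in names:
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
    yield top, dirs, nondirs
    for name in dirs:
        full = os.path.join(top, name)
        if descend is None or descend(full):
            for result in walk(full, descend):
                yield result
```

A caller could then prune, say, CVS administrative directories with `walk(top, descend=lambda d: not d.endswith('CVS'))`.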
Bye, Walter Dörwald From guido@python.org Tue Apr 22 17:01:34 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 22 Apr 2003 12:01:34 -0400 Subject: [Python-Dev] Magic number needs upgrade Message-ID: <200304221601.h3MG1Yo32750@odiug.zope.com> Now that we have new bytecode optimizations, the pyc file magic number needs to be changed. --Guido van Rossum (home page: http://www.python.org/~guido/) From walter@livinglogic.de Tue Apr 22 17:08:52 2003 From: walter@livinglogic.de (Walter Dörwald) Date: Tue, 22 Apr 2003 18:08:52 +0200 Subject: [Python-Dev] test_pwd failing In-Reply-To: <20030419160754.GA847@cthulhu.gerg.ca> References: <200304151518.h3FFI2S27822@odiug.zope.com> <3E9C25B9.7020308@livinglogic.de> <3E9C2828.4040803@livinglogic.de> <20030419160754.GA847@cthulhu.gerg.ca> Message-ID: <3EA56914.2040803@livinglogic.de> Greg Ward wrote: > On 15 April 2003, Walter Dörwald said: > >>Should the same change be done for the pwd module, i.e. >>are duplicate gid's allowed in /etc/group? > > Yes. I got a test failure from test_grp the other night, but I didn't > report it because I hadn't investigated it thoroughly yet. I'm guessing > it's the same as the test_pwd failure... and yes, it stems from a > duplicate GID in the /etc/group file on that system. This (and duplicate user or group names) should be fixed now. Bye, Walter Dörwald From guido@python.org Tue Apr 22 17:40:21 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 22 Apr 2003 12:40:21 -0400 Subject: [Python-Dev] Re: Magic number needs upgrade In-Reply-To: Your message of "Tue, 22 Apr 2003 12:01:34 EDT." Message-ID: <200304221640.h3MGeLP05887@odiug.zope.com> > Now that we have new bytecode optimizations, the pyc file magic > number needs to be changed. Of course we might also consider turning back Raymond's bytecode optimizations. Given that I can't discern any speedup, I wonder what the wisdom is of adding more code complexity.
We're still holding off on Ping and Aahz's changes (see the cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations, for similar reasons (inconclusive evidence of speedups in real programs). What makes Raymond's changes different? I also wonder why this is done unconditionally, rather than only with -O. --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Tue Apr 22 20:53:25 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 15:53:25 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <3E9DD413.8030002@v.loewis.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> Message-ID: <1051041205.32490.51.camel@barry> On Wed, 2003-04-16 at 18:07, "Martin v. Löwis" wrote: > > So why isn't the English/US-ASCII bias for msgids considered a liability > > for gettext? Do non-English programmers not want to use native literals > > in their source code? > > Using English for msgids is about the only way to get translation. > Finding a Turkish speaker who can translate from Spanish is > *significantly* more difficult than starting from English; if you were > starting from, say, Chinese, going to Hebrew might just be impossible. > > So any programmer who seriously wants to have his software translated > will put English texts into the source code. Non-English literals are > only used if l10n is not an issue. That's probably true. I'm just not sure Zope wants to make that a requirement. > > BTW, I believe that if all your msgids /are/ us-ascii, you should be > > able to ignore this change and have it work backwards compatibly. > > "This" change being addition of the "coerce" argument? If you think > you will need it, we can leave it in.
Actually, thinking about this more, we probably don't even need the coerce flag. If all your msgids are us-ascii, you don't care whether they've been coerced to Unicode or not because they'll still compare equal. So I propose to remove the coerce flag, but still Unicode-ify both msgids and msgstrs. Then .ugettext() will just return the Unicode msgstr in the catalog, while .gettext() will encode it to an 8-bit string based on the charset. Personally, I think most i18n Python apps are going to want to use .ugettext() anyway, so for the average program this will just work as expected. I have the tests passing for this change. Any objections? > >>If the msgids are UTF-8, with non-ASCII characters C-escaped, > >>translators will *still* put non-UTF-8 encodings into the catalogs. > >>This will then be a problem: The catalog encoding won't be UTF-8, > >>and you can't process the msgids. > > > > Isn't this just another validation step to run on the .po files? There > > are already several ways translators can (and do!) make mistakes, so we > > already have to validate the files anyway. > > I'm not sure how exactly a validation step would be executed. Would that > step simply verify that the encoding of a catalog is UTF-8? That > validation step would fail for catalogs that legally use other charsets. The validation step would make sure that all the msgids and msgstrs could be decoded using the encoding claimed in the headers. If msgids are us-ascii then (just about) any other encoding for msgstrs should work just fine. If there are non-ascii in both msgids and msgstrs, then some common encoding would have to be used (what other than utf-8?). It's a choice left up to the application and its translators. 
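The "still compare equal" point is easy to verify; in 2.x terms, a pure-ASCII byte-string msgid and its Unicode coercion are interchangeable as dictionary keys (a toy catalog for illustration, not the gettext.py internals):

```python
# Toy catalog keyed by Unicode msgids, as gettext.py would hold after
# coercion.  A us-ascii 8-bit msgid still finds its entry, because an
# ASCII byte string and its Unicode equivalent compare and hash equal
# in 2.x (in 3.x both literals are simply str, so this is trivially so).
catalog = {u"Hello": u"Bonjour"}

assert "Hello" == u"Hello"
assert hash("Hello") == hash(u"Hello")
assert catalog["Hello"] == u"Bonjour"
```

This is exactly why the coerce flag buys nothing for us-ascii msgids: lookups can't tell the difference.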
-Barry From jeremy@zope.com Tue Apr 22 20:47:27 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 22 Apr 2003 15:47:27 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads Message-ID: <1051040847.12834.32.camel@slothrop.zope.com> I've been working a little on the trace module lately, trying to get it to work correctly with Zope. One issue that remains open is how to handle multi-threaded programs. The PEP below proposes a solution. Jeremy PEP: XXX Title: Trace and Profile Support for Threads Version: $Revision: 1.1 $ Last-Modified: $Date: 2002/08/30 04:11:20 $ Author: Jeremy Hylton <jeremy@alum.mit.edu> Status: Active Type: Standards Track Content-Type: text/x-rst Created: 22-Apr-2003 Post-History: 22-Apr-2003 Abstract ======== This PEP describes a mechanism for attaching profile and trace functions to a thread when it is created. This mechanism allows existing tools, like the profiler, to work with multi-threaded programs. The new functionality is exposed via a new event type for trace functions. Rationale ========= The Python interpreter provides profile and trace hooks to support tools like debuggers and profilers. The hooks are associated with a single thread, which makes them harder to use in a multi-threaded environment. For example, the profiler will only collect data for a single thread. If the profiled application spawns new threads, the new threads will not be profiled. This PEP describes a mechanism that allows tools using profile and trace hooks to hook thread creation events. This mechanism would allow tools like the profiler to automatically instrument new threads as soon as they are created. The ability to hook thread creation makes a variety of tools more useful. It should allow them to work seamlessly with multi-threaded applications. The best alternative given the current interpreter support is to edit a multi-threaded application to manually insert calls to enable tracing or profiling. 
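The manual alternative described above can be sketched as follows -- a wrapper that installs the trace function from inside each new thread, which is precisely the boilerplate this PEP wants the interpreter to take over (the helper names are made up for illustration):

```python
import sys
import threading

def traced(func, tracefunc):
    """Wrap func so the new thread installs tracefunc before running."""
    def bootstrap(*args, **kwargs):
        sys.settrace(tracefunc)   # must happen *inside* the new thread
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(None)
    return bootstrap

events = []
def tracer(frame, event, arg):
    events.append(event)          # records "call", "line", "return", ...
    return tracer

def work():
    return 1 + 1

t = threading.Thread(target=traced(work, tracer))
t.start()
t.join()
```

The threading module's settrace()/setprofile() helpers (new in 2.3) automate part of this for threads started via threading, but threads created behind the tool's back still go untraced, which is the gap this PEP addresses.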
Background ========== There are two different hooks provided by the interpreter, one for tracing and one for profiling. The hooks are basically the same, except that the trace hook is called for each line that is executed but the profile hook is only called for each function. The hooks are exposed by the C API [1] and at the Python level by the sys module [2]. For simplicity, the rest of the section just talks about the trace function. A trace function [3] is called with three arguments: a frame, an event, and an event-dependent argument. The event is one of the following strings: "call," "line," "return," or "exception." The C API defines trace function that takes an int instead of a string to define the trace event. The sys.settrace() function sets the global trace function. A global trace function is called whenever a new local scope is entered. If the global trace function returns a value, it is used as the local trace function. If it returns None, no local tracing occurs. Thread creation event ===================== The proposed mechanism is to add a thread creation event called "thread" and PyTrace_THREAD. When thread.start_new_thread() is called, the calling thread's trace function is called with a thread event. The frame passed is None or NULL and the argument is the callable argument passed to start_new_thread(). If the trace function returns a value from the thread event, it is used as the global trace function for the newly created thread. Implementation ============== The bootstrap code in the thread module (Modules/threadmodule.c) must be extended to take trace functions into account. A thread's bootstate must be extended to include pointers to the trace function and its state object. The t_bootstrap() code must call the trace function before executing the boot function. Compatibility and Limitations ============================= An existing trace or profile function may be unprepared for the new event type. 
This may cause them to treat the thread event as some other kind of event. The thread event does not pass a valid frame object, because the frame isn't available before the thread starts running. Once the thread starts running, it is too late to generate the thread event. The hook is only available when a thread is created using the Python thread module. If a custom C extension calls PyThread_start_new_thread() directly, the trace function will not be called for that thread. It's hard to judge whether this behavior is good or bad. It is driven partly by implementation details. The implementation of PyThread_start_new_thread() can not tell when or if Python code will be executed by the thread.

References
==========

.. [1] Section 8.2, Profiling and Tracing, Python/C API Reference Manual (http://www.python.org/dev/doc/devel/api/profiling.html)
.. [2] Section 3.1, sys, Python Library Reference (http://www.python.org/dev/doc/devel/lib/module-sys.html)
.. [3] Section 9.2, How It Works (Python Debugger), Python Library Reference (http://www.python.org/dev/doc/devel/lib/debugger-hooks.html)

Copyright
=========

This document has been placed in the public domain.

From dave@boost-consulting.com Tue Apr 22 22:58:23 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Tue, 22 Apr 2003 17:58:23 -0400
Subject: [Python-Dev] Metatype conflict among bases?
Message-ID: <84lly2i48w.fsf@boost-consulting.com>

Consider:

class A(object):
    class __metaclass__(type):
        pass

class B(A):  # TypeError: metatype conflict among bases
    class __metaclass__(type):
        pass

Now that's a weird error message at least! There's only one base (A), and I'm telling Python explicitly to use the nested __metaclass__ instead of A's __metaclass__! Should I not be surprised that Python won't let me set the metatype explicitly?
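For what it's worth, the rule the error enforces is that a class's metatype must be a (non-strict) subclass of each base's metatype -- the explicit __metaclass__ isn't ignored, it is checked against A's and loses. A sketch, spelled with explicit type calls so it works independently of the __metaclass__ hook:

```python
class MetaA(type):
    pass

# Give A a nontrivial metatype, as the nested __metaclass__ does.
A = MetaA('A', (object,), {})

class MetaB(type):          # unrelated to MetaA
    pass

conflict = False
try:
    B = MetaB('B', (A,), {})    # same failure as the class statement
except TypeError:
    conflict = True             # "metaclass conflict" / "metatype conflict among bases"

class MetaB2(MetaA):        # a subclass of A's metatype is acceptable
    pass

B = MetaB2('B', (A,), {})   # works: MetaB2 is more derived than MetaA
```

So the way to override the metatype explicitly is to derive the new one from the bases' metatypes.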
-- Dave Abrahams Boost Consulting www.boost-consulting.com From drifty@alum.berkeley.edu Tue Apr 22 22:58:10 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 22 Apr 2003 14:58:10 -0700 (PDT) Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> [Jeremy Hylton] > I've been working a little on the trace module lately, trying to get it > to work correctly with Zope. One issue that remains open is how to > handle multi-threaded programs. The PEP below proposes a solution. > Seems reasonable to me. Now if we just got rid of threads altogether we wouldn't have to worry about this. =) <snip - a lot of stuff> > A trace function [3] is called with three arguments: a frame, an > event, and an event-dependent argument. The event is one of the > following strings: "call," "line," "return," or "exception." The C > API defines trace function that takes an int instead of a string to ^ > define the trace event. > Need "a" here? Only one grammatical mistake?!? Wish I could pull that off once in the summaries. =) -Brett From python@rcn.com Tue Apr 22 23:01:05 2003 From: python@rcn.com (Raymond Hettinger) Date: Tue, 22 Apr 2003 18:01:05 -0400 Subject: [Python-Dev] Re: Magic number needs upgrade References: <200304221640.h3MGeLP05887@odiug.zope.com> Message-ID: <002101c3091a$ac2dfac0$1a10a044@oemcomputer> > > Now that we have new bytecode optimizations, the pyc file magic > > number needs to be changed. We have several options: 1. change the magic number to accommodate NOP. 2. install an additional step that eliminates the NOPs from the bytecode (they are not strictly necessary). this will make the code even shorter and faster without a need to change the magic number. i've got this in my hip pocket if we decide that this is the way to go.
   the generated code is beautiful.

3. eliminate the last two optimizations which were the only ones that
   needed a NOP:

   a) compare_op (is, in, is not, not in) unary_not
          --> compare_op (is not, not in, is, in) nop

   b) unary_not jump_if_false (tgt)
          --> nop jump_if_true (tgt)

> I wonder what
> the wisdom is of adding more code complexity.

Part of the benefit is that there will no longer be any need to
re-arrange branches and conditionals in order to avoid 'not'.  As of
now, it has near-zero cost in most situations (except when used with
and/or).

> We're still holding off on Ping and Aahz's changes (see the
> cache-attr-branch) and Thomas and Brett's CALL_ATTR optimizations, for
> similar reasons (inconclusive evidence of speedups in real programs).
>
> What makes Raymond's changes different?

* They are thoroughly tested.

* They are decoupled from the surrounding code and
  will survive changes to ceval.c and newcompile.c.

* They provide some benefits without hurting anything else.

* They provide a framework for others to build upon.
  The scanning loop and basic block tester make it
  a piece of cake to add/change/remove new code transformations.

CALL_ATTR ought to go in when it is ready.  It certainly provides
measurable speed-up in the targeted behavior.  It just needs more
polish so that it doesn't slow down other pathways.  The benefit is
real, but in real programs it is being offset by reduced performance in
non-targeted behavior.  With some more work, it ought to be a real gem.
Unfortunately, it is tightly coupled to the implementation of new and
old-style classes.  Still, it looks like a winner.

What we're seeing is a consequence of Amdahl's law and Python's broad
scope.  Instead of a single hotspot, Python exercises many different
types of code and each needs to be optimized separately.  People have
taken on many of these and collectively they are having a great effect.
The proposals by Ping, Aahz, Brett, and Thomas are important steps to
address untouched areas.
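The not-elimination transform in option 3 can be sketched as a tiny peephole pass over a symbolic instruction list. This is illustrative only: the opcode tuples are a stand-in for real CPython bytecode, and jump-target fixup (which real NOP elimination must do) is ignored here.

```python
# Illustrative peephole pass, not CPython's actual optimizer.
# Pass 1: rewrite UNARY_NOT + JUMP_IF_FALSE into NOP + JUMP_IF_TRUE.
# Pass 2: strip the NOPs (the "additional step" of option 2), so the
# magic number would not need to change.  Real code must also retarget
# jumps whose offsets shift when NOPs are removed; that is omitted.
def peephole(instructions):
    out = list(instructions)
    for i in range(len(out) - 1):
        op, _ = out[i]
        nxt_op, nxt_arg = out[i + 1]
        if op == "UNARY_NOT" and nxt_op == "JUMP_IF_FALSE":
            out[i] = ("NOP", None)
            out[i + 1] = ("JUMP_IF_TRUE", nxt_arg)
    return [ins for ins in out if ins[0] != "NOP"]

code = [("LOAD_FAST", "x"), ("UNARY_NOT", None), ("JUMP_IF_FALSE", 12)]
print(peephole(code))  # [('LOAD_FAST', 'x'), ('JUMP_IF_TRUE', 12)]
```

The two-pass shape (scan and rewrite, then strip) mirrors the scanning-loop framework described in the email.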
I took on the task of making sure that the basic pure python code slithers along quickly. The basics like "while", "for", "if", "not" have all been improved. Lowering the cost of those constructs will result in less effort towards by-passing them with vectorized code (map, etc). Code in something like sets.py won't show much benefit because so much effort had been directed at using filter, map, dict.update, and other high volume c-coded functions and methods. Any one person's optimizations will likely help by a few percent at most. But, taken together, they will be a big win. > I also wonder why this is done unconditionally, rather than only with > -O. Neal, Brett, and I had discussed this a bit and I came to the conclusion that these code transformations are like the ones already built into the compiler -- they have some benefit, but cost almost nothing (two passes over the code string at compile time). The -O option makes sense for optimizations that have a high time overhead, throw-away debugging information, change semantics, or reduce feature access. IOW, -O is for when you're trading something away in return for a bit of speed in production code. There is essentially no benefit to not using the optimized bytecode. Raymond Hettinger From jeremy@zope.com Tue Apr 22 23:02:07 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 22 Apr 2003 18:02:07 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> References: <1051040847.12834.32.camel@slothrop.zope.com> <Pine.SOL.4.55.0304221454490.26597@death.OCF.Berkeley.EDU> Message-ID: <1051048927.12834.47.camel@slothrop.zope.com> On Tue, 2003-04-22 at 17:58, Brett Cannon wrote: > <snip - a lot of stuff> > > A trace function [3] is called with three arguments: a frame, an > > event, and an event-dependent argument. The event is one of the > > following strings: "call," "line," "return," or "exception." 
The C > > API defines trace function that takes an int instead of a string to > ^ > > define the trace event. > > > > Need "a" here? One one grammatical mistake?!? Wish I could pull that off > once in the summaries. =) The PEP was short. Just write shorter summaries <wink>. Jeremy From martin@v.loewis.de Tue Apr 22 23:15:08 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 23 Apr 2003 00:15:08 +0200 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <1051041205.32490.51.camel@barry> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> <1051041205.32490.51.camel@barry> Message-ID: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > So I propose to remove the coerce flag, but still Unicode-ify both > msgids and msgstrs. Then .ugettext() will just return the Unicode > msgstr in the catalog, while .gettext() will encode it to an 8-bit > string based on the charset. Personally, I think most i18n Python apps > are going to want to use .ugettext() anyway, so for the average program > this will just work as expected. > > I have the tests passing for this change. Any objections? For safety, I'd recommend that you use byte string msgids if conversion to Unicode fails. Otherwise, I'm fine with automatically coercing everything to Unicode. I do know about catalogs that use Latin-1 in msgids (to represent accented characters in the names of authors). That should not cause failures. 
Regards,
Martin

From mhammond@skippinet.com.au Tue Apr 22 23:27:44 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 08:27:44 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com>
Message-ID: <000a01c3091e$65a978a0$530f8490@eden>

> test_bsddb3.py fails quickly today under a debug build, with a thread
> state error, on Win2K, every time.  Linux?
>
> I assume this is a bad interaction between Mark Hammond's new
> auto-thread-state code and _bsddb.c's custom
> thread-manipulation macros:

Yes, this is my fault.  The assertion is detecting the fact that bsddb
is creating and using its own interpreter/thread states rather than
using the thread-state already seen for that thread.

As Tim says, the assertion is new, but the check it makes is valid.  I
believe that removing the assertion would allow it to work, but the
right thing to do is fix bsddb to use the new PyGILState_ API, and
therefore share the threadstate with the rest of Python.

I will do this very shortly (ie, within a couple of hours)

Mark.

From pje@telecommunity.com Tue Apr 22 23:31:24 2003
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 22 Apr 2003 18:31:24 -0400
Subject: [Python-Dev] Metatype conflict among bases?
Message-ID: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>

David Abrahams <dave@boost-consulting.com> wrote:
>
>Consider:
>
>class A(object):
>    class __metaclass__(type):
>        pass
>
>class B(A):   # TypeError: metatype conflict among bases
>    class __metaclass__(type):
>        pass
>
>Now that's a weird error message at least!  There's only one base (A),
>and I'm telling Python explicitly to use the nested __metaclass__
>instead of A's __metaclass__!
>
>Should I not be surprised that Python won't let me set the metatype
>explicitly?

The problem here is that B.__metaclass__ *must* be the same as, or a
subclass of, A.__metaclass__, or vice versa.
It doesn't matter whether the metaclass is specified implicitly or
explicitly, this constraint must be met.  Your code doesn't meet this
constraint.  Here's a revised example that does:

class A(object):
    class __metaclass__(type):
        pass

class B(A):
    class __metaclass__(A.__class__):
        pass

B.__metaclass__ will now meet the "metaclass inheritance" constraint.

See the "descrintro" document for some more info about this, and the
"Putting Metaclasses To Work" book for even more info about it than you
would ever want to know.  :)  Here's a short statement of the
constraint, though:

A class X's metaclass (X.__class__) must be identical to, or a subclass
of, the metaclass of *every* class in X.__bases__.  That is:

for b in X.__bases__:
    assert X.__class__ is b.__class__ or \
        issubclass(X.__class__, b.__class__), \
        "metatype conflict among bases"

From tim@multitalents.net Tue Apr 22 23:35:52 2003
From: tim@multitalents.net (Tim Rice)
Date: Tue, 22 Apr 2003 15:35:52 -0700 (PDT)
Subject: [Python-Dev] 2.3b1 release
In-Reply-To: <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net>
References: <200304161552.h3GFqAQ10181@odiug.zope.com>
 <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net>
 <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net>
 <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net>
Message-ID: <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net>

On Tue, 22 Apr 2003, Tim Rice wrote:

> On Tue, 22 Apr 2003, Guido van Rossum wrote:
>
> > > The UnixWare build is way dead right now. (today's CVS)
> > >
> > > cc -c -K pentium,host,inline,loop_unroll,alloca -DNDEBUG -O -I.
-I/opt/src/utils/python/python/dist/src/Include -DPy_BUILD_CORE -o Modules/python.o /opt/src/utils/python/python/dist/src/Modules/python.c
> > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 45: identifier redeclared: fd_set
> > > UX:acomp: ERROR: "/usr/include/sys/select.h", line 72: identifier redeclared: select
> > > gmake: *** [Modules/python.o] Error 1
> >
> > That doesn't look like a *new* problem to me; if sys/select.h is being
> > included twice, that probably was so for a long time.  You may be the
> > only person with access to this platform.  Can you find the problem?
> >
> > Was this present in 2.3a2?
> >
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> I think it was in 2.3a1 and probably before.
>
> It looks like the problem is having both sys/time.h and sys/select.h
> included when both _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED are
> defined.
>
> SYS_SELECT_WITH_SYS_TIME is not defined in pyconfig.h so configure
> is detecting the problem.  It's just that SYS_SELECT_WITH_SYS_TIME is
> not used anywhere in the code.
>
> Something like this will get things a lot farther.
> ------------------------
> --- pyport.h.old	2003-04-17 13:17:24.000000000 -0700
> +++ pyport.h	2003-04-22 08:51:43.230240009 -0700
> @@ -115,7 +115,9 @@
>
>  #ifdef HAVE_SYS_SELECT_H
>
> +#ifdef SYS_SELECT_WITH_SYS_TIME
>  #include <sys/select.h>
> +#endif
>
>  #endif /* !HAVE_SYS_SELECT_H */
>
> ------------------------

Well after patching pyport.h for the sys/select problem, I had errors
because of missing u_int and u_long data types.  Patch configure.in,
pyconfig.h.in, pyport.h.  Now u_char and u_short.  Patch configure.in,
pyconfig.h.in, pyport.h some more.  Now missing defines of NI_MAXHOST,
NI_NUMERICHOST, & NI_MAXSERV.

At that point I said to myself "This is nuts, 2.2.2 worked fine".
So I backed out all my other patches and added this one.
--------------------------
--- configure.in.old	2003-04-17 13:16:42.000000000 -0700
+++ configure.in	2003-04-22 15:26:13.450080095 -0700
@@ -124,6 +124,8 @@
   # of union __?sigval. Reported by Stuart Bishop.
   SunOS/5.6)
     define_xopen_source=no;;
+  OpenUNIX/8.* | UnixWare/7.*)
+    define_xopen_source=no;;
 esac
 
 if test $define_xopen_source = yes
--------------------------

Builds fine now.

-- 
Tim Rice			Multitalents	(707) 887-1469
tim@multitalents.net

From mhammond@skippinet.com.au Wed Apr 23 00:05:26 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 09:05:26 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEMCFGAA.tim@zope.com>
Message-ID: <000001c30923$aa162830$530f8490@eden>

> test_bsddb3.py fails quickly today under a debug build, with a thread
> state error, on Win2K, every time.  Linux?

Actually, some guidance would be nice here.  Is this code (_bsddb.c)
ever expected to again build under pre-trunk versions of Python, or can
I remove the old thread-state management code?

ie, should my changes be of the style:

#if defined(NEW_PYGILSTATE_API_EXISTS)
// new 1 line of code
#else
// existing many lines of code
#endif

Or just stick with the new code?

Nothing-is-finished-until-there-is-nothing-left-to-remove ly,

Mark.

From tim.one@comcast.net Wed Apr 23 00:18:22 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 22 Apr 2003 19:18:22 -0400
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <000001c30923$aa162830$530f8490@eden>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net>

[Mark Hammond]
> Actually, some guidance would be nice here.

It's easy this time.  BTW, I agree your new check is the right thing to
do!  If another case like this pops up, though, we/you should probably
add a section to the PEP explaining what to do about it.

> Is this code (_bsddb.c) ever expected to again build under pre-trunk
> versions of Python, or can I remove the old thread-state management code?
The former: the pybsddb project still exists and is used with older
versions of Python.  Barry mumbled something today at the office about
wanting to keep the C code in synch.

> ie, should my changes be of the style:
>
> #if defined(NEW_PYGILSTATE_API_EXISTS)
> // new 1 line of code
> #else
> // existing many lines of code
> #endif

Yes, that would be great.

From mhammond@skippinet.com.au Wed Apr 23 00:41:44 2003
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 23 Apr 2003 09:41:44 +1000
Subject: [Python-Dev] New thread death in test_bsddb3
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net>
Message-ID: <000301c30928$bc311160$530f8490@eden>

> Yes, that would be great.

Cool - all checked in.  Thanks.

Mark.

From guido@python.org Wed Apr 23 01:23:47 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 20:23:47 -0400
Subject: [Python-Dev] Metatype conflict among bases?
In-Reply-To: "Your message of Tue, 22 Apr 2003 17:58:23 EDT."
 <84lly2i48w.fsf@boost-consulting.com>
References: <84lly2i48w.fsf@boost-consulting.com>
Message-ID: <200304230023.h3N0Nlf26157@pcp02138704pcs.reston01.va.comcast.net>

> Consider:
>
> class A(object):
>     class __metaclass__(type):
>         pass
>
> class B(A):   # TypeError: metatype conflict among bases
>     class __metaclass__(type):
>         pass
>
> Now that's a weird error message at least!  There's only one base (A),
> and I'm telling Python explicitly to use the nested __metaclass__
> instead of A's __metaclass__!
>
> Should I not be surprised that Python won't let me set the metatype
> explicitly?

The metaclass must be a subclass of the metaclass of all the bases.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Apr 23 01:49:03 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 20:49:03 -0400
Subject: [Python-Dev] Re: Magic number needs upgrade
In-Reply-To: "Your message of Tue, 22 Apr 2003 18:01:05 EDT."
 <002101c3091a$ac2dfac0$1a10a044@oemcomputer>
References: <200304221640.h3MGeLP05887@odiug.zope.com>
 <002101c3091a$ac2dfac0$1a10a044@oemcomputer>
Message-ID: <200304230049.h3N0n3Q26957@pcp02138704pcs.reston01.va.comcast.net>

> > What makes Raymond's changes different?
>
> * They are thoroughly tested.
>
> * They are decoupled from the surrounding code and
>   will survive changes to ceval.c and newcompile.c.
>
> * They provide some benefits without hurting anything else.

What are the benefits?  I see zero improvement.  And more code hurts.

> * They provide a framework for others to build upon.
>   The scanning loop and basic block tester make it
>   a piece of cake to add/change/remove new code transformations.
>   CALL_ATTR ought to go in when it is ready.

No, only if it really makes a difference.  We can't expect to beat
Parrot by accumulating an endless string of theoretical improvements
that each contribute 0.1% speedup to the average application.

> It certainly provides measurable speed-up in the targeted behavior.
> It just needs more polish so that it doesn't slow down other
> pathways.  The benefit is real, but in real programs it is being
> offset by reduced performance in non-targeted behavior.  With some
> more work, it ought to be a real gem.  Unfortunately, it is tightly
> coupled to the implementation of new and old-style class.  Still, it
> looks like a winner.

That's what I thought, until I benchmarked it.  It's possible that it
can be saved.  It's also possible that we've pretty much reached a
point where any optimization we think of is somehow undone by the
effect of more code and hence less code locality.

> What we're seeing is a consequence of Amdahl's law and Python's
> broad scope.  Instead of a single hotspot, Python exercises many
> different types of code and each needs to be optimized separately.
> People have taken on many of these and collectively they are having
> a great effect.
> The proposals by Ping, Aahz, Brett, and Thomas
> are important steps to address untouched areas.

Possibly.  Or possibly we need to step back and redesign the
interpreter from scratch.  Or put more effort in e.g. Psyco.

> I took on the task of making sure that the basic pure python code
> slithers along quickly.  The basics like "while", "for", "if", "not"
> have all been improved.  Lowering the cost of those constructs
> will result in less effort towards by-passing them with vectorized
> code (map, etc).  Code in something like sets.py won't show much
> benefit because so much effort had been directed at using filter,
> map, dict.update, and other high volume c-coded functions and
> methods.

And I'm happy that Python 2.3 is significantly faster than 2.2 (15% in
my benchmark!).

> Any one person's optimizations will likely help by a few percent
> at most.  But, taken together, they will be a big win.

Yet, I expect that we're reaching a limit, or at least crawling up ever
slower.

> > I also wonder why this is done unconditionally, rather than only with
> > -O.
>
> Neal, Brett, and I had discussed this a bit and I came to the conclusion
> that these code transformations are like the ones already built into the
> compiler -- they have some benefit, but cost almost nothing (two passes
> over the code string at compile time).  The -O option makes sense for
> optimizations that have a high time overhead, throw-away debugging
> information, change semantics, or reduce feature access.  IOW, -O is
> for when you're trading something away in return for a bit of speed
> in production code.

Yeah, but right now -O does *nothing* except remove asserts.  We might
as well get rid of it.

> There is essentially no benefit to not using the optimized bytecode.

Of course not, if you keep putting all optimizations in the default
case.  If we had only optimized unary minus followed by a constant in
-O mode, the (several!) bugs in that optimization would have been
caught much sooner.
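The assert-stripping effect of -O discussed above is easy to see directly. In today's CPython the same switch is also exposed from within a program via compile()'s optimize argument (a later addition, not 2.3-era code; optimize=1 corresponds to the -O command-line flag):

```python
# compile()'s `optimize` argument (Python 3.2+) mirrors the -O flag:
# with optimize=1, assert statements are simply dropped from the
# generated bytecode.
src = "assert False, 'stripped under -O'"

plain = compile(src, "<demo>", "exec", optimize=0)
optimized = compile(src, "<demo>", "exec", optimize=1)

try:
    exec(plain)
    raised = False
except AssertionError:
    raised = True

exec(optimized)   # no AssertionError: the assert was removed
print(raised)     # True
```

The names `plain`/`optimized` are illustrative only; the point is that assert removal is the whole observable difference between the two code objects here.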
PS, Raymond, can I ask you to look at the following bugs and patches
that are assigned to you: bugs 549151 (!), 557704 (!), 665835, 678519,
patches 708374, 685051, 658316, 562501.  The (!) ones have priority.
It's okay if you don't have time, but in that case say so, so I can
find another way to get them addressed.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From andrew@acooke.org Wed Apr 23 02:12:05 2003
From: andrew@acooke.org (andrew cooke)
Date: Tue, 22 Apr 2003 21:12:05 -0400 (CLT)
Subject: [Python-Dev] os.path.walk() lacks 'depth first' option
In-Reply-To: <3EA4FE15.1070803@noah.org>
References: <LNBBLJKPBEHFEDALKOLCOENKEDAB.tim.one@comcast.net>
 <3EA4FE15.1070803@noah.org>
Message-ID: <53569.127.0.0.1.1051060325.squirrel@127.0.0.1>

I hesitate to post this because I'm out of my depth - I've never used
generators before, and I'm not 100% certain that this strange
compromise between imperative (the usual breadth/depth switch using
queues) and functional (the usual pre/post switch using the call stack)
algorithms is ok.  However, it appears to work and may be useful - it's
a simple extension to Noah's code that allows the user to choose
between breadth- and depth-first traversal.  It is more expensive,
using a list as either fifo or lifo queue (depending on breadth/depth
selection).

[Noah - I decided to post this rather than bother you again - hope
that's OK]

#!/usr/bin/python2.2
from __future__ import generators  # needed for Python 2.2
import os

def walktree(basepath=".", postorder=True, depthfirst=True,
             ignorelinks=True):
    """Noah Spurrier's code, modified to allow depth/breadth-first
    traversal.  The recursion is there *only* to allow postorder
    processing as the stack rolls back - the rest of the algorithm is
    imperative and queue would be declared outside helper if I knew
    how."""
    def helper(queue):
        if queue:
            if depthfirst:
                dir = queue.pop(-1)
            else:
                dir = queue.pop(0)
            children = os.listdir(dir)
            dirs, nondirs = [], []
            for name in children:
                fullpath = os.path.join(dir, name)
                if os.path.isdir(fullpath) and not \
                   (ignorelinks and os.path.islink(fullpath)):
                    dirs.append(name)
                    queue.append(fullpath)
                else:
                    nondirs.append(name)
            if not postorder:
                yield dir, dirs, nondirs
            for rest in helper(queue):
                yield rest
            if postorder:
                yield dir, dirs, nondirs
    return helper([basepath])

def test():
    for basepath, dirs, nondirs in \
            walktree(postorder=True, depthfirst=False):
        for name in dirs:
            print os.path.join(basepath, name)
        for name in nondirs:
            print os.path.join(basepath, name)

if __name__ == '__main__':
    test()

-- 
http://www.acooke.org/andrew

From andymac@bullseye.apana.org.au Wed Apr 23 00:52:19 2003
From: andymac@bullseye.apana.org.au (Andrew MacIntyre)
Date: Wed, 23 Apr 2003 10:52:19 +1100 (edt)
Subject: [Python-Dev] Re: sre vs gcc (was: New re failures on Windows)
In-Reply-To: <200304220849.43411.gherron@islandtraining.com>
Message-ID: <Pine.OS2.4.44.0304231025270.28508-100000@tenring.andymac.org>

On Tue, 22 Apr 2003, Gary Herron wrote:

> On Tuesday 22 April 2003 01:27 am, Andrew MacIntyre wrote:
{...}
> > The actual test case from test_sre that fails is:
> > ---8<---8<---
> > # non-simple '*?' still recurses and hits the recursion limit
> > test(r"""sre.search('(a|b)*?c', 10000*'ab'+'cd').end(0)""", None,
> >      RuntimeError)
> > ---8<---8<---
{...}
> Ah.  Good clue!  Here's a very likely fix to that problem.  Around
> line 3102 of _sre.c find the line that sets USE_RECURSION_LIMIT.
> Depending on your platform it will be set to either 10000 or 7500.  As
> a test, lower that value to 1000 or even 100.  If all the tests pass,
> then we know the culprit.
The magic number for USE_RECURSION_LIMIT is between 9250 & 9500.  Note
that this is for gcc 3.2.2 on FreeBSD 4.7.  For gcc 3.2.1 on OS/2, 9250
is too high, but 7500 lets test_sre complete.

If the above test case is commented out, the "Test engine limitations"
test case section fails at the same USE_RECURSION_LIMIT settings as the
above test case.

I'll prepare a patch to supersede 725024 which sets USE_RECURSION_LIMIT
to 7500 on FreeBSD & OS/2 with gcc 3.x, but I won't get to it for a day
or two.  I'll assign it to Gustavo.

-- 
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au  | Snail: PO Box 370
        andymac@pcug.org.au            |        Belconnen  ACT  2616
Web:    http://www.andymac.org/        |        Australia

From gward@python.net Wed Apr 23 02:35:06 2003
From: gward@python.net (Greg Ward)
Date: Tue, 22 Apr 2003 21:35:06 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net>
References: <20030422022607.GA1107@cthulhu.gerg.ca>
 <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net>
Message-ID: <20030423013506.GA2547@cthulhu.gerg.ca>

On 21 April 2003, Tim Peters said:
> filter() is hard to get rid of because the bizarre filter(None, seq)
> special case is supernaturally fast.  Indeed, time the above against

Hmmm, a random idea: has filter() ever been used for anything else?
I didn't think so.  So why not remove everything *except* that handy
special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and
that's *all* filter() does.

Just a random thought...

-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
Dyslexics of the world, untie!

From guido@python.org Wed Apr 23 02:37:52 2003
From: guido@python.org (Guido van Rossum)
Date: Tue, 22 Apr 2003 21:37:52 -0400
Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads
In-Reply-To: "Your message of 22 Apr 2003 15:47:27 EDT."
<1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <200304230137.h3N1bqh27095@pcp02138704pcs.reston01.va.comcast.net> > PEP: XXX > Title: Trace and Profile Support for Threads > Author: Jeremy Hylton <jeremy@alum.mit.edu> Nice idea, Jeremy! I have some more worries to add to the compatibility section. It seems reasonable for a trace implementation to implement a state machine that assumes that events come in certain orders, e.g. CALL, LINE, LINE, ..., RAISE or RETURN, and it might assume without checking that all these apply to the same frame. Calls from multiple threads would confuse such a tracer! If we can limit ourselves to threads started with the higher-level (and recommended) threading module, we could provide a different mechanism: you give the threading module a "tracer factory function" which is invoked when a thread is started and passed to sys.settrace(). Since sys.settrace() manipulates per-thread state, this should work. Since the API is new, there is no compatibility problem. The API could be super simple: threading.settrace(factory) This would cause the following to be executed when a new thread is started: sys.settrace(factory(frame, "thread", thread)) (An end-thread event should probably also be passed to the factory.) By giving the factory the same signature as the regular trace function, it is still possible to use the same tracer function if it doesn't get confused by events from multiple threads, but it's also possible to implement something different. No C code would have to be written. What do you think? Or does the dependency on the threading module kill this idea? (Then we should think of adding this to the thread module instead. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Wed Apr 23 03:33:21 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 22:33:21 -0400 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <000a01c3091e$65a978a0$530f8490@eden> References: <000a01c3091e$65a978a0$530f8490@eden> Message-ID: <1051065201.19699.2.camel@anthem> On Tue, 2003-04-22 at 18:27, Mark Hammond wrote: > Yes, this is my fault. The assertion is detecting the fact that bsddb is > creating and using its own interpreter/thread states than using the > thread-state already seen for that thread. > > As Tim says, the assertion is new, but the check it makes is valid. I > believe that removing the assertion would allow it to work, but the right > thing to do is fix bsddb to use the new PyGILState_ API, and therefore share > the threadstate with the rest of Python. > > I will do this very shortly (ie, within a couple of hours) Thanks for taking care of this Mark! Yes, as PEP 291 states, bsddb.c has to be compatible with Python 2.1. At some point we may want to re-evaluate that, but for now, if it's easy to do, we should keep compatibility. -Barry From jack@performancedrivers.com Wed Apr 23 03:38:23 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 22 Apr 2003 22:38:23 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca>; from gward@python.net on Tue, Apr 22, 2003 at 09:35:06PM -0400 References: <20030422022607.GA1107@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <20030422223823.D15881@localhost.localdomain> On Tue, Apr 22, 2003 at 09:35:06PM -0400, Greg Ward wrote: > On 21 April 2003, Tim Peters said: > > filter() is hard to get rid of because the bizarre filter(None, seq) special > > case is supernaturally fast. 
Indeed, time the above against
>
> Hmmm, a random idea: has filter() ever been used for anything else?
> I didn't think so.  So why not remove everything *except* that handy
> special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and
> that's *all* filter() does.

Most frequently I test truth of a member of a tuple or list,

    newl = filter(lambda x:x[-2], l)

secondly just plain truth, but here are some other examples.

    sql_obs = filter(lambda x:isinstance(x, SQL), l)
    words = filter(lambda x: x[-1] != ':', words)  # filter out group: related: etc
    pad_these = filter(lambda x:len(x) < maxlen, lists)
    files = filter(lambda x:dir_matches(sid, x),
                   os.listdir(libConst.STATE_DIR + '/'))
    delete_these = map(lambda x:x[0][2:],
                       filter(lambda x: x[1], d.iteritems()))
    files = filter(lambda x:x.endswith('.state'), os.listdir(base_dir))

Go ahead, ask why we don't yank out lambda too, nobody uses that *wink*

-jack

From dave@boost-consulting.com Wed Apr 23 03:39:11 2003
From: dave@boost-consulting.com (David Abrahams)
Date: Tue, 22 Apr 2003 22:39:11 -0400
Subject: [Python-Dev] Metatype conflict among bases?
In-Reply-To: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>
 (Phillip J. Eby's message of "Tue, 22 Apr 2003 18:31:24 -0400")
References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net>
Message-ID: <844r4qgcog.fsf@boost-consulting.com>

"Phillip J. Eby" <pje@telecommunity.com> writes:

> The problem here is that B.__metaclass__ *must* be the same as, or a
> subclass of, A.__metaclass__, or vice versa.  It doesn't matter
> whether the metaclass is specified implicitly or explicitly, this
> constraint must be met.  Your code doesn't meet this constraint.
> Here's a revised example that does:
>
> class A(object):
>     class __metaclass__(type):
>         pass
>
> class B(A):
>     class __metaclass__(A.__class__):
>         pass
>
> B.__metaclass__ will now meet the "metaclass inheritance" constraint.
> See the "descrintro" document for some more info about this, and the > "Putting Metaclasses To Work" book for even more info about it than > you would ever want to know. :) I knew all that once, and have since forgotten more than I knew :(. I actually already managed to make the code work by doing what you did above, so it couldn't have been buried too deeply in the caves of my brain. > Here's a short statement of the constraint, though: > > A class X's metaclass (X.__class__) must be identical to, or a > subclass of, the metaclass of *every* class in X.__bases__. That is: > > for b in X.__bases__: > assert X.__class__ is b.__class__ or issubclass(X.__class, b.__class__),\ > "metatype conflict among bases" Still, the message is misleading. There's only one base class, so the metatype conflict is not "among bases". -- Dave Abrahams Boost Consulting www.boost-consulting.com From barry@python.org Wed Apr 23 03:42:01 2003 From: barry@python.org (Barry Warsaw) Date: 22 Apr 2003 22:42:01 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca> References: <20030422022607.GA1107@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCGEOEEDAB.tim.one@comcast.net> <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <1051065721.19699.9.camel@anthem> On Tue, 2003-04-22 at 21:35, Greg Ward wrote: > On 21 April 2003, Tim Peters said: > > filter() is hard to get rid of because the bizarre filter(None, seq) special > > case is supernaturally fast. Indeed, time the above against > > Hmmm, a random idea: has filter() ever been used for anything else? > I didn't think so. So why not remove everything *except* that handy > special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and > that's *all* filter() does. I've never used it for anything else, but I'm also just as happy to use [x for x in seq if x] Although it's a bit verbose, TOOWTDI. 
-Barry

From tim.one@comcast.net Wed Apr 23 04:50:58 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 22 Apr 2003 23:50:58 -0400
Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers")
In-Reply-To: <200304221254.10510.aleax@aleax.it>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEADEEAB.tim.one@comcast.net>

[Alex Martelli]
> Hmmm, I think I must be missing something here.  Surely in many
> application cases a loop exploiting short-circuiting behavior will have
> better expected performance than anything that's going all the way
> through the sequence no matter what?

No, you're only missing that I seem rarely to have apps where it
actually matters.

> Far greater variance, sure, and if the probability of true items gets
> extreme enough then the gain from short-circuiting will evaporate,

Or, more likely, become a pessimization (liability).

> but...:
>
> [alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> > def any(x):
> >     for xx in x:
> >         if xx: return True
> >     return False
> > ''' 'any(seq)'
> 1000000 loops, best of 3: 1.42 usec per loop
>
> [alex@lancelot src]$ ./python Lib/timeit.py -s'seq=[i%2 for i in range(9999)]' -s'''
> def any(x):
>     return bool(filter(None,x))
> ''' 'any(seq)'
> 1000 loops, best of 3: 679 usec per loop
>
> ...i.e., despite filter's amazing performance, looping over 10k
> items still takes a bit more than shortcircuiting out at once;-).

It's only because Guido sped up loops for 2.3 <wink>.

> If Python ever gains such C-coded functions as any, all, etc (hopefully
> in some library module, not in builtins!) I do hope and imagine they'd
> short-circuit, of course.  BTW, I think any should return the first
> true item (or the last one if all false, or False for an empty sequence)
> and all should return the first false item (or the last one if all true,
> or True for an empty seq) by analogy with the behavior of operators
> and/or.
I agree that getting the first witness (for "any") or counterexample (for "all") can be useful. I'm not sure I care what it returns if all are false for "any", or all true for "all". If I don't care, they're easy to write with itertools now: """ import itertools def all(seq): for x in itertools.ifilterfalse(None, seq): return x # return first false value return True def any(seq): for x in itertools.ifilter(None, seq): return x # return first true value return False print all([1, 2, 3]) # True print all([1, 2, 3, 0, 4, 5]) # 0, the first counterexample print any([0, 0, 0, 0]) # False print any([0, 42, 0, 0]) # 42, the first witness """ I liked ABC's quantified boolean expressions: SOME x IN collection HAS bool_expression_presumably_referencing_x EACH x IN collection HAS bool_expression_presumably_referencing_x NO x IN collection HAS bool_expression_presumably_referencing_x The first left x bound to the first witness when true. ABC didn't have boolean data values -- these expressions could only be used in control-flow statements (like IF). x was then a block-local binding in the block controlled by the truth of the expression, so there was no question about what to do with x when the expression was false (you didn't enter the block then, so couldn't reference the block-local x). The second and third left x bound to the first counterexample when the expression was false, and in those cases x was local to the ELSE clause. I viewed that as finessing around a question that shouldn't be asked, via the simple expedient of making the question unaskable <wink>. The exact rules were pretty complicated, though. 
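ABC's block-local binding of x can't be reproduced directly in Python, but a rough approximation of the three quantifiers -- returning the witness or counterexample alongside the truth value -- might look like this (a sketch, not ABC's actual scoping rules):

```python
def some(seq, pred):
    """SOME x IN seq HAS pred(x): (True, first witness) or (False, None)."""
    for x in seq:
        if pred(x):
            return True, x
    return False, None

def each(seq, pred):
    """EACH x IN seq HAS pred(x): (True, None) or (False, first counterexample)."""
    for x in seq:
        if not pred(x):
            return False, x
    return True, None

def no(seq, pred):
    """NO x IN seq HAS pred(x): (True, None) or (False, first witness)."""
    found, x = some(seq, pred)
    return not found, x

ok, witness = some([4, 6, 7, 10], lambda x: x % 2)
assert ok and witness == 7           # first odd number is the witness

ok, counter = each([4, 6, 7, 10], lambda x: x % 2 == 0)
assert not ok and counter == 7       # first odd number is the counterexample
```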
From tim.one@comcast.net Wed Apr 23 04:59:27 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 22 Apr 2003 23:59:27 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <20030423013506.GA2547@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> [Greg Ward] > Hmmm, a random idea: has filter() ever been used for anything else? > I didn't think so. So why not remove everything *except* that handy > special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and > that's *all* filter() does. > > Just a random thought... It's been used for lots of other stuff, but I'm not sure if any other use wouldn't read better as a listcomp. For example, from spambayes: def textparts(msg): """Return a set of all msg parts with content maintype 'text'.""" return Set(filter(lambda part: part.get_content_maintype() == 'text', msg.walk())) I think that reads better as: return Set([part for part in msg.walk() if part.get_content_maintype() == 'text']) In Python 3.0 that will become a set comprehension <wink>: return {part for part in msg.walk() if part.get_content_maintype() == 'text'} From martin@v.loewis.de Wed Apr 23 06:09:28 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 23 Apr 2003 07:09:28 +0200 Subject: [Python-Dev] 2.3b1 release In-Reply-To: <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net> References: <200304161552.h3GFqAQ10181@odiug.zope.com> <Pine.UW2.4.53.0304211854020.15002@ou8.int.multitalents.net> <200304221203.h3MC3Fc24221@pcp02138704pcs.reston01.va.comcast.net> <Pine.UW2.4.53.0304220724020.25189@ou8.int.multitalents.net> <Pine.UW2.4.53.0304221527570.453@ou8.int.multitalents.net> Message-ID: <m34r4per5j.fsf@mira.informatik.hu-berlin.de> Tim Rice <tim@multitalents.net> writes: > Well after patching pyport.h for the sys/select problem, I had > errors because of missing u_int and u_long data types. In this form, I consider the patch unacceptable. 
Setting define_xopen_source should be the last resort, to be used only if the operating system is broken in the sense of not working at all as an X/Open system, for compiling software. If it is indeed the case that OpenUnix cannot work with _XOPEN_SOURCE defined, give one instance of an unsolvable problem in a comment that explains why it should be disabled. See the comments for other systems as to how to explain such problems. Saying there are "errors" is too unspecific; saying that u_int is not defined but needed for the signature of the foo_bar function would be ok. Please post your updated patch to SF. Regards, Martin From Ludovic.Aubry@logilab.fr Wed Apr 23 09:44:19 2003 From: Ludovic.Aubry@logilab.fr (Ludovic Aubry) Date: Wed, 23 Apr 2003 10:44:19 +0200 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <005e01c3084c$3fe7d300$ec11a044@oemcomputer> References: <200304192343.48211.aleax@aleax.it> <200304211252.32948.aleax@aleax.it> <200304211248.h3LCmw622763@pcp02138704pcs.reston01.va.comcast.net> <200304211703.24685.aleax@aleax.it> <003d01c30848$ebcc2d00$ec11a044@oemcomputer> <005e01c3084c$3fe7d300$ec11a044@oemcomputer> Message-ID: <20030423084419.GC567@logilab.fr> On Mon, Apr 21, 2003 at 05:23:25PM -0400, Raymond Hettinger wrote: > [RH] > > For the C implementation, consider bypassing operator.add > > and calling the nb_add slot directly. It's faster and fulfills > > the intention to avoid the alternative call to sq_concat. > > Forget I said that, you still need PyNumber_Add() to > handle coercion and such. Though without some > special casing it's going to be darned difficult to match > the performance of a pure python for-loop (especially > for a sequence of integers). Why not move the integer add optimization from ceval.c into PyNumber_Add? 
Granted you have an extra call on the fast path, but on the other hand * more code could benefit from this optimization * you don't have code related to the same operation spread in several files * the ceval loop has a reduced footprint -- Ludovic Aubry LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org From dave@boost-consulting.com Wed Apr 23 10:50:59 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 23 Apr 2003 05:50:59 -0400 Subject: [Python-Dev] Re: Fwd: summing a bunch of numbers (or "whatevers") References: <20030423013506.GA2547@cthulhu.gerg.ca> <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> Message-ID: <847k9lbkzg.fsf@boost-consulting.com> Tim Peters <tim.one@comcast.net> writes: > [Greg Ward] >> Hmmm, a random idea: has filter() ever been used for anything else? >> I didn't think so. So why not remove everything *except* that handy >> special-case: ie. in 3.0, filter(seq) == filter(None, seq) today, and >> that's *all* filter() does. >> >> Just a random thought... > > It's been used for lots of other stuff, but I'm not sure if any other use > wouldn't read better as a listcomp. For example, from spambayes: > > def textparts(msg): > """Return a set of all msg parts with content maintype 'text'.""" > return Set(filter(lambda part: part.get_content_maintype() == 'text', > msg.walk())) > > I think that reads better as: > > return Set([part for part in msg.walk() > if part.get_content_maintype() == 'text']) IMO this one's much nicer than either of those: return Set( filter_(msg.walk(), _1.get_content_maintype() == 'text') ) with filter_ = lambda x, y: filter(y, x) and _N for N in 0..9 left as an exercise to the reader. It helps my brain a lot to be able to write the sequence before the filtering function, and for the kind of simple lambdas that Python is restricted to, having to name the arguments is just syntactic deadweight. 
python = best_language([pound for pound in the_world]) but-list-comprehensions-always-read-like-strange-english-to-me-ly y'rs, -- Dave Abrahams Boost Consulting www.boost-consulting.com From mwh@python.net Wed Apr 23 11:23:31 2003 From: mwh@python.net (Michael Hudson) Date: Wed, 23 Apr 2003 11:23:31 +0100 Subject: [Python-Dev] Metatype conflict among bases? In-Reply-To: <844r4qgcog.fsf@boost-consulting.com> (David Abrahams's message of "Tue, 22 Apr 2003 22:39:11 -0400") References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <844r4qgcog.fsf@boost-consulting.com> Message-ID: <2m3ck9wlzw.fsf@starship.python.net> David Abrahams <dave@boost-consulting.com> writes: > Still, the message is misleading. There's only one base class, so > the metatype conflict is not "among bases". Not arguing with that, but: what would you suggest instead? I'm agin the idea of having small essays in tracebacks... Cheers, M. -- People think I'm a nice guy, and the fact is that I'm a scheming, conniving bastard who doesn't care for any hurt feelings or lost hours of work if it just results in what I consider to be a better system. -- Linus Torvalds From barry@python.org Wed Apr 23 12:19:16 2003 From: barry@python.org (Barry Warsaw) Date: 23 Apr 2003 07:19:16 -0400 Subject: [Python-Dev] Fwd: summing a bunch of numbers (or "whatevers") In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCAEAEEEAB.tim.one@comcast.net> Message-ID: <1051096756.19699.14.camel@anthem> On Tue, 2003-04-22 at 23:59, Tim Peters wrote: > In Python 3.0 that will become a set comprehension <wink>: PEP 274 lives! -Barry From dave@boost-consulting.com Wed Apr 23 13:17:15 2003 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 23 Apr 2003 08:17:15 -0400 Subject: [Python-Dev] Re: Metatype conflict among bases? 
References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <844r4qgcog.fsf@boost-consulting.com> <2m3ck9wlzw.fsf@starship.python.net> Message-ID: <znmhxvas.fsf@boost-consulting.com> Michael Hudson <mwh@python.net> writes: > David Abrahams <dave@boost-consulting.com> writes: > >> Still, the message is misleading. There's only one base class, so >> the metatype conflict is not "among bases". > > Not arguing with that, but: what would you suggest instead? I'm agin > the idea of having small essays in tracebacks... metatype conflict: metatype of derived class B must be a (non-strict) subclass of the metatypes of its bases I don't think that's too verbose. Too many traceback messages from Python give no indication of what the actual problem was or how to fix it, so I don't mind getting a bit more essay-like. Just today on python-list I saw this >>> range(map(lambda x:x+1, [0, 100, 3])) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: an integer is required come up as a problem for someone. -- Dave Abrahams Boost Consulting www.boost-consulting.com From skip@pobox.com Wed Apr 23 14:27:57 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 23 Apr 2003 08:27:57 -0500 Subject: [Python-Dev] okay to beef up tests on the maintenance branch? Message-ID: <16038.38109.39294.770440@montanaro.dyndns.org> Is it okay to beef up the test harness code a little on the 2.2 maintenance branch? I'm installing 2.2.2 on a Solaris 8 machine at the moment and notice two small warts: * -u all isn't accepted * there are no sunos5 expected skips Any problem adding them for 2.2.3 if they aren't already in CVS (they may already be there)? More generally, is improving the test harness okay (not strictly a bug fix) since it doesn't directly affect the performance of the interpreter? 
Thx, Skip From aleax@aleax.it Wed Apr 23 14:49:54 2003 From: aleax@aleax.it (Alex Martelli) Date: Wed, 23 Apr 2003 15:49:54 +0200 Subject: [Python-Dev] Re: Metatype conflict among bases? In-Reply-To: <znmhxvas.fsf@boost-consulting.com> References: <5.1.1.6.0.20030422182428.02e864a0@mail.rapidsite.net> <2m3ck9wlzw.fsf@starship.python.net> <znmhxvas.fsf@boost-consulting.com> Message-ID: <200304231549.54063.aleax@aleax.it> On Wednesday 23 April 2003 02:17 pm, David Abrahams wrote: ... > it, so I don't mind getting a bit more essay-like. Just today on > python-list I saw this > > >>> range(map(lambda x:x+1, [0, 100, 3])) > > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: an integer is required > > come up as a problem for someone. It's a bit better in the current CVS Python -- essentially all error messages from built-ins now identify which built-in is involved, and many give extra, pertinent information -- e.g.: [alex@lancelot src]$ ./python -c 'range(map(str,[1,2,3]))' Traceback (most recent call last): File "<string>", line 1, in ? TypeError: range() integer end argument expected, got list. As long as the message still typically fits within one line, I think there can be no substantial objection to making it clearer and more informative. Alex From fdrake@acm.org Wed Apr 23 15:52:03 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 23 Apr 2003 10:52:03 -0400 Subject: [Python-Dev] okay to beef up tests on the maintenance branch? In-Reply-To: <16038.38109.39294.770440@montanaro.dyndns.org> References: <16038.38109.39294.770440@montanaro.dyndns.org> Message-ID: <16038.43155.910242.470533@grendel.zope.com> Skip Montanaro writes: > * -u all isn't accepted I think "all" and the "-<feature>" syntax should both be added; I don't see any problem with backporting enhancements to the maintenance tools. > * there are no sunos5 expected skips The expected skips information should certainly be maintained on the maintenance branch. Feel free! 
-Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From neal@metaslash.com Wed Apr 23 18:21:10 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 23 Apr 2003 13:21:10 -0400 Subject: [Python-Dev] vacation Message-ID: <20030423172110.GO12836@epoch.metaslash.com> I'm going on vacation from Apr 26 - May 6. I will probably not be available during this period. Sometime in the next month or so, I plan to run valgrind and pychecker over everything. I should be done before beta2. Also, the snake farm still has some issues. I will try to improve the snake farm status in May or June. But if anybody wants to volunteer to fix any of the issues, feel free. :-) http://www.lysator.liu.se/xenofarm/python/latest.html Some test failures are: test_logging Solaris 8, RedHat 9 test_getargs2 Solaris 8, Mac OS X test_time RedHat 9, Linux ia64 For a hack which seems to fix test_logging problem, see my comment here: http://python.org/sf/725904 Neal From theller@python.net Wed Apr 23 18:48:06 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 19:48:06 +0200 Subject: [Python-Dev] vacation In-Reply-To: <20030423172110.GO12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> Message-ID: <3ck95cmh.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > Some test failures are: > > test_getargs2 Solaris 8, Mac OS X It seems test_getargs2 fails on big endian platforms. Is the solaris 8 such a machine? See also the comments I added to http://www.python.org/sf/724774. I have the impression that the test is broken. 
Should I try to fix it (difficult, without access to neither Mac or Solaris), or should it simply be deleted ;-) Thomas From neal@metaslash.com Wed Apr 23 19:02:21 2003 From: neal@metaslash.com (Neal Norwitz) Date: Wed, 23 Apr 2003 14:02:21 -0400 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <3ck95cmh.fsf@python.net> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> Message-ID: <20030423180221.GP12836@epoch.metaslash.com> On Wed, Apr 23, 2003 at 07:48:06PM +0200, Thomas Heller wrote: > Neal Norwitz <neal@metaslash.com> writes: > > > Some test failures are: > > > > test_getargs2 Solaris 8, Mac OS X > > It seems test_getargs2 fails on big endian platforms. Is the solaris 8 > such a machine? I believe so. > See also the comments I added to http://www.python.org/sf/724774. > > I have the impression that the test is broken. Should I try to fix it > (difficult, without access to neither Mac or Solaris), or should it > simply be deleted ;-) I think getargs_ul() is broken. For example, if the user passes more than a single char as the format, memory will be scribbled on. The format should be checked to make sure it contains acceptable values for getargs_ul() to be safe. I fixed a similar problem in revision 1.23 of _testcapimodule.c. See comment and code around line 330. I'm not really sure of the purpose of _testcapimodule, so perhaps the lack of error checking is acceptable? I can fix the problems, but not before the beta will go out. 
Neal From theller@python.net Wed Apr 23 19:08:47 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 20:08:47 +0200 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <20030423180221.GP12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <20030423180221.GP12836@epoch.metaslash.com> Message-ID: <ist53x3k.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > On Wed, Apr 23, 2003 at 07:48:06PM +0200, Thomas Heller wrote: > > Neal Norwitz <neal@metaslash.com> writes: > > > > > Some test failures are: > > > > > > test_getargs2 Solaris 8, Mac OS X > > > > It seems test_getargs2 fails on big endian platforms. Is the solaris 8 > > such a machine? > > I believe so. > > > See also the comments I added to http://www.python.org/sf/724774. > > > > I have the impression that the test is broken. Should I try to fix it > > (difficult, without access to neither Mac or Solaris), or should it > > simply be deleted ;-) > > I think getargs_ul() is broken. That was what I meant. > For example, if the user passes more > than a single char as the format, memory will be scribbled on. The > format should be checked to make sure it contains acceptable values > for getargs_ul() to be safe. It is even broken if only single character formats are passed, because it always uses an unsigned long * as the third parameter, which is wrong for 'B' and 'H' format codes. > > I fixed a similar problem in revision 1.23 of _testcapimodule.c. > See comment and code around line 330. I will take a look. > > I'm not really sure of the purpose of _testcapimodule, so perhaps > the lack of error checking is acceptable? I can fix the problems, > but not before the beta will go out. 
> > Neal Thomas From guido@python.org Wed Apr 23 19:31:53 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 14:31:53 -0400 Subject: [Python-Dev] Democracy Message-ID: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> I read this interview in ACM's *Ubiquity* which reminded me of the Python developer community. Seems we are doing some things right. Maybe we can learn from it in cases where we aren't. http://www.acm.org/ubiquity/interviews/b_manville_1.html --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Apr 23 20:24:01 2003 From: skip@pobox.com (Skip Montanaro) Date: Wed, 23 Apr 2003 14:24:01 -0500 Subject: [Python-Dev] vacation In-Reply-To: <3ck95cmh.fsf@python.net> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> Message-ID: <16038.59473.700903.98765@montanaro.dyndns.org> Thomas> I have the impression that the test is broken. Should I try to Thomas> fix it (difficult, without access to neither Mac or Solaris), or Thomas> should it simply be deleted ;-) I have access to both Mac OS X and Solaris 8. I routinely build from CVS on my Mac Laptop (my default Python interpreter there is built from CVS). I can set up a CVS tree on a Solaris 8 machine and test anything you need. Skip From theller@python.net Wed Apr 23 20:37:43 2003 From: theller@python.net (Thomas Heller) Date: 23 Apr 2003 21:37:43 +0200 Subject: [Python-Dev] vacation In-Reply-To: <16038.59473.700903.98765@montanaro.dyndns.org> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <16038.59473.700903.98765@montanaro.dyndns.org> Message-ID: <sms92eew.fsf@python.net> Skip Montanaro <skip@pobox.com> writes: > Thomas> I have the impression that the test is broken. Should I try to > Thomas> fix it (difficult, without access to neither Mac or Solaris), or > Thomas> should it simply be deleted ;-) > > I have access to both Mac OS X and Solaris 8. 
I routinely build from CVS on > my Mac Laptop (my default Python interpreter there is built from CVS). I > can set up a CVS tree on a Solaris 8 machine and test anything you need. In this case I'll try to fix it tomorrow. Thanks, Thomas From aahz@pythoncraft.com Wed Apr 23 20:46:40 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 15:46:40 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <1051040847.12834.32.camel@slothrop.zope.com> References: <1051040847.12834.32.camel@slothrop.zope.com> Message-ID: <20030423194638.GA19312@panix.com> On Tue, Apr 22, 2003, Jeremy Hylton wrote: > > Abstract > ======== > > This PEP describes a mechanism for attaching profile and trace > functions to a thread when it is created. This mechanism allows > existing tools, like the profiler, to work with multi-threaded > programs. The new functionality is exposed via a new event type for > trace functions. Hrm. While I don't want to overload what looks like a simple PEP, I'd like some thoughts about how this ought to interact with thread-local storage (if at all). There are some modules (notably the BCD module) that need to keep track of state on a per-thread basis, but without requiring a user of the module to do the work. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From guido@python.org Wed Apr 23 21:58:09 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 16:58:09 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Wed, 23 Apr 2003 15:46:40 EDT." <20030423194638.GA19312@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> Message-ID: <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> > Hrm. 
While I don't want to overload what looks like a simple PEP, I'd > like some thoughts about how this ought to interact with thread-local > storage (if at all). There are some modules (notably the BCD module) > that need to keep track of state on a per-thread basis, but without > requiring a user of the module to do the work. IMO you can do thread-local storage just fine by attaching private attributes to threading.currentThread(). --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@performancedrivers.com Wed Apr 23 22:53:11 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Wed, 23 Apr 2003 17:53:11 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Apr 23, 2003 at 02:31:53PM -0400 References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030423175310.F15881@localhost.localdomain> On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > I read this interview in ACM's *Ubiquity* which reminded me of the > Python developer community. Seems we are doing some things right. > Maybe we can learn from it in cases where we aren't. He seems to be talking more about Governments (and treating companies as governments b/c the people can't or don't want to leave) and knowledge workers broadly. A better comparison would be Habitat for Humanity (and voluntary associations in general). Habitat has some fixed overhead for the organization. They get free labor from anyone that wants to contribute it and agrees with the scope of work. The amount of product they can churn out (houses) is greatly increased by private donations that can hire full-time labor and marginal supplies. Most of the voluntary labor is from the local community who want to see the area improved. Would be home owners have to contribute large amounts of time in exchange for an inexpensive house built mostly by others. 
It wouldn't go away if there was no funding, it would just be a local fixup club (which do exist). If there is a large group of people that think they should be building differently, they will form their own association (fork) which will take some or all of the patrons and volunteers with it. It maintains its character because the bulk of the labor and all the contributions are voluntary. If they paid everyone and sold the houses at a profit they would be a regular company. The building houses vs building code analogy is not perfect. Houses have fixed costs per deployment a portion of which is paid by the new owner. Software costs are extremely low per copy, so that wouldn't work. People get real but widely varying benefits from a copy of python (personal site v commercial product). In closing, if there is something to be learned by looking at others, specific purpose voluntary associations seem to be the better place to look than governments. -jack From lalo@laranja.org Wed Apr 23 22:54:13 2003 From: lalo@laranja.org (Lalo Martins) Date: Wed, 23 Apr 2003 18:54:13 -0300 Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <20030423215413.GD8197@laranja.org> On Wed, Apr 23, 2003 at 05:53:11PM -0400, Jack Diederich wrote: > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. > > He seems to be talking more about Governments (and treating companies as > governments b/c the people can't or don't want to leave) and knowledge workers > broadly. In fact he mentions in the text that the open source community (he uses the term "open software") is a good example of this model. 
[]s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/ From aahz@pythoncraft.com Wed Apr 23 23:40:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 18:40:05 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030423224005.GA6089@panix.com> On Wed, Apr 23, 2003, Guido van Rossum wrote: > Aahz: >> >> Hrm. While I don't want to overload what looks like a simple PEP, I'd >> like some thoughts about how this ought to interact with thread-local >> storage (if at all). There are some modules (notably the BCD module) >> that need to keep track of state on a per-thread basis, but without >> requiring a user of the module to do the work. > > IMO you can do thread-local storage just fine by attaching private > attributes to threading.currentThread(). Agreed -- *if* Jeremy goes for your threading-only solution. If this PEP hooks in at a lower level, that's going to require that everything else built on top of threads work at a lower level, too. Seems to me that this is a good argument for module-level properties, BTW, or we require that all module attributes be set only through functions. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
From aahz@pythoncraft.com Wed Apr 23 23:59:05 2003 From: aahz@pythoncraft.com (Aahz) Date: Wed, 23 Apr 2003 18:59:05 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <20030423225905.GA11217@panix.com> On Wed, Apr 23, 2003, Jack Diederich wrote: > > In closing, if there is something to be learned by looking at others, > specific purpose voluntary associations seem to be the better place to > look than governments. Excellent post! Another community that I often mention along those lines is science fiction fandom. It's particularly relevant because fandom has many of the same social issues as the programming community (people who are True Believers, unbelievable amounts of politics, people with marginal social skills, and so on). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From pje@telecommunity.com Thu Apr 24 01:14:52 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Wed, 23 Apr 2003 20:14:52 -0400 Subject: [Python-Dev] Updating PEP 246 for type/class unification, 2.2+, etc. Message-ID: <5.1.1.6.0.20030423191448.00a30e20@mail.rapidsite.net> I'd like to propose some revisions to PEP 246 based on experience trying to implement a prototype of it for use in PEAK and Zope (perhaps Twisted as well). The issues I see are as follows: 1. PEP 246 allows TypeError in __conform__ and __adapt__ methods to pass silently. (After considerable work and thought, I was able to reverse-engineer *why*, but that rationale should at least be explicitly documented in the PEP, even if the limitation is unavoidable.) 2. 
The reference implementation in the PEP has fancy extra features that are not specified by the main body of the PEP, and in some cases raise more questions than they answer about what a valid PEP 246 implementation should do. (adaptRaiseTypeException, adaptForceFailException, _check, etc.) 3. The PEP 246 examples do not illustrate Python 2.2+ idioms for creating usable __conform__ and __adapt__ methods. For example, a class instance with a __call__ method gets stuck in another class in order to (presumably) work around the absence of staticmethod or classmethod in Python prior to version 2.2. The reference implementation also uses string exceptions, which were a no-no even before version 2.2. 4. PEP 246 does not cover implementation issues for developers in the cases where 'obj' is a class or 'protocol' is an instance. The former is particularly important in the context of adapting metaclass instances, and the latter is relevant for using Zope 'Interface' objects (for example) as protocols. None of these issues are unresolvable; in fact I have proposals to address them all. If the PEP authors agree with my assessments, perhaps they will undertake to update the PEP. My goal is not to get a PEP 246 'adapt()' blessed for the Python core or distro in the immediate future, but rather to have a usable reference standard for framework developers to build implementations on. Even more important... I would like framework users to be able to write __conform__ and __adapt__ methods that will be in principle usable by any framework that uses PEP 246 as a standard for adaptation. In this sense, we may view the role of PEP 246 as being similar to the Python DBAPI. 
So, without further ado, my proposals for revisions to PEP 246 are as follows: Issue #1: My reverse-engineering leads me to the conclusion that PEP 246 specifies that TypeError be ignored because of issue #4 above: using a class as 'obj' or an instance for 'protocol' may lead to a TypeError caused by using a class method as an instance method or vice versa. While the creator of the objects being supplied to 'adapt()' can work around these issues with descriptors, the casual user should not be expected to. Thus, such TypeErrors should be ignored. To resolve this dilemma, I propose that 'adapt()' use the following pseudocode to verify whether a TypeError has arisen from invocation of a method, or the execution of a method: try: # note: real implementation needs to catch AttributeError! result = obj.__conform__(protocol) if result is not None: return result except TypeError: if sys.exc_info()[2].tb_frame is not sys._getframe(): raise In other words, if the exception was raised in the calling frame, it is assumed to be an invocation error rather than an execution error, and can thus be safely ignored. The only "exception" to this pattern is if the targeted method is written in C and thus does not create a separate frame for execution. (Note that C code generated by Pyrex creates dummy execution frames before returning an exception to Python, so this is only an issue for hand-written C code.) The worst case scenario here is that authors of '__conform__' and '__adapt__' methods written in C must 1) guarantee that TypeError will not be raised, 2) accept silent loss of internal TypeErrors, or 3) write code to create a dummy frame when raising an error. As far as Jython impact, the mechanism by which TypeErrors are raised is different, so I do not know if it is possible for the Java or Python levels to cleanly make this differentiation. If Jython simulates Python frames and tracebacks, including only Python-level frames, then this would work more or less directly. 
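A runnable sketch of the discrimination described above -- note that this version tests tb_next rather than comparing tb_frame, on the assumption that the first traceback entry is always the frame containing the try, so an exception raised by the invocation itself is the one whose traceback has no deeper entry; all class and function names here are illustrative only:

```python
import sys

class Conforming(object):
    def __conform__(self, protocol):
        return self                    # claims to support any protocol

class Buggy(object):
    def __conform__(self, protocol):
        raise TypeError('bug inside __conform__')   # a genuine internal error

class Unusable(object):
    __conform__ = None                 # calling this raises TypeError
                                       # in the *caller's* frame

def try_conform(obj, protocol):
    try:
        # real code would also catch AttributeError (no __conform__ at all)
        return obj.__conform__(protocol)
    except TypeError:
        if sys.exc_info()[2].tb_next is None:
            return None   # raised by the invocation itself: ignore it
        raise             # raised inside __conform__: a real bug, propagate

assert try_conform(Conforming(), 'P') is not None
assert try_conform(Unusable(), 'P') is None
try:
    try_conform(Buggy(), 'P')
except TypeError:
    pass                  # the internal TypeError propagated, as intended
else:
    raise AssertionError('expected the internal TypeError to propagate')
```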
I confess I do not understand enough about Jython's implementation at present to know how practical it is under Jython. An alternative might be to recognize the text of the Python exception values for unbound methods, missing arguments, etc., applying to the method being called. This might actually be more complex to implement correctly, though. Issue #2: I propose that the PEP 246 reference implementation be pared down to remove extraneous features. Specifically, I believe that the signature of adapt should be: _marker = object() def adapt(obj, protocol, default=_marker): # ... attempt to return adapted result if default is _marker: raise NotImplementedError(...) 'adaptForceFailException' looks to me like a YAGNI, since an object shouldn't veto its being used for a protocol if the protocol knows how to adapt it. And the protocol doesn't need to force failure; it can return failure. Raising NotImplementedError here is also preferable to raising a TypeError for adaptation failure, which would "raise" even further confusion regarding the proper handling of TypeError. Finally, the '_check()' function should be dropped. Its presence simply makes it harder to evaluate or consider PEP 246 for inclusion in Python or a framework, because it is left unspecified what '_check()' should do. We are given many examples of what it *could* do, but not what it *should* do. In any event, I think it's a YAGNI because if the object claims it can conform or the protocol claims it can adapt, then what business is it of 'adapt()' to question the consent of the objects involved? Issue #3: Examples should use 'classmethod' for '__adapt__' rather than simulated or real 'staticmethod', and include the case where a subclass delegates to a superclass '__adapt__' method. And string exceptions are right out. Issue #4: Illustrate the issues that arise for adapting classes or metaclass instances, and using instances rather than types as protocols. Ideally, examples of descriptors that work around the issues should be included.
(And as soon as I've figured out how to write them, I'll be happy to supply source!) Thoughts, anyone? From guido@python.org Thu Apr 24 01:33:19 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 20:33:19 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Wed, 23 Apr 2003 18:40:05 EDT." <20030423224005.GA6089@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> Message-ID: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> > > Aahz: > >> Hrm. While I don't want to overload what looks like a simple PEP, I'd > >> like some thoughts about how this ought to interact with thread-local > >> storage (if at all). There are some modules (notably the BCD module) > >> that need to keep track of state on a per-thread basis, but without > >> requiring a user of the module to do the work. > On Wed, Apr 23, 2003, Guido van Rossum wrote: > > IMO you can do thread-local storage just fine by attaching private > > attributes to threading.currentThread(). Aahz: > Agreed -- *if* Jeremy goes for your threading-only solution. If this > PEP hooks in at a lower level, that's going to require that everything > else built on top of threads work at a lower level, too. Well, I think it's fair to say that you should use the higher-level threading module if you want higher-level concepts like thread-local storage. (A poor name IMO; it would be better to call it "per-thread data".) > Seems to me that this is a good argument for module-level properties, > BTW, or we require that all module attributes be set only through > functions. I'm not following. What do you mean by module-level properties? --Guido van Rossum (home page: http://www.python.org/~guido/) From amk@amk.ca Wed Apr 23 17:39:47 2003 From: amk@amk.ca (A.M. 
Kuchling) Date: Wed, 23 Apr 2003 12:39:47 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 Message-ID: <20030423163947.GA24541@nyman.amk.ca> A while ago Paul Rubin proposed adding a Rijndael/AES module to 2.3. (AES = Advanced Encryption Standard, a block cipher that's likely to be around for a long time). Rubin wanted to come up with a nice interface for the module, and has posted some notes toward it. I have an existing implementation that's 2212 lines of code; I like the interface, but opinions may vary. :) Do we want to do anything about this for 2.3? A benefit is that AES is useful, and likely to remain so for the next 20 years; a drawback is that it might entangle the PSF in export-control legalities. I vaguely recall the PSF getting some legal advice on this point; am I misremembering? What was the outcome? If AES gets added, rotor can be deprecated to encourage people to use something better; patch is at <URL:http://www.python.org/sf/679505>. --amk (www.amk.ca) Cerebral circuits in order. Physiognomy dubious. -- K9 assesses the Doctor's condition, in "The Invasion of Time" From guido@python.org Thu Apr 24 02:14:26 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 21:14:26 -0400 Subject: [Python-Dev] Democracy In-Reply-To: "Your message of Wed, 23 Apr 2003 17:53:11 EDT." <20030423175310.F15881@localhost.localdomain> References: <200304231831.h3NIVr729722@pcp02138704pcs.reston01.va.comcast.net> <20030423175310.F15881@localhost.localdomain> Message-ID: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. 
> > He seems to be talking more about Governments (and treating > companies as governments b/c the people can't or don't want to > leave) and knowledge workers broadly. Well, he specifically points out that the US government is an inappropriate model, and suggests instead to use the government of ancient Athens as a model. Then he goes on to point out several properties of that community that I think match our community pretty well: (1) Shared communal values, including moral reciprocity; you get professional or personal growth in return for your contributions. I think many developers contribute and learn something from the review of their code by others. (2) Structure, a body for debate, dialogue, and decision-making. "The organization is the people." In our case: mailing lists, PEPs, SourceForge, CVS. (3) Specific practices: the right and expectation of *participation*; *consequence* or *accountability*: if you decide something, you have to do the work; *deliberation*: resist partisanship; *merit* as the basis for decisions; and *closure*: debates shouldn't go on forever and once a decision is made, everyone is supposed to get on board. I think all those things match our way of working pretty well! > A better comparison would be Habitat for Humanity (and voluntary > associations in general). [...] Maybe. I get lots of junk mail asking for contributions from HforH and frankly I've always thought of them as yet another charity: there are lots of these, and most of them are so much larger than our community that comparison is difficult. IMO these large charities in general (maybe not HforH, I don't know anything about them because on principle I never open unsolicited mail) are too much like modern-day massive governments already: they typically have a leadership who, like politicians, would do anything to keep or improve their personal position. I hope that's not true for the Python developer community. 
Certainly my own motivation is the fun I have here and not personal gain!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Apr 24 02:17:08 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 23 Apr 2003 21:17:08 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of Wed, 23 Apr 2003 12:39:47 EDT." <20030423163947.GA24541@nyman.amk.ca> References: <20030423163947.GA24541@nyman.amk.ca> Message-ID: <200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net> > A while ago Paul Rubin proposed adding a Rijndael/AES module to 2.3. > (AES = Advanced Encryption Standard, a block cipher that's likely to > be around for a long time). Rubin wanted to come up with a nice > interface for the module, and has posted some notes toward it. I have > an existing implementation that's 2212 lines of code; I like the > interface, but opinions may vary. :) > > Do we want to do anything about this for 2.3? A benefit is that AES > is useful, and likely to remain so for the next 20 years; a drawback > is that it might entangle the PSF in export-control legalities. I > vaguely recall the PSF getting some legal advice on this point; am I > misremembering? What was the outcome? I don't recall; I think Jeremy knows most about these issues. Personally, I expect that even if we could get certification, it would be much easier if there was no encryption code at all in Python, and if people had to get it from a 3rd party site. > If AES gets added, rotor can be deprecated to encourage people to use > something better; patch is at <URL:http://www.python.org/sf/679505>. Rotor should be deprecated regardless; I've never heard of someone using it. --Guido van Rossum (home page: http://www.python.org/~guido/) From agthorr@barsoom.org Thu Apr 24 04:46:57 2003 From: agthorr@barsoom.org (Agthorr) Date: Wed, 23 Apr 2003 20:46:57 -0700 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <20030422054218.GA18642@barsoom.org> References: <20030420183005.GB8449@barsoom.org> <LNBBLJKPBEHFEDALKOLCKEKCEDAB.tim.one@comcast.net> <20030422054218.GA18642@barsoom.org> Message-ID: <20030424034656.GF12507@barsoom.org> On Mon, Apr 21, 2003 at 10:42:18PM -0700, Agthorr wrote: > However, speaking of subclassing Queue: is it likely there are many > user applications that subclass it in a way that would break? (i.e., > they override some, but not all, of the functions intended for > overriding). Answering myself, I notice that the bisect class documents this use of the Queue class: ------------------------------------------------------------------------ The bisect module can be used with the Queue module to implement a priority queue (example courtesy of Fredrik Lundh): \index{Priority Queue} \begin{verbatim} import Queue, bisect class PriorityQueue(Queue.Queue): def _put(self, item): bisect.insort(self.queue, item) ------------------------------------------------------------------------ This example relies on the behavior of the other internal functions of the Queue class. Since my faster Queue class changes the internal structure, it breaks this example. Strangely, the internal functions are not actually mentioned in the documentation for Queue, so this example is somewhat anomalous. However, the comments inside Queue.py *do* suggest subclassing Queue to create non-FIFO queues. The example was not present in 2.2, so removing it may not hurt too many people. I confess I'm new to the Python development process. Who makes decisions about whether this type of change should go in, or not? Do I just submit a patch and cross my fingers? ;) -- Agthorr From tim_one@email.msn.com Thu Apr 24 05:35:39 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 00:35:39 -0400 Subject: [Python-Dev] FIFO data structure? 
In-Reply-To: <20030424034656.GF12507@barsoom.org> Message-ID: <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> [Agthorr] >> However, speaking of subclassing Queue: is it likely there are many >> user applications that subclass it in a way that would break? (i.e., >> they override some, but not all, of the functions intended for >> overriding). [Agthorr] > Answering myself, I notice that the bisect class documents this use of > the Queue class: > ------------------------------------------------------------------------ > The bisect module can be used with the Queue module to implement > a priority queue (example courtesy of Fredrik Lundh): \index{Priority > Queue} > > \begin{verbatim} > import Queue, bisect > > class PriorityQueue(Queue.Queue): > def _put(self, item): > bisect.insort(self.queue, item) > ------------------------------------------------------------------------ > > This example relies on the behavior of the other internal functions of > the Queue class. Since my faster Queue class changes the internal > structure, it breaks this example. Strangely, the internal functions > are not actually mentioned in the documentation for Queue, so this > example is somewhat anomalous. However, the comments inside Queue.py > *do* suggest subclassing Queue to create non-FIFO queues. > > The example was not present in 2.2, so removing it may not hurt too > many people. I'm sorry I had to let this thread drop. I had lots of time to type on the weekend, and on Monday because I took that day off from work sick. My time is gone now, though. As a delayed answer to your question, yes, people do this. I expect the most common subclass does just this: def _get(self): return self.queue.pop() That is, for many apps, the first-in part of FIFO isn't needed, and a stack of work is just as good. I'm not sure it wouldn't be just as good for your simulation app, either! 
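That override, fleshed out into a runnable snippet (spelled with the modern lower-case queue module name so it runs as-is; in the Python 2.x of this thread the module is Queue):

```python
import queue  # the module this thread calls Queue

class StackQueue(queue.Queue):
    """Queue subclass that overrides only the _get hook, turning the
    underlying buffer into a stack: last item put is first item got."""
    def _get(self):
        return self.queue.pop()  # pop the right end instead of the left

sq = StackQueue()
for item in (1, 2, 3):
    sq.put(item)
print(sq.get(), sq.get(), sq.get())  # -> 3 2 1
```

The bisect-based PriorityQueue quoted above leans on the same _put/_get hook pair, which is why a reimplementation that changes the internal structure breaks it.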
People aren't "supposed to" muck with private names, and a single underscore at the front is a convention for saying "please don't muck with this". I don't believe you made a strong enough case to break code that cheats, though: the code as it is now is obviously correct at first glance. The best that can be said for the much hairier circular-buffer business is that it's not obviously incorrect at first glance, and Python isn't immune to the rule that ongoing maintenance is more expensive than initial development. I also think your use of (presumably many) thousands of Queue items is unusual. A subclass may be welcome, and doc clarifications would certainly be welcome. > I confess I'm new to the Python development process. Who makes > decisions about whether this type of change should go in, or not? Do > I just submit a patch and cross my fingers? ;) There aren't enough volunteers to review patches, and "Guido's team" doesn't spend work hours on Python anymore except as it happens to intersect with important Zope needs, so I'm afraid it may sit there forever. Talking about it on Python-Dev was/is a good thing. If you haven't already, you should devour the developer material at: http://www.python.org/dev/ Right now we're trying to conserve our "spare time" for resolving issues necessary to release 2.3b1 on Friday, so it's hard to keep a conversation going. Don't let any of this discourage you! To become a Python developer requires an almost supernatural love of discouragement -- you'll know what I mean when you meet any of us <wink>. From tim_one@email.msn.com Thu Apr 24 05:42:35 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 00:42:35 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCIEPIEHAB.tim_one@email.msn.com> [Guido] > ... > Certainly my own motivation is the fun I have here and not personal > gain!!! It's good to hear that.
I've been worrying that if your goal had been riches and power all along, you must be incompetent <wink>. don't-worry-*our*-goal-is-your-personal-gain-ly y'rs - tim From martin@v.loewis.de Thu Apr 24 06:19:55 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 24 Apr 2003 07:19:55 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <20030423163947.GA24541@nyman.amk.ca> References: <20030423163947.GA24541@nyman.amk.ca> Message-ID: <m3lly079qc.fsf@mira.informatik.hu-berlin.de> "A.M. Kuchling" <amk@amk.ca> writes: > Do we want to do anything about this for 2.3? A benefit is that AES > is useful, and likely to remain so for the next 20 years; a drawback > is that it might entangle the PSF in export-control legalities. I > vaguely recall the PSF getting some legal advice on this point; am I > misremembering? What was the outcome? I think we now formally meet all US export requirements. The requirement is that we inform some agency that we do export cryptographic software. Jeremy did that. I don't recall the exact details of that registration, but I think it would be easy to update it to also report that we export an AES implementation (or, perhaps, our registration was generic to cover all future additions to the SF CVS tree). So I'm all in favour of adding AES to the Python standard library. Regards, Martin From agthorr@barsoom.org Thu Apr 24 06:26:41 2003 From: agthorr@barsoom.org (Agthorr) Date: Wed, 23 Apr 2003 22:26:41 -0700 Subject: [Python-Dev] FIFO data structure? In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> References: <20030424034656.GF12507@barsoom.org> <LNBBLJKPBEHFEDALKOLCEEPIEHAB.tim_one@email.msn.com> Message-ID: <20030424052640.GG12507@barsoom.org> On Thu, Apr 24, 2003 at 12:35:39AM -0400, Tim Peters wrote: > I'm sorry I had to let this thread drop. I had lots of time to type on the > weekend, and on Monday because I took that day off from work sick. My time > is gone now, though.
Quite alright; I understand entirely. I appreciate your responding now :-) > That is, for many apps, the first-in part of FIFO isn't needed, and a stack > of work is just as good. I'm not sure it wouldn't be just as good for your > simulation app, either! It might be. When I originally wrote my simulation-dispatcher, it needed to work with a small FIFO Queue. I'm currently using it for a different project where a large stack would work fine. There's a good chance that sometime in the future I'll need the large FIFO, though. > People aren't "supposed to" muck with private names, and a single underscore > at the front is a convention for saying "please don't muck with > this". I've been thinking about what the "right way" for the Queue to expose its interface would be. It doesn't seem quite right for those functions to be "public" names either, since they should never actually be called directly by a user program. Is there a convention for member functions that are meant to be overridden, but not (externally) called? > I don't believe you made a strong enough case to break code that cheats, > though: the code as it is now is obviously correct at first glance. The > best that can be said for the much hairier circular-buffer business is that > it's not obviously incorrect at first glance, and Python isn't immune to > the rule that ongoing maintenance is more expensive than initial development. I also > think your use of (presumably many) thousands of Queue items is unusual. A > subclass may be welcome, and doc clarifications would certainly be > welcome. That's fair. My other primary programming language is C, where the standard libraries tend to be tightly optimized for performance. Hence, my expectations tend to be biased in that direction.
That doesn't mean that my expectations are the right way to do things though ;) "Premature optimization is the root of much evil" > There aren't enough volunteers to review patches, and "Guido's team" doesn't > spend work hours on Python anymore except as it happens to intersect with > important Zope needs, so I'm afraid it may sit there forever. Talking about > it on Python-Dev was/is a good thing. If you haven't already, you should > devour the developer material at: > > http://www.python.org/dev/ I have, indeed, already devoured it. :) > Right now we're trying to conserve our "spare time" for resolving issues > necessary to release 2.3b1 on Friday, so it's hard to keep a conversation > going. Okay, in that case I'll drop the Queue issue for now, and revisit the thread on heaps. That's something I feel needs to be done right for 2.3, or a bunch of user code will come to depend on the heap implementation rather than the heap interface. > Don't let any of this discourage you! To become a Python developer requires > an almost supernatural love of discouragement -- you'll know what I mean > when you meet any of us <wink>. Thanks :) -- Agthorr From ji@mit.jyu.fi Thu Apr 24 07:12:05 2003 From: ji@mit.jyu.fi (Jonne Itkonen) Date: Thu, 24 Apr 2003 09:12:05 +0300 (EETDST) Subject: [Python-Dev] Democracy In-Reply-To: <20030423175310.F15881@localhost.localdomain> Message-ID: <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi> On Wed, 23 Apr 2003, Jack Diederich wrote: > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > I read this interview in ACM's *Ubiquity* which reminded me of the > > Python developer community. Seems we are doing some things right. > > Maybe we can learn from it in cases where we aren't. > > He seems to be talking more about Governments (and treating companies as > governments b/c the people can't or don't want to leave) and knowledge > workers broadly. ... > The building houses vs building code analogy is not perfect. ... 
> In closing, if there is something to be learned by looking at others, There always is... The article at Ubiquity, Jack's writings, and the appearance of ancient Greeks here and there... I'd like to point you to http://www.dreamsongs.org/MobSoftware.html Is the resemblance in my eyes, or do we get a glimpse of a paradigm shift approaching? Jonne From jack@performancedrivers.com Thu Apr 24 09:05:43 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Thu, 24 Apr 2003 04:05:43 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi>; from ji@mit.jyu.fi on Thu, Apr 24, 2003 at 09:12:05AM +0300 References: <20030423175310.F15881@localhost.localdomain> <Pine.HPX.4.44.0304240829210.17419-100000@tarzan.it.jyu.fi> Message-ID: <20030424040543.I15881@localhost.localdomain> On Thu, Apr 24, 2003 at 09:12:05AM +0300, Jonne Itkonen wrote: > On Wed, 23 Apr 2003, Jack Diederich wrote: > > > On Wed, Apr 23, 2003 at 02:31:53PM -0400, Guido van Rossum wrote: > > > I read this interview in ACM's *Ubiquity* which reminded me of the > > > Python developer community. Seems we are doing some things right. > > > Maybe we can learn from it in cases where we aren't. > > > > He seems to be talking more about Governments (and treating companies as > > governments b/c the people can't or don't want to leave) and knowledge > > workers broadly. > ... > > The building houses vs building code analogy is not perfect. > ... > > In closing, if there is something to be learned by looking at others, > > There always is... > > http://www.dreamsongs.org/MobSoftware.html > Before we go too far afield, does anyone know of a Wiki where this kind of thing is discussed, or is anyone willing to host one? This is a worthwhile conversation, but is a runaway favorite for off topic thread of tomorrow. -jack From mal@lemburg.com Thu Apr 24 09:32:27 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 24 Apr 2003 10:32:27 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3lly079qc.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EA7A11B.8090202@lemburg.com> Martin v. Löwis wrote: > "A.M. Kuchling" <amk@amk.ca> writes: > > >>Do we want to do anything about this for 2.3? A benefit is that AES >>is useful, and likely to remain so for the next 20 years; a drawback >>is that it might entangle the PSF in export-control legalities. I >>vaguely recall the PSF getting some legal advice on this point; am I >>misremembering? What was the outcome? > > I think we now formally meet all US export requirements. The > requirement is that we inform some agency that we do export > cryptographic software. Jeremy did that. I don't recall the exact > details of that registration, but I think it would be easy to update > it to also report that we export an AES implementation (or, perhaps, > our registration was generic to cover all future additions to the SF > CVS tree). > > So I'm all in favour of adding AES to the Python standard library. -1. Why do you only look at US export rules when discussing crypto code in Python? There are plenty of other countries where importing/exporting and/or using such code is illegal: http://rechten.kub.nl/koops/cryptolaw/cls2.htm Please keep the crypto code separate from the core Python distribution. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 24 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 61 days left From mal@lemburg.com Thu Apr 24 09:36:56 2003 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 24 Apr 2003 10:36:56 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A11B.8090202@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> Message-ID: <3EA7A228.2010705@lemburg.com> M.-A. Lemburg wrote: > Why do you only look at US export rules when discussing crypto > code in Python ? There are plenty of other countries where > importing/exporting and/or using such code is illegal: > > http://rechten.kub.nl/koops/cryptolaw/cls2.htm > > Please keep the crypto code separate from the core Python > distribution. Here's a really nice graphical overview: http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 24 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 61 days left From nramchandani@harveynash.com Thu Apr 24 12:23:22 2003 From: nramchandani@harveynash.com (Neeta Ramchandani) Date: Thu, 24 Apr 2003 12:23:22 +0100 Subject: [Python-Dev] Python Developers Message-ID: <sea7d753.075@lon_nw_9.harveynash.com> Hi, I know there aren't many of you guys, but I have an Investment Bank that is looking for an OO Scriptor, with at least 2 years Java experience with Unix and proper Python development skills. Anyone know anyone....or anyone interested in this 3-6 contract?
Neeta Ramchandani Key Account Manager Harvey Nash IT Investment Banking / Finance Team DD: 020 73331518 Fax: 020 73332657 E-mail: nramchandani@harveynash.com Website: www.harveynash.com From guido@python.org Thu Apr 24 13:20:53 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 08:20:53 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of Thu, 24 Apr 2003 10:36:56 +0200." <3EA7A228.2010705@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> Message-ID: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> > M.-A.
Lemburg wrote: > > Why do you only look at US export rules when discussing crypto > > code in Python ? There are plenty of other countries where > > importing/exporting and/or using such code is illegal: > > > > http://rechten.kub.nl/koops/cryptolaw/cls2.htm > > > > Please keep the crypto code separate from the core Python > > distribution. > > Here's a really nice graphical overview: > > http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm Thanks for the URLs! Another good reason to avoid tying up Python with crypto. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Apr 24 13:38:02 2003 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 24 Apr 2003 08:38:02 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A228.2010705@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> Message-ID: <20030424123802.GA32257@ute.mems-exchange.org> On Thu, Apr 24, 2003 at 10:36:56AM +0200, M.-A. Lemburg wrote: >Here's a really nice graphical overview: > http://rechten.kub.nl/koops/cryptolaw/cls-sum.htm Thanks for posting this link; very nice! Guido wrote: >Rotor should be deprecated regardless; I've never heard of someone >using it. Actually, back when Zope was Principia, products could be shipped as encrypted .pyc's, and the rotor module was used to encrypt them. It's not relevant now, though. I'll mark the deprecation patch as accepted and check it in. --amk (www.amk.ca) "Generic identifier" -- think about it too much and your head explodes. 
-- Sean McGrath at IPC7, discussing SGML terminology From aahz@pythoncraft.com Thu Apr 24 14:31:52 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 24 Apr 2003 09:31:52 -0400 Subject: [Python-Dev] Python Developers In-Reply-To: <sea7d753.075@lon_nw_9.harveynash.com> References: <sea7d753.075@lon_nw_9.harveynash.com> Message-ID: <20030424133152.GC12899@panix.com> On Thu, Apr 24, 2003, Neeta Ramchandani wrote: > > I know there aren't many of you guys, but I have an Investment Bank > that is looking for an OO Scriptor, with at least 2 years Java > experience with Unix and proper Python development skills. Anyone > know anyone....or anyone interested in this 3-6 contract? Please send this to jobs@python.org; that's where to advertise for Python jobs. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? From skip@pobox.com Thu Apr 24 15:10:17 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 09:10:17 -0500 Subject: [Python-Dev] why is test_socketserver in expected skips? Message-ID: <16039.61513.240914.807445@montanaro.dyndns.org> test_socketserver seems to be in all the expected skip lists except for (oddly enough) os2emx. It correctly bails if the network resource isn't set and the 2.2 branch version seems to complete for me on my Mac OS X system. When run like: % ./python.exe ../Lib/test/test_socketserver.py the 2.3 branch version fails because the network resource isn't enabled: Traceback (most recent call last): File "../Lib/test/test_socketserver.py", line 5, in ? 
test_support.requires('network') File "/Users/skip/src/python/head/dist/src/Lib/test/test_support.py", line 68, in requires raise ResourceDenied(msg) test.test_support.ResourceDenied: Use of the `network' resource not enabled [5953 refs] Seems like a fairly simple change to test_support.requires() would correct things: def requires(resource, msg=None): # see if the caller's module is __main__ - if so, treat as if # the resource was set if sys._getframe().f_back.f_globals.get("__name__") == "__main__": return if not is_resource_enabled(resource): if msg is None: msg = "Use of the `%s' resource not enabled" % resource raise ResourceDenied(msg) Someone please shout if the above not-quite-obvious code doesn't look correct. Thx, Skip From guido@python.org Thu Apr 24 15:18:26 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 10:18:26 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: Your message of "Thu, 24 Apr 2003 09:10:17 CDT." <16039.61513.240914.807445@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> Message-ID: <200304241418.h3OEIQA11173@odiug.zope.com> > test_socketserver seems to be in all the expected skip lists except > for (oddly enough) os2emx. Probably because the os2emx port hasn't been updated in a while. > It correctly bails if the network resource isn't set and the 2.2 > branch version seems to complete for me on my Mac OS X system. When > run like: > > % ./python.exe ../Lib/test/test_socketserver.py > > the 2.3 branch version fails because the network resource isn't enabled: > > Traceback (most recent call last): > File "../Lib/test/test_socketserver.py", line 5, in ? 
> test_support.requires('network')
> File "/Users/skip/src/python/head/dist/src/Lib/test/test_support.py", line 68, in requires
> raise ResourceDenied(msg)
> test.test_support.ResourceDenied: Use of the `network' resource not enabled
> [5953 refs]
>
> Seems like a fairly simple change to test_support.requires() would
> correct things:
>
> def requires(resource, msg=None):
>     # see if the caller's module is __main__ - if so, treat as if
>     # the resource was set
>     if sys._getframe().f_back.f_globals.get("__name__") == "__main__":
>         return
>     if not is_resource_enabled(resource):
>         if msg is None:
>             msg = "Use of the `%s' resource not enabled" % resource
>         raise ResourceDenied(msg)
>
> Someone please shout if the above not-quite-obvious code doesn't look
> correct.

Looks good to me; I've thought of this myself occasionally. Please also update the README file for testing to mention this detail! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Thu Apr 24 15:58:36 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 10:58:36 -0400 Subject: [Python-Dev] Re: [I18n-sig] Changes to gettext.py for Python 2.3 In-Reply-To: <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> References: <1050083516.11172.40.camel@barry> <3E971D8A.5020006@v.loewis.de> <1050092819.11172.89.camel@barry> <m3istk3pr3.fsf@mira.informatik.hu-berlin.de> <1050511925.9818.78.camel@barry> <m3u1cy9rlp.fsf@mira.informatik.hu-berlin.de> <1050521768.14112.15.camel@barry> <3E9DD413.8030002@v.loewis.de> <1051041205.32490.51.camel@barry> <m3fzoatc0j.fsf@mira.informatik.hu-berlin.de> Message-ID: <1051196316.22909.13.camel@barry> On Tue, 2003-04-22 at 18:15, Martin v. Löwis wrote: > For safety, I'd recommend that you use byte string msgids if > conversion to Unicode fails. Otherwise, I'm fine with automatically > coercing everything to Unicode.
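The fallback Martin recommends could be sketched like this (a hypothetical helper in modern Python syntax; the name coerce_msgid and its signature are illustrative, not the actual gettext.py change):

```python
def coerce_msgid(msgid, charset):
    """Hypothetical sketch of the suggested fallback: try to decode a
    catalog msgid to Unicode, and keep the raw byte string when the
    declared charset cannot decode it."""
    try:
        return msgid.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return msgid
```

A Latin-1 msgid inside a catalog declared as UTF-8 would then survive as a byte string instead of raising.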
For now, I'll add a comment to the code at the point of conversion since I'm not sure whether it's better to throw an exception or attempt to carry on with 8-bit strings. I'll update the docs too. > I do know about catalogs that use Latin-1 in msgids (to represent > accented characters in the names of authors). That should not cause > failures. Cool, thanks for the feedback Martin! -Barry From skip@pobox.com Thu Apr 24 16:34:13 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 10:34:13 -0500 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: <200304241418.h3OEIQA11173@odiug.zope.com> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> Message-ID: <16040.1013.400199.534299@montanaro.dyndns.org> >>>>> "Guido" == Guido van Rossum <guido@python.org> writes: >> test_socketserver seems to be in all the expected skip lists except >> for (oddly enough) os2emx. Guido> Probably because the os2emx port hasn't been updated in a while. I guess I should have phrased my question differently. Why is it on any expected skip lists at all? It seems to me that the 'network' resource requirement is sufficient to keep it from being run inappropriately.

>> def requires(resource, msg=None):
>>     # see if the caller's module is __main__ - if so, treat as if
>>     # the resource was set
...
>> Someone please shout if the above not-quite-obvious code doesn't look
>> correct.

Guido> Looks good to me; I've thought of this myself occasionally. Guido> Please also update the README file for testing to mention this Guido> detail! Thanks, I'll tuck it into CVS later today. Skip From guido@python.org Thu Apr 24 16:48:54 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 11:48:54 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: Your message of "Thu, 24 Apr 2003 10:34:13 CDT."
<16040.1013.400199.534299@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> <16040.1013.400199.534299@montanaro.dyndns.org> Message-ID: <200304241548.h3OFms411960@odiug.zope.com> > I guess I should have phrased my question differently. Why is it on > any expected skip lists at all? It seems to me that the 'network' > resource requirement is sufficient to keep it from being run > inappropriately. It seems to me too. It looks like such tests are still added to the "skipped" lists by regrtest.main(). Maybe they shouldn't be? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Thu Apr 24 16:52:56 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 24 Apr 2003 11:52:56 -0400 Subject: [Python-Dev] why is test_socketserver in expected skips? In-Reply-To: <16040.1013.400199.534299@montanaro.dyndns.org> References: <16039.61513.240914.807445@montanaro.dyndns.org> <200304241418.h3OEIQA11173@odiug.zope.com> <16040.1013.400199.534299@montanaro.dyndns.org> Message-ID: <16040.2136.405300.211588@grendel.zope.com> Skip Montanaro writes: > I guess I should have phrased my question differently. Why is it on any > expected skip lists at all? It seems to me that the 'network' resource > requirement is sufficient to keep it from being run inappropriately. Being on the expected skip lists doesn't keep it from running; the resource requirement handles that, and causes it to be skipped when the resource isn't enabled. Until fairly recently, a test that was skipped due to resource denial was still reported as an unexpected skip if it wasn't listed. That was fixed in Lib/test/regrtest.py revision 1.122. -Fred -- Fred L. Drake, Jr.
<fdrake at acm.org> PythonLabs at Zope Corporation From theller@python.net Thu Apr 24 17:24:48 2003 From: theller@python.net (Thomas Heller) Date: 24 Apr 2003 18:24:48 +0200 Subject: [Python-Dev] Re: test_getargs2 failures (was: vacation) In-Reply-To: <20030423180221.GP12836@epoch.metaslash.com> References: <20030423172110.GO12836@epoch.metaslash.com> <3ck95cmh.fsf@python.net> <20030423180221.GP12836@epoch.metaslash.com> Message-ID: <8ytz3ltb.fsf@python.net> Neal Norwitz <neal@metaslash.com> writes: > I think getargs_ul() is broken. For example, if the user passes more > than a single char as the format, memory will be scribbled on. The > format should be checked to make sure it contains acceptable values > for getargs_ul() to be safe. > > I fixed a similar problem in revision 1.23 of _testcapimodule.c. > See comment and code around line 330. > I've replaced the getargs_ul() function and friends with new getargs_X() functions for all the tested format codes. I've also adapted test_getargs2 to use these new functions. Skip and Jack have offered to test this, anyone else is welcome as well to report crashes. Thomas From mcherm@mcherm.com Thu Apr 24 17:44:09 2003 From: mcherm@mcherm.com (Michael Chermside) Date: Thu, 24 Apr 2003 09:44:09 -0700 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option Message-ID: <1051202649.3ea814599f6fa@mcherm.com> Tim: Don't get a swelled head or anything ;-), but your generator-based version of walk() is a beautiful piece of work. I don't mean the code (although that's clean and readable), but the design. Using a generator is clearly good, having it return (path,names) tuples is a nice way to work, and having it return (path,dirnames,filenames) tuples is inspired. (If you want them lumped together, just add the lists!) Allowing the consumer to control the flow by modifying dirnames is very nice. And the fact that it's so simple to code (22 short lines) is a testament to the power of generators.
I'm +2 on putting this in immediately and deprecating os.path.walk(). -- Michael Chermside From guido@python.org Thu Apr 24 17:50:25 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 12:50:25 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Thu, 24 Apr 2003 09:44:09 PDT." <1051202649.3ea814599f6fa@mcherm.com> References: <1051202649.3ea814599f6fa@mcherm.com> Message-ID: <200304241650.h3OGoPM15432@odiug.zope.com> > From: Michael Chermside <mcherm@mcherm.com> > Tim: > > Don't get a swelled head or anything ;-), but your generator-based > version of walk() is beautiful piece of work. I don't mean the code > (although that's clean and readable), but the design. Using a > generator is clearly good, having it return (path,names) tuples is a > nice way to work, and having it return (path,dirnames,filenames) > tuples is inspired. (If you want them lumped together, just add the > lists!) Allowing the consumer to modify control the flow by > modifying dirnames is very nice. And the fact that it's so simple to > code (22 short lines) is a testament to the power of generators. > > I'm +2 on putting this in immediately and deprecating os.path.walk(). Agreed. How about naming it os.walk()? I think it's not OS specific -- all the OS specific stuff is part of os.path. So we only need one implementation. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Thu Apr 24 18:04:57 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 13:04:57 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option References: <1051202649.3ea814599f6fa@mcherm.com> <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <001e01c30a83$a2cf6440$b6b8958d@oemcomputer> > > I'm +2 on putting this in immediately and deprecating os.path.walk(). > > Agreed. How about naming it os.walk()? I think it's not OS specific > -- all the OS specific stuff is part of os.path. 
So we only need one implementation. Double check on SF. Someone had posted a patch for this, and Martin v. Löwis had some reasons for rejecting it or something else that should have been done at the same time. Raymond Hettinger From python@rcn.com Thu Apr 24 18:48:09 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 13:48:09 -0400 Subject: [Python-Dev] netrc.py Message-ID: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Bram Moolenaar > > Please at least do not produce the NetrcParseError when the > > "login" field is omitted. This can be done by changing the > > "else:" above "malformed %s entry" to "elif not password:". > > That is the minimal change to make this module work on my > > system. Bram is requesting netrc.py be modified to exclude entries without a login field. An example use case is for mail servers:

    machine mail password fruit

If the change is made, the line won't be handled at all. It would be silently skipped. Currently it raises a NetrcParseError. Do you guys think this is appropriate? On the one hand, it's a bummer that netrc.py cannot currently be used with files containing these lines. On the other hand, silently skipping over them doesn't seem quite right either. Raymond Hettinger P.S. He would also like (but does not have to have) this backported. From martin@v.loewis.de Thu Apr 24 19:21:20 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 20:21:20 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA7A11B.8090202@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> Message-ID: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> "M.-A. Lemburg" <mal@lemburg.com> writes: > Why do you only look at US export rules when discussing crypto > code in Python ? Because only exporting matters. Importing is no problem: You can easily *remove* stuff from the distribution, by creating a copy of the package that doesn't have the code that cannot be imported. That would be the job of whoever wants to import it. Exporting also only matters from the servers which host the Python distribution, i.e. the US and the Netherlands. Regards, Martin From martin@v.loewis.de Thu Apr 24 19:22:41 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 20:22:41 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > Thanks for the URLs! Another good reason to avoid tying up Python > with crypto. I don't consider that a good reason. Including batteries is one of the strengths of Python, and if there are useful libraries, we should attempt to include them. Regards, Martin From esr@thyrsus.com Thu Apr 24 19:26:22 2003 From: esr@thyrsus.com (Eric S.
Raymond) Date: Thu, 24 Apr 2003 14:26:22 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <004601c30a89$c4459e40$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Message-ID: <20030424182622.GA21500@thyrsus.com> Raymond Hettinger <raymond.hettinger@verizon.net>: > Bram Moolenaar > > > Please at least do not produce the NetrcParseError when the > > > "login" field is omitted. This can be done by changing the > > > "else:" above "malformed %s entry" to "elif not password:". > > > That is the minimal change to make this module work on my > > > system. > > Bram is requesting netrc.py be modified to exclude entries > without a login field. An example use case is for mail servers: > > machine mail password fruit > > If the change is made, the line won't be handled at all. It > would be silently skipped. Currently it raises a NetrcParseError. > > Do you guys think this is appropriate? On the one hand, > it's a bummer that netrc.py cannot currently be used with > files containing these lines. On the other hand, silently > skipping over them doesn't seem quite right either. As the original designer, I say -1. It's not clear to me when or how entries of this kind have value. But I'm willing to be convinced otherwise by a good argument. -- <a href="http://www.catb.org/~esr/">Eric S. Raymond</a> From guido@python.org Thu Apr 24 19:30:30 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 14:30:30 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 20:22:41 +0200." <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304241830.h3OIUUj22372@odiug.zope.com> > > Thanks for the URLs!
Another good reason to avoid tying up Python > > with crypto. > > I don't consider that a good reason. Including batteries is one of the > strengths of Python, and if there are useful libraries, we should > attempt to include them. IMO there are more important batteries to include before we deal with the hassle of registering for crypto stuff. Even if it's harmless, the inclusion of any crypto at all causes some people to have to go through a lot of corporate red tape. I just dealt with questions from someone who was re-exporting Python and needed answers for his corporate lawyer. If I had to say "yes, Python contains an AES implementation" his red tape amount would have multiplied. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Thu Apr 24 19:37:20 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 13:37:20 -0500 Subject: [Python-Dev] netrc.py In-Reply-To: <004601c30a89$c4459e40$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> Message-ID: <16040.12000.700605.215458@montanaro.dyndns.org> >>>>> "Raymond" == Raymond Hettinger <raymond.hettinger@verizon.net> writes: Raymond> Bram Moolenaar >> > Please at least do not produce the NetrcParseError when the >> > "login" field is omitted. This can be done by changing the >> > "else:" above "malformed %s entry" to "elif not password:". >> > That is the minimal change to make this module work on my >> > system. Raymond> Bram is requesting netrc.py be modified to exclude entries Raymond> without a login field. An example use case is for mail Raymond> servers: Raymond> machine mail password fruit Raymond> If the change is made, the line won't be handled at all. It Raymond> would be silently skipped. Currently is raises a Raymond> NetrcParseError. Why not have it add an entry to self.hosts with an empty string associated with the 'login' key? Skip From fdrake@acm.org Thu Apr 24 19:43:06 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 24 Apr 2003 14:43:06 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <20030424182622.GA21500@thyrsus.com> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> Message-ID: <16040.12346.526703.651003@grendel.zope.com> Eric S. Raymond writes: > As the original designer, I say -1. It's not clear to me when or > how entries of this kind have value. But I'm willing to be > convinced otherwise by a good argument. Looking at the netrc(5) manpage on my RedHat 7.3 box, I'd say it's clear that a machine entry without a login should specifically suppress autologin for that machine. For example, this .netrc file:

    machine ftp.example.com
    default login anonymous password fred@example.com

should cause autologin on every machine except for ftp.example.com. If the ftp.example.com entry is simply dropped, the default could be used, and that would be wrong. So I think the entry should be retained, with a login value of None. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From wesleyhenwood@hotmail.com Thu Apr 24 20:41:48 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Thu, 24 Apr 2003 19:41:48 +0000 Subject: [Python-Dev] PyRun_* functions Message-ID: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Quote from py docs: "Note also that several of these functions take FILE* parameters. One particular issue which needs to be handled carefully is that the FILE structure for different C libraries can be different and incompatible. Under Windows (at least), it is possible for dynamically linked extensions to actually use different libraries, so care should be taken that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using." How does one do this - make sure that they were created with the same lib?
It seems that it would be a good enhancement to remove the FILE pointer parameter from these functions and just use the file name. For example, change PyRun_SimpleFile(FILE *fp, char *filename) to PyRun_SimpleFile(char *filename). Then no one would have to worry about the incompatibility. From python@rcn.com Thu Apr 24 20:49:12 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 15:49:12 -0400 Subject: [Python-Dev] netrc.py References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> Message-ID: <005201c30a9a$94e23580$b6b8958d@oemcomputer> [Fred L. Drake, Jr.] > Looking at the netrc(5) manpage on my RedHat 7.3 box, I'd say it's > clear that a machine entry without a login should specifically > suppress autologin for that machine. For example, this .netrc file: > > machine ftp.example.com > default login anonymous password fred@example.com > > should cause autologin on every machine except for ftp.example.com. > If the ftp.example.com entry is simply dropped, the default could be > used, and that would be wrong. > > So I think the entry should be retained, with a login value of None. [Skip Montanaro] > Why not have it add an entry to self.hosts with an empty string associated > with the 'login' key? Since existing apps expect a string, the empty string approach may be preferable. Raymond Hettinger From fdrake@acm.org Thu Apr 24 20:52:26 2003 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 24 Apr 2003 15:52:26 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <005201c30a9a$94e23580$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> <005201c30a9a$94e23580$b6b8958d@oemcomputer> Message-ID: <16040.16506.722640.409224@grendel.zope.com> Raymond Hettinger writes: > Since existing apps expect a string, the empty string approach may > be preferable. I could live with that. The real point is that it's wrong to drop records without a login on the floor. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From python@rcn.com Thu Apr 24 20:57:31 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 15:57:31 -0400 Subject: [Python-Dev] netrc.py References: <004601c30a89$c4459e40$b6b8958d@oemcomputer><20030424182622.GA21500@thyrsus.com><16040.12346.526703.651003@grendel.zope.com><005201c30a9a$94e23580$b6b8958d@oemcomputer> <16040.16506.722640.409224@grendel.zope.com> Message-ID: <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> > > Since existing apps expect a string, the empty string approach may > > be preferable. [Fred] > I could live with that. The real point is that it's wrong to drop > records without a login on the floor. Since that solution is friendly to existing apps, do you think it is reasonable to backport it? Raymond From fdrake@acm.org Thu Apr 24 21:04:50 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 24 Apr 2003 16:04:50 -0400 Subject: [Python-Dev] netrc.py In-Reply-To: <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> References: <004601c30a89$c4459e40$b6b8958d@oemcomputer> <20030424182622.GA21500@thyrsus.com> <16040.12346.526703.651003@grendel.zope.com> <005201c30a9a$94e23580$b6b8958d@oemcomputer> <16040.16506.722640.409224@grendel.zope.com> <007a01c30a9b$be1d3660$b6b8958d@oemcomputer> Message-ID: <16040.17250.119938.342267@grendel.zope.com> Raymond Hettinger writes: > Since that solution is friendly to existing apps, do you think > it is reasonable to backport it? I'd be happy with that; not handling those entries is a bug in my book (using the netrc(5) manpage as my critical reference), so it's very reasonable to backport. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From drifty@alum.berkeley.edu Thu Apr 24 21:09:26 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 13:09:26 -0700 (PDT) Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304241830.h3OIUUj22372@odiug.zope.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> Message-ID: <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > Thanks for the URLs! Another good reason to avoid tying up Python > > > with crypto. > > > > I don't consider that a good reason. Including batteries is one of the > > strengths of Python, and if there are useful libraries, we should > > attempt to include them. > > IMO there are more important batteries to include before we deal with > the hassle of registering for crypto stuff. 
Even if it's harmless, > the inclusion of any crypto at all causes some people to have to go > through a lot of corporate red tape. <snip> Good point. I admit I think it would be cool to have an AES implementation in the stdlib, but I don't see it as crucial. I think it does make sense, though, to have a package that is maintained separately that python-dev pseudo endorses (like PyXML and win32all) that contains all of this crypto stuff. -Brett From fdrake@acm.org Thu Apr 24 21:09:41 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 24 Apr 2003 16:09:41 -0400 Subject: [Python-Dev] PyRun_* functions In-Reply-To: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> References: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Message-ID: <16040.17541.672978.719267@grendel.zope.com> wesley henwood writes: > How does one do this - make sure that they were created with the same lib? Exactly. This tends not to be a problem on Unix (though possible), but isn't so rare on Windows. > Its seems that it would be a good enhancement to remove the FILE pointer > parameter from these functions, and just use the file name. For example, > change PyRun_SimpleFile( FILE *fp, char *filename) to PyRun_SimpleFile(char > *filename). Then no one would have to worry about the incompatibility. That would be a loss of functionality -- these can currently work with, for example, standard input. That's currently required by the interpreter's main program. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From guido@python.org Thu Apr 24 21:12:25 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:12:25 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "Thu, 24 Apr 2003 13:09:26 PDT."
<Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <200304242012.h3OKCP325878@odiug.zope.com> > I think does make sense, though, to have a package that is maintained > separately that python-dev pseudo endorses (like PyXML and win32all) that > contains all of this crypto stuff. Right. --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Thu Apr 24 21:22:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 13:22:46 -0700 (PDT) Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <1051215797.1847.6.camel@barry> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> Message-ID: <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> [Barry Warsaw] > On Thu, 2003-04-24 at 16:09, Brett Cannon wrote: > > > I think does make sense, though, to have a package that is maintained > > separately that python-dev pseudo endorses (like PyXML and win32all) that > > contains all of this crypto stuff. > > Where do we draw the line? Do we delete the ssl stuff? What about the > crypto hashes? hmac? md5? mpz? All of Chapter 15 in the library > reference manual? > Anything that causes export issues should be separate. From my understanding hash functions are not regulated. 
I believe SSL is okay because the encryption is not high enough (this all from memory, so don't take this as hard fact). But you are right, Barry, there is no hard line that can easily be drawn; joys of laws in the US. =) -Brett From martin@v.loewis.de Thu Apr 24 21:29:10 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 22:29:10 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> Message-ID: <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> Brett Cannon <bac@OCF.Berkeley.EDU> writes: > Anything that causes export issues should be separate. From my > understanding hash functions are not regulated. I believe SSL is okay > because the encryption is not high enough (this all from memory, so don't > take this as hard fact). It is probably pointless to discuss this among non-lawyers, however, I do believe that a strict "no crypto" policy would cause the removal of all the modules that Barry mentioned. For the specific case of OpenSSL, it seems pretty clear that it *cannot* be exported from the US without telling the respective agency. When I studied their rules, I came to the conclusion that even the *wrapper* around it needs to be declared (so both the Windows binary release and the source release cannot be exported without being declared in advance). Of course, if one considers crypto stuff as useless and a waste of time, then probably https is not interesting, either. 
Regards, Martin From guido@python.org Thu Apr 24 21:34:10 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:34:10 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 22:29:10 +0200." <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304242034.h3OKYAt26069@odiug.zope.com> > Of course, if one considers crypto stuff as useless and a waste of > time, then probably https is not interesting, either. Except that some URLs are *only* accessible through https -- this was the push for supporting https. I don't see the same kind of push for AES yet. It is true that we should report the inclusion of openssl and its wrappers to the authorities. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Thu Apr 24 21:37:03 2003 From: martin@v.loewis.de (Martin v.
=?iso-8859-15?q?L=F6wis?=) Date: 24 Apr 2003 22:37:03 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304242034.h3OKYAt26069@odiug.zope.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> Message-ID: <m3el3reiog.fsf@mira.informatik.hu-berlin.de> Guido van Rossum <guido@python.org> writes: > It is true that we should report the inclusiong of openssl and its > wrappers to the authorities. I think we did already; Jeremy should know the details. Regards, Martin From python@rcn.com Thu Apr 24 21:14:43 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 16:14:43 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <00a901c30a9e$253a63c0$b6b8958d@oemcomputer> > > > I don't consider that a good reason. Including batteries is one of the > > > strengths of Python, and if there are useful libraries, we should > > > attempt to include them. > > > > IMO there are more important batteries to include before we deal with > > the hassle of registering for crypto stuff. 
Even if it's harmless, > > the inclusion of any crypto at all causes some people to have to go > > through a lot of corporate red tape. Just sneak it through by labeling it as a Python-to-Perl conversion tool ;) Raymond From barry@python.org Thu Apr 24 21:23:17 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 16:23:17 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> Message-ID: <1051215797.1847.6.camel@barry> On Thu, 2003-04-24 at 16:09, Brett Cannon wrote: > > IMO there are more important batteries to include before we deal with > > the hassle of registering for crypto stuff. Even if it's harmless, > > the inclusion of any crypto at all causes some people to have to go > > through a lot of corporate red tape. > <snip> > > Good point. I admit I think it would be cool to have an AES > implementation in the stdlib, but I don't see it as crucial. > > I think it does make sense, though, to have a package that is maintained > separately that python-dev pseudo endorses (like PyXML and win32all) that > contains all of this crypto stuff. Where do we draw the line? Do we delete the ssl stuff? What about the crypto hashes? hmac? md5? mpz? All of Chapter 15 in the library reference manual? -Barry From agthorr@barsoom.org Thu Apr 24 21:48:12 2003 From: agthorr@barsoom.org (Agthorr) Date: Thu, 24 Apr 2003 13:48:12 -0700 Subject: [Python-Dev] heaps Message-ID: <20030424204812.GD24838@barsoom.org> I brought up heapq last week, but there was only brief discussion before the issue got sidetracked into a discussion of FIFO queues.
I'd like to revisit heapq. The two people who responded seemed to agree that the existing heapq interface was lacking, and this seemed to be the sentiment many months ago when heapq was added. I'll summarize some of the heap interfaces that have been proposed:

- the heapq currently in CVS:
  - Provides functions to manipulate a list organized as a binary heap
  - Advantages:
    - Internal binary heap structure is transparent to user, useful for educational purposes
    - Low overhead
    - Already in CVS

- My MinHeap/MaxHeap classes:
  - Provides a class with heap access routines, using a list internally
  - Advantages:
    - Implementation is opaque, so it can be replaced later with Fibonacci heaps or Paired heaps without breaking user programs
    - Provides an adjust_key() command needed by some applications (e.g. Dijkstra's Algorithm)

- David Eppstein's priorityDictionary class:
  - Provides a class with a dictionary-style interface (ex: heap['cat'] = 5 would give 'cat' a priority of 5 in the heap)
  - Advantages:
    - Implementation is opaque, so it can be replaced later with Fibonacci heaps or Paired heaps without breaking user programs
    - A dictionary interface may be more intuitive for certain applications
  - Limitation:
    - Objects with the same value may only have a single instance in the heap.

I'd very much like to see the current heapq replaced with a different interface in time for 2.3. I believe that an opaque object is better, since it allows more flexibility later. If the current heapq is released, user programs will start to use it, and then it will be much more difficult to switch to a different heap algorithm later, should that become desirable. Also, decrease-key is an important feature that many users will expect from a heap; this operation is notably missing from heapq. I'm willing to do whatever work is necessary to get a more flexible heap interface into 2.3.
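[The MinHeap/MaxHeap code referred to above is not reproduced in this thread. Purely as an illustration of the opaque interface style being argued for, here is a minimal sketch of a heap class with an adjust_key() (decrease/increase key) operation; the method names and the list-plus-index-map implementation are assumptions for illustration, not the actual proposed code.]

```python
class MinHeap:
    """Illustrative opaque min-heap supporting adjust_key().

    Items must be hashable and unique; each item carries a key.
    Backed by a binary heap in a list plus an item->index map, but
    callers never see that, so the implementation could later be
    swapped for a pairing or Fibonacci heap without breaking them.
    """

    def __init__(self):
        self._heap = []   # list of [key, item] pairs, heap-ordered
        self._pos = {}    # item -> index of its pair in self._heap

    def __len__(self):
        return len(self._heap)

    def insert(self, item, key):
        self._heap.append([key, item])
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def adjust_key(self, item, key):
        # Restore the heap invariant after changing one item's key.
        i = self._pos[item]
        old = self._heap[i][0]
        self._heap[i][0] = key
        if key < old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def extract_min(self):
        # Swap the root with the last pair, pop it, and re-heapify.
        self._swap(0, len(self._heap) - 1)
        key, item = self._heap.pop()
        del self._pos[item]
        if self._heap:
            self._sift_down(0)
        return item, key

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self._heap[i][0] < self._heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._heap[c][0] < self._heap[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

# Decrease-key in action: 'c' jumps to the front of the queue.
h = MinHeap()
h.insert('a', 5)
h.insert('b', 3)
h.insert('c', 7)
h.adjust_key('c', 1)          # decrease-key, O(log n)
print(h.extract_min())        # -> ('c', 1)
print(h.extract_min())        # -> ('b', 3)
```

The point of the sketch is the shape of the interface, not the list-based internals: adjust_key() is exactly the operation a Dijkstra implementation needs and the function-based heapq cannot offer without exposing its list layout.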
If the consensus prefers my MinHeap (or something similar), I'll gladly write documentation (and have already written rather brutal tests). Somebody with authority, just tell me where to pour my energy in this matter :) -- Agthorr From guido@python.org Thu Apr 24 21:50:23 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 16:50:23 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: Your message of "24 Apr 2003 22:37:03 +0200." <m3el3reiog.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304242050.h3OKoNx26182@odiug.zope.com> > > It is true that we should report the inclusion of openssl and its > > wrappers to the authorities. > > I think we did already; Jeremy should know the details. Jeremy sits next to me, and he tells me he did not. However it is on his TODO list.
--Guido van Rossum (home page: http://www.python.org/~guido/) From neal@metaslash.com Thu Apr 24 22:02:29 2003 From: neal@metaslash.com (Neal Norwitz) Date: Thu, 24 Apr 2003 17:02:29 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304242050.h3OKoNx26182@odiug.zope.com> References: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> <200304242050.h3OKoNx26182@odiug.zope.com> Message-ID: <20030424210229.GT12836@epoch.metaslash.com> On Thu, Apr 24, 2003 at 04:50:23PM -0400, Guido van Rossum wrote: > > > It is true that we should report the inclusion of openssl and its > > > wrappers to the authorities. > > > > I think we did already; Jeremy should know the details. > > Jeremy sits next to me, and he tells me he did not. However it is on > his TODO list. I contacted the BXA which is part of the US Dept. of Commerce: <http://www.bxa.doc.gov/Encryption/PubAvailEncSourceCodeNofify.html>. I think I notified them that Python contains Rotor and then forwarded the info to Jeremy. I'm not sure if there's anything else that needs to be done. It was unclear whether this was required for each release. My memory is fuzzy. I remember talking to Martin and Jeremy, but this was probably at least 6 months ago. Neal From martin@v.loewis.de Thu Apr 24 22:17:19 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 24 Apr 2003 23:17:19 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <20030424210229.GT12836@epoch.metaslash.com> References: <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <1051215797.1847.6.camel@barry> <Pine.SOL.4.55.0304241318350.4654@death.OCF.Berkeley.EDU> <m3r87rej1l.fsf@mira.informatik.hu-berlin.de> <200304242034.h3OKYAt26069@odiug.zope.com> <m3el3reiog.fsf@mira.informatik.hu-berlin.de> <200304242050.h3OKoNx26182@odiug.zope.com> <20030424210229.GT12836@epoch.metaslash.com> Message-ID: <m3ptnbd28w.fsf@mira.informatik.hu-berlin.de> Neal Norwitz <neal@metaslash.com> writes: > My memory is fuzzy. I remember talking to Martin and Jeremy, but > this was probably at least 6 months ago. You first sent a letter that I include below; you then edited the NOTIFICATION at http://mail.python.org/pipermail/python-dev/2002-March/021785.html It appears that you then didn't actually send the notification to BXA, but that the PSF board passed a motion in http://www.python.org/psf/records/board/minutes-2002-04-09.html charging Jeremy with contacting BXA; it appears that this did not happen, either. Regards, Martin > > I work on an open source project called Python. It is a programming > language which is publicly available at http://www.python.org/. > The current version is 2.2. This software is provided free of charge. > > We would like to comply with US export regulations, however, > we are not sure what, if anything, needs to be done. > > There is an encryption technique used in the rotormodule.c file > (which is attached). This apparently uses 80 bits. > > Do we need to send a NOTIFICATION? Is there anything else we > need to do?
> > Thank you, > Neal From python@rcn.com Thu Apr 24 22:52:00 2003 From: python@rcn.com (Raymond Hettinger) Date: Thu, 24 Apr 2003 17:52:00 -0400 Subject: [Python-Dev] heaps References: <20030424204812.GD24838@barsoom.org> Message-ID: <001901c30aab$bf31c060$b6b8958d@oemcomputer> > I'd very much like to see the current heapq replaced with a different > interface in time for 2.3. I believe that an opaque object is better, > since it allows more flexibility later. I'm quite pleased with the version already in CVS. It is a small masterpiece of exposition, sophistication, simplicity, and speed. A class based interface is not necessary for every algorithm. For the other approaches, what might be useful is to define an API and leave it at that. The various implementations can be maintained on Cookbook pages, the Vaults of Parnassus, or an SF project. The min/max heap and fibonacci heaps are a great idea. Nice work. Raymond Hettinger From tim.one@comcast.net Thu Apr 24 23:01:42 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 24 Apr 2003 18:01:42 -0400 Subject: [Python-Dev] New test failure on Windows Message-ID: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Last-second re changes don't appear to be going in the right direction <wink>: C:\Code\python\PCbuild>python ../lib/test/test_re.py Running re_tests test suite test_basic_re_sub (__main__.ReTests) ... ok test_constants (__main__.ReTests) ... ok test_escaped_re_sub (__main__.ReTests) ... ok test_flags (__main__.ReTests) ... ok test_limitations (__main__.ReTests) ... ERROR test_pickling (__main__.ReTests) ... ok test_qualified_re_split (__main__.ReTests) ... ok test_qualified_re_sub (__main__.ReTests) ... ok test_re_escape (__main__.ReTests) ... ok test_re_findall (__main__.ReTests) ... ok test_re_match (__main__.ReTests) ... ok test_re_split (__main__.ReTests) ... ok test_re_subn (__main__.ReTests) ... ok test_search_star_plus (__main__.ReTests) ... ok test_symbolic_refs (__main__.ReTests) ...
ok ====================================================================== ERROR: test_limitations (__main__.ReTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_re.py", line 182, in test_limitations self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) File "C:\Code\python\lib\sre.py", line 132, in match return _compile(pattern, flags).match(string) RuntimeError: maximum recursion limit exceeded ---------------------------------------------------------------------- From gherron@islandtraining.com Thu Apr 24 23:38:42 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 15:38:42 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Message-ID: <200304241538.43480.gherron@islandtraining.com> On Thursday 24 April 2003 03:01 pm, Tim Peters wrote: > Last-second re changes don't appear to be going in the right direction > <wink>: > > C:\Code\python\PCbuild>python ../lib/test/test_re.py > Running re_tests test suite > test_basic_re_sub (__main__.ReTests) ... ok > test_constants (__main__.ReTests) ... ok > test_escaped_re_sub (__main__.ReTests) ... ok > test_flags (__main__.ReTests) ... ok > test_limitations (__main__.ReTests) ... ERROR > test_pickling (__main__.ReTests) ... ok > test_qualified_re_split (__main__.ReTests) ... ok > test_qualified_re_sub (__main__.ReTests) ... ok > test_re_escape (__main__.ReTests) ... ok > test_re_findall (__main__.ReTests) ... ok > test_re_match (__main__.ReTests) ... ok > test_re_split (__main__.ReTests) ... ok > test_re_subn (__main__.ReTests) ... ok > test_search_star_plus (__main__.ReTests) ... ok > test_symbolic_refs (__main__.ReTests) ... 
ok > > ====================================================================== > ERROR: test_limitations (__main__.ReTests) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "../lib/test/test_re.py", line 182, in test_limitations > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > File "C:\Code\python\lib\sre.py", line 132, in match > return _compile(pattern, flags).match(string) > RuntimeError: maximum recursion limit exceeded > Today's change to test_re (rather than a change to any of the sre code) is the problem. It appears that Skip was attempting to translate the tests to use the unittest module. One test (and perhaps others) were translated incorrectly. The original test was:

    try:
        verify(re.match('(x)*', 50000*'x').span() == (0, 50000))
    except RuntimeError, v:
        print v

Since this is *supposed* to cause a RuntimeError, it should be translated something like

    self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x')

but definitely not as

    self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000))

Here's the CVS log entry:

    ----------------------------
    revision 1.34
    date: 2003/04/24 19:43:18; author: montanaro; state: Exp; lines: +294 -371
    first cut at unittest version of re tests
    ----------------------------

Gary Herron From drifty@alum.berkeley.edu Thu Apr 24 23:43:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 15:43:34 -0700 (PDT) Subject: [Python-Dev] When is it okay to ``cvs remove``? Message-ID: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> I am rewriting test_urllib.py from scratch since the current version is very lacking (and out of date; the thing tests against UserDict for some odd reason). Since I have written it from scratch, I figure I should do a ``cvs remove`` on the current test_urllib.py and then add my new version to get a fresh version numbering?
Also, my rewrite is not finished (have some more things I want to test), but what I have so far passes and seems good. Should I bother to check in what I have so far to have it in b1, or hold off until the suite is completely finished? I am assuming since these are unit tests that are passing I don't need to bother with an SF patch to get a code review from someone. -Brett From thomas@xs4all.net Thu Apr 24 23:59:14 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 25 Apr 2003 00:59:14 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> Message-ID: <20030424225914.GA26254@xs4all.nl> On Thu, Apr 24, 2003 at 03:43:34PM -0700, Brett Cannon wrote: > I am rewriting test_urllib.py from scratch since the current version is > very lacking (and out of date; the thing tests against UserDict for some odd > reason). Since I have written it from scratch, I figure I should do a ``cvs > remove`` on the current test_urllib.py and then add my new version to > get a fresh version numbering? That's not particularly useful. The only thing that does is create a period in time (or rather, 'history' -- CVS history) in which test_urllib.py doesn't exist. Re-adding the file won't give you a fresh version numbering either, it'll just give you a lot of headaches, especially when there are branches involved (right, Barry ? :-) Just commit your new test_urllib.py directly, when it's all done, using something like

    cvs commit -r2.0 test_urllib.py

But you probably want to discuss the version number you want to force, Guido might like to reserve 2.0 for something (although I think he should use '3000' instead :) CVS is very 4-dimensional; it only allows for one file to exist at any given spot in the entire timeline. It can leave and come back, but it's still the same file. (And, for example, it can never become a directory.)
And a file can have only one 1.1 revision. If you have direct access to the CVS repository (which is actually RCS) you can remove the RCS file and start really afresh, but that means you lose history. It nullifies the file (and is also about as drastic as Galactus' Nullifier ;-) > Also, my rewrite is not finished (have some more things I want to test), > but what I have so far passes and seems good. Should I bother to check in > what I have so far to have it in b1, or hold off until the suite is > completely finished? I am assuming since these are unit tests that are > passing I don't need to bother with an SF patch to get a code review from > someone. It might at least make sense to have some differing platforms run the test before you check it in. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From eppstein@ics.uci.edu Fri Apr 25 00:20:50 2003 From: eppstein@ics.uci.edu (David Eppstein) Date: Thu, 24 Apr 2003 16:20:50 -0700 Subject: [Python-Dev] Re: heaps References: <20030424204812.GD24838@barsoom.org> <001901c30aab$bf31c060$b6b8958d@oemcomputer> Message-ID: <eppstein-C9853A.16204924042003@main.gmane.org> In article <001901c30aab$bf31c060$b6b8958d@oemcomputer>, "Raymond Hettinger" <python@rcn.com> wrote: > > I'd very much like to see the current heapq replaced with a different > > interface in time for 2.3. I believe that an opaque object is better, > > since it allows more flexibility later. > > I'm quite pleased with the version already in CVS. It is a small > masterpiece of exposition, sophistication, simplicity, and speed. > A class based interface is not necessary for every algorithm. It has some elegance, but omits basic operations that are necessary for many heap-based algorithms.
Specifically, the three algorithms that use heaps in my upper-division undergraduate algorithms classes are heapsort (for which heapq works fine, but you would generally want to use L.sort() instead), Dijkstra's algorithm (and its relatives such as A* and Prim), which needs the ability to decrease keys, and event-queue-based plane sweep algorithms (e.g. for finding all crossing pairs in a set of line segments) which need the ability to delete items from other than the top. To see how important the lack of these operations is, I decided to compare two implementations of Dijkstra's algorithm. The priority-dict implementation from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/119466 takes as input a graph, coded as nested dicts {vertex: {neighbor: edge length}}. This is a variation of a graph coding suggested in one of Guido's essays that, as Raymond suggests, avoids using a separate class based interface. Here's a simplification of my dictionary-based Dijkstra implementation:

    def Dijkstra(G,start,end=None):
        D = {}                      # dictionary of final distances
        P = {}                      # dictionary of predecessors
        Q = priorityDictionary()    # est.dist. of non-final vert.
        Q[start] = 0
        for v in Q:
            D[v] = Q[v]
            for w in G[v]:
                vwLength = D[v] + G[v][w]
                if w not in D and (w not in Q or vwLength < Q[w]):
                    Q[w] = vwLength
                    P[w] = v
        return (D,P)

Here's a translation of the same implementation to heapq (untested since I'm not running 2.3). Since there is no decrease-key in heapq, nor any way to find and remove old keys, I changed the algorithm to add new tuples for each new key, leaving the old tuples in place until they bubble up to the top of the heap.

    def Dijkstra(G,start,end=None):
        D = {}                   # dictionary of final distances
        P = {}                   # dictionary of predecessors
        Q = [(0,None,start)]     # heap of (est.dist., pred., vert.)
        while Q:
            dist,pred,v = heappop(Q)
            if v in D: continue  # tuple outdated by decrease-key, ignore
            D[v] = dist
            P[v] = pred
            for w in G[v]:
                heappush(Q, (D[v] + G[v][w], v, w))
        return (D,P)

My analysis of the differences between the two implementations:

- The heapq version is slightly complicated (the two lines if...continue) by the need to explicitly ignore tuples with outdated priorities. This need for inserting low-level data structure maintenance code into higher-level algorithms is intrinsic to using heapq, since its data is not structured in a way that can support efficient decrease key operations.

- Since the heap version had no way to determine when a new key was smaller than an old one, the heapq implementation needed two separate data structures to maintain predecessors (middle elements of tuples for items in queue, dictionary P for items already removed from queue). In the dictionary implementation, both types of items stored their predecessors in P, so there was no need to transfer this information from one structure to another.

- The dictionary version is slightly complicated by the need to look up old heap keys and compare them with the new ones instead of just blasting new tuples onto the heap. So despite the more-flexible heap structure of the dictionary implementation, the overall code complexity of both implementations ends up being about the same.

- Heapq forced me to build tuples of keys and items, while the dictionary based heap did not have the same object-creation overhead (unless it's hidden inside the creation of dictionary entries). On the other hand, since I was already building tuples, it was convenient to also store predecessors in them instead of in some other structure.

- The heapq version uses significantly more storage than the dictionary: proportional to the number of edges instead of the number of vertices.
- The changes I made to Dijkstra's algorithm in order to use heapq might not have been obvious to a non-expert; more generally I think this lack of flexibility would make it more difficult to use heapq for cookbook-type implementation of textbook algorithms.

- In Dijkstra's algorithm, it was easy to identify and ignore outdated heap entries, sidestepping the inability to decrease keys. I'm not convinced that this would be as easy in other applications of heaps.

- One of the reasons to separate data structures from the algorithms that use them is that the data structures can be replaced by ones with equivalent behavior, without changing any of the algorithm code. The heapq Dijkstra implementation is forced to include code based on the internal details of heapq (specifically, the line initializing the heap to be a one element list), making it less flexible for some uses. The usual reason one might want to replace a data structure is for efficiency, but there are others: for instance, I teach various algorithms classes and might want to use an implementation of Dijkstra's algorithm as a testbed for learning about different priority queue data structures. I could do that with the dictionary-based implementation (since it shows nothing of the heap details) but not the heapq one.

Overall, while heapq was usable for implementing Dijkstra, I think it has significant shortcomings that could be avoided by a more well-thought-out interface that provided a little more functionality and a little clearer separation between interface and implementation. -- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ.
of California, Irvine, School of Information & Computer Science From aahz@pythoncraft.com Fri Apr 25 00:22:48 2003 From: aahz@pythoncraft.com (Aahz) Date: Thu, 24 Apr 2003 19:22:48 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030424232248.GA25695@panix.com> On Wed, Apr 23, 2003, Guido van Rossum wrote: > Aahz: >> >> Seems to me that this is a good argument for module-level properties, >> BTW, or we require that all module attributes be set only through >> functions. > > I'm not following. What do you mean by module-level properties? Data descriptors on module objects. Let's suppose we have, say, a BCD module. For example, we want to set the "global" rounding state on a per-thread basis. By definition, modules are singletons, so there needs to be a container within the module to hold the per-thread rounding state. Question is, how/when do we update that container? Currently, the only option is to require a user to call a function with the new setting as a parameter; I can imagine cases where it would be convenient to be able to simply set the module attribute, exactly the way we now permit with new-style classes. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Why is this newsgroup different from all other newsgroups? 
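[The module-level data descriptors Aahz asks for did not exist in the Python of this thread, but later Pythons can approximate them by assigning a ModuleType subclass to a module's __class__, so that plain attribute assignment on the module runs descriptor code. The sketch below uses that later facility; the module name and rounding modes are invented for illustration, not a real BCD module.]

```python
import types

class _BCDModule(types.ModuleType):
    # A data descriptor on the module's *class*: plain attribute
    # assignment on the module object now goes through this setter.
    @property
    def rounding(self):
        return self._rounding

    @rounding.setter
    def rounding(self, mode):
        if mode not in ("floor", "ceil", "nearest"):
            raise ValueError("unknown rounding mode: %r" % (mode,))
        self._rounding = mode

bcd = types.ModuleType("bcd")  # stand-in for a real imported module
bcd.__class__ = _BCDModule     # legal: _BCDModule subclasses ModuleType
bcd._rounding = "nearest"      # backing attribute, set directly once

bcd.rounding = "floor"         # runs the validating setter
print(bcd.rounding)            # -> floor
```

Setting bcd.rounding to an unknown mode raises ValueError from the setter, which is exactly the "simply set the module attribute" ergonomics Aahz describes; per-thread state would additionally need thread-local storage behind the property.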
From gherron@islandtraining.com Fri Apr 25 00:33:50 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 16:33:50 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241538.43480.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> Message-ID: <200304241633.50247.gherron@islandtraining.com> There's a bit more to this problem. It has to do with the *sre* test versus the *re* tests. When test_sre is run, it claims to run all its own tests as well as all of test_re. However any failed tests in test_re are not reported by test_sre. (Neither the one found by Tim nor any others I just purposely introduced into test_re.) This is clearly a problem with test_sre. Only if you run test_re directly rather than through test_sre do you see Tim's error. I'm hoping that Skip, who made these changes, can fix them. (BTW, I like the idea of putting all these tests into unittest -- the old test code looked like a cancer of multiple test methods grown on top of each other.) Gary Herron On Thursday 24 April 2003 03:38 pm, Gary Herron wrote: > On Thursday 24 April 2003 03:01 pm, Tim Peters wrote: > > Last-second re changes don't appear to be going in the right direction > > <wink>: > > > > C:\Code\python\PCbuild>python ../lib/test/test_re.py > > Running re_tests test suite > > test_basic_re_sub (__main__.ReTests) ... ok > > test_constants (__main__.ReTests) ... ok > > test_escaped_re_sub (__main__.ReTests) ... ok > > test_flags (__main__.ReTests) ... ok > > test_limitations (__main__.ReTests) ... ERROR > > test_pickling (__main__.ReTests) ... ok > > test_qualified_re_split (__main__.ReTests) ... ok > > test_qualified_re_sub (__main__.ReTests) ... ok > > test_re_escape (__main__.ReTests) ... ok > > test_re_findall (__main__.ReTests) ... ok > > test_re_match (__main__.ReTests) ... ok > > test_re_split (__main__.ReTests) ... 
ok > > test_re_subn (__main__.ReTests) ... ok > > test_search_star_plus (__main__.ReTests) ... ok > > test_symbolic_refs (__main__.ReTests) ... ok > > > > ====================================================================== > > ERROR: test_limitations (__main__.ReTests) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "../lib/test/test_re.py", line 182, in test_limitations > > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > > File "C:\Code\python\lib\sre.py", line 132, in match > > return _compile(pattern, flags).match(string) > > RuntimeError: maximum recursion limit exceeded > Today's change to test_re (rather than a change to any of the sre > code) is the problem. It appears that Skip was attempting to translate > the tests to use the unittest module. One test (and perhaps others) > were translated incorrectly. > > > The original test was: > > try: > verify(re.match('(x)*', 50000*'x').span() == (0, 50000)) > except RuntimeError, v: > print v > > Since this is *supposed* to cause a RuntimeError, it should be > translated something like > > self.assertRaises(RuntimeError, re.match, '(x)*', 50000*'x') > > but definitely not as > > self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) > > > Here's the CVS log entry: > ---------------------------- > revision 1.34 > date: 2003/04/24 19:43:18; author: montanaro; state: Exp; lines: +294 > -371 first cut at unittest version of re tests > ---------------------------- > > Gary Herron > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From thomas@xs4all.net Fri Apr 25 00:03:16 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 25 Apr 2003 01:03:16 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> References:
<20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030424230316.GB26254@xs4all.nl> On Thu, Apr 24, 2003 at 08:21:20PM +0200, Martin v. Löwis wrote: > Exporting also only matters from the servers which host the Python > distribution, i.e. the US and the Netherlands. Good point. Not only will Guido be exporting crypto software tomorrow when he uploads 2.3b1, he will also be importing it... Especially since he's still a Dutch citizen. I can't figure out if that's a good thing or not. :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@comcast.net Fri Apr 25 01:13:05 2003 From: tim.one@comcast.net (Tim Peters) Date: Thu, 24 Apr 2003 20:13:05 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <20030424225914.GA26254@xs4all.nl> Message-ID: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> [Thomas Wouters, dispensing good CVS advice] > ... > Just commit your new test_urllib.py directly, when it's all done, using > something like > > cvs commit -r2.0 test_urllib.py > > But you probably want to discuss the version number you want to > force, Guido might like to reserve 2.0 for something (although I > think he should use '3000' instead :) That part I didn't grok: why force an artificial version number? I can't imagine a use for that. The "Rewrote from scratch." checkin comment Brett will surely make is milestone enough in the CVS log.
From pedronis@bluewin.ch Fri Apr 25 01:17:28 2003 From: pedronis@bluewin.ch (Samuele Pedroni) Date: Fri, 25 Apr 2003 02:17:28 +0200 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: <20030424232248.GA25695@panix.com> References: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.2.1.1.0.20030425021640.0230e0d0@pop.bluewin.ch> At 19:22 24.04.03 -0400, Aahz wrote: >On Wed, Apr 23, 2003, Guido van Rossum wrote: > > Aahz: > >> > >> Seems to me that this is a good argument for module-level properties, > >> BTW, or we require that all module attributes be set only through > >> functions. > > > > I'm not following. What do you mean by module-level properties? > >Data descriptors on module objects. Let's suppose we have, say, a BCD >module. For example, we want to set the "global" rounding state on a >per-thread basis. By definition, modules are singletons, so there needs >to be a container within the module to hold the per-thread rounding >state. Question is, how/when do we update that container? Currently, >the only option is to require a user to call a function with the new >setting as a parameter; I can imagine cases where it would be convenient >to be able to simply set the module attribute, exactly the way we now >permit with new-style classes. 
see the following thread http://aspn.activestate.com/ASPN/Mail/Message/1497615 From gherron@islandtraining.com Fri Apr 25 01:47:28 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 17:47:28 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241633.50247.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> <200304241633.50247.gherron@islandtraining.com> Message-ID: <200304241747.29059.gherron@islandtraining.com> On Thursday 24 April 2003 04:33 pm, Gary Herron wrote: > There's a bit more to this problem. It has to do with the *sre* test > versus the *re* tests. When test_sre is run, it claims to run all its > own tests as well as all of test_re. However any failed tests in > test_re are not reported by test_sre. (Neither the one found by Tim > nor any others I just purposely introduced into test_re.) This is > clearly a problem with test_sre. Only if you run test_re directly > rather than through test_sre do you see Tim's error. Sigh... I find I was confused and must correct that last paragraph... Test test_sre imports re_test not test_re, and test_re also imports re_test -- perhaps you'll understand my confusion. Running test_sre should *not* find Tim's bug, but running test_re should. Test_sre has no problem, test_re needs to be fixed, and re_test, used by both, is fine. Sigh... Gary Herron From drifty@alum.berkeley.edu Fri Apr 25 01:56:39 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Thu, 24 Apr 2003 17:56:39 -0700 (PDT) Subject: [Python-Dev] When is it okay to ``cvs remove``? 
In-Reply-To: <20030424225914.GA26254@xs4all.nl> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> Message-ID: <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> [Thomas Wouters] > On Thu, Apr 24, 2003 at 03:43:34PM -0700, Brett Cannon wrote: > > > Also, my rewrite is not finished (have some more things I want to test), > > but what I have so far passes and seems good. Should I bother to check in > > what I have so far to have it in b1, or hold off until the suite is > > completely finished? I am assuming since these are unit tests that are > > passing I don't need to bother with an SF patch to get a code review from > > someone. > > It might at least make sense to have some differing platforms run the test > before you check it in. > OK, I will finish the code then first. Just to double-check, creating a tracker item and initially assigning it to myself will not cause people to ignore it since everyone who cares will see the new item when it gets mailed to Patches and sees I am asking for other people on other platforms beyond OS X to give the code a run, right? And is having new testing suites peer-reviewed a common thing? Or should I only worry about it when there is a slight chance cross-platform issues might sprout up from the tests? I already know to have any questionable code and massive code changes checked, but I also don't want to hold up code I think is good and safe on SF and bug other people to check it for me. -Brett From andymac@bullseye.apana.org.au Thu Apr 24 23:43:55 2003 From: andymac@bullseye.apana.org.au (Andrew MacIntyre) Date: Fri, 25 Apr 2003 09:43:55 +1100 (edt) Subject: [Python-Dev] why is test_socketserver in expected skips? 
In-Reply-To: <200304241418.h3OEIQA11173@odiug.zope.com> Message-ID: <Pine.OS2.4.44.0304250934160.28662-100000@tenring.andymac.org> On Thu, 24 Apr 2003, Guido van Rossum wrote: > > test_socketserver seems to be in all the expected skip lists except > > for (oddly enough) os2emx. > > Probably because the os2emx port hasn't been updated in a while. As it happens, I routinely test with the network resource enabled, and the EMX port Makefile explicitly enables it for the test target. So I never considered test_socketserver an expected skip... -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From guido@python.org Fri Apr 25 02:22:57 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:22:57 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of Thu, 24 Apr 2003 15:43:34 PDT." <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> Message-ID: <200304250122.h3P1MvJ01176@pcp02138704pcs.reston01.va.comcast.net> > I am rewriting test_urllib.py from scratch since the current version > is very lacking (and out of date; the thing tests against UserDict > for some odd reason). Since I have written it from scratch I figure > doing a ``cvs remove`` on the current test_urllib.py and then adding > my new version to get a fresh version numbering? No, just copy it on top and check it in. We don't do fresh version numbering. :-) > Also, my rewrite is not finished (have some more things I want to > test), but what I have so far passes and seems good. Should I > bother to check in what I have so far to have it in b1, or hold off > until the suite is completely finished? I am assuming since these > are unit tests that are passing I don't need to bother with an SF > patch to get a code review from someone.
I'd say check it in and keep working on it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 02:32:14 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:32:14 -0400 Subject: [Python-Dev] draft PEP: Trace and Profile Support for Threads In-Reply-To: "Your message of Thu, 24 Apr 2003 19:22:48 EDT." <20030424232248.GA25695@panix.com> References: <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <20030424232248.GA25695@panix.com> Message-ID: <200304250132.h3P1WEc01920@pcp02138704pcs.reston01.va.comcast.net> > >> Seems to me that this is a good argument for module-level properties, > >> BTW, or we require that all module attributes be set only through > >> functions. > > > > I'm not following. What do you mean by module-level properties? > > Data descriptors on module objects. I promise you will never get these. Modules are supposed to be robust and simple. If you want fancy, you can use classes and instances. > Let's suppose we have, say, a BCD module. For example, we want to > set the "global" rounding state on a per-thread basis. By > definition, modules are singletons, so there needs to be a container > within the module to hold the per-thread rounding state. Question > is, how/when do we update that container? Currently, the only > option is to require a user to call a function with the new setting > as a parameter; I can imagine cases where it would be convenient to > be able to simply set the module attribute, exactly the way we now > permit with new-style classes. Hm, why hide the mechanism? 
I'd say let the BCD module get an options object by explicitly asking for the current thread (or using a higher-level per-thread data facility), and let the user make a function call to set the state -- the function can request the per-thread options object and update it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 02:51:34 2003 From: guido@python.org (Guido van Rossum) Date: Thu, 24 Apr 2003 21:51:34 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: "Your message of Thu, 24 Apr 2003 17:47:28 PDT." <200304241747.29059.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241538.43480.gherron@islandtraining.com> <200304241633.50247.gherron@islandtraining.com> <200304241747.29059.gherron@islandtraining.com> Message-ID: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> I think I understand the problem, and I've checked something in that makes the test pass, by insisting that the match raise RuntimeError with a specific error message. This is what was tested before; that particular error message was part of the expected output in Lib/test/output/test_re, which is now no longer needed and which I have hence deleted. (Hmm, I wonder if there are any other files in Lib/test/output that are no longer needed? All those files should eventually disappear...) --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Fri Apr 25 03:01:52 2003 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Thu, 24 Apr 2003 22:01:52 -0400 Subject: [Python-Dev] Data Descriptors on module objects (was Re: draft PEP: Trace and Profile Support for Threads) In-Reply-To: <20030424232248.GA25695@panix.com> References: <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20030424215447.0220a3b0@mail.telecommunity.com> At 07:22 PM 4/24/03 -0400, Aahz wrote: >Data descriptors on module objects. If you *really* need them, you can have them. from types import ModuleType import time, sys class ModuleWithDescriptor(ModuleType): bar = property(lambda self: time.time()) moduleFoo = ModuleWithDescriptor() # named module must be importable, but not yet imported; # parent package must be in sys.modules moduleFoo.__name__ = "mypackage.foo" sys.modules['mypacakge.foo'] = reload(moduleFoo) import mypackage.foo # watch the time change... print mypackage.foo.bar print mypackage.foo.bar I *love* new-style classes. I use the trick above for lazy module importation; a subclass of ModuleType that doesn't import itself until the first time a __getattribute__ occurs. From pje@telecommunity.com Fri Apr 25 03:50:49 2003 From: pje@telecommunity.com (Phillip J. 
Eby) Date: Thu, 24 Apr 2003 22:50:49 -0400 Subject: [Python-Dev] Data Descriptors on module objects (was Re: draft PEP: Trace and Profile Support for Threads) In-Reply-To: <5.1.0.14.0.20030424215447.0220a3b0@mail.telecommunity.com> References: <20030424232248.GA25695@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> <1051040847.12834.32.camel@slothrop.zope.com> <20030423194638.GA19312@panix.com> <200304232058.h3NKw9G30648@pcp02138704pcs.reston01.va.comcast.net> <20030423224005.GA6089@panix.com> <200304240033.h3O0XJF31358@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <5.1.0.14.0.20030424224918.02ca5010@mail.telecommunity.com> At 10:01 PM 4/24/03 -0400, Phillip J. Eby wrote: ># named module must be importable, but not yet imported; ># parent package must be in sys.modules >moduleFoo.__name__ = "mypackage.foo" >sys.modules['mypacakge.foo'] = >reload(moduleFoo) Oops, that was supposed to read: sys.modules['mypackage.foo'] = moduleFoo From gherron@islandtraining.com Fri Apr 25 03:56:49 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 19:56:49 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304241956.49764.gherron@islandtraining.com> On Thursday 24 April 2003 06:51 pm, Guido van Rossum wrote: > I think I understand the problem, and I've checked something in that > makes the test pass, by insisting that the match raise RuntimeError > with a specific error message. This is what was tested before; that > particular error message was part of the expected output in > Lib/test/output/test_re, which is now no longer needed and which I > have hence deleted. Looks good. Perhaps test_re should be (or should have been) phased out. 
Test_sre makes many of the same tests (including today's offending one), as well as many new ones, and both run all the many old tests from re_test. It must be a (historical) quirk that they both exist. It's mostly a waste to run both, and having two is a maintenance hassle, underscored by the fact that Skip has chosen the less important one of the two (IMHO) to modernize. It's not a high priority, but perhaps I'll look at straightening things out in the (somewhat distant) future. Gary Herron From skip@pobox.com Fri Apr 25 04:00:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 22:00:51 -0500 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> Message-ID: <16040.42211.529453.408981@montanaro.dyndns.org> Tim> ====================================================================== Tim> ERROR: test_limitations (__main__.ReTests) Tim> ---------------------------------------------------------------------- Tim> Traceback (most recent call last): Tim> File "../lib/test/test_re.py", line 182, in test_limitations Tim> self.assertEqual(re.match('(x)*', 50000*'x').span(), (0, 50000)) Tim> File "C:\Code\python\lib\sre.py", line 132, in match Tim> return _compile(pattern, flags).match(string) Tim> RuntimeError: maximum recursion limit exceeded My apologies. I made most of these changes a couple months ago. test_re has been failing with the stack limit problem all this time. I thought it was related to the usual Mac OS X stack limit problem. Thanks to Guido and Gary also for elucidating and fixing the problem while I was at my son's hockey game. (I know, I'll get my priorities straight one of these days...)
Skip From skip@pobox.com Fri Apr 25 04:05:28 2003 From: skip@pobox.com (Skip Montanaro) Date: Thu, 24 Apr 2003 22:05:28 -0500 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304241956.49764.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> <200304241956.49764.gherron@islandtraining.com> Message-ID: <16040.42488.834447.899129@montanaro.dyndns.org> Gary> It's mostly a waste to run both, and having two is a maintenance Gary> hassle, underscored by the fact that Skip has choosen the less Gary> important one of the two (IMHO) to modernize. I think it would be better to fold missing tests in from test_sre to test_re, not so much because I've partly converted test_re to use unittest, but because "re" is what people generally import. It never even occurred to me to look for "test_sre" when I was looking for a candidate test suite to convert to unittest. I'll keep working at completing the conversion. Skip From tim_one@email.msn.com Fri Apr 25 04:17:56 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 23:17:56 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <LNBBLJKPBEHFEDALKOLCCEAOEIAB.tim_one@email.msn.com> [Guido] > I think I understand the problem, and I've checked something in that > makes the test pass, by insisting that the match raise RuntimeError > with a specific error message. This is what was tested before; that > particular error message was part of the expected output in > Lib/test/output/test_re, which is now no longer needed and which I > have hence deleted. That's all exactly right. Thanks! > (Hmm, I wonder if there are any other files in Lib/test/output that > are no longer needed? All those files should eventually disappear...) 
Fred used to keep good track of this, so I doubt there's a big backlog. I expect the best candidates are those (like the re tests) recently converted to unittest. Getting rid of expected-output files should be part of such a conversion (or of a conversion to doctest). OTOH, the expected-output kind of test remains fine by me! It used to be very painful to see what went wrong when things failed, but quite some time ago that mechanism was reworked to save all the output and display a diff instead. From barry@python.org Fri Apr 25 04:19:57 2003 From: barry@python.org (Barry Warsaw) Date: 24 Apr 2003 23:19:57 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <20030424225914.GA26254@xs4all.nl> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> Message-ID: <1051240796.11580.4.camel@geddy> On Thu, 2003-04-24 at 18:59, Thomas Wouters wrote: > That's not particularly useful. The only thing that does is create a period > in time (or rather, 'history' -- CVS history) in which test_urllib.py > doesn't exist. Re-adding the file won't give you a fresh version numbering > either, it'll just give you a lot of headaches, especially when there are > branches involved (right, Barry ? :-) And one thing we do /not/ need is more headaches with cvs. :) The specific problem I've been fighting with (in Mailman's cvs) is that I've cvs rm'd some binary files, but both a cvs checkout and a cvs export continue to resurrect the files when I provide -r on the initial command. If I do a checkout of the trunk, then cvs up to the tag, the file goes away as intended. Sigh. 
> Just commit your new test_urllib.py directly, when it's all done, using > something like > > cvs commit -r2.0 test_urllib.py > > But you probably want to discuss the version number you want to force, Guido > might like to reserve 2.0 for something (although I think he should use > '3000' instead :) I know Guido doesn't care, but I like to have the file major revision numbers match the s/w's major rev number. Really, I just hate to see huge minor revision numbers on files. I hate it as much as I hate to hear Tim's tummy rumbling, right around noon. -Barry From tim_one@email.msn.com Fri Apr 25 04:32:07 2003 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 24 Apr 2003 23:32:07 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <1051240796.11580.4.camel@geddy> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBAEIAB.tim_one@email.msn.com> [Barry Warsaw] > ... > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Really, I just hate to see > huge minor revision numbers on files. Good news: I'm living proof that you can learn to ignore that files *have* CVS revision numbers. If you need a milestone marker, apply a tag. > I hate it as much as I hate to hear Tim's tummy rumbling, right around > noon. Lucky for both of us that my lunch admin almost never lets that happen anymore. If I could remember his name, I'd recommend him to you. 
From gherron@islandtraining.com Fri Apr 25 04:48:12 2003 From: gherron@islandtraining.com (Gary Herron) Date: Thu, 24 Apr 2003 20:48:12 -0700 Subject: [Python-Dev] New test failure on Windows In-Reply-To: <16040.42488.834447.899129@montanaro.dyndns.org> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241956.49764.gherron@islandtraining.com> <16040.42488.834447.899129@montanaro.dyndns.org> Message-ID: <200304242048.12864.gherron@islandtraining.com> On Thursday 24 April 2003 08:05 pm, Skip Montanaro wrote: > Gary> It's mostly a waste to run both, and having two is a maintenance > Gary> hassle, underscored by the fact that Skip has choosen the less > Gary> important one of the two (IMHO) to modernize. > > I think it would be better to fold missing tests in from test_sre to > test_re, not so much because I've partly converted test_re to use unittest, > but because "re" is what people generally import. It never even occurred > to me to look for "test_sre" when I was looking for a candidate test suite > to convert to unittest. I'll keep working at completing the conversion. Sure. This is sensible. Gary Herron From DavidA@ActiveState.com Fri Apr 25 05:10:22 2003 From: DavidA@ActiveState.com (David Ascher) Date: Thu, 24 Apr 2003 21:10:22 -0700 Subject: [Python-Dev] Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <3EA7A228.2010705@lemburg.com> <200304241220.h3OCKra32500@pcp02138704pcs.reston01.va.comcast.net> <m3d6jbg3gu.fsf@mira.informatik.hu-berlin.de> <200304241830.h3OIUUj22372@odiug.zope.com> <Pine.SOL.4.55.0304241307080.4654@death.OCF.Berkeley.EDU> <200304242012.h3OKCP325878@odiug.zope.com> Message-ID: <3EA8B52E.7090505@ActiveState.com> Guido van Rossum wrote: >>I think does make sense, though, to have a package that is maintained >>separately that python-dev pseudo endorses (like PyXML and win32all) that >>contains all of this crypto stuff. 
> > > Right. Although of course then the crypto requirements would impact that package, and MAL's point applies to it. --david From Anthony Baxter <anthony@interlink.com.au> Fri Apr 25 06:02:24 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Fri, 25 Apr 2003 15:02:24 +1000 Subject: [Python-Dev] shellwords In-Reply-To: <2mlly6pgff.fsf@starship.python.net> Message-ID: <200304250502.h3P52PH25342@localhost.localdomain> >>> Michael Hudson wrote > Particularly the file-manipulation stuff... shutil tends to lose > somewhat x-platform. The other file manipulation thingy that would be good would be to abstract out the bits of tarfile and zipfile and make a standard interface to the two. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From martin@v.loewis.de Fri Apr 25 06:06:01 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 25 Apr 2003 07:06:01 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMECNEEAB.tim.one@comcast.net> Message-ID: <m3y91zb1za.fsf@mira.informatik.hu-berlin.de> Tim Peters <tim.one@comcast.net> writes: > That part I didn't grok: why force an artificial version number? I can't > imagine a use for that. The "Rewrote from scratch." checkin comment Brett > will surely make is milestone enough in the CVS log. Bumping the major number makes a more visible change. There is no technical reason to do that, nor one to avoid doing so if you like the visible change. A number of files in the Python CVS do have a 2.x version number; I always wondered why that is. Regards, Martin From guido@python.org Fri Apr 25 06:16:29 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:16:29 -0400 Subject: [Python-Dev] New test failure on Windows In-Reply-To: "Your message of Thu, 24 Apr 2003 19:56:49 PDT."
<200304241956.49764.gherron@islandtraining.com> References: <BIEJKCLHCIOIHAGOKOLHMEDHFHAA.tim.one@comcast.net> <200304241747.29059.gherron@islandtraining.com> <200304250151.h3P1pYc02769@pcp02138704pcs.reston01.va.comcast.net> <200304241956.49764.gherron@islandtraining.com> Message-ID: <200304250516.h3P5GTI02992@pcp02138704pcs.reston01.va.comcast.net> > Perhaps test_re should be (or should have been) phased out. Test_sre > makes many of the same tests (including today's offending one), as > well as many new ones, and both run all the many old tests from > re_test. It must be a (historical) quirk that they both exist. It's > mostly a waste to run both, and having two is a maintenance hassle, > underscored by the fact that Skip has chosen the less important one > of the two (IMHO) to modernize. > > It's not a high priority, but perhaps I'll look at straightening > things out in the (somewhat distant) future. Yes, this is mostly a historical artefact from the time when SRE was one of the two provided RE implementations. If you can straighten this one out, be my guest. I see no reason to stop working on the tests while Python is in beta. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 06:14:51 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 25 Apr 2003 07:14:51 +0200 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> Message-ID: <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> Brett Cannon <bac@OCF.Berkeley.EDU> writes: > OK, I will finish the code then first.
Just to double-check, creating a > tracker item and initially assigning it to myself will not cause people to > ignore it since everyone who cares will see the new item when it gets > mailed to Patches and sees I am asking for other people on other platforms > beyond OS X to give the code a run, right? Wrong; I do ignore patches that are assigned. > And is having new testing suites peer-reviewed a common thing? Or should > I only worry about it when there is a slight chance cross-platform issues > might sprout up from the tests? I believe the general policy is that if you are certain that a certain patch is useful and correct, you don't need to post it on SF; that is the case in particular if you are *the* maintainer of that piece of code. So if you have doubts, post on SF - but do ask yourself whether there is anybody who you think could eliminate those doubts; if there is no true expert around, the patch will stay unreviewed forever. The policy about beta releases is (or should be) stricter. No new features, and perhaps no new tests unless they test for a bug that gets fixed. So in the beta cycle, patches are posted to SF just to store them there until after the release of Python 2.3. In the specific case, ask yourself what the cost would be if you produce a test failure under conditions that you consider obscure. Will enough people test the test before 2.3 is released? Does the new test suite behave differently enough from the old one to make false positives a possibility? Regards, Martin From guido@python.org Fri Apr 25 06:19:08 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:19:08 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of 24 Apr 2003 23:19:57 EDT." 
<1051240796.11580.4.camel@geddy> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <1051240796.11580.4.camel@geddy> Message-ID: <200304250519.h3P5J8x03015@pcp02138704pcs.reston01.va.comcast.net> > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Really, I just hate to see > huge minor revision numbers on files. Well, some files already have a 2.x revno, others don't. The 2.x revnos were introduced in ancient times. I'm all for switching to 3.x when we're doing Python 3.0, but until then, I see no reason to play with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 25 06:24:45 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 01:24:45 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: "Your message of 25 Apr 2003 07:14:51 +0200." <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> References: <Pine.SOL.4.55.0304241539170.12770@death.OCF.Berkeley.EDU> <20030424225914.GA26254@xs4all.nl> <Pine.SOL.4.55.0304241751530.12770@death.OCF.Berkeley.EDU> <m3u1cnb1kk.fsf@mira.informatik.hu-berlin.de> Message-ID: <200304250524.h3P5OjH03063@pcp02138704pcs.reston01.va.comcast.net> > The policy about beta releases is (or should be) stricter. No new > features, and perhaps no new tests unless they test for a bug that > gets fixed. So in the beta cycle, patches are posted to SF just to > store them there until after the release of Python 2.3. I don't see much of a reason to be so strict about no new tests. > In the specific case, ask yourself what the cost would be if you > produce a test failure under conditions that you consider obscure. > Will enough people test the test before 2.3 is released? Does the new > test suite behave differently enough from the old one to make false > positives a possibility? This is always a good set of questions to ask yourself. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Apr 25 07:42:11 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 Apr 2003 08:42:11 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EA8D8C3.40503@lemburg.com> Martin v. Löwis wrote: > "M.-A. Lemburg" <mal@lemburg.com> writes: > >>Why do you only look at US export rules when discussing crypto >>code in Python ? > > Because only exporting matters. Importing is no problem: You can > easily *remove* stuff from the distribution, by creating a copy of > package that doesn't have the code that cannot be imported. That would > be the job of whoever wants to import it. > > Exporting also only matters from the servers which host the Python > distribution, i.e. the US and the Netherlands. That's really optimistic. Every CD vendor, mirror site, etc. in the world hosting the Python distribution would have to go through the business of evaluating whether it's legal to distribute Python or not in their particular case. Even better: users who download Python from some web-site/CD would have to trace back the path the Python version took to be sure that they are using a legally exported and imported version. Crypto is just too much (legal) work if you're serious about it. I also don't really see a problem here: there are plenty good crypto packages out there ready to be used. Not having them in the core distribution raises the awareness bar just a little to make people think about whether it's legal to use them in their particular case. So again: why put the whole Python distribution at risk just because you want to make life easier for the small share of people actually using such code ?
-- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 25 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 60 days left From martin@v.loewis.de Fri Apr 25 08:01:18 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Fri, 25 Apr 2003 09:01:18 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8D8C3.40503@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> Message-ID: <3EA8DD3E.8090201@v.loewis.de> M.-A. Lemburg wrote: > That's really optimistic. Every CD vendor, mirror site, etc. in the > world hosting the Python distribution would have to go through the > business of evaluating whether it's legal to distribute Python or not > in their particular case. Every CD vendor, mirror site, etc. would have to perform a risk analysis, yes. That goes beyond analysing the legal status only - people will usually also take into account what the risk of prosecution is. They already do that for all other software they distribute, and apparently come to the conclusion that the risk of being prosecuted is nearly zero. > Crypto is just too much (legal) work if you're serious about it. So then you would advise to remove the OpenSSL support from the Windows distribution, and from Python altogether? Because if not, why would it be bad to add more cryptographic packages to the standard Python distribution? Either you violate some law in some country already by distributing Python from A to B, or you don't. Adding another package doesn't change anything here.
> I also don't really see a problem here: there are plenty of good > crypto packages out there ready to be used. And it may indeed be the case that authors of such packages fear the loss of reputation if competing packages were included into the Python distribution :-( Regards, Martin From tim_one@email.msn.com Fri Apr 25 08:12:47 2003 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 25 Apr 2003 03:12:47 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> [Guido] > Agreed. How about naming it os.walk()? I think it's not OS specific > -- all the OS specific stuff is part of os.path. So we only need one > implementation. I've checked this in, modified to treat symlinks the same way os.path.walk() treated them, and with docs and test cases. It wasn't my intent to cut off people who want fancier stuff, but available time is finite, and at least now they can demonstrate their sincerity by supplying code, doc, and test suite patches <wink>. From mal@lemburg.com Fri Apr 25 09:02:26 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 Apr 2003 10:02:26 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8DD3E.8090201@v.loewis.de> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> <3EA8DD3E.8090201@v.loewis.de> Message-ID: <3EA8EB92.4070606@lemburg.com> Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >> That's really optimistic. Every CD vendor, mirror site, etc. in the >> world hosting the Python distribution would have to go through the >> business of evaluating whether it's legal to distribute Python or not >> in their particular case. > > Every CD vendor, mirror site, etc. would have to perform a risk > analysis, yes. 
That goes beyond analysing the legal status only - people > will usually also take into account what the risk of prosecution is. > They already do that for all other software they distribute, and > apparently come to the conclusion that the risk of being prosecuted is > nearly zero. In reality it probably is for most parts of the world. But why put this burden on the casual user ? >> Crypto is just too much (legal) work if you're serious about it. > > So then you would advise to remove the OpenSSL support from the Windows > distribution, and from Python altogether? Hmm, I didn't know that the Windows installer comes with an SSL module that includes OpenSSL. I'd strongly advise to make that a separate download. At the very least, there should be a Windows installer without that module and a note on the web-site mentioning the problem and maybe linking to the URL I gave in my other mail. In any case, the download page should have a note about the use of crypto code and interfaces to crypto code to make things safer for both the PSF and the user downloading the distribution. > Because if not, why would it be bad to add more cryptographic packages > to the standard Python distribution? Either you violate some law in some > country already by distributing Python from A to B, or you don't. Adding > another package doesn't change anything here. I can't follow your argument. This is like "you've robbed one bank; it doesn't get worse if you rob another two". I also don't understand your position in the light of the PSF's intentions. The PSF is meant to protect the IP in Python -- how does that fit with being careless about breaking the law ? >> I also don't really see a problem here: there are plenty of good >> crypto packages out there ready to be used. > > And it may indeed be the case that authors of such packages fear the loss > of reputation if competing packages were included into the Python > distribution :-( Is there ? 
pycrypto is all you need if you're into deep crypto. The standard SSL support is enough crypto for most people and that's already included in the distribution. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 25 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 60 days left From Paul.Moore@atosorigin.com Fri Apr 25 09:59:52 2003 From: Paul.Moore@atosorigin.com (Moore, Paul) Date: Fri, 25 Apr 2003 09:59:52 +0100 Subject: [Python-Dev] Cryptographic stuff for 2.3 Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> From: M.-A. Lemburg [mailto:mal@lemburg.com] > In reality it probably is for most parts of the world. But > why put this burden on the casual user ? Speaking as a "casual user", I very rarely need or use crypto software. However, when I do need it, having it "built in" is a major benefit - most of the crypto packages either have dependencies I'm not familiar with or don't have, or go far too deep into crypto theory for me to follow. At the end of the day, all I want is simple stuff, like for urllib to get a "https" web page for me, "just like my browser does" (ie, with no thought on my part...) >>> Crypto is just too much (legal) work if you're serious >>> about it. >> >> So then you would advise to remove the OpenSSL support >> from the Windows distribution, and from Python altogether? > > Hmm, I didn't know that the Windows installer comes with an SSL > module that includes OpenSSL. I'd strongly advise to make that > a separate download. If you did, I'd expect that 99% of Windows users would perceive that as "Python can't handle https URLs". Having a separate download might be enough, as long as it was utterly trivial - download the package, click to install, done. 
All dependencies included, no extra work. > Is there ? pycrypto is all you need if you're into deep crypto. But pycrypto (at least when I've looked into it) definitely *isn't* just a 1-click install, and a quick Google search reveals no way of getting a prebuilt Windows binary. Of course, you say "if you're into deep crypto", so maybe you'd say that expecting users to build their own isn't unreasonable at that level. Actually, m2crypto is another candidate, and it does include Windows binaries (but they are a bit fiddly to install)... > The standard SSL support is enough crypto for most people and > that's already included in the distribution. But you were arguing to take it out... Personally, I'd like the existing stuff to stay as-is. I don't particularly see the need for more crypto stuff in the core, but I'd like to see a well-maintained, easy to install, "sanctioned" crypto package for people who want to either use crypto "for real", or just investigate it. Paul. From andrew@acooke.org Fri Apr 25 12:47:05 2003 From: andrew@acooke.org (andrew cooke) Date: Fri, 25 Apr 2003 07:47:05 -0400 (CLT) Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> References: <200304241650.h3OGoPM15432@odiug.zope.com> <LNBBLJKPBEHFEDALKOLCMEBLEIAB.tim_one@email.msn.com> Message-ID: <41193.127.0.0.1.1051271225.squirrel@127.0.0.1> Tim Peters said: > I've checked this in, modified to treat symlinks the same way > os.path.walk() > treated them, and with docs and test cases. It wasn't my intent to cut > off > people who want fancier stuff, but available time is finite, and at least > now they can demonstrate their sincerity by supplying code, doc, and test > suite patches <wink>. For the record - the version I posted (with breadth-first as an option) wasn't reliable (it runs out of stack space on reasonable directory structures). 
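[Editorial note: the stack-space problem Andrew mentions goes away if the breadth-first walk is driven by an explicit queue instead of recursion. The sketch below is hypothetical — it is not the code that was posted, and the function name is invented — but it shows the technique:]

```python
# Sketch: an iterative breadth-first directory walk.  An explicit deque
# replaces recursion, so directory depth can never exhaust the call stack.
import os
from collections import deque

def walk_breadth_first(top):
    """Yield (dirpath, dirnames, filenames) tuples in breadth-first order."""
    queue = deque([top])
    while queue:
        dirpath = queue.popleft()
        dirnames, filenames = [], []
        try:
            entries = os.listdir(dirpath)
        except OSError:
            continue  # unreadable directory: skip it silently
        for name in sorted(entries):
            full = os.path.join(dirpath, name)
            if os.path.isdir(full):
                dirnames.append(name)
                queue.append(full)  # visit after all siblings at this level
            else:
                filenames.append(name)
        yield dirpath, dirnames, filenames
```

All parents at one depth are yielded before any of their children, which is exactly the breadth-first ordering the depth-limited recursive version tried to provide.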
Andrew -- http://www.acooke.org/andrew From tim@zope.com Fri Apr 25 16:14:31 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 11:14:31 -0400 Subject: [Python-Dev] More new Windows test failures Message-ID: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> test_urllib and test_socket fail on Win2K today. test_socket (I think Guido already knows about these): ====================================================================== ERROR: testIPv4toString (__main__.GeneralModuleTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_socket.py", line 322, in testIPv4toString from socket import inet_aton as f, inet_pton, AF_INET ImportError: cannot import name inet_pton ====================================================================== ERROR: testStringToIPv4 (__main__.GeneralModuleTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_socket.py", line 352, in testStringToIPv4 from socket import inet_ntoa as f, inet_ntop, AF_INET ImportError: cannot import name inet_ntop ---------------------------------------------------------------------- Ran 46 tests in 3.555s FAILED (errors=2) test_urllib (these may all be bad line-end assumptions): ====================================================================== FAIL: test_fileno (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 68, in test_fileno "Reading on the file descriptor returned by fileno() " File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: Reading on the file descriptor returned by fileno() did not return the expected text ====================================================================== FAIL: test_iter (__main__.urlopen_FileTests) 
---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 88, in test_iter self.assertEqual(line, self.text) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\r\n' != 'test_urllib: urlopen_FileTests\n' ====================================================================== FAIL: test_read (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 48, in test_read self.assertEqual(self.text, self.returned_obj.read()) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib: urlopen_FileTests\r\n' ====================================================================== FAIL: test_readline (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 51, in test_readline self.assertEqual(self.text, self.returned_obj.readline()) File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib: urlopen_FileTests\r\n' ====================================================================== FAIL: test_readlines (__main__.urlopen_FileTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../lib/test/test_urllib.py", line 61, in test_readlines "readlines() returned improper text") File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual raise self.failureException, \ AssertionError: readlines() returned improper text ---------------------------------------------------------------------- Ran 23 
tests in 0.280s FAILED (failures=5) From guido@python.org Fri Apr 25 16:21:55 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 11:21:55 -0400 Subject: [Python-Dev] Failing tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 06:40:27 EDT." <005d01c30b17$28840580$1a3cc797@oemcomputer> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> Message-ID: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> > test_urllib.py is crashing on my fresh WinMe build: > > test_fileno (__main__.urlopen_FileTests) ... FAIL > test_iter (__main__.urlopen_FileTests) ... FAIL > test_read (__main__.urlopen_FileTests) ... FAIL > test_readline (__main__.urlopen_FileTests) ... FAIL > test_readlines (__main__.urlopen_FileTests) ... FAIL Should be fixed now -- I'm writing the file with test data in binary mode. I think that it would be preferable if the socket._fileobject class would actually interpret the mode argument, but it's never done that, so I'm not in a hurry to add this feature to this already hairy class. (Better wait until the new "sio" class -- see sandbox.) > Two of the test cases are failing in test_socket.py > on a fresh build for WinMe: > > testIPv4toString (__main__.GeneralModuleTests) ... ERROR > testStringToIPv4 (__main__.GeneralModuleTests) ... ERROR Fixed too, by skipping the tests of inet_ntop() and inet_pton() when they don't exist. All tests now pass for me, both on Linux (Red Hat 7.8) and Windows (Win 98 second edition). 
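[Editorial note: the line-ending property behind this fix is easy to demonstrate. A small sketch (file name hypothetical, modern Python syntax): text mode may translate "\n" to the platform line ending on write — "\r\n" on Windows, which is exactly what the AssertionErrors above show — while binary mode writes the bytes through untouched:]

```python
# Sketch: binary mode sidesteps platform newline translation entirely.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test_data.txt")

# Binary write: the bytes given are the bytes stored, on every platform.
with open(path, "wb") as f:
    f.write(b"test_urllib: urlopen_FileTests\n")

# Binary read: see exactly what is on disk, with no translation on the
# way back either.
with open(path, "rb") as f:
    data = f.read()

assert data == b"test_urllib: urlopen_FileTests\n"  # never b"...\r\n"
```

Had the file been written in text mode on Windows, the stored bytes would end in b"\r\n" and the comparison against the expected text would fail, which is the failure mode the tests above exhibited.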
--Guido van Rossum (home page: http://www.python.org/~guido/) From wesleyhenwood@hotmail.com Fri Apr 25 16:33:57 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Fri, 25 Apr 2003 15:33:57 +0000 Subject: [Python-Dev] Re: PyRun_* functions Message-ID: <BAY7-F110q50K8paSxf00007dd9@hotmail.com> How do I make certain that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using? _________________________________________________________________ From jeremy@zope.com Fri Apr 25 16:34:22 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 25 Apr 2003 11:34:22 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <3EA8D8C3.40503@lemburg.com> References: <20030423163947.GA24541@nyman.amk.ca> <m3lly079qc.fsf@mira.informatik.hu-berlin.de> <3EA7A11B.8090202@lemburg.com> <m3he8ng3j3.fsf@mira.informatik.hu-berlin.de> <3EA8D8C3.40503@lemburg.com> Message-ID: <1051284862.1009.6.camel@slothrop.zope.com> On Fri, 2003-04-25 at 02:42, M.-A. Lemburg wrote: > That's really optimistic. Every CD vendor, mirror site, etc. in the > world hosting the Python distribution would have to go through the > business of evaluating whether it's legal to distribute Python or not > in their particular case. I haven't had time to follow this thread closely, but I think I saw a message from Martin where he explained that the OpenSSL wrapper we already have is probably covered by US export regulations. I think it's a matter of interpretation, but I agree with that interpretation. So everyone who distributes Python already needs to do that analysis. I think it's unlikely we would remove the crypto code we already have, so I'm all for adding more crypto code that makes the library more useful. 
Jeremy From guido@python.org Fri Apr 25 16:38:35 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 11:38:35 -0400 Subject: [Python-Dev] Re: PyRun_* functions In-Reply-To: "Your message of Fri, 25 Apr 2003 15:33:57 -0000." <BAY7-F110q50K8paSxf00007dd9@hotmail.com> References: <BAY7-F110q50K8paSxf00007dd9@hotmail.com> Message-ID: <200304251538.h3PFcZ119642@pcp02138704pcs.reston01.va.comcast.net> > How do I make certain that FILE* parameters are only passed to these > functions if it is certain that they were created by the same library that > the Python runtime is using? On which platform? --Guido van Rossum (home page: http://www.python.org/~guido/) From duanev@io.com Fri Apr 25 16:47:48 2003 From: duanev@io.com (Duane Voth) Date: Fri, 25 Apr 2003 10:47:48 -0500 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! Message-ID: <20030425104748.A26488@io.com> First, Martin, muchas gracias! --export-dynamic was exactly the ticket. Next hurdle: Lynx is clearly hoping curses will go the way of the condor; their implementation is pre-ncurses! Comments at the top of Python-2.2.2/Modules/_cursesmodule.c suggest that there was a prior version of Python curses that should be much closer to what LynxOS4 supports. Does anyone have an archived copy of the old _cursesmodule.c? Modules/_cursesmodule.c comments: * Based on prior work by Lance Ellinghaus and Oliver Andrich * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, * Cathedral City, California Republic, United States of America. * * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. so I guess I'm looking for version 1.2 of _cursesmodule.c. 
-- Duane Voth duanev@io.com -- duanev@atlantis.io.com From james.kew@btinternet.com Fri Apr 25 16:56:58 2003 From: james.kew@btinternet.com (James Kew) Date: Fri, 25 Apr 2003 16:56:58 +0100 Subject: [Python-Dev] Re: Cryptographic stuff for 2.3 References: <20030423163947.GA24541@nyman.amk.ca> <200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <b8blqg$96u$1@main.gmane.org> "Guido van Rossum" <guido@python.org> wrote in message news:200304240117.h3O1H8S31520@pcp02138704pcs.reston01.va.comcast.net... > Rotor should be deprecated regardless; I've never heard of someone > using it. I have seen it mentioned occasionally on c.l.py, usually with a followup of "don't use rotor, it's not secure": http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&th=5a655073e0b632ea&rnum=4 http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&th=7b945db40cf892fd&rnum=5 James From python@rcn.com Fri Apr 25 17:00:15 2003 From: python@rcn.com (Raymond Hettinger) Date: Fri, 25 Apr 2003 12:00:15 -0400 Subject: [Python-Dev] Re: Failing tests on Windows References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <000701c30b43$c30f9ac0$125ffea9@oemcomputer> > All tests now pass for me, both on Linux (Red Hat 7.8) and Windows > (Win 98 second edition). On a fresh WinME build, all tests pass for me also :-) Raymond From guido@python.org Fri Apr 25 17:00:27 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 12:00:27 -0400 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: "Your message of Fri, 25 Apr 2003 10:47:48 CDT." 
<20030425104748.A26488@io.com> References: <20030425104748.A26488@io.com> Message-ID: <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net> > Modules/_cursesmodule.c comments: > * Based on prior work by Lance Ellinghaus and Oliver Andrich > * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, > * Cathedral City, California Republic, United States of America. > * > * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: > * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. > > so I guess I'm looking for version 1.2 of _cursesmodule.c. You should be able to get that out of CVS. The oldest version at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Modules/_cursesmodule.c is labeled 2.1, but the CVS version numbers don't match author's versions. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@zope.com Fri Apr 25 17:16:46 2003 From: jeremy@zope.com (Jeremy Hylton) Date: 25 Apr 2003 12:16:46 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again Message-ID: <1051287405.1009.66.camel@slothrop.zope.com> I thought I'd report that test_ossaudiodev is back to hanging on my RH 7.2 box. It's been a while since I ran the test suite with the audio resource enabled, so I don't know when it started to hang. Jeremy From guido@python.org Fri Apr 25 17:39:46 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 12:39:46 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: "Your message of 25 Apr 2003 12:16:46 EDT." <1051287405.1009.66.camel@slothrop.zope.com> References: <1051287405.1009.66.camel@slothrop.zope.com> Message-ID: <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> > I thought I'd report that test_ossaudiodev is back to hanging on my RH > 7.2 box. It's been a while since I ran the test suite with the audio > resource enabled, so I don't know when it started to hang. It probably never stopped hanging. 
It only runs when you pass "-u audio" to regrtest though. I note that it passes for me with Red Hat 7.3, so you might want to upgrade. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 17:58:40 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: Fri, 25 Apr 2003 18:58:40 +0200 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: <20030425104748.A26488@io.com> References: <20030425104748.A26488@io.com> Message-ID: <3EA96940.4060501@v.loewis.de> Duane Voth wrote: > Next hurdle: Lynx is clearly hoping curses will go the way of the condor, > their implementation is pre ncurses! Comments at the top of > Python-2.2.2/Modules/_cursesmodule.c suggest that there was a prior > version of Python curses that should be much closer to what LynxOS4 > supports. Does anyone have an archived copy of the old _cursesmodule.c? You can get old versions of all source code from the CVS. > * Based on prior work by Lance Ellinghaus and Oliver Andrich > * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse, > * Cathedral City, California Republic, United States of America. > * > * Version 1.5b1, heavily extended for ncurses by Oliver Andrich: > * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany. > > so I guess I'm looking for version 1.2 of _cursesmodule.c. I think your guess is wrong. The extensions are used only if available, and the curses module works with pre-ncurses implementations of curses just fine. 
Regards, Martin From barry@python.org Fri Apr 25 18:10:57 2003 From: barry@python.org (Barry Warsaw) Date: 25 Apr 2003 13:10:57 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> Message-ID: <1051290657.1500.6.camel@barry> On Fri, 2003-04-25 at 04:59, Moore, Paul wrote: > Personally, I'd like the existing stuff to stay as-is. I'd hate to see sha removed from the standard distro. -Barry From tim.one@comcast.net Fri Apr 25 18:47:50 2003 From: tim.one@comcast.net (Tim Peters) Date: Fri, 25 Apr 2003 13:47:50 -0400 Subject: [Python-Dev] Failing tests on Windows In-Reply-To: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <BIEJKCLHCIOIHAGOKOLHOEGIFHAA.tim.one@comcast.net> [Guido] > ... > All tests now pass for me, both on Linux (Red Hat 7.8) and Windows > (Win 98 second edition). It looks good on Win2K now too, both release and debug builds. I saw one failure in test_queue, but believe that's due to a pre-existing race condition in the test code (recall that we've both seen test_queue fail before). From theller@python.net Fri Apr 25 19:10:33 2003 From: theller@python.net (Thomas Heller) Date: 25 Apr 2003 20:10:33 +0200 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEPBEDAB.tim.one@comcast.net> Message-ID: <y91ycusm.fsf@python.net> Tim Peters <tim.one@comcast.net> writes: > [Mark Hammond] > > Actually, some guidance would be nice here. > > It's easy this time. BTW, I agree your new check is the right thing to do! > If another case like this pops up, though, we/you should probably add a > section to the PEP explaining what to do about it. > ctypes ;-) is another case (and more cases will pop up as soon as the beta is released, and people try their extensions under it). 
I agree it is easy to fix, but usually when Python crashes with an invalid thread state I'm very anxious at first. So is the policy now that it is no longer *allowed* to create another thread state, while in previous versions there wasn't any choice, because there existed no way to get the existing one? IMO a fatal error is very harsh, especially as there's no problem with continuing execution - exactly what happens in a release build. Not that I am misunderstood: I very much appreciate the work Mark has done, and look forward to using it to its fullest extent. Thomas From niemeyer@conectiva.com Fri Apr 25 19:11:57 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Fri, 25 Apr 2003 15:11:57 -0300 Subject: [Python-Dev] shellwords In-Reply-To: <200304250502.h3P52PH25342@localhost.localdomain> References: <2mlly6pgff.fsf@starship.python.net> <200304250502.h3P52PH25342@localhost.localdomain> Message-ID: <20030425181157.GB6591@localhost.distro.conectiva> > > Particularly the file-manipulation stuff... shutil tends to lose > > somewhat x-platform. > > The other file manipulation thingy that would be good would be to > abstract out the bits of tarfile and zipfile and make a standard > interface to the two. IIRC, tarfile has a wrapper which makes it compatible with zipfile. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From guido@python.org Fri Apr 25 19:26:25 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 14:26:25 -0400 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: "Your message of 25 Apr 2003 13:10:57 EDT." <1051290657.1500.6.camel@barry> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> <1051290657.1500.6.camel@barry> Message-ID: <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> > I'd hate to see sha removed from the standard distro. Me too; I don't see sha or md5 as crypto. I'm only against adding new *crypto* capability. 
I'm also for isolating existing crypto capability so it's easy to remove for anyone who has a need for a crypto-free distribution. I think we're already doing that, given that even on Windows, the SSL module is a separate DLL. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Fri Apr 25 19:36:45 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 25 Apr 2003 14:36:45 -0400 Subject: [Python-Dev] Python 2.3b1 documentation Message-ID: <16041.32829.612385.536757@grendel.zope.com> I've already formatted the documentation for Python 2.3b1; please don't touch the Doc directory until the final release has been announced. Thanks! -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From skip@pobox.com Fri Apr 25 19:22:43 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 13:22:43 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 Message-ID: <16041.31987.943313.278329@montanaro.dyndns.org> Using the latest version from CVS, on Solaris 8 test_logging hangs. Lots of output, then: ... INFO:root:Info index = 99 -- logging 100 at INFO, messages should be seen every 10 events -- -- logging 101 at INFO, messages should be seen every 10 events -- INFO:root:Info index = 100 INFO:root:Info index = 101 -- log_test2 end --------------------------------------------------- -- log_test3 begin --------------------------------------------------- Unfiltered... INFO:a:Info 1 INFO:a.b:Info 2 INFO:a.c:Info 3 INFO:a.b.c:Info 4 INFO:a.b.c.d:Info 5 INFO:a.bb.c:Info 6 INFO:b:Info 7 INFO:b.a:Info 8 INFO:c.a.b:Info 9 INFO:a.bb:Info 10 Filtered with 'a.b'... INFO:a.b:Info 2 INFO:a.b.c:Info 4 INFO:a.b.c.d:Info 5 -- log_test3 end --------------------------------------------------- and it just sits there. ^C doesn't terminate it. I have to stop it w/ ^Z, then "kill %1" it. I have the very latest source checked out. Any ideas? 
Skip From skip@pobox.com Fri Apr 25 16:38:03 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 10:38:03 -0500 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? Message-ID: <16041.22107.533893.743928@montanaro.dyndns.org> While moving tests from test_sre to test_re I stumbled upon a simple test for sre.Scanner. This looks fairly cool. Should it be exposed through re and documented? Skip From skip@pobox.com Fri Apr 25 17:14:51 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 11:14:51 -0500 Subject: [Python-Dev] test_s?re merge Message-ID: <16041.24315.500827.370963@montanaro.dyndns.org> For those of you who don't read python-checkins, the merge of test_re.py and test_sre.py has been completed and test_sre.py is no longer in the repository. Future test cases should be added to test_re.py, even if it's a test specifically of Fredrik's sre module. The sre.Scanner object is the only thing imported directly from sre, and only because it is not sucked in by re.py. I may also assimilate Tim's re_tests.py at some point, but probably not real soon, so if someone feels like tackling that, be my guest. ;-) Skip From skip@pobox.com Fri Apr 25 17:27:22 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 11:27:22 -0500 Subject: [Python-Dev] bz2 module fails to compile on Solaris 8 Message-ID: <16041.25066.559968.451868@montanaro.dyndns.org> The bz2 module isn't compiling for me on Solaris 8: building 'bz2' extension gcc -g -Wall -Wstrict-prototypes -fPIC -I. 
-I/export/home/python/dist/src/./Include -I/usr/local/include -I/export/home/python/dist/src/Include -I/export/home/python/dist/src -c /export/home/python/dist/src/Modules/bz2module.c -o build/temp.solaris-2.8-sun4u-2.3/bz2module.o cc1: warning: changing search order for system directory "/usr/local/include" cc1: warning: as it has already been specified as a non-system directory /export/home/python/dist/src/Modules/bz2module.c: In function `Util_CatchBZ2Error': /export/home/python/dist/src/Modules/bz2module.c:120: `BZ_CONFIG_ERROR' undeclared (first use in this function) /export/home/python/dist/src/Modules/bz2module.c:120: (Each undeclared identifier is reported only once ... This particular machine has a /usr/include/bzlib.h file with a copyright date of 1998. There are several other BZ_*_ERROR defines, but not BZ_CONFIG_ERROR. Adding a conditional define for that macro isn't sufficient to get it to compile. I get lots of "structure has no ..." errors: Modules/bz2module.c:1521: structure has no member named `total_out_hi32' Modules/bz2module.c:1521: structure has no member named `total_out_lo32' Perhaps this version of bz2 lib is too old to use with Gustavo's module? Skip From guido@python.org Fri Apr 25 20:24:42 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:24:42 -0400 Subject: [Python-Dev] Tagging the tree Message-ID: <200304251924.h3PJOgw25941@pcp02138704pcs.reston01.va.comcast.net> I'm tagging the CVS tree now. Please no more checkins until the release is announced or unless I specifically ask you! --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Fri Apr 25 20:32:26 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: Fri, 25 Apr 2003 21:32:26 +0200 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? 
In-Reply-To: <16041.22107.533893.743928@montanaro.dyndns.org> References: <16041.22107.533893.743928@montanaro.dyndns.org> Message-ID: <3EA98D4A.40107@v.loewis.de> Skip Montanaro wrote: > While moving tests from test_sre to test_re I stumbled upon a simple test > for sre.Scanner. This looks fairly cool. Should it be exposed through re > and documented? I think /F did not consider it ready for general consumption. I believe the approach is cool, but the API would still leave features to be desired. In practical compiler construction, I usually copy the approach, and duplicate it - that gives a very efficient and readily comprehensible scanner. IOW, I would leave it where it is: As a masterpiece of work to get inspiration from, but not as a tool to give out to anybody. Regards, Martin From guido@python.org Fri Apr 25 20:38:36 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:38:36 -0400 Subject: [Python-Dev] bz2 module fails to compile on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 11:27:22 CDT." <16041.25066.559968.451868@montanaro.dyndns.org> References: <16041.25066.559968.451868@montanaro.dyndns.org> Message-ID: <200304251938.h3PJcap26135@pcp02138704pcs.reston01.va.comcast.net> > The bz2 module isn't compiling for me on Solaris 8: > > building 'bz2' extension > gcc -g -Wall -Wstrict-prototypes -fPIC -I. 
-I/export/home/python/dist/src/./Include -I/usr/local/include -I/export/home/python/dist/src/Include -I/export/home/python/dist/src -c /export/home/python/dist/src/Modules/bz2module.c -o build/temp.solaris-2.8-sun4u-2.3/bz2module.o
> cc1: warning: changing search order for system directory "/usr/local/include"
> cc1: warning: as it has already been specified as a non-system directory
> /export/home/python/dist/src/Modules/bz2module.c: In function `Util_CatchBZ2Error':
> /export/home/python/dist/src/Modules/bz2module.c:120: `BZ_CONFIG_ERROR' undeclared (first use in this function)
> /export/home/python/dist/src/Modules/bz2module.c:120: (Each undeclared identifier is reported only once
> ...
>
> This particular machine has a /usr/include/bzlib.h file with a copyright
> date of 1998.  There are several other BZ_*_ERROR defines, but not
> BZ_CONFIG_ERROR.  Adding a conditional define for that macro isn't
> sufficient to get it to compile.  I get lots of "structure has no ..."
> errors:
>
> Modules/bz2module.c:1521: structure has no member named `total_out_hi32'
> Modules/bz2module.c:1521: structure has no member named `total_out_lo32'
>
> Perhaps this version of bz2 lib is too old to use with Gustavo's module?

Again, maybe we should just give up on Solaris. :-(

Please work with Gustavo to fix this after the b1 release.  (I'm still waiting for the "cvs tag" command to finish...)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Apr 25 20:37:17 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:37:17 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 13:22:43 CDT." <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <200304251937.h3PJbHr26118@pcp02138704pcs.reston01.va.comcast.net>

> Using the latest version from CVS, on Solaris 8 test_logging hangs.  Lots of
> output, then:
>
> ...
> INFO:root:Info index = 99
> -- logging 100 at INFO, messages should be seen every 10 events --
> -- logging 101 at INFO, messages should be seen every 10 events --
> INFO:root:Info index = 100
> INFO:root:Info index = 101
> -- log_test2 end ---------------------------------------------------
> -- log_test3 begin ---------------------------------------------------
> Unfiltered...
> INFO:a:Info 1
> INFO:a.b:Info 2
> INFO:a.c:Info 3
> INFO:a.b.c:Info 4
> INFO:a.b.c.d:Info 5
> INFO:a.bb.c:Info 6
> INFO:b:Info 7
> INFO:b.a:Info 8
> INFO:c.a.b:Info 9
> INFO:a.bb:Info 10
> Filtered with 'a.b'...
> INFO:a.b:Info 2
> INFO:a.b.c:Info 4
> INFO:a.b.c.d:Info 5
> -- log_test3 end ---------------------------------------------------
>
> and it just sits there.  ^C doesn't terminate it.  I have to stop it w/ ^Z,
> then "kill %1" it.  I have the very latest source checked out.  Any ideas?

Let's eradicate Solaris from the universe. :-)

Seriously, this will have to wait until after the b1 release today.  Someone else reported success on Solaris 8 IIRC.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Apr 25 20:38:55 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 15:38:55 -0400 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? In-Reply-To: "Your message of Fri, 25 Apr 2003 10:38:03 CDT." <16041.22107.533893.743928@montanaro.dyndns.org> References: <16041.22107.533893.743928@montanaro.dyndns.org> Message-ID: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net>

> While moving tests from test_sre to test_re I stumbled upon a simple test
> for sre.Scanner.  This looks fairly cool.  Should it be exposed through re
> and documented?

What's Scanner?
--Guido van Rossum (home page: http://www.python.org/~guido/)

From neal@metaslash.com Fri Apr 25 20:41:48 2003 From: neal@metaslash.com (Neal Norwitz) Date: Fri, 25 Apr 2003 15:41:48 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <20030425194147.GG12173@epoch.metaslash.com>

On Fri, Apr 25, 2003 at 01:22:43PM -0500, Skip Montanaro wrote:
> Using the latest version from CVS, on Solaris 8 test_logging hangs.  Lots of
> output, then:
>
> ...
>
> and it just sits there.  ^C doesn't terminate it.  I have to stop it w/ ^Z,
> then "kill %1" it.  I have the very latest source checked out.  Any ideas?

On Solaris 8, I've had the test pass, hang, and crash the interpreter (actually it was test_threaded_import when running all the tests).  The problem may be related to Mark's changes, but I'm not sure.

Anyway, the tests passed when run many times with the change below, perhaps it will work for you.  I'm running the entire suite on Solaris now.  The change works on Linux.

Neal
--
--- Lib/test/test_logging.py.save	2003-04-25 15:30:23.000000000 -0400
+++ Lib/test/test_logging.py	2003-04-25 15:30:52.000000000 -0400
@@ -470,6 +470,8 @@
     socketDataProcessed.acquire()
     socketDataProcessed.wait()
     socketDataProcessed.release()
+    for thread in threads:
+        thread.join()
     banner("logrecv output", "begin")
     sys.stdout.write(sockOut.getvalue())
     sockOut.close()

From gherron@islandtraining.com Fri Apr 25 20:49:33 2003 From: gherron@islandtraining.com (Gary Herron) Date: Fri, 25 Apr 2003 12:49:33 -0700 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented?
In-Reply-To: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> References: <16041.22107.533893.743928@montanaro.dyndns.org> <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304251249.33780.gherron@islandtraining.com>

On Friday 25 April 2003 12:38 pm, Guido van Rossum wrote:
> > While moving tests from test_sre to test_re I stumbled upon a simple test
> > for sre.Scanner.  This looks fairly cool.  Should it be exposed through
> > re and documented?
>
> What's Scanner?

You create a Scanner instance with a list of re's and associated functions, then you use it to scan a string, returning a list of parts which match the given re's.  (Actually the matches are run through the associated functions, and their output is what forms the returned list.)  Here's the single test case Skip referred to:

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = sre.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
    ])

# sanity check
test('scanner.scan("sum = 3*foo + 312.50 + bar")',
     (['sum', 'op=', 3, 'op*', 'foo', 'op+', 312.5, 'op+', 'bar'], ''))

Gary Herron

From skip@pobox.com Fri Apr 25 20:58:34 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 14:58:34 -0500 Subject: [Python-Dev] should sre.Scanner be exposed through re and documented? In-Reply-To: <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> References: <16041.22107.533893.743928@montanaro.dyndns.org> <200304251938.h3PJctv26146@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <16041.37738.788965.865040@montanaro.dyndns.org>

    Guido> What's Scanner?

Gary already posted the example I was going to (damn phone!)... ;-)

I defer to Martin's judgement on this.  (I presume his response has passed through your mailbox by now.)
I still think it would be nice to demonstrate it somewhere. I'll look and see if there's somewhere some toy script can be squeezed into the Demo directory. Skip From skip@pobox.com Fri Apr 25 21:04:46 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 15:04:46 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: <20030425194147.GG12173@epoch.metaslash.com> References: <16041.31987.943313.278329@montanaro.dyndns.org> <20030425194147.GG12173@epoch.metaslash.com> Message-ID: <16041.38110.284173.399590@montanaro.dyndns.org> Neal> Anyway, the tests passed when run many times with the change Neal> below, perhaps it will work for you. I'm running the entire suite Neal> on Solaris now. The change works on Linux. Thanks. Alas, it didn't seem to help. Skip From guido@python.org Fri Apr 25 21:07:43 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 16:07:43 -0400 Subject: [Python-Dev] test_logging hangs on Solaris 8 In-Reply-To: "Your message of Fri, 25 Apr 2003 15:41:48 EDT." <20030425194147.GG12173@epoch.metaslash.com> References: <16041.31987.943313.278329@montanaro.dyndns.org> <20030425194147.GG12173@epoch.metaslash.com> Message-ID: <200304252007.h3PK7h426521@pcp02138704pcs.reston01.va.comcast.net> > From: Neal Norwitz <neal@metaslash.com> > On Fri, Apr 25, 2003 at 01:22:43PM -0500, Skip Montanaro wrote: > > Using the latest version from CVS, on Solaris 8 test_logging hangs. Lots of > > output, then: > > > > ... > > > > and it just sits there. ^C doesn't terminate it. I have to stop it w/ ^Z, > > then "kill %1" it. I have the very latest source checked out. Any ideas? > > On Solaris 8, I've had the test pass, hang, and crash the interpreter > (actually it was test_threaded_import when running all the tests). > The problem may be related to Mark's changes, but I'm not sure. > > Anyway, the tests passed when run many times with the change below, > perhaps it will work for you. 
I'm running the entire suite on
> Solaris now.  The change works on Linux.
>
> Neal
> --
> --- Lib/test/test_logging.py.save	2003-04-25 15:30:23.000000000 -0400
> +++ Lib/test/test_logging.py	2003-04-25 15:30:52.000000000 -0400
> @@ -470,6 +470,8 @@
>      socketDataProcessed.acquire()
>      socketDataProcessed.wait()
>      socketDataProcessed.release()
> +    for thread in threads:
> +        thread.join()
>      banner("logrecv output", "begin")
>      sys.stdout.write(sockOut.getvalue())
>      sockOut.close()

OK, I'll add that to the release branch, so I can scratch at least one of the "known bugs" we start out with...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From wesleyhenwood@hotmail.com Fri Apr 25 21:23:53 2003 From: wesleyhenwood@hotmail.com (wesley henwood) Date: Fri, 25 Apr 2003 20:23:53 +0000 Subject: [Python-Dev] Re: PyRun_* functions Message-ID: <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com>

>>How do I make certain that FILE* parameters are only passed to these
>>functions if it is certain that they were created by the same library
>>that the Python runtime is using?

>On which platform?

Windows.

From guido@python.org Fri Apr 25 22:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 17:02:53 -0400 Subject: [Python-Dev] Re: PyRun_* functions In-Reply-To: "Your message of Fri, 25 Apr 2003 20:23:53 -0000." <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com> References: <BAY7-F10HcchXzT4CaL0000ba9c@hotmail.com> Message-ID: <200304252102.h3PL2r426956@pcp02138704pcs.reston01.va.comcast.net>

> >>How do I make certain that FILE* parameters are only passed to these
> >>functions if it is certain that they were created by the same library
> >>that the Python runtime is using?
>
> >On which platform?
>
> Windows.

Link your application with MSVCRT.DLL.
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim@zope.com Fri Apr 25 22:21:28 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 17:21:28 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <41193.127.0.0.1.1051271225.squirrel@127.0.0.1> Message-ID: <BIEJKCLHCIOIHAGOKOLHAEIJFHAA.tim@zope.com> [andrew cooke] > For the record - the version I posted (with breadth-first as an option) > wasn't reliable (it runs out of stack space on reasonable directory > structures). Apart from that, did you have a use case for breadth-first directory traversal? Because it's clumsier, you usually find BFS only used on search trees that are too deep/expensive to traverse exhaustively (e.g., a tree of chess moves), or that have infinite paths (so that DFS can't terminate even in theory). Directory trees aren't usually <wink> of that nature. From drifty@alum.berkeley.edu Fri Apr 25 23:17:52 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 15:17:52 -0700 (PDT) Subject: [Python-Dev] Failin tests on Windows In-Reply-To: <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> [Guido van Rossum] > > test_urllib.py is crashing on my fresh WinMe build: > > > > test_fileno (__main__.urlopen_FileTests) ... FAIL > > test_iter (__main__.urlopen_FileTests) ... FAIL > > test_read (__main__.urlopen_FileTests) ... FAIL > > test_readline (__main__.urlopen_FileTests) ... FAIL > > test_readlines (__main__.urlopen_FileTests) ... FAIL > > Should be fixed now -- I'm writing the file with test data in binary > mode. > Didn't even think of that problem when I wrote the tests. Should I patch the docs for urllib (again =) to say that files are open in binary? 
I know I wasn't expecting urllib to open in binary mode for a local text file.

Thanks for fixing this, Guido.  I think I am going to do a self-imposed "no checkins within 24 hours of a planned release" rule.

-Brett

From drifty@alum.berkeley.edu Fri Apr 25 23:41:17 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 15:41:17 -0700 (PDT) Subject: [Python-Dev] More new Windos test failures In-Reply-To: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> References: <BIEJKCLHCIOIHAGOKOLHCEFHFHAA.tim@zope.com> Message-ID: <Pine.SOL.4.55.0304251538420.25263@death.OCF.Berkeley.EDU>

[Tim Peters]
> test_urllib (these may all be bad line-end assumptions):
>

Yep, it looks like it is line-ending issues.  Is this still happening even after Guido changed the test to open the files in binary?  If it is I will change the tests after Guido gives the all clear for CVS checkins again and strip all text before comparing.

> ======================================================================
> FAIL: test_fileno (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 68, in test_fileno
>     "Reading on the file descriptor returned by fileno() "
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: Reading on the file descriptor returned by fileno() did not
> return the expected text
>
> ======================================================================
> FAIL: test_iter (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 88, in test_iter
>     self.assertEqual(line, self.text)
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\r\n' != 'test_urllib:
> urlopen_FileTests\n'
>
> ======================================================================
> FAIL: test_read (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 48, in test_read
>     self.assertEqual(self.text, self.returned_obj.read())
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib:
> urlopen_FileTests\r\n'
>
> ======================================================================
> FAIL: test_readline (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 51, in test_readline
>     self.assertEqual(self.text, self.returned_obj.readline())
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: 'test_urllib: urlopen_FileTests\n' != 'test_urllib:
> urlopen_FileTests\r\n'
>
> ======================================================================
> FAIL: test_readlines (__main__.urlopen_FileTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "../lib/test/test_urllib.py", line 61, in test_readlines
>     "readlines() returned improper text")
>   File "C:\Code\python\lib\unittest.py", line 292, in failUnlessEqual
>     raise self.failureException, \
> AssertionError: readlines() returned improper text
>
> ----------------------------------------------------------------------
> Ran 23 tests in 0.280s
>
> FAILED (failures=5)

From duanev@io.com Fri Apr 25 23:44:05 2003 From: duanev@io.com (Duane Voth) Date: Fri, 25 Apr 2003 17:44:05 -0500 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses!
In-Reply-To: <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Fri, Apr 25, 2003 at 12:00:27PM -0400 References: <20030425104748.A26488@io.com> <200304251600.h3PG0Rc22678@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030425174405.A31111@io.com>

On Fri, Apr 25, 2003 at 12:00:27PM -0400, Guido van Rossum wrote:
> You should be able to get that out of CVS.  The oldest version
> ...
> is labeled 2.1, but the CVS version numbers don't match author's
> versions.

After taking a closer look, not even _cursesmodule.c 2.1 is going to help me.  The copyright in the /usr/include/curses.h on this box is:

/*
 * Copyright (c) 1980 Regents of the University of California.
 * All rights reserved.  The Berkeley software License Agreement
 * specifies the terms and conditions for redistribution.
 *
 *	@(#)curses.h	5.1 (Berkeley) 6/7/85
 */

There is no support for either attributes or colors!  Taking the current source and cutting out everything unsupported will at least keep the python-curses API intact.  That's probably my best route.

--
Duane Voth
duanev@io.com
--
duanev@atlantis.io.com

From Jack.Jansen@oratrix.com Fri Apr 25 23:47:43 2003 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Sat, 26 Apr 2003 00:47:43 +0200 Subject: [Python-Dev] LynxOS4 port: need pre-ncurses curses! In-Reply-To: <3EA96940.4060501@v.loewis.de> Message-ID: <ED18587C-776F-11D7-B113-000A27B19B96@oratrix.com>

On Friday, Apr 25, 2003, at 18:58 Europe/Amsterdam, Martin v. Löwis wrote:
>> * Based on prior work by Lance Ellinghaus and Oliver Andrich
>> * Version 1.2 of this module: Copyright 1994 by Lance Ellinghouse,
>> * Cathedral City, California Republic, United States of America.
>> *
>> * Version 1.5b1, heavily extended for ncurses by Oliver Andrich:
>> * Copyright 1996,1997 by Oliver Andrich, Koblenz, Germany.
>> so I guess I'm looking for version 1.2 of _cursesmodule.c.
>
> I think your guess is wrong.
The extensions are used only if available,
> and the curses module works with pre-ncurses implementations of curses
> just fine.

Not in all cases.  Before MacOSX had ncurses (MacOSX 10.1 and earlier had an ancient BSD curses) the only solution was to disable building curses, as the module didn't compile, and fixing it was far from obvious.
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From fincher.8@osu.edu Sat Apr 26 00:48:26 2003 From: fincher.8@osu.edu (Jeremy Fincher) Date: Fri, 25 Apr 2003 19:48:26 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304241650.h3OGoPM15432@odiug.zope.com> References: <1051202649.3ea814599f6fa@mcherm.com> <200304241650.h3OGoPM15432@odiug.zope.com> Message-ID: <200304251948.26774.fincher.8@osu.edu>

On Thursday 24 April 2003 12:50 pm, Guido van Rossum wrote:
> Agreed.  How about naming it os.walk()?  I think it's not OS specific
> -- all the OS specific stuff is part of os.path.  So we only need one
> implementation.

It's a minor quibble to be sure, but os.walk doesn't really describe what exactly it's doing.  I'd suggest os.pathwalk, but that'd be too error-prone, being os.path.walk without a dot.  Perhaps os.pathwalker?

Just a (likely ill-informed :)) opinion :)

Jeremy

From tim@zope.com Sat Apr 26 00:45:33 2003 From: tim@zope.com (Tim Peters) Date: Fri, 25 Apr 2003 19:45:33 -0400 Subject: [Python-Dev] More new Windos test failures In-Reply-To: <Pine.SOL.4.55.0304251538420.25263@death.OCF.Berkeley.EDU> Message-ID: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com>

>> test_urllib (these may all be bad line-end assumptions):

[Brett]
> Yep, it looks like it is line-ending issues.  Is this still happening
> even after Guido changed the test to open the files in binary?

No, all is well now.  That's why you didn't see a sequence of increasingly vicious msgs from me <wink>.
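[Editor's note: the os.path.walk()/os.walk() thread above contrasts depth-first and breadth-first directory traversal. A minimal sketch of both orders, assuming only the standard library — the os.walk() generator discussed in the thread did land in 2.3; the helper names and the use of os.scandir (which would have been os.listdir plus os.path.isdir in 2003) are this editor's own:]

```python
import os
from collections import deque

def depth_first_dirs(top):
    """Depth-first, bottom-up: os.walk handles this natively via its
    topdown flag, yielding children before their parent directory."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        yield dirpath

def breadth_first_dirs(top):
    """Breadth-first: os.walk has no BFS mode, so drive an explicit
    FIFO queue of directories by hand, as Tim notes one must."""
    queue = deque([top])
    while queue:
        dirpath = queue.popleft()
        yield dirpath
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    queue.append(entry.path)
```

[The BFS variant's queue never holds more than one level's worth of sibling directories, which is why BFS is usually reserved for trees too deep to exhaust — exactly Tim's point above.]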
From guido@python.org Sat Apr 26 00:51:30 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 19:51:30 -0400 Subject: [Python-Dev] Failin tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 15:17:52 PDT." <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> Message-ID: <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> > [Guido van Rossum] > > > > test_urllib.py is crashing on my fresh WinMe build: > > > > > > test_fileno (__main__.urlopen_FileTests) ... FAIL > > > test_iter (__main__.urlopen_FileTests) ... FAIL > > > test_read (__main__.urlopen_FileTests) ... FAIL > > > test_readline (__main__.urlopen_FileTests) ... FAIL > > > test_readlines (__main__.urlopen_FileTests) ... FAIL > > > > Should be fixed now -- I'm writing the file with test data in binary > > mode. > > > > Didn't even think of that problem when I wrote the tests. Should I > patch the docs for urllib (again =) to say that files are open in > binary? I know I wasn't expecting urllib to open in binary mode for > a local text file. It's a good idea to document that urllib (currently!) never does newline translation. Given that URLs often point to binary files, that's probably a good idea! > Thanks for fixing this, Guido. I think I am going to do a self-imposed > "no checkins within 24 hours of a planned release" rule. Yeah, me too. 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 26 00:54:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 16:54:34 -0700 (PDT) Subject: [Python-Dev] Failin tests on Windows In-Reply-To: <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > [Guido van Rossum] > > Didn't even think of that problem when I wrote the tests. Should I > > patch the docs for urllib (again =) to say that files are open in > > binary? I know I wasn't expecting urllib to open in binary mode for > > a local text file. > > It's a good idea to document that urllib (currently!) never does > newline translation. Given that URLs often point to binary files, > that's probably a good idea! > OK. I will patch the docs and the docstrings (and backport it as necessary) after you raise the commit moratorium. > > Thanks for fixing this, Guido. I think I am going to do a self-imposed > > "no checkins within 24 hours of a planned release" rule. > > Yeah, me too. :-) > Perhaps this should be in the FAQ? -Brett From drifty@alum.berkeley.edu Sat Apr 26 00:57:46 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 16:57:46 -0700 (PDT) Subject: [Python-Dev] More new Windos test failures In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> Message-ID: <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> [Tim Peters] > >> test_urllib (these may all be bad line-end assumptions): > > [Brett] > > Yep, it looks like it is line-ending issues. 
Is this still happening > > even after Guido changed the test to open the files in binary? > > No, all is well now. That's why you didn't see a sequence of increasingly > vicious msgs from me <wink>. > =) I have fixed my copy, though, to rstrip all the text that is compared in case Guido's quick fix is removed later. I will commit it when Guido gives the all-clear. -Brett From guido@python.org Sat Apr 26 01:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:02:53 -0400 Subject: [Python-Dev] Failin tests on Windows In-Reply-To: "Your message of Fri, 25 Apr 2003 16:54:34 PDT." <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> > > It's a good idea to document that urllib (currently!) never does > > newline translation. Given that URLs often point to binary files, > > that's probably a good idea! > > OK. I will patch the docs and the docstrings (and backport it as > necessary) after you raise the commit moratorium. Consider it raised. Python 2.3b1 is officially released! > > > Thanks for fixing this, Guido. I think I am going to do a > > > self-imposed "no checkins within 24 hours of a planned release" > > > rule. > > > > Yeah, me too. :-) > > Perhaps this should be in the FAQ? But then releases would be so *boring*! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sat Apr 26 01:07:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 17:07:34 -0700 (PDT) Subject: [Python-Dev] Rules of a beta release? 
In-Reply-To: <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > > It's a good idea to document that urllib (currently!) never does > > > newline translation. Given that URLs often point to binary files, > > > that's probably a good idea! > > > > OK. I will patch the docs and the docstrings (and backport it as > > necessary) after you raise the commit moratorium. > > Consider it raised. Python 2.3b1 is officially released! > Wonderful! > > > > Thanks for fixing this, Guido. I think I am going to do a > > > > self-imposed "no checkins within 24 hours of a planned release" > > > > rule. > > > > > > Yeah, me too. :-) > > > > Perhaps this should be in the FAQ? > > But then releases would be so *boring*! :-) > I think Raymond should add something about this in his next bit of "Hard Knocks"-type writing. =) Now that we are officially in a beta release, I want to clarify what the ground rules are in terms of commits are. Obviously no new functionality such as new modules or built-ins. But what about small features? Specifically, since I have CVS commit I can finally apply my patch to regrtest.py to allow the use of a skips.txt file listing tests to skip (unless people don't want it anymore). Now that is a new feature, but it is minor *and* it is on an undocumented module (for now; I will get those docs done before 2.3 final is reached). Is this reasonable to commit now? Anything else I should know so I don't run a muck in CVS? 
=) -Brett From guido@python.org Sat Apr 26 01:12:41 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:12:41 -0400 Subject: [Python-Dev] RELEASED: Python 2.3b1 Message-ID: <200304260012.h3Q0CgE03802@pcp02138704pcs.reston01.va.comcast.net> Python 2.3b1 is the first beta release of Python 2.3. Much improved since the last alpha, chockfull of things you'd like to check out: http://www.python.org/2.3/ Some highlights of what's new since 2.3a2: - sum() builtin, adds a sequence of numbers, beats reduce(). - csv module, reads comma-separated-value files (and more). - timeit module, times code snippets. - os.walk(), a generator slated to replace os.path.walk(). - platform module, by Marc-Andre Lemburg, returns detailed platform information. For more highlights, see http://www.python.org/2.3/highlights.html New since Python 2.2: - Many new and improved library modules, e.g. sets, heapq, datetime, textwrap, optparse, logging, bsddb, bz2, tarfile, ossaudiodev, and a new random number generator based on the highly acclaimed Mersenne Twister algorithm (with a period of 2**19937-1!). - New builtin enumerate(): an iterator yielding (index, item) pairs. - Extended slices, e.g. "hello"[::-1] returns "olleh". - Universal newlines mode for reading files (converts \r, \n and \r\n all into \n). - Source code encoding declarations. (PEP 263) - Import from zip files. (PEP 273 and PEP 302) - FutureWarning issued for "unsigned" operations on ints. (PEP 237) - Faster list.sort() is now stable. - Unicode filenames on Windows. - Karatsuba long multiplication (running time O(N**1.58) instead of O(N**2)). See also http://www.python.org/doc/2.3b1/whatsnew/ - Andrew Kuchling's description of all important changes since 2.2. We request widespread testing of this release but don't recommend using it for production situations yet. Beta releases contain bugs. New APIs are expected to be stable, and may be changed only if serious deficiencies are found. 
No new APIs or modules will be added after the first beta release. If you have an important Python application, we strongly recommend that you try it out with a beta release and report any incompatibilities or other problems you may encounter, so that they can be fixed before the final release. To report problems, use the SourceForge bug tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470 Enjoy! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sat Apr 26 01:16:50 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 26 Apr 2003 10:16:50 +1000 Subject: [Python-Dev] New thread death in test_bsddb3 In-Reply-To: <y91ycusm.fsf@python.net> Message-ID: <015c01c30b89$22b46a60$530f8490@eden> > So is the policy now that it is no longer *allowed* to create another > thread state, while in previous versions there wasn't any choice, > because there existed no way to get the existing one? Only not allowed under debug builds <wink>. I would be more than happy to have this code print a warning, or take some alternative action - but I would hate to see the message dropped. Would a PyErr_Warning call be more appropriate? The only issue here is that literally *thousands* may be generated. Mark. From guido@python.org Sat Apr 26 01:19:14 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:19:14 -0400 Subject: [Python-Dev] Rules of a beta release? In-Reply-To: "Your message of Fri, 25 Apr 2003 17:07:34 PDT." 
<Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> References: <E198zgN-00016z-00@sc8-pr-cvs1.sourceforge.net> <005d01c30b17$28840580$1a3cc797@oemcomputer> <200304251521.h3PFLt206738@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251511280.25263@death.OCF.Berkeley.EDU> <200304252351.h3PNpUI02836@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251653040.9257@death.OCF.Berkeley.EDU> <200304260002.h3Q02ri03451@pcp02138704pcs.reston01.va.comcast.net> <Pine.SOL.4.55.0304251702360.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260019.h3Q0JEV03824@pcp02138704pcs.reston01.va.comcast.net> > Now that we are officially in a beta release, I want to clarify what > the ground rules are in terms of commits. Obviously no new > functionality such as new modules or built-ins. But what about > small features? Specifically, since I have CVS commit I can finally > apply my patch to regrtest.py to allow the use of a skips.txt file > listing tests to skip (unless people don't want it anymore). Now > that is a new feature, but it is minor *and* it is on an > undocumented module (for now; I will get those docs done before 2.3 > final is reached). IMO in general fiddling with the test suite during beta is okay. There should be guidelines for this (for all I know there's already a PEP :-) but I'm too tired to write any more about it. Use common sense. It should be the case that if someone tested their application with 2.3b1 and they tweaked everything to work with that version, they shouldn't have to tweak anything to work with 2.3final. I plan to make an exception for IDLE: a brand new copy of IDLEfork will replace the current IDLE 0.8. I was very tempted to include it today, but there wasn't time to get all the loose ends tied up: it has a C extension now, and the Windows installer would have to be changed; plus, Kurt has some improvements that he hasn't even checked in. So he'll do an independent IDLEfork beta, and then it'll be incorporated into Python.
Hopefully that will all be done two weeks from now. > Is this reasonable to commit now? Anything else I should know so I > don't run amok in CVS? =) Don't be too fearful -- if you really commit an atrocity, the nice thing about CVS is that it's easy to roll back. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 26 01:21:37 2003 From: guido@python.org (Guido van Rossum) Date: Fri, 25 Apr 2003 20:21:37 -0400 Subject: [Python-Dev] More new Windows test failures In-Reply-To: "Your message of Fri, 25 Apr 2003 16:57:46 PDT." <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> Message-ID: <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> > =) I have fixed my copy, though, to rstrip all the text that is > compared in case Guido's quick fix is removed later. I will commit > it when Guido gives the all-clear. I just realized that this would be *wrong* -- URLs may point to binary files and there's no reliable way to know whether this is the case. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Sat Apr 26 01:27:48 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 26 Apr 2003 10:27:48 +1000 Subject: [Python-Dev] PyRun_* functions In-Reply-To: <BAY7-F101Dp35O7rGKN00003a29@hotmail.com> Message-ID: <017501c30b8a$aaa22010$530f8490@eden> > It seems that it would be a good enhancement to remove the > FILE pointer > parameter from these functions, and just use the file name. > For example, > change PyRun_SimpleFile( FILE *fp, char *filename) to > PyRun_SimpleFile(char > *filename). Then no one would have to worry about the incompatibility. Or simply a PyFile_Open/Close pair - exactly mirroring fopen(), but inside the Python DLL, so guaranteed to use the same library.
I believe the only reason this hasn't come up before as a patch is that PyRun_() functions that take file objects are great "getting started" functions, but tend not to be used in real apps - in that case the requirements start getting trickier, so you tend to drop down to the lower-level Python APIs. If it really did worry you, I would expect a patch at sourceforge with these 2 new functions would have a good chance of getting in. Mark. From drifty@alum.berkeley.edu Sat Apr 26 01:32:34 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Fri, 25 Apr 2003 17:32:34 -0700 (PDT) Subject: [Python-Dev] More new Windows test failures In-Reply-To: <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCMEDKEEAB.tim@zope.com> <Pine.SOL.4.55.0304251655280.9257@death.OCF.Berkeley.EDU> <200304260021.h3Q0LbB03868@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <Pine.SOL.4.55.0304251731590.9257@death.OCF.Berkeley.EDU> [Guido van Rossum] > > =) I have fixed my copy, though, to rstrip all the text that is > > compared in case Guido's quick fix is removed later. I will commit > > it when Guido gives the all-clear. > > I just realized that this would be *wrong* -- URLs may point to binary > files and there's no reliable way to know whether this is the case. > OK, so then I won't commit my changes and let them stand as they are in CVS right now. -Brett From python@rcn.com Sat Apr 26 01:33:00 2003 From: python@rcn.com (Raymond Hettinger) Date: Fri, 25 Apr 2003 20:33:00 -0400 Subject: [Python-Dev] Curiousity Message-ID: <003d01c30b8b$64697b60$125ffea9@oemcomputer> Do we have download statistics for the various releases including alpha and betas?
Raymond Hettinger From mark@ned.dem.csiro.au Sat Apr 26 03:17:58 2003 From: mark@ned.dem.csiro.au (Mark Favas) Date: Sat, 26 Apr 2003 10:17:58 +0800 (WST) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> Just confirming Skip's observation - 2.3b1 test_logging (with Neal's patch) passed once on Solaris 9 (gcc 3.2.2) but failed thereafter. No other test failures. Mark Favas From skip@pobox.com Sat Apr 26 03:56:06 2003 From: skip@pobox.com (Skip Montanaro) Date: Fri, 25 Apr 2003 21:56:06 -0500 Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) In-Reply-To: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> References: <200304260217.h3Q2Hwmb003576@solo.ned.dem.csiro.au> Message-ID: <16041.62790.619502.562615@montanaro.dyndns.org> Mark> Just confirming Skip's observation - 2.3b1 test_logging (with Mark> Neal's patch) passed once on Solaris 9 (gcc 3.2.2) but failed Mark> thereafter. No other test failures. Failed (completed with one or more failures or errors) or hung? Skip From mark@ned.dem.csiro.au Sat Apr 26 11:11:16 2003 From: mark@ned.dem.csiro.au (Mark Favas) Date: Sat, 26 Apr 2003 18:11:16 +0800 (WST) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <200304261011.h3QABGka004908@solo.ned.dem.csiro.au> [Skip] Failed (completed with one or more failures or errors) or hung? Sorry - hung, couldn't ^C it, had to ^Z and "kill %1" the "make test" process. 
Mark Favas From martin@v.loewis.de Sat Apr 26 14:10:59 2003 From: martin@v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 26 Apr 2003 15:10:59 +0200 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EAA8563.6060404@v.loewis.de> Guido van Rossum wrote: > Please don't spread these around; I've made this response > non-archivable by including an "X-Archive: No" header. (It's okay IMO > for folks receiving python-dev to see this, but not to spread it > around.) What is creating accesses to URLs like /doc/2.3a2//////////////////////////////about.html ??? Regards, Martin From pje@telecommunity.com Sat Apr 26 14:37:48 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Sat, 26 Apr 2003 09:37:48 -0400 Subject: [Python-Dev] Accepted PEPs? Message-ID: <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> I was going over the PEP index this morning, and I noticed a large number of PEPs listed under the "open" list that would seem to me to be "accepted", if not "done" in some cases, according to the criteria described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, 301, 302, 305, and 307.) Others under "open" I would guess are in fact "rejected", notably 294 (the patch was closed rejected) and 313 (presumably tongue-in-cheek). Should I submit a patch for PEP 0? From guido@python.org Sat Apr 26 17:11:23 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 12:11:23 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: "Your message of Sat, 26 Apr 2003 09:37:48 EDT." 
<5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> References: <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> Message-ID: <200304261611.h3QGBNF05043@pcp02138704pcs.reston01.va.comcast.net> > I was going over the PEP index this morning, and I noticed a large number > of PEPs listed under the "open" list that would seem to me to be > "accepted", if not "done" in some cases, according to the criteria > described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, > 301, 302, 305, and 307.) Some of those (e.g. 237) have multiple stages and ought to remain open until the last stage is implemented. 283 ought to remain open until Python 2.3 final is released. Some others need to be brought in line with what ended up being implemented. Authors with commit privileges can update their own PEPs; others can send patches or new versions to the PEP editors. > Others under "open" I would guess are in fact "rejected", notably > 294 (the patch was closed rejected) Correct -- this *issue* is still open, but the solution from the PEP is rejected. > and 313 (presumably tongue-in-cheek). I think it's appropriate for April Fool's PEPs to be in limbo forever. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sat Apr 26 17:48:41 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 12:48:41 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sat, 26 Apr 2003 15:10:59 +0200." <3EAA8563.6060404@v.loewis.de> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> <3EAA8563.6060404@v.loewis.de> Message-ID: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> > What is creating accesses to URLs like > > /doc/2.3a2//////////////////////////////about.html > > ??? I see these too (not that exact one though) and always in the form /dev/doc/devel//////lib/<something>. 
Grepping through today's access log (/usr/local/log/httpd.access on creosote) suggests that these come from Ultraseek. This suggests that the spider on search.python.org perhaps generates these. It appears to generate such URLs with any number of slashes between 1 and 6. But I can't find any clues like relative URLs using an extra / anywhere in those files. It might be a bug in Ultraseek's url joining algorithm. A while ago some people were interested in upgrading our Ultraseek setup, but that initiative seems to have fallen by the wayside. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From pje@telecommunity.com Sat Apr 26 21:51:58 2003 From: pje@telecommunity.com (Phillip J. Eby) Date: Sat, 26 Apr 2003 16:51:58 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: <200304261611.h3QGBNF05043@pcp02138704pcs.reston01.va.comca st.net> References: <"Your message of Sat, 26 Apr 2003 09:37:48 EDT." <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> <5.1.0.14.0.20030426092433.0234e6e0@mail.telecommunity.com> Message-ID: <5.1.0.14.0.20030426164409.034ae2b0@mail.telecommunity.com> At 12:11 PM 4/26/03 -0400, Guido van Rossum wrote: > > I was going over the PEP index this morning, and I noticed a large number > > of PEPs listed under the "open" list that would seem to me to be > > "accepted", if not "done" in some cases, according to the criteria > > described by the headings. (Specifically, PEPs 218, 237, 273, 282, 283, > > 301, 302, 305, and 307.) > >Some of those (e.g. 237) have multiple stages and ought to remain open >until the last stage is implemented. 283 ought to remain open until >Python 2.3 final is released. Some others need to be brought in line >with what ended up being implemented. Authors with commit privileges >can update their own PEPs; others can send patches or new versions to >the PEP editors. The PEP list has an additional heading called "Accepted"; currently only 252 and 253 are in that category. 
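One way a crawler could manufacture such URLs is a join step that unconditionally inserts a separator and is then re-applied to its own output. This is purely a speculative sketch (the host and path are made up, and it is not based on an inspection of Ultraseek's actual code):

```python
def naive_join(base, rel):
    # Hypothetical buggy join: always inserts a slash, even when
    # `base` already ends with one.
    return base + "/" + rel

url = "http://host/doc/2.3a2/about.html"
for _ in range(3):
    # Take everything up to and including the last "/" as the
    # "directory", without normalizing, then re-join the same link.
    directory = url[:url.rfind("/") + 1]
    url = naive_join(directory, "about.html")

print(url)  # http://host/doc/2.3a2////about.html
```

Each pass adds exactly one extra slash, which would match the observed range of one to six slashes if pages get re-spidered a handful of times.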
I would've thought that the ones like 237 and 283 that are not "Done" but are definitely "accepted" would go under that "Accepted" heading. It's not a big deal, but it's very hard to see from the list which things are "in progress", "need revisions", or are "unlikely to make it". So since I already took the trouble to work out the answers for myself, I thought I'd offer to help the next person who came along. :) From goodger@python.org Sat Apr 26 23:20:53 2003 From: goodger@python.org (David Goodger) Date: Sat, 26 Apr 2003 18:20:53 -0400 Subject: [Python-Dev] Accepted PEPs? Message-ID: <3EAB0645.7040306@python.org> [Phillip J. Eby] >>> I was going over the PEP index this morning, and I noticed a large >>> number of PEPs listed under the "open" list that would seem to me >>> to be "accepted", if not "done" in some cases, according to the >>> criteria described by the headings. (Specifically, PEPs 218, 237, >>> 273, 282, 283, 301, 302, 305, and 307.) Wearing my PEP Editor hat, I recently performed a similar exercise. I even got Guido's OK on suggested changes to Final and Approved on those specific PEPs (all but 305, which I'd missed). On further reflection however, I'm not sure that we should go forward without at least giving the authors notice, and a chance to make changes (especially, changes that bring PEPs in line with current reality). PEP 1 states: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and accepted by the BDFL, the status will be changed to "Final". It's unclear whether the BDFL should even be able to review a PEP without the author's review request (I'm pretty sure everyone would agree that it's OK, but it's not clear from the wording). 
So as not to upset PEP authors unnecessarily ;-), I think we ought to follow the formal process. It's not too onerous; a simple note (stating "PEP X is ready for review") to <peps@python.org> would be sufficient: I'll send out reminders. > It's not a big deal, but it's very hard to see from the list which > things are "in progress", "need revisions", or are "unlikely to make > it". So since I already took the trouble to work out the answers for > myself, I thought I'd offer to help the next person who came along. > :) Everyone needs a good kick in the pants once in a while, thanks. >>> Others under "open" I would guess are in fact "rejected", notably >>> 294 (the patch was closed rejected) [Guido] >> Correct -- this *issue* is still open, but the solution from the >> PEP is rejected. So is PEP 294 itself rejected? Or should we await a formal review request (as per the above)? [Phillip] >>> Should I submit a patch for PEP 0? Don't bother; I'll update it as required. -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From goodger@python.org Sat Apr 26 23:25:18 2003 From: goodger@python.org (David Goodger) Date: Sat, 26 Apr 2003 18:25:18 -0400 Subject: [Python-Dev] Reminder to PEP authors Message-ID: <3EAB074E.9040102@python.org> There are several PEPs with "Draft" status which are ripe for review. PEP 1 states: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and accepted by the BDFL, the status will be changed to "Final". PEP authors, please keep your PEPs up to date. 
When you think the PEP is ready for review, please send a note to <peps@python.org> stating "PEP X is ready for review". Otherwise the PEP may remain in "Draft" limbo indefinitely. It is the PEP author's responsibility to move the process forward: Each PEP must have a champion -- someone who writes the PEP using the style and format described below, shepherds the discussions in the appropriate forums, and attempts to build community consensus around the idea. Authors with CVS check-in privileges are welcome to check in their own content changes. Others should send updates to <peps@python.org> (please make updates to the latest text from CVS). -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From barry@python.org Sun Apr 27 02:28:12 2003 From: barry@python.org (Barry Warsaw) Date: 26 Apr 2003 21:28:12 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> References: <003d01c30b8b$64697b60$125ffea9@oemcomputer> <200304261239.h3QCdDg04766@pcp02138704pcs.reston01.va.comcast.net> <3EAA8563.6060404@v.loewis.de> <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1051406872.20524.2.camel@geddy> On Sat, 2003-04-26 at 12:48, Guido van Rossum wrote: > A while ago some people were interested in upgrading our Ultraseek > setup, but that initiative seems to have fallen by the wayside. :-( Last time I talked to Thomas about this, I think he mentioned that the machine he had earmarked for the upgrade got appropriated by others. IIRC, he was expecting more machines to become available soon though.
-Barry From gward@python.net Sun Apr 27 02:35:42 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:35:42 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427013542.GA919@cthulhu.gerg.ca> On 23 April 2003, Guido van Rossum said: > > A better comparison would be Habitat for Humanity (and voluntary > > associations in general). [...] > > Maybe. I get lots of junk mail asking for contributions from HforH > and frankly I've always thought of them as yet another charity: there > are lots of these, and most of them are so much larger than our > community that comparison is difficult. Don't forget, the PSF is gunning for charity status too. That's just the most obvious way to state legally, "We are a community with shared values, etc. etc.". I think there are a lot of parallels between open source development and other volunteer organizations. Heck, I like to justify the occasional weekend spent hunkered down in front of the computer by saying I'm doing volunteer work. IMHO, hacking on Python is the moral equivalent of helping to maintain public-access hiking trails. (Although the latter is better exercise, it's nice not to have to jump in the shower after a day spent hacking on Python. ;-) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Reality is for people who can't handle science fiction. 
From gward@python.net Sun Apr 27 02:44:07 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:44:07 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304251948.26774.fincher.8@osu.edu> References: <200304241650.h3OGoPM15432@odiug.zope.com> <200304251948.26774.fincher.8@osu.edu> Message-ID: <20030427014407.GC919@cthulhu.gerg.ca> On 25 April 2003, Jeremy Fincher said: > It's a minor quibble to be sure, but os.walk doesn't really describe what > exactly it's doing. I'd suggest os.pathwalk, but that'd be too error-prone, > being os.path.walk without a dot. Perhaps os.pathwalker? os.walktree? os.walkdirs? os.walkpath? (On reflection, the latter two are pretty dumb. walktree is the right name, undoubtedly. ;-) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ God is omnipotent, omniscient, and omnibenevolent ---it says so right here on the label. From gward@python.net Sun Apr 27 02:55:55 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 21:55:55 -0400 Subject: [Python-Dev] When is it okay to ``cvs remove``? In-Reply-To: <1051240796.11580.4.camel@geddy> References: <20030424225914.GA26254@xs4all.nl> <1051240796.11580.4.camel@geddy> Message-ID: <20030427015555.GD919@cthulhu.gerg.ca> On 24 April 2003, Barry Warsaw said: > I know Guido doesn't care, but I like to have the file major revision > numbers match the s/w's major rev number. Blecchh! Evil! Wrong! Bad! Naughty, naughty! Software versions have nothing to do with file revisions. Some obscure little file might change very little (or not at all) in going from MyGreatBigProduct 1.4 to 2.0. Its revision number should not be artificially bumped just because a lot of other files in the same project got bumped too. > Really, I just hate to see > huge minor revision numbers on files. Then you'll just love Subversion: when Neil S.
converted the MEMS Exchange CVS repository to Subversion back in January, all of a sudden every file we knew and loved had a revision number around 21000. Yow! I'm not convinced Subversion's model is exactly right, but it's certainly no worse than CVS'. Probably better. Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ I'd rather have a bottle in front of me than have to have a frontal lobotomy. From gward@python.net Sun Apr 27 03:05:41 2003 From: gward@python.net (Greg Ward) Date: Sat, 26 Apr 2003 22:05:41 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> References: <1051287405.1009.66.camel@slothrop.zope.com> <200304251639.h3PGdk924475@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427020541.GE919@cthulhu.gerg.ca> On 25 April 2003, Guido van Rossum said: > It probably never stopped hanging. It only runs when you pass > "-u audio" to regrtest though. > > I note that it passes for me with Red Hat 7.3, so you might want to > upgrade. :-) Could be hardware, or it could be the device driver in the kernel. Jeremy, what audio software do you use regularly -- xmms? play? anything? ossaudiodev currently goes to great pains to open the audio device in what *seems* to be the right way, but I have no idea if it really is. (Oh yeah: it opens with O_NONBLOCK, to avoid hanging on the open() call. Then it uses fcntl() to put the device back in blocking mode, so that write() acts sanely. If you really want to do non-blocking audio I/O, you use the nonblock() method, which uses an OSS-specific ioctl(). O_NONBLOCK has no documented meaning with OSS; using it at open() time was just a lucky guess on my part. It does seem to affect write(), at least with one of my audio devices. 
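That open-then-restore dance looks roughly like this in Python; the sketch below substitutes an ordinary temp file for /dev/dsp (which may not exist on a given box), since the fcntl flag handling is the same for any descriptor:

```python
import os
import fcntl
import tempfile

# Stand-in for /dev/dsp: an ordinary temp file.  On a real OSS device,
# O_NONBLOCK keeps the open() call itself from hanging if the device
# is already held open by another process.
fd, path = tempfile.mkstemp()
os.close(fd)
fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)

# Now flip the descriptor back to blocking mode so write() waits for
# the device instead of failing with EAGAIN.
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

print(bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK))  # False
os.close(fd)
os.remove(path)
```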
[I have a sound card and an external USB audio device, which makes things interesting at times.]) Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ If you and a friend are being chased by a lion, it is not necessary to outrun the lion. It is only necessary to outrun your friend. From mhammond@skippinet.com.au Sun Apr 27 04:00:29 2003 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 27 Apr 2003 13:00:29 +1000 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <20030427020541.GE919@cthulhu.gerg.ca> Message-ID: <02df01c30c69$29cf59a0$530f8490@eden> [Greg] > On 25 April 2003, Guido van Rossum said: > > It probably never stopped hanging. It only runs when you pass > > "-u audio" to regrtest though. > > > > I note that it passes for me with Red Hat 7.3, so you might want to > > upgrade. :-) > > Could be hardware, or it could be the device driver in the kernel. > Jeremy, what audio software do you use regularly -- xmms? play? > anything? ossaudiodev currently goes to great pains to open the audio > device in what *seems* to be the right way, but I have no idea if it > really is. It fails for me too, RH8: Linux bobcat 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux Install is pure vanilla on an asus laptop. As far as I can tell, there is no sound driver installed (but I'm not sure :) Gnome desktop is not starting the "sound server". I have never heard a sound through these speakers under Linux (so the fact Python can't play a sound isn't a problem, but the fact write() hangs is) Note that as mentioned this only fails/hangs when the audio resource is enabled, so in general I don't have a problem but thought the data point may be interesting. Mark. From guido@python.org Sun Apr 27 04:28:59 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 23:28:59 -0400 Subject: [Python-Dev] Democracy In-Reply-To: "Your message of Sat, 26 Apr 2003 21:35:42 EDT." 
<20030427013542.GA919@cthulhu.gerg.ca> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> <20030427013542.GA919@cthulhu.gerg.ca> Message-ID: <200304270328.h3R3Sxd05643@pcp02138704pcs.reston01.va.comcast.net> > > > A better comparison would be Habitat for Humanity (and voluntary > > > associations in general). [...] > > > On 23 April 2003, Guido van Rossum said: > > Maybe. I get lots of junk mail asking for contributions from > > HforH and frankly I've always thought of them as yet another > > charity: there are lots of these, and most of them are so much > > larger than our community that comparison is difficult. [Greg Ward] > Don't forget, the PSF is gunning for charity status too. That's > just the most obvious way to state legally, "We are a community with > shared values, etc. etc.". I think there are a lot of parallels > between open source development and other volunteer organizations. > Heck, I like to justify the occasional weekend spent hunkered down > in front of the computer by saying I'm doing volunteer work. IMHO, > hacking on Python is the moral equivalent of helping to maintain > public-access hiking trails. (Although the latter is better > exercise, it's nice not to have to jump in the shower after a day > spent hacking on Python. ;-) Sure (although I hope you jump in the shower anyway :). But I don't want the PSF to grow to the point where we have to send junk mail to people who haven't heard about us. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Sun Apr 27 04:28:37 2003 From: barry@python.org (Barry Warsaw) Date: 26 Apr 2003 23:28:37 -0400 Subject: [Python-Dev] test_ossaudiodev hanging again In-Reply-To: <02df01c30c69$29cf59a0$530f8490@eden> References: <02df01c30c69$29cf59a0$530f8490@eden> Message-ID: <1051414116.20524.98.camel@geddy> On Sat, 2003-04-26 at 23:00, Mark Hammond wrote: > It fails for me too, RH8: I just upgraded my RH7.3 laptop to RH9. test_ossaudiodev passes for me, even though I turn off the speaker on this laptop due to unbearable feedback (can you say "poor Dell design"?). In fact Python 2.3 cvs looks pretty good on RH9. Python 2.2 maint is another story, but I'm still investigating some things and will send a separate email about that later. -Barry From guido@python.org Sun Apr 27 04:37:21 2003 From: guido@python.org (Guido van Rossum) Date: Sat, 26 Apr 2003 23:37:21 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: "Your message of Sat, 26 Apr 2003 18:20:53 EDT." <3EAB0645.7040306@python.org> References: <3EAB0645.7040306@python.org> Message-ID: <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> > So is PEP 294 itself rejected? Or should we await a formal review > request (as per the above)? I suggest to reject it without further ado. It seems there are two kinds of PEPs: those aimed primarily at public review, and those aimed primarily at the BDFL. 294 seems to be of the latter kind; it's 10 months old now and has never been posted (at least according to its Post-History). I wonder if the language in PEP 1 about this needs firming up? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From goodger@python.org Sun Apr 27 05:59:19 2003 From: goodger@python.org (David Goodger) Date: Sun, 27 Apr 2003 00:59:19 -0400 Subject: [Python-Dev] Democracy In-Reply-To: <20030427013542.GA919@cthulhu.gerg.ca> References: <20030423175310.F15881@localhost.localdomain> <200304240114.h3O1EQG31505@pcp02138704pcs.reston01.va.comcast.net> <20030427013542.GA919@cthulhu.gerg.ca> Message-ID: <3EAB63A7.6050801@python.org> Greg Ward wrote: > IMHO, hacking on Python is > the moral equivalent of helping to maintain public-access hiking > trails. (Although the latter is better exercise, it's nice not to have > to jump in the shower after a day spent hacking on Python. ;-) You must not be practising Extreme Programming properly. (Apologies for the obvious... but what an opening!) -- David Goodger From barry@python.org Sun Apr 27 06:02:27 2003 From: barry@python.org (Barry Warsaw) Date: 27 Apr 2003 01:02:27 -0400 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 Message-ID: <1051419746.20524.193.camel@geddy> --=-3sCy95DzaAuHt1Ovtt10 Content-Type: text/plain Content-Transfer-Encoding: 7bit I've been upgrading a few machines to Redhat 9 from 7.3 and I've run into a few minor problems with Python on the 2.2 maint branch. Both dbmmodule and _socket fail to build properly. Neither problems exist in 2.3 cvs. The socket problem is fairly shallow I think: including ssl.h eventually includes krb5.h. Python 2.3's setup.py has a couple of lines of code to deal with this, and that just needs to go into 2.2 maint's setup.py, so I checked this in. The dbm problem is just a bit deeper. dbm ends up linking against gdbm, so the library has to be specified in setup.py. I'm nervous about adding the stuff to setup.py because I don't want to break other platforms. Looking at 2.3's setup.py shows this section to be more complicated and I'm too tired to tease everything out tonight. 
I'll attach a diff to this message in case anybody else feels like mucking with it in the meantime. -Barry --=-3sCy95DzaAuHt1Ovtt10 Content-Description: Content-Disposition: inline; filename=setup.py-patch.txt Content-Type: text/x-patch; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Index: setup.py =================================================================== RCS file: /cvsroot/python/python/dist/src/setup.py,v retrieving revision 1.73.4.16 diff -u -r1.73.4.16 setup.py --- setup.py 27 Apr 2003 04:00:01 -0000 1.73.4.16 +++ setup.py 27 Apr 2003 05:00:15 -0000 @@ -406,7 +406,8 @@ exts.append( Extension('dbm', ['dbmmodule.c'], libraries = ['db1'] ) ) else: - exts.append( Extension('dbm', ['dbmmodule.c']) ) + exts.append( Extension('dbm', ['dbmmodule.c'], + libraries = ['gdbm']) ) # Anthony Baxter's gdbm module. GNU dbm(3) will require -lgdbm: if (self.compiler.find_library_file(lib_dirs, 'gdbm')): --=-3sCy95DzaAuHt1Ovtt10-- From Anthony Baxter <anthony@interlink.com.au> Sun Apr 27 07:05:51 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Sun, 27 Apr 2003 16:05:51 +1000 Subject: [Python-Dev] Curiousity In-Reply-To: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200304270605.h3R65qK21094@localhost.localdomain> >>> Guido van Rossum wrote > A while ago some people were interested in upgrading our Ultraseek > setup, but that initiative seems to have fallen by the wayside. :-( Not exactly. 
I'm still waiting for a) the new linux-based creosote b) the ultraseek license key Anthony From niemeyer@conectiva.com Sun Apr 27 07:44:46 2003 From: niemeyer@conectiva.com (Gustavo Niemeyer) Date: Sun, 27 Apr 2003 03:44:46 -0300 Subject: [Python-Dev] test_s?re merge In-Reply-To: <16041.24315.500827.370963@montanaro.dyndns.org> References: <16041.24315.500827.370963@montanaro.dyndns.org> Message-ID: <20030427064444.GA30981@localhost> > For those of you who don't read python-checkins, the merge of test_re.py and > test_sre.py has been completed and test_sre.py is no longer in the [...] Great work! Thanks. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ] From martin@v.loewis.de Sun Apr 27 09:22:53 2003 From: martin@v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=) Date: 27 Apr 2003 10:22:53 +0200 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 In-Reply-To: <1051419746.20524.193.camel@geddy> References: <1051419746.20524.193.camel@geddy> Message-ID: <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> Barry Warsaw <barry@python.org> writes: > - exts.append( Extension('dbm', ['dbmmodule.c']) ) > + exts.append( Extension('dbm', ['dbmmodule.c'], > + libraries = ['gdbm']) ) I think this was an alternative for platforms where a dbm library is part of the C library. Your patch would kill those platforms - but it may be that we are talking about the empty set here. Regards, Martin From oren-py-d@hishome.net Sun Apr 27 10:58:22 2003 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sun, 27 Apr 2003 05:58:22 -0400 Subject: [Python-Dev] Accepted PEPs? In-Reply-To: <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> References: <3EAB0645.7040306@python.org> <200304270337.h3R3bLY05672@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427095822.GA66695@hishome.net> On Sat, Apr 26, 2003 at 11:37:21PM -0400, Guido van Rossum wrote: > > So is PEP 294 itself rejected? 
Or should we await a formal review > > request (as per the above)? > > I suggest to reject it without further ado. Go ahead. I still consider this an open issue (though of pretty low priority). If anyone else here feels that it's redundant to refer to built in types by two different names and has a better idea of where to put names that match the __name__ attribute of types, please go ahead and write a proposal. > It seems there are two > kinds of PEPs: those aimed primarily at public review, and those aimed > primarily at the BDFL. 294 seems to be of the latter kind; it's 10 > months old now and has never been posted (at least according to its > Post-History). I wonder if the language in PEP 1 about this needs > firming up? Mea culpa. I never realized that I forgot to actually post it. Oren From skip@mojam.com Sun Apr 27 13:01:17 2003 From: skip@mojam.com (Skip Montanaro) Date: Sun, 27 Apr 2003 07:01:17 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200304271201.h3RC1He01081@manatee.mojam.com> Bug/Patch Summary ----------------- 407 open / 3569 total bugs (+12) 129 open / 2106 total patches (+1) New Bugs -------- socketmodule doesn't compile on strict POSIX systems (2003-04-20) http://python.org/sf/724588 email/quopriMIME.py exception on int (lstrip) (2003-04-20) http://python.org/sf/724621 Minor /Tools/Scripts/crlf.py bugs (2003-04-20) http://python.org/sf/724767 Possible OSX module location bug (2003-04-21) http://python.org/sf/725026 SRE bug with capturing groups in alternatives in repeats (2003-04-21) http://python.org/sf/725106 SRE bugs with capturing groups in negative assertions (2003-04-21) http://python.org/sf/725149 urlopen object's read() doesn't read to EOF (2003-04-21) http://python.org/sf/725265 Broken links (2003-04-23) http://python.org/sf/726150 textwrap.wrap infinite loop (2003-04-23) http://python.org/sf/726446 platform module needs docs (LaTeX) (2003-04-24) http://python.org/sf/726911 valgrind python fails (2003-04-24) 
http://python.org/sf/727051 use bsddb185 if necessary in dbhash (2003-04-24) http://python.org/sf/727137 Core Dumps : Python2.2.2 (2003-04-24) http://python.org/sf/727241 test_bsddb3 fails (2003-04-25) http://python.org/sf/727571 Documentation formatting bugs (2003-04-25) http://python.org/sf/727692 email parsedate still wrong (PATCH) (2003-04-25) http://python.org/sf/727719 getpath.c-generated prefix wrong for Tru64 scripts (2003-04-25) http://python.org/sf/727732 Test failures on Linux, Python 2.3b1 tarball (2003-04-26) http://python.org/sf/728051 tmpnam problems on windows 2.3b, breaks test.test_os (2003-04-26) http://python.org/sf/728097 Tools/msgfmt.py results in two warnings under Python 2.3b1 (2003-04-26) http://python.org/sf/728277 setup.py breaks during build of Python-2.3b1 (2003-04-27) http://python.org/sf/728322 IRIX, 2.3b1, socketmodule.c compilation errors (2003-04-27) http://python.org/sf/728330 New Patches ----------- Improved output for unittest failUnlessEqual (2003-04-22) http://python.org/sf/725569 Modules/addrinfo.h patch (2003-04-22) http://python.org/sf/725942 help() with readline support (2003-04-23) http://python.org/sf/726204 Clarify docs for except target assignment (2003-04-24) http://python.org/sf/726751 AUTH_TYPE and REMOTE_USER for CGIHTTPServer.py:run_cgi() (2003-04-25) http://python.org/sf/727483 Editing of __str__ and __repr__ docs (2003-04-25) http://python.org/sf/727789 Remove extra line ending in CGI XML-RPC responses (2003-04-25) http://python.org/sf/727805 Multiple webbrowser.py bug fixes / improvements (2003-04-26) http://python.org/sf/728278 Closed Bugs ----------- netrc module can't handle all passwords (2002-05-18) http://python.org/sf/557704 netrc & special chars in passwords (2002-12-01) http://python.org/sf/646592 optparse store_true uses 1 and 0 (2002-12-28) http://python.org/sf/659604 filter() treatment of str and tuple inconsistent (2003-01-10) http://python.org/sf/665835 StringIO self-iterator (2003-01-31) 
http://python.org/sf/678519 repr() of large array objects takes quadratic time (2003-02-05) http://python.org/sf/680789 math.log(0) differs from math.log(0L) (2003-03-27) http://python.org/sf/711019 sys.path on MacOSX (2003-04-10) http://python.org/sf/719297 Icon on applets is wrong (2003-04-10) http://python.org/sf/719303 tarfile gets filenames wrong (2003-04-15) http://python.org/sf/721871 logging.setLoggerClass() doesn't support new-style classes (2003-04-18) http://python.org/sf/723801 Closed Patches -------------- Fix for seg fault on test_re on mac osx (2002-07-12) http://python.org/sf/580869 [mingw patches] alloca and posixmodule (2002-10-04) http://python.org/sf/618791 Generator form of os.path.walk (2002-12-12) http://python.org/sf/652980 Add inet_pton and inet_ntop to socket (2002-12-24) http://python.org/sf/658327 Deprecate rotor module (2003-02-03) http://python.org/sf/679505 fix for 680789: reprs in arraymodule (2003-02-11) http://python.org/sf/685051 fix bug 678519: cStringIO self iterator (2003-03-01) http://python.org/sf/695710 allow timeit to see your globals() (2003-04-08) http://python.org/sf/717575 Patch to distutils doc for metadata explanation (2003-04-09) http://python.org/sf/718027 From thomas@xs4all.net Sun Apr 27 15:03:46 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 16:03:46 +0200 Subject: [Python-Dev] Curiousity In-Reply-To: <200304270605.h3R65qK21094@localhost.localdomain> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> Message-ID: <20030427140346.GC26254@xs4all.nl> On Sun, Apr 27, 2003 at 04:05:51PM +1000, Anthony Baxter wrote: > Not exactly. I'm still waiting for > a) the new linux-based creosote No, it's not a new creosote. It's a new machine, it doesn't replace creosote. 
It's taking a bit longer than I expected, partly because of my workload, partly because that of others and partly because of unforeseen events, but it's still on its way. > b) the ultraseek license key Last time Barry and I looked (during PyCon), we couldn't find the Linux version... Are we certain it still exists? :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Sun Apr 27 15:11:01 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 16:11:01 +0200 Subject: [Python-Dev] Problems w/ Python 2.2-maint and Redhat 9 In-Reply-To: <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> References: <1051419746.20524.193.camel@geddy> <m38ytwtkm9.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030427141101.GD26254@xs4all.nl> On Sun, Apr 27, 2003 at 10:22:53AM +0200, Martin v. Löwis wrote: > Barry Warsaw <barry@python.org> writes: > > > - exts.append( Extension('dbm', ['dbmmodule.c']) ) > > + exts.append( Extension('dbm', ['dbmmodule.c'], > > + libraries = ['gdbm']) ) > I think this was an alternative for platforms where a dbm library is > part of the C library. Your patch would kill those platforms - but it > may be that we are talking about the empty set here. In either case, it should only link to gdbm if gdbm exists -- which is checked for right below the patch. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
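[Editor's note: the guarded linking Thomas describes -- naming gdbm in the Extension only when the compiler can actually find it, and preferring the platform's own dbm library where one exists -- can be sketched as a small decision helper. This is an illustration, not the actual setup.py code; the `find_library_file` callable and the `dbm_link_libraries` name are assumptions standing in for `self.compiler.find_library_file` inside distutils' `detect_modules()`.]

```python
def dbm_link_libraries(find_library_file, lib_dirs):
    """Pick the libraries the dbm extension should link against.

    find_library_file mimics distutils' compiler.find_library_file():
    it returns a path if the named library exists in lib_dirs, else None.
    """
    if find_library_file(lib_dirs, 'db1'):
        # BSD db 1.85 compatibility library (e.g. older glibc systems)
        return ['db1']
    if find_library_file(lib_dirs, 'gdbm'):
        # Link gdbm only when it is actually present, so platforms
        # whose C library provides (n)dbm itself are left alone.
        return ['gdbm']
    # Assume libc itself supplies dbm -- Martin's "empty set" case.
    return []
```

In real build code this would be called as `dbm_link_libraries(self.compiler.find_library_file, lib_dirs)` and the result passed as the `libraries` argument of the `Extension`.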
From Anthony Baxter <anthony@interlink.com.au> Sun Apr 27 15:48:41 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Mon, 28 Apr 2003 00:48:41 +1000 Subject: [Python-Dev] shellwords In-Reply-To: <20030425181157.GB6591@localhost.distro.conectiva> Message-ID: <200304271448.h3REmfK23350@localhost.localdomain> >>> Gustavo Niemeyer wrote > > The other file manipulation thingy that would be good would be to > > abstract out the bits of tarfile and zipfile and make a standard > > interface to the two. > > IIRC, tarfile has a wrapper which makes it compatible with zipfile. Yah, but tarfile's interface is much nicer. I was talking about a mode that makes zipfile like tarfile. Anthony From dberlin@dberlin.org Sun Apr 27 16:49:02 2003 From: dberlin@dberlin.org (Daniel Berlin) Date: Sun, 27 Apr 2003 11:49:02 -0400 Subject: [Python-Dev] Why doesn't the uu module give you the filename? Message-ID: <C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> While it's simple enough to get the uu module to uudecode a string (using StringIO), it's impossible to get it to hand you the filename the uuencoded thing specifies. I.e., given

    begin 644 a.ii.gz
    <whatever>
    end

there is no way to get the decode function to tell you the thing is named a.ii.gz. Of course, it uses this filename itself in creating an output file if you don't specify one. It just won't tell *you* what the filename is. I could just give it no output file, and let it create it, then determine the name of the file it created, but this seems like a very large kludge. Besides, I am decoding from/to a string, in memory. I don't want to have it start writing things to the disk for no reason. The context of all of this is that I have a program that is converting text that possibly contains uuencoded attachments into a bunch of SQL statements to insert into a database (It's converting a GNATS bug database to a Bugzilla one.
It's a rewrite of an incredibly ugly, slow, barely functional perl script that spews errors at random and leaks memory for no reason :P). I had to cut/paste the decode function from the uu module into a new module and make it return the filename, just so that I could get access to it. This seems a bit silly. The decode function has no return value right now, so giving it one shouldn't break existing applications (since none of them should be expecting it to return anything). I believe it should return the filename specified in the begin line. As an added bonus, it would be even nicer if it also returned the start and end position of the decoded portion inside the input text. That way, if one wants to replace the entire uuencoded text with something like, say, "See bug attachments for <filename>", you can do it easily. :P As I said, I've got a version of uu.decode that does all of this; I'll happily submit it as a patch if people agree I'm right. --Dan From ANTIGEN@netsys.co.za Sun Apr 27 16:50:07 2003 From: ANTIGEN@netsys.co.za (ANTIGEN_NETSYS-NT-SERV) Date: Sun, 27 Apr 2003 17:50:07 +0200 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus Message-ID: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> Antigen for Exchange found Unknown infected with CorruptedCompressedUuencodeFile virus. The file is currently Removed. The message, "Why doesn't the uu module give you the filename?", was sent from python-list-admin@python.org and was discovered in IMC Queues\Inbound located at Netsys International/NETSYS/NETSYS-NT-SERV.
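[Editor's note: the return-value idea from Daniel's message above can be sketched in a few lines. `uu_info` is a hypothetical helper, not the stdlib `uu.decode` API: it only parses the "begin" header and reports the filename plus the start/end offsets of the block, leaving the actual body decoding to the uu/binascii modules.]

```python
def uu_info(text):
    """Locate the first uuencoded block in text.

    Returns (filename, start, end), where text[start:end] covers the
    block from its "begin" line through its "end" line.  A sketch of
    the proposed return value -- not the stdlib uu API.
    """
    pos = 0
    start = None
    name = None
    for line in text.splitlines(True):
        stripped = line.strip()
        if start is None and stripped.startswith('begin '):
            # e.g. "begin 644 a.ii.gz" -> ['begin', '644', 'a.ii.gz']
            fields = stripped.split(' ', 2)
            if len(fields) == 3:
                name = fields[2]
                start = pos
        elif start is not None and stripped == 'end':
            return name, start, pos + len(line)
        pos += len(line)
    raise ValueError('no complete uuencoded block found')
```

With the offsets in hand, `text[:start] + replacement + text[end:]` gives the "See bug attachments for <filename>" substitution Daniel mentions.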
From dberlin@dberlin.org Sun Apr 27 17:25:43 2003 From: dberlin@dberlin.org (Daniel Berlin) Date: Sun, 27 Apr 2003 12:25:43 -0400 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus In-Reply-To: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> Message-ID: <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> Yes, it's the incredible "even if it was valid uuencoded text, it would be a very dangerous empty file" virus. Who designs this shite? What's next? "Antigen for Exchange found Unknown infected with YourMailerCantDoMIMEProperly virus" or maybe "Antigen for Exchange found Unknown infected with YouQuotedTooMuchText virus" or more likely "Antigen for Exchange found our entire organization infected with WeUseBrainDeadAntiVirusSoftware virus" --Dan On Sunday, April 27, 2003, at 11:50 AM, ANTIGEN_NETSYS-NT-SERV wrote: > Antigen for Exchange found Unknown infected with > CorruptedCompressedUuencodeFile virus. > The file is currently Removed. The message, "Why doesn't the uu > module give > you the filename?", was > sent from python-list-admin@python.org and was discovered in IMC > Queues\Inbound > located at Netsys International/NETSYS/NETSYS-NT-SERV. > From itamar@itamarst.org Sun Apr 27 19:53:16 2003 From: itamar@itamarst.org (Itamar Shtull-Trauring) Date: Sun, 27 Apr 2003 14:53:16 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? Message-ID: <20030427145316.475c3cf5.itamar@itamarst.org> The "we always wrap socket objects with python class" change seems to have slowed down networking on Linux (and presumably other platforms where socket objects used to be unwrapped.) Moshe Zadka ran some benchmarks on Linux (2.4.9 - a redhat machine at work probably) with 2.2 and 2.3b1 using Demos/sockets/throughput.py. For count of 1000: 2.3 server, 2.3 client: Throughput: 13556.811 K/sec. 2.3 server, 2.2 client: Throughput: 24917.862 K/sec. 2.2 server, 2.2 client: Throughput: 29838.491 K/sec. 
10,000: 2.3 server, 2.3 client: Throughput: 35994.749 K/sec. 2.3 server, 2.2 client: Throughput: 34398.085 K/sec. 2.2 server, 2.2 client: Throughput: 49488.916 K/sec. 50,000: 2.3 server, 2.3 client: Throughput: 39002.538 K/sec. 2.3 server, 2.2 client: Throughput: 48064.785 K/sec. 2.2 server, 2.2 client: Throughput: 59799.672 K/sec. On a 2.3a2 I have I did "socket.socket = socket._socketobject", and got a 20% slowdown compared to 2.2 on throughput. (2.3a2 without this change is the same speed as 2.2). Can other people do some tests to verify these numbers? If this slowdown is confirmed, it is really not acceptable, since the change seems to have been made only to support making timeout sockets slightly easier to use. Why should everyone have to pay a speed penalty just so a minority of people can skip calling a "socket.installtimeoutsupport()" at the beginning of their program? it's just one line of code they'd need to add. In real programs the speed drop would probably be much less pronounced, although I bet this slows down e.g. Anthony Baxter's portforwarder quite a bit. If Python 2.3 is released without fixing this Twisted will probably monkeypatch the socket module so that we can get full performance, since we have our own (unavoidable) layers of Python indirection :) -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting From guido@python.org Sun Apr 27 20:19:17 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 15:19:17 -0400 Subject: [Python-Dev] Why doesn't the uu module give you the filename? In-Reply-To: "Your message of Sun, 27 Apr 2003 11:49:02 EDT." 
<C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> References: <C4C4782E-78C7-11D7-B62F-000A95A34564@dberlin.org> Message-ID: <200304271919.h3RJJHs15021@pcp02138704pcs.reston01.va.comcast.net> > While it's simple enough to get the uu module to uudecode a string > (using StringIO), it's impossible to get it to handle you the filename > the uuencoded thing specifies. [...] > As i said, i've got a version of uu.decode that does all of this, i'll > happily submit it as a patch if people agree i'm right. Sure, as long as your patch is backwards compatible. Send it to SourceForge. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Apr 27 20:22:17 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 15:22:17 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sun, 27 Apr 2003 16:03:46 +0200." <20030427140346.GC26254@xs4all.nl> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> Message-ID: <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> > > b) the ultraseek license key > > Last Barry and I looked (during PyCon), we couldn't find the Linux > version... Are we certain it still exists ? :) Googling for "ultraseek download" found thispage, which seems to have it: http://downloadcenter.verity.com/dlc/index.jsp --Guido van Rossum (home page: http://www.python.org/~guido/) From drifty@alum.berkeley.edu Sun Apr 27 20:53:08 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 27 Apr 2003 12:53:08 -0700 (PDT) Subject: [Python-Dev] test_logging hangs on OS X (was: ... Solaris 8) In-Reply-To: <16041.31987.943313.278329@montanaro.dyndns.org> References: <16041.31987.943313.278329@montanaro.dyndns.org> Message-ID: <Pine.SOL.4.55.0304271250320.17451@death.OCF.Berkeley.EDU> [Skip Montanaro] > Using the latest version from CVS, on Solaris 8 test_logging hangs. 
I am getting this hang on OS X as well. Anyone else? > Lots of output, then: > > ... > -- log_test3 end --------------------------------------------------- > I am getting the hang at the same place; log_test3 ends and then nothing happens afterwards. At least this has convinced me to go forward with my skips.txt patch. -Brett From martin@v.loewis.de Sun Apr 27 21:19:44 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 27 Apr 2003 22:19:44 +0200 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> Itamar Shtull-Trauring <itamar@itamarst.org> writes: > Can other people do some tests to verify these numbers? For that, it would be good if Moshe's test procedure was published. Regards, Martin From itamar@itamarst.org Sun Apr 27 21:23:27 2003 From: itamar@itamarst.org (Itamar Shtull-Trauring) Date: Sun, 27 Apr 2003 16:23:27 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> References: <20030427145316.475c3cf5.itamar@itamarst.org> <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> Message-ID: <20030427162327.428c974f.itamar@itamarst.org> On 27 Apr 2003 22:19:44 +0200 martin@v.loewis.de (Martin v. Löwis) wrote: > > Can other people do some tests to verify these numbers? > > For that, it would be good if Moshe's test procedure was published. On Debian, you can do:

    cd /usr/share/doc/python2.2/examples/Demo/sockets
    python2.2 throughput.py -s &
    python2.2 throughput.py -c 10000 localhost

and try with python2.3 and different numbers other than 10000. On non-Debian platforms/packages it's wherever you have the python examples installed.
-- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting From drifty@alum.berkeley.edu Sun Apr 27 21:32:58 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Sun, 27 Apr 2003 13:32:58 -0700 (PDT) Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427162327.428c974f.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> <m3vfwzk80v.fsf@mira.informatik.hu-berlin.de> <20030427162327.428c974f.itamar@itamarst.org> Message-ID: <Pine.SOL.4.55.0304271328270.17451@death.OCF.Berkeley.EDU> [Itamar Shtull-Trauring] > On 27 Apr 2003 22:19:44 +0200 > martin@v.loewis.de (Martin v. Löwis) wrote: > > > > Can other people do some tests to verify these numbers? > > > > non-Debian platforms/packages it's wherever you have the python examples > installed. So running Demo/sockets/throughput.py with the -c 10000 argument I get under OS X:

* Python 2.2.2: 7976.756 K/sec
* CVS Python (compiled on April 18): 2772.97 K/sec

Now I put no great effort into sterilizing my system so that nothing else was running, so take these numbers with a grain of salt. -Brett From tim.one@comcast.net Sun Apr 27 22:22:00 2003 From: tim.one@comcast.net (Tim Peters) Date: Sun, 27 Apr 2003 17:22:00 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <20030427014407.GC919@cthulhu.gerg.ca> Message-ID: <LNBBLJKPBEHFEDALKOLCGEEOEEAB.tim.one@comcast.net> [Jeremy Fincher] > It's a minor quibble to be sure, but os.walk doesn't really > describe what exactly it's doing. I'd suggest os.pathwalk, but > that'd be too error-prone, being os.path.walk without a dot. Perhaps > os.pathwalker? [Greg Ward] > os.walktree? os.walkdirs? os.walkpath? > > (On reflection, the latter two are pretty dumb. walktree is the right > name, undoubtedly. ;-) I don't expect any short name to describe exactly what a thing does, and don't worry about it.
math.sin() isn't about lust in your heart, or math.tan() about practicing safe sunning either. Guido has his own inscrutable criteria for picking names. Mine is whether, *after* I know what a thing does, it's hard to forget what the name means. "walk" passed that test for me, and better than Python or Java did <wink>. From thomas@xs4all.net Sun Apr 27 22:40:11 2003 From: thomas@xs4all.net (Thomas Wouters) Date: Sun, 27 Apr 2003 23:40:11 +0200 Subject: [Python-Dev] Antigen found CorruptedCompressedUuencodeFile virus In-Reply-To: <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> References: <41A321246CB6D511AE2600C0DFF8012E5A9BEC@netsys-nt-serv.netsys.co.za> <E4BE28B1-78CC-11D7-84A5-000A95A34564@dberlin.org> Message-ID: <20030427214011.GE26254@xs4all.nl> On Sun, Apr 27, 2003 at 12:25:43PM -0400, Daniel Berlin wrote: > Yes, it's the incredible "even if it was valid uuencoded text, it would > be a very dangerous empty file" virus. > Who designs this shite? Someone who was paying attention to the incredibly numerous problems with braindead mailprograms (oddly enough by far the most common of them on one particular platform, and from one particular vendor ;) If there is a problem with some kind of pattern, somewhere someone will write a program to block the pattern, and lots of people will use/buy it. It beats getting infected. > What's next? > "Antigen for Exchange found Unknown infected with > YourMailerCantDoMIMEProperly virus" > or maybe > "Antigen for Exchange found Unknown infected with YouQuotedTooMuchText > virus" If those viruses actually exist, yes, I'm certain you will see them. > or more likely > "Antigen for Exchange found our entire organization infected with > WeUseBrainDeadAntiVirusSoftware virus" You mean 'WeUseBrainDeadMailClientsAndMailServers'. 
No worries, though, it's not python-dev itself that checks for viruses, and whoever has his or her viruschecker on the unpopular 'warn everyone on the CC list too' setting will probably be frantically trying to fix it now :) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From aahz@pythoncraft.com Mon Apr 28 00:25:39 2003 From: aahz@pythoncraft.com (Aahz) Date: Sun, 27 Apr 2003 19:25:39 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030427232539.GA25650@panix.com> On Sun, Apr 27, 2003, Guido van Rossum wrote: > > Googling for "ultraseek download" found this page, which seems to have > it: > > http://downloadcenter.verity.com/dlc/index.jsp Did you try accessing the URL? It seems to be down right now. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 93 From guido@python.org Mon Apr 28 02:26:00 2003 From: guido@python.org (Guido van Rossum) Date: Sun, 27 Apr 2003 21:26:00 -0400 Subject: [Python-Dev] Curiousity In-Reply-To: "Your message of Sun, 27 Apr 2003 19:25:39 EDT."
<20030427232539.GA25650@panix.com> References: <200304261648.h3QGmfA05194@pcp02138704pcs.reston01.va.comcast.net> <200304270605.h3R65qK21094@localhost.localdomain> <20030427140346.GC26254@xs4all.nl> <200304271922.h3RJMHY15041@pcp02138704pcs.reston01.va.comcast.net> <20030427232539.GA25650@panix.com> Message-ID: <200304280126.h3S1Q0i15475@pcp02138704pcs.reston01.va.comcast.net> > On Sun, Apr 27, 2003, Guido van Rossum wrote: > > > > Googling for "ultraseek download" found this page, which seems to have > > it: > > > > http://downloadcenter.verity.com/dlc/index.jsp > > Did you try accessing the URL? It seems to be down right now. Yes, I even downloaded the Linux tarball (to my Windows laptop :-). It's up right now for me. --Guido van Rossum (home page: http://www.python.org/~guido/) From jepler@unpythonic.net Mon Apr 28 03:04:22 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Sun, 27 Apr 2003 21:04:22 -0500 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <20030428020421.GA31496@unpythonic.net> I can also reproduce the slowdown. Measured on a Redhat 9 machine, python-2.2.2-26.i386.rpm vs python 2.3b1 compiled with default options. 700MHz Pentium III in a laptop. best of 3 runs. Count of 100000. Running over the loopback device. Sentence fragments.

    Server  Client  Throughput  Speed
    2.2     2.2     53520.4     100.00%
    2.2     2.3b1   43726.28     81.70%
    2.3b1   2.2     43032.06     80.40%
    2.3b1   2.3b1   38283.78     71.53%

System load was low at the time, though I had various apps running. I also ran the test over my 802.11b wireless setup:

    Server  Client  Throughput  Speed
    2.2     2.2     639.16      100.00%
    2.3b1   2.2     639.07       99.98%

(client was a 350MHz machine with various programs running) That is, when running over a relatively slow link (theoretically, 11mbps) the slowdown is not measurable.
However, I don't think that this really decreases the importance of this performance regression. Jeff From mal@lemburg.com Mon Apr 28 08:53:35 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 28 Apr 2003 09:53:35 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> Message-ID: <3EACDDFF.3060304@lemburg.com> Moore, Paul wrote: > From: M.-A. Lemburg [mailto:mal@lemburg.com] > >>In reality it probably is for most parts of the world. But >>why put this burden on the casual user ? > > Speaking as a "casual user", I very rarely need or use crypto > software. However, when I do need it, having it "built in" is > a major benefit - most of the crypto packages either have > dependencies I'm not familiar with or don't have, or go far > too deep into crypto theory for me to follow. At the end of > the day, all I want is simple stuff, like for urllib to get a > "https" web page for me, "just like my browser does" (ie, with > no thought on my part...) Paul, that's the wrong approach to the problem. Crypto code causes legal problems not ones which have to do with how to wrap up distributions. There's hardly anything to argue about here, unfortunately. >>>>Crypto is just too much (legal) work if you're serious >>>>about it. >>> >>>So then you would advise to remove the OpenSSL support >>>from the Windows distribution, and from Python altogether? >> >>Hmm, I didn't know that the Windows installer comes with an SSL >>module that includes OpenSSL. I'd strongly advise to make that >>a separate download. > > If you did, I'd expect that 99% of Windows users would perceive > that as "Python can't handle https URLs". Having a separate > download might be enough, as long as it was utterly trivial - > download the package, click to install, done. All dependencies > included, no extra work. 
Right; and that would be possible... not only for Windows, but for most supported platforms via distutils. >>Is there ? pycrypto is all you need if you're into deep crypto. > But pycrypto (at least when I've looked into it) definitely *isn't* > just a 1-click install, and a quick Google search reveals no way > of getting a prebuilt Windows binary. Of course, you say "if you're > into deep crypto", so maybe you'd say that expecting users to build > their own isn't unreasonable at that level. > > Actually, m2crypto is another candidate, and it does include > Windows binaries (but they are a bit fiddly to install)... Both packages are maintained outside the Python distribution, so there's nothing much we can do to change that situation. I was talking about the code currently integrated in Python itself. >>The standard SSL support is enough crypto for most people and >>that's already included in the distribution. > But you were arguing to take it out... I am arguing to take out the OpenSSL code currently shipped with the Windows installer, not the wrapper code in the _ssl module. > Personally, I'd like the existing stuff to stay as-is. I can understand your point, but we have to do something about the current situation, unless we want to put the whole Python distribution at risk of being illegally exported/imported/used in some parts of the world. Making the crypto part of the distribution a separate download would solve the problem and only introduce a mild inconvenience for casual users. > I don't > particularly see the need for more crypto stuff in the core, but I'd > like to see a well-maintained, easy to install, "sanctioned" crypto > package for people who want to either use crypto "for real", or just > investigate it. That's a feature request :-) -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 57 days left From mal@lemburg.com Mon Apr 28 11:00:29 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 28 Apr 2003 12:00:29 +0200 Subject: [Python-Dev] Cryptographic stuff for 2.3 In-Reply-To: <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> References: <16E1010E4581B049ABC51D4975CEDB88619A4C@UKDCX001.uk.int.atosorigin.com> <1051290657.1500.6.camel@barry> <200304251826.h3PIQQU25424@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3EACFBBD.7000909@lemburg.com> Guido van Rossum wrote: >>I'd hate to see sha removed from the standard distro. > > > Me too; I don't see sha or md5 as crypto. I'm only against adding new > *crypto* capability. Hash algorithms are usually not regulated as crypto code -- even though they can be used for such purposes; see e.g. chaffing and winnowing: http://theory.lcs.mit.edu/~rivest/chaffing.txt > I'm also for isolating existing crypto capability so it's easy to > remove for anyone who has a need for a crypto-free distribution. I > think we're already doing that, given that even on Windows, the SSL > module is a separate DLL. We could wrap up the following set:

a. installer with crypto code + notice that downloading and using this version is illegal in some countries and that downloading and/or reexporting the installer to certain countries is not legal
b. installer without crypto
c. crypto package as distutils installer with the same notice as for the combined package

Or we just do b and c and leave a to companies like ActiveState. -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 28 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 57 days left From cjohns@cybertec.com.au Mon Apr 28 14:31:36 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Mon, 28 Apr 2003 23:31:36 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled Message-ID: <3EAD2D38.3030906@cybertec.com.au> Hello, Porting Python to the open source realtime OS called RTEMS, I get a compile error on line 2797 of socketmodule.c. This is from CVS and I suspect a result of the SF patch #658327. More problems exist on lines 2814, 2835 and 2850. Should this code check ENABLE_IPV6, as IPV6 is not supported on RTEMS yet. Also, where is INET_ADDRSTRLEN supposed to be defined? Regards -- Chris Johns, cjohns at cybertec.com.au From guido@python.org Mon Apr 28 15:29:55 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 10:29:55 -0400 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: Your message of "Sun, 27 Apr 2003 21:04:22 CDT." <20030428020421.GA31496@unpythonic.net> References: <20030427145316.475c3cf5.itamar@itamarst.org> <20030428020421.GA31496@unpythonic.net> Message-ID: <200304281429.h3SETtq06555@odiug.zope.com> I'm guessing that the slowdown comes from the fact that calling a method like recv() on the wrapper object is now a Python method which calls the C method on the wrapped object. I wonder if the slowdown can't be easily repaired by changing the wrapper class to copy the relevant methods to instance variables. It would be even nicer to use subclassing instead of a wrapper object. I vaguely recall that I tried this before but couldn't figure out how to do it, but I've got a feeling that it ought to be doable -- after all the C socket object has separate __new__ and __init__ methods. 
I hope someone can take this ball and submit a patch -- it would indeed be a shame to have to live with the slowdown (even if it only shows up when using the loopback device) or to have a practice of monkey patching socket.py. (BTW instead of monkey-patching socket.py, it might be easier to write "import _socket as socket".) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Mon Apr 28 15:32:23 2003 From: skip@pobox.com (Skip Montanaro) Date: Mon, 28 Apr 2003 09:32:23 -0500 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking? In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> References: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <16045.15223.224661.442041@montanaro.dyndns.org> Itamar> If this slowdown is confirmed, it is really not acceptable, Itamar> since the change seems to have been made only to support making Itamar> timeout sockets slightly easier to use. It was done to support making timeout sockets work properly. As they existed previously, timeout sockets wouldn't work with protocols which would most likely use them: higher level modules such as httplib, which call sock.makefile(), then call readlines?() on the resulting file object. Itamar> Why should everyone have to pay a speed penalty just so a Itamar> minority of people can skip calling a Itamar> "socket.installtimeoutsupport()" at the beginning of their Itamar> program? it's just one line of code they'd need to add. I think it would be easier for the minority of programs that care about the 20% performance loss to simply set import socket, _socket socket.socket = socket.SocketType = _socket.socket I don't know about you, but fast and incorrect don't help me much. Feel free to submit a patch which improves performance but maintains proper behavior in the face of timeouts (that is, allows test_urllibnet to still work correctly). 
Skip From hemanexp@yahoo.com Mon Apr 28 15:40:41 2003 From: hemanexp@yahoo.com (perl lover) Date: Mon, 28 Apr 2003 07:40:41 -0700 (PDT) Subject: [Python-Dev] Getting mouse position in terms of canvas unit. Message-ID: <20030428144041.1817.qmail@web41709.mail.yahoo.com> hi, I am new to python and tkinter. I have created a canvas of size 300m * 300m (in millimeters). I bind a mouse move method to the canvas. When I move the mouse over the canvas the mouse position gets printed in pixel units. But I want to get mouse position values in terms of canvas units (i.e., millimeters). How can I get mouse position values in terms of canvas units? My program is given below.

*****************************
from Tkinter import *
root = Tk()
c = Canvas(root, width="300m", height="300m", background='gray')
c.pack()

def mouseMove(event):
    print c.canvasx(event.x), c.canvasy(event.y)

c.create_rectangle('16m', '10.5m', '21m', '15.5m', fill='blue')
c.bind('<Motion>', mouseMove)
root.mainloop()

Thanks From tjreedy@udel.edu Mon Apr 28 18:22:32 2003 From: tjreedy@udel.edu (Terry Reedy) Date: Mon, 28 Apr 2003 13:22:32 -0400 Subject: [Python-Dev] Re: Getting mouse position in terms of canvas unit. References: <20030428144041.1817.qmail@web41709.mail.yahoo.com> Message-ID: <b8jntb$7qd$1@main.gmane.org> "perl lover" <hemanexp@yahoo.com> wrote in message news:20030428144041.1817.qmail@web41709.mail.yahoo.com... > hi, > I am new to python and tkinter. Hi. This is the Python *developers* list. The following is a Python *usage* question rather than a Python *development* question. Please submit such to comp.lang.python or the regular Python mailing list (see www.python.org). >How can I get mouse position values in terms of canvas units? If nothing else, figure out the ratio between pixel and canvas (mm) units and multiply. Terry J. 
Reedy From python@rcn.com Mon Apr 28 20:51:36 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 15:51:36 -0400 Subject: [Python-Dev] Dictionary tuning Message-ID: <001b01c30dbf$94363140$125ffea9@oemcomputer> I've experimented with about a dozen ways to improve dictionary performance and found one that benefits some programs by up to 5% without hurting the performance of other programs by more than a single percentage point. It entails a one line change to dictobject.c resulting in a new schedule of dictionary sizes for a given number of entries:

Number of            Current size    Proposed size
Filled Entries       of dictionary   of dictionary
--------------       -------------   -------------
[--   0 to   5 --]         8               8
[--   6 to  10 --]        16              32
[--  11 to  21 --]        32              32
[--  22 to  42 --]        64             128
[--  43 to  85 --]       128             128
[--  86 to 170 --]       256             512
[-- 171 to 341 --]       512             512

The idea is to lower the average sparseness of dictionaries (by 0% to 50% of their current sparseness). This results in fewer collisions, faster collision resolution, fewer memory accesses, and better cache performance. A small side-benefit is halving the number of resize operations as the dictionary grows. The above table of dictionary sizes shows that odd numbered steps have the same size as the current approach while even numbered steps are twice as large. As a result, small dicts keep their current size and the amortized size of large dicts remains the same. Along the way, some dicts will be a little larger and will benefit from the increased sparseness. I would like to know what you guys think about the idea and would appreciate your verifying the performance on your various processors and operating systems. Raymond Hettinger P.S. The one line patch is:

*** dictobject.c 17 Mar 2003 19:46:09 -0000 2.143
--- dictobject.c 25 Apr 2003 22:33:24 -0000
***************
*** 532,538 ****
  	 * deleted).
  	 */
  	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! 		if (dictresize(mp, mp->ma_used*2) != 0)
  			return -1;
  	}
  	return 0;
--- 532,538 ----
  	 * deleted).
  	 */
  	if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! 		if (dictresize(mp, mp->ma_used*4) != 0)
  			return -1;
  	}

From martin@v.loewis.de Mon Apr 28 21:07:08 2003 From: martin@v.loewis.de (Martin v. Löwis) Date: 28 Apr 2003 22:07:08 +0200 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EAD2D38.3030906@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> Message-ID: <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> Chris Johns <cjohns@cybertec.com.au> writes: > Porting Python to the open source realtime OS called RTEMS I get a > compile error on line 2797 of socketmodule.c. In my copy, this is the line char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))]; Can you report more on the nature of the compile error (such as its *message*)? > Should this code check ENABLE_IPV6 as IPV6 is not supported on RTEMS yet. (assuming this is a question): I'm unsure. It should not cause a compile time failure, period. > Also where is INET_ADDRSTRLEN supposed to be defined? <netinet/in.h> Regards, Martin From glyph@twistedmatrix.com Mon Apr 28 21:49:27 2003 From: glyph@twistedmatrix.com (Glyph Lefkowitz) Date: Mon, 28 Apr 2003 15:49:27 -0500 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: <20030428160006.2359.60528.Mailman@mail.python.org> Message-ID: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> On Monday, April 28, 2003, at 11:00 AM, python-dev-request@python.org wrote: > Itamar> If this slowdown is confirmed, it is really not acceptable, > Itamar> since the change seems to have been made only to support > making > Itamar> timeout sockets slightly easier to use. > > It was done to support making timeout sockets work properly. 
As they > existed previously, timeout sockets wouldn't work with protocols which > would > most likely use them: higher level modules such as httplib, which call > sock.makefile(), then call readlines?() on the resulting file object. Clearly this is a flaw in httplib's design. Perhaps one should be able to pass in a socket or file factory? That would allow speaking HTTP over non-TCP transports or through something like a SOCKS proxy, which is arguably a good thing. Do you want to add SOCKS support by adding another wrapper around the socket module as well? How about a Python software firewall? Pretty soon our "correct" socket module will have 20 performance-destroying wrappers around it in order to work around deficiencies in the interfaces of some programs which use sockets. httplib imports a module where passing in a factory function would be the correct thing to do. At first it looks like you can parameterize it by hacking up a module, but you can only do that once or twice before the design problem really becomes pressing. The socket module is not a high-level interface to networking. Attempting to make it into one will harm its utility as a low-level interface that good high-level interfaces can be built on top of. > Itamar> Why should everyone have to pay a speed penalty just so a > Itamar> minority of people can skip calling a > Itamar> "socket.installtimeoutsupport()" at the beginning of their > Itamar> program? it's just one line of code they'd need to add. > > I think it would be easier for the minority of programs that care > about the > 20% performance loss to simply set I think this should be in the release notes for 2.3. "Python is 10% faster, unless you use sockets, in which case it is much, much slower. Do the following in order to regain lost performance and retain the same semantics:" I anticipate that more than just Twisted will want to monkey-patch the module. (A 20% drop in throughput is a significant issue to more than an eclectic audience.) 
If you're not going to fix this bug, maybe we could have a "socket.monkeypatch()" method which would prevent different systems from stepping on each other when they do it? > I don't know about you, but fast and incorrect don't help me much. Since when is the behavior of the socket module "incorrect"? If anything the interface to "timeout sockets" is incorrect, because BSD sockets do not in fact support timeouts. The interface is doing a bunch of things behind the user's back which would be better done another way, for example, with actually asynchronous networking. It's pretty likely that there is some obscure corner-case that the select() in timeout sockets doesn't catch. From a brief glance, internal_select ignores error return values, and nothing checks its errno before making another socket call. If I remember correctly, that means that if select gets an EINTR, the following call to accept() or recv() or what-have-you may very well block. Of course, since the socket is in non-blocking mode at this point, that means that Python will raise an exception on the EAGAIN EWOULDBLOCK error. This is pretty hard to write a test for. I could be wrong about this particular error, but in general if one wishes to be pedantic about "correctness", one must first check the result codes from one's C system calls. > Feel free to submit a patch which improves performance but maintains > proper behavior in the face of timeouts (that is, allows > test_urllibnet to still work correctly). Why is the Python development team introducing bugs into Python and then expecting the user community to fix things that used to work? I could understand not wanting to put a lot of effort into correcting obscure or difficult-to-find performance problems that only a few people care about, but the obvious thing to do in this case is simply to change the default behavior. 
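[The EINTR point above can be made concrete with a retry loop around select() that recomputes the remaining timeout on each pass. This is a sketch in Python of what the C-level internal_select could do; wait_readable is a made-up name, not anything in socketmodule.c.]

```python
import errno
import select
import time

def wait_readable(fd, timeout):
    """Wait until fd is readable, retrying select() if a signal
    interrupts it (EINTR), and shrinking the timeout on each retry
    so an interrupted wait cannot extend the total deadline.
    Returns True if readable, False on timeout. Sketch only."""
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        try:
            r, _, _ = select.select([fd], [], [], remaining)
            return bool(r)
        except (OSError, select.error) as e:
            # Only an interrupted call is retried; real errors propagate.
            if e.args[0] != errno.EINTR:
                raise
```

Without the errno check and the deadline recomputation, an EINTR from select() either raises a spurious exception or lets the following recv()/accept() run on a socket that is not actually ready, which is exactly the failure mode described above.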
From guido@python.org Mon Apr 28 21:58:54 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 16:58:54 -0400 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: Your message of "Mon, 28 Apr 2003 15:49:27 CDT." <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> References: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> Message-ID: <200304282058.h3SKwsj18824@odiug.zope.com> > I think this should be in the release notes for 2.3. "Python is 10% > faster, unless you use sockets, in which case it is much, much slower. > Do the following in order to regain lost performance and retain the > same semantics:" That is total bullshit, Glyph, and you know it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 28 22:02:53 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 17:02:53 -0400 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs In-Reply-To: Your message of "Mon, 28 Apr 2003 15:49:27 CDT." <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> References: <E6A9FE28-79BA-11D7-B13E-000393C9700E@twistedmatrix.com> Message-ID: <200304282102.h3SL2rW18842@odiug.zope.com> > Why is the Python development team introducing bugs into Python and > then expecting the user community to fix things that used to work? I resent your rhetoric, Glyph. Had you read the rest of this thread, you would have seen that the performance regression only happens for sending data at maximum speed over the loopback device, and is negligible when receiving e.g. data over a LAN. You would also have seen that I have already suggested two different simple fixes. 
> I could understand not wanting to put a lot of effort into > correcting obscure or difficult-to-find performance problems that > only a few people care about, but the obvious thing to do in this > case is simply to change the default behavior. It can and will be fixed. I just don't have the time to fix it myself. The functionality (of having timeouts work properly for streams created by socket.makefile()) is useful to have. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 28 23:06:48 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 18:06:48 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: Your message of "Mon, 28 Apr 2003 15:51:36 EDT." <001b01c30dbf$94363140$125ffea9@oemcomputer> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> Message-ID: <200304282206.h3SM6md20118@odiug.zope.com> > I've experimented with about a dozen ways to improve dictionary > performance and found one that benefits some programs by up to > 5% without hurting the performance of other programs by more > than a single percentage point. > > It entails a one line change to dictobject.c resulting in a new > schedule of dictionary sizes for a given number of entries: > > Number of Current size Proposed size > Filled Entries of dictionary of dictionary > -------------- ------------- ------------- > [-- 0 to 5 --] 8 8 > [-- 6 to 10 --] 16 32 > [-- 11 to 21 --] 32 32 > [-- 22 to 42 --] 64 128 > [-- 43 to 85 --] 128 128 > [-- 86 to 170 --] 256 512 > [-- 171 to 341 --] 512 512 I suppose there's an "and so on" here, right? I wonder if for *really* large dicts the space sacrifice isn't worth the time saved? > The idea is to lower the average sparseness of dictionaries (by > 0% to 50% of their current sparsenes). This results in fewer > collisions, faster collision resolution, fewer memory accesses, > and better cache performance. A small side-benefit is halving > the number of resize operations as the dictionary grows. 
I think you mean "raise the average sparseness" don't you? (The more sparse something is, the more gaps it has.) I tried the patch with my new favorite benchmark, startup time for Zope (which surely populates a lot of dicts :-). It did give about 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% speedup. --Guido van Rossum (home page: http://www.python.org/~guido/) From cjohns@cybertec.com.au Mon Apr 28 22:54:32 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Tue, 29 Apr 2003 07:54:32 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> Message-ID: <3EADA318.5010602@cybertec.com.au> Martin v. Löwis wrote: > Chris Johns <cjohns@cybertec.com.au> writes: > > >>Porting Python to the open source realtime OS called RTEMS I get a >>compile error on line 2797 of socketmodule.c. > > > In my copy, this is the line > > char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))]; ^^ I would assume the marked code is for IPV6 so needs to be protected by ENABLE_IPV6, for example:

#ifdef ENABLE_IPV6
	char packed[MAX(sizeof(struct in_addr), sizeof(struct in6_addr))];
#else
	char packed[sizeof(struct in_addr)];
#endif

> > Can you report more on the nature of the compile error (such as its > *message*)? > (I do not use the Python build system as I have to cross-compile and so use an automake makefile in a RISCOS type layout) Sure. The output is from gcc-3.2.3: m68k-rtems-gcc -DHAVE_CONFIG_H -I. 
-I../python-cvs/dist/src/RTEMS -I. -I../python-cvs/dist/src/RTEMS/../Include -I../python-cvs/dist/src/RTEMS/../Python -I/opt/rtems/m68k-rtems/lib/include -m5200 -O4 -g -DPLATFORM="\"RTEMS (m5200)\"" -c -o socketmodule.o `test -f '../python-cvs/dist/src/RTEMS/../Modules/socketmodule.c' || echo '../python-cvs/dist/src/RTEMS/'`../python-cvs/dist/src/RTEMS/../Modules/socketmodule.c

../python-cvs/dist/src/Modules/socketmodule.c: In function `socket_inet_pton':
../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c:2816: sizeof applied to an incomplete type
../python-cvs/dist/src/Modules/socketmodule.c: In function `socket_inet_ntop':
../python-cvs/dist/src/Modules/socketmodule.c:2835: `INET_ADDRSTRLEN' undeclared (first use in this function)
../python-cvs/dist/src/Modules/socketmodule.c:2835: (Each undeclared identifier is reported only once
../python-cvs/dist/src/Modules/socketmodule.c:2835: for each function it appears in.)
../python-cvs/dist/src/Modules/socketmodule.c:2835: `INET6_ADDRSTRLEN' undeclared (first use in this function)
../python-cvs/dist/src/Modules/socketmodule.c:2851: sizeof applied to an incomplete type

> >>Should this code check ENABLE_IPV6 as IPV6 is not supported on RTEMS yet. > > (assuming this is a question): I'm unsure. It should not cause a > compile time failure, period. > Sorry, it was a question. See above. > >>Also where is INET_ADDRSTRLEN supposed to be defined? > <netinet/in.h> > Thanks. The RTEMS TCP/IP stack is an old port of the FreeBSD stack and does not have this. The current FreeBSD does so I will fix RTEMS. I will not add INET6_ADDRSTRLEN as no other IPV6 support is currently provided. 
-- Chris Johns, cjohns at cybertec.com.au From martin@v.loewis.de Mon Apr 28 23:17:42 2003 From: martin@v.loewis.de ("Martin v. Löwis") Date: Tue, 29 Apr 2003 00:17:42 +0200 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADA318.5010602@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> Message-ID: <3EADA886.9020605@v.loewis.de> Chris Johns wrote: > ../python-cvs/dist/src/Modules/socketmodule.c:2797: sizeof applied to an > incomplete type I see. And the system does have inet_pton? *That* sounds like a bug to me - there should be no inet_pton if the IPv6 API is unsupported. So I think the configure test should be changed to define HAVE_PTON only if all prerequisites of its usage are met (or the entire function should be hidden if IPv6 is disabled). Regards, Martin From jack@performancedrivers.com Mon Apr 28 23:19:20 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Mon, 28 Apr 2003 18:19:20 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <200304282206.h3SM6md20118@odiug.zope.com>; from guido@python.org on Mon, Apr 28, 2003 at 06:06:48PM -0400 References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> Message-ID: <20030428181920.O15881@localhost.localdomain> > > I've experimented with about a dozen ways to improve dictionary > > performance and found one that benefits some programs by up to > > 5% without hurting the performance of other programs by more > > than a single percentage point. You wouldn't have created some handy tables of 'typical' dictionary usage, would you? They would be interesting in general, but very nice for the PEPs doing dict optimizations for symbol tables in particular. 
-jack From cjohns@cybertec.com.au Mon Apr 28 23:33:35 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Tue, 29 Apr 2003 08:33:35 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADA886.9020605@v.loewis.de> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> Message-ID: <3EADAC3F.6020802@cybertec.com.au> Martin v. Löwis wrote: > > I see. And the system does have inet_pton? *That* sounds like a bug to > me - there should be no inet_pton if the IPv6 API is unsupported. Agreed. I will disable them. > > So I think the configure test should be changed to define HAVE_PTON only > if all prerequisites of its usage are met (or the entire function should > be hidden if IPv6 is disabled). > It would make Python more robust, but this is a mistake on my part. Thanks for the help. -- Chris Johns, cjohns at cybertec.com.au From goodger@python.org Tue Apr 29 00:13:14 2003 From: goodger@python.org (David Goodger) Date: Mon, 28 Apr 2003 19:13:14 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 Message-ID: <3EADB58A.2030607@python.org> The following paragraph is from PEP 1, "PEP Work Flow" section: Once the authors have completed a PEP, they must inform the PEP editor that it is ready for review. PEPs are reviewed by the BDFL and his chosen consultants, who may accept or reject a PEP or send it back to the author(s) for revision. I propose adding the following text: ... The BDFL may also initiate a PEP review, first notifying the PEP author(s). In addition, I think it would be useful to add some text describing the PEP acceptance criteria. Something like the following: For a PEP to be accepted it must meet certain minimum criteria. It must be a clear description of the proposed enhancement. The enhancement must represent a net improvement. The implementation, if applicable, must be solid and must not complicate the interpreter unduly. 
Finally, a proposed enhancement must be "pythonic" in order to be accepted by the BDFL. (However, "pythonic" is an imprecise term; it may be defined as whatever is acceptable to the BDFL. This logic is intentionally circular.) See PEP 2 for standard library module acceptance criteria. Please comment. -- David Goodger <http://starship.python.net/~goodger> Python Enhancement Proposal (PEP) Editor <http://www.python.org/peps/> (Please cc: all PEP correspondence to <peps@python.org>.) From bkelly@sourcereview.net Tue Apr 29 00:25:57 2003 From: bkelly@sourcereview.net (Brett Kelly) Date: Mon, 28 Apr 2003 16:25:57 -0700 Subject: [Python-Dev] Introduction :) Message-ID: <20030428232557.GE21953@inkedmn.homelinux.org> Howdy folks, i'm new to this mailing list, thought i'd say hello and introduce myself. i'm Brett, i've been using python for about 2 years now, it was the first language i learned. I hope to learn more about python's advanced features (and gain a deeper understanding of OOP), as well as contribute in whatever small way i can. Anyway, hello! -- Brett Kelly bkelly@sourcereview.net This message was created using the Mutt mail agent and digitally signed using GnuPG. 
From tdelaney@avaya.com Tue Apr 29 00:45:21 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 09:45:21 +1000 Subject: [Python-Dev] Re: Python-Dev digest, Vol 1 #3221 - 4 msgs Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC128@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > Why is the Python development team introducing bugs into Python and > > then expecting the user community to fix things that used to work? > > I resent your rhetoric, Glyph. Had you read the rest of this thread, > you would have seen that the performance regression only happens for > sending data at maximum speed over the loopback device, and is > negligible when receiving e.g. data over a LAN. You would also have > seen that I have already suggested two different simple fixes. Indeed - the primary purpose of a beta is IMO to discover these issues by use in as great a number of scenarios as possible before the final release is made. I would be extremely surprised if this cannot be fixed before 2.3 final (in fact, I would be extremely surprised if such a known regression were allowed in 2.3 final). A beta should (excluding implementation bugs) have *correct* behaviour. Performance is not the #1 priority for a beta. 
Tim Delaney From tdelaney@avaya.com Tue Apr 29 00:51:47 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 09:51:47 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC12E@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > I've experimented with about a dozen ways to improve dictionary > > performance and found one that benefits some programs by up to > > 5% without hurting the performance of other programs by more > > than a single percentage point. > > > > It entails a one line change to dictobject.c resulting in a new > > schedule of dictionary sizes for a given number of entries:

> > Number of            Current size    Proposed size
> > Filled Entries       of dictionary   of dictionary
> > --------------       -------------   -------------
> > [--   0 to   5 --]         8               8
> > [--   6 to  10 --]        16              32
> > [--  11 to  21 --]        32              32
> > [--  22 to  42 --]        64             128
> > [--  43 to  85 --]       128             128
> > [--  86 to 170 --]       256             512
> > [-- 171 to 341 --]       512             512

> I suppose there's an "and so on" here, right? I wonder if for > *really* large dicts the space sacrifice isn't worth the time saved? What is the effect on peak memory usage over "average" programs? This might be a worthwhile speedup on small dicts (up to a TBD number of entries) but not worthwhile for large dicts. However, to add this capability in would of course add more code to a very common code path (additional test for current size to decide what factor to increase by). > I tried the patch with my new favorite benchmark, startup time for > Zope (which surely populates a lot of dicts :-). It did give about > 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% > speedup. Nice (in relative, not absolute terms). Do we have any numbers on memory usage during and after that period? 
Tim Delaney From tdelaney@avaya.com Tue Apr 29 01:10:39 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 10:10:39 +1000 Subject: [Python-Dev] Thoughts on -O Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> Was doing some thinking in the shower this morning, and came up with some ideas for specifying optimisation. These are currently quite nebulous thoughts ... We have the current situation: -O only removes asserts, -OO removes asserts and docstrings. I think this is an ideal time to revisit the purpose of -O for 2.4 or later. IMO the "vanilla" mode should be a "release" mode. Users should not have to use a command-line option to gain "release" optimisations such as asserts. I would propose that we have the following modes for python to work in.

1. Release/Production mode (no command-line switch)
 - asserts are turned off
 - well-tested/stable optimisations are included
 - possibly additional things, such as not calling trace functions

2. Optimised mode (-O)
 - more experimental optimisations are included i.e. those that may have performance improvements in some cases, but penalties in others, etc
 - may possibly split this up so individual optimisations can be turned on and off as required
 - this would leave -O by itself as a no-op

3. Docstring elimination mode (-OO)
 - may be specified in addition to optimised mode - it does not imply optimised mode

4. Debug mode (-D?)
 - will be the slowest mode - no optimisations - cannot be called with either -O or -OO
 - turns on asserts
 - turns on trace functions

I would see Debug mode being used by developers in unit tests, code coverage, etc. .pyc and .pyo files would need to know which optimisations they were compiled with so that if they would be loaded again with the "wrong" optimisations they would be re-compiled. Anyway, any thoughts, rebuttals, etc would be of interest. I'd like to get some discussion before I create a PEP. Cheers. 
Tim Delaney From drifty@alum.berkeley.edu Tue Apr 29 01:13:31 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Mon, 28 Apr 2003 17:13:31 -0700 (PDT) Subject: [Python-Dev] Introduction :) In-Reply-To: <20030428232557.GE21953@inkedmn.homelinux.org> References: <20030428232557.GE21953@inkedmn.homelinux.org> Message-ID: <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> [Brett Kelly] > Howdy folks, i'm new to this mailing list, thought i'd say hello and > introduce myself. > > i'm Brett, <snip> Does this mean I have to append my last initial to my name to differentiate? Or since I was here first can I just become the default "Brett" and make Brett K. have to use his initial? =) -Brett From python@rcn.com Tue Apr 29 00:50:14 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 19:50:14 -0400 Subject: [Python-Dev] Dictionary tuning References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <20030428181920.O15881@localhost.localdomain> Message-ID: <000201c30de7$01b5bd40$125ffea9@oemcomputer> [jack, master of the mac] > You wouldn't have created some handy tables of 'typical' dictionary > usage, would you? They would be interesting in general, but very nice > for the PEPs doing dict optimizations for symbol tables in particular. That path proved fruitless. I studied the usage patterns in about a dozen of my apps and found that there is no such thing as typical. Instead there are many categories of dictionary usage:

* attribute/method look-up in many small dictionaries
* uniquification apps with many redundant stores and few lookups
* membership testing with few stores and many lookups into small or large dicts
* database style lookups following Zipf's law for key access in large dicts
* graph explorers that access a few keys frequently and then move onto another set of related nodes
* global/builtin variable access following a failed search of locals
Almost every dictionary tune-up that helped one app would end up hurting another. The only way to test the effectiveness of a change was to time a whole suite of applications. The standard benchmarks were useless in this regard. Worse still, contrived test programs would not predict the results for real apps. There were several reasons for this: * there is a special case for handling dicts that only have string keys * real apps exhibit key access patterns that pull the most frequently accessed entries into the cache. This thwarted attempts to improve cache performance at the expense of more collisions. * any small, fixed set of test keys may have atypical collision anomalies, non-representative access frequencies, or not be characteristic of other dicts with a slightly different number of keys. * some sets of keys have non-random hash patterns but if you rely on this, it victimizes other sets of keys. * the results are platform dependent (ratio of processor speed to memory speed; size of cache; size of a cache line; cache associativity; write-back vs. write-through; etc). I had done some experiments that focused on symbol tables and had some luck with sequential searches into a self-organizing list. Using a list eliminated the holes and allowed more of the entries to fit in a single cache line. No placeholders were needed for deleted entries and that saves a test in the search loop. The self-organizing property kept the most frequently accessed entries at the head of the list. Using a sequential search had slightly less overhead than the hash table search pattern. Except for the builtin dictionary, most of the symbol tables in my apps have only a handful of entries.
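The self-organizing list described above can be sketched in Python (illustrative only; the real experiment was against the C symbol-table code, and `MoveToFrontTable` is a made-up name):

```python
class MoveToFrontTable:
    """Sequential-search symbol table. A successful lookup moves the
    entry to the head of the list, so frequently accessed entries stay
    near the front; a flat list also has no holes and needs no
    placeholders for deleted entries."""

    def __init__(self):
        self._entries = []  # list of [key, value] pairs

    def __setitem__(self, key, value):
        for pair in self._entries:
            if pair[0] == key:
                pair[1] = value
                return
        self._entries.insert(0, [key, value])

    def __getitem__(self, key):
        for i, pair in enumerate(self._entries):
            if pair[0] == key:
                if i:  # move-to-front on a hit
                    self._entries.insert(0, self._entries.pop(i))
                return pair[1]
        raise KeyError(key)
```

With a handful of entries and skewed access frequencies, the hot keys settle at the head and most searches terminate after one or two comparisons.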
if-only-i-had-had-a-single-valid-dict-performance-predictor-ly yours, Raymond Hettinger From python@rcn.com Tue Apr 29 01:09:15 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 20:09:15 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 References: <3EADB58A.2030607@python.org> Message-ID: <000301c30de7$021a7280$125ffea9@oemcomputer> > I propose adding the following text: > ... The BDFL may also initiate a PEP review, first notifying the > PEP author(s). Periodic updates to the parade-of-peps serve equally well. > For a PEP to be accepted it must meet certain minimum criteria. > It must be a clear description of the proposed enhancement. The > enhancement must represent a net improvement. The implementation, > if applicable, must be solid and must not complicate the > interpreter unduly. Finally, a proposed enhancement must be > "pythonic" in order to be accepted by the BDFL. (However, > "pythonic" is an imprecise term; it may be defined as whatever is > acceptable to the BDFL. This logic is intentionally circular.) Peps can go through a lot of stages before they get to this point. That can include having other peps explore other options; refinements to the idea, etc. From these proposals and the announcement earlier this week, I sense a desire to have fewer peps and to more rapidly get them out of the draft status. In general, I don't think this is a good idea. If someone wants to do a write-up and weather the ensuing firestorm, that is enough for me. If it has to sit for a few years before becoming obviously good or bad, that's fine too. Also, some ideas need time. My generator attributes idea had no chance for Py2.3. After people spend a year or so using generators, they might collectively begin to see a need for it. Also, someone may be able to help express the rationale more clearly. As written, the rationale would result in instant death for the pep.
After a pep dies, it becomes a permanent impediment for similar ideas even if someone comes up with better use cases or a slightly improved implementation. The first time I proposed something like a DictMixin class, it was violently shot down. A few months later, I had an improved version and those with a long memory immediately pointed out, "hey, that was shot down". After one more round, it was accepted, the alpha reviewers loved it, and it got applied throughout the library. Early rejection of peps will doom some useful ideas before they have a fighting chance. The authors can read the parade of peps and adapt or withdraw as appropriate. IOW, I like the process as it stands and am -1 on the amendment. It should be up to the pep author to decide when to stick his head in the guillotine to see what happens :) Raymond Hettinger "Theories have four stages of acceptance: i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view. iii) this is true but quite unimportant. iv) I always said so." - J.B.S. Haldane, 1963 "All great truths began as blasphemies" - George Bernard Shaw From python@rcn.com Tue Apr 29 01:20:35 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 20:20:35 -0400 Subject: [Python-Dev] Dictionary tuning References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC12E@au3010avexu1.global.avaya.com> Message-ID: <000501c30de7$029f32e0$125ffea9@oemcomputer> > What is the effect on peak memory usage over "average" programs? Since the amortized growth is the same, the effect is nil on average. Special cases can be contrived with a specific number of records where the memory use is doubled, but in general it is nearly unchanged for average programs. > This might be a worthwhile speedup on small dicts (up to a TBD > number of entries) but not worthwhile for large dicts. Actually, it helps large dictionaries even more than small dictionaries.
Collisions in large dicts are resolved through other memory probes which are almost certain not to be in the current cache line. > However, to add this capability in would of course add more code > to a very common code path (additional test for current size to > decide what factor to increase by). Your intuition is exactly correct. All experiments to special-case various sizes resulted in decreased performance because they added a tiny amount to some of the most heavily exercised code in Python. Further, it results in an unpredictable branch which is also not a good thing. [GvR] > I tried the patch with my new favorite benchmark, startup time for > Zope (which surely populates a lot of dicts :-). It did give about > 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% > speedup. [Tim] > Nice (in relative, not absolute terms). Do we have any numbers on > memory usage during and after that period? I found out that timing dict performance was hard. Capturing memory usage was harder. Of entry space, space plus unused space, calls to PyMalloc, and calls to the OS malloc, only the last is important, but it depends on all kinds of things that are not easily controlled. From guido@python.org Tue Apr 29 01:43:00 2003 From: guido@python.org (Guido van Rossum) Date: Mon, 28 Apr 2003 20:43:00 -0400 Subject: [Python-Dev] Thoughts on -O In-Reply-To: "Your message of Tue, 29 Apr 2003 10:10:39 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com> Message-ID: <200304290043.h3T0h0R16884@pcp02138704pcs.reston01.va.comcast.net> > Was doing some thinking in the shower this morning, and came up with > some ideas for specifying optimisation. These are currently quite > nebulous thoughts ...
> > We have the current situation: > > -O only removes asserts It may do some more, but not much more now that SET_LINENO is never generated. > -OO removes asserts and docstrings. > > I think this is an ideal time to revisit the purpose of -O for 2.4 or later. Hm, I would think we can wait until after 2.3 is released, lest we be tempted to "push one more feature into 2.3". > IMO the "vanilla" mode should be a "release" mode. Users should not > have to use a command-line option to gain "release" optimisations > such as asserts. I strongly disagree, and I expect most Python users would. I think this idea of a default harks back to the time when computers were slow and you would put on your special debugging hat only when you had a problem you couldn't solve by thinking about it. These days, often you don't care about the small gain in speed that -O or even -OO offers, because the program runs fast enough; but often you *do* care about the extra checks that assert offers. (I know I do.) > I would propose that we have the following modes for python to work in. > > 1. Release/Production mode (no command-line switch) > > - asserts are turned off > - well-tested/stable optimisations are included > - possibly additional things, such as not calling trace functions > > 2. Optimised mode (-O) > > - more experimental optimisations are included i.e. those that may > have performance improvements in some cases, but penalties in > others, etc > > - may possibly split this up so individual optimisations can be > turned on and off as required - this would leave -O by itself as > a no-op > > 3. Docstring elimination mode (-OO) > > - may be specified in addition to optimised mode - it does not > imply optimised mode > > 4. Debug mode (-D?) > > - will be the slowest mode - no optimisations - cannot be called > with either -O or -OO > - turns on asserts > - turns on trace functions > > I would see Debug mode being used by developers in unit tests, code > coverage, etc. 
If I'm right about how Python is used, most Python users are in debug mode most of the time. So this ought to be the default. > .pyc and .pyo files would need to know which optimisations they were > compiled with so that if they would be loaded again with the "wrong" > optimisations they would be re-compiled. That's what the difference between .pyc and .pyo was intended to convey; IMO this was a mistake. > Anyway, any thoughts, rebuttals, etc would be of interest. I'd like > to get some discussion before I create a PEP. I'm not convinced that we need anything, given the minimal effect of most currently available optimizations. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Tue Apr 29 01:57:50 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 28 Apr 2003 20:57:50 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 Message-ID: <20030429005750.GA17963@panix.com> On Mon, Apr 28, 2003, Raymond Hettinger wrote: > > From these proposals and the announcement earlier this week, > I sense a desire to have fewer peps and to more rapidly get > them out of the draft status. There's some truth to that.
OTOH, until the BDFL declares something to be an ex-PEP, I don't think BDFL rejection of a PEP means that it is forever dead -- it just requires substantial revision to resurrect it. The point of PEPs is to prevent rehashing of old subjects in the same way, not to prevent new ideas from restarting discussions. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 93 From aahz@pythoncraft.com Tue Apr 29 01:58:37 2003 From: aahz@pythoncraft.com (Aahz) Date: Mon, 28 Apr 2003 20:58:37 -0400 Subject: [Python-Dev] Introduction :) In-Reply-To: <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> References: <20030428232557.GE21953@inkedmn.homelinux.org> <Pine.SOL.4.55.0304281712001.14187@death.OCF.Berkeley.EDU> Message-ID: <20030429005837.GB17963@panix.com> On Mon, Apr 28, 2003, Brett Cannon wrote: > [Brett Kelly] >> >> Howdy folks, i'm new to this mailing list, thought i'd say hello and >> introduce myself. >> >> i'm Brett, > > Does this mean I have to append my last initial to my name to > differentiate? Or since I was here first can I just become the default > "Brett" and make Brett K. have to use his initial? =) "Explicit is better than implicit." ;-) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it."
--Tim Peters on Python, 16 Sep 93 From tdelaney@avaya.com Tue Apr 29 01:59:41 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 10:59:41 +1000 Subject: [Python-Dev] Thoughts on -O Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC167@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > > -OO removes asserts and docstrings. > > > > I think this is an ideal time to revisit the purpose of -O > > for 2.4 or later. > > Hm, I would think we can wait until after 2.3 is released, lest we be > tempted to "push one more feature into 2.3". I have absolutely *no* intention of pushing any of this for 2.3. Good lord no. For a start, these would be major feature changes ... > > IMO the "vanilla" mode should be a "release" mode. Users should not > > have to use a command-line option to gain "release" optimisations > > such as asserts. > > I strongly disagree, and I expect most Python users would. I think > this idea of a default harks back to the time when computers were slow > and you would put on your special debugging hat only when you had a > problem you couldn't solve by thinking about it. These days, often > you don't care about the small gain in speed that -O or even -OO > offers, because the program runs fast enough; but often you *do* care > about the extra checks that assert offers. (I know I do.) True. I'm ambivalent about that myself. But in that case, I would argue instead that there should not be any option to remove asserts. > > .pyc and .pyo files would need to know which optimisations they were > > compiled with so that if they would be loaded again with the "wrong" > > optimisations they would be re-compiled. > > That's what the difference between .pyc and .pyo was intended to > convey; IMO this was a mistake. Yep - I know this. I would actually suggest removing .pyo and simply have the info held in the .pyc. > > Anyway, any thoughts, rebuttals, etc would be of interest.
I'd like > > to get some discussion before I create a PEP. > > I'm not convinced that we need anything, given the minimal effect of > most currently available optimizations. One of my options is to create a PEP specifically to have it rejected. However, I think there are definitely a couple of useful things in here. In particular, it provides a path for introducing optimisations. One of the complaints I have seen recently is that all optimisations are being added to both paths. Perhaps this could be reduced to a process PEP with the following major points: 1. Any new optimisation must be introduced on the optimised path. 2. Optimisations may be promoted from the optimised path to the vanilla path at BDFL discretion. 3. Experimental optimisations in general will require at least one complete release before being promoted from the optimised path to the vanilla path. Tim Delaney From tim.one@comcast.net Tue Apr 29 02:49:52 2003 From: tim.one@comcast.net (Tim Peters) Date: Mon, 28 Apr 2003 21:49:52 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <000301c30de8$5815bfe0$125ffea9@oemcomputer> Message-ID: <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> [Tim Delaney] >> What is the effect on peak memory usage over "average" programs? [Raymond Hettinger] > Since the amortized growth is the same, the effect is nil on average. > Special cases can be contrived with a specific number of records > where the memory use is doubled, but in general it is nearly unchanged > for average programs. That doesn't make sense. Dicts can be larger after the patch, but never smaller, so there's nothing opposing the "can be larger" part: on average, allocated address space must be strictly larger than before. Whether that *matters* on average to the average user is something we can answer rigorously just as soon as we find an average user with an average program <wink>. I'm not inclined to worry much about it.
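Tim's point that the patch can only ever allocate at least as much address space can be checked with a quick simulation (a sketch under assumed 2.3-era dictobject.c constants: minimum table size 8, resize when 2/3 full, new size the smallest power of two greater than used*factor):

```python
MINSIZE = 8

def allocated(n, factor):
    """Table slots allocated after n sequential inserts."""
    size, used = MINSIZE, 0
    for _ in range(n):
        used += 1
        if used * 3 >= size * 2:          # the 2/3 fill threshold
            size = MINSIZE
            while size <= used * factor:  # smallest power of two > used*factor
                size <<= 1
    return size

# Quadrupling the target (the patch) never allocates fewer slots
# than doubling (the status quo), for any number of inserts.
assert all(allocated(n, 4) >= allocated(n, 2) for n in range(1, 2000))
```

The two growth policies re-synchronize periodically (both land on the same power of two), so the quadrupled table is sometimes equal to and sometimes double the old one, but never smaller.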
>> This might be a worthwhile speedup on small dicts (up to a TBD >> number of entries) but not worthwhile for large dicts. > Actually, it helps large dictionaries even more than small dictionaries. > Collisions in large dicts are resolved through other memory probes > which are almost certain not to be in the current cache line. That part makes sense. Resizing a large dict is an expensive operation too. >> However, to add this capability in would of course add more code >> to a very common code path (additional test for current size to >> decide what factor to increase by). > Your intuition is exactly correct. All experiments to special-case > various sizes resulted in decreased performance because they added > a tiny amount to some of the most heavily exercised code in > Python. This part isn't clear: the changed code is in the body of an if() block that normally *isn't* entered (over an ever-growing dict's life, it's entered O(log(len(dict))) times, and independent of the number of dict lookups). The change cuts the number of times it's entered by approximately a factor of 2, but it isn't entered often even now. > Further, it results in an unpredictable branch which is > also not a good thing. Since the body of the loop isn't entered often, unpredictable one-shot branches within the body shouldn't have a measurable effect. The unpredictable branches when physically resizing the dict will swamp them regardless. The surrounding if-test continues to be predictable in the "branch taken" direction. What could be much worse is that stuffing code into the if-block bloats the code so much as to frustrate lookahead I-stream caching of the normal "branch taken and return 0" path: if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) { if (dictresize(mp, mp->ma_used*2) != 0) return -1; } return 0; Rewriting as if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) return 0; return dictresize(mp, mp->ma_used*2) ?
-1 : 0; would help some compilers generate better code for the expected path, and especially if the blob after "return 0;" got hairier. IOW, if fiddling with different growth factors at different sizes slowed things down, we have to look for something that affected the *normal* paths; it's hard to imagine that the guts of the if-block execute often enough to matter (discounting its call to dictresize(), which is an expensive routine). > I found out that timing dict performance was hard. > Capturing memory usage was harder. Of entry > space, space plus unused space, calls to PyMalloc, and > calls to the OS malloc, only the last is important, but > it depends on all kinds of things that are not easily > controlled. In my early Cray days, the Cray boxes were batch one-job-at-a-time, and all memory was real. If you had a CPU-bound program, it took the same number of nanoseconds each time you ran it. Benchmarking was hard then too <0.5 wink>. From tdelaney@avaya.com Tue Apr 29 03:40:41 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 12:40:41 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com> > From: Tim Peters [mailto:tim.one@comcast.net] > > That doesn't make sense. Dicts can be larger after the > patch, but never > smaller, so there's nothing opposing the "can be larger" > part: on average, > allocated address space must be strictly larger than before. > Whether that > *matters* on average to the average user is something we can answer > rigorously just as soon as we find an average user with an > average program > <wink>. I'm not inclined to worry much about it. That's what I was getting at. I know that (for example) most classes I create have less than 16 entries in their __dict__. With this change, each class instance would take (approx) twice as much memory for its __dict__.
I suspect that class instance __dict__ is the most common dictionary I use. > >> This might be a worthwhile speedup on small dicts (up to a TBD > >> number of entries) but not worthwhile for large dicts. > > > Actually, it helps large dictionaries even more than small > dictionaries. > > Collisions in large dicts are resolved through other memory probes > > which are almost certain not to be in the current cache line. > > That part makes sense. Resizing a large dict is an expensive > operation too. That's not what I meant. Most dictionaries are fairly small. Large dictionaries are common, but I doubt they are common enough to offset the potential memory loss from this patch. Currently if you go one over a threshold you have a capacity of 2*len(d)-1. With the patch this would change to 4*len(d)-1 - very significant for large dictionaries. Thus my consideration that it might be worthwhile for smaller dictionaries (depending on memory characteristics) but not for large dictionaries. Perhaps we need to add some internal profiling, so that "quickly-growing" dictionaries get larger reallocations ;) > Since the body of the loop isn't entered often, unpredictable one-shot > branches within the body shouldn't have a measurable effect. The > unpredictable branches when physically resizing the dict will > swamp them > regardless. The surrounding if-test continues to be > predictable in the > "branch taken" direction. I didn't look at the surrounding code (bad Tim D - thwack!) but in this case I would not expect an appreciable performance loss from this. However, the fact that we're getting an appreciable performance *gain* from changes on this branch suggests that it might be slightly more vulnerable than expected (but should still be swamped by the resize).
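The capacity arithmetic above can be made concrete (a sketch; `size_after` is a made-up helper assuming the smallest-power-of-two-above-`used*factor` policy of dictresize(), with minimum size 8):

```python
MINSIZE = 8

def size_after(used, factor):
    """Table size chosen when a dict with `used` keys crosses the
    2/3-fill threshold and is resized to hold used*factor entries."""
    size = MINSIZE
    while size <= used * factor:
        size <<= 1
    return size

# A small "class instance" dict that has just crossed its first
# threshold (6 keys in an 8-slot table): doubling grows it to 16
# slots, quadrupling to 32 -- roughly the 2*len(d) vs 4*len(d)
# capacities being debated.
assert size_after(6, 2) == 16
assert size_after(6, 4) == 32
```

For a large dict the ratio is the same, which is why the absolute number of slots at stake grows with the table.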
> What could be much worse is that stuffing code into the > if-block bloats the > code so much as to frustrate lookahead I-stream caching of the normal > "branch taken and return 0" path: > > if (mp->ma_used > n_used && mp->ma_fill*3 >= > (mp->ma_mask+1)*2) { > if (dictresize(mp, mp->ma_used*2) != 0) > return -1; > } > return 0; > > Rewriting as > > if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) > return 0; > > return dictresize(mp, mp->ma_used*2) ? -1 : 0; > > would help some compilers generate better code for the > expected path, and > especially if the blob after "return 0;" got hairier. I find that considerably easier to read in any case ;) Cheers. Tim Delaney From python@rcn.com Tue Apr 29 04:15:50 2003 From: python@rcn.com (Raymond Hettinger) Date: Mon, 28 Apr 2003 23:15:50 -0400 Subject: [Python-Dev] Dictionary tuning References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com> Message-ID: <003b01c30dfd$a2d28b20$920aa044@oemcomputer> [Tim Peters] > Dicts can be larger after the > patch, but never > smaller, so there's nothing opposing the "can be larger" > part: on average, allocated address space must be strictly larger than before. I think of the resize intervals as steps on a staircase. My patch eliminates the even numbered stairs. The average logarithmic slope of the staircase doesn't change, there are just fewer discrete steps. Also, the height of the staircase doesn't change unless the top stair was even, in which case, another half step is added. [Tim Peters] > Resizing a large dict is an expensive operation too. Not only are there fewer resizes, but the cost of the operation becomes cheaper because it takes less time to load a sparse dictionary than one that is more dense. [Tim Peters] > Whether that *matters* on average to the average user is something > we can answer > rigorously just as soon as we find an average user with an > average program > <wink>.
I'm not inclined to worry much about it. Me either; I suspect that it is rare to find a stable application that is functioning just fine and consuming nearly all memory. Sooner or later, some change in data, hardware, OS, or script would push it over the edge. [Timothy Delaney] > That's what I was getting at. I know that (for example) most > classes I create have less than 16 entries in their __dict__. > With this change, each class instance would take (approx) twice > as much memory for its __dict__. I suspect that class instance > __dict__ is the most common dictionary I use. Those dicts would also be the ones benefitting from the patch. Their density would be halved; resulting in fewer collisions, improved search times, and better cache performance. [Timothy Delaney] > Perhaps we need to add some internal profiling, so that > "quickly-growing" dictionaries get larger reallocations ;) I came up with this patch a couple of months ago and have since tried every tweak I could think of (apply to this size dict but not that one, etc) but found nothing that survived a battery of application benchmarks. Have you guys tried out the patch? I'm very interested in getting results from different benchmarks, processors, cache sizes, and various operating systems. sparse-is-better-than-dense-ly yours, Raymond (currently, the only one.
unlike two Tims, two Bretts, two Jacks and a Fredrik distinct from Fred) ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# From tdelaney@avaya.com Tue Apr 29 04:34:06 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Tue, 29 Apr 2003 13:34:06 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1C5@au3010avexu1.global.avaya.com> > From: Raymond Hettinger [mailto:python@rcn.com] >=20 > [Timothy Delaney] > > That's what I was getting at. I know that (for example) most > > classes I create have less that 16 entries in their __dict__. > > With this change, each class instance would take (approx) twice > > as much memory for its __dict__. I suspect that class instance > > __dict__ is the most common dictionary I use. >=20 > Those dicts would also be the ones benefitting from the patch. > Their density would be halved; resulting in fewer collisions, > improved search times, and better cache performance. No question that they would benefit. The question is whether the benefit outweighs the possible penalties. Of course, we can't evaluate that until we've got some data ... > [Timothy Delaney] > > Perhaps we need to add some internal profiling, so that > > "quickly-growing" dictionaries get larger reallocations ;) >=20 > I came up with this patch a couple of months ago and have > since tried every tweak I could think of (apply to this size > dict but not that one, etc) but found nothing that survived > a battery of application benchmarks. Note the smiley. 
Not at all intended seriously - the extra complication would almost certainly eliminate any possible performance gains.

> Have you guys tried out the patch? I'm very interested in
> getting results from different benchmarks, processors,
> cache sizes, and various operating systems.

If I can find the time I will. We're in crunch time on my project at the moment ... I'm somewhat over-allocated :(

In case I forgot to mention it - I like the ideas in the patch, and really like the performance improvement. But with the things I'm doing at the moment, memory is proving more of a bottleneck than performance ... once we start hitting virtual memory, you can forget about a 5% performance improvement ...

Tim Delaney

From goodger@python.org Tue Apr 29 04:53:22 2003
From: goodger@python.org (David Goodger)
Date: Mon, 28 Apr 2003 23:53:22 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
In-Reply-To: <000101c30de8$57758840$125ffea9@oemcomputer>
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer>
Message-ID: <3EADF732.7020300@python.org>

[David Goodger]
>> I propose adding the following text:
>>
>> ... The BDFL may also initiate a PEP review, first notifying the
>> PEP author(s).

[Raymond Hettinger]
> Periodic updates to the parade-of-peps serve equally well.

Except that Guido doesn't have time to update the PEP Parade. He told me so when I asked a few days ago.

> From these proposals and the announcement earlier this week,
> I sense a desire to have fewer PEPs and to more rapidly get
> them out of the draft status.

Not quite. The desire is not to cull the weak, but to promote the strong. The desire is to change already-implemented and implicitly-accepted PEPs from "Status: Draft" to "Status: Accepted" or "Status: Final". See the "Accepted PEPs?" thread from a few days ago; 9 "Draft" but already-implemented-for-2.3 PEPs were identified.
Their status lines ought to be changed, but the wording as written implies that Guido and the PEP editors have to wait for authors to ask for a review. We should be able to be more proactive. New proposed addition:

... For PEPs that are pre-determined to be acceptable (e.g., their implementation has already been checked in), the BDFL may also initiate a PEP review, first notifying the PEP author(s) and giving them a chance to make revisions.

It is implied that Guido himself doesn't necessarily do all the notifying or initiating, but may delegate to his loyal serfs. ;-)

> If someone wants
> to do a write-up and weather the ensuing firestorm, that is
> enough for me. If it has to sit for a few years before becoming
> obviously good or bad, that's fine too.
>
> Also, some ideas need time.

Good points; I agree completely. I have no problem leaving doomed (or currently perceived as doomed) PEPs to remain in limbo until the author(s) choose to seal their fate.

>> For a PEP to be accepted it must meet certain minimum criteria. It
>> must be a clear description of the proposed enhancement. The
>> enhancement must represent a net improvement. The implementation,
>> if applicable, must be solid and must not complicate the
>> interpreter unduly. Finally, a proposed enhancement must be
>> "pythonic" in order to be accepted by the BDFL. (However,
>> "pythonic" is an imprecise term; it may be defined as whatever is
>> acceptable to the BDFL. This logic is intentionally circular.)

Clarification: this paragraph addresses a completely separate issue from the proposed addition above. I have sensed some confusion as to what constitutes an acceptable PEP, and a hand-waving blurb giving a vague definition seems useful. Of course, it would be great if we could make the text more precise, but vagueness may have value here. Comments on the wording are welcome.

> IOW, I like the process as it stands and am -1 on the
> amendment.
> It should be up to the PEP author to
> decide when to stick his head in the guillotine to
> see what happens :)

What's your opinion now, post-clarifications? Please treat the two parts separately.

--
David Goodger

From python@rcn.com Tue Apr 29 05:03:43 2003
From: python@rcn.com (Raymond Hettinger)
Date: Tue, 29 Apr 2003 00:03:43 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org>
Message-ID: <000b01c30e04$53ae5d60$920aa044@oemcomputer>

> [David Goodger]
> The desire is not to cull the weak, but to promote the
> strong. The desire is to change already-implemented and
> implicitly-accepted PEPs from "Status: Draft" to "Status: Accepted"
> or "Status: Final".

That's a good goal.

> Good points; I agree completely. I have no problem leaving doomed (or
> currently perceived as doomed) PEPs to remain in limbo until the
> author(s) choose to seal their fate.

Great. I have one of those ;)

> >> For a PEP to be accepted it must meet certain minimum criteria. It
> >> must be a clear description of the proposed enhancement. The
> >> enhancement must represent a net improvement. The implementation,
> >> if applicable, must be solid and must not complicate the
> >> interpreter unduly. Finally, a proposed enhancement must be
> >> "pythonic" in order to be accepted by the BDFL. (However,
> >> "pythonic" is an imprecise term; it may be defined as whatever is
> >> acceptable to the BDFL. This logic is intentionally circular.)
>
> Clarification: this paragraph addresses a completely separate issue from
> the proposed addition above. I have sensed some confusion as to what
> constitutes an acceptable PEP, and a hand-waving blurb giving a vague
> definition seems useful.

That's reasonable. I'm not sure it would have filtered out anything except an April Fools PEP.

> What's your opinion now, post-clarifications? Please treat the two
> parts separately.
+1

+0

BTW, thanks for your work as PEP editor. Keep it up,

Raymond Hettinger

From goodger@python.org Tue Apr 29 05:14:29 2003
From: goodger@python.org (David Goodger)
Date: Tue, 29 Apr 2003 00:14:29 -0400
Subject: [Python-Dev] proposed amendments to PEP 1
In-Reply-To: <000b01c30e04$53ae5d60$920aa044@oemcomputer>
References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> <000b01c30e04$53ae5d60$920aa044@oemcomputer>
Message-ID: <3EADFC25.7000906@python.org>

Raymond Hettinger wrote:
> I'm not sure it would have filtered out anything
> except an April Fools PEP.

That one was its own reward. :-)

--
David Goodger

From tim.one@comcast.net Tue Apr 29 05:22:26 2003
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 29 Apr 2003 00:22:26 -0400
Subject: [Python-Dev] Dictionary tuning
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC1A4@au3010avexu1.global.avaya.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHDEEAB.tim.one@comcast.net>

[Delaney, Timothy C]
> That's what I was getting at. I know that (for example) most
> classes I create have less than 16 entries in their __dict__.
> With this change, each class instance would take (approx) twice
> as much memory for its __dict__. I suspect that class instance
> __dict__ is the most common dictionary I use.

Do they have fewer than 6 entries? Dicts with 5 or fewer entries don't change size at all (an "empty dict" comes with room for 5 entries).

Surprise <wink>: in many apps, the most frequent use is dicts created to hold keyword arguments at call sites. This is under the covers so you're not normally aware of it. Those almost always hold less than 6 entries; except in apps where they don't. But they're usually short-lived too (not surviving the function call they're created for).

> That's not what I meant. Most dictionaries are fairly small.
> Large dictionaries are common, but I doubt they are common enough
> to offset the potential memory loss from this patch.
> Currently if
> you go one over a threshold you have a capacity of 2*len(d)-1.

Two-thirds of which is empty space right after resizing, BTW.

> With the patch this would change to 4*len(d)-1 - very significant
> for large dictionaries.

I don't know that it is. One dict slot consumes 12 bytes on 32-bit boxes, and slots are allocated contiguously so there's no hidden malloc overhead per slot. I hope a dict with a million slots counts as large, but that's "only" 12MB for slot space. When it gets too large to fit in RAM, that's deadly to performance; I've reached that point many times in experimental code, but those were lazy algorithms to an extreme. So I'm more worried about apps with several large dicts than about apps with one huge dict.

> ...
> I didn't look at the surrounding code (bad Tim D - thwack!) but
> in this case I would not expect an appreciable performance loss
> from this. However, the fact that we're getting an appreciable
> performance *gain* from changes on this branch suggests that it
> might be slightly more vulnerable than expected (but should still be
> swamped by the resize).

There's always more than one effect from a change. Raymond explained that large dict performance is boosted due to fewer collisions, and that makes perfect sense (every probe in a large dict is likely to be a cache miss). It doesn't make sense that fiddling the code inside the if-block slows anything, unless perhaps it's an unfortunate I-stream cache effect slowing the normal (if-block not entered) case. When you're looking at out-of-cache code, second- and third-order causes are often the whole story.
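Tim's 12MB figure is quick arithmetic to verify. A sketch (the 12 bytes assume the 32-bit slot layout he describes; the helper name is made up):

```python
SLOT_BYTES = 12  # one dict slot on a 32-bit box: hash + key + value pointers

def slot_space_mb(n_slots):
    """Slot-table size in MB for a dict with n_slots slots."""
    return n_slots * SLOT_BYTES / float(2 ** 20)

# 2**20 (about a million) slots cost 12 MB of slot space; if the *4
# schedule leaves such a dict with twice the slots, that is 24 MB.
print(slot_space_mb(2 ** 20), slot_space_mb(2 ** 21))
```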
From greg@cosc.canterbury.ac.nz Tue Apr 29 05:37:15 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Apr 2003 16:37:15 +1200 (NZST) Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304251948.26774.fincher.8@osu.edu> Message-ID: <200304290437.h3T4bFl09594@oma.cosc.canterbury.ac.nz> Jeremy Fincher <fincher.8@osu.edu>: > It's a minor quibble to be sure, but os.walk doesn't really describe what > exactly it's doing. How about os.walkdir (by analogy with os.listdir). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From jack@performancedrivers.com Tue Apr 29 05:36:36 2003 From: jack@performancedrivers.com (Jack Diederich) Date: Tue, 29 Apr 2003 00:36:36 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <000001c30de8$57219be0$125ffea9@oemcomputer>; from python@rcn.com on Mon, Apr 28, 2003 at 07:50:14PM -0400 References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <20030428181920.O15881@localhost.localdomain> <000001c30de8$57219be0$125ffea9@oemcomputer> Message-ID: <20030429003636.Q15881@localhost.localdomain> On Mon, Apr 28, 2003 at 07:50:14PM -0400, Raymond Hettinger wrote: > [jackdiederich] > > You wouldn't have some created some handy tables of 'typical' dictionary > > usage, would you? They would be interesting in general, but very nice > > for the PEPs doing dict optimizations for symbol tables in particular. > > That path proved fruitless. I studied the usage patterns in about > a dozen of my apps and found that there is no such thing as typical. > Instead there are many categories of dictionary usage. 
[symbol table amongst them] A good proj would be breaking out the particular cases of dictionary usage and using the right dict for the right job. Module symbol tables are dicts that have a different 'typical' usage than dicts in general. They are likely even regular enough in usage to actually _have_ a typical usage (no finger quotes). I've looked at aliasing dicts used in symbol (builtin, module, local) tables so they could be specialized from generic dicts in the source and I get lost in the nuances (esp frame stuff). If someone who did know the code well enough would make the effort it would allow those of us who are familiar but not intimate with the source to take a shot at optimizing a particular use case (symbol table dicts). Alas, people who are that familiar aren't likely to do it, they have more important things to do. -jackdied ps, I've always wanted to try ternary trees as symbol tables. They have worse than O(1) lookup, but in real life are probably OK for symbol tables. They nest beautifully and do cascading caching decently. From tim.one@comcast.net Tue Apr 29 05:57:58 2003 From: tim.one@comcast.net (Tim Peters) Date: Tue, 29 Apr 2003 00:57:58 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304290437.h3T4bFl09594@oma.cosc.canterbury.ac.nz> Message-ID: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> [Greg Ewing] > How about os.walkdir (by analogy with os.listdir). I'm -0 on bothering to change the name, but, if we have to, I'm +1 on walkdir (for the reason Greg gives there). From martin@v.loewis.de Tue Apr 29 06:56:51 2003 From: martin@v.loewis.de (Martin v. 
Löwis)
Date: 29 Apr 2003 07:56:51 +0200
Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled
In-Reply-To: <3EADAC3F.6020802@cybertec.com.au>
References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> <3EADAC3F.6020802@cybertec.com.au>
Message-ID: <m3ade9alss.fsf@mira.informatik.hu-berlin.de>

Chris Johns <cjohns@cybertec.com.au> writes:

> > So I think the configure test should be changed to define HAVE_PTON
> > only if all prerequisites of its usage are met (or the entire
> > function should be hidden if IPv6 is disabled).
>
> It would make Python more robust, but this is a mistake on my part.

It's a trade-off between maintainability and robustness, and in this specific case, we favoured maintainability over robustness: We simply assume that the code ought to compile on all systems that have pton(3). It might be that this assumption is wrong. If so, we need to consider whether we want to support the systems for which it is wrong, in which case my proposal would be to strengthen the pton test (thus ignoring the buggy pton from the platform).

In this case, I read your message that it really is your fault and not the system's (for hand-editing pyconfig.h); if you did indeed run autoconf to determine presence of pton, I'd encourage you to contribute a patch that analyses pton in more detail.

Regards,
Martin

From dberlin@dberlin.org Tue Apr 29 07:48:01 2003
From: dberlin@dberlin.org (Daniel Berlin)
Date: Tue, 29 Apr 2003 02:48:01 -0400
Subject: [Python-Dev] Thoughts on -O
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC167@au3010avexu1.global.avaya.com>
Message-ID: <854D4DF4-7A0E-11D7-9180-000A95A34564@dberlin.org>

>
> Yep - I know this. I would actually suggest removing .pyo and simply
> have the info held in the .pyc.
>
> >>> Anyway, any thoughts, rebuttals, etc would be of interest.
I'd like >>> to get some discussion before I create a PEP. >> >> I'm not convinced that we need anything, given the minimal effect of >> most currently available optimizations. > > One of my options is to create a PEP specifically to have it rejected. > > However, I think there are definitely a couple of useful things in > here. In particular, it provides a path for introducing optimisations. > One of the complaints I have seen recently is that all optimisations > are being added to both paths. > > Perhaps this could be reduced to a process PEP with the following > major points: > > 1. Any new optimisation must be introduced on the optimised path. > > 2. Optimisations may be promoted from the optimised path to the > vanilla path at BDFL discretion. > > 3. Experimental optimisations in general will required at least one > complete release before being promoted from the optimised path to the > vanilla path. Before everyone gets too far, are there actually concrete separate optimizations we are talking about here? Or is this just "in case someone comes up with an optimization that helps" I'm a compiler hacker by hobby and job (Technically, i'm a 2nd year law student by trade, who works for IBM's TJ Watson Research Center as a GCC Hacker), and i've looked at most optimizing python compilers that have existed in the past 4-5 years (geez, have i been lurking on python-dev that long. Wow. I used to actively contribute now and then, stopped for a few years). The only one that makes any appreciable difference is Psyco (unsurprising, actually), and measurements i did (and i think this was the idea behind it) show this is because of two things 1. Removal of python overhead (ie bytecode execution vs direct machine code) 2. Removal of temporary objects (which is more powerful than it sounds, because of how it's done. Psyco simply doesn't emit code to compute something at runtime until forced. it does as much as it can at compile time, when possible. 
In this way, one can view it as a very powerful symbolic execution engine)

In terms of improvements, starting with Psyco as your base (to be honest, doing something completely different isn't a smart idea. He's got the right idea, there's no other real way you are going to get more speed), the best you can do are the following:

1. Improve the generated machine code (IE better register allocation, better scheduling, a peephole optimizer). As for register allocation, I've never measured how often Psyco spills right now. Some platforms are all about spill code generation (x86), others are more about coalescing registers.

2. Teach it how to execute more operations at compile time (IE improve the symbolic execution engine)

3. Improve the profiling done at runtime.

That's about all you can do. I've lumped all classical compiler optimizations into "improve generated machine code", since that is where you'd be able to do them (unless you want to introduce a new middle IR, which will complicate matters greatly, and probably not significantly speed things up).

Number 1 can become expensive quickly for a JIT, for rapidly diminishing gains. Number 2 has the natural limit that once you've taught it how to virtualize every base python object and operation, it should be able to compute everything not in a c module given the input, and your limit becomes how good at profiling you are to choose what to specialize. Number 3 doesn't become important until you start hitting negative gains due to choosing the wrong functions to specialize.

Any useful thing not involving specialization is some combination of

1. Not going to be applicable without specialization and compilation to machine code (I can think of no useful optimization that will make a significant difference at the python code level, that wouldn't be easier and faster to do at the machine code level. Python does not give enough guarantees to make it better to optimize Python bytecode).

2.
Already covered by the way it does compilation. 3. Too expensive. Couple all of this with the fact that there are a limited number of operations performed at the python level already that aren't taken care of by making a better symbolic execution engine. In short, I believe if you want to seriously talk about "adding this optimization", or "adding that optimization", that time would be better served doing something like psyco (if it's not acceptable or can't be made acceptable), where your main thing was specialization of functions, and compilation to machine code of the specialized functions. These are your only real options for speeding up python code. Diddling around at the python source or bytecode level will buy you *less* (since you still have the interpreter overhead), and be just as difficult (since you will still need to specialize to be able to know the types involved). If you want something to look at besides Psyco, see LLVM's runtime abilities (http://llvm.cs.uiuc.edu). It might also make a good backend machine code optimizer replacement for Psyco's hard-coded x86 output, because it can exploit type information. To put all of this in context, i'm assuming you aren't looking for 5-10% gains, total. Instead, i'm assuming you are looking for very significant speedups (100% or greater). If you only want 5-10%, that's easy to do at just the bytecode level, but you eventually hit the limit of the speed of bytecode execution, and from experience, you will hit it rather quickly. --Dan From mal@lemburg.com Tue Apr 29 08:08:13 2003 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 29 Apr 2003 09:08:13 +0200 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <200304282206.h3SM6md20118@odiug.zope.com> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> Message-ID: <3EAE24DD.2070409@lemburg.com> Guido van Rossum wrote: >>I've experimented with about a dozen ways to improve dictionary >>performance and found one that benefits some programs by up to >>5% without hurting the performance of other programs by more >>than a single percentage point. >> >>It entails a one line change to dictobject.c resulting in a new >>schedule of dictionary sizes for a given number of entries: Perhaps you could share that change ? Or is it on SF somewhere ? >>Number of Current size Proposed size >>Filled Entries of dictionary of dictionary >>-------------- ------------- ------------- >>[-- 0 to 5 --] 8 8 >>[-- 6 to 10 --] 16 32 >>[-- 11 to 21 --] 32 32 >>[-- 22 to 42 --] 64 128 >>[-- 43 to 85 --] 128 128 >>[-- 86 to 170 --] 256 512 >>[-- 171 to 341 --] 512 512 > > > I suppose there's an "and so on" here, right? I wonder if for > *really* large dicts the space sacrifice isn't worth the time saved? Once upon a time, when I was playing with inlining dictionary tables (now part of the dictionary implementation thanks to Tim), I found that optimizing dictionaries to have a table size 8 gave the best results. Most dictionaries in a Python application have very few entries (and most of them were instance dictionaries at the time -- not sure whether that's changed). Another result of my experiments was that reducing the number of resizes made a big difference. To get some more useful numbers, I suggest to instrument Python to display the table size of dictionaries and the number of resizes necessary to make them that big. You should also keep a good eye on the overall process size. 
I believe that the reason for the speedups you see is that cache sizes and processor optimizations have changed since the time the current resizing implementation was chosen, so maybe we ought to rethink the parameters:

* minimum table size
* first three resize steps

I don't think that large dictionaries should become more sparse -- that's just a waste of memory.

>> The idea is to lower the average sparseness of dictionaries (by
>> 0% to 50% of their current sparseness). This results in fewer
>> collisions, faster collision resolution, fewer memory accesses,
>> and better cache performance. A small side-benefit is halving
>> the number of resize operations as the dictionary grows.
>
> I think you mean "raise the average sparseness" don't you?
> (The more sparse something is, the more gaps it has.)
>
> I tried the patch with my new favorite benchmark, startup time for
> Zope (which surely populates a lot of dicts :-). It did give about
> 0.13 seconds speedup on a total of around 3.5 seconds, or almost 4%
> speedup.

--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium: 56 days left

From mal@lemburg.com Tue Apr 29 08:11:21 2003
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 29 Apr 2003 09:11:21 +0200
Subject: [Python-Dev] Thoughts on -O
In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com>
References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC141@au3010avexu1.global.avaya.com>
Message-ID: <3EAE2599.8020702@lemburg.com>

Delaney, Timothy C (Timothy) wrote:
> Was doing some thinking in the shower this morning, and came up with
> some ideas for specifying optimisation. These are currently quite
> nebulous thoughts ...
> We have the current situation:
>
> -O only removes asserts
> -OO removes asserts and docstrings.

That's true, but not what they actually mean:

-O ... optimize the byte code without changing semantics
-OO ... optimize even further, slight changes in semantics are allowed
(note that some tools rely on the availability of doc-strings)

Rather than adding more options, we should think about more optimizations to add ;-)

--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
EuroPython 2003, Charleroi, Belgium: 56 days left

From tdelaney@avaya.com Tue Apr 29 08:16:01 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 29 Apr 2003 17:16:01 +1000
Subject: [Python-Dev] Thoughts on -O
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com>

> From: Daniel Berlin [mailto:dberlin@dberlin.org]
>
> > 1. Any new optimisation must be introduced on the optimised path.
> >
> > 2. Optimisations may be promoted from the optimised path to the
> > vanilla path at BDFL discretion.
> >
> > 3. Experimental optimisations in general will require at least one
> > complete release before being promoted from the optimised path to the
> > vanilla path.
>
> Before everyone gets too far, are there actually concrete separate
> optimizations we are talking about here?
> Or is this just "in case someone comes up with an optimization that
> helps"?

One I had in mind would be the CALL_ATTR patch, which Guido explicitly mentioned as having been implemented on the main path, not on the optimised path, and pointed out that if it had been implemented only on the optimised path a number of issues with it would have been discovered much earlier.
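The -O/-OO behaviour being debated above is easy to see from code. A sketch (the function is made up for illustration):

```python
def checked_add(a, b):
    """Add two integers, with a sanity check that vanishes under -O."""
    # Under "python -O" the assert below is not compiled in at all,
    # and under "python -OO" the docstring above is stripped as well.
    assert isinstance(a, int) and isinstance(b, int), "ints only"
    return a + b

print(checked_add(2, 3))  # 5 at any optimisation level
```

Run normally, the assert fires on bad input; run with -O, `checked_add("x", "y")` would happily concatenate strings instead.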
> The only one that makes any appreciable difference is Psyco

Indeed. I would love Psyco to eventually be part of Python, but suspect it will only be so in the PyPy implementation.

> To put all of this in context, i'm assuming you aren't looking for
> 5-10% gains, total. Instead, i'm assuming you are looking for very
> significant speedups (100% or greater).

Many of the recent optimisation patches have involved 5% speedups in some cases. If they all worked without impacting each other (cache effects, etc.) we could probably approach 50% improvement in some cases.

I have no problem if someone can get a 5% speedup across the board without introducing incredibly hairy code. I would like such optimisations to eventually become part of the main path - but I would prefer that they not become part of the main path until they have been exposed to many different environments - assuming the implementor or someone else can't come up with one or more cases where they become a pessimisation.

> If you only want 5-10%, that's easy to do at just the bytecode level,
> but you eventually hit the limit of the speed of bytecode execution,
> and from experience, you will hit it rather quickly.

Indeed. Every attempt so far has yielded a 5% improvement or less standalone, and most have resulted in worse performance when combined.

Tim Delaney

From tdelaney@avaya.com Tue Apr 29 08:25:50 2003
From: tdelaney@avaya.com (Delaney, Timothy C (Timothy))
Date: Tue, 29 Apr 2003 17:25:50 +1000
Subject: [Python-Dev] Dictionary tuning
Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com>

> From: Tim Peters [mailto:tim.one@comcast.net]
>
> [Delaney, Timothy C]
> > That's what I was getting at. I know that (for example) most
> > classes I create have less than 16 entries in their __dict__.
> > With this change, each class instance would take (approx) twice
> > as much memory for its __dict__.
> > I suspect that class instance
> > __dict__ is the most common dictionary I use.
>
> Do they have fewer than 6 entries? Dicts with 5 or fewer entries don't
> change size at all (an "empty dict" comes with room for 5 entries).

No hard and fast data here. That would require grovelling through code ;)

I was making a quick estimate. Off the top of my head, most classes I create have ...

__init__
3-5 other methods
3-5 instance attributes

Hmm - that would only be 3-5 instance __dict__ entries, with 4-6 class __dict__ entries, correct? I was forgetting that methods are put into the class __dict__, not the instance __dict__.

Bah - it's too late. It's the end of the day, and I've barely managed to get 2 hours of real work done.

Tim Delaney

From python@rcn.com Tue Apr 29 09:12:52 2003
From: python@rcn.com (Raymond Hettinger)
Date: Tue, 29 Apr 2003 04:12:52 -0400
Subject: [Python-Dev] Dictionary tuning up to 100,000 entries
References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <3EAE24DD.2070409@lemburg.com>
Message-ID: <000201c30e27$a89bc4c0$125ffea9@oemcomputer>

[Raymond]
> >> I've experimented with about a dozen ways to improve dictionary
> >> performance and found one that benefits some programs by up to
> >> 5% without hurting the performance of other programs by more
> >> than a single percentage point.
> >>
> >> It entails a one line change to dictobject.c resulting in a new
> >> schedule of dictionary sizes for a given number of entries:

[Marc-Andre Lemburg]
> Perhaps you could share that change ? Or is it on SF somewhere ?

It was in the original post. But SF is better, so I just loaded it to the patch manager: www.python.org/sf/729395

[GvR]
> > I suppose there's an "and so on" here, right? I wonder if for
> > *really* large dicts the space sacrifice isn't worth the time saved?

Due to the concerns raised about massive dictionaries, I revised the patch to switch back to the old growth schedule for sizes above 100,000 entries (approx 1.2 Mb).
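The revised growth rule can be sketched in one line. This is a sketch of the updated patch's intent, not its literal C, and the exact boundary handling is an assumption:

```python
def resize_target(used):
    """Sketch of the revised resize target: quadruple small and medium
    dicts, but fall back to the old doubling schedule above the
    100,000-entry cutoff so massive dicts avoid the extra memory cost."""
    return used * (4 if used <= 100000 else 2)

print(resize_target(50), resize_target(200000))
```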
[Mark Lemburg] > I believe that the reason for the speedups you see is > that cache sizes and processor optimizations have changed > since the time the current resizing implementation was chosen, > so maybe we ought to rethink the parameters: > > * minimum table size > * first three resize steps I've done dozens of experiments with changing these parameters and changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7) but found that what helped some applications would hurt others. The current tuning remains fairly effective. Changing the resize step from *2 to *4 was the only alteration that yielded across-the-board improvements. Raymond Hettinger From mal@lemburg.com Tue Apr 29 10:48:03 2003 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 29 Apr 2003 11:48:03 +0200 Subject: [Python-Dev] Dictionary tuning upto 100,000 entries In-Reply-To: <000201c30e27$a89bc4c0$125ffea9@oemcomputer> References: <001b01c30dbf$94363140$125ffea9@oemcomputer> <200304282206.h3SM6md20118@odiug.zope.com> <3EAE24DD.2070409@lemburg.com> <000201c30e27$a89bc4c0$125ffea9@oemcomputer> Message-ID: <3EAE4A53.2030005@lemburg.com> Raymond Hettinger wrote: >>I believe that the reason for the speedups you see is >>that cache sizes and processor optimizations have changed >>since the time the current resizing implementation was chosen, >>so maybe we ought to rethink the parameters: >> >>* minimum table size >>* first three resize steps > > > I've done dozens of experiments with changing these parameters > and changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7) > but found that what helped some applications would hurt others. > The current tuning remains fairly effective. Changing the resize > step from *2 to *4 was the only alteration that yielded across > the board improvements. Ok, but I still fear that using *4 will cause too much memory bloat for dicts which have more than 10-30 entries.
If you instrument Python you'll find that for typical applications, most dictionaries will have only a few entries. Tuning the implementation to those findings is what you really want to do :-) If you take e.g. Zope, what difference in memory consumption does your patch make? -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Apr 29 2003) >>> Python/Zope Products & Consulting ... http://www.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ EuroPython 2003, Charleroi, Belgium: 56 days left From guido@python.org Tue Apr 29 11:19:18 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:19:18 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: "Your message of Mon, 28 Apr 2003 21:49:52 EDT." <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCEEGNEEAB.tim.one@comcast.net> Message-ID: <200304291019.h3TAJI517748@pcp02138704pcs.reston01.va.comcast.net> > What could be much worse is that stuffing code into the if-block > bloats the code so much as to frustrate lookahead I-stream caching > of the normal "branch taken and return 0" path: > > if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) { > if (dictresize(mp, mp->ma_used*2) != 0) > return -1; > } > return 0; > > Rewriting as > > if (mp->ma_used <= n_used || mp->ma_fill*3 < (mp->ma_mask+1)*2) > return 0; > > return dictresize(mp, mp->ma_used*2) ? -1 : 0; That last line might as well be return dictresize(mp, mp->ma_used*2); /* Or *4, per Raymond */ Which reminds me, there are two other places where dictresize() is called; shouldn't those be changed to the new fill factor? All in all I think I'm mildly in favor of this change.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 11:36:12 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:36:12 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: "Your message of Tue, 29 Apr 2003 00:57:58 EDT." <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> Message-ID: <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> > [Greg Ewing] > > How about os.walkdir (by analogy with os.listdir). [Tim] > I'm -0 on bothering to change the name, but, if we have to, I'm +1 on > walkdir (for the reason Greg gives there). I'm -1 on changing the name. os.walk() it is. Short-n-sweet, --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 11:46:58 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 06:46:58 -0400 Subject: [Python-Dev] Thoughts on -O In-Reply-To: "Your message of Tue, 29 Apr 2003 17:16:01 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22C@au3010avexu1.global.avaya.com> Message-ID: <200304291046.h3TAkwt17905@pcp02138704pcs.reston01.va.comcast.net> [Tim Delaney] > One I had in mind would be the CALL_ATTR patch, which Guido > explicitly mentioned as having been implemented on the main > path, not on the optimised path, and pointed out that if it > had been implemented only on the optimised path a number of > issues with it would have been discovered much earlier. Correction: I meant to say that about the optimization of expressions of the form '-' NUMBER # e.g. -1 This was buggy for years. I'm not aware of problems with CALL_ATTR (which exists only as a patch on SF) except that it's not always a speedup.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 12:04:27 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 07:04:27 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: "Your message of Tue, 29 Apr 2003 17:25:50 +1000." <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC22E@au3010avexu1.global.avaya.com> Message-ID: <200304291104.h3TB4R618388@pcp02138704pcs.reston01.va.comcast.net> [Tim Delaney] > Off the top of my head, most classes I create have ... > > __init__ > 3-5 other methods > 3-5 instance attributes > > Hmm - that would only be 3-5 instance __dict__ entries, with > 4-6 class __dict__ entries, correct? > > I was forgetting that methods are put into the instance __dict__. No, they're not. > Bah - it's too late. It's the end of the day, and I've barely > managed to get 2 hours real work done. That might explain your recent goofs. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 29 12:54:23 2003 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Apr 2003 07:54:23 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: "Your message of Mon, 28 Apr 2003 20:57:50 EDT." <20030429005750.GA17963@panix.com> References: <20030429005750.GA17963@panix.com> Message-ID: <200304291154.h3TBsNc18815@pcp02138704pcs.reston01.va.comcast.net> > There's some truth to that. OTOH, until the BDFL declares something > to be an ex-PEP, I don't think BDFL rejection of a PEP means that it > is forever dead -- it just requires substantial revision to > resurrect it. The point of PEPs is to prevent rehashing of old > subjects in the same way, not to prevent new ideas from restarting > discussions. In general, it's better to create a new PEP if you have a new idea. 
The only reason to revive a rejected PEP would be if the reason for rejecting the specific idea put forth in the PEP becomes invalid. A PEP should propose a specific solution. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Tue Apr 29 14:38:05 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 08:38:05 -0500 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: <3EADF732.7020300@python.org> References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> Message-ID: <16046.32829.920029.296191@montanaro.dyndns.org> I'd like to move PEP 305 (CSV) along and intend to bring the text up-to-date w.r.t. the current implementation, however the code which implements CSV reading and writing doesn't currently handle Unicode. Given that there is a module checked into CSV, what should the PEP's status be, "draft" or "accepted" or something else? Skip From goodger@python.org Tue Apr 29 14:43:46 2003 From: goodger@python.org (David Goodger) Date: Tue, 29 Apr 2003 09:43:46 -0400 Subject: [Python-Dev] proposed amendments to PEP 1 In-Reply-To: <16046.32829.920029.296191@montanaro.dyndns.org> References: <3EADB58A.2030607@python.org> <000101c30de8$57758840$125ffea9@oemcomputer> <3EADF732.7020300@python.org> <16046.32829.920029.296191@montanaro.dyndns.org> Message-ID: <3EAE8192.4030803@python.org> Skip Montanaro wrote: > I'd like to move PEP 305 (CSV) along and intend to bring the text up-to-date > w.r.t. the current implementation, however the code which implements CSV > reading and writing doesn't currently handle Unicode. Given that there is a > module checked into CSV, CVS? > what should the PEP's status be, "draft" or > "accepted" or something else? "Accepted" for now, becoming "Final" when the implementation is finished. Assuming my first proposed PEP 1 amendment is okayed, Guido has already indicated that PEP 305 is to be accepted. 
-- David Goodger From skip@pobox.com Tue Apr 29 16:11:09 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 10:11:09 -0500 Subject: [Python-Dev] Dictionary tuning Message-ID: <16046.38413.487331.327698@montanaro.dyndns.org> >> Have you guys tried out the patch? I'm very interested in getting >> results from different benchmarks, processors, cache sizes, and >> various operating systems. Tim> If I can find the time I will. We're in crunch time on my project Tim> at the moment ... I'm somewhat over-allocated :( Can't you just head over to Dunkin' Donuts and resize? ;-) Skip From fdrake@acm.org Tue Apr 29 16:15:54 2003 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 29 Apr 2003 11:15:54 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <16046.38413.487331.327698@montanaro.dyndns.org> References: <16046.38413.487331.327698@montanaro.dyndns.org> Message-ID: <16046.38698.308785.590565@grendel.zope.com> Skip Montanaro writes: > Can't you just head over to Dunkin' Donuts and resize? ;-) Ooh, ooh! Count me in! ...er, oh, I guess I've done that too many times already. Never mind. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation From tdelaney@avaya.com Tue Apr 29 22:51:31 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 07:51:31 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC252@au3010avexu1.global.avaya.com> > From: Guido van Rossum [mailto:guido@python.org] > > [Tim Delaney] > > Off the top of my head, most classes I create have ... > > > > __init__ > > 3-5 other methods > > 3-5 instance attributes > > > > Hmm - that would only be 3-5 instance __dict__ entries, with > > 4-6 class __dict__ entries, correct? > > > > I was forgetting that methods are put into the instance __dict__. > > No, they're not. Bah - I meant to say __class__.__dict__ - if you look at the numbers above they add up that way. > > Bah - it's too late.
It's the end of the day, and I've barely > > managed to get 2 hours real work done. > > That might explain your recent goofs. :-) See above ;) Well, it's a whole new day ... I've got an 8am phone call to the US (10 minutes away) ... maybe I can do better today ... Tim Delaney From tdelaney@avaya.com Tue Apr 29 22:52:52 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 07:52:52 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> > From: Skip Montanaro [mailto:skip@pobox.com] > > >> Have you guys tried out the patch? I'm very interested > in getting > >> results from different benchmarks, processors, cache sizes, and > >> various operating systems. > > Tim> If I can find the time I will. We're in crunch time > on my project > Tim> at the moment ... I'm somewhat over-allocated :( > > Can't you just head over to Dunkin' Donuts and resize? ;-) Umm ... I'm parsing this OK ... seems syntactically correct ... but I'm not sure about the semantics ... Tim Delaney From skip@pobox.com Tue Apr 29 23:05:06 2003 From: skip@pobox.com (Skip Montanaro) Date: Tue, 29 Apr 2003 17:05:06 -0500 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> References: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC253@au3010avexu1.global.avaya.com> Message-ID: <16046.63250.915898.339768@montanaro.dyndns.org> Tim> at the moment ... I'm somewhat over-allocated :( Skip> Can't you just head over to Dunkin' Donuts and resize? ;-) Tim> Umm ... I'm parsing this OK ... seems syntactically correct ... but Tim> I'm not sure about the semantics ... Well, when a dictionary is over-allocated, we make it bigger to create more space. I was thinking maybe you could try a similar sort of approach using donuts...
Skip From tdelaney@avaya.com Wed Apr 30 00:54:35 2003 From: tdelaney@avaya.com (Delaney, Timothy C (Timothy)) Date: Wed, 30 Apr 2003 09:54:35 +1000 Subject: [Python-Dev] Dictionary tuning Message-ID: <338366A6D2E2CA4C9DAEAE652E12A1DE4AC291@au3010avexu1.global.avaya.com> > From: Skip Montanaro [mailto:skip@pobox.com] > > Tim> at the moment ... I'm somewhat over-allocated :( > > Skip> Can't you just head over to Dunkin' Donuts and resize? ;-) > > Tim> Umm ... I'm parsing this OK ... seems syntactically > correct ... but > Tim> I'm not sure about the semantics ... > > Well, when a dictionary is over-allocated, we make it bigger > to create more > space. I was thinking maybe you could try a similar sort of > approach using > donuts... Making me bigger won't help anything (I'm trying to make myself smaller). Now, if Dunkin' Donuts can make more of me, that's another matter ... Tim Delaney From gward@python.net Wed Apr 30 03:07:44 2003 From: gward@python.net (Greg Ward) Date: Tue, 29 Apr 2003 22:07:44 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20030430020743.GA6541@cthulhu.gerg.ca> On 29 April 2003, Guido van Rossum said: > I'm -1 on changing the name. os.walk() it is. Sheesh, it's like my undeniably brilliant suggestion of os.walktree() disappeared into thin air. Ah well, back to wasting my vast talents elsewhere... ;-> Greg -- Greg Ward <gward@python.net> http://www.gerg.ca/ Jesus Saves -- and you can too, by redeeming these valuable coupons!
From laotzu@pobox.com Wed Apr 30 03:16:18 2003 From: laotzu@pobox.com (Mathieu Fenniak) Date: Tue, 29 Apr 2003 20:16:18 -0600 Subject: [Python-Dev] 2.3b1, and object() Message-ID: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> I've been testing Python 2.3b1 since its release. I've tested it with a number of applications I've written myself, as well as testing most of the new language features and modules out. I've encountered no problems, and everything is happy and working. On an unrelated note, I'm curious, what's the difference between an instance of an object, and an instance of an empty class? Calling the object builtin returns an <object object at ...>, which I would expect would function the same as a 'class blah(object): pass', but they do not function similarly at all. >>> class A(object): pass >>> a = A() >>> a.i = 5 >>> a.i 5 >>> >>> a = object() >>> a.i = 5 Traceback (most recent call last): File "<stdin>", line 1, in ? AttributeError: 'object' object has no attribute 'i' -- Random words of the day: Who does not trust enough will not be trusted. Lao-Tzu Mathieu Fenniak <laotzu@pobox.com> PGP Key ID 0x2459092A http://www.stompstompstomp.com/ From drifty@alum.berkeley.edu Wed Apr 30 04:47:37 2003 From: drifty@alum.berkeley.edu (Brett Cannon) Date: Tue, 29 Apr 2003 20:47:37 -0700 (PDT) Subject: [Python-Dev] test_logging hangs on Solaris 8 (and 9) Message-ID: <Pine.SOL.4.55.0304292042280.9903@death.OCF.Berkeley.EDU> (sorry for messing up people's threading of this thread but I deleted the original emails since I summarized it already in my rough draft of the next summary) I just created patch #729988 that I think fixes any possible hanging issues with test_logging in regards to it hanging after completing test 3 (its last test). I just switched the lock used from a Condition lock (which I think was sending its 'notify' faster than it took to reach the 'wait' call in the main thread) to an Event lock. It solves the hanging on my OS X box. 
The reason I didn't apply it is that I don't have much threading experience and I would rather be safe than sorry. I just need someone to sign off on it; I will apply it myself. -Brett From tim_one@email.msn.com Wed Apr 30 05:32:23 2003 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 30 Apr 2003 00:32:23 -0400 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <3EAE24DD.2070409@lemburg.com> Message-ID: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> [M.-A. Lemburg] > ... > Once upon a time, when I was playing with inlining dictionary > tables (now part of the dictionary implementation thanks to Tim), Thank you! > ... > I don't think that large dictionaries should become more > sparse -- that's just a waste of memory. Collision resolution is very fast if the dict slots happen to live in cache. When they're out of cache, the apparent speed of the C code is irrelevant, the time is virtually all consumed by the HW (or even OS) resolving the cache misses, and every collision probe is very likely to be a cache miss then (the probe sequence-- by design --jumps all over the slots in (pseudo-)random order). So when Raymond explained that increasing sparseness helped *most* for large dicts, it made great sense to me. We can likely resolve dozens of collisions in a small dict in the time it takes for one extra probe in a large dict. Jeremy had a possibly happy idea wrt this: make the collision probe sequence start in a slot adjacent to the colliding slot. That's likely to get sucked into cache "for free", tagging along with the slot that collided. If that's effective, it could buy much of the "large dict" speed gains Raymond saw without increasing the dict size. If someone wants to experiment with that in lookdict_string(), stick a new ++i; before the for loop, and move the existing i = (i << 2) + i + perturb + 1; to the bottom of that loop. Likewise for lookdict(). 
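For readers following along, the probe order Tim is describing can be sketched in Python (a toy model of the slot sequence only - not the real C lookup, which also handles dummies and key comparison; the function name and the adjacent_first flag are invented here to model Jeremy's suggestion):

```python
PERTURB_SHIFT = 5  # shift used by the dictobject.c recurrence

def probe_sequence(h, mask, nprobes=6, adjacent_first=False):
    # First nprobes table slots inspected for hash h, using the
    # i = (i << 2) + i + perturb + 1 recurrence.  With
    # adjacent_first=True, the neighbouring slot (the suggested ++i)
    # is checked before the first pseudo-random jump.
    i, perturb = h & mask, h
    slots = [i & mask]
    if adjacent_first:
        i += 1
        slots.append(i & mask)
    while len(slots) < nprobes:
        i = (i << 2) + i + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)
    return slots

print(probe_sequence(0, 7))  # [0, 1, 6, 7, 4, 5] - jumps around the table
print(probe_sequence(10, 7, adjacent_first=True))  # starts [2, 3]: slot 3 is adjacent
```

The pseudo-random jumps are what defeat the cache; the adjacent_first variant trades one cache-friendly probe for a slightly less scattered sequence.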
From cjohns@cybertec.com.au Wed Apr 30 06:10:50 2003 From: cjohns@cybertec.com.au (Chris Johns) Date: Wed, 30 Apr 2003 15:10:50 +1000 Subject: [Python-Dev] cvs socketmodule.c and IPV6 disabled In-Reply-To: <3EADAC3F.6020802@cybertec.com.au> References: <3EAD2D38.3030906@cybertec.com.au> <m3adea9yj7.fsf@mira.informatik.hu-berlin.de> <3EADA318.5010602@cybertec.com.au> <3EADA886.9020605@v.loewis.de> <3EADAC3F.6020802@cybertec.com.au> Message-ID: <3EAF5ADA.2010006@cybertec.com.au> Chris Johns wrote: > Martin v. Löwis wrote: > >> >> I see. And the system does have inet_pton? *That* sounds like a bug to >> me - there should be no inet_pton if the IPv6 API is unsupported. > > > Agreed. I will disable them. > I disabled HAVE_INET_PTON in the pyconfig.h as suggested, although the functions are present in the RTEMS header files. This throws up another error. When it is disabled, the inet_pton and inet_ntop functions in socketmodule.c are built. The RTEMS prototypes and the ones provided in socketmodule.c are not exactly the same, giving a compile-time error. The RTEMS history is that the IP stack is a port of the FreeBSD stack from a while ago. It must have some IPv6 things, however as far as I know IPv6 is not working on RTEMS. I suspect it is not complete/current. I feel the best solution is to define HAVE_INET_PTON in pyconfig.h. >> >> So I think the configure test should be changed to define HAVE_PTON >> only if all prerequisites of its usage are met (or the entire function >> should be hidden if IPv6 is disabled). >> > > It would make Python more robust, but this is a mistake on my part. > I wrapped 'socket_inet_pton' and friends with ENABLE_IPV6 and sockets under RTEMS work. -- Chris Johns, cjohns at cybertec.com.au From martin@v.loewis.de Wed Apr 30 06:14:44 2003 From: martin@v.loewis.de (Martin v.
Löwis) Date: 30 Apr 2003 07:14:44 +0200 Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> References: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <m3d6j4efcr.fsf@mira.informatik.hu-berlin.de> Mathieu Fenniak <laotzu@pobox.com> writes: > On an unrelated note, I'm curious, what's the difference between an > instance of an object, and an instance of an empty class? On python-dev, you are supposed to study the Python source code to answer such questions (or find other means to investigate the answer yourself). Regards, Martin From greg@cosc.canterbury.ac.nz Wed Apr 30 06:34:54 2003 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Apr 2003 17:34:54 +1200 (NZST) Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <200304300534.h3U5Ysa15757@oma.cosc.canterbury.ac.nz> Mathieu Fenniak <laotzu@pobox.com>: > >>> class A(object): pass > >>> a = A() > >>> a.i = 5 > >>> a.i > 5 > >>> > > >>> a = object() > >>> a.i = 5 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'object' object has no attribute 'i' I think this is because object is a built-in type, and as such doesn't allow attributes to be added, unless you create a Python subclass of it. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From Anthony Baxter <anthony@interlink.com.au> Wed Apr 30 08:29:08 2003 From: Anthony Baxter <anthony@interlink.com.au> (Anthony Baxter) Date: Wed, 30 Apr 2003 17:29:08 +1000 Subject: [Python-Dev] Python 2.3b1 has 20% slower networking?
In-Reply-To: <20030427145316.475c3cf5.itamar@itamarst.org> Message-ID: <200304300729.h3U7T9O05308@localhost.localdomain> >>> Itamar Shtull-Trauring wrote > In real programs the speed drop would probably be much less pronounced, > although I bet this slows down e.g. Anthony Baxter's portforwarder quite > a bit. If Python 2.3 is released without fixing this Twisted will > probably monkeypatch the socket module so that we can get full > performance, since we have our own (unavoidable) layers of Python > indirection :) For whatever reason, it actually doesn't seem to matter. Python2.2 seems to clock in about 10% slower (in throughput and connections/second) than the same code running under 2.3a1. Upgrading to current-CVS, I see almost no difference between 2.3a1 and current-CVS (maybe 5% improvement). (FWIW, python2.1 is almost 25% slower than current-cvs!) The code in question is pythondirector, a pure-python TCP loadbalancer, http://pythondirector.sf.net/. In this case all the above were run with Twisted 1.0.3. All tests were run on my laptop via the loopback interface. Anthony -- Anthony Baxter <anthony@interlink.com.au> It's never too late to have a happy childhood. From guido@python.org Wed Apr 30 14:49:51 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 30 Apr 2003 09:49:51 -0400 Subject: [Python-Dev] Re: os.path.walk() lacks 'depth first' option In-Reply-To: Your message of "Tue, 29 Apr 2003 22:07:44 EDT." <20030430020743.GA6541@cthulhu.gerg.ca> References: <LNBBLJKPBEHFEDALKOLCGEHIEEAB.tim.one@comcast.net> <200304291036.h3TAaCA17856@pcp02138704pcs.reston01.va.comcast.net> <20030430020743.GA6541@cthulhu.gerg.ca> Message-ID: <200304301349.h3UDnpJ28834@odiug.zope.com> > On 29 April 2003, Guido van Rossum said: > > I'm -1 on changing the name. os.walk() it is. > > Sheesh, it's like my undeniably brilliant suggestion of os.walktree() > disappeared into thin air. Ah well, back to wasting my vast talents > elsewhere...
;-> > > Greg Sorry, I didn't see your suggestion until after I'd released 2.3b1. The difference is not significant enough to rename things again after the beta release. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 30 14:59:46 2003 From: guido@python.org (Guido van Rossum) Date: Wed, 30 Apr 2003 09:59:46 -0400 Subject: [Python-Dev] 2.3b1, and object() In-Reply-To: Your message of "Tue, 29 Apr 2003 20:16:18 MDT." <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> References: <BA1E8A2B-7AB1-11D7-9D9E-000393903B64@pobox.com> Message-ID: <200304301359.h3UDxku28868@odiug.zope.com> > On an unrelated note, I'm curious, what's the difference between an > instance of an object, and an instance of an empty class? Calling the > object builtin returns an <object object at ...>, which I would expect > would function the same as a 'class blah(object): pass', but they do > not function similarly at all. > > >>> class A(object): pass > >>> a = A() > >>> a.i = 5 > >>> a.i > 5 > >>> > > >>> a = object() > >>> a.i = 5 > Traceback (most recent call last): > File "<stdin>", line 1, in ? > AttributeError: 'object' object has no attribute 'i' Instances of 'object' don't have an instance dict, so they are incapable of having instance variables. When you use a class statement, instances of the subclass get an instance dict, unless __slots__ is used in that class statement. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Wed Apr 30 16:06:51 2003 From: python@rcn.com (Raymond Hettinger) Date: Wed, 30 Apr 2003 11:06:51 -0400 Subject: [Python-Dev] Dictionary tuning References: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> Message-ID: <001101c30f2a$216954a0$b1b3958d@oemcomputer> > Jeremy had a possibly happy idea wrt this: make the collision probe > sequence start in a slot adjacent to the colliding slot. That's likely to > get sucked into cache "for free", tagging along with the slot that collided.
> If that's effective, it could buy much of the "large dict" speed gains > Raymond saw without increasing the dict size. I worked on similar approaches last month and found them wanting. The concept was that a 64-byte cache line held 5.3 dict entries and that probing those was much less expensive than making a random probe into memory outside of the cache. The first thing I learned was that the random probes were necessary to reduce collisions. Checking the adjacent space is like a single step of linear chaining; it increases the number of collisions. That would be fine if the cost were offset by decreased memory access time; however, for small dicts, the whole dict is already in cache and having more collisions degrades performance with no compensating gain. The next bright idea was to have a separate lookup function for small dicts and for larger dictionaries. I set the large dict lookup to search adjacent entries. The good news is that an artificial test of big dicts showed a substantial improvement (around 25%). The bad news is that real programs were worse off than before. A day of investigation showed the cause. The artificial test accessed keys randomly and showed the anticipated benefit. However, real programs access some keys more frequently than others (I believe Zipf's law applies.) Those keys *and* their collision chains are likely already in the cache. So, big dicts had the same limitation as small dicts: You always lose when you accept more collisions in return for exploiting cache locality. The conclusion was clear: the best way to gain performance was to have fewer collisions in the first place. Hence, I resumed experiments on sparsification. > > If someone wants to experiment with that in lookdict_string(), stick a new > > ++i; > > before the for loop, and move the existing > > i = (i << 2) + i + perturb + 1; > > to the bottom of that loop. Likewise for lookdict().
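The sparseness argument can be checked with a toy simulation (a hypothetical sketch making several simplifying assumptions - random integer hashes, no dummy entries, no string specialisation - so it only shows the collision-count side of the trade-off, not cache behaviour):

```python
import random

def avg_probes_per_insert(n_keys, table_size, trials=100, seed=42):
    # Insert n_keys random hashes into an open-addressed table using
    # the same i = (i << 2) + i + perturb + 1 recurrence, and return
    # the mean number of slots inspected per insertion.
    rng = random.Random(seed)
    mask = table_size - 1
    total = 0
    for _ in range(trials):
        table = [None] * table_size
        for _ in range(n_keys):
            h = rng.getrandbits(32)
            i, perturb, probes = h & mask, h, 1
            while table[i & mask] is not None:
                i = (i << 2) + i + perturb + 1
                perturb >>= 5
                probes += 1
            table[i & mask] = h
            total += probes
    return total / float(trials * n_keys)

dense = avg_probes_per_insert(170, 256)   # ends up about 2/3 full
sparse = avg_probes_per_insert(170, 512)  # ends up about 1/3 full
print(dense, sparse)  # the sparser table needs fewer probes per insert
```

Doubling the table for the same keys measurably shortens the average probe chain, which is the effect the *4 resize step is buying.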
timecell gains 2% (spreadsheet benchmark) timemat loses 2% (pure python matrix package benchmark) timepuzzle loses 1% (class based graph traverser) Raymond Hettinger P.S. There is one other way to improve cache behavior but it involves touching code throughout dictobject.c. Move the entry values into a separate array from the key/hash pairs. That way, you get 8 entries per cache line. P.P.S. One other idea is to use a different search pattern for small dictionaries. Store entries in a self-organizing list with no holes. Dummy fields aren't needed which saves a test in the linear search loop. When an entry is found, move it one closer to the head of the list so that the most common entries get found instantly. Since there are no holes, all eight cells can be used instead of the current maximum of five. Like the current arrangement, the whole small dict fits into just two cache lines. ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# ################################################################# ################################################################# ################################################################# ##### ##### ##### ################################################################# ################################################################# ################################################################# From jepler@unpythonic.net Wed Apr 30 17:16:16 2003 From: jepler@unpythonic.net (Jeff Epler) Date: Wed, 30 Apr 2003 11:16:16 -0500 Subject: [Python-Dev] Dictionary tuning In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> References: 
<3EAE24DD.2070409@lemburg.com> <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> Message-ID: <20030430161615.GB22792@unpythonic.net> On Wed, Apr 30, 2003 at 12:32:23AM -0400, Tim Peters wrote: > If someone wants to experiment with that in lookdict_string(), stick a new > > ++i; > > before the for loop, and move the existing > > i = (i << 2) + i + perturb + 1; > > to the bottom of that loop. Likewise for lookdict(). You might also investigate making PyDictEntry a power-of-two bytes big (it's currently 12 bytes) so that they align nicely in the cache, and then use i ^= 1; instead of ++i; so that the second key checked is always in the same (32-byte or bigger) cache line. Of course, increasing the size of PyDictEntry would also increase the size of all dicts by 33%, so the speed payoff would have to be big. It's also not obvious that ma_smalltable will be 32-byte aligned (since no special effort was made, it's unlikely to be). If it's not, then this optimization would still not pay (compared to i++) for <= MINSIZE dictionaries (which are the important case?). A little program indicates that if the table has an 8-byte or better alignment, the xor approach gives same-cache-line results more frequently than the increment approach even with a 12-byte PyDictEntry. This doesn't quite make sense to me. It also indicates that if the alignment is not 32 bytes but the dict is 16 bytes that xor is a loss, which does make sense. The results (for a 32-byte cache line):

algorithm  sizeof()  alignment  % in same cache line
i^=1       12        4           62.5
i^=1       12        8           75.0
i^=1       12        16          75.0
i^=1       12        32          75.0
i^=1       16        4           50.0
i^=1       16        8           50.0
i^=1       16        16          50.0
i^=1       16        32         100.0
++i        12        4           62.5
++i        12        8           62.5
++i        12        16          62.5
++i        12        32          62.5
++i        16        4           50.0
++i        16        8           50.0
++i        16        16          50.0
++i        16        32          50.0

so using i^=1 and adding 4 bytes to each dict (if necessary) to get 8-alignment of ma_smalltable would give a 12.5% increase in the hit rate of the second probe compared to i++. Ouch.
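The "little program" was not posted, but something like the following reproduces the first table (a guess at the computation, not Jeff's actual code: it checks whether slot i and its second-probe partner start in the same 32-byte line, averaged over the table offsets a given alignment allows):

```python
def pct_same_line(entry_size, align, xor=True, line=32, nslots=256):
    # Percentage of slots whose second-probe partner (i^1 for the xor
    # trick, i+1 for ++i) begins in the same cache line as slot i.
    hits = total = 0
    for offset in range(0, line, align):   # table start mod line size
        for i in range(nslots):
            j = i ^ 1 if xor else i + 1
            hits += (offset + i * entry_size) // line == \
                    (offset + j * entry_size) // line
            total += 1
    return 100.0 * hits / total

print(pct_same_line(12, 8, xor=True))    # 75.0
print(pct_same_line(16, 32, xor=True))   # 100.0
print(pct_same_line(12, 4, xor=False))   # 62.5
```

This model matches the figures above for 12- and 16-byte entries at each alignment; the second table (which also accounts for me_key accesses) would need a refinement of the same idea.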
When I take into account that each probe accesses me_key (not just
me_hash) the results change:

i^=1       16        4           37.5
++i        16        4           37.5
i^=1       12        16          50.0
i^=1       12        32          50.0
i^=1       12        4           50.0
i^=1       12        8           50.0
i^=1       16        16          50.0
i^=1       16        8           50.0
++i        12        16          50.0
++i        12        32          50.0
++i        12        4           50.0
++i        12        8           50.0
++i        16        16          50.0
++i        16        32          50.0
++i        16        8           50.0
i^=1       16        32         100.0

You don't beat i++ unless you go to size 16 with alignment 32.

Looking at the # of cache lines accessed on average, the numbers are
unsurprising.  For the 37.5% items, 1.625 cache lines are accessed for
the two probes, 1.5 for the 50% items, and 1.0 for the 100% items.

Looking at the number of cache lines accessed for a single probe,
8-or-better alignment gives 1.0 cache lines accessed for 16-byte
structures, and 1.125 for all other cases (4-byte alignment or 12-byte
structure).

If the "more than 3 probes" case bears optimizing (and I doubt it does),
the for(perturb) loop could be unrolled once, with even iterations using
++i or i^=1 and odd iterations using
    i = (i << 2) + i + perturb + 1;
so that the same-cache-line property is used as often as possible.  Of
course, the code duplication of the rest of the loop body will increase
i-cache pressure a bit.

And I'm surprised if you read this far.  Summary: i^=1 is not likely to
win compared to ++i, unless we increase dict size 33%.

Jeff

From itamar@itamarst.org  Wed Apr 30 17:41:54 2003
From: itamar@itamarst.org (Itamar Shtull-Trauring)
Date: Wed, 30 Apr 2003 12:41:54 -0400
Subject: [Python-Dev] Python 2.3b1 has 20% slower networking?
In-Reply-To: <200304300729.h3U7T9O05308@localhost.localdomain>
References: <20030427145316.475c3cf5.itamar@itamarst.org>
	<200304300729.h3U7T9O05308@localhost.localdomain>
Message-ID: <20030430124154.2da91bfe.itamar@itamarst.org>

On Wed, 30 Apr 2003 17:29:08 +1000
Anthony Baxter <anthony@interlink.com.au> wrote:

> For whatever reason, it actually doesn't seem to matter.

OK, great.  And thanks to the python-dev team for fixing the issue in
CVS so quickly.
-- 
Itamar Shtull-Trauring    http://itamarst.org/
http://www.zoteca.com -- Python & Twisted consulting

From python@rcn.com  Wed Apr 30 18:30:22 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 30 Apr 2003 13:30:22 -0400
Subject: [Python-Dev] Dictionary tuning
References: <3EAE24DD.2070409@lemburg.com> <LNBBLJKPBEHFEDALKOLCKEIAEIAB.tim_one@email.msn.com> <20030430161615.GB22792@unpythonic.net>
Message-ID: <000901c30f3e$2e31a3e0$b1b3958d@oemcomputer>

> And I'm surprised if you read this far.  Summary: i^=1 is not likely to
> win compared to ++i, unless we increase dict size 33%.

Right!  I had tried i^=1 and it had near zero or slightly negative
effects on performance.  It resulted in more collisions, though the
collisions were resolved relatively cheaply.

I had also experimented with changing alignment, but nothing helped.
Everything is already word aligned and that takes care of the HW issues.
The only benefit to the alignment is that i^=1 guarantees a cache hit.
Without alignment, the odds are that 4 out of 5.3 will have a hit (since
there are 5.3 entries to a line).

Increasing the dict size 33% with unused space doesn't help sparseness
and negatively impacts the chance of cache hits you already have with
smaller dictionaries.

heyhey-mymy-there's-more-to-the-picture-than-meets-the-eye,

Raymond Hettinger

From tim.one@comcast.net  Wed Apr 30 19:13:43 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 30 Apr 2003 14:13:43 -0400
Subject: [Python-Dev] Dictionary tuning
In-Reply-To: <20030430161615.GB22792@unpythonic.net>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEFEFIAA.tim.one@comcast.net>

FYI, for years the dict code had some #ifdef'ed preprocessor gimmick to
force cache alignment.  I ripped that out a while back because nobody
ever reported an improvement when using it.
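[As an aside on this thread: the P.P.S. idea from earlier (store
small-dict entries in a hole-free, self-organizing list and move a found
entry one step toward the head) is easy to model in pure Python.  This
is a hypothetical illustration of the policy, not the proposed C change;
the class name, CAPACITY, and the overflow behavior are illustrative
assumptions only.]

```python
# Hole-free, fixed-capacity mapping with a "move ahead one" policy:
# each successful lookup swaps the entry one slot toward the front,
# so frequently-used keys drift toward the head of the linear scan.

class SmallDict:
    CAPACITY = 8                      # eight cells, as in ma_smalltable

    def __init__(self):
        self._items = []              # (key, value) pairs, no dummy holes

    def __setitem__(self, key, value):
        for idx, (k, _) in enumerate(self._items):
            if k == key:
                self._items[idx] = (key, value)
                return
        if len(self._items) >= self.CAPACITY:
            # a real implementation would resize to an open-addressed table
            raise OverflowError("small table full")
        self._items.append((key, value))

    def __getitem__(self, key):
        for idx, (k, v) in enumerate(self._items):
            if k == key:
                if idx:               # move one closer to the head
                    self._items[idx - 1], self._items[idx] = \
                        self._items[idx], self._items[idx - 1]
                return v
        raise KeyError(key)

d = SmallDict()
d["a"] = 1
d["b"] = 2
d["c"] = 3
d["c"]                                # "c" swaps with "b"
d["c"]                                # "c" swaps with "a": now at the head
print(d._items[0])                    # ('c', 3)
```

Repeated hits on a hot key cost one swap each but shorten every later
scan for it, which is the whole point of the self-organizing layout.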
From tim.one@comcast.net  Wed Apr 30 20:43:45 2003
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 30 Apr 2003 15:43:45 -0400
Subject: [Python-Dev] RE: os.path.walk() lacks 'depth first' option
In-Reply-To: <1051202649.3ea814599f6fa@mcherm.com>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEFMFIAA.tim.one@comcast.net>

[Michael Chermside]
> Don't get a swelled head or anything ;-), but your generator-based
> version of walk() is a beautiful piece of work.  I don't mean the code
> (although that's clean and readable), but the design.
> ...

Thanks for the nudge!  If you hadn't reminded us, I bet this would have
been forgotten.  (I would have replied earlier, except my head got so
heavy it took this long to peel my lips off the floor.)

From python@rcn.com  Wed Apr 30 21:14:44 2003
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 30 Apr 2003 16:14:44 -0400
Subject: [Python-Dev] Dictionary tuning
References: <BIEJKCLHCIOIHAGOKOLHKEFEFIAA.tim.one@comcast.net>
Message-ID: <002301c30f55$245394c0$125ffea9@oemcomputer>

[Timbot]
> FYI, for years the dict code had some #ifdef'ed preprocessor gimmick to
> force cache alignment.  I ripped that out a while back because nobody
> ever reported an improvement when using it.

Gee, you mean we're not the first ones to have ever thought up
dictionary optimizations that didn't pan out?  I've tried square wheels,
pentagonal wheels, and gotten even better results with octagonal wheels.
Each further subdivision seems to have less-and-less payoff so I'm
confident that octagonal is close to optimum ;-)

I'm going to write up an informational PEP to summarize the results of
research to date.  After the first draft, I'm sure the other
experimenters will each have lessons to share.  In addition, I'll attach
a benchmarking suite and dictionary simulator (fully instrumented).
That way, future generations can reproduce the results and pick up where
we left off.
I've decided that this new process should have a name, something pithy,
yet magical sounding, so it shall be dubbed SCIENCE.

Raymond Hettinger

From patmiller@llnl.gov  Wed Apr 30 23:15:31 2003
From: patmiller@llnl.gov (Patrick J. Miller)
Date: Wed, 30 Apr 2003 15:15:31 -0700
Subject: [Python-Dev] Initialization hook for extenders
Message-ID: <3EB04B03.887CDF7B@llnl.gov>

This is a multi-part message in MIME format.
--------------42714FBF141F967516679964
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I work on several projects that have initialization requirements that
need to grab control after Py_Initialize(), but before any user code
runs (via input, script, -c, etc...).  Note that these are Python
clones that take advantage of an installed python (using its
$prefix/lib/pythonx.x/*.py and site-packages/*).

We could use

    PyImport_AppendInittab("sitecustomize", initsitecustomize);

But if there already IS customization in sitecustomize.py, I've blown
it away (and have to look it up and force an import).  And if someone
uses the -S flag, I'm screwed.

I propose a hook styled after Py_AtExit(func) called Py_AtInit(func)
which maintains a list of functions that are called in Py_Initialize
right after main and site initializations.  If the hook isn't used,
then the cost is a single extra function call at initialization.

Here's a spurious example:  A customer wants a version of python that
has all the math functions and his extensions to act like builtins...
I would write (without refcnt or error checks ;-):

#include "Python.h"

static void after_init(void)
{
    PyObject *builtin, *builtin_dict, *math, *math_dict, *user, *user_dict;

    builtin = PyImport_ImportModule("__builtin__");
    builtin_dict = PyModule_GetDict(builtin);

    math = PyImport_ImportModule("math");
    math_dict = PyModule_GetDict(math);

    user = PyImport_ImportModule("user");
    user_dict = PyModule_GetDict(user);

    PyDict_Update(builtin_dict, math_dict);
    PyDict_Update(builtin_dict, user_dict);
}

int main(int argc, char** argv)
{
    PyImport_AppendInittab("user", inituser);
    Py_AtInit(after_init);
    return Py_Main(argc, argv);
}

voila!  An extended Python with new builtins.

I actually want this to do some MPI initialization to setup a single
user prompt with broadcast which has to run after Py_Initialize() but
before the import of readline.

I've attached a copy of the patch (also going to patches at sf.net)

Pat

-- 
Patrick Miller | (925) 423-0309 | http://www.llnl.gov/CASC/people/pmiller

Son, when you grow up you will know who I really am.  I am just
a child like you who has been forced to act responsibly.
    -- Rod Byrnes

--------------42714FBF141F967516679964
Content-Type: text/plain; charset=us-ascii; name="Py_AtInit.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Py_AtInit.diff"

Index: dist/src/Include/pythonrun.h
===================================================================
RCS file: /cvsroot/python/python/dist/src/Include/pythonrun.h,v
retrieving revision 2.62
diff -c -r2.62 pythonrun.h
*** dist/src/Include/pythonrun.h	13 Feb 2003 22:07:52 -0000	2.62
--- dist/src/Include/pythonrun.h	30 Apr 2003 22:04:13 -0000
***************
*** 75,80 ****
--- 75,81 ----
  PyAPI_FUNC(void) PyErr_Display(PyObject *, PyObject *, PyObject *);

  PyAPI_FUNC(int) Py_AtExit(void (*func)(void));
+ PyAPI_FUNC(int) Py_AtInit(void (*func)(void));

  PyAPI_FUNC(void) Py_Exit(int);

Index: dist/src/Python/pythonrun.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/pythonrun.c,v
retrieving revision 2.193
diff -c -r2.193 pythonrun.c
*** dist/src/Python/pythonrun.c	22 Apr 2003 11:18:00 -0000	2.193
--- dist/src/Python/pythonrun.c	30 Apr 2003 22:04:16 -0000
***************
*** 106,111 ****
--- 106,135 ----
  	return flag;
  }

+ #define NINITFUNCS 32
+ static void (*initfuncs[NINITFUNCS])(void);
+ static int ninitfuncs = 0;
+
+ int Py_AtInit(void (*func)(void))
+ {
+ 	if (ninitfuncs >= NINITFUNCS)
+ 		return -1;
+ 	if (!func)
+ 		return -1;
+ 	initfuncs[ninitfuncs++] = func;
+ 	return 0;
+ }
+
+ static void initinitialize(void)
+ {
+ 	int i;
+ 	for (i = 0; i < ninitfuncs; ++i) {
+ 		initfuncs[i]();
+ 		if (PyErr_Occurred())
+ 			Py_FatalError("Py_AtInit: initialization error");
+ 	}
+ }
+
  void
  Py_Initialize(void)
  {

***************
*** 182,190 ****
--- 206,217 ----
  	initsigs(); /* Signal handling stuff, including initintr() */

  	initmain(); /* Module __main__ */
+
  	if (!Py_NoSiteFlag)
  		initsite(); /* Module site */
+ 	initinitialize(); /* Extension hooks */
+
  	/* auto-thread-state API, if available */
  #ifdef WITH_THREAD
 	_PyGILState_Init(interp, tstate);

***************
*** 1418,1423 ****
--- 1445,1451 ----
  #endif /* MS_WINDOWS */
  	abort();
  }
+
  /* Clean up and exit */

--------------42714FBF141F967516679964--
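[The registry in the Py_AtInit patch is small enough to model in pure
Python.  This hypothetical sketch only mirrors its semantics -- a
bounded list of callbacks with C-style 0/-1 return codes, run once in
registration order after interpreter setup; the function names and the
two example hooks are illustrative, not part of the patch.]

```python
# Model of the patch's initfuncs[] array: register up to NINITFUNCS
# callbacks, then run them all once during startup.

NINITFUNCS = 32
_initfuncs = []

def at_init(func):
    """Register func; return 0 on success, -1 on failure (C-style)."""
    if func is None or len(_initfuncs) >= NINITFUNCS:
        return -1
    _initfuncs.append(func)
    return 0

def run_init_hooks():
    """Run every registered hook in order; in the C patch an error here
    is fatal (Py_FatalError)."""
    for func in _initfuncs:
        func()

calls = []
at_init(lambda: calls.append("mpi_setup"))      # e.g. MPI initialization
at_init(lambda: calls.append("readline_prep"))  # before importing readline
run_init_hooks()
print(calls)   # ['mpi_setup', 'readline_prep']
```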