From fredrik@pythonware.com Wed Feb 23 09:55:05 2005 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 23 Feb 2005 10:55:05 +0100 Subject: [Python-Dev] RE: Nested scopes resolution -- you can breathe again! References: Message-ID: <01c301c5198d$c6bcc3f0$0900a8c0@SPIFF> Mikael Olofsson wrote: > There really is a time machine. So I guess I can get the full Python 3k > functionality by doing > > from __future__ import * I wouldn't do that: it imports both "warnings_are_errors" and "from_import_star_is_evil", and we've found that it's impossible to catch ParadoxErrors in a platform independent way. Cheers /F From abo at minkirri.apana.org.au Tue Feb 1 00:30:05 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Tue Feb 1 00:30:46 2005 Subject: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls) In-Reply-To: References: <000d01c50744$b2395700$fe26a044@oemcomputer> Message-ID: <1107214205.3719.23.camel@schizo> On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote: > > Wouldn't it be nicer to have a facility that let you send messages > > between processes and manage concurrency properly instead? You'll need > > most of this anyway to do multithreading sanely, and the benefit to the > > multiple process model is that you can scale to multiple machines, not > > just processors. For brokering data between processes on the same > > machine, you can use mapped memory if you can't afford to copy it > > around, which gives you basically all the benefits of threads with > > fewer pitfalls. > > I don't think this is an answered problem. There are plenty of > researchers on both sides of this fence. It is not been proven at all > that threads are a bad model. > > http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf or even > http://www.python.org/~jeremy/weblog/030912.html These are both threads vs events discussions (ie, threads vs an async-event handler loop). This has nearly nothing to do with multiple CPU utilisation. The real discussion for multiple CPU utilisation is threads vs processes. Once again, my knowledge of this is old and possibly out of date, but threads do not scale well on multiple CPU's because threads use shared memory between each thread. Multiple CPU hardware _can_ have physically shared memory, but it is hardware hell keeping CPU caches in sync etc. It is much easier to build a multi-CPU machine with separate memory for each CPU, and high speed communication channels between each CPU. I suspect most modern multi-CPU's use this architecture. Assuming they have the separate-memory architecture, you get much better CPU utilisation if you design your program as separate processes communicating together, not threads sharing memory. In fact, it wouldn't surprise me if most Operating Systems that support threads don't support distributing threads over multiple CPU's at all. A quick google search revealed this; http://www.heise.de/ct/english/98/13/140/ Keeping in mind the high overheads of sharing memory between CPU's, the discussion about threads at this url seems to confirm; threads with shared memory are hard to distribute over multiple CPU's. Different OS's and/or thread implementations have tried (or just outright rejected) different ways of doing it, to varying degrees of success. IMHO, the fact that QNX doesn't distribute threads speaks volumes. 
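To make the separate-processes model concrete, here is a minimal, purely illustrative sketch (added here as an aside, not part of the original message) of two processes cooperating by passing a message over a pipe, using only os.fork and os.pipe on a POSIX system:

import os

def worker(write_fd):
    # Child process: do some independent work and send the result
    # back over the pipe as a string.
    total = 0
    for i in xrange(100000):
        total += i * i
    os.write(write_fd, str(total))
    os._exit(0)

read_fd, write_fd = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: only needs the write end of the pipe.
    os.close(read_fd)
    worker(write_fd)
else:
    # Parent: read the child's result and wait for it to exit.
    os.close(write_fd)
    result = os.read(read_fd, 1024)
    os.waitpid(pid, 0)
    print "child computed", result

Each process runs its own interpreter with its own GIL, so the two halves really can execute on separate CPUs; the only sharing is the explicit message written to the pipe.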
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From abo at minkirri.apana.org.au Tue Feb 1 03:06:34 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Tue Feb 1 03:07:22 2005 Subject: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up function calls) In-Reply-To: <1107214205.3719.23.camel@schizo> References: <000d01c50744$b2395700$fe26a044@oemcomputer> <1107214205.3719.23.camel@schizo> Message-ID: <1107223595.3719.36.camel@schizo> On Tue, 2005-02-01 at 10:30 +1100, Donovan Baarda wrote: > On Mon, 2005-01-31 at 15:16 -0500, Nathan Binkert wrote: > > > Wouldn't it be nicer to have a facility that let you send messages > > > between processes and manage concurrency properly instead? You'll need [...] > A quick google search revealed this; > > http://www.heise.de/ct/english/98/13/140/ > > Keeping in mind the high overheads of sharing memory between CPU's, the > discussion about threads at this url seems to confirm; threads with > shared memory are hard to distribute over multiple CPU's. Different OS's > and/or thread implementations have tried (or just outright rejected) > different ways of doing it, to varying degrees of success. IMHO, the > fact that QNX doesn't distribute threads speaks volumes. Sorry for replying to my reply, but I forgot the bit that brings it all back On Topic :-) The belief that the opcode granularity thread-switch driven by the GIL is the cause of Python's threads being non-distributable is only half true. Since OS's don't distribute threads well, any attempts to "Fix Python's Threading" in an attempt to make its threads distributable is a waste of time. The only thing that this might achieve would be to reduce the latency on thread switches, maybe allowing faster response to OS events like signals. However, the complexity introduced would cause more problems than it would fix, and could easily result in worse performance, not better. -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From p.f.moore at gmail.com Tue Feb 1 10:18:04 2005 From: p.f.moore at gmail.com (Paul Moore) Date: Tue Feb 1 10:18:07 2005 Subject: [Python-Dev] python-dev Summary for 2004-12-16 through 2004-12-31 [draft] In-Reply-To: <41FEAAEC.5080805@ocf.berkeley.edu> References: <41FEAAEC.5080805@ocf.berkeley.edu> Message-ID: <79990c6b050201011871e86ce3@mail.gmail.com> On Mon, 31 Jan 2005 14:02:20 -0800, Brett C. wrote: > 2.5 was released just before the time this summary covers so most stuff was on bug > fixes discovered after the release. Give Guido the time machine keys back! I assume you meant 2.4, or is this a blatant attempt to get back ahead of schedule with summaries? :-) Paul. PS If you look in this month's python-dev archives, you'll see evidence of /F's last attempt to steal the time machine, with a message posted from the "far future" of Feb 23rd, 2005. He clearly stalled the machine, as he posted from an alternate reality. Let this be a warning! From cedric.dev at tele2.fr Tue Feb 1 13:20:10 2005 From: cedric.dev at tele2.fr (cedric paille) Date: Tue Feb 1 12:24:14 2005 Subject: [Python-Dev] Python reference count question Message-ID: <007701c50858$5ff4cc30$90010d0a@umanis.com> Hi all, i'm working on an app that embed python 2.3 with Gnu/Linux, and i'd like to have some precisions: I'm making python's modules to extend my application's functions with a built in script editor. At now all works very well, but i'd like to know if i'm not forgetting some references inc/dec.... 
Here is a portion of my code:

static PyObject *
Scene_GetNodeGraph(PyObject *self, PyObject *args)
{
    NodeGraph* Ng = NodeGraph::GetInstance();
    std::vector NodG;
    Ng->GetNodeGraph(NodG);
    PyObject* List = PyList_New(NodG.size());
    PyObject* Str = NULL;
    std::vector::iterator it = NodG.begin();
    int i = 0;
    for (;it != NodG.end();it++)
    {
        Str = PyString_FromString(it->AsChar());
        PyList_SetItem(List,i,Str);
        i++;
    }
    return List;
}

Can someone take a look at this and tell me if i must add some inc/decref ? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20050201/20c06f0b/attachment.html From steve at holdenweb.com Tue Feb 1 16:32:43 2005 From: steve at holdenweb.com (Steve Holden) Date: Tue Feb 1 16:38:13 2005 Subject: [Python-Dev] Database import problems Message-ID: <41FFA11B.6000807@holdenweb.com> I wonder if there is a developer with MySQL or sqlite and the appropriate Python interface module who can help me to understand a problem I'm experiencing trying to use PEP 302-style import hooks. Basically I suspect we've either got an import bug or (more likely IMHO) a documentation bug, but I don't want to file on sf until I know exactly what the problem is, and I'm reluctant to use too much bandwidth on python-dev, which I know to be a busy list. The background is visible in the Python-list archives starting at http://mail.python.org/pipermail/python-list/2005-January/262148.html Of course it's possible that a savvy developer can just tell me what the problem is by reading that thread. If not, being a bear of little brain I need help from someone who is used to running debugging interpreters and can see exactly what's going on - my debugging system is fine for Python source, but has no insight into the interpreter code itself. Since I'm not currently subscribed to python-dev an email response (or, better, a follow-up on the c.l.py thread) would be appreciated if you can solve this problem. I'm happy to send full code off-list (or on-list, come to that) to anybody who can assist. regards Steve -- Meet the Python developers and your c.l.py favorites Come to PyCon!!!! http://www.python.org/pycon/2005/ Steve Holden http://www.holdenweb.com/ From gvanrossum at gmail.com Tue Feb 1 16:49:41 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 1 16:49:52 2005 Subject: [Python-Dev] Database import problems In-Reply-To: <41FFA11B.6000807@holdenweb.com> References: <41FFA11B.6000807@holdenweb.com> Message-ID: On Tue, 01 Feb 2005 10:32:43 -0500, Steve Holden wrote: > I wonder if there is a developer with MySQL or sqlite and the > appropriate Python interface module who can help me to understand a > problem I'm experiencing trying to use PEP 302-style import hooks. [...] I sent Steve a private reply pointing out the line "sys.modules['path'] = path" in os.py. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Tue Feb 1 16:50:12 2005 From: aahz at pythoncraft.com (Aahz) Date: Tue Feb 1 16:50:14 2005 Subject: [Python-Dev] Python reference count question In-Reply-To: <007701c50858$5ff4cc30$90010d0a@umanis.com> References: <007701c50858$5ff4cc30$90010d0a@umanis.com> Message-ID: <20050201155012.GA14254@panix.com> On Tue, Feb 01, 2005, cedric paille wrote: > > Hi all, i'm working on an app that embed python 2.3 with Gnu/Linux, > and i'd like to have some precisions: python-dev is for the core developers to discuss bugs and patches.
Please use comp.lang.python for questions about using Python. Thanks. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Given that C++ has pointers and typecasts, it's really hard to have a serious conversation about type safety with a C++ programmer and keep a straight face. It's kind of like having a guy who juggles chainsaws wearing body armor arguing with a guy who juggles rubber chickens wearing a T-shirt about who's in more danger." --Roy Smith, c.l.py, 2004.05.23 From ndbecker2 at verizon.net Tue Feb 1 17:11:37 2005 From: ndbecker2 at verizon.net (Neal Becker) Date: Tue Feb 1 17:37:25 2005 Subject: [Python-Dev] complex I/O problem Message-ID: If I call "print" on a complex value, I may get this: '(2+2j)' But this is not acceptable as input: complex ('(2+2j)') Traceback (most recent call last): File "", line 1, in ? ValueError: complex() arg is a malformed string Whatever format is used for output should be accepted as input! From amk at amk.ca Tue Feb 1 18:16:10 2005 From: amk at amk.ca (A.M. Kuchling) Date: Tue Feb 1 18:18:14 2005 Subject: [Python-Dev] complex I/O problem In-Reply-To: References: Message-ID: <20050201171610.GA10114@rogue.amk.ca> On Tue, Feb 01, 2005 at 11:11:37AM -0500, Neal Becker wrote: > complex ('(2+2j)') > Traceback (most recent call last): > File "", line 1, in ? > ValueError: complex() arg is a malformed string > > Whatever format is used for output should be accepted as input! This isn't true in general; it's not true of strings, for example, nor of files. Parsing complex numbers would be pretty complicated, because it would have to accept '(2+2j)', '2+2j', '3e-6j', and perhaps even '4j+3'. It seems easier to just use eval() than to make complex() implement an entire mini-parser. --amk From gvanrossum at gmail.com Tue Feb 1 18:27:45 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 1 18:27:51 2005 Subject: [Python-Dev] complex I/O problem In-Reply-To: <20050201171610.GA10114@rogue.amk.ca> References: <20050201171610.GA10114@rogue.amk.ca> Message-ID: On Tue, 1 Feb 2005 12:16:10 -0500, A.M. Kuchling wrote: > On Tue, Feb 01, 2005 at 11:11:37AM -0500, Neal Becker wrote: > > complex ('(2+2j)') > > Traceback (most recent call last): > > File "", line 1, in ? > > ValueError: complex() arg is a malformed string > > > > Whatever format is used for output should be accepted as input! > > This isn't true in general; it's not true of strings, for example, nor > of files. Parsing complex numbers would be pretty complicated, > because it would have to accept '(2+2j)', '2+2j', '3e-6j', and perhaps > even '4j+3'. It seems easier to just use eval() than to make > complex() implement an entire mini-parser. Well, complex('2+2j') works, so it's not that far... But the rules are different: - There's no requirement whatsoever for str(); it can be whatever makes the most sense for the type. - For repr(), if at all possible, eval(repr(x)) == x should hold, in a suitable environment (you may have to import certain things in the namespace). If this can't be made true, repr(x) should be of the form <...>. - If there's no need for str() and repr() to be different, let str(x) == repr(x). So I think complex() is just fine. 
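To make those rules concrete, here is a rough 2.4-style session (added purely as an illustration, not part of the original message):

>>> x = 2 + 2j
>>> repr(x)
'(2+2j)'
>>> eval(repr(x)) == x        # the repr() round-trip works via eval()
True
>>> complex('2+2j')           # the constructor accepts the plain form...
(2+2j)
>>> complex(repr(x))          # ...but not the parenthesised repr() form
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: complex() arg is a malformed string

In other words, the repr() contract is satisfied through eval(), even though the constructor itself only understands a subset of the forms that str() and repr() can produce.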
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Tue Feb 1 18:31:22 2005 From: jcarlson at uci.edu (Josiah Carlson) Date: Tue Feb 1 18:33:45 2005 Subject: [Python-Dev] complex I/O problem In-Reply-To: <20050201171610.GA10114@rogue.amk.ca> References: <20050201171610.GA10114@rogue.amk.ca> Message-ID: <20050201092927.48FF.JCARLSON@uci.edu> "A.M. Kuchling" wrote: > > On Tue, Feb 01, 2005 at 11:11:37AM -0500, Neal Becker wrote: > > complex ('(2+2j)') > > Traceback (most recent call last): > > File "", line 1, in ? > > ValueError: complex() arg is a malformed string > > > > Whatever format is used for output should be accepted as input! > > This isn't true in general; it's not true of strings, for example, nor > of files. Parsing complex numbers would be pretty complicated, > because it would have to accept '(2+2j)', '2+2j', '3e-6j', and perhaps > even '4j+3'. It seems easier to just use eval() than to make > complex() implement an entire mini-parser. Which brings up the fact that while some things are able to make the eval(str(obj)) loop, more are able to make the eval(repr(obj)) loop (like strings themselves...). - Josiah From fdrake at acm.org Tue Feb 1 21:06:17 2005 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue Feb 1 21:06:43 2005 Subject: Moving towards Python 3.0 (was Re: [Python-Dev] Speed up functioncalls) In-Reply-To: <1107198504.4185.5.camel@localhost> References: <1107190265.41fe61f9caab8@mcherm.com> <1107198504.4185.5.camel@localhost> Message-ID: <200502011506.17223.fdrake@acm.org> On Monday 31 January 2005 14:08, Glyph Lefkowitz wrote: > As it stands, this idiom works most of the time, and if an EMFILE errno > triggered the GC, it would always work. That might help things on Unix, but I don't think that's meaningful. Windows is much more sensitive to files being closed, and the refcount solution supports that more effectively than delayed garbage collection strategies. With the current approach, you can delete the file right away after releasing the last reference to the open file object, even on Windows. You can't do that with delayed GC since Windows will be convinced that the file is still open and refuse to let you delete it. To fix that, you'd have to trigger GC from the failed removal operation and try again. I think we'd find there are a lot more operations that need that support than we'd like to think. -Fred -- Fred L. Drake, Jr. From theller at python.net Tue Feb 1 21:17:17 2005 From: theller at python.net (Thomas Heller) Date: Tue Feb 1 21:15:41 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? Message-ID: <4qgwf31u.fsf@python.net> The 2.4 python.org installer installs msvcr71.dll on the target system. If someone uses py2exe or a similar tool to create a frozen application, is he allowed to redistribute this msvcr71.dll to other users together with his application or not, even if he doesn't own MSVC? This was asked on the py2exe users list, but I could not answer this question. Googling for msvcr71.dll finds some site which offer to download it, and they pretend that they are not violating any license, but I wasn't able to find definite words from MS about that. Thanks, Thomas From bac at OCF.Berkeley.EDU Tue Feb 1 23:22:45 2005 From: bac at OCF.Berkeley.EDU (Brett C.) 
Date: Tue Feb 1 23:22:53 2005 Subject: [Python-Dev] python-dev Summary for 2004-12-16 through 2004-12-31 [draft] In-Reply-To: <79990c6b050201011871e86ce3@mail.gmail.com> References: <41FEAAEC.5080805@ocf.berkeley.edu> <79990c6b050201011871e86ce3@mail.gmail.com> Message-ID: <42000135.8030701@ocf.berkeley.edu> Paul Moore wrote: > On Mon, 31 Jan 2005 14:02:20 -0800, Brett C. wrote: > >>2.5 was released just before the time this summary covers so most stuff was on bug >>fixes discovered after the release. > > > Give Guido the time machine keys back! > Fine, but I was going to go back in time, win the lottery, and give so much money to the PSF that a bunch of people were going to work on Python full-time for the rest of their lives. It's your fault, Paul, that isn't going to happen now. =) > I assume you meant 2.4, or is this a blatant attempt to get back ahead > of schedule with summaries? :-) > =) No, it's a typo. Problem of always using and working on 2.5 but having to remember when I am dealing with older versions. > Paul. > > PS If you look in this month's python-dev archives, you'll see > evidence of /F's last attempt to steal the time machine, with a > message posted from the "far future" of Feb 23rd, 2005. He clearly > stalled the machine, as he posted from an alternate reality. Let this > be a warning! Will actually be nice to finally not have to automatically skip the first line in the archive page thanks to that funky email. -Brett From mike at skew.org Wed Feb 2 01:50:51 2005 From: mike at skew.org (Mike Brown) Date: Wed Feb 2 01:51:01 2005 Subject: [Python-Dev] mimetypes and _winreg In-Reply-To: <40CB5684.2090609@garthy.com> Message-ID: <200502020050.j120opQW020156@chilled.skew.org> Following up on this 12 Jun 2004 post... Garth wrote: > Thomas Heller wrote: > >Mike Brown writes: > >>I thought it would be nice to try to improve the mimetypes module by having > >>it, on Windows, query the Registry to get the mapping of filename extensions > >>to media types, since the mimetypes code currently just blindly checks > >>posix-specific paths for httpd-style mapping files. However, it seems that the > >>way to get mappings from the Windows registry is excessively slow in Python. > >> > >>I'm told that the reason has to do with the limited subset of APIs that are > >>exposed in the _winreg module. I think it is that EnumKey(key, index) is > >>querying for the entire list of subkeys for the given key every time you call > >>it. Or something. Whatever the situation is, the code I tried below is way > >>slower than I think it ought to be. > >> > >>Does anyone have any suggestions (besides "write it in C")? Could _winreg > >>possibly be improved to provide an iterator or better interface to get the > >>subkeys? (or certain ones? There are a lot of keys under HKEY_CLASSES_ROOT, > >>and I only need the ones that start with a period). > > > >See this post I made some time ago: > > > > > >>Should I file this as a feature request? > > > >If you still think it should be changed in the core, you should work on > >a patch. > > > I could file a patch if no one else is looking at it. The solution would > be to use RegEnumKeyEx and remove RegQueryInfoKey. This loses > compatability with win16 which I guess is ok. > > Garth I would say it looks like no one else was looking at it, and Garth apparently didn't submit a patch. It's beyond my means to come up with a patch myself. Would someone be willing to take a look at it? Sorry, but I really want access to registry subkeys to stop being so dog-slow. 
:) Thanks for taking a look, -Mike From vwehren at home.nl Wed Feb 2 06:30:04 2005 From: vwehren at home.nl (Vincent Wehren) Date: Wed Feb 2 06:30:05 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: <4qgwf31u.fsf@python.net> References: <4qgwf31u.fsf@python.net> Message-ID: <4200655C.2030305@home.nl> Thomas Heller wrote: > The 2.4 python.org installer installs msvcr71.dll on the target system. > > If someone uses py2exe or a similar tool to create a frozen application, > is he allowed to redistribute this msvcr71.dll to other users together > with his application or not, even if he doesn't own MSVC? According to the EULA, you may distribute anything listed in redist.txt: """2.2 Redistributable Code-General. Microsoft grants you a nonexclusive, royalty-free right to reproduce and distribute the object code form of any portion of the Software listed in REDIST.TXT ("Redistributable Code"). For general redistribution requirements for Redistributable Code, see Section 3.1, below.""" So the right to distribute is coupled to the a) the EULA and b) redist.txt. (As a side note, the Microsoft Visual C++ Toolkit 2003 for example contains NO redistributables per redist.txt). In the case of not owning a compiler at all, chances seem pretty slim you have any rights to distribute anything. -- Vincent Wehren > > This was asked on the py2exe users list, but I could not answer this > question. Googling for msvcr71.dll finds some site which offer to > download it, and they pretend that they are not violating any license, > but I wasn't able to find definite words from MS about that. > > Thanks, > > Thomas > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/vwehren%40home.nl > From t-meyer at ihug.co.nz Wed Feb 2 09:38:00 2005 From: t-meyer at ihug.co.nz (Tony Meyer) Date: Wed Feb 2 09:38:47 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: Message-ID: [Thanks for bringing this up, BTW, Thomas]. [Thomas Heller] >> The 2.4 python.org installer installs msvcr71.dll on the >> target system. >> >> If someone uses py2exe or a similar tool to create a frozen >> application, is he allowed to redistribute this msvcr71.dll >> to other users together with his application or not, even if >> he doesn't own MSVC? [Vincent Wehren] > According to the EULA, Is that the EULA of MS VC++? > you may distribute anything listed in redist.txt: And, just to be clear, mscvr71.dll is in redist.txt? > """2.2 Redistributable Code-General. Microsoft grants you a > nonexclusive, royalty-free right to reproduce and distribute > the object code form of any portion of the Software listed in > REDIST.TXT ("Redistributable Code"). For general redistribution > requirements for Redistributable Code, see Section 3.1, below.""" Is it legit to redistribute an EULA? If so, would you mind sending me a copy of this (off-list)? > So the right to distribute is coupled to the a) the EULA and b) > redist.txt. (As a side note, the Microsoft Visual C++ Toolkit > 2003 for example contains NO redistributables per redist.txt). I'm not that familiar with the names of all these things. Is the "Microsoft Visual C++ Toolkit 2003" the free thing that you can get? > In the case of not owning a compiler at all, chances seem pretty slim > you have any rights to distribute anything. Well, I 'own' a copy of gcc, which is a compiler . 
Can anyone here suggest a way to get around this? As a specific example: the SpamBayes distribution includes a py2exe binary, and it would be nice (although not essential) to build this with 2.4. However, at the moment my name goes down as the release manager, and I don't have (AFAICT) a licence to redistribute msvcr71.dl. Should people in this situation just stick with 2.3 or buy a copy of a MS compiler? =Tony.Meyer From ajm at flonidan.dk Wed Feb 2 11:38:05 2005 From: ajm at flonidan.dk (Anders J. Munch) Date: Wed Feb 2 11:38:20 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? Message-ID: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> >From Tony Meyer [mailto:t-meyer@ihug.co.nz]: > Can anyone here suggest a way to get around this? As a specific > example: the SpamBayes distribution includes a py2exe binary, and it > would be nice (although not essential) to build this with 2.4. > However, at the moment my name goes down as the release manager, and > I don't have (AFAICT) a licence to redistribute msvcr71.dl. Instead of redistributing msvcr71.dll on your own volition, help someone else distribute it: 1. John X. Programmer buys the product, agrees to the EULA and puts the DLL up for download, with the explicit and stated intent of distributing it to anyone who needs it. 2. You, being the nice person you are, decide to help John X. Programmer. You do that by including msvcr71.dll in your software distribution. After all, the users of your software needs it. As you are merely aiding John X. Programmer in performing the redistribution that is within his rights to do, there is no need for anyone to be granted any additional rights, and specifically you do not need to agree to the EULA. Unless the EULA contains specific language to forbid such multi-stage open-ended redistribution, I'd say you can just re-redistribute away. but-then-I-am-not-a-lawyer-ly y'rs, Anders From theller at python.net Wed Feb 2 12:05:32 2005 From: theller at python.net (Thomas Heller) Date: Wed Feb 2 12:04:02 2005 Subject: [Python-Dev] mimetypes and _winreg In-Reply-To: <200502020050.j120opQW020156@chilled.skew.org> (Mike Brown's message of "Tue, 1 Feb 2005 17:50:51 -0700 (MST)") References: <200502020050.j120opQW020156@chilled.skew.org> Message-ID: Mike Brown writes: > Following up on this 12 Jun 2004 post... > > Garth wrote: >> Thomas Heller wrote: >> >Mike Brown writes: >> >>I thought it would be nice to try to improve the mimetypes module by having >> >>it, on Windows, query the Registry to get the mapping of filename extensions >> >>to media types, since the mimetypes code currently just blindly checks >> >>posix-specific paths for httpd-style mapping files. However, it seems that the >> >>way to get mappings from the Windows registry is excessively slow in Python. >> >> >> >>I'm told that the reason has to do with the limited subset of APIs that are >> >>exposed in the _winreg module. I think it is that EnumKey(key, index) is >> >>querying for the entire list of subkeys for the given key every time you call >> >>it. Or something. Whatever the situation is, the code I tried below is way >> >>slower than I think it ought to be. >> >> >> >>Does anyone have any suggestions (besides "write it in C")? Could _winreg >> >>possibly be improved to provide an iterator or better interface to get the >> >>subkeys? (or certain ones? There are a lot of keys under HKEY_CLASSES_ROOT, >> >>and I only need the ones that start with a period). 
>> > >> >See this post I made some time ago: >> > >> > >> >>Should I file this as a feature request? >> > >> >If you still think it should be changed in the core, you should work on >> >a patch. >> > >> I could file a patch if no one else is looking at it. The solution would >> be to use RegEnumKeyEx and remove RegQueryInfoKey. This loses >> compatability with win16 which I guess is ok. >> >> Garth > > I would say it looks like no one else was looking at it, and Garth apparently > didn't submit a patch. It's beyond my means to come up with a patch myself. > Would someone be willing to take a look at it? There is a patch, but, as so often, work on it has stalled. http://www.python.org/sf/977553 Thomas From stephen at xemacs.org Wed Feb 2 13:40:23 2005 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed Feb 2 13:40:40 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> (Anders J. Munch's message of "Wed, 2 Feb 2005 11:38:05 +0100") References: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> Message-ID: <87wttr3zk8.fsf@tleepslib.sk.tsukuba.ac.jp> >>>>> "Anders" == Anders J Munch writes: Anders> Unless the EULA contains specific language to forbid such Anders> multi-stage open-ended redistribution, I'd say you can Anders> just re-redistribute away. Anders> but-then-I-am-not-a-lawyer-ly y'rs, Anders I am not either, but in matters like this it works the other way around: all rights not _explicitly_ granted are reserved. Somebody had better ask a real lawyer; in theory, you could be putting downstream users who share with their friends at risk. -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. From vwehren at home.nl Wed Feb 2 18:27:47 2005 From: vwehren at home.nl (Vincent Wehren) Date: Wed Feb 2 18:27:48 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: References: Message-ID: <42010D93.7010002@home.nl> Tony Meyer wrote: > [Thanks for bringing this up, BTW, Thomas]. > > [Thomas Heller] > > > [Vincent Wehren] > >>According to the EULA, > > > Is that the EULA of MS VC++? The full text of the EULA for Visual C++ Toolkit 2003 can be found at http://msdn.microsoft.com/visualc/vctoolkit2003/eula.aspx For VS.NET: http://proprietary.clendons.co.nz/licenses/eula/VisualStudiodotnetEnterpriseArchitect2002-eula.htm > >>you may distribute anything listed in redist.txt: > > > And, just to be clear, mscvr71.dll is in redist.txt? Not in the free toolkit; in the $-version it must be. > I'm not that familiar with the names of all these things. Is the "Microsoft > Visual C++ Toolkit 2003" the free thing that you can get? Yep. >>In the case of not owning a compiler at all, chances seem pretty slim >>you have any rights to distribute anything. > > > Well, I 'own' a copy of gcc, which is a compiler . > > Can anyone here suggest a way to get around this? As a specific example: > the SpamBayes distribution includes a py2exe binary, and it would be nice > (although not essential) to build this with 2.4. However, at the moment my > name goes down as the release manager, and I don't have (AFAICT) a licence > to redistribute msvcr71.dl. Okay: thinking about this for a bit longer: it is the Python interpreter that needs msvcr71.dll, right. You need the python interpreter for py2exe. 
The distributor of Python is allowed to redistribute msvcr71.dll, and you are acting as re-distributor for the Python interpreter (to end users) and the EULA never even cares for/applies to the frozen binary... -- Vincent Wehren > > Should people in this situation just stick with 2.3 or buy a copy of a MS > compiler? > > =Tony.Meyer > > From nhodgson at bigpond.net.au Wed Feb 2 22:04:40 2005 From: nhodgson at bigpond.net.au (Neil Hodgson) Date: Wed Feb 2 22:04:49 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? References: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> Message-ID: <001e01c5096a$cfbe8480$214c8890@neil> Anders J. Munch: > 1. John X. Programmer buys the product, agrees to the EULA and puts > the DLL up for download, with the explicit and stated intent of > distributing it to anyone who needs it. Disallowed in 3.1(a): # you agree: ... to distribute the Redistributables only ... in # conjunction with and as a part of a software application # product developed by you that adds significant and primary # functionality to the Redistributables > Unless the EULA contains specific language to forbid such multi-stage > open-ended redistribution, I'd say you can just re-redistribute away. Lawyers think like lawyers much better than developers do. Neil From theller at python.net Wed Feb 2 22:16:10 2005 From: theller at python.net (Thomas Heller) Date: Wed Feb 2 22:14:41 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: <001e01c5096a$cfbe8480$214c8890@neil> (Neil Hodgson's message of "Thu, 3 Feb 2005 08:04:40 +1100") References: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> <001e01c5096a$cfbe8480$214c8890@neil> Message-ID: <3bwed5np.fsf@python.net> "Neil Hodgson" writes: > Anders J. Munch: > >> 1. John X. Programmer buys the product, agrees to the EULA and puts >> the DLL up for download, with the explicit and stated intent of >> distributing it to anyone who needs it. > > Disallowed in 3.1(a): > # you agree: ... to distribute the Redistributables only ... in > # conjunction with and as a part of a software application > # product developed by you that adds significant and primary > # functionality to the Redistributables > All this pretty much subsumes what I was thinking. The only question that remains is: why are there some sites like http://www.dll-files.com/ which offer this and other MS dlls for download? For the spambayes binary, maybe there should be another person adding the msvcr71.dll to the distribution that Tony builds? Someone who has a MSVC license, and also is developer on the spambayes project? Thomas From tim.peters at gmail.com Wed Feb 2 23:12:36 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 2 23:12:39 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: <3bwed5np.fsf@python.net> References: <6D9E824FA10BD411BE95000629EE2EC3C6DE48@FLONIDAN-MAIL> <001e01c5096a$cfbe8480$214c8890@neil> <3bwed5np.fsf@python.net> Message-ID: <1f7befae050202141243ecc3a2@mail.gmail.com> [Thomas Heller] > ... > For the spambayes binary, maybe there should be another person adding > the msvcr71.dll to the distribution that Tony builds? Someone who has a > MSVC license, and also is developer on the spambayes project? To the best of my knowledge, Tony is distributing my duly licensed copy of msvcr71.dll with spambayes. And so long as I remain totally ignorant of what Tony actually does, that will remain my best knowledge. Win-win . 
From noamraph at gmail.com Wed Feb 2 23:55:31 2005 From: noamraph at gmail.com (Noam Raphael) Date: Wed Feb 2 23:56:13 2005 Subject: [Python-Dev] A proposal: built in support for abstract methods Message-ID: Hello, I would like to suggest a new method decorator: abstractmethod. I'm definitely not the only one who've thought about it, but I discussed this on c.l.py, and came to think that it's a nice idea. An even Pythonic! This has nothing to do with type checking and adaptation - or, to be more precise, it may be combined with them, but it will live happily without them. I don't understand these issues a great deal. What was my situations? I had to write a few classes, all with the same interface but with a different implementation, that were meant to work inside some infrastructure. The specific class that would be used would be selected by what exactly the user wanted. Some methods of these classes were exactly the same in all of the classes, so naturally, I wrote a base class with an implementation of these methods. But then came the question: and what about the other methods? I wanted to document that they should exist in all the classes of that family, and that they should do XYZ; otherwise, they won't fit the infrastructure. So I wrote something like: def get_changed(self): """This method should return the changed keys since last call.""" raise NotImplementedError But I wasn't happy about it. I thought that @abstractmethod def get_changed(self): """This methods should ...""" would have been nicer. Why? 1. "Beautiful is better than ugly." - Who was talking here about errors? I just wanted to say what the method should do! 2. "Explicit is better than implicit." - This is really the issue. I *meant* to declare that a method should be implemented in subclasses, and what it should do, but I *was* actually defining a method which raises NotImplementedError when called with no arguments. I am used to understanding NotImplementedError as "We should really implement this some day, when we have the time", not as "In order to be a proud subclass of BaseClass, you should implement this method". 3. "There should be one-- and preferably only one --obvious way to do it." - I could have written this in a few other ways: def get_changed(self): """This method should return the changed keys since last call. PURE VIRTUAL. """ def get_changed(self): """This method should return the changed keys since last call.""" raise NotImplementedError, "get_changed is an abstract methods. Subclasses of BaseClass should implement it." What's good about the last example is that when the exception occurs, it would be easier to find the problem. What's bad about it, is that it's completely redundent, and very long to write. Ok. Now another thing: I want classes that contain abstractmethods be uninstantiable. One (and the main) reason is that instantiating that class of mine doesn't make sense. It doesn't know how to do anything useful, and doesn't represent any consistent object that you can have instances of. The other reason is that it will help the programmer to find out quickly methods he forgot to implement in his subclasses. You may say that it suits "Errors should never pass silently." The basic reason why I think this is fitting is that abstract classes are something which is natural when creating class hierarchies; usually, when I write a method, all subclasses must inherit it, or implement another version with a compatible behaviour. 
Sometimes there is no standard behaviour, so all subclasses must choose the second option. This concept is already in use in Python's standard library today! "basestring" was created as the base class of "str" and "unicode". What I'm proposing is just to make this possible also in code written in Python. George Sakkis has posted a very nice Python implementation of this: http://groups-beta.google.com/group/comp.lang.python/msg/597e9ffa7b1f709b To summarize, I think that abstract methods are simply not regular functions, since by definition they don't specify actions, and so they deserves an object of their own. And if it helps with testing the subclasses - then why not? What do you say? Noam From t-meyer at ihug.co.nz Thu Feb 3 01:23:39 2005 From: t-meyer at ihug.co.nz (Tony Meyer) Date: Thu Feb 3 01:23:45 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: Message-ID: [Thomas Heller] >> For the spambayes binary, maybe there should be another >> person adding the msvcr71.dll to the distribution that Tony >> builds? Someone who has a MSVC license, and also is developer >> on the spambayes project? [Tim Peters] > To the best of my knowledge, Tony is distributing my duly > licensed copy of msvcr71.dll with spambayes. And so long as > I remain totally ignorant of what Tony actually does, that > will remain my best knowledge. Win-win . That solves the specific SpamBayes problem. It still seems like this is somewhat of a PITA for people wanting to build frozen Windows apps with Python 2.4, though. OTOH, I can't personally think of anything (apart from the it'll-never-fly go back to VC6 solution or the bound-to-be-terrible static linking solution) that the Python developers can do about it. (Well, there's that chap from Microsoft at PyCon, right? How about one of you convince him to convince Microsoft to give all Python developers a licence to redistribute msvcr71.dll? ). BTW, this bit of the EULA isn't great: ""(iii) to distribute the Licensee Software containing the Redistributables pursuant to an end user license agreement (which may be "break-the-seal", "click-wrap" or signed), with terms no less protective than those contained in this EULA;""" The PSF licence is probably somewhat less protective than that one. I suppose the PSF licence really applies to the source, though, and not the built binary. Or something like that :) (Users giving the software directly to someone else, rather than downloading from the official site, is probably covered by: """You also agree not to permit further distribution of the Redistributables by your end users except you may permit further redistribution of the Redistributables by your distributors to your end-user customers if your distributors only distribute the Redistributables in conjunction with, and as part of, the Licensee Software and you and your distributors comply with all other terms of this EULA.""" Where the users become our redistributors.) =Tony.Meyer From bac at OCF.Berkeley.EDU Thu Feb 3 01:58:19 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Feb 3 01:58:33 2005 Subject: [Python-Dev] redux: fractional seconds in strptime In-Reply-To: <41E83EB8.8060405@ocf.berkeley.edu> References: <16870.61059.451494.303971@montanaro.dyndns.org> <41E74790.60108@ocf.berkeley.edu> <16871.37525.981821.580939@montanaro.dyndns.org> <41E80995.5030901@ocf.berkeley.edu> <16872.3770.25143.582154@montanaro.dyndns.org> <41E83EB8.8060405@ocf.berkeley.edu> Message-ID: <4201772B.90601@ocf.berkeley.edu> Everyone went silent on this topic. 
Does this mean people just stopped caring (which I doubt since I know Skip wants this bad enough to bring it up every so often)? Was it the issue of symmetry with strftime? I am willing to add this (albeit the simple way I proposed in my last email on this thread) but I obviously don't want to bother if no one wants it or likes my proposed solution. -Brett From pje at telecommunity.com Thu Feb 3 02:01:36 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Feb 3 01:59:26 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: References: Message-ID: <5.1.1.6.0.20050202195401.038c04d0@mail.telecommunity.com> At 01:23 PM 2/3/05 +1300, Tony Meyer wrote: >(Users giving the software directly to someone else, rather than downloading >from the official site, is probably covered by: > >"""You also agree not to permit further distribution of the Redistributables >by your end users except you may permit further redistribution of the >Redistributables by your distributors to your end-user customers if your >distributors only distribute the Redistributables in conjunction with, and >as part of, the Licensee Software and you and your distributors comply with >all other terms of this EULA.""" > >Where the users become our redistributors.) Sounds like this puts all Python users in the clear, since Python is the Licensee Software in that case. So, anybody can distribute msvcr71 as "part of" Python. OTOH, the other wording sounds like Python itself has to have a click-wrap, tear-open, or signature EULA! IOW, the EULA appears to prohibit free distribution of the runtime with a program that has no EULA. So, in an amusing turn of events, the EULA actually appears to forbid the current offering of Python for Windows, since it does not have such a EULA. This is a much bigger worry than the original question. If we're actually allowed to distribute Python with the runtime at all, then py2exe and such are perfectly safe, since it's in conjunction with permitted redistribution. If distribution of the runtime is not allowed, on the other hand, then use of MSVC 7 for Python becomes altogether impossible without adding some kind of click-wrap licensing scheme. From t-meyer at ihug.co.nz Thu Feb 3 02:32:11 2005 From: t-meyer at ihug.co.nz (Tony Meyer) Date: Thu Feb 3 02:32:12 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? In-Reply-To: Message-ID: (I should point out the thread that starts here, too: in case anyone isn't aware of it). > Sounds like this puts all Python users in the clear, since > Python is the Licensee Software in that case. So, anybody can > distribute msvcr71 as "part of" Python. I guess it would really take a lawyer (well, probably several) to say whether distributing a frozen application is distributing Python or not. > OTOH, the other wording sounds like Python itself has to have > a click-wrap, tear-open, or signature EULA! IOW, the EULA > appears to prohibit free distribution of the runtime with a > program that has no EULA. > > So, in an amusing turn of events, the EULA actually appears > to forbid the current offering of Python for Windows, since > it does not have such a EULA. I presume that adding a "click-wrap" EULA to the Python .msi would not be difficult. Lots of other .msi's have "click-wrap" licenses, so there must be some sample code that can be used. The license is already in the distribution, it would just be displayed at an additional time. 
The EULA has to be no less restrictive than the MSVC one (presumably only in relation to the bits of MSVC that are being redistributed), so I guess a section at the end of the PSF license that duplicates the relevant bits of the MSVC one would work. (Of course, IANAL). =Tony.Meyer From tjreedy at udel.edu Thu Feb 3 03:11:31 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Thu Feb 3 03:11:51 2005 Subject: [Python-Dev] Re: Is msvcr71.dll re-redistributable? References: <5.1.1.6.0.20050202195401.038c04d0@mail.telecommunity.com> Message-ID: "Phillip J. Eby" wrote in message news:5.1.1.6.0.20050202195401.038c04d0@mail.telecommunity.com... > So, in an amusing turn of events, the EULA actually appears to forbid the > current offering of Python for Windows, since it does not have such a > EULA. Except of course that MS gave Python developers several copies of its newest compiler specifically for the purpose of compiling the Windows distribution. It would be nice to get a clear English statement from MS. I have dealt with the legalese in property sales agreements, lease agreements, and normal software licenses, but the quoted EULA snippets are the most obscure by far. Terry J. Reedy From anthony at interlink.com.au Thu Feb 3 11:30:57 2005 From: anthony at interlink.com.au (anthony@interlink.com.au) Date: Thu Feb 3 11:31:21 2005 Subject: [Python-Dev] Returned mail: Data format error Message-ID: <20050203103119.28BFB1E4003@bag.python.org> Your message was not delivered due to the following reason: Your message could not be delivered because the destination computer was unreachable within the allowed queue period. The amount of time a message is queued before it is returned depends on local configura- tion parameters. Most likely there is a network problem that prevented delivery, but it is also possible that the computer is turned off, or does not have a mail system running right now. Your message could not be delivered within 7 days: Mail server 190.102.237.222 is not responding. The following recipients could not receive this message: Please reply to postmaster@interlink.com.au if you feel this message to be in error. From skip at pobox.com Thu Feb 3 14:12:30 2005 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 3 14:12:02 2005 Subject: [Python-Dev] redux: fractional seconds in strptime In-Reply-To: <4201772B.90601@ocf.berkeley.edu> References: <16870.61059.451494.303971@montanaro.dyndns.org> <41E74790.60108@ocf.berkeley.edu> <16871.37525.981821.580939@montanaro.dyndns.org> <41E80995.5030901@ocf.berkeley.edu> <16872.3770.25143.582154@montanaro.dyndns.org> <41E83EB8.8060405@ocf.berkeley.edu> <4201772B.90601@ocf.berkeley.edu> Message-ID: <16898.9022.505916.761977@montanaro.dyndns.org> Brett> Everyone went silent on this topic. Does this mean people just Brett> stopped caring (which I doubt since I know Skip wants this bad Brett> enough to bring it up every so often)? Was it the issue of Brett> symmetry with strftime? I have a patch to do strptime() fractional seconds, but stumbled on the reverse direction (making strftime() accept fractional seconds). I'll submit a patch with what I have later today. I have to catch a train just now. Skip From Jack.Jansen at cwi.nl Thu Feb 3 15:15:37 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Feb 3 15:16:08 2005 Subject: [Python-Dev] Is msvcr71.dll re-redistributable? 
In-Reply-To: <5.1.1.6.0.20050202195401.038c04d0@mail.telecommunity.com> References: <5.1.1.6.0.20050202195401.038c04d0@mail.telecommunity.com> Message-ID: On 3 Feb 2005, at 02:01, Phillip J. Eby wrote: > Sounds like this puts all Python users in the clear, since Python is > the Licensee Software in that case. So, anybody can distribute > msvcr71 as "part of" Python. > > OTOH, the other wording sounds like Python itself has to have a > click-wrap, tear-open, or signature EULA! IOW, the EULA appears to > prohibit free distribution of the runtime with a program that has no > EULA. > > So, in an amusing turn of events, the EULA actually appears to forbid > the current offering of Python for Windows, since it does not have > such a EULA. That was also my conclusion last year:-( But at least Python can still be distributed without msvcr71, putting the burden of obtaining it on the end user, because of Python's license. In another project we're using GPL, and careful reading (disclaimer: IANAL) has not convinced me that GPL and the EULA are compatible. Actually, I have this vague feeling that the MSVC 7 EULA (plus the fact that MS isn't shipping msvcr71.dll with Windows) might have been drafted specifically to be incompatible with the clause in GPL that doesn't allow you to link against third party libraries unless they're part of the OS. What we've done in that project is link with msvcr71.dll, but not include it in the installer. I think that we could (theoretically) still be dragged into court by the FSF, but at least not by Microsoft. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From gvanrossum at gmail.com Thu Feb 3 16:03:24 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 3 16:03:29 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team Message-ID: If you read BugTraq, python-announce or the Daily Python URL today, you would have noticed a Python Security Advisory. (If you missed it: http://www.python.org/security/PSF-2005-001/ .) This was the first one issued in this form, but I'm sure it won't be the last one. Until now, we haven't had any infrastructure for this type of thing. In this particular case, the original discoverer first asked on c.l.py for advice on how to proceed, which yielded only unhelpful referrals to SF or python-dev. Then he wrote the authors of the affected module. Fredrik was so kind to forward it to me, and I happened to have time to deal with it. (Hey, I work for a security company, so I would have *made* time if I had to.) But I may not always be that responsive -- I could be busy, or traveling, or people might not think of mailing me. I believe it would be better if there was a "response team" for such situations. The response team would normally not have to do anything; they wouldn't have to be actively looking for security bugs, for example. But anyone with a (suspected) security problem related to Python would be able to email the team (e.g. security at python.org), trusting that the information would be kept confidential until a patch is developed; the response team would then investigate the problem and decide on an appropriate response. I want to be on the team; Barry also works for a security company and I hope he'll want to join (he can also make up a better acronym :-); I hope at least one person from the release team can be involved, e.g. 
Anthony; and I would like to see some more volunteers involved to have a good spread of availability and expertise. (How about a Windows user?) If you want to be on the team, send email to me *personally*. For discussion about the team's responsibilities and procedures, please follow up here. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Thu Feb 3 17:01:02 2005 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 3 17:00:21 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: References: Message-ID: <16898.19134.658304.948731@montanaro.dyndns.org> Guido> For discussion about the team's responsibilities and procedures, Guido> please follow up here. I noticed the checkins. I think there is one other necessary output: source patches against all the affected versions need to be made available so people can apply the patch to an existing installed version without needing to upgrade. Skip From gvanrossum at gmail.com Thu Feb 3 17:25:09 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 3 17:25:42 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: <16898.19134.658304.948731@montanaro.dyndns.org> References: <16898.19134.658304.948731@montanaro.dyndns.org> Message-ID: > I noticed the checkins. I think there is one other necessary output: source > patches against all the affected versions need to be made available so > people can apply the patch to an existing installed version without needing > to upgrade. Patches for 2.2, 2.3 and 2.4 are on the website (python.org/security/PSF-2005-001/ has links). The module didn't exist before 2.2. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From 2004b at usenet.alexanderweb.de Thu Feb 3 17:36:59 2005 From: 2004b at usenet.alexanderweb.de (Alexander Schremmer) Date: Thu Feb 3 17:54:01 2005 Subject: [Python-Dev] Re: Is msvcr71.dll re-redistributable? References: <4qgwf31u.fsf@python.net> Message-ID: On Tue, 01 Feb 2005 21:17:17 +0100, Thomas Heller wrote: > The 2.4 python.org installer installs msvcr71.dll on the target system. > > If someone uses py2exe or a similar tool to create a frozen application, > is he allowed to redistribute this msvcr71.dll to other users together > with his application or not, even if he doesn't own MSVC? How about statically compiling the code? Then you do not need to distribute the runtime library. It should not make a big difference for the rather large file python24.dll Kind regards, Alexander From theller at python.net Thu Feb 3 19:37:40 2005 From: theller at python.net (Thomas Heller) Date: Thu Feb 3 19:36:15 2005 Subject: [Python-Dev] Re: Is msvcr71.dll re-redistributable? In-Reply-To: (Alexander Schremmer's message of "Thu, 3 Feb 2005 17:36:59 +0100") References: <4qgwf31u.fsf@python.net> Message-ID: <7jlpbibv.fsf@python.net> Alexander Schremmer <2004b@usenet.alexanderweb.de> writes: > On Tue, 01 Feb 2005 21:17:17 +0100, Thomas Heller wrote: > >> The 2.4 python.org installer installs msvcr71.dll on the target system. >> >> If someone uses py2exe or a similar tool to create a frozen application, >> is he allowed to redistribute this msvcr71.dll to other users together >> with his application or not, even if he doesn't own MSVC? > > How about statically compiling the code? Then you do not need to distribute > the runtime library. 
It should not make a big difference for the rather
> large file python24.dll

This would not work since each binary extension for Python 2.4 uses the dll runtime lib.

Thomas

From bac at OCF.Berkeley.EDU Fri Feb 4 02:39:39 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Fri Feb 4 02:39:48 2005 Subject: [Python-Dev] python-dev Summary for 2005-01-01 through 2005-01-15 [draft] Message-ID: <4202D25B.9030808@ocf.berkeley.edu>

Wow, another summary out the same week as the previous one! Perk of keeping things short and to the point. Then again keeping them this simple and short begs the question of whether the summaries are worth it still at that point. Regardless, probably will send this one out Saturday or Sunday so corrections need to get in by then.

-------------------------------------

=====================
Summary Announcements
=====================

PyCon_ will be upon us come late March! Still time to plan to go.

A warning on the thoroughness of this summary is in order. While trying to delete a single thread of email I managed to accidentally delete my entire python-dev mailbox. I did the best I could to retrieve the emails but it's possible I didn't resuscitate all of my emails, so I may have overlooked something.

.. _PyCon: http://www.pycon.org/

=======
Summary
=======

-------------
PEP movements
-------------

.. tip:: PEP updates by email are available as a topic from the `Python-checkins`_ mailing list.

`PEP 246`_ was a major topic of discussion during the time period covered by this summary. This all stemmed from `Guido's blog`_ entries on optional type checking. This led to a huge discussion on many aspects of protocols, interfaces, and adaptation and the broadening of this author's vocabulary to include "Liskov violation". "Monkey typing" also became a new term to know thanks to Phillip J. Eby's proto-PEP on the topic (found at http://peak.telecommunity.com/DevCenter/MonkeyTyping). Stemming from the phrase "monkey see, monkey do", it's Phillip's version of taking PEP 246 logically farther (I think; the whole thing is more than my currently burned-out-on-school brain can handle right now).

.. _Python-checkins: http://mail.python.org/mailman/listinfo/python-checkins
.. _PEP 246: http://www.python.org/peps/pep-0246.html
.. _Guido's blog: http://www.artima.com/weblogs/index.jsp?blogger=guido

Contributing threads:

- `getattr and __mro__ <>`__
- `Son of PEP 246, redux <>`__
- `PEP 246: lossless and stateless <>`__
- `PEP 246: LiskovViolation as a name <>`__
- `"Monkey Typing" pre-PEP, partial draft <>`__

------------------------------------------------------------------------------------
Optional type checking: how to inadvertently cause a flame war worse than decorators
------------------------------------------------------------------------------------

`Guido's blog`_ had comments on the idea of adding optional static type checking to Python. While just comments in a blog, it caused a massive response from people, mostly negative from what I gathered. After Guido discussed things some more it culminated in a blog entry found at http://www.artima.com/weblogs/viewpost.jsp?thread=87182 that lays out what his actual plans are. I highly recommend reading it since it suggests adding optional run-time type checking for function arguments along with some other proposals. All of this led to `PEP 246`_ getting updated. For some more details on that see the `PEP movements`_ section of this summary.
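For readers who have not followed the blog entries, a very rough sketch of what "optional run-time type checking for function arguments" could look like with 2.4 decorators (an illustration made up for this summary, not Guido's actual proposal)::

    def expects(*types):
        # Hypothetical decorator: check positional argument types at call time.
        def decorate(func):
            def wrapper(*args):
                for arg, wanted in zip(args, types):
                    if not isinstance(arg, wanted):
                        raise TypeError("%r is not a %s" % (arg, wanted.__name__))
                return func(*args)
            return wrapper
        return decorate

    @expects(int, basestring)
    def repeat(n, text):
        return text * n

Nothing like this is being added to the language yet; the decorator is only meant to show the general shape such checks might take.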
And if there is a lesson to be learned from all of this, it's that when Alex Martelli and Phillip J. Eby start a technical discussion it's going to be long, in-depth, complex, and lead to my inbox being brimming in python-dev email. ------------------------------ Let's get the AST branch done! ------------------------------ Guido posted an email to the list stating he would like to to make progress towards integrating "things like type inferencing, integrating PyChecker, or optional static type checking" into Python. In order to make that easier he put out a request that people work on the AST branch and finish it. For those that don't know about Python's back-end, the compiler as it stands now takes the parse tree from the parser and emits bytecode directly from that. This is far from optimal since the parse tree is more verbose than needed and it is not the easiest thing to work with. The AST branch attempts to fix this by taking a more traditional approach to compiling. This means the parse tree is used to generate an AST (abstract syntax tree; and even more technically could be considered a control flow graph in view of how it is implemented) which in turn is used to emit bytecode. The AST itself is much easier to work with when compared to the parse tree; better to know you are working with an 'if' guard thanks to it being an 'if' node in the AST than checking if the parse tree statement you are working with starts with 'if' and ends with a ':'. While all of this sounds great, the issue is the AST branch is not finished yet. It is not entirely far off, but new features from 2.4 (decorators and generator expressions) need to be added along with more bug fixing and clean up. This means the AST branch is going to get finished for 2.5 somehow. But help is needed. While the usual suspects who have previously contributed to the branch are hoping to finish it, more help is always appreciated. If you care to get involved, check out the AST branch (tagged as 'ast-branch' in CVS; see the `python-dev FAQ`_ on how to do a tagged branch checkout), read Python/compile.txt and just dive in! There will also be a sprint on the AST branch at PyCon. .. _python-dev FAQ: http://www.python.org/dev/devfaq.html Contributing threads: - `Please help complete the AST branch <>`__ - `Will ASTbranch compile on windows yet? <>`__ - `ast branch pragmatics <>`__ - `Re: [Python-checkins] python/dist/src/Python pythonrun.c, 2.161.2.15, 2.161.2.16 <>`__ -------------------------------- Ditching unbound methods in Py3k -------------------------------- Guido suggested removing unbound methods from Python since their usefulness of checking their first argument and other slight differences from functions just didn't seem worth keeping around and complicating the language. So the idea seems sound. But then people with uses for the extra information kept in unbound methods (im_func and im_self) popped up. To make the long thread short, enough people stepped up mentioning uses they had for the information for Guido to retract the suggestion in the name of backwards compatibility. But unbound methods are now on the list of things to go in Python 3000. 
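For reference, this is the behaviour and the extra information that would go away; an unbound method knows which class it came from and type-checks its first argument (illustrative Python 2.x session)::

    >>> class C(object):
    ...     def meth(self):
    ...         return 42
    ...
    >>> C.meth
    <unbound method C.meth>
    >>> C.meth.im_func, C.meth.im_class, C.meth.im_self
    (<function meth at 0x...>, <class '__main__.C'>, None)
    >>> C.meth(3.14)
    Traceback (most recent call last):
      ...
    TypeError: unbound method meth() must be called with C instance as first argument (got float instance instead)

It is this class reference and first-argument check that would disappear if C.meth simply returned the plain function.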
Contributing threads: - `Let's get rid of unbound methods <>`__ - `Getting rid of unbound methods: patch available <>`__ - `PEP 246 - concrete assistance to developers of new adapter classes <>`__ ------------------------------------------ Getting exceptions to be new-style classes ------------------------------------------ A patch to allow exceptions to be new-style classes is currently at http://www.python.org/1104669 . The plan is to get that patch in order, apply it, and as long as a ton of code does not break from exceptions moving from classic to new-style classes it will be made permanent in 2.5 . This in no way touches on the major changes as touched upon in a `previous summary `__ which will need a PEP to get the hierarchy cleaned up and discuss any possible changes to bar 'except' statements. Contributing threads: - `Exceptions *must*? be old-style classes? <>`__ =============== Skipped Threads =============== - Mac questions - 2.3.5 schedule, and something I'd like to get in - csv module TODO list - an idea for improving struct.unpack api - Minor change to behaviour of csv module - PATCH/RFC for AF_NETLINK support - logging class submission - Recent IBM Patent releases - frame.f_locals is writable - redux: fractional seconds in strptime - Darwin's realloc(...) implementation never shrinks allocations From kbk at shore.net Fri Feb 4 05:41:59 2005 From: kbk at shore.net (Kurt B. Kaiser) Date: Fri Feb 4 05:42:08 2005 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200502040442.j144fxhi015740@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 284 open ( +4) / 2748 closed ( +1) / 3032 total ( +5) Bugs : 804 open ( +1) / 4812 closed (+13) / 5616 total (+14) RFE : 167 open ( +0) / 142 closed ( +1) / 309 total ( +1) New / Reopened Patches ______________________ Patch for Lib/bsddb/__init__.py to work with modulefinder (2005-01-31) http://python.org/sf/1112812 opened by Tony Meyer New tutorial tests in test_generators.py (2005-01-31) http://python.org/sf/1113421 opened by Francis Girard Add SSL certificate validation (2005-02-01) http://python.org/sf/1114345 opened by James Eagan support PY_LONGLONG in structmember (2005-02-02) http://python.org/sf/1115086 opened by Sam Rushing Add SSL certificate validation (2005-02-03) http://python.org/sf/1115631 opened by James Eagan Patches Closed ______________ Make history recall a-cyclic (2004-03-11) http://python.org/sf/914546 closed by kbk New / Reopened Bugs ___________________ Cannot ./configure on FC3 with gcc 3.4.2 (2005-01-26) CLOSED http://python.org/sf/1110007 reopened by liturgist cgi.FieldStorage memory usage can spike in line-oriented ops (2005-01-30) http://python.org/sf/1112549 opened by Chris McDonough patch 1079734 broke cgi.FieldStorage w/ multipart post req. (2005-01-31) http://python.org/sf/1112856 opened by Irmen de Jong ioctl has problems on 64 bit machines (2005-01-31) http://python.org/sf/1112949 opened by Stephen Norris move_file()'s return value when dry_run=1 unclear (2005-01-31) http://python.org/sf/1112955 opened by Eelis Please add do-while guard to Py_DECREF etc. 
(2005-01-31) http://python.org/sf/1113244 opened by Richard Kettlewell OSATerminology still semi-broken (2005-01-31) http://python.org/sf/1113328 opened by has document {m} regex matcher wrt empty matches (2005-01-31) http://python.org/sf/1113484 opened by Wummel keywords in keyword_arguments not possible (2005-02-01) CLOSED http://python.org/sf/1113984 opened by Christoph Zwerschke inicode.decode (2005-02-01) CLOSED http://python.org/sf/1114093 opened by Manlio Perillo copy.py bug (2005-02-02) http://python.org/sf/1114776 opened by Vincenzo Di Somma webbrowser doesn't start default Gnome browser by default (2005-02-02) http://python.org/sf/1114929 opened by Jeremy Sanders eval ! (2005-02-02) CLOSED http://python.org/sf/1115039 opened by Andrew Collier Built-in compile function with PEP 0263 encoding bug (2005-02-03) http://python.org/sf/1115379 opened by Christoph Zwerschke os.path.splitext don't handle unix hidden file correctly (2005-02-04) http://python.org/sf/1115886 opened by Jeong-Min Lee Bugs Closed ___________ broken link in tkinter docs (2005-01-24) http://python.org/sf/1108490 closed by jlgijsbers recursion core dumps (2005-01-26) http://python.org/sf/1110055 closed by tim_one install_lib fails under Python 2.1 (2004-11-02) http://python.org/sf/1058960 closed by loewis Double __init__.py executing (2004-06-22) http://python.org/sf/977250 closed by loewis Cannot ./configure on FC3 with gcc 3.4.2 (2005-01-26) http://python.org/sf/1110007 closed by liturgist IDLE hangs due to subprocess (2004-12-28) http://python.org/sf/1092225 closed by kbk Empty curses module is loaded in win32 (2004-07-12) http://python.org/sf/989333 closed by tebeka Tab / Space Configuration Does Not Work in IDLE (2003-08-05) http://python.org/sf/783887 closed by kbk Negative numbers to os.read() cause segfault (2004-12-01) http://python.org/sf/1077106 closed by mwh Time module missing from latest module index (2005-01-25) http://python.org/sf/1109523 closed by montanaro keywords in keyword_arguments not possible (2005-02-01) http://python.org/sf/1113984 closed by rhettinger unicode.decode (2005-02-01) http://python.org/sf/1114093 closed by lemburg eval ! (2005-02-02) http://python.org/sf/1115039 closed by rhettinger New / Reopened RFE __________________ All Statements Should Have Return Values (Syntax Proposal) (2005-02-01) CLOSED http://python.org/sf/1114404 opened by Lenny Domnitser RFE Closed __________ All Statements Should Have Return Values (Syntax Proposal) (2005-02-01) http://python.org/sf/1114404 closed by goodger From burt at dfki.de Fri Feb 4 15:04:33 2005 From: burt at dfki.de (burt@dfki.de) Date: Fri Feb 4 15:04:36 2005 Subject: [Python-Dev] JOB OPENING: Implementor for Python and Search Message-ID: <87fz0ch15a.fsf@dfki.uni-sb.de> I hope posting job vacancies does not violate established list netiquette. The job in question is mainly to do with the PyPy EU project. -- Alastair --- ---- Alastair Burt German Centre for AI (DFKI), Stuhlsatzenhausweg 3 Saarbruecken 66123, Germany Email: burt@dfki.de Tel: +49 681 302 2565 Fax: +49 681 302 5338 DFKI-LT - Job Opening The German Research Center for Artificial Intelligence (DFKI GmbH) is seeking for its Language Technology Lab a researcher/software developer with a strong background in Computer Science, who is interested in working on industrial R&D in the area of language implementation and support for the semantic web. The contract will be for approx. two years, with extensions being subject to availability of funding. 
Description of work The successful candidate will carry on research and collaborate on software design and implementation in the following areas: - Investigation of search in constraint and logic programming languages. - Investigation of query languages for the semantic web. - Design of conceptual framework for search in Python. - Implementation of search in Python using the facilities offered by PyPy. - Application of the new search functionality to support queries in ontology driven web sites. Requirements: - Good programming skills, particularly in the Python programming language. - Knowledge of semantic web technologies. - Excellent communication skills in English. - Working with high motivation in a team. Additional assets: - Knowledge of the implementation of Python. - Knowledge of constraint programming. Additional Information DFKI GmbH is located on the campus of Saarland University in Saarbr?cken, Germany. The university's research groups and curricula in the fields of Computational Linguistics and Computer Science are internationally renowned. The LT-Lab offers excellent working conditions in a well-established research group. The position provides opportunities to collaborate in a variety of international projects. The competitive salary is calculated according to qualifications based on DFKI GmbH scales. The successful candidate will have opportunities for improving their qualification. Please send your electronic application (preferably in PDF format) to lt-jobs@dfki.de, referring to job opening No. 200501, not later than February 15, 2005. A meaningful application should include a cover letter, a CV, a brief summary of research interests, a statement of interest in the position offered, and contact information for three references. From fdrake at acm.org Fri Feb 4 17:06:39 2005 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri Feb 4 17:06:54 2005 Subject: [Python-Dev] JOB OPENING: Implementor for Python and Search In-Reply-To: <87fz0ch15a.fsf@dfki.uni-sb.de> References: <87fz0ch15a.fsf@dfki.uni-sb.de> Message-ID: <200502041106.22317.fdrake@acm.org> On Friday 04 February 2005 09:04, burt@dfki.de wrote: > I hope posting job vacancies does not violate established list > netiquette. The job in question is mainly to do with the PyPy EU project. There's a Python Job Board on python.org; see http://www.python.org/Jobs-howto.html for information on posting opportunities there. -Fred -- Fred L. Drake, Jr. From skip at pobox.com Fri Feb 4 17:31:05 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 4 17:31:09 2005 Subject: [Python-Dev] JOB OPENING: Implementor for Python and Search In-Reply-To: <87fz0ch15a.fsf@dfki.uni-sb.de> References: <87fz0ch15a.fsf@dfki.uni-sb.de> Message-ID: <16899.41801.358248.884554@montanaro.dyndns.org> Alastair> I hope posting job vacancies does not violate established list Alistair> netiquette. The job in question is mainly to do with the PyPy Alistair> EU project. Not a huge faux pas, but you will get much better exposure by submitting it to the Python Job Board. Details on posting vacancies can be found here: http://www.python.org/Jobs-howto.html -- Skip Montanaro skip@mojam.com http://www.mojam.com/ From gvanrossum at gmail.com Fri Feb 4 19:46:52 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri Feb 4 19:46:59 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Misc NEWS, 1.1237, 1.1238 In-Reply-To: References: Message-ID: [jhylton@users.sourceforge.net] > Log Message: > Add NEWS item about future parser bug. 
Give back the time machine! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jhylton at gmail.com Fri Feb 4 20:00:25 2005 From: jhylton at gmail.com (Jeremy Hylton) Date: Fri Feb 4 20:00:29 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Misc NEWS, 1.1237, 1.1238 In-Reply-To: References: Message-ID: On Fri, 4 Feb 2005 10:46:52 -0800, Guido van Rossum wrote: > [jhylton@users.sourceforge.net] > > Log Message: > > Add NEWS item about future parser bug. > > Give back the time machine! I already will have by the time you needed it. Jeremy From Jack.Jansen at cwi.nl Sat Feb 5 00:46:11 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Sat Feb 5 00:46:18 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Misc NEWS, 1.1237, 1.1238 In-Reply-To: References: Message-ID: On 4-feb-05, at 20:00, Jeremy Hylton wrote: >>> Add NEWS item about future parser bug. >> >> Give back the time machine! > > I already will have by the time you needed it. I knew this was going to happen one day. (And now we should all be getting out our copies of the HHGTTG and work out the horrible future past conditional tense and such. It's probably having shall been in book 2). -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From jjl at pobox.com Sat Feb 5 00:57:23 2005 From: jjl at pobox.com (John J Lee) Date: Sat Feb 5 01:00:25 2005 Subject: [Python-Dev] cookielib patch Message-ID: Anyone like to commit 1028908? Patch was written by module author (me), including an important doc warning re (lack of) thread safety which I mistakenly thought had got into 2.4.0. John From jjl at pobox.com Sat Feb 5 01:06:37 2005 From: jjl at pobox.com (John J Lee) Date: Sat Feb 5 01:09:38 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: References: Message-ID: On Thu, 3 Feb 2005, Guido van Rossum wrote: [...] > hope at least one person from the release team can be involved, e.g. [...] Guido, from python-announce list: [...] > Python 2.3.5 will be released from www.python.org within a few days > containing a fix for this issue. Python 2.4.1 will be released later > this month containing the same fix. Patches for Python 2.2, 2.3 and > 2.4 are also immediately available: [...] Hope this question isn't too dumb: How will Python releases made in response to security bugs be done: will they just include the security fix (rather than being taken from CVS HEAD), without the usual alpha / beta testing cycle? Or what...? John From anthony at interlink.com.au Sat Feb 5 07:43:17 2005 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Feb 5 07:43:26 2005 Subject: [Python-Dev] 2.3.5 and 2.4.1 release plans Message-ID: <200502051743.18393.anthony@interlink.com.au> Ok, so here's the state of play: 2.3.5 is currently aimed for next Tuesday, but there's an outstanding issue - the new copy code appears to have broken something, see www.python.org/sf/1114776 for the gory details. I'm completely out of time this weekend to look into it too closely - if someone has 1/2 an hour and wants to do some triage on the bug, I'd appreciate it, a great deal. I'm currently thinking about a 2.4.1 around the 23td of Feb - Martin and Fred, does this work for you? There's a bunch of backporting that should probably happen for that - I will try to get some time to do this in the next week or so. -- Anthony Baxter It's never too late to have a happy childhood. 
From anthony at interlink.com.au Sat Feb 5 07:44:32 2005 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Feb 5 07:44:44 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python future.c, 2.14, 2.15 In-Reply-To: References: Message-ID: <200502051744.32779.anthony@interlink.com.au> On Saturday 05 February 2005 05:38, jhylton@users.sourceforge.net wrote: > Fix bug that allowed future statements virtually anywhere in a module. > > If we exit via the break here, we need to set ff_last_lineno or > FUTURE_POSSIBLE() will remain true. The bug affected statements > containing a variety of expressions, but not all expressions. It has > been present since Python 2.2. While this is undoubtedly a bug fix, I'm not sure that it should be backported - it will break people's code that is "working" now (albeit in a faulty way). What do people think? -- Anthony Baxter It's never too late to have a happy childhood. From python at rcn.com Sat Feb 5 08:31:26 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 5 08:35:13 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python future.c, 2.14, 2.15 In-Reply-To: <200502051744.32779.anthony@interlink.com.au> Message-ID: <001701c50b54$b3c94e40$2c10c797@oemcomputer> [Anthony] > While this is undoubtedly a bug fix, I'm not sure that it should be > backported - it will break people's code that is "working" now (albeit > in a faulty way). What do people think? I concur -- the balance of risks is towards the patch causing more harm than good. Raymond From aleax at aleax.it Sat Feb 5 09:06:53 2005 From: aleax at aleax.it (Alex Martelli) Date: Sat Feb 5 09:06:51 2005 Subject: [Python-Dev] 2.3.5 and 2.4.1 release plans In-Reply-To: <200502051743.18393.anthony@interlink.com.au> References: <200502051743.18393.anthony@interlink.com.au> Message-ID: <346c27eb4c05f81a6e089b28d19079b5@aleax.it> On 2005 Feb 05, at 07:43, Anthony Baxter wrote: > Ok, so here's the state of play: 2.3.5 is currently aimed for next > Tuesday, > but there's an outstanding issue - the new copy code appears to have > broken something, see www.python.org/sf/1114776 for the gory details. > I'm completely out of time this weekend to look into it too closely - > if > someone has 1/2 an hour and wants to do some triage on the bug, I'd > appreciate it, a great deal. Done: the issue is easy to fix but not to reproduce, and I'd like to reproduce it so as to fix the unit tests, which currently don't catch the problem. The problem boils down to: deepcopying an instance of a type that doesn't have an __mro__ (and is not one of the many types explicitly recorded in the _deepcopy_dispatch dictionary, such as types.ClassType, types.InstanceType, etc, etc). The easy fix: instead of cls.__mro__ use inspect.getmro which deals with that specifically. Before I commit the fix: can anybody help out with an example of a type anywhere in the standard library that should be deepcopyable, used to be deepcopyable in 2.3.4, isn't one of those which get explicitly recorded in copy._deepcopy_dispatch, AND doesn't have an __mro__? Even the _testcapi.Copyable type magically grows an __mro__; I'm not sure how to MAKE a type w/o one... 
Thanks, Alex From jhylton at gmail.com Sat Feb 5 16:49:13 2005 From: jhylton at gmail.com (Jeremy Hylton) Date: Sat Feb 5 16:49:16 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python future.c, 2.14, 2.15 In-Reply-To: <001701c50b54$b3c94e40$2c10c797@oemcomputer> References: <200502051744.32779.anthony@interlink.com.au> <001701c50b54$b3c94e40$2c10c797@oemcomputer> Message-ID: On Sat, 5 Feb 2005 02:31:26 -0500, Raymond Hettinger wrote: > [Anthony] > > While this is undoubtedly a bug fix, I'm not sure that it should be > > backported - it will break people's code that is "working" now (albeit > > in a faulty way). What do people think? > > I concur -- the balance of risks is towards the patch causing more harm > than good. I would not backport it to Python 2.3. People have been using it for a long time. I'd be inclined to backport it to Python 2.4, which is still relatively new. If someone has buggy code, an upgrade is going to cause a problem for them at some point. Given how unlikely the risk is -- particularly given that division is the only useful future now -- I'd say the risk is acceptable for Python 2.4.1. (Unlike, say, Python 2.4.2.) Jeremy From aleax at aleax.it Sat Feb 5 17:01:18 2005 From: aleax at aleax.it (Alex Martelli) Date: Sat Feb 5 17:01:18 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python future.c, 2.14, 2.15 In-Reply-To: References: <200502051744.32779.anthony@interlink.com.au> <001701c50b54$b3c94e40$2c10c797@oemcomputer> Message-ID: <22e5d82ed8314b0280871488c3d75356@aleax.it> On 2005 Feb 05, at 16:49, Jeremy Hylton wrote: > On Sat, 5 Feb 2005 02:31:26 -0500, Raymond Hettinger > wrote: >> [Anthony] >>> While this is undoubtedly a bug fix, I'm not sure that it should be >>> backported - it will break people's code that is "working" now >>> (albeit >>> in a faulty way). What do people think? >> >> I concur -- the balance of risks is towards the patch causing more >> harm >> than good. > > I would not backport it to Python 2.3. People have been using it for > a long time. I'd be inclined to backport it to Python 2.4, which is > still relatively new. If someone has buggy code, an upgrade is going > to cause a problem for them at some point. Given how unlikely the > risk is -- particularly given that division is the only useful future > now -- I'd say the risk is acceptable for Python 2.4.1. (Unlike, say, > Python 2.4.2.) +1 on having the fix in 2.4.1 but not in 2.3.5 -- exactly for the reasons Jeremy is giving. Alex From gvanrossum at gmail.com Sat Feb 5 17:02:46 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Feb 5 17:02:49 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: References: Message-ID: > How will Python releases made in response to security bugs be done: will > they just include the security fix (rather than being taken from CVS > HEAD), without the usual alpha / beta testing cycle? Or what...? Depends where you get the release. *Vendors* (ActiveState, Red Hat, Ubuntu, Debian, etc.) typically release a new version that has *just* the fix; they have the infrastructure in place to do this sort of thing quickly and to let their customers benefit quickly. On python.org, however, we tend to take the maintenance branch for a particular version (e.g. 2.3.x or 2.4.x), add the fix, and accellerate the release. For example, we'll release 2.3.5 next week, and 2.4.1 probably some time this month. 
(In addition, of course, we publish the raw patch; also, we might end up making exceptions and/or start following the vendors' example in some or all cases). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sat Feb 5 21:31:34 2005 From: skip at pobox.com (Skip Montanaro) Date: Sat Feb 5 21:31:06 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: References: Message-ID: <16901.11558.650334.340590@montanaro.dyndns.org> >> How will Python releases made in response to security bugs be done: >> will they just include the security fix (rather than being taken from >> CVS HEAD), without the usual alpha / beta testing cycle? Or what...? Guido> On python.org, however, we tend to take the maintenance branch Guido> for a particular version (e.g. 2.3.x or 2.4.x), add the fix, and Guido> accellerate the release. Would it be possible to release a 2.3.4a that has just the fix over and above the released version? In this case it turns out that the fix nearly coincided with the release of 2.3.5 and 2.4.1. Would you do an accelerated release if this had come up right after they were released? Skip From python at rcn.com Sat Feb 5 21:44:34 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 5 21:48:29 2005 Subject: [Python-Dev] Wanted: members for Python Security Response Team In-Reply-To: <16901.11558.650334.340590@montanaro.dyndns.org> Message-ID: <001b01c50bc3$81f3e460$fa01a044@oemcomputer> > Would it be possible to release a 2.3.4a that has just the fix over and > above the released version? In this case it turns out that the fix nearly > coincided with the release of 2.3.5 and 2.4.1. Would you do an > accelerated > release if this had come up right after they were released? Just go to 2.3.6. No need to add a further complication to the numbering scheme. Raymond From tjreedy at udel.edu Sat Feb 5 23:22:49 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Feb 5 23:23:08 2005 Subject: [Python-Dev] Re: Wanted: members for Python Security Response Team References: <16901.11558.650334.340590@montanaro.dyndns.org> <001b01c50bc3$81f3e460$fa01a044@oemcomputer> Message-ID: "Raymond Hettinger" wrote in message news:001b01c50bc3$81f3e460$fa01a044@oemcomputer... >> Would it be possible to release a 2.3.4a that has just the fix over > and >> above the released version? In this case it turns out that the fix > nearly >> coincided with the release of 2.3.5 and 2.4.1. Would you do an >> accelerated >> release if this had come up right after they were released? > Just go to 2.3.6. No need to add a further complication to the > numbering scheme. As I remember, 2.3.1 was precedent for this -- a quick fix-one-critical-item release about a week after 2.3. Perhaps Python.org should have a release-announcement-only mailing list for people who would not get the news any other way. And/or perhaps final release announcements and security warnings could be made on the various Python-application mail lists if not so done already. Terry J. 
Reedy From ncoghlan at iinet.net.au Sun Feb 6 03:31:54 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sun Feb 6 03:32:00 2005 Subject: [Python-Dev] Re: Wanted: members for Python Security Response Team In-Reply-To: References: <16901.11558.650334.340590@montanaro.dyndns.org> <001b01c50bc3$81f3e460$fa01a044@oemcomputer> Message-ID: <4205819A.4000108@iinet.net.au> Terry Reedy wrote: > Perhaps Python.org should have a release-announcement-only mailing list for > people who would not get the news any other way. And/or perhaps final > release announcements and security warnings could be made on the various > Python-application mail lists if not so done already. Alternately, could some topics be set up on the existing lists? (ala the new PEP topic for the checkins list). Regards, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From tim.peters at gmail.com Sun Feb 6 08:34:00 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sun Feb 6 08:34:04 2005 Subject: [Python-Dev] 2.3.5 and 2.4.1 release plans In-Reply-To: <346c27eb4c05f81a6e089b28d19079b5@aleax.it> References: <200502051743.18393.anthony@interlink.com.au> <346c27eb4c05f81a6e089b28d19079b5@aleax.it> Message-ID: <1f7befae05020523344e36fb3e@mail.gmail.com> [Anthony Baxter] >> Ok, so here's the state of play: 2.3.5 is currently aimed for next >> Tuesday, but there's an outstanding issue - the new copy code appears >> to have broken something, see www.python.org/sf/1114776 for the gory >> details. ... [Alex Martelli] > The problem boils down to: deepcopying an instance of a type that > doesn't have an __mro__ (and is not one of the many types explicitly > recorded in the _deepcopy_dispatch dictionary, such as types.ClassType, > types.InstanceType, etc, etc). > > The easy fix: instead of cls.__mro__ use inspect.getmro which deals > with that specifically. > > Before I commit the fix: can anybody help out with an example of a type > anywhere in the standard library that should be deepcopyable, used to > be deepcopyable in 2.3.4, isn't one of those which get explicitly > recorded in copy._deepcopy_dispatch, AND doesn't have an __mro__? Even > the _testcapi.Copyable type magically grows an __mro__; I'm not sure > how to MAKE a type w/o one... Since the original bug report came from Zopeland, chances are good (although the report is too vague to be sure) that the problem involves ExtensionClass. That's complicated C code in Zope predating new-style classes, making it possible to build Python-class-like objects in C code under old Pythons. In general, EC-derived classes don't play well with newer Python features (well, at least not until Zope 2.8, where ExtensionClass is recoded as a new-style Python class -- but still keeping some semantics from old-style classes ... ). Anyway, I expect that instances of any EC-derived class would have the problem in the bug report. For example, the base Persistent class in ZODB 3.2.5 is an ExtensionClass: $ \python23\python.exe Python 2.3.5c1 (#61, Jan 25 2005, 19:52:06) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import ZODB # don't ask -- it's necessary to import this first >>> from Persistence import Persistent >>> p = Persistent() >>> import copy >>> copy.deepcopy(p) # deepcopy() barfs on __mro__ Traceback (most recent call last): File "", line 1, in ? 
File "C:\Python23\lib\copy.py", line 200, in deepcopy copier = _getspecial(cls, "__deepcopy__") File "C:\Python23\lib\copy.py", line 66, in _getspecial for basecls in cls.__mro__: AttributeError: __mro__ >>> copy.copy(p) # copy() does too Traceback (most recent call last): File "", line 1, in ? File "C:\Python23\lib\copy.py", line 86, in copy copier = _getspecial(cls, "__copy__") File "C:\Python23\lib\copy.py", line 66, in _getspecial for basecls in cls.__mro__: AttributeError: __mro__ Unsure whether this is enough, but at least inspect.getmro() isn't phased by an EC-derived class: >>> inspect.getmro(Persistent) (,) More info from the bug report filer is really needed. A problem is that this stuff doesn't appear "to work" under Python 2.3.4 either: $ ../Python-2.3.4/python Python 2.3.4 (#1, Aug 9 2004, 17:15:36) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import ZODB >>> from Persistence import Persistent >>> p = Persistent() >>> import copy >>> copy.deepcopy(p) Traceback (most recent call last): File "", line 1, in ? File "/home/tim/Python-2.3.4/Lib/copy.py", line 206, in deepcopy y = _reconstruct(x, rv, 1, memo) File "/home/tim/Python-2.3.4/Lib/copy.py", line 338, in _reconstruct y = callable(*args) TypeError: ExtensionClass object argument after * must be a sequence >>> copy.copy(p) Traceback (most recent call last): File "", line 1, in ? File "/home/tim/Python-2.3.4/Lib/copy.py", line 95, in copy return _reconstruct(x, rv, 0) File "/home/tim/Python-2.3.4/Lib/copy.py", line 338, in _reconstruct y = callable(*args) TypeError: ExtensionClass object argument after * must be a sequence >>> From aleax at aleax.it Sun Feb 6 09:07:30 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 6 09:07:30 2005 Subject: [Python-Dev] 2.3.5 and 2.4.1 release plans In-Reply-To: <1f7befae05020523344e36fb3e@mail.gmail.com> References: <200502051743.18393.anthony@interlink.com.au> <346c27eb4c05f81a6e089b28d19079b5@aleax.it> <1f7befae05020523344e36fb3e@mail.gmail.com> Message-ID: On 2005 Feb 06, at 08:34, Tim Peters wrote: ... >> The easy fix: instead of cls.__mro__ use inspect.getmro which deals >> with that specifically. ... > Since the original bug report came from Zopeland, chances are good > (although the report is too vague to be sure) that the problem > involves ExtensionClass. That's complicated C code in Zope predating True, of course. Still, any type w/o an __mro__ that's not recorded in the dispatch table will tickle the same bug -- give the same traceback, at least (if the original submitter would then proceed to tickle more bugs once this one's solved, I can't know, of course -- but this one does need fixing). > Unsure whether this is enough, but at least inspect.getmro() isn't > phased by an EC-derived class: I'm pretty sure it's enough -- at least for SOME "types w/o __mro__". Thanks to a suggestion from John Lenton on c.l.py, I was able to make a unit test based on: class C(type): def __getattribute__(self, attr): if attr == '__mro__': raise AttributeError, "What, *me*, a __mro__? Nevah!" return super(C, self).__getattribute__(attr) class D(object): __metaclass__ = C Cheating, maybe, but it does show that the 2.3.5rc1 copy.py breaks and moving to inspect.mro repairs the break, which is all one really asks of a tiny unit test;-). So, I've committed test and fix on the 2.3 maintenance branch and marked the bug as fixed. 
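Written out as a runnable snippet (Python 2.x), the trick is just a metaclass that refuses to reveal __mro__; inspect.getmro() copes by falling back to walking __bases__:

    import inspect

    class C(type):
        def __getattribute__(self, attr):
            if attr == '__mro__':
                raise AttributeError("What, *me*, a __mro__?  Nevah!")
            return super(C, self).__getattribute__(attr)

    class D(object):
        __metaclass__ = C

    # D.__mro__ raises AttributeError, so code that blindly walks
    # cls.__mro__ (as the copy.py in the 2.3.5 release candidate did)
    # blows up here, while inspect.getmro(D) still returns (D, object)
    # by falling back to __bases__.
    print inspect.getmro(D)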
(Hmmmm, is it only me, or is sourceforce bug browsing broken for bugs with 7-digits numbers? This one was 1114776 -- first one w/a 7-digit number I had yet seen -- and in no way could I get the browser to list it, it kept listing only 6-digit ones...). Alex From skip at pobox.com Sun Feb 6 17:49:05 2005 From: skip at pobox.com (Skip Montanaro) Date: Sun Feb 6 17:49:12 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: References: Message-ID: <16902.19073.787609.523027@montanaro.dyndns.org> In a python-checkins message, Raymond stated: Raymond> Replace list of constants with tuples of constants. I understand the motivation here (the peephole optimizer can convert a tuple of constants into a single constant that need not be constructed over and over), but is the effort worth the cost of changing the logical nature of the data structures used? If lists are conceptually like vectors or arrays in other languages and tuples are like C structs or Pascal records, then by converting from list to tuple form you've somehow muddied the data structure water just to take advantage of tuples' immutability. Wouldn't it be better to have the peephole optimizer recognize the throwaway nature of lists in these contexts: for elt in [1, 2, 4, 8, 16]: ... if foo in [list, tuple]: ... (anywhere a list of constants immediately follows the "in" or "not in" keywords) and convert them into constants? The cases you converted all matched that usage. Skip From gvanrossum at gmail.com Sun Feb 6 17:54:58 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 6 17:55:03 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: <16902.19073.787609.523027@montanaro.dyndns.org> References: <16902.19073.787609.523027@montanaro.dyndns.org> Message-ID: On Sun, 6 Feb 2005 10:49:05 -0600, Skip Montanaro wrote: > > In a python-checkins message, Raymond stated: > > Raymond> Replace list of constants with tuples of constants. > > I understand the motivation here (the peephole optimizer can convert a tuple > of constants into a single constant that need not be constructed over and > over), but is the effort worth the cost of changing the logical nature of > the data structures used? If lists are conceptually like vectors or arrays > in other languages and tuples are like C structs or Pascal records, then by > converting from list to tuple form you've somehow muddied the data structure > water just to take advantage of tuples' immutability. > > Wouldn't it be better to have the peephole optimizer recognize the throwaway > nature of lists in these contexts: > > for elt in [1, 2, 4, 8, 16]: > ... > > if foo in [list, tuple]: > ... > > (anywhere a list of constants immediately follows the "in" or "not in" > keywords) and convert them into constants? The cases you converted all > matched that usage. I'm with Skip, *unless* the change is in a PROVEN TIME-CRITICAL PIECE OF CODE. Let's not hand-micro-optimize code just because we can. 
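For reference, the difference being discussed is easy to see with dis (Python 2.4, illustrative; exact bytecode varies by version):

    import dis

    # With a tuple literal, the 2.4 peepholer folds the constants into a
    # single prebuilt tuple, loaded with one LOAD_CONST before the
    # COMPARE_OP 'in'.
    dis.dis(compile("x in (1, 2, 3)", "<example>", "eval"))

    # With a list literal, each element is loaded separately and a
    # BUILD_LIST runs on every evaluation, so the list is rebuilt each
    # time -- which is what the proposed peephole rule would avoid.
    dis.dis(compile("x in [1, 2, 3]", "<example>", "eval"))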
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Sun Feb 6 19:05:46 2005 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun Feb 6 19:05:51 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: <16902.19073.787609.523027@montanaro.dyndns.org> References: <16902.19073.787609.523027@montanaro.dyndns.org> Message-ID: On Sun, 6 Feb 2005 10:49:05 -0600, Skip Montanaro wrote: > > Wouldn't it be better to have the peephole optimizer recognize the throwaway > nature of lists in these contexts: > > for elt in [1, 2, 4, 8, 16]: > ... > > if foo in [list, tuple]: > ... > > (anywhere a list of constants immediately follows the "in" or "not in" > keywords) and convert them into constants? The cases you converted all > matched that usage. I think I implemented this once. I'll try to see if I can find a patch. It wasn't too difficult, but I'm not sure if the patch was clean. Neal From python at rcn.com Sun Feb 6 19:03:56 2005 From: python at rcn.com (Raymond Hettinger) Date: Sun Feb 6 19:07:42 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: <16902.19073.787609.523027@montanaro.dyndns.org> Message-ID: <001001c50c76$3a298140$8abb9d8d@oemcomputer> [Skip] > If lists are conceptually like vectors or > arrays > in other languages and tuples are like C structs or Pascal records, then > by > converting from list to tuple form you've somehow muddied the data > structure > water just to take advantage of tuples' immutability. In the context of literals used with the "in" operator, practices are widely divergent within the standard library and within the tutorial. Even within a single module, there were arbitrary switches between "x in [1,2,3]" and "x in (1,2,3)" and "x in 1,2,3". It seems that the list-as-arrays-tuple-as-records guideline is not meaningful or applicable in the context of the "in" operator. Proscribing tuple.__contains__ and tuple.__iter__ carrys the notion a bit too far. > Wouldn't it be better to have the peephole optimizer recognize the > throwaway > nature of lists That's a good idea. Implementing it will be more straight-forward after the AST branch gets completed. Raymond From python at rcn.com Sun Feb 6 19:15:30 2005 From: python at rcn.com (Raymond Hettinger) Date: Sun Feb 6 19:19:16 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: Message-ID: <001101c50c77$d7cff9a0$8abb9d8d@oemcomputer> [Neal] > I think I implemented this once. I'll try to see if I can find a > patch. It wasn't too difficult, but I'm not sure if the patch was > clean. If the opportunity arises, another worthwhile peepholer buildout would be to recognize if-elif chains that can be transformed to a single lookup and dispatch (see MAL's note in pep 275). Raymond From python at rcn.com Sun Feb 6 19:42:24 2005 From: python at rcn.com (Raymond Hettinger) Date: Sun Feb 6 19:46:11 2005 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Lib/test test_copy.py, 1.11.8.1, 1.11.8.2 In-Reply-To: Message-ID: <001701c50c7b$9a25b3c0$8abb9d8d@oemcomputer> > Modified Files: > Tag: release23-maint > test_copy.py > Log Message: > fix bug 1114776 Don't forget release24-maint. 
Raymond From skip at pobox.com Sun Feb 6 20:13:51 2005 From: skip at pobox.com (Skip Montanaro) Date: Sun Feb 6 20:13:57 2005 Subject: [Python-Dev] list of constants -> tuple of constants In-Reply-To: <001001c50c76$3a298140$8abb9d8d@oemcomputer> References: <16902.19073.787609.523027@montanaro.dyndns.org> <001001c50c76$3a298140$8abb9d8d@oemcomputer> Message-ID: <16902.27759.886169.551985@montanaro.dyndns.org> Raymond> [Skip] >> If lists are conceptually like vectors or arrays in other languages >> and tuples are like C structs or Pascal records, then by converting >> from list to tuple form you've somehow muddied the data structure >> water just to take advantage of tuples' immutability. Raymond> In the context of literals used with the "in" operator, Raymond> practices are widely divergent within the standard library and Raymond> within the tutorial. Then perhaps we should strive to make the standard library and tutorial more consistent. Answers to questions on c.l.py often advocate the standard library as a good source for example code. Raymond> It seems that the list-as-arrays-tuple-as-records guideline is Raymond> not meaningful or applicable in the context of the "in" Raymond> operator. Proscribing tuple.__contains__ and tuple.__iter__ Raymond> carrys the notion a bit too far. I agree that the presence of __contains__ and __iter__ kind of blurs the distinction between the concept of sequence and struct. Skip From raymond.hettinger at verizon.net Mon Feb 7 08:21:33 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Mon Feb 7 08:25:37 2005 Subject: [Python-Dev] Other library updates Message-ID: <000501c50ce5$a70b31e0$b806a044@oemcomputer> Any objections to replacing the likes of types.IntType and types.ListType with int and list? Raymond From doko at cs.tu-berlin.de Mon Feb 7 14:36:32 2005 From: doko at cs.tu-berlin.de (Matthias Klose) Date: Mon Feb 7 14:36:39 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1107726549.20128.12.camel@localhost> References: <1107726549.20128.12.camel@localhost> Message-ID: <16903.28384.621922.349@gargle.gargle.HOWL> A Debian user pointed out (http://bugs.debian.org/293932), that the current license for the Python profiler is not conforming to the DFSG (Debian free software guidelines). http://www.python.org/doc/current/lib/node829.html states "This permission is explicitly restricted to the copying and modification of the software to remain in Python, compiled Python, or other languages (such as C) wherein the modified or derived code is exclusively imported into a Python module." The DFSG, http://www.debian.org/doc/debian-policy/ch-archive.html#s-dfsg, third paragraph state: "Derived Works The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software." - Does somebody knows about the history of this license, why it is more restricted than the Python license? - Is there a chance to change the license for these two modules (profile.py, pstats.py)? The md5.h/md5c.c files allow "copy and use", but no modification of the files. There are some alternative implementations, i.e. in glibc, openssl, so a replacement should be sage. Any other requirements when considering a replacement? 
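Whichever implementation ends up underneath, the interface it would have to keep providing is small (illustrative session, using the RFC 1321 "abc" test vector):

    import md5

    m = md5.new()        # md5.md5() is an alias
    m.update("abc")
    assert m.hexdigest() == "900150983cd24fb0d6963f7d28e17f72"
    assert len(m.digest()) == 16     # raw 16-byte digest
    m2 = m.copy()        # independent copy of the hashing state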
Matthias From aleax at aleax.it Mon Feb 7 14:49:56 2005 From: aleax at aleax.it (Alex Martelli) Date: Mon Feb 7 14:50:02 2005 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Lib/test test_copy.py, 1.11.8.1, 1.11.8.2 In-Reply-To: <001701c50c7b$9a25b3c0$8abb9d8d@oemcomputer> References: <001701c50c7b$9a25b3c0$8abb9d8d@oemcomputer> Message-ID: On 2005 Feb 06, at 19:42, Raymond Hettinger wrote: >> Modified Files: >> Tag: release23-maint >> test_copy.py >> Log Message: >> fix bug 1114776 > > Don't forget release24-maint. Done -- but the maintenance branch of 2.4 has a problem right now: it doesn't pass unit tests, specifically test_os (I checked right after a cvs up and before doing any changes, of course). This appears to be connected to: mapping_tests.py being very strict (or something) and demanding that some mapping be able to update itself from a ``simple dictionary'' that's not iterable and does not have an .items method either; while the _Environ class in os.py appears to make some reasonable demands from the argument to its .update method. I'm not _sure_ which side of the dispute is in the right, so I haven't changed anything there (even though committing anything with unit tests broken makes my teeth grit). I do admit that this kind of issue makes a good case for more formalized interfaces...;-) Alex From skip at pobox.com Mon Feb 7 14:58:24 2005 From: skip at pobox.com (Skip Montanaro) Date: Mon Feb 7 14:58:29 2005 Subject: [Python-Dev] Other library updates In-Reply-To: <000501c50ce5$a70b31e0$b806a044@oemcomputer> References: <000501c50ce5$a70b31e0$b806a044@oemcomputer> Message-ID: <16903.29696.379798.207105@montanaro.dyndns.org> Raymond> Any objections to replacing the likes of types.IntType and Raymond> types.ListType with int and list? +1 Skip From gvanrossum at gmail.com Mon Feb 7 17:31:08 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 7 17:31:18 2005 Subject: [Python-Dev] Other library updates In-Reply-To: <000501c50ce5$a70b31e0$b806a044@oemcomputer> References: <000501c50ce5$a70b31e0$b806a044@oemcomputer> Message-ID: > Any objections to replacing the likes of types.IntType and > types.ListType with int and list? I presume in isinstance tests etc.? In general the procedure for modernizing source code is not to touch it unless you're reviewing or editing the whole module (or at least part of it) anyway. This would be a good occasion to see if perhaps the tests you find are formulated too narrowly -- e.g. isinstance(x, int) should almost always be isinstance(x, (int, long)), and isinstance(x, list) is also often a poorly written test for "sequence-ness". -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Mon Feb 7 17:52:55 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon Feb 7 18:02:07 2005 Subject: [Python-Dev] Re: python/dist/src/Lib DocXMLRPCServer.py, 1.4, 1.5 cookielib.py, 1.6, 1.7 copy.py, 1.43, 1.44 optparse.py, 1.12, 1.13 pickle.py, 1.160, 1.161 subprocess.py, 1.13, 1.14 unittest.py, 1.37, 1.38 xmlrpclib.py, 1.36, 1.37 References: Message-ID: > Reduce the usage of the types module. > Index: xmlrpclib.py > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Lib/xmlrpclib.py,v # Notes: # this version is designed to work with Python 2.1 or newer. 
> - dispatch[IntType] = dump_int > + dispatch[int] = dump_int $ python2.1 >>> type(0) == int 0 >>> type([]) == list 0 >>> type({}) == dict Traceback (most recent call last): File "", line 1, in ? NameError: name 'dict' is not defined From raymond.hettinger at verizon.net Tue Feb 8 06:44:45 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Tue Feb 8 06:48:52 2005 Subject: [Python-Dev] test_codecs failing Message-ID: <000501c50da1$56c25800$52b79d8d@oemcomputer> The most recent test_codecs check-in (1.19) is failing on a MSCV6.0 compilation running on WinMe: ---------------------------------------------------------------------- Ran 35 tests in 1.430s FAILED (failures=1) Traceback (most recent call last): File "\py25\lib\test\test_codecs.py", line 786, in ? test_main() File "\py25\lib\test\test_codecs.py", line 781, in test_main BasicStrTest File "C:\PY25\lib\test\test_support.py", line 290, in run_unittest run_suite(suite, testclass) File "C:\PY25\lib\test\test_support.py", line 275, in run_suite raise TestFailed(err) test.test_support.TestFailed: Traceback (most recent call last): File "\py25\lib\test\test_codecs.py", line 165, in test_badbom self.assertRaises(UnicodeError, f.read) AssertionError: UnicodeError not raised C:\pydev>python Python 2.5a0 (#46, Feb 7 2005, 21:37:18) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. Raymond From anthony at interlink.com.au Tue Feb 8 06:53:04 2005 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Feb 8 06:54:14 2005 Subject: [Python-Dev] BRANCH FREEZE for 2.3.5 Message-ID: <200502081653.05122.anthony@interlink.com.au> Can people stay off the release23-maint branch while we cut 2.3.5 (final), starting in about 5 hours time (say, around 1200 UTC). Thanks, Anthony -- Anthony Baxter It's never too late to have a happy childhood. From fredrik at pythonware.com Tue Feb 8 09:43:13 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 8 09:43:29 2005 Subject: [Python-Dev] Re: python/dist/src/Lib rfc822.py,1.78,1.79 References: Message-ID: rhettinger@users.sourceforge.net wrote: > @@ -399,9 +393,8 @@ > del self[name] # Won't fail if it doesn't exist > self.dict[name.lower()] = value > text = name + ": " + value > - lines = text.split("\n") > - for line in lines: > - self.headers.append(line + "\n") > + self.headers.extend(text.splitlines(True)) > + self.headers.append('\n') and you're 100% sure that the change in how things are stored in headers won't affect any existing code? (the docstring says that headers contain a list of lines, which is no longer true) From mdehoon at ims.u-tokyo.ac.jp Tue Feb 8 10:08:52 2005 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Tue Feb 8 10:04:48 2005 Subject: [Python-Dev] Patch review [ 981773 ] crach link c++ extension by mingw Message-ID: <420881A4.3000601@ims.u-tokyo.ac.jp> Patch review [ 981773 ] crach link c++ extension by mingw When building a C++ extension for Windows using MinGW, the linking would fail due to an incorrect link command. The patch contains a solution for this problem. I could reproduce this bug with Python 2.3.5c1, but in Python 2.4 it seems to have been fixed. Using this Python version: '2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)]' a C++ extension compiled and linked correctly with MinGW. So I think this patch is no longer needed (except if we want to back-port it to 2.3.5, which I doubt). --Michiel. 
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From walter at livinglogic.de Tue Feb 8 11:11:31 2005 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Tue Feb 8 11:11:34 2005 Subject: [Python-Dev] test_codecs failing In-Reply-To: <000501c50da1$56c25800$52b79d8d@oemcomputer> References: <000501c50da1$56c25800$52b79d8d@oemcomputer> Message-ID: <42089053.7090904@livinglogic.de> Raymond Hettinger wrote: > The most recent test_codecs check-in (1.19) is failing on a MSCV6.0 > compilation running on WinMe: > > ---------------------------------------------------------------------- > Ran 35 tests in 1.430s > > FAILED (failures=1) > Traceback (most recent call last): > [...] > test.test_support.TestFailed: Traceback (most recent call last): > File "\py25\lib\test\test_codecs.py", line 165, in test_badbom > self.assertRaises(UnicodeError, f.read) > AssertionError: UnicodeError not raised Fixed. But the question remains: Why does a StreamWriter have a read() method? Bye, Walter D?rwald From mal at egenix.com Tue Feb 8 11:34:32 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Tue Feb 8 11:34:36 2005 Subject: [Python-Dev] test_codecs failing In-Reply-To: <42089053.7090904@livinglogic.de> References: <000501c50da1$56c25800$52b79d8d@oemcomputer> <42089053.7090904@livinglogic.de> Message-ID: <420895B8.6050300@egenix.com> Walter D?rwald wrote: > Raymond Hettinger wrote: > >> The most recent test_codecs check-in (1.19) is failing on a MSCV6.0 >> compilation running on WinMe: >> >> ---------------------------------------------------------------------- >> Ran 35 tests in 1.430s >> >> FAILED (failures=1) >> Traceback (most recent call last): > > > [...] > >> test.test_support.TestFailed: Traceback (most recent call last): >> File "\py25\lib\test\test_codecs.py", line 165, in test_badbom >> self.assertRaises(UnicodeError, f.read) >> AssertionError: UnicodeError not raised > > > Fixed. But the question remains: Why does a StreamWriter have > a read() method? It inherits that method from the underlying stream - just as all other methods and attributes that the stream defines and which are not overridden by the StreamWriter methods. This approach was taken to make it possible to user StreamWriter (and StreamReader) instance as drop-in replacement in situations where the application normally expects a file-like object. Note that a file opened in write mode also exposes a read() method. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 08 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
:::: From fredrik at pythonware.com Tue Feb 8 10:10:49 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 8 16:26:33 2005 Subject: [Python-Dev] Re: python/dist/src/Lib rfc822.py,1.78,1.79 References: Message-ID: >> @@ -399,9 +393,8 @@ >> del self[name] # Won't fail if it doesn't exist >> self.dict[name.lower()] = value >> text = name + ": " + value >> - lines = text.split("\n") >> - for line in lines: >> - self.headers.append(line + "\n") >> + self.headers.extend(text.splitlines(True)) >> + self.headers.append('\n') > > and you're 100% sure that the change in how things are stored > in headers won't affect any existing code? > > (the docstring says that headers contain a list of lines, which is no > longer true) and the module documentation says: Each line contains a trailing newline. The blank line terminating the headers is not contained in the list. which is no longer true (unless I'm missing something here) From gvanrossum at gmail.com Tue Feb 8 16:35:17 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 8 16:35:21 2005 Subject: [Python-Dev] Re: python/dist/src/Lib rfc822.py,1.78,1.79 In-Reply-To: References: Message-ID: On Tue, 8 Feb 2005 10:10:49 +0100, Fredrik Lundh wrote: > > >> @@ -399,9 +393,8 @@ > >> del self[name] # Won't fail if it doesn't exist > >> self.dict[name.lower()] = value > >> text = name + ": " + value > >> - lines = text.split("\n") > >> - for line in lines: > >> - self.headers.append(line + "\n") > >> + self.headers.extend(text.splitlines(True)) > >> + self.headers.append('\n') > > > > and you're 100% sure that the change in how things are stored > > in headers won't affect any existing code? > > > > (the docstring says that headers contain a list of lines, which is no > > longer true) > > and the module documentation says: > > Each line contains a trailing newline. The blank line terminating > the headers is not contained in the list. > > which is no longer true (unless I'm missing something here) This would have been caught if there was a unit test validating what the documentation says. Why aren't there unit tests for this code? I think we need to raise the bar for "wholistic" improvements to a module: first write a unit test if there isn't already one (and if there is one, make sure that it tests all documented behavior), *then* refactor. Yes, this would be less fun. It's not supposed to be fun. It's supposed to avoid breaking code. Raymond, please roll back that change until this is taken care of. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Tue Feb 8 17:59:23 2005 From: barry at python.org (Barry Warsaw) Date: Tue Feb 8 17:59:27 2005 Subject: [Python-Dev] Re: python/dist/src/Lib rfc822.py,1.78,1.79 In-Reply-To: References: Message-ID: <1107881963.19011.18.camel@geddy.wooz.org> On Tue, 2005-02-08 at 10:35, Guido van Rossum wrote: > This would have been caught if there was a unit test validating what > the documentation says. Why aren't there unit tests for this code? I > think we need to raise the bar for "wholistic" improvements to a > module: first write a unit test if there isn't already one (and if > there is one, make sure that it tests all documented behavior), *then* > refactor. Yes, this would be less fun. It's not supposed to be fun. > It's supposed to avoid breaking code. +1. This module is used in so many place, you really have to take the documented interface seriously (not that you shouldn't otherwise, of course). 
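The behavioural difference being pointed out is easy to see by hand: with a folded header the old code stored every line with a trailing newline, while the new code stores the continuation line bare and appends a separate '\n' entry (illustrative session):

    >>> text = "Subject: hello\n world"        # a folded header line
    >>> [line + "\n" for line in text.split("\n")]      # old behaviour
    ['Subject: hello\n', ' world\n']
    >>> text.splitlines(True) + ['\n']                  # new behaviour
    ['Subject: hello\n', ' world', '\n']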
I suspect even the undocumented current semantics are relied on in many place. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20050208/6bed0e61/attachment.pgp From greg at electricrain.com Tue Feb 8 20:52:43 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Tue Feb 8 20:52:51 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16903.28384.621922.349@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> Message-ID: <20050208195243.GD10650@zot.electricrain.com> > The md5.h/md5c.c files allow "copy and use", but no modification of > the files. There are some alternative implementations, i.e. in glibc, > openssl, so a replacement should be sage. Any other requirements when > considering a replacement? > > Matthias I believe the "plan" for md5 and sha1 and such is to use the much faster openssl versions "in the future" (based on a long thread debating future interfaces to such things on python-dev last summer). That'll sidestep any tedious license issue and give a better implementation at the same time. i don't believe anyone has taken the time to make such a patch yet. -g From tim.peters at gmail.com Tue Feb 8 21:37:50 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 8 21:37:53 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16903.28384.621922.349@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> Message-ID: <1f7befae05020812377c72de26@mail.gmail.com> [Matthias Klose] > A Debian user pointed out (http://bugs.debian.org/293932), that the > current license for the Python profiler is not conforming to the DFSG > (Debian free software guidelines). > > http://www.python.org/doc/current/lib/node829.html states > > "This permission is explicitly restricted to the copying and > modification of the software to remain in Python, compiled Python, > or other languages (such as C) wherein the modified or derived code > is exclusively imported into a Python module." ... > - Does somebody knows about the history of this license, why it is > more restricted than the Python license? Simply because that's the license Jim Roskind slapped on it when he contributed this code 10 years ago. I imagine (but don't know) that Guido looked at it, thought "hmm -- shouldn't be a problem for Python's users", and so accepted it. > - Is there a chance to change the license for these two modules > (profile.py, pstats.py)? Not unless some remnant of InfoSeek Corp can be found, since they're the copyright holder (their work, their license). Alas, Jim Roskind hasn't been seen in the Python world this century. OTOH, if InfoSeek has vanished, it's unlikely they'll be suing anyone. Given how Python-specific profile.py and pstats.py are, it's hard for me to imagine anyone wanting to make a derivative that isn't imported into a Python module. In that respect it seems like a license clause that forbids you to run the software while the tip of your tongue is licking the back of your own neck. Still, if that matters, perhaps Debian will need to leave these modules out. Bold users will still be able to grab them from any number of other places. 
From jhylton at gmail.com Tue Feb 8 21:52:29 2005 From: jhylton at gmail.com (Jeremy Hylton) Date: Tue Feb 8 21:52:32 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1f7befae05020812377c72de26@mail.gmail.com> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: Maybe some ambitious PSF activitst could contact Roskind and Steve Kirsch and see if they know who at Disney to talk to... Or maybe the Disney guys who were at PyCon last year could help. Jeremy On Tue, 8 Feb 2005 15:37:50 -0500, Tim Peters wrote: > [Matthias Klose] > > A Debian user pointed out (http://bugs.debian.org/293932), that the > > current license for the Python profiler is not conforming to the DFSG > > (Debian free software guidelines). > > > > http://www.python.org/doc/current/lib/node829.html states > > > > "This permission is explicitly restricted to the copying and > > modification of the software to remain in Python, compiled Python, > > or other languages (such as C) wherein the modified or derived code > > is exclusively imported into a Python module." > ... > > - Does somebody knows about the history of this license, why it is > > more restricted than the Python license? > > Simply because that's the license Jim Roskind slapped on it when he > contributed this code 10 years ago. I imagine (but don't know) that > Guido looked at it, thought "hmm -- shouldn't be a problem for > Python's users", and so accepted it. > > > - Is there a chance to change the license for these two modules > > (profile.py, pstats.py)? > > Not unless some remnant of InfoSeek Corp can be found, since they're > the copyright holder (their work, their license). Alas, Jim Roskind > hasn't been seen in the Python world this century. > > OTOH, if InfoSeek has vanished, it's unlikely they'll be suing anyone. > Given how Python-specific profile.py and pstats.py are, it's hard for > me to imagine anyone wanting to make a derivative that isn't imported > into a Python module. In that respect it seems like a license clause > that forbids you to run the software while the tip of your tongue is > licking the back of your own neck. > > Still, if that matters, perhaps Debian will need to leave these > modules out. Bold users will still be able to grab them from > any number of other places. > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu > From martin at v.loewis.de Tue Feb 8 22:35:28 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue Feb 8 22:35:12 2005 Subject: [Python-Dev] 2.3.5 and 2.4.1 release plans In-Reply-To: <200502051743.18393.anthony@interlink.com.au> References: <200502051743.18393.anthony@interlink.com.au> Message-ID: <420930A0.5080808@v.loewis.de> Anthony Baxter wrote: > I'm currently thinking about a 2.4.1 around the 23td of Feb - Martin and > Fred, does this work for you? Yes. I will need to test whether my replacement of VB scripts in the installer with native DLLs works even on W95; I'm confident to complete this next week (already have the W95 machine installed). 
Regards, Martin From anthony at python.org Wed Feb 9 08:27:49 2005 From: anthony at python.org (Anthony Baxter) Date: Wed Feb 9 08:28:22 2005 Subject: [Python-Dev] RELEASED Python 2.3.5, final Message-ID: <200502091827.56277.anthony@python.org> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.3.5 (final). Python 2.3.5 is a bug-fix release. See the release notes at the website (also available as Misc/NEWS in the source distribution) for details of the bugs squished in this release. Python 2.3.5 contains an important security fix for SimpleXMLRPCServer - for more, see the announcement of PSF-2005-001 at: http://www.python.org/security/PSF-2005-001/ Python 2.3.5 is the last planned release in the Python 2.3 series, and is being released for those people who still need to run Python 2.3. Python 2.4 is a newer release, and should be preferred if possible. From here, bugfix releases are switching to the Python 2.4 branch - 2.4.1 will be the next Python release. For more information on Python 2.3.5, including download links for various platforms, release notes, and known issues, please see: http://www.python.org/2.3.5 Highlights of this new release include: - Bug fixes. According to the release notes, more than 50 bugs have been fixed, including a couple of bugs that could cause Python to crash. Highlights of the previous major Python release (2.3) are available from the Python 2.3 page, at http://www.python.org/2.3/highlights.html Enjoy the new release, Anthony Anthony Baxter anthony@python.org Python Release Manager (on behalf of the entire python-dev team) From trentm at ActiveState.com Wed Feb 9 19:01:52 2005 From: trentm at ActiveState.com (Trent Mick) Date: Wed Feb 9 19:04:09 2005 Subject: [Python-Dev] update copyright date in PC/python_nt.rc? Message-ID: <420A5010.5030008@activestate.com> Howdy, The copyright date was updated to 2005 in Python/getcopyright.c. Should the same be done in PC/python_nt.rc? Or perhaps, is there any reason python_nt.rc should NOT be updated? Cheers, Trent -- Trent Mick trentm@activestate.com From bjourne at gmail.com Wed Feb 9 20:20:16 2005 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Wed Feb 9 20:27:32 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars Message-ID: <740c3aec050209112069d8c328@mail.gmail.com> I'd like to help develop Python for fun and profit and I've heard that posting patch reviews to python-dev is a good way to contribute. So here goes: PATCH REVIEW: [ 1098732 ] Skip Montanaro has written a patch which makes it so that you can inspect variable values in tracebacks. IMHO, it is a brilliant idea and can make debugging quite a lot easier. However, I'm not so fond of the way that he has implemented it; it needs work. He basically outputs all names in all stackframes all the way up to the top, which makes the traceback look way too cluttered. He has also implemented it as a hook to sys.excepthook; I would like it to be the default way in which tracebacks are printed, or at least activated by a command line switch to Python. What does everyone else think? Does Skip's idea have any merit?
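For anyone who has not opened the patch: the general technique being reviewed can be sketched in a few lines. This is not Skip's implementation, just a minimal standalone example of a sys.excepthook that prints the ordinary traceback and then dumps the local variables of every frame in it:

import sys, traceback

def verbose_excepthook(etype, value, tb):
    # print the normal traceback first
    traceback.print_exception(etype, value, tb)
    # then walk the traceback and dump each frame's locals
    cur = tb
    while cur is not None:
        frame = cur.tb_frame
        print >> sys.stderr, "    locals of %s:" % frame.f_code.co_name
        for name, val in frame.f_locals.items():
            print >> sys.stderr, "        %s = %r" % (name, val)
        cur = cur.tb_next

sys.excepthook = verbose_excepthook

Even this toy version shows why the output gets cluttered quickly: every name in every frame is repr()'d, whether or not it is relevant to the failure.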
http://sourceforge.net/tracker/index.php?func=detail&aid=1098732&group_id=5470&atid=305470 -- mvh Bj?rn From pje at telecommunity.com Wed Feb 9 20:43:04 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Feb 9 20:40:58 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <740c3aec050209112069d8c328@mail.gmail.com> Message-ID: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> At 08:20 PM 2/9/05 +0100, BJ?rn Lindqvist wrote: >Does Skip's idea have >any merit? Yes, but not as a default behavior. Many people already consider the fact that tracebacks display file paths to be a potential security problem. If anything, the default traceback display should have less information, not more. (E.g., display module __name__ instead of the code's __file__). Also note that the stdlib already has a cgitb module that does this sort of display for CGI scripts, so the technique isn't new, and cgitb provides a good example for people to create their own advanced traceback formatters with. If there were another command line option added to Python for this, I'd personally prefer it be an option to enter the debugger when a terminal traceback is printed. Currently, I use 'python -i' so that I get an interpreter prompt, then use 'import pdb; pdb.pm()' to enter the debugger at the point where the error occurred. One can then print whatever local variables are desired, go up and down the stack, list code, and even perform calculations on the values on the stack. About the only place I can think of where such an extremely verbose traceback would be useful and safe, is inside of unit tests. I believe that the py.test package uses traceback introspection of this kind in order to display relevant values when an assertion fails. So, it might be useful in the context of a unit test error report to get some of that information, but even there, there is a question of how much is relevant for display. From tim.peters at gmail.com Wed Feb 9 21:30:34 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 9 21:31:10 2005 Subject: [Python-Dev] update copyright date in PC/python_nt.rc? In-Reply-To: <420A5010.5030008@activestate.com> References: <420A5010.5030008@activestate.com> Message-ID: <1f7befae05020912302f782316@mail.gmail.com> [Trent Mick] > The copyright date was updated to 2005 in Python/getcopyright.c. Should > the same be done in PC/python_nt.rc? Yes. > Or perhaps, is there any reason python_nt.rc should NOT be updated? Only reason I can think of is your inexcusable laziness for not having done it yourself . From trentm at ActiveState.com Wed Feb 9 22:07:04 2005 From: trentm at ActiveState.com (Trent Mick) Date: Wed Feb 9 22:09:26 2005 Subject: [Python-Dev] update copyright date in PC/python_nt.rc? In-Reply-To: <1f7befae05020912302f782316@mail.gmail.com> References: <420A5010.5030008@activestate.com> <1f7befae05020912302f782316@mail.gmail.com> Message-ID: <420A7B78.6030606@activestate.com> > Only reason I can think of is your inexcusable laziness for not having > done it yourself . Done. I'd ask whether I should backport this to release23-maint... but then I'd have to reason whether there is any point given that a 2.3.6 is unlikely. And I'd have to ask Anthony. and... enh. 
Trent -- Trent Mick trentm@activestate.com From oliphant at ee.byu.edu Wed Feb 9 22:43:34 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Feb 9 22:43:37 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core Message-ID: <420A8406.4020808@ee.byu.edu> There has recently been some much-needed discussion on the numpy-discussions list run by sourceforge regarding the state of the multidimensional array objects available for Python. It is desired by many that there be a single multidimensional array object in the Python core to facilitate data transfer and interfacing between multiple packages. I am a co-author of the current PEP regarding inclusion of the multidimensional array object into the core. However, that PEP is sorely outdated. Currently there are two multidimensional array objects that are in use in the Python community: Numeric --- original arrayobject created by Jim Hugunin and many others. Has been developed and used for 10 years. An upgrade that adds the features of numarray but maintains the same basic structure of Numeric called Numeric3 is in development and will be ready for more wide-spread use in a couple of weeks. Numarray --- in development for about 3 years. It was billed by some as a replacement for Numeric,. While introducing some new features, it still has not covered the full feature set that Numeric had making it impossible for all Numeric users to use it. In addition, it is still unacceptably slow for many operations that Numeric does well. Scientific users will always have to install more packages in order to use Python for their purposes. However, there is still the desire that the basic array object would be common among all Python users. To assist in writing a new PEP, we need clarification from Guido and others involved regarding 1) What specifically about Numeric prevented it from being acceptable as an addition to the Python core. 2) Are there any fixed requirements (other than coding style) before an arrayobject would be accepted into the Python core. Thanks for your comments. I think they will help the discussion currently taking place. -Travis Oliphant From bac at OCF.Berkeley.EDU Wed Feb 9 22:59:35 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Wed Feb 9 22:59:57 2005 Subject: [Python-Dev] discourage patch reviews to the list? (was: Patch review: [ 1098732 ]) In-Reply-To: <740c3aec050209112069d8c328@mail.gmail.com> References: <740c3aec050209112069d8c328@mail.gmail.com> Message-ID: <420A87C7.7030102@ocf.berkeley.edu> BJ?rn Lindqvist wrote: > I'd like to help develop Python for fun and profit and I've heard that > posting patch reviews to python-dev is a good way to contribute. So > here goes: > Are we actually promoting this? I am fine with people doing this when they have done five reviews and want their specific patch looked at (personally I prefer when people do it in a single email, but I can live with individual ones). But if people don't have that in mind, should we not be encouraging this? I mean it seems to be defeating the purpose of SF and having the various mailing lists that send out updates on SF posts. 
-Brett From gvanrossum at gmail.com Wed Feb 9 23:45:18 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Feb 9 23:45:59 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: <420A8406.4020808@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> Message-ID: > 1) What specifically about Numeric prevented it from being acceptable as > an addition to the Python core. It's very long ago, I believe that the authors themselves didn't think it was good enough. It certainly had a very hackish coding style. Numarray was supposed to fix all that. I'm sorry to hear that it hasn't (yet) reached the maturity you find necessary. > 2) Are there any fixed requirements (other than coding style) before an > arrayobject would be accepted into the Python core. The intended user community must accept the code as "best-of-breed". It seems that the Num* community has some work to do in this respect. Also (this applies to all code) the code must be stable enough that the typical Python release cycle (about 18 months between feature releases) doesn't cause problems. Finally there must be someone willing to be responsible for maintenance of the code. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Feb 10 00:10:01 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 00:10:03 2005 Subject: [Python-Dev] discourage patch reviews to the list? In-Reply-To: <420A87C7.7030102@ocf.berkeley.edu> References: <740c3aec050209112069d8c328@mail.gmail.com> <420A87C7.7030102@ocf.berkeley.edu> Message-ID: <420A9849.6020304@v.loewis.de> Brett C. wrote: > But if people don't have that in mind, should we not be encouraging > this? I mean it seems to be defeating the purpose of SF and having the > various mailing lists that send out updates on SF posts. Clearly, the comment should *also* go to SF - posting it to python-dev may mean it gets lost eventually (in particular, when somebody gets to look at the patch). Bj?rn did post his comment to SF, and a summary to python-dev. I personally think this is a good strategy: it puts focus on things that should be worked on. Let me explain why I think that these patches should be worked on: - it might be that the analysis of the patch suggests that the patch should be rejected, as-is. If so, it has a good chance to be closed *right away* with somebody with write privileges to the tracker, if he agrees with the analysis taken. People who care can follow the link in the email message, and see that the patch was closed. People who don't care can quickly grasp this is a patch review, and delete the message. - it might be that the analysis suggests changes. Posting it to python-dev gives the submitter of the patch a chance to challenge the review. If somebody thinks the requested changes are unecessary, they will comment. People actually prefer to discuss questionable requests for changes on the mailing list, instead of discussing them in the SF tracker. - it might be that the analysis recommend acceptance. Again, it might be that this can trigger a quick action by some committer - anybody else can safely ignore the message. However, *some* committer should take *some* action on the patch - one day or the other. Having the right to commit is a privilege, but it is also an obligation. The patch needs to be eventually looked at, and decided upon. Somebody already did the majority of the work, and suggested an action. 
It should be easy to decide whether this action is agreeable or not (unless the review is flawed, in which case the reviewer should be told about this). To put it the other way 'round: should we only discuss changes on python-dev which *don't* have patches on SF???? I don't think so. Furthermore, this strategy exposes the reviewer. A reviewer is somebody who will potentially get write access to the tracker, and perhaps CVS write access. A reviewer who wants to contribute in this way regularly clearly needs to gain the trust of other contributors, and posting smart, valuable, objective, balanced reviews on contributed patches is an excellent way to gain such trust (likewise, posting reviews which turn out to be flawed is a way to find out that the reviewer still needs to learn things before he can be trusted). Regards, Martin P.S. These remarks are mostly of general nature - I haven't actually studied yet Bj?rn's review (but I leave it in my inbox so I can get back to it next week). From martin at v.loewis.de Thu Feb 10 00:21:08 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 00:21:10 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> Message-ID: <420A9AE4.5090000@v.loewis.de> Phillip J. Eby wrote: > Yes, but not as a default behavior. Many people already consider the > fact that tracebacks display file paths to be a potential security > problem. If anything, the default traceback display should have less > information, not more. (E.g., display module __name__ instead of the > code's __file__). Notice that this patch does not change the exception printing behaviour of Python at all. It just changes the implementation of traceback.print_exception, so it only affects code that actually uses this function. Furthermore, it only affects code that uses this function and is *changed* to supply the argument True for print_args. > Also note that the stdlib already has a cgitb module that does this sort > of display for CGI scripts, so the technique isn't new, and cgitb > provides a good example for people to create their own advanced > traceback formatters with. Sure. However, if this is frequently needed (outside the context of CGI), it would sure be helpful if the traceback module supported it. > If there were another command line option added to Python for this, I'd > personally prefer it be an option to enter the debugger when a terminal > traceback is printed. Currently, I use 'python -i' so that I get an > interpreter prompt, then use 'import pdb; pdb.pm()' to enter the > debugger at the point where the error occurred. With the patch, you would have to add an explicit try/except into your code, to supply True for print_args (or set a sys.excepthook, as Skip suggests in his patch readme). 
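To make the opt-in nature concrete, the usage pattern described here would look roughly like this (print_args is the flag added by Skip's proposed patch, not an argument the stock traceback module accepts, and main() is just a placeholder):

import sys, traceback

def main():
    pass  # application code goes here

try:
    main()
except Exception:
    etype, value, tb = sys.exc_info()
    # print_args=True is the patch's new argument, not standard library API
    traceback.print_exception(etype, value, tb, print_args=True)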
Regards, Martin From mwh at python.net Thu Feb 10 00:22:59 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 10 00:23:01 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.340, 2.341 In-Reply-To: (rhettinger@users.sourceforge.net's message of "Sun, 06 Feb 2005 14:05:44 -0800") References: Message-ID: <2m65115ne4.fsf@starship.python.net> rhettinger@users.sourceforge.net writes: > Update of /cvsroot/python/python/dist/src/Python > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26507/Python > > Modified Files: > compile.c > Log Message: > Transform "x in (1,2,3)" to "x in frozenset([1,2,3])". > > Inspired by Skip's idea to recognize the throw-away nature of sequences > in this context and to transform their type to one with better performance. This breaks code: >>> [] in (1,) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: list objects are unhashable (and so breaks test_email -- is no one else running the test suite?). It's a cute idea, but IMHO violates the principle of least surprise too much. Cheers, mwh -- ZAPHOD: Who are you? ROOSTA: A friend. ZAPHOD: Oh yeah? Anyone's friend in particular, or just generally well-disposed to people? -- HHGttG, Episode 7 From mwh at python.net Thu Feb 10 00:25:54 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 10 00:25:56 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> (Phillip J. Eby's message of "Wed, 09 Feb 2005 14:43:04 -0500") References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> Message-ID: <2m1xbp5n99.fsf@starship.python.net> "Phillip J. Eby" writes: > At 08:20 PM 2/9/05 +0100, BJörn Lindqvist wrote: >>Does Skip's idea have >>any merit? > > Yes, but not as a default behavior. Many people already consider the > fact that tracebacks display file paths to be a potential security > problem. If anything, the default traceback display should have less > information, not more. (E.g., display module __name__ instead of the > code's __file__). Oh, come on. Making tracebacks less useful to protect people who accidentally spray them across the internet seems absurd. Would you like them not to show source, either? Cheers, mwh -- Many of the posts you see on Usenet are actually from moths. You can tell which posters they are by their attraction to the flames. -- Internet Oracularity #1279-06 From martin at v.loewis.de Thu Feb 10 00:26:56 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 00:26:57 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: <420A8406.4020808@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> Message-ID: <420A9C40.4060306@v.loewis.de> Travis Oliphant wrote: > I am a co-author of the current PEP regarding inclusion of the > multidimensional array object into the core. However, that PEP is > sorely outdated. [...] > 1) What specifically about Numeric prevented it from being acceptable as > an addition to the Python core. > 2) Are there any fixed requirements (other than coding style) before an > arrayobject would be accepted into the Python core. I think you answered these questions yourself. If a PEP is sorely outdated after only 3 years of its life, there clearly is something wrong with the PEP.
Python language features will have to live 10 years or so before they can be considered outdated, and then another 20 years before they can be removed (look at string exceptions as an example). So if it is still not clear what kind of API would be adequate after all these years, it is best (IMO) to wait a few more years for somebody to show up with a good solution to the problem (which I admit I don't understand). Regards, Martin From bob at redivi.com Thu Feb 10 00:40:08 2005 From: bob at redivi.com (Bob Ippolito) Date: Thu Feb 10 00:40:21 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <2m1xbp5n99.fsf@starship.python.net> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <2m1xbp5n99.fsf@starship.python.net> Message-ID: On Feb 9, 2005, at 6:25 PM, Michael Hudson wrote: > "Phillip J. Eby" writes: > >> At 08:20 PM 2/9/05 +0100, BJörn Lindqvist wrote: >>> Does Skip's idea have >>> any merit? >> >> Yes, but not as a default behavior. Many people already consider the >> fact that tracebacks display file paths to be a potential security >> problem. If anything, the default traceback display should have less >> information, not more. (E.g., display module __name__ instead of the >> code's __file__). > > Oh, come on. Making tracebacks less useful to protect people who > accidentally spray them across the internet seems absurd. Would you > like them not to show source, either? On Mac OS X the paths to the files are so long as to make the tracebacks really ugly and *less* usable. I certainly wouldn't mind if __name__ showed up instead of __file__. I have a "pywhich" script that shows me the file given a name that I use: (note that modulegraph.util.imp_find_module is like imp.find_module but it will walk the packages to find the actual module and it only returns the filename)

#!/usr/bin/env python
import sys, os
from modulegraph.util import imp_find_module

for module in sys.argv[1:]:
    path, oext = os.path.splitext(imp_find_module(module)[1])
    for ext in ('.py', oext):
        if os.path.exists(path + ext):
            print path + ext
            break

-bob From gvanrossum at gmail.com Thu Feb 10 00:53:56 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 10 00:54:01 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <2m1xbp5n99.fsf@starship.python.net> Message-ID: > > Oh, come on. Making tracebacks less useful to protect people who > > accidentally spray them across the internet seems absurd. Would you > > like them not to show source, either? My response exactly. > On Mac OS X the paths to the files are so long as to make the > tracebacks really ugly and *less* usable. I certainly wouldn't mind if > __name__ showed up instead of __file__. I have a "pywhich" script that > shows me the file given a name that I use: Well, sorry, but not everybody is as smart as you, and having the file name rather than the module name there helps debugging important sys.path issues. It wouldn't be the first time that someone has a hacked version of a standard module tucked away in a directory that happens to land on the path, and seeing the pathname is then a lot more productive than the module name.
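As an aside for readers debugging exactly that situation: the quickest way to see which copy of a module actually got imported is its __file__ attribute; for example (the path shown here is made up):

>>> import rfc822
>>> rfc822.__file__
'/usr/local/lib/python2.4/rfc822.py'

If that prints a path you don't expect, something earlier on sys.path is shadowing the standard module.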
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant at ee.byu.edu Thu Feb 10 00:54:29 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Feb 10 00:54:37 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: <420A9C40.4060306@v.loewis.de> References: <420A8406.4020808@ee.byu.edu> <420A9C40.4060306@v.loewis.de> Message-ID: <420AA2B5.2060801@ee.byu.edu> Martin v. L?wis wrote: > Travis Oliphant wrote: > >> I am a co-author of the current PEP regarding inclusion of the >> multidimensional array object into the core. However, that PEP is >> sorely outdated. > > [...] > >> 1) What specifically about Numeric prevented it from being acceptable >> as an addition to the Python core. >> 2) Are there any fixed requirements (other than coding style) before >> an arrayobject would be accepted into the Python core. > > > I think you answered these questions yourself. If a PEP is sorely > outdated after only 3 years of its life, there clearly is something > wrong with the PEP. Exactly, the PEP does not reflect the reality of what anybody wants in the core. It needs modification, or replacment. Can I just do that? Or do I need permission from Barrett and others who has only a passing interest in this anymore. > Python language features will have to live > 10 years or so before they can be considered outdated, and then > another 20 years before they can be removed (look at string > exceptions as an example). I think you misunderstood my meaning. For example Numeric has lived 10 years with very few changes. It seems to me it is rather stable. > > So if it is still not clear what kind of API would be adequate > after all these years, it is best (IMO) to wait a few more years > for somebody to show up with a good solution to the problem > (which I admit I don't understand). It actually is pretty clear to many. There have been a wide variety of modules written on top of Numeric and Numarray. Most of the rough spots around the edges have been ironed out. Our arguments now are about packaging other code living on top of an arrayobject. Thanks for your help, -Travis From pje at telecommunity.com Thu Feb 10 01:11:48 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Feb 10 01:09:45 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <420A9AE4.5090000@v.loewis.de> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050209191027.03ea87f0@mail.telecommunity.com> At 12:21 AM 2/10/05 +0100, Martin v. L?wis wrote: >Phillip J. Eby wrote: >>Yes, but not as a default behavior. Many people already consider the >>fact that tracebacks display file paths to be a potential security >>problem. If anything, the default traceback display should have less >>information, not more. (E.g., display module __name__ instead of the >>code's __file__). > >Notice that this patch does not change the exception printing behaviour >of Python at all. It just changes the implementation of >traceback.print_exception, so it only affects code that actually uses >this function. Furthermore, it only affects code that uses this function >and is *changed* to supply the argument True for print_args. I was just responding to the OP, who was advocating it for Python default behavior, or behavior controlled by the command line. 
That's why I said, "Yes, but not as a default behavior." From david.ascher at gmail.com Thu Feb 10 01:12:26 2005 From: david.ascher at gmail.com (David Ascher) Date: Thu Feb 10 01:12:30 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: References: <420A8406.4020808@ee.byu.edu> Message-ID: On Wed, 9 Feb 2005 14:45:18 -0800, Guido van Rossum wrote: > The intended user community must accept the code as "best-of-breed". > It seems that the Num* community has some work to do in this respect. I've not followed the num* discussion in quite a while, but my impression back then was that there wasn't "one" such community. Instead, the technical differences in the approaches required in specific fields, regarding things like the relative importance of memory profiles, speed, error handling, willingness to require modern C++ compilers, etc. made practical compromises quite tricky. I would love to be proven wrong. --david From pje at telecommunity.com Thu Feb 10 01:15:48 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu Feb 10 01:13:44 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <2m1xbp5n99.fsf@starship.python.net> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050209191225.03eaeec0@mail.telecommunity.com> At 11:25 PM 2/9/05 +0000, Michael Hudson wrote: >"Phillip J. Eby" writes: > > > At 08:20 PM 2/9/05 +0100, BJ?rn Lindqvist wrote: > >>Does Skip's idea have > >>any merit? > > > > Yes, but not as a default behavior. Many people already consider the > > fact that tracebacks display file paths to be a potential security > > problem. If anything, the default traceback display should have less > > information, not more. (E.g., display module __name__ instead of the > > code's __file__). > >Oh, come on. Making tracebacks less useful to protect people who >accidentally spray them across the internet seems absurd. Would you >like them not to show source, either? I said that many people considered that to be the case, not that I did. ;) I'd personally prefer to read module names than filenames, so I guess I should've mentioned that. :) Of course, Guido has previously answered the filename vs. modulename question (years ago in fact), so it was moot even before I mentioned it. For some reason it slipped my mind at the time, though. From oliphant at ee.byu.edu Thu Feb 10 01:34:59 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Feb 10 01:35:04 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: References: <420A8406.4020808@ee.byu.edu> Message-ID: <420AAC33.807@ee.byu.edu> David Ascher wrote: >I've not followed the num* discussion in quite a while, but my >impression back then was that there wasn't "one" such community. >Instead, the technical differences in the approaches required in >specific fields, regarding things like the relative importance of >memory profiles, speed, error handling, willingness to require modern >C++ compilers, etc. made practical compromises quite tricky. > > I really appreciate comments from those who remember some of the old discussions. There are indeed some different needs. Most of this, however, is in the ufunc object (how do you do math with the arrays). 
And, a lot of this has been ameliorated with the new concepts of error modes that numarray introduced. There is less argumentation over the basic array object as a memory structure. The biggest argument right now is the design of the object: i.e. a mixture of Python and C (numarray) versus a C-only object (Numeric3). In other words, what I'm saying is that in terms of how the array object should be structure, a lot is known. What is more controversial is should the design be built upon Numarray's object structure (a mixture of Python and C), or on Numeric's --- all in C -Travis From martin at v.loewis.de Thu Feb 10 01:39:09 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 01:39:11 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <5.1.1.6.0.20050209191027.03ea87f0@mail.telecommunity.com> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <5.1.1.6.0.20050209191027.03ea87f0@mail.telecommunity.com> Message-ID: <420AAD2D.1060300@v.loewis.de> Phillip J. Eby wrote: > I was just responding to the OP, who was advocating it for Python > default behavior, or behavior controlled by the command line. That's > why I said, "Yes, but not as a default behavior." I wasn't sure how to interpret the message - I could not find out whether you have looked at the patch, and agreed with it, or whether you merely read the OP's summary of the patch. Regards, Martin From martin at v.loewis.de Thu Feb 10 01:49:33 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 01:49:35 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: <420AA2B5.2060801@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420A9C40.4060306@v.loewis.de> <420AA2B5.2060801@ee.byu.edu> Message-ID: <420AAF9D.6090303@v.loewis.de> Travis Oliphant wrote: > Exactly, the PEP does not reflect the reality of what anybody wants in > the core. It needs modification, or replacment. Can I just do that? My understanding is this: you can, and you should. You are the author of the PEP (together with Paul Barrett), and the PEP is still in Draft status (with a Python-Version of 2.2). Until the PEP is Accepted or Rejected status, you can make any changes to it that you want. It would be nice if you would track the Post-History section, and perhaps a History section at the end, pointing out that the PEP got completely restructured at some point. > Or do I need permission from Barrett and others who has only a passing > interest in this anymore. According to PEP 1, you could ask Barrett for a complete takeover, to remove him from the Authors list. If he agrees, there would be no problem to change that list after so much time has passed. > I think you misunderstood my meaning. For example Numeric has lived 10 > years with very few changes. It seems to me it is rather stable. I probably misunderstand something. If Numeric has been stable for 10 years, why is not good (no need to answer here - an answer in the PEP would be appreciated)? If there is something new to replace it, how stable is that? 
Regards, Martin From martin at v.loewis.de Thu Feb 10 01:53:24 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 01:53:25 2005 Subject: [Python-Dev] Clarification sought about including a multidimensional array object into Python core In-Reply-To: <420AAC33.807@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> Message-ID: <420AB084.1000008@v.loewis.de> Travis Oliphant wrote: > In other words, what I'm saying is that in terms of how the array object > should be structure, a lot is known. What is more controversial is > should the design be built upon Numarray's object structure (a mixture > of Python and C), or on Numeric's --- all in C To me, this sounds like an implementation detail. I'm sure it is an important detail, as I understand all of this is mostly done for performance reasons. The PEP should list the options, include criteria for selection, and then propose a choice. People can then discuss whether the list of options is complete (if not, you need to extend it), whether the criteria are agreed (they might be not, and there might be difficult consensus, which the PEP should point out), and whether the choice is the right one given the criteria (there should be no debate about this - everybody should agree factually that the choice meets the criteria best). Regards, Martin From bac at OCF.Berkeley.EDU Thu Feb 10 02:25:14 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Feb 10 02:25:28 2005 Subject: [Python-Dev] discourage patch reviews to the list? In-Reply-To: <420A9849.6020304@v.loewis.de> References: <740c3aec050209112069d8c328@mail.gmail.com> <420A87C7.7030102@ocf.berkeley.edu> <420A9849.6020304@v.loewis.de> Message-ID: <420AB7FA.3040106@ocf.berkeley.edu> Martin v. L?wis wrote: > Brett C. wrote: > > But if people don't have that in mind, should we not be encouraging > >> this? I mean it seems to be defeating the purpose of SF and having >> the various mailing lists that send out updates on SF posts. [SNIP] > Bj?rn did post his comment to SF, and a summary to python-dev. I > personally think this is a good strategy: it puts focus on things > that should be worked on. > > Let me explain why I think that these patches should be worked on: > - it might be that the analysis of the patch suggests that the patch > should be rejected, as-is. [SNIP] > - it might be that the analysis suggests changes. [SNIP] > - it might be that the analysis recommend acceptance. [SNIP] All valid points, but I also don't want people to suddenly start posting one-liners or bug posts. I guess it comes down to a signal-to-noise ratio and if the level of signal we are currently getting will hold. If we say it is okay for people to send in patch reviews *only* and not notifications of new patches, bug reports, or bug reviews, then I can handle it. > To put it the other way 'round: should we only discuss changes on > python-dev which *don't* have patches on SF???? I don't think > so. > And neither do I. I just don't want a ton of random emails on python-dev that really belong in the SF tracker instead. Reason why we don't tend to take direct bug reports in email unless there is a question over semantics. > Furthermore, this strategy exposes the reviewer. A reviewer is > somebody who will potentially get write access to the tracker, > and perhaps CVS write access. 
A reviewer who wants to contribute > in this way regularly clearly needs to gain the trust of other > contributors, and posting smart, valuable, objective, balanced > reviews on contributed patches is an excellent way to gain such > trust (likewise, posting reviews which turn out to be flawed > is a way to find out that the reviewer still needs to learn > things before he can be trusted). > That is a very good point. Guess I am softening on my rejection to this. =) If people in general agree to this idea of having people post patch reviews to python-dev I will update the dev intro essay to reflect all of this. I will also add a mention about the 5-1 patch review deal. [SNIP] > P.S. These remarks are mostly of general nature - I haven't > actually studied yet Bj?rn's review (but I leave it in my > inbox so I can get back to it next week). Same here. I didn't mean to single out Bj?rn in any way. He just happened to trigger an email out of me. =) -Brett From paul at pfdubois.com Thu Feb 10 02:30:16 2005 From: paul at pfdubois.com (Paul F. Dubois) Date: Thu Feb 10 02:30:19 2005 Subject: [Python-Dev] Numeric life as I see it In-Reply-To: <420AB084.1000008@v.loewis.de> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> Message-ID: <420AB928.3090004@pfdubois.com> Martin v. L?wis wrote: The PEP should list the options, include criteria > for selection, and then propose a choice. People can then discuss > whether the list of options is complete (if not, you need to extend > it), whether the criteria are agreed (they might be not, and there > might be difficult consensus, which the PEP should point out), and > whether the choice is the right one given the criteria (there should > be no debate about this - everybody should agree factually that the > choice meets the criteria best). > Unrealistic. I think it is undisputed that there are people with irreconcilably different needs. Frankly, we spent many, many months on the design of Numeric and it represents a set of compromises already. However, the one thing it wouldn't compromise on was speed, even at the expense of safety. A community exists that cannot live with this compromise. We were told that the Python core could also not live with that compromise. Over the years there was pressure to add safety, convenience, flexibility, etc., all sometimes incompatible with speed. Numarray represents in some sense the set of compromises in that direction, besides its technical innovations. Numeric / Numeric3 represents the need for speed camp. I think it is reasonable to suppose that the need for speed piece can be wrapped suitably by the need for safety-flexibility-convenience facilities. I believe that hope underlies Travis' plan. The Nummies (the official set of developers) thought that the Numeric code base was an unsuitable basis for further development. There was no dissent about that at least. My idea was to get something like what Travis is now doing done to replace it. I felt it important to get myself out of the picture after five years as the lead developer especially since my day job had ceased to involve using Numeric. However, removing my cork from the bottle released the unresolved pressure between these two camps. My plan for transition failed. I thought I had consensus on the goal and in fact it wasn't really there. Everyone is perfectly good-willed and clever and trying hard to "all just get along", but the goal was lost. Eric Raymond should write a book about it called "Bumbled Bazaar". 
I hope everyone will still try to achieve that goal. Interoperability of all the Numeric-related software (including supporting a 'default' plotting package) is required. Aside: While I am at it, let me reiterate what I have said to the other developers privately: there is NO value to inheriting from the array class. Don't try to achieve that capability if it costs anything, even just effort, because it buys you nothing. Those of you who keep remarking on this as if it would simply haven't thought it through IMHO. It sounds so intellectually appealing that David Ascher and I had a version of Numeric that almost did it before we realized our folly. From martin at v.loewis.de Thu Feb 10 03:30:01 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 10 03:30:03 2005 Subject: [Python-Dev] discourage patch reviews to the list? In-Reply-To: <420AB7FA.3040106@ocf.berkeley.edu> References: <740c3aec050209112069d8c328@mail.gmail.com> <420A87C7.7030102@ocf.berkeley.edu> <420A9849.6020304@v.loewis.de> <420AB7FA.3040106@ocf.berkeley.edu> Message-ID: <420AC729.6070804@v.loewis.de> Brett C. wrote: > All valid points, but I also don't want people to suddenly start posting > one-liners or bug posts. I agree that keeping the noise level low is desirable; I hope this will come out naturally when we start commenting on high-noise remarks. For example, I would have no problems telling somebody who says "me too" on a feature request that he should go away and come back with an implementation of the requested feature. I would still apply the "standard" conventions of python-dev: that you should be fairly knowledgable about the things you are talking about before posting. > I guess it comes down to a signal-to-noise ratio and if the level of > signal we are currently getting will hold. If we say it is okay for > people to send in patch reviews *only* and not notifications of new > patches, bug reports, or bug reviews, then I can handle it. People do tend to notify about patches from time to time, especially when they are committers, and want to weigh in their reputation to advance peer review of the proposed changes. Other people who notify about new patches they made will continue to get my "5 for 1" offer which actually triggered this new interest in contributing-by-reviewing. Another reason not to post patches to python-dev is message size for modem users although I'm doubtful how valid this rationale is these days, given ADSL, spam, HTML mails, and everything... > And neither do I. I just don't want a ton of random emails on > python-dev that really belong in the SF tracker instead. Reason why we > don't tend to take direct bug reports in email unless there is a > question over semantics. I certainly don't want to see random comments on python-dev, either (and I do see random comments come in bursts, and have to choose to ignore entire threads because of that. I don't have to write python-dev summaries, though :-) I disagree with the primary reason not to take bug reports on python-dev, however: bug reports in email get lost if not immediately processed; usage of a tracker is necessary to actually "keep track". So this kind of bug management is the primary reason for the tracker, not that we want to keep random users out of python-dev (although this is a convenient side effect). 
Regards, Martin From skip at pobox.com Thu Feb 10 04:44:05 2005 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 10 04:44:25 2005 Subject: [Python-Dev] Patch review: [ 1098732 ] Enhance tracebacks and stack traces with vars In-Reply-To: <5.1.1.6.0.20050209191027.03ea87f0@mail.telecommunity.com> References: <5.1.1.6.0.20050209143415.030e1750@mail.telecommunity.com> <5.1.1.6.0.20050209191027.03ea87f0@mail.telecommunity.com> Message-ID: <16906.55429.990644.712145@montanaro.dyndns.org> Phillip> I was just responding to the OP, who was advocating it for Phillip> Python default behavior, or behavior controlled by the command Phillip> line. That's why I said, "Yes, but not as a default behavior." My original intent was that it would probably not fly as default behavior. I'm not sure I would always want that behavior either. I would like it for long-running daemons that crash while unattended (places where "python -i" wouldn't really help). Skip From oliphant at ee.byu.edu Thu Feb 10 05:09:52 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Feb 10 05:09:56 2005 Subject: [Python-Dev] Re: Numeric life as I see it In-Reply-To: <420AB928.3090004@pfdubois.com> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> Message-ID: <420ADE90.9050304@ee.byu.edu> > Martin v. L?wis wrote: > The PEP should list the options, include criteria > >> for selection, and then propose a choice. People can then discuss >> whether the list of options is complete (if not, you need to extend >> it), whether the criteria are agreed (they might be not, and there >> might be difficult consensus, which the PEP should point out), and >> whether the choice is the right one given the criteria (there should >> be no debate about this - everybody should agree factually that the >> choice meets the criteria best). >> > > Unrealistic. I think it is undisputed that there are people with > irreconcilably different needs. Frankly, we spent many, many months on > the design of Numeric and it represents a set of compromises already. > However, the one thing it wouldn't compromise on was speed, even at > the expense of safety. A community exists that cannot live with this > compromise. We were told that the Python core could also not live with > that compromise. I'm not sure I agree. The ufuncobject is the only place where this concern existed (should we trip OverFlow, ZeroDivision, etc. errors durring array math). Numarray introduced and implemented the concept of error modes that can be pushed and popped. I believe this is the right solution for the ufuncobject. One question we are pursuing is could the arrayobject get into the core without a particular ufunc object. Most see this as sub-optimal, but maybe it is the only way. > > Over the years there was pressure to add safety, convenience, > flexibility, etc., all sometimes incompatible with speed. Numarray > represents in some sense the set of compromises in that direction, > besides its technical innovations. Numeric / Numeric3 represents the > need for speed camp. I don't see numarray as representing this at all. To me, numarray represents the desire to have more flexible array types (specifically record arrays and maybe character arrays). I personally don't see Numeric3 as in any kind of "need for speed" camp either. I've never liked this distinction, because I don't think it represents a true dichotomy. 
To me, the differences between Numeric3 and numarray are currently more "architectural" than implementational. Perhaps you are referring to the fact that because numarray has several portions written in Python it is "more flexible" or "more convenient" but slower for small arrays. If you are saying that then I guess Numeric3 is a "need for speed" implementation, and I apologize for not understanding. > > I think it is reasonable to suppose that the need for speed piece can > be wrapped suitably by the need for safety-flexibility-convenience > facilities. I believe that hope underlies Travis' plan. If the "safety-flexibility-convenience" facilities can be specified, then I'm all for one implementation. Numeric3 design goals do not go against any of these ideas intentionally. > > The Nummies (the official set of developers) thought that the Numeric > code base was an unsuitable basis for further development. There was > no dissent about that at least. My idea was to get something like what > Travis is now doing done to replace it. I felt it important to get > myself out of the picture after five years as the lead developer > especially since my day job had ceased to involve using Numeric. Some of the parts needed to be re-written, but I didn't think that meant moving away from the goal to have a single C-type that is the arrayobject. During this process Python 2.2 came out and allowed sub-classing from C-types. As Perry mentioned, and I think needs to be emphasized again, this changed things as any benefit from having a Python-class for the final basic array type disappeared --- beyond ease of prototyping and testing. > > However, removing my cork from the bottle released the unresolved > pressure between these two camps. My plan for transition failed. I > thought I had consensus on the goal and in fact it wasn't really > there. Everyone is perfectly good-willed and clever and trying hard to > "all just get along", but the goal was lost. Eric Raymond should > write a book about it called "Bumbled Bazaar". This is an accurate description. Fortunately, I don't think any ill-will exists (assuming I haven't created any with my recent activities). I do want to "get-along." I just don't want to be silent when there are issues that I think I understand. > > I hope everyone will still try to achieve that goal. Interoperability > of all the Numeric-related software (including supporting a 'default' > plotting package) is required. Utopia is always out of reach :-) > Aside: While I am at it, let me reiterate what I have said to the > other developers privately: there is NO value to inheriting from the > array class. Don't try to achieve that capability if it costs > anything, even just effort, because it buys you nothing. Those of you > who keep remarking on this as if it would simply haven't thought it > through IMHO. It sounds so intellectually appealing that David Ascher > and I had a version of Numeric that almost did it before we realized > our folly. > I appreciate some of what Paul is saying here, but I'm not fully convinced that this is still true with Python 2.2 and up new-style c-types. The concerns seem to be over the fact that you have to re-implement everything in the sub-class because the base-class will always return one of its objects instead of a sub-class object. It seems to me, however, that if the C methods use the object type alloc function when creating new objects then some of this problem is avoided (i.e. 
if the method is called with a sub-class type passed in, then a sub-class type gets set). Have you looked at how Python now allows sub-classing in C? I'm not an expert here, but it seems like a lot of the problems you were discussing have been ameliorated. There are probably still issues, but.... I will know more when I seen what happens with a Matrix Object inheriting from a Python C-array object. I'm wondering if anyone else with more knowledge about new-style c-types could help here and show me the error of my thinking. -Travis From gvanrossum at gmail.com Thu Feb 10 05:36:39 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 10 05:36:44 2005 Subject: [Python-Dev] Re: Numeric life as I see it In-Reply-To: <420ADE90.9050304@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: [Paul] > > Aside: While I am at it, let me reiterate what I have said to the > > other developers privately: there is NO value to inheriting from the > > array class. Don't try to achieve that capability if it costs > > anything, even just effort, because it buys you nothing. Those of you > > who keep remarking on this as if it would simply haven't thought it > > through IMHO. It sounds so intellectually appealing that David Ascher > > and I had a version of Numeric that almost did it before we realized > > our folly. [Travis] > I appreciate some of what Paul is saying here, but I'm not fully > convinced that this is still true with Python 2.2 and up new-style > c-types. The concerns seem to be over the fact that you have to > re-implement everything in the sub-class because the base-class will > always return one of its objects instead of a sub-class object. > It seems to me, however, that if the C methods use the object type > alloc function when creating new objects then some of this problem is > avoided (i.e. if the method is called with a sub-class type passed in, > then a sub-class type gets set). This would severely constrain the __new__ method of the subclass. > Have you looked at how Python now allows sub-classing in C? I'm not an > expert here, but it seems like a lot of the problems you were discussing > have been ameliorated. There are probably still issues, but.... > > I will know more when I seen what happens with a Matrix Object > inheriting from a Python C-array object. And why would a Matrix need to inherit from a C-array? Wouldn't it make more sense from an OO POV for the Matrix to *have* a C-array without *being* one? > I'm wondering if anyone else with more knowledge about new-style c-types > could help here and show me the error of my thinking. I'm trying... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant at ee.byu.edu Thu Feb 10 06:02:11 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Feb 10 06:02:15 2005 Subject: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it In-Reply-To: References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: <420AEAD3.9030705@ee.byu.edu> >[Travis] > > >>I appreciate some of what Paul is saying here, but I'm not fully >>convinced that this is still true with Python 2.2 and up new-style >>c-types. 
The concerns seem to be over the fact that you have to >>re-implement everything in the sub-class because the base-class will >>always return one of its objects instead of a sub-class object. >>It seems to me, however, that if the C methods use the object type >>alloc function when creating new objects then some of this problem is >>avoided (i.e. if the method is called with a sub-class type passed in, >>then a sub-class type gets set). >> >> > >This would severely constrain the __new__ method of the subclass. > > I obviously don't understand the intricacies here, so fortunately it's not a key issue for me because I'm not betting the farm on being able to inherit from the arrayobject. But, it is apparent that I don't understand all the issues. >>Have you looked at how Python now allows sub-classing in C? I'm not an >>expert here, but it seems like a lot of the problems you were discussing >>have been ameliorated. There are probably still issues, but.... >> >>I will know more when I seen what happens with a Matrix Object >>inheriting from a Python C-array object. >> >> > >And why would a Matrix need to inherit from a C-array? Wouldn't it >make more sense from an OO POV for the Matrix to *have* a C-array >without *being* one? > > The only reason I'm thinking of here is to have it inherit from the C-array many of the default methods without having to implement them all itself. I think Paul is saying that this never works with C-types like arrays, and I guess from your comments you agree with him. The only real reason for wanting to construct a separate Matrix object is the need to overload the * operation to do matrix multiplication instead of element-by-element multiplication. -Travis From david.ascher at gmail.com Thu Feb 10 06:50:26 2005 From: david.ascher at gmail.com (David Ascher) Date: Thu Feb 10 06:50:29 2005 Subject: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it In-Reply-To: <420AEAD3.9030705@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> <420AEAD3.9030705@ee.byu.edu> Message-ID: On Wed, 09 Feb 2005 22:02:11 -0700, Travis Oliphant wrote: GvR: >And why would a Matrix need to inherit from a C-array? Wouldn't it >make more sense from an OO POV for the Matrix to *have* a C-array >without *being* one? Travis: > The only reason I'm thinking of here is to have it inherit from the > C-array many of the default methods without having to implement them all > itself. I think Paul is saying that this never works with C-types like > arrays, and I guess from your comments you agree with him. > > The only real reason for wanting to construct a separate Matrix object > is the need to overload the * operation to do matrix multiplication > instead of element-by-element multiplication. This is dredging stuff up from years (and layers and layers of new memories =), but I think that what Paul was referring to was in fact independent of implementation language. The basic problem, IIRC, had to do with the classic (it turns out) problem of confusing the need for reuse of implementation bits with interface inheritance. We always felt that things that people felt were "array-like" (Matrices, missing value arrays, etc.) _should_ inherit from array, and that (much like you're saying), it would save work. 
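The point Paul and Guido have been making, that methods inherited from a C base type keep handing back base-type instances, is easy to reproduce in pure Python. The sketch below is not from the thread; "Matrix" is hypothetical and plain list stands in for the array type:

    # Illustrative only: list plays the role of the C array type.
    class Matrix(list):
        def __mul__(self, other):
            # the one operation we bother to re-implement
            return Matrix([x * y for x, y in zip(self, other)])

    m = Matrix([1, 2, 3])
    print type(m[0:2])   # <type 'list'>  (slicing is inherited and returns a plain list)
    print type(m + m)    # <type 'list'>  (concatenation too)
    print type(m * m)    # <class '__main__.Matrix'>  (only because __mul__ was overridden)

Every method whose result should stay a Matrix needs the same treatment, which is the wrapping cost being debated here.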
In practice, however, there were a few problems (again, from lousy memory), all boiling down to the fact that the array object implemenation implies interfaces that weren't actually applicable to the others. The biggest problems had to do with the fact that when you do subclassing, you end up in a nasty combinatorial problem when you wanted to figure out what operand1 operator operand2 means, if operand1 is a derivative and operand2 is a different derivative. In other words, if you multiply a matrix with a missingvalues array, what should you do? Having a common inheritance means you need to _stop_ default behaviors from happening, to avoid meaningless results. It gets worse with function calls that take "array-like objects" as arguments. A lot of this may be resolvable with the recent notions of adaptation and more formalized interfaces. In the meantime, I would, like Paul, recommend that you separate the interface-bound type aspects (which is what Python classes are in fact!) from the implementation sharing. This may be obvious to everyone, and if so, sorry. --david From oliphant at ee.byu.edu Thu Feb 10 10:30:22 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Feb 10 10:30:38 2005 Subject: [Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it In-Reply-To: <1c3044466186480f55ef45d2c977731b@laposte.net> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> <1c3044466186480f55ef45d2c977731b@laposte.net> Message-ID: <420B29AE.8030701@ee.byu.edu> >> One question we are pursuing is could the arrayobject get into the >> core without a particular ufunc object. Most see this as >> sub-optimal, but maybe it is the only way. > > > Since all the artithmetic operations are in ufunc that would be > suboptimal solution, but indeed still a workable one. I think replacing basic number operations of the arrayobject should simple, so perhaps a default ufunc object could be worked out for inclusion. > >> I appreciate some of what Paul is saying here, but I'm not fully >> convinced that this is still true with Python 2.2 and up new-style >> c-types. The concerns seem to be over the fact that you have to >> re-implement everything in the sub-class because the base-class will >> always return one of its objects instead of a sub-class object. > > > I'd say that such discussions should be postponed until someone > proposes a good use for subclassing arrays. Matrices are not one, in > my opinion. > Agreed. It is is not critical to what I am doing, and I obviously need more understanding before tackling such things. Numeric3 uses the new c-type largely because of the nice getsets table which is separate from the methods table. This replaces the rather ugly C-functions getattr and setattr. -Travis From p.f.moore at gmail.com Thu Feb 10 10:40:21 2005 From: p.f.moore at gmail.com (Paul Moore) Date: Thu Feb 10 10:40:24 2005 Subject: [Python-Dev] discourage patch reviews to the list? In-Reply-To: <420AB7FA.3040106@ocf.berkeley.edu> References: <740c3aec050209112069d8c328@mail.gmail.com> <420A87C7.7030102@ocf.berkeley.edu> <420A9849.6020304@v.loewis.de> <420AB7FA.3040106@ocf.berkeley.edu> Message-ID: <79990c6b05021001407626182@mail.gmail.com> On Wed, 09 Feb 2005 17:25:14 -0800, Brett C. wrote: > All valid points, but I also don't want people to suddenly start posting > one-liners or bug posts. 
> > I guess it comes down to a signal-to-noise ratio and if the level of signal we > are currently getting will hold. If we say it is okay for people to send in > patch reviews *only* and not notifications of new patches, bug reports, or bug > reviews, then I can handle it. Having done some reviews (admittedly for the 5-for-1 deal) I do like seeing patch reviews appear on python-dev. As they are meant to be reviews, this implies a certain level of effort expended, and quality in the response. I agree with Martin that detail comments should go in the tracker - a posting can summarise to an extent, but should be enough to let python-dev readers know if they can act on the review. It's nice to see new contributors doing good work to help Python, and I assume they like the chance to feel like they are "participating" by posting helpful contributions to python-dev. IMHO, the tracker doesn't give this same feeling of "contributing". Also, review postings encourage others to do the same - I know I did my reviews after having seen someone else post a set of reviews. It made me think "hey, I could do that!" I'm sure there are other lurkers on python-dev who could be encouraged to assist in the same way. Having said this, I'd suggest that if people intend to review multiple patches, they post a summary covering a number of patches at a time. Paul. From jimjjewett at gmail.com Thu Feb 10 19:51:43 2005 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu Feb 10 19:51:48 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.343, 2.344 In-Reply-To: References: Message-ID: On Wed, 09 Feb 2005 17:42:41 -0800, rhettinger@users.sourceforge.net wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31172 > > Modified Files: > compile.c > Log Message: > Remove the set conversion which didn't work with: [] in (0,) Why is this a problem? If there were *any* unhashable objects in the container, then the compiler would have bailed on the initial set-conversion. If there aren't any unhashable values, then the (unhashable) item being checked is not in the set. ==> Return False. Are you worried about unhashable objects (as item) which compare == to something that is hashable (in container)? Custom rich compares can already confuse the "in" tests. Or is the problem that guarding against/trapping this case is somehow so expensive that it overrides the expected savings? -jJ From tim.peters at gmail.com Thu Feb 10 20:09:34 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 10 20:09:39 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: References: Message-ID: <1f7befae0502101109161da0d1@mail.gmail.com> [fdrake@users.sourceforge.net] > Modified Files: > xmlrpclib.py > Log Message: > accept datetime.datetime instances when marshalling; > dateTime.iso8601 elements still unmarshal into xmlrpclib.DateTime objects > > Index: xmlrpclib.py ... > + if datetime and isinstance(value, datetime.datetime): > + self.value = value.strftime("%Y%m%dT%H:%M:%S") > + return ... [and similarly later] ... Fred, is there a reason to avoid datetime.datetime's .isoformat() method here? Like so: >>> import datetime >>> print datetime.datetime(2005, 2, 10, 14, 0, 8).isoformat() 2005-02-10T14:00:08 A possible downside is that you'll also get fractional seconds if the instance records a non-zero .microseconds value. From fdrake at acm.org Thu Feb 10 20:23:59 2005 From: fdrake at acm.org (Fred L. 
Drake, Jr.) Date: Thu Feb 10 20:24:11 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <1f7befae0502101109161da0d1@mail.gmail.com> References: <1f7befae0502101109161da0d1@mail.gmail.com> Message-ID: <200502101423.59995.fdrake@acm.org> On Thursday 10 February 2005 14:09, Tim Peters wrote: > Fred, is there a reason to avoid datetime.datetime's .isoformat() > method here? Like so: Yes. The XML-RPC spec is quite vague. It claims that the dates are in ISO 8601 format, but doesn't say anything more about it. The example shows a string without hyphens (but with colons), so I stuck with eactly that. > A possible downside is that you'll also get fractional seconds if the > instance records a non-zero .microseconds value. There's nothing in the XML-RPC spec about the resolution of time, so, again, I'd rather be conservative in what we generate. -Fred -- Fred L. Drake, Jr. From tim.peters at gmail.com Thu Feb 10 20:44:21 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 10 20:44:24 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <200502101423.59995.fdrake@acm.org> References: <1f7befae0502101109161da0d1@mail.gmail.com> <200502101423.59995.fdrake@acm.org> Message-ID: <1f7befae050210114446dee240@mail.gmail.com> [Tim] >> Fred, is there a reason to avoid datetime.datetime's .isoformat() >> method here? Like so: > Yes. The XML-RPC spec is quite vague. It claims that the dates are in ISO > 8601 format, but doesn't say anything more about it. The example shows a > string without hyphens (but with colons), so I stuck with eactly that. Well, then since that isn't ISO 8601 format, it would be nice to have a comment explaining why it's claiming to be anyway <0.5 wink>. >> A possible downside is that you'll also get fractional seconds if the >> instance records a non-zero .microseconds value. > There's nothing in the XML-RPC spec about the resolution of time, so, again, > I'd rather be conservative in what we generate. dt.replace(microsecond=0).isoformat() suffices for that much. Tack on .replace('-', '') to do the whole job. From mwh at python.net Thu Feb 10 20:54:13 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 10 20:54:15 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.343, 2.344 In-Reply-To: (Jim Jewett's message of "Thu, 10 Feb 2005 13:51:43 -0500") References: Message-ID: <2mmzuc42e2.fsf@starship.python.net> Jim Jewett writes: > On Wed, 09 Feb 2005 17:42:41 -0800, rhettinger@users.sourceforge.net > wrote: >> Update of /cvsroot/python/python/dist/src/Python >> In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31172 >> >> Modified Files: >> compile.c >> Log Message: >> Remove the set conversion which didn't work with: [] in (0,) > > Why is this a problem? It broke the test suite... > If there were *any* unhashable objects in the container, then the > compiler would have bailed on the initial set-conversion. Also, the RHS wouldn't have been a tuple of constants, as far as the compiler saw it. > If there aren't any unhashable values, then the (unhashable) item > being checked is not in the set. ==> Return False. This would seem to require changing the frozenset implementation. I don't know if the option of unhashable implying returning false from frozenset.__contains__() was considered at the time it was implemented but it doesn't feel right to me. 
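For reference, the asymmetry being discussed looks like this at a Python 2.4 prompt (a minimal reconstruction, not quoted from the thread):

    >>> [] in (0,)              # tuple membership only needs == comparisons
    False
    >>> [] in frozenset((0,))   # frozenset must hash the key first
    Traceback (most recent call last):
      ...
    TypeError: list objects are unhashable

So an "in" test that the compiler silently rewrites from a constant tuple to a frozenset can start raising where the original expression simply returned False.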
> Are you worried about unhashable objects (as item) which > compare == to something that is hashable (in container)? > Custom rich compares can already confuse the "in" tests. This was a concern of mine, yes. Although any custom object (particularly an unhashable one!) that compares equal to something so simple as an integer, string or tuple seems bad design, I'm not sure that's the point. > Or is the problem that guarding against/trapping this case is > somehow so expensive that it overrides the expected savings? If you want to compile the expression x in (1,2,3) to contain the moral equivalent of a try:except: block, I'd question your sanity. Cheers, mwh -- > It might get my attention if you'd spin around in your chair, > spoke in tongues, and puked jets of green goblin goo. I can arrange for this. ;-) -- Barry Warsaw & Fred Drake From fdrake at acm.org Thu Feb 10 21:16:37 2005 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu Feb 10 21:16:42 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <1f7befae050210114446dee240@mail.gmail.com> References: <200502101423.59995.fdrake@acm.org> <1f7befae050210114446dee240@mail.gmail.com> Message-ID: <200502101516.37550.fdrake@acm.org> On Thursday 10 February 2005 14:44, Tim Peters wrote: > Well, then since that isn't ISO 8601 format, it would be nice to have > a comment explaining why it's claiming to be anyway <0.5 wink>. Hmm, that's right (ISO 8601:2000, section 5.4.2). Sigh. > dt.replace(microsecond=0).isoformat() > > suffices for that much. Tack on .replace('-', '') to do the whole job. Yep, that would work too. -Fred -- Fred L. Drake, Jr. From fdrake at acm.org Thu Feb 10 21:32:14 2005 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu Feb 10 21:32:21 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <1f7befae050210114446dee240@mail.gmail.com> References: <200502101423.59995.fdrake@acm.org> <1f7befae050210114446dee240@mail.gmail.com> Message-ID: <200502101532.14964.fdrake@acm.org> On Thursday 10 February 2005 14:44, Tim Peters wrote: > Well, then since that isn't ISO 8601 format, it would be nice to have > a comment explaining why it's claiming to be anyway <0.5 wink>. I've posted a note on the XML-RPC list about this. There doesn't seem to be anything that describes the range of what's accepted and produced by the various XML-RPC libraries, but I've not looked hard for it. -Fred -- Fred L. Drake, Jr. From jjl at pobox.com Thu Feb 10 23:30:23 2005 From: jjl at pobox.com (John J Lee) Date: Thu Feb 10 23:34:20 2005 Subject: [Python-Dev] Patches for cookielib bugs, for 2.4.1 Message-ID: Hope these can get in before 2.4.1. All include unit tests. http://python.org/sf/1117339 cookielib and cookies with special names http://python.org/sf/1117454 cookielib.LWPCookieJar incorrectly loads value-less cookies http://python.org/sf/1117398 cookielib LWPCookieJar and MozillaCookieJar exceptions John From pinard at iro.umontreal.ca Fri Feb 11 00:00:04 2005 From: pinard at iro.umontreal.ca (=?iso-8859-1?Q?Fran=E7ois?= Pinard) Date: Fri Feb 11 00:00:32 2005 Subject: [Python-Dev] discourage patch reviews to the list? 
In-Reply-To: <420AC729.6070804@v.loewis.de> References: <740c3aec050209112069d8c328@mail.gmail.com> <420A87C7.7030102@ocf.berkeley.edu> <420A9849.6020304@v.loewis.de> <420AB7FA.3040106@ocf.berkeley.edu> <420AC729.6070804@v.loewis.de> Message-ID: <20050210230004.GA17095@phenix.progiciels-bpi.ca> [Martin von L?wis] > I disagree with the primary reason not to take bug reports on > python-dev, however: bug reports in email get lost if not immediately > processed; usage of a tracker is necessary to actually "keep > track". Some developers and users appreciate bug trackers, or at least are able to stand them. Others, at least like me, just hate them. When a developer replies to one of my emails, asking me that I use the bug tracker, my email was surely not lost, since the developer is replying to it. That developer could have used the bug tracker himself, the way he sees fit, instead of inviting me to do it. In fact, a developer asking me to use the tracker of the day is trying to educate me into using it. Or maybe he knows that using the tracker is uneasy and is trying to spare himself some disgust. Or maybe he is consciously trying to turn me down :-). I do not buy the argument of the fear of emails being lost. Actually, almost all of my emails reporting bugs received a reply in one form or another, so developers do see them. If a developer wants to use a bug tracker, then nice, good for him. For one, trackers merely tell me that I should get a life and do nicer things than reporting bugs. In any case, Python has plenty of users, and others will contribute anyway. So, after all, why should I? > So this kind of bug management is the primary reason for the tracker, > not that we want to keep random users out of python-dev (although this > is a convenient side effect). Hey, that's good! Trackers may act like a randomiser! :-) -- Fran?ois Pinard http://pinard.progiciels-bpi.ca From david.ascher at gmail.com Fri Feb 11 00:03:29 2005 From: david.ascher at gmail.com (David Ascher) Date: Fri Feb 11 00:03:32 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <200502101532.14964.fdrake@acm.org> References: <200502101423.59995.fdrake@acm.org> <1f7befae050210114446dee240@mail.gmail.com> <200502101532.14964.fdrake@acm.org> Message-ID: On Thu, 10 Feb 2005 15:32:14 -0500, Fred L. Drake, Jr. wrote: > On Thursday 10 February 2005 14:44, Tim Peters wrote: > > Well, then since that isn't ISO 8601 format, it would be nice to have > > a comment explaining why it's claiming to be anyway <0.5 wink>. > > I've posted a note on the XML-RPC list about this. There doesn't seem to be > anything that describes the range of what's accepted and produced by the > various XML-RPC libraries, but I've not looked hard for it. Is there any surprise here? =) From tim.peters at gmail.com Fri Feb 11 00:27:01 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 00:28:49 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39 In-Reply-To: <200502101516.37550.fdrake@acm.org> References: <200502101423.59995.fdrake@acm.org> <1f7befae050210114446dee240@mail.gmail.com> <200502101516.37550.fdrake@acm.org> Message-ID: <1f7befae05021015277972d295@mail.gmail.com> [Tim] >> Well, then since that isn't ISO 8601 format, it would be nice to have >> a comment explaining why it's claiming to be anyway <0.5 wink>. [Fred] > Hmm, that's right (ISO 8601:2000, section 5.4.2). Sigh. Ain't your fault. 
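To make the formats concrete, here is a small sketch (not from the thread; it just extends Tim's example timestamp with a microsecond value) comparing what the checked-in xmlrpclib code emits, what isoformat() gives, and Tim's suggested combination:

    import datetime
    dt = datetime.datetime(2005, 2, 10, 14, 0, 8, 123456)

    dt.strftime("%Y%m%dT%H:%M:%S")
    # '20050210T14:00:08'            (what the checked-in xmlrpclib code produces)
    dt.isoformat()
    # '2005-02-10T14:00:08.123456'   (ISO 8601, fractional seconds included)
    dt.replace(microsecond=0).isoformat().replace('-', '')
    # '20050210T14:00:08'            (Tim's suggestion: same compact form, via isoformat)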
I didn't remember that I had seen the XML-RPC spec before, in conjunction with its crazy rules for representing floats. It's a very vague doc indeed. Anyway, some quick googling strongly suggests that many XML-RPC implementors don't know anything about 8601 either, and accept/produce only the non-8601 format inferred from the single example in "the spec". Heh -- kids . From bjourne at gmail.com Fri Feb 11 01:15:18 2005 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Fri Feb 11 01:18:50 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.343, 2.344 In-Reply-To: References: Message-ID: <740c3aec05021016151c0de340@mail.gmail.com> > On Wed, 09 Feb 2005 17:42:41 -0800, rhettinger@users.sourceforge.net > wrote: > > Update of /cvsroot/python/python/dist/src/Python > > In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv31172 > > > > Modified Files: > > compile.c > > Log Message: > > Remove the set conversion which didn't work with: [] in (0,) > > Why is this a problem? If there were *any* unhashable objects > in the container, then the compiler would have bailed on the > initial set-conversion. >>> [] in frozenset(["hi", "ho"]) Traceback (most recent call last): File "", line 1, in ? TypeError: list objects are unhashable The compiler do bail out when there are unhashable objects outside the tuple, but not if the LHS is unhashable. I believe that is because internally frozenset uses a dict and it does something similar to d.has_key([]) in this case. It should be trivial for the compiler to also check the LHS for hashability I think. That is also why the email unit test failed - LHS was unhashable but the RHS was hashable. There is a patch for that (1119016) at SF but that may no longer be needed. -- mvh Bj?rn From python at rcn.com Fri Feb 11 01:59:25 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 11 02:03:22 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Pythoncompile.c, 2.343, 2.344 In-Reply-To: <740c3aec05021016151c0de340@mail.gmail.com> Message-ID: <004101c50fd4$ee8c9080$83f9cc97@oemcomputer> [Raymond] > > > Remove the set conversion which didn't work with: [] in (0,) [Jim] > > Why is this a problem? If there were *any* unhashable objects > > in the container, then the compiler would have bailed on the > > initial set-conversion. > > >>> [] in frozenset(["hi", "ho"]) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: list objects are unhashable [Bjorn] > The compiler do bail out when there are unhashable objects outside the > tuple, but not if the LHS is unhashable. I believe that is because > internally frozenset uses a dict and it does something similar to > d.has_key([]) in this case. It should be trivial for the compiler to > also check the LHS for hashability I think. > > That is also why the email unit test failed - LHS was unhashable but > the RHS was hashable. There is a patch for that (1119016) at SF but > that may no longer be needed. Right, that patch only fixes a symptom. Also, the compiler cannot check the hashability of the search key because it is likely not known at compile time (i.e. x in (1,2,3) where x is a function argument). For the time being, the set conversion concept was removed entirely. 
To go forward with it at some point, it will need a fast search type other than frozenset, something like:

    class FastSearchTuple(tuple):
        """Tuple lookalike that has O(1) search time if both the key and
        tuple elements are hashable; otherwise it reverts to an O(n)
        linear search.

        Used by compile.c for 'in' tests on tuples of constants.
        """
        def __init__(self, data):
            try:
                self.dict = dict.fromkeys(data)
            except TypeError:
                self.dict = None
        def __contains__(self, key):
            try:
                return key in self.dict
            except TypeError:
                # Unhashable key (or no dict could be built): fall back
                # to a linear scan of the tuple itself.
                return tuple.__contains__(self, key)

Raymond Hettinger
From abo at minkirri.apana.org.au Fri Feb 11 03:15:47 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 11 03:16:32 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050208195243.GD10650@zot.electricrain.com> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> Message-ID: <1108088147.3753.51.camel@schizo> On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote:
> > The md5.h/md5c.c files allow "copy and use", but no modification of
> > the files. There are some alternative implementations, i.e. in glibc,
> > openssl, so a replacement should be safe. Any other requirements when
> > considering a replacement?

One thing to consider is "degree of difficulty" :-)

> > Matthias
>
> I believe the "plan" for md5 and sha1 and such is to use the much
> faster openssl versions "in the future" (based on a long thread
> debating future interfaces to such things on python-dev last summer).
> That'll sidestep any tedious license issue and give a better
> implementation at the same time. i don't believe anyone has taken the
> time to make such a patch yet.

I wasn't around for that discussion. There are two viable replacements for the RSA implementation currently used;

    libmd
    openssl

The libmd implementation is by Colin Plumb and has the licence: "This code is in the public domain; do with it what you wish." The API is identical to the RSA implementation and the BSD world's libmd, and hence is a drop-in replacement. This implementation is faster than the RSA implementation.

The openssl implementation has an Apache-style license. The API is almost the same but slightly different to the RSA API, so it would require a little bit of work to make it fit. This implementation is the fastest currently available, as it includes many platform-specific optimisations for a large range of platforms.

Currently md5c.c is included in the Python sources. The libmd implementation has a drop-in replacement for md5c.c. The openssl implementation is a complicated tangle of Makefile-expanded template code that would be harder to include in the Python sources.

In the Linux world, openssl is starting to become ubiquitous, so not including it and statically or even dynamically linking against it is feasible. However, using Python in other lands will probably require something to be included.

Long term, I think openssl is the way to go. Short term, libmd is a painless replacement that gets around the licensing issues. I have been using the libmd API stuff for md4 in librsync, and am looking at migrating to the openssl API. If people hassle me, I could probably do the openssl API migration for Python, but I'm not sure what the best approach would be for including the source in the Python sources. FWIW, I also have an md4sum module and md4c.c implementation that I'm happy to contribute to Python (done for pysync).
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From bob at redivi.com Fri Feb 11 03:30:59 2005 From: bob at redivi.com (Bob Ippolito) Date: Fri Feb 11 03:31:06 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108088147.3753.51.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> Message-ID: On Feb 10, 2005, at 9:15 PM, Donovan Baarda wrote: > On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote: >>> The md5.h/md5c.c files allow "copy and use", but no modification of >>> the files. There are some alternative implementations, i.e. in glibc, >>> openssl, so a replacement should be sage. Any other requirements when >>> considering a replacement? > > One thing to consider is "degree of difficulty" :-) > >>> Matthias >> >> I believe the "plan" for md5 and sha1 and such is to use the much >> faster openssl versions "in the future" (based on a long thread >> debating future interfaces to such things on python-dev last summer). >> That'll sidestep any tedious license issue and give a better >> implementation at the same time. i don't believe anyone has taken the >> time to make such a patch yet. > > I wasn't around for that discussion. There are two viable replacements > for the RSA implementation currently used; > > libmd > openssl . -- > In the Linux world, openssl is starting to become ubiquitous, so not > including it and statically or even dynamically linking against it is > feasible. However, using Python in other lands will probably require > something to be included. > > Long term, I think openssl is the way to go. Short term, libmd is a > painless replacement that gets around the licencing issues. OpenSSL is also ubiquitous on Mac OS X (as a shared lib): Mac OS X 10.2.8 has OpenSSL 0.9.6i Feb 19 2003 Mac OS X 10.3.8 has OpenSSL 0.9.7b 10 Apr 2003 One possible alternative would be to bring in something like PyOpenSSL and just rewrite the md5 (and sha?) extensions as Python modules that use that API. -bob From abo at minkirri.apana.org.au Fri Feb 11 03:50:48 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 11 03:51:25 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> Message-ID: <1108090248.3753.53.camel@schizo> On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote: > On Feb 10, 2005, at 9:15 PM, Donovan Baarda wrote: > > > On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote: [...] > One possible alternative would be to bring in something like PyOpenSSL > and just rewrite the md5 (and sha?) > extensions as Python modules that use that API. Only problem with this, is pyopenssl doesn't yet include any mdX or sha modules. 
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From bob at redivi.com Fri Feb 11 05:13:55 2005 From: bob at redivi.com (Bob Ippolito) Date: Fri Feb 11 05:14:17 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108090248.3753.53.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> Message-ID: <226e9c65e562f9b0439333053036fef3@redivi.com> On Feb 10, 2005, at 9:50 PM, Donovan Baarda wrote: > On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote: >> On Feb 10, 2005, at 9:15 PM, Donovan Baarda wrote: >> >>> On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote: > [...] >> One possible alternative would be to bring in something like PyOpenSSL >> and just rewrite the md5 (and >> sha?) >> extensions as Python modules that use that API. > > Only problem with this, is pyopenssl doesn't yet include any mdX or sha > modules. My bad, how about M2Crypto then? This one supports message digests and is more license compatible with Python to boot. -bob From abo at minkirri.apana.org.au Fri Feb 11 07:15:39 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 11 07:16:20 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <226e9c65e562f9b0439333053036fef3@redivi.com> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> Message-ID: <1108102539.3753.87.camel@schizo> On Thu, 2005-02-10 at 23:13 -0500, Bob Ippolito wrote: > On Feb 10, 2005, at 9:50 PM, Donovan Baarda wrote: > > > On Thu, 2005-02-10 at 21:30 -0500, Bob Ippolito wrote: [...] > > Only problem with this, is pyopenssl doesn't yet include any mdX or sha > > modules. > > My bad, how about M2Crypto > then? This one supports message digests and is more license compatible > with Python to boot. [...] This one does have md5 support, but the Python API is rather different from the current python md5sum API. It hooks into the slightly higher level MVP openssl layer, rather than the lower level md5 layer. Hooking into the MVP layer pretty much requires including all the openssl message digest implementations (which may or may not be a good idea). It also uses SWIG to generate the extension module. I don't think anything else in Python itself uses SWIG, so starting to use it would introduce a "Build Dependency". I think it would be cleaner and simpler to modify the existing md5module.c to use the openssl md5 layer API (this is just a search/replace to change the function names). The bigger problem is deciding what/how/whether to include the openssl md5 implementation sources so that win32 can use them. 
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From abo at minkirri.apana.org.au Fri Feb 11 07:52:20 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 11 07:52:57 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108102539.3753.87.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> Message-ID: <1108104740.3753.91.camel@schizo> On Fri, 2005-02-11 at 17:15 +1100, Donovan Baarda wrote: [...] > I think it would be cleaner and simpler to modify the existing > md5module.c to use the openssl md5 layer API (this is just a > search/replace to change the function names). The bigger problem is > deciding what/how/whether to include the openssl md5 implementation > sources so that win32 can use them. Thinking about it, probably the best way is to include the libmd md5c.c modified to use the openssl API, and then use configure to check for and use openssl if it is available. That way win32 could use the provided md5c.c, and other platforms could use the faster openssl. -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From doko at cs.tu-berlin.de Fri Feb 11 12:55:02 2005 From: doko at cs.tu-berlin.de (Matthias Klose) Date: Fri Feb 11 12:55:26 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108088147.3753.51.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> Message-ID: <16908.40214.287358.160325@gargle.gargle.HOWL> Donovan Baarda writes: > On Tue, 2005-02-08 at 11:52 -0800, Gregory P. Smith wrote: > > > The md5.h/md5c.c files allow "copy and use", but no modification of > > > the files. There are some alternative implementations, i.e. in glibc, > > > openssl, so a replacement should be sage. Any other requirements when > > > considering a replacement? > > One thing to consider is "degree of difficulty" :-) > > > > Matthias > > > > I believe the "plan" for md5 and sha1 and such is to use the much > > faster openssl versions "in the future" (based on a long thread > > debating future interfaces to such things on python-dev last summer). > > That'll sidestep any tedious license issue and give a better > > implementation at the same time. i don't believe anyone has taken the > > time to make such a patch yet. > > I wasn't around for that discussion. There are two viable replacements > for the RSA implementation currently used; > > libmd > openssl . > > The libmd implementation is by Colin Plumb and has the licence; "This > code is in the public domain; do with it what you wish." The API is > identical to the RSA implementation and BSD world's libmd and hence is a > drop in replacement. This implementation is faster than the RSA > implementation. > [...] > > Currently md5c.c is included in the python sources. The libmd > implementation has a drop in replacement for md5c.c. The openssl > implementation is a complicated tangle of Makefile expanded template > code that would be harder to include in the Python sources. I would prefer that one as a short term solution. Patch at #1118602. 
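Whichever C implementation ends up underneath (the libmd drop-in of the patch above, or OpenSSL later), the Python-level contract it has to preserve is small. A quick sanity check, not from the thread and using only documented md5-module behaviour, might look like:

    import md5

    one_shot = md5.new("Nobody inspects the spammish repetition")
    incremental = md5.new()
    incremental.update("Nobody inspects")
    incremental.update(" the spammish repetition")

    assert md5.digest_size == 16                       # 16-byte (128-bit) digests
    assert len(one_shot.digest()) == 16
    assert one_shot.hexdigest() == incremental.hexdigest()            # incremental == one-shot
    assert incremental.copy().hexdigest() == incremental.hexdigest()  # copy() preserves state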
From doko at cs.tu-berlin.de Fri Feb 11 13:04:38 2005 From: doko at cs.tu-berlin.de (Matthias Klose) Date: Fri Feb 11 13:04:56 2005 Subject: Bug#293932: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: <16908.40790.23812.274563@gargle.gargle.HOWL> Jeremy Hylton writes: > Maybe some ambitious PSF activitst could contact Roskind and Steve > Kirsch and see if they know who at Disney to talk to... Or maybe the > Disney guys who were at PyCon last year could help. please could somebody give me a contact address? Matthias From jhylton at gmail.com Fri Feb 11 13:35:18 2005 From: jhylton at gmail.com (Jeremy Hylton) Date: Fri Feb 11 13:35:21 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16908.40214.287358.160325@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <16908.40214.287358.160325@gargle.gargle.HOWL> Message-ID: On Fri, 11 Feb 2005 12:55:02 +0100, Matthias Klose wrote: > > Currently md5c.c is included in the python sources. The libmd > > implementation has a drop in replacement for md5c.c. The openssl > > implementation is a complicated tangle of Makefile expanded template > > code that would be harder to include in the Python sources. > > I would prefer that one as a short term solution. Patch at #1118602. Unfortunately a license that says it is in the public domain is unacceptable (and should be for Debian, too). That is to say, it's not possible for someone to claim that something they produce is in the public domain. See http://www.linuxjournal.com/article/6225 Jeremy From skip at pobox.com Fri Feb 11 13:54:32 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 11 13:54:44 2005 Subject: Bug#293932: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <16908.40790.23812.274563@gargle.gargle.HOWL> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> <16908.40790.23812.274563@gargle.gargle.HOWL> Message-ID: <16908.43784.902706.197167@montanaro.dyndns.org> >> Maybe some ambitious PSF activitst could contact Roskind and Steve >> Kirsch and see if they know who at Disney to talk to... Or maybe the >> Disney guys who were at PyCon last year could help. Matthias> please could somebody give me a contact address? Steve's easy enough to get ahold of: http://www.skirsch.com/ (He even still has a UltraSeek-powered search of his site. ;-) Search Kirsch's site for Jim Roskind returned jar@netscape.com but that was dated 31 Oct 2000. An abstract for a talk at University of Arizona in late 2003 sort of implied he was still at Netscape then ... maybe... Skip From greg at electricrain.com Fri Feb 11 18:51:18 2005 From: greg at electricrain.com (Gregory P. 
Smith) Date: Fri Feb 11 18:51:26 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108102539.3753.87.camel@schizo> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> Message-ID: <20050211175118.GC25441@zot.electricrain.com> > I think it would be cleaner and simpler to modify the existing > md5module.c to use the openssl md5 layer API (this is just a > search/replace to change the function names). The bigger problem is > deciding what/how/whether to include the openssl md5 implementation > sources so that win32 can use them. yes, that is all i was suggesting. win32 python is already linked against openssl for the socket module ssl support, having the md5 and sha1 modules depend on openssl should not cause a problem. -greg From trentm at ActiveState.com Fri Feb 11 19:37:15 2005 From: trentm at ActiveState.com (Trent Mick) Date: Fri Feb 11 19:39:35 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken Message-ID: <420CFB5B.7030007@activestate.com> Has anyone else noticed that viewcvs is broken on SF? > [trentm@booboo ~] > $ curl -D tmp/headers http://cvs.sourceforge.net/viewcvs.py/python > > > 502 Bad Gateway > >

Bad Gateway

>

> The proxy server received an invalid response from an upstream server.
>

> > [trentm@booboo ~] > $ cat tmp/headers > HTTP/1.1 502 Bad Gateway > Date: Fri, 11 Feb 2005 18:38:25 GMT > Server: Apache/2.0.40 (Red Hat Linux) > Content-Length: 232 > Connection: close > Content-Type: text/html; charset=iso-8859-1 Or is this just me? It is also broken for other projects for me -- e.g. 'pywin32'. Cheers, Trent -- Trent Mick trentm@activestate.com From tim.peters at gmail.com Fri Feb 11 20:14:30 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 20:14:33 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <420CFB5B.7030007@activestate.com> References: <420CFB5B.7030007@activestate.com> Message-ID: <1f7befae05021111143c346e3@mail.gmail.com> [Trent Mick] > Has anyone else noticed that viewcvs is broken on SF? It failed the same way from Virginia just now. I suppose that's your reward for kindly updating the Python copyright . The good news is that you can use this lull in your Python work to contribute to ZODB development! ViewCVS at zope.org is always happy to see you: http://svn.zope.org/ZODB/trunk/ From theller at python.net Fri Feb 11 20:20:57 2005 From: theller at python.net (Thomas Heller) Date: Fri Feb 11 20:19:24 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <1f7befae05021111143c346e3@mail.gmail.com> (Tim Peters's message of "Fri, 11 Feb 2005 14:14:30 -0500") References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> Message-ID: <7jleewdi.fsf@python.net> Tim Peters writes: > [Trent Mick] >> Has anyone else noticed that viewcvs is broken on SF? > > It failed the same way from Virginia just now. I suppose that's your > reward for kindly updating the Python copyright . > The failure lasts already for several days: http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1107968334 Thomas From tim.peters at gmail.com Fri Feb 11 20:24:51 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 20:24:54 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <7jleewdi.fsf@python.net> References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> <7jleewdi.fsf@python.net> Message-ID: <1f7befae05021111246ca3c616@mail.gmail.com> [Thomas Heller] Jeez Louise! As of 2005-02-09 there is an outage of anonymous CVS (tarballs, pserver-based CVS and ViewCVS) for projects whose UNIX names start with the letters m, n, p, q, t, y and z. We are currently working on resolving this issue. So that means it wouldn't even do us any good to rename the project to Thomas, Trent, Mick, Tim, Peters, or ZPython either! All right. Heller 2.5, here we come. From theller at python.net Fri Feb 11 20:27:11 2005 From: theller at python.net (Thomas Heller) Date: Fri Feb 11 20:25:39 2005 Subject: [Python-Dev] ViewCVS on SourceForge is broken In-Reply-To: <1f7befae05021111143c346e3@mail.gmail.com> (Tim Peters's message of "Fri, 11 Feb 2005 14:14:30 -0500") References: <420CFB5B.7030007@activestate.com> <1f7befae05021111143c346e3@mail.gmail.com> Message-ID: <1xbmew34.fsf@python.net> Tim Peters writes: > [Trent Mick] >> Has anyone else noticed that viewcvs is broken on SF? > > It failed the same way from Virginia just now. I suppose that's your > reward for kindly updating the Python copyright . > > The good news is that you can use this lull in your Python work to > contribute to ZODB development! 
ViewCVS at zope.org is always happy > to see you: > > http://svn.zope.org/ZODB/trunk/ Thomas Heller writes: > The failure lasts already for several days: > > http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1107968334 "As of 2005-02-09 there is an outage of anonymous CVS (tarballs, pserver-based CVS and ViewCVS) for projects whose UNIX names start with the letters m, n, p, q, t, y and z." As you can see, both projects with names starting with 'p' and 'z' are affected, so may I suggest to contribute to *ctypes* instead of zope ;-) Thomas From mcherm at mcherm.com Fri Feb 11 21:03:29 2005 From: mcherm at mcherm.com (Michael Chermside) Date: Fri Feb 11 21:03:39 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c Message-ID: <1108152209.420d0f91e312c@mcherm.com> Jeremy writes: > Unfortunately a license that says it is in the public domain is > unacceptable (and should be for Debian, too). That is to say, it's > not possible for someone to claim that something they produce is in > the public domain. See http://www.linuxjournal.com/article/6225 Not quite true. It would be a bit off-topic to discuss on this list so I will simply point you to: http://creativecommons.org/license/publicdomain-2 ...which is specifically designed for the US legal system. It _IS_ possible for someone to produce something in the public domain, it just isn't as easy as some people think (just saying it doesn't necessarily make it so (at least under US law)) and it may not be a good idea. I would expect that if something truly WERE in the public domain, then it would be acceptable for Python (and for Debian too, for that matter). I can't comment on whether this applies to libmd. -- Michael Chermside From tim.peters at gmail.com Fri Feb 11 21:46:00 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 11 21:46:03 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108152209.420d0f91e312c@mcherm.com> References: <1108152209.420d0f91e312c@mcherm.com> Message-ID: <1f7befae0502111246244647c9@mail.gmail.com> [Jeremy Hylton] >> Unfortunately a license that says it is in the public domain is >> unacceptable (and should be for Debian, too). That is to say, it's >> not possible for someone to claim that something they produce is in >> the public domain. See http://www.linuxjournal.com/article/6225 [Michael Chermside] > Not quite true. It would be a bit off-topic to discuss on this list > so I will simply point you to: > > http://creativecommons.org/license/publicdomain-2 > > ...which is specifically designed for the US legal system. It _IS_ > possible for someone to produce something in the public domain, it > just isn't as easy as some people think (just saying it doesn't > necessarily make it so (at least under US law)) and it may not be > a good idea. The article Jeremy pointed at was written by the Python Software Foundation's occasional legal counsel, and he disagrees. While I would love to believe that copyright law isn't this bizarre, I can't recommend going against the best legal advice the PSF was willing to pay for . Note that Creative Commons doesn't recommend that you do either; from their FAQ: Can I use a Creative Commons license for software? In theory, yes, but it is not in your best interest. We strongly encourage you to use one of the very good software licenses available today. (The Free Software Foundation and the Open Source Initiative stand out as resources for such licenses.) 
> I would expect that if something truly WERE in the public domain, > then it would be acceptable for Python (and for Debian too, for > that matter). So would I, but according to Larry there isn't such a thing (excepting software written by the US Government; and for other software you might be thinking about today, maybe in about a century if the author lets their copyright lapse). If Larry is correct, it isn't legally possible for an individual in the US to disclaim copyright, regardless what they may say or sign. The danger then is that accepting software that purports to be free of copyright can come back to bite you, if the author later changes their mind (from your POV; the claim is that from US law's POV, nothing has actually changed, since the author never actually gave up copyright to begin with). The very fact that this argument exists underscores the desirability of only accepting software with an explicit license, spelling out the copyright holder's intents wrt distribution, modification, etc. Then you're just in legal mud, instead of legal quicksand. From pje at telecommunity.com Fri Feb 11 23:59:33 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 11 23:57:10 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1f7befae0502111246244647c9@mail.gmail.com> References: <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> Message-ID: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> At 03:46 PM 2/11/05 -0500, Tim Peters wrote: >If Larry is correct, it isn't legally possible for an individual in >the US to disclaim copyright, regardless what they may say or sign. >The danger then is that accepting software that purports to be free of >copyright can come back to bite you, if the author later changes their >mind (from your POV; the claim is that from US law's POV, nothing has >actually changed, since the author never actually gave up copyright to >begin with). > >The very fact that this argument exists underscores the desirability >of only accepting software with an explicit license, spelling out the >copyright holder's intents wrt distribution, modification, etc. Then >you're just in legal mud, instead of legal quicksand. And as long as we're flailing about in a substance which may include, but is not limited to, mud and/or quicksand or other flailing-suitable legal substances, it should be pointed out that even though software presented by its owner to be in the public domain is technically still copyright by that individual, the odds of them successfully prosecuting a copyright enforcement action might be significantly narrowed, due to the doctrine of promissory estoppel. Promissory estoppel is basically the idea that one-sided promises *are* enforceable when somebody reasonably relies on them and is injured by the withdrawal. IBM, for example, has pled in its defense against SCO that SCO's distribution of its so-called proprietary code under the GPL constituted a reasonable promise that others were free to use the code under the terms of the GPL, and that IBM further relied on that promise. Ergo, they are claiming, SCO's promise is enforceable by law. Of course, SCO v. IBM hasn't had any judgments yet, certainly not on that subject, and maybe never will. But it's important to know that the law *does* have some principles like this that allow overriding the more egregiously insane aspects of the law. 
:) Oh, also, if somebody decides to back out on their dedication to the public domain, and you can show that they did it on purpose, then that's "unclean hands" and possibly "copyright abuse" as well. Just to muddy up the waters a little bit. :) Obviously, the PSF should follow its own lawyer's advice, but it seemed to me that the point of Mr. Rosen's article was more to advise people releasing software to use a license that allows them to disclaim warranties. I personally can't see how taking the reasonable interpretation of a public domain declaration can lead to any difficulties, but then, IANAL. I'm surprised, however, that he didn't even touch on promissory estoppel, if there is some reason he believes that the doctrine wouldn't apply to a software license. Heck, I was under the impression that free copyright licenses in general got their effect by way of promissory estoppel, since such licenses are always one-sided promises. The GPL in particular makes an explicit point of this, even though it doesn't use the words "promissory estoppel". The point is that the law doesn't allow you to copy, so the license is your defense against a charge of copyright infringement. Therefore, even Rosen's so-called "Give it away" license is enforceable, in the sense that the licensor should be barred from taking action against someone taking the license at face value. Rosen also says, "Under basic contract law, a gift cannot be enforced. The donor can retract his gift at any time, for any reason". If this were true, I could give you a watch for Christmas and then sue you to make you give it back, so I'm not sure what he's getting at here. But again, IANAL, certainly not a famous one like Mr. Rosen. I *am* most curious to know why his article seems to imply that a promise not to sue someone for copyright infringement isn't a valid defense against such a suit, because that would seem to imply that *no* free software license is valid, including the GPL or the PSF license! (Surely those "gifts" can be retracted too, no?) From abo at minkirri.apana.org.au Sat Feb 12 00:11:01 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 12 00:11:16 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> Message-ID: <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> G'day again, From: "Gregory P. Smith" > > I think it would be cleaner and simpler to modify the existing > > md5module.c to use the openssl md5 layer API (this is just a > > search/replace to change the function names). The bigger problem is > > deciding what/how/whether to include the openssl md5 implementation > > sources so that win32 can use them. > > yes, that is all i was suggesting. > > win32 python is already linked against openssl for the socket module > ssl support, having the md5 and sha1 modules depend on openssl should > not cause a problem. IANAL... I have too much common sense, so I won't argue licences :-) So is openssl already included in the Python sources, or is it just a dependency? I had a quick look and couldn't find it so it must be a dependency. Given that Python is already dependant on openssl, it makes sense to change md5sum to use it. 
I have a feeling that openssl internally uses md5, so this way we wont link against two different md5sum implementations. ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From martin at v.loewis.de Sat Feb 12 00:57:40 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 12 00:57:44 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> References: <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> Message-ID: <420D4674.4040804@v.loewis.de> Phillip J. Eby wrote: > I personally can't see how taking the reasonable interpretation of a > public domain declaration can lead to any difficulties, but then, > IANAL. The ultimate question is whether we could legally relicense such code under the Python license, ie. remove the PD declaration, and attach the Python license to it. I'm sure somebody would come along and claim "you cannot do that, and because you did, I cannot use your code, because it is not legally trustworthy"; people would say the same if the PD declaration would stay around. It is important for us that our users (including our commercial users) trust that Python has a clear legal track record. For such users, it is irrelevant whether you think that a litigation of the actual copyright holder would have any chance to stand in court, or whether such action is even likely. So for some users, replacing RSA-copyrighted-and-licensed code with PD-declared-and-unlicensed code makes Python less trustworthy. Clearly, for Debian, it is exactly the other way 'round. So I have rejected the patch, preserving the status quo, until a properly licensed open source implementation of md5 arrives. Until then, Debian will have to patch Python. > But again, IANAL, certainly not a famous one like Mr. Rosen. I *am* > most curious to know why his article seems to imply that a promise not > to sue someone for copyright infringement isn't a valid defense against > such a suit It might be, but that is irrelevant for open source projects that include contributions. Either they don't care too much about such things, in which case anything remotely "free" would be acceptable, or they are very nit-picking, in which case you need a good record for any contribution you ever received. Regards, Martin From pje at telecommunity.com Sat Feb 12 01:25:35 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Feb 12 01:23:11 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <420D4674.4040804@v.loewis.de> References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> At 12:57 AM 2/12/05 +0100, Martin v. L?wis wrote: >Phillip J. Eby wrote: >>I personally can't see how taking the reasonable interpretation of a >>public domain declaration can lead to any difficulties, but then, IANAL. > >The ultimate question is whether we could legally relicense such >code under the Python license, ie. remove the PD declaration, and >attach the Python license to it. 
I'm sure somebody would come along >and claim "you cannot do that, and because you did, I cannot use >your code, because it is not legally trustworthy"; people would >say the same if the PD declaration would stay around. Right, but now we've moved off the legality and into marketing, which is an even less sane subject in some ways. The law at least has certain checks and balances built into it, but in marketing, people's irrationality knows no bounds. ;) >It might be, but that is irrelevant for open source projects that >include contributions. Either they don't care too much about such >things, in which case anything remotely "free" would be acceptable, >or they are very nit-picking, in which case you need a good record >for any contribution you ever received. Isn't the PSF somewhere in between? I mean, in theory we are supposed to be tracking stuff, but in practice there's no contributor agreement for CVS committers ala Zope Corp.'s approach. So in some sense right now, Python depends largely on the implied promise of its contributors to license their contributions under the same terms as Python. ISTM that if somebody's lawyer is worried about whether Python contains pseudo-public domain code, they should be downright horrified by the absence of a paper trail on the rest. But IANAM (I Am Not A Marketer), either. :) From martin at v.loewis.de Sat Feb 12 02:09:05 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 12 02:09:08 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> Message-ID: <420D5731.8020702@v.loewis.de> Phillip J. Eby wrote: > Isn't the PSF somewhere in between? I mean, in theory we are supposed > to be tracking stuff, but in practice there's no contributor agreement > for CVS committers ala Zope Corp.'s approach. That is not true, see http://www.python.org/psf/contrib.html We certainly don't have forms from all contributors, yet, but we are working on it. > So in some sense right > now, Python depends largely on the implied promise of its contributors > to license their contributions under the same terms as Python. ISTM > that if somebody's lawyer is worried about whether Python contains > pseudo-public domain code, they should be downright horrified by the > absence of a paper trail on the rest. But IANAM (I Am Not A Marketer), > either. :) And indeed, they are horrified. Right now, we can tell them we are working on it - so I would like to see that any change that we make to improve the PSF's legal standing. Adding code which was put into the "public domain" makes it worse (atleast in the specific case - we are clearly allowed to do what we do with the current md5 code; for the newly-proposed code, it is not so clear, even if you think it is likely we would win in court). 
Regards, Martin From bob at redivi.com Sat Feb 12 02:38:18 2005 From: bob at redivi.com (Bob Ippolito) Date: Sat Feb 12 02:38:33 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> Message-ID: <5d300838ef9716aeaae53579ab1f7733@redivi.com> On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote: > G'day again, > > From: "Gregory P. Smith" >>> I think it would be cleaner and simpler to modify the existing >>> md5module.c to use the openssl md5 layer API (this is just a >>> search/replace to change the function names). The bigger problem is >>> deciding what/how/whether to include the openssl md5 implementation >>> sources so that win32 can use them. >> >> yes, that is all i was suggesting. >> >> win32 python is already linked against openssl for the socket module >> ssl support, having the md5 and sha1 modules depend on openssl should >> not cause a problem. > > IANAL... I have too much common sense, so I won't argue licences :-) > > So is openssl already included in the Python sources, or is it just a > dependency? I had a quick look and couldn't find it so it must be a > dependency. > > Given that Python is already dependant on openssl, it makes sense to > change > md5sum to use it. I have a feeling that openssl internally uses md5, > so this > way we wont link against two different md5sum implementations. It is an optional dependency that is used when present (read: not just win32). The sources are not included with Python. OpenSSL does internally have an implementation of md5 (and sha1, among other things). -bob From pje at telecommunity.com Sat Feb 12 03:28:43 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat Feb 12 03:26:19 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <420D5731.8020702@v.loewis.de> References: <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <1108152209.420d0f91e312c@mcherm.com> <1108152209.420d0f91e312c@mcherm.com> <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com> <5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050211212759.03db5b30@mail.telecommunity.com> At 02:09 AM 2/12/05 +0100, Martin v. L?wis wrote: >Phillip J. Eby wrote: >>Isn't the PSF somewhere in between? I mean, in theory we are supposed to >>be tracking stuff, but in practice there's no contributor agreement for >>CVS committers ala Zope Corp.'s approach. > >That is not true, see > >http://www.python.org/psf/contrib.html > >We certainly don't have forms from all contributors, yet, but we >are working on it. > >>So in some sense right now, Python depends largely on the implied promise >>of its contributors to license their contributions under the same terms >>as Python. ISTM that if somebody's lawyer is worried about whether >>Python contains pseudo-public domain code, they should be downright >>horrified by the absence of a paper trail on the rest. But IANAM (I Am >>Not A Marketer), either. :) > >And indeed, they are horrified. 
Right now, we can tell them we are >working on it - so I would like to see that any change that we make >to improve the PSF's legal standing. Adding code which was put into >the "public domain" makes it worse (atleast in the specific case - >we are clearly allowed to do what we do with the current md5 code; >for the newly-proposed code, it is not so clear, even if you think >it is likely we would win in court). Thanks for the clarifications. From abo at minkirri.apana.org.au Sat Feb 12 03:54:27 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 12 03:54:37 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> Message-ID: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> G'day, From: "Bob Ippolito" > On Feb 11, 2005, at 6:11 PM, Donovan Baarda wrote: [...] > > Given that Python is already dependant on openssl, it makes sense to > > change > > md5sum to use it. I have a feeling that openssl internally uses md5, > > so this > > way we wont link against two different md5sum implementations. > > It is an optional dependency that is used when present (read: not just > win32). The sources are not included with Python. Are there any potential problems with making the md5sum module availability "optional" in the same way as this? > OpenSSL does internally have an implementation of md5 (and sha1, among > other things). Yeah, I know, that's why it could be used for the md5sum module :-) What I meant was a Python application using ssl sockets and the md5sum module will effectively have two different md5sum implementations in memory. Using the openssl md5sum for the md5sum module will make it "leaner", as well as faster. ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From tjreedy at udel.edu Sat Feb 12 07:40:36 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Feb 12 07:40:52 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c References: <5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com><1108152209.420d0f91e312c@mcherm.com><1108152209.420d0f91e312c@mcherm.com><5.1.1.6.0.20050211172834.03c16e10@mail.telecommunity.com><5.1.1.6.0.20050211191840.03814ec0@mail.telecommunity.com> <420D5731.8020702@v.loewis.de> Message-ID: ""Martin v. Löwis"" wrote in message news:420D5731.8020702@v.loewis.de... > http://www.python.org/psf/contrib.html After reading this page and pages linked thereto, I get the impression that you are only asking for contributor forms from contributors of original material (such as module or manual section) and not from submitters of suggestions (via news,mail) or patches (via sourceforge). Correct? Seems sensible to me that contributing via a public suggestion box constitutes permission to use the suggestion. Terry J. Reedy From amk at amk.ca Sat Feb 12 14:37:21 2005 From: amk at amk.ca (A.M. 
Kuchling) Date: Sat Feb 12 14:40:04 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> Message-ID: <20050212133721.GA13429@rogue.amk.ca> On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > Are there any potential problems with making the md5sum module availability > "optional" in the same way as this? The md5 module has been a standard module for a long time; making it optional in the next version of Python isn't possible. We'd have to require OpenSSL to compile Python. I'm happy to replace the MD5 and/or SHA implementations with other code, provided other code with a suitable license can be found. --amk From barry at python.org Sat Feb 12 15:06:12 2005 From: barry at python.org (Barry Warsaw) Date: Sat Feb 12 15:06:14 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: <1108217172.20404.37.camel@presto.wooz.org> On Sat, 2005-02-12 at 08:37, A.M. Kuchling wrote: > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. I totally agree. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20050212/74657c79/attachment.pgp From rkern at ucsd.edu Sat Feb 12 15:11:17 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sat Feb 12 15:11:43 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: A.M. Kuchling wrote: > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > >>Are there any potential problems with making the md5sum module availability >>"optional" in the same way as this? > > > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. 
> > I'm happy to replace the MD5 and/or SHA implementations with other > code, provided other code with a suitable license can be found. How about this one: http://sourceforge.net/project/showfiles.php?group_id=42360 From an API standpoint, it's trivially different from the one currently in Python. From md5.c: /* Copyright (C) 1999, 2000, 2002 Aladdin Enterprises. All rights reserved. This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. L. Peter Deutsch ghost@aladdin.com */ /* $Id: md5.c,v 1.6 2002/04/13 19:20:28 lpd Exp $ */ /* Independent implementation of MD5 (RFC 1321). This code implements the MD5 Algorithm defined in RFC 1321, whose text is available at http://www.ietf.org/rfc/rfc1321.txt The code is derived from the text of the RFC, including the test suite (section A.5) but excluding the rest of Appendix A. It does not include any code or documentation that is identified in the RFC as being copyrighted. [etc.] -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From aahz at pythoncraft.com Sat Feb 12 15:53:26 2005 From: aahz at pythoncraft.com (Aahz) Date: Sat Feb 12 15:53:29 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <420D5731.8020702@v.loewis.de> Message-ID: <20050212145326.GA7836@panix.com> On Sat, Feb 12, 2005, Terry Reedy wrote: > ""Martin v. Löwis"" wrote in message > news:420D5731.8020702@v.loewis.de... >> >> http://www.python.org/psf/contrib.html > > After reading this page and pages linked thereto, I get the impression that > you are only asking for contributor forms from contributors of original > material (such as module or manual section) and not from submitters of > suggestions (via news,mail) or patches (via sourceforge). Correct? Half-correct: patches constitute "work" and should also require a contrib agreement. But we're probably not going to press the point until we get contrib agreements from all CVS committers. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From tjreedy at udel.edu Sat Feb 12 21:30:42 2005 From: tjreedy at udel.edu (Terry Reedy) Date: Sat Feb 12 21:30:59 2005 Subject: [Python-Dev] Re: Re: license issues with profiler.py and md5.h/md5c.c References: <420D5731.8020702@v.loewis.de> <20050212145326.GA7836@panix.com> Message-ID: "Aahz" wrote in message news:20050212145326.GA7836@panix.com... 
On Sat, Feb 12, 2005, Terry Reedy wrote: >>> http://www.python.org/psf/contrib.html >> After reading this page and pages linked thereto, I get the impression >> that >> you are only asking for contributor forms from contributors of original >> material (such as module or manual section) and not from submitters of >> suggestions (via news,mail) or patches (via sourceforge). Correct? > Half-correct: patches constitute "work" and should also require a > contrib agreement. As I remember, my impression was based on the suggested procedure of first copywrite one's work and then license it under one of two acceptible "original licenses". This makes sense for a whole module, but hardly for most patches, to the point of being nonsense for a patch of one word, as some of mine have been (in text form, with the actual diff being prepared by the committer). This is not to deny that editing -- finding the exact place to insert or change a word is "work" -- but to say that it is work of a different sort from original authorship. So, if the lawyer thinks patches should also have a contrib agreement, then I strongly recommend a separate blanket agreement that covers all patches one ever contributes as one ongoing work. > But we're probably not going to press the point > until we get contrib agreements from all CVS committers. Even though I am not such, I would happily fill and fax a blanket patch agreement were that deemed to be helpful. Terry J. Reedy From greg at electricrain.com Sat Feb 12 22:04:02 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Feb 12 22:04:08 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212133721.GA13429@rogue.amk.ca> References: <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> Message-ID: <20050212210402.GE25441@zot.electricrain.com> On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote: > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > > Are there any potential problems with making the md5sum module availability > > "optional" in the same way as this? > > The md5 module has been a standard module for a long time; making it > optional in the next version of Python isn't possible. We'd have to > require OpenSSL to compile Python. > > I'm happy to replace the MD5 and/or SHA implementations with other > code, provided other code with a suitable license can be found. > agreed. it can not be made optional. What I'd prefer (and will do if i find the time) is to have the md5 and sha1 module use OpenSSLs implementations when available. Falling back to their built in ones when openssl isn't present. That way its always there but uses the much faster optimized openssl algorithms when they exist. -g From david.ascher at gmail.com Sat Feb 12 22:42:01 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 12 22:42:05 2005 Subject: [Python-Dev] Jim Roskind Message-ID: I contacted Jim Roskind re: the profiler code. i said: I'm a strong supporter of Opensource software, but I'm probably not going to be able to help you very much. I could be much more helpful with understanding the code or its use ;-). To summarize what I'll say: I don't own the rights to this stuff. ... 
but I don't believe there are any patents that I was ever involved with that might encumber this work. I would note that my profiler code is really very rarely used in commercial products, and it is much more typically used by developers (I guess a developer toolkit, if sold, would use it). I'm pretty delighted that the code has found so much use by developers over the years. As I noted in the intro to the documentation, I had only been coding in Python for 3 weeks when I wrote it. On the positive side, it exposed many weaknesses in many developer's code (including our own at InfoSeek), as well as in core Python code (subtle bugs in the interpreter) that surely helped everyone. Even though I was a newbie, It was VERY carefully crafted,, and I'd expect that it would take a fair amount of effort to reproduce it (and that is is probably why it has not been changed much... or at least no one told me when they changed/fixed it ;-) ). With regard to why I probably can't help much..... First off, InfoSeek (holder of the copyright) was bought by Disney, and I don't know what if anything has eventually become of the tradename. There is a chance that Disney owns the rights... and I have no idea who to ask there :-/. Second, I took a look at the Copyright, and it sure seems pretty permissive. I'm amazed if folks want something more permissive. This is what I found on the web for it: Copyright ? 1994, by InfoSeek Corporation, all rights reserved. Written by James Roskind.10.1 Permission to use, copy, modify, and distribute this Python software and its associated documentation for any purpose (subject to the restriction in the following sentence) without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of InfoSeek not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. This permission is explicitly restricted to the copying and modification of the software to remain in Python, compiled Python, or other languages (such as C) wherein the modified or derived code is exclusively imported into a Python module. INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. As I recall, I probably personally created the terms of the above license. I used a similar license on my C/C++ grammar, and Infoseek just added a bunch of wording to be sure that they were not at risk, and that their name would not be used in vain (or in advertising material). I think they were also interested in limiting its use to Python.... but I don't think that is a concern that would bother you. I read the link you directed me to, and its primary focus seemed ot be on patents for related or included technology. I don't believe that infoseek applied for or got any patents in this area (and certainly if they did so without my name, it would probably invalidate the patent), and I'm sure I didn't get any patents in this area at Netscape/AOL. In fact I don't think I got any patents back in 1994 or 1995. 
My only prior patent dated back to about 1983 (a hardware patent) that has since expired. I have some patents since (roughly) 1995, and even though I don't think any of them relate to profiling (though some did relate to languages, or more specifically, security in languages), I wouldn't want to mess with assigning rights to any of those patents, as they belong to AOL/Netscape. Here again, to my knowledge, none of my patents relate in any way to this area (profiling). Sadly, if they did, I would not have the right to assign them. I'm sure you're just doing your job, and following through by dotting all the I's and crossing all T's. My suggestion is to (as you said) work around the issue. You could always re-write the code from scratch, as the approaches are not rocket science and are pretty thoroughly explained. I wouldn't suggest it unless you are desperate. If I were you, I'd wait for a license problem to emerge (which I don't believe will ever happen). Hope that helps, Jim David Ascher wrote on 2/11/2005, 8:57 PM: > Dear Jim -- > > David Ascher here, writing to you on behalf of the Python Software > Foundation. Someone recently pointed to your copyright statement in > Python's standard library (profile.py, if you recall, way back from > '94). Apparently there are some issues re: the specific terms of the > license you picked. We can probably find ways of working around those > issues but I was wondering if you'd be willing to relicense the code > under a different license, as per http://www.python.org/psf/contrib.html > > I don't really know if we need to worry about the current owners of > InfoSeek, whoever that may be. You'd know better. From david.ascher at gmail.com Sat Feb 12 22:45:54 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 12 22:45:57 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: On Tue, 8 Feb 2005 15:52:29 -0500, Jeremy Hylton wrote: > Maybe some ambitious PSF activitst could contact Roskind and Steve > Kirsch and see if they know who at Disney to talk to... Or maybe the > Disney guys who were at PyCon last year could help. I contacted Jim. His response follows: --- I'm a strong supporter of Opensource software, but I'm probably not going to be able to help you very much. I could be much more helpful with understanding the code or its use ;-). To summarize what I'll say: I don't own the rights to this stuff. ... but I don't believe there are any patents that I was ever involved with that might encumber this work. I would note that my profiler code is really very rarely used in commercial products, and it is much more typically used by developers (I guess a developer toolkit, if sold, would use it). I'm pretty delighted that the code has found so much use by developers over the years. As I noted in the intro to the documentation, I had only been coding in Python for 3 weeks when I wrote it. On the positive side, it exposed many weaknesses in many developer's code (including our own at InfoSeek), as well as in core Python code (subtle bugs in the interpreter) that surely helped everyone. Even though I was a newbie, It was VERY carefully crafted,, and I'd expect that it would take a fair amount of effort to reproduce it (and that is is probably why it has not been changed much... or at least no one told me when they changed/fixed it ;-) ). 
With regard to why I probably can't help much..... First off, InfoSeek (holder of the copyright) was bought by Disney, and I don't know what if anything has eventually become of the tradename. There is a chance that Disney owns the rights... and I have no idea who to ask there :-/. Second, I took a look at the Copyright, and it sure seems pretty permissive. I'm amazed if folks want something more permissive. This is what I found on the web for it: Copyright ? 1994, by InfoSeek Corporation, all rights reserved. Written by James Roskind.10.1 Permission to use, copy, modify, and distribute this Python software and its associated documentation for any purpose (subject to the restriction in the following sentence) without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of InfoSeek not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. This permission is explicitly restricted to the copying and modification of the software to remain in Python, compiled Python, or other languages (such as C) wherein the modified or derived code is exclusively imported into a Python module. INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. As I recall, I probably personally created the terms of the above license. I used a similar license on my C/C++ grammar, and Infoseek just added a bunch of wording to be sure that they were not at risk, and that their name would not be used in vain (or in advertising material). I think they were also interested in limiting its use to Python.... but I don't think that is a concern that would bother you. I read the link you directed me to, and its primary focus seemed ot be on patents for related or included technology. I don't believe that infoseek applied for or got any patents in this area (and certainly if they did so without my name, it would probably invalidate the patent), and I'm sure I didn't get any patents in this area at Netscape/AOL. In fact I don't think I got any patents back in 1994 or 1995. My only prior patent dated back to about 1983 (a hardware patent) that has since expired. I have some patents since (roughly) 1995, and even though I don't think any of them relate to profiling (though some did relate to languages, or more specifically, security in languages), I wouldn't want to mess with assigning rights to any of those patents, as they belong to AOL/Netscape. Here again, to my knowledge, none of my patents relate in any way to this area (profiling). Sadly, if they did, I would not have the right to assign them. I'm sure you're just doing your job, and following through by dotting all the I's and crossing all T's. My suggestion is to (as you said) work around the issue. You could always re-write the code from scratch, as the approaches are not rocket science and are pretty thoroughly explained. I wouldn't suggest it unless you are desperate. 
If I were you, I'd wait for a license problem to emerge (which I don't believe will ever happen). --- FWIW, I agree. Personnally, I think that if Debian has a problem with the above, it's their problem to deal with, not Python's. --david From rkern at ucsd.edu Sun Feb 13 00:24:27 2005 From: rkern at ucsd.edu (Robert Kern) Date: Sun Feb 13 00:24:50 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <1107726549.20128.12.camel@localhost> <16903.28384.621922.349@gargle.gargle.HOWL> <1f7befae05020812377c72de26@mail.gmail.com> Message-ID: David Ascher wrote: > FWIW, I agree. Personnally, I think that if Debian has a problem with > the above, it's their problem to deal with, not Python's. The OSI may also have a problem with the license if they were to be made aware of it. See section 8 of the Open Source Definition: """8. License Must Not Be Specific to a Product The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution. """ I'm not entirely sure if this affects the PSF's use of OSI's trademark. IANAL. TINLA. -- Robert Kern rkern@ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From greg at electricrain.com Sun Feb 13 02:35:35 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Sun Feb 13 02:35:39 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <013501c510ae$2abd7360$24ed0ccb@apana.org.au> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> Message-ID: <20050213013535.GF25441@zot.electricrain.com> I've created an OpenSSL version of the sha module. trivial to modify to be a md5 module. Its a first version with cleanup to be done and such. being managed in the SF patch manager: https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 enjoy. i'll do more cleanup and work on it soon. From martin at v.loewis.de Sun Feb 13 20:38:47 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Feb 13 20:38:50 2005 Subject: [Python-Dev] Re: Re: license issues with profiler.py and md5.h/md5c.c In-Reply-To: References: <420D5731.8020702@v.loewis.de> <20050212145326.GA7836@panix.com> Message-ID: <420FACC7.9020502@v.loewis.de> Terry Reedy wrote: > As I remember, my impression was based on the suggested procedure of first > copywrite one's work and then license it under one of two acceptible > "original licenses". This makes sense for a whole module, but hardly for > most patches, to the point of being nonsense for a patch of one word, as > some of mine have been (in text form, with the actual diff being prepared > by the committer). To my understanding, there is no way to "copyright one's work" - in the terminology of Larry Rosen (and I guess U.S. copyright law), "copyright subsists". I.e. 
the creator of some work has copyright, whether he wants it or not. Now, the question is, what precisely constitutes "work"? To my understanding, modifying an existing work creates derivative work; he who creates the derivative work first needs a license to do so, and then owns the title of the derivative work. There is, of course, the issue of trivial changes - "nobody could have it done differently". However, I understand that the bar for trivial changes is very, very low; I understand that even putting a comment into the change indicating what the change was already makes this original work. Nobody is obliged to phrase the comment in precisely the same way, so this specific wording of the comment is original work of the contributor, who needs to license the change to us. > So, if the lawyer thinks patches should also have a contrib agreement, then > I strongly recommend a separate blanket agreement that covers all patches > one ever contributes as one ongoing work. Our contributor's form is such a blanket agreement. You fill it out once, and then you indicate, in each patch, that this patch falls under the agreement you sent in earlier. > Even though I am not such, I would happily fill and fax a blanket patch > agreement were that deemed to be helpful. When we have sufficient coverage from committers, I will move on to people in Misc/ACKS. You can just go ahead and send in the form right away. Regards, Martin From abo at minkirri.apana.org.au Mon Feb 14 01:02:23 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Mon Feb 14 01:03:02 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <20050213013535.GF25441@zot.electricrain.com> References: <20050208195243.GD10650@zot.electricrain.com> <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050213013535.GF25441@zot.electricrain.com> Message-ID: <1108339344.3768.24.camel@schizo> On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote: > I've created an OpenSSL version of the sha module. trivial to modify > to be a md5 module. Its a first version with cleanup to be done and > such. being managed in the SF patch manager: > > https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 > > enjoy. i'll do more cleanup and work on it soon. Hmmm. I see the patch entry, but it seems to be missing the actual patch. Did you code this from scratch, or did you base it on the current md5module.c? Is it using the openssl sha interface, or the higher level EVP interface? The reason I ask is it would be pretty trivial to modify md5module.c to use the openssl API for any digest, and would be less risk than fresh-coding one. 
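Since this thread keeps coming back to "use OpenSSL's digest when it's there, fall back to bundled code otherwise", here is a rough Python-level sketch of that selection pattern. This is an editorial illustration, not anyone's actual patch: `_openssl_md5` is a made-up placeholder name for an OpenSSL-backed extension module (no module of that name exists), and the fallback branch simply uses today's stdlib `md5` module, so the caller-visible interface stays the same either way.

    # Sketch only: _openssl_md5 is a hypothetical stand-in for an
    # OpenSSL-backed extension; it does not exist.  The fallback is the
    # real stdlib md5 module, so new()/update()/hexdigest() work the same
    # whichever branch is taken.
    try:
        from _openssl_md5 import new as md5_new    # fast path, if available
    except ImportError:
        from md5 import new as md5_new             # bundled fallback

    h = md5_new()
    h.update("hello, ")
    h.update("world")
    print h.hexdigest()        # 32-character hex digest
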
-- Donovan Baarda http://minkirri.apana.org.au/~abo/ From abo at minkirri.apana.org.au Mon Feb 14 01:19:34 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Mon Feb 14 01:20:12 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050212210402.GE25441@zot.electricrain.com> References: <1108088147.3753.51.camel@schizo> <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> Message-ID: <1108340374.3768.33.camel@schizo> G'day, On Sat, 2005-02-12 at 13:04 -0800, Gregory P. Smith wrote: > On Sat, Feb 12, 2005 at 08:37:21AM -0500, A.M. Kuchling wrote: > > On Sat, Feb 12, 2005 at 01:54:27PM +1100, Donovan Baarda wrote: > > > Are there any potential problems with making the md5sum module availability > > > "optional" in the same way as this? > > > > The md5 module has been a standard module for a long time; making it > > optional in the next version of Python isn't possible. We'd have to > > require OpenSSL to compile Python. > > > > I'm happy to replace the MD5 and/or SHA implementations with other > > code, provided other code with a suitable license can be found. > > > > agreed. it can not be made optional. What I'd prefer (and will do if > i find the time) is to have the md5 and sha1 module use OpenSSLs > implementations when available. Falling back to their built in ones > when openssl isn't present. That way its always there but uses the > much faster optimized openssl algorithms when they exist. So we need a fallback md5 implementation for when openssl is not available. The RSA implementation is not usable because it has an unsuitable license. Looking at this licence again, I'm not sure what the problem is. It allows you to freely modify, distribute, etc, with the only limit you must retain the RSA licence blurb. The libmd implementation cannot be used because the author tried to give it away unconditionally, and the lawyers say you can't. (dumb! dumb! dumb! someone needs to figure out a way to systematically get around this kind of stupidity, perhaps have someone in a less legally stupid country claim and re-license free code). The libmd5-rfc sourceforge project implementation looks OK. It needs to be modified to have an API identical to openssl (rename structures/functions). Then setup.py needs to be modified to use openssl if available, or fallback to the provided libmd5-rfc implementation. The SHA module is a bit different... it includes a built in SHA implementation. It might pay to strip out the implementation and give it an openssl-like API, then make shamodule.c a use it, or openssl if available. Greg Smith might have already done much of this... -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From greg at electricrain.com Mon Feb 14 01:21:54 2005 From: greg at electricrain.com (Gregory P. 
Smith) Date: Mon Feb 14 01:21:59 2005 Subject: [Python-Dev] Re: OpenSSL sha module / license issues with md5.h/md5c.c In-Reply-To: <1108339344.3768.24.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050213013535.GF25441@zot.electricrain.com> <1108339344.3768.24.camel@schizo> Message-ID: <20050214002154.GI25441@zot.electricrain.com> On Mon, Feb 14, 2005 at 11:02:23AM +1100, Donovan Baarda wrote: > On Sat, 2005-02-12 at 17:35 -0800, Gregory P. Smith wrote: > > I've created an OpenSSL version of the sha module. trivial to modify > > to be a md5 module. Its a first version with cleanup to be done and > > such. being managed in the SF patch manager: > > > > https://sourceforge.net/tracker/?func=detail&aid=1121611&group_id=5470&atid=305470 > > > > enjoy. i'll do more cleanup and work on it soon. > > Hmmm. I see the patch entry, but it seems to be missing the actual > patch. > > Did you code this from scratch, or did you base it on the current > md5module.c? Is it using the openssl sha interface, or the higher level > EVP interface? > > The reason I ask is it would be pretty trivial to modify md5module.c to > use the openssl API for any digest, and would be less risk than > fresh-coding one. Ugh. Sourceforge ignored it on the patch submission. i've attached it properly now. This initial version is derived from shamodule.c which does not have any license issues. it is currently only meant as an example of how easy it is to use the openssl hashing interface. I'm taking it an turning it into a generic openssl hash wrapper that'll do md5 sha1 and anything else. -g From ncoghlan at iinet.net.au Mon Feb 14 03:26:44 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Mon Feb 14 03:27:57 2005 Subject: [Python-Dev] A hybrid C & Python implementation for itertools Message-ID: <42100C64.5090001@iinet.net.au> I can't really imagine Raymond liking this idea, and I have a feeling the idea has been shot down before. However, I can't persuade Google to tell me anything about such an occasion, so here goes anyway. . . The utilities in the itertools module can easily be composed to provide additional useful functionality (e.g. the itertools recipes given in the documentation [1]). However, having to recode these every time you need them, or arranging access to a utility module can be a pain for application programming in some corporate environments [2]. The lack of builtin support also leads to many variations on a theme, only some of which actually work properly, or which work, but in subtly different ways [3]. On the other hand, it really isn't worth the effort to code these algorithms in C for the current itertools module. If itertools was a hybrid module, the handy 3-4 liners could go in the Python section, with the heavy lifting done by the underlying C module. The Python equivalents to the current C code could also be placed in the hybrid module (as happens with some of the other hybrid modules in the library). An alternative approach is based on an idea from Alex Martelli [4]. As Alex points out, itertools is currently more about *creating* iterators than it is about consuming them (the only function desription that doesn't start with 'Make an iterator' is itertools.tee and that starts with 'Return n independent iterators'). 
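For concreteness, here are a few of the 3-4 liners being discussed, lightly adapted from the recipes section of the itertools documentation [1]: one producer (pairwise) and two consumers (take, quantify).

    from itertools import islice, imap, izip, tee

    def take(n, seq):
        "Return the first n items of seq as a list."
        return list(islice(seq, n))

    def quantify(seq, pred=bool):
        "Count how many times the predicate is true in the sequence."
        return sum(imap(pred, seq))

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2,s3), ..."
        a, b = tee(iterable)
        try:
            b.next()       # advance the second iterator by one element
        except StopIteration:
            pass
        return izip(a, b)

    print take(3, "abcdef")              # ['a', 'b', 'c']
    print quantify([0, 1, 2, 0, 3])      # 3 true values
    print list(pairwise([1, 2, 3, 4]))   # [(1, 2), (2, 3), (3, 4)]
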
Alex's idea would involve adding a module with a new name that is focused on *consuming* iterators (IOW, extending the available standard accumulators beyond the existing min(), max() and sum() without further populating the builtins). The downside of the latter proposal is that the recipes in the itertools documentation relate both to producing *and* consuming iterators, so a new module would leave the question of where to put the handy iterator producers. Regards, Nick. [1] http://www.python.org/dev/doc/devel/lib/itertools-recipes.html [2] http://mail.python.org/pipermail/python-list/2005-February/266310.html [3] http://mail.python.org/pipermail/python-list/2005-February/266311.html [4] http://groups-beta.google.com/group/comp.lang.python/msg/a76b4c2caf6c435c -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From python at rcn.com Mon Feb 14 05:07:10 2005 From: python at rcn.com (Raymond Hettinger) Date: Mon Feb 14 05:11:03 2005 Subject: [Python-Dev] A hybrid C & Python implementation for itertools References: <42100C64.5090001@iinet.net.au> Message-ID: <006e01c5124a$a81ed540$5e2dc797@oemcomputer> [Nick Coghlan] > If itertools was a hybrid module, the handy 3-4 liners could go in the Python > section, with the heavy lifting done by the underlying C module. The Python > equivalents to the current C code could also be placed in the hybrid module (as > happens with some of the other hybrid modules in the library). Both of those ideas likely reflect the future direction of itertools. FWIW, the historical reasons for keeping the derived tools in the docs were: * Not casting them in stone too early so they could be updated and refined at will. * They had more value as a teaching tool (showing how basic tools could be combined) than as stand-alone tools. * Adding more tools makes the whole toolset harder to use. * When an itertool solution is not immediately obvious, then a generator solution is likely to be easier to write and more understandable. Your two alternate partitioning recipes provide an excellent case in point. * Several of the derived tools do not arise often in practice. For example, I've never used tabulate(), nth(), pairwise(), or repeatfunc(). > Alex's idea would involve adding a module with a new name that is > focused on *consuming* iterators (IOW, extending the available standard > accumulators beyond the existing min(), max() and sum() without further > populating the builtins). That would be nice. From the existing itertool recipes, good candidates would include take(), all(), any(), no(), and quantify(). Raymond From just at letterror.com Mon Feb 14 10:23:03 2005 From: just at letterror.com (Just van Rossum) Date: Mon Feb 14 10:23:06 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libimp.tex, 1.36, 1.36.2.1 libsite.tex, 1.26, 1.26.4.1 libtempfile.tex, 1.22, 1.22.4.1 libos.tex, 1.146.2.1, 1.146.2.2 In-Reply-To: Message-ID: bcannon@users.sourceforge.net wrote: > \begin{datadesc}{PY_RESOURCE} > -The module was found as a Macintosh resource. This value can only be > -returned on a Macintosh. > +The module was found as a Mac OS 9 resource. This value can only be > +returned on a Mac OS 9 or earlier Macintosh. > \end{datadesc} not entirely true: it's limited to the sa called "OS9" version of MacPython, which happily runs natively on OSX as a Carbon app... 
Just From troels at thule.no Mon Feb 14 15:03:22 2005 From: troels at thule.no (Troels Walsted Hansen) Date: Mon Feb 14 15:03:28 2005 Subject: [Python-Dev] builtin_id() returns negative numbers Message-ID: <4210AFAA.9060108@thule.no> Hi all, The Python binding in libxml2 uses the following code for __repr__(): class xmlNode(xmlCore): def __init__(self, _obj=None): self._o = None xmlCore.__init__(self, _obj=_obj) def __repr__(self): return "" % (self.name, id (self)) With Python 2.3.4 I'm seeing warnings like the one below: :2357: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up I believe this is caused by the memory address having the sign bit set, causing builtin_id() to return a negative integer. I grepped around in the Python standard library and found a rather awkward work-around that seems to be slowly propagating to various module using the "'%x' % id(self)" idiom: Lib/asyncore.py: # On some systems (RH10) id() can be a negative number. # work around this. MAX = 2L*sys.maxint+1 return '<%s at %#x>' % (' '.join(status), id(self)&MAX) $ grep -r 'can be a negative number' * Lib/asyncore.py: # On some systems (RH10) id() can be a negative number. Lib/repr.py: # On some systems (RH10) id() can be a negative number. Lib/tarfile.py: # On some systems (RH10) id() can be a negative number. Lib/test/test_repr.py: # On some systems (RH10) id() can be a negative number. Lib/xml/dom/minidom.py: # On some systems (RH10) id() can be a negative number. There are many modules that do not have this work-around in Python 2.3.4. Wouldn't it be more elegant to make builtin_id() return an unsigned long integer? Is the performance impact too great? A long integer is used on platforms where SIZEOF_VOID_P > SIZEOF_LONG (most 64 bit platforms?), so all Python code must be prepared to handle it already... Troels From tim.peters at gmail.com Mon Feb 14 16:41:35 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 16:41:37 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <4210AFAA.9060108@thule.no> References: <4210AFAA.9060108@thule.no> Message-ID: <1f7befae050214074122b715a@mail.gmail.com> [Troels Walsted Hansen] > The Python binding in libxml2 uses the following code for __repr__(): > > class xmlNode(xmlCore): > def __init__(self, _obj=None): > self._o = None > xmlCore.__init__(self, _obj=_obj) > > def __repr__(self): > return "" % (self.name, id (self)) > > With Python 2.3.4 I'm seeing warnings like the one below: > :2357: FutureWarning: %u/%o/%x/%X of negative int > will return a signed string in Python 2.4 and up > > I believe this is caused by the memory address having the sign bit set, > causing builtin_id() to return a negative integer. Yes, that's right. > I grepped around in the Python standard library and found a rather > awkward work-around that seems to be slowly propagating to various > module using the "'%x' % id(self)" idiom: No, it's not propagating any more: I see that none of these exist in 2.4: > Lib/asyncore.py: > # On some systems (RH10) id() can be a negative number. > # work around this. > MAX = 2L*sys.maxint+1 > return '<%s at %#x>' % (' '.join(status), id(self)&MAX) > > $ grep -r 'can be a negative number' * > Lib/asyncore.py: # On some systems (RH10) id() can be a negative > number. > Lib/repr.py: # On some systems (RH10) id() can be a negative > number. > Lib/tarfile.py: # On some systems (RH10) id() can be a negative > number. 
> Lib/test/test_repr.py: # On some systems (RH10) id() can be a > negative number. > Lib/xml/dom/minidom.py: # On some systems (RH10) id() can be a > negative number. > > There are many modules that do not have this work-around in Python 2.3.4. Not sure, but it looks like this stuff was ripped out in 2.4 simply because 2.4 no longer produces a FutureWarning in these cases. That doesn't address that the output changed, or that the output for a negative id() produced by %x under 2.4 is probably surprising to most. > Wouldn't it be more elegant to make builtin_id() return an unsigned > long integer? I think so. This is the function ZODB 3.3 uses, BTW: # Addresses can "look negative" on some boxes, some of the time. If you # feed a "negative address" to an %x format, Python 2.3 displays it as # unsigned, but produces a FutureWarning, because Python 2.4 will display # it as signed. So when you want to prodce an address, use positive_id() to # obtain it. def positive_id(obj): """Return id(obj) as a non-negative integer.""" result = id(obj) if result < 0: # This is a puzzle: there's no way to know the natural width of # addresses on this box (in particular, there's no necessary # relation to sys.maxint). Try 32 bits first (and on a 32-bit # box, adding 2**32 gives a positive number with the same hex # representation as the original result). result += 1L << 32 if result < 0: # Undo that, and try 64 bits. result -= 1L << 32 result += 1L << 64 assert result >= 0 # else addresses are fatter than 64 bits return result The gives a non-negative result regardless of Python version and (almost) regardless of platform (the `assert` hasn't triggered on any ZODB 3.3 platform yet). > Is the performance impact too great? For some app, somewhere, maybe. It's a tradeoff. The very widespread practice of embedding %x output from id() favors getting rid of the sign issue, IMO. > A long integer is used on platforms where SIZEOF_VOID_P > SIZEOF_LONG > (most 64 bit platforms?), Win64 is probably the only major (meaning likely to be popular among Python users) platform where sizeof(void*) > sizeof(long). > so all Python code must be prepared to handle it already... In theory . From foom at fuhm.net Mon Feb 14 17:33:13 2005 From: foom at fuhm.net (James Y Knight) Date: Mon Feb 14 17:33:25 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050214074122b715a@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> Message-ID: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> On Feb 14, 2005, at 10:41 AM, Tim Peters wrote: >> Wouldn't it be more elegant to make builtin_id() return an unsigned >> long integer? > > I think so. This is the function ZODB 3.3 uses, BTW: > > def positive_id(obj): > """Return id(obj) as a non-negative integer.""" > [...] I think it'd be nice to change it, too. Twisted also uses a similar function. However, last time this topic came up, this Tim Peters guy argued against it. ;) Quoting http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > Python doesn't promise to return a postive integer for id(), although > it may have been nicer if it did. It's dangerous to change that now, > because some code does depend on the "32 bit-ness as a signed integer" > accident of CPython's id() implementation on 32-bit machines. For > example, code using struct.pack(), or code using one of ZODB's > specialized int-key BTree types with id's as keys. 
James From tim.peters at gmail.com Mon Feb 14 18:30:46 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 18:30:49 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <1f7befae05021409307ab36a15@mail.gmail.com> [James Y Knight] > I think it'd be nice to change it, too. Twisted also uses a similar > function. > > However, last time this topic came up, this Tim Peters guy argued > against it. ;) > > Quoting > http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > >> Python doesn't promise to return a postive integer for id(), although >> it may have been nicer if it did. It's dangerous to change that now, >> because some code does depend on the "32 bit-ness as a signed integer" >> accident of CPython's id() implementation on 32-bit machines. For >> example, code using struct.pack(), or code using one of ZODB's >> specialized int-key BTree types with id's as keys. Yup, it's still a tradeoff, and it's still dangerous (as any change in visible behavior is). It's especially unfortunate that since "%x" % id(obj) does produce different output in 2.4 than in 2.3 when id(obj) < 0, we would change that output _again_ in 2.5 if id(obj) grew a new non-negative promise. That is, the best time to do this would have been for 2.4. Maybe it's just a wart we have to live with now; OTOH, the docs explicitly warn that id() may return a long, so any code relying on "short int"-ness has always been relying on an implementation quirk. From jcarlson at uci.edu Mon Feb 14 18:29:57 2005 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon Feb 14 18:32:21 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> References: <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <20050214092543.36F0.JCARLSON@uci.edu> James Y Knight wrote: > > > On Feb 14, 2005, at 10:41 AM, Tim Peters wrote: > > >> Wouldn't it be more elegant to make builtin_id() return an unsigned > >> long integer? > > > > I think so. This is the function ZODB 3.3 uses, BTW: > > > > def positive_id(obj): > > """Return id(obj) as a non-negative integer.""" > > [...] > > I think it'd be nice to change it, too. Twisted also uses a similar > function. > > However, last time this topic came up, this Tim Peters guy argued > against it. ;) > > Quoting > http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > > > Python doesn't promise to return a postive integer for id(), although > > it may have been nicer if it did. It's dangerous to change that now, > > because some code does depend on the "32 bit-ness as a signed integer" > > accident of CPython's id() implementation on 32-bit machines. For > > example, code using struct.pack(), or code using one of ZODB's > > specialized int-key BTree types with id's as keys. All Tim was saying is that you can't /change/ builtin_id() because of backwards compatibiliity with Zope and struct.pack(). You are free to create a positive_id() function, and request its inclusion into builtins (low probability; people don't like doing that). Heck, you are even free to drop it in your local site.py implementation. But changing the current function is probably a no-no. 
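As a concrete illustration of the "keep it local" option described above: a small helper along the following lines sidesteps the signedness question entirely. The name hex_id is mine, not anything in the stdlib; the mask idiom is the same one asyncore and friends already use.

    # Minimal local helper, in the spirit of "drop it in your own utility
    # module" -- not a proposed builtin.
    import sys

    _ADDR_MASK = 2L * sys.maxint + 1      # all-ones bit pattern for a C long

    def hex_id(obj):
        """Return id(obj) formatted as an unsigned hex address."""
        return '%#x' % (id(obj) & _ADDR_MASK)

    # e.g. in a __repr__:
    #     return '<%s at %s>' % (self.__class__.__name__, hex_id(self))
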
- Josiah From tim.peters at gmail.com Mon Feb 14 20:13:57 2005 From: tim.peters at gmail.com (Tim Peters) Date: Mon Feb 14 20:14:01 2005 Subject: [Python-Dev] Re: [Zope] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <1f7befae050214111319abbda@mail.gmail.com> [Gfeller Martin] > I'm running a large Zope application on a 1x1GHz CPU 1GB mem > Window XP Prof machine using Zope 2.7.3 and Py 2.3.4 > The application typically builds large lists by appending > and extending them. That's historically been an especially bad case for Windows systems, although the behavior varied across specific Windows flavors. Python has changed lots of things over time to improve it, including yet another twist on list-reallocation strategy new in Python 2.4. > We regularly observed that using a given functionality a > second time using the same process was much slower (50%) > than when it ran the first time after startup. Heh. On Win98SE, the _first_ time you ran pystone after rebooting the machine, it ran twice as fast as the second (or third, fourth, ...) time you tried it. The only way I ever found to get back the original speed without a reboot was to run a different process in-between that allocated almost all physical memory in one giant chunk. Presumably that convinced Win98SE to throw away its fragmented heap and start over again. > This behavior greatly improved with Python 2.3 (thanks > to the improved Python object allocator, I presume). The page you reference later describes a scheme that's (at least superficially) a lot like pymalloc uses for "small objects". In effect, pymalloc takes over buckets 1-32 in the table. > Nevertheless, I tried to convert the heap used by Python > to a Windows Low Fragmentation Heap (available on XP > and 2003 Server). This improved the overall run time > of a typical CPU-intensive report by about 15% > (overall run time is in the 5 minutes range), with the > same memory consumption. > > I consider 15% significant enough to let you know about it. Yes, and thank you. FYI, Python doesn't call any of the Win32 heap functions directly; the behavior it sees is inherited from whatever Microsoft's C implementation uses to support C's malloc()/realloc()/free(). pymalloc requests 256KB at a time from the platform malloc, and carves it up itself, so pymalloc isn't affected by LFH (LFH punts on requests over 16KB, much as pymalloc punts on requests over 256 bytes). But "large objects" (including list guts) don't go thru pymalloc to begin with, so as long as your list guts fit in 16KB, LFH could make a real difference to how they behave. Well, actually, it's probably more the case that LFH gives a boost by keeping small objects _out_ of the general heap. Then growing a giant list doesn't bump into gazillions of small objects. > For information about the Low Fragmentation Heap, see > > > Best regards, > Martin > > PS: Since I don't speak C, I used ctypes to convert all > heaps in the process to LFH (I don't know how to determine > which one is the C heap). It's the one consuming all the time . 
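Martin's full message appears later in this digest; as a rough idea of the ctypes approach he describes (converting every heap in the process to the Low Fragmentation Heap), something along these lines should work on XP/2003 with the third-party ctypes package of that era (ctypes is not in the 2.3 standard library). The constant 2 is the documented HeapCompatibilityInformation value that selects the LFH; treat this as an untested illustration, not a recipe:

# Hypothetical sketch (Windows XP / Server 2003 only): switch every heap in the
# current process to the Low Fragmentation Heap via the Win32 API.
import ctypes

kernel32 = ctypes.windll.kernel32
HeapCompatibilityInformation = 0      # information class for HeapSetInformation
HEAP_LFH = ctypes.c_ulong(2)          # 2 selects the Low Fragmentation Heap

def enable_lfh_everywhere():
    count = kernel32.GetProcessHeaps(0, None)   # ask how many heaps exist
    heaps = (ctypes.c_void_p * count)()
    kernel32.GetProcessHeaps(count, heaps)      # fetch their handles
    converted = 0
    for handle in heaps:
        ok = kernel32.HeapSetInformation(handle, HeapCompatibilityInformation,
                                         ctypes.byref(HEAP_LFH),
                                         ctypes.sizeof(HEAP_LFH))
        converted += bool(ok)
    return converted

HeapSetInformation can legitimately fail on heaps that cannot be converted (for example when a debug heap is active), which is why the sketch just counts successes rather than asserting on each call.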
From tismer at stackless.com Tue Feb 15 01:38:43 2005 From: tismer at stackless.com (Christian Tismer) Date: Tue Feb 15 01:38:36 2005 Subject: [Python-Dev] Ann: PyPy Sprint before PYCON 2005 in Washington Message-ID: <42114493.8050006@stackless.com> PyPy Sprint before PYCON 2005 in Washington ------------------------------------------- In the four days from 19th March till 22th March (inclusive) the PyPy team will host a sprint on their new Python-in-Python implementation. The PyPy project was granted funding by the European Union as part of its Sixth Framework Program, and is now on track to produce a stackless Python-in-Python Just-in-Time Compiler by December 2006. Our Python implementation, released under the MIT/BSD license, already provides new levels of flexibility and extensibility at the core interpreter and object implementation level. Armin Rigo and Holger Krekel will also give talks about PyPy and the separate py.test tool (used to perform various kinds of testing in PyPy) during the conference. Naturally, we are eager to see how the other re-implementation of Python, namely IronPython, is doing and to explore collaboration possibilities. Of course, that will depend on the degree of openness that Microsoft wants to employ. The Pycon2005 sprint is going to focus on reaching compatibility with CPython (currently we target version 2.3.4) for our PyPy version running on top of CPython. One goal of the sprint is to pass 60% or more of the unmodified regression tests of mainline CPython. It will thus be a great way to get to know CPython and PyPy better at the same time! Other possible work areas include: - translation to C to get a first working lower-level representation of the interpreter "specified in Python" - integrating and implementing a full parser/compiler chain written in Python maybe already targetting the new AST-branch of mainline CPython - fixing various remaining issues that will come up while trying to reach the compatibility goal - integrate or code pure python implementations of some Python modules currently written in C. - whatever issues you come up with! (please tell us before hand so we can better plan introductions etc.pp.) Besides core developers, Bea D?ring will be present to help improving and document our sprint and agile development process. We are going to give tutorials about PyPy's basic concepts and provide help to newcomers usually by pairing them with experienced pypythonistas. However, we kindly ask newcomers to be present on the first day's morning (19th of March) of the sprint to be able to get everyone a smooth start into the sprint. So far most newcomers had few problems in getting a good start into our codebase. However, it is good to have the following preparational points in mind: - some experience with programming in the Python language and interest to dive deeper - subscription to pypy-dev and pypy-sprint at http://codespeak.net/pypy/index.cgi?lists - have a subversion-client, Pygame and graphviz installed on the machine you bring to the sprint. - have a look at our current documentation, especially the architecture and getting-started documents under http://codespeak.net/pypy/index.cgi?doc The pypy-dev and pypy-sprint lists are also the contact points for raising questions and suggesting and discussing sprint topics beforehand. We are on #pypy on irc.freenode.net most of the time. Please don't hesitate to contact us or introduce yourself and your interests! 
Logistics --------- Organizational details will be posted to pypy-sprint and are or will be available in the Pycon2005-Sprint wiki here: http://www.python.org/moin/PyConDC2005/Sprints Registration ------------ send mail to pypy-sprint@codespeak.net, stating the days you can be present and any specific interests if applicable. Registered Participants ----------------------- all days: Jacob Hall?n Armin Rigo Holger Krekel Samuele Pedroni Anders Chrigstr?m Bea D?ring Christian Tismer Richard Emslie -- Christian Tismer :^) tismerysoft GmbH : Have a break! Take a ride on Python's Johannes-Niemeyer-Weg 9A : *Starship* http://starship.python.net/ 14109 Berlin : PGP key -> http://wwwkeys.pgp.net/ work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From ncoghlan at iinet.net.au Tue Feb 15 10:43:30 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Tue Feb 15 10:43:33 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <20050214092543.36F0.JCARLSON@uci.edu> References: <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> <20050214092543.36F0.JCARLSON@uci.edu> Message-ID: <4211C442.3010001@iinet.net.au> Josiah Carlson wrote: >>Quoting >>http://mail.python.org/pipermail/python-dev/2004-November/050049.html: >> >> >>>Python doesn't promise to return a postive integer for id(), although >>>it may have been nicer if it did. It's dangerous to change that now, >>>because some code does depend on the "32 bit-ness as a signed integer" >>>accident of CPython's id() implementation on 32-bit machines. For >>>example, code using struct.pack(), or code using one of ZODB's >>>specialized int-key BTree types with id's as keys. > > > All Tim was saying is that you can't /change/ builtin_id() because of > backwards compatibiliity with Zope and struct.pack(). You are free to > create a positive_id() function, and request its inclusion into builtins > (low probability; people don't like doing that). Heck, you are even free > to drop it in your local site.py implementation. But changing the > current function is probably a no-no. There's always the traditional response to "want to fix it but can't due to backwards compatibility": a keyword argument that defaults to False. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From fredrik at pythonware.com Tue Feb 15 10:56:58 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 15 10:57:26 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: James Y Knight wrote: > However, last time this topic came up, this Tim Peters guy argued against it. ;) > > Quoting http://mail.python.org/pipermail/python-dev/2004-November/050049.html: > >> Python doesn't promise to return a postive integer for id(), although >> it may have been nicer if it did. It's dangerous to change that now, >> because some code does depend on the "32 bit-ness as a signed integer" >> accident of CPython's id() implementation on 32-bit machines. For >> example, code using struct.pack(), or code using one of ZODB's >> specialized int-key BTree types with id's as keys. 
can anyone explain the struct.pack and ZODB use cases? the first one doesn't make sense to me, and the other relies on Python *not* behaving as documented (which is worse than relying on undocumented behaviour, imo). From fredrik at pythonware.com Tue Feb 15 13:47:35 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue Feb 15 13:47:51 2005 Subject: [Python-Dev] pymalloc on 2.1.3 Message-ID: does anyone remember if there were any big changes in pymalloc between the 2.1 series (where it was introduced) and 2.3 (where it was enabled by default). or in other words, is the 2.1.3 pymalloc stable enough for production use? (we're having serious memory fragmentation problems on a 2.1.3 system, and while I can patch/rebuild the interpreter if necessary, we cannot update the system right now...) From mwh at python.net Tue Feb 15 13:58:19 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 13:58:22 2005 Subject: [Python-Dev] pymalloc on 2.1.3 In-Reply-To: (Fredrik Lundh's message of "Tue, 15 Feb 2005 13:47:35 +0100") References: Message-ID: <2mmzu60yl0.fsf@starship.python.net> "Fredrik Lundh" writes: > does anyone remember if there were any big changes in pymalloc between > the 2.1 series (where it was introduced) and 2.3 (where it was enabled by > default). Yes. (Was it really 2.1? Time flies!) > or in other words, is the 2.1.3 pymalloc stable enough for production use? Well, Tim posted ways of making it crash, but I don't know how likely they are to occur in non-malicious code. "cvs log Objects/obmalloc.c" might enlighten, or at least give an idea which months of the python-dev archive to search. Cheers, mwh -- this "I hate c++" is so old it's as old as C++, yes -- from Twisted.Quotes From tim.peters at gmail.com Tue Feb 15 15:50:02 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 15:50:05 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers In-Reply-To: References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <1F0A5980-7EA6-11D9-9DB9-000A95A50FB2@fuhm.net> Message-ID: <1f7befae05021506507964d814@mail.gmail.com> [Fredrik Lundh] > can anyone explain the struct.pack and ZODB use cases? the first one > doesn't make sense to me, Not deep and surely not common, just possible. If you're on a 32-bit box and doing struct.pack("...i...", ... id(obj) ...), it in fact cannot fail now (no, that isn't guaranteed by the docs, it's just an implementation reality), but would fail if id() ever returned a positive long with the same bit pattern as a negative 32-bit int ("OverflowError: long int too large to convert to int").. > and the other relies on Python *not* behaving as documented (which is worse > than relying on undocumented behaviour, imo). I don't know what you think the problem with ZODB's integer-flavored keys might be, then. The problem I'm thinking of is that by "integer-flavored" they really mean *C* int, not Python integer (which is C long). They're delicate enough that way that they already don't work right on most current 64-bit boxes whenever the value of a Python int doesn't in fact fit in the platform's C int: http://collector.zope.org/Zope/1592 If id() returned a long in some cases on 32-bit boxes, then code using id() as key (in an II or IO tree) or value (in an II or OI) tree would stop working. 
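To make the struct.pack case concrete, an editor's illustration (it assumes a 32-bit box and Python 2.3/2.4 semantics; the address value is made up):

# Illustration of the struct.pack("i", id(obj)) failure mode described above.
import struct

addr = 0xBFFFF000L                                # bit pattern of a high address
print repr(struct.pack("i", addr - (1L << 32)))   # the negative int id() returns today: packs fine
try:
    struct.pack("i", addr)                        # a positive long with the same bits
except OverflowError, e:
    print e                                       # long int too large to convert to int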
Again, the Python docs didn't guarantee this would work, and the int-flavored BTrees have 64-bit box bugs in their handling of integers, but the id()-as-key-or-value case has nevertheless worked wholly reliably until now on 32-bit boxes. Any change in visible behavior has the potential to break code -- that shouldn't be controversial, because it's so obvious, and so relentlessly proved in real life. It's a tradeoff. I've said I'm in favor of taking away the sign issue for id() in this case, although I'm not going to claim that no code will break as a result, and I'd be a lot more positive about it if we could use the time machine to change this behavior for 2.4. From tim.peters at gmail.com Tue Feb 15 16:19:01 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 16:19:04 2005 Subject: [Python-Dev] pymalloc on 2.1.3 In-Reply-To: References: Message-ID: <1f7befae0502150719a24607d@mail.gmail.com> [Fredrik Lundh] > does anyone remember if there were any big changes in pymalloc between > the 2.1 series (where it was introduced) and 2.3 (where it was enabled by > default). Yes, huge -- few original lines survived exactly, although many survived in intent. > or in other words, is the 2.1.3 pymalloc stable enough for production use? Different question entirely . It _was_ used in production by some people, and happily so. Major differences: + 2.1 used a probabilistic scheme for guessing whether addresses passed to it were obtained from pymalloc or from the system malloc. It was easy for a malicous pure-Python program to corrupt pymalloc and/or malloc internals as a result, leading to things like segfaults, and even sneaky ways to mutate the Python bytecode stream. It's extremely unlikely that a non- malicious program could bump into these. + Horrid hackery went into 2.3's version to cater to broken extension modules that called PyMem functions without holding the GIL. 2.1's may not be as thread-safe in these cases. + 2.1's only fields requests up to 64 bytes, 2.3's up to 256 bytes. Changes in the dict implementation, and new-style classes, for 2.3 made it a pragmatic necessity to boost the limit for 2.3. > (we're having serious memory fragmentation problems on a 2.1.3 system, > and while I can patch/rebuild the interpreter if necessary, we cannot update > the system right now...) I'd give it a shot -- pymalloc has always been very effective at handling large numbers of small objects gracefully. The meaning of "small" got 4x bigger since 2.1, which appeared to be a pure win, but 64 bytes was enough under 2.1 that most small instance dicts fit. From mwh at python.net Tue Feb 15 16:49:49 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 16:49:51 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: <2mu0pebo6u.fsf@starship.python.net> (Michael Hudson's message of "Tue, 18 Jan 2005 18:13:29 +0000") References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> Message-ID: <2mfyzx257m.fsf@starship.python.net> Michael Hudson writes: > Michael Hudson writes: > >> I hope to have a new patch (which makes PyExc_Exception new-style, but >> allows arbitrary old-style classes as exceptions) "soon". It may even >> pass bits of "make test" :) > > Done: http://www.python.org/sf/1104669 Now I think it's really done, apart from documentation. 
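In rough terms, and looking ahead to the design described just below, the intended behaviour of patch python.org/sf/1104669 can be sketched like this (an editor's illustration, not code from the patch; it also runs on a stock 2.4, where Exception is still a classic class):

# Sketch of the raise rules described below: Exception becomes new-style,
# classic classes stay raisable, and string exceptions are deprecated.
class NewStyleError(Exception):   # new-style once Exception itself is new-style
    pass

class ClassicError:               # classic class, still acceptable to raise
    pass

for exc in (NewStyleError("boom"), ClassicError()):
    try:
        raise exc
    except:                       # bare except catches both flavours
        pass

# raise "spam"  would still work, but draws the DeprecationWarning mentioned below.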
My design decision was to make Exception new-style. Things can be raised if they are instances of old-style classes or instances of Exception. If this meets with general agreement, I'd like to check the above patch in. It will break some highly introspective code, so it's IMO best to get it in early in the 2.5 cycle. The other option is to keep Exception old-style but allow new-style subclasses, but I think all this will do is break the above mentioned introspective code in a quieter way... The patch also updates the PendingDeprecationWarning on raising a string exception to a full DeprecationWarning (something that should be done anyway). Cheers, mwh -- python py.py ~/Source/python/dist/src/Lib/test/pystone.py Pystone(1.1) time for 5000 passes = 19129.1 This machine benchmarks at 0.261381 pystones/second From gvanrossum at gmail.com Tue Feb 15 19:55:53 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 15 19:56:12 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: <2mfyzx257m.fsf@starship.python.net> References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> <2mfyzx257m.fsf@starship.python.net> Message-ID: > My design decision was to make Exception new-style. Things can be > raised if they are instances of old-style classes or instances of > Exception. If this meets with general agreement, I'd like to check > the above patch in. I like it, but didn't you forget to mention that strings can still be raised? I think we can't break that (but we can insert a deprecation warning for this in 2.5 so we can hopefully deprecate it in 2.6, or 2.7 at the latest). > The patch also updates the PendingDeprecationWarning on raising a > string exception to a full DeprecationWarning (something that should > be done anyway). What I said. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh at python.net Tue Feb 15 20:27:23 2005 From: mwh at python.net (Michael Hudson) Date: Tue Feb 15 20:27:25 2005 Subject: [Python-Dev] Exceptions *must*? be old-style classes? In-Reply-To: (Guido van Rossum's message of "Tue, 15 Feb 2005 10:55:53 -0800") References: <20050117105219.GA12763@vicky.ecs.soton.ac.uk> <2mbrboca5r.fsf@starship.python.net> <5.1.1.6.0.20050117113419.03972d20@mail.telecommunity.com> <41EC38DE.8080603@v.loewis.de> <2my8eqbrk2.fsf@starship.python.net> <2mu0pebo6u.fsf@starship.python.net> <2mfyzx257m.fsf@starship.python.net> Message-ID: <2mbral1v50.fsf@starship.python.net> Guido van Rossum writes: >> My design decision was to make Exception new-style. Things can be >> raised if they are instances of old-style classes or instances of >> Exception. If this meets with general agreement, I'd like to check >> the above patch in. > > I like it, but didn't you forget to mention that strings can still be > raised? I think we can't break that (but we can insert a deprecation > warning for this in 2.5 so we can hopefully deprecate it in 2.6, or > 2.7 at the latest). I try to forget that as much as possible :) >> The patch also updates the PendingDeprecationWarning on raising a >> string exception to a full DeprecationWarning (something that should >> be done anyway). > > What I said. :-) :) I'll try to bash the documentation into shape next. Cheers, mwh -- please realize that the Common Lisp community is more than 40 years old. 
collectively, the community has already been where every clueless newbie will be going for the next three years. so relax, please. -- Erik Naggum, comp.lang.lisp From ejones at uwaterloo.ca Tue Feb 15 22:39:38 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Tue Feb 15 22:39:32 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? Message-ID: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> After I finally understood what thread-safety guarantees the Python memory allocator needs to provide, I went and did some hard thinking about the code this afternoon. I believe that my modifications provide the same guarantees that the original version did. I do need to declare the arenas array to be volatile, and leak the array when resizing it. Please correct me if I am wrong, but the situation that needs to be supported is this: While one thread holds the GIL, any other thread can call PyObject_Free with a pointer that was returned by the system malloc. The following situation is *not* supported: While one thread holds the GIL, another thread calls PyObject_Free with a pointer that was returned by PyObject_Malloc. I'm hoping that I got things a little better this time around. I've submitted my updated patch to the patch tracker. For reference, I've included links to SourceForge and the previous thread. Thank you, Evan Jones Previous thread: http://mail.python.org/pipermail/python-dev/2005-January/051255.html Patch location: http://sourceforge.net/tracker/index.php? func=detail&aid=1123430&group_id=5470&atid=305470 From tim.peters at gmail.com Tue Feb 15 23:52:02 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Feb 15 23:52:05 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> Message-ID: <1f7befae05021514524d0a35ec@mail.gmail.com> [Evan Jones] > After I finally understood what thread-safety guarantees the Python > memory allocator needs to provide, I went and did some hard thinking > about the code this afternoon. I believe that my modifications provide > the same guarantees that the original version did. I do need to declare > the arenas array to be volatile, and leak the array when resizing it. > Please correct me if I am wrong, but the situation that needs to be > supported is this: As I said before, I don't think we need to support this any more. More, I think we should not -- the support code is excruciatingly subtle, it wasted plenty of your time trying to keep it working, and if we keep it in it's going to continue to waste time over the coming years (for example, in the short term, it will waste my time reviewing it). > While one thread holds the GIL, any other thread can call PyObject_Free > with a pointer that was returned by the system malloc. What _was_ supported was more generally that any number of threads could call PyObject_Free with pointers that were returned by the system malloc/realloc at the same time as a single thread, holding the GIL, was doing anything whatsoever (including executing any code inside obmalloc.c) Although that's a misleading way of expressing the actual intent; more on that below. > The following situation is *not* supported: > > While one thread holds the GIL, another thread calls PyObject_Free with > a pointer that was returned by PyObject_Malloc. Right, that was never supported (and I doubt it could be without introducing a new mutex in obmalloc.c). 
> I'm hoping that I got things a little better this time around. I've > submitted my updated patch to the patch tracker. For reference, I've > included links to SourceForge and the previous thread. > > Thank you, Thank you! I probably can't make time to review anything before this weekend. I will try to then. I expect it would be easier if you ripped out the horrid support for PyObject_Free abuse; in a sane world, the release-build PyMem_FREE, PyMem_Del, and PyMem_DEL would expand to "free" instead of to "PyObject_FREE" (via changes to pymem.h). IOW, it was never the _intent_ that people be able to call PyObject_Free without holding the GIL. The need for that came from a different problem, that old code sometimes mixed calls to PyObject_New with calls to PyMem_DEL (or PyMem_FREE or PyMem_Del). It's for that latter reason that PyMem_DEL (and its synonyms) were changed to expand to PyObject_Free. This shouldn't be supported anymore. Because it _was_ supported, there was no way to tell whether PyObject_Free was being called because (a) we were catering to long-obsolete but once-loved code that called PyMem_DEL while holding the GIL and with a pointer obtained by PyObject_New; or, (b) somebody was calling PyMem_Del (etc) with a non-object pointer they had obtained from PyMem_New, or from the system malloc directly. It was never legit to do #a without holding the GIL. It was clear as mud whether it was legit to do #b without holding the GIL. If PyMem_Del (etc) change to expand to "free" in a release build, then #b can remain clear as mud without harming anyone. Nobody should be doing #a anymore. If someone still is, "tough luck -- fix it, you've had years of warning" is easy for me to live with at this stage. I suppose the other consideration is that already-compiled extension modules on non-Windows(*) systems will, if they're not recompiled, continue to call PyObject_Free everywhere they had a PyMem_Del/DEL/FREE call. If such code is calling it without holding the GIL, and obmalloc.c stops trying to support this insanity, then they're going to grow some thread races they woudn't have if they did recompile (to get such call sites remapped to the system free). I don't really care about that either: it's a general rule that virtually all Python API functions must be called with the GIL held, and there was never an exception in the docs for the PyMem_ family. (*) Windows is immune simply because the Windows Python is set up in such a way that you always have to recompile extension modules when Python's minor version number (the j in i.j.k) gets bumped. From ejones at uwaterloo.ca Wed Feb 16 04:02:53 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Wed Feb 16 04:04:05 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <1f7befae05021514524d0a35ec@mail.gmail.com> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> Message-ID: <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> On Feb 15, 2005, at 17:52, Tim Peters wrote: > As I said before, I don't think we need to support this any more. > More, I think we should not -- the support code is excruciatingly > subtle, it wasted plenty of your time trying to keep it working, and > if we keep it in it's going to continue to waste time over the coming > years (for example, in the short term, it will waste my time reviewing > it). I do not have nearly enough experience in the Python world to evaluate this decision. 
I've only been programming in Python for about two years now, and as I am sure you are aware, this is my first patch that I have submitted to Python. I don't really know my way around the Python internals, beyond writing basic extensions in C. Martin's opinion is clearly the opposite of yours. Basically, the debate seems to boil down to maintaining backwards compatibility at the cost of making the code in obmalloc.c harder to understand. The particular case that is being supported could definitely be viewed as a "bug" in the code that using obmalloc. It also likely is quite rare. However, until now it has been supported, so it is hard to judge exactly how much code would be affected. It would definitely be a minor barrier to moving to Python 2.5. Is there some sort of consensus that is possible on this issue? >> While one thread holds the GIL, any other thread can call >> PyObject_Free >> with a pointer that was returned by the system malloc. > What _was_ supported was more generally that > > any number of threads could call PyObject_Free with pointers that > were > returned by the system malloc/realloc > > at the same time as > > a single thread, holding the GIL, was doing anything whatsoever > (including > executing any code inside obmalloc.c) Okay, good, that is what I have assumed. > Although that's a misleading way of expressing the actual intent; more > on that below. That's fine. It may be a misleading description of the intent, but it is an accurate description of the required behaviour. At least I hope it is. > I expect it would be easier if you > ripped out the horrid support for PyObject_Free abuse; in a sane > world, the release-build PyMem_FREE, PyMem_Del, and PyMem_DEL would > expand to "free" instead of to "PyObject_FREE" (via changes to > pymem.h). It turns out that basically the only thing that would change would be removing the "volatile" specifiers from two of the global variables, plus it would remove about 100 lines of comments. :) The "work" was basically just hurting my brain trying to reason about the concurrency issues, not changing code. > It was never legit to do #a without holding the GIL. It was clear as > mud whether it was legit to do #b without holding the GIL. If > PyMem_Del (etc) change to expand to "free" in a release build, then #b > can remain clear as mud without harming anyone. Nobody should be > doing #a anymore. If someone still is, "tough luck -- fix it, you've > had years of warning" is easy for me to live with at this stage. Hmm... The issue is that case #a may not be an easy problem to diagnose: Some implementations of free() will happily do nothing if they are passed a pointer they know nothing about. This would just result in a memory leak. Other implementations of free() can output a warning or crash in this case, which would make it trivial to locate. > I suppose the other consideration is that already-compiled extension > modules on non-Windows(*) systems will, if they're not recompiled, > continue to call PyObject_Free everywhere they had a > PyMem_Del/DEL/FREE call. Is it guaranteed that extension modules will be binary compatible with future Python releases? I didn't think this was the case. Thanks for the feedback, Evan Jones From tim.peters at gmail.com Wed Feb 16 05:26:18 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 16 05:26:22 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? 
In-Reply-To: <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> Message-ID: <1f7befae05021520263d77a2a3@mail.gmail.com> [Tim Peters] >> As I said before, I don't think we need to support this any more. >> More, I think we should not -- the support code is excruciatingly >> subtle, it wasted plenty of your time trying to keep it working, and >> if we keep it in it's going to continue to waste time over the coming >> years (for example, in the short term, it will waste my time reviewing >> it). [Evan Jones] > I do not have nearly enough experience in the Python world to evaluate > this decision. I've only been programming in Python for about two years > now, and as I am sure you are aware, this is my first patch that I have > submitted to Python. I don't really know my way around the Python > internals, beyond writing basic extensions in C. Martin's opinion is > clearly the opposite of yours. ? This is all I recall Martin saying about this: http://mail.python.org/pipermail/python-dev/2005-January/051265.html I'm not certain it is acceptable to make this assumption. Why is it not possible to use the same approach that was previously used (i.e. leak the arenas array)? Do you have something else in mind? I'll talk with Martin about it if he still wants to. Martin, this miserable code must die! > Basically, the debate seems to boil down to maintaining backwards > compatibility at the cost of making the code in obmalloc.c harder to > understand. The "let it leak to avoid thread problems" cruft is arguably the single most obscure bit of coding in Python's code base. I created it, so I get to say that . Even 100 lines of comments aren't enough to make it clear, as you've discovered. I've lost track of how many hours of my life have been pissed away explaining it, and its consequences (like how come this or that memory-checking program complains about the memory leak it causes), and the historical madness that gave rise to it in the beginning. I've had enough of it -- the only purpose this part ever had was to protect against C code that wasn't playing by the rules anyway. BFD. There are many ways to provoke segfaults with C code that breaks the rules, and there's just not anything that special about this way _except_ that I added objectionable (even at the time) hacks to preserve this kind of broken C code until authors had time to fix it. Time's up. > The particular case that is being supported could definitely be viewed > as a "bug" in the code that using obmalloc. It also likely is quite rare. > However, until now it has been supported, so it is hard to judge exactly > how much code would be affected. People spent many hours searching for affected code when it first went in, and only found a few examples then, in obscure extension modules. It's unlikely usage has grown. The hack was put it in for the dubious benefit of the few examples that were found then. > It would definitely be a minor barrier to moving to Python 2.5. That's in part what python-dev is for. Of course nobody here has code that will break -- but the majority of high-use extension modules are maintained by people who read this list, so that's not as empty as it sounds. It's also what alpha and beta releases are for. Fear of change isn't a good enough reason to maintain this code. > Is there some sort of consensus that is possible on this issue? 
Absolutely, provided it matches my view <0.5 wink>. Rip it out, and if alpha/beta testing suggests that's a disaster, _maybe_ put it back in. ... > It turns out that basically the only thing that would change would be > removing the "volatile" specifiers from two of the global variables, > plus it would remove about 100 lines of comments. :) The "work" was > basically just hurting my brain trying to reason about the concurrency > issues, not changing code. And the brain of everyone else who ever bumps into this. There's a high probability that if this code actually doesn't work (can you produce a formal proof of correctness for it? I can't -- and I tried), nothing can be done to repair it; and code this outrageously delicate has a decent chance of being buggy no matter how many people stare at it (overlooking that you + me isn't that many). You also mentioned before that removing the "volatile"s may have given a speed boost, and that's believable. I mentioned above the unending costs in explanations, and nuisance gripes from memory-integrity tools about the deliberate leaks. There are many kinds of ongoing costs here, and no _intended_ benefit anymore (it certainly wasn't my intent to cater to buggy C code forever). >> It was never legit to do #a without holding the GIL. It was clear as >> mud whether it was legit to do #b without holding the GIL. If >> PyMem_Del (etc) change to expand to "free" in a release build, then #b >> can remain clear as mud without harming anyone. Nobody should be >> doing #a anymore. If someone still is, "tough luck -- fix it, you've >> had years of warning" is easy for me to live with at this stage. > Hmm... The issue is that case #a may not be an easy problem to > diagnose: Many errors in C code are difficult to diagnose. That's life. Mixing a PyObject call with a PyMem call is obvious now "by eyeball", so if there is such code still out there, and it blows up, an experienced eye has a good chance of spotting the error at once. ' > Some implementations of free() will happily do nothing if > they are passed a pointer they know nothing about. This would just > result in a memory leak. Other implementations of free() can output a > warning or crash in this case, which would make it trivial to locate. I expect most implementations of free() would end up corrupting memory state, leading to no symptoms or to disastrous symptoms, from 0 to a googol cycles after the mistake was made. Errors in using malloc/free are often nightmares to debug. We're not trying to make coding in C pleasant here -- which is good, because that's unachievable . >> I suppose the other consideration is that already-compiled extension >> modules on non-Windows(*) systems will, if they're not recompiled, >> continue to call PyObject_Free everywhere they had a >> PyMem_Del/DEL/FREE call. > Is it guaranteed that extension modules will be binary compatible with > future Python releases? I didn't think this was the case. Nope, that's not guarantfeed. There's a magic number (PYTHON_API_VERSION) that changes whenever the Python C API undergoes an incompatible change, and binary compatibility is guaranteed across releases if that doesn't change. The then-current value of PYTHON_API_VERSION gets compiled into extensions, by virtue of the module-initialization macro their initialization function has to call. 
The guts of that function are in the Python core (Py_InitModule4()), which raises this warning if the passed-in version doesn't match the current version: "Python C API version mismatch for module %.100s:\ This Python has API version %d, module %.100s has version %d."; This is _just_ a warning, though. Perhaps unfortunately for Python's users, Guido learned long ago that most API mismatches don't actually matter for his own code . For example, the C API officially changed when the signature of PyFrame_New() changed in 2001 -- but almost no extension modules call that function. Similarly, if we change PyMem_Del (etc) to map to the system free(), PYTHON_API_VERSION should be bumped for Python 2.5 -- but many people will ignore the mismatch warning, and again it will probably make no difference (if there's code still out there that calls PyMem_DEL (etc) without holding the GIL, I don't know about it). From kbk at shore.net Wed Feb 16 06:32:36 2005 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed Feb 16 06:32:48 2005 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200502160532.j1G5Wahi031058@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 298 open (+14) / 2754 closed ( +6) / 3052 total (+20) Bugs : 823 open (+19) / 4829 closed (+17) / 5652 total (+36) RFE : 168 open ( +1) / 144 closed ( +2) / 312 total ( +3) New / Reopened Patches ______________________ date.strptime and time.strptime as well (2005-02-04) http://python.org/sf/1116362 opened by Josh NameError in cookielib domain check (2005-02-04) CLOSED http://python.org/sf/1116583 opened by Chad Miller Minor improvement on 1116583 (2005-02-06) http://python.org/sf/1117114 opened by John J Lee cookielib and cookies with special names (2005-02-06) http://python.org/sf/1117339 opened by John J Lee cookielib LWPCookieJar and MozillaCookieJar exceptions (2005-02-06) http://python.org/sf/1117398 opened by John J Lee cookielib.LWPCookieJar incorrectly loads value-less cookies (2005-02-06) http://python.org/sf/1117454 opened by John J Lee urllib2 .getheaders attribute error (2005-02-07) http://python.org/sf/1117588 opened by Wummel replace md5 impl. with one having a more free license (2005-02-07) CLOSED http://python.org/sf/1117961 opened by Matthias Klose unknown locale: lt_LT (patch) (2005-02-08) http://python.org/sf/1118341 opened by Nerijus Baliunas Fix crash in xmlprase_GetInputContext in pyexpat.c (2005-02-08) http://python.org/sf/1118602 opened by Mathieu Fenniak enable time + timedelta (2005-02-08) http://python.org/sf/1118748 opened by Josh fix for a bug in Header.__unicode__() (2005-02-09) CLOSED http://python.org/sf/1119016 opened by Bj?rn Lindqvist python -c readlink()s and stat()s '-c' (2005-02-09) http://python.org/sf/1119423 opened by Brian Foley patches to compile for AIX 4.1.x (2005-02-09) http://python.org/sf/1119626 opened by Stuart D. Gathman better datetime support for xmlrpclib (2005-02-10) http://python.org/sf/1120353 opened by Fred L. Drake, Jr. ZipFile.open - read-only file-like obj for files in archive (2005-02-11) http://python.org/sf/1121142 opened by Alan McIntyre Reference count bug fix (2005-02-12) http://python.org/sf/1121234 opened by Michiel de Hoon sha and md5 modules should use OpenSSL when possible (2005-02-12) http://python.org/sf/1121611 opened by Gregory P. 
Smith Python memory allocator: Free memory (2005-02-15) http://python.org/sf/1123430 opened by Evan Jones Patches Closed ______________ Add SSL certificate validation (2005-02-03) http://python.org/sf/1115631 closed by noonian NameError in cookielib domain check (2005-02-04) http://python.org/sf/1116583 closed by rhettinger replace md5 impl. with one having a more free license (2005-02-07) http://python.org/sf/1117961 closed by loewis fix for a bug in Header.__unicode__() (2005-02-09) http://python.org/sf/1119016 closed by sonderblade time.tzset() not built on Solaris (2005-01-04) http://python.org/sf/1096244 closed by bcannon OSATerminology extension fix (2004-06-25) http://python.org/sf/979784 closed by jackjansen New / Reopened Bugs ___________________ xmlrpclib: wrong decoding in '_stringify' (2005-02-04) CLOSED http://python.org/sf/1115989 opened by Dieter Maurer Prefix search is filesystem-centric (2005-02-04) http://python.org/sf/1116520 opened by Steve Holden Wrong match with regex, non-greedy problem (2005-02-05) CLOSED http://python.org/sf/1116571 opened by rengel Solaris 10 fails to compile complexobject.c (2005-02-04) http://python.org/sf/1116722 opened by Case Van Horsen Dictionary Evaluation Issue (2005-02-05) http://python.org/sf/1117048 opened by WalterBrunswick Typo in list.sort() documentation (2005-02-06) CLOSED http://python.org/sf/1117063 opened by Viktor Ferenczi sgmllib.SGMLParser (2005-02-06) CLOSED http://python.org/sf/1117302 opened by Paul Birnie SimpleHTTPServer and mimetypes: almost together (2005-02-06) http://python.org/sf/1117556 opened by Matthew L Daniel os.path.exists returns false negatives in MAC environments. (2005-02-07) http://python.org/sf/1117601 opened by Stephen Bennett profiler: Bad return and Bad call errors with exceptions (2005-02-06) http://python.org/sf/1117670 opened by Matthew Mueller "in" operator bug ? 
(2005-02-07) CLOSED http://python.org/sf/1117757 opened by Andrea Bolzonella BSDDB openhash (2005-02-07) http://python.org/sf/1117761 opened by Andrea Bolzonella lists coupled (2005-02-07) CLOSED http://python.org/sf/1118101 opened by chopf Error in representation of complex numbers(again) (2005-02-09) http://python.org/sf/1118729 opened by George Yoshida builtin file() vanishes (2005-02-08) CLOSED http://python.org/sf/1118977 opened by Barry Alan Scott Docs for set() omit constructor (2005-02-09) CLOSED http://python.org/sf/1119282 opened by Kent Johnson curses.initscr - initscr exit w/o env(TERM) set (2005-02-09) http://python.org/sf/1119331 opened by Jacob Lilly xrange() builtin accepts keyword arg silently (2005-02-09) http://python.org/sf/1119418 opened by Martin Blais Python Programming FAQ should be updated for Python 2.4 (2005-02-09) http://python.org/sf/1119439 opened by Michael Hoffman ScrolledText allows Frame.bbox to hide Text.bbox (2005-02-09) http://python.org/sf/1119673 opened by Drew Perttula list extend() accepts args besides lists (2005-02-09) CLOSED http://python.org/sf/1119700 opened by Dan Everhart Static library incompatible with nptl (2005-02-10) http://python.org/sf/1119860 opened by daniel Static library incompatible with nptl (2005-02-10) CLOSED http://python.org/sf/1119866 opened by daniel Python 2.4.0 crashes with a segfault, EXAMPLE ATTACHED (2005-02-11) http://python.org/sf/1120452 opened by Viktor Ferenczi bug in unichr() documentation (2005-02-11) http://python.org/sf/1120777 opened by Marko Kreen Problem in join function definition (2005-02-11) CLOSED http://python.org/sf/1120862 opened by yseb file seek error (2005-02-11) CLOSED http://python.org/sf/1121152 opened by Richard Lawhorn Python24.dll crashes, EXAMPLE ATTACHED (2005-02-12) http://python.org/sf/1121201 opened by Viktor Ferenczi zip incorrectly and incompletely documented (2005-02-12) http://python.org/sf/1121416 opened by Alan Decorated functions are unpickleable (2005-02-12) CLOSED http://python.org/sf/1121475 opened by S Joshua Swamidass distutils.dir_utils not unicode compatible (2005-02-12) http://python.org/sf/1121494 opened by Morten Lied Johansen subprocess example missing "stdout=PIPE" (2005-02-12) http://python.org/sf/1121579 opened by Monte Davidoff SMTPHandler argument misdescribed (2005-02-13) http://python.org/sf/1121875 opened by Peter marshal may crash on truncated input (2005-02-14) http://python.org/sf/1122301 opened by Fredrik Lundh incorrect handle of declaration in markupbase (2005-02-14) http://python.org/sf/1122916 opened by Wai Yip Tung Typo in Curses-Function doc (2005-02-15) http://python.org/sf/1123268 opened by Aaron C. Spike test_peepholer failing on HEAD (2005-02-15) CLOSED http://python.org/sf/1123354 opened by Tim Peters add SHA256/384/512 to lib (2005-02-16) http://python.org/sf/1123660 opened by paul rubin Bugs Closed ___________ xmlrpclib: wrong decoding in '_stringify' (2005-02-04) http://python.org/sf/1115989 closed by fdrake Wrong match with regex, non-greedy problem (2005-02-05) http://python.org/sf/1116571 closed by effbot Typo in list.sort() documentation (2005-02-05) http://python.org/sf/1117063 closed by rhettinger sgmllib.SGMLParser (2005-02-06) http://python.org/sf/1117302 closed by effbot PyThreadState_SetAsyncExc segfault (2004-11-18) http://python.org/sf/1069160 closed by gvanrossum "in" operator bug ? 
(2005-02-07) http://python.org/sf/1117757 closed by tim_one lists coupled (2005-02-07) http://python.org/sf/1118101 closed by tim_one builtin file() vanishes (2005-02-09) http://python.org/sf/1118977 closed by loewis Docs for set() omit constructor (2005-02-09) http://python.org/sf/1119282 closed by rhettinger list extend() accepts args besides lists (2005-02-09) http://python.org/sf/1119700 closed by rhettinger Static library incompatible with nptl (2005-02-10) http://python.org/sf/1119866 closed by ekloef Problem in join function definition (2005-02-11) http://python.org/sf/1120862 closed by rhettinger file seek error (2005-02-11) http://python.org/sf/1121152 closed by tim_one Decorated functions are unpickleable (2005-02-12) http://python.org/sf/1121475 closed by bcannon "Macintosh" references in the docs need to be checked. (2005-01-04) http://python.org/sf/1095802 closed by bcannon RE '*.?' cores if len of found string exceeds 10000 (2004-10-26) http://python.org/sf/1054564 closed by effbot missing mappings in locale tables (2002-10-09) http://python.org/sf/620739 closed by effbot test_peepholer failing on HEAD (2005-02-15) http://python.org/sf/1123354 closed by tim_one New / Reopened RFE __________________ urllib.urlopen should put the http-error-code in .info() (2005-02-07) http://python.org/sf/1117751 opened by Robert Kiendl Option to force variables to be declared (2005-02-14) http://python.org/sf/1122279 opened by Zac Evans Line Numbers (2005-02-14) http://python.org/sf/1122532 opened by Egon Frerich RFE Closed __________ commands.mkarg function should be public (2001-12-04) http://python.org/sf/489106 closed by donut Missing socketpair() function. (2002-06-12) http://python.org/sf/567969 closed by grahamh From martin at v.loewis.de Wed Feb 16 08:50:51 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed Feb 16 08:50:55 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <1f7befae05021520263d77a2a3@mail.gmail.com> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> <1f7befae05021520263d77a2a3@mail.gmail.com> Message-ID: <4212FB5B.1030209@v.loewis.de> Tim Peters wrote: > I'm not certain it is acceptable to make this assumption. Why is it > not possible to use the same approach that was previously used (i.e. > leak the arenas array)? > > Do you have something else in mind? I'll talk with Martin about it if > he still wants to. Martin, this miserable code must die! That's fine with me. I meant what I said: "I'm not certain". The patch original claimed that it cannot possibly preserve this feature, and I felt that this claim was incorrect - indeed, Evan then understood the feature, and made it possible. I can personally accept breaking the code that still relies on the invalid APIs. The only problem is that it is really hard to determine whether some code *does* violate the API usage. Regards, Martin From konrad.hinsen at laposte.net Thu Feb 10 09:38:40 2005 From: konrad.hinsen at laposte.net (konrad.hinsen@laposte.net) Date: Wed Feb 16 14:20:17 2005 Subject: [Numpy-discussion] Re: [Python-Dev] Re: Numeric life as I see it In-Reply-To: References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: On 10.02.2005, at 05:36, Guido van Rossum wrote: > And why would a Matrix need to inherit from a C-array? 
Wouldn't it > make more sense from an OO POV for the Matrix to *have* a C-array > without *being* one? Definitely. Most array operations make no sense on matrices. And matrices are limited to two dimensions. Making Matrix a subclass of Array would be inheritance for implementation while removing 90% of the interface. On the other hand, a Matrix object is perfectly defined by its behaviour and independent of its implementation. One could perfectly well implement one using Python lists or dictionaries, even though that would be pointless from a performance point of view. Konrad. -- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen@cea.fr ------------------------------------------------------------------------ ------- From konrad.hinsen at laposte.net Thu Feb 10 09:45:28 2005 From: konrad.hinsen at laposte.net (konrad.hinsen@laposte.net) Date: Wed Feb 16 14:20:18 2005 Subject: [Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it In-Reply-To: <420ADE90.9050304@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> Message-ID: <1c3044466186480f55ef45d2c977731b@laposte.net> On 10.02.2005, at 05:09, Travis Oliphant wrote: > I'm not sure I agree. The ufuncobject is the only place where this > concern existed (should we trip OverFlow, ZeroDivision, etc. errors > durring array math). Numarray introduced and implemented the concept > of error modes that can be pushed and popped. I believe this is the > right solution for the ufuncobject. Indeed. Note also that the ufunc stuff is less critical to agree on than the array data structure. Anyone unhappy with ufuncs could write their own module and use it instead. It would be the data structure and its access rules that fix the structure of all the code that uses it, so that's what needs to be acceptable to everyone. > One question we are pursuing is could the arrayobject get into the > core without a particular ufunc object. Most see this as > sub-optimal, but maybe it is the only way. Since all the artithmetic operations are in ufunc that would be suboptimal solution, but indeed still a workable one. > I appreciate some of what Paul is saying here, but I'm not fully > convinced that this is still true with Python 2.2 and up new-style > c-types. The concerns seem to be over the fact that you have to > re-implement everything in the sub-class because the base-class will > always return one of its objects instead of a sub-class object. I'd say that such discussions should be postponed until someone proposes a good use for subclassing arrays. Matrices are not one, in my opinion. Konrad. 
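As a toy illustration of the has-a design being argued for here (an editor's sketch built on plain nested lists, since the point is independent of any particular array implementation; none of the names are proposed API):

# "Matrix HAS an array": the matrix owns 2-D storage (nested lists here) and
# exposes only matrix-shaped behaviour, instead of inheriting the full
# N-dimensional array interface.
class Matrix:
    def __init__(self, rows):
        self._data = [list(row) for row in rows]    # wrapped storage

    def shape(self):
        return len(self._data), len(self._data[0])

    def __mul__(self, other):                       # matrix product only
        n, m = self.shape()
        m2, p = other.shape()
        if m != m2:
            raise ValueError("incompatible shapes")
        rows = []
        for i in range(n):
            rows.append([sum([self._data[i][k] * other._data[k][j]
                              for k in range(m)]) for j in range(p)])
        return Matrix(rows)

m = Matrix([[1, 2], [3, 4]])
print (m * m)._data        # [[7, 10], [15, 22]]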
-- ------------------------------------------------------------------------ ------- Konrad Hinsen Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France Tel.: +33-1 69 08 79 25 Fax: +33-1 69 08 82 61 E-Mail: khinsen@cea.fr ------------------------------------------------------------------------ ------- From verveer at embl.de Thu Feb 10 10:53:10 2005 From: verveer at embl.de (Peter Verveer) Date: Wed Feb 16 14:20:20 2005 Subject: [Python-Dev] Re: [Numpy-discussion] Re: Numeric life as I see it In-Reply-To: <420B29AE.8030701@ee.byu.edu> References: <420A8406.4020808@ee.byu.edu> <420AAC33.807@ee.byu.edu> <420AB084.1000008@v.loewis.de> <420AB928.3090004@pfdubois.com> <420ADE90.9050304@ee.byu.edu> <1c3044466186480f55ef45d2c977731b@laposte.net> <420B29AE.8030701@ee.byu.edu> Message-ID: <50ac60a36c2add7d708dc02d8bf623a3@embl.de> On Feb 10, 2005, at 10:30 AM, Travis Oliphant wrote: > >>> One question we are pursuing is could the arrayobject get into the >>> core without a particular ufunc object. Most see this as >>> sub-optimal, but maybe it is the only way. >> >> >> Since all the artithmetic operations are in ufunc that would be >> suboptimal solution, but indeed still a workable one. > > > I think replacing basic number operations of the arrayobject should > simple, so perhaps a default ufunc object could be worked out for > inclusion. I agree, getting it in the core is among others, intended to give it broad access, not just to hard-core numeric people. For many uses (including many of my simpler scripts) you don't need the more exotic functionality of ufuncs. You could just do with implementing the standard math functions, possibly leaving out things like reduce. That would be very easy to implement. > >> >>> I appreciate some of what Paul is saying here, but I'm not fully >>> convinced that this is still true with Python 2.2 and up new-style >>> c-types. The concerns seem to be over the fact that you have to >>> re-implement everything in the sub-class because the base-class will >>> always return one of its objects instead of a sub-class object. >> >> >> I'd say that such discussions should be postponed until someone >> proposes a good use for subclassing arrays. Matrices are not one, in >> my opinion. >> > Agreed. It is is not critical to what I am doing, and I obviously > need more understanding before tackling such things. Numeric3 uses > the new c-type largely because of the nice getsets table which is > separate from the methods table. This replaces the rather ugly > C-functions getattr and setattr. I would agree that sub-classing arrays might not be worth the trouble. Peter From perry at stsci.edu Thu Feb 10 16:21:24 2005 From: perry at stsci.edu (Perry Greenfield) Date: Wed Feb 16 14:20:21 2005 Subject: [Python-Dev] RE: [Numpy-discussion] Numeric life as I see it In-Reply-To: <420AB928.3090004@pfdubois.com> Message-ID: Paul Dubois wrote: > > Aside: While I am at it, let me reiterate what I have said to the other > developers privately: there is NO value to inheriting from the array > class. Don't try to achieve that capability if it costs anything, even > just effort, because it buys you nothing. Those of you who keep > remarking on this as if it would simply haven't thought it through IMHO. > It sounds so intellectually appealing that David Ascher and I had a > version of Numeric that almost did it before we realized our folly. 
> To be contrarian, we did find great benefit (at least initially) for inheritance for developing the record array and character array classes since they share so many structural operations (indexing, slicing, transposes, concatenation, etc.) with numeric arrays. It's possible that the approach that Travis is considering doesn't need to use inheritance to accomplish this (I don't know enough about the details yet), but it sure did save a lot of duplication of implementation. I do understand what you are getting at. Any numerical array inheritance generally forces one to reimplement all ufuncs and such, and that does make it less useful in that case (though I still wonder if it still isn't better than the alternatives) Perry Greenfield From nick at ilm.com Fri Feb 11 23:32:15 2005 From: nick at ilm.com (Nick Rasmussen) Date: Wed Feb 16 14:20:22 2005 Subject: [Python-Dev] subclassing PyCFunction_Type Message-ID: <20050211223215.GS14902@ewok.lucasdigital.com> tommy said that this would be the best place to ask this question.... I'm trying to get functions wrapped via boost to show up as builtin types so that pydoc includes them when documenting the module containing them. Right now boost python functions are created using a PyTypeObject such that when inspect.isbuiltin does: return isinstance(object, types.BuiltinFunctionType) isintance returns 0. Initially I had just modified a local pydoc to document all functions with unknown source modules (since the module can't be deduced from non-python functions), but I figured that the right fix was to get boost::python functions to correctly show up as builtins, so I tried setting PyCFunction_Type as the boost function type object's tp_base, which worked fine for me using linux on amd64, but when my patch was tried out on other platforms, it ran into regression test failures: http://mail.python.org/pipermail/c++-sig/2005-February/008545.html So I have some questions: Should boost::python functions be modified in some way to show up as builtin function types or is the right fix really to patch pydoc? Is PyCFunction_Type intended to be subclassable? I noticed that it does not have Py_TPFLAGS_BASETYPE set in its tp_flags. Also, PyCFunction_Type has Py_TPFLAGS_HAVE_GC, and as the assertion failures in the testsuite seemed to be centered around object allocation/ garbage collection, so is there something related to subclassing a gc-aware class that needs to be happening (currently the boost type object doesn't support garbage collection). If subclassing PyCFunction_Type isn't the right way to make these functions be considered as builtin functions, what is? -nick From apolinejuliet at yahoo.com Mon Feb 14 04:31:40 2005 From: apolinejuliet at yahoo.com (apoline juliet obina) Date: Wed Feb 16 14:20:24 2005 Subject: [Python-Dev] Py2.3.1 Message-ID: <20050214033140.60072.qmail@web30707.mail.mud.yahoo.com> iis it "pydos" ? your net add?/ --------------------------------- Yahoo! Messenger - Communicate instantly..."Ping" your friends today! Download Messenger Now -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20050214/ca6c95d6/attachment.htm From Martin.Gfeller at comit.ch Mon Feb 14 19:41:51 2005 From: Martin.Gfeller at comit.ch (Gfeller Martin) Date: Wed Feb 16 14:20:25 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% Message-ID: Dear all, I'm running a large Zope application on a 1x1GHz CPU 1GB mem Window XP Prof machine using Zope 2.7.3 and Py 2.3.4 The application typically builds large lists by appending and extending them. We regularly observed that using a given functionality a second time using the same process was much slower (50%) than when it ran the first time after startup. This behavior greatly improved with Python 2.3 (thanks to the improved Python object allocator, I presume). Nevertheless, I tried to convert the heap used by Python to a Windows Low Fragmentation Heap (available on XP and 2003 Server). This improved the overall run time of a typical CPU-intensive report by about 15% (overall run time is in the 5 minutes range), with the same memory consumption. I consider 15% significant enough to let you know about it. For information about the Low Fragmentation Heap, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/low_fragmentation_heap.asp Best regards, Martin PS: Since I don't speak C, I used ctypes to convert all heaps in the process to LFH (I don't know how to determine which one is the C heap). ________________________ COMIT AG Risk Management Systems Pflanzschulstrasse 7 CH-8004 Z?rich Telefon +41 (44) 1 298 92 84 http://www.comit.ch http://www.quantax.com - Quantax Trading and Risk System From leogah at spamcop.net Mon Feb 14 23:35:31 2005 From: leogah at spamcop.net (Richard Brodie) Date: Wed Feb 16 14:20:26 2005 Subject: [Python-Dev] builtin_id() returns negative numbers Message-ID: <000701c512e5$7de81660$af0189c3@oemcomputer> > Maybe it's just a wart we have to live with now; OTOH, > the docs explicitly warn that id() may return a long, so any code > relying on "short int"-ness has always been relying on an > implementation quirk. Well, the docs say that %x does unsigned conversion, so they've been relying on an implementation quirk as well ;) Would it be practical to add new conversion syntax to string interpolation? Like, for example, %p as an unsigned hex number the same size as (void *). Otherwise, unless I misunderstand integer unification, one would just have to strike the distinction between, say, %d and %u. From mwh at python.net Wed Feb 16 14:33:28 2005 From: mwh at python.net (Michael Hudson) Date: Wed Feb 16 14:33:31 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <20050211223215.GS14902@ewok.lucasdigital.com> (Nick Rasmussen's message of "Fri, 11 Feb 2005 14:32:15 -0800") References: <20050211223215.GS14902@ewok.lucasdigital.com> Message-ID: <2m4qgc1vfb.fsf@starship.python.net> Nick Rasmussen writes: [five days ago] > Should boost::python functions be modified in some way to show > up as builtin function types or is the right fix really to patch > pydoc? My heart leans towards the latter. > Is PyCFunction_Type intended to be subclassable? Doesn't look like it, does it? :) More seriosly, "no". Cheers, mwh -- ARTHUR: Don't ask me how it works or I'll start to whimper. -- The Hitch-Hikers Guide to the Galaxy, Episode 11 From pje at telecommunity.com Wed Feb 16 17:02:18 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed Feb 16 17:00:32 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <20050211223215.GS14902@ewok.lucasdigital.com> Message-ID: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: >tommy said that this would be the best place to ask >this question.... > >I'm trying to get functions wrapped via boost to show >up as builtin types so that pydoc includes them when >documenting the module containing them. Right now >boost python functions are created using a PyTypeObject >such that when inspect.isbuiltin does: > > return isinstance(object, types.BuiltinFunctionType) FYI, this may not be the "right" way to do this, but since 2.3 'isinstance()' looks at an object's __class__ rather than its type(), so you could perhaps include a '__class__' descriptor in your method type that returns BuiltinFunctionType and see if that works. It's a kludge, but it might let your code work with existing versions of Python. From bob at redivi.com Wed Feb 16 17:26:34 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Feb 16 17:26:43 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> Message-ID: <5614e00fb134b968fa76a1896c456f4a@redivi.com> On Feb 16, 2005, at 11:02, Phillip J. Eby wrote: > At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: >> tommy said that this would be the best place to ask >> this question.... >> >> I'm trying to get functions wrapped via boost to show >> up as builtin types so that pydoc includes them when >> documenting the module containing them. Right now >> boost python functions are created using a PyTypeObject >> such that when inspect.isbuiltin does: >> >> return isinstance(object, types.BuiltinFunctionType) > > FYI, this may not be the "right" way to do this, but since 2.3 > 'isinstance()' looks at an object's __class__ rather than its type(), > so you could perhaps include a '__class__' descriptor in your method > type that returns BuiltinFunctionType and see if that works. > > It's a kludge, but it might let your code work with existing versions > of Python. It works in Python 2.3.0: import types class FakeBuiltin(object): __doc__ = property(lambda self: self.doc) __name__ = property(lambda self: self.name) __self__ = property(lambda self: None) __class__ = property(lambda self: types.BuiltinFunctionType) def __init__(self, name, doc): self.name = name self.doc = doc >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) Help on built-in function name: name(...) name(foo, bar, baz) -> rval -bob From pje at telecommunity.com Wed Feb 16 17:43:51 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed Feb 16 17:42:04 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5614e00fb134b968fa76a1896c456f4a@redivi.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote: > >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) >Help on built-in function name: > >name(...) > name(foo, bar, baz) -> rval If you wanted to be even more ambitious, you could return FunctionType and have a fake func_code so pydoc will be able to see the argument signature directly. 
:) From bob at redivi.com Wed Feb 16 17:52:56 2005 From: bob at redivi.com (Bob Ippolito) Date: Wed Feb 16 17:53:11 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5.1.1.6.0.20050216114230.037364a0@mail.telecommunity.com> Message-ID: <640f0846671b73a92939648d278e4861@redivi.com> On Feb 16, 2005, at 11:43, Phillip J. Eby wrote: > At 11:26 AM 2/16/05 -0500, Bob Ippolito wrote: >> >>> help(FakeBuiltin("name", "name(foo, bar, baz) -> rval")) >> Help on built-in function name: >> >> name(...) >> name(foo, bar, baz) -> rval > > If you wanted to be even more ambitious, you could return FunctionType > and have a fake func_code so pydoc will be able to see the argument > signature directly. :) I was thinking that too, but I didn't have the energy to code it in an email :) -bob From fredrik at pythonware.com Wed Feb 16 21:08:14 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 21:19:07 2005 Subject: [Python-Dev] string find(substring) vs. substring in string Message-ID: any special reason why "in" is faster if the substring is found, but a lot slower if it's not in there? timeit -s "s = 'not there'*100" "s.find('not there') != -1" 1000000 loops, best of 3: 0.749 usec per loop timeit -s "s = 'not there'*100" "'not there' in s" 10000000 loops, best of 3: 0.122 usec per loop timeit -s "s = 'not the xyz'*100" "s.find('not there') != -1" 100000 loops, best of 3: 7.03 usec per loop timeit -s "s = 'not the xyz'*100" "'not there' in s" 10000 loops, best of 3: 25.9 usec per loop ps. btw, it's about time we did something about this: timeit -s "s = 'not the xyz'*100" -s "import re; p = re.compile('not there')" "p.search(s)" 100000 loops, best of 3: 5.72 usec per loop From FBatista at uniFON.com.ar Wed Feb 16 21:23:59 2005 From: FBatista at uniFON.com.ar (Batista, Facundo) Date: Wed Feb 16 21:28:28 2005 Subject: [Python-Dev] string find(substring) vs. substring in string Message-ID: [Fredrik Lundh] #- any special reason why "in" is faster if the substring is found, but #- a lot slower if it's not in there? Maybe because it stops searching when it finds it? The time seems to be very dependant of the position of the first match: fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'not there'*100" "'not there' in s" 1000000 loops, best of 3: 0.222 usec per loop fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*20 + 'not there'*100" "'not there' in s" 100000 loops, best of 3: 5.54 usec per loop fbatista@pytonisa ~/ota> python /usr/local/lib/python2.3/timeit.py -s "s = 'blah blah'*40 + 'not there'*100" "'not there' in s" 100000 loops, best of 3: 10.8 usec per loop . Facundo Bit?cora De Vuelo: http://www.taniquetil.com.ar/plog PyAr - Python Argentina: http://pyar.decode.com.ar/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20050216/e799aff5/attachment.html From mike at skew.org Wed Feb 16 21:34:16 2005 From: mike at skew.org (Mike Brown) Date: Wed Feb 16 21:34:18 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: Message-ID: <200502162034.j1GKYGBU067236@chilled.skew.org> Fredrik Lundh wrote: > any special reason why "in" is faster if the substring is found, but > a lot slower if it's not in there? 
Just guessing here, but in general I would think that it would stop searching as soon as it found it, whereas until then, it keeps looking, which takes more time. But I would also hope that it would be smart enough to know that it doesn't need to look past the 2nd character in 'not the xyz' when it is searching for 'not there' (due to the lengths of the sequences). From amk at amk.ca Wed Feb 16 21:54:31 2005 From: amk at amk.ca (A.M. Kuchling) Date: Wed Feb 16 21:57:23 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <200502162034.j1GKYGBU067236@chilled.skew.org> References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: <20050216205431.GA8873@rogue.amk.ca> On Wed, Feb 16, 2005 at 01:34:16PM -0700, Mike Brown wrote: > time. But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). Assuming stringobject.c:string_contains is the right function, the code looks like this: size = PyString_GET_SIZE(el); rhs = PyString_AS_STRING(el); lhs = PyString_AS_STRING(a); /* optimize for a single character */ if (size == 1) return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; end = lhs + (PyString_GET_SIZE(a) - size); while (lhs <= end) { if (memcmp(lhs++, rhs, size) == 0) return 1; } So it's doing a zillion memcmp()s. I don't think there's a more efficient way to do this with ANSI C; memmem() is a GNU extension that searches for blocks of memory. Perhaps saving some memcmps by writing if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) would help. --amk From gvanrossum at gmail.com Wed Feb 16 22:03:10 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Feb 16 22:03:13 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <20050216205431.GA8873@rogue.amk.ca> References: <200502162034.j1GKYGBU067236@chilled.skew.org> <20050216205431.GA8873@rogue.amk.ca> Message-ID: > Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory. Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help. Which is exactly how s.find() wins this race. (I guess it loses when it's found by having to do the "find" lookup.) Maybe string_contains should just call string_find_internal()? And then there's the question of how the re module gets to be faster still; I suppose it doesn't bother with memcmp() at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From irmen at xs4all.nl Wed Feb 16 22:08:36 2005 From: irmen at xs4all.nl (Irmen de Jong) Date: Wed Feb 16 22:08:38 2005 Subject: [Python-Dev] string find(substring) vs. 
substring in string In-Reply-To: <200502162034.j1GKYGBU067236@chilled.skew.org> References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: <4213B654.7070901@xs4all.nl> Mike Brown wrote: > Fredrik Lundh wrote: > >>any special reason why "in" is faster if the substring is found, but >>a lot slower if it's not in there? > > > Just guessing here, but in general I would think that it would stop searching > as soon as it found it, whereas until then, it keeps looking, which takes more > time. But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). There's the Boyer-Moore string search algorithm which is allegedly much faster than a simplistic scanning approach, and I also found this: http://portal.acm.org/citation.cfm?id=79184 So perhaps there's room for improvement :) --Irmen From fredrik at pythonware.com Wed Feb 16 22:19:20 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:19:13 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> <20050216205431.GA8873@rogue.amk.ca> Message-ID: A.M. Kuchling wrote: >> time. But I would also hope that it would be smart enough to know that it >> doesn't need to look past the 2nd character in 'not the xyz' when it is >> searching for 'not there' (due to the lengths of the sequences). > > Assuming stringobject.c:string_contains is the right function, the > code looks like this: > > size = PyString_GET_SIZE(el); > rhs = PyString_AS_STRING(el); > lhs = PyString_AS_STRING(a); > > /* optimize for a single character */ > if (size == 1) > return memchr(lhs, *rhs, PyString_GET_SIZE(a)) != NULL; > > end = lhs + (PyString_GET_SIZE(a) - size); > while (lhs <= end) { > if (memcmp(lhs++, rhs, size) == 0) > return 1; > } > > So it's doing a zillion memcmp()s. I don't think there's a more > efficient way to do this with ANSI C; memmem() is a GNU extension that > searches for blocks of memory. oops. so whoever implemented contains didn't even bother to look at the find implementation... (which uses the same brute-force algorithm, but a better implementation...) > Perhaps saving some memcmps by writing > > if ((*lhs == *rhs) && memcmp(lhs++, rhs, size) == 0) > > would help. memcmp still compiles to REP CMPB on many x86 compilers, and the setup overhead for memcmp sucks on modern x86 hardware; it's usually better to write your own bytewise comparision... (and the fact that we're still brute-force search algorithms in "find" is a bit embarrassing -- note that RE outperforms "in" by a factor of five.... guess it's time to finish the split/replace parts of stringlib and produce a patch... ;-) From fredrik at pythonware.com Wed Feb 16 22:23:03 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:33:56 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Mike Brown wrote: >> any special reason why "in" is faster if the substring is found, but >> a lot slower if it's not in there? > > Just guessing here, but in general I would think that it would stop searching > as soon as it found it, whereas until then, it keeps looking, which takes more > time. the point was that string.find does the same thing, but is much faster in the "no match" case. 
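To make the behaviour being compared here concrete, the following is a tiny pure-Python model of the brute-force scan that both string_contains and string.find perform; the function name and the Python rendering are illustrative only (the real code is the C loop quoted above).

    def naive_contains(haystack, needle):
        # Slide the needle over every offset and bail out on the first hit.
        # The early return is why the "match" case in the benchmark is cheap,
        # while the "no match" case grinds through the whole haystack: O(n*m).
        n, m = len(haystack), len(needle)
        for i in range(n - m + 1):
            if haystack[i:i+m] == needle:
                return True
        return False

Both operations follow this pattern; the gap in the timings comes from the per-offset work (find's cheap first-character check versus a full memcmp() at every position) plus the extra method lookup on the find side.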
> But I would also hope that it would be smart enough to know that it > doesn't need to look past the 2nd character in 'not the xyz' when it is > searching for 'not there' (due to the lengths of the sequences). note that the target string was "not the xyz"*100, so the search algorithm surely has to look past the second character ;-) (btw, the benchmark was taken from jim hugunin's ironpython talk, and seems to be carefully designed to kill performance also for more advanced algorithms -- including boyer-moore) From fredrik at pythonware.com Wed Feb 16 22:50:55 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 22:50:53 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org><20050216205431.GA8873@rogue.amk.ca> Message-ID: Guido van Rossum wrote: > Which is exactly how s.find() wins this race. (I guess it loses when > it's found by having to do the "find" lookup.) Maybe string_contains > should just call string_find_internal()? I somehow suspected that "in" did some extra work in case the "find" failed; guess I should have looked at the code instead... I didn't really expect anyone to use a bad implementation of a brute-force algorithm (O(nm)) when the library already contained a reasonably good version of the same algorithm. > And then there's the question of how the re module gets to be faster > still; I suppose it doesn't bother with memcmp() at all. the benchmark cheats (a bit) -- it builds a state machine (KMP-style) in "compile", and uses that to search in O(n) time. that approach won't fly for "in" and find, of course, but it's definitely possible to make them run a lot faster than RE (i.e. O(n/m) for most cases)... but refactoring the contains code to use find_internal sounds like a good first step. any takers? From tim.peters at gmail.com Wed Feb 16 22:55:27 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 16 22:55:49 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage Message-ID: <1f7befae05021613553afaaa2f@mail.gmail.com> Rev 2.66 of funcobject.c made func.__name__ writable for the first time. That's great, but the patch also introduced what I'm pretty sure was an unintended incompatibility: after 2.66, func.__name__ was no longer *readable* in restricted execution mode. I can't think of a good reason to restrict reading func.__name__, and it looks like this part of the change was an accident. So, unless someone objects soon, I intend to restore that func.__name__ is readable regardless of execution mode (but will continue to be unwritable in restricted execution mode). Objections? Tres Seaver filed a bug report (some Zope tests fail under 2.4 because of this): http://www.python.org/sf/1124295 From raymond.hettinger at verizon.net Wed Feb 16 23:06:54 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Wed Feb 16 23:11:46 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string Message-ID: <000001c51473$df4717a0$8d2acb97@oemcomputer> > but refactoring the contains code to use find_internal sounds like a good > first step.? any takers? > > ? I'm up for it. ? Raymond Hettinger From fredrik at pythonware.com Wed Feb 16 23:10:40 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed Feb 16 23:11:52 2005 Subject: [Python-Dev] Re: string find(substring) vs. 
substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org><20050216205431.GA8873@rogue.amk.ca> Message-ID: > memcmp still compiles to REP CMPB on many x86 compilers, and the setup > overhead for memcmp sucks on modern x86 hardware make that "compiles to REPE CMPSB" and "the setup overhead for REPE CMPSB" From Scott.Daniels at Acm.Org Wed Feb 16 23:00:54 2005 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Wed Feb 16 23:12:18 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: <4213B654.7070901@xs4all.nl> References: <200502162034.j1GKYGBU067236@chilled.skew.org> <4213B654.7070901@xs4all.nl> Message-ID: Irmen de Jong wrote: > There's the Boyer-Moore string search algorithm which is > allegedly much faster than a simplistic scanning approach, > and I also found this: http://portal.acm.org/citation.cfm?id=79184 > So perhaps there's room for improvement :) The problem is setup vs. run. If the question is 'ab in 'rabcd', Boyer-Moore and other fancy searches will be swamped with prep time. In Fred's comparison with re, he does the re.compile(...) outside of the timing loop. You need to decide what the common case is. The longer the thing you are searching in, the more one-time-only overhead you can afford to reduce the per-search-character cost. --Scott David Daniels Scott.Daniels@Acm.Org From gvanrossum at gmail.com Wed Feb 16 23:16:08 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Wed Feb 16 23:16:11 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: References: <200502162034.j1GKYGBU067236@chilled.skew.org> <4213B654.7070901@xs4all.nl> Message-ID: > The longer the thing you are searching in, the more one-time-only > overhead you can afford to reduce the per-search-character cost. Only if you don't find it close to the start. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From Scott.Daniels at Acm.Org Wed Feb 16 23:19:20 2005 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Wed Feb 16 23:33:23 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string In-Reply-To: References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Fredrik Lundh wrote: > (btw, the benchmark was taken from jim hugunin's ironpython talk, and > seems to be carefully designed to kill performance also for more advanced > algorithms -- including boyer-moore) Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do about 300 probes once the table is set (the underscores below): not the xyznot the xyznot the xyz... not ther_ not the__ not ther_ not the__ not ther_ ... -- Scott David Daniels Scott.Daniels@Acm.Org From fredrik at pythonware.com Thu Feb 17 00:10:29 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 00:16:13 2005 Subject: [Python-Dev] Re: string find(substring) vs. substring in string References: <200502162034.j1GKYGBU067236@chilled.skew.org> Message-ID: Scott David Daniels wrote: > Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do > about 300 probes once the table is set (the underscores below): > > not the xyznot the xyznot the xyz... > not ther_ > not the__ > not ther_ > not the__ > not ther_ > ... yup; it gets into a 9/2/9/2 rut. tweak the pattern a little, and you get better results for BM. ("kill" is of course an understatement, but BM usually works better. but it still needs a sizeof(alphabet) table, so you can pretty much forget about it if you want to support unicode...) 
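For readers who have not met Boyer-Moore-style searching, here is a rough Python sketch of the Horspool variant, a simplified cousin of the full Boyer-Moore that Scott traced above. It is illustrative only -- the function name is made up, it is not CPython code, and a real version would be C with a dense skip table.

    def horspool_find(text, pattern):
        # Compare right-to-left; on a mismatch, skip ahead based on the text
        # character aligned with the pattern's last position.  The dict plays
        # the role of the per-character skip table -- the sizeof(alphabet)
        # table mentioned above, which is what makes this awkward for Unicode.
        m, n = len(pattern), len(text)
        if m == 0:
            return 0
        skip = {}
        for i in range(m - 1):
            skip[pattern[i]] = m - 1 - i
        i = m - 1                         # text index aligned with pattern[-1]
        while i < n:
            j, k = m - 1, i
            while j >= 0 and text[k] == pattern[j]:
                j -= 1
                k -= 1
            if j < 0:
                return k + 1              # full match; return its start offset
            i += skip.get(text[i], m)     # characters not in the pattern skip m
        return -1

Nothing here is tuned; it exists only to make the skip-table argument concrete.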
From martin at v.loewis.de Thu Feb 17 00:42:05 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu Feb 17 00:42:09 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <4213DA4D.8090502@v.loewis.de> Gfeller Martin wrote: > Nevertheless, I tried to convert the heap used by Python > to a Windows Low Fragmentation Heap (available on XP > and 2003 Server). This improved the overall run time > of a typical CPU-intensive report by about 15% > (overall run time is in the 5 minutes range), with the > same memory consumption. I must admit that I'm surprised. I would have expected that most allocations in Python go through obmalloc, so the heap would only see "large" allocations. It would be interesting to find out, in your application, why it is still an improvement to use the low-fragmentation heaps. Regards, Martin From allison at sumeru.stanford.EDU Thu Feb 17 01:06:24 2005 From: allison at sumeru.stanford.EDU (Dennis Allison) Date: Thu Feb 17 01:06:31 2005 Subject: [Python-Dev] string find(substring) vs. substring in string In-Reply-To: <4213B654.7070901@xs4all.nl> Message-ID: Boyer-Moore and variants need a bit of preprocessing on the pattern which makes them great for long patterns but more costly for short ones. On Wed, 16 Feb 2005, Irmen de Jong wrote: > Mike Brown wrote: > > Fredrik Lundh wrote: > > > >>any special reason why "in" is faster if the substring is found, but > >>a lot slower if it's not in there? > > > > > > Just guessing here, but in general I would think that it would stop searching > > as soon as it found it, whereas until then, it keeps looking, which takes more > > time. But I would also hope that it would be smart enough to know that it > > doesn't need to look past the 2nd character in 'not the xyz' when it is > > searching for 'not there' (due to the lengths of the sequences). > > There's the Boyer-Moore string search algorithm which is > allegedly much faster than a simplistic scanning approach, > and I also found this: http://portal.acm.org/citation.cfm?id=79184 > So perhaps there's room for improvement :) > > --Irmen > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/allison%40sumeru.stanford.edu > From ejones at uwaterloo.ca Thu Feb 17 02:26:16 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Thu Feb 17 02:26:22 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <4213DA4D.8090502@v.loewis.de> References: <4213DA4D.8090502@v.loewis.de> Message-ID: On Feb 16, 2005, at 18:42, Martin v. L?wis wrote: > I must admit that I'm surprised. I would have expected > that most allocations in Python go through obmalloc, so > the heap would only see "large" allocations. > > It would be interesting to find out, in your application, > why it is still an improvement to use the low-fragmentation > heaps. Hmm... This is an excellent point. A grep through the Python source code shows that the following files call the native system malloc (I've excluded a few obviously platform specific files). A quick visual inspection shows that most of these are using it to allocate some sort of array or string, so it likely *should* go through the system malloc. Gfeller, any idea if you are using any of the modules on this list? 
If so, it would be pretty easy to try converting them to call the obmalloc functions instead, and see how that affects the performance. Evan Jones Demo/pysvr/pysvr.c Modules/_bsddb.c Modules/_curses_panel.c Modules/_cursesmodule.c Modules/_hotshot.c Modules/_sre.c Modules/audioop.c Modules/bsddbmodule.c Modules/cPickle.c Modules/cStringIO.c Modules/getaddrinfo.c Modules/main.c Modules/pyexpat.c Modules/readline.c Modules/regexpr.c Modules/rgbimgmodule.c Modules/svmodule.c Modules/timemodule.c Modules/zlibmodule.c PC/getpathp.c Python/strdup.c Python/thread.c From greg.ewing at canterbury.ac.nz Thu Feb 17 03:27:09 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu Feb 17 03:27:24 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <000701c512e5$7de81660$af0189c3@oemcomputer> References: <000701c512e5$7de81660$af0189c3@oemcomputer> Message-ID: <421400FD.8090303@canterbury.ac.nz> Richard Brodie wrote: > > Otherwise, unless I misunderstand integer unification, one would > just have to strike the distinction between, say, %d and %u. Couldn't that be done anyway? The distinction really only makes sense in C, where there's no way of knowing whether the value is signed or unsigned otherwise. In Python the value itself knows whether it's signed or not. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From gvanrossum at gmail.com Thu Feb 17 07:22:40 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 17 07:22:43 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <421400FD.8090303@canterbury.ac.nz> References: <000701c512e5$7de81660$af0189c3@oemcomputer> <421400FD.8090303@canterbury.ac.nz> Message-ID: > > Otherwise, unless I misunderstand integer unification, one would > > just have to strike the distinction between, say, %d and %u. > > Couldn't that be done anyway? The distinction really only > makes sense in C, where there's no way of knowing whether > the value is signed or unsigned otherwise. In Python the > value itself knows whether it's signed or not. The time machine is at your service: in Python 2.4 there's no difference. That's integer unification for you! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at electricrain.com Thu Feb 17 07:53:30 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Thu Feb 17 07:53:51 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108340374.3768.33.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> Message-ID: <20050217065330.GP25441@zot.electricrain.com> fyi - i've updated the python sha1/md5 openssl patch. it now replaces the entire sha and md5 modules with a generic hashes module that gives access to all of the hash algorithms supported by OpenSSL (including appropriate legacy interface wrappers and falling back to the old code when compiled without openssl). 
https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 I don't quite like the module name 'hashes' that i chose for the generic interface (too close to the builtin hash() function). Other suggestions on a module name? 'digest' comes to mind. -greg From fredrik at pythonware.com Thu Feb 17 10:12:19 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 10:12:16 2005 Subject: [Python-Dev] Re: license issues with profiler.py and md5.h/md5c.c References: <1108090248.3753.53.camel@schizo><226e9c65e562f9b0439333053036fef3@redivi.com><1108102539.3753.87.camel@schizo><20050211175118.GC25441@zot.electricrain.com><00c701c5108e$f3d0b930$24ed0ccb@apana.org.au><5d300838ef9716aeaae53579ab1f7733@redivi.com><013501c510ae$2abd7360$24ed0ccb@apana.org.au><20050212133721.GA13429@rogue.amk.ca><20050212210402.GE25441@zot.electricrain.com><1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: "Gregory P. Smith" wrote: > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. hashtools, hashlib, and _hash are common names for helper modules like this. (you still provide md5 and sha wrappers, I hope) From mwh at python.net Thu Feb 17 11:51:35 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 11:51:37 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae05021613553afaaa2f@mail.gmail.com> (Tim Peters's message of "Wed, 16 Feb 2005 16:55:27 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> Message-ID: <2mzmy3zcg8.fsf@starship.python.net> Tim Peters writes: > Rev 2.66 of funcobject.c made func.__name__ writable for the first > time. That's great, but the patch also introduced what I'm pretty > sure was an unintended incompatibility: after 2.66, func.__name__ was > no longer *readable* in restricted execution mode. Yeah, my bad. > I can't think of a good reason to restrict reading func.__name__, > and it looks like this part of the change was an accident. So, > unless someone objects soon, I intend to restore that func.__name__ > is readable regardless of execution mode (but will continue to be > unwritable in restricted execution mode). > > Objections? Well, I fixed it on reading the bug report and before getting to python-dev mail :) Sorry if this duplicated your work, but hey, it was only a two line change... Cheers, mwh -- The only problem with Microsoft is they just have no taste. -- Steve Jobs, (From _Triumph of the Nerds_ PBS special) and quoted by Aahz on comp.lang.python From astrand at lysator.liu.se Thu Feb 17 13:22:03 2005 From: astrand at lysator.liu.se (Peter Astrand) Date: Thu Feb 17 13:22:14 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) Message-ID: I'd like to have your opinion on this bug. Personally, I'd prefer to keep test_no_leaking as it is, but if you think otherwise... One thing that actually can motivate that test_subprocess takes 20% of the overall time is that this test is a good generic Python stress test - this test might catch some other startup race condition, for example. 
Regards, ?strand ---------- Forwarded message ---------- Date: Thu, 17 Feb 2005 04:09:33 -0800 From: SourceForge.net To: noreply@sourceforge.net Subject: [ python-Bugs-1124637 ] test_subprocess is far too slow Bugs item #1124637, was opened at 2005-02-17 11:10 Message generated for change (Comment added) made by mwh You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1124637&group_id=5470 Category: Python Library Group: Python 2.4 Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Peter ?strand (astrand) Summary: test_subprocess is far too slow Initial Comment: test_subprocess takes multiple minutes. I'm pretty sure it's "test_no_leaking". It should either be sped up or only tested when some -u argument is passed to regrtest. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2005-02-17 12:09 Message: Logged In: YES user_id=6656 Bog standard linux pc -- p3 933, 384 megs of ram. "$ time ./python ../Lib/test/regrtest.py test_subprocess" reports 2 minutes 7. This is a debug build, a release build might be quicker. A run of the entire test suite takes a hair over nine minutes, so 20-odd % of the time seems to be test_subprocess. It also takes ages on my old-ish ibook (600 Mhz G3, also 384 megs of ram), but that's at home and I can't time it. ---------------------------------------------------------------------- Comment By: Peter ?strand (astrand) Date: 2005-02-17 11:50 Message: Logged In: YES user_id=344921 Tell me a bit about your type of OS and hardware. On my machine (P4 2.66 GHz with Linux), the test takes 28 seconds. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1124637&group_id=5470 From ncoghlan at iinet.net.au Thu Feb 17 15:15:46 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Thu Feb 17 15:15:50 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214A712.6090107@iinet.net.au> Peter Astrand wrote: > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > test_no_leaking as it is, but if you think otherwise... > > One thing that actually can motivate that test_subprocess takes 20% of the > overall time is that this test is a good generic Python stress test - this > test might catch some other startup race condition, for example. test_decimal has a short version which tests basic functionality and always runs, but enabling -udecimal also runs the specification tests (which take a fair bit longer). So keeping the basic subprocess tests unconditional, and running the long ones only if -uall or -usubprocess are given would seem reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From fredrik at pythonware.com Thu Feb 17 15:19:24 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 15:19:58 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) References: <4214A712.6090107@iinet.net.au> Message-ID: Nick Coghlan wrote: >> One thing that actually can motivate that test_subprocess takes 20% of the >> overall time is that this test is a good generic Python stress test - this >> test might catch some other startup race condition, for example. 
> > test_decimal has a short version which tests basic functionality and always runs, but > enabling -udecimal also runs the specification tests (which take a fair bit longer). > > So keeping the basic subprocess tests unconditional, and running the long ones only if -uall > or -usubprocess are given would seem reasonable. does anyone ever use the -u options when running tests? From mwh at python.net Thu Feb 17 15:30:06 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 15:30:41 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: (Fredrik Lundh's message of "Thu, 17 Feb 2005 15:19:24 +0100") References: <4214A712.6090107@iinet.net.au> Message-ID: <2mll9nz2c1.fsf@starship.python.net> "Fredrik Lundh" writes: > Nick Coghlan wrote: > >>> One thing that actually can motivate that test_subprocess takes 20% of the >>> overall time is that this test is a good generic Python stress test - this >>> test might catch some other startup race condition, for example. >> >> test_decimal has a short version which tests basic functionality and always runs, but >> enabling -udecimal also runs the specification tests (which take a fair bit longer). >> >> So keeping the basic subprocess tests unconditional, and running the long ones only if -uall >> or -usubprocess are given would seem reasonable. > > does anyone ever use the -u options when running tests? Yes, occasionally. Esp. with test_compiler a testall run is an overnight job but I try to do it every now and again. Cheers, mwh -- If design space weren't so vast, and the good solutions so small a portion of it, programming would be a lot easier. -- maney, comp.lang.python From tim.peters at gmail.com Thu Feb 17 15:43:20 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:43:55 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae050217064337532915@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are very visible in the Zope world, due to auto-generated test runner failure reports)? From tim.peters at gmail.com Thu Feb 17 15:43:20 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:45:14 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae050217064337532915@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are very visible in the Zope world, due to auto-generated test runner failure reports)? 
From tim.peters at gmail.com Thu Feb 17 15:56:14 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 15:56:16 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mzmy3zcg8.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> Message-ID: <1f7befae05021706564914b901@mail.gmail.com> [Michael Hudson] > ... > Well, I fixed it on reading the bug report and before getting to > python-dev mail :) Sorry if this duplicated your work, but hey, it was > only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time . Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are visible in the Zope world, due to auto-generated test runner failure reports; alas, this is in a new test, and 2.4 worked fine with the Zope tests as they were when 2.4 was released)? From mwh at python.net Thu Feb 17 15:55:23 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 16:15:42 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae050217064337532915@mail.gmail.com> (Tim Peters's message of "Thu, 17 Feb 2005 09:43:20 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> Message-ID: <2mfyzvz15w.fsf@starship.python.net> Tim Peters writes: > [Michael Hudson] >> ... >> Well, I fixed it on reading the bug report and before getting to >> python-dev mail :) Sorry if this duplicated your work, but hey, it was >> only a two line change... > > Na, the real work was tracking it down in the bowels of Zope's C-coded > security machinery -- we'll let you do that part next time . > > Did you add a test to ensure this remains fixed? Yup. > A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are > very visible in the Zope world, due to auto-generated test runner > failure reports)? No, I'll do that now. I'm not very good at remembering NEWS blurbs... Cheers, mwh -- 6. The code definitely is not portable - it will produce incorrect results if run from the surface of Mars. -- James Bonfield, http://www.ioccc.org/2000/rince.hint From tim.peters at gmail.com Thu Feb 17 16:17:22 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 16:17:27 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: References: <4214A712.6090107@iinet.net.au> Message-ID: <1f7befae05021707171476f540@mail.gmail.com> [Fredrik Lundh] > does anyone ever use the -u options when running tests? Yes -- I routinely do -uall, under both release and debug builds, but only on Windows. WinXP in particular seems to do a good job when hyper-threading is available -- running the tests doesn't slow down anything else I'm doing, except during the disk-intensive tests (test_largefile is a major pig on Windows). From anthony at interlink.com.au Thu Feb 17 16:24:35 2005 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu Feb 17 16:25:11 2005 Subject: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: References: <4214A712.6090107@iinet.net.au> Message-ID: <200502180224.36851.anthony@interlink.com.au> On Friday 18 February 2005 01:19, Fredrik Lundh wrote: > > does anyone ever use the -u options when running tests? 
I use "make testall" (which invokes with -uall) regularly, and turn on specific options when they're testing something I'm working with. -- Anthony Baxter It's never too late to have a happy childhood. From tim.peters at gmail.com Thu Feb 17 16:25:50 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 16:25:53 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <2mfyzvz15w.fsf@starship.python.net> References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> <2mfyzvz15w.fsf@starship.python.net> Message-ID: <1f7befae05021707252136573e@mail.gmail.com> [sorry for the near-duplicate msgs -- looks like gmail lied when it claimed the first msg was still in "draft" status] >> Did you add a test to ensure this remains fixed? [mwh] > Yup. Bless you. Did you attach a contributor agreement and mark the test as being contributed under said contributor agreement, adjacent to your valid copyright notice ? >> A NEWS blurb ...? > No, I'll do that now. I'm not very good at remembering NEWS blurbs... LOL -- sorry, I'm just imagining what NEWS would look like if we required a contributor-agreement notification on each blurb. I appreciate your work here, and will try to find a drug to counteract the ones I appear to have overdosed on this morning ... From mwh at python.net Thu Feb 17 16:29:12 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 16:29:14 2005 Subject: [Python-Dev] 2.4 func.__name__ breakage In-Reply-To: <1f7befae05021707252136573e@mail.gmail.com> (Tim Peters's message of "Thu, 17 Feb 2005 10:25:50 -0500") References: <1f7befae05021613553afaaa2f@mail.gmail.com> <2mzmy3zcg8.fsf@starship.python.net> <1f7befae050217064337532915@mail.gmail.com> <2mfyzvz15w.fsf@starship.python.net> <1f7befae05021707252136573e@mail.gmail.com> Message-ID: <2m8y5nyzlj.fsf@starship.python.net> Tim Peters writes: > [sorry for the near-duplicate msgs -- looks like gmail lied when it claimed the > first msg was still in "draft" status] > >>> Did you add a test to ensure this remains fixed? > > [mwh] >> Yup. > > Bless you. Did you attach a contributor agreement and mark the test > as being contributed under said contributor agreement, adjacent to > your valid copyright notice ? Fortunately 2 lines < 25 lines, so I think I'm safe on this one :) Cheers, mwh -- glyph: I don't know anything about reality. -- from Twisted.Quotes From gvanrossum at gmail.com Thu Feb 17 16:30:58 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Thu Feb 17 16:31:00 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > test_no_leaking as it is, but if you think otherwise... > > One thing that actually can motivate that test_subprocess takes 20% of the > overall time is that this test is a good generic Python stress test - this > test might catch some other startup race condition, for example. A suite of unit tests is a precious thing. We want to test as much as we can, and as thoroughly as possible; but at the same time we want the test to run reasonably fast. If the test takes too long, human nature being what it is, this will actually cause less thorough testing because developers don't feel like running the test suite after each small change, and then we get frequent problems where someone breaks the build because they couldn't wait to run the unit test. 
(For example, where I work we have a Java test suite that takes 25 minutes to run. The build is broken on a daily basis by developers (including me) who make a small change and check it in believing it won't break anything.) The Python test suite already has a way (the -u flag) to distinguish between "regular" broad-coverage testing and deep coverage for specific (or all) areas. Let's keep the really long-running tests out of the regular test suite. There used to be a farm of machines that did nothing but run the test suite ("snake-farm"). This seems to have stopped (it was run by volunteers at a Swedish university). Maybe we should revive such an effort, and make sure it runs with -u all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From astrand at lysator.liu.se Thu Feb 17 16:52:12 2005 From: astrand at lysator.liu.se (Peter Astrand) Date: Thu Feb 17 16:52:24 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: On Thu, 17 Feb 2005, Guido van Rossum wrote: > > I'd like to have your opinion on this bug. Personally, I'd prefer to keep > > test_no_leaking as it is, but if you think otherwise... > A suite of unit tests is a precious thing. We want to test as much as > we can, and as thoroughly as possible; but at the same time we want > the test to run reasonably fast. If the test takes too long, human > nature being what it is, this will actually cause less thorough > testing because developers don't feel like running the test suite > after each small change, and then we get frequent problems where Good point. > The Python test suite already has a way (the -u flag) to distinguish > between "regular" broad-coverage testing and deep coverage for > specific (or all) areas. Let's keep the really long-running tests out > of the regular test suite. I'm convinced. Is this easy to implement? Anyone interested in doing this? > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. Yes, Snake Farm is/was a project at "Lysator", an academic computer society located at Linkoping University. As you can tell from my mail address, I'm a member as well. I haven't been involved in the Snake Farm project, though. /Peter ?strand From python at rcn.com Thu Feb 17 17:02:54 2005 From: python at rcn.com (Raymond Hettinger) Date: Thu Feb 17 17:06:54 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: Message-ID: <002301c5150a$24760de0$3bbd2c81@oemcomputer> > Let's keep the really long-running tests out > of the regular test suite. For test_subprocess, consider adopting the technique used by test_decimal. When -u decimal is not specified, a small random selection of the resource intensive tests are run. That way, all of the tests eventually get run even if no one is routinely using -u all. Raymond From skip at pobox.com Thu Feb 17 17:19:35 2005 From: skip at pobox.com (Skip Montanaro) Date: Thu Feb 17 17:17:40 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? Message-ID: <16916.50199.723442.36695@montanaro.dyndns.org> I am frantically trying to get ready to be out of town for a week of vacation. Someone sent me some patches for datetime and asked me to look at them. I begged off but referred him to http://www.python.org/dev/ and made mention of the five patch review idea. 
Can someone make sure that's explained on the /dev/ site? Thx, Skip From walter at livinglogic.de Thu Feb 17 17:22:25 2005 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Feb 17 17:22:28 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214C4C1.5070309@livinglogic.de> Guido van Rossum wrote: > [...] > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. I've changed the job that produces the data for http://coverage.livinglogic.de/ to run python Lib/test/regrtest.py -uall -T -N Unfortunately this job currently produces only coverage info, the output of the test suite is thrown away. It should be easy to fix this, so that the output gets put into the database. Bye, Walter D?rwald From mwh at python.net Thu Feb 17 18:11:19 2005 From: mwh at python.net (Michael Hudson) Date: Thu Feb 17 18:11:22 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: <002301c5150a$24760de0$3bbd2c81@oemcomputer> (Raymond Hettinger's message of "Thu, 17 Feb 2005 11:02:54 -0500") References: <002301c5150a$24760de0$3bbd2c81@oemcomputer> Message-ID: <2m3bvvyuvc.fsf@starship.python.net> "Raymond Hettinger" writes: >> Let's keep the really long-running tests out >> of the regular test suite. > > For test_subprocess, consider adopting the technique used by > test_decimal. When -u decimal is not specified, a small random > selection of the resource intensive tests are run. That way, all of the > tests eventually get run even if no one is routinely using -u all. I do like this strategy but I don't think it applies to this test -- it has to try to create more than 'ulimit -n' processes, if I understand it correctly. Which makes me think there might be other ways to write the test if the resource module is available... Cheers, mwh -- 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From tim.peters at gmail.com Thu Feb 17 18:26:36 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 18:26:40 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd) In-Reply-To: <2m3bvvyuvc.fsf@starship.python.net> References: <002301c5150a$24760de0$3bbd2c81@oemcomputer> <2m3bvvyuvc.fsf@starship.python.net> Message-ID: <1f7befae05021709266fbc542d@mail.gmail.com> [Raymond Hettinger] >> For test_subprocess, consider adopting the technique used by >> test_decimal. When -u decimal is not specified, a small random >> selection of the resource intensive tests are run. That way, all of the >> tests eventually get run even if no one is routinely using -u all. [Michael Hudson] > I do like this strategy but I don't think it applies to this test -- > it has to try to create more than 'ulimit -n' processes, if I > understand it correctly. Which makes me think there might be other > ways to write the test if the resource module is available... Aha! That explains why test_subprocess runs so much faster on Windows despite that Windows process-creation time is measured in geological eras: test_no_leaking special-cases Windows to do only 65 iterations instead of 1026. 
It's easy to put that under control of a -u option instead; e.g., instead of max_handles = 1026 if mswindows: max_handles = 65 just use 1026 all the time, and stuff, e.g., if not test_support.is_resource_enabled("subprocess"): return at the start of test_no_leaking(). From aahz at pythoncraft.com Thu Feb 17 18:33:46 2005 From: aahz at pythoncraft.com (Aahz) Date: Thu Feb 17 18:33:50 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <16916.50199.723442.36695@montanaro.dyndns.org> References: <16916.50199.723442.36695@montanaro.dyndns.org> Message-ID: <20050217173346.GB18117@panix.com> On Thu, Feb 17, 2005, Skip Montanaro wrote: > > I am frantically trying to get ready to be out of town for a > week of vacation. Someone sent me some patches for datetime > and asked me to look at them. I begged off but referred him to > http://www.python.org/dev/ and made mention of the five patch review > idea. Can someone make sure that's explained on the /dev/ site? This should go into Brett's survey of the Python dev process, not as official documentation. It's simply an offer made by some of the prominent members of python-dev. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From arigo at tunes.org Thu Feb 17 19:11:19 2005 From: arigo at tunes.org (Armin Rigo) Date: Thu Feb 17 19:14:50 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050214074122b715a@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> Message-ID: <20050217181119.GA3055@vicky.ecs.soton.ac.uk> Hi Tim, On Mon, Feb 14, 2005 at 10:41:35AM -0500, Tim Peters wrote: > # This is a puzzle: there's no way to know the natural width of > # addresses on this box (in particular, there's no necessary > # relation to sys.maxint). Isn't this natural width nowadays available as: 256 ** struct.calcsize('P') ? Armin From tim.peters at gmail.com Thu Feb 17 19:44:11 2005 From: tim.peters at gmail.com (Tim Peters) Date: Thu Feb 17 19:44:16 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <20050217181119.GA3055@vicky.ecs.soton.ac.uk> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> Message-ID: <1f7befae050217104431312214@mail.gmail.com> [Tim Peters] >> # This is a puzzle: there's no way to know the natural width of >> # addresses on this box (in particular, there's no necessary >> # relation to sys.maxint). [Armin Rigo] > Isn't this natural width nowadays available as: > > 256 ** struct.calcsize('P') > > ? Looks right to me -- cool! I never used struct's 'P' format because it always appeared useless to me: even if I could ship pointers across processes or boxes, there's not much I could do with them after getting integers back from unpack(). But silly me! I'm sure Guido put it there anticipating the need for calcsize('P') when making a positive_id() function in Python. Now if you'll just sign and fax a Zope contributor agreement, I'll upgrade ZODB to use this slick trick . From fredrik at pythonware.com Thu Feb 17 21:21:38 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu Feb 17 21:21:43 2005 Subject: [Python-Dev] Re: Re: string find(substring) vs. 
substring in string References: <000001c51473$df4717a0$8d2acb97@oemcomputer> Message-ID: Raymond Hettinger wrote: > > but refactoring the contains code to use find_internal sounds like a good > > first step. any takers? > > I'm up for it. excellent! just fyi, unless my benchmark is mistaken, the Unicode implementation has the same problem: str in -> 25.8 µsec per loop unicode in -> 26.8 µsec per loop str.find() -> 6.73 µsec per loop unicode.find() -> 7.24 µsec per loop oddly enough, if I change the target string so it doesn't contain any partial matches at all, unicode.find() wins the race: str in -> 24.5 µsec per loop unicode in -> 24.6 µsec per loop str.find() -> 2.86 µsec per loop unicode.find() -> 2.16 µsec per loop From bac at OCF.Berkeley.EDU Thu Feb 17 21:22:29 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Thu Feb 17 21:22:44 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <20050217173346.GB18117@panix.com> References: <16916.50199.723442.36695@montanaro.dyndns.org> <20050217173346.GB18117@panix.com> Message-ID: <4214FD05.7020203@ocf.berkeley.edu> [removed pydotorg from people receiving this email] Aahz wrote: > On Thu, Feb 17, 2005, Skip Montanaro wrote: > >>I am frantically trying to get ready to be out of town for a >>week of vacation. Someone sent me some patches for datetime >>and asked me to look at them. I begged off but referred him to >>http://www.python.org/dev/ and made mention of the five patch review >>idea. Can someone make sure that's explained on the /dev/ site? > > > This should go into Brett's survey of the Python dev process, not as > official documentation. It's simply an offer made by some of the > prominent members of python-dev. I am planning on adding that blurb in there. Actually, while I have everyone's attention, I might as well throw an idea out there about sprucing up yet again the docs on contributing. I was thinking of taking the current dev intro and have it just explain how things basically work around here. So the doc would become more of just a high-level overview of how we dev the language. But I would cut out the helping out section and spin that into another doc that would go into some more detail on how to make a contribution. So this would specify in more detail how to report a bug, how to comment on one, etc. (same goes for patches). This is where I would stick the 5-for-1 deal. Lastly, write up a doc that covers what one with CVS checkin rights needs to do when checking in code. So how one goes about getting checkin rights, getting initial checkins OK'ed by others, and then the usual steps taken for a checkin. Sound worth it to people? Not really needed so go back and do your homework, Brett? What? -Brett From Jack.Jansen at cwi.nl Thu Feb 17 21:46:03 2005 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Thu Feb 17 21:46:03 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libimp.tex, 1.36, 1.36.2.1 libsite.tex, 1.26, 1.26.4.1 libtempfile.tex, 1.22, 1.22.4.1 libos.tex, 1.146.2.1, 1.146.2.2 In-Reply-To: References: Message-ID: On 14-feb-05, at 10:23, Just van Rossum wrote: > bcannon@users.sourceforge.net wrote: > >> \begin{datadesc}{PY_RESOURCE} >> -The module was found as a Macintosh resource. This value can only be >> -returned on a Macintosh. >> +The module was found as a Mac OS 9 resource. This value can only be >> +returned on a Mac OS 9 or earlier Macintosh. 
>> \end{datadesc} > > not entirely true: it's limited to the sa called "OS9" version of > MacPython, which happily runs natively on OSX as a Carbon app... But as of 2.4 there's no such thing as MacPython-OS9 any more. But as the constant is still in there I thought it best to document it. -- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From walter at livinglogic.de Thu Feb 17 23:22:20 2005 From: walter at livinglogic.de (=?iso-8859-1?Q?Walter_D=F6rwald?=) Date: Thu Feb 17 23:22:22 2005 Subject: [Python-Dev] Negative indices in UserString.MutableString Message-ID: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> Currently UserString.MutableString does not support negative indices: >>> import UserString >>> UserString.MutableString("foo")[-1] = "bar" Traceback (most recent call last): File "", line 1, in ? File "/home/Python-test/dist/src/Lib/UserString.py", line 149, in __setitem__ if index < 0 or index >= len(self.data): raise IndexError IndexError Should this be fixed so that negative value are treated as being relative to the end? Bye, Walter D?rwald From aahz at pythoncraft.com Thu Feb 17 23:23:36 2005 From: aahz at pythoncraft.com (Aahz) Date: Thu Feb 17 23:23:37 2005 Subject: [Python-Dev] Negative indices in UserString.MutableString In-Reply-To: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> References: <1543.84.56.105.228.1108678940.squirrel@isar.livinglogic.de> Message-ID: <20050217222336.GA18285@panix.com> On Thu, Feb 17, 2005, Walter D?rwald wrote: > > Currently UserString.MutableString does not support negative indices: > > >>> import UserString > >>> UserString.MutableString("foo")[-1] = "bar" > Traceback (most recent call last): > File "", line 1, in ? > File "/home/Python-test/dist/src/Lib/UserString.py", line 149, in __setitem__ > if index < 0 or index >= len(self.data): raise IndexError > IndexError > > Should this be fixed so that negative value are treated as being > relative to the end? Yup! As usual, patches welcome. (Yes, I'm comfortable channeling Guido here.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From greg.ewing at canterbury.ac.nz Fri Feb 18 02:58:46 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri Feb 18 02:59:05 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050217104431312214@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> <1f7befae050217104431312214@mail.gmail.com> Message-ID: <42154BD6.4030001@canterbury.ac.nz> Tim Peters wrote: > Looks right to me -- cool! I never used struct's 'P' format because > it always appeared useless to me: But silly me! I'm sure Guido > put it there anticipating the need for calcsize('P') when making a > positive_id() function in Python. Smells like more time machine activity to me. Any minute now you'll find there's suddenly a positive_id() builtin that's been there ever since 1.3 or so. And the 'P' format, then always never having just become useful, will have unappeared... 
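For reference, a minimal sketch of the positive_id() helper being joked about here, built on Armin's 256 ** struct.calcsize('P') trick; the name is Tim's, not an actual builtin.

    import struct

    # Number of distinct values a native pointer can take, per Armin's trick.
    _ADDRESS_SPACE = 256 ** struct.calcsize('P')

    def positive_id(obj):
        """Return id(obj) as a non-negative integer (sketch only)."""
        result = id(obj)
        if result < 0:
            # id() can come back negative when the address has the sign bit
            # set; shift it into the unsigned range.
            result += _ADDRESS_SPACE
        return result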
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From nick at ilm.com Thu Feb 17 00:56:24 2005 From: nick at ilm.com (Nick Rasmussen) Date: Fri Feb 18 03:04:18 2005 Subject: [Python-Dev] subclassing PyCFunction_Type In-Reply-To: <5614e00fb134b968fa76a1896c456f4a@redivi.com> References: <5.1.1.6.0.20050216110025.02fb7e70@mail.telecommunity.com> <5614e00fb134b968fa76a1896c456f4a@redivi.com> Message-ID: <20050216235624.GO17806@ewok.lucasdigital.com> On Wed, 16 Feb 2005, Bob Ippolito wrote: > > On Feb 16, 2005, at 11:02, Phillip J. Eby wrote: > > >At 02:32 PM 2/11/05 -0800, Nick Rasmussen wrote: > >>tommy said that this would be the best place to ask > >>this question.... > >> > >>I'm trying to get functions wrapped via boost to show > >>up as builtin types so that pydoc includes them when > >>documenting the module containing them. Right now > >>boost python functions are created using a PyTypeObject > >>such that when inspect.isbuiltin does: > >> > >> return isinstance(object, types.BuiltinFunctionType) > > > >FYI, this may not be the "right" way to do this, but since 2.3 > >'isinstance()' looks at an object's __class__ rather than its type(), > >so you could perhaps include a '__class__' descriptor in your method > >type that returns BuiltinFunctionType and see if that works. > > > >It's a kludge, but it might let your code work with existing versions > >of Python. > > It works in Python 2.3.0: > That seemed to do the trick for me as well, I'll run it past the boost::python folks and see what they think. many thanks -nick From maalanen at ra.abo.fi Thu Feb 17 17:30:27 2005 From: maalanen at ra.abo.fi (Marcus Alanen) Date: Fri Feb 18 03:04:20 2005 Subject: [Python-Dev] [ python-Bugs-1124637 ] test_subprocess is far too slow (fwd) In-Reply-To: References: Message-ID: <4214C6A3.1000806@ra.abo.fi> Guido van Rossum wrote: > The Python test suite already has a way (the -u flag) to distinguish > between "regular" broad-coverage testing and deep coverage for > specific (or all) areas. Let's keep the really long-running tests out > of the regular test suite. > > There used to be a farm of machines that did nothing but run the test > suite ("snake-farm"). This seems to have stopped (it was run by > volunteers at a Swedish university). Maybe we should revive such an > effort, and make sure it runs with -u all. Hello Guido and everybody else, I hacked together a simple distributed unittest runner for our projects. Requirements are a NFS-mounted home directory across the slave nodes and SSH-based "automatic" authentication, i.e. no passwords or passphrases necessary. It officially works-for-me for around three hosts (see below) so that cuts the time down basically to a third (real-life example ~600 seconds to ~200 seconds, so it does work :-). It also supports "serialized tests", i.e. tests that must be run one after the other and cannot be run in parallel. http://mde.abo.fi/tools/disttest/ Comes with some problems; my blurb from advogato.org: """ Disttest is a distributed unittesting runner. You simply set the DISTTEST_HOSTS variable to a space-separated list of hostnames to connect to using SSH, and then run "disttest". The nodes must all have the same filesystem (usually an NFS-mounted /home) and have the Disttest program installed. 
You even gain a bit with just one computer by setting the variable to "localhost localhost". :-) There are currently two annoying problem with it, though. For some reason, 1) the unittest program connecting to the X server sometimes fails to provide the correct authentication, and 2) sometimes the actual connection to the X server can't be established. I think these are related to 1) congestion on the shared .Xauthority file, and 2) a too small listen() queue on the forwarding port by the SSH daemon. Both problems show up when using too many (over 4?) hosts, which is the whole point of the program! Sigh. """ Error checking probably bad. Anyway, feel free to check it out, modify, comment or anything. We're thinking of checking the assumptions in the blurb above, but no timetable is set. My guess is that the NFS-mounted home directory is the showstopper and people usually don't have lot's of machines hanging around, but that's for you to decide. Disclaimer: I don't know anything of CPython development nor of the tests in the CPython test suite. ;-) Best regards, and a big thank you for Python, Marcus From Martin.Gfeller at comit.ch Thu Feb 17 19:34:50 2005 From: Martin.Gfeller at comit.ch (Gfeller Martin) Date: Fri Feb 18 03:04:22 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% Message-ID: Hi, what immediately comes to mind are Modules/cPickle.c and Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which in turn is heavily used by the application). The lists also get fairly large, although not huge - up to typically 50000 (complex) objects in the tests I've measured. As I said, I don't speak C, so I can only speculate - do the lists at some point grow beyond the upper limit of obmalloc, but are handled by the LFH (which has a higher upper limit, if I understood Tim Peters correctly)? Best regards, Martin -----Original Message----- From: Evan Jones [mailto:ejones@uwaterloo.ca] Sent: Thursday, 17 Feb 2005 02:26 To: Python Dev Cc: Gfeller Martin; Martin v. L?wis Subject: Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% On Feb 16, 2005, at 18:42, Martin v. L?wis wrote: > I must admit that I'm surprised. I would have expected > that most allocations in Python go through obmalloc, so > the heap would only see "large" allocations. > > It would be interesting to find out, in your application, > why it is still an improvement to use the low-fragmentation > heaps. Hmm... This is an excellent point. A grep through the Python source code shows that the following files call the native system malloc (I've excluded a few obviously platform specific files). A quick visual inspection shows that most of these are using it to allocate some sort of array or string, so it likely *should* go through the system malloc. Gfeller, any idea if you are using any of the modules on this list? If so, it would be pretty easy to try converting them to call the obmalloc functions instead, and see how that affects the performance. 
Evan Jones Demo/pysvr/pysvr.c Modules/_bsddb.c Modules/_curses_panel.c Modules/_cursesmodule.c Modules/_hotshot.c Modules/_sre.c Modules/audioop.c Modules/bsddbmodule.c Modules/cPickle.c Modules/cStringIO.c Modules/getaddrinfo.c Modules/main.c Modules/pyexpat.c Modules/readline.c Modules/regexpr.c Modules/rgbimgmodule.c Modules/svmodule.c Modules/timemodule.c Modules/zlibmodule.c PC/getpathp.c Python/strdup.c Python/thread.c From tim.peters at gmail.com Fri Feb 18 04:38:08 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 18 04:38:14 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: Message-ID: <1f7befae050217193863ffc028@mail.gmail.com> [Gfeller Martin] > what immediately comes to mind are Modules/cPickle.c and > Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which in turn > is heavily used by the application). I probably guessed right the first time : LFH doesn't help with the lists directly, but helps indirectly by keeping smaller objects out of the general heap where the list guts actually live. Say we have a general heap with a memory map like this, meaning a contiguous range of available memory, where 'f' means a block is free. The units of the block don't really matter, maybe one 'f' is one byte, maybe one 'f' is 4MB -- it's all the same in the end: fffffffffffffffffffffffffffffffffffffffffffffff Now you allocate a relatively big object (like the guts of a large list), and it's assigned a contiguous range of blocks marked 'b': bbbbbbbbbbbbbbbffffffffffffffffffffffffffffffff Then you allocate a small object, marked 's': bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff The you want to grow the big object. Oops! It can't extend the block of b's in-place, because 's' is in the way. Instead it has to copy the whole darn thing: fffffffffffffffsbbbbbbbbbbbbbbbffffffffffffffff But if 's' is allocated from some _other_ heap, then the big object can grow in-place, and that's much more efficient than copying the whole thing. obmalloc has two primary effects: it manages a large number of very small (<= 256 bytes) memory chunks very efficiently, but it _also_ helps larger objects indirectly, by keeping the very small objects out of the platform C malloc's way. LFH appears to be an extension of the same basic idea, raising the "small object" limit to 16KB. Now note that pymalloc and LFH are *bad* ideas for objects that want to grow. pymalloc and LFH segregate the memory they manage into blocks of different sizes. For example, pymalloc keeps a list of free blocks each of which is exactly 64 bytes long. Taking a 64-byte block out of that list, or putting it back in, is very efficient. But if an object that uses a 64-byte block wants to grow, pymalloc can _never_ grow it in-place, it always has to copy it. That's a cost that comes with segregating memory by size, and for that reason Python deliberately doesn't use pymalloc in several cases where objects are expected to grow over time. One thing to take from that is that LFH can't be helping list-growing in a direct way either, if LFH (as seems likely) also needs to copy objects that grow in order to keep its internal memory segregated by size. The indirect benefit is still available, though: LFH may be helping simply by keeping smaller objects out of the general heap's hair. > The lists also get fairly large, although not huge - up to typically 50000 > (complex) objects in the tests I've measured. That's much larger than LFH can handle. Its limit is 16KB. 
A Python list with 50K elements requires a contiguous chunk of 200KB on a 32-bit machine to hold the list guts. > As I said, I don't speak C, so I can only speculate - do the lists at some point >grow beyond the upper limit of obmalloc, but are handled by the LFH (which has a > higher upper limit, if I understood Tim Peters correctly)? A Python list object comprises two separately allocated pieces of memory. First is a list header, a small piece of memory of fixed size, independent of len(list). The list header is always obtained from obmalloc; LFH will never be involved with that, and neither will the system malloc. The list header has a pointer to a separate piece of memory, which contains the guts of a list, a contiguous vector of len(list) pionters (to Python objects). For a list of length n, this needs 4*n bytes on a 32-bit box. obmalloc never manages that space, and for the reason given above: we expect that list guts may grow, and obmalloc is meant for fixed-size chunks of memory. So the list guts will get handled by LFH, until the list needs more than 4K entries (hitting the 16KB LFH limit). Until then, LFH probably wastes time by copying growing list guts from size class to size class. Then the list guts finally get copied to the general heap, and stay there. I'm afraid the only you can know for sure is by obtaining detailed memory maps and analyzing them. From abo at minkirri.apana.org.au Fri Feb 18 05:09:51 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Fri Feb 18 05:10:35 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <20050217065330.GP25441@zot.electricrain.com> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: <1108699791.3758.98.camel@schizo> On Wed, 2005-02-16 at 22:53 -0800, Gregory P. Smith wrote: > fyi - i've updated the python sha1/md5 openssl patch. it now replaces > the entire sha and md5 modules with a generic hashes module that gives > access to all of the hash algorithms supported by OpenSSL (including > appropriate legacy interface wrappers and falling back to the old code > when compiled without openssl). > > https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 > > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. I just had a quick look, and have these comments (psedo patch review?). Apologies for the noise on the list... DESCRIPTION =========== This patch keeps the current md5c.c, md5module.c files and adds the following; _hashopenssl.c, hashes.py, md5.py, sha.py. The old md5 and sha extension modules get replaced by hashes.py, md5.py, and sha.py python modules that leverage off _hash (openssl) or _md5 and _sha (no openssl) extension modules. The new _hash extension module "wraps" the high level openssl EVP interface, which uses a string parameter to indicate what type of message digest algorithm to use. 
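Roughly, the layout described here amounts to something like the following sketch (module and function names are taken from the description above, so treat them as assumptions; the real patch differs in detail):

    # hashes.py -- sketch of the wrapper-with-fallback arrangement described
    try:
        import _hash                      # OpenSSL-backed EVP wrapper
    except ImportError:
        _hash = None
        import _md5, _sha                 # the renamed legacy extension modules

    def new(name, string=""):
        if _hash is not None:
            return _hash.new(name, string)
        if name == "md5":
            return _md5.new(string)
        if name in ("sha", "sha1"):
            return _sha.new(string)
        raise ValueError("unsupported hash type: %r" % (name,))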
The advantage of this is it makes all openssl supported digests available, and if openssl adds more, we get them for free. A disadvantage of this is it is an abstraction level above the actual md5 and sha implementations, and this may add overheads. These overheads are probably negligible compared to the actual implementation speedups. The new _md5 and _sha extension modules are simply re-named versions of the old md5 and sha modules. The hashes.py module acts as an import wrapper for _hash, and falls back to using _md5 and _sha modules if _hash is not available. It provides an EVP style API (string hash name parameter), that supports only md5 and sha hashes if openssl is not available. The new md5.py and sha.py modules simply use hash.py. COMMENTS ======== The introduction of a "hashes" module with a new API that supports many different digests (provided openssl is available) is extending Python, not just "fixing the licenses" of md5 and sha modules. If all we wanted to do was fix the md5 module, a simpler solution would be to change the md5c.c API to match openssl's implementation, and make md5module.c use it, conditionally compiling against md5c.c or linking against openssl in setup.py. A similar approach could be used for sha, but would require stripping the sha implementation out of shamodule.c I am mildly of concerned about the namespace/filespace clutter introduced by this implementation... it feels unnecessary, as does the tangled dependencies between them. With openssl, hashes.py duplicates the functionality of _hash. Without openssl, md5.py and sha.py duplicate _md5 and _sha, via a roundabout route through hash.py. The python wrappers seem overly complicated, with things like def new(name, string=None): if string: return _hash.new(name) else: return _hash.new.(name,string) being common where the following would suffice; def new(name,string=""): return _hash.new(name,string) I think this is because _hash.new() uses an optional string parameter, but I have a feeling a C update with a zero length string is faster than this Python if. If it was a concern, the C implementation could check the value of the string length before calling update. Given the convenience methods for different hashes in hashes.py (which incidentally look like they are only available when _hash is not available... something else that needs fixing), the md5.py module could be simply coded as; from hashes import md5 new = md5 Despite all these nit-picks, it looks pretty good. It is orders of magnitude better than any of the other non-existent solutions, including the one I didn't code :-) -- Donovan Baarda http://minkirri.apana.org.au/~abo/ From raymond.hettinger at verizon.net Fri Feb 18 07:53:37 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Fri Feb 18 07:57:43 2005 Subject: [Python-Dev] Prospective Peephole Transformation Message-ID: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Based on some ideas from Skip, I had tried transforming the likes of "x in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it substantially simplified the generated code and converted the O(n) lookup into an O(1) step. There were substantial savings even if the set contained only a single entry. When disassembled, the bytecode is not only much shorter, it is also much more readable (corresponding almost directly to the original source). The problem with the transformation was that it didn't handle the case where x was non-hashable and it would raise a TypeError instead of returning False as it should. 
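The wrinkle in a few lines (not code from the patch, just the semantics at issue): membership in a tuple only needs __eq__, while membership in a set hashes the left operand first.

    x = []                                # a list is unhashable
    assert (x in (1, 2, 3)) is False      # tuple 'in' just compares -> False
    try:
        x in frozenset([1, 2, 3])         # set 'in' hashes x first ...
    except TypeError:
        pass                              # ... so the naive rewrite raises instead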
That situation arose once in the email module's test suite. To get it to work, I would have to introduce a frozenset subtype: class Searchset(frozenset): def __contains__(self, element): try: return frozenset.__contains__(self, element) except TypeError: return False Then, the transformation would be "x in Searchset([1, 2, 3])". Since the new Searchset object goes in the constant table, marshal would have to be taught how to save and restore the object. This is a more complicated than the original frozenset version of the patch, so I would like to get feedback on whether you guys think it is worth it. Raymond Hettinger From fredrik at pythonware.com Fri Feb 18 09:18:31 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 09:18:40 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: Raymond Hettinger wrote: > Based on some ideas from Skip, I had tried transforming the likes of "x > in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it > substantially simplified the generated code and converted the O(n) > lookup into an O(1) step. There were substantial savings even if the > set contained only a single entry. savings in what? time or bytecode size? constructed micro-benchmarks, or examples from real-life code? do we have any statistics on real-life "n" values? From martin at v.loewis.de Fri Feb 18 10:06:24 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri Feb 18 10:06:28 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <1108699791.3758.98.camel@schizo> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> Message-ID: <4215B010.2090600@v.loewis.de> Donovan Baarda wrote: > This patch keeps the current md5c.c, md5module.c files and adds the > following; _hashopenssl.c, hashes.py, md5.py, sha.py. [...] > If all we wanted to do was fix the md5 module If we want to fix the licensing issues with the md5 module, this patch does not help at all, as it keeps the current md5 module (along with its licensing issues). So any patch to solve the problem will need to delete the code with the questionable license. Then, the approach in the patch breaks the promise that the md5 module is always there. It would require that OpenSSL is always there - a promise that we cannot make (IMO). Regards, Martin From arigo at tunes.org Fri Feb 18 12:36:08 2005 From: arigo at tunes.org (Armin Rigo) Date: Fri Feb 18 12:39:37 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <1f7befae050217104431312214@mail.gmail.com> References: <4210AFAA.9060108@thule.no> <1f7befae050214074122b715a@mail.gmail.com> <20050217181119.GA3055@vicky.ecs.soton.ac.uk> <1f7befae050217104431312214@mail.gmail.com> Message-ID: <20050218113608.GB25496@vicky.ecs.soton.ac.uk> Hi Tim, On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote: > > 256 ** struct.calcsize('P') > > Now if you'll just sign and fax a Zope contributor agreement, I'll > upgrade ZODB to use this slick trick . 
I hereby donate this line of code to the public domain :-) Armin From skip at pobox.com Fri Feb 18 15:41:42 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 15:39:15 2005 Subject: [Python-Dev] Five review rule on the /dev/ page? In-Reply-To: <20050217173346.GB18117@panix.com> References: <16916.50199.723442.36695@montanaro.dyndns.org> <20050217173346.GB18117@panix.com> Message-ID: <16917.65190.515241.199460@montanaro.dyndns.org> aahz> This should go into Brett's survey of the Python dev process, not aahz> as official documentation. It's simply an offer made by some of aahz> the prominent members of python-dev. As long as it's referred to from www.python.org/dev that's fine. Skip From skip at pobox.com Fri Feb 18 15:57:39 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 15:55:29 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <16918.611.903084.183700@montanaro.dyndns.org> >> Based on some ideas from Skip, I had tried transforming the likes of >> "x in (1,2,3)" into "x in frozenset([1,2,3])".... Fredrik> savings in what? time or bytecode size? constructed Fredrik> micro-benchmarks, or examples from real-life code? Fredrik> do we have any statistics on real-life "n" values? My original suggestion wasn't based on performance issues. It was based on the notion of tuples-as-records and lists-as-arrays. Raymond had originally gone through the code and changed for x in [1,2,3]: to for x in (1,2,3): I suggested that since the standard library code is commonly used as an example of basic Python principles (that's probably not the right word), it should uphold that ideal tuple/list distinction. Raymond then translated for x in [1,2,3]: to for x in frozenset([1,2,3]): I'm unclear why the list in "for x in [1,2,3]" or "if x not in [1,2,3]" can't fairly easily be recognized as a constant and just be placed in the constants array. The bytecode would show n LOAD_CONST opcodes followed by BUILD_LIST then either a COMPARE_OP (in the test case) or GET_ITER+FOR_ITER (in the for loop case). I think the optimizer should be able to recognize both constructs fairly easily. I don't know if that would provide a performance increase or not. I was after separation of functionality between tuples and lists. Skip From python at rcn.com Fri Feb 18 15:58:10 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 16:02:09 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <16918.611.903084.183700@montanaro.dyndns.org> Message-ID: <000001c515ca$4378e260$803cc797@oemcomputer> > I'm unclear why the list in "for x in [1,2,3]" or "if x not in [1,2,3]" > can't fairly easily be recognized as a constant and just be placed in the > constants array. That part got done (at least for the if-statement). The question is whether the type transformation idea should be carried a step further so that a single step search operation replaces the linear search. 
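One way to put numbers on that question with the timeit module (an illustrative measurement; concrete command-line timings appear elsewhere in this thread):

    import timeit

    setup = "x = 'pdf'; s = frozenset(['xml', 'css', 'html'])"
    # Linear scan of a constant tuple vs. a single hashed lookup.
    linear = min(timeit.Timer("x in ('xml', 'css', 'html')", setup).repeat(5, 100000))
    hashed = min(timeit.Timer("x in s", setup).repeat(5, 100000))
    # 'linear' grows with how many items are compared before a hit (or a miss);
    # 'hashed' stays roughly flat, which is the point of the proposed step.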
Raymond From irmen at xs4all.nl Fri Feb 18 15:36:15 2005 From: irmen at xs4all.nl (Irmen de Jong) Date: Fri Feb 18 16:02:14 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <16918.611.903084.183700@montanaro.dyndns.org> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> <16918.611.903084.183700@montanaro.dyndns.org> Message-ID: <4215FD5F.4040605@xs4all.nl> Skip Montanaro wrote: > I suggested that since the standard library code is commonly used as an > example of basic Python principles (that's probably not the right word), it > should uphold that ideal tuple/list distinction. Raymond then translated > > for x in [1,2,3]: > > to > > for x in frozenset([1,2,3]): I may be missing something here (didn't follow the whole thread) but those two are not functionally equal. The docstring on frozenset sais "Build an immutable unordered collection." So there's no guarantee that the elements will return from the frozenset iterator in the order that you constructed the frozenset with, right? --Irmen From python at rcn.com Fri Feb 18 16:15:04 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 16:19:03 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <4215FD5F.4040605@xs4all.nl> Message-ID: <000101c515cc$9f96d0a0$803cc797@oemcomputer> > > Raymond then > translated > > > > for x in [1,2,3]: > > > > to > > > > for x in frozenset([1,2,3]): That's not right. for-statements are not touched. > I may be missing something here (didn't follow the whole thread) but > those two are not functionally equal. > The docstring on frozenset sais "Build an immutable unordered collection." > So there's no guarantee that the elements will return from the > frozenset iterator in the order that you constructed the frozenset with, > right? Only contains expressions are translated: "if x in [1,2,3]" currently turns into: "if x in (1,2,3)" and I'm proposing that it go one step further: "if x in Seachset([1,2,3])" where Search set is a frozenset subtype that doesn't require x to be hashable. Also, the transformation would only happen when the contents of the search are all constants. Raymond From pje at telecommunity.com Fri Feb 18 16:36:43 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 16:34:03 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <000101c515cc$9f96d0a0$803cc797@oemcomputer> References: <4215FD5F.4040605@xs4all.nl> Message-ID: <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> At 10:15 AM 2/18/05 -0500, Raymond Hettinger wrote: >Only contains expressions are translated: > > "if x in [1,2,3]" > >currently turns into: > > "if x in (1,2,3)" > >and I'm proposing that it go one step further: > > "if x in Seachset([1,2,3])" ISTM that whenever I use a constant in-list like that, it's almost always with just a few (<4) items, so it doesn't seem worth the extra effort (especially disrupting the marshal module) just to squeeze out those extra two comparisons and replace them with a hashing operation. From fredrik at pythonware.com Fri Feb 18 16:45:32 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 16:45:45 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: Phillip J. 
Eby wrote: >>Only contains expressions are translated: >> >> "if x in [1,2,3]" >> >>currently turns into: >> >> "if x in (1,2,3)" >> >>and I'm proposing that it go one step further: >> >> "if x in Seachset([1,2,3])" > > ISTM that whenever I use a constant in-list like that, it's almost always with just a few (<4) > items, so it doesn't seem worth the extra effort (especially disrupting the marshal module) just > to squeeze out those extra two comparisons and replace them with a hashing operation. it could be worth expanding them to "if x == 1 or x == 2 or x == 3:" though... C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" 10000000 loops, best of 3: 0.11 usec per loop C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.0691 usec per loop C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.123 usec per loop C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" 10000000 loops, best of 3: 0.143 usec per loop C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.187 usec per loop C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.197 usec per loop C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.225 usec per loop C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 10000000 loops, best of 3: 0.161 usec per loop From skip at pobox.com Fri Feb 18 17:03:28 2005 From: skip at pobox.com (Skip Montanaro) Date: Fri Feb 18 17:00:59 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <000101c515cc$9f96d0a0$803cc797@oemcomputer> References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> Message-ID: <16918.4560.171364.66303@montanaro.dyndns.org> >> > Raymond then >> translated >> > >> > for x in [1,2,3]: >> > >> > to >> > >> > for x in frozenset([1,2,3]): Raymond> That's not right. for-statements are not touched. Thanks for the correction. My apologies for the misstep. Skip From pje at telecommunity.com Fri Feb 18 17:42:51 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 17:40:12 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> At 04:45 PM 2/18/05 +0100, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > >>Only contains expressions are translated: > >> > >> "if x in [1,2,3]" > >> > >>currently turns into: > >> > >> "if x in (1,2,3)" > >> > >>and I'm proposing that it go one step further: > >> > >> "if x in Seachset([1,2,3])" > > > > ISTM that whenever I use a constant in-list like that, it's almost > always with just a few (<4) > > items, so it doesn't seem worth the extra effort (especially disrupting > the marshal module) just > > to squeeze out those extra two comparisons and replace them with a > hashing operation. > >it could be worth expanding them to > > "if x == 1 or x == 2 or x == 3:" > >though... 
> >C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" >10000000 loops, best of 3: 0.11 usec per loop >C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.0691 usec per loop > >C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.123 usec per loop >C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" >10000000 loops, best of 3: 0.143 usec per loop > >C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.187 usec per loop >C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" >1000000 loops, best of 3: 0.197 usec per loop > >C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" >1000000 loops, best of 3: 0.225 usec per loop >C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" >10000000 loops, best of 3: 0.161 usec per loop > > Were these timings done with the code that turns (1,2,3) into a constant? Also, I presume that these timings still include extra LOAD_FAST operations that could be replaced with DUP_TOP in the actual expansion, although I don't know how much difference that would make in practice, since saving the argument fetch might be offset by the need to swap and pop at the end. From fredrik at pythonware.com Fri Feb 18 17:52:08 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 17:52:16 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: > Were these timings done with the code that turns (1,2,3) into a constant? I used a stock 2.4 from python.org, which seems to do this (for tuples, not for lists). > Also, I presume that these timings still include extra LOAD_FAST operations that could be replaced > with DUP_TOP in the actual expansion, although I don't know how much difference that would make in > practice, since saving the argument fetch might be offset by the need to swap and pop at the end. here's the disassembly: >>> dis.dis(compile("if a in (1, 2, 3): pass", "", "exec")) 1 0 LOAD_NAME 0 (a) 3 LOAD_CONST 4 ((1, 2, 3)) 6 COMPARE_OP 6 (in) 9 JUMP_IF_FALSE 4 (to 16) 12 POP_TOP 13 JUMP_FORWARD 1 (to 17) >> 16 POP_TOP >> 17 LOAD_CONST 3 (None) 20 RETURN_VALUE >>> dis.dis(compile("if a == 1 or a == 2 or a == 3: pass", "", "exec")) 1 0 LOAD_NAME 0 (a) 3 LOAD_CONST 0 (1) 6 COMPARE_OP 2 (==) 9 JUMP_IF_TRUE 26 (to 38) 12 POP_TOP 13 LOAD_NAME 0 (a) 16 LOAD_CONST 1 (2) 19 COMPARE_OP 2 (==) 22 JUMP_IF_TRUE 13 (to 38) 25 POP_TOP 26 LOAD_NAME 0 (a) 29 LOAD_CONST 2 (3) 32 COMPARE_OP 2 (==) 35 JUMP_IF_FALSE 4 (to 42) >> 38 POP_TOP 39 JUMP_FORWARD 1 (to 43) >> 42 POP_TOP >> 43 LOAD_CONST 3 (None) 46 RETURN_VALUE From pje at telecommunity.com Fri Feb 18 18:09:29 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri Feb 18 18:06:50 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> At 05:52 PM 2/18/05 +0100, Fredrik Lundh wrote: >Phillip J. Eby wrote: > > > Were these timings done with the code that turns (1,2,3) into a constant? 
> >I used a stock 2.4 from python.org, which seems to do this (for tuples, >not for lists). > > > Also, I presume that these timings still include extra LOAD_FAST > operations that could be replaced > > with DUP_TOP in the actual expansion, although I don't know how much > difference that would make in > > practice, since saving the argument fetch might be offset by the need > to swap and pop at the end. > >here's the disassembly: FYI, that's not a dissassembly of what timeit was actually timing; see 'template' in timeit.py. As a practical matter, the only difference would probably be the use of LOAD_FAST instead of LOAD_NAME, as timeit runs the code in a function body. But whatever. Still, it's rather interesting that tuple.__contains__ appears slower than a series of LOAD_CONST and "==" operations, considering that the tuple should be doing basically the same thing, only without bytecode fetch-and-decode overhead. Maybe it's tuple.__contains__ that needs optimizing here? From fredrik at pythonware.com Fri Feb 18 18:12:50 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri Feb 18 18:12:40 2005 Subject: [Python-Dev] Re: Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com><5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> Message-ID: Phillip J. Eby wrote: >>here's the disassembly: > > FYI, that's not a dissassembly of what timeit was actually timing; see 'template' in timeit.py. > As a practical matter, the only difference would probably be the use of LOAD_FAST instead of > LOAD_NAME, as > timeit runs the code in a function body. >>> def f1(a): ... if a in (1, 2, 3): ... pass ... >>> def f2(a): ... if a == 1 or a == 2 or a == 3: ... pass ... >>> dis.dis(f1) 2 0 LOAD_FAST 0 (a) 3 LOAD_CONST 4 ((1, 2, 3)) 6 COMPARE_OP 6 (in) 9 JUMP_IF_FALSE 4 (to 16) 12 POP_TOP 3 13 JUMP_FORWARD 1 (to 17) >> 16 POP_TOP >> 17 LOAD_CONST 0 (None) 20 RETURN_VALUE >>> >>> dis.dis(f2) 2 0 LOAD_FAST 0 (a) 3 LOAD_CONST 1 (1) 6 COMPARE_OP 2 (==) 9 JUMP_IF_TRUE 26 (to 38) 12 POP_TOP 13 LOAD_FAST 0 (a) 16 LOAD_CONST 2 (2) 19 COMPARE_OP 2 (==) 22 JUMP_IF_TRUE 13 (to 38) 25 POP_TOP 26 LOAD_FAST 0 (a) 29 LOAD_CONST 3 (3) 32 COMPARE_OP 2 (==) 35 JUMP_IF_FALSE 4 (to 42) >> 38 POP_TOP 3 39 JUMP_FORWARD 1 (to 43) >> 42 POP_TOP >> 43 LOAD_CONST 0 (None) 46 RETURN_VALUE > Still, it's rather interesting that tuple.__contains__ appears slower than a series of LOAD_CONST > and "==" operations, considering that the tuple should be doing basically the same thing, only > without bytecode fetch-and-decode overhead. Maybe it's tuple.__contains__ that needs optimizing > here? wouldn't be the first time... From jimjjewett at gmail.com Fri Feb 18 20:10:05 2005 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri Feb 18 20:10:09 2005 Subject: [Python-Dev] Prospective Peephole Transformation Message-ID: Raymond Hettinger: > tried transforming the likes of "x in (1,2,3)" into "x in frozenset([1,2,3])". >... There were substantial savings even if the set contained only a single entry. >... where x was non-hashable and it would raise a TypeError instead of > returning False as it should. I read the objection as saying that it should not return False, because an unhashable object might pretend it is equal to a hashable one in the set. 
""" class Searchset(frozenset): def __contains__(self, element): try: return frozenset.__contains__(self, element) except TypeError: return False """ So instead of return False it should be return x in frozenset.__iter__() This would be a net loss if there were many unhashable x. You could restrict the iteration to x that implement a custom __eq__, if you ensured that none of the SearchSet elements do... but it starts to get uglier and less general. Raymond has already look at http://www.python.org/sf/1141428, which contains some test case patches to enforce this implicit "sequences always use __eq__; only mappings can short-circuit on __hash__" contract. -jJ From mal at egenix.com Fri Feb 18 21:57:16 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Fri Feb 18 21:57:22 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <000c01c51586$92c7dd60$3a01a044@oemcomputer> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <421656AC.6010602@egenix.com> Raymond Hettinger wrote: > Based on some ideas from Skip, I had tried transforming the likes of "x > in (1,2,3)" into "x in frozenset([1,2,3])". When applicable, it > substantially simplified the generated code and converted the O(n) > lookup into an O(1) step. There were substantial savings even if the > set contained only a single entry. When disassembled, the bytecode is > not only much shorter, it is also much more readable (corresponding > almost directly to the original source). > > The problem with the transformation was that it didn't handle the case > where x was non-hashable and it would raise a TypeError instead of > returning False as it should. That situation arose once in the email > module's test suite. > > To get it to work, I would have to introduce a frozenset subtype: > > class Searchset(frozenset): > def __contains__(self, element): > try: > return frozenset.__contains__(self, element) > except TypeError: > return False > > Then, the transformation would be "x in Searchset([1, 2, 3])". Since > the new Searchset object goes in the constant table, marshal would have > to be taught how to save and restore the object. > > This is a more complicated than the original frozenset version of the > patch, so I would like to get feedback on whether you guys think it is > worth it. Wouldn't it help a lot more if the compiler would detect that (1,2,3) is immutable and convert it into a constant at compile time ?! The next step would then be to have Python roll out these loops (in -O mode). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 18 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From oliphant at ee.byu.edu Fri Feb 18 22:12:53 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 22:12:56 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used Message-ID: <42165A55.3000609@ee.byu.edu> Hello again, There is a great discussion going on the numpy list regarding a proposed PEP for multidimensional arrays that is in the works. During this discussion as resurfaced regarding slicing with objects that are not IntegerType objects but that have a tp_as_number->nb_int method to convert to an int. 
Would it be possible to change _PyEval_SliceIndex in ceval.c so that rather than throwing an error if the indexing object is not an integer, the code first checks to see if the object has a tp_as_number->nb_int method and calls it instead. If this is acceptable, it is an easy patch. Thanks, -Travis Oliphant From gvanrossum at gmail.com Fri Feb 18 22:28:34 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri Feb 18 22:28:39 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42165A55.3000609@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> Message-ID: > Would it be possible to change > > _PyEval_SliceIndex in ceval.c > > so that rather than throwing an error if the indexing object is not an > integer, the code first checks to see if the object has a > tp_as_number->nb_int method and calls it instead. I don't think this is the right solution; since float has that method, it would allow floats to be used as slice indices, but that's not supposed to work (to protect yourself against irreproducible results due to rounding errors). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From bac at OCF.Berkeley.EDU Fri Feb 18 22:31:47 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Fri Feb 18 22:31:58 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42165A55.3000609@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> Message-ID: <42165EC3.6010209@ocf.berkeley.edu> Travis Oliphant wrote: > Hello again, > > There is a great discussion going on the numpy list regarding a proposed > PEP for multidimensional arrays that is in the works. > > During this discussion as resurfaced regarding slicing with objects that > are not IntegerType objects but that > have a tp_as_number->nb_int method to convert to an int. > Would it be possible to change > > _PyEval_SliceIndex in ceval.c > > so that rather than throwing an error if the indexing object is not an > integer, the code first checks to see if the object has a > tp_as_number->nb_int method and calls it instead. > You would also have to change apply_slice() since that also has a guard for checking the slice arguments are either NULL, int, or long objects. But I am +1 with it since the guard is already there for ints and longs to handle those properly and thus the common case does not slow down in any way. As long as it also accepts Python objects that define __int__ and not just C types that have the nb_int slot defined I am okay with this idea. -Brett From oliphant at ee.byu.edu Fri Feb 18 22:35:43 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 22:35:46 2005 Subject: [Python-Dev] Fix _PyEval_SliceIndex (Take two) Message-ID: <42165FAF.8080703@ee.byu.edu> (More readable second paragraph) Hello again, There is a great discussion going on the numpy list regarding a proposed PEP for multidimensional arrays that is in the works. During this discussion a problem has resurfaced regarding slicing with objects that are not IntegerType objects but that have a tp_as_number->nb_int method. Would it be possible to change _PyEval_SliceIndex in ceval.c so that rather than raising an exception if the indexing object is not an integer, the code first checks to see if the object has a tp_as_number->nb_int method and trys it before raising an exception. If this is acceptable, it is an easy patch. 
Thanks, -Travis Oliphant From david.ascher at gmail.com Fri Feb 18 22:36:31 2005 From: david.ascher at gmail.com (David Ascher) Date: Fri Feb 18 22:36:34 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: On Fri, 18 Feb 2005 13:28:34 -0800, Guido van Rossum wrote: > > Would it be possible to change > > > > _PyEval_SliceIndex in ceval.c > > > > so that rather than throwing an error if the indexing object is not an > > integer, the code first checks to see if the object has a > > tp_as_number->nb_int method and calls it instead. > > I don't think this is the right solution; since float has that method, > it would allow floats to be used as slice indices, but that's not > supposed to work (to protect yourself against irreproducible results > due to rounding errors). I wonder if floats are the special case here, not "integer like objects". I've never been particularly happy about the confusion between the two roles of int() and it's C equivalents, i.e. casting and conversion. From gvanrossum at gmail.com Fri Feb 18 22:48:16 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Fri Feb 18 22:48:55 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: [Travis] > > > Would it be possible to change > > > > > > _PyEval_SliceIndex in ceval.c > > > > > > so that rather than throwing an error if the indexing object is not an > > > integer, the code first checks to see if the object has a > > > tp_as_number->nb_int method and calls it instead. [Guido] > > I don't think this is the right solution; since float has that method, > > it would allow floats to be used as slice indices, but that's not > > supposed to work (to protect yourself against irreproducible results > > due to rounding errors). [David] > I wonder if floats are the special case here, not "integer like objects". > > I've never been particularly happy about the confusion between the two > roles of int() and it's C equivalents, i.e. casting and conversion. You're right, that's the crux of the matter; I unfortunately copied a design mistake from C here. In Python 3000 I'd like to change this so that floats have a __trunc__() method to return an integer (invokable via trunc(x)). But in Python 2.x, we can't be sure that floats are the *only* exception -- surely people who are implementing their own "float-like" classes are copying float's example and implementing __int__ to mean the same thing. For example, the new decimal class in Python 2.4 has a converting/truncating __int__ method. (And despite being decimal, it's no less approximate than float; decimal is *not* an exact numerical type.) So I still think it's unsafe (in Python 2.x) to accept __int__ in the way Travis proposes. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From bob at redivi.com Fri Feb 18 22:54:25 2005 From: bob at redivi.com (Bob Ippolito) Date: Fri Feb 18 22:54:28 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: <80cd5d26efaff4232b909b0567fb5ea3@redivi.com> On Feb 18, 2005, at 4:36 PM, David Ascher wrote: > On Fri, 18 Feb 2005 13:28:34 -0800, Guido van Rossum > wrote: >>> Would it be possible to change >>> >>> _PyEval_SliceIndex in ceval.c >>> >>> so that rather than throwing an error if the indexing object is not >>> an >>> integer, the code first checks to see if the object has a >>> tp_as_number->nb_int method and calls it instead. >> >> I don't think this is the right solution; since float has that method, >> it would allow floats to be used as slice indices, but that's not >> supposed to work (to protect yourself against irreproducible results >> due to rounding errors). > > I wonder if floats are the special case here, not "integer like > objects". > > I've never been particularly happy about the confusion between the two > roles of int() and it's C equivalents, i.e. casting and conversion. All of the __special__ methods for this purpose seem to be usable only for conversion, not casting (__str__, __unicode__, etc.). The only way I've found to pass for a particular value type is to subclass one. We do this a lot in PyObjC. It ends up being a net win anyway, because you get free implementations of all the relevant methods, at the expense of having two copies of the value. The fact that these proxy objects are no longer visible-from-Python subclasses of Objective-C objects isn't really a big deal in our case, because the canonical Objective-C way to checking inheritance still work. The wrapper types use an attribute protocol for casting (__pyobjc_object__), and delegate to this object with __getattr__. >>> from Foundation import * >>> one = NSNumber.numberWithInt_(1) >>> type(one).mro() [, , ] >>> isinstance(one, NSNumber) False >>> isinstance(one.__pyobjc_object__, NSNumber) True >>> one.isKindOfClass_(NSNumber) 1 >>> type(one) >>> type(one.__pyobjc_object__) -bob From ejones at uwaterloo.ca Fri Feb 18 22:58:36 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Fri Feb 18 22:59:25 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <1f7befae050217193863ffc028@mail.gmail.com> References: <1f7befae050217193863ffc028@mail.gmail.com> Message-ID: On Thu, 2005-02-17 at 22:38, Tim Peters wrote: > Then you allocate a small object, marked 's': > > bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff Isn't the whole point of obmalloc is that we don't want to allocate "s" on the heap, since it is small? I guess "s" could be an object that might potentially grow. > One thing to take from that is that LFH can't be helping list-growing > in a direct way either, if LFH (as seems likely) also needs to copy > objects that grow in order to keep its internal memory segregated by > size. The indirect benefit is still available, though: LFH may be > helping simply by keeping smaller objects out of the general heap's > hair. So then wouldn't this mean that there would have to be some sort of small object being allocated via the system malloc that is causing the poor behaviour? As you mention, I wouldn't think it would be list objects, since resizing lists using LFH should be *worse*. 
That would actually be something that is worth verifying, however. It could be that the Windows LFH is extra clever? > I'm afraid the only you can know for sure is by obtaining detailed > memory maps and analyzing them. Well, it would also be useful to find out what code is calling the system malloc. This would make it easy to examine the code and see if it should be calling obmalloc or the system malloc. Any good ideas for easily obtaining this information? I imagine that some profilers must be able to produce a complete call graph? Evan Jones From ejones at uwaterloo.ca Fri Feb 18 23:07:46 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Fri Feb 18 23:12:10 2005 Subject: [Python-Dev] Memory Allocator Part 2: Did I get it right? In-Reply-To: <4212FB5B.1030209@v.loewis.de> References: <8b28704b4465e03002fc70db5facedb6@uwaterloo.ca> <1f7befae05021514524d0a35ec@mail.gmail.com> <4c0d14b0b08390d046e1220b6f360745@uwaterloo.ca> <1f7befae05021520263d77a2a3@mail.gmail.com> <4212FB5B.1030209@v.loewis.de> Message-ID: Sorry for taking so long to get back to this thread, it has been one of those weeks for me. On Feb 16, 2005, at 2:50, Martin v. L?wis wrote: > Evan then understood the feature, and made it possible. This is very true: it was a very useful exercise. > I can personally accept breaking the code that still relies on the > invalid APIs. The only problem is that it is really hard to determine > whether some code *does* violate the API usage. Great. Please ignore the patch on SourceForge for a little while. I'll produce a "revision 3" this weekend, without the compatibility hack. Evan Jones From python at rcn.com Fri Feb 18 23:09:19 2005 From: python at rcn.com (Raymond Hettinger) Date: Fri Feb 18 23:13:23 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <421656AC.6010602@egenix.com> Message-ID: <001401c51606$7ec6cda0$803cc797@oemcomputer> > Wouldn't it help a lot more if the compiler would detect that > (1,2,3) is immutable and convert it into a constant at > compile time ?! Yes. We've already gotten it to that point: Python 2.5a0 (#46, Feb 15 2005, 19:11:35) [MSC v.1200 32 bit (Intel)] on win32 >>> import dis >>> dis.dis(compile('x in ("xml", "html", "css")', '', 'eval')) 0 0 LOAD_NAME 0 (x) 3 LOAD_CONST 3 (('xml', 'html', 'css')) 6 COMPARE_OP 6 (in) 9 RETURN_VALUE The question is whether to go a step further to replace the linear search with a single hashed lookup: 0 0 LOAD_NAME 0 (x) 3 LOAD_CONST 3 (searchset(['xml', 'html', 'css'])) 6 COMPARE_OP 6 (in) 9 RETURN_VALUE This situation seems to arise often in source code. You can see the cases in the standard library with: grep 'in ("' *.py The transformation is easy to make at compile time. The part holding me back is the introduction of searchset as a frozenset subtype and teaching marshal how to put it a pyc file. FWIW, some sample timings are included below (using frozenset to approximate what searchset would do). The summary is that the tuple search takes .49usec plus .12usec for each item searched until a match is found. The frozenset lookup takes a constant .53 usec. 
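At the user level, much of that win is already available by hoisting a frozenset constant out of the hot path; a small sketch (the names are made up, and this is the manual equivalent rather than the proposed compiler transformation):

    _MARKUP = frozenset(['xml', 'html', 'css'])    # built once, at import time

    def is_markup(extension):
        # constant-time hashed lookup instead of a linear scan of a tuple
        return extension in _MARKUP

The point of a compile-time searchset is that code written in the plain x in ("xml", "html", "css") style would get this behaviour without the manual hoisting.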
Raymond ------------------------------------------------------------------------ C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='xml'" "x in s" 1000000 loops, best of 9: 0.49 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='css'" "x in s" 1000000 loops, best of 9: 0.621 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='html'" "x in s" 1000000 loops, best of 9: 0.747 usec per loop C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='pdf'" "x in s" 100000 loops, best of 9: 0.851 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='xml'" "x in s" 1000000 loops, best of 9: 0.529 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='css'" "x in s" 1000000 loops, best of 9: 0.522 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='html'" "x in s" 1000000 loops, best of 9: 0.53 usec per loop C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s "x='pdf'" "x in s" 1000000 loops, best of 9: 0.523 usec per loop From oliphant at ee.byu.edu Fri Feb 18 23:40:54 2005 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Feb 18 23:40:57 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> Message-ID: <42166EF6.7010600@ee.byu.edu> Guido van Rossum wrote: >>Would it be possible to change >> >>_PyEval_SliceIndex in ceval.c >> >>so that rather than throwing an error if the indexing object is not an >>integer, the code first checks to see if the object has a >>tp_as_number->nb_int method and calls it instead. >> >> > >I don't think this is the right solution; since float has that method, >it would allow floats to be used as slice indices, > > O.K., then how about if arrayobjects can make it in the core, then a check for a rank-0 integer-type arrayobject is allowed before raising an exception? -Travis From tim.peters at gmail.com Fri Feb 18 23:51:37 2005 From: tim.peters at gmail.com (Tim Peters) Date: Fri Feb 18 23:51:40 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: <1f7befae050217193863ffc028@mail.gmail.com> Message-ID: <1f7befae050218145157bd81c9@mail.gmail.com> [Tim Peters] ... >> Then you allocate a small object, marked 's': >> >> bbbbbbbbbbbbbbbsfffffffffffffffffffffffffffffff [Evan Jones] > Isn't the whole point of obmalloc No, because it doesn't matter what follows that introduction: obmalloc has several points, including exploiting the GIL, heuristics aiming at reusing memory while it's still high in the memory heirarchy, almost never touching a piece of memory until it's actually needed, and so on. > is that we don't want to allocate "s" on the heap, since it is small? That's one of obmalloc's goals, yes. But "small" is a relative adjective, not absolute. Because we're primarily talking about LFH here, the natural meaning for "small" in _this_ thread is < 16KB, which is much larger than "small" means to obmalloc. The memory-map example applies just well to LFH as to obmalloc, by changing which meaning for "small" you have in mind. > I guess "s" could be an object that might potentially grow. For example, list guts in Python are never handled by obmalloc, although the small fixed-size list _header_ object is always handled by obmalloc. 
>> One thing to take from that is that LFH can't be helping list-growing >> in a direct way either, if LFH (as seems likely) also needs to copy >> objects that grow in order to keep its internal memory segregated by >> size. The indirect benefit is still available, though: LFH may be >> helping simply by keeping smaller objects out of the general heap's >> hair. > So then wouldn't this mean that there would have to be some sort of > small object being allocated via the system malloc that is causing the > poor behaviour? Yes. For example, a 300-character string could do it (that's not small to obmalloc, but is to LFH). Strings produced by pickling are very often that large, and especially in Zope (which uses pickles extensively under the covers -- reading and writing persistent objects in Zope all involve pickle strings). > As you mention, I wouldn't think it would be list objects, since resizing > lists using LFH should be *worse*. Until they get to LFH's boundary for "small", and we have only the vaguest idea what Martin's app does here -- we know it grows lists containing 50K elements in the end, and ... well, that's all I really know about it . A well-known trick is applicable in that case, if Martin thinks it's worth the bother: grow the list to its final size once, at the start (overestimating if you don't know for sure). Then instead of appending, keep an index to the next free slot, same as you'd do in C. Then the list guts never move, so if that doesn't yield the same kind of speedup without using LFH, list copying wasn't actually the culprit to begin with. > That would actually be something that is worth verifying, however. Not worth the time to me -- Windows is closed-source, and I'm too old to enjoy staring at binary disassemblies any more. Besides, list guts can't stay in LFH after the list exceeds 4K elements. If list-copying costs are significant here, they're far more likely to be due to copying lists over 4K elements than under -- copying a list takes O(len(list)) time. So the realloc() strategy used by LFH _probably_ isn't of _primary)_ interest here. > It could be that the Windows LFH is extra clever? Sure -- that I doubt it moves Heaven & Earth to cater to reallocs is just educated guessing. I wrote my first production heap manager at Cray Research, around 1979 . > ... > Well, it would also be useful to find out what code is calling the > system malloc. This would make it easy to examine the code and see if > it should be calling obmalloc or the system malloc. Any good ideas for > easily obtaining this information? I imagine that some profilers must > be able to produce a complete call graph? Windows supports extensive facilities for analyzing heap usage, even from an external process that attaches to the process you want to analyze. Ditto for profiling. But it's not easy, and I don't know of any free tools that are of real help. If someone were motivated enough, it would probably be easiest to run Martin's app on a Linux box, and use the free Linux tools to analyze it. 
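In code, the trick Tim describes amounts to something like the following sketch (build_preallocated and produce are made-up names standing in for the real workload):

    def build_preallocated(n, produce):
        """Fill a list of known (or overestimated) final size without growing it."""
        result = [None] * n        # allocate the list guts once, up front
        i = 0
        for item in produce():
            result[i] = item       # fill by index instead of append()
            i += 1
        del result[i:]             # trim the overestimate, if any
        return result

If this version shows the same ~15% speedup on the ordinary heap, list reallocation really was the culprit; if it doesn't, the copying suspicion was misplaced.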
From david.ascher at gmail.com Sat Feb 19 00:08:24 2005 From: david.ascher at gmail.com (David Ascher) Date: Sat Feb 19 00:08:34 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: <42166EF6.7010600@ee.byu.edu> References: <42165A55.3000609@ee.byu.edu> <42166EF6.7010600@ee.byu.edu> Message-ID: On Fri, 18 Feb 2005 15:40:54 -0700, Travis Oliphant wrote: > Guido van Rossum wrote: > > >>Would it be possible to change > >> > >>_PyEval_SliceIndex in ceval.c > >> > >>so that rather than throwing an error if the indexing object is not an > >>integer, the code first checks to see if the object has a > >>tp_as_number->nb_int method and calls it instead. > >> > >> > > > >I don't think this is the right solution; since float has that method, > >it would allow floats to be used as slice indices, > > > > > O.K., > > then how about if arrayobjects can make it in the core, then a check for > a rank-0 integer-type > arrayobject is allowed before raising an exception? Following up on Bob's point, maybe making rank-0 integer type arrayobjects inherit from int has some mileage? Somewhat weird, but... From mal at egenix.com Sat Feb 19 00:42:35 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Feb 19 00:42:42 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <001401c51606$7ec6cda0$803cc797@oemcomputer> References: <001401c51606$7ec6cda0$803cc797@oemcomputer> Message-ID: <42167D6B.9020606@egenix.com> Raymond Hettinger wrote: >>Wouldn't it help a lot more if the compiler would detect that >>(1,2,3) is immutable and convert it into a constant at >>compile time ?! > > > Yes. We've already gotten it to that point: > > Python 2.5a0 (#46, Feb 15 2005, 19:11:35) [MSC v.1200 32 bit (Intel)] on > win32 > >>>>import dis >>>>dis.dis(compile('x in ("xml", "html", "css")', '', 'eval')) > > 0 0 LOAD_NAME 0 (x) > 3 LOAD_CONST 3 (('xml', 'html', 'css')) > 6 COMPARE_OP 6 (in) > 9 RETURN_VALUE Cool. Does that work for all tuples in the program ? > The question is whether to go a step further to replace the linear > search with a single hashed lookup: > > 0 0 LOAD_NAME 0 (x) > 3 LOAD_CONST 3 (searchset(['xml', 'html', > 'css'])) > 6 COMPARE_OP 6 (in) > 9 RETURN_VALUE > > This situation seems to arise often in source code. You can see the > cases in the standard library with: grep 'in ("' *.py I did a search on our code and Python's std lib. It turns out that by far most such usages use either 2 or 3 values in the tuple. If you look at the types of the values, the most common usages are strings and integers. I'd assume that you'll get somewhat different results from your benchmark if you had integers in the tuple. > The transformation is easy to make at compile time. The part holding me > back is the introduction of searchset as a frozenset subtype and > teaching marshal how to put it a pyc file. Hmm, what if you'd teach tuples to do faster contains lookups for string or integer only content, e.g. by introducing sub-types for string-only and integer-only tuples ?! > FWIW, some sample timings are included below (using frozenset to > approximate what searchset would do). The summary is that the tuple > search takes .49usec plus .12usec for each item searched until a match > is found. The frozenset lookup takes a constant .53 usec. 
> > > > Raymond > > > > ------------------------------------------------------------------------ > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='xml'" > "x in s" > 1000000 loops, best of 9: 0.49 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='css'" > "x in s" > 1000000 loops, best of 9: 0.621 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='html'" > "x in s" > 1000000 loops, best of 9: 0.747 usec per loop > > C:\py25>python -m timeit -r9 -s "s=('xml', 'css', 'html')" -s "x='pdf'" > "x in s" > 100000 loops, best of 9: 0.851 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='xml'" "x in s" > 1000000 loops, best of 9: 0.529 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='css'" "x in s" > 1000000 loops, best of 9: 0.522 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='html'" "x in s" > 1000000 loops, best of 9: 0.53 usec per loop > > C:\py25>python -m timeit -r9 -s "s=frozenset(['xml', 'css', 'html'])" -s > "x='pdf'" "x in s" > 1000000 loops, best of 9: 0.523 usec per loop -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 19 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From gvanrossum at gmail.com Sat Feb 19 00:49:44 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sat Feb 19 00:49:47 2005 Subject: [Python-Dev] Fixing _PyEval_SliceIndex so that integer-like objects can be used In-Reply-To: References: <42165A55.3000609@ee.byu.edu> <42166EF6.7010600@ee.byu.edu> Message-ID: [Travis] > > then how about if arrayobjects can make it in the core, then a check for > > a rank-0 integer-type > > arrayobject is allowed before raising an exception? Sure, *if* you can get the premise accepted. [David] > Following up on Bob's point, maybe making rank-0 integer type > arrayobjects inherit from int has some mileage? Somewhat weird, > but... Hm, currently inheriting from int would imply that the C-level memory lay-out of the object is an extension of the built-in int type. That's probably too much of a constraint. But perhaps somehow rank-0-integer-array and int could be the same type? I don't think it would hurt too badly if an int had a method to find out its rank as an array. And I assume you can't iterate over a rank-0 array, right? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ejones at uwaterloo.ca Sat Feb 19 01:10:55 2005 From: ejones at uwaterloo.ca (Evan Jones) Date: Sat Feb 19 01:10:51 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: <1f7befae050218145157bd81c9@mail.gmail.com> References: <1f7befae050217193863ffc028@mail.gmail.com> <1f7befae050218145157bd81c9@mail.gmail.com> Message-ID: On Feb 18, 2005, at 17:51, Tim Peters wrote: > grow the list to its final size once, at the start (overestimating if > you don't know for sure). Then instead of appending, keep an index to > the next free slot, same as you'd do in C. 
Then the list guts never > move, so if that doesn't yield the same kind of speedup without using > LFH, list copying wasn't actually the culprit to begin with. If this *does* improve the performance of his application by 15%, that would strongly argue for an addition to the list API similar to Java's ArrayList.ensureCapacity or the STL's vector::reserve. Since the list implementation already maintains separate ints for the list array size and the list occupied size, this would really just expose this implementation detail to Python. I don't like revealing the implementation in this fashion, but if it does make a significant performance difference, it could be worth it. http://java.sun.com/j2se/1.5.0/docs/api/java/util/ ArrayList.html#ensureCapacity(int) http://www.sgi.com/tech/stl/Vector.html#4 Evan Jones From tim.peters at gmail.com Sat Feb 19 02:43:06 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 02:43:10 2005 Subject: [Python-Dev] Re: Re: Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> <5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com> <5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> Message-ID: <1f7befae050218174345e029e8@mail.gmail.com> [Phillip J. Eby] >> Still, it's rather interesting that tuple.__contains__ appears slower than a >> series of LOAD_CONST and "==" operations, considering that the tuple >> should be doing basically the same thing, only> without bytecode fetch-and- >> decode overhead. Maybe it's tuple.__contains__ that needs optimizing >> here? [Fredrik Lundh] > wouldn't be the first time... How soon we forget . Fredrik introduced a pile of optimizations special-casing the snot out of small integers into ceval.c a long time ago, like this in COMPARE_OP: case COMPARE_OP: w = POP(); v = TOP(); if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) { /* INLINE: cmp(int, int) */ register long a, b; register int res; a = PyInt_AS_LONG(v); b = PyInt_AS_LONG(w); switch (oparg) { case PyCmp_LT: res = a < b; break; case PyCmp_LE: res = a <= b; break; case PyCmp_EQ: res = a == b; break; case PyCmp_NE: res = a != b; break; case PyCmp_GT: res = a > b; break; case PyCmp_GE: res = a >= b; break; case PyCmp_IS: res = v == w; break; case PyCmp_IS_NOT: res = v != w; break; default: goto slow_compare; } x = res ? Py_True : Py_False; Py_INCREF(x); } else { slow_compare: x = cmp_outcome(oparg, v, w); } That's a hell of a lot faster than tuple comparison's deferral to PyObject_RichCompareBool can be, even if we inlined the same blob inside the latter (then we'd still have the additional overhead of calling PyObject_RichCompareBool). As-is, PyObject_RichCompareBool() has to do (relatively) significant work just to out find which concrete comparision implementation to call. As a result, "i == j" in Python source code, when i and j are little ints, is much faster than comparing i and j via any other route in Python. That's mostly really good, IMO -- /F's int optimizations are of major value in real life. Context-dependent optimizations make code performance less predictable too -- that's life. 
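The practical effect on the membership-test discussion is easy to measure directly; a sketch with timeit (no numbers quoted here, since they vary by machine and build):

    import timeit

    setup = "x = 2"
    for stmt in ("x in (1, 2, 3)", "x == 1 or x == 2 or x == 3"):
        best = min(timeit.Timer(stmt, setup).repeat(3, 1000000))
        print stmt, best

For small ints the or-chain gets the inlined COMPARE_OP path on every comparison, which is what makes the expansion attractive in the integer-only case.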
From python at rcn.com Sat Feb 19 02:41:24 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 02:45:29 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <42167D6B.9020606@egenix.com> Message-ID: <002401c51624$1f0ff3a0$803cc797@oemcomputer> > >>Wouldn't it help a lot more if the compiler would detect that > >>(1,2,3) is immutable and convert it into a constant at > >>compile time ?! > > > > > > Yes. We've already gotten it to that point: . . . > > Cool. Does that work for all tuples in the program ? It is limited to just tuples of constants (strings, ints, floats, complex, None, and other tuples). Also, it is limited in its ability to detect a nesting like: a=((1,2),(3,4)). One other limitation is that floats like -0.23 are not recognized as constants because the initial compilation still produces a UNARY_NEGATIVE operation: >>> dis.dis(compile('-0.23', '', 'eval')) 0 0 LOAD_CONST 0 (0.23000000000000001) 3 UNARY_NEGATIVE 4 RETURN_VALUE > I did a search on our code and Python's std lib. It turns > out that by far most such usages use either 2 or 3 > values in the tuple. If you look at the types of the > values, the most common usages are strings and integers. Right, those are the most common cases. The linear searches are ubiquitous. Here's a small selection: if comptype not in ('NONE', 'ULAW', 'ALAW', 'G722') return tail.lower() in (".py", ".pyw") assert n in (2, 3, 4, 5) if value[2] in ('F','n','N') if sectName in ("temp", "cdata", "ignore", "include", "rcdata") if not decode or encoding in ('', '7bit', '8bit', 'binary'): if (code in (301, 302, 303, 307) and m in ("GET", "HEAD") Unfortunately, there are several common patterns that are skipped because rarely changed globals/builtins cannot be treated as constants: if isinstance(x, (int, float, complex)): # types are not constants if op in (ROT_TWO, POP_TOP, LOAD_FAST): # global consts from opcode.py except (TypeError, KeyError, IndexError): # builtins are not constant > I'd assume that you'll get somewhat different results > from your benchmark if you had integers in the tuple. Nope, the results are substantially the same give or take 2usec. > Hmm, what if you'd teach tuples to do faster contains lookups for > string or integer only content, e.g. by introducing sub-types for > string-only and integer-only tuples ?! For a linear search, tuples are already pretty darned good and leave room for only microscopic O(n) improvements. The bigger win comes from using a better algorithm and data structure -- hashing beats linear search hands-down. The constant search time is faster for all n>1, resulting in much improved scalability. No tweaking of tuple.__contains__() can match it. Sets are the right data structure for fast membership testing. I would love for sets to be used internally while letting users continue to write the clean looking code shown above. Raymond From tim.peters at gmail.com Sat Feb 19 03:06:45 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 03:06:48 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <000c01c51586$92c7dd60$3a01a044@oemcomputer> References: <000c01c51586$92c7dd60$3a01a044@oemcomputer> Message-ID: <1f7befae050218180668dad506@mail.gmail.com> [Raymond Hettinger] > ... > The problem with the transformation was that it didn't handle the case > where x was non-hashable and it would raise a TypeError instead of > returning False as it should. I'm very glad you introduced the optimization of building small constant tuples at compile-time. 
IMO, that was a pure win. I don't like this one, though. The meaning of "x in (c1, c2, ..., c_n)" is "x == c1 or x == c2 or ... or x == c_n", and a transformation that doesn't behave exactly like the latter in all cases is simply wrong. Even if x isn't hashable, it could still be of a type that implements __eq__, and where x.__eq__(c_i) returned True for some i, and then False is plainly the wrong result. It could also be that x is of a type that is hashable, but where x.__hash__() raises TypeError at this point in the code. That could be for good or bad (bug) reasons, but suppressing the TypeError and converting into False would be a bad thing regardless. > That situation arose once in the email module's test suite. I don't even care if no code in the standard library triggered a problem here: the transformation isn't semantically correct on the face of it. If we knew the type of x at compile-time, then sure, in most (almost all) cases we could know it was a safe transformation (and even without the hack to turn TypeError into False). But we don't know now, so the worst case has to be assumed: can't do this one now. Maybe someday, though. From tim.peters at gmail.com Sat Feb 19 03:24:55 2005 From: tim.peters at gmail.com (Tim Peters) Date: Sat Feb 19 03:24:59 2005 Subject: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15% In-Reply-To: References: <1f7befae050217193863ffc028@mail.gmail.com> <1f7befae050218145157bd81c9@mail.gmail.com> Message-ID: <1f7befae050218182444fb7413@mail.gmail.com> [Tim Peters] >> grow the list to its final size once, at the start (overestimating if >> you don't know for sure). Then instead of appending, keep an index to >> the next free slot, same as you'd do in C. Then the list guts never >> move, so if that doesn't yield the same kind of speedup without using >> LFH, list copying wasn't actually the culprit to begin with. [Evan Jones] > If this *does* improve the performance of his application by 15%, that > would strongly argue for an addition to the list API similar to Java's > ArrayList.ensureCapacity or the STL's vector::reserve. Since the > list implementation already maintains separate ints for the list array > size and the list occupied size, this would really just expose this > implementation detail to Python. I don't like revealing the > implementation in this fashion, but if it does make a significant > performance difference, it could be worth it. That's a happy thought! It was first suggested for Python in 1991 , but before Python 2.4 the list implementation didn't have separate members for current size and capacity, so "can't get there from here" was the only response. It still wouldn't be trivial, because nothing in listobject.c now believes the allocated size ever needs to be preserved, and all len()-changing list operations ensure that "not too much" overallocation remains (see list_resize() in listobject.c for details). But let's see whether it would help first. 
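Back on the peephole sub-thread, the semantic trap Tim points out is easy to demonstrate with a made-up class (a sketch, not the actual case from the email test suite):

    class Chatty(list):                # lists are unhashable, but __eq__ still runs
        def __eq__(self, other):
            return other == "html"     # equal to exactly one of the constants

    x = Chatty()
    x in ("xml", "html", "css")             # True, per the or-chain semantics
    x in frozenset(["xml", "html", "css"])  # raises TypeError: unhashable

Under the proposed rewrite the first expression would behave like the second: it would raise TypeError, or, with the suppress-and-return-False fix, answer False -- and either way the correct result, True, is lost.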
From ncoghlan at iinet.net.au Sat Feb 19 05:46:32 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sat Feb 19 05:46:38 2005 Subject: [Python-Dev] Proposal for a module to deal with hashing In-Reply-To: <20050217065330.GP25441@zot.electricrain.com> References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> Message-ID: <4216C4A8.9060408@iinet.net.au> Gregory P. Smith wrote: > fyi - i've updated the python sha1/md5 openssl patch. it now replaces > the entire sha and md5 modules with a generic hashes module that gives > access to all of the hash algorithms supported by OpenSSL (including > appropriate legacy interface wrappers and falling back to the old code > when compiled without openssl). > > https://sourceforge.net/tracker/index.php?func=detail&aid=1121611&group_id=5470&atid=305470 > > I don't quite like the module name 'hashes' that i chose for the > generic interface (too close to the builtin hash() function). Other > suggestions on a module name? 'digest' comes to mind. 'hashtools' and 'hashlib' would both have precedents in the standard library (itertools and urllib, for example). It occurs to me that such a module would provide a way to fix the bug with incorrectly hashable instances of new-style classes: Py> class C: ... def __eq__(self, other): return True ... Py> hash(C()) Traceback (most recent call last): File "", line 1, in ? TypeError: unhashable instance Py> class C(object): ... def __eq__(self, other): return True ... Py> hash(C()) 10357232 Guido wanted to fix this by eliminating object.__hash__, but that caused problems for Jython. If I remember that discussion correctly, the problem was that, in Jython, the default hash is _not_ simply hash(id(obj)) the way it is in CPython, so Python code needs a way to get access to the default implementation. A hashtools.default_hash that worked like the current object.__hash__ would seem to provide such a spelling, and allow object.__hash__ to be removed (fixing the above bug). Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From python at rcn.com Sat Feb 19 05:47:01 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 05:54:07 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <1f7befae050218180668dad506@mail.gmail.com> Message-ID: <002d01c5163e$3184d720$803cc797@oemcomputer> > I'm very glad you introduced the optimization of building small > constant tuples at compile-time. IMO, that was a pure win. It's been out in the wild for a while now with no issues. I'm somewhat happy with it. > the transformation isn't semantically correct on the > face of it. Well that's the end of that. What we really need is a clean syntax for specifying a constant frozenset without compiler transformations of tuples. That would have the further advantage of letting builtins and globals be used as element values. 
if isinstance(x, {int, float, complex}): if opcode in {REPEAT, MIN_REPEAT, MAX_REPEAT}: if (code in {301, 302, 303, 307} and m in {"GET", "HEAD"}: if op in (ROT_TWO, POP_TOP, LOAD_FAST) Perhaps something other notation would be better but the idea is basically the same. Raymond From ncoghlan at iinet.net.au Sat Feb 19 06:03:27 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sat Feb 19 06:03:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class Message-ID: <4216C89F.3040400@iinet.net.au> This is something I've typed way too many times: Py> class C(): File "", line 1 class C(): ^ SyntaxError: invalid syntax It's the asymmetry with functions that gets to me - defining a function with no arguments still requires parentheses in the definition statement, but defining a class with no bases requires the parentheses to be omitted. Which leads in to the real question: Does this *really* need to be a syntax error? Or could it be used as an easier way to spell "class C(object):"? Then, in Python 3K, simply drop support for omitting the parentheses from class definitions - require inheriting from ClassicClass instead. This would also have the benefit that the elimination of defaulting to classic classes would cause a syntax error rather than subtle changes in behaviour. Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From abo at minkirri.apana.org.au Sat Feb 19 06:18:00 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 19 06:18:10 2005 Subject: [Python-Dev] builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com> <20050218113608.GB25496@vicky.ecs.soton.ac.uk> Message-ID: <024f01c51642$612a6c70$24ed0ccb@apana.org.au> From: "Armin Rigo" > Hi Tim, > > > On Thu, Feb 17, 2005 at 01:44:11PM -0500, Tim Peters wrote: > > > 256 ** struct.calcsize('P') > > > > Now if you'll just sign and fax a Zope contributor agreement, I'll > > upgrade ZODB to use this slick trick . > > I hereby donate this line of code to the public domain :-) Damn... we can't use it then! Seriously, on the Python lists there has been a discussion rejecting an md5sum implementation because the author "donated it to the public domain". Apparently lawyers have decided that you can't give code away. Intellectual charity is illegal :-) ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From abo at minkirri.apana.org.au Sat Feb 19 06:38:36 2005 From: abo at minkirri.apana.org.au (Donovan Baarda) Date: Sat Feb 19 06:38:48 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c References: <1108090248.3753.53.camel@schizo> <226e9c65e562f9b0439333053036fef3@redivi.com> <1108102539.3753.87.camel@schizo> <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> <4215B010.2090600@v.loewis.de> Message-ID: <027b01c51645$42262dc0$24ed0ccb@apana.org.au> From: "Martin v. 
L?wis" > Donovan Baarda wrote: > > This patch keeps the current md5c.c, md5module.c files and adds the > > following; _hashopenssl.c, hashes.py, md5.py, sha.py. > [...] > > If all we wanted to do was fix the md5 module > > If we want to fix the licensing issues with the md5 module, this patch > does not help at all, as it keeps the current md5 module (along with > its licensing issues). So any patch to solve the problem will need > to delete the code with the questionable license. It maybe half fixes it in that if Python is happy with the RSA one, they can continue to include it, and if Debian is unhappy with it, they can remove it and build against openssl. It doesn't fully fix the license problem. It is still worth considering because it doesn't make it worse, and it does allow Python to use much faster implementations and support other digest algorithms when openssl is available. > Then, the approach in the patch breaks the promise that the md5 module > is always there. It would require that OpenSSL is always there - a > promise that we cannot make (IMO). It would be better if found an alternative md5c.c. I found one that was the libmd implementation that someone mildly tweaked and then slapped an LGPL on. I have a feeling that would make the lawyers tremble more than the "public domain" libmd one, unless they are happy that someone else is prepared to wear the grief for slapping a LGPL onto something public domain. Probably the best at the moment is the sourceforge one, which is listed as having a "zlib/libpng licence". ---------------------------------------------------------------- Donovan Baarda http://minkirri.apana.org.au/~abo/ ---------------------------------------------------------------- From greg at electricrain.com Sat Feb 19 07:46:32 2005 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Feb 19 07:46:35 2005 Subject: [Python-Dev] license issues with profiler.py and md5.h/md5c.c In-Reply-To: <4215B010.2090600@v.loewis.de> References: <20050211175118.GC25441@zot.electricrain.com> <00c701c5108e$f3d0b930$24ed0ccb@apana.org.au> <5d300838ef9716aeaae53579ab1f7733@redivi.com> <013501c510ae$2abd7360$24ed0ccb@apana.org.au> <20050212133721.GA13429@rogue.amk.ca> <20050212210402.GE25441@zot.electricrain.com> <1108340374.3768.33.camel@schizo> <20050217065330.GP25441@zot.electricrain.com> <1108699791.3758.98.camel@schizo> <4215B010.2090600@v.loewis.de> Message-ID: <20050219064632.GF14279@zot.electricrain.com> On Fri, Feb 18, 2005 at 10:06:24AM +0100, "Martin v. L?wis" wrote: > Donovan Baarda wrote: > >This patch keeps the current md5c.c, md5module.c files and adds the > >following; _hashopenssl.c, hashes.py, md5.py, sha.py. > [...] > >If all we wanted to do was fix the md5 module > > If we want to fix the licensing issues with the md5 module, this patch > does not help at all, as it keeps the current md5 module (along with > its licensing issues). So any patch to solve the problem will need > to delete the code with the questionable license. > > Then, the approach in the patch breaks the promise that the md5 module > is always there. It would require that OpenSSL is always there - a > promise that we cannot make (IMO). I'm aware of that. My goals are primarily to get a good openssl based hashes/digest module going to be used instead of the built in implementations when openssl available because openssl is -so- much faster. Fixing the debian instigated md5 licensing issue is secondary and is something I'll get to later on after i work on the fun stuff. 
And as Donovan has said, the patch already does present debian with the option of dropping that md5 module and using the openssl derived one instead if they're desperate. based on laziness winning and the issue being so minor i hope they just wait for a patch from me that replaces the md5c.c with one of the acceptably licensed ones for their 2.3/2.4 packages. -g From aleax at aleax.it Sat Feb 19 08:55:44 2005 From: aleax at aleax.it (Alex Martelli) Date: Sat Feb 19 08:55:48 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4216C89F.3040400@iinet.net.au> References: <4216C89F.3040400@iinet.net.au> Message-ID: <03a3f1153caf34d2d087fcc240486a24@aleax.it> On 2005 Feb 19, at 06:03, Nick Coghlan wrote: > This is something I've typed way too many times: > > Py> class C(): > File "", line 1 > class C(): > ^ > SyntaxError: invalid syntax > > It's the asymmetry with functions that gets to me - defining a > function with no arguments still requires parentheses in the > definition statement, but defining a class with no bases requires the > parentheses to be omitted. Seconded. It's always irked me enough that it's the only ``apology'' for Python syntax you'll see in the Nutshell -- top of p. 71, "The syntax of the class statement has a small, tricky difference from that of the def statement" etc. > Which leads in to the real question: Does this *really* need to be a > syntax error? Or could it be used as an easier way to spell "class > C(object):"? -0 ... instinctively, I dread the task of explaining / teaching about the rationale for this somewhat kludgy transitional solution [[empty parentheses may be written OR omitted, with large difference in meaning, not very related to other cases of such parentheses]], even though I think you're right that it would make the future transition to 3.0 somewhat safer. Alex From python at rcn.com Sat Feb 19 09:01:14 2005 From: python at rcn.com (Raymond Hettinger) Date: Sat Feb 19 09:08:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class References: <4216C89F.3040400@iinet.net.au> <03a3f1153caf34d2d087fcc240486a24@aleax.it> Message-ID: <000101c51659$b2f79e80$afbb9d8d@oemcomputer> > > This is something I've typed way too many times: > > > > Py> class C(): > > File "", line 1 > > class C(): > > ^ > > SyntaxError: invalid syntax > > > > It's the asymmetry with functions that gets to me - defining a > > function with no arguments still requires parentheses in the > > definition statement, but defining a class with no bases requires the > > parentheses to be omitted. > > Seconded. It's always irked me enough that it's the only ``apology'' > for Python syntax you'll see in the Nutshell -- top of p. 71, "The > syntax of the class statement has a small, tricky difference from that > of the def statement" etc. +1 For me, this would come-up when experimenting with mixins. Adding and removing a mixin usually entailed a corresponding change to the parentheses. Raymond From michael.walter at gmail.com Sat Feb 19 09:12:50 2005 From: michael.walter at gmail.com (Michael Walter) Date: Sat Feb 19 09:12:54 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <000101c51659$b2f79e80$afbb9d8d@oemcomputer> References: <4216C89F.3040400@iinet.net.au> <03a3f1153caf34d2d087fcc240486a24@aleax.it> <000101c51659$b2f79e80$afbb9d8d@oemcomputer> Message-ID: <877e9a1705021900123c6f0ce2@mail.gmail.com> But... only as an additional option, not as a replacement, right? 
Michael On Sat, 19 Feb 2005 03:01:14 -0500, Raymond Hettinger wrote: > > > This is something I've typed way too many times: > > > > > > Py> class C(): > > > File "", line 1 > > > class C(): > > > ^ > > > SyntaxError: invalid syntax > > > > > > It's the asymmetry with functions that gets to me - defining a > > > function with no arguments still requires parentheses in the > > > definition statement, but defining a class with no bases requires the > > > parentheses to be omitted. > > > > Seconded. It's always irked me enough that it's the only ``apology'' > > for Python syntax you'll see in the Nutshell -- top of p. 71, "The > > syntax of the class statement has a small, tricky difference from that > > of the def statement" etc. > > +1 For me, this would come-up when experimenting with mixins. Adding and removing a mixin usually entailed a corresponding > change to the parentheses. > > > Raymond > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/michael.walter%40gmail.com > From fredrik at pythonware.com Sat Feb 19 10:33:59 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Feb 19 10:33:57 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com><5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com><5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> Message-ID: Tim Peters wrote: > [Fredrik Lundh] >> wouldn't be the first time... > > How soon we forget . oh, that was in the dark ages of Python 1.4. I've rebooted myself many times since then... > Fredrik introduced a pile of optimizations special-casing the snot out > of small integers into ceval.c a long time ago iirc, you claimed that after a couple of major optimizations had been added, "there's no single optimization left that can speed up pystone by more than X%", so I came up with an "(X+2)%" optimization. you should do that more often ;-) > As a result, "i == j" in Python source code, when i and j are little > ints, is much faster than comparing i and j via any other route in > Python. which explains why my "in" vs. "or" tests showed good results for integers, but not for strings... I'd say that this explains why it would still make sense to let the code generator change "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. (see my earlier timeit results) From fredrik at pythonware.com Sat Feb 19 10:40:16 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat Feb 19 10:40:11 2005 Subject: [Python-Dev] Re: builtin_id() returns negative numbers References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com><20050218113608.GB25496@vicky.ecs.soton.ac.uk> <024f01c51642$612a6c70$24ed0ccb@apana.org.au> Message-ID: Donovan Baarda wrote: > Apparently lawyers have decided that you can't give code away. Intellectual > charity is illegal :-) what else would a lawyer say? do you really expect lawyers to admit that there are ways to do things that don't involve lawyers? 
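The integer-only case /F describes is easy enough to count in a given code base; a throwaway sketch, nothing rigorous:

    import re, glob

    pattern = re.compile(r'\bin\s*\(\s*\d+\s*(?:,\s*\d+\s*)+\)')
    total = 0
    for name in glob.glob('*.py'):
        total += len(pattern.findall(open(name).read()))
    print total, 'integer-tuple membership tests'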
From martin at v.loewis.de Sat Feb 19 11:47:13 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 19 11:47:15 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com><5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com><5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> Message-ID: <42171931.4020600@v.loewis.de> Fredrik Lundh wrote: > I'd say that this explains why it would still make sense to let the code generator change > "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. How often does that happen in real code? Regards, Martin From martin at v.loewis.de Sat Feb 19 11:54:06 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat Feb 19 11:54:09 2005 Subject: [Python-Dev] builtin_id() returns negative numbers In-Reply-To: <024f01c51642$612a6c70$24ed0ccb@apana.org.au> References: <4210AFAA.9060108@thule.no><1f7befae050214074122b715a@mail.gmail.com><20050217181119.GA3055@vicky.ecs.soton.ac.uk><1f7befae050217104431312214@mail.gmail.com> <20050218113608.GB25496@vicky.ecs.soton.ac.uk> <024f01c51642$612a6c70$24ed0ccb@apana.org.au> Message-ID: <42171ACE.9020502@v.loewis.de> Donovan Baarda wrote: > Seriously, on the Python lists there has been a discussion rejecting an > md5sum implementation because the author "donated it to the public domain". > Apparently lawyers have decided that you can't give code away. Intellectual > charity is illegal :-) Despite the smiley: It is not illegal - it just does not have any legal effect. Just by saying "I am the chancellor of Germany", it does not make you the chancellor of Germany; instead, you need to go through the election processes. Likewise, saying "the public can have my code" does not make it so. Instead, you have to formulate a license that permits the public to do with the code what you think it should be allowed to do. Most people who've used the term "public domain" in the past didn't really care whether they still have the copyright - what they wanted to say is that anybody can use their work for any purpose. Regards, Martin From mal at egenix.com Sat Feb 19 13:06:37 2005 From: mal at egenix.com (M.-A. Lemburg) Date: Sat Feb 19 13:06:40 2005 Subject: [Python-Dev] Prospective Peephole Transformation In-Reply-To: <002401c51624$1f0ff3a0$803cc797@oemcomputer> References: <002401c51624$1f0ff3a0$803cc797@oemcomputer> Message-ID: <42172BCD.2010807@egenix.com> Raymond Hettinger wrote: >>Hmm, what if you'd teach tuples to do faster contains lookups for >>string or integer only content, e.g. by introducing sub-types for >>string-only and integer-only tuples ?! > > > For a linear search, tuples are already pretty darned good and leave > room for only microscopic O(n) improvements. The bigger win comes from > using a better algorithm and data structure -- hashing beats linear > search hands-down. The constant search time is faster for all n>1, > resulting in much improved scalability. No tweaking of > tuple.__contains__() can match it. > > Sets are the right data structure for fast membership testing. I would > love for sets to be used internally while letting users continue to > write the clean looking code shown above. 
That's what I was thinking off: if the compiler can detect the constant nature and the use of a common type, it could set a flag in the tuple type telling it about this feature. The tuple could then convert the tuple contents to a set internally and when the __contains__ hook is first called and use the set for the lookup. Alternatively, you could use a sub-type for a few common cases. In either case you would have to teach marshal how to treat the extra bit of information. The user won't notice all this in the Python program and can continue to write clean code (in some cases, even cleaner code than before - I usually use the keyword hack to force certain things into the locals at module load time, but would love to get rid off this). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 19 2005) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! :::: From aahz at pythoncraft.com Sat Feb 19 16:11:46 2005 From: aahz at pythoncraft.com (Aahz) Date: Sat Feb 19 16:11:48 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: <42171931.4020600@v.loewis.de> References: <1f7befae050218174345e029e8@mail.gmail.com> <42171931.4020600@v.loewis.de> Message-ID: <20050219151146.GA4837@panix.com> On Sat, Feb 19, 2005, "Martin v. L?wis" wrote: > Fredrik Lundh wrote: >> >>I'd say that this explains why it would still make sense to let the code >>generator change >>"x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c >>are all integers. > > How often does that happen in real code? Dunno how often, but I was working on some code at my company yesterday that did that -- we use a lot of ints to indicate options. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From mwh at python.net Sat Feb 19 21:27:13 2005 From: mwh at python.net (Michael Hudson) Date: Sat Feb 19 21:27:16 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4216C89F.3040400@iinet.net.au> (Nick Coghlan's message of "Sat, 19 Feb 2005 15:03:27 +1000") References: <4216C89F.3040400@iinet.net.au> Message-ID: <2mpsywxplq.fsf@starship.python.net> Nick Coghlan writes: > This is something I've typed way too many times: > > Py> class C(): > File "", line 1 > class C(): > ^ > SyntaxError: invalid syntax > > It's the asymmetry with functions that gets to me - defining a > function with no arguments still requires parentheses in the > definition statement, but defining a class with no bases requires the > parentheses to be omitted. Yeah, this has annoyed me for ages too. However! You obviously haven't read Misc/HISTORY recently enough :) The surprising thing is that "class C():" used to work (in fact before 0.9.4 the parens mandatory). It became a syntax error in 0.9.9, seemingly because Guido was peeved that people hadn't updated all their old code to the new syntax. I wonder if he'd like to try that trick again today :) I'd still vote for it to be changed. 
> Which leads in to the real question: Does this *really* need to be a > syntax error? Or could it be used as an easier way to spell "class > C(object):"? -1. Too magical, too opaque. > Then, in Python 3K, simply drop support for omitting the parentheses > from class definitions - require inheriting from ClassicClass > instead. HISTORY repeats itself... Cheers, mwh -- [Perl] combines all the worst aspects of C and Lisp: a billion different sublanguages in one monolithic executable. It combines the power of C with the readability of PostScript. -- Jamie Zawinski From reinhold-birkenfeld-nospam at wolke7.net Sun Feb 20 00:26:36 2005 From: reinhold-birkenfeld-nospam at wolke7.net (Reinhold Birkenfeld) Date: Sun Feb 20 00:26:09 2005 Subject: [Python-Dev] Some old patches Message-ID: Hello, this time working up some of the patches with beards: - #751943 Adds the display of the line number to cgitb stack traces even when the source code is not available to cgitb. This makes sense in the case that the source is lying around somewhere else. However, the original patch generates a link to "file://?" on the occasion that the source file name is not known. I have created a new patch (#1144549) that fixes this, and also renames all local variables "file" in cgitb to avoid builtin shadowing. - #749830 Allows the mmap call on UNIX to be supplied a length argument of 0 to mmap the whole file (which is already implemented on Windows). However, the patch doesn't apply on current CVS, so I made a new patch (#1144555) that does. Recommend apply, unless this may cause problems on some Unices which I don't know about. - #547176 Allows the rlcompleter to complete on [] item access (constructs like sim[0]. could then be completed). As comments in the patch point out, this easily leads to execution of arbitrary code via __getitem__, which is IMHO a too big side effect of completing (though IPython does this). Recommend reject. - #645894 Allows the use of resource.getrusage time values for profile.py, which results in better timing resolution on FreeBSD. However, this may lead to worse timing resolution on other OS, so perhaps the patch should be changed to be restricted to this particular platform. - #697613 -- bug #670311 This handles the problem that python -i exits on SystemExit exceptions by introducting two new API functions. While it works for me, I am not sure whether this is too much overhead for fixing a glitch no one else complained about. - #802188 This adds a specific error message for invalid tokens after a '\' used as line continuation. While it may be helpful when the invalid token is whitespace, Python usually shows the exact location of the invalid token, so you can examine this line and find the error. On the other hand, the patch is no big deal, so if a specific error message is welcome, it may as well be applied. Enough for today... and best of all: I have no patch which I want to promote! 
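For the mmap item (#749830, refreshed as #1144555), the behaviour being requested is just the Windows-style whole-file mapping on Unix as well; with the patch applied, a sketch like this works (the file name is made up):

    import mmap, os

    f = open('data.bin', 'r+b')
    m = mmap.mmap(f.fileno(), 0)     # length 0: map the whole file
    assert len(m) == os.fstat(f.fileno()).st_size
    m.close()
    f.close()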
Reinhold From gvanrossum at gmail.com Sun Feb 20 02:08:09 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 02:08:15 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <2mpsywxplq.fsf@starship.python.net> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> Message-ID: > > This is something I've typed way too many times: > > > > Py> class C(): > > File "", line 1 > > class C(): > > ^ > > SyntaxError: invalid syntax > > > > It's the asymmetry with functions that gets to me - defining a > > function with no arguments still requires parentheses in the > > definition statement, but defining a class with no bases requires the > > parentheses to be omitted. It's fine to fix this in 2.5. I guess I can add this to my list of early oopsies -- although to the very bottom. :-) It's *not* fine to make C() mean C(object). (We already have enough other ways to declaring new-style classes.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at iinet.net.au Sun Feb 20 03:13:25 2005 From: ncoghlan at iinet.net.au (Nick Coghlan) Date: Sun Feb 20 03:13:31 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> Message-ID: <4217F245.2020004@iinet.net.au> Guido van Rossum wrote: >>>This is something I've typed way too many times: >>> >>>Py> class C(): >>> File "", line 1 >>> class C(): >>> ^ >>>SyntaxError: invalid syntax >>> >>>It's the asymmetry with functions that gets to me - defining a >>>function with no arguments still requires parentheses in the >>>definition statement, but defining a class with no bases requires the >>>parentheses to be omitted. > > > It's fine to fix this in 2.5. I guess I can add this to my list of > early oopsies -- although to the very bottom. :-) > > It's *not* fine to make C() mean C(object). (We already have enough > other ways to declaring new-style classes.) > Fair enough - the magnitude of the semantic difference between "class C:" and "class C():" bothered me a little, too. I'll just have to remember that I can put "__metaclass__ == type" at the top of modules :) Cheers, Nick. -- Nick Coghlan | ncoghlan@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net From jack at performancedrivers.com Sun Feb 20 04:35:38 2005 From: jack at performancedrivers.com (Jack Diederich) Date: Sun Feb 20 04:35:42 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <4217F245.2020004@iinet.net.au> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> Message-ID: <20050220033538.GF9263@performancedrivers.com> On Sun, Feb 20, 2005 at 12:13:25PM +1000, Nick Coghlan wrote: > Guido van Rossum wrote: > >>>This is something I've typed way too many times: > >>> > >>>Py> class C(): > >>> File "", line 1 > >>> class C(): > >>> ^ > >>>SyntaxError: invalid syntax > >>> > >>>It's the asymmetry with functions that gets to me - defining a > >>>function with no arguments still requires parentheses in the > >>>definition statement, but defining a class with no bases requires the > >>>parentheses to be omitted. > > > > > >It's fine to fix this in 2.5. I guess I can add this to my list of > >early oopsies -- although to the very bottom. :-) > > > >It's *not* fine to make C() mean C(object). 
(We already have enough > >other ways to declaring new-style classes.) > > > > Fair enough - the magnitude of the semantic difference between "class C:" > and "class C():" bothered me a little, too. I'll just have to remember that > I can put "__metaclass__ == type" at the top of modules :) I always use new style classes so I only have to remember one set of behaviors. "__metaclass__ = type" is warty, it has the "action at a distance" problem that decorators solve for functions. I didn't dig into the C but does having 'type' as metaclass guarantee the same behavior as inheriting 'object' or does object provide something type doesn't? *wince* Py3k? Faster please[*]. -Jack * a US-ism of a conservative bent, loosely translated as "change for the better? I'll get behind that." From python at rcn.com Sun Feb 20 04:46:40 2005 From: python at rcn.com (Raymond Hettinger) Date: Sun Feb 20 04:51:42 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: Message-ID: <001301c516fe$ed674700$f33ec797@oemcomputer> > > > This is something I've typed way too many times: > > > > > > Py> class C(): > > > File "", line 1 > > > class C(): > > > ^ > > > SyntaxError: invalid syntax > > > > > > It's the asymmetry with functions that gets to me - defining a > > > function with no arguments still requires parentheses in the > > > definition statement, but defining a class with no bases requires the > > > parentheses to be omitted. > > It's fine to fix this in 2.5. Yea! Raymond From raymond.hettinger at verizon.net Sun Feb 20 05:20:25 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sun Feb 20 05:24:29 2005 Subject: [Python-Dev] UserString Message-ID: <000001c51703$80f97520$f33ec797@oemcomputer> I noticed that UserString objects have methods that do not accept other UserString objects as arguments: >>> from UserString import UserString >>> UserString('slartibartfast').count(UserString('a')) Traceback (most recent call last): File "", line 1, in -toplevel- UserString('slartibartfast').count(UserString('a')) File "C:\PY24\lib\UserString.py", line 66, in count return self.data.count(sub, start, end) TypeError: expected a character buffer object >>> UserString('abc') in UserString('abcde') Traceback (most recent call last): File "", line 1, in -toplevel- UserString('abc') in UserString('abcde') File "C:\PY24\lib\UserString.py", line 35, in __contains__ return char in self.data TypeError: 'in ' requires string as left operand This sort of thing is easy to test for and easy to fix. The question is whether we care about updating this module anymore or is it a relic. Also, is the use case one that we care about. AFAICT, this has never come up before. Raymond From gvanrossum at gmail.com Sun Feb 20 06:33:31 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 06:33:38 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <20050220033538.GF9263@performancedrivers.com> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: > I didn't dig into the C but does having 'type' > as metaclass guarantee the same behavior as inheriting 'object' or does object > provide something type doesn't? *wince* No, they're equivalent. __metaclass__ = type cause the base class to be object, and a base class of object causes the metaclass to be type. But I agree wholeheartedly: class C(object): is much preferred. 
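A small check of that equivalence (Python 2.x semantics, written with asserts so no session-dependent output is involved): both spellings give a new-style class whose only base is object and whose metaclass is type.

    class A:
        __metaclass__ = type

    class B(object):
        pass

    assert A.__bases__ == (object,)
    assert B.__bases__ == (object,)
    assert type(A) is type(B) is type
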
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Sun Feb 20 09:15:25 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 09:15:29 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <20050220033538.GF9263@performancedrivers.com> References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> On 2005 Feb 20, at 04:35, Jack Diederich wrote: > I always use new style classes so I only have to remember one set of > behaviors. I agree: that's reason #1 I recommend always using new-style whenever I teach / tutor / mentor in Python nowadays. > "__metaclass__ = type" is warty, it has the "action at a distance" > problem that > decorators solve for functions. I disagree. I view it as akin to a "from __future__ import" except that -- since the compiler doesn't need-to-know, as typeclass-picking happens at runtime -- it was accomplished by less magical and more flexible means. > I didn't dig into the C but does having 'type' > as metaclass guarantee the same behavior as inheriting 'object' or > does object > provide something type doesn't? *wince* I believe the former holds, since for example: >>> class X: __metaclass__ = type ... >>> X.__bases__ (,) If you're making a newstyle class with an oldstyle base, it's different: >>> class Y: pass ... >>> class X(Y): __metaclass__ = type ... Traceback (most recent call last): File "", line 1, in ? TypeError: Error when calling the metaclass bases a new-style class can't have only classic bases in this case, you do need to inherit object explicitly: >>> class X(Y, object): pass ... >>> X.__bases__ (, ) >>> type(X) This is because types.ClassType turns somersaults to enable this: in this latter construct, Python's mechanisms determine ClassType as the metaclass (it's the metaclass of the first base class), but then ClassType in turn sniffs around for another metaclass to delegate to, among the supplied bases, and having found one washes its hands of the whole business;-). Alex From aleax at aleax.it Sun Feb 20 09:32:35 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 09:32:43 2005 Subject: [Python-Dev] UserString In-Reply-To: <000001c51703$80f97520$f33ec797@oemcomputer> References: <000001c51703$80f97520$f33ec797@oemcomputer> Message-ID: <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> On 2005 Feb 20, at 05:20, Raymond Hettinger wrote: ... > This sort of thing is easy to test for and easy to fix. The question > is > whether we care about updating this module anymore or is it a relic. > Also, is the use case one that we care about. AFAICT, this has never > come up before. I did have some issues w/UserString at a client's, but that was connected to some code doing type-checking (and was fixed by injecting basestring as a base of the client's subclass of UserString and ensuring the type-checking always used isinstance and basestring). My two cents: a *mixin* to make it easy to emulate full-fledged strings would be almost as precious as your DictMixin (ones to emulate lists, sets, files [w/buffering], ..., might be even more useful). The point is all of these rich interfaces have a lot of redundancy and a mixin can provide all methods generically based on a few fundamental methods, which can be quite useful, just like DictMixin. 
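As a concrete, purely illustrative sketch of that idea -- the names StringMixin and Tag are invented here, this is not an existing module -- a subclass would only have to supply __str__ and the mixin would derive the rest of the str interface from it, much as DictMixin does for mappings:

    class StringMixin:
        def _as_str(self):
            return str(self)               # the one "fundamental" operation
        def __len__(self):
            return len(self._as_str())
        def __contains__(self, sub):
            return self._as_str().find(str(sub)) >= 0
        def count(self, sub, *args):
            return self._as_str().count(str(sub), *args)
        def upper(self):
            return self._as_str().upper()
        def lower(self):
            return self._as_str().lower()
        # ... and so on for the rest of the str interface

    class Tag(StringMixin):
        def __init__(self, text):
            self.text = text
        def __str__(self):
            return self.text

    print Tag('slartibartfast').count(Tag('a'))   # 3
    print Tag('abc') in Tag('abcde')              # True

Because the mixin converts to a real str before delegating, it never hands a wrapper object to a str method, so the TypeErrors shown earlier in the thread don't arise.
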
But a complete emulation of strings (etc) is mostly of "didactical" use, a sort of checklist to help ensure one implements all methods, not really useful for new code "in production"; at least, I haven't found such uses recently. The above-mentioned client's class was an attempt to join RE functionality to strings and was a rather messy hack anyway, for example (perhaps prompted by client's previous familiarity with Perl, I'm not sure); at any rate, the client should probably have subclassed str or unicode if he really wanted that hack. I can't think of a GOOD use for UserString (etc) since subclassing str (etc) was allowed in 2.2 or at least since a few loose ends about newstyle classes were neatly tied up in 2.3. If we do decide "it is a relic, no more updates" perhaps some indication of deprecation would be warranted. ((In any case, I do think the mixins would be useful)). Alex From mwh at python.net Sun Feb 20 10:38:29 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 10:38:31 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> (Alex Martelli's message of "Sun, 20 Feb 2005 09:15:25 +0100") References: <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> <243fad4f779b2c979e1aa71fd866cda1@aleax.it> Message-ID: <2mvf8nwoyy.fsf@starship.python.net> Alex Martelli writes: > On 2005 Feb 20, at 04:35, Jack Diederich wrote: > >> I didn't dig into the C but does having 'type' >> as metaclass guarantee the same behavior as inheriting 'object' or >> does object >> provide something type doesn't? *wince* > > I believe the former holds, since for example: I was going to say that 'type(object) is type' is everything you need to know, but you also need the bit of code in type_new that replaces an empty bases tuple with (object,) -- but class C: __metaclass__ = Type and class C(object): pass produce identical classes. > This is because types.ClassType turns somersaults to enable this: in > this latter construct, Python's mechanisms determine ClassType as the > metaclass (it's the metaclass of the first base class), but then > ClassType in turn sniffs around for another metaclass to delegate to, > among the supplied bases, and having found one washes its hands of the > whole business;-). It's also notable that type_new does exactly the same thing! Cheers, mwh -- Jokes around here tend to get followed by implementations. -- from Twisted.Quotes From fredrik at pythonware.com Sun Feb 20 13:07:17 2005 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun Feb 20 13:07:21 2005 Subject: [Python-Dev] Re: Re: Prospective Peephole Transformation References: <4215FD5F.4040605@xs4all.nl><000101c515cc$9f96d0a0$803cc797@oemcomputer><5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com><5.1.1.6.0.20050218113820.02f83870@mail.telecommunity.com><5.1.1.6.0.20050218120310.03c70510@mail.telecommunity.com> <1f7befae050218174345e029e8@mail.gmail.com> <42171931.4020600@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> I'd say that this explains why it would still make sense to let the code generator change >> "x in (a, b, c)" to "x == a or x == b or x == c", as long as a, b, and c are all integers. > > How often does that happen in real code? 
don't know, but it happens: [fredrik@brain Python-2.4]$ grep "if.*in *([0-9]" Lib/*.py Lib/BaseHTTPServer.py: if self.command != 'HEAD' and code >= 200 and code not in (204, 304): Lib/asyncore.py: if err in (0, EISCONN): Lib/mimify.py: if len(args) not in (0, 1, 2): Lib/sunau.py: if nchannels not in (1, 2, 4): Lib/sunau.py: if sampwidth not in (1, 2, 4): Lib/urllib2.py: if code not in (200, 206): Lib/urllib2.py: if (code in (301, 302, 303, 307) and m in ("GET", "HEAD") Lib/whichdb.py: if magic in (0x00061561, 0x61150600): Lib/whichdb.py: if magic in (0x00061561, 0x61150600): [fredrik@brain Python-2.4]$ grep "if.*in *\[[0-9]" Lib/*.py Lib/decimal.py: if value[0] not in [0,1]: Lib/smtplib.py: if code not in [235, 503]: judging from the standard library, "string in string tuple/list" is a lot more common. From raymond.hettinger at verizon.net Sun Feb 20 16:39:24 2005 From: raymond.hettinger at verizon.net (Raymond Hettinger) Date: Sun Feb 20 16:43:33 2005 Subject: [Python-Dev] Store x Load x --> DupStore Message-ID: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Any objections to new peephole transformation that merges a store/load pair into a single step? There is a tested patch at: www.python.org/sf/1144842 It folds the two steps into a new opcode. In the case of store_name/load_name, it saves one three byte instruction, a trip around the eval-loop, two stack mutations, a incref/decref pair, a dictionary lookup, and an error check (for the lookup). While it acts like a dup followed by a store, it is implemented more simply as a store that doesn't pop the stack. The transformation is broadly applicable and occurs thousands of times in the standard library and test suite. Raymond Hettinger From gvanrossum at gmail.com Sun Feb 20 17:06:28 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 17:06:33 2005 Subject: [Python-Dev] UserString In-Reply-To: <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: [Alex] > I did have some issues w/UserString at a client's, but that was > connected to some code doing type-checking (and was fixed by injecting > basestring as a base of the client's subclass of UserString and > ensuring the type-checking always used isinstance and basestring). Oh, bah. That's not what basestring was for. I can't blame you or your client, but my *intention* was that basestring would *only* be the base of the two *real* built-in string types (str and unicode). The reason for its existence was that some low-level built-in (or extension) operations only accept those two *real* string types and consequently some user code might want to validate ("look before you leap") its own arguments if those eventually ended up being passed to aforementioned low-level built-in code. My intention was always that UserString and other string-like objects would explicitly *not* inherit from basestring. Of course, my intention was lost, your client used basestring to mean "any string-ish object", got away with it because they weren't using any of those low-level built-ins, and you had to comply rather than explain it to them. Sounds like a good reason to add interfaces to the language. 
:-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From gvanrossum at gmail.com Sun Feb 20 17:17:15 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Sun Feb 20 17:17:17 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <000101c51762$5b8369e0$7c1cc797@oemcomputer> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: > Any objections to new peephole transformation that merges a store/load > pair into a single step? > > There is a tested patch at: www.python.org/sf/1144842 > > It folds the two steps into a new opcode. In the case of > store_name/load_name, it saves one three byte instruction, a trip around > the eval-loop, two stack mutations, a incref/decref pair, a dictionary > lookup, and an error check (for the lookup). While it acts like a dup > followed by a store, it is implemented more simply as a store that > doesn't pop the stack. The transformation is broadly applicable and > occurs thousands of times in the standard library and test suite. What exactly are you trying to accomplish? Do you have examples of code that would be sped up measurably by this transformation? Does anybody care about those speedups even if they *are* measurable? I'm concerned that there's too much hacking of the VM going on with too little benefit. The VM used to be relatively simple code that many people could easily understand. The benefit of that was that new language features could be implemented relatively easily even by relatively inexperienced developers. All that seems to be lost, and I fear that the end result is going to be a calcified VM that's only 10% faster than the original, since we appear to have reached the land of diminishing returns here. I don't see any concentrated efforts trying to figure out where the biggest pain is and how to relieve it; rather, it looks as if the easiest targets are being approached. Now, if these were low-hanging fruit, I'd happily agree, but I'm not so sure that they are all that valuable. Where are the attempts to speed up function/method calls? That's an area where we could *really* use a breakthrough... Eventually we'll need a radically different approach, maybe PyPy, maybe Starkiller. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Sun Feb 20 17:41:31 2005 From: aleax at aleax.it (Alex Martelli) Date: Sun Feb 20 17:41:36 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: On 2005 Feb 20, at 17:06, Guido van Rossum wrote: > [Alex] >> I did have some issues w/UserString at a client's, but that was >> connected to some code doing type-checking (and was fixed by injecting >> basestring as a base of the client's subclass of UserString and >> ensuring the type-checking always used isinstance and basestring). > > Oh, bah. That's not what basestring was for. I can't blame you or your > client, but my *intention* was that basestring would *only* be the > base of the two *real* built-in string types (str and unicode). The > reason for its existence was that some low-level built-in (or > extension) operations only accept those two *real* string types and > consequently some user code might want to validate ("look before you > leap") its own arguments if those eventually ended up being passed to > aforementioned low-level built-in code. My intention was always that > UserString and other string-like objects would explicitly *not* > inherit from basestring. 
Of course, my intention was lost, your client > used basestring to mean "any string-ish object", got away with it > because they weren't using any of those low-level built-ins, and you > had to comply rather than explain it to them. I would gladly have explained, if I had understood your design intent correctly at the time (whether the explanation would have done much good is another issue); but I'm afraid I didn't. Now I do (thanks for explaining!) though I'm not sure what can be done in retrospect to communicate it more widely. The need to check "is this thingy here string-like" is sort of frequent, because strings are sequences which, when iterated on, yield sequences (strings of length 1) which, when iterated on, yield sequences ad infinitum. Strings are sequences but more often than not one wants to treat them as "scalars" instead. isinstance and basestring allow that frequently needed check so nicely, that, if they're not intended for it, they're an "attractive nuisance" legally;-). The need to make stringlike thingies emerges both for bad reasons (e.g., I never liked that client's "string cum re" perloidism) and good ones (e.g., easing the interfacing with external frameworks that have their own stringythings, such as Qt's QtString); and checking if something is stringlike is also frequent, as per previous para. Darn... > Sounds like a good reason to add interfaces to the language. :-) If an interface must be usable to say "is this string-like?" it will have to be untyped, I guess, and the .translate method will be a small problem (one-argument for unicode, two-args for str, and very different argument semantics) -- don't recall offhand if there are other such nonpolymorphic methods there. Alex From pje at telecommunity.com Sun Feb 20 18:37:41 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 18:34:59 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >Where are the attempts to speed up function/method calls? That's an >area where we could *really* use a breakthrough... Amen! So what happened to Armin's pre-allocated frame patch? Did that get into 2.4? Also, does anybody know where all the time goes in a function call, anyway? I assume that some of the pieces are: * tuple/dict allocation for arguments (but some of this is bypassed on the fast branch for Python-to-Python calls, right?) * frame allocation and setup (but Armin's patch was supposed to eliminate most of this whenever a function isn't being used re-entrantly) * argument "parsing" (check number of args, map kwargs to their positions, etc.; but isn't some of this already fast-pathed for Python-to-Python calls?) I suppose the fast branch fixes don't help special methods like __getitem__ et al, since those don't go through the fast branch, but I don't think those are the majority of function calls. And whatever happened to CALL_METHOD? Do we need a tp_callmethod that takes an argument array, length, and keywords, so that we can skip instancemethod allocation in the common case of calling a method directly? From pje at telecommunity.com Sun Feb 20 18:15:44 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun Feb 20 18:35:09 2005 Subject: [Python-Dev] Requesting that a class be a new-style class In-Reply-To: <243fad4f779b2c979e1aa71fd866cda1@aleax.it> References: <20050220033538.GF9263@performancedrivers.com> <4216C89F.3040400@iinet.net.au> <2mpsywxplq.fsf@starship.python.net> <4217F245.2020004@iinet.net.au> <20050220033538.GF9263@performancedrivers.com> Message-ID: <5.1.1.6.0.20050220121233.021107a0@mail.telecommunity.com> At 09:15 AM 2/20/05 +0100, Alex Martelli wrote: >This is because types.ClassType turns somersaults to enable this: in this >latter construct, Python's mechanisms determine ClassType as the metaclass >(it's the metaclass of the first base class), but then ClassType in turn >sniffs around for another metaclass to delegate to, among the supplied >bases, and having found one washes its hands of the whole business;-). To be pedantic, the actual algorithm in 2.2+ has nothing to do with the first base class; that's the pre-2.2 algorithm. The 2.2 algorithm looks for the most-derived metaclass of the base classes, and simply ignores classic bases altogether. From martin at v.loewis.de Sun Feb 20 18:41:19 2005 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun Feb 20 18:41:22 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <4218CBBF.8030400@v.loewis.de> Guido van Rossum wrote: > I'm concerned that there's too much hacking of the VM going on with > too little benefit. I completely agree. It would be so much more useful if people tried to fix the bugs that have been reported. Regards, Martin From mwh at python.net Sun Feb 20 19:38:39 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 19:38:40 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: (Guido van Rossum's message of "Sun, 20 Feb 2005 08:17:15 -0800") References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <2mr7jbvzyo.fsf@starship.python.net> Guido van Rossum writes: >> Any objections to new peephole transformation that merges a store/load >> pair into a single step? >> >> There is a tested patch at: www.python.org/sf/1144842 >> >> It folds the two steps into a new opcode. In the case of >> store_name/load_name, it saves one three byte instruction, a trip around >> the eval-loop, two stack mutations, a incref/decref pair, a dictionary >> lookup, and an error check (for the lookup). While it acts like a dup >> followed by a store, it is implemented more simply as a store that >> doesn't pop the stack. The transformation is broadly applicable and >> occurs thousands of times in the standard library and test suite. I'm still a little curious as to what code creates such opcodes... > What exactly are you trying to accomplish? Do you have examples of > code that would be sped up measurably by this transformation? Does > anybody care about those speedups even if they *are* measurable? > > I'm concerned that there's too much hacking of the VM going on with > too little benefit. The VM used to be relatively simple code that many > people could easily understand. The benefit of that was that new > language features could be implemented relatively easily even by > relatively inexperienced developers. All that seems to be lost, and I > fear that the end result is going to be a calcified VM that's only 10% > faster than the original, since we appear to have reached the land of > diminishing returns here. 
In the case of the bytecode optimizer, I'm not sure this is a fair accusation. Even if you don't understand it, you can ignore it and not have your understanding of the rest of the VM affected (I'm not sure that compile.c has ever been "easily understood" in any case :). > I don't see any concentrated efforts trying to figure out where the > biggest pain is and how to relieve it; rather, it looks as if the > easiest targets are being approached. Now, if these were low-hanging > fruit, I'd happily agree, but I'm not so sure that they are all that > valuable. I think some of the peepholer's work are pure wins -- x,y = y,x unpacking and the creation of constant tuples certainly spring to mind. If Raymond wants to spend his time on this stuff, that's his choice. I don't think the obfuscation cost is all that high. > Where are the attempts to speed up function/method calls? That's an > area where we could *really* use a breakthrough... The problem is that it's hard! > Eventually we'll need a radically different approach, maybe PyPy, > maybe Starkiller. Yup. Cheers, mwh -- Gevalia is undrinkable low-octane see-through only slightly roasted bilge water. Compared to .us coffee it is quite drinkable. -- M?ns Nilsson, asr From mwh at python.net Sun Feb 20 20:00:13 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 20:00:30 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> (Phillip J. Eby's message of "Sun, 20 Feb 2005 12:37:41 -0500") References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <2mmztzvyyq.fsf@starship.python.net> "Phillip J. Eby" writes: > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >>Where are the attempts to speed up function/method calls? That's an >>area where we could *really* use a breakthrough... > > Amen! > > So what happened to Armin's pre-allocated frame patch? Did that get into 2.4? No, because it slows down recursive function calls, or functions that happen to be called at the same time in different threads. Fixing *that* would require things like code specific frame free-lists and that's getting a bit convoluted and might waste quite a lot of memory. Eliminating the blockstack would be nice (esp. if it's enough to get frames small enough that they get allocated by PyMalloc) but this seemed to be tricky too (or at least Armin, Samuele and I spent a cuple of hours yakking about it on IRC and didn't come up with a clear approach). Dynamically allocating the blockstack would be simpler, and might acheive a similar win. (This is all from memory, I haven't thought about specifics in a while). > Also, does anybody know where all the time goes in a function call, > anyway? I did once... > I assume that some of the pieces are: > > * tuple/dict allocation for arguments (but some of this is bypassed on > the fast branch for Python-to-Python calls, right?) All of it, in easy cases. ISTR that the fast path could be a little wider -- it bails when the called function has default arguments, but I think this case could be handled easily enough. > * frame allocation and setup (but Armin's patch was supposed to > eliminate most of this whenever a function isn't being used > re-entrantly) Ah, you remember the wart :) I think even with the patch, frame setup is a significant amount of work. Why are frames so big? 
> * argument "parsing" (check number of args, map kwargs to their > positions, etc.; but isn't some of this already fast-pathed for > Python-to-Python calls?) Yes. With some effort you could probably avoid a copy (and incref) of the arguments from the callers to the callees stack area. BFD. > I suppose the fast branch fixes don't help special methods like > __getitem__ et al, since those don't go through the fast branch, but I > don't think those are the majority of function calls. Indeed. I suspect this fails the effort/benefit test, but I could be wrong. > And whatever happened to CALL_METHOD? It didn't work as an optimization, as far as I remember. I think the patch is on SF somewhere. Or is a branch in CVS? Oh, it's patch #709744. > Do we need a tp_callmethod that takes an argument array, length, and > keywords, so that we can skip instancemethod allocation in the > common case of calling a method directly? Hmm, didn't think of that, and I don't think it's how the CALL_ATTR attempt worked. I presume it would need to take a method name too :) I already have a patch that does this for regular function calls (it's a rearrangement/refactoring not an optimization though). Cheers, mwh -- I think perhaps we should have electoral collages and construct our representatives entirely of little bits of cloth and papier mache. -- Owen Dunn, ucam.chat, from his review of the year From bac at OCF.Berkeley.EDU Sun Feb 20 20:41:03 2005 From: bac at OCF.Berkeley.EDU (Brett C.) Date: Sun Feb 20 20:41:12 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mmztzvyyq.fsf@starship.python.net> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <2mmztzvyyq.fsf@starship.python.net> Message-ID: <4218E7CF.1020208@ocf.berkeley.edu> Michael Hudson wrote: > "Phillip J. Eby" writes: [SNIP] >>And whatever happened to CALL_METHOD? > > > It didn't work as an optimization, as far as I remember. I think the > patch is on SF somewhere. Or is a branch in CVS? Oh, it's patch > #709744. > > >>Do we need a tp_callmethod that takes an argument array, length, and >>keywords, so that we can skip instancemethod allocation in the >>common case of calling a method directly? > > > Hmm, didn't think of that, and I don't think it's how the CALL_ATTR > attempt worked. I presume it would need to take a method name too :) > CALL_ATTR basically replaced ``LOAD_ATTR; CALL_FUNCTION`` with a single opcode. Idea was that the function creation by the LOAD_ATTR was a wasted step so might as well just skip it and call the method directly. Problem was the work required to support both classic and new-style classes. Now I have not looked at the code since it was written back at PyCon 2003 and I was a total newbie to the core's C code at that point and I think Thomas said it had been two years since he did any major core hacking. In other words it could possibly have been done better. =) -Brett From pje at telecommunity.com Sun Feb 20 21:22:00 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 21:19:19 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mr7jbvzyo.fsf@starship.python.net> References: <000101c51762$5b8369e0$7c1cc797@oemcomputer> Message-ID: <5.1.1.6.0.20050220150416.029b3960@mail.telecommunity.com> At 06:38 PM 2/20/05 +0000, Michael Hudson wrote: > >> It folds the two steps into a new opcode. 
In the case of > >> store_name/load_name, it saves one three byte instruction, a trip around > >> the eval-loop, two stack mutations, a incref/decref pair, a dictionary > >> lookup, and an error check (for the lookup). While it acts like a dup > >> followed by a store, it is implemented more simply as a store that > >> doesn't pop the stack. The transformation is broadly applicable and > >> occurs thousands of times in the standard library and test suite. > >I'm still a little curious as to what code creates such opcodes... A simple STORE+LOAD case: >>> dis.dis(compile("x=1; y=x*2","?","exec")) 1 0 LOAD_CONST 0 (1) 3 STORE_NAME 0 (x) 6 LOAD_NAME 0 (x) 9 LOAD_CONST 1 (2) 12 BINARY_MULTIPLY 13 STORE_NAME 1 (y) 16 LOAD_CONST 2 (None) 19 RETURN_VALUE And a simple DUP+STORE case: >>> dis.dis(compile("x=y=1","?","exec")) 1 0 LOAD_CONST 0 (1) 3 DUP_TOP 4 STORE_NAME 0 (x) 7 STORE_NAME 1 (y) 10 LOAD_CONST 1 (None) 13 RETURN_VALUE Of course, I'm not sure how commonly this sort of code occurs in places where it makes a difference to anything. Function call overhead continues to be Python's most damaging performance issue, because it makes it expensive to use abstraction. Here's a thought. Suppose we split frames into an "object" part and a "struct" part, with the object part being just a pointer to the struct part, and a flag indicating whether the struct part is stack-allocated or malloc'ed. This would let us stack-allocate the bulk of the frame structure, but still have a frame "object" to pass around. On exit from the C routine that stack-allocated the frame struct, we check to see if the frame object has a refcount>1, and if so, malloc a permanent home for the frame struct and update the frame object's struct pointer and flag. In this way, frame allocation overhead could be reduced to the cost of an alloca, or just incorporated into the stack frame setup of the C routine itself, allowing the entire struct to be treated as "local variables" from a C perspective (which might benefit performance on architectures that reserve a register for local variable access). Of course, this would slow down exception handling and other scenarios that result in extra references to a frame object, but if the OS malloc is the slow part of frame allocation (frame objects are too large for pymalloc), then perhaps it would be a net win. On the other hand, this approach would definitely use more stack space per calling level. From pje at telecommunity.com Sun Feb 20 21:56:26 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun Feb 20 21:53:45 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2mmztzvyyq.fsf@starship.python.net> References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >"Phillip J. Eby" writes: > > > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: > >>Where are the attempts to speed up function/method calls? That's an > >>area where we could *really* use a breakthrough... > > > > Amen! > > > > So what happened to Armin's pre-allocated frame patch? Did that get > into 2.4? > >No, because it slows down recursive function calls, or functions that >happen to be called at the same time in different threads. 
Fixing >*that* would require things like code specific frame free-lists and >that's getting a bit convoluted and might waste quite a lot of memory. Ah. I thought it was just going to fall back to the normal case if the pre-allocated frame wasn't available (i.e., didn't have a refcount of 1). >Eliminating the blockstack would be nice (esp. if it's enough to get >frames small enough that they get allocated by PyMalloc) but this >seemed to be tricky too (or at least Armin, Samuele and I spent a >cuple of hours yakking about it on IRC and didn't come up with a clear >approach). Dynamically allocating the blockstack would be simpler, >and might acheive a similar win. (This is all from memory, I haven't >thought about specifics in a while). I'm not very familiar with the operation of the block stack, but why does it need to be a stack? For exception handling purposes, wouldn't it suffice to know the offset of the current handler, and have an opcode to set the current handler location? And for "for" loops, couldn't an anonymous local be used to hold the loop iterator instead of using a stack variable? Hm, actually I think I see the answer; in the case of module-level code there can be no "anonymous local variables" the way there can in functions. Hmm. I guess you'd need to also have a "reset stack to level X" opcode, then, and both it and the set-handler opcode would have to be placed at every destination of a jump that crosses block boundaries. It's not clear how big a win that is, due to the added opcodes even on non-error paths. Hey, wait a minute... all the block stack data is static, isn't it? I mean, the contents of the block stack at any point in a code string could be determined statically, by examination of the bytecode, couldn't it? If that's the case, then perhaps we could design a pre-computed data structure similar to co_lnotab that would be used by the evaluator in place of the blockstack. Of course, I may be talking through my hat here, as I have very little experience with how the blockstack works. However, if this idea makes sense, then perhaps it could actually speed up non-error paths as well (except perhaps for the 'return' statement), at the cost of a larger code structure and compiler complexity. But, if it also means that frames can be allocated faster (e.g. via pymalloc), it might be worth it, just like getting rid of SET_LINENO turned out to be a net win. >All of it, in easy cases. ISTR that the fast path could be a little >wider -- it bails when the called function has default arguments, but >I think this case could be handled easily enough. When it has *any* default arguments, or only when it doesn't have values to supply for them? >Why are frames so big? Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack. If there was no need for that, frames could perhaps be allocated via pymalloc. They only have around 100 bytes or so in them, apart from the blockstack and locals/value stack. > > Do we need a tp_callmethod that takes an argument array, length, and > > keywords, so that we can skip instancemethod allocation in the > > common case of calling a method directly? > >Hmm, didn't think of that, and I don't think it's how the CALL_ATTR >attempt worked. I presume it would need to take a method name too :) Er, yeah, I thought that was obvious. :) From pje at telecommunity.com Sun Feb 20 22:34:50 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun Feb 20 22:32:10 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote: >At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >>Eliminating the blockstack would be nice (esp. if it's enough to get >>frames small enough that they get allocated by PyMalloc) but this >>seemed to be tricky too (or at least Armin, Samuele and I spent a >>cuple of hours yakking about it on IRC and didn't come up with a clear >>approach). Dynamically allocating the blockstack would be simpler, >>and might acheive a similar win. (This is all from memory, I haven't >>thought about specifics in a while). I think I have an idea how to do it in a (relatively) simple fashion; see if you can find a hole in it: * Change the PyTryBlock struct to include an additional member, 'int b_prev', that refers to the previous block in a chain * Change the compiler's emission of SETUP_* opcodes, so that instead of a PyTryBlock being added to the blockstack at interpretation time, it's added to the end of a 'co_blktree' block array at compile time, with its 'b_prev' pointing to the current "top" of the block stack. Instead of the SETUP_* argument being the handler offset, have it be the index of the just-added blocktree entry. * Replace f_blockstack and f_iblock with 'int f_iblktree', and change PyFrame_BlockSetup() to set this equal to the SETUP_* argument, and PyFrame_BlockPop() to use this as an index into the code's co_blktree to retrieve the needed values. PyFrame_BlockPop() would then set f_iblktree equal to the "popped" block's 'b_prev' member, thus "popping" the block from this virtual stack. (Note, by the way, that the blocktree could actually be created as a post-processing step of the current compilation process, by a loop that scans the bytecode and tracks the current stack and blockstack levels, and then replaces the SETUP_* opcodes' arguments. This might be a simpler option than trying to change the compiler to do it along the way.) Can anybody see any flaws in this concept? As far as I can tell it just generates all possible block stack states at compile time, but doesn't change block semantics in the least, and it scarcely touches the eval loop. It seems like it could drop the size of frames enough to let them use pymalloc instead of the OS malloc, at the cost of a 16 bytes per block increase in the size of code objects. (And of course the necessary changes to 'marshal' and 'dis' as well as the compiler and eval loop.) (More precisely, frames whose f_nlocals + f_stacksize is 40 or less, would be 256 bytes or less, and therefore pymalloc-able. However, this should cover all but the most complex functions.) From mwh at python.net Sun Feb 20 22:54:43 2005 From: mwh at python.net (Michael Hudson) Date: Sun Feb 20 22:54:46 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> (Phillip J. 
Eby's message of "Sun, 20 Feb 2005 15:56:26 -0500") References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <2m8y5ix5gc.fsf@starship.python.net> "Phillip J. Eby" writes: > At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >>"Phillip J. Eby" writes: >> >> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote: >> >>Where are the attempts to speed up function/method calls? That's an >> >>area where we could *really* use a breakthrough... >> > >> > Amen! >> > >> > So what happened to Armin's pre-allocated frame patch? Did that >> get into 2.4? >> >>No, because it slows down recursive function calls, or functions that >>happen to be called at the same time in different threads. Fixing >>*that* would require things like code specific frame free-lists and >>that's getting a bit convoluted and might waste quite a lot of memory. > > Ah. I thought it was just going to fall back to the normal case if > the pre-allocated frame wasn't available (i.e., didn't have a refcount > of 1). Well, I don't think that's the test, but that might work. Someone should try it :) (I'm trying something else currently). >>Eliminating the blockstack would be nice (esp. if it's enough to get >>frames small enough that they get allocated by PyMalloc) but this >>seemed to be tricky too (or at least Armin, Samuele and I spent a >>cuple of hours yakking about it on IRC and didn't come up with a clear >>approach). Dynamically allocating the blockstack would be simpler, >>and might acheive a similar win. (This is all from memory, I haven't >>thought about specifics in a while). > > I'm not very familiar with the operation of the block stack, but why > does it need to be a stack? Finally blocks are the problem, I think. > For exception handling purposes, wouldn't it suffice to know the > offset of the current handler, and have an opcode to set the current > handler location? And for "for" loops, couldn't an anonymous local > be used to hold the loop iterator instead of using a stack variable? > Hm, actually I think I see the answer; in the case of module-level > code there can be no "anonymous local variables" the way there can in > functions. Hmm. I don't think this is the killer blow. I can't remember the details and it's too late to think about them, so I'm going to wait and see if Samuele replies :) >>All of it, in easy cases. ISTR that the fast path could be a little >>wider -- it bails when the called function has default arguments, but >>I think this case could be handled easily enough. > > When it has *any* default arguments, or only when it doesn't have > values to supply for them? When it has *any*, I think. I also think this is easy to change. >>Why are frames so big? > > Because there are CO_MAXBLOCKS * 12 bytes in there for the block > stack. If there was no need for that, frames could perhaps be > allocated via pymalloc. They only have around 100 bytes or so in > them, apart from the blockstack and locals/value stack. What I'm trying is allocating the blockstack separately and see if two pymallocs are cheaper than one malloc. >> > Do we need a tp_callmethod that takes an argument array, length, and >> > keywords, so that we can skip instancemethod allocation in the >> > common case of calling a method directly? 
>> >>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR >>attempt worked. I presume it would need to take a method name too :) > > Er, yeah, I thought that was obvious. :) Someone should try this too :) Cheers, mwh -- It is never worth a first class man's time to express a majority opinion. By definition, there are plenty of others to do that. -- G. H. Hardy From greg.ewing at canterbury.ac.nz Mon Feb 21 03:14:13 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 03:14:29 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> Message-ID: <421943F5.7080408@canterbury.ac.nz> Phillip J. Eby wrote: > At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote: > >> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote: >> >>> Eliminating the blockstack would be nice (esp. if it's enough to get >>> frames small enough that they get allocated by PyMalloc) Someone might like to take a look at the way Pyrex generates C code for try-except and try-finally blocks. It manages to get (what I hope is) the same effect using local variables and gotos. It doesn't have to deal with a stack pointer, but I think that should just be a compiler-determinable adjustment to be done when jumping to an outer block. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Mon Feb 21 04:32:11 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 04:32:27 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <4219563B.8080503@canterbury.ac.nz> Phillip J. Eby wrote: > Hm, actually I think I see the answer; in the case of module-level code > there can be no "anonymous local variables" the way there can in > functions. Why not? There's still a frame object associated with the call of the anonymous function holding the module's top-level code. The compiler can allocate locals in that frame, even if the user's code can't. > I guess you'd need to also have a "reset stack to > level X" opcode, then, and both it and the set-handler opcode would have > to be placed at every destination of a jump that crosses block > boundaries. It's not clear how big a win that is, due to the added > opcodes even on non-error paths. Only exceptions and break statements would require stack pointer adjustment, and they're relatively rare. I don't think an extra opcode in those cases would make much of a difference. 
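For anyone who wants to stare at the machinery under discussion, a quick way to see the block stack in action (the exact disassembly is version-dependent, so none is reproduced here): SETUP_LOOP and SETUP_FINALLY push PyTryBlock entries at run time, and POP_BLOCK / END_FINALLY pop them again.

    import dis

    def f(seq):
        total = 0
        try:
            for x in seq:          # SETUP_LOOP ... POP_BLOCK
                total += x
        finally:                   # SETUP_FINALLY ... END_FINALLY
            print "done"
        return total

    dis.dis(f)
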
-- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Mon Feb 21 04:32:25 2005 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon Feb 21 04:32:43 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: <42195649.3030400@canterbury.ac.nz> Alex Martelli wrote: > > On 2005 Feb 20, at 17:06, Guido van Rossum wrote: > >> Oh, bah. That's not what basestring was for. I can't blame you or your >> client, but my *intention* was that basestring would *only* be the >> base of the two *real* built-in string types (str and unicode). I think all this just reinforces the notion that LBYL is a bad idea! > The need to check "is this thingy here string-like" is sort of frequent, > because strings are sequences which, when iterated on, yield sequences > (strings of length 1) which, when iterated on, yield sequences ad > infinitum. Yes, this characteristic of strings is unfortunate because it tends to make some degree of LBYLing unavoidable. I don't think the right solution is to try to come up with safe ways of doing LBYL on strings, though, at least not in the long term. Maybe in Python 3000 this could be fixed by making strings *not* be sequences. They would be sliceable, but *not* indexable or iterable. If you wanted to iterate over their chars, you would have to say 'for c in s.chars()' or something. Then you would be able to test whether something is sequence-like by the presence of __getitem__ or __iter__ methods, without getting tripped up by strings. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg.ewing@canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Mon Feb 21 04:41:09 2005 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon Feb 21 04:38:29 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <4219563B.8080503@canterbury.ac.nz> References: <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> Message-ID: <5.1.1.6.0.20050220223833.02e8dc80@mail.telecommunity.com> At 04:32 PM 2/21/05 +1300, Greg Ewing wrote: >Phillip J. Eby wrote: > >>Hm, actually I think I see the answer; in the case of module-level code >>there can be no "anonymous local variables" the way there can in functions. > >Why not? There's still a frame object associated with the call >of the anonymous function holding the module's top-level code. >The compiler can allocate locals in that frame, even if the >user's code can't. That's a good point, but if you look at my "eliminating the block stack" post, you'll see that there's a simpler way to potentially get rid of the block stack, where "simpler" means "simpler changes in fewer places". From pje at telecommunity.com Mon Feb 21 04:44:44 2005 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon Feb 21 04:42:05 2005 Subject: [Python-Dev] UserString In-Reply-To: <42195649.3030400@canterbury.ac.nz> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> Message-ID: <5.1.1.6.0.20050220224135.02e90ad0@mail.telecommunity.com> At 04:32 PM 2/21/05 +1300, Greg Ewing wrote: >Alex Martelli wrote: >>The need to check "is this thingy here string-like" is sort of frequent, >>because strings are sequences which, when iterated on, yield sequences >>(strings of length 1) which, when iterated on, yield sequences ad infinitum. > >Yes, this characteristic of strings is unfortunate because it >tends to make some degree of LBYLing unavoidable. FWIW, the trick I usually use to deal with this aspect of strings in recursive algorithms is to check whether the current item of an iteration is the same object I'm iterating over; if so, I know I've descended into a string. It doesn't catch it on the first recursion level of course (unless it was a 1-character string to start with), but it's a quick-and-dirty way to EAFP such algorithms. From gvanrossum at gmail.com Mon Feb 21 04:42:34 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 04:42:37 2005 Subject: [Python-Dev] UserString In-Reply-To: <42195649.3030400@canterbury.ac.nz> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> Message-ID: > >> Oh, bah. That's not what basestring was for. I can't blame you or your > >> client, but my *intention* was that basestring would *only* be the > >> base of the two *real* built-in string types (str and unicode). > > I think all this just reinforces the notion that LBYL is > a bad idea! In this case, perhaps; but in general? (And I think there's a legitimate desire to sometimes special-case string-like things, e.g. consider a function that takes either a stream or a filename argument.) Anyway, can you explain why LBYL is bad? > > The need to check "is this thingy here string-like" is sort of frequent, > > because strings are sequences which, when iterated on, yield sequences > > (strings of length 1) which, when iterated on, yield sequences ad > > infinitum. > > Yes, this characteristic of strings is unfortunate because it > tends to make some degree of LBYLing unavoidable. I don't > think the right solution is to try to come up with safe ways > of doing LBYL on strings, though, at least not in the long > term. > > Maybe in Python 3000 this could be fixed by making strings *not* > be sequences. They would be sliceable, but *not* indexable or > iterable. If you wanted to iterate over their chars, you > would have to say 'for c in s.chars()' or something. > > Then you would be able to test whether something is sequence-like > by the presence of __getitem__ or __iter__ methods, without > getting tripped up by strings. There would be other ways to get out of this dilemma; we could introduce a char type, for example. Also, strings might be recognizable by other means, e.g. the presence of a lower() method or some other characteristic method that doesn't apply to sequence in general. (To Alex: leaving transform() out of the string interface seems to me the simplest solution.) 
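A hedged sketch of that stream-or-filename pattern (the function name and behaviour are made up for illustration); a duck-typed variant would test hasattr(source, 'read') instead of using isinstance:

    def word_count(source):
        if isinstance(source, basestring):   # a real str/unicode: a filename
            stream = open(source)
            opened_here = True
        else:                                # anything else: assume a readable stream
            stream = source
            opened_here = False
        try:
            return sum(len(line.split()) for line in stream)
        finally:
            if opened_here:
                stream.close()
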
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From gvanrossum at gmail.com Mon Feb 21 04:47:08 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 04:47:11 2005 Subject: [Python-Dev] Eliminating the block stack (was Re: Store x Load x --> DupStore) In-Reply-To: <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> References: <2mmztzvyyq.fsf@starship.python.net> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <5.1.1.6.0.20050220160300.02e8bc30@mail.telecommunity.com> Message-ID: > >>Eliminating the blockstack would be nice (esp. if it's enough to get > >>frames small enough that they get allocated by PyMalloc) but this > >>seemed to be tricky too (or at least Armin, Samuele and I spent a > >>cuple of hours yakking about it on IRC and didn't come up with a clear > >>approach). Dynamically allocating the blockstack would be simpler, > >>and might acheive a similar win. (This is all from memory, I haven't > >>thought about specifics in a while). I don't know if this helps, but since I invented the block stack around 1990, I believe I recall the main reason to make it dynamic was to simplify code generation, not because it is inherently dynamic. At the time an extra run-time data structure seemed to require less coding than an extra compile-time data structure. The same argument got me using dicts for locals; that was clearly a bottleneck and eliminated long ago, but I think we should be able to lose the block stack now, too. Somewhat ironically, eliminating the block stack will reduce the stack frame size, while eliminating the dict for locals added to it. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax at aleax.it Mon Feb 21 08:06:37 2005 From: aleax at aleax.it (Alex Martelli) Date: Mon Feb 21 08:06:43 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> Message-ID: <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> On 2005 Feb 21, at 04:42, Guido van Rossum wrote: >>>> Oh, bah. That's not what basestring was for. I can't blame you or >>>> your >>>> client, but my *intention* was that basestring would *only* be the >>>> base of the two *real* built-in string types (str and unicode). >> >> I think all this just reinforces the notion that LBYL is >> a bad idea! > > In this case, perhaps; but in general? (And I think there's a > legitimate desire to sometimes special-case string-like things, e.g. > consider a function that takes either a stream or a filename > argument.) > > Anyway, can you explain why LBYL is bad? In the general case, it's bad because of a combination of issues. It may violate "once, and only once!" -- the operations one needs to check may basicaly duplicate the operations one then wants to perform. Apart from wasted effort, it may happen that the situation changes between the look and the leap (on an external file, or due perhaps to threading or other reentrancy). It's often hard in the look to cover exactly the set of prereq's you need for the leap -- e.g. I've often seen code such as if i < len(foo): foo[i] = 24 which breaks for i<-len(foo); the first time this happens the guard's changed to 0<=i> Then you would be able to test whether something is sequence-like >> by the presence of __getitem__ or __iter__ methods, without >> getting tripped up by strings. 
> > There would be other ways to get out of this dilemma; we could > introduce a char type, for example. Also, strings might be > recognizable by other means, e.g. the presence of a lower() method or > some other characteristic method that doesn't apply to sequence in > general. Sure, there would many possibilities. > (To Alex: leaving transform() out of the string interface seems to me > the simplest solution.) I guess you mean translate. Yes, that would probably be simplest. Alex From mwh at python.net Mon Feb 21 10:00:11 2005 From: mwh at python.net (Michael Hudson) Date: Mon Feb 21 10:00:13 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: <2m8y5ix5gc.fsf@starship.python.net> (Michael Hudson's message of "Sun, 20 Feb 2005 21:54:43 +0000") References: <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <000101c51762$5b8369e0$7c1cc797@oemcomputer> <5.1.1.6.0.20050220122401.028a4e50@mail.telecommunity.com> <5.1.1.6.0.20050220152217.029bb650@mail.telecommunity.com> <2m8y5ix5gc.fsf@starship.python.net> Message-ID: <2m1xbawan8.fsf@starship.python.net> Michael Hudson writes: >> Because there are CO_MAXBLOCKS * 12 bytes in there for the block >> stack. If there was no need for that, frames could perhaps be >> allocated via pymalloc.
They only have around 100 bytes or so in >> them, apart from the blockstack and locals/value stack. > > What I'm trying is allocating the blockstack separately and see if two > pymallocs are cheaper than one malloc. This makes no difference at all, of course -- once timeit or pystone gets going the code path that actually allocates a new frame as opposed to popping one off the free list simply never gets executed. Duh! Cheers, mwh (and despite what the sigmonster implies, I wasn't drunk last night :) -- This is an off-the-top-of-the-head-and-not-quite-sober suggestion, so is probably technically laughable. I'll see how embarrassed I feel tomorrow morning. -- Patrick Gosling, ucam.comp.misc From z_axis at 163.com Mon Feb 21 14:54:33 2005 From: z_axis at 163.com (z-axis) Date: Mon Feb 21 14:49:38 2005 Subject: [Python-Dev] Re: Welcome to the "Python-Dev" mailing list Message-ID: <20050221134936.909271E4003@bag.python.org> Hi, friends. I am a Python newbie, but I have used Java for about 5 years. When I saw Python introduced in a famous magazine called <> in China, I was immediately absorbed by its pretty code. I hope I can use Python to do real development. Regards! ======== 2005-02-21 14:28:00 you wrote: ======== Welcome to the Python-Dev@python.org mailing list! If you are a new subscriber, please take the time to introduce yourself briefly in your first post. It is appreciated if you lurk around for a while before posting! :-) Additional information on Python's development process can be found in the Python Developer's Guide: http://www.python.org/dev/ To post to this list, send your email to: python-dev@python.org General information about the mailing list is at: http://mail.python.org/mailman/listinfo/python-dev If you ever want to unsubscribe or change your options (eg, switch to or from digest mode, change your password, etc.), visit your subscription page at: http://mail.python.org/mailman/options/python-dev/z_axis%40163.com You can also make such adjustments via email by sending a message to: Python-Dev-request@python.org with the word `help' in the subject or body (don't include the quotes), and you will get back a message with instructions. You must know your password to change your options (including changing the password, itself) or to unsubscribe. It is: zpython999 Normally, Mailman will remind you of your python.org mailing list passwords once every month, although you can disable this if you prefer. This reminder will also include instructions on how to unsubscribe or change your account options. There is also a button on your options page that will email your current password to you. = = = = = = = = = = = = = = = = = = = = = = Regards! z-axis z_axis@163.com 2005-02-21 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20050221/e537f44e/attachment.htm From gvanrossum at gmail.com Mon Feb 21 17:15:47 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Mon Feb 21 17:15:51 2005 Subject: [Python-Dev] UserString In-Reply-To: <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> Message-ID: > > Anyway, can you explain why LBYL is bad? > > In the general case, it's bad because of a combination of issues. It > may violate "once, and only once!"
-- the operations one needs to check > may basically duplicate the operations one then wants to perform. Apart > from wasted effort, it may happen that the situation changes between > the look and the leap (on an external file, or due perhaps to threading > or other reentrancy). It's often hard in the look to cover exactly the > set of prereq's you need for the leap -- e.g. I've often seen code such > as > if i < len(foo): > foo[i] = 24 > which breaks for i<-len(foo); the first time this happens the guard's > changed to 0<=i<len(foo), breaking usage w/negative index; finally it stabilizes to the correct check, > -len(foo)<=i<len(foo), the same check that Python performs again when you then use foo[i]... just > cluttering code. The intermediate Pythonista who's learned to code > "try: foo[i]=24 // except IndexError: pass" is much better off than the > one who's still striving to LBYL as he had (e.g.) when using C. > > Etc -- this is all very general and generic. Right. There are plenty of examples where LBYL is better, e.g. because there are too many different exceptions to catch, or they occur in too many places. One of my favorites is creating a directory if it doesn't already exist; I always use this LBYL-ish pattern: if not os.path.exists(dn): try: os.makedirs(dn) except os.error, err: ...log the error... because the specific exception for "it already exists" is quite subtle to pull out of the os.error structure. Taken to the extreme, the "LBYL is bad" meme would be an argument against my optional type checking proposal, which I doubt is what you want. So, I'd like to take a much more balanced view on LBYL. > I had convinced myself that strings were a special case worth singling > out, via isinstance and basestring, just as (say) dictionaries are > singled out quite differently by methods such as get... I may well have > been too superficial in this conclusion. I think there are lots of situations where the desire to special-case strings is legitimate. > >> Then you would be able to test whether something is sequence-like > >> by the presence of __getitem__ or __iter__ methods, without > >> getting tripped up by strings. > > > > There would be other ways to get out of this dilemma; we could > > introduce a char type, for example. Also, strings might be > > recognizable by other means, e.g. the presence of a lower() method or > > some other characteristic method that doesn't apply to sequence in > > general. > > Sure, there would many possibilities. > > > (To Alex: leaving transform() out of the string interface seems to me > > the simplest solution.) > > I guess you mean translate. Yes, that would probably be simplest. Right. BTW, there's *still* no sign of a PEP 246 rewrite. Maybe someone could offer Clark a hand? (Last time I inquired he was recovering from a week of illness.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Mon Feb 21 22:24:32 2005 From: python at rcn.com (Raymond Hettinger) Date: Mon Feb 21 22:28:35 2005 Subject: [Python-Dev] Store x Load x --> DupStore In-Reply-To: Message-ID: <000a01c5185b$bc999700$f61ac797@oemcomputer> > Where are the attempts to speed up function/method calls? That's an > area where we could *really* use a breakthrough... At one time you had entertained treating some of the builtin calls as fixed. Is that something you want to go forward with? It would entail a "from __future__" and transition period.
It would not be hard to take code like "return len(alist)" and transform it from: 2 0 LOAD_GLOBAL 0 (len) 3 LOAD_FAST 0 (alist) 6 CALL_FUNCTION 1 9 RETURN_VALUE to: 2 0 LOAD_FAST 0 (alist) 3 OBJECT_LEN 4 RETURN_VALUE Some functions already have a custom opcode that cannot be used unless we freeze the meaning of the function name: repr --> UNARY_CONVERT --> PyObject_Repr iter --> GET_ITER --> PyObject_GetIter Alternately, functions could be served by a table of known, fixed functions: 2 0 LOAD_FAST 0 (alist) 3 CALL_DEDICATED 0 (PyObject_Len) 6 RETURN_VALUE where the dispatch table is something like: [PyObject_Len, PyObject_Repr, PyObject_IsInstance, PyObject_IsTrue, PyObject_GetIter, ...]. Of course, none of these offer a big boost and there is some loss of dynamic behavior. Raymond From barry at python.org Tue Feb 22 03:50:01 2005 From: barry at python.org (Barry Warsaw) Date: Tue Feb 22 03:50:17 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> Message-ID: <1109040601.25187.170.camel@presto.wooz.org> On Mon, 2005-02-21 at 11:15, Guido van Rossum wrote: > Right. There are plenty of examples where LBYL is better, e.g. because > there are too many different exceptions to catch, or they occur in too > many places. One of my favorites is creating a directory if it doesn't > already exist; I always use this LBYL-ish pattern: > > if not os.path.exists(dn): > try: > os.makedirs(dn) > except os.error, err: > ...log the error... > > because the specific exception for "it already exists" is quite subtle > to pull out of the os.error structure. Really? I do this kind of thing all the time: import os import errno try: os.makedirs(dn) except OSError, e: if e.errno <> errno.EEXIST: raise -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 307 bytes Desc: This is a digitally signed message part Url : http://mail.python.org/pipermail/python-dev/attachments/20050221/ae2d9387/attachment.pgp From quarl at NOSPAM.quarl.org Tue Feb 22 02:41:38 2005 From: quarl at NOSPAM.quarl.org (Karl Chen) Date: Tue Feb 22 07:34:34 2005 Subject: [Python-Dev] textwrap wordsep_re Message-ID: Hi, textwrap.fill() is awesome. Except when the string to wrap contains dates -- which I would like not to be broken. In general I think wordsep_re can be smarter about what it decides are hyphenated words. For example, this code: print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) produces: aaaaaaaaaa 2005- 02-21 A slightly tweaked wordsep_re: textwrap.TextWrapper.wordsep_re = \ re.compile(r'(\s+|' # any whitespace r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) behaves better: aaaaaaaaaa 2005-02-21 What do you think about changing the default wordsep_re? 
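(As an aside, and not part of Karl's proposal: the same pattern can also be tried out on a single TextWrapper instance, which leaves the module-wide default alone; the regex below is simply the one quoted above.)

    import re
    import textwrap

    wrapper = textwrap.TextWrapper(width=18)
    # Override the word-splitting pattern on this instance only,
    # rather than patching the class attribute for every caller.
    wrapper.wordsep_re = re.compile(
        r'(\s+|'                                  # any whitespace
        r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash
    print wrapper.fill('aaaaaaaaaa 2005-02-21')
    # aaaaaaaaaa
    # 2005-02-21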
-- Karl 2005-02-21 17:39 From aahz at pythoncraft.com Tue Feb 22 15:35:06 2005 From: aahz at pythoncraft.com (Aahz) Date: Tue Feb 22 15:35:10 2005 Subject: [Python-Dev] textwrap wordsep_re In-Reply-To: References: Message-ID: <20050222143506.GA27893@panix.com> On Mon, Feb 21, 2005, Karl Chen wrote: > > A slightly tweaked wordsep_re: > textwrap.TextWrapper.wordsep_re = \ > re.compile(r'(\s+|' # any whitespace > r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words > r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash > print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) > behaves better: > aaaaaaaaaa > 2005-02-21 > > What do you think about changing the default wordsep_re? Please post a patch to SF. If you're not familiar with the process, take a look at http://www.python.org/dev/dev_intro.html Another thing: I don't know whether you'll get this in direct e-mail; it's considered a bit rude for python-dev to use munged addresses. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code -- not in reams of trivial code that bores the reader to death." --GvR From gvanrossum at gmail.com Tue Feb 22 17:16:52 2005 From: gvanrossum at gmail.com (Guido van Rossum) Date: Tue Feb 22 17:16:57 2005 Subject: [Python-Dev] UserString In-Reply-To: <1109040601.25187.170.camel@presto.wooz.org> References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: > Really? I do this kind of thing all the time: > > import os > import errno > try: > os.makedirs(dn) > except OSError, e: > if e.errno <> errno.EEXIST: > raise You have a lot more faith in the errno module than I do. Are you sure the same error codes work on all platforms where Python works? It's also not exactly readable (except for old Unix hacks). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From david.ascher at gmail.com Tue Feb 22 17:20:47 2005 From: david.ascher at gmail.com (David Ascher) Date: Tue Feb 22 17:20:50 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: On Tue, 22 Feb 2005 08:16:52 -0800, Guido van Rossum wrote: > > Really? I do this kind of thing all the time: > > > > import os > > import errno > > try: > > os.makedirs(dn) > > except OSError, e: > > if e.errno <> errno.EEXIST: > > raise > > You have a lot more faith in the errno module than I do. Are you sure > the same error codes work on all platforms where Python works? It's > also not exactly readable (except for old Unix hacks). Agreed. In general, I often wish in production code (especially in not-100% Python systems) that Python did a better job of at the very least documenting what kinds of exceptions were raised by what function calls. Otherwise you end up with what are effectively blanket try/except statements way too often for my taste. 
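(An aside, not from the thread: one way to sidestep the errno-portability worry while keeping Barry's narrow except clause is to check the post-condition instead of interpreting the error code; a sketch of such a hypothetical helper follows.)

    import os

    def ensure_dir(dn):
        # Try the operation first (EAFP); on failure, verify the
        # directory really isn't there before propagating, instead of
        # relying on a specific errno value being portable.
        try:
            os.makedirs(dn)
        except os.error:
            if not os.path.isdir(dn):
                raise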
--da From andymac at bullseye.apana.org.au Tue Feb 22 13:13:08 2005 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Tue Feb 22 19:19:49 2005 Subject: [Python-Dev] Re: Prospective Peephole Transformation In-Reply-To: References: <4215FD5F.4040605@xs4all.nl> <000101c515cc$9f96d0a0$803cc797@oemcomputer> <5.1.1.6.0.20050218103403.03869990@mail.telecommunity.com> Message-ID: <421B21D4.5050306@bullseye.apana.org.au> Fredrik Lundh wrote: > it could be worth expanding them to > > "if x == 1 or x == 2 or x == 3:" > > though... > > C:\>timeit -s "a = 1" "if a in (1, 2, 3): pass" > 10000000 loops, best of 3: 0.11 usec per loop > C:\>timeit -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.0691 usec per loop > > C:\>timeit -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.123 usec per loop > C:\>timeit -s "a = 2" "if a in (1, 2, 3): pass" > 10000000 loops, best of 3: 0.143 usec per loop > > C:\>timeit -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.187 usec per loop > C:\>timeit -s "a = 3" "if a in (1, 2, 3): pass" > 1000000 loops, best of 3: 0.197 usec per loop > > C:\>timeit -s "a = 4" "if a in (1, 2, 3): pass" > 1000000 loops, best of 3: 0.225 usec per loop > C:\>timeit -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" > 10000000 loops, best of 3: 0.161 usec per loop Out of curiousity I ran /F's tests on my FreeBSD 4.8 box with a recent checkout: $ ./python Lib/timeit.py -s "a = 1" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.247 usec per loop $ ./python Lib/timeit.py -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.225 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.343 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.353 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.415 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.457 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.467 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.488 usec per loop I then applied this patch: --- Objects/tupleobject.c.orig Fri Jun 11 05:28:08 2004 +++ Objects/tupleobject.c Tue Feb 22 22:10:18 2005 @@ -298,6 +298,11 @@ int i, cmp; for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i) + cmp = (PyTuple_GET_ITEM(a, i) == el); + if (cmp) + return cmp; + + for (i = 0, cmp = 0 ; cmp == 0 && i < a->ob_size; ++i) cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i), Py_EQ); return cmp; Re-running the tests yielded: $ ./python Lib/timeit.py -s "a = 1" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.234 usec per loop $ ./python Lib/timeit.py -s "a = 1" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.228 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.239 usec per loop $ ./python Lib/timeit.py -s "a = 2" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.36 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.241 usec per loop $ ./python Lib/timeit.py -s "a = 3" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.469 usec per loop $ ./python Lib/timeit.py -s "a = 4" 
"if a in (1, 2, 3): pass" 1000000 loops, best of 3: 0.475 usec per loop $ ./python Lib/timeit.py -s "a = 4" "if a == 1 or a == 2 or a == 3: pass" 1000000 loops, best of 3: 0.489 usec per loop ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From quarl at cs.berkeley.edu Mon Feb 21 12:39:41 2005 From: quarl at cs.berkeley.edu (Karl Chen) Date: Tue Feb 22 20:00:13 2005 Subject: [Python-Dev] textwrap.py wordsep_re Message-ID: Hi, textwrap.fill() is awesome. Except when the string to wrap contains dates -- which I would like not to be filled. In general I think wordsep_re can be smarter about what it decides are hyphenated words. For example, this code: print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) produces: aaaaaaaaaa 2005- 02-21 A slightly tweaked wordsep_re: textwrap.TextWrapper.wordsep_re =\ re.compile(r'(\s+|' # any whitespace r'[^\s\w]*\w+[a-zA-Z]-(?=[a-zA-Z]\w+)|' # hyphenated words r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash print textwrap.fill('aaaaaaaaaa 2005-02-21', 18) behaves better: aaaaaaaaaa 2005-02-21 What do you think about changing the default wordsep_re? -- Karl 2005-02-21 03:32 From michel at dialnetwork.com Wed Feb 23 03:04:34 2005 From: michel at dialnetwork.com (Michel Pelletier) Date: Wed Feb 23 00:24:07 2005 Subject: [Python-Dev] UserString In-Reply-To: <20050222110123.608C41E403C@bag.python.org> References: <20050222110123.608C41E403C@bag.python.org> Message-ID: <200502221804.34808.michel@dialnetwork.com> On Tuesday 22 February 2005 03:01 am, Guido wrote: > > BTW, there's *still* no sign from a PEP 246 rewrite. Maybe someone > could offer Clark a hand? (Last time I inquired he was recovering from > a week of illness.) Last summer Alex, Clark, Phillip and I swapped a few emails about reviving the 245/246 drive and submitting a plan for a PSF grant. I was pushing the effort and then had to lamely drop out due to a new job. This is good grant material for someone which leads to my question, when will the next cycle of PSF grants happen? I'm not volunteering and I won't have the bandwidth to participate, but if there are other starving souls out there willing to do the heavy lifting to help Alex it could get done quickly within the PSFs own framework for advancing the language. -Michel From andrewm at object-craft.com.au Wed Feb 23 01:14:45 2005 From: andrewm at object-craft.com.au (Andrew McNamara) Date: Wed Feb 23 01:14:34 2005 Subject: [Python-Dev] UserString In-Reply-To: References: <000001c51703$80f97520$f33ec797@oemcomputer> <0f5201ccd99380eeac0400da69d6d9f7@aleax.it> <42195649.3030400@canterbury.ac.nz> <89b4ed0afdf4a58a4425a588bdbb1965@aleax.it> <1109040601.25187.170.camel@presto.wooz.org> Message-ID: <20050223001445.DB6583C889@coffee.object-craft.com.au> >> if e.errno <> errno.EEXIST: >> raise > >You have a lot more faith in the errno module than I do. Are you sure >the same error codes work on all platforms where Python works? It's >also not exactly readable (except for old Unix hacks). On the other hand, LBYL in this context can result in race conditions and security vulnerabilities. "os.makedirs" is already a composite of many system calls, so all bets are off anyway, but for simpler operations that result in an atomic system call, this is important. 
-- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From tim.peters at gmail.com Wed Feb 23 03:57:22 2005 From: tim.peters at gmail.com (Tim Peters) Date: Wed Feb 23 03:57:25 2005 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.344, 2.345 In-Reply-To: References: Message-ID: <1f7befae050222185758fdd46e@mail.gmail.com> [rhettinger@users.sourceforge.net] > Modified Files: > compile.c > Log Message: > Teach the peepholer to fold unary operations on constants. > > Afterwards, -0.5 loads in a single step and no longer requires a runtime > UNARY_NEGATIVE operation. Aargh. The compiler already folded in a leading minus for ints, and exempting floats from this was deliberate. Stick this in a file: import math print math.atan2(-0.0, -0.0) If you run that directly, a decent 754-conforming libm will display an approximation to -pi (-3.14...; this is the required result in C99 if its optional 754 support is implemented, and even MSVC has done this all along). But if you import the same module from a .pyc or .pyo, now on the HEAD it prints 0.0 instead. In 2.4 it still prints -pi. I often say that all behavior in the presence of infinities, NaNs, and signed zeroes is undefined in CPython, and that's strictly true (just _try_ to find reassuring words about any of those cases in the Python docs ). But it's still the case that we (meaning mostly me) strive to preserve sensible 754 semantics when it's reasonably possible to do so. Not even gonzo-optimizing Fortran compilers will convert -0.0 to 0.0 anymore, precisely because it's not semantically neutral. In this case, it's marshal that drops the sign bit of a float 0 on the floor, so surprises result if and only if you run from a precompiled Python module now. I don't think you need to revert the whole patch, but -0.0 must be left alone (or marshal taught to preserve the sign of a float 0.0 -- but then you have the problem of _detecting_ the sign of a float 0.0, and nothing in standard C89 can do so). Even in 754-land, it's OK to fold in the sign for non-zero float literals (-x is always unexceptional in 754 unless x is a signaling NaN, and there are no signaling NaN literals; and the sign bit of any finite float except zero is already preserved by marshal). From kbk at shore.net Wed Feb 23 05:19:55 2005 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Wed Feb 23 05:20:51 2005 Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200502230419.j1N4Jthi005718@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 308 open (+10) / 2755 closed ( +1) / 3063 total (+11) Bugs : 838 open (+15) / 4834 closed ( +5) / 5672 total (+20) RFE : 168 open ( +0) / 148 closed ( +4) / 316 total ( +4) New / Reopened Patches ______________________ do not add directory of sys.argv[0] into sys.path (2004-05-02) http://python.org/sf/946373 reopened by wrobell isapi.samples.advanced.py fix (2005-02-17) http://python.org/sf/1126187 opened by Philippe Kirsanov more __contains__ tests (2005-02-17) http://python.org/sf/1141428 opened by Jim Jewett Fix to allow urllib2 digest auth to talk to livejournal.com (2005-02-18) http://python.org/sf/1143695 opened by Benno Rice Add IEEE Float support to wave.py (2005-02-19) http://python.org/sf/1144504 opened by Ben Schwartz cgitb: make more usable for 'binary-only' sw (new patch) (2005-02-19) http://python.org/sf/1144549 opened by Reinhold Birkenfeld allow UNIX mmap size to default to current file size (new) (2005-02-19) http://python.org/sf/1144555 opened by Reinhold Birkenfeld Make OpenerDirector instances pickle-able (2005-02-20) http://python.org/sf/1144636 opened by John J Lee webbrowser.Netscape.open bug fix (2005-02-20) http://python.org/sf/1144816 opened by Pernici Mario Replace store/load pair with a single new opcode (2005-02-20) http://python.org/sf/1144842 opened by Raymond Hettinger Remove some invariant conditions and assert in ceval (2005-02-20) http://python.org/sf/1145039 opened by Neal Norwitz Patches Closed ______________ date.strptime and time.strptime as well (2005-02-04) http://python.org/sf/1116362 closed by josh-sf New / Reopened Bugs ___________________ attempting to use urllib2 on some URLs fails starting on 2.4 (2005-02-16) http://python.org/sf/1123695 opened by Stephan Sokolow descrintro describes __new__ and __init__ behavior wrong (2005-02-15) http://python.org/sf/1123716 opened by Steven Bethard gensuitemodule.processfile fails (2005-02-16) http://python.org/sf/1123727 opened by Jurjen N.E. Bos PyDateTime_FromDateAndTime documented as PyDate_FromDateAndT (2005-02-16) CLOSED http://python.org/sf/1124278 opened by smilechaser Function's __name__ no longer accessible in restricted mode (2005-02-16) CLOSED http://python.org/sf/1124295 opened by Tres Seaver Python24.dll crashes, EXAMPLE ATTACHED (2005-02-12) CLOSED http://python.org/sf/1121201 reopened by complex IDLE line wrapping (2005-02-16) CLOSED http://python.org/sf/1124503 opened by Chris Rebert test_os fails on 2.4 (2005-02-17) CLOSED http://python.org/sf/1124513 reopened by doerwalter test_os fails on 2.4 (2005-02-16) CLOSED http://python.org/sf/1124513 opened by Brett Cannon test_subprocess is far too slow (2005-02-17) http://python.org/sf/1124637 opened by Michael Hudson Math mode not well handled in \documentclass{howto} (2005-02-17) http://python.org/sf/1124692 opened by Daniele Varrazzo GetStdHandle in interactive GUI (2005-02-17) http://python.org/sf/1124861 opened by davids subprocess.py Errors with IDLE (2005-02-17) http://python.org/sf/1126208 opened by Kurt B. Kaiser subprocesss module retains older license header (2005-02-17) http://python.org/sf/1138653 opened by Tres Seaver Python syntax is not so XML friendly! 
(2005-02-18) CLOSED http://python.org/sf/1143855 opened by Colbert Philippe inspect.getsource() breakage in 2.4 (2005-02-18) http://python.org/sf/1143895 opened by Armin Rigo future warning in commets (2005-02-18) http://python.org/sf/1144057 opened by Grzegorz Makarewicz reload() is broken for C extension objects (2005-02-19) http://python.org/sf/1144263 opened by Matthew G. Knepley htmllib quote parse error within a