Using the heapq module, I found that it's not easy to use. Or, at
least, that it could be more straightforward to use.
My main issue is that, for correct usage:
- the user must provide an external list, which shouldn't be used for
anything else, or the invariant will be broken
- to alter the heap queue, a bunch of functions must be called, always
passing in that external list
I think that changing "external list" to "internal attribute", and
"bunch of functions" to "methods", would make the module easier to
use, safer, and more object oriented.
So I basically coded it. What do you think about including this class
in the heapq module?
"""
import heapq


class Heap(object):
    '''Maintains a heapified list, always conserving the invariant.

    Heaps are arrays for which heap[k] <= heap[2*k+1] and
    heap[k] <= heap[2*k+2] for all k, counting elements from zero.
    '''

    def __init__(self, iterable=None):
        '''Initializes the heap from any iterable.

        >>> Heap([1, 2])
        Heap([1, 2])
        >>> Heap([])
        Heap([])
        >>> Heap()
        Heap([])
        >>> Heap((1, 2, 3))
        Heap([1, 2, 3])
        >>> Heap(x**2 for x in range(3))
        Heap([0, 1, 4])
        '''
        if iterable is None:
            self._queue = []
        else:
            self._queue = list(iterable)
        heapq.heapify(self._queue)

    def __repr__(self):
        return "Heap(%s)" % self._queue

    def push(self, item):
        '''Push the item to the heap queue.

        >>> h = Heap()
        >>> h.push(5)
        >>> h.push(4)
        >>> h.push(3)
        >>> h.push(2)
        >>> h.push(1)
        >>> h.push(0)
        >>> h
        Heap([0, 2, 1, 5, 3, 4])
        '''
        heapq.heappush(self._queue, item)

    def pop(self):
        '''Pop the smallest item from the heap queue.

        >>> h = Heap([0, 2, 1, 5, 3, 4])
        >>> h.pop()
        0
        >>> h.pop()
        1
        >>> h.pop()
        2
        >>> h.pop()
        3
        >>> h.pop()
        4
        >>> h.pop()
        5
        >>> h
        Heap([])
        >>> h.pop()
        Traceback (most recent call last):
        ...
        IndexError: index out of range
        '''
        return heapq.heappop(self._queue)

    def pushpop(self, item):
        '''Push the item, then pop and return the smallest item.

        Note that this method is more efficient than calling both
        push() and pop() separately.

        >>> h = Heap()
        >>> h.pushpop(7)
        7
        >>> h.push(5)
        >>> h.push(4)
        >>> h.push(3)
        >>> h.pushpop(7)
        3
        >>> h.pushpop(7)
        4
        >>> h
        Heap([5, 7, 7])
        '''
        return heapq.heappushpop(self._queue, item)

    def replace(self, item):
        '''Pop and return the smallest item, then push the received one.

        Note that this method is more efficient than calling both
        pop() and push() separately.

        >>> h = Heap()
        >>> h.replace(3)
        Traceback (most recent call last):
        ...
        IndexError: index out of range
        >>> h.push(7)
        >>> h.replace(2)
        7
        >>> h
        Heap([2])
        >>> h.push(3)
        >>> h
        Heap([2, 3])
        >>> h.replace(1)
        2
        >>> h.replace(9)
        1
        >>> h
        Heap([3, 9])
        '''
        return heapq.heapreplace(self._queue, item)
"""
Regards,
--
. Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Why does Python have a bitwise but not a logical xor operator? It's even
weirder because boolean objects do have a __xor__ method.
Does Py3k have an xor keyword? (I am using 2.6 due to NumPy.)
(Yes, I know the BDFL is planning a moratorium on syntax.)
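For reference, a logical xor can already be spelled with existing
operators; a small illustration (ordinary Python, nothing new):

a, b = True, False
print(a ^ b)               # True: ^ works on bools since bool subclasses int

# For arbitrary (truthy/falsy) operands, compare their bool() values:
x, y = 'spam', ''
print(bool(x) != bool(y))  # True: a logical xor without new syntax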
S.M.
The GIL is particularly evil on multiprocessor systems. Not just because
it prevents parallel execution of Python code, but because thread
switching itself breaks down. A thread that periodically releases and
re-acquires the GIL (at check intervals) more or less always wins: on a
multiprocessor system, the running thread simply continues on its own
processor and grabs the GIL back before a sleeping thread has woken up.
The cooperative multithreading intended by the check intervals only works
properly on single-processor systems. In other words, Python threads only
work properly on single-processor computers; on computers with multiple
CPUs, one must pin the affinity of the Python process to a single CPU to
get proper cooperative thread scheduling. This behaviour of the GIL is a
PITA, but often overlooked - all the debates seem to be about parallel
scalability, which is far less important. Today most people have
multicore desktop computers (mine has four cores). The Linux kernel got
rid of the BKL a long time ago. It's time Python did the same.
The GIL also has its virtues. Calls into C extensions are serialized by
default unless the extension explicitly releases the GIL.
An attempt was made (many years ago) to replace the GIL with
fine-grained locking. It slowed down serial code. That should come as no
surprise, as OS mutexes and semaphores are expensive.
There are lock-free data structures of every conceivable sort. There are
multithreaded garbage collectors in Java and .NET that don't need a
global lock. I've heard claims that removing the GIL would require
Python to use a garbage collector instead of reference counts. That is
not true. Lock-free data structures and garbage collectors all have one
thing in common: they use a compare-and-exchange (CAS) instruction
present in most modern processors, e.g. CMPXCHG on x86. In fact, CAS is
used to implement OS mutexes (sleep locks) and the less expensive
spinlocks. Without a CAS instruction for the platform, there would be no
GIL. Every platform that has a working GIL has a working CAS.
Python could use the CAS instruction for lock-free management of
reference counts. All built-in types (int, float, str, list, tuple,
dict, set, etc.) could be made lock-free using CAS. That would allow the
interpreter to execute bytecode without holding the GIL. Only calls to C
extensions would be serialized with the GIL.
Schematically, this is how refcounts could be managed without needing a
lock:
/*
 * Returns 1 if an exchange was made, and (*addr == old) before
 * the exchange, otherwise returns 0.
 *
 * Uses opcode CMPXCHG on x86.
 */
extern int Py_refcnt_CAS(Py_ssize_t *addr, Py_ssize_t old, Py_ssize_t new);

inline void Py_DECREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    /* Atomically decrement the refcount, retrying if another thread
       changed it between the read and the compare-and-exchange. */
    do {
        refcnt = obj->ob_refcnt;
    } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt - 1));
    refcnt = obj->ob_refcnt;
    if (refcnt == 0) {
        /* Only the thread that wins this CAS (0 -> -1) deallocates. */
        if (Py_refcnt_CAS(&obj->ob_refcnt, 0, -1)) {
            /* deallocate the object */
        }
    }
}

inline void Py_INCREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    refcnt = obj->ob_refcnt;
    /* A negative refcount marks an object being deallocated;
       don't resurrect it. */
    if (refcnt >= 0) {
        do {
            if ((refcnt = obj->ob_refcnt) < 0)
                break;
        } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt + 1));
    }
}
Regards,
Sturla Molden
Hey All,
We were exploring new features of Python 3.0 at our user's group here in
Tucson (TuPLE: Tucson Python Language Enthusiasts), in particular,
the OrderedDict. See
http://groups.google.com/group/TuPLEgroup/browse_thread/thread/40af73f8e194…
Has there been any discussion about making a "better" OrderedDict literal? I
did
some googling and didn't find anything.
Basically, the thought was there might be a place for a slightly better
literal for OrderedDict
in Python 3.0
od = OrderedDict([('a', 1), ('b', 2)])  # seems clumsy
The two ideas floated were:
od = ['a':1, 'b':2, 'c':3] # [ ] imply order, the ':' implies key-value
or
od = o{'a':1, 'b':2, 'c':3} # { } imply dict, o implies order
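For comparison, here is what those mappings have to look like with
today's syntax (nothing new here, just the current spellings):

from collections import OrderedDict

# The list-of-pairs form preserves order but is verbose:
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# Keyword arguments read more nicely but do NOT preserve order in the
# Python versions under discussion, because they arrive in a plain dict:
# OrderedDict(a=1, b=2, c=3)   # ordering not guaranteed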
Apologies if this is the wrong place for this discussion. There has been
a lot of opinions flying here at work and at TuPLE which I will be happy
to share if this is the right place. ;)
Gooday,
Richie
A Python interpreter has one internal encoding each for floats, ints,
and strings. sys.float_info and sys.int_info give details about the
first two, although they are mostly invisible to user code. (I presume
they are attached to sys rather than to float and int precisely because
of this.) A couple of recent posts have discussed making the unicode
encoding (UCS2 vs. UCS4) both less visible and more discoverable to
extensions.
Bytes are nearly always an encoding of *something*, but the particular
encoding used is instance-specific. As Guido has said, the programmer
must keep track. But how? In an OO language, one obvious way is as an
attribute of the instance. That would be carried with the instance and
make it self-identifying.
What I do not know is whether it is feasible to give an immutable instance
of a builtin class a mutable attribute slot. If it were, I think this could
make 3.x bytes easier and more transparent to use. When a string is
encoded to bytes, the attribute would be set. If it were then pickled,
the attribute would be stored with it and restored with it, and less
easily lost. If it were then decoded, the attribute would be used. If it
were sent to the net, the attribute would be used to set the appropriate
headers. The reverse process would apply from net to bytes to (unicode)
text.
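To make the idea concrete, here is a minimal Python 3 sketch using a
subclass (the proposal itself would of course put the slot on the builtin
bytes type; TaggedBytes and its helpers are purely illustrative):

class TaggedBytes(bytes):
    '''Bytes that remember which encoding produced them.'''
    def __new__(cls, data, encoding=None):
        self = bytes.__new__(cls, data)
        self.encoding = encoding   # works here because subclasses get a __dict__
        return self

def encode_tagged(text, encoding='utf-8'):
    return TaggedBytes(text.encode(encoding), encoding)

def decode_tagged(data):
    # The caller no longer has to track the encoding separately.
    return data.decode(data.encoding)

raw = encode_tagged('café', 'latin-1')
print(raw.encoding)         # latin-1
print(decode_tagged(raw))   # café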
Bytes representing other types of data, such as media, could also be
tagged, not just those representing text.
This would be a proposal for 3.3 at the earliest. It would involve
revising stdlib modules, as appropriate, to use the new info.
Terry Jan Reedy
FYI to everyone on this list.
---------- Forwarded message ----------
From: Guido van Rossum <guido(a)python.org>
Date: Mon, Nov 9, 2009 at 09:56
Subject: Re: [Python-Dev] PEP 3003 - Python Language Moratorium
To: Brett Cannon <brett(a)python.org>
Cc: python-dev(a)python.org
Thanks Brett. I've moved the moratorium PEP to Status: Accepted. I've
added the words about inclusion of 3.2 and exclusion of 3.3 (which
were eaten by an svn conflict when I previously tried to add them) and
added a section to the end stating that an extension will require
another PEP.
--Guido
[snip - non-critical email Guido was replying to]
(Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving
away a lot of major problems with the concept, not least of which is the
sheer amount of work involved. I just wanted to get the idea published
somewhere while I was thinking about it)
I'm in the process of implementing a runpy.run_path function for 2.7/3.2
to allow Python code to use the zipfile and directory execution feature
provided by the CPython command line in 2.6/3.1. It turns out the global
state used for the import system is causing some major pain in the
implementation. It's solvable, but it will probably involve a couple of
rather ugly hacks and the result sure as hell isn't going to be thread-safe.
Anyway, the gist of the idea in the subject line is to take all of the
PEP 302 data stores and make them attributes of an ImportEngine class.
This would affect at least:
sys.modules
sys.path
sys.path_hooks
sys.path_importer_cache
sys.meta_path
sys.dont_write_bytecode
The underlying import machinery would migrate to instance methods of the
new class.
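Purely to make the shape of the idea concrete, here is a rough sketch of
what such a state container might look like (all names are illustrative;
no such class exists today, and the actual machinery methods are elided):

import sys

class ImportEngine(object):
    '''Hypothetical holder for the PEP 302 state that currently lives
    as module-level attributes of sys (sketch only).'''

    def __init__(self, modules=None, path=None, path_hooks=None,
                 path_importer_cache=None, meta_path=None,
                 dont_write_bytecode=False):
        self.modules = {} if modules is None else modules
        self.path = [] if path is None else path
        self.path_hooks = [] if path_hooks is None else path_hooks
        self.path_importer_cache = ({} if path_importer_cache is None
                                    else path_importer_cache)
        self.meta_path = [] if meta_path is None else meta_path
        self.dont_write_bytecode = dont_write_bytecode

    @classmethod
    def from_sys(cls):
        '''Wrap the current interpreter-wide state (shared, not copied).'''
        return cls(sys.modules, sys.path, sys.path_hooks,
                   sys.path_importer_cache, sys.meta_path,
                   sys.dont_write_bytecode)

    def copy(self):
        '''Independent copy, so e.g. runpy could import things without
        disturbing the rest of the interpreter.'''
        return ImportEngine(dict(self.modules), list(self.path),
                            list(self.path_hooks),
                            dict(self.path_importer_cache),
                            list(self.meta_path),
                            self.dont_write_bytecode)

    # The actual import machinery (path hook handling, the logic behind
    # __import__, etc.) would become instance methods here.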
The standard import engine instance would be stored in a new sys module
attribute (e.g. 'sys.import_engine'). For backwards compatibility, the
existing sys attributes would remain as references to the relevant
instance attributes of the standard engine.
Modules would get a new special attribute (e.g. '__import_engine__')
identifying the import engine that was used to import that module.
__import__ would be modified to take the new special attribute into account.
The main immediate benefit from my point of view would be to allow runpy
to create *copies* of the standard import engine so that
runpy.run_module and runpy.run_path could go do their thing without
impacting the rest of the interpreter. At the moment that really isn't
feasible, hence the lack of thread safety in that module.
I suspect such a change would greatly simplify experimentation with
Python security models as well: restricted code could be given a
restricted import engine rather than the restrictions having to be baked
in to the standard import engine.
From an OO design point of view, it's a classic migration of global
state and multiple functions to manipulate that state into a single
global instance of a new class.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
Are Guido's articles on Python history at python-history.blogspot.com over?
I guess there are lots more anecdotes and stories still, but the blog
hasn't been updated for a long time, so I was wondering whether I should
expect anything more down the pipe?
Cheers,
Daniel
--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
Scope
-----
This idea affects the CPython ABI for extension modules. It has no impact
on the Python language syntax or on other Python implementations.
The Problem
-----------
Currently, Python can be built with an internal Unicode representation of
UCS2 or UCS4. The two are binary incompatible, but the distinction is not
included as part of the platform name. Consequently, if one installs a
binary egg (e.g., with easy_install), there's a good chance one will get an
error such as the following when trying to use it:
undefined symbol: PyUnicodeUCS2_FromString
In Python 2, some extension modules can blissfully link to either ABI, as
the problem only arises for modules that call a PyUnicode_* macro (which
expands to calling either a PyUnicodeUCS2_* or PyUnicodeUCS4_* function).
For Python 3, every extension type will need to call a PyUnicode_* macro,
since __repr__ must return a Unicode object.
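As an aside, which flavor a given interpreter was built with is at least
detectable from Python itself; a quick check (not part of the proposal):

import sys

# Narrow (UCS2) builds cap sys.maxunicode at 0xFFFF; wide (UCS4) builds
# report 0x10FFFF.
if sys.maxunicode == 0xFFFF:
    print('narrow (UCS2) build')
else:
    print('wide (UCS4) build')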
This problem has been known since at least 2006, as seen in this thread from
the distutils-sig:
http://markmail.org/message/bla5vrwlv3kn3n7e?q=thread:bla5vrwlv3kn3n7e
In that thread, it was suggested that the Unicode representation become part
of the platform name. That change would require a distutils and/or
setuptools change, which has not happened and does not appear likely to
happen in the near future. It would also mean that anyone who wants to
provide binary eggs for common platforms will need to provide twice as many
eggs.
Solution
--------
Get rid of the ABI difference for the 99% of extension modules that don't
care about the internal representation of Unicode strings. From the
extension module's point of view, PyObject is opaque. It will manipulate
the Unicode string entirely through PyUnicode_* function calls and does not
care about the internal representation.
For example, PyUnicode_FromString has the following signature in the
documentation:
PyObject *PyUnicode_FromString(const char *u)
Currently, it's #ifdef'ed to either PyUnicodeUCS2_FromString or
PyUnicodeUCS4_FromString.
Remove the macro and name the function PyUnicode_FromString regardless of
which internal representation is being used. The vast majority of binary
eggs will then work correctly on both UCS2 and UCS4 Pythons.
Functions that explicitly use Py_UNICODE or PyUnicodeObject as part of their
signature will continue to be #ifdef'ed, so extension modules that *do* care
about the internal representation will still generate a link error.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
Last night I got a couple of PyCon talks rejected, and someone else sent me
a rejection email they'd received. I wasn't surprised at the rejections,
but I was quite surprised that many of the review comments were at least in
part based on the presenter (sometimes the incorrectly assumed presenter)
instead of on the proposed talk.
Out of the 14 reviews, 5 of them have comments about the author of the
proposal. I'll just give 2 examples here, taken from 2 different reviewers
on 2 different proposals.
1. Imagine you're a relatively unknown Python programmer, you submit a talk
to PyCon, and get back a review whose first sentence reads:
I don't know the reputation of this particular speaker, so I won't "+1"
That sends a pretty unfortunate message: because you're not a recognized
Python person, I won't give you the thumbs up. Maybe I'm being naive or
simplistic, but I'd have hoped one route to becoming recognized would be by
giving a PyCon talk. From the POV of the review recipient, the leading
justification for a non-recommendation has nothing to do with the talk!
2. What if you gave a PyCon talk in an earlier year that wasn't rated as
highly as other talks? You send in a PyCon proposal, and get this back:
I like XXX but honestly his talk at pycon 09 went poorly.
That was the *entire* review in this case. What's the message here? Sounds
like: well, you gave a talk once and it wasn't so great, so part of my vote
against your proposal is because of that. It's like telling people to go
away and not bother ever submitting again. Again, that's not based on
the current talk proposal. People tend to get better at giving talks. If a
proposal's content is technically good enough to get in, let them give
another talk and help them to make it better. Just when is XXX supposed to
re-send another PyCon talk, if ever? What makes this even worse is that
XXX was not even the primary author of the proposal, and was not to be the
speaker. So here we have a review that's negative *entirely* due to a talk
that someone else gave in a previous year. How discouraging. Should the
person in the future "take one for the team" and decline to be listed as a
co-author on joint proposals - even though they're not going to speak -
fearing that a reviewer will reply with a -1 and a one-line dismissal?
That's the unfortunate dynamic that the above "review" has created.
I hope this doesn't sound like personal sour grapes. It's not at all. I've
had *tons* of rejection letters in my life (see http://bit.ly/1xytIr),
including from PyCon. They're water off a duck's back at this point :-) I
do however care about Python and the Python community.
The most important point is the message that's sent back to aspiring
speakers. Reviews that are based on the supposed character, or old talks,
or how recognized you are or aren't, or on a guess as to which of multiple
authors might be doing the presenting - all of those send a bad message.
They make PyCon look insular and cliquey. If the committee of people is (or
merely gives the impression of being) inwardly focused, the community and
in the longer term perhaps the language itself will suffer through reduced
diversity and through discouraging precisely the people who are animated
enough and have the initiative and ambition to submit talks. Those are
*exactly* the wrong folks to discourage.
The obvious suggestion is to anonymize the review process. That's standard
in mature conferences. It doesn't eliminate bias (in fact you *don't want*
to eliminate bias - you need it to survive, you need it to assess quality),
but it does reduce the opportunity for judgment based on the wrong things.
When I say "wrong" I mean: if you're going to judge based on stuff that's
not just the proposal content, then ask for a CV, or a speaking record, or
whatever you intend to consider in the review process.
Anonymizing conference reviewing has healthy effects. I've seen it up close
in academic circles. It's like a breath of fresh air and the results are
surprising. When they did it in the genetic algorithms world, all of a
sudden really interesting talks were being accepted from all over the world
and many very experienced researchers were having multiple talks rejected.
That was unexpected, refreshing, and generally agreed to be a very healthy
and embracing/welcoming move.
If PyCon doesn't move to anonymizing reviews, then at least *try* not to
base acceptance decisions on who a speaker is (or, worse, who it's presumed
to be). If for some reason you have to, it's *perhaps* better not to tell
the poor submitter that they're being rejected in part based on who they
are or aren't. The CFP requests a talk proposal, and that's what should be
reacted to in the review response, even if there's more to the story. Some
of the comments above *might* be appropriate for a conference committee
meeting, but not for the first (or only!) line of a review.
OK, rant over :-) Regards to everyone & thanks for all the PyCon work. I
know how much work it is, and that it's not easy. I hope to be able to make
it to Atlanta.
Terry Jones