Using the heapq module, I found that it's not as easy to use as it could
be. Or, at least, that it could be more straightforward to use.
My main issue is that for correct usage:
- the user has to provide an external list, which must not be used for
anything else, or the invariant is broken
- to alter the heap queue, a bunch of functions must be used, always
passing in that external list
I think that turning the "external list" into an internal attribute, and
the "bunch of functions" into methods, would make the module easier
to use, safer, and more object oriented.
So I basically coded it. What do you think about including this class
in the heapq module?
"""
class Heap(object):
'''Maintains a heapified list, always conserving the invariant.
Heaps are arrays for which heap[k] <= heap[2*k+1] and
heap[k] <= heap[2*k+2] for all k, counting elements from zero.
'''
def __init__(self, iterable=None):
'''Initializes the heap from any iterable.
>>> Heap([1, 2])
Heap([1, 2])
>>> Heap([])
Heap([])
>>> Heap()
Heap([])
>>> Heap((1, 2, 3))
Heap([1, 2, 3])
>>> Heap(x**2 for x in range(3))
Heap([0, 1, 4])
'''
if iterable is None:
self._queue = []
else:
self._queue = list(iterable)
heapq.heapify(self._queue)
def __repr__(self):
return "Heap(%s)" % self._queue
def push(self, item):
'''Push the item to the heap queue.
>>> h = Heap()
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.push(2)
>>> h.push(1)
>>> h.push(0)
>>> h
Heap([0, 2, 1, 5, 3, 4])
'''
heapq.heappush(self._queue, item)
def pop(self):
'''Pop one item from the heap queue.
>>> h = Heap([0, 2, 1, 5, 3, 4])
>>> h.pop()
0
>>> h.pop()
1
>>> h.pop()
2
>>> h.pop()
3
>>> h.pop()
4
>>> h.pop()
5
>>> h
Heap([])
>>> h.pop()
Traceback (most recent call last):
...
IndexError: index out of range
'''
return heapq.heappop(self._queue)
def pushpop(self, item):
'''Push the item and pop one from the heap queue.
Note that this method is more efficient than calling both
push() and pop() separately.
>>> h = Heap()
>>> h.pushpop(7)
7
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.pushpop(7)
3
>>> h.pushpop(7)
4
>>> h
Heap([5, 7, 7])
'''
return heapq.heappushpop(self._queue, item)
def replace(self, item):
'''Pop one item and push the received one into the heap queue
Note that this method is more efficient than calling both
pop() and push() separately.
>>> h = Heap()
>>> h.replace(3)
Traceback (most recent call last):
...
IndexError: index out of range
>>> h.push(7)
>>> h.replace(2)
7
>>> h
Heap([2])
>>> h.push(3)
>>> h
Heap([2, 3])
>>> h.replace(1)
2
>>> h.replace(9)
1
>>> h
Heap([3, 9])
'''
return heapq.heapreplace(self._queue, item)
"""
Regards,
--
. Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Why does Python have a bitwise but not a logical xor operator? It's even
weirder because boolean objects do have a __xor__ method.
Does Py3k have an xor keyword? (I am using 2.6 due to NumPy.)
(Yes I know BDFL is planning a moratorium on syntax.)
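For reference, here is a minimal sketch of the workarounds I know of today; no
new syntax is assumed, just the bitwise operator and a comparison of truth
values:

a, b = True, False

print(a ^ b)                 # True -- bitwise xor, which bool supports via __xor__
print(bool(a) != bool(b))    # True -- the usual idiom for a logical xor

# For arbitrary (possibly non-bool) operands, compare their truth values:
x, y = "spam", ""
print(bool(x) != bool(y))    # True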
S.M.
The GIL is particularly evil on multiprocessor systems. Not just because
it prevents parallel execution of Python code, but because thread switching
is broken: a thread that periodically releases and re-acquires the GIL (at
check intervals) more or less always wins. On a multiprocessor system, the
running thread stays on its processor and grabs the GIL back before a
sleeping thread wakes up. The cooperative multithreading intended by the
check intervals only works properly on single-processor systems; in effect,
Python threads only schedule properly on single-processor computers. On
computers with multiple CPUs, the affinity of the Python process must be
set to one CPU only to get proper cooperative thread scheduling. This
behaviour of the GIL is a PITA, but often overlooked - all the debate seems
to be about parallel scalability, which matters far less. Today, most people
have multicore desktop computers (mine has four cores, for example). The
Linux kernel got rid of the BKL a long time ago. It's time Python did the same.
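To make the claim concrete, here is a minimal sketch of the kind of experiment
that shows the effect; the numbers it prints are illustrative only and vary
with OS, core count and Python version:

import threading
import time

counts = [0, 0]
stop = False

def spin(slot):
    # Pure Python, CPU-bound work: the GIL is held between check intervals.
    while not stop:
        counts[slot] += 1

threads = [threading.Thread(target=spin, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
time.sleep(2)
stop = True
for t in threads:
    t.join()

# A heavily skewed split between the two counters suggests that GIL hand-off
# between the threads is not happening fairly.
print(counts)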
The GIL also has its virtues. A call into a C extension is serialized by
default, unless the extension explicitly releases the GIL.
An attempt was made (many years ago) to replace the GIL with
fine-grained locking. It slowed down serial code. That should come as no
surprise, as OS mutexes and semaphores are expensive.
Lock-free data structures exist for just about every conceivable purpose.
Java and .NET have multithreaded garbage collectors that don't need a
global lock. I've heard claims that removing the GIL would require Python
to use a garbage collector instead of reference counts. That is not true.
Lock-free data structures and these garbage collectors all have one thing
in common: they use a compare-and-exchange (CAS) instruction present in
most modern processors, e.g. CMPXCHG on x86. In fact, CAS is used to
implement OS mutexes (sleeplocks) and the less expensive spinlocks. Without
a CAS instruction for the platform, there would be no GIL. Every platform
that has a working GIL has a working CAS.
Python could use the CAS instruction for lock-free management of
reference counts. All built-in types (int, float, str, list, tuple,
dict, set, etc) could be made lock-free using CAS. That would allow the
interpreter to execute bytecode without holding the GIL. Only calls to C
extensions would be serialized with the GIL.
Schematically, this is how refcounts could be managed without needing a
lock:
/*
 * Returns 1 if an exchange was made, and (*addr == old) before
 * the exchange; otherwise returns 0.
 *
 * Uses opcode CMPXCHG on x86.
 */
extern int Py_refcnt_CAS(Py_ssize_t *addr, Py_ssize_t old, Py_ssize_t new);

inline void Py_DECREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    /* atomically decrement; retry if another thread changed the count meanwhile */
    do {
        refcnt = obj->ob_refcnt;
    } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt - 1));
    refcnt = obj->ob_refcnt;
    if (refcnt == 0) {
        /* only the thread that wins the 0 -> -1 exchange deallocates */
        if (Py_refcnt_CAS(&obj->ob_refcnt, 0, -1)) {
            /* deallocate the object */
        }
    }
}

inline void Py_INCREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    refcnt = obj->ob_refcnt;
    if (refcnt >= 0) {
        /* a negative count marks an object already being deallocated */
        do {
            if ((refcnt = obj->ob_refcnt) < 0)
                break;
        } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt + 1));
    }
}
Regards,
Sturla Molden
On Mon, Oct 26, 2009 at 1:29 PM, John Arbash Meinel
<john(a)arbash-meinel.com> wrote:
..
> Anyway, I've had pretty good results, and some significant memory
> savings at times. (The string interned() dict is often the largest
> single object. Once you get 500k strings in there, it is something like
> 12MB all by itself. I've certainly seen it in the 24MB+ range.)
Please see http://bugs.python.org/issue7224 .
It would be interesting to see if it makes a difference to any of those
24MB+ interned string dictionary applications.
I propose a moratorium on language changes. This would be a period of
several years during which no changes to Python's grammar or language
semantics will be accepted. The reason is that frequent changes to the
language cause pain for implementors of alternate implementations
(Jython, IronPython, PyPy, and others probably already in the wings)
at little or no benefit to the average user (who won't see the changes
for years to come and might not be in a position to upgrade to the
latest version for years after).
The main goal of the Python development community at this point should
be to get widespread acceptance of Python 3000. There is tons of work
to be done before we can be comfortable about Python 3.x, mostly in
creating solid ports of those 3rd party libraries that must be ported
to Py3k before other libraries and applications can be ported. (Other
work related to Py3k acceptance might be tools to help porting, tools
to help maintaining multiple versions of a codebase, documentation
about porting to Python 3, and so on. Also, work like that going on in
the distutils-sig is very relevant.)
Note, the moratorium would only cover the language itself plus
built-in functions, not the standard library. Development in the
standard library is valuable and much less likely to be a stumbling
block for alternate language implementations. I also want to exclude
details of the CPython implementation, including the C API from being
completely frozen -- for example, if someone came up with (otherwise
acceptable) changes to get rid of the GIL I wouldn't object.
But the moratorium would clearly apply to proposals for anonymous
blocks, "yield from" (PEP 380), changes to decorator syntax, and the
like. (I'm sure it won't stop *discussion* of those proposals, and
that's not the purpose of the moratorium; but at least it will stop
worries elsewhere that such proposals might actually be *accepted* any
time soon.)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Hello,
That's not my idea; it has been suggested by several people in the
distutils discussions. But I'd like to bring it here because I
couldn't find a clear answer or position on this subject in the list
archives (let me know if there is one). And depending on the answers, I
might suggest it as a topic for the next language summit.
What about having a different release cycle for the stdlib, and
shipping Python in two distinct releases:
- Python : core + stdlib
- Python-Stdlib : stdlib
The Python release would remain unchanged, but its cycle would be
longer (as the moratorium seems to imply).
stdlib would have a shorter release cycle (18 months?) allowing people
to upgrade it without having to wait for the next full Python release.
Each stdlib release would be compatible with a range of Python versions.
Regards,
Tarek
Terry Reedy wrote:
> Alexander Belopolsky wrote:
>> Terry Reedy wrote:
>>> I had exactly the same idea, but did not post because it violates the
>>> general rule that mutators return None.
>>
>> Is there such a rule? What about set/dict pop?
>
> The rule perhaps should be restated as 'Collection mutators return None
> or possibly an item, but not the collection.'
And to clarify the rationale for that guideline: it is to make it clear
that the mutator is changing the container in place and *not* creating a
new container object.
myset.pop() # No new container, returns popped object
mylist.sort() # No new container, returns None
sorted(mylist) # New container, so return it
mystr.lower() # Creates new string, so return it
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Mon, Oct 26, 2009 at 1:29 PM, John Arbash Meinel
<john(a)arbash-meinel.com> wrote:
...
> Just to let you know that I created my own "SimpleSet" class for use in
> the Bazaar code base. And I did exactly this, overloading "Add" to also
> return the object added. It works pretty well.
>
Very interesting. Is there a public web viewer for your branch? I
tried http://code.python.org/loggerhead/, but it gives me 503 Service
Temporarily Unavailable.
>> Here is an alternative idea on how storing interned objects in a set
>> can be supported.
FWIW, here is a recipe that can retrieve canonical objects
from any container, letting you look up a preferred representative
of an equivalence class: http://code.activestate.com/recipes/499299/
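For reference, a minimal sketch of the canonicalization idea itself (a
hypothetical helper, not the linked recipe, and using a dict rather than a
set):

class Interner(object):
    def __init__(self):
        self._canon = {}

    def intern(self, obj):
        '''Return the stored object equal to obj, adding obj if it is new.'''
        return self._canon.setdefault(obj, obj)

pool = Interner()
a = pool.intern(tuple([1, 2]))
b = pool.intern(tuple([1, 2]))
assert a is b    # both names now refer to the same canonical tuple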
Raymond
Alexander Belopolsky wrote:
> Changing the subject to reflect branched discussion and forwarding to
> python-ideas where it probably belongs.
>
> On Mon, Oct 26, 2009 at 12:02 PM, Terry Reedy <tjreedy(a)udel.edu> wrote:
>> Alexander Belopolsky wrote:
>>
>>> Here is an alternative idea on how storing interned objects in a set
>>> can be supported. Currently set.add method returns None and has no
>>> effect when set already has an object equal to the one being added. I
>>> propose to consider changing that behavior to make set.add return the
>>> added object or the set member that is equal to the object being
>>> added. It is unlikely that many programs rely on the return value
>>> being None (with doctests being a probable exception), so adding this
>>> feature is unlikely to cause much grief.
>>
>> I had exactly the same idea, but did not post because it violates the
>> general rule that mutators return None.
>
> Is there such a rule? What about set/dict pop?
The rule is about methods that do not have an obvious return value, where
the choice is between returning self and None.
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.