Using the heapq module, I found that it's not as easy to use as it could
be. Or, at least, that it could be more straightforward to use.
My main issue is that for correct usage:
- the user has to provide an external list, which must not be used for
anything else, or the invariant is broken
- to alter the heap queue, a bunch of functions must be used, always
passing in that external list
I think that turning the "external list" into an internal attribute, and
the "bunch of functions" into methods, would make the module easier
to use, safer, and more object oriented.
So I basically coded it. What do you think about including this class
in the heapq module?
"""
class Heap(object):
'''Maintains a heapified list, always conserving the invariant.
Heaps are arrays for which heap[k] <= heap[2*k+1] and
heap[k] <= heap[2*k+2] for all k, counting elements from zero.
'''
def __init__(self, iterable=None):
'''Initializes the heap from any iterable.
>>> Heap([1, 2])
Heap([1, 2])
>>> Heap([])
Heap([])
>>> Heap()
Heap([])
>>> Heap((1, 2, 3))
Heap([1, 2, 3])
>>> Heap(x**2 for x in range(3))
Heap([0, 1, 4])
'''
if iterable is None:
self._queue = []
else:
self._queue = list(iterable)
heapq.heapify(self._queue)
def __repr__(self):
return "Heap(%s)" % self._queue
def push(self, item):
'''Push the item to the heap queue.
>>> h = Heap()
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.push(2)
>>> h.push(1)
>>> h.push(0)
>>> h
Heap([0, 2, 1, 5, 3, 4])
'''
heapq.heappush(self._queue, item)
def pop(self):
'''Pop one item from the heap queue.
>>> h = Heap([0, 2, 1, 5, 3, 4])
>>> h.pop()
0
>>> h.pop()
1
>>> h.pop()
2
>>> h.pop()
3
>>> h.pop()
4
>>> h.pop()
5
>>> h
Heap([])
>>> h.pop()
Traceback (most recent call last):
...
IndexError: index out of range
'''
return heapq.heappop(self._queue)
def pushpop(self, item):
'''Push the item and pop one from the heap queue.
Note that this method is more efficient than calling both
push() and pop() separately.
>>> h = Heap()
>>> h.pushpop(7)
7
>>> h.push(5)
>>> h.push(4)
>>> h.push(3)
>>> h.pushpop(7)
3
>>> h.pushpop(7)
4
>>> h
Heap([5, 7, 7])
'''
return heapq.heappushpop(self._queue, item)
def replace(self, item):
'''Pop one item and push the received one into the heap queue
Note that this method is more efficient than calling both
pop() and push() separately.
>>> h = Heap()
>>> h.replace(3)
Traceback (most recent call last):
...
IndexError: index out of range
>>> h.push(7)
>>> h.replace(2)
7
>>> h
Heap([2])
>>> h.push(3)
>>> h
Heap([2, 3])
>>> h.replace(1)
2
>>> h.replace(9)
1
>>> h
Heap([3, 9])
'''
return heapq.heapreplace(self._queue, item)
"""
Regards,
--
. Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Why does Python have a bitwise but not a logical xor operator? It's even
weirder because boolean objects do have a __xor__ method.
Does Py3k have an xor keyword? (I am using 2.6 due to NumPy.)
(Yes I know BDFL is planning a moratorium on syntax.)
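For reference, here is a minimal sketch of the workarounds I know of today; no
new syntax is assumed, just the bitwise operator and a comparison of truth
values:

a, b = True, False

print(a ^ b)                 # True -- bitwise xor, which bool supports via __xor__
print(bool(a) != bool(b))    # True -- the usual idiom for a logical xor

# For arbitrary (possibly non-bool) operands, compare their truth values:
x, y = "spam", ""
print(bool(x) != bool(y))    # True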
S.M.
The GIL is particularly evil on multiprocessor systems. Not just because
it prevents parallel execution of Python code, but because thread switching
is broken: a thread that periodically releases and re-acquires the GIL (at
check intervals) more or less always wins. On a multiprocessor system, the
running thread stays on its processor and grabs the GIL back before a
sleeping thread wakes up. The cooperative multithreading intended by the
check intervals only works properly on single-processor systems; in effect,
Python threads only schedule properly on single-processor computers. On
computers with multiple CPUs, the affinity of the Python process must be
set to one CPU only to get proper cooperative thread scheduling. This
behaviour of the GIL is a PITA, but often overlooked - all the debate seems
to be about parallel scalability, which matters far less. Today, most people
have multicore desktop computers (mine has four cores, for example). The
Linux kernel got rid of the BKL a long time ago. It's time Python did the same.
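To make the claim concrete, here is a minimal sketch of the kind of experiment
that shows the effect; the numbers it prints are illustrative only and vary
with OS, core count and Python version:

import threading
import time

counts = [0, 0]
stop = False

def spin(slot):
    # Pure Python, CPU-bound work: the GIL is held between check intervals.
    while not stop:
        counts[slot] += 1

threads = [threading.Thread(target=spin, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
time.sleep(2)
stop = True
for t in threads:
    t.join()

# A heavily skewed split between the two counters suggests that GIL hand-off
# between the threads is not happening fairly.
print(counts)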
The GIL also has its virtues. A call into a C extension is serialized by
default, unless the extension explicitly releases the GIL.
An attempt was made (many years ago) to replace the GIL with
fine-grained locking. It slowed down serial code. That should come as no
surprise, as OS mutexes and semaphores are expensive.
Lock-free data structures exist for just about every conceivable purpose.
Java and .NET have multithreaded garbage collectors that don't need a
global lock. I've heard claims that removing the GIL would require Python
to use a garbage collector instead of reference counts. That is not true.
Lock-free data structures and these garbage collectors all have one thing
in common: they use a compare-and-exchange (CAS) instruction present in
most modern processors, e.g. CMPXCHG on x86. In fact, CAS is used to
implement OS mutexes (sleeplocks) and the less expensive spinlocks. Without
a CAS instruction for the platform, there would be no GIL. Every platform
that has a working GIL has a working CAS.
Python could use the CAS instruction for lock-free management of
reference counts. All built-in types (int, float, str, list, tuple,
dict, set, etc) could be made lock-free using CAS. That would allow the
interpreter to execute bytecode without holding the GIL. Only calls to C
extensions would be serialized with the GIL.
Schematically, this is how refcounts could be managed without needing a
lock:
/*
 * Returns 1 if an exchange was made, and (*addr == old) before
 * the exchange; otherwise returns 0.
 *
 * Uses opcode CMPXCHG on x86.
 */
extern int Py_refcnt_CAS(Py_ssize_t *addr, Py_ssize_t old, Py_ssize_t new);

inline void Py_DECREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    /* atomically decrement; retry if another thread changed the count meanwhile */
    do {
        refcnt = obj->ob_refcnt;
    } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt - 1));
    refcnt = obj->ob_refcnt;
    if (refcnt == 0) {
        /* only the thread that wins the 0 -> -1 exchange deallocates */
        if (Py_refcnt_CAS(&obj->ob_refcnt, 0, -1)) {
            /* deallocate the object */
        }
    }
}

inline void Py_INCREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    refcnt = obj->ob_refcnt;
    if (refcnt >= 0) {
        /* a negative count marks an object already being deallocated */
        do {
            if ((refcnt = obj->ob_refcnt) < 0)
                break;
        } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt + 1));
    }
}
Regards,
Sturla Molden
On Mon, Oct 26, 2009 at 1:29 PM, John Arbash Meinel
<john(a)arbash-meinel.com> wrote:
..
> Anyway, I've had pretty good results, and some significant memory
> savings at times. (The string interned() dict is often the largest
> single object. Once you get 500k strings in there, it is something like
> 12MB all by itself. I've certainly seen it in the 24MB+ range.)
Please see http://bugs.python.org/issue7224 .
It would be interesting to see if it makes a difference to any of those
24MB+ interned string dictionary applications.
I propose a moratorium on language changes. This would be a period of
several years during which no changes to Python's grammar or language
semantics will be accepted. The reason is that frequent changes to the
language cause pain for implementors of alternate implementations
(Jython, IronPython, PyPy, and others probably already in the wings)
at little or no benefit to the average user (who won't see the changes
for years to come and might not be in a position to upgrade to the
latest version for years after).
The main goal of the Python development community at this point should
be to get widespread acceptance of Python 3000. There is tons of work
to be done before we can be comfortable about Python 3.x, mostly in
creating solid ports of those 3rd party libraries that must be ported
to Py3k before other libraries and applications can be ported. (Other
work related to Py3k acceptance might be tools to help porting, tools
to help maintaining multiple versions of a codebase, documentation
about porting to Python 3, and so on. Also, work like that going on in
the distutils-sig is very relevant.)
Note, the moratorium would only cover the language itself plus
built-in functions, not the standard library. Development in the
standard library is valuable and much less likely to be a stumbling
block for alternate language implementations. I also want to exclude
details of the CPython implementation, including the C API from being
completely frozen -- for example, if someone came up with (otherwise
acceptable) changes to get rid of the GIL I wouldn't object.
But the moratorium would clearly apply to proposals for anonymous
blocks, "yield from" (PEP 380), changes to decorator syntax, and the
like. (I'm sure it won't stop *discussion* of those proposals, and
that's not the purpose of the moratorium; but at least it will stop
worries elsewhere that such proposals might actually be *accepted* any
time soon.)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Hello,
That's not my idea; it has been suggested by several people in the
distutils discussions. But I'd like to bring it here because I
couldn't find a clear answer or position on this subject in the list
archives (let me know if there is one). And depending on the answers, I
might suggest it as a topic for the next language summit.
What about having a different release cycle for the stdlib, and
shipping Python in two distinct releases:
- Python : core + stdlib
- Python-Stdlib : stdlib
The Python release would remain unchanged, but its cycle would be
longer (as the moratorium seems to imply).
stdlib would have a shorter release cycle (18 months?) allowing people
to upgrade it without having to wait for the next full Python release.
Each stdlib release would be compatible with a range of Python versions.
Regards,
Tarek
Terry Reedy wrote:
> Alexander Belopolsky wrote:
>> Terry Reedy wrote:
>>> I had exactly the same idea, but did not post because it violates the
>>> general rule that mutators return None.
>>
>> Is there such a rule? What about set/dict pop?
>
> The rule perhaps should be restated as 'Collection mutators return None
> or possibly an item, but not the collection.'
And to clarify the rationale for that guideline: it is to make it clear
that the mutator is changing the container in place and *not* creating a
new container object.
myset.pop() # No new container, returns popped object
mylist.sort() # No new container, returns None
sorted(mylist) # New container, so return it
mystr.lower() # Creates new string, so return it
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Mon, Oct 26, 2009 at 1:29 PM, John Arbash Meinel
<john(a)arbash-meinel.com> wrote:
...
> Just to let you know that I created my own "SimpleSet" class for use in
> the Bazaar code base. And I did exactly this, overloading "Add" to also
> return the object added. It works pretty well.
>
Very interesting. Is there a public web viewer for your branch? I
tried http://code.python.org/loggerhead/, but it gives me 503 Service
Temporarily Unavailable.
>> Here is an alternative idea on how storing interned objects in a set
>> can be supported.
FWIW, here is a recipe that can retrieve canonical objects
from any container, letting you look up a preferred representative
of an equivalence class: http://code.activestate.com/recipes/499299/
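For reference, a minimal sketch of the canonicalization idea itself (a
hypothetical helper, not the linked recipe, and using a dict rather than a
set):

class Interner(object):
    def __init__(self):
        self._canon = {}

    def intern(self, obj):
        '''Return the stored object equal to obj, adding obj if it is new.'''
        return self._canon.setdefault(obj, obj)

pool = Interner()
a = pool.intern(tuple([1, 2]))
b = pool.intern(tuple([1, 2]))
assert a is b    # both names now refer to the same canonical tuple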
Raymond
Alexander Belopolsky wrote:
> Changing the subject to reflect branched discussion and forwarding to
> python-ideas where it probably belongs.
>
> On Mon, Oct 26, 2009 at 12:02 PM, Terry Reedy <tjreedy(a)udel.edu> wrote:
>> Alexander Belopolsky wrote:
>>
>>> Here is an alternative idea on how storing interned objects in a set
>>> can be supported. Currently set.add method returns None and has no
>>> effect when set already has an object equal to the one being added. I
>>> propose to consider changing that behavior to make set.add return the
>>> added object or the set member that is equal to the object being
>>> added. It is unlikely that many programs rely on the return value
>>> being None (with doctests being a probable exception), so adding this
>>> feature is unlikely to cause much grief.
>>
>> I had exactly the same idea, but did not post because it violates the
>> general rule that mutators return None.
>
> Is there such a rule? What about set/dict pop?
The rule is about methods that do not have an obvious return value, where
the choice is between returning self and None.
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.