Using the heapq module, I found that it's not easy to use. Or, at
least, that it could be more straightforward to use.
My main issue is that, for correct usage:
- the user must provide an external list, which shouldn't be used for
anything else, or the invariant will be broken
- to alter the heap queue, a bunch of functions must be called, always
passing in that external list
I think that changing "external list" to "internal attribute", and
"bunch of functions" to "methods", would make the module easier to
use, safer, and more object oriented.
So I basically coded it. What do you think about including this class
in the heapq module?
"""
import heapq


class Heap(object):
    '''Maintains a heapified list, always conserving the invariant.

    Heaps are arrays for which heap[k] <= heap[2*k+1] and
    heap[k] <= heap[2*k+2] for all k, counting elements from zero.
    '''

    def __init__(self, iterable=None):
        '''Initializes the heap from any iterable.

        >>> Heap([1, 2])
        Heap([1, 2])
        >>> Heap([])
        Heap([])
        >>> Heap()
        Heap([])
        >>> Heap((1, 2, 3))
        Heap([1, 2, 3])
        >>> Heap(x**2 for x in range(3))
        Heap([0, 1, 4])
        '''
        if iterable is None:
            self._queue = []
        else:
            self._queue = list(iterable)
        heapq.heapify(self._queue)

    def __repr__(self):
        return "Heap(%s)" % self._queue

    def push(self, item):
        '''Push the item to the heap queue.

        >>> h = Heap()
        >>> h.push(5)
        >>> h.push(4)
        >>> h.push(3)
        >>> h.push(2)
        >>> h.push(1)
        >>> h.push(0)
        >>> h
        Heap([0, 2, 1, 5, 3, 4])
        '''
        heapq.heappush(self._queue, item)

    def pop(self):
        '''Pop the smallest item from the heap queue.

        >>> h = Heap([0, 2, 1, 5, 3, 4])
        >>> h.pop()
        0
        >>> h.pop()
        1
        >>> h.pop()
        2
        >>> h.pop()
        3
        >>> h.pop()
        4
        >>> h.pop()
        5
        >>> h
        Heap([])
        >>> h.pop()
        Traceback (most recent call last):
        ...
        IndexError: index out of range
        '''
        return heapq.heappop(self._queue)

    def pushpop(self, item):
        '''Push the item, then pop and return the smallest item.

        Note that this method is more efficient than calling both
        push() and pop() separately.

        >>> h = Heap()
        >>> h.pushpop(7)
        7
        >>> h.push(5)
        >>> h.push(4)
        >>> h.push(3)
        >>> h.pushpop(7)
        3
        >>> h.pushpop(7)
        4
        >>> h
        Heap([5, 7, 7])
        '''
        return heapq.heappushpop(self._queue, item)

    def replace(self, item):
        '''Pop and return the smallest item, then push the received one.

        Note that this method is more efficient than calling both
        pop() and push() separately.

        >>> h = Heap()
        >>> h.replace(3)
        Traceback (most recent call last):
        ...
        IndexError: index out of range
        >>> h.push(7)
        >>> h.replace(2)
        7
        >>> h
        Heap([2])
        >>> h.push(3)
        >>> h
        Heap([2, 3])
        >>> h.replace(1)
        2
        >>> h.replace(9)
        1
        >>> h
        Heap([3, 9])
        '''
        return heapq.heapreplace(self._queue, item)
"""
Regards,
--
. Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Why does Python have a bitwise but not a logical xor operator? It's even
weirder because boolean objects do have a __xor__ method.
Does Py3k have an xor keyword? (I am using 2.6 due to NumPy.)
(Yes, I know the BDFL is planning a moratorium on syntax.)
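For reference, a logical xor can already be spelled with existing
operators; a small illustration (ordinary Python, nothing new):

a, b = True, False
print(a ^ b)               # True: ^ works on bools since bool subclasses int

# For arbitrary (truthy/falsy) operands, compare their bool() values:
x, y = 'spam', ''
print(bool(x) != bool(y))  # True: a logical xor without new syntax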
S.M.
The GIL is particularly evil on multiprocessor systems. Not just because
it prevents parallel execution of Python code, but because thread
switching itself breaks down. A thread that periodically releases and
re-acquires the GIL (at check intervals) more or less always wins: on a
multiprocessor system, the running thread simply continues on its own
processor and grabs the GIL back before a sleeping thread has woken up.
The cooperative multithreading intended by the check intervals only works
properly on single-processor systems. In other words, Python threads only
work properly on single-processor computers; on computers with multiple
CPUs, one must pin the affinity of the Python process to a single CPU to
get proper cooperative thread scheduling. This behaviour of the GIL is a
PITA, but often overlooked - all the debates seem to be about parallel
scalability, which is far less important. Today most people have
multicore desktop computers (mine has four cores). The Linux kernel got
rid of the BKL a long time ago. It's time Python did the same.
The GIL also has its virtues. Calls into C extensions are serialized by
default unless the extension explicitly releases the GIL.
An attempt was made (many years ago) to replace the GIL with
fine-grained locking. It slowed down serial code. That should come as no
surprise, as OS mutexes and semaphores are expensive.
There are lock-free data structures of every conceivable sort. There are
multithreaded garbage collectors in Java and .NET that don't need a
global lock. I've heard claims that removing the GIL would require
Python to use a garbage collector instead of reference counts. That is
not true. Lock-free data structures and garbage collectors all have one
thing in common: they use a compare-and-exchange (CAS) instruction
present in most modern processors, e.g. CMPXCHG on x86. In fact, CAS is
used to implement OS mutexes (sleep locks) and the less expensive
spinlocks. Without a CAS instruction for the platform, there would be no
GIL. Every platform that has a working GIL has a working CAS.
Python could use the CAS instruction for lock-free management of
reference counts. All built-in types (int, float, str, list, tuple,
dict, set, etc.) could be made lock-free using CAS. That would allow the
interpreter to execute bytecode without holding the GIL. Only calls to C
extensions would be serialized with the GIL.
Schematically, this is how refcounts could be managed without needing a
lock:
/*
 * Returns 1 if an exchange was made, and (*addr == old) before
 * the exchange, otherwise returns 0.
 *
 * Uses opcode CMPXCHG on x86.
 */
extern int Py_refcnt_CAS(Py_ssize_t *addr, Py_ssize_t old, Py_ssize_t new);

inline void Py_DECREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    /* Atomically decrement the refcount, retrying if another thread
       changed it between the read and the compare-and-exchange. */
    do {
        refcnt = obj->ob_refcnt;
    } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt - 1));
    refcnt = obj->ob_refcnt;
    if (refcnt == 0) {
        /* Only the thread that wins this CAS (0 -> -1) deallocates. */
        if (Py_refcnt_CAS(&obj->ob_refcnt, 0, -1)) {
            /* deallocate the object */
        }
    }
}

inline void Py_INCREF(PyObject *obj)
{
    register Py_ssize_t refcnt;
    refcnt = obj->ob_refcnt;
    /* A negative refcount marks an object being deallocated;
       don't resurrect it. */
    if (refcnt >= 0) {
        do {
            if ((refcnt = obj->ob_refcnt) < 0)
                break;
        } while (!Py_refcnt_CAS(&obj->ob_refcnt, refcnt, refcnt + 1));
    }
}
Regards,
Sturla Molden
Hey All,
We were exploring new features of Python 3.0 at our user's group here in
Tucson (TuPLE: Tucson Python Language Enthusiasts), in particular,
the OrderedDict. See
http://groups.google.com/group/TuPLEgroup/browse_thread/thread/40af73f8e194…
Has there been any discussion about making a "better" OrderedDict literal? I
did
some googling and didn't find anything.
Basically, the thought was there might be a place for a slightly better
literal for OrderedDict
in Python 3.0
od = OrderedDict([('a', 1), ('b', 2)])  # seems clumsy
The two ideas floated were:
od = ['a':1, 'b':2, 'c':3] # [ ] imply order, the ':' implies key-value
or
od = o{'a':1, 'b':2, 'c':3} # { } imply dict, o implies order
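For comparison, here is what those mappings have to look like with
today's syntax (nothing new here, just the current spellings):

from collections import OrderedDict

# The list-of-pairs form preserves order but is verbose:
od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# Keyword arguments read more nicely but do NOT preserve order in the
# Python versions under discussion, because they arrive in a plain dict:
# OrderedDict(a=1, b=2, c=3)   # ordering not guaranteed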
Apologies if this is the wrong place for this discussion. There has been
a lot of opinions flying here at work and at TuPLE which I will be happy
to share if this is the right place. ;)
Gooday,
Richie
A Python interpreter has one internal encoding each for floats, ints,
and strings. sys.float_info and sys.int_info give details about the
first two, although they are mostly invisible to user code. (I presume
they are attached to sys rather than to float and int precisely because
of this.) A couple of recent posts have discussed making the unicode
encoding (UCS2 vs. UCS4) both less visible and more discoverable to
extensions.
Bytes are nearly always an encoding of *something*, but the particular
encoding used is instance-specific. As Guido has said, the programmer
must keep track. But how? In an OO language, one obvious way is as an
attribute of the instance. That would be carried with the instance and
make it self-identifying.
What I do not know is whether it is feasible to give an immutable instance
of a builtin class a mutable attribute slot. If it were, I think this could
make 3.x bytes easier and more transparent to use. When a string is
encoded to bytes, the attribute would be set. If it were then pickled,
the attribute would be stored with it and restored with it, and less
easily lost. If it were then decoded, the attribute would be used. If it
were sent to the net, the attribute would be used to set the appropriate
headers. The reverse process would apply from net to bytes to (unicode)
text.
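To make the idea concrete, here is a minimal Python 3 sketch using a
subclass (the proposal itself would of course put the slot on the builtin
bytes type; TaggedBytes and its helpers are purely illustrative):

class TaggedBytes(bytes):
    '''Bytes that remember which encoding produced them.'''
    def __new__(cls, data, encoding=None):
        self = bytes.__new__(cls, data)
        self.encoding = encoding   # works here because subclasses get a __dict__
        return self

def encode_tagged(text, encoding='utf-8'):
    return TaggedBytes(text.encode(encoding), encoding)

def decode_tagged(data):
    # The caller no longer has to track the encoding separately.
    return data.decode(data.encoding)

raw = encode_tagged('café', 'latin-1')
print(raw.encoding)         # latin-1
print(decode_tagged(raw))   # café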
Bytes representing other types of data, such as media, could also be
tagged, not just those representing text.
This would be a proposal for 3.3 at the earliest. It would involve
revising stdlib modules, as appropriate, to use the new info.
Terry Jan Reedy
FYI to everyone on this list.
---------- Forwarded message ----------
From: Guido van Rossum <guido(a)python.org>
Date: Mon, Nov 9, 2009 at 09:56
Subject: Re: [Python-Dev] PEP 3003 - Python Language Moratorium
To: Brett Cannon <brett(a)python.org>
Cc: python-dev(a)python.org
Thanks Brett. I've moved the moratorium PEP to Status: Accepted. I've
added the words about inclusion of 3.2 and exclusion of 3.3 (which
were eaten by an svn conflict when I previously tried to add them) and
added a section to the end stating that an extension will require
another PEP.
--Guido
[snip - non-critical email Guido was replying to]
(Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving
away a lot of major problems with the concept, not least of which is the
sheer amount of work involved. I just wanted to get the idea published
somewhere while I was thinking about it)
I'm in the process of implementing a runpy.run_path function for 2.7/3.2
to allow Python code to use the zipfile and directory execution feature
provided by the CPython command line in 2.6/3.1. It turns out the global
state used for the import system is causing some major pain in the
implementation. It's solvable, but it will probably involve a couple of
rather ugly hacks and the result sure as hell isn't going to be thread-safe.
Anyway, the gist of the idea in the subject line is to take all of the
PEP 302 data stores and make them attributes of an ImportEngine class.
This would affect at least:
sys.modules
sys.path
sys.path_hooks
sys.path_importer_cache
sys.meta_path
sys.dont_write_bytecode
The underlying import machinery would migrate to instance methods of the
new class.
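Purely to make the shape of the idea concrete, here is a rough sketch of
what such a state container might look like (all names are illustrative;
no such class exists today, and the actual machinery methods are elided):

import sys

class ImportEngine(object):
    '''Hypothetical holder for the PEP 302 state that currently lives
    as module-level attributes of sys (sketch only).'''

    def __init__(self, modules=None, path=None, path_hooks=None,
                 path_importer_cache=None, meta_path=None,
                 dont_write_bytecode=False):
        self.modules = {} if modules is None else modules
        self.path = [] if path is None else path
        self.path_hooks = [] if path_hooks is None else path_hooks
        self.path_importer_cache = ({} if path_importer_cache is None
                                    else path_importer_cache)
        self.meta_path = [] if meta_path is None else meta_path
        self.dont_write_bytecode = dont_write_bytecode

    @classmethod
    def from_sys(cls):
        '''Wrap the current interpreter-wide state (shared, not copied).'''
        return cls(sys.modules, sys.path, sys.path_hooks,
                   sys.path_importer_cache, sys.meta_path,
                   sys.dont_write_bytecode)

    def copy(self):
        '''Independent copy, so e.g. runpy could import things without
        disturbing the rest of the interpreter.'''
        return ImportEngine(dict(self.modules), list(self.path),
                            list(self.path_hooks),
                            dict(self.path_importer_cache),
                            list(self.meta_path),
                            self.dont_write_bytecode)

    # The actual import machinery (path hook handling, the logic behind
    # __import__, etc.) would become instance methods here.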
The standard import engine instance would be stored in a new sys module
attribute (e.g. 'sys.import_engine'). For backwards compatibility, the
existing sys attributes would remain as references to the relevant
instance attributes of the standard engine.
Modules would get a new special attribute (e.g. '__import_engine__')
identifying the import engine that was used to import that module.
__import__ would be modified to take the new special attribute into account.
The main immediate benefit from my point of view would be to allow runpy
to create *copies* of the standard import engine so that
runpy.run_module and runpy.run_path could go do their thing without
impacting the rest of the interpreter. At the moment that really isn't
feasible, hence the lack of thread safety in that module.
I suspect such a change would greatly simplify experimentation with
Python security models as well: restricted code could be given a
restricted import engine rather than the restrictions having to be baked
in to the standard import engine.
From an OO design point of view, it's a classic migration of global
state and multiple functions to manipulate that state into a single
global instance of a new class.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
Are Guido's articles on Python history at python-history.blogspot.com over?
I guess there are lots more anecdotes and stories still, but the blog
hasn't been updated for a long time, so I was wondering whether I should
expect anything more down the pipe?
Cheers,
Daniel
--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
Scope
-----
This idea affects the CPython ABI for extension modules. It has no impact
on the Python language syntax or on other Python implementations.
The Problem
-----------
Currently, Python can be built with an internal Unicode representation of
UCS2 or UCS4. The two are binary incompatible, but the distinction is not
included as part of the platform name. Consequently, if one installs a
binary egg (e.g., with easy_install), there's a good chance one will get an
error such as the following when trying to use it:
undefined symbol: PyUnicodeUCS2_FromString
In Python 2, some extension modules can blissfully link to either ABI, as
the problem only arises for modules that call a PyUnicode_* macro (which
expands to calling either a PyUnicodeUCS2_* or PyUnicodeUCS4_* function).
For Python 3, every extension type will need to call a PyUnicode_* macro,
since __repr__ must return a Unicode object.
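As an aside, which flavor a given interpreter was built with is at least
detectable from Python itself; a quick check (not part of the proposal):

import sys

# Narrow (UCS2) builds cap sys.maxunicode at 0xFFFF; wide (UCS4) builds
# report 0x10FFFF.
if sys.maxunicode == 0xFFFF:
    print('narrow (UCS2) build')
else:
    print('wide (UCS4) build')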
This problem has been known since at least 2006, as seen in this thread from
the distutils-sig:
http://markmail.org/message/bla5vrwlv3kn3n7e?q=thread:bla5vrwlv3kn3n7e
In that thread, it was suggested that the Unicode representation become part
of the platform name. That change would require a distutils and/or
setuptools change, which has not happened and does not appear likely to
happen in the near future. It would also mean that anyone who wants to
provide binary eggs for common platforms will need to provide twice as many
eggs.
Solution
--------
Get rid of the ABI difference for the 99% of extension modules that don't
care about the internal representation of Unicode strings. From the
extension module's point of view, PyObject is opaque. It will manipulate
the Unicode string entirely through PyUnicode_* function calls and does not
care about the internal representation.
For example, PyUnicode_FromString has the following signature in the
documentation:
PyObject *PyUnicode_FromString(const char *u)
Currently, it's #ifdef'ed to either PyUnicodeUCS2_FromString or
PyUnicodeUCS4_FromString.
Remove the macro and name the function PyUnicode_FromString regardless of
which internal representation is being used. The vast majority of binary
eggs will then work correctly on both UCS2 and UCS4 Pythons.
Functions that explicitly use Py_UNICODE or PyUnicodeObject as part of their
signature will continue to be #ifdef'ed, so extension modules that *do* care
about the internal representation will still generate a link error.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
Last night I got a couple of PyCon talks rejected, and someone else sent me
a rejection email they'd received. I wasn't surprised at the rejections,
but I was quite surprised that many of the review comments were at least in
part based on the presenter (sometimes the incorrectly assumed presenter)
instead of on the proposed talk.
Out of the 14 reviews, 5 of them have comments about the author of the
proposal. I'll just give 2 examples here, taken from 2 different reviewers
on 2 different proposals.
1. Imagine you're a relatively unknown Python programmer, you submit a talk
to PyCon, and get back a review whose first sentence reads:
I don't know the reputation of this particular speaker, so I won't "+1"
That sends a pretty unfortunate message: because you're not a recognized
Python person, I won't give you the thumbs up. Maybe I'm being naive or
simplistic, but I'd have hoped one route to becoming recognized would be by
giving a PyCon talk. From the POV of the review recipient, the leading
justification for a non-recommendation has nothing to do with the talk!
2. What if you gave a PyCon talk in an earlier year that wasn't rated as
highly as other talks? You send in a PyCon proposal, and get this back:
I like XXX but honestly his talk at pycon 09 went poorly.
That was the *entire* review in this case. What's the message here? Sounds
like: well, you gave a talk once and it wasn't so great, so part of my vote
against your proposal is because of that. It's like telling people to go
away and not bother ever submitting again. Again, that's not based on
the current talk proposal. People tend to get better at giving talks. If a
proposal's content is technically good enough to get in, let them give
another talk and help them to make it better. Just when is XXX supposed to
re-send another PyCon talk, if ever? What makes this even worse is that
XXX was not even the primary author of the proposal, and was not to be the
speaker. So here we have a review that's negative *entirely* due to a talk
that someone else gave in a previous year. How discouraging. Should the
person in the future "take one for the team" and decline to be listed as a
co-author on joint proposals - even though they're not going to speak -
fearing that a reviewer will reply with a -1 and a one-line dismissal?
That's the unfortunate dynamic that the above "review" has created.
I hope this doesn't sound like personal sour grapes. It's not at all. I've
had *tons* of rejection letters in my life (see http://bit.ly/1xytIr),
including from PyCon. They're water off a duck's back at this point :-) I
do however care about Python and the Python community.
The most important point is the message that's sent back to aspiring
speakers. Reviews that are based on the supposed character, or old talks,
or how recognized you are or aren't, or on a guess as to which of multiple
authors might be doing the presenting - all of those send a bad message.
They make PyCon look insular and cliquey. If the committee of people is (or
merely gives the impression of being) inwardly focused, the community and
in the longer term perhaps the language itself will suffer through reduced
diversity and through discouraging precisely the people who are animated
enough and have the initiative and ambition to submit talks. Those are
*exactly* the wrong folks to discourage.
The obvious suggestion is to anonymize the review process. That's standard
in mature conferences. It doesn't eliminate bias (in fact you *don't want*
to eliminate bias - you need it to survive, you need it to assess quality),
but it does reduce the opportunity for judgment based on the wrong things.
When I say "wrong" I mean: if you're going to judge based on stuff that's
not just the proposal content, then ask for a CV, or a speaking record, or
whatever you intend to consider in the review process.
Anonymizing conference reviewing has healthy effects. I've seen it up close
in academic circles. It's like a breath of fresh air and the results are
surprising. When they did it in the genetic algorithms world, all of a
sudden really interesting talks were being accepted from all over the world
and many very experienced researchers were having multiple talks rejected.
That was unexpected, refreshing, and generally agreed to be a very healthy
and embracing/welcoming move.
If PyCon doesn't move to anonymizing reviews, then at least *try* not to
base acceptance decisions on who a speaker is (or, worse, who it's presumed
to be). If for some reason you have to, it's *perhaps* better not to tell
the poor submitter that they're being rejected in part based on who they
are or aren't. The CFP requests a talk proposal, and that's what should be
reacted to in the review response, even if there's more to the story. Some
of the comments above *might* be appropriate for a conference committee
meeting, but not for the first (or only!) line of a review.
OK, rant over :-) Regards to everyone & thanks for all the PyCon work. I
know how much work it is, and that it's not easy. I hope to be able to make
it to Atlanta.
Terry Jones