I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
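The idea can be sketched in Python. This is only a toy model of what would really be a C change inside Py_INCREF/Py_DECREF; the names and the sentinel value are invented for illustration:

```python
# Toy model of "permanent" objects: refcount operations on them become
# no-ops, so no locking -- and no write to the object's memory -- is
# ever needed.  The real change would live in C, not Python.

IMMORTAL = -1  # sentinel refcount marking a permanent object

class Obj:
    def __init__(self, immortal=False):
        self.refcount = IMMORTAL if immortal else 1

def incref(o):
    if o.refcount != IMMORTAL:   # permanent objects are never touched
        o.refcount += 1

def decref(o):
    if o.refcount == IMMORTAL:
        return False             # never becomes garbage
    o.refcount -= 1
    return o.refcount == 0       # True => object may be freed

cls = Obj(immortal=True)   # e.g. a class that lives for the whole run
tmp = Obj()                # an ordinary temporary

incref(cls)
assert cls.refcount == IMMORTAL   # read-only: memory never written
incref(tmp)
assert decref(tmp) is False
assert decref(tmp) is True        # now garbage
```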
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts need to be checked. When a decrement brings
the local refcount to zero, the global refcount can be
checked without locking, since a zero will never be
written to it until the object truly has no non-local
references remaining.
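The scheme above can be modelled in Python. Again this is only a sketch of what would be C code in the object header; the lock stands in for whatever per-object locking the global count would need, and all names are invented:

```python
import threading

# Toy model of the two-counter scheme: the owning thread adjusts a
# local count without locking; all other threads take a lock and
# adjust the global count.

class Obj:
    def __init__(self):
        self.owner = threading.get_ident()  # owning thread id
        self.local_rc = 1                   # local reference count
        self.global_rc = 0                  # global reference count
        self.lock = threading.Lock()

def incref(o):
    if threading.get_ident() == o.owner:
        o.local_rc += 1                     # fast path: no locking
    else:
        with o.lock:
            o.global_rc += 1

def decref(o):
    """Return True when the object has become garbage."""
    if threading.get_ident() == o.owner:
        o.local_rc -= 1
        # If the local count hit zero, the global count may be read
        # without locking: it is only written under the lock, and a
        # zero there means no non-local references remain.
        return o.local_rc == 0 and o.global_rc == 0
    with o.lock:
        o.global_rc -= 1
        return o.global_rc == 0 and o.local_rc == 0

o = Obj()
incref(o)                  # owning thread: lock-free
assert decref(o) is False
assert decref(o) is True   # both counts zero -> garbage
```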
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Hi, I have attached a patch at:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1739789&group_…
A common optimization tip for Python code is to use locals rather than
globals. This converts dictionary lookups of (interned) strings to
tuple indexing. I have created a patch that achieves this speed
benefit "automatically" for all globals and builtins, by adding a
feature to dictobjects.
Additionally, this patch lays down the infrastructure
needed to accelerate virtually all attribute accesses
in the same way (with some extra work, of course).
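For readers unfamiliar with the optimization tip being automated, here is the manual version of the trick (function and variable names are mine, for illustration): inside a function, locals are fetched by array index (LOAD_FAST), while globals and builtins go through dictionary lookups (LOAD_GLOBAL).

```python
# The classic hand optimization the patch automates: bind frequently
# used globals/builtins to local names once, outside the hot loop.

def use_global(n):
    total = 0
    for i in range(n):
        total += len(str(i))     # len and str looked up every iteration
    return total

def use_local(n, len=len, str=str):   # bound to locals once, at def time
    total = 0
    for i in range(n):
        total += len(str(i))
    return total

assert use_global(1000) == use_local(1000)
```

The patch aims to give the `use_local` speed to code written like `use_global`.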
I have already suggested this before but I got the impression that the
spirit of the replies was "talk is cheap, show us the
code/benchmarks". So I wrote some code.
Getting the changes to work was not easy, and required learning about
the nuances of dictobjects, their re-entrancy issues, etc. These
changes do slow down dictobjects, but it seems that this slowdown is
more than offset by the speed increase of builtins/globals access.
Benchmarks:
A set of benchmarks that repeatedly perform:
A. Global reads
B. Global writes
C. Builtin reads
with little overheads (just repeatedly issuing global/builtin access
bytecodes, many times per loop iteration to minimize the loop
overhead), yield 30% time decrease (~42% speed increase).
Regression tests take ~62 seconds (of user+sys time) with Python2.6 trunk
Regression tests take ~65 seconds (of user+sys time) with the patch
Regression tests are about ~4.5% slower.
(Though the regression tests are probably not a good benchmark: they
spread their running time over a lot more code than typical programs
do, spending more time instantiating function objects and less time
executing them.)
pystone seems to be improved by about 5%.
My conclusions:
The LOAD_GLOBAL/STORE_GLOBAL opcodes are considerably faster. Dict
accesses, or perhaps the general extra activity around them, seem
insignificantly slower, or at least cancel out against the speed
benefits in the regression tests.
The next step I am going to try is to replace the PyObject_GetAttr
call with code that:
* Calls PyObject_GetAttr only if GenericGetAttr is not the object's
handler, so as to allow overridden behaviour to keep working.
* Otherwise, remember for each attribute-accessing opcode, the last
type from which the attribute was accessed. A single pointer
comparison can check whether the attribute access is using the same
type. If it is, it can use a stored exported key from the type
dictionary [or from an mro cache dictionary for that type, if that is
added], rather than a dict lookup. If this yields the same speed
benefit, it could make attribute access opcodes up to 42% faster as
well, when used on the same types (which is probably the common case,
particularly in inner loops).
Combined with __slots__, this would make it possible to eliminate
all dict lookups for most instance attribute accesses as well.
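The per-opcode cache described above can be sketched in Python. This is a simplified model (it only caches type-side attributes and ignores instance dicts and descriptors); the class and method names are invented:

```python
# Toy model of a per-call-site attribute cache: each site remembers the
# last type it saw, and a single identity comparison decides whether
# the cached lookup result can be reused.

class CachedAttrSite:
    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_value = None
        self.hits = self.misses = 0

    def load(self, obj):
        tp = type(obj)
        if tp is self.cached_type:      # one pointer comparison
            self.hits += 1
            return self.cached_value
        self.misses += 1                # slow path: real dict lookup
        self.cached_type = tp
        self.cached_value = getattr(tp, self.name)
        return self.cached_value

class Point:
    def norm1(self):
        return abs(self.x) + abs(self.y)

site = CachedAttrSite('norm1')
p = Point()
p.x, p.y = 3, -4
for _ in range(5):
    method = site.load(p)        # looked up once, then cached

assert site.misses == 1 and site.hits == 4
assert method(p) == 7
```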
P.S.: I discovered a lot of code duplication (and "went along" and
duplicated my code in the same spirit), but I was wondering whether a
patch would be accepted that uses C's preprocessor heavily to prevent
code duplication in CPython's code, trusting the "inline" keyword to
avoid thousands of lines in the same function (ceval.c's opcode
switch).
I somewhat tongue-in-cheekly propose making the seven
most common English words all integral parts of
the Python language (three already are):
source: http://www.world-english.org/english500.htm
1 the:
Singletons:
the class Logger:
# ...
2 of
inheritance:
class Square of Shape:
# pass
3 to
printing:
print('hello world') to sys.stdout
4 and
already a keyword
5 a
introspection:
if object is a dict:
# ...
6 in
already a keyword
7 is
already a keyword
Then it gets tougher:
8 it
9 you
10 that
Top 500 words that are already
keywords/builtins/conventions in Python:
27 or
49 each
55 if
189 try
198 self
251 open
254 next
Top 500 words that are already keywords in some
languages:
25 this
52 do
68 long
Top 500 words that should NEVER be keywords:
78 could
81 did
180 men
252 seem
435 oh
Words that seem like they'd be part of a programming
language, but maybe a bad idea:
74 has
82 my
120 every
148 too
I believe that it is possible to add a useful feature to dicts, that
will enable some interesting performance features in the future.
Dict lookups are in general supposed to be O(1) and they are indeed very fast.
However, there are several drawbacks:
A. O(1) is not always achieved, collisions may still occur
B. Some lookups use static "literal" keys, and could benefit from
accessing a dict item directly (very fast O(1) _worst_ case, without
even a function call when done from the C side).
C. B is especially true when dicts are used to hold attributes - in
that case literal attribute names, in conjunction with psyco-like type
specialization, can help avoid many dict lookups. I won't delve into
this in this mail - but this is actually the use-case that I'm after
optimizing.
There is a possible implementation that can allow a dict to export
items with minimal impact on its other performance.
Create a new type of PyObject, a PyDictItemObject that will contain a
key, value pair. (This will NOT exist for each hash table entry, but
only for specifically exported items).
Add a bitmap to dicts that has a single bit for every hash table
entry. If the entry is marked in the bitmap, then its PyObject*
"value" is not a reference to the value object, but to a
PyDictItemObject.
A new dict method, PyDict_ExportItem, taking a single argument (the
key), will create a PyDictItemObject, assign the dict's key to it,
and mark that hash-table entry in the bitmap. If PyDict_ExportItem is
called when the item is already exported, it will return another
reference to the same PyDictItemObject. The value in PyDictItemObject
will initially be set to NULL (which means "key not mapped"). Both the
PyDictItemObject and PyDict_ExportItem should probably not be exposed
to the Python side, but PyDictItemObject should probably be a PyObject
for refcounting purposes.
All the dict methods that get/set values, once they have found the
correct entry, check the bitmap to see if the entry is marked, and if
it is, access the value through the PyDictItemObject instead of
directly. In addition, a NULL value there represents the key not
actually being in the dict (__getitem__ can raise a KeyError, for
example, and __setitem__ can simply Py_XDECREF the old value and
overwrite it with the new one).
Alternatively to the bitmap, the hash table entry can contain another
boolean int -- I am not sure which is preferable in terms of
cache-locality, but the bitmap is definitely cheaper, space-wise.
This would allow dict users to create an exported item for a key once,
and then access that key in real O(1) without function calls. As
mentioned before, this can also serve in the future, as the basis for
avoiding dict lookups for attribute searches.
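The proposal can be sketched in Python. This toy model differs from the real design (it keeps exported cells in a side dict rather than in the hash-table entries, and uses None where C would use NULL, so None cannot be stored as a value); all names are invented:

```python
# Toy model of "exported items": a small cell object sits between a key
# and its value, so holders of the cell can read and write the mapping
# in O(1) worst case, with no hashing or probing.

class ExportedItem:
    __slots__ = ('key', 'value')
    def __init__(self, key):
        self.key = key
        self.value = None          # plays the role of NULL: key absent

class ExportingDict(dict):
    def __init__(self):
        super().__init__()
        self._exported = {}        # key -> ExportedItem

    def export_item(self, key):    # model of PyDict_ExportItem
        item = self._exported.get(key)
        if item is None:
            item = self._exported[key] = ExportedItem(key)
            if dict.__contains__(self, key):
                item.value = dict.__getitem__(self, key)
        return item                # repeated calls return the same cell

    def __setitem__(self, key, value):
        item = self._exported.get(key)
        if item is not None:
            item.value = value     # write through the exported cell
        dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        item = self._exported.get(key)
        if item is not None:
            if item.value is None:
                raise KeyError(key)
            return item.value
        return dict.__getitem__(self, key)

d = ExportingDict()
cell = d.export_item('x')      # export before the key even exists
d['x'] = 42
assert cell.value == 42        # direct access, no lookup machinery
assert d.export_item('x') is cell
```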
Hi. I was wondering if there had ever been an official decision on
the idea of adding labeled break and continue functionality to Python.
I've found a few places where the idea has come up, in the context of
named code blocks:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/a69662…
and in the context of discussing do/while loops and assignments in
conditionals:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/6da848…
Both of those discussions just kind of petered out or changed
direction without any conclusion.
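For context, this is the kind of boilerplate a labeled break would remove. In current Python the usual workarounds are an exception or a flag; a sketch of the exception version (names invented):

```python
# Emulating "break outer" in current Python with a private exception.

class BreakOuter(Exception):
    pass

def find_pair(grid, target):
    """Return the (row, col) of target in a nested list, or None."""
    try:
        for i, row in enumerate(grid):
            for j, value in enumerate(row):
                if value == target:
                    found = (i, j)
                    raise BreakOuter    # stands in for `break outer`
    except BreakOuter:
        return found
    return None

assert find_pair([[1, 2], [3, 4]], 3) == (1, 0)
assert find_pair([[1, 2], [3, 4]], 9) is None
```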
There's also this Python 2.6 patch, which has a similar syntax
(although different semantics) to one of the syntaxes proposed in the
first discussion above:
http://sourceforge.net/tracker/index.php?func=detail&aid=1714448&group_id=5…
I would be willing to help make a case and then write a PEP for
labeled break and continue, as long as the community or the BDFL
hasn't already decided against it.
-matt
P.S. My apologies about cross posting; python-ideas seems like a
better place to post this, but PEP 1 says to post to python-list.
This is a linguistic reflection inspired by PEP 3131.
English is a language that has undergone a major
transformation in the last 200 to 300 years. It used
to be spoken mostly on one particular island across
the channel from France. Now it's spoken worldwide.
Two of the larger populations of English speakers,
residents of the UK and residents of the US, live an
ocean away from each other.
Unlike Python, English never had a PEP process. It
naturally evolved. But like Python, English has been
promoted, for various reasons, as a worldwide
language, mostly successfully. English is also like
Python in the sense that it had a mostly fresh start
during certain colonizations, but it still had
backward compatibility issues.
Some observations:
1) US and UK residents can mostly converse with
each other.
2) American English has diverged from British
English in vocabulary, although many of the differing
words are esoteric, or are inherently culturally
incompatible, or have synonyms recognized on both
sides of the ocean, or are idiomatic expressions.
3) American English differs from British grammar
only in pretty non-fundamental areas. American
English, despite 200 years of evolution away from its
parent, preserves subject-verb-object ordering.
Adjectives almost always precede nouns. Differences
come down to subtle things like how you deal with
collective nouns, etc.
4) Some words are spelled differently in
American English and British English, but the
spellings are generally mutually understood by all
speakers. (Even on the same side of the ocean,
spelling can be ambiguous in English, so variant
spellings often arise [more often than in, say,
Spanish].)
5) American English and British English still have
the exact same alphabet. A to Z.
Are there analogies here to be drawn to Python?
Thoughts?
On AmE and BrE:
http://en.wikipedia.org/wiki/American_and_British_English_differences
"America and England are two nations divided by a
common language."
Hello all,
I was looking into function annotations, which are to be introduced in
Python 3000, and I found this old message sent to this mailing list:
http://mail.python.org/pipermail/python-ideas/2007-January/000037.html
I would like to restart the discussion of attribute annotation, because
in my opinion it can be a very powerful feature. I personally think of
using it for SOAP message serialization, or any kind of XML
serialization; the idea would be to annotate various object attributes
to be marshaled as either XML elements or XML attributes of the node
that reflects the object.
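Since no attribute-annotation syntax exists yet, the use case can only be sketched with a stand-in. Here a plain class-level dict plays the role of the hypothetical annotations; everything else (class, function, and hint names) is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: each attribute is "annotated" as serializing to
# an XML element or an XML attribute.  The xml_hints dict stands in
# for real per-attribute annotations.

class Person:
    xml_hints = {'name': 'element', 'email': 'element', 'id': 'attribute'}

    def __init__(self, id, name, email):
        self.id, self.name, self.email = id, name, email

def to_xml(obj):
    node = ET.Element(type(obj).__name__)
    for attr, hint in type(obj).xml_hints.items():
        value = str(getattr(obj, attr))
        if hint == 'attribute':
            node.set(attr, value)            # serialize as XML attribute
        else:
            ET.SubElement(node, attr).text = value  # as child element
    return node

p = Person('42', 'Ali', 'ali@example.com')
xml = ET.tostring(to_xml(p), encoding='unicode')
# e.g. <Person id="42"><name>Ali</name><email>ali@example.com</email></Person>
```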
Thank you,
--
Ali
Hi
List comprehensions (and generator expressions) come in two
'flavours' at the moment:
(1) [f(x) for x in L], which stands for map(f, L). Let's call this a
'map comprehension'
(2) [f(x) for x in L if p(x)], which stands for map(f, filter(p, L)).
Let's call this a 'map-filter comprehension'.
Now if one wants to write simply filter(p, L) as a list
comprehension, one has to write:
(3) [x for x in L if p(x)]. This could be called a 'filter
comprehension'.
The 'x for x in L' is not very nice IMHO, but it is often handy to
use such expressions instead of 'filter(...)', e.g. building the
sublist of a given list consisting of all the items of a given type
could be written as:
filter(lambda x: isinstance(x, FilteringType), heterogeneous_list)
or:
[x for x in heterogeneous_list if isinstance(x, FilteringType)]
I still prefer the list comprehension over the lambda/filter
combination, but neither feels very satisfying (to me :) (not that
one cannot use partial in the filter version)
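For concreteness, here are the two current spellings applied to a small example list (the data is mine, for illustration), showing they are equivalent:

```python
# A pure filter, written both ways: filter() with a lambda versus a
# 'filter comprehension' with the repetitive 'x for x in' prefix.

heterogeneous_list = [1, 'a', 2.5, 'b', 3, [4]]

via_filter = list(filter(lambda x: isinstance(x, str), heterogeneous_list))
via_comprehension = [x for x in heterogeneous_list if isinstance(x, str)]

assert via_filter == via_comprehension == ['a', 'b']
```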
Why not just drop the 'x for' at the start of a 'filter
comprehension' (or generator expression)? Thus (3) could be written
more simply as:
(3') [x in L if p(x)]
This is consistent with common mathematical notation:
* { f(x) | x \in L } means the set of all f(x) for x in L
* { f(x) | x \in L, p(x) } means the set of all f(x) for x in L
satisfying predicate p.
* { x \in L | p(x) } means the set of all x in L satisfying predicate p.
--
Arnaud