webmaster has already heard from 4 people who cannot install it.
I sent them to the bug tracker or to python-list but they seem
not to have gone either place. Is there some guide I should be
sending them to, 'how to debug installation problems'?
Laura
Hi,
I updated my PEP 509 to make the dictionary version globally unique.
With *two* use cases of this PEP (Yury's method call patch and my FAT
Python project), I think that the PEP is now ready to be accepted.
A globally unique identifier is a requirement for Yury's patch
optimizing method calls ( https://bugs.python.org/issue26110 ). It
makes it possible to check for free whether the dictionary was replaced.
I also renamed the ma_version field to ma_version_tag.
HTML version:
https://www.python.org/dev/peps/pep-0509/
Victor
PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6
Abstract
========
Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.
Rationale
=========
In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.
Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect the Python
semantics requires detecting when "something changes": we will call
these checks "guards".
The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement fast guards
on namespaces.
Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. Since the version is globally
unique, the version is also enough to check that the namespace dictionary
was not replaced with a new dictionary. The performance of a guard does
not depend on the number of watched dictionary entries: when the
dictionary version does not change, the check has O(1) complexity.
Example of optimization: copy the value of a global variable to function
constants. This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime when the function is called, instead of using the
constant.
See the `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ for the concrete usage of
guards to specialize functions and for the rationale on Python static
optimizers.
Guard example
=============
Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::
    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
Usage of the dict version
=========================
Speedup method calls 1.2x
-------------------------
Yury Selivanov wrote a `patch to optimize method calls
<https://bugs.python.org/issue26110>`_. The patch depends on the
`implement per-opcode cache in ceval
<https://bugs.python.org/issue26219>`_ patch which requires dictionary
versions to invalidate the cache if the globals dictionary or the
builtins dictionary has been modified.
The cache also requires that the dictionary version is globally unique.
It is possible to define a function in one namespace and call it
in a different namespace, for example using ``exec()`` with the
*globals* parameter. In this case, the globals dictionary has been
replaced and the cache must be invalidated.
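The namespace-swapping case is easy to reproduce with plain ``exec()``:
the same code object runs against two different globals dictionaries, so
a cache keyed only on in-place changes to one dict would be wrong.

```python
# One code object, two distinct globals dicts: a cache entry recorded
# while running in ns1 must not be reused when running in ns2.
code = compile("result = x + 1", "<demo>", "exec")

ns1 = {"x": 1}
ns2 = {"x": 41}
exec(code, ns1)
exec(code, ns2)
```

A globally unique version lets the cache detect that the dict object
itself changed, not just that an entry was modified.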
Specialized functions using guards
----------------------------------
The `PEP 510 -- Specialized functions with guards
<https://www.python.org/dev/peps/pep-0510/>`_ proposes an API to support
specialized functions with guards. It makes it possible to implement
static optimizers for Python without breaking the Python semantics.
Example of a static Python optimizer: the `fatoptimizer
<http://fatoptimizer.readthedocs.org/>`_ of the `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project
implements many optimizations which require guards on namespaces.
Examples:
* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
guards on ``builtins.__dict__['range']`` and ``globals()['range']``
are required
Pyjion
------
According to Brett Cannon, one of the two main developers of Pyjion,
Pyjion can also benefit from dictionary version to implement
optimizations.
Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET
Core runtime).
Unladen Swallow
---------------
Even though a dictionary version was not explicitly mentioned, optimizing
globals and builtins lookups was part of the Unladen Swallow plan:
"Implement one of the several proposed schemes for speeding lookups of
globals and builtins." Source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler
implemented with LLVM. The project stopped in 2011: `Unladen Swallow
Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
Changes
=======
Add a ``ma_version_tag`` field to the ``PyDictObject`` structure with
the C type ``PY_UINT64_T``, a 64-bit unsigned integer. Also add a global
dictionary version. Each time a dictionary is created, the global
version is incremented and the dictionary version is initialized to the
global version. The global version is also incremented and copied to the
dictionary version at each dictionary change:
* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the `key` does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the `key` doesn't exist or if the value
is not ``dict[key]``
* ``update(...)`` if new values are different than existing values:
values are compared by identity, not by their content; the version can
be incremented multiple times
The ``PyDictObject`` structure is not part of the stable ABI.
The field is called ``ma_version_tag`` rather than ``ma_version`` to
suggest comparing it using ``version_tag == old_version_tag`` rather
than ``version <= old_version``, which would make an integer overflow
bug much more likely.
Example using a hypothetical ``dict_get_version(dict)`` function::
>>> d = {}
>>> dict_get_version(d)
100
>>> d['key'] = 'value'
>>> dict_get_version(d)
101
>>> d['key'] = 'new value'
>>> dict_get_version(d)
102
>>> del d['key']
>>> dict_get_version(d)
103
The version is not incremented if an existing key is set to the same
value. For efficiency, values are compared by their identity:
``new_value is old_value``, not by their content:
``new_value == old_value``. Example::
>>> d = {}
>>> value = object()
>>> d['key'] = value
>>> dict_get_version(d)
40
>>> d['key'] = value
>>> dict_get_version(d)
40
.. note::

   CPython uses some singletons, like integers in the range [-5; 257],
   the empty tuple, empty strings, Unicode strings of a single character
   in the range [U+0000; U+00FF], etc. When a key is set twice to the
   same singleton, the version is not modified.
Implementation and Performance
==============================
The `issue #26058: PEP 509: Add ma_version_tag to PyDictObject
<https://bugs.python.org/issue26058>`_ contains a patch implementing
this PEP.
On pybench and timeit microbenchmarks, the patch does not seem to add
any overhead on dictionary operations.
When the version does not change, ``PyDict_GetItem()`` takes 14.8 ns for
a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover,
a guard can watch for multiple keys. For example, for an optimization
using 10 global variables in a function, 10 dictionary lookups cost 148
ns, whereas the guard still only costs 3.8 ns when the version does not
change (39x as fast).
The `fat module
<http://fatoptimizer.readthedocs.org/en/latest/fat.html>`_ implements
such guards: ``fat.GuardDict`` is based on the dictionary version.
Integer overflow
================
The implementation uses the C type ``PY_UINT64_T`` to store the version:
a 64-bit unsigned integer. The C code uses ``version++``. On integer
overflow, the version wraps to ``0`` (and then continues to be
incremented) according to the C standard.
After an integer overflow, a guard can succeed even though the watched
dictionary key was modified. The bug only occurs at a guard check if
there are exactly ``2 ** 64`` dictionary creations or modifications
since the previous guard check.
If a dictionary is modified every nanosecond, ``2 ** 64`` modifications
take longer than 584 years. With a 32-bit version, it would only take 4
seconds. That's why a 64-bit unsigned type is also used on 32-bit
systems. A dictionary lookup at the C level takes 14.8 ns.
A risk of a bug every 584 years is acceptable.
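The arithmetic behind those figures can be checked directly:

```python
# Wrapping a 64-bit counter at one modification per nanosecond takes
# centuries; a 32-bit counter wraps in seconds.
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9

years_64 = 2**64 / NS_PER_YEAR   # roughly 584 years
seconds_32 = 2**32 / 1e9         # roughly 4.3 seconds
```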
Alternatives
============
Expose the version at Python level as a read-only __version__ property
----------------------------------------------------------------------
The first version of the PEP proposed to expose the dictionary version
as a read-only ``__version__`` property at Python level, and also to add
the property to ``collections.UserDict`` (since this type must mimic
the ``dict`` API).
There are multiple issues:
* To be consistent and avoid bad surprises, the version must be added to
all mapping types. Implementing a new mapping type would require extra
work for no benefit, since the version is only required on the
``dict`` type in practice.
* All Python implementations would have to implement this new property;
that is extra work for other implementations, even though they may not
use the dictionary version at all.
* Exposing the dictionary version at Python level can lead to false
assumptions about performance. Checking ``dict.__version__`` at the
Python level is not faster than a dictionary lookup. A dictionary
lookup costs 48.7 ns and checking a guard costs 47.5 ns; the difference
is only 1.2 ns (3%)::
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    10000000 loops, best of 3: 0.0487 usec per loop
    $ ./python -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    10000000 loops, best of 3: 0.0475 usec per loop
* The ``__version__`` can wrap on integer overflow. That is error
prone: using ``dict.__version__ <= guard_version`` is wrong;
``dict.__version__ == guard_version`` must be used instead to reduce
the risk of bugs on integer overflow (even if an overflow is
unlikely in practice).
Mandatory bikeshedding on the property name:
* ``__cache_token__``: name proposed by Nick Coghlan, name coming from
`abc.get_cache_token()
<https://docs.python.org/3/library/abc.html#abc.get_cache_token>`_.
* ``__version__``
* ``__timestamp__``
Add a version to each dict entry
--------------------------------
A single version per dictionary requires keeping a strong reference to
the value, which can keep the value alive longer than expected. If we
also add a version per dictionary entry, the guard can store the entry
version instead, avoiding the strong reference to the value (only strong
references to the dictionary and to the key are needed).
Changes: add a ``me_version`` field to the ``PyDictKeyEntry`` structure,
the field has the C type ``PY_UINT64_T``. When a key is created or
modified, the entry version is set to the dictionary version, which is
incremented at any change (create, modify, delete).
Pseudo-code of a fast guard to check if a dictionary key was modified,
using hypothetical ``dict_get_version(dict)`` and
``dict_get_entry_version(dict, key)`` functions::
    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict_get_version(dict)
            self.entry_version = dict_get_entry_version(dict, key)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dict structure
            dict_version = dict_get_version(self.dict)
            if dict_version == self.dict_version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            entry_version = dict_get_entry_version(self.dict, self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            # the key was modified
            return False
The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or not). For
example, it increases the size of each dictionary entry by 8 bytes on
64-bit system.
In Python, the memory footprint matters and the trend is to reduce it.
Examples:
* `PEP 393 -- Flexible String Representation
<https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary
<https://www.python.org/dev/peps/pep-0412/>`_
Add a new dict subtype
----------------------
Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.
Technical issue: a lot of C code in the wild, including the CPython
core, expects the exact ``dict`` type. Issues:
* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
uses ``globals={}``. It is not possible to cast the ``dict`` to a
``dict`` subtype because the caller expects the ``globals`` parameter
to be modified (``dict`` is mutable).
* Functions call ``PyDict_xxx()`` functions directly, instead of calling
``PyObject_xxx()`` functions when the object is a ``dict`` subtype
* The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas
some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
namespaces
The ``exec()`` issue is a blocker issue.
Other issues:
* The garbage collector has special code to "untrack" ``dict``
instances. If a ``dict`` subtype is used for namespaces, the garbage
collector may be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
for ``dict`` subtypes, and so Python would become a little bit
slower.
Prior Art
=========
Method cache and type version tag
---------------------------------
In 2007, Armin Rigo wrote a patch to implement a cache of methods. It
was merged into Python 2.6. The patch adds a "type attribute cache
version tag" (``tp_version_tag``) and a "valid version tag" flag to
types (the ``PyTypeObject`` structure).
The type version tag is not available at the Python level.
The version tag has the C type ``unsigned int``. The cache is a global
hash table of 4096 entries, shared by all types. The cache is global to
"make it fast, have a deterministic and low memory footprint, and be
easy to invalidate". Each cache entry has a version tag. A global
version tag is used to create the next version tag; it also has the C
type ``unsigned int``.
By default, a type has its "valid version tag" flag cleared to indicate
that the version tag is invalid. When the first method of the type is
cached, the version tag and the "valid version tag" flag are set. When a
type is modified, the "valid version tag" flag of the type and its
subclasses is cleared. Later, when a cache entry of these types is used,
the entry is removed because its version tag is outdated.
On integer overflow, the whole cache is cleared and the global version
tag is reset to ``0``.
See `Method cache (issue #1685986)
<https://bugs.python.org/issue1685986>`_ and `Armin's method cache
optimization updated for Python 2.6 (issue #1700288)
<https://bugs.python.org/issue1700288>`_.
Globals / builtins cache
------------------------
In 2010, Antoine Pitrou proposed a `Globals / builtins cache (issue
#10401) <http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``Py_ssize_t``.
The patch adds a "global and builtin cache" to functions and frames, and
changes ``LOAD_GLOBAL`` and ``STORE_GLOBAL`` instructions to use the
cache.
The change on the ``PyDictObject`` structure is very similar to this
PEP.
Cached globals+builtins lookup
------------------------------
In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization
<https://bugs.python.org/issue1616125>`_. The patch adds a private
``timestamp`` field to the ``PyDictObject`` structure (``dict`` type),
the field has the C type ``size_t``.
Thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
Guard against changing dict during iteration
--------------------------------------------
In 2013, Serhiy Storchaka proposed `Guard against changing dict during
iteration (issue #19332) <https://bugs.python.org/issue19332>`_ which
adds a ``ma_count`` field to the ``PyDictObject`` structure (``dict``
type), the field has the C type ``size_t``. This field is incremented
when the dictionary is modified, and so is very similar to the proposed
dictionary version.
Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation during iteration. The dictionary version
changes when values are replaced, whereas modifying dictionary values
while iterating over dictionary keys is legitimate in Python.
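The distinction is easy to demonstrate: replacing values during
iteration is allowed (and would bump the proposed version), while adding
or removing keys raises ``RuntimeError``, so the version cannot serve as
a "dict changed size" guard.

```python
# Replacing values while iterating is legal in Python.
d = {"a": 1, "b": 2}
for k in d:
    d[k] = d[k] * 10        # fine: no key is added or removed

# Adding a key during iteration is not.
failed = False
try:
    for k in d:
        d["new"] = 0        # changes the dict size during iteration
except RuntimeError:
    failed = True
```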
PySizer
-------
`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.
This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.
Discussion
==========
Thread on the mailing lists:
* python-dev: `PEP 509: Add a private version to dict
<https://mail.python.org/pipermail/python-dev/2016-January/142685.html>`_
(January 2016)
* python-ideas: `RFC: PEP: Add dict.__version__
<https://mail.python.org/pipermail/python-ideas/2016-January/037702.html>`_
(January 2016)
Copyright
=========
This document has been placed in the public domain.
Hi,
tl;dr The summary is that I have a patch that improves CPython
performance up to 5-10% on macro benchmarks. Benchmarks results on
Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
at [1]. There are no slowdowns that I could reproduce consistently.
There are two different optimizations that yield this speedup:
LOAD_METHOD/CALL_METHOD opcodes and a per-opcode cache in the ceval loop.
LOAD_METHOD & CALL_METHOD
-------------------------
We had a lot of conversations with Victor about his PEP 509, and he sent
me a link to his amazing compilation of notes about CPython performance
[2]. One optimization that he pointed out to me was the
LOAD_METHOD/CALL_METHOD opcodes, an idea that first originated in PyPy.
There is a patch that implements this optimization, it's tracked here:
[3]. There are some low level details that I explained in the issue,
but I'll go over the high level design in this email as well.
Every time you access a method attribute on an object, a BoundMethod
object is created. It is a fairly expensive operation, despite a
freelist of BoundMethods (so that memory allocation is generally
avoided). The idea is to detect what looks like a method call in the
compiler, and emit a pair of specialized bytecodes for that.
So instead of LOAD_GLOBAL/LOAD_ATTR/CALL_FUNCTION we will have
LOAD_GLOBAL/LOAD_METHOD/CALL_METHOD.
LOAD_METHOD looks at the object on top of the stack, and checks if the
name resolves to a method or to a regular attribute. If it's a method,
then we push the unbound method object and the object to the stack. If
it's an attribute, we push the resolved attribute and NULL.
When CALL_METHOD looks at the stack it knows how to call the unbound
method properly (pushing the object as a first arg), or how to call a
regular callable.
This idea alone makes CPython around 2-4% faster. And it surely doesn't
make it slower. I think it's a safe bet to at least implement this
optimization in CPython 3.6.
So far, the patch only optimizes positional-only method calls. It's
possible to optimize all kinds of calls, but that will necessitate 3 more
opcodes (explained in the issue). We'll need to do some careful
benchmarking to see if it's really needed.
Per-opcode cache in ceval
-------------------------
While reading PEP 509, I was thinking about how we can use
dict->ma_version in ceval to speed up globals lookups. One of the key
assumptions (and this is what makes JITs possible) is that real-life
programs don't modify globals and rebind builtins (often), and that most
code paths operate on objects of the same type.
In CPython, all pure Python functions have code objects. When you call
a function, ceval executes its code object in a frame. Frames contain
contextual information, including pointers to the globals and builtins
dict. The key observation here is that almost all code objects always
have the same pointers to the globals (the module they were defined in)
to the builtins. And it's not a good programming practice to mutate
globals or rebind builtins.
Let's look at this function:
def spam():
print(ham)
Here are its opcodes:
2 0 LOAD_GLOBAL 0 (print)
3 LOAD_GLOBAL 1 (ham)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
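The listing above can be reproduced with the ``dis`` module (the exact
opcodes and offsets vary across CPython versions, and 3.6+ uses 2-byte
wordcode, but the global loads are always visible):

```python
# Disassemble spam() and collect the opcode names.
import dis

ham = "eggs"

def spam():
    print(ham)

names = [ins.opname for ins in dis.Bytecode(spam)]
```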
The opcodes we want to optimize are the LOAD_GLOBALs at offsets 0 and 3.
Let's look at the first one, which loads the 'print' function from
builtins. The opcode knows the following bits of information:
- its offset (0),
- its argument (0 -> 'print'),
- its type (LOAD_GLOBAL).
And these bits of information will *never* change. So if this opcode
could resolve the 'print' name (from globals or builtins, likely the
latter) and save the pointer to it somewhere, along with
globals->ma_version and builtins->ma_version, it could, on its second
call, just load this cached info back, check that the globals and
builtins dict haven't changed and push the cached ref to the stack.
That would save it from doing two dict lookups.
We can also optimize LOAD_METHOD. There is a high chance that 'obj' in
'obj.method()' will be of the same type every time we execute the code
object. So if we had an opcode cache, LOAD_METHOD could then cache
a pointer to the resolved unbound method, a pointer to obj.__class__,
and tp_version_tag of obj.__class__. Then it would only need to check
if the cached object type is the same (and that it wasn't modified) and
that obj.__dict__ doesn't override 'method'. Long story short, this
caching really speeds up method calls on types implemented in C.
list.append becomes very fast, because list doesn't have a __dict__, so
the check is very cheap (with cache).
A straightforward implementation of such a cache is simple, but consumes
a lot of memory that would just be wasted, since we only need such a
cache for the LOAD_GLOBAL and LOAD_METHOD opcodes. So we have to be
creative about the cache design. Here's what I came up with:
1. We add a few fields to the code object.
2. ceval will count how many times each code object is executed.
3. When the code object is executed over ~900 times, we mark it as
"hot". We also create an 'unsigned char' array "MAPPING", with length
set to match the length of the code object. So we have a 1-to-1 mapping
between opcodes and the MAPPING array.
4. Next ~100 calls, while the code object is "hot", LOAD_GLOBAL and
LOAD_METHOD do "MAPPING[opcode_offset()]++".
5. After 1024 calls to the code object, ceval loop will iterate through
the MAPPING, counting all opcodes that were executed more than 50 times.
6. We then create an array of cache structs "CACHE" (here's a link to
the updated code.h file: [6]). We update MAPPING to be a mapping
between opcode position and position in the CACHE. The code object is
now "optimized".
7. When the code object is "optimized", LOAD_METHOD and LOAD_GLOBAL use
the CACHE array for fast path.
8. When there is a cache miss, i.e. the builtins/globals/obj.__dict__
were mutated, the opcode marks its entry in 'CACHE' as deoptimized, and
it will never try to use the cache again.
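A toy, pure-Python model of steps 6-8 (the hot-counting steps are
omitted, and ``VersionedDict``'s counter merely stands in for the
proposed ``ma_version_tag``; all names here are illustrative):

```python
class VersionedDict(dict):
    """dict with a toy version counter, a stand-in for ma_version_tag."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        # bump the version only on a real change, like the PEP specifies
        if key not in self or self[key] is not value:
            self.version += 1
        super().__setitem__(key, value)

class OpcodeCache:
    def __init__(self):
        self.cache = {}        # opcode offset -> (version, value)
        self.deopt = set()     # offsets deoptimized after a miss (step 8)
        self.hits = 0

    def load_global(self, offset, ns, name):
        if offset not in self.deopt:
            entry = self.cache.get(offset)
            if entry is None:
                # first "optimized" execution: fill the cache entry
                self.cache[offset] = (ns.version, ns[name])
            elif entry[0] == ns.version:
                self.hits += 1
                return entry[1]          # fast path: dict lookup skipped
            else:
                self.deopt.add(offset)   # cache miss: deoptimize forever
        return ns[name]                  # regular dict lookup
```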
Here's a link to the issue tracker with the first version of the patch:
[5]. I'm working on the patch in a github repo here: [4].
Summary
-------
There are many things about this algorithm that we can improve/tweak.
Perhaps we should profile code objects longer, or account for time they
were executed. Maybe we shouldn't deoptimize opcodes on their first
cache miss. Maybe we can come up with better data structures. We also
need to profile the memory and see how much more this cache will require.
One thing I'm certain about, is that we can get a 5-10% speedup of
CPython with relatively low memory impact. And I think it's worth
exploring that!
If you're interested in this kind of optimization, please help with
code reviews, ideas, profiling and benchmarks. The latter is especially
important; I'd never have imagined how hard it is to come up with a good
macro benchmark.
I also want to thank my company MagicStack (magic.io) for sponsoring
this work.
Thanks,
Yury
[1] https://gist.github.com/1st1/aed69d63a2ff4de4c7be
[2] http://faster-cpython.readthedocs.org/index.html
[3] http://bugs.python.org/issue26110
[4] https://github.com/1st1/cpython/tree/opcache2
[5] http://bugs.python.org/issue26219
[6]
https://github.com/python/cpython/compare/master...1st1:opcache2?expand=1#d…
Hi,
In the middle of recent discussions about Python performance, it was
discussed to change the Python bytecode. Serhiy proposed to reuse
MicroPython short bytecode to reduce the disk space and reduce the
memory footprint.
Demur Rumed proposes a different change: a regular bytecode
using 16-bit units. An instruction always has one 8-bit argument, which
is zero if the instruction doesn't take an argument:
http://bugs.python.org/issue26647
According to benchmarks, it looks faster:
http://bugs.python.org/issue26647#msg263339
IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:
- if (HAS_ARG(opcode))
- oparg = NEXTARG();
+ oparg = NEXTARG();
This code is the very hot loop evaluating Python bytecode. I expect
that removing a conditional branch here can reduce the CPU branch
misprediction.
I reviewed the first versions of the change, and IMHO it's almost ready
to be merged. But I would prefer to have a review from at least a second
core reviewer.
Can someone please review the change?
--
The side effect of wordcode is that arguments in 0..255 now use 2
bytes per instruction instead of 3, so it also reduces the size of
bytecode in the most common case.
Larger arguments, 16-bit (0..65,535), now use 4 bytes instead
of 3. Arguments are supported up to 32-bit: 24-bit arguments use 3 units
(6 bytes), 32-bit arguments use 4 units (8 bytes). MAKE_FUNCTION uses a
16-bit argument for keyword defaults and a 24-bit argument for
annotations. Other common instructions known to use large arguments are
jumps in bytecode longer than 256 bytes.
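Those sizes follow from how wordcode builds large arguments out of
EXTENDED_ARG prefixes; here is a small sketch of the encoding (my own
illustration, not code from the patch):

```python
# Each instruction is a 2-byte unit (opcode, arg); args above 255 are
# carried by EXTENDED_ARG prefix units, most-significant byte first.
from opcode import EXTENDED_ARG

def encode(op, arg):
    """Return the wordcode bytes for one instruction."""
    out = []
    for shift in (24, 16, 8):
        byte = (arg >> shift) & 0xFF
        if out or byte:
            out += [EXTENDED_ARG, byte]
    out += [op, arg & 0xFF]
    return bytes(out)
```

So an 8-bit argument takes 2 bytes, 16-bit takes 4, 24-bit takes 6 and
32-bit takes 8, matching the figures above.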
--
Right now, ceval.c still fetches the opcode and then the oparg with two
8-bit reads. Later, we can discuss whether it would be possible to
ensure that the bytecode is always 16-bit aligned in memory, to fetch
the two bytes using a uint16_t* pointer.
Maybe we can overallocate 1 byte in codeobject.c and align the memory
block manually if needed. Or maybe ceval.c should copy the code if it's
not aligned?
Raymond Hettinger proposes something like that, but it looks like
there are concerns about non-aligned memory accesses:
http://bugs.python.org/issue25823
The cost of non-aligned memory accesses depends on the CPU
architecture, but it can raise a SIGBUS on some arch (MIPS and
SPARC?).
Victor
Saw recent discussion:
https://mail.python.org/pipermail/python-dev/2016-February/143013.html
I remember trying WPython; it was fast. Unfortunately it feels like it
came at the wrong time, when development was invested in getting py3k
out the door.
It also had a lot of other ideas, like *_INT instructions which allowed
the oparg to be a constant int rather than needing to LOAD_CONST one.
Anyways, I'll stop reminiscing.
abarnert has started an experiment with wordcode:
https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d0f…
I've personally benchmarked this fork with positive results. This
experiment seeks to be conservative-- it doesn't seek to introduce new
opcodes or combine BINARY_OP's all into a single op where the currently
unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP).
I've submitted a pull request which is working on fixing tests & updating
peephole.c
Bringing this up on the list to figure out if there's interest in a basic
wordcode change. It feels like there are no downsides: faster code, smaller
bytecode, simpler interpretation of bytecode (the Nth instruction starts at
the 2Nth byte if you count EXTENDED_ARG as an instruction). The only
downside is the transitional cost.
What'd be necessary for this to be pulled upstream?
Hi all,
after talking to Guido and Serhiy we present the next revision
of this PEP. It is a compromise that we are all happy with,
and a relatively restricted rule that makes additions to PEP 8
basically unnecessary.
I think the discussion has shown that supporting underscores in
the from-string constructors is valuable, therefore this is now
added to the specification section.
The remaining open question is about the reverse direction: do
we want a string formatting modifier that adds underscores as
thousands separators?
cheers,
Georg
-----------------------------------------------------------------
PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl, Serhiy Storchaka
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016
Abstract and Rationale
======================
This PEP proposes to extend Python's syntax and number-from-string
constructors so that underscores can be used as visual separators for
digit grouping purposes in integral, floating-point and complex number
literals.
This is a common feature of other modern languages, and can aid
readability of long literals, or literals whose value should clearly
separate into parts, such as bytes or words in hexadecimal notation.
Examples::
# grouping decimal numbers by thousands
amount = 10_000_000.0
# grouping hexadecimal addresses by words
addr = 0xDEAD_BEEF
# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110
# same, for string conversions
flags = int('0b_1111_0000', 2)
Specification
=============
The current proposal is to allow one underscore between digits, and
after base specifiers in numeric literals. The underscores have no
semantic meaning, and literals are parsed as if the underscores were
absent.
Literal Grammar
---------------
The production list for integer literals would therefore look like
this::
integer: decinteger | bininteger | octinteger | hexinteger
decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
bininteger: "0" ("b" | "B") (["_"] bindigit)+
octinteger: "0" ("o" | "O") (["_"] octdigit)+
hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit: "1"..."9"
digit: "0"..."9"
bindigit: "0" | "1"
octdigit: "0"..."7"
hexdigit: digit | "a"..."f" | "A"..."F"
For floating-point and complex literals::
floatnumber: pointfloat | exponentfloat
pointfloat: [digitpart] fraction | digitpart "."
exponentfloat: (digitpart | pointfloat) exponent
digitpart: digit (["_"] digit)*
fraction: "." digitpart
exponent: ("e" | "E") ["+" | "-"] digitpart
imagnumber: (floatnumber | digitpart) ("j" | "J")
Constructors
------------
Following the same rules for placement, underscores will be allowed in
the following constructors:
- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``
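A sketch of how the constructors would behave on an interpreter implementing this PEP (expected in Python 3.6):

```python
# Underscores are ignored when computing the value; only single
# underscores between digits or after a base specifier are allowed.
assert int('1_000_000') == 1000000
assert int('0b_1111_0000', 2) == 240
assert int('FF_FF', 16) == 65535
assert float('1_000.25') == 1000.25

# Misplaced underscores are rejected:
for bad in ('_100', '100_', '1__00'):
    try:
        int(bad)
    except ValueError:
        pass
    else:
        raise AssertionError('%r should be invalid' % bad)
```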
Prior Art
=========
Those languages that do allow underscore grouping implement a large
variety of rules for allowed placement of underscores. In cases where
the language spec contradicts the actual behavior, the actual behavior
is listed. ("single" or "multiple" refer to allowing runs of
consecutive underscores.)
* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++14: single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts)
[9]_
* Perl 5: multiple, basically anywhere, although docs say it's
restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere")
[10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits
[4]_
* Swift: multiple, between digits and trailing (although textual
description says only "between digits") [5]_
Alternative Syntax
==================
Underscore Placement Rules
--------------------------
Instead of the relatively strict rule specified above, the use of
underscores could be limited. As seen in other languages, common
rules include:
* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
for the start of the literal, or special positions like after a
decimal point.
The syntax in this PEP has ultimately been selected because it covers
the common use cases, and does not allow for syntax that would have to
be discouraged in style guides anyway.
A less common rule would be to allow underscores only every N digits
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the
separator placement is different in different cultures.
Different Separators
--------------------
A proposed alternate syntax was to use whitespace for grouping.
Although strings are a precedent for combining adjoining literals, the
behavior can lead to unexpected effects which are not possible with
underscores. Also, no other language is known to use this rule,
except for languages that generally disregard any whitespace.
C++14 introduces apostrophes for grouping (because underscores
introduce ambiguity with user-defined literals); this is not
considered here because apostrophes are Python's string literal
delimiters. [1]_
Open Proposals
==============
It has been proposed [11]_ to extend the number-to-string formatting
language to allow ``_`` as a thousands separator, where currently only
``,`` is supported. This could be used to easily generate code with
more readable literals.
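For comparison, a sketch of the existing behaviour with ``,``, which the open proposal would mirror with ``_``:

```python
# The ',' option in the format mini-language inserts thousands
# separators; the proposal would allow '_' in the same role.
assert format(10000000, ',') == '10,000,000'
assert '{:,.2f}'.format(1234567.891) == '1,234,567.89'
```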
Implementation
==============
A preliminary patch that implements the specification given above has
been posted to the issue tracker. [12]_
References
==========
.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html
.. [2] http://dlang.org/spec/lex.html#integerliteral
.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors
.. [4] http://doc.rust-lang.org/reference.html#number-literals
.. [5]
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swif…
.. [6] https://github.com/dotnet/roslyn/issues/216
.. [7]
https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores…
.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
.. [9]
http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point…
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html
.. [12] http://bugs.python.org/issue26331
Copyright
=========
This document has been placed in the public domain.
The call has started to go out for sprint groups to list themselves online.
Anyone want to specifically lead the core sprint this year? If no one
specifically does then I will sign us up and do my usual thing of pointing
people at the devguide and encourage people to ask questions but not do a
lot of hand-holding (I'm expecting to be busy either working on GitHub
migration stuff or doing other things that I have been neglecting due to my
GitHub migration work).
---------- Forwarded message ---------
From: Ewa Jodlowska <ewa(a)python.org>
Date: Mon, 4 Apr 2016 at 07:14
Subject: [PSF-Community] Sprinting at PyCon US 2016
To: <psf-community(a)python.org>
Are you coming to PyCon US? Have you thought about sprinting?
The coding Sprints are the hidden gem of PyCon, up to 4 days (June 2-5) of
coding with many Python projects and their maintainers. And if you're
coming to PyCon, taking part in the Sprints is easy!
You don’t need to change your registration* to join the Sprints. There’s no
additional registration fee, and you even get lunch. You do need to cover
the additional lodging and other meals, but that’s it. If you’ve booked a
room through the PyCon registration system, you'll need to contact the
registration team at pycon2016(a)cteusa.com as soon as possible to request
the extra nights. The sprinting itself (along with lunch every day) is
free, so your only expenses are your room and other meals.
If you're interested in what projects will be sprinting, just keep an eye
on the sprints page on the PyCon web site at
https://us.pycon.org/2016/community/sprints/ Be sure to check back, as
groups are being added all the time.
If you haven't sprinted before, or if you just need to brush up on
sprinting tools and techniques, there will again be an 'Intro to Sprinting'
session the evening of June 1, led by Shauna Gordon-McKeon and other
members of the Python community. To grab a free ticket for this session, just
visit
https://www.eventbrite.com/e/introduction-to-open-source-the-pycon-sprints-…
.
*Please note that conference registration is sold out, but you do not need
a conference registration to come to the Sprints.
_______________________________________________
PSF-Community mailing list
PSF-Community(a)python.org
https://mail.python.org/mailman/listinfo/psf-community
On Fri, Apr 29, 2016 at 12:18:46PM -0400, Random832 wrote:
> On Fri, Apr 29, 2016, at 10:45, Marcos Dione wrote:
> > One possible solution that was suggested to me in the #python IRC
> > channel was to use that, then test if the resulting value is negative,
> > and adjust accordingly, but I wonder if there is a cleaner, more general
> > solution (for instance, what if the type was something else, like loff_t,
> > although for that one in particular there *is* a conversion
> > function/macro).
>
> In principle, you could just use PyLong_AsUnsignedLong (or LongLong),
> and raise OverflowError manually if the value happens to be out of
> size_t's range. (99% sure that on every Linux platform unsigned long is
> the same size as size_t.)
>
> But it's not like it'd be the first function in the os module to call a
> system call that takes a size_t. read() just uses Py_ssize_t; write()
> uses the buffer protocol, which uses Py_ssize_t. How concerned are you
> really about the lost range here? What does the system call return (its
> return type is ssize_t) if it writes more than SSIZE_MAX bytes? (This
> shouldn't be hard to test: just try copying a >2GB file on a 32-bit
> system.)
It's a very good point, but I don't have any 32-bit systems around
running a 4.5 kernel. I'll try to figure it out and/or ask on the kernel ML.
> I'm more curious about what your calling convention is going to be for
> off_in and off_out. I can't think of any other interfaces that have
> optional output parameters. Python functions generally deal with output
> parameters in the underlying C function (there are a few examples in
> math) by returning a tuple.
These are not output parameters, even though they're pointers. They
use the NULL pointer to signal that the current file offsets should not
be touched, to distinguish that case from an offset of 0. It's the kind
of situation where, in Python, we would use None.
Hello,
I'm trying to connect to Twitter to receive tweets, but some code I downloaded from the internet tells me I need to install tweepy and matplotlib. I install them and keep getting the message that they are not installed. tweepy reports no problems; when I invoke it on the command line everything is fine. Likewise, matplotlib requires several dependencies before its installation (dateutils, numpy, tornado, etc., which I already installed), but in the Python editor, when I run the code, I get the following message:
ImportError: No module named 'matplotlib'
Any ideas?
Felipe
First of all, I'm not subscribed to the list (too much traffic for
me), so please CC: me in any answers if possible.
I'm trying to add a new syscall to the os module:
https://bugs.python.org/issue26826
One of the few missing parts is to convert a parameter, which would
be a Python int object, using PyArg_ParseTupleAndKeywords() to a size_t
variable. The closest existing format is 'n', but it converts to
Py_ssize_t (which is ssize_t, really), and that type is signed.
One possible solution that was suggested to me in the #python IRC
channel was to use that, then test if the resulting value is negative,
and adjust accordingly, but I wonder if there is a cleaner, more general
solution (for instance, what if the type was something else, like loff_t,
although for that one in particular there *is* a conversion
function/macro).
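As a rough Python-level sketch of that IRC workaround (parse as a signed value, then reinterpret negatives as the unsigned size_t with the same bit pattern; the helper name and the use of ctypes to find the platform width are my own illustration, not the proposed C code):

```python
import ctypes

# Number of bits in the platform's size_t.
SIZE_T_BITS = 8 * ctypes.sizeof(ctypes.c_size_t)

def as_size_t(value):
    # A signed converter like the 'n' format yields values in
    # [-2**(N-1), 2**(N-1)); reinterpret negatives as the unsigned
    # size_t value with the same bit pattern.
    if value < 0:
        value += 2 ** SIZE_T_BITS
    if not 0 <= value < 2 ** SIZE_T_BITS:
        raise OverflowError("value does not fit in size_t")
    return value
```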
--
(Not so) Random fortune:
Premature optimization is the root of all evil.
-- Donald Knuth