Hi,
Here is the second PEP, part of a series of 3 PEPs to add an API to
implement a static Python optimizer specializing functions with
guards.
HTML version:
https://faster-cpython.readthedocs.org/pep_specialize.html
PEP: xxx
Title: Specialized functions with guards
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6
Abstract
========
Add an API to attach specialized functions with guards to functions, to
support static optimizers respecting the Python semantics.
Rationale
=========
Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires detecting when "something changes"; we will call these
checks "guards".
This PEP proposes to add a ``specialize()`` method to functions to add a
specialized function with guards. When the function is called, the
specialized function is used if nothing changed; otherwise the original
bytecode is used.
Writing an optimizer is out of the scope of this PEP.
Example
=======
Using bytecode
--------------
Replace ``chr(65)`` with ``"A"``::

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    func.specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
    del fast_func

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))
Output::

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0
The hypothetical ``myoptimizer.GuardBuiltins("chr")`` is a guard on the
builtin ``chr()`` function and the ``chr`` name in the global namespace.
The guard fails if the builtin function is replaced or if a ``chr`` name
is defined in the global namespace.
The first call returns directly the string ``"A"``. The second call
removes the specialized function because the builtin ``chr()`` function
was replaced, and executes the original bytecode.
On a microbenchmark, calling the specialized function takes 88 ns,
whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.
Using builtin function
----------------------
Replace a slow Python function calling ``chr(obj)`` with a direct call
to the builtin ``chr()`` function::

    import myoptimizer

    def func(arg):
        return chr(arg)

    func.specialize(chr, [myoptimizer.GuardBuiltins("chr")])

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))
Output::

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0
The first call calls the builtin ``chr()`` function directly (without
creating a Python frame). The second call removes the specialized
function because the builtin ``chr()`` function was replaced, and
executes the original bytecode.
On a microbenchmark, calling the specialized function takes 95 ns,
whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
Calling ``chr(65)`` directly takes 76 ns.
Python Function Call
====================
Pseudo-code to call a Python function having specialized functions with
guards::

    def call_func(func, *args, **kwargs):
        # by default, call the regular bytecode
        code = func.__code__.co_code

        specialized = func.get_specialized()
        nspecialized = len(specialized)

        index = 0
        while index < nspecialized:
            guard = specialized[index].guard
            # pass arguments, some guards need them
            check = guard(args, kwargs)
            if check == 1:
                # guard succeeded: we can use the specialized function
                code = specialized[index].code
                break
            elif check == -1:
                # guard will always fail: remove the specialized function
                del specialized[index]
                nspecialized -= 1
            elif check == 0:
                # guard failed temporarily: try the next specialization
                index += 1

        # code can be a code object or any callable object
        execute_code(code, args, kwargs)
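The dispatch above can be exercised with a pure-Python simulation (a sketch only: ``SpecializedFunction`` and the callable-guard protocol are invented here for illustration; the real API would operate on code objects inside the eval loop):

```python
class SpecializedFunction:
    """Pure-Python simulation of the specialized-call dispatch."""

    def __init__(self, func):
        self.func = func          # original function (the "bytecode")
        self.specialized = []     # list of (replacement, guards) pairs

    def specialize(self, replacement, guards):
        self.specialized.append((replacement, guards))

    def __call__(self, *args, **kwargs):
        index = 0
        while index < len(self.specialized):
            replacement, guards = self.specialized[index]
            checks = [guard(args, kwargs) for guard in guards]
            if any(check == -1 for check in checks):
                # a guard will always fail: drop the specialization
                del self.specialized[index]
            elif all(check == 1 for check in checks):
                # all guards succeeded: call the specialized version
                return replacement(*args, **kwargs)
            else:
                # a guard failed temporarily: keep it, try the next one
                index += 1
        return self.func(*args, **kwargs)
```

Here guards are plain callables returning 1, 0 or -1, mirroring the ``check()`` return values described below.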
Changes
=======
* Add two new methods to functions:

  - ``specialize(code, guards: list)``: add a specialized
    function with guards. ``code`` is a code object (ex:
    ``func2.__code__``) or any callable object (ex: ``len``).
    The specialization can be ignored if a guard already fails.
  - ``get_specialized()``: get the list of specialized functions with
    guards.

* Add a base ``Guard`` type which can be used as a parent type to
  implement guards. It requires implementing a ``check()`` function,
  with an optional ``first_check()`` function. API:

  * ``int check(PyObject *guard, PyObject **stack)``: return 1 on
    success, 0 if the guard failed temporarily, -1 if the guard will
    always fail
  * ``int first_check(PyObject *guard, PyObject *func)``: return 0 on
    success, -1 if the guard will always fail
Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
of 3 runs):
* Original Python: 79 ns
* Patched Python: 79 ns
According to this microbenchmark, the changes have no overhead on
calling a Python function without specialization.
Behaviour
=========
When a function code is replaced (``func.__code__ = new_code``), all
specialized functions are removed.
When a function is serialized (by ``marshal`` or ``pickle`` for
example), specialized functions and guards are ignored (not serialized).
Copyright
=========
This document has been placed in the public domain.
--
Victor
Hi,
Here is a first PEP, part of a series of 3 PEPs to add an API to
implement a static Python optimizer specializing functions with
guards.
HTML version:
https://faster-cpython.readthedocs.org/pep_dict_version.html#pep-dict-versi…
PEP: xxx
Title: Add dict.__version__
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6
Abstract
========
Add a new read-only ``__version__`` property to ``dict`` and
``collections.UserDict`` types, incremented at each change.
Rationale
=========
In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction searches for a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (namespace of
a function) is usually optimized to an array, but it can be a dict too.
Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations respecting the Python
semantics requires detecting when "something changes"; we will call these
checks "guards".
The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a version to dictionaries to implement efficient
guards on namespaces.
Example of optimization: replace loading a global variable with a
constant. This optimization requires a guard on the global variable to
check if it was modified. If the variable is modified, the variable must
be loaded at runtime, instead of using the constant.
Guard example
=============
Pseudo-code of an efficient guard to check if a dictionary key was
modified (created, updated or deleted)::

    UNSET = object()

    class Guard:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict.__version__

        def check(self):
            """Return True if the dictionary value did not change."""
            version = self.dict.__version__
            if version == self.version:
                # Fast-path: avoid the dictionary lookup
                return True

            value = self.dict.get(self.key, UNSET)
            if value == self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            return False
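Since ``dict.__version__`` does not exist yet, the guard semantics can be tried out against a small pure-Python stand-in that counts mutations (a sketch for experimentation only; ``VersionedDict`` is invented here and is not the proposed C implementation):

```python
class VersionedDict(dict):
    """Pure-Python stand-in mimicking the proposed dict.__version__."""

    def __init__(self, *args, **kwargs):
        self._version = 0
        super().__init__()
        # count initial insertions, as the PEP specifies
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    @property
    def __version__(self):
        return self._version

    def __setitem__(self, key, value):
        # increment only if the key is new or bound to a different object
        if key not in self or self[key] is not value:
            self._version += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1
```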
Changes
=======
Add a read-only ``__version__`` property to the builtin ``dict`` type and
to the ``collections.UserDict`` type. New empty dictionaries are initialized
to version ``0``. The version is incremented at each change:

* ``clear()`` if the dict was non-empty
* ``pop(key)`` if the key exists
* ``popitem()`` if the dict is non-empty
* ``setdefault(key, value)`` if the ``key`` does not exist
* ``__delitem__(key)`` if the key exists
* ``__setitem__(key, value)`` if the ``key`` doesn't exist or if the value
  is different
* ``update(...)`` if new values are different than existing values (the
  version can be incremented multiple times)
Example::

    >>> d = {}
    >>> d.__version__
    0
    >>> d['key'] = 'value'
    >>> d.__version__
    1
    >>> d['key'] = 'new value'
    >>> d.__version__
    2
    >>> del d['key']
    >>> d.__version__
    3
If a dictionary is created with items, the version is also incremented
at each dictionary insertion. Example::

    >>> d = dict(x=7, y=33)
    >>> d.__version__
    2
The version is not incremented if an existing key is set to the same
value. Only the identity of the value is tested, not its content.
Example::

    >>> d = {}
    >>> value = object()
    >>> d['key'] = value
    >>> d.__version__
    1
    >>> d['key'] = value
    >>> d.__version__
    1
.. note::
   CPython uses some singletons, like integers in the range [-5; 256],
   the empty tuple, empty strings, Unicode strings of a single character
   in the range [U+0000; U+00FF], etc. When a key is set twice to the
   same singleton, the version is not modified.
The PEP is designed to implement guards on namespaces; in practice, only
the ``dict`` type can be used for namespaces. ``collections.UserDict``
is modified because it must mimic ``dict``. ``collections.Mapping`` is
unchanged.
Integer overflow
================
The implementation uses the C unsigned integer type ``size_t`` to store
the version. On 32-bit systems, the maximum version is ``2**32-1``
(more than ``4.2 * 10**9``, 4 billion). On 64-bit systems, the maximum
version is ``2**64-1`` (more than ``1.8 * 10**19``).
The C code uses ``version++``. The behaviour on integer overflow of the
version is undefined. The minimum guarantee is that the version always
changes when the dictionary is modified.
The check ``dict.__version__ == old_version`` can be true after an
integer overflow, so a guard can succeed even if the watched value
changed, which is wrong. The bug only occurs if the dict is modified at
least ``2**64`` times (on a 64-bit system) between two checks of the
guard.
Using a more complex type (ex: ``PyLongObject``) to avoid the overflow
would slow down operations on the ``dict`` type. Even if there is a
theoretical risk of missing a value change, the risk is considered too
low compared to the slowdown of using a more complex type.
Alternatives
============
Add a version to each dict entry
--------------------------------
A single version per dictionary requires keeping a strong reference to
the value, which can keep the value alive longer than expected. If we
also add a version per dictionary entry, the guard can rely on the entry
version and so avoid the strong reference to the value (only strong
references to the dictionary and the key are needed).

Changes: add a ``getversion(key)`` method to dictionaries which returns
``None`` if the key doesn't exist. When a key is created or modified,
the entry version is set to the dictionary version, which is incremented
at each change (create, modify, delete).
Pseudo-code of an efficient guard to check if a dict key was modified
using ``getversion()``::

    UNSET = object()

    class Guard:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.dict_version = dict.__version__
            self.entry_version = dict.getversion(key)

        def check(self):
            """Return True if the dictionary value did not change."""
            dict_version = self.dict.__version__
            if dict_version == self.dict_version:
                # Fast-path: avoid the dictionary lookup
                return True

            # lookup in the dictionary, but get the entry version,
            # not the value
            entry_version = self.dict.getversion(self.key)
            if entry_version == self.entry_version:
                # another key was modified:
                # cache the new dictionary version
                self.dict_version = dict_version
                return True

            return False
The main drawback of this option is the impact on the memory footprint.
It increases the size of each dictionary entry, so the overhead depends
on the number of buckets (dictionary entries, used or unused). For
example, it increases the size of each dictionary entry by 8 bytes on
64-bit systems if ``size_t`` is used.
In Python, the memory footprint matters and the trend is more to reduce
it. Examples:
* `PEP 393 -- Flexible String Representation
<https://www.python.org/dev/peps/pep-0393/>`_
* `PEP 412 -- Key-Sharing Dictionary
<https://www.python.org/dev/peps/pep-0412/>`_
Add a new dict subtype
----------------------
Add a new ``verdict`` type, subtype of ``dict``. When guards are needed,
use the ``verdict`` for namespaces (module namespace, type namespace,
instance namespace, etc.) instead of ``dict``.
Leave the ``dict`` type unchanged to not add any overhead (memory
footprint) when guards are not needed.
Technical issue: a lot of C code in the wild, including the CPython
core, expects the exact ``dict`` type. Issues:

* ``exec()`` requires a ``dict`` for globals and locals. A lot of code
  uses ``globals={}``. It is not possible to cast the ``dict`` to a
  ``dict`` subtype because the caller expects the ``globals`` parameter
  to be modified (``dict`` is mutable).
* Functions call ``PyDict_xxx()`` functions directly, instead of calling
  ``PyObject_xxx()`` functions when the object is a ``dict`` subtype.
* The ``PyDict_CheckExact()`` check fails on ``dict`` subtypes, whereas
  some functions require the exact ``dict`` type.
* ``Python/ceval.c`` does not completely support dict subtypes for
  namespaces.
The ``exec()`` issue is a blocker issue.
Other issues:
* The garbage collector has special code to "untrack" ``dict``
  instances. If a ``dict`` subtype is used for namespaces, the garbage
  collector may be unable to break some reference cycles.
* Some functions have a fast-path for ``dict`` which would not be taken
for ``dict`` subtypes, and so it would make Python a little bit
slower.
Usage of dict.__version__
=========================
astoptimizer of FAT Python
--------------------------
The astoptimizer of the FAT Python project implements many optimizations
which require guards on namespaces. Examples:
* Call pure builtins: to replace ``len("abc")`` with ``3``, guards on
``builtins.__dict__['len']`` and ``globals()['len']`` are required
* Loop unrolling: to unroll the loop ``for i in range(...): ...``,
guards on ``builtins.__dict__['range']`` and ``globals()['range']``
are required
The `FAT Python
<http://faster-cpython.readthedocs.org/fat_python.html>`_ project is a
static optimizer for Python 3.6.
Pyjion
------
According to Brett Cannon, one of the two main developers of Pyjion, Pyjion
can also benefit from dictionary versions to implement optimizations.
Pyjion is a JIT compiler for Python based upon CoreCLR (Microsoft .NET Core
runtime).
Unladen Swallow
---------------
Even if the dictionary version was not explicitly mentioned, optimizing
globals and builtins lookup was part of the Unladen Swallow plan: "Implement
one of the several proposed schemes for speeding lookups of globals and
builtins."
Source: `Unladen Swallow ProjectPlan
<https://code.google.com/p/unladen-swallow/wiki/ProjectPlan>`_.
Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler implemented
with LLVM. The project stopped in 2011: `Unladen Swallow Retrospective
<http://qinsb.blogspot.com.au/2011/03/unladen-swallow-retrospective.html>`_.
Prior Art
=========
Cached globals+builtins lookup
------------------------------
In 2006, Andrea Griffini proposed a patch implementing a `Cached
globals+builtins lookup optimization <https://bugs.python.org/issue1616125>`_.
The patch adds a private ``timestamp`` field to dict.
See the thread on python-dev: `About dictionary lookup caching
<https://mail.python.org/pipermail/python-dev/2006-December/070348.html>`_.
Globals / builtins cache
------------------------
In 2010, Antoine Pitrou proposed a `Globals / builtins cache
<http://bugs.python.org/issue10401>`_ which adds a private
``ma_version`` field to the ``dict`` type. The patch adds a "global and
builtin cache" to functions and frames, and changes ``LOAD_GLOBAL`` and
``STORE_GLOBAL`` instructions to use the cache.
PySizer
-------
`PySizer <http://pysizer.8325.org/>`_: a memory profiler for Python,
Google Summer of Code 2005 project by Nick Smallbone.
This project has a patch for CPython 2.4 which adds ``key_time`` and
``value_time`` fields to dictionary entries. It uses a global
process-wide counter for dictionaries, incremented each time that a
dictionary is modified. The times are used to decide when child objects
first appeared in their parent objects.
Copyright
=========
This document has been placed in the public domain.
--
Victor
Hi everyone,
What do you think about enabling a more friendly interface to chmod
information in Python? I believe that currently if I want to get chmod
information from a file, I need to do this:
my_path.stat().st_mode & 0o777
(I'm using `pathlib`.)
(If there's a nicer way than this, please let me know.)
This sucks. And then the result is a number, like 511, which you then
have to call `oct` on to get 0o777. I'm not even happy with getting the
octal number. For some of us who live and breathe Linux, seeing a number
like 0o440 might be crystal-clear, since your mind automatically translates
that to the permissions that user/group/others have, but I haven't reached
that level.
I would really like an object-oriented approach to chmod, like an object
which I can ask "Does group have execute permissions?" and say "Please add
read permissions to everyone" etc. Just because Linux speaks in code
doesn't mean that we need to.
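Something along these lines could be built on the `stat` module today (a sketch only; the `Permissions` class and its method names are invented here, not an existing or proposed API):

```python
import stat

class Permissions:
    """Friendlier, object-oriented view on a chmod-style mode."""

    _BITS = {
        ('user', 'read'): stat.S_IRUSR, ('user', 'write'): stat.S_IWUSR,
        ('user', 'execute'): stat.S_IXUSR,
        ('group', 'read'): stat.S_IRGRP, ('group', 'write'): stat.S_IWGRP,
        ('group', 'execute'): stat.S_IXGRP,
        ('others', 'read'): stat.S_IROTH, ('others', 'write'): stat.S_IWOTH,
        ('others', 'execute'): stat.S_IXOTH,
    }

    def __init__(self, mode):
        # keep only the permission bits of st_mode
        self.mode = stat.S_IMODE(mode)

    def can(self, who, what):
        """Answer "does group have execute permissions?" style questions."""
        return bool(self.mode & self._BITS[who, what])

    def allow(self, who, what):
        """Add a permission bit, e.g. allow('others', 'read')."""
        self.mode |= self._BITS[who, what]

    def __repr__(self):
        # strip the file-type character, e.g. 'rw-r-----'
        return stat.filemode(self.mode)[1:]
```

One could imagine `Path.stat().st_mode` being wrapped this way, e.g. `Permissions(my_path.stat().st_mode)`.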
And of course, I'd want that on the `pathlib` module so I could do it all
on the path object without referencing another module.
What do you think?
Ram.
What do you think about implementing functionality similar to the `find`
utility in Linux in the Pathlib module? I wanted this today, I had a script
to write to archive a bunch of files from a folder, and I decided to try
writing it in Python rather than in Bash. But I needed something stronger
than `Path.glob` in order to select the files. I wanted a regular
expression. (In this particular case, I wanted to get a list of all the
files excluding the `.git` folder and all files inside of it.)
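For what it's worth, a `find`-like helper can already be sketched on top of `Path.rglob` (the `find` name and signature here are invented for illustration, not a proposed pathlib API):

```python
import re
from pathlib import Path

def find(root, pattern, exclude=None):
    """Yield paths under root whose relative POSIX path matches `pattern`,
    skipping anything whose relative path matches `exclude`."""
    regex = re.compile(pattern)
    exclude_re = re.compile(exclude) if exclude else None
    for path in Path(root).rglob('*'):
        rel = path.relative_to(root).as_posix()
        if exclude_re is not None and exclude_re.search(rel):
            continue
        if regex.search(rel):
            yield path
```

For the archiving use case above, something like `find(folder, r'.*', exclude=r'(^|/)\.git(/|$)')` would skip `.git` and everything inside it.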
Thanks,
Ram.
On Jan 11, 2016, at 03:25 PM, anatoly techtonik wrote:
>On Wed, Jan 6, 2016 at 2:49 AM, Barry Warsaw <barry(a)python.org> wrote:
>
>> reStructuredText is clearly a better format
>
>Can you expand on that? I use markdown everywhere
reST is better than plain text. Markdown is not a PEP format option.
>> all recent PEP submissions have been in reST for a while now anyway.
>
>Is it possible to query exact numbers automatically?
Feel free to grep the PEPs hg repo.
>What is the tooling support for handling PEP 9 and PEP 12?
UTSL. Everything is in the PEPs hg repo.
Cheers,
-Barry
Hi,
I hope python-ideas is the right place to post this, I'm very new to
this and appreciate a pointer in the right direction if this is not it.
The requests project is getting multiple bug reports about a problem in
the stdlib http.client, so I thought I'd raise an issue about it here.
The bug reports concern people posting http requests with unicode
strings when they should be using utf-8 encoded strings.
Since RFC 2616 says latin-1 is the default encoding, http.client tries
that and fails with a UnicodeEncodeError.
My idea is NOT to change from latin-1 to something else, that would
break compliance with the spec, but instead catch that exception, and
try encoding with utf-8 instead. That would avoid breaking backward
compatibility, unless someone specifically relied on that exception,
which I think is very unlikely.
This is also how other languages http libraries seem to deal with this,
sending in unicode just works:
In cURL (works fine):
curl http://example.com -d "Celebrate 🎉"
In Ruby with http.rb (works fine):
require 'http'
r = HTTP.post("http://example.com", :body => "Celebrate 🎉")
In Node with request (works fine):
var request = require('request');
request.post({url: 'http://example.com', body: "Celebrate 🎉"}, function
(error, response, body) {
console.log(body)
})
But Python 3 with requests crashes instead:
import requests
r = requests.post("http://localhost:8000/tag", data="Celebrate 🎉")
...with the following stacktrace:
...
File "../lib/python3.4/http/client.py", line 1127, in _send_request
body = body.encode('iso-8859-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position
14-15: ordinal not in range(256)
----
So the rationale for this idea is:
* http.client doesn't work the way beginners expect for very basic
use cases (posting unicode strings)
* Libraries in other languages behave like beginners expect, which
magnifies the problem.
* Changing the default latin-1 encoding probably isn't possible, because
it would break the spec...
* But catching the exception and trying to encode in utf-8 instead
wouldn't break the spec and solves the problem.
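The proposed fallback boils down to something like this (a sketch of the idea, not the actual http.client patch; `encode_body` is an invented helper name):

```python
def encode_body(body):
    """Encode a str request body: latin-1 first (the RFC 2616 default),
    falling back to UTF-8 when the body is not representable in latin-1."""
    try:
        return body.encode('iso-8859-1')
    except UnicodeEncodeError:
        return body.encode('utf-8')
```

Bodies that encode fine today keep their bytes unchanged; only the case that currently raises would take the UTF-8 path.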
----
Here's a couple of issues where people expect things to work differently:
https://github.com/kennethreitz/requests/issues/1926
https://github.com/kennethreitz/requests/issues/2838
https://github.com/kennethreitz/requests/issues/1822
----
Does this make sense?
/Emil
Hi,
timedelta handling always felt cumbersome to me:
from datetime import timedelta
short_period = timedelta(seconds=10)
long_period = timedelta(hours=4, seconds=37)
Today, I came across this one https://github.com/lxc/lxd/pull/1471/files
and I found the creation of a 10 seconds timeout extremely intuitive.
Would this represent a valuable addition to Python?
from datetime import second, hour
short_period = 10*second
long_period = 4*hour + 37*second
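Worth noting: `timedelta` already supports integer multiplication, so these units can be defined today as plain constants (a sketch; `second`, `minute` and `hour` are not currently exported by `datetime`):

```python
from datetime import timedelta

# Hypothetical unit constants; datetime does not provide these today.
second = timedelta(seconds=1)
minute = timedelta(minutes=1)
hour = timedelta(hours=1)

short_period = 10 * second
long_period = 4 * hour + 37 * second
```

So the proposal is mostly about blessing these names in the stdlib rather than new arithmetic.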
Best,
Sven
I don't think this will be at all controversial. Brett suggested, and there
was no disagreement from the PEP editors, that plain text PEPs be deprecated.
reStructuredText is clearly a better format, and all recent PEP submissions
have been in reST for a while now anyway.
I am therefore withdrawing[*] PEP 9 and have made other appropriate changes to
make it clear that only PEP 12 format is acceptable going forward. The PEP
editors will not be converting the legacy PEPs to reST, nor will we currently
be renaming the relevant PEP source files to end with ".rst" since there's too
much tooling that would have to change to do so. However, if either task
really interests you, please get in touch with the PEP editors.
it-only-took-15-years-ly y'rs,
-Barry (on behalf of the PEP editors)
[*] Status: Withdrawn being about the only currently appropriate resolution
status for process PEPs.
[Adding python-ideas back -- I'm not sure why you dropped it but it looks
like an oversight, not intentional]
On Fri, Jan 1, 2016 at 2:25 PM, Andrew Barnert <abarnert(a)yahoo.com> wrote:
> On Dec 27, 2015, at 09:04, Guido van Rossum <guido(a)python.org> wrote:
>
> > If we want some way to turn something that just defines __getitem__ and
> __len__ into a proper sequence, it should just be made to inherit from
> Sequence, which supplies the default __iter__ and __reversed__.
> (Registration is *not* good enough here.)
>
> So, if I understand correctly, you're hoping that we can first make the
> old-style sequence protocol unnecessary, except for backward compatibility,
> and then maybe change the docs to only mention it for backward
> compatibility, and only then deprecate it?
>
That sounds about right.
> I think it's worth doing those first two steps, but not actually
> deprecating it, at least while Python 2.7 is still around; otherwise, for
> dual-version code, something like Steven D'Aprano's "Squares" type would
> have to copy Indexable from the 3.x stdlib or get it from some third-party
> module like six or backports.collections.
>
Yes, that's fine. Deprecation sometimes just has to take a really long time.
> > If we really want a way to turn something that just supports __getitem__
> into an Iterable maybe we can provide an additional ABC for that purpose;
> let's call it a HalfSequence until we've come up with a better name. (We
> can't use Iterable for this because Iterable should not reference
> __getitem__.)
>
> #25988 (using Nick's name Indexable, and the details from that post).
>
Oh, interesting. Though I have misgivings about that name.
> > I also think it's fine to introduce Reversible as another ABC and
> carefully fit it into the existing hierarchy. It should be a one-trick pony
> and be another base class for Sequence; it should not have a default
> implementation. (But this has been beaten to death in other threads -- it's
> time to just file an issue with a patch.)
>
> #25987.
Thanks!
--
--Guido van Rossum (python.org/~guido)
Suppose I have a file called randomFile.py which reads like this:
class A:
    def __init__(self, foo):
        self.foo = foo
        self.bar = bar(foo)

class B(A):
    pass

class C(B):
    pass

def bar(foo):
    return foo + 1
Suppose in another file in the same directory, I have another python
program.
from randomFile import C
# some code
When C has to be imported, B also has to be imported because it is the
parent. Therefore, A also has to be imported. This also results in the
function bar being imported. When from ... import ... is called, does
Python follow all the references and import everything that is needed, or
does it just import the whole namespace (making wildcard imports acceptable
:O)?
--
-Surya Subbarao