It has been a while since I posted a copy of PEP 1 to the mailing
lists and newsgroups. I've recently done some updating of a few
sections, so in the interest of gaining wider community participation
in the Python development process, I'm posting the latest revision of
PEP 1 here. A version of the PEP is always available on-line at
-------------------- snip snip --------------------
Title: PEP Purpose and Guidelines
Version: $Revision: 1.36 $
Last-Modified: $Date: 2002/07/29 18:34:59 $
Author: Barry A. Warsaw, Jeremy Hylton
Post-History: 21-Mar-2001, 29-Jul-2002
What is a PEP?
PEP stands for Python Enhancement Proposal. A PEP is a design
document providing information to the Python community, or
describing a new feature for Python. The PEP should provide a
concise technical specification of the feature and a rationale for
We intend PEPs to be the primary mechanisms for proposing new
features, for collecting community input on an issue, and for
documenting the design decisions that have gone into Python. The
PEP author is responsible for building consensus within the
community and documenting dissenting opinions.
Because the PEPs are maintained as plain text files under CVS
control, their revision history is the historical record of the
Kinds of PEPs
There are two kinds of PEPs. A standards track PEP describes a
new feature or implementation for Python. An informational PEP
describes a Python design issue, or provides general guidelines or
information to the Python community, but does not propose a new
feature. Informational PEPs do not necessarily represent a Python
community consensus or recommendation, so users and implementors
are free to ignore informational PEPs or follow their advice.
PEP Work Flow
The PEP editor, Barry Warsaw <peps(a)python.org>, assigns numbers
for each PEP and changes its status.
The PEP process begins with a new idea for Python. It is highly
recommended that a single PEP contain a single key proposal or new
idea. The more focussed the PEP, the more successfully it tends
to be. The PEP editor reserves the right to reject PEP proposals
if they appear too unfocussed or too broad. If in doubt, split
your PEP into several well-focussed ones.
Each PEP must have a champion -- someone who writes the PEP using
the style and format described below, shepherds the discussions in
the appropriate forums, and attempts to build community consensus
around the idea. The PEP champion (a.k.a. Author) should first
attempt to ascertain whether the idea is PEP-able. Small
enhancements or patches often don't need a PEP and can be injected
into the Python development work flow with a patch submission to
the SourceForge patch manager or feature request tracker.
The PEP champion then emails the PEP editor <peps(a)python.org> with
a proposed title and a rough, but fleshed out, draft of the PEP.
This draft must be written in PEP style as described below.
If the PEP editor approves, he will assign the PEP a number, label
it as standards track or informational, give it status 'draft',
and create and check-in the initial draft of the PEP. The PEP
editor will not unreasonably deny a PEP. Reasons for denying PEP
status include duplication of effort, being technically unsound,
not providing proper motivation or addressing backwards
compatibility, or not in keeping with the Python philosophy. The
BDFL (Benevolent Dictator for Life, Guido van Rossum) can be
consulted during the approval phase, and is the final arbitrator
of the draft's PEP-ability.
If a pre-PEP is rejected, the author may elect to take the pre-PEP
to the comp.lang.python newsgroup (a.k.a. python-list(a)python.org
mailing list) to help flesh it out, gain feedback and consensus
from the community at large, and improve the PEP for
The author of the PEP is then responsible for posting the PEP to
the community forums, and marshaling community support for it. As
updates are necessary, the PEP author can check in new versions if
they have CVS commit permissions, or can email new PEP versions to
the PEP editor for committing.
Standards track PEPs consists of two parts, a design document and
a reference implementation. The PEP should be reviewed and
accepted before a reference implementation is begun, unless a
reference implementation will aid people in studying the PEP.
Standards Track PEPs must include an implementation - in the form
of code, patch, or URL to same - before it can be considered
PEP authors are responsible for collecting community feedback on a
PEP before submitting it for review. A PEP that has not been
discussed on python-list(a)python.org and/or python-dev(a)python.org
will not be accepted. However, wherever possible, long open-ended
discussions on public mailing lists should be avoided. Strategies
to keep the discussions efficient include, setting up a separate
SIG mailing list for the topic, having the PEP author accept
private comments in the early design phases, etc. PEP authors
should use their discretion here.
Once the authors have completed a PEP, they must inform the PEP
editor that it is ready for review. PEPs are reviewed by the BDFL
and his chosen consultants, who may accept or reject a PEP or send
it back to the author(s) for revision.
Once a PEP has been accepted, the reference implementation must be
completed. When the reference implementation is complete and
accepted by the BDFL, the status will be changed to `Final.'
A PEP can also be assigned status `Deferred.' The PEP author or
editor can assign the PEP this status when no progress is being
made on the PEP. Once a PEP is deferred, the PEP editor can
re-assign it to draft status.
A PEP can also be `Rejected'. Perhaps after all is said and done
it was not a good idea. It is still important to have a record of
PEPs can also be replaced by a different PEP, rendering the
original obsolete. This is intended for Informational PEPs, where
version 2 of an API can replace version 1.
PEP work flow is as follows:
Draft -> Accepted -> Final -> Replaced
Some informational PEPs may also have a status of `Active' if they
are never meant to be completed. E.g. PEP 1.
What belongs in a successful PEP?
Each PEP should have the following parts:
1. Preamble -- RFC822 style headers containing meta-data about the
PEP, including the PEP number, a short descriptive title
(limited to a maximum of 44 characters), the names, and
optionally the contact info for each author, etc.
2. Abstract -- a short (~200 word) description of the technical
issue being addressed.
3. Copyright/public domain -- Each PEP must either be explicitly
labelled as placed in the public domain (see this PEP as an
example) or licensed under the Open Publication License.
4. Specification -- The technical specification should describe
the syntax and semantics of any new language feature. The
specification should be detailed enough to allow competing,
interoperable implementations for any of the current Python
platforms (CPython, JPython, Python .NET).
5. Motivation -- The motivation is critical for PEPs that want to
change the Python language. It should clearly explain why the
existing language specification is inadequate to address the
problem that the PEP solves. PEP submissions without
sufficient motivation may be rejected outright.
6. Rationale -- The rationale fleshes out the specification by
describing what motivated the design and why particular design
decisions were made. It should describe alternate designs that
were considered and related work, e.g. how the feature is
supported in other languages.
The rationale should provide evidence of consensus within the
community and discuss important objections or concerns raised
7. Backwards Compatibility -- All PEPs that introduce backwards
incompatibilities must include a section describing these
incompatibilities and their severity. The PEP must explain how
the author proposes to deal with these incompatibilities. PEP
submissions without a sufficient backwards compatibility
treatise may be rejected outright.
8. Reference Implementation -- The reference implementation must
be completed before any PEP is given status 'Final,' but it
need not be completed before the PEP is accepted. It is better
to finish the specification and rationale first and reach
consensus on it before writing code.
The final implementation must include test code and
documentation appropriate for either the Python language
reference or the standard library reference.
PEPs are written in plain ASCII text, and should adhere to a
rigid style. There is a Python script that parses this style and
converts the plain text PEP to HTML for viewing on the web.
PEP 9 contains a boilerplate template you can use to get
started writing your PEP.
Each PEP must begin with an RFC822 style header preamble. The
headers must appear in the following order. Headers marked with
`*' are optional and are described below. All other headers are
PEP: <pep number>
Title: <pep title>
Version: <cvs version string>
Last-Modified: <cvs date string>
Author: <list of authors' real names and optionally, email addrs>
* Discussions-To: <email address>
Status: <Draft | Active | Accepted | Deferred | Final | Replaced>
Type: <Informational | Standards Track>
* Requires: <pep numbers>
Created: <date created on, in dd-mmm-yyyy format>
* Python-Version: <version number>
Post-History: <dates of postings to python-list and python-dev>
* Replaces: <pep number>
* Replaced-By: <pep number>
The Author: header lists the names and optionally, the email
addresses of all the authors/owners of the PEP. The format of the
author entry should be
address(a)dom.ain (Random J. User)
if the email address is included, and just
Random J. User
if the address is not given. If there are multiple authors, each
should be on a separate line following RFC 822 continuation line
conventions. Note that personal email addresses in PEPs will be
obscured as a defense against spam harvesters.
Standards track PEPs must have a Python-Version: header which
indicates the version of Python that the feature will be released
with. Informational PEPs do not need a Python-Version: header.
While a PEP is in private discussions (usually during the initial
Draft phase), a Discussions-To: header will indicate the mailing
list or URL where the PEP is being discussed. No Discussions-To:
header is necessary if the PEP is being discussed privately with
the author, or on the python-list or python-dev email mailing
lists. Note that email addresses in the Discussions-To: header
will not be obscured.
Created: records the date that the PEP was assigned a number,
while Post-History: is used to record the dates of when new
versions of the PEP are posted to python-list and/or python-dev.
Both headers should be in dd-mmm-yyyy format, e.g. 14-Aug-2001.
PEPs may have a Requires: header, indicating the PEP numbers that
this PEP depends on.
PEPs may also have a Replaced-By: header indicating that a PEP has
been rendered obsolete by a later document; the value is the
number of the PEP that replaces the current document. The newer
PEP must have a Replaces: header containing the number of the PEP
that it rendered obsolete.
PEP Formatting Requirements
PEP headings must begin in column zero and the initial letter of
each word must be capitalized as in book titles. Acronyms should
be in all capitals. The body of each section must be indented 4
spaces. Code samples inside body sections should be indented a
further 4 spaces, and other indentation can be used as required to
make the text readable. You must use two blank lines between the
last line of a section's body and the next section heading.
You must adhere to the Emacs convention of adding two spaces at
the end of every sentence. You should fill your paragraphs to
column 70, but under no circumstances should your lines extend
past column 79. If your code samples spill over column 79, you
should rewrite them.
Tab characters must never appear in the document at all. A PEP
should include the standard Emacs stanza included by example at
the bottom of this PEP.
A PEP must contain a Copyright section, and it is strongly
recommended to put the PEP in the public domain.
When referencing an external web page in the body of a PEP, you
should include the title of the page in the text, with a
footnote reference to the URL. Do not include the URL in the body
text of the PEP. E.g.
Refer to the Python Language web site  for more details.
When referring to another PEP, include the PEP number in the body
text, such as "PEP 1". The title may optionally appear. Add a
footnote reference that includes the PEP's title and author. It
may optionally include the explicit URL on a separate line, but
only in the References section. Note that the pep2html.py script
will calculate URLs automatically, e.g.:
Refer to PEP 1  for more information about PEP style
 PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
If you decide to provide an explicit URL for a PEP, please use
this as the URL template:
PEP numbers in URLs must be padded with zeros from the left, so as
to be exactly 4 characters wide, however PEP numbers in text are
Reporting PEP Bugs, or Submitting PEP Updates
How you report a bug, or submit a PEP update depends on several
factors, such as the maturity of the PEP, the preferences of the
PEP author, and the nature of your comments. For the early draft
stages of the PEP, it's probably best to send your comments and
changes directly to the PEP author. For more mature, or finished
PEPs you may want to submit corrections to the SourceForge bug
manager or better yet, the SourceForge patch manager so that
your changes don't get lost. If the PEP author is a SF developer,
assign the bug/patch to him, otherwise assign it to the PEP
When in doubt about where to send your changes, please check first
with the PEP author and/or PEP editor.
PEP authors who are also SF committers, can update the PEPs
themselves by using "cvs commit" to commit their changes.
Remember to also push the formatted PEP text out to the web by
doing the following:
% python pep2html.py -i NUM
where NUM is the number of the PEP you want to push out. See
% python pep2html.py --help
Transferring PEP Ownership
It occasionally becomes necessary to transfer ownership of PEPs to
a new champion. In general, we'd like to retain the original
author as a co-author of the transferred PEP, but that's really up
to the original author. A good reason to transfer ownership is
because the original author no longer has the time or interest in
updating it or following through with the PEP process, or has
fallen off the face of the 'net (i.e. is unreachable or not
responding to email). A bad reason to transfer ownership is
because you don't agree with the direction of the PEP. We try to
build consensus around a PEP, but if that's not possible, you can
always submit a competing PEP.
If you are interested assuming ownership of a PEP, send a message
asking to take over, addressed to both the original author and the
PEP editor <peps(a)python.org>. If the original author doesn't
respond to email in a timely manner, the PEP editor will make a
unilateral decision (it's not like such decisions can be
References and Footnotes
 This historical record is available by the normal CVS commands
for retrieving older revisions. For those without direct access
to the CVS tree, you can browse the current and past PEP revisions
via the SourceForge web site at
 The script referred to here is pep2html.py, which lives in
the same directory in the CVS tree as the PEPs themselves.
Try "pep2html.py --help" for details.
The URL for viewing PEPs on the web is
 PEP 9, Sample PEP Template
This document has been placed in the public domain.
In Python 2.5 `0or` was accepted by the Python parser. It became an
error in 2.6 because "0o" became recognizing as an incomplete octal
number. `1or` still is accepted.
On other hand, `1if 2else 3` is accepted despites the fact that "2e" can
be recognized as an incomplete floating point number. In this case the
tokenizer pushes "e" back and returns "2".
Shouldn't it do the same with "0o"? It is possible to make `0or` be
parseable again. Python implementation is able to tokenize this example:
$ echo '0or' | ./python -m tokenize
1,0-1,1: NUMBER '0'
1,1-1,3: NAME 'or'
1,3-1,4: OP '['
1,4-1,5: OP ']'
1,5-1,6: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
On other hand, all these examples look weird. There is an assymmetry:
`1or 2` is a valid syntax, but `1 or2` is not. It is hard to recognize
visually the boundary between a number and the following identifier or
keyword, especially if numbers can contain letters ("b", "e", "j", "o",
"x") and underscores, and identifiers can contain digits. On both sides
of the boundary can be letters, digits, and underscores.
I propose to change the Python syntax by adding a requirement that there
should be a whitespace or delimiter between a numeric literal and the
webmaster has already heard from 4 people who cannot install it.
I sent them to the bug tracker or to python-list but they seem
not to have gone either place. Is there some guide I should be
sending them to, 'how to debug installation problems'?
If one goes to httWhps://www.python.org/downloads
<https://www.python.org/downloads> from a Windows browser, the default
download URL is for the 32-bit installer instead of the 64-bit one.
I wonder why is this still the case?
Shouldn't we encourage new Windows users (who may not even know the
distinction between the two architectures) to use the 64-bit version of
Python, since most likely they can?
If this is not the correct forum for this, please let me know where I can
direct my question/feature request, thanks.
I'd like to submit this PEP for discussion. It is quite specialized
and the main target audience of the proposed changes is
users and authors of applications/libraries transferring large amounts
of data (read: the scientific computing & data science ecosystems).
The PEP text is also inlined below.
Title: Pickle protocol 5 with out-of-band data
Author: Antoine Pitrou <solipsis(a)pitrou.net>
Type: Standards Track
This PEP proposes to standardize a new pickle protocol version, and
accompanying APIs to take full advantage of it:
1. A new pickle protocol version (5) to cover the extra metadata needed
for out-of-band data buffers.
2. A new ``PickleBuffer`` type for ``__reduce_ex__`` implementations
to return out-of-band data buffers.
3. A new ``buffer_callback`` parameter when pickling, to handle out-of-band
4. A new ``buffers`` parameter when unpickling to provide out-of-band data
The PEP guarantees unchanged behaviour for anyone not using the new APIs.
The pickle protocol was originally designed in 1995 for on-disk persistency
of arbitrary Python objects. The performance of a 1995-era storage medium
probably made it irrelevant to focus on performance metrics such as
use of RAM bandwidth when copying temporary data before writing it to disk.
Nowadays the pickle protocol sees a growing use in applications where most
of the data isn't ever persisted to disk (or, when it is, it uses a portable
format instead of Python-specific). Instead, pickle is being used to transmit
data and commands from one process to another, either on the same machine
or on multiple machines. Those applications will sometimes deal with very
large data (such as Numpy arrays or Pandas dataframes) that need to be
transferred around. For those applications, pickle is currently
wasteful as it imposes spurious memory copies of the data being serialized.
As a matter of fact, the standard ``multiprocessing`` module uses pickle
for serialization, and therefore also suffers from this problem when
sending large data to another process.
Third-party Python libraries, such as Dask [#dask]_, PyArrow [#pyarrow]_
and IPyParallel [#ipyparallel]_, have started implementing alternative
serialization schemes with the explicit goal of avoiding copies on large
data. Implementing a new serialization scheme is difficult and often
leads to reduced generality (since many Python objects support pickle
but not the new serialization scheme). Falling back on pickle for
unsupported types is an option, but then you get back the spurious
memory copies you wanted to avoid in the first place. For example,
``dask`` is able to avoid memory copies for Numpy arrays and
built-in containers thereof (such as lists or dicts containing Numpy
arrays), but if a large Numpy array is an attribute of a user-defined
object, ``dask`` will serialize the user-defined object as a pickle
stream, leading to memory copies.
The common theme of these third-party serialization efforts is to generate
a stream of object metadata (which contains pickle-like information about
the objects being serialized) and a separate stream of zero-copy buffer
objects for the payloads of large objects. Note that, in this scheme,
small objects such as ints, etc. can be dumped together with the metadata
stream. Refinements can include opportunistic compression of large data
depending on its type and layout, like ``dask`` does.
This PEP aims to make ``pickle`` usable in a way where large data is handled
as a separate stream of zero-copy buffers, letting the application handle
those buffers optimally.
To keep the example simple and avoid requiring knowledge of third-party
libraries, we will focus here on a bytearray object (but the issue is
conceptually the same with more sophisticated objects such as Numpy arrays).
Like most objects, the bytearray object isn't immediately understood by
the pickle module and must therefore specify its decomposition scheme.
Here is how a bytearray object currently decomposes for pickling::
(<class 'bytearray'>, (b'abc',), None)
This is because the ``bytearray.__reduce_ex__`` implementation reads
morally as follows::
def __reduce_ex__(self, protocol):
if protocol == 4:
return type(self), bytes(self), None
# Legacy code for earlier protocols omitted
In turn it produces the following pickle code::
>>> pickletools.dis(pickletools.optimize(pickle.dumps(b, protocol=4)))
0: \x80 PROTO 4
2: \x95 FRAME 30
11: \x8c SHORT_BINUNICODE 'builtins'
21: \x8c SHORT_BINUNICODE 'bytearray'
32: \x93 STACK_GLOBAL
33: C SHORT_BINBYTES b'abc'
38: \x85 TUPLE1
39: R REDUCE
40: . STOP
(the call to ``pickletools.optimize`` above is only meant to make the
pickle stream more readable by removing the MEMOIZE opcodes)
We can notice several things about the bytearray's payload (the sequence
of bytes ``b'abc'``):
* ``bytearray.__reduce_ex__`` produces a first copy by instantiating a
new bytes object from the bytearray's data.
* ``pickle.dumps`` produces a second copy when inserting the contents of
that bytes object into the pickle stream, after the SHORT_BINBYTES opcode.
* Furthermore, when deserializing the pickle stream, a temporary bytes
object is created when the SHORT_BINBYTES opcode is encountered (inducing
a data copy).
What we really want is something like the following:
* ``bytearray.__reduce_ex__`` produces a *view* of the bytearray's data.
* ``pickle.dumps`` doesn't try to copy that data into the pickle stream
but instead passes the buffer view to its caller (which can decide on the
most efficient handling of that buffer).
* When deserializing, ``pickle.loads`` takes the pickle stream and the
buffer view separately, and passes the buffer view directly to the
We see that several conditions are required for the above to work:
* ``__reduce__`` or ``__reduce_ex__`` must be able to return *something*
that indicates a serializable no-copy buffer view.
* The pickle protocol must be able to represent references to such buffer
views, instructing the unpickler that it may have to get the actual buffer
out of band.
* The ``pickle.Pickler`` API must provide its caller with a way
to receive such buffer views while serializing.
* The ``pickle.Unpickler`` API must similarly allow its caller to provide
the buffer views required for deserialization.
* For compatibility, the pickle protocol must also be able to contain direct
serializations of such buffer views, such that current uses of the ``pickle``
API don't have to be modified if they are not concerned with memory copies.
We are introducing a new type ``pickle.PickleBuffer`` which can be
instantiated from any buffer-supporting object, and is specifically meant
to be returned from ``__reduce__`` implementations::
def __reduce_ex__(self, protocol):
if protocol == 5:
return type(self), PickleBuffer(self), None
# Legacy code for earlier protocols omitted
``PickleBuffer`` is a simple wrapper that doesn't have all the memoryview
semantics and functionality, but is specifically recognized by the ``pickle``
module if protocol 5 or higher is enabled. It is an error to try to
serialize a ``PickleBuffer`` with pickle protocol version 4 or earlier.
Only the raw *data* of the ``PickleBuffer`` will be considered by the
``pickle`` module. Any type-specific *metadata* (such as shapes or
datatype) must be returned separately by the type's ``__reduce__``
implementation, as is already the case.
The ``PickleBuffer`` class supports a very simple Python API. Its constructor
takes a single PEP 3118-compatible object [#pep-3118]_. ``PickleBuffer``
objects themselves support the buffer protocol, so consumers can
call ``memoryview(...)`` on them to get additional information
about the underlying buffer (such as the original type, shape, etc.).
On the C side, a simple API will be provided to create and inspect
``PyObject *PyPickleBuffer_FromObject(PyObject *obj)``
Create a ``PickleBuffer`` object holding a view over the PEP 3118-compatible
Return whether *obj* is a ``PickleBuffer`` instance.
``const Py_buffer *PyPickleBuffer_GetBuffer(PyObject *picklebuf)``
Return a pointer to the internal ``Py_buffer`` owned by the ``PickleBuffer``
``PickleBuffer`` can wrap any kind of buffer, including non-contiguous
buffers. It's up to consumers to decide how best to handle different kinds
of buffers (for example, some consumers may find it acceptable to make a
contiguous copy of non-contiguous buffers).
``pickle.Pickler.__init__`` and ``pickle.dumps`` are augmented with an additional
def __init__(self, file, protocol=None, ..., buffer_callback=None):
If *buffer_callback* is not None, then it is called with a list
of out-of-band buffer views when deemed necessary (this could be
once every buffer, or only after a certain size is reached,
or once at the end, depending on implementation details). The
callback should arrange to store or transmit those buffers without
changing their order.
If *buffer_callback* is None (the default), buffer views are
serialized into *file* as part of the pickle stream.
It is an error if *buffer_callback* is not None and *protocol* is
None or smaller than 5.
def pickle.dumps(obj, protocol=None, *, ..., buffer_callback=None):
See above for *buffer_callback*.
``pickle.Unpickler.__init__`` and ``pickle.loads`` are augmented with an
additional ``buffers`` parameter::
def __init__(file, *, ..., buffers=None):
If *buffers* is not None, it should be an iterable of buffer-enabled
objects that is consumed each time the pickle stream references
an out-of-band buffer view. Such buffers have been given in order
to the *buffer_callback* of a Pickler object.
If *buffers* is None (the default), then the buffers are taken
from the pickle stream, assuming they are serialized there.
It is an error for *buffers* to be None if the pickle stream
was produced with a non-None *buffer_callback*.
def pickle.loads(data, *, ..., buffers=None):
See above for *buffers*.
Three new opcodes are introduced:
* ``BYTEARRAY`` creates a bytearray from the data following it in the pickle
stream and pushes it on the stack (just like ``BINBYTES8`` does for bytes
* ``NEXT_BUFFER`` fetches a buffer from the ``buffers`` iterable and pushes
it on the stack.
* ``READONLY_BUFFER`` makes a readonly view of the top of the stack.
When pickling encounters a ``PickleBuffer``, there can be four cases:
* If a ``buffer_callback`` is given and the ``PickleBuffer`` is writable,
the ``PickleBuffer`` is given to the callback and a ``NEXT_BUFFER`` opcode
is appended to the pickle stream.
* If a ``buffer_callback`` is given and the ``PickleBuffer`` is readonly,
the ``PickleBuffer`` is given to the callback and a ``NEXT_BUFFER`` opcode
is appended to the pickle stream, followed by a ``READONLY_BUFFER`` opcode.
* If no ``buffer_callback`` is given and the ``PickleBuffer`` is writable,
it is serialized into the pickle stream as if it were a ``bytearray`` object.
* If no ``buffer_callback`` is given and the ``PickleBuffer`` is readonly,
it is serialized into the pickle stream as if it were a ``bytes`` object.
The distinction between readonly and writable buffers is explained below
PEP 3118 buffers [#pep-3118]_ can be readonly or writable. Some objects,
such as Numpy arrays, need to be backed by a mutable buffer for full
operation. Pickle consumers that use the ``buffer_callback`` and ``buffers``
arguments will have to be careful to recreate mutable buffers. When doing
I/O, this implies using buffer-passing API variants such as ``readinto``
(which are also often preferrable for performance).
If you pickle and then unpickle an object in the same process, passing
out-of-band buffer views, then the unpickled object may be backed by the
same buffer as the original pickled object.
For example, it might be reasonable to implement reduction of a Numpy array
as follows (crucial metadata such as shapes is omitted for simplicity)::
def __reduce_ex__(self, protocol):
if protocol == 5:
return numpy.frombuffer, (PickleBuffer(self), self.dtype)
# Legacy code for earlier protocols omitted
Then simply passing the PickleBuffer around from ``dumps`` to ``loads``
will produce a new Numpy array sharing the same underlying memory as the
original Numpy object (and, incidentally, keeping it alive)::
>>> import numpy as np
>>> a = np.zeros(10)
>>> buffers = 
>>> data = pickle.dumps(a, protocol=5, buffer_callback=buffers.extend)
>>> b = pickle.loads(data, buffers=buffers)
>>> b = 42
This won't happen with the traditional ``pickle`` API (i.e. without passing
``buffers`` and ``buffer_callback`` parameters), because then the buffer view
is serialized inside the pickle stream with a copy.
The ``pickle`` persistence interface is a way of storing references to
designated objects in the pickle stream while handling their actual
serialization out of band. For example, one might consider the following
for zero-copy serialization of bytearrays::
def __init__(self, *args, **kwargs):
self.buffers = 
def persistent_id(self, obj):
if type(obj) is not bytearray:
index = len(self.buffers)
return ('bytearray', index)
def __init__(self, *args, buffers, **kwargs):
self.buffers = buffers
def persistent_load(self, pid):
type_tag, index = pid
if type_tag == 'bytearray':
assert 0 # unexpected type
This mechanism has two drawbacks:
* Each ``pickle`` consumer must reimplement ``Pickler`` and ``Unpickler``
subclasses, with custom code for each type of interest. Essentially,
N pickle consumers end up each implementing custom code for M producers.
This is difficult (especially for sophisticated types such as Numpy
arrays) and poorly scalable.
* Each object encountered by the pickle module (even simple built-in objects
such as ints and strings) triggers a call to the user's ``persistent_id()``
method, leading to a possible performance drop compared to nominal.
Should ``buffer_callback`` take a single buffers or a sequence of buffers?
* Taking a single buffer would allow returning a boolean indicating whether
the given buffer is serialized in-band or out-of-band.
* Taking a sequence of buffers is potentially more efficient by reducing
function call overhead.
Dask.distributed implements a custom zero-copy serialization with fallback
to pickle [#dask-serialization]_.
PyArrow implements zero-copy component-based serialization for a few
selected types [#pyarrow-serialization]_.
PEP 554 proposes hosting multiple interpreters in a single process, with
provisions for transferring buffers between interpreters as a communication
Thanks to the following people for early feedback: Nick Coghlan, Olivier
Grisel, Stefan Krah, MinRK, Matt Rocklin, Eric Snow.
.. [#dask] Dask.distributed -- A lightweight library for distributed computing
.. [#dask-serialization] Dask.distributed custom serialization
.. [#ipyparallel] IPyParallel -- Using IPython for parallel computing
.. [#pyarrow] PyArrow -- A cross-language development platform for in-memory data
.. [#pyarrow-serialization] PyArrow IPC and component-based serialization
.. [#pep-3118] PEP 3118 -- Revising the buffer protocol
.. [#pep-554] PEP 554 -- Multiple Interpreters in the Stdlib
This document has been placed into the public domain.
On Twitter, Raymond Hettinger wrote:
"The decision making process on Python-dev is an anti-pattern,
governed by anecdotal data and ambiguity over what problem is solved."
About "anecdotal data", I would like to discuss the Python startup time.
== Python 3.7 compared to 2.7 ==
First of all, on speed.python.org, we have:
* Python 2.7: 6.4 ms with site, 3.0 ms without site (-S)
* master (3.7): 14.5 ms with site, 8.4 ms without site (-S)
Python 3.7 startup time is 2.3x slower with site (default mode), or
2.8x slower without site (-S command line option).
(I will skip Python 3.4, 3.5 and 3.6 which are much worse than Python 3.7...)
So if an user complained about Python 2.7 startup time: be prepared
for a 2x - 3x more angry user when "forced" to upgrade to Python 3!
== Mercurial vs Git, Python vs C, startup time ==
Startup time matters a lot for Mercurial since Mercurial is compared
to Git. Git and Mercurial have similar features, but Git is written in
C whereas Mercurial is written in Python. Quick benchmark on the
* hg version: 44.6 ms +- 0.2 ms
* git --version: 974 us +- 7 us
Mercurial startup time is already 45.8x slower than Git whereas tested
Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
developers, with a startup time 2x - 3x slower...
I tested Mecurial 3.7.3 and Git 2.7.4 on Ubuntu 16.04.1 using "python3
-m perf command -- ...".
== CPython core developers don't care? no, they do care ==
Christian Heimes, Naoki INADA, Serhiy Storchaka, Yury Selivanov, me
(Victor Stinner) and other core developers made multiple changes last
years to reduce the number of imports at startup, optimize impotlib,
IHMO all these core developers are well aware of the competition of
programming languages, and honesty Python startup time isn't "good".
So let's compare it to other programming languages similar to Python.
== PHP, Ruby, Perl ==
I measured the startup time of other programming languages which are
similar to Python, still on the speed.python.org server using "python3
-m perf command -- ...":
* perl -e ' ': 1.18 ms +- 0.01 ms
* php -r ' ': 8.57 ms +- 0.05 ms
* ruby -e ' ': 32.8 ms +- 0.1 ms
Wow, Perl is quite good! PHP seems as good as Python 2 (but Python 3
is worse). Ruby startup time seems less optimized than other
* perl 5, version 22, subversion 1 (v5.22.1)
* PHP 7.0.18-0ubuntu0.16.04.1 (cli) ( NTS )
* ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]
== Quick Google search ==
I also searched for "python startup time" and "python slow startup
time" on Google and found many articles. Some examples:
"Reducing the Python startup time"
=> "The python startup time always nagged me (17-30ms) and I just
searched again for a way to reduce it, when I found this: The
Python-Launcher caches GTK imports and forks new processes to reduce
the startup time of python GUI programs."
=> "Wow, Python startup time is worse than I thought."
"How to speed up python starting up and/or reduce file search while
=> "The first time I log to the system and start one command it takes
6 seconds just to show a few line of help. If I immediately issue the
same command again it takes 0.1s. After a couple of minutes it gets
back to 6s. (proof of short-lived cache)"
"How does one optimise the startup of a Python script/program?"
=> "I wrote a Python program that would be used very often (imagine
'cd' or 'ls') for very short runtimes, how would I make it start up as
fast as possible?"
"Python Interpreter Startup time"
"Python is very slow to start on Windows 7"
=> "Python takes 17 times longer to load on my Windows 7 machine than
Ubuntu 14.04 running on a VM"
=> "returns in 0.614s on Windows and 0.036s on Linux"
"How to make a fast command line tool in Python" (old article Python 2.5.2)
=> "(...) some techniques Bazaar uses to start quickly, such as lazy imports."
So please continue efforts for make Python startup even faster to beat
all other programming languages, and finally convince Mercurial to
Let me present PEP 579 and PEP 580.
PEP 579 is an informational meta-PEP, listing some of the issues with
functions/methods implemented in C. The idea is to create several PEPs
each fix some part of the issues mentioned in PEP 579.
PEP 580 is a standards track PEP to introduce a new "C call" protocol,
which is an important part of PEP 579. In the reference implementation
(which is work in progress), this protocol will be used by built-in
functions and methods. However, it should be used by more classes in the
You find the texts at
On 2018-07-31 08:58, Antoine Pitrou wrote:
> I think Stefan is right that we
> should push people towards Cython and alternatives, rather than direct
> use of the C API (which people often fail to use correctly, in my
I know this probably isn't the correct place to bring it up, but I'm
sure that CPython itself could benefit from using Cython. For example,
most of the C extensions in Modules/ could be written in Cython.
I just sent an email to the capi-sig mailing list. Since this mailing
list was idle for months, I copy my email here to get a wider
audience. But if possible, I would prefer that you join me on capi-sig
to reply ;-)
Last year, I gave a talk at the Language Summit (during Pycon) to
explain that CPython should become 2x faster to remain competitive.
IMHO all attempts to optimize Python (CPython forks) have failed
because they have been blocked by the C API which implies strict
I started to write a proposal to change the C API to hide
implementation details, to prepare CPython for future changes. It
allows to experimental optimization ideas without loosing support for
C extensions are a large part of the Python success. They are also the
reason why PyPy didn't replace CPython yet. PyPy cpyext remains slower
than CPython because PyPy has to mimick CPython which adds a
significant overhead (even if PyPy developers are working *hard* to
I created a new to discuss how to introduce backward incompatible
changes in the C API without breaking all C extensions:
The source can be found at:
I would like to create a team of people who want to work on this
project: CPython, PyPy, Cython and anyone who depends on the C API.
Contact me in private if you want to be added to the GitHub project.
I propose to discuss on the capi-sig mailing list since I would like
to involve people from various projects, and I don't want to bother
you with the high traffic of python-dev.
PS: I added some people as BCC ;-)
I finished my work on the _PyCoreConfig structure: it's a C structure
in Include/pystate.h which has many fields used to configure Python
initialization. In Python 3.6 and older, these parameters were scatted
around the code, and it was hard to get an exhaustive list of it.
This work is linked to the Nick Coghlan's PEP 432 "Restructuring the
CPython startup sequence":
Right now, the new API is still private. Nick Coghlan splitted the
initialization in two parts: "core" and "main". I'm not sure that this
split is needed. We should see what to do, but it would be nice to
make the _PyCoreConfig API public! IMHO it's way better than the old
way to configuration Python initialization.
It is now possible to only use _PyCoreConfig to initialize Python: it
overrides old ways to configure Python like environment variables (ex:
PYTHONPATH), global configuration variables (ex: Py_BytesWarningFlag)
and C functions (ex: Py_SetProgramName()).
I added tests to test_embed on the different ways to configure Python
* environment variables (ex: PYTHONPATH)
* global configuration variables (ex: Py_BytesWarningFlag) and C
functions (ex: Py_SetProgramName())
I found and fixed many issues when writing these tests :-)
Reading the current configuration, _PyCoreConfig_Read(), no longer
changes the configuration. Now the code to read the configuration and
the code to apply the configuration is properly separated.
The work is not fully complete, there are a few remaining corner cases
and some parameters (ex: Py_FrozenFlag) which cannot be set by
_PyCoreConfig yet. My latest issue used to work on this API:
I had to refactor a lot of code to implement all of that.
The problem is that Python 3.7 got the half-baked implementation, and
it caused issues:
* Calling Py_Main() after Py_Initialize() fails with a fatal error on
* PYTHONOPTIMIZE environment variable is ignored by Py_Initialize()
I fixed the first issue, I'm now working on the second one to see how
it can be fixed. Other option would be to backport the code from
master to the 3.7 branch, since the code in master has a way better
design. But it requires to backport a lot of changes. I'm not sure yet
what is the best option.