Python-Dev
August 2017
- 67 participants
- 51 discussions
I'm back, I've re-read the PEP, and I've re-read the long thread with "(no
subject)".
I think Georg Brandl nailed it:
"""
*I like the "sequence and dict flattening" part of the PEP, mostly because
it is consistent and should be easy to understand, but the comprehension
syntax enhancements seem to be bad for readability and "comprehending" what
the code does. The call syntax part is a mixed bag on the one hand it is nice
to be consistent with the extended possibilities in literals (flattening),
but on the other hand there would be small but annoying inconsistencies
anyways (e.g. the duplicate kwarg case above).*
"""
Greg Ewing followed up explaining that the inconsistency between dict
flattening and call syntax is inherent in the pre-existing different rules
for dicts vs. keyword args: {'a':1, 'a':2} results in {'a':2}, while f(a=1,
a=2) is an error. (This form is a SyntaxError; the dynamic case f(a=1,
**{'a': 1}) is a TypeError.)
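A minimal sketch of the two existing rules Greg describes, runnable on any Python 3. The function name `f` is only a placeholder:

```python
# A dict display keeps the last value given for a repeated key.
d = {'a': 1, 'a': 2}
print(d)  # {'a': 2}


def f(**kwargs):
    return kwargs


# A repeated keyword supplied dynamically is rejected when the call runs.
try:
    f(a=1, **{'a': 1})
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```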
For me, allowing f(*a, *b) and f(**d, **e) and all the other combinations
for function calls proposed by the PEP is an easy +1 -- it's a
straightforward extension of the existing pattern, and anybody who knows
what f(x, *a) does will understand f(x, *a, y, *b). Guessing what f(**d,
**e) means shouldn't be hard either. Understanding the edge case for
duplicate keys with f(**d, **e) is a little harder, but the error messages
are pretty clear, and it is not a new edge case.
The sequence and dict flattening syntax proposals are also clean and
logical -- we already have *-unpacking on the receiving side, so allowing
*x in tuple expressions reads pretty naturally (and the similarity with *a
in argument lists certainly helps). From here, having [a, *x, b, *y] is
also natural, and then the extension to other displays is natural: {a, *x,
b, *y} and {a:1, **d, b:2, **e}. This, too, gets a +1 from me.
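As a short illustration of the two parts being accepted here, the sketch below uses placeholder names (`f`, `a`, `b`, `d`, `e`) and assumes Python 3.5 or later:

```python
def f(*args, **kwargs):
    return args, kwargs


a, b = [1, 2], (3, 4)
d, e = {'p': 1}, {'q': 2}

# Several unpackings in one call, mixed with ordinary positional arguments.
call_result = f(0, *a, 9, *b, **d, **e)

# The same star syntax inside list, set and dict displays.
as_list = [0, *a, 9, *b]
as_set = {0, *a, 9, *b}
as_dict = {'x': 0, **d, 'y': 9, **e}

# In a dict display a later duplicate key wins; in a call it is an error.
merged = {**d, **{'p': 100}}
try:
    f(**d, **{'p': 100})
except TypeError as exc:
    duplicate_error = exc
```

The last few lines show the edge case mentioned above: the display silently keeps the later value, while the call raises `TypeError`.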
So that leaves comprehensions. IIRC, during the development of the patch we
realized that f(*x for x in xs) is sufficiently ambiguous that we decided
to disallow it -- note that f(x for x in xs) is already somewhat of a
special case because an argument can only be a "bare" generator expression
if it is the only argument. The same reasoning doesn't apply (in that form)
to list, set and dict comprehensions -- while f(x for x in xs) is identical
in meaning to f((x for x in xs)), [x for x in xs] is NOT the same as [(x
for x in xs)] (that's a list of one element, and the element is a generator
expression).
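The distinction can be checked directly. This is only a small sketch, with `sum` standing in for an arbitrary one-argument call:

```python
import types

xs = [1, 2, 3]

# As the sole argument, a bare generator expression needs no extra parentheses.
assert sum(x for x in xs) == sum((x for x in xs)) == 6

# A list comprehension builds the elements themselves ...
comprehension = [x for x in xs]

# ... whereas brackets around a parenthesized generator expression build a
# one-element list whose only element is the generator object.
wrapped = [(x for x in xs)]

print(comprehension)                                # [1, 2, 3]
print(len(wrapped))                                 # 1
print(isinstance(wrapped[0], types.GeneratorType))  # True
```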
The basic premise of this part of the proposal is that if you have a few
iterables, the new proposal (without comprehensions) lets you create a list
or generator expression that iterates over all of them, essentially
flattening them:
>>> xs = [1, 2, 3]
>>> ys = ['abc', 'def']
>>> zs = [99]
>>> [*xs, *ys, *zs]
[1, 2, 3, 'abc', 'def', 99]
>>>
But now suppose you have a list of iterables:
>>> xss = [[1, 2, 3], ['abc', 'def'], [99]]
>>> [*xss[0], *xss[1], *xss[2]]
[1, 2, 3, 'abc', 'def', 99]
>>>
Wouldn't it be nice if you could write the latter using a comprehension?
>>> xss = [[1, 2, 3], ['abc', 'def'], [99]]
>>> [*xs for xs in xss]
[1, 2, 3, 'abc', 'def', 99]
>>>
This is somewhat seductive, and the following is even nicer: the *xs
position may be an expression, e.g.:
>>> xss = [[1, 2, 3], ['abc', 'def'], [99]]
>>> [*xs[:2] for xs in xss]
[1, 2, 'abc', 'def', 99]
>>>
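The starred comprehension is the proposal under discussion, so an interpreter without the patch will reject it. As a sketch of how the same results can be spelled with existing syntax (the variable names are illustrative only):

```python
from itertools import chain

xss = [[1, 2, 3], ['abc', 'def'], [99]]

# Nested comprehension: the outer loop comes first, then the inner one.
flat_nested = [x for xs in xss for x in xs]

# itertools.chain.from_iterable concatenates the inner iterables lazily.
flat_chain = list(chain.from_iterable(xss))

# The inner iterable may be any expression, e.g. a slice.
flat_sliced = [x for xs in xss for x in xs[:2]]

print(flat_nested)  # [1, 2, 3, 'abc', 'def', 99]
print(flat_sliced)  # [1, 2, 'abc', 'def', 99]
```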
On the other hand, I had to explore the possibilities here by experimenting
in the interpreter, and I discovered some odd edge cases (e.g. you can
parenthesize the starred expression, but that seems a syntactic accident).
All in all I am personally +0 on the comprehension part of the PEP, and I
like that it provides a way to "flatten" a sequence of sequences, but I
think very few people in the thread have supported this part. Therefore I
would like to ask Neil to update the PEP and the patch to take out the
comprehension part, so that the two "easy wins" can make it into Python 3.5
(basically, I am accepting two-thirds of the PEP :-). There is some time
yet until alpha 2.
I would also like code reviewers (Benjamin?) to start reviewing the patch
<http://bugs.python.org/issue2292>, taking into account that the
comprehension part needs to be removed.
--
--Guido van Rossum (python.org/~guido)
It has been a while since I posted a copy of PEP 1 to the mailing
lists and newsgroups. I've recently done some updating of a few
sections, so in the interest of gaining wider community participation
in the Python development process, I'm posting the latest revision of
PEP 1 here. A version of the PEP is always available on-line at
http://www.python.org/peps/pep-0001.html
Enjoy,
-Barry
-------------------- snip snip --------------------
PEP: 1
Title: PEP Purpose and Guidelines
Version: $Revision: 1.36 $
Last-Modified: $Date: 2002/07/29 18:34:59 $
Author: Barry A. Warsaw, Jeremy Hylton
Status: Active
Type: Informational
Created: 13-Jun-2000
Post-History: 21-Mar-2001, 29-Jul-2002
What is a PEP?
PEP stands for Python Enhancement Proposal. A PEP is a design
document providing information to the Python community, or
describing a new feature for Python. The PEP should provide a
concise technical specification of the feature and a rationale for
the feature.
We intend PEPs to be the primary mechanisms for proposing new
features, for collecting community input on an issue, and for
documenting the design decisions that have gone into Python. The
PEP author is responsible for building consensus within the
community and documenting dissenting opinions.
Because the PEPs are maintained as plain text files under CVS
control, their revision history is the historical record of the
feature proposal[1].
Kinds of PEPs
There are two kinds of PEPs. A standards track PEP describes a
new feature or implementation for Python. An informational PEP
describes a Python design issue, or provides general guidelines or
information to the Python community, but does not propose a new
feature. Informational PEPs do not necessarily represent a Python
community consensus or recommendation, so users and implementors
are free to ignore informational PEPs or follow their advice.
PEP Work Flow
The PEP editor, Barry Warsaw <peps(a)python.org>, assigns numbers
for each PEP and changes its status.
The PEP process begins with a new idea for Python. It is highly
recommended that a single PEP contain a single key proposal or new
idea. The more focussed the PEP, the more successful it tends
to be. The PEP editor reserves the right to reject PEP proposals
if they appear too unfocussed or too broad. If in doubt, split
your PEP into several well-focussed ones.
Each PEP must have a champion -- someone who writes the PEP using
the style and format described below, shepherds the discussions in
the appropriate forums, and attempts to build community consensus
around the idea. The PEP champion (a.k.a. Author) should first
attempt to ascertain whether the idea is PEP-able. Small
enhancements or patches often don't need a PEP and can be injected
into the Python development work flow with a patch submission to
the SourceForge patch manager[2] or feature request tracker[3].
The PEP champion then emails the PEP editor <peps(a)python.org> with
a proposed title and a rough, but fleshed out, draft of the PEP.
This draft must be written in PEP style as described below.
If the PEP editor approves, he will assign the PEP a number, label
it as standards track or informational, give it status 'draft',
and create and check-in the initial draft of the PEP. The PEP
editor will not unreasonably deny a PEP. Reasons for denying PEP
status include duplication of effort, being technically unsound,
not providing proper motivation or addressing backwards
compatibility, or not in keeping with the Python philosophy. The
BDFL (Benevolent Dictator for Life, Guido van Rossum) can be
consulted during the approval phase, and is the final arbitrator
of the draft's PEP-ability.
If a pre-PEP is rejected, the author may elect to take the pre-PEP
to the comp.lang.python newsgroup (a.k.a. python-list(a)python.org
mailing list) to help flesh it out, gain feedback and consensus
from the community at large, and improve the PEP for
re-submission.
The author of the PEP is then responsible for posting the PEP to
the community forums, and marshaling community support for it. As
updates are necessary, the PEP author can check in new versions if
they have CVS commit permissions, or can email new PEP versions to
the PEP editor for committing.
Standards track PEPs consist of two parts, a design document and
a reference implementation. The PEP should be reviewed and
accepted before a reference implementation is begun, unless a
reference implementation will aid people in studying the PEP.
Standards Track PEPs must include an implementation - in the form
of code, patch, or URL to same - before it can be considered
Final.
PEP authors are responsible for collecting community feedback on a
PEP before submitting it for review. A PEP that has not been
discussed on python-list(a)python.org and/or python-dev(a)python.org
will not be accepted. However, wherever possible, long open-ended
discussions on public mailing lists should be avoided. Strategies
to keep the discussions efficient include setting up a separate
SIG mailing list for the topic, having the PEP author accept
private comments in the early design phases, etc. PEP authors
should use their discretion here.
Once the authors have completed a PEP, they must inform the PEP
editor that it is ready for review. PEPs are reviewed by the BDFL
and his chosen consultants, who may accept or reject a PEP or send
it back to the author(s) for revision.
Once a PEP has been accepted, the reference implementation must be
completed. When the reference implementation is complete and
accepted by the BDFL, the status will be changed to `Final.'
A PEP can also be assigned status `Deferred.' The PEP author or
editor can assign the PEP this status when no progress is being
made on the PEP. Once a PEP is deferred, the PEP editor can
re-assign it to draft status.
A PEP can also be `Rejected'. Perhaps after all is said and done
it was not a good idea. It is still important to have a record of
this fact.
PEPs can also be replaced by a different PEP, rendering the
original obsolete. This is intended for Informational PEPs, where
version 2 of an API can replace version 1.
PEP work flow is as follows:
Draft -> Accepted -> Final -> Replaced
  ^
  +----> Rejected
  v
Deferred
Some informational PEPs may also have a status of `Active' if they
are never meant to be completed. E.g. PEP 1.
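One way to read the work flow above is as a table of allowed status changes. The table and the helper `can_move` below are an interpretation for illustration, not part of any PEP tooling, and the 'Active' status is left out:

```python
# Transitions as read from the text and diagram above (an assumption of
# this sketch): Draft may be accepted, rejected or deferred; a deferred
# PEP may return to Draft; Accepted leads to Final, and Final to Replaced.
TRANSITIONS = {
    "Draft": {"Accepted", "Rejected", "Deferred"},
    "Deferred": {"Draft"},
    "Accepted": {"Final"},
    "Final": {"Replaced"},
    "Rejected": set(),
    "Replaced": set(),
}


def can_move(current, new):
    """Return True if the table allows changing status current -> new."""
    return new in TRANSITIONS.get(current, set())


print(can_move("Draft", "Accepted"))   # True
print(can_move("Rejected", "Final"))   # False
```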
What belongs in a successful PEP?
Each PEP should have the following parts:
1. Preamble -- RFC822 style headers containing meta-data about the
PEP, including the PEP number, a short descriptive title
(limited to a maximum of 44 characters), the names, and
optionally the contact info for each author, etc.
2. Abstract -- a short (~200 word) description of the technical
issue being addressed.
3. Copyright/public domain -- Each PEP must either be explicitly
labelled as placed in the public domain (see this PEP as an
example) or licensed under the Open Publication License[4].
4. Specification -- The technical specification should describe
the syntax and semantics of any new language feature. The
specification should be detailed enough to allow competing,
interoperable implementations for any of the current Python
platforms (CPython, JPython, Python .NET).
5. Motivation -- The motivation is critical for PEPs that want to
change the Python language. It should clearly explain why the
existing language specification is inadequate to address the
problem that the PEP solves. PEP submissions without
sufficient motivation may be rejected outright.
6. Rationale -- The rationale fleshes out the specification by
describing what motivated the design and why particular design
decisions were made. It should describe alternate designs that
were considered and related work, e.g. how the feature is
supported in other languages.
The rationale should provide evidence of consensus within the
community and discuss important objections or concerns raised
during discussion.
7. Backwards Compatibility -- All PEPs that introduce backwards
incompatibilities must include a section describing these
incompatibilities and their severity. The PEP must explain how
the author proposes to deal with these incompatibilities. PEP
submissions without a sufficient backwards compatibility
treatise may be rejected outright.
8. Reference Implementation -- The reference implementation must
be completed before any PEP is given status 'Final,' but it
need not be completed before the PEP is accepted. It is better
to finish the specification and rationale first and reach
consensus on it before writing code.
The final implementation must include test code and
documentation appropriate for either the Python language
reference or the standard library reference.
PEP Template
PEPs are written in plain ASCII text, and should adhere to a
rigid style. There is a Python script that parses this style and
converts the plain text PEP to HTML for viewing on the web[5].
PEP 9 contains a boilerplate[7] template you can use to get
started writing your PEP.
Each PEP must begin with an RFC822 style header preamble. The
headers must appear in the following order. Headers marked with
`*' are optional and are described below. All other headers are
required.
PEP: <pep number>
Title: <pep title>
Version: <cvs version string>
Last-Modified: <cvs date string>
Author: <list of authors' real names and optionally, email addrs>
* Discussions-To: <email address>
Status: <Draft | Active | Accepted | Deferred | Final | Replaced>
Type: <Informational | Standards Track>
* Requires: <pep numbers>
Created: <date created on, in dd-mmm-yyyy format>
* Python-Version: <version number>
Post-History: <dates of postings to python-list and python-dev>
* Replaces: <pep number>
* Replaced-By: <pep number>
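Because the preamble is RFC822 style, the standard library's `email.parser` can read it. The sketch below is not how pep2html.py works (this text does not show that); the helper names and the `REQUIRED` list are assumptions built from the header list above:

```python
from email.parser import HeaderParser

SAMPLE = """\
PEP: 1
Title: PEP Purpose and Guidelines
Version: $Revision: 1.36 $
Last-Modified: $Date: 2002/07/29 18:34:59 $
Author: Barry A. Warsaw, Jeremy Hylton
Status: Active
Type: Informational
Created: 13-Jun-2000
Post-History: 21-Mar-2001, 29-Jul-2002

What is a PEP?
"""

# Required headers in the order given above (optional ones are omitted).
REQUIRED = ["PEP", "Title", "Version", "Last-Modified", "Author",
            "Status", "Type", "Created", "Post-History"]


def missing_headers(text):
    """Return the required header names absent from a PEP's preamble."""
    headers = HeaderParser().parsestr(text)
    return [name for name in REQUIRED if name not in headers]


def required_in_order(text):
    """True if the required headers that are present keep the listed order."""
    headers = HeaderParser().parsestr(text)
    present = [name for name in headers.keys() if name in REQUIRED]
    return present == [name for name in REQUIRED if name in present]


print(missing_headers(SAMPLE))    # []
print(required_in_order(SAMPLE))  # True
```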
The Author: header lists the names and optionally, the email
addresses of all the authors/owners of the PEP. The format of the
author entry should be
address(a)dom.ain (Random J. User)
if the email address is included, and just
Random J. User
if the address is not given. If there are multiple authors, each
should be on a separate line following RFC 822 continuation line
conventions. Note that personal email addresses in PEPs will be
obscured as a defense against spam harvesters.
Standards track PEPs must have a Python-Version: header which
indicates the version of Python that the feature will be released
with. Informational PEPs do not need a Python-Version: header.
While a PEP is in private discussions (usually during the initial
Draft phase), a Discussions-To: header will indicate the mailing
list or URL where the PEP is being discussed. No Discussions-To:
header is necessary if the PEP is being discussed privately with
the author, or on the python-list or python-dev email mailing
lists. Note that email addresses in the Discussions-To: header
will not be obscured.
Created: records the date that the PEP was assigned a number,
while Post-History: is used to record the dates of when new
versions of the PEP are posted to python-list and/or python-dev.
Both headers should be in dd-mmm-yyyy format, e.g. 14-Aug-2001.
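The dd-mmm-yyyy format maps onto `datetime.strptime` directives. A minimal sketch, assuming English month abbreviations (`%b` is locale dependent) and with hypothetical helper names:

```python
from datetime import datetime

PEP_DATE_FORMAT = "%d-%b-%Y"   # dd-mmm-yyyy, e.g. 14-Aug-2001


def parse_pep_date(text):
    """Parse one date as used in the Created: header."""
    return datetime.strptime(text.strip(), PEP_DATE_FORMAT).date()


def parse_post_history(value):
    """Parse the comma-separated dates of a Post-History: header."""
    return [parse_pep_date(part) for part in value.split(",")]


created = parse_pep_date("13-Jun-2000")
posted = parse_post_history("21-Mar-2001, 29-Jul-2002")
print(created.isoformat())                # 2000-06-13
print([d.isoformat() for d in posted])    # ['2001-03-21', '2002-07-29']
```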
PEPs may have a Requires: header, indicating the PEP numbers that
this PEP depends on.
PEPs may also have a Replaced-By: header indicating that a PEP has
been rendered obsolete by a later document; the value is the
number of the PEP that replaces the current document. The newer
PEP must have a Replaces: header containing the number of the PEP
that it rendered obsolete.
PEP Formatting Requirements
PEP headings must begin in column zero and the initial letter of
each word must be capitalized as in book titles. Acronyms should
be in all capitals. The body of each section must be indented 4
spaces. Code samples inside body sections should be indented a
further 4 spaces, and other indentation can be used as required to
make the text readable. You must use two blank lines between the
last line of a section's body and the next section heading.
You must adhere to the Emacs convention of adding two spaces at
the end of every sentence. You should fill your paragraphs to
column 70, but under no circumstances should your lines extend
past column 79. If your code samples spill over column 79, you
should rewrite them.
Tab characters must never appear in the document at all. A PEP
should include the standard Emacs stanza included by example at
the bottom of this PEP.
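Two of these rules (no tabs, nothing past column 79) are easy to check mechanically. The function below is a hypothetical helper written for illustration, not part of the PEP tools:

```python
def format_problems(text, max_column=79):
    """Yield (line_number, message) for tabs and over-long lines."""
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            yield number, "tab character"
        if len(line) > max_column:
            yield number, "line extends past column %d" % max_column


sample = ("Abstract\n\n    Short line.\n\tTabbed line.\n"
          + "    " + "x" * 80 + "\n")
problems = list(format_problems(sample))
print(problems)
# [(4, 'tab character'), (5, 'line extends past column 79')]
```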
A PEP must contain a Copyright section, and it is strongly
recommended to put the PEP in the public domain.
When referencing an external web page in the body of a PEP, you
should include the title of the page in the text, with a
footnote reference to the URL. Do not include the URL in the body
text of the PEP. E.g.
Refer to the Python Language web site [1] for more details.
...
[1] http://www.python.org
When referring to another PEP, include the PEP number in the body
text, such as "PEP 1". The title may optionally appear. Add a
footnote reference that includes the PEP's title and author. It
may optionally include the explicit URL on a separate line, but
only in the References section. Note that the pep2html.py script
will calculate URLs automatically, e.g.:
...
Refer to PEP 1 [7] for more information about PEP style
...
References
[7] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/peps/pep-0001.html
If you decide to provide an explicit URL for a PEP, please use
this as the URL template:
http://www.python.org/peps/pep-xxxx.html
PEP numbers in URLs must be padded with zeros from the left, so as
to be exactly 4 characters wide, however PEP numbers in text are
never padded.
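The padding rule can be expressed with a format specifier. The two function names below are made up for this sketch:

```python
def pep_url(number):
    """Build the explicit URL, padding the number to four digits."""
    return "http://www.python.org/peps/pep-%04d.html" % number


def pep_text_reference(number):
    """Build the in-text reference, which is never padded."""
    return "PEP %d" % number


print(pep_url(1))             # http://www.python.org/peps/pep-0001.html
print(pep_text_reference(1))  # PEP 1
```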
Reporting PEP Bugs, or Submitting PEP Updates
How you report a bug, or submit a PEP update depends on several
factors, such as the maturity of the PEP, the preferences of the
PEP author, and the nature of your comments. For the early draft
stages of the PEP, it's probably best to send your comments and
changes directly to the PEP author. For more mature, or finished
PEPs you may want to submit corrections to the SourceForge bug
manager[6] or better yet, the SourceForge patch manager[2] so that
your changes don't get lost. If the PEP author is a SF developer,
assign the bug/patch to him, otherwise assign it to the PEP
editor.
When in doubt about where to send your changes, please check first
with the PEP author and/or PEP editor.
PEP authors who are also SF committers, can update the PEPs
themselves by using "cvs commit" to commit their changes.
Remember to also push the formatted PEP text out to the web by
doing the following:
% python pep2html.py -i NUM
where NUM is the number of the PEP you want to push out. See
% python pep2html.py --help
for details.
Transferring PEP Ownership
It occasionally becomes necessary to transfer ownership of PEPs to
a new champion. In general, we'd like to retain the original
author as a co-author of the transferred PEP, but that's really up
to the original author. A good reason to transfer ownership is
because the original author no longer has the time or interest in
updating it or following through with the PEP process, or has
fallen off the face of the 'net (i.e. is unreachable or not
responding to email). A bad reason to transfer ownership is
because you don't agree with the direction of the PEP. We try to
build consensus around a PEP, but if that's not possible, you can
always submit a competing PEP.
If you are interested in assuming ownership of a PEP, send a message
asking to take over, addressed to both the original author and the
PEP editor <peps(a)python.org>. If the original author doesn't
respond to email in a timely manner, the PEP editor will make a
unilateral decision (it's not like such decisions can be
reversed. :).
References and Footnotes
[1] This historical record is available by the normal CVS commands
for retrieving older revisions. For those without direct access
to the CVS tree, you can browse the current and past PEP revisions
via the SourceForge web site at
http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/python/nondist/peps/?cvsroot=…
[2] http://sourceforge.net/tracker/?group_id=5470&atid=305470
[3] http://sourceforge.net/tracker/?atid=355470&group_id=5470&func=browse
[4] http://www.opencontent.org/openpub/
[5] The script referred to here is pep2html.py, which lives in
the same directory in the CVS tree as the PEPs themselves.
Try "pep2html.py --help" for details.
The URL for viewing PEPs on the web is
http://www.python.org/peps/
[6] http://sourceforge.net/tracker/?group_id=5470&atid=305470
[7] PEP 9, Sample PEP Template
http://www.python.org/peps/pep-0009.html
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
congrats on 3.5! Alas, windows 7 users are having problems installing it
by Laura Creighton 16 Sep '19
webmaster has already heard from 4 people who cannot install it.
I sent them to the bug tracker or to python-list but they seem
not to have gone either place. Is there some guide I should be
sending them to, 'how to debug installation problems'?
Laura
Hi,
On Twitter, Raymond Hettinger wrote:
"The decision making process on Python-dev is an anti-pattern,
governed by anecdotal data and ambiguity over what problem is solved."
https://twitter.com/raymondh/status/887069454693158912
About "anecdotal data", I would like to discuss the Python startup time.
== Python 3.7 compared to 2.7 ==
First of all, on speed.python.org, we have:
* Python 2.7: 6.4 ms with site, 3.0 ms without site (-S)
* master (3.7): 14.5 ms with site, 8.4 ms without site (-S)
Python 3.7 startup time is 2.3x slower with site (default mode), or
2.8x slower without site (-S command line option).
(I will skip Python 3.4, 3.5 and 3.6 which are much worse than Python 3.7...)
So if a user complained about Python 2.7 startup time: be prepared
for a 2x - 3x more angry user when "forced" to upgrade to Python 3!
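The slowdown factors follow directly from the four timings quoted above, as this small check shows (the dictionary names are arbitrary):

```python
# Timings in milliseconds, copied from the figures quoted above.
py27 = {"with site": 6.4, "without site (-S)": 3.0}
py37 = {"with site": 14.5, "without site (-S)": 8.4}

slowdown = {mode: py37[mode] / py27[mode] for mode in py27}
for mode, ratio in slowdown.items():
    print("%s: %.1fx slower" % (mode, ratio))
```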
== Mercurial vs Git, Python vs C, startup time ==
Startup time matters a lot for Mercurial since Mercurial is compared
to Git. Git and Mercurial have similar features, but Git is written in
C whereas Mercurial is written in Python. Quick benchmark on the
speed.python.org server:
* hg version: 44.6 ms +- 0.2 ms
* git --version: 974 us +- 7 us
Mercurial startup time is already 45.8x slower than Git, even though the
tested Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
developers, with a startup time 2x - 3x slower...
I tested Mercurial 3.7.3 and Git 2.7.4 on Ubuntu 16.04.1 using "python3
-m perf command -- ...".
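For readers without the perf module, a rough stand-in can be written with the standard library alone. This is a sketch under stated assumptions, not the tool used for the numbers above: it times whole process launches with `time.perf_counter`, so it includes spawn overhead and does none of perf's calibration. The function name `time_command` is hypothetical:

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10):
    """Return (mean, stdev) wall-clock seconds for running argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    for label, argv in [
        ("with site", [sys.executable, "-c", "pass"]),
        ("without site (-S)", [sys.executable, "-S", "-c", "pass"]),
    ]:
        mean, stdev = time_command(argv)
        print("%s: %.1f ms +- %.1f ms" % (label, mean * 1e3, stdev * 1e3))
```

Absolute values from such a loop will differ from the figures in this message; only the relative difference between the two modes is meaningful.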
== CPython core developers don't care? no, they do care ==
Christian Heimes, Naoki INADA, Serhiy Storchaka, Yury Selivanov, me
(Victor Stinner) and other core developers made multiple changes in recent
years to reduce the number of imports at startup, optimize importlib,
etc.
IMHO all these core developers are well aware of the competition of
programming languages, and honestly Python startup time isn't "good".
So let's compare it to other programming languages similar to Python.
== PHP, Ruby, Perl ==
I measured the startup time of other programming languages which are
similar to Python, still on the speed.python.org server using "python3
-m perf command -- ...":
* perl -e ' ': 1.18 ms +- 0.01 ms
* php -r ' ': 8.57 ms +- 0.05 ms
* ruby -e ' ': 32.8 ms +- 0.1 ms
Wow, Perl is quite good! PHP seems as good as Python 2 (but Python 3
is worse). Ruby startup time seems less optimized than other
languages.
Tested versions:
* perl 5, version 22, subversion 1 (v5.22.1)
* PHP 7.0.18-0ubuntu0.16.04.1 (cli) ( NTS )
* ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]
== Quick Google search ==
I also searched for "python startup time" and "python slow startup
time" on Google and found many articles. Some examples:
"Reducing the Python startup time"
http://www.draketo.de/book/export/html/498
=> "The python startup time always nagged me (17-30ms) and I just
searched again for a way to reduce it, when I found this: The
Python-Launcher caches GTK imports and forks new processes to reduce
the startup time of python GUI programs."
https://nelsonslog.wordpress.com/2013/04/08/python-startup-time/
=> "Wow, Python startup time is worse than I thought."
"How to speed up python starting up and/or reduce file search while
loading libraries?"
https://stackoverflow.com/questions/15474160/how-to-speed-up-python-startin…
=> "The first time I log to the system and start one command it takes
6 seconds just to show a few line of help. If I immediately issue the
same command again it takes 0.1s. After a couple of minutes it gets
back to 6s. (proof of short-lived cache)"
"How does one optimise the startup of a Python script/program?"
https://www.quora.com/How-does-one-optimise-the-startup-of-a-Python-script-…
=> "I wrote a Python program that would be used very often (imagine
'cd' or 'ls') for very short runtimes, how would I make it start up as
fast as possible?"
"Python Interpreter Startup time"
https://bytes.com/topic/python/answers/34469-pyhton-interpreter-startup-time
"Python is very slow to start on Windows 7"
https://stackoverflow.com/questions/29997274/python-is-very-slow-to-start-o…
=> "Python takes 17 times longer to load on my Windows 7 machine than
Ubuntu 14.04 running on a VM"
=> "returns in 0.614s on Windows and 0.036s on Linux"
"How to make a fast command line tool in Python" (old article Python 2.5.2)
https://files.bemusement.org/talks/OSDC2008-FastPython/
=> "(...) some techniques Bazaar uses to start quickly, such as lazy imports."
--
So please continue efforts to make Python startup even faster to beat
all other programming languages, and finally convince Mercurial to
upgrade ;-)
Victor
Hi,
This is the 4th iteration of the PEP that Elvis and I have
rewritten from scratch.
The specification section has been separated from the implementation
section, which makes them easier to follow.
During the rewrite, we realized that generators and coroutines should
work with the EC in exactly the same way (coroutines used to be
created with no LC in prior versions of the PEP).
We also renamed Context Keys to Context Variables which seems
to be a more appropriate name.
Hopefully this update will resolve the remaining questions
about the specification and the proposed implementation, and
will allow us to focus on refining the API.
Yury
PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury(a)magic.io>,
Elvis Pranskevichus <elvis(a)magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017
Abstract
========
This PEP adds a new generic mechanism of ensuring consistent access
to non-local state in the context of out-of-order execution, such
as in Python generators and coroutines.
Thread-local storage, such as ``threading.local()``, is inadequate for
programs that execute concurrently in the same OS thread. This PEP
proposes a solution to this problem.
Rationale
=========
Prior to the advent of asynchronous programming in Python, programs
used OS threads to achieve concurrency. The need for thread-specific
state was solved by ``threading.local()`` and its C-API equivalent,
``PyThreadState_GetDict()``.
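As a brief reminder of what ``threading.local()`` provides, the sketch below has three threads write to the same attribute and read it back only after all of them have written. The names `state`, `seen` and `worker` are illustrative:

```python
import threading

state = threading.local()
seen = {}
barrier = threading.Barrier(3)


def worker(name, value):
    state.value = value      # visible only in the current thread
    barrier.wait()           # every thread has written before any reads
    seen[name] = state.value


threads = [threading.Thread(target=worker, args=("t%d" % i, i))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set state.value, so it has no such attribute.
main_has_value = hasattr(state, "value")
print(seen)            # each thread read back its own value
print(main_has_value)  # False
```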
A few examples of where Thread-local storage (TLS) is commonly
relied upon:
* Context managers like decimal contexts, ``numpy.errstate``,
and ``warnings.catch_warnings``.
* Request-related data, such as security tokens and request
data in web applications, language context for ``gettext`` etc.
* Profiling, tracing, and logging in large code bases.
Unfortunately, TLS does not work well for programs which execute
concurrently in a single thread. A Python generator is the simplest
example of a concurrent program. Consider the following::
    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield Decimal(x) / Decimal(y)
            yield Decimal(x) / Decimal(y**2)

    g1 = fractions(precision=2, x=1, y=3)
    g2 = fractions(precision=6, x=2, y=3)

    items = list(zip(g1, g2))
The expected value of ``items`` is::
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.11'), Decimal('0.222222'))]
Rather surprisingly, the actual result is::
    [(Decimal('0.33'), Decimal('0.666667')),
     (Decimal('0.111111'), Decimal('0.222222'))]
This is because the decimal context is stored in thread-local storage,
so interleaved iteration of the two ``fractions()`` generators corrupts
the state: when ``g1`` resumes, the active context is the one installed
by ``g2``. A similar problem exists with coroutines.
Applications also often need to associate certain data with a given
thread of execution. For example, a web application server commonly
needs access to the current HTTP request object.
The inadequacy of TLS in asynchronous code has led to the
proliferation of ad-hoc solutions, which are limited in scope and
do not support all required use cases.
The current status quo is that any library (including the standard
library) that relies on TLS is likely to be broken when used in
asynchronous code or with generators (see [3]_ as an example issue).
Some languages that support coroutines or generators recommend
passing the context manually as an argument to every function; see [1]_
for an example. This approach, however, has limited use for Python,
where there is a large ecosystem that was built to work with a TLS-like
context. Furthermore, libraries like ``decimal`` or ``numpy`` rely
on context implicitly in overloaded operator implementations.
The .NET runtime, which has support for async/await, has a generic
solution for this problem, called ``ExecutionContext`` (see [2]_).
Goals
=====
The goal of this PEP is to provide a more reliable
``threading.local()`` alternative, which:
* provides the mechanism and the API to fix non-local state issues
with coroutines and generators;
* has no or negligible performance impact on the existing code or
the code that will be using the new mechanism, including
libraries like ``decimal`` and ``numpy``.
High-Level Specification
========================
The full specification of this PEP is broken down into three parts:
* High-Level Specification (this section): the description of the
overall solution. We show how it applies to generators and
coroutines in user code, without delving into implementation details.
* Detailed Specification: the complete description of new concepts,
APIs, and related changes to the standard library.
* Implementation Details: the description and analysis of data
structures and algorithms used to implement this PEP, as well as the
necessary changes to CPython.
For the purpose of this section, we define *execution context* as an
opaque container of non-local state that allows consistent access to
its contents in the concurrent execution environment.
A *context variable* is an object representing a value in the
execution context. A new context variable is created by calling
the ``new_context_var()`` function. A context variable object has
two methods:
* ``lookup()``: returns the value of the variable in the current
execution context;
* ``set()``: sets the value of the variable in the current
execution context.
Regular Single-threaded Code
----------------------------
In regular, single-threaded code that doesn't involve generators or
coroutines, context variables behave like globals::
var = new_context_var()
def sub():
assert var.lookup() == 'main'
var.set('sub')
def main():
var.set('main')
sub()
assert var.lookup() == 'sub'
Multithreaded Code
------------------
In multithreaded code, context variables behave like thread locals::
var = new_context_var()
def sub():
assert var.lookup() is None # The execution context is empty
# for each new thread.
var.set('sub')
def main():
var.set('main')
thread = threading.Thread(target=sub)
thread.start()
thread.join()
assert var.lookup() == 'main'
Generators
----------
In generators, changes to context variables are local and are not
visible to the caller, but are visible to the code called by the
generator. Once set in the generator, the context variable is
guaranteed not to change between iterations::
var = new_context_var()
def gen():
var.set('gen')
assert var.lookup() == 'gen'
yield 1
assert var.lookup() == 'gen'
yield 2
def main():
var.set('main')
g = gen()
next(g)
assert var.lookup() == 'main'
var.set('main modified')
next(g)
assert var.lookup() == 'main modified'
Changes to the caller's context variables are visible to the generator
(unless they were also modified inside the generator)::
var = new_context_var()
def gen():
assert var.lookup() == 'var'
yield 1
assert var.lookup() == 'var modified'
yield 2
def main():
g = gen()
var.set('var')
next(g)
var.set('var modified')
next(g)
Now, let's revisit the decimal precision example from the `Rationale`_
section, and see how the execution context can improve the situation::
import decimal
decimal_prec = new_context_var() # create a new context variable
# Pre-PEP 550 Decimal relies on TLS for its context.
# This subclass switches the decimal context storage
# to the execution context for illustration purposes.
#
class MyDecimal(decimal.Decimal):
def __init__(self, value="0"):
prec = decimal_prec.lookup()
if prec is None:
raise ValueError('could not find decimal precision')
context = decimal.Context(prec=prec)
super().__init__(value, context=context)
def fractions(precision, x, y):
# Normally, this would be set by a context manager,
# but for simplicity we do this directly.
decimal_prec.set(precision)
yield MyDecimal(x) / MyDecimal(y)
yield MyDecimal(x) / MyDecimal(y**2)
g1 = fractions(precision=2, x=1, y=3)
g2 = fractions(precision=6, x=2, y=3)
items = list(zip(g1, g2))
The value of ``items`` is::
[(Decimal('0.33'), Decimal('0.666667')),
(Decimal('0.11'), Decimal('0.222222'))]
which matches the expected result.
Coroutines and Asynchronous Tasks
---------------------------------
In coroutines, like in generators, context variable changes are local
and are not visible to the caller::
import asyncio
var = new_context_var()
async def sub():
assert var.lookup() == 'main'
var.set('sub')
assert var.lookup() == 'sub'
async def main():
var.set('main')
await sub()
assert var.lookup() == 'main'
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
To establish the full semantics of execution context in coroutines,
we must also consider *tasks*. A task is the abstraction used by
*asyncio*, and other similar libraries, to manage the concurrent
execution of coroutines. In the example above, a task is created
implicitly by the ``run_until_complete()`` function.
``asyncio.wait_for()`` is another example of implicit task creation::
async def sub():
await asyncio.sleep(1)
assert var.lookup() == 'main'
async def main():
var.set('main')
# waiting for sub() directly
await sub()
# waiting for sub() with a timeout
await asyncio.wait_for(sub(), timeout=2)
var.set('main changed')
Intuitively, we expect the assertion in ``sub()`` to hold true in both
invocations, even though the ``wait_for()`` implementation actually
spawns a task, which runs ``sub()`` concurrently with ``main()``.
Thus, tasks **must** capture a snapshot of the current execution
context at the moment of their creation and use it to execute the
wrapped coroutine whenever that happens. If this is not done, then
innocuous looking changes like wrapping a coroutine in a ``wait_for()``
call would cause surprising breakage. This leads to the following::
import asyncio
var = new_context_var()
async def sub():
# Sleeping will make sub() run after
# `var` is modified in main().
await asyncio.sleep(1)
assert var.lookup() == 'main'
async def main():
var.set('main')
loop.create_task(sub()) # schedules asynchronous execution
# of sub().
assert var.lookup() == 'main'
var.set('main changed')
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
In the above code we show how ``sub()``, running in a separate task,
sees the value of ``var`` as it was when ``loop.create_task(sub())``
was called.
Like tasks, the intuitive behaviour of callbacks scheduled with either
``Loop.call_soon()``, ``Loop.call_later()``, or
``Future.add_done_callback()`` is to also capture a snapshot of the
current execution context at the point of scheduling, and use it to
run the callback::
current_request = new_context_var()
def log_error(e):
logging.error('error when handling request %r',
current_request.lookup())
async def render_response():
...
async def handle_get_request(request):
current_request.set(request)
try:
return await render_response()
except Exception as e:
get_event_loop().call_soon(log_error, e)
return '500 - Internal Server Error'
Detailed Specification
======================
Conceptually, an *execution context* (EC) is a stack of logical
contexts. There is one EC per Python thread.
A *logical context* (LC) is a mapping of context variables to their
values in that particular LC.
A *context variable* is an object representing a value in the
execution context. A new context variable object is created by calling
the ``sys.new_context_var(name: str)`` function. The value of the
``name`` argument is not used by the EC machinery, but may be used for
debugging and introspection.
The context variable object has the following methods and attributes:
* ``name``: the value passed to ``new_context_var()``.
* ``lookup()``: traverses the execution context top-to-bottom,
until the variable value is found. Returns ``None``, if the variable
is not present in the execution context;
* ``set()``: sets the value of the variable in the topmost logical
context.
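The lookup and set rules above can be modelled in a few lines of pure Python. This is only a sketch of the semantics, not the proposed implementation: ``ToyContextVar`` and the ``_state`` thread-local are hypothetical names, and plain dicts stand in for logical contexts.

```python
import threading

# Hypothetical stand-in for the per-thread execution context:
# a list used as a stack of logical contexts (plain dicts here).
_state = threading.local()


def _stack():
    # Each thread lazily gets its own stack with one empty logical context.
    if not hasattr(_state, "ec"):
        _state.ec = [{}]
    return _state.ec


class ToyContextVar:
    def __init__(self, name):
        self.name = name  # only used for debugging and introspection

    def lookup(self):
        # Traverse the stack top-to-bottom until a value is found.
        for lc in reversed(_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Only the topmost logical context is ever modified.
        _stack()[-1][self] = value


var = ToyContextVar("var")
var.set("outer")

_stack().append({})             # push a new logical context
seen_before_set = var.lookup()  # found in the lower logical context
var.set("inner")                # shadows the outer value
seen_after_set = var.lookup()
_stack().pop()                  # pop it again

seen_after_pop = var.lookup()   # the outer value is visible again
```

Because ``set()`` only touches the topmost logical context, popping that context is enough to undo every change made while it was on the stack.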
Generators
----------
When created, each generator object has an empty logical context object
stored in its ``__logical_context__`` attribute. This logical context
is pushed onto the execution context at the beginning of each generator
iteration and popped at the end::
var1 = sys.new_context_var('var1')
var2 = sys.new_context_var('var2')
def gen():
var1.set('var1-gen')
var2.set('var2-gen')
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
# ]
n = nested_gen() # nested_gen_LC is created
next(n)
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
# ]
var1.set('var1-gen-mod')
var2.set('var2-gen-mod')
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
# ]
next(n)
def nested_gen():
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
# nested_gen_LC()
# ]
assert var1.lookup() == 'var1-gen'
assert var2.lookup() == 'var2-gen'
var1.set('var1-nested-gen')
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
# nested_gen_LC({var1: 'var1-nested-gen'})
# ]
yield
# EC = [
# outer_LC(),
# gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
# nested_gen_LC({var1: 'var1-nested-gen'})
# ]
assert var1.lookup() == 'var1-nested-gen'
assert var2.lookup() == 'var2-gen-mod'
yield
# EC = [outer_LC()]
g = gen() # gen_LC is created for the generator object `g`
list(g)
# EC = [outer_LC()]
The snippet above shows the state of the execution context stack
throughout the generator lifespan.
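The push-on-entry, pop-on-exit behaviour can be imitated without interpreter support by wrapping a generator. The sketch below assumes a simplified model in which the execution context is a module-level list of dicts; ``IsolatedGenerator``, ``lookup`` and ``set_var`` are hypothetical helpers, not part of the proposal.

```python
# Hypothetical model: the execution context is a plain list of dicts.
ec = [{}]


def lookup(key):
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    ec[-1][key] = value


class IsolatedGenerator:
    """Wraps a generator and gives it a private logical context."""

    def __init__(self, gen):
        self._gen = gen
        self._lc = {}  # plays the role of __logical_context__

    def __iter__(self):
        return self

    def __next__(self):
        ec.append(self._lc)      # push at the start of the iteration
        try:
            return next(self._gen)
        finally:
            ec.pop()             # pop at the end of the iteration


def worker(name):
    set_var("who", name)
    yield lookup("who")
    yield lookup("who")  # unchanged, even though another generator ran


set_var("who", "main")
g1 = IsolatedGenerator(worker("g1"))
g2 = IsolatedGenerator(worker("g2"))
interleaved = list(zip(g1, g2))
after = lookup("who")
```

Each wrapped generator keeps seeing its own value across iterations, and the caller's value is untouched once iteration finishes.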
contextlib.contextmanager
-------------------------
Earlier, we've used the following example::
import decimal
# create a new context variable
decimal_prec = sys.new_context_var('decimal_prec')
# ...
def fractions(precision, x, y):
decimal_prec.set(precision)
yield MyDecimal(x) / MyDecimal(y)
yield MyDecimal(x) / MyDecimal(y**2)
Let's extend it by adding a context manager::
@contextlib.contextmanager
def precision_context(prec):
old_prec = decimal_prec.lookup()
try:
decimal_prec.set(prec)
yield
finally:
decimal_prec.set(old_prec)
Unfortunately, this would not work straight away, as the modification
to the ``decimal_prec`` variable is contained to the
``precision_context()`` generator, and therefore will not be visible
inside the ``with`` block::
def fractions(precision, x, y):
# EC = [{}, {}]
with precision_context(precision):
# EC becomes [{}, {}, {decimal_prec: precision}] in the
# *precision_context()* generator,
# but here the EC is still [{}, {}]
# raises ValueError('could not find decimal precision')!
yield MyDecimal(x) / MyDecimal(y)
yield MyDecimal(x) / MyDecimal(y**2)
The way to fix this is to set the generator's ``__logical_context__``
attribute to ``None``. This will cause the generator to avoid
modifying the execution context stack.
We modify the ``contextlib.contextmanager()`` decorator to
set ``genobj.__logical_context__`` to ``None`` to produce
well-behaved context managers::
def fractions(precision, x, y):
# EC = [{}, {}]
with precision_context(precision):
# EC = [{}, {decimal_prec: precision}]
yield MyDecimal(x) / MyDecimal(y)
yield MyDecimal(x) / MyDecimal(y**2)
# EC becomes [{}, {decimal_prec: None}]
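The effect of setting ``__logical_context__`` to ``None`` can be illustrated with a small model. This is a sketch under the assumption that the execution context is a list of dicts; ``GenWrapper``, ``lookup``, ``set_var`` and ``finish`` are hypothetical helpers, and the wrapper only imitates what the interpreter would do on each iteration.

```python
# Hypothetical model: the execution context is a plain list of dicts.
ec = [{}]


def lookup(key):
    for lc in reversed(ec):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    ec[-1][key] = value


class GenWrapper:
    def __init__(self, gen, isolate=True):
        self._gen = gen
        # None models genobj.__logical_context__ = None.
        self._lc = {} if isolate else None

    def __next__(self):
        if self._lc is None:
            return next(self._gen)   # run directly in the caller's context
        ec.append(self._lc)
        try:
            return next(self._gen)
        finally:
            ec.pop()


def precision_context(prec):
    old = lookup("prec")
    set_var("prec", prec)
    try:
        yield
    finally:
        set_var("prec", old)


def finish(wrapper):
    # Resume the generator once more, as leaving a "with" block would.
    try:
        next(wrapper)
    except StopIteration:
        pass


isolated = GenWrapper(precision_context(4), isolate=True)
next(isolated)
seen_isolated = lookup("prec")   # the change stayed inside the generator
finish(isolated)

shared = GenWrapper(precision_context(4), isolate=False)
next(shared)
seen_shared = lookup("prec")     # the change is visible to the caller
finish(shared)
seen_after_exit = lookup("prec")
```

With isolation the caller never sees the new precision, which is exactly why an unmodified ``contextlib.contextmanager`` would not work; without it, the value is visible in the body and restored on exit.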
asyncio
-------
``asyncio`` uses ``Loop.call_soon``, ``Loop.call_later``,
and ``Loop.call_at`` to schedule the asynchronous execution of a
function. ``asyncio.Task`` uses ``call_soon()`` to further the
execution of the wrapped coroutine.
We modify ``Loop.call_{at,later,soon}`` to accept the new
optional *execution_context* keyword argument, which defaults to
the copy of the current execution context::
def call_soon(self, callback, *args, execution_context=None):
if execution_context is None:
execution_context = sys.get_execution_context()
# ... some time later
sys.run_with_execution_context(
execution_context, callback, args)
The ``sys.get_execution_context()`` function returns a shallow copy
of the current execution context. By shallow copy here we mean such
a new execution context that:
* lookups in the copy provide the same results as in the original
execution context, and
* any changes in the original execution context do not affect the
copy, and
* any changes to the copy do not affect the original execution
context.
Either of the following satisfies the copy requirements:
* a new stack with shallow copies of logical contexts;
* a new stack with one squashed logical context.
The ``sys.run_with_execution_context(ec, func, *args, **kwargs)``
function runs ``func(*args, **kwargs)`` with *ec* as the execution
context. The function performs the following steps:
1. Set *ec* as the current execution context stack in the current
thread.
2. Push an empty logical context onto the stack.
3. Run ``func(*args, **kwargs)``.
4. Pop the logical context from the stack.
5. Restore the original execution context stack.
6. Return or raise the ``func()`` result.
These steps ensure that *ec* cannot be modified by *func*,
which makes ``run_with_execution_context()`` idempotent.
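The steps above, together with the copy requirements for ``get_execution_context()``, can be sketched as follows. The model is an assumption made for illustration: the current execution context is a module-level list of dicts, and ``lookup`` and ``set_var`` are hypothetical helpers standing in for context variable methods.

```python
# Hypothetical model: the current execution context is a list of dicts.
current_ec = [{}]


def lookup(key):
    for lc in reversed(current_ec):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    current_ec[-1][key] = value


def get_execution_context():
    # A new stack holding shallow copies of the logical contexts.
    return [dict(lc) for lc in current_ec]


def run_with_execution_context(ec, func, *args, **kwargs):
    global current_ec
    saved = current_ec
    # Steps 1-2: install a stack made of ec plus a fresh empty logical
    # context. Building a new list leaves ec itself untouched.
    current_ec = ec + [{}]
    try:
        return func(*args, **kwargs)   # steps 3 and 6
    finally:
        current_ec = saved             # steps 4-5: discard and restore


def callback():
    seen = lookup("request")
    set_var("request", "changed inside callback")
    return seen


set_var("request", "req-1")
snapshot = get_execution_context()
set_var("request", "req-2")            # does not affect the snapshot

first = run_with_execution_context(snapshot, callback)
second = run_with_execution_context(snapshot, callback)
current = lookup("request")
```

Both calls observe the same value because the write made by ``callback()`` lands in the temporary logical context, which is thrown away when the call returns.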
``asyncio.Task`` is modified as follows::
class Task:
def __init__(self, coro):
...
# Get the current execution context snapshot.
self._exec_context = sys.get_execution_context()
self._loop.call_soon(
self._step,
execution_context=self._exec_context)
def _step(self, exc=None):
...
self._loop.call_soon(
self._step,
execution_context=self._exec_context)
...
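To see why capturing the context at scheduling time matters, consider a toy scheduler. It is not asyncio: ``ToyLoop`` and the helper functions are hypothetical, and the execution context is modelled as a list of dicts. Only the ``call_soon()`` signature follows the modification described in this section.

```python
from collections import deque

# Hypothetical model: the current execution context is a list of dicts.
current_ec = [{}]


def lookup(key):
    for lc in reversed(current_ec):
        if key in lc:
            return lc[key]
    return None


def set_var(key, value):
    current_ec[-1][key] = value


def get_execution_context():
    return [dict(lc) for lc in current_ec]


def run_with_execution_context(ec, func, *args):
    global current_ec
    saved = current_ec
    current_ec = ec + [{}]
    try:
        return func(*args)
    finally:
        current_ec = saved


class ToyLoop:
    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args, execution_context=None):
        if execution_context is None:
            # Snapshot taken when the callback is scheduled.
            execution_context = get_execution_context()
        self._ready.append((execution_context, callback, args))

    def run(self):
        while self._ready:
            ec, callback, args = self._ready.popleft()
            run_with_execution_context(ec, callback, *args)


log = []


def report(tag):
    log.append((tag, lookup("request")))


loop = ToyLoop()
set_var("request", "GET /a")
loop.call_soon(report, "first")
set_var("request", "GET /b")
loop.call_soon(report, "second")
set_var("request", None)
loop.run()
```

Each callback reports the request that was current when it was scheduled, even though the variable has been changed again before the loop runs.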
Generators Transformed into Iterators
-------------------------------------
Any Python generator can be represented as an equivalent iterator.
Compilers like Cython rely on this axiom. With respect to the
execution context, such an iterator should behave the same way as the
generator it represents.
This means that there needs to be a Python API to create new logical
contexts and run code with a given logical context.
The ``sys.new_logical_context()`` function creates a new empty
logical context.
The ``sys.run_with_logical_context(lc, func, *args, **kwargs)``
function can be used to run functions in the specified logical context.
The *lc* can be modified as a result of the call.
The ``sys.run_with_logical_context()`` function performs the following
steps:
1. Push *lc* onto the current execution context stack.
2. Run ``func(*args, **kwargs)``.
3. Pop *lc* from the execution context stack.
4. Return or raise the ``func()`` result.
By using ``new_logical_context()`` and ``run_with_logical_context()``,
we can replicate the generator behaviour like this::
class Generator:
def __init__(self):
self.logical_context = sys.new_logical_context()
def __iter__(self):
return self
def __next__(self):
return sys.run_with_logical_context(
self.logical_context, self._next_impl)
def _next_impl(self):
# Actual __next__ implementation.
...
Let's see how this pattern can be applied to a real generator::
# create a new context variable
decimal_prec = sys.new_context_var('decimal_precision')
def gen_series(n, precision):
decimal_prec.set(precision)
for i in range(1, n):
yield MyDecimal(i) / MyDecimal(3)
# gen_series is equivalent to the following iterator:
    class Series:

        def __init__(self, n, precision):
            # Create a new empty logical context on creation,
            # like the generators do.
            self.logical_context = sys.new_logical_context()

            # run_with_logical_context() pushes
            # self.logical_context onto the execution context stack,
            # runs self._init, and pops self.logical_context
            # from the stack.
            sys.run_with_logical_context(
                self.logical_context, self._init, n, precision)

        def _init(self, n, precision):
            self.i = 1
            self.n = n
            # Stored in self.logical_context, so it stays in effect
            # for every later __next__() call.
            decimal_prec.set(precision)

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_logical_context(
                self.logical_context, self._next_impl)

        def _next_impl(self):
            if self.i >= self.n:
                raise StopIteration
            result = MyDecimal(self.i) / MyDecimal(3)
            self.i += 1
            return result
For regular iterators such an approach to logical context management is
normally not necessary, and it is recommended to set and restore
context variables directly in ``__next__``::
class Series:
def __next__(self):
old_prec = decimal_prec.lookup()
try:
decimal_prec.set(self.precision)
...
finally:
decimal_prec.set(old_prec)
Asynchronous Generators
-----------------------
The execution context semantics in asynchronous generators does not
differ from that of regular generators and coroutines.
Implementation
==============
Execution context is implemented as an immutable linked list of
logical contexts, where each logical context is an immutable weak key
mapping. A pointer to the currently active execution context is stored
in the OS thread state::
+-----------------+
| | ec
| PyThreadState +-------------+
| | |
+-----------------+ |
|
ec_node ec_node ec_node v
+------+------+ +------+------+ +------+------+
| NULL | lc |<----| prev | lc |<----| prev | lc |
+------+--+---+ +------+--+---+ +------+--+---+
| | |
LC v LC v LC v
+-------------+ +-------------+ +-------------+
| var1: obj1 | | EMPTY | | var1: obj4 |
| var2: obj2 | +-------------+ +-------------+
| var3: obj3 |
+-------------+
The choice of the immutable list of immutable mappings as a fundamental
data structure is motivated by the need to efficiently implement
``sys.get_execution_context()``, which is to be frequently used by
asynchronous tasks and callbacks. When the EC is immutable,
``get_execution_context()`` can simply copy the current execution
context *by reference*::
def get_execution_context():
return PyThreadState_Get().ec
Let's review all possible context modification scenarios:
* The ``ContextVar.set()`` method is called::
def ContextVar_set(self, val):
# See a more complete set() definition
# in the `Context Variables` section.
tstate = PyThreadState_Get()
top_ec_node = tstate.ec
top_lc = top_ec_node.lc
new_top_lc = top_lc.set(self, val)
tstate.ec = ec_node(
prev=top_ec_node.prev,
lc=new_top_lc)
* ``sys.run_with_logical_context()`` is called, in which case
the passed logical context object is appended to the
execution context::
def run_with_logical_context(lc, func, *args, **kwargs):
tstate = PyThreadState_Get()
old_top_ec_node = tstate.ec
new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)
try:
tstate.ec = new_top_ec_node
return func(*args, **kwargs)
finally:
tstate.ec = old_top_ec_node
* ``sys.run_with_execution_context()`` is called, in which case
the current execution context is set to the passed execution context
with a new empty logical context appended to it::
def run_with_execution_context(ec, func, *args, **kwargs):
tstate = PyThreadState_Get()
old_top_ec_node = tstate.ec
new_lc = sys.new_logical_context()
new_top_ec_node = ec_node(prev=ec, lc=new_lc)
try:
tstate.ec = new_top_ec_node
return func(*args, **kwargs)
finally:
tstate.ec = old_top_ec_node
* One of ``genobj.send()``, ``genobj.throw()``, or ``genobj.close()``
is called on a ``genobj`` generator, in which case the logical
context recorded in ``genobj`` is pushed onto the stack::
PyGen_New(PyGenObject *gen):
gen.__logical_context__ = sys.new_logical_context()
gen_send(PyGenObject *gen, ...):
tstate = PyThreadState_Get()
if gen.__logical_context__ is not None:
old_top_ec_node = tstate.ec
new_top_ec_node = ec_node(
prev=old_top_ec_node,
lc=gen.__logical_context__)
try:
tstate.ec = new_top_ec_node
return _gen_send_impl(gen, ...)
finally:
gen.__logical_context__ = tstate.ec.lc
tstate.ec = old_top_ec_node
else:
return _gen_send_impl(gen, ...)
* Coroutines and asynchronous generators share the implementation
with generators, and the above changes apply to them as well.
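The first scenario above, combined with the by-reference ``get_execution_context()``, can be tried out with ordinary Python objects. This is a sketch under stated assumptions: ``ECNode`` and ``ThreadState`` are hypothetical stand-ins for the C structures, and a read-only ``MappingProxyType`` over a copied dict stands in for the immutable mapping.

```python
from types import MappingProxyType
from typing import NamedTuple, Optional


class ECNode(NamedTuple):
    prev: Optional["ECNode"]
    lc: MappingProxyType  # read-only view; stands in for an immutable LC


class ThreadState:
    # Hypothetical stand-in for PyThreadState.
    ec: Optional[ECNode] = None


tstate = ThreadState()
tstate.ec = ECNode(prev=None, lc=MappingProxyType({}))


def lc_set(lc, key, value):
    # Return a new mapping instead of mutating the old one. This copy is
    # O(N); it is a simplification used only in this sketch.
    new = dict(lc)
    new[key] = value
    return MappingProxyType(new)


def var_set(key, value):
    top = tstate.ec
    # Replace the top node; the nodes below it are shared, not copied.
    tstate.ec = ECNode(prev=top.prev, lc=lc_set(top.lc, key, value))


def var_lookup(key):
    node = tstate.ec
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None


def get_execution_context():
    return tstate.ec  # copy by reference, independent of the EC size


var_set("x", 1)
snapshot = get_execution_context()
var_set("x", 2)       # builds a new top node; the snapshot is unaffected
```

Since no node or mapping is ever mutated, a reference taken earlier keeps describing the state at the time it was taken.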
In certain scenarios the EC may need to be squashed to limit the
size of the chain. For example, consider the following corner case::
async def repeat(coro, delay):
await coro()
await asyncio.sleep(delay)
loop.create_task(repeat(coro, delay))
async def ping():
print('ping')
loop = asyncio.get_event_loop()
loop.create_task(repeat(ping, 1))
loop.run_forever()
In the above code, the EC chain will grow as long as ``repeat()`` is
called. Each new task will call ``sys.run_with_execution_context()``,
which will append a new logical context to the chain. To prevent
unbounded growth, ``sys.get_execution_context()`` checks if the chain
is longer than a predetermined maximum, and if it is, squashes the
chain into a single LC::
    def get_execution_context():
        tstate = PyThreadState_Get()

        if tstate.ec_len > EC_LEN_MAX:
            squashed_lc = sys.new_logical_context()

            # Use a separate name for the cursor so that it does not
            # shadow the ec_node() constructor used below.
            node = tstate.ec
            while node:
                # The LC.merge() method does not replace existing keys.
                squashed_lc = squashed_lc.merge(node.lc)
                node = node.prev

            return ec_node(prev=NULL, lc=squashed_lc)
        else:
            return tstate.ec
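The squashing walk can be checked in isolation with a runnable sketch. ``ECNode`` and ``squash`` are hypothetical names, plain dicts stand in for logical contexts, and ``dict.setdefault()`` plays the role of a merge that does not replace existing keys.

```python
from typing import NamedTuple, Optional


class ECNode(NamedTuple):
    prev: Optional["ECNode"]
    lc: dict


def squash(top):
    squashed = {}
    node = top
    while node is not None:
        # The walk starts at the top of the stack, so the first value
        # seen for a key is the one a lookup would have returned;
        # setdefault() never overwrites it with a value from lower down.
        for key, value in node.lc.items():
            squashed.setdefault(key, value)
        node = node.prev
    return ECNode(prev=None, lc=squashed)


bottom = ECNode(None, {"a": 1, "b": 2})
middle = ECNode(bottom, {})
top = ECNode(middle, {"a": 10})

flat = squash(top)
```

The squashed chain has a single node, and lookups against it give the same answers as lookups against the original three-node chain.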
Logical Context
---------------
Logical context is an immutable weak key mapping which has the
following properties with respect to garbage collection:
* ``ContextVar`` objects are strongly-referenced only from the
application code, not from any of the Execution Context machinery
or values they point to. This means that there are no reference
cycles that could extend their lifespan longer than necessary, or
prevent their collection by the GC.
* Values put in the Execution Context are guaranteed to be kept
alive while there is a ``ContextVar`` key referencing them in
the thread.
* If a ``ContextVar`` is garbage collected, all of its values will
be removed from all contexts, allowing them to be GCed if needed.
* If a thread has ended its execution, its thread state will be
cleaned up along with its ``ExecutionContext``, cleaning
up all values bound to all context variables in the thread.
As discussed earlier, we need ``sys.get_execution_context()`` to be
consistently fast regardless of the size of the execution context, so
logical context is necessarily an immutable mapping.
Choosing ``dict`` for the underlying implementation is suboptimal,
because ``LC.set()`` will cause ``dict.copy()``, which is an O(N)
operation, where *N* is the number of items in the LC.
``get_execution_context()``, when squashing the EC, is an O(M)
operation, where *M* is the total number of context variable values
in the EC.
So, instead of ``dict``, we choose Hash Array Mapped Trie (HAMT)
as the underlying implementation of logical contexts. (Scala and
Clojure use HAMT to implement high performance immutable collections
[5]_, [6]_.)
With HAMT ``.set()`` becomes an O(log N) operation, and
``get_execution_context()`` squashing is more efficient on average due
to structural sharing in HAMT.
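Structural sharing can be demonstrated with a much simplified persistent hash trie. It is not the reference HAMT: it omits bitmap compression and deletion, uses small dicts as branch nodes, and the names ``trie_set`` and ``trie_get`` are hypothetical. It only shows that an update copies the nodes on one root-to-leaf path and reuses everything else.

```python
BITS = 5                    # 32-way branching
MASK = (1 << BITS) - 1


def trie_set(node, key, value, shift=0, h=None):
    """Return a new trie containing key -> value; `node` is never mutated."""
    if h is None:
        h = hash(key)
    if node is None:                         # empty slot: make a leaf
        return ("leaf", h, ((key, value),))
    if isinstance(node, tuple):              # leaf: replace, extend or split
        _, leaf_hash, items = node
        if leaf_hash == h:
            kept = tuple(kv for kv in items if kv[0] != key)
            return ("leaf", h, kept + ((key, value),))
        # Different hash: push the old leaf one level down and retry.
        branch = {(leaf_hash >> shift) & MASK: node}
        return trie_set(branch, key, value, shift, h)
    index = (h >> shift) & MASK              # branch: copy one small node
    new_branch = dict(node)
    new_branch[index] = trie_set(node.get(index), key, value, shift + BITS, h)
    return new_branch


def trie_get(node, key, default=None):
    h, shift = hash(key), 0
    while isinstance(node, dict):            # descend through branches
        node = node.get((h >> shift) & MASK)
        shift += BITS
    if node is not None and node[1] == h:
        for k, v in node[2]:
            if k == key:
                return v
    return default


v1 = None
for i in range(1000):
    v1 = trie_set(v1, f"var{i}", i)
v2 = trie_set(v1, "var7", "changed")         # v1 is left untouched

# Count the root-level subtrees that v2 reuses from v1 by identity.
shared = sum(1 for i in v1 if v1[i] is v2.get(i))
```

All root-level subtrees except the one containing ``"var7"`` are the very same objects in both versions, which is what keeps an update far cheaper than copying a ``dict`` of the same size.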
See `Appendix: HAMT Performance Analysis`_ for a more elaborate
analysis of HAMT performance compared to ``dict``.
Context Variables
-----------------
The ``ContextVar.lookup()`` and ``ContextVar.set()`` methods are
implemented as follows (in pseudo-code)::
class ContextVar:
def lookup(self):
tstate = PyThreadState_Get()
ec_node = tstate.ec
while ec_node:
if self in ec_node.lc:
return ec_node.lc[self]
ec_node = ec_node.prev
return None
def set(self, value):
tstate = PyThreadState_Get()
top_ec_node = tstate.ec
if top_ec_node is not None:
top_lc = top_ec_node.lc
new_top_lc = top_lc.set(self, value)
tstate.ec = ec_node(
prev=top_ec_node.prev,
lc=new_top_lc)
else:
top_lc = sys.new_logical_context()
new_top_lc = top_lc.set(self, value)
tstate.ec = ec_node(
prev=NULL,
lc=new_top_lc)
For efficient access in performance-sensitive code paths, such as in
``numpy`` and ``decimal``, we add a cache to ``ContextVar.lookup()``,
making it an O(1) operation when the cache is hit. The cache key is
composed from the following:
* The new ``uint64_t PyThreadState->unique_id``, which is a globally
unique thread state identifier. It is computed from the new
``uint64_t PyInterpreterState->ts_counter``, which is incremented
whenever a new thread state is created.
* The ``uint64_t ContextVar->version`` counter, which is incremented
whenever the context variable value is changed in any logical context
in any thread.
The cache is then implemented as follows::
class ContextVar:
def set(self, value):
... # implementation
self.version += 1
def lookup(self):
tstate = PyThreadState_Get()
if (self.last_tstate_id == tstate.unique_id and
self.last_version == self.version):
return self.last_value
value = self._get_uncached()
self.last_value = value # borrowed ref
self.last_tstate_id = tstate.unique_id
self.last_version = self.version
return value
Note that ``last_value`` is a borrowed reference. The assumption
is that if the version checks are fine, the object will be alive.
This allows the values of context variables to be properly garbage
collected.
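The caching scheme can be exercised with a self-contained model. Several simplifications are assumed here: ``CachedVar`` is a hypothetical class, a per-thread dict replaces the walk over the execution context, ``threading.get_ident()`` replaces the proposed ``unique_id`` (unlike ``unique_id``, thread identifiers may be reused after a thread exits), and the cached value is an ordinary strong reference rather than a borrowed one.

```python
import threading


class CachedVar:
    def __init__(self):
        self.version = 0
        self.last_version = -1
        self.last_thread = None
        self.last_value = None
        self.slow_lookups = 0    # counts cache misses, for illustration
        self._values = {}        # thread id -> value; stands in for the EC

    def set(self, value):
        self._values[threading.get_ident()] = value
        self.version += 1        # invalidates every cached lookup

    def _lookup_uncached(self):
        self.slow_lookups += 1
        return self._values.get(threading.get_ident())

    def lookup(self):
        tid = threading.get_ident()
        if self.last_thread == tid and self.last_version == self.version:
            return self.last_value           # cache hit
        value = self._lookup_uncached()
        self.last_value = value
        self.last_thread = tid
        self.last_version = self.version
        return value


var = CachedVar()
var.set("a")
first = var.lookup()             # miss: fills the cache
second = var.lookup()            # hit: no slow lookup
misses_after_two_reads = var.slow_lookups

var.set("b")                     # bumps the version
third = var.lookup()             # miss again, sees the new value
```

Repeated reads between two ``set()`` calls cost one comparison of the thread identifier and one of the version counter, which is the property that matters for hot paths such as arithmetic in ``decimal``.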
This generic caching approach is similar to what the current C
implementation of ``decimal`` does to cache the current decimal
context, and has similar performance characteristics.
Performance Considerations
==========================
Tests of the reference implementation based on the prior
revisions of this PEP have shown 1-2% slowdown on generator
microbenchmarks and no noticeable difference in macrobenchmarks.
The performance of non-generator and non-async code is not
affected by this PEP.
Summary of the New APIs
=======================
Python
------
The following new Python APIs are introduced by this PEP:
1. The ``sys.new_context_var(name: str='...')`` function to create
``ContextVar`` objects.
2. The ``ContextVar`` object, which has:
* the read-only ``.name`` attribute,
* the ``.lookup()`` method which returns the value of the variable
in the current execution context;
* the ``.set()`` method which sets the value of the variable in
the current execution context.
3. The ``sys.get_execution_context()`` function, which returns a
copy of the current execution context.
4. The ``sys.new_execution_context()`` function, which returns a new
empty execution context.
5. The ``sys.new_logical_context()`` function, which returns a new
empty logical context.
6. The ``sys.run_with_execution_context(ec: ExecutionContext,
func, *args, **kwargs)`` function, which runs *func* with the
provided execution context.
7. The ``sys.run_with_logical_context(lc:LogicalContext,
func, *args, **kwargs)`` function, which runs *func* with the
provided logical context on top of the current execution context.
C API
-----
1. ``PyContextVar * PyContext_NewVar(char *desc)``: create a
``PyContextVar`` object.
2. ``PyObject * PyContext_LookupVar(PyContextVar *)``: return
the value of the variable in the current execution context.
3. ``int PyContext_SetVar(PyContextVar *, PyObject *)``: set
the value of the variable in the current execution context.
4. ``PyLogicalContext * PyLogicalContext_New()``: create a new empty
``PyLogicalContext``.
5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty
``PyExecutionContext``.
6. ``PyExecutionContext * PyExecutionContext_Get()``: return the
current execution context.
7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
passed EC object as the current for the active thread state.
8. ``int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *,
PyLogicalContext *)``: allows implementing the
``sys.run_with_logical_context`` Python API.
Design Considerations
=====================
Should ``PyThreadState_GetDict()`` use the execution context?
-------------------------------------------------------------
No. ``PyThreadState_GetDict`` is based on TLS, and changing its
semantics will break backwards compatibility.
PEP 521
-------
:pep:`521` proposes an alternative solution to the problem, which
extends the context manager protocol with two new methods:
``__suspend__()`` and ``__resume__()``. Similarly, the asynchronous
context manager protocol is also extended with ``__asuspend__()`` and
``__aresume__()``.
This allows implementing context managers that manage non-local state,
which behave correctly in generators and coroutines.
For example, consider the following context manager, which uses
execution state::
class Context:
def __init__(self):
self.var = new_context_var('var')
def __enter__(self):
self.old_x = self.var.lookup()
self.var.set('something')
def __exit__(self, *err):
self.var.set(self.old_x)
An equivalent implementation with PEP 521::
local = threading.local()
class Context:
def __enter__(self):
self.old_x = getattr(local, 'x', None)
local.x = 'something'
def __suspend__(self):
local.x = self.old_x
def __resume__(self):
local.x = 'something'
def __exit__(self, *err):
local.x = self.old_x
The downside of this approach is the addition of significant new
complexity to the context manager protocol and the interpreter
implementation. This approach is also likely to negatively impact
the performance of generators and coroutines.
Additionally, the solution in :pep:`521` is limited to context managers,
and does not provide any mechanism to propagate state in asynchronous
tasks and callbacks.
Can Execution Context be implemented outside of CPython?
--------------------------------------------------------
No. Proper generator behaviour with respect to the execution context
requires changes to the interpreter.
Should we update sys.displayhook and other APIs to use EC?
----------------------------------------------------------
APIs like redirecting stdout by overwriting ``sys.stdout``, or
specifying new exception display hooks by overwriting the
``sys.displayhook`` function, affect the whole Python process
**by design**. Their users assume that the effect of changing
them will be visible across OS threads. Therefore we cannot
simply make these APIs use the new Execution Context.
That said we think it is possible to design new APIs that will
be context aware, but that is outside of the scope of this PEP.
Greenlets
---------
Greenlet is an alternative implementation of cooperative
scheduling for Python. Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.
Conceptually, the behaviour of greenlets is very similar to that of
generators, which means that similar changes around greenlet entry
and exit can be done to add support for execution context.
Backwards Compatibility
=======================
This proposal preserves 100% backwards compatibility.
Appendix: HAMT Performance Analysis
===================================
.. figure:: pep-0550-hamt_vs_dict-v2.png
:align: center
:width: 100%
Figure 1. Benchmark code can be found here: [9]_.
The above chart demonstrates that:
* HAMT displays near O(1) performance for all benchmarked
dictionary sizes.
* ``dict.copy()`` becomes very slow around 100 items.
.. figure:: pep-0550-lookup_hamt.png
:align: center
:width: 100%
Figure 2. Benchmark code can be found here: [10]_.
Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping. HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.
There is research [8]_ showing that there are further possible
improvements to the performance of HAMT.
The reference implementation of HAMT for CPython can be found here:
[7]_.
Acknowledgments
===============
Thanks to Victor Petrovykh for countless discussions around the topic
and PEP proofreading and edits.
Thanks to Nathaniel Smith for proposing the ``ContextVar`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
coming up with the idea of having a stack of contexts in the thread
state.
Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.
Version History
===============
1. Initial revision, posted on 11-Aug-2017 [20]_.
2. V2 posted on 15-Aug-2017 [21]_.
The fundamental limitation that caused a complete redesign of the
first version was that it was not possible to implement an iterator
that would interact with the EC in the same way as generators
(see [19]_.)
Version 2 was a complete rewrite, introducing new terminology
(Local Context, Execution Context, Context Item) and new APIs.
3. V3 posted on 18-Aug-2017 [22]_.
Updates:
* Local Context was renamed to Logical Context. The term "local"
was ambiguous and conflicted with local name scopes.
* Context Item was renamed to Context Key, see the thread with Nick
Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.
* Context Item get cache design was adjusted, per Nathaniel Smith's
idea in [25]_.
* Coroutines are created without a Logical Context; ceval loop
no longer needs to special case the ``await`` expression
(proposed by Nick Coghlan in [24]_.)
4. V4 posted on 25-Aug-2017: the current version.
* The specification section has been completely rewritten.
* Context Key renamed to Context Var.
* Removed the distinction between generators and coroutines with
respect to logical context isolation.
References
==========
.. [1] https://blog.golang.org/context
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.…
.. [3] https://github.com/numpy/numpy/issues/9444
.. [4] http://bugs.python.org/issue31179
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashma…
.. [7] https://github.com/1st1/cpython/tree/hamt
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
.. [11] https://github.com/1st1/cpython/tree/pep550
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.…
.. [14] https://github.com/MagicStack/pgbench
.. [15] https://github.com/python/performance
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html
.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html
.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046775.html
.. [20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d0…
.. [21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c17…
.. [22] https://github.com/python/peps/blob/287ed87bb475a7da657f950b353c71c1248f67e…
.. [23] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html
.. [24] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html
.. [25] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html
Copyright
=========
This document has been placed in the public domain.
Hi everyone,
While looking over the PyLong source code in Objects/longobject.c I came
across the fact that the PyLong object doesn't include implementations for
basic inplace operations such as addition or multiplication:
[...]
long_long, /*nb_int*/
0, /*nb_reserved*/
long_float, /*nb_float*/
0, /* nb_inplace_add */
0, /* nb_inplace_subtract */
0, /* nb_inplace_multiply */
0, /* nb_inplace_remainder */
[...]
While I understand that the immutable nature of this type of object justifies
this approach, I wanted to experiment and see how much performance an inplace
add would bring.
My inplace add will revert to calling the default long_add function when:
- the refcount of the first operand indicates that it's being shared
or
- that operand is one of the preallocated 'small ints'
which should mitigate the effects of not conforming to the PyLong immutability
specification.
It also allocates a new PyLong _only_ in case of a potential overflow.
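The fallback rule described above can be modeled in Python roughly as follows. This is only a sketch of the decision, not the C patch itself: the function name is hypothetical, and the exact refcount threshold used by the patch is an assumption (any count above 1 is treated as shared here). The small-int range of -5 to 256 is the range CPython preallocates.

```python
SMALL_INT_MIN = -5    # CPython preallocates ints in [-5, 256]
SMALL_INT_MAX = 256


def can_add_in_place(refcount, value):
    """Model of the check: mutate only an unshared, non-cached int."""
    if refcount > 1:
        # Assumption: more than one reference means the object is shared.
        return False
    if SMALL_INT_MIN <= value <= SMALL_INT_MAX:
        # Preallocated small ints are shared singletons; never mutate them.
        return False
    return True
```

When the function returns False, the model corresponds to falling back to the regular long_add path, which allocates a new object.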
The workload I used to evaluate this is a simple script that does a lot of
inplace adding:
import time
import sys

def write_progress(prev_percentage, value, limit):
    percentage = (100 * value) // limit
    if percentage != prev_percentage:
        sys.stdout.write("%d%%\r" % (percentage))
        sys.stdout.flush()
    return percentage

progress = -1
the_value = 0
the_increment = ((1 << 30) - 1)
crt_iter = 0
total_iters = 10 ** 9
start = time.time()
while crt_iter < total_iters:
    the_value += the_increment
    crt_iter += 1
    progress = write_progress(progress, crt_iter, total_iters)
end = time.time()
print ("\n%.3fs" % (end - start))
print ("the_value: %d" % (the_value))
Running the baseline version outputs:
./python inplace.py
100%
356.633s
the_value: 1073741823000000000
Running the modified version outputs:
./python inplace.py
100%
308.606s
the_value: 1073741823000000000
In summary, I got a +13.47% improvement for the modified version.
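For reference, the percentage can be reproduced from the two timings above, assuming it is the reduction in run time relative to the baseline (the variable names are only illustrative):

```python
baseline_seconds = 356.633   # run time of the unmodified interpreter
modified_seconds = 308.606   # run time with the inplace add

# Reduction in run time, expressed as a percentage of the baseline.
improvement = (baseline_seconds - modified_seconds) / baseline_seconds * 100
print("%.2f%%" % improvement)
```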
The CPython revision I'm using is 7f066844a79ea201a28b9555baf4bceded90484f
from the master branch and I'm running on an i7-6700K CPU with Turbo-Boost
disabled (frequency is pinned at 4GHz).
Do you think that such an optimization would be a good approach?
Thank you,
Catalin
PEP 539 (second round): A new C API for Thread-Local Storage in CPython
by Masayuki YAMAMOTO 01 Sep '17
Hi python-dev,
Since Erik started the PEP 539 thread on python-ideas, I've collected
feedback from the discussion and the pull request, and tried to improve the
API specification and the reference implementation. As a result, I think the
issues pointed out in the feedback have been resolved.
Well, it's probably not finished yet; there is one thing that bothers me. I'm
not sure about the CPython startup sequence design (PEP 432, Restructuring
the CPython startup sequence; it might conflict with the draft specification
[1]). Please let me know what you think about the new API specification. In
any case, I'm starting a new thread for the updated draft.
Summary of technical changes:
- The two functions corresponding to PyThread_delete_key_value and
PyThread_ReInitTLS are omitted, because those exist only for CPython's own
TLS implementation, which has been removed.
- Add an internal field "_is_initialized" and a constant default value
"Py_tss_NEEDS_INIT" to the Py_tss_t type to indicate the thread key's
initialization state independently of the underlying implementation.
- Define the behavior of the functions that use the "_is_initialized"
field.
- Change the key argument to a pointer, so that the API can be used in the
limited API, which does not know the size of the key type.
- Add three functions for dynamic (de-)allocation and for checking the key's
initialization state, which are needed to handle the opaque struct.
- Change platform support: when thread support is enabled, all platforms are
required to provide at least one native thread implementation.
The draft also gains explanations and rationales for the above changes, as
well as additional annotations for information.
Regards,
Masayuki
[1]: The specifications of thread key creation and deletion are based on how
the API clients (Modules/_tracemalloc.c and Python/pystate.c) use them. One
of those, the Py_Initialize function, from which the call to
PyThread_tss_create originates, was "no-op when called for a second time" up
to CPython 3.6 [2]. However, the internal function _Py_InitializeCore, newly
added in the current master branch, is "fatal error when called for a second
time" [3].
[2]: https://docs.python.org/3.6/c-api/init.html#c.Py_Initialize
[3]: https://github.com/python/cpython/blob/master/Python/pylifecycle.c#L508
First round for PEP 539:
https://mail.python.org/pipermail/python-ideas/2016-December/043983.html
Discussion for the issue:
https://bugs.python.org/issue25658
HTML version for PEP 539 draft:
https://www.python.org/dev/peps/pep-0539/
Diff between first round and second round:
https://gist.github.com/ma8ma/624f9e4435ebdb26230130b11ce12d20/revisions
And the pull-request for reference implementation (work in progress):
https://github.com/python/cpython/pull/1362
========================================
PEP: 539
Title: A New C-API for Thread-Local Storage in CPython
Version: $Revision$
Last-Modified: $Date$
Author: Erik M. Bray, Masayuki Yamamoto
BDFL-Delegate: Nick Coghlan
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 20-Dec-2016
Post-History: 16-Dec-2016
Abstract
========
The proposal is to add a new Thread Local Storage (TLS) API to CPython which
would supersede use of the existing TLS API within the CPython interpreter,
while deprecating the existing API. The new API is named "Thread Specific
Storage (TSS) API" (see `Rationale for Proposed Solution`_ for the origin of
the name).
Because the existing TLS API is only used internally (it is not mentioned in
the documentation, and the header that defines it, ``pythread.h``, is not
included in ``Python.h`` either directly or indirectly), this proposal
probably only affects CPython, but might also affect other interpreter
implementations (PyPy?) that implement parts of the CPython API.
This is motivated primarily by the fact that the old API uses ``int`` to
represent TLS keys across all platforms, which is neither POSIX-compliant,
nor portable in any practical sense [1]_.
.. note::
Throughout this document the acronym "TLS" refers to Thread Local
Storage and should not be confused with "Transport Layer Security"
protocols.
Specification
=============
The current API for TLS used inside the CPython interpreter consists of 6
functions::
PyAPI_FUNC(int) PyThread_create_key(void)
PyAPI_FUNC(void) PyThread_delete_key(int key)
PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
PyAPI_FUNC(void *) PyThread_get_key_value(int key)
PyAPI_FUNC(void) PyThread_delete_key_value(int key)
PyAPI_FUNC(void) PyThread_ReInitTLS(void)
These would be superseded by a new set of analogous functions::
PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key)
PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value)
PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key)
The specification also adds a few new features:
* A new type ``Py_tss_t``--an opaque type the definition of which may
depend on the underlying TLS implementation. It is defined::
typedef struct {
bool _is_initialized;
NATIVE_TSS_KEY_T _key;
} Py_tss_t;
where ``NATIVE_TSS_KEY_T`` is a macro whose value depends on the
underlying native TLS implementation (e.g. ``pthread_key_t``).
* A constant default value for ``Py_tss_t`` variables,
``Py_tss_NEEDS_INIT``.
* Three new functions::
PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void)
PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key)
PyAPI_FUNC(bool) PyThread_tss_is_created(Py_tss_t *key)
The first two are needed for dynamic (de-)allocation of a ``Py_tss_t``,
particularly in extension modules built with ``Py_LIMITED_API``, where
static allocation of this type is not possible due to its implementation
being opaque at build time. A value returned by ``PyThread_tss_alloc``
is in the same state as a value initialized with ``Py_tss_NEEDS_INIT``, or is
``NULL`` if dynamic allocation fails. ``PyThread_tss_free`` first calls
``PyThread_tss_delete`` as a precaution, and is a no-op if the value
pointed to by the ``key`` argument is ``NULL``. ``PyThread_tss_is_created``
returns ``true`` if the given ``Py_tss_t`` has been initialized (i.e. by
``PyThread_tss_create``).
The new TSS API does not provide functions which correspond to
``PyThread_delete_key_value`` and ``PyThread_ReInitTLS``, because these
functions were needed only for CPython's own TLS implementation, which has
been removed. With that implementation gone, the existing API behaves as
follows: ``PyThread_delete_key_value(key)`` is equivalent to
``PyThread_set_key_value(key, NULL)``, and ``PyThread_ReInitTLS()`` is a
no-op [8]_.
The new ``PyThread_tss_`` functions are almost exactly analogous to their
original counterparts with a few minor differences: Whereas
``PyThread_create_key`` takes no arguments and returns a TLS key as an
``int``, ``PyThread_tss_create`` takes a ``Py_tss_t*`` as an argument and
returns an ``int`` status code. The behavior of ``PyThread_tss_create`` is
undefined if the value pointed to by the ``key`` argument is not initialized
by ``Py_tss_NEEDS_INIT``. The returned status code is zero on success
and non-zero on failure. The meanings of non-zero status codes are not
otherwise defined by this specification.
Similarly the other ``PyThread_tss_`` functions are passed a ``Py_tss_t*``
whereas previously the key was passed by value. This change is necessary,
as being an opaque type, the ``Py_tss_t`` type could hypothetically be almost
any size. This is especially necessary for extension modules built with
``Py_LIMITED_API``, where the size of the type is not known. Except for
``PyThread_tss_free``, the behaviors of ``PyThread_tss_`` are undefined if
the value pointed to by the ``key`` argument is ``NULL``.
Moreover, because of the use of ``Py_tss_t`` instead of ``int``, some
behaviors of the existing API design are carried over into the new API: TSS
key creation and deletion are parts of a "do-if-needed" flow, and they are
silently skipped if already done. Calling ``PyThread_tss_create`` with an
initialized key does nothing and immediately returns success. The same holds
for calling ``PyThread_tss_delete`` with an uninitialized key.
The behavior of ``PyThread_tss_delete`` is defined to change the key's
initialization state to "uninitialized" in order to restart the CPython
interpreter without terminating the process (e.g. embedding Python in an
application) [12]_.
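The create/delete semantics described above can be illustrated with a toy Python model. This is a sketch of the state transitions only, not the C API: the class ``TssKeyModel`` and its attributes are hypothetical names, and the counter stands in for the call to the native key creation routine.

```python
class TssKeyModel:
    """Toy model of a Py_tss_t key's initialization state (not the C API)."""

    def __init__(self):
        self.is_initialized = False   # mirrors the Py_tss_NEEDS_INIT state
        self.native_creations = 0     # counts simulated native key creations

    def create(self):
        """Model of PyThread_tss_create: returns 0 on success."""
        if self.is_initialized:
            return 0                  # already created: silently skipped
        self.native_creations += 1
        self.is_initialized = True
        return 0

    def delete(self):
        """Model of PyThread_tss_delete: resets the initialization state."""
        if not self.is_initialized:
            return                    # never created: silently skipped
        self.is_initialized = False


key = TssKeyModel()
key.delete()             # no-op on an uninitialized key
first = key.create()
second = key.create()    # no-op, still reports success
key.delete()             # the key may now be created again
```

Resetting the state in ``delete`` is what lets an embedding application finalize and re-initialize the interpreter with the same key object.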
The old ``PyThread_*_key*`` functions will be marked as deprecated in the
documentation, but will not generate runtime deprecation warnings.
Additionally, on platforms where ``sizeof(pthread_key_t) != sizeof(int)``,
``PyThread_create_key`` will return immediately with a failure status, and
the other TLS functions will all be no-ops on such platforms.
Comparison of API Specification
-------------------------------
=================  =============================  =============================
API                Thread Local Storage (TLS)     Thread Specific Storage (TSS)
=================  =============================  =============================
Version            Existing                       New
Key Type           ``int``                        ``Py_tss_t`` (opaque type)
Handle Native Key  cast to ``int``                conceal into internal field
Function Argument  ``int``                        ``Py_tss_t *``
Features           - create key                   - create key
                   - delete key                   - delete key
                   - set value                    - set value
                   - get value                    - get value
                   - delete value                 - (set ``NULL`` instead) [8]_
                   - reinitialize keys (for       - (unnecessary) [8]_
                     after fork)
                                                  - dynamically (de-)allocate
                                                    key
                                                  - check key's initialization
                                                    state
Default Value      (``-1`` as key creation        ``Py_tss_NEEDS_INIT``
                   failure)
Requirement        native thread                  native thread
                   (since CPython 3.7 [9]_)
Restriction        Not support platform where     Unable to statically allocate
                   native TLS key is defined in   key when ``Py_LIMITED_API``
                   a way that cannot be safely    is defined.
                   cast to ``int``.
=================  =============================  =============================
Example
-------
With the proposed changes, a TSS key is initialized like::
static Py_tss_t tss_key = Py_tss_NEEDS_INIT;
if (PyThread_tss_create(&tss_key)) {
/* ... handle key creation failure ... */
}
The initialization state of the key can then be checked like::
assert(PyThread_tss_is_created(&tss_key));
The rest of the API is used analogously to the old API::
int the_value = 1;
if (PyThread_tss_get(&tss_key) == NULL) {
PyThread_tss_set(&tss_key, (void *)&the_value);
assert(PyThread_tss_get(&tss_key) != NULL);
}
/* ... once done with the key ... */
PyThread_tss_delete(&tss_key);
assert(!PyThread_tss_is_created(&tss_key));
When ``Py_LIMITED_API`` is defined, a TSS key must be dynamically
allocated::
Py_tss_t *ptr_key = PyThread_tss_alloc();
if (ptr_key == NULL) {
/* ... handle key allocation failure ... */
}
assert(!PyThread_tss_is_created(ptr_key));
/* ... once done with the key ... */
PyThread_tss_free(ptr_key);
ptr_key = NULL;
Platform Support Changes
========================
A new "Native Thread Implementation" section will be added to PEP 11 that
states:
* As of CPython 3.7, when thread support is enabled, all platforms are
required to provide at least one native thread implementation (such as
pthreads or Windows) to implement the TSS API. Any TSS API problems that
occur in an implementation without native threads will be closed as
"won't fix".
Motivation
==========
The primary problem at issue here is the type of the keys (``int``) used for
TLS values, as defined by the original PyThread TLS API.
The original TLS API was added to Python by GvR back in 1997, and at the
time the key used to represent a TLS value was an ``int``, and so it has
been to the time of writing. This used CPython's own TLS implementation,
which long remained unused, largely unchanged, in Python/thread.c. Support
for implementation of the API on top of native thread implementations
(pthreads and Windows) was added much later, and CPython's own
implementation was no longer necessary and has been removed [9]_.
The problem with the choice of ``int`` to represent a TLS key, is that while
it was fine for CPython's own TLS implementation, and happens to be
compatible with Windows (which uses ``DWORD`` for the analogous data), it is
not compatible with the POSIX standard for the pthreads API, which defines
``pthread_key_t`` as an opaque type not further defined by the standard (as
with ``Py_tss_t`` described above) [14]_. This leaves it up to the
underlying implementation how a ``pthread_key_t`` value is used to look up
thread-specific data.
This has not generally been a problem for Python's API, as it just happens
that on Linux ``pthread_key_t`` is defined as an ``unsigned int``, and so is
fully compatible with Python's TLS API--``pthread_key_t``'s created by
``pthread_key_create`` can be freely cast to ``int`` and back (well, not
exactly, even this has some limitations as pointed out by issue #22206).
However, as issue #25658 points out, there are at least some platforms
(namely Cygwin, CloudABI, but likely others as well) which have otherwise
modern and POSIX-compliant pthreads implementations, but are not compatible
with Python's API because their ``pthread_key_t`` is defined in a way that
cannot be safely cast to ``int``. In fact, the possibility of running into
this problem was raised by MvL at the time pthreads TLS was added [2]_.
It could be argued that PEP-11 makes specific requirements for supporting a
new, not otherwise officially-supported platform (such as CloudABI), and that
the status of Cygwin support is currently dubious. However, this creates a
very high barrier to supporting platforms that are otherwise Linux- and/or
POSIX-compatible and where CPython might otherwise "just work" except for
this one hurdle. CPython itself imposes this implementation barrier by way
of an API that is not compatible with POSIX (and in fact makes invalid
assumptions about pthreads).
Rationale for Proposed Solution
===============================
The use of an opaque type (``Py_tss_t``) to key TLS values allows the API to
be compatible with all present (POSIX and Windows) and future (C11?)
native TLS implementations supported by CPython, as it allows the definition
of ``Py_tss_t`` to depend on the underlying implementation.
Since the existing TLS API has been available in *the limited API* [13]_ for
some platforms (e.g. Linux), CPython makes an effort to provide the new TSS
API at that level likewise. Note, however, that the ``Py_tss_t`` definition
becomes an opaque struct when ``Py_LIMITED_API`` is defined, because exposing
``NATIVE_TSS_KEY_T`` as part of the limited API would prevent us from
switching native thread implementation without rebuilding extension modules.
A new API must be introduced, rather than changing the function signatures
of the current API, in order to maintain backwards compatibility. The new
API also more clearly groups together these related functions under a single
name prefix, ``PyThread_tss_``. The "tss" in the name stands for
"thread-specific storage", and was influenced by the naming and design of the
"tss" API that is part of the C11 threads API [15]_. However, this is in no
way meant to imply compatibility with or support for the C11 threads API, or
signal any future intention of supporting C11--it's just the influence for
the naming and design.
The inclusion of the special default value ``Py_tss_NEEDS_INIT`` is required
by the fact that not all native TLS implementations define a sentinel value
for uninitialized TLS keys. For example, on Windows a TLS key is
represented by a ``DWORD`` (``unsigned int``) and its value must be treated
as opaque [3]_. So there is no unsigned integer value that can be safely
used to represent an uninitialized TLS key on Windows. Likewise, POSIX
does not specify a sentinel for an uninitialized ``pthread_key_t``, instead
relying on the ``pthread_once`` interface to ensure that a given TLS key is
initialized only once per-process. Therefore, the ``Py_tss_t`` type
contains an explicit ``._is_initialized`` that can indicate the key's
initialization state independent of the underlying implementation.
Changing ``PyThread_create_key`` to immediately return a failure status on
systems using pthreads where ``sizeof(int) != sizeof(pthread_key_t)`` is
intended as a sanity check: Currently, ``PyThread_create_key`` may report
initial success on such systems, but attempts to use the returned key are
likely to fail. Although in practice this failure occurs earlier in the
interpreter initialization, it's better to fail immediately at the source of
problem (``PyThread_create_key``) rather than sometime later when use of an
invalid key is attempted. In other words, this indicates clearly that the
old API is not supported on platforms where it cannot be used reliably, and
that no effort will be made to add such support.
Rejected Ideas
==============
* Do nothing: The status quo is fine because it works on Linux, and
platforms wishing to be supported by CPython should follow the requirements
of PEP-11. As explained above, while this would be a fair argument if
CPython were being asked to make changes to support particular quirks
or features of a specific platform, in this case it is a quirk of CPython
that prevents it from being used to its full potential on otherwise
POSIX-compliant platforms. The fact that the current implementation
happens to work on Linux is a happy accident, and there's no guarantee
that this will never change.
* Affected platforms should just configure Python ``--without-threads``:
This is a possible temporary workaround to the issue, but only that.
Python should not be hobbled on affected platforms despite them being
otherwise perfectly capable of running multi-threaded Python.
* Affected platforms should use CPython's own TLS implementation instead of
native TLS implementation: This is a more acceptable alternative to the
previous idea, and in fact there had been a patch to do just that [4]_.
However, CPython's own implementation being "slower and clunkier" in general
than native implementations still needlessly hobbles performance on
affected platforms. At least one other module (``tracemalloc``) is also
broken if Python is built without a native implementation. And this idea
cannot be adopted because CPython's own implementation was removed.
* Keep the existing API, but work around the issue by providing a mapping
from ``pthread_key_t`` values to ``int`` values. A couple of attempts were
made at this ([5]_, [6]_), but this only injects needless complexity and
overhead into performance-critical code on platforms that are not currently
affected by this issue (such as Linux). Even if use of this workaround were
made conditional on platform compatibility, it introduces platform-specific
code to maintain, and still has the problem of the previous rejected ideas of
needlessly hobbling performance on affected platforms.
Implementation
==============
An initial version of a patch [7]_ is available on the bug tracker for this
issue. Since the migration to Github, it's being developed in the
``pep539-tss-api`` feature branch [10]_ in Masayuki Yamamoto's fork of the
CPython repository on Github. A work-in-progress PR is available at [11]_.
This reference implementation covers not only the new API features
requested, but also the fixes to client code needed to replace the existing
TLS API with the new TSS API.
Copyright
=========
This document has been placed in the public domain.
References and Footnotes
========================
.. [1] http://bugs.python.org/issue25658
.. [2] https://bugs.python.org/msg116292
.. [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).…
.. [4] http://bugs.python.org/file45548/configure-pthread_key_t.patch
.. [5] http://bugs.python.org/file44269/issue25658-1.patch
.. [6] http://bugs.python.org/file44303/key-constant-time.diff
.. [7] http://bugs.python.org/file46379/pythread-tss-3.patch
.. [8] https://bugs.python.org/msg298342
.. [9] http://bugs.python.org/issue30832
.. [10] https://github.com/python/cpython/compare/master...ma8ma:pep539-tss-api
.. [11] https://github.com/python/cpython/pull/1362
.. [12] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx
.. [13] It is also called the "stable ABI" (https://www.python.org/dev/peps/pep-0384/)
.. [14] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create…
.. [15] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=404
Re: [Python-Dev] bpo-5001: More-informative multiprocessing error messages (#3079)
by Serhiy Storchaka 01 Sep '17
30.08.17 01:52, Antoine Pitrou wrote:
> https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3…
> commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> branch: master
> author: Allen W. Smith, Ph.D <drallensmith(a)users.noreply.github.com>
> committer: Antoine Pitrou <pitrou(a)free.fr>
> date: 2017-08-30T00:52:18+02:00
> summary:
>
> bpo-5001: More-informative multiprocessing error messages (#3079)
>
> * Make error message more informative
>
> Replace assertions in error-reporting code with more-informative version that doesn't cause confusion over where and what the error is.
>
> * Additional clarification + get travis to check
>
> * Change from SystemError to TypeError
>
> As suggested in PR comment by @pitrou, changing from SystemError; TypeError appears appropriate.
>
> * NEWS file installation; ACKS addition (will do my best to justify it by additional work)
>
> * Making current AssertionErrors in multiprocessing more informative
>
> * Blurb added re multiprocessing managers.py, queues.py cleanup
>
> * Further multiprocessing cleanup - went through pool.py
>
> * Fix two asserts in multiprocessing/util.py
>
> * Most asserts in multiprocessing more informative
>
> * Didn't save right version
>
> * Further work on multiprocessing error messages
>
> * Correct typo
>
> * Correct typo v2
>
> * Blasted colon... serves me right for trying to work on two things at once
>
> * Simplify NEWS entry
>
> * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst
>
> * Update 2017-08-18-17-16-38.bpo-5001.gwnthq.rst
>
> OK, never mind.
>
> * Corrected (thanks to pitrou) error messages for notify
>
> * Remove extraneous backslash in docstring.
Please, please don't forget to edit commit messages before merging. An
excessively verbose commit message will be kept in the repository
forever and will harm future developers who read the history.
Python 3.3 is fast approaching its end-of-life date, 2017-09-29. Per our release policy, that date is five years after the initial release of 3.3, 3.3.0 final on 2012-09-29. Note that 3.3 has been in security-fix only mode since the 2014-03-08 release of 3.3.5. It has been a while since we produced a 3.3.x security-fix release and, due to his commitments elsewhere, Georg has agreed for me to lead 3.3 to its well-deserved retirement.
To that end, I would like to schedule its next, and hopefully final, security-fix release to coincide with the already announced 3.4.7 security-fix release. In particular, we'll plan to tag and release 3.3.7rc1 on Monday 2017-07-24 (UTC) and tag and release 3.3.7 final on Monday 2017-08-07. In the coming days, I'll be reviewing the outstanding 3.3 security issues and merging appropriate 3.3 PRs. Some of them have been sitting as patches for a long time so, if you have any such security issues that you think belong in 3.3, it would be very helpful if you would review such patches and turn them into 3.3 PRs.
As a reminder, here are the guidelines from the devguide as to what is appropriate for a security-fix only branch:
"The only changes made to a security branch are those fixing issues exploitable by attackers such as crashes, privilege escalation and, optionally, other issues such as denial of service attacks. Any other changes are not considered a security risk and thus not backported to a security branch. You should also consider fixing hard-failing tests in open security branches since it is important to be able to run the tests successfully before releasing."
Note that documentation changes, other than any that might be related to a security fix, are also out of scope.
Assuming no new security issues arise prior to the EOL date, 3.3.7 will likely be the final release of 3.3. And you really shouldn't be using 3.3 at all at this point; while downstream distributors are, of course, free to provide support of 3.3 to their customers, in a little over two months when EOL is reached python-dev will no longer accept any issues or make any changes available for 3.3. If you are still using 3.3, you really owe it to your applications, to your users, and to yourself to upgrade to a more recent release of Python 3, preferably 3.6! Many, many fixes, new features, and substantial performance improvements await you.
https://www.python.org/dev/peps/pep-0398/
https://devguide.python.org/devcycle/#security-branches
--
Ned Deily
nad(a)python.org
Re: [Python-Dev] [Python-checkins] bpo-5001: More-informative multiprocessing error messages (#3079)
by Chris Jerdonek 30 Aug '17
https://github.com/python/cpython/commit/bd73e72b4a9f019be514954b1d40e64dc3…
> commit: bd73e72b4a9f019be514954b1d40e64dc3a5e81c
> branch: master
> author: Allen W. Smith, Ph.D <drallensmith(a)users.noreply.github.com>
> committer: Antoine Pitrou <pitrou(a)free.fr>
> date: 2017-08-30T00:52:18+02:00
> summary:
>
> ...
> @@ -307,6 +309,10 @@ def imap(self, func, iterable, chunksize=1):
> ))
> return result
> else:
> + if chunksize < 1:
> + raise ValueError(
> + "Chunksize must be 1+, not {0:n}".format(
> + chunksize))
> assert chunksize > 1
It looks like removing this assert statement was missed.
--Chris
> task_batches = Pool._get_tasks(func, iterable, chunksize)
> result = IMapIterator(self._cache)
> @@ -334,7 +340,9 @@ def imap_unordered(self, func, iterable, chunksize=1):
> ))
> return result
> else:
> - assert chunksize > 1
> + if chunksize < 1:
> + raise ValueError(
> + "Chunksize must be 1+, not {0!r}".format(chunksize))
> task_batches = Pool._get_tasks(func, iterable, chunksize)
> result = IMapUnorderedIterator(self._cache)
> self._taskqueue.put(
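To illustrate why the leftover assert is redundant, here is a minimal standalone sketch of the same branching. The function name ``check_chunksize`` and its return values are hypothetical and are not part of multiprocessing; only the comparisons and the error message mirror the quoted diff.

```python
def check_chunksize(chunksize):
    """Hypothetical helper mirroring the validation in the quoted diff."""
    if chunksize == 1:
        return "unbatched"
    if chunksize < 1:
        raise ValueError(
            "Chunksize must be 1+, not {0!r}".format(chunksize))
    # For an integer chunksize, reaching this point already implies
    # chunksize > 1, so a following "assert chunksize > 1" cannot fail.
    return "batched"
```

With the explicit ValueError in place, an invalid chunksize is reported even when Python runs with -O, which strips assert statements.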