I've received some enthusiastic emails from someone who wants to
revive restricted mode. He started out with a bunch of patches to the
CPython runtime using ctypes, which he attached to an App Engine bug:
http://code.google.com/p/googleappengine/issues/detail?id=671
Based on his code (the file secure.py is all you need, included in
secure.tar.gz) it seems he believes the only security leaks are
__subclasses__, gi_frame and gi_code. (I have since convinced him that
if we add "restricted" guards to these attributes, he doesn't need the
functions added to sys.)
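For reference, the classic escape that makes __subclasses__ dangerous
looks something like this (a sketch, assuming a Python 2-era runtime
like the one secure.py targets):

    # Walk from a harmless object up to `object` and back down to every
    # loaded class; in Python 2 the built-in `file` type is reachable.
    for cls in ().__class__.__bases__[0].__subclasses__():
        if cls.__name__ == 'file':
            print(cls('/etc/passwd').read())  # escapes a naive sandbox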
I don't recall the exploits that Samuele once posted that caused the
death of rexec.py -- does anyone recall, or have a pointer to the
threads?
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Alright, I will re-submit with the contents pasted. I never use double
backquotes as I think them rather ugly; that is the work of an editor
or some automated program in the chain. Plus, it also messed up my
line formatting and now I have lines with one word on them... Anyway,
the contents of PEP 3145:
PEP: 3145
Title: Asynchronous I/O For subprocess.Popen
Author: (James) Eric Pruitt, Charles R. McCreary, Josiah Carlson
Type: Standards Track
Content-Type: text/plain
Created: 04-Aug-2009
Python-Version: 3.2
Abstract:
In its present form, the subprocess.Popen implementation is prone to
deadlocking and blocking of the parent Python script while waiting on
data from the child process.
Motivation:
A search for "python asynchronous subprocess" will turn up numerous
accounts of people wanting to execute a child process and communicate with
it from time to time, reading only the data that is available instead of
blocking to wait for the program to produce data [1] [2] [3]. The current
behavior of the subprocess module is that when a user sends or receives
data via the stdin, stderr and stdout file objects, deadlocks are common
and documented [4] [5]. While communicate can be used to alleviate some of
the buffering issues, it will still cause the parent process to block while
attempting to read data when none is available to be read from the child
process.
Rationale:
There is a documented need for asynchronous, non-blocking functionality in
subprocess.Popen [6] [7] [2] [3]. Inclusion of the code would improve the
utility of the Python standard library on both Unix-based and Windows
builds of Python. Practically every I/O object in Python has a file-like
wrapper of some sort. Sockets already act as such, and for strings there
is StringIO. Popen can be made to act like a file by simply using the
methods attached to the subprocess.Popen.stderr, stdout and stdin
file-like objects. But when using the read and write methods of those
objects, you do not have the benefit of asynchronous I/O. In the proposed
solution, the wrapper wraps the asynchronous methods to mimic a file
object.
Reference Implementation:
I have been maintaining a Google Code repository that contains all of my
changes including tests and documentation [9] as well as a blog detailing
the problems I have come across in the development process [10].
I have been working on implementing non-blocking asynchronous I/O in the
subprocess.Popen module as well as a wrapper class for subprocess.Popen
that makes it so that an executed process can take the place of a file by
duplicating all of the methods and attributes that file objects have.
There are two base functions that have been added to the subprocess.Popen
class: Popen.send and Popen._recv, each with two separate implementations,
one for Windows and one for Unix-based systems. The Windows
implementation uses ctypes to access the functions needed to control pipes
in the kernel32 DLL in an asynchronous manner. On Unix-based systems,
the Python interface for file control serves the same purpose. The
different implementations of Popen.send and Popen._recv have identical
arguments to make code that uses these functions work across multiple
platforms.
Since Popen._recv requires the pipe name to be passed as an argument,
the Popen.recv function exists to select stdout as the pipe for
Popen._recv by default, and Popen.recv_err selects stderr by default.
"Popen.recv" and "Popen.recv_err" are much easier to read and understand
than "Popen._recv('stdout' ..." and "Popen._recv('stderr' ..."
respectively.
Since the Popen._recv function does not wait on data to be produced
before returning a value, it may return empty bytes. Popen.asyncread
handles this issue by returning all data read over a given time
interval.
The ProcessIOWrapper class uses the asyncread and asyncwrite functions to
allow a process to act like a file so that there are no blocking issues
that can arise from using the stdout and stdin file objects produced from
a subprocess.Popen call.
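To illustrate, a hedged usage sketch of the proposed API (the child
command is arbitrary, and the asyncread signature is my assumption for
illustration, not part of the specification above):

    import subprocess

    p = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p.send(b'hello\n')       # proposed non-blocking write to stdin
    data = p.recv()          # proposed non-blocking read of stdout;
                             # may return empty bytes if nothing is ready
    err = p.recv_err()       # same, but reads stderr
    more = p.asyncread(1.0)  # assumed: gather output for ~1 second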
References:
[1] [ python-Feature Requests-1191964 ] asynchronous Subprocess
    http://mail.python.org/pipermail/python-bugs-list/2006-December/036524.html
[2] Daily Life in an Ivory Basement : /feb-07/problems-with-subprocess
http://ivory.idyll.org/blog/feb-07/problems-with-subprocess
[3] How can I run an external command asynchronously from Python? - Stack Overflow
    http://stackoverflow.com/questions/636561/how-can-i-run-an-external-command-asynchronously-from-python
[4] 18.1. subprocess - Subprocess management - Python v2.6.2 documentation
http://docs.python.org/library/subprocess.html#subprocess.Popen.wait
[5] 18.1. subprocess - Subprocess management - Python v2.6.2 documentation
http://docs.python.org/library/subprocess.html#subprocess.Popen.kill
[6] Issue 1191964: asynchronous Subprocess - Python tracker
http://bugs.python.org/issue1191964
[7] Module to allow Asynchronous subprocess use on Windows and Posix platforms - ActiveState Code
    http://code.activestate.com/recipes/440554/
[8] subprocess.rst - subprocdev - Project Hosting on Google Code
http://code.google.com/p/subprocdev/source/browse/doc/subprocess.rst?spec=s…
[9] subprocdev - Project Hosting on Google Code
http://code.google.com/p/subprocdev
[10] Python Subprocess Dev
http://subdev.blogspot.com/
Copyright:
This PEP is licensed under the Open Publication License:
http://www.opencontent.org/openpub/
On Tue, Sep 8, 2009 at 22:56, Benjamin Peterson <benjamin(a)python.org> wrote:
> 2009/9/7 Eric Pruitt <eric.pruitt(a)gmail.com>:
>> Hello all,
>>
>> I have been working on adding asynchronous I/O to the Python
>> subprocess module as part of my Google Summer of Code project. Now
>> that I have finished documenting and pruning the code, I present PEP
>> 3145 for its inclusion into the Python core code. Any and all feedback
>> on the PEP (http://www.python.org/dev/peps/pep-3145/) is appreciated.
>
> Hi Eric,
> One of the reasons you're not getting many responses is that you've not
> pasted the contents of the PEP in this message. Pasting them would make
> it really easy for people to comment on various sections.
>
> BTW, it seems like you were trying to use reST formatting with the
> text PEP layout. Double backquotes only mean something in reST.
>
>
> --
> Regards,
> Benjamin
>
Which I noticed since it's cited in the BeOpen license we still refer
to in LICENSE. Since pythonlabs.com itself is still up, it probably
isn't much work to make the logos.html URI work again, but I don't know
who maintains that page.
cheers,
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
Hello everyone.
I see several problems with the two hex-conversion function pairs that
Python offers:
1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex
Problem #1:
bytes.hex is not implemented, although it was specified in PEP 358.
This means there is no symmetrical function to accompany bytes.fromhex.
Problem #2:
Both pairs perform the same function, although the Zen of Python
suggests that "There should be one-- and preferably only one --obvious
way to do it."
I do not understand why PEP 358 specified the bytes function pair although
it mentioned the binascii pair...
Problem #3:
bytes.fromhex may receive spaces in the input string, although
binascii.unhexlify may not.
I see no good reason for these two functions to have different features.
Problem #4:
binascii.unhexlify may receive both input types: strings or bytes, whereas
bytes.fromhex raises an exception when given a bytes parameter.
Again there is no reason for these functions to be different.
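A quick interactive illustration of problems #3 and #4 (Python 3
behavior as I understand it; exact error messages may differ):

    >>> bytes.fromhex("de ad be ef")       # spaces accepted
    b'\xde\xad\xbe\xef'
    >>> import binascii
    >>> binascii.unhexlify("de ad be ef")  # spaces rejected
    binascii.Error: Non-hexadecimal digit found
    >>> binascii.unhexlify(b"deadbeef")    # bytes input accepted here...
    b'\xde\xad\xbe\xef'
    >>> bytes.fromhex(b"deadbeef")         # ...but not here
    TypeError: fromhex() argument must be str, not bytes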
Problem #5:
binascii.hexlify returns a bytes type, although ideally converting to
hex should always return string types and converting from hex should
always return bytes. IMO there is no meaning in bytes as the output of
hexlify, since the output is a representation of other bytes. This is
also the suggested behavior of bytes.hex in PEP 358.
Problems #4 and #5 call for a decision about the input and output of the
functions being discussed:
Option A : Strict input and output
unhexlify (and bytes.fromhex) may only receive strings and may only
return bytes
hexlify (and bytes.hex) may only receive bytes and may only return
strings
Option B : Robust input and strict output
unhexlify (and bytes.fromhex) may receive bytes and strings and may only
return bytes
hexlify (and bytes.hex) may receive bytes or strings and may only return
strings
Of course we may also consider a third option, which will allow the return
type of
all functions to be robust (perhaps specified in a keyword argument), but as
I wrote in
the description of problem #5, I see no sense in that.
Note that PEP 3137 describes: "... the more strict definitions of encoding
and decoding in
Python 3000: encoding always takes a Unicode string and returns a bytes
sequence, and decoding
always takes a bytes sequence and returns a Unicode string." - suggesting
option A.
To repeat problems #4 and #5, the current behavior does not match any
option:
* The return type of binascii.hexlify should be string, and this is not the
current behavior.
As for the input:
* Option A is not the current behavior because binascii.unhexlify may
receive both input types.
* Option B is not the current behavior because bytes.fromhex does not allow
bytes as input.
To fix these issues, three changes should be applied:
1. Deprecate bytes.fromhex. This fixes the following problems:
#4 (go with option B and remove the function that does not allow bytes
input)
#2 (the binascii functions will be the only way to "do it")
#1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over unhexlify,
the latter function should be able to handle spaces in its input (fix #3)
3. binascii.hexlify should return str as its return type (fix #5)
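A rough sketch of what the fixed behavior could look like, expressed as
wrappers (the function names are invented for illustration):

    import binascii

    def unhexlify_robust(data):
        # Accept str or bytes and tolerate spaces (fixes #3 and #4);
        # always return bytes.
        if isinstance(data, str):
            data = data.encode("ascii")
        return binascii.unhexlify(data.replace(b" ", b""))

    def hexlify_str(data):
        # Always return a str rather than bytes (fix #5).
        return binascii.hexlify(data).decode("ascii")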
Hello there,
The last couple of days I've been working on an experimental rewrite of
the GIL. Since the work has turned out rather successful (or, at
least, not totally useless and crashing!) I thought I'd announce it
here.
First I want to stress this is not about removing the GIL. There still
is a Global Interpreter Lock which serializes access to most parts of
the interpreter. These protected parts haven't changed either, so Python
doesn't become really better at extracting computational parallelism out
of several cores.
Goals
-----
The new GIL (which is also the name of the sandbox area I've committed
it in, "newgil") addresses the following issues :
1) Switching by opcode counting. Counting opcodes is a very crude way of
estimating times, since the time spent executing a single opcode can
vary wildly. Literally, an opcode can be as short as a handful of
nanoseconds (think something like "... is not None") or as long as a
fraction of a second, or even longer (think calling a heavy
non-GIL-releasing C function, such as re.search()). Therefore, releasing
the GIL every 100 opcodes, regardless of their length, is a very poor
policy.
The new GIL does away with this by ditching _Py_Ticker entirely and
instead using a fixed interval (by default 5 milliseconds, but settable)
after which we ask the main thread to release the GIL and let another
thread be scheduled.
2) GIL overhead and efficiency in contended situations. Apparently, some
OSes (OS X mainly) have problems with lock performance when the lock is
already taken: the system calls are heavy. This is the "Dave Beazley
effect", where he takes a very trivial loop, therefore made of very short
opcodes and therefore releasing the GIL very often (probably 100000
times a second), and runs it in one or two threads on an OS with poor
lock performance (OS X). He sees a 50% increase in runtime when using
two threads rather than one, in what is admittedly a pathological case
(a minimal repro sketch appears at the end of this section).
Even on better platforms such as Linux, eliminating the overhead of many
GIL acquires and releases (since the new GIL is released on a fixed time
basis rather than on an opcode counting basis) yields slightly better
performance (read: a smaller performance degradation :-)) when there are
several pure Python computation threads running.
3) Thread switching latency. The traditional scheme merely releases the
GIL for a couple of CPU cycles, and reacquires it immediately.
Unfortunately, this doesn't mean the OS will automatically switch to
another, GIL-awaiting thread. In many situations, the same thread will
continue running. This, with the opcode counting scheme, is the reason
why some people have been complaining about latency problems when an I/O
thread competes with a computational thread (the I/O thread wouldn't be
scheduled right away when e.g. a packet arrives; or rather, it would be
scheduled by the OS, but unscheduled immediately when trying to acquire
the GIL, and it would be scheduled again only much later).
The new GIL improves on this by combining two mechanisms:
- forced thread switching, which means that when the switching interval
has elapsed (mentioned in 1) and the GIL is released, we will force
one of the threads waiting on the GIL to be scheduled instead of the
formerly GIL-holding thread. Which thread exactly is an OS decision,
however: the goal here is not to have our own scheduler (this could be
discussed, but I wanted the design to remain simple :-) After all,
man-years of work have been invested in scheduling algorithms by kernel
programming teams).
- priority requests, which are an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (ahead of any other
threads). This is meant to be used by GIL-releasing methods such as
read() on files and sockets. The scheme, again, is very simple: when a
priority request is made by a thread, the GIL is released as soon as
possible by the thread holding it (including in the eval loop), and then
the thread making the priority request is forcibly scheduled (by making
all other GIL-awaiting threads wait in the meantime).
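As an aside, the pathological case described in 2) is easy to reproduce
with something like the following sketch (timings will vary by machine
and OS; the loop size is arbitrary):

    import threading, time

    def count(n=10000000):
        # Trivial loop: very short opcodes, so under the old
        # opcode-counting scheme the GIL is released very often.
        while n:
            n -= 1

    t0 = time.time()
    count(); count()
    print("sequential: %.2fs" % (time.time() - t0))

    t0 = time.time()
    threads = [threading.Thread(target=count) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("two threads: %.2fs" % (time.time() - t0))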
Implementation
--------------
The new GIL is implemented using a couple of mutexes and condition
variables. A {mutex, condition} pair is used to protect the GIL itself,
which is a mere variable named `gil_locked` (there are a couple of other
variables for bookkeeping). Another {mutex, condition} pair is used for
forced thread switching (described above). Finally, a separate mutex is
used for priority requests (described above).
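As a rough Python-level analogy of this scheme (only the basic
{mutex, condition} pair; forced switching and priority requests are
omitted, and this is an illustration, not the actual C code):

    import threading

    class ToyGIL:
        def __init__(self, interval=0.005):
            self.cond = threading.Condition()  # protects gil_locked
            self.gil_locked = False
            self.interval = interval           # 5 ms default

        def acquire(self):
            with self.cond:
                while self.gil_locked:
                    # In the real code, timing out here is what triggers
                    # the request for the holder to release the GIL.
                    self.cond.wait(timeout=self.interval)
                self.gil_locked = True

        def release(self):
            with self.cond:
                self.gil_locked = False
                self.cond.notify()             # wake one awaiting thread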
The code is in the sandbox:
http://svn.python.org/view/sandbox/trunk/newgil/
The file of interest is Python/ceval_gil.h. Changes in other files are
very minimal, except for priority requests which have been added at
strategic places (some methods of I/O modules). Also, the code remains
rather short, while of course being less trivial than the old one.
NB : this is a branch of py3k. There should be no real difficulty
porting it back to trunk, provided someone wants to do the job.
Platforms
---------
I've implemented the new GIL for POSIX and Windows (tested under Linux
and Windows XP (running in a VM)). Judging by what I can read in the
online MSDN docs, the Windows support should include everything from
Windows 2000, and probably recent versions of Windows CE.
Other platforms aren't implemented, because I don't have access to the
necessary hardware. Besides, I must admit I'm not very motivated in
working on niche/obsolete systems. I've e-mailed Andrew MacIntyre in
private to ask him if he'd like to do the OS/2 support.
Supporting a new platform is not very difficult: it's a matter of
writing the 50-or-so lines of necessary platform-specific macros at the
beginning of Python/ceval_gil.h.
The reason I couldn't use the existing thread support
(Python/thread_*.h) is that these abstractions are too poor. Mainly,
they don't provide:
- events, conditions or an equivalent thereof
- the ability to acquire a resource with a timeout
Measurements
------------
Before starting this work, I wrote ccbench (*), a little benchmark
script ("ccbench" being a shorthand for "concurrency benchmark") which
measures two things:
- computation throughput with one or several concurrent threads
- latency to external events (I use a UDP socket) when there are zero,
one, or several background computation threads running
(*) http://svn.python.org/view/sandbox/trunk/ccbench/
The benchmark involves several computation workloads with different GIL
characteristics. By default there are 3 of them:
A- one pure Python workload (computation of a number of digits of pi):
that is, something which spends its time in the eval loop
B- one mostly C workload where the C implementation doesn't release the
GIL (regular expression matching)
C- one mostly C workload where the implementation does release the GIL
(bz2 compression)
In the ccbench directory you will find benchmark results, under Linux,
for two different systems I have here. The new GIL shows roughly similar
but slightly better throughput results than the old one. And it is much
better in the latency tests, especially in workload B (going down from
almost a second of average latency with the old GIL, to a couple of
milliseconds with the new GIL). This is the combined result of using a
time-based scheme (rather than opcode-based) and of forced thread
switching (rather than relying on the OS to actually switch threads when
we speculatively release the GIL).
As a side note, I might mention that single-threaded performance is not
degraded at all. It is, actually, theoretically a bit better because the
old ticker check in the eval loop becomes simpler; however, this goes
mostly unnoticed.
Now what remains to be done?
Having other people test it would be fine. Even better if you have an
actual multi-threaded py3k application. But ccbench results for other
OSes would be nice too :-)
(I get good results under the Windows XP VM but I feel that a VM is not
an ideal setup for a concurrency benchmark)
Of course, studying and reviewing the code is welcome. As for
integrating it into the mainline py3k branch, I guess we have to answer
these questions:
- is the approach interesting? (we could decide that it's just not worth
it, and that a good GIL can only be a dead (removed) GIL)
- is the patch good, mature and debugged enough?
- how do we deal with the unsupported platforms (POSIX and Windows
support should cover most bases, but the fate of OS/2 support depends on
Andrew)?
Regards
Antoine.
Hi,
recently I wrote an algorithm in which I very often had to get an arbitrary
element from a set without removing it.
Three possibilities came to mind:
1. x = some_set.pop()
   some_set.add(x)
2. for x in some_set:
       break
3. x = iter(some_set).next()
Of course, the third should be the fastest. It nevertheless goes through all
the iterator creation stuff, which costs some time. I wondered, why the builtin
set does not provide a more direct and efficient way for retrieving some element
without removing it. Is there any reason for this?
I imagine something like
x = some_set.get()
or
x = some_set.pop(False)
and am thinking about providing a patch against setobject.c (preferring
the .get() solution, essentially a stripped-down pop()).
Before, I would like to know whether I have overlooked something or whether
this can be done in an already existing way.
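For reference, the three approaches are easy to compare with timeit
(a sketch; next(iter(s)) is the 2.6+/3.x spelling of option 3):

    import timeit

    setup = "s = set(range(1000))"
    for stmt in ("x = s.pop(); s.add(x)",   # option 1
                 "for x in s: break",       # option 2
                 "x = next(iter(s))"):      # option 3
        print("%-25s %.3f" % (stmt, timeit.timeit(stmt, setup=setup)))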
Thanks,
wr
Hello,
Since the addition of PEP 370 (per-user site packages), site.py and
distutils/command/install.py *both* provide the various installation
directories for Python, depending on the system and the Python version.
We have also started to discuss lately in various Mailing Lists the
addition of new schemes for IronPython and Jython, meaning that we
might add some more in both places.
I would like to suggest a simplification by adding a dedicated module
to manage these installation schemes in one single place in the
stdlib.
This new independent module would be used by site.py and distutils and
would also make it easier for third party code to work with these
schemes.
Of course this new module would be rather simple and would not add any
new import statement, to avoid any overhead when Python starts and loads
site.py.
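To make the idea concrete, such a module might look roughly like this
(all names and templates are hypothetical, a sketch of the shape rather
than a proposed API):

    # install_schemes.py -- hypothetical module
    SCHEMES = {
        'posix_prefix': {
            'purelib': '{base}/lib/python{py_version_short}/site-packages',
            'scripts': '{base}/bin',
        },
        'nt': {
            'purelib': '{base}/Lib/site-packages',
            'scripts': '{base}/Scripts',
        },
    }

    def get_path(name, scheme='posix_prefix', vars=None):
        # Expand one installation path template with the given variables.
        return SCHEMES[scheme][name].format(**(vars or {}))

    # e.g. get_path('purelib', vars={'base': '/usr',
    #                                'py_version_short': '2.7'})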
Regards
Tarek
Is there any possibility of backporting support for the nonlocal keyword
into a 2.x release? I see it's not in 2.6, but I don't know if that was an
intentional design choice or due to a lack of demand / round tuits. I'm
also not sure if this would fall under the scope of the proposed
moratorium on new language features (although my first impression was
that it could be allowed since it already exists in Python 3).
One of my motivations for asking is a recent blog post by Fernando Perez of
IPython fame that describes an interesting decorator-based idiom inspired by
Apple's Grand Central Dispatch which would allow many interesting
possibilities for expressing parallelization and other manipulations of
execution context for blocks of python code. Unfortunately, using the
technique to its fullest extent requires the nonlocal keyword.
The blog post is here:
https://cirl.berkeley.edu/fperez/py4science/decorators.html
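For readers who haven't seen the idiom, a minimal Python 3 example of
why it needs nonlocal (the decorator name is invented for illustration):

    def run_now(func):
        # Hypothetical decorator: execute the decorated block immediately.
        func()
        return func

    def outer():
        result = None
        @run_now
        def block():
            nonlocal result  # rebinds outer's variable; no 2.x equivalent
            result = 42
        return result

    print(outer())  # prints 42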
Mike
I propose the following PEP for inclusion to Python 3.1.
Please comment.
Regards,
Martin
Abstract
========
Namespace packages are a mechanism for splitting a single Python
package across multiple directories on disk. In current Python
versions, an algorithm to compute the package's __path__ must be
formulated. With the enhancement proposed here, the import machinery
itself will construct the list of directories that make up the
package.
Terminology
===========
Within this PEP, the term package refers to Python packages as defined
by Python's import statement. The term distribution refers to
separately installable sets of Python modules as stored in the Python
package index, and installed by distutils or setuptools. The term
vendor package refers to groups of files installed by an operating
system's packaging mechanism (e.g. Debian or Red Hat packages installed
on Linux systems).
The term portion refers to a set of files in a single directory (possibly
stored in a zip file) that contribute to a namespace package.
Namespace packages today
========================
Python currently provides the pkgutil.extend_path to denote a package as
a namespace package. The recommended way of using it is to put::
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that extend_path is
invoked independent of which portion of the package gets imported
first. As a consequence, the package's ``__init__.py`` cannot
practically define any names, since which portion is imported first
depends on the order of the package fragments on sys.path. As a special
feature, extend_path reads files named ``*.pkg`` which allow additional
portions to be declared.
setuptools provides a similar function pkg_resources.declare_namespace
that is used in the form::
import pkg_resources
pkg_resources.declare_namespace(__name__)
In the portion's __init__.py, no assignment to __path__ is necessary,
as declare_namespace modifies the package __path__ through sys.modules.
As a special feature, declare_namespace also supports zip files, and
registers the package name internally so that future additions to sys.path
by setuptools can properly add additional portions to each package.
setuptools allows declaring namespace packages in a distribution's
setup.py, so that distribution developers don't need to put the
magic __path__ modification into __init__.py themselves.
Rationale
=========
The current imperative approach to namespace packages has led to
multiple slightly-incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its _namespace_packages variable, whereas
pkgutil doesn't.
In addition, the current approach causes problems for system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior. As vendors might choose to
package distributions such that they will end up all in a single
directory for the namespace package, all portions would contribute
conflicting __init__.py files.
Specification
=============
Rather than using an imperative mechanism for importing packages, a
declarative approach is proposed here, as an extension to the existing
``*.pkg`` mechanism.
The import statement is extended so that it directly considers ``*.pkg``
files during import; a directory is considered a package if it either
contains a file named __init__.py, or a file whose name ends with
".pkg".
In addition, the format of the ``*.pkg`` file is extended: a line with
the single character ``*`` indicates that the entire sys.path will
be searched for portions of the namespace package at the time the
namespace package is imported.
Importing a package will immediately compute the package's __path__;
the ``*.pkg`` files are not considered anymore after the initial import.
If a ``*.pkg`` file contains an asterisk, this asterisk is prepended
to the package's __path__ to indicate that the package is a namespace
package (and that thus further extensions to sys.path might also
want to extend __path__). At most one such asterisk gets prepended
to the path.
extend_path will be extended to recognize namespace packages according
to this PEP, and avoid adding directories twice to __path__.
No other change to the importing mechanism is made; searching
modules (including __init__.py) will continue to stop at the first
module encountered.
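For illustration, a hypothetical on-disk layout under this proposal (the
distribution and package names are invented)::

    site-packages/
        zope/
            zope.interface.pkg    (contains the single line: *)
            interface/
                __init__.py

The ``zope`` directory has no ``__init__.py`` but is still recognized as
a package because it contains a ``*.pkg`` file; the asterisk in that
file causes all of sys.path to be searched for further ``zope``
portions.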
Discussion
==========
With the addition of ``*.pkg`` files to the import mechanism, namespace
packages can stop filling out the namespace package's __init__.py.
As a consequence, extend_path and declare_namespace become obsolete.
It is recommended that distributions put a file <distribution>.pkg
into their namespace packages, with a single asterisk. This allows
vendor packages to install multiple portions of namespace package
into a single directory, with no risk of overlapping files.
Namespace packages can start providing non-trivial __init__.py
implementations; to do so, it is recommended that a single distribution
provides a portion with just the namespace package's __init__.py
(and potentially other modules that belong to the namespace package
proper).
The mechanism is mostly compatible with the existing namespace
mechanisms. extend_path will be adjusted to this specification;
any other mechanism might cause portions to get added twice to
__path__.
Copyright
=========
This document has been placed in the public domain.
Another summit, another potential time to see if people want to change
anything about the issue tracker. I would bring up:
- Dropping Stage in favor of some keywords (e.g. 'needs unit test', 'needs
docs')
- Adding a freestyle text box to delineate which, if any, stdlib module is
the cause of a bug and tie that into Misc/maintainers.rst; would potentially
scale back the Component box
-Brett