Here is some proposed wording. Since it is more of a clarification of what
it takes to garner support -- which is just a new section -- rather than a
complete rewrite, I'm including just the diff to make the changes easier
to read.
*diff -r 49d18bb47ebc pep-0011.txt*
*--- a/pep-0011.txt Wed May 14 11:18:22 2014 -0400*
*+++ b/pep-0011.txt Fri May 16 13:48:30 2014 -0400*
@@ -2,22 +2,21 @@
Title: Removing support for little used platforms
Version: $Revision$
Last-Modified: $Date$
-Author: martin(a)v.loewis.de (Martin von Löwis)
+Author: Martin von Löwis <martin(a)v.loewis.de>,
+ Brett Cannon <brett(a)python.org>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 07-Jul-2002
Post-History: 18-Aug-2007
+ 16-May-2014
Abstract
--------
-This PEP documents operating systems (platforms) which are not
-supported in Python anymore. For some of these systems,
-supporting code might be still part of Python, but will be removed
-in a future release - unless somebody steps forward as a volunteer
-to maintain this code.
+This PEP documents how an operating system (platform) garners
+support in Python as well as documenting past support.
Rationale
@@ -37,16 +36,53 @@
change to the Python source code will work on all supported
platforms.
-To reduce this risk, this PEP proposes a procedure to remove code
-for platforms with no Python users.
+To reduce this risk, this PEP specifies what is required for a
+platform to be considered supported by Python as well as providing a
+procedure to remove code for platforms with little or no Python
+users.
+Supporting platforms
+--------------------
+
+Gaining official platform support requires two things. First, a core
+developer needs to volunteer to maintain platform-specific code. This
+core developer can either already be a member of the Python
+development team or be given contributor rights on the basis of
+maintaining platform support (it is at the discretion of the Python
+development team to decide if a person is ready to have such rights
+even if it is just for supporting a specific platform).
+
+Second, a stable buildbot must be provided [2]_. This guarantees that
+platform support will not be accidentally broken by a Python core
+developer who does not have personal access to the platform. For a
+buildbot to be considered stable it requires that the machine be
+reliably up and functioning (but it is up to the Python core
+developers to decide whether to promote a buildbot to being
+considered stable).
+
+This policy does not disqualify supporting other platforms
+indirectly. Patches which are not platform-specific but still done to
+add platform support will be considered for inclusion. For example,
+if platform-independent changes were needed in the configure
+script in order to support a specific platform, those changes
+would be accepted. Patches which add platform-specific code such as the
+name of a specific platform to the configure script will generally
+not be accepted without the platform having official support.
+
+CPU architecture and compiler support are viewed in a similar manner
+as platforms. For example, to consider the ARM architecture supported
+a buildbot running on ARM would be required along with support from
+the Python development team. In general it is not required to have
+a CPU architecture run under every possible platform in order to be
+considered supported.
Unsupporting platforms
----------------------
-If a certain platform that currently has special code in it is
-deemed to be without Python users, a note must be posted in this
-PEP that this platform is no longer actively supported. This
+If a certain platform that currently has special code in Python is
+deemed to be without Python users or lacks proper support from the
+Python development team and/or a buildbot, a note must be posted in
+this PEP that this platform is no longer actively supported. This
note must include:
- the name of the system
@@ -69,8 +105,8 @@
forward and offer maintenance.
-Resupporting platforms
-----------------------
+Re-supporting platforms
+-----------------------
If a user of a platform wants to see this platform supported
again, he may volunteer to maintain the platform support. Such an
@@ -101,7 +137,7 @@
release is made. Developers of extension modules will generally need
to use the same Visual Studio release; they are concerned both with
the availability of the versions they need to use, and with keeping
-the zoo of versions small. The Python source tree will keep
+the zoo of versions small. The Python source tree will keep
unmaintained build files for older Visual Studio releases, for which
patches will be accepted. Such build files will be removed from the
source tree 3 years after the extended support for the compiler has
@@ -223,6 +259,7 @@
----------
.. [1] http://support.microsoft.com/lifecycle/
+.. [2] http://buildbot.python.org/3.x.stable/
Copyright
---------
The current memory layout for dictionaries is
unnecessarily inefficient. It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.
Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
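To make the lookup path concrete, here is a minimal sketch of how a key
could be resolved through the sparse indices table into the dense
entries table. It uses simple linear probing for readability, not
CPython's actual perturb-based probe sequence:

def lookup(indices, entries, key):
    mask = len(indices) - 1
    h = hash(key)
    i = h & mask
    while True:
        slot = indices[i]
        if slot is None:
            # Empty slot in the sparse table: the key is absent.
            raise KeyError(key)
        entry_hash, entry_key, entry_value = entries[slot]
        if entry_hash == h and entry_key == key:
            # Hit: the actual data lives in the dense table.
            return entry_value
        # Collision: keep probing the sparse table of indices.
        i = (i + 1) & mask

With the example above (and assuming the listed hash values),
lookup(indices, entries, 'barry') follows indices[1] == 1 into
entries[1] and returns 'green'.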
Only the data layout needs to change. The hash table
algorithms would stay the same. All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts. There is no change to the hash functions, the
table search order, or collision statistics.
The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices). That gives 58% compression.
Note that the sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts, and
up to sizeof(Py_ssize_t) for huge dicts.
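As a rough cross-check of those numbers, here is a small sketch that
computes both layouts; the index-width thresholds are illustrative,
not a specification:

def index_width(t):
    # Smallest integer width that can address a table of t slots.
    if t <= 2**8:
        return 1
    if t <= 2**16:
        return 2
    if t <= 2**32:
        return 4
    return 8  # up to sizeof(Py_ssize_t) on a 64-bit build

def layout_sizes(t, n):
    curr_size = 24 * t
    new_size = 24 * n + index_width(t) * t
    return curr_size, new_size

print(layout_sizes(8, 3))  # (192, 80) for the timmy/barry/guido example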
In addition to space savings, the new memory layout
makes iteration faster. Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table. Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.
Another benefit is that resizing is faster and
touches fewer pieces of memory. Currently, every
hash/key/value entry is moved or copied during a
resize. In the new layout, only the indices are
updated. For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
With the reduced memory footprint, we can also expect
better cache utilization.
For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers. The
space savings percentages are a bit different on other
builds. Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).
Raymond
Hi all,
https://github.com/python/cpython is now live as a semi-official, *read
only* Github mirror of the CPython Mercurial repository. Let me know if you
have any problems/concerns.
I still haven't decided how often to update it (considering either just N
times a day, or maybe using an Hg hook for batching). Suggestions are welcome.
I created the mirror with hg-fast-export. I also tried to
pack and gc the git repo as much as possible before the initial Github push
- it went down from almost 2GB to ~200MB (so this is the size of a fresh
clone right now).
Eli
P.S. thanks Jesse for the keys to https://github.com/python
I've received some enthusiastic emails from someone who wants to
revive restricted mode. He started out with a bunch of patches to the
CPython runtime using ctypes, which he attached to an App Engine bug:
http://code.google.com/p/googleappengine/issues/detail?id=671
Based on his code (the file secure.py is all you need, included in
secure.tar.gz) it seems he believes the only security leaks are
__subclasses__, gi_frame and gi_code. (I have since convinced him that
if we add "restricted" guards to these attributes, he doesn't need the
functions added to sys.)
I don't recall the exploits that Samuele once posted that caused the
death of rexec.py -- does anyone recall, or have a pointer to the
threads?
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Hi,
I added a new "stats" page to the bug tracker:
http://bugs.python.org/issue?@template=stats
The page can be reached from the sidebar of the bug tracker: Summaries -> Stats
The data are updated once a week, together with the Summary of Python
tracker issues.
Best Regards,
Ezio Melotti
Hi Python dev folks,
I've written a PEP proposing a specific os.scandir() API for a
directory iterator that returns the stat-like info from the OS, the
main advantage of which is to speed up os.walk() and similar
operations by 4-20x, depending on your OS and file system. Full
details, background info, and context links are in the PEP, which
Victor Stinner has uploaded at the following URL, and I've also copied
inline below.
http://legacy.python.org/dev/peps/pep-0471/
Would love feedback on the PEP, but also of course on the proposal itself.
-Ben
PEP: 471
Title: os.scandir() function -- a better and faster directory iterator
Version: $Revision$
Last-Modified: $Date$
Author: Ben Hoyt <benhoyt(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2014
Python-Version: 3.5
Abstract
========
This PEP proposes including a new directory iteration function,
``os.scandir()``, in the standard library. This new function adds
useful functionality and increases the speed of ``os.walk()`` by 2-10
times (depending on the platform and file system) by significantly
reducing the number of times ``stat()`` needs to be called.
Rationale
=========
Python's built-in ``os.walk()`` is significantly slower than it needs
to be, because -- in addition to calling ``os.listdir()`` on each
directory -- it executes the system call ``os.stat()`` or
``GetFileAttributes()`` on each file to determine whether the entry is
a directory or not.
But the underlying system calls -- ``FindFirstFile`` /
``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
already tell you whether the files returned are directories or not, so
no further system calls are needed. In short, you can reduce the
number of system calls from approximately 2N to N, where N is the
total number of files and directories in the tree. (And because
directory trees are usually much wider than they are deep, it's often
much better than this.)
In practice, removing all those extra system calls makes ``os.walk()``
about **8-9 times as fast on Windows**, and about **2-3 times as fast
on Linux and Mac OS X**. So we're not talking about
micro-optimizations. See more `benchmarks`_.
.. _`benchmarks`: https://github.com/benhoyt/scandir#benchmarks
Somewhat relatedly, many people (see Python `Issue 11406`_) are also
keen on a version of ``os.listdir()`` that yields filenames as it
iterates instead of returning them as one big list. This improves
memory efficiency for iterating very large directories.
So as well as providing a ``scandir()`` iterator function for calling
directly, Python's existing ``os.walk()`` function could be sped up a
huge amount.
.. _`Issue 11406`: http://bugs.python.org/issue11406
Implementation
==============
The implementation of this proposal was written by Ben Hoyt (initial
version) and Tim Golden (who helped a lot with the C extension
module). It lives on GitHub at `benhoyt/scandir`_.
.. _`benhoyt/scandir`: https://github.com/benhoyt/scandir
Note that this module has been used and tested (see "Use in the wild"
section in this PEP), so it's more than a proof-of-concept. However,
it is marked as beta software and is not extensively battle-tested.
It will need some cleanup and more thorough testing before going into
the standard library, as well as integration into `posixmodule.c`.
Specifics of proposal
=====================
Specifically, this PEP proposes adding a single function to the ``os``
module in the standard library, ``scandir``, that takes a single,
optional string as its argument::
scandir(path='.') -> generator of DirEntry objects
Like ``listdir``, ``scandir`` calls the operating system's directory
iteration system calls to get the names of the files in the ``path``
directory, but it's different from ``listdir`` in two ways:
* Instead of bare filename strings, it returns lightweight
``DirEntry`` objects that hold the filename string and provide
simple methods that allow access to the stat-like data the operating
system returned.
* It returns a generator instead of a list, so that ``scandir`` acts
as a true iterator instead of returning the full list immediately.
``scandir()`` yields a ``DirEntry`` object for each file and directory
in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
pseudo-directories are skipped, and the entries are yielded in
system-dependent order. Each ``DirEntry`` object has the following
attributes and methods:
* ``name``: the entry's filename, relative to ``path`` (corresponds to
the return values of ``os.listdir``)
* ``is_dir()``: like ``os.path.isdir()``, but requires no system calls
on most systems (Linux, Windows, OS X)
* ``is_file()``: like ``os.path.isfile()``, but requires no system
calls on most systems (Linux, Windows, OS X)
* ``is_symlink()``: like ``os.path.islink()``, but requires no system
calls on most systems (Linux, Windows, OS X)
* ``lstat()``: like ``os.lstat()``, but requires no system calls on
Windows
The ``DirEntry`` attribute and method names were chosen to be the same
as those in the new ``pathlib`` module for consistency.
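As a small illustration of the proposed API (using only the attributes
and methods listed above; the directory path is made up), printing the
names of the regular files in a directory could look like this sketch::

    for entry in scandir('/some/directory'):
        # is_file() typically avoids an extra system call, unlike
        # calling os.path.isfile() on a bare filename string.
        if entry.is_file():
            print(entry.name)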
Notes on caching
----------------
The ``DirEntry`` objects are relatively dumb -- the ``name`` attribute
is obviously always cached, and the ``is_X`` and ``lstat`` methods
cache their values (immediately on Windows via ``FindNextFile``, and
on first use on Linux / OS X via a ``stat`` call) and never refetch
from the system.
For this reason, ``DirEntry`` objects are intended to be used and
thrown away after iteration, not stored in long-lived data structures
with their methods called again and again.
If a user wants to do that (for example, to watch a file's size
change), they'll need to call the regular ``os.lstat()`` or
``os.path.getsize()`` functions, which force a new system call each
time.
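For instance, a sketch of polling a file's size with fresh data on
every check, rather than relying on the cached ``DirEntry`` values,
might look like the following (the helper name and its ``interval``
argument are made up for illustration, not part of the proposal)::

    import os
    import time

    def poll_size(path, name, interval=1.0):
        """Print the file's size repeatedly, re-stat()ing each time."""
        full_path = os.path.join(path, name)
        while True:
            # os.path.getsize() issues a new stat() call on every loop,
            # unlike DirEntry.lstat(), whose result is cached.
            print(os.path.getsize(full_path))
            time.sleep(interval)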
Examples
========
Here's a good usage pattern for ``scandir``. This is in fact almost
exactly how the scandir module's faster ``os.walk()`` implementation
uses it::
    dirs = []
    non_dirs = []
    for entry in scandir(path):
        if entry.is_dir():
            dirs.append(entry)
        else:
            non_dirs.append(entry)
The above ``os.walk()``-like code will be significantly faster using
scandir on both Windows and Linux or OS X.
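To show how that pattern could be assembled into a full tree walk, here
is a simplified sketch; it is not the scandir module's actual
``os.walk()`` replacement and ignores symlinks, error handling, and the
``topdown``/``onerror`` options::

    import os

    def simple_walk(top):
        """Yield (dirpath, dirnames, filenames), roughly like os.walk()."""
        dirs, non_dirs = [], []
        for entry in scandir(top):
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                non_dirs.append(entry.name)
        yield top, dirs, non_dirs
        for name in dirs:
            # Recurse into the subdirectories discovered above.
            yield from simple_walk(os.path.join(top, name))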
Or, for getting the total size of files in a directory tree -- showing
use of the ``DirEntry.lstat()`` method::
    def get_tree_size(path):
        """Return total size of files in path and subdirs."""
        size = 0
        for entry in scandir(path):
            if entry.is_dir():
                sub_path = os.path.join(path, entry.name)
                size += get_tree_size(sub_path)
            else:
                size += entry.lstat().st_size
        return size
Note that ``get_tree_size()`` will get a huge speed boost on Windows,
because no extra stat calls are needed, but on Linux and OS X the size
information is not returned by the directory iteration functions, so
this function won't gain anything there.
Support
=======
The scandir module on GitHub has been forked and used quite a bit (see
"Use in the wild" in this PEP), but there's also been a fair bit of
direct support for a scandir-like function from core developers and
others on the python-dev and python-ideas mailing lists. A sampling:
* **Nick Coghlan**, a core Python developer: "I've had the local Red
Hat release engineering team express their displeasure at having to
stat every file in a network mounted directory tree for info that is
present in the dirent structure, so a definite +1 to os.scandir from
me, so long as it makes that info available."
[`source1 <http://bugs.python.org/issue11406>`_]
* **Tim Golden**, a core Python developer, supports scandir enough to
have spent time refactoring and significantly improving scandir's C
extension module.
[`source2 <https://github.com/tjguk/scandir>`_]
* **Christian Heimes**, a core Python developer: "+1 for something
like yielddir()"
[`source3 <https://mail.python.org/pipermail/python-ideas/2012-November/017772.html>`_]
and "Indeed! I'd like to see the feature in 3.4 so I can remove my
own hack from our code base."
[`source4 <http://bugs.python.org/issue11406>`_]
* **Gregory P. Smith**, a core Python developer: "As 3.4beta1 happens
tonight, this isn't going to make 3.4 so i'm bumping this to 3.5.
I really like the proposed design outlined above."
[`source5 <http://bugs.python.org/issue11406>`_]
* **Guido van Rossum** on the possibility of adding scandir to Python
3.5 (as it was too late for 3.4): "The ship has likewise sailed for
adding scandir() (whether to os or pathlib). By all means experiment
and get it ready for consideration for 3.5, but I don't want to add
it to 3.4."
[`source6 <https://mail.python.org/pipermail/python-dev/2013-November/130583.html>`_]
Support for this PEP itself (meta-support?) was given by Nick Coghlan
on python-dev: "A PEP reviewing all this for 3.5 and proposing a
specific os.scandir API would be a good thing."
[`source7 <https://mail.python.org/pipermail/python-dev/2013-November/130588.html>`_]
Use in the wild
===============
To date, ``scandir`` is definitely useful, but has been clearly marked
"beta", so it's uncertain how much use of it there is in the wild. Ben
Hoyt has had several reports from people using it. For example:
* Chris F: "I am processing some pretty large directories and was half
expecting to have to modify getdents. So thanks for saving me the
effort." [via personal email]
* bschollnick: "I wanted to let you know about this, since I am using
Scandir as a building block for this code. Here's a good example of
scandir making a radical performance improvement over os.listdir."
[`source8 <https://github.com/benhoyt/scandir/issues/19>`_]
* Avram L: "I'm testing our scandir for a project I'm working on.
Seems pretty solid, so first thing, just want to say nice work!"
[via personal email]
Others have `requested a PyPI package`_ for it, which has been
created. See `PyPI package`_.
.. _`requested a PyPI package`: https://github.com/benhoyt/scandir/issues/12
.. _`PyPI package`: https://pypi.python.org/pypi/scandir
GitHub stats don't mean too much, but scandir does have several
watchers, issues, forks, etc. Here's the run-down of the stats as of
June 5, 2014:
* Watchers: 17
* Stars: 48
* Forks: 15
* Issues: 2 open, 19 closed
**However, the much larger point is this:** if this PEP is accepted,
``os.walk()`` can easily be reimplemented using ``scandir`` rather
than ``listdir`` and ``stat``, increasing the speed of ``os.walk()``
very significantly. Thousands of developers, scripts, and pieces of
production code would benefit from this large speedup of
``os.walk()``. For example, on GitHub, there are almost as many uses
of ``os.walk`` (194,000) as there are of ``os.mkdir`` (230,000).
Open issues and optional things
===============================
There are a few open issues or optional additions:
Should scandir be in its own module?
------------------------------------
Should the function be included in the standard library in a new
module, ``scandir.scandir()``, or just as ``os.scandir()`` as
discussed? The preference of this PEP's author (Ben Hoyt) would be
``os.scandir()``, as it's just a single function.
Should there be a way to access the full path?
----------------------------------------------
Should ``DirEntry`` objects have a way to get the full path without using
``os.path.join(path, entry.name)``? This is a pretty common pattern,
and it may be useful to add pathlib-like ``str(entry)`` functionality.
This functionality has also been requested in `issue 13`_ on GitHub.
.. _`issue 13`: https://github.com/benhoyt/scandir/issues/13
Should it expose Windows wildcard functionality?
------------------------------------------------
Should ``scandir()`` have a way of exposing the wildcard functionality
in the Windows ``FindFirstFile`` / ``FindNextFile`` functions? The
scandir module on GitHub exposes this as a ``windows_wildcard``
keyword argument, allowing Windows power users the option to pass a
custom wildcard to ``FindFirstFile``, which may avoid the need to use
``fnmatch`` or similar on the resulting names. It is named the
unwieldy ``windows_wildcard`` to remind you that you're writing
power-user, Windows-only code if you use it.
This boils down to whether ``scandir`` should be about exposing all of
the system's directory iteration features, or simply providing a fast,
simple, cross-platform directory iteration API.
This PEP's author votes for not including ``windows_wildcard`` in the
standard library version, because even though it could be useful in
rare cases (say the Windows Dropbox client?), it'd be too easy to use
it just because you're a Windows developer, and create code that is
not cross-platform.
Possible improvements
=====================
There are many possible improvements one could make to scandir, but
here is a short list of some this PEP's author has in mind:
* scandir could potentially be further sped up by calling ``readdir``
/ ``FindNextFile`` say 50 times per ``Py_BEGIN_ALLOW_THREADS`` block
so that it stays in the C extension module for longer, and may be
somewhat faster as a result. This approach hasn't been tested, but
was suggested on Issue 11406 by Antoine Pitrou.
[`source9 <http://bugs.python.org/msg130125>`_]
Previous discussion
===================
* `Original thread Ben Hoyt started on python-ideas`_ about speeding
up ``os.walk()``
* Python `Issue 11406`_, which includes the original proposal for a
scandir-like function
* `Further thread Ben Hoyt started on python-dev`_ that refined the
``scandir()`` API, including Nick Coghlan's suggestion of scandir
yielding ``DirEntry``-like objects
* `Final thread Ben Hoyt started on python-dev`_ to discuss the
interaction between scandir and the new ``pathlib`` module
* `Question on StackOverflow`_ about why ``os.walk()`` is slow and
pointers on how to fix it (this inspired the author of this PEP
early on)
* `BetterWalk`_, this PEP's author's previous attempt at this, on
which the scandir code is based
.. _`Original thread Ben Hoyt started on python-ideas`:
https://mail.python.org/pipermail/python-ideas/2012-November/017770.html
.. _`Further thread Ben Hoyt started on python-dev`:
https://mail.python.org/pipermail/python-dev/2013-May/126119.html
.. _`Final thread Ben Hoyt started on python-dev`:
https://mail.python.org/pipermail/python-dev/2013-November/130572.html
.. _`Question on StackOverflow`:
http://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-…
.. _`BetterWalk`: https://github.com/benhoyt/betterwalk
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
The buildbot web site seems to have been down for some hours and still
is as of 0915 UTC. I'm not sure who is watching over it but I'll ping
the infrastructure team as well.
--
Ned Deily,
nad(a)acm.org