In Python 2.5, `0or` was accepted by the Python parser. It became an
error in 2.6 because "0o" became recognized as an incomplete octal
number. `1or` is still accepted.
On the other hand, `1if 2else 3` is accepted despite the fact that "2e"
can be recognized as an incomplete floating point number. In this case
the tokenizer pushes "e" back and returns "2".
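One way to observe this pushback is with the pure-Python tokenize
module (my illustration; the C tokenizer used by the parser backs up
the same way here, and newer Python versions warn about such literals):

import io
import tokenize

# "2else" starts like the float "2e...", but no exponent digits follow,
# so the tokenizer backs up and emits NUMBER '2', then NAME 'else'.
src = "1if 2else 3\n"
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))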
Shouldn't it do the same with "0o"? It is possible to make `0or`
parseable again. The pure-Python implementation of the tokenizer is
already able to tokenize this example:
$ echo '0or[]' | ./python -m tokenize
1,0-1,1: NUMBER '0'
1,1-1,3: NAME 'or'
1,3-1,4: OP '['
1,4-1,5: OP ']'
1,5-1,6: NEWLINE '\n'
2,0-2,0: ENDMARKER ''
On the other hand, all these examples look weird. There is an asymmetry:
`1or 2` is valid syntax, but `1 or2` is not. It is hard to visually
recognize the boundary between a number and the following identifier or
keyword, especially since numbers can contain letters ("b", "e", "j",
"o", "x") and underscores, and identifiers can contain digits. Both
sides of the boundary can contain letters, digits, and underscores.
I propose to change the Python syntax by adding a requirement that there
be whitespace or a delimiter between a numeric literal and the following
keyword or identifier.
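To illustrate the proposed rule (a sketch of the intended behavior, not
of current semantics):

>>> 1if 2else 3      # accepted today; would become a SyntaxError
1
>>> 1 if 2 else 3    # would remain the required spelling
1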
The webmaster has already heard from 4 people who cannot install it.
I sent them to the bug tracker or to python-list, but they seem not to
have gone to either place. Is there some guide I should be sending them
to, on how to debug installation problems?
If one goes to https://www.python.org/downloads from a Windows browser,
the default download URL is for the 32-bit installer instead of the
64-bit one. I wonder why this is still the case.
Shouldn't we encourage new Windows users (who may not even know the
distinction between the two architectures) to use the 64-bit version of
Python, since most likely they can?
If this is not the correct forum for this, please let me know where I can
direct my question/feature request, thanks.
Hi, I'm opening this thread to discuss Nick Coghlan's proposal
to add __int__ and __trunc__ to a type when __index__ is defined.
Currently __int__ does not default to __index__ during class
initialization; both must be defined to get coherent behavior:
(cpython-venv) ➜ cpython git:(add-key-argument-to-bisect) ✗ python3
Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb 18
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> class MyInt:
...     def __index__(self):
...         return 4
...
>>> int(MyInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'MyInt'
>>> math.trunc(MyInt())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type MyInt doesn't define __trunc__ method
>>> MyInt.__int__ = MyInt.__index__
>>> int(MyInt())
4
The difference in behavior is especially weird in builtins like int()
and math.trunc().
The documentation mentions the need to always define both __index__ and
__int__:
Note: In order to have a coherent integer type class, when __index__()
is defined __int__() should also be defined, and both should return the
same value.
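Following that guidance today means wiring the methods up by hand,
along these lines (an illustrative sketch):

class MyInt:
    def __index__(self):
        return 4
    __int__ = __index__      # needed today so int(MyInt()) works
    __trunc__ = __index__    # needed today so math.trunc(MyInt()) works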
Nick Coghlan proposes to make __int__ default to __index__ when only
__index__ is defined, and asked to open a discussion on python-dev
before making any change, "as the closest equivalent we have to this
right now is the "negative" case where overriding __eq__ without
overriding __hash__ implicitly marks the class as unhashable (look for
"type->tp_hash =" in CPython's typeobject.c)".
I think the proposed change makes more sense than the current behavior,
and I volunteer to implement it if it is accepted.
What do you think about this?
I'm working on a compact and ordered set implementation.
It has an internal data structure similar to the new dict in Python 3.6.
It is still a work in progress. Comments, tests, and documents should be
updated. But it passes the existing tests, excluding test_sys and
test_gdb (both tests check implementation details).
Before completing this work, I want to evaluate it.
Below are my current thoughts about the compact ordered set.
## Preserving insertion order
Order is not fundamental for sets; there is no order for sets in the
mathematical sense. But it is sometimes convenient in the real world.
For example, it makes doctests easier. When writing sets to logs, we can
use the "grep" command if the print order is stable. pyc files become
stable without the PYTHONHASHSEED=0 hack.
Additionally, consistency with dict is desirable. It removes one pitfall
for new Python users: the "remove duplicated items from a list" idiom
could become `list(set(duplicated))` instead of
`list(dict.fromkeys(duplicated))`, as illustrated below.
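For instance (the second output is one possible run; an ordered set
would make it ['b', 'a', 'c'] as well):

>>> items = ["b", "a", "b", "c", "a"]
>>> list(dict.fromkeys(items))   # insertion order preserved today
['b', 'a', 'c']
>>> list(set(items))             # arbitrary today, varies with hash seed
['c', 'b', 'a']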
## Memory efficiency
Hash tables have a dilemma: to reduce the collision rate, a hash table
should be sparse, but sparseness wastes memory.
Since the current set is optimized for both hit and miss cases, it is
more sparse than dict. (It is a bit of a surprise that a set typically
uses more memory than a dict of the same size!)
The new implementation partially solves this dilemma. It has a sparse
"index table" whose entries are small (1 byte when the table size is
<= 256, 2 bytes when the table size is <= 65536), and a dense entry
table (each entry holds a key and a hash, which is 16 bytes on a 64-bit
system).
I use 1/2 as the capacity (fill) rate for now. So the new implementation
is memory efficient when len(s) <= 32768, roughly equal to the current
implementation when 32768 < len(s) <= 2**31, and worse than the current
implementation when len(s) > 2**31.
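Back-of-the-envelope arithmetic for the layout above (my illustration,
assuming a 64-bit build, a 256-slot table, 1-byte index entries, and
the 1/2 fill rate mentioned above):

# Current set: a 16-byte entry (hash + key pointer) for every slot,
# empty or not.  New set: a tiny index per slot plus a dense 16-byte
# entry per item actually stored.
table_size = 256
n_items = table_size // 2                  # 1/2 fill rate
current = table_size * 16                  # 4096 bytes
compact = table_size * 1 + n_items * 16    # 256 + 2048 = 2304 bytes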
Here is a quick test of memory usage.
## Benchmark
$ ./python -m perf compare_to master.json oset2.json -G --min-speed=2
- unpickle_list: 8.48 us +- 0.09 us -> 12.8 us +- 0.5 us: 1.52x slower (+52%)
- unpickle: 29.6 us +- 2.5 us -> 44.1 us +- 2.5 us: 1.49x slower (+49%)
- regex_dna: 448 ms +- 3 ms -> 462 ms +- 2 ms: 1.03x slower (+3%)
- meteor_contest: 189 ms +- 1 ms -> 165 ms +- 1 ms: 1.15x faster (-13%)
- telco: 15.8 ms +- 0.2 ms -> 15.3 ms +- 0.2 ms: 1.03x faster (-3%)
- django_template: 266 ms +- 6 ms -> 259 ms +- 3 ms: 1.03x faster (-3%)
- unpickle_pure_python: 818 us +- 6 us -> 801 us +- 9 us: 1.02x faster (-2%)
Benchmark hidden because not significant (49)
unpickle and unpickle_list show a massive slowdown. I suspect this
slowdown is not caused by the set change. Linux perf shows many page
faults happening in pymalloc_malloc. I think the memory usage changes
accidentally hit a weak point of pymalloc. I will try to investigate it.
On the other hand, meteor_contest shows a 13% speedup. It uses sets.
The others don't show significant performance changes.
I need to write more benchmarks for various set workloads; a sketch of
one appears after the link below.
I expect the new set to be faster at simple creation, iteration, and
destruction. In particular, sequential iteration and deletion should
reduce cache misses.
(e.g. https://bugs.python.org/issue32846 )
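For example, creation and iteration microbenchmarks might look like
this (hypothetical workloads, using the same perf tool as the
comparison above):

$ ./python -m perf timeit -s "data = list(range(1000))" "set(data)"
$ ./python -m perf timeit -s "s = set(range(1000))" "for x in s: pass"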
On the other hand, the new implementation will be slower in complex
(heavy random add & del) cases.
Any comments are welcome, and any benchmarks for set workloads are very
welcome.
INADA Naoki <songofacandy(a)gmail.com>
PEP 394 says:
> This recommendation will be periodically reviewed over the next few
> years, and updated when the core development team judges it
> appropriate. As a point of reference, regular maintenance releases
> for the Python 2.7 series will continue until at least 2020.
I think it's time for another review.
I'm especially worried about the implication of these:
- If the `python` command is installed, it should invoke the same
version of Python as the `python2` command
- scripts that are deliberately written to be source compatible
with both Python 2.x and 3.x [...] may continue to use `python` on
their shebang line.
So, to support scripts that adhere to the recommendation, Python 2
needs to be installed :(
Please see this PR for details and a suggested change: