Hi, folks.
Since the previous discussion was suspended without consensus, I wrote
a new PEP for it. (Thank you Victor for reviewing it!)
This PEP looks very similar to PEP 623 "Remove wstr from Unicode",
but for encoder APIs, not for Unicode object APIs.
URL (not available yet): https://www.python.org/dev/peps/pep-0624/
---
PEP: 624
Title: Remove Py_UNICODE encoder APIs
Author: Inada Naoki <songofacandy(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jul-2020
Python-Version: 3.11
Abstract
========
This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in
Python 3.11:
* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
.. note::
   `PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
   Unicode object APIs relating to ``Py_UNICODE``. This PEP, on the other
   hand, does not relate to the Unicode object. These PEPs are split because
   they have different motivations and need different discussions.
Motivation
==========
In general, reducing the number of APIs that have been deprecated for
a long time and have few users is a good idea: not only does it
improve the maintainability of CPython, it also helps API users
and other Python implementations.
Rationale
=========
Deprecated since Python 3.3
---------------------------
``Py_UNICODE`` and the APIs using it have been deprecated since Python 3.3.
Inefficient
-----------
All of these APIs are implemented using ``PyUnicode_FromWideChar``: they
first convert the ``Py_UNICODE*`` input into a temporary Unicode object
and then encode it. So these APIs are inefficient when the user wants to
encode data that is already in a Unicode object.
Not used widely
---------------
A search of the top 4000 PyPI packages [1]_ found that only pyodbc uses
these APIs:

* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``

pyodbc uses these APIs to encode a Unicode object into a bytes object,
so it is easy to fix. [2]_
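For illustration, here is a minimal sketch of this kind of migration
(not pyodbc's actual patch; see [2]_ for that), assuming the caller
already holds a ``PyObject *unicode``:

.. code-block:: c

   /* Before (deprecated): encode via a Py_UNICODE buffer. */
   PyObject *before = PyUnicode_EncodeUTF16(PyUnicode_AS_UNICODE(unicode),
                                            PyUnicode_GET_SIZE(unicode),
                                            NULL,  /* errors */
                                            0);    /* native byte order */

   /* After: encode the Unicode object directly... */
   PyObject *after = PyUnicode_AsUTF16String(unicode);

   /* ...or via the generic API, which also covers codecs such as UTF-7
      that have no dedicated PyUnicode_As*String() function. */
   PyObject *generic = PyUnicode_AsEncodedString(unicode, "utf-16", NULL);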
Alternative APIs
================
There are alternative APIs that accept ``PyObject *unicode`` instead of
``Py_UNICODE *``. Users can migrate to them:
========================================  =========================================
Deprecated API                            Alternative APIs
========================================  =========================================
``PyUnicode_Encode()``                    ``PyUnicode_AsEncodedString()``
``PyUnicode_EncodeASCII()``               ``PyUnicode_AsASCIIString()`` \(1)
``PyUnicode_EncodeLatin1()``              ``PyUnicode_AsLatin1String()`` \(1)
``PyUnicode_EncodeUTF7()``                \(2)
``PyUnicode_EncodeUTF8()``                ``PyUnicode_AsUTF8String()`` \(1)
``PyUnicode_EncodeUTF16()``               ``PyUnicode_AsUTF16String()`` \(3)
``PyUnicode_EncodeUTF32()``               ``PyUnicode_AsUTF32String()`` \(3)
``PyUnicode_EncodeUnicodeEscape()``       ``PyUnicode_AsUnicodeEscapeString()``
``PyUnicode_EncodeRawUnicodeEscape()``    ``PyUnicode_AsRawUnicodeEscapeString()``
``PyUnicode_EncodeCharmap()``             ``PyUnicode_AsCharmapString()`` \(1)
``PyUnicode_TranslateCharmap()``          ``PyUnicode_Translate()``
``PyUnicode_EncodeDecimal()``             \(4)
``PyUnicode_TransformDecimalToASCII()``   \(4)
========================================  =========================================
Notes:
(1)
   ``const char *errors`` parameter is missing.

(2)
   There is no public alternative API, but the generic
   ``PyUnicode_AsEncodedString()`` can be used instead.

(3)
   ``const char *errors, int byteorder`` parameters are missing.

(4)
   There is no direct replacement, but ``Py_UNICODE_TODECIMAL``
   can be used instead. CPython itself uses
   ``_PyUnicode_TransformDecimalAndSpaceToASCII`` to convert
   from Unicode to numbers.
Plan
====
Python 3.9
----------
Add ``Py_DEPRECATED(3.3)`` to the following APIs. This change has already
been committed [3]_. All other APIs have already been marked
``Py_DEPRECATED(3.3)``.

* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
Document all APIs as "will be removed in version 3.11".
Python 3.11
-----------
The following APIs are removed:
* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
Alternative ideas
=================
Instead of just removing the deprecated APIs, we may be able to reuse their
names with different signatures.
Make some private APIs public
------------------------------
``PyUnicode_EncodeUTF7()`` doesn't have a public alternative API.
Some other APIs have public alternatives, but those are missing the
``const char *errors`` or ``int byteorder`` parameters.
We can rename some private APIs and make them public to cover the missing
APIs and parameters:
============================= ================================
Rename to Rename from
============================= ================================
``PyUnicode_EncodeASCII()`` ``_PyUnicode_AsASCIIString()``
``PyUnicode_EncodeLatin1()`` ``_PyUnicode_AsLatin1String()``
``PyUnicode_EncodeUTF7()`` ``_PyUnicode_EncodeUTF7()``
``PyUnicode_EncodeUTF8()`` ``_PyUnicode_AsUTF8String()``
``PyUnicode_EncodeUTF16()`` ``_PyUnicode_EncodeUTF16()``
``PyUnicode_EncodeUTF32()`` ``_PyUnicode_EncodeUTF32()``
============================= ================================
Pros:
* We have a more consistent API set.
Cons:
* We have more public APIs to maintain.
* Existing public APIs are enough for most use cases, and
``PyUnicode_AsEncodedString()`` can be used in other cases.
Replace ``Py_UNICODE*`` with ``Py_UCS4*``
-----------------------------------------
We can replace ``Py_UNICODE`` (a typedef of ``wchar_t``) with
``Py_UCS4``. Since the builtin codecs support UCS-4, we don't need to
convert the ``Py_UCS4*`` string to a Unicode object.
Pros:
* We have a more consistent API set.
* Users can encode a UCS-4 string in C without creating a Unicode object.
Cons:
* We have more public APIs to maintain.
* Applications which use UTF-8 or UTF-32 cannot use these APIs
  anyway.
* Other Python implementations may not have a builtin codec for UCS-4.
* If we change the Unicode internal representation to UTF-8, we need
to keep UCS-4 support only for these APIs.
Replace ``Py_UNICODE*`` with ``wchar_t*``
-----------------------------------------
We can replace ``Py_UNICODE`` with ``wchar_t``.
Pros:
* We have a more consistent API set.
* Backward compatible.
Cons:
* We have more public APIs to maintain.
* They are inefficient on platforms where ``wchar_t*`` is UTF-16,
  because the built-in codecs support only UCS-1, UCS-2, and UCS-4
  input.
Rejected ideas
==============
Using runtime warning
---------------------
These APIs don't release the GIL for now. Emitting a warning from
such APIs is not safe. See this example:
.. code-block:: c

   PyObject *u = PyList_GET_ITEM(list, i);  // u is a borrowed reference.
   PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
                                      PyUnicode_GET_SIZE(u), NULL);
   // Assumes u is still a living reference.
   PyObject *t = PyTuple_Pack(2, u, b);
   Py_DECREF(b);
   return t;
If we emit a Python warning from ``PyUnicode_EncodeUTF8()``, warning
filters and other threads may mutate the ``list``, and ``u`` can become
a dangling reference after ``PyUnicode_EncodeUTF8()`` returns.
Additionally, since we are not changing behavior but removing C APIs, a
runtime ``DeprecationWarning`` might not be helpful for Python
developers. We should warn extension developers instead.
Discussions
===========
* `Plan to remove Py_UNICODE APIs except PEP 623
<https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6I…>`_
* `bpo-41123: Remove Py_UNICODE APIs except PEP 623:
<https://bugs.python.org/issue41123>`_
References
==========
.. [1] Source package list chosen from top 4000 PyPI packages.
(https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.…)
.. [2] pyodbc -- Don't use PyUnicode_Encode API #792
(https://github.com/mkleehammer/pyodbc/pull/792)
.. [3] Uncomment Py_DEPRECATED for Py_UNICODE APIs (GH-21318)
(https://github.com/python/cpython/commit/9c3840870814493fed62e140cfa43c2883…)
Copyright
=========
This document has been placed in the public domain.
--
Inada Naoki <songofacandy(a)gmail.com>
Hi all,
Right now, when a debugger is active, the number of local variables can
affect the tracing speed quite a lot.
For instance, having tracing set up in a program such as the one below takes
4.64 seconds to run, yet changing all the variables to have the same name
-- i.e.: changing all assignments to `a = 1` (such that there's only a single
variable in the namespace) -- makes it take 1.47 seconds (on my machine)... the
higher the number of variables, the slower the tracing becomes.
```
import time

t = time.time()

def call():
    a = 1
    b = 1
    c = 1
    d = 1
    e = 1
    f = 1

def noop(frame, event, arg):
    return noop

import sys
sys.settrace(noop)

for i in range(1_000_000):
    call()

print('%.2fs' % (time.time() - t,))
```
This happens because `PyFrame_FastToLocalsWithError` and
`PyFrame_LocalsToFast` are called inside the `call_trampoline` (
https://github.com/python/cpython/blob/master/Python/sysmodule.c#L946).
So, I'd like to simply remove those calls.
Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise
mutating non-current frames doesn't work anyway. As a note, pydevd already
has such a call:
https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b…
and PyPy also has a counterpart.
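For reference, here's a minimal sketch (CPython-specific; `set_local` is
just an illustrative name, not an existing API) of the kind of explicit
call a debugger can make when it mutates a frame's locals:

```
import ctypes

def set_local(frame, name, value):
    frame.f_locals[name] = value
    # Write the f_locals dict back into the frame's fast-locals array;
    # without this call, the assignment above is silently discarded
    # when the frame resumes.
    ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame),
                                          ctypes.c_int(0))
```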
As for `PyFrame_FastToLocalsWithError`, I don't really see any reason to
call it at all.
i.e.: code such as the one below prints the `a` variable from the `main()`
frame regardless of that call, and I checked all pydevd tests and nothing seems
to be affected (it seems that accessing f_locals already does this:
https://github.com/python/cpython/blob/cb9879b948a19c9434316f8ab6aba9c4601a…,
so, I don't see much reason to call it at all).
```
def call():
    import sys
    frame = sys._getframe()
    print(frame.f_back.f_locals)

def main():
    a = 1
    call()

if __name__ == '__main__':
    main()
```
Does anyone see any issue with this?
If it's not controversial, is a PEP needed, or would just an issue to
track it be enough to remove those 2 lines?
Thanks,
Fabio
Hi,
Pathlib's symlink_to() and link_to() methods have different argument
orders, so:
a.symlink_to(b) # Creates a symlink from A to B
a.link_to(b) # Creates a hard link from B to A
I don't think link_to() was intended to be implemented this way, as the
docs say "Create a hard link pointing to a path named target.". It's also
inconsistent with everything else in pathlib, most obviously symlink_to().
Bug report here: https://bugs.python.org/issue39291
This /really/ irks me. Apparently it's too late to fix link_to(), so I'd
like to suggest we add a new hardlink_to() method that matches the
symlink_to() argument order. link_to() then becomes deprecated/undocumented.
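Something like this minimal sketch is what I have in mind (names and
placement hypothetical, built on os.link(), with the same argument order
as symlink_to()):

```
import os
import pathlib

# Hypothetical sketch of the proposed method, for illustration only.
def hardlink_to(self: pathlib.Path, target):
    """Make this path a hard link to the same file as *target*."""
    os.link(target, self)

# With this, the two methods read consistently:
#   a.symlink_to(b)   # a becomes a symlink pointing to b
#   a.hardlink_to(b)  # a becomes a hard link to b's file
```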
Any thoughts?
Barney
Take as an example a function designed to process a tree of nodes similar to that which might be output by a JSON parser. There are 4 types of node:
- A node representing JSON strings
- A node representing JSON numbers
- A node representing JSON arrays
- A node representing JSON dictionaries
The function transforms a tree of nodes, beginning at the root node, and proceeding recursively through each child node in turn. The result is a Python object, with the following transformation applied to each node type:
- A JSON string `->` Python `str`
- A JSON number `->` Python `float`
- A JSON array `->` Python `list`
- A JSON dictionary `->` Python `dict`
I have implemented this function using 3 different approaches:
- The visitor pattern
- `isinstance` checks against the node type
- Pattern matching
Here is the implementation using the visitor pattern:
```
from typing import List, Tuple

class NodeVisitor:
    def visit_string_node(self, node: "StringNode"):
        pass

    def visit_number_node(self, node: "NumberNode"):
        pass

    def visit_list_node(self, node: "ListNode"):
        pass

    def visit_dict_node(self, node: "DictNode"):
        pass

class Node:
    def visit(self, visitor: NodeVisitor):
        raise NotImplementedError()

class StringNode(Node):
    value: str

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_string_node(self)

class NumberNode(Node):
    value: str

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_number_node(self)

class ListNode(Node):
    children: List[Node]

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_list_node(self)

class DictNode(Node):
    children: List[Tuple[str, Node]]

    def visit(self, visitor: NodeVisitor):
        return visitor.visit_dict_node(self)

class Processor(NodeVisitor):
    def process(self, root_node: Node):
        return root_node.visit(self)

    def visit_string_node(self, node: StringNode):
        return node.value

    def visit_number_node(self, node: NumberNode):
        return float(node.value)

    def visit_list_node(self, node: ListNode):
        return [child_node.visit(self) for child_node in node.children]

    def visit_dict_node(self, node: DictNode):
        return {key: child_node.visit(self) for key, child_node in node.children}

def process(root_node: Node):
    processor = Processor()
    return processor.process(root_node)
```
Here is the implementation using `isinstance` checks against the node type:
```
from typing import List, Tuple

class Node:
    pass

class StringNode(Node):
    value: str

class NumberNode(Node):
    value: str

class ListNode(Node):
    children: List[Node]

class DictNode(Node):
    children: List[Tuple[str, Node]]

def process(root_node: Node):
    def process_node(node: Node):
        if isinstance(node, StringNode):
            return node.value
        elif isinstance(node, NumberNode):
            return float(node.value)
        elif isinstance(node, ListNode):
            return [process_node(child_node) for child_node in node.children]
        elif isinstance(node, DictNode):
            return {key: process_node(child_node) for key, child_node in node.children}
        else:
            raise Exception('Unexpected node')
    return process_node(root_node)
```
Finally here is the implementation using pattern matching:
```
from typing import List, Tuple

class Node:
    pass

class StringNode(Node):
    value: str

class NumberNode(Node):
    value: str

class ListNode(Node):
    children: List[Node]

class DictNode(Node):
    children: List[Tuple[str, Node]]

def process(root_node: Node):
    def process_node(node: Node):
        match node:
            case StringNode(value=str_value):
                return str_value
            case NumberNode(value=number_value):
                return float(number_value)
            case ListNode(children=child_nodes):
                return [process_node(child_node) for child_node in child_nodes]
            case DictNode(children=child_nodes):
                return {key: process_node(child_node) for key, child_node in child_nodes}
            case _:
                raise Exception('Unexpected node')
    return process_node(root_node)
```
Here are the lengths of the different implementations:
- Pattern matching `->` 37 lines
- `isinstance` checks `->` 36 lines
- The visitor pattern `->` 69 lines
The visitor pattern implementation is by far the most verbose solution, weighing in at almost twice the length of the alternative implementations due to the large amount of boilerplate that is necessary to achieve double dispatch. The pattern matching and `isinstance` check implementations are very similar in length for this trivial example.
In each implementation, there are 2 operations performed on each node.
- Determine the type of the node
- Destructure the node to extract the desired data
The visitor pattern and `isinstance` check implementations separate these 2 operations, whereas the pattern matching approach combines the operations together. I believe that it is the declarative nature of pattern matching, where the operations of determining the type of the node and destructuring the node are combined into a single clause, which allows pattern matching to express a concise solution to the problem. In this trivial example, the advantage of pattern matching over the alternative of using a sequence of `if`-`elif`-`else` statements is not as obvious as it would be when compared to a more complex example, where a sub-tree of nodes might be matched based on their type and be destructured in a single clause.
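For instance, here is a sketch (using the node classes defined above,
with a hypothetical "name" key) of how a single clause can check the
types of a small sub-tree and destructure it at once:

```
# One case clause matches a DictNode whose single entry maps the
# (hypothetical) key "name" to a StringNode, and extracts its value.
def extract_name(node: Node):
    match node:
        case DictNode(children=[("name", StringNode(value=name))]):
            return name
        case _:
            return None
```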
I have seen elsewhere an argument that pattern matching should not be accepted into Python as it introduces a pseudo-DSL that is separate from the rest of the language. I agree that pattern matching might be viewed as a pseudo-DSL, but I believe that it is a good thing if it allows the solution to certain classes of problems to be expressed in a concise manner. People often raise similar objections to operator overloading in other languages, whereas the presence of operator overloading in Python allows mathematical expressions involving custom numeric types such as vectors to be expressed in a natural way. Furthermore, Python has a regular expression module which implements its own DSL for the purpose of matching string patterns. Regular expressions, in a similar way to pattern matching, allow string patterns to be expressed in a concise and declarative manner.
I really hope that the Steering Council accepts pattern matching into Python. I think that it allows for processing of heterogeneous graphs of objects using recursion in a concise, declarative manner. I would like to thank the authors of the Structural Pattern Matching PEP for their hard work in designing this feature and developing an implementation of it. I believe that it will be a wonderful addition to the language that I am very much looking forward to using.
Hi folks,
I know this is a high-volume list so sorry for being yet another voice
screaming into your mailbox, but I do not know how else to handle this.
A few months ago, I opened a pull request fixing packed bitfields in
ctypes struct/union, which right now are being incorrectly created. The
offsets and sizes are wrong in some cases; fields which should be
packed end up with padding between them.
bpo: https://bugs.python.org/issue29753
PR: https://github.com/python/cpython/pull/19850
Since ctypes has no maintainer, it has been very hard to track down
someone who is up to reviewing this. So, if this issue sparks some
interest for you, or you just want to help me out, please have a look :)
Cheers,
Filipe Laíns
Hi everyone,
PEP 634/5/6 presents a possible implementation of pattern matching for
Python.
Much of the discussion around PEP 634, and PEP 622 before it, seems to
imply that PEP 634 is synonymous with pattern matching; that if you
reject PEP 634 then you are rejecting pattern matching.
That simply isn't true.
Can we discuss whether we want pattern matching in Python and
the broader semantics first, before dealing with low level details?
Do we want pattern matching in Python at all?
---------------------------------------------
Pattern matching works really well in statically typed, functional
languages.
The lack of mutability, constrained scope and the ability of the
compiler to distinguish let variables from constants means that pattern
matching code has fewer errors, and can be compiled efficiently.
Pattern matching works less well in dynamically-typed, functional
languages and statically-typed, procedural languages.
Nevertheless, it works well enough for it to be a popular feature in
both erlang and rust.
In dynamically-typed, procedural languages, however, it is not clear (at
least not to me) that it works well enough to be worthwhile.
That is not to say that pattern matching could never be of value in Python,
but PEP 635 fails to demonstrate that it can (although it does a better
job than PEP 622).
Should match be an expression, or a statement?
----------------------------------------------
Do we want a fancy switch statement, or a powerful expression?
Expressions have the advantage of not leaking (like comprehensions in
Python 3), but statements are easier to work with.
Can pattern matching make it clear what is assigned?
----------------------------------------------------
Embedding the variables to be assigned into a pattern, makes the pattern
concise, but requires discarding normal Python syntax and inventing a
new sub-language. Could we make patterns fit Python better?
Is it possible to make assignment to variables clear, and unambiguous,
and allow the use of symbolic constants at the same time?
I think it is, but PEP 634 fails to do this.
How should pattern matching be integrated with the object model?
----------------------------------------------------------------
What special method(s) should be added? How and when should they be called?
PEP 634 largely disregards the object model, meaning it has many special
cases, and is inefficient.
The semantics must be well defined.
-----------------------------------
Language extensions PEPs should define the semantics of those
extensions. For example, PEP 343 and PEP 380 both did.
https://www.python.org/dev/peps/pep-0343/#specification-the-with-statement
https://www.python.org/dev/peps/pep-0380/#formal-semantics
PEP 634 just waves its hands and talks about undefined behavior, which
horrifies me.
In summary,
I would ask anyone who wants pattern matching added to Python not to
support PEP 634.
PEP 634 just isn't a good fit for Python, and we deserve something better.
Cheers,
Mark.
On behalf of the PyPA and the pip team, I am pleased to announce that we have just released pip 20.3, a new version of pip. You can install it by running `python -m pip install --upgrade pip`.
This is an important and disruptive release -- we [explained why in a blog post last year](https://pyfound.blogspot.com/2019/12/moss-czi-support-pip.html). We
even made [a video about it](https://www.youtube.com/watch?v=B4GQCBBsuNU).
## Highlights
* **DISRUPTION**: Switch to the new dependency resolver by default. (#9019) Watch out for changes in handling editable
installs, constraints files, and more:
https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip-dependency-res…
* **DEPRECATION**: Deprecate support for Python 3.5 (to be removed in pip 21.0) (#8181)
* **DEPRECATION**: pip freeze will stop filtering the pip, setuptools, distribute and wheel packages from pip freeze output in a future version. To keep the previous behavior, users should use the new `--exclude` option. (#4256)
* Substantial improvements in new resolver for performance, output and
error messages, avoiding infinite loops, and support for constraints files.
* Support for PEP 600: Future ‘manylinux’ Platform Tags for Portable
Linux Built Distributions. (#9077)
* Documentation improvements: Resolver migration guide, quickstart
guide, and new documentation theme.
* Add support for macOS Big Sur compatibility tags. (#9138)
The new resolver is now *on by default*. It is significantly stricter
and more consistent when it receives incompatible instructions, and
reduces support for certain kinds of constraints files, so some
workarounds and workflows may break. Please see [our guide on how to
test and migrate, and how to report issues](https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip-dependency-resolver-in-20-3-2020). You
can use the deprecated (old) resolver, using the flag
`--use-deprecated=legacy-resolver`, until we remove it in the pip 21.0
release in January 2021.
You can find more details (including deprecations and removals) [in the
changelog](https://pip.pypa.io/en/stable/news/).
## User experience
Command-line output for this version of pip, and documentation to help
with errors, is significantly better, because you worked with our
experts to test and improve it. [Contribute to our user experience work: sign up to become a member of the UX Studies group](https://bit.ly/pip-ux-studies) (after you join, we'll notify you about future UX surveys and interviews).
## What to expect in 21.0
We aim to release pip 21.0 in January 2021, per our [usual release cadence](https://pip.pypa.io/en/latest/development/release-process/#release-cadence). You can expect:
* Removal of [Python 2.7](https://pip.pypa.io/en/latest/development/release-process/#python-2-support) and 3.5 support
* Further improvements in the new resolver
* Removal of legacy resolver support
## Thanks
As with all pip releases, a significant amount of the work was
contributed by pip's user community. Huge thanks to all who have
contributed, whether through code, documentation, issue reports and/or
discussion. Your help keeps pip improving, and is hugely appreciated.
Specific thanks go to Mozilla (through its [Mozilla Open Source
Support](https://www.mozilla.org/en-US/moss/) Awards) and to the [Chan
Zuckerberg Initiative](https://chanzuckerberg.com/eoss/) DAF, an
advised fund of Silicon Valley Community Foundation, for their funding
that enabled substantial work on the new resolver.
That funding went to [Simply Secure](https://simplysecure.org/)
(specifically Georgia Bullen, Bernard Tyers, Nicole Harris, Ngọc
Triệu, and Karissa McKelvey), [Changeset
Consulting](https://changeset.nyc/) (Sumana Harihareswara),
[Atos](https://www.atos.net) (Paul F. Moore), [Tzu-ping
Chung](https://uranusjr.com), [Pradyun Gedam](https://pradyunsg.me/),
and Ilan Schnell. Thanks also to Ernest W. Durbin III at the Python
Software Foundation for liaising with the project.
-Sumana Harihareswara, pip project manager
(Context: Continuing to prepare for the core dev sprint next week. Since
the sprint is near, *I'd greatly appreciate any quick comments, feedback
and ideas!*)
Following up my collection of past beginning contributor experiences, I've
collected these experiences in a dedicated GitHub repo[1] and written a
(subjective!) summary of main themes that I recognize in the stories, which
I've also included in the repo[2].
A "TL;DR" bullet list of those main themes:
* Slow/no responsiveness
* Long, slow process
* Hard to find where to contribute
* Mentorship helps a lot, but is scarce
* A lot to learn to get started
* It's intimidating
More specifically, something that has come up often is that maintaining
momentum for new contributors is crucial for them to become long-term
contributors. Most often, this comes up in relation to the first two
points: suggestions or PRs receive no attention at all
("ignored") or stop receiving attention at some point ("lost to the void").
Unfortunately, the probability of this is pretty high for any issue/PR, so
for a new contributor this is almost guaranteed to happen while working on
one of their first few contributions. I've seen this happen many times, and
have found that I have to personally follow promising contributors' work to
ensure that this doesn't happen to them. I've also seen contributors learn
to actively seek out core devs when these situations arise, which is often
a successful tactic, but shouldn't be necessary so often.
Now, this is in large part a result of the fact that we core devs are not a
very large group, made up almost entirely of volunteers working on this in
their spare time. Last I checked, the total amount of paid development time
dedicated to developing Python is less than 3 full-time positions (i.e.
~100 hours a week).
The situation being problematic is clear enough that the PSF had concrete
plans to hire paid developers to review issues and PRs. However, those
plans have been put on hold indefinitely, since the PSF's funding has
shrunk dramatically since the COVID-19 outbreak (no PyCon!).
So, what can be done? Besides raising more funds (see a note on this
below), I think we can find ways to reduce how often issues/PRs become
"stalled". Here are some ideas:
1. *Generate reminders for reviewers when an issue or PR becomes "stalled"
due to them.* Personally, I've found that both b.p.o. and GitHub make it
relatively hard to remember to follow up on all of the many issues/PRs
you've taken part in reviewing. It takes considerable attention and
discipline to do so consistently, and reminders like these would have
helped me. Many (many!) times, all it took to get an issue/PR moving
forward (or closed) was a simple "ping?" comment.
2. *Generate reminders for contributors when an issue or PR becomes
"stalled" due to them.* Similar to the above, but I consider these separate.
3. *Advertise something like a "2-for-1" standing offer for reviews.* This
would give contributors an "official", acceptable way to get attention for
their issue/PR, other than "begging" for attention on a mailing list. There
are good ways for new contributors to be of significant help despite being
new to the project, such as checking whether old bugs are still relevant,
searching for duplicate issues, or applying old patches to the current code
and creating a PR. (This would be similar to Martin v. Löwis's 5-for-1
offer in 2012[3], which had little success but led to some interesting
followup discussion[4].)
4. *Encourage core devs to dedicate some of their time to working through
issues/PRs which are "ignored" or "stalled".* This would require first
generating reliable lists of issues/PRs in such states. This could be in
various forms, such as predefined GitHub/b.p.o. queries, a dedicated
web-page, a periodic message similar to b.p.o.'s "weekly summary" email, or
dedicated tags/labels for issues/PRs. (Perhaps prioritize "stalled" over
"ignored".)
- Tal Einat
[1]: https://github.com/taleinat/python-contribution-feedback
[2]:
https://github.com/taleinat/python-contribution-feedback/blob/master/Takeaw…
[3]:
https://mail.python.org/archives/list/python-dev@python.org/message/7DLUN4Y…
[4]:
https://mail.python.org/archives/list/python-dev@python.org/thread/N4MMHXXO…
Hello,
As was mentioned many times on the list, PEP634-PEP636 are thoroughly
prepared and good materials, many thanks to their authors. PEP635
"Motivation and Rationale" (https://www.python.org/dev/peps/pep-0635/)
stands out among the 3, however: while reading it, chances are that you'll
get a feeling of "residue" accumulating section over section. By the
end of reading, you may get a well-formed feeling that you've read a
half-technical, half-marketing material, which is intended to "sell" a
particular idea among many other very viable ideas, by shoehorning some
concepts, downplaying other ideas, and at the same time light-heartedly
presenting the drawbacks of its own subject.
Just to give one example, literally at the very beginning, at the
"Pattern Matching and OO" section (3rd heading) it says:
> Pattern matching is complimentary to the object-oriented paradigm.
It's not until the very end of the document, in "History and Context",
that it tells the whole truth:
> With its emphasis on abstraction and encapsulation, object-oriented
> programming posed a serious challenge to pattern matching.
You may wonder how "complimentary" and "posed a serious challenge"
relate to each other. While they're definitely not contradictory,
starting the document with light-hearted "complimentary" can be seen as
trying to set the stage where readers don't pay enough attention to the
problem. And it kinda worked: only now [1] is the wider community
discovering the implications of the "Class Patterns" choices. (As a note,
PEP635 does well on explaining them, and I'm personally sold on that, but
it's a *tough* choice, not an *obvious* one.)
There're many more examples like that in PEP635; it would take too
much space to list them all. However, PEP635 refers to the paper:
> Kohn et al., Dynamic Pattern Matching with Python
> https://doi.org/10.1145/3426422.3426983 (Accepted by DLS 2020. The
> link will go live after Nov. 17; a preview PDF can be obtained from
> the first author.)
As that citation suggests, the paper is not directly linked from the
PEP635. But the preprint is now accessible from the conference page,
https://conf.researchr.org/home/dls-2020?#event-overview (direct link
as of now: https://gvanrossum.github.io//docs/PyPatternMatching.pdf).
That paper is written at a much higher academic standard, and is a pleasure
to read. I recommend it to everyone who read PEP635 (note that it was
written with PEP622 in mind, but the conceptual differences are minor). With
it, I noticed just 2 obvious issues:
Section 4.3. Named Constants
> It would clearly be desirable to allow named constants in patterns
> as a replacement and extension of literals. However, Python has no
> concept of a constant, i.e. all variables are mutable (even where
> the values themselves are immutable).
So, unlike PEP635, the paper pinpoints exactly the root of PEP634's
problems: the lack of constants in Python (on the language level). This is
just the point which was raised on the mailing list as well
(https://mail.python.org/archives/list/python-dev@python.org/message/WV2UA4A…).
Under strict academic peer review, the paper would have been returned
for elaboration, with a note like: "Given that nowadays many dynamic
languages (PHP, JavaScript, Lua, etc.) have support for constants, and
lack of constants poses a serious usability challenge to your proposal,
please explain why you chose to proceed anyway (and apply workarounds),
instead of first introducing the concept of constants to the language.
(Given that amount of work to implement pattern matching is certainly
an order of magnitude larger than to introduce constants)."
But the paper wasn't returned for elaboration, so we'll keep wondering
why the authors chose such a backward process.
Section 6.1. Scope
> The granularity of the scope of local variables is at the level of
> functions and frames. [...]
> The only way to restrict the scope of a variable to part of a
> function’s body (such as a case clause) would be to actively delete
> the variable when leaving the block. This would, however, not restore
> any previous value of a local variable in the function’s scope.
This is a misconception ("the only way") which is repeated almost word
for word in PEP635. If anything, the above describes how
pseudo-scoping is currently implemented for exception vars in the "except
Exception as e:" clause (more info:
https://mail.python.org/pipermail/python-dev/2019-January/155991.html),
which is hardly a "best practice", and definitely not the only way.
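To make this concrete, here is a quick runnable illustration of that
except-clause behavior:

```
# The except clause del'etes its target name on exit and does NOT
# restore any previous value bound to the same name.
e = "outer value"
try:
    raise ValueError("boom")
except ValueError as e:
    pass  # here, e is the ValueError instance

try:
    print(e)
except NameError:
    print("e was deleted on leaving the except block; outer value lost")
```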
How to support multiple variable scopes in one stack frame is not
rocket science at all. One just needs to remember how C has done it since
~forever. And that's: (for unoptimized programs) variables in different
scopes live in disjoint subsections of a frame. (For optimized
programs, variables with disjoint liveness can share the same
locations in a frame).
The only reasons for not implementing the same solution in Python would
be inertia of thought and "but it's not done anywhere else in
Python". Yeah, but nobody del'eted local variables behind users' backs
either, before somebody started to do that for exception clause
variables. And with pattern matching, one way or another, something
will be done for the first time in Python. For
example, nobody before could imagine that one can write "Point(x, y)",
and get x and y assigned, and now we're facing just that [1]. (Again,
I personally love it, though think that "Point(>x, >y)" is an
interesting option to address the tensions).
In that regard, the current PEP634 and friends miss too many interesting
and useful opportunities (constants in language core and true scoping
for match'es, to name a few). Well, that happens. But they try to
shoehorn too much of "we didn't do" into "it's not possible" or "it
doesn't make sense", or "let's work around it in ad-hoc ways", and that
raises eyebrows, leading to concerns that the proposals may actually
still be "raw".
[1] People expressing surprise at "Class Patterns" syntax:
https://mail.python.org/archives/list/python-dev@python.org/message/F66J72J…
https://mail.python.org/archives/list/python-dev@python.org/message/Q2ARJUL…
--
Best regards,
Paul mailto:pmiscml@gmail.com
> I'd love to have an easy way to keep them in the loop.
I'm one of the maintainers on https://github.com/docker-library/python
(which is what results in https://hub.docker.com/_/python), and I'd
love to have an easy way to keep myself in the loop too! O:)
Is there a lower-frequency mailing list where things like this are
normally posted that I could follow?
(I don't want to be a burden, although we'd certainly really love to
have more upstream collaboration on that repo -- we do our best to
represent upstream as correctly/accurately as possible, but we're not
experts!)
> would it make sense to add a packaging section to our documentation or
> to write an informational PEP?
FWIW, I love the idea of an explicit "packaging" section in the docs
(or a PEP), but I've maintained that for other projects before and
know it's not always easy or obvious. :)
♥,
- Tianon
4096R / B42F 6819 007F 00F8 8E36 4FD4 036A 9C25 BF35 7DD4
PS. thanks doko for giving me a link to this thread! :D