Hi, folks.
Since the previous discussion was suspended without consensus, I wrote
a new PEP for it. (Thank you Victor for reviewing it!)
This PEP looks very similar to PEP 623 "Remove wstr from Unicode",
but for encoder APIs, not for Unicode object APIs.
URL (not available yet): https://www.python.org/dev/peps/pep-0624/
---
PEP: 624
Title: Remove Py_UNICODE encoder APIs
Author: Inada Naoki <songofacandy(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 06-Jul-2020
Python-Version: 3.11
Abstract
========
This PEP proposes to remove deprecated ``Py_UNICODE`` encoder APIs in
Python 3.11:
* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
.. note::
   `PEP 623 <https://www.python.org/dev/peps/pep-0623/>`_ proposes to remove
   Unicode object APIs relating to ``Py_UNICODE``. On the other hand, this PEP
   does not relate to the Unicode object. These PEPs are split because they
   have different motivations and need different discussions.
Motivation
==========
In general, removing APIs that have been deprecated for a long time and
have few users is a good idea: it not only improves the maintainability
of CPython, but also helps API users and other Python implementations.
Rationale
=========
Deprecated since Python 3.3
---------------------------
``Py_UNICODE`` and the APIs using it have been deprecated since Python 3.3.
Inefficient
-----------
All of these APIs are implemented using ``PyUnicode_FromWideChar``,
so they are inefficient when the caller wants to encode a Unicode
object.
Not used widely
---------------
When searching the top 4000 PyPI packages [1]_, only pyodbc uses
these APIs:
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
pyodbc uses these APIs to encode a Unicode object into a bytes object,
so it is easy to fix. [2]_
Alternative APIs
================
There are alternative APIs that accept ``PyObject *unicode`` instead of
``Py_UNICODE *``. Users can migrate to them (see the migration sketch
after the notes below).
=========================================  ==========================================
Deprecated API                             Alternative APIs
=========================================  ==========================================
``PyUnicode_Encode()``                     ``PyUnicode_AsEncodedString()``
``PyUnicode_EncodeASCII()``                ``PyUnicode_AsASCIIString()`` \(1)
``PyUnicode_EncodeLatin1()``               ``PyUnicode_AsLatin1String()`` \(1)
``PyUnicode_EncodeUTF7()``                 \(2)
``PyUnicode_EncodeUTF8()``                 ``PyUnicode_AsUTF8String()`` \(1)
``PyUnicode_EncodeUTF16()``                ``PyUnicode_AsUTF16String()`` \(3)
``PyUnicode_EncodeUTF32()``                ``PyUnicode_AsUTF32String()`` \(3)
``PyUnicode_EncodeUnicodeEscape()``        ``PyUnicode_AsUnicodeEscapeString()``
``PyUnicode_EncodeRawUnicodeEscape()``     ``PyUnicode_AsRawUnicodeEscapeString()``
``PyUnicode_EncodeCharmap()``              ``PyUnicode_AsCharmapString()`` \(1)
``PyUnicode_TranslateCharmap()``           ``PyUnicode_Translate()``
``PyUnicode_EncodeDecimal()``              \(4)
``PyUnicode_TransformDecimalToASCII()``    \(4)
=========================================  ==========================================
Notes:
(1)
   The ``const char *errors`` parameter is missing.
(2)
   There is no public alternative API, but users can use the generic
   ``PyUnicode_AsEncodedString()`` instead.
(3)
   The ``const char *errors, int byteorder`` parameters are missing.
(4)
   There is no direct replacement, but ``Py_UNICODE_TODECIMAL``
   can be used instead. CPython uses
   ``_PyUnicode_TransformDecimalAndSpaceToASCII`` for converting
   from Unicode to numbers.
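As an example of such a migration, here is a minimal sketch of replacing
a ``PyUnicode_EncodeUTF8()`` call with ``PyUnicode_AsUTF8String()``
(illustrative only, not taken from pyodbc; ``encode_old`` and
``encode_new`` are hypothetical helpers):

.. code-block:: c

   #include <Python.h>

   /* Before: deprecated Py_UNICODE API. The wchar_t* buffer extracted
      here is converted back to a Unicode object internally. */
   PyObject *
   encode_old(PyObject *unicode)
   {
       return PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(unicode),
                                   PyUnicode_GET_SIZE(unicode),
                                   "strict");
   }

   /* After: encode the Unicode object directly. */
   PyObject *
   encode_new(PyObject *unicode)
   {
       return PyUnicode_AsUTF8String(unicode);
   }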
Plan
====
Python 3.9
----------
Add ``Py_DEPRECATED(3.3)`` to the following APIs. This change has
already been committed [3]_. All other APIs have already been marked
``Py_DEPRECATED(3.3)``.
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
Document all APIs as "will be removed in version 3.11".
Python 3.11
-----------
Remove the following APIs:
* ``PyUnicode_Encode()``
* ``PyUnicode_EncodeASCII()``
* ``PyUnicode_EncodeLatin1()``
* ``PyUnicode_EncodeUTF7()``
* ``PyUnicode_EncodeUTF8()``
* ``PyUnicode_EncodeUTF16()``
* ``PyUnicode_EncodeUTF32()``
* ``PyUnicode_EncodeUnicodeEscape()``
* ``PyUnicode_EncodeRawUnicodeEscape()``
* ``PyUnicode_EncodeCharmap()``
* ``PyUnicode_TranslateCharmap()``
* ``PyUnicode_EncodeDecimal()``
* ``PyUnicode_TransformDecimalToASCII()``
Alternative ideas
=================
Instead of just removing the deprecated APIs, we may be able to reuse
their names with different signatures.
Make some private APIs public
------------------------------
``PyUnicode_EncodeUTF7()`` doesn't have a public alternative API, and
some of the other APIs have public alternatives that are missing the
``const char *errors`` or ``int byteorder`` parameters.
We can rename some private APIs and make them public to cover the
missing APIs and parameters.
=============================  ================================
Rename to                      Rename from
=============================  ================================
``PyUnicode_EncodeASCII()``    ``_PyUnicode_AsASCIIString()``
``PyUnicode_EncodeLatin1()``   ``_PyUnicode_AsLatin1String()``
``PyUnicode_EncodeUTF7()``     ``_PyUnicode_EncodeUTF7()``
``PyUnicode_EncodeUTF8()``     ``_PyUnicode_AsUTF8String()``
``PyUnicode_EncodeUTF16()``    ``_PyUnicode_EncodeUTF16()``
``PyUnicode_EncodeUTF32()``    ``_PyUnicode_EncodeUTF32()``
=============================  ================================
Pros:
* We have a more consistent API set.
Cons:
* We have more public APIs to maintain.
* The existing public APIs are enough for most use cases, and
  ``PyUnicode_AsEncodedString()`` can be used in the other cases.
Replace ``Py_UNICODE*`` with ``Py_UCS4*``
-----------------------------------------
We can replace ``Py_UNICODE`` (a typedef of ``wchar_t``) with
``Py_UCS4``. Since the builtin codecs support UCS-4, we don't need to
convert the ``Py_UCS4*`` string to a Unicode object.
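To illustrate, the resulting declarations might look like this (a
hypothetical sketch, not something this PEP proposes):

.. code-block:: c

   /* Hypothetical UCS-4 variants of the deprecated encoder APIs. */
   PyObject *PyUnicode_EncodeUTF8(const Py_UCS4 *s, Py_ssize_t size,
                                  const char *errors);
   PyObject *PyUnicode_EncodeUTF16(const Py_UCS4 *s, Py_ssize_t size,
                                   const char *errors, int byteorder);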
Pros:
* We have a more consistent API set.
* Users can encode a UCS-4 string in C without creating a Unicode object.
Cons:
* We have more public APIs to maintain.
* Applications which use UTF-8 or UTF-32 cannot use these APIs anyway.
* Other Python implementations may not have a builtin codec for UCS-4.
* If we change the Unicode internal representation to UTF-8, we need
  to keep UCS-4 support only for these APIs.
Replace ``Py_UNICODE*`` with ``wchar_t*``
-----------------------------------------
We can replace ``Py_UNICODE`` with ``wchar_t``.
Pros:
* We have a more consistent API set.
* Backward compatible.
Cons:
* We have more public APIs to maintain.
* They are inefficient on platforms where ``wchar_t*`` is UTF-16,
  because the built-in codecs support only UCS-1, UCS-2, and UCS-4
  input.
Rejected ideas
==============
Using runtime warning
---------------------
These APIs don't release the GIL for now, so emitting a warning from
them is not safe. See this example:

.. code-block:: c
   PyObject *u = PyList_GET_ITEM(list, i);  // u is a borrowed reference.
   PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
                                      PyUnicode_GET_SIZE(u), NULL);
   // Assumes u is still a living reference.
   PyObject *t = PyTuple_Pack(2, u, b);
   Py_DECREF(b);
   return t;
If we emit a Python warning from ``PyUnicode_EncodeUTF8()``, warning
filters and other threads may mutate the ``list``, and ``u`` can be a
dangling reference after ``PyUnicode_EncodeUTF8()`` returns.
Additionally, since we are not changing behavior but removing C APIs, a
runtime ``DeprecationWarning`` might not be helpful for Python
developers. We should warn extension developers instead.
Discussions
===========
* `Plan to remove Py_UNICODE APIs except PEP 623
<https://mail.python.org/archives/list/python-dev@python.org/thread/S7KW2U6I…>`_
* `bpo-41123: Remove Py_UNICODE APIs except PEP 623:
<https://bugs.python.org/issue41123>`_
References
==========
.. [1] Source package list chosen from top 4000 PyPI packages.
(https://github.com/methane/notes/blob/master/2020/wchar-cache/package_list.…)
.. [2] pyodbc -- Don't use PyUnicode_Encode API #792
(https://github.com/mkleehammer/pyodbc/pull/792)
.. [3] Uncomment Py_DEPRECATED for Py_UNICODE APIs (GH-21318)
(https://github.com/python/cpython/commit/9c3840870814493fed62e140cfa43c2883…)
Copyright
=========
This document has been placed in the public domain.
--
Inada Naoki <songofacandy(a)gmail.com>
Hi all,
Right now, when a debugger is active, the number of local variables can
affect the tracing speed quite a lot.
For instance, having tracing set up in a program such as the one below
takes 4.64 seconds to run, yet changing all the variables to have the
same name -- i.e., changing all assignments to `a = 1` (so that there's
only a single variable in the namespace) -- makes it take 1.47 seconds
(on my machine)... the higher the number of variables, the slower the
tracing becomes.
```
import time

t = time.time()

def call():
    a = 1
    b = 1
    c = 1
    d = 1
    e = 1
    f = 1

def noop(frame, event, arg):
    return noop

import sys
sys.settrace(noop)

for i in range(1_000_000):
    call()

print('%.2fs' % (time.time() - t,))
```
This happens because `PyFrame_FastToLocalsWithError` and
`PyFrame_LocalsToFast` are called inside the `call_trampoline` (
https://github.com/python/cpython/blob/master/Python/sysmodule.c#L946).
So, I'd like to simply remove those calls.
Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise
mutating non-current frames doesn't work anyway. As a note, pydevd already
has such a call:
https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b…
and PyPy also has a counterpart.
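For reference, this is roughly how a debugger can trigger that
write-back itself from pure Python -- a sketch of the ctypes-based
approach pydevd uses (illustrative; the `0` argument means "don't clear
fast locals that are missing from the dict"):

```
import ctypes

def save_locals(frame):
    # Copy the (possibly mutated) frame.f_locals dict back into the
    # interpreter's fast-locals array so the changes take effect.
    ctypes.pythonapi.PyFrame_LocalsToFast(
        ctypes.py_object(frame), ctypes.c_int(0))
```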
As for `PyFrame_FastToLocalsWithError`, I don't really see any reason to
call it at all. I.e., code such as the example below prints the `a`
variable from the `main()` frame regardless of that call, and I checked
all pydevd tests and nothing seems to be affected (it seems that
accessing f_locals already does this:
https://github.com/python/cpython/blob/cb9879b948a19c9434316f8ab6aba9c4601a…,
so there's no need for the trampoline to do it).
```
def call():
    import sys
    frame = sys._getframe()
    print(frame.f_back.f_locals)

def main():
    a = 1
    call()

if __name__ == '__main__':
    main()
```
Does anyone see any issue with this?
If it's not controversial, is a PEP needed, or would an issue to track
it be enough to remove those two lines?
Thanks,
Fabio
Hi,
Pathlib's symlink_to() and link_to() methods have different argument
orders, so:
a.symlink_to(b) # Creates a symlink from A to B
a.link_to(b) # Creates a hard link from B to A
I don't think link_to() was intended to be implemented this way, as the
docs say "Create a hard link pointing to a path named target.". It's also
inconsistent with everything else in pathlib, most obviously symlink_to().
Bug report here: https://bugs.python.org/issue39291
This /really/ irks me. Apparently it's too late to fix link_to(), so I'd
like to suggest we add a new hardlink_to() method that matches the
symlink_to() argument order. link_to() then becomes deprecated/undocumented.
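To make that concrete, here's the usage I'd expect (a sketch --
hardlink_to() is the proposed method and doesn't exist yet):

```
from pathlib import Path

a, b = Path('a'), Path('b')
a.symlink_to(b)   # existing: creates a symlink at a pointing to b
a.hardlink_to(b)  # proposed: creates a hard link at a pointing to b
```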
Any thoughts?
Barney
Hi everyone,
PEP 634/5/6 presents a possible implementation of pattern matching for
Python.
Much of the discussion around PEP 634, and PEP 622 before it, seems to
imply that PEP 634 is synonymous with pattern matching; that if you
reject PEP 634 then you are rejecting pattern matching.
That simply isn't true.
Can we discuss whether we want pattern matching in Python and
the broader semantics first, before dealing with low level details?
Do we want pattern matching in Python at all?
---------------------------------------------
Pattern matching works really well in statically typed, functional
languages.
The lack of mutability, the constrained scope, and the ability of the
compiler to distinguish let variables from constants mean that pattern
matching code has fewer errors and can be compiled efficiently.
Pattern matching works less well in dynamically-typed, functional
languages and statically-typed, procedural languages.
Nevertheless, it works well enough for it to be a popular feature in
both Erlang and Rust.
In dynamically-typed, procedural languages, however, it is not clear (at
least not to me) that it works well enough to be worthwhile.
That is not to say that pattern matching could never be of value in
Python, but PEP 635 fails to demonstrate that it can (although it does a
better job than PEP 622).
Should match be an expression, or a statement?
----------------------------------------------
Do we want a fancy switch statement, or a powerful expression?
Expressions have the advantage of not leaking (like comprehensions in
Python 3), but statements are easier to work with.
Can pattern matching make it clear what is assigned?
----------------------------------------------------
Embedding the variables to be assigned into a pattern, makes the pattern
concise, but requires discarding normal Python syntax and inventing a
new sub-language. Could we make patterns fit Python better?
Is it possible to make assignment to variables clear, and unambiguous,
and allow the use of symbolic constants at the same time?
I think it is, but PEP 634 fails to do this.
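To make the ambiguity concrete, consider this illustrative snippet in
PEP 634's proposed syntax (proposed syntax, not valid Python today;
`Color` is a hypothetical enum):

```
match color:
    case Color.RED:   # dotted name: a value pattern, compared for equality
        print("red")
    case other:       # bare name: a capture pattern, binds `other`
        print(other)
```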
How should pattern matching be integrated with the object model?
----------------------------------------------------------------
What special method(s) should be added? How and when should they be called?
PEP 634 largely disregards the object model, meaning it has many special
cases, and is inefficient.
The semantics must be well defined.
-----------------------------------
Language extension PEPs should define the semantics of those
extensions. For example, PEP 343 and PEP 380 both did:
https://www.python.org/dev/peps/pep-0343/#specification-the-with-statement
https://www.python.org/dev/peps/pep-0380/#formal-semantics
PEP 634 just waves its hands and talks about undefined behavior, which
horrifies me.
In summary,
I would ask anyone who wants pattern matching added to Python not to
support PEP 634.
PEP 634 just isn't a good fit for Python, and we deserve something better.
Cheers,
Mark.
(Context: Continuing to prepare for the core dev sprint next week. Since
the sprint is near, *I'd greatly appreciate any quick comments, feedback
and ideas!*)
Following up on my collection of past beginning-contributor experiences,
I've gathered these experiences in a dedicated GitHub repo[1] and written
a (subjective!) summary of the main themes that I recognize in the
stories, which I've also included in the repo[2].
A "TL;DR" bullet list of those main themes:
* Slow/no responsiveness
* Long, slow process
* Hard to find where to contribute
* Mentorship helps a lot, but is scarce
* A lot to learn to get started
* It's intimidating
More specifically, something that has come up often is that maintaining
momentum for new contributors is crucial for them to become long-term
contributors. Most often, this comes up in relation to the first two
points: Suggestions or PRs receive no attention at all ("ignored") or
stop receiving attention at some point ("lost to the void").
Unfortunately, the probability of this is pretty high for any issue/PR, so
for a new contributor this is almost guaranteed to happen while working on
one of their first few contributions. I've seen this happen many times, and
have found that I have to personally follow promising contributors' work to
ensure that this doesn't happen to them. I've also seen contributors learn
to actively seek out core devs when these situations arise, which is often
a successful tactic, but shouldn't be necessary so often.
Now, this is in large part a result of the fact that we core devs are
not a very large group, made up almost entirely of volunteers working on
this in their spare time. Last I checked, the total amount of paid
development time dedicated to developing Python was less than three
full-time positions (i.e., ~100 hours a week).
The situation being problematic is clear enough that the PSF had concrete
plans to hire paid developers to review issues and PRs. However, those
plans have been put on hold indefinitely, since the PSF's funding has
shrunk dramatically since the COVID-19 outbreak (no PyCon!).
So, what can be done? Besides raising more funds (see a note on this
below), I think we can find ways to reduce how often issues/PRs become
"stalled". Here are some ideas:
1. *Generate reminders for reviewers when an issue or PR becomes "stalled'
due to them.* Personally, I've found that both b.p.o. and GitHub make it
relatively hard to remember to follow up on all of the many issues/PRs
you've taken part in reviewing. It takes considerable attention and
discipline to do so consistently, and reminders like these would have
helped me. Many (many!) times, all it took to get an issue/PR moving
forward (or closed) was a simple "ping?" comment.
2. *Generate reminders for contributors when an issue or PR becomes
"stalled" due to them.* Similar to the above, but I consider these separate.
3. *Advertise something like a "2-for-1" standing offer for reviews.* This
would give contributors an "official", acceptable way to get attention for
their issue/PR, other than "begging" for attention on a mailing list. There
are good ways for new contributors to be of significant help despite being
new to the project, such as checking whether old bugs are still relevant,
searching for duplicate issues, or applying old patches to the current code
and creating a PR. (This would be similar to Martin v. Löwis's 5-for-1
offer in 2012[3], which had little success but led to some interesting
followup discussion[4].)
4. *Encourage core devs to dedicate some of their time to working through
issues/PRs which are "ignored" or "stalled".* This would require first
generating reliable lists of issues/PRs in such states. This could be in
various forms, such as predefined GitHub/b.p.o. queries, a dedicated
web-page, a periodic message similar to b.p.o.'s "weekly summary" email, or
dedicated tags/labels for issues/PRs. (Perhaps prioritize "stalled" over
"ignored".)
- Tal Einat
[1]: https://github.com/taleinat/python-contribution-feedback
[2]:
https://github.com/taleinat/python-contribution-feedback/blob/master/Takeaw…
[3]:
https://mail.python.org/archives/list/python-dev@python.org/message/7DLUN4Y…
[4]:
https://mail.python.org/archives/list/python-dev@python.org/thread/N4MMHXXO…
Hi folks,
This is a mailing list repost of the Discourse thread at
https://discuss.python.org/t/pep-642-constraint-pattern-syntax-for-structur…
The rendered version of the PEP can be found here:
https://www.python.org/dev/peps/pep-0642/
The full text is also quoted in the Discourse thread.
The remainder of this email is the same introduction that I posted on Discourse.
I’m largely a fan of the Structural Pattern Matching proposal in PEP
634, but there’s one specific piece of the syntax proposal that I
strongly dislike: the idea of basing the distinction between capture
patterns and value patterns purely on whether they use a simple name
or a dotted name.
Thus PEP 642, which retains most of PEP 634 unchanged, but adjusts
value checks to use an explicit prefix syntax (either `?EXPR` for
equality constraints, or `?is EXPR` for identity constraints), rather
than relying on users learning that literals and attribute lookups in
a capture pattern mean a value lookup check, while simple names mean a
capture pattern (unlike both normal expressions, where all three mean
a value lookup, and assignment targets, where both simple and dotted
names bind a new reference).
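For illustration, here's a sketch of the proposed prefix syntax (not
valid in any released Python; `MISSING` and `SENTINEL` are hypothetical
constants):

```
match value:
    case ?MISSING:      # equality constraint: value == MISSING
        ...
    case ?is SENTINEL:  # identity constraint: value is SENTINEL
        ...
    case result:        # a bare name remains a capture pattern
        ...
```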
The PEP itself has a lot of words explaining why I’ve made the design
decisions I have, as well as the immediate and potential future
benefits offered by using an explicit prefix syntax for value
constraints, but the super short form goes like this:
* if you don’t like match statements at all, or wish we were working
on designing a C-style switch statement instead, then PEP 642 isn’t
going to appeal to you any more than PEP 634 does
* if, like me, you don’t like the idea of breaking the existing
property of Python that caching the result of a value lookup
subexpression in a local variable and then using that variable in
place of the original subexpression should “just work”, then PEP 642’s
explicit constraint prefix syntax may be more to your liking
* however, if the idea of the `?` symbol becoming part of Python’s
syntax doesn’t appeal to you, then you may consider any improved
clarity of intent that PEP 642 might offer to not be worth that cost
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
Hi!
The timeline for this year's election will be the same as last year.
* The nomination period will begin Nov 1, 2020 (do not post nominations
until then)
* Nomination period will end Nov 15, 2020
* Voting will begin Dec 1, 2020
* Voting will end Dec 15, 2020
Nominations will be collected via https://discuss.python.org/ (more details
to follow on Nov 1).
New for this year: Ernest W. Durbin III will be running the vote with
the assistance of Joe Carey, a PSF employee. They will be co-admins going
forward. I have cc'ed them in on this thread as well in case there are any
questions.
Thanks,
Ewa
Hi, all.
To avoid BytesWarning, the compiler needs to do some hacks when it
needs to store bytes and str constants in one dict or set.
BytesWarning has maintenance costs. They are not huge, but significant.
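For context, a minimal demonstration of the warning in question (a
sketch; run with `python -b`, or `-bb` to turn it into an error):

```
# $ python -b demo.py
x = b"abc" == "abc"  # under -b, this emits a BytesWarning about
print(x)             # comparing bytes and str; prints False either way
```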
When can we remove it? My idea is:
3.10: Deprecate the -b option.
3.11: Make the -b option a no-op. BytesWarning is never emitted.
3.12: Remove the -b option.
BytesWarning will be documented as deprecated, but not removed.
Users who want to use the -b option during 2-to-3 conversion will need
to use Python 3.10 or earlier for a while.
Regards,
--
Inada Naoki <songofacandy(a)gmail.com>
I’m not on this list. But I have offered to help - if there are tasks that need to be done to help this I can help put the weight of a commercial entity behind it whether that involves assigning our developers to work on this, helping pay for external developers to do so, or assisting with access to machine resources.
For the record there are multiple illumos distributions and most are both free and run reasonably well in virtual machines. Claiming that developers don’t have access as a reason to discontinue the port is a bit disingenuous. Anyone can get access if they want and if they can figure out how to login and use Linux then this should be pretty close to trivial for them.
What’s more likely is that some group of developers aren’t interested in supporting stuff they don’t actively use. I get it. It’s easier to work in a monoculture. But in this case there are many many more users of this that would be impacted than a naive examination of downloads will show.
Of course this all presumes that the core Python team still places value on being a cross platform portable tool. I can help solve most of the other concerns - except for this one.
- Garrett
Hi,
I propose to drop the Solaris support in Python to reduce the Python
maintenance burden:
https://bugs.python.org/issue42173
I wrote a draft PR to show how much code could be removed (around 700
lines in 65 files):
https://github.com/python/cpython/pull/23002/files
In 2016, I asked if we still wanted to maintain the Solaris support in
Python, because Solaris buildbots were failing for longer than 6
months and nobody was able to fix them. It was requested to find a
core developer volunteer to fix Solaris issues and to set up a Solaris
buildbot.
https://mail.python.org/archives/list/python-dev@python.org/thread/NOT2RORS…
Four years later, nothing has happened. Moreover, in 2018, Oracle laid
off the Solaris development engineering staff. There are around 25
open Python bugs specific to Solaris.
I see 3 options:
* Current best effort support (no change): changes only happen if a
core dev volunteers to review and merge a change written by a
contributor.
* Schedule the removal in 2 Python releases (Python 3.12) and start to
announce that Solaris support is going to be removed
* Remove the Solaris code right now (my proposition): Solaris code
will have to be maintained outside the official Python code base, as
"downstream patches"
Solaris has a few specific features visible at the Python level:
select.devpoll, os.stat().st_fstype and stat.S_ISDOOR().
While it's unclear to me if Oracle still actively maintains Solaris
(latest release in 2018, no major update since 2018), Illumos and
OpenSolaris (variants or "forks") still seem to be active.
In 2019, a Solaris blog post explains that Solaris 11.4 still uses
Python 2.7 but plans to migrate to Python 3, and Python 3.4 is also
available. These two Python versions are no longer supported.
https://blogs.oracle.com/solaris/future-of-python-on-solaris
The question is if the Python project has to maintain the Solaris
specific code or if this code should now be maintained outside Python.
What do you think? Should we wait 5 more years? Should we expect a
company will offer to maintain the Solaris support? Is there a
motivated core developer to fix Solaris issues? As I wrote, nothing has
happened in the last 4 years...
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.