When can we remove wchar_t* cache from string?
Hi, all.

Py_UNICODE has been deprecated since PEP 393 (Flexible String Representation). The wchar_t* cache in the string object is used only by deprecated APIs. It wastes one word (8 bytes on a 64-bit machine) per string instance.

The deprecated APIs are documented as "Deprecated since version 3.3, will be removed in version 4.0." See https://docs.python.org/3/c-api/unicode.html#deprecated-py-unicode-apis

But when PEP 393 was implemented, no one expected that 3.10 would be released. Can we reschedule the removal?

My proposal is to schedule the removal for Python 3.11, but postpone it if the remaining usage cannot be removed by then.

I grepped the top 4000 PyPI packages for uses of the deprecated APIs.

result: https://github.com/methane/notes/blob/master/2020/wchar-cache/deprecated-use
step: https://github.com/methane/notes/blob/master/2020/wchar-cache/README.md

I noticed:

* Most of the uses are generated by Cython.
* I reported it to Cython, so Cython 0.29.21 will fix them. I expect more than 1 year between Cython 0.29.21 and Python 3.11rc1.
* Most of them are `PyUnicode_FromUnicode(NULL, 0);`
* We may be able to keep PyUnicode_FromUnicode, but raise an error when length > 0.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
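To make the "one word per string instance" cost in the message above concrete, here is a minimal sketch in Python using ctypes. The two structures are a simplified, assumed model of a string header with and without a trailing wchar_t* pointer; the field list and the names `WithWstr`, `WithoutWstr` and `wstr_overhead` are illustrative and are not the CPython definitions.

```python
import ctypes

# Simplified, assumed model of a string object header (not the real CPython struct).
COMMON_FIELDS = [
    ("ob_refcnt", ctypes.c_ssize_t),
    ("ob_type", ctypes.c_void_p),
    ("length", ctypes.c_ssize_t),
    ("hash", ctypes.c_ssize_t),
    ("state", ctypes.c_uint32),
]


class WithWstr(ctypes.Structure):
    # Header that still carries the pointer to the wide-character copy.
    _fields_ = COMMON_FIELDS + [("wstr", ctypes.c_void_p)]


class WithoutWstr(ctypes.Structure):
    # Same header with the pointer removed.
    _fields_ = list(COMMON_FIELDS)


# Bytes saved per string instance in this model (padding included).
SAVED_PER_STRING = ctypes.sizeof(WithWstr) - ctypes.sizeof(WithoutWstr)


def wstr_overhead(n_strings):
    """Total bytes spent on the pointer for n_strings string objects."""
    return n_strings * SAVED_PER_STRING


if __name__ == "__main__":
    print("bytes per string:", SAVED_PER_STRING)
    print("bytes for one million strings:", wstr_overhead(1_000_000))
```

In this model the saving equals the size of one pointer on the platform running the script, which matches the "1 word" figure quoted in the message.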
Hi,

A big +1 to exposing fewer internals of the PyUnicodeObject to C code. Ultimately, making PyUnicodeObject immutable to C code would be a real bonus. It would make the code cleaner, safer and faster. A triple win!

I don't think removing the Py_UNICODE API would be sufficient for that. Do you have thoughts on what else would need to change?

On 12/06/2020 9:32 am, Inada Naoki wrote:
[snip]
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7JVC3IKS...
Code of Conduct: http://python.org/psf/codeofconduct/
On 2020-06-12 09:32, Inada Naoki wrote:
[snip]
* Most of them are generated by Cython.
* I reported it to Cython so Cython 0.29.21 will fix them. I expect more than 1 year between Cython 0.29.21 and Python 3.11rc1.
* Most of them are `PyUnicode_FromUnicode(NULL, 0);`
* We may be able to keep PyUnicode_FromUnicode, but raise error when length>0.
I think it would be strange to keep PyUnicode_FromUnicode but complain unless length == 0. If it's going to be removed, then remove it and suggest a replacement for that use-case, such as PyUnicode_FromString with a NULL argument. (I'm not sure if PyUnicode_FromString will accept NULL, but if it currently doesn't, then maybe it should in future be treated as being equivalent to PyUnicode_FromString("").)
On Sat, Jun 13, 2020 at 1:36 AM MRAB <python@mrabarnett.plus.com> wrote:
* Most of them are `PyUnicode_FromUnicode(NULL, 0);`
* We may be able to keep PyUnicode_FromUnicode, but raise error when length>0.
I think it would be strange to keep PyUnicode_FromUnicode but complain unless length == 0. If it's going to be removed, then remove it and suggest a replacement for that use-case, such as PyUnicode_FromString with a NULL argument. (I'm not sure if PyUnicode_FromString will accept NULL, but if it currently doesn't, then maybe it should in future be treated as being equivalent to PyUnicode_FromString("").)
Of course, there is an API to create an empty string: PyUnicode_New(0, 0);

But since Cython is using `PyUnicode_FromUnicode(NULL, 0)`, keeping it working for a few versions would mitigate the breaking change. Note that we can remove the wchar_t cache while keeping it working.

Anyway, this is only an idea for mitigation. If all maintained packages fix it before Python 3.11, the mitigation is not needed.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
On Sat, Jun 13, 2020 at 12:39, Inada Naoki <songofacandy@gmail.com> wrote:
Of course, there is an API to create an empty string: PyUnicode_New(0, 0); But since Cython is using `PyUnicode_FromUnicode(NULL, 0)`, keeping it working for a few versions would mitigate the breaking change. Note that we can remove the wchar_t cache while keeping it working.
Anyway, this is only an idea for mitigation. If all maintained packages fix it before Python 3.11, the mitigation is not needed.

Can someone propose a Cython PR to use PyUnicode_New(0, 0) on Python 3.3 and newer, or PyUnicode_FromUnicode(NULL, 0) on older Python versions?

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
Sorry, ignore my comment: Cython no longer uses PyUnicode_FromUnicode(NULL, 0) in the master branch. The change was backported to the 0.29.x branch, but this stable branch requires a second fix, so I wrote it: https://github.com/cython/cython/pull/3721

Victor

On Fri, Jul 3, 2020 at 16:00, Victor Stinner <vstinner@python.org> wrote:
[snip]
On Fri, Jun 12, 2020 at 5:32 PM Inada Naoki <songofacandy@gmail.com> wrote:
My proposal is, schedule the removal on Python 3.11. But we will postpone the removal if we can not remove its usage until it.
Additionally, emit a DeprecationWarning at runtime when these APIs are used.

-- Inada Naoki <songofacandy@gmail.com>
Additionally, raise DeprecationWarning runtime when these APIs are used.
So, just to clarify, current usage of these 7 unicode APIs does not emit any warnings and would only start doing so in 3.10?

If so, I think we may want to consider giving users until 3.12 before they're removed. Especially with the shortened yearly release cadence, a single version between deprecation warning and removal feels a bit short. Even if they've already been announced as deprecated in whatsnew and the documentation since 3.3, it's very possible for it to have been missed.

In this case, it might be okay to remove in 3.11 since they've been deprecated for an exceptionally long period and appear to have a clear transition path. But, 3.12 would be safer for removal, and I don't think it would be much of an additional burden on our end to keep them around for one extra version.

Another option might be to proceed with the 3.11 removal, and simply delay it to 3.12 if we receive significant complaints of breakage in 3.11 to give users some extra time to address it. As far as I'm aware, this isn't typically done, but I think it would be more than reasonable in this scenario (assuming the deprecation warnings are just being introduced in 3.10).

On Sat, Jun 13, 2020 at 6:46 AM Inada Naoki <songofacandy@gmail.com> wrote:
[snip]
On Sat, Jun 13, 2020 at 20:12, Kyle Stanley <aeros167@gmail.com> wrote:
Additionally, raise DeprecationWarning runtime when these APIs are used.
So, just to clarify, current usage of these 7 unicode APIs does not emit any warnings and would only start doing so in 3.10?

They have been deprecated in C already. The compiler emits a warning. This additional proposal is to add a runtime warning before removal.

In this case, it might be okay to remove in 3.11 since they've been deprecated for an exceptionally long period and appear to have a clear transition path. But, 3.12 would be safer for removal, and I don't think it would be much of an additional burden on our end to keep them around for one extra version.

I am trying to find and remove uses of these APIs in PyPI packages. We will postpone the removal if the migration is slow. But let's set the goal to 3.11 for now.
They have been deprecated in C already. Compiler emits warning.
This additional proposal is adding runtime warning before removal.
Oh, sorry. I misunderstood the previous statement then. In that case, I think scheduling the removal for 3.11 is perfectly reasonable if the compiler warnings have already been in place. +1.

On Sat, Jun 13, 2020 at 7:20 AM Inada Naoki <songofacandy@gmail.com> wrote:
[snip]
On Sat, Jun 13, 2020 at 8:20 PM Inada Naoki <songofacandy@gmail.com> wrote:
On Sat, Jun 13, 2020 at 20:12, Kyle Stanley <aeros167@gmail.com> wrote:
Additionally, raise DeprecationWarning runtime when these APIs are used.
So, just to clarify, current usage of these 7 unicode APIs does not emit any warnings and would only start doing so in 3.10?
They have been deprecated in C already. Compiler emits warning.
This additional proposal is adding runtime warning before removal.
I'm sorry, I was wrong. Py_DEPRECATED(3.3) is commented out for some APIs, so Python 3.8 doesn't show a warning for them. I want to uncomment them in Python 3.9. https://github.com/python/cpython/pull/20878

As far as I grepped, most PyPI packages use the deprecated APIs because Cython generates them. Updating Cython will fix them. Some of the others are straightforward, and I have already created an issue or sent a pull request.

A few projects, pyScss and Genshi, are not straightforward. But it is not too hard and I will help them.

I still think two years is enough time before removal.

Regards,
I'm sorry, I was wrong. Py_DEPRECATED(3.3) is commented out for some APIs. So Python 3.8 doesn't show warning for them.
Ah, no problem. Thanks for checking up on that.
I still think 2 years are enough to removal.
Hmm, okay. At the least though, it does mean we have to be a bit more vigilant in ensuring that everyone has had a chance to migrate from those APIs, and delaying the removal if not.

On Sun, Jun 14, 2020 at 9:34 PM Inada Naoki <songofacandy@gmail.com> wrote:
[snip]
On 12.06.20 11:32, Inada Naoki wrote:
[snip]
My proposal is, schedule the removal on Python 3.11. But we will postpone the removal if we can not remove its usage until it.
I have a plan for a more gradual removal of this feature. I created a PR which adds several compile options, so Python can be built in one of three modes:

1. Support the wchar_t* cache and use it. This is the current mode.
2. Support the wchar_t* cache, but do not use it internally in CPython. This mode can be used to test whether getting rid of the wchar_t* cache has negative effects.
3. Do not support the wchar_t* cache. This is a binary-incompatible build. Its purpose is to allow authors of third-party libraries to prepare for future breakage.

The plan is:

1. Add support for the above compile options. Unfortunately I did not have time to do this before the feature freeze in 3.9, but maybe we can make an exception?
2. Make option 2 the default.
3. Remove option 1.
4. Enable compiler deprecations for all legacy C API. Currently they are silenced for the C API used internally.
5. Make the legacy C API always fail.
6. Remove the legacy C API from header files.

There is a long way to go to steps 5 and 6. I think 3.11 is too early.

https://bugs.python.org/issue36346
https://github.com/python/cpython/pull/12409
Hi Serhiy,

On 15/06/2020 8:22 am, Serhiy Storchaka wrote:
[snip]
I have a plan for more graduate removing of this feature. I created a PR which adds several compile options, so Python can be built in one of three modes:
I don't like this approach. Adding compile time options means we need to test more versions, but is no help to end users as they will end up with the release version anyway.
1. Support wchar_t* cache and use it. It is the current mode.
2. Support wchar_t* cache, but do not use it internally in CPython. It can be used to test whether getting rid of the wchar_t* cache can have negative effects.
We can test any performance impact in the usual way, by comparing the before and after versions of any change.
3. Do not support wchar_t* cache. It is binary incompatible build. Its purpose is to allow authors of third-party libraries to prepare to future breakage.
Deprecation warnings allow third-parties to fix things before they are removed, without anyone having to compile their own version of Python.
[snip]
On Mon, 15 Jun 2020 11:22:09 +0100 Mark Shannon <mark@hotpy.org> wrote:
I don't like this approach. Adding compile time options means we need to test more versions, but is no help to end users as they will end up with the release version anyway.
I agree with Mark. This sounds like pointless complication and undue maintenance overhead.

I would like to propose the opposite approach. Simply remove those functions and the wchar_t cache now. They have been deprecated since 3.3. Yes, there's going to be a bit of pain for a couple of downstream projects (mostly Cython), but it should be minor anyway.

Regards

Antoine.
On Mon, Jun 15, 2020 at 4:25 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
I have a plan for more graduate removing of this feature. I created a PR which adds several compile options, so Python can be built in one of three modes:
1. Support wchar_t* cache and use it. It is the current mode.
2. Support wchar_t* cache, but do not use it internally in CPython. It can be used to test whether getting rid of the wchar_t* cache can have negative effects.
3. Do not support wchar_t* cache. It is binary incompatible build. Its purpose is to allow authors of third-party libraries to prepare to future breakage.
[snip]
I like your pull request, although I am not sure option 2 is really needed. With your compile-time option, we can remove wstr in early alpha stage (e.g. Python 3.11a1), and get it back again if it breaks too many packages.
The plan is:
1. Add support of the above compile options. Unfortunately I did not have time to do this before feature freeze in 3.9, but maybe make an exception?
2. Make option 2 default.
3. Remove option 1.
4. Enable compiler deprecations for all legacy C API. Currently they are silenced for the C API used internally.
5. Make legacy C API always failing.
6. Remove legacy C API from header files.
There is a long way to steps 5 and 6. I think 3.11 is too early.
Note that the compiler deprecation (4) is approved by Łukasz Langa, so Python 3.9 will have the compiler deprecation. https://github.com/python/cpython/pull/20878#issuecomment-644830032

On the other hand, PyArg_ParseTuple(AndKeywords) with the u/Z formats doesn't have any deprecation yet. I'm not sure we can backport the runtime DeprecationWarning to 3.9, because we need to fix Argument Clinic too. (Serhiy's pull request fixes Argument Clinic.)

Regards,
-- Inada Naoki <songofacandy@gmail.com>
Hi INADA-san,

IMO Python 3.11 is too early because we don't emit a DeprecationWarning on every single deprecated function.

1) Emit a DeprecationWarning at runtime (ex: Python 3.10)
2) Wait two Python releases: see https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421
3) Remove the deprecated feature (ex: Python 3.12)

I don't understand whether *all* deprecated functions are causing implementation issues, or only a few of them.

PyUnicode_AS_UNICODE() initializes PyASCIIObject.wstr if needed, and then returns PyASCIIObject.wstr. I don't think that PyASCIIObject.wstr can be called "a cache": there are functions relying on this member.

On the other hand, PyUnicode_FromUnicode(str, size) is basically a wrapper around PyUnicode_FromWideChar(): it does no harm to keep this wrapper to ease migration. Only PyUnicode_FromUnicode(NULL, size) is causing trouble, right?

Is there a list of deprecated functions, and is it possible to group them in two categories: "must be removed" and "can be kept for a few more releases"?

If the intent is to reduce the Python memory footprint, PyASCIIObject.wstr can be moved out of the PyASCIIObject structure; maybe we can imagine a WeakDict. It would map a Python str object to its wstr member (wchar_t* string). If the Python str object is removed, we can release the wstr string. The technical problem is that it is not possible to create a weak reference to a Python str. We may insert code in unicode_dealloc() to manually delete the wstr in this case. Maybe a _Py_hashtable_t from pycore_hashtable.h could be used for that.

Since this discussion has been ongoing for something like 5 years across multiple bugs.python.org issues and email threads, maybe it would help to have a short PEP describing the issues with the deprecated functions, explaining the plan to migrate to the new functions, and giving a schedule for the incompatible changes. INADA-san: would you be a candidate to write such a PEP?

Victor

On Fri, Jun 12, 2020 at 10:37, Inada Naoki <songofacandy@gmail.com> wrote:
[snip]
-- Night gathers, and now my watch begins. It shall not end until my death.
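The side-table idea in the message above (keep the wide-character copy outside the string object and drop it when the string is deallocated) can be sketched at the Python level. This is only an illustration under stated assumptions: `WstrSideTable`, its methods and `str_supports_weakref` are hypothetical names, UTF-32-LE bytes stand in for a 4-byte wchar_t buffer, and the explicit `release()` call plays the role that a hook in unicode_dealloc() would play in C.

```python
import weakref


def str_supports_weakref():
    """Check the limitation mentioned in the thread: str cannot be weakly referenced."""
    try:
        weakref.ref("abc")
    except TypeError:
        return False
    return True


class WstrSideTable:
    """Hypothetical side table mapping a str object's identity to a wide-char copy."""

    def __init__(self):
        self._buffers = {}

    def get(self, s):
        # Create the wide copy lazily, so only users of the legacy API pay for it.
        key = id(s)
        if key not in self._buffers:
            self._buffers[key] = s.encode("utf-32-le")  # assumes 4-byte wchar_t
        return self._buffers[key]

    def release(self, s):
        # Stand-in for the manual cleanup a dealloc hook would perform.
        self._buffers.pop(id(s), None)

    def __len__(self):
        return len(self._buffers)


if __name__ == "__main__":
    table = WstrSideTable()
    text = "hello"
    print(len(table.get(text)), "bytes cached;", len(table), "entry")
    table.release(text)
    print(len(table), "entries after release")
```

Keying by `id()` is only safe here because the entry is removed before the object can go away, which is the same constraint the dealloc hook would have to enforce.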
On Tue, Jun 16, 2020 at 12:35 AM Victor Stinner <vstinner@python.org> wrote:
Hi INADA-san,
IMO Python 3.11 is too early because we don't emit a DeprecationWarning on every single deprecation function.
1) Emit a DeprecationWarning at runtime (ex: Python 3.10)
2) Wait two Python releases: see https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421
3) Remove the deprecated feature (ex: Python 3.12)
Hmm, is there any chance to add the DeprecationWarning in Python 3.9?

* They have been deprecated in the documentation since Python 3.3 (2012).
* From grepping PyPI sdist sources, I feel two years may be enough to remove them.
* We can postpone the schedule anyway.
I don't understand if *all* deprecated functions are causing implementation issues, or only a few of them?
Of course. I meant only the APIs using PyASCIIObject.wstr. As far as I know:

* PyUnicode_AS_DATA
* PyUnicode_AS_UNICODE
* PyUnicode_AsUnicode
* PyUnicode_AsUnicodeAndSize
* PyUnicode_FromUnicode(NULL, size)
* PyUnicode_FromStringAndSize(NULL, size)
* PyUnicode_GetSize
* PyUnicode_GET_SIZE
* PyUnicode_GET_DATA_SIZE
* PyUnicode_WSTR_LENGTH
* PyArg_ParseTuple and PyArg_ParseTupleAndKeywords with format 'u' or 'Z'.
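A rough idea of how a source tree could be scanned for the names in the list above is sketched below. This is not the script used for the survey linked in the thread; `DEPRECATED_APIS`, `PATTERN` and `find_deprecated_calls` are illustrative names, and the scan is a plain textual match. It only looks for function and macro names, so the NULL-argument forms, PyUnicode_FromStringAndSize and the 'u'/'Z' format codes are not distinguished, and hits inside conditional blocks such as `#if PY2` are reported as well.

```python
import re

# Names taken from the list in the thread; the selection is an assumption of this sketch.
DEPRECATED_APIS = [
    "PyUnicode_AS_DATA",
    "PyUnicode_AS_UNICODE",
    "PyUnicode_AsUnicode",
    "PyUnicode_AsUnicodeAndSize",
    "PyUnicode_FromUnicode",
    "PyUnicode_GetSize",
    "PyUnicode_GET_SIZE",
    "PyUnicode_GET_DATA_SIZE",
    "PyUnicode_WSTR_LENGTH",
]

# Longest names first, with word boundaries so that prefixes do not shadow longer names.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(DEPRECATED_APIS, key=len, reverse=True))
    + r")\b"
)


def find_deprecated_calls(source):
    """Return (line_number, api_name) pairs for every textual match in C source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "PyObject *s = PyUnicode_FromUnicode(NULL, 0);\n"
        "Py_ssize_t n = PyUnicode_GET_LENGTH(s);\n"
    )
    for lineno, name in find_deprecated_calls(sample):
        print(f"line {lineno}: {name}")
```

Because the match is textual, results still need manual review, which is consistent with the false positives mentioned later in the thread.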
PyUnicode_AS_UNICODE() initializes PyASCIIObject.wstr if needed, and then return PyASCIIObject.wstr. I don't think that PyASCIIObject.wstr can be called "a cache": there are functions relying on this member.
OK, I will call it wstr, instead of wchar_t* cache.
On the other hand, PyUnicode_FromUnicode(str, size) is basically a wrapper to PyUnicode_FromWideChar(): it doesn't harm to keep this wrapper to ease migration. Only PyUnicode_FromUnicode(NULL, size) is causing troubles, right?
You're right.
Is there a list of deprecated functions and is it possible to group them in two categories: must be removed and "can be kept for a few more releases"?
If the intent is to reduce Python memory footprint, PyASCIIObject.wstr can be moved out of PyASCIIObject structure, maybe we can imagine a WeakDict. It would map a Python str object to its wstr member (wchar_* string). If the Python str object is removed, we can release the wstr string. The technical problem is that it is not possible to create a weak reference to a Python str. We may insert code in unicode_dealloc() to delete manually the wstr in this case. Maybe a _Py_hashtable_t of pycore_hashtable.h could be used for that.
It is an interesting idea, but I think it is too complex. Fixing all packages in the PyPI would be a better approach.
Since this discussion is on-going for something like 5 years in multiple bugs.python.org issues and email threads, maybe it would help to have a short PEP describing issues of the deprecated functions, explain the plan to migrate to the new functions, and give a schedule of the incompatible changes. INADA-san: would you be a candidate to write such PEP?
OK, I will try to write it. -- Inada Naoki <songofacandy@gmail.com>
On Tue, Jun 16, 2020 at 10:42, Inada Naoki <songofacandy@gmail.com> wrote:
Hmm, Is there any chance to add DeprecationWarning in Python 3.9?
In my experience, more and more projects are running their test suite with -Werror, which is a good thing. Introducing a new warning is likely to "break" many of these projects. For example, in Fedora, we run the test suite when we build a package. If a test fails, the package build fails and we have to decide to either ignore the failing tests (not good) or find a solution to repair the tests (update the code base to new C API functions).
It is an interesting idea, but I think it is too complex. Fixing all packages in the PyPI would be a better approach.
It's not the first time that we have to make such a decision. "Fixing all PyPI packages" is not possible. Python core developers are limited in number, so we can only port a very small number of packages. Breaking packages on purpose forces developers to upgrade their code base; it should work better than deprecation warnings. But it is likely to make some people unhappy.

Having a separate hash table would avoid breaking many PyPI packages by continuing to provide backward compatibility. We can even consider disabling it by default, but provide a temporary option to opt in to backward compatibility. For example, "python3.10 -X unicode_compat".

I proposed sys.set_python_compat_version(version) in the rejected PEP 606, but this PEP was too broad: https://www.python.org/dev/peps/pep-0606/

The question is whether it's worth paying the maintenance burden on the Python side, or whether to drop backward compatibility if it's "too expensive".

I understood that your first motivation is to reduce the PyASCIIObject structure size. Using a hash table, the overhead would only be paid by users of the deprecated functions. But it requires keeping the code and so continuing to maintain it. Maybe I missed some drawbacks.

Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
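The concern raised in this message about test suites that run with -Werror can be illustrated with a small Python sketch. `legacy_api` is a hypothetical stand-in for any function that starts emitting a DeprecationWarning; nothing here is taken from CPython's implementation of the deprecated unicode APIs.

```python
import warnings


def legacy_api():
    """Hypothetical deprecated function that still works but now warns."""
    warnings.warn("legacy_api() is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def call_with_warnings_as_errors():
    """Run legacy_api() the way a test suite started with -Werror would see it."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            legacy_api()
        except DeprecationWarning as exc:
            return f"failed: {exc}"
    return "passed"


if __name__ == "__main__":
    # With warnings recorded rather than raised, the call succeeds and the warning is kept.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(legacy_api(), "with", len(caught), "warning(s)")
    # With warnings promoted to errors, the same call fails.
    print(call_with_warnings_as_errors())
```

The same call succeeds under default settings and fails once warnings are promoted to errors, which is why adding a new warning late in a release cycle can break builds that were previously green.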
On Tue, Jun 16, 2020 at 9:30 PM Victor Stinner <vstinner@python.org> wrote:
On Tue, Jun 16, 2020 at 10:42, Inada Naoki <songofacandy@gmail.com> wrote:
Hmm, Is there any chance to add DeprecationWarning in Python 3.9?
In my experience, more and more projects are running their test suite with -Werror, which is a good thing. Introducing a new warning is likely to "break" many of these projects. For example, in Fedora, we run the test suite when we build a package. If a test fails, the package build fails and we have to decide to either ignore the failing tests (not good) or find a solution to repair the tests (update the code base to new C API functions).
But Python 3.9 is still in beta phase and we have enough time to get feedback. If the new warning is unacceptable breakage, we can remove it in RC phase.
It is an interesting idea, but I think it is too complex. Fixing all packages in the PyPI would be a better approach.
It's not the first time that we have to take such decision. "Fixing all PyPI packages" is not possible. Python core developers are limited are so we can only port a very low number of packages. Breaking packages on purpose force developers to upgrade their code base, it should work better than deprecation warnings. But it is likely to make some people unhappy.
OK, my terminology was wrong. Not all, but almost all actively maintained packages.

* This change doesn't affect pure Python packages.
* Most of the rest use Cython. Since I have already reported an issue to Cython, regenerating with a new Cython release fixes them.
* Most of the rest support PEP 393 already.

So I expect only a few percent of active packages will be affected.

This is a list of uses of deprecated APIs in the top 4000 packages, except PyArg_ParseTuple(AndKeywords). Files generated by Cython are excluded. But most of the entries are still false positives (e.g. in `#if PY2`).

https://github.com/methane/notes/blob/master/2020/wchar-cache/deprecated-use

I have filed some issues and sent some pull requests already since I created this thread.
Having a separate hash table would avoid breaking many PyPI packages by continuing to provide backward compatibility. We could even consider disabling it by default, but provide a temporary option to opt in for backward compatibility. For example, "python3.10 -X unicode_compat".
I proposed sys.set_python_compat_version(version) in the rejected PEP 606, but this PEP was too broad: https://www.python.org/dev/peps/pep-0606/
The question is if it's worth it to pay the maintenance burden on the Python side, or to drop backward compatibility if it's "too expensive".
I understood that your first motivation is to reduce the PyASCIIObject structure size. Using a hash table, the overhead would only be paid by users of the deprecated functions. But it requires keeping the code and so continuing to maintain it. Maybe I missed some drawbacks.
Memory usage is the most important motivation. But the runtime cost of PyUnicode_READY and the maintenance cost of legacy unicode matter too. I will reconsider your idea. But I still feel that helping many third parties is the most constructive way. Regards, -- Inada Naoki <songofacandy@gmail.com>
Inada Naoki wrote:
On Tue, Jun 16, 2020 at 9:30 PM Victor Stinner vstinner@python.org wrote:
On Tue, Jun 16, 2020 at 10:42, Inada Naoki songofacandy@gmail.com wrote:
Hmm, is there any chance to add a DeprecationWarning in Python 3.9?
In my experience, more and more projects are running their test suite with -Werror, which is a good thing. Introducing a new warning is likely to "break" many of these projects. For example, in Fedora, we run the test suite when we build a package. If a test fails, the package build fails and we have to decide to either ignore the failing tests (not good) or find a solution to repair the tests (update the code base to new C API functions).
But Python 3.9 is still in the beta phase and we have enough time to get feedback. If the new warning causes unacceptable breakage, we can remove it in the RC phase.
Sure, but it's also a bit disruptive to throw in new warnings in the middle of the beta cycle versus removing them. Typically we try to improve compatibility for people in betas, not lower it. Is it that important to get it done in 3.9 versus making the change in the master branch right now and just waiting 12 extra months? In the end, though, it's the release manager's decision. -Brett
On 16Jun2020 1641, Inada Naoki wrote:
* This change doesn't affect pure Python packages.
* Most of the rest use Cython. Since I have already reported an issue to Cython, regenerating with a new Cython release fixes them.
The precedent set in our last release with tp_print was that regenerating Cython releases was too much to ask. Unless we're going to overrule that immediately, we should leave everything there and give users/developers a full release cycle with updated Cython version to make new releases without causing any breakage. Cheers, Steve
On Wed, Jun 17, 2020 at 4:16 AM Steve Dower <steve.dower@python.org> wrote:
On 16Jun2020 1641, Inada Naoki wrote:
* This change doesn't affect pure Python packages.
* Most of the rest use Cython. Since I have already reported an issue to Cython, regenerating with a new Cython release fixes them.
The precedent set in our last release with tp_print was that regenerating Cython releases was too much to ask.
Unless we're going to overrule that immediately, we should leave everything there and give users/developers a full release cycle with updated Cython version to make new releases without causing any breakage.
We have one year for 3.10 and two years for 3.11. Additionally, unlike the case of tp_print, we don't need to wait until all of them are regenerated. Cython used the deprecated APIs in two cases:
* Cython used PyUnicode_FromUnicode(NULL, 0) to create an empty string. Many packages are affected, but we can keep this call working after we remove wstr. https://github.com/cython/cython/pull/3677
* Cython used PyUnicode_FromUnicode() in very minor cases. Only a few packages are affected. https://github.com/cython/cython/issues/3678
So we need to ask only a few projects to regenerate with Cython >= 0.29.21. Regards, -- Inada Naoki <songofacandy@gmail.com>
participants (9)
* Antoine Pitrou
* Brett Cannon
* Inada Naoki
* Kyle Stanley
* Mark Shannon
* MRAB
* Serhiy Storchaka
* Steve Dower
* Victor Stinner