Replacing PySequence_Check / PyMapping_Check

Hi all,
Tl;dr - As mentioned in issue https://bugs.python.org/issue5945,
PySequence_Check
and PyMapping_Check
should be deprecated and
discontinued. The reason for this is that they do not provide accurate
results and can lead to unexpected behavior. (PyMapping_Check for a list
will return True, and PySequence_Check for a mapping that doesn't subclass
dict will also return true)
This proposal is to have a process to deprecate those and introduce new functions instead.
Thanks to the latest changes in the C-API, we now have flags set for classes that are sequences or mapping - those were created for the pattern matching functionality.
This lets us have a path for clear and accurate determination of sequence or mapping.
The use cases for the change are as follows:
C-API code that wishes to use the PySequence/PyMapping API and to guarantee correct behavior needs to check it is indeed a valid sequence/mapping
Static types casting from ambiguous “PyObject” to “PySequence” & “PyMapping” types as provided by PyO3 ( https://pyo3.rs/v0.15.1/conversions/tables.html#argument-types)
The correct way to determine right now if an object is a sequence, is to
check its Py_TPFLAGS_SEQUENCE
flag and check if it’s a string, bytes or
bytearray as they are the only sequences not having TPFLAGS_SEQUENCE
per
specification. This is cumbersome for most non-expert developers not aware
of the nuances in the C-API. Having a C-API that provides this check out of
the box can ease it for most.
PyMapping_Check
and PySequence_Check
are still widely used and are
**part of the stable ABI** even though core developers recommend avoiding
those. This is very odd to stabilize and is an API that should be
deprecated.
Our suggested change is to add two new functions: PyMapping_CheckNew
&
PySequence_CheckNew
. They will be implemented using the flags check and
the str/bytes/bytearray check. Those functions will be 100% accurate and
follow the existing documentations.
We will do the following process to improve C-API:
Add these functions. 2.
Add a deprecation notice on PyMapping_Check
and PySequence_Check
and
remove all built-in usage of it. We will recommend the use of the new
functions.
3.
Drop the old functions. 4.
Alias the new functions to PyMapping_Check
and PySequence_Check
and
add a deprecation notice of PyMapping_CheckNew
and PySequence_CheckNew
.
5.
Rename PyMapping_CheckNew
and PySequence_CheckNew
to Mapping_Check
and PySequence_Check
.
The timeline will probably be long as it’s part of stable ABI, but in our opinion, the change is for the better.
Caveats:
Sequences and Mappings will match the typing.Sequence and typing.Mapping abcs. Duck typing is further “discouraged”. We do not think of it as an issue since Python already has idioms and static type checkers that discourage this sort of duck typing.
C extensions skipping multiple minor Python versions will use the new code without their knowledge, which is not backwards compatible. This is an issue with any removal of a stable ABI or language function, and we consider it acceptable.
Thanks to Bar Harel for helping me with the proposal.
This is was initially discussed in this bpo: https://bugs.python.org/issue46376
Best regards,
Aviram

On Sun, 23 Jan 2022 at 22:45, Aviram Hassan <aviramyhassan@gmail.com> wrote:
TL;DR counter proposal: just update PySequence_Check and PyMapping_Check to test Py_TPFLAGS_SEQUENCE and whatever the new recommendation is for mappings. No new functions, make the old functions do the right thing. Backwards code compatibility with correct behaviour going forward.
OK, I don't use PyMapping_Check in my C extensions but I do test for sequences. And I use the Python duck typing definition of "valid sequence" in that if PySequence_Check returns True, I can invoke sequence-y methods on it.
And I do not see a problem with dicts passing the test, because in Python they are iterable and can be indexed. It may not look very sensible to use a mapping as a sequence, but duck typing / dynamic typing to me means that if it works in Python, it should work in C.
OK I have no opinion on this. If you want to introduce new functions for this specific problem, go ahead.
That would be me - I wasn't aware that strings etc didn't pass the sequence test! The extensions I've written were not expecting such, so I never thought to see if they'd work.
However, again I don't see the need for a new function.
A non expert such as myself currently writes
if (PySequence_Check(foo)) { ...
Update PySequence_Check to test the magic flag instead, and I won't notice any difference. A bug in my code (str bytes etc not working) will have been fixed and I didn't have to do anything.
An expert currently writes something like
if (PySequenceCheck(foo) || fooType->tp_flags & Py_TPFLAGS_SEQUENCE)) {
(Code to retrieve fooType skipped, yeah, should use use Py_TYPE() etc)
And again all their current code continues to work without change. The flag test is now unnecessary, but I doubt it is going to be a massive performance hit. And if the writer really does care about performance, they can do versions-specific code.
[ munch ]
Wait, what? WHAT?
Please leave this out of your proposal.
--
cheers,
Hugh Fisher

On 23. 01. 22 12:44, Aviram Hassan wrote:
They should *not* be removed, or changed. The surprising behavior and naming is not reason enough to break existing, working code.
It's OK to add new ways to do things, and to discourage the old ones if they're obsolete. But there are existing extensions that use what was available, and those should continue to work.
It is stabilized because existing code depends on it.
"100% accurate" is a bold claim. Do you mean "accurate by definition", i.e. we should define "sequence" in terms of what PySequence_CheckNew returns?
The existing documentation for PyMapping_Check says:
Note that it returns 1 for Python classes with a __getitem__() method since in general case it is impossible to determine what type of keys it supports.
This doesn't sound like what the behavior you want. What other documentation are you referring to?
All reasonable up to here (except there might be better names than *New).
But for steps 4 and 5, I think the best timeline is "never".
Static type checkers are still an optional tool for people who want that sort of thing. Quite a popular tool, sure. But discouraging duck typing would be a very big change to the status quo. Perhaps the proposal should start with that, rather than put it in a caveats section?

Any Python code that uses isinstance(obj, collections.abc.Sequence) or a type annotation of obj: Sequence
already defines sequences as something that subclasses or registers to abc.Sequence. It will conform by 100% to those semantics.
This is a much more philosophical question. We're not actually discouraging duck typing.
It might sound like a "big change", but de-facto, any code that currently requests a Sequence in its type annotation will not lint correctly with classes not inheriting from Sequence. Any code that checks isinstance(obj, Sequence) will return False even if the class perfectly matches the semantics.
Remember, abc.Sequence is not a protocol, it is an ABC. Look at the current situation, try passing a "sequence-like object" to random.sample()
and you'll fail on the isinstance check. Pass a mapping-like to os.environ and you'll fail as well. These fail even if you don't use the "optional" type hint, and if you do use, half of the standard library will not work with duck typing if you don't actually register to the relevant ABC. You can't really say duck typing is encouraged if the standard library no longer fully works with it.
Either way, the discussion about duck typing is a little irrelevant, all we're doing is basically giving a faster alternative for isinstance(obj, Sequence) and isinstance(obj, Mapping) which are frequently used. What we meant by talking about duck typing is that the current PyMapping_Check calls everything that has __getitem__ a Mapping, while we call only things that are isinstance(obj, Mapping) a mapping, which matches both isinstance in Python and type annotations. If someone wishes to support duck typing he can choose not to use the function. Again, this is 100% accurate to the current Python semantics.

On Mon, 24 Jan 2022 at 23:32, Bar Harel <bzvi7919@gmail.com> wrote:
Either way, the discussion about duck typing is a little irrelevant, all we're doing is basically giving a faster alternative for isinstance(obj, Sequence) and isinstance(obj, Mapping) which are frequently used. What we meant by talking about duck typing is that the current PyMapping_Check calls everything that has __getitem__ a Mapping, while we call only things that are isinstance(obj, Mapping) a mapping, which matches both isinstance in Python and type annotations. If someone wishes to support duck typing he can choose not to use the function. Again, this is 100% accurate to the current Python semantics.
There's a way to define new helper functions without harming support
for ducktyping any more than ABCs do in general: define the new API
functions as semantically equivalent to PyObject_IsInstance(obj, ref_to_collections.abc.Sequence)
and PyObject_IsInstance(obj, ref_to_collections.abc.Mapping)
(i.e the C API equivalent of the ABC
checking idiom used in Python code), rather than defining them in term
of the internal type object flags used to optimise the "True" cases of
those checks. That way anything explicitly registered with the ABCs
(or otherwise set up to pass their instance checks) will still pass
the C API type check, objects that directly inherit from types with
the internal flag set will merely pass the check a little bit faster.
If distinct PyMapping_CheckInstance and PySequence_CheckInstance functions were exposed in the C API, then they could take advantage of all the tricks used to optimise collections ABC access and checks from the interpreter runtime
Given the new helper functions, the PyMapping_Check documentation
could be updated to say something like "This function does NOT provide
the C equivalent of isinstance(obj, collections.abc.Mapping)
. Use
PyMapping_CheckInstance
to efficiently check if an object implements
the Mapping ABC", and a similar change could be made to the
PySequence_Check documentation.
One thing that definitely *shouldn't* happen is deprecating or changing PySequence_Check and PyMapping_Check. Those C API functions still do exactly what they were designed to do, it's just that what they were designed to do relates to querying details regarding function pointers stored in C structs, rather than asking whether an object implements the interface of a particular Python ABC.
Cheers, Nick.
P.S. Without accounting for error checking, refcounting, or optimisation, the gist of "correctly" checking if an object is a sequence from C code is currently:
collections_abc = PyImport_ImportModule("collections.abc");
sequence_abc = PyObject_GetAttrString(collections_abc, "Sequence");
is_sequence = PyObject_IsInstance(obj_to_check, sequence_abc);
That's already overly verbose, and adding error handling, refcount adjustments and caching for speed improvements makes it worse.
Third party code should definitely NOT be checking the contents of the type flags directly.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I think you have a wrong understanding of what those APIs are meant for. They don't check for an object *being* of a certain (sub)type, but instead check whether they *expose* certain C APIs.
PySequence_Check() checks for presence of the tp_as_sequence->sq_item slot which is prerequisite for many other PySequence_*() APIs.
PyMapping_Check() checks for the tp_as_mapping->mp_subscript slot, which is which is prerequisite for many other PyMapping_*() APIs.
Nothing more, nothing less :-)
They do not implement ABC checks, type flag checks, subtype checks, etc.
If you want the latter, you should suggest new APIs which then cater to those use cases, in case other APIs don't already provide this logic. PyObject_IsInstance() will go a long way for many of those, BTW.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jan 24 2022)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/

On Sun, 23 Jan 2022 at 22:45, Aviram Hassan <aviramyhassan@gmail.com> wrote:
TL;DR counter proposal: just update PySequence_Check and PyMapping_Check to test Py_TPFLAGS_SEQUENCE and whatever the new recommendation is for mappings. No new functions, make the old functions do the right thing. Backwards code compatibility with correct behaviour going forward.
OK, I don't use PyMapping_Check in my C extensions but I do test for sequences. And I use the Python duck typing definition of "valid sequence" in that if PySequence_Check returns True, I can invoke sequence-y methods on it.
And I do not see a problem with dicts passing the test, because in Python they are iterable and can be indexed. It may not look very sensible to use a mapping as a sequence, but duck typing / dynamic typing to me means that if it works in Python, it should work in C.
OK I have no opinion on this. If you want to introduce new functions for this specific problem, go ahead.
That would be me - I wasn't aware that strings etc didn't pass the sequence test! The extensions I've written were not expecting such, so I never thought to see if they'd work.
However, again I don't see the need for a new function.
A non expert such as myself currently writes
if (PySequence_Check(foo)) { ...
Update PySequence_Check to test the magic flag instead, and I won't notice any difference. A bug in my code (str bytes etc not working) will have been fixed and I didn't have to do anything.
An expert currently writes something like
if (PySequenceCheck(foo) || fooType->tp_flags & Py_TPFLAGS_SEQUENCE)) {
(Code to retrieve fooType skipped, yeah, should use use Py_TYPE() etc)
And again all their current code continues to work without change. The flag test is now unnecessary, but I doubt it is going to be a massive performance hit. And if the writer really does care about performance, they can do versions-specific code.
[ munch ]
Wait, what? WHAT?
Please leave this out of your proposal.
--
cheers,
Hugh Fisher

On 23. 01. 22 12:44, Aviram Hassan wrote:
They should *not* be removed, or changed. The surprising behavior and naming is not reason enough to break existing, working code.
It's OK to add new ways to do things, and to discourage the old ones if they're obsolete. But there are existing extensions that use what was available, and those should continue to work.
It is stabilized because existing code depends on it.
"100% accurate" is a bold claim. Do you mean "accurate by definition", i.e. we should define "sequence" in terms of what PySequence_CheckNew returns?
The existing documentation for PyMapping_Check says:
Note that it returns 1 for Python classes with a __getitem__() method since in general case it is impossible to determine what type of keys it supports.
This doesn't sound like what the behavior you want. What other documentation are you referring to?
All reasonable up to here (except there might be better names than *New).
But for steps 4 and 5, I think the best timeline is "never".
Static type checkers are still an optional tool for people who want that sort of thing. Quite a popular tool, sure. But discouraging duck typing would be a very big change to the status quo. Perhaps the proposal should start with that, rather than put it in a caveats section?

Any Python code that uses isinstance(obj, collections.abc.Sequence) or a type annotation of obj: Sequence
already defines sequences as something that subclasses or registers to abc.Sequence. It will conform by 100% to those semantics.
This is a much more philosophical question. We're not actually discouraging duck typing.
It might sound like a "big change", but de-facto, any code that currently requests a Sequence in its type annotation will not lint correctly with classes not inheriting from Sequence. Any code that checks isinstance(obj, Sequence) will return False even if the class perfectly matches the semantics.
Remember, abc.Sequence is not a protocol, it is an ABC. Look at the current situation, try passing a "sequence-like object" to random.sample()
and you'll fail on the isinstance check. Pass a mapping-like to os.environ and you'll fail as well. These fail even if you don't use the "optional" type hint, and if you do use, half of the standard library will not work with duck typing if you don't actually register to the relevant ABC. You can't really say duck typing is encouraged if the standard library no longer fully works with it.
Either way, the discussion about duck typing is a little irrelevant, all we're doing is basically giving a faster alternative for isinstance(obj, Sequence) and isinstance(obj, Mapping) which are frequently used. What we meant by talking about duck typing is that the current PyMapping_Check calls everything that has __getitem__ a Mapping, while we call only things that are isinstance(obj, Mapping) a mapping, which matches both isinstance in Python and type annotations. If someone wishes to support duck typing he can choose not to use the function. Again, this is 100% accurate to the current Python semantics.

On Mon, 24 Jan 2022 at 23:32, Bar Harel <bzvi7919@gmail.com> wrote:
Either way, the discussion about duck typing is a little irrelevant, all we're doing is basically giving a faster alternative for isinstance(obj, Sequence) and isinstance(obj, Mapping) which are frequently used. What we meant by talking about duck typing is that the current PyMapping_Check calls everything that has __getitem__ a Mapping, while we call only things that are isinstance(obj, Mapping) a mapping, which matches both isinstance in Python and type annotations. If someone wishes to support duck typing he can choose not to use the function. Again, this is 100% accurate to the current Python semantics.
There's a way to define new helper functions without harming support
for ducktyping any more than ABCs do in general: define the new API
functions as semantically equivalent to PyObject_IsInstance(obj, ref_to_collections.abc.Sequence)
and PyObject_IsInstance(obj, ref_to_collections.abc.Mapping)
(i.e the C API equivalent of the ABC
checking idiom used in Python code), rather than defining them in term
of the internal type object flags used to optimise the "True" cases of
those checks. That way anything explicitly registered with the ABCs
(or otherwise set up to pass their instance checks) will still pass
the C API type check, objects that directly inherit from types with
the internal flag set will merely pass the check a little bit faster.
If distinct PyMapping_CheckInstance and PySequence_CheckInstance functions were exposed in the C API, then they could take advantage of all the tricks used to optimise collections ABC access and checks from the interpreter runtime
Given the new helper functions, the PyMapping_Check documentation
could be updated to say something like "This function does NOT provide
the C equivalent of isinstance(obj, collections.abc.Mapping)
. Use
PyMapping_CheckInstance
to efficiently check if an object implements
the Mapping ABC", and a similar change could be made to the
PySequence_Check documentation.
One thing that definitely *shouldn't* happen is deprecating or changing PySequence_Check and PyMapping_Check. Those C API functions still do exactly what they were designed to do, it's just that what they were designed to do relates to querying details regarding function pointers stored in C structs, rather than asking whether an object implements the interface of a particular Python ABC.
Cheers, Nick.
P.S. Without accounting for error checking, refcounting, or optimisation, the gist of "correctly" checking if an object is a sequence from C code is currently:
collections_abc = PyImport_ImportModule("collections.abc");
sequence_abc = PyObject_GetAttrString(collections_abc, "Sequence");
is_sequence = PyObject_IsInstance(obj_to_check, sequence_abc);
That's already overly verbose, and adding error handling, refcount adjustments and caching for speed improvements makes it worse.
Third party code should definitely NOT be checking the contents of the type flags directly.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I think you have a wrong understanding of what those APIs are meant for. They don't check for an object *being* of a certain (sub)type, but instead check whether they *expose* certain C APIs.
PySequence_Check() checks for presence of the tp_as_sequence->sq_item slot which is prerequisite for many other PySequence_*() APIs.
PyMapping_Check() checks for the tp_as_mapping->mp_subscript slot, which is which is prerequisite for many other PyMapping_*() APIs.
Nothing more, nothing less :-)
They do not implement ABC checks, type flag checks, subtype checks, etc.
If you want the latter, you should suggest new APIs which then cater to those use cases, in case other APIs don't already provide this logic. PyObject_IsInstance() will go a long way for many of those, BTW.
-- Marc-Andre Lemburg eGenix.com
Professional Python Services directly from the Experts (#1, Jan 24 2022)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
participants (6)
-
Aviram Hassan
-
Bar Harel
-
Hugh Fisher
-
Marc-Andre Lemburg
-
Nick Coghlan
-
Petr Viktorin