PySequence_Check but no __len__
Hi friends,

there is a case in the Python API where I am not sure what to do:

If an object defines __getitem__() only but no __len__(), then PySequence_Check() already is true and does not care.

So if I define no __len__, it simply fails. Is this intended?

I was misled and thought this was the unlimited case, but it seems still to be true that sequences are always finite.

Can someone please enlighten me?

--
Christian Tismer-Sperling    :^)   tismer@stackless.com
Software Consulting          :     http://www.stackless.com/
Karl-Liebknecht-Str. 121     :     http://pyside.org
14482 Potsdam                :     GPG key -> 0xE7301150FB7BEE0E
phone +49 173 24 18 776  fax +49 (30) 700143-0023
Sorry, I don't quite follow.
On Thu, 21 Jun 2018 at 08:50 Christian Tismer wrote:
> Hi friends,
>
> there is a case in the Python API where I am not sure what to do:
>
> If an object defines __getitem__() only but no __len__(), then PySequence_Check() already is true and does not care.
Which matches https://docs.python.org/3/c-api/sequence.html#c.PySequence_Check .
From Objects/abstract.c:
    int
    PySequence_Check(PyObject *s)
    {
        if (PyDict_Check(s))
            return 0;
        return s != NULL && s->ob_type->tp_as_sequence &&
            s->ob_type->tp_as_sequence->sq_item != NULL;
    }
> So if I define no __len__, it simply fails. Is this intended?

What is "it" in this case that is failing? It isn't PySequence_Check() so I'm not sure what the issue is.

-Brett
> I was misled and thought this was the unlimited case, but it seems still to be true that sequences are always finite.
>
> Can someone please enlighten me?
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org
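To make the exchange above concrete, here is a minimal sketch of what appears to pass and what appears to fail. It assumes CPython, where ctypes.pythonapi can reach the C-API function; the class OnlyGetItem is a hypothetical example and not something from the original post.

```python
import ctypes

# Bind the C-API function of the running CPython interpreter (assumption:
# CPython, since ctypes.pythonapi is specific to it).
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int


class OnlyGetItem:
    """Hypothetical class: subscriptable, but defines no __len__."""

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 10
        raise IndexError(index)


obj = OnlyGetItem()

# Defining __getitem__ fills the sq_item slot, so the check passes.
seq_check = PySequence_Check(obj)

# dict is excluded explicitly by the PyDict_Check() branch.
dict_check = PySequence_Check({})

# len() is the call that fails, because no length slot is filled.
try:
    len(obj)
    len_error = None
except TypeError as exc:
    len_error = exc
```

So under these assumptions the check itself succeeds, and the failure only shows up once something asks for the length.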
Hi Brett,

because you did not understand me, I must have had a fundamental misunderstanding. So I started a self-analysis and came to the conclusion that this has been my error for maybe a decade:

When iterators and generators came into existence, I somehow fell into the trap of thinking that there are now sequences with undetermined or infinite length. They would be exactly those sequences which have no __len__ attribute.

I understand now that sequences are always of fixed length and adjusted myself.

-----------------------------------------

My problem is to find out how to deal with a class which has __getitem__ but no __len__.

The documentation suggests that the length of a sequence can always be obtained by len().
https://docs.python.org/3/reference/datamodel.html

But the existence of __len__ is not guaranteed or enforced. And if you look at the definition of PySequence_Fast(), you find that a sequence can be turned into a list with iteration only and no __len__.

So, is a sequence valid without __len__, if iteration is supported instead?

There is the whole chapter about the sequence protocol
https://docs.python.org/3/c-api/sequence.html?highlight=sequence
but I cannot find an exact definition of what makes up a sequence.

Sorry if I'm again the only one who misunderstands the obvious :)

Best -- Chris
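The point about turning such an object into a list by iteration alone can be sketched in pure Python. This is only an illustration of the index-based iteration fallback that list() uses, not the code of PySequence_Fast() itself; the class Countdown is a hypothetical example.

```python
class Countdown:
    """Hypothetical class with __getitem__ but no __len__ and no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # IndexError is what ends the index-based iteration below.
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


c = Countdown(3)

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
as_list = list(c)

# The length is only known after the elements have been collected.
collected_length = len(as_list)

# Asking the object itself for its length still fails.
try:
    len(c)
    has_len = True
except TypeError:
    has_len = False
```

The object is finite here only because __getitem__ eventually raises IndexError; nothing in the protocol requires it to say so up front.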
Answering myself:

PySequence_Check determines a sequence. See the docs.

len() can but does not have to exist. The size is always limited.

After getting rid of my initial misconception, this is now obvious. Sorry about the noise.
On 22 June 2018 at 21:45, Christian Tismer wrote:
> Answering myself:
>
> PySequence_Check determines a sequence. See the docs.
>
> len() can but does not have to exist. The size is always limited.
Just to throw a couple of extra wrinkles on this: Due to a C API implementation detail in CPython, not only can len() throw TypeError for non-finite sequences (which implement other parts of the sequence API, but not that), but sufficiently large finite sequences may also throw OverflowError:
    >>> import sys
    >>> data = range(-2**64, 2**64)
    >>> format((data.stop - data.start) // data.step, "e")
    '3.689349e+19'
    >>> format(sys.maxsize, "e")
    '9.223372e+18'
    >>> len(data)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t
    >>> data.__len__()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t
Infinite sequences that want to prevent infinite loops or unbounded memory consumption in consumers may also choose to implement a __length_hint__ that throws TypeError (see https://bugs.python.org/issue33939 for a proposal to do that in itertools).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
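The OverflowError case is not specific to range. As a rough sketch, assuming CPython, any class whose __len__ reports more than sys.maxsize items should hit the same limit through the len() builtin; the class Huge is a hypothetical example.

```python
import sys


class Huge:
    """Hypothetical container claiming more items than Py_ssize_t can hold."""

    def __len__(self):
        return sys.maxsize + 1


h = Huge()

# Calling the method directly is ordinary Python and returns the big int.
direct = h.__len__()

# The len() builtin goes through the C length slot and cannot represent it.
try:
    len(h)
    overflowed = False
except OverflowError:
    overflowed = True

# For the range shown above (step 1), the size can still be computed
# from its attributes without calling len().
data = range(-2**64, 2**64)
size = (data.stop - data.start) // data.step
```

The arithmetic on start, stop and step is only a workaround for this particular range; it is not a general replacement for len().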
On 6/22/2018 7:17 AM, Christian Tismer wrote:
> My problem is to find out how to deal with a class which has __getitem__ but no __len__.
>
> The documentation suggests that the length of a sequence can always be obtained by len().
> https://docs.python.org/3/reference/datamodel.html

It says that plainly: "The built-in function len() returns the number of items of a sequence."

https://docs.python.org/3/library/collections.abc.html#collections-abstract-...
says that a Sequence has both __getitem__ and __len__.

I am surprised that a C-API function calls something a 'sequence' without it having __len__.

--
Terry Jan Reedy
On 22.06.2018 22:07, Terry Reedy wrote:
> On 6/22/2018 7:17 AM, Christian Tismer wrote:
>> My problem is to find out how to deal with a class which has __getitem__ but no __len__.
>>
>> The documentation suggests that the length of a sequence can always be obtained by len().
>> https://docs.python.org/3/reference/datamodel.html
>
> It says that plainly: "The built-in function len() returns the number of items of a sequence."
>
> https://docs.python.org/3/library/collections.abc.html#collections-abstract-...
> says that a Sequence has both __getitem__ and __len__.
>
> I am surprised that a C-API function calls something a 'sequence' without it having __len__.

A practical sequence check is checking for __iter__. An iterator doesn't necessarily have a defined length -- e.g. a stream or a generator.

--
Regards,
Ivan
On 22.06.2018 22:17, Ivan Pozdeev wrote:
> On 22.06.2018 22:07, Terry Reedy wrote:
>> On 6/22/2018 7:17 AM, Christian Tismer wrote:
>>> My problem is to find out how to deal with a class which has __getitem__ but no __len__.
>>>
>>> The documentation suggests that the length of a sequence can always be obtained by len().
>>> https://docs.python.org/3/reference/datamodel.html
>>
>> It says that plainly: "The built-in function len() returns the number of items of a sequence."
>>
>> https://docs.python.org/3/library/collections.abc.html#collections-abstract-...
>> says that a Sequence has both __getitem__ and __len__.
>>
>> I am surprised that a C-API function calls something a 'sequence' without it having __len__.
>
> A practical sequence check is checking for __iter__. An iterator doesn't necessarily have a defined length -- e.g. a stream or a generator.

Now, I know this isn't what https://docs.python.org/3/glossary.html#term-sequence says. But practically, the documentation seems to use "sequence" in the sense "finite iterable". Functions that need to know the length of input in advance seem to be the minority.

--
Regards,
Ivan
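As a small illustration of the "finite iterable" reading, several builtins consume their input by iteration and never ask for its length. The generator function squares is a hypothetical example, chosen only because generators have no __len__.

```python
def squares(limit):
    """Hypothetical generator: a finite iterable with no __len__."""
    for n in range(limit):
        yield n * n


# None of these calls needs to know the length in advance.
total = sum(squares(4))
largest = max(squares(4))
ordered = sorted(squares(4), reverse=True)

# Asking for the length up front is what does not work.
try:
    len(squares(4))
    generator_has_len = True
except TypeError:
    generator_has_len = False
```

Each call gets a fresh generator, because a generator can be consumed only once.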
Ivan Pozdeev via Python-Dev wrote:
> the documentation seems to use "sequence" in the sense "finite iterable". Functions that need to know the length of input in advance seem to be the minority.

The official classifications we have are:

    Sequence: __iter__, __getitem__, __len__
    Iterable: __iter__

There isn't any official term for a sequential thing that has __iter__ and __getitem__ but not __len__. That's probably because the need for such a thing doesn't seem to come up very much. One usually processes a potentially infinite sequence by iterating over it, not picking things out at arbitrary positions. And usually its items are generated by an algorithm that works sequentially, so random access would be difficult to implement.

--
Greg
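How these classifications look from the collections.abc side can be sketched as follows. The classes OnlyGetItem and Full are hypothetical examples; the sketch only shows what the ABCs report, not how the C API decides.

```python
from collections.abc import Iterable, Sequence


class OnlyGetItem:
    """Hypothetical class: __getitem__ only, no __iter__ and no __len__."""

    def __getitem__(self, index):
        if 0 <= index < 2:
            return index
        raise IndexError(index)


class Full(Sequence):
    """Hypothetical Sequence subclass: supplies __getitem__ and __len__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


obj = OnlyGetItem()

# The Iterable ABC looks for __iter__, so it does not recognise this class...
is_iterable_abc = isinstance(obj, Iterable)
# ...and Sequence needs inheritance or registration.
is_sequence_abc = isinstance(obj, Sequence)
# Iteration still works through the __getitem__ fallback.
actually_iterates = list(obj)

# With both methods defined, the ABC fills in __iter__, __contains__, index.
full = Full("ab")
mixins = (list(full), "b" in full, full.index("b"))
```

So the in-between case has no ABC of its own, even though it iterates fine in practice.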
On 22 June 2018 at 20:17, Ivan Pozdeev via Python-Dev wrote:
> On 22.06.2018 22:07, Terry Reedy wrote:
>> https://docs.python.org/3/library/collections.abc.html#collections-abstract-...
>> says that a Sequence has both __getitem__ and __len__.
>>
>> I am surprised that a C-API function calls something a 'sequence' without it having __len__.
>
> A practical sequence check is checking for __iter__. An iterator doesn't necessarily have a defined length -- e.g. a stream or a generator.

There's a difference between the ABC "Sequence" and the informally named sequence concept used in the C API. It's basically just that the C API term predates the ABC significantly, and there's no way that we'd change the C API naming because it would break too much code, but IMO it's just one of those "historical reasons" type of things that can't really be adequately explained, but just needs to be accepted...

An ABC Sequence has __getitem__ and __len__. In terms of ABCs, something with __iter__ is an Iterable. Informal terminology is a different matter...

Paul
Terry Reedy wrote:
> I am surprised that a C-API function calls something a 'sequence' without it having __len__.

It's a bit strange that PySequence_Check exists at all. The principle of duck typing would suggest that one should be checking for the specific methods one needs.

I suspect it's a holdover from very early Python, where the notion of a "sequence type" and a "mapping type" were more of a concrete thing. This is reflected in the existence of the tp_as_sequence and tp_as_mapping substructures. It was expected that a given type would either implement all the methods in one of those substructures or none of them, so shortcuts such as checking for just one method and assuming the others would exist made sense.

But user-defined classes messed all that up, because it became possible to create a type that has __getitem__ but not __len__, etc. It also made it impossible to distinguish reliably between a sequence and a mapping.

So it seems to me that PySequence_Check and related functions are not very useful any more, since it's not possible for them to really do what they claim to do.

--
Greg
On 6/22/2018 7:57 PM, Greg Ewing wrote:
> Terry Reedy wrote:
>> I am surprised that a C-API function calls something a 'sequence' without it having __len__.
>
> It's a bit strange that PySequence_Check exists at all. The principle of duck typing would suggest that one should be checking for the specific methods one needs.
>
> I suspect it's a holdover from very early Python, where the notion of a "sequence type" and a "mapping type" were more of a concrete thing. This is reflected in the existence of the tp_as_sequence and tp_as_mapping substructures. It was expected that a given type would either implement all the methods in one of those substructures or none of them, so shortcuts such as checking for just one method and assuming the others would exist made sense.
>
> But user-defined classes messed all that up, because it became possible to create a type that has __getitem__ but not __len__, etc. It also made it impossible to distinguish reliably between a sequence and a mapping.
>
> So it seems to me that PySequence_Check and related functions are not very useful any more, since it's not possible for them to really do what they claim to do.

So one should not take them as defining what they appear to define. In a sense, 'PySequence_Check' should be 'PySubscriptable_Check'.

--
Terry Jan Reedy
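The duck-typing suggestion (check for the specific methods one needs) might look roughly like this in Python. The helper supports() and the classes SeqLike and MapLike are hypothetical names made up for this sketch; it also shows why the method set alone cannot separate a sequence from a mapping.

```python
def supports(obj, *method_names):
    """Return True if type(obj) defines every named special method."""
    cls = type(obj)
    return all(hasattr(cls, name) for name in method_names)


class SeqLike:
    """Hypothetical class indexed by position."""

    def __getitem__(self, index):
        return ("a", "b")[index]

    def __len__(self):
        return 2


class MapLike:
    """Hypothetical class indexed by key."""

    def __getitem__(self, key):
        return {"x": 1}[key]

    def __len__(self):
        return 1


# Both classes offer exactly the same methods, so a method-based check
# cannot tell which one is the sequence and which one is the mapping.
seq_ok = supports(SeqLike(), "__getitem__", "__len__")
map_ok = supports(MapLike(), "__getitem__", "__len__")

# A plain object offers neither, so a caller needing len() can find out early.
plain_ok = supports(object(), "__len__")
```

Looking the names up on the type rather than the instance mirrors how special methods are found, but that is a design choice of this sketch and not something the thread prescribes.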
participants (7)
- Brett Cannon
- Christian Tismer
- Greg Ewing
- Ivan Pozdeev
- Nick Coghlan
- Paul Moore
- Terry Reedy