semantics of subclassing things from itertools
Hi
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator", but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
I would like some clarification on that.
Cheers, fijal
On 10.09.15 10:23, Maciej Fijalkowski wrote:
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator", but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
I would like some clarification on that.
There is another reason why itertools iterators can't be implemented as simple generator functions. All iterators are pickleable in 3.x.
On Thu, Sep 10, 2015 at 10:26 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 10.09.15 10:23, Maciej Fijalkowski wrote:
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator", but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
I would like some clarification on that.
There is another reason why itertools iterators can't be implemented as simple generator functions. All iterators are pickleable in 3.x.
maybe the documentation should reflect that? (note that generators are pickleable on pypy anyway)
On 10.09.15 15:50, Maciej Fijalkowski wrote:
On Thu, Sep 10, 2015 at 10:26 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
There is another reason why itertools iterators can't be implemented as simple generator functions. All iterators are pickleable in 3.x.
maybe the documentation should reflect that? (note that generators are pickleable on pypy anyway)
This pickling is not compatible with CPython. So even if itertools classes were not subclassable, you would need to implement itertools iterators as classes.
On 9/10/2015 3:23 AM, Maciej Fijalkowski wrote:
Hi
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
I believe people are depending on an undocumented internal speed optimization. See below.
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator",
I believe Raymond has said that 'equivalent' should be taken as 'equivalent in essential function for iterating' rather than 'exactly equivalent for all operations'. The results of type() and isinstance() are not relevant for this.
The itertools doc begins with "This module implements a number of iterator building blocks ..." After listing them, it says "The following module *functions* all construct and return iterators." (These parts of the doc are unchanged from the original in 2.3. I added the emphasis.)
The generator functions are mathematically equivalent if they produce equivalent iterators for the same inputs. The iterators are equivalent if they produce the same stream of objects when iterated. If they do, the doc is correct; if not, the doc is buggy and should be fixed.
I see the undocumented fact that the module *functions* are implemented as C classes as an internal implementation detail to optimize speed. I believe Raymond intentionally used 'function' rather than 'class' and intended the equivalents to be usable by other implementations. Ask Raymond directly (he is not currently active on pydev) if I am correct.
but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
You could write equivalent iterator classes in Python, but the result would be significantly slower.
-- Terry Jan Reedy
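As a rough illustration of the point about writing equivalent iterator classes in Python, here is a minimal sketch. The names PyIslice and EvenOnly are hypothetical, and the class supports only a stop argument, so it is a simplification and not the itertools implementation.

```python
class PyIslice:
    """Pure-Python iterator class loosely mirroring islice(iterable, stop).

    Hypothetical and simplified: no start or step support.
    """

    def __init__(self, iterable, stop):
        self._it = iter(iterable)
        self._remaining = stop

    def __iter__(self):
        return self

    def __next__(self):
        # Stop once the requested number of items has been handed out.
        if self._remaining <= 0:
            raise StopIteration
        self._remaining -= 1
        return next(self._it)


class EvenOnly(PyIslice):
    """Subclass that extends __next__ to skip odd items."""

    def __next__(self):
        while True:
            item = super().__next__()
            if item % 2 == 0:
                return item


plain = list(PyIslice(range(10), 4))
evens = list(EvenOnly(range(10), 5))
print(plain, evens)
```

Because this is an ordinary class, it can be subclassed, which a generator function cannot offer. The cost is the per-item method call overhead mentioned above.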
On Sep 10, 2015, at 3:23 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator", but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
I would like some clarification on that.
The docs should say "roughly equivalent to" not "exactly equivalent to". The intended purpose of the examples in the itertools docs is to use pure python code to help people better understand each tool. It is not intended to dictate that tool x is a generator or is a function.
The intended semantics are that the itertools are classes (not functions and not generators). They are intended to be sub-classable (that is why they have Py_TPFLAGS_BASETYPE defined).
The description as a function was perhaps used too loosely (in much the same way that we tend to think of int(3.14) as being a function when int is really a class). I tend to think about mapping, filtering, accumulating, as being functions while at the same time knowing that they are actually classes that produce iterators.
The section called "itertools functions" is a misnomer but is also useful because the patterns of documenting functions better fit the itertools and because documenting them as classes suggests that they should each have a list of methods on that class (which doesn't make sense because the itertools are each one trick ponies with no aspirations to grow a pool of methods).
When I get a chance, I'll go through those docs and make them more precise. Sorry for the ambiguity.
Raymond
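A small sketch of what "the itertools are classes" looks like in practice on CPython. The subclass name ListableSlice and its as_list method are hypothetical, added only for illustration; the behaviour shown is what CPython does, not a documented guarantee.

```python
import itertools
import types

# On CPython, itertools.islice is a type, not a function.
islice_is_type = isinstance(itertools.islice, type)

# Hypothetical subclass: adds a helper method, inherits iteration unchanged.
class ListableSlice(itertools.islice):
    def as_list(self):
        return list(self)

first_three = ListableSlice(range(10), 3).as_list()

# By contrast, the generator type cannot be used as a base class.
try:
    class MyGenerator(types.GeneratorType):
        pass
    generator_subclassable = True
except TypeError:
    generator_subclassable = False

print(islice_is_type, first_three, generator_subclassable)
```

This is the mismatch behind the original question: the documented "equivalent" is a generator function, and neither a function nor the generator type can be subclassed.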
On Fri, Sep 11, 2015 at 1:48 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Sep 10, 2015, at 3:23 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I would like to know what are the semantics if you subclass something from itertools (e.g. islice).
Right now it's allowed and people do it, which is why the documentation is incorrect. It states "equivalent to: a function-or a generator", but you can't subclass whatever it is equivalent to, which is why in PyPy we're unable to make it work in pure python.
I would like some clarification on that.
The docs should say "roughly equivalent to" not "exactly equivalent to". The intended purpose of the examples in the itertools docs is to use pure python code to help people better understand each tool. It is not intended to dictate that tool x is a generator or is a function.
The intended semantics are that the itertools are classes (not functions and not generators). They are intended to be sub-classable (that is why they have Py_TPFLAGS_BASETYPE defined).
Ok, so what's completely missing from the documentation is what *are* the semantics of subclasses of those classes? Can you override any magic methods? Can you override next (which is or isn't a magic method depending how you look)? Etc. The documentation on this is completely missing and it's left guessing with "whatever cpython happens to be doing".
On Sep 13, 2015, at 3:49 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
The intended semantics are that the itertools are classes (not functions and not generators). They are intended to be sub-classable (that is why they have Py_TPFLAGS_BASETYPE defined).
Ok, so what's completely missing from the documentation is what *are* the semantics of subclasses of those classes? Can you override any magic methods? Can you override next (which is or isn't a magic method depending how you look)? Etc.
The documentation on this is completely missing and it's left guessing with "whatever cpython happens to be doing".
The reason it is underspecified is that this avenue of development was never explored (not thought about, planned, used, tested, or documented). IIRC, the entire decision process for having Py_TPFLAGS_BASETYPE boiled down to a single question: Was there any reason to close this door and make the itertools not subclassable?
For something like NoneType, there was a reason to be unsubclassable; otherwise, the default choice was to give users maximum flexibility (the itertools were intended to be a generic set of building blocks, forming what Guido termed an "iterator algebra").
As an implementor of another version of Python, you are reasonably asking the question, what is the specification for subclassing semantics? The answer is somewhat unsatisfying -- I don't know because I've never thought about it. As far as I can tell, this question has never come up in the 13 years of itertools existence and you may be the first person to have ever cared about this.
Raymond
On Sun, Sep 13, 2015 at 5:46 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Sep 13, 2015, at 3:49 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
The intended semantics are that the itertools are classes (not functions and not generators). They are intended to be sub-classable (that is why they have Py_TPFLAGS_BASETYPE defined).
Ok, so what's completely missing from the documentation is what *are* the semantics of subclasses of those classes? Can you override any magic methods? Can you override next (which is or isn't a magic method depending how you look)? Etc.
The documentation on this is completely missing and it's left guessing with "whatever cpython happens to be doing".
The reason it is underspecified is that this avenue of development was never explored (not thought about, planned, used, tested, or documented). IIRC, the entire decision process for having Py_TPFLAGS_BASETYPE boiled down to a single question: Was there any reason to close this door and make the itertools not subclassable?
For something like NoneType, there was a reason to be unsubclassable; otherwise, the default choice was to give users maximum flexibility (the itertools were intended to be a generic set of building blocks, forming what Guido termed an "iterator algebra").
As an implementor of another version of Python, you are reasonably asking the question, what is the specification for subclassing semantics? The answer is somewhat unsatisfying -- I don't know because I've never thought about it. As far as I can tell, this question has never come up in the 13 years of itertools existence and you may be the first person to have ever cared about this.
Raymond
Well, fair enough, but the semantics of "whatever happens to happen because we decided subclassing is a cool idea" is possibly the worst answer to those questions. Ideally, make it non-subclassable. If you want to have it subclassable, then please have defined semantics as opposed to undefined.
On Sep 13, 2015, at 3:09 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Well, fair enough, but the semantics of "whatever happens to happen because we decided subclassing is a cool idea" is possibly the worst answer to those questions.
It's hard to read this in any way that isn't insulting. It was subclassable because 1) it was a class, 2) type/class unification was pushing us in the direction of making builtin types more like regular classes (which are subclassable), and 3) because it seemed potentially useful to users (and apparently it has been because users are subclassing it). FWIW, the code was modeled on what was done for enumerate() and reversed() where I got a lot of coaching and review from Tim Peters, Alex Martelli, Fredrik Lundh, and other python luminaries of the day.
Ideally, make it non-subclassable. If you want to have it subclassable, then please have defined semantics as opposed to undefined.
No, I'm not going to change a 13 year-old API and break existing user code just because you've gotten worked-up about it.
FWIW, the semantics wouldn't even be defined in the itertools docs. It is properly in some section that describes what happens to any C type that sets the Py_TPFLAGS_BASETYPE flag. In general, all of the exposed dunder methods are overridable or extendable by subclassers.
Raymond
P.S. Threads like this are why I've developed an aversion to python-dev. I've answered your questions with respect and candor. I've been sympathetic to your unique needs as someone building an implementation of a language that doesn't have a spec. I was apologetic that the docs which have been helpful to users weren't precise enough for your needs.
In return, you've suggested that my first contributions to Python were irresponsible and based on doing whatever seemed cool.
In fact, the opposite is the case. I spent a full summer researching how similar tools were used in other languages and fitting them into Python in a way that supported known use cases. I raised the standard of the Python docs by including rough python equivalent code, showing sample inputs and outputs, building a quick navigation and summary section at the top of the docs, adding a recipes section, making thorough unittests, and getting input from Alex, Tim, and Fredrik (Guido also gave high level advice on the module design).
I'm not inclined to go on with this thread. Your questions have been answered to the extent that I remember the answers. If you have a doc patch you want to submit, please assign it to me on the tracker. I would be happy to review it.
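To make the remark about overridable dunder methods concrete, here is a minimal sketch of a subclass that extends __next__. The class name DoubledCount is hypothetical, and the result reflects what CPython happens to do; as the thread itself notes, these semantics are not specified anywhere.

```python
import itertools

# Hypothetical subclass used only for illustration.
class DoubledCount(itertools.count):
    def __next__(self):
        # Extend the inherited behaviour instead of replacing it.
        return 2 * super().__next__()

# islice drives the subclass through the normal iterator protocol,
# so the overridden __next__ is the one that gets called.
values = list(itertools.islice(DoubledCount(1), 4))
print(values)
```

Whether other implementations dispatch to the override in the same way is exactly the open question raised earlier in the thread.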
Hey Raymond
I'm sorry you got insulted, that was not my intention. I suppose something like "itertools objects are implemented as classes internally, which means they're subclassable like other builtin types" is an improvement to documentation.
On Mon, Sep 14, 2015 at 12:17 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Sep 13, 2015, at 3:09 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
Well, fair enough, but the semantics of "whatever happens to happen because we decided subclassing is a cool idea" is possibly the worst answer to those questions.
It's hard to read this in any way that isn't insulting.
It was subclassable because 1) it was a class, 2) type/class unification was pushing us in the direction of making builtin types more like regular classes (which are subclassable), and 3) because it seemed potentially useful to users (and apparently it has been because users are subclassing it).
FWIW, the code was modeled on what was done for enumerate() and reversed() where I got a lot of coaching and review from Tim Peters, Alex Martelli, Fredrik Lundh, and other python luminaries of the day.
Ideally, make it non-subclassable. If you want to have it subclassable, then please have defined semantics as opposed to undefined.
No, I'm not going to change a 13 year-old API and break existing user code just because you've gotten worked-up about it.
FWIW, the semantics wouldn't even be defined in the itertools docs. It is properly in some section that describes what happens to any C type that sets the Py_TPFLAGS_BASETYPE flag. In general, all of the exposed dunder methods are overridable or extendable by subclassers.
Raymond
P.S. Threads like this are why I've developed an aversion to python-dev. I've answered your questions with respect and candor. I've been sympathetic to your unique needs as someone building an implementation of a language that doesn't have a spec. I was apologetic that the docs which have been helpful to users weren't precise enough for your needs.
In return, you've suggested that my first contributions to Python were irresponsible and based on doing whatever seemed cool.
In fact, the opposite is the case. I spent a full summer researching how similar tools were used in other languages and fitting them into Python in a way that supported known use cases. I raised the standard of the Python docs by including rough python equivalent code, showing sample inputs and outputs, building a quick navigation and summary section at the top of the docs, adding a recipes section, making thorough unittests, and getting input from Alex, Tim, and Fredrik (Guido also gave high level advice on the module design).
I'm not inclined to go on with this thread. Your questions have been answered to the extent that I remember the answers. If you have a doc patch you want to submit, please assign it to me on the tracker. I would be happy to review it.
participants (4)
- Maciej Fijalkowski
- Raymond Hettinger
- Serhiy Storchaka
- Terry Reedy