Re: [Python-Dev] cpython (3.2): don't mention implementation detail
On Mon, 19 Dec 2011 22:42:43 +0100, benjamin.peterson wrote:
http://hg.python.org/cpython/rev/d85efd73b0e1
changeset: 74088:d85efd73b0e1
branch: 3.2
parent: 74082:71e5a083f9b1
user: Benjamin Peterson
date: Mon Dec 19 16:41:11 2011 -0500
summary: don't mention implementation detail
files: Doc/library/operator.rst | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Doc/library/operator.rst b/Doc/library/operator.rst
--- a/Doc/library/operator.rst
+++ b/Doc/library/operator.rst
@@ -12,11 +12,11 @@ from operator import itemgetter, iadd
-The :mod:`operator` module exports a set of functions implemented in C
-corresponding to the intrinsic operators of Python. For example,
-``operator.add(x, y)`` is equivalent to the expression ``x+y``. The function
-names are those used for special class methods; variants without leading and
-trailing ``__`` are also provided for convenience.
I disagree with this change. Knowing that they are written in C is important when deciding to pass them to e.g. sort() or sorted(), because you know it will be faster than an arbitrary pure Python function.

You could tag it as a "CPython implementation detail" if you want, or talk about performance rather than mention "C".

Regards

Antoine.
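For readers unfamiliar with the functions under discussion, here is a minimal sketch of the equivalence described in the removed paragraph and of the sort-key use mentioned above. The sample data is invented for illustration, and the snippet makes no claim about speed.

```python
import operator
from operator import itemgetter

# operator.add(x, y) behaves like the expression x + y.
sum_via_operator = operator.add(2, 3)
sum_via_expression = 2 + 3

# The double-underscore spelling names the same operation.
sum_via_dunder = operator.__add__(2, 3)

# Hypothetical sample data, invented for this illustration.
pairs = [("pear", 3), ("apple", 1), ("fig", 2)]

# Two ways to sort by the second element of each pair.
by_count_operator = sorted(pairs, key=itemgetter(1))
by_count_lambda = sorted(pairs, key=lambda pair: pair[1])
```

Both key functions produce the same ordering; which one runs faster depends on the Python implementation in use.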
On Tue, Dec 20, 2011 at 10:51 AM, Antoine Pitrou wrote:
I disagree with this change. Knowing that they are written in C is important when deciding to pass them to e.g. sort() or sorted(), because you know it will be faster than an arbitrary pure Python function.
You could tag it as a "CPython implementation detail" if you want, or talk about performance rather than mention "C".
Regards
Antoine.
If this documentation is to be used by other python implementations, then mentions of performance are outright harmful, since the performance characteristics differ quite drastically. Written in C is also not a part of specification as far as I know :)

Cheers,
fijal
On Tuesday, 20 December 2011 at 12:01 +0200, Maciej Fijalkowski wrote:
If this documentation is to be used by other python implementations, then mentions of performance are outright harmful, since the performance characteristics differ quite drastically. Written in C is also not a part of specification as far as I know :)
But that's basically the only reason to invoke the `operator.attrgetter("foo")` ugliness, instead of writing the explicit and obvious `lambda x: x.foo`. So not mentioning that it provides a speed benefit on CPython hides the primary reason for using the operator module. Otherwise it's just a bunch of useless wrappers.

---------

More generally, not talking about performance at all is more harmful than making CPython-specific comments in the documentation. Implementation details *deserve* to be documented when they have an impact on behaviour (including performance / resource usage). Python is not just a platonic ideal. Do you suggest we also remove this part: http://docs.python.org/dev/library/io.html#performance ?

Regards

Antoine.
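The two spellings compared above can be put side by side in a short sketch. The `Item` class and its data are hypothetical, and the timing lines only show how one might measure the difference on a given interpreter; no particular result is claimed, since it varies by implementation and machine.

```python
import timeit
from operator import attrgetter


class Item:
    """Hypothetical class used only for this illustration."""

    def __init__(self, foo):
        self.foo = foo


items = [Item(n) for n in (3, 1, 2)]

get_foo_operator = attrgetter("foo")
get_foo_lambda = lambda x: x.foo

# Both callables return the same attribute for every object.
same_result = all(get_foo_operator(i) == get_foo_lambda(i) for i in items)

# Measure each key function; the numbers depend on the interpreter.
t_operator = timeit.timeit(lambda: sorted(items, key=get_foo_operator), number=1000)
t_lambda = timeit.timeit(lambda: sorted(items, key=get_foo_lambda), number=1000)
print(f"attrgetter: {t_operator:.4f}s  lambda: {t_lambda:.4f}s")
```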
On Tue, Dec 20, 2011 at 11:08, Antoine Pitrou wrote:
But that's basically the only reason to invoke the `operator.attrgetter("foo")` ugliness, instead of writing the explicit and obvious `lambda x: x.foo`. So not mentioning that it provides a speed benefit on CPython hides the primary reason for using the operator module. Otherwise it's just a bunch of useless wrappers.
So the question is if the docs are Python documentation or CPython documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) be just as fast as operator.attrgetter("foo").
Implementation details *deserve* to be documented when they have an impact on behaviour (including performance / resource usage). Python is not just a platonic ideal. Do you suggest we also remove this part: http://docs.python.org/dev/library/io.html#performance ?
I agree that it's good to document some implementation details, but it seems like the paragraph, as it was before, documented too many details. It seems like a paragraph that mentions the specificity of this aspect for CPython and omits the reference to C as the VM implementation should be acceptable to all parties.

Cheers,

Dirkjan
On Tue, 20 Dec 2011 11:14:15 +0100, Dirkjan Ochtman wrote:
So the question is if the docs are Python documentation or CPython documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) be just as fast as operator.attrgetter("foo").
I would expect it to be just as fast right now, although that's just an uninformed guess. That said, CPython is both the dominant implementation and the only one (AFAIR) to have stable 3.2 support.
I agree that it's good to document some implementation details, but it seems like the paragraph, as it was before, documented too many details. It seems like a paragraph that mentions the specificity of this aspect for CPython and omits the reference to C as the VM implementation should be acceptable to all parties.
Agreed. The original wording was poor since it mentioned C while what is really significant is performance. There are probably Python programmers who don't even know what C is.

Regards

Antoine.
On Tue, Dec 20, 2011 at 12:14 PM, Dirkjan Ochtman wrote:
So the question is if the docs are Python documentation or CPython documentation? On PyPy, I'm guessing lambda x: x.foo might (some day) be just as fast as operator.attrgetter("foo").
As of now, lambda is much faster on PyPy for a constant name (there is no good reason why exactly attrgetter is slower, but it somehow loses the fact that the name is constant, if it is).

I'm in general fine with saying that this is either Python documentation or CPython documentation, but leaving this intermingled has caused us quite some headaches in the past. For example, using attrgetter and map rather than just writing a loop is slower on PyPy, so the knowledge that it's *fast* in the operator module is misleading *in Python*.

How about we somehow mark that whenever the Python documentation talks about performance, it is talking about CPython performance?

Cheers,
fijal
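The two styles contrasted here, attrgetter with map versus a plain loop, might look as follows. The `Point` type and its values are hypothetical; the sketch only shows that both forms compute the same list. Their relative speed is exactly what differs between implementations.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type, invented for this illustration.
Point = namedtuple("Point", ["x", "y"])
points = [Point(1, 2), Point(3, 4)]

# Style 1: attrgetter combined with map.
via_map = list(map(attrgetter("x"), points))

# Style 2: an explicit loop.
via_loop = []
for p in points:
    via_loop.append(p.x)
```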
On 2011-12-20, at 11:08 , Antoine Pitrou wrote:
But that's basically the only reason to invoke the `operator.attrgetter("foo")` ugliness, instead of writing the explicit and obvious `lambda x: x.foo`.

I don't agree with this: an attrgetter in the current namespace can be clearer than an explicit lambda in place and, more importantly, when trying to fetch more than one attribute, attrgetter is far superior to lambdas as far as I'm concerned.

I don't think I've ever seen `attrgetter` (or any of the other `operator` functions) advocated on the basis of speed. This mention does not even exist in the Python 2 docs, which does not prevent people from using `operator`.
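As a sketch of the multi-attribute case mentioned above: when given several names, `attrgetter` returns a tuple of the corresponding attributes, which makes it usable directly as a compound sort key. The `Employee` type and its records are hypothetical.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type and data, invented for this illustration.
Employee = namedtuple("Employee", ["dept", "name", "age"])
staff = [
    Employee("ops", "Bo", 41),
    Employee("dev", "Cy", 29),
    Employee("dev", "Al", 35),
]

# One callable that fetches two attributes as a tuple.
dept_then_name = attrgetter("dept", "name")
first_key = dept_then_name(staff[0])

# Sort by department, then by name, using either spelling.
ordered = sorted(staff, key=dept_then_name)
ordered_lambda = sorted(staff, key=lambda e: (e.dept, e.name))
```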
On 12/20/2011 3:51 AM, Antoine Pitrou wrote:
I disagree with this change. Knowing that they are written in C is important when deciding to pass them to e.g. sort() or sorted(), because you know it will be faster than an arbitrary pure Python function.
You could tag it as a "CPython implementation detail" if you want, or talk about performance rather than mention "C".
The existence of operator and the behavior of its functions is not a C implementation detail. So some change was needed.

I think a programmer can assume that they are written in the implementation language to be as fast as possible. I do not think we should load the manual with "In CPython, this is implemented in C" notes all over. For instance, there is nothing in the library manual that I can see that specifies that the builtin functions and types are written in C (for CPython). And I remember that Guido has asked that the manual not discuss big O() behavior of the methods of builtin classes.

I do see a note like "The binascii module contains low-level functions written in C for greater speed that are used by the higher-level modules." But that should be revised somehow for the same reason as operator. But I don't think this is typical. The heapq module makes no mention of _heapq.

I think all this sort of stuff belongs in a separate CPython Notes. Perhaps Python Setup and Usage could be renamed CPython Setup and Usage and expanded with more info on gc (ref counting), O() notes, Python vs. C code, etc. I presume that other implementations are not run with 'python script.py', so the very first section is CPython specific anyway. In fact, I have the impression that for some *nix systems, that is CPython 2 specific.

--
Terry Jan Reedy
On Tue, 20 Dec 2011 05:27:41 -0500, Terry Reedy wrote:
The existence of operator and the behavior of its functions is not a C implementation detail.
And?
I think a programmer can assume that they are written in the implementation language to be as fast as possible.
Yeah, you can assume anything, and then get bitten by the fact that e.g. OrderedDict is pure Python and thus massively slower than dict. But at least you've achieved some platonic ideal of how documentation should not talk about implementation details, which is great, right?

Why you think we should leave users in the dark rather than inform them is beyond me. While we certainly should find a good compromise between readability and completeness, and should certainly tweak the doc's wording and layout adequately, removing useful information is nonsense.
For instance, there is nothing in the library manual that I can see that specifies that the builtin functions and types are written in C (for CPython).
I guess everyone expects builtin functions and types to be reasonably fast, regardless of the language or implementation. (even though I did see some beginner code rewrite its own slow "list" wrapper, so it's probably not a universal expectation)
Perhaps Python Setup and Usage could be renamed CPython Setup and Usage and expanded with more info on gc (ref counting), O() notes, Python vs. C code, etc.
Really? That's a perfectly inappropriate place to talk about performance details of *any* implementation.

Regards

Antoine.
Message written by Antoine Pitrou on 20 Dec 2011, at 11:57:
Why you think we should leave users in the dark rather than inform them is beyond me. While we certainly should find a good compromise between readability and completeness, and should certainly tweak the doc's wording and layout adequately, removing useful information is nonsense.
+1

--
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer
IT Infrastructure Department
Grupa Allegro Sp. z o.o.
On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote:
And I remember that Guido has asked that the manual not discuss big O() behavior of the methods of builtin classes.
Do you know when/where he did that? It seems useful to know that on CPython, list.insert(0, x) will become slow as the list grows... It probably shouldn't be upfront, but O() hints for some of the core stuff seems useful (though again, in some cases they should probably be limited to CPython).

Cheers,

Dirkjan
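To make the list.insert(0, x) remark concrete: a CPython list is a contiguous array, so inserting at index 0 shifts every existing element, and building a list that way costs time quadratic in its length. A minimal sketch of the pattern, next to `collections.deque`, whose `appendleft` is documented as roughly constant time; the function names are invented for illustration.

```python
from collections import deque


def prepend_with_list(n):
    """Build [n-1, ..., 1, 0] by repeated insertion at the front."""
    items = []
    for i in range(n):
        items.insert(0, i)  # on CPython, shifts all existing elements
    return items


def prepend_with_deque(n):
    """Build the same sequence using a deque, then copy it to a list."""
    items = deque()
    for i in range(n):
        items.appendleft(i)  # no shifting of existing elements
    return list(items)
```

Both functions return the same list; only the cost of getting there differs, and how much it differs is implementation-specific.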
Message written by Dirkjan Ochtman on 20 Dec 2011, at 12:24:
On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote:
And I remember that Guido has asked that the manual not discuss big O() behavior of the methods of builtin classes.
Do you know when/where he did that?
http://mail.python.org/pipermail/python-dev/2008-March/077511.html

--
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer
IT Infrastructure Department
Grupa Allegro Sp. z o.o.
On Tue, Dec 20, 2011 at 6:24 AM, Dirkjan Ochtman wrote:
On Tue, Dec 20, 2011 at 11:27, Terry Reedy wrote:
And I remember that Guido has asked that the manual not discuss big O() behavior of the methods of builtin classes.
Do you know when/where he did that? It seems useful to know that on CPython, list.insert(0, x) will become slow as the list grows... It probably shouldn't be upfront, but O() hints for some of the core stuff seems useful (though again, in some cases they should probably be limited to CPython).
I think the question of the day is whether the documentation is targeting those who wish to have an understanding of what is happening under the hood, or those that want to take such details for granted. I much prefer the little notes and performance hints.

- John
2011/12/20 Antoine Pitrou:
I disagree with this change. Knowing that they are written in C is important when deciding to pass them to e.g. sort() or sorted(), because you know it will be faster than an arbitrary pure Python function.
In that case, I would rather speak of "fast" functions rather than "implemented in C" functions (a la the itertools docs). Would that be acceptable?

--
Regards,
Benjamin
On Tuesday, 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
In that case, I would rather speak of "fast" functions rather than "implemented in C" functions (a la the itertools docs). Would that be acceptable?
Definitely.

Regards

Antoine.
2011/12/20 Antoine Pitrou:
On Tuesday, 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
In that case, I would rather speak of "fast" functions rather than "implemented in C" functions (a la the itertools docs). Would that be acceptable?
Definitely.
Done.

--
Regards,
Benjamin
On 12/20/2011 11:15 AM, Benjamin Peterson wrote:
2011/12/20 Antoine Pitrou:
On Tuesday, 20 December 2011 at 10:57 -0500, Benjamin Peterson wrote:
In that case, I would rather speak of "fast" functions rather than "implemented in C" functions (a la the itertools docs). Would that be acceptable?
Definitely.
Done.
I like what you did too.

--
Terry Jan Reedy
participants (8)
- Antoine Pitrou
- Benjamin Peterson
- Dirkjan Ochtman
- John O'Connor
- Maciej Fijalkowski
- Terry Reedy
- Xavier Morel
- Łukasz Langa